In June 2016, Netflix researchers launched a DDoS attack against their own company. Security engineer Scott Behrens ran an infrastructure test on the streaming service as part of a presentation to coworkers. They watched Behrens take the site down, but instead of panicking, the Netflix team celebrated. Behrens, along with cloud security engineer Jeremy Heffner and others, was excited to be able to demonstrate that Netflix was vulnerable to an unusual kind of DDoS attack. Proving that this was the case was the first step toward preventing it from happening again, not only for Netflix, but also for the wider Internet.
Netflix is already built to withstand a massive amount of traffic: over 35TB per second of data during peak hours. Its network of Open Connect devices localizes most of that traffic, so aiming a botnet at Netflix would almost certainly fail. That holds even for the most recent botnet attacks, which have reached previously unheard-of figures like 1.7TB.
The attack Behrens successfully attempted works instead by targeting Netflix’s application programming interface (API) with precisely chosen requests, essentially turning the API against the service it fronts. The queries are constructed in such a way that they initiate a cascade within the middle and backend application layers that the service is built on, triggering the need for more and more resources as they ricochet across the infrastructure. Attackers need only send a small amount of malicious data, but if the requests are crafted in the right way, they could cause internal disruptions or, at worst, a complete crash of the site.
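To make the amplification concrete, here is a minimal toy model of how one edge request can fan out into many internal calls. It is a sketch under stated assumptions, not a description of Netflix's actual architecture: the service names, the per-item call counts, and the `ApiRequest` fields are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiRequest:
    """A hypothetical edge request; the field names are illustrative only."""
    items_requested: int  # how many objects the caller asks for
    cache_hit: bool       # whether a middle-tier cache can answer directly


# Hypothetical fan-out: backend calls each middle-tier service makes per item.
BACKEND_CALLS_PER_ITEM = {"recommendations": 3, "metadata": 2, "artwork": 1}


def backend_calls(request: ApiRequest) -> int:
    """Count the internal calls one edge request triggers in this toy model."""
    if request.cache_hit:
        return 0  # answered from cache, so no backend work is needed
    per_item = sum(BACKEND_CALLS_PER_ITEM.values())
    return request.items_requested * per_item


# A typical request is small and cacheable; a crafted one asks for many
# items and is shaped so that it misses the cache.
typical = ApiRequest(items_requested=10, cache_hit=True)
crafted = ApiRequest(items_requested=500, cache_hit=False)

print(backend_calls(typical))
print(backend_calls(crafted))
```

In this model both requests cost the attacker a single HTTP call at the edge, yet the crafted one generates thousands of internal calls. That asymmetry between what the attacker sends and what the service must do is what lets a small amount of traffic do disproportionate damage.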
Application DDoS attacks such as this are unusual, but not unheard of. A recent State of the Internet report from Akamai said that they amount to less than 1 percent of all DDoS attacks. However, Behrens said that Netflix’s application security team always needs to try to stay two steps ahead of the attackers, so these kinds of possibilities merit examination. Furthermore, many other companies also use an “API gateway” microservices architecture, in which a small portal connects users to a massive array of services underneath. If attackers started to expand this kind of attack, Netflix would not be the only company vulnerable to it.
Behrens’ advice on preventing this kind of attack? Primarily, he suggests more careful monitoring of traffic deep inside your infrastructure. Typically, middle-tier and backend service traffic and behavior are monitored less closely than traffic at the edge. Behrens also advocates for tools that can help decode behavior patterns and distinguish good traffic from bad.
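One way to picture that kind of deeper monitoring is to track how much internal work each client causes, rather than only how many requests it sends. The sketch below is an assumption-laden illustration, not a tool Behrens describes: the function name `flag_costly_clients`, the event format, and the threshold are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def flag_costly_clients(
    events: Iterable[tuple[str, int]], max_avg_cost: float
) -> list[str]:
    """Return clients whose average backend cost per request exceeds a limit.

    Each event is a hypothetical (client_id, backend_calls) pair recorded for
    one edge request, for example from tracing data in the middle tier.
    """
    totals: dict[str, int] = defaultdict(int)
    counts: dict[str, int] = defaultdict(int)
    for client_id, cost in events:
        totals[client_id] += cost
        counts[client_id] += 1
    return sorted(
        client
        for client in totals
        if totals[client] / counts[client] > max_avg_cost
    )


# Client "b" sends few requests, but each one is very expensive internally.
events = [("a", 5), ("a", 7), ("b", 3000), ("b", 2800), ("c", 10)]
suspects = flag_costly_clients(events, max_avg_cost=100)
print(suspects)
```

A request-rate limit at the edge would not single out client "b" here, since it sends no more requests than client "a". Measuring cost per request inside the infrastructure is what exposes it.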
Last year, Netflix released two open source tools, Repulsive Grizzly and Cloudy Kraken, to do just that, allowing developers to run their own small-scale tests once they have identified possible vulnerabilities to this kind of attack.
“The combination of those things has really raised the bar for causing this sort of issue against the product,” Behrens says. “A lot of the mitigations that I discuss definitely did hold true, but we have to be humble and realize that there’s always going to be something that might pop up. It’s a cat and mouse game, so we just continue to try to find ways to make our testing more sophisticated and then build in stronger remediations.”