Heads up
A shorter version of this piece was first published in the RTCSec Newsletter, March 2026. If you’re not already subscribed, you can sign up here.

TL;DR
AI agents can now autonomously find zero-day vulnerabilities in large C codebases. RTC projects (Kamailio, Asterisk, pjsip, rtpengine, coturn) are directly in the path. We’ve seen this in our own work. Here’s what it means for the people who build and secure these systems.
The claim, and why we believe it
Thomas Ptacek published a piece last week titled “Vulnerability Research Is Cooked” and I think every RTC developer should read it. His argument is simple: AI coding agents have gotten good enough at finding vulnerabilities that the economics of security research have changed permanently. Not in five years. Now.
Ptacek points to Nicholas Carlini’s work at Anthropic’s Frontier Red Team; Carlini presented the details at Unprompted Con. What struck me about the methodology is how almost comically simple it is. You run Claude in a virtual machine, give it full permissions, and prompt it like it’s playing a CTF. To handle large codebases, you just add a hint pointing at each source file and loop through the project. That’s it. No elaborate fuzzing harnesses, no months of tooling work, no complex scaffolding.
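The loop Carlini described can be sketched in a few lines. To be clear, this is illustrative only: the actual harness was not published, and `run_agent` here is a hypothetical placeholder for whatever drives the model in its VM. Only the shape of the loop, one prompt per source file with a hint pointing at it, reflects the described methodology.

```python
import pathlib

def run_agent(prompt: str) -> str:
    """Hypothetical stand-in for the real agent call (Claude running in a
    VM with full permissions). The actual harness was not published."""
    return ""  # the real version would return the agent's findings

def audit_project(root: str) -> list[str]:
    """Loop through every C source file in the project, prompting the
    agent once per file with a hint pointing at that file."""
    prompts = []
    for path in sorted(pathlib.Path(root).rglob("*.c")):
        prompt = (
            "You are playing a CTF. Find high-severity bugs in this project. "
            f"Hint: start with {path}."
        )
        prompts.append(prompt)
        run_agent(prompt)
    return prompts
```

That is the entire scaffolding: no corpus, no harness per parser, just a file walk and a prompt template.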
And the results? More than 500 high-severity vulnerabilities, a collaboration with Mozilla in which Claude discovered 22 Firefox vulnerabilities over two weeks, and a Linux kernel heap buffer overflow in the NFS v4 daemon that had been sitting unnoticed since 2003. That last one is worth pausing on: the bug required two cooperating adversarial clients, the kind of subtle multi-party state interaction you would never find through traditional fuzzing. The model found it anyway.
We cannot claim this came as a total surprise. We have been running our own experiments at Enable Security and have seen similar results. The capability is real, and it works on the kinds of codebases we deal with every day.
Why RTC codebases are especially exposed
The VoIP and WebRTC infrastructure that most of us depend on is almost entirely C and C++: Kamailio, Asterisk, pjsip, rtpengine, coturn, FreeSWITCH. Large, mature projects with decades of history, complex state machines, multi-protocol session handling, and the kind of subtle memory management issues that LLMs are apparently very good at pattern-matching.
Think about what a typical RTC media server has to juggle: ICE, DTLS, SRTP, and RTP all multiplexed on the same port, session setup and teardown racing against each other, complex interactions between signaling and media layers. That complexity is precisely where subtle bugs hide. And now there’s a tool that is very patient, does not need sleep, and will loop through every source file without getting bored.
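The multiplexing mentioned above follows RFC 7983 (building on RFC 5764): every incoming packet on the shared port is classified by its first byte before it reaches the right protocol stack. A minimal classifier, illustrative rather than taken from any of the projects named here, looks like this:

```python
def demux(first_byte: int) -> str:
    """Classify a packet arriving on a multiplexed RTC port by its
    first byte, following the ranges in RFC 7983."""
    if 0 <= first_byte <= 3:
        return "STUN"          # ICE connectivity checks
    if 16 <= first_byte <= 19:
        return "ZRTP"
    if 20 <= first_byte <= 63:
        return "DTLS"          # handshake that carries the SRTP keys
    if 64 <= first_byte <= 79:
        return "TURN channel"
    if 128 <= first_byte <= 191:
        return "RTP/RTCP"      # SRTP-protected media
    return "drop"              # unclassifiable; discard
```

Every branch hands attacker-controlled bytes to a different parser on the same socket, which is exactly the kind of shared entry point where state-confusion bugs accumulate.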
What we noticed is that some of the evidence is already public and explicit. In our March 2026 newsletter, three Firefox WebRTC CVEs were credited to a team including Carlini “using Claude from Anthropic”. Anthropic separately says its Mozilla collaboration led Claude to discover 22 Firefox vulnerabilities in two weeks. We also covered AISLE, an AI-native security startup whose autonomous analyzer has been finding real Firefox WebRTC and OpenSIPS vulnerabilities. By contrast, the March 2026 pjproject advisories, Chrome 146’s four WebRTC bugs, and coturn’s reversed password check are, from the public record, suggestive of the broader trend but not confirmed examples of AI-assisted discovery. That distinction matters.
The timeline problem
One detail from Carlini’s talk deserves attention: AI capability for this kind of vulnerability research is doubling roughly every four months. Only the newest models can do this reliably today. In about a year, the model running on your laptop will likely be just as capable.
So this is not a “watch this space” situation. Carlini mentioned in passing that he currently has several hundred Linux kernel crashes he has not had time to validate yet. Let that sink in for a moment. The bottleneck has already shifted from finding vulnerabilities to handling them. Projects that already struggle with security triage (and let’s be honest, most do) will face a significantly higher volume of valid, reproducible, high-severity findings within 12 months. Are they ready for that? I am not sure the answer is yes.
What can RTC project maintainers do?
So, this raises the question: what can projects actually do about this?
The honest answer is that there is no magic fix, but some things matter more than they did before.
Triage capacity is now the constraint. Discovery is becoming cheap. The ability to receive a valid bug report, understand it, reproduce it, and ship a fix? That is not cheap. Projects that invest in this (clear security contacts, sensible disclosure policies, maintainers who can act quickly on incoming reports) will be in a much better position than those that don’t.
Faster release cycles help. A fix that takes three months to land in a tagged release does not help the people running your software. If a finding is serious, ship it quickly.
Automated fuzzing and CI security checks raise the floor. They catch the obvious stuff before researchers do, which means you’re not triaging low-hanging fruit alongside the serious findings. Do keep in mind though that fuzzing a single parser in isolation is no longer enough. The bugs AI finds tend to involve the whole flow: state transitions across protocol layers, a response being constructed from request data, interactions between session setup and teardown. That’s where most of the interesting bugs (should) live now, not in the parser alone.
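One way to move beyond single-parser fuzzing is to fuzz sequences of operations against a session’s state machine and check invariants after every step. The sketch below uses a deliberately toy `Session` class, an assumption for illustration rather than code from any real project, just to show the shape of such a harness:

```python
import random

class Session:
    """Toy stand-in for an RTC session; a real target would be a
    Kamailio/pjsip-style dialog or media session object."""
    def __init__(self):
        self.state = "idle"
        self.media = None

    def setup(self):
        if self.state == "idle":
            self.state = "active"
            self.media = []

    def teardown(self):
        self.state = "idle"
        self.media = None

    def send_media(self, pkt: bytes):
        if self.state == "active":
            self.media.append(pkt)

def fuzz_session(steps: int = 1000, seed: int = 0) -> None:
    """Drive random interleavings of setup/teardown/media and assert,
    after every step, that media buffers only exist on active sessions."""
    rng = random.Random(seed)
    s = Session()
    ops = [Session.setup, Session.teardown,
           lambda s: s.send_media(rng.randbytes(8))]
    for _ in range(steps):
        rng.choice(ops)(s)
        # invariant: no media buffer should outlive its session
        assert (s.media is None) == (s.state == "idle")
```

The point is the structure, not the toy target: the fuzzer explores orderings of operations, and the bug surfaces as an invariant violation rather than a parser crash.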
One thing we are not going to advocate here is memory-safe rewrites for large, established projects like Kamailio or Asterisk. The rewrite process introduces new bugs. Not all vulnerabilities are memory-related anyway: protocol logic flaws, authentication bypasses, and configuration issues do not go away because you’ve switched languages. For projects of this scale and complexity, a rewrite is a multi-year effort with its own security risks. Perhaps the long-term direction is memory-safe languages for new projects, but that does not help anyone today.
So where does this leave human researchers?
So where does this leave us carbon-based security researchers? The honest answer is uncomfortable.
AI already exceeds most humans at finding common C vulnerability classes: buffer overflows, use-after-free issues, integer overflows, and the like. For most security researchers, the competition arrived with very little warning.
For now, RTC-specific domain knowledge still matters. Understanding which protocol interaction is actually dangerous in a real deployment. Knowing that a DTLS fingerprint bypass in a media server carries a different threat model than the same class of bug elsewhere. Understanding what an attacker can actually do with a finding, versus what just looks alarming on paper. That kind of judgment takes real familiarity with how these systems work in practice.
Carlini’s own words: “current models are already better vulnerability researchers than I am, and in a year, they will likely be better than everyone.” That’s specifically about vulnerability research, but I don’t think it stops there. Whether AI will exceed the best human experts across most security domains seems a matter of when, not if. The RTC-specific domain knowledge edge is real for now. We are also clear-eyed that it will narrow. Perhaps sooner than any of us would like.
At Enable Security, we are adapting our methodology. The value of what we do is shifting toward the attack surface that requires deep RTC domain knowledge: protocol-level issues, configuration problems, the interactions between components that require real understanding of how VoIP and WebRTC deployments behave under adversarial conditions. That edge is meaningful today. If you are looking for a VoIP penetration test, that domain expertise is still very much where the value is. We’ve been incorporating AI into our own security work for a while now. There are still confidentiality constraints on what you can throw at a model, but that gap is closing. Where we can use it, we do.
The practical question for the RTC community is not whether this is coming. It is whether projects are ready for a world where finding vulnerabilities in their code is cheap and abundant. Right now, the honest answer is mostly no. That is worth taking seriously.
Luca Carettoni from Doyensec put it well: