Fuzzing is like searching for a needle in a haystack, and testing via a network interface has unavoidable latency that limits test case throughput.  We can work within this limitation by intelligently designing the fuzzer to explore boundary conditions instead of all possible values.  We take a problem that is O(2^n) in the number of bits in our target message and transform it into a problem that is O(2^n) in the number of interesting values in our grammar.  All we’ve done is lower the constant of proportionality.

Unfortunately, there’s no way around the fundamental combinatorial nature of the problem.  There are other ways to play with the constant of proportionality, however.  Testing software via a network interface is limited by a number of factors, including:

  • The responsiveness of the software (interrupt driven or polling?)
  • Context switching of threads
  • Network stack overhead
  • Network interface overhead

The many context switches are the real killer.  Even testing software over the loopback interface within the OS kills performance.  The fastest fuzzer/target systems we tested in Project Robus were able to sustain several hundred test cases/second.  This sounds pretty fast, but when you consider the sheer combinatorial size of an input frame of 20+ bytes, it gets you nowhere without case-limiting smart fuzzing.  Pure randomness just doesn’t cut it at these speeds.

If we have access to the source code (or we’re a developer on a project), we can do brute-force fuzz testing in memory. By this, I mean that we can run the fuzzer and the target code on the same thread, and even on multiple threads simultaneously.  I’ve been working on rewriting the parser for opendnp3 using a better architecture and a lot of code generation.  I decided to throw together a purely random fuzzer to see what kind of speeds I could hit.  The code is here.

The application takes command-line arguments for the number of iterations to run and the maximum size of the frame to throw at the parser. It then spins up a thread for each core on the system it’s running on, each with a different random seed.

The results were impressive.  On a quad-core i7 I was able to sustain 3,500,000 test cases/sec under the following conditions:

  • random frame size in the interval [1, 100]
  • uniform distribution of frame sizes
  • random bytes [0, 0xFF]
  • recording tabulated error codes in counters

3.5e6 test cases per second is about 15,000x faster than the smart fuzzer over a network interface.  Each test case is much less intelligent (dirt stupid), but who cares when I’m running four orders of magnitude faster?  This would be incredibly cool to combine with a genetic algorithm and taint feedback from the binary.

Obviously, this only works if you have the code and can write specialized test cases.  If you’re developing a network protocol that has modular parsers at different layers, why wouldn’t you do this? It has a really high bang-for-the-buck ratio, and it will likely find different types of bugs than a smart fuzzer would.

The really interesting questions in my mind are things like what the optimal probability distributions are for both the frame length and the random data fill.  This is probably somewhat protocol-specific.  Since programmers tend to make mistakes at integer boundaries, you might want to fill the data with an inverted normal distribution that emphasizes boundary values like [0x00, 0x01 .. 0xFE, 0xFF].  Short frames tend to produce a lot of repeats, so you might want to use a probability distribution that gives more weight to longer frames.  All questions I have, but answers I have not.  Anyone know of any papers on the subject?