Finite State Machine Parsers

Back when I was programming network code in C, one of my favorite strategies was to to implement protocol processing as a finite state machine. I found the design was always easier to understand and modify, often by simply changing an entry in the state table. I found I could often use it to simplify non-networking code too.

Robert David Graham and Peter C. Johnson have a very nice research report on Finite State Machine Parsing for Internet Protocols: Faster Than You Think. They claim that with a little hand tuning that takes the instruction pipeline and L1 cache into account, they can produce FSM parsers that out perform alternative methods. The paper doesn’t show more than a few tidbits of code but they do describe their general approach in detail.

I really enjoyed reading the paper and if you like getting deep into backend coding, you probably will too. They even talk about the Aho-Corasick algorithm1, one of my favorites.

As I said, a nice paper and well worth a read. If you don’t know how to use FSMs in your programming, it’s worth taking an hour or two to investigate the technique. It will pay you tremendous dividends every time you can use it.

Footnotes:

1

Aho-Corasick is the algorithm behind fgrep that searches text for several strings at the same time.

This entry was posted in Programming and tagged . Bookmark the permalink.