By Zed A. Shaw

Mongrel2 Almost Valgrind Pure, Bstring Happy

I managed to get Mongrel2 to serve files, although there's some bugs with how it works. This means that Mongrel2 is pretty close to being a full fledged server, minus all the usual edge case features like handling stupid J2ME clients that send chunked-encoding when they aren't supposed to (it's for servers morons). It also has bugs in how the proxying is done (which I'll explain in another post) and various other little things.

My next whack of work was to tighten up the HTTP processing so that it can handle all the bizarre ways that browsers use it, but before I did that I had some work to do.

You see, while Mongrel2 on Monday could served lots of requests without crashing, it wasn't really too tidy. I'd been slacking off in my usual C coding purity and just using all the stupid C string functions that invite buffer overflows. And while I'd been running mongrel2 under Valgrind during its entire development, I was playing fast and loose with the memory management.

It was time to get serious. Starting with this commit I finally set out to eliminate almost all use of the evil string functions using the bstring library. The bstring library is a heavily tested and very well designed library for working with strings. It gives you all the usual stuff you'd get in a modern language, and in a package that keeps things secure. I like this library because it doesn't try to become The One True String Replacement. Instead, you use it for your internal data structures, and then easily interface it to other libraries that already expect regular C strings. It's not totally foolproof, but it is a big help.

I was also determined to get as close as possible to valgrind purity in the unit tests. This meant using another favorite library of mine, halloc. With halloc you can solve the problem of cleaning up deeply nested data structures without resorting to pooling or GC. It's not as flexible as a GC, but in a server it works great. Servers are fairly predictable so managing their memory comes down to making sure things get deleted based on their structure at the right time. With halloc this is easy with minimal overhead.

Yak Shaving The Boehm GC

This was my plan, but before I did that commit, I tried out the Boehm GC to see what impact it would have. I knew that if I wanted to use the GC I'd have to convert to it before trying to use bstring, and I couldn't use halloc. Since converting to bstring was much harder than trying the Boehm GC I went with a quick experiment and worked the GC in.

Now, weenies who code with "soft" languages have been brainwashed into thinking C is like The Tumtum Tree where if you hang out there long enough something's gonna eat you. Despite all the evidence that C is quite alright for many applications, and that no, the world doesn't end because of its use, people are afraid of it. Their professor lamented its lack of purity, and say it's a horrible language, yet none of their professor's favorite languages are even close to as capable or useful.

I use C in servers because C is the best language for it. Sockets work exactly the way they're supposed to. Memory is predictable, and with Valgrind not a danger. Code is actually smaller and easier to follow as long as you stick to a few guidelines. Throw in a library like bstring and you've got yourself a great system for creating systems.

Yet, still people will lament the lack of a GC. I'll tell you, every time I've written a server using a GC its memory footprint is gigantic compared to the same thing in C with regular memory operations. It's also harder to tune up the memory usage with a GC. I wanted to test this out again though, just in case things had improved.

The result of my experiment is that no, the Boehm GC does not win for a server. First, Mongrel2 is using a fast ucontext based coroutine system which the Boehm GC absolutely hated. Switching stacks on the GC was like setting its house on fire and then shooting it as it ran out the front door in its lingerie. The GC would start trolling the "stack" looking for fresh RAM to free up and then kaboom stack would switch and the GC would explode.

To figure it out further, I quickly wrote a "mini server" that didn't use the coroutines. Just a simple loop cranking on sockets, nothing fancy. With no GC and just plain old memory usage, Mongrel2 used about 2MB to 4MB of RAM to handle requests (depending on the platform). With the GC, it started at 19MB, and then climbed to 50MB and had huge stalls when the GC stopped the world. Switching the GC to incremental collection killed the performance by a good 10-20%, which I guess is better than massive stalls, but still.

Since the GC couldn't work with the coroutines it was out no matter what, but my tests were an indicator of things I've known for a while. In a web application, you definitely want GC and lots of help managing resources. Time's too short to manage memory yourself in PHP, Python, or Ruby web applications.

In a web server you want tight, hardcore, wizard level memory management that keeps the thing light. Yes you can get a host with massive amounts of RAM, but using tons of memory is a sign of bad design and code in a well known piece of software like a web server.

The bstring And Valgrind Effort

If you look at the Mongrel2 timeline you can see I started the massive rip out of C strings for bstrings on Monday. I went to a cafe and slowly chipped away at all the strings using this handy regex I got from the and-httpd project. And-httpd is a web server I admire since it is written to be very secure, although I find its design very annoying.

Anyway, the regex is this:

[^_.>a-zA-Z0-9](str(n?cpy|n?cat|xfrm|n?dup|str|pbrk|tok|_)|stpn?cpy|r?index[^.]|a?sn?printf|byte_)

This is very handy and it's part of the Mongrel2 Makefile. I exclude a few false positive files, but overall it works great to find suspect stuff.

Another trick is that bstring comes with a bsafe.c file. If you compile your program with this, then it hijacks usage of any of the bad string functions and aborts when you use them. I also compile with this and it caught any remaining usages.

Then, I just went through the code and knocked them down one by one. I started at the edges, where it was easiest, and then kept refactoring, altering the tests, checking with valgrind, and moving until the whole code base was using bstring everywhere possible.

In the process I also found a couple bugs in bstring which I'll hopefully send in as a patch. Mostly just memory leaks on certain error conditions.

Next up was Valgrind purity. To do this, I just altered the premake4 build so that it ran Valgrind when it ran the tests. In C it's easiest to write your tests as individual little programs that test one thing. That way any memory errors are localized to the part that really causes it. With valgrind running each test and logging results to files, it was easy to hunt down possible leaks and squash them.

I just kept knocking down all the memory leaks and errors I could. Now there's none except in the configuration system, which will be next once I've finalized it and how server reconfigures will work.

A big part of getting the memory usage down and removing the leaks was using halloc. The halloc library simply wraps malloc with a linked list of related chunks of RAM. You can then attach arbitrary RAM to their related data structures, and then in on operation remove them all. It's not quite as fast as plain malloc, but it for long living nested structures it's great.

The End Result

From now on Mongrel2 will avoid using any unsafe C string functions, and try to be as valgrind pure as possible. Valgrind purity is mostly a vanity metric that helps spot memory errors and bugs you normally wouldn't find without Valgrind.