Today I'm dropping a fresh Mongrel2 that features a completely redesigned connection management algorithm that uses a bad ass Finite State Machine to keep everything straight. This state machine will make it possible for Mongrel2 to keep-alive connections no matter what kind of backend is requested and allow for developers to inject their own filters on the events that manage connections. This release also features a first (hackish) working HTTP->0MQ protocol based on SCGI and a simple demo.
I'm very excited about the connection state machine because it means that in addition to the original highly accurate Mongrel HTTP parser, I've now got a connection state system that's just as accurate. It will let Mongrel2 support bizarre proxy configurations, keep-alive state, any kind of backends, HTTP long poll operations, JSSocket, and WebSockets.
As I worked on it this last week I started stumbling into extra features I got for free with this design. HTTP long poll and keep-alive from HTTP->0MQ are probably the two sexiest. Thanks to this design, long polling isn't a special feature, it's just how things work. It also means that HTTP->0MQ has the ability to do N:M processing similar to the current chat demo but using plain HTTP.
What I'm most excited about is how, since the state machine is controlled by simple integer events through a fast Ragel FSM, that means people can write filters based on the events and the callbacks. In the same way people grabbed the Mongrel HTTP Parser and used it to build web servers, this new connection state machine will let you extend Mongrel2's internal connection state processing to meet whatever hairy problems you meet.
First up, I got HTTP->0MQ going. It's gross, but it works. If you go to this test you can see it in action. What this does is take your HTTP request, translate it to SCGI, and then hand it to a 0MQ backend. That backend is then just echoing back your headers and such, so it doesn't do much. Here's the code for the handler (which is a total hack):
import zmq
import time

sender_id = "82209006-86FF-4982-B5EA-D1E29E55D481"

ctx = zmq.Context()

# Requests from Mongrel2 arrive on a SUB socket (subscribed to everything).
reqs = ctx.socket(zmq.SUB)
reqs.setsockopt(zmq.SUBSCRIBE, "")
reqs.connect("tcp://127.0.0.1:9997")

# Responses go back to Mongrel2 on a PUB socket.
resp = ctx.socket(zmq.PUB)
resp.connect("tcp://127.0.0.1:9996")
resp.setsockopt(zmq.IDENTITY, sender_id)


class Request(object):

    def __init__(self, ident, headers, body):
        self.ident = ident
        self.headers = headers
        self.body = body


def parse_request(msg):
    # Message layout: "<ident> <length>:<headers><separator><body>"
    ident, rest = msg.split(' ', 1)
    length, rest = rest.split(':', 1)
    length = int(length)
    headers = rest[0:length]
    # Headers are alternating keys and values, each terminated by '\01'.
    headers = headers.split('\01')[:-1]
    headers = dict(zip(headers[::2], headers[1::2]))
    body = rest[length+1:]
    return Request(ident, headers, body)


while True:
    req = parse_request(reqs.recv())
    response = "\nIDENT:%r\nHEADERS:%r\nBODY:%r" % (req.ident, req.headers, req.body)
    print response, "\n"
    resp.send(req.ident + " HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n%s" % (
        len(response), response))
As you can see, it is not doing much, and it is not doing it that well. In fact, all of this will change because what I really want is a more unified 0MQ transport that will work between different backends better. This is just to get the feature out the door and try it.
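To make the message format a bit more concrete, here is a small Python 3 sketch that builds a message in the shape the handler above expects and parses it back. It is a guess at the format based only on what `parse_request` does: the `build_message` helper, the comma after the header block, and the sample header names are assumptions for illustration, not something Mongrel2 defines.

```python
# Python 3 sketch of the message layout the handler above parses.
# Assumed layout: "<ident> <length>:<headers><one separator char><body>"
# where headers are key/value strings each terminated by "\x01".

def build_message(ident, headers, body):
    """Hypothetical helper: build a message parse_request() can read."""
    flat = "".join("%s\x01%s\x01" % (k, v) for k, v in headers.items())
    # The "," separator is an assumption borrowed from netstrings.
    return "%s %d:%s,%s" % (ident, len(flat), flat, body)


def parse_request(msg):
    """Same steps as the handler's parse_request, returning a tuple."""
    ident, rest = msg.split(" ", 1)
    length, rest = rest.split(":", 1)
    length = int(length)
    fields = rest[:length].split("\x01")[:-1]   # drop the trailing empty field
    headers = dict(zip(fields[::2], fields[1::2]))
    body = rest[length + 1:]                    # skip the one separator char
    return ident, headers, body


msg = build_message("12", {"PATH": "/handlertest", "METHOD": "GET"}, "hello")
ident, headers, body = parse_request(msg)
print(ident, headers, body)
```

The `ident` at the front is what lets the backend address a reply to one specific connection.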
The surprising part is that, because of the state machine design, the HTTP connection is also in keep-alive mode inside Mongrel2, and the backend can send data to any currently connected browser.
I got long polling for free. Bad ass.
I personally think all these keep-alive hacks are pathetic, considering how huge of a hack they are in almost every other web server there is. In other web servers, getting long polling up and running is like a major holiday with cake and ice cream. In Mongrel2, this kind of asynchronous multiple response keep-alive operation is like...Tuesday.
Remember though, this is a total hack right now. It's going to get cleaned up quite a lot, but the fact that the feature works better than I even planned is really fun.
To test out that your connection is in keep-alive mode, go hit the test page and then refresh real quick a few times. See how your ident number stays the same? That's actually your socket number inside Mongrel2, so when it changes your browser reconnected. I think there's a socket leak somewhere, but I'll fix that soon. The key point is that your connections are maintained as long as possible for the most speed.
When I first got proxying going I had a bug because I wasn't maintaining connection state properly. If you went to the proxy test and then hit the file serve test page you would randomly get a page from the proxy backend. The reason was keep-alives. I was just doing simple proxying where once a request came in, I parsed the first HTTP header, figured out where it should go, and then held on for death shuttling everything between the browser and the backend.
That's just bad design because HTTP has many different states it has to be in and you need to switch between them without disrupting the browser's connection. If you proxy, and then a request comes in for some other resource, you can't use the proxy to get it, you have to get it separately. You also can't keep connections to proxies open forever, since you can overload them and cause problems. Browsers also flake out so you need to know exactly when to shut things down, and when not to send requests.
After thinking about it, I thought I'd try using a Ragel State Chart similar to how I did connection state with Utu back in the day. The way this works is instead of writing code with lots of if-statements for every possible edge case, you define three things:
Keeping control of complex state like in an HTTP connection is dead easy in a state machine, assuming you can get one defined. The problem with an FSM is that it forces you to sit down and really get what you want to happen straight. You can't half-ass an if-statement or switch because the FSM has to be complete or it won't work. This makes them harder to use at first, but much easier to deal with once you've defined them and simplified them.
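As a rough illustration of why a state machine has to be complete, here is a tiny table-driven one in Python 3. It is not Mongrel2's Ragel code: the `TRANSITIONS` table is a cut-down, hand-written version of the connection chart, and the `Routing` state and the `step` and `run` helpers are made up for this sketch.

```python
# Illustrative only: a minimal table-driven state machine.
# Keys are (current state, event); values are the next state.
TRANSITIONS = {
    ("start", "OPEN"): "Accepting",
    ("Accepting", "ACCEPT"): "Idle",
    ("Idle", "REQ_RECV"): "Routing",
    ("Idle", "CLOSE"): "final",
    ("Routing", "DIRECTORY"): "Responding",
    ("Responding", "RESP_SENT"): "Idle",
    ("Responding", "CLOSE"): "final",
}


def step(state, event):
    """Take one transition, or fail loudly if the chart does not define it."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError("no transition from %s on %s" % (state, event))


def run(events, state="start"):
    """Feed a sequence of events through the chart and return the end state."""
    for event in events:
        state = step(state, event)
    return state


print(run(["OPEN", "ACCEPT", "REQ_RECV", "DIRECTORY", "RESP_SENT", "CLOSE"]))
```

Any event the table doesn't mention is an immediate error rather than a silently ignored edge case, which is the property that forces you to think the whole thing through up front.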
Right away as I worked on this design it found bugs and gave me "magic" features for free. For example, because the connection state is designed so it continually processes requests and knows exactly when the socket is closed, Mongrel2 can do keep-alives even when the backend doesn't. This means a browser making an HTTP request doesn't have to care that it goes to a 0MQ handler, it just keeps the connection open.
When I wrote the HTTP->0MQ support, it sort of worked right away with keep-alives. A browser would connect, 0MQ would respond, and then the next request just kept using the same connection. That means... I got long polling for free. To test it out I did different tests where one browser tossed messages to others using the basic HTTP->0MQ support.
It was easy, and no special gear was needed, unlike with other implementations.
Debugging turned out to be simple too. The unit tests just look like this:
// Simulates doing a basic HTTP request then closing the connection.
RUN(http_dir,
    OPEN, ACCEPT,
    REQ_RECV, HTTP_REQ, DIRECTORY, RESP_SENT, CLOSE);

// Simulates two keep-alive handler requests then a close.
RUN(http_handler,
    OPEN, ACCEPT,
    REQ_RECV, HTTP_REQ, HANDLER, REQ_SENT,
    REQ_RECV, HTTP_REQ, HANDLER, REQ_SENT, CLOSE);
To make sure the FSM works, I just have tests that feed it the events for different situations.
Since the FSM can log every state transition and why it's doing what it does, I can also pinpoint failures and figure out what should happen next.
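Here's a hedged Python 3 sketch of what that style of testing amounts to: feed a named sequence of events through a chart, log every transition, and report where things went wrong. The `CHART` dictionary is a hand-simplified copy of part of the connection chart, and the `Identifying` state and the `run` helper are invented for this sketch, so treat it as a model of the idea rather than the actual test harness.

```python
# Illustrative model of event-sequence tests with a transition log.
CHART = {
    "start": {"OPEN": "Accepting"},
    "Accepting": {"ACCEPT": "Idle"},
    "Idle": {"REQ_RECV": "Identifying", "CLOSE": "final"},
    "Identifying": {"HTTP_REQ": "HTTPRouting"},   # hypothetical state name
    "HTTPRouting": {"HANDLER": "Queueing", "DIRECTORY": "Responding",
                    "CLOSE": "final"},
    "Queueing": {"REQ_SENT": "Idle"},
    "Responding": {"RESP_SENT": "Idle", "CLOSE": "final"},
}


def run(name, *events):
    """Feed events through CHART; return (passed, log of transitions)."""
    state, log = "start", []
    for event in events:
        nxt = CHART.get(state, {}).get(event)
        log.append((state, event, nxt))
        if nxt is None:
            print("%s FAILED: no transition from %s on %s" % (name, state, event))
            return False, log
        state = nxt
    return state == "final", log


ok_dir, _ = run("http_dir",
                "OPEN", "ACCEPT",
                "REQ_RECV", "HTTP_REQ", "DIRECTORY", "RESP_SENT", "CLOSE")
ok_handler, _ = run("http_handler",
                    "OPEN", "ACCEPT",
                    "REQ_RECV", "HTTP_REQ", "HANDLER", "REQ_SENT",
                    "REQ_RECV", "HTTP_REQ", "HANDLER", "REQ_SENT", "CLOSE")
print(ok_dir, ok_handler)
```

When a sequence fails, the last log entry names the state and the event that had nowhere to go, which is usually all you need to find the bug.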
Overall, getting this FSM right is much easier than using other methods, and should support future changes easily.
To give you an idea of how the Mongrel2 state machine works, here's how it runs one:
void Connection_task(void *v)
{
    Connection *conn = (Connection *)v;
    int i = 0;
    int next = 0;

    State_init(&conn->state, &CONN_ACTIONS);

    for(i = 0, next = OPEN; next != CLOSE; i++) {
        next = State_exec(&conn->state, next, (void *)conn);
        check(next > EVENT_START && next < EVENT_END, "!!! Invalid next event[%d]: %d", i, next);
    }

    State_exec(&conn->state, CLOSE, (void *)conn);
    return;

error:
    State_exec(&conn->state, CLOSE, (void *)conn);
    return;
}
This is the whole thing in C. All this does is initialize the state machine, and then loop getting events and feeding each one back into the state machine. Once it gets a 0 or a CLOSE event it drops out and finishes up.
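The interesting trick in that loop is that each call returns the next event to feed in. Here's a small Python 3 model of just that feedback loop. It leaves out the state tracking entirely, the callbacks in `make_actions` are invented stand-ins, and the `EVENT_START` and `EVENT_END` values are assumed bounds around the event numbers, so it only shows the control flow, not what Mongrel2 really does at each step.

```python
# Illustrative model of the "callback returns the next event" loop.
OPEN, ACCEPT, REQ_RECV, CLOSE = 110, 101, 113, 102
EVENT_START, EVENT_END = 100, 119   # assumed bounds, not from the source


def make_actions(requests):
    """Hypothetical callbacks; each one returns the next event to run."""
    pending = list(requests)

    def on_open(conn):
        return ACCEPT

    def on_accept(conn):
        return REQ_RECV if pending else CLOSE

    def on_request(conn):
        conn.append(pending.pop(0))          # "serve" one request
        return REQ_RECV if pending else CLOSE

    def on_close(conn):
        return 0

    return {OPEN: on_open, ACCEPT: on_accept,
            REQ_RECV: on_request, CLOSE: on_close}


def connection_task(actions, conn):
    """Run events until CLOSE comes back, then run CLOSE once."""
    event, steps = OPEN, 0
    while event != CLOSE:
        event = actions[event](conn)
        if not (EVENT_START < event < EVENT_END):
            break                            # plays the role of check()/error
        steps += 1
    actions[CLOSE](conn)
    return steps


served = []
connection_task(make_actions(["/a", "/b"]), served)
print(served)
```

Both the normal exit and the invalid-event exit end by running CLOSE, the same way both paths in the C function do.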
Of course what this is running is much bigger, but the end result wasn't that much bigger than the code when I started. It's around 800 lines of code:
$ wc -l src/connection.c src/state.rl src/state_machine.rl
     567 src/connection.c
     129 src/state.rl
      64 src/state_machine.rl
     760 total
It will get bigger, but the fact that this redesign is about the same as the previous design but has full keep-alives, correct proxying, directory serving, and functioning HTTP->0MQ while the other code could barely proxy is awesome.
Alright, so apparently I've got this thing I feed integers into and it does stuff. How does this impact a developer who has to use it? My design idea is that the state machine will help developers on three levels using Mongrel2:
Take a look at the image of the state machine which is generated by Ragel. Try to see if you can figure out what might happen at different conditions while Mongrel2 is running. Maybe you can see what happens when a connection to a proxy is interrupted by a request for a backend HANDLER?
Nobody is expecting you to refer to this diagram as you use Mongrel2, but imagine if you needed to find out why something is failing. Right now your web server is a total black box. You got no idea what's going on unless you turn on some obtuse insane logging mode.
This image is actually how the state machine flows, and it can be generated from the code so you know what's going on. The code however is also pretty readable and understandable, take a look:
Proxy := (
    start: (
        CONNECT @proxy_deliver -> Sending |
        FAILED @proxy_failed -> Closing
    ),

    Proxying: (
        HTTP_REQ @proxy_deliver -> Sending |
        PROXY @proxy_exit_routing |
        HANDLER @proxy_exit_routing |
        DIRECTORY @proxy_exit_routing |
        REMOTE_CLOSE @proxy_close -> Closing
    ),

    Sending: (
        REQ_SENT @proxy_parse -> Proxying |
        REMOTE_CLOSE @proxy_close -> Closing
    ),

    Closing: (
        CLOSE @proxy_exit_idle
    )
) <err(error);

Connection = (
    start: ( OPEN @open -> Accepting ),

    Accepting: ( ACCEPT @parse -> Idle ),

    Idle: (
        REQ_RECV @identify_request HTTP_REQ @route_request -> HTTPRouting |
        REQ_RECV @identify_request MSG_REQ @route_request -> MSGRouting |
        REQ_RECV @identify_request SOCKET_REQ @send_socket_response -> Responding |
        CLOSE @close -> final
    ),

    MSGRouting: ( HANDLER @msg_to_handler -> Queueing ),

    HTTPRouting: (
        HANDLER @http_to_handler -> Queueing |
        PROXY @http_to_proxy |
        DIRECTORY @http_to_directory -> Responding |
        CLOSE @close -> final
    ),

    Queueing: ( REQ_SENT @parse -> Idle ),

    Responding: (
        RESP_SENT @parse -> Idle |
        CLOSE @close -> final
    )
) %eof(finish) <err(error);
Despite the slightly odd syntax, hopefully you could figure out what causes transitions, what callbacks go off, and what states the machine can get in during processing.
The first advantage is very clear: It will be possible to give people direct control and fast debugging of connections based on the events that Mongrel2 is processing. For example:
Right now the Mongrel2 I'm running is actually logging the hell out of everything, and I can see what connections are in keep-alive, where they're proxied, who's closed, when they get closed, the works.
This design will also hopefully end the debates about how a web server should work. In other servers they have to incrementally tweak and bolt on features to support newly found edge cases. With this design, we'll be able to pinpoint changes based on the FSM, and figure out how to support new edge cases inside it, or even exactly why we should reject the identified edge case.
Because the event processing is just a sequence of integers, it'll be possible for you to manage Mongrel2's events on the fly in situations where you need emergency control. Imagine being able to say something along the lines of:
Crap! We're overloaded, filter all HTTP_REQ so that HANDLER, PROXY, and DIRECTORY are sent to the maintenance handler.
In current parlance that's setting up a maintenance page, but in other servers you have to configure some arbitrary file, then figure out the path magic that makes it happen, and then put the file there when you want, oh and on all the different servers.
With Mongrel2, I'm hoping you could just send out a command that says the above using just the event names, and have it affect all the machines.
There aren't very many events either, here's all of them so far:
ACCEPT=101,
CLOSE=102,
CONNECT=103,
DIRECTORY=104,
FAILED=105,
HANDLER=106,
HTTP_REQ=107,
MSG_REQ=108,
OPEN=110,
PROXY=111,
REMOTE_CLOSE=112,
REQ_RECV=113,
REQ_SENT=114,
RESP_SENT=116,
SOCKET_REQ=117,
TIMEOUT=118,
This is a bit of hand waving, since I'm not sure how the hell a command language with these would look, but I know they'd work great.
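Since this part is hand waving, here is an equally hand-wavy Python 3 sketch of what the maintenance filter could boil down to: a function that sees each routing decision and rewrites it. The `(event, target)` pairs, the `maintenance_filter` function, and the "maintenance" handler name are all hypothetical. Only the event numbers come from the list above.

```python
# Hypothetical sketch: reroute every routing event to a maintenance handler.
DIRECTORY, HANDLER, HTTP_REQ, PROXY = 104, 106, 107, 111

ROUTING_EVENTS = frozenset([HANDLER, PROXY, DIRECTORY])


def maintenance_filter(event, target):
    """Rewrite any routing decision into HANDLER -> 'maintenance'."""
    if event in ROUTING_EVENTS:
        return HANDLER, "maintenance"
    return event, target


# A made-up stream of (event, target) pairs as the server might produce them.
stream = [(HTTP_REQ, None), (PROXY, "backend1"), (DIRECTORY, "/static/")]
filtered = [maintenance_filter(event, target) for event, target in stream]
print(filtered)
```

The filter never touches the request itself, only the small integer that says where it goes next, which is why a command like the one above could plausibly be pushed to every machine at once.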
Imagine you've written a web framework, and it leaks memory like the Titanic leaks ice water. Imagine also that this web framework leaks memory because the language you chose to use is written by a bunch of hack hobbyists who had many bugs in their garbage collector and arrays, but denied there were any bugs at all. There was no way you'd ever fix this memory leak, and you can't tell everyone your super popular web framework is built on a sand pit, so you need a backup plan.
You need your web server to keep your framework alive using some kind of logic like this:
All joking aside, I'm aiming Mongrel2 at this kind of stupidity I call "the rails effect". This is where the arrogance of the framework requires that the operations people have to suffer through supporting whatever harebrained crap they think up to keep from having to actually fix their broken gear.
With Mongrel2, the plan is to allow you to create modules like say in Apache, but that these modules:
The idea is that you have the events mentioned above, so you can register callbacks that are fed the events before the state machine gets them. That'll let you alter them as you need. Maybe you want to authenticate people before they go to any proxies? Filter the PROXY event and change it to redirect if they haven't authenticated.
This idea is fairly powerful because the configuration could be dead simple. You just say what events the module gets, and maybe for what handlers. It is also a basic primitive for implementing other features like page caching, memcache caching, security, conditional serving, and all using one common concept: the event.
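A minimal Python 3 sketch of that idea, assuming a registry where a module says which events it wants and gets a chance to rewrite them before the state machine runs. Nothing here exists in Mongrel2 yet: `FilterRegistry`, `require_auth`, the `conn` dictionary, and the `REDIRECT` event number are all invented for illustration.

```python
# Hypothetical sketch of per-event filters that run before the state machine.
HANDLER, PROXY = 106, 111
REDIRECT = 901   # made-up event; not in the real event list


class FilterRegistry:
    """Maps events to callbacks that may replace the event."""

    def __init__(self):
        self._filters = {}

    def register(self, events, callback):
        for event in events:
            self._filters.setdefault(event, []).append(callback)

    def apply(self, event, conn):
        """Return the event the state machine should actually see."""
        for callback in self._filters.get(event, ()):
            replacement = callback(event, conn)
            if replacement != event:
                return replacement       # first filter to change it wins
        return event


def require_auth(event, conn):
    """Let authenticated connections through; redirect everyone else."""
    return event if conn.get("authenticated") else REDIRECT


registry = FilterRegistry()
registry.register([PROXY], require_auth)

print(registry.apply(PROXY, {"authenticated": False}))
print(registry.apply(PROXY, {"authenticated": True}))
```

The configuration for a module like this really would be just the list of events passed to `register`, which is what makes the approach attractive.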
Events are nice and all, but you might also need to have things that happen deeper in the server, like you want to change out the protocol used to talk to Handlers because you like BIRT. Well, here's all of the callbacks the state machine uses:
StateActions CONN_ACTIONS = {
    .open = connection_open,
    .error = connection_error,
    .finish = connection_finish,
    .close = connection_close,
    .parse = connection_parse,
    .identify_request = connection_identify_request,
    .route_request = connection_route_request,
    .send_socket_response = connection_send_socket_response,
    .msg_to_handler = connection_msg_to_handler,
    .http_to_handler = connection_http_to_handler,
    .http_to_proxy = connection_http_to_proxy,
    .http_to_directory = connection_http_to_directory,
    .proxy_deliver = connection_proxy_deliver,
    .proxy_failed = connection_proxy_failed,
    .proxy_parse = connection_proxy_parse,
    .proxy_close = connection_proxy_close
};
It would be possible to let special modules "wrap" these calls, either altering them directly, or completely swapping them out. They all have a consistent call structure and are mostly fairly small, averaging about 10-20 lines of C code. With this you could inject SSL encryption at certain stages, secure wipes, client certificate checks, whatever you need to control the server.
However, all of this could also be safer than in other web servers because the state machine is completely defined. You'd know right away when your module caused problems, and why it fails, and assuming you didn't nuke the process, Mongrel2 could keep on trucking. Especially if the module is using 0MQ to do unix socket communication.
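To show what "wrapping" a callback might look like, here is a Python 3 sketch that models the actions table as a dictionary and swaps one entry for a wrapper that calls through to the original. The stub callbacks, `wrap_action`, and `audit` are all stand-ins made up for this example. The real callbacks are C functions and no module API exists yet.

```python
# Hypothetical sketch: wrap one callback in an actions table.

def connection_parse(conn):
    """Stand-in for the real callback; marks the connection as parsed."""
    conn["parsed"] = True
    return "REQ_RECV"


def connection_close(conn):
    """Stand-in for the real callback; marks the connection as closed."""
    conn["open"] = False
    return "CLOSE"


CONN_ACTIONS = {"parse": connection_parse, "close": connection_close}


def wrap_action(actions, name, wrapper):
    """Return a copy of the table with actions[name] routed through wrapper."""
    original = actions[name]

    def wrapped(conn):
        return wrapper(original, conn)

    patched = dict(actions)      # leave the original table untouched
    patched[name] = wrapped
    return patched


calls = []


def audit(original, conn):
    """Example wrapper: record the call, then run the original callback."""
    calls.append(original.__name__)
    return original(conn)


actions = wrap_action(CONN_ACTIONS, "parse", audit)
conn = {"open": True}
result = actions["parse"](conn)
print(result, calls)
```

Because the wrapper gets the original callback handed to it, a module can run code before it, after it, or skip it entirely, and the unpatched table is still around if the module has to be pulled out.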
The gist of it all is that by having Mongrel2's connection state and main processing easily controlled by a few callbacks and some events, I can expose that safely to deployments that need it.
This new design is showing a lot of promise, but I need to cycle on it and see if I can simplify it more while adding the remaining features. I also need to figure out the module system and exactly how you'd configure them in the sqlite3 database. The nice thing is configuration is mostly just using the events so it'll be cake. The bad thing is I'm not sure how to make these events usable yet, and maybe I'll just punt for a bit.
I also need to work on the generic unixy things like forking, daemonizing, chroot, etc. That's fairly easy, but it's getting close to the time when I'll need it.
Finally, I gotta work on a Python driver for this so that I can quit writing this hack job Python code to make it work.
If you've got comments, shoot me an email or come hang out in #mongrel2 on irc.freenode.org (since that will probably stay up after the chat demo dies).