AMQP Lulz

created: 20 April 2011
TAGS: notcooldude

When the AMQP working group publishes a final AMQP/1.0, in 2011 or 2012 or whenever, you may get a shock if you've been using AMQP up to now.

Your familiar model of exchanges, queues, and bindings is gone. In its place are "links". You will need to rewrite your code. If you wrote an API stack, throw it out and start again. If you wrote an AMQP tutorial, start again. If you wrote any code that uses the old AMQP model, throw it out and start again.

When JPMorganChase and RedHat introduced their new proposals for AMQP/1.0 in September 2008, it was without consensus from other participants on the problems, and without running code. iMatix and others argued that it was a bad idea to throw out the exchange-queue-binding model. We argued that too many people (such as iMatix and all our users) would have to rewrite too much code. We had already pointed to many things to fix in AMQP, such as the binary command framing, which ensured fragmentation of the protocol. These initial AMQP/1.0 discussions were conducted off-list, the rationale being that the AMQP/1.0 draft was private and could not be discussed in public. Only later were we able to discuss aspects of this development in public.

So I can only point you to discussions in January 2009 where participants were still asking "what were the use cases that made refactoring the exchange/binding/queue model necessary?" We didn't know what was broken about AMQP/0.9.1; those pushing AMQP/1.0 never clearly explained that.

You still will not find a historical document that clearly identifies the issues with AMQP/0.9.1 or earlier versions, and why AMQP/1.0 is the best solution to those issues.

In May 2010, however, a public email from John O'Hara, chair of the working group and initial author of the AMQP/1.0 design, finally explained what happened (note: this email thread has since been purged from the archives). What happened was this: servers at JPMorganChase were crashing (or getting very hard to manage) due to slow subscribers. JPMC and RedHat sat down and figured out a way to fix this. Their fix was to throw out the old model and make a new one.

It comes down to this:

Links seem very strange coming from an Exchange/Binding point of view.
So, why do Links exist at all?
They arise from a meeting back in JPM in London between Rob, Rafi, me and Carl in a dark room in the bowels of 60 Victoria Embankment.
Links were created to allow a declarative solution to how events should be delivered to multiple consumers while *admitting* (to use your parlance) multiple approaches to buffering pending messages for slower consumers.
To do pubsub in 0-91, Exchanges would route a copy of each message (pointer in reality) into a private reply queue for each subscribed client. The private reply queue acted as the buffer where pending messages were stored on the broker prior to the client picking them up. If there were 1000 subscribers on an exchange, there were 1000 reply queues. This works fine where consumers keep up with the message flow.
However, deployment experience at Red Hat and JPMorgan noted an annoying issue with this. If there were a lot of slow subscribers, a lot of state was built up on the broker. Using pointers to messages in the implementation, memory is not really an issue. But operational staff had complained that they had no way of looking at the "topic" and seeing relatively where each client was in catching up to the event stream. For these situations the users wanted to see a single topic buffer with multiple cursors onto it, one for each subscriber. Then operational staff can see where each subscriber is relative to the size of the event stream.
So, we had a quandary. Both approaches, i) an exchange routing messages into multiple output buffers and ii) a single buffer with multiple pointers, are completely valid. Both can be desirable depending on the application, and both are semantically identical.
The Exchange/Binding language we had in 0-91 pretty much forced implementors to implement i). So we sought a more declarative way of expressing "client wants selective copy of data from source" — which is at the heart of what a "Link" is. Now an implementation can choose to fulfil that request by translating it into exchanges and bindings, or it can do it with a single queue with multiple tail pointers. It's kind of similar to how "SELECT foo FROM bar WHERE baz" doesn't tell you how a database will process the search, or even if it's coming from a real table.

That's it.
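To make the quandary concrete, here is a minimal, broker-free Python sketch of the two buffering strategies John describes. The sketch is mine and the names are illustrative; the point is that routing a copy into a private queue per subscriber, and keeping one shared buffer with a cursor per subscriber, deliver exactly the same messages and differ only in broker-side bookkeeping.

    from collections import deque

    # Approach (i): the exchange routes a copy of each message into a
    # private queue per subscriber; per-subscriber state on the broker.
    class PerSubscriberQueues:
        def __init__(self, subscribers):
            self.queues = {s: deque() for s in subscribers}

        def publish(self, msg):
            for q in self.queues.values():
                q.append(msg)            # one copy (a pointer, in reality)

        def fetch(self, subscriber):
            return self.queues[subscriber].popleft()

    # Approach (ii): one shared buffer, one cursor per subscriber;
    # operators can see each subscriber's position in the stream.
    class SharedBufferWithCursors:
        def __init__(self, subscribers):
            self.buffer = []
            self.cursors = {s: 0 for s in subscribers}

        def publish(self, msg):
            self.buffer.append(msg)      # stored once, whatever the fan-out

        def fetch(self, subscriber):
            msg = self.buffer[self.cursors[subscriber]]
            self.cursors[subscriber] += 1
            return msg

    # Both give the same observable behaviour:
    for Broker in (PerSubscriberQueues, SharedBufferWithCursors):
        broker = Broker(["fast", "slow"])
        for i in range(3):
            broker.publish("tick-%d" % i)
        assert broker.fetch("fast") == broker.fetch("slow") == "tick-0"

Either way, it is a broker implementation detail; nothing in it calls for a new wire protocol.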

An implementation issue (how to stop servers running out of memory and crashing) will force every single AMQP user and developer to throw away their code and start again.

Think about this for a second. I'm not going to rant about how programmers should not be trying to write protocols. I'll just note that the problem is how to handle slow consumers, and the solution is "throw away AMQP and start again".

Yes, AMQP/1.0 makes more changes than this, but that's the one which will hurt everyone the most.

I've no idea what was going through John's mind when he made this decree. Engineering for its own sake, maybe. As for RedHat, their approach to AMQP has been consistent from the start: change, command, control. RedHat want to own AMQP and the AMQP market, and forcing through radical, catastrophic changes to the protocol is an excellent way to get there. RedHat's patenting of AMQP aspects such as XML-based routing is consistent with this long-term strategy. No-one in that room defended the investment in AMQP/0.8 (which became AMQP/0.9.1), because that modest spec stood in the way of The Ultimate and Glorious Protocol (That We Shall Get Bloody Rich From).

Ironically, in January 2009 we had also pointed to our solution to exactly the same problem. We published that solution, called Direct Mode, in December 2008, on the AMQP wiki. When your server crashes because it holds too many messages for slow subscribers, don't change your protocol. Just push those messages out to the subscribers. Don't hold private queues on central servers! It's not rocket science; it's basic message-queueing theory: queue as close as you can to the consumer. You only need to hold shared queues centrally, and those are fine at handling slow consumers; that's what they're made for.

Direct Mode made our OpenAMQ broker run four times faster as well, by allowing a simpler message envelope and aggressive batching. We rolled this out at Dow Jones & Company, where OpenAMQ happily pushes data around for the Dow Jones Industrial Average and other DJ indices. Direct Mode does not require a new AMQP; it is a simple extension protocol that's fully backwards compatible.

But that's kind of history for us. We decided some time ago that AMQP was sinking and unsalvageable. Sure, some large firms will use it, but the open source community will not forgive the insult of random protocol changes happening for no good reason. iMatix is, in this respect, just a classic open source team: small, meritocratic, and with a long memory. And 0MQ is just much simpler and, in the end, does much the same work as AMQP. Only faster, and with less stress. Of course 0MQ puts queues right beside the consumer, where they belong, nicely fed by asynchronous I/O.
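For comparison, here is about the smallest possible 0MQ pub-sub pair, using the pyzmq bindings (the endpoint, topic, and price are illustrative). There is no broker, no exchange, no binding; a slow subscriber's backlog queues up in the subscriber's own process.

    import time
    import zmq

    ctx = zmq.Context()

    # Publisher: binds and sends; it holds no per-subscriber queues.
    pub = ctx.socket(zmq.PUB)
    pub.bind("tcp://127.0.0.1:5556")

    # Subscriber: connects and filters by prefix; pending messages
    # buffer here, right beside the consumer.
    sub = ctx.socket(zmq.SUB)
    sub.connect("tcp://127.0.0.1:5556")
    sub.setsockopt_string(zmq.SUBSCRIBE, "DJIA")

    time.sleep(0.2)                      # let the subscription propagate

    pub.send_string("DJIA 12380.05")
    print(sub.recv_string())             # prints: DJIA 12380.05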

So if you're using AMQP today, take a peek at the latest AMQP/1.0 draft, and ask the AMQP working group to justify the changes. Ask this simple question: why?

Then, when you get an answer that makes your head spin, or you get no answer at all, hop over to http://www.zeromq.org and see what messaging can look like when it's done right.
