Friday, February 24, 2012

WebLogic JMS Performance Tuning Series, Part 3: Consumer-Side Message Pipelining


Following Part 1 and Part 2 of this series, this entry is more in line with Part 2 as it is a setting that is particularly useful in lower Quality-of-Service, higher volume environments.
Reusing the diagram from Part 2, consider the basic structure of the average message send():
For a moment, let’s consider what it would look like if the WebLogic JMS server had to wait for a consumer that does not allow a backlog, and confirm the receipt of each message individually.
Note that, for both participants, the waiting constitutes a significant portion (if not the majority) of time. The longer the roundtrip, the more pronounced this effect is. This results in a low utilization of both the consumer and the JMS server. We frequently don’t want this behavior – it’s slower, and the usual manner of compensating for it is to add more consumer threads.
Consider the one-way sends from Part 2: The underlying trick is to remove the waiting, in favor of accepting a diminishment of message quality of service. Message pipelining (called “Maximum Messages per Session” in the WebLogic Administration Console) setting is a JMS configuration for the messaging consumer’s connection factory that is somewhat similar, as it may require a tradeoff for added performance.

Turn on the Speed: MessagesMaximum

Maximum messages per session (MessagesMaximum for short or in the configuration XML file), like many phrases or words with “Max” in them, can make a pretty spectacular performance gain in certain scenarios and is always welcome at the best of parties.
You can find this setting in the connection factory “Client” tab in the WebLogic Administration Console.
One of the interesting things about message pipelining is that the setting works a little differently based on your JMS client type. The main purpose of message pipelining is to lower the amount of time the client spends waiting, and increase the ratio of time that is spent by the JMS server transmitting messages. The message pipeline (also referred to as a “message backlog”) is created by sending more than one message to a consumer prior to receiving an acknowledgement.

Case 1: Asynchronous Client

If you’re using an MDB or otherwise using a client with an onMessage() method and implementing a MessageListener interface, you are using an asynchronous client. When you are using an asynchronous client, the “Maximum Messages per Session” setting applies to the message pipeline on the consumer side.
The message pipeline is in effect when the consumers unable to take messages off of the destination of the JMS server as fast as the messages are put there by the producers. Until production is faster than consumption, individual messages are received by the consumer from the JMS server in a two-way send. When production outpaces consumption, messages begin to be sent in batches to available, asynchronous consumer sessions.

The batch style of message delivery from the messaging server provides both the performance benefit of lowering the number of two-way sends and generally having messages more immediately available for consumption by the onMessage() method.
A potential downside is that the message pipeline very clearly affects memory consumption on the JMS consumer side, so getting optimal performance with this setting may be a balancing act if heap consumption becomes a concern on the consumer. If the pipeline is too large, you might wind up with one consumer overwhelmed by a huge backlog of messages when the other consumers are doing nothing.
There are a few behaviors to consider prior to implementing message pipelines for asynchronous consumers:
  • Messages in the pipeline will not be in the destination’s configured sort order. This isn’t surprising – if the messages have already left the server, the server isn’t going to be sorting messages that are now on the client. The messages are sorted, however, prior to being sent in batch to the client.
  • The message pipeline is sometimes sent as a single T3 message, which makes it easier to go over the MaxT3MessageSize. Generally this is more of a concern with larger messages (> 1MB), but it depends on your pipeline size setting and the average message size.

Case 2: Synchronous Client & Prefetch Mode

If you are receiving messages with receive() (or receive(long timeout), receiveNoWait()), you’re receiving synchronously. The consumer makes a two-way call to the JMS server to see if there is a message available, and retrieves it, if possible – it’s a polling behavior. If there is no message available, the call’s thread blocks for the specified time, waiting for the next message on the destination to arrive.
This is the behavior for synchronous clients unless “Prefetch Mode for Synchronous Consumers” is enabled. You can find this in the Administration Console, under your Connection Factory settings in the “Client” tab.
Like the asynchronous client message pipelining behavior, synchronous clients with Prefetch Mode enabled receive batches of messages when the client invokes the receive() method. The number specified in “Maximum Messages per Session” will apply here, as well. Despite the batches of messages that are sent to your client JVM, the receive() method returns messages individually. De-batching takes place in the code provided in the WebLogic client libraries – so no de-batching is needed in the user-written consumer / subscriber code.
As with the asynchronous client, performance improves if pipelining results in the consumer spending less time waiting and more time processing. There is also the added benefit that, since the consumer generally receives more than one (possibly many more) message per polling attempt, which reduces the amount of polling necessary and, therefore, overall network traffic. Overall, the trend is towards higher consumer utilization.
Pipelining works differently with user transactions (XA). It also behaves differently when more than one consumer shares the same session. I invite you to read the docs on this. They state that User Transactions (XA) will either silently ignore the Prefetch Mode setting, or the consumer will fail to retrieve the message and generate an exception (the same applies to multiple consumers on the same session). The docs didn’t clarify this adequately for my purposes, so I will expound a bit after having experimented with this on WebLogic. Keep in mind that these are just my findings, and not official aims or requirements of the product.

Synchronous Clients, User Transactions and Session Sharing

WebLogic implicitly disables Prefetch Mode / pipelining for the rest of the session when:
  1. Using an XA-enabled connection factory, the first receive() on a non-transacted session is a part of a User Transaction (XA).
  2. Multiple consumers are created in the same session prior to calling the first receive().
Otherwise, pipelining for a synchronous client is enabled for the rest of the session upon the first receive() when there is a single consumer on the session, presuming Prefetch Mode is enabled in the connection factory settings.
Knowing when pipelining is enabled or disabled is imperative to understand what conditions produce exceptions. If pipelining is already enabled for a session, and you perform one of the two conditions that would have caused it to be disabled (back when the session was newly created), you’ll get an exception.

When Do I Use it and How?

I think the question is not so much, “When do I use it?” as much as it is “When do I turn it off?” The performance advantage, presuming adequate producer-side performance, is significant. Presuming you don’t have a strict need involving message sorting, there isn’t much downside as long as you are using asynchronous consumers. Even using synchronous consumers, where transactions or consumer session sharing *might* (but shouldn’t, if you’ve read this blog) impact your usage.
The primary questions to ask on whether or not to enable pipelining, in general, are:
  • Is the utilization on my consumers currently low? Am I currently creating extra consumers to compensate for consumption rate in the presence of the low client utilization?
  • Are the JMS producers getting throttled or otherwise hitting quota because message consumption isn’t happening fast enough?
  • Are my messages small? Or are my messages very large? You may gain little to no advantage from enabling message pipelines with larger messages. Grouping large messages in a batch has some pretty negative consequences, and generally makes no sense. Think of it this way: Do you think receiving an acknowledgement is the time-consuming part of transmitting a 3 megabyte message? An average message size over 100 kB should be an indicator that this setting may possess less value for you.
  • Is throughput less of a consideration than latency? If so, batching messages together may make less sense than immediate sends. You may alternately benefit from simply keeping the number of pipelined messages low, in this case.
Message pipelines are turned on and set to 10 messages per session by default. This can be a conservative setting in some scenarios, and can frequently be set higher than a few hundred if the average message size is sufficiently low. You can explicitly turn it off, by setting it to “1”.
There is no way to guess at a generalized, ideal messages-per-session setting – except on a case-by-case basis. The following questions should be answered in order to guess at the initial setting (and you can tune from there):
  • What is the expected average message size?
  • What is the expected quantity of messages of the expected average size that the consumers can support?
  • What is the expected round trip time between the JMS server and the consumer? The smaller the round trip time, the less potential advantage there is in setting the messages per session at a higher number.
Fortunately, other than these considerations, MessagesMaximum is a relatively straightforward choice – there isn’t a special cluster consideration as with one-way sends.

Case Study

I’m simply going use the scenario from Part 1 and Part 2, and add on to it. To recap, I started with an out-of-the box configuration of a WebLogic JMS server (“Base” in the graph), and used the IBM Performance Harness producer and consumer for simulation. The producer threads are set to fire out as many non-persistent, non-transactional 1 KB messages at a JMS topic as the JMS server will take. The single-threaded, synchronous consumer was set to AUTO_ACKNOWLEDGE.
Adding quotas and quota-blocking sends (“Quotas Only” in the graph) reduced the large standard deviation in message rate caused the WebLogic JMS servers holding onto more messages than could be delivered, and paging the messages to disk. It also increased overall performance considerably.
Adding one-way sends (“One-Way Sends Enabled” in the graph) reduced the number of producer threads necessary to reach the level of performance seen in the Quotas test run.
Taking this configuration, and enabling “Prefetch Mode” (because I’m using a synchronous consumer) and setting “Maximum Messages per Session” to 300 (which is a guess, based on the size of the message, the presence of only one consumer, and the short round-trip time). Like my other efforts, there’s not a lot of purpose in perfecting these settings on my local machine, so I’m not concerned with ideal settings so much as illustrating the principle.
The hypothesis from the last blog entry (that message consumption was the bottleneck) seems accurate. Altering MessagesMaximum and enabling Prefetch Mode caused a fairly linear scaling for the producers up to 4 producers. Here, the scaling stops because the single consumer thread has become saturated. We can be confident this is true because: 1) UNIX top reports the thread is utilizing 100% of a CPU core, and 2) Adding a second subscriber to the topic doesn’t cause the message rate to alter significantly (each subscriber is getting > 90k messages per second, although this is not displayed in the graph).

Final Thoughts

There are fewer reasons not to use message pipelines (MessagesMaximum / Prefetch) than there are with One-Way Sends. Message backlog is valuable when: 1) Utilization is low in your consumers and producers are waiting (either due to quotas or throttling), 2) Message size trends towards smaller messages, and 3) You are willing to accept the transaction and message ordering caveats. As presented, performance can be dramatically improved (more than 4x in this case), and is quite simple to configure.