Thursday, November 17, 2011

JMS Performance Tuning Series, Part 1: The relationship between production, consumption, and quotas


After presenting at Oracle OpenWorld on WebLogic JMS performance tuning, I began to think that some additional information would be welcome in blog format.  This entry kicks off a series of posts on JMS performance, starting with the basics and mandatory settings (as in, “You need to set this to ensure your system is available”) and eventually moving on to more advanced topics (as in, “You just might need to set this, depending on what you are doing”).

The Basics

Java Message Service as a specification is explained here, but I think a couple of basic items are needed to proceed in this blog.  First, let’s consider that in every working JMS scenario there is at least one producer sending messages to the JMS Server, and at least one consumer taking messages off of the JMS Server. 
Ideally, the consumer’s capacity to process messages is limitless, and the JMS server can receive messages and hand them off to the consumer at least as fast as the producer sends them.  In reality, this is hardly ever the case.  Both the rate at which producers create messages and the rate at which consumers can process them fluctuate over time.
When the consumers are unable to keep up with the producers, things get somewhat complicated, because the JMS Server is left holding the message “surplus.”  If the surplus continues to build, a number of physical limitations start asserting themselves.
From the JMS Server’s standpoint, managing the messages in this “surplus” becomes more difficult as the surplus grows.  The first line of storage is the JVM heap, but eventually the backlog may outgrow what we want stored in the heap.  WebLogic Server helps manage this backlog by paging the messages out to a messaging store, but generally we don’t want to see the stored messages grow to unmanageable sizes.  Further, paging only helps in scenarios where we are experiencing a fairly short-lived surge in messages and will eventually catch up in terms of message processing.  Imagine having 100,000 messages arriving per second, and your message store has 2 GB of backlogged messages waiting for consumption.  When will they be consumed?  How many of them are still relevant / valid?  When will you run out of storage space for backlogged messages?  What if your disks cannot write fast enough to keep up with the messages being paged to them?
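To put rough numbers on the "When will they be consumed?" question, here is a minimal back-of-the-envelope sketch. The 2 GB backlog and the 100,000 messages per second come from the scenario above; the 1 KB average message size, the consumer rates, and the `BacklogEstimate` class itself are assumptions made up for illustration.

```java
/** Back-of-the-envelope backlog math; every input here is an illustrative assumption. */
final class BacklogEstimate {

    /** Converts a backlog measured in bytes to a message count, given an assumed average message size. */
    static long backlogMessages(long backlogBytes, long avgMessageBytes) {
        if (avgMessageBytes <= 0) {
            throw new IllegalArgumentException("avgMessageBytes must be positive");
        }
        return backlogBytes / avgMessageBytes;
    }

    /**
     * Seconds needed to drain the backlog while producers keep sending.
     * Returns positive infinity when consumers are not faster than producers,
     * because the backlog then never shrinks.
     */
    static double secondsToDrain(long backlogMessages, double producedPerSecond, double consumedPerSecond) {
        double netDrainPerSecond = consumedPerSecond - producedPerSecond;
        if (netDrainPerSecond <= 0) {
            return Double.POSITIVE_INFINITY;
        }
        return backlogMessages / netDrainPerSecond;
    }

    public static void main(String[] args) {
        long backlogBytes = 2L * 1024 * 1024 * 1024; // the 2 GB backlog from the text
        long avgMessageBytes = 1024;                 // assumption: 1 KB per message
        long messages = backlogMessages(backlogBytes, avgMessageBytes);

        // Producers send 100,000 msg/s (from the text); the consumer rates are assumptions.
        System.out.println("Consumers at 110,000 msg/s: "
                + secondsToDrain(messages, 100_000, 110_000) + " s");
        System.out.println("Consumers at 100,000 msg/s: "
                + secondsToDrain(messages, 100_000, 100_000) + " s");
    }
}
```

Under these assumed numbers, a consumer that is 10% faster than the producer still needs about three and a half minutes to clear the backlog, and a consumer that merely matches the producer never clears it at all.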

Enter Quotas…

Quotas allow us to set the maximum message number or maximum total size in bytes that our JMS server or destination will allow.  Set both of them.  Really, you should do it.  Quotas define, in effect, what the server or destination is willing to hold before refusing to take more messages from the producers.  In WebLogic 11gR1 PS4, you can find Quota settings in the Administration console, in MY_DOMAIN_NAME->Services->Messaging->JMS Servers->MY_JMS_SERVER_NAME under the “Thresholds and Quota” tab.
Let’s take a look at the settings provided here:
Neither of these settings stops whatever is creating the messages (e.g., stock purchases); they just prevent the producers from swamping the JMS server by defining when the server will stop accepting new messages.  This is a good thing.  If we let the server be inundated to the point of instability by not setting quotas, periods of high volume can bring down our server (which is always bad).
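The idea of a quota can be sketched in a few lines of plain Java. This is not how WebLogic implements quotas, and `QuotaGate` and its methods are hypothetical names invented for this illustration; it only shows the rule described above, where a message is refused as soon as either the message-count limit or the byte limit would be exceeded.

```java
/** Toy model of a quota: a message-count limit and a byte limit, checked together. */
final class QuotaGate {
    private final long maxMessages;
    private final long maxBytes;
    private long messages; // messages currently held
    private long bytes;    // bytes currently held

    QuotaGate(long maxMessages, long maxBytes) {
        this.maxMessages = maxMessages;
        this.maxBytes = maxBytes;
    }

    /** Accepts the message only if neither limit would be exceeded; otherwise refuses it. */
    synchronized boolean tryAccept(long messageBytes) {
        if (messages + 1 > maxMessages || bytes + messageBytes > maxBytes) {
            return false; // over quota: the producer is told "no"
        }
        messages++;
        bytes += messageBytes;
        return true;
    }

    /** Called when a consumer takes a message, which frees up quota. */
    synchronized void consumed(long messageBytes) {
        messages--;
        bytes -= messageBytes;
    }

    public static void main(String[] args) {
        QuotaGate gate = new QuotaGate(2, 100); // assumed limits: 2 messages, 100 bytes
        System.out.println("60-byte message accepted: " + gate.tryAccept(60));
        System.out.println("second 60-byte message accepted: " + gate.tryAccept(60));
        gate.consumed(60);
        System.out.println("60-byte message accepted after consumption: " + gate.tryAccept(60));
    }
}
```

Note that the producer keeps producing either way; the quota only decides whether the server takes on more work.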
Now, there’s always the question, “How should I size my quota?”  Good question (I’m glad I asked it), and one I can’t answer entirely in this blog.  Among other concerns, this really depends on:
  • Whether the WebLogic Server is dedicated to JMS.
  • The expected heap size.
  • Your tolerance for latency (larger quotas tend to mean garbage collection will take longer).
  • The expected variance in producer send rate.
So let me leave you with this thought: What is your expected fluctuation rate with your producers?  Given that the larger the quota is, the more you open yourself up to diminished transactions per second (due to GC, paging, etc.), what is the smallest quota you can set and still expect to fulfill your requirements?  Keep in mind that 1/3 of the heap is a not-unreasonable quota total for all of the JMS servers present.
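As a starting point for that sizing exercise, the one-third-of-heap rule of thumb is simple arithmetic. The sketch below assumes the total is split evenly across the JMS servers in the JVM, which is my simplification rather than a recommendation; `QuotaSizing` and its methods are hypothetical names.

```java
/** Rule-of-thumb quota arithmetic: one third of the heap, shared by all JMS servers. */
final class QuotaSizing {

    /** Upper bound for the combined byte quota of all JMS servers in one JVM. */
    static long totalQuotaBytes(long maxHeapBytes) {
        return maxHeapBytes / 3;
    }

    /** Even split of the total across JMS servers (an assumption; uneven splits are equally valid). */
    static long perServerQuotaBytes(long maxHeapBytes, int jmsServers) {
        if (jmsServers <= 0) {
            throw new IllegalArgumentException("jmsServers must be positive");
        }
        return totalQuotaBytes(maxHeapBytes) / jmsServers;
    }

    public static void main(String[] args) {
        // maxMemory() reports the heap ceiling of the JVM running this sketch,
        // not of your WebLogic Server; substitute the server's configured maximum heap.
        long heap = Runtime.getRuntime().maxMemory();
        int jmsServers = 2; // assumption for illustration
        System.out.println("Heap ceiling (bytes): " + heap);
        System.out.println("Per-server byte quota ceiling: " + perServerQuotaBytes(heap, jmsServers));
    }
}
```

Treat the result as a ceiling to tune downward from, since the question above is how small the quota can be while still meeting your requirements.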

Send Timeouts / Quota Blocking Sends

Once a specified quota condition has been reached, the WLS JMS server will start rejecting new messages from producers until the backlog is back under the quota.  We can supplement the quota settings by altering the Send Timeout on the Connection Factory (it’s under the “Default Delivery” tab) – this will tell the producer how long to wait on the destination or JMS server prior to timing out on its send() operation.  This has the effect of causing the producer thread to block for up to the length of time specified, waiting for the message backlog to fall under the quota condition.  This is particularly useful if there are multiple producers.
From a programmatic standpoint, this is the length of time your producer thread will wait before it comes back with a timeout exception.  This has the net effect of slowing down the producers when you’ve exceeded your quota.  Do not mistake this mechanism for throttling.  Think of this as your insurance measure to guarantee your server is up, responsive, and continuing to provide messages to the consumer.  Quota blocking sends add a level of resiliency or forgiveness to the system. They enable the send operation to complete if space opens up during the wait time, which means that the application doesn’t need to handle a resource exception the moment the quota is exceeded, especially if space is freed up quickly.
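The blocking behavior described above can be imitated with a bounded queue from `java.util.concurrent`. This is an analogy only: in WebLogic the timeout is configured on the connection factory and the producer simply calls send(), whereas here `BlockingSendSketch`, its `send` method, and the boolean return value are all hypothetical stand-ins. The queue capacity plays the role of the message quota.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Stand-in for a destination with a message quota and a send timeout (not a WebLogic API). */
final class BlockingSendSketch {
    private final BlockingQueue<String> backlog;
    private final long sendTimeoutMillis;

    BlockingSendSketch(int messageQuota, long sendTimeoutMillis) {
        this.backlog = new LinkedBlockingQueue<>(messageQuota);
        this.sendTimeoutMillis = sendTimeoutMillis;
    }

    /**
     * Blocks for up to the send timeout while the backlog is at quota.
     * Returns true if space opened up in time, false if the wait timed out
     * (where a real producer would see an exception instead).
     */
    boolean send(String message) {
        try {
            return backlog.offer(message, sendTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Consumer side: takes one message if available, freeing quota. */
    String receive() {
        return backlog.poll();
    }

    public static void main(String[] args) {
        BlockingSendSketch destination = new BlockingSendSketch(1, 500); // quota 1, timeout 500 ms
        System.out.println("first send accepted: " + destination.send("m1"));

        // A consumer frees space shortly after the producer starts waiting.
        Thread consumer = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            destination.receive();
        });
        consumer.start();
        System.out.println("second send accepted after waiting: " + destination.send("m2"));

        // Nobody consumes now, so this send waits out the full timeout.
        System.out.println("third send accepted: " + destination.send("m3"));
    }
}
```

The second send succeeds without any error handling because space opened up during the wait, which is the "forgiveness" that quota blocking sends provide; the third shows what the producer still has to handle once the timeout expires.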
How should you determine your send timeout?  Again, it depends.  After what time does your message lose its value?  What happens if your application holds onto a message for a longer period of time: does it continue to generate or queue up more messages while waiting?  For that matter, which is the more desirable behavior: rolling back a transaction, or waiting?  I tend to keep my send timeouts under one second as a rule of thumb, but your mileage may vary.

Final Words on Quotas

I don’t often say “always configure x this way,” but always, always, always configure message quotas on WebLogic JMS.
Take a look at this graph, which shows the average message rates over an 8-minute period for a sample JMS server under heavy load with no quotas set:

This graph shows results from a performance harness that floods the server with as many messages as possible. This isn’t exactly a real-world scenario, but it helps illustrate my point. Can you see how the standard deviation becomes more pronounced as more producers are added?  This is not good: the standard deviation is larger than the average message rate itself, which means we simply don’t have predictable performance or availability. The server is overloaded. It is doing everything it can to keep up with the unyielding stream of messages by paging to disk, which also results in frequent full garbage collections.
Now take a look at what happens when we add a quota of a size (100 MB) that I chose without research or tuning (i.e., there could be better or worse quota sizes, but this isn’t the point):

Note the disparity: not only is the message rate higher, but the standard deviation is no longer visible.  Performance improves because the quota has prevented the server from being overwhelmed.
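If you want to apply the same "standard deviation versus average rate" check to your own measurements, a minimal sketch might look like the following. The sample values in `main` are made up for illustration and are not taken from the graphs; `RateStability` is a hypothetical helper, and it uses the population standard deviation.

```java
/** Compares the spread of message-rate samples to their average. */
final class RateStability {

    static double mean(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("need at least one sample");
        }
        double sum = 0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Population standard deviation of the samples. */
    static double stdDev(double[] samples) {
        double m = mean(samples);
        double squaredDiffs = 0;
        for (double s : samples) {
            squaredDiffs += (s - m) * (s - m);
        }
        return Math.sqrt(squaredDiffs / samples.length);
    }

    /** Standard deviation divided by the mean; values near or above 1 signal unpredictable throughput. */
    static double relativeSpread(double[] samples) {
        return stdDev(samples) / mean(samples);
    }

    public static void main(String[] args) {
        // Made-up msg/s samples, purely illustrative.
        double[] erratic = {100, 9_000, 50, 12_000, 300, 7_500};
        double[] steady = {9_800, 10_100, 10_000, 9_900, 10_200, 10_000};
        System.out.println("erratic relative spread: " + relativeSpread(erratic));
        System.out.println("steady relative spread:  " + relativeSpread(steady));
    }
}
```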
Until next time.