Friday, December 23, 2011

JMS Performance Tuning Series, Part 2: One-Way Sends

Part 1 of this series can be found here.  It covered quotas, a performance and availability setting that should always be set to protect your JMS server from being overwhelmed.  This entry is about an optional setting that’s valid only in certain environments – those with lower Quality of Service (QoS) needs but higher performance needs.
Consider the basic structure of a messaging system.  There’s a producer, a consumer, and there’s the messaging server.

Now, for the moment, let’s suppose that we wait for a receipt of each message we send (and for the most part, we do).  Each time the producer thread creates a message, it enters the send() method, and then the thread blocks for the length of time that it takes for the server to indicate that the message send operation completed normally – amounting to one round trip’s worth of time.  This is known as a two-way send.
This send behavior is a fairly “safe behavior,” or at least a pessimistic behavior.  But what if performance needs trump the need for guaranteed delivery?

Enter One-Way Sends…

You might think of one-way sends as “fire and forget.”  The producer does not wait for the server response.  Your performance advantage per thread may vary, but you may see message production increase by a factor of several.  This factor is often determined by the round-trip time (the longer it takes for the message to be delivered, the more advantageous one-way sends tend to be with respect to performance).
If one-way send were completely fire-and-forget, producers would continue to send messages after (perhaps long after) a JMS server has become unavailable.  This is why you can specify a one-way send window size.  The window size specifies the number of one-way sends allowed before a two-way send is required.  Determining an appropriate window size is a tradeoff between performance (i.e., larger window size) and mitigating message loss when the JMS server becomes unavailable (smaller window size).  Increasing the window beyond a certain size (the size of which is determined, in part, by your network) may yield progressively less performance benefit – thus, it requires some experimentation to arrive at an appropriate setting.
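To make the window-size tradeoff a little more concrete, here is a rough back-of-the-envelope model in Python.  It is only a sketch under my own simplifying assumptions, not something taken from the WebLogic documentation: every send costs a fixed amount of local work, a two-way send additionally blocks for one full round trip, and a window of N means N one-way sends followed by one two-way send.  The timing numbers are hypothetical.

```python
def two_way_rate(send_cost_s, round_trip_s):
    """Messages per second when every send blocks for a full round trip."""
    return 1.0 / (send_cost_s + round_trip_s)


def one_way_rate(send_cost_s, round_trip_s, window):
    """Messages per second when `window` one-way sends precede each two-way send."""
    messages_per_cycle = window + 1
    cycle_time_s = messages_per_cycle * send_cost_s + round_trip_s
    return messages_per_cycle / cycle_time_s


def speedup(send_cost_s, round_trip_s, window):
    """How many times faster the windowed one-way producer is than a two-way producer."""
    return one_way_rate(send_cost_s, round_trip_s, window) / two_way_rate(send_cost_s, round_trip_s)


if __name__ == "__main__":
    send_cost = 0.00004   # hypothetical: 40 microseconds of local work per send
    round_trip = 0.0001   # hypothetical: 100 microsecond round trip
    for window in (1, 10, 50, 150, 1000):
        print(window, round(speedup(send_cost, round_trip, window), 2))
```

In this model the speedup can never exceed (send cost + round trip) / send cost, no matter how large the window gets, which is one way to picture why bigger windows eventually stop paying off.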
In WebLogic 11gR1PS4, you can enable one-way sends in your producer connection factory, on the “Configuration->Flow Control” tab.

Now, *BEWARE* - merely changing the One-Way Send Mode to something other than “Disabled” doesn’t mean that it’s actually enabled.  You might think you have it enabled and see no performance difference whatsoever.  This is because one-way sends are implicitly disabled if you use *anything* that requires a higher level of Quality of Service, such as:
  • Transactions
  • Persistence
  • Unit of Work / Unit of Order 
  • Client Store-And-Forward
This makes a great deal of sense – if the producer is not listening for acknowledgements, then it’s not exactly going to maintain transactional integrity.  If you want to use Unit-of-Order or Unit-of-Work, how long would it take for the producer to figure out that one of the messages in the sequence is missing?  One-way sends are also implicitly disabled when the destination specified is the name of a distributed destination (DD) – more on this later; one-way sends can be used with DDs, but only very carefully.  Finally, they are also disabled if the connection factory and destination are on different WebLogic Servers.
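The conditions above can be collected into a small checklist.  The sketch below is plain Python, not a WebLogic API; the parameter names are hypothetical ones I made up to mirror the list, and the function does nothing more than restate the rules in this post.

```python
def reasons_one_way_send_is_off(
    *,
    mode_enabled=True,
    transactions=False,
    persistence=False,
    unit_of_order_or_work=False,
    client_saf=False,
    destination_is_dd_name=False,
    factory_and_destination_same_server=True,
):
    """Return the reasons (per this post) that one-way sends would not take effect.

    All parameters are hypothetical flags describing a producer's setup.
    An empty list means none of the listed conditions applies.
    """
    checks = [
        (not mode_enabled, "One-Way Send Mode is Disabled"),
        (transactions, "transactions"),
        (persistence, "persistence"),
        (unit_of_order_or_work, "Unit of Work / Unit of Order"),
        (client_saf, "Client Store-And-Forward"),
        (destination_is_dd_name, "destination is the name of a distributed destination"),
        (not factory_and_destination_same_server,
         "connection factory and destination are on different servers"),
    ]
    return [reason for applies, reason in checks if applies]


if __name__ == "__main__":
    print(reasons_one_way_send_is_off(persistence=True, destination_is_dd_name=True))
```

Running a checklist like this against your own setup before a test run can save some head-scratching over why the numbers did not move.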
On the other hand, enabling one-way sends on a connection factory effectively disables flow control.  This is somewhat intuitive.  If the producer is configured so that it gets little to no feedback from the JMS server (say, one response per one-way send window), there isn’t adequate opportunity to tell the producer to slow down.

Making a Batch of Proof Pudding (Extra Proof, Hold the Pudding)

Here I am taking the configuration from Part 1 of this series and enabling one-way sends with a window size of 150 messages.  As before, I’m not really interested in tuning my development machine to be the best possible messaging machine it can be, so the 150 number is just a guess that is used for illustration.  To review, the test producer threads spin out as many 1 KB messages as the server will take.  There is always only one consumer thread, and it is a synchronous, non-durable consumer that is set to “auto acknowledge.”  The one-way sends test also inherits the Quotas and Quota-Blocking Sends settings from the previous blog.  The test run that I created takes the average performance over 8 minute periods, and standard deviation (where noticeable) is denoted by the range bars above and below the data points for the average.  In this case, I’m using a topic.

The first thing you might notice is how the rate for a single producer thread triples (22k messages per second vs 7.1k MPS) when compared to the “Quotas Only” dataset, which reinforces the multiplicative effect I mentioned earlier.  Since the producer, JMS server, and consumer are all located on the same machine, the round-trip time was fairly low to start out with.  Looking at UNIX top, I was able to see that the utilization for that producer thread is now higher (but not even close to 100% utilization of one CPU core).  This is also in line with our expectations – the producer spends less time waiting, and more time sending.
The second thing you will probably notice is that the best results for one-way sends occur with only one producer, and the numbers very gradually diminish after adding more producer threads.  We can tell there is an artificial or unnecessary bottleneck from watching top (or whatever equivalent you might be using – Windows Task Manager?).  The utilization of the system’s cores for the producers, WebLogic, and the consumer is still very low – which means we are doing some kind of waiting.  The bottleneck is due to the message consumer and is the topic of the next blog in this series.  While the in-depth explanation is coming, mull over a couple of key points:  1) Adding several additional subscribers doesn’t change the messaging rate per subscriber, and 2) There is very low utilization on each consumer thread.  In effect, we have a nearly identical problem on the consumer side as we did on the producer side (which we addressed with one-way sends).

Clusters and One-Way Sends

In WebLogic 11gR1PS4 (and previous releases), one-way sends are not directly supported with distributed destinations.  It’s also worthwhile to note that using one-way sends in clusters is more complicated than in individual servers.  The documentation is helpful with respect to how this might work, but I thought a little additional dialog on this topic might be useful.
The fundamental “trick” of using one-way sends within a WebLogic cluster is to ensure that the connection factory your producer uses and the (non-distributed) destination are in the same application server container.

Case 1: Single Destination within a Cluster

This is the simplest to configure of the two.  Define your connection factory and target it at a single server, not the cluster.  Create the destination, and target it at the same server – the destination is a singleton within the cluster.  Logically, it should look something like this:

This topology has the advantage of one-way sends but sacrifices the horizontal scalability of a distributed destination.  It also tends to create uneven utilization within the cluster.

Case 2: Multiple Destinations in the Cluster

Here is where it can get complicated.  You could just take the notion from Case 1, and extrapolate it over the cluster.  Then you would have a cluster full of different connection factories and different destinations, and this is probably difficult to manage from a code and configuration perspective.  You’d have to figure out how to distribute your producers fairly over the independent destinations – so, you’d likely be trading complexity in your WebLogic configuration for complexity in your producer code.
This (probably) non-ideal arrangement might look a little like this:

The documentation on OTN probably lists the best approach.  In brief, target one connection factory to all of the participating servers – this will most likely be the entire cluster.  Turn on “Server Affinity” at the connection factory so that producers become pinned to the individual destinations (this will disable RMI load balancing for the external producers).  Create one destination per server, each with a distinct name in the global JNDI (or no global JNDI name) and an identical name for local JNDI.  Now ensure that the producers use the local JNDI name and the created connection factory for the context creation.
This will look something like:

Now the connection and delivery logic can be generalized between the producers – no more accounting for the individual connection factories or destinations in code.  We now have to account for load balancing, as our connections are not automatically balanced by creating a connection. 
Now that RMI load balancing no longer occurs, we need to ensure load balancing is performed somewhere so that we don’t end up with one cluster member doing all of the work.  The OTN docs cover load balancing with affinity turned on quite well.  Take a look at how getting your initial context from the cluster can result in a load balance.  Once the context is created, the code utilizing that context is now pinned to that particular instance.  Subsequent initial contexts from the same client will get load balanced to other cluster members. 
This may not result in a desired level of fairness, and so some degree of additional load balancing may be necessary (particularly if all producers create only one initial context, resulting in all utilization occurring on one server).  Then, either DNS load balancing or network load balancing may be appropriate.

Case 3: Using One-way Sends with Distributed Destinations

Now, you may remember me saying that one-way sends are implicitly disabled if the destination specified is the name of a distributed destination.  This is true – but you can still manually target the physical DD members.  In practice, this is somewhat more complex than Case 2 as you will need to know the name of the JMS server you are targeting in order to use it.  The name of the physical DD member follows the form “MyJMSServerName@myDistributedQueueName” in versions of WebLogic Server 9.0 and newer.  This may look something like this:

This approach has several notable complications.
  • The producer is responsible for ensuring the JNDI lookup uses the correct JMS server name to make sure the destination is the one hosted by the WebLogic instance that the producer is connected to.
  • Effectively, since you are forcing a connection to a particular WLS instance, your producers will also not be load balanced.  You will need to align producers with specific portions of the DD to get reasonably fair distribution of message load.
  • The previous bullets make failure recovery more complicated as well.  If server_3 goes down, and you have configured the producer to connect to the next server in the list, some kind of additional logic will be needed to push the producer back to server_3 when it comes online.
So, it’s possible, but it’s complicated.  In many cases, it may be preferable to either use Case 2 or disable one-way sends altogether and just increase the number of producer threads to reach the desired message rates.
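Since the producer has to work out the physical member name itself, it helps to keep that logic in one place.  The sketch below only builds the “MyJMSServerName@myDistributedQueueName” string described above.  The mapping from WebLogic instance to JMS server, and every name in it, is hypothetical, and the actual lookup against the server is deliberately left out.

```python
# Hypothetical mapping: which JMS server is hosted on which WebLogic instance.
JMS_SERVER_BY_WLS_INSTANCE = {
    "server_1": "JMSServer_1",
    "server_2": "JMSServer_2",
    "server_3": "JMSServer_3",
}


def dd_member_name(jms_server_name, distributed_destination_name):
    """Build the physical member name in the JMSServer@DistributedDestination form."""
    return f"{jms_server_name}@{distributed_destination_name}"


def member_for_connected_instance(wls_instance, distributed_destination_name,
                                  mapping=JMS_SERVER_BY_WLS_INSTANCE):
    """Pick the member hosted on the instance the producer is connected to."""
    if wls_instance not in mapping:
        raise LookupError(f"no JMS server known for instance {wls_instance!r}")
    return dd_member_name(mapping[wls_instance], distributed_destination_name)


if __name__ == "__main__":
    print(member_for_connected_instance("server_3", "myDistributedQueueName"))
```

Note that the mapping is exactly the extra piece of state the bullets above warn about: somebody has to keep it in step with the cluster, and the failover logic has to consult it again whenever the producer moves.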
There is a possible approach that will simplify using DDs with one-way sends: using a Foreign JNDI server to map to each physical DD member (each on the same app server) and turning on server affinity.  The advantage is that the client application doesn’t need to maintain a list of servers and separate destinations.  I haven’t tried this out yet, but I will blog about this once I have (and if there is sufficient interest).

Silent Deletions

If a producer is just blasting away sending messages without listening for acknowledgements, it probably shouldn’t be a surprise that the JMS server may need to delete the message without immediately informing the producer.  This is triggered by exceeding quota (covered quite well here).  If you’re thinking, “Well that shouldn’t happen,” keep in mind that the alternative is to keep accepting messages over and beyond the quota and risking server instability.
You can maneuver around this issue by adjusting the send timeout, which modifies the amount of time it takes to silently delete the message if the quota condition isn’t cleared.
The maximum number of messages that can be deleted silently is defined by the one-way send window size.  The one-way send window defines how many messages the producer can send before having to do a two-way send (a send while waiting for a return).  A little bit of research on what types of message surge conditions you are looking to support can help you scope out what these settings should be in order to claim that you will have no message loss unless certain metrics are exceeded.
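As a starting point for that scoping exercise, the worst case can be written down as simple arithmetic.  This is a sketch only: treating the window as a per-producer limit, and therefore multiplying by the number of producers, is my assumption, and the message size and producer count in the example are hypothetical.

```python
def max_silent_deletions(window_size, producers=1):
    """Upper bound on silently deleted messages, assuming one window per producer."""
    return window_size * producers


def max_bytes_at_risk(window_size, message_bytes, producers=1):
    """The same bound expressed in bytes for a fixed message size."""
    return max_silent_deletions(window_size, producers) * message_bytes


if __name__ == "__main__":
    # Hypothetical scenario: window of 150, four producers, 1 KB messages.
    print(max_silent_deletions(150, producers=4))
    print(max_bytes_at_risk(150, 1024, producers=4))
```

If the number that comes out is more loss than you can tolerate during a surge, that argues for a smaller window, even at some cost in throughput.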

Final Thoughts

While I indicated that you should always set quotas, one-way sends are a trade-off.  You have this magnificent performance advantage that you cannot use when you have high QoS needs or require transactions.  The bright side is that you have a pretty clear picture of when you can use it. 
A partial list of critical-to-understand caveats around one-way sends:
  • The use of one-way sends within a cluster is somewhat more complicated.  See the documentation link provided, but the trick is to ensure that the connection factory and the destination are on the same physical WLS instance.
  • If the consumers cannot keep up with the producers, the performance will be determined by the capacity of the consumers (and enabling one-way sends may make little difference).
  • One-way sends do not directly work with distributed destinations without additional configuration.  It’s somewhat more complicated than other options.
  • Same link, but notice that messages that exceed the quota may be silently deleted when using one-way sends.
  • Enabling one-way sends effectively disables the WebLogic JMS Flow Control feature.  That said, you can still use quotas and quota blocking sends as a means to provide some control over message producers.
  • A number of QoS features implicitly disable one-way sends.  An understanding of the way the one-way send option works should help you remember which settings interfere or interact with it.
Next up in this series, MessagesMaximum!

Thursday, November 17, 2011

JMS Performance Tuning Series, Part 1: The relationship between production, consumption, and quotas

After presenting at Oracle OpenWorld on WebLogic JMS performance tuning, I began to think that some additional information would be welcome in blog format.  This entry will kick off a larger number of blogs on JMS performance, starting by introducing the basics and mandatory settings (as in, “You need to set this to ensure your system is available”) and eventually move on to more advanced topics (as in, “You just might need to set this, depending on what you are doing”).

The Basics

Java Message Service as a specification is explained here, but I think a couple of basic items are needed to proceed in this blog.  First, let’s consider that in every working JMS scenario there is at least one producer sending messages to the JMS Server, and at least one consumer taking messages off of the JMS Server. 
Ideally, the consumer’s capacity to process messages is limitless, and the JMS server is at least able to receive messages and hand them off to the consumer as fast as or faster than the producer is sending them.  In reality, this is hardly ever the case.  The creation of messages by producers and the capability of the consumers to process messages fluctuate over time.
When the consumer(s) is unable to keep up with the producer, this gets somewhat complicated as the JMS Server is left holding the message “surplus.”  When and if the surplus continues to build, a number of physical limitations start asserting themselves. 
From the JMS Server’s standpoint, managing the messages in this “surplus” becomes more difficult as the surplus grows.  The first line of storage occurs in the JVM heap – but eventually, the backlog may outgrow what we want to be stored in the heap.  WebLogic Server helps manage this backlog, by paging the messages out to a messaging store, but generally we don’t want to see the stored messages grow to unmanageable sizes.  Further, paging only helps in scenarios where we are experiencing a surge in messages that is fairly short-lived, and we will eventually catch up in terms of message processing.  Imagine having 100,000 messages arriving per second, and your message store has 2 GB of backlogged messages waiting for consumption.  When will they be consumed?  How many of them are still relevant / valid?  When will I run out of storage space for backlogged messages?  What if my disks cannot write fast enough to keep up with the messages being paged to them?
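Those questions can at least be roughed out with arithmetic.  The sketch below is a simple steady-rate model of my own, not a description of how WebLogic pages messages; the consumer rate and the 1 KB message size in the example are hypothetical.

```python
def backlog_growth_bytes_per_s(produce_rate, consume_rate, message_bytes):
    """Bytes of backlog added each second while producers outpace consumers."""
    return max(0, produce_rate - consume_rate) * message_bytes


def seconds_until_full(free_bytes, produce_rate, consume_rate, message_bytes):
    """How long until the remaining storage is used up at steady rates."""
    growth = backlog_growth_bytes_per_s(produce_rate, consume_rate, message_bytes)
    return float("inf") if growth == 0 else free_bytes / growth


def seconds_to_drain(backlog_bytes, produce_rate, consume_rate, message_bytes):
    """How long until an existing backlog is consumed at steady rates."""
    drain = (consume_rate - produce_rate) * message_bytes
    return float("inf") if drain <= 0 else backlog_bytes / drain


if __name__ == "__main__":
    two_gb = 2 * 1024 ** 3
    # Hypothetical: 100,000 msg/s arriving, 80,000 msg/s consumed, 1 KB messages.
    print(seconds_until_full(two_gb, 100_000, 80_000, 1024))
    # Hypothetical: the surge stops entirely and the consumer works off 2 GB.
    print(seconds_to_drain(two_gb, 0, 80_000, 1024))
```

The useful part is less the exact figures than the shape: if producers stay ahead of consumers, time-to-full is finite and time-to-drain is infinite, which is the situation quotas exist to cap.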

Enter Quotas…

Quotas allow us to set the maximum message number or maximum total size in bytes that our JMS server or destination will allow.  Set both of them.  Really, you should do it.  Quotas define, in effect, what the server or destination is willing to hold before refusing to take more messages from the producers.  In WebLogic 11gR1 PS4, you can find Quota settings in the Administration console, in MY_DOMAIN_NAME->Services->Messaging->JMS Servers->MY_JMS_SERVER_NAME under the “Thresholds and Quota” tab.
 Let’s take a look at the settings provided here:
 Neither of these stops whatever is creating the messages (e.g., stock purchases, etc.) – they just prevent the producers from swamping the JMS server by defining when the server will stop accepting new messages.  This is a good thing.  If we let the server be inundated to the point of instability by not adding quotas, periods of high volume can bring down our server (which is always bad).
Now, there’s always the question, “How should I size my quota?”  Good question (I’m glad I asked it), and one I can’t answer on this blog entirely.  Among other concerns, this really depends on:
  • Whether or not the WebLogic Server is dedicated to JMS.
  • What the expected heap size is.
  • Your tolerance to latency (larger quotas tend to mean garbage collection will take longer).
  • Expected variance in producer send rate.  
So let me leave you with this thought: What is your expected fluctuation rate with your producers?  Given that the larger the quota is, the more you open yourself up to diminished transactions per second (due to GC, paging, etc.), what is the smallest you can set the quota to and still expect to fulfill your requirements?  Keep in mind, 1/3 of the heap is a not-unreasonable quota total for all of the JMS servers present.
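As a starting point for that exercise, the 1/3-of-heap figure can be turned into numbers.  This is a sketch only: splitting the total evenly across JMS servers and deriving a message-count quota from an average message size are my assumptions, and the heap size and server count in the example are hypothetical.  Treat the result as an upper bound to tune down from, not a recommendation.

```python
def bytes_quota_per_jms_server(heap_bytes, jms_servers, heap_divisor=3):
    """Split 1/heap_divisor of the heap evenly across the JMS servers."""
    total_quota = heap_bytes // heap_divisor
    return total_quota // jms_servers


def messages_quota(bytes_quota, average_message_bytes):
    """Message-count quota that roughly matches a bytes quota."""
    return bytes_quota // average_message_bytes


if __name__ == "__main__":
    heap = 3 * 1024 ** 3            # hypothetical 3 GB heap
    per_server = bytes_quota_per_jms_server(heap, jms_servers=2)
    print(per_server, messages_quota(per_server, 1024))
```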

Send Timeouts / Quota Blocking Sends

Once a specified quota condition has been reached, the WLS JMS server will start rejecting new messages from producers until the backlog is back under the quota.  We can supplement the quota settings by altering the Send Timeout on the Connection Factory (it’s under the “Default Delivery” tab) – this will tell the producer how long to wait on the destination or JMS server prior to timing out on its send() operation.  This has the effect of causing the producer thread to block for up to the length of time specified, waiting for the message backlog to fall under the quota condition.  This is particularly useful if there are multiple producers.
From a programmatic standpoint, this is the length of time your producer thread will wait before it comes back with a timeout exception.  This has the net effect of slowing down the producers when you’ve exceeded your quota.  Do not mistake this mechanism for throttling.  Think of this as your insurance measure to guarantee your server is up, responsive, and continuing to provide messages to the consumer.  Quota blocking sends add a level of resiliency or forgiveness to the system. They enable the send operation to complete if space opens up during the wait time, which means that the application doesn’t need to handle a resource exception when the quota is exceeded, especially if space is freed up quickly.
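The blocking behavior is easy to picture with a bounded queue.  The sketch below is an analogy in plain Python, not WebLogic code: a `queue.Queue` with a maximum size stands in for a destination with a message quota, and the `QuotaExceeded` exception and `send` helper are hypothetical names of my own.

```python
import queue
import threading
import time


class QuotaExceeded(Exception):
    """Stand-in for the exception a producer sees when a send times out over quota."""


def send(destination, message, send_timeout_s):
    """Block for up to send_timeout_s waiting for room, then give up."""
    try:
        destination.put(message, timeout=send_timeout_s)
    except queue.Full:
        raise QuotaExceeded(f"no room after {send_timeout_s}s") from None


def consume_one_after(destination, delay_s):
    """Simulate a consumer that frees one slot after a delay."""
    time.sleep(delay_s)
    destination.get()


if __name__ == "__main__":
    destination = queue.Queue(maxsize=2)   # stands in for a message quota of 2
    send(destination, "m1", 0.5)
    send(destination, "m2", 0.5)

    # No consumer: the send blocks for the full timeout, then fails.
    try:
        send(destination, "m3", 0.1)
    except QuotaExceeded as exc:
        print("rejected:", exc)

    # A consumer frees space during the wait, so the same send now succeeds.
    consumer = threading.Thread(target=consume_one_after, args=(destination, 0.05))
    consumer.start()
    send(destination, "m3", 0.5)
    consumer.join()
    print("accepted, backlog =", destination.qsize())
```

The second half is the “forgiveness” in action: the producer never sees an error, it just waits a little.  Only when the backlog stays over the limit for the whole timeout does the application have to deal with the failure.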
How should you determine your send timeout?  Again, depends.  After what time does your message lose its value?  What happens if your application holds onto a message for longer periods of time – does it continue to generate or queue up more messages while waiting?  For that matter, what is a more desirable behavior: Rolling back a transaction, or waiting?  I tend to keep my send timeouts under one second as a rule of thumb, but your mileage may vary.

Final Words on Quotas

I don’t often say “always configure x this way,” but always, always, always configure message quotas on WebLogic JMS.
Take a look at this graph, which shows a sample JMS server with no quotas set under heavy load, showing the average message rates for an 8 minute period:

This graph shows results from a performance harness that floods the server with as many messages as possible. This isn’t exactly a real-world scenario, but it can help me illustrate my point. Can you see how the standard deviation becomes more pronounced as more producers are added?  This is not good – the standard deviation range is larger than the actual average message rate, which means we simply don’t have predictable performance or availability. The server is overloaded. It is doing everything it can to keep up with the unyielding stream of messages by paging to disk, which also results in frequent full garbage collections.
Now take a look at what happens when we add a quota of a size (100 MB) that I chose without research or tuning (i.e., there could be better or worse quota sizes, but this isn’t the point):

Note the disparity, not just in the message rate, but that the standard deviation is no longer visible.  The performance increase is experienced because the quota has prevented the server from being overwhelmed. 
Until next time.