Friday, December 23, 2011

JMS Performance Tuning Series, Part 2: One-Way Sends

Part 1 of this series can be found here.  It covered quotas, a performance and availability setting that should always be set to protect your JMS server from being overwhelmed.  This entry is about an optional setting that’s valid only in certain environments – those with lower Quality of Service (QoS) needs but higher performance needs.
Consider the basic structure of a messaging system.  There’s a producer, a consumer, and there’s the messaging server.

Now, for the moment, let’s suppose that we wait for a receipt of each message we send (and for the most part, we do).  Each time the producer thread creates a message, it enters the send() method, and then the thread blocks until the server indicates that the send operation completed normally – amounting to one round trip’s worth of time.  This is known as a two-way send.
This send behavior is a fairly “safe behavior,” or at least a pessimistic behavior.  But what if performance needs trump the need for guaranteed delivery?

Enter One-Way Sends…

You might think of one-way sends as “fire and forget.”  The producer does not wait for the server response.  Your performance advantage per thread may vary, but you may see an increase in message production by a factor of several times – often, this factor is determined by what the round trip time is (the longer it takes for the message to be delivered, the more advantageous one-way sends tend to be with respect to performance).
If one-way send were completely fire-and-forget, producers would continue to send messages after (perhaps long after) a JMS server has become unavailable.  This is why you can specify a one-way send window size.  The window size specifies the number of one-way sends allowed before a two-way send is required.  Determining an appropriate window size is a tradeoff between performance (i.e., larger window size) and mitigating message loss when the JMS server becomes unavailable (smaller window size).  Increasing the window beyond a certain size (the size of which is determined, in part, by your network) may yield progressively less performance benefit – thus, it requires some experimentation to arrive at an appropriate setting.
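To make the window-size tradeoff concrete, here is a rough back-of-the-envelope model in Python (it is not WebLogic code).  It assumes each message costs a fixed amount of local work and that one send in every window blocks for a full round trip.  The function name and the 20 microsecond / 120 microsecond figures are made up for illustration; they are not measurements.

```python
def sends_per_second(local_cost_s, round_trip_s, window=1):
    """Estimated send rate for one producer thread.

    Assumption: in every group of `window` messages, one send is two-way
    and blocks for a full round trip. window=1 means every send is two-way.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    time_per_window = window * local_cost_s + round_trip_s
    return window / time_per_window


if __name__ == "__main__":
    # Hypothetical costs: 20 microseconds of local work, 120 microsecond round trip.
    for window in (1, 10, 50, 150, 500):
        rate = sends_per_second(20e-6, 120e-6, window)
        print(f"window={window:>3}: about {rate:,.0f} msgs/sec")
```

In this model the rate can never exceed 1 / local_cost_s, so each increase in the window buys less than the previous one.  That is the diminishing return described above, and it is why the right window size has to be found by experiment on your own network.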
In WebLogic 11gR1PS4, you can enable one-way sends in your producer connection factory, on the “Configuration->Flow Control” tab.

Now, *BEWARE* - merely changing the One-Way Send Mode to something other than “Disabled” doesn’t mean that it’s actually enabled.  You might think you have it enabled and see no performance difference whatsoever.  This is because one-way send is implicitly disabled if you use *anything* that requires a higher level of Quality of Service, such as:
  • Transactions
  • Persistence
  • Unit of Work / Unit of Order 
  • Client Store-And-Forward
This makes a great deal of sense – if the producer is not listening for acknowledgements, then it’s not exactly going to maintain transactional integrity.  If you want to use Unit-of-Order or Unit-of-Work, how long would it take for the producer to figure out that one of the messages in the sequence is missing?  One-way sends are also implicitly disabled when the destination specified is the name of a distributed destination (DD) – more on this later; one-way sends can be used with DDs, but only very carefully.  Finally, they are also disabled if the connection factory and destination are on different WebLogic Servers.
On the other hand, enabling one-way sends on a connection factory effectively disables flow control.  This is somewhat intuitive.  If the producer is configured so that it gets little to no feedback from the JMS server (say, one response per one-way send window), there isn’t adequate opportunity to tell the producer to slow down.
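The conditions above that implicitly disable one-way sends can be collected into a small checklist.  The sketch below is plain Python, not a WebLogic API: the SendSetup class and its field names are hypothetical and simply mirror the items listed in this section.

```python
from dataclasses import dataclass


@dataclass
class SendSetup:
    """Hypothetical description of a producer's setup (not a WebLogic API)."""
    one_way_mode_enabled: bool = True
    transacted: bool = False
    persistent: bool = False
    unit_of_order_or_work: bool = False
    client_saf: bool = False
    destination_is_dd_name: bool = False
    factory_and_destination_same_server: bool = True


def one_way_blockers(setup):
    """Return the reasons, per the list above, that sends would stay two-way."""
    checks = [
        (not setup.one_way_mode_enabled, "One-Way Send Mode is Disabled"),
        (setup.transacted, "send is transactional"),
        (setup.persistent, "message is persistent"),
        (setup.unit_of_order_or_work, "Unit of Order / Unit of Work is in use"),
        (setup.client_saf, "client Store-And-Forward is in use"),
        (setup.destination_is_dd_name, "destination is a distributed destination name"),
        (not setup.factory_and_destination_same_server,
         "connection factory and destination are on different servers"),
    ]
    return [reason for blocked, reason in checks if blocked]


if __name__ == "__main__":
    setup = SendSetup(persistent=True, destination_is_dd_name=True)
    for reason in one_way_blockers(setup):
        print("one-way sends implicitly disabled:", reason)
```

An empty list means nothing in this checklist stands in the way.  It does not prove one-way sends are active, so a before-and-after throughput measurement is still the real test.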

Making a Batch of Proof Pudding (Extra Proof, Hold the Pudding)

Here I am taking the configuration from Part 1 of this series and enabling one-way sends with a window size of 150 messages.  As before, I’m not really interested in tuning my development machine to be the best possible messaging machine it can be, so the 150 number is just a guess that is used for illustration.  To review, the test producer threads spin out as many 1 KB messages as the server will take.  There is always only one consumer thread, and it is a synchronous, non-durable consumer that is set to “auto acknowledge.”  The one-way sends test also inherits the Quotas and Quota-Blocking Sends settings from the previous blog.  The test run that I created takes the average performance over 8 minute periods, and standard deviation (where noticeable) is denoted by the range bars above and below the data points for the average.  In this case, I’m using a topic.

The first thing you might notice is how the rate for a single producer thread triples (22k messages per second vs 7.1k MPS) when compared to the “Quotas Only” dataset, which reinforces the multiplicative effect I mentioned earlier.  Since the producer, JMS server, and consumer are all located on the same machine, the round-trip time was fairly low to start out with.  Looking at UNIX top, I was able to see that the utilization for that producer thread is now higher (but not even close to 100% utilization of one CPU core).  This is also in line with our expectations – the producer spends less time waiting, and more time sending.
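As a quick sanity check on the "triples" claim, the arithmetic can be worked directly from the two rates quoted above.  The variable names are mine; the per-message times are simply the inverse of the rates, so they include everything the producer thread does per message, not only network waiting.

```python
two_way_rate = 7_100    # msgs/sec, "Quotas Only" run, one producer thread
one_way_rate = 22_000   # msgs/sec, one-way sends with a window size of 150

speedup = one_way_rate / two_way_rate
two_way_us = 1e6 / two_way_rate   # average microseconds per message
one_way_us = 1e6 / one_way_rate
saved_us = two_way_us - one_way_us

print(f"speedup: {speedup:.1f}x")
print(f"per message: {two_way_us:.0f} us two-way vs {one_way_us:.0f} us one-way")
print(f"time no longer spent per message: {saved_us:.0f} us")
```

So the single producer thread goes from roughly 141 microseconds per message to roughly 45, a factor of about 3.1.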
The second thing you will probably notice is that the best results for one-way sends occur with only one producer, and the numbers very gradually diminish after adding more producer threads.  We can tell there is an artificial or unnecessary bottleneck from watching top (or whatever equivalent you might be using – Windows Task Manager?).  The utilization of the system’s cores by the producers, WebLogic, and the consumer is still very low – which means we are doing some kind of waiting.  The bottleneck is due to the message consumer and is the topic of the next blog in this series.  While the in-depth explanation is coming, mull over a couple of key points:  1) Adding several additional subscribers doesn’t change the messaging rate per subscriber, and 2) There is very low utilization on each consumer thread.  In effect, we have a nearly identical problem on the consumer side as we did on the producer side (which we addressed with one-way sends).

Clusters and One-Way Sends

In WebLogic 11gR1PS4 (and previous releases), one-way sends are not directly supported with distributed destinations.  It’s also worthwhile to note that using one-way sends in clusters is more complicated than in individual servers.  The documentation is helpful with respect to how this might work, but I thought a little additional dialog on this topic might be useful.
The fundamental “trick” of using one-way sends within a WebLogic cluster is to ensure that the connection factory your producer uses and the (non-distributed) destination are in the same application server container.

Case 1: Single Destination within a Cluster

This is the simplest case to configure.  Define your connection factory and target it at a single server, not the cluster.  Create the destination, and target it at the same server – the destination is a singleton within the cluster.  Logically, it should look something like this:

This topology has the advantage of one-way sends but sacrifices the horizontal scalability of a distributed destination.  It also tends to create uneven utilization within the cluster.

Case 2: Multiple Destinations in the Cluster

Here is where it can get complicated.  You could just take the notion from Case 1, and extrapolate it over the cluster.  Then you would have a cluster full of different connection factories and different destinations, and this is probably difficult to manage from a code and configuration perspective.  You’d have to figure out how to distribute your producers fairly over the independent destinations – so, you’d likely be trading complexity in your WebLogic configuration for complexity in your producer code.
This (probably) non-ideal arrangement might look a little like this:

The documentation on OTN probably lists the best approach.  In brief, target one connection factory to all of the participating servers – this will most likely be the entire cluster.  Turn on “Server Affinity” at the connection factory so that producers become pinned to the individual destinations (this will disable RMI load balancing for the external producers).  Create one destination per server, each with a distinct name in the global JNDI (or no global JNDI name) and an identical name for local JNDI.  Now ensure that the producers use the local JNDI name and the created connection factory for the context creation.
This will look something like:

Now the connection and delivery logic can be generalized between the producers – no more accounting for the individual connection factories or destinations in code.  We now have to account for load balancing, as our connections are not automatically balanced by creating a connection. 
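The naming rules for this arrangement (one destination per server, distinct or absent global JNDI names, one shared local JNDI name) are easy to get wrong by hand.  Below is a minimal sketch of a consistency check over a layout written down as plain data.  The server names, JNDI names, and the check_layout function are hypothetical; nothing here talks to WebLogic.

```python
from collections import Counter


def check_layout(destinations):
    """Return a list of problems; an empty list means the layout follows the rules.

    Each destination is a dict with keys "server", "global_jndi" (may be None)
    and "local_jndi".
    """
    problems = []

    # Rule: one destination per server.
    for server, count in Counter(d["server"] for d in destinations).items():
        if count != 1:
            problems.append(f"{server} hosts {count} destinations, expected 1")

    # Rule: the local JNDI name is identical everywhere.
    local_names = sorted({d["local_jndi"] for d in destinations})
    if len(local_names) > 1:
        problems.append(f"local JNDI names differ: {local_names}")

    # Rule: global JNDI names are distinct (or absent).
    global_names = Counter(
        d["global_jndi"] for d in destinations if d["global_jndi"] is not None
    )
    duplicates = sorted(name for name, n in global_names.items() if n > 1)
    if duplicates:
        problems.append(f"duplicate global JNDI names: {duplicates}")

    return problems


# Hypothetical three-server cluster.
layout = [
    {"server": "server_1", "global_jndi": "jms.orders.s1", "local_jndi": "jms.orders"},
    {"server": "server_2", "global_jndi": "jms.orders.s2", "local_jndi": "jms.orders"},
    {"server": "server_3", "global_jndi": None, "local_jndi": "jms.orders"},
]
print(check_layout(layout) or "layout looks consistent")
```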
Now that RMI load balancing no longer occurs, we need to ensure load balancing is performed somewhere so that we don’t end up with one cluster member doing all of the work.  The OTN docs cover load balancing with affinity turned on quite well.  Take a look at how getting your initial context from the cluster can result in a load balance.  Once the context is created, the code utilizing that context is now pinned to that particular instance.  Subsequent initial contexts from the same client will get load balanced to other cluster members. 
This may not result in a desired level of fairness, and so some degree of additional load balancing may be necessary (particularly if all producers create only one initial context, resulting in all utilization occurring on one server).  Then, either DNS load balancing or network load balancing may be appropriate.
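The fairness concern can be illustrated with a toy model.  It makes a simplifying assumption taken from the paragraph above rather than from the WebLogic documentation: each client walks the same list of cluster members from the start, so a client's first initial context always lands on the first server.  The function and server names are hypothetical.

```python
from collections import Counter
from itertools import cycle


def place_contexts(servers, contexts_per_client):
    """Count how many initial contexts land on each server.

    Simplifying assumption: every client rotates through the same server
    list, starting from the first entry.
    """
    load = Counter({server: 0 for server in servers})
    for context_count in contexts_per_client:
        rotation = cycle(servers)
        for _ in range(context_count):
            load[next(rotation)] += 1
    return load


if __name__ == "__main__":
    servers = ["server_1", "server_2", "server_3"]
    # Six producers, one initial context each: everything piles onto server_1.
    print(dict(place_contexts(servers, [1] * 6)))
    # Two producers, three initial contexts each: evenly spread.
    print(dict(place_contexts(servers, [3, 3])))
```

Under this assumption, many single-context producers are the worst case, which is where DNS or network load balancing in front of the cluster earns its keep.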

Case 3: Using One-way Sends with Distributed Destinations

Now, you may remember me saying that one-way sends are implicitly disabled if the destination specified is the name of a distributed destination.  This is true – but you can still manually target the physical DD members.  In practice, this is somewhat more complex than Case 2 as you will need to know the name of the JMS server you are targeting in order to use it.  The name of the physical DD member follows the form “MyJMSServerName@myDistributedQueueName” in versions of WebLogic Server 9.0 and newer.  This may look something like this:

This approach has several notable complications.
  • The producer is responsible for ensuring the JNDI lookup uses the correct JMS server name to make sure the destination is the one hosted by the WebLogic instance that the producer is connected to.
  • Effectively, since you are forcing a connection to a particular WLS instance, your producers will also not be load balanced.  You will need to align producers with specific portions of the DD to get reasonably fair distribution of message load.
  • The previous bullets make failure recovery more complicated as well.  If server_3 goes down, and you have configured the producer to connect to the next server in the list, some kind of additional logic will be needed to push the producer back to server_3 when it comes online.
So, it’s possible, but it’s complicated.  In many cases, it may be preferable to either use Case 2 or disable one-way sends altogether and just increase the number of producer threads to reach the desired message rates.
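To give a feel for the extra bookkeeping the producer takes on, here is a minimal sketch in plain Python (no WebLogic calls).  It builds the physical member name in the JMSServer@DistributedDestination form described above and picks which server to target, preferring the producer's "home" server so that it moves back once that server returns.  The server-to-JMS-server map, the function names, and the health check are all hypothetical.

```python
def dd_member_name(jms_server, distributed_destination):
    """Build the physical member name in the JMSServer@DistributedDestination form."""
    return f"{jms_server}@{distributed_destination}"


def pick_server(home, fallback_order, is_up):
    """Prefer the home server; otherwise take the first fallback that is up."""
    if is_up(home):
        return home
    for server in fallback_order:
        if server != home and is_up(server):
            return server
    return None


# Hypothetical mapping of WLS instances to the JMS servers they host.
JMS_SERVER_ON = {
    "server_1": "JMSServer_1",
    "server_2": "JMSServer_2",
    "server_3": "JMSServer_3",
}

if __name__ == "__main__":
    status = {"server_1": True, "server_2": True, "server_3": False}
    target = pick_server("server_3", ["server_1", "server_2"], status.get)
    print(target, dd_member_name(JMS_SERVER_ON[target], "myDistributedQueueName"))

    status["server_3"] = True  # home server comes back online
    target = pick_server("server_3", ["server_1", "server_2"], status.get)
    print(target, dd_member_name(JMS_SERVER_ON[target], "myDistributedQueueName"))
```

Re-running pick_server on every reconnect is what pushes the producer back to its home server.  A real client would also need to decide when to drop a healthy fallback connection in order to do so.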
There is a possible approach that would simplify using DDs with one-way sends: use a Foreign JNDI server to map to each physical DD member (each on the same app server) and turn on server affinity.  The advantage is that the client application doesn’t need to maintain a list of servers and separate destinations.  I haven’t tried this out yet, but I will blog about this once I have (and if there is sufficient interest).

Silent Deletions

If a producer is just blasting away sending messages without listening for acknowledgements, it probably shouldn’t be a surprise that the JMS server may need to delete the message without immediately informing the producer.  This is triggered by exceeding quota (covered quite well here).  If you’re thinking, “Well that shouldn’t happen,” keep in mind that the alternative is to keep accepting messages over and beyond the quota and risking server instability.
You can maneuver around this issue by adjusting the send timeout, which modifies the amount of time it takes to silently delete the message if the quota condition isn’t cleared.
The maximum number of messages that can be deleted silently is defined by the one-way send window size.  The one-way send window defines how many messages the producer can send before having to do a two-way send (a send while waiting for a return).  A little bit of research on what types of message surge conditions you are looking to support can help you scope out what these settings should be in order to claim that you will have no message loss unless certain metrics are exceeded.
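For the "little bit of research," a rough sizing sketch may help.  It rests on assumptions the post does not state: that the quota is counted in messages, that rates are steady during a surge, and that the silent-loss bound applies per producer (hence the multiplication).  The function names and the example numbers are hypothetical.

```python
def worst_case_silent_loss(window_size, producers=1):
    """Upper bound on silently deleted messages, assuming the bound is per producer."""
    return window_size * producers


def seconds_until_quota(quota_msgs, backlog_msgs, produce_rate, consume_rate):
    """How long a surge can last before the backlog reaches the quota."""
    net_rate = produce_rate - consume_rate
    if net_rate <= 0:
        return float("inf")  # consumers keep up, so the backlog never grows
    headroom = max(quota_msgs - backlog_msgs, 0)
    return headroom / net_rate


if __name__ == "__main__":
    # Hypothetical surge: 22,000 msgs/sec produced, 7,000 msgs/sec consumed.
    seconds = seconds_until_quota(100_000, 0, 22_000, 7_000)
    print(f"quota reached after about {seconds:.1f} s of surge")
    print("messages at risk once blocked sends time out:",
          worst_case_silent_loss(150, producers=4))
```

If the expected surge is shorter than the time to reach quota, plus whatever the send timeout allows for the backlog to drain, the model predicts no silent deletions.  Beyond that, the window size caps what each producer can lose before it finds out.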

Final Thoughts

While I indicated that you should always set quotas, one-way sends are a trade-off.  You have this magnificent performance advantage that you cannot use when you have high QoS needs or require transactions.  The bright side is that you have a pretty clear picture of when you can use it. 
A partial list of critical-to-understand caveats around one-way sends:
  • The use of one-way sends within a cluster is somewhat more complicated.  See the documentation link provided, but the trick is to ensure that the connection factory and the destination are on the same physical WLS instance.
  • If the consumers cannot keep up with the producers, the performance will be determined by the capacity of the consumers (and enabling one-way sends may make little difference).
  • One-way sends do not directly work with distributed destinations without additional configuration.  It’s somewhat more complicated than other options.
  • Same link, but notice that messages sent while the destination is over quota may be silently deleted when using one-way sends.
  • Enabling one-way sends effectively disables the WebLogic JMS Flow Control feature.  That said, you can still use quotas and quota blocking sends as a means to provide some control over message producers.
  • A number of QoS features implicitly disable one-way sends.  An understanding of the way the one-way send option works should help you remember which settings interfere or interact with it.
Next up in this series, MessagesMaximum!
