Tuesday, March 20, 2012

WebLogic Clusters and the Singleton Service


Ever need to have exactly one object (a single object method invocation) in a cluster of many Oracle WebLogic application servers that supports failover?  The EJB 3.1 @Singleton annotation only guarantees an EJB singleton per JVM, and creating a singleton in the traditional Java SE-fashions (see Joshua Bloch’s article at Dr Dobbs) only guarantees a singleton per class loader.
WebLogic Server provides support for such a cluster-wide singleton (here, scroll down to “Implementing the Singleton Service Interface”), which I had the chance to experiment with during the past week.  The documentation on this feature is adequate to get it running for the first time, but I thought some additional detail around it might be useful.


What is the SingletonService, and what does it provide?

weblogic.cluster.singleton.SingletonService is an interface that you can implement in your Plain Old Java Object.  It’s not applicable to EJBs, MDBs, or other objects whose lifecycles are managed by the application server.
SingletonService provides the two abstract methods activate() and deactivate().  activate() is invoked when the class becomes the designated cluster-wide singleton (i.e., server startup, failover and migration, or if the application is re-deployed).  deactivate() is more or less invoked on the inverse side of those operations: Server shutdown, failover and migration, and application un-deployment.
This is really all the Singleton Service interface provides: The invocation of the two implemented methods at the appropriate times, the guarantee that only one instance is active, and the behavior of starting activate() on another server in the cluster.  This seemingly basic functionality can be very powerful in the right use case, however.

What are some valid use cases for Singleton Service?

Singletons, even in the Java SE usage, usually require some additional consideration and planning. It shouldn’t be surprising that a singleton construct outside of the Java SE and EE APIs would require additional care as well.  The first step in correct implementation is developing an understanding for what use cases the cluster-wide singleton is (and isn’t) appropriate for.
This is not, by any means, a comprehensive list.  Feel free to add use cases you have used it for as well in the comments.

Timers / job schedulers in a cluster

The basic use case is getting a single server to fire off timed events within a cluster and support failover in the event that the server is turned off / unplugged / exploded.  You only want a single server firing off these timed events (e.g., JMS messages sent to a topic that prompts subscribers to report some kind of status).  This would be along the lines of a cluster-aware cron job. This is ordinarily not easy to achieve unless you configure a heterogeneous cluster or an external cron job– and that makes failover a concern.
SingletonService makes this use case relatively simple to implement, since you only have it active on one server at a time and failover is taken care of for you.  James Bayer wrote about this back in 2009, so you should be able to follow his example for implementation.  You may not want to have the scheduler be the singleton service, but you can use the singleton service to construct and start the timer.

Use the SingletonService to handle other un-clusterable services

What if you wanted to run a service that cannot be clustered in a meaningful way?  How about a Java-based email server?  Perhaps you need a file, FTP, or email client poller?
Using the activate() and deactivate() methods, you can create the service within the cluster exactly once, and ensure that the service will migrate over to another cluster member on server shutdown.

Single Source for State or Properties for Clustered Applications

Usually, using a database or an in-memory cache like Coherence are more desirable options to store application properties, due to both reduced complexity and ease of implementation. Using the database might not be an option, however, since it’s possible that the connection to the database may be transient or the information may be needed prior to connecting to the database.  It’s also possible that an in-memory cache is not currently in the environment.
Given that you want a single place to store and update the information, and not have to redeploy the application or restart the server to get the change to take effect, you could use the Singleton Service to store state / properties.  In this case, I’ve provided a sample.

My Example

My example is composed into two main parts: the Singleton Service POJO (and its interface), and the application that invokes methods from the “JEE world” – in this case, an Enterprise Java Bean.  I’ve got a couple of basic requirements: I need to reference the POJO from anywhere in the cluster via JNDI, and I want it to carry some kind of state.
Because I need to bind the object to JNDI, and access its methods, I need to start with an interface that extends Remote (i.e., I am going to use Remote Method Invocation).  Nothing profound, I just need modifiers to a private integer.
 package com.darrel.samples;  
 import java.rmi.Remote;  
 import java.rmi.RemoteException;  
 public interface MySingletonServiceInterface extends Remote {  
    public void setMyValue(int value) throws RemoteException;  
    public int getMyValue() throws RemoteException;  
 }  
Now I need to create the implementation:
 package com.darrel.samples;  
   
 import java.io.Serializable;  
 import javax.naming.Context;  
 import javax.naming.InitialContext;  
 import javax.naming.NamingException;  
 import weblogic.cluster.singleton.SingletonService;  
   
 public class MySingletonServiceClass implements 
    SingletonService, Serializable, MySingletonServiceInterface {  
   
    private static final long serialVersionUID = 3966807367110330202L;  
    private static final String jndiName = "MySingletonServiceClass";  
    private int myValue;  
      
    public int getMyValue() {  
       return myValue;  
    }  
      
    public synchronized void setMyValue(int myValue) {  
       this.myValue = myValue;  
    }  
   
    @Override  
    public void activate() {  
       System.out.println("activate triggered");  
       Context ic = null;  
       try {  
          ic = new InitialContext();  
          ic.bind(jndiName, this);  
          System.out.println("Object now bound in JNDI at " + jndiName);  
          myValue = 5;  
       } catch (NamingException e) {  
          myValue = -1;  
          e.printStackTrace();  
       }finally{  
          try {  
             if(ic != null) ic.close();  
          } catch (NamingException e) {  
             e.printStackTrace();  
          }  
       }     
    }  
   
    @Override  
    public void deactivate() {  
       System.out.println("deactivate triggered");  
       Context ic = null;  
       try {  
          ic = new InitialContext();  
          ic.unbind(jndiName);  
          System.out.println("Context unbound successfully");  
       }catch (NamingException e){  
          e.printStackTrace();  
       }  
    }  
 }  
The basics are there – I implement the abstract methods from Singleton Service.  I use activate() to initialize a value for myValue.    I also bind (and unbind upon deactivation) the object in JNDI as “MySingletonService” (creative, I know).  Concurrency is definitely an issue, so the synchronized modifier on setMyValue() is very, very necessary.
I created an EJB to access the POJO, and added web service annotations for testing purposes.
 package com.darrel.samples;  
   
 import javax.ejb.Stateless;  
 import javax.jws.WebMethod;  
 import javax.jws.WebService;  
 import javax.naming.Context;  
 import javax.naming.InitialContext;  
   
 @WebService  
 @Stateless(mappedName="com.darrel.samples.SingletonTestingBean")  
 public class SingletonTestingBean   
 implements SingletonTestingBeanRemote,   
 SingletonTestingBeanLocal   
 {  
    int myValue;  
   
    public SingletonTestingBean() {}  
   
    @Override  
    @WebMethod  
    public String sayHelloInternalValue(String firstname) throws Exception {  
       System.out.println("sayHelloInternalValue invoked");  
       Context ctx = new InitialContext();  
       MySingletonServiceInterface mssc = (MySingletonServiceInterface)   
             ctx.lookup("MySingletonServiceClass");  
       myValue = mssc.getMyValue();  
       return "Hello " + firstname + ", my value is " + myValue;           
    }  
   
    @Override  
    @WebMethod  
    public int addInternalValue(int myInt) throws Exception {  
       Context ctx = new InitialContext();  
       MySingletonServiceInterface mssc = (MySingletonServiceInterface)   
             ctx.lookup("MySingletonServiceClass");  
       mssc.setMyValue(mssc.getMyValue() + myInt);  
       myValue = mssc.getMyValue();  
       return myValue;  
    }  
 }  
   
Not much new here – a context lookup to the object, and simple setters and getters.  I’ve excluded the remote and local interfaces for brevity.
To build and bundle our new Singleton Service and its interface into a JAR, you will need to add weblogic.jar to your class path at build time.  Since I used a plain Java project, I had to add weblogic.jar as an external JAR to the project build path.
I added the resulting JAR to my WebLogic Domain’s /lib folder.  The EJB project can be built with the JAR in the build class path.  In Eclipse, you could do this via the “Required projects on the build path” dialog:


Now we need to configure the cluster for the Singleton Service. In the WebLogic Administration Console, navigate to your cluster, and then to the “Migration” tab.  You will need to have migration set up in some way, I used “Consensus” to avoid using a database for this example, but your production model may have different needs entirely.

Now you need to navigate to the cluster’s “Singleton Services” tab, and create a new Singleton Service.
You will want to use the fully qualified class name for the singleton:

Set your preferred server and we are ready for deployment and testing.  Deploy the EJB project to the cluster. You can use the WebLogic Test Client to verify functionality, as the EJB Web Service will provide web test points for you to use. 
In my test, I used the addInternalValue() method on the EJB hosted on server1 to add 8, returning the total value of 13.

Then, I used the sayHelloInternalValue() method from server2, using the argument “World” – note that the value displayed is 13, which implies that server2 is indeed invoking methods to the same object as server1.

This is also a good time to look at the console output – take a look at your server.out for the preferred server, you should see the System.out.println() from the activate() method.
 <Mar 2, 2012 5:25:20 PM CST> <Notice> <WebLogicServer> <BEA-000360> <The server started in RUNNING mode.>   
 activate triggered  
 Object now bound in JNDI at MySingletonServiceClass  
   
To verify migration, try shutting down the preferred server and you will then see the activate() method’s print statements in the console output of one of the other servers in your cluster.  If you shut down each server as the singleton becomes active on that server, you should see this output in the console output in every server in the cluster. 

Alternate methods of building and deploying

My initial attempt at deploying this Singleton Service was by bundling it in a JEE utility JAR that was inside of the EJB EAR, and using the weblogic-application.xml deployment descriptor of the EAR to register it as an app-scoped singleton service.  The xml snippet regarding the singleton service would look like this:
   <wls:singleton-service>  
     <wls:class-name>com.darrel.samples.MySingletonServiceClass</wls:class-name>  
     <wls:name>Appscoped_Singleton_Service</wls:name>  
   </wls:singleton-service>  
This approach has the merit of not requiring a server restart to get class changes to take effect after updating the Singleton Service – you only need to redeploy the EAR.  It also simplifies the maintenance of your build path, since the WebLogic System Libraries should already be there – unlike in my example, where I had to add the weblogic.jar file externally.  Further, this eliminates the step of modifying the cluster configuration in the Administration Console to account for the singleton.  Finally, since you are no longer adding your singleton class to the $DOMAIN_HOME/lib, administrators will not have to directly modify the class path to transition your application to production.

Implications for Use

Individually, the singleton service doesn’t benefit from the linear performance scaling of the cluster, just the failover capabilities.   This doesn’t mean that you can use the singleton service to create scalable and performance-driven services, merely that you can’t directly leverage the cluster to do so.  For example: JMS servers exist as form of singleton services within the cluster, but address scaling by hosting distributed destinations.  The constituent, physical destinations of the logical distributed destination are individually hosted by the singleton JMS servers.  In this case, however, quite a bit more is happening than simply registering a POJO in JNDI.
Concurrency is a consideration when providing access to class members in the singleton that are not thread-safe.  In the above example, I use a synchronized method to address concurrent access to the data primitive class member.  This is an observation that should be readily apparent (certainly, it would also be true in the case of any other type of singleton), but is worth mentioning as a warning.
“What are other people using this for?” you might ask.  Of the customers I have encountered, a majority have used the SingletonService as a means to create a High Availability service out of a service that is fundamentally unable to be clustered (like an FTP destination poller).  The SingletonService capability enabled them to avoid deploying the application to a stand-alone managed server, and the failover capability allows them to ensure that the service stays up as long as the cluster does.

Final Thoughts

I’d be curious to find out what you are using the Singleton Service for – please leave your use case in the comments, if you can share.

Friday, February 24, 2012

WebLogic JMS Performance Tuning Series, Part 3: Consumer-Side Message Pipelining


Following Part 1 and Part 2 of this series, this entry is more in line with Part 2 as it is a setting that is particularly useful in lower Quality-of-Service, higher volume environments.
Reusing the diagram from Part 2, consider the basic structure of the average message send():
For a moment, let’s consider what it would look like if the WebLogic JMS server had to wait for a consumer that does not allow a backlog, and confirm the receipt of each message individually.
Note that, for both participants, the waiting constitutes a significant portion (if not the majority) of time. The longer the roundtrip, the more pronounced this effect is. This results in a low utilization of both the consumer and the JMS server. We frequently don’t want this behavior – it’s slower, and the usual manner of compensating for it is to add more consumer threads.
Consider the one-way sends from Part 2: The underlying trick is to remove the waiting, in favor of accepting a diminishment of message quality of service. Message pipelining (called “Maximum Messages per Session” in the WebLogic Administration Console) setting is a JMS configuration for the messaging consumer’s connection factory that is somewhat similar, as it may require a tradeoff for added performance.

Turn on the Speed: MessagesMaximum

Maximum messages per session (MessagesMaximum for short or in the configuration XML file), like many phrases or words with “Max” in them, can make a pretty spectacular performance gain in certain scenarios and is always welcome at the best of parties.
You can find this setting in the connection factory “Client” tab in the WebLogic Administration Console.
One of the interesting things about message pipelining is that the setting works a little differently based on your JMS client type. The main purpose of message pipelining is to lower the amount of time the client spends waiting, and increase the ratio of time that is spent by the JMS server transmitting messages. The message pipeline (also referred to as a “message backlog”) is created by sending more than one message to a consumer prior to receiving an acknowledgement.

Case 1: Asynchronous Client

If you’re using an MDB or otherwise using a client with an onMessage() method and implementing a MessageListener interface, you are using an asynchronous client. When you are using an asynchronous client, the “Maximum Messages per Session” setting applies to the message pipeline on the consumer side.
The message pipeline is in effect when the consumers unable to take messages off of the destination of the JMS server as fast as the messages are put there by the producers. Until production is faster than consumption, individual messages are received by the consumer from the JMS server in a two-way send. When production outpaces consumption, messages begin to be sent in batches to available, asynchronous consumer sessions.

The batch style of message delivery from the messaging server provides both the performance benefit of lowering the number of two-way sends and generally having messages more immediately available for consumption by the onMessage() method.
A potential downside is that the message pipeline very clearly affects memory consumption on the JMS consumer side, so getting optimal performance with this setting may be a balancing act if heap consumption becomes a concern on the consumer. If the pipeline is too large, you might wind up with one consumer overwhelmed by a huge backlog of messages when the other consumers are doing nothing.
There are a few behaviors to consider prior to implementing message pipelines for asynchronous consumers:
  • Messages in the pipeline will not be in the destination’s configured sort order. This isn’t surprising – if the messages have already left the server, the server isn’t going to be sorting messages that are now on the client. The messages are sorted, however, prior to being sent in batch to the client.
  • The message pipeline is sometimes sent as a single T3 message, which makes it easier to go over the MaxT3MessageSize. Generally this is more of a concern with larger messages (> 1MB), but it depends on your pipeline size setting and the average message size.

Case 2: Synchronous Client & Prefetch Mode

If you are receiving messages with receive() (or receive(long timeout), receiveNoWait()), you’re receiving synchronously. The consumer makes a two-way call to the JMS server to see if there is a message available, and retrieves it, if possible – it’s a polling behavior. If there is no message available, the call’s thread blocks for the specified time, waiting for the next message on the destination to arrive.
This is the behavior for synchronous clients unless “Prefetch Mode for Synchronous Consumers” is enabled. You can find this in the Administration Console, under your Connection Factory settings in the “Client” tab.
Like the asynchronous client message pipelining behavior, synchronous clients with Prefetch Mode enabled receive batches of messages when the client invokes the receive() method. The number specified in “Maximum Messages per Session” will apply here, as well. Despite the batches of messages that are sent to your client JVM, the receive() method returns messages individually. De-batching takes place in the code provided in the WebLogic client libraries – so no de-batching is needed in the user-written consumer / subscriber code.
As with the asynchronous client, performance improves if pipelining results in the consumer spending less time waiting and more time processing. There is also the added benefit that, since the consumer generally receives more than one (possibly many more) message per polling attempt, which reduces the amount of polling necessary and, therefore, overall network traffic. Overall, the trend is towards higher consumer utilization.
Pipelining works differently with user transactions (XA). It also behaves differently when more than one consumer shares the same session. I invite you to read the docs on this. They state that User Transactions (XA) will either silently ignore the Prefetch Mode setting, or the consumer will fail to retrieve the message and generate an exception (the same applies to multiple consumers on the same session). The docs didn’t clarify this adequately for my purposes, so I will expound a bit after having experimented with this on WebLogic. Keep in mind that these are just my findings, and not official aims or requirements of the product.

Synchronous Clients, User Transactions and Session Sharing

WebLogic implicitly disables Prefetch Mode / pipelining for the rest of the session when:
  1. Using an XA-enabled connection factory, the first receive() on a non-transacted session is a part of a User Transaction (XA).
  2. Multiple consumers are created in the same session prior to calling the first receive().
Otherwise, pipelining for a synchronous client is enabled for the rest of the session upon the first receive() when there is a single consumer on the session, presuming Prefetch Mode is enabled in the connection factory settings.
Knowing when pipelining is enabled or disabled is imperative to understand what conditions produce exceptions. If pipelining is already enabled for a session, and you perform one of the two conditions that would have caused it to be disabled (back when the session was newly created), you’ll get an exception.

When Do I Use it and How?

I think the question is not so much, “When do I use it?” as much as it is “When do I turn it off?” The performance advantage, presuming adequate producer-side performance, is significant. Presuming you don’t have a strict need involving message sorting, there isn’t much downside as long as you are using asynchronous consumers. Even using synchronous consumers, where transactions or consumer session sharing *might* (but shouldn’t, if you’ve read this blog) impact your usage.
The primary questions to ask on whether or not to enable pipelining, in general, are:
  • Is the utilization on my consumers currently low? Am I currently creating extra consumers to compensate for consumption rate in the presence of the low client utilization?
  • Are the JMS producers getting throttled or otherwise hitting quota because message consumption isn’t happening fast enough?
  • Are my messages small? Or are my messages very large? You may gain little to no advantage from enabling message pipelines with larger messages. Grouping large messages in a batch has some pretty negative consequences, and generally makes no sense. Think of it this way: Do you think receiving an acknowledgement is the time-consuming part of transmitting a 3 megabyte message? An average message size over 100 kB should be an indicator that this setting may possess less value for you.
  • Is throughput less of a consideration than latency? If so, batching messages together may make less sense than immediate sends. You may alternately benefit from simply keeping the number of pipelined messages low, in this case.
Message pipelines are turned on and set to 10 messages per session by default. This can be a conservative setting in some scenarios, and can frequently be set higher than a few hundred if the average message size is sufficiently low. You can explicitly turn it off, by setting it to “1”.
There is no way to guess at a generalized, ideal messages-per-session setting – except on a case-by-case basis. The following questions should be answered in order to guess at the initial setting (and you can tune from there):
  • What is the expected average message size?
  • What is the expected quantity of messages of the expected average size that the consumers can support?
  • What is the expected round trip time between the JMS server and the consumer? The smaller the round trip time, the less potential advantage there is in setting the messages per session at a higher number.
Fortunately, other than these considerations, MessagesMaximum is a relatively straightforward choice – there isn’t a special cluster consideration as with one-way sends.

Case Study

I’m simply going use the scenario from Part 1 and Part 2, and add on to it. To recap, I started with an out-of-the box configuration of a WebLogic JMS server (“Base” in the graph), and used the IBM Performance Harness producer and consumer for simulation. The producer threads are set to fire out as many non-persistent, non-transactional 1 KB messages at a JMS topic as the JMS server will take. The single-threaded, synchronous consumer was set to AUTO_ACKNOWLEDGE.
Adding quotas and quota-blocking sends (“Quotas Only” in the graph) reduced the large standard deviation in message rate caused the WebLogic JMS servers holding onto more messages than could be delivered, and paging the messages to disk. It also increased overall performance considerably.
Adding one-way sends (“One-Way Sends Enabled” in the graph) reduced the number of producer threads necessary to reach the level of performance seen in the Quotas test run.
Taking this configuration, and enabling “Prefetch Mode” (because I’m using a synchronous consumer) and setting “Maximum Messages per Session” to 300 (which is a guess, based on the size of the message, the presence of only one consumer, and the short round-trip time). Like my other efforts, there’s not a lot of purpose in perfecting these settings on my local machine, so I’m not concerned with ideal settings so much as illustrating the principle.
The hypothesis from the last blog entry (that message consumption was the bottleneck) seems accurate. Altering MessagesMaximum and enabling Prefetch Mode caused a fairly linear scaling for the producers up to 4 producers. Here, the scaling stops because the single consumer thread has become saturated. We can be confident this is true because: 1) UNIX top reports the thread is utilizing 100% of a CPU core, and 2) Adding a second subscriber to the topic doesn’t cause the message rate to alter significantly (each subscriber is getting > 90k messages per second, although this is not displayed in the graph).

Final Thoughts

There are fewer reasons not to use message pipelines (MessagesMaximum / Prefetch) than there are with One-Way Sends. Message backlog is valuable when: 1) Utilization is low in your consumers and producers are waiting (either due to quotas or throttling), 2) Message size trends towards smaller messages, and 3) You are willing to accept the transaction and message ordering caveats. As presented, performance can be dramatically improved (more than 4x in this case), and is quite simple to configure.

Friday, December 23, 2011

JMS Performance Tuning Series, Part 2: One-Way Sends


WebLogic JMS Performance Tuning Series, Part 2: One-Way Sends
Part 1 of this series can be found here.  It covered quotas, a performance and availability setting that should always be set to protect your JMS server from being overwhelmed.  This entry is about an optional setting that’s valid only in certain environments – those with lower Quality of Service (QoS) needs but higher performance needs.
Consider the basic structure of a messaging system.  There’s a producer, a consumer, and there’s the messaging server.

Now, for the moment, let’s suppose that we wait for a receipt of each message we send (and for the most part, we do).  Each time the producer thread creates a message, it enters the send() method, and then the thread blocks for the length of time that it takes for the server to indicate that the message send operation completed normally – amounting to one round trip worth of time.  This is known as a two-way send. 
This send behavior is a fairly “safe behavior,” or at least a pessimistic behavior.  But what if performance needs trump the need for guaranteed delivery?

Enter One-Way Sends…

You might think of one-way sends as “fire and forget.”  The producer does not wait for the server response.  Your performance advantage per thread may vary, but you may see an increase in message production by a factor of several times – often, this factor is determined by what the round trip time is (the longer it takes for the message to be delivered, the more advantageous one-way sends tend to be with respect to performance).
If one-way send were completely fire-and-forget, producers would continue to send messages after (perhaps long after) a JMS server has become unavailable.  This is why you can specify a one-way send window size.  The window size specifies the number of one-way sends allowed before a two-way send is required.  Determining an appropriate window size is a tradeoff between performance (i.e., larger window size) and mitigating message loss when the JMS server becomes unavailable (smaller window size).  Increasing the window beyond a certain size (the size of which is determined, in part, by your network) may yield progressively less performance benefit – thus, it requires some experimentation to arrive at an appropriate setting.
In WebLogic 11gR1PS4, you can enable one-way sends in your producer connection factory, on the “Configuration->Flow Control” tab.

Now, *BEWARE* - merely changing the One-Way Send Mode to something other than “Disabled” doesn’t mean that it’s actually enabled.  You might think you have it enabled and see no performance difference whatsoever.  This is because one-way send is implicitly disabled if *anything* that requires a higher level of Quality of Service, such as:
  • Transactions
  • Persistence
  • Unit of Work / Unit of Order 
  • Client Store-And-Forward
This makes a great deal of sense – if the producer is not listening for acknowledgements, then it’s not exactly going to maintain transactional integrity.  If you want to use Unit-of-Order or Unit-of-Work, how long would it take for the producer to figure out that one of the messages in the sequence is missing?  One-way sends are also implicitly disabled when the destination specified is the name of a distributed destination (DD) – more on this later, but one-way sends can be used with DDs, but very carefully.  Finally, they are also disabled if the connection factory and destination are on different WebLogic Servers.
On the other hand, enabling one-way sends on a connection factory effectively disables flow control.  This is somewhat intuitive.  If the producer has a configuration where it’s basically getting very little to no feedback from the JMS server (say, like one return message for every one-way send window size), there isn’t adequate opportunity to tell the producer to slow down.

Making a Batch of Proof Pudding (Extra Proof, Hold the Pudding)

Here I am taking the configuration from Part 1 of this series and enabling one-way sends with a window size of 150 messages.  As before, I’m not really interested in tuning my development machine to be the best possible messaging machine it can be, so the 150 number is just a guess that is used for illustration.  To review, the test producer threads spin out as many 1 KB messages as the server will take.  There is always only one consumer thread, and it is a synchronous, non-durable consumer that is set to “auto acknowledge.”  The one-way sends test also inherits the Quotas and Quota-Blocking Sends settings from the previous blog.  The test run that I created takes the average performance over 8 minute periods, and standard deviation (where noticeable) is denoted by the range bars above and below the data points for the average.  In this case, I’m using a topic.

The first thing you might notice is how the rate for a single producer thread triples (22k messages per second vs 7.1k MPS) when compared to the “Quotas Only” dataset, which reinforces the multiplicative effect I mentioned earlier.  Since the producer, JMS server, and consumer are all located on the same machine, the round-trip time was fairly low to start out with.  Looking at UNIX top, I was able to see that the utilization for that producer thread is now higher (but not even close to 100% utilization of one CPU core).  This is also in line with our expectations – the producer spends less time waiting, and more time sending.
The second thing you will probably notice is that the best results for one-way sends occur with only one producer, and the numbers very gradually diminish after adding more producer threads.  We can tell there is an artificial or unnecessary bottleneck from watching top (or whatever equivalent you might be using – Windows Task Manager?).  The utilization for the systems cores for the producers, WebLogic, and the consumer are still very low – which means we are doing some kind of waiting.  The bottleneck is due to the message consumer and is the topic of the next blog in this series.  While the in-depth explanation is coming, mull over a couple of key points:  1) Adding several additional subscribers doesn’t change the messaging rate per subscriber, and 2) There is very low utilization on each consumer thread.  In effect, we have a nearly identical problem on the consumer side as we did on the producer side (which we addressed with one-way sends).

Clusters and One-Way Sends

In WebLogic 11gR1PS4 (and previous releases), one-way sends are not directly supported with distributed destinations.  It’s also worthwhile to note that using one-way sends in clusters is more complicated than in individual servers.  The documentation is helpful with respect to how this might work, but I thought a little additional dialog on this topic might be useful.
The fundamental “trick” of using one-way sends within a WebLogic cluster is to ensure your connection factory that your producer is using and the (non-distributed) destination are in the same application server container. 

Case 1: Single Destination within a Cluster

This is the simplest to configure of the two.  Define your connection factory and target at a single server, not the cluster.  Create the destination, and target it at the same server – the destination is a singleton within the cluster.  Logically, it should look something like this:

This topology has the advantage of one-way sends but sacrifices the horizontal scalability of a distributed destination.  It also tends to create uneven utilization within the cluster.

Case 2: Multiple Destinations in the Cluster

Here is where it can get complicated.  You could just take the notion from Case 1, and extrapolate it over the cluster.  Then you would have a cluster full of different connection factories and different destinations, and this is probably difficult to manage from a code and configuration perspective.  You’d have to figure out how to distribute your producers fairly over the independent destinations – so, you’d likely be trading complexity in your WebLogic configuration for complexity in your producer code.
This (probably) non-ideal arrangement might look a little like this:

The documentation on OTN probably lists the best approach.  In brief, target one connection factory to all of the participating servers – this will most likely be the entire cluster.  Turn on “Server Affinity” at the connection factory so that producers become pinned to the individual destinations (this will disable RMI load balancing for the external producers).  Create one destination per server, each with a distinct name in the global JNDI (or no global JNDI name) and an identical name for local JNDI.  Now ensure that the producers use the local JNDI name and the created connection factory for the context creation.
This will look something like:

Now the connection and delivery logic can be generalized between the producers – no more accounting for the individual connection factories or destinations in code.  We now have to account for load balancing, as our connections are not automatically balanced by creating a connection. 
Now that RMI load balancing no longer occurs, we need to ensure load balancing is performed somewhere so that we don’t end up with one cluster member doing all of the work.  The OTN docs cover load balancing with affinity turned on quite well.  Take a look at how getting your initial context from the cluster can result in a load balance.  Once the context is created, the code utilizing that context is now pinned to that particular instance.  Subsequent initial contexts from the same client will get load balanced to other cluster members. 
This may not result in a desired level of fairness, and so some degree of additional load balancing may be necessary (particularly if all producers create only one initial context, resulting in all utilization occurring on one server).  Then, either DNS load balancing or network load balancing may be appropriate.

Case 3: Using One-way Sends with Distributed Destinations

Now, you may remember me saying that one-way sends are implicitly disabled if the destination specified is the name of a distributed destination.  This is true – but you can still manually target the physical DD members.  In practice, this is somewhat more complex than Case 2 as you will need to know the name of the JMS server you are targeting in order to use it.  The name of the physical DD member follows the form “MyJMSServerName@myDistributedQueueName” in versions of WebLogic Server 9.0 and newer.  This may look something like this:

This approach has several notable complications.
  • The producer is responsible for ensuring the JNDI lookup uses the correct JMS server name to make sure the destination is the one hosted by the WebLogic instance that the producer is connected to.
  • Effectively, since you are forcing a connection to a particular WLS instance, your producers will also not be load balanced.  You will need to align producers with specific portions of the DD to get reasonably fair distribution of message load.
  • The previous bullets make failure recovery more complicated as well.  If server_3 goes down, and you have configured the producer to connect to the next server in the list, some kind of additional logic will be needed to push the producer back to server_3 when it comes online.
So, it’s possible, but it’s complicated.  In many cases, it may be preferable to either use Case 2 or disable one-way sends altogether and just increase the number of producer threads to reach the desired message rates.
There is a possible approach that will simplify using DDs with one-way sends: Using a Foreign JNDI server to map to each physical DD member (each on the same app server) and turn on server affinity.  The advantage is that the client application doesn’t need to maintain a list of servers and separate destinations.  I haven’t tried this out yet, but I will blog about this once I have (and if there is sufficient interest).

Silent Deletions

If a producer is just blasting away sending messages without listening for acknowledgements, it probably shouldn’t be a surprise that the JMS server may need to delete the message without immediately informing the producer.  This is triggered by exceeding quota (covered quite well here).  If you’re thinking, “Well that shouldn’t happen,” keep in mind that the alternative is to keep accepting messages over and beyond the quota and risking server instability.
You can maneuver around this issue by adjusting the send timeout, which modifies the amount of time it takes to silently delete the message if the quota condition isn’t cleared.
The maximum number of messages that can be deleted silently is defined by the one-way send window size.  The one-way send window defines how many messages the producer can send before having to do a two-way send (a send while waiting for a return).  A little bit of research on what types of message surge conditions you are looking to support can help you scope out what these settings should be in order to claim that you will have no message loss unless certain metrics are exceeded.

Final Thoughts

While I indicated that you should always set quotas, one-way sends are a trade-off.  You have this magnificent performance advantage that you cannot use when you have high QoS needs or require transactions.  The bright side is that you have a pretty clear picture of when you can use it. 
A partial list of critical-to-understand caveats around one-way sends:
  • The use of one-way sends in a cluster within a cluster is somewhat more complicated.  See the documentation link provided, but the trick is to ensure that the connection factory and the destination are on the same physical WLS instance.
  • If the consumers cannot keep up with the producers, the performance will be determined by the capacity of the consumers (and enabling one-way sends may make little difference).
  • One-way sends do not directly work with distributed destinations without additional configuration.  It’s somewhat more complicated than other options.
  • Same link, but notice that conditions where messages over quota condition may be silently deleted when using one-way sends.
  • Enabling one-way sends effectively disables the WebLogic JMS Flow Control feature.  That said, you can still use quotas and quota blocking sends as a means to provide some control over message producers.
  • A number of QoS features implicitly disable one-way sends.  An understanding of the way one-way send option works should help you remember which settings interfere or interact with it. 
Next up in this series, MessagesMaximum!

Thursday, November 17, 2011

JMS Performance Tuning Series, Part 1: The relationship between production, consumption, and quotas


After presenting at Oracle OpenWorld on WebLogic JMS performance tuning, I began to think that some additional information would be welcome in blog format.  This entry will kick off a larger number of blogs on JMS performance, starting by introducing the basics and mandatory settings (as in, “You need to set this to ensure your system is available”) and eventually move on to more advanced topics (as in, “You just might need to set this, depending on what you are doing”).

The Basics

Java Message Service as a specification is explained here, but I think a couple of basic items are needed to proceed in this blog.  First, let’s consider that in every working JMS scenario there is at least one producer sending messages to the JMS Server, and at least one consumer taking messages off of the JMS Server. 
Ideally, the consumer’s capacity to process messages is limitless, and the JMS server is at least able to receive messages and handoff to the consumer as fast as or faster than the producer is sending them.  In reality, this is hardly ever the case.  The creation of messages by producers and the capability of the consumers to process messages fluctuate over time.
When the consumer(s) is unable to keep up with the producer, this gets somewhat complicated as the JMS Server is left holding the message “surplus.”  When and if the surplus continues to build, a number of physical limitations start asserting themselves. 
From the JMS Server’s standpoint, managing the messages in this “surplus” becomes more difficult as the surplus grows.  The first line of storage occurs in the JVM heap – but eventually, the backlog may outgrow what we want to be stored in the heap.  WebLogic Server helps manage this backlog, by paging the messages out to a messaging store, but generally we don’t want to see the stored messages grow to unmanageable sizes.  Further, paging only helps in scenarios where we are experiencing a surge in messages that is fairly short-lived, and we will eventually catch up in terms of message processing.  Imagine having 100,000 messages arriving per second, and your message store has 2 GB of backlogged messages waiting for consumption.  When will they be consumed?  How many of them are still relevant / valid?  When will I run out of storage space for backlogged messages?  What if my disks cannot write fast enough to keep up with the messages being paged to them?

Enter Quotas…

Quotas allow us to set the maximum message number or maximum total size in bytes that our JMS server or destination will allow.  Set both of them.  Really, you should do it.  Quotas define, in effect, what the server or destination is willing to hold before refusing to take more messages from the producers.  In WebLogic 11gR1 PS4, you can find Quota settings in the Administration console, in MY_DOMAIN_NAME->Services->Messaging->JMS Servers->MY_JMS_SERVER_N AME under the “Thresholds and Quota” tab.
 Let’s take a look at the settings provided here:
 Neither of these stop whatever is creating the messages (e.g., stock purchases, etc.) – it just prevents the producers from swamping the JMS server by defining when the server will stop accepting new messages.  This is a good thing.  If we let the server be inundated to the point of instability by not adding quotas, periods of high volume can bring down our server (which is always bad). 
Now, there’s always the question, “How should I size my quota?”  Good question (I’m glad I asked it), and one I can’t answer on this blog entirely.  Among other concerns, this really depends on:
  • Whether or not the WebLogic Server is being dedicated to JMS or not.
  • The expected heap size is.
  • Your tolerance to latency (larger quotas tend to mean garbage collection will take longer).
  • Expected variance in producer send rate.  
So let me leave you with this thought: What is your expected fluctuation rate with your producers?  Given that, the larger the quota size is, the more you open yourself up to diminished transactions per second (due to GC, paging, etc), what is the smallest you can set the quota to and still expect to fulfill your requirements?  Keep in mind, 1/3 of the heap is a not-unreasonable quota total for all of the JMS servers present.

Send Timeouts / Quota Blocking Sends

Once a specified quota condition has been reached, the WLS JMS server will start rejecting new messages from producers until the backlog is back under the quota.  We can supplement the quota settings by altering the Send Timeout on the Connection Factory (it’s under the “Default Delivery” tab) – this will tell the producer how long to wait on the destination or JMS server prior to timing out on its send() operation.  This has the effect of causing the producer thread to block for up to the length of time specified, waiting for the message backlog to fall under the quota condition.  This is particularly useful if there are multiple producers.
From a programmatic standpoint, this is the length of time your producer thread will wait before it comes back with a timeout exception.  This has the net effect of slowing down the producers when you’ve exceeded your quota.  Do not mistake this mechanism for throttling.  Think of this as your insurance measure to guarantee your server is up, responsive, and continuing to provide messages to the consumer.  Quota blocking sends add a level of resiliency or forgiveness to the system. They enable the send operation to complete if space opens up during the wait time, which means that the application doesn’t need to handle resource exception when the quota is exceeded, especially if space is freed up quickly.
How should you determine your send timeout?  Again, depends.  After what time does your message lose its value?  What happens if your application holds onto a message for longer periods of time – does it continue to generate or queue up more messages while waiting?  For that matter, what is a more desirable behavior: Rolling back a transaction, or waiting?  I tend to keep my send timeouts under one second as a rule of thumb, but your mileage may vary.

Final Words on Quotas

I don’t often say “always configure x this way,” but always, always, always configure message quotas on WebLogic JMS.
Take a look at this graph, which shows a sample JMS server with no quotas set under heavy load, showing the average message rates for an 8 minute period:

This graph shows results from a performance harness that floods the server with as many messages as possible. This isn’t exactly a real-world scenario, but it can help me illustrate my point. Can you see how the standard deviation rate becomes more pronounced as more producers are added?  This is not good – the standard deviation range is larger than the actual average message rate, which means we simply don’t have predictable performance or availability. The server is over-loaded. It is doing everything it can to keep up with the unyielding stream of messages by paging to disk, which also results in frequent full garbage collections.
Now take a look at what happens when we add a quota of a size (100 MB) that I chose without research or tuning (i.e., there could be better or worse quota sizes, but this isn’t the point):

Note the disparity, not just in the message rate, but that the standard deviation is no longer visible.  The performance increase is experienced because the quota has prevented the server from being overwhelmed. 
Until next time.