Tuesday, March 20, 2012

WebLogic Clusters and the Singleton Service

Ever need to have exactly one object (a single object method invocation) in a cluster of many Oracle WebLogic application servers, with failover support?  The EJB 3.1 @Singleton annotation only guarantees an EJB singleton per JVM, and creating a singleton in the traditional Java SE fashion (see Joshua Bloch’s article at Dr Dobbs) only guarantees a singleton per class loader.
WebLogic Server provides support for such a cluster-wide singleton (here, scroll down to “Implementing the Singleton Service Interface”), which I had the chance to experiment with during the past week.  The documentation on this feature is adequate to get it running for the first time, but I thought some additional detail around it might be useful.

What is the SingletonService, and what does it provide?

weblogic.cluster.singleton.SingletonService is an interface that you can implement in your Plain Old Java Object.  It’s not applicable to EJBs, MDBs, or other objects whose lifecycles are managed by the application server.
SingletonService provides the two abstract methods activate() and deactivate().  activate() is invoked when the class becomes the designated cluster-wide singleton (i.e., server startup, failover and migration, or if the application is re-deployed).  deactivate() is more or less invoked on the inverse side of those operations: Server shutdown, failover and migration, and application un-deployment.
This is really all the Singleton Service interface provides: The invocation of the two implemented methods at the appropriate times, the guarantee that only one instance is active, and the behavior of starting activate() on another server in the cluster.  This seemingly basic functionality can be very powerful in the right use case, however.

What are some valid use cases for Singleton Service?

Singletons, even in the Java SE usage, usually require some additional consideration and planning. It shouldn’t be surprising that a singleton construct outside of the Java SE and EE APIs would require additional care as well.  The first step in correct implementation is developing an understanding for what use cases the cluster-wide singleton is (and isn’t) appropriate for.
This is not, by any means, a comprehensive list.  Feel free to add use cases you have used it for as well in the comments.

Timers / job schedulers in a cluster

The basic use case is getting a single server to fire off timed events within a cluster and support failover in the event that the server is turned off / unplugged / exploded.  You only want a single server firing off these timed events (e.g., JMS messages sent to a topic that prompts subscribers to report some kind of status).  This would be along the lines of a cluster-aware cron job. This is ordinarily not easy to achieve unless you configure a heterogeneous cluster or an external cron job, and either of those makes failover a concern.
SingletonService makes this use case relatively simple to implement, since you only have it active on one server at a time and failover is taken care of for you.  James Bayer wrote about this back in 2009, so you should be able to follow his example for implementation.  You may not want to have the scheduler be the singleton service, but you can use the singleton service to construct and start the timer.
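A minimal sketch of that last idea, where the singleton only constructs and starts the timer. To keep it compilable without weblogic.jar, the SingletonService interface below is a local stand-in for weblogic.cluster.singleton.SingletonService; ClusterTimerSingleton, its constructor arguments, and isRunning() are hypothetical names. A plain java.util.concurrent scheduler is used for brevity; inside WebLogic, a server-managed timer may be the better choice.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Stand-in for weblogic.cluster.singleton.SingletonService so the sketch
// compiles on its own; a real deployment implements the WebLogic interface.
interface SingletonService {
    void activate();
    void deactivate();
}

// Hypothetical job starter: the singleton owns the timer, not the job logic.
class ClusterTimerSingleton implements SingletonService {
    private final Runnable job;
    private final long periodMillis;
    private ScheduledExecutorService scheduler;

    ClusterTimerSingleton(Runnable job, long periodMillis) {
        this.job = job;
        this.periodMillis = periodMillis;
    }

    // Runs on the one server that currently hosts the singleton.
    @Override
    public synchronized void activate() {
        scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(job, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Runs on shutdown, migration, or undeployment: stop firing events here.
    @Override
    public synchronized void deactivate() {
        if (scheduler != null) {
            scheduler.shutdownNow();
            scheduler = null;
        }
    }

    synchronized boolean isRunning() {
        return scheduler != null;
    }
}
```

Because deactivate() stops the scheduler, the timed events stop on the old server before activate() starts them on the new one.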

Use the SingletonService to handle other un-clusterable services

What if you wanted to run a service that cannot be clustered in a meaningful way?  How about a Java-based email server?  Perhaps you need a file, FTP, or email client poller?
Using the activate() and deactivate() methods, you can create the service within the cluster exactly once, and ensure that the service will migrate over to another cluster member on server shutdown.

Single Source for State or Properties for Clustered Applications

Usually, a database or an in-memory cache like Coherence is a more desirable option for storing application properties, due to both reduced complexity and ease of implementation. Using the database might not be an option, however, since the connection to the database may be transient or the information may be needed prior to connecting to the database.  It’s also possible that an in-memory cache is not currently in the environment.
Given that you want a single place to store and update the information, and not have to redeploy the application or restart the server to get the change to take effect, you could use the Singleton Service to store state / properties.  In this case, I’ve provided a sample.

My Example

My example is composed of two main parts: the Singleton Service POJO (and its interface), and the application that invokes methods from the “JEE world” – in this case, an Enterprise Java Bean.  I’ve got a couple of basic requirements: I need to reference the POJO from anywhere in the cluster via JNDI, and I want it to carry some kind of state.
Because I need to bind the object to JNDI and access its methods, I need to start with an interface that extends Remote (i.e., I am going to use Remote Method Invocation).  Nothing profound, I just need accessors for a private integer.
 package com.darrel.samples;
 import java.rmi.Remote;
 import java.rmi.RemoteException;
 // Remote view of the singleton, so it can be bound in JNDI and called from any cluster member.
 public interface MySingletonServiceInterface extends Remote {
    public void setMyValue(int value) throws RemoteException;
    public int getMyValue() throws RemoteException;
 }
Now I need to create the implementation:
 package com.darrel.samples;
 import java.io.Serializable;
 import javax.naming.Context;
 import javax.naming.InitialContext;
 import javax.naming.NamingException;
 import weblogic.cluster.singleton.SingletonService;
 public class MySingletonServiceClass implements
    SingletonService, Serializable, MySingletonServiceInterface {
    private static final long serialVersionUID = 3966807367110330202L;
    private static final String jndiName = "MySingletonServiceClass";
    private int myValue;
    public int getMyValue() {
       return myValue;
    }
    public synchronized void setMyValue(int myValue) {
       this.myValue = myValue;
    }
    // Invoked on the server that becomes the cluster-wide singleton.
    public void activate() {
       System.out.println("activate triggered");
       Context ic = null;
       try {
          ic = new InitialContext();
          ic.bind(jndiName, this);
          System.out.println("Object now bound in JNDI at " + jndiName);
          myValue = 5;
       } catch (NamingException e) {
          e.printStackTrace();
          myValue = -1;
       } finally {
          try {
             if(ic != null) ic.close();
          } catch (NamingException e) {
             e.printStackTrace();
          }
       }
    }
    // Invoked on server shutdown, migration, or application undeployment.
    public void deactivate() {
       System.out.println("deactivate triggered");
       Context ic = null;
       try {
          ic = new InitialContext();
          ic.unbind(jndiName);
          System.out.println("Context unbound successfully");
       } catch (NamingException e) {
          e.printStackTrace();
       } finally {
          try {
             if(ic != null) ic.close();
          } catch (NamingException e) {
             e.printStackTrace();
          }
       }
    }
 }
The basics are there – I implement the two methods from SingletonService.  I use activate() to initialize a value for myValue.  I also bind (and unbind upon deactivation) the object in JNDI as “MySingletonServiceClass” (creative, I know).  Concurrency is definitely an issue, so the synchronized modifier on setMyValue() is very, very necessary.
I created an EJB to access the POJO, and added web service annotations for testing purposes.
 package com.darrel.samples;
 import javax.ejb.Stateless;
 import javax.jws.WebMethod;
 import javax.jws.WebService;
 import javax.naming.Context;
 import javax.naming.InitialContext;
 @Stateless
 @WebService
 public class SingletonTestingBean
 implements SingletonTestingBeanRemote,
    SingletonTestingBeanLocal {   // local interface name is an assumption
    int myValue;
    public SingletonTestingBean() {}
    @WebMethod
    public String sayHelloInternalValue(String firstname) throws Exception {
       System.out.println("sayHelloInternalValue invoked");
       Context ctx = new InitialContext();
       // Look up the singleton under the name bound in activate().
       MySingletonServiceInterface mssc = (MySingletonServiceInterface)
          ctx.lookup("MySingletonServiceClass");
       myValue = mssc.getMyValue();
       return "Hello " + firstname + ", my value is " + myValue;
    }
    @WebMethod
    public int addInternalValue(int myInt) throws Exception {
       Context ctx = new InitialContext();
       MySingletonServiceInterface mssc = (MySingletonServiceInterface)
          ctx.lookup("MySingletonServiceClass");
       mssc.setMyValue(mssc.getMyValue() + myInt);
       myValue = mssc.getMyValue();
       return myValue;
    }
 }
Not much new here – a context lookup to the object, and simple setters and getters.  I’ve excluded the remote and local interfaces for brevity.
To build and bundle our new Singleton Service and its interface into a JAR, you will need to add weblogic.jar to your class path at build time.  Since I used a plain Java project, I had to add weblogic.jar as an external JAR to the project build path.
I added the resulting JAR to my WebLogic Domain’s /lib folder.  The EJB project can be built with the JAR in the build class path.  In Eclipse, you could do this via the “Required projects on the build path” dialog.

Now we need to configure the cluster for the Singleton Service. In the WebLogic Administration Console, navigate to your cluster, and then to the “Migration” tab.  You will need to have migration set up in some way. I used “Consensus” to avoid using a database for this example, but your production model may have different needs entirely.

Now you need to navigate to the cluster’s “Singleton Services” tab, and create a new Singleton Service.
You will want to use the fully qualified class name for the singleton (in this example, com.darrel.samples.MySingletonServiceClass).

Set your preferred server and we are ready for deployment and testing.  Deploy the EJB project to the cluster. You can use the WebLogic Test Client to verify functionality, as the EJB Web Service will provide web test points for you to use. 
In my test, I used the addInternalValue() method on the EJB hosted on server1 to add 8, returning the total value of 13.

Then, I used the sayHelloInternalValue() method from server2, using the argument “World” – note that the value displayed is 13, which implies that server2 is indeed invoking methods to the same object as server1.

This is also a good time to look at the console output. Take a look at the server.out for the preferred server; you should see the System.out.println() output from the activate() method.
 <Mar 2, 2012 5:25:20 PM CST> <Notice> <WebLogicServer> <BEA-000360> <The server started in RUNNING mode.>   
 activate triggered  
 Object now bound in JNDI at MySingletonServiceClass  
To verify migration, try shutting down the preferred server and you will then see the activate() method’s print statements in the console output of one of the other servers in your cluster.  If you shut down each server as the singleton becomes active on that server, you should see this output in the console output in every server in the cluster. 

Alternate methods of building and deploying

My initial attempt at deploying this Singleton Service was by bundling it in a JEE utility JAR that was inside of the EJB EAR, and using the weblogic-application.xml deployment descriptor of the EAR to register it as an app-scoped singleton service.  The xml snippet regarding the singleton service would look like this:
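For reference, a sketch of such an entry, based on the singleton-service element of weblogic-application.xml. The class name is the one from this example; the name value is a placeholder, and the namespace declarations on the root element are omitted here.

```xml
<weblogic-application>
   <!-- Registers the class as an application-scoped singleton service -->
   <singleton-service>
      <class-name>com.darrel.samples.MySingletonServiceClass</class-name>
      <name>MySingletonService</name>
   </singleton-service>
</weblogic-application>
```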
This approach has the merit of not requiring a server restart to get class changes to take effect after updating the Singleton Service – you only need to redeploy the EAR.  It also simplifies the maintenance of your build path, since the WebLogic System Libraries should already be there – unlike in my example, where I had to add the weblogic.jar file externally.  Further, this eliminates the step of modifying the cluster configuration in the Administration Console to account for the singleton.  Finally, since you are no longer adding your singleton class to the $DOMAIN_HOME/lib, administrators will not have to directly modify the class path to transition your application to production.

Implications for Use

Individually, the singleton service doesn’t benefit from the linear performance scaling of the cluster, just the failover capabilities.   This doesn’t mean that you can’t use the singleton service to create scalable and performance-driven services, merely that you can’t directly leverage the cluster to do so.  For example: JMS servers exist as a form of singleton service within the cluster, but address scaling by hosting distributed destinations.  The constituent, physical destinations of the logical distributed destination are individually hosted by the singleton JMS servers.  In this case, however, quite a bit more is happening than simply registering a POJO in JNDI.
Concurrency is a consideration when providing access to class members in the singleton that are not thread-safe.  In the above example, I use a synchronized method to address concurrent access to the data primitive class member.  This is an observation that should be readily apparent (certainly, it would also be true in the case of any other type of singleton), but is worth mentioning as a warning.
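One related point: in the EJB above, addInternalValue() reads the value and then writes it back in two separate calls, so two callers can still interleave between the get and the set even though the setter is synchronized. A sketch of one way to close that gap is to expose a single atomic add operation. AtomicValueHolder and addToMyValue() are hypothetical names that are not part of the example's interface, and the JNDI and RMI plumbing is left out so the block runs on its own.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical state holder: the read-modify-write happens in one atomic
// step instead of a separate get followed by a set.
class AtomicValueHolder {
    private final AtomicInteger myValue = new AtomicInteger(5);

    int getMyValue() {
        return myValue.get();
    }

    // Adds delta and returns the new total atomically.
    int addToMyValue(int delta) {
        return myValue.addAndGet(delta);
    }
}

class AtomicValueDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicValueHolder holder = new AtomicValueHolder();
        Thread[] callers = new Thread[4];
        for (int i = 0; i < callers.length; i++) {
            callers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    holder.addToMyValue(1);
                }
            });
            callers[i].start();
        }
        for (Thread t : callers) {
            t.join();
        }
        // 5 initial + 4 threads * 1000 increments each
        System.out.println(holder.getMyValue()); // prints 5005
    }
}
```

A synchronized add method on the singleton would work equally well; the point is only that the increment is one operation on the server that hosts the object.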
“What are other people using this for?” you might ask.  Of the customers I have encountered, a majority have used the SingletonService as a means to create a High Availability service out of a service that is fundamentally unable to be clustered (like an FTP destination poller).  The SingletonService capability enabled them to avoid deploying the application to a stand-alone managed server, and the failover capability allows them to ensure that the service stays up as long as the cluster does.

Final Thoughts

I’d be curious to find out what you are using the Singleton Service for – please leave your use case in the comments, if you can share.

Friday, February 24, 2012

WebLogic JMS Performance Tuning Series, Part 3: Consumer-Side Message Pipelining

Following Part 1 and Part 2 of this series, this entry is more in line with Part 2 as it is a setting that is particularly useful in lower Quality-of-Service, higher volume environments.
Reusing the diagram from Part 2, consider the basic structure of the average message send():
For a moment, let’s consider what it would look like if the WebLogic JMS server had to wait for a consumer that does not allow a backlog, and confirm the receipt of each message individually.
Note that, for both participants, the waiting constitutes a significant portion (if not the majority) of time. The longer the roundtrip, the more pronounced this effect is. This results in a low utilization of both the consumer and the JMS server. We frequently don’t want this behavior – it’s slower, and the usual manner of compensating for it is to add more consumer threads.
Consider the one-way sends from Part 2: The underlying trick is to remove the waiting, in exchange for accepting a lower message quality of service. The message pipelining setting (called “Maximum Messages per Session” in the WebLogic Administration Console) is a JMS configuration on the messaging consumer’s connection factory that is somewhat similar, as it may require a tradeoff for added performance.

Turn on the Speed: MessagesMaximum

Maximum messages per session (MessagesMaximum for short or in the configuration XML file), like many phrases or words with “Max” in them, can make a pretty spectacular performance gain in certain scenarios and is always welcome at the best of parties.
You can find this setting in the connection factory “Client” tab in the WebLogic Administration Console.
One of the interesting things about message pipelining is that the setting works a little differently based on your JMS client type. The main purpose of message pipelining is to lower the amount of time the client spends waiting, and increase the ratio of time that is spent by the JMS server transmitting messages. The message pipeline (also referred to as a “message backlog”) is created by sending more than one message to a consumer prior to receiving an acknowledgement.
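To make the effect of the backlog concrete, here is a back-of-envelope model, not a measurement. It assumes each batch costs one round trip plus a fixed processing time per message, with no overlap between the two; PipelineModel and the timing inputs are made up for illustration.

```java
// Simplified throughput model; the round-trip and processing times are
// assumed inputs, not WebLogic measurements.
class PipelineModel {
    // Messages per second when each batch of batchSize messages costs one
    // round trip plus per-message processing time.
    static double messagesPerSecond(int batchSize, double roundTripMillis, double processMillis) {
        double millisPerBatch = roundTripMillis + batchSize * processMillis;
        return batchSize * 1000.0 / millisPerBatch;
    }

    public static void main(String[] args) {
        double roundTrip = 2.0; // assumed round trip in milliseconds
        double process = 0.1;   // assumed per-message processing time in milliseconds
        for (int batch : new int[] {1, 10, 100}) {
            System.out.printf("batch=%d -> %.0f msg/s%n", batch,
                    messagesPerSecond(batch, roundTrip, process));
        }
    }
}
```

With these inputs the modeled rate climbs from roughly 480 messages per second at a batch of 1 to roughly 8,300 at a batch of 100, and the gain flattens as processing time, rather than waiting, starts to dominate.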

Case 1: Asynchronous Client

If you’re using an MDB or otherwise using a client with an onMessage() method and implementing a MessageListener interface, you are using an asynchronous client. When you are using an asynchronous client, the “Maximum Messages per Session” setting applies to the message pipeline on the consumer side.
The message pipeline is in effect when the consumers are unable to take messages off of the destination on the JMS server as fast as the producers put them there. Until production outpaces consumption, individual messages are received by the consumer from the JMS server in a two-way send. Once production outpaces consumption, messages begin to be sent in batches to available, asynchronous consumer sessions.

The batch style of message delivery from the messaging server provides two performance benefits: it lowers the number of two-way sends, and it generally makes messages more immediately available for consumption by the onMessage() method.
A potential downside is that the message pipeline very clearly affects memory consumption on the JMS consumer side, so getting optimal performance with this setting may be a balancing act if heap consumption becomes a concern on the consumer. If the pipeline is too large, you might wind up with one consumer overwhelmed by a huge backlog of messages when the other consumers are doing nothing.
There are a few behaviors to consider prior to implementing message pipelines for asynchronous consumers:
  • Messages in the pipeline will not be in the destination’s configured sort order. This isn’t surprising – if the messages have already left the server, the server isn’t going to be sorting messages that are now on the client. The messages are sorted, however, prior to being sent in batch to the client.
  • The message pipeline is sometimes sent as a single T3 message, which makes it easier to go over the MaxT3MessageSize. Generally this is more of a concern with larger messages (> 1MB), but it depends on your pipeline size setting and the average message size.

Case 2: Synchronous Client & Prefetch Mode

If you are receiving messages with receive() (or receive(long timeout), receiveNoWait()), you’re receiving synchronously. The consumer makes a two-way call to the JMS server to see if there is a message available, and retrieves it, if possible – it’s a polling behavior. If there is no message available, the call’s thread blocks for the specified time, waiting for the next message on the destination to arrive.
This is the behavior for synchronous clients unless “Prefetch Mode for Synchronous Consumers” is enabled. You can find this in the Administration Console, under your Connection Factory settings in the “Client” tab.
Like the asynchronous client message pipelining behavior, synchronous clients with Prefetch Mode enabled receive batches of messages when the client invokes the receive() method. The number specified in “Maximum Messages per Session” will apply here, as well. Despite the batches of messages that are sent to your client JVM, the receive() method returns messages individually. De-batching takes place in the code provided in the WebLogic client libraries – so no de-batching is needed in the user-written consumer / subscriber code.
As with the asynchronous client, performance improves if pipelining results in the consumer spending less time waiting and more time processing. There is also the added benefit that the consumer generally receives more than one (possibly many more) message per polling attempt, which reduces the amount of polling necessary and, therefore, overall network traffic. Overall, the trend is towards higher consumer utilization.
Pipelining works differently with user transactions (XA). It also behaves differently when more than one consumer shares the same session. I invite you to read the docs on this. They state that User Transactions (XA) will either silently ignore the Prefetch Mode setting, or the consumer will fail to retrieve the message and generate an exception (the same applies to multiple consumers on the same session). The docs didn’t clarify this adequately for my purposes, so I will expound a bit after having experimented with this on WebLogic. Keep in mind that these are just my findings, and not official aims or requirements of the product.

Synchronous Clients, User Transactions and Session Sharing

WebLogic implicitly disables Prefetch Mode / pipelining for the rest of the session when:
  1. Using an XA-enabled connection factory, the first receive() on a non-transacted session is a part of a User Transaction (XA).
  2. Multiple consumers are created in the same session prior to calling the first receive().
Otherwise, pipelining for a synchronous client is enabled for the rest of the session upon the first receive() when there is a single consumer on the session, presuming Prefetch Mode is enabled in the connection factory settings.
Knowing when pipelining is enabled or disabled is imperative to understand what conditions produce exceptions. If pipelining is already enabled for a session, and you perform one of the two conditions that would have caused it to be disabled (back when the session was newly created), you’ll get an exception.
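The rules above can be restated as a small state model. This is only a reading of the findings described here, not WebLogic code: PrefetchSessionModel and its methods are hypothetical, and the model raises its exception on the next receive(), which is an assumption about when the failure surfaces.

```java
// Illustrative model of the behavior described above; not WebLogic code.
class PrefetchSessionModel {
    enum State { UNDECIDED, ENABLED, DISABLED }

    private final boolean prefetchConfigured; // "Prefetch Mode" on the connection factory
    private State state = State.UNDECIDED;
    private int consumers = 0;

    PrefetchSessionModel(boolean prefetchConfigured) {
        this.prefetchConfigured = prefetchConfigured;
    }

    void createConsumer() {
        consumers++;
    }

    // Models a synchronous receive(); inUserTransaction means the call runs
    // inside an XA user transaction.
    void receive(boolean inUserTransaction) {
        boolean disqualified = inUserTransaction || consumers > 1;
        if (state == State.UNDECIDED) {
            // The first receive() decides the mode for the rest of the session.
            state = (prefetchConfigured && !disqualified) ? State.ENABLED : State.DISABLED;
        } else if (state == State.ENABLED && disqualified) {
            // Assumption: the failure surfaces on the next receive().
            throw new IllegalStateException("session is already pipelining");
        }
    }

    State state() {
        return state;
    }
}
```

In this model, a session with one consumer and a plain first receive() ends up ENABLED, and a later transactional receive() or second consumer on that session produces the exception; a session that starts out disqualified simply stays DISABLED.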

When Do I Use it and How?

I think the question is not so much, “When do I use it?” as much as it is “When do I turn it off?” The performance advantage, presuming adequate producer-side performance, is significant. Presuming you don’t have a strict need involving message sorting, there isn’t much downside as long as you are using asynchronous consumers. The same holds even with synchronous consumers, where transactions or consumer session sharing *might* (but shouldn’t, if you’ve read this blog) impact your usage.
The primary questions to ask on whether or not to enable pipelining, in general, are:
  • Is the utilization on my consumers currently low? Am I currently creating extra consumers to compensate for consumption rate in the presence of the low client utilization?
  • Are the JMS producers getting throttled or otherwise hitting quota because message consumption isn’t happening fast enough?
  • Are my messages small? Or are my messages very large? You may gain little to no advantage from enabling message pipelines with larger messages. Grouping large messages in a batch has some pretty negative consequences, and generally makes no sense. Think of it this way: Do you think receiving an acknowledgement is the time-consuming part of transmitting a 3 megabyte message? An average message size over 100 kB should be an indicator that this setting may possess less value for you.
  • Is throughput less of a consideration than latency? If so, batching messages together may make less sense than immediate sends. You may alternately benefit from simply keeping the number of pipelined messages low, in this case.
Message pipelines are turned on and set to 10 messages per session by default. This can be a conservative setting in some scenarios, and can frequently be set higher than a few hundred if the average message size is sufficiently low. You can explicitly turn it off, by setting it to “1”.
There is no way to guess at a generalized, ideal messages-per-session setting – except on a case-by-case basis. The following questions should be answered in order to guess at the initial setting (and you can tune from there):
  • What is the expected average message size?
  • What is the expected quantity of messages of the expected average size that the consumers can support?
  • What is the expected round trip time between the JMS server and the consumer? The smaller the round trip time, the less potential advantage there is in setting the messages per session at a higher number.
Fortunately, other than these considerations, MessagesMaximum is a relatively straightforward choice – there isn’t a special cluster consideration as with one-way sends.
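One rough way to turn the message-size and capacity questions into a starting ceiling is to bound the batch by whichever is smaller: the memory you are willing to give the backlog on the consumer, or the transport's single-message limit mentioned earlier. This is a sketch under stated assumptions, not a tuning formula: PipelineSizing is a hypothetical helper, the byte figures are illustrative, the 10,000,000-byte limit is an assumed value that should be checked against the server's configured maximum message size, and per-message header overhead is ignored.

```java
// Hypothetical sizing helper; all byte figures are illustrative inputs.
class PipelineSizing {
    // Largest batch whose worst-case size fits both the consumer's memory
    // budget for the backlog and the transport's single-message limit.
    static long startingCeiling(long avgMessageBytes, long consumerBudgetBytes, long maxT3MessageBytes) {
        if (avgMessageBytes <= 0) {
            throw new IllegalArgumentException("average message size must be positive");
        }
        long byMemory = consumerBudgetBytes / avgMessageBytes;
        long byTransport = maxT3MessageBytes / avgMessageBytes;
        // Never go below 1, which corresponds to no pipelining.
        return Math.max(1, Math.min(byMemory, byTransport));
    }

    public static void main(String[] args) {
        long avgMessage = 1024;              // 1 KB messages
        long budget = 4L * 1024 * 1024;      // assumed 4 MiB backlog budget per session
        long t3Limit = 10_000_000L;          // assumed transport limit in bytes
        System.out.println(startingCeiling(avgMessage, budget, t3Limit)); // prints 4096
    }
}
```

The result is only an upper bound to tune down from; the round-trip time still decides how much of that headroom is worth using.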

Case Study

I’m simply going to use the scenario from Part 1 and Part 2, and add on to it. To recap, I started with an out-of-the-box configuration of a WebLogic JMS server (“Base” in the graph), and used the IBM Performance Harness producer and consumer for simulation. The producer threads are set to fire out as many non-persistent, non-transactional 1 KB messages at a JMS topic as the JMS server will take. The single-threaded, synchronous consumer was set to AUTO_ACKNOWLEDGE.
Adding quotas and quota-blocking sends (“Quotas Only” in the graph) reduced the large standard deviation in message rate caused by the WebLogic JMS server holding onto more messages than could be delivered, and paging the messages to disk. It also increased overall performance considerably.
Adding one-way sends (“One-Way Sends Enabled” in the graph) reduced the number of producer threads necessary to reach the level of performance seen in the Quotas test run.
Taking this configuration, I enabled “Prefetch Mode” (because I’m using a synchronous consumer) and set “Maximum Messages per Session” to 300 (which is a guess, based on the size of the message, the presence of only one consumer, and the short round-trip time). Like my other efforts, there’s not a lot of purpose in perfecting these settings on my local machine, so I’m not concerned with ideal settings so much as illustrating the principle.
The hypothesis from the last blog entry (that message consumption was the bottleneck) seems accurate. Altering MessagesMaximum and enabling Prefetch Mode caused a fairly linear scaling for the producers up to 4 producers. Here, the scaling stops because the single consumer thread has become saturated. We can be confident this is true because: 1) UNIX top reports the thread is utilizing 100% of a CPU core, and 2) Adding a second subscriber to the topic doesn’t cause the message rate to alter significantly (each subscriber is getting > 90k messages per second, although this is not displayed in the graph).

Final Thoughts

There are fewer reasons not to use message pipelines (MessagesMaximum / Prefetch) than there are with One-Way Sends. Message backlog is valuable when: 1) Utilization is low in your consumers and producers are waiting (either due to quotas or throttling), 2) Message size trends towards smaller messages, and 3) You are willing to accept the transaction and message ordering caveats. As presented, performance can be dramatically improved (more than 4x in this case), and the setting is quite simple to configure.