Ankur Kumar's Blog - Pragmatic Best Practices for Digital Architecture: Coherence Deployment

Coherence is a reliable in-memory data grid product offering out-of-the-box (OOTB) failover and continuous availability with extreme scalability. At times, however, we face challenges during Coherence deployment and tend to lean towards a clean restart of the entire Coherence cluster. This defeats the purpose of 24x7 availability of the data grid layer and, eventually, the availability of the dependent applications as well.

I have had this discussion with several people, so I am sharing my thoughts on an overall Coherence deployment strategy that requires no downtime and ensures continuous availability.

In my opinion, there are three high-level scenarios with respect to Coherence deployment:

Scenario 1 - Deployment of an Application That Uses the Coherence Data Grid Layer

Problem Statement: Typically, this is the case when multiple web or native applications are backed by a Coherence data grid layer. Often, the infrastructure team restarts the Coherence cluster as part of the deployment, causing downtime for the cache layer and eventually for the entire application. That downtime can be extended (even hours), because a clean restart of Coherence usually takes time.
Solution Approach:

As a best practice, Coherence cluster shutdown and restart should be avoided wherever possible. Coherence does not need a clean restart unless there are changes in libraries (which is the second scenario below).
If there is a requirement to clean up existing cache entries and replace them with new ones, that is a matter of application-level version maintenance of cache items rather than of the cache system. Typically, each cache item can carry version information (exposed through a getter method such as getVersion()), and after deployment the application can discard entries from the previous version.
You can also refer to the cache invalidation strategies that come as an OOTB feature in Coherence.
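
The version-based clean-up described above could look roughly like the following. This is only a minimal sketch under stated assumptions: the class names VersionedCacheCleanup and VersionedItem, the integer version scheme and the discardStaleEntries helper are all hypothetical, and a plain in-memory map stands in for the grid cache so that the example runs without Coherence on the classpath. On a real data grid, a server-side filter or entry processor would likely be a better fit than iterating from the client.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class VersionedCacheCleanup {

    /** Hypothetical cache value that records which application version wrote it. */
    public static final class VersionedItem {
        private final String payload;
        private final int version;

        public VersionedItem(String payload, int version) {
            this.payload = payload;
            this.version = version;
        }

        public String getPayload() { return payload; }
        public int getVersion() { return version; }
    }

    /**
     * Removes every entry written by a version older than currentVersion
     * and returns how many entries were discarded.
     */
    public static int discardStaleEntries(Map<String, VersionedItem> cache, int currentVersion) {
        int before = cache.size();
        cache.values().removeIf(item -> item.getVersion() < currentVersion);
        return before - cache.size();
    }

    public static void main(String[] args) {
        // A plain in-memory map stands in for the data grid cache (assumption).
        Map<String, VersionedItem> cache = new ConcurrentHashMap<>();
        cache.put("order-1", new VersionedItem("written by the old release", 1));
        cache.put("order-2", new VersionedItem("written by the new release", 2));

        int discarded = discardStaleEntries(cache, 2);
        System.out.println("Discarded " + discarded + " stale entries, " + cache.size() + " kept");
    }
}
```

The point of the design is that the clean-up is driven by the application after deployment, so the cluster itself never has to be stopped to get rid of old entries.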

Scenario 2 - Deployment of an Application That Uses the Coherence Data Grid Layer, with Updated Coherence Application Libraries

Problem Statement: This scenario applies when a Coherence application cache is used, particularly where read-through or write-through patterns are implemented. In this case, application-specific JAR files or libraries need to be updated on the Coherence nodes, and hence the infrastructure team tends to shut down the entire Coherence cluster and do a clean restart.
Solution Approach:

As a best practice, Coherence cluster shutdown and restart should be avoided wherever possible.
A cyclic restart (or rolling restart) can help in this case, along with version-based maintenance and cache invalidation strategies for cache items (as explained in Scenario 1).
Note that invalidation or cache item clean-up plays a critical role: even when Coherence nodes are restarted one at a time, their data remains in the data grid layer because it is automatically backed up by the other nodes. In essence, the failover feature works against a clean deployment here, so the clean-up approach needs care.
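
The rolling restart mentioned above can be sketched as a loop that restarts one node at a time and does not move on until the cluster is back to its original size. Everything here is an assumption made for illustration: the RollingRestart class, the NodeControl hooks (which in practice might wrap start/stop scripts and a JMX membership query), and the polling parameters. Member count is used as a deliberately simplified health signal; a production script would probably also wait until partition backups have been restored before touching the next node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RollingRestart {

    /** Hypothetical hooks; real ones might wrap start/stop scripts and a JMX query. */
    public interface NodeControl {
        void stop(String node);
        void start(String node);
        int liveMembers();
    }

    /** Restarts nodes one at a time and returns them in the order they were restarted. */
    public static List<String> restartOneByOne(List<String> nodes, NodeControl control,
                                               int maxPolls, long pollMillis) {
        int expected = control.liveMembers();   // full cluster size before we begin
        List<String> restarted = new ArrayList<>();
        for (String node : nodes) {
            control.stop(node);
            control.start(node);                // node comes back with the updated libraries
            int polls = 0;
            // Do not touch the next node until this one has rejoined the cluster.
            while (control.liveMembers() < expected) {
                if (++polls > maxPolls) {
                    throw new IllegalStateException(node + " did not rejoin; stopping the rollout");
                }
                sleep(pollMillis);
            }
            restarted.add(node);
        }
        return restarted;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a node to rejoin", e);
        }
    }

    public static void main(String[] args) {
        // In-memory fake so the sketch runs without a real cluster.
        Set<String> up = new HashSet<>(Arrays.asList("node1", "node2", "node3"));
        NodeControl fake = new NodeControl() {
            public void stop(String node) { up.remove(node); }
            public void start(String node) { up.add(node); }
            public int liveMembers() { return up.size(); }
        };
        System.out.println("Restarted in order: "
                + restartOneByOne(Arrays.asList("node1", "node2", "node3"), fake, 10, 0L));
    }
}
```

Aborting as soon as a node fails to rejoin keeps the rest of the cluster on the old libraries but still serving data, which is the behaviour we want when availability matters more than finishing the rollout.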

Scenario 3 - Coherence Configuration Change as part of Deployment

Problem Statement: This scenario applies when there are changes in Coherence configuration (cluster configuration or otherwise). Note that if the configuration of any Coherence node differs (even in a minor way) from the rest of the cluster, that node will be rejected by the cluster. Examples include a change in security configuration (using an override file), a TTL change, or a Coherence edition change.
Solution Approach:

The easiest approach is to shut down the entire Coherence cluster (JMX monitoring can help confirm that all Coherence nodes are down) and, after the configuration change, restart all nodes. But this defeats our purpose of ZERO DOWNTIME.
If zero downtime is needed, then we need to:

Set up an entirely new Coherence cluster (e.g. by assigning a new multicast IP address or changing the multicast port)
Make the configuration changes and do a fresh deployment on the new cluster
Do a cyclic restart of the dependent application servers so that they use the new Coherence cluster
Discard the old Coherence cluster once the applications have migrated to the new one

Multiple other deployment scenarios are possible, but they are likely to be variations of the scenarios described above (at least in my mind).

I hope this helps everyone who is seeking zero-downtime deployment without paying extra for other products, such as Oracle GoldenGate, to achieve the same.

Disclaimer:

All data and information provided on this site is for informational purposes only. This site makes no representations as to accuracy, completeness, correctness, suitability, or validity of any information on this site and will not be liable for any errors, omissions, or delays in this information or any losses, injuries, or damages arising from its display or use. All information is provided on an as-is basis. This is a personal weblog. The opinions expressed here represent my own and not those of my employer or any other organization.

Oracle Coherence is an in-memory data grid product that is also commonly used for session replication across a cluster of application server nodes. It supports a wide variety of application servers such as WebLogic, WebSphere, Tomcat and JBoss. Coherence*Web is the session management module (built on top of Coherence) used for managing session information in a clustered environment.

I would recommend the following best practices for Coherence*Web and Coherence usage, particularly for session management (they can also be applied in other Coherence scenarios):

Coherence Deployment Topology

Coherence supports three deployment modes:

• In-process - Application servers that run Coherence*Web are storage-enabled, so that the HTTP session storage is co-located with the application servers. No separate cache servers are used for HTTP session storage.

• Out-of-process - The application servers that run Coherence*Web are storage-disabled members of the Coherence cluster. Separate cache servers are used for HTTP session storage.

• Out-of-process with Coherence*Extend - The application servers that run Coherence*Web are not part of a Coherence cluster; the application servers use Coherence*Extend to attach to a Coherence cluster which contains cache servers used for HTTP session storage.

Recommendation:

If Coherence needs to extend its boundaries beyond core Coherence TCMP (the internal protocol used by Coherence), use Coherence*Extend, which supports Java, .NET and C++ clients.

In most scenarios, out-of-process is the recommended topology because it has dedicated cache server nodes running independently, which promotes a loosely coupled physical architecture.

For session replication, sharing the application server memory (heap) with Coherence through an in-process deployment creates a dependency. If application server memory usage increases, it will impact Coherence performance as well, and vice versa.

Please make sure of the following for an out-of-process configuration:

• Application server nodes are running in storage-disabled mode. You need to pass both of these parameters to the application server JVM, either on the command line or through a Coherence override file:

-Dtangosol.coherence.session.localstorage=false

-Dtangosol.coherence.distributed.localstorage=false

Please note that setting the session storage property explicitly is needed because it defaults to “true” in “session-cache-config.xml”:

………………….

<local-storage system-property="tangosol.coherence.session.localstorage"

………………….

• The dedicated Coherence nodes need to be storage-enabled (otherwise there is nobody to store the session attributes) and should use either “session-cache-config.xml” or a custom cache configuration file with the session cache configured in it:

java -Xms512m -Xmx512m \
  -cp /usr/local/coherence_3_6/lib/coherence.jar:/usr/local/coherence_3_6/lib/coherence-web-spi.war:/usr/local/coherence_3_6/lib/commons-logging-api.jar:/usr/local/coherence_3_6/lib/log4j-1.2.8.jar \
  -Dtangosol.coherence.cacheconfig=../../../webapps/example/WEB-INF/classes/session-cache-config.xml \
  -Dtangosol.coherence.log.level=6 -Dtangosol.coherence.ttl=2 -Dtangosol.coherence.log=log4j -Dtangosol.coherence.edition=EE \
  -Dtangosol.coherence.session.localstorage=true com.tangosol.net.DefaultCacheServer

Coherence Cache Topology

Coherence supports five different types of cache based upon four cache topologies:

• Local Cache Topology: Local Cache

• Partitioned Cache Topology: Distributed (or Partitioned) Cache

• Replicated Cache Topology: Replicated Cache, Optimistic Cache

• Hybrid Topology (Local + Partitioned): Near Cache

You can use the following simple guidelines for choosing the appropriate type of cache:

Scenario and recommended cache type:

Local Cache
· You need faster reads and writes
· You don’t need fault tolerance (caution: Local Cache provides no fault tolerance)

Replicated Cache
· You need faster reads with the best fault tolerance
· Writes will be comparatively good but will incur latency for copying updated data across nodes
· Typically used to store metadata or configuration data
Note: Scale-out (horizontal scalability) cannot be linear.

Partitioned or Distributed Cache
· You need faster writes together with the best fault tolerance
· Reads will be comparatively fast but depend on whether the data is read from a local or a remote node

Near Cache (backed by Partitioned Cache)
· You need faster writes together with the best fault tolerance
· Reads will be comparatively fast but depend on whether the data is read from a local or a remote node
· Local affinity boosts the performance of read-heavy applications with moderate writes
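
As a rough illustration, the guidelines above can be written down as a small decision helper. This is only a sketch that simplifies the guidance to three yes/no questions; the CacheTypeAdvisor class, the CacheType enum and the parameter names are hypothetical and not part of Coherence.

```java
public class CacheTypeAdvisor {

    /** The cache types from the guidelines (Optimistic Cache is left out of this sketch). */
    public enum CacheType { LOCAL, REPLICATED, PARTITIONED, NEAR }

    /**
     * Maps three simplified requirements onto a cache type.
     * The order of the checks mirrors the guidelines: fault tolerance first,
     * then the kind of data, then the read/write profile.
     */
    public static CacheType recommend(boolean needsFaultTolerance,
                                      boolean metadataOrConfigData,
                                      boolean readHeavyWithModerateWrites) {
        if (!needsFaultTolerance) {
            return CacheType.LOCAL;        // fastest reads and writes, no fault tolerance
        }
        if (metadataOrConfigData) {
            return CacheType.REPLICATED;   // fast reads, but scale-out is not linear
        }
        if (readHeavyWithModerateWrites) {
            return CacheType.NEAR;         // local front cache over a partitioned cache
        }
        return CacheType.PARTITIONED;      // fast writes with fault tolerance
    }

    public static void main(String[] args) {
        // Example: HTTP session data that must survive a node failure and is written often.
        System.out.println(recommend(true, false, false));
    }
}
```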

Executing Production Checklist

Coherence recommends running through a production checklist to make sure the environment and infrastructure have the recommended settings and configurations, particularly in the following areas:

• Network:

· Multicast Test: If you are using multicast clustering, this test is a must to make sure the multicast configuration is correct and working properly.

· Datagram Test: Before deploying your application, you must run this test to make sure that there is no packet loss in your network. Note that on a 1GbE network, you should use 100MB packets for testing, and the minimum (not average) success rate should be close to 100% (~98-99%).

· TTL: This is a very important setting for a multicast network; 2-5 is usually the recommended value in a production environment.

• Hardware, OS & JVM Settings

• Coherence Editions & Modes:

· Needless to say, the Coherence mode should be PROD in a production environment. It needs to be specified on the command line, as the override configuration file cannot be used for edition and mode.
-Dtangosol.coherence.mode=PROD

· By default, Coherence runs in GE (Grid Edition), so it is very important to specify the edition that matches your license and needs:

-Dtangosol.coherence.edition=EE

Note that all the nodes in a cluster should use the same edition.

Executing Performance Tuning Guidelines

Coherence suggests tuning in four areas: OS, Network, JVM & Coherence Network.

Please refer to the Coherence performance tuning guidelines (see the References section) for more details.

Enable JMX for Monitoring Coherence

Coherence provides OOTB support for JMX-based monitoring of its cluster, nodes, caches and more.

At least one node needs to act as the manager; the rest of the nodes in the cluster can publish their statistics to it using JMX.

For the management node:

-Dtangosol.coherence.management=all -Dtangosol.coherence.management.remote=true

-Dtangosol.coherence.management.jvm.all=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port= -Dcom.sun.management.jmxremote

For other nodes, you can simply remove the “tangosol.coherence.management” command-line parameter.

Also note that in the above example JMX authentication is not enabled (it should be secured in production), and the JMX port still needs to be specified.
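
Once the management node is up, the cluster can be inspected with any JMX client. The sketch below uses only the standard javax.management API to read the cluster size. It is written under a few assumptions: the management node exposes the default RMI connector on the port you configured, authentication is disabled as in the example above, and the cluster MBean is registered as Coherence:type=Cluster with a ClusterSize attribute. The ClusterSizeProbe class and its helper methods are hypothetical names used for illustration.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ClusterSizeProbe {

    /** Builds the standard RMI-based JMX service URL for a host and port. */
    public static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    /** Reads the member count from the cluster MBean (object and attribute names are assumptions). */
    public static int clusterSize(MBeanServerConnection connection) throws Exception {
        ObjectName cluster = new ObjectName("Coherence:type=Cluster");
        return ((Number) connection.getAttribute(cluster, "ClusterSize")).intValue();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("Usage: java ClusterSizeProbe <host> <jmx-port>");
            return;
        }
        JMXServiceURL url = new JMXServiceURL(serviceUrl(args[0], Integer.parseInt(args[1])));
        // The connector is closed automatically when the block exits.
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            System.out.println("Cluster size: " + clusterSize(connector.getMBeanServerConnection()));
        }
    }
}
```

A probe like this is also handy during deployments, for example to confirm that every node has rejoined after a rolling restart or that all nodes are down before a full restart.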

Using Log4J for Coherence Logs

Though Coherence has its own logging mechanism, Log4J is more beneficial in terms of log rotation and control over log levels.

Note that you can use both the Coherence log level parameter (-Dtangosol.coherence.log.level) and the Log4J configuration to control the logging level.

Follow these steps to enable Log4J for Coherence:

• Coherence does not ship with the Log4J libraries, so you need to add the following JARs to the classpath:

a. Copy “commons-logging-api.jar” & “log4j-1.2.8.jar” to the /lib folder

• Create or modify your Log4J XML file and put it in the classpath of your Coherence JVM.

• Set the command-line parameter (or use the override file) to specify the log parameter value as “log4j”.

Note that Coherence assumes the Log4J XML has a logger named “Coherence”; otherwise, you need to specify the logger name through the separate parameter “tangosol.coherence.log.logger”.

Example:

Cache Server Startup Script

JAVA_OPTS="-Xms$MEMORY -Xmx$MEMORY -Dtangosol.coherence.log.level=6 -Dtangosol.coherence.log=log4j -Dtangosol.coherence.log.logger=MyCoherence"

$JAVAEXEC -server -showversion $JAVA_OPTS -cp "$COHERENCE_HOME/lib/coherence.jar:$COHERENCE_HOME/lib/commons-logging-api.jar:$COHERENCE_HOME/lib/log4j-1.2.8.jar" com.tangosol.net.DefaultCacheServer $1

Log4J XML

......................

<appender-ref ref="CoherenceAppender"/>

....................

Review Coherence*Web Context Parameter

There are several Coherence*Web context parameters that need to be reviewed and adjusted when you install Coherence*Web in your web application, particularly the following:

• coherence-enable-sessioncontext

• coherence-session-id-length

• coherence-session-urlencode-enabled

• coherence-session-thread-locking

• coherence-sticky-sessions

• coherence-reaperdaemon-assume-locality

• coherence-enable-suspect-attributes

Note: These parameters are configured in web.xml and are instrumented when the Coherence*Web install utility is invoked.

Using Coherence as L2 Cache Provider

Coherence can also be used as the L2 cache provider for the ORM framework in use. Having Coherence as the L2 cache brings enterprise-level caching to your ORM L2 caches as well.

To configure this, you need to specify Coherence as the L2 cache provider (shown here for the Hibernate L2 cache):

• Specify Coherence as the L2 cache provider in the Hibernate configuration file:

<prop key="hibernate.cache.provider_class">com.tangosol.coherence.hibernate.CoherenceCacheProvider</prop>

• The configuration for the Hibernate L2 cache is loaded based on the following parameter. A default L2 cache configuration file is already in place.

-Dtangosol.coherence.hibernate.cacheconfig=/hibernate-cache-config.xml

References

Coherence User Guide: http://download.oracle.com/docs/cd/E18686_01/coh.37/e18690/toc.htm



Wednesday, January 15, 2014

Zero Downtime of Coherence Infrastructure (24x7 Availability) as part of Planned Deployment Strategy

Thursday, August 11, 2011

Oracle Coherence Best Practices for Session Management (Replication) in Clustered Application Servers Environment
