Fuzzy JMS Tuning for WSO2 APIM MI and Azure Service Bus

After you’ve installed APIM Micro Integrator and successfully connected to Azure Service Bus and are running high volume traffic you might run into some symptoms that the WSO2 JMS Tuning Guide doesn’t cover that is vendor specific. Many of these will vary depending on your specific situation so there may not be hard and fast rules, so consider all of these as an introduction to think about and make smart adjustments for your situation.

So let’s dive in!

Lots of JMSException: Connection timed out errors

If you get this error I’m guessing you are running on a Linux server with the modern kernel settings. A few years ago the default retry time changed to be 15 for almost all distress. Basically this means that you’ll time out after 17ish minutes but ASB has already timed out after about 2 minutes. So Java gives the above error.

But if you are running on a VM inside of Azure going to ASB you are on a super fast network that needs a client time out to happen sooner than the ASB timeout. 

I suggest reading this blog which explains this concept better than I can.

Suggestion: Change /proc/sys/net/ip4/tcp_retries2 to have a value of 5.

High Volume of Messages Going to Dead Letter Queue with Error: MaxDeliveryCountExceeded

There are three phases to this one: 

Phase 1 – Enhance Your Logging

First you’ll want to add the some of the JMS_* fields to your logging to aid in debugging. I suggest at a minimum you add these two and then log them:

  •      <property expression=”$trp:JMSXDeliveryCount” name=”JMS-X-Delivery-Count”/>
  •      <property expression=”$trp:JMS_MESSAGE_ID” name=”JMS-MESSAGE-ID”/>

 

JMS_Message_ID is important because that is the id on the ASB back-end side that can be used for research in Azure support tickets.

 

Phase 2 – Surpress QPID Buffer and Apply JMS Tuning

This is especially critical if you having servers that are running out of memory from getting too much data all at once. 

Second phase to be done if in the first you find that you are reprocessing the same message multiple times and it still goes to the DLQ. Upgrade QPID to at least version 1.6 and then add these parameters to your jndi connection string. It will tell QPID client to not grab a bunch of messages from ASB in its buffer and instead will allow WSO2 to control when it gets messages. Add this setting in your jndi.properties file in the connection for ASB: jms.prefecthPolicy.all=0&Provider.futureType=balanced. (This needs to be added after the question mark in the uri.)

Next you’ll want to review these best practices and tuning guides:

 If you are using EI 6.6.0 please also look into adding proxy-level-throttling.

Please note: Do not upgrade QPID to version 2.0 because there are breaking changes that are not yet compatible.

Phase 3 – Adjust ASB Settings

And the third thing you can do is change your Topic Subscriber, or Queue, Message Lock Duration from the default of 30 seconds to something longer like 2 or 5 minutes in the Azure Portal. Also look at changing the max delivery count from say 5 to something like 10 or 15.

Deep ASB Debugging

If you want to see step by step what is happening in QPID and synapse/axis2 engines then adjust your log4j2.properties to be similar to these changes:

diff log4j2.properties log4j2.properties.deepdebug
69c69
< appender.CARBON_LOGFILE.filter.threshold.level = DEBUG

> appender.CARBON_LOGFILE.filter.threshold.level = TRACE
209c209
< logger.org-apache-synapse.level=INFO

> logger.org-apache-synapse.level=TRACE
213c213
< logger.org-apache-axis2.level=INFO

> logger.org-apache-axis2.level=TRACE
217c217
< logger.org-wso2-carbon.level=INFO

> logger.org-wso2-carbon.level=TRACE
274c274
< logger.org-apache.level = INFO

> logger.org-apache.level = TRACE
281c281
< logger.org-apache-tomcat.level = INFO

> logger.org-apache-tomcat.level = TRACE
319c319,320
< logger.org-apache-qpid.level = WARN

> logger.org-apache-qpid.level = TRACE
> logger.org-apache-qpid.appenderRef.CARBON_LOGFILE.ref = CARBON_LOGFILE
322c323,324
< logger.org-apache-qpid-server-Main.level = INFO

> logger.org-apache-qpid-server-Main.level = TRACE
> logger.org-apache-qpid-server-Main.appenderRef.CARBON_LOGFILE.ref = CARBON_LOGFILE
325c327,328
< logger.qpid-message.level = WARN

> logger.qpid-message.level = TRACE
> logger.qpid-message.appenderRef.CARBON_LOGFILE.ref = CARBON_LOGFILE
328c331,332
< logger.qpid-message-broker-listening.level = INFO

> logger.qpid-message-broker-listening.level = TRACE
> logger.qpid-message-broker.appenderRef.CARBON_LOGFILE.ref = CARBON_LOGFILE