
Now that you know what the new architecture looks like and how to install the latest version, it is time to talk about upgrading and clustering Intermediary.

For those of you new to the product line, Intermediary is our on-premise API/Webservices/Security gateway product.


(In case you are wondering why I do not cover the Management Server and Agent: there is no supported upgrade path.

It is a fresh install; configuration components can be imported, and you can migrate the database schema to keep audit data available.

Statistical data is then no longer accessible via the UI, but it is still available directly in the database.)


Back to Intermediary (AI): in versions 8/9/10 there were three main concepts of configuration distribution:


  • Local Export/Import on AI level
  • Configuration replication via clustering
  • Deployment Export Profiles

Let’s take a look at each option and how it works in the latest version:

Local Export/Import on AI level

This option can be used to backup the configuration or to move from one environment to another.

There is no change to previous versions.

We do not recommend using this option to maintain the configuration across AI instances in the same environment.


Configuration replication via clustering

This option no longer exists. There are only a few use cases left that require a cluster configuration.

Deployment Export Profiles

Export Profiles were already used by many of our customers in older versions. They have now become the default and recommended approach for configuration distribution.
An Export Profile is a defined set of AI configuration components (e.g. Service Group, OAuth settings, Security Contract). It is persisted in AI and can be refreshed and uploaded to AMS. This is usually done each time there is a configuration change that you want to deploy to all your AI instances.


Intermediary Migration

There is no in-place upgrade. You start with a fresh installation of Aurea Monitor and then launch an Intermediary profile.

The Launcher can either run as a Docker container or a simple process.

Once Intermediary has started, you can try to access the web console: http://localhost:4400/sst/

Tip: If it does not allow you to log in, verify that the user you used exists and has the Admin role assigned ( http://localhost:4040/lgserver/admin/security/users/users_list.jsp).

If that is the case, ensure that the user directory is set and configured on the launcher profile in use ( http://localhost:4040/lgserver/admin/services/profile/profile_list.jsp).


Where to start?

Most customers have three configuration sources that we have to consider.


  • Intermediary configuration components (e.g. Service Group)
  • JVM arguments and system properties
  • Tuning done via settings.jsp


In the latest version, you should configure everything on the profile level.

No configuration changes should be made on the Intermediary directly (except for the source AI, see below).


So where do all the settings go then? The below picture outlines the migration and configuration flow. The next sections will explain this further:




First, go to your old Intermediary (8/9/10.x) and export all your configuration components to the local disk.

Next, you import everything on the new Intermediary (11.x+).


Tip: Use dedicated transports for your access points. Ideally, do not reuse the default http/https listeners.

Reusing them can cause issues with different profile configurations (e.g. source and target profiles have different default listener ports).


Now you should take some time to think about the packaging/grouping of your services/APIs.

You can group them by business-service and/or by the load they have to handle. The grouping is done via dedicated Export Profiles.

The way you define this grouping defines how easy it is to scale later.

You have to ensure that each Service Group is only added to one Export Profile.

It is not supported to assign multiple Export Profiles with the same Service Groups to an Intermediary Launcher Profile.


Tip: When selecting the exported components, ensure that you have “Always the active revision” selected.


Once all the Export Profiles are defined you have to upload each to the Management Server.

All further configuration is done on the Management Server, where you can find all the uploaded Export Profiles (http://localhost:4040/lgserver/admin/deployment/distpackages/exportedartifacts_list.jsp).


Launcher Profile creation and/or configuration is the next thing you need to do (http://localhost:4040/lgserver/admin/services/profile/profile_list.jsp). 

You might want to keep it simple and just reuse the default_intermediary_profile for all your AI instances.

That is ok, but might not be ideal.

Let me explain why I think that you should at least create one additional Launcher Profile:


Each time you make a configuration change or create a new Service Group, the Export Profile has to be uploaded to the Management Server and then provisioned to the AI instances.

If you have only one Launcher Profile, the AI instance that uploaded the configuration will also get the changes provisioned, potentially replacing changes that were not uploaded yet.

To avoid this, I recommend having a dedicated “source” AI instance (or up to one per Export Profile) with a dedicated Launcher Profile.

You can do local imports (e.g. from a lower environment), create a new configuration on it, upload to the server, and then provision to other (“target”) AI instances.


As you (hopefully) have already split your configuration into multiple Export Profiles you can spread these now across different (“target”) Launcher Profiles.

Or just one, if your configuration is not complex, you want to keep it simple, and you have no service-specific scaling requirements.


Let’s assume that I have convinced you that a dedicated Launcher Profile makes sense.

So we now create a new Launcher Profile and configure it:


Launcher Profile Configuration


There are several sections but I would like to call out a few:


  • Additional JVM Arguments
  • Additional Packages
  • User Directory
  • Listener Settings & HTTP/S Listener
  • Management (Database)
  • Provisioning Packages

Additional JVM Arguments

This is an important section.

Any setting/system property that was previously defined locally on your AI or in your settings.jsp goes in here.


Example: (com.actional.config.RetainRuntime was defined in settings.jsp)

The settings for the JMS connection have to be added here as well.

Please compare them to the settings you defined in the AMS post-custom.conf. If they do not match, you will not be able to provision the AI instance.
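As a sketch of what this section could contain: the property names below are the ones used elsewhere in this post (in the settings.jsp and post-custom.conf examples); the values are illustrative assumptions, not recommendations.

```
# Hypothetical "Additional JVM Arguments" for an Intermediary Launcher Profile.
# Property names appear elsewhere in this post; values are examples only.
-Dcom.actional.config.RetainRuntime=true
-DConnectionFactoryExt=sonicmq.js
-Dcom.actional.jms.ext.sonicmq.url=tcp://localhost:2506
-Dcom.actional.jms.ext.sonicmq.username=Administrator
-Dcom.actional.jms.ext.sonicmq.password=Administrator
```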


Additional Packages

This section defines which packages the launcher should download.

Once downloaded they are either started (e.g. Intermediary) or used by the main package (e.g. SonicMQ JMS libraries).

You can define your own packages to provide additional libraries (e.g. Saxon for XSL 3 support on Intermediary) or custom Java Agents.


User Directory

In previous versions, users were managed locally on Intermediary, or you configured a third-party directory (e.g. LDAP) on each instance.

Now, this is also managed centrally. You can either use the Local Users from AMS (http://localhost:4040/lgserver/admin/security/users/users_list.jsp ) or you can configure centrally an External User Directory (http://localhost:4040/lgserver/admin/security/directory_services/externaldir_list.jsp ).


Listener Settings & HTTP/S Listener

The listeners you configure here replace what you did in version 8/9/10 via “http://localhost:4400/appsrv/”.

This is the Jetty listener configuration and not the AI HTTP/S transport configuration.

Transport configuration is still done on the (“source”) Intermediary console and then uploaded to the AMS using the Export Profile.

You have to ensure that the ports used in the AI transports match a Jetty listener of the Launcher Profiles.

Dedicated transports (see the first Tip) for access-points are highly recommended.

Let us go through an example:

Source Intermediary (source_intermediary_profile)

  • Jetty Listener: 4400
  • Transport (AI console): http/4400
  • Service Group: SG1 using http on AP1

"http" is the default HTTP listener and automatically mapped to the first Jetty HTTP listener you configured.

Usually, this is the port you use to access the web console of AI. If you now add SG1 to an Export Profile and upload it, it will reference the default transport with the name "http".

Then you assign this Export Profile to a different AI profile where Jetty listeners are defined for 5500 and 5501. If you start this instance and try to access port 5501 you will get:

Admin and WebServices Viewer Console access is not allowed on runtime-only transport

The reason is that there is a Jetty listener but no HTTP transport to handle the request. The only transport that exists by default is “http” which is in this case mapped to 5500.

Now you might think that 5500 will still work, but it does not.

Jetty listens on the port but the http transport got overwritten by the Export Profile with port 4400.


The result is this:

Target Intermediary (target_intermediary_profile)

  • Jetty Listener: 5500, 5501
  • Transport (AI console): http/4400
  • Service Group: SG1 using http on AP1

To avoid this problem you should use dedicated (non-default) HTTP transports for your access-point configuration.

The other option is to not add the default transport to the Export Profile, but then you cannot make any configuration changes to the HTTP transport (e.g. Service Location URL).



Management (Database)

Configure the database to be used by Intermediary. Usually, this is the same as the server database.


Provisioning Packages

Now we are back to the Export Profiles which were created earlier.

You can assign one or more Export Profiles to the Launcher Profile.

Each AI instance is assigned to a given Launcher Profile.


Dynamic Scaling

Most customers had one large Intermediary instance per server. This does not really scale well.

If one service was used a lot then usually another node/server was introduced and added to the Intermediary (cluster) setup.

Now, with version 11.x+, it is much more attractive to run more, smaller instances.

You can create one Launcher Profile per Export Profile and this way you can scale on the Export Profile level.

You can manually configure how many instances of each Launcher Profile you run.

Or you can make one more step and use containerization (Docker) together with a container-orchestration system like Kubernetes.

This will give you stateless containers (everything is configured on the profile level) and dynamic scale.

Putting enough thought into the granularity of your Export Profiles is key to unlocking the full potential of this.
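For example, if each Export Profile gets its own Launcher Profile and each Launcher Profile is deployed as its own container workload, scaling becomes a one-liner. Everything below (the deployment name and replica count) is a hypothetical sketch, not shipped tooling:

```
# Hypothetical: one Kubernetes Deployment per Launcher Profile.
# Scale only the instances serving the "orders" Export Profile:
kubectl scale deployment ai-orders --replicas=4
```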


This post became a little bit longer than I thought it would. So let’s just recap what needs to be done:


  • Make a fresh install of AMS
  • Configure the launcher profile
  • Export the complete AI 10.x configuration
  • Import it on an 11.x AI
  • Define Export profiles and upload them to AMS
  • Assign them to the target profile(s)
  • Launch Intermediary instances


Feel free to leave a comment or log a support case if I totally confused you by now or there are any questions left.

In my previous blog I explained the new architecture of Aurea Monitor.

As you can imagine the architectural changes also impact the installation and basic configuration of the product.


From an installation perspective, there are four components:


  • Database
  • JMS Server
  • Management Server
  • Launcher (Agent/Intermediary)

Let’s go through them one by one and guide you through the installation process.

Note that I will install everything on localhost.

Therefore please pay attention to hostnames, usernames, and passwords when you follow this.


Aurea Monitor requires an RDBMS (MSSQL or Oracle) to persist the statistical data. The Derby database option is only supported for basic testing and does not work well under load.

A fast database will ensure a responsive Management Server user interface.

Ensure you have a database setup and ready.

During the Management Server setup tables and other database objects will be created.

JMS Server:

The communication between the Agent/Intermediary and the Management Server is done via JMS. You have the choice between three supported and certified JMS implementations:


  • Aurea Messenger / Sonic
  • Software AG Universal Messaging
  • ActiveMQ


Usually, the load on these JMS servers is very low and you can run them together with the Management Server.


For Aurea Messenger (Sonic):

If you have Aurea Messenger in place then it is recommended to simply run another broker instance (does not have to be CAA).

If you prefer a dedicated install, then follow the installation guide to install the domain manager.

To keep it simple and given the low volume you can point the Management Server to the management broker itself.

In any case, it is required to adjust the maximum temporary queue size to e.g. 10000KB.


For ActiveMQ: Download it and follow the instructions to install and run it.


For Software AG UM: Follow the Software AG product installation instructions.

Management Server:

Your database and your JMS provider should be up and running and reachable from the server that will host the Aurea Monitor Management Server.


First, verify that you have Java 8 on your PATH and that it is a supported version. The easiest way to do this is on the command line (shell). Also, ensure that JAVA_HOME is set.



C:\Users\Stefan>java -version

openjdk version "1.8.0_232"

OpenJDK Runtime Environment Corretto- (build 1.8.0_232-b09)

OpenJDK 64-Bit Server VM Corretto- (build 25.232-b09, mixed mode)


C:\Users\Stefan>echo %JAVA_HOME%





[root@centos7-box /]# java -version

java version "1.8.0_191"

Java(TM) SE Runtime Environment (build 1.8.0_191-b12)

Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)

[root@centos7-box /]# echo $JAVA_HOME


We certify with Amazon Corretto 8.

Adjust your path and installed Java version as needed (e.g. set JAVA_HOME and add %JAVA_HOME%\bin to the PATH).
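If you want to script this check, here is a small POSIX-shell sketch that extracts the major version from version strings like the ones above (the helper name is mine, not a shipped tool):

```shell
#!/bin/sh
# Extract the Java major version from a version string.
# "1.8.0_232" -> 8 (old 1.x scheme), "11.0.2" -> 11 (new scheme).
java_major() {
  v=$1
  case "$v" in
    1.*) v=${v#1.}; echo "${v%%.*}" ;;   # strip the leading "1.", then take the next field
    *)   echo "${v%%.*}" ;;
  esac
}

java_major 1.8.0_232   # prints 8
```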


I usually recommend using the silent installer to install the product. Mainly as it gives you a reproducible way of installing/scripting.

Therefore, this is what I will explain here. Of course, we also ship a GUI installer.



# optional but recommended

useradd --no-create-home --shell /bin/false aurea

mkdir /opt/aurea

vi ./


Insert this as the content of the








Then run these shell commands:


chmod +x CXMonitor_Management_Server_Enterprise_2020_R1_LNX.bin

./CXMonitor_Management_Server_Enterprise_2020_R1_LNX.bin -f


# optional but recommended

chown -R aurea:aurea /opt/aurea

Windows: file content:







Then run:

CXMonitor_Management_Server_Enterprise_2020_R1_LNX.exe -f


After running the silent installer you should have Monitor Server installed.

You can verify that the DefaultProfile has been created:


Linux: /opt/aurea/monitor/DefaultProfile

Windows: D:\aurea\MonitorServer\DefaultProfile


DO NOT start the Monitor Server yet!

The reason is that the initial profile creation would otherwise not automatically take into account the JMS settings, which we will add now.

If you did already, no worries.

You can manually add them or contact Aurea Support to assist.


Next, we have to tell the Monitor Server where our JMS provider is located and where the JMS client JAR files are.


Linux: Create/Edit /opt/aurea/monitor/common/scripts/ (alternative: /opt/aurea/monitor/DefaultProfile/bin/) and add these lines:



actional_append_cp /opt/activemq/activemq-all-5.11.1.jar

actional_sys_prop always.recreate.initial.settings true

actional_sys_prop ConnectionFactoryExt activemq.js

actional_sys_prop com.actional.jms.ext.activemq.url failover:tcp://localhost:61616

#actional_sys_prop com.actional.jms.ext.activemq.username <username>

#actional_sys_prop com.actional.jms.ext.activemq.password <password>

actional_sys_prop com.actional.destPrefix actional.Test



Aurea Messenger:

actional_append_cp /opt/aurea/messenger/MQ10.0/lib/mfcontext.jar

actional_append_cp /opt/aurea/messenger/MQ10.0/lib/sonic_Crypto.jar

actional_append_cp /opt/aurea/messenger/MQ10.0/lib/sonic_Client.jar

actional_append_cp /opt/aurea/messenger/MQ10.0/lib/sonic_XA.jar

actional_append_cp /opt/aurea/messenger/MQ10.0/lib/mgmt_client.jar

actional_sys_prop always.recreate.initial.settings true

actional_sys_prop ConnectionFactoryExt sonicmq.js

actional_sys_prop com.actional.jms.ext.sonicmq.url tcp://localhost:2506

actional_sys_prop com.actional.jms.ext.sonicmq.username Administrator

actional_sys_prop com.actional.jms.ext.sonicmq.password Administrator

actional_sys_prop com.actional.jms.ext.sonicmq.initialConnectTimeout 60

actional_sys_prop com.actional.jms.ext.sonicmq.faultTolerant true

actional_sys_prop com.actional.jms.ext.sonicmq.faultTolerantReconnectTimeout 60

actional_sys_prop com.actional.jms.ext.sonicmq.pingInterval 30

actional_sys_prop com.actional.jms.ext.sonicmq.socketConnectTimeout 10000

actional_sys_prop com.actional.destPrefix actional.Test




Create/Edit the D:\aurea\MonitorServer\common\scripts\post-custom.conf, see example content below:




Aurea Messenger: \aurea\Messenger\MQ10.0\lib\*.jar

Once you are done with this you can finally start the Aurea Monitor Server.









Open a browser and navigate to: http://localhost:4040/lgserver

Press START and follow the wizard instructions:



After you have provided the license and defined the username/password for the admin account, the database connection can be configured.

You can also configure the database later (via http://localhost:4040/lgserver/admin/configure/logging_db_list.jsp)  and leave it as ‘none’ for now.


With this, the basic setup of the Management Server is completed.




Launcher profiles can now be configured on the server side in order to launch an Agent/Intermediary.

The launcher itself is a 5MB zip file that you can extract on the target machine, or you can simply run it as a Docker container.

You only have to adjust the configuration.json to point it to the profile you want to use.
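A minimal sketch of that adjustment, assuming GNU sed and a configuration.json of the shape shown in the architecture post; the target profile name is an illustrative assumption:

```shell
#!/bin/sh
# Sketch: point an extracted launcher at a different profile by editing configuration.json.
# The file content mirrors the configuration.json example from the architecture post;
# "target_intermediary_profile" is an illustrative name.
cat > configuration.json <<'EOF'
{
    "url": "http://localhost:4040/lgserver/auto-deploy/v1/profiles/default_agent_profile",
    "username": "admin",
    "password": "secret"
}
EOF

# Rewrite only the profile part of the URL (assumes GNU sed):
sed -i 's#/profiles/[^"]*"#/profiles/target_intermediary_profile"#' configuration.json

grep '"url"' configuration.json
```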


An Agent profile will save interceptors locally and also allows you to define SDK system properties as part of the profile. This also impacts how you instrument a system and reference interceptor JARs.

An Intermediary profile defines which provisioning packages to load. These packages contain the configuration definitions for service groups to load.


In my next post, I will explain how to migrate a 10.x Intermediary configuration to 11.x

Things have changed.

The architecture of Aurea Monitor (Actional) has not received major changes for a long time.

With the release of version 11.x, a new architecture was introduced.


You might be used to this:


  • An (Actional) Agent = A Node in Management Server = A monitored server
  • Communication (Agent / Management-Server) is synchronous using HTTP
  • Intermediary can run standalone or with a Management Server
  • Agent/Intermediary configuration local to each node


Ten years ago the above was absolutely ok. Nowadays with more dynamic environments, hybrid environments (cloud & on-premise), and last but not least containerization (Docker, Kubernetes) these concepts are not sufficient anymore.

Let's look at some details of the new architecture:





This concept has been broken up. An Agent now supports monitoring more than one server/container.

The monitored system can define which Agent to report the traffic to.

The Agent will forward the received events to the server with the information of the origin (= monitored server).

The server keeps track of all the reported/known endpoints (hostnames, addresses) of a monitored server and links it to a node.


In a dynamic/containerized environment these endpoints might change. Nevertheless, you might want to ensure that Aurea Monitor always treats them the same (ignoring their unique container/machine identifier).

In other words, you want to avoid a new node for every additional machine/VM/container that is brought up.

In 11.x+ parameters exist to configure this and ensure a single node representation.



Agent / Management-Server Communication


The communication between the Agent and the Management Server is now asynchronous. Instead of HTTP, the architecture now uses JMS (e.g. Aurea Messenger).

Only topics and temporary queues are used for communication. This way the required JMS server configuration is kept to a minimum.







It is no longer possible to have a standalone Intermediary without a Management Server.

A Management Server is mandatory for each Aurea Monitor environment. All the configuration is provisioned from the server to each Intermediary instance.



Agent/Intermediary Configuration


The previous paragraphs left out one important aspect of the communication and the architecture.

The concept of the Launcher.


It is the Launcher that is initially started, not the Agent/Intermediary. This is a key component of the new architecture.

On startup, it connects to the Management Server via HTTP (this is the only part of the communication that still requires HTTP) and fetches a configuration profile.


Example configuration.json which defines server location and profile:



{
    "url": "http://localhost:4040/lgserver/auto-deploy/v1/profiles/default_agent_profile",
    "username": "admin",
    "password": "secret"
}



The profile contains the start command configuration (e.g. JVM arguments), product configuration (e.g. service groups, transports), and also the binaries (e.g. Jetty, Agent).

Once everything is downloaded to the deploy folder inside the launcher folder (e.g. C:\launcher\deploy\), the Launcher starts the configured product, e.g. the Agent, using the active deploy artifacts (e.g. C:\launcher\deploy\active).

A working directory is created and used by the launched product to store all product-specific configuration files and logs (e.g. C:\launcher\working\logs).


As a best practice you should no longer do any manual configuration via the Agent/Intermediary web interfaces in production but rely on the profile configuration.



Centralized management of configuration profiles also allows you to easily upgrade everything from the server.

Simply assign a new product version (e.g. a new release of Intermediary) to the profile and remotely restart the launcher.


The ability to remotely control all the launched processes from the server makes it easy to roll out fixes and new versions.


Many of you might already be planning to migrate due to the upcoming end of life of Flash. Mind you, only 11.x+ will receive a de-flashed UI.

Therefore installation, migration/upgrade to the new version will be the topic in one of the upcoming posts.

I ended my previous blog post with the question: “Why is the consumer application not fast enough?“

This is a very common question. The root cause of slowness or increased response time is often not very obvious.

For the end user, the system they interact with is slow. In reality, though, this is not always true.


Let me take you through an example and show what we can do to find the real root cause.


Thread dumps and logs are a starting point if the issue is not intermittent (and it is Java).

Simply take a series of thread dumps using “jstack -l <pid>” and pass them to the support team of the product in question.
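A “series” here means several dumps taken a few seconds apart, so you can see whether the same threads stay stuck. A small sketch (the helper and its defaults are my own, and the pid is a placeholder):

```shell
#!/bin/sh
# Sketch: take a series of thread dumps so stuck threads stand out across dumps.
take_dumps() {
  pid=$1; count=${2:-5}; interval=${3:-10}
  i=1
  while [ "$i" -le "$count" ]; do
    # DUMP_CMD defaults to "jstack -l"; it can be overridden for testing
    ${DUMP_CMD:-jstack -l} "$pid" > "threaddump_${pid}_${i}.txt"
    [ "$i" -lt "$count" ] && sleep "$interval"
    i=$((i + 1))
  done
}

# Usage: take_dumps 12345   # 12345 is a placeholder pid; 5 dumps, 10 seconds apart
```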

For intermittent issues and/or heterogeneous environments (e.g. .net + Java) the root cause analysis is more complex.



The issue can be load, data (size/content), or environment related. The more backend systems are involved in one request, the more complex it becomes.


For this blog post I created an example to illustrate this.

A customer tries to access a web app and is experiencing long wait times after triggering a request.



The app is a page which interacts with a REST API. The REST API itself is sending requests to a JMS queue.

From that queue an integration engine (CX Messenger, Sonic ESB) is picking up the message.

A business process flow is executed and then the response is sent back.


Sample Scenario



As you can see there are several systems involved. You might argue now that this is a constructed example.

I agree, but reality is often even more complex than this, which makes it quite hard to understand all the reasons why the final response time is so high.


Back to the example. The customer experience happens at the web page, no matter what is done behind the scenes.

The user reported a wait time of circa ten seconds until the page is loaded. Network issues as a potential cause have already been eliminated by the operations teams.

Normally, the investigation would now start with the logs of all involved components and teams.

Different technology stacks, different ops teams, potential collaboration issues, time consuming… etc.



This is where CX Monitor (Actional) can show one of its strengths.

It allows you to dig into past traffic/interactions using a date-time picker, or you can work proactively (preferred) using policies.

A policy defines a certain rule/condition/target. If the condition is met (e.g. response time > 3s), then an action can be triggered.

Typically this action is an alert inside CX Monitor (can be passed to other monitoring systems/dashboards) but can be anything you want.


Sample Policy:


Policy condition


Part of the alert is information about the interactions and (if requested) the data involved in the complete interaction flow.

The flow itself can be reviewed and drilled into. It shows the interactions between the systems and APIs.



Example Flow Map:


Flow map of interactions between systems at given point in time




For “slowness” root cause analysis I personally prefer a different view of the same data. The sequence table which is also part of the alert details in CX Monitor.

In our example it clearly shows us where the time is spent.


Example Sequence Table:


Sequence table showing time spent in each app


The ten seconds reported by the customer are confirmed by this. The sequence table shows that the time is spent in these components:


  1. 4 seconds in the aspx page before the REST call is made
  2. 2 seconds on the ESB business process on log file write
  3. 3 seconds on a JDBC/SQL call done by the app that is exposing the REST API
  4. 1 second again on this REST API app after the database call


Having this information promptly at hand can save a lot of time and gives valuable insights into monitored systems.

Monitoring can be done for a single application (e.g. the Aurea CRM connector/interface/webservices) or a complete heterogeneous environment, with a seamless view of interactions across technology stacks (.NET, Java, etc.).

You can even add monitoring capabilities to your custom solution app.


You have access to all this via Aurea Unlimited which means no extra cost for your company.


Further reading:

Is it possible to monitor ACRM using CX Monitor (Actional)?

This is really something we hear regularly in Aurea Support and in most cases flow control is the cause.

Many customers have heard about flow control, but most are not fully aware of the details.

Some even consider it a product bug or limitation.

I understand that it can cause pain, but there is a reason for it, which is why I thought it was worth explaining in more detail:


What is flow control?

In a messaging system you always have a producer and a consumer. Ideally the consumer is at least as fast at processing messages as the producer. In reality this is not always possible.

Reasons are spikes in load, outages on consumer side or simply not well designed architecture.

CX Messenger (Sonic) provides of course some buffers but once these are full, message processing is impacted.

By default the producer is simply blocked until space is available on broker side to take the next message.

This is what we call flow control.


Let’s get into a bit more detail on this per JMS messaging domain:


Point-to-Point (Queues)

Recap of PTP basics: n producers per queue, n consumers per queue allowed, only one of the consumers of the queue will get the message.


If the consumers are not fast enough (or disconnected) the broker will queue the messages per queue. Each queue has two configuration options, Save Threshold and Maximum Size.

The Maximum Size defines how many kilobytes of message data the queue can hold.

The Save Threshold defines how much of this data is kept in memory; the rest goes to disk.

Once the Maximum Size is reached flow control kicks in.
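A rough back-of-the-envelope sketch of when that happens (the numbers are purely illustrative, not recommendations):

```shell
#!/bin/sh
# Back-of-the-envelope: how many messages fit in a queue before flow control kicks in.
max_size_kb=100000   # queue "Maximum Size" in KB (illustrative)
avg_msg_kb=25        # average message size in KB (illustrative)
echo $((max_size_kb / avg_msg_kb))   # prints 4000: messages buffered before producers block
```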


Publish/Subscribe (Topics)

Recap of PubSub basics: n producers per topic, n consumers per topic allowed, each consumer of the topic will get the message.


If the consumers are not fast enough the broker will queue the messages per subscriber. Each subscriber has buffers which are configured globally in the broker properties.

Once the buffer (per subscriber) of one subscriber (of given topic/pattern) is full, flow control kicks in on the particular topic. This means the slowest subscriber defines/limits the message delivery rate to all subscribers.

To be clear: at that point all the other subscribers on that topic no longer get messages and the publisher is blocked.

(In case you were wondering: yes, it is key to detect this slow subscriber to prevent flow control. We will get there soon.)


Can I avoid flow control?

Now that you know that there are limiting factors, questions might be:


     "How to avoid such situations?

     Or how can flow control be avoided at all?

     But is it really a bad thing?

     Does it even help in your architecture?"


The CX Messenger JMS API allows you to disable it, which will instead cause an exception on the message producer side once flow control would kick in.

In most architectures though you would not want to do that, but rather get to the bottom of the cause and act accordingly.


So how can you avoid/reduce flow control? As you might guess there is no simple answer to it. It all depends on the cause and is very specific to each implementation.

There are buffers and there is the pace at which messages are produced and consumed. These are the key factors that you have to look at.



  • For PTP you can increase the number of consumers to ensure messages are consumed faster. A larger maximum queue size will help with spikes in messaging load, but will increase latency (messages might stay longer in the queue).
  • Similar to PTP you can increase buffers for PubSub, but again there is latency impact and also memory impact. In addition there is this magic switch called “Flow To Disk” which allows you to use the whole hard disk as buffer.


     “So I just enable that magic switch and all good, great!”


Wrong, let me stop your enthusiasm here for a moment.

I personally think Flow To Disk is the worst feature we have.

You wonder why?

The feature itself is great, but the way it is often used causes issues. It simply hides bad architecture and bad configuration. People tend to enable it by default and do not want to invest in proper load tests and architectural/configuration changes. Then, once everything is stuck (e.g. the disk is full or the memory reference buffer is full), Aurea Support is pulled in and is supposed to fix it.

At this stage though most projects are already live and cannot easily make major changes.

Hopefully this blog post helps you to not make the same mistake.


FlowToDisk notification:


Back to PubSub: Another option to avoid/reduce flow control is to use shared/grouped subscribers.

It will ensure that each message is only consumed once per shared group.

This allows you to have parallel processing of messages per group but only once per message.


How do I know what the cause of flow control is in my architecture?

I hope by now you are convinced that flow control is great and Flow to Disk has to be used with caution.

So the question is: how do you even know that you run into flow control?


To detect whether your current deployment is stuck due to flow control, the quickest way is to get a Java thread dump using "jstack -l <pid>".

Look for threads blocked within a 'Job.join' call inside a send or publish. This indicates that the client is waiting to send a message to the broker and is most commonly due to flow control.


For example:


"JMS Session Delivery Thread" (TID:0x101E7D30, sys_thread_t:0x3DDDBE8, state:CW, native ID:0x1F9C) prio=5

    at java.lang.Object.wait(Native Method)

    at java.lang.Object.wait((Compiled Code))

    at progress.message.zclient.Job.join((Compiled Code))

    at progress.message.zclient.Publication.join((Compiled Code))

    at progress.message.zclient.Session.publishInternal((Compiled Code))

    at progress.message.zclient.Session.publishInternal((Compiled Code))

    at progress.message.zclient.Session.publish((Compiled Code))

    at progress.message.zclient.Session.publish((Compiled Code))

    at progress.message.jimpl.MessageProducer.internalSend((Compiled Code))
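When you have a series of dumps, a quick way to scan them for this signature (the helper name is mine, not a shipped tool):

```shell
#!/bin/sh
# Sketch: flag thread dumps containing the flow-control signature, i.e. a thread
# blocked in progress.message.zclient.Job.join during a send/publish.
has_flow_control() {
  grep -q "progress.message.zclient.Job.join" "$1"
}

# Usage: has_flow_control threaddump_1.txt && echo "likely flow controlled"
```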




From a proactive monitoring perspective there are several options that the product offers.

Which of the options is best for you depends on product usage.


You can set up flow-control-related broker notifications. PubPause/SendPause notifications are the starting point.

There are additional notifications (e.g. interbroker flow control) as well, which you should familiarize yourself with.

These notifications may cause a lot of noise, and operations teams rarely really investigate them.

Some advanced teams offload these to Elasticsearch for analytics. Of course, the better you configure the system, the less noise there is.

These notifications allow you to identify which consumer is causing flow control. The details are available in the PubPause notification:




Note: PubPause/PubResume does not apply/work if you use a shared/grouped subscription!

     (SlowSubscriber and BackloggedSessionSkip are key here, see below)


Especially for PubSub, flow control monitoring has more options. If you have enabled Flow To Disk, the disk usage of the PubSub store and the memory usage of Flow To Disk can be monitored.

There is another notification which helps to identify slow subscribers; especially (but not only) for shared subscribers this is super helpful: application.session.SlowSubscriber



If a message is stuck for a defined number of milliseconds at the front of the subscriber's buffer, a notification is generated.

This does not replace PubPause but it allows you to detect stuck messages even if no flow control kicked in (yet).

(For PTP, the queue.messages.TimeInQueue notification is the best equivalent. It allows you to get notified if a message is pending for too long in a queue.)


Related to the slow subscriber monitoring there is another corner case where a shared subscriber might back up on one member of the group. Normally this would cause the whole group to be slowed down, but might not even cause flow control. In more recent releases this has been improved to favor the faster clients while distributing messages in a group.


A new notification application.session.BackloggedSessionSkip is raised to identify clients that are backing up.



Once you have identified the consumer(s) causing this, the next question is: Why is the consumer application not fast enough?


The answer to that will be given in my next blog post.





How can a thread dump be generated from a Sonic Container or Client?

Assessing Flow Control condition.

How to monitor subscribers to identify slow message consumption?

Slow shared subscriber impacts other subscribers in the group

Monitoring for flow control using the Sonic Management Console

What is Flow to Disk?

Under what condition a publisher might get flow controlled even though flow to disk is enabled?

Publisher flow controlled even though FlowToDisk is enabled.