ATS : Increasing the speed we send some events to the log DB

Introduction


In rare cases, it may turn out that our logging system cannot handle a large amount of messages or events that are pending to be written in the logging DB.

In such a case you will see a Queue full error.

The reason for the problem comes from the fact that in order to have non-blocking logging, ATS caches the log events in a queue with a limited size, and when the queue gets overfilled this error comes up.



Monitor the log events queue performance

Prior to fixing the performance of the events queue, it is good to first see how the number of cached events in the queue is changing during the test execution.

Enable trace messages

You can enable trace messages on the ATS agent side by adding a system variable called ats.log.monitor.events.queue with the value true when starting the agent.

Edit the agent.sh or agent.bat by setting MONITOR_EVENTS_QUEUE=true.

By doing this you will see in the agent's console repetitive lines about:

  • number of new events that can be cached
  • how long time it took to commit the last chunk of events. This is in the case when using batch logging mode

Using Java code

See the number of cached messages in the events queue of the Test Executor :

import com.axway.ats.log.appenders.ActiveDbAppender;

int numberEventsStillNotInTheDB = ActiveDbAppender.getCurrentInstance().getNumberPendingLogEvents();

See the number of cached messages in the events queue of some remote ATS agent :

import com.axway.ats.agent.webapp.client.AgentConfigurationClient;

int numberEventsStillNotInTheDB = new AgentConfigurationClient( <AGENT ADDRESS> ).getNumberPendingLogEvents();



Reasons for the problem and options to fix it

Very intensive logging

This usually happens when running performance tests with many threads that run very quick actions - for example, you might be doing file transfers which take a few milliseconds each. This sends lots of log events in order to register the start and end of each action. Additionally, the actions themselves can send come log events as well.

A possible solution is to change the log level as explained here

If for example, you have set the log level to DEBUG and change it to WARN, this will greatly reduce the number of log events. Such change instructs to be logged only messages with severity Warning and up (like Error and Fatal).

Slow connection to the DB


May happen when the DB is not in the same geographical location as the test running machines.

A possible solution is to use an instance of Test Explorer located in your local network.

Very large DB

When our DB tables are getting very long - tens of millions of records

Possible  solutions are:

  • Delete not needed runs
  • Use more than one DB - one for spare runs and one for official runs
  • Upgrade your DB hardware



Increase the events queue capacity

One possible option to ease the problem is to increase the size of the log events queue. The default capacity is 100 000 and can be modified by adding the following line in your log4j.xml

<param name="events" value="200000" />

Of course, this is not going to speed up the insertions in the log DB, but will give more room for caching log events.

In some cases, this can be your work-around. But in general, if your test execution logging continues for long enough, at some moment you will still hit the queue full error.



Insert log events faster

The list above can give you some hints on how to proceed, but there is one more way to use - let ATS send log events faster.

How does it work?

You can set ATS to send the most often log events in batch mode. This means that many events are cached in an internal buffer and when the buffer is full, the whole batch of events is sent to the log DB as one transaction and this causes some performance boost.

Note that the following options are only applicable to SQL Server log DB.

Since 4.0.7, you can also set the chunk size for the batch mode and the driver to be used (MSSQL/JTDS)This allows you to specify how many checkpoints/action responses will be flushed at once to the log DB.

And once again since 4.0.7, you can customize the flush timeout for batch mode. This way, if your performance tests are running too long, you can minimize the amount of times ATS flushes data to the log DB, which greatly improves logging operations' speed.

So to recap for best performance, set the following parameters in your DB appender section in the  log4j.xml:

<param name="mode" value="batch" />
<param name="driver" value="mssql" />
<param name="chunkSize" value="25000" /> <!-- Try different values for that parameter, as there is no one value fits all -->

And start the ATS agents/loaders with the following JAVA_OPTS:

/path/to/agent.sh -java_opts "-Dats.log.db.max.cache.events.flush.timeout=<flush_timeout_in_seconds>" start|restart

How faster it gets?

When you do DB inserts it is important how long are the tables you work with. Databases with tables containing up to 10 - 20 million lines are considered small, while such with 50+ million lines are very large databases.

Our tests show 5 times quicker inserts on small DBs and 2 times quicker inserts on very large DBs.

Which events are cached?

We use batch mode only for the events that are proven to come in thousands for a single test execution - the message and action response events

How to switch to batch mode?

All you need to do is to add the following line inside the DB appender in your log4.xml file:

<param name="mode" value="batch" />

A log4j.xml example is provided here

What are the drawbacks?

  1. When running performance tests, if you look at the Test Explorer's Performance actions tab you will see how many actions are currently running, passed, and failed. When using batch mode you will not see running, but only passed and failed ones.
  2. Some events may take longer to be registered into DB. This may happen when the internal cache is not filled, in this case, the events will be sent to the DB after they get too old which is 10 seconds. This means there is a chance that some events will be registered in the DB with a delay of up to 10 seconds.
  3. If you kill the JVM some events from the cache may not get in the DB. This is not considered as a problem, because you are going to kill a test run only in case when the log results are not important to you.

When should this option be used?

We advise you to use this option when you need it i.e. when you get Queue full error.

Since version 4.0.7 there is an additional option to speedup insertion and prevent Queue full error. It is to use Microsoft's JDBC driver for logging ( SQL Server DBs only ). This is done by adding property driver = mssql in the ActiveDbAppender section in the log4j.xml file. For backward compatibility reasons, the default SQL Server JDBC driver used is jTDS.

If this also does not help you, then please contact ATS team ( mail is in the project's README.md file).



Back to parent page

Go to Table of Contents