Saturday, September 1, 2012

ESB performance

I recently happened to do some performance tests on a system consisting of a back-end service and an ESB proxy service that passes messages through to it. In the real deployment the proxy needs to route messages based on HTTP headers, but for measuring performance it is reasonable to use it as a simple pass-through proxy. The problem was that when messages were routed through the ESB, the system throughput dropped.
First I created a back-end service using an ADB Axis2 service and deployed it in WSO2 AS 4.1.2. Then I measured the throughput of this service with a 780-byte message using the Java Bench tool; it gave around 13,000 TPS. Each run sent 500,000 messages at different concurrency levels (for this I edited the mgt-transports.xml file to increase the number of threads, roughly as sketched after the results).

-n 500 -c 1000 - 13,140.52
-n 1000 -c 500 - 13,747.96
-n 2000 -c 250 - 13,004.18
-n 5000 -c 100 - 13,297.51
-n 10000 -c 50 - 13,039.79
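
For reference, a rough sketch of the kind of change I made in mgt-transports.xml is shown below. The exact parameter names come from the underlying Tomcat connector configuration, so treat both the names and the values here as illustrative rather than exact.

<!-- mgt-transports.xml: raise the HTTP connector thread limits (illustrative values) -->
<transport name="http">
    <parameter name="port">9763</parameter>
    <parameter name="maxThreads">1000</parameter>
    <parameter name="minSpareThreads">250</parameter>
    <parameter name="acceptCount">1000</parameter>
</transport>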

Then I created a proxy service using WSO2 ESB 4.0.3 and enabled the binary relay (this can be done by editing axis2.xml and setting the relay message builder and formatter; see the sketch after the results) to send messages through the ESB. Then I measured the performance of this system again with 500,000 messages using Java Bench. Here are the results.

-n 500 -c 1000 - 1,179.16
-n 1000 -c 500 - 1,360.03
-n 2000 -c 250 - 1,471.56
-n 5000 -c 100 - 1,532.91
-n 10000 -c 50 - 1,526.42
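
For reference, a minimal sketch of the two configuration pieces involved is shown below: the relay builder and formatter entries in axis2.xml (the class names are the ones shipped with WSO2 ESB, and the content types listed are only examples), plus a plain pass-through proxy in the Synapse configuration (the proxy name and back-end URL are placeholders).

<!-- axis2.xml: carry the payload as binary instead of building the XML tree -->
<messageBuilders>
    <messageBuilder contentType="text/xml"
                    class="org.wso2.carbon.relay.BinaryRelayBuilder"/>
    <messageBuilder contentType="application/xml"
                    class="org.wso2.carbon.relay.BinaryRelayBuilder"/>
</messageBuilders>
<messageFormatters>
    <messageFormatter contentType="text/xml"
                      class="org.wso2.carbon.relay.ExpandingMessageFormatter"/>
    <messageFormatter contentType="application/xml"
                      class="org.wso2.carbon.relay.ExpandingMessageFormatter"/>
</messageFormatters>

<!-- Synapse configuration: pass-through proxy forwarding to the back-end service -->
<proxy xmlns="http://ws.apache.org/ns/synapse" name="PassThroughProxy" transports="http">
    <target>
        <endpoint>
            <address uri="http://localhost:9763/services/EchoService"/>
        </endpoint>
        <outSequence>
            <send/>
        </outSequence>
    </target>
</proxy>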

Routing through the ESB reduced the throughput by roughly a factor of 8. After talking to different people I found that the ESB performance can be fine tuned as given here; basically the tuning increases the number of threads to maximize performance. After starting the server with these parameters it threw an exception saying too many open files. To fix this I had to increase the OS-level file limits as follows.

1. Increase the number of file handles
sudo vi /etc/sysctl.conf
fs.file-max = 1000000
fs.inotify.max_user_watches = 1000000

2. Increase the number of open files allowed for the user
sudo vi /etc/security/limits.conf
amila soft nofile 100000
amila hard nofile 100000
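
The sysctl change has to be reloaded, and limits.conf is only read at login, so a new shell session is needed before the limits apply. I verified them as follows.

sudo sysctl -p    # reload /etc/sysctl.conf
ulimit -n         # should report 100000 in the new session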

These two changes fixed the above problem and I could start the server successfully. Then I ran the earlier tests again. Here are the results.

-n 500 -c 1000 - 5,562.29
-n 1000 -c 500 - 5,628.08
-n 2000 -c 250 - 5,679.52
-n 5000 -c 100 - 5,609.53
-n 10000 -c 50 - 5,159.77

In this case the tuning gave roughly a 4x improvement in performance, but it is still only about half of the direct back-end throughput. At this point, after running with different options, I concluded that the remaining gap is an I/O issue, since every message now has to go through an extra hop. Note that I ran all tests on my own machine.
Then I thought of comparing these results with another ESB to check the above assumption. For that I chose UltraESB 1.7.1, since it has claimed over a 2x performance gain for a direct proxy invocation here. I added some extra jars as given on the performance tool site and then ran the tests with a direct proxy service. Here are the results.

-n 500 -c 1000 - 4,591.54
-n 1000 -c 500 - 3,797.29
-n 2000 -c 250 - 2,826.34
-n 5000 -c 100 - 1,516.03
-n 10000 -c 50 - 1,140.33

Here, for some reason, it shows very low performance at low concurrency levels, with throughput increasing as the concurrency grows. Generally systems show lower performance at low concurrency, but I think 50 is a sufficiently high concurrency level. The other observation I made was about the load average reported by the top tool in Linux: it kept increasing throughout the tests. This may be because the server was not fine tuned; however, I could not find any documentation on tuning it. The available documents say that it detects the available processors automatically and adjusts its parameters accordingly.
Then I tried the zeroCopyEnabled option, since there is a claim about its high efficiency. Again there was not much difference in the results; still no significant performance improvement.

-n 500 -c 1000 - 4,602.64
-n 1000 -c 500 - 3,774.64
-n 2000 -c 250 - 2,644.13
-n 5000 -c 100 - 1,538.64
-n 10000 -c 50 - 1,123.28

After this I was wondering why my tests show completely different results compared to the published benchmark results. One observation I made was the low number of messages used, which may not have produced consistent results; a performance test needs to send a large number of messages to obtain consistent results, but even so, inconsistency alone should not always favor one ESB. There is also the possibility that the WSO2 ESB was not tuned for performance in those benchmarks; according to the results above, it shows considerably higher performance after fine tuning.
In summary, I learned two things from this. First, routing through an ESB always degrades the TPS of the system due to the additional I/O. Second, it is always better to measure the performance of the actual system on the hardware it is supposed to run on, instead of relying on published performance benchmarks.