I recently ran some performance tests on a system consisting of a back-end service and an ESB proxy service that passes messages through to it. In the real deployment the ESB needs to route messages using HTTP headers, but for measuring performance it is reasonable to use it as a plain pass-through proxy. The problem was that system performance dropped when messages were routed through the ESB.
First I created a back-end service as an ADB Axis2 service and deployed it in WSO2 AS 4.1.2. Then I measured the throughput of this service with a 780-byte message using the Java Bench tool. It reached around 13,000 TPS. I used 500,000 messages at different concurrency levels (for this I edited the mgt-transports.xml file to increase the number of threads).
-n 500 -c 1000 - 13,140.52
-n 1000 -c 500 - 13,747.96
-n 2000 -c 250 - 13,004.18
-n 5000 -c 100 - 13,297.51
-n 10000 -c 50 - 13,039.79
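As a quick sanity check on the test plan, here is a minimal Python sketch showing that every -n/-c pair above adds up to the same 500,000 messages. It assumes that -n is the number of requests sent by each of the -c concurrent clients, which is how the figures in this post line up; the variable names are only for illustration.

```python
# Each pair is (-n, -c) as passed to the bench tool in the runs above.
# Assumption: -n is the number of requests sent by each of the -c clients.
runs = [(500, 1000), (1000, 500), (2000, 250), (5000, 100), (10000, 50)]

# Total messages per run = requests per client * number of clients.
totals = [n * c for n, c in runs]

for (n, c), total in zip(runs, totals):
    print(f"-n {n} -c {c} -> {total:,} messages")
```

Keeping the total fixed means the runs differ only in concurrency, so the TPS figures can be compared directly.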
Then I created a proxy service using WSO2 ESB 4.0.3 and enabled binary relay (this can be done by editing axis2.xml and setting the correct builder and formatter) to send the message through the ESB. Then I measured the performance of this system again with 500,000 messages using Java Bench. Here are the results.
-n 500 -c 1000 - 1,179.16
-n 1000 -c 500 - 1,360.03
-n 2000 -c 250 - 1,471.56
-n 5000 -c 100 - 1,532.91
-n 10000 -c 50 - 1,526.42
This reduced throughput by roughly a factor of nine (from about 13,000 TPS to about 1,400 TPS). After talking to different people I found that ESB performance can be fine-tuned as given here. Basically, this increases the number of threads to maximize performance. After starting the server with these parameters it gave me an exception saying there were too many open files. To fix this I had to raise the OS-level open file limits as follows.
1. Increase the system-wide number of file handles
sudo vi /etc/sysctl.conf
fs.file-max = 1000000
fs.inotify.max_user_watches = 1000000
2. Increase the number of open files allowed for the user
sudo vi /etc/security/limits.conf
amila soft nofile 100000
amila hard nofile 100000
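To confirm that limits like the ones above are actually in effect for the process that starts the server, a small check such as the following can help. This is only a sketch, assuming Linux and Python 3; the helper name read_file_max is made up for illustration. Note that sysctl.conf changes normally need `sudo sysctl -p` (or a reboot) and limits.conf changes need a new login session before they show up.

```python
import resource

# Per-process limit on open file descriptors (what "nofile" in limits.conf controls).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"nofile soft={soft} hard={hard}")


def read_file_max(path="/proc/sys/fs/file-max"):
    """Return the system-wide file handle limit, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


# System-wide limit (what fs.file-max in sysctl.conf controls).
print(f"fs.file-max={read_file_max()}")
```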
These two changes fixed the problem and I could start the server successfully. Then I ran the earlier test again. Here are the results.
-n 500 -c 1000 - 5,562.29
-n 1000 -c 500 - 5,628.08
-n 2000 -c 250 - 5,679.52
-n 5000 -c 100 - 5,609.53
-n 10000 -c 50 - 5,159.77
This was around a four-fold improvement, but it is still less than half of the direct back-end throughput. At this point, after running with different options, I concluded that this is an I/O issue, since each message now has to go through an extra hop. I ran all tests on my own machine.
Then I thought of comparing these results with another ESB to check the above assumption. For that I chose UltraESB 1.7.1, since it has claimed an over two-fold performance gain for direct proxy invocation here. I added some extra jars as described on the performance tool site, then ran the tests with a direct proxy service. Here are the results.
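The ratios quoted so far can be reproduced from the result tables with a few lines of Python. This is just a sketch that averages the five runs of each configuration; the list names are mine, and the numbers are copied from the tables above.

```python
from statistics import mean

# TPS figures copied from the three result tables above (same run order).
backend = [13140.52, 13747.96, 13004.18, 13297.51, 13039.79]
esb_default = [1179.16, 1360.03, 1471.56, 1532.91, 1526.42]
esb_tuned = [5562.29, 5628.08, 5679.52, 5609.53, 5159.77]

# How much slower the untuned ESB path is than calling the back end directly.
slowdown = mean(backend) / mean(esb_default)
# How much the tuning improved the ESB path.
gain = mean(esb_tuned) / mean(esb_default)
# Tuned ESB throughput as a fraction of direct back-end throughput.
fraction = mean(esb_tuned) / mean(backend)

print(f"untuned ESB slowdown: {slowdown:.1f}x")
print(f"gain from tuning:     {gain:.1f}x")
print(f"tuned ESB vs backend: {fraction:.0%}")
```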
-n 500 -c 1000 - 4,591.54
-n 1000 -c 500 - 3,797.29
-n 2000 -c 250 - 2,826.34
-n 5000 -c 100 - 1,516.03
-n 10000 -c 50 - 1,140.33
Here, for some reason, it shows very low performance at low concurrency levels, and throughput increases with concurrency. Systems generally show lower throughput at low concurrency, but I think 50 is a sufficiently high concurrency level. The other observation I made was the load average reported by the top tool on Linux: it kept increasing throughout the tests. This may be due to not fine-tuning the server; however, I could not find any documentation for that. The documentation I found says that the server detects the available processors automatically and adjusts its parameters accordingly.
Then I tried zeroCopyEnabled, since it is claimed to be highly efficient. Again the results were not much different; there was no significant performance improvement.
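One way to read the table above is to turn each throughput figure into an approximate per-request response time using Little's law (clients in the system = throughput x time in the system). The sketch below assumes a closed loop in which every client sends its next request as soon as the previous response arrives, which may not match the bench tool exactly; the function name is only for illustration.

```python
# (concurrency, TPS) for the UltraESB direct proxy runs reported above.
ultraesb = [
    (1000, 4591.54),
    (500, 3797.29),
    (250, 2826.34),
    (100, 1516.03),
    (50, 1140.33),
]


def mean_response_ms(concurrency, tps):
    """Little's law for a closed loop: time in system = clients / throughput."""
    return concurrency / tps * 1000.0


for c, tps in ultraesb:
    print(f"-c {c:>4}: {tps:>8.2f} TPS -> ~{mean_response_ms(c, tps):.0f} ms per request")
```

Under that assumption, even the 50-client run spends tens of milliseconds per request, which suggests the low figures come from per-request latency rather than a lack of offered load.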
-n 500 -c 1000 - 4,602.64
-n 1000 -c 500 - 3,774.64
-n 2000 -c 250 - 2,644.13
-n 5000 -c 100 - 1,538.64
-n 10000 -c 50 - 1,123.28
After this I wondered why this test shows completely different results from the published benchmark results. One observation I made was that the benchmark used a low number of messages, which may not have produced consistent results. A performance test needs to send a large number of messages to obtain consistent results, although that alone does not explain why the benchmark always shows one ESB performing better. There is also a possibility that WSO2 ESB was not tuned for performance there. According to the above results, it showed higher performance after fine-tuning.
In summary, I learned two things from this. First, routing through an ESB always degrades the TPS of the system due to the additional I/O. Second, it is always better to measure performance with the required system on the hardware it is supposed to run on, instead of relying on performance benchmarks.
 

9 comments:
Hi Amila,
Did you enable RAMDiskFileCache, as per http://docs.adroitlogic.org/display/esb/Production+Deployment+Best+Practices?
Could you please share the UltraESB configuration used, conf/ultra-root.xml and conf/ultra-dynamic.xml?
You are seeing numbers roughly equal to those at http://esbperformance.org/display/comparison/ESB+Performance+Testing+-+Round+6 for WSO2 ESB, but much lower numbers than stated there for UltraESB, which is surely a configuration issue with UltraESB.
I personally value this sort of work over talking in the abstract. You could have tried this scenario on EC2 as per http://esbperformance.org/display/comparison/Execution+-+EC2+Execution+of+Round+6, since network isolation of the ESB, back end and client is something that will always be the case in any production deployment.
Thanks,
Ruwan
I have uploaded the files you have mentioned to the same location as other files.
Comparing the numbers: the performance benchmark has aggregated the numbers for 1k - 100k. I am not sure how logical such a calculation is, so there is no relationship between those numbers and these :). Even if we take it, it says something like 2330 while this shows 5500.
I haven't done the OS-level optimisation since I thought it would be the same for both. Anyway, I'll give it a try.
On the performance benchmark side, have you fine-tuned WSO2 ESB as given here?
I think it is not a nice thing to repeatedly link the performance bench mark from here :)
First of all, I didn't purposely add the esbperformance.org link repeatedly. I just wanted to point out what we have observed. I linked the UltraESB documentation because you mentioned in your blog that you couldn't find documentation on tuning the UltraESB.
The 1K and 100K aggregation has been done for all ESBs, so I think it is fair enough. However, you can see the summary of the perf data at https://docs.google.com/spreadsheet/ccc?key=0AswvQMFbXtBPdE9lcS1SdThlUkg5Zll0Z0tRa0VKdHc (Now I am not going to link it :-) copy and paste the above link into your browser)
It's not just the OS changes. The FileCache used by the UltraESB in ultra-root.xml is PooledMessageFileCache in your case, but you should use RAMDiskFileCache as per the configurations available in the esbperf source code (again, I don't want to link it; go to the esbperf site and you will be able to find it :-)). It is not nice to enable perf optimizations like binary relay for one ESB and not tune the other ESB at all when comparing ESB performance :-)
Regarding the configuration of the WSO2 ESB, we have not used binary relay, but we have configured the pass-through transport, and WSO2 ESB outperformed all the other ESBs in the direct proxy scenario, which is there in the esbperf site data table too.
The problem with the pass-through transport, AFAIK, is that it cannot be used for any scenarios other than direct and transport-header-based routing. That means you cannot have any proxy that touches the message content on the same ESB if you enable pass-through, which keeps it out of *most of the* real usage scenarios. That is my understanding; please correct me if I am wrong.
Disclaimer: These are just my personal views.
What I said is not to repeatedly link an already known thing. Please give the HTTP link for the optimization parameters, as you have done with the summary results. Or you can run the given back-end service on your machine with the request message and tell me the steps to fine-tune it, similar to what I have done with WSO2 ESB (as you have suggested :)).
You may not agree with me on this. Looking at the summary results (even with UltraESB enhanced) I can see some 5k/10k scenarios which outperform the 500b cases. If you run these tests with a sufficiently large number of messages (say 500,000) you won't get these numbers. You can say that this is what you have done for everyone, but the point here is that this is an inconsistent way to benchmark. I am not going to argue about the final conclusion, but the results are simply inconsistent. The same goes for adding up 500b to 100k: it is not a logical thing to do, even if you do it for everyone :).
I agree that when you enable binary relay for one content type, that content type cannot be used for message processing scenarios (as with ESB 4.0.3). But routing messages with HTTP headers is a real requirement. In fact, that is the way one WSO2 client handles 2 billion messages per day :). So for this kind of scenario WSO2 ESB performs better than UltraESB.
This is what I mentioned in my blog. People have different kinds of real requirements, so it is better to do performance testing by fine-tuning the ESB for those scenarios on the real hardware they run.
Regarding the configuration, were the nhttp properties fine-tuned? And any idea why UltraESB does not perform better with zero copy than in the normal case? Do you have any article comparing UltraESB normal performance with zero-copy-enabled performance?
I wonder why esb performance site does not allow commenting. It is only fair that view points of others & findings such as these can be linked from the pages on that site.
I re-ran the performance tests with the RAMDiskFileCache option that Ruwan suggested. Here are the results.
-n 500 -c 1000 - 6,720.89
-n 1000 -c 500 - 7,020.80
-n 2000 -c 250 - 6,904.56
-n 5000 -c 100 - 6,729.71
-n 10000 -c 50 - 6,463.72
This shows a clear improvement compared to WSO2 ESB binary relay. Then I re-ran the tests with the pass-through transport given here with WSO2 ESB. Here are the results.
-n 500 -c 1000 - 7,125.96
-n 1000 -c 500 - 7,274.25
-n 2000 -c 250 - 7,229.12
-n 5000 -c 100 - 6,811.96
-n 10000 -c 50 - 6,435.23
So it performs better than the fine-tuned UltraESB numbers. If we leave out the 50-concurrency case (which shows lower performance on both servers because there is not enough load to reach maximum TPS), there is about a 4% performance improvement over UltraESB.
This again proves my initial thought: we always need to fine-tune the ESB for the given scenario with the best options and then compare the performance.
Also, if someone needs to do a performance benchmark to give a general opinion, it should have the following properties.
1. It needs to run a sufficiently large number of messages (e.g. 500,000) in order to produce consistent results.
2. For each and every scenario, the best fine-tuned options of each vendor have to be used.
3. When analyzing results, each and every scenario should be analyzed separately by message size.
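The 4% figure can be checked from the two tables above. The sketch below averages the TPS of each server over all runs except the 50-concurrency one and compares the means; the dictionary and function names are mine, and the numbers are copied from the comment.

```python
from statistics import mean

# TPS keyed by concurrency level (-c), copied from the two tables above.
ultraesb_ramdisk = {1000: 6720.89, 500: 7020.80, 250: 6904.56, 100: 6729.71, 50: 6463.72}
wso2_passthrough = {1000: 7125.96, 500: 7274.25, 250: 7229.12, 100: 6811.96, 50: 6435.23}


def mean_tps(results, exclude=()):
    """Average TPS over all concurrency levels not listed in exclude."""
    return mean(tps for c, tps in results.items() if c not in exclude)


# Relative advantage of WSO2 ESB pass-through over tuned UltraESB, ignoring -c 50.
improvement = (
    mean_tps(wso2_passthrough, exclude={50}) / mean_tps(ultraesb_ramdisk, exclude={50}) - 1
)
print(f"WSO2 pass-through vs UltraESB (without -c 50): {improvement:.1%}")
```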
Here is the WSO2 ESB pass through performance link.
Here is the latest round of ESB performance done in January 2013.