Saturday, June 2, 2007

Effective Profiling

J2EE Application and Profiling: A typical J2EE application contains logical layers/components (Presentation, Business, Data, etc.) as shown in Figure-1 below. When we want to find bottlenecks in a J2EE application, the easiest thing to do is to hook up a profiler (OptimizeIt, JProbe, JRockit, JProfiler, etc.), run a test, generate profile data, and find the top hotspot methods. Profilers report the top bottleneck methods in terms of CPU usage. We try to eliminate those hotspots either by making the methods more efficient or by calling them less frequently, and then rerun the tests to check the improvement in the application's throughput and/or response time. If the target throughput and response time are not reached, we repeat the procedure until they are.



Figure-1: A typical J2EE Application

Problem: The above procedure is a trial and error effort. The profile data doesn't indicate which layer/component is the bottleneck. Typically, profilers use either sampling or instrumentation to measure the CPU used by methods. Instrumentation is useful when only one user is running a test. Sampling is used when an application needs to be profiled with concurrent users running the use case(s). Sampling doesn't give a complete picture of the system in terms of which components/layers are the real bottlenecks.

Effective Profiling: Per component/layer information can indicate which component/layer needs attention. To get it, we can disable and enable the components under test one by one and run load tests to measure throughput and response time. Disabling a component means either replacing it with a mock component or caching the results of the component/layer/method and serving the same set of results for every request/call. By running tests with the real component/layer and then with the caching component/layer, we can find out the impact of that component/layer on overall system performance.
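One way to "disable" a layer by caching can be sketched as a thin wrapper around the layer's interface. This is a minimal sketch, not the article's actual code: `CustomerDao`, `RealCustomerDao`, and `CachingCustomerDao` are hypothetical names standing in for a Layer3 component, and the string result is a placeholder for whatever expensive work the real layer does.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical Layer3 contract; the name and method are illustrative only.
interface CustomerDao {
    String findCustomerName(int customerId);
}

// Stand-in for the real Layer3 implementation (for example, a database call).
class RealCustomerDao implements CustomerDao {
    final AtomicInteger calls = new AtomicInteger();

    @Override
    public String findCustomerName(int customerId) {
        calls.incrementAndGet();          // counts how often the real layer is hit
        return "customer-" + customerId;  // placeholder for the expensive work
    }
}

// Caching wrapper: the first call per key goes to the real layer, every later
// call is served from memory, which takes the layer's cost out of the load test.
class CachingCustomerDao implements CustomerDao {
    private final CustomerDao delegate;
    private final Map<Integer, String> cache = new ConcurrentHashMap<>();

    CachingCustomerDao(CustomerDao delegate) {
        this.delegate = delegate;
    }

    @Override
    public String findCustomerName(int customerId) {
        return cache.computeIfAbsent(customerId, delegate::findCustomerName);
    }
}
```

Because the wrapper implements the same interface, the layer above it does not need to change; swapping `new RealCustomerDao()` for `new CachingCustomerDao(new RealCustomerDao())` is enough to run the cached variant of the test.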

For example, for the application in Figure-1 above, we can find the combined cost of all the layers by load testing the application and measuring the maximum throughput (pages/sec) that can be achieved on a given hardware configuration; let's say it is Xoriginal pages/sec.
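As a rough illustration of what "measuring throughput" means here, the sketch below drives a request from several concurrent threads for a fixed time and reports completed requests per second. It is only a sketch under assumptions: `ThroughputProbe` and its parameters are hypothetical, the `Runnable` stands in for fetching one page, and a real load test would normally use a dedicated load testing tool against the deployed application.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of any profiler or load testing product.
class ThroughputProbe {
    // Runs `request` from `users` concurrent threads for about `durationMillis`
    // and returns the number of completed requests per second.
    static double measure(Runnable request, int users, long durationMillis) {
        AtomicLong completed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(users);
        long start = System.nanoTime();
        long deadline = start + TimeUnit.MILLISECONDS.toNanos(durationMillis);

        for (int i = 0; i < users; i++) {
            pool.execute(() -> {
                // Compare via subtraction, as recommended for System.nanoTime().
                while (System.nanoTime() - deadline < 0) {
                    request.run();
                    completed.incrementAndGet();
                }
            });
        }

        pool.shutdown();
        try {
            // Wait for all simulated users to finish their last request.
            pool.awaitTermination(durationMillis + 5_000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }

        double elapsedSeconds = (System.nanoTime() - start) / 1e9;
        return completed.get() / elapsedSeconds;
    }
}
```

Running the same probe with the same number of users and duration against each variant of the application keeps the throughput figures comparable.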

Next, to find the cost associated with Layer3, we introduce a cache in Layer3 as shown in Figure-2 and load test the application again to measure throughput; let's say it is Xcache3 pages/sec.



Figure-2: Cache in Layer3

If Xcache3 > Xoriginal, Layer3 is limiting throughput, and we can find and fix its bottlenecks by profiling Layer3. Let's call the measured throughput after fixing all bottlenecks in Layer3 Xfixed3; it should be as close as possible to Xcache3.

As shown in Figure-3 below, we can find the cost associated with Layer2 by introducing a cache in Layer2; let's say the measured maximum throughput is Xcache2 pages/sec.



Figure-3: Cache in Layer2 and Layer3

We now have the results of three load test runs: Xoriginal, Xfixed3 and Xcache2 pages/sec. Next, we decide which layers need attention based on the following conditions.
  • If Xcache2 = Xfixed3 > Xoriginal we try to find and fix bottlenecks in Layer1.
  • If Xfixed3 << Xcache2 we try to find and fix bottlenecks in Layer2.
  • The condition Xfixed3 >> Xcache2 should not be possible, because the cache in Layer2 also bypasses Layer3, so the cached run does no more work than the fixed run.
  • Xcache2 >= Xfixed3 > Xoriginal is always true.
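The conditions above can be sketched as a small decision helper. This is an illustration only: `LayerTriage`, its method names, and the 5% tolerance used to treat two measured throughputs as "equal" are assumptions, not values from the article; the tolerance should be set from the run-to-run noise of your own load tests.

```java
// Hypothetical helper; class name, method names, and tolerance are assumptions.
class LayerTriage {
    // Throughputs within this relative tolerance are treated as equal (noise).
    static final double TOLERANCE = 0.05;

    static boolean roughlyEqual(double a, double b) {
        return Math.abs(a - b) <= TOLERANCE * Math.max(a, b);
    }

    // Returns which layer to profile next, given the three measured
    // throughputs in pages/sec.
    static String nextLayerToProfile(double xOriginal, double xFixed3, double xCache2) {
        boolean fixedBelowOriginal =
                xFixed3 < xOriginal && !roughlyEqual(xFixed3, xOriginal);
        boolean fixedAboveCache2 =
                xFixed3 > xCache2 && !roughlyEqual(xFixed3, xCache2);
        if (fixedBelowOriginal || fixedAboveCache2) {
            // These orderings should not occur; the measurements are suspect.
            return "recheck measurements";
        }
        if (roughlyEqual(xCache2, xFixed3)) {
            // Caching Layer2 gained nothing, so the remaining cost is in Layer1.
            return "Layer1";
        }
        // Xfixed3 is well below Xcache2, so Layer2 is the limiting layer.
        return "Layer2";
    }
}
```

For example, with Xoriginal = 100, Xfixed3 = 150 and Xcache2 = 300 pages/sec, the helper points at Layer2, since caching Layer2 doubles the throughput reached after fixing Layer3.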
Conclusion: Load testing with a cache in individual layers or components can provide critical information such as the maximum achievable throughput. That information also helps in choosing the right component/layer to profile, so that its bottlenecks can be found and fixed.
