Sunday, June 17, 2007

Efficient Project Execution (An Idea)

Problems:

Project Organization: In most software companies, project information such as requirements, designs, tasks, and discussions is scattered across various documents (Word, PowerPoint, spreadsheets, wiki pages, blogs, mail, chat, etc.).
  1. It's hard to track a project's progress with this disconnected information model. Even when a team uses a wiki to create documents, they usually get lost in the jungle of wiki pages. Discussions about a project that happen over email, chat, or meetings are lost unless somebody collects and organizes them manually.
  2. It's hard to trace a project's requirements, designs, tasks, people, milestones, bugs, etc. with this disconnected information model.
  3. It's hard to learn lessons from previous projects using the disconnected information model.
Project Status: Managers usually conduct daily or weekly meetings to get status from the developers working on a project. Status meetings are expensive. For example, work on a project is halted for about 30 minutes before the meeting, for the duration of the meeting itself, and for about 30 minutes after it, and that cost is paid by every attendee.

Project managers or developers should be able to answer questions like

  • How much time is being spent on what tasks?
  • Is there heavy context switching going on between unrelated projects or tasks?
  • What percentage of a project's work is completed?
  • What are the chances of not completing a project on time?
  • What tasks (supporting customers, supporting old code, etc) are slowing down the current project?
Time Tracking: Developers usually record the time they spent on a requirement after a task is completed, rather than recording time continuously. Continuous recording would make it possible to find out about context switching between projects or tasks, for example to calculate the time spent on bugs in the previous release versus the time spent on a new project or task.

A typical developer's day may be as follows:
  • 9:00AM: Start requirement 1
  • 10:00AM: Stop previous task, disturbed by a teammate to discuss something.
  • 10:45AM: Start requirement 1
  • 12:00PM: Stop previous task
  • 1:00PM: Attend status meeting
  • 1:45PM: Start requirement 1
  • 3:00PM: Stop previous task, check email and respond.
  • 3:30PM: Start requirement 1
  • 5:00PM: Stop previous task
Typical time tracking works by recording the time taken to finish a task; in the above case it is 5 hours spent on "requirement 1". The time tracking tool loses the critical information about context switching. For example, in the above scenario the first interruption (at 10:00AM) resulted in a loss of 45 minutes.
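As a rough illustration, a continuous log of the day above could be reduced to both numbers: the time worked and the time lost between work intervals. This is only a sketch; the TimeLog and Interval names and the interval format are assumptions made for the example, not part of any existing tool.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: TimeLog and Interval are illustrative names, not an existing tool.
class TimeLog {
    // One uninterrupted stretch of work on a task.
    record Interval(String task, LocalTime start, LocalTime stop) {}

    // Total time actually worked on the given task.
    static Duration timeOnTask(List<Interval> log, String task) {
        Duration total = Duration.ZERO;
        for (Interval i : log) {
            if (i.task().equals(task)) {
                total = total.plus(Duration.between(i.start(), i.stop()));
            }
        }
        return total;
    }

    // Gaps between consecutive intervals; assumes the log is sorted by start time.
    static List<Duration> gaps(List<Interval> log) {
        List<Duration> result = new ArrayList<>();
        for (int i = 1; i < log.size(); i++) {
            result.add(Duration.between(log.get(i - 1).stop(), log.get(i).start()));
        }
        return result;
    }

    // The developer's day from the list above, as start/stop intervals.
    static List<Interval> exampleDay() {
        String task = "requirement 1";
        return List.of(
            new Interval(task, LocalTime.of(9, 0), LocalTime.of(10, 0)),
            new Interval(task, LocalTime.of(10, 45), LocalTime.of(12, 0)),
            new Interval(task, LocalTime.of(13, 45), LocalTime.of(15, 0)),
            new Interval(task, LocalTime.of(15, 30), LocalTime.of(17, 0)));
    }

    public static void main(String[] args) {
        List<Interval> day = exampleDay();
        System.out.println("Worked: " + timeOnTask(day, "requirement 1")); // PT5H
        System.out.println("Gaps:   " + gaps(day)); // [PT45M, PT1H45M, PT30M]
    }
}
```

A tool that only stores the 5-hour total cannot reconstruct the gaps, while a tool that stores the intervals can always derive the total.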

Solution:

We could have an enterprise web application (Web 2.0) that interconnects projects, requirements, designs, tasks, milestones, bugs, people, blog entries, etc.

Here's the laundry list of features of a typical product.

  • Create projects.
  • Add people to projects with various roles.
  • Create individual requirements (could be as a wiki page).
  • Allow project members to create various types of tasks for each requirement. The standard tasks could be requirements analysis, design, implementation, test and documentation tasks.
  • Assign requirements/tasks to team members.
  • Create milestones by grouping either requirements or tasks.
  • Associate tasks with requirements.
  • Associate bugs with requirements.
  • Associate part of a project with another project to create a dependency.
  • Allow discussions between two or more members about a requirement to happen from within the requirement's wiki page, so that the information is recorded and associated with the right requirement.
  • Allow time tracking per team member as well as for the whole project team. The time tracking would provide information about context switching, amount of time spent on various projects, etc.
Existing Products:
A number of products are already available. For example, Basecamp from 37signals.com offers disconnected to-do lists and time tracking on a per-requirement basis, not continuous time tracking. There are also standalone time tracking tools that detect time spent in an IDE, browser, etc., but they are not integrated into any project either.

Conclusion:

I think there is an opportunity for an enterprise product (Web 2.0) that is a one-stop shop for
  • organizing project information (projects, requirements, tasks, time, people, milestones, bugs, etc.).
  • tracking time.
  • enabling efficient collaboration, such as instant messaging from within a requirement page.
  • integrating bug tracking systems, wikis, IM, email, blogs, etc.

Wednesday, June 13, 2007

Timed Deadlock

What is Timed Deadlock?: A timed deadlock is a deadlock in which thread t1 holds lock l1 and wants to acquire a resource (for example, a connection from the connection pool) when no resources are available, so it waits (sleeps) for a resource to become available. In the meantime, thread t2 holds a resource but needs lock l1 to complete its task. The following figure illustrates a timed deadlock.



Figure-1: Timed Deadlock

Example: A timed deadlock can be found in JBoss 2.4.x; please refer to the Reported Bug. The situation arises when multiple users (n > 20) are trying to access a portal. The connection pool size is set to 20 and the cache for portal metadata is set to expire every 2 minutes. When there are no connections in the connection pool, the requesting thread is blocked for 30 seconds (the default).

The attached thread dump (from the reported bug) indicates that one thread holds a lock and ends up sleeping while trying to acquire a connection from the pool. All the other threads wait for that lock to be released so they can finish their work. I am not sure whether the other threads are holding connections or whether connections are not being returned to the pool (a connection leak). There could be multiple problems in JBoss portal contributing to the timed deadlock.

Conclusion: One thing is for sure: threads shouldn't hold a lock while interacting with pools, especially pools that offer a waiting (sleep) option.
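One way to follow this advice is to take the pooled resource first, while holding no lock, and only then acquire the lock. The sketch below is a minimal illustration under stated assumptions: a Semaphore stands in for the connection pool, cacheLock stands in for lock l1, and the class and method names are hypothetical. It is not the JBoss code or its fix.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: a Semaphore stands in for the connection pool and
// cacheLock stands in for lock l1 from the text.
class PoolBeforeLock {
    private final Semaphore pool;
    private final ReentrantLock cacheLock = new ReentrantLock();

    PoolBeforeLock(int poolSize) {
        this.pool = new Semaphore(poolSize);
    }

    // Returns true if the work ran, false if no pooled resource became available in time.
    boolean runWithConnection(long timeoutMillis, Runnable work) {
        boolean acquired;
        try {
            // Step 1: wait for a pooled resource while holding no lock.
            acquired = pool.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
        if (!acquired) {
            return false;
        }
        try {
            // Step 2: only now take the lock; nothing below waits on the pool.
            cacheLock.lock();
            try {
                work.run();
            } finally {
                cacheLock.unlock();
            }
            return true;
        } finally {
            pool.release(); // always return the resource to the pool
        }
    }

    public static void main(String[] args) {
        PoolBeforeLock guarded = new PoolBeforeLock(1);
        boolean ran = guarded.runWithConnection(100, () -> System.out.println("refreshing cache"));
        System.out.println("ran = " + ran);
    }
}
```

Because every thread acquires in the same order (pool first, lock second), a thread sleeping on an empty pool never blocks the threads that already hold connections. The sketch assumes the work itself does not request a second connection while it holds the lock.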

Saturday, June 2, 2007

Effective Profiling

J2EE Application and Profiling: A typical J2EE application contains logical layers/components (Presentation, Business, Data, etc.) as shown below in Figure-1. When we want to find bottlenecks in a J2EE application, the easiest thing to do is to hook up a profiler (OptimizeIt, JProbe, JRockit, JProfiler, etc.), run a test, generate profile data, and find the top hotspot methods. The profilers indicate the top bottleneck methods in terms of CPU usage. We try to eliminate the top hotspot methods, either by making them more efficient or by calling them less frequently, and rerun the tests to check the improvement in the application's throughput and/or response time. If the desired throughput and response time are not achieved, we repeat the procedure until they are.



Figure-1: A typical J2EE Application

Problem: The above procedure is a trial-and-error effort. The profile data doesn't indicate which layer/component is a bottleneck. Profilers typically use either sampling or instrumentation to measure the CPU used by methods. Instrumentation is useful when there is only one user running a test. Sampling is used when an application needs to be profiled with concurrent users running use case(s). Sampling doesn't offer a complete picture of the system in terms of which components/layers are the real bottlenecks.

Effective Profiling: Information gathered per component/layer may indicate which component/layer needs attention. To get per-component information, we could disable/enable the components under test one by one and run load tests to measure throughput and response time. Disabling a component means either replacing it with a mock component or caching the results of the component/layer/method and serving the same set of results for every request/call. By running tests with the real component/layer versus the caching component/layer, we can find out the impact of that component/layer on overall system performance.
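A caching stand-in for a layer can be as small as a decorator with an on/off switch. The following is a minimal sketch, assuming a layer can be modeled as a function from a request to a result; the CachingLayer name and the enabled flag are hypothetical and not taken from any J2EE API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: a layer is modeled as a Function from request to result.
// With caching enabled, the real layer is called once per distinct request and
// every later call is served from memory, which takes the layer's cost out of the test.
class CachingLayer<K, V> implements Function<K, V> {
    private final Function<K, V> real;
    private final boolean cachingEnabled;
    private final Map<K, V> cache = new ConcurrentHashMap<>();

    CachingLayer(Function<K, V> real, boolean cachingEnabled) {
        this.real = real;
        this.cachingEnabled = cachingEnabled;
    }

    @Override
    public V apply(K request) {
        if (!cachingEnabled) {
            return real.apply(request); // baseline run: always hit the real layer
        }
        // Cached run: compute once, then reuse. Null results are not cached.
        return cache.computeIfAbsent(request, real);
    }

    public static void main(String[] args) {
        Function<String, String> slowLayer = key -> {
            System.out.println("real layer called for " + key);
            return key.toUpperCase();
        };
        CachingLayer<String, String> layer = new CachingLayer<>(slowLayer, true);
        System.out.println(layer.apply("page-1")); // calls the real layer
        System.out.println(layer.apply("page-1")); // served from the cache
    }
}
```

Running the same load test once with cachingEnabled set to false and once with it set to true gives the two throughput numbers to compare.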

For example, for the application in Figure-1 shown above, we can find the cost of all the layers by load testing the application and measuring the maximum throughput (pages/sec) that can be achieved on a given hardware configuration; let's say it is Xoriginal pages/sec.

Next, to find the cost associated with Layer3, we introduce a cache in Layer3 as shown in Figure-2 and load test the application again to measure throughput; let's say it is Xcache3 pages/sec.



Figure-2: Cache in Layer3

If Xcache3 > Xoriginal, we can find and fix bottlenecks by profiling Layer3. Let's call the measured throughput after fixing all bottlenecks in Layer3 Xfixed3; it should be as close as possible to Xcache3.

In Figure-3 shown below, we can find the cost associated with Layer2 by introducing a cache in Layer2; let's say the measured maximum throughput is Xcache2 pages/sec.



Figure-3: Cache in Layer2 and Layer3

We have results of three load test runs as Xoriginal, Xfixed3 and Xcache2 pages/sec. Next, we will have to decide which layers need attention based on the following conditions.
  • If Xcache2 = Xfixed3 > Xoriginal we try to find and fix bottlenecks in Layer1.
  • If Xfixed3 << Xcache2 we try to find and fix bottlenecks in Layer2.
  • The condition Xfixed3 >> Xcache2 may not be possible for obvious reasons.
  • Xcache2 >= Xfixed3 > Xoriginal is always true.
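The conditions above can be sketched as a small decision helper. This is only an illustration: the text does not say how large a difference counts as "<<", so the tolerance parameter, the LayerAdvisor name, and the RECHECK_MEASUREMENTS outcome are all assumptions.

```java
// Hypothetical sketch: names and the tolerance are assumptions for illustration.
class LayerAdvisor {
    enum Focus { LAYER1, LAYER2, RECHECK_MEASUREMENTS }

    // Throughputs are in pages/sec; tolerance is a fraction, e.g. 0.05 for 5%.
    static Focus nextLayerToProfile(double xOriginal, double xFixed3,
                                    double xCache2, double tolerance) {
        // The text expects Xcache2 >= Xfixed3 > Xoriginal; a result clearly
        // outside that ordering suggests a measurement problem.
        if (xFixed3 < xOriginal || xCache2 < xFixed3 * (1 - tolerance)) {
            return Focus.RECHECK_MEASUREMENTS;
        }
        // Xcache2 is about equal to Xfixed3: caching Layer2 gained nothing,
        // so the remaining cost is in Layer1.
        if (xCache2 <= xFixed3 * (1 + tolerance)) {
            return Focus.LAYER1;
        }
        // Xfixed3 << Xcache2: Layer2 is costing throughput.
        return Focus.LAYER2;
    }

    public static void main(String[] args) {
        System.out.println(nextLayerToProfile(100, 150, 152, 0.05)); // LAYER1
        System.out.println(nextLayerToProfile(100, 150, 300, 0.05)); // LAYER2
    }
}
```

The sample throughput numbers in main are made up for the example and are not measurements.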
Conclusion: Load testing with the cache in layers or components can provide critical information such as maximum achievable throughput. The information also helps in choosing the right component/layer to profile and find bottlenecks and fix them.