Example 1 ‑ A Long Time in Transaction State

Background

Some batch work is run daily, consisting of four reports that are run concurrently. Normally, all of these reports finish in 43 minutes and use 100 percent CPU during that time. Three of the reports are inquiry‑only, and the fourth does a lot of updating.

The Issue

After a small change to one of the reports, the batch work now takes 57 minutes and CPU consumption dropped to just over 50 percent CPU for the first 25 minutes. IO rates never were very high before, and did not change appreciably.

It looks like a logical bottleneck has been introduced.

Diagnosing the Problem

To diagnose this further, we would normally ask "What has changed?", and frequently the answer becomes obvious. As this looks like a logical bottleneck, we turn to Jade Monitor for confirmation.

When the batch work was investigated in the past, no locking issues were observed. There were a lot of exclusive locks and they were held for relatively long durations, but there were no queued locks.

After the change to the application, the Monitor shows there were about the same number of locks and they were held for shorter periods. They were all held by the updating report, as before.

However, there are also now some queued locks.

From this, we can see that one application has three collections exclusively locked and the other three applications are queued up waiting for a share lock on one of those collections.

Looking at the application's code, we find that the three inquiry reports' primary access is an iterator over myRoot.allParentsById, which is a collection of type ParentByIdDict. The updating report always did modify a large number of objects, but no collections were changed.

The small application change had been to add a parent object in some rare circumstances, which resulted in adding the object into the three collections, exclusively locking them. Commits were already being done periodically, based on the number of objects updated, and the locks were being held a lot longer than they needed to be.

The Solution

The application was changed to do a commit after creating each parent object, in addition to commits being done based on updated object count. This resulted in the collections being exclusively locked for a very short period.

After applying the fix, the report set was finishing in about the same time it always had, and returned the process to being CPU‑bound with all CPUs busy during the report period.

An alternative solution would be to use:

Conclusion

If frequently accessed collections are left exclusively locked for long periods, inquiry users can be left waiting for long periods. Transactions can be altered to reduce the length of time locks are held. Using deferred updates or update locks may improve concurrency by delaying the exclusive locks.