ShedLock, Duplicate Batch Processing, and lockAtMostFor


June 17th 2020


I often use ShedLock for batch processing in clustered environments, and it works great. Recently, though, I hit a situation where its behavior surprised me. I had a batch job configured like this:

@SchedulerLock(name = "SomeBatchJob", lockAtMostFor = "5m", lockAtLeastFor = "1m")

Per the documentation, I set lockAtMostFor above what I thought was a reasonable execution time. I deployed the job to a three-node cluster and released it into the wild. At some point I noticed that multiple hosts were processing the batch job at the same time, so I started looking at the log data. At first, everything looked normal:

Date/Time     Host     Message
6:45:00 am    host 1   batch job started
6:45:50 am    host 1   batch job completed
6:46:15 am    host 2   batch job started
6:47:30 am    host 2   batch job completed

...but then at some point, I saw this:

6:53:01 am    host 1   batch job started
6:58:30 am    host 2   batch job started
6:59:05 am    host 1   batch job completed
6:59:23 am    host 3   batch job started
7:00:21 am    host 2   batch job completed
7:00:50 am    host 1   batch job started


What happened was that host 1's run took longer than lockAtMostFor (about six minutes against a five-minute limit), so the lock expired while host 1 was still working. At that point host 2 acquired the lock and started processing, as would be expected. However, when host 1 finally finished, it released the lock, which by then belonged to host 2. Host 3 then picked up the freed lock and started processing concurrently with host 2. This continued for a little while until execution times came back down.
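
To make that sequence concrete, here is a minimal sketch that replays the timeline against a toy lock. It is not ShedLock code. LockTimelineSketch, LockRow, and replay are hypothetical names, and the model assumes that a release simply frees the lock without checking who currently owns it, which is what the log above suggests. It also ignores lockAtLeastFor. Times are seconds measured from host 1's start at 6:53:01.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not ShedLock itself) of a shared lock with an expiry and an owner-blind release. */
public class LockTimelineSketch {

    /** Stands in for the single shared record that guards "SomeBatchJob". */
    static final class LockRow {
        long lockUntil = 0;      // seconds; the lock is free once now >= lockUntil
        String lockedBy = null;  // recorded, but deliberately NOT checked on release (assumption)

        /** Acquire succeeds only if the previous lock has expired or been released. */
        boolean tryLock(String host, long now, long lockAtMostFor) {
            if (now < lockUntil) {
                return false;
            }
            lockUntil = now + lockAtMostFor;
            lockedBy = host;
            return true;
        }

        /** Frees the lock regardless of which host currently holds it. */
        void unlock(long now) {
            lockUntil = now;
        }
    }

    /** Replays the overrun from the log for a given lockAtMostFor (in seconds). */
    static List<String> replay(long lockAtMostFor) {
        List<String> log = new ArrayList<>();
        LockRow row = new LockRow();

        // 6:53:01 - host 1 starts
        log.add("t=0 host1 acquired=" + row.tryLock("host1", 0, lockAtMostFor));

        // 6:58:30 - host 2 tries while host 1 is still running
        log.add("t=329 host2 acquired=" + row.tryLock("host2", 329, lockAtMostFor));

        // 6:59:05 - host 1 finishes late and releases whatever lock is there
        String ownerAtRelease = row.lockedBy;
        row.unlock(364);
        log.add("t=364 host1 released; row owner was " + ownerAtRelease);

        // 6:59:23 - host 3 tries; host 2 does not finish until t=440
        log.add("t=382 host3 acquired=" + row.tryLock("host3", 382, lockAtMostFor));
        return log;
    }

    public static void main(String[] args) {
        System.out.println("lockAtMostFor = 5m");
        replay(300).forEach(System.out::println);
        System.out.println("lockAtMostFor = 15m");
        replay(900).forEach(System.out::println);
    }
}
```

With a five-minute limit, host 2 gets the lock at t=329 and host 3 gets it at t=382 while host 2 is still running. With a fifteen-minute limit, host 2 is refused, and host 3 only acquires the lock after host 1 has actually finished.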

Long story short, it's really important to set lockAtMostFor well above the longest execution time you realistically expect, not just the typical one. Adding logging, alerts, or some other control mechanism to catch overruns can also help, because there are cases where double-processing a batch would be very bad.
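
One possible shape for that kind of check is sketched below. OverrunGuard and runAndCheck are hypothetical names, not part of ShedLock, and the limit passed in is assumed to match the lockAtMostFor value on the annotation. The sketch only detects an overrun after the fact. It does not prevent another host from starting, so it is an alerting aid rather than a fix.

```java
import java.time.Duration;
import java.util.function.Consumer;

/** Hypothetical helper: times a job and reports when it outlived the lock's maximum duration. */
public class OverrunGuard {

    /**
     * Runs the job, then calls onOverrun with the elapsed time if the run took
     * longer than lockAtMostFor. Returns true when an overrun was detected.
     */
    static boolean runAndCheck(Runnable job, Duration lockAtMostFor, Consumer<Duration> onOverrun) {
        long start = System.nanoTime();
        boolean overran = false;
        try {
            job.run();
        } finally {
            // Measure even if the job threw, so a failing slow run is still reported.
            Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
            overran = elapsed.compareTo(lockAtMostFor) > 0;
            if (overran) {
                onOverrun.accept(elapsed);
            }
        }
        return overran;
    }

    public static void main(String[] args) {
        // Assumed to mirror lockAtMostFor = "5m" from the annotation.
        Duration lockAtMostFor = Duration.ofMinutes(5);

        runAndCheck(
                () -> System.out.println("processing batch..."),
                lockAtMostFor,
                elapsed -> System.err.println(
                        "SomeBatchJob ran for " + elapsed + ", longer than lockAtMostFor "
                                + lockAtMostFor + "; another host may have started concurrently"));
    }
}
```

In a real job the callback would more likely write to the application log or raise an alert than print to standard error.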

I'm an "old" programmer who has been blogging for almost 20 years now. In 2017, I started Highline Solutions, a consulting company that helps with software architecture and full-stack development. I have two degrees from Carnegie Mellon University, one practical (Information and Decision Systems) and one not so much (Philosophy - thesis here). Pittsburgh, PA is my home where I live with my wife and 3 energetic boys.