ShedLock, Duplicate Batch Processing, and lockAtMostFor


June 17th 2020


I often use ShedLock to make sure scheduled batch jobs run on only one node at a time in clustered environments, and it works great. Recently, though, I hit an interesting situation where its behavior surprised me. I had a batch job configured like this:

@SchedulerLock(name = "SomeBatchJob", lockAtMostFor = "5m", lockAtLeastFor = "1m")

Per the documentation, I set lockAtMostFor above what I thought was a reasonable execution time. I had three nodes in the cluster, and I released the job into the wild. At some point I noticed that multiple hosts were processing the batch job at the same time, so I started looking at the log data. At first, everything looked normal:

Date/Time    Host    Message
6:45:00 am   host 1  batch job started
6:45:50 am   host 1  batch job completed
6:46:15 am   host 2  batch job started
6:47:30 am   host 2  batch job completed

...but then at some point, I saw this:

Date/Time    Host    Message
6:53:01 am   host 1  batch job started
6:58:30 am   host 2  batch job started
6:59:05 am   host 1  batch job completed
6:59:23 am   host 3  batch job started
7:00:21 am   host 2  batch job completed
7:00:50 am   host 1  batch job started


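Here is what happened. Host 1's run, which started at 6:53:01, took longer than the 5-minute lockAtMostFor, so its lock expired at about 6:58:01 even though host 1 was still working. Host 2 then acquired the lock and started processing at 6:58:30, as you would expect. However, when host 1 finally finished at 6:59:05, it released the lock even though host 2 was the one holding it by then, and host 3 picked up the job at 6:59:23 and ran concurrently with host 2. The same thing happened when host 2 finished at 7:00:21: host 3 had not logged a completion yet, but host 1 started another run at 7:00:50. This continued for a while until execution times came back down.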
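To make the sequence above concrete, here is a small replay of the second log excerpt against a simplified, in-memory model of a ShedLock-style lock record. This is a sketch, not ShedLock's code: SimulatedLock and TimelineReplay are hypothetical names, and the model assumes, as the logs suggest, that unlocking resets the lock's expiry to max(now, the node's own start time + lockAtLeastFor) without checking which node currently holds the lock. The checkOwnerOnUnlock flag is a hypothetical variant included only for comparison.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

// Simplified in-memory model of one ShedLock-style lock record. Illustration only,
// not ShedLock's code. Assumption (matching the logs in the post): by default,
// unlocking does not check which node currently holds the lock.
final class SimulatedLock {
    private final Duration lockAtMostFor;
    private final Duration lockAtLeastFor;
    private final boolean checkOwnerOnUnlock; // hypothetical variant, for comparison only
    private LocalTime lockUntil = LocalTime.MIN; // the lock is free once now >= lockUntil
    private String lockedBy = null;

    SimulatedLock(Duration lockAtMostFor, Duration lockAtLeastFor, boolean checkOwnerOnUnlock) {
        this.lockAtMostFor = lockAtMostFor;
        this.lockAtLeastFor = lockAtLeastFor;
        this.checkOwnerOnUnlock = checkOwnerOnUnlock;
    }

    // Acquire if the previous lock expired or was released. The new lock expires on
    // its own after lockAtMostFor, whether or not the job has finished by then.
    boolean tryLock(String node, LocalTime now) {
        if (now.isBefore(lockUntil)) {
            return false;
        }
        lockUntil = now.plus(lockAtMostFor);
        lockedBy = node;
        return true;
    }

    // Release the lock, but keep it held until at least lockedAt + lockAtLeastFor.
    void unlock(String node, LocalTime lockedAt, LocalTime now) {
        if (checkOwnerOnUnlock && !node.equals(lockedBy)) {
            return; // another node owns the lock now, so leave it alone
        }
        LocalTime minUntil = lockedAt.plus(lockAtLeastFor);
        lockUntil = minUntil.isAfter(now) ? minUntil : now;
    }
}

final class TimelineReplay {
    // Replays the second log excerpt; returns the nodes that acquired the lock, in order.
    static List<String> replay(boolean checkOwnerOnUnlock) {
        SimulatedLock lock = new SimulatedLock(
                Duration.ofMinutes(5), Duration.ofMinutes(1), checkOwnerOnUnlock);
        List<String> acquired = new ArrayList<>();
        LocalTime host1Start = LocalTime.of(6, 53, 1);
        LocalTime host2Start = LocalTime.of(6, 58, 30);

        if (lock.tryLock("host 1", host1Start)) acquired.add("host 1");
        // Host 1's lock expired at 6:58:01, so host 2 can acquire it.
        if (lock.tryLock("host 2", host2Start)) acquired.add("host 2");
        // Host 1 finishes late and releases the lock.
        lock.unlock("host 1", host1Start, LocalTime.of(6, 59, 5));
        if (lock.tryLock("host 3", LocalTime.of(6, 59, 23))) acquired.add("host 3");
        lock.unlock("host 2", host2Start, LocalTime.of(7, 0, 21));
        if (lock.tryLock("host 1", LocalTime.of(7, 0, 50))) acquired.add("host 1");
        return acquired;
    }
}
```

TimelineReplay.replay(false) returns [host 1, host 2, host 3, host 1], the same acquisition order as the log. With the hypothetical owner check, replay(true) returns [host 1, host 2, host 1], so host 3 is refused. Host 1 and host 2 still overlap between 6:58:30 and 6:59:05 in both cases, which is why a generous lockAtMostFor matters most.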

Long story short, it's really important to set lockAtMostFor well above the longest execution time you can realistically expect, not just the typical one. Adding logging, alerts, or some other control mechanism to catch overruns is also worthwhile, because there are cases where double-processing a batch would be very bad.
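As one possible control mechanism, here is a sketch of a guard that wraps the job body. It does two things: it claims a key for each batch (for example, the job name plus its scheduled window) so a second run of the same batch is skipped, and it logs a severe message when a run takes longer than lockAtMostFor, since that is exactly when another node may have started. BatchGuard, Outcome, and the batch-key scheme are hypothetical and not part of ShedLock. The in-memory set is only a stand-in: across a cluster, the claim has to live in shared storage, such as a database table with a unique constraint on the key.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical guard around a batch job body; not part of ShedLock.
final class BatchGuard {
    enum Outcome { COMPLETED, COMPLETED_OVERRUN, SKIPPED_DUPLICATE }

    private static final Logger LOG = Logger.getLogger(BatchGuard.class.getName());

    // In-memory stand-in only. Across a cluster, claims must live in shared,
    // durable storage (assumption: e.g. a table with a unique constraint on the key).
    private final Set<String> claimedBatches = ConcurrentHashMap.newKeySet();
    private final Duration lockAtMostFor;
    private final Supplier<Instant> clock;

    BatchGuard(Duration lockAtMostFor, Supplier<Instant> clock) {
        this.lockAtMostFor = lockAtMostFor;
        this.clock = clock;
    }

    // Runs the job unless this batch key was already claimed, and flags runs that
    // outlast lockAtMostFor (another node may have started the job in the meantime).
    Outcome runOnce(String batchKey, Runnable job) {
        if (!claimedBatches.add(batchKey)) {
            LOG.warning(() -> "Batch " + batchKey + " already claimed; skipping duplicate run");
            return Outcome.SKIPPED_DUPLICATE;
        }
        Instant start = clock.get();
        job.run();
        Duration elapsed = Duration.between(start, clock.get());
        if (elapsed.compareTo(lockAtMostFor) > 0) {
            LOG.severe(() -> "Batch " + batchKey + " took " + elapsed
                    + ", longer than lockAtMostFor=" + lockAtMostFor);
            return Outcome.COMPLETED_OVERRUN;
        }
        return Outcome.COMPLETED;
    }
}
```

Claiming the key before running the job is what actually prevents the duplicate; the duration check only fires after the fact, so it works as an alert rather than a safeguard.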

I believe that software development is fundamentally about making decisions, and so this is what I write about (mostly). In 2018 I started Highline Solutions, a consulting practice that helps companies with architecture, devops, and full-stack development. I have two degrees from Carnegie Mellon University, one in Information and Decision Systems and one in Philosophy (thesis). I live in Pittsburgh, PA with my wife and 3 energetic boys.