I often use ShedLock for batch processing in clustered environments, and it works great. Recently, though, I hit an interesting situation where the behavior surprised me. I had a batch job configured like this:
```java
@SchedulerLock(name = "SomeBatchJob", lockAtMostFor = "5m", lockAtLeastFor = "1m")
```
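For context, that annotation sits on a Spring `@Scheduled` method. Here's a minimal sketch of the wiring, assuming Spring and ShedLock's Spring integration on the classpath; the class name, cron expression, and method body are hypothetical placeholders, not my actual job:

```java
import net.javacrumbs.shedlock.spring.annotation.SchedulerLock;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class SomeBatchJob {

    // ShedLock is supposed to ensure that only one node in the cluster
    // runs this at a time, holding the "SomeBatchJob" lock for at least
    // 1 minute and at most 5 minutes.
    @Scheduled(cron = "0 */5 * * * *") // hypothetical schedule
    @SchedulerLock(name = "SomeBatchJob", lockAtMostFor = "5m", lockAtLeastFor = "1m")
    public void run() {
        // batch processing goes here
    }
}
```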
Per the documentation, I set `lockAtMostFor` above what I thought was a reasonable upper bound on execution time. I had three nodes in the cluster, and released it into the wild. At some point I noticed that multiple hosts were processing the batch job at the same time, so I started looking at the data. At first, everything looked normal:
| Time | Host | Event |
|------------|--------|---------------------|
| 6:45:00 am | host 1 | batch job started |
| 6:45:50 am | host 1 | batch job completed |
| 6:46:15 am | host 2 | batch job started |
| 6:47:30 am | host 2 | batch job completed |
...but then at some point, I saw this:
| Time | Host | Event |
|------------|--------|---------------------|
| 6:53:01 am | host 1 | batch job started |
| 6:58:30 am | host 2 | batch job started |
| 6:59:05 am | host 1 | batch job completed |
| 6:59:23 am | host 3 | batch job started |
| 7:00:21 am | host 2 | batch job completed |
| 7:00:50 am | host 1 | batch job started |
What happened was that host 1 took longer than `lockAtMostFor` to finish, so its lock expired and host 2 acquired it and started processing, as expected. But when host 1 finally finished, its release cleared the lock that host 2 now held, and host 3 then acquired it and started processing concurrently with host 2. This continued for a while until execution times came back down.
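The mechanics can be modeled with a toy lock. This is a hypothetical, heavily simplified sketch of the failure mode, not ShedLock's actual implementation: the key detail is that `unlock` frees the lock unconditionally, without checking whether the caller still owns it.

```java
// Simplified model of a timed lock where release does not verify ownership.
public class StaleUnlockDemo {

    static long lockedUntil = 0; // epoch millis; a time in the past means "free"
    static String owner = null;

    static boolean tryLock(String host, long now, long lockAtMostForMs) {
        if (now < lockedUntil) {
            return false; // someone else holds a live lock
        }
        owner = host;
        lockedUntil = now + lockAtMostForMs;
        return true;
    }

    // The problematic release: no check that 'host' is still the owner.
    static void unlock(String host, long now) {
        lockedUntil = now; // lock becomes free immediately, whoever held it
    }

    public static void main(String[] args) {
        long m = 60_000; // one minute in millis

        tryLock("host1", 0, 5 * m);                   // host1 starts at t=0
        boolean h2Early = tryLock("host2", 4 * m, 5 * m); // blocked: lock still live
        boolean h2Late  = tryLock("host2", 6 * m, 5 * m); // lockAtMostFor expired: acquired
        unlock("host1", 7 * m);                       // host1 finishes late, clears host2's lock
        boolean h3 = tryLock("host3", 7 * m, 5 * m);  // acquired: now two nodes running

        System.out.println(h2Early + " " + h2Late + " " + h3); // false true true
    }
}
```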
Long story short, it's really important to set `lockAtMostFor` comfortably above your worst-case execution time, not just a typical one. Adding logging, alerts, or some other control mechanism to catch overlapping runs is also worth it, because there are cases where double-processing a batch would be very bad.
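One way to catch this is to record each run's start and end and flag any overlap. A minimal sketch of that idea follows; the class and method names are hypothetical, and in production the runs would live in a shared store (e.g. a database table) rather than an in-memory list, since overlaps happen across hosts:

```java
import java.util.ArrayList;
import java.util.List;

// Records completed runs and reports whether a run overlapped a prior one.
public class OverlapDetector {

    static final class Run {
        final String host;
        final long start, end;
        Run(String host, long start, long end) {
            this.host = host;
            this.start = start;
            this.end = end;
        }
    }

    static final List<Run> runs = new ArrayList<>();

    // Returns true if this run overlapped an already-recorded run --
    // the signal to log loudly or fire an alert.
    static boolean recordRun(String host, long start, long end) {
        boolean overlapped = runs.stream()
                .anyMatch(r -> start < r.end && r.start < end);
        runs.add(new Run(host, start, end));
        return overlapped;
    }

    public static void main(String[] args) {
        // Rough shape of the incident timeline (minutes, arbitrary origin)
        recordRun("host1", 0, 1);
        boolean a = recordRun("host2", 8, 14);
        boolean b = recordRun("host3", 12, 16); // overlaps host2's run
        System.out.println(a + " " + b); // false true
    }
}
```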