[octavia] Loss of some jobs by worker

26 Sep 2024

Hi, OpenStack community,

I’ve run into a problem where some of our jobs get lost by a worker. For example, in the screenshot below, the worker received SIGTERM a few seconds after it picked up a job.

[Screenshot: image001.png]
After that, there were no further log messages related to this job. Later, our client complained that the LB had been stuck in PENDING_UPDATE for several days, and we started an investigation.
Our MySQL (persistent storage) is clean, but in our Redis I can see several jobs without a TTL, and I think they are related to the «lost» jobs.

[Screenshot: image002.png]
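
For anyone who wants to check their own deployment, here is a minimal sketch of how keys without a TTL could be listed. It assumes a redis-py style client (anything exposing `scan_iter` and `ttl`); the function name `keys_without_ttl` and the `"*"` pattern are illustrative only, since the actual key naming used by the jobboard is not shown in this post.

```python
def keys_without_ttl(client, pattern="*"):
    """Return keys matching `pattern` that exist but have no expiry set.

    `client` is assumed to behave like a redis-py client: it must provide
    `scan_iter(match=...)` and `ttl(name)`.
    """
    persistent = []
    for key in client.scan_iter(match=pattern):
        # Redis TTL returns -1 for a key with no expiry and -2 for a missing key.
        if client.ttl(key) == -1:
            persistent.append(key)
    return persistent


# Usage against a real server (host, port and pattern are placeholders):
#   import redis
#   client = redis.Redis(host="localhost", port=6379)
#   print(keys_without_ttl(client, pattern="*"))
```

`scan_iter` is used rather than `KEYS` so that the check does not block a production Redis while it walks the keyspace.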
Is this an expected situation? Could it be related to https://github.com/openstack/octavia/blob/master/octavia/common/base_taskflo... ? Let’s discuss it!

Payne Max