Hi, OpenStack community,

 

I’ve run into a problem where some of our jobs get lost by a worker. For example, in the screenshot, the worker received SIGTERM a few seconds after it picked up a job.

 

After that, there were no further log messages related to this job. Later, our client complained that an LB had been stuck in PENDING_UPDATE for several days, and we started investigating.

Our MySQL database (persistent storage) is clean, but in Redis I can see several jobs without a TTL, and I think they are related to the «lost» jobs.
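
For anyone who wants to check their own deployment, here is a minimal sketch of how keys without a TTL can be listed. It assumes a redis-py style client exposing scan_iter() and ttl(); the function name keys_without_ttl and the default pattern are hypothetical, and the actual key layout used by the jobboard is not shown here, so the pattern has to be adjusted.

```python
def keys_without_ttl(client, pattern="*"):
    """Return the keys matching `pattern` that have no expiry set.

    `client` is assumed to behave like a redis-py client: it must provide
    scan_iter(match=...) and ttl(name). `pattern` is a hypothetical default;
    narrow it to the key prefix your jobboard actually uses.
    """
    persistent = []
    # SCAN is used instead of KEYS so a large keyspace is walked incrementally.
    for key in client.scan_iter(match=pattern):
        # TTL reports -1 for a key that exists but has no expiry,
        # and -2 for a key that no longer exists.
        if client.ttl(key) == -1:
            persistent.append(key)
    return persistent
```

With redis-py this could be called as keys_without_ttl(redis.Redis(host="localhost")), where the host is only a placeholder. Keys reported this way are candidates for the «lost» jobs, not proof of it.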

 

Is this expected behavior? Could it be related to https://github.com/openstack/octavia/blob/master/octavia/common/base_taskflow.py#L209-L211? Let’s discuss it!