Hi,

Looking at the taskflow code, it seems that those last_modified entries may not always be deleted:
https://github.com/openstack/taskflow/blob/master/taskflow/jobs/backends/impl_redis.py#L449

I think that's something that could be improved, but I don't think it points to a bug there.
Which Octavia release are you using?

I don't see the octavia_jobboard:listings hash in your output.
Taskflow uses it to keep track of all current jobs. When a job is posted, an element is added to it:
https://github.com/openstack/taskflow/blob/master/taskflow/jobs/backends/impl_redis.py#L774-L775
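If it helps, here is a rough sketch for checking this on your side. It assumes a redis-py-compatible client (anything with hgetall(), scan_iter() and ttl()) and that the jobboard namespace is octavia_jobboard, taken from the hash name above; adjust it to your config. The helper name inspect_jobboard is just for illustration. It dumps the listings hash and lists the other keys under that namespace that have no TTL:

```python
from typing import Any, Dict, List


def inspect_jobboard(client: Any, namespace: str = "octavia_jobboard") -> Dict[str, Any]:
    """Return the listings hash and the namespaced keys that have no TTL.

    `client` is assumed to expose redis-py's hgetall(), scan_iter() and ttl().
    The namespace default is an assumption about the Octavia configuration.
    """
    listings_key = f"{namespace}:listings"
    # Each job that is still posted should have one field in this hash.
    listings = client.hgetall(listings_key)

    keys_without_ttl: List[Any] = []
    # SCAN is incremental (unlike KEYS), so it is gentler on a live server.
    for key in client.scan_iter(match=f"{namespace}*"):
        name = key.decode() if isinstance(key, bytes) else key
        if name == listings_key:
            continue  # the listings hash itself is expected to have no TTL
        # redis-py's ttl() returns -1 when the key exists but has no expiry.
        if client.ttl(key) == -1:
            keys_without_ttl.append(key)

    return {"listings": listings, "keys_without_ttl": keys_without_ttl}
```

With redis-py you would call it as inspect_jobboard(redis.Redis(host=..., decode_responses=True)). A job that shows up among the TTL-less keys but has no field in listings would not be picked up again by the conductor.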

When the conductor is started in Octavia (for instance, when the worker restarts after a crash or kill), it fetches all the elements of this hash to schedule the jobs.
https://github.com/openstack/taskflow/blob/master/taskflow/jobs/backends/impl_redis.py#L823
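To make the reasoning explicit, here is a toy in-memory model of that bookkeeping. It is not taskflow's implementation; ToyJobBoard, complete() and startup_scan() are hypothetical names. The point it illustrates: after a restart, only jobs that still have an entry in the listings hash get scheduled, and leftover per-job keys (like a last_modified entry that wasn't cleaned up) don't bring a job back.

```python
from typing import Dict, List


class ToyJobBoard:
    """Hypothetical in-memory stand-in for the Redis jobboard; not taskflow code."""

    def __init__(self) -> None:
        # Models the "<namespace>:listings" hash: job id -> serialized details.
        self.listings: Dict[str, str] = {}
        # Models other per-job keys (e.g. a last-modified marker) stored separately.
        self.other_keys: Dict[str, str] = {}

    def post(self, job_id: str, details: str) -> None:
        # Posting a job adds one field to the listings hash.
        self.listings[job_id] = details
        self.other_keys[f"{job_id}.last_modified"] = "timestamp"

    def complete(self, job_id: str) -> None:
        # Finishing a job removes its listing. In this toy the other key is
        # deliberately left behind, like a last_modified entry not cleaned up.
        self.listings.pop(job_id, None)


def startup_scan(board: ToyJobBoard) -> List[str]:
    """What a restarted conductor would try to schedule: only listed jobs."""
    return sorted(board.listings)


if __name__ == "__main__":
    board = ToyJobBoard()
    board.post("job-a", "{}")
    board.post("job-b", "{}")
    board.complete("job-a")
    print(startup_scan(board))  # only job-b is still listed
```

So if a worker is killed while a job is still listed, the next conductor start should pick it up again; if the listing is gone, the job won't be rescheduled no matter what other keys remain.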

Are there any suspicious backtraces in the Octavia worker, health manager, or housekeeping logs?

Greg


On Thu, Sep 26, 2024 at 3:15 PM Payne Max <yardalgedal@gmail.com> wrote:

Hi, OpenStack community,

 

I’ve run into a problem where some of our jobs can get lost by a worker. For example, in the screenshot, SIGTERM was received several seconds after a worker picked up a job.

 

After that, there were no new log messages related to this job. Later, our client complained that the LB had been stuck in PENDING_UPDATE for several days, and we started investigating.

Our MySQL (persistent storage) is clean, but in Redis I can see several jobs without a TTL, and I think they are related to the «lost» jobs.

 

Is this expected behavior? Could it be related to https://github.com/openstack/octavia/blob/master/octavia/common/base_taskflow.py#L209-L211? Let’s discuss!