openstack-discuss search results for query "#eventlet-removal"
openstack-discuss@lists.openstack.org- 186 messages
Re: [openstack] about libvirt exporter
by Nguyễn Hữu Khôi
Hi.
Thank you for the very useful information. I am looking forward to the changes in 2026.2.
Nguyen Huu Khoi
On Mon, Sep 29, 2025, 8:17 PM Sean Mooney <smooney(a)redhat.com> wrote:
> i believe there are 2-3 libvirt exporters but none of them are
> officially part
> of the libvirt, Prometheus or openstack projects.
>
> the topic of whether nova should/could have an exporter has come up in the past.
>
> my suggestion at the time was that if we restart and complete the
>
> https://specs.openstack.org/openstack/nova-specs/specs/2024.2/approved/per-…
>
> spec, which adds a per-process /health rest endpoint, we could extend
> that with a /metrics or /telemetry endpoint in open metrics/telemetry
> format
>
> https://edgedelta.com/company/blog/opentelemetry-vs-opentracing-vs-openmetr…
>
> so that it can provide a Prometheus scrape endpoint
>
>
> the idea was actually originally raised by John Garbutt when we first
> discussed the healthcheck endpoint at the ptg and
> we effectively said "yes that could be the direction but let's keep the
> initial scope small"
>
> this work was partly paused because of the eventlet removal work but it's
> something i think we should revisit eventually
> perhaps in 2026.2
>
> the reason i bring up eventlet is the current implementation uses the
> eventlet webserver but i have already considered
> replacing that with either the stdlib webserver to have 0 dependencies or
> https://github.com/cherrypy/cheroot
> to align with ironic, which already uses cheroot to host their jsonrpc
> endpoints, if we needed something
> more performant.
>
> in general the existing libvirt exporters work by directly
> talking to the libvirt api
> and then parsing the debug metadata we add to the domain xmls to
> associate the domain metrics with openstack resources.
>
> while this domain xml metadata is strictly speaking a private debug interface
> that is subject to change or removal
> and therefore not something that should be relied on for external
> integrations, it is used by ceilometer,
> collectd and many other tools and we have not made a backward
> incompatible change since it was introduced so
> it's more or less reasonable for tools like this to use that info in this
> way.
>
>
> On 29/09/2025 13:42, Nguyễn Hữu Khôi wrote:
> > Hello.
> >
> > I see that this exporter collects a lot of information alongside the
> > metrics, such as project name, user name and flavor, but it does not
> > look popular. It could help us build a very useful dashboard.
> >
> > https://github.com/zhangjianweibj/prometheus-libvirt-exporter
>
> there is also https://github.com/inovex/prometheus-libvirt-exporter
>
> which is a fork of it and https://github.com/kumina/libvirt_exporter
> which is what was packaged by distros like debian in the past
>
> https://salsa.debian.org/go-team/packages/prometheus-libvirt-exporter
>
> so there have been a couple of competing implementations and no clear
> winner that i'm aware of between the 3
>
>
> >
> > I just want to understand some thoughts.
> >
> > Nguyen Huu Khoi
>
>
3 months
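The idea in the thread above, a per-process /health endpoint served by the stdlib webserver and extended with a /metrics scrape endpoint, can be illustrated with a minimal sketch. This is not the nova spec's implementation: the handler class, the `METRICS` dictionary and the metric names are hypothetical, and the output only approximates the Prometheus text exposition format.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-process values; a real service would update these itself.
METRICS = {"service_up": 1, "rpc_requests_total": 42}


def render_metrics(metrics):
    """Render a dict as 'name value' lines, one metric per line."""
    return "".join(f"{name} {value}\n" for name, value in sorted(metrics.items()))


class HealthHandler(BaseHTTPRequestHandler):
    """Serve /health and /metrics with zero third-party dependencies."""

    def do_GET(self):
        if self.path == "/health":
            self._reply(200, "OK\n")
        elif self.path == "/metrics":
            self._reply(200, render_metrics(METRICS))
        else:
            self._reply(404, "not found\n")

    def _reply(self, status, body):
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def start_server(port=0):
    """Start the endpoint in a native daemon thread and return the server."""
    server = ThreadingHTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_server(8080)` and pointing a scraper at `http://127.0.0.1:8080/metrics` would exercise it; swapping the stdlib server for cheroot, as suggested in the thread, would only change `start_server`.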
[tc][all] OpenStack Technical Committee Weekly Summary and Meeting Agenda (2025.2/R-9)
by Goutham Pacha Ravi
Hello Stackers,
We're a month away from the feature freeze date for the 2025.2
("Flamingo") release cycle [1], scheduled for 2025-08-28. OpenStack
project teams are observing various internal deadlines prior to that
important date to ensure a complete and timely release.
As we look ahead, OpenStack's election officials will begin accepting
self-nominations for the upcoming Project Team Lead roles and four
seats on the OpenStack Technical Committee starting 2025-08-06 [2].
The nomination deadline is two weeks later, on 2025-08-20. You are
welcome to submit an early nomination [3] now if you anticipate being
away during the nomination window. To be a Project Team Lead
candidate, you must be an Active Project Contributor—i.e., someone who
has contributed to the project you wish to lead in the past 12 months
(counted up to 2025-08-20). Any individual member of the OpenInfra
Foundation, regardless of contributor status, may propose their
candidacy for the OpenStack Technical Committee.
To vote in these elections, you must be a member of the OpenInfra
Foundation [4] and an active contributor—either through merged
contributions to official projects or via nomination by a project’s
PTL, liaisons, any Special Interest Group, or the OpenStack Technical
Committee [5]. To receive your ballot, please ensure your email
address is added in Gerrit and that you've opted in to receive emails
from CIVS before 2025-08-20:
https://civs1.civs.us/cgi-bin/opt_in.pl
In the past week, the TC approved revised milestones for the
cross-project goal of removing the use of Eventlet [6]. We still aim
to drop Eventlet usage completely in the 2027.2 ("J") release. The
"aetos" project now has a canonical service type called
"metric-storage" [7]. Several other governance changes are currently
under community review.
=== Weekly Meeting ===
The last weekly meeting of the OpenStack TC took place [8] on
2025-07-22, with a smaller attendance than usual. We expect this trend
to continue through the summer break in the northern hemisphere. Key
updates included changes to the Eventlet-removal goal timeline and a
deeper look into the prolonged inactivity of the Monasca project team.
While volunteers have consistently expressed interest in maintaining
the project, it hasn’t met reactivation milestones since its last
active cycle (2024.1). However, some improvements have been merged
across the team’s repositories. The TC agreed to enforce a more formal
and transparent process: projects must explicitly request extensions
to inactive status [9], and future documentation will include
retirement deadlines that compel TC votes on any extensions. We'll
continue discussing the next steps for Monasca through governance
proposals and this mailing list. OpenDev administrators informed us
that they completed the transition off Nodepool, and Debian Trixie
images are nearly ready for testing. This is also useful for
transitioning the Ceph CI job to Debian to enable faster fixes in the
client software. In open discussion, the group revisited the
definition of corporate affiliation in the TC charter. A proposal will
be made to clarify what affiliation diversity means and how/why it
should be disclosed during elections.
The next meeting of the OpenStack Technical Committee is today,
2025-07-29. It will be hosted on the #openstack-tc channel on OFTC at
1700 UTC. Please find the meeting's agenda and other details on its
wiki page [10]. I hope you'll be able to join us.
=== Governance Proposals ===
==== Merged ====
- Make Eventlet removal deadlines more acceptable for operators |
https://review.opendev.org/c/openstack/governance/+/952903
- Retire networking-midonet |
https://review.opendev.org/c/openstack/governance/+/955364
==== Open for review ====
- Define "affiliation" within the context of the TC |
https://review.opendev.org/c/openstack/governance/+/956024
- Add series names to runtimes doc |
https://review.opendev.org/c/openstack/governance/+/955810
- Remove Monasca from active project |
https://review.opendev.org/c/openstack/governance/+/953671
- os-test-images never releases |
https://review.opendev.org/c/openstack/governance/+/954249
- Require declaration of affiliation from TC Candidates |
https://review.opendev.org/c/openstack/governance/+/949432
=== Upcoming Events ===
- 2025-08-06: Nominations open for OpenStack elections:
https://governance.openstack.org/election/
- 2025-08-20: Nominations close for OpenStack elections
- 2025-08-26: OpenInfra Days, Korea: https://2025.openinfradays.kr/
- 2025-08-28: Feature Freeze deadline, Milestone-3 of the Flamingo release [1]
- 2025-08-29: Colombia OpenInfra User Group Meetup:
https://www.meetup.com/colombia-openinfra-user-group/events/307096751/
- 2025-10-17: OpenInfra Summit, Paris-Saclay: https://summit2025.openinfra.org/
Thank you very much for reading!
On behalf of the OpenStack TC,
Goutham Pacha Ravi (gouthamr)
OpenStack TC Chair
[1] 2025.2 "Flamingo" Release Schedule:
https://releases.openstack.org/flamingo/schedule.html
[2] 2026.1 OpenStack Elections: https://governance.openstack.org/election/
[3] Submitting your candidacy:
https://governance.openstack.org/election/#how-to-submit-a-candidacy
[4] Join the OpenInfra Foundation: https://openinfra.org/join/individual/
[5] Who is an Active Contributor:
https://governance.openstack.org/tc/reference/charter.html#voters-for-tc-se…
[6] Eventlet removal timeline:
https://governance.openstack.org/tc/goals/selected/remove-eventlet.html#com…
[7] OpenStack service types:
https://specs.openstack.org/openstack/service-types-authority/#service-data
[8] TC Meeting IRC Log 2025-07-22:
https://meetings.opendev.org/meetings/tc/2025/tc.2025-07-22-17.00.html
[9] Emerging and inactive projects:
https://governance.openstack.org/tc/reference/emerging-technology-and-inact…
[10] TC Meeting Agenda, 2025-07-29:
https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting
5 months
Re: [oslo.messaging] Heartbeat in pthread
by Takashi Kajinami
On 10/1/24 23:31, Arnaud Morin wrote:
> Hey,
>
> I totally agree about the fact that heartbeat_in_pthread and the
> oslo.log PipeMutex are technical debt that we need to get rid of,
> as well as eventlet.
>
> However, although it seems purely cosmetic on your side,
> we believe it's not.
> I can't prove / reproduce the issue on a small infra, but definitely,
> at large scale, having those tcp connections dropped by rabbitmq
> and recreated in a loop by agents is affecting the cluster.
>
> I know all the pain that these settings introduced in the past, but now
> I feel we are in a stable situation regarding this, that's why I am
> surprised about deprecating heartbeat_in_pthread now.
>
> Can we, at least, make sure we keep all of this until we switch off
> eventlet?
> In other words, can we get rid of eventlet, then remove these params,
> and not the opposite?
That's the plan. We deprecated the parameter because it is no longer useful
*ONCE* we get rid of eventlet completely. The parameter will be removed ONLY
AFTER the eventlet removal is done.
>
> Regards,
>
> Arnaud
>
>
> On 01.10.24 - 11:38, smooney(a)redhat.com wrote:
>> i'm glad you managed to make it work but from a nova perspective we
>> do not recommend using heartbeat_in_pthread=true with nova-compute to the
>> point that i would consider that config unsupported.
>>
>> we also don't recommend using it with nova-api even when running via a wsgi server such as mod_wsgi
>> or uwsgi.
>>
>> the only thing this has ever done is remove a cosmetic warning in the rabbit/nova logs
>> due to the heartbeat timing out. This has never fixed any functional bug that
>> we were aware of but has resulted in several real bugs.
>>
>> the most recent we hit was https://launchpad.net/bugs/1983863 which was mitigated by
>> https://review.opendev.org/c/openstack/oslo.log/+/852443 however that uses an unsafe debug
>> option in eventlet, eventlet.debug.hub_prevent_multiple_readers(False)
>>
>> while you may be able to make heartbeat_in_pthread work with a lot of work,
>> as Takashi noted this will eventually go away when we remove eventlet, and to enable that removal
>> we need to replace the PipeMutex that currently fixes logging in a native thread, so
>> heartbeat_in_pthread is part of the technical debt we need to remove to eventually allow
>> us to move away from eventlet entirely.
>>
>> On Tue, 2024-10-01 at 09:13 +0000, Arnaud Morin wrote:
>>> Yes, I agree that it used to be broken, but since the bug was reported,
>>> we merged the following fixes:
>>>
>>> https://review.opendev.org/c/openstack/oslo.messaging/+/894731
>>> https://review.opendev.org/c/openstack/oslo.messaging/+/875615
>>> https://review.opendev.org/c/openstack/oslo.messaging/+/876318
>>>
>>> That's why I believe everything should be fine now :)
>>>
>>>
>>> On 01.10.24 - 17:20, Takashi Kajinami wrote:
>>>> I was too fast to push Send button.
>>>>
>>>> It's still interesting to see that you enabled the feature for eventlet services,
>>>> such as nova-compute. In the past we got a few bugs caused by that feature,
>>>> which made us eventually revert the default value to False.
>>>> https://bugs.launchpad.net/oslo.messaging/+bug/1934937
>>>> https://bugs.launchpad.net/oslo.messaging/+bug/1949964
>>>>
>>>> You might need to check if the reported problem is reproduced in your env.
>>>>
>>>> On 10/1/24 17:15, Takashi Kajinami wrote:
>>>>> Setting heartbeat_in_pthread is known to break services using eventlet
>>>>> so it SHOULD NOT be enabled by default. We tried to enable it by default
>>>>> in the past but eventually reverted it after seeing multiple problems.
>>>>>
>>>>> You can selectively enable it for services not using eventlet (api
>>>>> services run by httpd + mod_wsgi or uwsgi) but should keep it False for
>>>>> the other services.
>>>>>
>>>>> Once we get rid of eventlet then we no longer use eventlet thread for
>>>>> heartbeat so we no longer need that option (because the behavior would
>>>>> be equivalent to one with heartbeat_in_pthread=True). But until that point
>>>>> we can't change the default, unless someone is willing to dig into
>>>>> the past problems to make the feature completely work with eventlet (which
>>>>> I don't think is worth the effort at this stage).
>>>>>
>>>>> On 10/1/24 16:34, Arnaud Morin wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I completely missed the deprecation of heartbeat_in_pthread in
>>>>>> oslo.messaging [1].
>>>>>>
>>>>>> We heavily rely on this parameter downstream and our opinion is that it
>>>>>> should be set to True by default. We use it for both wsgi services and
>>>>>> agents (nova-compute, neutron agents, etc.).
>>>>>>
>>>>>> I understand that eventlet will be dropped in the future, but should we
>>>>>> set heartbeat_in_pthread to True by default until then?
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Arnaud.
>>>>>>
>>>>>>
>>>>>> [1] https://review.opendev.org/c/openstack/oslo.messaging/+/925778
>>>
>>
1 year, 3 months
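The per-service guidance in the thread above (keep heartbeat_in_pthread at False for services running under eventlet, and only consider True for services that do not use eventlet) can be sketched as a small config generator. This is an illustration only: the helper name and the `uses_eventlet` flag are hypothetical, and it assumes the option lives in the `[oslo_messaging_rabbit]` group of the service's configuration file.

```python
import configparser
import io


def rabbit_heartbeat_config(uses_eventlet):
    """Return an ini fragment setting heartbeat_in_pthread for one service.

    Assumption: services under eventlet keep the option False, as advised
    in the thread; other services may opt in to the native-thread heartbeat.
    """
    cfg = configparser.ConfigParser()
    cfg["oslo_messaging_rabbit"] = {
        "heartbeat_in_pthread": str(not uses_eventlet),
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# An eventlet-based agent such as nova-compute keeps the default.
AGENT_FRAGMENT = rabbit_heartbeat_config(uses_eventlet=True)
# An API service run under mod_wsgi or uwsgi could opt in.
API_FRAGMENT = rabbit_heartbeat_config(uses_eventlet=False)
```

Note that the nova guidance quoted in the thread goes further and does not recommend enabling the option even for nova-api, so the opt-in branch is a choice to validate per project.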
[tc][all] OpenStack Technical Committee Weekly Summary and Meeting Agenda (2025.2/R-14)
by Goutham Pacha Ravi
Hello Stackers,
We're fourteen weeks away from the coordinated 2025.2 "Flamingo"
release of OpenStack [1], and Milestone-2 for this release is on
2025-07-03. In the past week, the Call for Papers for the upcoming
OpenInfra Summit in Paris-Saclay ended; however, the deadline to
propose Forum sessions and Project Updates sessions is 2025-07-08 at
23:59 PST [2]. This time at the summit, there's also a "New
Contributor Showcase" inviting presentations on projects that leverage
or support open infrastructure. Whether it’s your first contribution,
an open source tool, or a collaboration with a community, we want to
hear about it, so please consider applying [3].
In the past week, the OpenStack Technical Committee did not merge any
new governance changes. A few change proposals are open for the
community's input. OpenDev Infra administrators and OpenStack's
Release Management team have posted several changes to begin enforcing
the Developer Certificate of Origin in lieu of the Contributor License
Agreement from July 1, 2025 [4]. If you haven't already begun using
"git commit -s" to sign off your commits in adherence to the DCO,
please do so. As a reminder, from July 1st, OpenDev's Gerrit system
(https://review.opendev.org) will reject any changes that do not
include a "Signed-off-by" line in the commit message. Simultaneously,
the existing CLAs will no longer be available for new contributors to
sign.
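For contributors who want to check their commit messages locally before pushing, a rough sketch of a "Signed-off-by" check follows. It is not the check Gerrit performs; the regular expression and function name are assumptions chosen for illustration, and they only look for a trailer of the shape that "git commit -s" appends.

```python
import re

# Assumed trailer shape: "Signed-off-by: Name <user@host>" on its own line.
SIGNOFF_RE = re.compile(
    r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$", re.MULTILINE
)


def has_signoff(commit_message):
    """Return True if the message contains a Signed-off-by trailer line."""
    return bool(SIGNOFF_RE.search(commit_message))
```

Passing the output of `git log -1 --format=%B` to `has_signoff` would tell you whether the most recent commit carries the line.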
=== Weekly Meeting ===
The OpenStack Technical Committee met on OFTC's #openstack-tc channel
on 2025-06-17 [5]. The team confirmed plans to transition to DCO
(Developer Certificate of Origin) by July 1st, approving updates to
the contributor guide and discussing the ongoing process of
integrating DCO across tools and translations. We then focused on
improving the contributor experience, with analyses of review metrics
being shared with various project teams like Nova and Cinder through
weekly IRC meetings. This data collection aims to identify pain points
and best practices to inform future strategies. We plan to drive
collective attention to this effort during the PTG in October.
We also discussed the level of activity in the Cyborg project. We have
had issues contacting the core maintainers to get some critical fixes
merged, and the TC agreed to begin the process to mark the project
"inactive." This decision aligns with the upcoming M-2 release
deadline. The discussion also prompted a health check on several
project teams. We'll be discussing this in further detail on proposals
directly on Gerrit, or in a future weekly meeting.
OpenDev Infra administrators completed a transition from Nodepool to
Zuul-launcher and pleasantly have not seen any negative impact on the
CI. During an Open Discussion, concerns were raised about the timeline
specified by the Eventlet Removal TC goal [6]. Nova maintainers, in
particular, expressed the need for more time to ensure a stable
transition to threading and adequate operator testing, suggesting a
revised target of 2028.1 for full Eventlet removal, a point that will
be further debated on the mailing list [7] and a Gerrit proposal [8].
The next meeting of the OpenStack Technical Committee is on
2025-06-24. It will be hosted over IRC in OFTC's #openstack-tc channel
at 1700 UTC. Please find the agenda and other details on the meeting's
wiki page [9]. I hope you'll be able to join us.
=== Governance Proposals ===
==== Open for review ====
- Require declaration of affiliation from TC Candidates |
https://review.opendev.org/c/openstack/governance/+/949432
- Make Eventlet removal deadlines more acceptable for operators |
https://review.opendev.org/c/openstack/governance/+/952903
- Mark Cyborg inactive |
https://review.opendev.org/c/openstack/governance/+/952798
=== Upcoming Events ===
- 2025-06-28: OpenInfra+Cloud Native Day, Vietnam:
https://www.vietopeninfra.org/void2025
- 2025-07-03: OpenStack's 15th Birthday, Colombia User Group:
https://www.meetup.com/colombia-openinfra-user-group/events/308383244
- 2025-07-08: OpenInfra Board meeting: https://board.openinfra.org/
- 2025-07-19: OpenInfra Days, Indonesia: https://2025.openinfra.id/
Thank you very much for reading!
On behalf of the OpenStack TC,
Goutham Pacha Ravi (gouthamr)
OpenStack TC Chair
6 months, 1 week
PTG October 2025 Team List Announcement
by Kendall Nelson
Hello Everyone!
The October 2025 Project Teams List is official!
Projects + Teams
- OpenInfra Project Teams
  - Kata Containers
  - StarlingX
- OpenStack
  - Services
    - Blazar
    - Cinder
    - CloudKitty
    - Ironic
    - Keystone
    - Kolla
    - Magnum
    - Manila
    - Neutron
    - Nova
    - Octavia
    - OpenStack Horizon
    - OpenStack-Ansible
    - Sunbeam
    - Swift
    - Telemetry
  - Other OpenStack Teams
    - Eventlet Removal
    - openstack-i18n
    - Release Management
    - Technical Committee
If your team was planning to meet and isn’t in this list, please contact
ptg(a)openinfra.dev IMMEDIATELY.
Soon I will be contacting moderators to sign up for time via the PTGBot[1]
once it is configured to accept those reservations.
As usual, feel free to let us know if you have any questions!
Thanks!
-Kendall (diablo_rojo)
[1] PTGBot Docs:
https://opendev.org/openstack/ptgbot#open-infrastructure-ptg-bot
[2] PTG Registration: http://ptg.openinfra.org/
3 months, 2 weeks
[tc][all] OpenStack Technical Committee Weekly Summary and Meeting Agenda (2025.2/R-13)
by Goutham Pacha Ravi
Hello Stackers,
We're thirteen weeks away from the coordinated release of OpenStack
2025.2 "Flamingo" [1]. This Thursday marks Milestone-2 of the release
cycle. This is a bug-targeting milestone for OpenStack project teams,
as well as a checkpoint for the OpenStack Release Management team to
line up deliverables that will be part of the coordinated release at
the end of this development cycle. Currently, the number of
deliverables to be released has only changed slightly from the
OpenStack 2025.1 release. No significant deliverables have been
deprecated or removed.
In the past week, the OpenStack Technical Committee did not make any
new governance changes; however, several proposals are under the
community's review. Most significantly, today (2025-06-30) marks the
end of the OpenStack Contributor License Agreement (CLA). The CLA was
adopted in July 2010 (do you feel old yet?) [2], and thousands of
contributors have signed it. However, for over a decade, we've felt
encumbered by it: it was arcane, perceived to be more permissive than
Apache v2, and caused friction with individuals and organizations. We
believe it has limited contributor involvement. Today, we join several
other open source projects in requiring the "Developer Certificate of
Origin" as a way to sign off your contributions. As we close the lid
on the CLA regime, you'll need to "git commit -s" each of your
contributions from 2025-07-01 to adhere to the DCO [3][4]. We do
anticipate hiccups and request your cooperation in ironing out any
issues in the code review system and contributor tooling during this
transition. If you notice something broken, please chime in on this
mailing list or on OFTC's #opendev channel.
=== Weekly Meeting ===
The last weekly meeting of the OpenStack Technical Committee was held
on 2025-06-24 [5]. The meeting was well attended and covered several
topics. The proposal to mark the Cyborg project as inactive [6] was
withdrawn after critical CI fixes were merged. While activity has
resumed for this cycle, there are concerns about the project’s
long-term maintenance, particularly in light of future OpenStack-wide
changes like eventlet removal and dependency updates. It was opined
that Cyborg is not "feature complete" and must stay responsive to
ongoing ecosystem changes, even if its core functionality appears
stable. The upcoming cycle's elections will serve as a checkpoint to
reassess the project’s trajectory and whether new maintainers need
onboarding.
We then discussed the timeline of the ongoing effort to phase out
eventlet. A new governance patch [7] proposes a timeline that aligns
with operator and distro expectations, especially in light of Python
3.13 adoption. Python 3.12 will continue to be a fallback for some
time, but when supporting Python 3.13, we hope not to depend on
eventlet across the board. The TC emphasized that continued discussion
on the Gerrit change is necessary to finalize acceptable timelines.
The next major topic concerned setuptools changes that will impact PBR
(Python Build Reasonableness). Setuptools will remove "ScriptWriter"
and "pkg_resources" by October 31, 2025. These removals break current
functionality in PBR and could jeopardize CI and release workflows. A
critical bug was reported against PBR by maintainers of setuptools
[8]. The TC discussed options ranging from vendoring replacements to
rewriting PBR logic or adopting upstream tools despite performance
tradeoffs. A key concern is that PBR’s CI is currently broken and must
be fixed before any meaningful changes can be implemented. We'd like
to seek a volunteer to resolve this issue. Please chime in on OFTC's
#openstack-tc or #openstack-oslo if you're keen to help with this.
In closing, we discussed the OpenInfra Summit 2025 CFP for Forum
Sessions and Project Updates. CFP submissions must be made before
23:45 PST on 2025-07-08 [9].
The next meeting of the OpenStack Technical Committee will be held on
2025-07-01 at 1700 UTC. This meeting will be hosted simultaneously
over Meetpad and IRC. You're welcome to join whichever platform you
prefer. The Meetpad session will be recorded, and the recording will
be shared on the TC's YouTube channel [10]. Please find the agenda and
joining information on the meeting's wiki page [11]. I hope you'll be
able to join us there.
=== Governance Proposals ===
==== Open for review ====
- Add service-types-authority to SDK deliverables |
https://review.opendev.org/c/openstack/governance/+/953548
- Deprecate shade, os-client-config |
https://review.opendev.org/c/openstack/governance/+/953549
- Remove Monasca from active project |
https://review.opendev.org/c/openstack/governance/+/953671
- Make Eventlet removal deadlines more acceptable for operators |
https://review.opendev.org/c/openstack/governance/+/952903
- Require declaration of affiliation from TC Candidates |
https://review.opendev.org/c/openstack/governance/+/949432
=== Upcoming Events ===
- 2025-07-03: OpenStack's 15th Birthday, Colombia User Group:
https://www.meetup.com/colombia-openinfra-user-group/events/308383244
- 2025-07-08: OpenInfra Board meeting: https://board.openinfra.org/
- 2025-07-19: OpenInfra Days, Indonesia: https://2025.openinfra.id/
Thank you very much for reading!
On behalf of the OpenStack TC,
Goutham Pacha Ravi (gouthamr)
OpenStack TC Chair
[1] 2025.2 "Flamingo" Release Schedule:
https://releases.openstack.org/flamingo/schedule.html
[2] OpenStack and its CLA: https://wiki.openstack.org/wiki/OpenStackAndItsCLA
[3] OpenStack will replace CLA with DCO:
https://governance.openstack.org/tc/resolutions/20250520-replace-the-cla-wi…
[4] OpenStack Contributor Guide to DCO:
https://docs.openstack.org/contributors/common/dco.html
[5] TC Meeting IRC Log 2025-06-24:
https://meetings.opendev.org/meetings/tc/2025/tc.2025-06-24-17.00.html
[6] Cyborg will not be marked inactive:
https://review.opendev.org/c/openstack/governance/+/952798
[7] Timeline changes for the Eventlet Removal goal:
https://review.opendev.org/c/openstack/governance/+/952903
[8] PBR setuptools incompatibility: https://bugs.launchpad.net/pbr/+bug/2107732
[9] CFP for OpenInfra Summit 2025: https://summit2025.openinfra.org/cfp/
[10] OpenStack TC YouTube Channel: https://www.youtube.com/@openstack-tc
[11] TC Meeting Agenda, 2025-07-01:
https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting
6 months
Re: [eventlet-removal]When to drop eventlet support
by Ghanshyam Maan
---- On Fri, 13 Jun 2025 08:33:25 -0700 Jay Faulkner <jay(a)gr-oss.io> wrote ---
>
> On 6/13/25 5:08 AM, Balazs Gibizer wrote:
> > Hi Stackers!
> >
> > I would like to sync about the planned timeline of dropping eventlet
> > support from OpenStack / Oslo.
> >
> > Nova definitely needs at least the full 2026.1 cycle to have a chance
> > to transform the nova-compute service. But this plan already feels
> > stretched based on the progress in the current cycle. So being
> > conservative means we need the 2026.2 cycle as a buffer.
> >
> > Nova would like to keep a release where we support both eventlet and
> > threading in parallel. So that operators can do the switching from
> > eventlet to threading outside of the upgrade procedure. (This was an
> > explicit request from them during the PTG). So 2026.2 could be that
> > version where nova fully supports both concurrency modes, while
> > eventlet can be marked deprecated. Then the 2027.1 release could be
> > the first release dropping eventlet.
> >
> > However we need to align with the SLURP upgrade as well. 2026.1 is a
> > SLURP. But in that release Nova might not be ready to have all
> > services running in threading mode. So the 2026.1 - 2027.1 SLURP
> > upgrade would force the operators to change the concurrency mode
> > during the upgrade itself.
> >
> > I see two ways forward:
> > * A) We say that operators who want to do the concurrency mode change
> > outside of an upgrade cannot skip the 2026.2 release, i.e. they
> > cannot do a SLURP upgrade directly from 2026.1 to 2027.1.
This has a big impact on upgrades and breaks our SLURP model.
> > * B) We keep supporting the eventlet mode in the 2027.1 release as
> > well and only drop support in 2028.1.
I am in favour of this option.
I was reading the goal doc about the timeline and found this in the
'Completion Criteria' section [1], which says:
- "(2027.1) Get usage of Eventlet in oslo deliverables removed;"
- "(2027.2) Get Eventlet retired from OpenStack;"
Again, 2027.2 (non-SLURP) is mentioned as the eventlet retirement release. I do not know
whether there is a technical reason to do it in a non-SLURP release or whether it can be moved to a SLURP release.
Maybe hberaud knows.
Anyway, thanks gibi for bringing this up. There are many projects that have not started the
work yet (Nova might have more work compared to others), but I think we should discuss/re-discuss
the timelines considering all projects/challenges. Accordingly, we should update the goal doc with the exact timelines
and with what the impact will be for projects that do not finish the work as per the timelines (for example upgrade
issues, workarounds, etc.).
[1] https://governance.openstack.org/tc/goals/selected/remove-eventlet.html#com…
-gmaan
>
> Keeping eventlet running for that long is not something that is a worthy
> investment of time. The oslo libraries are showing a deprecation of
> 2026.2, I've been using that date as the target for all Ironic work as
> well.
>
> Beyond the oslo team (who I don't speak for), there are folks -- like
> Itamar on behalf of GR-OSS -- who are doing work behind the scenes to
> keep eventlet running - barely. I do not expect the GR-OSS investment in
> this work to extend much past the midpoint of 2026.
>
>
> My $.02,
>
> Jay Faulkner
> Open Source Developer
> G-Research Open Source Software
>
>
6 months, 2 weeks
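The SLURP argument in the thread above can be checked with a small model. The sketch below encodes which concurrency modes each release would support under options A and B, then asks whether an operator following a given upgrade path ever runs a release that supports both modes (and can therefore switch outside of an upgrade). The dictionaries are a simplification of the proposals: treating 2026.1 as eventlet-only and 2027.2 under option B as dual-mode are assumptions, and all names are illustrative.

```python
EVENTLET, THREADING = "eventlet", "threading"
BOTH = {EVENTLET, THREADING}

# Option A: eventlet dropped in 2027.1 (assumed modes per release).
PLAN_A = {
    "2026.1": {EVENTLET},
    "2026.2": BOTH,
    "2027.1": {THREADING},
}

# Option B: eventlet kept through 2027.1, dropped in 2028.1.
PLAN_B = {
    "2026.1": {EVENTLET},
    "2026.2": BOTH,
    "2027.1": BOTH,
    "2027.2": BOTH,  # assumption: not stated explicitly in the thread
    "2028.1": {THREADING},
}


def slurp_path(plan):
    """Releases an operator runs when upgrading SLURP to SLURP (the .1 releases)."""
    return [release for release in sorted(plan) if release.endswith(".1")]


def can_switch_outside_upgrade(plan, path):
    """True if some release on the path supports both concurrency modes."""
    return any(plan[release] == BOTH for release in path)
```

Under these assumptions, `can_switch_outside_upgrade(PLAN_A, slurp_path(PLAN_A))` is false while the same call with `PLAN_B` is true, which is the asymmetry the two options are weighing.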
[neutron][ptg] 2025.2 Flamingo PTG summary
by Brian Haley
Hi all,
Thanks to all who attended the meetings. We had a good turnout for
Neutron, the Nova/Cinder cross-project and the eventlet removal
discussions that were important to the team.
For complete notes see the etherpad [0] but a summary is below. And
please comment if you feel I missed anything.
Thank you,
-Brian
### Epoxy retrospective ###
Like any other PTG, it started with a retrospective of the past cycle,
with the highlights and improvement points.
Good:
- Eventlet removal is going well. Even the transition period (for the
Neutron API with ML2/OVN) finished with good results.
- Permanent "open doors" status to welcome newcomers. Our meetings (team
meeting, drivers meeting) are a good forum for them.
- Removed experimental code (Linuxbridge, IPv6 PD).
- Discussions about migrating from ML2/Linux Bridge to ML2/OVN have
started in the operators community
(https://etherpad.opendev.org/p/neutron-lb-ovn-migration)
Bad:
- The loss of 2 important core developers.
- No new specs merged in Epoxy cycle.
### Gate Stability ###
The team has been very good at addressing gate issues quickly, and meets on
a regular basis. We have noticed a few issues related to eventlet
changes and are reverting them until we can address them.
### Spring cleanup / Code base modernization ###
At the beginning of every cycle, we always try to clean up the code
base, for example by removing deprecated and dead code, updating docs and
consolidating jobs. We have made progress on those already.
- During Epoxy, several changes were made to match the expected output
of pyupgrade/autopep8 for py39+. We will make a similar change in
Flamingo for py310+ as that is now the lower bounds (haleyb).
- We also discussed adding "ruff" to our checklist (mlavalle).
- Now that the OVN agent is in place and the default in the gate, we
discussed removing the OVN metadata agent, as it is unnecessary to
have both.
- CI job reduction: move the ML2/OVS iptables-hybrid jobs to periodic
(and experimental) and determine if iptables-hybrid driver can be moved
to experimental config space.
- TODOs: spend some time addressing the current TODO notes in the code
and fixing what is proposed. This has already started.
- Remove the unneeded OVN maintenance tasks, according to the comments
in the methods.
- Will make sure beginning/end of cycle docs are up-to-date on things
like CI jobs, etc. (haleyb)
### Eventlet removal ###
The "oslo.service" library will implement two backends (eventlet and
threading), which will allow us to test both implementations during the
removal. It could be useful to have temporary CI jobs using both
implementations.
The testing CI (unit test, functional and fullstack) should start the
migration during this cycle, but only once the Neutron service code is
migrated.
Please see [1] and [2] for more information and meetings notes from the
eventlet-specific agenda.
### Migration to SDK (neutronclient removal) ###
Etherpad link:
https://etherpad.opendev.org/p/python-neutronclient_deprecation
During the last cycle a number of Horizon patches merged (thanks Lajos!).
Heat patches are still under review (and not attended).
Nova patch is WIP: https://review.opendev.org/c/openstack/nova/+/928022
Neutron team will add deprecation warnings in the neutronclient code as
there are still a number of places, in both in-tree and out-of-tree
code, that use the neutronclient.
The fullstack tests will migrate to SDK during this cycle.
### Migrate the OVN L3 scheduler to use HA_Chassis_Group ###
Link: https://bugs.launchpad.net/neutron/+bug/2092271
As recommended by the OVN core team, the usage of ``Gateway_Chassis`` to
bind a ``Logical_Router_Port`` is deprecated. Instead of this,
``HA_Chassis_Group`` should be used. This bug tracks this effort that
should be done during the next cycle.
### An OVN L3 router tool ###
Link: https://bugs.launchpad.net/neutron/+bug/2103521
The goal of this tool is:
- To be able to list the current GW LRP assignment, according to the
``HA_Chassis_Group``. It will show the GW chassis assignation level
(according to the priority) and the number of routers per chassis.
- To be able to reschedule the current assignments. In case of imbalance
(chassis deletion, router deletion), this tool will allow rescheduling
the GW chassis across the existing LRPs.
This is something that is useful to cloud operators that currently must
manually try and re-balance LRPs.
Status: Approved
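As a rough illustration of the "list" half of such a tool, the sketch
below counts how many LRPs have each chassis as their highest-priority
gateway. It is plain Python, not the proposed tool: the input layout, the
chassis and LRP names and the function name are all hypothetical, and it
assumes that the chassis with the highest priority is the active one.

```python
from collections import Counter


def routers_per_chassis(assignments):
    """Count LRPs per active chassis.

    `assignments` is a hypothetical mapping of
    {lrp_name: [(chassis_name, priority), ...]}.
    """
    counts = Counter()
    for chassis_list in assignments.values():
        if not chassis_list:
            continue  # LRP without any gateway chassis
        # Assumption: the highest priority entry is the active chassis.
        active, _priority = max(chassis_list, key=lambda item: item[1])
        counts[active] += 1
    return counts


assignments = {
    "lrp-a": [("chassis-1", 2), ("chassis-2", 1)],
    "lrp-b": [("chassis-1", 2), ("chassis-2", 1)],
    "lrp-c": [("chassis-2", 2), ("chassis-1", 1)],
}
counts = routers_per_chassis(assignments)
```

A rescheduler could use such counts to spot an unbalanced chassis before
moving any LRP.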
### Alembic idempotency ###
Link: https://bugs.launchpad.net/neutron/+bug/2100770
This request comes from an issue seen in RHOSO18 with a DB migration
backport. It is not possible to change the cycle milestones (the last DB
migration of each cycle), so downstream (D/S) migrations, which are
forbidden upstream (U/S), are placed before these milestones. The
problem happens when a user who is already at the last DB migration step
needs to re-execute the DB migration tool to receive the newly
backported migration. That will fail because it is not possible to
execute a DB migration script twice.
This proposal includes:
- Implementing a new test class that checks any new DB migration in
order to ensure it works, as we found issues recently.
- Require every new DB migration to be idempotent (to be enforced by
both code developers and reviewers).
- This would be supported on all existing migrations as well, not just
from today forward.
- Provide a usage guide on the alembic methods if necessary.
Status: Approved
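To make "idempotent" concrete, here is a minimal sketch that uses only
the stdlib sqlite3 module instead of Alembic. The table name, the column
name and both helper functions are hypothetical; the point is only that
the step inspects the schema first, so running it a second time is a
no-op instead of an error.

```python
import sqlite3


def column_exists(conn, table, column):
    """Return True if `column` is already present in `table`."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return any(row[1] == column for row in rows)


def add_column_if_missing(conn, table, column, ddl_type):
    """Idempotent upgrade step: alter the table only when needed."""
    if column_exists(conn, table, column):
        return False  # already applied, nothing to do
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ports (id TEXT PRIMARY KEY)")

# Simulate running the same migration twice.
first_run = add_column_if_missing(conn, "ports", "example_flag", "INTEGER")
second_run = add_column_if_missing(conn, "ports", "example_flag", "INTEGER")
```

A real migration would presumably do the equivalent check through the
database inspection facilities available to the migration script.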
### Stop DB drop migrations, with field drop exceptions ###
Link: https://bugs.launchpad.net/neutron/+bug/2104160
In a DB migration, developers should add a TODO note to drop anything
not used in the next SLURP release. That will prevent the case described
in the bug, where a user executed the "expand" phase migration code with
the old Neutron API code (as specified by the upgrade guide), resulting
in an error due to a missing DB column, which was dropped by a DB
migration script.
The initial output of this bug is a new policy in the development guide.
### Create a restricted default policy for undefined API calls ###
Link: https://bugs.launchpad.net/neutron/+bug/2098737
The goal of this RFE is to define a default restrictive rule for every
non-defined policy in the Neutron API, as "oslo.policy" allows us to
create a "default" policy that will be used in case the API call is not
defined.
The "default" Neutron policy is RULE_ADMIN_OR_OWNER, we propose to
change this to RULE_ADMIN_ONLY. Any new API not defined in the policies
would be executable only by the admin user.
This change cannot be done now, due to the high number of API calls
that have no rule defined and are using the default rule, see
https://review.opendev.org/c/openstack/neutron/+/945687 - but this rule
migration should be a background task for any team member that will
improve the quality of the RBAC system in Neutron.
Status: Approved
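The effect of switching the default can be sketched in a few lines of
plain Python. This is not the "oslo.policy" API: the rule table, the
action names and the helper functions are hypothetical, and only the
fallback behaviour is modelled.

```python
RULE_ADMIN_ONLY = "admin_only"
RULE_ADMIN_OR_OWNER = "admin_or_owner"


def check_rule(rule, user, resource):
    """Evaluate one of the two simplified rules."""
    if rule == RULE_ADMIN_ONLY:
        return user["is_admin"]
    if rule == RULE_ADMIN_OR_OWNER:
        return (user["is_admin"]
                or user["project_id"] == resource["project_id"])
    raise ValueError(f"unknown rule: {rule}")


def is_allowed(policies, action, user, resource, default_rule):
    """Use the action's rule, or fall back to the default rule."""
    rule = policies.get(action, default_rule)
    return check_rule(rule, user, resource)


policies = {"get_port": RULE_ADMIN_OR_OWNER}  # "create_widget" undefined
owner = {"is_admin": False, "project_id": "p1"}
resource = {"project_id": "p1"}

# Same undefined API call, evaluated under both defaults.
with_current_default = is_allowed(
    policies, "create_widget", owner, resource, RULE_ADMIN_OR_OWNER)
with_proposed_default = is_allowed(
    policies, "create_widget", owner, resource, RULE_ADMIN_ONLY)
```

With the proposed default, an API call that has no explicit rule stops
being available to project owners, which is why the missing rules need
to be defined first.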
### Nova/Cinder cross-project ###
We discussed two topics at the Nova/Cinder cross-project meeting.
Project-Specific QoS Controls for Granular Resource Management. Driven
by Bloomberg. Link: https://bugs.launchpad.net/neutron/+bug/2102184
Although many questions remain about the implementation, the
Neutron team agreed to accept a spec describing it. The Neutron team
agreed to have a per-project QoS policy that will affect a
not-yet-defined set of resources (ports, networks, routers, FIPs).
Bloomberg agreed to work on this later this cycle.
For live-migration with OVN, Nova currently has a vif-plugged event
timer to work-around any missed (or not sent) notifications.
Link: https://bugs.launchpad.net/nova/+bug/2073254
The code for the multiple binding is in core OVN (23.09) and Neutron.
Nova team will propose a patch to remove the condition in Nova and wait
for the vif-plugged event when using the ML2/OVN backend as the
condition should not happen any more.
Patch link: https://review.opendev.org/c/openstack/nova/+/946950
### Service role permissions ###
The Octavia allowed address pair driver requires new service role
permissions.
Link: https://bugs.launchpad.net/neutron/+bug/2105502
Three patches proposed:
- Add new default policy for device_id field on ports:
https://review.opendev.org/c/openstack/neutron/+/861169
- Allow service role to create/update port device_id:
https://review.opendev.org/c/openstack/neutron/+/947003
- Allow service role more RBAC access for Octavia:
https://review.opendev.org/c/openstack/neutron/+/945329
### L3/DHCP thread pool resizing ###
Link: https://review.opendev.org/c/openstack/neutron/+/938411
This proposal is related to the eventlet removal topic. There is no
direct replacement for ``eventlet.GreenPool``. The resizable mechanism
does not provide a resource improvement in speed or memory and there is
no kernel library that provides this functionality. The default and
static maximum number of threads will be 32 (the current maximum).
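For reference, a fixed-size pool with the stdlib looks roughly like the
sketch below. It is not the Neutron patch: the worker function and the
router IDs are hypothetical, and only the static upper bound of 32
threads is taken from the text above.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 32  # static upper bound, as described above


def process_router(router_id):
    """Hypothetical stand-in for per-router work done by an agent."""
    return f"processed-{router_id}"


# The executor starts threads on demand but never more than MAX_WORKERS;
# extra work items wait in its internal queue.
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    results = list(pool.map(process_router, range(100)))
```

The pool size is set once at construction time, which matches the
decision to drop resizing.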
### VXLAN physical networks ###
Link: https://bugs.launchpad.net/neutron/+bug/2105855
The goal is to be able to schedule a VXLAN port depending on the
physical mappings as the VTEP interfaces in each compute node could be
connected to different physical networks.
This would help improve the "networking-generic-switch" back end as it
could use this new information.
An alternative to this could be to implement a new type driver: l2vni,
used for tunneled drivers (VXLAN) with a physical back end. A spec is
required to study this proposal further.
### vlan_transparent ###
Link: https://bugs.launchpad.net/neutron/+bug/2092174
Proposal to deprecate and remove the vlan_transparent config option.
By default, the loaded drivers will determine if the functionality is
supported or not.
Status: Approved
### OVN "Add-Only" Sync Mode ###
Link: https://bugs.launchpad.net/neutron/+bug/2099818
Proposal to add a new OVN sync mode option that will only add things to
the OVN database and never remove from either OVN or Neutron.
There were a few worries, as turning-off delete might just make things
more out of sync, and not address any issues being seen in
ovn-controller or ovn-northd.
It was also pointed out that we did spend some time over past cycles
tagging neutron-owned objects in OVN explicitly, and there might just be
some additional edge cases we need to fix. This might address the issue
where other objects change unexpectedly.
Need to discuss further at future Drivers meeting as submitter did not
attend.
### RFE discussions ###
On-demand topic to discuss any in-progress or proposed RFE changes,
please see etherpad for more information.
[0] https://etherpad.opendev.org/p/apr2025-ptg-neutron
[1] https://etherpad.opendev.org/p/neutron-eventlet-deprecation
[2] https://etherpad.opendev.org/p/apr2025-ptg-eventlet
8 months, 1 week
[tc][all] OpenStack Technical Committee Weekly Summary and Meeting Agenda (2025.1/R-10)
by Goutham Pacha Ravi
Hello Stackers,
We're 10 weeks away from the 2025.1 coordinated "Epoxy" release [1].
Starting this week, there are several deadlines that project teams
have adopted to ensure that code, specifications, and documentation
changes are adequately peer-reviewed. As Slawek Kaplonski (slaweq)
shared to this mailing list [2], the nomination period for the
upcoming Technical Committee and Project Team Leads elections will
begin on 2025-02-05 and remain open until 2025-02-19. Early
nominations are highly encouraged in case you'll be unavailable during
this period.
In the last week, the OpenStack Technical Committee selected a
community goal to migrate away from the eventlet library across its
codebase [3]. As with many community goals, this will be a
multi-release goal and needs contributors. Please join the
#openstack-eventlet-removal IRC channel on OFTC to participate in the
discussion.
=== Weekly Meeting ===
Dmitriy Rabotyagov (noonedeadpunk) chaired the TC's weekly IRC meeting
on 2025-01-14 [4]. We discussed the maintenance of the
"openstackdocstheme" Sphinx plugin and called out some early attempts
to switch documentation to a stock theme that's more maintainable and
sustainable. We also reviewed a proposal from Freezer project
maintainers to remove the project team from the "inactive" teams list
[5]. The TC also considered requesting an exception for Freezer to
release within 2025.1. In this vein, we also highlighted the need for
consistent criteria to move projects out of inactive status and better
tooling for indicating project health and activity.
The next meeting of the OpenStack Technical Committee is today,
2025-01-21. This meeting will be hosted via IRC on OFTC's
#openstack-tc channel. Please find the meeting agenda on the wiki [6].
I hope you'll be able to join us there.
=== Governance Proposals ===
==== Merged ====
- Propose to select the eventlet-removal community goal |
https://review.opendev.org/c/openstack/governance/+/934936
- Rework the eventlet-removal goal proposal |
https://review.opendev.org/c/openstack/governance/+/931254
- Add ansible-role-httpd repo to OSA-owned projects |
https://review.opendev.org/c/openstack/governance/+/935694
==== Open for Review ====
- Resolve to adhere to non-biased language |
https://review.opendev.org/c/openstack/governance/+/934907
- Retire Freezer DR | https://review.opendev.org/c/openstack/governance/+/938183
- Retire qdrouterd role |
https://review.opendev.org/c/openstack/governance/+/938193
- Remove Freezer from inactive state |
https://review.opendev.org/c/openstack/governance/+/938938
- Reset the DPL model for oslo project |
https://review.opendev.org/c/openstack/governance/+/939485
- Reset the DPL model for Release project |
https://review.opendev.org/c/openstack/governance/+/939486
- Reset the DPL model for Requirement project |
https://review.opendev.org/c/openstack/governance/+/939487
- Reset the DPL model for Watcher project |
https://review.opendev.org/c/openstack/governance/+/939488
- Define 2025 upstream investment opportunities |
https://review.opendev.org/c/openstack/governance/+/939507
=== Upcoming Events ===
- 2025-01-28: OpenInfra Board Monthly Meeting: https://board.openinfra.org/
- 2025-02-01: FOSDEM 2025 (https://fosdem.org/2025/) OpenStack's 15th
Birthday Celebration
- 2025-02-05: Nominations open for 2025.2 TC/PTL Elections:
https://governance.openstack.org/election/
- 2025-02-28: 2025.1 ("Epoxy") Feature Freeze and release milestone 3 [1]
- 2025-03-06: SCALE 2025 + OpenInfra Days NA
(https://www.socallinuxexpo.org/scale/22x)
Thank you very much for reading!
On behalf of the OpenStack TC,
Goutham Pacha Ravi (gouthamr)
OpenStack TC Chair
[1] 2025.1 "Epoxy" Release Schedule:
https://releases.openstack.org/epoxy/schedule.html
[2] TC/PTL Election dates for the 2025.2 cycle:
https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack…
[3] Goal to Remove Eventlet from OpenStack:
https://governance.openstack.org/tc/goals/selected/remove-eventlet.html
[4] TC Meeting IRC Log 2025-01-14:
https://meetings.opendev.org/meetings/tc/2025/tc.2025-01-14-18.00.log.html
[5] Remove Freezer from inactive state:
https://review.opendev.org/c/openstack/governance/+/938938
[6] TC Meeting Agenda, 2025-01-21:
https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting
11 months, 1 week
[watcher] 2026.1 Gazpacho PTG summary
by Douglas Viroel
Hi all,
Last week's PTG had a lot of great discussions, covering many different
aspects of the Watcher project.
If you want to get more details about a topic, you can find the PTG
etherpad with all notes here:
https://etherpad.opendev.org/p/watcher-2026.1-ptg
Here is a summary of the discussions that we had:
Future of datasources backends and untested integrations
*Discussion*: The discussion revolved around the removal timeline for
Monasca, the status of Gnocchi, and the deprecation of Prometheus
datasource in favor of Aetos. There was also a proposal to deprecate MAAS
support due to lack of testing and documentation, and to re-evaluate
experimental integrations in 2026.2.
*Agreed*:
- AI(jgilaber): Monasca datasource will be removed this cycle.
- More research is needed on current Gnocchi usage, and there will be no
change in its support this cycle. Suggestion: include a question about
Gnocchi in the next OpenStack user survey.
- AI(jgilaber): Prometheus will be deprecated in 2026.1 in favor of
Aetos. Suggestion: include documentation on how to upgrade between
datasources.
- AI(dviroel): MAAS will be marked as deprecated this cycle, since it
has Eventlet-dependent code that is planned to be removed in the near
future. We will send another email to the ML to call for maintainers.
- AI(sean-k-mooney): Update Watcher documentation about service
integrations that are now deprecated.
OpenStack SDK
*Discussion*: The discussion focused on integrating Watcher with the
OpenStack SDK in order to deprecate and remove the Python bindings used
in other places. For watcher, the goal is to replace the current usage
of each project's client with the SDK (if there is support for it). For
python-watcherclient, the goal is to provide only the *openstackclient*
plugin and to replace the Python bindings with the SDK (when available).
For watcher-dashboard, the goal is to use the SDK exclusively for API
interactions (also when available).
*Agreed*:
- Start with watcher integrations. The goal is to move one service
client (e.g.: Nova) to the SDK this cycle.
- Do not freeze python bindings until SDK support is in place.
- A single spec can address the overall plan.
Code modernization, dependencies and dead code removal
*Discussion*: The discussion covered code modernization using pyupgrade and
potentially ruff for cleanup. It also addressed the removal of dead code,
such as API routes that always raised exceptions and commented-out code in
production files. The removal of various client dependencies (Neutron,
Glance, Monasca) was also proposed.
*Agreed*:
- Apply the same pre-commit and ruff linting checks to the tempest
plugin and watcher client.
- Defer decisions on more typing until necessary, possibly starting with
interfaces.
- Explore dropping the number of dependencies, such as multiple timezone
libraries.
Applier's Workflow Execution and Its Interface/Contract
*Discussion*: The workflow for the Applier was identified as poorly
documented, leading to questions during reviews of new actions. The need
for a default Action interface was discussed.
*Agreed*:
- The assessment of a default Action interface will be based on the
chosen path for rollback and aborting topics.
- AI(dviroel): Current interface needs to be documented regardless.
Applier: Aborting running tasks
*Discussion*: The current implementation of the Applier spawning a new
green thread for each action, and killing threads for actions that support
abort(), was discussed.
*Agreed*:
- Stop spawning/killing threads on every action. This code can be
refactored right after we merge the eventlet changes in the applier (so we
don't mix the proposals).
- Improve the execute() method in actions to check resource status and
abort the process/looping when an action is cancelled/aborted.
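The cooperative approach in the last item can be sketched as follows.
This is not Watcher's Action interface: the class, its attributes and
the returned strings are hypothetical, and the loop body stands in for
polling a resource status.

```python
import threading


class PollingAction:
    """Hypothetical action that checks for abort between steps."""

    def __init__(self, steps):
        self.steps = steps
        self.completed_steps = 0
        self._abort_requested = threading.Event()

    def abort(self):
        # Safe to call from another thread; no thread is killed.
        self._abort_requested.set()

    def execute(self):
        for _ in range(self.steps):
            if self._abort_requested.is_set():
                return "ABORTED"
            # A real action would check the resource status here.
            self.completed_steps += 1
        return "SUCCEEDED"


finished = PollingAction(steps=3)
finished_state = finished.execute()

cancelled = PollingAction(steps=3)
cancelled.abort()  # abort requested before the loop starts
cancelled_state = cancelled.execute()
```

Because the action itself decides when to stop, the same code behaves
identically under eventlet and under native threads.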
Applier: Rollback of Action Plans
*Discussion*: The current lack of a working rollback mechanism and the fact
that the revert() method from Actions is not being tested or called were
highlighted. The future of the rollback option was debated. New rollback
mechanisms, such as a user-triggered "rollback" action for failed action
plans, were also considered.
*Agreed*:
- Auto-revert does not work and should be removed in the future.
- AI(dviroel): Current behavior should be treated as a bug, and the
documentation and associated configuration options should be updated
accordingly.
- A new spec should be proposed for a new action plan to revert workflow.
CI Testing and Coverage
*Discussion*: The naming and refactoring of watcher CI jobs were discussed.
The need for every voting job in check to also run in the gate was
emphasized, as well as in stable branches and other watcher projects. Job
renames and consolidation were proposed, along with enabling tempest
scenario jobs for stable branches and creating a new grenade job for
upgrade testing.
*Agreed*:
- AI(dviroel): Job renames/consolidation: watcher-functional can be
merged into other tempest jobs, watcher-tempest-actuator can be merged into
strategies job, and ipv6 job should be integrated into other existing
tempest jobs. Gate updates will occur after renaming and merging jobs.
Backporting changes from master to stable branches is expected.
- AI(dviroel): In watcher-tempest-plugin: replace tempest-functional
with a tempest job that runs scenario tests for stable branch validation.
- AI(chandankumar): A new grenade job to test upgrades between slurp
releases and include more testing in existing jobs.
- AI(sean-k-mooney): Propose a watcher job to run against OpenStack
requirements project.
Improving testing coverage for strategies by doing functional testing
*Discussion*: A specification proposal and a detailed implementation plan
for improving testing coverage for strategies through functional testing
were presented.
*Agreed*:
- A phased approach will be taken:
- AI(amoralej): 1st phase: API only GETs/POSTs.
- 2nd phase: Adding decision-engine + Nova + Prometheus datastore.
- 3rd phase: Adding Applier.
Rally Testing in watcher
*Discussion*: The current status of Rally testing in Watcher was reviewed,
noting that the rally-task-watcher job runs but is not in the Watcher
repository. Missing functionalities were identified, such as the inability
to pass audit template scope and parameters, lack of auto-triggering, and
no support for event, continuous audit, action plans, and actions.
*Agreed*:
- AI(chandankumar): Move rally-openstack watcher plugin code to watcher
repo
- AI(chandankumar): Add periodic job to run rally jobs
- Revisit this topic in future as we progress on scaling watcher CI
Watcher-dashboard improvements
*Discussion*: Improvements to the Watcher dashboard were discussed,
including adding auto-page refresh for audit/action plan status, a start
button for action plan details, and options for bulk archiving. Dashboard
testing using the Django test framework and Playwright was also covered.
*Agreed*:
- Create wishlist bugs or blueprints to track dashboard improvements.
- AI(chandankumar): Check with TC regarding the usage of pytest to
improve wording and proceed with integration tests implementation
- AI(reviewers): Get the spec
https://review.opendev.org/c/openstack/watcher-specs/+/963438 merged.
- A spec will be required for the audit and action plan bulk archive
feature.
The future of datamodel list API
*Discussion*: The utility of the datamodel list API, beyond tempest tests,
was questioned. The possibility of freezing the API with existing content
to avoid microversion bumps for new instance/node updates was raised. The
addition of new storage models (storage and baremetal) was also discussed.
*Agreed*:
- Do NOT extend the API to support additional models.
- Defer the removal of datamodel list (compute model) to future
discussions, but for now, do not extend it further, even if new fields are
added to model elements.
- AI(dviroel): Add test to avoid new API changes in datamodel list.
Eventlet Removal
*Discussion*: Changes made in the Flamingo cycle regarding Eventlet removal
were reviewed. The need for a collector sync timeout for threading mode and
refactoring the action plan cancel workflow were discussed.
*Agreed*:
- AI(dviroel): Collector timeout topic: REST API calls should continue
to have their own timeouts, and an event trigger can assist in stopping the
overall sync process. Adding a new config option for the collector timeout
should be considered.
- Applier: stop killing threads when an action plan is canceled. We can
keep the current behavior for eventlet but it will be a noop for threading
mode (minimal impact for eventlet removal changes).
- AI(dviroel): Set MAAS to deprecate after PTG. We will send an email to
the ML to call for maintainers once again. (same AI in Future of
datasources topic)
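One way to read the collector timeout item is sketched below: each call
keeps its own timeout, while a threading.Event set by a timer stops the
overall sync between calls. This is only an illustration, not Watcher
code; the function names, the collector tuples and the default values
are hypothetical.

```python
import threading


def sync_collectors(collectors, stop_event, call_timeout):
    """Run each collector unless the overall stop event is set."""
    synced = []
    for name, fetch in collectors:
        if stop_event.is_set():
            break  # overall deadline reached, skip the rest
        fetch(timeout=call_timeout)  # per-call timeout stays in place
        synced.append(name)
    return synced


def run_sync_with_deadline(collectors, overall_timeout, call_timeout=5.0):
    """Arm a timer that sets the stop event after overall_timeout."""
    stop_event = threading.Event()
    timer = threading.Timer(overall_timeout, stop_event.set)
    timer.daemon = True
    timer.start()
    try:
        return sync_collectors(collectors, stop_event, call_timeout)
    finally:
        timer.cancel()


def fake_fetch(timeout):
    """Hypothetical collector call that returns immediately."""
    return None


synced = run_sync_with_deadline(
    [("compute", fake_fetch), ("storage", fake_fetch)], overall_timeout=60)
```

The event is only checked between calls, so a single slow call is still
bounded by its own timeout rather than by the overall one.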
Future of Noisy Neighbor Goal/Strategy
*Discussion*: The deprecation of cache monitoring metrics in the kernel and
Nova, which formed the basis of the current noisy neighbor strategy, was
discussed. The need to replace this strategy and identify new metrics for
contention (e.g., CPU steal, CPU pressure, IOWait) and noisy neighbors
(e.g., CPU usage from low-priority instances) was explored. Instance
priority mechanisms were also considered.
*Agreed*:
- Remove the current noisy neighbor strategy in 2026.2+ to allow
deprecation to ship in a Slurp release.
- AI(dviroel): a proof of concept for CPU steal/IOWait/other metrics is
a nice to have, to replace current LLC monitoring metrics.
- Use instance metadata for PoC, but consider different solutions for
identifying/classifying workloads priorities, like tiering based on flavor
extra spec information.
Scaling Watcher
*Discussion*: The limitations of running a single instance of the Decision
Engine and Applier were discussed, with ideas for horizontal scalability.
Scalability concerns were raised for different watcher resources: audits,
actions and action plans. Some of the open issues that were discussed:
- Centralized datamodel: it is independently managed by each
decision-engine. Moving the model from in-memory to memcached or the
database was discussed.
- ONESHOT audits do not fail over in a multiple-decision-engine
deployment.
- When an applier dies, the action plan remains ONGOING and is only
cancelled upon the applier restart.
The main issue was identified as continuous audits being associated with a
specific decision-engine. The concept of moving to an event-driven model
with stateless decision engines and appliers was proposed, where data
models would reside in the database or a shared data store.
*Agreed*:
- For now, instrument the decision-engine to measure the size of the
model, and time taken to process notifications.
- Keep current failover behavior for ONESHOT audits.
- We could have a service monitor in the applier to reschedule PENDING
action plans and cancel ONGOING ones, providing the proper status message.
- There should be a way to limit and set the concurrency of actions in
audits. In this cycle we will propose a mechanism for setting them at
system or audit level that covers all the strategies.
Stacking Strategies
*Discussion*: The possibility of having stacking strategies was
brainstormed in this topic. This included ideas like sequential execution
of multiple strategies, where a mutable cluster data model would be shared
among them. This could result in a list of linked action plans, or result
in merging all actions into a single action plan to avoid unnecessary
steps. This discussion may be revisited soon, with a more detailed use case.
*Agreed*:
- Get back to this topic when we have more detailed use cases. Propose a
spec to highlight the need of this feature in watcher.
Please let me know if I missed something.
Thanks!
Assisted-By: Gemini
--
Douglas Viroel - dviroel
1 month, 4 weeks