We are gleeful to announce the release of:

watcher 14.1.0

This release is part of the epoxy release series.

The source is available from:

    https://opendev.org/openstack/watcher

Download the package from:

    https://tarballs.openstack.org/watcher/

Please report issues through:

    https://bugs.launchpad.net/watcher/+bugs

For more details, please see below.

14.1.0
^^^^^^

New Features
************

* A new module, "watcher.wsgi", has been added as a place to gather
  WSGI "application" objects. This is intended to ease deployment by
  providing a consistent location for these objects.

  For example, if using uWSGI then instead of:

      [uwsgi]
      wsgi-file = /bin/watcher-api-wsgi

  you can now use:

      [uwsgi]
      module = watcher.wsgi.api:application

  This also simplifies deployment with other WSGI servers, such as
  gunicorn, that expect module paths.

Deprecation Notes
*****************

* The watcher-api-wsgi console script is deprecated and will be
  removed in a future release. This artifact is generated by a
  setuptools extension provided by PBR, which is also deprecated. Due
  to changes in Python packaging, this custom extension is planned to
  be removed from all OpenStack projects in a future PBR release, in
  favor of module-based WSGI application entry points.

Security Issues
***************

* Watcher no longer forges requests on behalf of a tenant when
  swapping volumes.

  Prior to this release, Watcher had two implementations for moving a
  volume: it could use Cinder's volume migrate API, or its own
  internal implementation that directly called Nova's volume
  attachment update API. The former is safe and is the recommended way
  to move volumes between Cinder storage backends; the internal
  implementation was insecure, fragile due to a lack of error
  handling, and capable of deleting user data.

  Insecure: the internal volume migration operation created a new
  Keystone user with a weak name and password and added it to the
  tenant's project with the admin role.
  It then used that user to forge requests on behalf of the tenant,
  with admin rights, to swap the volume. If the applier was restarted
  during the execution of this operation, that user would never be
  cleaned up.

  Fragile: the error handling was minimal. The swap volume API is
  asynchronous, so Watcher has to poll for completion, and there was
  no support for resuming the operation if it was interrupted or the
  timeout was exceeded.

  Data loss: while the internal polling logic returned success or
  failure, Watcher did not check the result; once the function
  returned, it unconditionally deleted the source volume. For larger
  volumes this could result in irretrievable data loss.

  Finally, if a volume was swapped using the internal workflow, it
  left the Nova instance in an out-of-sync state. If the VM was live
  migrated after the volume swap completed successfully, but prior to
  a hard reboot, the migration would either fail or succeed and break
  tenant isolation.

  See https://bugs.launchpad.net/nova/+bug/2112187 for details.

Bug Fixes
*********

* When using the prometheus datasource and more than one target has
  the same value for the "fqdn_label", the driver used the wrong
  "instance" label to query for host metrics. The "instance" label is
  no longer used in the queries; instead, queries use the
  "fqdn_label", which identifies all the metrics for a specific
  compute node. See Bug 2103451
  (https://bugs.launchpad.net/watcher/+bug/2103451) for more info.

* Previously, when users attempted to create a new audit without
  providing a name and a goal or an audit template, the API returned
  error 500 and an incorrect error message was displayed. Now, Watcher
  displays a helpful message and returns HTTP error 400. For more info
  see: https://bugs.launchpad.net/watcher/+bug/2110947

* All code related to creating a Keystone user and granting roles has
  been removed. The internal swap volume implementation has been
  removed and replaced by Cinder's volume migrate API.
  Note that as part of this change, Watcher will no longer attempt
  volume migrations or retypes if the instance is in the *Verify
  Resize* task state. This resolves several issues related to volume
  migration in the zone migration and storage capacity balance
  strategies. While efforts have been made to maintain backward
  compatibility, these changes are required to address a security
  weakness in Watcher's prior approach. See
  https://bugs.launchpad.net/nova/+bug/2112187 for more context.

* When running an audit with the *workload_stabilization* strategy and
  the *instance_ram_usage* metric in a deployment with the prometheus
  datasource, the host RAM usage metric was reported in the wrong
  unit, which led to an incorrect standard deviation and incorrect
  action plans due to the application of the wrong scale factor in the
  algorithm. The host RAM usage metric is now properly reported in KiB
  when using a prometheus datasource, and the *workload_stabilization*
  strategy calculates the standard deviation correctly. For more
  details: https://launchpad.net/bugs/2113776

* The host maintenance strategy should migrate servers to the backup
  node if one is specified, or otherwise rely on the Nova scheduler.
  Instead, it was re-enabling hosts that had been disabled with the
  watcher_disabled reason and migrating servers to those nodes, which
  could impact customer workloads: compute nodes were disabled for a
  reason. The host maintenance strategy is now fixed to migrate
  servers only to the backup node, or to rely on the Nova scheduler if
  no backup node is provided.

* Previously, if an action failed in an action plan, the state of the
  action plan was reported as SUCCEEDED once execution had finished,
  regardless of the outcome. Watcher will now reflect the actual state
  of all the actions in the plan after the execution has finished: if
  any action has status FAILED, it will set the state of the action
  plan to FAILED. This is the expected behavior according to the
  Watcher documentation.
  For more info see: https://bugs.launchpad.net/watcher/+bug/2106407

* Bug #2110538 (https://bugs.launchpad.net/watcher/+bug/2110538):
  Corrected the HTTP error code returned when Watcher users try to
  create audits with invalid parameters. The API now correctly returns
  a 400 Bad Request error.

Changes in watcher 14.0.0..14.1.0
---------------------------------

ffec800f use cinder migrate for swap volume
defd3953 Configure watcher tempest's microversion in devstack
ba417b38 Fix audit creation with no name and no goal or audit_template
38622442 Set actionplan state to FAILED if any action has failed
c7fde924 Add unit test to check action plan state when a nested action fails
e5b5ff5d Return HTTP code 400 when creating an audit with wrong parameters
fb85b27a Use KiB as unit for host_ram_usage when using prometheus datasource
53872f9a Aggregate by label when querying instance cpu usage in prometheus
c0ebb8dd Drop code from Host maintenance strategy migrating instance to disabled hosts
1d7f1636 Added unit test to validate audit creation with no goal and no name
c6ceaacf Add a unit test to check the error when creating an audit with wrong parameters
f4bfb105 [host_maintenance] Pass des hostname in add_action solution
8a99d4c5 Add support for pyproject.toml and wsgi module paths
ce9f0b4c Skip real-data tests in non-real-data jobs
e385ece6 Aggregate by fqdn label instead instance in host cpu metrics
c6505ad0 Query by fqdn_label instead of instance for host metrics
64f70b94 Drop sg_core prometheus related vars
68c9ce65 Update TOX_CONSTRAINTS_FILE for stable/2025.1
5fa09265 Update .gitreview for stable/2025.1

Diffstat (except docs and test files)
-------------------------------------

.gitreview                                         |   1 +
.zuul.yaml                                         |  17 +--
devstack/lib/watcher                               |  96 ++++---------
devstack/plugin.sh                                 |   3 +
pyproject.toml                                     |   3 +
.../add-wsgi-module-support-597f479e31979270.yaml  |  30 +++++
...ries-with-multiple-target-0e65d20711d1abe2.yaml |   8 ++
releasenotes/notes/bug-2110947.yaml                |  10 ++
.../notes/bug-2112187-763bae283e0b736d.yaml        |  47 +++++++
.../notes/bug-2113776-4bd314fb46623fbc.yaml        |  14 +++
...trategy-on-disabled-hosts-24084a22d4c8f914.yaml |  10 ++
...ion-plan-state-on-failure-69e498d902ada5c5.yaml |  13 ++
...ror-400-on-bad-parameters-bb964e4f5cadc15c.yaml |   7 ++
setup.cfg                                          |   2 +-
tox.ini                                            |  13 +-
watcher/api/controllers/v1/audit.py                |  19 +--
watcher/applier/action_plan/default.py             |  22 +++-
watcher/applier/actions/volume_migration.py        |  98 ++++-----------
watcher/common/keystone_helper.py                  |  34 -----
watcher/common/utils.py                            |   8 +-
watcher/decision_engine/datasources/base.py        |   2 +-
watcher/decision_engine/datasources/prometheus.py  | 135 ++++++++++----------
.../strategy/strategies/host_maintenance.py        |  26 +---
.../strategy/strategies/zone_migration.py          |  55 ++++----
.../action_plan/test_default_action_handler.py     |  27 ++++
.../datasources/test_prometheus_helper.py          | 140 +++++++++++++++------
.../strategy/strategies/test_host_maintenance.py   |  13 +-
.../strategies/test_workload_stabilization.py      |  62 ++++++++-
.../strategy/strategies/test_zone_migration.py     |   5 +-
watcher/wsgi/__init__.py                           |   0
watcher/wsgi/api.py                                |  18 +++

34 files changed, 652 insertions(+), 422 deletions(-)
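As a minimal illustration of the action-plan state rule described in
the bug fixes above: after execution finishes, the plan is FAILED if
any action has status FAILED, otherwise SUCCEEDED. This is a hedged
sketch only; the function name is hypothetical and does not reflect
Watcher's actual internal classes or API.

```python
def final_action_plan_state(action_states):
    """Hypothetical sketch of the state rule fixed in bug 2106407.

    Given the final states of all actions in a plan, the plan state
    is FAILED if any action FAILED, and SUCCEEDED otherwise.
    """
    return "FAILED" if "FAILED" in action_states else "SUCCEEDED"
```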