Hi all,

I am writing to introduce ExcludeNode, a new middleware component for the Swift proxy server that we've just started developing to improve node management.

[Middleware Overview]

ExcludeNode is designed to dynamically exclude specific nodes from the proxy's operations. The middleware reads a list of nodes to exclude from a configurable file and prevents the proxy from selecting those nodes when serving requests.

The main reason for developing this middleware is to avoid object request latency or failures when nodes are down, or, for example, while objects are being rebalanced after new nodes have been added.

The main functionalities include:

- Node Exclusion Check: Checks whether a node (IP:PORT) is listed in the exclusion file.
- Dynamic Updates: Allows updating the exclusion list by writing node information (IP:PORT) to the specified file.
- Clearing: Empties the exclusion list.
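The three operations above can be pictured as a small file-backed helper. This is a minimal sketch under assumptions, not the code linked later in this message: it assumes the exclusion file holds one `IP:PORT` entry per line, and the class and method names (`ExclusionList`, `is_excluded`, `add`, `clear`) are hypothetical.

```python
class ExclusionList:
    """Hypothetical helper: one "IP:PORT" entry per line in a plain-text file."""

    def __init__(self, filename):
        self.filename = filename

    def _entries(self):
        # A missing file simply means that nothing is excluded yet.
        try:
            with open(self.filename) as f:
                return {line.strip() for line in f if line.strip()}
        except FileNotFoundError:
            return set()

    def is_excluded(self, ip, port):
        # Node Exclusion Check: is "IP:PORT" listed in the file?
        return f"{ip}:{port}" in self._entries()

    def add(self, ip, port):
        # Dynamic Updates: append the node unless it is already listed.
        if not self.is_excluded(ip, port):
            with open(self.filename, "a") as f:
                f.write(f"{ip}:{port}\n")

    def clear(self):
        # Clearing: truncate the file to zero length.
        open(self.filename, "w").close()
```

For example, `ExclusionList("/dev/shm/excluded_nodes").add("1.2.3.4", 6210)` appends `1.2.3.4:6210`. The sketch re-reads the file on every check, which keeps all proxy workers consistent without any signalling. On a tmpfs path such as /dev/shm that read is cheap, but a busier implementation might cache the set and reload it only when the file's mtime changes.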

[How to use it]

Assuming the storage node 1.2.3.4 is down, requests to it fail with:

proxy-server: ERROR with Object server 1.2.3.4:6210/sdm re: Trying to GET /v1/AUTH_[...]: ConnectionTimeout (2.0s) (txn: tx19b7067dcb0c442e96a10-0066867068)

To exclude the disk 1.2.3.4:6210/sdm, run the following on each proxy server:

curl -X POST `hostname`:8080/exclude_node -H "Content-Type: application/json" -d '{"ip":"1.2.3.4","port":6210}'

The following is then logged in proxy-server.log:

proxy-server: Node added to exclusion list: 1.2.3.4:6210 (txn: tx0f292e3b08f940b5be3d9-0066867068)
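The endpoint behind the `curl` call above has a small job: it intercepts `POST /exclude_node`, parses the `{"ip": ..., "port": ...}` body and appends `IP:PORT` to the exclusion file. All other requests pass down the pipeline. Below is a minimal WSGI sketch of that flow. The class name, the 400/405 responses and the one-entry-per-line file format are assumptions, and the actual exclude_node.py may differ. The sketch also returns the confirmation message rather than writing it to the proxy log.

```python
import json


class ExcludeNodeMiddleware:
    """Hypothetical sketch of a POST /exclude_node endpoint, not the real exclude_node.py."""

    def __init__(self, app, filename):
        self.app = app
        self.filename = filename

    def __call__(self, environ, start_response):
        if environ.get("PATH_INFO") != "/exclude_node":
            # Not our endpoint: hand the request to the rest of the pipeline.
            return self.app(environ, start_response)
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed", [("Allow", "POST")])
            return [b""]
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            body = json.loads(environ["wsgi.input"].read(length))
            entry = f"{body['ip']}:{int(body['port'])}"
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, missing keys or a non-integer port.
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b'Expected JSON like {"ip": "1.2.3.4", "port": 6210}\n']
        with open(self.filename, "a") as f:
            f.write(entry + "\n")
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [f"Node added to exclusion list: {entry}\n".encode()]
```

The endpoint is served on the proxy's client-facing port, so a real implementation would likely need to restrict who may call it.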

From then on, the node won't serve any requests. Depending on the request type, each request is redirected to another primary node or to a handoff node.
The node stays excluded until it is manually removed from the exclusion list. Every time it would otherwise have been chosen to serve a request, the following is logged:

proxy-server: Node 1.2.3.4:6210 (sdm) explicitly excluded by exclude_node middleware. (txn: txeb9f5953ee3343dbb9699-006686707a)
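One way to picture the behaviour described above: wherever the proxy walks its primary and handoff nodes, excluded entries are skipped, so the next node in line takes the request. This is a minimal sketch, assuming nodes are Swift ring node dicts (`ip`, `port`, `device` keys) and that the exclusion list has already been loaded into a set of `IP:PORT` strings. `filter_excluded` is a hypothetical name, and where the real server.py change hooks in is not shown here.

```python
import logging

logger = logging.getLogger("proxy-server")


def filter_excluded(nodes, excluded):
    """Yield ring nodes whose "IP:PORT" is not in ``excluded`` (hypothetical helper).

    ``nodes`` is any iterable of Swift-style node dicts with "ip", "port"
    and "device" keys, e.g. primaries followed by handoffs.
    """
    for node in nodes:
        key = f"{node['ip']}:{node['port']}"
        if key in excluded:
            # Mirrors the log line shown above; the exact level is an assumption.
            logger.info(
                "Node %s (%s) explicitly excluded by exclude_node middleware.",
                key, node["device"])
            continue
        yield node
```

Because this is a generator, handoff nodes are only consumed when an excluded primary actually has to be replaced, as with a lazy node iterator.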

I would greatly appreciate the community's feedback on the following:
  1. Functionality: Does the middleware fulfill a common need?
  2. Implementation: Are there any concerns regarding the code implementation or its impact on existing functionalities?
We'd like to submit this middleware upstream, but to be efficient and avoid wasting everybody's time, I'd like some feedback first :)

The code itself is quite simple today, but we plan to add features such as automatically excluding a node that swift-recon reports as unmounted.
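As a rough illustration of that planned auto-exclusion, the step that turns a recon report into exclusion entries might look like this. Assumptions: the input is the JSON body returned by the recon middleware's `/recon/unmounted` endpoint for one host (a list of objects with a `device` key), and `ring_devs` is the ring's device list (dicts with `ip`, `port` and `device`, with `None` for removed devices). The function name `unmounted_to_exclusions` is hypothetical.

```python
import json


def unmounted_to_exclusions(host_ip, recon_json, ring_devs):
    """Map one host's recon "unmounted" reply to sorted "IP:PORT" entries (hypothetical)."""
    # Device names the host reports as unmounted, e.g. {"sdm"}.
    unmounted = {entry["device"] for entry in json.loads(recon_json)}
    return sorted(
        f"{dev['ip']}:{dev['port']}"
        for dev in ring_devs
        # Skip removed devices (None) and devices on other hosts.
        if dev and dev["ip"] == host_ip and dev["device"] in unmounted
    )
```

Because entries are keyed on IP:PORT, excluding one port only isolates a single disk when each disk has its own port (for example with the object server's `servers_per_port` option). Otherwise it excludes every disk behind that object server.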

exclude_node.py middleware:

https://kpaste.infomaniak.com/nRp8AdJGnMuLCoP7czIbL9IBJjlSkyYS#ARXTJo5ZnGa5U2Yu8xXfm5xRNTfdk2mWu1jiapaVi1yV

server.py modification:
https://kpaste.infomaniak.com/Qvy992wlRbc6yEgrIPhAhyrh2V3-Gipy#E2TaKDAXQdXpeWAivF2ZKQ3sic63JQ9jUnLmQsxg9yuQ

proxy-server.conf:

[pipeline:main]
- pipeline = [...] proxy-server
+ pipeline = [...] exclude_node proxy-server

[filter:exclude_node]
use = egg:swift#exclude_node
exclude_nodes_filename = /dev/shm/excluded_nodes
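The `[filter:exclude_node]` section reaches the middleware through Paste Deploy. Below is a minimal sketch of the usual Swift-style `filter_factory`, assuming the `exclude_nodes_filename` option shown above. The middleware class is a pass-through stub, and its default path is an assumption.

```python
class ExcludeNodeMiddleware:
    """Pass-through stub standing in for the real middleware (hypothetical)."""

    def __init__(self, app, conf):
        self.app = app
        # Option name from the [filter:exclude_node] section; the default is assumed.
        self.filename = conf.get("exclude_nodes_filename",
                                 "/dev/shm/excluded_nodes")

    def __call__(self, environ, start_response):
        return self.app(environ, start_response)


def filter_factory(global_conf, **local_conf):
    # Usual Paste Deploy pattern in Swift: section options override [DEFAULT].
    conf = global_conf.copy()
    conf.update(local_conf)

    def exclude_node_filter(app):
        return ExcludeNodeMiddleware(app, conf)
    return exclude_node_filter
```

For `egg:swift#exclude_node` to resolve, Swift's setup.cfg would also need a `paste.filter_factory` entry point named `exclude_node` pointing at this factory. The module path, for example `swift.common.middleware.exclude_node`, is an assumption here.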

Thank you in advance for your feedback,
Olivier