This is an interesting challenge. To my knowledge no one has ever done a part power increase on an account/container ring. There is native support for online part power increases on object data rings. It seems you’re familiar with the general idea:

https://docs.openstack.org/swift/latest/ring_partpower.html

I second Pete’s suggestion of a maintenance window with the proxies disabled. The first object part power increases were also performed in offline mode, before the relinker-aware object server code was added.

The account and container databases are, in theory, a little easier than an object-layer part power increase, since their replication model is already per item instead of per partition. But I might recommend you consider a relink-based approach with a doubled-part-count ring to minimize downtime, instead of “just” swapping out the ring and waiting on replication.
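
To make the relink idea concrete, here’s a rough sketch (not Swift’s actual code; the hash prefix/suffix and the 19 -> 20 part powers are stand-ins for whatever is in your swift.conf and rings) of how a DB’s partition falls out of the path hash, and why doubling the part count makes each DB’s new home predictable:

    import hashlib
    import struct

    HASH_PATH_PREFIX = b''          # assumption: these come from swift.conf [swift-hash]
    HASH_PATH_SUFFIX = b'changeme'

    def path_hash(account, container=None):
        path = '/' + account + ('/' + container if container else '')
        return hashlib.md5(HASH_PATH_PREFIX + path.encode() + HASH_PATH_SUFFIX).digest()

    def partition(digest, part_power):
        # top 32 bits of the md5, shifted down so only part_power bits remain
        return struct.unpack('>I', digest[:4])[0] >> (32 - part_power)

    digest = path_hash('AUTH_test', 'mycontainer')
    old_part = partition(digest, 19)   # e.g. current part power 19
    new_part = partition(digest, 20)   # doubled ring, part power 20

    # with one extra bit of part power, every DB lands in exactly
    # 2 * old_part or 2 * old_part + 1
    assert new_part in (2 * old_part, 2 * old_part + 1)

That predictability is what would let a relinker compute every DB’s new location up front, instead of waiting for the replicators to discover it.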

The first step would be adapting the swift-ring-builder prepare_increase_partition_power command to work on account and container rings. The main advantage of a placement-aware ring part power increase is that when part 1 gets split into parts 2 and 3, they are assigned to the same device, making the relink/move operation much more IO-efficient.
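
Roughly speaking (this is an illustration of the idea, not the builder code itself), the prepared increase just duplicates each partition’s device assignment in the replica-to-partition-to-device table, which is what keeps the split partitions on the same disk:

    import array

    def double_replica2part2dev(replica2part2dev):
        doubled = []
        for row in replica2part2dev:          # one row of device ids per replica
            new_row = array.array('H')
            for dev_id in row:                # old part N -> new parts 2N and 2N+1
                new_row.append(dev_id)
                new_row.append(dev_id)
            doubled.append(new_row)
        return doubled

    old = [array.array('H', [0, 3, 1, 2])]    # toy ring: 4 parts, 1 replica
    new = double_replica2part2dev(old)
    assert list(new[0]) == [0, 0, 3, 3, 1, 1, 2, 2]
    # relinking a DB out of old part 1 into new part 2 or 3 is then a local
    # rename on device 3, not a copy across the network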

I’d love to review any more details you can share about your plan or your cluster. While most folks are probably going to be logging off for the holidays for the next couple of weeks, you can probably find some of us in IRC for more real-time Q&A.

Good luck!

Clay Gerrard



On Wed, Dec 20, 2023 at 1:57 PM Pete Zaitcev <zaitcev@redhat.com> wrote:
On Wed, 20 Dec 2023 09:33:50 -0300
Thiago De Moraes Teixeira <teixeira.thiago@luizalabs.com> wrote:

> ... I'm doing some crazy tests with
> SAIO and multiple storage nodes, based on building a new ring
> with part power 20 and just swapping the old files
> (account/container.ring.gz) with the new ones and letting the replicators
> do their jobs, moving *.db files to their new home partitions.

I don't see a show-stopper if you do it while the cluster is not
available to client requests, in a maintenance window.
Normally Swift is intended to be run with zero downtime
for the lifetime of a cluster.

The observable problem is the window when your rings are switched
over but the container DBs are not yet moved. The proxy cannot find
them at the new location and returns a 404. The same is true for
the updaters, I believe. You're risking losing track of container
and account stats.
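
Roughly (illustrative paths, not code from Swift), the partition is
baked into each DB's on-disk path, so the moment the new ring is in
place the proxy computes a different path than where the .db still sits:

    import os

    def db_path(devices_root, device, part, hsh):
        # stock layout: <devices>/<device>/containers/<part>/<suffix>/<hash>/<hash>.db
        suffix = hsh[-3:]
        return os.path.join(devices_root, device, 'containers',
                            str(part), suffix, hsh, hsh + '.db')

    hsh = 'd41d8cd98f00b204e9800998ecf8427e'       # example path hash
    old = db_path('/srv/node', 'sdb1', 123, hsh)   # where the DB actually lives
    new = db_path('/srv/node', 'sdb1', 246, hsh)   # where the new ring will look
    # until the DB is moved (or relinked) from `old` to `new`, lookups
    # against the new ring come back 404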

If you quiesce the cluster with respect to the updaters, expirers, and clients,
then remaking the rings outright ought to become possible.

However, I have never tried what you're doing. I suggest you engage the
attention of people who have thought about all the issues with partition
power changes - Christian Schwede, Clay Gerrard, maybe Alistair too.
There may be something that we're not considering.

-- Pete