... I second Pete’s suggestion for a maintenance window with the proxies disabled. The first object part power increases were also performed in offline mode, before the relinker-aware object server code was added.
Yeah, our first idea is to do that inside a maintenance window, so we don't have to worry about availability during the process.
... The first step would be adapting the swift ring builder prepare part power increase command to work on account and container rings. The main advantage of a placement-aware ring part power increase is that when part 1 gets split into 2 and 3, both new parts are assigned to the same device, which makes the relink/move operation much more I/O efficient.
About this: it's helpful advice and a good starting point for a safe increase. After adapting the swift ring builder to handle account and container rings, is your recommendation to do the increase one part power at a time, like 8->9; 9->10; ... 19->20? One idea I've just had, besides this adaptation of the swift ring builder, is to add an input argument for the new part power value (still thinking about whether this makes sense...).
About our cluster: we are gathering information about the production environment, such as exactly how many accounts and containers we have, so we can weigh our options for this change. I'll come back with useful information once I have it!
Much appreciated!
On Thu, Dec 21, 2023 at 01:57, Clay Gerrard <clay.gerrard@gmail.com> wrote:
This is an interesting challenge. To my knowledge no one has ever done a part power increase on an account/container ring. There is native support for online part power increases on object data rings. It seems you’re familiar with the general idea:
https://docs.openstack.org/swift/latest/ring_partpower.html
I second Pete’s suggestion for a maintenance window with the proxies disabled. The first object part power increases were also performed in offline mode, before the relinker-aware object server code was added.
The account and container databases are in theory a little easier than an object-layer part power increase, since the replication model is already per item instead of per partition. But I might recommend you consider a relink-based approach with a doubled part count ring to minimize downtime instead of “just” swapping out the ring and waiting on replication.
The first step would be adapting the swift ring builder prepare part power increase command to work on account and container rings. The main advantage of a placement-aware ring part power increase is that when part 1 gets split into 2 and 3, both new parts are assigned to the same device, which makes the relink/move operation much more I/O efficient.
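The "part 1 gets split into 2 and 3" behaviour can be illustrated with a minimal sketch. It assumes the partition is the top `part_power` bits of the first four bytes of the MD5 of the item path; a real cluster also mixes in the hash path prefix/suffix from swift.conf, which is left out here. The function names and the sample path are hypothetical and not part of Swift's API.

```python
import hashlib
import struct


def partition_for(path, part_power):
    """Top `part_power` bits of the first 4 bytes of md5(path).

    Simplification: the cluster-specific hash path prefix/suffix is omitted.
    """
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return struct.unpack_from(">I", digest)[0] >> (32 - part_power)


def children(old_part):
    """The two partitions an old partition splits into when part power grows by 1."""
    return 2 * old_part, 2 * old_part + 1


path = "/AUTH_test/some-container"  # hypothetical account/container path
old_part = partition_for(path, 8)
new_part = partition_for(path, 9)

# One extra bit of the hash is exposed, so the new partition is always
# one of the two children of the old one.
assert new_part in children(old_part)

# Repeating the step 8 -> 9 -> ... -> 20 composes the same way: dropping the
# 12 extra bits of the part-power-20 partition recovers the part-power-8 one.
assert partition_for(path, 20) >> 12 == old_part
```

Because both children of a partition can be computed from the old partition number alone, a ring that keeps them on the same device only needs a local relink rather than a cross-node copy.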
I’d love to review any more details you can share about your plan or your cluster. While most folks are probably going to be logging off for the holidays for the next couple of weeks, you can probably find some of us in IRC for more real-time Q&A.
Good luck!
Clay Gerrard
On Wed, Dec 20, 2023 at 1:57 PM Pete Zaitcev <zaitcev@redhat.com> wrote:
On Wed, 20 Dec 2023 09:33:50 -0300 Thiago De Moraes Teixeira <teixeira.thiago@luizalabs.com> wrote:
... I'm doing some crazy tests with SAIO and multiple storage nodes, based on building a new ring with part power 20, simply swapping the old files (account/container.ring.gz) for the new ones, and letting the replicators do their job of moving the *.db files to their new home partitions.
I don't see a show-stopper if you do it while the cluster is not serving client requests, in a maintenance window. Normally Swift is intended to be run with zero downtime for the lifetime of a cluster.
The observable problem is the window when your rings are switched over but the container DBs are not yet moved. The proxy cannot find them at the new place and gives a 404. The same is true for updaters, I believe. You're risking losing track of container and account stats.
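A rough sketch of why the lookup misses during that window: the partition number is part of the on-disk path of each DB, so a ring with a different part power points at a directory that does not exist yet. This assumes the usual `<device>/containers/<partition>/<suffix>/<hash>/<hash>.db` layout, ignores the hash path prefix/suffix from swift.conf, and holds the device fixed even though the new ring may also pick a different one. The function name, device path, and account/container names are hypothetical.

```python
import hashlib
import os


def container_db_path(device_dir, account, container, part_power):
    """Expected location of a container DB for a given part power (sketch)."""
    name_hash = hashlib.md5(
        f"/{account}/{container}".encode("utf-8")).hexdigest()
    # First 8 hex chars are the first 4 bytes of the digest, big-endian.
    partition = int(name_hash[:8], 16) >> (32 - part_power)
    return os.path.join(device_dir, "containers", str(partition),
                        name_hash[-3:], name_hash, name_hash + ".db")


# Where the DB sits today (part power 8) versus where a part-power-20 ring
# tells the proxy to look right after the ring files are swapped.
old_path = container_db_path("/srv/node/sdb1", "AUTH_test", "photos", 8)
new_path = container_db_path("/srv/node/sdb1", "AUTH_test", "photos", 20)
print(old_path)
print(new_path)
```

Only the partition component of the path differs; the suffix, hash directory, and file name stay the same, which is what makes a relink or move of the existing directory possible.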
If you quiesce the cluster with respect to the updaters, expirers, and clients, then remaking the rings outright ought to become possible.
However, I never tried what you're doing. I suggest you engage the attention of people who have thought through all the issues with partition power changes: Christian Schwede, Clay Gerrard, maybe Alistair too. There may be something that we're not considering.
-- Pete