[Openstack] Swift replication analysis and questions (high CPU usage)

Julien Danjou julien.danjou at enovance.com
Mon Apr 16 13:31:02 UTC 2012


Hi there,

We're currently running a 3-node installation of Swift. We saw that the
CPU usage on these nodes is always very high, around 50%, with no users
connected, and we'd like to be sure nothing is wrong.

So I did some analysis, and the whole point of this mail is to have it
corroborated or invalidated.

Facts
=====
These 3 nodes host accounts, containers and objects.

The following processes take a lot of CPU time:

  swift-container-replicator
  swift-container-server
  swift-object-replicator
  swift-object-server

The object replication time, as reported in the logs and by swift-recon,
is around 5 minutes.
The options used are the default ones (nothing fancy).
We keep 3 replicas of each account, container and object.

Volumetry:
- 450 GB of storage used (each node has 19 TB)
- 57 accounts
- 7929 containers in 7870 partitions
- 58158 object partitions used so far

Each ring has been built with 2^18 (262144) partitions.

The container and object syncs run at around 300 partitions/s.

Analysis
========
What's taking CPU time is replication. If I understand correctly, the
container and object replicators walk through their respective
directories and send a REPLICATE command to all the other servers
responsible for the same partition. Since there are only 3 servers for 3
replicas, every node obviously holds every partition. That means that
the swift-{object,container}-replicator processes of host A send
REPLICATE requests to hosts B and C, making the
swift-{object,container}-server processes use CPU on hosts B and C to
return a response indicating whether the data must be rsync-ed or not.

For containers, it means that it walks ~8K directories, opens 8K
sqlite databases, sends 2 REPLICATE requests for each, and does nothing
(when everything is in sync, which is the case 99.999% of the time).
For objects, it means that it walks ~58K directories, opens the
hash file for each of its 58K partitions, sends 2 REPLICATE requests for
each, and does nothing (when everything is in sync, which is the case
99.999% of the time).
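
To put a number on the paragraph above, here is a minimal sketch of how
many REPLICATE requests one node sends per pass. It only uses the
figures from the volumetry section; the helper name is made up for
illustration and is not part of Swift.

```python
# Rough count of REPLICATE requests a single node sends in one pass.
# Partition counts come from the volumetry section of this mail.
# replicate_requests_per_pass is a hypothetical helper, not a Swift API.

REPLICAS = 3

def replicate_requests_per_pass(local_partitions, replicas=REPLICAS):
    """One REPLICATE per local partition for each of the other replica holders."""
    return local_partitions * (replicas - 1)

container_requests = replicate_requests_per_pass(7870)
object_requests = replicate_requests_per_pass(58158)

print(container_requests)  # 15740
print(object_requests)     # 116316
```

Each of these requests also costs CPU on the receiving server, so with 3
nodes every node both sends and answers roughly this many per pass.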

For containers, it does that every 30s (by default). With around 8K
containers, a pass takes more than 35s, so swift-container-replicator
keeps one CPU at 100 % all the time.
For objects, a pass takes 5 minutes, followed by a 30s pause (by default).
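
As a rough illustration of why the replicators are nearly always busy,
here is a small model of the two schedules as I described them. It is
an assumption about the scheduling based on my reading above, not
Swift's actual loop, and the function name is hypothetical.

```python
# Fraction of wall-clock time a replicator spends working, under the two
# schedules described in this mail (a simplified model, not Swift code).

def duty_cycle(pass_seconds, interval_seconds, sleeps_after_pass):
    """Return the busy fraction of the replicator.

    sleeps_after_pass=True:  run a pass, then pause for the interval.
    sleeps_after_pass=False: start a pass every interval; a pass that
                             overruns is followed immediately by the next.
    """
    if sleeps_after_pass:
        return pass_seconds / (pass_seconds + interval_seconds)
    return min(1.0, pass_seconds / interval_seconds)

container_busy = duty_cycle(35, 30, sleeps_after_pass=False)
object_busy = duty_cycle(300, 30, sleeps_after_pass=True)

print(f"container replicator busy: {container_busy:.0%}")
print(f"object replicator busy:    {object_busy:.0%}")
```

Under this model the container replicator never idles, and the object
replicator idles for only about one tenth of the time.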


If this is the normal behaviour, I can then conclude that:
- we need more hosts to lower CPU usage, because when we reach the
  262K object partitions used, the CPU usage will explode and the
  object synchronisation time will increase roughly 5-fold.
  For example, adding 2 more hosts would reduce by 40 % the
  number of partitions hosted on each host (each would hold 60 % of
  the partition space rather than 100 %).

- we should have used fewer partitions for such a small number of hosts,
  probably something between 2^10 and 2^14 (we'll add more hosts, but
  probably not thousands).
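
The arithmetic behind these two conclusions can be sketched as follows.
It assumes the ring spreads partitions evenly across hosts and that
pass time grows linearly with the number of partitions in use; both are
simplifying assumptions, and the names are made up for this example.

```python
# Back-of-the-envelope numbers for the two conclusions above.
# Assumes an even spread of partitions and linear scaling of pass time.

REPLICAS = 3

def share_per_host(hosts, replicas=REPLICAS):
    """Fraction of the partition space each host holds."""
    return min(1.0, replicas / hosts)

total_partitions = 2 ** 18      # ring built with 2^18 partitions
used_object_partitions = 58158  # object partitions in use today

# How much the per-pass work grows once every partition is in use.
growth = total_partitions / used_object_partitions

for hosts in (3, 5):
    print(f"{hosts} hosts: each holds {share_per_host(hosts):.0%} of partitions")
print(f"growth factor at full ring: {growth:.1f}x")

# Total partitions for the smaller partition powers mentioned above.
for power in (10, 14, 18):
    print(f"2^{power} = {2 ** power} partitions")
```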


What I did so far is to increase the number of workers and the
concurrency setting. Setting concurrency to 32 (each host has 8 CPU
cores) for the object replicator pushed the CPU usage to 80 % (reminder:
50 % before) but divided the replication time by more than 2 (~2 minutes
instead of 5 minutes).

This obviously doesn't help with CPU usage, it's worse, but at least it
takes just 30 points more CPU (50 % to 80 %) to do the same work twice
in the same time frame.
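
A rough way to compare the two settings is CPU usage multiplied by pass
duration. This is only a sketch: it treats the node-wide CPU figure as
if it were all due to object replication, which is an approximation,
and it takes "~2 minutes" as exactly 120 seconds.

```python
# CPU cost of one object replication pass, in "percent-seconds"
# (node-wide CPU percentage x pass duration). Approximate figures
# taken from this mail; the function name is hypothetical.

def cost_per_pass(cpu_percent, pass_seconds):
    return cpu_percent * pass_seconds

before = cost_per_pass(50, 5 * 60)  # default concurrency
after = cost_per_pass(80, 2 * 60)   # concurrency = 32

print(before)          # 15000
print(after)           # 9600
print(after / before)  # 0.64
```

By this measure a single pass costs about a third less CPU with the
higher concurrency, even though the instantaneous usage is higher.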

Thanks for any hint or helpful comment about this!

-- 
Julien Danjou
// eNovance                      http://enovance.com
// ✉ julien.danjou at enovance.com  ☎ +33 1 49 70 99 81



