[Openstack] Swift statistics discrepancy
pritpal at tech-guides.co.uk
Wed Aug 6 11:53:08 UTC 2014
I have been doing some further digging into this and have found
information which leads me to believe that replication is not working as
it should...
In this cluster, one account holds the majority of the data. For the
sake of this example, this account is 41677, and it holds 34TB of
data.
Looking at the account's sqlite db on all nodes, I notice that the
incoming_sync and outgoing_sync tables have remote_id entries which
I cannot locate anywhere:
sqlite> select * from incoming_sync;
remote_id sync_point updated_at
------------------------------------ ---------- ----------
9332d177-1034-44e9-b77e-961a7ee7da6d 308256694 1406830765
d87e4dea-1c42-4f3f-8462-76227acc7c32 301384851 1406830765
0b84aac5-d16e-4d76-9903-eb9122c19119 310265599 1406836822
As you can see above, those are the nodes that "incoming" replication is
expected from. However, those IDs are not present on any other node
with the same account. Hence the amount of data reported on some nodes
is less than 34TB.
Why would this be? What can I do to fix this to ensure replication
resumes correctly?
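The check described above (looking for remote_id values that match no replica of the account db) can be sketched in Python. This is only a rough sketch under assumptions: it assumes each replica's own id is stored in an `id` column of the `account_stat` table, which is not shown in this message, and the function names `read_sync_ids` and `unknown_remote_ids` are made up for illustration. It should be run against copies of the db files gathered from each node, not the live files.

```python
import sqlite3


def read_sync_ids(db_path):
    """Return (own_id, incoming_ids, outgoing_ids) for one account db copy.

    Assumption: the replica's own id lives in account_stat.id.
    """
    conn = sqlite3.connect(db_path)
    try:
        own_id = conn.execute("SELECT id FROM account_stat").fetchone()[0]
        incoming = {row[0] for row in
                    conn.execute("SELECT remote_id FROM incoming_sync")}
        outgoing = {row[0] for row in
                    conn.execute("SELECT remote_id FROM outgoing_sync")}
    finally:
        conn.close()
    return own_id, incoming, outgoing


def unknown_remote_ids(db_paths):
    """Map each db path to the remote_ids that match no replica's own id."""
    replicas = {path: read_sync_ids(path) for path in db_paths}
    known = {own_id for own_id, _, _ in replicas.values()}
    return {
        path: sorted((incoming | outgoing) - known)
        for path, (_, incoming, outgoing) in replicas.items()
    }
```

Calling `unknown_remote_ids` with the paths of the copied db files returns, per file, the remote_id values that belong to none of the replicas that were checked. An empty list for every file would mean all sync entries point at known replicas.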
Thanks,
Pritpal
On 2014-08-05 13:06, pritpal at tech-guides.co.uk wrote:
> Hi All,
>
> We are running Swift 1.4.8 with 8 nodes and 4 zones.
>
> We recently added 4 SSD drives, one to each of 4 of our storage nodes.
> The account and container rings were then rebalanced to ensure this
> data doesn't sit on spinning disks. Since the rebalance was done, we
> have noticed something unusual in the statistics returned from within
> swift.
>
> This is the command being run to grab the statistics:
>
> swift -v -A https://127.0.0.1:8080/auth/v1.0 -U <USERNAME> -K <PASS> stat
>
> Before the changes, the statistics looked like this:
>
> ===
> Wed, 30 Jul 2014 10:51:26 +0100
> Array
> (
> [X-Account-Object-Count] => 81473735
> [X-Account-Bytes-Used] => 34156718530011
> [X-Account-Container-Count] => 6510
> )
> Wed, 30 Jul 2014 10:51:36 +0100
> Array
> (
> [X-Account-Object-Count] => 81473735
> [X-Account-Bytes-Used] => 34156718530011
> [X-Account-Container-Count] => 6510
> )
> Wed, 30 Jul 2014 10:51:46 +0100
> Array
> (
> [X-Account-Object-Count] => 81698252
> [X-Account-Bytes-Used] => 34213134745373
> [X-Account-Container-Count] => 6510
> )
> Wed, 30 Jul 2014 10:51:56 +0100
> Array
> (
> [X-Account-Object-Count] => 81687266
> [X-Account-Bytes-Used] => 34209086906883
> [X-Account-Container-Count] => 6510
> )
> Wed, 30 Jul 2014 10:52:06 +0100
> Array
> (
> [X-Account-Object-Count] => 81687418
> [X-Account-Bytes-Used] => 34209165517185
> [X-Account-Container-Count] => 6510
> )
> Wed, 30 Jul 2014 10:52:16 +0100
> Array
> (
> [X-Account-Object-Count] => 81405109
> [X-Account-Bytes-Used] => 34105818678331
> [X-Account-Container-Count] => 6510
> )
> Wed, 30 Jul 2014 10:52:26 +0100
> Array
> (
> [X-Account-Object-Count] => 81460103
> [X-Account-Bytes-Used] => 34127360552723
> [X-Account-Container-Count] => 6510
> )
> ===
>
> Since the rebalancing, the statistics seem to show that
> X-Account-Bytes-Used has dropped by around 7TB and
> X-Account-Object-Count seems to have dropped to somewhere between 60M
> - 70M objects. The statistics now seem to jump around wildly, as can
> be seen below.
>
> ===
> Tue, 05 Aug 2014 12:32:49 +0100
> Array
> (
> [X-Account-Object-Count] => 59242579
> [X-Account-Bytes-Used] => 24304403925249
> [X-Account-Container-Count] => 6603
> )
> Tue, 05 Aug 2014 12:32:59 +0100
> Array
> (
> [X-Account-Object-Count] => 58817476
> [X-Account-Bytes-Used] => 24167437130211
> [X-Account-Container-Count] => 6603
> )
> Tue, 05 Aug 2014 12:33:09 +0100
> Array
> (
> [X-Account-Object-Count] => 63760679
> [X-Account-Bytes-Used] => 25828018327577
> [X-Account-Container-Count] => 6603
> )
> Tue, 05 Aug 2014 12:33:19 +0100
> Array
> (
> [X-Account-Object-Count] => 66724351
> [X-Account-Bytes-Used] => 27197208718607
> [X-Account-Container-Count] => 6603
> )
> Tue, 05 Aug 2014 12:33:29 +0100
> Array
> (
> [X-Account-Object-Count] => 67222017
> [X-Account-Bytes-Used] => 27465314723569
> [X-Account-Container-Count] => 6603
> )
> Tue, 05 Aug 2014 12:33:39 +0100
> Array
> (
> [X-Account-Object-Count] => 67214198
> [X-Account-Bytes-Used] => 27536268561101
> [X-Account-Container-Count] => 6603
> )
> Tue, 05 Aug 2014 12:33:49 +0100
> Array
> (
> [X-Account-Object-Count] => 68353884
> [X-Account-Bytes-Used] => 28017869874871
> [X-Account-Container-Count] => 6603
> )
> ===
>
> The above repeats: the count increases, then drops back down.
> The question I have is, why would this happen? We definitely did not
> delete anything, so as far as I am concerned data was just moved
> around.
>
> You can see the behaviour on these graphs -
> http://www.preeto.co.uk/SwiftStats.PNG - Note how prior to the change
> (2014-07-31), the totalbytes and totalobjects graphs are fairly
> static.
>
> Regards,
>
> Pritpal
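To put a number on how much the statistics move, the X-Account-Bytes-Used samples quoted above can be compared before and after the rebalance. The following is a small sketch only: the helper name `spread` and the choice of (max - min) / max as the measure are illustrative assumptions, and the values are copied from the two sample sets in the quoted message.

```python
def spread(samples):
    """Return (lowest, highest, relative spread) for a list of samples.

    Relative spread is (highest - lowest) / highest, an illustrative
    measure of how far the reported value moves between requests.
    """
    lowest, highest = min(samples), max(samples)
    return lowest, highest, (highest - lowest) / highest


# X-Account-Bytes-Used on 30 Jul 2014, before the rebalance
before_bytes = [
    34156718530011, 34156718530011, 34213134745373, 34209086906883,
    34209165517185, 34105818678331, 34127360552723,
]

# X-Account-Bytes-Used on 05 Aug 2014, after the rebalance
after_bytes = [
    24304403925249, 24167437130211, 25828018327577, 27197208718607,
    27465314723569, 27536268561101, 28017869874871,
]

for label, samples in (("before", before_bytes), ("after", after_bytes)):
    lowest, highest, rel = spread(samples)
    print("%s: min=%d max=%d spread=%.2f%%" % (label, lowest, highest, rel * 100))
```

On these samples the spread is roughly 0.3% before the rebalance and close to 14% after it.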