[Openstack] Swift statistics discrepancy

pritpal at tech-guides.co.uk
Wed Aug 6 11:53:08 UTC 2014


I have been doing some further digging into this and have found 
information which leads me to believe that replication is not working as 
it should...

In this cluster, one account holds the majority of the data; for the
sake of this example, this account is 41677. It holds 34TB of data.
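The 34TB figure is in decimal terabytes (10^12 bytes) and matches the X-Account-Bytes-Used values in the quoted statistics; in binary units the same byte count is about 31TiB. A quick Python check of the conversion, using the first pre-change sample (the helper name `bytes_to_units` is hypothetical):

```python
# Convert an X-Account-Bytes-Used header value into TB (10**12) and TiB (2**40).
def bytes_to_units(num_bytes: int) -> tuple[float, float]:
    """Return (terabytes, tebibytes) for a byte count."""
    return num_bytes / 10**12, num_bytes / 2**40


# Pre-change sample from the quoted stats (Wed, 30 Jul 2014 10:51:26).
tb, tib = bytes_to_units(34156718530011)
print(f"{tb:.2f} TB / {tib:.2f} TiB")  # 34.16 TB / 31.07 TiB
```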

Looking at the SQLite account DB for this account on all nodes, I
notice that the incoming_sync and outgoing_sync tables have remote_id
entries which I cannot locate anywhere:

sqlite> select * from incoming_sync;
remote_id                             sync_point  updated_at
------------------------------------  ----------  ----------
9332d177-1034-44e9-b77e-961a7ee7da6d  308256694   1406830765
d87e4dea-1c42-4f3f-8462-76227acc7c32  301384851   1406830765
0b84aac5-d16e-4d76-9903-eb9122c19119  310265599   1406836822

As you can see above, these are the nodes that "incoming" replication
is expected from; however, those IDs are not present on any other node
holding the same account. Hence the amount of data reported on some
nodes is less than 34TB.
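The cross-check described above can be scripted. This is a minimal sketch, assuming that a copy of the account DB has been collected from each node, that each copy's own replication ID is stored in the `id` column of its `account_stat` table, and that `incoming_sync`/`outgoing_sync` have the `remote_id` column shown in the output above. The function `find_unknown_sync_ids` is hypothetical; it reports, per copy, any `remote_id` that matches none of the collected copies' own IDs.

```python
import sqlite3


def find_unknown_sync_ids(dbs: dict[str, sqlite3.Connection]) -> dict[str, set[str]]:
    """Map node name -> sync remote_ids that match no collected copy's own id."""
    # Assumption: each copy records its own replication ID in account_stat.id.
    own_ids = {}
    for node, conn in dbs.items():
        row = conn.execute("SELECT id FROM account_stat").fetchone()
        own_ids[node] = row[0] if row else None
    known = {i for i in own_ids.values() if i}

    unknown = {}
    for node, conn in dbs.items():
        remote = set()
        # Table names come from this fixed tuple, so the f-string is safe.
        for table in ("incoming_sync", "outgoing_sync"):
            remote.update(r[0] for r in conn.execute(f"SELECT remote_id FROM {table}"))
        missing = remote - known
        if missing:
            unknown[node] = missing
    return unknown


# Usage (hypothetical local copies of the account DB, one per node):
# dbs = {"node1": sqlite3.connect("node1-account.db"),
#        "node2": sqlite3.connect("node2-account.db")}
# print(find_unknown_sync_ids(dbs))
```

An empty result would mean every sync entry points at one of the copies that was collected; anything left over is an ID that none of them currently carries.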

Why would this be? What can I do to fix this to ensure replication 
resumes correctly?

Thanks,

Pritpal


On 2014-08-05 13:06, pritpal at tech-guides.co.uk wrote:
> Hi All,
>
> We are running Swift 1.4.8 with 8 nodes and 4 zones.
>
> We recently added 4 SSD drives, one to each of 4 of our storage nodes.
> The account and container rings were then rebalanced so that this
> data no longer sits on spinning disks. Since the rebalance, we have
> noticed something unusual in the statistics returned by Swift.
>
> This is the command being run to grab the statistics:
>
> swift -v -A https://127.0.0.1:8080/auth/v1.0 -U <USERNAME> -K <PASS> stat
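The statistics below were sampled every 10 seconds. A rough Python equivalent of that sampling loop, as a sketch: it assumes `head_account` is any callable returning the account's response headers as a mapping, for example python-swiftclient's `Connection.head_account()`, which may return lowercase header names (hence the case-insensitive lookup). `sample_account_stats` and its parameters are hypothetical names; the `sleep` argument exists only so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Mapping

FIELDS = ("X-Account-Object-Count", "X-Account-Bytes-Used", "X-Account-Container-Count")


def sample_account_stats(head_account: Callable[[], Mapping[str, str]],
                         samples: int,
                         interval: float = 10.0,
                         sleep: Callable[[float], None] = time.sleep) -> list[dict[str, int]]:
    """Poll the account headers `samples` times, `interval` seconds apart."""
    results = []
    for i in range(samples):
        # Assumption: head_account() returns the response headers of an
        # account HEAD; names are compared case-insensitively.
        headers = {k.lower(): v for k, v in head_account().items()}
        results.append({f: int(headers[f.lower()]) for f in FIELDS})
        if i < samples - 1:
            sleep(interval)
    return results


# Usage with python-swiftclient (not executed here):
# from swiftclient.client import Connection
# conn = Connection(authurl="https://127.0.0.1:8080/auth/v1.0",
#                   user="<USERNAME>", key="<PASS>")
# for row in sample_account_stats(conn.head_account, samples=7):
#     print(row)
```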
>
> Before the changes, the statistics looked like this:
>
> ===
> Wed, 30 Jul 2014 10:51:26 +0100
> Array
> (
>     [X-Account-Object-Count] => 81473735
>     [X-Account-Bytes-Used] => 34156718530011
>     [X-Account-Container-Count] => 6510
> )
> Wed, 30 Jul 2014 10:51:36 +0100
> Array
> (
>     [X-Account-Object-Count] => 81473735
>     [X-Account-Bytes-Used] => 34156718530011
>     [X-Account-Container-Count] => 6510
> )
> Wed, 30 Jul 2014 10:51:46 +0100
> Array
> (
>     [X-Account-Object-Count] => 81698252
>     [X-Account-Bytes-Used] => 34213134745373
>     [X-Account-Container-Count] => 6510
> )
> Wed, 30 Jul 2014 10:51:56 +0100
> Array
> (
>     [X-Account-Object-Count] => 81687266
>     [X-Account-Bytes-Used] => 34209086906883
>     [X-Account-Container-Count] => 6510
> )
> Wed, 30 Jul 2014 10:52:06 +0100
> Array
> (
>     [X-Account-Object-Count] => 81687418
>     [X-Account-Bytes-Used] => 34209165517185
>     [X-Account-Container-Count] => 6510
> )
> Wed, 30 Jul 2014 10:52:16 +0100
> Array
> (
>     [X-Account-Object-Count] => 81405109
>     [X-Account-Bytes-Used] => 34105818678331
>     [X-Account-Container-Count] => 6510
> )
> Wed, 30 Jul 2014 10:52:26 +0100
> Array
> (
>     [X-Account-Object-Count] => 81460103
>     [X-Account-Bytes-Used] => 34127360552723
>     [X-Account-Container-Count] => 6510
> )
> ===
>
> Since the rebalancing, the statistics seem to show that
> X-Account-Bytes-Used has dropped by roughly 6-10TB (from about 34TB to
> between 24TB and 28TB) and X-Account-Object-Count seems to have dropped
> from about 81.5M to somewhere between 59M and 68M objects. The
> statistics now seem to jump around wildly, as can be seen below.
>
> ===
> Tue, 05 Aug 2014 12:32:49 +0100
> Array
> (
>     [X-Account-Object-Count] => 59242579
>     [X-Account-Bytes-Used] => 24304403925249
>     [X-Account-Container-Count] => 6603
> )
> Tue, 05 Aug 2014 12:32:59 +0100
> Array
> (
>     [X-Account-Object-Count] => 58817476
>     [X-Account-Bytes-Used] => 24167437130211
>     [X-Account-Container-Count] => 6603
> )
> Tue, 05 Aug 2014 12:33:09 +0100
> Array
> (
>     [X-Account-Object-Count] => 63760679
>     [X-Account-Bytes-Used] => 25828018327577
>     [X-Account-Container-Count] => 6603
> )
> Tue, 05 Aug 2014 12:33:19 +0100
> Array
> (
>     [X-Account-Object-Count] => 66724351
>     [X-Account-Bytes-Used] => 27197208718607
>     [X-Account-Container-Count] => 6603
> )
> Tue, 05 Aug 2014 12:33:29 +0100
> Array
> (
>     [X-Account-Object-Count] => 67222017
>     [X-Account-Bytes-Used] => 27465314723569
>     [X-Account-Container-Count] => 6603
> )
> Tue, 05 Aug 2014 12:33:39 +0100
> Array
> (
>     [X-Account-Object-Count] => 67214198
>     [X-Account-Bytes-Used] => 27536268561101
>     [X-Account-Container-Count] => 6603
> )
> Tue, 05 Aug 2014 12:33:49 +0100
> Array
> (
>     [X-Account-Object-Count] => 68353884
>     [X-Account-Bytes-Used] => 28017869874871
>     [X-Account-Container-Count] => 6603
> )
> ===
>
> This pattern repeats: the count increases, then drops back down.
> My question is: why would this happen? We definitely did not delete
> anything, so as far as I am concerned the data was just moved around.
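To put numbers on the drop and on the fluctuation, the two sets of samples quoted above can be summarised. A short Python sketch, using the X-Account-Bytes-Used and X-Account-Object-Count values copied from those samples; `drop_range` and `spread` are hypothetical helper names.

```python
# Samples copied from the quoted stats (before: 30 Jul 2014, after: 05 Aug 2014).
BEFORE_BYTES = [34156718530011, 34156718530011, 34213134745373, 34209086906883,
                34209165517185, 34105818678331, 34127360552723]
AFTER_BYTES = [24304403925249, 24167437130211, 25828018327577, 27197208718607,
               27465314723569, 27536268561101, 28017869874871]
BEFORE_OBJECTS = [81473735, 81473735, 81698252, 81687266, 81687418, 81405109, 81460103]
AFTER_OBJECTS = [59242579, 58817476, 63760679, 66724351, 67222017, 67214198, 68353884]


def drop_range(before: list[int], after: list[int]) -> tuple[int, int]:
    """Smallest and largest drop between any before/after sample pair."""
    return min(before) - max(after), max(before) - min(after)


def spread(values: list[int]) -> int:
    """Difference between the largest and smallest sample in one window."""
    return max(values) - min(values)


lo, hi = drop_range(BEFORE_BYTES, AFTER_BYTES)
print(f"bytes dropped by {lo / 1e12:.2f}-{hi / 1e12:.2f} TB")
print(f"objects dropped by {drop_range(BEFORE_OBJECTS, AFTER_OBJECTS)}")
print(f"bytes spread over one minute: before {spread(BEFORE_BYTES) / 1e12:.2f} TB, "
      f"after {spread(AFTER_BYTES) / 1e12:.2f} TB")
```

On these samples, the pre-change byte count varies by about 0.1TB within a minute, while the post-change count varies by almost 4TB over the same span.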
>
> You can see the behaviour in these graphs:
> http://www.preeto.co.uk/SwiftStats.PNG
> Note how, prior to the change (2014-07-31), the totalbytes and
> totalobjects graphs are fairly static.
>
> Regards,
>
> Pritpal
