[Openstack] [SWIFT] Change the partition power to recreate the RING

Chuck Thier cthier at gmail.com
Mon Jan 14 16:23:40 UTC 2013


Hi Alejandro,

I really doubt that partition size is causing these issues.  It can be
difficult to debug these types of issues without access to the
cluster, but I can think of a few things to look at.

1.  Check your disk I/O usage and I/O wait on the storage nodes.  If
those seem abnormally high, then that could be one source of the
problems.  If this is the case, then the first things that I would
look at are the auditors, as they can use up a lot of disk I/O if not
properly configured.  I would try turning them off for a bit
(swift-*-auditor) and see if that makes any difference.

2.  Check your network I/O usage.  You haven't described what type of
network you have going to the proxies, but if they share a single GigE
interface then, if my quick calculations are correct, you could be
saturating the network (see the rough numbers sketched after this
list).

3.  Check your CPU usage.  I listed this one last as you have said
that you have already worked at tuning the number of workers (though I
would be interested to hear how many workers you have running for each
service).  The main thing to look for is whether all of your workers
are maxed out on CPU; if so, then you may need to increase the number
of workers.

4.  SSL termination.  Where are you terminating the SSL connection?
If you are terminating SSL directly in the swift proxy, then that
could also be a source of issues.  That was only meant for dev and
testing; you should use an SSL-terminating load balancer in front of
the swift proxies.
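
To put rough numbers on point 2, here is a quick back-of-the-envelope
sketch (plain Python, nothing Swift-specific) using the 50,000
requests per minute and 10KB-150KB object sizes you mentioned.  It
ignores replication traffic and protocol overhead, so treat it as a
lower bound on what the network has to carry:

# Aggregate payload for 50,000 requests/minute of 10KB-150KB objects,
# compared with the ~125 MB/s theoretical ceiling of a single GigE link.
requests_per_min = 50000
object_kb_low, object_kb_high = 10, 150

def mb_per_sec(reqs_per_min, kb_per_req):
    return reqs_per_min * kb_per_req / 1024.0 / 60.0

low = mb_per_sec(requests_per_min, object_kb_low)
high = mb_per_sec(requests_per_min, object_kb_high)
gige = 1000 / 8.0   # one GigE interface, ~125 MB/s

print("aggregate payload: %.0f-%.0f MB/s" % (low, high))
print("single GigE link : %.0f MB/s" % gige)
# At the top of that object-size range you are already at the GigE
# ceiling, before counting replication traffic on the storage network.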

That's what I could think of right off the top of my head.

--
Chuck

On Mon, Jan 14, 2013 at 5:45 AM, Alejandro Comisario
<alejandro.comisario at mercadolibre.com> wrote:
> Chuck / John.
> We are handling 50,000 requests per minute (10,000+ of which are PUTs of
> small objects, from 10KB to 150KB).
>
> We are using swift 1.7.4 with keystone token caching, so no latency over
> there.
> We have 12 proxies and 24 datanodes divided into 4 zones (each datanode
> has 48GB of RAM, 2 hexacore CPUs and 4 devices of 3TB each).
>
> The workers that are putting objects into swift are seeing awful
> performance, and so are we, with peaks of 2 to 15 seconds per PUT
> operation coming from the datanodes.
> We tuned db_preallocation, disable_fallocate, workers and concurrency, but
> we can't reach the request rate that we need (24,000 PUTs per minute of
> small objects), and we can't seem to find where the problem is, other than
> in the datanodes.
>
> Maybe worth pasting our config over here?
> Thanks in advance.
>
> alejandro
>
> On 12 Jan 2013 02:01, "Chuck Thier" <cthier at gmail.com> wrote:
>>
>> Looking at this from a different perspective.  Having 2500 partitions
>> per drive shouldn't be an absolutely horrible thing either.  Do you
>> know how many objects you have per partition?  What types of problems
>> are you seeing?
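>>
>> As a rough sketch of how those numbers relate (the partition powers,
>> replica count, device count and object count below are placeholders,
>> so plug in your ring's own values):
>>
>> replicas = 3                       # placeholder; your ring's replica count
>> devices = 96                       # placeholder; total devices in the ring
>> total_objects = 50 * 1000 * 1000   # placeholder; your object count
>>
>> for part_power in (16, 17, 18):
>>     partitions = 2 ** part_power
>>     per_device = partitions * replicas / float(devices)
>>     per_partition = total_objects / float(partitions)
>>     print("2**%2d: ~%5.0f partitions/device, ~%5.0f objects/partition"
>>           % (part_power, per_device, per_partition))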
>>
>> --
>> Chuck
>>
>> On Fri, Jan 11, 2013 at 3:28 PM, John Dickinson <me at not.mn> wrote:
>> > In effect, this would be a complete replacement of your rings, and that
>> > is essentially a whole new cluster. All of the existing data would need to
>> > be rehashed into the new ring before it is available.
>> >
>> > There is no process that rehashes the data to ensure that it is still in
>> > the correct partition. Replication only ensures that the partitions are on
>> > the right drives.
>> >
>> > To change the number of partitions, you will need to GET all of the data
>> > from the old ring and PUT it to the new ring. A more complicated (but
>> > perhaps more efficient) solution may include something like walking each
>> > drive and rehashing+moving the data to the right partition and then letting
>> > replication settle it down.
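>> >
>> > (For what it's worth, here is a rough sketch of the hash-to-partition
>> > mapping that the rehash step would have to redo.  This is from memory
>> > of the ring code of that era, so check the ring code in the swift
>> > source before relying on it; the hash path suffix below is just a
>> > placeholder.)
>> >
>> > import hashlib
>> > import struct
>> >
>> > HASH_PATH_SUFFIX = 'changeme'  # placeholder; the real value is in swift.conf
>> >
>> > def partition_for(account, container, obj, part_power):
>> >     # md5 of the object path plus the cluster's hash path suffix...
>> >     path = '/%s/%s/%s' % (account, container, obj)
>> >     digest = hashlib.md5((path + HASH_PATH_SUFFIX).encode('utf-8')).digest()
>> >     # ...the top part_power bits of the first four bytes pick the partition.
>> >     return struct.unpack_from('>I', digest)[0] >> (32 - part_power)
>> >
>> > # The same object lands in a different partition once the power changes,
>> > # which is why every object has to be rehashed (and probably moved).
>> > print(partition_for('AUTH_test', 'cont', 'obj', 18))
>> > print(partition_for('AUTH_test', 'cont', 'obj', 16))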
>> >
>> > Either way, 100% of your existing data will need to at least be rehashed
>> > (and probably moved). Your CPU (hashing), disks (read+write), RAM (directory
>> > walking), and network (replication) may all be limiting factors in how long
>> > it will take to do this. Your per-disk free space may also determine what
>> > method you choose.
>> >
>> > I would not expect any data loss while doing this, but you will probably
>> > have availability issues, depending on the data access patterns.
>> >
>> > I'd like to eventually see something in swift that allows for changing
>> > the partition power in existing rings, but that will be
>> > hard/tricky/non-trivial.
>> >
>> > Good luck.
>> >
>> > --John
>> >
>> >
>> > On Jan 11, 2013, at 1:17 PM, Alejandro Comisario
>> > <alejandro.comisario at mercadolibre.com> wrote:
>> >
>> >> Hi guys.
>> >> We created a swift cluster several months ago; the thing is that
>> >> right now we can't add hardware, and we configured lots of partitions
>> >> thinking about the final picture of the cluster.
>> >>
>> >> Today each datanode has 2500+ partitions per device, and even after
>> >> tuning the background processes (replicator, auditor & updater) we really
>> >> want to try to lower the partition power.
>> >>
>> >> Since it's not possible to do that without recreating the ring, we have
>> >> the luxury of recreating it with a much lower partition power and
>> >> rebalancing / deploying the new ring.
>> >>
>> >> The question is: having a working cluster with *existing data*, is it
>> >> possible to do this and wait for the data to move around *without data
>> >> loss*?
>> >> If so, might it be reasonable to expect an improvement in the overall
>> >> cluster performance?
>> >>
>> >> We have no problem having a non-working cluster (while moving the
>> >> data), even for an entire weekend.
>> >>
>> >> Cheers.
>> >>
>> >>
>> >
>> >



