[Openstack-operators] Configuring local instance storage

Robert van Leeuwen Robert.vanLeeuwen at spilgames.com
Fri May 9 11:07:53 UTC 2014

We started with flashcache devices for our Swift cluster about 2 1/2 years ago.
Each of these has one Corsair Force 3, and lifetime is now at about 50%.

Using "non-supported" SSDs on "a-brand" hardware is a bit of a pain though:
We had "unexpected" failures on SSDs we could not monitor because of a RAID controller in between.
These nodes each had a RAID 0 of two 240GB SSDs, and we had 2 Graphite nodes with the same data.
Those failed at the same time, so we are pretty sure they wore out. I think the failure would have been predicted if we could have read the data:
The SSDs were doing 8K IOPS / 40MB/s sustained for about 2 years, or about 2PB of written data.
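
As a rough sanity check on that figure, sustained throughput multiplied by elapsed time gives the total data written. The sketch below assumes the "40MB" figure means 40 MB/s (decimal megabytes), that the load was constant for two full years, and it ignores write amplification. The function name is made up for illustration, and the post does not say whether the rate was per SSD or for the whole RAID 0 set.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def total_written_pb(mb_per_second: float, years: float) -> float:
    """Total data written, in decimal petabytes, for a constant write rate."""
    total_mb = mb_per_second * years * SECONDS_PER_YEAR
    return total_mb / 1e9  # 1 PB = 1e9 MB


written = total_written_pb(40, 2)
print(f"{written:.2f} PB written")  # prints "2.52 PB written"
```

That is the same order of magnitude as the "about 2PB" above. If the rate applied to the two-disk RAID 0 as a whole, each SSD would have absorbed roughly half of it.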

We moved to Intel SSDs when we switched to Supermicro hardware, where they are supported, and we have been very happy with them so far.

In general the SSDs seem pretty reliable (better than spinning platters).
We are not yet close enough to the expected (write) lifetimes to be 100% sure the counters are perfect, but so far it looks okay.
Our plan is to do some preventive swapping just to make sure we do not end up with a big problem.
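
One possible way to automate that kind of preventive swapping is to parse the wear attributes that `smartctl -A` (from smartmontools) prints and flag disks below a threshold. This is only a sketch: the attribute names differ per vendor, the 20% threshold is an arbitrary assumption, and the sample output is illustrative rather than taken from our disks. Disks behind a RAID controller may need smartctl's `-d` device-type option, if the controller passes SMART data through at all.

```python
import subprocess

# Wear-related attribute names vary by vendor; this set is an assumption.
WEAR_ATTRIBUTES = {"Media_Wearout_Indicator", "Wear_Leveling_Count", "SSD_Life_Left"}

# Illustrative sample in the column layout that `smartctl -A` prints.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       17520
233 Media_Wearout_Indicator 0x0032   050   050   000    Old_age   Always       -       0
"""


def read_smart(device: str) -> str:
    """Run `smartctl -A` on a device (needs smartmontools and root)."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True, check=False)
    return result.stdout


def parse_wear(smartctl_output: str) -> dict:
    """Return {attribute_name: normalized VALUE} for known wear attributes."""
    wear = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WEAR_ATTRIBUTES:
            wear[fields[1]] = int(fields[3])
    return wear


def needs_swap(wear: dict, threshold: int = 20) -> bool:
    """True if any wear attribute is at or below the (assumed) threshold."""
    return any(value <= threshold for value in wear.values())


wear = parse_wear(SAMPLE)
print(wear, "swap:", needs_swap(wear))
```

On a real host, `parse_wear(read_smart("/dev/sda"))` would replace the sample string; the device path is a placeholder.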

Robert van Leeuwen

From: Arne Wiebalck [Arne.Wiebalck at cern.ch]
Sent: Friday, May 09, 2014 11:50 AM
To: Robert van Leeuwen
Cc: openstack-operators at lists.openstack.org
Subject: Re: [Openstack-operators] Configuring local instance storage

Any experiences with "unexpected" SSD failures, i.e. failures that were not predicted? I am asking this
as we're considering using block-level caching on non-RAIDed SSDs and I'd like to get a feeling for
how much we have to reflect this in the SLA :)


On May 9, 2014, at 8:13 AM, Robert van Leeuwen <Robert.vanLeeuwen at spilgames.com> wrote:

> We are using KVM.
> What I noticed on the hypervisor was that it was actually doing lots of reads when doing the benchmark on QCOW2 images.
> I simulated a MySQL workload on the OS: I created a 20GB file and did 100% random 16K writes in it.
> On QCOW2 we got about 600 IOPS, on RAW about 5000.
> With RAW we saw no reads while writing, whereas with QCOW2 I saw 100MB+ of reads per second.
> I would be happy to know if we can improve this somehow :)
> We are monitoring the lifetime of the SSDs and will do some preventive swapping:
> There are lifetime estimations you can read from the SSDs.
> Luckily the array controller we have lets us query those stats per disk even if they are in RAID :)
> Cheers,
> Robert van Leeuwen
> ________________________________________
> From: Abel Lopez [alopgeek at gmail.com]
> Sent: Thursday, May 08, 2014 9:50 PM
> To: Tim Bell
> Cc: Robert van Leeuwen; Arne Wiebalck; openstack-operators at lists.openstack.org
> Subject: Re: [Openstack-operators] Configuring local instance storage
> Second that question. Using KVM at least, I couldn't find any significant differences between QCOW2 and RAW based images.
> By "significant", I mean enough to justify tossing the benefits of qcow2.
> On May 8, 2014, at 8:57 AM, Tim Bell <Tim.Bell at cern.ch> wrote:
>> Robert,
>> The difference between RAW and QCOW2 is pretty significant... what hypervisor are you using ?
>> Have you seen scenarios where the two SSDs are failing at the same time ? Red Hat was recommending against mirroring SSDs as with the same write pattern, the failure points for the same batch of SSDs would be close.
>> Tim
>> -----Original Message-----
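
The random-write test described in the quoted message above (100% random 16K writes into a large file inside the guest) could be approximated with a sketch like the following. It is not the tool used in the thread, which is not named; the function name, file path and sizes are placeholders. It assumes a Unix-like guest, uses `O_DSYNC` where available so each write waits for stable storage, and creates a sparse file, which behaves differently from a fully written 20GB file.

```python
import os
import random
import tempfile
import time


def random_write_iops(path: str, file_size: int,
                      block_size: int = 16 * 1024, writes: int = 1000) -> float:
    """Issue `writes` random, block-aligned writes into `path`; return IOPS."""
    block = os.urandom(block_size)
    blocks_in_file = file_size // block_size
    # O_DSYNC is not defined on every platform, so fall back to 0 (no flag).
    flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_DSYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        os.ftruncate(fd, file_size)  # sparse file of the requested size
        start = time.perf_counter()
        for _ in range(writes):
            offset = random.randrange(blocks_in_file) * block_size
            os.pwrite(fd, block, offset)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return writes / elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        iops = random_write_iops(os.path.join(tmp, "bench.dat"),
                                 file_size=64 * 1024 * 1024, writes=200)
        print(f"{iops:.0f} IOPS, {iops * 16 / 1024:.1f} MiB/s")
```

For scale, at 16K per write the 600 IOPS reported for QCOW2 is about 9.4 MiB/s, and the 5000 IOPS reported for RAW is about 78 MiB/s.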
