[openstack-dev] Does DB schema hygiene warrant long migrations?

Jay Pipes jaypipes at gmail.com
Thu Oct 24 21:28:48 UTC 2013


On 10/24/2013 04:40 PM, Boris Pavlovic wrote:
> Hi,
>
>
> 1) If you have 30 million instance it means that you have 300 million
> instance_system_metadata records
> All these records will be downloaded every 6 seconds (in periodic tasks)
> to compute_nodes => which means that OpenStack on scale out of box
> doesn't work.
>
> If you have 3 million instance the situation is the same (OpenStack
> doesn't work). but migration will be done for 1 minute.
>
> So it is maximum 1 minute downtime (in actually non real case).
>
>
> This change is actually very important because VARCHAR works much much
> faster then BLOB (Text) records.

This is actually not always true. It depends on:

* What RDBMS you are using
* What storage engine within MySQL, if using MySQL
* What the data access/modification patterns on the field are

The last bullet is very important. For fields that:

* Are not used in predicates or aggregates (i.e. not used in JOIN 
conditions, WHERE clauses, or GROUP BY/HAVING clauses)
* Are rarely read or updated
* Are typically longer than 200-300 bytes of data

It's typically more efficient (both from a fill factor perspective and 
from a log structure perspective) to leave the field as TEXT rather than 
using VARCHAR. The reason is twofold:

1) Using TEXT saves space in the main data pages of the storage engine 
since a pointer to external data file is stored in the data page instead 
of the data itself, meaning more records of that table can fit in a 
single fixed-byte-size data page, which means fewer disk and memory 
reads to find a record, which means faster seeks and scans.

2) When modifying the value of the TEXT field, there is no chance that a 
clustered index-organized table layout (like, say, InnoDB uses) would 
need to perform a rebalancing action due to the new size of the TEXT 
data causing the record to not fit on its containing data page -- 
something that would happen a lot more if VARCHAR was used.

If the data is:

1) Often used in predicates or aggregates
2) Often updated and the size of the updated field stays a similar range
3) ALWAYS within a small, defined range of bytes (say... 32-64 bytes)

Then it's often advantageous to use VARCHAR (or CHAR) over TEXT.

But it's wrong to say that it is ALWAYS faster to use VARCHAR vs. BLOB/TEXT.

Best,
-jay

> So this is important change and shouldn't be -2.
>
>
> Best regards,
> Boris Pavlovic
>
>
>
>
>
> On Fri, Oct 25, 2013 at 12:30 AM, Michael Still <mikal at stillhq.com
> <mailto:mikal at stillhq.com>> wrote:
>
>     Hi.
>
>     Because I am a grumpy old man I have just -2'ed
>     https://review.openstack.org/#/c/39685/ and I wanted to explain my
>     rationale. Mostly I am hoping for a consensus to form -- if I am wrong
>     then I'll happy remove my vote from this patch.
>
>     This patch does the reasonably sensible thing of converting two
>     columns from being text to varchar, which reduces their expense to the
>     database. Given the data stored is already of limited length, it
>     doesn't impact our functionality at all either.
>
>     However, when I run it with medium sized (30 million instances)
>     databases, the change does cause a 10 minute downtime. I don't
>     personally think the change is worth such a large outage, but perhaps
>     everyone else disagrees.
>
>     Discuss.
>
>     Thanks,
>     Michael
>
>     PS: I could see a more complicated approach where we did these changes
>     "in flight" by adding columns, using a periodic task to copy data to
>     the new columns, and then dropping the old. That's a lot more
>     complicated to implement though.
>
>     --
>     Rackspace Australia
>
>
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>




More information about the OpenStack-dev mailing list