[openstack-dev] Does DB schema hygiene warrant long migrations?
Jay Pipes
jaypipes at gmail.com
Thu Oct 24 21:28:48 UTC 2013
On 10/24/2013 04:40 PM, Boris Pavlovic wrote:
> Hi,
>
>
> 1) If you have 30 million instance it means that you have 300 million
> instance_system_metadata records
> All these records will be downloaded every 6 seconds (in periodic tasks)
> to compute_nodes => which means that OpenStack on scale out of box
> doesn't work.
>
> If you have 3 million instance the situation is the same (OpenStack
> doesn't work). but migration will be done for 1 minute.
>
> So it is maximum 1 minute downtime (in actually non real case).
>
>
> This change is actually very important because VARCHAR works much much
> faster then BLOB (Text) records.
This is actually not always true. It depends on:
* What RDBMS you are using
* What storage engine within MySQL, if using MySQL
* What the data access/modification patterns on the field are
The last bullet is very important. For fields that:
* Are not used in predicates or aggregates (i.e. not used in JOIN
conditions, WHERE clauses, or GROUP BY/HAVING clauses)
* Are rarely read or updated
* Are typically longer than 200-300 bytes of data
It's typically more efficient (both from a fill factor perspective and
from a log structure perspective) to leave the field as TEXT rather than
using VARCHAR. The reason is twofold:
1) Using TEXT saves space in the main data pages of the storage engine
since a pointer to external data file is stored in the data page instead
of the data itself, meaning more records of that table can fit in a
single fixed-byte-size data page, which means fewer disk and memory
reads to find a record, which means faster seeks and scans.
2) When modifying the value of the TEXT field, there is no chance that a
clustered index-organized table layout (like, say, InnoDB uses) would
need to perform a rebalancing action due to the new size of the TEXT
data causing the record to not fit on its containing data page --
something that would happen a lot more if VARCHAR was used.
If the data is:
1) Often used in predicates or aggregates
2) Often updated and the size of the updated field stays a similar range
3) ALWAYS within a small, defined range of bytes (say... 32-64 bytes)
Then it's often advantageous to use VARCHAR (or CHAR) over TEXT.
But it's wrong to say that it is ALWAYS faster to use VARCHAR vs. BLOB/TEXT.
Best,
-jay
> So this is important change and shouldn't be -2.
>
>
> Best regards,
> Boris Pavlovic
>
>
>
>
>
> On Fri, Oct 25, 2013 at 12:30 AM, Michael Still <mikal at stillhq.com
> <mailto:mikal at stillhq.com>> wrote:
>
> Hi.
>
> Because I am a grumpy old man I have just -2'ed
> https://review.openstack.org/#/c/39685/ and I wanted to explain my
> rationale. Mostly I am hoping for a consensus to form -- if I am wrong
> then I'll happy remove my vote from this patch.
>
> This patch does the reasonably sensible thing of converting two
> columns from being text to varchar, which reduces their expense to the
> database. Given the data stored is already of limited length, it
> doesn't impact our functionality at all either.
>
> However, when I run it with medium sized (30 million instances)
> databases, the change does cause a 10 minute downtime. I don't
> personally think the change is worth such a large outage, but perhaps
> everyone else disagrees.
>
> Discuss.
>
> Thanks,
> Michael
>
> PS: I could see a more complicated approach where we did these changes
> "in flight" by adding columns, using a periodic task to copy data to
> the new columns, and then dropping the old. That's a lot more
> complicated to implement though.
>
> --
> Rackspace Australia
>
>
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
More information about the OpenStack-dev
mailing list