Open Stack

Thu Jul 27 17:49:03 UTC 2017

I guess we're really getting into the weeds here.

On 07/27/2017 12:51 PM, Octave J. Orgeron wrote:
> Hi Jay,
> 
> Comments below..
> 
> On 7/26/2017 5:43 PM, Jay Pipes wrote:
>> On 07/26/2017 07:06 PM, Octave J. Orgeron wrote:
>>> Hi Michael,
>>>
>>> On 7/26/2017 4:28 PM, Michael Bayer wrote:
>>>>
>>>> it at all.
>>>> thinking out loud
>>>>
>>>> oslo_db.sqlalchemy.types.String(255, mysql_small_rowsize=64)
>>>> oslo_db.sqlalchemy.types.String(255, mysql_small_rowsize=sa.TINYTEXT)
>>>> oslo_db.sqlalchemy.types.String(255, mysql_small_rowsize=sa.TEXT)
>>>>
>>>>
>>>> so if you don't have mysql_small_rowsize,  nothing happens.
>>>>
>>>
>>> I think the mysql_small_rowsize is a bit misleading since in one case 
>>> we are changing the size and the others the type. Perhaps:
>>>
>>> mysql_alt_size=64
>>> mysql_alt_type=sa.TINYTEXT
>>> mysql_alt_type=sa.TEXT
>>>
>>> alt standing for alternate. What do you think?
>>
>> -1
>>
>> I think it should be specific to NDB, since that's what the override 
>> is for. I'd support something like:
>>
>>  oslo_db.sqlalchemy.types.String(255, mysql_ndb_size=64)
>>
>> Octave, I understand due to the table row size limitations the desire 
>> to reduce some column sizes for NDB. What I'm not entirely clear on is 
>> the reason to change the column *type* specifically for NDB. There are 
>> definitely cases where different databases have column types -- say, 
>> PostgreSQL's INET column type -- that don't exist in other RDBMS. For 
>> those cases, the standard approach in SQLAlchemy is to create a 
>> sqlalchemy ColumnType concrete class that essentially translates the 
>> CREATE TABLE statement (and type compilation/coercing) to specify the 
>> supported column type in the RDBMS if it's supported otherwise 
>> defaults the column type to something coerceable.
> 
> When it comes to changing the size or the type for a column for NDB, 
> this has to do with the difference in the table row limits. InnoDB 
> limits to 65k and NDB limits to 14k. You can't cross those limits in 
> either engine because it's used as part of the internal storage engine 
> and affects things like replication constraints, memory alignment, etc.

Yes, I'm aware of those constraints, though you are incorrect about InnoDB.

InnoDB's row size limit is actually not 65K. It is dependent on the 
innodb_page_size value. At the default innodb_page_size value of 16KB, 
the max row size is 8KB. It is MySQL, not InnoDB, that places a max row 
size of 64KB which limits row size when innodb_page_size is set to a 
large value.

However, it is important to point out that InnoDB's max row size 
*doesn't* include the size of BLOB-based *data*, only a pointer to that 
data on disk. *Nor* does InnoDB count VARCHAR/VARBINARY columns' size in 
its maximum row size calculations. A VARCHAR(9000) column in InnoDB is 
perfectly acceptable. [1]

NDB, on the other hand, isn't a MySQL storage engine in the way that 
InnoDB is [2]. :) It's a completely different database system that is 
designed for in-memory-only use, though support for TEXT and BLOB 
columns in disk-backed cluster nodes is supported now.

NDB always stores the first 256 bytes of the BLOB/TEXT data plus an 
8-byte pointer to disk table data. This is in contrast to InnoDB which 
just stores a pointer to the data [2]. VARCHAR/VARBINARY data in InnoDB 
is different. While InnoDB will try to fit some amount of the VARCHAR 
data in the row itself before automatically overflowing the data to a 
pointer to a separate data page, NDB, on the other hand, treats 
VARCHAR/VARBINARY is fixed-size columns.

This is the big difference between the two systems and why you are 
changing some columns to the TEXT type.

[1] mysql> use test
Database changed
mysql> show variables like 'innodb_page_size';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| innodb_page_size | 16384 |
+------------------+-------+
1 row in set (0.02 sec)

mysql> create table t1 (a VARCHAR(9000));
Query OK, 0 rows affected (0.02 sec)

[2] This is the reason NDB isn't listed under "Alternative Storage 
Engines" in the MySQL documentation...

> Because we are dealing with an issue of row length within the table, the 
> best way to work around this is to do one of the following.. change the 
> size of the column so that it fits, move the column to another table, 
> split the table up, or to change it to a different type. The reason why 
> this works is that TEXT types are stored as blobs in databases.

Please see my earlier response about the REST API -- at least in Nova -- 
unfortunately exposing certain input field length limitations. It's not 
possible to reduce certain column sizes without a corresponding 
microversion bump in the public API.

> All database engines handle BLOBs differently than other types and as
> a result they reduce the count against the row length. That's why I
> change some of these columns to TEXT types.
As mentioned above, you are changing the column type because of the fact 
that NDB treats VARCHAR as fixed-length CHAR fields and counts the max 
VARCHAR size against the total row size, unlike InnoDB. It's an 
important distinction.

> If you look closely through services like Neutron, Barbican,
> Designate, Keystone, etc. you'll see that they have hit the 65k limit
> in InnoDB on some tables and have had to do the same thing.
> Realistically, any time you are storing something like SSH keys, SSL
> certs, output from commands, etc. you should be using the TEXT types
> anyways.

No disagreement from me at all here. However, see above point about the 
public REST API and the constraints it places on us.

> FYI, if you were talking about a large enterprise database for a bank or 
> retail shop, DBAs spend a lot of time designing tables and looking very 
> closely at the structure to ensure that they don't hit performance 
> problems, run out of row or table space, etc. They are extremely careful 
> about the usage of space. In some of the openstack projects, it's very 
> clear that we are wasting a lot of space and when tables get too wide, 
> they have to be rearranged and modified to deal with the limits and 
> constraints. So to put it into context for Nova, if any of the tables 
> are close to 65k in width, they will need to be modified or restructured 
> eventually.

I agree that Nova's database schema (and other project's schemas) is 
less than optimal in a number of places. I very much look forward to 
your contributions to Nova to optimize our DB schema -- outside of any 
NDB-specific things.

> Each database has structure limits:
> 
> https://www.postgresql.org/about/
> https://dev.mysql.com/doc/refman/5.7/en/innodb-restrictions.html
> https://dev.mysql.com/doc/mysql-cluster-excerpt/5.7/en/mysql-cluster-limitations.html 
> 
> https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.sql.ref.doc/doc/r0001029.html 
> 
> https://docs.oracle.com/cloud/latest/db112/REFRN/limits003.htm#REFRN0043
> 
> If you dig through those, you'll see that each database has different 
> limits on things like columns, rows, sizes, indexes, etc. So this isn't 
> just an NDB constraint. If you want everything to work across InnoDB, 
> NDB, PostgreSQL, DB2, etc. we will have to deal with these table issues 
> eventually.

Octave, nobody is disagreeing with you on the fact that different DBs 
have different storage restrictions. What we are trying to do is come up 
with a solution to supporting NDB without:

a) Making NDB-only/NDB-specific code modules in oslo.db
b) Embedding tribal knowledge about NDB's internal data structure 
restrictions in a format that only experts will be able to understand

Nobody is against NDB. We're just trying to find an elegant solution 
that is maintainable long-term.

Mike's got a good start on the above-mentioned solution here:

https://review.openstack.org/#/c/487902/

I'll be reviewing it later on this evening. Hope to see your review on 
it as well.

Best,
-jay

Open Stack

[openstack-dev] [oslo.db] [ndb] ndb namespace throughout openstack projects

OpenStack

Community

Documentation

Branding & Legal