<div dir="ltr"><div style="font-family:arial,sans-serif;font-size:12.7272720336914px">Hi,</div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px">I have done some SQLite footgun elimination at Mozilla and was curious whether Swift ran into similar issues.</div>
<div style="font-family:arial,sans-serif;font-size:12.7272720336914px">Judging from blog posts like <a href="http://blog.maginatics.com/2014/05/13/multi-container-sharding-making-openstack-swift-swifter/" target="_blank">http://blog.maginatics.com/2014/05/13/multi-container-sharding-making-openstack-swift-swifter/</a> and <a href="http://engineering.spilgames.com/openstack-swift-lots-small-files/" target="_blank">http://engineering.spilgames.com/openstack-swift-lots-small-files/</a>, it seemed worth looking into.</div>
<div style="font-family:arial,sans-serif;font-size:12.7272720336914px"><br></div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px"><b>Good things</b></div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px">
* torgomatic pointed out on IRC that inserts are now batched via an intermediate file that isn't fsync()ed (<a href="https://github.com/openstack/swift/commit/85362fdf4e7e70765ba08cee288437a763ea5475" target="_blank">https://github.com/openstack/swift/commit/85362fdf4e7e70765ba08cee288437a763ea5475</a>). That should help with the use cases described in the blog posts above. I hope the rest of my observations are still of some use.</div>
<div style="font-family:arial,sans-serif;font-size:12.7272720336914px">* There are few indexes involved. This is good, because indexes in single-file databases are very risky for performance.</div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px">
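<div style="font-family:arial,sans-serif;font-size:12.7272720336914px">To make the batching idea above concrete, here is a minimal Python sketch (Python because Swift is written in it). It is not Swift's actual code or file format: the queue_insert and flush_pending helpers, the ".pending" JSON-lines file, and the three-column object table are hypothetical. Records are appended to a side file without fsync(), then applied to SQLite in one transaction, so the commit cost is paid once per batch rather than once per insert. It assumes a single writer; a real version needs locking around the flush, and unsynced records can be lost in a crash.</div>

```python
import json
import os
import sqlite3


def queue_insert(db_path: str, record: dict) -> None:
    """Append one record to a hypothetical side file; no fsync, so this is cheap."""
    with open(db_path + ".pending", "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def flush_pending(db_path: str) -> int:
    """Apply every queued record to SQLite in a single transaction.

    Assumes one writer at a time: records appended while this runs could be lost.
    """
    pending = db_path + ".pending"
    if not os.path.exists(pending):
        return 0
    with open(pending, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits once for the whole batch (rolls back on error)
            conn.executemany(
                "INSERT INTO object (name, created_at, size) "
                "VALUES (:name, :created_at, :size)",
                records,
            )
    finally:
        conn.close()
    os.remove(pending)  # only after the batch has been committed
    return len(records)
```

<div style="font-family:arial,sans-serif;font-size:12.7272720336914px">For example, three queue_insert() calls followed by one flush_pending() result in a single SQLite commit instead of three.</div>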
<br></div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px">I set up devstack on my laptop to observe Swift's performance and poke at the resulting DB. I don't have a proper benchmarking environment to check whether any of my observations are valid.</div>
<div style="font-family:arial,sans-serif;font-size:12.7272720336914px"><br></div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px"><b>Container .db handle LRU</b></div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px">
It seems that container DBs are opened once per read/write operation. Having the container server keep an LRU list of DB handles might help workloads with hot containers.</div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px">
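<div style="font-family:arial,sans-serif;font-size:12.7272720336914px">A minimal sketch of such a cache in Python, assuming a single-threaded server process; ConnectionLRU and its default capacity are hypothetical. It keeps open connections in an OrderedDict keyed by DB path and closes the least recently used one when the cache is full. A real version would also need to drop a cached handle whenever its file is deleted or replaced, because an open connection keeps pointing at the old file; invalidate() stands in for that hook.</div>

```python
import sqlite3
from collections import OrderedDict


class ConnectionLRU:
    """Hypothetical cache of open sqlite3 connections keyed by DB path.

    Assumes a single-threaded caller; evicts the least recently used handle.
    """

    def __init__(self, capacity: int = 64):
        self.capacity = capacity
        self._conns = OrderedDict()  # path -> sqlite3.Connection

    def get(self, path: str) -> sqlite3.Connection:
        conn = self._conns.get(path)
        if conn is not None:
            self._conns.move_to_end(path)  # mark as most recently used
            return conn
        conn = sqlite3.connect(path)
        self._conns[path] = conn
        if len(self._conns) > self.capacity:
            _, oldest = self._conns.popitem(last=False)  # least recently used
            oldest.close()
        return conn

    def invalidate(self, path: str) -> None:
        """Drop a handle, e.g. after its file was deleted or replaced."""
        conn = self._conns.pop(path, None)
        if conn is not None:
            conn.close()

    def close_all(self) -> None:
        while self._conns:
            _, conn = self._conns.popitem()
            conn.close()
```

<div style="font-family:arial,sans-serif;font-size:12.7272720336914px">Request handlers would then call cache.get(path) instead of opening a fresh sqlite3.connect(path) for every operation.</div>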
<br></div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px"><b>Speeding up LIST</b></div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px">* The lack of an index for LIST is good, but it means LIST will effectively read the whole file.</div>
<div style="font-family:arial,sans-serif;font-size:12.7272720336914px">* A 1024-byte page size is used. Moving to a bigger page size reduces the number of syscalls.</div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px">
** Moving Firefox from 1K to 32K pages cut our DB IO by 1.2-2x: <a href="http://taras.glek.net/blog/2013/06/28/new-performance-people/" target="_blank">http://taras.glek.net/blog/2013/06/28/new-performance-people/</a></div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px">
* Calling posix_fadvise(POSIX_FADV_WILLNEED) on the DB file before opening it with SQLite should help the OS read the file in at maximum throughput. With default readahead settings, this causes Linux to issue disk IO in 2MB chunks rather than 128K ones. SQLite should really do this itself :(</div>
<div style="font-family:arial,sans-serif;font-size:12.7272720336914px">* Appends end up fragmenting the DB file. To grow the DB with less fragmentation, Swift should use SQLITE_FCNTL_CHUNK_SIZE (<a href="http://www.sqlite.org/c3ref/c_fcntl_chunk_size.html#sqlitefcntlchunksize" target="_blank">http://www.sqlite.org/c3ref/c_fcntl_chunk_size.html#sqlitefcntlchunksize</a>), OR copy the SQLite file over (preallocating it with fallocate) every time it doubles in size (e.g. during weekly compaction).</div>
<div style="font-family:arial,sans-serif;font-size:12.7272720336914px">** Fragmentation means DB scans are non-sequential on disk.</div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px">** XFS is particularly susceptible to fragmentation. You can run filefrag on .db files to monitor it.</div>
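<div style="font-family:arial,sans-serif;font-size:12.7272720336914px">The LIST suggestions above can be sketched in Python roughly as follows. This is a minimal sketch, not a tested implementation: the helper names are hypothetical, os.posix_fadvise and os.posix_fallocate exist only on some Unix platforms (e.g. Linux), and it assumes the DB uses a rollback journal rather than WAL, since the page size of a WAL database cannot be changed by VACUUM. Python's standard sqlite3 module does not appear to expose sqlite3_file_control(), so SQLITE_FCNTL_CHUNK_SIZE is not shown; the sketch uses the copy-with-fallocate alternative instead. One footgun applies throughout: closing any descriptor on a database file drops every POSIX lock the process holds on it, so the raw os.open()/open() calls here must not overlap with another connection's transaction in the same process.</div>

```python
import os
import sqlite3


def convert_page_size(path: str, page_size: int = 32768) -> int:
    """Rebuild a rollback-journal-mode DB with a bigger page size; return the result."""
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit: VACUUM needs it
    try:
        # A new page_size only takes effect when VACUUM rewrites the file.
        conn.execute(f"PRAGMA page_size = {int(page_size)}")
        conn.execute("VACUUM")
        return conn.execute("PRAGMA page_size").fetchone()[0]
    finally:
        conn.close()


def open_for_scan(path: str) -> sqlite3.Connection:
    """Ask the kernel to read the whole file ahead, then open it with SQLite."""
    if hasattr(os, "posix_fadvise"):
        fd = os.open(path, os.O_RDONLY)
        try:
            # offset 0 and length 0 mean "the whole file".
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        finally:
            os.close(fd)  # drops this process's POSIX locks on the file
    return sqlite3.connect(path)


def preallocated_copy(src_path: str, dst_path: str, bufsize: int = 1024 * 1024) -> None:
    """Copy a DB into a file preallocated in one go, so it can get fewer extents."""
    conn = sqlite3.connect(src_path, isolation_level=None)
    try:
        # RESERVED lock: readers continue, but no writer can change the file mid-copy.
        conn.execute("BEGIN IMMEDIATE")
        src = open(src_path, "rb")
        try:
            size = os.fstat(src.fileno()).st_size
            with open(dst_path, "wb") as dst:
                if size and hasattr(os, "posix_fallocate"):
                    os.posix_fallocate(dst.fileno(), 0, size)
                while True:
                    chunk = src.read(bufsize)
                    if not chunk:
                        break
                    dst.write(chunk)
                dst.flush()
                os.fsync(dst.fileno())
        finally:
            # End SQLite's transaction before closing our own descriptor on the file.
            conn.execute("ROLLBACK")
            src.close()
    finally:
        conn.close()
```

<div style="font-family:arial,sans-serif;font-size:12.7272720336914px">Running filefrag on the original and the copy should show whether preallocation reduced the number of extents. Swapping the copy into place safely (while no other process has the old file open) is left out of the sketch.</div>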
<div style="font-family:arial,sans-serif;font-size:12.7272720336914px"><br></div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px"><b>Write amplification</b></div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px">
* Write amplification is bad because it makes table scans slower than necessary (reading less data is always better for cache locality, and torgomatic says container DBs can grow to gigabytes).</div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px">
* Swift stores timestamps as decimal-seconds strings, e.g. 1409350185.26144, which take 16 bytes each. I'm guessing these are mainly used for HTTP headers, yet HTTP dates only have one-second resolution, and whole seconds would normally take up only 4 bytes.</div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px">
* CREATE INDEX ix_object_deleted_name ON object (deleted, name) might be a problem for delete-heavy workloads</div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px">** SQLite copies column entries used in indexes. Here the index almost doubles amount of space used by deleted entries</div>
<div style="font-family:arial,sans-serif;font-size:12.7272720336914px">** Indexes in general are risky in SQLite, as their pages end up interleaved with table data until a VACUUM. This makes table scans (e.g. during LIST) suboptimal, and could also mean that operations relying on the index are no better IO-wise than a whole-table scan.</div>
<div style="font-family:arial,sans-serif;font-size:12.7272720336914px">* Deletion is recorded both in the content type and in the deleted field. This might not be a big deal.</div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px">
* Ideally you'd use a database that can be compressed (lz4?) at the whole-file level. I'm not aware of a good off-the-shelf solution here; some column store might be a decent replacement for SQLite.</div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px">
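<div style="font-family:arial,sans-serif;font-size:12.7272720336914px">The timestamp and index points above can be checked with a small Python experiment. The table is a simplified, hypothetical version of the container object table, not Swift's exact schema, and every row is a deleted entry with illustrative values to mimic a delete-heavy workload. It compares the page count of an in-memory DB with text versus integer timestamps, with and without the (deleted, name) index; absolute numbers will vary with SQLite version and page size.</div>

```python
import sqlite3


def page_count(text_timestamps: bool, with_index: bool, rows: int = 5000) -> int:
    """Pages used by a simplified, hypothetical object table of deleted entries."""
    conn = sqlite3.connect(":memory:")
    ts_type = "TEXT" if text_timestamps else "INTEGER"
    conn.execute(
        "CREATE TABLE object (id INTEGER PRIMARY KEY, name TEXT, "
        f"created_at {ts_type}, size INTEGER, content_type TEXT, "
        "etag TEXT, deleted INTEGER DEFAULT 0)"
    )
    if with_index:
        conn.execute("CREATE INDEX ix_object_deleted_name ON object (deleted, name)")
    base = 1409350185  # whole seconds; fits SQLite's 4-byte integer encoding
    conn.executemany(
        "INSERT INTO object (name, created_at, size, content_type, etag, deleted) "
        "VALUES (?, ?, 0, 'application/deleted', '', 1)",  # illustrative deleted row
        [
            (
                f"photos/2014/08/{i:08d}.jpg",
                # 16-character string like "1409350185.00100" vs. a plain integer
                f"{base + i / 1000:016.5f}" if text_timestamps else base + i,
            )
            for i in range(rows)
        ],
    )
    conn.commit()
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.close()
    return pages


for text_ts in (True, False):
    for idx in (False, True):
        print(f"text timestamps={text_ts!s:5} index={idx!s:5} pages={page_count(text_ts, idx)}")
```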
<br></div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px">I hope some of these observations are useful; if not, sorry for the noise. I'm pretty impressed by swift-container's minimalist SQLite usage and did not see many footguns here.</div>
<div style="font-family:arial,sans-serif;font-size:12.7272720336914px"><br></div><div style="font-family:arial,sans-serif;font-size:12.7272720336914px">Taras</div></div>