<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 03/28/2014 11:29 AM, Serge Kovaleff
wrote:<br>
</div>
<blockquote
cite="mid:CALW1W+W_QY_wExLY0EmEJpTRwrJLHMcuVkRMSHHurPBs=Dvpsw@mail.gmail.com"
type="cite">
<div dir="ltr">Hi Illia,
<div><br>
</div>
<div>I would take a look at BSON <a moz-do-not-send="true"
href="http://bsonspec.org/">http://bsonspec.org/</a></div>
<div class="gmail_extra"><br clear="all">
<div>
<div dir="ltr">
<div><span>Cheers,</span></div>
<span>Serge Kovaleff</span><br>
<div><br>
</div>
</div>
</div>
<div class="gmail_quote">On Thu, Mar 27, 2014 at 8:23 PM,
Illia Khudoshyn <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:ikhudoshyn@mirantis.com" target="_blank">ikhudoshyn@mirantis.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div>
<div>Hi, Openstackers,<br>
<br>
</div>
I'm currently working on adding bulk data load
functionality to MagnetoDB. This functionality involves
inserting huge amounts of data (billions of rows,
gigabytes of data). The data being uploaded is a set
of JSON documents (for now). The question I'm interested in is
how best to transport the data. Currently I send a streaming
HTTP POST request from the client side, with
gevent.pywsgi on the server side.<br>
<br>
Could anybody suggest a (better?) approach for
transporting the data, please?<br>
</div>
<div>What are the best practices for this?<br>
<br>
</div>
Thanks in advance.<br clear="all">
<div>
<div>
<div><br>
-- <br>
<div dir="ltr">
<p style="margin:0in 0in
0.0001pt;font-size:11pt;font-family:Calibri,sans-serif">
Best regards,</p>
<p style="margin:0in 0in
0.0001pt;font-size:11pt;font-family:Calibri,sans-serif">Illia
Khudoshyn,<br>
Software Engineer, Mirantis, Inc.</p>
<p style="margin:0in 0in
0.0001pt;font-size:11pt;font-family:Calibri,sans-serif">
</p>
<p style="margin:0in 0in
0.0001pt;font-size:11pt;font-family:Calibri,sans-serif">38,
Lenina ave. Kharkov, Ukraine</p>
<p style="margin:0in 0in
0.0001pt;font-size:11pt;font-family:Calibri,sans-serif"><span
style="font-size:10pt;font-family:Arial,sans-serif;color:rgb(136,136,136)"><a
moz-do-not-send="true"
href="http://www.mirantis.com/"
style="color:rgb(17,85,204)"
target="_blank">www.mirantis.com</a></span></p>
<p style="margin:0in 0in
0.0001pt;font-size:11pt;font-family:Calibri,sans-serif"><span
style="font-size:10pt;font-family:Arial,sans-serif;color:rgb(136,136,136)"><a
moz-do-not-send="true"
href="http://www.mirantis.ru/"
style="color:rgb(17,85,204)"
target="_blank">www.mirantis.ru</a></span></p>
<p style="margin:0in 0in
0.0001pt;font-size:11pt;font-family:Calibri,sans-serif"><span
style="font-size:10pt;font-family:Arial,sans-serif;color:rgb(136,136,136)"> </span></p>
<p style="margin:0in 0in
0.0001pt;font-size:11pt;font-family:Calibri,sans-serif">
<span style="font-size:11pt">Skype: gluke_work</span><br>
</p>
<p style="margin:0in 0in
0.0001pt;font-size:11pt;font-family:Calibri,sans-serif"><span
style="font-size:10pt;font-family:Arial,sans-serif;color:rgb(136,136,136)"><a
moz-do-not-send="true"
href="mailto:ikhudoshyn@mirantis.com"
style="color:rgb(17,85,204)"
target="_blank">ikhudoshyn@mirantis.com</a></span></p>
</div>
</div>
</div>
</div>
</div>
<br>
_______________________________________________<br>
OpenStack-dev mailing list<br>
<a moz-do-not-send="true"
href="mailto:OpenStack-dev@lists.openstack.org">OpenStack-dev@lists.openstack.org</a><br>
<a moz-do-not-send="true"
href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev"
target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
<br>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
Hi Illia,<br>
I guess if we are talking about Cassandra batch loading, the fastest
way is to generate SSTables locally and load them into Cassandra via
JMX or sstableloader:<br>
<a class="moz-txt-link-freetext" href="http://www.datastax.com/dev/blog/bulk-loading">http://www.datastax.com/dev/blog/bulk-loading</a><br>
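<br>
For illustration, a minimal Python sketch of driving sstableloader from a
script. It assumes the SSTables have already been generated (on the Java
side) into a directory laid out as .../keyspace/table, which is where
sstableloader takes the target keyspace and table names from; the helper
names are hypothetical:<br>
<pre>
```python
import subprocess
from pathlib import Path


def build_sstableloader_cmd(hosts, sstable_dir):
    """Build an sstableloader command line (hypothetical helper).

    hosts: initial Cassandra contact points, passed to -d as a
    comma-separated list.
    sstable_dir: directory with the generated SSTables; its last two
    path components are read as keyspace/table.
    """
    path = Path(sstable_dir)
    # Require at least .../keyspace/table so both names can be derived.
    if not path.name or not path.parent.name:
        raise ValueError("expected a .../keyspace/table directory")
    return ["sstableloader", "-d", ",".join(hosts), str(path)]


def load_sstables(hosts, sstable_dir):
    """Run the real CLI; needs a Cassandra installation on PATH."""
    subprocess.run(build_sstableloader_cmd(hosts, sstable_dir), check=True)
```
</pre>
The -d option gives the initial contact points; check the sstableloader
usage text for your Cassandra version for the remaining options.<br>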
<br>
If you want to implement bulk load via the MagnetoDB layer (rather than
going to Cassandra directly), you could try a plain TCP socket and
implement your own binary protocol (using BSON, for example). HTTP is a
text-oriented protocol, and if the binary data ends up base64-encoded
along the way, a raw TCP socket helps you avoid that overhead. In my
opinion, combining HTTP and BSON is a doubtful solution, because you
would need two phases of encoding and decoding: 1) "object to BSON",
2) "BSON to base64", 3) "base64 to BSON", 4) "BSON to object", instead
of just 1) "object to JSON", 2) "JSON to object" in the case of
HTTP + JSON.<br>
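<br>
A minimal sketch of such a binary protocol over a plain TCP socket,
assuming a hypothetical length-prefixed framing (a 4-byte length, then the
payload bytes). The payload can be any bytes, such as a document produced
by a BSON library, so nothing has to be base64-encoded (base64 turns every
3 input bytes into 4 output characters, roughly 33% extra):<br>
<pre>
```python
import socket
import struct

# Hypothetical wire format: each frame is a 4-byte big-endian unsigned
# length followed by that many payload bytes (e.g. one BSON document).
HEADER = struct.Struct("!I")


def send_frame(sock, payload):
    """Send one frame; payload is raw bytes, so no base64 step is needed."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, raising if the peer closes the connection."""
    buf = bytearray()
    while len(buf) != n:  # recv() never returns more than requested
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return bytes(buf)


def recv_frame(sock):
    """Receive one frame and return its payload bytes."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


def send_records(host, port, payloads):
    """Client side: stream already-encoded records over one connection."""
    with socket.create_connection((host, port)) as sock:
        for payload in payloads:
            send_frame(sock, payload)
```
</pre>
A server would call recv_frame() in a loop on each accepted connection
until the client disconnects (which surfaces here as ConnectionError), and
decode each payload with the same BSON library.<br>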
<br>
As far as I know, HTTP streaming is an asynchronous form of HTTP. You
can expect a performance gain because the server skips generating an
HTTP response for each chunk, and the client skips waiting for that
response. But you still need to send almost the same amount of data.
So if network throughput is your bottleneck, it doesn't help. If the
server side is your bottleneck, it doesn't help either.<br>
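<br>
For reference, a minimal sketch of the server side of a single streaming
upload as a WSGI app (gevent.pywsgi serves WSGI apps). It assumes,
hypothetically, that the client streams newline-delimited JSON; the body
is read incrementally and one HTTP response is sent at the end, rather
than one per chunk. The function names are illustrative only:<br>
<pre>
```python
import io
import json


def bulk_load_app(environ, start_response):
    """Hypothetical WSGI endpoint for one streaming bulk-load POST.

    Reads newline-delimited JSON from wsgi.input line by line, so the
    whole upload never sits in memory, and answers once at the end.
    """
    stream = environ["wsgi.input"]
    rows = 0
    for line in iter(stream.readline, b""):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        item = json.loads(line)  # a real loader would write item to storage
        rows += 1
    body = json.dumps({"loaded": rows}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


def make_test_environ(body):
    """Minimal WSGI environ for exercising the app without a server."""
    return {"REQUEST_METHOD": "POST", "wsgi.input": io.BytesIO(body)}
```
</pre>
With gevent it could be served via
gevent.pywsgi.WSGIServer(("0.0.0.0", 8080), bulk_load_app).serve_forever().<br>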
<br>
Also note that, in any case, MagnetoDB's Cassandra storage currently
converts your data into CQL queries, which are also text. It would be
nice to implement the MagnetoDB BatchWriteItem operation via Cassandra
SSTable generation and loading with sstableloader, but unfortunately,
as far as I know, this functionality is only supported in the Java
world.<br>
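<br>
To make the point about text concrete, a hypothetical sketch (not
MagnetoDB's actual storage code) of how an item might be rendered as a CQL
INSERT string: every value is serialized to text, and single quotes inside
string literals are escaped by doubling them, as CQL requires:<br>
<pre>
```python
def cql_literal(value):
    """Render a value as a CQL literal (bool, int, float and str only)."""
    if isinstance(value, bool):  # check bool first: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # CQL escapes a single quote inside a string literal by doubling it.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError("unsupported type: %s" % type(value).__name__)


def item_to_cql(table, item):
    """Build a hypothetical INSERT statement for one item dict."""
    columns = sorted(item)
    values = ", ".join(cql_literal(item[c]) for c in columns)
    return "INSERT INTO %s (%s) VALUES (%s)" % (table, ", ".join(columns), values)
```
</pre>
String building like this is only for illustration; with a real driver
you would use prepared statements and bound values.<br>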
<pre class="moz-signature" cols="72">--
Best regards,
Dmitriy Ukhlov
Mirantis Inc.</pre>
</body>
</html>