<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jun 22, 2016 at 6:33 PM, Thierry Carrez <span dir="ltr"><<a href="mailto:thierry@openstack.org" target="_blank">thierry@openstack.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">Jeremy Stanley wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On 2016-06-21 17:34:07 +0000 (+0000), Jeremy Stanley wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On 2016-06-21 18:16:49 +0200 (+0200), Thierry Carrez wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
It hurts a lot when it's down because of so many services being served from<br>
it. We could also separate the published websites (status.o.o,<br>
governance.o.o, security.o.o, releases.o.o...) which require limited<br>
resources and grow slowly, from the more resource-hungry storage sites<br>
(logs.o.o, tarballs.o.o...).<br>
</blockquote>
<br>
Agreed, that's actually a pretty trivial change, comparatively<br>
speaking.<br>
</blockquote>
<br>
Oh, though it bears mention that the most recent extended outage<br>
(and by far longest we've experienced in a while) would have been<br>
just as bad either way. It had nothing to do with recovering<br>
attached volumes/filesystems, but rather was a host outage at the<br>
provider entirely outside our sphere of control. That sort of issue<br>
can potentially happen with any of our servers/services no matter<br>
how much we split them up.<br>
</blockquote>
<br></span>
I don't think it would have been just as bad... Even in the unlucky case where the VMs end up on the same machine and are all affected, IIUC rebuilding some of them would have been much faster if they were split up (less data to rsync)?</blockquote><div><br></div><div><br></div><div>I believe only the VM itself was rsync'd to a new server, and the cinder volumes were then reattached. What did take a while, though, was the fsck that ran after it rebooted. That would have been faster for some services if they were on a separate VM with smaller disks.</div><div><br></div><div>Another gain from separating the services is that the surface area affected when a node goes down is smaller. If we're lucky, only one service would go down at a time (for example, job logs could go down while tarballs stays up).</div><div><br></div><div>Cheers,<br>Josh</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="HOEnZb"><font color="#888888"><br>
<br>
-- <br>
Thierry Carrez (ttx)</font></span><div class="HOEnZb"><div class="h5"><br>
<br>
_______________________________________________<br>
OpenStack-Infra mailing list<br>
<a href="mailto:OpenStack-Infra@lists.openstack.org" target="_blank">OpenStack-Infra@lists.openstack.org</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra</a><br>
</div></div></blockquote></div><br></div></div>