<div dir="ltr">You don't need to run swift-dispersion-populate more than once - all it does is put a bunch of objects in some % of your ring's partitions. The number of partitions in a ring is fixed at creation [1] - only which device where each partition is assigned will change with a rebalance.<div><br></div><div>swift-dispersion-report uses the objects placed in the cluster to query multiple backend replicas of each known object - and highlight any partitions that have less than all replicas available at their primary locations [2]</div><div><br></div><div>The objects generated by swift-dispersion-populate will not go away unless you loose all three replicas - which would be bad - because unless you can find them - it probably means any user data that happened to share a partition with those objects is also lost. Luckily if you're monitoring partition health using swift-dispersion-report you should have some early warnings if some partition has even one (or two!?) replicas unavailable at the primary locations and can take some corrective action before things get scary.</div><div><br></div><div>These tools are used as a proxy for overall cluster health - particularly for replicator/rebalance or hardware failure</div><div><br></div><div>They don't really evaluate the ring assignment placement algorithm - which determines ring assignment based on failure domains (regions). The ring itself as an abstract data structure *also* has a metric unfortunately *also* named "dispersion" - it useful to understanding how your rings are using your cluster topology - but it's orthogonal to where data is physically located on disk "at this moment" - it's a more pure representation of where the data *should be eventually*. You can evaluate your ring with `swift-ring-builder object.builder dispersion -v` it would warn you immediately if all replicas of some part were assigned to region 0 and no replicas of the part were assigned to region 1.</div><div><br></div><div>But because you bring up write affinity I assume you're dealing with neither of these concepts - but instead you're trying to monitor/understand how/when data written to handoff locations in the local region is being moved to the remote region. I don't think anyone has tried to use swift-dispersion-populate on a constant/repeating basis to solve for this - although I can see now a dispersion-populate with write-affinity followed immediately by a dispersion-report *will* accurately reflect that until the replicator moves them - most parts will have all replicas in the region local to the write.</div><div><br></div><div>IMHO this is still something operations teams are thinking about. You should start with the recent-ish writeup in the admin guide doco that address this issue directly [3]. Then you might find some useful information on swift-dispersion-report and some of the visualizations SwiftStack uses to represent handoff partitions in a recent-ish presentation from Barcelona [4]. If you want to maybe integrate some form of handoff monitoring to your system you I have a script I've been wanting to polish - but maybe you can get inspired [5]. 
Lastly, you might consider that there's a small but growing number of swift operators/maintainers who think write_affinity is more trouble than it's worth - the natural back-pressure of forcing clients to write directly to both the local and remote regions has a number of benefits. If that's even remotely possible for you I would highly recommend it - and if you're not sure, you should at least try it.

Keep us posted.

Good luck,

-Clay

1. ignoring https://review.openstack.org/#/c/337297/ ;)
2. typically because of a rebalance, but disk failures or an offline node will also cause a replica of some parts to be unavailable
3. http://docs.openstack.org/developer/swift/admin_guide.html#checking-handoff-partition-distribution
4. https://youtu.be/ger20cqOypE?t=1412
5. https://gist.github.com/clayg/90143abc1c34e259752bf333f485a37e

On Thu, Jan 12, 2017 at 6:06 PM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:

> We are running a 2 region Swift cluster with write affinity.
>
> I've just managed to deceive myself with the Dispersion report :-( . The last run of the populate was early Dec, and the corresponding report happily shows 100%. All good - seemingly.
>
> However, probing the actual distribution of a number of objects created later in Dec, I see they have 3 copies in region 1 and 0 in region 2. Hmm... I'm guessing that if we ran the populate more frequently this would have been highlighted sooner. This begs the question - how frequently is it sensible to run the populate?
>
> regards
>
> Mark