<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Sun, Aug 14, 2016 at 5:53 PM, John Griffith <span dir="ltr"><<a href="mailto:john.griffith8@gmail.com" target="_blank">john.griffith8@gmail.com</a>></span> wrote:<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div style="font-family:monospace,monospace">I'd like to get a more detailed use case and example of a problem you want to solve with this. I have a number of concerns including those I raised in your "list manageable volumes" proposal. Most importantly there's really no clear definition of what these fields mean and how they should be interpreted. </div></div></div></blockquote><div><br></div><div>I didn't specify what anything means yet on purpose - the idea was to first gather information here about what various backends can report, then we make an educated decision about what health states make sense to expose.</div><div><br></div><div>I see Cinder's potential as a single pane of glass management for all of my cloud's storage. Once I do some initial configuration, I hope to look at the backend's UI as little as possible. Today a user can create a volume, but can't know anything about it's resiliency or availability. The user has a volume that's "available" and is happy. But what does the user really care about? In my opinion not Cinder's internal state machine, but things like "Is my data safe?" and "Is my data accessible?" That's the problem that I want to solve here.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div style="font-family:monospace,monospace">For backends, I'm not sure what you want to solve that can't be handled already by the scheduler and report-capabilities periodic job? You can already report back from your backend to the scheduler that you shouldn't be used for any scheduling activities going forward. More detailed info than that might be useful, but I'm not sure it wouldn't fall into an already existing OpenStack monitoring project like Monasca? </div></div></div></blockquote><div><br></div><div>My storage requires maintenance and now all volumes are unaccessible. I have management access and create as many volumes as I want, but no attach. Or the storage is down totally. Or it is up but performance/reliability is degraded due to rebuilds in progress. Or multiple disks failed, and I lost data from 100 volumes.</div><div><br></div><div>In all these cases, all I see is that my volumes are available/in-use. To have any real insight into what is going on the admin has to go to the storage backend and use vendor-specific APIs to find out. Why not abstract these APIs as well, to allow the admin to monitor the storage? It can be as simple as "Hey, there's a problem, your volumes aren't accessible - go look at the backend's UI" - without going into details.</div><div> </div><div>Do you propose every vendor write a Monasca plugin? 
> As far as volumes, I personally don't think volumes should have more than a few states. They're either "ok" and available for an operation, or they're not.

I agree. In my opinion volumes have way too many states today, but that's another topic. What I am proposing is not new states or a new state machine, but rather a simple health property: volume['health'] = "healthy", volume['health'] = "error" - whatever the backend reports.
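
As a rough illustration of how little this adds on top of the existing status field (the field names below are made up, not an existing Cinder API):

    # Hypothetical sketch: 'health' rides alongside the existing 'status'
    # field instead of adding anything to Cinder's state machine.
    volume = {
        'id': 'f6c7a3e8-1234-5678-9abc-def012345678',   # made-up ID
        'status': 'in-use',     # existing Cinder state machine, unchanged
        'health': 'error',      # whatever the backend reports
        'health_reason': 'backend reports media failure; data may be lost',
    }

    def needs_admin_attention(vol):
        """Flag volumes whose backend-reported health is bad, regardless of status."""
        return vol.get('health') not in (None, 'healthy')

    if needs_admin_attention(volume):
        print("Volume %s: %s" % (volume['id'], volume['health_reason']))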
> The list you have seems OK to me, but I don't see a ton of value in fault prediction or in going to great lengths to avoid something failing. The current model we have of a volume being "ok" until it's "not" seems perfectly reasonable to me. Typically my experience is that trying to be clever and polling/monitoring to try to preemptively change the status of a volume does little more than result in complexity, confusion, and false status changes of resources. I'm pretty strongly opposed to having that level of granularity on the volume here. At least for now, I'd rather see what you have in mind for the backend and nail that down to something that's solid and basically bulletproof before trying to tackle thousands of volumes which have transient states. And of course the biggest question I still have is "what problem" you hope to solve here.

This is not about fault prediction, or preemptive changes, or anything fancy like that. It's simply reporting on the current health: "You have lost the data in this volume, sorry." "Don't bother trying to attach this volume right now, it's not accessible." "The storage is currently doing something with your volume and performance will suck."

I don't know exactly what we want to expose - I'd rather answer that after getting feedback from vendors about what information is available. But providing some real, up-to-date health status on storage resources is a big value for customers.

Thanks,
Avishay

--
Avishay Traeger, PhD
System Architect

Mobile: +972 54 447 1475
E-mail: avishay@stratoscale.com

Web: http://www.stratoscale.com/ | Blog: http://www.stratoscale.com/blog/ | Twitter: https://twitter.com/Stratoscale | Google+: https://plus.google.com/u/1/b/108421603458396133912/108421603458396133912/posts | LinkedIn: https://www.linkedin.com/company/stratoscale