[openstack-dev] [swift] - question about statsd messages and 404 errors
Samuel Merritt
sam at swiftstack.com
Fri Jul 25 22:46:38 UTC 2014
On 7/25/14, 4:58 AM, Seger, Mark (Cloud Services) wrote:
> I’m trying to track object server GET errors using statsd and I’m not
> seeing them. The test I’m doing is to simply do a GET on a
> non-existent object. As expected, a 404 is returned and the object
> server log records it. However, statsd implies it succeeded because
> there were no errors reported. A read of the admin guide does clearly
> say the GET timing includes failed GETs, but my question then becomes
> how is one to tell there was a failure? Should there be another type of
> message that DOES report errors? Or how about including these in the
> ‘object-server.GET.errors.timing’ message?
What "error" means with respect to Swift's backend-server timing metrics
is pretty fuzzy at the moment, and could probably use some work.
The idea is that object-server.GET.timing has timing data for everything
that Swift handled successfully, and object-server.GET.errors.timing has
timing data for things where Swift failed.
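
As a rough illustration of telling the two metrics apart on the
receiving end, here is a minimal sketch that parses statsd lines of the
form "name:value|type". The sample lines and the helper names
(parse_statsd_line, is_error_timing) are made up for this example, and
it assumes the error metric ends in ".errors.timing" as in the question
above; any configured metric prefix would simply appear in front of the
name.

```python
def parse_statsd_line(line):
    """Split one statsd line into (name, value, type).

    Assumes the plain "name:value|type[|@rate]" layout; an optional
    sample-rate field, if present, is ignored.
    """
    name, _, rest = line.partition(":")
    fields = rest.split("|")
    return name, float(fields[0]), fields[1]


def is_error_timing(name):
    """True if the metric name is an error-timing metric (assumed suffix)."""
    return name.endswith(".errors.timing")


# Made-up sample lines, not captured from a real cluster.
samples = [
    "object-server.GET.timing:4.2|ms",
    "object-server.GET.errors.timing:11.0|ms",
]

for sample in samples:
    name, value, kind = parse_statsd_line(sample)
    label = "error" if is_error_timing(name) else "ok"
    print(name, value, kind, label)
```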
Some things are pretty easy to divide up. For example, a 200-series
status code always counts as success, and a 500-series status code
always counts as an error.
It gets tricky in the 400-series status codes. For example, a 404 means
that a client asked for an object that doesn't exist. That's not Swift's
fault, so that goes into the success bucket (object-server.GET.timing).
Similarly, a 412 means that a client set an unsatisfiable precondition
in the If-Match, If-None-Match, If-Modified-Since, or
If-Unmodified-Since headers, and Swift correctly determined that the
requested object can't fulfill the precondition, so that one goes in the
success bucket too.
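
To make the division described so far concrete, here is a small sketch
of the bucketing as this message describes it. It is not Swift's actual
code; bucket_for_status and the three labels are hypothetical names, and
every 4xx code other than 404 and 412 is deliberately left as
"ambiguous" rather than guessed at.

```python
SUCCESS = "success"      # would land in <server>.<METHOD>.timing
ERROR = "error"          # would land in the error-timing metric
AMBIGUOUS = "ambiguous"  # no clear answer from the description alone


def bucket_for_status(status):
    """Classify an HTTP status the way the message describes it."""
    if 200 <= status < 300:
        return SUCCESS
    if 500 <= status < 600:
        return ERROR
    if status in (404, 412):
        # The client asked for something Swift correctly could not
        # provide, so it is not counted as Swift's failure.
        return SUCCESS
    return AMBIGUOUS


for code in (200, 404, 412, 409, 400, 507):
    print(code, bucket_for_status(code))
```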
However, there are other status codes that are more ambiguous. Consider
409; the object server responds with 409 if the request's X-Timestamp is
less than the object's X-Timestamp (on PUT/POST/DELETE). You can get
this with two near-simultaneous POSTs:
1. request A hits proxy; proxy assigns X-Timestamp: 1406316223.851131
2. request B hits proxy; proxy assigns X-Timestamp: 1406316223.851132
3. request B hits object server and gets 202
4. request A hits object server and gets 409
Does that error count as Swift's fault? If the client requests were
nearly simultaneous, then I think not; there's always going to be *some*
delay between accept() and gettimeofday(). On the other hand, if one
proxy server's time is significantly behind another's, then it is
Swift's fault.
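
The four-step sequence above can be reproduced with a toy model.
ToyObjectServer is a hypothetical stand-in, not Swift code; it only
keeps the last accepted X-Timestamp per object and applies the "less
than" rule described above. Timestamps are compared as Decimal values so
the microsecond difference is not lost to float rounding.

```python
from decimal import Decimal


class ToyObjectServer:
    """Hypothetical stand-in for an object server; not Swift code."""

    def __init__(self):
        self.timestamps = {}  # object path -> last accepted X-Timestamp

    def put(self, path, x_timestamp):
        # Simplification: a PUT here always succeeds and sets the timestamp.
        self.timestamps[path] = Decimal(x_timestamp)
        return 201

    def post(self, path, x_timestamp):
        stored = self.timestamps.get(path)
        if stored is None:
            return 404
        request_ts = Decimal(x_timestamp)
        if request_ts < stored:
            return 409  # request is older than what the server already has
        self.timestamps[path] = request_ts
        return 202


server = ToyObjectServer()
server.put("/a/c/o", "1406316000.000000")

# Request B was stamped later by the proxy but reaches the server first.
status_b = server.post("/a/c/o", "1406316223.851132")
status_a = server.post("/a/c/o", "1406316223.851131")
print(status_b, status_a)  # prints "202 409"
```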
It's even worse with 400; sometimes it's for bad paths (like asking an
object server for /<partition>/<account>/<container>; this can happen if
the administrator misconfigures their rings), and sometimes it's for bad
X-Delete-At / X-Delete-After values (which are set by the client).
I'm not sure what the best way to fix this is, but if you just want to
see some error metrics, unmount a disk to get some 507s.