[openstack-dev] [barbican] Date/time/timezone parsing
John Dennis
jdennis at redhat.com
Wed Jun 5 15:24:06 UTC 2013
On 06/05/2013 08:36 AM, Jarret Raim wrote:
> On 6/4/13 9:13 PM, "John Dennis" <jdennis at redhat.com> wrote:
>
>> On 06/04/2013 07:01 PM, John Wood wrote:
>>> Simo, we were planning to normalize times into UTC prior to putting
>>> into datastore, but didn't know if it would be too stringent to make
>>> clients also conform to UTC to use the API. Using UTC on both sides of
>>> the API does seem safer and more robust overall though, so we could
>>> enforce this in our code base for sure.
>>>
>>> Would anyone out there object to a UTC-only mode of operation for
>>> barbican?
>> Time values should always be in UTC. The only time (no pun intended) a
>> time value should be in local time is when it is displayed to the user
>> or accepted as user input, after which it should immediately be
>> converted to UTC. Following the rule that time values are always UTC
>> will prevent any number of nasty problems that can easily be avoided.
>
> The API will store all date times as UTC. However, when a customer
> specifies a timezone offset in a message, we have two options. Either we
> accept the message, modify the UTC time to correctly represent the
> requested date time (e.g. apply the offset) or reject the message as
> malformed.
>
> The current iso8601 implementation allows us to do neither. In some cases
> it incorrectly parses the timezone offset (or ignores it) and does not
> throw an error. I'm fine with rejecting a message with an offset if that's
> the way that the rest of the APIs work. Is there a way to do that with the
> current oslo / iso8601 implementation? I guess we could roll our own, but
> that seems like something that properly belongs in the parsing library.
>
>
Going back to the examples in the original post, it appears the parser is
broken and needs to be fixed. Here are my suggestions, based on having
worked a fair bit with date/time values:
Follow the rules in RFC 3339.
The Wikipedia article on ISO 8601 gives a very lucid (i.e. not
RFC-speak) description of how 8601 works; from it you can see the parser
is broken with respect to "last field, highest precision" parsing. Fix
the parser.
Demand that any message containing a timestamp which does not strictly
follow the rules be rejected. Permitting common-case exceptions or other
formats has led to many headaches best avoided. Make clients play by
the rules.
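The "follow RFC 3339, reject everything else" rules can be sketched in Python 3 as a small validating parser. This is a minimal sketch under stated assumptions: `parse_rfc3339_strict` is a hypothetical helper, not part of oslo or the iso8601 package, and it accepts only the RFC 3339 `date-time` form, which always carries "Z" or a numeric offset. Anything else (a missing offset, a space instead of "T", "+0700" without a colon, out-of-range fields) raises ValueError, which an API layer could map to a "malformed message" rejection.

```python
import re
from datetime import datetime, timedelta, timezone

# RFC 3339 section 5.6 "date-time": full-date "T" partial-time time-offset,
# where time-offset is "Z" or a numeric "+HH:MM" / "-HH:MM".
# re.ASCII keeps \d from matching non-ASCII digits.
_RFC3339 = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})[Tt]"
    r"(\d{2}):(\d{2}):(\d{2})(\.\d+)?"
    r"([Zz]|[+-]\d{2}:\d{2})",
    re.ASCII,
)


def parse_rfc3339_strict(text: str) -> datetime:
    """Return a tz-aware datetime for an RFC 3339 timestamp, else raise ValueError."""
    m = _RFC3339.fullmatch(text)  # fullmatch: no leading/trailing junk, no "\n"
    if m is None:
        raise ValueError(f"not an RFC 3339 timestamp: {text!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    frac, offset = m.group(7), m.group(8)
    # datetime only stores microseconds, so extra fractional digits are truncated.
    micro = int((frac[1:] + "000000")[:6]) if frac else 0
    if offset in ("Z", "z"):
        tz = timezone.utc
    else:
        hh, mm = int(offset[1:3]), int(offset[4:6])
        if hh > 23 or mm > 59:
            raise ValueError(f"invalid UTC offset: {offset!r}")
        delta = timedelta(hours=hh, minutes=mm)
        tz = timezone(delta if offset[0] == "+" else -delta)
    # The constructor rejects out-of-range fields (month 13, June 31, and
    # second 60, so RFC 3339 leap seconds are refused by this sketch).
    return datetime(year, month, day, hour, minute, second, micro, tzinfo=tz)
```

Because the result is always aware, two parsed values with different offsets compare correctly: "2013-06-05T08:36:00-07:00" and "2013-06-05T15:36:00Z" are equal.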
Never accept a timestamp without a timezone specifier, either "Z" for
UTC or a numeric offset.
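Python's own `datetime.fromisoformat()` (3.7+) shows why this rule has to be enforced explicitly: given a timestamp with no offset, it parses it without complaint and returns a naive object. A minimal guard is sketched below; `require_aware` is a hypothetical name, not an existing oslo function.

```python
from datetime import datetime


def require_aware(dt: datetime) -> datetime:
    """Hypothetical guard: refuse any datetime without a usable UTC offset."""
    # Per the datetime docs, an object is aware only if tzinfo is set and
    # utcoffset() returns a timedelta rather than None.
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("timestamp lacks a timezone specifier ('Z' or offset)")
    return dt


# fromisoformat() silently accepts a missing offset ...
naive = datetime.fromisoformat("2013-06-05T15:24:06")
# ... and returns an aware object only when one is present.
aware = datetime.fromisoformat("2013-06-05T15:24:06+00:00")

print(naive.tzinfo)                    # None
print(require_aware(aware) is aware)   # True
```

Calling `require_aware(naive)` raises ValueError, so the message can be rejected instead of being stored with an unknown meaning.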
Provide guidance that timestamps should be in UTC. My personal
preference is UTC, but in some scenarios knowing the offset can be
useful. This usually arises only in select situations such as
scheduling, where an awareness of local time can provide extra
information useful for allocating resources. But there are so many
problems with interpreting UTC offsets that here be dragons; beware.
The best recommendation is to convert to UTC immediately but preserve
the offset as extra information if you believe it might be useful.
Don't use the stored offset for anything authoritative; consider it
only as scheduling guidance if needed.
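A sketch of "convert immediately, keep the offset only as advisory data", assuming a hypothetical `StoredTimestamp` record (not an existing barbican or oslo model). Only `utc` is meant to be authoritative; `client_offset` is kept for scheduling hints and is never fed back into time arithmetic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class StoredTimestamp:
    """Hypothetical storage record, not an existing barbican model."""
    utc: datetime                        # authoritative instant, always UTC
    client_offset: Optional[timedelta]   # advisory only; never used for math


def normalize_for_storage(dt: datetime) -> StoredTimestamp:
    offset = dt.utcoffset()
    if offset is None:  # naive input: no way to know which instant it means
        raise ValueError("refusing a timestamp without a UTC offset")
    # Convert right away; keep the client's offset purely as extra information.
    return StoredTimestamp(utc=dt.astimezone(timezone.utc), client_offset=offset)


submitted = datetime(2013, 6, 5, 8, 36, tzinfo=timezone(timedelta(hours=-7)))
record = normalize_for_storage(submitted)
print(record.utc.isoformat())   # 2013-06-05T15:36:00+00:00
```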
I don't know whether the Python module you're working with is based on
Python's native datetime, but I've used datetime a fair amount and
(at least in Python 2) there is a nasty problem with respect to
timezones. It appears as if datetime was originally written without
timezone support and timezones were grafted on later. This led to "tz
naive" and "tz aware" datetime objects. This was very unfortunate
because a timestamp without tz info isn't much use as a timestamp:
it's ambiguous. It's further compounded by the fact that you get "tz
naive" datetime objects by default, which of course is what most people
use. So by default you get meaningless timestamps :-( It's kind of like
having a scalar value without pairing it with a "units" value. Because
Python's datetime objects are so ambiguous (either by default or through
incorrect processing of the extra tz info), the clearest, simplest, and
most robust recommendation is to require that every datetime (or time)
object be in UTC. Hopefully it is obvious that a "tz naive" datetime
object can't distinguish between a value in UTC and one that has an
implicit offset, hence the rule "UTC always".
Of course there is the problem of comparing datetime values. If datetime
objects always carried a tz offset, then using non-UTC values would be
fine in most cases because the values would be normalized before
comparison. But without the extra tzinfo data you can't normalize.
Python's datetime library will throw an exception if you try to compare
a naive and an aware object, because normalization can't be performed.
But it will happily compare two naive objects, and unless those objects
happen to share the same tz offset you're screwed. You're really left
with two choices: 1) always use "tz aware" objects, but this is
difficult because you don't get them by default and it's non-trivial,
or 2) always use UTC.
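A small Python 3 illustration of both failure modes, assuming (for the example only) that the second naive reading was taken on a host running at UTC-04:00. Note that in Python 3.3 and later only ordering comparisons between naive and aware objects raise TypeError; `==` simply returns False.

```python
from datetime import datetime, timedelta, timezone

# Two naive readings. By (assumed) convention the first was taken in UTC
# and the second on a host running at UTC-04:00.
reading_utc = datetime(2013, 6, 5, 15, 0)
reading_local = datetime(2013, 6, 5, 11, 30)

# Naive vs naive compares wall-clock fields only, so the answer is wrong:
# 11:30-04:00 is 15:30Z, which is *after* 15:00Z.
print(reading_local < reading_utc)   # True (misleading)

# With offsets attached, both values are normalized before comparing.
aware_utc = reading_utc.replace(tzinfo=timezone.utc)
aware_local = reading_local.replace(tzinfo=timezone(timedelta(hours=-4)))
print(aware_local < aware_utc)       # False (correct)

# Ordering a naive value against an aware one cannot be normalized.
try:
    reading_utc < aware_local
except TypeError as exc:
    print("TypeError:", exc)
```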
Hope this clarifies things a little.
The biggest lesson I've learned from problematic systems (and there
are many) is that a weak initial specification of date/time values
and/or permitting exceptions will, in short order, leave you with a
mess. The other thing to remember is that UTC offsets are very tricky
and are only meaningful for an exact time and location (hence the Olson
Database). Avoid offsets as much as possible; if they are present,
process them immediately (stored offsets are a huge problem) and treat
a supplied offset as informational extra data, or discard it completely
if local scheduling is not involved.
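To illustrate why a stored offset should not be reused, assume a weekly 09:00 meeting in New York that was first submitted in January with offset -05:00. The EST/EDT offsets below are hardcoded as an assumption for the example; a real system would look up the offset for the specific location and instant in the Olson (tz) database, e.g. via Python 3.9+'s `zoneinfo`.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: US Eastern time is UTC-05:00 in January (EST)
# and UTC-04:00 in July (EDT). Hardcoded here instead of a tz database lookup.
EST = timezone(timedelta(hours=-5))
EDT = timezone(timedelta(hours=-4))

# The client's original January submission carried -05:00; suppose we stored it.
stored_offset = EST

# Wrong: reuse the stored offset to place the July occurrence.
july_wrong = datetime(2013, 7, 10, 9, 0, tzinfo=stored_offset).astimezone(timezone.utc)
# Right: use the offset valid at that instant and place (-04:00 in July).
july_right = datetime(2013, 7, 10, 9, 0, tzinfo=EDT).astimezone(timezone.utc)

print(july_wrong.isoformat())   # 2013-07-10T14:00:00+00:00
print(july_right.isoformat())   # 2013-07-10T13:00:00+00:00
```

The stored offset puts the July meeting an hour late, which is exactly the kind of silent error that stored offsets invite.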
John