[openstack-dev] [barbican] Date/time/timezone parsing

John Dennis jdennis at redhat.com
Wed Jun 5 15:24:06 UTC 2013


On 06/05/2013 08:36 AM, Jarret Raim wrote:
> On 6/4/13 9:13 PM, "John Dennis" <jdennis at redhat.com> wrote:
>
>> On 06/04/2013 07:01 PM, John Wood wrote:
>>> Simo, we were planning to normalize times into UTC prior to putting
>>> into datastore, but didn't know if it would be too stringent to make
>>> clients also conform to UTC to use the API. Using UTC on both sides of
>>> the API does seem safer and more robust overall though, so we could
>>> enforce this in our code base for sure.
>>>
>>> Would anyone out there object to a UTC-only mode of operation for
>>> barbican?
>> Time values should always be in UTC. The only time (no pun intended) a
>> time value should be in local time is when it is displayed to the user
>> or accepted as user input, after which it should immediately be
>> converted to UTC. Following the rule that time values are always UTC
>> will prevent any number of nasty problems that can easily be avoided.
>
> The API will store all date times as UTC. However, when a customer
> specifies a timezone offset in a message, we have two options. Either we
> accept the message, modify the UTC time to correctly represent the
> requested date time (e.g. apply the offset) or reject the message as
> malformed.
>
> The current iso8601 implementation allows us to do neither. In some cases
> it incorrectly parses the timezone offset (or ignores it) and does not
> throw an error. I'm fine with rejecting a message with an offset if that's
> the way that the rest of the APIs work. Is there a way to do that with the
> current oslo / iso8601 implementation? I guess we could roll our own, but
> that seems like something properly belonging to the parsing library.
>
>
Going back to the examples in the original post it appears the parser is 
broken and needs to be fixed. Here are my suggestions based on having 
worked a fair bit with date/time values.

Follow the rules in RFC 3339.

The Wikipedia article on ISO 8601 gives a very lucid (i.e. not
rfc-speak) explanation of how 8601 works. From that you can see the
parser is broken with respect to "last field, highest precision"
parsing. Fix the parser.

Demand that any message containing a timestamp which does not strictly
follow the rules be rejected. Permitting common case exceptions or other
formats has led to many headaches best avoided. Make clients play by
the rules.

Never accept a timestamp without a timezone specifier, either "Z" for 
UTC or a numeric offset.
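
As a rough illustration of the two rules above (reject anything that is
not strictly formatted, and never accept a timestamp lacking "Z" or a
numeric offset), here is a minimal sketch. It assumes Python 3, where
datetime.timezone is available. The name parse_rfc3339_strict is
hypothetical and is not part of iso8601 or oslo. The sketch is somewhat
stricter than RFC 3339 itself: it rejects the lowercase "t"/"z" and
leap seconds that the RFC permits.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern for the RFC 3339 "date-time" production.
# The trailing group makes the timezone designator mandatory.
_RFC3339 = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(\.\d+)?"
    r"(Z|[+-]\d{2}:\d{2})",
    re.ASCII,
)


def parse_rfc3339_strict(text):
    """Return a tz-aware datetime, or raise ValueError."""
    m = _RFC3339.fullmatch(text)
    if m is None:
        raise ValueError("not a strict RFC 3339 timestamp: %r" % (text,))
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    fraction = m.group(7)
    # Keep at most microsecond precision; pad short fractions with zeros.
    micro = int((fraction[1:] + "000000")[:6]) if fraction else 0
    zone = m.group(8)
    if zone == "Z":
        tz = timezone.utc
    else:
        sign = -1 if zone[0] == "-" else 1
        off_h, off_m = int(zone[1:3]), int(zone[4:6])
        if off_h > 23 or off_m > 59:
            raise ValueError("offset out of range: %r" % (zone,))
        tz = timezone(sign * timedelta(hours=off_h, minutes=off_m))
    # datetime() itself raises ValueError for out-of-range fields
    # such as month 13 or second 60.
    return datetime(year, month, day, hour, minute, second, micro, tzinfo=tz)
```

With this sketch, "2013-06-05T15:24:06Z" and "2013-06-05T10:24:06-05:00"
are accepted, while "2013-06-05T15:24:06" (no timezone) and
"2013-06-05 15:24:06Z" (space instead of "T") raise ValueError.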

Provide guidance that timestamps should be in UTC. My personal
preference is UTC, although in some scenarios knowing the offset can be
useful. This usually arises only in select situations such as
scheduling, where an awareness of local time can provide extra
information useful for allocating resources. But there are so many
problems with the interpretation of UTC offsets that there be dragons
there, beware. The best recommendation is to convert to UTC immediately
and preserve the offset as extra information if you believe it might be
useful. Don't use the stored offset for anything authoritative;
consider it only scheduling guidance if needed.
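
The "convert immediately, keep the offset only as a hint" recommendation
might look something like the following sketch. It assumes Python 3
(datetime.timezone), and normalize_to_utc is a hypothetical helper name,
not an existing barbican or oslo function.

```python
from datetime import datetime, timedelta, timezone


def normalize_to_utc(aware_dt):
    """Return (utc_datetime, original_offset_in_minutes).

    The offset is returned purely as informational data; the UTC
    value is the only authoritative one.
    """
    offset = aware_dt.utcoffset()
    if offset is None:
        raise ValueError("refusing a tz-naive datetime")
    utc_dt = aware_dt.astimezone(timezone.utc)
    return utc_dt, int(offset.total_seconds() // 60)


# A client-supplied value of 10:24:06 at UTC-05:00.
supplied = datetime(2013, 6, 5, 10, 24, 6,
                    tzinfo=timezone(timedelta(hours=-5)))
stored, offset_hint = normalize_to_utc(supplied)
print(stored.isoformat())  # 2013-06-05T15:24:06+00:00
print(offset_hint)         # -300
```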

I don't know if the Python module you're working with is based on
Python's native datetime or not, but I've used datetime a fair amount and
(at least in Python 2) there is a nasty problem with respect to
timezones. It appears as if datetime was originally written without
timezone support and timezone was later grafted on. This led to "tz
naive" and "tz aware" datetime objects. This was very unfortunate
because a timestamp without tz info isn't much use as a timestamp:
it's ambiguous. It's further compounded by the fact you get "tz naive"
datetime objects by default, which of course is what most people use. So
by default you get meaningless timestamps :-( It's kind of like having a
scalar value without pairing it with a "units" value. Because Python's
datetime objects are so ambiguous (either by default or by incorrect
processing of the extra tz info) the clearest, simplest, and most robust
recommendation is to require that every datetime (or time) object be in
UTC. Hopefully it will be obvious that a "tz naive" datetime object can't
distinguish between a value in UTC and one that has an implicit offset,
hence the rule "UTC always".
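
To make the ambiguity concrete, here is a small sketch. It uses Python
3's datetime.timezone for brevity (Python 2 has no built-in fixed-offset
tzinfo class, so there you would need your own tzinfo subclass or a
third-party library); the -05:00 offset is just an example value.

```python
from datetime import datetime, timedelta, timezone

# What you get by default: no tzinfo attached, i.e. "tz naive".
naive = datetime(2013, 6, 5, 15, 24, 6)
print(naive.tzinfo)  # None

# The same wall-clock fields under two equally plausible readings.
as_utc = naive.replace(tzinfo=timezone.utc)
as_minus_5 = naive.replace(tzinfo=timezone(timedelta(hours=-5)))

# The two readings name instants five hours apart, and nothing in
# the naive object says which one was intended.
gap = as_minus_5 - as_utc
print(gap)  # 5:00:00
```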

Of course there is the problem of comparing datetime values. If datetime
objects always carried a tz offset then using non-UTC values would be fine
in most cases because the values would be normalized before comparison.
But without the extra tzinfo data you can't normalize. Python's datetime
library will throw an exception if you try to compare a naive and an aware
object because normalization can't be performed. But it will happily
compare two naive objects, and unless those objects happen to share the
same tz offset you're screwed. You're really left with two choices: 1)
always use "tz aware" objects, but this is difficult because you don't
get them by default and it's non-trivial, or 2) always use UTC.

Hope this clarifies things a little.

The biggest lesson I've learned from systems with problems (and there
are many) is that if you start with a weak specification of date/time
values and/or permit exceptions, in short order you'll have a mess. The
other thing to remember is that UTC offsets are very tricky and are only
meaningful for an exact time and location (hence the Olson Database).
Try as much as possible to avoid offsets. If they are present, process
them immediately (stored offsets are a huge problem) and treat a
supplied offset as informational extra data, or discard it completely
if local scheduling is not involved.

John


