[openstack-dev] [oslo] log message translations
Doug Hellmann
doug.hellmann at dreamhost.com
Wed Jan 29 17:17:09 UTC 2014
On Wed, Jan 29, 2014 at 11:52 AM, Ben Nemec <openstack at nemebean.com> wrote:
> Okay, I think you've convinced me. Specific comments below.
>
> -Ben
>
> On 2014-01-29 07:05, Doug Hellmann wrote:
>
>
> On Tue, Jan 28, 2014 at 8:47 PM, Ben Nemec <openstack at nemebean.com> wrote:
>
>> On 2014-01-27 11:42, Doug Hellmann wrote:
>>
>> We have a blueprint open for separating translated log messages into
>> different domains so the translation team can prioritize them differently
>> (focusing on errors and warnings before debug messages, for example) [1].
>> Some concerns were raised related to the review [2], and I would like to
>> address those in this thread and see if we can reach consensus about how to
>> proceed.
>> The implementation in [2] provides a set of new marker functions similar
>> to _(), one for each log level (we have _LE, LW, _LI, _LD, etc.). These
>> would be used in conjunction with _(), and reserved for log messages.
>> Exceptions, API messages, and other user-facing messages all would still be
>> marked for translation with _() and would (I assume) receive the highest
>> priority work from the translation team.
>> When the string extraction CI job is updated, we will have one "main"
>> catalog for each app or library, and additional catalogs for the log
>> levels. Those show up in transifex separately, but will be named in a way
>> that they are obviously related. Each translation team will be able to
>> decide, based on the requirements of their users, how to set priorities for
>> translating the different catalogs.
>> Existing strings being sent to the log and marked with _() will be
>> removed from the main catalog and moved to the appropriate
>> log-level-specific catalog when their marker function is changed. My
>> understanding is that transifex is smart enough to recognize the same
>> string from more than one source, and to suggest previous translations when
>> it sees the same text. This should make it easier for the translation teams
>> to "catch up" by reusing the translations they have already done, in the
>> new catalogs.
>> One concern that was raised was the need to mark all of the log messages
>> by hand. I investigated using extraction patterns like "LOG.debug(" and
>> "LOG.info(", but because of the way the translation actually works
>> internally we cannot do that. There are a few related reasons.
>> In other applications, the function _() translates a string at the point
>> where it is invoked, and returns a new string object. OpenStack has a
>> requirement that messages be translated multiple times, whether in the API
>> or the LOG (there is already support for logging in more than one language,
>> to different log files). This requirement means we delay the translation
>> operation until right before the string is output, at which time we know
>> the target language. We could update the log functions to create Message
>> objects dynamically, except...
>> Each app or library that uses the translation code will need its own
>> "domain" for the message catalogs. We get around that right now by not
>> translating many messages from the libraries, but that's obviously not what
>> we want long term (we at least want exceptions translated). If we had a
>> special version of a logger in oslo.log that knew how to create Message
>> objects for the format strings used in logging (the first argument to
>> LOG.debug for example), it would also have to know what translation domain
>> to use so the proper catalog could be loaded. The wrapper functions defined
>> in the patch [2] include this information, and can be updated to be
>> application or library specific when oslo.log eventually becomes its own
>> library.
>> Further, as part of moving the logging code from oslo-incubator to
>> oslo.log, and making our logging something we can use from other OpenStack
>> libraries, we are trying to change the implementation of the logging code
>> so it is no longer necessary to create loggers with our special wrapper
>> function. That would mean that oslo.log will be a library for *configuring*
>> logging, but the actual log calls can be handled with Python's standard
>> library, eliminating a dependency between new libraries and oslo.log. (This
>> is a longer, and separate, discussion, but I mention it here as backround.
>> We don't want to change the API of the logger in oslo.log because we don't
>> want to be using it directly in the first place.)
>> Another concern raised was the use of a prefix _L for these functions,
>> since it ties the priority definitions to "logs." I chose that prefix as an
>> explicit indicate that these *are* just for logs. I am not associating any
>> actual priority with them. The translators want us to move the log messages
>> out of the main catalog. Having them all in separate catalogs is a
>> refinement that gives them what they want -- some translators don't care
>> about log messages at all, some only care about errors, etc. We decided
>> that the translators should set priorities, and we would make that possible
>> by separating the catalogs into logical groups. Everything marked with _()
>> will still go into the main catalog, but beyond that it isn't up to the
>> developers to indicate "priority" for translations.
>> The alternative approach of using babel translator comments would, under
>> other circumstances, help because each message could have some indication
>> of its relative importance. However, it does not meet the requirement that
>> the translators (and not the developers) set those priorities. It also
>> doesn't help the translators because the main catalog does not shrink to
>> hold only the user-facing messages. So the comments might be useful in
>> addition to this proposed change, but they doesn't solve the original
>> problem.
>> If we all agree on the approach, I think the patches already in progress
>> should be pretty easy to land in the incubator. The next step is to update
>> the CI jobs that extract the messages and interact with transifex. After
>> that, changes to the applications and existing libraries are likely to take
>> longer, and could be done in batches. They may not happen until the next
>> cycle, but I would like to have the infrastructure in place by the end of
>> this one.
>> Feedback?
>> Doug
>> [1]
>> https://blueprints.launchpad.net/oslo/+spec/log-messages-translation-domain
>> [2] https://review.openstack.org/#/c/65518/
>>
>> I guess my thoughts are still largely the same as on the original
>> review. This is already going to be an additional burden on developers and
>> reviewers (who love i18n so much already ;-) and ideally I'd prefer that we
>> be a little less granular with our designations. Something like _IMPORTANT
>> and _OPTIONAL instead of separate translation domains for each individual
>> log level. Maybe that can't get the translation load down to a manageable
>> level though. I'm kind of guessing on that point.
>>
>
> We did consider something like that at the summit, IIRC. However, we
> wanted to leave the job of setting the priority for doing the translation
> up to the translators, rather than the developers, because the priorities
> vary by language. Using designators that match the log output level lowers
> the review burden, because you don't have to think about the importance of
> translation, only whether or not the translator tag matches the log
> function.
>
>
> Hmm, hadn't thought about it that way, but it does actually make more
> work for reviewers. I guess that means I'm good with the 1:1 log
> level:translation domain mapping. :-)
>
> I wonder if we could add something into our log wrappers to check that
> Message domains match the log level in use. It wouldn't be able to catch
> everything, but maybe we could turn it on in the gate and at least verify
> anything that gets logged during those runs. Something to consider once
> we've implemented this, I guess.
>
I like that. I found a few cases in the incubator where we have something
like:
msg = _('some text: %s') % args
LOG.alevel(msg)
raise Exception(msg)
So we do have some known cases where the levels won't match. Because the
message is used in an exception, I left it wrapped in _() for now, but we
can think about whether that's the best approach once we have the basic
pieces all working.
>
>> For reference, I grepped the nova source to see how many times we're
>> logging at each of the different levels. It's a very rough estimate since
>> I'm sure I'm missing some things and there are almost certainly some dupes,
>> but I would expect it to be relatively close to reality. Here were the
>> results:
>>
>> [fedora at openstack nova]$ grep -ri log.error | wc -l
>> 190
>> [fedora at openstack nova]$ grep -ri log.warn | wc -l
>> 286
>> [fedora at openstack nova]$ grep -ri log.info | wc -l
>> 254
>> [fedora at openstack nova]$ grep -ri log.debug | wc -l
>> 849
>>
>> It seems like debug is the low-hanging fruit here - getting rid of that
>> eliminates more translations than the rest of the log levels combined
>> (since it looks like Nova is translating the vast majority of their debug
>> messages). I don't know if that's helpful (enough) though.
>>
> I'm not sure either. Daisy, would it solve your team's needs if we just
> removed translation markers from debug log messages and left everything in
> the same catalog? It's not what we talked about at the summit, but maybe
> it's an alternative?
>
>
> A lot of my motivation for getting these numbers was finding a
> "simpler" way to break down translation domains, but since I seem to have
> changed my mind on that I'm not as hung up on this. If we can accomplish
> what we need by dropping debug translations that would be great, but since
> those numbers don't include non-log translations I'm guessing it won't be
> enough. Still interested to hear from Daisy though.
>
Yes, I'd like to make sure our plans meet their needs. We could take both
approaches -- not translate debug messages at all, and mark the others
based on their level.
>
>> I suppose my biggest concern is getting reviewers to buy in to whatever
>> we do. It's going to be some additional workload for them since we likely
>> can't enforce this through a hacking rule, and some people basically refuse
>> to touch anything to do with translation as it is. It's also one more
>> hurdle for new contributors since it's a non-standard way of handling
>> translation. And, as I noted on the review, it's almost certainly going to
>> get out of sync over time as people adjust log message priorities and
>> such. Maybe those are all issues we just have to accept, but they are
>> issues.
>>
> I expect we'll need to set some project-wide standards, as Sean is doing
> with the meanings of the various log levels.
>
>>
>> Oh, one other thing I wanted to ask about was what the status of
>> Transifex is as far as OpenStack is concerned. My understanding was that
>> we were looking for alternatives because Transifex had pretty much
>> abandoned their open source version. Does that have any impact on this?
>>
> If we replace it, we will replace it with another tool. The file formats
> are standardized, so I wouldn't expect a tool change at that level to
> affect our decision on this question.
>
>
> Fair enough. Handling this gracefully would just become a requirement
> on any new tool we adopted.
>
Agreed.
Doug
>
> Doug
>
>>
>> Anyway, it's getting late and my driveway won't shovel itself, so those
>> are my slightly rambling thoughts on this. :-)
>>
>> -Ben
>>
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20140129/1d25a305/attachment.html>
More information about the OpenStack-dev
mailing list