<html><body>
<p><tt><font size="2">Ben Nemec <openstack@nemebean.com> wrote on 2014/01/30 00:52:14:<br>
<br>
> Ben Nemec <openstack@nemebean.com> </font></tt><br>
<tt><font size="2">> 2014/01/30 00:52</font></tt><br>
<tt><font size="2">> <br>
> Please respond to<br>
> openstack@nemebean.com</font></tt><br>
<tt><font size="2">> <br>
> To</font></tt><br>
<tt><font size="2">> <br>
> Doug Hellmann <doug.hellmann@dreamhost.com>, </font></tt><br>
<tt><font size="2">> <br>
> cc</font></tt><br>
<tt><font size="2">> <br>
> "OpenStack Development Mailing List (not for usage questions)" <br>
> <openstack-dev@lists.openstack.org>, Ying Chun Guo/China/IBM@IBMCN</font></tt><br>
<tt><font size="2">> <br>
> Subject</font></tt><br>
<tt><font size="2">> <br>
> Re: [openstack-dev] [oslo] log message translations</font></tt><br>
<tt><font size="2">> <br>
> Okay, I think you've convinced me. Specific comments below.</font></tt><br>
<tt><font size="2">> -Ben</font></tt><br>
<tt><font size="2">> On 2014-01-29 07:05, Doug Hellmann wrote:</font></tt><br>
<tt><font size="2">> <br>
> On Tue, Jan 28, 2014 at 8:47 PM, Ben Nemec <openstack@nemebean.com> wrote:</font></tt><br>
<tt><font size="2">> On 2014-01-27 11:42, Doug Hellmann wrote:</font></tt><br>
<tt><font size="2">> We have a blueprint open for separating translated log messages into<br>
> different domains so the translation team can prioritize them <br>
> differently (focusing on errors and warnings before debug messages, <br>
> for example) [1]. Some concerns were raised related to the review <br>
> [2], and I would like to address those in this thread and see if we <br>
> can reach consensus about how to proceed.</font></tt><br>
<tt><font size="2">> The implementation in [2] provides a set of new marker functions <br>
> similar to _(), one for each log level (we have _LE, _LW, _LI, _LD, <br>
> etc.). These would be used in conjunction with _(), and reserved for<br>
> log messages. Exceptions, API messages, and other user-facing <br>
> messages all would still be marked for translation with _() and <br>
> would (I assume) receive the highest priority work from the translation team.</font></tt><br>
<tt><font size="2">> When the string extraction CI job is updated, we will have one <br>
> "main" catalog for each app or library, and additional catalogs for <br>
> the log levels. Those show up in transifex separately, but will be <br>
> named in a way that they are obviously related. Each translation <br>
> team will be able to decide, based on the requirements of their <br>
> users, how to set priorities for translating the different catalogs.</font></tt><br>
<tt><font size="2">> Existing strings being sent to the log and marked with _() will be <br>
> removed from the main catalog and moved to the appropriate <br>
> log-level-specific catalog when their marker function is changed. My <br>
> understanding is that transifex is smart enough to recognize the <br>
> same string from more than one source, and to suggest previous <br>
> translations when it sees the same text. This should make it easier <br>
> for the translation teams to "catch up" by reusing the translations <br>
> they have already done, in the new catalogs.</font></tt><br>
<tt><font size="2">> One concern that was raised was the need to mark all of the log <br>
> messages by hand. I investigated using extraction patterns like <br>
> "LOG.debug(" and "LOG.info(", but because of the way the translation<br>
> actually works internally we cannot do that. There are a few related reasons.</font></tt><br>
<tt><font size="2">> In other applications, the function _() translates a string at the <br>
> point where it is invoked, and returns a new string object. <br>
> OpenStack has a requirement that messages be translated multiple <br>
> times, whether in the API or the LOG (there is already support for <br>
> logging in more than one language, to different log files). This <br>
> requirement means we delay the translation operation until right <br>
> before the string is output, at which time we know the target <br>
> language. We could update the log functions to create Message <br>
> objects dynamically, except...</font></tt><br>
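<tt><font size="2">The delayed translation described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual Message class: the class here is a plain str subclass that remembers its msgid and domain, and the domain name "myapp" is hypothetical.</font></tt><br>

```python
import gettext


class Message(str):
    """A string that keeps its msgid and domain so it can be translated later."""

    def __new__(cls, msgid, domain):
        obj = super().__new__(cls, msgid)
        obj.msgid = msgid
        obj.domain = domain
        return obj

    def translate(self, language):
        # The catalog lookup happens only here, once the target language
        # is known. fallback=True returns the msgid if no catalog exists.
        catalog = gettext.translation(
            self.domain, languages=[language], fallback=True)
        return catalog.gettext(self.msgid)


msg = Message("Instance not found", "myapp")  # hypothetical domain
# The same object can be rendered for several targets, for example an
# API response in one language and a log file in another.
for lang in ("de", "ja"):
    print(lang, msg.translate(lang))
```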
<tt><font size="2">> Each app or library that uses the translation code will need its own<br>
> "domain" for the message catalogs. We get around that right now by <br>
> not translating many messages from the libraries, but that's <br>
> obviously not what we want long term (we at least want exceptions <br>
> translated). If we had a special version of a logger in oslo.log <br>
> that knew how to create Message objects for the format strings used <br>
> in logging (the first argument to LOG.debug for example), it would <br>
> also have to know what translation domain to use so the proper <br>
> catalog could be loaded. The wrapper functions defined in the patch <br>
> [2] include this information, and can be updated to be application <br>
> or library specific when oslo.log eventually becomes its own library.</font></tt><br>
<tt><font size="2">> Further, as part of moving the logging code from oslo-incubator to <br>
> oslo.log, and making our logging something we can use from other <br>
> OpenStack libraries, we are trying to change the implementation of <br>
> the logging code so it is no longer necessary to create loggers with<br>
> our special wrapper function. That would mean that oslo.log will be <br>
> a library for *configuring* logging, but the actual log calls can be<br>
> handled with Python's standard library, eliminating a dependency <br>
> between new libraries and oslo.log. (This is a longer, and separate,<br>
> discussion, but I mention it here as background. We don't want to <br>
> change the API of the logger in oslo.log because we don't want to be<br>
> using it directly in the first place.)</font></tt><br>
<tt><font size="2">> Another concern raised was the use of a prefix _L for these <br>
> functions, since it ties the priority definitions to "logs." I chose<br>
> that prefix as an explicit indication that these *are* just for logs. <br>
> I am not associating any actual priority with them. The translators <br>
> want us to move the log messages out of the main catalog. Having <br>
> them all in separate catalogs is a refinement that gives them what <br>
> they want -- some translators don't care about log messages at all, <br>
> some only care about errors, etc. We decided that the translators <br>
> should set priorities, and we would make that possible by separating<br>
> the catalogs into logical groups. Everything marked with _() will <br>
> still go into the main catalog, but beyond that it isn't up to the <br>
> developers to indicate "priority" for translations.</font></tt><br>
<tt><font size="2">> The alternative approach of using babel translator comments would, <br>
> under other circumstances, help because each message could have some<br>
> indication of its relative importance. However, it does not meet the<br>
> requirement that the translators (and not the developers) set those <br>
> priorities. It also doesn't help the translators because the main <br>
> catalog does not shrink to hold only the user-facing messages. So <br>
> the comments might be useful in addition to this proposed change, <br>
> but they don't solve the original problem.</font></tt><br>
<tt><font size="2">> If we all agree on the approach, I think the patches already in <br>
> progress should be pretty easy to land in the incubator. The next <br>
> step is to update the CI jobs that extract the messages and interact<br>
> with transifex. After that, changes to the applications and existing<br>
> libraries are likely to take longer, and could be done in batches. <br>
> They may not happen until the next cycle, but I would like to have <br>
> the infrastructure in place by the end of this one.</font></tt><br>
<tt><font size="2">> Feedback?</font></tt><br>
<tt><font size="2">> Doug</font></tt><br>
<tt><font size="2">> [1] <a href="https://blueprints.launchpad.net/oslo/+spec/log-messages-translation-domain">https://blueprints.launchpad.net/oslo/+spec/log-messages-translation-domain</a></font></tt><br>
<tt><font size="2">> [2] <a href="https://review.openstack.org/#/c/65518/">https://review.openstack.org/#/c/65518/</a></font></tt><br>
<tt><font size="2">> I guess my thoughts are still largely the same as on the original <br>
> review. This is already going to be an additional burden on <br>
> developers and reviewers (who love i18n so much already ;-) and <br>
> ideally I'd prefer that we be a little less granular with our <br>
> designations. Something like _IMPORTANT and _OPTIONAL instead of <br>
> separate translation domains for each individual log level. Maybe <br>
> that can't get the translation load down to a manageable level <br>
> though. I'm kind of guessing on that point.</font></tt><br>
<tt><font size="2">> </font></tt><br>
<tt><font size="2">> We did consider something like that at the summit, IIRC. However, we<br>
> wanted to leave the job of setting the priority for doing the <br>
> translation up to the translators, rather than the developers, <br>
> because the priorities vary by language. Using designators that <br>
> match the log output level lowers the review burden, because you <br>
> don't have to think about the importance of translation, only <br>
> whether or not the translator tag matches the log function. </font></tt><br>
<tt><font size="2">> </font></tt><br>
<tt><font size="2">> Hmm, hadn't thought about it that way, but it does actually make <br>
> more work for reviewers. I guess that means I'm good with the 1:1 <br>
> log level:translation domain mapping. :-)</font></tt><br>
<tt><font size="2">> </font></tt><br>
<tt><font size="2">> I wonder if we could add something into our log wrappers to check <br>
> that Message domains match the log level in use. It wouldn't be <br>
> able to catch everything, but maybe we could turn it on in the gate <br>
> and at least verify anything that gets logged during those runs. <br>
> Something to consider once we've implemented this, I guess.</font></tt><br>
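<tt><font size="2">One way the check suggested above might look, as a rough sketch only. It assumes messages carry a "domain" attribute and that catalogs use "-log-error" style suffixes; both are assumptions for illustration. It uses a standard logging.Filter rather than any existing log wrapper.</font></tt><br>

```python
import logging


class Message(str):
    """Stand-in for a lazily translated message that records its domain."""

    def __new__(cls, msgid, domain):
        obj = super().__new__(cls, msgid)
        obj.domain = domain
        return obj


# Hypothetical mapping from log level to the expected catalog suffix.
EXPECTED_SUFFIX = {
    logging.ERROR: "-log-error",
    logging.WARNING: "-log-warning",
    logging.INFO: "-log-info",
    logging.DEBUG: "-log-debug",
}


class DomainCheckFilter(logging.Filter):
    """Record every message whose domain does not match its log level."""

    def __init__(self):
        super().__init__()
        self.mismatches = []

    def filter(self, record):
        domain = getattr(record.msg, "domain", None)
        suffix = EXPECTED_SUFFIX.get(record.levelno)
        if domain is not None and suffix is not None:
            if not domain.endswith(suffix):
                self.mismatches.append(
                    (record.levelname, domain, str(record.msg)))
        return True  # never drop the record, only note the mismatch
```

<tt><font size="2">A gate job could attach such a filter to the loggers under test and fail the run if the mismatch list is non-empty. As noted, this only covers messages that are actually logged during the run.</font></tt><br>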
<tt><font size="2">> <br>
> For reference, I grepped the nova source to see how many times we're<br>
> logging at each of the different levels. It's a very rough estimate<br>
> since I'm sure I'm missing some things and there are almost <br>
> certainly some dupes, but I would expect it to be relatively close <br>
> to reality. Here were the results:<br>
> <br>
> [fedora@openstack nova]$ grep -ri log.error | wc -l<br>
> 190<br>
> [fedora@openstack nova]$ grep -ri log.warn | wc -l<br>
> 286<br>
> [fedora@openstack nova]$ grep -ri log.info | wc -l<br>
> 254<br>
> [fedora@openstack nova]$ grep -ri log.debug | wc -l<br>
> 849<br>
> <br>
> It seems like debug is the low-hanging fruit here - getting rid of <br>
> that eliminates more translations than the rest of the log levels <br>
> combined (since it looks like Nova is translating the vast majority <br>
> of their debug messages). I don't know if that's helpful (enough) though.</font></tt><br>
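<tt><font size="2">The comparison above can be checked directly from the grep counts. The numbers are the ones quoted in the message; nothing else is assumed.</font></tt><br>

```python
# Rough per-level counts from the grep output quoted above.
counts = {"error": 190, "warn": 286, "info": 254, "debug": 849}

others = sum(n for level, n in counts.items() if level != "debug")
total = sum(counts.values())

print("non-debug calls:", others)
print("debug calls:", counts["debug"])
print("debug share of all log calls: %.1f%%" % (100.0 * counts["debug"] / total))
```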
<tt><font size="2">> I'm not sure either. Daisy, would it solve your team's needs if we <br>
> just removed translation markers from debug log messages and left <br>
> everything in the same catalog? It's not what we talked about at the<br>
> summit, but maybe it's an alternative?</font></tt><br>
<tt><font size="2">> </font></tt><br>
<tt><font size="2">> A lot of my motivation for getting these numbers was finding a <br>
> "simpler" way to break down translation domains, but since I seem to<br>
> have changed my mind on that I'm not as hung up on this. If we can <br>
> accomplish what we need by dropping debug translations that would be<br>
> great, but since those numbers don't include non-log translations <br>
> I'm guessing it won't be enough. Still interested to hear from Daisy though.</font></tt><br>
<tt><font size="2">> </font></tt><br>
<br>
<tt><font size="2">In my mind, one important objective of separating log messages is to separate</font></tt><br>
<tt><font size="2">user-facing messages (API response messages, CLI messages) from log messages.</font></tt><br>
<tt><font size="2">Translation of user-facing messages is a "must have" in OpenStack localization.</font></tt><br>
<tt><font size="2">Translation of log messages is a "good to have". Removing translation markers from </font></tt><br>
<tt><font size="2">debug log messages and leaving everything in the same catalog cannot satisfy </font></tt><br>
<tt><font size="2">this objective. </font></tt><br>
<br>
<tt><font size="2">A basic feature would be to remove translation markers from debug log messages, </font></tt><br>
<tt><font size="2">put the other log messages (ERROR, WARNING, INFO, AUDIT...) in one catalog, and </font></tt><br>
<tt><font size="2">leave the user-facing messages (API response messages, CLI messages) in another </font></tt><br>
<tt><font size="2">catalog.</font></tt><br>
<br>
<tt><font size="2">Separating messages of different log levels into different catalogs is an advanced</font></tt><br>
<tt><font size="2">feature. It allows translators to set different priorities for different log levels.</font></tt><br>
<tt><font size="2">It could also easily allow users to customize some levels of log messages.</font></tt><br>
<br>
<tt><font size="2">For example, error messages are very important but warning and info messages are not.</font></tt><br>
<tt><font size="2">If a company wants to localize OpenStack but time is limited, they may want to focus </font></tt><br>
<tt><font size="2">on error messages. They can translate the error messages themselves (or check them carefully)</font></tt><br>
<tt><font size="2">and use the community translations for warning and info messages.</font></tt><br>
<br>
<tt><font size="2">If implementing both the basic feature and the advanced one is not too much work,</font></tt><br>
<tt><font size="2">I would like to implement them together.</font></tt><br>
<br>
<tt><font size="2">Regards</font></tt><br>
<tt><font size="2">Daisy</font></tt><br>
<br>
<tt><font size="2">> I suppose my biggest concern is getting reviewers to buy in to <br>
> whatever we do. It's going to be some additional workload for them <br>
> since we likely can't enforce this through a hacking rule, and some <br>
> people basically refuse to touch anything to do with translation as <br>
> it is. It's also one more hurdle for new contributors since it's a <br>
> non-standard way of handling translation. And, as I noted on the <br>
> review, it's almost certainly going to get out of sync over time as <br>
> people adjust log message priorities and such. Maybe those are all <br>
> issues we just have to accept, but they are issues.</font></tt><br>
<tt><font size="2">> I expect we'll need to set some project-wide standards, as Sean is <br>
> doing with the meanings of the various log levels.</font></tt><br>
<tt><font size="2">> <br>
> Oh, one other thing I wanted to ask about was what the status of <br>
> Transifex is as far as OpenStack is concerned. My understanding was<br>
> that we were looking for alternatives because Transifex had pretty <br>
> much abandoned their open source version. Does that have any impact on this?</font></tt><br>
<tt><font size="2">> If we replace it, we will replace it with another tool. The file <br>
> formats are standardized, so I wouldn't expect a tool change at that<br>
> level to affect our decision on this question.</font></tt><br>
<tt><font size="2">> </font></tt><br>
<tt><font size="2">> Fair enough. Handling this gracefully would just become a <br>
> requirement on any new tool we adopted.</font></tt><br>
<tt><font size="2">> </font></tt><br>
<tt><font size="2">> Doug</font></tt><br>
<tt><font size="2">> <br>
> Anyway, it's getting late and my driveway won't shovel itself, so <br>
> those are my slightly rambling thoughts on this. :-)<br>
> <br>
> -Ben</font></tt><br>
<tt><font size="2">> </font></tt></body></html>