PSR-3 logging has been fully supported in MediaWiki since1.25wmf5. We've been making various tuning improvements since then including a recent deprecation of the initial MWLogger wrapper class in favor of direct usage of Psr\Log\LoggerInterface by wfDebugLog() and other internal wrapper methods [2].
The next big step in introducing structured logging to MediaWiki is to begin to replace calls to wfDebugLog() (and dear god wfDebug()) with direct usage of Psr\Log\LoggerInterface instances. Kunal and I have a couple starter patches that show this in gerrit [3]. In both conversions we have chosen to implement Psr\Log\LoggerAwareInterface in the effected classes to allow setter based injection of a LoggerInterface. The log channel names were also chosen to match the previous wfDebugLog logGroup values. When you use a PSR-3 logger you need to choose a severity for each log message. PSR-3 has quite a few possible levels, but I'd like to propose that we really only need 4 to start with in MediaWiki:
* debug: Useful for ummm.... debugging. :) These are messages that are useful for local development and are generally too "spammy" to output on a production wiki. This would typically include anything currently being logged via wfDebug. * info: Valuable state change information. This level is a great place to record information that would be useful in a production environment when tracing the path of a request that eventually had an error. * warning: A soft error condition such as a recoverable error or another condition that typically should not be seen but isn't halting for the operation in process * error: A hard error such as a caught exception with no recovery path.
The notice, critical, alert and emergency log levels seem unnecessary to me, but I'm willing to hear arguments about where they are super duper useful for some log event state that I haven't thought of yet.
When thinking about Wikimedia cluster logging, events emitted at warning and error levels should be things that you want deployers and operations staff to see in the Logstash "fatalmonitor" view and recorded to other durable logging stores. Events at info level may or may not be captured similar to how we currently enable some but not all wfDebugLog channels. Events at debug level will probably only be captured in beta labs and similar low volume debugging environments.
The wfDebug* methods are not being deprecated officially yet but it would be great if people started treating them like they were deprecated when writing new classes. It would be even more awesome if more folks started making small cleanup patches to convert existing classes to the new style of logging. Tagging gerrit reviews for these patches with "PSR-3" as either a submitter or a reviewer would also be appreciated.
[0]: https://gerrit.wikimedia.org/r/#/c/119940/ [1]: https://gerrit.wikimedia.org/r/#/c/185210/ [2]: https://gerrit.wikimedia.org/r/#/c/184830/ [3]: https://gerrit.wikimedia.org/r/#/q/project:mediawiki/core+branch:master+topi...
Bryan
Huge thanks for this -- logging with wfDebug() and friends has always been a little primitive, and this is both more flexible and lets us remove some of our weird custom code. :)
-- brion
On Thu, Jan 29, 2015 at 11:45 PM, Bryan Davis bd808@wikimedia.org wrote:
PSR-3 logging has been fully supported in MediaWiki since1.25wmf5. We've been making various tuning improvements since then including a recent deprecation of the initial MWLogger wrapper class in favor of direct usage of Psr\Log\LoggerInterface by wfDebugLog() and other internal wrapper methods [2].
The next big step in introducing structured logging to MediaWiki is to begin to replace calls to wfDebugLog() (and dear god wfDebug()) with direct usage of Psr\Log\LoggerInterface instances. Kunal and I have a couple starter patches that show this in gerrit [3]. In both conversions we have chosen to implement Psr\Log\LoggerAwareInterface in the effected classes to allow setter based injection of a LoggerInterface. The log channel names were also chosen to match the previous wfDebugLog logGroup values. When you use a PSR-3 logger you need to choose a severity for each log message. PSR-3 has quite a few possible levels, but I'd like to propose that we really only need 4 to start with in MediaWiki:
- debug: Useful for ummm.... debugging. :) These are messages that are
useful for local development and are generally too "spammy" to output on a production wiki. This would typically include anything currently being logged via wfDebug.
- info: Valuable state change information. This level is a great place
to record information that would be useful in a production environment when tracing the path of a request that eventually had an error.
- warning: A soft error condition such as a recoverable error or
another condition that typically should not be seen but isn't halting for the operation in process
- error: A hard error such as a caught exception with no recovery path.
The notice, critical, alert and emergency log levels seem unnecessary to me, but I'm willing to hear arguments about where they are super duper useful for some log event state that I haven't thought of yet.
When thinking about Wikimedia cluster logging, events emitted at warning and error levels should be things that you want deployers and operations staff to see in the Logstash "fatalmonitor" view and recorded to other durable logging stores. Events at info level may or may not be captured similar to how we currently enable some but not all wfDebugLog channels. Events at debug level will probably only be captured in beta labs and similar low volume debugging environments.
The wfDebug* methods are not being deprecated officially yet but it would be great if people started treating them like they were deprecated when writing new classes. It would be even more awesome if more folks started making small cleanup patches to convert existing classes to the new style of logging. Tagging gerrit reviews for these patches with "PSR-3" as either a submitter or a reviewer would also be appreciated.
Bryan
Bryan Davis Wikimedia Foundation bd808@wikimedia.org [[m:User:BDavis_(WMF)]] Sr Software Engineer Boise, ID USA irc: bd808 v:415.839.6885 x6855
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
This is great bryan, thanks for moving this all forward. One exciting part of this logging update is that the log methods also take a context array which (i believe) is imported to kibana and searchable via elasticsearch. Has any effort been put into best practices around what should be in these context arrays?
In my experiences at a $job-1 being able to find related logs via these context values was incredibly useful. We implemented both a global logging context which was used for all logs of the same request including a hash that uniquely identifies the request, the username if logged in, and the route(we don't have a direct equiv to route, Title in some cases). Area specific debugging would add their own info. For example if failing to connect to a remote server it would be included in the logging context so we could easily find all errors connecting to the same machine.
Erik B.
On Fri, Jan 30, 2015 at 11:07 AM, Brion Vibber bvibber@wikimedia.org wrote:
Huge thanks for this -- logging with wfDebug() and friends has always been a little primitive, and this is both more flexible and lets us remove some of our weird custom code. :)
-- brion
On Thu, Jan 29, 2015 at 11:45 PM, Bryan Davis bd808@wikimedia.org wrote:
PSR-3 logging has been fully supported in MediaWiki since1.25wmf5. We've been making various tuning improvements since then including a recent deprecation of the initial MWLogger wrapper class in favor of direct usage of Psr\Log\LoggerInterface by wfDebugLog() and other internal wrapper methods [2].
The next big step in introducing structured logging to MediaWiki is to begin to replace calls to wfDebugLog() (and dear god wfDebug()) with direct usage of Psr\Log\LoggerInterface instances. Kunal and I have a couple starter patches that show this in gerrit [3]. In both conversions we have chosen to implement Psr\Log\LoggerAwareInterface in the effected classes to allow setter based injection of a LoggerInterface. The log channel names were also chosen to match the previous wfDebugLog logGroup values. When you use a PSR-3 logger you need to choose a severity for each log message. PSR-3 has quite a few possible levels, but I'd like to propose that we really only need 4 to start with in MediaWiki:
- debug: Useful for ummm.... debugging. :) These are messages that are
useful for local development and are generally too "spammy" to output on a production wiki. This would typically include anything currently being logged via wfDebug.
- info: Valuable state change information. This level is a great place
to record information that would be useful in a production environment when tracing the path of a request that eventually had an error.
- warning: A soft error condition such as a recoverable error or
another condition that typically should not be seen but isn't halting for the operation in process
- error: A hard error such as a caught exception with no recovery path.
The notice, critical, alert and emergency log levels seem unnecessary to me, but I'm willing to hear arguments about where they are super duper useful for some log event state that I haven't thought of yet.
When thinking about Wikimedia cluster logging, events emitted at warning and error levels should be things that you want deployers and operations staff to see in the Logstash "fatalmonitor" view and recorded to other durable logging stores. Events at info level may or may not be captured similar to how we currently enable some but not all wfDebugLog channels. Events at debug level will probably only be captured in beta labs and similar low volume debugging environments.
The wfDebug* methods are not being deprecated officially yet but it would be great if people started treating them like they were deprecated when writing new classes. It would be even more awesome if more folks started making small cleanup patches to convert existing classes to the new style of logging. Tagging gerrit reviews for these patches with "PSR-3" as either a submitter or a reviewer would also be appreciated.
[3]:
https://gerrit.wikimedia.org/r/#/q/project:mediawiki/core+branch:master+topi...
Bryan
Bryan Davis Wikimedia Foundation bd808@wikimedia.org [[m:User:BDavis_(WMF)]] Sr Software Engineer Boise, ID USA irc: bd808 v:415.839.6885 x6855
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Fri, Jan 30, 2015 at 11:44 AM, Erik Bernhardson ebernhardson@wikimedia.org wrote:
This is great bryan, thanks for moving this all forward. One exciting part of this logging update is that the log methods also take a context array which (i believe) is imported to kibana and searchable via elasticsearch. Has any effort been put into best practices around what should be in these context arrays?
In my experiences at a $job-1 being able to find related logs via these context values was incredibly useful. We implemented both a global logging context which was used for all logs of the same request including a hash that uniquely identifies the request, the username if logged in, and the route(we don't have a direct equiv to route, Title in some cases). Area specific debugging would add their own info. For example if failing to connect to a remote server it would be included in the logging context so we could easily find all errors connecting to the same machine.
The context data provided with a log message will indeed be passed on to Logstash on the WMF cluster and eventually end up as viewable/searchable in the Kibana frontend. We are using Monolog as the PSR-3 implementation library on the cluster [0]. In the current configuration we are adding several things to all log messages using Monolog's "Processor" classes:
* host => wfHostname() * wiki => wfWikiID() * process_id => getmypid() * uid => random 7 digit hex integer that correlates all log events for a given request * url => $_SERVER['REQUEST_URI'] * ip => $_SERVER['REMOTE_ADDR'] * http_method' => $_SERVER['REQUEST_METHOD'] * server => $_SERVER['SERVER_NAME'] * referrer => $_SERVER['HTTP_REFERER']
It would be great to come up with standard names for other bits of information that might commonly be attached to a log message. 'exception' is a pretty easy one to pick for adding an Exception class to a log event. Monolog will automagically expand any Exception in the log event's context to a nice structure similar to our exception-json log event stream. 'method' and 'class' might be reasonable standard keys to use as well.
[0]: http://noc.wikimedia.org/conf/highlight.php?file=logging.php
Bryan
Thanks Bryan, it is great to see this happening!
On Thu, Jan 29, 2015 at 11:45 PM, Bryan Davis bd808@wikimedia.org wrote:
PSR-3 logging has been fully supported in MediaWiki since1.25wmf5. When thinking about Wikimedia cluster logging, events emitted at warning and error levels should be things that you want deployers and operations staff to see in the Logstash "fatalmonitor" view and recorded to other durable logging stores. Events at info level may or may not be captured similar to how we currently enable some but not all wfDebugLog channels. Events at debug level will probably only be captured in beta labs and similar low volume debugging environments.
Monolog has a logger called FingersCrossed which buffers info level events and logs them together with the next error (or discards them at the end of request if there is no error). Might be useful for debugging, although it would increase disk space and memory use.
The wfDebug* methods are not being deprecated officially yet but it would be great if people started treating them like they were deprecated when writing new classes.
So why not actually deprecate them?
On Fri, Jan 30, 2015 at 12:00 PM, Gergo Tisza gtisza@wikimedia.org wrote:
On Thu, Jan 29, 2015 at 11:45 PM, Bryan Davis bd808@wikimedia.org
wrote:
The wfDebug* methods are not being deprecated officially yet but it would be great if people started treating them like they were deprecated when writing new classes.
So why not actually deprecate them?
As Bryan said I think we should, for new classes in core. They're unlikely to need backporting. https://www.mediawiki.org/wiki/Manual:How_to_debug#Logging seems to be the place to deprecate, plus the various $wg pages.
For extensions, our compatibility guidelines [1] say both * "trunk extensions only have to support current trunk MediaWiki" * and "don't break support for old MediaWiki versions unless the compatibility code is causing actual quantifiable problems." So an extension like Flow which depends on fixes to core in 1.25 might as well switch. Whatever extension developers decide, the Extension: page must express the reality of the MW version(s) it requires to work.
For sample code, it's tricky. Should mw.org have code that shows developers how to write extensions that run on latest git or on "Download MediaWiki 1.24.1", or both? (And even for the legacy LTS releases 1.23.8 and 1.19.23.)
[1] https://www.mediawiki.org/wiki/Backward_compatibility#Trunk_extensions.27_co... [2] https://www.mediawiki.org/wiki/Extension_registration
wikitech-l@lists.wikimedia.org