On Tue, Jul 28, 2020 at 11:27 AM Roy Smith <roy(a)panix.com> wrote:
Is there any specific guidance for what's appropriate or not appropriate for my tool
to be logging?
Currently I would say that
https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use#Wh…
are the official rules.
I'm using OAUTH, so my tool knows the identity of
the user. I assume I don't want to be logging that. Or, is it OK, as long as the
logs stay within toolforge?
The safest thing to assume is that any information your application
collects or logs inside Toolforge will become public to all other
Toolforge members. There are many things you can do to reduce the risk
of data exposure, including restricting file permissions and keeping data
retention as short as possible. The shared-access nature of Toolforge servers
(like the bastions) means that there is always a risk of a zero-day
local privilege escalation exploit that could expose any and all data
stored there.
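
As a rough illustration of the two mitigations mentioned above (file
permissions and short retention), here is a minimal sketch that assumes a
Python tool using only the standard library. The function name
make_private_logger, the logger name "tool.private", and the 7-day default
are hypothetical choices for the example, not Toolforge requirements. It
lowers the risk of accidental exposure but does not protect against the
privilege escalation scenario described above.

```python
import logging
import os
from logging.handlers import TimedRotatingFileHandler


def make_private_logger(log_path, keep_days=7):
    """Return a logger writing to an owner-only file with limited retention.

    Hypothetical helper: the name and defaults are illustrative assumptions.
    """
    # Make every file this process creates from now on owner-only, so the
    # files created at rotation time are not group- or world-readable.
    os.umask(0o077)

    # Create the log file up front and tighten permissions in case it
    # already existed with a looser mode.
    fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    os.close(fd)
    os.chmod(log_path, 0o600)

    # Rotate at midnight and keep only `keep_days` old files; older rotated
    # files are deleted by the handler when it rolls over.
    handler = TimedRotatingFileHandler(
        log_path, when="midnight", backupCount=keep_days
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )

    logger = logging.getLogger("tool.private")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Because the umask is process-wide, this is best called once at startup,
before the tool creates any other files.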
For debugging purposes, I'd like to log some kind
of unique session and/or request id, which I'd expose to the user via the tool's
UI and ask that they report those as part of any bug reports. Are there any issues with
that?
That seems reasonable. I think I would personally avoid using an
actual session id value and instead use some request id system that is
only useful to correlate the log events of a single request. You did
not mention a specific language that you are using, but I imagine
things similar to <https://pypi.org/project/django-log-request-id/>
and <https://pypi.org/project/Flask-Log-Request-ID/> exist for many
languages/frameworks these days. Try searching for things related to
"distributed tracing" if you need to find a helper library or
tutorial.
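
For a sense of what such a request id system does, here is a minimal
framework-independent sketch in Python using only the standard library.
It is not how the two packages above are implemented. The names
RequestIdFilter, new_request_id, build_logger, and "tool.requests" are
hypothetical. The id is a random value generated per request, so it
correlates log lines without containing the session id or the OAuth
identity of the user.

```python
import contextvars
import logging
import uuid

# Holds the id of the request currently being handled; "-" outside a request.
_request_id = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request id to every log record."""

    def filter(self, record):
        record.request_id = _request_id.get()
        return True


def new_request_id():
    """Generate a random id for one request and make it the current one."""
    rid = uuid.uuid4().hex  # random; not derived from the session or the user
    _request_id.set(rid)
    return rid


def build_logger(stream=None):
    """Return a logger whose lines are prefixed with the request id."""
    handler = logging.StreamHandler(stream)  # defaults to stderr
    handler.addFilter(RequestIdFilter())
    handler.setFormatter(
        logging.Formatter("%(asctime)s [%(request_id)s] %(message)s")
    )
    logger = logging.getLogger("tool.requests")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger
```

A request handler would call new_request_id() once at the start of each
request, log as usual, and show the returned value on the error page so
the user can quote it in a bug report.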
Bryan
--
Bryan Davis Technical Engagement Wikimedia Foundation
Principal Software Engineer Boise, ID USA
[[m:User:BDavis_(WMF)]] irc: bd808