On Tue, Jul 28, 2020 at 11:27 AM Roy Smith roy@panix.com wrote:
Is there any specific guidance for what's appropriate or not appropriate for my tool to be logging?
Currently I would say that https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use#Wha... is the official set of rules.
I'm using OAUTH, so my tool knows the identity of the user. I assume I don't want to be logging that. Or, is it OK, as long as the logs stay within toolforge?
The safest thing to assume is that any information your application collects or logs inside Toolforge will become public to all other Toolforge members. There are many things you can do to reduce the risk of data exposure, including restricting file permissions and keeping data retention as short as possible. The shared access nature of Toolforge servers (like the bastions) means that there is always a risk of a zero-day local privilege escalation exploit that could expose any and all data stored there.
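As a rough illustration of the two mitigations mentioned above (file permissions and short retention), here is a minimal Python sketch. The function names, the "*.log" naming pattern, and the 7-day default are all hypothetical choices for illustration, not anything Toolforge prescribes, and none of this removes the privilege escalation risk described above.

```python
import os
import time
from pathlib import Path


def open_private_log(path):
    """Open (or create) a log file that only its owner can read or write."""
    # Mode 0o600 applies when the file is created by this call.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    # Tighten the mode as well in case the file already existed with looser bits.
    os.chmod(path, 0o600)
    return os.fdopen(fd, "a", encoding="utf-8")


def prune_old_logs(directory, max_age_days=7, now=None):
    """Delete *.log files last modified more than max_age_days ago.

    Returns the list of removed paths. The 7-day default is an
    assumption; pick the shortest window that still lets you debug.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for p in Path(directory).glob("*.log"):
        if p.is_file() and p.stat().st_mtime < cutoff:
            p.unlink()
            removed.append(p)
    return removed
```

Something like prune_old_logs() could be run from a scheduled job so that old entries do not accumulate.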
For debugging purposes, I'd like to log some kind of unique session and/or request id, which I'd expose to the user via the tool's U/I and ask that they report those as part of any bug reports. Are there any issues with that?
That seems reasonable. I think I would personally avoid using an actual session id value and instead use some request id system that is only useful to correlate the log events of a single request. You did not mention a specific language that you are using, but I imagine things similar to https://pypi.org/project/django-log-request-id/ and https://pypi.org/project/Flask-Log-Request-ID/ exist for many languages/frameworks these days. Try searching for things related to "distributed tracing" if you need to find a helper library or tutorial.
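If you would rather not pull in a library, the request id idea can be sketched with only the Python standard library. This is an assumption-laden example, not how either of the packages above is implemented: the names RequestIdFilter, new_request_id, and make_logger are hypothetical, and you would call new_request_id() from whatever per-request hook your framework provides.

```python
import contextvars
import logging
import uuid

# Id of the request currently being handled; "-" when outside a request.
_request_id = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request id onto every log record."""

    def filter(self, record):
        record.request_id = _request_id.get()
        return True


def new_request_id():
    """Create a random id that is unrelated to the session or the user."""
    rid = uuid.uuid4().hex
    _request_id.set(rid)
    return rid


def make_logger(name="mytool", stream=None):
    """Build a logger whose output lines include the request id."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s [%(request_id)s] %(levelname)s %(message)s")
    )
    handler.addFilter(RequestIdFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

The value returned by new_request_id() is what you would show in the tool's UI for users to quote in bug reports. Because it is random and regenerated per request, it only correlates log lines from that one request and says nothing about who made it.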
Bryan