---------- Original e-mail ----------
From: Dan Andreescu <dandreescu(a)wikimedia.org>
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Date: 18 September 2017 16:26:18
Subject: Re: [Wikitech-l] Can we drop revision hashes (rev_sha1)?
"So, as things stand, rev_sha1 in the database is used for:
1. the XML dumps process and all the researchers depending on the XML dumps
(probably just for revert detection)
2. revert detection for libraries like python-mwreverts [1]
3. revert detection in mediawiki history reconstruction processes in Hadoop
(Wikistats 2.0)
4. revert detection in Wikistats 1.0
5. revert detection for tools that run on labs, like Wikimetrics
?. I think Aaron also uses rev_sha1 in ORES, but I can't seem to find the
latest code for that service
If you think about this list above as a flow of data, you'll see that
rev_sha1 is replicated to xml, labs databases, hadoop, ML models, etc. So
removing it and adding it back downstream from the main mediawiki database
somewhere, like in XML, cuts off the other places that need it. That means
it must be available either in the mediawiki database or in some other
central database which all those other consumers can pull from.
"
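For anyone skimming the list above: most of those consumers rely on the same basic idea, namely that a revision whose hash equals the hash of an earlier revision of the same page restores that earlier state. Here is a minimal sketch of that idea only. It is not the actual algorithm of python-mwreverts or of any of the other tools mentioned, and the function name and the `radius` parameter are made up for illustration.

```python
from typing import List, Sequence, Tuple


def detect_identity_reverts(shas: Sequence[str], radius: int = 15) -> List[Tuple[int, int]]:
    """Return (reverting_index, reverted_to_index) pairs.

    `shas` is the list of rev_sha1 values of one page in chronological order.
    `radius` (hypothetical parameter) limits how far back we look.
    """
    reverts = []
    for i, sha in enumerate(shas):
        start = max(0, i - radius)
        # Walk backwards so that the nearest matching revision wins.
        for j in range(i - 1, start - 1, -1):
            if shas[j] == sha:
                # j == i - 1 means identical consecutive revisions
                # (e.g. a null edit), so nothing in between was undone.
                if j < i - 1:
                    reverts.append((i, j))
                break
    return reverts


if __name__ == "__main__":
    history = ["a", "b", "a", "c", "c"]
    print(detect_identity_reverts(history))
```

The point for this thread is that the check needs nothing but the per-revision hashes, which is why every consumer downstream wants them already computed.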
I use rev_sha1 on the replicas to check the consistency of modules, templates, or
other pages (typically help pages) that should be the same across projects (either
within one language or even across languages, if the page is not language
dependent). In other words, I use it to detect changes in them so that they can
be synced.
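To illustrate the kind of check I mean, here is a rough sketch in Python. It assumes the hashes of the latest revision of one page have already been fetched from each project's replica into a dict; the wiki names, the sample page text, and the helper names are made up for the example. As far as I know, rev_sha1 is the SHA-1 of the revision text encoded in base 36 and padded to 31 characters, and the first function follows that assumption.

```python
import hashlib
from typing import Dict, List

_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def rev_sha1(text: str) -> str:
    """SHA-1 of the UTF-8 text, base-36 encoded, zero-padded to 31 chars.

    Assumption: this mirrors how MediaWiki fills rev_sha1.
    """
    n = int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)
    out = ""
    while n:
        n, r = divmod(n, 36)
        out = _DIGITS[r] + out
    return out.rjust(31, "0")


def out_of_sync(hashes: Dict[str, str], reference: str) -> List[str]:
    """Return the wikis whose hash differs from the reference wiki's hash."""
    ref_hash = hashes[reference]
    return sorted(wiki for wiki, h in hashes.items() if h != ref_hash)


if __name__ == "__main__":
    # Hypothetical data: in practice these values come straight from
    # the rev_sha1 column on each replica, not from hashing text locally.
    hashes = {
        "cswiki": rev_sha1("return {}"),
        "cswikibooks": rev_sha1("return {}"),
        "cswikisource": rev_sha1("return {} -- local tweak"),
    }
    print(out_of_sync(hashes, reference="cswiki"))
```

With the column available, the comparison is one short string per page per project; without it, I would have to pull the full text of every page from every project and hash it myself.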
Also, I haven't seen it mentioned in this thread: Flow also notifies users
about reverts, but I don't know whether it uses rev_sha1 or not, so I'm
mentioning it just in case.
Kind regards
Danny B.