---------- Original e-mail ----------
From: Dan Andreescu dandreescu@wikimedia.org
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Date: 18. 9. 2017 16:26:18
Subject: Re: [Wikitech-l] Can we drop revision hashes (rev_sha1)?

"So, as things stand, rev_sha1 in the database is used for:
1. the XML dumps process and all the researchers depending on the XML dumps (probably just for revert detection)
2. revert detection for libraries like python-mwreverts [1]
3. revert detection in mediawiki history reconstruction processes in Hadoop (Wikistats 2.0)
4. revert detection in Wikistats 1.0
5. revert detection for tools that run on labs, like Wikimetrics
?. I think Aaron also uses rev_sha1 in ORES, but I can't seem to find the latest code for that service
If you think about this list above as a flow of data, you'll see that rev_sha1 is replicated to xml, labs databases, hadoop, ML models, etc. So removing it and adding it back downstream from the main mediawiki database somewhere, like in XML, cuts off the other places that need it. That means it must be available either in the mediawiki database or in some other central database which all those other consumers can pull from."
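
For readers unfamiliar with why so many of the consumers listed above need the hash: checksum-based revert detection can be sketched roughly as below. This is only a minimal illustration, not the code of python-mwreverts or Wikistats; the function name `detect_identity_reverts`, the `radius` parameter and the sample data are assumptions made for the example.

```python
from collections import deque


def detect_identity_reverts(revisions, radius=15):
    """Find revisions whose checksum equals that of a recent earlier revision.

    `revisions` is an iterable of (rev_id, sha1) pairs for ONE page, oldest
    first. `radius` (hypothetical parameter) is the maximum number of
    intervening revisions that a single revert may undo.

    Returns a list of (reverting_id, reverted_to_id, reverted_ids) tuples.
    """
    # Sliding window of the most recent (rev_id, sha1) pairs, oldest first.
    window = deque(maxlen=radius + 1)
    reverts = []
    for rev_id, sha1 in revisions:
        recent = list(window)
        # Walk from newest to oldest, looking for identical content.
        for i in range(len(recent) - 1, -1, -1):
            if recent[i][1] == sha1:
                reverted = [r for r, _ in recent[i + 1:]]
                # An identical immediate predecessor is a null edit, not a revert.
                if reverted:
                    reverts.append((rev_id, recent[i][0], reverted))
                break
        window.append((rev_id, sha1))
    return reverts


# Made-up history: revision 3 restores the content of revision 1.
history = [(1, "aaa"), (2, "bbb"), (3, "aaa"), (4, "ccc")]
print(detect_identity_reverts(history))
```

Without a stored checksum, each consumer would have to fetch and hash the full text of every revision itself in order to do the same comparison.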
I use rev_sha1 on the replicas to check the consistency of modules, templates and other pages (typically help pages) which should be the same between projects (either within one language or even across languages, if the page is not language dependent). In other words, I use it to detect possible changes in them and to sync them.
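
A rough sketch of that kind of check, assuming the latest rev_sha1 of the page has already been fetched from each project's replica (for example by joining page.page_latest to revision.rev_id). The function name, the wiki names and the hash values below are made up for illustration:

```python
from collections import defaultdict


def find_out_of_sync(latest_sha1_by_wiki):
    """Return the wikis whose copy of a page differs from the majority copy.

    `latest_sha1_by_wiki` maps a wiki name to the rev_sha1 of the latest
    revision of the shared page on that wiki. How that value is fetched
    from the replicas is outside this sketch.
    """
    # Group wikis by the checksum of their current copy.
    groups = defaultdict(list)
    for wiki, sha1 in latest_sha1_by_wiki.items():
        groups[sha1].append(wiki)

    if len(groups) <= 1:
        return []  # every project holds identical content

    # Treat the most common checksum as the reference copy (an assumption;
    # on a tie the checksum seen first wins).
    majority = max(groups.values(), key=len)
    return sorted(
        wiki
        for wikis in groups.values()
        if wikis is not majority
        for wiki in wikis
    )


# Made-up example: one project has drifted from the other two.
checksums = {"cswiki": "abc", "cswikibooks": "abc", "cswikiquote": "xyz"}
print(find_out_of_sync(checksums))
```

Comparing the stored hashes means the page text itself never has to be downloaded just to find out whether anything changed.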
Also, I haven't seen this mentioned in the thread: Flow also notifies users about reverts, but I don't know whether it uses rev_sha1 or not, so I'd rather mention it.
Kind regards
Danny B.