Hi,
I've suggested generating bulk checksums as well, but both Brion and Ariel see the primary purpose of this field as checking the validity of the dump-generation process, so they want to generate the checksums straight from external storage.
In a general sense, there are two use cases for this new field:
1) Checking the validity of the XML dump files
2) Identifying reverts
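For the second use case, the checksum makes revert detection a simple equality check: a revision whose checksum matches an earlier revision of the same page has (barring hash collisions) restored that earlier text. A minimal sketch, assuming a hex-encoded SHA-1 (the exact hash and encoding are not settled in this thread):

import hashlib

def rev_checksum(text):
    # SHA-1 of the revision text, hex-encoded.
    # Assumption for illustration only; the real column format isn't fixed yet.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def find_reverts(revisions):
    # revisions: one page's history in chronological order as (rev_id, text) pairs.
    # Returns (revert_rev_id, reverted_to_rev_id) pairs, i.e. revisions whose
    # checksum matches an earlier revision of the same page (use case 2).
    seen = {}
    reverts = []
    for rev_id, text in revisions:
        digest = rev_checksum(text)
        if digest in seen:
            reverts.append((rev_id, seen[digest]))
        else:
            seen[digest] = rev_id
    return reverts

# Example: revision 4 restores the text of revision 2.
history = [(1, "a"), (2, "ab"), (3, "ab c"), (4, "ab")]
print(find_reverts(history))  # [(4, 2)]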
I have started working on a proposal for deployment; while it is still incomplete, it might be a good starting point for further planning. I have been trying to come up with some back-of-the-envelope calculations of how much time and space this would take, but I don't yet have all the information required for reasonable estimates.
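To give a sense of what the storage half of such an estimate would look like (all numbers below are hypothetical placeholders, not measured values):

# Back-of-the-envelope storage cost of the new column.
# Every input here is a made-up placeholder, for illustration only.
revisions = 400_000_000      # assumed total rows in the revision table
checksum_bytes = 40          # e.g. a hex-encoded SHA-1 stored as varbinary(40)
index_multiplier = 2.0       # rough extra cost if the column is also indexed

column_gib = revisions * checksum_bytes / 1024**3
print(f"column only:    ~{column_gib:.0f} GiB")
print(f"column + index: ~{column_gib * index_multiplier:.0f} GiB")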
You can find the proposal here: http://strategy.wikimedia.org/wiki/Proposal:Implement_and_deploy_checksum_re...
I want to thank Brion and Asher for giving feedback on prior drafts. Please feel free to improve this proposal.
Best, Diederik
PS: not sure if this proposal should be on strategy or mediawiki...
On 2011-09-03, at 7:16 AM, Daniel Friesen wrote:
On 11-09-02 09:33 PM, Rob Lanphier wrote:
On Fri, Sep 2, 2011 at 5:47 PM, Daniel Friesen lists@nadir-seen-fire.com wrote:
On 11-09-02 05:20 PM, Asher Feldman wrote:
When using this for analysis, will we wish the new columns had partial indexes (on the first 6 characters)?
Bug 2939 is one relevant bug here; it could probably use an index. [1] https://bugzilla.wikimedia.org/show_bug.cgi?id=2939
My understanding is that having a normal index on a table the size of our revision table will be far too expensive for db writes. ... Rob
We've got 5 normal indexes on revision:
- A unique int+int
- A binary(14)
- An int+binary(14)
- Another int+binary(14)
- And a varchar(255)+binary(14)
For that bug, a (rev_page, rev_sha1) or (rev_page, rev_timestamp, rev_sha1) index may do.
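As a rough illustration of why Asher's 6-character prefix (or a composite index like those above) should be selective enough: how many distinct values a 6-character prefix can take depends on the encoding, which isn't fixed in this thread, but even hex already gives about 16.8 million, so within a single page the prefix is essentially as discriminating as the full hash.

# Possible values of a 6-character checksum prefix, by encoding.
# The encodings are assumptions for illustration; the column format isn't fixed here.
for name, alphabet_size in [("hex", 16), ("base-36", 36)]:
    print(f"{name}: {alphabet_size**6:,} possible 6-char prefixes")
# hex: 16,777,216 possible 6-char prefixes
# base-36: 2,176,782,336 possible 6-char prefixes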
-- ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]