Ariel,
Thank you for giving some insight into what has been going on behind the scenes. I have a few questions that will hopefully get some answers for those of us eager to help out in any way we can.
What are the planned code changes to speed the process up? Can we help this volunteer with the coding or architectural decisions? How much time do they have to dedicate to it? Some visibility into the fix and timeline would benefit a lot of us. It would also help us know how we can help out!
Thanks again for shedding some light on the issue.
On Feb 22, 2009, at 8:12 PM, Ariel T. Glenn wrote:
The reason these dumps have not been rewritten to run more efficiently is that this job was handed to me (at my request) and I have not been able to get to it, even though it is the first thing on my list of development work. So if there are going to be rants, they can be directed at me, not at the whole team.
The work was already started by a volunteer. As I am the blocking factor, someone else should probably take it on and get it done, though it will make me sad. Brion discussed this with me about a week and a half ago; I still wanted to keep it then, but it doesn't make sense. The in-office needs that I am also responsible for take virtually all of my time. Perhaps they shouldn't, but that is how it has worked out.
So, I am very sorry for having needlessly held things up. (I also have a crawler that requests pages changed since the latest XML dump, so that projects I am on can keep a current XML file; we've been running that way for at least a year.)
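For the curious, the core of that crawler is nothing fancy. A minimal sketch of the approach (not the actual code, and assuming the standard api.php recentchanges and export queries) looks something like this:

    import requests

    API = "https://en.wikipedia.org/w/api.php"  # whichever project's api.php

    def changed_titles_since(timestamp):
        """Yield titles of pages edited since `timestamp` (e.g. the last dump's timestamp)."""
        params = {
            "action": "query",
            "list": "recentchanges",
            "rcend": timestamp,   # recentchanges walks newest-to-oldest, so rcend is the older bound
            "rcprop": "title",
            "rclimit": "500",
            "format": "json",
        }
        while True:
            data = requests.get(API, params=params).json()
            for rc in data["query"]["recentchanges"]:
                yield rc["title"]
            if "continue" not in data:
                break
            params.update(data["continue"])

    def export_current_xml(titles):
        """Fetch the current XML export for a batch of titles, to splice into the local copy."""
        r = requests.get(API, params={
            "action": "query",
            "export": 1,
            "exportnowrap": 1,
            "titles": "|".join(titles),
        })
        return r.text

Re-fetching just the changed pages and merging them into the last full dump is what keeps the local XML file current between official dump runs.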
Ariel
On 23-02-2009, Mon, at 00:37 +0100, Gerard Meijssen wrote:
Hi,
There have been previous offers of developer time and of hardware...
Thanks,
GerardM
2009/2/23 Platonides <Platonides@gmail.com>
Robert Ullmann wrote:
Hi,
Maybe I should offer a constructive suggestion?
They are better than rants :)
Clearly, trying to do these dumps (particularly the "history" dumps) the way it is currently being done from the servers is proving hard to manage.
I also realize that you can't just put the set of daily permanent-media backups on line, as they contain lots of user info, plus deleted and oversighted revs, etc.
But would it be possible to put each backup disc (before sending one of the several copies off to its secure storage) in a machine that would filter all the content into a public file (or files)? Then someone else could download each disc (i.e. a 10-15 GB chunk of updates) and sort it into the useful files for general download?
I don't think they move backup copies off to secure storage. They have the db replicated, and the backup discs would be copies of those same dumps. (Can a sysadmin confirm?)
Then someone can produce a current (for example) English 'pedia XML file; and with more work the cumulative history files (if we want that as one file).
There would be delays: each of your permanent-media backup discs has to be loaded (probably manually, though changers are available) on the "filter" system, and I don't know how many discs WMF generates per day (;-), and then it has to filter all the revision data, etc. But the data would still easily be available to others in 48-72 hours, which beats the present ~6 weeks when the dumps are working.
No shortage of people with a box or two and any number of terabyte hard drives who might be willing to help, if they can get the raw backups.
The problem is that WMF can't provide that raw unfiltered information. Perhaps you could donate a box on the condition that it could only be used for dump processing, but giving out unfiltered data would be too risky.
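Such a filter pass would have to run on WMF's own hardware before anything left the building. To make the idea concrete, a rough sketch of it might look like the code below; the field names (rev_deleted, user_ip, and so on) are hypothetical, since the real backup format would dictate the parsing, but the gating logic is the point: nothing deleted, oversighted, or private ever reaches the public file.

    PRIVATE_FIELDS = {"user_ip", "user_email"}  # hypothetical field names

    def filter_chunk(records):
        """Yield only publicly visible revisions, stripped of non-public fields."""
        for rev in records:  # `records` = revision entries parsed off one backup chunk
            if rev.get("rev_deleted") or rev.get("rev_oversighted"):
                continue     # suppressed revisions never leave the box
            yield {k: v for k, v in rev.items() if k not in PRIVATE_FIELDS}

The filtered output could then be handed to volunteers to sort into the usual download files.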
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l