Hello,
Given that the current dump process is having problems, why not provide a simple fix such as offering a raw table format, SQL files, or even CSV files?
I think there are still quite a lot of people using MySQL who would prefer these formats instead of XML (as you know, XML generation and parsing really take time...)
Howard
howard chen wrote:
Hello,
Given that the current dump process is having problems, why not provide a simple fix such as offering a raw table format, SQL files, or even CSV files?
Because that would contain private data. It needs to be filtered first, and for that the text blobs have to be uncompressed and pulled apart so that individual revisions can be handled. Once all that is done, it can just as well be written as XML. XML as such is not the problem.
-- daniel
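For readers unfamiliar with the storage layer, here is a rough, hypothetical sketch of the pipeline Daniel describes: decompress the stored text blobs, drop non-public revisions, and write the rest out as XML. The field names and the plain zlib compression are assumptions for illustration only; the real text table uses several storage formats (gzip, serialized objects, external storage).

```python
import zlib
from xml.sax.saxutils import escape

def dump_revisions(rows, out):
    """Write public revisions as XML; `rows` is any iterable of dicts (hypothetical schema)."""
    out.write("<revisions>\n")
    for row in rows:
        if row["deleted"]:          # filter out private/suppressed data first
            continue
        # assume a plain zlib-compressed text blob for simplicity
        text = zlib.decompress(row["text_blob"]).decode("utf-8")
        out.write("  <revision>\n")
        out.write("    <id>%d</id>\n" % row["rev_id"])
        out.write("    <text>%s</text>\n" % escape(text))
        out.write("  </revision>\n")
    out.write("</revisions>\n")
```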
-----Original Message----- Given that the current dump process is having problems, why not provide a simple fix such as offering a raw table format, SQL files, or even CSV files?
Howard
Howard,
Can't you get the SQL files from running mysqldump from the command line? Why does something new need to be created? I hope I'm not being dense, but I don't understand what new niche you are asking to fill.
Thanks! -Courtney
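As a point of reference, for a single local MediaWiki installation Courtney's suggestion does work: mysqldump can emit plain SQL, or tab-separated per-table files via --tab. A minimal sketch follows, with hypothetical database name, user, and paths; note that --tab requires the FILE privilege and a directory the server can write to.

```python
import subprocess

# Plain SQL dump of a local wiki database (prompts for the password)
with open("wikidb.sql", "wb") as out:
    subprocess.run(["mysqldump", "-u", "wikiuser", "-p", "wikidb"],
                   stdout=out, check=True)

# Tab-separated data files (one .txt per table) plus .sql schema files
subprocess.run(
    ["mysqldump", "-u", "wikiuser", "-p", "--tab=/tmp/wikidump", "wikidb"],
    check=True,
)
```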
On Tue, Mar 31, 2009 at 10:02 AM, Christensen, Courtney ChristensenC@battelle.org wrote:
-----Original Message----- Given that the current dump process is having problems, why not provide a simple fix such as offering a raw table format, SQL files, or even CSV files?
Howard,
Can't you get the SQL files from running mysqldump from the command line? Why does something new need to be created? I hope I'm not being dense, but I don't understand what new niche you are asking to fill.
Because the data (text) isn't in a single database; even for a single project, it is spread across a large number of machines. It's also stored in a mixture of bizarre internal formats.
The file format is pretty much irrelevant to the 'cost' of producing a dump.
howard chen wrote:
Hello,
Given that the current dump process is having problems, why not provide a simple fix such as offering a raw table format, SQL files, or even CSV files?
Ooooh! You meant for Wikipedia, not for individual MediaWiki installations? Well, that makes a lot more sense. We call DumpHTML our dump process; it isn't running smoothly for us at the moment, and clients want flattened HTML from the wikis to take around to trade shows and whatever.
Sorry for being confused! -Courtney
(as you know, XML generation and parsing really take time...)
I didn't know that. Ever tried SAX?
On 3/31/09 8:21 AM, Domas Mituzas wrote:
(as you know, XML generation and parsing really take time...)
I didn't know that. Ever tried SAX?
Indeed, not all XML software is slow...
http://dotnot.org/blog/archives/2008/02/
- Trevor
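To make the SAX point concrete, here is a minimal sketch of streaming through a dump without loading it into memory, just counting page elements; the dump filename is a placeholder.

```python
import xml.sax

class PageCounter(xml.sax.ContentHandler):
    """Count <page> elements while streaming through the dump."""
    def __init__(self):
        super().__init__()
        self.pages = 0

    def startElement(self, name, attrs):
        if name == "page":
            self.pages += 1

handler = PageCounter()
xml.sax.parse("enwiki-pages-articles.xml", handler)  # placeholder filename
print(handler.pages, "pages")
```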
On Tue, Mar 31, 2009 at 9:57 AM, howard chen howachen@gmail.com wrote:
Given that the current dump process is having problems, why not provide a simple fix such as offering a raw table format, SQL files, or even CSV files?
Please pardon this newbie question: is there a succinct explanation of what the problem is with the current Wikipedia dump process?
//Ed
On Wed, Apr 1, 2009 at 12:07 AM, Ed Summers ehs@pobox.com wrote:
Please pardon this newbie question: is there a succinct explanation of what the problem is with the current Wikipedia dump process?
"needs a rewrite"?
what the problem is with the current Wikipedia dump process?
It's choking on almost 300 million revisions. It wasn't designed for a wiki this size and needed to be rewritten two years ago.
On Wed, Apr 1, 2009 at 9:28 AM, Aryeh Gregor <Simetrical+wikilist@gmail.com> wrote:
On Wed, Apr 1, 2009 at 12:07 AM, Ed Summers ehs@pobox.com wrote:
Please pardon this newbie question: is there a succinct explanation of what the problem is with the current Wikipedia dump process?
"needs a rewrite"?
Ed Summers wrote:
On Tue, Mar 31, 2009 at 9:57 AM, howard chen howachen@gmail.com wrote:
Given that the current dump process is having problems, why not provide a simple fix such as offering a raw table format, SQL files, or even CSV files?
Please pardon this newbie question: is there a succinct explanation of what the problem is with the current Wikipedia dump process?
http://wikitech.wikimedia.org/view/Data_dump_redesign
-- brion