[Wikitech-l] Crawling deWP

Rolf Lampa rolf.lampa at rilnet.com
Wed Jan 28 09:42:12 UTC 2009


Marco Schuster wrote:

> Rolf Lampa  wrote:
>>
>> Don't the XML dumps contain the flag for flagged revs?
> 
> The XML dumps are no use to me: way too much overhead (in particular,
> they are old, and I want to use single files, which are easier to
> process than one huuuuge XML file). And they don't contain flagged
> revision flags :(

I traverse the latest enwiki dump (latest revision only) in 15 minutes
(or the Swedish svwiki in < 3 min) with my streaming tool (written in
Delphi Pascal).

In the same pass I can copy the whole thing (it takes no longer), and
while at it I can also create the "big three" SQL tables (page, revision
& text) from the XML dump, all in less than 20 minutes.
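
For illustration, here is a minimal Python sketch of the streaming idea
(this is not the Delphi tool mentioned above, and it makes no claim about
matching its speed). It assumes the usual page/revision/text layout of a
MediaWiki export; the function name iter_pages, the sample data and the
namespace URI in the sample are illustrative assumptions.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, revision_id, text) for each <page>, one at a time.

    Hypothetical helper: reads the dump incrementally instead of
    loading the whole file into memory.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title = rev_id = text = None
        for child in elem:
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "revision":
                # Only direct children of <revision> are inspected, so a
                # nested <contributor><id> is not mistaken for the rev id.
                for sub in child:
                    sub_name = local(sub.tag)
                    if sub_name == "id":
                        rev_id = int(sub.text)
                    elif sub_name == "text":
                        text = sub.text or ""
        yield title, rev_id, text
        elem.clear()  # release the finished page's contents


# Tiny made-up sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Foo</title>
    <id>1</id>
    <revision>
      <id>10</id>
      <text>Hello</text>
    </revision>
  </page>
  <page>
    <title>Bar</title>
    <id>2</id>
    <revision>
      <id>20</id>
      <text>World!</text>
    </revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

With a real dump, the io.BytesIO(SAMPLE) argument would be replaced by an
open file object; each yielded tuple could then be written out as a row
for the page, revision and text tables.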

I like XML dumps. :)

I'd love, however, to see the flagged-rev status as an attribute on one
of the tags, for example <revision flagged_rev="true">.
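
To show how a consumer might use such an attribute, here is a short
Python sketch. Note that flagged_rev is only the proposal made above and,
as far as this thread indicates, is not part of the actual dump format;
the function name flagged_revision_ids and the sample data are likewise
hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def flagged_revision_ids(stream):
    """Yield ids of revisions carrying the PROPOSED flagged_rev="true".

    Assumption: the attribute exists as suggested in this post; real
    dumps do not necessarily provide it.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "revision":
            continue
        if elem.get("flagged_rev") == "true":
            for child in elem:
                if local(child.tag) == "id":
                    yield int(child.text)
                    break
        elem.clear()  # release the finished revision's contents


# Made-up sample using the proposed attribute.
SAMPLE = b"""<mediawiki>
  <page>
    <title>Foo</title>
    <revision flagged_rev="true">
      <id>10</id>
      <text>Reviewed text</text>
    </revision>
  </page>
  <page>
    <title>Bar</title>
    <revision>
      <id>20</id>
      <text>Unreviewed text</text>
    </revision>
  </page>
</mediawiki>"""

flagged = list(flagged_revision_ids(io.BytesIO(SAMPLE)))
```

Because the status would sit on the tag itself, a single streaming pass
could filter on it without any extra lookups.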

Regards,

// Rolf Lampa




