[Wikitech-l] Crawling deWP
Rolf Lampa
rolf.lampa at rilnet.com
Wed Jan 28 09:42:12 UTC 2009
Marco Schuster wrote:
> Rolf Lampa wrote:
>>
>> Don't the XML dumps contain the flag for flagged revs?
>
> The xml dumps are nothing for me, way too much overhead (especially,
> they are old, and I want to use single files, it's easier to process
> these than one huuuuge xml file). And they don't contain flagged
> revisions flags :(
I traverse the latest enwiki dump (last revision only) in 15 minutes (or
the Swedish svwiki in < 3 min) with my stream tool (written in Delphi
Pascal).
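
A minimal sketch of this kind of streaming pass, written in Python rather
than Delphi (the actual tool is not shown in this thread). It assumes the
usual <page>/<title>/<revision>/<text> layout of a MediaWiki export; the
function names iter_pages and local_name and the sample data are
illustrative only.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a dump file; the namespace URI is an assumed example.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
<page><title>A</title><revision><id>1</id><text>hello</text></revision></page>
<page><title>B</title><revision><id>2</id><text></text></revision></page>
</mediawiki>"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) for each <page>, one page at a time."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and local_name(elem.tag) == "page":
            title, text = None, None
            for child in elem.iter():
                name = local_name(child.tag)
                if name == "title":
                    title = child.text
                elif name == "text":
                    text = child.text or ""
            yield title, text
            root.clear()  # drop finished pages so memory use stays flat


pages = list(iter_pages(io.BytesIO(SAMPLE)))
print(pages)
```

For a real dump, an open file object (or a decompressing stream) would
take the place of the io.BytesIO wrapper.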
In the same pass I can copy the whole thing (it takes no longer), and
while at it I can create the "big three" SQL tables (page, revision &
text) out of the XML dump as well, in less than 20 minutes.
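
As a rough illustration of filling the three tables in one pass, here is
a Python sketch using an in-memory SQLite database. The column sets are
heavily simplified assumptions, not the full MediaWiki schema, and
load_dump and child are hypothetical helper names.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Tiny stand-in for a "last revision only" dump.
SAMPLE = b"""<mediawiki>
<page><title>A</title><id>10</id>
<revision><id>100</id><text>hello</text></revision></page>
<page><title>B</title><id>11</id>
<revision><id>101</id><text>world</text></revision></page>
</mediawiki>"""


def child(elem, name):
    """Return the first direct child with the given local tag name."""
    for c in elem:
        if c.tag.rsplit("}", 1)[-1] == name:
            return c
    return None


def load_dump(source, db):
    """Stream the dump once and insert one row per table for each page."""
    # Simplified, assumed columns; the real tables have many more.
    db.executescript('''
        CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                           page_title TEXT, page_latest INTEGER);
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY,
                               rev_page INTEGER, rev_text_id INTEGER);
        CREATE TABLE "text" (old_id INTEGER PRIMARY KEY, old_text TEXT);
    ''')
    for _, page in ET.iterparse(source, events=("end",)):
        if page.tag.rsplit("}", 1)[-1] != "page":
            continue
        page_id = int(child(page, "id").text)
        title = child(page, "title").text
        rev = child(page, "revision")
        rev_id = int(child(rev, "id").text)
        body = child(rev, "text").text or ""
        cur = db.execute('INSERT INTO "text" (old_text) VALUES (?)', (body,))
        db.execute("INSERT INTO revision VALUES (?, ?, ?)",
                   (rev_id, page_id, cur.lastrowid))
        db.execute("INSERT INTO page VALUES (?, ?, ?)",
                   (page_id, title, rev_id))
        page.clear()  # free the page's children once they are stored
    db.commit()


db = sqlite3.connect(":memory:")
load_dump(io.BytesIO(SAMPLE), db)
rows = db.execute(
    'SELECT page_title, old_text FROM page '
    'JOIN revision ON rev_id = page_latest '
    'JOIN "text" ON old_id = rev_text_id ORDER BY page_id'
).fetchall()
print(rows)
```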
I like XML dumps. :)
I'd love, however, to see the flagged rev status as an attribute on one
of the tags, for example <revision flagged_rev="true">.
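
If such an attribute existed, a streaming reader could pick it up with
very little extra work. The sketch below is in Python and is purely
hypothetical: flagged_rev is the attribute proposed above, not something
the dumps currently provide, and flagged_titles is an illustrative name.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample using the proposed (non-existent) flagged_rev attribute.
SAMPLE = b"""<mediawiki>
<page><title>A</title><revision flagged_rev="true"><id>1</id></revision></page>
<page><title>B</title><revision><id>2</id></revision></page>
</mediawiki>"""


def flagged_titles(source):
    """Return titles of pages whose revision carries flagged_rev="true"."""
    flagged = []
    title = None
    for _, elem in ET.iterparse(source, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # ignore any XML namespace
        if tag == "title":
            title = elem.text  # <title> closes before <revision> does
        elif tag == "revision" and elem.get("flagged_rev") == "true":
            flagged.append(title)
        elif tag == "page":
            elem.clear()  # release the finished page
    return flagged


result = flagged_titles(io.BytesIO(SAMPLE))
print(result)
```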
Regards,
// Rolf Lampa