From: Gregor Martynus gregor@martynus.net
To: Platonides platonides@gmail.com
CC: "xmldatadumps-l@lists.wikimedia.org" xmldatadumps-l@lists.wikimedia.org
Sent: Tuesday, 12 June 2012 9:13
Subject: Re: [Xmldatadumps-l] anonymous user account logs (account created / account blocked)
Excellent idea, thanks Platonides.
In fact, this is a very common procedure, with just one caveat: the dumps of large-language wikis sometimes present problems in their content that do not appear in very small ones. In general, dump parsers must be written to be robust against missing fields (especially missing author information or a missing text field). HTH. Felipe.
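As a rough illustration of the point about missing fields, here is a minimal Python sketch. It assumes the usual MediaWiki export layout (page > revision > contributor / text); the function names and the sample XML are made up for the example and are not taken from any existing tool. Every lookup returns None when a field is absent instead of raising.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Drop the XML namespace so the code does not depend on the schema version."""
    return tag.rsplit("}", 1)[-1]


def find_child(elem, name):
    """Return the first direct child called `name`, or None."""
    if elem is None:
        return None
    for child in elem:
        if local(child.tag) == name:
            return child
    return None


def child_text(elem, name):
    """Return the stripped text of a direct child, or None if missing or empty."""
    child = find_child(elem, name)
    if child is None:
        return None
    return (child.text or "").strip() or None


def iter_revisions(stream):
    """Yield one dict per <revision>; absent fields come back as None."""
    title = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        tag = local(elem.tag)
        if tag == "title":
            title = (elem.text or "").strip() or None
        elif tag == "revision":
            contributor = find_child(elem, "contributor")
            yield {
                "title": title,
                "rev_id": child_text(elem, "id"),
                "timestamp": child_text(elem, "timestamp"),
                # Registered users have <username>, anonymous edits have <ip>,
                # and a suppressed contributor has neither.
                "user": child_text(contributor, "username")
                or child_text(contributor, "ip"),
                "text": child_text(elem, "text"),
            }
            elem.clear()  # release the revision's children once it is handled
        elif tag == "page":
            title = None
            elem.clear()


# Hypothetical two-revision sample; the second revision lacks author,
# timestamp and text.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.6/">
  <page>
    <title>Example</title>
    <revision>
      <id>1</id>
      <timestamp>2012-06-11T00:00:00Z</timestamp>
      <contributor><username>Alice</username><id>7</id></contributor>
      <text>hello</text>
    </revision>
    <revision>
      <id>2</id>
      <contributor deleted="deleted" />
    </revision>
  </page>
</mediawiki>"""

rows = list(iter_revisions(io.BytesIO(SAMPLE.encode("utf-8"))))
for row in rows:
    print(row)
```

In a stub dump the text element carries no content, so "text" would come back as None for every revision; code further down the line has to treat None as a normal value rather than an error.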
-- Gregor Martynus
On Tuesday, 12. June 2012 at 00:16, Platonides wrote:
On 11/06/12 23:22, Gregor Martynus wrote:
Thanks again for your input, sounds like the Stub-meta-history dump is
exactly what we need. I'm already downloading it.
I'm not sure if this is the place for such a suggestion, but it would be great to have example versions of the real dumps, with only a few hundred entries each, just to find out whether they fit specific requirements without having to download several GB of data. Just a thought.
-- Gregor Martynus
You can use a small wiki for that. A common choice is simplewiki, because
a) It's smaller than enwiki
b) It's written in English (simple English)
But if you don't have a language barrier, you can go for other wikis. For example, Wikipedia in Ligurian language has just a few thousand pages: http://dumps.wikimedia.org/lijwiki/20120611/
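If even a small wiki's dump is more than is needed for a quick look, the file can be read as a stream and abandoned after the first few pages. This is only a sketch: the function name is made up, and the in-memory gzip data below stands in for a downloaded .xml.gz file.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def first_page_titles(stream, limit):
    """Return the titles of the first `limit` pages, then stop reading."""
    titles = []
    if limit <= 0:
        return titles
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag.rsplit("}", 1)[-1] != "page":
            continue
        # A page without a <title> child is recorded as None.
        title = next(
            (c.text for c in elem if c.tag.rsplit("}", 1)[-1] == "title"),
            None,
        )
        titles.append(title)
        elem.clear()
        if len(titles) >= limit:
            break  # the rest of the stream is never read
    return titles


# Hypothetical stand-in for a real dump: five tiny pages, gzip-compressed.
xml_bytes = (
    b"<mediawiki>"
    + b"".join(b"<page><title>P%d</title></page>" % i for i in range(5))
    + b"</mediawiki>"
)
compressed = io.BytesIO(gzip.compress(xml_bytes))

with gzip.open(compressed, "rb") as stream:
    sample = first_page_titles(stream, 2)

print(sample)
```

With a real file, the same function would be called on gzip.open("path/to/dump.xml.gz", "rb"), where the path is whatever the dump was saved as.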
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l