Re: [Xmldatadumps-l] problem with french wikipedia dump

4 Mar 2014


      Thank you, Petr Onderka.
I finally found the problem. It came from the Python module lxml,
which works correctly with iterparse(),
but when I use it to:
1st, match <title>,
2nd, match its parent <page>,
I get this problem. I have now posted this question to the lxml mailing list.
Anyway, I can extract the page I want this way:
1st, match <page>,
2nd, match <title>.
It is just a little slower.
Thank you very much!
Kun JIN
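
The page-first order described above can be sketched as follows. This is only an illustration under stated assumptions, not the script actually used: it relies on the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra installs, and the names find_page, local_name and SAMPLE are hypothetical. A likely reason the title-first order loses text is that, when the "end" event for <title> fires, the parent <page> has not been fully parsed yet, so its <revision> and <text> children may still be missing; waiting for the "end" event of <page> avoids this.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that the dump puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def find_page(source, wanted_title):
    """Return the complete <page> whose <title> equals wanted_title, or None."""
    # Listen only for "end" events: once </page> has been seen,
    # all of its children (title, revision, text) are fully parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        for child in elem:
            if local_name(child.tag) == "title" and child.text == wanted_title:
                return elem
        # Not the page we want: drop its content to keep memory use low.
        elem.clear()
    return None


# Tiny made-up sample standing in for the real dump file (assumption).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page><title>Foo</title><revision><text>foo text</text></revision></page>
  <page><title>Discussion:Apple</title><revision><text>full text here</text></revision></page>
</mediawiki>"""

page = find_page(io.BytesIO(SAMPLE), "Discussion:Apple")
text = next(e.text for e in page.iter() if local_name(e.tag) == "text")
print(text)
```

On a real dump, the source argument would be the opened dump file rather than the in-memory sample. Every <page> is parsed completely before its title is checked, which is probably why this order is a little slower than matching <title> first.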
On 03/03/2014 08:31 PM, Petr Onderka wrote:
...
On Fri, Feb 28, 2014 at 3:13 PM, Kun JIN kun.jin@univ-bpclermont.fr wrote:
...
I have another problem with "frwiki-20140208-pages-meta-current.xml". I
tried to extract "Discussion:Apple"
(http://fr.wikipedia.org/wiki/Discussion:Apple). In this
dump, I got the last revision of course, but the page has missing text (see
attached file "page-Discussion:Apple.xml").
How exactly did you extract the text? When I look into that dump, I
can see the full text.
Petr Onderka
[[en:User:Svick]]
