Re: [Xmldatadumps-l] problem with french wikipedia dump

4 Mar 2014

Thank you, Petr Onderka

I finally found the problem. It comes from the Python module (lxml),
which works correctly with iterparse(),
but when I use it to:
1st, match <title>,
2nd, get its parent <page>,
I have this problem. I have now asked about it on the lxml mailing list.

Anyway, I can extract the page I want this way:
1st, match <page>,
2nd, match <title>.
It is just a little slower.
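
For reference, the working order (match <page> first, then look up its
<title>) might look like the sketch below. It uses the standard library's
xml.etree.ElementTree instead of lxml so that it runs without extra
packages. The names find_page_text and local_name and the sample XML are
made up for illustration and are not taken from my real script. A likely
reason the other order loses text is that the end event for <title> is
delivered before the parser has finished the enclosing <page>, so the
parent may still be incomplete at that point.

```python
import io
import xml.etree.ElementTree as ET  # stand-in for lxml.etree (assumption)


def local_name(tag):
    """Strip a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def find_page_text(source, wanted_title):
    """Return the revision text of the page titled wanted_title, or None."""
    # Only "end" events are requested, so a <page> element is complete
    # (title, revision and text all parsed) when it is handed to us.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = None
        text = None
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text
        if title == wanted_title:
            return text or ""
        # Drop the children of pages we do not need to limit memory use.
        elem.clear()
    return None


# Tiny made-up sample in the shape of a MediaWiki export (illustration only).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page><title>Discussion:Apple</title><revision><text>full text here</text></revision></page>
  <page><title>Other</title><revision><text>other text</text></revision></page>
</mediawiki>"""

print(find_page_text(io.BytesIO(SAMPLE.encode("utf-8")), "Discussion:Apple"))
```

Every page is fully built before its title is checked, which is probably
why this order is slower on a large dump.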

Thank you very much!

Kun JIN

On 03/03/2014 08:31 PM, Petr Onderka wrote:
...
> On Fri, Feb 28, 2014 at 3:13 PM, Kun JIN
> <kun.jin(a)univ-bpclermont.fr> wrote:
>> I have another problem with
>> "frwiki-20140208-pages-meta-current.xml". I tried to extract
>> "Discussion:Apple" (http://fr.wikipedia.org/wiki/Discussion:Apple).
>> In this dump, I got the last revision of course, but the page has
>> missing text (see attached file "page-Discussion:Apple.xml").
>
> How exactly did you extract the text? When I look into that dump, I
> can see the full text.
>
> Petr Onderka
> [[en:User:Svick]]
