Re: [Wikitech-l] WikiXRay python parser improved

7 Jul 2007


I've uploaded some performance measurements for the parser:
http://meta.wikimedia.org/wiki/WikiXRay_Python_parser
Felipe.
Felipe Ortega glimmer_phoenix@yahoo.es wrote:
Hi.
While I finish the standard edition of the WikiXRay Python parser (for general purposes, such as retrieving the whole text of each revision), I have improved the research version of the parser.
The new code is at: http://meta.wikimedia.org/wiki/WikiXRay_parser
Basically, I introduced a recipe from the Python Cookbook that speeds up parsing by buffering text events until the parser has read the whole block of text between two tags.
On the tiny furwiki dump, it reduced the processing time from 38.7 seconds to 15.019 seconds (roughly 2.6 times faster).
The speedup may be smaller on large dumps, but I still expect this version to be faster than the previous one.
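A SAX parser may deliver the text between two tags as several separate characters() calls, so a handler that does work per call ends up doing it many times per text node. The buffering idea described above can be sketched as a SAX filter. This is a minimal sketch built on the standard library's xml.sax.saxutils.XMLFilterBase, not the actual WikiXRay code or the exact Cookbook recipe; the class and function names (TextCoalescingFilter, ChunkRecorder, parse_with_filter) are hypothetical, and namespace-aware events (startElementNS/endElementNS) are deliberately not handled.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase


class TextCoalescingFilter(XMLFilterBase):
    """Hypothetical SAX filter: merges consecutive characters() events.

    Text chunks are buffered and forwarded as a single characters()
    call just before the next element boundary (or end of document).
    Namespace mode (startElementNS/endElementNS) is not handled.
    """

    def __init__(self, parent=None):
        super().__init__(parent)
        self._buffer = []

    def _flush(self):
        # Forward the buffered text, if any, as one event.
        if self._buffer:
            text = "".join(self._buffer)
            self._buffer = []
            super().characters(text)

    def characters(self, content):
        # Do not forward yet; just accumulate.
        self._buffer.append(content)

    def startElement(self, name, attrs):
        self._flush()
        super().startElement(name, attrs)

    def endElement(self, name):
        self._flush()
        super().endElement(name)

    def endDocument(self):
        self._flush()
        super().endDocument()


class ChunkRecorder(xml.sax.ContentHandler):
    """Hypothetical downstream handler that records each text event."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_with_filter(data: bytes) -> list:
    """Parse XML bytes through the filter; return the text events seen."""
    parser = xml.sax.make_parser()
    recorder = ChunkRecorder()
    text_filter = TextCoalescingFilter(parser)
    text_filter.setContentHandler(recorder)
    text_filter.parse(io.BytesIO(data))
    return recorder.chunks
```

For example, parse_with_filter(b"<page><title>A &amp; B</title></page>") hands the downstream handler a single "A & B" event, even if the underlying parser splits that text around the entity reference. The filter keeps the handler unchanged: it only sees fewer, larger text events.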
Felipe.
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikitech-l