Re: [Wiki-research-l] Library to filter HTML

31 Jan 2008


      Thanks a lot. Performance is an important issue in this case (think about parsing the entire enwiki).
I'll give it a chance and post my comments.
Thanks for the feedback.
Felipe.
Brian Brian.Mingus@colorado.edu wrote: s/right/write/. pre-morning coffee still :)
On Thu, Jan 31, 2008 at 9:33 AM, Brian Brian.Mingus@colorado.edu wrote:
 I've used BeautifulSoup to get plain text out of rendered HTML dumps. It's slow and doesn't work that well. What you really want to do it right is an actual MediaWiki parser to strip the syntax out for you.
Try this one: http://code.pediapress.com/wiki/wiki
On Thu, Jan 31, 2008 at 7:57 AM, Kurt Luther luther@cc.gatech.edu wrote:
  Hi Felipe,
I've found Beautiful Soup to be a useful Python-based HTML parser.
http://www.crummy.com/software/BeautifulSoup/
Kurt
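
For a rough idea of what "getting plain text out of rendered HTML" involves, here is a minimal sketch that uses only Python's standard-library html.parser module. It is not Beautiful Soup and not a MediaWiki parser; the names TextExtractor and html_to_text are made up for illustration, and skipping script/style content is an assumption about what counts as noise.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}  # assumption: these never hold article text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    sample = "<p>Hello <b>wiki</b> &amp; world</p><script>var x = 1;</script>"
    print(html_to_text(sample))
```

This only strips markup from HTML that has already been rendered; it does nothing about wiki syntax, which is the case where a real MediaWiki parser would be needed.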
----- Original Message -----
 From: "Felipe Ortega" glimmer_phoenix@yahoo.es
 To: wiki-research-l@lists.wikimedia.org
 Sent: Thursday, January 31, 2008 8:17:53 AM (GMT-0500) America/New_York
 Subject: [Wiki-research-l] Library to filter HTML
_______________________________________________
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 http://lists.wikimedia.org/mailman/listinfo/wiki-research-l
