Re: [Wiki-research-l] Library to filter HTML

31 Jan 2008


I've used BeautifulSoup to get plain text out of rendered HTML dumps. It's
slow and doesn't work that well. What you really want, to do it right, is an
actual MediaWiki parser to strip the syntax out for you.
Try this one: http://code.pediapress.com/wiki/wiki
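
For comparison, here is a minimal sketch of the "plain text out of rendered HTML" approach. It uses Python's standard-library html.parser as a stand-in for BeautifulSoup, so it is not the code used for the dumps mentioned above. The names TextExtractor and html_to_text and the sample string are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}  # hypothetical choice of tags to ignore

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining chunks with a space is crude: it also splits words that are
    # broken up by inline tags, e.g. "wo<b>rd</b>" becomes "wo rd".
    return " ".join(" ".join(parser.chunks).split())


sample = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<h1>Title</h1><p>Some <b>bold</b> text &amp; more.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(html_to_text(sample))
```

This only removes markup. It knows nothing about wiki syntax or page chrome (navigation, edit links, infoboxes), which is why a real MediaWiki parser is the better fit for that job.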
On Thu, Jan 31, 2008 at 7:57 AM, Kurt Luther luther@cc.gatech.edu wrote:
...
Hi Felipe,
I've found Beautiful Soup to be a useful Python-based HTML parser.
http://www.crummy.com/software/BeautifulSoup/
Kurt
----- Original Message -----
From: "Felipe Ortega" glimmer_phoenix@yahoo.es
To: wiki-research-l@lists.wikimedia.org
Sent: Thursday, January 31, 2008 8:17:53 AM (GMT-0500) America/New_York
Subject: [Wiki-research-l] Library to filter HTML

Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wiki-research-l

