Hi,
I have two related questions.
I would like to know if there is a simple way to get static HTML for Wikipedia articles, i.e. to extract them as HTML files.
In this regard, I managed to load the text table into a MySQL database. It gives me the wikitext, which I could then parse. However, I found the Wikipedia syntax more complicated than what I had used when contributing to Wikipedia myself. Is there some place where the complete syntax is specified? If I at least had the specification, I could think about working on a parser.
Thanks a lot. O.O.
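As a rough illustration of where such a parser might start, the sketch below converts a tiny subset of wikitext (level 2 to 6 headings, '''bold''' and ''italic'') into HTML. It is a minimal sketch that assumes the input contains none of the harder constructs such as templates, tables, lists, references or combined bold-italic markup. The function names are hypothetical, and the approach is far simpler than MediaWiki's own parser.

```python
import html
import re

# A deliberately tiny subset of wikitext. Templates, tables, lists,
# references and nested or combined bold-italic markup are NOT handled.
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")  # == Title == (levels 2-6)
BOLD = re.compile(r"'''(.+?)'''")
ITALIC = re.compile(r"''(.+?)''")


def inline(text: str) -> str:
    """Escape HTML, then convert bold before italic so the quote runs don't clash."""
    # quote=False leaves apostrophes alone, which the bold/italic patterns need.
    text = html.escape(text, quote=False)
    text = BOLD.sub(r"<b>\1</b>", text)
    return ITALIC.sub(r"<i>\1</i>", text)


def wikitext_to_html(source: str) -> str:
    """Turn headings into <hN> and every other non-blank line into a <p>."""
    out = []
    for line in source.splitlines():
        m = HEADING.match(line)
        if m:
            level = len(m.group(1))
            out.append(f"<h{level}>{inline(m.group(2))}</h{level}>")
        elif line.strip():
            out.append(f"<p>{inline(line)}</p>")
    return "\n".join(out)
```

For example, `wikitext_to_html("== History ==\n'''Bold''' and ''italic''")` yields a `<h2>` heading followed by a paragraph containing `<b>` and `<i>` elements.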
__________________________________________________ Correo Yahoo! Espacio para todos tus mensajes, antivirus y antispam ¡gratis! Regístrate ya - http://correo.espanol.yahoo.com/
O. Olson wrote:
Hi,
I have two related questions.
I would like to know if there is a simple way to get static HTML for Wikipedia articles, i.e. to extract them as HTML files.
Try this: http://static.wikipedia.org/
Regards,
// Rolf Lampa
Dear Rolf,
I don’t think I understood what you are referring to. Do you mean that to get the static HTML I would have to spider/crawl through those pages? I thought Wikipedia explicitly did not allow this.
Also, do you have any idea about the syntax specification?
Thanks again for your post. O.O.
--- On Sat, 5 Jul 2008, Rolf Lampa rolf.lampa@rilnet.com wrote:
From: Rolf Lampa rolf.lampa@rilnet.com
Subject: Re: [Wikipedia-l] Wikipedia HTML & Syntax specification
To: wikipedia-l@lists.wikimedia.org
Date: Saturday, 5 July 2008, 5:49 am
Wikipedia-l mailing list Wikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikipedia-l
O. Olson wrote:
Dear Rolf,
I don’t think I understood what you are referring to. Do you mean that to get the static HTML I would have to spider/crawl through those pages? I thought Wikipedia explicitly did not allow this.
???
You wouldn't have to crawl. There's a download link in the middle of the page I linked to. If, for example, you would like to have the entire English Wikipedia's content in HTML format, download it from: http://static.wikipedia.org/downloads/2008-06/en/
Or did I misunderstand your question entirely?
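Because a full English dump runs to many gigabytes, a download script should stream the file to disk rather than read it into memory. Here is a minimal sketch using only the Python standard library. The directory URL comes from the message above, but the individual archive file names inside it are not given here, so any name you pass is your own placeholder.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url: str, dest: Path, chunk_size: int = 1 << 20) -> int:
    """Stream `url` to `dest` in chunks (1 MiB by default) so a large dump
    never has to fit in memory. Returns the number of bytes written."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest.stat().st_size
```

A call would look like `download("http://static.wikipedia.org/downloads/2008-06/en/<archive-name>", Path("dump/<archive-name>"))`, where `<archive-name>` is whichever file the directory listing actually offers.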
Also any idea regarding the syntax specification?
There's no end to the pages written on that subject, but I don't know of any place where you can find all of it on one page. For a start, see http://en.wikipedia.org/wiki/Help:Contents and http://en.wikipedia.org/wiki/Wikipedia:Cheatsheet, and continue with the pages they refer to, such as http://meta.wikimedia.org/wiki/Help:Link
Regards,
// Rolf Lampa
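The link syntax described on Help:Link is a reasonable first target for a parser. Here is a minimal sketch that handles only plain internal links of the form [[Target]] or [[Target|label]]. It ignores nested links, namespace prefixes such as File: or Category:, trailing letters after the closing brackets, and the other special cases that page covers. The names `LINK`, `Link` and `extract_links` are hypothetical.

```python
import re
from typing import NamedTuple

# [[Target]] or [[Target|label]]; the target may not contain brackets or a pipe.
# Assumption: no nesting, no templates inside links, no namespace handling.
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


class Link(NamedTuple):
    target: str
    label: str


def extract_links(wikitext: str) -> list[Link]:
    """Return every simple internal link in order of appearance."""
    links = []
    for m in LINK.finditer(wikitext):
        target = m.group(1).strip()
        label = m.group(2)
        # Without a pipe, the displayed text is the target itself.
        links.append(Link(target, target if label is None else label))
    return links
```

For example, `extract_links("See [[Paris]] and [[Berlin|the capital]].")` gives one link whose label equals its target and one with the piped label "the capital".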
--- On Mon, 7 Jul 2008, Rolf Lampa rolf.lampa@rilnet.com wrote:
You wouldn't have to crawl. There's a download link in the middle of the page I linked to. If, for example, you would like to have the entire English Wikipedia's content in HTML format, download it from: http://static.wikipedia.org/downloads/2008-06/en/
Thanks Rolf. This was not clear from your original post, but I have since downloaded it. However, it seems too big to extract into the 200 GB of space I have on my drive. I am trying to borrow a terabyte drive from a friend over the next week to see how everything looks. Thanks again. O.O.
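Before extracting an archive of this size, it may be worth checking the free space programmatically. Here is a minimal sketch using Python's `shutil.disk_usage`. The required size is an assumption you must supply yourself (for example, from the archive's listing), and the 10% safety margin is an arbitrary choice. `has_room_for` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def has_room_for(path: Path, needed_bytes: int, margin: float = 0.1) -> bool:
    """Return True if the filesystem holding `path` has at least
    `needed_bytes` free plus a safety margin (10% by default)."""
    free = shutil.disk_usage(path).free
    return free >= needed_bytes * (1 + margin)
```

For example, `has_room_for(Path("/data"), 500 * 10**9)` asks whether the drive mounted at the (hypothetical) `/data` can hold an estimated 500 GB of extracted files.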