Greetings,
I'm writing a script to read Wikipedia dump files and generate raw text from them, much like it would appear in a web browser. At first, I ignored all macros, discarding anything between {{ ... }}, but I soon learned that some macros generate useful text. Now I need a comprehensive list of all existing macros, to know which of them I should treat. As I believe some macros are language dependent, I am dealing with the Portuguese Wikipedia.
Thank you for any help, Erick
On Tue, 14 Feb 2012 at 19:14 -0200, Erick Fonseca wrote:
Greetings,
I'm writing a script to read Wikipedia dump files and generate raw text from them, much like it would appear in a web browser. At first, I ignored all macros, discarding anything between {{ ... }}, but I soon learned that some macros generate useful text. Now I need a comprehensive list of all existing macros, to know which of them I should treat. As I believe some macros are language dependent, I am dealing with the Portuguese Wikipedia.
What a fantastic idea!!! Do let the list know how you get on :)
Fran
Hi Erick,
What you are calling "macros" are actually called "templates" in the Wikipedia (or, more generally, MediaWiki) context. In Portuguese, the name is "Predefinição."
You can find all the templates on a wiki by opening the "Special:AllPages" page and choosing the "Template" (in Portuguese, "Predefinição") namespace.
Here is a direct link:
http://pt.wikipedia.org/w/index.php?title=Especial:Todas_as_p%C3%A1ginas&...
I suspect you will find there are many thousands of templates. Separating those that supply significant text (which I would guess are the majority) from those that do not will probably be a difficult task; sorry, I don't have much to offer in that area.
I hope this helps! -Pete
-- Pete Forsyth [[User:Peteforsyth]] peteforsyth@gmail.com 503-383-9454 mobile
Wikipedia-l mailing list Wikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikipedia-l
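The namespace listing Pete describes can also be fetched programmatically. The following is a minimal sketch, assuming the standard MediaWiki `list=allpages` query and the modern `continue` style of paging; the endpoint constant, the function names, and the User-Agent string are illustrative choices, not something given in the thread. Namespace 10 is the standard number for the Template namespace.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the Portuguese Wikipedia API endpoint.
API_URL = "https://pt.wikipedia.org/w/api.php"
TEMPLATE_NAMESPACE = 10  # standard MediaWiki number for the Template namespace


def build_allpages_url(continuation=None, limit=500):
    """Build one list=allpages request restricted to the Template namespace."""
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": TEMPLATE_NAMESPACE,
        "aplimit": limit,
        "format": "json",
    }
    if continuation:
        # Pass back every continuation value the previous response returned.
        params.update(continuation)
    return API_URL + "?" + urlencode(params)


def parse_allpages_response(payload):
    """Return (titles, continuation dict or None) from a decoded response."""
    titles = [page["title"] for page in payload["query"]["allpages"]]
    return titles, payload.get("continue")


def iter_template_titles():
    """Yield every template title, one batch at a time (needs network access)."""
    continuation = None
    while True:
        request = Request(
            build_allpages_url(continuation),
            headers={"User-Agent": "template-lister-sketch/0.1"},  # hypothetical
        )
        with urlopen(request) as response:
            payload = json.load(response)
        titles, continuation = parse_allpages_response(payload)
        yield from titles
        if continuation is None:
            break
```

Iterating over `iter_template_titles()` walks the whole namespace in batches, so it is worth writing the titles to a file rather than holding them all in memory.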
Although templates normally live in their own namespace, any page can theoretically be transcluded, so a comprehensive list of possible transclusions would consist of literally every page in every namespace on a given MediaWiki installation.
---- Kevin Gorman User:Kgorman-ucb
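To illustrate Kevin's point, here is a rough sketch of how a `{{...}}` name maps to a page title: a bare name refers to the Template namespace, a leading colon refers to the main (article) namespace, and an explicit namespace prefix refers to that namespace. The namespace set below is a small illustrative assumption, not the wiki's real list, and the regular expression deliberately skips parser functions such as `{{#if:...}}`; magic words like `{{PAGENAME}}` would still be misread as templates.

```python
import re

# Assumption: a tiny illustrative set of namespace prefixes (lowercased).
# A real wiki has many more, including localized aliases.
KNOWN_NAMESPACES = {"template", "predefinição", "user", "wikipedia", "help"}

# Captures the name part of a {{...}} call: everything up to the first "|" or "}}".
# Names starting with "#" (parser functions) are not matched.
TRANSCLUSION_RE = re.compile(r"\{\{\s*([^{}|#]+?)\s*(?:\||\}\})")


def resolve_transclusion(name, template_prefix="Predefinição"):
    """Map the name inside {{...}} to the title of the page it transcludes."""
    name = name.strip()
    if name.startswith(":"):
        # {{:Page}} transcludes a page from the main namespace.
        return name[1:].strip()
    prefix, sep, _rest = name.partition(":")
    if sep and prefix.strip().lower() in KNOWN_NAMESPACES:
        # {{User:Someone/box}} transcludes that exact page.
        return name
    # A bare name defaults to the Template namespace.
    return f"{template_prefix}:{name}"


def transcluded_pages(wikitext):
    """List the page titles transcluded by a piece of wikitext."""
    return [resolve_transclusion(m.group(1)) for m in TRANSCLUSION_RE.finditer(wikitext)]
```

For example, `transcluded_pages("{{Info|x=1}} {{:Lisboa}}")` resolves the first call to the Template namespace and the second to the article "Lisboa".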
Thank you for all the help. I didn't know that there were so many templates; I think I'll try to handle a few of the most common ones and discard the rest.
Erick
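The plan Erick describes (handle a few common templates, discard the rest) could be sketched roughly as below. This is a simplified illustration, not a full wikitext parser: it tracks `{{`/`}}` nesting by hand, splits arguments naively on `|`, and ignores the triple-brace `{{{param}}}` syntax found in template definitions. All function names and the handler for "Lang" are hypothetical.

```python
from collections import Counter


def normalize(name):
    """Template names are case-insensitive in their first letter only."""
    name = name.strip()
    return name[:1].upper() + name[1:]


def find_templates(wikitext):
    """Yield (start, end, name) for each top-level {{...}} span, nesting-aware."""
    depth = 0
    start = 0
    i = 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                body = wikitext[start + 2:i - 2]
                yield start, i, normalize(body.split("|", 1)[0])
        else:
            i += 1


def count_templates(pages):
    """Count top-level template calls across an iterable of page texts."""
    counts = Counter()
    for text in pages:
        for _start, _end, name in find_templates(text):
            counts[name] += 1
    return counts


def render(wikitext, handlers):
    """Expand templates that have a handler and drop every other template."""
    out = []
    pos = 0
    for start, end, name in find_templates(wikitext):
        out.append(wikitext[pos:start])
        handler = handlers.get(name)
        if handler is not None:
            # Naive argument split: breaks on nested templates or [[a|b]] links.
            args = wikitext[start + 2:end - 2].split("|")[1:]
            out.append(handler(args))
        pos = end
    out.append(wikitext[pos:])
    return "".join(out)
```

Running `count_templates` over the dump first and looking at `most_common(50)` gives a data-driven shortlist of templates worth writing handlers for.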
Hello Erick,
I use the same approach (scanning the dumps) for the "TemplateTiger" project: http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Vorlagenauswertung/en Maybe it will be helpful for your project.
If you want to parse the templates, then maybe you can use the API of the MediaWiki software. Every Wikimedia Foundation project has its own API.
See: http://www.mediawiki.org/wiki/API:Parsing_wikitext
For example: http://pt.wikipedia.org/w/api.php?action=parse&text=%7B%7BCommons%7CEins...
Hope this helps!
Stefan (sk)
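A request like the one Stefan links to could be issued from a script roughly as follows. This is a sketch under assumptions: it uses the `action=parse` module with `prop=text` and the default JSON layout, where the rendered HTML sits under `parse.text["*"]`; the function names and the User-Agent string are hypothetical. For long wikitext a POST request would be more appropriate than a query string.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the Portuguese Wikipedia API endpoint.
API_URL = "https://pt.wikipedia.org/w/api.php"


def build_parse_url(wikitext):
    """Build an action=parse request that renders a wikitext snippet."""
    params = {
        "action": "parse",
        "text": wikitext,
        "contentmodel": "wikitext",
        "prop": "text",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def extract_html(payload):
    """Pull the rendered HTML out of a decoded action=parse response."""
    return payload["parse"]["text"]["*"]


def expand(wikitext):
    """Render wikitext (templates included) to HTML; needs network access."""
    request = Request(
        build_parse_url(wikitext),
        headers={"User-Agent": "template-expander-sketch/0.1"},  # hypothetical
    )
    with urlopen(request) as response:
        return extract_html(json.load(response))
```

Calling the API once per template call is slow for a full dump, so this fits best as a way to expand a shortlist of templates or to spot-check a local handler against the real rendering.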
-------- Original Message --------
Date: Wed, 15 Feb 2012 10:26:21 -0200
From: Erick Fonseca erickrfonseca@gmail.com
To: wikipedia-l@lists.wikimedia.org
Subject: [Wikipedia-l] List of all wikipedia macros
Thank you for all the help. I didn't know that there were so many templates; I think I'll try to handle a few of the most common ones and discard the rest.
Erick