Hi
Your question is too vaguely formulated; please clarify it.
On Thu, Jan 12, 2012 at 2:37 PM, Sebastian Hellmann <hellmann@informatik.uni-leipzig.de> wrote:
Hello all, is there a query language for wiki syntax? (NOTE: I really do not mean the Wikipedia API here.)
I am looking for an easy way to scrape data from Wiki pages. In this way, we could apply a crowd-sourcing approach to knowledge extraction from Wikis.
There must be thousands of data scraping approaches. But is there one among them that has developed a "wiki scraper language"? Maybe with some sort of fuzziness involved, if the pages are too messy. I have not yet worked with the XML transformation of the wiki markup:
* action=expandtemplates
  * generatexml - Generate XML parse tree
Is it any good for issuing XPath queries?
1. XPath requires XML; MediaWiki markup is not XML.
2. The only application which (correctly!?) expands templates is MediaWiki itself.
3. You neglected to explain what you are trying to scrape and what constitutes a messy page.
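Point 1 can be made concrete. Raw wikitext is not XML, so an XPath engine cannot consume it directly, but the parse tree returned by `generatexml` is XML and can be queried. The sketch below is a minimal illustration under stated assumptions: the element names (`template`, `title`, `part`, `name`, `value`) and the `index` attribute for positional parameters are assumed to match the parse-tree format, both sample strings are hypothetical rather than fetched from a wiki, and `template_params` and `is_xml` are illustrative helpers. It relies only on the limited XPath subset built into Python's `xml.etree.ElementTree`; richer queries would need a full XPath 1.0 engine such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical parse tree, in the shape assumed for generatexml output:
# each <template> has a <title> and one <part> per parameter, and each
# <part> holds a <name> and a <value>. Not fetched from a real wiki.
SAMPLE_PARSE_TREE = """<root><template><title>Infobox city</title>
<part><name>name</name>=<value>Leipzig</value></part>
<part><name>country</name>=<value>Germany</value></part>
</template></root>"""

# The same template as raw wikitext, which is not well-formed XML.
SAMPLE_WIKITEXT = "{{Infobox city\n| name = Leipzig\n| country = Germany\n}}"


def _all_text(el):
    """Concatenate the text of an element and its descendants (None -> '')."""
    return "" if el is None else "".join(el.itertext()).strip()


def template_params(xml_text, template_title):
    """Return {parameter name: value} for templates titled template_title."""
    root = ET.fromstring(xml_text)
    params = {}
    # ".//template" is part of the limited XPath subset ElementTree supports:
    # it matches <template> elements at any depth below the root.
    for tmpl in root.findall(".//template"):
        if _all_text(tmpl.find("title")) != template_title:
            continue
        for part in tmpl.findall("part"):
            name_el = part.find("name")
            if name_el is None:
                continue
            # Assumption: positional parameters appear as <name index="1"/>.
            name = _all_text(name_el) or name_el.get("index", "")
            params[name] = _all_text(part.find("value"))
    return params


def is_xml(text):
    """Return True if text is well-formed XML, which XPath tools require."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(template_params(SAMPLE_PARSE_TREE, "Infobox city"))
    print(is_xml(SAMPLE_WIKITEXT))
```

Run as a script, this prints `{'name': 'Leipzig', 'country': 'Germany'}` and then `False`: the parse tree yields the infobox fields, while the raw wikitext is rejected by the XML parser before any XPath query could run.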
Thank you very much, Sebastian
--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org

Wikitext-l mailing list
Wikitext-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitext-l