On 4/5/2011 4:00 PM, Platonides wrote:
> I think he is better off parsing the articles, though.
> For linguistic research you don't need things such as the contents of templates, so a simple wikitext stripping would do. And it will be much, much, much, much faster than parsing the whole wiki.
Could be true, but what's fascinating to me about Wikipedia is all of the unscrambled eggs (structured data, such as the contents of templates) that can be found in the middle of otherwise unstructured text.
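For comparison, the "simple wikitext stripping" Platonides suggests might look roughly like the Python sketch below. It is a naive, regex-based approximation, not a MediaWiki parser, and the function name `strip_wikitext` is hypothetical rather than part of any library. It drops HTML comments, `<ref>` footnotes, and templates, keeps link labels, and removes bold/italic and heading markup. Tables, parser functions, and links nested inside image captions are not handled. It also discards exactly the template contents where the structured "eggs" live, which is the trade-off being debated here.

```python
import re


def strip_wikitext(text: str) -> str:
    """Hypothetical, naive wikitext-to-plain-text stripper (regex sketch)."""
    # Drop HTML comments.
    text = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)
    # Drop self-closing <ref .../> tags, then <ref>...</ref> footnotes.
    text = re.sub(r"<ref[^>/]*/>", "", text)
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)
    # Remove {{templates}} innermost-first, repeating until nothing changes,
    # so nested templates are removed too.
    prev = None
    while prev != text:
        prev = text
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # [[Target|label]] -> label, [[Target]] -> Target (keeps text after last |).
    text = re.sub(r"\[\[(?:[^\]]*\|)?([^\]]*)\]\]", r"\1", text)
    # [http://example.org label] -> label
    text = re.sub(r"\[https?://[^\s\]]+\s*([^\]]*)\]", r"\1", text)
    # Bold/italic markup ('' and ''').
    text = re.sub(r"'{2,}", "", text)
    # == Heading == -> Heading
    text = re.sub(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", r"\1", text, flags=re.MULTILINE)
    # Any remaining HTML-like tags.
    text = re.sub(r"<[^>]+>", "", text)
    return text


if __name__ == "__main__":
    sample = ("'''Paris''' is the [[capital city|capital]] of [[France]]."
              "{{citation needed}}<ref>Some source</ref>")
    print(strip_wikitext(sample))  # Paris is the capital of France.
```

Stripping like this is a single linear pass per regex over each article, which is why it is so much cheaper than rendering every page through the full parser with template expansion.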