Hi,

   If you want to do NLP research on enwiki and the lack of an NLP markup of Wikipedia is the bottleneck, you should look at the WIKI dataset just released by Chris Re's team at Stanford, built from a snapshot of enwiki from late January 2015. You can find this and other interesting datasets released by the team at http://deepdive.stanford.edu/doc/opendata/ (the data format is explained at the top of that page).
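
   In case it helps to see what working with such a dump might look like, here is a minimal Python sketch. It assumes, hypothetically, a tab-separated file with one sentence per row and PostgreSQL-style array columns such as words, lemmas, pos_tags, and ner_tags; the column names, their order, and the file name wiki.tsv are illustrative only, so check the format description on the page above and adjust to match.

# Hypothetical column layout; the real format is documented at the top of
# http://deepdive.stanford.edu/doc/opendata/ , so adjust names/order to match.
COLUMNS = ["doc_id", "sentence_id", "words", "lemmas", "pos_tags", "ner_tags"]
ARRAY_COLUMNS = ("words", "lemmas", "pos_tags", "ner_tags")

def parse_array(field):
    # Naive parse of a PostgreSQL-style array literal like '{The,quick,fox}';
    # it does not handle quoted elements that themselves contain commas.
    return field.strip("{}").split(",")

def read_sentences(path):
    # Yield one dict per sentence from a tab-separated dump file.
    with open(path, encoding="utf-8") as f:
        for line in f:
            rec = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
            for key in ARRAY_COLUMNS:
                rec[key] = parse_array(rec[key])
            yield rec

# Peek at the first three sentences.
for i, sent in enumerate(read_sentences("wiki.tsv")):
    print(sent["doc_id"], " ".join(sent["words"]))
    if i == 2:
        break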

   Generating the WIKI dataset took 24K machine hours. The team has access to more machine hours and is actively seeking feedback from the NLP community about which datasets to generate next. If you're interested in the recent release or have suggestions for other datasets the team could generate from publicly available data, please contact the team.

Best,
Leila