Hi there,
I'm looking for an efficient way to convert the WikiText in the downloaded data dumps (in XML) to plain text. I need the plain text of every revision of every Wikipedia article.
It would be very helpful if you could point me to a library or a piece of code (e.g. a set of regexes) that converts WikiText to plain text. By the way, I write my code in Python.
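To make the request concrete, here is a rough sketch of the kind of regex-based approach I have in mind. It is only a starting point under several assumptions: the function name `strip_wikitext` is made up for illustration, it uses only the standard `re` module, and it does not handle tables, image links with nested markup, or many other WikiText constructs.

```python
import re

# Matches an innermost template, i.e. {{...}} with no braces inside.
_TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")


def strip_wikitext(text):
    """Very rough WikiText-to-plain-text pass (hypothetical helper, not a full parser)."""
    # Drop HTML comments and <ref> footnotes (self-closing and paired).
    text = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)
    text = re.sub(r"<ref[^>]*?/>", "", text)
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)
    # Remove templates from the inside out so nested ones disappear too.
    while _TEMPLATE.search(text):
        text = _TEMPLATE.sub("", text)
    # [[target|label]] -> label, [[target]] -> target.
    text = re.sub(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]", r"\1", text)
    # [http://example.org label] -> label.
    text = re.sub(r"\[https?://\S+\s+([^\]]*)\]", r"\1", text)
    # Strip bold/italic quote markup.
    text = re.sub(r"'{2,}", "", text)
    # == Heading == -> Heading.
    text = re.sub(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", r"\1", text, flags=re.MULTILINE)
    # Drop any remaining HTML-style tags.
    text = re.sub(r"<[^>]+>", "", text)
    return text.strip()


if __name__ == "__main__":
    sample = "'''Python''' is a [[programming language|language]].{{citation needed}}"
    print(strip_wikitext(sample))
```

Something like this would be applied to the text of each revision in the dump, but I suspect it breaks on real articles, which is why I am hoping a proper library exists.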
Thanks.