Hi there,
I'm looking for an efficient way to convert the WikiText in the
downloaded data dumps (in XML) to plain text. I basically need the plain
text of every revision of each Wikipedia article.
It would therefore be very helpful if you could point me to a library
or a piece of code (e.g. a set of regexes) that converts WikiText to plain text.
BTW, I write my code in Python!
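To make the question concrete, here is a rough sketch of the kind of regex-based approach I have in mind. It is only an illustration: the function name strip_wikitext is made up, it uses nothing but the standard re module, and it covers only a few common constructs (comments, refs, templates, links, bold/italic, headings). It does not handle tables, parser functions, or file links with nested markup, so I assume a proper library would do much better.

```python
import re

# Hypothetical helper, not taken from any existing library.
_TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")

def strip_wikitext(text):
    # Drop HTML comments.
    text = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)
    # Drop self-closing refs first, then paired <ref>...</ref> blocks.
    text = re.sub(r"<ref\b[^>]*/>", "", text)
    text = re.sub(r"<ref\b[^>]*>.*?</ref>", "", text, flags=re.DOTALL)
    # Drop any remaining HTML-like tags but keep their inner text.
    text = re.sub(r"<[^>]+>", "", text)
    # Remove templates innermost-first; repeat until nested ones are gone.
    while _TEMPLATE.search(text):
        text = _TEMPLATE.sub("", text)
    # [[target|label]] -> label, [[target]] -> target.
    text = re.sub(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]", r"\1", text)
    # [http://example.org label] -> label, [http://example.org] -> "".
    text = re.sub(r"\[https?://[^\s\]]+\s*([^\]]*)\]", r"\1", text)
    # Remove bold/italic quote markers ('' and ''').
    text = re.sub(r"'{2,}", "", text)
    # == Heading == -> Heading.
    text = re.sub(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", r"\1", text,
                  flags=re.MULTILINE)
    return text.strip()

sample = ("'''Python''' is a [[programming language|language]] created by "
          "[[Guido van Rossum]].{{citation needed}}<ref>Some source</ref>")
print(strip_wikitext(sample))
```

Running something like this over every revision in the dump is what I would like to avoid writing and maintaining by hand, if a tested alternative exists.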
Thanks.