) is still broken at the
moment, but another way to build some word frequency data is to
randomly sample the wikis for the languages you are interested in.
At least these Indic languages have Wikipedias of varying sizes:
If you'd like to use it, I have a tool that downloads random samples of
wiki pages and strips the HTML for purposes such as this.
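
To give a rough idea of the approach (this is not the tool mentioned above, only a minimal sketch), random sampling plus HTML stripping could look something like the following in Python. It assumes the standard MediaWiki API modules list=random and action=parse; the function names, the User-Agent string and the tokenization rule (runs of letters and combining marks, plus ZWJ/ZWNJ) are illustrative assumptions, not anything taken from an existing tool.

```python
import json
import unicodedata
import urllib.parse
import urllib.request
from collections import Counter
from html.parser import HTMLParser

# Hypothetical User-Agent; replace with your own contact details.
USER_AGENT = "wordfreq-sketch/0.1 (example contact)"


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def strip_html(html):
    """Return the visible text of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    """Split text into words made of letters and combining marks.

    Python's \\w does not match Indic vowel signs or the virama, so the
    Unicode category is checked per character instead (assumption: L* and
    M* categories plus ZWNJ/ZWJ are word characters).
    """
    words, current = [], []
    for ch in text:
        if unicodedata.category(ch)[0] in "LM" or ch in "\u200c\u200d":
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


def api_get(lang, params):
    """Call the MediaWiki API of <lang>.wikipedia.org and decode the JSON."""
    query = urllib.parse.urlencode(dict(params, format="json", formatversion=2))
    url = "https://%s.wikipedia.org/w/api.php?%s" % (lang, query)
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def random_titles(lang, n=10):
    """Titles of n random main-namespace pages."""
    data = api_get(lang, {"action": "query", "list": "random",
                          "rnnamespace": 0, "rnlimit": n})
    return [page["title"] for page in data["query"]["random"]]


def sample_frequencies(lang, n=10):
    """Word counts over the rendered text of n random pages."""
    counts = Counter()
    for title in random_titles(lang, n):
        data = api_get(lang, {"action": "parse", "page": title, "prop": "text"})
        counts.update(tokenize(strip_html(data["parse"]["text"])))
    return counts
```

Calling sample_frequencies("hi", 10) would then return a Counter built from ten random Hindi Wikipedia articles. Only strip_html and tokenize run offline; the other functions need network access, and a larger sample should pause between requests.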
Andrew Dunbar (hippietrail)
On 14 December 2010 18:36, pravin.d.s(a)gmail.com <pravin.d.s(a)gmail.com> wrote:
I am Pravin Satpute. I am working on language technology, and to build a
list of words and their frequencies I need some web pages in Indic languages.
Can I get the most recent dump without en.wiki?
Wikitech-l mailing list