) is still broken at the
moment, but another way to build some word frequency data is by
randomly sampling the wikis for the languages you are interested in.
At least these Indic languages have Wikipedias of varying sizes:
Assamese
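One way to do the random sampling is through the MediaWiki API's `list=random` query module. Below is a minimal sketch that uses only the Python standard library. The helper names and the `User-Agent` string are hypothetical, chosen for illustration. Building the URL works offline, but `random_titles` needs network access.

```python
import json
import urllib.parse
import urllib.request


def random_pages_url(lang: str, limit: int = 10) -> str:
    """Build a MediaWiki API query for `limit` random article titles.

    `lang` is the wiki's language code, e.g. "as" for Assamese.
    (Hypothetical helper name, for illustration only.)
    """
    params = {
        "action": "query",
        "list": "random",
        "rnnamespace": 0,  # main (article) namespace only, no talk/user pages
        "rnlimit": limit,
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?" + urllib.parse.urlencode(params)


def random_titles(lang: str, limit: int = 10) -> list[str]:
    """Fetch random article titles (requires network access).

    The User-Agent value is a made-up placeholder; identify your own tool.
    """
    req = urllib.request.Request(
        random_pages_url(lang, limit),
        headers={"User-Agent": "wordfreq-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # The response lists pages under query.random, each with id, ns and title.
    return [page["title"] for page in data["query"]["random"]]
```

For example, `random_titles("as", 10)` would request ten random Assamese article titles. Repeating the call builds up a sample of the wiki.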
If it would help, I have a tool that downloads random samples of
wiki pages and strips the HTML for purposes such as this.
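The tool itself isn't shown in this thread, so what follows is not that tool. It is a minimal stand-in sketch in standard-library Python for the strip-the-HTML-then-count step, and it assumes the page HTML has already been downloaded. All function names are hypothetical. The tokenizer counts combining marks as part of a word. Python's regex `\w` does not match them, so a `\w+` split would cut Indic words apart at vowel signs and viramas.

```python
import unicodedata
from collections import Counter
from html.parser import HTMLParser

# Zero-width non-joiner/joiner are format characters (category Cf) but can
# occur inside words in several Indic scripts, so keep them in tokens.
_JOINERS = {"\u200c", "\u200d"}


class TextExtractor(HTMLParser):
    """Collect text nodes from HTML, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True (default) decodes &amp; etc.
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join with spaces so text from adjacent elements does not fuse together.
    return " ".join(parser.parts)


def tokenize(text: str) -> list[str]:
    # A word is a run of letters (L*), marks (M*), numbers (N*) or joiners.
    # Note: re's \w excludes marks (Mn/Mc), which is why it is not used here.
    words, current = [], []
    for ch in text:
        if unicodedata.category(ch)[0] in "LMN" or ch in _JOINERS:
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


def word_frequencies(html: str) -> Counter:
    # casefold() merges Latin-script case variants; it leaves Indic text as-is.
    return Counter(tokenize(strip_html(html).casefold()))
```

`word_frequencies(html).most_common(20)` then lists the twenty most frequent words on one page. Adding together the `Counter`s from many randomly sampled pages gives counts for the whole sample.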
Good luck!
Andrew Dunbar (hippietrail)
On 14 December 2010 18:36, pravin.d.s(a)gmail.com <pravin.d.s(a)gmail.com> wrote:
Hi All,
I am Pravin Satpute, and I work on language technology. To build a list of
words and their frequencies, I need some web pages in Indic languages.
Can I get the most recent dump without en.wiki?
Thanks,
Pravin s
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l