Hi
I would like to announce the publication of new datasets which make it
easy to build selections of Wikipedia articles. This data can be used by
any developer or tech-savvy person to create a subset of Wikipedia. You
can find the data here:
http://download.kiwix.org/wp1/ (or via FTP).
This data repository will be updated every month by a few scripts,
which are published here:
https://github.com/openzim/wp1_selection_tools. Of course, everything is
free software.
For each of the few hundred Wikipedia language editions, you will find
TSV tables there which contain the usual indicators of importance for
each article, such as the number of interlanguage links, the number of
links pointing to the article, pageviews, and so on, all gathered in one
file. For the English Wikipedia you additionally get the WikiProject
importance/quality assessments.
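
As a rough sketch of how one of these TSV tables could be read in
Python: the column names and their order below are hypothetical (they
are not taken from the actual files), and the two sample rows are made
up for illustration, so check a real file before relying on this layout.

```python
import csv
import io

# Hypothetical column layout: the real files may use a different order
# or different fields, so adjust this list after inspecting a file.
ASSUMED_COLUMNS = ["title", "pageviews", "inbound_links", "interlanguage_links"]


def read_indicators(stream, columns=ASSUMED_COLUMNS):
    """Yield one dict per article from a header-less TSV stream."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(columns):
            continue  # skip lines that do not match the assumed layout
        yield dict(zip(columns, row))


# Made-up sample rows standing in for a downloaded table.
sample = io.StringIO("Paris\t1200\t340\t250\nLyon\t300\t80\t120\n")
articles = list(read_indicators(sample))
```

With a real download, the in-memory sample would be replaced by a file
opened with encoding="utf-8".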
If you are really lazy, there is a "score" file which mixes all these
indicators into a single score per article. The methodology is
described here:
https://github.com/openzim/wp1_selection_tools. For
example, if you want the top 1000 articles of a Wikipedia, just take the
first thousand lines of the "score" file to get your list of articles.
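
Taking the first lines of the "score" file could look roughly like this
in Python. This is only a sketch: it assumes the file is already sorted
by descending score and that the article title is the first
tab-separated field on each line. The function name and the sample rows
are invented for the example.

```python
import io
from itertools import islice


def top_articles(stream, n=1000):
    """Return the titles found on the first n lines of a "score" file.

    Assumes the file is sorted by descending score and that the title
    is the first tab-separated field (both are assumptions).
    """
    titles = []
    for line in islice(stream, n):
        line = line.rstrip("\n")
        if line:
            titles.append(line.split("\t")[0])
    return titles


# Made-up sample standing in for a real "score" file.
sample = io.StringIO("Earth\t9000\nMoon\t8500\nSun\t8000\n")
top2 = top_articles(sample, n=2)
```

For a real selection, pass a file object opened with encoding="utf-8"
instead of the sample and keep the default n=1000.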
All this work was done to allow the creation of ZIM files containing the
top Wikipedia articles. It was also done to make possible the creation
of ZIM extension files, a concept we want to develop to improve our
WikiMed Android apps. Both of them will appear before the end of the year.
Stay tuned!
Regards
Emmanuel
--
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication