Dear Wikipedia and MediaWiki people,
For a few years I have maintained the web site wikipediatrends.com. For a variety of reasons I cannot do it any more, and the site will be closed in January. However, our DB of English Wikipedia pageviews since 2007 can be used for other projects. Anyone who wishes to get it, please see the info below.
Alex Druk, 12/12/2016 08:32:
For a few years I have maintained the web site wikipediatrends.com (http://wikipediatrends.com). For a variety of reasons I cannot do it any more, and the site will be closed in January. However, our DB of English Wikipedia pageviews since 2007 can be used for other projects. Anyone who wishes to get it, please see the info below.
Thanks. Can you please upload those files to the Internet Archive? You can use the internetarchive CLI (https://internetarchive.readthedocs.io/en/latest/cli.html#upload) with mediatype "data", collection "opensource", and subject "Wikipedia; enwiki".
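For example, here is a rough sketch using the internetarchive Python package (the same tool behind that CLI). The item identifier and title below are placeholders I made up; please check the call against the docs linked above:

    # pip install internetarchive, then run `ia configure` once to store credentials.
    from internetarchive import upload

    # Placeholder identifier; pick a unique, descriptive item name on archive.org.
    item = "wikipediatrends-enwiki-pageviews"
    files = [
        "rdd112016_1.tar.gz",
        "rdd112016_2.tar.gz",
        "articles112016.db",
    ]
    metadata = {
        "mediatype": "data",
        "collection": "opensource",
        "subject": "Wikipedia; enwiki",
        # Illustrative title only.
        "title": "wikipediatrends.com English Wikipedia pageview data",
    }

    # upload() returns one requests.Response per uploaded file.
    responses = upload(item, files=files, metadata=metadata)
    print([r.status_code for r in responses])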
Nemo
A few words about the DB. We keep the data in a separate file for each page. Each file is a CSV whose lines start with the year, followed by the pageviews for each day. The page name is MD5-encoded and used as the file name; the page names themselves are kept in a separate Berkeley DB file. The total size of the DB is about 30 GB, split into three archive files of roughly 10 GB each.

You can download the DB as of 12/03/2016 from:
https://s3-us-west-2.amazonaws.com/adrouk/november2016/rdd112016_1.tar.gz
https://s3-us-west-2.amazonaws.com/adrouk/november2016/rdd112016_2.tar.gz
https://s3-us-west-2.amazonaws.com/adrouk/november2016/articles112016.db

As of June 2015:
https://s3-us-west-2.amazonaws.com/adrouk/june2015/rdd62015_1.tar.gz
https://s3-us-west-2.amazonaws.com/adrouk/june2015/rdd62015_2.tar.gz
https://s3-us-west-2.amazonaws.com/adrouk/june2015/articles62015.db

Please do not hesitate to ask any questions about the DB. If by any chance you are interested in the site as well, please contact me off the list. Enjoy!
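To illustrate the layout, here is a rough sketch of reading one page's file in Python. It assumes the file name is the hex MD5 of the UTF-8 page title and that each line is the year followed by comma-separated daily counts; both details, and the exact title normalization fed to MD5, should be checked against the actual data (the articles*.db file should also be readable with the bsddb3 package or db_dump, though I have not checked which way the title-to-hash mapping goes).

    import csv
    import hashlib
    import os

    DATA_DIR = "rdd112016"      # placeholder: wherever the tarballs were extracted
    title = "Albert Einstein"   # example page title

    # Assumed naming scheme: hex MD5 of the page title.
    filename = hashlib.md5(title.encode("utf-8")).hexdigest()

    # Assumed line format: year, then one pageview count per day of that year.
    with open(os.path.join(DATA_DIR, filename), newline="") as f:
        for row in csv.reader(f):
            year = int(row[0])
            daily = [int(v) for v in row[1:] if v]
            print(year, sum(daily), "pageviews")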
Thank you.
Alex Druk, PhD
wikipediatrends.com
alex.druk@gmail.com
(775) 237-8550 (Google Voice)
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Thanks for the release, Alex. I am sorry to see this resource go, but I agree that the data will be of great interest to researchers and app developers.
In terms of how to best store the data and metadata for long-term preservation and discoverability, my recommendation is to use an open data registry where you can describe the dataset, make it citable and discoverable, add metadata and assign the entry a unique and persistent identifier.
Services like Zenodo (https://zenodo.org/) or figshare (https://figshare.com/) are good options for this; figshare is the one we've used for our data releases at WMF, see for example the clickstream dataset (https://dx.doi.org/10.6084/m9.figshare.1305770.v21).
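For anyone who wants to script the deposit, here is a rough sketch of the Zenodo REST API flow (create a deposition, upload a file, attach metadata, publish to mint a DOI). The endpoints follow Zenodo's developer docs as I understand them; the token, file name, and metadata values are placeholders to verify there:

    import requests

    ZENODO = "https://zenodo.org/api"
    TOKEN = "YOUR-ACCESS-TOKEN"   # personal token with deposit permissions
    params = {"access_token": TOKEN}

    # 1. Create an empty deposition.
    r = requests.post(ZENODO + "/deposit/depositions", params=params, json={})
    r.raise_for_status()
    dep = r.json()

    # 2. Upload a file into the deposition's file bucket.
    with open("rdd112016_1.tar.gz", "rb") as fp:
        requests.put(dep["links"]["bucket"] + "/rdd112016_1.tar.gz",
                     data=fp, params=params).raise_for_status()

    # 3. Attach minimal metadata; upload_type "dataset" makes the entry a citable dataset.
    meta = {"metadata": {
        "title": "wikipediatrends.com English Wikipedia pageview data",
        "upload_type": "dataset",
        "description": "Daily per-article pageviews for English Wikipedia since 2007.",
        "creators": [{"name": "Druk, Alex"}],
    }}
    requests.put(ZENODO + "/deposit/depositions/%d" % dep["id"],
                 params=params, json=meta).raise_for_status()

    # 4. Publish; Zenodo then assigns the persistent DOI.
    requests.post(ZENODO + "/deposit/depositions/%d/actions/publish" % dep["id"],
                  params=params).raise_for_status()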
Dario