Thanks for the release, Alex. I am sorry to see this resource go but agree the data will be of great interest to researchers / app developers.

In terms of how to best store the data and metadata for long-term preservation and discoverability, my recommendation is to use an open data registry where you can describe the dataset, make it citable and discoverable, add metadata and assign the entry a unique and persistent identifier. 

Services like Zenodo or figshare (which we've used for our data releases at WMF; see, for example, the clickstream dataset) are good options for this.

Dario

On Sun, Dec 11, 2016 at 11:53 PM, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
Alex Druk, 12/12/2016 08:32:
For a few years I have maintained a web site, wikipediatrends.com. For a
variety of reasons I cannot do it any more and the site will be closed in
January.
However, our DB of English Wikipedia pageviews from 2007 can be used for
other projects. Anyone who wishes to get it, please see the info below.

Thanks. Can you please upload those files to the Internet Archive? You can use the upload CLI (https://internetarchive.readthedocs.io/en/latest/cli.html#upload) with mediatype "data", collection "opensource" and subject "Wikipedia; enwiki".
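
For anyone scripting that upload, here is a minimal Python sketch that only assembles the command line for the `ia` tool with the three metadata values suggested above. The function name and the item identifier in the usage note are hypothetical placeholders, and the sketch assumes the `ia` CLI from the internetarchive package is installed and already configured with credentials.

```python
def build_ia_upload_command(identifier, files):
    """Build the argument list for an `ia upload` call.

    `identifier` is the Internet Archive item name (chosen by the uploader);
    `files` is a list of local file paths to attach to that item.
    """
    # Metadata values taken from the suggestion in this thread.
    metadata = (
        ("mediatype", "data"),
        ("collection", "opensource"),
        ("subject", "Wikipedia; enwiki"),
    )
    command = ["ia", "upload", identifier, *files]
    for key, value in metadata:
        # Each metadata pair is passed as its own --metadata=key:value option.
        command.append(f"--metadata={key}:{value}")
    return command
```

The returned list can be passed to subprocess.run(command, check=True), for example with a made-up identifier such as "wikipediatrends-pageviews-2016" and the downloaded archive files as the file list.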

Nemo

A few words about the DB. We keep data in a separate file for each page. Each
file is a CSV with lines starting with the year, followed by pageviews for
each day. The page name is MD5-hashed and used as the name of the file. Page
names are in a separate Berkeley DB file. The total size of the DB is about
30 GB. It is in 3 archive files of about 10 GB each.
You can download the DB as of 12/03/2016 from:
https://s3-us-west-2.amazonaws.com/adrouk/november2016/rdd112016_1.tar.gz
https://s3-us-west-2.amazonaws.com/adrouk/november2016/rdd112016_2.tar.gz
https://s3-us-west-2.amazonaws.com/adrouk/november2016/articles112016.db
As of June 2015:
https://s3-us-west-2.amazonaws.com/adrouk/june2015/rdd62015_1.tar.gz
https://s3-us-west-2.amazonaws.com/adrouk/june2015/rdd62015_2.tar.gz
https://s3-us-west-2.amazonaws.com/adrouk/june2015/articles62015.db
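
To illustrate the file layout described above (one CSV file per page, named by the MD5 of the page name, each line holding a year followed by daily pageviews), here is a rough Python sketch. Several details are assumptions not confirmed in this thread: that the file name is the hexadecimal MD5 digest of the UTF-8 page name, that fields are comma-separated, and that counts are integers. The function names are hypothetical, and the exact form of the page name (spaces versus underscores) would need to be checked against the Berkeley DB file.

```python
import csv
import hashlib


def page_file_name(page_title):
    """Return the assumed per-page file name: hex MD5 of the UTF-8 title."""
    return hashlib.md5(page_title.encode("utf-8")).hexdigest()


def parse_pageview_rows(lines):
    """Parse lines of the form 'year,views_day1,views_day2,...'.

    Returns a dict mapping each year to its list of daily pageview counts.
    Comma-separated integer fields are an assumption about the format.
    """
    views_by_year = {}
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        year = int(row[0])
        # Ignore empty trailing fields, if any.
        views_by_year[year] = [int(v) for v in row[1:] if v.strip()]
    return views_by_year
```

With an extracted archive, one would open the file whose name matches page_file_name(title) and pass the open file object to parse_pageview_rows.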
Please do not hesitate to ask any questions about the DB. If by any chance
you are interested in the site also, please contact me off the list.
Enjoy!

---
Thank you.

Alex Druk, PhD
wikipediatrends.com
alex.druk@gmail.com
(775) 237-8550 Google voice



_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l





--

Dario Taraborelli  Director, Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter