CC'ed xmldatadumps-l.
Hi,
2012/10/23 Dario Taraborelli dtaraborelli@wikimedia.org:
2012/10/23 James Forrester james@jdforrester.org:
On 22 October 2012 16:03, Hydriz Wikipedia admin@alphacorp.tk wrote:
I have long wanted to say this: would it be possible for the team compiling such datasets to put future (and, if possible, current) datasets on dumps.wikimedia.org, so that everything is easy to find in one place rather than scattered all over? Thanks for that!
Many one-off and regular datasets, from query results to data dumps and similar, are now indexed[0] on The Data Hub (formerly CKAN) run by the Open Knowledge Foundation for precisely this reason - so that data researchers can easily find data about Wikimedia, and see when it's updated.
The dumps server was never meant to become a permanent open data repository, but it started being used as an ad-hoc solution to host all sorts of datasets published by WMF on top of the actual XML dumps: that's the problem we're trying to fix.
Regardless of where the data is physically hosted, your go-to point to discover WMF datasets from now on is the DataHub. Think of it as a data registry: the registry is all you need to know in order to find where the data is hosted and to extract the appropriate metadata/documentation.
That's fine with me, but I think more communication about this would be welcome. I've added a link to meta:Data_dumps¹ and I'll announce it on the French Wikipedia, but a link on the dumps site's page for other downloads² would be great.
Most of the people I've helped find data on the Wikimedia projects now know about dumps.wikimedia.org, but AFAIK none of them reads wiki-research-l.
Best regards,
¹ https://meta.wikimedia.org/wiki/Data_dumps
² http://dumps.wikimedia.org/other/