cc-ed xmldatadumps-l
Hi,
2012/10/23 Dario Taraborelli <dtaraborelli(a)wikimedia.org>:
2012/10/23 James Forrester <james(a)jdforrester.org>:
On 22 October 2012 16:03, Hydriz Wikipedia
<admin(a)alphacorp.tk> wrote:
I have long been wanting to say this, but is it possible for the team behind
compiling such datasets to put future (and if possible, current) datasets
into dumps.wikimedia.org, so that it is easier for everyone to find stuff
and it's not all over the place? Thanks for that!
Many one-off and regular datasets, from query results to data dumps and
similar, are now indexed[0] on The Data Hub (formerly CKAN), run by the
Open Knowledge Foundation, for precisely this reason: so that data
researchers can easily find data about Wikimedia and see when it's
updated.
[0] http://thedatahub.org/en/group/wikimedia
The dumps server was never meant to become a permanent open data repository, but it
started being used as an ad-hoc solution to host all sorts of datasets published by WMF on
top of the actual XML dumps: that's the problem we're trying to fix.
Regardless of where the data is physically hosted, your go-to point to discover WMF
datasets from now on is the DataHub. Think of it as a data registry: the registry is all
you need to know in order to find where the data is hosted and to extract the appropriate
metadata/documentation.
That's fine by me, but more communication about this would be welcome.
I've added a link to meta:Data_dumps¹ and I'll announce this on the
French Wikipedia, but a link on the dumps page for other downloads²
would be great.
Most people I've helped to find data on the Wikimedia projects now know
about dumps.wikimedia.org, but AFAIK none of them reads
wiki-research-l.
Best regards,
¹ https://meta.wikimedia.org/wiki/Data_dumps
² http://dumps.wikimedia.org/other/
--
Jérémie