Hi Robert,
TBH I asked the question as NPOV as possible because I have my own bias. By stating it in general terms I hope that the conversation isn't forced in any particular direction.
There are technical limitations to reusing external datasets: how do you ensure that the external site cannot inject malicious code into visitors' browsers, how do you cache the data, and what happens when the source data changes or is no longer available? That doesn't mean they cannot be overcome. In general I also tend to prefer a complete data management solution, because that is what we do in all our projects: we only use files stored in Commons (images, videos, books, sound...), with no exceptions, or at least none that I know of.
OTOH, is it practical to import and standardize the data?
I appreciate your thoughts. If you could post them on the talk page too, that would be great. And if you think we should distinguish the three aspects of datasets more precisely, please feel free to edit the RFC so that we can address each of them individually.
Cheers, Micru
On Thu, May 15, 2014 at 9:48 PM, Robert Rohde rarohde@gmail.com wrote:
Micru,
There are several related aspects of datasets that I would enumerate as:
- Storing / archiving datasets
- Editing / manipulating datasets
- Using excerpts (e.g. specific data) from datasets
Each of these involves a different but related set of tools.
It isn't entirely clear to me, but I think the question you started with is aimed at how we might use excerpts from externally managed datasets. For example, having a way to pull data from CKAN and have it appear in a Wikipedia article? That would remove aspects one and two from immediate consideration, as someone else would be responsible for maintaining the data. On the other hand, the responses so far seem more aimed at aspect one, i.e. where / how Wikimedia would best store datasets.
Personally, I think all parts of the question are ultimately important, as I would love for Wikimedia to have a complete data management solution. But am I correct in thinking that you asked the question primarily out of a desire to think about how we could use externally managed datasets?
-Robert Rohde
On Thu, May 15, 2014 at 2:25 AM, David Cuenca dacuetu@gmail.com wrote:
Hi,
During the Zürich Hackathon I met several people who were looking for ways to integrate external open datasets into our projects (mainly Wikipedia and Wikidata). Since Wikidata is not the right tool to manage them (for reasons explained in the RFC, as discussed during the Wikidata session), I thought it would be useful to centralize the discussion about potential requirements, needs, and how to approach this new, changing landscape that didn't exist a few years ago.
You will find more details here
https://meta.wikimedia.org/wiki/Requests_for_comment/How_to_deal_with_open_d...
Your comments, thoughts and ideas are appreciated!
Cheers, Micru
_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe