Hi,
I work in the Strategic Partnerships team at the Wikimedia Foundation, and I'm in initial conversations with the World Bank and several other large NGOs about using their open data sets in our projects.
The World Bank maintains a large data set of statistics on countries, and is ready to start a pilot test with us. They suggested that we look for a sample of specific indicators that are missing from Wikidata/Wikipedia and that could either be linked to or imported into Wikidata. Examples could range from basic indicators like population to more specific data like “% of country's population with access to water”.
I'm looking for technical help to work with our contacts at the World Bank. For instance, what is the best way to:
*compare the World Bank's indicators with Wikidata's properties, and see what we are missing today that would be interesting to collect, either in Wikidata or directly through templates in Wikipedia
**pull/connect that content from the World Bank into our servers
This is what is available today: http://data.worldbank.org/indicator/all
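As a starting point for discussion, the two steps above could be sketched roughly as follows. This is only a minimal sketch: the API base URL, the shape of the JSON response, and the indicator codes are assumptions that are not stated anywhere in this thread and would need to be checked with our World Bank contacts. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: the World Bank's public JSON API lives at this base URL.
# It is not mentioned in the thread, so verify it before relying on it.
WB_API = "http://api.worldbank.org/v2"


def indicator_url(country_code, indicator_code):
    """Build a request URL for one country/indicator pair."""
    return f"{WB_API}/country/{country_code}/indicator/{indicator_code}?format=json"


def parse_observations(payload):
    """Turn a decoded API response into (date, value) pairs, skipping gaps.

    Assumes the response is a two-element list: paging metadata first,
    then a list of records that carry "date" and "value" keys.
    """
    if not isinstance(payload, list) or len(payload) < 2 or payload[1] is None:
        return []
    return sorted(
        (rec["date"], rec["value"])
        for rec in payload[1]
        if rec.get("value") is not None
    )


def missing_indicators(wb_indicators, mapped_to_wikidata):
    """Return indicator codes that have no Wikidata property mapped yet."""
    return sorted(set(wb_indicators) - set(mapped_to_wikidata))


def fetch_observations(country_code, indicator_code):
    """Fetch and parse one series (needs network access)."""
    with urllib.request.urlopen(indicator_url(country_code, indicator_code)) as resp:
        return parse_observations(json.load(resp))
```

The comparison step is deliberately just a set difference: the hard part is curating the mapping from indicator codes to Wikidata properties by hand, which is a community task rather than a coding one.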
I also welcome your advice on starting this project in line with community processes and standards. Could https://www.wikidata.org/wiki/Wikidata:WikiProject_Economics serve as a starting point?
Thank you and happy Hackathon for those in Lyon!
Sylvia
How about:
Sylvia
-- Sylvia Ventura Strategic Partnerships Wikimedia Foundation sventura@wikimedia.org
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Just skimming through the list of World Bank data, it looks like most of this would require unit support for quantity properties, which is not yet available. (See https://phabricator.wikimedia.org/T77977 . Unit support is currently listed in the development plan right after access for remaining sister projects and arbitrary access, so I would assume it's not that far off.) Until this is ready, we can't add any data that is measured in square kilometers, dollars, kilowatt hours, metric tons, hectares, kilograms, years, etc.
Of the rest of the data, much of it seems to be too specific for how data is normally entered, and some might be difficult to reasonably add simply due to the limited number of statements the software appears to be able to handle at the moment.
That leaves things like total population, urban/rural populations, mortality rates, battle-related deaths (though we might want to first find out how a negative number of people died in Syria two years ago... (?)), migration, trademark applications, and many others.
(None of this actually answers your request, but I just wanted to point out the current limitations.)
On 22 May 2015 at 22:08, Sylvia Ventura sventura@wikimedia.org wrote:
I'm looking for technical help to work with our contacts at the World Bank. For instance, what is the best way to:
*compare the World Bank's indicators with Wikidata's properties, and see what we are missing today that would be interesting to collect, either in Wikidata or directly through templates in Wikipedia
**pull/connect that content from the World Bank into our servers
It seems to me that the biggest single useful thing the WB (and any other open data publisher) could do would be to include Wikidata IDs (as URIs) in its linked data. For instance, if it refers to "Qatar", it should do so with the URI "https://www.wikidata.org/wiki/Q846"; if it refers to "cotton", it should use "https://www.wikidata.org/wiki/Q11457".
It should always use the most precise URI available, and it should provide a feedback mechanism for errors to be reported.
If it publishes tables or lists of its own identifiers, it should include Wikidata equivalences; and we should work with them to map them in Wikidata and to fill any gaps. If it uses third parties' identifiers, it should encourage those third parties to do likewise.
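To make the idea of published equivalences concrete, here is a minimal sketch of a crosswalk table and a gap report. Only the Qatar item (Q846) comes from this thread; the "code" field, the "XX" code, and the function name are hypothetical and stand in for whatever identifiers the publisher actually uses.

```python
# Hypothetical crosswalk from a publisher's own country codes to Wikidata
# item IDs. Only Q846 (Qatar) is taken from the thread.
CROSSWALK = {"QA": "Q846"}


def attach_wikidata_ids(rows, crosswalk):
    """Add a "wikidata_id" field to each row and collect unmapped codes.

    Rows without a known equivalence get None, so gaps stay visible and
    can be reported back for mapping instead of being silently dropped.
    """
    linked, unmapped = [], set()
    for row in rows:
        qid = crosswalk.get(row["code"])
        if qid is None:
            unmapped.add(row["code"])
        linked.append({**row, "wikidata_id": qid})
    return linked, sorted(unmapped)
```

The second return value is the list of gaps, which is the part both sides would need to work through together.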
I would really stress that these data have three dimensions:
a) quantity
b) objects
c) time
The dimensions could perhaps be extended to four, five, or six; these are cubes (https://en.wikipedia.org/wiki/OLAP_cube).
The discussion could go on as long as we want, but if a data structure cannot work with multiple dimensions, it will be hard for it to host statistical data.
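A minimal sketch of what such a multi-dimensional structure could look like, assuming each value is addressed by one coordinate per dimension. The class, its method names, and the sample figures are all illustrative placeholders, not real data or an existing API.

```python
class Cube:
    """Values addressed by one coordinate per named dimension."""

    def __init__(self, dimensions):
        self.dimensions = tuple(dimensions)
        self.cells = {}

    def _key(self, coords):
        # Every cell must name exactly the cube's dimensions.
        if set(coords) != set(self.dimensions):
            raise KeyError(f"expected coordinates for {self.dimensions}")
        return tuple(coords[d] for d in self.dimensions)

    def put(self, value, **coords):
        """Store one value at the given coordinates."""
        self.cells[self._key(coords)] = value

    def slice(self, **fixed):
        """Return all cells whose coordinates match the fixed dimensions."""
        index = {d: i for i, d in enumerate(self.dimensions)}
        return {
            key: value
            for key, value in self.cells.items()
            if all(key[index[d]] == wanted for d, wanted in fixed.items())
        }


# Illustrative placeholder figures only.
cube = Cube(["quantity", "object", "time"])
cube.put(100, quantity="population", object="Q846", time=2013)
cube.put(110, quantity="population", object="Q846", time=2014)
cube.put(5, quantity="area", object="Q846", time=2014)
```

Adding a fourth or fifth dimension only means passing a longer list of dimension names; slicing on any subset of them keeps working unchanged.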
Regards
On 27.05.2015 10:31, Andy Mabbett wrote:
It seems to me that the biggest single useful thing the WB (and any other open data publisher) could do would be to include Wikidata IDs (as URIs) in its linked data. For instance, if it refers to "Qatar", it should do so with the URI "https://www.wikidata.org/wiki/Q846"; if it refers to "cotton", it should use "https://www.wikidata.org/wiki/Q11457".
Yes, that would be great. However, please use the URIs, e.g.,
http://www.wikidata.org/entity/Q846
rather than the HTML page URLs (such as http://www.wikidata.org/wiki/Q846). The URIs provide content negotiation and deliver data to crawlers (the RDF data exported will soon increase significantly and include all data). When called in the browser (which asks for HTML content), the URIs will redirect to the HTML pages, so they work well for users too.
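A minimal sketch of how a publisher could normalize page URLs to entity URIs and ask for data rather than HTML. The function names are hypothetical, and which media types the server actually honours (JSON is used here) is an assumption to verify.

```python
import re
import urllib.request

ENTITY_BASE = "http://www.wikidata.org/entity/"

# Accepts either form for an item: .../wiki/Q846 or .../entity/Q846.
_ITEM_URL = re.compile(r"^https?://www\.wikidata\.org/(?:wiki|entity)/(Q\d+)$")


def to_entity_uri(url):
    """Rewrite an item's HTML page URL (or entity URI) to its entity URI."""
    match = _ITEM_URL.match(url.strip())
    if match is None:
        raise ValueError(f"not a Wikidata item URL: {url!r}")
    return ENTITY_BASE + match.group(1)


def data_request(entity_uri, media_type="application/json"):
    """Build (but do not send) a request that asks for data, not HTML.

    The Accept header is what drives content negotiation; the default
    media type is an assumption, not something stated in this thread.
    """
    return urllib.request.Request(entity_uri, headers={"Accept": media_type})
```

A browser sends an Accept header that prefers HTML, which is why the same URI ends up on the item page for human readers.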
Regards,
Markus
After having a chance to chat (at different times and in different conversations) with Sylvia and Lydia, I find this project very interesting. Being able to feed Wikidata with referenced data gathered periodically from the World Bank database sounds useful already, but I think we could take this initiative as a pilot for how to work with these types of organizations.
The point where we are today with organizations hosting open data is similar to our relationship with galleries, libraries, archives, and museums before GLAM or Wikipedians in residence existed. Many discussions, new precedents, tools, and processes were needed to reach the point where we are today in GLAM, and there is still so much to do, but everybody agrees that the effort is clearly worth it.
Today we lack the platform features, tools, and processes needed to visit organizations with large open datasets (or to host their visits), hook into their APIs, and retrieve their interesting data. The World Bank is a potentially good use case: they provide many of the numbers used in the infoboxes of country articles, which are updated manually in hundreds of Wikipedias (and in Wikidata); they are actively interested in collaborating; and the Wikimedia Foundation can contribute some of the "overhead" on partner relations and, if needed, project management. With time, and if this pilot progresses well, Sylvia's team and whoever wants to be involved could start knocking on other doors. Who knows, maybe the first generation of Wikidata-scientists in residence is not that far off? :)
I'm tempted to propose a #World-Bank-Data project in Phabricator, if only to lay down the basic plan, define the dependencies with Wikidata et al., and start discussing the specifics that can be discussed today... but I don't want to run faster than needed, so I'll wait until more people (and especially Sylvia and Lydia) think it's a good idea.
An example of a task in that project would be
On Wed, May 27, 2015 at 11:16 AM, Markus Krötzsch markus@semantic-mediawiki.org wrote:
On 27.05.2015 10:31, Andy Mabbett wrote:
It seems to me that the biggest single useful thing the WB (and any other open data publisher) could do would be to include Wikidata IDs (as URIs) in its linked data. For instance, if it refers to "Qatar", it should do so with the URI "https://www.wikidata.org/wiki/Q846"; if it refers to "cotton", it should use "https://www.wikidata.org/wiki/Q11457".
Yes, that would be great. However, please use the URIs, e.g.,
Interesting, although I guess we will be in a better position to propose this when we have some plan to import their data. Meanwhile, just for the sake of refining this idea: note that WB already has http://data.worldbank.org/country/qatar, and it would make sense for them to link there in the first place. If you were on http://data.worldbank.org/country/qatar, where would you expect a link to https://www.wikidata.org/entity/Q846, and with what kind of message? Or are you suggesting something else?
I agree with this point. I am personally seeing GLAMs that can donate their huge metadata datasets even though the content itself is protected by copyright, because their digital assets are the metadata and not the content.
I am speaking mainly about digital archives of media.
For this reason a Wikidata-scientist is already a reality, at least in GLAMs. This is not the future; it is a current need.
In any case, it would be useful if the platform features, tools, and processes also included APIs, or improvements to existing APIs, to make it easier for these Wikidata-scientists/WiR to import such data.
Regards
Sorry for being late to this great thread - I am definitely interested in pursuing the integration of Wikidata workflows with those of organizations hosting in-scope data. That was actually the gist of the Wiki4R proposal [1], and while it did not get funded, I think it contains a number of nuclei for further activities in this space to build on.
As for partners beyond the World Bank, I'd be happy to help pilot something here at NIH beyond the Gene Wiki's [2] Wikidata activities (which are NIH-funded). We already have about 20 NIH-related properties [3].
Cheers, d.
[1] https://www.wikidata.org/wiki/Wikidata:WikiProject_Wikidata_for_research/EIN...
[2] https://en.wikipedia.org/wiki/User:ProteinBoxBot
[3] https://www.wikidata.org/wiki/Template:NIH_properties
On 27 May 2015 at 10:16, Markus Krötzsch markus@semantic-mediawiki.org wrote:
please use the URIs, e.g.,
http://www.wikidata.org/entity/Q846
rather than the HTML page URLs (such as http://www.wikidata.org/wiki/Q846).
Good point; thank you.