I have noticed a lack of actual data in the Wikidata representations of Wikipedia lists.
For example, the Wikipedia list
List of countries by tax revenue to GDP ratio https://en.wikipedia.org/wiki/List_of_countries_by_tax_revenue_to_GDP_ratio
is represented on Wikidata by
List of countries by tax revenue as percentage of GDP (Q2529105) https://www.wikidata.org/wiki/Q2529105
Is there currently any development in the Wikidata community to transform these lists into Wikibase items and, last but not least, produce RDF representations for WDQS?
best, Marco
Hi Marco,
I agree that many of these lists and tables could be harvested (with some care, of course).
However, I don't think that the information they contain should go to the Wikidata item they are associated with. This Wikidata item mostly exists to store inter-language links, but is poorly connected to the rest of the knowledge graph. This tax revenue and GDP information should go to the country items themselves.
I am working on the problem of extraction of statements from lists and tables and will write a tutorial when the tools are ready. WikidataCon attendees might have a glimpse of that in this session: https://www.wikidata.org/wiki/Wikidata:WikidataCon_2017/Submissions/OpenRefi...
Cheers, Antonin
On 17/10/2017 12:07, Marco Neumann wrote:
2017-10-17 16:30 GMT+02:00 Antonin Delpeuch (lists) <lists@antonin.delpeuch.eu>:
Hi Marco,
I agree that many of these lists and tables could be harvested (with some care, of course).
Some already are (for example, tables of monuments are harvested into Heritage http://tools.wmflabs.org/heritage/, which is now imported into Wikidata).
However, I don't think that the information they contain should go to the Wikidata item they are associated with. This Wikidata item mostly exists to store inter-language links, but is poorly connected to the rest of the knowledge graph. This tax revenue and GDP information should go to the country items themselves.
In this case, yes, absolutely. But in other cases, the data could go directly into the associated item (see, for instance, this graph: https://fr.wikipedia.org/wiki/Pont-l%27%C3%A9v%C3%AAque#Production).
I am working on the problem of extraction of statements from lists and tables and will write a tutorial when the tools are ready. WikidataCon attendees might have a glimpse of that in this session: https://www.wikidata.org/wiki/Wikidata:WikidataCon_2017/Submissions/OpenRefine_demo
Great, I'll be there ;)
Regards, ~nicolas
We have been doing this for the Wiki Loves Monuments Wikipedia list articles; here is some info:
https://www.wikidata.org/wiki/Wikidata:WikiProject_WLM
https://www.wikidata.org/wiki/Wikidata:WikiProject_WLM/Mapping_tables
Thanks,
John
On 17 October 2017 at 16:42, Nicolas VIGNERON vigneron.nicolas@gmail.com wrote:
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Great, I will be on site as well, so we will have a little more time to discuss this in detail.
There is a better alternative for storing lists: https://www.mediawiki.org/wiki/Help:Tabular_Data. It allows you to store a CSV-like table of data on Commons, with localized columns, and access it from all other wikis via <graph> and Lua scripts.
A good example is the "per state GDP" page; see the graph in the upper right corner.
Page: List_of_U.S._states_by_GDP https://en.wikipedia.org/wiki/List_of_U.S._states_by_GDP
Data: GDP_by_state.tab https://commons.wikimedia.org/wiki/Data:Bea.gov/GDP_by_state.tab
Wikidata is not very well suited for list data, but tabular data was designed for relatively large lists (up to 2 MB).
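For reference, a Tabular Data page is a JSON document. The sketch below builds one in Python, assuming the layout described on Help:Tabular_Data (a `license`, a localized `description`, `schema.fields` and `data` rows) and reading "up to 2 MB" as a cap on the serialized JSON size. The column names and rows are hypothetical placeholders, not the real GDP_by_state.tab content.

```python
import json

# Sketch of a Commons Tabular Data (.tab) page body. Assumes the JSON layout
# from Help:Tabular_Data; columns and rows here are hypothetical placeholders.
MAX_BYTES = 2 * 1024 * 1024  # assumed reading of "up to 2 MB" as serialized size


def make_tab_page(fields, rows, description_en):
    """Build a .tab-style dict, rejecting rows whose width differs from the schema."""
    for row in rows:
        if len(row) != len(fields):
            raise ValueError(f"row {row!r} does not have {len(fields)} columns")
    return {
        "license": "CC0-1.0",  # assumed: the license string Commons expects
        "description": {"en": description_en},
        "schema": {"fields": fields},
        "data": rows,
    }


def fits_size_limit(page):
    """Serialize as compact UTF-8 JSON and compare with the assumed 2 MB cap."""
    encoded = json.dumps(page, ensure_ascii=False, separators=(",", ":")).encode("utf-8")
    return len(encoded) <= MAX_BYTES


fields = [
    {"name": "state", "type": "string", "title": {"en": "State"}},
    {"name": "gdp", "type": "number", "title": {"en": "GDP"}},
]
rows = [["Example state A", 100.0], ["Example state B", 250.5]]
page = make_tab_page(fields, rows, "Hypothetical GDP by state")
print(fits_size_limit(page))  # prints True
```

Note that the `state` cells are plain strings, not item IDs.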
On Tue, Oct 17, 2017 at 12:46 PM, Marco Neumann marco.neumann@gmail.com wrote:
When you say "wikidata is not well suited for lists data", do you refer to Wikibase or WDQS here?
The Data:Bea.gov/GDP by state.tab page above is certainly a good representation for efficient delivery (via JSON) and display of data, but inefficient for further data sharing without URIs.
On Tue, Oct 17, 2017 at 9:08 PM, Yuri Astrakhan yuriastrakhan@gmail.com wrote:
--
Marco Neumann KONA
When you say "wikidata is not well suited for lists data", do you refer to Wikibase or WDQS here?
Wikibase, per Daniel K.
The Data:Bea.gov/GDP by state.tab page above is certainly a good representation for efficient delivery (via JSON) and display of data, but inefficient for further data sharing without URIs.
What do you mean by "further sharing without URIs"?
Btw, there is a Wikidata property to link to the .map data pages. I'm not sure about .tab pages, but it might also work.
On Wed, Oct 18, 2017 at 12:50 PM, Yuri Astrakhan yuriastrakhan@gmail.com wrote:
What do you mean by "further sharing without URIs"?
If a cell is identified by a string rather than a URI (as is the case in the example above), disambiguation is necessary and error-prone.
For example, "Berlin" could be any of:
wd:Q64 wd:Q4579913 wd:Q5932836
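Marco's point can be made concrete with a small resolver. This is a sketch, not an existing tool: the candidate table holds only the three items listed above for "Berlin" and stands in for a real search against Wikidata, and `http://www.wikidata.org/entity/` is what the `wd:` prefix expands to in WDQS.

```python
import re

# Hypothetical resolver contrasting string cells with identifier cells. The
# candidate list for "Berlin" is the one given in this thread; a real tool
# would query Wikidata instead of this hard-coded dict.
CANDIDATES = {"Berlin": ["Q64", "Q4579913", "Q5932836"]}
QID = re.compile(r"Q[1-9][0-9]*")
WD = "http://www.wikidata.org/entity/"  # what the wd: prefix expands to


def resolve_cell(cell):
    """Return one entity URI, or None if the cell is ambiguous or unknown."""
    if cell.startswith(WD):
        cell = cell[len(WD):]
    if QID.fullmatch(cell):
        return WD + cell  # already an identifier: nothing to guess
    matches = CANDIDATES.get(cell, [])
    if len(matches) == 1:
        return WD + matches[0]
    return None  # zero or several candidates: needs human review


print(resolve_cell("Q64"))     # prints http://www.wikidata.org/entity/Q64
print(resolve_cell("Berlin"))  # prints None
```

An identifier cell resolves without any lookup, while a label cell depends on whatever candidate source is available and can still fail.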
I like the idea of storing tables in Commons, but for now I am still using Wikidata to store the lists I upload, because:
* tabular data is not integrated with WDQS as far as I know
* the tabular data format is quite poor compared to things like https://www.w3.org/TR/tabular-metadata/
* it is not clear to me how this new Wikibase datatype should be used by the community. Should we create tabular counterparts of many existing properties? Should we have one generic tabular property for use as a qualifier of a traditional property?
I find the feature very promising, but for now it is still in its infancy. I don't see how I could use it for edits like this one: https://www.wikidata.org/w/index.php?title=Q37461404&diff=578074181&...
Antonin
The idea with .tab is to treat it like a blob from the Wikibase perspective, just like an image. There was some discussion about importing it into WDQS, but that hasn't progressed much.
The headers use non-localizable IDs (Latin characters, digits and underscores only), but the format supports titles, i.e. localized names for the IDs.
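A sketch of these header rules in Python: each column carries a non-localizable `name` plus an optional `title` map of localized labels. The regular expression reads "Latin characters" as ASCII letters and additionally forbids a leading digit, and the English fallback is a hypothetical display choice; neither is taken from the actual Commons validator.

```python
import re

# Sketch of the .tab header rules described above. Assumptions: "Latin chars"
# means ASCII letters, IDs may not start with a digit, and English is the
# display fallback. This is not the actual Commons validator.
FIELD_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def bad_field_ids(fields):
    """Return the column IDs that break the assumed ID rule."""
    return [f["name"] for f in fields if not FIELD_ID.fullmatch(f["name"])]


def column_title(field, lang, fallback="en"):
    """Pick the localized title, then the fallback language, then the raw ID."""
    titles = field.get("title", {})
    return titles.get(lang) or titles.get(fallback) or field["name"]


fields = [
    {"name": "state", "type": "string", "title": {"en": "State", "fr": "État"}},
    {"name": "gdp_2016", "type": "number", "title": {"en": "GDP (2016)"}},
    {"name": "état", "type": "string"},  # non-ASCII letter: rejected by the assumed rule
]
print(bad_field_ids(fields))          # prints ['état']
print(column_title(fields[0], "fr"))  # prints État
print(column_title(fields[1], "fr"))  # prints GDP (2016)
```

Keeping IDs stable and pushing all translation into `title` means code that reads the table never depends on the reader's language.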
The .tab format is a subset of a well-known standard, but I can't find the link to the discussion with one of the standard's authors. I think it was http://specs.frictionlessdata.io/table-schema/
The biggest problem is the missing UI to edit it.
On Wed, Oct 18, 2017, 07:03 Antonin Delpeuch (lists) <lists@antonin.delpeuch.eu> wrote:
Hi,
What bothers me is that when the individual values correspond to Wikidata items or statements, we have to include all the labels again. It is not really effective this way.
Thanks,
GerardM
On 18 October 2017 at 13:35, Yuri Astrakhan yuriastrakhan@gmail.com wrote:
Hi!
When you say "wikidata is not well suited for lists data", do you refer to Wikibase or WDQS here?
Wikidata is not good for storing list data, or any serial data. WDQS can produce all kinds of amazing lists via queries, but it is not primary data storage. In principle it could store series data, but since it is based on Wikidata and fed from it, that creates certain issues when the data is not very suitable for Wikidata.
The Data:Bea.gov/GDP by state.tab page above is certainly a good representation for efficient delivery (via JSON) and display of data, but inefficient for further data sharing without URIs.
The question of querying data like "GDP by state.tab" is an interesting one. I'm not sure whether a triple store would be a good medium, but maybe it could be. The idea needs some research.
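One way to explore this research question is to flatten a .tab table into N-Triples that any triple store can load. The sketch below is purely hypothetical: the `example.org` base IRI, the one-resource-per-row layout and the column-per-predicate naming are assumptions, and nothing here reflects an actual WDQS import.

```python
# Hypothetical sketch: flatten .tab rows into N-Triples. The base IRI and the
# row/column naming scheme are assumptions, not an existing WDQS mapping.
BASE = "http://example.org/tab/GDP_by_state/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(value):
    """Render a cell as an N-Triples literal with an XSD datatype for numbers."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, float):
        return f'"{value!r}"^^<{XSD}double>'
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def rows_to_ntriples(fields, rows):
    """One subject per row, one predicate per column; null cells are skipped."""
    names = [field["name"] for field in fields]
    lines = []
    for i, row in enumerate(rows):
        subject = f"<{BASE}row/{i}>"
        for name, value in zip(names, row):
            if value is None:
                continue
            lines.append(f"{subject} <{BASE}column/{name}> {literal(value)} .")
    return "\n".join(lines)


fields = [{"name": "state", "type": "string"}, {"name": "gdp", "type": "number"}]
rows = [["Example state A", 100], ["Example state B", 250.5]]
print(rows_to_ntriples(fields, rows))
```

String cells such as state names come out as plain literals, so linking them to `wd:` items would still need the disambiguation step Marco describes.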
Hi Stas and Antonin,
Regarding triple storage of list-like data...
In fact, the primary motivation for developing OpenRefine was to have an importer tool for list-like data to be uploaded into Freebase. I had tons of difficulty with Freebase's earlier importer tool, which did not allow much flexibility, and I was adamant and vocal in my complaints to Freebase staff to "give us better importing tools". OpenRefine was born from those discussions and from working with Freebase staff to develop and design Gridworks (later Google Refine, now OpenRefine).
Lists are just rows of many individual facts or statements that have to be aligned against a schema. So a schema alignment dialog, like the one OpenRefine had for the Freebase schema, will be important for absorbing lists, aligning them, and uploading them into Wikidata's triple store. The schema alignment dialog was the core feature for which the previous Freebase importer tool lacked a sufficiently fluid UI/UX.
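To make the idea concrete, here is a rough Python sketch of what a schema alignment step produces: a declarative mapping from columns to a subject, a main property and qualifiers, applied row by row so that, as Antonin suggested, values land on the country items rather than on the list item. It is not OpenRefine's implementation; the column names, the placeholder property IDs and the sample rows are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a schema alignment step: a declarative mapping from
# table columns to subject, main property and qualifiers. "P_TAX_RATIO" and
# "P_POINT_IN_TIME" are placeholders, not real Wikidata property IDs.


@dataclass(frozen=True)
class Statement:
    subject: str            # reconciled item ID for the row
    prop: str               # main property
    value: object           # main value
    qualifiers: tuple = ()  # (qualifier property, value) pairs


@dataclass
class Alignment:
    subject_column: str
    value_column: str
    prop: str
    qualifier_columns: dict = field(default_factory=dict)  # column -> qualifier property

    def apply(self, header, rows):
        """Turn each complete row into one Statement; incomplete rows are skipped."""
        idx = {name: i for i, name in enumerate(header)}
        needed = [self.subject_column, self.value_column, *self.qualifier_columns]
        missing = [c for c in needed if c not in idx]
        if missing:
            raise KeyError(f"columns not in table: {missing}")
        statements = []
        for row in rows:
            subject = row[idx[self.subject_column]]
            value = row[idx[self.value_column]]
            if not subject or value in (None, ""):
                continue  # unreconciled subject or empty value: do not guess
            quals = tuple((p, row[idx[c]]) for c, p in self.qualifier_columns.items())
            statements.append(Statement(subject, self.prop, value, quals))
        return statements


header = ["country", "tax_to_gdp", "year"]
rows = [["Q_EXAMPLE_A", 10.0, 2015], ["", 20.0, 2015]]  # second row is unreconciled
alignment = Alignment("country", "tax_to_gdp", "P_TAX_RATIO", {"year": "P_POINT_IN_TIME"})
for statement in alignment.apply(header, rows):
    print(statement)
```

Declaring the mapping once and applying it to every row is what lets a whole list be uploaded consistently; the skipped row shows where reconciliation still has to happen first.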
It worked fantastically with Freebase, and I do not see any reason why it couldn't be done for Wikidata to simplify the absorption of lists into Wikidata.
Antonin, do you plan to eventually work on the schema alignment dialog for uploading data back to Wikidata as well, to complete the circle of life, "take and give"?
Thad +ThadGuidry https://www.google.com/+ThadGuidry
On 19/10/2017 12:28, Thad Guidry wrote:
It worked fantastically with Freebase, and I do not see any reason why it couldn't be done for Wikidata to simplify the absorption of lists into Wikidata.
Antonin, do you plan to eventually work on the schema alignment dialog for uploading data back to Wikidata as well, to complete the circle of life, "take and give"?
Yes, a good part of that is implemented, but there is still some work to do before a release.
Antonin