Hello Everyone,
My name is Gabriel. I'm a Ph.D. student researching the relevance and quality of internal references in Wikidata (that is, references that are not external URLs).
My first objective is to locate these references, and for that I wanted to ask you all for help.
Help:Sources (https://www.wikidata.org/wiki/Help:Sources) mentions that sources normally come through either the 'reference URL' property (I assume for external references) or the 'stated in' property (I assume for internal references).
So, my question is: apart from the 'stated in' property, what other properties are used for internal references?
My goal is to parse these references, match them with their claims and see, for each claim-reference pair, how relevant the reference is.
Thank you!
Kind regards,
Gabriel Maia (http://gabrielmaia7.github.io/)
Data Scientist and Researcher
gabrielmaiarocha@gmail.com | +44 7472 546312
https://facebook.com/maiarocg | https://www.linkedin.com/in/maiarocg | https://twitter.com/maiarocg
Hi,
Many different properties are used in the reference "space", but 'reference URL' and 'stated in' are the two main ones. The third important one is 'imported from Wikimedia project', which is the most used, but it is not a real reference per se (more a token for traceability). Then there are many properties that make a reference more precise (such as 'page(s)' P304 or 'retrieved' P813, the most common ones). Finally, there are probably a lot of errors and mistakes.
In a nutshell, you can safely assume that only the property 'stated in' matters.
Cheers, ~nicolas
PS: I tried to write a SPARQL query to see which properties are used, but it times out :/
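The roles described above can be sketched in Python. This is only an illustration, not an official classification: the IDs P248 ('stated in'), P854 ('reference URL') and P143 ('imported from Wikimedia project') are filled in here and are not given in this thread, and the function name and sample reference are hypothetical. The reference is assumed to have the shape used in Wikidata's JSON export, where "snaks" maps each property ID to a list of snaks.

```python
# Roles of reference properties, following the description above.
# P248, P854 and P143 are IDs added for this sketch (not stated in the
# thread); P304 and P813 are the ones mentioned in the message.
SOURCE_PROPS = {"P248": "stated in", "P854": "reference URL"}
PROVENANCE_PROPS = {"P143": "imported from Wikimedia project"}
DETAIL_PROPS = {"P304": "page(s)", "P813": "retrieved"}


def summarize_reference(reference):
    """Group the property IDs of one reference by the role they play."""
    summary = {"source": [], "provenance": [], "detail": [], "other": []}
    for prop in reference.get("snaks", {}):
        if prop in SOURCE_PROPS:
            summary["source"].append(prop)
        elif prop in PROVENANCE_PROPS:
            summary["provenance"].append(prop)
        elif prop in DETAIL_PROPS:
            summary["detail"].append(prop)
        else:
            summary["other"].append(prop)
    return summary


# Hypothetical reference: 'stated in' some item, plus page and retrieval date.
example = {"snaks": {"P248": [{}], "P304": [{}], "P813": [{}]}}
print(summarize_reference(example))
```

A reference whose "source" list is empty but whose "provenance" list is not would be one of the 'imported from Wikimedia project' tokens mentioned above.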
_______________________________________________
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
On Mon, Jan 20, 2020, 9:06 PM Nicolas VIGNERON, vigneron.nicolas@gmail.com wrote
In a nutshell, you can safely assume that only the property 'stated in' matters.
I think 'inferred from' (P3452) should also be considered as an internal reference.
On Mon, 20 Jan 2020 at 14:14, Eugene Alvin Villar seav80@gmail.com wrote:
I think 'inferred from' (P3452) should also be considered as an internal reference.
Probably, but this property is not often used.
I managed to get a query to work: https://w.wiki/FqH (number of uses of each of the 76 properties used in the reference space for items located in Paris; the last part restricts the query to a small sample to avoid the timeout. The SPARQL code could maybe be optimized; I just wanted an overview and to confirm my assumption that only a few properties are widely used.)
Cheers, ~nicolas
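For readers who cannot open the short link, here is a rough sketch of what such a counting query might look like, wrapped in Python. It is not the query behind https://w.wiki/FqH, whose text is not reproduced in this thread. It assumes that "located in Paris" means P131 with value Q90, that the standard Wikidata Query Service prefixes (wd:, wdt:, pr:, prov:) are available, and that results come back in the standard SPARQL JSON format; the function names and the default sample size are hypothetical.

```python
def reference_property_query(place_qid="Q90", limit_items=1000):
    """Build a SPARQL query counting the properties used in references.

    Only a sample of items located in `place_qid` (via P131) is scanned,
    which keeps the query small enough to have a chance of finishing.
    """
    return f"""
SELECT ?prop (COUNT(*) AS ?uses) WHERE {{
  {{ SELECT ?item WHERE {{ ?item wdt:P131 wd:{place_qid} . }} LIMIT {limit_items} }}
  ?item ?claim ?statement .
  ?statement prov:wasDerivedFrom ?ref .
  ?ref ?prop ?value .
  FILTER(STRSTARTS(STR(?prop), STR(pr:)))
}}
GROUP BY ?prop
ORDER BY DESC(?uses)
""".strip()


def parse_counts(result_json):
    """Turn a SPARQL JSON result into a list of (property ID, uses) pairs."""
    rows = []
    for binding in result_json["results"]["bindings"]:
        # The property arrives as a full URI; keep only the trailing "P..." part.
        prop_id = binding["prop"]["value"].rsplit("/", 1)[-1]
        rows.append((prop_id, int(binding["uses"]["value"])))
    return rows


print(reference_property_query())
```

The query string can be pasted into the query service as is; `parse_counts` only matters when the JSON response is fetched programmatically.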
Here's a version of the query with labels for the properties, making its output a bit more readable: https://w.wiki/Fr9
Note that for external references, it's very common just to give the identifier in the external database, via the appropriate property (perhaps accompanied by a "retrieved" = <date>, but usually not).
-- James.
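References of this kind can be spotted without a list of identifier properties, because each snak in Wikidata's JSON export carries a "datatype" field. The sketch below assumes that JSON shape; the function name is hypothetical, and the sample uses P214 (VIAF ID) with a placeholder value purely for illustration.

```python
def external_id_snaks(reference):
    """Return (property ID, identifier) pairs for external-id snaks in a reference."""
    pairs = []
    for prop, snaks in reference.get("snaks", {}).items():
        for snak in snaks:
            # Skip 'somevalue' / 'novalue' snaks, which carry no datavalue.
            if snak.get("datatype") == "external-id" and snak.get("snaktype") == "value":
                pairs.append((prop, snak["datavalue"]["value"]))
    return pairs


# Hypothetical reference consisting only of an external identifier.
example = {
    "snaks": {
        "P214": [{
            "snaktype": "value",
            "property": "P214",
            "datatype": "external-id",
            "datavalue": {"value": "PLACEHOLDER-ID", "type": "string"},
        }]
    }
}
print(external_id_snaks(example))
```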
Hi Gabriel, it's Gabriel :)
If the query service is timing out on you and you'd like to work with a raw Wikidata JSON dump, you might find this package I've been working on useful.
https://qwikidata.readthedocs.io/en/stable/index.html
You could use it to pull out all of the reference/source statements. There is an example of accessing references in the "entity" section of the docs:
https://qwikidata.readthedocs.io/en/stable/entity.html
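To give an idea of what working from the raw dump involves, here is a minimal sketch that uses only the Python standard library. It does not reproduce the qwikidata API (see the docs linked above for that). It relies on the dump being one large JSON array with one entity per line; the function names and the file name in the usage comment are hypothetical.

```python
import json


def iter_entities(lines):
    """Yield entity dicts from the lines of a Wikidata JSON dump."""
    for line in lines:
        # Each entity sits on its own line, followed by a comma; the first
        # and last lines of the file are just "[" and "]".
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def iter_references(entity):
    """Yield (statement ID, reference dict) pairs for one entity."""
    for statements in entity.get("claims", {}).values():
        for statement in statements:
            for reference in statement.get("references", []):
                yield statement["id"], reference


# Usage with a compressed dump (hypothetical file name):
#   import gzip
#   with gzip.open("wikidata-dump.json.gz", "rt", encoding="utf-8") as f:
#       for entity in iter_entities(f):
#           for statement_id, reference in iter_references(entity):
#               ...
```

Reading line by line keeps memory use flat, which matters because the full dump is far too large to load with a single `json.load` call.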
Thank you very much for all your answers everyone.
I'm also trying to get a solid understanding of the Wikidata data model: how references are stored and structured, how revisions are stored, etc.
My goal is to download the historical dump from Wikidata and parse it into tables, to analyse the references and their edits through time.
Can anyone point me to a resource which explains it?
Best regards,
Gabriel Maia Data Scientist and Developer
gabrielmaiarocha@gmail.com gabrielmaia7.github.io +55 85 99430 5370
Sent from the tiny tiny keys of my mobile phone
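As a partial illustration of the structure asked about here: in Wikidata's JSON export, an entity has a "claims" object keyed by property ID; each statement in it has an "id", a "mainsnak" and an optional "references" list; and each reference has a "hash" and a "snaks" object keyed by property ID. The sketch below flattens one entity into table rows on that basis. The function name, column layout and sample entity are hypothetical, and the layout of revisions in the history dumps is not covered.

```python
def reference_rows(entity):
    """Flatten one entity into rows, one per reference snak.

    Each row is (entity ID, statement ID, claim property, reference hash,
    reference property, raw value).
    """
    rows = []
    for claim_prop, statements in entity.get("claims", {}).items():
        for statement in statements:
            for reference in statement.get("references", []):
                for ref_prop, snaks in reference.get("snaks", {}).items():
                    for snak in snaks:
                        # 'somevalue' / 'novalue' snaks have no datavalue.
                        value = snak.get("datavalue", {}).get("value")
                        rows.append((entity["id"], statement["id"], claim_prop,
                                     reference["hash"], ref_prop, value))
    return rows


# Hypothetical entity with a single referenced statement.
example = {
    "id": "Q1",
    "claims": {
        "P31": [{
            "id": "Q1$abc",
            "mainsnak": {"snaktype": "value", "property": "P31"},
            "references": [{
                "hash": "h1",
                "snaks": {"P854": [{
                    "snaktype": "value",
                    "property": "P854",
                    "datavalue": {"value": "https://example.org/", "type": "string"},
                }]},
            }],
        }]
    },
}
for row in reference_rows(example):
    print(row)
```

Keeping the statement ID and the reference hash in every row makes it possible to join these rows back to a table of claims, which is what matching each reference with its claim requires.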
On Mon, Jan 20, 2020, 19:08 Gabriel Altay gabriel.altay@gmail.com wrote:
Hi Gabriel, it's Gabriel :)
If the query service is timing out on you and you'd like to work with a raw Wikidata JSON dump, you might find this package I've been working on useful.
https://qwikidata.readthedocs.io/en/stable/index.html
You could use it to pull out all of the reference/source statements. There is an example of accessing references in the "entity" section of the docs:
https://qwikidata.readthedocs.io/en/stable/entity.html