Hi all,
the newest version of the LOD cloud is just a week old,[1] but the underlying information for Wikidata is out of date by several years.[2]
Has anyone looked into streamlining the data submission process?
Best,
Daniel
[1] https://lod-cloud.net/
[2] https://lod-cloud.net/dataset/wikidata
On Wed, 5 Aug 2020 at 22:12, Daniel Mietchen daniel.mietchen@googlemail.com wrote:
the newest version of the LOD cloud is just a week old,[1] but the underlying information for Wikidata is out of date by several years.[2]
Has anyone looked into streamlining the data submission process?
Yes; here's the discussion from 2018:
https://lists.wikimedia.org/pipermail/wikidata/2018-April/011988.html
Lucas Werkmeister was handling this from the Wikidata Dev Team's side:
https://lists.wikimedia.org/pipermail/wikidata/2018-May/012042.html
Of particular interest are the link counts, based on formatter URI for RDF resource (P1921).
There are 167 property - P1921 combinations: https://w.wiki/YsV
Calculating the number of links times out when run for all properties: https://w.wiki/Ysk So, an iterative script is likely more practical.
But basically, the query just counts the number of items with a property, but limited by the properties that have a P1921 defined. Surely, that can be done more efficiently.
Egon
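A minimal sketch of the iterative approach, in Python: rather than one query over all properties, it first lists the properties that have a P1921 value and then sends one small count query per property. The endpoint URL, the User-Agent string, the pause between requests and all function names are assumptions made for illustration; this is not the query behind the short links above, and it counts matching statements rather than distinct items.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumed public endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Lists every property that has a "formatter URI for RDF resource" (P1921).
PROPERTIES_QUERY = "SELECT ?property ?formatter WHERE { ?property wdt:P1921 ?formatter . }"


def count_query(property_id: str) -> str:
    """Build a query counting the (item, value) pairs for one property."""
    if not (property_id.startswith("P") and property_id[1:].isdigit()):
        raise ValueError(f"not a property ID: {property_id!r}")
    return f"SELECT (COUNT(?item) AS ?count) WHERE {{ ?item wdt:{property_id} ?value . }}"


def run_query(query: str) -> dict:
    """Send one SPARQL query and return the parsed JSON result."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "lod-link-count-sketch/0.1 (illustrative example)",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def property_ids(result: dict) -> list[str]:
    """Extract the distinct property IDs from a PROPERTIES_QUERY result."""
    ids = {row["property"]["value"].rsplit("/", 1)[-1] for row in result["results"]["bindings"]}
    return sorted(ids, key=lambda pid: int(pid[1:]))


def link_counts(pause_seconds: float = 1.0) -> dict[str, int]:
    """Run one count query per property so that no single query has to cover them all."""
    counts = {}
    for pid in property_ids(run_query(PROPERTIES_QUERY)):
        bindings = run_query(count_query(pid))["results"]["bindings"]
        counts[pid] = int(bindings[0]["count"]["value"])
        time.sleep(pause_seconds)  # be gentle with the shared endpoint
    return counts
```

Calling `link_counts()` needs network access; a single heavily used property could still time out on its own, in which case it would have to be handled separately.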
On Wed, Aug 5, 2020 at 11:30 PM Andy Mabbett andy@pigsonthewing.org.uk wrote:
On Wed, 5 Aug 2020 at 22:12, Daniel Mietchen daniel.mietchen@googlemail.com wrote:
the newest version of the LOD cloud is just a week old,[1] but the
underlying information for Wikidata is out of date by several years.[2]
Has anyone looked into streamlining the data submission process?
Yes; here's the discussion from 2018:
https://lists.wikimedia.org/pipermail/wikidata/2018-April/011988.html
Lucas Werkmeister was handling this from the Wikidata Dev Team's side:
https://lists.wikimedia.org/pipermail/wikidata/2018-May/012042.html
-- Andy Mabbett @pigsonthewing http://pigsonthewing.org.uk
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
On Thu, Aug 6, 2020 at 10:58 AM Egon Willighagen egon.willighagen@gmail.com wrote:
Of particular interest are the link counts, based on formatter URI for RDF resource (P1921).
There are 167 property - P1921 combinations: https://w.wiki/YsV
Calculating the number of links times out when run for all properties: https://w.wiki/Ysk So, an iterative script is likely more practical.
But basically, the query just counts the number of items with a property, but limited by the properties that have a P1921 defined. Surely, that can be done more efficiently.
Egon
Hey :)
Adam got the current numbers for us at https://phabricator.wikimedia.org/P12190 Anyone up for helping put those in?
Cheers Lydia
On Fri, Aug 14, 2020 at 3:46 PM Lydia Pintscher Lydia.Pintscher@wikimedia.de wrote:
Hey :)
Adam got the current numbers for us at https://phabricator.wikimedia.org/P12190 Anyone up for helping put those in?
Oh and one thing I forgot: It would be pretty helpful for further automating this if we had the mapping between Wikidata's property and the LODCloud entries. Anyone up for proposing a new property for that?
Cheers Lydia
I'll have a go at it.
Egon
On Fri, Aug 14, 2020 at 3:51 PM Lydia Pintscher < Lydia.Pintscher@wikimedia.de> wrote:
On Fri, Aug 14, 2020 at 3:46 PM Lydia Pintscher Lydia.Pintscher@wikimedia.de wrote:
Hey :)
Adam got the current numbers for us at
https://phabricator.wikimedia.org/P12190
Anyone up for helping put those in?
Oh and one thing I forgot: It would be pretty helpful for further automating this if we had the mapping between Wikidata's property and the LODCloud entries. Anyone up for proposing a new property for that?
Cheers Lydia
-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata
Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
Proposed: https://www.wikidata.org/wiki/Wikidata:Property_proposal/Generic#Linked_Open...
Egon
On Sat, Aug 15, 2020 at 8:38 AM Egon Willighagen egon.willighagen@gmail.com wrote:
I'll have a go at it.
Egon
-- Have you heard about Wikidata already? "Use Scholia and Wikidata to find scientific literature" is a new tutorial from my colleague Lauren Dupuis. https://laurendupuis.github.io/Scholia_tutorial/
E.L. Willighagen Department of Bioinformatics - BiGCaT Maastricht University (http://www.bigcat.unimaas.nl/) Homepage: http://egonw.github.com/ Blog: http://chem-bla-ics.blogspot.com/ PubList: https://www.zotero.org/egonw ORCID: 0000-0001-7542-0286 http://orcid.org/0000-0001-7542-0286 ImpactStory: https://impactstory.org/u/egonwillighagen
On Sat, Aug 15, 2020 at 9:06 AM Egon Willighagen egon.willighagen@gmail.com wrote:
Proposed: https://www.wikidata.org/wiki/Wikidata:Property_proposal/Generic#Linked_Open...
Thank you! :)
Cheers Lydia
On Sat, Aug 15, 2020 at 9:06 AM Egon Willighagen egon.willighagen@gmail.com wrote:
Proposed: https://www.wikidata.org/wiki/Wikidata:Property_proposal/Generic#Linked_Open...
Egon
And we now have the Property \o/ https://www.wikidata.org/wiki/Property:P8605
Cheers Lydia
Ah, awesome!
I've been very busy and hope someone will beat me to it, but I was thinking of doing some web scraping and preparing a Mix'n'Match data set... but we can all also just do a few more manually :)
Egon
Hurrah ~__~
Might we find a faster process for adding properties for identifiers that cross some threshold of prominence + use? S
Hi SJ,
I think we already have that with https://www.wikidata.org/wiki/Property_talk:P973 in some fashion?
This is to be used to provide links to reliable external resources that are not the item's official website, when no relevant "authority control" property exists (note: this should be moved to the property statements)
Scroll down on that link and look at its listed patterns that eventually get added to the TODO list. (look at the other *See also* properties it lists as well - which I often use)
NOTE: I'm not an expert on this particular P973 property of Wikidata however.
Thad https://www.linkedin.com/in/thadguidry/
On Wed, Sep 16, 2020 at 10:34 AM Samuel Klein meta.sj@gmail.com wrote:
Hurrah ~__~
Might we find a faster process for adding properties for identifiers that cross some threshold of prominence + use? S
All,
On Wed, Sep 16, 2020 at 3:54 PM Lydia Pintscher < Lydia.Pintscher@wikimedia.de> wrote:
And we now have the Property \o/ https://www.wikidata.org/wiki/Property:P8605
When the property was approved, a fifth example was added for a Bio2RDF subset in the LOD cloud, linked to the Bio2RDF entry in Wikidata itself (and not to a Wikidata item for the subset). That would be possible, but then bio2rdf-pubchem (and many, many more) will be linked to that Wikidata item for Bio2RDF. Is that what we want? Or should bio2rdf-pubchem be linked to the PubChem entry in Wikidata? Etc.?
What do you think?
Egon
Hi all,
a question here:
P8605 is shown as a string, e.g. as "doi" in https://www.wikidata.org/wiki/Q5188229 , which is only the last path segment of the identifier. Shouldn't this be "http://lod-cloud.net/dataset/doi"?
At least for the LOD Cloud project.
-- Sebastian
I guess this is a matter of the downstream tools still needing to catch up. The formatter URL is defined: https://www.wikidata.org/wiki/Property:P8605#P1630
Egon
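To illustrate how a downstream tool could get from the stored string to the full identifier, here is a minimal Python sketch that substitutes the value into a formatter URL. The `$1` placeholder is the convention used by formatter URLs on Wikidata; the concrete `FORMATTER_URL` value and the function name are assumptions for illustration and should be checked against the P1630 statement on P8605.

```python
# Assumed formatter URL for P8605; verify against the property page before relying on it.
FORMATTER_URL = "https://lod-cloud.net/dataset/$1"


def expand_identifier(identifier: str, formatter_url: str = FORMATTER_URL) -> str:
    """Turn a stored identifier string such as "doi" into a full URL."""
    if "$1" not in formatter_url:
        raise ValueError("formatter URL has no $1 placeholder")
    # Plain substitution; no URL encoding is applied in this sketch.
    return formatter_url.replace("$1", identifier)


print(expand_identifier("doi"))
```

Storing only the last path segment keeps the statement stable if the base URL changes; only the formatter URL then needs updating.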
On Thu, Sep 17, 2020 at 8:52 AM Sebastian Hellmann < hellmann@informatik.uni-leipzig.de> wrote:
Hi all,
a question here:
P8605 is shown as string, e.g. as "doi" in https://www.wikidata.org/wiki/Q5188229 , which is the last path segment of the identifier, shouldn't this be "http://lod-cloud.net/dataset/doi" ?
At least for the LOD Cloud project.
-- Sebastian
On Wed, 16 Sep 2020 at 14:53, Lydia Pintscher Lydia.Pintscher@wikimedia.de wrote:
And we now have the Property \o/ https://www.wikidata.org/wiki/Property:P8605
Now in Mix'n'match:
https://mix-n-match.toolforge.org/#/catalog/3862
Awesome, thanks!
How was it done? I'd like to learn a bit more about the steps.
Egon
On Tue, 22 Sep 2020 at 10:13, Egon Willighagen egon.willighagen@gmail.com wrote:
On Tue, Sep 22, 2020 at 11:02 AM Andy Mabbett andy@pigsonthewing.org.uk wrote:
Now in Mix'n'match:
How was it done? I'd like to learn a bit more about the steps.
In this case, they have a JSON file, so I sent a link to that to Magnus, and he added it.
More generally, if I have the IDs and short descriptive text in a spreadsheet, I use:
https://mix-n-match.toolforge.org/mix-n-match/import.php
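As a rough sketch of the spreadsheet route, the Python below turns rows of (ID, name, short description) into tab-separated text that could be pasted into an import form. The three-column layout, the function names and the sample rows are assumptions made for illustration; check the import page for the columns it actually expects.

```python
def clean(text: str) -> str:
    """Collapse tabs, newlines and repeated spaces so a field stays on one line."""
    return " ".join(text.split())


def to_tab_separated(entries) -> str:
    """Render (identifier, name, description) rows as tab-separated lines."""
    lines = []
    for identifier, name, description in entries:
        fields = (clean(identifier), clean(name), clean(description))
        lines.append("\t".join(fields) + "\n")
    return "".join(lines)


# Hypothetical rows, made up for this example.
sample = [
    ("doi", "DOI", "Digital  object\nidentifiers"),
    ("wikidata", "Wikidata", ""),
]
print(to_tab_separated(sample), end="")
```

Cleaning whitespace first matters because a stray tab or line break inside a description would otherwise shift the columns.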
On Tue, 22 Sep 2020 at 10:01, Andy Mabbett andy@pigsonthewing.org.uk wrote:
Now in Mix'n'match:
This has highlighted an issue I've been concerned about for some time; the lack of granularity in property P1629, "subject item of this property".
For example, the data set has an entry for "National Library of Greece Authority Records", which corresponds to our property P3348 "National Library of Greece ID". Yet the P1629 on that property is for Q1467610, "National Library of Greece" - we have no item for the ID or its data set.
Of course, one institution can have many such ID types, and thus Wikidata properties.
There was quite strong opposition when I tried to create more items for sets of IDs (one successful example, for instance, is Q51044 of ORCID iDs). I think we will need to revisit that.
Andy,
It looks like P3348 is not actually filled out completely and a bit incorrectly to my eyes.
1. Wouldn't "National Library of Greece" be instead represented by "issued by" P2378, instead of P1629? Look at the properties for the type https://www.wikidata.org/wiki/Q18614948
2. Secondly, what is so important to know about the National Library of Greece Authority Records ID? That could all be captured on the property P3348 itself.

- It's an external identifier. We already know that through the authority assumption and ExternalId type.
- There are a few things to say about the identifier, such as what does it really represent? A work? A title? Both? Broadly any single item in their *holdings*? (Look at how https://www.wikidata.org/wiki/Q18609040 uses facet of P1269 to hold the representation "taxon" at a higher level for all properties that represent a "taxon", like https://www.wikidata.org/wiki/Property:P627 )
- Are the same IDs also held and included in a particular named database that often gets cited or referenced? Then we should say that they are included in that named database with some property (I don't know if we currently have one that works for that).
- Are the IDs constrained and only issued and generated for some holding type or set of collections of the National Library of Greece? For example, only official active holdings (not dropbox holdings that have not been vetted).
- Is there an email address or phone number for Authority Records questions? Then I'd add that to P3348 as well.
- What else is there to say about the IDs: what they represent, who issues them, etc.? Capture all the metadata of the authoritative IDs on the ID property itself.
I think your particular question is that around bullet point 3, where an authority might maintain 2 or 3 holding types or collection sets, and then issue sets of identifiers for those 2 or 3. Each holding type or collection set might be in a database that is named and known and often cited and has many things to say about the database itself, like who owns it, when it was first created, etc. Irrespective of all the sets of ID's that it might contain.
Thad https://www.linkedin.com/in/thadguidry/
On Tue, Sep 22, 2020 at 4:38 AM Andy Mabbett andy@pigsonthewing.org.uk wrote:
On Tue, 22 Sep 2020 at 10:01, Andy Mabbett andy@pigsonthewing.org.uk wrote:
Now in Mix'n'match:
This has highlighted an issue I've been concerned about for some time; the lack of granularity in property P1629, "subject item of this property".
For example, the data set has an entry for "National Library of Greece Authority Records", which corresponds to our property P3348 "National Library of Greece ID". Yet the P1629 on that property is for Q1467610, "National Library of Greece" - we have no item for the ID or its data set.
Of course, one institution can have many such ID types, and thus Wikidata properties.
There was quite strong opposition when I tried to create more items for sets of IDs (one successful example, for instance, is Q51044 of ORCID iDs). I think we will need to revisit that.
On Tue, 22 Sep 2020 at 14:47, Thad Guidry thadguidry@gmail.com wrote:
It looks like P3348 is not actually filled out completely and a bit incorrectly to my eyes.
- Wouldn't "National Library of Greece" be instead represented by "issued by" P2378, instead of P1629? Look at the properties for the type https://www.wikidata.org/wiki/Q18614948
I agree, but that's apparently not what the community has decided.
- Secondly, what is so important to know about the National Library of Greece Authority Records ID? that could all be captured on the property P3348 itself.
It could, but again I don't think that is how the community has chosen to model them.
Otherwise, what is P1629 for? Note the original description:
https://www.wikidata.org/wiki/Wikidata:Property_proposal/Archive/27#P1629
"item corresponding exactly to the concept represented by the property, if applicable."
I think your particular question is that around bullet point 3
What question? I wasn't aware that I asked one.
-- Andy Mabbett @pigsonthewing http://pigsonthewing.org.uk
How will we handle LOD entries for data sets provided by a third party?
For example, Bio2RDF has released quite a few data sets over time, each having a separate entry in the LOD cloud, e.g. Bio2rdf::DrugBank.
Do we create a separate Wikidata item for that? Or do we link it to the Wikidata item for DrugBank, so that one database can have more than one LOD Cloud ID?
What do you think?
Egon
On Tue, 22 Sep 2020 at 13:08, Egon Willighagen egon.willighagen@gmail.com wrote:
Bio2RDF released over time quite a few data sets, each having a separate entry in the LOD cloud, e.g. Bio2rdf::DrugBank?
Do we create a separate Wikidata item for that? Link it to the Wikidata item for Drugbank, so that one database can have more than one LOD Cloud ID?
I think we should have a separate entry for each entry in the LOD Cloud database; if necessary, sets of two or more of them can be grouped under a parent item.