Hey all,
I'd like to hear from you on a proposal to add some order and structure to the various bibliographic corpora we currently have in Wikidata.
As you may know, coverage of creative works in Wikidata has seen significant growth over the last year. [1][2] Different groups and projects have started importing source metadata for various reasons:
- to provide sources for machine-extracted statements (WikiFactMine [3], StrepHit [4])
- to represent sources cited in Wikipedia (e.g. DOIs and PMIDs imported via the mwcite identifier dumps) or other Wikimedia projects (Wikisource, Wikispecies, Wikinews)
- to create collections of the open access literature citable and reusable in Wikimedia projects (e.g. open access PMC review articles)
- to maintain small, curated corpora about specific topics (e.g. the Zika corpus [5])
Because all these efforts have grown organically and with little coordination, it's hard to keep track of who initiated them, to clearly communicate their purpose, to understand their completion criteria and their data quality needs, and last but not least to offer any contribution opportunities (in terms of code or manual labor) to other community members. It's unclear whether these efforts should continue to live within Wikidata or leverage the power of federated Wikibase-powered wikis (see our discussion at the end of the WikiCite session at WikidataCon [6]). Irrespective of the best long-term solution, we need to provide some better structure to these efforts today if we want to address the above problems.
I'd like to propose a fairly simple solution and hear your feedback on whether it makes sense to implement it as is or with some modifications.
1. create a Wikidata class called "Wikidata item collection" [Q-X]
2. create and document individual collections (e.g. the Wikidata Zika corpus [Q-Y]) as instances of this class: [Q-Y] --P31--> [Q-X]
3. add appropriate metadata to describe such collections (its main topic(s), creators, any external identifiers, if applicable)
4. mark individual bibliographic items as part of [P361] the corresponding collections
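To make steps 2 and 4 concrete, here is a minimal Python sketch that only builds the statements as plain tuples; it does not talk to Wikidata. "Q-X" and "Q-Y" are the unassigned placeholders from the list above, "Q-A" and "Q-B" are hypothetical bibliographic items, and the function names are invented for illustration.

```python
from typing import List, Tuple

# (subject, property, value), all written as plain identifier strings.
Triple = Tuple[str, str, str]

# Placeholder from the proposal: the "Wikidata item collection" class.
COLLECTION_CLASS = "Q-X"


def collection_statements(collection: str, members: List[str]) -> List[Triple]:
    """Return the statements for step 2 (P31) and step 4 (P361)."""
    # Step 2: the collection item is an instance of the collection class.
    statements: List[Triple] = [(collection, "P31", COLLECTION_CLASS)]
    # Step 4: every member item is marked as part of the collection.
    for member in members:
        statements.append((member, "P361", collection))
    return statements


def as_tab_separated(statements: List[Triple]) -> str:
    """Render one statement per line, with tab-separated fields."""
    return "\n".join("\t".join(statement) for statement in statements)


if __name__ == "__main__":
    # "Q-Y" stands for the Zika corpus item; the members are hypothetical.
    print(as_tab_separated(collection_statements("Q-Y", ["Q-A", "Q-B"])))
```

Step 3 (descriptive metadata on the collection item) is left out on purpose, since the proposal does not fix which properties would be used.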
Note that this approach can apply to bibliographic item collections but also to any other set of items not directly identifiable via Wikidata properties. Of course, the same items could be part of multiple collections. Some criteria would be needed to determine an appropriate threshold for legitimate collections (we wouldn't want arbitrary collections to be created for sets of items generated as part of a test import).
Beyond solving the issues listed above, this approach would also allow us to generate dedicated statistics on the growth or data quality of each collection via the SPARQL endpoint. It would also allow us to design constraints for arbitrary item collections, something that right now is not possible (unless these sets can already be identified via a query).
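As a rough illustration of the statistics point, the sketch below builds (but does not send) a SPARQL query that counts the items linked to one collection. It assumes membership is recorded with P361 as proposed above, and it relies on the wd: and wdt: prefixes that the Wikidata query service predefines. The function name is invented, and the QID in the usage line is an arbitrary placeholder, not a real collection.

```python
import re


def collection_size_query(collection_qid: str, membership_property: str = "P361") -> str:
    """Build a SPARQL query counting the items linked to one collection item."""
    # Reject anything that is not a plain QID / property ID, so that the
    # generated query text cannot be altered by a malformed argument.
    if not re.fullmatch(r"Q[1-9][0-9]*", collection_qid):
        raise ValueError(f"not a QID: {collection_qid!r}")
    if not re.fullmatch(r"P[1-9][0-9]*", membership_property):
        raise ValueError(f"not a property ID: {membership_property!r}")
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {\n"
        f"  ?item wdt:{membership_property} wd:{collection_qid} .\n"
        "}"
    )


if __name__ == "__main__":
    # Placeholder QID, used only to show the shape of the generated query.
    print(collection_size_query("Q4115189"))
```

Passing a different property ID as the second argument would adapt the same query to another way of recording membership.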
If something similar already exists in the context of structured data donations/imports for GLAM, I'd be most grateful for any pointers.
Dario
[1] http://wikicite.org/statistics.html
[2] https://doi.org/10.6084/m9.figshare.5548591.v1
[3] https://meta.wikimedia.org/wiki/Grants:Project/ContentMine/WikiFactMine
[4] https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Val...
[5] https://www.wikidata.org/wiki/Wikidata:WikiProject_Zika_Corpus
[6] https://mirror.netcologne.de/CCC/events/wikidatacon/2017/h264-hd/wikidatacon...
On 24 November 2017 at 23:30, Dario Taraborelli dtaraborelli@wikimedia.org wrote:
I'd like to propose a fairly simple solution and hear your feedback on whether it makes sense to implement it as is or with some modifications.
create a Wikidata class called "Wikidata item collection" [Q-X]
This sounds like Wikimedia categories, as used on Wikipedia and Wikimedia Commons.
Implicit heterogeneous unordered containers where members see a homogeneous parent. The member properties should be transitive to avoid the maintenance burden, like a "tracking property", and also to make the parent item manageable.
I can't see anything that needs any kind of special structure at the entity level. I'm not even sure whether we need a new container for this; claims are already unordered containers.
On Sat, Nov 25, 2017 at 1:25 AM, Andy Mabbett andy@pigsonthewing.org.uk wrote:
On 24 November 2017 at 23:30, Dario Taraborelli dtaraborelli@wikimedia.org wrote:
I'd like to propose a fairly simple solution and hear your feedback on whether it makes sense to implement it as is or with some modifications.
create a Wikidata class called "Wikidata item collection" [Q-X]
This sounds like Wikimedia categories, as used on Wikipedia and Wikimedia Commons.
-- Andy Mabbett @pigsonthewing http://pigsonthewing.org.uk
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Like others in this thread, I would caution *against* overloading P31 "instance of" if possible.
When a somewhat similar issue came up, regarding how to mark artists that were of interest to the "Black Lunch Table" project https://www.wikidata.org/wiki/Q28781198 that works on coverage of visual artists of the African diaspora, the solution adopted (after quite a vigorous debate at Project Chat) was to use property P972 "catalog" with the value Q28781198 to mark artists that were of interest to the project.
A similar approach could be used here, if a project has a list of works of interest in which it would be valuable to record inclusion.
Best regards,
James.
On 25/11/2017 04:42, John Erling Blad wrote:
Implicit heterogeneous unordered containers where members see a homogeneous parent. The member properties should be transitive to avoid the maintenance burden, like a "tracking property", and also to make the parent item manageable.
I can't see anything that needs any kind of special structure at the entity level. I'm not even sure whether we need a new container for this; claims are already unordered containers.
Dario's 1-3 is fine by me. Rather than P361 (part of), I think, like James Heald, that P972 (catalog) would be better. It is also used for artworks, for instance:
https://www.wikidata.org/wiki/Q44015154 P972 https://www.wikidata.org/wiki/Q42661788
I see P972 and the associated identifier P528 as a kind of lightweight external identifier.
If the external dataset is available in RDF, then I suppose skos:exactMatch (P2888) can be used.
For some items available in other databases I have used "external data available at" P1325. But the semantics of that require some data at the other end.
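To illustrate what I mean by "lightweight external identifier", here is a small Python sketch that indexes items by their (catalog, catalog code) pair, so that the pair resolves to an item the way an external identifier would. The item and catalog QIDs are the ones from the example above; the code value "example-1", the item "Q-A" and all class and function names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class CatalogStatement(NamedTuple):
    item: str                   # subject item, e.g. "Q44015154"
    catalog: str                # value of P972 (catalog)
    code: Optional[str] = None  # value of P528 (catalog code), if any


def index_by_catalog_code(
    statements: List[CatalogStatement],
) -> Dict[Tuple[str, str], str]:
    """Map each (catalog, code) pair to the item that carries it."""
    index: Dict[Tuple[str, str], str] = {}
    for statement in statements:
        if statement.code is None:
            continue  # plain membership: there is no code to index
        key = (statement.catalog, statement.code)
        # An identifier should point at one item only.
        if key in index and index[key] != statement.item:
            raise ValueError(f"catalog code {key} is used by two different items")
        index[key] = statement.item
    return index


if __name__ == "__main__":
    statements = [
        CatalogStatement("Q44015154", "Q42661788", "example-1"),  # code is made up
        CatalogStatement("Q-A", "Q42661788"),  # hypothetical item without a code
    ]
    print(index_by_catalog_code(statements))
```

Items that only state P972 without a P528 code still belong to the catalog; they are just not addressable through a code.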
--- Finn Årup Nielsen http://people.compute.dtu.dk/faan/
On 11/25/2017 02:16 PM, James Heald wrote:
Like others in this thread, I would caution *against* overloading P31 "instance of" if possible.
When a somewhat similar issue came up, regarding how to mark artists that were of interest to the "Black Lunch Table" project https://www.wikidata.org/wiki/Q28781198 that works on coverage of visual artists of the African diaspora, the solution adopted (after quite a vigorous debate at Project Chat) was to use property P972 "catalog" with the value Q28781198 to mark artists that were of interest to the project.
A similar approach could be used here, if a project has a list of works of interest in which it would be valuable to record inclusion.
Best regards,
James.
In addition to the Black Lunch Table project that James mentions below, which focuses on visual artists from the African diaspora, the GLAM projects for Colección Patricia Phelps de Cisneros ("CPPC") and WIKIarte, which focus on Latin American art, are both using the catalog property P972 as well.
https://en.wikipedia.org/wiki/Wikipedia:GLAM/Colecci%C3%B3n_Patricia_Phelps_...
https://en.wikipedia.org/wiki/Wikipedia:WIKIarte/Tasks#Wikidata_task_lists
I want to be transparent about this usage. It has been incredibly impactful, and as CPPC prepares for a Commons donation of public domain images (that are artworks, have Creator templates and are fully Wikidata-ified), this catalog will be mission-critical for tracking and querying the items.
Without this catalog property, it would be impossible to collocate the various projects, especially as the metadata is not a large dataset but is being painfully and manually input, often one item at a time. The metadata from the various sources is not in a dataset, and is constantly moving, changing, and expanding.
- Erika
Erika Herzog
Wikipedia User:BrillLyle https://en.wikipedia.org/wiki/User:BrillLyle
On Wed, Nov 29, 2017 at 8:22 PM, fn@imm.dtu.dk wrote:
Dario's 1-3 is fine by me. Rather than P361 (part of), I think, like James Heald, that P972 (catalog) would be better. It is also used for artworks, for instance:
https://www.wikidata.org/wiki/Q44015154 P972 https://www.wikidata.org/wiki/Q42661788
I see P972 and the associated identifier P528 as a kind of lightweight external identifier.
If the external dataset is available in RDF, then I suppose skos:exactMatch (P2888) can be used.
For some items available in other databases I have used "external data available at" P1325. But the semantics of that require some data at the other end.
Finn Årup Nielsen http://people.compute.dtu.dk/faan/
Hoi, I have been "abusing" the catalog property for some time now to get just this effect. I use it to identify items that are part of a project like the "Black Lunch Table". It works really well: it is used, for instance, to identify items in queries; by associating it with a location, we know the subjects of or for an editathon.
The principle is the same.
Thanks,
GerardM
It would be great to have a federated Wikibase on meta.wikimedia.org specifically to manage these kinds of projects, the institutions we work with, etc. I saw that idea appear on wishlists somewhere but, seeing the current workload for other projects, I guess this is not going to be implemented soon, so an ‘in between’ solution that is easily transferable to external federated Wikibase solutions later would be good.
Hi,
I like the idea of creating collections. However, setting up 'official' federated Wikibase-powered instances and assigning them an external identifier could be another approach. This permits other instances to grow independently and leaves the choice to store only key (community-decided) information on Wikidata. Thus Wikidata still plays a major role in the discoverability of external federated instances.
Best, John Samuel
Hi Dario,
Thanks for your proposal and starting the discussion. I'm skeptical about any items that refer to internal aspects of Wikidata so I wonder whether we actually need a rather artificial class such as "Wikidata item collection". You wrote:
- create and document individual collections (e.g. the Wikidata Zika corpus [Q-Y]) as instances of this class: [Q-Y] --P31--> [Q-X]
- add appropriate metadata to describe such collections (its main topic(s), creators, any external identifiers, if applicable)
- mark individual bibliographic items as part of [P361] the corresponding collections
We already have several classes for collections, e.g. bibliography (Q134995) or bibliographic database (Q1789476). What is wrong or missing when using them? The Zika corpus is a bibliography. We also have a large number of other bibliographic databases that might get imported into Wikidata, e.g. PubMed, which can be linked to bibliographic items in the same way.
Some criteria would be needed to determine an appropriate threshold for legitimate collections (we wouldn't want arbitrary collections to be created for sets of items generated as part of a test import).
That's the more important question. There should be at least a WikiProject page about the collection, and I'd classify such projects as bibliographies or other kinds of catalogs.
If something similar already exists in the context of structured data donations/imports for GLAM, I'd be most grateful for any pointers.
See property catalog code (P528) and its use e.g. at Mona Lisa (Q12418).
Jakob
P.S: At the moment we have 9275 instances of catalog (Q2352616) or its subclasses.
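For reference, a count like the one in the P.S. could presumably be obtained with a query along the following lines. This Python sketch only builds the query text; whether it matches the query actually used for the figure above is an assumption. It combines P31 (instance of) with a P279 (subclass of) property path, and the function name is invented.

```python
def class_instance_count_query(class_qid: str = "Q2352616") -> str:
    """Build a SPARQL query counting instances of a class or of its subclasses."""
    # wdt:P31/wdt:P279* reads: "instance of" followed by zero or more
    # "subclass of" steps, so instances of subclasses are counted as well.
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {\n"
        f"  ?item wdt:P31/wdt:P279* wd:{class_qid} .\n"
        "}"
    )


if __name__ == "__main__":
    # Default argument: catalog (Q2352616), as mentioned in the P.S.
    print(class_instance_count_query())
```

The same function works for bibliography (Q134995) or bibliographic database (Q1789476) by passing the other QID.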
Hi Dario and All,
Thanks, and per your "My thoughts on why we need a class to describe “(bibliographic) item collections” in @wikidata and @wikicite. Feedback welcome" - https://twitter.com/ReaderMeter/status/934204071870742528 (see reply here - https://twitter.com/WorldUnivAndSch/status/934570933372588032 ) - I wonder how to add a further "virtual place-based" component to this, beyond Wikidata geo-coordinates. Given Wikimedia's mission of all knowledge, such collections may become very numerous with time (e.g. eventually in all 8,444 entries in languages per Glottolog, and over all history and pre-history). Organizing such (bibliographic) items in collections within a realistic virtual earth (think Google Streetview/Maps/Earth with TIME SLIDER, and group build-able like Minecraft) may therefore have much merit: e.g. for where an artist made such art works in the 1600s, or for museums that have numerous items of such a collection now but didn't 100 years ago, and even eventually with avatar bots painting these again. (Wiki CC MIT OCW-centric World Univ & Sch is planning ALL Libraries and ALL Museums in each of all 8,444 languages in such a realistic virtual earth with TIME SLIDER.)
Thanks,
Scott
Harbin Hot Springs’ Actual/Virtual Ethnography http://bit.ly/HarbinBook
https://twitter.com/HarbinBook
-- Meta: https://meta.wikimedia.org/wiki/WikiCite Twitter: https://twitter.com/wikicite
You received this message because you are subscribed to the Google Groups "wikicite-discuss" group. To unsubscribe from this group and stop receiving emails from it, send an email to wikicite-discuss+unsubscribe@wikimedia.org.
I think the general idea of documenting collections is a good one, though I haven't thought carefully about this or some of the responses already sent. However, I think the use of P361 (part of) for this purpose might not be a good idea and a new property should be proposed for it, or some other mechanism used for large collection handling (collections added through Mix n Match for example generally have external identifiers as their collection-specific properties). My concern here is mainly that the relationship is not generally going to be intrinsic to the item, and is more related to the project doing the import work, while P361 should generally describe some intrinsic relationship that an item has (for example a subsidiary being part of a parent company, a component of a device being part of the device, a research article being part of a particular journal issue, etc).
We do have a very new property that might be useable for this purpose, though it is intended to link to Wikiprojects rather than "collection" items - P4570 (Wikidata project). Or perhaps something similar should be proposed?
Arthur
Hi all,
In any case, you should be sure to avoid allowing collections that fall into Russell's paradox https://en.wikipedia.org/wiki/Russell%27s_paradox. If a predicate "belongs to collection QX" is added such that a Wikidata item can be stated as being part of another, it must be anticipated that at some point a query may ask "What is the collection of items that do not belong to themselves?".
Paradoxically logical, mathieu
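One simple guard in this direction, offered only as a sketch and not as something proposed in the thread, is to flag any collection that ends up belonging to itself, directly or through a chain of other collections. It does not settle the paradox itself, which concerns a collection defined by a query over membership. The mapping and the function name are hypothetical, and the item names are placeholders.

```python
from typing import Dict, Set


def has_membership_cycle(member_of: Dict[str, Set[str]], start: str) -> bool:
    """Return True if following "belongs to collection" links leads back to start."""
    # member_of maps an item to the collections it is stated to belong to.
    stack = list(member_of.get(start, ()))
    seen: Set[str] = set()
    while stack:
        current = stack.pop()
        if current == start:
            return True  # start is (indirectly) a member of itself
        if current in seen:
            continue  # already explored, avoids looping forever
        seen.add(current)
        stack.extend(member_of.get(current, ()))
    return False


if __name__ == "__main__":
    memberships = {"Q-A": {"Q-B"}, "Q-B": {"Q-A"}, "Q-C": {"Q-A"}}
    print(has_membership_cycle(memberships, "Q-A"))
    print(has_membership_cycle(memberships, "Q-C"))
```

In the usage lines, Q-A and Q-B contain each other, so Q-A is flagged; Q-C merely belongs to that pair and is not flagged itself.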
On 27/11/2017 at 02:07, Arthur Smith wrote:
I think the general idea of documenting collections is a good one, though I haven't thought carefully about this or some of the responses already sent. However, I think the use of P361 (part of) for this purpose might not be a good idea and a new property should be proposed for it, or some other mechanism used for large collection handling (collections added through Mix n Match for example generally have external identifiers as their collection-specific properties). My concern here is mainly that the relationship is not generally going to be intrinsic to the item, and is more related to the project doing the import work, while P361 should generally describe some intrinsic relationship that an item has (for example a subsidiary being part of a parent company, a component of a device being part of the device, a research article being part of a particular journal issue, etc).
We do have a very new property that might be useable for this purpose, though it is intended to link to Wikiprojects rather than "collection" items - P4570 (Wikidata project). Or perhaps something similar should be proposed?
Arthur
On Fri, Nov 24, 2017 at 6:30 PM, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:
Hey all,
I'd like to hear from you on a proposal to add some order and structure to the various bibliographic corpora we currently have in Wikidata.
As you may know, coverage of creative works in Wikidata has seen significant growth over the last year. [1][2] Different groups and projects have started importing source metadata for various reasons:
* to provide sources for machine-extracted statements (WikiFactMine [3], StrepHit [4])
* to represent sources cited in Wikipedia (e.g. DOIs and PMIDs imported via the mwcite identifier dumps) or other Wikimedia projects (Wikisource, Wikispecies, Wikinews)
* to create collections of the open access literature citable and reusable in Wikimedia projects (e.g. open access PMC review articles)
* to maintain small, curated corpora about specific topics (e.g. the Zika corpus [5])
Because all these efforts have grown organically and with little coordination, it's hard to keep track of who initiated them, to clearly communicate their purpose, to understand their completion criteria and their data quality needs, and last but not least to offer any contribution opportunities (in terms of code or manual labor) to other community members. It's unclear if the future of these efforts should continue to be within Wikidata, or leverage the power of federated Wikibase-powered wikis (see our discussion at the end of the WikiCite session at WikidataCon [6]). Irrespective of the best long-term solution, we need to provide some better structure to these efforts today if we want to address the above problems.
I'd like to propose a fairly simple solution and hear your feedback on whether it makes sense to implement it as is or with some modifications.
1. create a Wikidata class called "Wikidata item collection" [Q-X]
2. create and document individual collections (e.g. the Wikidata Zika corpus [Q-Y]) as instances of this class: [Q-Y] --P31--> [Q-X]
3. add appropriate metadata to describe such collections (their main topic(s), creators, any external identifiers, if applicable)
4. mark individual bibliographic items as part of [P361] the corresponding collections
Note that this approach can apply to bibliographic item collections but also to any other set of items not directly identifiable via Wikidata properties. Of course, the same item could be part of multiple collections. Some criteria would be needed to determine an appropriate threshold for legitimate collections (we wouldn't want arbitrary collections to be created for sets of items generated as part of a test import).
Beyond solving the issues listed above, this approach would also allow us to generate dedicated statistics on the growth or data quality of each collection via the SPARQL endpoint. It would also allow us to design constraints for arbitrary item collections, something that right now is not possible (unless these sets can already be identified via a query).
If something similar already exists in the context of structured data donations/imports for GLAM, I'd be most grateful for any pointers.
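To make the four steps concrete, here is a minimal sketch in Python (rather than SPARQL, so it runs without the query service). It is only an illustration under stated assumptions: the statements are a toy in-memory list, "Q-X" and "Q-Y" are the placeholders from the proposal, and "Q-Z", "Q-A", "Q-B", "Q-C" and the function names are made up for the example.

```python
# Toy (subject, property, object) statements; none of these IDs are real items.
STATEMENTS = [
    ("Q-Y", "P31", "Q-X"),   # the Zika corpus is an instance of "Wikidata item collection"
    ("Q-Z", "P31", "Q-X"),   # a second, hypothetical collection
    ("Q-A", "P361", "Q-Y"),  # article A is part of the Zika corpus
    ("Q-B", "P361", "Q-Y"),
    ("Q-B", "P361", "Q-Z"),  # the same item may belong to several collections
    ("Q-C", "P361", "Q-Z"),
]

COLLECTION_CLASS = "Q-X"


def collections(statements):
    """Return the items declared as instances (P31) of the collection class."""
    return {s for s, p, o in statements if p == "P31" and o == COLLECTION_CLASS}


def collection_sizes(statements):
    """Count the items marked as part of (P361) each declared collection."""
    known = collections(statements)
    sizes = {c: 0 for c in known}
    for _subject, prop, obj in statements:
        if prop == "P361" and obj in known:
            sizes[obj] += 1
    return sizes


print(sorted(collection_sizes(STATEMENTS).items()))
# [('Q-Y', 2), ('Q-Z', 2)]
```

Per-collection counts like these are the kind of growth statistic the proposal has in mind; on Wikidata itself they would presumably come from a query against the SPARQL endpoint instead.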
Dario
[1] http://wikicite.org/statistics.html
[2] https://doi.org/10.6084/m9.figshare.5548591.v1
[3] https://meta.wikimedia.org/wiki/Grants:Project/ContentMine/WikiFactMine
[4] https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References/Renewal
[5] https://www.wikidata.org/wiki/Wikidata:WikiProject_Zika_Corpus
[6] https://mirror.netcologne.de/CCC/events/wikidatacon/2017/h264-hd/wikidatacon2017-10009-eng-WikiCite_Wikidata_as_a_structured_repository_of_bibliographic_data_hd.mp4
--
Meta: https://meta.wikimedia.org/wiki/WikiCite
Twitter: https://twitter.com/wikicite
---
You received this message because you are subscribed to the Google Groups "wikicite-discuss" group. To unsubscribe from this group and stop receiving emails from it, send an email to wikicite-discuss+unsubscribe@wikimedia.org.
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
It's not easy to get to a true paradox with this collection. Not only do you have to be able to express it but you have to require that it exists.
Peter F. Patel-Schneider
On 12/02/2017 11:09 AM, mathieu stumpf guntz wrote:
Hi all,
You should in any case be sure to avoid allowing collections which fall into Russell's paradox (https://en.wikipedia.org/wiki/Russell%27s_paradox). So if a predicate "belongs to collection QX" is added such that a Wikidata item can be stated as being part of another, it must be envisioned that at some point a request may ask "What is the collection of items that do not belong to themselves?".
Paradoxically logical, mathieu
On 27/11/2017 at 02:07, Arthur Smith wrote:
I think the general idea of documenting collections is a good one, though I haven't thought carefully about this or some of the responses already sent. However, I think the use of P361 (part of) for this purpose might not be a good idea and a new property should be proposed for it, or some other mechanism used for large collection handling (collections added through Mix n Match for example generally have external identifiers as their collection-specific properties). My concern here is mainly that the relationship is not generally going to be intrinsic to the item, and is more related to the project doing the import work, while P361 should generally describe some intrinsic relationship that an item has (for example a subsidiary being part of a parent company, a component of a device being part of the device, a research article being part of a particular journal issue, etc).
We do have a very new property that might be useable for this purpose, though it is intended to link to Wikiprojects rather than "collection" items - P4570 (Wikidata project). Or perhaps something similar should be proposed?
Arthur
Hi,
I do not see any problem.
SELECT ?catalog ?catalogLabel
WITH {
  SELECT DISTINCT ?catalog WHERE {
    ?item wdt:P972 ?catalog .
    FILTER NOT EXISTS { ?catalog wdt:P972 ?catalog . }
  }
} AS %results
WHERE {
  INCLUDE %results
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
P972 is the catalog property. There are presently 166 distinct catalogs.
I suppose you run into the paradox if you make a Wikidata item that is a catalog of the catalogs in Wikidata that are not members of themselves. That would at least include the 166. The question is then whether the 167th catalog should be in its own catalog. As I see it you could have an infinite edit war with yourself as you first discover that it is not a member of itself, then add it, then find out that now it is a member of itself and then revert your own edit. :)
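The edit war can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the catalog is a plain set of names, and `should_contain_itself`, `edit_war` and the catalog names are hypothetical, not anything that exists in Wikidata.

```python
def should_contain_itself(members, catalog):
    """Rule: the catalog lists exactly the catalogs that do not list themselves."""
    return catalog not in members


def edit_war(members, catalog, rounds):
    """Apply the rule repeatedly and record whether the catalog lists itself."""
    members = set(members)  # work on a copy so the caller's set is untouched
    history = []
    for _ in range(rounds):
        if should_contain_itself(members, catalog):
            members.add(catalog)      # not a member of itself, so add it
        else:
            members.discard(catalog)  # now a member of itself, so revert
        history.append(catalog in members)
    return history


print(edit_war({"catalog-1", "catalog-2"}, "catalog-167", 4))
# [True, False, True, False]
```

The membership flag never settles, which is the paradox showing up as an endless sequence of self-reverts rather than as a broken query.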
I do recall myself almost getting trapped in editing Q16222597. Isn't it an instance of itself?
/Finn
Haha, nice reply, thank you.
On 04/12/2017 at 20:59, Finn Aarup Nielsen wrote: