Cleaning up bibliographic collections in Wikidata


Dario Taraborelli

25 Nov 2017

12:30 a.m.

Hey all,

I'd like to hear from you on a proposal to add some order and structure to the various bibliographic corpora we currently have in Wikidata.

As you may know, coverage of creative works in Wikidata has seen significant growth over the last year. [1][2] Different groups and projects have started importing source metadata for various reasons:

- to provide sources for machine-extracted statements (WikiFactMine [3], StrepHit [4])
- to represent sources cited in Wikipedia (e.g. DOIs and PMIDs imported via the mwcite identifier dumps) or other Wikimedia projects (Wikisource, Wikispecies, Wikinews)
- to create collections of the open access literature citable and reusable in Wikimedia projects (e.g. open access PMC review articles)
- to maintain small, curated corpora about specific topics (e.g. the Zika corpus [5])

Because all these efforts have grown organically and with little coordination, it's hard to keep track of who initiated them, to clearly communicate their purpose, to understand their completion criteria and their data quality needs, and last but not least to offer any contribution opportunities (in terms of code, or manual labor) to other community members.

It's unclear whether these efforts should continue to live within Wikidata, or leverage the power of federated Wikibase-powered wikis (see our discussion at the end of the WikiCite session at WikidataCon [6]). Irrespective of the best long-term solution, we need to provide some better structure to these efforts today if we want to address the above problems.

I'd like to propose a fairly simple solution and hear your feedback on whether it makes sense to implement it as is or with some modifications.

1. create a Wikidata class called "Wikidata item collection" [Q-X]
2. create and document individual collections (e.g. the Wikidata Zika corpus [Q-Y]) as instances of this class: [Q-Y] --P31--> [Q-X]
3. add appropriate metadata to describe such collections (their main topic(s), creators, any external identifiers, if applicable)
4. mark individual bibliographic items as part of [P361] the corresponding collections

Note that this approach can apply to bibliographic item collections but also to any other set of items not directly identifiable via Wikidata properties. Of course, the same items could be part of multiple collections.

Some criteria would be needed to determine an appropriate threshold for legitimate collections (we wouldn't want arbitrary collections to be created for sets of items generated as part of a test import).

Beyond solving the issues listed above, this approach would also allow us to generate dedicated statistics on the growth or data quality of each collection via the SPARQL endpoint. It would also allow us to design constraints for arbitrary item collections, something that right now is not possible (unless these sets can already be identified via a query).

If something similar already exists in the context of structured data donations/imports for GLAM, I'd be most grateful for any pointers.

Dario

[1] http://wikicite.org/statistics.html
[2] https://doi.org/10.6084/m9.figshare.5548591.v1
[3] https://meta.wikimedia.org/wiki/Grants:Project/ContentMine/WikiFactMine
[4] https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Va…
[5] https://www.wikidata.org/wiki/Wikidata:WikiProject_Zika_Corpus
[6] https://mirror.netcologne.de/CCC/events/wikidatacon/2017/h264-hd/wikidataco…
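The per-collection statistics and constraints mentioned in the proposal could be sketched roughly as follows. This is only an illustration, assuming that members are linked to the collection item with P361 as proposed; the function name `collection_count_query`, the use of the Wikidata Sandbox item Q4115189 as a stand-in for the hypothetical collection item [Q-Y], and the choice of DOI (P356) as the "required" property are assumptions made for the example, not part of the proposal.

```python
import re
from typing import Optional


def _check(identifier: str, prefix: str) -> str:
    """Return the identifier if it looks like a Wikidata ID (e.g. Q42, P361)."""
    if not re.fullmatch(prefix + r"[1-9][0-9]*", identifier):
        raise ValueError(f"not a valid {prefix}-identifier: {identifier!r}")
    return identifier


def collection_count_query(collection_qid: str,
                           missing_property: Optional[str] = None,
                           membership_property: str = "P361") -> str:
    """Build a SPARQL query that counts the members of a collection.

    If missing_property is given, only members that lack a value for that
    property are counted, which is one simple kind of quality constraint.
    """
    collection = _check(collection_qid, "Q")
    membership = _check(membership_property, "P")
    lines = [
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {",
        f"  ?item wdt:{membership} wd:{collection} .",
    ]
    if missing_property is not None:
        missing = _check(missing_property, "P")
        lines.append(f"  FILTER NOT EXISTS {{ ?item wdt:{missing} ?value . }}")
    lines.append("}")
    return "\n".join(lines)


# Q4115189 (the Wikidata Sandbox item) stands in for a collection item here.
size_query = collection_count_query("Q4115189")
no_doi_query = collection_count_query("Q4115189", missing_property="P356")
```

Running the first query at regular intervals would give a growth curve for the collection, and the second would show how many members still lack a DOI. Switching `membership_property` to another property keeps the same queries usable if a different membership mechanism is preferred.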


Andy Mabbett

25 Nov

1:25 a.m.

New subject: [wikicite-discuss] Cleaning up bibliographic collections in Wikidata

On 24 November 2017 at 23:30, Dario Taraborelli <dtaraborelli(a)wikimedia.org> wrote:

...

> I'd like to propose a fairly simple solution and hear your feedback on whether it makes sense to implement it as is or with some modifications.
>
> create a Wikidata class called "Wikidata item collection" [Q-X]

This sounds like Wikimedia categories, as used on Wikipedia and Wikimedia Commons.

--
Andy Mabbett
@pigsonthewing
http://pigsonthewing.org.uk

John Erling Blad

5:42 a.m.

Implicit heterogeneous unordered containers where members see a homogeneous parent. The member properties should be transitive to avoid the maintenance burden, like a "tracking property", and also to make the parent item manageable.

I can't see anything that needs any kind of special structure at the entity level. I'm not even sure whether we need a new container for this; claims are already unordered containers.

On Sat, Nov 25, 2017 at 1:25 AM, Andy Mabbett <andy(a)pigsonthewing.org.uk> wrote:

> This sounds like Wikimedia categories, as used on Wikipedia and Wikimedia Commons.

James Heald

2:16 p.m.

Like others in this thread, I would caution *against* overloading P31 "instance of" if possible.

When a somewhat similar issue came up, regarding how to mark artists that were of interest to the "Black Lunch Table" project (https://www.wikidata.org/wiki/Q28781198), which works on coverage of visual artists of the African diaspora, the solution adopted (after quite a vigorous debate at Project Chat) was to use property P972 "catalog" with the value Q28781198 to mark artists that were of interest to the project.

A similar approach could be used here, if a project has a list of works of interest in which it would be valuable to record inclusion.

Best regards,

James.

On 25/11/2017 04:42, John Erling Blad wrote:

...

> Implicit heterogeneous unordered containers where members see a homogeneous parent. The member properties should be transitive to avoid the maintenance burden, like a "tracking property", and also to make the parent item manageable.
>
> I can't see anything that needs any kind of special structure at the entity level. Not even sure whether we need a new container for this, claims are already unordered containers.

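As a rough illustration of the P972 approach described in this message, the sketch below builds a request URL for the public Wikidata Query Service that lists items marked with "catalog" (P972) pointing at the Black Lunch Table item Q28781198. The helper name `catalog_members_url` and the `limit` parameter are assumptions for the example; nothing is sent over the network.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def catalog_members_url(catalog_qid: str, limit: int = 100) -> str:
    """Return a GET URL that lists items whose P972 (catalog) is catalog_qid."""
    query = (
        "SELECT ?item ?itemLabel WHERE { "
        f"?item wdt:P972 wd:{catalog_qid} . "
        'SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } '
        f"}} LIMIT {limit}"
    )
    # urlencode percent-encodes the query so it is safe to put in a URL.
    return f"{WDQS_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


# Items of interest to the Black Lunch Table project (Q28781198).
url = catalog_members_url("Q28781198", limit=50)
```

The resulting URL can be opened in a browser or fetched with any HTTP client; the same function works for any other project item used as a P972 value.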

fn＠imm.dtu.dk

30 Nov

2:22 a.m.

Dario's 1-3 is fine by me. Rather than P361 (part of), I think, like James Heald, that P972 (catalog) would be better. It is also used for artworks, for instance:

https://www.wikidata.org/wiki/Q44015154 P972 https://www.wikidata.org/wiki/Q42661788

I see P972 and the associated identifier P528 as a kind of lightweight external identifier. If the external dataset is available in RDF, then I suppose skos:exactMatch (P2888) can be used.

For some items available in other databases I have used "external data available at" (P1325). But the semantics of that require some data at the other end.

---
Finn Årup Nielsen
http://people.compute.dtu.dk/faan/

On 11/25/2017 02:16 PM, James Heald wrote:

...

Brill Lyle

7:31 a.m.

In addition to the Black Lunch Table project, which focuses on visual artists from the African diaspora, that James mentions below, the GLAM projects for Colección Patricia Phelps de Cisneros ("CPPC") and WIKIarte, which focus on Latin American art, are both using the catalog property P972 as well.

https://en.wikipedia.org/wiki/Wikipedia:GLAM/Colección_Patricia_Phelps_de_C…
https://en.wikipedia.org/wiki/Wikipedia:WIKIarte/Tasks#Wikidata_task_lists

I want to be transparent about this usage. It has been incredibly impactful, and as CPPC prepares for a Commons donation of public domain images of artworks that have Creator templates and are fully Wikidata-ified, this catalog will be mission critical to tracking and querying the items.

Without this catalog property, it would be impossible to collocate the various projects, especially as the metadata is not a large dataset but is being painfully and manually input, often one by one. The metadata from the various sources is not in a dataset, and is constantly moving, changing, and expanding.

- Erika

Erika Herzog
Wikipedia User:BrillLyle <https://en.wikipedia.org/wiki/User:BrillLyle>

On Wed, Nov 29, 2017 at 8:22 PM, <fn(a)imm.dtu.dk> wrote:

...

> Dario's 1-3 is fine by me. Rather than P361 (part of), I think - as James Heald - that P972 (catalog) would be better. It is also used for artworks, for instance,
>
> https://www.wikidata.org/wiki/Q44015154 P972 https://www.wikidata.org/wiki/Q42661788
>
> I see P972 and the associated identifier P528 as a kind of lightweight external identifier. If the external dataset is available in RDF, then I suppose skos:exactMatch (P2888) can be used.
>
> For some items available in other databases I have used "external data available at" P1325. But the semantics of that require some data at the other end.
>
> ---
> Finn Årup Nielsen
> http://people.compute.dtu.dk/faan/

Gerard Meijssen

25 Nov

4:01 p.m.

Hoi,

I have been "abusing" the property "catalog" for some time now just to get this effect. I use it to identify items that are part of a project like the "Black Lunch Table". It works really well: it is used, for instance, to identify items in queries, and by associating it with a location, we know the subjects of / for an editathon. The principle is the same.

Thanks,
GerardM

On 25 November 2017 at 05:42, John Erling Blad <jeblad(a)gmail.com> wrote:

...


Sandra Fauconnier

4:06 p.m.

It would be great to have a federated Wikibase on meta.wikimedia.org specifically to manage these kinds of projects, the institutions we work with, etc. I saw that idea appear on wishlists somewhere but, seeing the current workload for other projects, I guess this is not going to be implemented soonish, so an 'in between' solution that is easily transferable to external federated Wikibase solutions later would be good.

--
Sandra Fauconnier
sandra.fauconnier(a)gmail.com
http://www.spinster.be


John Samuel

2:04 p.m.

Hi,

I like the idea of creating collections. However, setting up 'official' federated Wikibase-powered instances and assigning them an external identifier could be another approach. This permits other instances to grow independently and leaves the choice to store only key (community-decided) information on Wikidata. Thus Wikidata still plays a major role in the discoverability of external federated instances.

Best,
John Samuel

On Saturday, November 25, 2017 at 12:30:56 AM UTC+1, dtaraborelli wrote:

...

Jakob

6:50 p.m.

Hi Dario,

Thanks for your proposal and for starting the discussion. I'm skeptical about any items that refer to internal aspects of Wikidata, so I wonder whether we actually need a rather artificial class such as "Wikidata item collection".

You wrote:

> 2. create and document individual collections (e.g. the Wikidata Zika corpus [Q-Y]) as instances of this class: [Q-Y] --P31--> [Q-X]
> 3. add appropriate metadata to describe such collections (its main topic(s), creators, any external identifiers, if applicable)
> 4. mark individual bibliographic items as part of [P361] the corresponding collections

We already have several classes for collections, e.g. bibliography (Q134995) or bibliographic database (Q1789476). What's wrong or missing when using them? The Zika corpus is a bibliography. We also have a large number of other bibliographic databases that might get imported into Wikidata, e.g. PubMed, that can be linked to bibliographic items in the same way.

> Some criteria would be needed to determine an appropriate threshold for legitimate collections (we wouldn't want arbitrary collections to be created for sets of items generated as part of a test import).

That's the more important question. There should be at least a WikiProject page about the collection, and I'd classify such projects as bibliographies or other kinds of catalogs.

> If something similar already exists in the context of structured data donations/imports for GLAM, I'd be most grateful for any pointers.

See property catalog code (P528) and its use e.g. at Mona Lisa (Q12418).

Jakob

P.S.: At the moment we have 9275 instances of catalog (Q2352616) or its subclasses.
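A count like the one in the P.S. can be obtained with a SPARQL property path that follows "instance of" (P31) and then any number of "subclass of" (P279) links. The sketch below only builds the query text; the function name `instances_count_query` is an assumption for the example, and the number returned today will differ from the figure quoted above.

```python
def instances_count_query(class_qid: str) -> str:
    """Build a SPARQL query counting instances of a class or of its subclasses."""
    if not (class_qid.startswith("Q") and class_qid[1:].isdigit()):
        raise ValueError(f"not a valid item identifier: {class_qid!r}")
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {\n"
        # P31 = instance of, P279* = zero or more subclass-of steps.
        f"  ?item wdt:P31/wdt:P279* wd:{class_qid} .\n"
        "}"
    )


# catalog (Q2352616), the class mentioned in the message above.
catalog_count = instances_count_query("Q2352616")
```

The same function applied to bibliography (Q134995) or bibliographic database (Q1789476) would show how many collection-like items already exist under those classes.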

Scott MacLeod

26 Nov

1 a.m.

Hi Dario and All,

Thanks, and per your "My thoughts on why we need a class to describe “(bibliographic) item collections” in @wikidata and @wikicite. Feedback welcome" - https://twitter.com/ReaderMeter/status/934204071870742528 (see reply here - https://twitter.com/WorldUnivAndSch/status/934570933372588032) - I wonder how to add a further "virtual place-based" component to this, beyond Wikidata geo-coordinates.

Such collections (e.g. in all 8,444 entries in languages per Glottolog eventually, and over all history and pre-history) may become so numerous with time, given Wikimedia's mission of all knowledge, that organizing such (bibliographic) items in collections within a realistic virtual earth (think Google Streetview/Maps/Earth with TIME SLIDER and group build-able like Minecraft) may have much merit: e.g. for where an artist made such art works in the 1600s, or for museums that have numerous items of such a collection now but didn't 100 years ago, and even eventually with avatar bots painting these again.

(Wiki CC MIT OCW-centric World Univ & Sch is planning ALL Libraries and ALL Museums in each of all 8,444 languages in such a realistic virtual earth with TIME SLIDER.)

Thanks,
Scott

Harbin Hot Springs’ Actual/Virtual Ethnography
http://bit.ly/HarbinBook
https://twitter.com/HarbinBook

On Fri, Nov 24, 2017 at 3:30 PM, Dario Taraborelli <dtaraborelli(a)wikimedia.org> wrote:

...

--
Scott MacLeod
Founder & President, World University and School
http://worlduniversityandschool.org
415 480 4577
http://scottmacleod.com

CC World University and School - like CC Wikipedia with best STEM-centric CC OpenCourseWare - incorporated as a nonprofit university and school in California, and is a U.S. 501 (c) (3) tax-exempt educational organization.

Arthur Smith

27 Nov

2:07 a.m.

I think the general idea of documenting collections is a good one, though I haven't thought carefully about this or some of the responses already sent. However, I think the use of P361 (part of) for this purpose might not be a good idea, and a new property should be proposed for it, or some other mechanism used for large collection handling (collections added through Mix'n'match, for example, generally have external identifiers as their collection-specific properties).

My concern here is mainly that the relationship is not generally going to be intrinsic to the item, and is more related to the project doing the import work, while P361 should generally describe some intrinsic relationship that an item has (for example a subsidiary being part of a parent company, a component of a device being part of the device, a research article being part of a particular journal issue, etc.).

We do have a very new property that might be usable for this purpose, though it is intended to link to WikiProjects rather than "collection" items: P4570 (Wikidata project). Or perhaps something similar should be proposed?

Arthur

On Fri, Nov 24, 2017 at 6:30 PM, Dario Taraborelli <dtaraborelli(a)wikimedia.org> wrote:

...

mathieu stumpf guntz

2 Dec

8:09 p.m.

Hi all,

You should in any case be sure to avoid allowing collections which fall into Russell's paradox <https://en.wikipedia.org/wiki/Russell%27s_paradox>.

So if a predicate "belongs to collection QX" is added such that a Wikidata item can be stated as being part of another, it must be envisioned that at some point a request may ask "What is the collection of items that do not belong to themselves?".

Paradoxically logical,
mathieu

On 27/11/2017 at 02:07, Arthur Smith wrote:

> I think the general idea of documenting collections is a good one, though I haven't thought carefully about this or some of the responses already sent. However, I think the use of P361 (part of) for this purpose might not be a good idea and a new property should be proposed for it, or some other mechanism used for large collection handling (collections added through Mix n Match for example generally have external identifiers as their collection-specific properties). My concern here is mainly that the relationship is not generally going to be intrinsic to the item, and is more related to the project doing the import work, while P361 should generally describe some intrinsic relationship that an item has (for example a subsidiary being part of a parent company, a component of a device being part of the device, a research article being part of a particular journal issue, etc).
>
> We do have a very new property that might be useable for this purpose, though it is intended to link to Wikiprojects rather than "collection" items - P4570 (Wikidata project). Or perhaps something similar should be proposed?
>
> Arthur

Peter F. Patel-Schneider

9:30 p.m.

New subject: [wikicite-discuss] Cleaning up bibliographic collections in Wikidata

It's not easy to get to a true paradox with this collection. Not only do you have to be able to express it but you have to require that it exists.

Peter F. Patel-Schneider

On 12/02/2017 11:09 AM, mathieu stumpf guntz wrote:

...

Hi all,

You should in any case be sure to avoid allowing collections that fall into Russell's paradox <https://en.wikipedia.org/wiki/Russell%27s_paradox>. So if a predicate "belongs to collection QX" is added such that a Wikidata item can be stated as being part of another, it must be anticipated that at some point a query may ask "What is the collection of items that do not belong to themselves?".

Paradoxically logical,
mathieu

On 27/11/2017 at 02:07, Arthur Smith wrote:
> I think the general idea of documenting collections is a good one, though I
> haven't thought carefully about this or some of the responses already sent.
> However, I think the use of P361 (part of) for this purpose might not be a
> good idea and a new property should be proposed for it, or some other
> mechanism used for large collection handling (collections added through Mix
> n Match for example generally have external identifiers as their
> collection-specific properties). My concern here is mainly that the
> relationship is not generally going to be intrinsic to the item, and is
> more related to the project doing the import work, while P361 should
> generally describe some intrinsic relationship that an item has (for
> example a subsidiary being part of a parent company, a component of a
> device being part of the device, a research article being part of a
> particular journal issue, etc).
>
> We do have a very new property that might be useable for this purpose,
> though it is intended to link to Wikiprojects rather than "collection"
> items - P4570 (Wikidata project). Or perhaps something similar should be
> proposed?
>
> Arthur

Finn Aarup Nielsen

4 Dec

8:59 p.m.

Hi,

I do not see any problem.

    SELECT ?catalog ?catalogLabel
    WITH {
      SELECT DISTINCT ?catalog WHERE {
        ?item wdt:P972 ?catalog .
        FILTER NOT EXISTS { ?catalog wdt:P972 ?catalog . }
      }
    } AS %results
    WHERE {
      INCLUDE %results
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }

P972 is catalog. There are presently 166 distinct catalogs.

http://tinyurl.com/y7vxdacu

I suppose you run into the paradox if you make a Wikidata item that is a catalog over the catalogs in Wikidata that are not members of themselves. That would at least include the 166. The question is then whether the 167th catalog should be in its own catalog. As I see it you could have an infinite edit war with yourself: you first discover that it is not a member of itself, then add it, then find out that it is now a member of itself, and then revert your own edit. :)

I do recall myself almost getting trapped in editing Q16222597. Isn't it an instance of itself?

/Finn

On 12/02/2017 08:09 PM, mathieu stumpf guntz wrote:

...
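The "edit war with yourself" described in this reply can be sketched in a few lines of Python. This is only an illustration under stated assumptions: catalogs are modelled as plain sets of names rather than Wikidata items, and the function name `simulate_edit_war` and the catalog names in the usage note are hypothetical. Each round re-applies the rule "the meta-catalog lists exactly the catalogs that do not list themselves" and records whether the meta-catalog ends up listing itself.

```python
from __future__ import annotations


def simulate_edit_war(catalogs: dict[str, set[str]], meta: str, rounds: int) -> list[bool]:
    """Re-apply the rule 'meta lists every catalog that does not list itself'.

    Returns, for each round, whether the meta-catalog lists itself after the edit.
    """
    # Work on a copy so the caller's data is left untouched.
    members = {name: set(items) for name, items in catalogs.items()}
    members[meta] = set()  # the new meta-catalog starts out empty
    history = []
    for _ in range(rounds):
        # Rebuild the meta-catalog from the current state of all catalogs,
        # including the meta-catalog's own current contents.
        members[meta] = {name for name, items in members.items() if name not in items}
        history.append(meta in members[meta])
    return history
```

With `simulate_edit_war({"A": {"x"}, "B": {"B"}}, "R", 4)` the self-membership of `R` alternates on every round and never settles, which is the add-then-revert loop described above. The ordinary catalogs are unaffected: the contradiction appears only once the meta-catalog is required to exist and to satisfy its own rule, which matches the point made earlier in the thread that the collection must not merely be expressible but required to exist.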

mathieu stumpf guntz

5 Dec

1 a.m.

Haha, nice reply, thank you.

On 04/12/2017 at 20:59, Finn Aarup Nielsen wrote:

...

participants (15)

Andy Mabbett
Arthur Smith
Brill Lyle
Dario Taraborelli
Finn Aarup Nielsen
fn＠imm.dtu.dk
Gerard Meijssen
Jakob
James Heald
John Erling Blad
John Samuel
mathieu stumpf guntz
Peter F. Patel-Schneider
Sandra Fauconnier
Scott MacLeod