Hi all,
You should in any case be sure to avoid allowing collections that
fall into Russell's paradox
<https://en.wikipedia.org/wiki/Russell%27s_paradox>. So if a
predicate "belongs to collection QX" is added such that a Wikidata
item can be stated as being part of another, it must be anticipated
that at some point a request may ask "What is the collection of items
that do not belong to themselves?"
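To make the concern concrete, here is a minimal sketch, assuming membership is stored as plain (item, collection) pairs. The function names and the identifier "R" are hypothetical and only illustrate the logic: running the query is harmless, but materialising its answer as a collection item R leaves no consistent choice for whether R belongs to itself.

```python
def non_self_members(items, part_of):
    """Return the items that are not recorded as part of themselves."""
    return {item for item in items if (item, item) not in part_of}


def self_membership_is_consistent(r, part_of):
    """Check R against its own definition.

    R is defined to contain exactly the items that do not contain
    themselves, so R should be in R if and only if R is not in R.
    """
    r_in_r = (r, r) in part_of
    required_by_definition = not r_in_r
    return r_in_r == required_by_definition


# Hypothetical data: two ordinary items, neither part of itself.
items = {"A", "B"}
part_of = set()
answer = non_self_members(items, part_of)  # a plain query result

# Whether or not we record "R part of R", the definition is violated.
r_excluded = self_membership_is_consistent("R", part_of)
r_included = self_membership_is_consistent("R", part_of | {("R", "R")})
```

Both checks come out false, which is the paradox: the trouble starts only when the query result is itself treated as a collection defined by that rule.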
Paradoxically logical,
mathieu
On 27/11/2017 at 02:07, Arthur Smith wrote:
I think the general idea of documenting
collections is a good one,
though I haven't thought carefully about this or some of the
responses already sent. However, I think the use of P361 (part of)
for this purpose might not be a good idea and a new property should
be proposed for it, or some other mechanism used for large
collection handling (collections added through Mix n Match for
example generally have external identifiers as their
collection-specific properties). My concern here is mainly that the
relationship is not generally going to be intrinsic to the item, and
is more related to the project doing the import work, while P361
should generally describe some intrinsic relationship that an item
has (for example a subsidiary being part of a parent company, a
component of a device being part of the device, a research article
being part of a particular journal issue, etc).
We do have a very new property that might be usable for this
purpose, though it is intended to link to WikiProjects rather than
"collection" items - P4570 (Wikidata project). Or perhaps something
similar should be proposed?
Arthur
On Fri, Nov 24, 2017 at 6:30 PM, Dario Taraborelli
<dtaraborelli(a)wikimedia.org> wrote:
Hey all,
I'd like to hear from you on a proposal to add some order and
structure to the various bibliographic corpora we currently have
in Wikidata.
As you may know, coverage of creative works in Wikidata has seen
significant growth over the last year. [1][2] Different groups and
projects have started importing source metadata for various
reasons:
* to provide sources for machine-extracted statements (WikiFactMine
[3], StrepHit [4])
* to represent sources cited in Wikipedia (e.g. DOIs and PMIDs
imported via the mwcite identifier dumps) or other Wikimedia
projects (Wikisource, Wikispecies, Wikinews)
* to create collections of the open access literature citable
and reusable in Wikimedia projects (e.g. open access PMC
review articles)
* to maintain small, curated corpora about specific topics (e.g.
the Zika corpus [5])
Because all these efforts have grown organically and with little
coordination, it's hard to keep track of who initiated them, to
clearly communicate their purpose, to understand their completion
criteria and their data quality needs, and last but not least to
offer any contribution opportunities (in terms of code, or manual
labor) to other community members. It's unclear if the future of
these efforts should continue to be within Wikidata, or leverage
the power of federated Wikibase-powered wikis (see our discussion
at the end of the WikiCite session at WikidataCon [6]).
Irrespective of the best long term solution, we need to provide
some better structure to these efforts today if we want to address
the above problems.
I'd like to propose a fairly simple solution and hear your
feedback on whether it makes sense to implement it as is or with
some modifications.
1. create a Wikidata class called "Wikidata item collection" [Q-X]
2. create and document individual collections (e.g. the Wikidata
Zika corpus [Q-Y]) as instances of this class: [Q-Y] --P31-->
[Q-X]
3. add appropriate metadata to describe such collections (their
main topic(s), creators, any external identifiers, if
applicable)
4. mark individual bibliographic items as part of [P361] the
corresponding collections
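As a rough illustration of the four steps above, here is a minimal in-memory sketch that represents the statements as (subject, property, value) triples. "Q-X" and "Q-Y" are the placeholders from the proposal, not real item IDs; the function names, the "main topic" key and the member IDs "Q-A" and "Q-B" are hypothetical and stand in for whatever properties and items would actually be used.

```python
# Placeholder identifiers taken from the proposal; not real Wikidata items.
COLLECTION_CLASS = "Q-X"   # the "Wikidata item collection" class (step 1)
ZIKA_CORPUS = "Q-Y"        # an individual collection (step 2)


def build_collection_statements(collection, members, metadata):
    """Return the triples describing one collection and its members."""
    # Step 2: the collection is an instance of (P31) the collection class.
    statements = [(collection, "P31", COLLECTION_CLASS)]
    # Step 3: descriptive metadata about the collection itself.
    statements += [(collection, prop, value) for prop, value in metadata]
    # Step 4: each item is marked as part of (P361) the collection.
    statements += [(item, "P361", collection) for item in members]
    return statements


def members_of(statements, collection):
    """List the items stated to be part of the given collection."""
    return sorted(s for s, p, o in statements if p == "P361" and o == collection)


statements = build_collection_statements(
    ZIKA_CORPUS,
    members=["Q-A", "Q-B"],               # hypothetical bibliographic items
    metadata=[("main topic", "Zika")],    # hypothetical metadata key
)
corpus_members = members_of(statements, ZIKA_CORPUS)
```

Nothing here talks to Wikidata; it only shows the shape of the statements each step would add.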
Note that this approach can apply to bibliographic item
collections but also to any other set of items not directly
identifiable via Wikidata properties. The same items
could of course be part of multiple collections. Some criteria
would be needed to determine an appropriate threshold for
legitimate collections (we wouldn't want arbitrary collections to
be created for sets of items generated as part of a test import).
Beyond solving the issues listed above, this approach would also
allow us to generate dedicated statistics on the growth or data
quality of each collection via the SPARQL endpoint. It would also
allow us to design constraints for arbitrary item collections,
something that right now is not possible (unless these sets can
already be identified via a query).
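For the statistics part, a query along these lines might be enough. This is a sketch only, assuming items are linked to their collection via P361 as proposed; the helper name is hypothetical and "Q123456" stands in for a collection item that does not exist yet. It relies on the wd: and wdt: prefixes that the Wikidata Query Service predefines, and it only builds the query string without sending it.

```python
import re


def collection_size_query(collection_qid):
    """Build a SPARQL query counting items marked as part of a collection."""
    # Accept only well-formed item IDs so nothing else ends up in the query.
    if not re.fullmatch(r"Q[1-9][0-9]*", collection_qid):
        raise ValueError(f"not a valid item ID: {collection_qid!r}")
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE { "
        f"?item wdt:P361 wd:{collection_qid} . "
        "}"
    )


# "Q123456" is a stand-in for a real collection item.
query = collection_size_query("Q123456")
```

Running the same count at regular intervals would give a growth curve per collection, and adding further triple patterns (for example, requiring a particular identifier) would turn it into a simple data-quality check.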
If something similar already exists in the context of structured
data donations/imports for GLAM, I'd be most grateful for any
pointers.
Dario
[1] http://wikicite.org/statistics.html
[2] https://doi.org/10.6084/m9.figshare.5548591.v1
[3] https://meta.wikimedia.org/wiki/Grants:Project/ContentMine/WikiFactMine
[4] https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References/Renewal
[5] https://www.wikidata.org/wiki/Wikidata:WikiProject_Zika_Corpus
[6] https://mirror.netcologne.de/CCC/events/wikidatacon/2017/h264-hd/wikidatacon2017-10009-eng-WikiCite_Wikidata_as_a_structured_repository_of_bibliographic_data_hd.mp4
--
Meta: https://meta.wikimedia.org/wiki/WikiCite
Twitter: https://twitter.com/wikicite
---
You received this message because you are subscribed to the Google
Groups "wikicite-discuss" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to wikicite-discuss+unsubscribe(a)wikimedia.org.
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata