Hi Michael,
Thanks for your comments! Some replies below:
> I think an important discussion to have is that of identifying what types
> of entities can or should be subject to the quality of completeness (or,
> for reasons to be explained, incompleteness).
Totally agree. As mentioned before, some properties are inherently fuzzy
("SignificantEvent", "Occupation", "AwardsReceived"), and for those
completeness is hard to define. Others have a rather well-agreed definition
(child as biological child, kings, bordering countries). Separating these
classes will require effort.
> As a very (very) contrived example, it may be the case in the future that
> legally (because of some fictitious court ruling) a "son" classification is
> reserved for only those who were conceived one year after their parents'
> marriage because of bizarre tax policy. How then is a prior classification
> of "complete" to be retroactively interpreted or reconciled?
I don't know Wikidata's current policy on changes to concept definitions,
but I guess completeness statements would need the same treatment as regular
facts that use the modified concept: recheck whether they are still valid.
> In practice and in the spirit of the Incompleteness Theorem it seems like
> it could be more useful for the cool tool to be used specifically to
> identify entities which are incomplete, as opposed to complete.
> Incompleteness is a quality which can be known and gives us actionable
> steps for improving the quality of wikidata.
This is an interesting idea, we will see how we can add that to the
demonstrator! It would be interesting to see whether people more often flag
data as incomplete or directly complete it. A challenge in flagging data as
incomplete is that knowing data is incomplete in many cases implies knowing
what is missing. Except for functional properties, I find examples where one
knows that data is missing but doesn't know the missing data a bit
contrived. So it would be interesting to see how such a feature would be
received.
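
To make the functional-property case a bit more concrete, here is a rough
sketch in Python. It is only an illustration: the property names, the item
layout and the helper function are made up for this mail and are not how
COOL-WD or Wikidata represent things. The idea is that for a property that
should have exactly one value, a missing value tells us the item is
incomplete even though we don't know the value itself.

```python
# Hypothetical sketch: flag an item as incomplete for functional properties.
# The property names and the item layout are illustrative assumptions,
# not the actual data model of Wikidata or COOL-WD.

# Properties assumed (for this example) to have exactly one value per person.
FUNCTIONAL_PROPERTIES = {"date of birth", "place of birth"}


def missing_functional_values(item):
    """Return the functional properties for which the item has no value."""
    statements = item.get("statements", {})
    return sorted(p for p in FUNCTIONAL_PROPERTIES if not statements.get(p))


example_item = {
    "label": "Some person",
    "statements": {
        "date of birth": ["1961-08-04"],
        "child": ["Malia", "Sasha"],  # not functional, so never flagged here
    },
}

print(missing_functional_values(example_item))
```

For a non-functional property like "child" this check says nothing, which is
exactly the case where I think incompleteness is hard to know without also
knowing the missing data.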
Cheers,
Simon
On 2 March 2016 at 21:51, Michael Karpeles <michael.karpeles(a)gmail.com>
wrote:
> I think the cases Tom provides are significant (not in evaluating the
> usefulness of the cool tool, but in considering how the quality of
> completeness is used within Wikidata). A lax interpretation of completeness
> exposes an opportunity for subjective misclassification of entities which is
> hard to detect and/or correct for (because subjectivity is by definition
> ambiguous).
>
> The COOL-WD tool itself seems useful and things like Gödel's Incompleteness
> Theorem should be considered as a cautionary guiding principle rather than
> a deterrent to progress
> (https://michaelkarpeles.com/essays/philosophy/incompleteness-theorem). I
> think an important discussion to have is that of identifying what types of
> entities can or should be subject to the quality of completeness (or, for
> reasons to be explained, incompleteness).
>
> Some concepts are indeed both discrete and finite and also are not subject
> to variation or stochasticity over time (read: not subject to McCarthy's
> frame problem). A game of checkers which is fully explorable, and
> unchanging in its rules, is an example of a type of entity whose problem
> space can be considered completely known.
>
> But even in regarding an event which has already transpired, while one may
> argue the event itself is complete, this does not preclude new information
> coming to light (as Tom points out) which changes how we record this
> event. And if we really care that the event has already happened, that's
> probably better represented by its date field which is not subject to the
> same level of ambiguity.
>
> As a very (very) contrived example, it may be the case in the future that
> legally (because of some fictitious court ruling) a "son" classification is
> reserved for only those who were conceived one year after their parents'
> marriage because of bizarre tax policy. How then is a prior classification
> of "complete" to be retroactively interpreted or reconciled?
>
> In practice and in the spirit of the Incompleteness Theorem it seems like
> it could be more useful for the cool tool to be used specifically to
> identify entities which are incomplete, as opposed to complete.
> Incompleteness is a quality which can be known and gives us actionable
> steps for improving the quality of wikidata.
>
> Additionally it seems like it would be helpful (and maybe this is how it
> works, I haven't checked) if the quality of incompleteness included some
> citation so researchers don't spend hours trying to investigate a claim
> which may have been made accidentally.
> Tom,
>
> how do we know whether anything is the truth? I would argue that for
> completeness statements, as discussed by James Heald above, we should use
> pretty much the same criteria we use for anything else - i.e. not truth,
> but whether the sources support that statement.
>
> I.e. I don't see much difference in the question of truth for the
> statement "Barack is the father of Malia" or "Barack's children are Malia
> and Sasha and that's a complete list".
>
> Cheers,
> Denny
>
> On Wed, Mar 2, 2016 at 9:51 AM Michael Karpeles <
> michael.karpeles(a)gmail.com> wrote:
>
>> Gödel's incompleteness theorem, QED.
>> On Mar 2, 2016 8:39 AM, "Tom Morris" <tfmorris(a)gmail.com> wrote:
>>
>>> I can see how one could measure completeness of article transcription in
>>> the 1911 edition of Encyclopedia Britannica, but I don't see how you can
>>> measure the completeness of a list of Obama's children, or anyone else's
>>> for that matter.
>>>
>>> First, it's temporally sensitive, so it depends on when you ask or what
>>> point in time you want to know about. Second, speaking as an occasional
>>> family historian, the father may not know all his children or, even if he
>>> does, be willing to divulge the complete list.
>>>
>>> This situation isn't unique to family composition, it applies to many,
>>> many aspects of the real world. How do you know something is "complete?"
>>> The fact that you've got everything in the latest Rembrandt catalog doesn't
>>> mean that you have a complete list of Rembrandt's paintings.
>>>
>>> Tom
>>>
>>> On Wed, Mar 2, 2016 at 11:27 AM, Fariz Darari <fadirra(a)gmail.com> wrote:
>>>
>>>> Hello Jane,
>>>>
>>>> I did some look-up about the Encyclopedia Britannica on Wikisource
>>>> (thanks for the pointer), and yes, some part is complete
>>>> (https://en.wikisource.org/wiki/1911_Encyclop%C3%A6dia_Britannica/Vol_1:1)
>>>> and some part is yet to be completed
>>>> (https://en.wikisource.org/wiki/1911_Encyclop%C3%A6dia_Britannica/Vol_20:4).
>>>>
>>>> With respect to that use case, COOL-WD could perform the following
>>>> inference:
>>>> 1. Suppose that we are complete for all the volumes of the EB.
>>>> 2. Suppose that for each volume, we are also complete for all the
>>>> sections of the EB.
>>>> 3. And last, for each section, we are complete for all the topics.
>>>> Then, one conclusion by COOL-WD is that querying for all the topics of
>>>> all the sections of all the volumes of the EB would give us the complete
>>>> answers :)
>>>>
>>>> As for the statistical inference (e.g., how far are we complete for the
>>>> EB wrt. volumes, sections, and topics), this is indeed an interesting idea
>>>> that could be featured on a next release of COOL-WD!
>>>>
>>>> Regards,
>>>> Fariz
>>>>
>>>> On Wed, Mar 2, 2016 at 11:26 AM, Jane Darnell <jane023(a)gmail.com>
>>>> wrote:
>>>>
>>>>> I had no problem reading your mail. Thinking it over, this would also
>>>>> be a way to track the connection of dictionaries in Wikisource to items in
>>>>> Wikidata. So for example, Wikisource has lots of imported articles from the
>>>>> Encyclopedia Britannica 1911, and it would be nice to track
>>>>> 1) Completeness in Wikisource (how many articles are complete in
>>>>> section "A"?)
>>>>> 2) Completeness of matchups in section "A" to articles in English
>>>>> Wikipedia (how many subjects of Wikisource EB 1911 "A" articles have items
>>>>> on Wikidata with a link to English Wikipedia)
>>>>>
>>>>> Once you have all that, it would be interesting to know about the
>>>>> relative completeness of
>>>>> a) Places
>>>>> b) Events
>>>>> c) Male people
>>>>> d) Female people
>>>>>
>>>>> etc.
>>>>>
>>>>> On Wed, Mar 2, 2016 at 10:57 AM, Fariz Darari <fadirra(a)gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello Jane,
>>>>>>
>>>>>> thank you! Yes, that sounds like a suitable use case for COOL-WD!
>>>>>>
>>>>>> PS: Pardon the formatting of the announcement email, somehow the
>>>>>> linebreaks have vanished :(
>>>>>> I am now experimenting with another email client, hopefully the
>>>>>> linebreaks are there.
>>>>>>
>>>>>> Regards,
>>>>>> Fariz
>>>>>>
>>>>>> On Wed, Mar 2, 2016 at 10:38 AM, Jane Darnell <jane023(a)gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Wow this sounds great! I would love to use this for oeuvre catalogs
>>>>>>> of top painters! The latest Rembrandt catalog is complete on Wikidata, as
>>>>>>> well as a few other ones, but older ones are not yet complete. This could
>>>>>>> be a great tracking tool for WikiProjects.
>>>>>>>
>>>>>>> On Wed, Mar 2, 2016 at 10:17 AM, Darari Fariz <
>>>>>>> Fariz.Darari(a)stud-inf.unibz.it> wrote:
>>>>>>>
>>>>>>>> Hello Wikidata community!
>>>>>>>>
>>>>>>>> Wikidata is a great platform for collecting information, and the
>>>>>>>> high quality work of many authors yields very reliable information.
>>>>>>>> Still, a challenge for users of Wikidata is that there is no way to
>>>>>>>> see whether *all* data on a certain topic is in Wikidata. For
>>>>>>>> instance, it is easy to see that Malia and Sasha are children of
>>>>>>>> Obama, but there is no way to specify that these are all his
>>>>>>>> children. More generally, Wikidata stores many facts, but it stores
>>>>>>>> no information about the topics for which it contains all facts.
>>>>>>>>
>>>>>>>> Today we are happy to share with you a prototype that allows adding
>>>>>>>> and managing such completeness information, and would be happy to
>>>>>>>> get your feedback on how useful you consider this tool, or where you
>>>>>>>> see space for improvements.
>>>>>>>>
>>>>>>>> With our prototype, called COOL-WD (Completeness Tool for Wikidata),
>>>>>>>> one can:
>>>>>>>> 1. See completeness statements for Wikidata facts
>>>>>>>> 2. Add, remove, aggregate and filter completeness statements
>>>>>>>> 3. See how completeness statements allow conclusions about the
>>>>>>>> completeness of SPARQL queries over Wikidata.
>>>>>>>>
>>>>>>>> COOL-WD is available at http://cool-wd.inf.unibz.it/ and a 3-min
>>>>>>>> demo video can be found at http://cool-wd.inf.unibz.it/coolwd-hd.mp4
>>>>>>>>
>>>>>>>> It employs various libraries, most importantly GWT, Apache Jena,
>>>>>>>> SQLite and the Wikidata API. The formal background and description
>>>>>>>> of the tool, including an indexing technique for completeness
>>>>>>>> statements, have been accepted as a research paper at ICWE 2016
>>>>>>>> (http://icwe2016.inf.usi.ch/), available to download at:
>>>>>>>> http://bit.ly/1VOsRCH
>>>>>>>>
>>>>>>>> Below are some naive ideas of how completeness could be useful to
>>>>>>>> users:
>>>>>>>>
>>>>>>>> Use Case 1: Rido is a geographer who would like to contribute to
>>>>>>>> Wikidata about the administrative divisions of regions. He cares so
>>>>>>>> much about data quality, especially data completeness, and is
>>>>>>>> collaborating with Simon, another geographer. However, when
>>>>>>>> completing data on Wikidata, there is currently no way to mark which
>>>>>>>> data is complete. Rido and Simon must make these notes about
>>>>>>>> completeness manually in, say, a Google Doc. Worse still, the effort
>>>>>>>> from Rido and Simon to complete data could not be appreciated by
>>>>>>>> Wikidata users since to the users’ eyes, there is no difference
>>>>>>>> between complete data and incomplete data on Wikidata.
>>>>>>>> Demo: Wikidata is complete for all administrative divisions of
>>>>>>>> Saxony (http://cool-wd.inf.unibz.it/?p=Q1202)
>>>>>>>>
>>>>>>>> Use Case 2: Jen is a developer of a moviegoer application. She
>>>>>>>> usually integrates data between multiple sources including Wikidata.
>>>>>>>> If some movies on Wikidata have completeness statements, she might
>>>>>>>> optimize her application to not search in other data sources for
>>>>>>>> those movies.
>>>>>>>> Demo: So, when her app is asking on COOL-WD at
>>>>>>>> http://cool-wd.inf.unibz.it/?p=query for cast and screenwriters of
>>>>>>>> the movie Before Sunset (http://cool-wd.inf.unibz.it/?p=Q652186):
>>>>>>>>
>>>>>>>>   SELECT * WHERE { wd:Q652186 wdt:P161 ?c . wd:Q652186 wdt:P58 ?s }
>>>>>>>>
>>>>>>>> Her app gets not only query answers but also the completeness
>>>>>>>> information of her query.
>>>>>>>>
>>>>>>>> We are looking forward to your feedback!
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Fariz, Simon, Rido, and Werner
>>>>>>>> Free University of Bozen-Bolzano, Italy
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Wikidata mailing list
>>>>>>>> Wikidata(a)lists.wikimedia.org
>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata