My previous reply was partial and accidentally sent - here's my actual
reply :)
On Sun, Mar 22, 2015 at 1:53 PM, Dmitry Brant <dbrant(a)wikimedia.org> wrote:
Hi Lydia,
Indeed, there are many more Wikidata items than Wikipedia articles.
However, the users of our mobile apps only see Wikipedia articles in our
search results (at least for now), which means that they will only be able
to contribute descriptions to Wikidata items for which a Wikipedia article
exists.
They are also used in "*Recent*" and "*Nearby*" and Vibha wants them in
"*Saved Pages*" list as well.
No doubt, the description field is an important component of each Wikidata
entry. But, when there is a corresponding Wikipedia article, why not query
it to provide an automatic description? This could be based on the first
sentence of the article, or a subset of the first sentence, or some other
kind of metadata within the article.
Why not query it to provide an automatic description? Because finding the
best subset of the first sentence(s) isn't all there is to it.
For example, take the enwiki "Fish" article.
The first couple sentences are these:
*A fish is any member of a paraphyletic group of organisms that consist of
all gill-bearing aquatic craniate animals that lack limbs with digits.
Included in this definition are the living hagfish, lampreys, and
cartilaginous and bony fish, as well as various extinct related groups.*
So if we reduce the description to its first sentence we have:
*A fish is any member of a paraphyletic group of organisms that consist of
all gill-bearing aquatic craniate animals that lack limbs with digits. *
Now, for the sake of argument, let's imagine the *bold* words below
represent a best case scenario for a relevant subset of the first sentence:
*A fish is* any member of *a* paraphyletic group of organisms that consist
of all *gill-bearing aquatic* craniate *animal*s that lack limbs with
digits.
So, we have "*A fish is a gill-bearing aquatic animal*", or you could
reduce it further to "a *gill-bearing aquatic animal*".
But reducing the first sentence in this way is deceptively complicated to
do programmatically, precisely because of the phrase "for the sake of
argument" above - it's almost entirely a matter of qualitative
judgement. You have to know what a fish is to know what parts of the first
sentence are most important and then you have to know how to contextually
stitch these words together according to rules of the language's grammar
and syntax so they "read" nicely (see the word "a" and the "s" on the end
of "animal*s*").
Basically, great descriptions require a native speaker of the language with
some skill at summarizing. This is such a low bar for humans that almost
anyone could contribute quality descriptions.
But if descriptions are not human editable, then we are stuck with the
limitations of whatever heuristics are used to auto-generate the
description.
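
To make the limitation concrete, here is a rough Python sketch of the kind
of heuristic I mean. It is only an illustration under my own assumptions:
the function names and the "strip the leading 'A <title> is'" rule are
made up for this example and are not taken from the apps or from any
existing extraction code.

```python
import re

# Opening sentences of the enwiki "Fish" article, as quoted above.
FISH_INTRO = (
    "A fish is any member of a paraphyletic group of organisms that consist of "
    "all gill-bearing aquatic craniate animals that lack limbs with digits. "
    "Included in this definition are the living hagfish, lampreys, and "
    "cartilaginous and bony fish, as well as various extinct related groups."
)


def first_sentence(text: str) -> str:
    """Return the text up to the first '.', '!' or '?' followed by whitespace
    or the end of the text. Deliberately naive: abbreviations such as
    'e.g. ' would cut the sentence short."""
    match = re.search(r"[.!?](?=\s|$)", text)
    return text[: match.end()].strip() if match else text.strip()


def naive_description(title: str, text: str) -> str:
    """Hypothetical heuristic: take the first sentence and drop a leading
    'A/An/The <title> is/are/was/were' so only the predicate remains."""
    sentence = first_sentence(text).rstrip(".!?")
    pattern = rf"^(?:an?|the)?\s*{re.escape(title)}\s+(?:is|are|was|were)\s+"
    return re.sub(pattern, "", sentence, count=1, flags=re.IGNORECASE)


if __name__ == "__main__":
    print(naive_description("Fish", FISH_INTRO))
```

Running this on the Fish intro yields the whole predicate starting with
"any member of a paraphyletic group of organisms...", which is a long way
from "a gill-bearing aquatic animal". Closing that gap means deciding which
words matter and re-stitching them grammatically, and that is the part no
simple rule covers.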
The key is that the description would stay with the article, which would
eliminate the need for duplication and synchronization.
So, in a sense, I would look at it the other way: descriptions within
Wikipedia articles would be useful for Wikidata entries.
-Dmitry
On Sun, Mar 22, 2015 at 4:17 PM, Lydia Pintscher <lydia.pintscher(a)wikimedia.de> wrote:
On Sun, Mar 22, 2015 at 9:10 PM, Dmitry Brant <dbrant(a)wikimedia.org> wrote:
Hi Jane,
Perhaps my comments came off as more pessimistic than I intended. Of course
I believe in the power of crowdsourcing, and I would never want to make
anyone feel like their contributions are being marginalized.
I'll agree for now that the idea of "fully" automated descriptions leans
more towards science fiction than reality. :)
However, my whole point has more to do with the apparent duplication of
content that seems to be happening between the first sentence of Wikipedia
articles and the corresponding Wikidata description. There's something
about it that seems unnecessary. If we can figure out a way to
automatically extract the description from the first sentence of the
article, it would simplify things in two ways:
1) People wouldn't need to edit Wikidata descriptions, and would instead
focus on improving the Wikipedia article.
2) People who monitor changes made to articles would need to monitor only
the article, instead of the article plus its corresponding Wikidata
description.
There are a lot more items on Wikidata than articles on Wikipedia. And
not every language has a Wikipedia article for each item. Don't just
look at descriptions on Wikidata as something useful for Wikipedia.
They're much more than that.
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
_______________________________________________
Mobile-l mailing list
Mobile-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mobile-l