On Wed, Aug 19, 2015 at 11:19 PM Monte Hurd <mhurd(a)wikimedia.org> wrote:
No manual descriptions, on basically any item. And that will remain so for
the (near) future. Automatic descriptions can change that, literally
overnight, with a little programming and linguistic effort. ... This is a
"force multiplier" of volunteer effort with a factor of 250. And we ignore
that ... why, exactly?
Not ignoring. In fact, if the auto-generated descriptions near the quality
of human-curated descriptions, I'm totally and wholeheartedly on board that
their use should be strongly considered.
I just disagree that closing the quality gap will involve "little
programming and linguistic effort." I lean more toward the "massive
programming and linguistic effort" end of the spectrum.
Specifically, I think it will take massive effort to make the
auto-generated descriptions so good that an average person would say, "hey,
these auto-generated descriptions are better than the human-curated
descriptions" in the examples I posted.
You are confusing (in the literal meaning of the word, fusing together)
several issues into one here, which you then call "better". I see at least
five distinct types of "better":
1. A description exists, vs. it does not. In that respect, automatic
descriptions will always be "better" than manual ones.
2. One description is more complete than the other. From what I see in
random examples, this is already the case for many biographical items that
have a lot of statements. I have actually considered cutting them back a
little, because even these "short" descriptions can get quite extensive.
3. Context-aware, specifically, the context where the description is shown.
This one goes to the automatic descriptions. AutoDesc already can generate
plain text, links to Wikidata, links to a specific Wikipedia where there
are articles, and use plain text/redlinks/Wikidata links otherwise. It can
generate Wikitext, with some infoboxes. It could easily generate HTML
blurbs with a thumbnail if there is an image, and so on. This is in
contrast to plain text only for manual descriptions.
4. Linguistic/style. Manual descriptions CAN be better phrased than
automatic ones, but can also be worse. Automatic descriptions are
unimaginative, but consistent. Here is where I probably beg to differ from
most other people on this thread: I firmly believe that a description, even
if it is slightly wrong grammatically, is preferable to no description, as
long as humans still can understand what is meant. If the German
description gets the gender of "moon" wrong, so what? (I don't think it
does, but just for the sake of argument) Eventually, someone will implement
a fix for that. Maybe we'll have gender for things per language as
statements at some point, which would be useful beyond autodesc.
5. "To the point". That is where manual descriptions have their only
advantage in the long run. Even from a lot of statements, it is hard for an
algorithm to figure out why exactly that person, that thing, that event are
important. Sometimes it is something "obscure", something that does not fit
well into statements, or is "hidden" among them. And there, and only there,
do manual descriptions make sense, as I have always maintained.
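To make point 3 a bit more concrete, here is a minimal sketch of what
"context-aware" rendering could look like. This is not AutoDesc's actual
code; the Term structure, the mode names, the render_term/describe helpers
and the item IDs are all made up for illustration. The idea is only that
the same list of terms can be emitted as plain text, as Wikidata links, or
as links to a specific Wikipedia with a plain-text fallback where no
article exists.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Term:
    """One term of a generated description (hypothetical structure)."""
    label: str                     # label in the target language
    qid: str                       # Wikidata item ID (placeholder values below)
    article: Optional[str] = None  # article title on the target wiki, if any


def render_term(term: Term, mode: str) -> str:
    """Render a single term for the context the description is shown in."""
    if mode == "text":
        return term.label
    if mode == "wikidata":
        # Assumes the "d:" interwiki prefix for Wikidata links in wikitext.
        return f"[[d:{term.qid}|{term.label}]]"
    if mode == "wiki":
        # Link to the local article if there is one, else fall back to text.
        if term.article:
            return f"[[{term.article}|{term.label}]]"
        return term.label
    raise ValueError(f"unknown mode: {mode}")


def describe(terms: List[Term], mode: str = "text") -> str:
    """Join the rendered terms into one short description."""
    return ", ".join(render_term(t, mode) for t in terms)


# Illustrative data only; the IDs are placeholders, not real items.
terms = [
    Term("writer", "Q100001", article="Writer"),
    Term("humorist", "Q100002"),
]
```

With that, describe(terms, "text") gives plain text, describe(terms, "wiki")
links only the terms that have a local article, and
describe(terms, "wikidata") links everything to Wikidata. A manual
description, being a single stored string, has only the first of these.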
I am well aware of the limitations of automatic descriptions. I can also
see that "perfection" will never be reached, that the algorithms will never
be finished.
Like Wikipedia.