On Wed, Aug 19, 2015 at 11:19 PM Monte Hurd mhurd@wikimedia.org wrote:
No manual descriptions, on basically any item. And that will remain so for the (near) future. Automatic descriptions can change that, literally overnight, with a little programming and linguistic effort. ... This is a "force multiplier" of volunteer effort with a factor of 250. And we ignore that ... why, exactly?
Not ignoring. In fact, if the auto-generated descriptions come near the quality of human-curated descriptions, I'm totally and wholeheartedly on board with strongly considering their use.
I just disagree that closing the quality gap will involve "little programming and linguistic effort." I lean more toward the "massive programming and linguistic effort" end of the spectrum.
Specifically, I think it will take massive effort to make the auto-generated descriptions so good that an average person would say, "hey, these auto-generated descriptions are better than the human-curated descriptions" in the examples I posted.
You are confusing (in the literal meaning of the word, fusing together) several issues into one here, which you then call "better". I see at least five distinct types of "better":
1. A description exists, vs. it does not. In that aspect, automatic descriptions will always be "better" than manual ones.
2. One description is more complete than the other. From what I see in random examples, this is already the case for many biographical items that have a lot of statements. I have actually considered cutting them back a little, because even these "short" descriptions can get quite extensive.
3. Context-aware, specifically, the context where the description is shown. This one goes to the automatic descriptions. AutoDesc can already generate plain text, links to Wikidata, links to a specific Wikipedia where there are articles, and use plain text/redlinks/Wikidata links otherwise. It can generate Wikitext, with some infoboxes. It could easily generate HTML blurbs with a thumbnail if there is an image, and so on. This is in contrast to plain text only for manual descriptions.
4. Linguistic/style. Manual descriptions CAN be better phrased than automatic ones, but can also be worse. Automatic descriptions are unimaginative, but consistent. Here is where I probably beg to differ from most other people on this thread: I firmly believe that a description, even if it is slightly wrong grammatically, is preferable to no description, as long as humans can still understand what is meant. If the German description gets the gender of "moon" wrong, so what? (I don't think it does, but just for the sake of argument.) Eventually, someone will implement a fix for that. Maybe we'll have gender for things per language as statements at some point, which would be useful beyond autodesc.
5. "To the point". That is where manual descriptions have their only advantage in the long run. Even from a lot of statements, it is hard for an algorithm to figure out why exactly that person, that thing, that event are important. Sometimes it is something "obscure", something that does not fit well into statements, or is "hidden" among them. And there, and only there, do manual descriptions make sense, as I have always maintained.
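To make point 3 a bit more concrete, here is a rough sketch of what "context-aware" can mean in practice: the same description fragments rendered as plain text, as wikitext for a given Wikipedia (falling back to a Wikidata link where that wiki has no article), or as HTML. This is not AutoDesc's actual code; the `Part` class, the `render` function, and the mode names are made up for illustration, and the item IDs and sitelinks in the example are only sample data.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Dict, List, Optional


@dataclass
class Part:
    """One fragment of a description (hypothetical structure).

    qid is None for plain connecting text such as " from ".
    sitelinks maps a wiki code (e.g. "en") to an article title.
    """
    text: str
    qid: Optional[str] = None
    sitelinks: Dict[str, str] = field(default_factory=dict)


def render(parts: List[Part], mode: str = "text", wiki: Optional[str] = None) -> str:
    """Render the same fragments differently depending on the display context."""
    if mode not in ("text", "wikitext", "html"):
        raise ValueError(f"unknown mode: {mode}")
    out = []
    for p in parts:
        if mode == "text":
            out.append(p.text)
        elif mode == "html":
            label = escape(p.text)
            if p.qid:
                out.append(f'<a href="https://www.wikidata.org/wiki/{p.qid}">{label}</a>')
            else:
                out.append(label)
        else:  # wikitext
            title = p.sitelinks.get(wiki)
            if p.qid is None:
                out.append(p.text)
            elif title:
                # The target wiki has an article: link to it directly.
                out.append(f"[[{title}|{p.text}]]")
            else:
                # No local article: fall back to a Wikidata interwiki link.
                out.append(f"[[d:{p.qid}|{p.text}]]")
    return "".join(out)


# Sample data for illustration only.
parts = [
    Part("writer", "Q36180", {"en": "Writer"}),
    Part(" from "),
    Part("United Kingdom", "Q145", {"en": "United Kingdom"}),
]

print(render(parts))                    # plain text
print(render(parts, "wikitext", "en"))  # local article links
print(render(parts, "wikitext", "de"))  # Wikidata fallback links
print(render(parts, "html"))            # HTML links
```

A manual description is a single stored string, so it only ever gets the first of these four outputs.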
I am well aware of the limitations of automatic descriptions. I can also see that "perfection" will never be reached, that the algorithms will never be finished.
Like Wikipedia.