> No manual descriptions, on basically any item. And that will remain so for the (near) future. Automatic descriptions can change that, literally overnight, with a little programming and linguistic effort. ... This is a "force multiplier" of volunteer effort by a factor of 250. And we ignore that ... why, exactly?

We're not ignoring it. In fact, if the auto-generated descriptions approach the quality of human-curated descriptions, I'm totally and wholeheartedly on board with strongly considering their use.

I just disagree that closing the quality gap will involve "little programming and linguistic effort." I lean more toward the "massive programming and linguistic effort" end of the spectrum.

Specifically, I think it will take massive effort to make the auto-generated descriptions so good that an average person, looking at the examples I posted, would say, "Hey, these auto-generated descriptions are better than the human-curated ones."

But I may, of course, be wrong!

On Wed, Aug 19, 2015 at 1:27 PM, S Page <spage@wikimedia.org> wrote:
My hero Magnus Manske noted:
> The situation, for most languages, is this: No manual descriptions, on basically any item. And that will remain so for the (near) future. Automatic descriptions can change that, literally overnight, with a little programming and linguistic effort. ... This is a "force multiplier" of volunteer effort by a factor of 250. And we ignore that ... why, exactly?

AutoDesc has such enormous potential to help attain "a world in which every single person on the planet is given free access to the sum of all human knowledge" that it should be the entire movement's top project. I nearly wrote a career-limiting e-mail rant to WMF-all on that subject last night.

In this e-mail thread we're talking about it in the limited scope of "Wikidata descriptions in search on mobile web beta", where the mobile client presents a useful signpost for *existing* articles, in an emblem on lead images and in search results. That's important, but we're missing the forest for a single tree when discussing such a transformative technology. If only WMF had a CTO for such things [1].


Anyway, returning to this specific use case:
* Nobody is saying store the AutoDesc in the Wikidata per-language description field.
* Nobody is saying show the AutoDesc if there is an existing Wikidata description.
* Is anybody against showing AutoDesc, after some refinement and productization [2], in these mobile use cases when there is no Wikidata description?
* I propose the AutoDesc as a quality bar that any edit to a Wikidata description needs to improve on (but again that's a topic beyond this mail thread).
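Taken together, the first three bullets describe a display-only fallback. A minimal sketch of that rule, using a hypothetical `pickDescription` helper: the `descriptions` argument is assumed to follow the shape of the per-language "descriptions" map in Wikidata's entity JSON, and `autoDesc` is assumed to come from something like AutoDesc.

```javascript
// Hypothetical display rule for the mobile use case: show the Wikidata
// description if there is one, otherwise fall back to an automatic one.
// `descriptions` follows the shape of an entity's "descriptions" map in
// Wikidata JSON, e.g. { en: { language: 'en', value: 'Belgian politician' } }.
// The automatic text is only displayed; it is never written back to the item.
function pickDescription(descriptions, lang, autoDesc) {
  const manual = descriptions ? descriptions[lang] : undefined;
  if (manual && typeof manual.value === 'string' && manual.value.trim() !== '') {
    return { text: manual.value, source: 'wikidata' }; // never override curated text
  }
  if (typeof autoDesc === 'string' && autoDesc.trim() !== '') {
    return { text: autoDesc, source: 'auto' }; // fill the gap, labeled as automatic
  }
  return { text: '', source: 'none' };
}

const malouDescriptions = { en: { language: 'en', value: 'Belgian politician' } };
console.log(pickDescription(malouDescriptions, 'en', 'Belgian politician and lawyer').source); // wikidata
console.log(pickDescription(malouDescriptions, 'sv', 'belgisk politiker').source); // auto
```

The `source` field lets the client visibly mark automatic text, which fits the "fill gaps, don't override" preference Nemo voices in this thread.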

Yours, excitedly,
=S Page

[1] http://grnh.se/30f54b , apply today!
[2] https://bitbucket.org/magnusmanske/autodesc/src/HEAD/www/js/?at=master and https://github.com/dbrant/wikidata-autodesc . It's already a nodejs service; can we append "oid" and declare victory? :-)

On Wed, Aug 19, 2015 at 2:57 AM, Magnus Manske <magnusmanske@googlemail.com> wrote:
Oh, and as for examples, random-paging just got me this:

https://en.wikipedia.org/wiki/Jules_Malou

Manual description: Belgian politician

Automatic description: Belgian politician and lawyer, Prime Minister of Belgium, and member of the Chamber of Representatives of Belgium (1810–1886) ♂

I know which one I'd prefer...
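The sketch below is not Magnus' autodesc code (that is linked elsewhere in the thread). It only illustrates the general idea behind the Jules Malou example: compose a phrase from a few item properties such as nationality, occupations, positions held, life dates, and gender. The flat `item` object and its field names are hypothetical simplifications of Wikidata claims, with labels assumed to be already resolved.

```javascript
// Illustrative sketch, NOT the actual autodesc.js algorithm.
// `item` is a hypothetical flattened view of a Wikidata item whose claim
// values (e.g. country of citizenship P27, occupation P106, position held
// P39) have already been resolved to English labels. Turning "Belgium" into
// the adjective "Belgian" is assumed done already; that step is part of the
// "linguistic effort" discussed in this thread.

// Join phrases as "a", "a and b", or "a, b, and c".
function joinList(parts) {
  if (parts.length <= 1) return parts.join('');
  if (parts.length === 2) return parts[0] + ' and ' + parts[1];
  return parts.slice(0, -1).join(', ') + ', and ' + parts[parts.length - 1];
}

function describePerson(item) {
  // "Belgian politician and lawyer"
  const head = [item.nationality, joinList(item.occupations || [])]
    .filter(Boolean)
    .join(' ');
  // Append positions held as further list items.
  let desc = joinList([head, ...(item.positions || [])].filter(Boolean));
  // Life dates in parentheses, using an en dash as in the example above.
  if (item.born || item.died) {
    desc += ' (' + (item.born || '?') + '–' + (item.died || '') + ')';
  }
  // Gender symbol, if known.
  const symbol = { male: '♂', female: '♀' }[item.gender];
  return symbol ? desc + ' ' + symbol : desc;
}

console.log(describePerson({
  nationality: 'Belgian',
  occupations: ['politician', 'lawyer'],
  positions: ['Prime Minister of Belgium', 'member of the Chamber of Representatives of Belgium'],
  born: 1810,
  died: 1886,
  gender: 'male',
}));
// → Belgian politician and lawyer, Prime Minister of Belgium, and member of the Chamber of Representatives of Belgium (1810–1886) ♂
```

Because the string is recomputed from the data on every call, correcting or adding a claim changes the output without anyone editing a description.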


On Wed, Aug 19, 2015 at 10:50 AM Magnus Manske <magnusmanske@googlemail.com> wrote:
Thank you Dmitry! Well phrased and to the point!

As for "templating", that might be the worst of both worlds: it lacks the flexibility and over-time improvement of automatic descriptions, yet makes descriptions harder for people to enter (compared to "free-style" text). We have a Visual Editor on Wikipedia for a reason :-)



On Wed, Aug 19, 2015 at 4:07 AM Dmitry Brant <dbrant@wikimedia.org> wrote:
My thoughts, as ever(!), are as follows:

- The tool that generates the descriptions deserves a lot more development. Magnus' tool is very much a prototype and offers only a tiny glimpse of what's possible. Judging automatic descriptions by its current output is a straw man.
- Auto-generated descriptions work for current articles, and all future articles. They automatically adapt to updated data. They automatically become more accurate as new data is added.
- When you edit the descriptions yourself, you're not really making a meaningful contribution to the *data* that underpins the given Wikidata entry; i.e. you're not contributing any new information. You're simply paraphrasing the first sentence or two of the Wikipedia article. That can't possibly be a productive use of contributors' time.

As for Brian's suggestion:
It would be a step forward; we could even invent a whole template-type syntax for transcluding bits of actual data into the description. But IMO, that kind of effort would still be better spent on fully automatic descriptions, because they are the ideal that semi-automatic descriptions can only approach.


On Tue, Aug 18, 2015 at 10:36 PM, Brian Gerstle <bgerstle@wikimedia.org> wrote:
Could there be a way to have our nicely curated description cake and eat it too? For example, interpolating data into the description and/or marking data points which are referenced in the description (so as to mark it as outdated when they change)? 

I appreciate the potential benefits of generated descriptions (and other things), but Monte's examples might have swayed me towards human curated—when available.
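Brian's suggestion (and the "template-type syntax" Dmitry mentions in reply) could be sketched roughly like this. The `{{P577}}`-style placeholder syntax, the flat `claims` map of property IDs to already-resolved strings, and both function names are hypothetical; P577 (publication date) and P57 (director) are real Wikidata property IDs used only as examples.

```javascript
// Hypothetical placeholder syntax: {{P577}} stands for the value of that
// property on the item. `claims` maps property IDs to resolved strings.
const PLACEHOLDER = /\{\{(P\d+)\}\}/g;

// Interpolate data into a human-written template and record which data
// points were used, so the result can later be checked for staleness.
function renderTemplate(template, claims) {
  const used = {};
  const text = template.replace(PLACEHOLDER, (match, pid) => {
    if (!(pid in claims)) return match; // unknown data: leave the placeholder visible
    used[pid] = claims[pid];
    return String(claims[pid]);
  });
  return { text, used };
}

// A cached rendering (or a description marked as depending on these data
// points) is outdated once any referenced value has changed.
function isOutdated(rendered, currentClaims) {
  return Object.keys(rendered.used).some(
    (pid) => currentClaims[pid] !== rendered.used[pid]
  );
}

const filmClaims = { P577: '1996', P57: 'Edward Zwick' };
const film = renderTemplate('{{P577}} film about a Gulf War friendly-fire incident', filmClaims);
console.log(film.text); // 1996 film about a Gulf War friendly-fire incident
console.log(isOutdated(film, { ...filmClaims, P577: '1997' })); // true
```

If the text is always rendered live it can never go stale; the `used` snapshot only matters when the rendered text is stored, which is the "mark it as outdated" half of the idea.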

On Tuesday, August 18, 2015, Monte Hurd <mhurd@wikimedia.org> wrote:
Ok, so I just did what I proposed. I went to random enwiki articles and described the first ten I found which didn't already have descriptions:


- "Courage Under Fire", 1996 film about a Gulf War friendly-fire incident

- "Pebasiconcha immanis", largest known species of land snail, extinct

- "List of Kenyan writers", notable Kenyan authors

- "Solar eclipse of December 14, 1917", annular eclipse which lasted 77 seconds

- "Natchaug Forest Lumber Shed", historic Civilian Conservation Corps post-and-beam building

- "Sun of Jamaica (album)", debut 1980 studio album by Goombay Dance Band

- "E-1027", modernist villa in France by architect Eileen Gray

- "Daingerfield State Park", park in Morris County, Texas, USA, bordering Lake Daingerfield

- "Todo Lo Que Soy-En Vivo", 2014 Live album by Mexican pop singer Fey

- "2009 UEFA Regions' Cup", 6th UEFA Regions' Cup, won by Castile and Leon



And here are the respective descriptions from Magnus' (quite excellent) autodesc.js:



- "Courage Under Fire", 1996 film by Edward Zwick, produced by John Davis and David T. Friendly from United States of America

- "Pebasiconcha immanis", species of Mollusca

- "List of Kenyan writers", Wikimedia list article

- "Solar eclipse of December 14, 1917", solar eclipse

- "Natchaug Forest Lumber Shed", Construction in Connecticut, United States of America

- "Sun of Jamaica (album)", album

- "E-1027", villa in Roquebrune-Cap-Martin, France

- "Daingerfield State Park", state park and state park of a state of the United States in Texas, United States of America

- "Todo Lo Que Soy-En Vivo", live album by Fey

- "2009 UEFA Regions' Cup", none



Thoughts? 

Just trying to make my own bold assertions falsifiable :)



On Tue, Aug 18, 2015 at 6:32 PM, Monte Hurd <mhurd@wikimedia.org> wrote:
I think the whole human-vs-extracted description quality question could be fairly easy to test:

- Pick some number of articles at random.
- Run them through a description extraction script.
- Have a human describe the same articles with, say, the app interface I demo'ed.

If nothing else, this exercise could make what has thus far been a wildly abstract discussion more concrete.
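The three steps above could be scripted roughly as follows. `list=random` with `rnnamespace=0` is a standard MediaWiki API query; the helper names, and passing in the extractor and the human descriptions as plain inputs, are assumptions for illustration, and the network request itself is left out.

```javascript
// Step 1: URL for a random sample of articles (namespace 0 only).
function randomArticlesUrl(wiki, count) {
  const params = new URLSearchParams({
    action: 'query',
    list: 'random',
    rnnamespace: '0',
    rnlimit: String(count),
    format: 'json',
  });
  return 'https://' + wiki + '/w/api.php?' + params.toString();
}

// The API response lists the sampled pages under query.random.
function titlesFromRandomResponse(json) {
  return json.query.random.map((page) => page.title);
}

// Steps 2 and 3: pair each title with the extractor's output and a
// human-written description. Both inputs are hypothetical: `autoDescribe`
// is any function title -> string, `humanDescriptions` a title -> string map.
function compareDescriptions(titles, autoDescribe, humanDescriptions) {
  return titles.map((title) => ({
    title,
    human: humanDescriptions[title] || null,
    // Mirror the "none" reported when nothing is generated.
    auto: autoDescribe(title) || 'none',
  }));
}

console.log(randomArticlesUrl('en.wikipedia.org', 10));
```

Rows in this shape make the side-by-side comparison easy to repeat on a larger sample.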




On Tue, Aug 18, 2015 at 6:17 PM, Monte Hurd <mhurd@wikimedia.org> wrote:
If having the most elegant description extraction mechanism were the goal, I would totally agree ;)

On Tue, Aug 18, 2015 at 5:19 PM, Dmitry Brant <dbrant@wikimedia.org> wrote:
IMO, allowing the user to edit the description is a missed opportunity to make the user edit the actual *data*, such that the description is generated correctly.



On Tue, Aug 18, 2015 at 8:02 PM, Monte Hurd <mhurd@wikimedia.org> wrote:
IMO, if the goal is quality, then human-curated descriptions are superior until such time as the auto-generation script passes the Turing test ;)

I see these empty descriptions as an amazing opportunity to give *everyone* an easy new way to edit. I whipped up an app editing interface at the Lyon hackathon:

I used it to add a couple hundred descriptions in a single day just by hitting "random" and then adding descriptions to articles that didn't have them.

I'd love to try a limited test of this in production to get a sense for how effective human curation can be if the interface is easy to use...


On Tue, Aug 18, 2015 at 1:25 PM, Jan Ainali <jan.ainali@wikimedia.se> wrote:
Nice one! 

It does not appear to work on svwiki, though. Could that be because svwiki does not display that tagline?

Kind regards,
Jan Ainali

Executive Director, Wikimedia Sverige
0729 - 67 29 48


Imagine a world in which every person has free access to the sum of all human knowledge. That's what we do.


2015-08-18 17:23 GMT+02:00 Magnus Manske <magnusmanske@googlemail.com>:
Show automatic description underneath "From Wikipedia...":
https://en.wikipedia.org/wiki/User:Magnus_Manske/autodesc.js

To use, add:
importScript ( 'User:Magnus_Manske/autodesc.js' ) ;
to your common.js

On Tue, Aug 18, 2015 at 9:47 AM Jane Darnell <jane023@gmail.com> wrote:
It would be even better if this (short: 3 field max) pipe-separated list were available as a gadget for Wikidatans on Wikipedia (like me). I can't see whether a page I am on has an "instance of" (though it should), but I can see the description thanks to another gadget (sorry, no idea which one that is). I often update empty descriptions, but if I were served basic fields (for a painting, say, the creator field), I would click through to update those too.
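Jane's "3 field max" pipe-separated summary is simple to sketch. Which fields to show for which kind of item (say, the creator for a painting) is left to the caller here; `pipeSummary` and its `{ name, value }` input are hypothetical. Empty fields are returned separately so a gadget could offer them as things to fill in.

```javascript
// Hypothetical gadget helper. `fields` is an ordered array of
// { name, value } objects, with values already resolved to labels.
// Only the first `maxFields` fields are considered (Jane suggests 3).
function pipeSummary(fields, maxFields = 3) {
  const shown = fields.slice(0, maxFields);
  const hasValue = (f) => f.value != null && String(f.value).trim() !== '';
  return {
    // e.g. "villa | Eileen Gray | Roquebrune-Cap-Martin"
    text: shown.filter(hasValue).map((f) => String(f.value).trim()).join(' | '),
    // Names of empty fields, which a gadget could link for editing.
    missing: shown.filter((f) => !hasValue(f)).map((f) => f.name),
  };
}

console.log(JSON.stringify(pipeSummary([
  { name: 'instance of', value: 'painting' },
  { name: 'creator', value: '' },
])));
// → {"text":"painting","missing":["creator"]}
```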

On Tue, Aug 18, 2015 at 9:58 AM, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
Jane Darnell, 15/08/2015 08:53:
> Yes, but even if the descriptions were just the contents of fields separated by a pipe, it would be better than nothing.

+1, item descriptions are mostly useless in my experience.

As for "get into production on Wikipedia", I don't know what it means. I certainly don't like 1) mobile-specific features or 2) overriding existing manually curated content, but it's good to 3) fill gaps. Mobile folks often do (1) and (2); if they *instead* did (3), I'd be very happy. :)

Nemo

_______________________________________________
Mobile-l mailing list
Mobile-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mobile-l





--
Dmitry Brant
Mobile Apps Team (Android)
Wikimedia Foundation
https://www.mediawiki.org/wiki/Wikimedia_mobile_engineering






--
EN Wikipedia user page: https://en.wikipedia.org/wiki/User:Brian.gerstle
IRC: bgerstle










--
=S Page  WMF Tech writer
