My hero Magnus Manske noted:
The situation, for most languages, is this: No manual descriptions, on basically any item. And that will remain so for the (near) future. Automatic descriptions can change that, literally overnight, with a little programming and linguistic effort. ... This is a "force multiplier" of volunteer effort with a factor of 250. And we ignore that ... why, exactly?
The potential of AutoDesc to help attain "a world in which every single person on the planet is given free access to the sum of all human knowledge" is so enormous that it should be the entire movement's top project. I nearly wrote a career-limiting e-mail rant to WMF-all on that subject last night.
In this e-mail thread we're talking about it in the limited scope of "Wikidata descriptions in search on mobile web beta", where the mobile client presents a useful signpost for *existing* articles, in an emblem on lead images and in search results. That's important but we're missing the forest for a single tree when discussing such a transformative technology. If only WMF had a CTO for such things [1].
Anyway, returning to this specific use case:
* Nobody is saying store the AutoDesc in the Wikidata per-language description field.
* Nobody is saying show the AutoDesc if there is an existing Wikidata description.
* Is anybody against showing AutoDesc, after some refinement and productization [2], in these mobile use cases when there is no Wikidata description?
* I propose the AutoDesc as a quality bar that any edit to a Wikidata description needs to improve on (but again that's a topic beyond this mail thread).
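The display rule in the list above can be sketched in a few lines. This is only an illustration of the proposal, not code from AutoDesc or the mobile client; the function name `description_to_display` and its two arguments are assumptions made for the example.

```python
from typing import Optional


def description_to_display(manual: Optional[str], auto: Optional[str]) -> Optional[str]:
    """Pick the text to show under a title (hypothetical helper).

    Nothing is written back to Wikidata; this only chooses what to display.
    """
    # An existing Wikidata description always wins.
    if manual and manual.strip():
        return manual.strip()
    # The generated text is used only to fill a gap.
    if auto and auto.strip():
        return auto.strip()
    # Neither source has anything: show no description at all.
    return None


print(description_to_display("Belgian politician", "Belgian politician and lawyer"))
print(description_to_display(None, "villa in Roquebrune-Cap-Martin, France"))
```

The first call prints the manual text and the second falls back to the generated one, which is the "fill gaps, never override" behaviour discussed in this thread.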
Yours, excitedly, =S Page
[1] http://grnh.se/30f54b , apply today!
[2] https://bitbucket.org/magnusmanske/autodesc/src/HEAD/www/js/?at=master and https://github.com/dbrant/wikidata-autodesc . It's already a nodejs service, can we append "oid" and declare victory? :-)
On Wed, Aug 19, 2015 at 2:57 AM, Magnus Manske magnusmanske@googlemail.com wrote:
Oh, and as for examples, random-paging just got me this:
https://en.wikipedia.org/wiki/Jules_Malou
Manual description: Belgian politician
Automatic description: Belgian politician and lawyer, Prime Minister of Belgium, and member of the Chamber of Representatives of Belgium (1810–1886) ♂
I know which one I'd prefer...
On Wed, Aug 19, 2015 at 10:50 AM Magnus Manske magnusmanske@googlemail.com wrote:
Thank you Dmitry! Well phrased and to the point!
As for "templating", that might be the worst of both worlds; without the flexibility and over-time improvement of automatic descriptions, but making it harder for people to enter (compared to "free-style" text). We have a Visual Editor on Wikipedia for a reason :-)
On Wed, Aug 19, 2015 at 4:07 AM Dmitry Brant dbrant@wikimedia.org wrote:
My thoughts, as ever(!), are as follows:
- The tool that generates the descriptions deserves a lot more development. Magnus' tool is very much a prototype, and represents a tiny glimpse of what's possible. Looking at its current output is a straw man.
- Auto-generated descriptions work for current articles, and *all future articles*. They automatically adapt to updated data. They automatically become more accurate as new data is added.
- When you edit the descriptions yourself, you're not really making a meaningful contribution to the *data* that underpins the given Wikidata entry; i.e. you're not contributing any new information. You're simply paraphrasing the first sentence or two of the Wikipedia article. That can't possibly be a productive use of contributors' time.
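To make the "adapts to updated data" point concrete, here is a toy generator. It is a sketch under stated assumptions, not how autodesc.js works: the field names `instance_of`, `year` and `creator` are hypothetical stand-ins for Wikidata statements.

```python
def describe(item: dict) -> str:
    """Build a short description from whichever (hypothetical) fields are present."""
    parts = []
    if item.get("year"):
        parts.append(str(item["year"]))
    # Fall back to a generic noun when even the type is unknown.
    parts.append(item.get("instance_of", "item"))
    text = " ".join(parts)
    if item.get("creator"):
        text += " by " + item["creator"]
    return text


album = {"instance_of": "album"}
before = describe(album)  # sparse data gives a sparse description

# Someone adds two statements to the item; no description is edited by hand.
album.update({"year": 1980, "creator": "Goombay Dance Band"})
after = describe(album)

print(before)  # album
print(after)   # 1980 album by Goombay Dance Band
```

The description improves as a side effect of improving the data, which is the behaviour the second bullet describes.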
As for Brian's suggestion: It would be a step forward; we can even invent a whole template-type syntax for transcluding bits of actual data into the description. But IMO, that kind of effort would still be better spent on fully-automatic descriptions, because that's the ideal that semi-automatic descriptions can only approach.
On Tue, Aug 18, 2015 at 10:36 PM, Brian Gerstle bgerstle@wikimedia.org wrote:
Could there be a way to have our nicely curated description cake and eat it too? For example, interpolating data into the description and/or marking data points which are referenced in the description (so as to mark it as outdated when they change)?
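One way to picture this suggestion is a curated template that records which data points it references, so it can be flagged when they change. This is a hedged sketch only: the `{name}` placeholder syntax, the `CuratedDescription` class and the field names are all invented for illustration and do not correspond to any existing Wikidata feature.

```python
import re
from dataclasses import dataclass, field

# Hypothetical placeholder syntax: {field_name}
PLACEHOLDER = re.compile(r"\{(\w+)\}")


@dataclass
class CuratedDescription:
    template: str
    # Values of the referenced fields at the time the human wrote the text.
    snapshot: dict = field(default_factory=dict)

    @classmethod
    def create(cls, template: str, data: dict) -> "CuratedDescription":
        keys = PLACEHOLDER.findall(template)
        return cls(template, {k: data[k] for k in keys})

    def render(self, data: dict) -> str:
        """Interpolate current data into the human-written template."""
        return PLACEHOLDER.sub(lambda m: str(data[m.group(1)]), self.template)

    def is_outdated(self, data: dict) -> bool:
        """True if any referenced data point changed since the text was written."""
        return any(data.get(k) != v for k, v in self.snapshot.items())


data = {"year": 2014, "performer": "Fey"}
desc = CuratedDescription.create("{year} live album by Mexican pop singer {performer}", data)
print(desc.render(data))
print(desc.is_outdated(data))

data["year"] = 2015  # the underlying statement is corrected later
print(desc.render(data))
print(desc.is_outdated(data))
```

Interpolation keeps the rendered text in step with the data, while the snapshot comparison marks the description for human review when a referenced value changes.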
I appreciate the potential benefits of generated descriptions (and other things), but Monte's examples might have swayed me towards human curated—when available.
On Tuesday, August 18, 2015, Monte Hurd mhurd@wikimedia.org wrote:
Ok, so I just did what I proposed. I went to random enwiki articles and described the first ten I found which didn't already have descriptions:
- "Courage Under Fire", *1996 film about a Gulf War friendly-fire incident*
- "Pebasiconcha immanis", *largest known species of land snail, extinct*
- "List of Kenyan writers", *notable Kenyan authors*
- "Solar eclipse of December 14, 1917", *annular eclipse which lasted 77 seconds*
- "Natchaug Forest Lumber Shed", *historic Civilian Conservation Corps post-and-beam building*
- "Sun of Jamaica (album)", *debut 1980 studio album by Goombay Dance Band*
- "E-1027", *modernist villa in France by architect Eileen Gray*
- "Daingerfield State Park", *park in Morris County, Texas, USA, bordering Lake Daingerfield*
- "Todo Lo Que Soy-En Vivo", *2014 Live album by Mexican pop singer Fey*
- "2009 UEFA Regions' Cup", *6th UEFA Regions' Cup, won by Castile and Leon*
And here are the respective descriptions from Magnus' (quite excellent) autodesc.js:
- "Courage Under Fire", *1996 film by Edward Zwick, produced by John Davis and David T. Friendly from United States of America*
- "Pebasiconcha immanis", *species of Mollusca*
- "List of Kenyan writers", *Wikimedia list article*
- "Solar eclipse of December 14, 1917", *solar eclipse*
- "Natchaug Forest Lumber Shed", *Construction in Connecticut, United States of America*
- "Sun of Jamaica (album)", *album*
- "E-1027", *villa in Roquebrune-Cap-Martin, France*
- "Daingerfield State Park", *state park and state park of a state of the United States in Texas, United States of America*
- "Todo Lo Que Soy-En Vivo", *live album by Fey*
- "2009 UEFA Regions' Cup", *none*
Thoughts?
Just trying to make my own bold assertions falsifiable :)
On Tue, Aug 18, 2015 at 6:32 PM, Monte Hurd mhurd@wikimedia.org wrote:
The whole human-vs-extracted descriptions quality question could be fairly easy to test I think:
- Pick some number of articles at random.
- Run them through a description extraction script.
- Have a human describe the same articles with, say, the app interface I demo'ed.
If nothing else this exercise could perhaps make what's thus far been a wildly abstract discussion more concrete.
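The three steps above could be wired together roughly as follows. This is a sketch, not an existing tool: `compare_descriptions` and the two callables it takes are hypothetical, and the lookup tables simply reuse a few descriptions quoted elsewhere in this thread in place of a real extraction script and a real human editor.

```python
import random
from typing import Callable, Iterable, List, Tuple


def compare_descriptions(
    titles: Iterable[str],
    auto_describe: Callable[[str], str],
    human_describe: Callable[[str], str],
    sample_size: int,
    seed: int = 0,
) -> List[Tuple[str, str, str]]:
    """Return (title, automatic, human) rows for a random sample of titles."""
    rng = random.Random(seed)  # seeded so the sample can be reproduced
    pool = list(titles)
    sample = rng.sample(pool, min(sample_size, len(pool)))
    return [(t, auto_describe(t), human_describe(t)) for t in sample]


# Stand-ins for the extraction script and the human editor.
auto = {
    "E-1027": "villa in Roquebrune-Cap-Martin, France",
    "Sun of Jamaica (album)": "album",
    "Pebasiconcha immanis": "species of Mollusca",
}
human = {
    "E-1027": "modernist villa in France by architect Eileen Gray",
    "Sun of Jamaica (album)": "debut 1980 studio album by Goombay Dance Band",
    "Pebasiconcha immanis": "largest known species of land snail, extinct",
}

rows = compare_descriptions(auto.keys(), auto.get, human.get, sample_size=2)
for title, a, h in rows:
    print(f"{title}\n  automatic: {a}\n  human:     {h}")
```

Rating the paired rows (for example by blind preference) is left out on purpose; the thread does not say how quality would be scored.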
On Tue, Aug 18, 2015 at 6:17 PM, Monte Hurd mhurd@wikimedia.org wrote:
> If having the most elegant description extraction mechanism was the
> goal I would totally agree ;)
>
> On Tue, Aug 18, 2015 at 5:19 PM, Dmitry Brant dbrant@wikimedia.org
> wrote:
>
>> IMO, allowing the user to edit the description is a missed
>> opportunity to make the user edit the actual *data*, such that the
>> description is generated correctly.
>>
>> On Tue, Aug 18, 2015 at 8:02 PM, Monte Hurd mhurd@wikimedia.org
>> wrote:
>>
>>> IMO, if the goal is quality, then human curated descriptions are
>>> superior until such time as the auto-generation script passes the Turing
>>> test ;)
>>>
>>> I see these empty descriptions as an amazing opportunity to give
>>> *everyone* an easy new way to edit. I whipped an app editing interface up
>>> at the Lyon hackathon:
>>> bluetooth720 https://www.youtube.com/watch?v=6VblyGhf_c8
>>>
>>> I used it to add a couple hundred descriptions in a single day
>>> just by hitting "random" then adding descriptions for articles which didn't
>>> have them.
>>>
>>> I'd love to try a limited test of this in production to get a
>>> sense for how effective human curation can be if the interface is easy to
>>> use...
>>>
>>> On Tue, Aug 18, 2015 at 1:25 PM, Jan Ainali
>>> jan.ainali@wikimedia.se wrote:
>>>
>>>> Nice one!
>>>>
>>>> Does not appear to work on svwiki though. Does it have something
>>>> to do with that the wiki in question does not display that tagline?
>>>>
>>>> *Kind regards, Jan Ainali*
>>>>
>>>> Executive Director, Wikimedia Sverige http://wikimedia.se
>>>> 0729 - 67 29 48
>>>>
>>>> *Imagine a world in which every single person has free access to
>>>> the sum of all human knowledge. That's what we're doing.*
>>>> Become a member. http://blimedlem.wikimedia.se
>>>>
>>>> 2015-08-18 17:23 GMT+02:00 Magnus Manske
>>>> magnusmanske@googlemail.com:
>>>>
>>>>> Show automatic description underneath "From Wikipedia...":
>>>>> https://en.wikipedia.org/wiki/User:Magnus_Manske/autodesc.js
>>>>>
>>>>> To use, add:
>>>>> importScript ( 'User:Magnus_Manske/autodesc.js' ) ;
>>>>> to your common.js
>>>>>
>>>>> On Tue, Aug 18, 2015 at 9:47 AM Jane Darnell jane023@gmail.com
>>>>> wrote:
>>>>>
>>>>>> It would be even better if this (short: 3 field max)
>>>>>> pipe-separated list was available as a gadget to wikidatans on Wikipedia
>>>>>> (like me). I can't see if a page I am on has an "instance of" (though it
>>>>>> should) and I can see the description thanks to another gadget (sorry no
>>>>>> idea which one that is). Often I will update empty descriptions, but if I
>>>>>> was served basic fields (so for a painting, the creator field), I would
>>>>>> click through to update that too.
>>>>>>
>>>>>> On Tue, Aug 18, 2015 at 9:58 AM, Federico Leva (Nemo)
>>>>>> nemowiki@gmail.com wrote:
>>>>>>
>>>>>>> Jane Darnell, 15/08/2015 08:53:
>>>>>>>
>>>>>>>> Yes but even if the descriptions were just the contents of
>>>>>>>> fields separated by a pipe it would be better than nothing.
>>>>>>>
>>>>>>> +1, item descriptions are mostly useless in my experience.
>>>>>>>
>>>>>>> As for "get into production on Wikipedia" I don't know what it
>>>>>>> means, I certainly don't like 1) mobile-specific features, 2) overriding
>>>>>>> existing manually curated content; but it's good to 3) fill gaps. Mobile
>>>>>>> folks often do (1) and (2), if they *instead* did (3) I'd be very happy. :)
>>>>>>>
>>>>>>> Nemo
>>>>>>
>>>>>> _______________________________________________
>>>>>> Mobile-l mailing list
>>>>>> Mobile-l@lists.wikimedia.org
>>>>>> https://lists.wikimedia.org/mailman/listinfo/mobile-l
>>
>> --
>> Dmitry Brant
>> Mobile Apps Team (Android)
>> Wikimedia Foundation
>> https://www.mediawiki.org/wiki/Wikimedia_mobile_engineering
--
EN Wikipedia user page: https://en.wikipedia.org/wiki/User:Brian.gerstle
IRC: bgerstle

--
Dmitry Brant
Mobile Apps Team (Android)
Wikimedia Foundation
https://www.mediawiki.org/wiki/Wikimedia_mobile_engineering
Mobile-l mailing list Mobile-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l