We recently resumed tweeting with the @WikimediaMobile handle, and I wanted to share one tweet with you:
https://twitter.com/WikimediaMobile/status/631178379501285376
It looks like people are pretty keen on it.
There was one person who said outside of top Wikipedias it doesn't seem quite as useful. I was wondering, what role might https://www.wikidata.org/wiki/Wikidata:Arbitrary_access play in helping to enrich results? https://phabricator.wikimedia.org/T100786 and https://phabricator.wikimedia.org/T100787 are recent examples of implementation of this sort of thing, as mentioned on https://wikitech.wikimedia.org/wiki/Deployments.
-Adam
As to Wikidata descriptions, I think it's a good first step. As someone mentioned, it's pretty useless for most languages, as there are no descriptions on Wikidata. IMHO the next step is auto-generating short descriptions from the item statements, which will be perfectly fine for the vast majority of cases.
If this is done, I suggest NOT putting the auto-generated text in the manual description field, as descriptions will improve over time, through both new statements and better algorithms. Rather, cache descriptions separately, and update them as required.
Mobile-l mailing list Mobile-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l
Magnus, were you thinking that if there *is* a description field for the knowledge item then that should override the computed description?
Absolutely. However, IMO there should only be a manual description if the automatic one is not sufficient. "Austrian biologist (1900-1990), winner of the 1980 Whatnot award" is not something humans need to write in 250 languages. In fact, I'd be in favor of removing trivial manual descriptions, as automatic ones would likely be better (as in, more up-to-date with the statements).
But yes, the manual description, if present, should take precedence.
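The precedence rule discussed here (manual description wins, automatic text lives in its own cache) could be sketched roughly as follows. This is only an illustration: the dictionary-based stores, `resolve_description`, `refresh`, and the `generate` callback are hypothetical names, not part of Wikidata or any existing service.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stores: manual descriptions as edited on Wikidata, and a
# separate cache for generated text (never written back to the manual field).
ManualStore = Dict[Tuple[str, str], str]
AutoCache = Dict[Tuple[str, str], str]


def resolve_description(
    item_id: str,
    lang: str,
    manual: ManualStore,
    auto_cache: AutoCache,
    generate: Callable[[str, str], Optional[str]],
) -> Optional[str]:
    """Return the manual description if present, else a cached or fresh automatic one."""
    key = (item_id, lang)
    text = manual.get(key)
    if text:  # a non-empty manual description always takes precedence
        return text
    if key not in auto_cache:
        generated = generate(item_id, lang)
        if generated is None:
            return None
        auto_cache[key] = generated  # cached separately; manual store untouched
    return auto_cache[key]


def refresh(item_id: str, lang: str, auto_cache: AutoCache) -> None:
    """Drop a cached automatic description, e.g. after statements or the algorithm change."""
    auto_cache.pop((item_id, lang), None)
```

Because the generated text never enters the manual store, it can be regenerated at any time without having to guess which descriptions were written by a human.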
On Fri, Aug 14, 2015 at 8:51 AM, Adam Baso abaso@wikimedia.org wrote:
I was wondering, what role might https://www.wikidata.org/wiki/Wikidata:Arbitrary_access play in helping to enrich results? https://phabricator.wikimedia.org/T100786 and https://phabricator.wikimedia.org/T100787 are recent examples of implementation of this sort of thing, as mentioned on https://wikitech.wikimedia.org/wiki/Deployments.
I don't think it plays any role. Arbitrary access is about cache invalidation: how to make sure that article A can display a value from the data item of article B and does not become stale when that data item is edited. For search results, that kind of invalidation probably does not make sense, and their caching should be controlled with some kind of TTL instead.
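To make the TTL idea concrete, here is a minimal sketch of a time-based cache: entries are simply recomputed once they are older than a fixed age, with no attempt to track edits to the underlying data item. The `TTLCache` class and its parameters are assumptions for illustration, not how any Wikimedia search cache is actually implemented.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Cache values for a fixed time-to-live instead of invalidating on edits."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._entries: Dict[Hashable, Tuple[float, V]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], V]) -> V:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh enough, serve the cached value
        value = compute()  # expired or missing: recompute and store
        self._entries[key] = (now, value)
        return value
```

The trade-off is that a result may be stale for up to `ttl_seconds`, which is usually acceptable for search snippets but not for values rendered inside an article.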
On Fri, Aug 14, 2015 at 10:31 AM, Magnus Manske <magnusmanske@googlemail.com> wrote:
IMHO the next step is auto-generating short descriptions from the item statements, which will be perfectly fine for the vast majority of cases.
The Wikidata team is not a fan of that idea: T91981 https://phabricator.wikimedia.org/T91981
On Fri, Aug 14, 2015 at 9:54 PM Gergo Tisza gtisza@wikimedia.org wrote:
On Fri, Aug 14, 2015 at 10:31 AM, Magnus Manske < magnusmanske@googlemail.com> wrote:
IMHO the next step is auto-generating short descriptions from the item statements, which will be perfectly fine for the vast majority of cases.
The Wikidata team is not a fan of that idea: T91981 https://phabricator.wikimedia.org/T91981
Yes, sadly. The argument "not good enough" is a fail IMHO, though. If it's bad, improve the algorithm and/or add statements. If it's still bad, THEN add a manual description.
I think the worst possible description is the one that's missing.
Back-of-the-envelope calculation:
* We have ~45 million manual descriptions at the moment on Wikidata
* We have ~18 million items
* We have ~250 languages
That means that, as of this moment, less than 1% of all possible descriptions are filled in. And the quality of these manual descriptions is everyone's best guess; I've seen plenty of "disambiguation page" and "category page" descriptions, EVEN IF THAT IS NOT TRUE. Some crappy bot filled those in. No chance of quickly fixing this.
So, 99% of descriptions missing, with little chance of them getting filled in at all (think: small languages), and a rather dubious track record for the ones that are.
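The arithmetic behind that estimate can be checked in a few lines. The figures are the rounded ones quoted above; the variable names are only for illustration.

```python
# Rounded figures from the message above (August 2015).
manual_descriptions = 45_000_000
items = 18_000_000
languages = 250

# One description slot per item per language.
possible = items * languages
coverage = manual_descriptions / possible

print(f"possible descriptions: {possible:,}")
print(f"coverage: {coverage:.1%}")
```

With these rounded inputs the coverage comes out at about 1%, which is where the "99% missing" figure comes from.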
It's like letting people drown in the Mediterranean because the tents to house them temporarily are "not good enough". Frustrating, seriously.
First example that loaded on "random item": https://www.wikidata.org/wiki/Q6256189
English: Manual description: "American politician". Automatic description: "US-American politician (*1968) ♂"
German: Manual description: None. Automatic description: "Vereinigte Staaten Politiker (*1968) ♂" (yes, would need some work on the algorithm, but understandable)
https://tools.wmflabs.org/autodesc/?q=Q6256189&lang=de&mode=short&am...
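As a rough idea of how a short description like the English one above could be assembled from statements, here is a sketch. It is not the autodesc tool's actual algorithm; the `short_description` function, the simplified statement keys, and the assumption that values arrive already labelled in the target language are all hypothetical.

```python
from typing import Dict, List, Optional

# Symbols as used in the example descriptions above.
GENDER_SYMBOL = {"male": "♂", "female": "♀"}


def short_description(statements: Dict[str, List[str]]) -> Optional[str]:
    """Build 'nationality occupation (*birth year) symbol' from labelled values."""
    occupations = statements.get("occupation", [])
    if not occupations:
        return None  # nothing meaningful to say without an occupation
    parts = []
    nationality = statements.get("nationality", [])
    if nationality:
        parts.append(nationality[0])
    parts.append(" / ".join(occupations))
    birth = statements.get("birth_year", [])
    if birth:
        parts.append(f"(*{birth[0]})")
    gender = statements.get("gender", [])
    if gender and gender[0] in GENDER_SYMBOL:
        parts.append(GENDER_SYMBOL[gender[0]])
    return " ".join(parts)


example = {
    "nationality": ["US-American"],
    "occupation": ["politician"],
    "birth_year": ["1968"],
    "gender": ["male"],
}
print(short_description(example))
```

A plain concatenation like this is also why the German output reads awkwardly: getting adjective forms right would need per-language grammar rules.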
I added some thoughts on the task. I do think it's something we should explore, even if only on a small group of articles, to measure the impact.
The argument "not good enough" is a fail IMHO, though. If it's bad, improve the algorithm and/or add statements. If it's still bad, THEN add a manual description.
+10^100
I've seen arguments on both sides here. Some say automatically generated descriptions are not good enough. Some say they are. Why don't we gather some data on this and use that to decide what's right? :-)
Dan
On 14 Aug 2015 6:29 pm, "Dmitry Brant" dbrant@wikimedia.org wrote:
The argument "not good enough" is a fail IMHO, though. If it's bad, improve the algorithm and/or add statements. If it's still bad, THEN add a manual description.
+10^100
-- Dmitry Brant Mobile Apps Team (Android) Wikimedia Foundation https://www.mediawiki.org/wiki/Wikimedia_mobile_engineering
On Sat, Aug 15, 2015 at 3:43 AM, Dan Garry dgarry@wikimedia.org wrote:
I've seen arguments on both sides here. Some say automatically generated descriptions are not good enough. Some say they are. Why don't we gather some data on this and use that to decide what's right? :-)
Please do. Especially pay attention to languages other than English though. Because even if we get algorithms to write good descriptions for English are we going to do the same for all the other languages? Especially those where grammar is tricky and Wikidata doesn't even have the necessary information to make the grammar right? The other tricky side is determining why something is actually notable. That's not a trivial thing to determine based on the data we have.
Cheers Lydia
Yes but even if the descriptions were just the contents of fields separated by a pipe it would be better than nothing. This could be a prompt to make a game that offers to update the description with an auto-generated text. So for a Monet painting, the description could be "creator Monet|instance painting". We have over 100,000 paintings on Wikidata thanks to the Sum of all Paintings project (yay!) and most museums only have titles in the language it was created in and the language of the museum, so we are a looooong way from creating meaningful titles for all of these and meaningful short descriptions would be a real benefit to the project.
Ah, but when auto-descriptions get better, how do we know which should be updated, and which have been "improved" by humans? Because people will scream bloody murder if we replace "their" descriptions with automatic ones, even if those are better.
On Sat, Aug 15, 2015 at 7:53 AM Jane Darnell jane023@gmail.com wrote:
Yes but even if the descriptions were just the contents of fields separated by a pipe it would be better than nothing. This could be a prompt to make a game that offers to update the description with an auto-generated text. So for a Monet painting, the description could be "creator Monet|instance painting". We have over 100,000 paintings on Wikidata thanks to the Sum of all Paintings project (yay!) and most museums only have titles in the language it was created in and the language of the museum, so we are a looooong way from creating meaningful titles for all of these and meaningful short descriptions would be a real benefit to the project.
Well, we should start by filling blank descriptions, of course. I should have gone on to explain that in my experience of using Listeria lists in my userspace on the Dutch Wikipedia, I have noticed lots of Wikidata infrastructure that hasn't been translated yet. So in my example of the Monet painting, in some languages it would not look like "creator Monet|instance painting" but like "Qxyz Monet|Qefg Qklm" (worst case scenario where only the item for Monet has been propagated to all 200+ languages). Having a game where such auto-descriptions can be served to people who are able to fill in labels and descriptions could be useful for more than just the one item.
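A small sketch of that pipe-separated format, including the fallback to a bare identifier when a label is missing in the requested language. `Qxyz`, `Qefg` and `Qklm` are the placeholder IDs from the message above, `Q296` is the Monet item linked elsewhere in this thread, and the function names and label table are hypothetical.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, Dict[str, str]]  # entity id -> {language code -> label}


def label_or_id(entity_id: str, lang: str, labels: Labels) -> str:
    """Return the label in `lang`, or the bare id if nobody has translated it yet."""
    return labels.get(entity_id, {}).get(lang, entity_id)


def pipe_description(claims: List[Tuple[str, str]], lang: str, labels: Labels) -> str:
    """Join 'property value' pairs with a pipe, e.g. 'creator Monet|instance painting'."""
    return "|".join(
        f"{label_or_id(prop, lang, labels)} {label_or_id(value, lang, labels)}"
        for prop, value in claims
    )


# Placeholder ids from the thread; only Monet is labelled in the second language.
labels = {
    "Qxyz": {"en": "creator"},
    "Q296": {"en": "Monet", "mk": "Monet"},
    "Qefg": {"en": "instance"},
    "Qklm": {"en": "painting"},
}
claims = [("Qxyz", "Q296"), ("Qefg", "Qklm")]

print(pipe_description(claims, "en", labels))
print(pipe_description(claims, "mk", labels))
```

The second call shows the worst case described above: the untranslated pieces surface as raw IDs, which makes the missing labels obvious to anyone who could fill them in.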
That sounds like a very good idea for labels.
Not quite a game, but I have had something along those lines running for a while. Example: Monet. http://tools.wmflabs.org/wikidata-todo/cloudy_concept.php?q=Q296&lang=en
Every time a label is set in a language, it would potentially improve dozens or hundreds of auto-descriptions. Which is one reason why we should NOT flood the manual description field with one-off text generation; they need to be updated, and figuring out which descriptions we can overwrite is next-to-impossible.
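One way to picture the "one label improves many auto-descriptions" effect is to count, for each entity that still lacks a label in a given language, how many items reference it. The sketch below does that over a toy data set; `missing_label_impact`, the claim structure, and the `Qhuman` placeholder are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def missing_label_impact(
    item_claims: Dict[str, List[Tuple[str, str]]],
    labelled: Set[str],
) -> List[Tuple[str, int]]:
    """For each unlabelled entity, count how many items' descriptions would use it."""
    impact: Counter = Counter()
    for claims in item_claims.values():
        # Count each entity once per item, even if several claims mention it.
        referenced = {entity for claim in claims for entity in claim}
        impact.update(referenced - labelled)
    # Highest impact first; ties broken by id so the order is stable.
    return sorted(impact.items(), key=lambda pair: (-pair[1], pair[0]))


item_claims = {
    "painting1": [("Qxyz", "Q296"), ("Qefg", "Qklm")],
    "painting2": [("Qxyz", "Q296"), ("Qefg", "Qklm")],
    "person1": [("Qefg", "Qhuman")],
}
ranking = missing_label_impact(item_claims, labelled={"Q296"})
print(ranking)
```

A ranking like this could also feed the kind of game discussed here, by showing translators the labels that would pay off the most.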
Yes something like that (although if you look at a list that long in a small language like Macedonian it could be pretty overwhelming). I think we need something that shows people why it is worth their time to translate property or item labels for things that don't have Wikipedia articles attached to them (yet).
On Sat, Aug 15, 2015 at 2:33 PM, Magnus Manske magnusmanske@googlemail.com wrote:
That sounds like a very good idea for labels.
Not quite a game, but I have something along those lines running for a while. Example: Monet. http://tools.wmflabs.org/wikidata-todo/cloudy_concept.php?q=Q296&lang=en
Every time a label is set in a language, it would potentially improve dozens or hundreds of auto-descriptions. Which is one reason why we should NOT flood the manual description field with one-off text generation; they need to be updated, and figuring out which descriptions we can overwrite is next-to-impossible.
On Sat, Aug 15, 2015 at 1:17 PM Jane Darnell jane023@gmail.com wrote:
Well we should start by filling blank descriptions of course. I should have gone on to explain that in my experience of using listeria lists in my userspace on the Dutch wikipedia, I have noticed lots of Wikidata infrastructure that hasn't been translated yet. So in my example of the Monet painting, in some languages it would not look like "creator Monet|instance painting" but like "Qxyz Monet|Qefg Qklm" (worst case scenario where only the item for Monet has been propagated to all 200+ languages). Having a game where such auto descriptions can be served to people who are able to fill in labels and descriptions could be useful for more than just the one item.
On Sat, Aug 15, 2015 at 2:08 PM, Magnus Manske < magnusmanske@googlemail.com> wrote:
Ah, but when auto-descriptions get better, how do we know which should be updated, and which have been "improved" bu humans? Because people will screem bloody murder if we replace "their" descriptions with automatic ones, even if those are better.
On Sat, Aug 15, 2015 at 7:53 AM Jane Darnell jane023@gmail.com wrote:
Yes but even if the descriptions were just the contents of fields separated by a pipe it would be better than nothing. This could be a prompt to make a game that offers to update the description with an auto-generated text. So for a Monet painting, the description could be "creator Monet|instance painting". We have over 100,000 paintings on Wikidata thanks to the Sum of all Paintings project (yay!) and most museums only have titles in the language it was created in and the language of the museum, so we are a looooong way from creating meaningful titles for all of these and meaningful short descriptions would be a real benefit to the project.
On Sat, Aug 15, 2015 at 8:38 AM, Lydia Pintscher < lydia.pintscher@wikimedia.de> wrote:
On Sat, Aug 15, 2015 at 3:43 AM, Dan Garry dgarry@wikimedia.org wrote:
I've seen arguments on both sides here. Some say automatically
generated
descriptions are not good enough. Some say they are. Why don't we
gather
some data on this and use that to decide what's right? :-)
Please do. Especially pay attention to languages other than English though. Because even if we get algorithms to write good descriptions for English are we going to do the same for all the other languages? Especially those where grammar is tricky and Wikidata doesn't even have the necessary information to make the grammar right? The other tricky side is determining why something is actually notable. That's not a trivial thing to determine based on the data we have.
Cheers Lydia
-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata
Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
On Sat, Aug 15, 2015 at 5:08 AM, Magnus Manske magnusmanske@googlemail.com wrote:
Ah, but when auto-descriptions get better, how do we know which should be updated, and which have been "improved" by humans? Because people will scream bloody murder if we replace "their" descriptions with automatic ones, even if those are better.
Would it be acceptable to *generate* a description on the fly if there isn't a description in the user's language, but never *replace* an existing description in Wikidata?
AIUI this is what RESTBase is good at: in response to API requests for information about a page, some backend generates information, RESTBase caches it for future requests but RESTBase doesn't update the content databases. If I'm right (unlikely :-) ), then the upcoming MobileApps service could do this without anyone screaming.
Maybe the MobileApps service already does this, I'm not sure what https://restbase.wikimedia.org/en.wikipedia.org/v1/page/mobile-text/Cat puts in the "description" field if Wikidata's description is empty.
figuring out which descriptions we can overwrite is next-to-impossible.
So don't try. The game becomes: present the generated description next to the manual Wikidata description, and if enough users prefer the former, blank out the Wikidata description.
Cheers,
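A rough Python sketch of the "generate on the fly, never replace" idea in this message. It is not how RESTBase or the MobileApps service works: the function name, the plain dicts standing in for Wikidata's manual descriptions and for a separate cache, and the `generate` callback are all hypothetical. It only illustrates the order of precedence, with the manual description winning and generated text stored apart from it.

```python
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str]  # (item id, language code)


def get_description(item_id: str,
                    lang: str,
                    manual: Dict[Key, str],
                    cache: Dict[Key, str],
                    generate: Callable[[str, str], Optional[str]]) -> Optional[str]:
    """Return the manual description if there is one, else a cached generated one.

    The manual store is only read here, never written, so generated text
    cannot overwrite anything a person has entered.
    """
    key = (item_id, lang)
    if manual.get(key):
        return manual[key]          # human-written text always wins
    if key not in cache:
        generated = generate(item_id, lang)
        if generated is None:
            return None             # nothing to show for this item
        cache[key] = generated      # kept separately from manual descriptions
    return cache[key]
```

Because generated text lives only in the cache, improving the generator later means clearing the cache, and the question of which stored descriptions are safe to overwrite does not arise.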
S,
No, the RESTBase mobileapps service[1] doesn't do this currently. That should be possible, though. The service currently uses action=mobileview under the hood. This means it gets the description first from the WP instances, and if that one doesn't have it, it would go to Wikidata.
In the future we'll likely switch to Parsoid for the backend requests, but I don't know when that will happen. We then might have to request the description using something like action=query&prop=pageterms&wbptterms=description[2] if that's not included in Parsoid.
[1] https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_app...
[2] https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&prop=paget...
Bernd
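For reference, a small Python sketch that builds the kind of request mentioned above. The `action`, `prop` and `wbptterms` values come from the message; `titles` and `format` are standard MediaWiki API parameters added here as assumptions, and the helper name is made up. The sketch only constructs the URL; it makes no request and assumes nothing about the shape of the response or whether the module is still available.

```python
from urllib.parse import urlencode


def pageterms_url(title: str, host: str = "en.wikipedia.org") -> str:
    """Build an api.php URL asking for the Wikidata description of a page."""
    params = {
        "action": "query",
        "prop": "pageterms",
        "wbptterms": "description",
        "titles": title,
        "format": "json",
    }
    return f"https://{host}/w/api.php?{urlencode(params)}"


print(pageterms_url("Cat"))
# https://en.wikipedia.org/w/api.php?action=query&prop=pageterms&wbptterms=description&titles=Cat&format=json
```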
On Mon, Aug 17, 2015 at 4:56 PM, S Page spage@wikimedia.org wrote:
On Sat, Aug 15, 2015 at 5:08 AM, Magnus Manske < magnusmanske@googlemail.com> wrote:
Ah, but when auto-descriptions get better, how do we know which should be updated, and which have been "improved" by humans? Because people will scream bloody murder if we replace "their" descriptions with automatic ones, even if those are better.
Would it be acceptable to *generate* a description on the fly if there isn't a description in the user's language, but never *replace* an existing description in Wikidata?
AIUI this is what RESTBase is good at: in response to API requests for information about a page, some backend generates information, RESTBase caches it for future requests but RESTBase doesn't update the content databases. If I'm right (unlikely :-) ), then the upcoming MobileApps service could do this without anyone screaming.
Maybe the MobileApps service already does this, I'm not sure what https://restbase.wikimedia.org/en.wikipedia.org/v1/page/mobile-text/Cat puts in the "description" field if Wikidata's description is empty.
figuring out which descriptions we can overwrite is next-to-impossible.
So don't try. The game becomes: present the generated description next to the manual Wikidata description, and if enough users prefer the former, blank out the Wikidata description.
Cheers,
=S Page WMF Tech writer
Jane Darnell, 15/08/2015 08:53:
Yes but even if the descriptions were just the contents of fields separated by a pipe it would be better than nothing.
+1, item descriptions are mostly useless in my experience.
As for "get into production on Wikipedia" I don't know what it means, I certainly don't like 1) mobile-specific features, 2) overriding existing manually curated content; but it's good to 3) fill gaps. Mobile folks often do (1) and (2), if they *instead* did (3) I'd be very happy. :)
Nemo
It would be even better if this (short: 3 field max) pipe-separated list was available as a gadget to wikidatans on Wikipedia (like me). I can't see if a page I am on has an "instance of" (though it should) and I can see the description thanks to another gadget (sorry no idea which one that is). Often I will update empty descriptions, but if I was served basic fields (so for a painting, the creator field), I would click through to update that too.
On Tue, Aug 18, 2015 at 9:58 AM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Jane Darnell, 15/08/2015 08:53:
Yes but even if the descriptions were just the contents of fields separated by a pipe it would be better than nothing.
+1, item descriptions are mostly useless in my experience.
As for "get into production on Wikipedia" I don't know what it means, I certainly don't like 1) mobile-specific features, 2) overriding existing manually curated content; but it's good to 3) fill gaps. Mobile folks often do (1) and (2), if they *instead* did (3) I'd be very happy. :)
Nemo
Show automatic description underneath "From Wikipedia...": https://en.wikipedia.org/wiki/User:Magnus_Manske/autodesc.js
To use, add: importScript ( 'User:Magnus_Manske/autodesc.js' ) ; to your common.js
On Tue, Aug 18, 2015 at 9:47 AM Jane Darnell jane023@gmail.com wrote:
It would be even better if this (short: 3 field max) pipe-separated list was available as a gadget to wikidatans on Wikipedia (like me). I can't see if a page I am on has an "instance of" (though it should) and I can see the description thanks to another gadget (sorry no idea which one that is). Often I will update empty descriptions, but if I was served basic fields (so for a painting, the creator field), I would click through to update that too.
Thanks - is that the one I have (can't tell)? Here's mine: https://en.wikipedia.org/wiki/User:Jane023/common.js
On Tue, Aug 18, 2015 at 5:23 PM, Magnus Manske magnusmanske@googlemail.com wrote:
Show automatic description underneath "From Wikipedia...": https://en.wikipedia.org/wiki/User:Magnus_Manske/autodesc.js
To use, add: importScript ( 'User:Magnus_Manske/autodesc.js' ) ; to your common.js
That's the one, just add
importScript ( 'User:Magnus_Manske/autodesc.js' ) ;
to that page.
On Tue, Aug 18, 2015 at 6:39 PM Jane Darnell jane023@gmail.com wrote:
Thanks - is that the one I have (can't tell)? Here's mine: https://en.wikipedia.org/wiki/User:Jane023/common.js
Nice! That works like a charm!
On Tue, Aug 18, 2015 at 7:43 PM, Magnus Manske magnusmanske@googlemail.com wrote:
That's the one, just add
importScript ( 'User:Magnus_Manske/autodesc.js' ) ;
to that page.
Nice one!
It does not appear to work on svwiki, though. Does it have something to do with the fact that the wiki in question does not display that tagline?
*Kind regards, Jan Ainali*
Executive Director, Wikimedia Sverige http://wikimedia.se 0729 - 67 29 48
*Imagine a world where every person has free access to the sum of human knowledge. That is what we do.* Become a member. http://blimedlem.wikimedia.se
2015-08-18 17:23 GMT+02:00 Magnus Manske magnusmanske@googlemail.com:
Show automatic description underneath "From Wikipedia...": https://en.wikipedia.org/wiki/User:Magnus_Manske/autodesc.js
To use, add: importScript ( 'User:Magnus_Manske/autodesc.js' ) ; to your common.js
IMO, if the goal is quality, then human curated descriptions are superior until such time as the auto-generation script passes the Turing test ;)
I see these empty descriptions as an amazing opportunity to give *everyone* an easy new way to edit. I whipped an app editing interface up at the Lyon hackathon: https://www.youtube.com/watch?v=6VblyGhf_c8
I used it to add a couple hundred descriptions in a single day just by hitting "random" then adding descriptions for articles which didn't have them.
I'd love to try a limited test of this in production to get a sense for how effective human curation can be if the interface is easy to use...
On Tue, Aug 18, 2015 at 1:25 PM, Jan Ainali jan.ainali@wikimedia.se wrote:
Nice one!
It does not appear to work on svwiki, though. Does it have something to do with the fact that the wiki in question does not display that tagline?
*Kind regards, Jan Ainali*
Executive Director, Wikimedia Sverige http://wikimedia.se 0729 - 67 29 48
*Imagine a world where every person has free access to the sum of human knowledge. That is what we do.* Become a member. http://blimedlem.wikimedia.se
IMO, allowing the user to edit the description is a missed opportunity to make the user edit the actual *data*, such that the description is generated correctly.
On Tue, Aug 18, 2015 at 8:02 PM, Monte Hurd mhurd@wikimedia.org wrote:
IMO, if the goal is quality, then human curated descriptions are superior until such time as the auto-generation script passes the Turing test ;)
I see these empty descriptions as an amazing opportunity to give *everyone* an easy new way to edit. I whipped an app editing interface up at the Lyon hackathon: https://www.youtube.com/watch?v=6VblyGhf_c8
I used it to add a couple hundred descriptions in a single day just by hitting "random" then adding descriptions for articles which didn't have them.
I'd love to try a limited test of this in production to get a sense for how effective human curation can be if the interface is easy to use...
If having the most elegant description extraction mechanism was the goal I would totally agree ;)
On Tue, Aug 18, 2015 at 5:19 PM, Dmitry Brant dbrant@wikimedia.org wrote:
IMO, allowing the user to edit the description is a missed opportunity to make the user edit the actual *data*, such that the description is generated correctly.
-- Dmitry Brant Mobile Apps Team (Android) Wikimedia Foundation https://www.mediawiki.org/wiki/Wikimedia_mobile_engineering
The whole human-vs-extracted descriptions quality question could be fairly easy to test I think:
- Pick some number of articles at random.
- Run them through a description extraction script.
- Have a human describe the same articles with, say, the app interface I demo'ed.
If nothing else this exercise could perhaps make what's thus far been a wildly abstract discussion more concrete.
On Tue, Aug 18, 2015 at 6:17 PM, Monte Hurd mhurd@wikimedia.org wrote:
If having the most elegant description extraction mechanism was the goal I would totally agree ;)
Ok, so I just did what I proposed. I went to random enwiki articles and described the first ten I found which didn't already have descriptions:
- "Courage Under Fire", *1996 film about a Gulf War friendly-fire incident*
- "Pebasiconcha immanis", *largest known species of land snail, extinct*
- "List of Kenyan writers", *notable Kenyan authors*
- "Solar eclipse of December 14, 1917", *annular eclipse which lasted 77 seconds*
- "Natchaug Forest Lumber Shed", *historic Civilian Conservation Corps post-and-beam building*
- "Sun of Jamaica (album)", *debut 1980 studio album by Goombay Dance Band*
- "E-1027", *modernist villa in France by architect Eileen Gray*
- "Daingerfield State Park", *park in Morris County, Texas, USA, bordering Lake Daingerfield*
- "Todo Lo Que Soy-En Vivo", *2014 Live album by Mexican pop singer Fey*
- "2009 UEFA Regions' Cup", *6th UEFA Regions' Cup, won by Castile and Leon*
And here are the respective descriptions from Magnus' (quite excellent) autodesc.js:
- "Courage Under Fire", *1996 film by Edward Zwick, produced by John Davis and David T. Friendly from United States of America*
- "Pebasiconcha immanis", *species of Mollusca*
- "List of Kenyan writers", *Wikimedia list article*
- "Solar eclipse of December 14, 1917", *solar eclipse*
- "Natchaug Forest Lumber Shed", *Construction in Connecticut, United States of America*
- "Sun of Jamaica (album)", *album*
- "E-1027", *villa in Roquebrune-Cap-Martin, France*
- "Daingerfield State Park", *state park and state park of a state of the United States in Texas, United States of America*
- "Todo Lo Que Soy-En Vivo", *live album by Fey*
- "2009 UEFA Regions' Cup", *none*
Thoughts?
Just trying to make my own bold assertions falsifiable :)
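To make the comparison easier to review, the two lists above can be lined up per article. This Python sketch copies the ten pairs from this message verbatim; the function name and the row layout are hypothetical, and treating the literal "none" as "no automatic description available" is an assumption about what that entry means.

```python
from typing import Dict, List, Optional, Tuple

# (title, human description, autodesc.js output), copied from the lists above.
PAIRS: List[Tuple[str, str, str]] = [
    ("Courage Under Fire",
     "1996 film about a Gulf War friendly-fire incident",
     "1996 film by Edward Zwick, produced by John Davis and David T. Friendly "
     "from United States of America"),
    ("Pebasiconcha immanis",
     "largest known species of land snail, extinct",
     "species of Mollusca"),
    ("List of Kenyan writers", "notable Kenyan authors", "Wikimedia list article"),
    ("Solar eclipse of December 14, 1917",
     "annular eclipse which lasted 77 seconds",
     "solar eclipse"),
    ("Natchaug Forest Lumber Shed",
     "historic Civilian Conservation Corps post-and-beam building",
     "Construction in Connecticut, United States of America"),
    ("Sun of Jamaica (album)",
     "debut 1980 studio album by Goombay Dance Band",
     "album"),
    ("E-1027",
     "modernist villa in France by architect Eileen Gray",
     "villa in Roquebrune-Cap-Martin, France"),
    ("Daingerfield State Park",
     "park in Morris County, Texas, USA, bordering Lake Daingerfield",
     "state park and state park of a state of the United States in Texas, "
     "United States of America"),
    ("Todo Lo Que Soy-En Vivo",
     "2014 Live album by Mexican pop singer Fey",
     "live album by Fey"),
    ("2009 UEFA Regions' Cup",
     "6th UEFA Regions' Cup, won by Castile and Leon",
     "none"),
]


def review_rows(pairs: List[Tuple[str, str, str]]) -> List[Dict[str, Optional[str]]]:
    """Pair each title with both descriptions; a literal 'none' becomes None."""
    rows = []
    for title, human, auto in pairs:
        rows.append({"title": title,
                     "human": human,
                     "auto": None if auto == "none" else auto})
    return rows


for row in review_rows(PAIRS):
    print(f"{row['title']}\n  human: {row['human']}\n  auto:  {row['auto']}")
```

Rows in this shape could be handed to reviewers with the source column hidden, which would turn the quality question into something countable.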
On Tue, Aug 18, 2015 at 6:32 PM, Monte Hurd mhurd@wikimedia.org wrote:
The whole human-vs-extracted descriptions quality question could be fairly easy to test I think:
- Pick some number of articles at random.
- Run them through a description extraction script.
- Have a human describe the same articles with, say, the app interface I demo'ed.
If nothing else this exercise could perhaps make what's thus far been a wildly abstract discussion more concrete.
On Tue, Aug 18, 2015 at 6:17 PM, Monte Hurd mhurd@wikimedia.org wrote:
If having the most elegant description extraction mechanism was the goal I would totally agree ;)
On Tue, Aug 18, 2015 at 5:19 PM, Dmitry Brant dbrant@wikimedia.org wrote:
IMO, allowing the user to edit the description is a missed opportunity to make the user edit the actual *data*, such that the description is generated correctly.
On Tue, Aug 18, 2015 at 8:02 PM, Monte Hurd mhurd@wikimedia.org wrote:
IMO, if the goal is quality, then human curated descriptions are superior until such time as the auto-generation script passes the Turing test ;)
I see these empty descriptions as an amazing opportunity to give *everyone* an easy new way to edit. I whipped an app editing interface up at the Lyon hackathon: https://www.youtube.com/watch?v=6VblyGhf_c8
I used it to add a couple hundred descriptions in a single day just by hitting "random" then adding descriptions for articles which didn't have them.
I'd love to try a limited test of this in production to get a sense for how effective human curation can be if the interface is easy to use...
On Tue, Aug 18, 2015 at 1:25 PM, Jan Ainali jan.ainali@wikimedia.se wrote:
Nice one!
Does not appear to work on svwiki though. Does it have something to do with that the wiki in question does not display that tagline?
*Med vänliga hälsningar,Jan Ainali*
Verksamhetschef, Wikimedia Sverige http://wikimedia.se 0729 - 67 29 48
*Tänk dig en värld där varje människa har fri tillgång till mänsklighetens samlade kunskap. Det är det vi gör.* Bli medlem. http://blimedlem.wikimedia.se
2015-08-18 17:23 GMT+02:00 Magnus Manske magnusmanske@googlemail.com :
Show automatic description underneath "From Wikipedia...": https://en.wikipedia.org/wiki/User:Magnus_Manske/autodesc.js
To use, add: importScript ( 'User:Magnus_Manske/autodesc.js' ) ; to your common.js
On Tue, Aug 18, 2015 at 9:47 AM Jane Darnell jane023@gmail.com wrote:
> It would be even better if this (short: 3 field max) pipe-separated > list was available as a gadget to wikidatans on Wikipedia (like me). I > can't see if a page I am on has an "instance of" (though it should) and I > can see the description thanks to another gadget (sorry no idea which one > that is). Often I will update empty descriptions, but if I was served basic > fields (so for a painting, the creator field), I would click through to > update that too. > > On Tue, Aug 18, 2015 at 9:58 AM, Federico Leva (Nemo) < > nemowiki@gmail.com> wrote: > >> Jane Darnell, 15/08/2015 08:53: >> >>> Yes but even if the descriptions were just the contents of fields >>> separated by a pipe it would be better than nothing. >>> >> >> +1, item descriptions are mostly useless in my experience. >> >> As for "get into production on Wikipedia" I don't know what it >> means, I certainly don't like 1) mobile-specific features, 2) overriding >> existing manually curated content; but it's good to 3) fill gaps. Mobile >> folks often do (1) and (2), if they *instead* did (3) I'd be very happy. :) >> >> Nemo >> > > _______________________________________________ > Mobile-l mailing list > Mobile-l@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/mobile-l >
-- Dmitry Brant Mobile Apps Team (Android) Wikimedia Foundation https://www.mediawiki.org/wiki/Wikimedia_mobile_engineering
Could there be a way to have our nicely curated description cake and eat it too? For example, interpolating data into the description and/or marking data points which are referenced in the description (so as to mark it as outdated when they change)?
I appreciate the potential benefits of generated descriptions (and other things), but Monte's examples might have swayed me towards human curated—when available.
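One way to picture the "cake and eat it too" idea is sketched below: a curated description interpolates some data points and separately records others that the hand-written wording depends on, so it can be flagged when those change. Everything here is an assumption for illustration: the `{field}` placeholder syntax, the `curate`, `render` and `isOutdated` helpers, and the field names are hypothetical and do not describe any existing Wikidata feature. The sample text reuses Monte's description of "Sun of Jamaica (album)".

```javascript
// Hypothetical sketch: a human-curated description that interpolates some
// data and records other data points its wording relies on.

// Store the template plus the current values of the data points it depends on.
function curate(template, dependsOn, data) {
  const seen = {};
  for (const field of dependsOn) {
    seen[field] = data[field];
  }
  return { template, seen };
}

// Fill {field} placeholders from current data; unknown fields stay visible.
function render(curated, data) {
  return curated.template.replace(/\{([^{}]+)\}/g, (match, field) =>
    Object.prototype.hasOwnProperty.call(data, field) ? String(data[field]) : match
  );
}

// The description is outdated when any recorded data point has since changed.
function isOutdated(curated, data) {
  return Object.keys(curated.seen).some(
    (field) => curated.seen[field] !== data[field]
  );
}

const album = { year: '1980', performer: 'Goombay Dance Band' };
// The year is interpolated; the performer is written by hand but tracked.
const curated = curate(
  'debut {year} studio album by Goombay Dance Band',
  ['performer'],
  album
);
console.log(render(curated, album));
console.log(isOutdated(curated, album));
```

With this split, a change to an interpolated value simply shows up in the rendered text, while a change to a tracked value only raises a flag and leaves the rewording to a human.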
On Tuesday, August 18, 2015, Monte Hurd mhurd@wikimedia.org wrote:
Ok, so I just did what I proposed. I went to random enwiki articles and described the first ten I found which didn't already have descriptions:
- "Courage Under Fire", *1996 film about a Gulf War friendly-fire incident*
- "Pebasiconcha immanis", *largest known species of land snail, extinct*
- "List of Kenyan writers", *notable Kenyan authors*
- "Solar eclipse of December 14, 1917", *annular eclipse which lasted 77 seconds*
- "Natchaug Forest Lumber Shed", *historic Civilian Conservation Corps post-and-beam building*
- "Sun of Jamaica (album)", *debut 1980 studio album by Goombay Dance Band*
- "E-1027", *modernist villa in France by architect Eileen Gray*
- "Daingerfield State Park", *park in Morris County, Texas, USA, bordering Lake Daingerfield*
- "Todo Lo Que Soy-En Vivo", *2014 Live album by Mexican pop singer Fey*
- "2009 UEFA Regions' Cup", *6th UEFA Regions' Cup, won by Castile and Leon*
And here are the respective descriptions from Magnus' (quite excellent) autodesc.js:
- "Courage Under Fire", *1996 film by Edward Zwick, produced by John Davis and David T. Friendly from United States of America*
- "Pebasiconcha immanis", *species of Mollusca*
- "List of Kenyan writers", *Wikimedia list article*
- "Solar eclipse of December 14, 1917", *solar eclipse*
- "Natchaug Forest Lumber Shed", *Construction in Connecticut, United States of America*
- "Sun of Jamaica (album)", *album*
- "E-1027", *villa in Roquebrune-Cap-Martin, France*
- "Daingerfield State Park", *state park and state park of a state of the United States in Texas, United States of America*
- "Todo Lo Que Soy-En Vivo", *live album by Fey*
- "2009 UEFA Regions' Cup", *none*
Thoughts?
Just trying to make my own bold assertions falsifiable :)
On Tue, Aug 18, 2015 at 6:32 PM, Monte Hurd mhurd@wikimedia.org wrote:
The whole human-vs-extracted descriptions quality question could be fairly easy to test I think:
- Pick some number of articles at random.
- Run them through a description extraction script.
- Have a human describe the same articles with, say, the app interface I demo'ed.
If nothing else this exercise could perhaps make what's thus far been a wildly abstract discussion more concrete.
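The three steps above could be wired together with a small harness like the sketch below. It is a rough outline under stated assumptions: `sampleTitles` and `compareDescriptions` are hypothetical helper names, the extraction script is stood in for by a plain lookup function, and the sample rows reuse entries from Monte's and autodesc.js's lists in this thread rather than calling any real API.

```javascript
// Hypothetical harness for a human-vs-extracted comparison.

// Pick `count` titles at random without modifying the input
// (partial Fisher-Yates shuffle on a copy).
function sampleTitles(titles, count, random = Math.random) {
  const pool = titles.slice();
  const n = Math.min(count, pool.length);
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}

// Pair each title with its generated and human description (null if absent).
function compareDescriptions(titles, generate, human) {
  return titles.map((title) => {
    const generated = generate(title);
    return {
      title,
      generated: generated === undefined ? null : generated,
      human: Object.prototype.hasOwnProperty.call(human, title) ? human[title] : null,
    };
  });
}

const titles = ['E-1027', 'Sun of Jamaica (album)', "2009 UEFA Regions' Cup"];
// Stand-in for the extraction script: a lookup of outputs quoted in this thread.
const generatedLookup = {
  'E-1027': 'villa in Roquebrune-Cap-Martin, France',
  'Sun of Jamaica (album)': 'album',
};
const humanLookup = {
  'E-1027': 'modernist villa in France by architect Eileen Gray',
  'Sun of Jamaica (album)': 'debut 1980 studio album by Goombay Dance Band',
  "2009 UEFA Regions' Cup": "6th UEFA Regions' Cup, won by Castile and Leon",
};
const rows = compareDescriptions(
  sampleTitles(titles, 2),
  (title) => generatedLookup[title],
  humanLookup
);
console.log(rows);
```

Passing the random source as a parameter is a design choice that keeps the sampling reproducible when the comparison needs to be rerun on the same articles.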
On Tue, Aug 18, 2015 at 6:17 PM, Monte Hurd mhurd@wikimedia.org wrote:
If having the most elegant description extraction mechanism was the goal I would totally agree ;)
On Tue, Aug 18, 2015 at 5:19 PM, Dmitry Brant dbrant@wikimedia.org wrote:
IMO, allowing the user to edit the description is a missed opportunity to make the user edit the actual *data*, such that the description is generated correctly.
On Tue, Aug 18, 2015 at 8:02 PM, Monte Hurd mhurd@wikimedia.org wrote:
IMO, if the goal is quality, then human curated descriptions are superior until such time as the auto-generation script passes the Turing test ;)
I see these empty descriptions as an amazing opportunity to give *everyone* an easy new way to edit. I whipped an app editing interface up at the Lyon hackathon: bluetooth720 https://www.youtube.com/watch?v=6VblyGhf_c8
I used it to add a couple hundred descriptions in a single day just by hitting "random" then adding descriptions for articles which didn't have them.
I'd love to try a limited test of this in production to get a sense for how effective human curation can be if the interface is easy to use...
My thoughts, as ever(!), are as follows:
- The tool that generates the descriptions deserves a lot more development. Magnus' tool is very much a prototype, and represents a tiny glimpse of what's possible. Looking at its current output is a straw man.
- Auto-generated descriptions work for current articles, and *all future articles*. They automatically adapt to updated data. They automatically become more accurate as new data is added.
- When you edit the descriptions yourself, you're not really making a meaningful contribution to the *data* that underpins the given Wikidata entry; i.e. you're not contributing any new information. You're simply paraphrasing the first sentence or two of the Wikipedia article. That can't possibly be a productive use of contributors' time.
As for Brian's suggestion: It would be a step forward; we can even invent a whole template-type syntax for transcluding bits of actual data into the description. But IMO, that kind of effort would still be better spent on fully-automatic descriptions, because that's the ideal that semi-automatic descriptions can only approach.
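The point that generated descriptions adapt as data is added can be illustrated with a toy generator. This is explicitly not Magnus' autodesc.js: the `generateDescription` function and the `instanceOf`, `creator`, `location` and `country` field names are hypothetical, and the E-1027 values are taken from the descriptions quoted in this thread.

```javascript
// Toy generator, NOT autodesc.js: turns a few hypothetical fields into a
// short description, and is simply re-run whenever the data changes.
function generateDescription(item) {
  // Without a type there is nothing sensible to say.
  if (!item.instanceOf) return null;
  let text = item.instanceOf;
  if (item.creator) text += ' by ' + item.creator;
  // Join whichever of location and country are present.
  const place = [item.location, item.country].filter(Boolean).join(', ');
  if (place) text += ' in ' + place;
  return text;
}

const e1027 = {
  instanceOf: 'villa',
  location: 'Roquebrune-Cap-Martin',
  country: 'France',
};
console.log(generateDescription(e1027));

// Someone later adds the architect to the data; the description follows.
const e1027WithCreator = { ...e1027, creator: 'Eileen Gray' };
console.log(generateDescription(e1027WithCreator));
```

In this sketch the only edit a contributor makes is to the data itself, which is the behaviour argued for above; an item with no type yields `null`, leaving the gap visible.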
-- Dmitry Brant Mobile Apps Team (Android) Wikimedia Foundation https://www.mediawiki.org/wiki/Wikimedia_mobile_engineering
-- EN Wikipedia user page: https://en.wikipedia.org/wiki/User:Brian.gerstle IRC: bgerstle
Thank you Dmitry! Well phrased and to the point!
As for "templating", that might be the worst of both worlds: it lacks the flexibility and improvement over time of automatic descriptions, yet makes descriptions harder for people to enter than "free-style" text. We have a Visual Editor on Wikipedia for a reason :-)
Oh, and as for examples, random-paging just got me this:
https://en.wikipedia.org/wiki/Jules_Malou
Manual description: Belgian politician
Automatic description: Belgian politician and lawyer, Prime Minister of Belgium, and member of the Chamber of Representatives of Belgium (1810–1886) ♂
I know which one I'd prefer...
On Wed, Aug 19, 2015 at 10:50 AM Magnus Manske magnusmanske@googlemail.com wrote:
Thank you Dmitry! Well phrased and to the point!
As for "templating", that might be the worst of both worlds; without the flexibility and over-time improvement of automatic descriptions, but making it harder for people to enter (compared to "free-style" text). We have a Visual Editor on Wikipedia for a reason :-)
On Wed, Aug 19, 2015 at 4:07 AM Dmitry Brant dbrant@wikimedia.org wrote:
My thoughts, as ever(!), are as follows:
- The tool that generates the descriptions deserves a lot more development. Magnus' tool is very much a prototype, and represents a tiny glimpse of what's possible. Looking at its current output is a straw man.
- Auto-generated descriptions work for current articles, and *all future articles*. They automatically adapt to updated data. They automatically become more accurate as new data is added.
- When you edit the descriptions yourself, you're not really making a meaningful contribution to the *data* that underpins the given Wikidata entry; i.e. you're not contributing any new information. You're simply paraphrasing the first sentence or two of the Wikipedia article. That can't possibly be a productive use of contributors' time.
As for Brian's suggestion: It would be a step forward; we can even invent a whole template-type syntax for transcluding bits of actual data into the description. But IMO, that kind of effort would still be better spent on fully-automatic descriptions, because that's the ideal that semi-automatic descriptions can only approach.
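To make the "automatically adapt to updated data" point concrete, here is a toy sketch of a description built purely from structured statements. It is not the algorithm used by autodesc.js; the item shape and the field names (`nationality`, `occupations`, `positions`, `born`, `died`) are hypothetical and chosen only for illustration.

```javascript
// Toy sketch only: NOT how autodesc.js works.
// The item shape and field names are hypothetical.
function describeItem(item) {
  const parts = [];
  const occupations = item.occupations || [];
  if (occupations.length > 0) {
    // e.g. ["politician", "lawyer"] -> "politician and lawyer"
    const nouns = occupations.join(" and ");
    parts.push(item.nationality ? `${item.nationality} ${nouns}` : nouns);
  }
  for (const position of item.positions || []) {
    parts.push(position);
  }
  let text = parts.join(", ");
  if (item.born && item.died) {
    // Append the lifespan; trim() covers the case where nothing precedes it.
    text = `${text} (${item.born}–${item.died})`.trim();
  }
  return text;
}

const malou = { nationality: "Belgian", occupations: ["politician"], born: 1810, died: 1886 };
console.log(describeItem(malou)); // Belgian politician (1810–1886)

// Adding a statement to the data changes the description with no manual edit.
malou.occupations.push("lawyer");
console.log(describeItem(malou)); // Belgian politician and lawyer (1810–1886)
```

The point of the sketch is only that the text is derived on demand, so improving the data (or the generator) improves every description at once.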
On Tue, Aug 18, 2015 at 10:36 PM, Brian Gerstle bgerstle@wikimedia.org wrote:
Could there be a way to have our nicely curated description cake and eat it too? For example, interpolating data into the description and/or marking data points which are referenced in the description (so as to mark it as outdated when they change)?
I appreciate the potential benefits of generated descriptions (and other things), but Monte's examples might have swayed me towards human curated—when available.
On Tuesday, August 18, 2015, Monte Hurd mhurd@wikimedia.org wrote:
Ok, so I just did what I proposed. I went to random enwiki articles and described the first ten I found which didn't already have descriptions:
- "Courage Under Fire", *1996 film about a Gulf War friendly-fire incident*
- "Pebasiconcha immanis", *largest known species of land snail, extinct*
- "List of Kenyan writers", *notable Kenyan authors*
- "Solar eclipse of December 14, 1917", *annular eclipse which lasted 77 seconds*
- "Natchaug Forest Lumber Shed", *historic Civilian Conservation Corps post-and-beam building*
- "Sun of Jamaica (album)", *debut 1980 studio album by Goombay Dance Band*
- "E-1027", *modernist villa in France by architect Eileen Gray*
- "Daingerfield State Park", *park in Morris County, Texas, USA, bordering Lake Daingerfield*
- "Todo Lo Que Soy-En Vivo", *2014 Live album by Mexican pop singer Fey*
- "2009 UEFA Regions' Cup", *6th UEFA Regions' Cup, won by Castile and Leon*
And here are the respective descriptions from Magnus' (quite excellent) autodesc.js:
- "Courage Under Fire", *1996 film by Edward Zwick, produced by John Davis and David T. Friendly from United States of America*
- "Pebasiconcha immanis", *species of Mollusca*
- "List of Kenyan writers", *Wikimedia list article*
- "Solar eclipse of December 14, 1917", *solar eclipse*
- "Natchaug Forest Lumber Shed", *Construction in Connecticut, United States of America*
- "Sun of Jamaica (album)", *album*
- "E-1027", *villa in Roquebrune-Cap-Martin, France*
- "Daingerfield State Park", *state park and state park of a state of the United States in Texas, United States of America*
- "Todo Lo Que Soy-En Vivo", *live album by Fey*
- "2009 UEFA Regions' Cup", *none*
Thoughts?
Just trying to make my own bold assertions falsifiable :)
On Tue, Aug 18, 2015 at 6:32 PM, Monte Hurd mhurd@wikimedia.org wrote:
The whole human-vs-extracted descriptions quality question could be fairly easy to test I think:
- Pick some number of articles at random.
- Run them through a description extraction script.
- Have a human describe the same articles with, say, the app interface I demo'ed.
If nothing else this exercise could perhaps make what's thus far been a wildly abstract discussion more concrete.
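The three steps above could be wired together roughly as follows. This is only a sketch under stated assumptions: `sampleTitles` and `compareDescriptions` are hypothetical helper names, and the human and automatic descriptions are passed in as plain lookup objects rather than fetched from any real service. The two sample rows reuse descriptions quoted elsewhere in this thread.

```javascript
// Hypothetical harness; nothing here calls a real API.
function sampleTitles(titles, count, random = Math.random) {
  const pool = titles.slice();
  // Fisher-Yates shuffle, then keep the first `count` titles.
  for (let i = pool.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, count);
}

// Pair each title with its human and automatic description (null if missing).
function compareDescriptions(titles, human, automatic) {
  return titles.map((title) => ({
    title,
    human: human[title] ?? null,
    automatic: automatic[title] ?? null,
  }));
}

const human = {
  "E-1027": "modernist villa in France by architect Eileen Gray",
  "Sun of Jamaica (album)": "debut 1980 studio album by Goombay Dance Band",
};
const automatic = {
  "E-1027": "villa in Roquebrune-Cap-Martin, France",
  "Sun of Jamaica (album)": "album",
};

const picked = sampleTitles(Object.keys(human), 2);
console.table(compareDescriptions(picked, human, automatic));
```

Printing the pairs side by side is the whole point: reviewers can then judge each row without knowing in advance which approach they favour.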
On Tue, Aug 18, 2015 at 6:17 PM, Monte Hurd mhurd@wikimedia.org wrote:
If having the most elegant description extraction mechanism was the goal I would totally agree ;)
On Tue, Aug 18, 2015 at 5:19 PM, Dmitry Brant dbrant@wikimedia.org wrote:
IMO, allowing the user to edit the description is a missed opportunity to make the user edit the actual *data*, such that the description is generated correctly.
On Tue, Aug 18, 2015 at 8:02 PM, Monte Hurd mhurd@wikimedia.org wrote:
IMO, if the goal is quality, then human curated descriptions are superior until such time as the auto-generation script passes the Turing test ;)
I see these empty descriptions as an amazing opportunity to give *everyone* an easy new way to edit. I whipped an app editing interface up at the Lyon hackathon: bluetooth720 https://www.youtube.com/watch?v=6VblyGhf_c8
I used it to add a couple hundred descriptions in a single day just by hitting "random" then adding descriptions for articles which didn't have them.
I'd love to try a limited test of this in production to get a sense for how effective human curation can be if the interface is easy to use...
On Tue, Aug 18, 2015 at 1:25 PM, Jan Ainali jan.ainali@wikimedia.se wrote:
Nice one!
Does not appear to work on svwiki though. Does it have something to do with that the wiki in question does not display that tagline?
*Kind regards, Jan Ainali*
Executive Director, Wikimedia Sverige http://wikimedia.se
0729 - 67 29 48
*Imagine a world in which every person has free access to the sum of humanity's knowledge. That is what we do.* Become a member. http://blimedlem.wikimedia.se
2015-08-18 17:23 GMT+02:00 Magnus Manske magnusmanske@googlemail.com:
Show automatic description underneath "From Wikipedia...": https://en.wikipedia.org/wiki/User:Magnus_Manske/autodesc.js
To use, add: importScript ( 'User:Magnus_Manske/autodesc.js' ) ; to your common.js
On Tue, Aug 18, 2015 at 9:47 AM Jane Darnell jane023@gmail.com wrote:
It would be even better if this (short: 3 field max) pipe-separated list was available as a gadget to wikidatans on Wikipedia (like me). I can't see if a page I am on has an "instance of" (though it should) and I can see the description thanks to another gadget (sorry no idea which one that is). Often I will update empty descriptions, but if I was served basic fields (so for a painting, the creator field), I would click through to update that too.
On Tue, Aug 18, 2015 at 9:58 AM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Jane Darnell, 15/08/2015 08:53:
Yes but even if the descriptions were just the contents of fields separated by a pipe it would be better than nothing.
+1, item descriptions are mostly useless in my experience.
As for "get into production on Wikipedia" I don't know what it means, I certainly don't like 1) mobile-specific features, 2) overriding existing manually curated content; but it's good to 3) fill gaps. Mobile folks often do (1) and (2), if they *instead* did (3) I'd be very happy. :)
Nemo
My hero Magnus Manske noted:
> The situation, for most languages, is this: No manual descriptions, on basically any item. And that will remain so for the (near) future. Automatic descriptions can change that, literally over night, with a little programming and linguistic effort. ... This is a "force multiplier" of volunteer effort with a factor of 250. And we ignore that ... why, exactly?
The potential of AutoDesc is so enormous to attain "a world in which every single person on the planet is given free access to the sum of all human knowledge" that it should be the entire movement's top project. I nearly wrote a career-limiting e-mail rant to WMF-all on that subject last night.
In this e-mail thread we're talking about it in the limited scope of "Wikidata descriptions in search on mobile web beta", where the mobile client presents a useful signpost for *existing* articles, in an emblem on lead images and in search results. That's important but we're missing the forest for a single tree when discussing such a transformative technology. If only WMF had a CTO for such things [1].
Anyway, returning to this specific use case:
* Nobody is saying store the AutoDesc in the Wikidata per-language description field.
* Nobody is saying show the AutoDesc if there is an existing Wikidata description.
* Is anybody against showing AutoDesc, after some refinement and productization [2], in these mobile use cases when there is no Wikidata description?
* I propose the AutoDesc as a quality bar that any edit to a Wikidata description needs to improve on (but again that's a topic beyond this mail thread).
Yours, excitedly, =S Page
[1] http://grnh.se/30f54b , apply today!
[2] https://bitbucket.org/magnusmanske/autodesc/src/HEAD/www/js/?at=master and https://github.com/dbrant/wikidata-autodesc . It's already a nodejs service, can we append "oid" and declare victory ? :-)
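The display rule in the bullet points of this message (manual description wins, AutoDesc only fills gaps, nothing generated is stored) might look something like this in code. It is a minimal sketch: `descriptionToShow` and the `generateAutoDesc` callback are hypothetical names, not part of autodesc.js or any Wikimedia API.

```javascript
// Sketch of the fallback rule only; all names are hypothetical.
function descriptionToShow(manualDescription, generateAutoDesc) {
  const manual = (manualDescription || "").trim();
  if (manual !== "") {
    // An existing Wikidata description is never overridden.
    return { text: manual, source: "manual" };
  }
  // The generated text is returned for display only; this function
  // never writes it back to the per-language description field.
  const generated = generateAutoDesc();
  return generated ? { text: generated, source: "auto" } : null;
}

console.log(descriptionToShow("Belgian politician", () => "unused"));
console.log(descriptionToShow("", () => "solar eclipse"));
```

Taking the generator as a callback means it only runs when there is a gap to fill, and a client can label the `source` so readers know which kind of text they are seeing.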
> No manual descriptions, on basically any item. And that will remain so for the (near) future. Automatic descriptions can change that, literally overnight, with a little programming and linguistic effort. ... This is a "force multiplier" of volunteer effort with a factor of 250. And we ignore that ... why, exactly?
Not ignoring. In fact, if the auto-generated descriptions near the quality of human-curated descriptions, I'm totally and wholeheartedly on board with strongly considering their use.
I just disagree that closing the quality gap will involve "little programming and linguistic effort." I lean more toward the "massive programming and linguistic effort" end of the spectrum.
Specifically, I think it will take massive effort to make the auto-generated descriptions so good that an average person would say, "hey, these auto-generated descriptions are better than the human-curated descriptions" in the examples I posted.
But I may, of course, be wrong!
On Wed, Aug 19, 2015 at 1:27 PM, S Page spage@wikimedia.org wrote:
My hero Magnus Manske noted
The situation, for most languages, is this: No manual descriptions, on basically any item. And that will remain so for the (near) future. Automatic descriptions can change that, literally overnight, with a little programming and linguistic effort. ... This is a "force multiplier" of volunteer effort with a factor of 250. And we ignore that ... why, exactly?
AutoDesc's potential to help attain "a world in which every single person on the planet is given free access to the sum of all human knowledge" is so enormous that it should be the entire movement's top project. I nearly wrote a career-limiting e-mail rant to WMF-all on that subject last night.
In this e-mail thread we're talking about it in the limited scope of "Wikidata descriptions in search on mobile web beta", where the mobile client presents a useful signpost for *existing* articles, in an emblem on lead images and in search results. That's important but we're missing the forest for a single tree when discussing such a transformative technology. If only WMF had a CTO for such things [1].
Anyway, returning to this specific use case:
- Nobody is saying store the AutoDesc in the Wikidata per-language description field.
- Nobody is saying show the AutoDesc if there is an existing Wikidata description.
- Is anybody against showing AutoDesc, after some refinement and productization [2], in these mobile use cases when there is no Wikidata description?
- I propose the AutoDesc as a quality bar that any edit to a Wikidata description needs to improve on (but again that's a topic beyond this mail thread).
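The first three bullets above amount to a simple fallback rule, which could be sketched roughly as follows. This is only an illustration: `pick_description` and `generate_auto` are hypothetical names, not part of AutoDesc, Wikidata, or the mobile clients.

```python
from typing import Callable, Optional


def pick_description(
    manual: Optional[str],
    generate_auto: Callable[[], Optional[str]],
) -> Optional[str]:
    """Return the text to display for an item.

    Hypothetical helper: a non-empty manual Wikidata description always
    wins; the generator is only called (and its output only shown, never
    stored in the manual field) when no manual description exists.
    """
    if manual is not None and manual.strip():
        return manual.strip()
    return generate_auto()


# Usage: the generator is passed as a callable so it is skipped entirely
# when a manual description is present.
shown = pick_description(None, lambda: "villa in Roquebrune-Cap-Martin, France")
```

Passing the generator as a callable rather than a precomputed string is a design choice made here so that no generation work happens for items that already have a manual description.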
Yours, excitedly, =S Page
[1] http://grnh.se/30f54b , apply today!
[2] https://bitbucket.org/magnusmanske/autodesc/src/HEAD/www/js/?at=master and https://github.com/dbrant/wikidata-autodesc . It's already a nodejs service, can we append "oid" and declare victory ? :-)
On Wed, Aug 19, 2015 at 2:57 AM, Magnus Manske <magnusmanske@googlemail.com> wrote:
Oh, and as for examples, random-paging just got me this:
https://en.wikipedia.org/wiki/Jules_Malou
Manual description: Belgian politician
Automatic description: Belgian politician and lawyer, Prime Minister of Belgium, and member of the Chamber of Representatives of Belgium (1810–1886) ♂
I know which one I'd prefer...
On Wed, Aug 19, 2015 at 10:50 AM Magnus Manske <magnusmanske@googlemail.com> wrote:
Thank you Dmitry! Well phrased and to the point!
As for "templating", that might be the worst of both worlds; without the flexibility and over-time improvement of automatic descriptions, but making it harder for people to enter (compared to "free-style" text). We have a Visual Editor on Wikipedia for a reason :-)
On Wed, Aug 19, 2015 at 4:07 AM Dmitry Brant dbrant@wikimedia.org wrote:
My thoughts, as ever(!), are as follows:
- The tool that generates the descriptions deserves a lot more development. Magnus' tool is very much a prototype, and represents a tiny glimpse of what's possible. Looking at its current output is a straw man.
- Auto-generated descriptions work for current articles, and *all future articles*. They automatically adapt to updated data. They automatically become more accurate as new data is added.
- When you edit the descriptions yourself, you're not really making a meaningful contribution to the *data* that underpins the given Wikidata entry; i.e. you're not contributing any new information. You're simply paraphrasing the first sentence or two of the Wikipedia article. That can't possibly be a productive use of contributors' time.
As for Brian's suggestion: It would be a step forward; we can even invent a whole template-type syntax for transcluding bits of actual data into the description. But IMO, that kind of effort would still be better spent on fully-automatic descriptions, because that's the ideal that semi-automatic descriptions can only approach.
On Tue, Aug 18, 2015 at 10:36 PM, Brian Gerstle <bgerstle@wikimedia.org> wrote:
Could there be a way to have our nicely curated description cake and eat it too? For example, interpolating data into the description and/or marking data points which are referenced in the description (so as to mark it as outdated when they change)?
I appreciate the potential benefits of generated descriptions (and other things), but Monte's examples might have swayed me towards human curated—when available.
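One way to picture the "interpolating data" and "mark it as outdated" idea is the sketch below. It is purely illustrative: the `CuratedDescription` class, its field names, and the plain-string statement keys (`country`, `architect`) are assumptions made for this example and do not correspond to any existing Wikidata feature.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CuratedDescription:
    # Human-written text with {placeholders} for values taken from statements.
    template: str
    # Values of the referenced statements at the time the text was written.
    snapshot: Dict[str, str]

    def render(self, statements: Dict[str, str]) -> str:
        """Interpolate current statement values into the curated text."""
        return self.template.format_map(statements)


def is_outdated(desc: CuratedDescription, statements: Dict[str, str]) -> bool:
    """True if any statement the description refers to has changed or vanished."""
    return any(statements.get(key) != old for key, old in desc.snapshot.items())


desc = CuratedDescription(
    template="modernist villa in {country} by architect {architect}",
    snapshot={"country": "France", "architect": "Eileen Gray"},
)
```

With a scheme like this, a changed statement would not silently rewrite the human phrasing; it would only flag the description for review.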
On Tuesday, August 18, 2015, Monte Hurd mhurd@wikimedia.org wrote:
Ok, so I just did what I proposed. I went to random enwiki articles and described the first ten I found which didn't already have descriptions:
- "Courage Under Fire", *1996 film about a Gulf War friendly-fire incident*
- "Pebasiconcha immanis", *largest known species of land snail, extinct*
- "List of Kenyan writers", *notable Kenyan authors*
- "Solar eclipse of December 14, 1917", *annular eclipse which lasted 77 seconds*
- "Natchaug Forest Lumber Shed", *historic Civilian Conservation Corps post-and-beam building*
- "Sun of Jamaica (album)", *debut 1980 studio album by Goombay Dance Band*
- "E-1027", *modernist villa in France by architect Eileen Gray*
- "Daingerfield State Park", *park in Morris County, Texas, USA, bordering Lake Daingerfield*
- "Todo Lo Que Soy-En Vivo", *2014 Live album by Mexican pop singer Fey*
- "2009 UEFA Regions' Cup", *6th UEFA Regions' Cup, won by Castile and Leon*
And here are the respective descriptions from Magnus' (quite excellent) autodesc.js:
- "Courage Under Fire", *1996 film by Edward Zwick, produced by John Davis and David T. Friendly from United States of America*
- "Pebasiconcha immanis", *species of Mollusca*
- "List of Kenyan writers", *Wikimedia list article*
- "Solar eclipse of December 14, 1917", *solar eclipse*
- "Natchaug Forest Lumber Shed", *Construction in Connecticut, United States of America*
- "Sun of Jamaica (album)", *album*
- "E-1027", *villa in Roquebrune-Cap-Martin, France*
- "Daingerfield State Park", *state park and state park of a state of the United States in Texas, United States of America*
- "Todo Lo Que Soy-En Vivo", *live album by Fey*
- "2009 UEFA Regions' Cup", *none*
Thoughts?
Just trying to make my own bold assertions falsifiable :)
On Tue, Aug 18, 2015 at 6:32 PM, Monte Hurd mhurd@wikimedia.org wrote:
> The whole human-vs-extracted descriptions quality question could be fairly easy to test I think:
> - Pick some number of articles at random.
> - Run them through a description extraction script.
> - Have a human describe the same articles with, say, the app interface I demo'ed.
> If nothing else this exercise could perhaps make what's thus far been a wildly abstract discussion more concrete.
>
> On Tue, Aug 18, 2015 at 6:17 PM, Monte Hurd mhurd@wikimedia.org wrote:
>> If having the most elegant description extraction mechanism was the goal I would totally agree ;)
>>
>> On Tue, Aug 18, 2015 at 5:19 PM, Dmitry Brant <dbrant@wikimedia.org> wrote:
>>> IMO, allowing the user to edit the description is a missed opportunity to make the user edit the actual *data*, such that the description is generated correctly.
>>>
>>> On Tue, Aug 18, 2015 at 8:02 PM, Monte Hurd mhurd@wikimedia.org wrote:
>>>> IMO, if the goal is quality, then human curated descriptions are superior until such time as the auto-generation script passes the Turing test ;)
>>>> I see these empty descriptions as an amazing opportunity to give *everyone* an easy new way to edit. I whipped an app editing interface up at the Lyon hackathon:
>>>> bluetooth720 https://www.youtube.com/watch?v=6VblyGhf_c8
>>>> I used it to add a couple hundred descriptions in a single day just by hitting "random" then adding descriptions for articles which didn't have them.
>>>> I'd love to try a limited test of this in production to get a sense for how effective human curation can be if the interface is easy to use...
>>>>
>>>> On Tue, Aug 18, 2015 at 1:25 PM, Jan Ainali <jan.ainali@wikimedia.se> wrote:
>>>>> Nice one!
>>>>> Does not appear to work on svwiki though. Does it have something to do with the fact that the wiki in question does not display that tagline?
>>>>>
>>>>> *Kind regards, Jan Ainali*
>>>>> Executive Director, Wikimedia Sverige http://wikimedia.se
>>>>> 0729 - 67 29 48
>>>>> *Imagine a world in which every single person has free access to the sum of all human knowledge. That is what we do.*
>>>>> Become a member. http://blimedlem.wikimedia.se
>>>>>
>>>>> 2015-08-18 17:23 GMT+02:00 Magnus Manske <magnusmanske@googlemail.com>:
>>>>>> Show automatic description underneath "From Wikipedia...":
>>>>>> https://en.wikipedia.org/wiki/User:Magnus_Manske/autodesc.js
>>>>>> To use, add:
>>>>>> importScript ( 'User:Magnus_Manske/autodesc.js' ) ;
>>>>>> to your common.js
>>>>>>
>>>>>> On Tue, Aug 18, 2015 at 9:47 AM Jane Darnell jane023@gmail.com wrote:
>>>>>>> It would be even better if this (short: 3 field max) pipe-separated list was available as a gadget to wikidatans on Wikipedia (like me). I can't see if a page I am on has an "instance of" (though it should) and I can see the description thanks to another gadget (sorry no idea which one that is). Often I will update empty descriptions, but if I was served basic fields (so for a painting, the creator field), I would click through to update that too.
>>>>>>>
>>>>>>> On Tue, Aug 18, 2015 at 9:58 AM, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
>>>>>>>> Jane Darnell, 15/08/2015 08:53:
>>>>>>>>> Yes but even if the descriptions were just the contents of fields separated by a pipe it would be better than nothing.
>>>>>>>> +1, item descriptions are mostly useless in my experience.
>>>>>>>> As for "get into production on Wikipedia" I don't know what it means, I certainly don't like 1) mobile-specific features, 2) overriding existing manually curated content; but it's good to 3) fill gaps. Mobile folks often do (1) and (2), if they *instead* did (3) I'd be very happy. :)
>>>>>>>> Nemo
-- EN Wikipedia user page: https://en.wikipedia.org/wiki/User:Brian.gerstle IRC: bgerstle
-- Dmitry Brant Mobile Apps Team (Android) Wikimedia Foundation https://www.mediawiki.org/wiki/Wikimedia_mobile_engineering
-- =S Page WMF Tech writer
On Wed, Aug 19, 2015 at 11:19 PM Monte Hurd mhurd@wikimedia.org wrote:
>> No manual descriptions, on basically any item. And that will remain so for the (near) future. Automatic descriptions can change that, literally overnight, with a little programming and linguistic effort. ... This is a "force multiplier" of volunteer effort with a factor of 250. And we ignore that ... why, exactly?
> Not ignoring. In fact, if the auto-generated descriptions near the quality of human-curated descriptions, I'm totally and wholeheartedly on board with strongly considering their use.
> I just disagree that closing the quality gap will involve "little programming and linguistic effort." I lean more toward the "massive programming and linguistic effort" end of the spectrum.
> Specifically, I think it will take massive effort to make the auto-generated descriptions so good that an average person would say, "hey, these auto-generated descriptions are better than the human-curated descriptions" in the examples I posted.
You are confusing (in the literal meaning of the word, fusing together) several issues into one here, which you then call "better". I see at least five distinct types of "better":
1. A description exists, vs. it does not. In that aspect, automatic descriptions will always be "better" than manual ones.
2. One description is more complete than the other. From what I see in random examples, this is already the case for many biographical items that have a lot of statements. I have actually considered cutting them back a little, because even these "short" descriptions can get quite extensive.
3. Context-aware, specifically, the context where the description is shown. This one goes to the automatic descriptions. AutoDesc already can generate plain text, links to Wikidata, links to a specific Wikipedia where there are articles, and use plain text/redlinks/Wikidata links otherwise. It can generate Wikitext, with some infoboxes. It could easily generate HTML blurbs with a thumbnail if there is an image, and so on. This is in contrast to manual descriptions, which are plain text only.
4. Linguistic/style. Manual descriptions CAN be better phrased than automatic ones, but can also be worse. Automatic descriptions are unimaginative, but consistent. Here is where I probably beg to differ from most other people on this thread: I firmly believe that a description, even if it is slightly wrong grammatically, is preferable to no description, as long as humans still can understand what is meant. If the German description gets the gender of "moon" wrong, so what? (I don't think it does, but just for the sake of argument) Eventually, someone will implement a fix for that. Maybe we'll have gender for things per language as statements at some point, which would be useful beyond autodesc.
5. "To the point". That is where manual descriptions have their only advantage in the long run. Even from a lot of statements, it is hard for an algorithm to figure out why exactly that person, that thing, that event is important. Sometimes it is something "obscure", something that does not fit well into statements, or is "hidden" among them. And there, and only there, do manual descriptions make sense, as I have always maintained.
I am well aware of the limitations of automatic descriptions. I can also see that "perfection" will never be reached, that the algorithms will never be finished.
Like Wikipedia.
True about algorithms never being finished, but aren't we essentially "stuck" with the first run output, unless I misunderstand how you envision this working?
(assuming you don't want to over-write non-blank descriptions the next time you improve and re-run the process)
On Thu, Aug 20, 2015 at 1:43 AM Monte Hurd mhurd@wikimedia.org wrote:
True about algorithms never being finished, but aren't we essentially "stuck" with the first run output, unless I misunderstand how you envision this working?
(assuming you don't want to over-write non-blank descriptions the next time you improve and re-run the process)
Of course we're not "stuck" with the initial automatic descriptions! Whatever gave you that idea? Ideally, each description would be computed on-the-fly, but that won't scale; output needs to be cached, and invalidated when necessary.
Possible reasons for cache invalidation:
* The item statements have changed
* Items referenced in the description (e.g. country for nationality) have changed
* The algorithm has been improved
* After cache reached a certain age, just to make sure
This is why the automatic description cache and the manual description need to be kept separate; just "pasting" the autodesc into the manual description field would mean it could never be updated automatically. That would be very bad indeed.
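The four invalidation reasons listed above could be checked along these lines. This is a rough sketch under stated assumptions, not how AutoDesc or Wikidata actually caches anything: `CachedAutoDesc`, `fingerprint`, `ALGORITHM_VERSION`, and the 30-day `MAX_AGE_SECONDS` are all hypothetical, and the cache entry is deliberately a separate record from the manual description.

```python
import hashlib
import json
from dataclasses import dataclass

ALGORITHM_VERSION = 1            # hypothetical: bumped whenever the generator improves
MAX_AGE_SECONDS = 30 * 24 * 3600  # hypothetical: refresh after 30 days regardless


def fingerprint(data: dict) -> str:
    """Stable hash of JSON-serialisable data, independent of key order."""
    canonical = json.dumps(data, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class CachedAutoDesc:
    text: str
    statements_hash: str    # the item's own statements when the text was generated
    referenced_hash: str    # data of other items mentioned in the text
    algorithm_version: int
    created_at: float       # seconds since the epoch


def make_entry(text: str, statements: dict, referenced: dict, now: float) -> CachedAutoDesc:
    return CachedAutoDesc(text, fingerprint(statements), fingerprint(referenced),
                          ALGORITHM_VERSION, now)


def is_stale(entry: CachedAutoDesc, statements: dict, referenced: dict, now: float,
             algorithm_version: int = ALGORITHM_VERSION,
             max_age: float = MAX_AGE_SECONDS) -> bool:
    """True if any of the four invalidation reasons applies."""
    return (
        entry.statements_hash != fingerprint(statements)    # item statements changed
        or entry.referenced_hash != fingerprint(referenced)  # referenced items changed
        or entry.algorithm_version != algorithm_version      # algorithm improved
        or now - entry.created_at > max_age                  # cache too old
    )
```

Because the entry never touches the manual description field, regenerating a stale entry cannot overwrite anything a human wrote.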
So it turns out that ValterVBot alone has created over 1.8 MILLION "manual" descriptions. And there are other bots that do this. We already HAVE automatic descriptions, we just store them in the "manual" field.
The worst of both worlds.
On Thu, Aug 20, 2015 at 9:24 AM Magnus Manske magnusmanske@googlemail.com wrote:
On Thu, Aug 20, 2015 at 1:43 AM Monte Hurd mhurd@wikimedia.org wrote:
True about algorithms never being finished, but aren't we essentially "stuck" with the first run output, unless I misunderstand how you envision this working?
(assuming you don't want to over-write non-blank descriptions the next time you improve and re-run the process)
Of course we're not "stuck" with the initial automatic descriptions! Whatever gave you that idea? Ideally, each description would be computed on-the-fly, but that won't scale; output needs to be cached, and invalidated when necessary.
Possible reasons for cache invalidation:
- The item statements have changed
- Items referenced in the description (e.g. country for nationality) have changed
- The algorithm has been improved
- After cache reached a certain age, just to make sure
This is why the automatic description cache and the manual description need to be kept separate; just "pasting" the autodesc into the manual description field would mean it could never be updated automatically. That would be very bad indeed.
> This is why the automatic description cache and the manual description need to be kept separate; just "pasting" the autodesc into the manual description field would mean it could never be updated automatically. That would be very bad indeed.
+1000!!!! Exactly! I was operating under the assumption we were talking about the existing "description" field. Separate auto and manual description fields completely avoid *all* of the issues/concerns I raised :)
On Thu, Aug 20, 2015 at 2:48 AM, Magnus Manske magnusmanske@googlemail.com wrote:
So it turns out that ValterVBot alone has created over 1.8 MILLION "manual" descriptions. And there are other bots that do this. We already HAVE automatic descriptions, we just store them in the "manual" field.
The worst of both worlds.
This is a really interesting discussion and it seems that there is near-consensus that an automated description for entities without a manual description is not a bad idea, particularly if they are kept in a separate field. Speak now if you feel that is not correct.
To S's suggestion: what steps do we need to take to put autodesc into wikis?
- establish consensus with stakeholders outside this thread?
- create new field?
- rule out/protect against edge cases (are there length limits, for instance)
- ways to edit (explaining to a user how they can edit or override is going to be important)
Who should own it and create an epic to track? Wikidata, Search, Reading?....
On Fri, Aug 21, 2015 at 10:27 AM, Monte Hurd mhurd@wikimedia.org wrote:
This is why the automatic description cache and the manual description need to be kept separate; just "pasting" the autodesc into the manual description field would mean it could never be updated automatically. That would be very bad indeed.
+1000!!!! Exactly! I was operating under the assumption we were talking about the existing "description" field. Separate auto and manual description fields completely avoids *all* of the issues/concerns I raised :)
On Thu, Aug 20, 2015 at 2:48 AM, Magnus Manske < magnusmanske@googlemail.com> wrote:
So it turns out that ValterVBot alone has created over 1.8 MILLION "manual" descriptions. And there are other bots that do this. We already HAVE automatic descriptions, we just store them in the "manual" field.
The worst of both worlds.
On Thu, Aug 20, 2015 at 9:24 AM Magnus Manske < magnusmanske@googlemail.com> wrote:
On Thu, Aug 20, 2015 at 1:43 AM Monte Hurd mhurd@wikimedia.org wrote:
True about algorithms never being finished, but aren't we essentially "stuck" with the first run output, unless I misunderstand how you envision this working?
(assuming you don't want to over-write non-blank descriptions the next time you improve and re-run the process)
Of course we're not "stuck" with the initial automatic descriptions! Whatever gave you that idea? Ideally, each description would be computed on-the-fly, but that won't scale; output needs to be cached, and invalidated when necessary.
Possible reasons for cache invalidation:
- The item statements have changed
- Items referenced in the description (e.g. country for nationality) have changed
- The algorithm has been improved
- After cache reached a certain age, just to make sure
This is why the automatic description cache and the manual description need to be kept separate; just "pasting" the autodesc into the manual description field would mean it could never be updated automatically. That would be very bad indeed.
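As a rough illustration of the invalidation reasons listed above, here is a minimal sketch of a separate automatic-description cache. Everything in it is an assumption made for illustration: the entry layout, the use of revision numbers as a stand-in for "statements have changed", the `ALGORITHM_VERSION` counter, and the 30-day age limit are not taken from autodesc.js or any Wikimedia service.

```javascript
// Hypothetical sketch; none of these names come from autodesc.js or RESTBase.
const ALGORITHM_VERSION = 2;                  // assumed: bumped whenever the generator improves
const MAX_AGE_MS = 30 * 24 * 60 * 60 * 1000;  // assumed 30-day "just to make sure" limit

// entry: { text, itemRevision, referencedRevisions, algorithmVersion, createdAt }
// Returns the reason the cached description must be regenerated, or null if still valid.
function staleReason(entry, itemRevision, currentRevisionOf, now) {
  if (entry.itemRevision !== itemRevision) return "item statements changed";
  for (const [id, rev] of Object.entries(entry.referencedRevisions)) {
    if (currentRevisionOf(id) !== rev) return "referenced item changed";
  }
  if (entry.algorithmVersion !== ALGORITHM_VERSION) return "algorithm improved";
  if (now - entry.createdAt > MAX_AGE_MS) return "cache entry too old";
  return null;
}

// Serve from the cache when possible, regenerate otherwise. The manual
// description is never read or written here; the two stay separate.
function getAutoDesc(cache, item, generate, currentRevisionOf, now) {
  const cached = cache.get(item.id);
  if (cached && staleReason(cached, item.revision, currentRevisionOf, now) === null) {
    return cached.text;
  }
  const { text, referencedRevisions } = generate(item);
  cache.set(item.id, {
    text,
    itemRevision: item.revision,
    referencedRevisions,
    algorithmVersion: ALGORITHM_VERSION,
    createdAt: now,
  });
  return text;
}
```

Because the cache only ever holds generated text, any of the four conditions can trigger a rebuild without risk of overwriting something a person wrote.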
Mobile-l mailing list Mobile-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l
If the way to 'edit' the autodescription is by changing the claims for the item, I support the idea. I would oppose, however, the autodescription being another text field you can edit directly, as I think this would be very confusing for Wikidata editors: each item would effectively just have two interchangeable description fields.
On Aug 21, 2015, at 11:21 AM, Jon Katz jkatz@wikimedia.org wrote:
This is a really interesting discussion and it seems that there is near-consensus that an automated description for entities without a manual description is not a bad idea, particularly if they are kept in a separate field. Speak now if you feel that is not correct.
I am with Ryan here, and I believe that is Magnus's idea too: the autodescription should not be a field in the database; it should be queried on the fly from the statements.
*Kind regards, Jan Ainali*
Executive Director, Wikimedia Sverige http://wikimedia.se 0729 - 67 29 48
*Imagine a world in which every single human being has free access to the sum of all human knowledge. That is what we do.* Become a member. http://blimedlem.wikimedia.se
Yes. This should be a client feature, not a Wikidata feature (so something that is on Wikipedia and Commons)
On vacation, internet spotty. Quick thoughts:
* RESTbase seems a good way to cache automatic descriptions; would either need a "wrapper" to generate on-the-fly or serve cached version, or a bot generating, storing, and updating automatic descriptions for all items in RESTbase
* Maybe only generate/update automatic descriptions for item types that have dedicated generator code (e.g. biographies), and for supported languages, at least initially? "Generic" English description should be understandable for most items, don't know for other languages
* Dmitry has done some work on a /proper/ AutoDesc implementation; couldn't try it out yet, sadly, but looks great so far
* It should be straightforward to (optionally) link words/names in descriptions to the #statement/property in the described item, to quickly edit wrong statements
Love to see how people take this idea and run with it. That's the spirit! :-)
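To make the "dedicated generator per item type, supported languages only" idea a bit more concrete, here is a small sketch. The registry, the item shape (`type`, `nationality`, `occupations`, `instanceOfLabel`) and the function names are all hypothetical; they are not how autodesc.js or Dmitry's implementation is organised.

```javascript
// Hypothetical registry of dedicated generators, keyed by item type.
const generators = new Map([
  ["biography", {
    languages: ["en"], // assumed: languages this generator has been written for
    generate: (item) => `${item.nationality} ${item.occupations.join(" and ")}`,
  }],
]);

// Returns a description, or null when nothing trustworthy can be produced.
function autoDescribe(item, lang) {
  const dedicated = generators.get(item.type);
  if (dedicated && dedicated.languages.includes(lang)) {
    return dedicated.generate(item);
  }
  // Generic fallback: English only for now, using the "instance of" label.
  if (lang === "en" && item.instanceOfLabel) return item.instanceOfLabel;
  return null;
}
```

Returning `null` for unsupported combinations keeps the caller free to show nothing rather than a poor guess.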
On Fri, Aug 21, 2015 at 11:21 AM, Jon Katz jkatz@wikimedia.org wrote:
> To S's suggestion: what steps do we need to take to put autodesc into wikis?
Noooo! Saying "put" or "store" produces resistance. This is about when and where to _display_ an AutoDesc that's generated on-the-fly from Wikidata. Caching it is an optimization detail. The second message in this thread said
> Rather, cache [auto] descriptions separately, and update them as required
yet we keep reviving a dead horse.
> - establish consensus with stakeholders outside this thread?
I think the Reading team can decide to show the AutoDesc on lead images and in mobile search results when there's no Wikidata description.
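The display rule in the reply above (show the AutoDesc only when there is no Wikidata description) can be sketched as a small client-side helper. The function name, the return shape and the `getAutoDesc` callback are hypothetical and only illustrate the fallback order.

```javascript
// Hypothetical helper: choose the text shown under a title.
// manualDescription: string | null | undefined (the Wikidata description, if any)
// getAutoDesc: function returning a generated description or null (e.g. a cache lookup)
function descriptionToDisplay(manualDescription, getAutoDesc) {
  const manual = (manualDescription || "").trim();
  if (manual !== "") return { text: manual, source: "manual" };
  const auto = getAutoDesc(); // only consulted when no manual description exists
  return auto ? { text: auto, source: "auto" } : null;
}
```

Keeping the `source` alongside the text lets the interface style or label generated descriptions differently if that turns out to be useful.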
> - create new field?
Never. Cache it in RESTBase.
> - rule out/protect against edge cases (are there length limits, for instance)
> - ways to edit (explaining to a user how they can edit or override is going to be important)
I think Monte's excellent prototype of editing descriptions on Mobile (T90765) should show the AutoDesc, as in "Try to write something better than this". However, Lydia Pintscher declined my T109772 "present the short AutoDesc of an item when editing its description", giving some cogent blockers.
If the AutoDesc is inaccurate solely because a fact in Wikidata is wrong, then the user should update the item in Wikidata rather than add a manual description. As Dmitry wrote
> IMO, allowing the user to edit the description is a missed opportunity to make the user edit the actual *data*, such that the description is generated correctly.
I don't know if AutoDesc could link every piece of the description to the fact generating it.
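One way this could work, sketched under assumptions: if the generator emitted a list of fragments instead of a finished string, each fragment could carry the property it was derived from, and a renderer could turn those into links. The fragment shape and the `#P…` anchor on the item page are assumptions for illustration, and the property IDs in the usage note are just examples.

```javascript
// Minimal HTML escaping for text placed inside markup.
function escapeHtml(s) {
  return s.replace(/[&<>"]/g, (c) =>
    ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" }[c]));
}

// fragments: [{ text, property }] where property is a property ID or null
// for glue words and punctuation (hypothetical shape).
function renderLinked(itemId, fragments) {
  return fragments
    .map(({ text, property }) =>
      property
        ? `<a href="https://www.wikidata.org/wiki/${itemId}#${property}">${escapeHtml(text)}</a>`
        : escapeHtml(text))
    .join("");
}

// The plain description is simply the fragments joined without markup.
function renderPlain(fragments) {
  return fragments.map((f) => f.text).join("");
}
```

For example, `renderPlain([{ text: "Belgian", property: "P27" }, { text: " ", property: null }, { text: "politician", property: "P106" }])` gives the plain description, while `renderLinked` with the same fragments wraps each fact-derived word in a link to the statement a reader would need to correct.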
> Who should own it and create an epic to track? Wikidata, Search, Reading?....
The CTO, i.e. bring it up at some Engineering management meeting.
Magnus Manske wrote:
> So it turns out that ValterVBot alone has created over 1.8 MILLION "manual" descriptions. And there are other bots that do this. We already HAVE automatic descriptions, we just store them in the "manual" field.
> The worst of both worlds.
The longer we go without a productized AutoDesc that's shown whenever there isn't a manual description, the more people will do this.
Regards,
Those were literally the first 10 random articles I encountered which didn't have descriptions.
> The tool that generates the descriptions deserves a lot more development. Magnus' tool is very much a prototype, and represents a tiny glimpse of what's possible. Looking at its current output is a straw man.
It's not a straw man at all - it's a baseline to move the discussion away from the abstract. We need to start looking at real examples.
One of my main concerns is that "a lot more development" is actually an understatement, as many of the optimizations will be language-dependent.
On Wed, Aug 19, 2015 at 2:57 AM, Magnus Manske magnusmanske@googlemail.com wrote:
Oh, and as for examples, random-paging just got me this:
https://en.wikipedia.org/wiki/Jules_Malou
Manual description: Belgian politician
Automatic description: Belgian politician and lawyer, Prime Minister of Belgium, and member of the Chamber of Representatives of Belgium (1810–1886) ♂
I know which one I'd prefer...
On Wed, Aug 19, 2015 at 10:50 AM Magnus Manske magnusmanske@googlemail.com wrote:
Thank you Dmitry! Well phrased and to the point!
As for "templating", that might be the worst of both worlds; without the flexibility and over-time improvement of automatic descriptions, but making it harder for people to enter (compared to "free-style" text). We have a Visual Editor on Wikipedia for a reason :-)
On Wed, Aug 19, 2015 at 4:07 AM Dmitry Brant dbrant@wikimedia.org wrote:
My thoughts, as ever(!), are as follows:
- The tool that generates the descriptions deserves a lot more development. Magnus' tool is very much a prototype, and represents a tiny glimpse of what's possible. Looking at its current output is a straw man.
- Auto-generated descriptions work for current articles, and *all future articles*. They automatically adapt to updated data. They automatically become more accurate as new data is added.
- When you edit the descriptions yourself, you're not really making a meaningful contribution to the *data* that underpins the given Wikidata entry; i.e. you're not contributing any new information. You're simply paraphrasing the first sentence or two of the Wikipedia article. That can't possibly be a productive use of contributors' time.
As for Brian's suggestion: It would be a step forward; we can even invent a whole template-type syntax for transcluding bits of actual data into the description. But IMO, that kind of effort would still be better spent on fully-automatic descriptions, because that's the ideal that semi-automatic descriptions can only approach.
On Tue, Aug 18, 2015 at 10:36 PM, Brian Gerstle bgerstle@wikimedia.org wrote:
Could there be a way to have our nicely curated description cake and eat it too? For example, interpolating data into the description and/or marking data points which are referenced in the description (so as to mark it as outdated when they change)?
I appreciate the potential benefits of generated descriptions (and other things), but Monte's examples might have swayed me towards human curated—when available.
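Brian's two ideas (interpolating data into a curated description, and marking which data points it depends on) can be sketched together. The `{{P…}}` placeholder syntax is invented here purely for illustration; no such template syntax exists in Wikidata descriptions, and the property IDs are examples only.

```javascript
// Hypothetical placeholder syntax: {{P123}} stands for the value of property P123.
const PLACEHOLDER = /\{\{(P\d+)\}\}/g;

// Which properties does this curated description depend on?
function referencedProperties(template) {
  return [...new Set([...template.matchAll(PLACEHOLDER)].map((m) => m[1]))];
}

// Fill in the values; unknown placeholders are left visible rather than dropped.
function render(template, values) {
  return template.replace(PLACEHOLDER, (whole, prop) => values[prop] ?? whole);
}

// A description is flagged for review when any property it references has changed.
function isOutdated(template, changedProperties) {
  const deps = referencedProperties(template);
  return changedProperties.some((p) => deps.includes(p));
}
```

With a template such as `"{{P577}} film by {{P57}}"`, a change to either referenced property would flag the description, while edits to unrelated statements would leave it alone.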
On Tuesday, August 18, 2015, Monte Hurd mhurd@wikimedia.org wrote:
Ok, so I just did what I proposed. I went to random enwiki articles and described the first ten I found which didn't already have descriptions:
- "Courage Under Fire", *1996 film about a Gulf War friendly-fire incident*
- "Pebasiconcha immanis", *largest known species of land snail, extinct*
- "List of Kenyan writers", *notable Kenyan authors*
- "Solar eclipse of December 14, 1917", *annular eclipse which lasted 77 seconds*
- "Natchaug Forest Lumber Shed", *historic Civilian Conservation Corps post-and-beam building*
- "Sun of Jamaica (album)", *debut 1980 studio album by Goombay Dance Band*
- "E-1027", *modernist villa in France by architect Eileen Gray*
- "Daingerfield State Park", *park in Morris County, Texas, USA, bordering Lake Daingerfield*
- "Todo Lo Que Soy-En Vivo", *2014 Live album by Mexican pop singer Fey*
- "2009 UEFA Regions' Cup", *6th UEFA Regions' Cup, won by Castile and Leon*
And here are the respective descriptions from Magnus' (quite excellent) autodesc.js:
- "Courage Under Fire", *1996 film by Edward Zwick, produced by John Davis and David T. Friendly from United States of America*
- "Pebasiconcha immanis", *species of Mollusca*
- "List of Kenyan writers", *Wikimedia list article*
- "Solar eclipse of December 14, 1917", *solar eclipse*
- "Natchaug Forest Lumber Shed", *Construction in Connecticut, United States of America*
- "Sun of Jamaica (album)", *album*
- "E-1027", *villa in Roquebrune-Cap-Martin, France*
- "Daingerfield State Park", *state park and state park of a state of the United States in Texas, United States of America*
- "Todo Lo Que Soy-En Vivo", *live album by Fey*
- "2009 UEFA Regions' Cup", *none*
Thoughts?
Just trying to make my own bold assertions falsifiable :)
On Tue, Aug 18, 2015 at 6:32 PM, Monte Hurd mhurd@wikimedia.org wrote:
The whole human-vs-extracted descriptions quality question could be fairly easy to test I think:
- Pick some number of articles at random.
- Run them through a description extraction script.
- Have a human describe the same articles with, say, the app interface I demo'ed.
If nothing else this exercise could perhaps make what's thus far been a wildly abstract discussion more concrete.
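A minimal sketch of how the results of such a test could be lined up for review, assuming the human and generated descriptions are collected as plain title-to-text maps. The function name and data layout are hypothetical; the three sample pairs are taken from the examples posted in this thread.

```javascript
// Pair each human-written description with the generated one for the same title.
function compareDescriptions(human, generated) {
  return Object.entries(human).map(([title, manual]) => {
    const auto = generated[title] ?? null;
    return { title, manual, auto, autoMissing: auto === null };
  });
}

const human = {
  "Sun of Jamaica (album)": "debut 1980 studio album by Goombay Dance Band",
  "E-1027": "modernist villa in France by architect Eileen Gray",
  "2009 UEFA Regions' Cup": "6th UEFA Regions' Cup, won by Castile and Leon",
};
const generated = {
  "Sun of Jamaica (album)": "album",
  "E-1027": "villa in Roquebrune-Cap-Martin, France",
  "2009 UEFA Regions' Cup": null, // the script produced nothing for this item
};

const rows = compareDescriptions(human, generated);
const missing = rows.filter((r) => r.autoMissing).length;
console.log(`${rows.length} articles compared, ${missing} without a generated description`);
```

This only arranges the pairs side by side and counts gaps; judging which description is better would still be a human call.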
On Tue, Aug 18, 2015 at 6:17 PM, Monte Hurd mhurd@wikimedia.org wrote:
> If having the most elegant description extraction mechanism was the goal I would totally agree ;)
>
> On Tue, Aug 18, 2015 at 5:19 PM, Dmitry Brant dbrant@wikimedia.org wrote:
>
>> IMO, allowing the user to edit the description is a missed opportunity to make the user edit the actual *data*, such that the description is generated correctly.
>>
>> On Tue, Aug 18, 2015 at 8:02 PM, Monte Hurd mhurd@wikimedia.org wrote:
>>
>>> IMO, if the goal is quality, then human curated descriptions are superior until such time as the auto-generation script passes the Turing test ;)
>>>
>>> I see these empty descriptions as an amazing opportunity to give *everyone* an easy new way to edit. I whipped an app editing interface up at the Lyon hackathon:
>>> bluetooth720 https://www.youtube.com/watch?v=6VblyGhf_c8
>>>
>>> I used it to add a couple hundred descriptions in a single day just by hitting "random" then adding descriptions for articles which didn't have them.
>>>
>>> I'd love to try a limited test of this in production to get a sense for how effective human curation can be if the interface is easy to use...
>>>
>>> On Tue, Aug 18, 2015 at 1:25 PM, Jan Ainali jan.ainali@wikimedia.se wrote:
>>>
>>>> Nice one!
>>>>
>>>> Does not appear to work on svwiki though. Does it have something to do with that the wiki in question does not display that tagline?
>>>>
>>>> *Kind regards, Jan Ainali*
>>>> Executive Director, Wikimedia Sverige http://wikimedia.se
>>>> 0729 - 67 29 48
>>>> *Imagine a world in which every single human being has free access to the sum of all human knowledge. That is what we do.*
>>>> Become a member. http://blimedlem.wikimedia.se
>>>>
>>>> 2015-08-18 17:23 GMT+02:00 Magnus Manske magnusmanske@googlemail.com:
>>>>
>>>>> Show automatic description underneath "From Wikipedia...":
>>>>> https://en.wikipedia.org/wiki/User:Magnus_Manske/autodesc.js
>>>>>
>>>>> To use, add:
>>>>> importScript ( 'User:Magnus_Manske/autodesc.js' ) ;
>>>>> to your common.js
>>>>>
>>>>> On Tue, Aug 18, 2015 at 9:47 AM Jane Darnell jane023@gmail.com wrote:
>>>>>
>>>>>> It would be even better if this (short: 3 field max) pipe-separated list was available as a gadget to wikidatans on Wikipedia (like me). I can't see if a page I am on has an "instance of" (though it should) and I can see the description thanks to another gadget (sorry no idea which one that is). Often I will update empty descriptions, but if I was served basic fields (so for a painting, the creator field), I would click through to update that too.
>>>>>>
>>>>>> On Tue, Aug 18, 2015 at 9:58 AM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
>>>>>>
>>>>>>> Jane Darnell, 15/08/2015 08:53:
>>>>>>>
>>>>>>>> Yes but even if the descriptions were just the contents of fields separated by a pipe it would be better than nothing.
>>>>>>>
>>>>>>> +1, item descriptions are mostly useless in my experience.
>>>>>>>
>>>>>>> As for "get into production on Wikipedia" I don't know what it means, I certainly don't like 1) mobile-specific features, 2) overriding existing manually curated content; but it's good to 3) fill gaps. Mobile folks often do (1) and (2), if they *instead* did (3) I'd be very happy. :)
>>>>>>>
>>>>>>> Nemo
>>
>> --
>> Dmitry Brant
>> Mobile Apps Team (Android)
>> Wikimedia Foundation
>> https://www.mediawiki.org/wiki/Wikimedia_mobile_engineering
--
EN Wikipedia user page: https://en.wikipedia.org/wiki/User:Brian.gerstle
IRC: bgerstle

--
Dmitry Brant
Mobile Apps Team (Android)
Wikimedia Foundation
https://www.mediawiki.org/wiki/Wikimedia_mobile_engineering
On Tue, Aug 18, 2015 at 10:36 PM, Brian Gerstle bgerstle@wikimedia.org wrote:
Could there be a way to have our nicely curated description cake and eat it too? For example, interpolating data into the description and/or marking data points which are referenced in the description (so as to mark it as outdated when they change)?
I appreciate the potential benefits of generated descriptions (and other things), but Monte's examples might have swayed me towards human curated—when available.
On Tuesday, August 18, 2015, Monte Hurd mhurd@wikimedia.org wrote:
Ok, so I just did what I proposed. I went to random enwiki articles and described the first ten I found which didn't already have descriptions:
- "Courage Under Fire", *1996 film about a Gulf War friendly-fire
incident*
- "Pebasiconcha immanis", *largest known species of land snail,
extinct*
"List of Kenyan writers", *notable Kenyan authors*
"Solar eclipse of December 14, 1917", *annular eclipse which lasted
77 seconds*
- "Natchaug Forest Lumber Shed", *historic Civilian Conservation
Corps post-and-beam building*
- "Sun of Jamaica (album)", *debut 1980 studio album by Goombay Dance
Band*
"E-1027", *modernist villa in France by architect Eileen Gray*
"Daingerfield State Park", *park in Morris County, Texas, USA,
bordering Lake Daingerfield*
- "Todo Lo Que Soy-En Vivo", *2014 Live album by Mexican pop singer
Fey*
- "2009 UEFA Regions' Cup", *6th UEFA Regions' Cup, won by Castile
and Leon*
And here are the respective descriptions from Magnus' (quite excellent) autodesc.js:
- "Courage Under Fire", *1996 film by Edward Zwick, produced by John
Davis and David T. Friendly from United States of America*
"Pebasiconcha immanis", *species of Mollusca*
"List of Kenyan writers", *Wikimedia list article*
"Solar eclipse of December 14, 1917", *solar eclipse*
"Natchaug Forest Lumber Shed", *Construction in Connecticut, United
States of America*
"Sun of Jamaica (album)", *album*
"E-1027", *villa in Roquebrune-Cap-Martin, France*
"Daingerfield State Park", *state park and state park of a state of
the United States in Texas, United States of America*
"Todo Lo Que Soy-En Vivo", *live album by Fey*
"2009 UEFA Regions' Cup", *none*
Thoughts?
Just trying to make my own bold assertions falsifiable :)
On Tue, Aug 18, 2015 at 6:32 PM, Monte Hurd mhurd@wikimedia.org wrote:
The whole human-vs-extracted descriptions quality question could be fairly easy to test I think:
- Pick, some number of articles at random.
- Run them through a description extraction script.
- Have a human describe the same articles with, say, the app
interface I demo'ed.
If nothing else this exercise could perhaps make what's thus far been a wildly abstract discussion more concrete.
-- EN Wikipedia user page: https://en.wikipedia.org/wiki/User:Brian.gerstle IRC: bgerstle
-- Dmitry Brant Mobile Apps Team (Android) Wikimedia Foundation https://www.mediawiki.org/wiki/Wikimedia_mobile_engineering
There is no question that there is a lot of room for improvement of autodesc. There are also some instances where a manual description is vastly superior to an automatic one, where the algorithm cannot catch the point of why that item is important.
However, consider this:
* volunteer time to manually update these descriptions: a few minutes? (load, read, understand, type, save...)
* volunteer time to have them generated automatically: none (well, mine, but distributed over 14M items times 250 languages, lim->0)
I noticed there are no biographies in your list, which is surprising, considering those are the most numerous "class" of items. It is also one of the few classes where autodesc does something more clever than "generic description". I assume this was not intentional ;-)
The situation, for most languages, is this: No manual descriptions, on basically any item. And that will remain so for the (near) future. Automatic descriptions can change that, literally overnight, with a little programming and linguistic effort. Adding a manual description can help speakers of that language; adding a statement, and thereby improving automatic descriptions in all languages, helps everyone. With essentially the same volunteer effort. This is a "force multiplier" of volunteer effort with a factor of 250. And we ignore that ... why, exactly?
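As a rough illustration of the arithmetic behind that factor, here is a minimal Python sketch. It takes the round figures quoted above (about 14M items, about 250 languages) at face value and assumes, optimistically, that a description generator exists for every language; the function name is made up for this example.

```python
# Round figures as quoted in the message above; both are estimates.
ITEMS = 14_000_000
LANGUAGES = 250


def descriptions_reached(edits, per_language):
    """Count item/language descriptions touched by a number of edits.

    A manual description helps one language only. A statement feeds the
    automatic description in every language (assumption: all languages
    have a working generator).
    """
    return edits if per_language else edits * LANGUAGES


manual = descriptions_reached(1, per_language=True)
statement = descriptions_reached(1, per_language=False)
total_slots = ITEMS * LANGUAGES  # every item in every language

print(manual, statement, total_slots)
```

The point is only the ratio: one edit of each kind costs about the same volunteer time, but the statement edit fans out across languages.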
On Wed, Aug 19, 2015 at 3:25 AM Monte Hurd mhurd@wikimedia.org wrote:
Ok, so I just did what I proposed. I went to random enwiki articles and described the first ten I found which didn't already have descriptions:
- "Courage Under Fire", *1996 film about a Gulf War friendly-fire incident*
- "Pebasiconcha immanis", *largest known species of land snail, extinct*
- "List of Kenyan writers", *notable Kenyan authors*
- "Solar eclipse of December 14, 1917", *annular eclipse which lasted 77 seconds*
- "Natchaug Forest Lumber Shed", *historic Civilian Conservation Corps post-and-beam building*
- "Sun of Jamaica (album)", *debut 1980 studio album by Goombay Dance Band*
- "E-1027", *modernist villa in France by architect Eileen Gray*
- "Daingerfield State Park", *park in Morris County, Texas, USA, bordering Lake Daingerfield*
- "Todo Lo Que Soy-En Vivo", *2014 Live album by Mexican pop singer Fey*
- "2009 UEFA Regions' Cup", *6th UEFA Regions' Cup, won by Castile and Leon*
And here are the respective descriptions from Magnus' (quite excellent) autodesc.js:
- "Courage Under Fire", *1996 film by Edward Zwick, produced by John Davis and David T. Friendly from United States of America*
- "Pebasiconcha immanis", *species of Mollusca*
- "List of Kenyan writers", *Wikimedia list article*
- "Solar eclipse of December 14, 1917", *solar eclipse*
- "Natchaug Forest Lumber Shed", *Construction in Connecticut, United States of America*
- "Sun of Jamaica (album)", *album*
- "E-1027", *villa in Roquebrune-Cap-Martin, France*
- "Daingerfield State Park", *state park and state park of a state of the United States in Texas, United States of America*
- "Todo Lo Que Soy-En Vivo", *live album by Fey*
- "2009 UEFA Regions' Cup", *none*
Thoughts?
Just trying to make my own bold assertions falsifiable :)
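To make the comparison easier to eyeball, the two lists above can be paired up programmatically. The following is only a sketch in Python: the strings are copied from the lists, but the helper name is invented here, and treating the literal "none" as "the script produced nothing" is an assumption about how that entry was recorded. It does not judge quality, which would still need human raters.

```python
# (title, human description, autodesc.js output), copied from the lists above.
SAMPLES = [
    ("Courage Under Fire",
     "1996 film about a Gulf War friendly-fire incident",
     "1996 film by Edward Zwick, produced by John Davis and David T. Friendly "
     "from United States of America"),
    ("Pebasiconcha immanis",
     "largest known species of land snail, extinct",
     "species of Mollusca"),
    ("List of Kenyan writers", "notable Kenyan authors", "Wikimedia list article"),
    ("Solar eclipse of December 14, 1917",
     "annular eclipse which lasted 77 seconds", "solar eclipse"),
    ("Natchaug Forest Lumber Shed",
     "historic Civilian Conservation Corps post-and-beam building",
     "Construction in Connecticut, United States of America"),
    ("Sun of Jamaica (album)",
     "debut 1980 studio album by Goombay Dance Band", "album"),
    ("E-1027", "modernist villa in France by architect Eileen Gray",
     "villa in Roquebrune-Cap-Martin, France"),
    ("Daingerfield State Park",
     "park in Morris County, Texas, USA, bordering Lake Daingerfield",
     "state park and state park of a state of the United States in Texas, "
     "United States of America"),
    ("Todo Lo Que Soy-En Vivo",
     "2014 Live album by Mexican pop singer Fey", "live album by Fey"),
    ("2009 UEFA Regions' Cup",
     "6th UEFA Regions' Cup, won by Castile and Leon", "none"),
]


def is_gap(auto):
    """True when the script output is recorded as 'none' (assumed: no result)."""
    return auto.strip().lower() == "none"


gaps = [title for title, _human, auto in SAMPLES if is_gap(auto)]

# Print each pair side by side, flagging items with no automatic description.
for title, human, auto in SAMPLES:
    marker = "GAP" if is_gap(auto) else "   "
    print(f"{marker} {title}\n      human: {human}\n      auto:  {auto}")
print(f"{len(gaps)} of {len(SAMPLES)} items had no automatic description")
```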
On Tue, Aug 18, 2015 at 6:32 PM, Monte Hurd mhurd@wikimedia.org wrote:
The whole human-vs-extracted descriptions quality question could be fairly easy to test I think:
- Pick some number of articles at random.
- Run them through a description extraction script.
- Have a human describe the same articles with, say, the app interface I demo'ed.
If nothing else this exercise could perhaps make what's thus far been a wildly abstract discussion more concrete.
On Tue, Aug 18, 2015 at 6:17 PM, Monte Hurd mhurd@wikimedia.org wrote:
If having the most elegant description extraction mechanism was the goal I would totally agree ;)
On Tue, Aug 18, 2015 at 5:19 PM, Dmitry Brant dbrant@wikimedia.org wrote:
IMO, allowing the user to edit the description is a missed opportunity to make the user edit the actual *data*, such that the description is generated correctly.
On Tue, Aug 18, 2015 at 8:02 PM, Monte Hurd mhurd@wikimedia.org wrote:
IMO, if the goal is quality, then human curated descriptions are superior until such time as the auto-generation script passes the Turing test ;)
I see these empty descriptions as an amazing opportunity to give *everyone* an easy new way to edit. I whipped an app editing interface up at the Lyon hackathon: https://www.youtube.com/watch?v=6VblyGhf_c8
I used it to add a couple hundred descriptions in a single day just by hitting "random" then adding descriptions for articles which didn't have them.
I'd love to try a limited test of this in production to get a sense for how effective human curation can be if the interface is easy to use...
On Tue, Aug 18, 2015 at 1:25 PM, Jan Ainali jan.ainali@wikimedia.se wrote:
Nice one!
Does not appear to work on svwiki though. Does it have something to do with the fact that the wiki in question does not display that tagline?
*Kind regards, Jan Ainali*
Executive Director, Wikimedia Sverige http://wikimedia.se 0729 - 67 29 48
*Imagine a world where every person has free access to humanity's collected knowledge. That is what we do.* Become a member. http://blimedlem.wikimedia.se
2015-08-18 17:23 GMT+02:00 Magnus Manske magnusmanske@googlemail.com:
Show automatic description underneath "From Wikipedia...": https://en.wikipedia.org/wiki/User:Magnus_Manske/autodesc.js
To use, add:
importScript ( 'User:Magnus_Manske/autodesc.js' ) ;
to your common.js
On Tue, Aug 18, 2015 at 9:47 AM Jane Darnell jane023@gmail.com wrote:
It would be even better if this (short: 3 field max) pipe-separated list was available as a gadget to wikidatans on Wikipedia (like me). I can't see if a page I am on has an "instance of" (though it should) and I can see the description thanks to another gadget (sorry no idea which one that is). Often I will update empty descriptions, but if I was served basic fields (so for a painting, the creator field), I would click through to update that too.
On Tue, Aug 18, 2015 at 9:58 AM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Jane Darnell, 15/08/2015 08:53:
Yes but even if the descriptions were just the contents of fields separated by a pipe it would be better than nothing.
+1, item descriptions are mostly useless in my experience.
As for "get into production on Wikipedia" I don't know what it means, I certainly don't like 1) mobile-specific features, 2) overriding existing manually curated content; but it's good to 3) fill gaps. Mobile folks often do (1) and (2), if they *instead* did (3) I'd be very happy. :)
Nemo
On Tue, Aug 18, 2015 at 9:58 AM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Jane Darnell, 15/08/2015 08:53:
Yes but even if the descriptions were just the contents of fields separated by a pipe it would be better than nothing.
+1, item descriptions are mostly useless in my experience.
Anecdotal. I know plenty of readers who find them useful for knowing whether they should click something, and that is also an anecdotal, useless opinion.
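For concreteness, Jane's "contents of fields separated by a pipe" idea (with the 3-field cap mentioned elsewhere in the thread) could look roughly like the Python sketch below. The values come from the E-1027 descriptions quoted in this thread; the field names, their order, and the function name are assumptions made for illustration, not how any existing gadget works.

```python
def pipe_description(fields, order, max_fields=3):
    """Join the first `max_fields` non-empty field values with a pipe."""
    values = [fields[name] for name in order if fields.get(name)]
    return " | ".join(values[:max_fields])


# Values taken from the E-1027 descriptions quoted in this thread;
# the field names and their priority order are assumptions.
e1027 = {
    "instance of": "villa",
    "location": "Roquebrune-Cap-Martin",
    "country": "France",
    "architect": "Eileen Gray",
}
ORDER = ["instance of", "location", "country", "architect"]

print(pipe_description(e1027, ORDER))
```

Missing or empty fields are skipped, so an item with only an "instance of" statement still yields something rather than nothing.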
As for "get into production on Wikipedia" I don't know what it means, I certainly don't like 1) mobile-specific features, 2) overriding existing manually curated content; but it's good to 3) fill gaps. Mobile folks often do (1) and (2), if they *instead* did (3) I'd be very happy. :)
Nemo
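Nemo's distinction between (2) overriding curated content and (3) filling gaps comes down to a simple precedence rule. A minimal Python sketch, with a hypothetical function name and no claim about how the apps or autodesc.js actually choose:

```python
from typing import Optional


def choose_description(manual: Optional[str],
                       automatic: Optional[str]) -> Optional[str]:
    """Prefer a manually curated description; fall back to an automatic one.

    The automatic text only fills gaps and never overrides manual text.
    """
    if manual and manual.strip():
        return manual.strip()
    if automatic and automatic.strip():
        return automatic.strip()
    return None  # nothing available: leave the gap visible


# Example strings are the E-1027 descriptions quoted earlier in the thread.
print(choose_description("modernist villa in France by architect Eileen Gray",
                         "villa in Roquebrune-Cap-Martin, France"))
print(choose_description(None, "solar eclipse"))
```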
Luckily there's no mobile teams any more after the reorg.
On 08/18/2015 10:29 AM, Joaquin Oltra Hernandez wrote:
Luckily there's no mobile teams any more after the reorg.
Are you sure? [1] In any case, it would be interesting to look at how many commits and contributions the "Desktop & Mobile Web" team has made to the desktop interface.
[1] https://wikimediafoundation.org/wiki/Staff_and_contractors#Mobile_Apps
-- Legoktm
Kunal,
I believe what Joaquin was referring to is the notion that the web team in Reading will be entering into work on the desktop-oriented experience. As you rightly note, the Android and iOS teams are focused squarely on experiences for mobile devices.
For the edification of the list, the web engineers in Reading were in the mobile web team, with a focus on experiences for users on mobile form factor devices like phones and tablets. It's going to take some time to ramp up practices for tackling code and architecture historically oriented to the desktop form factor. We'll need to ensure continued stability in the platform and work with the community as we propose and introduce changes to the desktop form factor user experience.
-Adam
On Sat, Aug 15, 2015 at 7:38 AM Lydia Pintscher < lydia.pintscher@wikimedia.de> wrote:
On Sat, Aug 15, 2015 at 3:43 AM, Dan Garry dgarry@wikimedia.org wrote:
I've seen arguments on both sides here. Some say automatically generated descriptions are not good enough. Some say they are. Why don't we gather some data on this and use that to decide what's right? :-)
Please do. Especially pay attention to languages other than English though. Because even if we get algorithms to write good descriptions for English are we going to do the same for all the other languages? Especially those where grammar is tricky and Wikidata doesn't even have the necessary information to make the grammar right? The other tricky side is determining why something is actually notable. That's not a trivial thing to determine based on the data we have.
And you know very well that (AFAIK) I am the only one who actually worked on this, in a tiny fraction of my spare time, and I only speak German and English.
The /real/ questions here are:
1. The languages that are actually implemented, are they returning descriptions that are good/OK/bad/plain wrong?
2. What could be achieved, on the existing or similar infrastructure, in a short period of time, if we drive to get code snippets (or equivalent) for other languages from volunteers?
3. What could be achieved, medium/long term, if we had a proper linguist to work on the problem? Or someone who has worked with multi-language text generation before?
I've just been winging it so far. Current auto-descriptions are not the best we can do. They are, frankly, the WORST we can do. This is a starting point, not the end product.
Yeah I understand. And this is not a criticism of your work. I think it is actually rather cool. It is questioning if it is a good idea to continue to push it to get into production on Wikipedia on a large scale.
Cheers Lydia
On Sat, Aug 15, 2015 at 1:17 PM Lydia Pintscher < lydia.pintscher@wikimedia.de> wrote:
Yeah I understand. And this is not a criticism of your work. I think it is actually rather cool. It is questioning if it is a good idea to continue to push it to get into production on Wikipedia on a large scale.
With that, I agree wholeheartedly.
There might be a point of doing an "extended prototype" though, before going to production (as much as I'd like that). What languages would be easy, hard, impossible? Would this work as a stand-alone project (e.g. dedicated VM), or as an extension of wikibase (flexibility vs. convenient integration)? What open source code is already out there we could use? Anyone in WMF/chapters who has experience in text generation? Anyone in WMF/chapters who speaks a "small" language who could help set up an example generator for that? What are the major item "classes" on Wikidata to be covered with special code, beyond the obvious "human bio"?
And we'd need someone to run this. As much as I'd like to, I'm stretched too thin as it is...
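One way to picture the "code snippets for other languages from volunteers" idea raised earlier in the thread is a small registry keyed by language code, where a language without a snippet simply gets no description rather than a grammatically wrong one. This is a hedged Python sketch, not a description of autodesc.js or of any Wikibase API: the registry, the decorator, and the item layout are all hypothetical, and the labels are example data.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry: language code -> function rendering a short phrase
# from already-translated labels.
Snippet = Callable[[str, str], str]
SNIPPETS: Dict[str, Snippet] = {}


def register(lang: str) -> Callable[[Snippet], Snippet]:
    """Decorator a volunteer would use to contribute a language snippet."""
    def decorator(func: Snippet) -> Snippet:
        SNIPPETS[lang] = func
        return func
    return decorator


@register("en")
def english(kind: str, place: str) -> str:
    return f"{kind} in {place}"


@register("de")
def german(kind: str, place: str) -> str:
    return f"{kind} in {place}"


def describe(item: dict, lang: str) -> Optional[str]:
    """Render a description, or None when the language is not covered."""
    snippet = SNIPPETS.get(lang)
    kind = item["kind"].get(lang)
    place = item["place"].get(lang)
    if snippet is None or not kind or not place:
        return None
    return snippet(kind, place)


# Example labels only; a real item would supply these per language.
ITEM = {
    "kind": {"en": "villa", "de": "Villa"},
    "place": {"en": "France", "de": "Frankreich"},
}

print(describe(ITEM, "en"), "/", describe(ITEM, "de"), "/", describe(ITEM, "sv"))
```

English and German happen to share the same template here; languages with case or gender agreement would need more than string substitution, which is exactly the concern Lydia raises.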