Hoi, I understand that item descriptions are going to be used in a mobile app. In my opinion that is seriously disappointing because it is not realistic to expect enough coverage in any language. Particularly in the small languages it will not be really useful.
My question is: we have had automated descriptions for a long time. What is it that keeps them from being used?
Thanks, GerardM
I'd rather see it not as something terribly disappointing, but as an opportunity to find a way to fill item descriptions more efficiently.
Basically, to find some cycles to resolve https://phabricator.wikimedia.org/T64695
On 8 Feb 2015 at 10:33, "Gerard Meijssen" gerard.meijssen@gmail.com wrote:
Hoi, I understand that item descriptions are going to be used in a mobile app. In my opinion that is seriously disappointing because it is not realistic to expect enough coverage in any language. Particularly in the small languages it will not be really useful.
My question is: we have had automated descriptions for a long time. What is it that keeps them from being used?
Thanks, GerardM
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
2015-02-08 14:07 GMT+01:00 Amir E. Aharoni amir.aharoni@mail.huji.ac.il:
I'd rather see it not as something terribly disappointing, but as an opportunity to find a way to fill item descriptions more efficiently.
+1.
L.
Hoi, How does that help? The point is exactly that there is no point to descriptions. Why iterate on a dog? It will still be a mutt. Thanks, GerardM
Manual descriptions are, in the vast majority of cases, a waste of volunteer time. Alternative: http://magnusmanske.de/wordpress/?p=265
On Sun Feb 08 2015 at 17:37:42 Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, How does that help? The point is exactly that there is no point to descriptions. Why iterate on a dog? It will still be a mutt. Thanks, GerardM
@Gerard, @Magnus: please help me out here.
I agree that automatic descriptions are very useful. I also think that in *some* cases, manual descriptions are more useful, and maybe even needed.
I definitely think that 3rd party consumers of wikidata should not have to think about whether descriptions have been written manually or were created automatically. This should be completely transparent.
So, if you want to help with making automated description a reality, please make suggestions that take into account the above points, and also consider the mechanisms for language fallback.
The only thing that I can think of right away is simply inserting automated descriptions by bot. This isn't ideal, but I can't think of a better solution that wouldn't be hugely complicated (and would thus not be implemented any time soon). Maybe you have ideas?
-- daniel
On 09.02.2015 at 11:41, Magnus Manske wrote:
Manual descriptions are, in the vast majority of cases, a waste of volunteer time. Alternative: http://magnusmanske.de/wordpress/?p=265
On Mon Feb 09 2015 at 10:59:12 Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
@Gerard, @Magnus: please help me out here.
I agree that automatic descriptions are very useful. I also think that in *some* cases, manual descriptions are more useful, and maybe even needed.
Yes.
I definitely think that 3rd party consumers of wikidata should not have to think about whether descriptions have been written manually or were created automatically. This should be completely transparent.
My autodesc API serves both at the moment, so the consumer can decide which one they want to use. Automatic descriptions can "miss the point" sometimes, but are generally more up-to-date.
So, if you want to help with making automated description a reality, please make suggestions that take into account the above points, and also consider the mechanisms for language fallback.
From my point of view, this is the "evolution" of automatic descriptions (ADs):
1. web-based tools as proof-of-concept. This is done.
2. web-based API to standardise automatic descriptions, and make them easily accessible for everyone. I am trying to do that now.
3. WMF/Wikibase-team picks up the API code, or writes their own; integration into MediaWiki/extension, with proper language generation in many languages, good caching/invalidation, API integration etc. Waiting for that :-)
The only thing that I can think of right away is simply inserting automated descriptions by bot. This isn't ideal, but I can't think of a better solution that wouldn't be hugely complicated (and would thus not be implemented any time soon). Maybe you have ideas?
I think that would be a massive waste, and it would miss one of the points of AD, which is improvement with more/better statements.
What I /can/ see is a cached AD in a new field, one which gets invalidated on every item edit, and maybe on every related item edit (new/better labels for description). The wb_terms table could support that easily, AFAICT. It could be used for search results, and it could be shown as placeholder for, or below, the manual description in the interface. This would, however, require engineering beyond what I can offer as a volunteer. It could also profit from the involvement of someone versed in linguistics.
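The cached-AD idea above (invalidate on every edit of the item, and maybe on edits of related items) could be sketched roughly as follows. This is only an in-memory illustration, not how Wikibase or wb_terms work: the class name, the `generate` callback and the dependency registry are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Set, Tuple


class DescriptionCache:
    """In-memory cache of generated descriptions, keyed by (item id, language)."""

    def __init__(self, generate: Callable[[str, str], str]) -> None:
        # Hypothetical generator callback: (item_id, lang) -> description text.
        self._generate = generate
        self._cache: Dict[Tuple[str, str], str] = {}
        # For each item, the set of items whose descriptions mention it.
        self._used_by: Dict[str, Set[str]] = defaultdict(set)

    def register_dependency(self, item_id: str, depends_on: str) -> None:
        """Record that item_id's description uses the label of depends_on."""
        self._used_by[depends_on].add(item_id)

    def get(self, item_id: str, lang: str) -> str:
        """Return the cached description, generating it on a cache miss."""
        key = (item_id, lang)
        if key not in self._cache:
            self._cache[key] = self._generate(item_id, lang)
        return self._cache[key]

    def on_item_edit(self, item_id: str) -> None:
        """Drop cached descriptions of the edited item and of items that use it."""
        affected = {item_id} | self._used_by.get(item_id, set())
        for key in [k for k in self._cache if k[0] in affected]:
            del self._cache[key]
```

Tracking which items a description depends on is the part that would need real engineering; here it is simply registered by hand.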
Cheers, Magnus
On 09.02.2015 at 12:17, Magnus Manske wrote:
My autodesc API serves both at the moment, so the consumer can decide which one they want to use. Automatic descriptions can "miss the point" sometimes, but are generally more up-to-date.
Can you post a link for us to play with?
In any case, the mobile app would need a production grade service, so it would have to wait until this is fully integrated with wikibase and live on wikidata.
So, if you want to help with making automated description a reality, please make suggestions that take into account the above points, and also consider the mechanisms for language fallback.
From my point of view, this is the "evolution" of automatic descriptions (ADs):
1. web-based tools as proof-of-concept. This is done.
2. web-based API to standardise automatic descriptions, and make them easily accessible for everyone. I am trying to do that now.
3. WMF/Wikibase-team picks up the API code, or writes their own; integration into MediaWiki/extension, with proper language generation in many languages, good caching/invalidation, API integration etc. Waiting for that :-)
As Markus points out, this does not address the needs of dump consumers. If the UI and API generate automatic summaries on the fly, there is very little incentive for users to enter descriptions manually (which is the point, of course). This means few descriptions in dumps.
To have the automatic summaries in the dumps, we would need to either materialize them in the database (and then invalidate/update them when appropriate), or generate them on the fly when creating the dump.
In summary, I understand the issue, but it seems tricky to get the solution right, both conceptually, and in terms of engineering.
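To make the second option concrete (generating descriptions on the fly when creating the dump), here is a minimal sketch. The item shape is heavily simplified and the `generate` callback is hypothetical; the real dump format and generator are not shown in this thread.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List


def dump_lines(
    items: Iterable[dict],
    languages: List[str],
    generate: Callable[[dict, str], str],
) -> Iterator[str]:
    """Yield one JSON line per item, filling missing descriptions automatically."""
    for item in items:
        manual: Dict[str, str] = item.get("descriptions", {})
        # Prefer the manual description; call the generator only when none exists.
        descriptions = {
            lang: manual.get(lang) or generate(item, lang) for lang in languages
        }
        yield json.dumps(
            {"id": item["id"], "descriptions": descriptions}, ensure_ascii=False
        )
```

With this shape a dump consumer cannot tell manual and automatic descriptions apart, which matches the transparency requirement but also hides which ones a human wrote.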
I assume Magnus is referring to cases where for example, an item exists because someone's biography has been added to the Esperanto Wikipedia as a leading esperantist, but whose actual claim to fame for other Wikipedias is quite different (e.g. the person was a poet in another language, a leading politician in some city, etc).
On Mon Feb 09 2015 at 11:27:06 Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
On 09.02.2015 at 12:17, Magnus Manske wrote:
My autodesc API serves both at the moment, so the consumer can decide which one they want to use. Automatic descriptions can "miss the point" sometimes, but are generally more up-to-date.
Can you post a link for us to play with?
Interface at https://tools.wmflabs.org/autodesc/
Example JSONFM: https://tools.wmflabs.org/autodesc/?q=Q3184929&lang=&mode=short&...
In any case, the mobile app would need a production grade service, so it would have to wait until this is fully integrated with wikibase and live on wikidata.
I understand that. In my blog post about the API: http://magnusmanske.de/wordpress/?p=265 I point out that it is not exactly production-quality yet :-)
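For anyone who wants to script against the tool, a request URL can be assembled as in the sketch below. The parameter names are copied from a complete autodesc URL quoted elsewhere in this thread (`q`, `lang`, `mode`, `links`, `redlinks`, `format`); the function name is made up, and since the response format is not shown here, the sketch makes no request and parses nothing.

```python
from urllib.parse import urlencode

AUTODESC_BASE = "https://tools.wmflabs.org/autodesc/"


def build_autodesc_url(item_id: str, lang: str = "en", mode: str = "short") -> str:
    """Build an autodesc query URL for one item (hypothetical helper)."""
    params = {
        "q": item_id,
        "lang": lang,
        "mode": mode,
        "links": "text",
        "redlinks": "",
        "format": "json",
    }
    # urlencode keeps the insertion order of the dict and encodes empty values as "key=".
    return AUTODESC_BASE + "?" + urlencode(params)
```

Calling `build_autodesc_url("Q3184929")` reproduces the URL form used in this thread; keep in mind the tool is described as not production-quality.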
As Markus points out, this does not address the needs of dump consumers. If the UI and API generate automatic summaries on the fly, there is very little incentive for users to enter descriptions manually (which is the point, of course). This means few descriptions in dumps.
To have the automatic summaries in the dumps, we would need to either materialize them in the database (and then invalidate/update them when appropriate), or we generated them on the fly when creating the dump.
Just put them into wb_terms and not into the JSON. They could be displayed, added to search results, and put into "description dumps". Maybe these could even be sqlite databases, as there is little point analysing automatic descriptions for wording; you'd need these descriptions to display with an item, so sqlite could be a way of getting them quickly.
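A "description dump" as an sqlite database could look something like the sketch below. The table name, column names and both helper functions are assumptions for illustration only; nothing here reflects an existing dump format.

```python
import sqlite3
from typing import Iterable, Optional, Tuple


def write_description_dump(path: str, rows: Iterable[Tuple[str, str, str]]) -> None:
    """Write (item_id, lang, description) rows into an sqlite file."""
    con = sqlite3.connect(path)
    with con:  # commits on success, rolls back on error
        con.execute(
            "CREATE TABLE IF NOT EXISTS autodesc ("
            "item_id TEXT NOT NULL, lang TEXT NOT NULL, description TEXT NOT NULL, "
            "PRIMARY KEY (item_id, lang))"
        )
        con.executemany("INSERT OR REPLACE INTO autodesc VALUES (?, ?, ?)", rows)
    con.close()


def lookup(path: str, item_id: str, lang: str) -> Optional[str]:
    """Fetch one description by item and language, or None if absent."""
    con = sqlite3.connect(path)
    row = con.execute(
        "SELECT description FROM autodesc WHERE item_id = ? AND lang = ?",
        (item_id, lang),
    ).fetchone()
    con.close()
    return row[0] if row else None
```

The composite primary key makes the "display with an item" lookup an index hit, which is the use case named above.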
In summary, I understand the issue, but it seems tricky to get the solution right, both conceptually, and in terms of engineering.
-- Daniel Kinzler Senior Software Developer
Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
On 09.02.2015 at 13:26, Magnus Manske wrote:
Interface at https://tools.wmflabs.org/autodesc/
Example JSONFM: https://tools.wmflabs.org/autodesc/?q=Q3184929&lang=&mode=short&...
Thanks!
Just put them into wb_terms and not into the JSON. They could be displayed, added to search results, and put into "description dumps". Maybe these could even be sqlite databases, as there is little point analysing automatic descriptions for wording; you'd need these descriptions to display with an item, so sqlite could be a way of getting them quickly.
Since wb_terms has one row per term, and a field for the term type, it would be simple enough to inject "auto-descriptions". The only issue is that wb_terms is already pretty huge, and adding automatic descriptions in *all* languages would likely bloat it a lot more. Language variants could be omitted, but still - that's a lot of data...
On Mon Feb 09 2015 at 13:00:35 Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
Since wb_terms has one row per term, and a field for the term type, it would be simple enough to inject "auto-descriptions". The only issue is that wb_terms is already pretty huge, and adding automatic descriptions in *all* languages would likely bloat it a lot more. Language variants could be omitted, but still - that's a lot of data...
It would be a quick'n'dirty solution. But it highlights an issue: We'd have the same problem with manual descriptions, if they were to arrive in large numbers.
There's always Yet Another Table. Maybe a description would be generated on-the-fly only if a Wikidata page is visited in a language, and removed after ~1 month of "non-viewing"? That should keep the table short enough, but would require extra effort for API calls and dumps, provided those should show descriptions for /all/ languages.
Then again there's the Labs Hadoop cluster, used for Analytics IIRC. That sounds like a way to process and store vast amounts of small, self-contained datasets (description strings). Would tie the solution to Wikimedia, though, and require a lot of engineering effort to get started.
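The "Yet Another Table" idea (generate a description only when a page is viewed in a language, drop it after about a month without views) might be sketched like this. It is an in-memory stand-in for a table; the class name, the `generate` callback and the injectable clock are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

THIRTY_DAYS = 30 * 24 * 60 * 60  # "~1 month", in seconds


class ViewedDescriptions:
    """Descriptions generated on first view and purged after a period without views."""

    def __init__(
        self,
        generate: Callable[[str, str], str],
        now: Callable[[], float] = time.time,
        max_idle: float = THIRTY_DAYS,
    ) -> None:
        self._generate = generate
        self._now = now
        self._max_idle = max_idle
        # (item_id, lang) -> (description, time of last view)
        self._rows: Dict[Tuple[str, str], Tuple[str, float]] = {}

    def view(self, item_id: str, lang: str) -> str:
        """Return the description, generating it if needed, and record the view."""
        key = (item_id, lang)
        if key in self._rows:
            text = self._rows[key][0]
        else:
            text = self._generate(item_id, lang)
        self._rows[key] = (text, self._now())
        return text

    def purge(self) -> int:
        """Remove rows not viewed within max_idle; return how many were removed."""
        cutoff = self._now() - self._max_idle
        stale = [key for key, (_, last) in self._rows.items() if last < cutoff]
        for key in stale:
            del self._rows[key]
        return len(stale)
```

As noted above, API calls and dumps that want all languages would bypass this path and still need their own generation step.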
Oh, real-life example for "short automatic descriptions" (same code as the API) vs. manual ones: Searching for "Peter" on Wikidata, with autodesc gadget: https://twitter.com/MagnusManske/status/564782161845551104
On Mon Feb 09 2015 at 13:09:27 Magnus Manske magnusmanske@googlemail.com wrote:
It would be a quick'n'dirty solution. But it highlights an issue: We'd have the same problem with manual descriptions, if they were to arrive in large numbers.
There's always Yet Another Table. Maybe a description would be generated on-the-fly only if a Wikidata page is visited in a language, and removed after ~1 month of "non-viewing"? That should keep the table short enough, but would require extra effort for API calls and dumps, provided those should show descriptions for /all/ languages.
Then again there's the Labs hadoop cluster, used for Analytics IIRC. That sounds like a way to process and store vast amounts of small, self-contained datasets (description strings). Would tie the solution to Wikimedia, though, and require a lot of engineering effort to get started.
On 2/9/15 8:00 AM, Daniel Kinzler wrote:
Interface at https://tools.wmflabs.org/autodesc/
Example JSONFM: https://tools.wmflabs.org/autodesc/?q=Q3184929&lang=&mode=short&...
Thanks!
Daniel,
Have you still not considered using <link/> in <head/> of HTML docs to enable user agents to discover:
1. alternative document content sources?
2. related document content sources?
All you have to do (although you must know this already) is:
<head> <link rel="alternate" href="https://tools.wmflabs.org/autodesc/?q=Q3184929&lang=en&mode=short&links=text&redlinks=&format=json" title="JSON representation" /> </head>
or if you simply want to be looser:
<head> <link rel="related" href="https://tools.wmflabs.org/autodesc/?q=Q3184929&lang=en&mode=short&links=text&redlinks=&format=json" title="JSON representation" /> </head>
Just placing the above in your HTML aids user agents that understand HTML.
You can even go further, in regards to HTTP aware user agents (beyond browsers) by replicating the relations above via "Link:" response metadata.
These simple tweaks solve lots of discovery related problems :)
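The same relations can be expressed as an HTTP "Link:" response header value. The helper below is a small sketch: the function name is made up, and it assumes the URL and title need no escaping (no angle brackets in the URL, no double quotes in the title).

```python
def link_header_value(url: str, rel: str, title: str) -> str:
    """Format one link relation as the value of a "Link:" response header."""
    # Assumes url contains no ">" and title contains no double quotes.
    return f'<{url}>; rel="{rel}"; title="{title}"'


AUTODESC_JSON = (
    "https://tools.wmflabs.org/autodesc/"
    "?q=Q3184929&lang=en&mode=short&links=text&redlinks=&format=json"
)
header_line = "Link: " + link_header_value(AUTODESC_JSON, "alternate", "JSON representation")
```

A server would send `header_line` alongside the HTML response, so that agents which never parse the HTML can still find the JSON representation.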
On Mon, Feb 9, 2015 at 3:22 PM, Kingsley Idehen kidehen@openlinksw.com wrote:
Daniel,
Have you still not considered using <link/> in <head/> of HTML docs to enable user agents to discover:
alternative document content sources?
related document content sources?
All you have to do (although you must know this already) is:
<head> <link rel="alternate" href="https://tools.wmflabs.org/autodesc/?q=Q3184929&lang=en&mode=short&links=text&redlinks=&format=json" title="JSON representation" /> </head>
or if you simply want to be looser:
<head> <link rel="related" href="https://tools.wmflabs.org/autodesc/?q=Q3184929&lang=en&mode=short&links=text&redlinks=&format=json" title="JSON representation" /> </head>
Just placing the above in your HTML aids user agents that understand HTML.
You can even go further, in regards to HTTP aware user agents (beyond browsers) by replicating the relations above via "Link:" response metadata.
These simple tweaks solve lots of discovery related problems :)
Can you please open a ticket for this on phabricator.wikimedia.org? Then we have it in our todo list. Thanks.
Cheers Lydia
Manual descriptions are not an entire waste of time.
Magnus writes in his post http://magnusmanske.de/wordpress/?p=265:
And some people have seen my Reasonator http://tools.wmflabs.org/reasonator/?q=Q1339 tool, where (for some item types, and some languages) rather long descriptions can be generated.
It's not necessarily good that they are long. For the mobile app it's better if they are short.
But the "some item types, and some languages" part is the real problem. Only some. It's quite possible that in the future Reasonator will cover all languages and all data types and will also be tweaked to provide appropriate length, maybe even different lengths according to context. Reasonator natural language sentence creation works for a very small number of languages. If it was as easy to translate it as it is to translate MediaWiki UI messages, I wouldn't object to its wider use, but AFAIK this is not the case now.
And it's not that good for English either. Reasonator is not smart enough at the moment to describe people with several qualifications. The current Reasonator-generated description of Peter Garrett https://tools.wmflabs.org/reasonator/?find=peter+garrett is vastly inferior to the manually-written description. Compare:
1. "Australian singer and politician, Minister for School Education, Early Childhood and Youth, Minister for Sustainability, Environment, Water, Population and Communities (Australia), and Member of the Australian House of Representatives (*1953) ♂"
2. "Australian politician and Midnight Oil lead singer".
Basic human intuition tells me that for most Wikipedia readers, who simply want to know "Who is Peter Garrett?", #2 is far more useful. #1 has oversized descriptions of all his political roles, and *doesn't* have the name of the rock band that made him popular. This is just one example out of hundreds of thousands that could be brought up. For what it's worth, #2 is also easier to translate manually.
It's important to emphasize at this point that I have the utmost respect for Magnus's brilliant work. It's just not ready to completely replace the manual descriptions.
A practical solution for now is to have a system for manual translation of descriptions, which shows the Reasonator descriptions as a translation aid, similarly to how the Translate extension shows translation memory suggestions. Also, a way to manually tweak descriptions can take Reasonator further, for example a way to tell it that for the Peter Garrett item there's no need to include a long list of all his roles in the Australian government.
Oh, and even if you can run away some day from manually translating descriptions, you cannot run away from manually translating labels. At most, some can be copied from Wikipedia, but even then many of them need post-import fixing.
So all of this brings me back to https://phabricator.wikimedia.org/T64695 .
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
On Mon Feb 09 2015 at 12:08:01 Amir E. Aharoni amir.aharoni@mail.huji.ac.il wrote:
Manual descriptions are not an entire waste of time.
I never said that, so please don't put words in my mouth. I did say, on several occasions, that, for the vast majority of items, manual descriptions are a waste of volunteer time. For cases where "the point" of the item is not easily expressible as statements, manual descriptions are indeed useful. But the number of cases where this applies will only shrink, as we get more properties on Wikidata, and as the description generators get better.
Magnus writes in his post http://magnusmanske.de/wordpress/?p=265:
And some people have seen my Reasonator http://tools.wmflabs.org/reasonator/?q=Q1339 tool, where (for some item types, and some languages) rather long descriptions can be generated.
It's not necessarily good that they are long. For the mobile app it's better if they are short.
Which is one reason my automatic description API defaults to short descriptions, unless you ask for a long one. Even then, it will default to short, unless the language/item type combination is covered. The short description generator covers more languages, and is easily extendable. Check out "stock" at https://bitbucket.org/magnusmanske/autodesc/src/019b395c1bd5e13720e5cfda4df0...
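The defaulting rule described above can be sketched as follows. This is only an illustration of the logic, not the actual autodesc code: the names `LONG_COVERAGE` and `choose_mode` and the coverage entries are hypothetical.

```python
# Hypothetical coverage table: (language, item type) pairs for which a
# long-description generator is assumed to exist.
LONG_COVERAGE = {("en", "human"), ("de", "human")}


def choose_mode(language, item_type, requested="short"):
    """Return 'long' only if it was requested AND the combination is covered."""
    if requested == "long" and (language, item_type) in LONG_COVERAGE:
        return "long"
    # Short is the default, and also the fallback for uncovered combinations.
    return "short"
```

A caller that asks for a long description in an uncovered language still gets a short one instead of an error.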
I think the results are, at the very least, understandable by humans; and yes, these things tend to get better very quickly (developers! developers! <throws chair>).
But the "some item types, and some languages" part is the real problem. Only some. It's quite possible that in the future Reasonator will cover all languages and all data types and will also be tweaked to provide appropriate length, maybe even different lengths according to context. Reasonator natural language sentence creation works for a very small number of languages. If it were as easy to translate it as it is to translate MediaWiki UI messages, I wouldn't object to its wider use, but AFAIK this is not the case now.
And it's not that good for English either. Reasonator is not smart enough at the moment to describe people with several qualifications. The current Reasonator-generated description of Peter Garrett https://tools.wmflabs.org/reasonator/?find=peter+garrett is vastly inferior to the manually-written description. Compare:
1. "Australian singer and politician, Minister for School Education, Early Childhood and Youth, Minister for Sustainability, Environment, Water, Population and Communities (Australia), and Member of the Australian House of Representatives (*1953) ♂"
2. "Australian politician and Midnight Oil lead singer".
Basic human intuition tells me that for most Wikipedia readers, who simply want to know "Who is Peter Garrett?", #2 is far more useful. #1 has oversize descriptions of all his political roles, and *doesn't* have the name of the rock band that made him popular. This is just one example out of hundreds of thousands that could be brought up. For what it's worth, #2 is also easier to translate manually.
It's important to emphasize at this point that I have the utmost respect for Magnus's brilliant work. It's just not ready to completely replace the manual descriptions.
Thanks, and it should not. But a little developer time can save megahours (new unit!) of volunteers performing needless work.
A practical solution for now is to have a system for manual translation of descriptions, which shows the Reasonator descriptions as a translation aid, similarly to how the Translate extension shows translation memory suggestions. Also, a way to manually tweak descriptions can take Reasonator further, for example a way to tell it that for the Peter Garrett item there's no need to include a long list of all his roles in the Australian government.
Oh, and even if you can run away some day from manually translating descriptions, you cannot run away from manually translating labels. At most, some can be copied from Wikipedia, but even then many of them need post-import fixing.
So all of this brings me back to https://phabricator.wikimedia.org/T64695 .
Yes, labels are a different beast. Though there are some things we could automate; basically, all people with a German label and no English one could use the German one in English just as well. Those cases aside, automated translation can be helpful, but can also go horribly wrong in cases.
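The German-to-English case mentioned above could look roughly like this. It is a sketch under stated assumptions: labels are a plain dict keyed by language code, and the `is_human` flag is a hypothetical stand-in for checking that the item is about a person.

```python
def fill_english_label(labels, is_human):
    """Copy the German label to English for person items lacking an English label."""
    result = dict(labels)  # leave the caller's dict untouched
    if is_human and "en" not in result and "de" in result:
        # Personal names are usually written the same way in both languages.
        result["en"] = result["de"]
    return result
```

Non-person items are left alone, since copying their labels across languages is where automation tends to go wrong.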
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
2015-02-09 12:58 GMT+02:00 Daniel Kinzler daniel.kinzler@wikimedia.de:
@Gerard, @Magnus: please help me out here.
I agree that automatic descriptions are very useful. I also think that in *some* cases, manual descriptions are more useful, and maybe even needed.
I definitely think that 3rd party consumers of wikidata should not have to think about whether descriptions have been written manually or were created automatically. This should be completely transparent.
So, if you want to help with making automated descriptions a reality, please make suggestions that take into account the above points, and also consider the mechanisms for language fallback.
The only thing that I can think of right away is simply inserting automated descriptions by bot. This isn't ideal, but I can't think of a better solution that wouldn't be hugely complicated (and would thus not be implemented any time soon). Maybe you have ideas?
-- daniel
On 09.02.2015 at 11:41, Magnus Manske wrote:
Manual descriptions are, in the vast majority of cases, a waste of volunteer time. Alternative: http://magnusmanske.de/wordpress/?p=265
Maybe not really a correct answer to your question, but I'm thinking of something similar to automatically generated articles.
My suggestion there was to associate a description template, in the form of a wiki template that uses claims, with a query.
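As a rough illustration of pairing a template with a query, here is a minimal sketch. Everything in it is hypothetical: the rule list, the property names, and the use of a Python predicate in place of a real query; it uses Python's `string.Template` rather than wiki template syntax.

```python
from string import Template

# Each rule pairs a "query" (here just a predicate over the item's claims)
# with a description template whose placeholders are filled from the claims.
RULES = [
    (lambda claims: claims.get("instance_of") == "human",
     Template("$nationality $occupation")),
    (lambda claims: claims.get("instance_of") == "city",
     Template("city in $country")),
]


def describe(claims):
    """Return a description from the first matching rule, or None."""
    for matches, template in RULES:
        if matches(claims):
            try:
                return template.substitute(claims)
            except KeyError:
                # A claim needed by the template is missing on this item.
                return None
    return None
```

Because the description is computed from claims, it changes as soon as the claims do, and translating a template once covers every item its query selects.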
On 09.02.2015 11:41, Magnus Manske wrote:
Manual descriptions are, in the vast majority of cases, a waste of volunteer time. Alternative: http://magnusmanske.de/wordpress/?p=265
I am slightly concerned for the external data users (which I am too). Descriptions are very useful to have in the data dumps. I don't mind if they are auto-generated or written by humans, but I am worried that I would have to go to a web service to fetch all of them, which seems like a lot of work and very time-consuming if you do it at data dump scale. It may not even be possible in all (offline) contexts where dumps can be used.
More generally, switching from "we provide the data" to "we provide some data and a list of web services that you need to query to get the rest" seems to be a change of paradigm that I am not entirely happy with. Just consider how much data we import that is generated automatically -- should we in all of these cases switch to offering a web service that gives you the data if you really need it?
So, +1 for auto-generated descriptions, but -1 for not having them in the data anymore.
Cheers,
Markus
Considering that "hardcoded" descriptions (written manually, or generated automatically) for all items in all ~290 languages would likely make up most of the data dump file, this seems somewhat impractical :-)
For "offline users", description dumps could be generated on a regular basis, if there is demand.
On 09.02.2015 at 12:08, Magnus Manske wrote:
Considering that "hardcoded" descriptions (written manually, or generated automatically) for all items in all ~290 languages would likely make up most of the data dump file, this seems somewhat impractical :-)
It's entirely practical, and apparently what at least some consumers of our dumps expect and desire.
But wouldn't it be better to keep the dump as it is, for those who don't want triple size (just inventing a number here), and have one separate, or even per-language, dump with just the automated descriptions, for those who want that?
On 09.02.2015 at 12:25, Magnus Manske wrote:
But wouldn't it be better to keep the dump as it is, for those who don't want triple size (just inventing a number here), and have one separate, or even per-language, dump with just the automated descriptions, for those who want that?
Possibly. Depends on how much more data this would actually be. Which also depends on whether we would omit descriptions in languages that can easily be covered by language fallback (e.g. no separate descriptions in de-ch and de-at).
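To make the fallback point concrete, here is a minimal sketch of dropping descriptions that a fallback chain would supply anyway. The `FALLBACK` map and both function names are assumptions for illustration; this is not the Wikibase fallback implementation.

```python
# Hypothetical fallback map: language variant -> language it falls back to.
FALLBACK = {"de-ch": "de", "de-at": "de"}


def prune_fallback_duplicates(descriptions):
    """Drop entries identical to what their fallback language already provides."""
    kept = {}
    for lang, text in descriptions.items():
        parent = FALLBACK.get(lang)
        if parent is not None and descriptions.get(parent) == text:
            continue  # a consumer can recover this one via fallback
        kept[lang] = text
    return kept


def lookup(descriptions, lang):
    """Resolve a description by walking the fallback chain."""
    while lang is not None:
        if lang in descriptions:
            return descriptions[lang]
        lang = FALLBACK.get(lang)
    return None
```

How much this saves depends on how often the variants really share a description with their parent language.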
Hi Magnus, hi Daniel,
I don't think file size should be our primary concern here. What may seem big today will be negligible in a few years. Having all data in one place is just easier to work with. I am happy to wait for another 30min for a download if it saves me from implementing another Web service connector in my own code. Compute time is cheap, disk space is cheap, human labour is expensive.
Maybe the whole size discussion is a bit of a red herring here anyway. If we are worried about file size, there would maybe be better ways of reducing it. We can split the contents into several smaller dump files, not just for descriptions. We are already doing this when creating RDF dumps, and it would be easy for us to do the same for JSON. We could do this immediately if someone needs it (just let me know and we will set it up for you). However, if we want to provide smaller files, a more effective method would be to split by language rather than by term type: all labels in all languages would still be much bigger than labels+descriptions+aliases in English only, and many applications will not need labels in 300 languages.
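Splitting by language rather than by term type could be sketched as below. It assumes each entity is a dict whose "labels", "descriptions" and "aliases" fields are keyed by language code, as in the JSON dumps; the function name and the sample entity are made up for illustration.

```python
TERM_FIELDS = ("labels", "descriptions", "aliases")


def keep_languages(entity, languages):
    """Return a copy of the entity whose term fields keep only the given languages."""
    slim = dict(entity)  # shallow copy; statements and other fields are kept as-is
    for field in TERM_FIELDS:
        terms = entity.get(field, {})
        slim[field] = {lang: value for lang, value in terms.items() if lang in languages}
    return slim
```

Running this over every entity with `{"en"}` gives an English-only dump that still carries labels, descriptions and aliases together.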
Anyway, as I said, I do not mind whether the auto-descriptions are stored like normal descriptions or whether they are added to the dump files "last minute" when generating them. I just need the descriptions in the dumps.
Cheers,
Markus
Hoi, It is pointless to include automated descriptions when they are then saved in a fixed form. The point of automated descriptions is exactly that they change as new statements are made. This is one reason why they are superior to manual descriptions. The other is that when one label is added in a language, it immediately affects all items that include the associated item.
When the argument is that external users need the best descriptions available at whatever time, it is best to have the automated descriptions separate. We have enough experience of the disruption caused by failing dumps. Given that there is a need for descriptions for off line usage, it makes sense to consider caching such a file and removing the content that is changed and have it regenerated in a batch process. When a description is needed it can always be generated there and then. These can be used interactively as well. Thanks, GerardM
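The caching scheme described above might be sketched like this. It is a minimal in-memory illustration, assuming a hypothetical `generate` callable that produces the automated description for an item id; a real setup would persist the cache to a file.

```python
class DescriptionCache:
    """Cache of generated descriptions with invalidation and batch regeneration."""

    def __init__(self, generate):
        self._generate = generate  # callable: item id -> description text
        self._cache = {}
        self._stale = set()

    def mark_changed(self, item_id):
        """Remove the cached description of an item whose data changed."""
        self._cache.pop(item_id, None)
        self._stale.add(item_id)

    def get(self, item_id):
        """Return the cached description, generating it on the spot if missing."""
        if item_id not in self._cache:
            self._cache[item_id] = self._generate(item_id)
            self._stale.discard(item_id)
        return self._cache[item_id]

    def regenerate_batch(self):
        """Regenerate all invalidated entries; return how many were rebuilt."""
        rebuilt = 0
        for item_id in sorted(self._stale):
            self._cache[item_id] = self._generate(item_id)
            rebuilt += 1
        self._stale.clear()
        return rebuilt
```

Interactive users go through `get`, which generates on demand, while an offline export would call `regenerate_batch` before writing the file.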
On 12.02.2015 07:17, Gerard Meijssen wrote:
Hoi, It is pointless to include automated descriptions when they are then saved in a fixed form. The point of automated descriptions is exactly that they change as new statements are made. This is one reason why they are superior to manual descriptions. The other is that when one label is added in a language, it immediately affects all items that include the associated item.
Sure, I am all for having automated descriptions that are updated regularly. I think we are in agreement here.
When the argument is that external users need the best descriptions available at whatever time, it is best to have the automated descriptions separate. We have enough experience of the disruption caused by failing dumps. Given that there is a need for descriptions for off line usage, it makes sense to consider caching such a file and removing the content that is changed and have it regenerated in a batch process. When a description is needed it can always be generated there and then. These can be used interactively as well.
You are suggesting creating a file that caches the current autodescriptions and that is generated in a batch process? Seems like a good idea to me. The easiest solution for dump consumers would be if this file were integrated directly into the dump files rather than being a separate download that needs to be merged with the dump locally (as I said before, splitting the current dump into many separate files may have its own merits; but this is another discussion).
Cheers
Markus
On 9 February 2015 at 10:41, Magnus Manske magnusmanske@googlemail.com wrote:
Manual descriptions are, in the vast majority of cases, a waste of volunteer time. Alternative: http://magnusmanske.de/wordpress/?p=265
Why not make that part of the Wikidata game?
You could show the proposed description and the article lede, in the user's preferred language, then offer options to:
* Accept the proposed description
* Edit the proposed description, then save it
* Reject the proposed description and skip to the next article
For "en" users, you could add a check-box to also save the proposed description to "simple".