Hi Jane,

Perhaps my comments came off as more pessimistic than I intended. Of course I believe in the power of crowdsourcing, and I would never want to make anyone feel like their contributions are being marginalized.

I'll agree for now that the idea of "fully" automated descriptions leans more towards science fiction than reality. :)

However, my whole point has more to do with the apparent duplication of content that seems to be happening between the first sentence of Wikipedia articles and the corresponding Wikidata description.  There's something about it that seems unnecessary.  If we can figure out a way to automatically extract the description from the first sentence of the article, it would simplify things in two ways:

1) People wouldn't need to edit Wikidata descriptions, and would instead focus on improving the Wikipedia article.
2) People who monitor changes made to articles would need to monitor only the article, instead of the article plus its corresponding Wikidata description.

-Dmitry


On Sun, Mar 22, 2015 at 3:29 PM, Jane Darnell <jane023@gmail.com> wrote:
I agree with Monte Hurd and would add that my personal volunteer time on Wiki projects, though unique, is not irreplaceable, and the idea that I and others interested in my area of editing could be "overworked" by some new technology is just silly. I think you need to have a little faith in the whole concept of crowd-sourcing. It really does seem to work. Automated descriptions sound like a terrible idea, and I have seen time and again, on all sorts of subjects, that the main claim to fame switches across languages. An example is a language -pedia that has added a town because it is the location of a castle that is notable in that language -pedia for whatever reason, while in the language -pedia of the town itself, the town may be better known as a hub on a railway network or some such thing.

On Sun, Mar 22, 2015 at 8:08 PM, Monte Hurd <mhurd@wikimedia.org> wrote:
Responses inline...



> On Mar 22, 2015, at 8:57 AM, Dmitry Brant <dbrant@wikimedia.org> wrote:
>
> In preparation for next week's quarterly planning, I'd like to restate some of my concerns regarding Wikidata descriptions and flesh them out more comprehensively, since we're featuring them more prominently in the upcoming quarter.
> (n.b. These are more like "devil's advocate" thoughts, lest I make it sound like the Apps team isn't unified in its vision, which it certainly is.)
>
> My reservations fall under two categories:
>
> == Philosophical ==
>
> Wikidata is a superbly valuable repository of *data* -- data that a machine can use to generate all kinds of results that we humans can consume. The "description" field, on the other hand, is the only thing that is *not* data, and is not usable by a machine in any way.
>
> To allow users to manually fill in the Wikidata description (i.e. to manually duplicate the contents of Wikipedia) is to miss the point of the true potential of Wikidata, which is to be able to *use* the data to generate the description automatically!
>


I disagree with the premise that the description being "data" means it is missing its promise if it is human curated. I am more concerned with the quality of the description.




> Of course the counterargument to this is that the current state of auto-generated descriptions is not quite good (they often sound strange or nonsensical), but that's only because the tools we have at our disposal for generating descriptions are still in their infancy. I don't deny that this will be a hard problem to solve, but in my view, this is ultimately the *correct* problem to solve.
>



It's surprisingly hard to create auto-generated descriptions that rival the quality of user-generated descriptions.

Deeply hard, in fact, because it's complicated not only by language syntax and grammatical rules, but also by qualitative factors (readability, meaning, context, relevance, etc.).

This already complicated situation then becomes many orders of magnitude more difficult because these qualitative factors can differ between languages.





> The other thing (a more obvious one) that makes Wikidata descriptions redundant is the first sentence of every Wikipedia article which, on its own, is intended to provide a concise description of the article (and many articles already do this with rather good consistency). In fact, as we speak, we're working on programmatically "cleaning up" the first sentence to make it even more concise. Why not simply use this as the description?
>
> Is the first sentence sometimes too long to be a good description? No problem: create a markup annotation that will denote the *portion* of the first sentence that will serve as the description. In any case, making users manually copy the content from the first sentence (which is where most of the current Wikidata descriptions appear to come from) seems extraordinarily unnecessary.


The description needs to be able to be shorter than the first sentence in the article.




> On top of all that, it creates an unnecessary synchronization cost, fulfillable only by a human contributor, between the two sources of data.
>
> So, what I mean to say is: every edit to the Wikidata description is a missed opportunity to edit the Wikipedia article in such a way that the description could be auto-generated correctly. (or, similarly, a missed opportunity to edit the *data* of the Wikidata entry in such a way that the description could be auto-generated correctly)
>
> == Practical ==
>
> If we open the floodgates to editing the Wikidata description (i.e. if we make it too easy to edit the description), I predict that we'll be very disappointed by the quality of the contributions we'll get. I can see it quickly devolving into a whole lot of noise, spam, and vandalism.
>



I predict this won't be any worse than what happened when we enabled section editing.




> This means that we would need to implement the same kind of moderation/administration schemes that currently exist on Wikipedia itself.  I'm by no means qualified to speak for the Community, but I doubt that many Wikipedians will want to double their workload by having to "watch" the Wikidata description of their favorite articles, in addition to the articles themselves.
>
> I'll also point out that we do not yet expose any administrative mechanisms in the mobile apps.  This means that users will routinely see their edits disappear or be reverted without any notification or explanation.  This is already the case for the general editing of article content in the apps, but since the description is featured much more prominently, any edits (or reverts) to it will be much more noticeable, and will surely add to the confusion and frustration.




I've been editing descriptions from the Wikidata site directly for months, and only one of the dozens I've added or edited was reverted.




> If we really want to get it right, we have to figure this out before proceeding.
>
>
> -Dmitry
>
> _______________________________________________
> Mobile-l mailing list
> Mobile-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/mobile-l
