Hey everyone,

Sorry for joining the discussion late, and thank you very much for your feedback. Lydia has already answered many of the important questions; I'll try to cover what is left.

First of all, please keep in mind that this was a Bachelor's thesis, so the project was limited both in scope and in time. I am happy about any input, and there is a Phabricator board for the further development of the extension [1].

Luis: I haven't thought about A/B testing yet. As mentioned, it will start as a beta feature so we can collect feedback. I also kept in mind that, especially at the beginning, we are talking about very small Wikipedias. Besides the general feedback, collecting data on how many articles are created from the placeholders overall will be the first step in testing how well they are accepted. This is planned [2].

The red links problem is rather broad but extremely interesting to me. Sadly, it was out of scope; the "smart red links" chapter mostly exists to show that there are first approaches to this topic and that there will be further work on it. But it requires very well planned work, and more than just half a page of writing and discussion, I guess :)

The notability of items is another difficult topic. I chose the solution discussed for now because a) as Lydia said, I don't want to encourage article creation where it is not appropriate, and b) many items on Wikidata will not meet the criteria I chose for now. Displaying them anyway may lead to disappointment with the content of the ArticlePlaceholder, and editors who actually want to create an article on the topic would have to make more clicks than necessary, since in most of these cases the placeholder can't show them much more information than an empty page. Therefore we decided to filter those items out.

I can't say why there hasn't been more input on the RfC about the ordering of statement groups, but I hope you will agree to give my approach a try. If it's not what the communities and/or developers want and we can come up with a better one, I'll be open to changing and adjusting it. For now, it seemed like a solution that could be a first step towards ordered statements.

Thank you very much again!

Lucie

[1] https://phabricator.wikimedia.org/tag/articleplaceholder/
[2] https://phabricator.wikimedia.org/T123087

On Tue, Apr 5, 2016 at 3:32 PM, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:
Hoi,
Really, how? We have over 280 Wikipedias, we have Wikisources etc. How do you realistically think there would be something useful?
Thanks,
     GerardM

On 5 April 2016 at 13:48, John Erling Blad <jeblad@gmail.com> wrote:
First you say that the heuristic isn't perfect, then you say that "As long as we don't have notability criteria in a machine readable format we can only work with heuristics." and then "And I really don't believe machine readable notability criteria is something we should strive for." If the heuristic isn't perfect, then alternatives should be investigated. There are already machine-readable notability criteria in there; the only thing missing is exposing them, probably by using the existing relations.

On Tue, Apr 5, 2016 at 11:32 AM, Lydia Pintscher <Lydia.Pintscher@wikimedia.de> wrote:
On Sun, Apr 3, 2016 at 4:28 PM John Erling Blad <jeblad@gmail.com> wrote:
I just read through the doc and found some important points. I'll post each one in a separate mail.

> Since it is hard to decide which content is actually notable, the items
> appearing in the search should be limited to the ones having at least one
> statement and two sitelinks to the same project (like Wikipedia or Wikivoyage).
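The criterion quoted from the thesis can be read as a simple predicate. A minimal Python sketch, assuming items arrive as dicts shaped like Wikidata entity JSON (a "claims" map keyed by property and a "sitelinks" map keyed by site ID); the suffix table and the function names are hypothetical and not taken from the ArticlePlaceholder code:

```python
from collections import Counter
from typing import Optional

# Hypothetical suffix table; the thesis only names "Wikipedia or Wikivoyage".
# A plain "wiki" suffix is treated as Wikipedia, which would misclassify
# special sites such as "commonswiki"; a real check needs proper site metadata.
PROJECT_SUFFIXES = {
    "wikivoyage": "wikivoyage",
    "wikisource": "wikisource",
    "wikiquote": "wikiquote",
    "wiktionary": "wiktionary",
    "wiki": "wikipedia",
}


def project_of(site_id: str) -> Optional[str]:
    """Map a site ID such as 'enwiki' or 'dewikivoyage' to its project family."""
    for suffix, project in PROJECT_SUFFIXES.items():
        if site_id.endswith(suffix):
            return project
    return None


def passes_placeholder_filter(item: dict) -> bool:
    """True if the item has at least one statement and at least two
    sitelinks to the same project family."""
    statements = sum(len(group) for group in item.get("claims", {}).values())
    if statements < 1:
        return False
    # Count sitelinks per project family, ignoring unrecognised site IDs.
    per_project = Counter(
        project
        for project in map(project_of, item.get("sitelinks", {}))
        if project is not None
    )
    return any(count >= 2 for count in per_project.values())
```

An item linked from enwiki and dewiki passes; one linked only from enwiki and enwikivoyage does not, because its two sitelinks belong to different projects.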

This is a good baseline, but figuring out what is notable locally is a bit more involved. A language is used in a local area, and within that area some items are more important just because they reside within it. This is quite noticeable in the differences between nnwiki and nowiki, which both basically cover "Norway". Items that somehow relate to the local area or language are also more notable than those outside it. By traversing upwards in the claims using the "part of" property, it is possible to build a priority based on the area involved. It is also possible to traverse "nationality" and a few other properties.
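One way to read the "traverse upwards along part of" idea is as a breadth-first walk over a parent map, scoring an item by how few hops it takes to reach an area where the language is used. This is a hypothetical Python sketch; the parent map, the local-area set, the 1/(1 + depth) score and the depth cap are all assumptions, and the place names are toy data:

```python
from collections import deque
from typing import Dict, Iterable, Set


def local_priority(item: str,
                   parents: Dict[str, Iterable[str]],
                   local_areas: Set[str],
                   max_depth: int = 5) -> float:
    """Walk upwards along 'part of'-style edges (hypothetical parent map).

    Returns 1 / (1 + depth) for the nearest local area reached, or 0.0 if
    no local area is found within max_depth hops.
    """
    seen = {item}
    queue = deque([(item, 0)])
    while queue:
        node, depth = queue.popleft()
        if node in local_areas:
            return 1.0 / (1 + depth)
        if depth == max_depth:
            continue  # do not expand beyond the depth cap
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return 0.0


# Toy data only: each key is "part of" the listed values.
PARENTS = {"Bryggen": ["Bergen"], "Bergen": ["Hordaland"], "Hordaland": ["Norway"]}
```

With a breadth-first walk the first local area reached is also the closest one, so the score reflects the shortest path; other properties such as "nationality" could be folded into the same parent map.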

Things that are directly notable, like an area enclosed in an area where the language is used, are fairly easy to identify, but things that are notable through association with another notable thing are not. For example, a Danish slave ship operated by a Norwegian firm is thus notable in nowiki. I would say that all items linked from other notable items should be included. Some would perhaps say that "items with second order relevance should be included".
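The "second order relevance" suggestion could be sketched as a one-hop expansion of an already notable set. The mail does not say which direction the links should be followed, so this hypothetical Python sketch follows item-valued statements both ways, which covers the ship example (the ship's statement points at the notable firm):

```python
from typing import Iterable, Set, Tuple


def with_second_order(notable: Set[str],
                      links: Iterable[Tuple[str, str]]) -> Set[str]:
    """Expand a notable set by one hop.

    `links` is a hypothetical list of (subject, object) pairs taken from
    item-valued statements; an item joins the set if it links to, or is
    linked from, an item that is already notable.
    """
    expanded = set(notable)
    for subject, obj in links:
        if subject in notable:
            expanded.add(obj)
        if obj in notable:
            expanded.add(subject)
    return expanded
```

Only the original notable set is consulted, so the expansion stops after exactly one hop instead of spreading through the whole graph.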

Yes, the heuristic we're using isn't perfect. However, I believe it is good enough for 99% of the cases while being really simple. This is what we need at the beginning. As we go along, we can learn and see if other things make more sense.
We have taken the exact same approach to ranking item suggestions on Wikidata. At first, all we took into account was the number of sitelinks on an item. This definitely wasn't a perfect measure of how relevant an item is, but it was absolutely good enough while introducing very little complexity. As we learned more and as Wikidata grew, it was no longer good enough, so we switched the algorithm to also take into account the number of labels. This is still relatively low in complexity while producing good results.
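The two ranking stages described here can be contrasted in a short sketch. The mail does not say how sitelink and label counts are combined, so summing them is an assumption for illustration, not the actual Wikidata suggester; item dicts carry hypothetical "sitelinks" and "labels" maps:

```python
from typing import Dict, List


def rank_by_sitelinks(items: List[Dict]) -> List[Dict]:
    """First stage: order items by sitelink count only."""
    return sorted(items, key=lambda it: len(it["sitelinks"]), reverse=True)


def rank_by_sitelinks_and_labels(items: List[Dict]) -> List[Dict]:
    """Second stage: also count labels (hypothetical equal weighting)."""
    return sorted(
        items,
        key=lambda it: len(it["sitelinks"]) + len(it["labels"]),
        reverse=True,
    )


ITEMS = [
    {"id": "a", "sitelinks": {"enwiki": 1, "dewiki": 1, "frwiki": 1}, "labels": {"en": 1}},
    {"id": "b", "sitelinks": {"enwiki": 1, "nowiki": 1},
     "labels": {"en": 1, "de": 1, "fr": 1, "nb": 1, "nn": 1}},
]
```

Here item "a" wins on sitelinks alone, while "b" overtakes it once its many labels are counted, which is the kind of shift the second stage is meant to capture.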
For the particular case of notability: As long as we don't have notability criteria in a machine readable format we can only work with heuristics. And I really don't believe machine readable notability criteria is something we should strive for.

Cheers
Lydia
--
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata



--
Lucie-Aimée Kaffee
Working Student Software Development
Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
http://wikimedia.de

Imagine a world in which every single human being can freely share in the sum of all knowledge.
That's our commitment. Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 B.
Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin,
Steuernummer 27/029/42207.