I wrote my Bachelor's thesis on "Generating Article Placeholders from Wikidata for Wikipedia: Increasing Access to Free and Open Knowledge". The thesis summarizes a lot of the work done on the ArticlePlaceholder extension ( https://www.mediawiki.org/wiki/Extension:ArticlePlaceholder )
I uploaded the thesis to Commons under a CC-BY-SA license; you can find it at https://commons.wikimedia.org/wiki/File:Generating_Article_Placeholders_from...
I continue working on the extension and aim to deploy it to the first interested Wikipedias in the coming months.
I am happy to answer questions related to the extension!
Lucie (Frimelle)
On Sat, Apr 2, 2016, 4:34 AM Lucie Kaffee lucie.kaffee@wikimedia.de wrote:
> I wrote my Bachelor's thesis on "Generating Article Placeholders from Wikidata for Wikipedia: Increasing Access to Free and Open Knowledge". [...]
Great work on something that I *believe* has a lot of promise - thanks! I really think this approach can help take back some readership from Google, and potentially in the long run drive more new editors as well. (I know that was part of the theory of LSJbot, though I don't know if anyone has actually A/B tested that.)
I was somewhat surprised not to see data collection discussed in Section 8.10 - are there plans to do that? I would have expected to see A/B testing discussed as part of the deployment methodology, so that it could be compared both to the current baseline and to similar approaches (like the ones you survey in Section 3).
Thanks again for the hard work here.
Luis
I just read through the doc and found some important points. I will post each one in a separate mail.
> Since it is hard to decide which content is actually notable, the items appearing in the search should be limited to the ones having at least one statement and two sitelinks to the same project (like Wikipedia or Wikivoyage).
This is a good baseline, but figuring out what is notable locally is a bit more involved. A language is used in a local area, and within that area some items are more important just because they reside within the area. This is quite noticeable in the differences between nnwiki and nowiki, which both basically cover "Norway". Also, items that somehow relate to the local area or language are more notable than those outside those areas. By traversing upwards in the claims using the "part of" property it is possible to build a priority for the area involved. It is also possible to traverse "nationality" and a few other properties.
Things that are directly notable, like an area enclosed in an area using the language, are somewhat easy to identify, but things that are notable by association with another notable thing are not. Take a Danish slave ship operated by a Norwegian firm: the ship is thus notable in nowiki. I would say that all things linked as an item from other notable things should be included. Some would perhaps say that "items with second order relevance should be included".
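For illustration only, the traversal idea above could be sketched in Python roughly as follows. This is a toy model, not anything the extension does: the claim graph, the plain-text item and property names, the `max_depth` limit and both function names are assumptions made up for the example.

```python
from collections import deque


def reaches_local_area(item_id, claims, local_roots,
                       properties=("part of", "nationality"), max_depth=5):
    """Walk upwards along the given properties; True if a local root is reached."""
    seen = {item_id}
    queue = deque([(item_id, 0)])
    while queue:
        current, depth = queue.popleft()
        if current in local_roots:
            return True
        if depth == max_depth:
            continue  # do not expand further from this node
        for prop in properties:
            for target in claims.get(current, {}).get(prop, []):
                if target not in seen:
                    seen.add(target)
                    queue.append((target, depth + 1))
    return False


def locally_relevant(claims, local_roots):
    """Return (first_order, second_order) sets of item ids.

    first_order: items that reach a local root by upward traversal.
    second_order: items linked from a first-order item by any property.
    """
    first_order = {i for i in claims if reaches_local_area(i, claims, local_roots)}
    second_order = set()
    for item_id in first_order:
        for targets in claims[item_id].values():
            second_order.update(targets)
    return first_order, second_order - first_order


# Hypothetical toy graph: item -> {property label -> [target items]}.
claims = {
    "Norway": {},
    "Denmark": {},
    "Bergen": {"part of": ["Norway"]},
    "firm": {"nationality": ["Norway"], "owner of": ["ship"]},
    "ship": {"nationality": ["Denmark"]},
}

first_order, second_order = locally_relevant(claims, local_roots={"Norway"})
```

In this toy graph the ship does not reach "Norway" on its own, but it is picked up as second-order relevant because the Norwegian firm links to it.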
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
I think not doing anything at all when the item is presumed not notable is a bad thing, especially when we have data. We should be able to at least generate a description in those cases, maybe in a popup.
Why not simply not show the "create" button instead?
On Sun, Apr 3, 2016 at 4:41 PM Thomas Douillard thomas.douillard@gmail.com wrote:
> I think not doing anything at all when the item is presumed not notable is a bad thing, especially when we have data. We should be able to at least generate a description in those cases, maybe in a popup.
> Why not simply not show the "create" button instead?
One of the main goals of the ArticlePlaceholder is getting more editors for the projects in question. We need to be very careful not to encourage them to create articles which are then deleted 5 minutes later. If this happened to me as an editor I'd be extremely miffed, because you just asked me to do exactly that 5 minutes earlier.
Cheers Lydia
> One of the main goals of the ArticlePlaceholder is getting more editors for the projects in question. [...]
Sorry, I meant "not showing the create article button when the article is presumed not notable". That way it would be very clear: when the button is shown, there is an incentive to create the article; when the button is not shown, we are just showing data. There could be an "edit Wikidata" button instead.
> Red links are used frequently in Wikipedia to indicate an article which does not yet exist, but should. Today it leads the user to an empty create article page. In the future it should instead bring them to an ArticlePlaceholder, offering the option of creating an article. This is part of the topic of smart red links, which is discussed in the section 8.1: Smart red links
It would be interesting to hear if someone has an idea how this might work. There have been some attempts at this on nowiki; none of them seems to work in all cases.
Note that "Extension:ArticlePlaceholder/Smart red links" doesn't really solve the problem for existing redlinks; it solves the association problem when the user tries to resolve the redlink. That is one step further down the line, or more like solving the redlinks for a disambiguation page. ("I know there is a page like this, named like so, on that specific project.")
Note also that an item is not necessarily described on any project, and that creating an item on Wikidata can be outside the editor's scope or even very difficult. Often we have a name of some "thing", but we only have a sketchy idea about the thing itself. Check out https://www.wikidata.org/wiki/Q12011301 for an example.
It seems like a lot of what has been done so far on redlinks is an attempt to make pink-ish links with _some_information_, while the problem is that redlinks have _no_information_. The core reason why we have redlinks is that we lack the manpower to avoid them, and because of that we can't just add "some information". It is not a question of which we need first, hens or eggs, as we have neither.
Ordering of statement groups
The solution described (?) seems to me a "dev-ish" way to do this, and I think it is wrong. The grouping is something that should be done dynamically, as it depends on the item itself (i.e. the knowledge base), its class hierarchy (i.e. the interpretation of the knowledge base, often part of the knowledge base), our communicative goal (the overall context of the communication), the discourse (usually we drop this, as we don't maintain state), and the user model (which changes through a wp-article). This 4-tuple is pretty well known in Natural Language Generation, but its implications for the reuse of Wikidata statements in Wikipedia are mostly neglected. (That is not something Lucie should discuss in a bachelor thesis, but it is extremely important if the goal for Wikidata is actual reuse on Wikipedia.)
That said, I tried to figure out what the idea is, and also read the RfC (Statement group ordering [1]), but I actually don't know what is planned here. I think I know, but most probably I don't. Is the statement group ordering an on-wiki list of ordered groups? How do you create those groups? What are the implications of those groups? Does it have implications for other visualizations? What if groups should follow the type of the item? It seems like this describes a system where "one size fits all - or make it yourself".
And not to forget, where is the discussion? An RfC with no discussion?
[1] https://www.mediawiki.org/wiki/Requests_for_comment/Statement_group_ordering
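To make the questions above concrete, here is a minimal Python sketch of what ordering by a single on-wiki list could look like, with an optional per-type override for the "groups should follow the type of the item" case. It is only an assumption about the general shape of such a scheme, not the RfC's actual design; the function name, the parameters and the property IDs are picked purely for illustration.

```python
def order_statement_groups(property_ids, default_order,
                           order_by_type=None, item_type=None):
    """Sort property ids by their position in an ordered list.

    Properties missing from the list keep their relative order and go last.
    If a list exists for the item's type, it replaces the default list.
    """
    order = (order_by_type or {}).get(item_type, default_order)
    position = {pid: index for index, pid in enumerate(order)}
    unlisted = len(position)
    # sorted() is stable, so unlisted properties stay in their input order.
    return sorted(property_ids, key=lambda pid: position.get(pid, unlisted))


# "One size fits all": a single ordered list for every item.
default_order = ["P31", "P569", "P19"]
generic = order_statement_groups(["P19", "P999", "P31", "P569"], default_order)

# Hypothetical per-type override.
by_type = {"city": ["P17", "P31"]}
city = order_statement_groups(["P31", "P999", "P17"], default_order,
                              order_by_type=by_type, item_type="city")
```

A static list like this covers only the knowledge-base part of the 4-tuple; communicative goal, discourse and user model would need inputs that such a list does not carry.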
On Sun, Apr 3, 2016 at 4:28 PM John Erling Blad jeblad@gmail.com wrote:
> [...]
> This is a good baseline, but figuring out what is notable locally is a bit more involved. [...]
Yes, the heuristic we're using isn't perfect. However, I believe it is good enough for 99% of the cases while being really simple. This is what we need at the beginning. As we go along we can learn and see if other things make more sense.
We have taken the exact same approach to ranking for item suggestions on Wikidata. At first all we took into account was the number of sitelinks on the items. This definitely wasn't a perfect measure of how relevant an item is, but it was absolutely good enough while introducing very little complexity. As we learned more and as Wikidata grew, it was no longer good enough, so we switched the algorithm to also take into account the number of labels. This is still relatively low complexity while producing good results.
For the particular case of notability: as long as we don't have notability criteria in a machine-readable format we can only work with heuristics. And I really don't believe machine-readable notability criteria are something we should strive for.
Cheers Lydia
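As a rough illustration of the two ranking stages described above, a Python sketch might look like this. How the real suggester combines the two counts is not stated in the thread; the plain sum, the `label_weight` parameter, the function names and the simplified item dictionaries are all assumptions.

```python
def score_sitelinks_only(item):
    """First stage: relevance is just the number of sitelinks."""
    return len(item.get("sitelinks", {}))


def score_sitelinks_and_labels(item, label_weight=1.0):
    """Later stage: also count labels (the simple sum is an assumption)."""
    return (len(item.get("sitelinks", {}))
            + label_weight * len(item.get("labels", {})))


def rank(items, score):
    """Return items sorted from most to least relevant under the given score."""
    return sorted(items, key=score, reverse=True)


# Hypothetical candidates; labels are simplified to language -> text.
candidates = [
    {"id": "A",
     "sitelinks": {"enwiki": "a", "dewiki": "a", "nowiki": "a"},
     "labels": {"en": "a"}},
    {"id": "B",
     "sitelinks": {"enwiki": "b", "dewiki": "b"},
     "labels": {"en": "b", "de": "b", "nb": "b", "nn": "b", "da": "b"}},
]

first_stage = [c["id"] for c in rank(candidates, score_sitelinks_only)]
second_stage = [c["id"] for c in rank(candidates, score_sitelinks_and_labels)]
```

With these made-up numbers the item with fewer sitelinks but many labels moves ahead once labels are counted, which is the kind of shift the switch was meant to allow.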
First you say that the heuristic isn't perfect, then you say that "As long as we don't have notability criteria in a machine readable format we can only work with heuristics." and then "And I really don't believe machine readable notability criteria is something we should strive for." If the heuristic isn't perfect, then alternatives should be investigated. There are already machine-readable notability criteria in there; the only thing missing is exposing them, probably by using the existing relations.
Hoi, Really, how? We have over 280 Wikipedias, we have Wikisources etc. How do you realistically think there would be something useful? Thanks, GerardM
Hey everyone,
Sorry for being late to the discussion and thank you very much for your feedback. Many of the important questions were answered by Lydia already; I'll try to cover what is left.
First of all, please keep in mind that this was a Bachelor's thesis, so I was limited in both the scope of the project and time. I am happy to receive input, and there is a Phabricator board for the further development of the extension [1].
Luis: I haven't thought about A/B testing yet. As mentioned, it will be a beta feature in the beginning to collect feedback. I tried to keep in mind that, especially in the beginning, we are talking about very small Wikipedias. Besides the general feedback, collecting data on how many articles are created from the placeholders overall will be the first step in testing how well they are accepted. This is planned [2].
The problem of red links is rather broad but extremely interesting to me. Sadly, it was out of scope; the "smart red links" chapter mostly exists to indicate that there are first approaches to this topic and that there will be further work on it. But it does involve very well planned work and more than just half a page of writing and discussion, I guess :)
The notability of items is another difficult topic. I chose the solution discussed for now because a) as Lydia said, I don't want to encourage article creation when it is not appropriate, and b) there are many items on Wikidata that will not meet the criteria I chose for now. Displaying them anyway may lead to disappointment with the content of the ArticlePlaceholder, and editors who actually want to create an article on the topic would have to click more than otherwise necessary, since in most of these cases the placeholders can't show them much more information than an empty page. Therefore we decided to filter those items out.
I can't say why there is not more input on the RfC about ordering of statement groups, but I hope you will agree to give my approach a try. If it's not the one wanted by the communities and/or developers and we can come up with a better one, I'll be open to changing and adjusting it. But for now, it seemed like a solution that could be a first step towards having ordered statements.
Thank you very much again!
Lucie
[1] https://phabricator.wikimedia.org/tag/articleplaceholder/
[2] https://phabricator.wikimedia.org/T123087
On Tue, Apr 5, 2016 at 3:32 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, Really, how? We have over 280 Wikipedias, we have Wikisources etc. How do you realistically think there would be something useful? Thanks, GerardM
On 5 April 2016 at 13:48, John Erling Blad jeblad@gmail.com wrote:
First you say that the heuristic isn't perfect, then you say that "As long as we don't have notability criteria in a machine readable format we can only work with heuristics." and then "And I really don't believe machine readable notability criteria is something we should strive for." If the heuristic isn't perfect then alternatives should be investigated. There are already machine readable notability criteria in there; the only thing missing is exposing them, probably by using the existing relations.
On Tue, Apr 5, 2016 at 11:32 AM, Lydia Pintscher < Lydia.Pintscher@wikimedia.de> wrote:
On Sun, Apr 3, 2016 at 4:28 PM John Erling Blad jeblad@gmail.com wrote:
Just read through the doc, and found some important points. I post each one in a separate mail.
Since it is hard to decide which content is actually notable, the items appearing in the search should be limited to the ones having at least one statements and two sitelinks to the same project (like Wikipedia or Wikivoyage).
This is a good baseline, but figuring out what is notable locally is a bit more involved. A language is used in a local area, and within that area some items are more important just because they reside within the area. This is quite noticeable in the differences between nnwiki and nowiki, which both basically cover "Norway". Also, items that somehow relate to the local area or language are more noticeable than those outside those areas. By traversing upwards in the claims using the "part of" property it is possible to build a priority on the area involved. It is possible to traverse "nationality" and a few other properties.
Things that are directly noticeable, like an area enclosed in an area using the language, are somewhat easy to identify, but things that are noticeable by association with another noticeable thing are not. Take a Danish slave ship operated by a Norwegian firm: the ship is thus noticeable in nowiki. I would say that all things linked as an item from other noticeable things should be included. Some would perhaps say that "items with second order relevance should be included".
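To make the two ideas above concrete, here is a minimal sketch of the quoted baseline (at least one statement and two sitelinks to the same project) and of walking "part of" claims upwards to see whether an item belongs to a local area. Everything here is an assumption for illustration: the function names, the suffix-based guess of the project from a site id, the depth limit, and the in-memory `part_of` mapping with made-up ids. It is not how the ArticlePlaceholder extension is implemented.

```python
from collections import Counter, deque

# Assumed site-id suffixes, longest first so "enwikivoyage" is not read as "wiki".
# Simplification: special sites such as "commonswiki" also end in "wiki".
PROJECT_SUFFIXES = ("wikivoyage", "wikisource", "wikiquote",
                    "wikinews", "wikibooks", "wiki")


def project_of(site_id: str) -> str:
    """Guess the project family from a site id such as 'nowiki'."""
    for suffix in PROJECT_SUFFIXES:
        if site_id.endswith(suffix):
            return suffix
    return site_id


def passes_baseline(statement_count: int, site_ids) -> bool:
    """At least one statement and two sitelinks to the same project."""
    if statement_count < 1:
        return False
    per_project = Counter(project_of(s) for s in site_ids)
    return any(count >= 2 for count in per_project.values())


def reaches_area(item_id, local_area, part_of, max_depth=5) -> bool:
    """Breadth-first walk upwards along 'part of' claims.

    `part_of` maps an item id to the ids it is 'part of' (hypothetical data).
    """
    seen = {item_id}
    queue = deque([(item_id, 0)])
    while queue:
        current, depth = queue.popleft()
        if current == local_area:
            return True
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for parent in part_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return False


# Made-up example hierarchy, not real Wikidata ids.
part_of = {
    "village": ["municipality"],
    "municipality": ["county"],
    "county": ["norway"],
}
```

The same walk could follow other properties, such as the "nationality" relation mentioned above, by passing a different mapping. Second order relevance would need the reverse direction as well, that is, items linked from already relevant items.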
Yes the heuristic we're using isn't perfect. However I believe it is good enough for 99% of the cases while being really simple. This is what we need at the beginning. As we go along we can learn and see if other things make more sense. We have taken the exact same approach to ranking for item suggestions on Wikidata. At first all we took into account was the number of sitelinks on the items. This definitely wasn't a perfect measure for how relevant an item is but it was absolutely good enough while introducing very little complexity. As we've learned more and as Wikidata grows it was no longer good enough so we switched the algorithm to also take into account the number of labels. This is still relatively low complexity while producing good results. For the particular case of notability: As long as we don't have notability criteria in a machine readable format we can only work with heuristics. And I really don't believe machine readable notability criteria is something we should strive for.
Hoi, The one major issue I have is that the thinking has been Wikipedia centred. It is in my opinion a major flaw. To top this off, Wikidata is used in other projects as well. When Commons is wikidatified, it will need article placeholders to bring information in any and all languages for any picture. A similar thing can be said for Wikispecies.
Many items are not represented in any Wikipedia. An increasing number of items are created as placeholders. Many of them could be red links but in many Wikipedias they do not like red links so they are often not created as such. Obviously it is outside the scope of your research but the use of Wikidata extends to all links. It is important to do so because it is a great way to ensure that links actually go where they are supposed to go. It will also ensure that the "concept cloud" as implemented in the Reasonator becomes even more relevant.
What I miss is attention for labels. It is the big elephant in the room. Without labels, the whole placeholder idea falls flat. Search falls flat. In Reasonator it is possible to add missing labels. Once a label has been added, it has an effect on all items where that item is referenced. In a small experiment in Berlin, half an hour was spent adding labels for "Nelson Mandela", and it was gratifying to show how it translated to other people, like "president" to "Barack Obama".
Notability in the context of an article placeholder is problematic. If you propose that it has to be decided to have an item as an article placeholder, then I sincerely hope that this is done with a flag. This restricted approach backfires when you consider the small Wikipedias. Arguably it serves a purpose to have article placeholders for any item and particularly when there are labels present. Our mission is to share in the sum of all knowledge after all, and this approach to notability has an undesired impact.
Coming back to article placeholder being Wikipedia centric. Is there any reason why article placeholder should not be available on Wikidata itself? Reasonator is not linked to Wikidata at all. It works in any language, and it is my primary tool for maintaining and understanding information in Wikidata. It would be good if Reasonator were replaced by something native, and article placeholder is the obvious candidate.
I am happy that article placeholder is happening. It will have an interesting future. Thanks, GerardM
On 2 April 2016 at 13:32, Lucie Kaffee lucie.kaffee@wikimedia.de wrote:
I wrote my Bachelor's thesis on "Generating Article Placeholders from Wikidata for Wikipedia: Increasing Access to Free and Open Knowledge". The thesis summarizes a lot of the work done on the ArticlePlaceholder extension ( https://www.mediawiki.org/wiki/Extension:ArticlePlaceholder )
I uploaded the thesis to commons under a CC-BY-SA license- you can find it at https://commons.wikimedia.org/wiki/File:Generating_Article_Placeholders_from...
I continue working on the extension and aim to deploy it to the first Wikipedias, that are interested, in the next months.
I am happy to answer questions related to the extension!
Lucie (Frimelle)
Lucie-Aimée Kaffee Working Student Software Development
Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin Phone: +49 (0)30 219 158 26-0 http://wikimedia.de
Imagine a world in which every single human being can freely share in the sum of all knowledge. That‘s our commitment.
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
On Mon, Apr 4, 2016 at 6:41 AM Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, The one major issue I have is that the thinking has been Wikipedia centred. It is in my opinion a major flaw. To top this off, Wikidata is used in other projects as well. When Commons is wikidatified, it will need article placeholders to bring information in any and all languages for any picture. A similar thing can be said for Wikispecies.
It is Wikipedia centered to the extent that Lucie had to focus for her thesis. There is nothing preventing it from being used on other projects in the future.
Many items are not represented in any Wikipedia. An increasing number of items are created as placeholders. Many of them could be red links but in many Wikipedias they do not like red links so they are often not created as such. Obviously it is outside the scope of your research but the use of Wikidata extends to all links. It is important to do so because it is a great way to ensure that links actually go where they are supposed to go. It will also ensure that the "concept cloud" as implemented in the Reasonator becomes even more relevant.
What I miss is attention for labels. It is the big elephant in the room. Without labels, the whole placeholder idea falls flat. Search falls flat. In Reasonator it is possible to add missing labels. Once a label has been added, it has an effect on all items where that item is referenced. In a small experiment in Berlin, half an hour was spent adding labels for "Nelson Mandela", and it was gratifying to show how it translated to other people, like "president" to "Barack Obama".
We will work with the communities in question to add labels to the most important things before the deployment on a project. In the future we will also look at making this easy inside the ArticlePlaceholder.
Notability in the context of an article placeholder is problematic. If you propose that it has to be decided to have an item as an article placeholder, then I sincerely hope that this is done with a flag. This restricted approach backfires when you consider the small Wikipedias. Arguably it serves a purpose to have article placeholders for any item and particularly when there are labels present. Our mission is to share in the sum of all knowledge after all, and this approach to notability has an undesired impact.
Coming back to article placeholder being Wikipedia centric. Is there any reason why article placeholder should not be available on Wikidata itself? Reasonator is not linked to Wikidata at all. It works in any language, and it is my primary tool for maintaining and understanding information in Wikidata. It would be good if Reasonator were replaced by something native, and article placeholder is the obvious candidate.
We have concentrated, and will continue to concentrate, on improving the existing UI on Wikidata. This includes making it more visual, similar to Reasonator, and sharing some stuff with the ArticlePlaceholder.
Cheers Lydia