Is there a tool already (or "how hard would it be?") which would show the user what is said about article X in other articles? It seems to me that there are a lot of easy content additions that might be found that way and used to flesh out stubs and other shorter articles. What motivates this is that "what links here" often points to some surprising articles which can reveal new insights into a topic. I often write about places. Often I think "oh, this one's nothing special" and suddenly "what links here" reveals some interesting events that occurred there: discovery of a famous fossil, a big role in World War II, or the birthplace of someone quite famous. So I am wondering if there is a way to automate this process a bit by quickly drilling down to the relevant chunk of the article content rather than having to read/search the whole thing.
That is, if I was writing the article [[Bang Bang Jump Up]], I would want a list along the lines of:
* From article [[Winston Churchill]] within section "After the Second World War" : On 23 July 1944 at [[Bang Bang Jump Up]], he met [[Harry Truman]] to discuss the establishment of the [[United Nations]].
(False news alert: These world leaders did not meet at Bang Bang Jump Up, but let's pretend they did.)
That is, a list of the articles with the sentence/para containing the link or +/- N chars before or after the link, whatever's feasible to create an intelligible snippet without having to read the whole article.
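A minimal sketch of the snippet idea described above, assuming the linking article's raw wikitext is already in hand. The function names, the regular expressions, and the sample text are illustrative assumptions, not an existing tool; the sample reuses the made-up Churchill sentence. Templates and references are dropped wholesale, which is a simplification.

```python
import re

def strip_refs(text):
    """Remove <ref .../> and <ref>...</ref> so links in citations are ignored."""
    text = re.sub(r"<ref[^>]*/>", "", text)
    return re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)

def strip_templates(text):
    """Remove {{...}} templates, innermost first, so nested ones vanish too."""
    pattern = re.compile(r"\{\{[^{}]*\}\}")
    previous = None
    while previous != text:
        previous = text
        text = pattern.sub("", text)
    return text

def find_snippets(wikitext, target):
    """Return (section, sentence) pairs for prose sentences linking to target."""
    prose = strip_templates(strip_refs(wikitext))
    link = re.compile(r"\[\[" + re.escape(target) + r"(\|[^\]]*)?\]\]")
    section = "(lead)"
    results = []
    for line in prose.splitlines():
        heading = re.match(r"^=+\s*(.*?)\s*=+\s*$", line)
        if heading:
            section = heading.group(1)
            continue
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", line):
            if link.search(sentence):
                results.append((section, sentence.strip()))
    return results

# Hypothetical wikitext for the pretend example in this message.
sample = """{{Infobox person|birth_place=[[Bang Bang Jump Up]]}}
Churchill was a statesman.
== After the Second World War ==
He travelled widely. On 23 July 1944 at [[Bang Bang Jump Up]], he met [[Harry Truman]].<ref>{{cite book|title=X [[Bang Bang Jump Up]]}}</ref>
{{Navbox|list1=[[Bang Bang Jump Up]]}}
"""

snippets = find_snippets(sample, "Bang Bang Jump Up")
for section, sentence in snippets:
    print(f'within section "{section}": {sentence}')
```

Only the prose sentence is reported here; the links inside the infobox, the citation, and the navbox are discarded along with their templates.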
I am assuming here that article X is linked from Y (I'm not considering text mentions). Of course, the success of the tool is its ability to pick what might be most relevant. Nobody wants to wade through a list of irrelevant mentions. So I would want to stick to links occurring in the prose of the article body rather than navbox transclusions, links in citations, templates and so forth. I also think that ordering the list by some "likely to be most useful" metric would be beneficial (or ideally the ability of the user to fiddle with those choices at run-time). Now until one has such a tool to get experience with, it's hard to know what might constitute more "relevant". But some metrics might be:
* The relative importance of the topics. I suspect if a more important topic is mentioning a less important topic, it might be more relevant. Winston Churchill is more important than Bang Bang Jump Up.
* The relative quality of the articles. I suspect if a high quality article is mentioning a low quality article, it might be more relevant. Winston Churchill is a higher quality article than Bang Bang Jump Up.
* Being tagged by the same WikiProject (or not within the same WikiProject). Not sure which would likely be more relevant but it might be interesting to explore. It's unlikely Winston Churchill and Bang Bang Jump Up are in the same WikiProject.
* The other article is not already linked in this article. That is, if Bang Bang Jump Up already links to Winston Churchill, then probably this is less likely to be "new information" for the Bang Bang Jump Up article.
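The metrics above could be combined into a single adjustable score, sketched here under stated assumptions. The `Article` fields, the weights, the numeric scales for quality and importance classes, and the sample ratings are all hypothetical; which weights actually surface useful mentions would need experimentation.

```python
from dataclasses import dataclass, field

# Assumed orderings of assessment classes, lowest to highest.
QUALITY = {"Stub": 0, "Start": 1, "C": 2, "B": 3, "GA": 4, "A": 5, "FA": 6}
IMPORTANCE = {"Low": 0, "Mid": 1, "High": 2, "Top": 3}

@dataclass
class Article:
    title: str
    quality: str
    importance: str
    wikiprojects: set = field(default_factory=set)
    links_to: set = field(default_factory=set)

# Hypothetical weights that a user could change at run-time.
DEFAULT_WEIGHTS = {
    "importance": 1.0,
    "quality": 1.0,
    "shared_project": 0.0,   # unclear which direction helps, so neutral
    "already_linked": -2.0,  # probably not new information
}

def relevance(source, target, weights=DEFAULT_WEIGHTS):
    """Score how useful a mention of target in source is likely to be."""
    score = weights["importance"] * (
        IMPORTANCE[source.importance] - IMPORTANCE[target.importance])
    score += weights["quality"] * (
        QUALITY[source.quality] - QUALITY[target.quality])
    if source.wikiprojects & target.wikiprojects:
        score += weights["shared_project"]
    if source.title in target.links_to:
        score += weights["already_linked"]
    return score

# Made-up ratings purely for illustration.
target = Article("Bang Bang Jump Up", "Stub", "Low",
                 {"Australia"}, {"Example Town"})
churchill = Article("Winston Churchill", "GA", "Top", {"Biography"})
town = Article("Example Town", "C", "Mid", {"Australia"})

ranked = sorted([town, churchill],
                key=lambda a: relevance(a, target), reverse=True)
print([a.title for a in ranked])
```

Passing a different `weights` dictionary is the run-time "fiddling" mentioned earlier, for example setting `shared_project` positive or negative to explore whether sharing a WikiProject helps.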
Anyhow, do we have a tool that does something along these lines? If not, is there a student project here? :)
Kerry
Hi,
I would *love* this tool as well. Being a frequent editor of "List of people from" articles, it would save me oodles of time to be able to pass by false positives in What links here. (A third of the links to a community are gold, the rest are template transclusions and people competing in a tourney at X community.)
Nick
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Indeed, the “Notable residents” section is one that would definitely benefit from this tool. Is it just me, or is there something actually broken with “What links here”? I try to suppress the transclusions (usually coming from navboxes), but they are still displayed no matter which way I set “Show/Hide Transclusions”, even though a search of the article reveals there is no other link present.
Kerry
On Tue, Jun 13, 2017 at 6:08 PM, Kerry Raymond kerry.raymond@gmail.com wrote:
Indeed, the “Notable residents” section is one that would definitely benefit from this tool. Is it just me, or is there something actually broken with “What links here”? I try to suppress the transclusions (usually coming from navboxes), but they are still displayed no matter which way I set “Show/Hide Transclusions”, even though a search of the article reveals there is no other link present.
That existing feature works by hiding/showing where *the page itself* is transcluded *into*. E.g. https://en.wikipedia.org/wiki/Special:WhatLinksHere/Template:WikiFauna vs https://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/Template:Wi...
Making it work differently for incoming links that come from a template is a long-standing (and complicated to implement) feature request: https://phabricator.wikimedia.org/T14396
However, I see this comment by Izno suggests a partial (manual) workaround, using an "insource:/[[FOO/" search: https://phabricator.wikimedia.org/T14396#3246134 e.g. https://en.wikipedia.org/w/index.php?title=Special:Search&profile=all&am... versus https://en.wikipedia.org/wiki/Special:WhatLinksHere/Wikipedia:WikiGremlin (I'm not sure why Izno's example also includes the "linksto:FOO" string, but it appears to be redundant.)
-- Quiddity
Ahh ... it's a whole new meaning of transclusion ...
I am tempted to say "well, just work with the original wikitext and don’t resolve the templates" but I guess the problem here is that not all templates are equal. Links in an infobox are much more likely to be relevant to *this* article than links in a navbox are, and who knows about arbitrary templates more generally. I saw a stat in passing the other day that said around 50% of Wikipedia articles have navboxes, and I confess to having added a few navboxes even in the past few days. As a reader I like them, but they are a pain for anyone using "What links here".
Indeed, just using
insource:"Chapel Hill, Queensland"
*without* the square brackets does a jolly fine job of identifying the articles that mention the article [[Chapel Hill, Queensland]] or just the topic, and provides a snippet (not a great one, but it does give some context to the link).
It works because it sees the links that are used as parameters in the infobox (whether or not they are wrapped in square brackets) but can't see the ones embedded in the definition of the navboxes. Plus you get mentions as well as links. Sweet! If one could have a filter that eliminated the "mutually linking" articles (X links to Y and Y links to X) it would be close to nailing it! Of course it works better for longer article titles unlikely to occur in other circumstances. I wouldn't bother to try it for [[Food]], but then I am looking for a tool to populate stubs, which probably eliminates "common name" articles.
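The search and the "mutually linking" filter could be sketched roughly as follows. The function names and the example page titles are hypothetical; the hit list and the target's outgoing links are assumed to have been gathered separately (for instance from the search results and the article's own wikitext), since this sketch makes no network calls.

```python
from urllib.parse import urlencode

def insource_search_url(title, base="https://en.wikipedia.org/w/index.php"):
    """Build a Special:Search URL for a quoted insource: phrase (no square brackets)."""
    query = f'insource:"{title}"'
    return base + "?" + urlencode({"title": "Special:Search", "search": query})

def drop_mutual_links(target, mentioning, outgoing_links):
    """Keep only pages that the target article does not already link to."""
    already_linked = outgoing_links.get(target, set())
    return [page for page in mentioning if page not in already_linked]

url = insource_search_url("Chapel Hill, Queensland")
print(url)

# Hypothetical data: pages found by the search, and the target's own links.
hits = ["Example Creek", "Example Road"]
outgoing = {"Chapel Hill, Queensland": {"Example Creek"}}
fresh = drop_mutual_links("Chapel Hill, Queensland", hits, outgoing)
print(fresh)
```

Anything left in `fresh` mentions the topic without being linked back from it, which is the most likely place to find new material for a stub.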
Kerry
Kerry Raymond, 14/06/2017 00:45:
What motivates this is that "what links here" often points to some surprising articles which can reveal new insights into a topic.
Indeed. I always teach the "what links here" feature at all my wiki courses.
Kerry Raymond, 14/06/2017 03:08:
I try to suppress the transclusions (usually coming from navboxes), but they are still displayed no matter which way I set “Show/Hide Transclusions”, even though a search of the article reveals there is no other link present.
Hiding transclusions means hiding the "links" of the form {{:Bang Bang Jump Up}}, not the links within templates. There is currently no distinction in the database for "templated" links (they all go into the pagelinks table: https://www.mediawiki.org/wiki/Manual:Pagelinks_table ).
I think ElasticSearch/CirrusSearch currently can tell the difference, for ranking purposes, and could maybe expose it somewhere. Which makes sense, because overlinking is a problem specific to some wikis (other wikis have even forbidden the navigational templates that are so prevalent on the English Wikipedia), and while several HTML classes have been defined across the years (some now listed at https://www.mediawiki.org/wiki/Manual:Interface/IDs_and_classes#Content ), only "noprint" is standard/stable.
Nemo