Maybe it could be done with just the Referer field on the second request, without needing to log two different page requests and correlate them.
Date: Tue, 16 Jul 2013 14:14:42 -0400
From: David Cuenca dacuetu@gmail.com
Good idea. It could also help to know which links on a disambiguation page are used most, so that they can be sorted by importance.
Micru
On Tue, Jul 16, 2013 at 2:03 PM, Nicolas Vervelle nvervelle@gmail.com wrote:
Interesting idea...
On Mon, Jul 15, 2013 at 11:41 PM, Jon Robson jdlrobson@gmail.com wrote:
I understand there is an issue that needs solving where various pages link to disambiguation pages. These need fixing to point at the appropriate thing.
I had a thought on how this might be done using a variant of EventLogging...
When a user clicks a link that leads to a disambiguation page and then clicks a link on that page, we log an event that contains:
- page user was on before
- page user is on now
If we were to collect this data it would allow us to statistically suggest what the correct disambiguation page might be.
To take a more concrete theoretical example:
- If I am on the Wiki page for William Blake and click on London I am taken to https://en.wikipedia.org/wiki/London_(disambiguation)
- I look through and see London (poem) and click on it
- An event is fired that links London (poem) to William Blake.
Obviously this won't always be accurate, but I'd expect it to work in general (we'd also need to filter out bots).
Then, when editing William Blake, suppose that disambiguation links are surfaced. If I go to fix one, it might prompt me that 80% of visitors go from William Blake to London (poem).
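A minimal sketch of how a figure like the 80% above could be derived, assuming events are stored as (source page, disambiguation page, clicked page) tuples. The sample events, the counts and the function name click_shares are made up for illustration, Python is only a stand-in language, and bot filtering is not handled here.

```python
from collections import Counter, defaultdict

# Hypothetical event records: (source page, disambiguation page, page clicked next).
events = [
    ("William Blake", "London (disambiguation)", "London (poem)"),
    ("William Blake", "London (disambiguation)", "London (poem)"),
    ("William Blake", "London (disambiguation)", "London (poem)"),
    ("William Blake", "London (disambiguation)", "London (poem)"),
    ("William Blake", "London (disambiguation)", "London"),
    ("Jack London", "London (disambiguation)", "London"),
]


def click_shares(events):
    """Return {(source, disambiguation): [(target, share), ...]}, most clicked first."""
    counts = defaultdict(Counter)
    for source, disambig, target in events:
        counts[(source, disambig)][target] += 1
    shares = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        shares[key] = [(t, n / total) for t, n in counter.most_common()]
    return shares


shares = click_shares(events)
top_target, top_share = shares[("William Blake", "London (disambiguation)")][0]
print(f"{top_share:.0%} of visitors go from William Blake to {top_target}")
```

Keying the counts on the source page as well as the disambiguation page is what keeps the suggestion specific to William Blake rather than to London (disambiguation) as a whole.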
Have we done anything like this in the past? (Collecting data from readers and informing editors)
I can imagine applying this sort of pattern could have various other uses...
-- Jon Robson http://jonrobson.me.uk @rakugojon
Without having the origin page, making the connection wouldn't be possible. (You would just end up suggesting the most common result instead of the most accurate one.)
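To illustrate the point with a small sketch: the Referer header sent with the request for London (poem) names only the page the click came from, which is the disambiguation page, not William Blake. The helper name title_from_referer and the assumption that page URLs have the form /wiki/<Title> are mine, for illustration only.

```python
from urllib.parse import unquote, urlsplit


def title_from_referer(referer):
    """Return the page title from a /wiki/<Title> URL, or None if it has another shape."""
    path = urlsplit(referer).path
    prefix = "/wiki/"
    if not path.startswith(prefix):
        return None
    return unquote(path[len(prefix):]).replace("_", " ")


# Referer as it would arrive with the second request (the click on the disambiguation page).
second_request_referer = "https://en.wikipedia.org/wiki/London_(disambiguation)"
previous_page = title_from_referer(second_request_referer)
print(previous_page)
```

The origin page would therefore have to be carried along some other way if it is to end up in the same log record.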
On Tue, Jul 16, 2013 at 10:37 PM, Lee Worden worden.lee@gmail.com wrote:
Maybe it could be done with just the Referer field on the second request, without needing to log two different page requests and correlate them.
_______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
There's one issue with this. This assumes that links to disambiguated pages are the only types of links on a disambiguation page. What if somebody clicks a category link at the bottom of the page? Or what if there's just another different link?
You'd need a way to distinguish exactly what articles are being disambiguated on the page.
-- Tyler Romeo Stevens Institute of Technology, Class of 2016 Major in Computer Science www.whizkidztech.com | tylerromeo@gmail.com
On Tue, Jul 16, 2013 at 7:45 PM, Tyler Romeo tylerromeo@gmail.com wrote:
There's one issue with this. This assumes that links to disambiguated pages are the only types of links on a disambiguation page. What if somebody clicks a category link at the bottom of the page? Or what if there's just another different link?
I don't suspect this is much of an issue if constrained to the content element, but if it were, I imagine these links would be relatively easy to distinguish by ignoring any link with a ':' in it using a regex or, in the worst case, a soundex algorithm. I'd still suspect the disambiguation links would be the most-clicked links...
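A rough sketch of the ':' filter, assuming the link titles from the content element are already available as plain strings. The sample titles and the function name candidate_targets are illustrative only.

```python
import re

# Assumption: a ':' in a link title marks a namespace prefix such as "Category:" or "File:".
NAMESPACE_LIKE = re.compile(r":")


def candidate_targets(link_titles):
    """Keep only the titles that do not look like namespaced links."""
    return [title for title in link_titles if not NAMESPACE_LIKE.search(title)]


# Hypothetical links found on a disambiguation page.
links = [
    "London (poem)",
    "London, Ontario",
    "Category:Disambiguation pages",
    "File:Disambig gray.svg",
]
kept = candidate_targets(links)
print(kept)
```

One caveat with this heuristic: an article whose own title contains a colon would be dropped as well, so the filter is only a first approximation.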
On Wed, Jul 17, 2013 at 2:35 AM, Jon Robson jdlrobson@gmail.com wrote:
I don't suspect this is much of an issue if constrained to the content element, but if it were, I imagine these links would be relatively easy to distinguish by ignoring any link with a ':' in it using a regex or, in the worst case, a soundex algorithm. I'd still suspect the disambiguation links would be the most-clicked links...
Even if you restrict it like that, it's still an issue. You have pages like http://en.wikipedia.org/wiki/007_(disambiguation) or http://en.wikipedia.org/wiki/11_Squadron, which have a See also section that is usually unrelated to the disambiguated topic, but still may be clicked often.
Better yet, all disambiguation pages have the disambiguation template on them, and in that template are links and image links you can click on.
-- Tyler Romeo Stevens Institute of Technology, Class of 2016 Major in Computer Science www.whizkidztech.com | tylerromeo@gmail.com
Send out a "mw-previous-referrer" value on the disambiguation page and have the browser echo it back. This could be done through a cookie. On the next page it must be removed, either on the server or in the browser. The server can simply strip any incoming cookie, but I'm not sure whether this will work with the squids or whether it is simple to implement. The echoed-back mw-previous-referrer can then be logged for the landing page. Analysis of the log will then identify missing or failed linkage.
The same could be done for search pages, as much the same problem exists there.
Instead of using cookies, JavaScript can do this by remembering specific pages in session storage. That could imply a logging facility with some kind of API access.
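A minimal sketch of the cookie variant, using Python's standard http.cookies module as a stand-in for whatever the server would actually use. Only the cookie name comes from the proposal; the function names, the percent-encoding of the title and the Path attribute are assumptions made for illustration, and caching behaviour is not addressed.

```python
from http.cookies import SimpleCookie
from urllib.parse import quote, unquote

COOKIE_NAME = "mw-previous-referrer"  # name taken from the proposal


def set_cookie_header(origin_title):
    """Set-Cookie value sent with the disambiguation page: remember the origin page."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = quote(origin_title)
    cookie[COOKIE_NAME]["path"] = "/"
    return cookie[COOKIE_NAME].OutputString()


def clear_cookie_header():
    """Set-Cookie value sent with the landing page: expire the cookie again."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = ""
    cookie[COOKIE_NAME]["path"] = "/"
    cookie[COOKIE_NAME]["max-age"] = 0
    return cookie[COOKIE_NAME].OutputString()


def read_origin(cookie_header):
    """Return the echoed-back origin page from an incoming Cookie header, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(COOKIE_NAME)
    if morsel is None or not morsel.value:
        return None
    return unquote(morsel.value)


# On the landing page the server would log (origin, landing page) and then clear the cookie.
origin = read_origin("mw-previous-referrer=William%20Blake")
print(origin, "->", "London (poem)")
```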
On Wed, Jul 17, 2013 at 4:26 AM, John Erling Blad jeblad@gmail.com wrote:
Send out a "mw-previous-referrer" value on the disambiguation page and have the browser echo it back. This could be done through a cookie. On the next page it must be removed, either on the server or in the browser. The server can simply strip any incoming cookie, but I'm not sure whether this will work with the squids or whether it is simple to implement. The echoed-back mw-previous-referrer can then be logged for the landing page. Analysis of the log will then identify missing or failed linkage.
The same could be done for search pages, as much the same problem exists there.
Instead of using cookies, JavaScript can do this by remembering specific pages in session storage. That could imply a logging facility with some kind of API access.
This is an even worse solution. Not only does it have the same problem I mentioned, but also what if the person just browses to another page by URL? Then the server thinks the user got there from the disambiguation page.
-- Tyler Romeo Stevens Institute of Technology, Class of 2016 Major in Computer Science www.whizkidztech.com | tylerromeo@gmail.com
It doesn't matter because the correct behavior will accumulate over time. You don't try to "fix" linkage just because you have one single observed behavior, you collect and correlate behavior over time and use several, perhaps hundreds of observations.
Even more interesting than disambiguation pages are search pages. A user tries to find something, lands on some slightly related page, but must use another search to find the correct one. Observing only a few users will give a very confusing usage pattern, but observing thousands of users over a year or more will create distinct patterns.
There is a lot of work on why and how, if anyone bothers to dig it up. Long story short, it is only a matter of the number of observations.
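As a small sketch of "only a matter of the number of observations": a suggestion could be withheld until enough clicks have accumulated and one target clearly dominates. The thresholds (100 observations, 50% share), the function name suggest_target and the sample counts are arbitrary assumptions for illustration.

```python
from collections import Counter


def suggest_target(clicks, min_observations=100, min_share=0.5):
    """Return the dominant target, or None while the evidence is too thin."""
    total = sum(clicks.values())
    if total == 0 or total < min_observations:
        return None
    target, count = clicks.most_common(1)[0]
    return target if count / total >= min_share else None


# A handful of clicks is not enough to suggest anything.
early = Counter({"London (poem)": 3, "London": 1})
# Hypothetical counts after a longer collection period, including some stray clicks.
later = Counter({"London (poem)": 160, "London": 30, "Category:Disambiguation pages": 10})

print(suggest_target(early))
print(suggest_target(later))
```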
On Wed, Jul 17, 2013 at 10:30 AM, Tyler Romeo tylerromeo@gmail.com wrote:
This is an even worse solution. Not only does it have the same problem I mentioned, but also what if the person just browses to another page by URL? Then the server thinks the user got there from the disambiguation page.
On Wed, Jul 17, 2013 at 4:42 AM, John Erling Blad jeblad@gmail.com wrote:
It doesn't matter because the correct behavior will accumulate over time. You don't try to "fix" linkage just because you have one single observed behavior, you collect and correlate behavior over time and use several, perhaps hundreds of observations.
I strongly doubt that the correct behavior will be prevalent enough to warrant using such an automatic system over just manually fixing disambiguation links, which can be done quite easily using automatic wiki browsers and the like.
-- Tyler Romeo Stevens Institute of Technology, Class of 2016 Major in Computer Science www.whizkidztech.com | tylerromeo@gmail.com
Sounds like a disagreement that can be settled quantitatively. ;)
--scott
On Jul 17, 2013 5:03 AM, "Tyler Romeo" tylerromeo@gmail.com wrote:
I strongly doubt that the correct behavior will be prevalent enough to warrant using such an automatic system over just manually fixing disambiguation links, which can be done quite easily using automatic wiki browsers and the like.
Agreed. As a first step, if someone is interested in this and this doesn't go against our privacy policy it would be good to collect some link clicking data for various disambiguation pages to get an idea of whether the data created is meaningful and useful. Tyler's concerns are valid but we should clarify with some data rather than speculate to whether these are indeed concerns we need to worry about and whether this. EventLogging [1] could be used for this in my opinion using some simple javascript that hijacks links on the disambiguation page - looking at referrer and next page.
In terms of analyzing the data you could then simply look at a sample of disambiguation pages and manually determine the accuracy of users picking the correct link.
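A sketch of that check, assuming the most-clicked target per (source page, disambiguation page) pair has already been computed and that someone has manually labelled the correct target for a sample of pairs. All names and sample values here are hypothetical.

```python
def majority_accuracy(observed_top, manual_labels):
    """Share of manually checked pairs whose most-clicked target matches the manual label."""
    checked = [pair for pair in manual_labels if pair in observed_top]
    if not checked:
        return None
    hits = sum(observed_top[pair] == manual_labels[pair] for pair in checked)
    return hits / len(checked)


# Most-clicked target per (source page, disambiguation page); made-up sample.
observed_top = {
    ("William Blake", "London (disambiguation)"): "London (poem)",
    ("Page A", "Foo (disambiguation)"): "Foo (band)",
}
# What a human reviewer decided the link should point to; made-up sample.
manual_labels = {
    ("William Blake", "London (disambiguation)"): "London (poem)",
    ("Page A", "Foo (disambiguation)"): "Foo (film)",
}

accuracy = majority_accuracy(observed_top, manual_labels)
print(accuracy)
```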
If the data does show promise it would then be an easy enough job to create a UI to use it and for editors to correct them.
I don't currently have time to explore this but would like to in future but if anyone is interested please dive in...
[1] https://mediawiki.org/wiki/Extension:EventLogging
On Wed, Jul 17, 2013 at 5:14 AM, C. Scott Ananian cananian@wikimedia.org wrote:
Sounds like a disagreement that can be settled quantitatively. ;) --scott
I'm not sure the analysis has to expose any private data at all. You show only the result of the analysis, which would be integrated over weeks or months, perhaps after filtering out random noise. Would that be a privacy problem?
One of the tricky things is that the disambiguation or search page is a signal that the referrer, or some other previous page in the user's history, is difficult to connect to some later page. As the number of steps between the pages increases, the problem of detecting the relation grows exponentially. It is also worth noting that by only using click events on the disambiguation page you will only discover connections that are already present as links on the disambiguation page.
On Wed, Jul 17, 2013 at 6:49 PM, Jon Robson jdlrobson@gmail.com wrote:
Agreed. As a first step, if someone is interested in this and this doesn't go against our privacy policy it would be good to collect some link clicking data for various disambiguation pages to get an idea of whether the data created is meaningful and useful. Tyler's concerns are valid but we should clarify with some data rather than speculate to whether these are indeed concerns we need to worry about and whether this. EventLogging [1] could be used for this in my opinion using some simple javascript that hijacks links on the disambiguation page - looking at referrer and next page.
In terms of analyzing the data you could then simply look at a sample of disambiguation pages and manually determine the accuracy of users picking the correct link.
If the data does show promise it would then be an easy enough job to create a UI to use it and for editors to correct them.
I don't currently have time to explore this but would like to in future but if anyone is interested please dive in...
[1] https://mediawiki.org/wiki/Extension:EventLogging