There's a question being discussed on wikien-l at present about whether [[en:Ireland]] should be the country, the island or a disambig.
The arguments turn on both correctness and reader expectation. The first is for Manual of Style wonks (as well as anyone with an opinion), but the second is, in theory, numerically ascertainable.
The question is:
* how to get numbers on how readers travel from one page to another within Wikipedia
* without risking a privacy violation.
Examples of the latter would be if we included jumps to or from a user page or even a WP: pseudo-pagespace page. So let's ignore those.
The first idea that springs to my mind is logging referers for article pages, if the referer is an article page on the same Wikipedia or [[Special:Search]].
1. Is this technically feasible given our logging structure?
2. Is there a privacy gotcha I'm ignoring?
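To make the proposed filter concrete, here is a minimal sketch in Python of the rule "log the referer only if both pages are articles on the same Wikipedia, or the referer is [[Special:Search]]". Everything in it is an assumption for illustration: the function names, the hard-coded namespace prefix list, and the restriction to /wiki/ style URLs are all hypothetical, and none of it reflects how the squids or the log aggregator actually work.

```python
from urllib.parse import urlsplit, unquote

# Hypothetical prefix list; a real deployment would take the namespaces
# from the wiki's own configuration rather than hard-coding them.
NON_ARTICLE_PREFIXES = (
    "User:", "User_talk:", "Wikipedia:", "WP:", "Talk:", "Special:",
)


def page_title(url, host):
    """Return the page title if url is a /wiki/ page on host, else None."""
    parts = urlsplit(url)
    if parts.hostname != host or not parts.path.startswith("/wiki/"):
        return None
    return unquote(parts.path[len("/wiki/"):])


def is_article(title):
    """True if the title is non-empty and has no excluded namespace prefix."""
    return bool(title) and not title.startswith(NON_ARTICLE_PREFIXES)


def should_log(url, referer, host="en.wikipedia.org"):
    """Decide whether this (page, referer) pair may be logged."""
    target = page_title(url, host)
    source = page_title(referer, host)
    if target is None or source is None:
        return False  # off-site, missing, or non-/wiki/ URL
    if not is_article(target):
        return False  # never log jumps to user or project pages
    return is_article(source) or source == "Special:Search"
```

Under these assumptions a jump from [[Ireland]] to [[Republic of Ireland]] is logged, while a jump from a user page or from an external site is dropped.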
The huge benefit from this would be seeing how readers actually use Wikipedia, which would give us solid reasons to put given pages, links, redirects, etc. somewhere.
This should of course include jumps from [[Special:Search]], so we know what the heck people are actually searching for!
I realise our current logging of pretty much every page view without crippling the servers is a miracle of computer system administration. How feasible is my idea?
- d.
David Gerard wrote:
There's a question being discussed on wikien-l at present about whether [[en:Ireland]] should be the country, the island or a disambig.
Don't let the readers' random actions dictate the community's decisions ;)
I realise our current logging of pretty much every page view without crippling the servers is a miracle of computer system administration. How feasible is my idea?
- d.
Doesn't seem too different from logging the URLs or user-agents (as is being discussed), although an appropriate C parser would be needed.
One interesting difference is that the filtering of which pages get a referer logged should be done at the squid, rather than at the aggregator as is done now.
For the record, if it's done, I'd be interested in getting the referers for eswiki's old main page (maybe it'd need sampling).
On Wed, Nov 26, 2008 at 10:32 AM, David Gerard dgerard@gmail.com wrote: [snip]
The first idea that springs to my mind is logging referers for article pages, if the referer is an article page on the same Wikipedia or [[Special:Search]].
- Is this technically feasible given our logging structure?
Sure.
- Is there a privacy gotcha I'm ignoring?
[snip]
The only leak I can think of is:
Step 1. I accidentally paste confidential information into the go box (you've never done this?)
Step 2. I end up at non-existent article [[random confidential information]]
Step 3. I browse back to the article about me.
Probably best resolved by not reporting infrequent referrers. (Filtering by existing articles is computationally expensive.)
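As a rough illustration of "not reporting infrequent referrers", here is a minimal Python sketch that counts (referrer, target) pairs and reports only those seen at least a threshold number of times. The function name and the threshold of 5 are hypothetical; choosing a safe threshold is a policy question this sketch does not answer.

```python
from collections import Counter


def frequent_referrers(pairs, min_count=5):
    """Count (referrer, target) pairs and keep those seen >= min_count times.

    min_count is an assumed threshold, not a recommended value.
    """
    counts = Counter(pairs)
    return {pair: n for pair, n in counts.items() if n >= min_count}
```

A one-off referrer such as an accidentally pasted string would appear once and be suppressed, while a common path between two articles would pass the threshold.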
Gregory Maxwell wrote:
On Wed, Nov 26, 2008 at 10:32 AM, David Gerard wrote: [snip]
The first idea that springs to my mind is logging referers for article pages, if the referer is an article page on the same Wikipedia or [[Special:Search]].
- Is this technically feasible given our logging structure?
Sure.
- Is there a privacy gotcha I'm ignoring?
[snip]
The only leak I can think of is:
Step 1. I accidentally paste confidential information into the go box (you've never done this?)
Step 2. I end up at non-existent article [[random confidential information]]
Step 3. I browse back to the article about me.
Probably best resolved by not reporting infrequent referrers. (Filtering by existing articles is computationally expensive.)
You'd need to actually follow a link to the article about you, not just use the browser's back button. Maybe if you chose the first link on the sidebar, it could get logged among Main_Page's referers.
It's more likely to happen if you actually follow a result link from the search results for the random confidential information. So, if I paste 'Pay $100000 to Gregory Maxwell for help hiding Osama bin Laden' and think, "They have an article about Greg! What will they tell?" and follow a link to [[Gregory Maxwell]], sure, that would get logged and we would find out.
Not as big a privacy risk as publishing the searches, but still a concern.
PS: Expect the NSA to come at both our homes after ECHELON intercepts this email.
wikitech-l@lists.wikimedia.org