In the Hebrew Wikipedia there have been some discussions about changing the links in the sidebar. Is there a clever way to do it by using click statistics?
For example, can we get statistics about how many people click each link in the sidebar, and if possible - what kind of users click them - registered, anonymous, having more than 5 edits, etc.?
Of course, this may be useful for all projects.
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
On Mon, Aug 15, 2011 at 11:48 PM, Amir E. Aharoni amir.aharoni@mail.huji.ac.il wrote:
> In the Hebrew Wikipedia there have been some discussions about changing the links in the sidebar. Is there a clever way to do it by using click statistics?
> For example, can we get statistics about how many people click each link in the sidebar, and if possible - what kind of users click them - registered, anonymous, having more than 5 edits, etc.?
> Of course, this may be useful for all projects.
ClickTracking has code ready to go for this. We used it on enwiki once but never really analyzed the data.
If you have community consensus for this I can turn it on quite easily.
Roan
Amir E. Aharoni wrote:
> In the Hebrew Wikipedia there have been some discussions about changing the links in the sidebar. Is there a clever way to do it by using click statistics?
> For example, can we get statistics about how many people click each link in the sidebar, and if possible - what kind of users click them - registered, anonymous, having more than 5 edits, etc.?
> Of course, this may be useful for all projects.
The English Wikipedia once used a hack where it changed a few links on the Main Page to redirects and then measured page view statistics using stats.grok.se.[1]
An extension such as ClickTracking is probably a much better option if you can get local consensus and sysadmin support.
MZMcBride
[1] http://en.wikipedia.org/w/index.php?diff=223886569&oldid=223883642
2011/8/16 MZMcBride z@mzmcbride.com:
> An extension such as ClickTracking is probably a much better option if you can get local consensus and sysadmin support.
Are there any special implications I should know about? Privacy? Performance? Anything else?
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
On Tue, Aug 16, 2011 at 8:30 AM, Amir E. Aharoni amir.aharoni@mail.huji.ac.il wrote:
> 2011/8/16 MZMcBride z@mzmcbride.com:
>> An extension such as ClickTracking is probably a much better option if you can get local consensus and sysadmin support.
> Are there any special implications I should know about? Privacy? Performance? Anything else?
As for privacy: for each event, we store
* the kind of event (e.g. 'clicked on the recent changes link in the sidebar')
* the timestamp, with second-level granularity, in UTC (just like for edits)
* whether the user was logged in
* the user's lifetime edit count at the time of the event
* the number of edits the user made in the 1, 3 and 6 months preceding the event
This is all tracked in a database table that is not shared with the outside world. The data we'll give you will be in some aggregated form like "on September 2nd between 16:00 and 17:00 UTC, the recent changes link was clicked 40 times, of which 30 clicks were by users with a lifetime edit count greater than 10". So we do privately store some mildly stalkerish information (not personally identifiable, unless your edit count differs so much from everyone else's that it alone identifies you), but the information we make public shouldn't be privacy-sensitive, right?
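To make the aggregated form concrete, here is a minimal Python sketch of that kind of hourly roll-up. The `ClickEvent` record, its field names, and the `aggregate_hourly` helper are hypothetical, chosen only to mirror the fields listed above; this is not the ClickTracking schema or its code.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ClickEvent:
    # Hypothetical record shape mirroring the fields described above;
    # not the actual ClickTracking table layout.
    event: str            # kind of event, e.g. "sidebar-recentchanges"
    timestamp: datetime   # UTC, second-level granularity
    logged_in: bool
    lifetime_edits: int   # lifetime edit count at the time of the event


def aggregate_hourly(events, min_edits=10):
    """Map (hour, event) to (total clicks, clicks by users with > min_edits edits)."""
    totals = Counter()
    experienced = Counter()
    for e in events:
        # Truncate the timestamp to the start of its hour.
        hour = e.timestamp.replace(minute=0, second=0, microsecond=0)
        key = (hour, e.event)
        totals[key] += 1
        if e.lifetime_edits > min_edits:
            experienced[key] += 1
    return {key: (n, experienced[key]) for key, n in totals.items()}


if __name__ == "__main__":
    sample = [
        ClickEvent("sidebar-recentchanges",
                   datetime(2011, 9, 2, 16, 5, 12, tzinfo=timezone.utc), True, 250),
        ClickEvent("sidebar-recentchanges",
                   datetime(2011, 9, 2, 16, 40, 0, tzinfo=timezone.utc), False, 0),
    ]
    for (hour, event), (total, exp) in aggregate_hourly(sample).items():
        print(hour.isoformat(), event, total, exp)
```

Only the per-hour counts leave the function, so individual edit counts never appear in the output, which is the property the aggregated reports are meant to have.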
As for performance, we'll be fine. As I said, we've run this kind of thing on wikis before (including enwiki, which is a little bit higher-traffic than hewiki) and we were fine. We just didn't do much with the data at the time IIRC (apart from looking at 'how many times was each link clicked for the duration of the tracking', which only tells you which links are used and which aren't), and it's too old to be useful now.
Roan Kattouw (Catrope)
Amir E. Aharoni wrote:
> 2011/8/16 MZMcBride z@mzmcbride.com:
>> An extension such as ClickTracking is probably a much better option if you can get local consensus and sysadmin support.
> Are there any special implications I should know about? Privacy? Performance? Anything else?
Roan thoroughly covered the privacy and performance concerns. To clarify what I meant by "sysadmin support," you'll need a sysadmin to fulfill the "shell" bug in a timely manner (to enable the extension, once there's consensus) and you'll need a sysadmin to aggregate the stats for you (as Roan noted), as I don't believe this information will be considered public. It looks like there are partial views for the tables (click_tracking and click_tracking_events) on the Toolserver, for what it's worth.
MZMcBride
On Tue, Aug 16, 2011 at 4:47 PM, MZMcBride z@mzmcbride.com wrote:
> It looks like there are partial views for the tables (click_tracking and click_tracking_events) on the Toolserver, for what it's worth.
Oh wow, I had no idea those were there. I'm not entirely sure that's a good idea. It's a partial view but it only suppresses the session_id field.
Anyway, I'll let the toolserver folks decide on the privacy concerns around exposing the edit count fields (like I said, that can be identifiable information in certain cases). This does mean anyone with a toolserver account can do analysis on the data as it comes in, which is a good thing cause it means less work for me :)
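For readers unfamiliar with the idea, a partial view of this kind can be sketched in Python with an in-memory SQLite database. Apart from the table name click_tracking and the session_id field, every column name and the view name below are assumptions made for illustration; the real Toolserver view definition is not reproduced here.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a full table and a redacted view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical columns; only session_id is named in the thread.
        CREATE TABLE click_tracking (
            session_id          TEXT,
            event_id            INTEGER,
            action_time         TEXT,
            is_logged_in        INTEGER,
            user_total_contribs INTEGER
        );
        -- The partial view exposes every column except session_id.
        CREATE VIEW click_tracking_public AS
            SELECT event_id, action_time, is_logged_in, user_total_contribs
            FROM click_tracking;
        """
    )
    conn.execute(
        "INSERT INTO click_tracking VALUES (?, ?, ?, ?, ?)",
        ("abc123", 1, "2011-09-02 16:05:12", 1, 250),
    )
    return conn


def column_names(conn, relation):
    """Return the column names a SELECT * on the given table or view exposes."""
    cursor = conn.execute(f"SELECT * FROM {relation}")
    return [col[0] for col in cursor.description]


if __name__ == "__main__":
    conn = build_demo_db()
    print(column_names(conn, "click_tracking_public"))
```

Note that the edit count column still comes through such a view, which is the part that can be identifying in certain cases.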
Roan Kattouw (Catrope)
On Tue, Aug 16, 2011 at 10:56 AM, Roan Kattouw roan.kattouw@gmail.com wrote:
> On Tue, Aug 16, 2011 at 4:47 PM, MZMcBride z@mzmcbride.com wrote:
>> It looks like there are partial views for the tables (click_tracking and click_tracking_events) on the Toolserver, for what it's worth.
> Oh wow, I had no idea those were there. I'm not entirely sure that's a good idea. It's a partial view but it only suppresses the session_id field.
> Anyway, I'll let the toolserver folks decide on the privacy concerns around exposing the edit count fields (like I said, that can be identifiable information in certain cases). This does mean anyone with a toolserver account can do analysis on the data as it comes in, which is a good thing cause it means less work for me :)
> Roan Kattouw (Catrope)
I asked about it in the #toolserver IRC channel and had this conversation with River:
<sumanah> I have heard some discussion of a privacy implication around exposing edit count fields: http://lists.wikimedia.org/pipermail/wikitech-l/2011-August/thread.html#5472... It sounds as though the toolserver community should consider whether this needs to be changed?
<felicity> sumanah: which message talks about edit counts? wm-de would need to decide on that, but personally i don't see a problem with it
<felicity> it's just a pre-computed version of information that's trivially available from the database anyway
<sumanah> felicity: I didn't totally follow it, but http://lists.wikimedia.org/pipermail/wikitech-l/2011-August/054720.html and http://lists.wikimedia.org/pipermail/wikitech-l/2011-August/054730.html seem to have the gist
<felicity> sumanah: the version of click_tracking exposed to users is redacted: http://p.tcx.org.uk/63
<felicity> sumanah: it was originally requested by Roan: https://jira.toolserver.org/browse/TS-1012 so I assume WMF is okay with it
I'm not personally interested in pushing forward on any investigation, consensus-building, etc. regarding this issue, but wanted to put this information out there in case someone else is.
Sumana Harihareswara Volunteer Development Coordinator Wikimedia Foundation
wikitech-l@lists.wikimedia.org