Yesterday in the quarterly review, Dan mentioned that our current user satisfaction metric uses the somewhat arbitrary 10-second dwell-time cutoff for a successful search, and that we want to use a survey to correlate qualitative (survey) and quantitative (dwell-time) values to pin down a better cutoff for our users. I don't remember whether Dan said this or I was just rehashing the notion on my own, but it may be difficult to pin down a single cutoff.
A wild thought appears! Why do we have to pin down a specific cutoff at all? Why can't we have a probabilistic user satisfaction metric? (Other than complexity and computational speed, which may be relevant.)
We have the ability to gather so much data that we could easily compute something like this: 20% of users are satisfied when dwell time is <5s, 35% for 5-10s, 75% for 10-60s, 98% for 1m-5m, 85% for 5m-20m, and 80% for >20m.
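Put slightly more formally (the notation here is mine, introduced just for illustration): if search i has dwell time d_i, b(d_i) is the dwell-time bucket it falls into, and p_b is the fraction of surveyed users in bucket b who said they were satisfied, then the metric over N searches is the expected fraction of satisfied users. A hard cutoff is the special case where every p_b is either 0 or 1.

```latex
S = \frac{1}{N} \sum_{i=1}^{N} p_{b(d_i)}
% Worked example with the illustrative rates above: if half of the N searches
% dwell under 5s (p = 0.20) and half dwell 10-60s (p = 0.75), then
% S = 0.5(0.20) + 0.5(0.75) = 0.475, i.e. 47.5% of users satisfied.
```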
Determining the bucket boundaries might be tricky, and the computation is more complex than simple counting, but it isn't ridiculously complicated, and it could be much more accurate for large samples. Presenting the results is still easy: "54.7% of our users are happy with their search results based on our dwell-time model".
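Here's a rough sketch of what the computation might look like, just to show it really isn't much more than counting. It assumes (hypothetically) that survey answers can be paired with dwell times as (dwell_seconds, satisfied) tuples, and it reuses the illustrative bucket boundaries from above; all function names are made up, and the toy data is invented, not real results.

```python
from bisect import bisect_right
from collections import defaultdict

# Bucket boundaries in seconds, mirroring the illustrative buckets:
# <5s, 5-10s, 10-60s, 1m-5m, 5m-20m, >20m. Choosing real boundaries is
# the open (possibly tricky) part; these are placeholders.
BOUNDARIES = [5, 10, 60, 300, 1200]


def bucket(dwell_seconds):
    """Return the bucket index (0..len(BOUNDARIES)) for a dwell time."""
    return bisect_right(BOUNDARIES, dwell_seconds)


def estimate_rates(survey):
    """Estimate P(satisfied | bucket) from (dwell_seconds, satisfied) pairs.

    Hypothetical input format: one tuple per surveyed search.
    """
    satisfied = defaultdict(int)
    total = defaultdict(int)
    for dwell, ok in survey:
        b = bucket(dwell)
        total[b] += 1
        satisfied[b] += int(ok)
    return {b: satisfied[b] / total[b] for b in total}


def satisfaction_score(dwell_times, rates):
    """Expected fraction of satisfied users: mean per-search probability.

    Assumes every bucket that appears in dwell_times has a survey-based
    rate; a bucket with no survey responses raises KeyError.
    """
    probs = [rates[bucket(d)] for d in dwell_times]
    return sum(probs) / len(probs)


def hard_cutoff_score(dwell_times, cutoff=10.0):
    """Current-style metric, assuming dwell >= cutoff counts as a success."""
    return sum(d >= cutoff for d in dwell_times) / len(dwell_times)


if __name__ == "__main__":
    # Toy, made-up survey data: (dwell seconds, said they were satisfied).
    survey = [(3, False), (4, True), (7, False), (8, True), (30, True),
              (45, True), (90, True), (120, True), (600, True), (1500, False)]
    rates = estimate_rates(survey)
    dwell_times = [2, 6, 20, 200, 1300, 1400]  # unsurveyed searches
    print(f"model:  {satisfaction_score(dwell_times, rates):.1%}")  # model:  50.0%
    print(f"cutoff: {hard_cutoff_score(dwell_times):.1%}")  # cutoff: 66.7%
```

Bucketed rates are the simplest version. A smooth, single-peaked fit (for example, logistic regression with a quadratic term in log dwell time) would sidestep choosing boundaries, at the cost of a bit more modeling.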
I tried a quick search for papers on this topic but didn't find anything. I'm not familiar with the literature, though, so that may not mean much.
Okay, back to the TextCat mines....
—Trey
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation