Yesterday, in the quarterly review, Dan mentioned that our current user
satisfaction metric uses a somewhat arbitrary 10s dwell-time cutoff for a
successful search, and that we want to use a survey to correlate
qualitative and quantitative values to pin down a better cutoff for our
users. I don't remember whether Dan mentioned it, or I was just rehashing
the notion on my own, but it may be difficult to pin down a specific cutoff.
A wild thought appears! Why do we have to pin down a specific cutoff? Why
can't we have a probabilistic user satisfaction metric? (Other than
complexity and computational speed, which may be relevant.)
We gather so much data that we could easily compute
something like this: 20% of users are satisfied when dwell time is <5s, 35%
for 5-10s, 75% for 10-60s, 98% for 1m-5m, 85% for 5m-20m, and 80% for >20m.
Determining the cutoffs might be tricky, and computation is more complex
than counting, but not ridiculously complicated, and potentially much more
accurate for large samples. Presenting the results is still easy: "54.7% of
our users are happy with their search results based on our dwell-time
model".
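Here is a minimal Python sketch of the idea. It assumes survey responses can be reduced to (dwell time in seconds, satisfied yes/no) pairs. It also reuses the hypothetical buckets and percentages from the example above. The function names and data layout are illustrative only, not an existing Discovery tool. Choosing good bucket edges (the tricky part) is not addressed; the edges are simply fixed here.

```python
from bisect import bisect_right

# Hypothetical bucket upper edges in seconds, matching the example buckets:
# <5s, 5-10s, 10-60s, 1m-5m, 5m-20m, >20m
BUCKET_EDGES = [5, 10, 60, 300, 1200]


def bucket_index(dwell_seconds):
    """Return the dwell-time bucket index (0 .. len(BUCKET_EDGES))."""
    return bisect_right(BUCKET_EDGES, dwell_seconds)


def estimate_rates(survey):
    """Estimate per-bucket satisfaction rates from survey data.

    survey: iterable of (dwell_seconds, satisfied) pairs, where satisfied is a bool.
    Returns a list with the fraction of satisfied respondents in each bucket,
    or None for buckets with no survey responses.
    """
    n = len(BUCKET_EDGES) + 1
    satisfied = [0] * n
    total = [0] * n
    for dwell, ok in survey:
        i = bucket_index(dwell)
        total[i] += 1
        satisfied[i] += int(ok)
    return [s / t if t else None for s, t in zip(satisfied, total)]


def expected_satisfaction(dwell_times, rates):
    """Average each logged search's satisfaction probability.

    Instead of counting searches above a single cutoff, every search
    contributes the estimated probability for its dwell-time bucket.
    """
    probs = []
    for d in dwell_times:
        p = rates[bucket_index(d)]
        if p is None:
            raise ValueError(f"no satisfaction estimate for dwell time {d}s")
        probs.append(p)
    return sum(probs) / len(probs)


if __name__ == "__main__":
    # Rates taken from the hypothetical example in the text.
    example_rates = [0.20, 0.35, 0.75, 0.98, 0.85, 0.80]
    # Made-up dwell times (seconds), one per bucket.
    logged = [3, 8, 45, 120, 900, 1500]
    score = expected_satisfaction(logged, example_rates)
    print(f"{score:.1%} of searches estimated satisfied")
```

The headline number is the average of per-search probabilities. That average is the expected fraction of satisfied searches, so it can still be reported as a single percentage. In practice the rates would come from `estimate_rates` applied to the survey responses rather than being typed in by hand.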
I tried to do a quick search for papers on this topic, but I didn't find
anything. I'm not familiar with the literature, so that may not mean much.
Okay, back to the TextCat mines....
—Trey
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation