Hi all,
I'd like to float a small extension to the Clickstream dataset that I think
would make it considerably more useful for studying how AI assistants are
reshaping traffic to Wikipedia.
Right now the referrer mapping scheme gives external search engines their
own value (other-search) but folds every other external site into
other-external, or into other-empty when the referrer is stripped. That
means clickthroughs from AI assistants (ChatGPT, Perplexity, Gemini,
Copilot, Claude, and so on) are effectively invisible, mixed in with the
rest of the external web.
My suggestion is to add one more mapped value, say other-ai, populated by
matching referrers against a maintained list of AI-assistant / LLM product
domains. The payoff is that the Clickstream becomes a longitudinal
instrument for a question the community is already wrestling with: how much
traffic AI intermediaries actually send back to Wikipedia, and how that's
trending over time.
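To make the matching step concrete, here is a rough Python sketch of what I
have in mind. It is not the existing Clickstream pipeline code: the function
name map_external_referrer, both domain lists, and the handling of
unparseable referrers are my own assumptions for illustration, and a real
list would need ongoing curation.

```python
from urllib.parse import urlsplit

# Hypothetical, illustrative lists (assumptions, not the production mapping).
AI_ASSISTANT_DOMAINS = {
    "chatgpt.com",
    "perplexity.ai",
    "gemini.google.com",
    "copilot.microsoft.com",
    "claude.ai",
}
SEARCH_DOMAINS = {"google.com", "bing.com", "duckduckgo.com"}


def _matches(host, domains):
    """Return True if host equals a listed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def map_external_referrer(referrer):
    """Map an external referrer URL to a Clickstream-style referrer value."""
    if not referrer:
        # Referrer missing or stripped by the client.
        return "other-empty"
    host = (urlsplit(referrer).hostname or "").lower()
    if not host:
        # Assumption: a non-empty but unparseable referrer stays external.
        return "other-external"
    # Check the AI list first so gemini.google.com is not absorbed by
    # the broader google.com search entry.
    if _matches(host, AI_ASSISTANT_DOMAINS):
        return "other-ai"
    if _matches(host, SEARCH_DOMAINS):
        return "other-search"
    return "other-external"
```

The one design point worth flagging is ordering: some assistants live on
subdomains of search engines, so the AI list has to be checked before the
search list, or those referrers would still land in other-search.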
Thanks for considering it,
Kai Zhu
Bocconi University