It was great to meet you at IA yesterday; thanks for following up with this link to your work. It's very interesting and coincides with our own work on using the completion suggester to replace the current prefix search used on-wiki.
Have you put any thought into normalizing page view data? One thing we have been trying to figure out (though it's on the back burner while we focus on current quarterly goals) is how best to integrate page views (https://phabricator.wikimedia.org/T112681). Because we have to do this across many wikis with a wide variance in page views, and we want to use the data not only for the completion suggester but also within our full text search results, we are thinking about normalizing the data down to a % of page views for that wiki over a time period, possibly taking in a larger time period of page views and weighting newer page views as more important than older ones. Additionally, we are looking into whether we should batch load page view information on a weekly basis, or whether we can load page view data only when pages are edited (or some combination of the two). I've pinged David and Trey about this and they might have some questions for you :)
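To make the normalization idea concrete, here is a rough sketch of one way it might look. Nothing here is settled: the function name, the input shape (a per-wiki dict of daily counts, oldest day first, all lists the same length) and the exponential half-life weighting are assumptions for illustration only, not something we have implemented.

```python
def normalized_scores(daily_views, half_life_days=7.0):
    """Return each page's share of a wiki's recency-weighted page views.

    daily_views: dict mapping page title -> list of daily view counts,
                 oldest day first (hypothetical input shape).
    half_life_days: assumed decay parameter; a view this many days old
                    counts half as much as a view from the newest day.
    """
    weighted = {}
    for title, counts in daily_views.items():
        newest = len(counts) - 1
        total = 0.0
        for day, views in enumerate(counts):
            age = newest - day  # 0 for the most recent day
            total += views * 0.5 ** (age / half_life_days)
        weighted[title] = total

    wiki_total = sum(weighted.values())
    if wiki_total == 0:
        # No traffic at all in the window: every page scores zero.
        return {title: 0.0 for title in weighted}
    # Dividing by the wiki-wide total makes scores comparable across wikis.
    return {title: value / wiki_total for title, value in weighted.items()}
```

Because each score is a fraction of that wiki's own weighted total, a small wiki and a large wiki both produce values between 0 and 1, which is the property we are after for reusing the number in full text rescoring.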
For comparison, here is similar data but with a different scoring algorithm David worked up that reuses the same data we use for rescoring full text searches: https://en.wikipedia.org/w/api.php?action=cirrus-suggest&text=Que
We haven't yet put this into production because we wanted to integrate page view data into the scoring before running more tests. It looks quite promising based on your initial
On Fri, Nov 13, 2015 at 11:07 AM, Greg Lindahl lindahl@pbm.com wrote:
I've been working on book search at the Internet Archive, and I've been using Wikipedia article titles and redirects as entities and synonyms. I wanted to build autocomplete for this gizmo, so I downloaded 7 days of pageviews for the en Wikipedia, and wrote a tiny script to sum them up. It worked great!
Here's the demo (currently live, will disappear eventually). "number" is the pageviews count.
curl http://researcher3.fnf.archive.org:8080/autocomplete?q=Que | json_pp
{
   "autocomplete" : [
      { "number" : 68310, "label" : "Queen Victoria" },
      { "number" : 53283, "label" : "Quentin Tarantino" },
      { "number" : 29192, "label" : "Quebec" },
      { "number" : 23717, "label" : "Queen Elizabeth The Queen Mother" },
      { "number" : 20500, "label" : "Quetiapine" }
   ]
}
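For readers who want to try something similar, the summing and lookup steps might look roughly like the sketch below. This is not the actual script behind the demo, which isn't shown here; the function names, the per-day dict input, the case-insensitive prefix match, and the linear scan (rather than a trie or other index) are all assumptions made to keep the example short.

```python
from collections import Counter


def sum_pageviews(daily_counts):
    """Add up per-day {title: views} dicts into one Counter of totals."""
    totals = Counter()
    for day in daily_counts:
        totals.update(day)  # Counter.update adds counts instead of replacing
    return totals


def autocomplete(totals, prefix, limit=5):
    """Return the most-viewed titles starting with prefix, demo-style."""
    prefix = prefix.lower()
    matches = [
        (title, views)
        for title, views in totals.items()
        if title.lower().startswith(prefix)
    ]
    # Highest view count first; ties broken alphabetically for stable output.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return {
        "autocomplete": [
            {"number": views, "label": title} for title, views in matches[:limit]
        ]
    }
```

A linear scan over every title is fine for experimenting, but a real service over millions of titles would likely want a prefix index of some kind.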
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics