Greetings,
I am the individual who provided code to Gerard. Regarding the Bugzilla
entry serving as the "blocker" for this and many other inquiries, I will
note that my code runs nightly to obtain one day's worth of pageview
stats and writes them to an SQL database. I have been persistently
storing all pageview statistics for en.wp in this queryable format for
2+ years at this point. I then use this data in my research, as well as
in reports such as [https://en.wikipedia.org/wiki/Wikipedia:Top_5000_pages]
and [https://en.wikipedia.org/wiki/Wikipedia:TOPRED]. Is it production-ready?
Probably not, but it works for me as research code.
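For readers curious what such a nightly job involves, here is a minimal Python sketch. It is not AW's actual code. It assumes the hourly dump lines follow the pagecounts-raw layout "<project> <title> <views> <bytes>". The SQLite table `pageviews(day, title, views)`, its index, and the function names `aggregate_day` and `store_day` are all hypothetical. A real job would first download and decompress the day's 24 hourly files and feed their lines in.

```python
import sqlite3
from collections import Counter
from typing import Iterable


def aggregate_day(lines: Iterable[str], project: str = "en") -> Counter:
    """Sum hourly view counts per title for one project.

    Assumes each line looks like "<project> <title> <views> <bytes>"
    (pagecounts-raw layout); malformed lines are skipped.
    """
    totals: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or parts[0] != project:
            continue
        try:
            views = int(parts[2])
        except ValueError:
            continue  # junk in the views field
        totals[parts[1]] += views
    return totals


def store_day(conn: sqlite3.Connection, day: str, totals: Counter) -> None:
    """Write one day's totals into a hypothetical (day, title, views) table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pageviews ("
        " day TEXT NOT NULL, title TEXT NOT NULL, views INTEGER NOT NULL,"
        " PRIMARY KEY (day, title))"
    )
    # One guess at a "query-driven" index: fast per-title time series lookups.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_title_day ON pageviews (title, day)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO pageviews (day, title, views) VALUES (?, ?, ?)",
        ((day, title, views) for title, views in totals.items()),
    )
    conn.commit()


if __name__ == "__main__":
    hourly = [
        "en Main_Page 100 2000",
        "en Main_Page 50 1000",
        "de Hauptseite 70 1500",
        "en Foo 3 60",
    ]
    conn = sqlite3.connect(":memory:")
    store_day(conn, "2013-12-18", aggregate_day(hourly))
    for row in conn.execute("SELECT title, views FROM pageviews ORDER BY views DESC"):
        print(row)
```

Running it prints `('Main_Page', 150)` and then `('Foo', 3)`. Each extra index speeds up reads at the cost of slower bulk inserts, which is one plausible contributor to long nightly load times.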
My limitations here are primarily hardware-based. I do it all on a
single commodity server that also runs services like [[WP:STiki]]. Thus:
(a) I don't really have the storage to do all languages/projects, and
CPU cycles would also become an issue at that scale. It can take up to 3
hours to parse in a day's worth of en.wp stats. It could be done
faster, but with my query-driven indices and scalable format, that is
the cost.
(b) I am not in a position to open this as a private or public API. It
would be trivial to DoS this server with some fairly simple queries
(en.wp sees 10 million+ article titles daily, I think, because this data
includes attempted accesses to URLs that don't exist, and there is all
sorts of muck in that regard).
I am not sure what Gerard is chasing in particular with "missing
searches", but regardless, I get an overwhelming number of requests to
produce popular pages or redlinks reports for various projects/languages.
My code could do this by changing a small handful of strings; what it
really needs is a place to run and someone to oversee it. More than a
developer, this seems to be in the realm of someone like Erik Zachte, not
that I am trying to add to anyone's responsibilities. -AW
On 12/19/2013 06:14 AM, Gerard Meijssen wrote:
Hi,
As I said, there is software that does basically what we need it to do.
I am asking for access for Magnus so that he can modify that software
and make it more useful.
Waiting for perfection takes too long. The need for this functionality
exists and the arguments are in my initial mail.
Thanks,
GerardM
On 19 December 2013 12:10, Federico Leva (Nemo) <nemowiki(a)gmail.com> wrote:
Gerard Meijssen, 19/12/2013 12:06:
Hi,
Sorry, here are the link [1] and the blog post [2] I wrote when I
learned about it.
Thanks,
Gerard
[1] https://en.wikipedia.org/wiki/User:West.andrew.g/Popular_redlinks
[2] http://ultimategerardm.blogspot.nl/2013/11/a-brilliant-idea-barnstar.html
Ah. Those are not searches; they're direct URL accesses (where
enabled, wdsearch.js shows Wikidata search results for those too).
So again that would require the good old
https://bugzilla.wikimedia.org/show_bug.cgi?id=42259 , our usual
blocker. :( Misses in actual search results are considerably harder
to get.
Nemo
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
--
Andrew G. West, PhD
Research Scientist
Verisign Labs - Reston, VA
Website: http://www.andrew-g-west.com