Hoi, For the English Wikipedia there is a page that lists the most frequent failed searches [1]. We asked for the code for this software and received it. What we want is to expand the functionality and use it for any language.
When we do, we want to differentiate between failed searches for subjects that do not exist in Wikidata either and failed searches for subjects that do exist in Wikidata.
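As an illustration only (this is not the software discussed in this thread), the split described above could be sketched in Python as follows. The names `split_by_wikidata`, `label_lookup` and `KNOWN_LABELS` are hypothetical; the lookup here is an offline stand-in, and a real checker would have to ask Wikidata itself, for instance through its `wbsearchentities` API module.

```python
from typing import Callable, Iterable, List, Tuple


def split_by_wikidata(
    titles: Iterable[str], has_item: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Partition failed titles into (found in Wikidata, not found)."""
    in_wikidata: List[str] = []
    not_in_wikidata: List[str] = []
    for title in titles:
        if has_item(title):
            in_wikidata.append(title)
        else:
            not_in_wikidata.append(title)
    return in_wikidata, not_in_wikidata


# Hypothetical stand-in for a Wikidata lookup: a fixed set of known labels.
KNOWN_LABELS = {"some subject", "another subject"}


def label_lookup(title: str) -> bool:
    """Assumed matching rule: underscores become spaces, case is ignored."""
    return title.replace("_", " ").casefold() in KNOWN_LABELS


if __name__ == "__main__":
    hits, misses = split_by_wikidata(["Some_subject", "Unknown_thing"], label_lookup)
    print("in Wikidata:", hits)
    print("not in Wikidata:", misses)
```

Passing the lookup in as a function keeps the partitioning independent of how the Wikidata check is actually done.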
As you may know, searching Wikidata as well has been enabled on several Wikipedias, among them the Italian and the Polish Wikipedia. This allows us to provide access to articles in other languages, to find images in Commons through a link to a Commons category, and obviously to reach Wikidata itself, also in a more visual way courtesy of the "Reasonator".
NB this functionality would add value to the en.wp as well because, as you may know, 51% of the de.wp articles are linked to the en.wp...
What I am looking for is access to the data for the developer who will modify the software. Magnus Manske is a well-known and trusted developer; he is the one who started MediaWiki. It is for him that I ask access.
Our theory is that when we add Wikidata items, we will reach the tipping point where search is actually useful more quickly (remember, there are 280+ Wikipedias). We expect that when we advertise which subjects do not have a Wikipedia article but do have a Wikidata item, we will stimulate the writing of articles that prove popular.
In effect we expect to engage in data driven user participation.
If you have any questions or suggestions I am happy to hear them. If someone can get access to the data for Magnus please let me know. Thanks, Gerard
Hoi, Sorry .. the link [1] and the blog post [2] I wrote when I learned about it. Thanks, Gerard
[1] https://en.wikipedia.org/wiki/User:West.andrew.g/Popular_redlinks [2] http://ultimategerardm.blogspot.nl/2013/11/a-brilliant-idea-barnstar.html
On 19 December 2013 12:00, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Gerard Meijssen, 19/12/2013 09:54:
Hoi,
For the English Wikipedia there is a page where you find the most often failed searches [1].
Link missing!
Nemo
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Gerard Meijssen, 19/12/2013 12:06:
Hoi, Sorry .. the link [1] and the blog post [2] I wrote when I learned about it. Thanks, Gerard
[1] https://en.wikipedia.org/wiki/User:West.andrew.g/Popular_redlinks [2] http://ultimategerardm.blogspot.nl/2013/11/a-brilliant-idea-barnstar.html
Ah. Those are not searches, they're direct URL accesses (where enabled, wdsearch.js shows wikidata search results for those too). So again that would require the good old https://bugzilla.wikimedia.org/show_bug.cgi?id=42259 , our usual blocker. :( Actual search results misses are something quite harder to get.
Nemo
Hoi,
As I said, there is software that does basically what we need it to do. I am asking for access for Magnus so that he can modify that software and make it more useful.
Waiting for perfection takes too long. The need for this functionality exists and the arguments are in my initial mail. Thanks, GerardM
On 19 December 2013 12:10, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Ah. Those are not searches, they're direct URL accesses (where enabled, wdsearch.js shows wikidata search results for those too). So again that would require the good old https://bugzilla.wikimedia.org/show_bug.cgi?id=42259 , our usual blocker. :( Actual search results misses are something quite harder to get.
Nemo
Greetings,
I am the individual who provided the code to Gerard. Regarding the Bugzilla entry serving as a "blocker" for this and many other inquiries, I will note that my code fires nightly to obtain one day's worth of pageview stats and writes them to an SQL database. I have been persistently storing all pageview statistics for en.wp in this queryable format for 2+ years at this point. I then use this data in my research, as well as in reports such as [https://en.wikipedia.org/wiki/Wikipedia:Top_5000_pages] and [https://en.wikipedia.org/wiki/Wikipedia:TOPRED]. Is it production ready? Probably not, but it works for me as research code.
My limitations with this are primarily hardware based. I do it all on a single commodity server that also runs services like [[WP:STiki]]. Thus:
(a) I don't particularly have the storage to do all languages/projects. CPU cycles would also become an issue at this scale. It can take up to 3 hours to parse in a day's worth of en.wp stats. It could be done quicker, but with my query-driven indices and scalable format, this is how it goes.
(b) I am not in a position to open this as a private or public API. It would be trivial to DOS this server with some pretty simple queries (en.wp sees 10 million+ article titles daily, I think, as this data includes attempted URL accesses that don't exist, and there are all types of muck in that regard).
I am not sure what Gerard is chasing in particular with "missing searches", but regardless, I get an overwhelming number of requests to do popular pages or redlinks reports for various projects/languages. My code could do this by changing a small handful of strings; what it really needs is a place to run and someone to oversee it. More than a dev, this seems to be in the realm of someone like Erik Zachte, not that I am trying to add to anyone's responsibilities. -AW
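For readers unfamiliar with this kind of pipeline, a minimal sketch of a nightly load step is given here. It is not the code described in this mail. It assumes the input is the four-field, space-separated line format of the public hourly pagecount dumps (project code, title, request count, bytes); the table name `pageviews` and the functions `parse_counts` and `store_day` are hypothetical, and SQLite stands in for whatever SQL database is actually used.

```python
import sqlite3
from collections import Counter
from typing import Iterable


def parse_counts(lines: Iterable[str], project: str = "en") -> Counter:
    """Sum request counts per title for one project, skipping malformed lines."""
    totals: Counter = Counter()
    for line in lines:
        parts = line.split()
        # Assumed layout: project, title, request count, bytes transferred.
        if len(parts) != 4 or parts[0] != project:
            continue
        title, count = parts[1], parts[2]
        if count.isdigit():
            totals[title] += int(count)
    return totals


def store_day(conn: sqlite3.Connection, day: str, totals: Counter) -> None:
    """Write one day's totals; re-running the same day overwrites its rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pageviews ("
        "day TEXT NOT NULL, title TEXT NOT NULL, views INTEGER NOT NULL, "
        "PRIMARY KEY (day, title))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO pageviews (day, title, views) VALUES (?, ?, ?)",
        ((day, title, views) for title, views in totals.items()),
    )
    conn.commit()


if __name__ == "__main__":
    # Invented sample lines, for illustration only.
    sample = ["en Main_Page 10 1000", "en Main_Page 5 500", "de Hauptseite 7 700"]
    conn = sqlite3.connect(":memory:")
    store_day(conn, "2013-12-19", parse_counts(sample))
    print(conn.execute("SELECT * FROM pageviews").fetchall())
```

The `project` argument is the kind of string that would change when running the same step for another language.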
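A redlinks-style report of the kind mentioned above can be sketched as a ranking of requested titles that have no article. This is an assumption-laden illustration, not the actual report code: `top_missing` is a hypothetical name, and the set of existing titles is assumed to come from elsewhere (for example a page-title dump).

```python
from typing import Dict, List, Set, Tuple


def top_missing(views: Dict[str, int], existing: Set[str], n: int) -> List[Tuple[str, int]]:
    """Return the n most requested titles that are not in `existing`."""
    missing = ((title, count) for title, count in views.items() if title not in existing)
    # Highest view count first; ties are broken alphabetically by title.
    return sorted(missing, key=lambda item: (-item[1], item[0]))[:n]


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    views = {"Main_Page": 900, "Missing_A": 50, "Missing_B": 70, "Existing_C": 60}
    existing = {"Main_Page", "Existing_C"}
    print(top_missing(views, existing, 2))
```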
On 12/19/2013 06:14 AM, Gerard Meijssen wrote:
Hoi,
As I said, there is software that does basically what we need it to do. I am asking for access for Magnus so that he can modify that software and make it more useful.
Waiting for perfection takes too long. The need for this functionality exists and the arguments are in my initial mail. Thanks, GerardM