When a user is looking for certain information on Wikipedia, they sometimes find it through a redirect. It is not very common, but the number of hits on a redirect can be hundreds or even thousands of times higher than the number of hits on the corresponding article. So if you care about the subjects users were looking for, it does not make sense to separate hits on redirects from hits on the corresponding articles. When calculating data for wikipediatrends.com we combined the URLs of articles and their redirects exactly as Oliver described. It was not very hard because we did it before summarising and cleaning the raw data.
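To illustrate the idea of folding redirect hits into their target article before summarising, here is a minimal sketch. It is not the wikipediatrends.com code: the function name, the redirect map and the hit counts are all made up for the example, and it assumes the redirect-to-article mapping has already been obtained somewhere else.

```python
from collections import Counter


def combine_hits(raw_hits, redirect_to):
    """Sum hits per article, counting each redirect under its target.

    raw_hits    -- iterable of (title, hits) pairs from the raw logs
    redirect_to -- dict mapping a redirect title to its target article
                   (hypothetical input, assumed to be known in advance)
    """
    totals = Counter()
    for title, hits in raw_hits:
        # Titles that are not redirects map to themselves.
        target = redirect_to.get(title, title)
        totals[target] += hits
    return dict(totals)


# Made-up numbers, for illustration only.
raw_hits = [("Bombay", 1200), ("Mumbai", 300), ("Mumbai", 50), ("Paris", 10)]
redirect_to = {"Bombay": "Mumbai"}
combined = combine_hits(raw_hits, redirect_to)
```

Doing this on the raw rows, before any per-title summarising, means each row is touched only once.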
On Sun, Apr 27, 2014 at 9:21 PM, Oliver Keyes okeyes@wikimedia.org wrote:
The problem with that is that, as Henrik said, it works on the basis of URLs, not page names. The only way to discover "X is a redirect to Y" would be to prod the database for that information, for every unique URL.
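One way to prod for "X is a redirect to Y" without issuing one request per URL would be to batch titles through the MediaWiki API. A minimal sketch, assuming the English Wikipedia endpoint and the `redirects` parameter of `action=query`. The helper names, the batch size of 50 and the sample response are assumptions for illustration, and nothing here performs a network request.

```python
from urllib.parse import urlencode

# Assumed endpoint; other wikis use their own host name.
API = "https://en.wikipedia.org/w/api.php"


def redirect_query_urls(titles, batch_size=50):
    """Build one query URL per batch of titles (batch size is an assumption)."""
    urls = []
    for i in range(0, len(titles), batch_size):
        params = {
            "action": "query",
            "format": "json",
            "redirects": 1,  # ask the API to resolve redirects
            "titles": "|".join(titles[i:i + batch_size]),
        }
        urls.append(API + "?" + urlencode(params))
    return urls


def parse_redirects(response):
    """Extract {redirect title: target title} from a decoded JSON response."""
    entries = response.get("query", {}).get("redirects", [])
    return {entry["from"]: entry["to"] for entry in entries}


# Hand-written sample shaped like an API response, not real output.
sample = {"query": {"redirects": [{"from": "Bombay", "to": "Mumbai"}]}}
mapping = parse_redirects(sample)
urls = redirect_query_urls(["Title %d" % n for n in range(120)])
```

Even batched, this is still one lookup for every unique URL in the logs, which is the cost being pointed out above.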
On 27 April 2014 05:36, Jane Darnell jane023@gmail.com wrote:
Henrik,
I don't know about the page history part of this question, but going forward it would be nice to offer an extra option to include hits on all incoming redirects though. There are two issues with this. The first is when the name needs disambiguation and gets split out into towns, provinces or countries (for places) or gets split out into occupations (for people). The second is when wikipedians go through with bots and "correct" links. I assume they do this so that the page hits will become more representative, but I often work on old names and I try to preserve original spelling (especially by older authors) to increase the "findability". To do this I create a lot of redirects based on older spellings to use in pages where the older spelling is used in a reference. I've noticed "redirect" corrections in cases where there is no disambiguation needed, and so I think offering an option for all redirects might help stop that behavior.
Jane
2014-04-27 12:34 GMT+02:00, Henrik Abelsson henrik@abelsson.com:
On 2014-04-27 08:45, ENWP Pine wrote:
- I think it would be desirable to add an https option to stats.grok.se so that viewers' interests in page readership statistics are more private.
Hm, why not? I'll request a certificate and start serving https also. I hadn't really thought of page readership statistics as something all that sensitive, but I don't see any downside to also serving https.
- There is an issue in the statistics given at [1]. As you can see from [2] editors created and edited the project page on days when stats.grok.se said there were no pageviews. This may be the result of a page move [3] [4] and the pre-move views were not integrated into the results shown in [1]. Is this the expected and desired behavior? From my point of view as a Signpost author, this is undesirable as we try to track our readership statistics. I think it is the case that if page A is moved to new page B then the statistics for page A should be integrated into those for page B and a notice should be given to the viewer that the statistics for page B includes those from page A which was moved on date X. This problem may affect other pages that are the subject of mergers. I think it would be the case that if page A is merged into page B then we would want some notice to appear on stats.grok.se alerting the viewer that there was a merger, the date of the merger, and offering the viewer a way to select statistics with and without the historical information from page A as they look at the viewership statistics for page B.
That would indeed be better. However, the statistics data tracks URLs rather than pages and it's computationally expensive to look up the page history and what URLs it has been accessible through. On average, some tens of thousands of view-statistics entries are added per minute, 24/7. One could perhaps do it when the data is requested, but that would mean making several round-trips to the WMF servers to look up the history of all moves and correlate URLs across time. It's certainly possible to build a tool that does that on top of the stats.grok.se and Wikipedia APIs though.
-henrik
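As a rough illustration of what such a tool would do once it has the per-URL data, here is a minimal sketch that merges the daily view counts of a pre-move title into those of the new title and produces the kind of notice suggested above. The function name, the data layout and all numbers are hypothetical. It assumes the move date and both series have already been fetched, and it does not call stats.grok.se or any Wikipedia API.

```python
from datetime import date


def merge_moved_page(views_old, views_new, moved_on):
    """Combine daily views of a moved page under its new title.

    views_old -- {date: views} for the old title (page A)
    views_new -- {date: views} for the new title (page B)
    moved_on  -- date of the move, assumed to be known already
    """
    merged = dict(views_new)
    for day, views in views_old.items():
        # After the move the old title keeps getting hits as a redirect,
        # so overlapping days are summed rather than overwritten.
        merged[day] = merged.get(day, 0) + views
    note = ("Includes views of the pre-move title (moved on %s)."
            % moved_on.isoformat())
    return merged, note


# Made-up numbers, for illustration only.
views_old = {date(2014, 4, 1): 40, date(2014, 4, 2): 35}
views_new = {date(2014, 4, 2): 5, date(2014, 4, 3): 50}
merged, note = merge_moved_page(views_old, views_new, date(2014, 4, 2))
```

Keeping the two input series separate until display time would also allow showing the statistics with and without the history of the old title.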
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
-- Oliver Keyes Research Analyst Wikimedia Foundation