On Mon, Nov 14, 2016 at 12:25 PM, Nuria Ruiz nuria@wikimedia.org wrote:
This is documented now here:
https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI#Gotchas
Thanks for the documentation. Does this only affect data provided by the API, or also the page_title field in the pageview_hourly table, i.e. the source of the API data?
In the latter case, please also add a note to the "known problems" at https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview_hourly . (This is the canonical place for documenting such issues - thanks for making this explicit at https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI#Issues_with_data . Separately, for pageview definition changes there is also https://meta.wikimedia.org/wiki/Research:Page_view#Change_log . No objections of course if the Analytics team commits to keeping the information up to date in all three places.)
Also, just out of curiosity and to better understand the issue, what would be an example of a real life request URL that results in such a "no page title found" error when extracting the title?
On Tue, Nov 8, 2016 at 7:25 AM, Vipul Naik vipulnaik1@gmail.com wrote:
Hi Joseph,
Thanks for the clarification.
Any ideas why this number is much higher for some months? In particular, on desktop, it's high in the months of July to September 2015 (around 10 million, compared to the usual 5 million) and then high again in October 2016 (45 million, about 10x the usual value).
For context, https://en.wikipedia.org/wiki/- was the 8th most viewed page on all projects from May to October 2015; see footnote [1] at https://phabricator.wikimedia.org/T117945 (that bug, flagged as "High" Analytics priority for almost a year, is about a separate but similar issue).
Data is from http://wikipediaviews.org/displayviewsformultiplemonths.php?page=-&allmo... which summarizes results from the Wikimedia API (and stats.grok.se for data before July 2015).
Vipul
On Tue, Nov 8, 2016 at 3:46 AM, Joseph Allemandou jallemandou@wikimedia.org wrote:
Hello Issa,
Thank you for your question. The very high number of views of the "-" page is explained by this dash value being used as a special value for "no page title found" when extracting titles from URLs. We should definitely document this in the API; I am creating this task for it: https://phabricator.wikimedia.org/T150249
Best
Joseph
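To illustrate the kind of collision described above, here is a minimal sketch in Python. It is not the actual Analytics extraction code: the function name `extract_title`, the constant `UNKNOWN_TITLE`, and the two URL shapes it handles (`/wiki/<title>` and `?title=<title>`) are assumptions made purely for illustration.

```python
from urllib.parse import urlsplit, parse_qs, unquote

# Sentinel used when no title can be found (the "-" value mentioned above).
UNKNOWN_TITLE = "-"

def extract_title(url):
    """Hypothetical title extraction; NOT the real pipeline logic."""
    parts = urlsplit(url)
    # Case 1 (assumed): pretty URLs of the form /wiki/<title>
    if parts.path.startswith("/wiki/"):
        title = unquote(parts.path[len("/wiki/"):])
        if title:
            return title
    # Case 2 (assumed): script URLs carrying a ?title=<title> parameter
    titles = parse_qs(parts.query).get("title")
    if titles:
        return titles[0]
    # Nothing matched: fall back to the sentinel
    return UNKNOWN_TITLE

# A genuine view of the page titled "-" and a request with no
# recoverable title both end up counted under the same key.
real_page = extract_title("https://en.wikipedia.org/wiki/-")
no_title = extract_title("https://en.wikipedia.org/w/index.php")
```

In this toy version, `real_page` and `no_title` are both "-", so any count keyed on the extracted title cannot tell the two apart. Which real requests trigger the fallback in production is not something this sketch can answer.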
On Tue, Nov 8, 2016 at 12:28 AM, Issa Rice riceissa@gmail.com wrote:
Dear Analytics Mailing List,
Recently while querying pageviews of various pages, I discovered that the page whose title is a single hyphen character (i.e. with the title "-", with URL https://en.wikipedia.org/wiki/-, which redirects to https://en.wikipedia.org/wiki/Hyphen-minus) receives an unusually high number of pageviews under the Pageview API. Taking October 2015 as an example, the page received 5.4 million pageviews during that month according to the API:
However, according to stats.grok.se (which was still operational in the same month), the page received only 1209 pageviews: http://stats.grok.se/en/201510/-.
Looking at the tabulation of pageviews on Wikipedia Views, the increase in pageviews for this page coincides with the change to the Pageview API in July 2015:
http://wikipediaviews.org/displayviewsformultiplemonths.php?page=-&allmonths=allmonths&drilldown=all.
As I understand, page titles must be URL-encoded before the query, but the URL-encoding of "-" is itself.
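The encoding claim can be checked with a short Python sketch. The helper name `pageviews_url` is hypothetical, and the endpoint layout plus the `all-access`, `user`, and `daily` path segments are assumptions based on the public Wikimedia REST API rather than the exact query used for the numbers above.

```python
from urllib.parse import quote

def pageviews_url(title, project="en.wikipedia",
                  start="20151001", end="20151031"):
    """Build a per-article Pageview API URL (layout assumed, see above)."""
    # safe="" also percent-encodes "/", which is otherwise left alone
    encoded = quote(title, safe="")
    return ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
            f"{project}/all-access/user/{encoded}/daily/{start}/{end}")

# "-" is an unreserved character, so percent-encoding leaves it unchanged
hyphen_encoded = quote("-", safe="")
# A title containing a reserved character does change
slash_encoded = quote("AC/DC", safe="")
```

Here `hyphen_encoded` is "-" and `slash_encoded` is "AC%2FDC", so a request for the "-" page uses the literal hyphen in the URL path and encoding cannot be the source of the discrepancy.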
I looked at the API documentation but did not see this behavior listed, so I am wondering where these numbers are coming from.
Best regards, Issa
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
-- Joseph Allemandou Data Engineer @ Wikimedia Foundation IRC: joal