> Also, just out of curiosity and to better understand the issue, what
> would be an example of a real life request URL that results in such a
> "no page title found" error when extracting the title?
Special page requests, for example.
Normally pages like "Special:Blah" are "actions", not pages themselves. We
do not count those as pageviews, with the notable exception of Search
requests (as they do provide content). So a page like
"Special:Search: Blah-Blah" would be an example of a pageview with title
"-" in the pageview_hourly table.
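
To make the behaviour described above concrete, here is a minimal sketch of how a title extractor could end up emitting "-". This is not the actual Analytics refinery code: the function name `extract_title`, the two URL shapes it handles, and the rule that any "Special:" title yields the sentinel are assumptions made for illustration only.

```python
from urllib.parse import urlsplit, parse_qs, unquote

# Sentinel named in this thread for "no page title found".
NO_TITLE = "-"


def extract_title(url: str) -> str:
    """Hypothetical sketch: return the page title for a request URL,
    or the "-" sentinel when no article title can be extracted."""
    parts = urlsplit(url)

    # Shape 1 (assumed): /wiki/<title>
    if parts.path.startswith("/wiki/"):
        title = unquote(parts.path[len("/wiki/"):])
        # Assumption: Special: pages are actions and carry no article title.
        if title and not title.startswith("Special:"):
            return title
        return NO_TITLE

    # Shape 2 (assumed): /w/index.php?title=<title>
    titles = parse_qs(parts.query).get("title")
    if titles and not titles[0].startswith("Special:"):
        return titles[0]
    return NO_TITLE


if __name__ == "__main__":
    print(extract_title("https://en.wikipedia.org/wiki/Special:Search?search=Blah"))
    print(extract_title("https://en.wikipedia.org/wiki/-"))
```

Note that the genuine page at /wiki/- and a Special:Search request both come out as "-" in this sketch, which is the collision that inflates the count for that title.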
On Mon, Dec 5, 2016 at 3:15 PM, Tilman Bayer <tbayer(a)wikimedia.org> wrote:
On Mon, Nov 14, 2016 at 12:25 PM, Nuria Ruiz
<nuria(a)wikimedia.org> wrote:
Thanks for the documentation. Does this only affect data provided by
the API, or also the page_title field in the pageview_hourly table,
i.e. the source of the API data?
In the latter case, please also add a note to the "known problems" at
https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview_hourly .
(This is the canonical place for documenting such issues - thanks for
making this explicit at
https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI#Issues_with_data .
Separately, for pageview definition changes there is also
https://meta.wikimedia.org/wiki/Research:Page_view#Change_log . No
objections of course if the Analytics team commits to keeping the
information up to date in all three places.)
Also, just out of curiosity and to better understand the issue, what
would be an example of a real life request URL that results in such a
"no page title found" error when extracting the title?
On Tue, Nov 8, 2016 at 7:25 AM, Vipul Naik <vipulnaik1(a)gmail.com> wrote:
>
> Hi Joseph,
>
> Thanks for the clarification.
>
> Any ideas why this number is much higher for some months? In particular,
> on desktop, it's high in the months of July to September 2015 (around 10
> million, compared to the usual 5 million) and then high again in October
> 2016 (45 million, about 10x the usual value).
For context, https://en.wikipedia.org/wiki/- was the 8th most viewed
page on all projects from May to October 2015; see footnote [1] at
https://phabricator.wikimedia.org/T117945 (that bug, flagged as "High"
Analytics priority for almost a year, is about a separate but
similar issue).
php?page=-&allmonths=allmonths&drilldown=all
> which summarizes results from the Wikimedia API (and stats.grok.se for
> data before July 2015).
>
> Vipul
>
> On Tue, Nov 8, 2016 at 3:46 AM, Joseph Allemandou
> <jallemandou(a)wikimedia.org> wrote:
>>
>> Hello Issa,
>>
>> Thank you for your question.
>> The very high number of views of the "-" page is explained by this dash
>> value being used as a special value for "no page title found" when
>> extracting titles from urls.
>> We definitely should document this in the API, creating this task:
>> https://phabricator.wikimedia.org/T150249
>> Best
>> Joseph
>>
>>
>> On Tue, Nov 8, 2016 at 12:28 AM, Issa Rice <riceissa(a)gmail.com> wrote:
>>>
>>> Dear Analytics Mailing List,
>>>
>>> Recently while querying pageviews of various pages, I discovered that
>>> the page whose title is a single hyphen character (i.e. with the title
>>> "-", with URL <https://en.wikipedia.org/wiki/->, which redirects to
>>> <https://en.wikipedia.org/wiki/Hyphen-minus>) receives an unusually high
>>> number of pageviews under the Pageview API. Taking October 2015 as an
>>> example, the page received 5.4 million pageviews during that month
>>> according to the API:
>>>
>>> <https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/desktop/user/-/daily/20151001/20151031>.
>>>
>>> However, according to stats.grok.se (which was still operational in the
>>> same month), the page received only 1209 pageviews:
>>> <http://stats.grok.se/en/201510/->.
>>>
>>> Looking at the tabulation of pageviews on Wikipedia Views, the increase
>>> in pageviews for this page coincides with the change to the Pageview
>>> API in July 2015:
>>>
>>> <http://wikipediaviews.org/displayviewsformultiplemonths.php?page=-&allmonths=allmonths&drilldown=all>.
>>>
>>> As I understand, page titles must be URL-encoded before the query,
>>> but the URL-encoding of "-" is itself.
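
The point about URL-encoding can be checked with a short sketch that rebuilds the per-article query URL quoted earlier in this message. The path layout is copied from that URL; the helper name `per_article_url` and its parameter names are only labels chosen for this illustration, not something taken from the API documentation.

```python
from urllib.parse import quote

# Base path copied from the per-article URL quoted in this thread.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project, access, agent, title, start, end,
                    granularity="daily"):
    """Hypothetical helper: build a per-article pageviews URL.

    The title is percent-encoded with safe="" so that characters such
    as "/" inside a title are escaped as well.
    """
    encoded = quote(title, safe="")
    return f"{BASE}/{project}/{access}/{agent}/{encoded}/{granularity}/{start}/{end}"


if __name__ == "__main__":
    # "-" is an unreserved character, so encoding leaves it unchanged.
    print(quote("-", safe=""))
    print(per_article_url("en.wikipedia", "desktop", "user", "-",
                          "20151001", "20151031"))
```

Because "-" encodes to itself, a client has no way to escape it into something distinct from the "no page title found" value; both map to the same path segment.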
>>>
>>> I looked at the API documentation but did not see this behavior listed,
>>> so I am wondering where these numbers are coming from.
>>>
>>> Best regards,
>>> Issa
>>>
>>>
>>> _______________________________________________
>>> Analytics mailing list
>>> Analytics(a)lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
--
Joseph Allemandou
Data Engineer @ Wikimedia Foundation
IRC: joal
--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB