hi,
our team is trying to determine how pageviews are attributed to pages that redirect to other pages.
for instance, the page Panic!_at_the_disco redirects to the page Panic!_at_the_Disco, yet the pageview dumps file has an entry for both Panic!_at_the_disco and Panic!_at_the_Disco. does this mean that a single visit to the page Panic!_at_the_disco generates two entries in the pageview dumps file (one entry for the source page of the redirect and another for the target page of the redirect)?
-best, -ace
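Pageview data (in both the dumps and the pageviews API) is counted for the nominal page title as requested, i.e. it is agnostic as to what that title redirects to. A view of a redirect is therefore counted under the redirect's own title, not under the target's.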
To obtain a complete dataset of pageviews across all redirects for a given page you would need to reconstruct its redirect graph over the time range you're interested in, which is a pretty laborious process.
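As a rough illustration of that aggregation step, here is a minimal Python sketch. It assumes each dump line has four space-separated fields (domain code, page title, view count, response size) and that you have already built a redirect-to-target mapping yourself; `redirect_map`, `aggregate_views`, and the view counts below are hypothetical and only for illustration. A single mapping is a snapshot, so it does not capture redirects that changed during the time range.

```python
from collections import Counter


def aggregate_views(lines, redirect_map, domain="en"):
    """Sum view counts per page, folding redirect titles into their targets.

    lines        -- iterable of dump lines, assumed to look like
                    "<domain_code> <page_title> <count_views> <response_size>"
    redirect_map -- dict mapping a redirect title to its target title
                    (hypothetical; must be built separately)
    domain       -- domain code to keep ("en" is assumed to mean English Wikipedia)
    """
    totals = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        line_domain, title, views, _size = fields
        if line_domain != domain:
            continue
        # Titles that are not known redirects are counted under themselves.
        totals[redirect_map.get(title, title)] += int(views)
    return totals


# Made-up sample data, not real pageview counts.
sample_lines = [
    "en Panic!_at_the_disco 12 0",
    "en Panic!_at_the_Disco 340 0",
    "de Panic!_at_the_Disco 7 0",
]
sample_redirects = {"Panic!_at_the_disco": "Panic!_at_the_Disco"}

totals = aggregate_views(sample_lines, sample_redirects)
print(totals["Panic!_at_the_Disco"])
```

With the made-up numbers above, the two English entries are folded into one total for Panic!_at_the_Disco, and the "de" line is ignored.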
If you're doing research on this topic, you may be interested in this recent work by Mako Hill and Aaron Shaw looking at redirects and how they affect the quality of data on Wikipedia articles.
*Consider the Redirect: A Missing Dimension of Wikipedia Research* http://dx.doi.org/10.1145/2641580.2641616
HTH, Dario
On Thu, Aug 25, 2016 at 5:25 AM, Aubrey Rembert arembert@pandora.com wrote:
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Dario's reply covers it but just wanted to elaborate a tiny bit:
For traditional web redirects your assumption would be true. There, a request for a redirect URL gets a 30x HTTP response with a pointer to the target URL. The web browser then makes a subsequent request for the target URL so that it can show the desired page.
However, wiki redirects don't work that way. A request for a redirect wiki page immediately results in a response containing the target article (with a little "Redirected from" caption under the title). So there'll be only 1 web request, 1 response, and 1 page view, logged under the redirect name.
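To make the difference concrete, here is a toy Python model of which titles would be logged for a single visit under each scheme. It does not issue any real HTTP requests; the function names and the `redirects` dict are hypothetical and only illustrate the two behaviours described above.

```python
def http_redirect_visit(requested, redirects):
    """Traditional 30x redirect: the browser makes a second request for the target."""
    logged = [requested]
    target = redirects.get(requested)
    if target is not None:
        logged.append(target)  # follow-up request triggered by the 30x response
    return logged


def wiki_redirect_visit(requested, redirects):
    """Wiki redirect: the target's content is served in the same response,
    so only the requested (redirect) title is logged."""
    return [requested]


redirects = {"Panic!_at_the_disco": "Panic!_at_the_Disco"}

print(http_redirect_visit("Panic!_at_the_disco", redirects))
print(wiki_redirect_visit("Panic!_at_the_disco", redirects))
```

In this model the first call yields two logged titles and the second yields one, which is why the dumps contain separate entries for the redirect and for the target: each entry reflects requests made for that exact title.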
-- Timo