for this). There are change tags
<https://www.mediawiki.org/wiki/Help:Tags> that identify redirect changes
-- "mw-new-redirect" and "mw-changed-redirect-target", specifically -- but
I am not sure if this is easily searchable via the action API. Someone on
this list might know.
Redirects can be created directly, say as an alternate name or misspelling
of an article (e.g. "Barak Obama" redirects to "Barack Obama"; it was never
an article on its own). Usually when a page is moved a redirect is left
behind at the old location ("20th Century Fox" was recently renamed to
"20th Century Studios"), but sometimes the redirects are suppressed. So you
could focus only on page moves, in which case you could query the page move
log using the logevents API <https://www.mediawiki.org/wiki/API:Logevents>,
specifically with letype=move. From that you could piece together the
pageviews. You wouldn't be including traffic originating from redirects
that were never the target article, but from my experience this is the
minority, since Google and the like usually link to the target and not
redirects.
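As a rough illustration of the move-log approach, here is a minimal Python sketch. The parameter names (list=logevents, letype=move, letitle, leprop) come from the logevents API; the helper names, the en.wikipedia.org endpoint, and the User-Agent string are my own assumptions, and the "params.target_title" response field is what I recall from the API output, so please check it against a real response before relying on it.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; swap in the wiki you are studying.
API_URL = "https://en.wikipedia.org/w/api.php"


def move_log_params(title, limit=50):
    """Build action API parameters for move-log entries logged under one title."""
    return {
        "action": "query",
        "list": "logevents",
        "letype": "move",
        "letitle": title,
        "leprop": "title|timestamp|details",
        "lelimit": str(limit),
        "format": "json",
        "formatversion": "2",
    }


def parse_moves(response):
    """Return (timestamp, old_title, new_title) tuples, oldest first."""
    moves = []
    for event in response.get("query", {}).get("logevents", []):
        # Assumed response shape: the move target sits under params.target_title.
        target = event.get("params", {}).get("target_title")
        moves.append((event["timestamp"], event["title"], target))
    return sorted(moves, key=lambda move: move[0])


def fetch_moves(title):
    """Run the query over HTTP (needs network access; not called here)."""
    url = API_URL + "?" + urllib.parse.urlencode(move_log_params(title))
    request = urllib.request.Request(
        url, headers={"User-Agent": "move-log-sketch/0.1 (hypothetical example)"}
    )
    with urllib.request.urlopen(request) as reply:
        return parse_moves(json.load(reply))
```

Calling fetch_moves("20th Century Fox") would list the moves logged under that title; suppressed redirects still show up in the log, so this covers moves that left no redirect behind.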
There's also a task to make the Toolforge tool go by the page move log
automatically <https://phabricator.wikimedia.org/T141332>, since it is such
a common need. The same caveats exist though; say articles Foo and Bar have
been moved back and forth a few times, you might need to check the move
logs of both and not just Foo. It can be quite tricky!
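To make the "piece it together" step a bit more concrete, here is a small sketch that turns a chronological list of moves for one page into the date ranges during which the page lived at each title. The function name, its arguments, and the sample dates are all made up for illustration, and it assumes you have already worked out which log entries belong to the same page (which is the tricky part when Foo and Bar have swapped places).

```python
def title_intervals(moves, first_seen, until):
    """Map one page's move history to (title, start, end) ranges.

    moves: chronological list of (timestamp, old_title, new_title) tuples
           that all describe the SAME page (assumption: already untangled).
    first_seen: timestamp from which the page existed under its first title.
    until: end of the period you care about.
    """
    intervals = []
    start = first_seen
    for timestamp, old_title, new_title in moves:
        # The page sat at old_title from the previous boundary until this move.
        intervals.append((old_title, start, timestamp))
        start = timestamp
    if moves:
        # After the last move the page sits at the final target title.
        intervals.append((moves[-1][2], start, until))
    return intervals


# Hypothetical history: Foo was moved to Baz on 2019-10-01.
history = [("2019-10-01", "Foo", "Baz")]
for title, start, end in title_intervals(history, "2019-07-01", "2020-01-25"):
    print(title, start, end)
```

Each range can then be used as the start and end date of a pageviews query for that title, so views from a time when the title belonged to a different article are left out.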
Overall I would say including all redirects is probably your best bet.
Allow me to clarify: the Redirect Views tool does offer date filtering,
just as the main Pageviews tool does. If you do need automation, you could
write a script to query the redirects API
<https://www.mediawiki.org/wiki/API:Redirects> and then the REST API
<https://w.wiki/J8K>, which is all that the tool does.
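A script along those lines might look roughly like the sketch below. It assumes the REST API in question is the Wikimedia pageviews "per-article" endpoint and that you want all-access, user-agent, daily data; the helper names and the User-Agent string are hypothetical, and this is not the tool's actual code.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the pageviews per-article endpoint of the Wikimedia REST API.
REST_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
HEADERS = {"User-Agent": "redirect-views-sketch/0.1 (hypothetical example)"}


def redirects_params(title):
    """Action API parameters listing the redirects that point at a title."""
    return {
        "action": "query",
        "prop": "redirects",
        "titles": title,
        "rdlimit": "max",
        "format": "json",
        "formatversion": "2",
    }


def redirect_titles(response):
    """Pull the redirect titles out of a prop=redirects response."""
    titles = []
    for page in response.get("query", {}).get("pages", []):
        for redirect in page.get("redirects", []):
            titles.append(redirect["title"])
    return titles


def pageviews_url(title, start, end, project="en.wikipedia.org"):
    """Build the per-article URL; start and end are YYYYMMDD strings."""
    article = urllib.parse.quote(title.replace(" ", "_"), safe="")
    return f"{REST_BASE}/{project}/all-access/user/{article}/daily/{start}/{end}"


def total_views(responses):
    """Sum the 'views' field over the items of several REST responses."""
    return sum(item["views"] for r in responses for item in r.get("items", []))


def fetch_json(url):
    """Fetch and decode one JSON document (needs network; not called here)."""
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

The idea is to fetch the redirects for the target, then call pageviews_url() for the target and for each redirect title over the same date range, and add the results up with total_views(). Pages with more redirects than one request returns would also need the API's "continue" handling, which is left out here.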
Hope this helps!
~ MA
On Mon, Feb 24, 2020 at 6:40 PM James Gardner <gardnerj2(a)carleton.edu>
wrote:
Thanks for the clarification of how redirects work, and what we should
keep in mind when trying to count pageviews. Do you know if there's a way
to find the date(s) when a page is redirected using the API? We know we can
get the 'old' page ids of redirected pages using the API, but we're not
sure if using the creation date of these page ids would be accurate. Also,
what's the difference between redirects and page moves if there is one?
We may stick to including redirects without trying to avoid overcounting,
as this appears to be a more complicated issue than we thought. We are
working to collect pageviews within a specific time frame, so relative
dates aren't quite what we're looking for.
Thanks again!
Jackie, James, Junyi, Kirby
On Mon, Feb 24, 2020 at 10:52 AM MusikAnimal <musikanimal(a)gmail.com>
wrote:
> > We attempted to use the wmflabs.org tool, but it only shows data from
> > a certain date
>
> I'm assuming you want relative dates, not exact dates? You can do this by
> using the range=latest-N URL parameter (where N is the number of days). See
> <https://tools.wmflabs.org/pageviews/url_structure/> and
> <https://tools.wmflabs.org/redirectviews/url_structure/> for Redirect
> Views. This mirrors the pvipdays parameter of the action API.
>
> I'm sorry, there is no backend for these tools, so if you need automation
> you'll have to scrape it or re-implement its logic yourself.
>
> > In the end, we are trying to get an accurate count of views for a
> > certain page no matter the source.
>
> Keep in mind that redirects can change, and historically may not have
> been the "same" page. For instance, if I create the article Foo, and
> someone else creates Bar, and some months later Foo is redirected to Bar.
> To accurately get the views of just Bar, you'll need to somehow exclude the
> time when Foo was a different article. Page moves can also cause unexpected
> results (Foo is moved to Baz, Bar is moved to Foo, etc.). Finally, page IDs
> can change too, say if I delete Foo, then move Bar to Foo. There isn't a
> foolproof solution, it seems, but simply including redirects is usually
> enough to give you what you want.
>
> ~ MA
>
> On Mon, Feb 24, 2020 at 9:18 AM James Gardner via Wikitech-l <
> wikitech-l(a)lists.wikimedia.org> wrote:
>
>> Hi all,
>>
>> Thanks for all the help and advice with this issue, especially with the
>> wmflabs tool with the redirect view tool. We'll try using that tool to
>> download the pageview data we need and manually filter by dates to map
>> redirects to the page. We'll also look into the REST API that Wiki has to
>> see if it can help us as well.
>>
>> Thanks again,
>>
>> Jackie, James, Junyi, Kirby
>>
>>
>> On Sun, Feb 23, 2020 at 10:58 PM Gergo Tisza <gtisza(a)wikimedia.org>
>> wrote:
>>
>> > On Sun, Feb 23, 2020 at 4:17 PM James Gardner via Wikitech-l <
>> > wikitech-l(a)lists.wikimedia.org> wrote:
>> >
>> >> We attempted to use the wmflabs.org tool, but it only shows data from a
>> >> certain date. (Example link:
>> >> <https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&start=2019-07-01&end=2020-01-25&pages=2019%E2%80%9320_Hong_Kong_protests%7CChina>
>> >> )
>> >>
>> >
>> > There's a redirectview tool (see the "redirects" links at the bottom of
>> > the page you linked) but it can't be filtered by date so it probably
>> > can't help you.
>> >
>> >
>> >> Then we attempted to use the redirects of a page and the old page ids
>> >> to grab the pageview data, but there was no data returned. We also
>> >> attempted to grab data for a page that we knew would have a long past,
>> >> but the parameter "pvipcontinue" did not appear (
>> >> https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bpagevie…
>> >> ).
>> >> (Example:
>> >> https://www.mediawiki.org/wiki/Special:ApiSandbox#action=query&format=j…
>> >> )
>> >>
>> >
>> > That API displays a limited set of metrics and is focused on caching and
>> > being backend-agnostic. There is no way to get old data; pvipcontinue is
>> > for fetching data about more pages. If you need something more specific,
>> > you should use the Analytics Query Service (which the other APIs rely on)
>> > directly: https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews
>> >
>> > I think you'll have to piece the data together using the MediaWiki
>> > redirects API and AQS.
>> >
>> _______________________________________________
>> Wikitech-l mailing list
>> Wikitech-l(a)lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>