Hey James, I'm not sure if this will be helpful, but I've analyzed the
weekly (and yearly) periodicities in *editor* data for various wikis.
I show the day-of-week breakdown for the number of editors on the English
and Japanese Wikipedias. I was surprised to see that, on English Wikipedia,
Monday–Thursday see the most editing, with editors taking a break
Friday–Sunday, whereas on Japanese Wikipedia, Saturday and Sunday see more edits.
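A day-of-week breakdown like this can be computed from a series of daily counts. The sketch below is hypothetical: `day_of_week_profile` is an illustrative name, not part of any Wikimedia tool, and it assumes the counts are consecutive daily totals (no missing days) starting on a known date.

```python
from collections import defaultdict
from datetime import date, timedelta


def day_of_week_profile(start, counts):
    """Mean daily count per weekday, Monday (index 0) through Sunday (index 6).

    Hypothetical helper: `start` is the date of counts[0], and `counts` are
    consecutive daily totals (e.g. active editors per day) with no gaps.
    """
    sums = defaultdict(float)
    n_days = defaultdict(int)
    for offset, value in enumerate(counts):
        weekday = (start + timedelta(days=offset)).weekday()  # Monday == 0
        sums[weekday] += value
        n_days[weekday] += 1
    # Weekdays never observed (series shorter than a week) come back as NaN.
    return [sums[d] / n_days[d] if n_days[d] else float("nan") for d in range(7)]


# Two synthetic weeks starting Monday 2018-04-23: busier Mon-Fri, quieter weekend.
profile = day_of_week_profile(date(2018, 4, 23), ([100] * 5 + [60] * 2) * 2)
print(profile)  # [100.0, 100.0, 100.0, 100.0, 100.0, 60.0, 60.0]
```

Dividing each entry by the sum of the seven values turns the profile into a share of the week, which makes wikis of very different sizes comparable.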
Perhaps more useful to you is how I *quantify* the weekly (or
yearly) periodicity. I have two ways of doing this: you can look at
the autocorrelation at a 7-day lag, or you can look at the strength
of the periodogram (or any similar spectral estimate) at the 7-day period.
(The two are tightly related, since the periodogram essentially
estimates the Fourier transform of the autocorrelation function.)
Both are discussed at that link, and you should be able to adapt
the method/code to pageview data. (And feel free to reach out if you
want me to summarize those findings; my writing there is very long
because I was trying both to understand and to describe my findings.)
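A minimal sketch of both measures, assuming a 1-D series of daily counts with no missing days. It uses NumPy, and `autocorr` and `weekly_periodogram_share` are hypothetical helper names, not the code at the link. The periodogram measure is one choice among many: the power in the frequency bin nearest 1/7 cycles per day, as a fraction of all non-zero-frequency power.

```python
import numpy as np


def autocorr(x, lag):
    """Sample autocorrelation of a 1-D series at a single lag.

    Uses the common biased estimator: the lagged cross-product sum is divided
    by the full sum of squares, so a perfectly periodic series of length n
    scores (n - lag) / n rather than exactly 1.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    if denom == 0:
        return float("nan")  # constant series: autocorrelation undefined
    return float(np.dot(x[:-lag], x[lag:]) / denom)


def weekly_periodogram_share(x, period=7.0):
    """Fraction of non-DC periodogram power in the bin nearest 1/period.

    One possible 'strength at the 7-day period' measure (an assumption, not
    necessarily the measure used at the link). Works best when len(x) is a
    multiple of `period`, so 1/period falls exactly on a frequency bin.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2          # raw periodogram (unscaled)
    freqs = np.fft.rfftfreq(len(x), d=1.0)       # cycles per day
    k = int(np.argmin(np.abs(freqs - 1.0 / period)))
    total = power[1:].sum()                      # skip the zero-frequency bin
    return float(power[k] / total) if total > 0 else float("nan")


# Ten synthetic weeks: weekdays at 100, weekends at 60.
series = [100, 100, 100, 100, 100, 60, 60] * 10
print(autocorr(series, 7))               # ~0.9, i.e. (70 - 7) / 70
print(weekly_periodogram_share(series))  # ~0.65; the rest sits in harmonics
```

A non-sinusoidal weekly shape (such as a flat weekday level with a weekend dip) also puts power at the 2/7 and 3/7 harmonics, so summing those bins together with the fundamental is a reasonable variant of the spectral measure.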
I was only using daily data (the finest-grained data available for editors).
Since the pageview data is available at hourly granularity, I'm sure
you can run the same kinds of analysis on pageviews and make a
choropleth/heatmap. The one caveat I'd mention is that the
hourly pageview data is only available for the last couple of years
(unlike the editor data I show at the link above, which is available
since 2000), but this shouldn't be a big problem.
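To go from hourly pageview counts to one value per location that could color a heatmap, one option is to sum each 24-hour block into a daily total and then take the lag-7 autocorrelation. This is a sketch under stated assumptions: the input is a hypothetical mapping from a location key to an hourly count series that starts at midnight in one common time zone. Whether pageviews can actually be broken down by location this way is not assumed here. If the hours are in UTC, day boundaries are shifted for distant locations, so converting to local time first would be more faithful.

```python
import numpy as np


def hourly_to_daily(hourly):
    """Sum consecutive 24-hour blocks; a trailing partial day is dropped.

    Assumes hourly[0] is the first hour (00:00) of a day in the chosen time zone.
    """
    hourly = np.asarray(hourly, dtype=float)
    n_days = len(hourly) // 24
    return hourly[: n_days * 24].reshape(n_days, 24).sum(axis=1)


def lag7_autocorr(daily):
    """Biased sample autocorrelation at a 7-day lag (NaN if undefined)."""
    x = np.asarray(daily, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    if denom == 0 or len(x) <= 7:
        return float("nan")
    return float(np.dot(x[:-7], x[7:]) / denom)


def weekly_strength_by_location(hourly_by_location):
    """Map each (hypothetical) location key to the lag-7 autocorrelation of its
    daily totals; these values are what a heatmap/choropleth would color."""
    return {
        loc: lag7_autocorr(hourly_to_daily(series))
        for loc, series in hourly_by_location.items()
    }


# Hypothetical locations: "A" follows a weekly cycle, "B" alternates daily.
hours = range(70 * 24)
data = {
    "A": [10 if (h // 24) % 7 < 5 else 6 for h in hours],
    "B": [10 if (h // 24) % 2 == 0 else 6 for h in hours],
}
print(weekly_strength_by_location(data))  # A near 0.9, B near -0.9
```

Keeping the hourly resolution instead of summing it away would also allow a day-of-week by hour-of-day heatmap for each location.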
On Mon, Apr 23, 2018 at 2:29 PM, James Salsman <jsalsman(a)gmail.com> wrote:
[Crossposting to Research and Analytics lists]
Most Wikipedia articles with a weekly periodicity show more pageviews
on a typical weekday than on a weekend day. Some articles associated with
weekends (e.g. articles about a variety of hobbies) will
show relatively fewer pageviews on weekdays.
Suppose I wanted to plot a heatmap with colors corresponding to the
strength of the weekly periodicity in pageviews of articles viewed
from different geographic locations.
(1) Has anyone done anything like this before?
(2) Is sufficient information available from the current logging regime?
Finally, I would also like to ask for a review of this summarization, please:
Analytics mailing list