Will the new platform be located on stats.wikimedia.org as well?
Do you know what the sitemap for the replacements of Wikistats will be? I
would like to be able to refer people to the Wikistats successor from the
LearnWiki videos, rather than refer them to links that may be dead by the
time that viewers two years from now see the videos.
After consulting with several people in several departments, we have
scheduled a maintenance window (downtime) for dbstore1002 (alias
analytics-store, and s2 *to* s7-analytics-slave) for Thursday 21 July
between 14:00 and 15:00 UTC.
The downtime is expected to last only 5-10 minutes, but in case
something goes wrong, we will reserve the entire hour. During the
maintenance, this host will stop replication from eventlogging and the
production shards, kill all ongoing queries and stop responding to new
ones. User databases on this host will be unavailable, too.
For the following 3 days, the recommendation is to use db1047 /
analytics-slave / s1-analytics-slave, which also has an up-to-date
version of eventlogging and shards s1 and s2, to avoid service
interruption. I see some cron jobs running on dbstore1002; contact
me privately if you want to change them and need help.
I will send an update when the maintenance is over to confirm normal
work can resume.
Hello Wikimedia analytics mailing list,
As part of research into how people read Wikipedia, a friend and I created
a short survey. We are interested in seeing how people on this mailing list
(not a representative sample of Wikipedia readers for sure!) fill out the
survey. The survey should take 2 to 10 minutes to complete.
I would also appreciate it if any of you could circulate the
survey to a different audience. If you are interested in doing that, please
let me know (off-list, if you prefer) and I will give you a separate URL
through which to do so for each such audience. The URLs represent different
audiences to whom the survey is shared so that it is easier to understand
how responses differ based on audience.
Any feedback on the survey questions would also be appreciated, on- or off-list.
Thank you very much!
Are there heatmaps somewhere that show the geographic distribution of
Wikipedia readers and editors by country?
I realize that country-specific data can be sensitive for countries with
small populations of readers and/or editors, and I don't need fine detail.
Incrementing heatmaps in units of hundreds of readers and editors, or
readers and editors per thousand population, is probably sufficient.
These would be great visualizations to have if they're available.
I am working on a project that uses page view numbers for wiki articles and
I was hoping somebody could help me out. I am using Wikipedia redirects to
find aliases for query names. Unfortunately there is a lot of noise in the
redirects. I was hoping to use the page views as a heuristic to weed out
bad redirects. I was looking at the page view files but the ones on
stats.grok.se are hourly which is too much to process in a reasonable
amount of time. I was wondering if anybody had (or knew where I could
access) page view files for a longer amount of time like yearly, monthly,
or even daily. I need to be able to download the file locally because I will
be dealing with a lot of query names. I appreciate any help you can
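If no pre-aggregated files turn up, one option is to sum the hourly files locally into daily or monthly totals. The following is only a minimal sketch in Python, assuming each line of an hourly file is space-separated as "project page_title view_count bytes" and that the files are gzip-compressed; the function names (parse_line, aggregate, aggregate_files) and the default project code "en" are illustrative choices, not part of any existing tool.

```python
import gzip
from collections import Counter


def parse_line(line):
    """Parse one 'project title count bytes' line; return None if malformed."""
    parts = line.split(" ")
    if len(parts) != 4:
        return None
    project, title, count, _size = parts
    try:
        return project, title, int(count)
    except ValueError:
        return None


def aggregate(lines, project="en", counts=None):
    """Add per-title view counts for one project into a Counter."""
    counts = Counter() if counts is None else counts
    for line in lines:
        parsed = parse_line(line.rstrip("\n"))
        if parsed is not None and parsed[0] == project:
            counts[parsed[1]] += parsed[2]
    return counts


def aggregate_files(paths, project="en"):
    """Sum counts across many gzip-compressed hourly files (assumed format)."""
    counts = Counter()
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
            aggregate(fh, project, counts)
    return counts
```

Passing one day's or one month's worth of hourly files to aggregate_files would give a single Counter keyed by page title, which can then be looked up for each redirect source and target.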
Is there an easy way to rank our projects, with languages being
consolidated, by (1) size in GB, or (2) number of content pages, or (3)
number of active users in the previous month? I imagine that the ordered
list would look something like this: Commons, all Wikipedias, all
Wiktionaries, all Wikisources, Wikispecies, Wikidata, etc.
If there's an easy way to get an ordered list I'd like to include that info
in an introductory portion of my LearnWiki video tutorials, but if this
question would consume more than a few moments of staff time to research
then I can skip it. I'm thinking that the sizes of the databases would be
an easy way to measure sizes in GB, but I don't know with certainty if the
databases are consolidated or if each language has its own database.
This might be of interest: https://clickhouse.yandex/
ClickHouse is an open-source column-oriented database management
system that allows generating analytical data reports in real time.
ClickHouse manages extremely large volumes of data in a stable and
sustainable manner. It currently powers Yandex.Metrica, world’s second
largest web analytics platform, with over 13 trillion database records
and over 20 billion events a day, generating customized reports
on-the-fly, directly from non-aggregated data. This system was
successfully implemented at CERN’s LHCb experiment to store and
process metadata on 10bn events with over 1000 attributes per event
registered in 2011.