Bon dia / Hi,
I would like to ask about the very significant discrepancies between our official statistics on traffic to our wiki projects.
Let's look at the gross reading figures for the English Wikipedia in January 2024. Without applying any kind of filtering to the query, the Wikimedia Stats website shows 12,697,373,117 visits, while the Pageviews tool on Wikimedia Cloud's Toolforge yields 8,864,755,474 visits.
This is a huge disparity, and both data repositories are hosted "under the same roof" and are most likely the two most widely used tools for this purpose. Am I wrong? For a smaller project, like the Catalan Wikipedia, there is a 19.5M vs 64.5M inconsistency... This changes our conclusions considerably, at a time when we as a community have been dealing for a couple of years with Google hiding results in our language from its search engine. What am I missing about these differences? As an experienced editor, I have been regularly digging into our available data tools for several years, but it is difficult not to run into frequent problems of comparability, false positives, and reliability. This matters not only for my personal interest, but also when it comes to easily explaining our projects' data to the outside world by referring to a trustworthy portal.
On top of that, we are not able to see itemised statistics for some small countries in Wikimedia Stats. We can filter how many millions of visits the Dutch Wikipedia gets in Belgium or in the Netherlands, but we cannot see the language breakdown of Wikipedia use in Belgium (the % of readers who access it in French, Dutch, German, Walloon, Picard, English, etc.). That feature disappeared in 2018 with the last update of the now-dismantled WiViVi portal. Altogether, this makes it impossible for chapters and user groups to tackle biases or plan actions, or even to inform academic policies regarding awareness or revitalization of minoritized and endangered languages.
I am afraid that, if this is my experience as a long-term editor, that of newcomers, journalists, or even scientists may be even more uncertain and confusing. Hopefully someone can help answer some of these questions.
Salutacions / Best regards,
Xavier Dengra
Hi! I think the discrepancy is because Wikimedia Stats counts all views by default, whereas Siteviews only counts User views (excluding Bots and Spiders) by default. Once you exclude Bots and Spiders from Wikimedia Stats, or include them in Siteviews, the figures match.
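To make the comparison reproducible, here is a minimal sketch in Python. It assumes the public Wikimedia Pageviews REST API is the data source behind both tools, and that its `aggregate` endpoint accepts the agent values `all-agents` and `user`. The helper names (`aggregate_url`, `non_user_views`) and the timestamp range are illustrative choices of mine, not something taken from either tool. The script only builds the two request URLs and works out the gap from the figures quoted in this thread. It does not call the API.

```python
from urllib.parse import quote

# Assumed base URL of the public Wikimedia Pageviews REST API (aggregate endpoint).
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate"


def aggregate_url(project, agent, start, end,
                  access="all-access", granularity="monthly"):
    """Build the request URL for one project and one agent type.

    `agent` is assumed to be "all-agents" (everything) or "user"
    (bots and spiders excluded). Timestamps use the YYYYMMDDHH form.
    """
    parts = [project, access, agent, granularity, start, end]
    return BASE + "/" + "/".join(quote(p, safe="") for p in parts)


def non_user_views(all_agents_total, user_total):
    """Views left over once user traffic is subtracted from the total."""
    return all_agents_total - user_total


# The two queries that correspond to the two defaults described above.
all_url = aggregate_url("en.wikipedia.org", "all-agents", "2024010100", "2024013100")
user_url = aggregate_url("en.wikipedia.org", "user", "2024010100", "2024013100")

# Figures quoted in this thread for the English Wikipedia, January 2024.
gap = non_user_views(12_697_373_117, 8_864_755_474)

print(all_url)
print(user_url)
print(gap)
```

If the explanation above holds, the gap printed by the last line is the traffic from bots, spiders and other non-user agents for that month, and fetching the two URLs should return totals close to the two figures Xavier quoted.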
On Thu, Feb 15, 2024 at 9:30 AM F. Xavier Dengra i Grau via Wikimedia-l <wikimedia-l@lists.wikimedia.org> wrote:
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org