[Mediawiki-api] About Statistical Data from Query of Multiple Duplicates

7 Feb 2020

Hi folks.

*This is my first time using this mailing list*, so if this is not the right
place to ask this kind of question, please let me know how I should
proceed.

*Question*
I have downloaded a lot of mathematics-related pages through the MediaWiki
API. Some of them are just *duplicates of the same article*, differing only
in their titles: a different way of naming the same subject, a letter that
differs from one title to another, and so on.

One example that I can show you right away is:

   - "Adição_de_*s*egmentos", and
   - "Adição_de_*S*egmentos",

both written in Portuguese (my native language). The only difference
between the titles is the case of the letter "s". When I tested the URLs,
it seemed that *they are both the same article, reached through different
links that redirect to the official "title".*
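The redirect relationship described above can also be checked through the API itself rather than by opening URLs by hand. The following is a minimal sketch, assuming the pages live on Portuguese Wikipedia (the `API_URL` value is a guess and should be replaced with the wiki the pages came from). It relies on the `redirects` parameter of `action=query`, which makes the API report each redirect as a `from`/`to` pair; the helper names `build_redirect_query_url`, `resolve_targets` and `fetch_json` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint; replace with the wiki the pages were downloaded from.
API_URL = "https://pt.wikipedia.org/w/api.php"


def build_redirect_query_url(titles, api_url=API_URL):
    """Build an action=query URL that asks the API to resolve redirects."""
    params = {
        "action": "query",
        "format": "json",
        "redirects": 1,
        "titles": "|".join(titles),
    }
    return api_url + "?" + urlencode(params)


def resolve_targets(response, titles):
    """Map each requested title to the title the API resolved it to."""
    query = response.get("query", {})
    # "normalized" records cosmetic changes such as underscores -> spaces.
    normalized = {n["from"]: n["to"] for n in query.get("normalized", [])}
    # "redirects" records each redirect page and its target.
    redirects = {r["from"]: r["to"] for r in query.get("redirects", [])}
    resolved = {}
    for title in titles:
        current = normalized.get(title, title)
        seen = set()
        # Follow the chain, guarding against redirect loops.
        while current in redirects and current not in seen:
            seen.add(current)
            current = redirects[current]
        resolved[title] = current
    return resolved


def fetch_json(url):
    """Perform the HTTP request (needs network access)."""
    request = Request(url, headers={"User-Agent": "redirect-check-sketch/0.1"})
    with urlopen(request) as reply:
        return json.load(reply)
```

Calling `resolve_targets(fetch_json(build_redirect_query_url(titles)), titles)` with both spellings should show whether they end up at the same target title; two titles that map to the same value are one article under different names.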

With that kind of duplicate in mind, when I started *to analyse the
statistics of views on a specific article*, going through its variants,
I was expecting to receive the following structure of data:

   - The old (deprecated) titles would hold views until some day X, and then
   they would have nothing further to count and show;
   - The up-to-date titles would have data starting from day X and
   continuing until the last day that I want to analyse.

Nothing too crazy to expect from the database. But that was not what
happened. *There are plenty of articles that are still receiving views even
though they all redirect to another article*. At first, I thought that
people were reaching the article's content through different links
available on search engines, such as Google, so all views must be
independent of one another. The problem is that, after trying different
searches on Google for *the same Wikipedia article, I can only get
the up-to-date articles, not the old ones.*

   1. How can this be possible?
   2. More importantly for me, are all accesses to the deprecated articles
   made by bots, or through old links on old pages of other sites?
   3. Are the counts for the different titles of an article independent?
   4. If so, how could I track all the accesses to a particular subject
   in order to create an effective study of it?
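One possible way to approach questions 2 to 4 is to request the view counts for every known title of a subject separately and then add them up per day. The sketch below assumes the Wikimedia Pageviews REST API (`per-article` endpoint) and a project of `pt.wikipedia.org`, which is a guess; the helper names are hypothetical. As far as I understand, that API reports views under the title that was requested, so a redirect title and its target are counted separately, and the `agent` segment (`user`, `spider`, `all-agents`) gives a rough split between human and crawler traffic.

```python
from urllib.parse import quote

PAGEVIEWS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def build_pageviews_url(project, title, start, end,
                        agent="user", access="all-access"):
    """Build a per-article daily pageviews URL.

    start and end are dates in YYYYMMDD form; the title is sent with
    underscores instead of spaces and percent-encoded.
    """
    article = quote(title.replace(" ", "_"), safe="")
    return "/".join(
        [PAGEVIEWS_BASE, project, access, agent, article, "daily", start, end]
    )


def daily_views(response):
    """Turn one API response into a {timestamp: views} mapping."""
    return {item["timestamp"]: item["views"]
            for item in response.get("items", [])}


def combine_views(per_title):
    """Sum the daily views of several titles of the same subject."""
    total = {}
    for views in per_title.values():
        for day, count in views.items():
            total[day] = total.get(day, 0) + count
    return total
```

Fetching each URL once with `agent="user"` and once with `agent="spider"` for the same title and date range would give a first indication of how much of the traffic on a deprecated title comes from crawlers, while `combine_views` gives the per-day total for the subject across all of its titles.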

Anyway, this is (if I remember correctly) the fourth time I'm trying to get a
proper answer to my question, and I'm hoping I'll get one soon.

Thanks!

Marco Antonio

Undergraduate student in Pure Mathematics at USP | Science Communicator

<https://www.facebook.com/ViaSaber> <https://www.linkedin.com/in/magcastro/>
<https://www.instagram.com/marcoantoniograziano/>
