Hi folks.
*This is my first time using this mailing list*, so if this is not the right
place to ask this kind of question, please let me know how I should
proceed.
*Question*
I have downloaded a lot of mathematics-related pages through the MediaWiki
API. Some of them are just *duplicates of the same article*, differing only
in their title: a different way of naming the same subject, a single letter
that differs between one title and another, and so on.
One example that I can show you right away is:
- "Adição_de_*s*egmentos", and
- "Adição_de_*S*egmentos",
both written in Portuguese (my native language). The only difference
between the titles is the lowercase versus uppercase letter "s". When I
tested the URLs, it seemed that *both are the same article, with the
different links redirecting to the official title.*
Keeping those kinds of duplicates in mind, when I started *to analyse the
view statistics of a specific article* and went through its variants,
I was expecting the data to have the following structure:
- The old (deprecated) titles would have views until some day X, and
after that there would be nothing further to count and show;
- The up-to-date titles would have data starting from day X and
continuing until the last day that I want to analyse.
Nothing too crazy to expect from the database. But that was not what
happened. *There are plenty of titles that are still receiving views even
though they all redirect to another article*. At first, I thought that
people were reaching the articles' content through different links
available on search engines such as Google, so the views of each title
would be independent of one another. The problem is that, after trying
different Google searches for *the same Wikipedia article, I only get the
up-to-date titles, not the old ones.*
1. How can this be possible?
2. More importantly for me, are all accesses to the deprecated titles
made by bots, or through old links still available on pages of other sites?
3. Are the counts for each of an article's titles independent?
4. If so, how could I track all the accesses to a particular subject,
across all of its titles, to make an effective study of it?
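
To make question 4 concrete, this is a minimal sketch of what I imagine
doing, assuming that views really are counted separately per requested
title. It builds the request URLs for the MediaWiki `prop=redirects` query
and for the Wikimedia Pageviews per-article endpoint, and sums the "views"
fields of the responses. The function names are hypothetical, the project
"pt.wikipedia.org" is only a placeholder for whichever wiki the pages came
from, and the actual HTTP requests are left out.

```python
from urllib.parse import quote, urlencode

# Base of the Wikimedia Pageviews per-article endpoint.
PAGEVIEWS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def redirects_query_url(title, api="https://pt.wikipedia.org/w/api.php"):
    """URL asking MediaWiki which titles redirect to `title`.

    The `api` default is an assumption; use the wiki the pages came from.
    """
    params = {
        "action": "query",
        "prop": "redirects",
        "titles": title,
        "rdlimit": "max",
        "format": "json",
    }
    return api + "?" + urlencode(params)


def redirect_titles(response):
    """Extract redirect titles from a parsed `prop=redirects` JSON response."""
    pages = response.get("query", {}).get("pages", {})
    return [
        redirect["title"]
        for page in pages.values()
        for redirect in page.get("redirects", [])
    ]


def pageviews_url(title, start, end, project="pt.wikipedia.org", agent="user"):
    """Daily pageviews URL for one title; dates are YYYYMMDD strings.

    agent="user" asks for views not classified as bots; "all-agents"
    would include them. The `project` default is a placeholder.
    """
    article = quote(title.replace(" ", "_"), safe="")
    return f"{PAGEVIEWS_BASE}/{project}/all-access/{agent}/{article}/daily/{start}/{end}"


def total_views(responses):
    """Sum the "views" field over several parsed pageviews responses."""
    return sum(
        item["views"]
        for response in responses
        for item in response.get("items", [])
    )
```

The idea would be to fetch one pageviews response for the official title
and one for each redirect title, then pass all of them to `total_views`.
Comparing agent="user" with agent="all-agents" for a deprecated title
might also give a hint about question 2.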
Anyway, this is (if I remember correctly) the fourth time I'm trying to get
a proper answer to my question, and I'm hoping I'll get it soon.
Thanks!
Marco Antonio
Undergraduate student in Pure Mathematics at USP | Science Communicator
<https://www.facebook.com/ViaSaber> <https://www.linkedin.com/in/magcastro/>
<https://www.instagram.com/marcoantoniograziano/>