Hi!
There's a little research project I've been working on for the last few weeks: What are the articles that people are most often looking for in their language, and *cannot* find?
I did this by looking at the logs of searches in the language search box in the interlanguage links panel and counting, for each article, the searches for a language that didn't yield any result.
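As a rough illustration of the counting described above (this is not the actual analysis code; the record fields `article`, `query`, and `result_count` and the sample records are assumptions made up for the example), a minimal sketch in Python could look like this:

```python
from collections import Counter
from typing import Iterable, NamedTuple


class LanguageSearch(NamedTuple):
    """One hypothetical log record from the language search box."""
    article: str       # article the reader was viewing
    query: str         # what the reader typed into the language search box
    result_count: int  # how many languages the search returned


def count_failed_searches(records: Iterable[LanguageSearch]) -> Counter:
    """Count, per (article, normalized query), the searches that returned nothing."""
    failed = Counter()
    for record in records:
        if record.result_count == 0:
            # Normalize the query so "Hebrew" and "hebrew " are counted together.
            failed[(record.article, record.query.strip().casefold())] += 1
    return failed


# Made-up records, only to show the shape of the input.
sample_log = [
    LanguageSearch("Avicii", "Hebrew", 0),
    LanguageSearch("Avicii", "hebrew ", 0),
    LanguageSearch("Avicii", "Spanish", 1),
    LanguageSearch("Blueberry", "Hindi", 0),
]

failed = count_failed_searches(sample_log)
for (article, query), count in failed.most_common():
    print(f"{article!r}, searched for {query!r}: {count} failed searches")
```

Sorting by count with `most_common()` gives the kind of "most wanted" ordering that a report would need; mapping the typed query to an actual language is left out of this sketch.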
This can be useful to editors in different languages for understanding which articles are in demand and should be created. It may also be useful for considering how to reorganize existing articles. Of course, actually doing this is up to the editing communities in each language; I'm just trying to show exactly where this happens.
My first attempt at producing a report about it can be found here: https://meta.wikimedia.org/wiki/Most_wanted_articles_across_languages
This is the first public version of the report, so you may find some issues there, such as contradictory or missing data. Also, the tables could probably be designed more nicely. Bug reports, suggestions for improvement, and all other feedback are welcome. However, I believe this is good enough for taking a first look and reaching some conclusions.
The immediate finding that I can see is that the most notable articles that people cannot find fall into two categories:
- Topics that are popular in the news: "Avengers: Infinity War", "General Data Protection Regulation", "Avicii". In particular, topics that are featured in Google Doodles [1] come up often: "Georges Méliès", "Mahadevi Varma", etc.
- Topics that are covered in another language, but cannot be found there because the information is organized differently. This often happens where there are cultural differences between languages; for example, "Football" in the English Wikipedia refers to several different games (I'd guess that many people around the world are interested in "Association football"). It also often happens with articles about biology and species, such as "Homo sapiens" and "Blueberry", which are organized differently in different Wikipedias.
[1] https://www.google.com/doodles/
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
Excellent. Google also provided a list of some of the most missing items in 13 languages of India as part of Project Tiger.
https://meta.wikimedia.org/wiki/Supporting_Indian_Language_Wikipedias_Progra...
James
On Thu, May 31, 2018 at 10:58 AM, Amir E. Aharoni <amir.aharoni@mail.huji.ac.il> wrote:
[...]
_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe
This is indeed comparable, though it approaches the question from a slightly different angle, and we are doing it entirely ourselves.
Hopefully it will be directly useful to editors, and also for improving the software. For example, we have already used it to improve the search box itself, so that it can find languages by their alternate names, such as "castellano" and "español" for Spanish, and a few more.
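To illustrate the idea of finding a language by any of its names, here is a minimal sketch. It is not the search box's actual implementation; the alias table, the `normalize` helper, and `find_language` are all hypothetical names invented for this example.

```python
import unicodedata
from typing import Optional

# Hypothetical alias table: every known name of a language maps to one code.
LANGUAGE_ALIASES = {
    "es": ["español", "castellano", "Spanish"],
    "he": ["עברית", "Hebrew"],
}


def normalize(name: str) -> str:
    """Normalize Unicode form, surrounding whitespace, and letter case."""
    return unicodedata.normalize("NFC", name).strip().casefold()


# Reverse index from each normalized alias to its language code.
ALIAS_INDEX = {
    normalize(alias): code
    for code, aliases in LANGUAGE_ALIASES.items()
    for alias in aliases
}


def find_language(query: str) -> Optional[str]:
    """Return the language code for an exact alias match, or None."""
    return ALIAS_INDEX.get(normalize(query))


print(find_language("Castellano"))
print(find_language(" ESPAÑOL "))
print(find_language("Klingon"))
```

This sketch only does exact matching after normalization; prefix matching and typo tolerance are left out to keep it short.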
On Thu, May 31, 2018 at 10:41, James Heilman <jmh649@gmail.com> wrote:
[...]
-- James Heilman MD, CCFP-EM, Wikipedian