Hoi,
You write: "But that assumption is doing a lot of work here: it is very likely that topics that are of interest for English Wikipedia users are much better covered than topics for other language communities." This is an understatement. The coverage of women in English Wikipedia is much better than the coverage of Africa. 

We may have an Abstract Wikipedia in languages like Igbo, Yoruba or Zulu, and it will have more articles on the thugs in the US ancien regime than it will have on the government ministers of Africa. If this is not cultural appropriation, what is?

One root cause is the notion of quality in Wikidata: there is no consideration for the data we do not have. One way to define a level zero of quality is to expect data for every region in a ratio similar to what is held for regions like America or Europe. One way to make this more explicit in Wikipedias is to have red and black links based on Wikidata items. The most relevant benefit: Wikipedias will finally benefit from the information held in other Wikipedias.
Thanks,
       GerardM

On Fri, 15 Jul 2022 at 18:27, Denny Vrandečić <dvrandecic@wikimedia.org> wrote:
An on-wiki version of this newsletter can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-07-15

--

Recently, we were asked about the potential impact that Abstract Wikipedia might have. So we made a Fermi estimate, or a “back-of-the-envelope” calculation if you like. Please consider these estimates a first draft. We welcome your feedback and input in order to improve them.

Here we provide the estimates for two questions: how many additional readers and how many additional contributors can we reach with the help of Abstract Wikipedia?

Note that the answers we provide here are not targets that we need to meet or expectations for the project in order to call ourselves a success, but rather they try to model an idea of how much growth Abstract Wikipedia could provide.

How many additional readers might we reach with the help of Abstract Wikipedia?

  1. Today, there are 4.6 billion Internet users [1] — more than half of the world’s population of 7.9 billion [2].
  2. English Wikipedia has 800 million readers [3].
  3. Across all Wikipedias we have about 1.8 billion readers [4].
  4. 1.2 billion Internet users speak English [5].
  5. So English Wikipedia reaches 67% of English-speaking Internet users (Divide (2) by (4))
  6. If we could make that independent of the language, we would have 3.1 billion readers (Multiply (1) by (5))
  7. So the growth potential is up to 1.3 billion new Wikipedia readers (Subtract (3) from (6))
  8. But: there are 850 million Internet users in China [6]
  9. Wikipedia is blocked in China [7]. Blocking a Wikipedia hampers growth considerably [8]. After accounting for censorship in China, the growth potential is up to 700 million new Wikipedia readers (That’s ((1)-(8))x(5)-(3))
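
The arithmetic in steps (1) to (9) can be replayed with a few lines of Python. This is only a minimal sketch for checking the numbers: the variable and function names are made up for illustration and are not part of the newsletter or of Wikifunctions, and the inputs are the rounded figures quoted in the list.

```python
# Inputs copied from steps (1), (2), (3), (4) and (8) of the readers estimate.
# All names here are illustrative assumptions, not an existing API.
internet_users = 4.6e9          # (1) Internet users worldwide
enwiki_readers = 0.8e9          # (2) English Wikipedia readers
all_wiki_readers = 1.8e9        # (3) readers across all Wikipedias
english_internet_users = 1.2e9  # (4) English-speaking Internet users
china_internet_users = 0.85e9   # (8) Internet users in China


def reader_growth_potential(reachable_users):
    """Apply English Wikipedia's reach (5) to a user base, then subtract current readers (3)."""
    reach = enwiki_readers / english_internet_users  # (5), about 67%
    return reachable_users * reach - all_wiki_readers


growth_all = reader_growth_potential(internet_users)                                # step (7)
growth_excl_china = reader_growth_potential(internet_users - china_internet_users)  # step (9)

print(f"Growth potential, all users: {growth_all / 1e9:.1f} billion")
print(f"Growth potential, excluding China: {growth_excl_china / 1e9:.1f} billion")
```

With the rounded inputs above this gives roughly 1.3 billion and 0.7 billion, in line with steps (7) and (9).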

We want to repeat the Foundation’s call to Chinese authorities to lift the block on Wikipedia in the People’s Republic of China [9]. We are committed to allowing everyone, everywhere to freely access, share, and participate in knowledge on Wikipedia.

This modeling does not account for further growth in Internet access across the globe, which would tend to be even more biased towards speakers of languages other than English. This model assumes that people would in general be equally interested in encyclopedic content if it were available to them in their language.

How many new contributors might we reach with the help of Abstract Wikipedia?

  10. English Wikipedia has 130,000 active contributors [10].
  11. All Wikipedias together have about 315,000 active contributors.
  12. The ratio of English Wikipedia contributors to English-speaking Internet users is about 1 : 9,200 (i.e. one out of 9,200 English-speaking Internet users is an active Wikipedian) (Divide (4) by (10))
  13. If we assume the same ratio over all Internet users across all languages, we should have 500,000 contributors (Multiply (1) by (12))
  14. This means that, assuming that we reach a more equitable distribution of contributors across languages, we would have a potential of 185,000 new contributors outside of English Wikipedia (Subtract (11) from (13))
  15. That would double our current number of Wikipedia contributors outside of English Wikipedia (Subtract (10) from (11) to get the current number of contributors)
  16. Accounting for censorship in China results in a potential of 92,500 new contributors (That’s ((1)-(8))x(12)-(11))
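
The contributor estimate above can be checked the same way. Again this is a hedged sketch with invented names; it uses the rounded ratio of one contributor per 9,200 Internet users stated in the list (the unrounded ratio, 1.2 billion divided by 130,000, is closer to 9,231 and gives slightly lower results).

```python
# Inputs copied from the readers and contributors lists.
# All names here are illustrative assumptions, not an existing API.
internet_users = 4.6e9           # Internet users worldwide
china_internet_users = 0.85e9    # Internet users in China
enwiki_contributors = 130_000    # active contributors on English Wikipedia
all_wiki_contributors = 315_000  # active contributors on all Wikipedias
users_per_contributor = 9_200    # rounded ratio quoted in the list


def expected_contributors(reachable_users):
    """Contributors we would expect if every language had English Wikipedia's ratio."""
    return reachable_users / users_per_contributor


potential_all = expected_contributors(internet_users) - all_wiki_contributors
potential_excl_china = (
    expected_contributors(internet_users - china_internet_users) - all_wiki_contributors
)
current_non_english = all_wiki_contributors - enwiki_contributors

print(f"Potential new contributors: {potential_all:,.0f}")
print(f"Potential excluding China: {potential_excl_china:,.0f}")
print(f"Current contributors outside English Wikipedia: {current_non_english:,}")
```

With these inputs the first figure is 185,000, the same as the current number of contributors outside English Wikipedia (hence "double"), and the China-adjusted figure comes out at about 92,600, close to the 92,500 quoted.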

This is assuming that the ratio of contributors to Wikipedia would be equal in each language. Right now, for example, the goal of reaching an up-to-date Wikipedia in many languages might seem unrealistic or overwhelming, so potential contributors choose not to engage. Abstract Wikipedia might make this goal seem more realistically achievable, which might lead to a more similar ratio to what we see in English. This and other similar arguments are discussed in more detail in the Wikipedia@20 article. Other considerations, such as easier access to knowledge or more leisure time [11], are not taken into account by our model.

Aren’t these models missing something?

Yes, they are. The readership model assumes that we can apply the interest of English-speaking Internet users in Wikipedia to estimate how many readers we are missing in the other languages. But that assumption is doing a lot of work here: it is very likely that topics that are of interest for English Wikipedia users are much better covered than topics for other language communities.

And this is where the estimate about new contributors comes in: the hope is that more contributors in under-represented languages will be covering the topics that are of particular interest for these languages (whether in Abstract Wikipedia, or in their own language edition), particularly because these contributors wouldn’t have to spend as much time covering the gaps in the common knowledge baseline.

What we are also not considering is that by covering these additional topics and providing novel contribution models, we might potentially also reach more readers and contributors even in English. Such an increase would, if propagated to the other languages, lead to even more readers and contributors worldwide.

Besides that, there are many other contributing factors that have been discussed in the literature, such as internet skills, awareness of Wikipedia, etc. (see, for example, [12][13][14][15]).

I wish I could play around with these calculations!

Right? And that’s one possible use case for Wikifunctions: we will be able to turn the models we have described above into functions in Wikifunctions. Then we can directly discuss and improve these models on Wikifunctions, update them with new numbers, and show differing models. The model above doesn’t go beyond looking up and combining a few numbers. On Wikifunctions we could then work together on improving these models, adding more interesting relationships between topic coverage, readership, and contributor numbers.

Your feedback to further improve the model is very welcome! Thanks a lot to Isaac Johnson and Joseph Allemandou for their valuable feedback.

Workstream updates (as of July 1, 2022)

Performance:

  • Researched options for adding uptime monitoring in Beta Cluster
  • Continued progress on end-to-end testing using the Beta Cluster and migration of the tester pipeline from orchestrator into MediaWiki

NLG:

  • Aligned on the NLG goals

Meta-data:

  • Extended metadata dialog module to support objects as values, links, i18n of keys

Experience:

  • Fixed function-orchestrator and function-evaluator tests, and Wikilambda ApiFunctionCall examples
  • Merged interactivity components for tester and implementation tables
  • Finished API pagination changes for wikilambdafn_search
  • Replaced custom toast component with Codex message and modified edit button for responsive screens

Notes

  1.  https://www.statista.com/statistics/617136/digital-population-worldwide/
  2.  https://www.worldometers.info/world-population/
  3.  Note: we use the unique devices metric as a proxy for readers. https://stats.wikimedia.org/#/en.wikipedia.org/reading/unique-devices/normal%7Cline%7C1-year%7C(access-site)~mobile-site*desktop-site%7Cmonthly
  4.  Note: we use the unique devices metric as a proxy for readers. https://stats.wikimedia.org/#/all-wikipedia-projects/reading/unique-devices/normal%7Cline%7C1-year%7C(access-site)~mobile-site*desktop-site%7Cmonthly
  5.  https://www.internetworldstats.com/stats7.htm
  6.  https://www.statista.com/statistics/262966/number-of-internet-users-in-selected-countries/
  7.  https://en.wikipedia.org/wiki/Censorship_of_Wikipedia#China
  8.  https://wikimania2012.wikimedia.org/wiki/Submissions/Have_you_heard_about_the_Uzbek_Wikipedia%3F_Neither_have_the_Uzbeks.
  9.  https://diff.wikimedia.org/2019/05/17/wikimedia-foundation-urges-chinese-authorities-to-lift-block-of-wikipedia-in-china/
  10.  https://stats.wikimedia.org/#/en.wikipedia.org/contributing/editors/normal%7Cline%7C1-year%7Ceditor_type~group-bot*name-bot*user%7Cmonthly
  11.  https://ourworldindata.org/time-use#do-workers-in-richer-countries-work-longer-hours
  12.  https://academic.oup.com/joc/article-abstract/68/1/143/4915319
  13.  https://www.nulearningforlife.org/wp-content/uploads/2016/08/Hargittai_Shaw-2015-Mind_the_skills_gap-ICS.pdf
  14.  https://arxiv.org/pdf/2008.12314.pdf#subsection.3.3
  15.  https://arxiv.org/pdf/2008.12314.pdf#subsection.4.3
_______________________________________________
Abstract-Wikipedia mailing list -- abstract-wikipedia@lists.wikimedia.org
List information: https://lists.wikimedia.org/postorius/lists/abstract-wikipedia.lists.wikimedia.org/