And to get a list of all wikis, to plug into that URL instead of
"en.wikipedia.org", the most up-to-date information is here:
g/w/api.php?action=sitematrix&formatversion=2&format=json&maxage=3600&smaxage=3600
Sometimes new sites won't have data in the AQS API for a month or two
until we add them and start crunching their stats.
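As a rough illustration of how the sitematrix output could be turned into a list of hostnames to plug into that URL, here is a minimal Python sketch. It assumes the response has already been fetched and decoded (for example with json.loads), and that it has the shape I believe formatversion=2 returns: a "sitematrix" object whose language groups sit under numeric string keys, each with a "site" list whose entries carry "url", "code" ("wiki" for Wikipedias) and an optional "closed" flag. The function name wikipedia_hosts and the sample data are made up for the example, so check them against a real response before relying on this.

```python
def wikipedia_hosts(sitematrix_response, include_closed=False):
    """Return sorted hostnames of Wikipedia sites in a decoded sitematrix response.

    Assumed (not verified here) response shape:
    {"sitematrix": {"count": N, "0": {"code": "aa", "site": [...]}, ..., "specials": [...]}}
    """
    matrix = sitematrix_response["sitematrix"]
    hosts = []
    for key, group in matrix.items():
        # Language groups are keyed by numeric strings; skip "count" and "specials".
        if not key.isdigit():
            continue
        for site in group.get("site", []):
            # Assumption: Wikipedia projects are the entries with code "wiki".
            if site.get("code") != "wiki":
                continue
            # "closed" may be True (formatversion=2) or "" (formatversion=1);
            # treat any value other than False/None as closed.
            is_closed = site.get("closed", False) not in (False, None)
            if is_closed and not include_closed:
                continue
            # Strip the scheme so the result can replace "en.wikipedia.org".
            hosts.append(site["url"].split("://", 1)[-1])
    return sorted(hosts)


# Hypothetical, hand-written sample in the assumed shape (not real API output).
sample = {
    "sitematrix": {
        "count": 4,
        "0": {"code": "aa", "site": [
            {"url": "https://aa.wikipedia.org", "dbname": "aawiki",
             "code": "wiki", "closed": True},
        ]},
        "1": {"code": "en", "site": [
            {"url": "https://en.wikipedia.org", "dbname": "enwiki",
             "code": "wiki"},
            {"url": "https://en.wiktionary.org", "dbname": "enwiktionary",
             "code": "wiktionary"},
        ]},
        "specials": [
            {"url": "https://commons.wikimedia.org", "dbname": "commonswiki",
             "code": "commons"},
        ],
    }
}

print(wikipedia_hosts(sample))
```

Closed wikis are skipped by default since they are unlikely to have fresh stats; pass include_closed=True if you want them in the list anyway.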
The way I figured this out is to look at how our UI uses the API:
.
So if you were interested in something else, you can browse around there
and take a look at the XHR requests in the browser console. Have fun!
On Thu, Mar 29, 2018 at 12:54 AM, Zainan Zhou (a.k.a Victor) <
zzn(a)google.com> wrote:
Hi Dan,
How are you? This is Victor. It's been a while since we met at the 2018
Wikimedia Dev Summit. I hope you are doing great.
As I mentioned to you, my team works on extracting knowledge from
Wikipedia. We are currently working on a project that expands language
coverage. My teammate Yuan Gao (cc'ed here) is the tech lead of this
project. She plans to *monitor the list of all currently available
Wikipedia sites and the number of articles for each language*, so
that we can compare it with our extraction system's output to sanity-check
whether there is a massive breakage of the extraction logic, or whether we
need to add/remove languages when a Wikipedia site is added to or removed
from the Wikipedia family.
I think your Analytics team at Wikimedia probably knows best where we
can find this data. Here are 4 places we already know of, but they don't
seem to have the data:
- https://en.wikipedia.org/wiki/List_of_Wikipedias has the information
  we need, but the list is edited manually, not automatically.
- https://stats.wikimedia.org/EN/Sitemap.htm has the full list, but the
  information seems pretty out of date (last updated almost a month ago).
- StatsV2 UI: https://stats.wikimedia.org/v2/#/all-projects. I can't
  find the full list or the number of articles there.
- API: https://wikimedia.org/api/rest_v1/, suggested by elukey on the
  #wikimedia-analytics channel. It doesn't seem to have number-of-articles
  information.
Do you know of a good place to find this information? Thank you!
Victor
Zainan Zhou (周载南) a.k.a. "Victor" <http://who/zzn>
Software Engineer, Data Engine
Google Inc.
zzn(a)google.com <ecarmeli(a)google.com> - 650.336.5691
1600 Amphitheathre Pkwy, LDAP zzn, Mountain View 94043
---------- Forwarded message ----------
From: Yuan Gao <gaoyuan(a)google.com>
Date: Wed, Mar 28, 2018 at 4:15 PM
Subject: Monitor the number of Wikipedia sites and the number of
articles in each site
To: Zainan Victor Zhou <zzn(a)google.com>
Cc: Wenjie Song <wenjies(a)google.com>, WikiData <wikidata(a)google.com>
Hi Victor,
As we discussed in the meeting, I'd like to monitor:
1) the number of Wikipedia sites
2) the number of articles in each site
Can you help us contact WMF to get a real-time, or at least daily,
update of these numbers? What we can find now is
https://en.wikipedia.org/wiki/List_of_Wikipedias, but the number of
Wikipedia sites is manually updated and possibly out of date.
The monitor can help us catch such bugs.
--
Yuan Gao
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org