On Wed, Aug 1, 2018 at 3:07 PM Yuan Gao gaoyuan@google.com wrote:
Hi Tilman, our team (the team at Google working on extracting knowledge from Wikipedia) has just compared our crawled data with https://meta.wikimedia.org/wiki/List_of_Wikipedias/Table. For the following sites, we see quite significant diffs:
The statistics Special Page for bo.wikipedia provides the following counts as of today:
Content pages: 5,818 https://bo.wikipedia.org/w/index.php?title=Special:AllPages&hideredirects=1
Pages (all pages in the wiki, including talk pages, redirects, etc.): 16,498 https://bo.wikipedia.org/wiki/Special:AllPages
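If it helps with automating the comparison, the same two counters can also be read from the MediaWiki API (`action=query&meta=siteinfo&siprop=statistics`) instead of the Special page. Below is a minimal sketch, not a tested tool: the helper names (`statistics_url`, `parse_counts`, `fetch_counts`) and the User-Agent string are placeholders I made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def statistics_url(host):
    """Build the siteinfo/statistics API URL for one wiki host."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return f"https://{host}/w/api.php?{urlencode(params)}"


def parse_counts(response):
    """Return (content pages, all pages) from a decoded API response."""
    stats = response["query"]["statistics"]
    return stats["articles"], stats["pages"]


def fetch_counts(host):
    """Fetch live counts; needs network access.

    The User-Agent value is a placeholder (assumption); replace it with
    one that identifies your own client.
    """
    request = Request(
        statistics_url(host),
        headers={"User-Agent": "article-count-check/0.1 (example)"},
    )
    with urlopen(request) as reply:
        return parse_counts(json.load(reply))
```

Calling `fetch_counts("bo.wikipedia.org")` should return the "Content pages" and "Pages" figures as a tuple, which can then be compared directly against the crawled totals.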
A content page (article), according to the software documentation, is defined as follows: "The automatic definition used by the software at Special:Statistics https://en.wikipedia.org/wiki/Special:Statistics is: *any page that is in the article namespace, is not a redirect page https://en.wikipedia.org/wiki/Wikipedia:Redirect and contains at least one wiki link*." Could your definition be broader than the MediaWiki one? https://en.wikipedia.org/wiki/Wikipedia:What_is_an_article%3F#Lists_of_artic... Another thing I would check is whether Google may be including duplicate results.
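To make the quoted definition concrete, here is a rough sketch of how a crawler could apply it and drop duplicates before counting. The `Page` record and its fields are hypothetical (I do not know what your crawled data looks like), and testing for `[[` in the wikitext is only an approximation of "contains at least one wiki link".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    # Hypothetical record shape for one crawled page (assumption).
    title: str
    namespace: int      # 0 is the article (main) namespace
    is_redirect: bool
    wikitext: str


def is_article(page):
    """Apply the three quoted conditions to a single page."""
    return (
        page.namespace == 0
        and not page.is_redirect
        and "[[" in page.wikitext  # approximation of "has a wiki link"
    )


def count_articles(pages):
    """Count articles, skipping pages already seen under the same title."""
    seen = set()
    count = 0
    for page in pages:
        key = (page.namespace, page.title)
        if key in seen:
            continue  # duplicate crawl result
        seen.add(key)
        if is_article(page):
            count += 1
    return count
```

Running `count_articles` over the crawl for one wiki and comparing the result with the "Content pages" figure would show whether redirects, other namespaces, link-free pages, or duplicates account for the difference.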
There could be some amount of caching in both the statistics calculation and the rendering of those pages, although probably not enough to double the number of articles.