In Wiktionary, every language edition documents words from every
language, as I am sure you know. A typical wiki page, e.g. "war",
contains information about the English noun as well as the German verb.
Through categories, we also know how many entries there are: how many
English lemmas, how many English nouns, how many German verbs.
But if I want to plot a graph of the growth over time of English nouns
and German verbs, that data is unfortunately not available anywhere.
But it would be possible to generate such data from the history
dump, by finding out when the page "war" was created and when its
English and German sections were created. In SQL terms: for each
combination of page and section (heading), find the earliest date
when that section was present in that page. A practical
implementation would of course solve that as a single-pass filter,
reading the output of bunzip.
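
I have not written this myself, but a minimal sketch in Python of
that single-pass filter could look like the following. It is
untested; it assumes the pages-meta-history dump (element names as in
the MediaWiki export format, with each revision's <timestamp> coming
before its <text>) and that language sections are level-2 headings
like ==English==:

#!/usr/bin/env python
# Sketch: earliest appearance of each language section heading per
# page, read from a pages-meta-history XML dump on standard input.
# Usage: bzcat pages-meta-history.xml.bz2 | python first_sections.py

import re
import sys
import xml.etree.ElementTree as ET

# A level-2 heading such as ==English== (but not ===Noun===)
HEADING = re.compile(r'^==([^=].*?)==\s*$', re.MULTILINE)

first_seen = {}   # (page title, heading) -> earliest timestamp
title = None
timestamp = None

for event, elem in ET.iterparse(sys.stdin.buffer, events=('end',)):
    tag = elem.tag.rsplit('}', 1)[-1]   # strip the XML namespace
    if tag == 'title':
        title = elem.text
    elif tag == 'timestamp':
        timestamp = elem.text
    elif tag == 'text':
        for heading in HEADING.findall(elem.text or ''):
            key = (title, heading.strip())
            # ISO 8601 timestamps compare correctly as strings
            if key not in first_seen or timestamp < first_seen[key]:
                first_seen[key] = timestamp
    elif tag == 'revision':
        elem.clear()    # discard revision text to keep memory bounded
    elif tag == 'page':
        elem.clear()

for (page, section), ts in sorted(first_seen.items()):
    print('%s\t%s\t%s' % (ts, page, section))

Sorting that output by date and counting cumulatively would give the
growth curve per language; the same trick applied to level-3 headings
(===Noun===, ===Verb===) would separate the nouns from the verbs.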
So has anybody already written a program that reads through the
XML dump of articles and their history, and generates statistics
of this kind?
--
Lars Aronsson (lars(a)aronsson.se)
Linköping, Sweden