Re: [Wiktionary-l] Historic stats

9 Sep 2017


Hi Lars,
https://dkpro.github.io/dkpro-jwktl/ might be a good starting point for you. It does not solve all steps of your use case right away, but you could save a lot of implementation time compared to starting from scratch. The software is written in Java.
Best,
Christian
-----Original Message-----
From: Wiktionary-l [mailto:wiktionary-l-bounces@lists.wikimedia.org] On Behalf Of Lars Aronsson
Sent: Friday, September 08, 2017 8:56 PM
To: Wikimedia developers
Cc: Wiktionary
Subject: [Wiktionary-l] Historic stats
In Wiktionary, every site/language documents words from every language,
as I am sure you know. A typical wiki page, e.g. "war", contains information
about the English noun as well as the German verb.
Through categories, we also know how many entries there are: how many
English lemmas, how many English nouns, how many German verbs.
But if I want to plot the growth over time of English nouns
and German verbs, that data is not available anywhere, which is a pity.
It would be possible, though, to generate such data from the history
dump, by finding out when the page "war" was created and when its
English and German sections were created. In SQL terms: for each
combination of page and section (heading), find the earliest
date on which that section was present in that page. A practical
implementation would of course do this as a single-pass filter,
reading the stdout of bunzip2.
So has anybody already written a program that reads through the
XML dump of articles and their history, and generates statistics
of this kind?
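
For illustration, the single-pass filter described above might be sketched
in Python roughly as follows. This is only a sketch under assumptions that
are not part of the original message: that language sections are level-2
headings of the form ==English== (the English Wiktionary convention; other
sites differ), that the dump follows the usual MediaWiki export layout
(page, title, revision, timestamp, text), and that ISO 8601 timestamps can
be compared as strings. The names first_seen, local, HEADING and main are
hypothetical.

```python
import re
import sys
import xml.etree.ElementTree as ET

# Assumption: a language section is a level-2 heading such as "==English==".
# Deeper headings like "===Noun===" are deliberately not matched.
HEADING = re.compile(r"^==([^=\n].*?)==\s*$", re.MULTILINE)


def local(tag):
    """Strip the XML namespace that the export format puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def first_seen(stream):
    """Yield (title, section, timestamp) with the earliest revision
    timestamp at which each section heading appears in each page."""
    title = None
    earliest = {}  # section -> earliest timestamp, current page only
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            fields = {local(child.tag): child.text or "" for child in elem}
            ts = fields.get("timestamp", "")
            for match in HEADING.finditer(fields.get("text", "")):
                section = match.group(1).strip()
                # Take the minimum rather than trusting revision order.
                if section not in earliest or ts < earliest[section]:
                    earliest[section] = ts
            elem.clear()  # drop the revision text to keep memory bounded
        elif name == "page":
            for section in sorted(earliest):
                yield title, section, earliest[section]
            earliest.clear()
            elem.clear()


def main():
    # Reads the decompressed XML from stdin, writes tab-separated rows.
    for title, section, ts in first_seen(sys.stdin.buffer):
        print(f"{title}\t{section}\t{ts}")
```

With a call to main() added at the bottom of a script, the decompressed
history dump could be piped into it from bunzip2 -c. Counting, per month,
the rows for a given section name would then give the growth curve for
that language; splitting further by part of speech would need the level-3
headings as well, which this sketch ignores.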
-- 
   Lars Aronsson (lars@aronsson.se)
   Linköping, Sweden


