en:WP:DYK uses a minimum of 1,500 characters of prose, which is a useful cutoff. There is weaponised JavaScript to measure that at en:WP:Did you know/DYKcheck.
It probably doesn't translate to CJK languages, which have radically different information content per character.
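
To make the cutoff concrete, here is a minimal sketch of a prose-length check in Python. It is not a port of DYKcheck: it only approximates prose length by stripping common wikitext markup, and the function names (`strip_templates`, `rough_prose_length`, `is_probable_stub`) are made up for illustration. The 1,500 default is the DYK figure above; the threshold is a parameter because a different value would presumably be needed for CJK languages.

```python
import re

DYK_MIN_PROSE_CHARS = 1500  # cutoff quoted above for en:WP:DYK


def strip_templates(text: str) -> str:
    """Drop {{...}} templates, including nested ones, by tracking brace depth."""
    out = []
    depth = 0
    i = 0
    while i < len(text):
        if text.startswith("{{", i):
            depth += 1
            i += 2
        elif text.startswith("}}", i) and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(text[i])
            i += 1
    return "".join(out)


def rough_prose_length(wikitext: str) -> int:
    """Approximate the number of prose characters in a piece of wikitext."""
    text = re.sub(r"<!--.*?-->", "", wikitext, flags=re.DOTALL)       # comments
    text = re.sub(r"<ref[^>/]*/>", "", text)                          # <ref ... />
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)  # <ref>...</ref>
    text = strip_templates(text)
    # [[Target|label]] -> label, [[Target]] -> Target
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)
    text = re.sub(r"'{2,}", "", text)                                 # bold/italic quotes
    # Keep only lines that look like running prose (no headings, lists, tables).
    prose_lines = [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith(("=", "*", "#", "|", "{|", "!", ":", ";"))
    ]
    return len(" ".join(prose_lines))


def is_probable_stub(wikitext: str, min_chars: int = DYK_MIN_PROSE_CHARS) -> bool:
    """Treat an article as a probable stub if its prose falls below the cutoff."""
    return rough_prose_length(wikitext) < min_chars
```

Since this works on raw wikitext rather than the rendered page, the counts will differ somewhat from what DYKcheck reports.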
cheers stuart
-- ...let us be heard from red core to black sky
On Tue, Sep 20, 2016 at 9:26 PM, Robert West west@cs.stanford.edu wrote:
Hi everyone,
Does anyone know if there's a straightforward (ideally language-independent) way of identifying stub articles in Wikipedia?
Whatever works is ok, whether it's publicly available data or data accessible only on the WMF cluster.
I've found lists for various languages (e.g., Italian https://it.wikipedia.org/wiki/Categoria:Stub or English https://en.wikipedia.org/wiki/Category:All_stub_articles), but the lists are in different formats, so separate code is required for each language, which doesn't scale.
I guess in the worst case, I'll have to grep for the respective stub templates in the respective wikitext dumps, but even this requires knowing what the respective template is for each language. So if anyone could point me to a list of stub templates in different languages, that would also be appreciated.
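
The worst-case approach could be sketched roughly as follows in Python. The per-language patterns in `STUB_TEMPLATE_PATTERNS` are assumptions for illustration only (the `-stub` suffix convention for English and `{{S|...}}` for Italian would need to be verified, and other languages added by hand), and the helper names are hypothetical. The parser assumes the usual `<page>`/`<title>`/`<ns>`/`<text>` layout of MediaWiki XML dumps.

```python
import io
import re
import xml.etree.ElementTree as ET

# ASSUMPTION: illustrative patterns only; each wiki's real stub templates must be checked.
STUB_TEMPLATE_PATTERNS = {
    "en": re.compile(r"\{\{\s*(?:[^{}|]*-)?stub\s*(?:\||\}\})", re.IGNORECASE),
    "it": re.compile(r"\{\{\s*[Ss]\s*(?:\||\}\})"),
}


def local_name(tag: str) -> str:
    """Strip the XML namespace so the code does not depend on the export version."""
    return tag.rsplit("}", 1)[-1]


def iter_articles(xml_stream):
    """Yield (title, wikitext) for main-namespace pages in a MediaWiki XML dump."""
    title = ns = text = None
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "ns":
            ns = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if ns == "0" and title is not None:
                yield title, text or ""
            title = ns = text = None
            elem.clear()  # release the page's children to keep memory bounded


def find_stubs(xml_stream, lang: str):
    """Yield titles of articles whose wikitext matches the language's stub pattern."""
    pattern = STUB_TEMPLATE_PATTERNS[lang]
    for title, text in iter_articles(xml_stream):
        if pattern.search(text):
            yield title


# Tiny made-up dump fragment for demonstration; not real data.
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
<page><title>A</title><ns>0</ns><revision><text>Short text. {{Physics-stub}}</text></revision></page>
<page><title>B</title><ns>0</ns><revision><text>Long article {{cite web|url=x}}</text></revision></page>
<page><title>Talk:A</title><ns>1</ns><revision><text>{{stub}}</text></revision></page>
</mediawiki>"""

if __name__ == "__main__":
    print(list(find_stubs(io.BytesIO(SAMPLE_DUMP), "en")))
```

For a real dump, a stream opened with `bz2.open(path, "rb")` could be passed in place of the in-memory sample. The sketch still needs one pattern per language, which is exactly the scaling problem described above.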
Thanks! Bob
-- Up for a little language game? -- http://www.unfun.me
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l