Hi everyone,
Does anyone know if there's a straightforward (ideally language-independent) way of identifying stub articles in Wikipedia?
Whatever works is ok, whether it's publicly available data or data accessible only on the WMF cluster.
I've found lists for various languages (e.g., Italian https://it.wikipedia.org/wiki/Categoria:Stub or English https://en.wikipedia.org/wiki/Category:All_stub_articles), but the lists are in different formats, so separate code is required for each language, which doesn't scale.
I guess in the worst case, I'll have to grep for the respective stub templates in the respective wikitext dumps, but even this requires knowing, for each language, what the respective template is. So if anyone could point me to a list of stub templates in different languages, that would also be appreciated.
Thanks! Bob
Hi Robert, one solution may be to use a query on Wikidata to retrieve the name of the stub category in all the different languages. Then you could use a tool like PetScan to retrieve all the pages in those categories, or write your own tool using either a query on the database or the MediaWiki API. You can find a sample solution here: http://paws-public.wmflabs.org/paws-public/3270/Stub%20categories.ipynb
I wrote that thing while on a train, so it may be messy and/or sub-optimal. I would like to thank Alex Monk and Yuvi Panda for their help with SQL on paws today.
Best, Giuseppe
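For anyone who wants a starting point outside the notebook, here is a rough sketch of the two steps described above (Wikidata for the per-language category names, then the MediaWiki API for the category members). It is not the notebook's code. The function names, the `fetch` parameter and the User-Agent string are made up for illustration, and the Wikidata item ID of the stub category is deliberately left as an argument that has to be looked up by hand.

```python
import json
from urllib.parse import urlencode, urlparse
from urllib.request import Request, urlopen

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
# Placeholder User-Agent (assumption): replace with your own contact details.
USER_AGENT = "stub-finder-sketch/0.1 (replace with your contact details)"


def get_json(api_url, params):
    """GET a MediaWiki-style API endpoint and decode the JSON response."""
    request = Request(api_url + "?" + urlencode(params),
                      headers={"User-Agent": USER_AGENT})
    with urlopen(request) as response:
        return json.load(response)


def wikipedia_categories(entity):
    """Map Wikipedia host -> category title for one Wikidata entity.

    Expects the entity as returned by wbgetentities with
    props=sitelinks/urls; sitelinks to non-Wikipedia projects are skipped.
    """
    categories = {}
    for link in entity.get("sitelinks", {}).values():
        host = urlparse(link.get("url", "")).netloc
        if host.endswith(".wikipedia.org"):
            categories[host] = link["title"]
    return categories


def stub_categories(qid):
    """Fetch the per-language category titles linked to Wikidata item `qid`."""
    data = get_json(WIKIDATA_API, {
        "action": "wbgetentities",
        "ids": qid,
        "props": "sitelinks/urls",
        "format": "json",
    })
    return wikipedia_categories(data["entities"][qid])


def category_members(host, category, fetch=get_json):
    """Yield titles of main-namespace pages directly inside `category`.

    Follows the API's continuation tokens until the listing is exhausted.
    `fetch` is injectable so the paging logic can be tried offline.
    """
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmnamespace": 0,
        "cmlimit": "max",
        "format": "json",
    }
    while True:
        data = fetch(f"https://{host}/w/api.php", params)
        for page in data["query"]["categorymembers"]:
            yield page["title"]
        if "continue" not in data:
            break
        params = {**params, **data["continue"]}
```

Usage would be along the lines of `for host, cat in stub_categories(qid).items(): titles = list(category_members(host, cat))`, with `qid` being the ID of the stub category item on Wikidata. Note that this only lists direct members. On wikis where the stub category is split into topical subcategories, one would also have to walk the subcategories (for instance with `cmtype=subcat`), which the sketch does not do.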
In reply to Robert West (west@cs.stanford.edu), 2016-09-20 11:26 GMT+02:00.
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics