Thanks a bunch to everyone who chimed in here. These hints helped us
make quite a bit of progress!
Bob
On Wed, Sep 21, 2016 at 12:50 AM, Giuseppe Profiti
<profgiuseppe(a)gmail.com> wrote:
[forwarding my answer from analytics ml, I forgot to
subscribe to this list too]
Hi Robert,
one solution may be to query Wikidata for the name of the stub
category in each language. Then you could use a tool like PetScan to
retrieve all the pages in those categories, or write your own tool
using either a query on the database or the MediaWiki API.
You can find a sample solution here:
http://paws-public.wmflabs.org/paws-public/3270/Stub%20categories.ipynb
I wrote that thing while on a train, so it may be messy and/or sub-optimal.
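Independent of the notebook linked above, the approach can be sketched
roughly as follows. This is only a minimal illustration under several
assumptions: the Wikidata item ID of the stub category is passed in as a
parameter and is not filled in here, the sitelinks are read through the
Wikidata wbgetentities API rather than a SPARQL query, the mapping from
site keys such as "enwiki" to host names is a simple heuristic, and the
NON_WIKIPEDIA set is almost certainly incomplete. All function names and
the User-Agent string are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Site keys that end in "wiki" but are not Wikipedias (assumed, incomplete).
NON_WIKIPEDIA = {"commonswiki", "specieswiki", "metawiki",
                 "wikidatawiki", "mediawikiwiki", "sourceswiki"}


def sitelinks_url(item_id):
    """Build a wbgetentities URL that asks only for the item's sitelinks."""
    params = {"action": "wbgetentities", "ids": item_id,
              "props": "sitelinks", "format": "json"}
    return "https://www.wikidata.org/w/api.php?" + urllib.parse.urlencode(params)


def stub_categories(entity_json, item_id):
    """Map Wikipedia host name -> local title of the stub category."""
    links = entity_json["entities"][item_id]["sitelinks"]
    result = {}
    for site, link in links.items():
        if site.endswith("wiki") and site not in NON_WIKIPEDIA:
            # Heuristic: "zh_min_nanwiki" -> "zh-min-nan.wikipedia.org"
            host = site[:-len("wiki")].replace("_", "-") + ".wikipedia.org"
            result[host] = link["title"]
    return result


def category_members_url(host, category_title, cont=None):
    """Build a MediaWiki API URL listing the members of one category."""
    params = {"action": "query", "list": "categorymembers",
              "cmtitle": category_title, "cmlimit": "max", "format": "json"}
    if cont:
        params["cmcontinue"] = cont
    return "https://" + host + "/w/api.php?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Fetch a URL and decode the JSON response (needs network access)."""
    req = urllib.request.Request(
        url, headers={"User-Agent": "stub-category-sketch/0.1 (example)"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_members(host, category_title):
    """Yield titles of all direct members, following API continuation."""
    cont = None
    while True:
        data = fetch_json(category_members_url(host, category_title, cont))
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        cont = data.get("continue", {}).get("cmcontinue")
        if not cont:
            break
```

With a real item ID in hand, one would call
stub_categories(fetch_json(sitelinks_url(item_id)), item_id) and then
iter_members(host, title) for each entry. Note that this lists direct
members only. Stub categories are often split into subcategories, so a
complete list would need a recursive walk, which PetScan can do through
its depth setting.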
I would like to thank Alex Monk and Yuvi Panda for their help with SQL
on paws today.
Best,
Giuseppe
2016-09-20 11:26 GMT+02:00 Robert West <west(a)cs.stanford.edu>:
Hi everyone,
Does anyone know if there's a straightforward (ideally language-independent)
way of identifying stub articles in Wikipedia?
Whatever works is ok, whether it's publicly available data or data
accessible only on the WMF cluster.
I've found lists for various languages (e.g., Italian or English), but the
lists are in different formats, so separate code is required for each
language, which doesn't scale.
I guess in the worst case, I'll have to grep for the respective stub
templates in the respective wikitext dumps, but even this requires knowing,
for each language, what the respective template is. So if anyone could point
me to a list of stub templates in different languages, that would also be
appreciated.
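To make the worst-case fallback concrete, here is a rough sketch of the
template grep. It assumes that a per-language name pattern is already
known, which is exactly the missing piece. The only entry shown is for
English, on the assumption that stub templates there are named "stub" or
end in "-stub". The function names and the PATTERNS table are
hypothetical.

```python
import re


def make_stub_matcher(name_pattern):
    """Compile a regex matching '{{<name>}}' or '{{<name>|...}}' in wikitext."""
    return re.compile(
        r"\{\{\s*(?:" + name_pattern + r")\s*(?:\||\}\})",
        re.IGNORECASE,
    )


# One pattern per language; only English is filled in here (assumed naming).
PATTERNS = {
    "en": make_stub_matcher(r"(?:[^{}|]*-)?stub"),
}


def is_stub(wikitext, lang):
    """Return True if the wikitext transcludes a stub template for lang."""
    return PATTERNS[lang].search(wikitext) is not None
```

Every additional language needs its own entry in PATTERNS, so this does
not remove the per-language effort. It only confines it to one table.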
Thanks!
Bob
--
Up for a little language game? --
http://www.unfun.me
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l