On 2012-06-09 01:31, Platonides wrote:
> That's not a problem. You can process an old dump with the
> current contentness values.
That's right, but if my results differ from the results you
get, that could cause doubts. In my imagination, the number of
links before and after a GLAM cooperation could be one metric
of progress. For that, links in talk pages should not count, but
links in content pages should. So I would prefer a robust way of
counting the links, one that does not vary with the API.
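To make the idea concrete, a namespace-filtered link count over a dump can be sketched roughly as below. This is a minimal Python sketch, not the actual Perl script; the link regex, the choice of content namespaces, and the simplified XML (no xmlns declaration, which real dumps do carry) are all assumptions for illustration:

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Rough pattern for bare external links in wikitext; the real syntax is looser.
LINK_RE = re.compile(r'https?://[^\s\]<>"]+')

def count_external_links(source, content_ns=frozenset({0})):
    """Count external links per namespace, skipping non-content namespaces.

    `source` is a pages dump without an xmlns declaration (a simplification;
    a real dump's tag names would need the XML namespace stripped first).
    """
    counts = Counter()
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "page":
            ns = int(elem.findtext("ns") or 0)
            if ns in content_ns:
                text = elem.findtext("revision/text") or ""
                counts[ns] += len(LINK_RE.findall(text))
            elem.clear()  # free memory as we stream through the dump
    return counts

dump = io.StringIO("""<mediawiki>
<page><ns>0</ns><revision><text>See http://runeberg.org/ and [http://example.org/x x]</text></revision></page>
<page><ns>1</ns><revision><text>Talk page, http://example.com/ not counted</text></revision></page>
</mediawiki>""")
print(count_external_links(dump))  # → Counter({0: 2})
```

Because the namespace set is a parameter, the same pass can be repeated over an old dump with whatever content-namespace list is current, which is the point made above.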
> You should of course store the ns values you treated as
> content with the filtered external links, but you're
> generating that file.
Good idea.
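One lightweight way to do that is to write the namespace list into the generated file itself, so any later run can be compared against the same settings. A sketch, where the file name and header format are invented for illustration:

```python
# Record which namespaces were treated as content in the output
# file itself; the header line and file name are hypothetical.
content_ns = [0]  # the choice made for this run
links = [(0, "http://runeberg.org/")]  # (ns, url) pairs from the filter

with open("external_links.tsv", "w") as out:
    out.write("# content_namespaces=%s\n" % ",".join(map(str, content_ns)))
    for ns, url in links:
        out.write("%d\t%s\n" % (ns, url))

print(open("external_links.tsv").read().splitlines()[0])
# → # content_namespaces=0
```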
>> Another problem is that I want to count links that I find
>> in the File: namespace (ns=6).
> There's usually no content there (license templates, fair use
> rationales...). Given that you won't be correctly computing the
> external links from transcluded Commons images, I wouldn't count
> it (except on Commons, where images are the content).
If there are no local images (as on Swedish Wikipedia), it does
no harm to include the File namespace. But on Commons it's
the only real content. I also heard the suggestion that Wikisource
should allow local uploads (of PDF / Djvu files with scanned books)
to navigate around the overly restrictive admins on Commons.
>> Should I submit my script (300 lines of Perl) somewhere?
> Yes. Probably somewhere at
> svn.wikimedia.org/mediawiki/trunk/tools
I had the impression that SVN was replaced by Git, but perhaps
that's just for MediaWiki core? Most likely, I'll just use my
user page on Meta.
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
Project Runeberg - free Nordic literature - http://runeberg.org/