On 2012-06-09 01:31, Platonides wrote:
That's not a problem. You can process an old dump with the current content-namespace values.
That's right, but if my results are not the same as the results you get, this might cause doubts. As I imagine it, the number of links before and after a GLAM cooperation could be one metric of progress. For that, links in talk pages should not count, but links in content pages should. So I would prefer a robust way of counting the links, one that does not vary with the API.
You should of course store the namespaces you treated as content along with the filtered external links, but you're generating that file anyway.
Good idea.
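For what it's worth, the approach discussed above (counting external links only in pages from a configurable set of content namespaces, read straight from a dump rather than the API) could be sketched roughly like this. This is not the 300-line Perl script from the thread; the function and variable names are my own, and the link regex is a simplification of real wikitext external-link syntax:

```python
import re
import xml.etree.ElementTree as ET

# Namespaces treated as "content" -- an assumption; 0 is the main
# namespace.  Add 6 (File:) for a wiki like Commons where the files
# themselves are the content.
CONTENT_NAMESPACES = {0}

# Rough pattern for external links in wikitext -- a simplification;
# the real syntax also allows protocol-relative and other forms.
EXTLINK_RE = re.compile(r'https?://[^\s\]|<>"]+')

def count_external_links(dump, content_ns=CONTENT_NAMESPACES):
    """Stream a MediaWiki XML dump (filename or file object) and
    total the external links found in content-namespace pages."""
    total = 0
    for _event, elem in ET.iterparse(dump):
        tag = elem.tag.rsplit('}', 1)[-1]   # drop any xmlns prefix
        if tag != 'page':
            continue
        ns, text = None, ''
        for child in elem.iter():
            ctag = child.tag.rsplit('}', 1)[-1]
            if ctag == 'ns':
                ns = int(child.text or 0)
            elif ctag == 'text':
                text = child.text or ''
        if ns in content_ns:
            total += len(EXTLINK_RE.findall(text))
        elem.clear()   # keep memory flat while streaming a large dump
    return total

# Usage (hypothetical dump filename):
# count_external_links('svwiki-pages-meta-current.xml', content_ns={0, 6})
```

Since the namespace set is an explicit parameter, it is easy to write it out next to the filtered link counts, which addresses the reproducibility concern: anyone rerunning the count on the same dump with the same namespace set should get the same number, regardless of what the live API currently reports.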
Another problem is that I want to count links that I find in the File: (ns=6)
There's usually no content there (license templates, fair use rationales...). Given that you won't be correctly computing the external links from transcluded Commons images, I wouldn't count it (except on Commons, where the images are the content).
If there are no local images (as on Swedish Wikipedia), it does no harm to include the File namespace. But on Commons it's the only real content. I also heard the suggestion that Wikisource should allow local uploads (of PDF / DjVu files with scanned books) to get around the overly restrictive admins on Commons.
Should I submit my script (300 lines of Perl) somewhere?
Yes. Probably somewhere at svn.wikimedia.org/mediawiki/trunk/tools
I had the impression that SVN was replaced by Git, but perhaps that's just for MediaWiki core?
Most likely, I'll just use my user page on meta.