On Thu, 28 Apr 2016 at 15:55 +0200, Alex Brollo wrote:
> Very interesting.
>
> Have you any suggestion about finding the list of not transcluded pages? I
> can imagine, to get by a bot html of ns0 main page and all its subpages
> related to a Index page, then parsing it to get the list of existing page
> links; is there any simpler strategy?
>
> Alex

If you have access to the database, the simplest way is to reuse the code of
this tool:
https://github.com/phil-el/phetools/blob/master/statistics/not_transcluded.py
The function not_transcluded() is nearly what you need. I'll probably show the
list of pages not transcluded in a future version, but this tool builds that
list for every Index: page on a wiki and the query takes a few minutes, so
it's not handy for checking the transclusion status of a single index.

To get the list for only one index, it'll be easier to use the API:
1) get all links on the Index: page, filtered to the Page: namespace;
2) use the embeddedin API to get all transclusions from ns:0.
The result of 1) minus the result of 2) is what you are looking for. You can
do 1) in one request, and you can probably also get the proofread status with
the same request, as you are probably only interested in yellow or green pages
that are not transcluded. 2) is perhaps possible in only one request, I don't
remember. A tool like that, complementing mine, could be very useful. It's
possible I'll provide a simpler API on toollabs to do that.
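
A rough sketch of steps 1) and 2) in Python, in case it helps. This is not
code from phetools, only an illustration of the two API calls described above.
Assumptions: the endpoint URL, the function names, and the Page: namespace
number (it differs between wikis, so it is passed in as a parameter) are all
mine. It sends one embeddedin request per page rather than a single request,
and it does not filter out red links or look at the proofread status.

```python
import json
import urllib.parse
import urllib.request

# Assumption: replace with the api.php endpoint of your own Wikisource.
API_URL = "https://en.wikisource.org/w/api.php"


def api_get(params, api_url=API_URL):
    """Send one GET request to the MediaWiki API and return the decoded JSON."""
    query = urllib.parse.urlencode(dict(params, format="json"))
    request = urllib.request.Request(
        api_url + "?" + query,
        headers={"User-Agent": "not-transcluded-sketch/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def query_all(params, fetch):
    """Yield each response of a query, following the API's 'continue' tokens."""
    params = dict(params, action="query")
    while True:
        data = fetch(params)
        yield data
        if "continue" not in data:
            break
        params = dict(params, **data["continue"])


def index_page_links(index_title, page_ns, fetch=api_get):
    """Step 1: titles linked from the Index: page, limited to namespace page_ns."""
    params = {"prop": "links", "titles": index_title,
              "plnamespace": page_ns, "pllimit": "max"}
    titles = []
    for data in query_all(params, fetch):
        for page in data.get("query", {}).get("pages", {}).values():
            titles.extend(link["title"] for link in page.get("links", []))
    return titles


def is_transcluded(page_title, fetch=api_get):
    """Step 2: True if at least one ns:0 page embeds page_title."""
    params = {"action": "query", "list": "embeddedin", "eititle": page_title,
              "einamespace": 0, "eilimit": 1}
    data = fetch(params)
    return bool(data.get("query", {}).get("embeddedin"))


def not_transcluded(index_title, page_ns, fetch=api_get):
    """Result of 1) minus result of 2), keeping the order of the Index: page."""
    return [title for title in index_page_links(index_title, page_ns, fetch)
            if not is_transcluded(title, fetch)]
```

Calling not_transcluded("Index:Some book.djvu", page_ns) with the right
namespace number for your wiki would return the Page: titles that no ns:0 page
embeds. The title here is only a placeholder.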
--
phe
_______________________________________________
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l