Hi,
I added a new tool, https://tools.wmflabs.org/phetools/not_transcluded/ , which lists the Index pages containing corrected or validated pages that are not transcluded from the main namespace (main:); see the README.txt.
Wow, this is fantastic Phe. It's really useful for running the "Match & split" when it's needed.
Andrea
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
Very interesting.
Do you have any suggestions for finding the list of non-transcluded pages? I can imagine using a bot to fetch the HTML of the ns0 main page and of all its subpages related to an Index page, then parsing it to get the list of existing page links; is there a simpler strategy?
Alex
On Thu, 28 Apr 2016 at 15:55 +0200, Alex Brollo wrote:
Very interesting.
Do you have any suggestions for finding the list of non-transcluded pages? I can imagine using a bot to fetch the HTML of the ns0 main page and of all its subpages related to an Index page, then parsing it to get the list of existing page links; is there a simpler strategy?
Alex
If you have access to the database, the simplest way is to reuse the code of this tool, https://github.com/phil-el/phetools/blob/master/statistics/not_transcluded.p... , as the function not_transcluded() is nearly what you need. I'll probably show the list of non-transcluded pages in a future version, but this tool gets that list for every Index: on a wiki and the query takes a few minutes, so it's not handy for checking the transclusion status of a single index.
To get that list for a single index, it'll be easier to use the API:
1) get all links on the Index: page, filtered to the Page: namespace;
2) use the embeddedin API to get all transclusions from ns:0.
The result of 1) minus the result of 2) is what you are looking for. You can do 1) in one request, and you can probably get the proofread status in the same request, since you are probably only interested in yellow or green pages that are not transcluded; 2) is perhaps possible in only one request, I don't remember. A tool like that, complementing mine, could be very useful. It's possible I'll provide a simpler API on toollabs to do that.
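The two API steps above can be sketched roughly as follows, in Python 3. This is only a minimal sketch, not the code of the tool: the helper names (api_get, index_page_links, is_transcluded_in_main, pages_not_transcluded) are made up for illustration, the number of the Page: namespace must be supplied by the caller because it differs between wikis, and step 2 is done naively with one embeddedin request per page. It does not filter by proofread status.

```python
import json
import urllib.parse
import urllib.request


def api_get(api_url, params):
    """Send one GET request to a MediaWiki api.php endpoint and decode the JSON reply."""
    query = urllib.parse.urlencode(dict(params, format="json"))
    request = urllib.request.Request(
        api_url + "?" + query,
        headers={"User-Agent": "not-transcluded-sketch/0.1"},  # placeholder agent string
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def index_page_links(index_title, page_ns, fetch):
    """Step 1: titles in the Page: namespace that are linked from the Index: page."""
    titles = set()
    params = {
        "action": "query",
        "prop": "links",
        "titles": index_title,
        "plnamespace": page_ns,  # numeric id of the Page: namespace (assumption: caller knows it)
        "pllimit": "max",
    }
    while True:
        data = fetch(params)
        for page in data.get("query", {}).get("pages", {}).values():
            for link in page.get("links", []):
                titles.add(link["title"])
        if "continue" not in data:
            return titles
        # Follow the API's continuation parameters until the list is complete.
        params = dict(params, **data["continue"])


def is_transcluded_in_main(page_title, fetch):
    """Step 2: True if at least one ns0 page transcludes the given Page: title."""
    data = fetch({
        "action": "query",
        "list": "embeddedin",
        "eititle": page_title,
        "einamespace": 0,
        "eilimit": 1,  # one hit is enough to know the page is transcluded
    })
    return bool(data.get("query", {}).get("embeddedin"))


def pages_not_transcluded(index_title, page_ns, fetch):
    """Result of step 1 minus result of step 2, as a sorted list of titles."""
    linked = index_page_links(index_title, page_ns, fetch)
    return sorted(t for t in linked if not is_transcluded_in_main(t, fetch))
```

To run it against a real wiki, fetch could be functools.partial(api_get, "https://en.wikisource.org/w/api.php"), with page_ns set to that wiki's Page: namespace number. Passing fetch as a parameter keeps the logic testable without network access.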
Thanks Phe also for the pointer at your GitHub page, I'll try to post issues directly there if needed :-)
Your tool and a bit of fiddling with transclusions got me thinking: sometimes works are really complex. You have multiple Indexes representing multiple texts, and at times you also have other versions/editions of the same work. This creates a mess because all the relationships between Indexes and ns0 pages are made by humans, and it's not always easy to understand the "structure".
So, my question is: is it possible to "draw" some sort of graph/network between Indexes and the pages that are transcluded from them?
Maybe with a visual representation it would be easier to tame the chaos :-)
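One possible way to get such a drawing, as a rough sketch only: collect (Index, ns0 page) pairs and write them out in Graphviz DOT format, which the dot program can then render. The function name to_dot and the shape of the input pairs are assumptions made for this illustration; how the pairs are collected (for example from the database or the API) is left open.

```python
def _escape(label):
    """Escape backslashes and double quotes so the label is a valid DOT string."""
    return label.replace("\\", "\\\\").replace('"', '\\"')


def to_dot(edges):
    """Build a DOT digraph from (index_title, main_page_title) pairs.

    Each edge means: this ns0 page transcludes pages belonging to this Index.
    Duplicate pairs are collapsed and the output is sorted so it is stable.
    """
    lines = ["digraph transclusions {", "  rankdir=LR;"]
    for index_title, main_title in sorted(set(edges)):
        lines.append('  "%s" -> "%s";' % (_escape(index_title), _escape(main_title)))
    lines.append("}")
    return "\n".join(lines)
```

The returned text can be saved to a file and rendered with, for instance, dot -Tsvg; an Index feeding several works, or a work fed by several Indexes, then shows up as a node with multiple edges.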
Aubrey
I don't see fawikis (the Farsi Wikisource) on the page; could you update it to include it as well? Thanks in advance.
Mardetanha
On Fri, 29 Apr 2016 at 00:18 +0430, Mardetanha wrote:
I don't see fawikis (the Farsi Wikisource) on the page; could you update it to include it as well? Thanks in advance.
Mardetanha
fa is in, but empty HTML files are not created. I'll change the tool to create the file even if no index meets the criteria.
Thanks
Mardetanha
Note that we have the long-standing tool "checker" on toollabs, which generates the transclusion listing per work:
https://tools.wmflabs.org/checker
eg. https://tools.wmflabs.org/checker/?db=enwikisource_p&title=Index:Dream_d...
and enWS has been using it on its Index: ns pages for ages.
Regards, Billinghurst
Thanks for reminding us; it.source uses it too, on the Index ns. But Phe's tool is different: it gives you the list of all the "not transcluded" books, so you don't have to check every book by hand to find out.
Ideally, the tools should be merged into one, so an editor can check every work directly. Used together, the tools are pretty powerful.
Aubrey
It seems there is a problem. I thought a simple patch like this could work:
line 47:
def format_html_line(domain, bookname, count):
    if domain == 'old':
        domain = 'mul'
    #bookname = unicode(bookname, 'utf-8')
    fmt = '<li><a href="//%s.wikisource.org/wiki/Index:%s">%s</a> %d <a href="//%s.wikisource.org/wiki/Index:%s">check pages</a></li>'
    # fmt has six placeholders, so pass six values (a trailing seventh would raise TypeError)
    result = fmt % (domain, urllib.quote(bookname), bookname, count, domain, urllib.quote(bookname))
    return result
but the problem is that the transclusion checker wants the exact name of the page, so the translations of "Index" in the various languages give an error. This should be changed on the "checker" side.
Aubrey
On Fri, 29 Apr 2016 at 15:38 +0200, Andrea Zanni wrote:
It seems there is a problem. I thought a simple patch like this could work:
line 47:
def format_html_line(domain, bookname, count):
    if domain == 'old':
        domain = 'mul'
    #bookname = unicode(bookname, 'utf-8')
    fmt = '<li><a href="//%s.wikisource.org/wiki/Index:%s">%s</a> %d <a href="//%s.wikisource.org/wiki/Index:%s">check pages</a></li>'
    # fmt has six placeholders, so pass six values (a trailing seventh would raise TypeError)
    result = fmt % (domain, urllib.quote(bookname), bookname, count, domain, urllib.quote(bookname))
    return result
but the problem is that the transclusion checker wants the exact name of the page, so the translations of "Index" in the various languages give an error. This should be changed on the "checker" side.
Aubrey
checker predates the canonical namespace names for Index and Page. I have the needed names on my side, so I added a link to checker for each index listed.
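As an illustration of building such a link with the local namespace name, here is a small Python 3 sketch. It is not the code used by the tool: the function name checker_url and the INDEX_NS_NAMES table are hypothetical, the table only contains the two names that appear in this thread, and the database name pattern "<lang>wikisource_p" is an assumption generalised from the enwikisource_p example above.

```python
from urllib.parse import urlencode

# Hypothetical lookup table: local name of the Index namespace per language.
# Only the two names mentioned in this thread are listed.
INDEX_NS_NAMES = {"en": "Index", "pl": "Indeks"}


def checker_url(lang, bookname):
    """Build a checker link using the db= and title= parameters seen in the thread."""
    query = urlencode({
        "db": "%swikisource_p" % lang,  # assumed naming pattern
        "title": "%s:%s" % (INDEX_NS_NAMES[lang], bookname),
    })
    return "https://tools.wmflabs.org/checker/?" + query
```

urlencode percent-encodes the colon and the parentheses in the title, so book names with such characters are passed through safely.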
hello Phe,
The tool is very useful, but the generated links point "nowhere" (404) if the Pages and the Index have different names (often the case on pl ws); see https://tools.wmflabs.org/phetools/not_transcluded/pl.html and the entry Poezye_cz._2_(Antoni_Lange).djvu,
where the pages Strona:Poezye_cz._2_(Antoni_Lange).djvu/001... belong to Indeks:Poezye_cz._2_(Antoni Lange) https://pl.wikisource.org/wiki/Indeks%3APoezye_cz._2_%28Antoni_Lange%29
So the tool would have to link to the correct name of the index (also for Checker).
regards,
Z.
On Fri, 29 Apr 2016 at 17:08 +0200, Zdzislaw wrote:
hello Phe,
The tool is very useful, but the generated links point "nowhere" (404) if the Pages and the Index have different names (often the case on pl ws); see https://tools.wmflabs.org/phetools/not_transcluded/pl.html and the entry Poezye_cz._2_(Antoni_Lange).djvu,
where the pages Strona:Poezye_cz._2_(Antoni_Lange).djvu/001... belong to Indeks:Poezye_cz._2_(Antoni Lange) https://pl.wikisource.org/wiki/Indeks%3APoezye_cz._2_%28Antoni_Lange%29
So the tool would have to link to the correct name of the index (also for Checker).
regards,
Z.
Yes, such indexes are not handled correctly; this is stated in the README.txt. Currently the index name is deduced from the page name by stripping the "/*" part of the Page: name, which is very cheap, but this tool is already slow enough that I need to cache the result and generate these HTML pages only once per day. I'm unsure how to handle that sort of index without doing one request per non-transcluded Page: to get the Index: name. Perhaps I could check whether the index exists and, if it doesn't, retry after removing the extension?
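The fallback suggested in the last sentence could look roughly like this in Python 3. It is a sketch under assumptions, not the tool's code: the names candidate_index_names and resolve_index are invented, index_exists stands for whatever existence check is available (database or API), and the list of file extensions to strip is a guess.

```python
def candidate_index_names(page_name, extensions=("djvu", "pdf")):
    """Yield possible Index names for a Page: name, cheapest guess first.

    First the name with the trailing "/<subpage>" part stripped, then the
    same name without its file extension (the extension list is an assumption).
    """
    base = page_name.rsplit("/", 1)[0]
    yield base
    stem, dot, ext = base.rpartition(".")
    if dot and ext.lower() in extensions:
        yield stem


def resolve_index(page_name, index_exists):
    """Return the first candidate accepted by index_exists, or None if none is."""
    for name in candidate_index_names(page_name):
        if index_exists(name):
            return name
    return None
```

With the Polish example from this thread, the first candidate keeps the ".djvu" extension and the second drops it, so at most two existence checks per index are needed instead of one request per Page:.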
I was just checking bnws at https://tools.wmflabs.org/phetools/not_transcluded/bn.html and all the "check pages" links return this error:
"Internal Server Error
The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application."
On Fri, 29 Apr 2016 at 21:25 +0530, Jayanta Nath wrote:
I was just checking bnws at https://tools.wmflabs.org/phetools/not_transcluded/bn.html and all the "check pages" links return this error:
"Internal Server Error
The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application."
I asked the maintainer of the checker tool to take a look at that.
Sorry, I had not read the README.txt carefully :))) There is no need to check whether the index exists and retry without the extension; I can do it manually :))
thanks,
Z.