For English Wikisource, I have a series of 63 volumes of the "Dictionary of National Biography" that have now been transcribed. These works have many internal cross-reference links, which we have been generating. However, over the 16 years in which the volumes were originally compiled, some of the articles that were planned were never written, so the links to them have ended up as red links. I wish to identify these red links.
I am trying to work out how to use pywikibot to generate a list of the pages with red links. Getting a list of pages to feed through is easy, as each page is linked to an index page per volume, so it would be a namespace collection from a central page, all based around {{PAGENAME}}.
Thanks for any help provided.
Regards, Billinghurst
I can do something far easier. Are these pages all in a category or something else trackable?
On Mon, Sep 8, 2014 at 10:45 AM, Wiki Billinghurst billinghurstwiki@gmail.com wrote:
For English Wikisource, I have a series of 63 volumes of the "Dictionary of National Biography" that have now been transcribed. ... I am trying work out how to use pywikibot to generate a list of the pages with red links.
I am guessing that "John phoenixoverride@gmail.com" might be offering to run a database query to provide the data you need, which would obtain it quite quickly, but here is an introductory approach to obtaining the data using pywikibot 'core'.
$ python pwb.py shell

import pywikibot
site = pywikibot.Site('en', 'wikisource')

# Namespace 104 is the "Page:" namespace on English Wikisource.
from pywikibot import pagegenerators
gen = pagegenerators.PrefixingPageGenerator(namespace=104, prefix='Dictionary_of_National_Biography', site=site)

# Report every link on each DNB page whose target does not exist.
for dnb_page in gen:
    for linked_page in dnb_page.linkedPages():
        if not linked_page.exists():
            pywikibot.output(u"%s refers to missing %s" % (dnb_page, linked_page))
[[wikisource:en:Page:Dictionary of National Biography, Second Supplement, volume 1.djvu/124]] refers to missing [[wikisource:en:Bickerseth, Edward Henry (DNB02)]]
[[wikisource:en:Page:Dictionary of National Biography, Second Supplement, volume 1.djvu/124]] refers to missing [[wikisource:en:Temple, Frederick (DNB02)]]
[[wikisource:en:Page:Dictionary of National Biography, Second Supplement, volume 1.djvu/124]] refers to missing [[wikisource:en:Westcott, Brooke Foss (DNB02)]]
[[wikisource:en:Page:Dictionary of National Biography, Second Supplement, volume 1.djvu/188]] refers to missing [[wikisource:en:Chinese Picture: notes on photographs made in China]]
[[wikisource:en:Page:Dictionary of National Biography, Second Supplement, volume 1.djvu/37]] refers to missing [[wikisource:en:Modern Philosophy (Adamson)]]
[[wikisource:en:Page:Dictionary of National Biography, Second Supplement, volume 1.djvu/460]] refers to missing [[wikisource:en:Handbook to Spain]]
[[wikisource:en:Page:Dictionary of National Biography, Second Supplement, volume 1.djvu/460]] refers to missing [[wikisource:en:Portugal Old and New]]
[[wikisource:en:Page:Dictionary of National Biography, Second Supplement, volume 1.djvu/460]] refers to missing [[wikisource:en:Round the Calendar in Portugal]]
[[wikisource:en:Page:Dictionary of National Biography, Second Supplement, volume 1.djvu/460]] refers to missing [[wikisource:en:Travels in Portugal]]
...
You may need to add more logic to filter which 'linked_page' you are interested in, such as only some titles or namespaces.
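As one possible shape for that extra logic, here is a minimal sketch of a title filter. It assumes that the DNB article titles of interest sit in the main namespace and end in a suffix such as "(DNB02)", as in the sample output above; the helper name is_dnb_article and the suffix pattern are made up for illustration and are not part of pywikibot.

```python
import re

# Assumption: DNB article titles end in a suffix like "(DNB00)" or "(DNB02)".
DNB_SUFFIX = re.compile(r"\(DNB\d{2}\)$")


def is_dnb_article(title, namespace=0):
    """Return True for main-namespace titles that end in a DNB suffix.

    Hypothetical helper for illustration only.
    """
    return namespace == 0 and bool(DNB_SUFFIX.search(title))


# Titles taken from the sample output above.
missing = [
    "Temple, Frederick (DNB02)",
    "Handbook to Spain",
    "Westcott, Brooke Foss (DNB02)",
]
wanted = [title for title in missing if is_dnb_article(title)]
print(wanted)
```

Inside the loop above, the check could be applied to linked_page.title() before reporting the page, so that missing links to other works (such as "Handbook to Spain") are skipped.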
-- John Vandenberg
Please have a look at
https://hu.wikipedia.org/wiki/Szerkeszt%C5%91:BinBot/piroskat-userallap.py
Unfortunately, not all comments are in English, sorry for that.
def redCategories
shows how the HTML source of a page can be downloaded and processed for red category links. The text "(megíratlan szócikk)" in brackets after the title means "unwritten article"; I don't know the English phrase. In the comments, "piros" means red and "kék" means blue.
Although working from HTML seems to be uncommon and strange for an API-based environment, in this special case it is easy and worth a try.
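To illustrate the HTML approach in general terms, here is a minimal sketch that is not taken from the script linked above. It relies on MediaWiki marking links to nonexistent pages with the CSS class "new" in the rendered page. The sample markup is invented for the example, and downloading the real page HTML is left out.

```python
from html.parser import HTMLParser


class RedLinkParser(HTMLParser):
    """Collect the title attribute of every <a> tag carrying class "new"."""

    def __init__(self):
        super().__init__()
        self.red_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if "new" in classes:
            # Fall back to the href if the link has no title attribute.
            self.red_links.append(attr.get("title") or attr.get("href"))


def find_red_links(html):
    """Return the red-link titles found in a rendered page's HTML."""
    parser = RedLinkParser()
    parser.feed(html)
    parser.close()
    return parser.red_links


# Invented sample markup: one existing link and one red link.
SAMPLE_HTML = (
    '<p><a href="/wiki/Foo" title="Foo">Foo</a> and '
    '<a href="/w/index.php?title=Bar&amp;action=edit&amp;redlink=1" '
    'class="new" title="Bar (page does not exist)">Bar</a></p>'
)
print(find_red_links(SAMPLE_HTML))
```

The wording in the title attribute depends on the wiki's interface language, which is why the Hungarian script looks for "(megíratlan szócikk)"; matching on the class avoids that dependency.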
Filtering for red links can also be done through the API one page at a time using action=parse and prop=links.[1] However, I don't know if Pywikibot supports this usage of the parse action.
[1] Example: https://en.wikipedia.org/w/api.php?action=parse&format=jsonfm&page=T...
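A minimal sketch of how the result of such a request could be filtered, assuming the default JSON format, in which each entry under "links" carries the title under the key "*" and has an "exists" key only when the target page exists. The sample response is invented for the example, and fetching it (with or without Pywikibot) is left out.

```python
def missing_links(parse_response):
    """Return titles of linked pages that do not exist.

    Assumes the default (legacy) JSON shape of action=parse&prop=links,
    where "exists" is present only for existing targets.
    """
    links = parse_response.get("parse", {}).get("links", [])
    return [link["*"] for link in links if "exists" not in link]


# Invented sample response for illustration.
SAMPLE_RESPONSE = {
    "parse": {
        "title": "Example",
        "links": [
            {"ns": 0, "exists": "", "*": "Existing page"},
            {"ns": 0, "*": "Missing page"},
        ],
    }
}
print(missing_links(SAMPLE_RESPONSE))
```

The "ns" field in each entry could be used in the same list comprehension to restrict the result to particular namespaces.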
On Mon, Sep 8, 2014 at 3:40 AM, Bináris wikiposta@gmail.com wrote:
Although working from HTML seems to be uncommon and strange for an API-based environment, in this special case it is easy and worth a try.
-- Bináris
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l