Xqt has submitted this change. ( https://gerrit.wikimedia.org/r/c/pywikibot/core/+/826937 )
Change subject: [pep8] PEP8 changes for create_isbn_edition.py
......................................................................
[pep8] PEP8 changes for create_isbn_edition.py
- Code style issues
- clear trailing white space
- untabify file
- keep lines below 80 chars
- update function documentation and parameter list
- update shebang
- script documentation is in __doc__
- replace print statements by pywikibot.info
- add main() function, mostly needed for windows and for script tests
- add pywikibot.handle_args to handle global options and test -help
- add isbnlib dependency
- lazy import isbnlib and unidecode
- replace sys.stdin.read by pywikibot.input to show an input message
- create wikidata_site and repo after global args are read to prevent site warning
Change-Id: I6917ec9b511db609c2f1828486b9a53998d1e376
---
M scripts/create_isbn_edition.py
M setup.py
M tests/script_tests.py
M tox.ini
4 files changed, 317 insertions(+), 212 deletions(-)
Approvals:
  jenkins-bot: Verified
  Xqt: Looks good to me, approved
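The "lazy import isbnlib and unidecode" item in the commit message can be illustrated in isolation. The following is only a minimal sketch of the pattern used in the diff below (store the ImportError at import time, raise it later once global options such as -help have been handled). The helper names `lazy_import` and `check_dependencies` and the package name `hypothetical_missing_isbn_backend` are made up for this example and are not part of the change.

```python
import importlib


def lazy_import(name):
    """Return the imported module, or the ImportError if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError as error:
        # Keep the exception instead of failing at import time.
        return error


def check_dependencies(*modules):
    """Raise the first stored ImportError, like the check in main()."""
    for module in modules:
        if isinstance(module, ImportError):
            raise module


# 'json' ships with Python; the second name is a hypothetical, absent package.
present = lazy_import('json')
missing = lazy_import('hypothetical_missing_isbn_backend')

check_dependencies(present)  # passes silently, nothing is missing
```

With this arrangement the script module can still be imported, and its help text printed, on a system where the optional packages are not installed; the error only surfaces when the dependency check runs.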
diff --git a/scripts/create_isbn_edition.py b/scripts/create_isbn_edition.py
index ee6b18c..2aea6c1 100644
--- a/scripts/create_isbn_edition.py
+++ b/scripts/create_isbn_edition.py
@@ -1,15 +1,15 @@
-#!/home/geertivp/pwb/bin/python3
-
-codedoc = """
-Pywikibot client to load ISBN related data into Wikidata
+#!/usr/bin/python3
+"""Pywikibot script to load ISBN related data into Wikidata.
Pywikibot script to get ISBN data from a digital library, and create or amend the related Wikidata item for edition (with the P212=ISBN number as unique external ID).
-Use digital libraries to get ISBN data in JSON format, and integrate the results into Wikidata. +Use digital libraries to get ISBN data in JSON format, and integrate the +results into Wikidata.
-Then the resulting item number can be used e.g. to generate Wikipedia references using template Cite_Q. +Then the resulting item number can be used e.g. to generate Wikipedia +references using template Cite_Q.
Parameters:
@@ -34,39 +34,49 @@ Default LANG; e.g. en, nl, fr, de, es, it, etc.
P3 P4...: P/Q pairs to add additional claims (repeated) - e.g. P921 Q107643461 (main subject: database management linked to P2163 Fast ID) + e.g. P921 Q107643461 (main subject: database + management linked to P2163 Fast ID)
stdin: ISBN numbers (International standard book number)
- Free text (e.g. Wikipedia references list, or publication list) is accepted. - Identification is done via an ISBN regex expression. + Free text (e.g. Wikipedia references list, or publication list) + is accepted. Identification is done via an ISBN regex expression.
Functionality:
- * The ISBN number is used as a primary key (P212 where no duplicates are allowed) - The item update is not performed when there is no unique match - * Statements are added or merged incrementally; existing data is not overwritten. - * Authors and publishers are searched to get their item number (ambiguous items are skipped) + * The ISBN number is used as a primary key (P212 where no duplicates + are allowed. The item update is not performed when there is no + unique match + * Statements are added or merged incrementally; existing data is not + overwritten. + * Authors and publishers are searched to get their item number + (ambiguous items are skipped) * Book title and subtitle are separated with '.', ':', or '-' * This script can be run incrementally with the same parameters - Caveat: Take into account the Wikidata Query database replication delay. - Wait for minimum 5 minutes to avoid creating duplicate objects. + Caveat: Take into account the Wikidata Query database + replication delay. Wait for minimum 5 minutes to avoid creating + duplicate objects.
Data quality:
- * Use https://query.wikidata.org/querybuilder/ to identify P212 duplicates - Merge duplicate items before running the script again. + * Use https://query.wikidata.org/querybuilder/ to identify P212 + duplicates. Merge duplicate items before running the script + again. * The following properties should only be used for written works P5331: OCLC work ID (editions should only have P243) - P8383: Goodreads-identificatiecode for work (editions should only have P2969) + P8383: Goodreads-identificatiecode for work (editions should + only have P2969)
Examples:
- # Default library (Google Books), language (LANG), no additional statements + # Default library (Google Books), language (LANG), no additional + statements + ./create_isbn_edition.py 9789042925564
# Wikimedia, language Dutch, main subject: database management + ./create_isbn_edition.py wiki en P921 Q107643461 978-0-596-10089-6
@@ -109,10 +119,11 @@ P1036: Dewey Decimal Classification P2163: Fast ID (inverse lookup via Wikidata Query) -> P921: main subject P2969: Goodreads-identificatiecode - + (only for written works) P5331: OCLC work ID (editions should only have P243) - P8383: Goodreads-identificatiecode for work (editions should only have P2969) + P8383: Goodreads-identificatiecode for work (editions should only + have P2969)
Author:
@@ -154,7 +165,7 @@ https://pypi.org/search/?q=isbnlib_
pip install isbnlib (mandatory) - + (optional) pip install isbnlib-bol pip install isbnlib-bnf @@ -169,24 +180,32 @@ * Better use the ISO 639-1 language code parameter as a default The language code is not always available from the digital library. * SPARQL queries run on a replicated database - Possible important replication delay; wait 5 minutes before retry -- otherwise risk for creating duplicates. + Possible important replication delay; wait 5 minutes before retry + -- otherwise risk for creating duplicates.
Known problems:
* Unknown ISBN, e.g. 9789400012820 - * No ISBN data available for an edition either causes no output (goob = Google Books), or an error message (wiki, openl) + * No ISBN data available for an edition either causes no output + (goob = Google Books), or an error message (wiki, openl) The script is taking care of both * Only 6 ISBN attributes are listed by the webservice(s) missing are e.g.: place of publication, number of pages - * Not all ISBN atttributes have data (authos, publisher, date of publication, language) - * The script uses multiple webservice calls (script might take time, but it is automated) - * Need to amend ISBN items that have no author, publisher, or other required data (which additional services to use?) + * Not all ISBN atttributes have data (authos, publisher, date of + publication, language) + * The script uses multiple webservice calls (script might take time, + but it is automated) + * Need to amend ISBN items that have no author, publisher, or other + required data (which additional services to use?) * How to add still more digital libraries? - * Does the KBR has a public ISBN service (Koninklijke Bibliotheek van België)? - * Filter for work properties -- need to amend Q47461344 (written work) instance and P629 (edition of) + P747 (has edition) statements - https://www.wikidata.org/wiki/Q63413107 + * Does the KBR has a public ISBN service (Koninklijke + Bibliotheek van België)? + * Filter for work properties -- need to amend Q47461344 (written + work) instance and P629 (edition of) + P747 (has edition) + statements https://www.wikidata.org/wiki/Q63413107 ['9781282557246', '9786612557248', '9781847196057', '9781847196040'] - P8383: Goodreads-identificatiecode voor work 13957943 (should have P2969) + P8383: Goodreads-identificatiecode voor work 13957943 (should + have P2969) P5331: OCLC-identificatiecode voor work 793965595 (should have P243)
To do: @@ -205,7 +224,7 @@ Environment:
The python script can run on the following platforms: - + Linux client Google Chromebook (Linux container) Toolforge Portal @@ -238,7 +257,7 @@ Related projects:
https://phabricator.wikimedia.org/T314942 (this script) - + (other projects) https://phabricator.wikimedia.org/T282719 https://phabricator.wikimedia.org/T214802 @@ -254,64 +273,71 @@ https://en.wikipedia.org/wiki/bibliographic_database https://www.titelbank.nl/pls/ttb/f?p=103:4012:::NO::P4012_TTEL_ID:3496019&am...
+.. versionadded:: 7.7 """ - +# +# (C) Pywikibot team, 2022 +# +# Distributed under the terms of the MIT license. +# import logging # Error logging import os # Operating system -import re # Regular expressions (very handy!) +import re # Regular expressions (very handy!) import sys # System calls -import unidecode # Unicode
-import pywikibot # API interface to Wikidata - -from isbnlib import * # ISBN data -from pywikibot import pagegenerators as pg # Wikidata Query interface +import pywikibot # API interface to Wikidata +from pywikibot import pagegenerators as pg # Wikidata Query interface +from pywikibot.backports import List from pywikibot.data import api
+try: + import isbnlib +except ImportError as e: + isbnlib = e + +try: + from unidecode import unidecode +except ImportError as e: + unidecode = e + # Initialisation debug = True # Show debugging information verbose = True # Verbose mode
booklib = 'goob' # Default digital library -isbnre = re.compile(r'[0-9-]{10,17}') # ISBN number: 10 or 13 digits with optional dashes (-) + +# ISBN number: 10 or 13 digits with optional dashes (-) +isbnre = re.compile(r'[0-9-]{10,17}') propre = re.compile(r'P[0-9]+') # Wikidata P-number qsuffre = re.compile(r'Q[0-9]+') # Wikidata Q-number
# Other statements are added via command line parameters target = { -'P31':'Q3331189', # Is an instance of an edition + 'P31': 'Q3331189', # Is an instance of an edition }
# Statement property and instance validation rules propreqinst = { -'P50':'Q5', # Author requires human -'P123':{'Q2085381', 'Q1114515', 'Q1320047'},# Publisher requires publisher -'P407':{'Q34770', 'Q33742', 'Q1288568'}, # Edition language requires at least one of (living, natural) language + 'P50': 'Q5', # Author requires human + # Publisher requires publisher + 'P123': {'Q2085381', 'Q1114515', 'Q1320047'}, + # Edition language requires at least one of (living, natural) language + 'P407': {'Q34770', 'Q33742', 'Q1288568'}, }
mainlang = os.getenv('LANG', 'en')[:2] # Default description language
# Connect to database -transcmt = '#pwb Create ISBN edition' # Wikidata transaction comment -wikidata_site = pywikibot.Site('wikidata', 'wikidata') # Login to Wikibase instance -repo = wikidata_site.data_repository() # Required for wikidata object access (item, property, statement) +transcmt = '#pwb Create ISBN edition' # Wikidata transaction comment
-def is_in_list(statement_list, checklist): +def is_in_list(statement_list, checklist: List[str]) -> bool: + """Verify if statement list contains at least one item from the checklist. + + :param statement_list: Statement list + :param checklist: List of values + :Returns: True when match """ -Verify if statement list contains at least one item from the checklist - -Parameters: - - statement_list: Statement list - - checklist: List of values (string) - -Returns: - - Boolean (True when match) - """ - for seq in statement_list: if seq.getTarget().getID() in checklist: isinlist = True @@ -322,84 +348,92 @@
def get_item_list(item_name, instance_id): + """Get list of items by name, belonging to an instance (list). + + :param item_name: Item name (string; case sensitive) + :param instance_id: Instance ID (string, set, or list) + :Returns: Set of items (Q-numbers) """ -Get list of items by name, belonging to an instance (list) - -Parameters: - - item_name: Item name (string; case sensitive) - - instance_id: Instance ID (string, set, or list) - -Returns: - - Set of items (Q-numbers) - """ - item_list = set() # Empty set - params = {'action': 'wbsearchentities', 'format': 'json', 'type': 'item', 'strictlanguage': False, - 'language': mainlang, # All languages are searched, but labels are in native language - 'search': item_name} # Get item list from label + params = { + 'action': 'wbsearchentities', + 'format': 'json', + 'type': 'item', + 'strictlanguage': False, + # All languages are searched, but labels are in native language + 'language': mainlang, + 'search': item_name, # Get item list from label + } request = api.Request(site=wikidata_site, parameters=params) result = request.submit()
if 'search' in result: for res in result['search']: item = pywikibot.ItemPage(repo, res['id']) - item.get(get_redirect = True) + item.get(get_redirect=True) if 'P31' in item.claims: - for seq in item.claims['P31']: # Loop through instances - if seq.getTarget().getID() in instance_id: # Matching instance - for lang in item.labels: # Search all languages - if unidecode.unidecode(item_name.lower()) == unidecode.unidecode(item.labels[lang].lower()): # Ignore label case and accents - item_list.add(item.getID()) # Label math + for seq in item.claims['P31']: # Loop through instances + # Matching instance + if seq.getTarget().getID() in instance_id: + for lang in item.labels: # Search all languages + # Ignore label case and accents + if (unidecode(item_name.lower()) + == unidecode(item.labels[lang].lower())): + item_list.add(item.getID()) # Label math for lang in item.aliases: - if item_name in item.aliases[lang]: # Case sensitive for aliases - item_list.add(item.getID()) # Alias match + # Case sensitive for aliases + if item_name in item.aliases[lang]: + item_list.add(item.getID()) # Alias match return item_list
-def amend_isbn_edition(isbn_number): - """ -Amend ISBN registration. - -Parameters: - - isbn_number: ISBN number (string; 10 or 13 digits with optional hyphens) - -Result: +def amend_isbn_edition(isbn_number): # noqa: C901 + """Amend ISBN registration.
Amend Wikidata, by registering the ISBN-13 data via P212, depending on the data obtained from the digital library. + + :param isbn_number: ISBN number (string; 10 or 13 digits with + optional hyphens) """ + global logger global proptyx + global targetx
isbn_number = isbn_number.strip() if isbn_number == '': - return 3 # Do nothing when the ISBN number is missing - + return 3 # Do nothing when the ISBN number is missing + # Validate ISBN data if verbose: - print() + pywikibot.info()
try: - isbn_data = meta(isbn_number, service=booklib) + isbn_data = isbnlib.meta(isbn_number, service=booklib) logger.info(isbn_data) - # {'ISBN-13': '9789042925564', 'Title': 'De Leuvense Vaart - Van De Vaartkom Tot Wijgmaal. Aspecten Uit De Industriele Geschiedenis Van Leuven', 'Authors': ['A. Cresens'], 'Publisher': 'Peeters Pub & Booksellers', 'Year': '2012', 'Language': 'nl'} + # {'ISBN-13': '9789042925564', + # 'Title': 'De Leuvense Vaart - Van De Vaartkom Tot Wijgmaal. ' + # 'Aspecten Uit De Industriele Geschiedenis Van Leuven', + # 'Authors': ['A. Cresens'], + # 'Publisher': 'Peeters Pub & Booksellers', + # 'Year': '2012', + # 'Language': 'nl'} except Exception as error: # When the book is unknown the function returns logger.error(error) - #raise ValueError(error) + # raise ValueError(error) return 3
if len(isbn_data) < 6: - logger.error('Unknown or incomplete digital library registration for %s' % isbn_number) + logger.error( + 'Unknown or incomplete digital library registration for %s' + % isbn_number) return 3
# Show the raw results if verbose: for i in isbn_data: - print('%s:\t%s' % (i, isbn_data[i])) + pywikibot.info('%s:\t%s' % (i, isbn_data[i]))
# Get the book language from the ISBN book reference booklang = mainlang # Default language @@ -419,10 +453,10 @@
# Get formatted ISBN number isbn_number = isbn_data['ISBN-13'] # Numeric format - isbn_fmtd = mask(isbn_number) # Canonical format + isbn_fmtd = isbnlib.mask(isbn_number) # Canonical format if verbose: - print() - print(isbn_fmtd) # First one + pywikibot.info() + pywikibot.info(isbn_fmtd) # First one
# Get (sub)title when there is a dot titles = isbn_data['Title'].split('. ') # goob is using a '.' @@ -435,14 +469,17 @@ if len(titles) > 1: subtitle = titles[1].strip()
- # Print book titles + # pywikibot.info book titles if debug: - print(objectname, file=sys.stderr) - print(subtitle, file=sys.stderr) # Optional - for i in range(2,len(titles)): # Print subsequent subtitles, when available - print(titles[i].strip(), file=sys.stderr) # Not stored in Wikidata... + pywikibot.info(objectname, file=sys.stderr) + pywikibot.info(subtitle, file=sys.stderr) # Optional + # print subsequent subtitles, when available + for i in range(2, len(titles)): + # Not stored in Wikidata... + pywikibot.info(titles[i].strip(), file=sys.stderr)
# Search the ISBN number in Wikidata both canonical and numeric + # P212 should have canonical hyphenated format isbn_query = ("""# Get ISBN number SELECT ?item WHERE { VALUES ?isbn_number { @@ -451,13 +488,13 @@ } ?item wdt:P212 ?isbn_number. } -""" % (isbn_fmtd, isbn_number)) # P212 should have canonical hyphenated format +""" % (isbn_fmtd, isbn_number))
logger.info(isbn_query) generator = pg.WikidataSPARQLPageGenerator(isbn_query, site=wikidata_site)
rescnt = 0 - for item in generator: # Main loop for all DISTINCT items + for item in generator: # Main loop for all DISTINCT items rescnt += 1 qnumber = item.getID() logger.warning('Found item: %s' % qnumber) @@ -479,7 +516,7 @@ # Add all P/Q values # Make sure that labels are known in the native language if debug: - print(target, file=sys.stderr) + pywikibot.info(target, file=sys.stderr)
# Register statements for propty in target: @@ -489,8 +526,11 @@ targetx[propty] = pywikibot.ItemPage(repo, target[propty])
try: - logger.warning('Add %s (%s): %s (%s)' % (proptyx[propty].labels[booklang], propty, targetx[propty].labels[booklang], target[propty])) - except: + logger.warning('Add %s (%s): %s (%s)' + % (proptyx[propty].labels[booklang], propty, + targetx[propty].labels[booklang], + target[propty])) + except: # noqa: B001, E722, H201 logger.warning('Add %s:%s' % (propty, target[propty]))
claim = pywikibot.Claim(repo, propty) @@ -508,20 +548,23 @@ if 'P1476' not in item.claims: logger.warning('Add Title (P1476): %s' % (objectname)) claim = pywikibot.Claim(repo, 'P1476') - claim.setTarget(pywikibot.WbMonolingualText(text=objectname, language=booklang)) + claim.setTarget(pywikibot.WbMonolingualText(text=objectname, + language=booklang)) item.addClaim(claim, bot=True, summary=transcmt)
# Subtitle if subtitle != '' and 'P1680' not in item.claims: logger.warning('Add Subtitle (P1680): %s' % (subtitle)) claim = pywikibot.Claim(repo, 'P1680') - claim.setTarget(pywikibot.WbMonolingualText(text=subtitle, language=booklang)) + claim.setTarget(pywikibot.WbMonolingualText(text=subtitle, + language=booklang)) item.addClaim(claim, bot=True, summary=transcmt)
# Date of publication pub_year = isbn_data['Year'] if pub_year != '' and 'P577' not in item.claims: - logger.warning('Add Year of publication (P577): %s' % (isbn_data['Year'])) + logger.warning('Add Year of publication (P577): %s' + % (isbn_data['Year'])) claim = pywikibot.Claim(repo, 'P577') claim.setTarget(pywikibot.WbTime(year=int(pub_year), precision='year')) item.addClaim(claim, bot=True, summary=transcmt) @@ -543,7 +586,8 @@ break
if add_author: - logger.warning('Add author %d (P50): %s (%s)' % (author_cnt, author_name, author_list[0])) + logger.warning('Add author %d (P50): %s (%s)' + % (author_cnt, author_name, author_list[0])) claim = pywikibot.Claim(repo, 'P50') claim.setTarget(pywikibot.ItemPage(repo, author_list[0])) item.addClaim(claim, bot=True, summary=transcmt) @@ -559,11 +603,13 @@ # Get the publisher publisher_name = isbn_data['Publisher'].strip() if publisher_name != '': - publisher_list = list(get_item_list(publisher_name, propreqinst['P123'])) + publisher_list = list(get_item_list(publisher_name, + propreqinst['P123']))
if len(publisher_list) == 1: if 'P123' not in item.claims: - logger.warning('Add publisher (P123): %s (%s)' % (publisher_name, publisher_list[0])) + logger.warning('Add publisher (P123): %s (%s)' + % (publisher_name, publisher_list[0])) claim = pywikibot.Claim(repo, 'P123') claim.setTarget(pywikibot.ItemPage(repo, publisher_list[0])) item.addClaim(claim, bot=True, summary=transcmt) @@ -573,30 +619,33 @@ logger.warning('Ambiguous publisher: %s' % publisher_name)
# Get addional data from the digital library - isbn_cover = cover(isbn_number) - isbn_editions = editions(isbn_number, service='merge') - isbn_doi = doi(isbn_number) - isbn_info = info(isbn_number) + isbn_cover = isbnlib.cover(isbn_number) + isbn_editions = isbnlib.editions(isbn_number, service='merge') + isbn_doi = isbnlib.doi(isbn_number) + isbn_info = isbnlib.info(isbn_number)
if verbose: - print() - print(isbn_info) - print(isbn_doi) - print(isbn_editions) + pywikibot.info() + pywikibot.info(isbn_info) + pywikibot.info(isbn_doi) + pywikibot.info(isbn_editions)
# Book cover images for i in isbn_cover: - print('%s:\t%s' % (i, isbn_cover[i])) + pywikibot.info('%s:\t%s' % (i, isbn_cover[i]))
# Handle ISBN classification - isbn_classify = classify(isbn_number) + isbn_classify = isbnlib.classify(isbn_number) if debug: for i in isbn_classify: - print('%s:\t%s' % (i, isbn_classify[i]), file=sys.stderr) + pywikibot.info('%s:\t%s' % (i, isbn_classify[i]), file=sys.stderr)
# ./create_isbn_edition.py '978-3-8376-5645-9' - de P407 Q188 # Q113460204 - # {'owi': '11103651812', 'oclc': '1260160983', 'lcc': 'TK5105.8882', 'ddc': '300', 'fast': {'1175035': 'Wikis (Computer science)', '1795979': 'Wikipedia', '1122877': 'Social sciences'}} + # {'owi': '11103651812', 'oclc': '1260160983', 'lcc': 'TK5105.8882', + # 'ddc': '300', 'fast': {'1175035': 'Wikis (Computer science)', + # '1795979': 'Wikipedia', + # '1122877': 'Social sciences'}}
# Set the OCLC ID if 'oclc' in isbn_classify and 'P243' not in item.claims: @@ -608,54 +657,75 @@ # OCLC ID and OCLC work ID should not be both assigned if 'P243' in item.claims and 'P5331' in item.claims: if 'P629' in item.claims: - oclcwork = item.claims['P5331'][0] # OCLC Work should be unique - oclcworkid = oclcwork.getTarget() # Get the OCLC Work ID from the edition - work = item.claims['P629'][0].getTarget() # Edition should belong to only one single work - logger.warning('Move OCLC Work ID %s to work %s' % (oclcworkid, work.getID())) # There doesn't exist a moveClaim method? - if 'P5331' not in work.claims: # Keep current OCLC Work ID if present + oclcwork = item.claims['P5331'][0] # OCLC Work should be unique + # Get the OCLC Work ID from the edition + oclcworkid = oclcwork.getTarget() + # Edition should belong to only one single work + work = item.claims['P629'][0].getTarget() + # There doesn't exist a moveClaim method? + logger.warning('Move OCLC Work ID %s to work %s' + % (oclcworkid, work.getID())) + # Keep current OCLC Work ID if present + if 'P5331' not in work.claims: claim = pywikibot.Claim(repo, 'P5331') claim.setTarget(oclcworkid) work.addClaim(claim, bot=True, summary=transcmt) - item.removeClaims(oclcwork, bot=True, summary=transcmt) # OCLC Work ID does not belong to edition + # OCLC Work ID does not belong to edition + item.removeClaims(oclcwork, bot=True, summary=transcmt) else: - logger.error('OCLC Work ID %s conflicts with OCLC ID %s and no work available' % (item.claims['P5331'][0].getTarget(), item.claims['P243'][0].getTarget())) + logger.error('OCLC Work ID %s conflicts with OCLC ID %s and no ' + 'work available' + % (item.claims['P5331'][0].getTarget(), + item.claims['P243'][0].getTarget()))
# OCLC work ID should not be registered for editions, only for works if 'owi' not in isbn_classify: pass - elif 'P629' in item.claims: # Get the work related to the edition - work = item.claims['P629'][0].getTarget() # Edition should only have one single work - if 'P5331' not in work.claims: # Assign the OCLC work ID if missing - logger.warning('Add OCLC work ID (P5331): %s to work %s' % (isbn_classify['owi'], work.getID())) + elif 'P629' in item.claims: # Get the work related to the edition + # Edition should only have one single work + work = item.claims['P629'][0].getTarget() + if 'P5331' not in work.claims: # Assign the OCLC work ID if missing + logger.warning('Add OCLC work ID (P5331): %s to work %s' + % (isbn_classify['owi'], work.getID())) claim = pywikibot.Claim(repo, 'P5331') claim.setTarget(isbn_classify['owi']) work.addClaim(claim, bot=True, summary=transcmt) elif 'P243' in item.claims: - logger.warning('OCLC Work ID %s ignored because of OCLC ID %s' % (isbn_classify['owi'], item.claims['P243'][0].getTarget())) - elif 'P5331' not in item.claims: # Assign the OCLC work ID only if there is no work, and no OCLC ID for edition - logger.warning('Add OCLC work ID (P5331): %s to edition' % (isbn_classify['owi'])) + logger.warning('OCLC Work ID %s ignored because of OCLC ID %s' + % (isbn_classify['owi'], + item.claims['P243'][0].getTarget())) + # Assign the OCLC work ID only if there is no work, and no OCLC ID + # for edition + elif 'P5331' not in item.claims: + logger.warning('Add OCLC work ID (P5331): %s to edition' + % (isbn_classify['owi'])) claim = pywikibot.Claim(repo, 'P5331') claim.setTarget(isbn_classify['owi']) item.addClaim(claim, bot=True, summary=transcmt)
- # Reverse logic for moving OCLC ID and P212 (ISBN) from work to edition is more difficult because of 1:M relationship... + # Reverse logic for moving OCLC ID and P212 (ISBN) from work to + # edition is more difficult because of 1:M relationship...
# Same logic as for OCLC (work) ID
# Goodreads-identificatiecode (P2969)
- # Goodreads-identificatiecode for work (P8383) should not be registered for editions; should rather use P2969 + # Goodreads-identificatiecode for work (P8383) should not be + # registered for editions; should rather use P2969
# Library of Congress Classification (works and editions) if 'lcc' in isbn_classify and 'P8360' not in item.claims: - logger.warning('Add Library of Congress Classification for edition (P8360): %s' % (isbn_classify['lcc'])) + logger.warning( + 'Add Library of Congress Classification for edition (P8360): %s' + % (isbn_classify['lcc'])) claim = pywikibot.Claim(repo, 'P8360') claim.setTarget(isbn_classify['lcc']) item.addClaim(claim, bot=True, summary=transcmt)
# Dewey Decimale Classificatie if 'ddc' in isbn_classify and 'P1036' not in item.claims: - logger.warning('Add Dewey Decimale Classificatie (P1036): %s' % (isbn_classify['ddc'])) + logger.warning('Add Dewey Decimale Classificatie (P1036): %s' + % (isbn_classify['ddc'])) claim = pywikibot.Claim(repo, 'P1036') claim.setTarget(isbn_classify['ddc']) item.addClaim(claim, bot=True, summary=transcmt) @@ -666,7 +736,8 @@ # https://www.oclc.org/research/areas/data-science/fast.html # https://www.oclc.org/content/dam/oclc/fast/FAST-quick-start-guide-2022.pdf
- # Authority control identifier from WorldCat's “FAST Linked Data” authority file (external ID P2163) + # Authority control identifier from WorldCat's “FAST Linked Data” + # authority file (external ID P2163) # Corresponding to P921 (Wikidata main subject) if 'fast' in isbn_classify: for fast_id in isbn_classify['fast']: @@ -679,109 +750,142 @@ """ % (fast_id))
logger.info(main_subject_query) - generator = pg.WikidataSPARQLPageGenerator(main_subject_query, site=wikidata_site) + generator = pg.WikidataSPARQLPageGenerator(main_subject_query, + site=wikidata_site)
rescnt = 0 - for main_subject in generator: # Main loop for all DISTINCT items + for main_subject in generator: # Main loop for all DISTINCT items rescnt += 1 qmain_subject = main_subject.getID() try: main_subject_label = main_subject.labels[booklang] - logger.info('Found main subject %s (%s) for Fast ID %s' % (main_subject_label, qmain_subject, fast_id)) - except: + logger.info('Found main subject %s (%s) for Fast ID %s' + % (main_subject_label, qmain_subject, fast_id)) + except: # noqa B001, E722, H201 main_subject_label = '' - logger.info('Found main subject (%s) for Fast ID %s' % (qmain_subject, fast_id)) - logger.error('Missing label for item %s' % qmain_subject) + logger.info('Found main subject (%s) for Fast ID %s' + % (qmain_subject, fast_id)) + logger.error('Missing label for item %s' + % qmain_subject)
# Create or amend P921 statement if rescnt == 0: - logger.error('Main subject not found for Fast ID %s' % (fast_id)) + logger.error('Main subject not found for Fast ID %s' + % (fast_id)) elif rescnt == 1: add_main_subject = True - if 'P921' in item.claims: # Check for duplicates + if 'P921' in item.claims: # Check for duplicates for seq in item.claims['P921']: if seq.getTarget().getID() == qmain_subject: add_main_subject = False break
if add_main_subject: - logger.warning('Add main subject (P921) %s (%s)' % (main_subject_label, qmain_subject)) + logger.warning('Add main subject (P921) %s (%s)' + % (main_subject_label, qmain_subject)) claim = pywikibot.Claim(repo, 'P921') claim.setTarget(main_subject) item.addClaim(claim, bot=True, summary=transcmt) else: - logger.info('Skipping main subject %s (%s)' % (main_subject_label, qmain_subject)) + logger.info('Skipping main subject %s (%s)' + % (main_subject_label, qmain_subject)) else: - logger.error('Ambiguous main subject for Fast ID %s' % (fast_id)) + logger.error('Ambiguous main subject for Fast ID %s' + % (fast_id))
# Book description - isbn_description = desc(isbn_number) + isbn_description = isbnlib.desc(isbn_number) if isbn_description != '': - print() - print(isbn_description) + pywikibot.info() + pywikibot.info(isbn_description)
# Currently does not work (service not available) try: logger.warning('BibTex unavailable') return 0 - bibtex_metadata = doi2tex(isbn_doi) - print(bibtex_metadata) + bibtex_metadata = isbnlib.doi2tex(isbn_doi) + pywikibot.info(bibtex_metadata) except Exception as error: logger.error(error) # Data not available
return 0
-# Error logging -logger = logging.getLogger('create_isbn_edition') -#logging.basicConfig(level=logging.DEBUG) # Uncomment for debugging -##logger.setLevel(logging.DEBUG) +def main(*args: str) -> None: + """ + Process command line arguments and invoke bot.
-pgmnm = sys.argv.pop(0) -logger.debug('%s %s' % (pgmnm, '2022-08-23 (gvp)')) + If args is an empty list, sys.argv is used.
-# Get optional parameters + :param args: command line arguments + """ + # Error logging + global logger + global repo + global targetx + global wikidata_site
-# Get the digital library -if len(sys.argv) > 0: - booklib = sys.argv.pop(0) - if booklib == '-': - booklib = 'goob' + logger = logging.getLogger('create_isbn_edition')
-# Get the native language -# The language code is only required when P/Q parameters are added, or different from the LANG code -if len(sys.argv) > 0: - mainlang = sys.argv.pop(0) + # Get optional parameters + local_args = pywikibot.handle_args(*args)
-# Get additional P/Q parameters -while len(sys.argv) > 0: - inpar = propre.findall(sys.argv.pop(0).upper())[0] - target[inpar] = qsuffre.findall(sys.argv.pop(0).upper())[0] + # Login to Wikibase instance + wikidata_site = pywikibot.Site('wikidata') + # Required for wikidata object access (item, property, statement) + repo = wikidata_site.data_repository()
-# Validate P/Q list -proptyx={} -targetx={} + # Get the digital library + if local_args: + booklib = local_args.pop(0) + if booklib == '-': + booklib = 'goob'
-# Validate the propery/instance pair -for propty in target: - if propty not in proptyx: - proptyx[propty] = pywikibot.PropertyPage(repo, propty) - targetx[propty] = pywikibot.ItemPage(repo, target[propty]) - targetx[propty].get(get_redirect=True) - if propty in propreqinst and ('P31' not in targetx[propty].claims or not is_in_list(targetx[propty].claims['P31'], propreqinst[propty])): - logger.critical('%s (%s) is not a language' % (targetx[propty].labels[mainlang], target[propty])) - sys.exit(12) + # Get the native language + # The language code is only required when P/Q parameters are added, + # or different from the LANG code + if local_args: + mainlang = local_args.pop(0)
-# Get list of item numbers -inputfile = sys.stdin.read() # Typically the Appendix list of references of e.g. a Wikipedia page containing ISBN numbers -itemlist = sorted(set(isbnre.findall(inputfile))) # Extract all ISBN numbers + # Get additional P/Q parameters + while local_args: + inpar = propre.findall(local_args.pop(0).upper())[0] + target[inpar] = qsuffre.findall(local_args(0).upper())[0]
-for isbn_number in itemlist: # Process the next edition - amend_isbn_edition(isbn_number) + # Validate P/Q list + proptyx = {} + targetx = {}
-# Einde van de miserie -""" -Notes: + # Validate the propery/instance pair + for propty in target: + if propty not in proptyx: + proptyx[propty] = pywikibot.PropertyPage(repo, propty) + targetx[propty] = pywikibot.ItemPage(repo, target[propty]) + targetx[propty].get(get_redirect=True) + if propty in propreqinst and ( + 'P31' not in targetx[propty].claims + or not is_in_list(targetx[propty].claims['P31'], + propreqinst[propty])): + logger.critical('%s (%s) is not a language' + % (targetx[propty].labels[mainlang], + target[propty])) + sys.exit(12) + + # check dependencies + for module in (isbnlib, unidecode): + if isinstance(module, ImportError): + raise module + + # Get list of item numbers + # Typically the Appendix list of references of e.g. a Wikipedia page + # containing ISBN numbers + inputfile = pywikibot.input('Get list of item numbers') + # Extract all ISBN numbers + itemlist = sorted(set(isbnre.findall(inputfile))) + + for isbn_number in itemlist: # Process the next edition + amend_isbn_edition(isbn_number)
-""" +if __name__ == '__main__': + main() diff --git a/setup.py b/setup.py index 21779d9..00a1cb9 100755 --- a/setup.py +++ b/setup.py @@ -97,6 +97,7 @@
# ------- setup extra_requires for scripts ------- # script_deps = { + 'create_isbn_edition.py': ['isbnlib', 'unidecode'], 'commons_information.py': extra_deps['mwparserfromhell'], 'patrol.py': extra_deps['mwparserfromhell'], 'weblinkchecker.py': extra_deps['memento'], diff --git a/tests/script_tests.py b/tests/script_tests.py index 94d0b80..d499269 100755 --- a/tests/script_tests.py +++ b/tests/script_tests.py @@ -26,6 +26,7 @@ # These dependencies are not always the package name which is in setup.py. # Here, the name given to the module which will be imported is required. script_deps = { + 'create_isbn_edition': ['isbnlib', 'unidecode'], 'commons_information': ['mwparserfromhell'], 'patrol': ['mwparserfromhell'], 'weblinkchecker': ['memento_client'], @@ -374,7 +375,7 @@ # Here come scripts requiring and missing dependencies, that haven't been # fixed to output -help in that case. _expected_failures = {'version'} - _allowed_failures = ['create_isbn_edition'] + _allowed_failures = []
_arguments = '-help' _results = None diff --git a/tox.ini b/tox.ini index ecf4bfc..3b35408 100644 --- a/tox.ini +++ b/tox.ini @@ -164,7 +164,6 @@ scripts/clean_sandbox.py: N816 scripts/commonscat.py: N802, N806, N816 scripts/cosmetic_changes.py: N816 - scripts/create_isbn_edition.py: C901, D100, E402, E501, F405, T201 scripts/dataextend.py: C901, D101, D102, E126, E127, E131, E501 scripts/harvest_template.py: N802, N816 scripts/interwiki.py: N802, N803, N806, N816
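
For reference, the ISBN extraction step at the end of main() (`sorted(set(isbnre.findall(inputfile)))`) can be tried on its own. This is a small sketch that reuses the `isbnre` pattern from the script; the function name `extract_isbn_candidates` and the sample text are assumptions made for the illustration.

```python
import re

# Same pattern as in the script: 10 to 17 characters, digits or hyphens.
isbnre = re.compile(r'[0-9-]{10,17}')


def extract_isbn_candidates(text):
    """Return the unique ISBN-like strings found in free text, sorted."""
    return sorted(set(isbnre.findall(text)))


sample = (
    'Cresens, A. (2012). ISBN 9789042925564. '
    'See also ISBN 978-0-596-10089-6 and again 9789042925564.'
)
candidates = extract_isbn_candidates(sample)
# The duplicate is dropped; the 4-digit year is too short to match.
print(candidates)
```

The pattern is deliberately loose: any run of ten or more digits or hyphens is accepted as a candidate, so in the script the actual validation is left to the `isbnlib.meta()` lookup in `amend_isbn_edition()`.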