jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/475613 )
Change subject: proofreadpage.py: OCR needs BeautifulSoup
......................................................................
proofreadpage.py: OCR needs BeautifulSoup
In proofreadpage.py, OCR needs BeautifulSoup in:
- url_image()
- _do_hocr()
Soup() is defined at import time only if bs4 is available.
Define it also when bs4 is not available and make it raise
ImportError when called.
Rename Soup() to _bs4_soup() to comply with function naming rules.
OCR tests are already skipped if bs4 is not available:
- see Iaeabb046660b294fa19025282a344356f756c5bf
Bug: T210335
Change-Id: I5e3d235cdb1cba9b4ed52ba2442a9bfb1802d9bf
---
M pywikibot/proofreadpage.py
1 file changed, 18 insertions(+), 7 deletions(-)
Approvals:
Xqt: Looks good to me, approved
jenkins-bot: Verified
diff --git a/pywikibot/proofreadpage.py b/pywikibot/proofreadpage.py
index e22c2e6..5432d70 100644
--- a/pywikibot/proofreadpage.py
+++ b/pywikibot/proofreadpage.py
@@ -38,20 +38,29 @@
from bs4 import BeautifulSoup, FeatureNotFound
except ImportError as e:
BeautifulSoup = e
+
+ def _bs4_soup(*args, **kwargs):
+ """Raise BeautifulSoup when called, if bs4 is not
available."""
+ raise BeautifulSoup
else:
try:
BeautifulSoup('', 'lxml')
except FeatureNotFound:
- Soup = partial(BeautifulSoup, features='html.parser')
+ _bs4_soup = partial(BeautifulSoup, features='html.parser')
else:
- Soup = partial(BeautifulSoup, features='lxml')
+ _bs4_soup = partial(BeautifulSoup, features='lxml')
import pywikibot
from pywikibot.comms import http
from pywikibot.data.api import Request
+from pywikibot.tools import ModuleDeprecationWrapper
_logger = 'proofreadpage'
+wrapper = ModuleDeprecationWrapper(__name__)
+wrapper._add_deprecated_attr('Soup', _bs4_soup, replacement_name='_bs4_soup',
+                             since='20181128')
+
class FullHeader(object):
@@ -524,9 +533,10 @@
@rtype: str/unicode
@raises Exception: in case of http errors
+ @raise ImportError: if bs4 is not installed, _bs4_soup() will raise
@raises ValueError: in case of no prp_page_image src found for scan
"""
- # wrong link fail with various possible Exceptions.
+ # wrong link fails with various possible Exceptions.
if not hasattr(self, '_url_image'):
if self.exists():
@@ -541,7 +551,7 @@
pywikibot.error('Error fetching HTML for %s.' % self)
raise
- soup = Soup(response.text)
+ soup = _bs4_soup(response.text)
try:
self._url_image = soup.find(class_='prp-page-image')
@@ -623,10 +633,11 @@
This is the main method for 'phetools'.
Fallback method is ocr.
+ @raise ImportError: if bs4 is not installed, _bs4_soup() will raise
"""
def parse_hocr_text(txt):
"""Parse hocr text."""
- soup = Soup(txt)
+ soup = _bs4_soup(txt)
res = []
for ocr_page in soup.find_all(class_='ocr_page'):
@@ -823,7 +834,7 @@
del self._parsed_text
self._parsed_text = self._get_parsed_page()
- self._soup = Soup(self._parsed_text)
+ self._soup = _bs4_soup(self._parsed_text)
# Do not search for "new" here, to avoid to skip purging if links
# to non-existing pages are present.
attrs = {'class': re.compile('prp-pagequality')}
@@ -845,7 +856,7 @@
self.purge()
del self._parsed_text
self._parsed_text = self._get_parsed_page()
- self._soup = Soup(self._parsed_text)
+ self._soup = _bs4_soup(self._parsed_text)
if not self._soup.find_all('a', attrs=attrs):
raise ValueError(
'Missing class="qualityN prp-pagequality-N" or '
--
To view, visit https://gerrit.wikimedia.org/r/475613
To unsubscribe, or for help writing mail filters, visit https://gerrit.wikimedia.org/r/settings
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5e3d235cdb1cba9b4ed52ba2442a9bfb1802d9bf
Gerrit-Change-Number: 475613
Gerrit-PatchSet: 6
Gerrit-Owner: Mpaa <mpaa.wiki(a)gmail.com>
Gerrit-Reviewer: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Mpaa <mpaa.wiki(a)gmail.com>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot (75)