jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/461089 )
Change subject: [IMPR] Improvements for solve_disambiguation.py
......................................................................

[IMPR] Improvements for solve_disambiguation.py

- firstlinks() becomes a generator
- titles in firstize() becomes a set for lookup

Change-Id: I145aad348cf0590e5773f2a042cc28e4701af152
---
M scripts/solve_disambiguation.py
1 file changed, 3 insertions(+), 6 deletions(-)

Approvals:
  Framawiki: Looks good to me, approved
  jenkins-bot: Verified

diff --git a/scripts/solve_disambiguation.py b/scripts/solve_disambiguation.py
index 823f513..53444bc 100755
--- a/scripts/solve_disambiguation.py
+++ b/scripts/solve_disambiguation.py
@@ -715,13 +715,11 @@
         Lines without an asterisk at the beginning will be disregarded.
         No check for page existence, it has already been done.
         """
-        links = []
         reg = re.compile(r'\*.*?\[\[(.*?)(?:\||\]\])')
-        for line in page.get().splitlines():
+        for line in page.text.splitlines():
             found = reg.match(line)
             if found:
-                links.append(found.group(1))
-        return links
+                yield found.group(1)
 
     def firstize(self, page, links):
         """Call firstlinks and remove extra links.
@@ -729,9 +727,8 @@
         This will remove a lot of silly redundant links from overdecorated
         disambiguation pages and leave the first link of each asterisked
         line only. This must be done if -first is used in command line.
-
         """
-        titles = [firstcap(t) for t in self.firstlinks(page)]
+        titles = {firstcap(t) for t in self.firstlinks(page)}
         links = list(links)
         for l in links[:]:  # uses a copy because of remove!
             if l.title() not in titles:
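
For readers who want to try the two changes outside of pywikibot, here is a minimal standalone sketch. It assumes plain strings instead of Page objects; `first_links`, `first_cap` and `keep_first_links` are hypothetical stand-ins written for this illustration (they are not the script's functions), and the regex assumes the escaped form of the pattern in the diff.

```python
import re

# Assumed escaped form of the pattern from the diff: the first [[link]]
# on a line that starts with an asterisk.
FIRST_LINK = re.compile(r'\*.*?\[\[(.*?)(?:\||\]\])')


def first_links(text):
    """Yield the first link target of each asterisked line (hypothetical)."""
    for line in text.splitlines():
        found = FIRST_LINK.match(line)
        if found:
            yield found.group(1)


def first_cap(title):
    """Stand-in for firstcap(): upper-case the first character only."""
    return title[:1].upper() + title[1:]


def keep_first_links(text, link_titles):
    """Keep only titles that occur as a first link, using set membership."""
    titles = {first_cap(t) for t in first_links(text)}
    return [t for t in link_titles if t in titles]


sample = (
    "'''Mercury''' may refer to:\n"
    "* [[mercury (planet)|Mercury]], a [[planet]]\n"
    "* [[Mercury (element)]], a [[metal]]\n"
    "See also [[Venus]]\n"
)
candidates = ['Mercury (planet)', 'Planet', 'Metal', 'Mercury (element)']

print(list(first_links(sample)))
print(keep_first_links(sample, candidates))
```

Because the generator can only be consumed once, a caller that needs the titles more than once has to materialise them first, which is what the set comprehension does here. The set also makes each `in` test a hash lookup rather than a scan of a list.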