jenkins-bot merged this change.


Approvals:
  Framawiki: Looks good to me, approved
  jenkins-bot: Verified

[IMPR] Improvements for solve_disambiguation.py

- firstlinks() becomes a generator
- titles in firstize() becomes a set for faster membership lookup

Change-Id: I145aad348cf0590e5773f2a042cc28e4701af152
---
M scripts/solve_disambiguation.py
1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/scripts/solve_disambiguation.py b/scripts/solve_disambiguation.py
index 823f513..53444bc 100755
--- a/scripts/solve_disambiguation.py
+++ b/scripts/solve_disambiguation.py
@@ -715,13 +715,11 @@
         Lines without an asterisk at the beginning will be disregarded.
         No check for page existence, it has already been done.
         """
-        links = []
         reg = re.compile(r'\*.*?\[\[(.*?)(?:\||\]\])')
-        for line in page.get().splitlines():
+        for line in page.text.splitlines():
             found = reg.match(line)
             if found:
-                links.append(found.group(1))
-        return links
+                yield found.group(1)

     def firstize(self, page, links):
         """Call firstlinks and remove extra links.
@@ -729,9 +727,8 @@
         This will remove a lot of silly redundant links from overdecorated
         disambiguation pages and leave the first link of each asterisked
         line only. This must be done if -first is used in command line.
-
         """
-        titles = [firstcap(t) for t in self.firstlinks(page)]
+        titles = {firstcap(t) for t in self.firstlinks(page)}
         links = list(links)
         for l in links[:]: # uses a copy because of remove!
             if l.title() not in titles:
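
The first change turns firstlinks() into a generator. Below is a minimal standalone sketch of the same idea. It assumes raw wikitext is passed in as a string rather than a pywikibot Page, so it runs without a wiki connection. The names `first_links`, `LINK_RE` and `SAMPLE` are hypothetical, but the regular expression is the one used in the patch:

```python
import re
from typing import Iterator

# Same pattern as in the patch: a line starting with '*', then the first
# [[...]] link on that line, captured up to '|' or ']]'.
LINK_RE = re.compile(r'\*.*?\[\[(.*?)(?:\||\]\])')


def first_links(text: str) -> Iterator[str]:
    """Yield the first link target of each line that starts with '*'.

    Hypothetical standalone variant: takes wikitext instead of a Page.
    """
    for line in text.splitlines():
        found = LINK_RE.match(line)  # match() only matches at line start
        if found:
            yield found.group(1)


SAMPLE = """'''Foo''' may refer to:
* [[Foo (band)|Foo]], a band, see also [[Bar]]
* [[foo bar]]
No asterisk: [[Ignored]]
"""

links = first_links(SAMPLE)  # a generator: no line is scanned yet
print(list(links))  # ['Foo (band)', 'foo bar']
```

Because firstize() consumes the result exactly once to build its title collection, nothing is lost by skipping the intermediate list. A caller that needs to iterate more than once would have to wrap the generator in list() first.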

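The second change builds `titles` as a set. Each `l.title() not in titles` check then becomes an average constant-time hash lookup instead of a linear scan of a list. The following is a sketch under stated assumptions. Plain title strings stand in for pywikibot Page objects. `first_cap` is a hypothetical stand-in for the script's `firstcap()` helper and is assumed to upper-case only the first character:

```python
from typing import Iterable, List


def first_cap(title: str) -> str:
    """Hypothetical stand-in for firstcap(): upper-case the first character."""
    return title[:1].upper() + title[1:]


def firstize(first_link_titles: Iterable[str], links: Iterable[str]) -> List[str]:
    """Keep only the links whose title is a first link of some '*' line.

    Sketch using plain strings instead of pywikibot Page objects.
    """
    # Set: average O(1) membership tests. With a list, every 'in' below
    # would scan all titles, making the filter O(len(links) * len(titles)).
    titles = {first_cap(t) for t in first_link_titles}
    # Same result as the patch's remove-from-a-copy loop, order preserved.
    return [link for link in links if link in titles]


kept = firstize(['Foo (band)', 'foo bar'], ['Foo (band)', 'Bar', 'Foo bar'])
print(kept)  # ['Foo (band)', 'Foo bar']
```

The list comprehension is a simplification of the loop in the patch. It avoids calling remove() on a list inside a loop, which is itself a linear operation.
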
To view, visit change 461089.

Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I145aad348cf0590e5773f2a042cc28e4701af152
Gerrit-Change-Number: 461089
Gerrit-PatchSet: 1
Gerrit-Owner: Xqt <info@gno.de>
Gerrit-Reviewer: Bináris <wikiposta@gmail.com>
Gerrit-Reviewer: Framawiki <framawiki@tools.wmflabs.org>
Gerrit-Reviewer: John Vandenberg <jayvdb@gmail.com>
Gerrit-Reviewer: jenkins-bot (75)