lists.wikimedia.org
Pywikibot-commits
pywikibot-commits@lists.wikimedia.org
1 participant
13424 discussions
[Gerrit] Add "category" and "file" as exemptions in ReplaceExampt - change (pywikibot/core)
by jenkins-bot (Code Review)
10 Nov '13
jenkins-bot has submitted this change and it was merged.

Change subject: Add "category" and "file" as exemptions in ReplaceExampt
......................................................................

Add "category" and "file" as exemptions in ReplaceExampt

Implemented from compat - synchronized with compat by xqt

Change-Id: I062f21c592a04cbc872cf287a6873410c70ed864
---
M pywikibot/textlib.py
1 file changed, 6 insertions(+), 2 deletions(-)

Approvals:
  Xqt: Looks good to me, approved
  jenkins-bot: Verified

diff --git a/pywikibot/textlib.py b/pywikibot/textlib.py
index c30b678..f1df622 100644
--- a/pywikibot/textlib.py
+++ b/pywikibot/textlib.py
@@ -94,6 +94,10 @@
         'property': re.compile(r'(?i)\{\{\s*#property:\s*p\d+\s*\}\}'),
         # Module invocations (currently only Lua)
         'invoke': re.compile(r'(?i)\{\{\s*#invoke:.*?}\}'),
+        # categories
+        'category': re.compile(ur'\[\[ *(?:%s)\s*:.*?\]\]' % ur'|'.join(site.namespace(14, all=True))),
+        #files
+        'file': re.compile(ur'\[\[ *(?:%s)\s*:.*?\]\]' % ur'|'.join(site.namespace(6, all=True))),
     }
 
@@ -219,10 +223,10 @@
             groupMatch = groupR.search(replacement)
             if not groupMatch:
                 break
-            groupID = (groupMatch.group('name') or \
+            groupID = (groupMatch.group('name') or
                        int(groupMatch.group('number')))
             try:
-                replacement = (replacement[:groupMatch.start()] + \
+                replacement = (replacement[:groupMatch.start()] +
                                match.group(groupID) + \
                                replacement[groupMatch.end():])
             except IndexError:

--
To view, visit
https://gerrit.wikimedia.org/r/94494
To unsubscribe, visit
https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged Gerrit-Change-Id: I062f21c592a04cbc872cf287a6873410c70ed864 Gerrit-PatchSet: 4 Gerrit-Project: pywikibot/core Gerrit-Branch: master Gerrit-Owner: Ladsgroup <ladsgroup(a)gmail.com> Gerrit-Reviewer: Xqt <info(a)gno.de> Gerrit-Reviewer: jenkins-bot
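The two exemption patterns added by this change can be tried outside Pywikibot. The following is only a sketch under stated assumptions, not the library's code: it targets Python 3 (the patch itself is Python 2), CATEGORY_NAMES and FILE_NAMES are hypothetical stand-ins for what site.namespace(14, all=True) and site.namespace(6, all=True) would return, replace_except is a simplified illustration of skipping exempted spans, and the re.escape call is an addition that the patch does not make.

```python
import re

# Hypothetical namespace aliases. In the patch these lists come from
# site.namespace(14, all=True) and site.namespace(6, all=True).
CATEGORY_NAMES = ['Category', 'Kategorie']
FILE_NAMES = ['File', 'Image', 'Datei']


def link_regex(names):
    """Build a regex matching [[<name>:...]] links for the given names."""
    # re.escape is an assumption of this sketch; the patch joins the raw names.
    alternatives = '|'.join(re.escape(name) for name in names)
    return re.compile(r'\[\[ *(?:%s)\s*:.*?\]\]' % alternatives)


EXEMPTIONS = {
    'category': link_regex(CATEGORY_NAMES),
    'file': link_regex(FILE_NAMES),
}


def replace_except(text, old, new, exceptions):
    """Replace regex `old` with `new`, leaving exempted spans untouched."""
    protected = []
    for key in exceptions:
        protected.extend(m.span() for m in EXEMPTIONS[key].finditer(text))

    def substitute(match):
        for start, end in protected:
            if match.start() < end and match.end() > start:
                return match.group(0)  # overlaps an exempted span: keep as is
        return new

    return re.sub(old, substitute, text)


sample = 'A cat sat. [[Category:cat breeds]] [[File:cat.jpg|thumb]]'
result = replace_except(sample, r'cat', 'dog', ['category', 'file'])
print(result)
```

With both exemptions active, only the occurrence outside the category and file links is replaced; passing an empty exception list replaces every occurrence.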
[Gerrit] PEP8 changes for test stuff - change (pywikibot/compat)
by Xqt (Code Review)
10 Nov '13
Xqt has submitted this change and it was merged.

Change subject: PEP8 changes for test stuff
......................................................................

PEP8 changes for test stuff

Change-Id: I281e8b1ff765d7ea6b005bb79d8449cbb6b1cd78
---
M tests/test_wikipedia.py
M tests/test_wiktionary.py
2 files changed, 277 insertions(+), 236 deletions(-)

Approvals:
  Xqt: Looks good to me, approved
  jenkins-bot: Verified

diff --git a/tests/test_wikipedia.py b/tests/test_wikipedia.py
index 9195de5..678e6ce 100644
--- a/tests/test_wikipedia.py
+++ b/tests/test_wikipedia.py
@@ -22,156 +22,152 @@
 # a set of hard pages for Page.getSections()
 PAGE_SET_Page_getSections = [
-u'Benutzer Diskussion:Reiner Stoppok/Dachboden',
-u'Wikipedia:Löschkandidaten/12. Dezember 2009',  # https://bugzilla.wikimedia.org/show_bug.cgi?id=32753
-u'Wikipedia:Löschkandidaten/28. Juli 2006', -u'Wikipedia Diskussion:Persönliche Bekanntschaften/Archiv/2008', -u'Wikipedia:WikiProjekt München', # bugzilla:32753 -u'Wikipedia Diskussion:Hauptseite', -u'Diskussion:Selbstkühlendes Bierfass', -u'Benutzer Diskussion:P.Copp', -u'Benutzer Diskussion:David Ludwig', -u'Diskussion:Zufall', -u'Benutzer Diskussion:Dekator', -u'Benutzer Diskussion:Bautsch', -u'Benutzer Diskussion:Henbeu', -u'Benutzer Diskussion:Olaf Studt', -u'Diskussion:K.-o.-Tropfen', -u'Portal Diskussion:Fußball/Archiv6', -u'Benutzer Diskussion:Roland.M/Archiv2006-2007', -u'Benutzer Diskussion:Tigerente/Archiv2006', -u'Wikipedia:WikiProjekt Bremen/Beobachtungsliste', # bugzilla:32753 -u'Diskussion:Wirtschaft Chiles', -u'Benutzer Diskussion:Ausgangskontrolle', -u'Benutzer Diskussion:Amnesty.tina', -#u'Diskussion:Chicagoer Schule', # [ DELETED ] -#u'Wikipedia Diskussion:Hausaufgabenhilfe', # [ DELETED ] -u'Benutzer Diskussion:Niemot', -u'Benutzer Diskussion:Computer356', -u'Benutzer Diskussion:Bautsch', -u'Benutzer Diskussion:Infinite Monkey', -u'Benutzer Diskussion:Lsjm', -u'Benutzer Diskussion:Eduardo79', -u'Benutzer Diskussion:Rigidmc', -u'Benutzer Diskussion:Gilgamesch2010', -u'Benutzer Diskussion:Paulusschinew', -u'Benutzer Diskussion:Hollister71', -u'Benutzer Diskussion:Schott-PR', -u'Benutzer Diskussion:RoBoVsKi', -#u'Benutzer Diskussion:Tjaraaa', # [ REDIRECTED ] -u'Benutzer Diskussion:Jason Hits', -u'Benutzer Diskussion:Fit-Fabrik', -u'Benutzer Diskussion:SpaceRazor', -u'Benutzer Diskussion:Fachversicherer', -u'Benutzer Diskussion:Qniemiec', -u'Benutzer Diskussion:Ilikeriri', -u'Benutzer Diskussion:Casinoroyal', -u'Benutzer Diskussion:Havanabua', -u'Benutzer Diskussion:Euku/2010/II. Quartal', # bugzilla:32753 -u'Benutzer Diskussion:Mo4jolo/Archiv/2008', -u'Benutzer Diskussion:Eschweiler', -u'Benutzer Diskussion:Marilyn.hanson', -u'Benutzer Diskussion:A.Savin', -u'Benutzer Diskussion:W!B:/Knacknüsse', -u'Benutzer Diskussion:Euku/2009/II. 
Halbjahr', -u'Benutzer Diskussion:Gamma', -u'Hilfe Diskussion:Captcha', -u'Benutzer Diskussion:Zacke/Kokytos', -u'Benutzer Diskussion:Wolfgang1018', -u'Benutzer Diskussion:El bes', -u'Benutzer Diskussion:Janneman/Orkus', -u'Wikipedia Diskussion:Shortcuts', -u'Benutzer Diskussion:PDD', -u'Wikipedia:WikiProjekt Vorlagen/Werkstatt', -u'Wikipedia Diskussion:WikiProjekt Wuppertal/2008', -u'Benutzer Diskussion:SchirmerPower', -u'Benutzer Diskussion:Stefan Kühn/Check Wikipedia', -u'Benutzer Diskussion:Elian', -u'Wikipedia:Fragen zur Wikipedia', -u'Benutzer Diskussion:Michael Kühntopf', -u'Benutzer Diskussion:Drahreg01', -u'Wikipedia:Vandalismusmeldung', -u'Benutzer Diskussion:Jesusfreund', -u'Benutzer Diskussion:Velipp28', -u'Benutzer Diskussion:Jotge', -u'Benutzer Diskussion:DAJ', -u'Benutzer Diskussion:Karl-G. Walther', -u'Benutzer Diskussion:Pincerno', -u'Benutzer Diskussion:Polluks', -u'Portal:Serbien/Nachrichtenarchiv', -u'Benutzer Diskussion:Elly200253', -u'Benutzer Diskussion:Yak', -u'Wikipedia:Auskunft', -u'Benutzer Diskussion:Toolittle', -u'Benutzer Diskussion:He3nry', -u'Benutzer Diskussion:Euku/2009/I. Halbjahr', -u'Benutzer Diskussion:Elchbauer' , -u'Benutzer Diskussion:Matthiasb', -u'Benutzer Diskussion:Gripweed', -u'Wikipedia:Löschkandidaten/10. 
Februar 2011', -u'Benutzer Diskussion:Funkruf', -u'Benutzer Diskussion:Vux', -u'Benutzer Diskussion:Zollernalb/Archiv/2008' , -u'Benutzer Diskussion:Geiserich77/Archiv2009', -u'Benutzer Diskussion:Markus Mueller/Archiv' , -u'Benutzer Diskussion:Capaci34/Archiv/2009', -u'Wikipedia Diskussion:Persönliche Bekanntschaften/Archiv/2010', -u'Benutzer Diskussion:Leithian/Archiv/2009/Aug', -u'Benutzer Diskussion:Lady Whistler/Archiv/2010', -u'Benutzer Diskussion:Jens Liebenau/Archiv1', -u'Benutzer Diskussion:Tilla/Archiv/2009/Juli', -u'Benutzer Diskussion:Xqt', -u'Vorlage Diskussion:Benutzerdiskussionsseite', -u'Wikipedia Diskussion:Meinungsbilder/Gestaltung von Signaturen', -u'Benutzer Diskussion:JvB1953', -u'Benutzer Diskussion:J.-H. Janßen', -u'Benutzer Diskussion:Xqt/Archiv/2009-1', -u'Hilfe Diskussion:Weiterleitung/Archiv/1', -u'Benutzer Diskussion:Raymond/Archiv 2006-2', -u'Wikipedia Diskussion:Projektneuheiten/Archiv/2009', -u'Vorlage Diskussion:Erledigt', -u'Wikipedia:Bots/Anfragen/Archiv/2008-2', -u'Diskussion:Golfschläger/Archiv', -u'Wikipedia:Löschkandidaten/9. Januar 2006', -u'Benutzer Diskussion:Church of emacs/Archiv5', -u'Wikipedia:WikiProjekt Vorlagen/Werkstatt/Archiv 2006', -u'Wikipedia Diskussion:Löschkandidaten/Archiv7', -u'Benutzer Diskussion:Physikr', -u'Benutzer Diskussion:Haring/Archiv, Dez. 2005', -u'Benutzer Diskussion:Seewolf/Archiv 7', -u'Benutzer Diskussion:Mipago/Archiv', -u'Wikipedia Diskussion:WikiProjekt Syntaxkorrektur/Archiv/2009', -u'Benutzer Diskussion:PDD/monobook.js', -u'Wikipedia:Löschkandidaten/9. 
April 2010', -u'Benutzer Diskussion:Augiasstallputzer/Archiv', -u'Hilfe Diskussion:Variablen', -u'Benutzer Diskussion:Merlissimo/Archiv/2009', -u'Benutzer Diskussion:Elya/Archiv 2007-01', -u'Benutzer Diskussion:Merlissimo/Archiv/2010', -u'Benutzer Diskussion:Jonathan Groß/Archiv 2006', -u'Benutzer Diskussion:Erendissss', -u'Diskussion:Ilse Elsner', -u'Diskussion:Pedro Muñoz', -u'Diskussion:Stimmkreis Nürnberg-Süd', -u'Diskussion:Geschichte der Sozialversicherung in Deutschland', -u'Diskussion:Josef Kappius', -u'Diskussion:Bibra (Adelsgeschlecht)', -#u'Diskussion:Stimmkreis Regensburg-Land-Ost', # [ DELETED ] -u'Diskussion:Volkmar Kretkowski', -u'Diskussion:KS Cracovia', -u'Diskussion:Livingston (Izabal)', -u'Wikipedia Diskussion:WikiProjekt Gesprochene Wikipedia/Howto', -u'Benutzer Diskussion:Otfried Lieberknecht', -u'Benutzer Diskussion:Jahn Henne', -u'Wikipedia:WikiProjekt Begriffsklärungsseiten/Fließband', -u'Wikipedia:Löschprüfung', -u'Benutzer Diskussion:Hubertl', -u'Benutzer Diskussion:Diba', -u'Wikipedia:Qualitätssicherung/11. März 2012', -u'Benutzer Diskussion:Heubergen/Archiv/2012', -u'Benutzer Diskussion:DrTrigon/Archiv', -u'Wikipedia:Fotowerkstatt', -u'Wikipedia:Urheberrechtsfragen', + u'Benutzer Diskussion:Reiner Stoppok/Dachboden', + u'Wikipedia:Löschkandidaten/12. Dezember 2009', #
https://bugzilla.wikimedia.org/show_bug.cgi?id=32753
+ u'Wikipedia:Löschkandidaten/28. Juli 2006', + u'Wikipedia Diskussion:Persönliche Bekanntschaften/Archiv/2008', + u'Wikipedia:WikiProjekt München', # bugzilla:32753 + u'Wikipedia Diskussion:Hauptseite', + u'Diskussion:Selbstkühlendes Bierfass', + u'Benutzer Diskussion:P.Copp', + u'Benutzer Diskussion:David Ludwig', + u'Diskussion:Zufall', + u'Benutzer Diskussion:Dekator', + u'Benutzer Diskussion:Bautsch', + u'Benutzer Diskussion:Henbeu', + u'Benutzer Diskussion:Olaf Studt', + u'Diskussion:K.-o.-Tropfen', + u'Portal Diskussion:Fußball/Archiv6', + u'Benutzer Diskussion:Roland.M/Archiv2006-2007', + u'Benutzer Diskussion:Tigerente/Archiv2006', + u'Wikipedia:WikiProjekt Bremen/Beobachtungsliste', # bugzilla:32753 + u'Diskussion:Wirtschaft Chiles', + u'Benutzer Diskussion:Ausgangskontrolle', + u'Benutzer Diskussion:Amnesty.tina', + u'Benutzer Diskussion:Niemot', + u'Benutzer Diskussion:Computer356', + u'Benutzer Diskussion:Bautsch', + u'Benutzer Diskussion:Infinite Monkey', + u'Benutzer Diskussion:Lsjm', + u'Benutzer Diskussion:Eduardo79', + u'Benutzer Diskussion:Rigidmc', + u'Benutzer Diskussion:Gilgamesch2010', + u'Benutzer Diskussion:Paulusschinew', + u'Benutzer Diskussion:Hollister71', + u'Benutzer Diskussion:Schott-PR', + u'Benutzer Diskussion:RoBoVsKi', + u'Benutzer Diskussion:Jason Hits', + u'Benutzer Diskussion:Fit-Fabrik', + u'Benutzer Diskussion:SpaceRazor', + u'Benutzer Diskussion:Fachversicherer', + u'Benutzer Diskussion:Qniemiec', + u'Benutzer Diskussion:Ilikeriri', + u'Benutzer Diskussion:Casinoroyal', + u'Benutzer Diskussion:Havanabua', + u'Benutzer Diskussion:Euku/2010/II. Quartal', # bugzilla:32753 + u'Benutzer Diskussion:Mo4jolo/Archiv/2008', + u'Benutzer Diskussion:Eschweiler', + u'Benutzer Diskussion:Marilyn.hanson', + u'Benutzer Diskussion:A.Savin', + u'Benutzer Diskussion:W!B:/Knacknüsse', + u'Benutzer Diskussion:Euku/2009/II. 
Halbjahr', + u'Benutzer Diskussion:Gamma', + u'Hilfe Diskussion:Captcha', + u'Benutzer Diskussion:Zacke/Kokytos', + u'Benutzer Diskussion:Wolfgang1018', + u'Benutzer Diskussion:El bes', + u'Benutzer Diskussion:Janneman/Orkus', + u'Wikipedia Diskussion:Shortcuts', + u'Benutzer Diskussion:PDD', + u'Wikipedia:WikiProjekt Vorlagen/Werkstatt', + u'Wikipedia Diskussion:WikiProjekt Wuppertal/2008', + u'Benutzer Diskussion:SchirmerPower', + u'Benutzer Diskussion:Stefan Kühn/Check Wikipedia', + u'Benutzer Diskussion:Elian', + u'Wikipedia:Fragen zur Wikipedia', + u'Benutzer Diskussion:Michael Kühntopf', + u'Benutzer Diskussion:Drahreg01', + u'Wikipedia:Vandalismusmeldung', + u'Benutzer Diskussion:Jesusfreund', + u'Benutzer Diskussion:Velipp28', + u'Benutzer Diskussion:Jotge', + u'Benutzer Diskussion:DAJ', + u'Benutzer Diskussion:Karl-G. Walther', + u'Benutzer Diskussion:Pincerno', + u'Benutzer Diskussion:Polluks', + u'Portal:Serbien/Nachrichtenarchiv', + u'Benutzer Diskussion:Elly200253', + u'Benutzer Diskussion:Yak', + u'Wikipedia:Auskunft', + u'Benutzer Diskussion:Toolittle', + u'Benutzer Diskussion:He3nry', + u'Benutzer Diskussion:Euku/2009/I. Halbjahr', + u'Benutzer Diskussion:Elchbauer', + u'Benutzer Diskussion:Matthiasb', + u'Benutzer Diskussion:Gripweed', + u'Wikipedia:Löschkandidaten/10. 
Februar 2011', + u'Benutzer Diskussion:Funkruf', + u'Benutzer Diskussion:Vux', + u'Benutzer Diskussion:Zollernalb/Archiv/2008', + u'Benutzer Diskussion:Geiserich77/Archiv2009', + u'Benutzer Diskussion:Markus Mueller/Archiv', + u'Benutzer Diskussion:Capaci34/Archiv/2009', + u'Wikipedia Diskussion:Persönliche Bekanntschaften/Archiv/2010', + u'Benutzer Diskussion:Leithian/Archiv/2009/Aug', + u'Benutzer Diskussion:Lady Whistler/Archiv/2010', + u'Benutzer Diskussion:Jens Liebenau/Archiv1', + u'Benutzer Diskussion:Tilla/Archiv/2009/Juli', + u'Benutzer Diskussion:Xqt', + u'Vorlage Diskussion:Benutzerdiskussionsseite', + u'Wikipedia Diskussion:Meinungsbilder/Gestaltung von Signaturen', + u'Benutzer Diskussion:JvB1953', + u'Benutzer Diskussion:J.-H. Janßen', + u'Benutzer Diskussion:Xqt/Archiv/2009-1', + u'Hilfe Diskussion:Weiterleitung/Archiv/1', + u'Benutzer Diskussion:Raymond/Archiv 2006-2', + u'Wikipedia Diskussion:Projektneuheiten/Archiv/2009', + u'Vorlage Diskussion:Erledigt', + u'Wikipedia:Bots/Anfragen/Archiv/2008-2', + u'Diskussion:Golfschläger/Archiv', + u'Wikipedia:Löschkandidaten/9. Januar 2006', + u'Benutzer Diskussion:Church of emacs/Archiv5', + u'Wikipedia:WikiProjekt Vorlagen/Werkstatt/Archiv 2006', + u'Wikipedia Diskussion:Löschkandidaten/Archiv7', + u'Benutzer Diskussion:Physikr', + u'Benutzer Diskussion:Haring/Archiv, Dez. 2005', + u'Benutzer Diskussion:Seewolf/Archiv 7', + u'Benutzer Diskussion:Mipago/Archiv', + u'Wikipedia Diskussion:WikiProjekt Syntaxkorrektur/Archiv/2009', + u'Benutzer Diskussion:PDD/monobook.js', + u'Wikipedia:Löschkandidaten/9. 
April 2010', + u'Benutzer Diskussion:Augiasstallputzer/Archiv', + u'Hilfe Diskussion:Variablen', + u'Benutzer Diskussion:Merlissimo/Archiv/2009', + u'Benutzer Diskussion:Elya/Archiv 2007-01', + u'Benutzer Diskussion:Merlissimo/Archiv/2010', + u'Benutzer Diskussion:Jonathan Groß/Archiv 2006', + u'Benutzer Diskussion:Erendissss', + u'Diskussion:Ilse Elsner', + u'Diskussion:Pedro Muñoz', + u'Diskussion:Stimmkreis Nürnberg-Süd', + u'Diskussion:Geschichte der Sozialversicherung in Deutschland', + u'Diskussion:Josef Kappius', + u'Diskussion:Bibra (Adelsgeschlecht)', + u'Diskussion:Volkmar Kretkowski', + u'Diskussion:KS Cracovia', + u'Diskussion:Livingston (Izabal)', + u'Wikipedia Diskussion:WikiProjekt Gesprochene Wikipedia/Howto', + u'Benutzer Diskussion:Otfried Lieberknecht', + u'Benutzer Diskussion:Jahn Henne', + u'Wikipedia:WikiProjekt Begriffsklärungsseiten/Fließband', + u'Wikipedia:Löschprüfung', + u'Benutzer Diskussion:Hubertl', + u'Benutzer Diskussion:Diba', + u'Wikipedia:Qualitätssicherung/11. 
März 2012', + u'Benutzer Diskussion:Heubergen/Archiv/2012', + u'Benutzer Diskussion:DrTrigon/Archiv', + u'Wikipedia:Fotowerkstatt', + u'Wikipedia:Urheberrechtsfragen', ] PAGE_SINGLE_GENERIC = PAGE_SET_Page_getSections[0] @@ -182,13 +178,13 @@ class PyWikiWikipediaTestCase(test_pywiki.PyWikiTestCase): def setUp(self): - result = test_pywiki.PyWikiTestCase.setUp(self) + result = test_pywiki.PyWikiTestCase.setUp(self) self.site = pywikibot.getSite('de', 'wikipedia') self.repo = self.site.data_repository() return result def test_module_import(self): - self.assertTrue( "pywikibot" in sys.modules ) + self.assertTrue("pywikibot" in sys.modules) def test_Site(self): self._check_member(pywikibot, "Site", call=True) @@ -196,16 +192,16 @@ def test_Site_getParsedString(self): self._check_member(self.site, "getParsedString", call=True) test_text = u'{{CURRENTTIMESTAMP}}' - text = self.site.getParsedString(test_text, keeptags = []) - self.assertTrue( len(text) <= len(test_text) ) + text = self.site.getParsedString(test_text, keeptags=[]) + self.assertTrue(len(text) <= len(test_text)) text = self.site.getParsedString(test_text) - self.assertTrue( len(text) >= len(test_text) ) + self.assertTrue(len(text) >= len(test_text)) def test_Site_getExpandedString(self): self._check_member(self.site, "getExpandedString", call=True) test_text = u'{{CURRENTTIMESTAMP}}' text = self.site.getExpandedString(test_text) - self.assertTrue( len(text) <= len(test_text) ) + self.assertTrue(len(text) <= len(test_text)) def test_Page(self): self._check_member(pywikibot, "Page", call=True) @@ -213,7 +209,7 @@ def test_Page_getSections(self): self._check_member(pywikibot.Page(self.site, PAGE_SINGLE_GENERIC), "getSections", call=True) - self.assertEqual( len(PAGE_SET_Page_getSections), 146 ) + self.assertEqual(len(PAGE_SET_Page_getSections), 146) count = 0 problems = [] for i, TESTPAGE in enumerate(PAGE_SET_Page_getSections): @@ -222,18 +218,18 @@ sections = page.getSections(minLevel=1) except 
pywikibot.Error: count += 1 - problems.append( (i, page) ) + problems.append((i, page)) print "Number of pages total:", len(PAGE_SET_Page_getSections) print "Number of problematic pages:", count #print "Problematic pages:", problems - print "Problematic pages:\n", "\n".join( map(str, problems) ) - self.assertLessEqual(count, round(len(PAGE_SET_Page_getSections)/50.)) + print "Problematic pages:\n", "\n".join(map(str, problems)) + self.assertLessEqual(count, round(len(PAGE_SET_Page_getSections) / 50.)) #self.assertTrue( count <= 0 ) def test_Page_purgeCache(self): page = pywikibot.Page(self.site, PAGE_SINGLE_GENERIC) self._check_member(page, "purgeCache", call=True) - self.assertEqual( page.purgeCache(), True ) + self.assertEqual(page.purgeCache(), True) def test_Page_isRedirectPage(self): page = pywikibot.Page(self.site, PAGE_SINGLE_GENERIC) @@ -243,8 +239,8 @@ def test_Page_getVersionHistory(self): page = pywikibot.Page(self.site, PAGE_SINGLE_GENERIC) self._check_member(page, "getVersionHistory", call=True) - self.assertEqual( len(page.getVersionHistory(revCount=1)), 1 ) - self.assertGreater( len(page.getVersionHistory()), 1 ) + self.assertEqual(len(page.getVersionHistory(revCount=1)), 1) + self.assertGreater(len(page.getVersionHistory()), 1) def test_Page_get(self): page = pywikibot.Page(self.site, PAGE_SINGLE_GENERIC) diff --git a/tests/test_wiktionary.py b/tests/test_wiktionary.py index 6e6e329..b5264b6 100644 --- a/tests/test_wiktionary.py +++ b/tests/test_wiktionary.py @@ -9,24 +9,28 @@ import wiktionary + class KnownValues(unittest.TestCase): knownValues = ( - ('==English==', 'en', 2, 'lang'), - ('=={{en}}==', 'en', 2, 'lang'), - ('{{-en-}}', 'en', None, 'lang'), - ('===Noun===', 'noun', 3, 'pos'), - ('==={{noun}}===', 'noun', 3, 'pos'), - ('{{-noun-}}', 'noun', None, 'pos'), - ('===Verb===', 'verb', 3, 'pos'), - ('==={{verb}}===', 'verb', 3, 'pos'), - ('{{-verb-}}', 'verb', None, 'pos'), - ('====Translations====', 'trans', 4, 'other'), - 
('===={{trans}}====', 'trans', 4, 'other'), - ('{{-trans-}}', 'trans', None, 'other'), - ) + ('==English==', 'en', 2, 'lang'), + ('=={{en}}==', 'en', 2, 'lang'), + ('{{-en-}}', 'en', None, 'lang'), + ('===Noun===', 'noun', 3, 'pos'), + ('==={{noun}}===', 'noun', 3, 'pos'), + ('{{-noun-}}', 'noun', None, 'pos'), + ('===Verb===', 'verb', 3, 'pos'), + ('==={{verb}}===', 'verb', 3, 'pos'), + ('{{-verb-}}', 'verb', None, 'pos'), + ('====Translations====', 'trans', 4, 'other'), + ('===={{trans}}====', 'trans', 4, 'other'), + ('{{-trans-}}', 'trans', None, 'other'), + ) def testHeaderInitKnownValuesContents(self): - """Header parsing comparing known result with known input for contents""" + """Header parsing comparing known result with known input for contents + + """ + for wikiline, contents, level, type in self.knownValues: result = wiktionary.Header(wikiline).contents self.assertEqual(contents, result) @@ -45,18 +49,27 @@ class SortEntriesCheckSortOrder(unittest.TestCase): - """Entries should be sorted as follows on a page: Translingual first, Wikilang next, then the others alphabetically on the language name in the Wiktionary's language """ + """Entries should be sorted as follows on a page: Translingual first, + Wikilang next, then the others alphabetically on the language name in the + Wiktionary's language + + """ + def testHeaderInitKnownValuesType(self): """Sorting order of Entries on a page""" - examples=((('en','C'),('eo', 'en', 'de', 'nl', 'es', 'translingual', 'fr'), - ['translingual', 'en', 'nl', 'eo', 'fr', 'de', 'es']), - (('nl','C'),('eo', 'en', 'de', 'nl', 'es', 'translingual', 'fr'), - ['translingual', 'nl', 'de', 'en', 'eo', 'fr', 'es']), - (('fr','C'),('eo', 'en', 'de', 'nl', 'es', 'translingual', 'fr'), - ['translingual', 'fr', 'de', 'en', 'es', 'eo', 'nl']), - (('de','C'),('eo', 'en', 'de', 'nl', 'es', 'translingual', 'fr'), - ['translingual', 'de', 'en', 'eo', 'fr', 'nl', 'es']), - ) + examples = ((('en', 'C'), + ('eo', 'en', 'de', 'nl', 'es', 
'translingual', 'fr'), + ['translingual', 'en', 'nl', 'eo', 'fr', 'de', 'es']), + (('nl', 'C'), + ('eo', 'en', 'de', 'nl', 'es', 'translingual', 'fr'), + ['translingual', 'nl', 'de', 'en', 'eo', 'fr', 'es']), + (('fr', 'C'), + ('eo', 'en', 'de', 'nl', 'es', 'translingual', 'fr'), + ['translingual', 'fr', 'de', 'en', 'es', 'eo', 'nl']), + (('de', 'C'), + ('eo', 'en', 'de', 'nl', 'es', 'translingual', 'fr'), + ['translingual', 'de', 'en', 'eo', 'fr', 'nl', 'es']), + ) for example in examples: page = wiktionary.WiktionaryPage(example[0][0], example[0][1]) for lang in example[1]: @@ -65,9 +78,16 @@ page.sortEntries() self.assertEqual(page.sortedentries, example[2]) + class TestKnownValuesInParser(unittest.TestCase): - """This class will check various aspects of parsing Wiktionary entries into our object model""" - knownvalues=({'wikilang': 'en', 'term': 'nut', 'wikiformat': u"""==English== + """This class will check various aspects of parsing Wiktionary entries into + our object model + + """ + knownvalues = ( + {'wikilang': 'en', + 'term': 'nut', + 'wikiformat': u"""==English== ===Etymology=== From Middle English [[nute]], from Old English [[hnutu]]. <!-- Is Latin [[nux]], nuc- a cognate? 
--> ===Pronunciation=== @@ -199,41 +219,66 @@ [[Category:Trees]] [[category:Foods]] """, - 'internalrep': - ( - [u'1000 English basic words',u'Colors',u'Browns',u'Trees',u'Foods'], - [u'io','la'], - {u'en': - [u'nut', None, u'nuts', - [{'definition': u'A hard-shelled seed', 'concisedef': u'seed', - 'trans': {'nl': u"[[noot]] ''f''", 'fr': u"""''no generic translation exists''; [[noix]] ''f'' ''is often used, but this actually means "[[walnut]]"''""", 'de': u"[[Nuss]] ''f''", 'it': u"[[noce]] {{f}}", 'la': u"[[nux]]"}}, - {'definition': u"A piece of metal, often [[hexagonal]], with a hole through it with internal threading intended to fit on to a bolt.", 'concisedef': u'that fits on a bolt', - 'trans': {'nl': u"[[moer]] ''f''", 'fr': u"[[écrou]] ''m''", 'de': u"[[Mutter]] ''f''", 'it': u"[[dado]] {{m}}"}}, - {'definition': u"(''informal'') An insane person.", 'concisedef': u"'''informal: insane person'''", - 'syns': u"[[loony]], [[nutcase]], [[nutter]]", - 'trans': {'nl': u"[[gek]] ''m'', [[gekkin]] ''f'', [[zot]] ''m'', [[zottin]] ''f''", 'fr': "[[fou]] ''m'', [[folle]] ''f''", 'de': "[[Irre]] ''m/f'', [[Irrer]] ''m indef.''"}}, - {'definition': u"(''slang'') The head.", 'concisedef': u"'''slang: the head'''", - 'syns': u"[[bonce]], [[noddle]] (See further synonyms under [[head]])", - 'trans': {'de': u"[[Birne]] ''f'', [[Rübe]] ''f'', [[Dötz]] ''m''"}}, - {'definition': u"(''slang; rarely used in the singular'') A testicle.", 'concisedef': u"'''slang: testicle'''", - 'syns': u"[[ball]], [[bollock]] (''taboo slang''), [[nad]]", - 'trans': {'nl': u"[[noten]] ''m (plural)'' <!--Never heard this before-->, [[bal]] ''m'', [[teelbal]] ''m''", 'fr': u"[[couille]] ''f''", 'de': u"[[Ei]] ''n'', ''lately:'' [[Nuss]] ''f''", 'es': u"[[cojone]], [[huevo]]"}}, - ], - ], - u'nl': - [u'nut', 'n', None, - [{'definition': u'[[use]], [[benefit]]'}] - ], - } - ) - },{'wikilang': 'en', 'term': 'nut', 'wikiformat': u"""[[category:Foods]] -[[category:Drinks]]""", 'internalrep': 
([u'Foods', u'Drinks'],[],{})}) + 'internalrep': ( + [u'1000 English basic words', u'Colors', u'Browns', u'Trees', + u'Foods'], + [u'io', 'la'], + {u'en': + [u'nut', None, u'nuts', + [{'definition': u'A hard-shelled seed', + 'concisedef': u'seed', + 'trans': { + 'nl': u"[[noot]] ''f''", + 'fr': u"""''no generic translation exists''; [[noix]] ''f'' ''is often used, but this actually means "[[walnut]]"''""", + 'de': u"[[Nuss]] ''f''", + 'it': u"[[noce]] {{f}}", + 'la': u"[[nux]]"}}, + {'definition': u"A piece of metal, often [[hexagonal]], with a hole through it with internal threading intended to fit on to a bolt.", + 'concisedef': u'that fits on a bolt', + 'trans': { + 'nl': u"[[moer]] ''f''", + 'fr': u"[[écrou]] ''m''", 'de': u"[[Mutter]] ''f''", + 'it': u"[[dado]] {{m}}"}}, + {'definition': u"(''informal'') An insane person.", + 'concisedef': u"'''informal: insane person'''", + 'syns': u"[[loony]], [[nutcase]], [[nutter]]", + 'trans': { + 'nl': u"[[gek]] ''m'', [[gekkin]] ''f'', [[zot]] ''m'', [[zottin]] ''f''", + 'fr': "[[fou]] ''m'', [[folle]] ''f''", + 'de': "[[Irre]] ''m/f'', [[Irrer]] ''m indef.''"}}, + {'definition': u"(''slang'') The head.", + 'concisedef': u"'''slang: the head'''", + 'syns': u"[[bonce]], [[noddle]] (See further synonyms under [[head]])", + 'trans': { + 'de': u"[[Birne]] ''f'', [[Rübe]] ''f'', [[Dötz]] ''m''"}}, + {'definition': u"(''slang; rarely used in the singular'') A testicle.", + 'concisedef': u"'''slang: testicle'''", + 'syns': u"[[ball]], [[bollock]] (''taboo slang''), [[nad]]", + 'trans': { + 'nl': u"[[noten]] ''m (plural)'' <!--Never heard this before-->, [[bal]] ''m'', [[teelbal]] ''m''", + 'fr': u"[[couille]] ''f''", + 'de': u"[[Ei]] ''n'', ''lately:'' [[Nuss]] ''f''", + 'es': u"[[cojone]], [[huevo]]"}}, + ], + ], + u'nl': + [u'nut', 'n', None, + [{'definition': u'[[use]], [[benefit]]'}] + ], + } + ) + }, + {'wikilang': 'en', + 'term': 'nut', + 'wikiformat': u"""[[category:Foods]] +[[category:Drinks]]""", + 'internalrep': 
([u'Foods', u'Drinks'], [], {})})
 
     def testWhetherCategoriesAreParsedProperly(self):
         """Test whether Categories are parsed properly"""
         for value in self.knownvalues:
-            internalrepresentation=value['internalrep']
-            apage = wiktionary.WiktionaryPage(value['wikilang'],value['term'])
+            internalrepresentation = value['internalrep']
+            apage = wiktionary.WiktionaryPage(value['wikilang'], value['term'])
             apage.parseWikiPage(value['wikiformat'])
 
             self.assertEqual(apage.categories, internalrepresentation[0])
@@ -241,8 +286,8 @@
     def testWhetherLinksAreParsedProperly(self):
         """Test whether Links are parsed properly"""
         for value in self.knownvalues:
-            internalrepresentation=value['internalrep']
-            apage = wiktionary.WiktionaryPage(value['wikilang'],value['term'])
+            internalrepresentation = value['internalrep']
+            apage = wiktionary.WiktionaryPage(value['wikilang'], value['term'])
             apage.parseWikiPage(value['wikiformat'])
 
             self.assertEqual(apage.interwikilinks, internalrepresentation[1])
@@ -250,31 +295,31 @@
     def testWhetherDefsAreParsedProperly(self):
         """Test whether Definitions are parsed properly"""
         for value in self.knownvalues:
-            internalrepresentation=value['internalrep'][2]
-            apage = wiktionary.WiktionaryPage(value['wikilang'],value['term'])
+            internalrepresentation = value['internalrep'][2]
+            apage = wiktionary.WiktionaryPage(value['wikilang'], value['term'])
             apage.parseWikiPage(value['wikiformat'])
 
             for entrylang in internalrepresentation.keys():
-                term=internalrepresentation[entrylang][0]
-                gender=internalrepresentation[entrylang][1]
-                plural=internalrepresentation[entrylang][2]
-                definitions=internalrepresentation[entrylang][3]
-                refdefs=[]
+                term = internalrepresentation[entrylang][0]
+                gender = internalrepresentation[entrylang][1]
+                plural = internalrepresentation[entrylang][2]
+                definitions = internalrepresentation[entrylang][3]
+                refdefs = []
                 for definition in definitions:
                     refdefs.append(definition['definition'])
 
-                resultmeanings=[]
+                resultmeanings = []
                 for key in apage.entries[entrylang].meanings.keys():
                     for resultmeaning in apage.entries[entrylang].meanings[key]:
                         resultmeanings.append(resultmeaning.definition)
 
                 self.assertEqual(resultmeanings.sort(), refdefs.sort())
 
-'''
-class ToRomanBadInput(unittest.TestCase):
-    def testTooLarge(self):
-        """toRoman should fail with large input"""
-        self.assertRaises(roman.OutOfRangeError, roman.toRoman, 4000)
-'''
+
+##class ToRomanBadInput(unittest.TestCase):
+##    def testTooLarge(self):
+##        """toRoman should fail with large input"""
+##        self.assertRaises(roman.OutOfRangeError, roman.toRoman, 4000)
+
 
 if __name__ == "__main__":
     unittest.main()

--
To view, visit
https://gerrit.wikimedia.org/r/94622
To unsubscribe, visit
https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged Gerrit-Change-Id: I281e8b1ff765d7ea6b005bb79d8449cbb6b1cd78 Gerrit-PatchSet: 2 Gerrit-Project: pywikibot/compat Gerrit-Branch: master Gerrit-Owner: Xqt <info(a)gno.de> Gerrit-Reviewer: Xqt <info(a)gno.de> Gerrit-Reviewer: jenkins-bot
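One context line that this whitespace-only change leaves as it was, self.assertEqual(resultmeanings.sort(), refdefs.sort()), deserves a note: list.sort() sorts in place and returns None, so that assertion compares None with None and passes whatever the lists contain. The sketch below illustrates the pitfall and one possible fix using sorted(); the helper names same_items_buggy and same_items are hypothetical and are not part of the merged change.

```python
def same_items_buggy(result, reference):
    """Mimics the test's comparison: both .sort() calls return None."""
    return result.sort() == reference.sort()


def same_items(result, reference):
    """Compares sorted copies, so the list contents actually matter."""
    return sorted(result) == sorted(reference)


# Two lists with different contents.
print(same_items_buggy(['seed'], ['bolt']))
print(same_items(['seed'], ['bolt']))
# Same contents in a different order.
print(same_items(['seed', 'bolt'], ['bolt', 'seed']))
```

In a unittest test case the same idea would read self.assertEqual(sorted(resultmeanings), sorted(refdefs)).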
[Gerrit] Add "category" and "file" as exemptions in replaceExept - change (pywikibot/compat)
by Xqt (Code Review)
10 Nov '13
Xqt has submitted this change and it was merged. Change subject: Add "category" and "file" as exemptions in replaceExept ...................................................................... Add "category" and "file" as exemptions in replaceExept - pep8ified and synchronized with core by xqt Change-Id: If103b5fa1dc5952628665ee54f1bd72c31a29040 --- M pywikibot/textlib.py 1 file changed, 34 insertions(+), 23 deletions(-) Approvals: Xqt: Looks good to me, approved jenkins-bot: Verified diff --git a/pywikibot/textlib.py b/pywikibot/textlib.py index d0e69c4..80b55e5 100644 --- a/pywikibot/textlib.py +++ b/pywikibot/textlib.py @@ -22,6 +22,7 @@ TEMP_REGEX = re.compile( '{{(?:msg:)?(?P<name>[^{\|]+?)(?:\|(?P<params>[^{]+?(?:{[^{]+?}[^{]*?)?))?}}') + def unescape(s): """Replace escaped HTML-special characters by their originals""" if '&' not in s: @@ -90,6 +91,10 @@ 'property': re.compile(r'(?i)\{\{\s*#property:\s*p\d+\s*\}\}'), # Module invocations (currently only Lua) 'invoke': re.compile(r'(?i)\{\{\s*#invoke:.*?}\}'), + # categories + 'category': re.compile(ur'\[\[ *(?:%s)\s*:.*?\]\]' % ur'|'.join(site.namespace(14, all=True))), + #files + 'file': re.compile(ur'\[\[ *(?:%s)\s*:.*?\]\]' % ur'|'.join(site.namespace(6, all=True))), } @@ -215,12 +220,12 @@ groupMatch = groupR.search(replacement) if not groupMatch: break - groupID = groupMatch.group('name') or \ - int(groupMatch.group('number')) + groupID = (groupMatch.group('name') or + int(groupMatch.group('number'))) try: - replacement = replacement[:groupMatch.start()] + \ - match.group(groupID) + \ - replacement[groupMatch.end():] + replacement = (replacement[:groupMatch.start()] + + match.group(groupID) + \ + replacement[groupMatch.end():]) except IndexError: print '\nInvalid group reference:', groupID print 'Groups found:\n', match.groups() @@ -353,7 +358,7 @@ lenseparator:firstinseparator]): firstinseparator -= lenseparator striploopcontinue = True - elif text[firstinseparator-1] < ' ': + elif text[firstinseparator 
- 1] < ' ': firstinseparator -= 1 striploopcontinue = True marker = text[firstinseparator:firstinmarker] + marker @@ -510,10 +515,10 @@ if site.language() in site.family.interwiki_attop or \ u'<!-- interwiki at top -->' in oldtext: #do not add separator if interiki links are on one line - newtext = s + \ - [separator, u''][site.language() in - site.family.interwiki_on_one_line] + \ - s2.replace(marker, '').strip() + newtext = (s + + [separator, u''][site.language() in + site.family.interwiki_on_one_line] + + s2.replace(marker, '').strip()) else: # calculate what was after the language links on the page firstafter = s2.find(marker) @@ -525,8 +530,9 @@ if "</noinclude>" in s2[firstafter:]: if separatorstripped: s = separator + s - newtext = s2[:firstafter].replace(marker, '') + s + \ - s2[firstafter:] + newtext = (s2[:firstafter].replace(marker, '') + + s + + s2[firstafter:]) elif site.language() in site.family.categories_last: cats = getCategoryLinks(s2, site=site) s2 = removeCategoryLinksAndSeparator( @@ -538,8 +544,9 @@ # (not supported by rewrite - no API) elif site.family.name == 'wikitravel': s = separator + s + separator - newtext = s2[:firstafter].replace(marker, '') + s + \ - s2[firstafter:] + newtext = (s2[:firstafter].replace(marker, '') + + s + + s2[firstafter:]) else: if template or template_subpage: if template_subpage: @@ -558,8 +565,10 @@ newtext = regexp.sub(s + includeOff, s2) else: # Put the langlinks at the end, inside noinclude's - newtext = s2.replace(marker, '').strip() + separator + \ - u'%s\n%s%s\n' % (includeOn, s, includeOff) + newtext = (s2.replace(marker, '').strip() + + separator + + u'%s\n%s%s\n' % (includeOn, s, includeOff) + ) else: newtext = s2.replace(marker, '').strip() + separator + s else: @@ -646,8 +655,9 @@ r'(?:\|(?P<sortKey>.+?))?\s*\]\]' % catNamespace, re.I) for match in R.finditer(text): - cat = catlib.Category(site, '%s:%s' % (match.group('namespace'), - match.group('catName')), + cat = catlib.Category(site, + '%s:%s' % 
(match.group('namespace'), + match.group('catName')), sortKey=match.group('sortKey')) result.append(cat) return result @@ -788,8 +798,9 @@ if "</noinclude>" in s2[firstafter:]: if separatorstripped: s = separator + s - newtext = s2[:firstafter].replace(marker, '') + s + \ - s2[firstafter:] + newtext = (s2[:firstafter].replace(marker, '') + + s + + s2[firstafter:]) elif site.language() in site.family.categories_last: newtext = s2.replace(marker, '').strip() + separator + s else: @@ -823,7 +834,7 @@ if categories[0][0] == '[': catLinks = categories else: - catLinks = ['[[Category:'+category+']]' for category in categories] + catLinks = ['[[Category:' + category + ']]' for category in categories] else: catLinks = [category.aslink(noInterwiki=True) for category in categories] @@ -865,8 +876,8 @@ r'(?=[%(notAtEnd)s]*\'\')|http[s]?://[^%(notInside)s]*' \ r'[^%(notAtEnd)s])' % {'notInside': notInside, 'notAtEnd': notAtEnd} regexb = r'(?P<urlb>http[s]?://[^%(notInside)s]*?[^%(notAtEnd)s]' \ - r'(?=[%(notAtEnd)s]*\'\')|http[s]?://[^%(notInside)s]*' \ - r'[^%(notAtEnd)s])' % {'notInside': notInside, 'notAtEnd': notAtEndb} + r'(?=[%(notAtEnd)s]*\'\')|http[s]?://[^%(notInside)s]*' \ + r'[^%(notAtEnd)s])' % {'notInside': notInside, 'notAtEnd': notAtEndb} if withoutBracketed: regex = r'(?<!\[)' + regex elif onlyBracketed: -- To view, visit
https://gerrit.wikimedia.org/r/94493
To unsubscribe, visit
https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged Gerrit-Change-Id: If103b5fa1dc5952628665ee54f1bd72c31a29040 Gerrit-PatchSet: 3 Gerrit-Project: pywikibot/compat Gerrit-Branch: master Gerrit-Owner: Ladsgroup <ladsgroup(a)gmail.com> Gerrit-Reviewer: Legoktm <legoktm.wikipedia(a)gmail.com> Gerrit-Reviewer: Merlijn van Deen <valhallasw(a)arctus.nl> Gerrit-Reviewer: Reza <reza.energy(a)gmail.com> Gerrit-Reviewer: Xqt <info(a)gno.de> Gerrit-Reviewer: jenkins-bot
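The new 'category' and 'file' exemptions in the diff above are regular expressions built by joining every name of a namespace into one alternation. A minimal sketch of that idea follows. The helper name build_link_exception and the list of namespace names are assumptions made for illustration (the real code obtains the names from site.namespace(14, all=True)), and the re.escape call is an addition of this sketch that the merged code does not have.

```python
import re


def build_link_exception(namespace_names):
    """Return a regex matching [[Name:...]] links for the given names.

    namespace_names is a hypothetical stand-in for the list that
    site.namespace(14, all=True) returns in the change above.
    """
    # re.escape is an addition of this sketch; the merged code joins the
    # names without escaping them.
    alternatives = '|'.join(re.escape(name) for name in namespace_names)
    return re.compile(r'\[\[ *(?:%s)\s*:.*?\]\]' % alternatives)


# Assumed namespace names, for illustration only.
category_re = build_link_exception(['Category', 'Kategorie'])
text = 'Intro [[Category:Foods|sort]] and [[Kategorie:Drinks]] end [[Main Page]]'
matches = category_re.findall(text)
```

Spans matched this way can then be skipped by a replacement routine, so that text inside category or file links is left untouched.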
[Gerrit] Fixing option order. - change (pywikibot/compat)
by Xqt (Code Review)
10 Nov '13
Xqt has submitted this change and it was merged.

Change subject: Fixing option order.
......................................................................

Fixing option order.

Change-Id: I23d2a9f7186a7a532d2836efd99590f81d75da70
---
M pagegenerators.py
1 file changed, 5 insertions(+), 5 deletions(-)

Approvals:
  Xqt: Looks good to me, approved

diff --git a/pagegenerators.py b/pagegenerators.py
index 8decfb0..ebfc496 100644
--- a/pagegenerators.py
+++ b/pagegenerators.py
@@ -340,16 +340,16 @@
                 title = i18n.input('pywikibot-enter-page-processing')
             page = pywikibot.Page(site, title)
             gen = InterwikiPageGenerator(page)
-        elif arg.startswith('-random'):
-            if len(arg) == 7:
-                gen = RandomPageGenerator()
-            else:
-                gen = RandomPageGenerator(number=int(arg[8:]))
         elif arg.startswith('-randomredirect'):
             if len(arg) == 15:
                 gen = RandomRedirectPageGenerator()
             else:
                 gen = RandomRedirectPageGenerator(number=int(arg[16:]))
+        elif arg.startswith('-random'):
+            if len(arg) == 7:
+                gen = RandomPageGenerator()
+            else:
+                gen = RandomPageGenerator(number=int(arg[8:]))
         elif arg.startswith('-recentchanges'):
             if len(arg) >= 15:
                 gen = RecentchangesPageGenerator(number=int(arg[15:]))

--
To view, visit https://gerrit.wikimedia.org/r/94612
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I23d2a9f7186a7a532d2836efd99590f81d75da70
Gerrit-PatchSet: 1
Gerrit-Project: pywikibot/compat
Gerrit-Branch: master
Gerrit-Owner: Pyfisch <pyfisch(a)googlemail.com>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot
[Gerrit] Fixing -randomredirects command line option which not interp... - change (pywikibot/core)
by jenkins-bot (Code Review)
10 Nov '13
jenkins-bot has submitted this change and it was merged.

Change subject: Fixing -randomredirects command line option which not interpreted correctly, because of option order.
......................................................................

Fixing -randomredirects command line option which not interpreted correctly, because of option order.

Change-Id: I7ab435a19557d8919f1e6ec12d484bbbc4927ac5
---
M pywikibot/pagegenerators.py
1 file changed, 5 insertions(+), 5 deletions(-)

Approvals:
  Xqt: Looks good to me, approved
  jenkins-bot: Verified

diff --git a/pywikibot/pagegenerators.py b/pywikibot/pagegenerators.py
index a7b8fe3..a0698e4 100644
--- a/pywikibot/pagegenerators.py
+++ b/pywikibot/pagegenerators.py
@@ -313,16 +313,16 @@
             page = pywikibot.Page(pywikibot.Link(title,
                                                  pywikibot.Site()))
             gen = InterwikiPageGenerator(page)
-        elif arg.startswith('-random'):
-            if len(arg) == 7:
-                gen = RandomPageGenerator()
-            else:
-                gen = RandomPageGenerator(number=int(arg[8:]))
         elif arg.startswith('-randomredirect'):
             if len(arg) == 15:
                 gen = RandomRedirectPageGenerator()
             else:
                 gen = RandomRedirectPageGenerator(number=int(arg[16:]))
+        elif arg.startswith('-random'):
+            if len(arg) == 7:
+                gen = RandomPageGenerator()
+            else:
+                gen = RandomPageGenerator(number=int(arg[8:]))
         elif arg.startswith('-recentchanges'):
             if len(arg) >= 15:
                 gen = RecentChangesPageGenerator(total=int(arg[15:]))

--
To view, visit https://gerrit.wikimedia.org/r/94608
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I7ab435a19557d8919f1e6ec12d484bbbc4927ac5
Gerrit-PatchSet: 1
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Pyfisch <pyfisch(a)googlemail.com>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot
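Why the order of the two branches matters can be shown in isolation: every argument that starts with '-randomredirect' also starts with '-random', so the shorter prefix shadows the longer one whenever it is tested first. The sketch below illustrates this under stated assumptions. The functions classify and classify_wrong_order and their tuple return values are hypothetical labels for illustration, not pywikibot code.

```python
def classify(arg):
    """Return (label, number text) for an argument, longest prefix first."""
    # The longer option is tested first because every string that starts
    # with '-randomredirect' also starts with '-random'.
    if arg.startswith('-randomredirect'):
        return ('randomredirect', arg[16:])
    elif arg.startswith('-random'):
        return ('random', arg[8:])
    return (None, '')


def classify_wrong_order(arg):
    """Same checks in the pre-fix order, to show the shadowing."""
    if arg.startswith('-random'):
        return ('random', arg[8:])
    elif arg.startswith('-randomredirect'):
        # Never reached: the branch above already matched.
        return ('randomredirect', arg[16:])
    return (None, '')


good = classify('-randomredirect:5')
bad = classify_wrong_order('-randomredirect:5')
```

In the pre-fix order the leftover text is 'edirect:5', which the original code would have passed to int(), so the option could not have worked as intended.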
[Gerrit] [BUGFIX] bot.py: 2nd hasattr parameter must be a string - change (pywikibot/core)
by jenkins-bot (Code Review)
10 Nov '13
jenkins-bot has submitted this change and it was merged.

Change subject: [BUGFIX] bot.py: 2nd hasattr parameter must be a string
......................................................................

[BUGFIX] bot.py: 2nd hasattr parameter must be a string

Change-Id: I7cc9ebecd876529c2c44f17519c03dd6e79ba6bf
---
M pywikibot/bot.py
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Legoktm: Looks good to me, approved
  jenkins-bot: Verified

diff --git a/pywikibot/bot.py b/pywikibot/bot.py
index fb62888..c530c4b 100644
--- a/pywikibot/bot.py
+++ b/pywikibot/bot.py
@@ -21,9 +21,9 @@
 import os.path
 import sys

-# logging levels
 _logger = "bot"

+# logging levels
 from logging import DEBUG, INFO, WARNING, ERROR, CRITICAL
 STDOUT = 16
 VERBOSE = 18
@@ -83,7 +83,7 @@
                 os.rename(self.baseFilename, dfn)
                 #print "%s -> %s" % (self.baseFilename, dfn)
         elif self.backupCount == -1:
-            if not hasattr(self, lastNo):
+            if not hasattr(self, '_lastNo'):
                 self._lastNo = 1
             while True:
                 fn = "%s.%d%s" % (root, self._lastNo, ext)

--
To view, visit https://gerrit.wikimedia.org/r/94597
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I7cc9ebecd876529c2c44f17519c03dd6e79ba6bf
Gerrit-PatchSet: 1
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Xqt <info(a)gno.de>
Gerrit-Reviewer: Legoktm <legoktm.wikipedia(a)gmail.com>
Gerrit-Reviewer: jenkins-bot
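The fix above relies on hasattr() taking the attribute name as a string; a bare name such as lastNo is evaluated as a variable first and raises NameError if no such variable exists. A small sketch of the lazy-initialisation pattern follows. The Handler class and its next_number method are hypothetical stand-ins for illustration and do not reproduce the handler in bot.py.

```python
class Handler(object):
    """Hypothetical stand-in for a handler that numbers its backup files."""

    def next_number(self):
        # hasattr() expects the attribute name as a string; the attribute
        # is created on first use and incremented afterwards.
        if not hasattr(self, '_lastNo'):
            self._lastNo = 1
        else:
            self._lastNo += 1
        return self._lastNo


handler = Handler()
first = handler.next_number()
second = handler.next_number()
```

Initialising the attribute in __init__ would be an alternative design that avoids the hasattr() check altogether.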
[Gerrit] wiktionary: pep8 changes, small code improvements - change (pywikibot/compat)
by Xqt (Code Review)
10 Nov '13
Xqt has submitted this change and it was merged. Change subject: wiktionary: pep8 changes, small code improvements ...................................................................... wiktionary: pep8 changes, small code improvements Change-Id: I1bb2f0ee00a881a31e35a1204dc07b893b9aaf3c --- M wiktionary/entry.py M wiktionary/header.py M wiktionary/headertest.py M wiktionary/meaning.py M wiktionary/meaningtest.py M wiktionary/sortonlanguagename.py M wiktionary/structs.py M wiktionary/term.py M wiktionary/termtest.py M wiktionary/wiktionarypage.py M wiktionary/wiktionarypagetest.py 11 files changed, 1,328 insertions(+), 993 deletions(-) Approvals: Xqt: Looks good to me, approved diff --git a/wiktionary/entry.py b/wiktionary/entry.py index a0392e8..b9d7c48 100644 --- a/wiktionary/entry.py +++ b/wiktionary/entry.py @@ -7,84 +7,111 @@ import meaning import structs + class Entry: """ This class contains the entries that belong together on one page. - On Wiktionaries that are still on first character capitalization, this means both [[Kind]] and [[kind]]. - Terms in different languages can be described. Usually there is one entry for each language. + On Wiktionaries that are still on first character capitalization, this + means both [[Kind]] and [[kind]]. + Terms in different languages can be described. Usually there is one entry + for each language. 
+ """ - def __init__(self,entrylang,meaning=""): + def __init__(self, entrylang, meaning=""): """ Constructor - Called with one parameter: - - the language of this entry + Called with one parameter: + - the language of this entry and can optionally be initialized with a first meaning + """ - self.entrylang=entrylang - self.meanings = {} # a dictionary containing the meanings for this term grouped by part of speech - if meaning: - self.addMeaning(meaning) - self.posorder = [] # we don't want to shuffle the order of the parts of speech, so we keep a list to keep the order in which they were encountered + self.entrylang = entrylang + # a dictionary containing the meanings for this term grouped by part of + # speech: + self.meanings = {} - def addMeaning(self,meaning): + if meaning: + self.addMeaning(meaning) + # we don't want to shuffle the order of the parts of speech, so we keep + # a list to keep the order in which they were encountered: + self.posorder = [] + + def addMeaning(self, meaning): """ Lets you add another meaning to this entry """ - term = meaning.term # fetch the term, in order to be able to determine its part of speech in the next step + # fetch the term, in order to be able to determine its part of speech + # in the next step + term = meaning.term - self.meanings.setdefault( term.pos, [] ).append(meaning) - if not term.pos in self.posorder: # we only need each part of speech once in our list where we keep track of the order + self.meanings.setdefault(term.pos, []).append(meaning) + # we only need each part of speech once in our list where we keep track + # of the order + if not term.pos in self.posorder: self.posorder.append(term.pos) def getMeanings(self): - """ Returns a dictionary containing all the meaning objects for this entry + """ Returns a dictionary containing all the meaning objects for this + entry + """ return self.meanings - def wikiWrap(self,wikilang): + def wikiWrap(self, wikilang): """ Returns a string for this entry in a format 
ready for Wiktionary + """ - entry = structs.wiktionaryformats[wikilang]['langheader'].replace('%%langname%%',langnames[wikilang][self.entrylang]).replace('%%ISOLangcode%%',self.entrylang) + '\n' + entry = structs.wiktionaryformats[wikilang]['langheader'].replace( + '%%langname%%', langnames[wikilang][self.entrylang]).replace( + '%%ISOLangcode%%', self.entrylang) + '\n' for pos in self.posorder: meanings = self.meanings[pos] - entry += structs.wiktionaryformats[wikilang]['posheader'][pos] - entry +='\n' - if wikilang=='en': - entry = entry + meanings[0].term.wikiWrapAsExample(wikilang) + '\n\n' + entry += '\n' + if wikilang == 'en': + entry += meanings[0].term.wikiWrapAsExample(wikilang) + '\n\n' for meaning in meanings: - entry = entry + '#' + meaning.getLabel() + ' ' + meaning.definition + '\n' - entry = entry + meaning.wikiWrapExamples() - entry +='\n' + entry += '#%s %s\n' % (meaning.getLabel(), + meaning.definition) + entry += meaning.wikiWrapExamples() + entry += '\n' - if wikilang=='nl': + if wikilang == 'nl': for meaning in meanings: - term=meaning.term - entry = entry + meaning.getLabel() + term.wikiWrapAsExample(wikilang) + '; ' + meaning.definition + '\n' - entry = entry + meaning.wikiWrapExamples() - entry +='\n' + term = meaning.term + entry += meaning.getLabel() + term.wikiWrapAsExample( + wikilang) + '; %s\n' % meaning.definition + entry += meaning.wikiWrapExamples() + entry += '\n' if meaning.hasSynonyms(): - entry = entry + structs.wiktionaryformats[wikilang]['synonymsheader'] + '\n' + entry += '%s\n' % ( + structs.wiktionaryformats[wikilang]['synonymsheader']) for meaning in meanings: - entry = entry + '*' + meaning.getLabel() + "'''" + meaning.getConciseDef() + "''': " + meaning.wikiWrapSynonyms(wikilang) - entry +='\n' + entry += "*%s'''%s''': %s" % (meaning.getLabel(), + meaning.getConciseDef(), + meaning.wikiWrapSynonyms( + wikilang)) + entry += '\n' if meaning.hasTranslations(): - entry = entry + 
structs.wiktionaryformats[wikilang]['translationsheader'] + '\n' + entry += '%s\n' % ( + structs.wiktionaryformats[wikilang]['translationsheader']) for meaning in meanings: - entry = entry + meaning.getLabel() + "'''" + meaning.getConciseDef() + "'''" + '\n' + meaning.wikiWrapTranslations(wikilang,self.entrylang) + '\n\n' - entry +='\n' + entry += "%s'''%s'''\n%s\n\n" % ( + meaning.getLabel(), meaning.getConciseDef(), + meaning.wikiWrapTranslations(wikilang, self.entrylang)) + entry += '\n' return entry - def showContents(self,indentation): + def showContents(self, indentation): """ Prints the contents of all the subobjects contained in this entry. - Every subobject is indented a little further on the screen. - The primary purpose is to help keep your sanity while debugging. - """ - print ' ' * indentation + 'entrylang = %s'% self.entrylang + Every subobject is indented a little further on the screen. + The primary purpose is to help keep your sanity while debugging. + """ + print ' ' * indentation + 'entrylang = %s' % self.entrylang print ' ' * indentation + 'posorder:' + repr(self.posorder) meaningkeys = self.meanings.keys() for meaningkey in meaningkeys: for meaning in self.meanings[meaningkey]: - meaning.showContents(indentation+2) + meaning.showContents(indentation + 2) diff --git a/wiktionary/header.py b/wiktionary/header.py index 9a941b3..6e5d8f5 100644 --- a/wiktionary/header.py +++ b/wiktionary/header.py @@ -3,69 +3,75 @@ from structs import * + class Header(object): - def __init__(self,line=None,contents=None,header=None,level=None,type=None): + def __init__(self, line=None, contents=None, header=None, level=None, + type=None): """ Constructor - Generally called with one parameter: - - The line read from a Wiktonary page - after determining it's probably a header + Generally called with one parameter: + - The line read from a Wiktonary page + after determining it's probably a header + """ - # sane defaults for self - self.contents=None - self.header=None 
- self.level=None - self.type=None + self.contents = None + self.header = None + self.level = None + self.type = None - # settings for self - if line!=None: self.parseLine(line) - if contents!=None: self.contents=contents - if header!=None: self.header=header - if level!=None: self.level=level - if type!=None: self.type=type + if line is not None: + self.parseLine(line) + if contents is not None: + self.contents = contents + if header is not None: + self.header = header + if level is not None: + self.level = level + if type is not None: + self.type = type - def __eq__(x,y): + def __eq__(x, y): """x.__eq__(y) <==> x==y""" - return hasattr(x,"__dict__") and hasattr(y,"__dict__") and x.__dict__==y.__dict__ + return hasattr(x, "__dict__") and hasattr(y, "__dict__") and \ + x.__dict__ == y.__dict__ - def __ne__(x,y): + def __ne__(x, y): """x.__ne__(y) <==> x!=y""" - return (not hasattr(x,"__eq__")) and (not x.__eq__(y)) + return (not hasattr(x, "__eq__")) and (not x.__eq__(y)) - def parseLine(self,line): - self.level=None - self.type='' # The type of header, i.e. lang, pos, other - self.contents='' # If lang, which lang? If pos, which pos? + def parseLine(self, line): + self.level = None + self.type = '' # The type of header, i.e. lang, pos, other + self.contents = '' # If lang, which lang? If pos, which pos? 
self.header = '' - if line.count('=')>1: - self.level = line.count('=') // 2 # integer floor division without fractional part - self.header = line.replace('=','') + if line.count('=') > 1: + # integer floor division without fractional part + self.level = line.count('=') // 2 + self.header = line.replace('=', '') elif '{{' in line: - self.header = line.replace('{{-','').replace('-}}','') + self.header = line.replace('{{-', '').replace('-}}', '') - self.header = self.header.replace('{{','').replace('}}','').strip().lower() + self.header = self.header.replace('{{', + '').replace('}}', '').strip().lower() - # Now we know the content of the header, let's try to find out what it means: + # Now we know the content of the header, let's try to find out what it + # means: if self.header in pos: - self.type=u'pos' - self.contents=pos[self.header] + self.type = u'pos' + self.contents = pos[self.header] if self.header in langnames: - self.type=u'lang' - self.contents=self.header + self.type = u'lang' + self.contents = self.header if self.header in invertedlangnames: - self.type=u'lang' - self.contents=invertedlangnames[self.header] + self.type = u'lang' + self.contents = invertedlangnames[self.header] if self.header in otherheaders: - self.type=u'other' - self.contents=otherheaders[self.header] + self.type = u'other' + self.contents = otherheaders[self.header] def __repr__(self): - return self.__module__+".Header("+\ - "contents='"+self.contents+\ - "', header='"+self.header+\ - "', level="+str(self.level)+\ - ", type='"+self.type+\ - "')" + return "%s.Header(contents='%s', header='%s', level=%d, type='%s')" % ( + self.__module__, self.contents, self.header, self.level, self.type) diff --git a/wiktionary/headertest.py b/wiktionary/headertest.py index db01f90..81278b5 100644 --- a/wiktionary/headertest.py +++ b/wiktionary/headertest.py @@ -6,24 +6,27 @@ import header import unittest + class KnownValues(unittest.TestCase): knownValues = ( - ('==English==', 'en', 2, 'lang'), - 
('=={{en}}==', 'en', 2, 'lang'), - ('{{-en-}}', 'en', None, 'lang'), - ('===Noun===', 'noun', 3, 'pos'), - ('==={{noun}}===', 'noun', 3, 'pos'), - ('{{-noun-}}', 'noun', None, 'pos'), - ('===Verb===', 'verb', 3, 'pos'), - ('==={{verb}}===', 'verb', 3, 'pos'), - ('{{-verb-}}', 'verb', None, 'pos'), - ('====Translations====', 'trans', 4, 'other'), - ('===={{trans}}====', 'trans', 4, 'other'), - ('{{-trans-}}', 'trans', None, 'other'), - ) + ('==English==', 'en', 2, 'lang'), + ('=={{en}}==', 'en', 2, 'lang'), + ('{{-en-}}', 'en', None, 'lang'), + ('===Noun===', 'noun', 3, 'pos'), + ('==={{noun}}===', 'noun', 3, 'pos'), + ('{{-noun-}}', 'noun', None, 'pos'), + ('===Verb===', 'verb', 3, 'pos'), + ('==={{verb}}===', 'verb', 3, 'pos'), + ('{{-verb-}}', 'verb', None, 'pos'), + ('====Translations====', 'trans', 4, 'other'), + ('===={{trans}}====', 'trans', 4, 'other'), + ('{{-trans-}}', 'trans', None, 'other'), + ) def testHeaderInitKnownValuesContents(self): - """Header parsing comparing known result with known input for contents""" + """Header parsing comparing known result with known input for contents + + """ for wikiline, contents, level, type in self.knownValues: result = header.Header(wikiline).contents self.assertEqual(contents, result) @@ -43,10 +46,10 @@ def testReprSanity(self): """Header __repr__, __eq__, __ne__ should give sane results""" for stuff in self.knownValues: - wikiline=stuff[0] - h=header.Header(wikiline) - self.assertEqual(h, eval(repr(h)) ) - self.assertNotEqual(h,header.Header()) + wikiline = stuff[0] + h = header.Header(wikiline) + self.assertEqual(h, eval(repr(h))) + self.assertNotEqual(h, header.Header()) if __name__ == "__main__": unittest.main() diff --git a/wiktionary/meaning.py b/wiktionary/meaning.py index a7019b5..e74b3ff 100644 --- a/wiktionary/meaning.py +++ b/wiktionary/meaning.py @@ -5,148 +5,170 @@ import structs import re + class Meaning: """ This class contains one meaning for a word or an expression. 
""" - def __init__(self,term,definition='',etymology='',synonyms={'remark': '', 'synonyms': [{'remark': '', 'synonym': ''}]},translations={},label='',concisedef='',examples=[]): + def __init__(self, term, definition='', etymology='', + synonyms={'remark': '', + 'synonyms': [{'remark': '', 'synonym': ''}]}, + translations=None, label='', concisedef='', examples=[]): """ Constructor - Generally called with one parameter: - - The Term object we are describing + Generally called with one parameter: + - The Term object we are describing - - definition (string) for this term is optional - - etymology (string) is optional - - synonyms (optional) - - translations (dictionary of Term objects, ISO639 is the key) is optional + - definition (string) for this term is optional + - etymology (string) is optional + - synonyms (optional) + - translations (dictionary of Term objects, ISO639 is the key) is + optional + """ - self.term=term - self.definition=definition - self.concisedef=concisedef - self.etymology=etymology - self.synonyms=synonyms + self.term = term + self.definition = definition + self.concisedef = concisedef + self.etymology = etymology + self.synonyms = synonyms # A structure, possibly containing the following items: # {'remark' : 'this remark concerns all the synonyms for this meaning', # 'synonyms' : [ - # {'remark': 'this remark concerns this particular synonym', + # {'remark': 'concerns this particular synonym', # 'synonym': Term object containing the synonym # }, # ] - self.examples=examples - self.label=label - - if translations: # Why this has to be done explicitly is beyond me, but it doesn't work correctly otherwise - self.translations=translations + self.examples = examples + self.label = label + if translations: + self.translations = translations else: - self.translations={} # a dictionary containing lists with translations to the different languages. 
Each translation is again a dictionary as follows: {'remark': '', 'trans': Term object} - self.translationsremark='' # a remark applying to all the translations for this meaning - self.translationsremarks={} # a dictionary containing remarks applying to a specific language - self.label=label + # a dictionary containing lists with translations to the different + # languages. Each translation is again a dictionary as follows: + # {'remark': '', 'trans': Term object} + self.translations = {} + # a remark applying to all the translations for this meaning + self.translationsremark = '' + # a dictionary containing remarks applying to a specific language + self.translationsremarks = {} + self.label = label - def setDefinition(self,definition): + def setDefinition(self, definition): """ Provide a definition """ - self.definition=definition + self.definition = definition def getDefinition(self): """ Returns the definition """ return self.definition - def setEtymology(self,etymology): + def setEtymology(self, etymology): """ Provide the etymology """ - self.etymology=etymology + self.etymology = etymology def getEtymology(self): """ Returns the etymology """ return self.etymology - def setSynonyms(self,synonyms): + def setSynonyms(self, synonyms): """ Provide the synonyms """ - self.synonyms=synonyms + self.synonyms = synonyms def getSynonyms(self): """ Returns the list of synonym Term objects """ return self.synonyms - def parseSynonyms(self,synonymswikiline): + def parseSynonyms(self, synonymswikiline): synsremark = '' synonyms = [] - openparenthesis=synonymswikiline.lower().find('(see') - if openparenthesis!=-1: - closeparenthesis=synonymswikiline.find(')',openparenthesis) - synsremark=synonymswikiline[openparenthesis:closeparenthesis+1] - synonymswikiline=synonymswikiline[:openparenthesis-1] + synonymswikiline[closeparenthesis+1:] + openparenthesis = synonymswikiline.lower().find('(see') + if openparenthesis != -1: + closeparenthesis = synonymswikiline.find(')', 
openparenthesis) + synsremark = synonymswikiline[openparenthesis:closeparenthesis + 1] + synonymswikiline = synonymswikiline[:openparenthesis - 1] + \ + synonymswikiline[closeparenthesis + 1:] for synonym in synonymswikiline.split(','): synremark = '' - openparenthesis=synonym.lower().find('(') - if openparenthesis!=-1: - closeparenthesis=synonym.find(')',openparenthesis) - synremark=synonym[openparenthesis:closeparenthesis+1] - synonym=synonym[:openparenthesis-1] + synonym[closeparenthesis+2:] - synonym=synonym.replace(',','').replace("[",'').replace(']','').strip() + openparenthesis = synonym.lower().find('(') + if openparenthesis != -1: + closeparenthesis = synonym.find(')', openparenthesis) + synremark = synonym[openparenthesis:closeparenthesis + 1] + synonym = synonym[:openparenthesis - 1] + \ + synonym[closeparenthesis + 2:] + synonym = synonym.replace( + ',', '').replace("[", '').replace(']', '').strip() synonyms.append({'synonym': synonym, 'remark': synremark}) - self.synonyms={'remark': synsremark, 'synonyms': synonyms} + self.synonyms = {'remark': synsremark, 'synonyms': synonyms} - def parseTranslations(self,translationswikiline): + def parseTranslations(self, translationswikiline): ''' This function will parse one line in wiki format Typically this is the translation towards one language. ''' - # There can be many translations for a language, each one can have remark - # a gender and a number. - # There can also be a remark for the group of translations for a given language - # And there can be a remark applying to all the translations (That has to be detected and stored on a higher level though. - # It is also possible that the translation for a given language is not parseable - # In that case the entire line should go into the remark. + # There can be many translations for a language, each one can have + # remark a gender and a number. + # There can also be a remark for the group of translations for a given + # language. 
And there can be a remark applying to all the translations + # (That has to be detected and stored on a higher level though. + # It is also possible that the translation for a given language is not + # parseable. In that case the entire line should go into the remark. translationsremark = translationremark = '' - translations = [] # a list of translations for a given language - colon=translationswikiline.find(':') - if colon!=-1: - # Split in lang and the rest of the line which should be a list of translations - lang = translationswikiline[:colon].replace('*','').replace('[','').replace(']','').replace('{','').replace('}','').strip().lower() - trans = translationswikiline[colon+1:] + translations = [] # a list of translations for a given language + colon = translationswikiline.find(':') + if colon != -1: + # Split in lang and the rest of the line which should be a list of + # translations + lang = translationswikiline[:colon].replace( + '*', '').replace('[', '').replace(']', '').replace( + '{', '').replace('}', '').strip().lower() + trans = translationswikiline[colon + 1:] # Look up lang and convert to an ISO abbreviation - isolang='' + isolang = '' if lang in structs.langnames: - isolang=lang + isolang = lang elif lang in structs.invertedlangnames: - isolang=structs.invertedlangnames[lang] + isolang = structs.invertedlangnames[lang] # We need to prepare the line a bit to make it more easily parseable # All the commas found between '' '' are converted to simple spaces # Also }}, {{ has to be converted to }} {{ - trans="''".join([ [i[1],re.sub(',',' ',i[1])][i[0]%2==1] for i in enumerate(trans.split("''")) ]) + trans = "''".join([[i[1], re.sub(',', ' ', i[1])][i[0] % 2 == 1] + for i in enumerate(trans.split("''"))]) - trans=re.sub(r"(}}.*),(.*{{)",'}} {{',trans) + trans = re.sub(r"(}}.*),(.*{{)", '}} {{', trans) # Now split up the translations (we got rid of extraneous commas) for translation in trans.split(','): - translation=translation.strip() + translation = 
translation.strip() # Find what is contained inside parentheses - m= re.search(r'(\(.*\))',translation) + m = re.search(r'(\(.*\))', translation) if m: # Only when the parentheses don't occur # between [[ ]] - if translation[m.end(1)+1:m.end(1)+2]!=']': - translationremark = m.group(1).replace('(','').replace(')','') - translation=translation.replace(m.group(1),'') + if translation[m.end(1) + 1:m.end(1) + 2] != ']': + translationremark = m.group(1).replace( + '(', '').replace(')', '') + translation = translation.replace(m.group(1), '') number = 1 masculine = feminine = neutral = common = diminutive = False partconsumed = False for part in translation.split(' '): - part=part.strip() - colon=part.find(':') - if colon!=-1: - colon2=part.find(':',colon+1) - pipe=part.find('|') - if colon2!=-1 and pipe!=-1: + part = part.strip() + colon = part.find(':') + if colon != -1: + colon2 = part.find(':', colon + 1) + pipe = part.find('|') + if colon2 != -1 and pipe != -1: # We found a link to another language Wiktionary # This contains no interesting information to store - # If the target Wiktionary uses them, we'll create them upon output + # If the target Wiktionary uses them, we'll create + # them upon output pass else: - translationremark = part.replace("'",'').replace('(','').replace(')','').replace(':','') + translationremark = part.replace( + "'", '').replace('(', '').replace( + ')', '').replace(':', '') partconsumed = True - cleanpart=part.replace("'",'').lower() - delim='' + cleanpart = part.replace("'", '').lower() + delim = '' # XXX The following 3 tests look wrong: # find() returns either -1 if the substring is not found, # or the position of the substring in the string. @@ -155,120 +177,149 @@ # # the test "',' in cleanpart" might be the one to use. 
if cleanpart.find(','): - delim=',' + delim = ',' if cleanpart.find(';'): - delim=';' + delim = ';' if cleanpart.find('/'): - delim='/' + delim = '/' if 0 <= part.find("'") <= 2 or '{' in part: - if delim=='': - delim='|' - cleanpart=cleanpart+'|' + if delim == '': + delim = '|' + cleanpart += '|' for maybegender in cleanpart.split(delim): - maybegender=maybegender.strip() - if maybegender=='m' or maybegender=='{{m}}': - masculine=True + maybegender = maybegender.strip() + if maybegender == 'm' or maybegender == '{{m}}': + masculine = True partconsumed = True - if maybegender=='f' or maybegender=='{{f}}': - feminine=True + if maybegender == 'f' or maybegender == '{{f}}': + feminine = True partconsumed = True - if maybegender=='n' or maybegender=='{{n}}': - neutral=True + if maybegender == 'n' or maybegender == '{{n}}': + neutral = True partconsumed = True - if maybegender=='c' or maybegender=='{{c}}': - common=True + if maybegender == 'c' or maybegender == '{{c}}': + common = True partconsumed = True - if maybegender=='p' or maybegender=='pl' or maybegender=='plural' or maybegender=='{{p}}': - number=2 + if maybegender == 'p' or maybegender == 'pl' or \ + maybegender == 'plural' or \ + maybegender == '{{p}}': + number = 2 partconsumed = True - if maybegender[:3]=='dim' or maybegender=='{{dim}}': - diminutive=True + if maybegender[:3] == 'dim' or \ + maybegender == '{{dim}}': + diminutive = True partconsumed = True - # print 'consumed: ', partconsumed +## print 'consumed: ', partconsumed if not partconsumed: # This must be our term - termweareworkingon=part.replace("[",'').replace("]",'').lower() - if '#' in termweareworkingon and '|' in termweareworkingon: - termweareworkingon=termweareworkingon.split('#')[0] + termweareworkingon = part.replace( + "[", '').replace("]", '').lower() + if '#' in termweareworkingon and \ + '|' in termweareworkingon: + termweareworkingon = termweareworkingon.split( + '#')[0] # Now we have enough information to create a term # object for 
this translation and add it to our list - addedflag=False + addedflag = False if masculine: - thistrans = {'remark': translationremark, 'trans': term.Term(isolang,termweareworkingon,gender='m',number=number,diminutive=diminutive,wikiline=translation)} + thistrans = {'remark': translationremark, + 'trans': term.Term(isolang, + termweareworkingon, + gender='m', + number=number, + diminutive=diminutive, + wikiline=translation)} translations.append(thistrans) - addedflag=True + addedflag = True if feminine: - thistrans = {'remark': translationremark, 'trans': term.Term(isolang,termweareworkingon,gender='f',number=number,diminutive=diminutive,wikiline=translation)} + thistrans = {'remark': translationremark, + 'trans': term.Term(isolang, + termweareworkingon, + gender='f', + number=number, + diminutive=diminutive, + wikiline=translation)} translations.append(thistrans) - addedflag=True + addedflag = True if neutral: - thistrans = {'remark': translationremark, 'trans': term.Term(isolang,termweareworkingon,gender='n',number=number,diminutive=diminutive,wikiline=translation)} + thistrans = {'remark': translationremark, + 'trans': term.Term(isolang, + termweareworkingon, + gender='n', + number=number, + diminutive=diminutive, + wikiline=translation)} translations.append(thistrans) - addedflag=True + addedflag = True if common: - thistrans = {'remark': translationremark, 'trans': term.Term(isolang,termweareworkingon,gender='c',number=number,diminutive=diminutive,wikiline=translation)} + thistrans = {'remark': translationremark, + 'trans': term.Term(isolang, + termweareworkingon, + gender='c', + number=number, + diminutive=diminutive, + wikiline=translation)} translations.append(thistrans) - addedflag=True - # if it wasn't added by now, it's a term which has no gender indication + addedflag = True + # if it wasn't added by now, it's a term which has no gender + # indication if not addedflag: - thistrans = {'remark': translationremark, 'trans': 
term.Term(isolang,termweareworkingon,number=number,diminutive=diminutive)} + thistrans = {'remark': translationremark, + 'trans': term.Term(isolang, + termweareworkingon, + number=number, + diminutive=diminutive)} translations.append(thistrans) if not isolang: - print "Houston, we have a problem. This line doesn't seem to contain an indication of the language:",translationswikiline - self.translations[isolang] = {'remark': translationsremark, - 'alltrans': translations } + print ("This line doesn't seem to contain an indication of the " + "language: %s" % translationswikiline) + self.translations[isolang] = {'remark': translationsremark, + 'alltrans': translations} def hasSynonyms(self): - """ Returns True if there are synonyms - Returns False if there are no synonyms - """ - if self.synonyms == []: - return False - else: - return True + """ Returns True if there are synonyms else False """ + return bool(self.synonyms) - def setTranslations(self,translations): + def setTranslations(self, translations): """ Provide the translations """ - self.translations=translations + self.translations = translations def getTranslations(self): """ Returns the translations dictionary containing translation - Term objects for this meaning + Term objects for this meaning """ return self.translations - def addTranslation(self,translation): + def addTranslation(self, translation): """ Add a translation Term object to the dictionary for this meaning - The lang property of the Term object will be used as the key of the dictionary - """ - self.translations.setdefault( translation.lang, [] ).append( translation ) + The lang property of the Term object will be used as the key of the + dictionary - def addTranslations(self,*translations): + """ + self.translations.setdefault(translation.lang, []).append(translation) + + def addTranslations(self, *translations): """ This method calls addTranslation as often as necessary to add - all the translations it receives + all the translations it 
receives + """ for translation in translations: self.addTranslation(translation) def hasTranslations(self): - """ Returns True if there are translations - Returns False if there are no translations - """ - if self.translations == {}: - return 0 - else: - return 1 + """ Returns True if there are translations else False """ + return bool(self.translations) - def setLabel(self,label): - self.label=label.replace('<!--','').replace('-->','') + def setLabel(self, label): + self.label = label.replace('<!--', '').replace('-->', '') def getLabel(self): if self.label: - return u'<!--' + self.label + u'-->' + return u'<!--%s-->' % self.label - def setConciseDef(self,concisedef): - self.concisedef=concisedef + def setConciseDef(self, concisedef): + self.concisedef = concisedef def getConciseDef(self): if self.concisedef: @@ -276,18 +327,22 @@ def getExamples(self): """ Returns the list of example strings for this meaning + """ return self.examples - def addExample(self,example): + def addExample(self, example): """ Add a translation Term object to the dictionary for this meaning - The lang property of the Term object will be used as the key of the dictionary + The lang property of the Term object will be used as the key of the + dictionary + """ self.examples.append(example) - def addExamples(self,*examples): + def addExamples(self, *examples): """ This method calls addExample as often as necessary to add - all the examples it receives + all the examples it receives + """ for example in examples: self.addExample(example) @@ -301,94 +356,125 @@ else: return 1 - def wikiWrapSynonyms(self,wikilang): - """ Returns a string with all the synonyms in a format ready for Wiktionary + def wikiWrapSynonyms(self, wikilang): + """ Returns a string with all the synonyms in a format ready for + Wiktionary + """ first = 1 wrappedsynonyms = '' for synonym in self.synonyms: - if first==0: + if first == 0: wrappedsynonyms += ', ' else: first = 0 - wrappedsynonyms = wrappedsynonyms + 
synonym.wikiWrapForList(wikilang) + wrappedsynonyms += synonym.wikiWrapForList( + wikilang) return wrappedsynonyms + '\n' - def wikiWrapTranslations(self,wikilang,entrylang): + def wikiWrapTranslations(self, wikilang, entrylang): """ Returns a string with all the translations in a format - ready for Wiktionary - The behavior changes with the circumstances. - For an entry in the same language as the Wiktionary the full list of translations is contained in the output, excluding the local - language itself - - This list of translations has to end up in a table with two columns - - The first column of this table contains languages with names from A to M, the second contains N to Z - - If a column in this list remains empty a html comment is put in that column - For an entry in a foreign language only the translation towards the local language is output. + ready for Wiktionary + The behavior changes with the circumstances. + For an entry in the same language as the Wiktionary the full list of + translations is contained in the output, excluding the local language + itself + - This list of translations has to end up in a table with two columns + - The first column of this table contains languages with names + from A to M, the second contains N to Z + - If a column in this list remains empty a html comment is put in that + column + For an entry in a foreign language only the translation towards the + local language is output. 
""" if wikilang == entrylang: - # When treating an entry of the same lang as the Wiktionary, we want to output the translations in such a way that they end up sorted alphabetically on the language name in the language of the current Wiktionary - alllanguages=self.translations.keys() + # When treating an entry of the same lang as the Wiktionary, we + # want to output the translations in such a way that they end up + # sorted alphabetically on the language name in the language of the + # current Wiktionary + alllanguages = self.translations.keys() alllanguages.sort(sortonname(langnames[wikilang])) - wrappedtranslations = structs.wiktionaryformats[wikilang]['transbefore'] + '\n' + wrappedtranslations = '%s\n' % ( + structs.wiktionaryformats[wikilang]['transbefore']) alreadydone = 0 for language in alllanguages: - if language == wikilang: continue # don't output translation for the wikilang itself + if language == wikilang: + # don't output translation for the wikilang itself + continue # split translations into two column table - if not alreadydone and langnames[wikilang][language][0:1].upper() > 'M': - wrappedtranslations = wrappedtranslations + structs.wiktionaryformats[wikilang]['transinbetween'] + '\n' + if not alreadydone and \ + langnames[wikilang][language][0:1].upper() > 'M': + wrappedtranslations += structs.wiktionaryformats[ + wikilang]['transinbetween'] + '\n' alreadydone = 1 - # Indicating the language according to the wikiformats dictionary - wrappedtranslations = wrappedtranslations + structs.wiktionaryformats[wikilang]['translang'].replace('%%langname%%',langnames[wikilang][language]).replace('%%ISOLangcode%%',language) + ': ' + # Indicating the language according to the wikiformats + # dictionary + wrappedtranslations += structs.wiktionaryformats[ + wikilang]['translang'].replace( + '%%langname%%', + langnames[wikilang][language]).replace( + '%%ISOLangcode%%', language) + ': ' first = 1 for translation in self.translations[language]: - 
termweareworkingon=translation.term - if first==0: + termweareworkingon = translation.term + if first == 0: wrappedtranslations += ', ' else: first = 0 - wrappedtranslations = wrappedtranslations + translation.wikiWrapAsTranslation(wikilang) + wrappedtranslations += translation.wikiWrapAsTranslation( + wikilang) wrappedtranslations += '\n' if not alreadydone: - wrappedtranslations = wrappedtranslations + structs.wiktionaryformats[wikilang]['transinbetween'] + '\n' + structs.wiktionaryformats[wikilang]['transnoNtoZ'] + '\n' + wrappedtranslations += structs.wiktionaryformats[ + wikilang]['transinbetween'] + '\n' + \ + structs.wiktionaryformats[wikilang]['transnoNtoZ'] + '\n' alreadydone = 1 - wrappedtranslations = wrappedtranslations + structs.wiktionaryformats[wikilang]['transafter'] + '\n' + wrappedtranslations += structs.wiktionaryformats[ + wikilang]['transafter'] + '\n' else: - # For the other entries we want to output the translation in the language of the Wiktionary - wrappedtranslations = structs.wiktionaryformats[wikilang]['translang'].replace('%%langname%%',langnames[wikilang][wikilang]).replace('%%ISOLangcode%%',wikilang) + ': ' + # For the other entries we want to output the translation in the + # language of the Wiktionary + wrappedtranslations = structs.wiktionaryformats[ + wikilang]['translang'].replace('%%langname%%', + langnames[ + wikilang][wikilang]).replace( + '%%ISOLangcode%%', + wikilang) + ': ' first = True for translation in self.translations[wikilang]: - termweareworkingon=translation.term - if first==False: + termweareworkingon = translation.term + if not first: wrappedtranslations += ', ' else: first = False - wrappedtranslations = wrappedtranslations + translation.wikiWrapAsTranslation(wikilang) + wrappedtranslations += translation.wikiWrapAsTranslation( + wikilang) return wrappedtranslations - def showContents(self,indentation): + def showContents(self, indentation): """ Prints the contents of this meaning. 
- Every subobject is indented a little further on the screen. - The primary purpose is to help keep one's sanity while debugging. + Every subobject is indented a little further on the screen. + The primary purpose is to help keep one's sanity while debugging. """ print ' ' * indentation + 'term: ' - self.term.showContents(indentation+2) - print ' ' * indentation + 'definition = %s'% self.definition - print ' ' * indentation + 'etymology = %s'% self.etymology - + self.term.showContents(indentation + 2) + print ' ' * indentation + 'definition = %s' % self.definition + print ' ' * indentation + 'etymology = %s' % self.etymology print ' ' * indentation + 'Synonyms:' for synonym in self.synonyms: - synonym.showContents(indentation+2) - + synonym.showContents(indentation + 2) print ' ' * indentation + 'Translations:' translationkeys = self.translations.keys() for translationkey in translationkeys: for translation in self.translations[translationkey]: - translation.showContents(indentation+2) + translation.showContents(indentation + 2) def wikiWrapExamples(self): - """ Returns a string with all the examples in a format ready for Wiktionary + """ Returns a string with all the examples in a format ready for + Wiktionary + """ wrappedexamples = '' for example in self.examples: - wrappedexamples = wrappedexamples + "#:'''" + example + "'''\n" + wrappedexamples += "#:'''%s'''\n" % example return wrappedexamples diff --git a/wiktionary/meaningtest.py b/wiktionary/meaningtest.py index 033db79..8c6d1cc 100644 --- a/wiktionary/meaningtest.py +++ b/wiktionary/meaningtest.py @@ -6,57 +6,65 @@ import meaning import unittest + class KnownValues(unittest.TestCase): knownParserValues = ( - ("*German: [[wichtig]]", - [('de','wichtig','',1,False,'')] - ), - ("*[[Esperanto]]: [[grava]]", - [('eo','grava','',1,False,'')] - ), - ("*{{fr}}: [[importante]] {{f}}", - [('fr','importante','f',1,False,'')] - ), - ("*Dutch: [[voorbeelden]] ''n, pl'', [[instructies]] {{f}}, {{p}}", - 
[('nl','voorbeelden','n',2,False,''), - ('nl','instructies', 'f',2,False,'')] - ), - ("*Russian: [[шесток]] ''m'' (shestok)", - [('ru','шесток','m',1,False,'shestok')] - ), - ("*Kazakh: сәлем, салам, сәлеметсіздер(respectable)", - [('ka','сәлем','',1,False,''), - ('ka','салам','',1,False,''), - ('ka','сәлеметсіздер','',1,False,'respectable')] - ), - ("*Chinese(Mandarin):[[你好]](ni3 hao3), [[您好]](''formal'' nin2 hao3)", - [('zh','你好','',1,False,'ni3 hao3'), - ('zh','您好','',1,False,"''formal'' nin2 hao3")] - ), - ("*German: [[Lamm]] ''n'' [[:de:Lamm|(de)]]", - [('de','Lamm','n',1,False,'')] - ), - ("*Italian: [[pronto#Italian|pronto]]", - [('it','pronto','',1,False,'')] - ), - ) + ("*German: [[wichtig]]", + [('de', 'wichtig', '', 1, False, '')] + ), + ("*[[Esperanto]]: [[grava]]", + [('eo', 'grava', '', 1, False, '')] + ), + ("*{{fr}}: [[importante]] {{f}}", + [('fr', 'importante', 'f', 1, False, '')] + ), + ("*Dutch: [[voorbeelden]] ''n, pl'', [[instructies]] {{f}}, {{p}}", + [('nl', 'voorbeelden', 'n', 2, False, ''), + ('nl', 'instructies', 'f', 2, False, '')] + ), + ("*Russian: [[шесток]] ''m'' (shestok)", + [('ru', 'шесток', 'm', 1, False, 'shestok')] + ), + ("*Kazakh: сәлем, салам, сәлеметсіздер(respectable)", + [('ka', 'сәлем', '', 1, False, ''), + ('ka', 'салам', '', 1, False, ''), + ('ka', 'сәлеметсіздер', '', 1, False, 'respectable')] + ), + ("*Chinese(Mandarin):[[你好]](ni3 hao3), [[您好]](''formal'' nin2 hao3)", + [('zh', '你好', '', 1, False, 'ni3 hao3'), + ('zh', '您好', '', 1, False, "''formal'' nin2 hao3")] + ), + ("*German: [[Lamm]] ''n'' [[:de:Lamm|(de)]]", + [('de', 'Lamm', 'n', 1, False, '')] + ), + ("*Italian: [[pronto#Italian|pronto]]", + [('it', 'pronto', '', 1, False, '')] + ), + ) def testParser(self): - '''self.term, self.gender, self.number, self.diminutive and remark parsed correctly from Wiki format''' + '''self.term, self.gender, self.number, self.diminutive and remark + parsed correctly from Wiki format + + ''' for wikiline, results in 
self.knownParserValues:
             ameaning = meaning.Meaning('en', 'dummy')
             ameaning.parseTranslations(wikiline)
-            i=0
-            for termlang, thisterm, termgender, termnumber, termisadiminutive, remark in results:
-                resultterm = ameaning.translations[termlang]['alltrans'][i]['trans']
+            i = 0
+            for termlang, thisterm, termgender, termnumber, termisadiminutive, \
+                    remark in results:
+                resultterm = ameaning.translations[
+                    termlang]['alltrans'][i]['trans']
                 self.assertEqual(resultterm.getTerm(), thisterm)
                 self.assertEqual(resultterm.getGender(), termgender)
                 self.assertEqual(resultterm.getNumber(), termnumber)
-#                self.assertEqual(resultterm.getIsDiminutive(), termisadiminutive)
-                self.assertEqual(ameaning.translations[termlang]['alltrans'][i]['remark'], remark)
-                i+=1
+##                self.assertEqual(resultterm.getIsDiminutive(),
+##                                 termisadiminutive)
+                self.assertEqual(
+                    ameaning.translations[termlang]['alltrans'][i]['remark'],
+                    remark)
+                i += 1

 if __name__ == "__main__":
     unittest.main()
-
diff --git a/wiktionary/sortonlanguagename.py b/wiktionary/sortonlanguagename.py
index 090faec..73d4312 100755
--- a/wiktionary/sortonlanguagename.py
+++ b/wiktionary/sortonlanguagename.py
@@ -2,13 +2,16 @@
 # -*- coding: utf-8 -*-

 # A big thanks to Rob Hooft for the following class:
-# It may not seem like much, but it magically allows the translations to be sorted on
-# the names of the languages. I would never have thought of doing it like this myself.
+# It may not seem like much, but it magically allows the translations to be
+# sorted on the names of the languages. I would never have thought of doing it
+# like this myself.
+

 class sortonlanguagename:
     '''
     This class sorts translations alphabetically on the name of the language,
     instead of on the iso abbreviation that is used internally.
+ ''' def __init__(self, lang): self.lang = lang diff --git a/wiktionary/structs.py b/wiktionary/structs.py index ce19f6c..946a6b8 100644 --- a/wiktionary/structs.py +++ b/wiktionary/structs.py @@ -5,7 +5,21 @@ Basic structures for wiktionary.py ''' -isolangs = ['af','sq','ar','an','hy','ast','tay','ay','az','bam','eu','bn','my','bi','bs','br','bg','sro','ca','zh','chp','rmr','co','dgd','da','de','eml','en','eo','et','fo','fi','fr','cpf','fy','fur','gl','ka','el','gu','hat','haw','he','hi','hu','io','ga','is','gil','id','ia','it','ja','jv','ku','kok','ko','hr','lad','la','lv','ln','li','lt','lb','src','ma','ms','mg','mt','mnc','mi','mr','mh','mas','myn','mn','nah','nap','na','nds','no','ny','oc','uk','oen','grc','pau','pap','pzh','fa','pl','pt','pa','qu','rap','roh','ra','ro','ja-ro','ru','smi','sm','sa','sc','sco','sr','sn','si','sk','sl','so','sov','es','scn','su','sw','tl','tt','th','ti','tox','cs','che','tn','tum','tpn','tr','ts','tvl','ur','vi','vo','wa','cy','be','wo','xh','zu','sv'] +isolangs = ['af', 'an', 'ar', 'ast', 'ay', 'az', 'bam', 'be', 'bg', 'bi', 'bn', + 'br', 'bs', 'ca', 'che', 'chp', 'co', 'cpf', 'cs', 'cy', 'da', 'de', + 'dgd', 'el', 'eml', 'en', 'eo', 'es', 'et', 'eu', 'fa', 'fi', 'fo', + 'fr', 'fur', 'fy', 'ga', 'gil', 'gl', 'grc', 'gu', 'hat', 'haw', + 'he', 'hi', 'hr', 'hu', 'hy', 'ia', 'id', 'io', 'is', 'it', 'ja', + 'ja-ro', 'jv', 'ka', 'ko', 'kok', 'ku', 'la', 'lad', 'lb', 'li', + 'ln', 'lt', 'lv', 'ma', 'mas', 'mg', 'mh', 'mi', 'mn', 'mnc', 'mr', + 'ms', 'mt', 'my', 'myn', 'na', 'nah', 'nap', 'nds', 'no', 'ny', + 'oc', 'oen', 'pa', 'pap', 'pau', 'pl', 'pt', 'pzh', 'qu', 'ra', + 'rap', 'rmr', 'ro', 'roh', 'ru', 'sa', 'sc', 'scn', 'sco', 'si', + 'sk', 'sl', 'sm', 'smi', 'sn', 'so', 'sov', 'sq', 'sr', 'src', + 'sro', 'su', 'sv', 'sw', 'tay', 'th', 'ti', 'tl', 'tn', 'tox', + 'tpn', 'tr', 'ts', 'tt', 'tum', 'tvl', 'uk', 'ur', 'vi', 'vo', 'wa', + 'wo', 'xh', 'zh', 'zu', + ] wiktionaryformats = { 'nl': { @@ -15,10 +29,10 @@ 'afterexampleterm': 
u"'''", 'gender': u"{{%%gender%%}}", 'posheader': { - 'noun': u'{{-noun-}}', - 'adjective': u'{{-adj-}}', - 'verb': u'{{-verb-}}', - }, + 'noun': u'{{-noun-}}', + 'adjective': u'{{-adj-}}', + 'verb': u'{{-verb-}}', + }, 'translationsheader': u"{{-trans-}}", 'transbefore': u'{{top}}', 'transinbetween': u'{{mid}}', @@ -27,7 +41,7 @@ 'transnoNtoZ': u'<!-- Vertalingen van N tot Z komen hier-->', 'synonymsheader': u"{{-syn-}}", 'relatedheader': u'{{-rel-}}', - }, + }, 'en': { 'langheader': u'==%%langname%%==', 'translang': u'*%%langname%%', @@ -35,10 +49,10 @@ 'afterexampleterm': u"'''", 'gender': u"''%%gender%%''", 'posheader': { - 'noun': u'===Noun===', - 'adjective': u'===Adjective===', - 'verb': u'===Verb===', - }, + 'noun': u'===Noun===', + 'adjective': u'===Adjective===', + 'verb': u'===Verb===', + }, 'translationsheader': u"====Translations====", 'transbefore': u'{{top}}', 'transinbetween': u'{{mid}}', @@ -47,7 +61,7 @@ 'transnoNtoZ': u'<!-- Translations from N tot Z go here-->', 'synonymsheader': u"====Synonyms====", 'relatedheader': u'===Related words===', - } + } } pos = { @@ -76,87 +90,88 @@ } langnames = { - 'nl': { - 'translingual' : u'Taalonafhankelijk', - 'nl' : u'Nederlands', - 'en' : u'Engels', - 'de' : u'Duits', - 'fr' : u'Frans', - 'it' : u'Italiaans', - 'eo' : u'Esperanto', - 'es' : u'Spaans', - }, - 'de': { - 'translingual' : u'???', - 'nl' : u'Niederländisch', - 'en' : u'Englisch', - 'de' : u'Deutsch', - 'fr' : u'Französisch', - 'it' : u'Italienisch', - 'eo' : u'Esperanto', - 'es' : u'Spanisch', - }, - 'en': { - 'translingual' : u'Translingual', - 'nl' : u'Dutch', - 'en' : u'English', - 'de' : u'German', - 'fr' : u'French', - 'it' : u'Italian', - 'eo' : u'Esperanto', - 'es' : u'Spanish', - }, - 'eo': { - 'translingual' : u'???', - 'nl' : u'Nederlanda', - 'en' : u'Angla', - 'de' : u'Germana', - 'fr' : u'Franca', - 'it' : u'Italiana', - 'eo' : u'Esperanto', - 'es' : u'Hispana', - }, - 'ia': { - 'translingual' : u'translingual', - 'nl' : 
u'nederlandese', - 'en' : u'anglese', - 'de' : u'germano', - 'fr' : u'francese', - 'it' : u'italiano', - 'eo' : u'esperanto', - 'es' : u'espaniol', - }, - 'it': { - 'translingual' : u'???', - 'nl' : u'olandese', - 'en' : u'inglese', - 'de' : u'tedesco', - 'fr' : u'francese', - 'it' : u'italiano', - 'eo' : u'esperanto', - 'es' : u'spagnuolo', - }, - 'fr': { - 'translingual' : u'???', - 'nl' : u'néerlandais', - 'en' : u'anglais', - 'de' : u'allemand', - 'fr' : u'français', - 'it' : u'italien', - 'eo' : u'espéranto', - 'es' : u'espagnol', - }, - 'es': { - 'translingual' : u'???', - 'nl' : u'olandés', - 'en' : u'inglés', - 'de' : u'alemán', - 'fr' : u'francés', - 'it' : u'italiano', - 'eo' : u'esperanto', - 'es' : u'español', - }, + 'nl': { + 'translingual': u'Taalonafhankelijk', + 'nl': u'Nederlands', + 'en': u'Engels', + 'de': u'Duits', + 'fr': u'Frans', + 'it': u'Italiaans', + 'eo': u'Esperanto', + 'es': u'Spaans', + }, + 'de': { + 'translingual': u'???', + 'nl': u'Niederländisch', + 'en': u'Englisch', + 'de': u'Deutsch', + 'fr': u'Französisch', + 'it': u'Italienisch', + 'eo': u'Esperanto', + 'es': u'Spanisch', + }, + 'en': { + 'translingual': u'Translingual', + 'nl': u'Dutch', + 'en': u'English', + 'de': u'German', + 'fr': u'French', + 'it': u'Italian', + 'eo': u'Esperanto', + 'es': u'Spanish', + }, + 'eo': { + 'translingual': u'???', + 'nl': u'Nederlanda', + 'en': u'Angla', + 'de': u'Germana', + 'fr': u'Franca', + 'it': u'Italiana', + 'eo': u'Esperanto', + 'es': u'Hispana', + }, + 'ia': { + 'translingual': u'translingual', + 'nl': u'nederlandese', + 'en': u'anglese', + 'de': u'germano', + 'fr': u'francese', + 'it': u'italiano', + 'eo': u'esperanto', + 'es': u'espaniol', + }, + 'it': { + 'translingual': u'???', + 'nl': u'olandese', + 'en': u'inglese', + 'de': u'tedesco', + 'fr': u'francese', + 'it': u'italiano', + 'eo': u'esperanto', + 'es': u'spagnuolo', + }, + 'fr': { + 'translingual': u'???', + 'nl': u'néerlandais', + 'en': u'anglais', + 'de': u'allemand', + 
'fr': u'français', + 'it': u'italien', + 'eo': u'espéranto', + 'es': u'espagnol', + }, + 'es': { + 'translingual': u'???', + 'nl': u'olandés', + 'en': u'inglés', + 'de': u'alemán', + 'fr': u'francés', + 'it': u'italiano', + 'eo': u'esperanto', + 'es': u'español', + }, } + def invertlangnames(): ''' @@ -164,58 +179,83 @@ parsing we need a dictionary to efficiently convert these back to iso abbreviations. The dictionary that gets created also contains common misspellings + ''' invertedlangnames = {} for ISOKey in langnames.keys(): for ISOKey2 in langnames[ISOKey].keys(): - lowercaselangname=langnames[ISOKey][ISOKey2].lower() - #Put in the names of the languages so we can easily do a reverse lookup lang name -> iso abbreviation + lowercaselangname = langnames[ISOKey][ISOKey2].lower() + # Put in the names of the languages so we can easily do a reverse + # lookup lang name -> iso abbreviation invertedlangnames.setdefault(lowercaselangname, ISOKey2) - # Now all the correct forms are in, but we also want to be able to find them when there are typos in them - for index in range(1,len(lowercaselangname)): + # Now all the correct forms are in, but we also want to be able to + # find them when there are typos in them + for index in range(1, len(lowercaselangname)): # So first we create all the possibilities with one letter gone - invertedlangnames.setdefault(lowercaselangname[:index]+lowercaselangname[index+1:], ISOKey2) + invertedlangnames.setdefault( + lowercaselangname[:index] + lowercaselangname[index + 1:], + ISOKey2) # Then we switch two consecutive letters - invertedlangnames.setdefault(lowercaselangname[:index-1]+lowercaselangname[index]+lowercaselangname[index-1]+lowercaselangname[index+1:], ISOKey2) - # There are of course other typos possible, but this caters for a lot of possibilities already - # TODO One other treatment that would make sense is to filter out the accents. 
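Reviewer note on the typo handling in `invertlangnames()`: a minimal standalone sketch, not part of the commit, of the two variant rules the comments describe (one letter gone, two consecutive letters switched). The helper names `typo_variants` and `build_lookup` and the two-language sample are made up for illustration; structs.py does this inline over the full `langnames` table.

```python
def typo_variants(name):
    """Return the misspellings the loop in the diff would register for name.

    For each index from 1 to len(name) - 1 it yields the name with the
    letter at that index removed, then the name with the letters at
    index - 1 and index swapped.
    """
    variants = []
    for index in range(1, len(name)):
        # One letter gone. The loop starts at 1, so the first letter is
        # never the one that gets dropped.
        variants.append(name[:index] + name[index + 1:])
        # Two consecutive letters switched.
        variants.append(name[:index - 1] + name[index] +
                        name[index - 1] + name[index + 1:])
    return variants


def build_lookup(names_by_code):
    """Map each lowercased name and its variants back to its code."""
    lookup = {}
    for code, name in names_by_code.items():
        lowered = name.lower()
        # setdefault keeps the first code registered for a given spelling.
        lookup.setdefault(lowered, code)
        for variant in typo_variants(lowered):
            lookup.setdefault(variant, code)
    return lookup
```

For example, `build_lookup({'nl': 'Dutch', 'de': 'German'})` resolves `'duthc'` and `'gemran'` to `'nl'` and `'de'`, but has no entry for `'utch'`, because neither rule removes the first letter.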
+ invertedlangnames.setdefault( + lowercaselangname[:index - 1] + + lowercaselangname[index] + + lowercaselangname[index - 1] + + lowercaselangname[index + 1:], + ISOKey2) + # There are of course other typos possible, but this caters for + # a lot of possibilities already + # TODO One other treatment that would make sense is to filter + # out the accents. return invertedlangnames + def createPOSlookupDict(): ''' The dictionary for looking up parts of speech gets completed with common misspellings + ''' for key in pos.keys(): - lowercasekey=key.lower() - value=pos[key] - for index in range(1,len(lowercasekey)): + lowercasekey = key.lower() + value = pos[key] + for index in range(1, len(lowercasekey)): # So first we create all the possibilities with one letter gone - pos.setdefault(lowercasekey[:index]+lowercasekey[index+1:], value) + pos.setdefault(lowercasekey[:index] + lowercasekey[index + 1:], + value) # Then we switch two consecutive letters - pos.setdefault(lowercasekey[:index-1]+lowercasekey[index]+lowercasekey[index-1]+lowercasekey[index+1:], value) - # There are of course other typos possible, but this caters for a lot of possibilities already + pos.setdefault(lowercasekey[:index - 1] + lowercasekey[index] + + lowercasekey[index - 1] + lowercasekey[index + 1:], + value) + # There are of course other typos possible, but this caters for a + # lot of possibilities already return pos + def createOtherHeaderslookupDict(): ''' The dictionary for looking up names of other headers gets completed with common misspellings + ''' for key in otherheaders.keys(): - lowercasekey=key.lower() - value=otherheaders[key] - for index in range(1,len(lowercasekey)): + lowercasekey = key.lower() + value = otherheaders[key] + for index in range(1, len(lowercasekey)): # So first we create all the possibilities with one letter gone - otherheaders.setdefault(lowercasekey[:index]+lowercasekey[index+1:], value) + otherheaders.setdefault(lowercasekey[:index] + + lowercasekey[index + 1:], 
value) # Then we switch two consecutive letters - otherheaders.setdefault(lowercasekey[:index-1]+lowercasekey[index]+lowercasekey[index-1]+lowercasekey[index+1:], value) - # There are of course other typos possible, but this caters for a lot of possibilities already + otherheaders.setdefault(lowercasekey[:index - 1] + + lowercasekey[index] + + lowercasekey[index - 1] + + lowercasekey[index + 1:], value) + # There are of course other typos possible, but this caters for a + # lot of possibilities already return otherheaders # Execute the functions that will take care of setting up and completing # lookup dictionaries for stuff that can appear in headers. -invertedlangnames=invertlangnames() +invertedlangnames = invertlangnames() createPOSlookupDict() createOtherHeaderslookupDict() diff --git a/wiktionary/term.py b/wiktionary/term.py index edef58e..a874a1a 100644 --- a/wiktionary/term.py +++ b/wiktionary/term.py @@ -3,185 +3,238 @@ import structs + class Term: """ This is a superclass for terms. 
""" - def __init__(self,lang,term,relatedwords=[],gender='',number=1,diminutive=False,wikiline=u''): - """ Constructor - Generally called with two parameters: - - The language of the term - - The term (string) - - relatedwords (list of Term objects) is optional + def __init__(self, lang, term, relatedwords=None, gender='', number=1, + diminutive=False, wikiline=u''): + """ Constructor + Generally called with two parameters: + - The language of the term + - The term (string) + + - relatedwords (list of Term objects) is optional """ - self.lang=lang - self.term=term - self.relatedwords=relatedwords - self.gender=gender # m: masculine, f: feminine, n: neutral, c: common - self.number=number # 1: singular, 2: plural - self.diminutive=diminutive # True: diminutive, False: not a diminutive + self.lang = lang + self.term = term + if relatedwords is None: + self.relatedwords = [] + else: + self.relatedwords = relatedwords + self.gender = gender # m: masculine, f: feminine, n: neutral, c: common + self.number = number # 1: singular, 2: plural + self.diminutive = diminutive if wikiline: - pos=wikiline.find("''") - if pos==-1: - pos=wikiline.find("{{") - if pos==-1: - pos=len(wikiline) - maybegender=wikiline[pos:].replace("'",'').replace('{','').replace('}','').strip() - self.term=wikiline[:pos].replace("[",'').replace(']','').strip() + pos = wikiline.find("''") + if pos == -1: + pos = wikiline.find("{{") + if pos == -1: + pos = len(wikiline) + maybegender = wikiline[pos:].replace("'", '').replace( + '{', '').replace('}', '').strip() + self.term = wikiline[:pos].replace("[", '').replace(']', '').strip() if 'm' in maybegender: - self.gender='m' + self.gender = 'm' if 'f' in maybegender: - self.gender='f' + self.gender = 'f' if 'n' in maybegender: - self.gender='n' + self.gender = 'n' if 'c' in maybegender: - self.gender='c' + self.gender = 'c' if 'p' in maybegender: - self.number=2 + self.number = 2 if 'dim' in maybegender: - self.diminutive=True + self.diminutive = True def 
__getitem__(self): """ Documenting as an afterthought is a bad idea - I don't know anymore why I added this, but I'm pretty sure it was in response to an error message + I don't know anymore why I added this, but I'm pretty sure it was in + response to an error message + """ return self - def setTerm(self,term): - self.term=term + def setTerm(self, term): + self.term = term def getTerm(self): return self.term - def setLang(self,lang): - self.lang=lang + def setLang(self, lang): + self.lang = lang def getLang(self): return self.lang - def setGender(self,gender): - self.gender=gender + def setGender(self, gender): + self.gender = gender def getGender(self): - return(self.gender) + return self.gender - def setNumber(self,number): - self.number=number + def setNumber(self, number): + self.number = number def getNumber(self): - return(self.number) + return self.number -# def setLabel(self,label): -# self.label=label.replace('<!--','').replace('-->','') +## def setLabel(self,label): +## self.label = label.replace('<!--', '').replace('-->', '') -# def getLabel(self): -# if self.label: -# return '<!--' + self.label + '-->' +## def getLabel(self): +## if self.label: +## return '<!--%s-->' % self.label - def wikiWrapGender(self,wikilang): - """ Returns a string with the gender in a format ready for Wiktionary, if it is applicable + def wikiWrapGender(self, wikilang): + """ Returns a string with the gender in a format ready for Wiktionary, + if it is applicable + """ if self.gender: - return ' ' + structs.wiktionaryformats[wikilang]['gender'].replace('%%gender%%',self.gender) + return ' %s' % ( + structs.wiktionaryformats[wikilang]['gender'].replace( + '%%gender%%', self.gender)) else: return '' - def wikiWrapAsExample(self,wikilang): - """ Returns a string with the gender in a format ready for Wiktionary, if it exists - """ - return structs.wiktionaryformats[wikilang]['beforeexampleterm'] + self.term + structs.wiktionaryformats[wikilang]['afterexampleterm'] + def 
wikiWrapAsExample(self, wikilang): + """ Returns a string with the gender in a format ready for Wiktionary, + if it exists - def wikiWrapForList(self,wikilang): - """ Returns a string with this term as a link followed by the gender in a format ready for Wiktionary """ - return '[[' + self.term + ']]' + return structs.wiktionaryformats[wikilang][ + 'beforeexampleterm'] + self.term + structs.wiktionaryformats[ + wikilang]['afterexampleterm'] - def wikiWrapAsTranslation(self,wikilang): - """ Returns a string with this term as a link followed by the gender in a format ready for Wiktionary + def wikiWrapForList(self, wikilang): + """ Returns a string with this term as a link followed by the gender + in a format ready for Wiktionary + """ - return '[[' + self.term + ']]' + return '[[%s]]' % self.term - def showContents(self,indentation): + def wikiWrapAsTranslation(self, wikilang): + """ Returns a string with this term as a link followed by the gender + in a format ready for Wiktionary + + """ + return '[[%s]]' % self.term + + def showContents(self, indentation): """ Prints the contents of this Term. - Every subobject is indented a little further on the screen. - The primary purpose is to help keep one's sanity while debugging. + Every subobject is indented a little further on the screen. + The primary purpose is to help keep one's sanity while debugging. + """ - print ' ' * indentation + 'lang = %s'% self.lang - print ' ' * indentation + 'pos = %s'% self.pos - print ' ' * indentation + 'term = %s'% self.term - print ' ' * indentation + 'relatedwords = %s'% self.relatedwords + print ' ' * indentation + 'lang = %s' % self.lang + print ' ' * indentation + 'pos = %s' % self.pos + print ' ' * indentation + 'term = %s' % self.term + print ' ' * indentation + 'relatedwords = %s' % self.relatedwords + class Noun(Term): """ This class inherits from Term. 
- It adds properties and methods specific to nouns + It adds properties and methods specific to nouns + """ - def __init__(self,lang,term,gender='',number=1,diminutive=False): + def __init__(self, lang, term, gender='', number=1, diminutive=False): """ Constructor - Generally called with two parameters: - - The language of the term - - The term (string) + Generally called with two parameters: + - The language of the term + - The term (string) - - gender is optional + - gender is optional + """ - self.pos='noun' # part of speech - Term.__init__(self,lang,term,gender=gender,number=number,diminutive=diminutive) + self.pos = 'noun' # part of speech + super(Noun, self).__init__(self, lang, term, gender=gender, + number=number, diminutive=diminutive) - def showContents(self,indentation): - Term.showContents(self,indentation) - print ' ' * indentation + 'gender = %s'% self.gender + def showContents(self, indentation): + Term.showContents(self, indentation) + print ' ' * indentation + 'gender = %s' % self.gender - def wikiWrapAsExample(self,wikilang): - """ Returns a string with the gender in a format ready for Wiktionary, if it exists + def wikiWrapAsExample(self, wikilang): + """ Returns a string with the gender in a format ready for Wiktionary, + if it exists + """ - return Term.wikiWrapAsExample(self, wikilang) + Term.wikiWrapGender(self,wikilang) + return Term.wikiWrapAsExample( + self, wikilang) + Term.wikiWrapGender(self, wikilang) - def wikiWrapForList(self,wikilang): - """ Returns a string with this term as a link followed by the gender in a format ready for Wiktionary - """ - return Term.wikiWrapForList(self, wikilang) + Term.wikiWrapGender(self, wikilang) + def wikiWrapForList(self, wikilang): + """ Returns a string with this term as a link followed by the gender in + a format ready for Wiktionary - def wikiWrapAsTranslation(self,wikilang): - """ Returns a string with this term as a link followed by the gender in a format ready for Wiktionary """ - return 
Term.wikiWrapAsTranslation(self, wikilang) + Term.wikiWrapGender(self, wikilang) + return Term.wikiWrapForList( + self, wikilang) + Term.wikiWrapGender(self, wikilang) + + def wikiWrapAsTranslation(self, wikilang): + """ Returns a string with this term as a link followed by the gender + in a format ready for Wiktionary + + """ + return Term.wikiWrapAsTranslation( + self, wikilang) + Term.wikiWrapGender(self, wikilang) + class Adjective(Term): - def __init__(self,lang,term,gender='',number=1): - self.pos='adjective' # part of speech - Term.__init__(self,lang,term,gender=gender,number=number) - def wikiWrapAsExample(self,wikilang): - """ Returns a string with the gender in a format ready for Wiktionary, if it exists - """ - return Term.wikiWrapAsExample(self, wikilang) + Term.wikiWrapGender(self,wikilang) + def __init__(self, lang, term, gender='', number=1): + self.pos = 'adjective' # part of speech + super(Adjective, self).__init__(self, lang, term, gender=gender, + number=number) - def wikiWrapForList(self,wikilang): - """ Returns a string with this term as a link followed by the gender in a format ready for Wiktionary - """ - return Term.wikiWrapForList(self, wikilang) + Term.wikiWrapGender(self, wikilang) + def wikiWrapAsExample(self, wikilang): + """ Returns a string with the gender in a format ready for Wiktionary, + if it exists - def wikiWrapAsTranslation(self,wikilang): - """ Returns a string with this term as a link followed by the gender in a format ready for Wiktionary """ - return Term.wikiWrapAsTranslation(self, wikilang) + Term.wikiWrapGender(self, wikilang) + return Term.wikiWrapAsExample( + self, wikilang) + Term.wikiWrapGender(self, wikilang) + + def wikiWrapForList(self, wikilang): + """ Returns a string with this term as a link followed by the gender in + a format ready for Wiktionary + + """ + return Term.wikiWrapForList( + self, wikilang) + Term.wikiWrapGender(self, wikilang) + + def wikiWrapAsTranslation(self, wikilang): + """ Returns a string 
with this term as a link followed by the gender + in a format ready for Wiktionary + + """ + return Term.wikiWrapAsTranslation( + self, wikilang) + Term.wikiWrapGender(self, wikilang) + class Verb(Term): - def __init__(self,lang,term): - self.pos='verb' # part of speech - Term.__init__(self,lang,term) - def showContents(self,indentation): - Term.showContents(self,indentation) + def __init__(self, lang, term): + self.pos = 'verb' # part of speech + super(Verb, self).__init__(self, lang, term) - def wikiWrapForList(self,wikilang): - """ Returns a string with this term as a link in a format ready for Wiktionary + def showContents(self, indentation): + Term.showContents(self, indentation) + + def wikiWrapForList(self, wikilang): + """ Returns a string with this term as a link in a format ready for + Wiktionary + """ - if wikilang=='en': + if wikilang == 'en': if self.term.lower().startswith('to '): - return 'to [[' + self.term[3:] + ']]' + return 'to [[%s]]' % self.term[3:] return Term.wikiWrapForList(self, wikilang) - def wikiWrapAsTranslation(self,wikilang): - """ Returns a string with this term as a link in a format ready for Wiktionary + def wikiWrapAsTranslation(self, wikilang): + """ Returns a string with this term as a link in a format ready for + Wiktionary + """ return Verb.wikiWrapForList(self, wikilang) diff --git a/wiktionary/termtest.py b/wiktionary/termtest.py index 1d71db3..7f4bf2e 100755 --- a/wiktionary/termtest.py +++ b/wiktionary/termtest.py @@ -6,59 +6,69 @@ import term import unittest + class KnownValues(unittest.TestCase): knownValues = ( - ('en','noun','en','example','', "'''example'''", '[[example]]'), - ('en','noun','nl','voorbeeld','n', "'''voorbeeld''' ''n''", "[[voorbeeld]] ''n''"), - ('nl','noun','nl','voorbeeld','n', "'''voorbeeld''' {{n}}", "[[voorbeeld]] {{n}}"), - ('en','verb','en','to show','', "'''to show'''", 'to [[show]]'), - ('en','verb','nl','tonen','', "'''tonen'''", "[[tonen]]"), - ('nl','verb','nl','tonen','', "'''tonen'''", 
"[[tonen]]"), - ) + ('en', 'noun', 'en', 'example', '', "'''example'''", '[[example]]'), + ('en', 'noun', 'nl', 'voorbeeld', 'n', "'''voorbeeld''' ''n''", + "[[voorbeeld]] ''n''"), + ('nl', 'noun', 'nl', 'voorbeeld', 'n', "'''voorbeeld''' {{n}}", + "[[voorbeeld]] {{n}}"), + ('en', 'verb', 'en', 'to show', '', "'''to show'''", 'to [[show]]'), + ('en', 'verb', 'nl', 'tonen', '', "'''tonen'''", "[[tonen]]"), + ('nl', 'verb', 'nl', 'tonen', '', "'''tonen'''", "[[tonen]]"), + ) def testTermKnownValuesWikiWrapAsExample(self): """WikiWrap output correct for a term used as an example""" - for wikilang, pos, termlang, thisterm, termgender, asexample, forlist in self.knownValues: - if pos=='noun': + for wikilang, pos, termlang, thisterm, termgender, asexample, \ + forlist in self.knownValues: + if pos == 'noun': aterm = term.Noun(termlang, thisterm, gender=termgender) - if pos=='verb': + if pos == 'verb': aterm = term.Verb(termlang, thisterm) result = aterm.wikiWrapAsExample(wikilang) self.assertEqual(asexample, result) def testTermKnownValuesWikiWrapForList(self): """WikiWrap output correct for a term when used in a list""" - for wikilang, pos, termlang, thisterm, termgender, asexample, forlist in self.knownValues: - if pos=='noun': + for wikilang, pos, termlang, thisterm, termgender, asexample, \ + forlist in self.knownValues: + if pos == 'noun': aterm = term.Noun(termlang, thisterm, gender=termgender) - if pos=='verb': + if pos == 'verb': aterm = term.Verb(termlang, thisterm) result = aterm.wikiWrapForList(wikilang) self.assertEqual(forlist, result) def testTermKnownValuesWikiWrapAsTranslation(self): """WikiWrap output correct for a term when used as a translation""" - for wikilang, pos, termlang, thisterm, termgender, asexample, forlist in self.knownValues: - if pos=='noun': + for wikilang, pos, termlang, thisterm, termgender, asexample, \ + forlist in self.knownValues: + if pos == 'noun': aterm = term.Noun(termlang, thisterm, gender=termgender) - if pos=='verb': + if 
pos == 'verb': aterm = term.Verb(termlang, thisterm) result = aterm.wikiWrapAsTranslation(wikilang) self.assertEqual(forlist, result) knownParserValues = ( - ("[[example]] ",'en','example','',1), - ("[[voorbeeld]] ''n''",'nl','voorbeeld','n',1), - ("[[voorbeeld]] {{n}}",'nl','voorbeeld','n',1), - ("[[voorbeelden]] ''n, pl''",'nl','voorbeelden','n',2), - ("[[voorbeelden]] {{n}},{{p}}",'nl','voorbeelden','n',2), -# ("to [[show]]",'en','to show','',1), - ("[[tonen]]",'nl','tonen','',1), - ) + ("[[example]] ", 'en', 'example', '', 1), + ("[[voorbeeld]] ''n''", 'nl', 'voorbeeld', 'n', 1), + ("[[voorbeeld]] {{n}}", 'nl', 'voorbeeld', 'n', 1), + ("[[voorbeelden]] ''n, pl''", 'nl', 'voorbeelden', 'n', 2), + ("[[voorbeelden]] {{n}},{{p}}", 'nl', 'voorbeelden', 'n', 2), +## ("to [[show]]", 'en', 'to show', '', 1), + ("[[tonen]]", 'nl', 'tonen', '', 1), + ) def testParser(self): - '''self.term, self.gender and self.number parsed correctly from Wiki format''' - for wikiline, termlang, thisterm, termgender, termnumber in self.knownParserValues: + '''self.term, self.gender and self.number parsed correctly from Wiki + format + + ''' + for wikiline, termlang, thisterm, termgender, termnumber in \ + self.knownParserValues: aterm = term.Term(termlang, '', wikiline=wikiline) self.assertEqual(aterm.getTerm(), thisterm) self.assertEqual(aterm.getGender(), termgender) @@ -66,4 +76,3 @@ if __name__ == "__main__": unittest.main() - diff --git a/wiktionary/wiktionarypage.py b/wiktionary/wiktionarypage.py index da1506c..77efae1 100644 --- a/wiktionary/wiktionarypage.py +++ b/wiktionary/wiktionarypage.py @@ -3,11 +3,17 @@ ''' This module contains code to store Wiktionary content in Python objects. 
-The objects can output the content again in Wiktionary format by means of the wikiWrap methods
+The objects can output the content again in Wiktionary format by means of the
+wikiWrap methods
 
-I'm currently working on a parser that can read the textual version in the various Wiktionary formats and store what it finds in the Python objects.
+I'm currently working on a parser that can read the textual version in the
+various Wiktionary formats and store what it finds in the Python objects.
 
-The code is still very much alpha level and the scope of what it can do is still rather limited, only 3 parts of speech, only 2 different Wiktionary output formats, only langnames matrix for about 8 languages. One of the things on the todo list is to harvest the content of this matrix dictionary from the various Wiktionary projects. GerardM put them all on line in templates already.
+The code is still very much alpha level and the scope of what it can do is
+still rather limited, only 3 parts of speech, only 2 different Wiktionary
+output formats, only langnames matrix for about 8 languages. One of the things
+on the todo list is to harvest the content of this matrix dictionary from the
+various Wiktionary projects. GerardM put them all on line in templates already.
''' import entry @@ -18,30 +24,33 @@ import meaning import term + class WiktionaryPage: """ This class contains all that can appear on one Wiktionary page """ - def __init__(self,wikilang,term): # wikilang here refers to the language of the Wiktionary this page belongs to + def __init__(self, wikilang, term): """ Constructor - Called with two parameters: - - the language of the Wiktionary the page belongs to - - the term that is described on this page + Called with two parameters: + - the language of the Wiktionary the page belongs to + - the term that is described on this page + """ - self.wikilang=wikilang - self.term=term - self.entries = {} # entries is a dictionary of entry objects indexed by entrylang + self.wikilang = wikilang + self.term = term + # entries is a dictionary of entry objects indexed by entrylang + self.entries = {} self.sortedentries = [] self.interwikilinks = [] self.categories = [] - def setWikilang(self,wikilang): + def setWikilang(self, wikilang): """ This method allows to switch the language on the fly """ - self.wikilang=wikilang + self.wikilang = wikilang - def addEntry(self,entry): + def addEntry(self, entry): """ Add an entry object to this page object """ -# self.entries.setdefault(entry.entrylang, []).append(entry) - self.entries[entry.entrylang]=entry +## self.entries.setdefault(entry.entrylang, []).append(entry) + self.entries[entry.entrylang] = entry def listEntries(self): """ Returns a dictionary of entry objects for this entry """ @@ -54,136 +63,146 @@ if not self.entries == {}: self.sortedentries = self.entries.keys() - self.sortedentries.sort(sortonlanguagename.sortonlanguagename(structs.langnames[self.wikilang])) + self.sortedentries.sort(sortonlanguagename.sortonlanguagename( + structs.langnames[self.wikilang])) try: - samelangentrypos=self.sortedentries.index(self.wikilang) - except (ValueError): + samelangentrypos = self.sortedentries.index(self.wikilang) + except ValueError: # wikilang isn't in the list, do nothing pass 
else: - samelangentry=self.sortedentries[samelangentrypos] + samelangentry = self.sortedentries[samelangentrypos] self.sortedentries.remove(self.wikilang) - self.sortedentries.insert(0,samelangentry) + self.sortedentries.insert(0, samelangentry) try: - translingualentrypos=self.sortedentries.index(u'translingual') - except (ValueError): + translingualentrypos = self.sortedentries.index(u'translingual') + except ValueError: # translingual isn't in the list, do nothing pass else: - translingualentry=self.sortedentries[translingualentrypos] + translingualentry = self.sortedentries[translingualentrypos] self.sortedentries.remove(u'translingual') - self.sortedentries.insert(0,translingualentry) + self.sortedentries.insert(0, translingualentry) - def addLink(self,link): + def addLink(self, link): """ Add a link to another wikimedia project """ - link=link.replace('[','').replace(']','') - pos=link.find(':') - if pos!=1: - link=link[:pos] + link = link.replace('[', '').replace(']', '') + pos = link.find(':') + if pos != 1: + link = link[:pos] self.interwikilinks.append(link) - # print self.interwikilinks +## print self.interwikilinks - def addCategory(self,category): + def addCategory(self, category): """ Add a link to another wikimedia project """ self.categories.append(category) - def parseWikiPage(self,content): + def parseWikiPage(self, content): '''This function will parse the content of a Wiktionary page - and read it into our object structure. - It returns a list of dictionaries. Each dictionary contains a header object - and the textual content found under that header. Only relevant content is stored. - Empty lines and lines to create tables for presentation to the user are taken out.''' + and read it into our object structure. + It returns a list of dictionaries. Each dictionary contains a header + object and the textual content found under that header. Only relevant + content is stored. 
Empty lines and lines to create tables for + presentation to the user are taken out. + + ''' templist = [] context = {} aheader = '' - splitcontent=[] - content=content.split('\n') + splitcontent = [] + content = content.split('\n') for line in content: - # print line +## print line # Let's get rid of line breaks and extraneous white space - line=line.replace('\n','').strip() - # Let's start by looking for general stuff, that provides information which is - # interesting to store at the page level + line = line.replace('\n', '').strip() + # Let's start by looking for general stuff, that provides + # information which is interesting to store at the page level if '{wikipedia}' in line.lower(): self.addLink('wikipedia') continue if '[[category:' in line.lower(): - category=line.split(':')[1].replace(']','') + category = line.split(':')[1].replace(']', '') self.addCategory(category) -# print 'category: ', category +## print 'category: ', category continue if '|' not in line: - bracketspos=line.find('[[') - colonpos=line.find(':') - if bracketspos!=-1 and colonpos!=-1 and bracketspos < colonpos: + bracketspos = line.find('[[') + colonpos = line.find(':') + if bracketspos != -1 and colonpos != -1 and \ + bracketspos < colonpos: # This seems to be an interwikilink # If there is a pipe in it, it's not a simple interwikilink - linkparts=line.replace(']','').replace('[','').split(':') - lang=linkparts[0] - linkto=linkparts[1] - if len(lang)>1 and len(lang)<4: - self.addLink(lang+':'+linkto) + linkparts = line.replace(']', + '').replace('[', '').split(':') + lang = linkparts[0] + linkto = linkparts[1] + if len(lang) > 1 and len(lang) < 4: + self.addLink(lang + ':' + linkto) continue - # store empty lines literally, this is necessary for the blocks we don't parse - # and will return literally - if len(line) <2: + # store empty lines literally, this is necessary for the blocks we + # don't parse and will return literally + if len(line) < 2: templist.append(line) continue -# 
print 'line0:',line[0], 'line-2:',line[-2],'|','stripped line-2',line.rstrip()[-2] - if line.strip()[0]=='='and line.rstrip()[-2]=='=' or '{{-' in line and '-}}' in line: - # When a new header is encountered, it is necessary to store the information - # encountered under the previous header. +## print 'line0:', line[0], 'line-2:', line[-2],'|', +## print 'stripped line-2', print line.rstrip()[-2] + if line.strip()[0] == '=' and line.rstrip()[-2] == '=' or \ + '{{-' in line and '-}}' in line: + # When a new header is encountered, it is necessary to store + # the information encountered under the previous header. if templist and aheader: - tempdictstructure={'text': templist, - 'header': aheader, - 'context': copy.copy(context), - } - templist=[] + tempdictstructure = {'text': templist, + 'header': aheader, + 'context': copy.copy(context), + } + templist = [] splitcontent.append(tempdictstructure) -# print "splitcontent: ",splitcontent,"\n\n" - aheader=header.Header(line) -# print "Header parsed:",aheader.level, aheader.header, aheader.type, aheader.contents - if aheader.type==u'lang': - context['lang']=aheader.contents - if aheader.type==u'pos': +## print "splitcontent: ",splitcontent,"\n\n" + aheader = header.Header(line) +## print "Header parsed:", aheader.level, aheader.header, +## print aheader.type, aheader.contents + if aheader.type == u'lang': + context['lang'] = aheader.contents + if aheader.type == u'pos': if not 'lang' in context: # This entry lacks a language indicator, - # so we assume it is the same language as the Wiktionary we're working on - context['lang']=self.wikilang - context['pos']=aheader.contents + # so we assume it is the same language as the + # Wiktionary we're working on + context['lang'] = self.wikilang + context['pos'] = aheader.contents else: # It's not a header line, so we add it to a temporary list # containing content lines - if aheader.contents==u'trans': - # Under the translations header there is quite a bit of stuff - # that's 
only needed for formatting, we can just skip that - # and go on processing the next line + if aheader.contents == u'trans': + # Under the translations header there is quite a bit of + # stuff that's only needed for formatting, we can just skip + # that and go on processing the next line lower = line.lower() - if '{top}' in lower: continue - if '{mid}' in lower: continue - if '{bottom}' in lower: continue - if '|-' in line: continue - if '{|' in line: continue - if '|}' in line: continue - if 'here-->' in lower: continue - if 'width=' in lower: continue - if '<!--left column' in lower: continue - if '<!--right column' in lower: continue + if ('{top}' in lower or + '{mid}' in lower or + '{bottom}' in lower or + '|-' in line or + '{|' in line or + '|}' in line or + 'here-->' in lower or + 'width=' in lower or + '<!--left column' in lower or + '<!--right column' in lower): + continue templist.append(line) # Let's not forget the last block that was encountered if templist: - tempdictstructure={'text': templist, - 'header': aheader, - 'context': copy.copy(context), - } + tempdictstructure = {'text': templist, + 'header': aheader, + 'context': copy.copy(context), + } splitcontent.append(tempdictstructure) - # make sure variables are defined before they are used gender = sample = plural = diminutive = label = definition = '' @@ -191,127 +210,154 @@ diminutive = False examples = [] for contentblock in splitcontent: - headercontent=contentblock['header'].contents + headercontent = contentblock['header'].contents -# print "contentblock:",contentblock -# print contentblock['header'] +## print "contentblock:",contentblock +## print contentblock['header'] # Now we parse the text blocks. 
- # Let's start by describing what to do with content found under the POS header - if contentblock['header'].type==u'pos': - flag=False + # Let's start by describing what to do with content found under + # the POS header + if contentblock['header'].type == u'pos': + flag = False for line in contentblock['text']: -# print line +## print line if line[:3] == "'''": # This seems to be an ''inflection line'' # It can be built up like this: '''sample''' - # Or more elaborately like this: '''staal''' ''n'' (Plural: [[stalen]], diminutive: [[staaltje]]) + # Or more elaborately like this: + # '''staal''' ''n'' (Plural: [[stalen]], + # diminutive: [[staaltje]]) # Or like this: {{en-infl-reg-other-e|ic|e}} # Let's first get rid of parentheses and brackets: - line=line.replace('(','').replace(')','').replace('[','').replace(']','') + line = line.replace('(', '').replace(')', '').replace( + '[', '').replace(']', '') # Then we can split it on the spaces for part in line.split(' '): -# print part[:3], "Flag:", flag - if flag==False and part[:3] == "'''": - sample=part.replace("'",'').strip() -# print 'Sample:', sample - # OK, so this should be an example of the term we are describing - # maybe it is necessary to compare it to the title of the page +## print part[:3], "Flag:", flag + if not flag and part[:3] == "'''": + sample = part.replace("'", '').strip() +## print 'Sample:', sample + # OK, so this should be an example of the term + # we are describing. 
Maybe it is necessary to + # compare it to the title of the page if sample: for subpart in line.split(' '): - maybegender=part.replace("'",'').replace("}",'').replace("{",'').lower() - if maybegender=='m': - gender='m' - if maybegender=='f': - gender='f' - if maybegender=='n': - gender='n' - if maybegender=='c': - gender='c' - if maybegender[:1]=='p': - number=2 - if maybegender[:3]=='dim': - diminutive=True -# print 'Gender: ',gender - if part.replace("'",'')[:2].lower()=='pl': - flag='plural' - if part.replace("'",'')[:3].lower()=='dim': - flag='diminutive' - if flag=='plural': - plural=part.replace(',','').replace("'",'').strip() -# print 'Plural: ',plural - if flag=='diminutive': - diminutive=part.replace(',','').replace("'",'').strip() -# print 'Diminutive: ',diminutive + maybegender = part.replace( + "'", '').replace("}", '').replace( + "{", '').lower() + if maybegender == 'm': + gender = 'm' + if maybegender == 'f': + gender = 'f' + if maybegender == 'n': + gender = 'n' + if maybegender == 'c': + gender = 'c' + if maybegender[:1] == 'p': + number = 2 + if maybegender[:3] == 'dim': + diminutive = True +## print 'Gender: ',gender + if part.replace("'", '')[:2].lower() == 'pl': + flag = 'plural' + if part.replace("'", '')[:3].lower() == 'dim': + flag = 'diminutive' + if flag == 'plural': + plural = part.replace(',', '').replace( + "'", '').strip() +## print 'Plural: ',plural + if flag == 'diminutive': + diminutive = part.replace( + ',', '').replace("'", '').strip() +## print 'Diminutive: ',diminutive if line[:2] == "{{": # Let's get rid of accolades: - line=line.replace('{','').replace('}','') + line = line.replace('{', '').replace('}', '') # Then we can split it on the dashes - parts=line.split('-') - lang=parts[0] - what=parts[1] - mode=parts[2] - other=parts[3] - infl=parts[4].split('|') + parts = line.split('-') + lang = parts[0] + what = parts[1] + mode = parts[2] + other = parts[3] + infl = parts[4].split('|') if sample: # We can create a Term object # 
TODO which term object depends on the POS -# print "contentblock['context'].['lang']", contentblock['context']['lang'] - if headercontent=='noun': - theterm=term.Noun(lang=contentblock['context']['lang'], term=sample, gender=gender, number=number, diminutive=diminutive) - if headercontent=='verb': - theterm=term.Verb(lang=contentblock['context']['lang'], term=sample) - sample='' -# raw_input("") +## print "contentblock['context'].['lang']", +## print contentblock['context']['lang'] + if headercontent == 'noun': + theterm = term.Noun( + lang=contentblock['context']['lang'], + term=sample, gender=gender, number=number, + diminutive=diminutive) + if headercontent == 'verb': + theterm = term.Verb( + lang=contentblock['context']['lang'], + term=sample) + sample = '' +## raw_input("") if line[:1].isdigit(): - # Somebody didn't like automatic numbering and added static numbers - # of their own. Let's get rid of them + # Somebody didn't like automatic numbering and added + # static numbers of their own. 
Let's get rid of them while line[:1].isdigit(): - line=line[1:] - # and replace them with a hash, so the following if block picks it up + line = line[1:] + # and replace them with a hash, so the following if + # block picks it up line = '#' + line if line[:1] == "#": # This probably is a definition - # If we already had a definition we need to store that one's data - # in a Meaning object and make that Meaning object part of the Page object + # If we already had a definition we need to store that + # one's data in a Meaning object and make that Meaning + # object part of the Page object if definition: - ameaning = meaning.Meaning(term=theterm,definition=definition, label=label, examples=examples) + ameaning = meaning.Meaning(term=theterm, + definition=definition, + label=label, + examples=examples) # sample # plural and diminutive belong with the Noun object - # comparative and superlative belong with the Adjective object - # conjugations belong with the Verb object + # comparative and superlative belong with the + # Adjective object conjugations belong with the + # Verb object # Reset everything for the next round - sample = plural = diminutive = label = definition = '' + sample = plural = diminutive = label = '' + definition = '' examples = [] - if not contentblock['context']['lang'] in self.entries: - # If no entry for this language has been foreseen yet - # let's create one - anentry = entry.Entry(contentblock['context']['lang']) + if not contentblock[ + 'context']['lang'] in self.entries: + # If no entry for this language has been + # foreseen yet. Let's create one + anentry = entry.Entry( + contentblock['context']['lang']) # and add it to our page object self.addEntry(anentry) # Then we can easily add this meaning to it. 
anentry.addMeaning(ameaning) - pos=line.find('<!--') - if pos!=-1 and pos < 4: - # A html comment at the beginning of the line means this entry already has disambiguation labels, great - pos2=line.find('-->') - label=line[pos+4:pos2] - definition=line[pos2+1:] -# print 'label:',label + pos = line.find('<!--') + if pos != -1 and pos < 4: + # A html comment at the beginning of the line + # means this entry already has disambiguation + # labels, great + pos2 = line.find('-->') + label = line[pos + 4:pos2] + definition = line[pos2 + 1:] +## print 'label:',label else: - definition=line[1:].strip() -# print "Definition: ", definition + definition = line[1:].strip() +## print "Definition: ", definition if line[:2] == "#:": # This is an example for the preceding definition - example=line[2:] -# print "Example:", example + example = line[2:] +## print "Example:", example examples.add(example) # Make sure we store the last definition if definition: - ameaning = meaning.Meaning(term=theterm, definition=definition, label=label, examples=examples) + ameaning = meaning.Meaning(term=theterm, definition=definition, + label=label, examples=examples) if not contentblock['context']['lang'] in self.entries: # If no entry for this language has been foreseen yet # let's create one @@ -321,22 +367,32 @@ # Then we can easily add this meaning to it. anentry.addMeaning(ameaning) - winner = False # This is going to contain the Meaning object which has the Definition which matches the Concisedef of the entry we are working on right now - if headercontent=='trans' or headercontent=='syn' or headercontent=='ant': - # On the English Wiktionary we will find concisedefs here to link definitions to the content of these sections, but only if there is more than one definition. 
- print "number of meanings:",len(anentry.meanings.keys()) - concisedefclean='' + # This is going to contain the Meaning object which has the + # Definition which matches the Concisedef of the entry we are + # working on right now: + winner = False + if headercontent == 'trans' or headercontent == 'syn' or \ + headercontent == 'ant': + # On the English Wiktionary we will find concisedefs here to + # link definitions to the content of these sections, but only + # if there is more than one definition. + print "number of meanings:", len(anentry.meanings.keys()) + concisedefclean = '' for line in contentblock['text']: if line[:3] == "'''": # This seems to be a line containing a concisedef - concisedef=line.replace("'''",'').strip() - concisedefclean=concisedef.replace("(",'').replace(")",'').replace("'",'').replace(":",'').replace(".",'').lower() + concisedef = line.replace("'''", '').strip() + concisedefclean = concisedef.replace( + "(", '').replace(")", '').replace("'", '').replace( + ":", '').replace(".", '').lower() if line[:2] == "*(": # This seems to be a line containing a concisedef - pos=line.find(')') - concisedef=line[2:pos].strip() - concisedefclean=concisedef.replace("(",'').replace(")",'').replace("'",'').replace(":",'').replace(".",'').lower() - restofline=line[pos+2:].strip() + pos = line.find(')') + concisedef = line[2:pos].strip() + concisedefclean = concisedef.replace("(", '').replace( + ")", '').replace("'", '').replace(":", '').replace( + ".", '').lower() + restofline = line[pos + 2:].strip() # Now we have this concisedef, it's worthless if it can't # be matched to a definition in order to know to what # meaning the following content belongs to @@ -344,39 +400,52 @@ # Let's start by creating a list of meanings for the entry # we're working on if concisedefclean: - highest=0 - winner=anentry.meanings[contentblock['context']['pos']][0] - for anothermeaning in anentry.meanings[contentblock['context']['pos']]: - score=0 + highest = 0 + winner = 
anentry.meanings[ + contentblock['context']['pos']][0] + for anothermeaning in anentry.meanings[ + contentblock['context']['pos']]: + score = 0 for word in concisedefclean.split(): - definition=anothermeaning.definition.replace("(",'').replace(")",'').replace("'",'').replace(":",'').replace(".",'').replace("#",'').lower() - if len(word)>1 and ' '+word+' ' in definition: - score+=1 - if len(word)>2 and word in definition: - score+=1 - if score>highest: - highest=score - winner=anothermeaning -# print 'winner:',winner.definition, 'score:',highest + definition = anothermeaning.definition.replace( + "(", '').replace(")", '').replace( + "'", '').replace(":", '').replace( + ".", '').replace("#", '').lower() + if len(word) > 1 and \ + ' %s ' % word in definition: + score += 1 + if len(word) > 2 and word in definition: + score += 1 + if score > highest: + highest = score + winner = anothermeaning +## print 'winner:', winner.definition, 'score:', highest winner.setConciseDef(concisedef) - if headercontent=='trans': - """ - We have to find a way to read the rest of the lines until the next ConciseDef into a structure, that can be processed later on. In contrast to a list of synonyms where the synonyms are on the rest of the lines, translations are on the following lines. - It's also possible that there is no concisedef and that the translation's block simpy starts... or that there are numbers instead of concisedefs. - """ + if headercontent == 'trans': + # We have to find a way to read the rest of the + # lines until the next ConciseDef into a structure, + # that can be processed later on. In contrast to a + # list of synonyms where the synonyms are on the + # rest of the lines, translations are on the + # following lines. It's also possible that there + # is no concisedef and that the translation's block + # simpy starts... or that there are numbers instead + # of concisedefs. 
+ pass - if headercontent=='syn': -# print 'syn',restofline + if headercontent == 'syn': +## print 'syn', restofline winner.parseSynonyms(restofline) - if headercontent=='trans': -# print 'trans',restofline + if headercontent == 'trans': +## print 'trans', restofline winner.parseTranslations(line) # raw_input("") def wikiWrap(self): """ Returns a string that is ready to be submitted to Wiktionary for - this page + this page + """ page = '' self.sortEntries() @@ -387,21 +456,22 @@ print "Entries:", self.entries[index] entry = self.entries[index] print entry - if first == False: - page = page + '\n----\n' + if not first: + page += '\n----\n' else: first = False page = page + entry.wikiWrap(self.wikilang) # Add interwiktionary links at bottom of page for link in self.interwikilinks: - page = page + '[' + link + ':' + self.term + ']\n' + page += '[' + link + ':' + self.term + ']\n' return page def showContents(self): """ Prints the contents of all the subobjects contained in this page. - Every subobject is indented a little further on the screen. - The primary purpose is to help keep one's sanity while debugging. + Every subobject is indented a little further on the screen. + The primary purpose is to help keep one's sanity while debugging. 
+ """ indentation = 0 print ' ' * indentation + 'wikilang = %s' % self.wikilang @@ -411,15 +481,11 @@ entrieskeys = self.entries.keys() for entrieskey in entrieskeys: for entry in self.entries[entrieskey]: - entry.showContents(indentation+2) + entry.showContents(indentation + 2) if __name__ == '__main__': - temp() - - ofn = 'wiktionaryentry.txt' content = open(ofn).readlines() - - apage = WiktionaryPage(wikilang,pagetopic) + apage = WiktionaryPage(wikilang, pagetopic) apage.parseWikiPage(content) diff --git a/wiktionary/wiktionarypagetest.py b/wiktionary/wiktionarypagetest.py index 2e6566b..1c65466 100644 --- a/wiktionary/wiktionarypagetest.py +++ b/wiktionary/wiktionarypagetest.py @@ -45,9 +45,9 @@ """ knownvalues = ( -{'wikilang': 'en', - 'term': 'nut', - 'wikiformat': u"""==English== + {'wikilang': 'en', + 'term': 'nut', + 'wikiformat': u"""==English== ===Etymology=== From Middle English [[nute]], from Old English [[hnutu]]. <!-- Is Latin [[nux]], nuc- a cognate? --> ===Pronunciation=== @@ -179,210 +179,242 @@ [[Category:Trees]] [[category:Foods]] """, - 'internalrep': - ( - [u'1000 English basic words', u'Colors', u'Browns', u'Trees', u'Foods'], - [u'io', 'la'], - {u'en': - [u'nut', None, u'nuts', - [{'definition': u'A hard-shelled seed.', - 'concisedef': u'seed', + 'internalrep': + ( + [u'1000 English basic words', u'Colors', u'Browns', u'Trees', + u'Foods'], + [u'io', 'la'], + {u'en': + [u'nut', None, u'nuts', + [{'definition': u'A hard-shelled seed.', + 'concisedef': u'seed', + 'trans': {'remark': '', + 'alltrans': { + 'nl': {'remark': '', + 'translations': [{'remark': '', + 'translation': + (u"noot", 'f', 1)} + ] + }, + 'de': {'remark': '', + 'translations': [{'remark': '', + 'translation': + (u"Nuss", 'f', 1)} + ] + }, + 'it': {'remark': '', + 'translations': [{'remark': '', + 'translation': + (u"noce", 'f', 1)} + ] + }, + 'la': {'remark': '', + 'translations': [{'remark': '', + 'translation': + (u"nux", '', 1)} + ] + }, + } + } + }, + {'definition': + u"A 
piece of metal, often [[hexagonal]], with a hole through it with internal threading intended to fit on to a bolt.", + 'concisedef': u'that fits on a bolt', + 'trans': {'remark': '', + 'alltrans': { + 'nl': {'remark': '', + 'translations': [{'remark': '', + 'translation': + (u"moer", 'f', 1)} + ] + }, + 'fr': {'remark': '', + 'translations': [{'remark': '', + 'translation': + (u"écrou", 'm', 1)} + ] + }, + 'de': {'remark': '', + 'translations': [{'remark': '', + 'translation': + (u"Mutter", 'f', 1)} + ] + }, + 'it': {'remark': '', + 'translations': [{'remark': '', + 'translation': + (u"dado", 'm', 1)} + ] + } + } + } + }, + {'definition': u"(''informal'') An insane person.", + 'concisedef': u"informal: insane person", + 'syns': {'remark': '', + 'synonyms': [{'remark': '', + 'synonym': u"loony"}, + {'remark': '', + 'synonym': u"nutcase"}, + {'remark': '', + 'synonym': u"nutter"} + ] + }, 'trans': {'remark': '', 'alltrans': { - 'nl': {'remark': '', - 'translations': [{'remark': '', - 'translation': (u"noot", 'f', 1)} - ] - }, -# 'fr': u"""''no generic translation exists''; [[noix]] ''f'' ''is often used, but this actually means "[[walnut]]"''""", - 'de': {'remark': '', - 'translations': [{'remark': '', - 'translation': (u"Nuss", 'f', 1)} - ] - }, - 'it': {'remark': '', - 'translations': [{'remark': '', - 'translation': (u"noce", 'f', 1)} - ] - }, - 'la': {'remark': '', - 'translations': [{'remark': '', - 'translation': (u"nux", '', 1)} - ] - }, - } - } - }, - {'definition': u"A piece of metal, often [[hexagonal]], with a hole through it with internal threading intended to fit on to a bolt.", - 'concisedef': u'that fits on a bolt', - 'trans': {'remark': '', - 'alltrans': { - 'nl': {'remark': '', - 'translations': [{'remark': '', - 'translation': (u"moer", 'f', 1)} - ] - }, - 'fr': {'remark': '', - 'translations': [{'remark': '', - 'translation': (u"écrou", 'm', 1)} - ] - }, - 'de': {'remark': '', - 'translations': [{'remark': '', - 'translation': (u"Mutter", 'f', 1)} - 
] - }, - 'it': {'remark': '', - 'translations': [{'remark': '', - 'translation': (u"dado", 'm', 1)} - ] - } - } - } - }, - {'definition': u"(''informal'') An insane person.", - 'concisedef': u"informal: insane person", - 'syns': {'remark': '', - 'synonyms': [{'remark': '', - 'synonym': u"loony"}, - {'remark': '', - 'synonym': u"nutcase"}, - {'remark': '', - 'synonym': u"nutter"} - ] - }, - 'trans': {'remark': '', - 'alltrans': { - 'nl': {'remark': '', - 'translations': [{'remark': '', - 'translation': (u"gek", 'm', 1)}, - {'remark': '', - 'translation': (u"gekkin", 'f', 1)}, - {'remark': '', - 'translation': (u"zot", 'm', 1)}, - {'remark': '', - 'translation': (u"zottin", 'f', 1)} - ] - }, - 'fr': {'remark': '', - 'translations': [{'remark': '', - 'translation': ("fou", 'm', 1)}, - {'remark': '', - 'translation': ("folle", 'f', 1)} - ] - }, - 'de': {'remark': '', - 'translations': [{'remark': '', - 'translation': ("Irre", 'mf', 1)}, - {'remark': '', - 'translation': ("Irrer", 'm indef.', 1)} - ] - } - } - } - }, - {'definition': u"(''slang'') The head.", - 'concisedef': u"slang: the head", - 'syns': {'remark': '(See further synonyms under [[head]])', - 'synonyms': [{'remark': '', - 'synonym': u"bonce"}, - {'remark': '', - 'synonym': u"noddle"}]}, - 'trans': {'remark': '', - 'alltrans': { - 'de': {'remark': '', - 'translations': [{'remark': '', - 'translation': (u"Birne", 'f', 1)}, - {'remark': '', - 'translation': ("Rübe", 'f', 1)}, - {'remark': '', - 'translation': ("Dötz", 'm', 1)} - ] - } - } + 'nl': {'remark': '', + 'translations': [{'remark': '', + 'translation': + (u"gek", 'm', 1)}, + {'remark': '', + 'translation': + (u"gekkin", 'f', 1)}, + {'remark': '', + 'translation': + (u"zot", 'm', 1)}, + {'remark': '', + 'translation': + (u"zottin", 'f', 1)} + ] + }, + 'fr': {'remark': '', + 'translations': [{'remark': '', + 'translation': ("fou", 'm', 1)}, + {'remark': '', + 'translation': + ("folle", 'f', 1)} + ] + }, + 'de': {'remark': '', + 'translations': 
[{'remark': '', + 'translation': + ("Irre", 'mf', 1)}, + {'remark': '', + 'translation': + ("Irrer", 'm indef.', 1)} + ] + } } - }, - {'definition': u"(''slang; rarely used in the singular'') A testicle.", - 'concisedef': u"slang: testicle", - 'syns': {'remark': '', - 'synonyms': [{'remark': '', - 'synonym': u"ball"}, - {'remark': "(''taboo slang'')", - 'synonym': u"bollock"}, - {'remark': '', - 'synonym': u"nad"}]}, - 'trans': {'remark': '', - 'alltrans': {'nl': {'remark': '<!--Never heard this before-->', - 'translations': [{'remark': '', - 'translation': (u"noten", 'm', 2)}, - {'remark': '', - 'translation': ("bal", 'm', 1)}, - {'remark': '', - 'translation': ("teelbal", 'm', 1)} - ] - }, - 'fr': {'remark': '', - 'translations': [{'remark': '', - 'translation': (u"couille", 'f', 1)} - ] - }, - 'de': {'remark': '', - 'translations': [{'remark': '', - 'translation': (u"Ei", 'n', 1)}, - {'remark': u"''lately:''", - 'translation': (u"Nuss", 'f', 1)} - ] - }, - 'es': {'remark': '', - 'translations': [{'remark': '', - 'translation': (u"cojone", '', 1)}, - {'remark': '', - 'translation': (u"huevo", '', 1)} - ] - } - } - }, - } - ], - ], - u'nl': - [u'nut', 'n', None, - [{'definition': u'[[use]], [[benefit]]', 'concisedef': u''}] - ], - } - ) - }, -{'wikilang': 'nl', - 'term': 'dummy', - 'wikiformat': u""" + } + }, + {'definition': u"(''slang'') The head.", + 'concisedef': u"slang: the head", + 'syns': {'remark': '(See further synonyms under [[head]])', + 'synonyms': [{'remark': '', + 'synonym': u"bonce"}, + {'remark': '', + 'synonym': u"noddle"}]}, + 'trans': {'remark': '', + 'alltrans': { + 'de': {'remark': '', + 'translations': [{'remark': '', + 'translation': + (u"Birne", 'f', 1)}, + {'remark': '', + 'translation': + ("Rübe", 'f', 1)}, + {'remark': '', + 'translation': + ("Dötz", 'm', 1)} + ] + } + } + } + }, + {'definition': + u"(''slang; rarely used in the singular'') A testicle.", + 'concisedef': u"slang: testicle", + 'syns': {'remark': '', + 'synonyms': 
[{'remark': '', + 'synonym': u"ball"}, + {'remark': "(''taboo slang'')", + 'synonym': u"bollock"}, + {'remark': '', + 'synonym': u"nad"}]}, + 'trans': {'remark': '', + 'alltrans': { + 'nl': {'remark': + '<!--Never heard this before-->', + 'translations': [{'remark': '', + 'translation': + (u"noten", 'm', 2)}, + {'remark': '', + 'translation': + ("bal", 'm', 1)}, + {'remark': '', + 'translation': + ("teelbal", 'm', 1)} + ] + }, + 'fr': {'remark': '', + 'translations': [{'remark': '', + 'translation': + (u"couille", 'f', 1)} + ] + }, + 'de': {'remark': '', + 'translations': [{'remark': '', + 'translation': + (u"Ei", 'n', 1)}, + {'remark': + u"''lately:''", + 'translation': + (u"Nuss", 'f', 1)} + ] + }, + 'es': {'remark': '', + 'translations': [{'remark': '', + 'translation': + (u"cojone", '', 1)}, + {'remark': '', + 'translation': + (u"huevo", '', 1)} + ] + } + } + }, + } + ], + ], + u'nl': + [u'nut', 'n', None, + [{'definition': u'[[use]], [[benefit]]', 'concisedef': u''}] + ], + } + ) + }, + {'wikilang': 'nl', + 'term': 'dummy', + 'wikiformat': u""" {{-nl-}} {{-noun-}} '''dummy''' {{m}} """, - 'internalrep': - ( - [u''], - [u''], - {u'nl': - [u'dummy', 'm', u"dummy's", - [{'definition': u'', - 'concisedef': u'', - 'trans': {'remark': '', - 'alltrans': { - 'nl': {'remark': '', - 'translations': [{'remark': '', - 'translation': (u"", '', 1)} - ] - }, - } - } + 'internalrep': + ( + [u''], + [u''], + {u'nl': + [u'dummy', 'm', u"dummy's", + [{'definition': u'', + 'concisedef': u'', + 'trans': {'remark': '', + 'alltrans': { + 'nl': {'remark': '', + 'translations': [{'remark': '', + 'translation': + (u"", '', 1)} + ] + }, + } + } + } + ], + ], + } + ) } - ], - ], - } ) - } - ) # def testWhetherCategoriesAreParsedProperly(self): # """Test whether Categories are parsed properly""" # for value in self.knownvalues: @@ -467,6 +499,7 @@ # if concisedef!='' and refsyns.has_key(concisedef) and resultsyns.has_key(concisedef): # self.assertEqual(resultsyns[concisedef], 
refsyns[concisedef]) # + def testWhetherTranslationsAreParsedProperly(self): """Test whether translations are parsed properly""" for value in self.knownvalues: @@ -475,8 +508,8 @@ value['term']) apage.parseWikiPage(value['wikiformat']) for entrylang in internalrepresentation.keys(): - definitions=internalrepresentation[entrylang][3] - reftrans={} + definitions = internalrepresentation[entrylang][3] + reftrans = {} for definition in definitions: if 'trans' in definition and definition['trans']: reftrans[definition['concisedef']] = definition['trans'] @@ -487,7 +520,8 @@ for resultmeaning in apage.entries[entrylang].meanings[key]: print resultmeaning.concisedef print 'Translations: ', resultmeaning.getTranslations() - resulttrans[resultmeaning.concisedef] = resultmeaning.getTranslations() + resulttrans[ + resultmeaning.concisedef] = resultmeaning.getTranslations() for concisedef in resulttrans.keys(): if concisedef != '' and concisedef in reftrans and \ -- To view, visit
https://gerrit.wikimedia.org/r/93325
To unsubscribe, visit
https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: I1bb2f0ee00a881a31e35a1204dc07b893b9aaf3c
Gerrit-PatchSet: 3
Gerrit-Project: pywikibot/compat
Gerrit-Branch: master
Gerrit-Owner: Xqt <info(a)gno.de>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot
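For readers following the concisedef matching in the wiktionarypage.py hunk above, the scoring loop can be looked at in isolation. What follows is a minimal Python 3 sketch under stated assumptions, not the code from the change: `clean` and `best_match` are hypothetical helper names, and definitions are plain strings here rather than Meaning objects.

```python
def clean(text, extra=""):
    """Lower-case text and drop the punctuation the parser strips."""
    for char in "()':." + extra:
        text = text.replace(char, "")
    return text.lower()


def best_match(concisedef, definitions):
    """Return (index, score) of the definition that best fits concisedef.

    Follows the loop in the diff: a word longer than one character scores
    a point when it appears surrounded by spaces, and a word longer than
    two characters scores a point when it appears anywhere as a substring.
    The first definition is the default winner and keeps ties.
    """
    words = clean(concisedef).split()
    winner, highest = 0, 0
    for index, definition in enumerate(definitions):
        # Definitions additionally lose the leading '#' list marker.
        cleaned = clean(definition, extra="#")
        score = 0
        for word in words:
            if len(word) > 1 and " %s " % word in cleaned:
                score += 1
            if len(word) > 2 and word in cleaned:
                score += 1
        if score > highest:
            winner, highest = index, score
    return winner, highest


# Definitions taken from the 'nut' test data in the same change.
definitions = [
    "A hard-shelled seed.",
    "(''informal'') An insane person.",
    "(''slang'') The head.",
]
index, score = best_match("informal: insane person", definitions)
print(definitions[index], score)
```

Because the substring test also fires inside longer words, a short concisedef can match an unrelated definition; the sketch keeps that behaviour so that it stays comparable with the loop in the diff.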
[Gerrit] running pylint on the code - change (pywikibot/core)
by jenkins-bot (Code Review)
09 Nov '13
jenkins-bot has submitted this change and it was merged. Change subject: running pylint on the code ...................................................................... running pylint on the code Change-Id: I94f96b452d5dd5abb4b325cd3ee9396167cd1d04 Signed-off-by: James Michael DuPont <jamesmikedupont(a)gmail.com> --- M pywikibot/botirc.py M pywikibot/login.py M pywikibot/pagegenerators.py 3 files changed, 18 insertions(+), 16 deletions(-) Approvals: Xqt: Looks good to me, approved jenkins-bot: Verified diff --git a/pywikibot/botirc.py b/pywikibot/botirc.py index bda0284..48e7f4d 100644 --- a/pywikibot/botirc.py +++ b/pywikibot/botirc.py @@ -7,7 +7,7 @@ """ # # (C) Balasyum -# (C) Pywikipedia bot team, 2008-2012 +# (C) Pywikipedia bot team, 2008-2013 # # Distributed under the terms of the MIT license. # @@ -18,18 +18,16 @@ # scripts, instead of writing each one from scratch. -import logging import logging.handlers # all output goes thru python std library "logging" module import re from ircbot import SingleServerIRCBot -from irclib import nm_to_n, nm_to_h, irc_lower, ip_numstr_to_quad -from irclib import ip_quad_to_numstr +from irclib import nm_to_n -# logging levels _logger = "botirc" +# logging levels from logging import DEBUG, INFO, WARNING, ERROR, CRITICAL STDOUT = 16 VERBOSE = 18 @@ -60,7 +58,8 @@ self.channel = channel self.site = site self.other_ns = re.compile( - u'14\[\[07(' + u'|'.join([item[0] for item in site.namespaces().values() if item[0]]) + u')') + u'14\[\[07(' + u'|'.join([item[0] for item in + site.namespaces().values() if item[0]]) + u')') self.api_url = self.site.family.apipath(self.site.lang) self.api_url += '?action=query&meta=siteinfo&siprop=statistics&format=xml' self.api_found = re.compile(r'articles="(.*?)"') @@ -79,9 +78,9 @@ def on_pubmsg(self, c, e): match = self.re_edit.match(e.arguments()[0]) if not match: - return + return if not ('N' in match.group('flags')): - return + return try: msg = unicode(e.arguments()[0], 'utf-8') except 
UnicodeDecodeError: @@ -93,11 +92,11 @@ entry = self.api_found.findall(text) page = pywikibot.Page(self.site, name) try: - text = page.get() + text = page.get() except pywikibot.NoPage: - return + return except pywikibot.IsRedirectPage: - return + return pywikibot.output(str((entry[0], name))) def on_dccmsg(self, c, e): diff --git a/pywikibot/login.py b/pywikibot/login.py index 872ad68..f73f544 100644 --- a/pywikibot/login.py +++ b/pywikibot/login.py @@ -11,7 +11,6 @@ # __version__ = '$Id$' -import logging import pywikibot from pywikibot import config, deprecate_arg from pywikibot.exceptions import NoSuchSite, NoUsername @@ -46,7 +45,8 @@ self.username = user elif sysop: try: - self.username = config.sysopnames[self.site.family.name][self.site.code] + self.username = config.sysopnames[ + self.site.family.name][self.site.code] except KeyError: raise NoUsername( u"""ERROR: Sysop username for %(fam_name)s:%(wiki_code)s is undefined. @@ -57,7 +57,8 @@ 'wiki_code': self.site.code}) else: try: - self.username = config.usernames[self.site.family.name][self.site.code] + self.username = config.usernames[ + self.site.family.name][self.site.code] except: raise NoUsername( u"""ERROR: Username for %(fam_name)s:%(wiki_code)s is undefined. 
@@ -77,7 +78,8 @@ """ if self.site.family.name in botList \ and self.site.code in botList[self.site.family.name]: - botListPageTitle, botTemplate = botList[self.site.family.name][self.site.code] + botListPageTitle, botTemplate = botList[ + self.site.family.name][self.site.code] botListPage = pywikibot.Page(self.site, botListPageTitle) if botTemplate: for template in botListPage.templatesWithParams(): diff --git a/pywikibot/pagegenerators.py b/pywikibot/pagegenerators.py index 50fd0d7..a7b8fe3 100644 --- a/pywikibot/pagegenerators.py +++ b/pywikibot/pagegenerators.py @@ -20,10 +20,11 @@ __version__ = '$Id$' import re -import sys import codecs import itertools import pywikibot +import time +import date from pywikibot import config from pywikibot import deprecate_arg, i18n -- To view, visit
https://gerrit.wikimedia.org/r/94469
To unsubscribe, visit
https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: I94f96b452d5dd5abb4b325cd3ee9396167cd1d04
Gerrit-PatchSet: 4
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Mdupont <jamesmikedupont(a)gmail.com>
Gerrit-Reviewer: Legoktm <legoktm.wikipedia(a)gmail.com>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot
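The re-wrapped expression in botirc.py builds a regular-expression alternation from the site's namespace names. A rough Python 3 sketch of that construction follows. The `namespaces` dict is a hypothetical stand-in for `site.namespaces()`, the sample message is invented, and the `re.escape` call is an addition that the original expression does not make. The change does not say what the digits 14 and 07 stand for, so they are simply matched literally.

```python
import re

# Hypothetical stand-in for site.namespaces(): number -> list of names,
# where the first name is the one that goes into the pattern.
namespaces = {0: [''], 1: ['Talk'], 2: ['User'], 4: ['Wikipedia']}

# Skip the main namespace, whose name is the empty string.
names = [item[0] for item in namespaces.values() if item[0]]

# re.escape is an addition in this sketch; the original joins raw names.
other_ns = re.compile(
    r'14\[\[07(' + '|'.join(re.escape(name) for name in names) + ')')

# Invented sample text in the shape the pattern expects.
match = other_ns.search('14[[07User:Example14]]')
print(match.group(1) if match else None)
```

Escaping matters only if a namespace name contains a character that is special in regular expressions; for plain alphabetic names the compiled pattern is the same either way.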
[Gerrit] update language_by_size - change (pywikibot/core)
by jenkins-bot (Code Review)
09 Nov '13
jenkins-bot has submitted this change and it was merged.

Change subject: update language_by_size
......................................................................

update language_by_size

Change-Id: I1a06c24133342edc86a3bbcbf9b33c695e3b0d72
---
M pywikibot/families/wikibooks_family.py
M pywikibot/families/wikipedia_family.py
M pywikibot/families/wiktionary_family.py
3 files changed, 21 insertions(+), 21 deletions(-)

Approvals:
  Xqt: Looks good to me, approved
  jenkins-bot: Verified

diff --git a/pywikibot/families/wikibooks_family.py b/pywikibot/families/wikibooks_family.py
index 667296e..910bb0c 100644
--- a/pywikibot/families/wikibooks_family.py
+++ b/pywikibot/families/wikibooks_family.py
@@ -12,10 +12,10 @@

         self.languages_by_size = [
             'en', 'de', 'fr', 'hu', 'ja', 'it', 'pt', 'nl', 'pl', 'es', 'he',
-            'vi', 'id', 'sq', 'ca', 'fi', 'ru', 'cs', 'zh', 'sv', 'da', 'tr',
-            'hr', 'no', 'th', 'fa', 'sr', 'gl', 'ko', 'ta', 'ar', 'mk', 'tl',
+            'vi', 'id', 'sq', 'ca', 'fi', 'ru', 'cs', 'zh', 'sv', 'da', 'hr',
+            'tr', 'no', 'th', 'fa', 'sr', 'gl', 'ko', 'ta', 'ar', 'mk', 'tl',
             'ro', 'is', 'tt', 'lt', 'ka', 'az', 'eo', 'uk', 'bg', 'sk', 'el',
-            'hy', 'sl', 'si', 'ms', 'li', 'la', 'ml', 'ang', 'ia', 'ur', 'cv',
+            'hy', 'sl', 'si', 'ms', 'li', 'la', 'ml', 'ang', 'ur', 'ia', 'cv',
             'et', 'mr', 'bn', 'hi', 'oc', 'kk', 'km', 'eu', 'fy', 'ie', 'ne',
             'sa', 'te', 'af', 'tg', 'ky', 'bs', 'pa', 'mg', 'be', 'cy',
             'zh-min-nan', 'ku', 'uz',
diff --git a/pywikibot/families/wikipedia_family.py b/pywikibot/families/wikipedia_family.py
index 803c7ea..09f87be 100644
--- a/pywikibot/families/wikipedia_family.py
+++ b/pywikibot/families/wikipedia_family.py
@@ -13,32 +13,32 @@
         self.languages_by_size = [
             'en', 'nl', 'de', 'sv', 'fr', 'it', 'ru', 'es', 'pl', 'war', 'ceb',
             'vi', 'ja', 'pt', 'zh', 'uk', 'ca', 'no', 'fi', 'fa', 'id', 'cs',
-            'ko', 'hu', 'ar', 'ms', 'ro', 'sr', 'min', 'tr', 'kk', 'sk', 'eo',
+            'ko', 'hu', 'ar', 'ro', 'ms', 'sr', 'min', 'tr', 'kk', 'sk', 'eo',
             'da', 'eu', 'lt', 'bg', 'he', 'hr', 'sl', 'uz', 'vo', 'et', 'hi',
             'gl', 'nn', 'simple', 'hy', 'la', 'az', 'el', 'sh', 'oc', 'th',
             'ka', 'mk', 'new', 'be', 'pms', 'tl', 'ta', 'te', 'ht', 'tt',
             'be-x-old', 'lv', 'cy', 'sq', 'bs', 'mg', 'br', 'jv', 'lb', 'mr',
-            'is', 'ml', 'my', 'ba', 'yo', 'an', 'lmo', 'fy', 'af', 'pnb', 'bn',
-            'zh-yue', 'ga', 'ur', 'sw', 'bpy', 'io', 'ky', 'ne', 'gu', 'scn',
-            'tg', 'nds', 'ku', 'cv', 'ast', 'qu', 'su', 'sco', 'als', 'kn',
+            'is', 'ml', 'my', 'ba', 'yo', 'an', 'lmo', 'fy', 'af', 'pnb', 'ga',
+            'bn', 'zh-yue', 'ur', 'sw', 'bpy', 'io', 'ky', 'ne', 'gu', 'scn',
+            'tg', 'nds', 'ku', 'cv', 'ast', 'qu', 'sco', 'su', 'als', 'kn',
             'ia', 'bug', 'nap', 'bat-smg', 'am', 'map-bms', 'wa', 'ckb', 'gd',
             'hif', 'mn', 'zh-min-nan', 'arz', 'mzn', 'yi', 'vec', 'sah', 'nah',
             'sa', 'roa-tara', 'os', 'si', 'bar', 'pam', 'hsb', 'pa', 'se', 'li',
             'mi', 'fo', 'co', 'ilo', 'gan', 'bo', 'frr', 'glk', 'rue', 'bcl',
-            'nds-nl', 'fiu-vro', 'mrj', 'tk', 'ps', 'vls', 'ce', 'xmf', 'gv',
+            'nds-nl', 'fiu-vro', 'mrj', 'ce', 'tk', 'ps', 'vls', 'xmf', 'gv',
             'or', 'diq', 'zea', 'kv', 'km', 'pag', 'mhr', 'csb', 'dv', 'vep',
-            'nrm', 'hak', 'rm', 'koi', 'udm', 'lad', 'lij', 'wuu',
+            'nrm', 'hak', 'rm', 'koi', 'udm', 'lad', 'wuu', 'lij',
             'zh-classical', 'sc', 'fur', 'stq', 'mt', 'ug', 'ay', 'so', 'pi',
-            'bh', 'nov', 'ksh', 'gn', 'kw', 'gag', 'ang', 'pcd', 'as', 'eml',
-            'nv', 'ace', 'ext', 'szl', 'frp', 'ie', 'mwl', 'ln', 'pfl', 'krc',
+            'nov', 'bh', 'ksh', 'gn', 'kw', 'gag', 'ang', 'pcd', 'as', 'eml',
+            'ace', 'nv', 'szl', 'ext', 'frp', 'ie', 'mwl', 'ln', 'pfl', 'krc',
             'lez', 'xal', 'haw', 'pdc', 'rw', 'crh', 'dsb', 'to', 'arc', 'kl',
-            'myv', 'kab', 'sn', 'bjn', 'pap', 'tpi', 'lo', 'lbe', 'wo', 'mdf',
-            'kbd', 'jbo', 'cbk-zam', 'av', 'srn', 'ty', 'kg', 'ab', 'na', 'tet',
+            'myv', 'kab', 'sn', 'bjn', 'pap', 'tpi', 'lo', 'kbd', 'lbe', 'wo',
+            'mdf', 'jbo', 'cbk-zam', 'av', 'srn', 'ty', 'kg', 'ab', 'na', 'tet',
             'ltg', 'ig', 'bxr', 'nso', 'za', 'kaa', 'zu', 'chy', 'rmy', 'cu',
             'tn', 'chr', 'cdo', 'roa-rup', 'bi', 'got', 'pih', 'sm', 'bm', 'iu',
-            'ss', 'pnt', 'sd', 'ki', 'ee', 'tyv', 'ha', 'om', 'fj', 'ti', 'ts',
-            'ks', 'tw', 'sg', 've', 'rn', 'st', 'cr', 'dz', 'ak', 'tum', 'ik',
-            'lg', 'ff', 'ny', 'ch', 'xh',
+            'ss', 'sd', 'pnt', 'ki', 'ee', 'tyv', 'ha', 'om', 'fj', 'ti', 'ts',
+            'ks', 'tw', 'sg', 've', 'rn', 'st', 'cr', 'dz', 'ak', 'ff', 'tum',
+            'ik', 'lg', 'ny', 'ch', 'xh',
         ]

         langs = self.languages_by_size + ['test', 'test2']  # Sites we want to edit but not count as real languages
diff --git a/pywikibot/families/wiktionary_family.py b/pywikibot/families/wiktionary_family.py
index 146dbc0..6f1334c 100644
--- a/pywikibot/families/wiktionary_family.py
+++ b/pywikibot/families/wiktionary_family.py
@@ -11,16 +11,16 @@
         self.name = 'wiktionary'

         self.languages_by_size = [
-            'en', 'mg', 'fr', 'zh', 'lt', 'ru', 'el', 'pl', 'sv', 'ko', 'de',
-            'es', 'tr', 'nl', 'ku', 'ta', 'io', 'kn', 'fi', 'vi', 'hu', 'pt',
-            'chr', 'no', 'ml', 'my', 'id', 'it', 'li', 'et', 'ja', 'ro', 'te',
+            'en', 'mg', 'fr', 'zh', 'lt', 'ru', 'el', 'es', 'pl', 'sv', 'ko',
+            'de', 'tr', 'nl', 'ku', 'ta', 'io', 'kn', 'fi', 'vi', 'hu', 'pt',
+            'chr', 'no', 'ml', 'my', 'id', 'it', 'li', 'et', 'ro', 'ja', 'te',
             'fa', 'cs', 'ca', 'ar', 'eu', 'jv', 'gl', 'lo', 'uk', 'br', 'fj',
             'eo', 'bg', 'hr', 'th', 'oc', 'is', 'vo', 'ps', 'zh-min-nan',
-            'simple', 'cy', 'scn', 'sr', 'uz', 'af', 'ast', 'sw', 'fy', 'da',
+            'simple', 'cy', 'scn', 'uz', 'sr', 'af', 'ast', 'sw', 'da', 'fy',
             'tl', 'he', 'az', 'nn', 'wa', 'ur', 'la', 'sq', 'hy', 'sm', 'sl',
             'nah', 'pnb', 'ka', 'hi', 'tt', 'bs', 'lb', 'lv', 'tk', 'sk', 'hsb',
             'nds', 'kk', 'ky', 'be', 'mk', 'km', 'ga', 'wo', 'ms', 'ang', 'co',
-            'sa', 'gn', 'mr', 'csb', 'st', 'ia', 'sd', 'ug', 'sh', 'si', 'tg',
+            'sa', 'gn', 'mr', 'csb', 'st', 'ia', 'ug', 'sd', 'sh', 'si', 'tg',
             'mn', 'kl', 'or', 'jbo', 'an', 'vec', 'ln', 'fo', 'zu', 'gu', 'kw',
             'gv', 'rw', 'qu', 'ss', 'ie', 'mt', 'om', 'bn', 'roa-rup', 'iu',
             'pa', 'so', 'am', 'su', 'za', 'gd', 'mi', 'tpi', 'ne', 'yi', 'ti',

--
To view, visit
https://gerrit.wikimedia.org/r/94505
To unsubscribe, visit
https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: I1a06c24133342edc86a3bbcbf9b33c695e3b0d72
Gerrit-PatchSet: 1
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Xqt <info(a)gno.de>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot
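The change reports 21 insertions and 21 deletions, which suggests that every replaced row holds the same language codes in a new order. A minimal Python 3 sketch of how one might check that for a pair of rows follows. `moved_codes` is a hypothetical helper, not part of pywikibot, and the two lists are the replaced rows copied from the wikibooks_family.py hunk.

```python
def moved_codes(old, new):
    """Return the codes in new whose position differs from old.

    Raises ValueError when the two lists do not hold the same codes,
    that is, when an update added or dropped a language instead of
    only reordering the list.
    """
    if sorted(old) != sorted(new):
        raise ValueError("lists differ in content, not just in order")
    return [code for position, code in enumerate(new)
            if old[position] != code]


# Removed and added rows from the wikibooks_family.py hunk.
old_rows = ['vi', 'id', 'sq', 'ca', 'fi', 'ru', 'cs', 'zh', 'sv', 'da', 'tr',
            'hr', 'no', 'th', 'fa', 'sr', 'gl', 'ko', 'ta', 'ar', 'mk', 'tl']
new_rows = ['vi', 'id', 'sq', 'ca', 'fi', 'ru', 'cs', 'zh', 'sv', 'da', 'hr',
            'tr', 'no', 'th', 'fa', 'sr', 'gl', 'ko', 'ta', 'ar', 'mk', 'tl']
print(moved_codes(old_rows, new_rows))
```

For these rows the only difference is that 'hr' and 'tr' swap places. The same check could be run over the full lists of each family file.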
[Gerrit] update language_by_size - change (pywikibot/compat)
by Xqt (Code Review)
09 Nov '13
Xqt has submitted this change and it was merged.

Change subject: update language_by_size
......................................................................

update language_by_size

Change-Id: I3d20c63f89b009eae6f8352fcfb5a36fe05912b4
---
M families/wikibooks_family.py
M families/wikipedia_family.py
M families/wiktionary_family.py
3 files changed, 21 insertions(+), 21 deletions(-)

Approvals:
  Xqt: Looks good to me, approved
  jenkins-bot: Verified

diff --git a/families/wikibooks_family.py b/families/wikibooks_family.py
index 9f0e13b..9ae2e09 100644
--- a/families/wikibooks_family.py
+++ b/families/wikibooks_family.py
@@ -12,10 +12,10 @@

         self.languages_by_size = [
             'en', 'de', 'fr', 'hu', 'ja', 'it', 'pt', 'nl', 'pl', 'es', 'he',
-            'vi', 'id', 'sq', 'ca', 'fi', 'ru', 'cs', 'zh', 'sv', 'da', 'tr',
-            'hr', 'no', 'th', 'fa', 'sr', 'gl', 'ko', 'ta', 'ar', 'mk', 'tl',
+            'vi', 'id', 'sq', 'ca', 'fi', 'ru', 'cs', 'zh', 'sv', 'da', 'hr',
+            'tr', 'no', 'th', 'fa', 'sr', 'gl', 'ko', 'ta', 'ar', 'mk', 'tl',
             'ro', 'is', 'tt', 'lt', 'ka', 'az', 'eo', 'uk', 'bg', 'sk', 'el',
-            'hy', 'sl', 'si', 'ms', 'li', 'la', 'ml', 'ang', 'ia', 'ur', 'cv',
+            'hy', 'sl', 'si', 'ms', 'li', 'la', 'ml', 'ang', 'ur', 'ia', 'cv',
             'et', 'mr', 'bn', 'hi', 'oc', 'kk', 'km', 'eu', 'fy', 'ie', 'ne',
             'sa', 'te', 'af', 'tg', 'ky', 'bs', 'pa', 'mg', 'be', 'cy',
             'zh-min-nan', 'ku', 'uz',
diff --git a/families/wikipedia_family.py b/families/wikipedia_family.py
index 9b0d8f5..1ddbfa3 100644
--- a/families/wikipedia_family.py
+++ b/families/wikipedia_family.py
@@ -13,32 +13,32 @@
         self.languages_by_size = [
             'en', 'nl', 'de', 'sv', 'fr', 'it', 'ru', 'es', 'pl', 'war', 'ceb',
             'vi', 'ja', 'pt', 'zh', 'uk', 'ca', 'no', 'fi', 'fa', 'id', 'cs',
-            'ko', 'hu', 'ar', 'ms', 'ro', 'sr', 'min', 'tr', 'kk', 'sk', 'eo',
+            'ko', 'hu', 'ar', 'ro', 'ms', 'sr', 'min', 'tr', 'kk', 'sk', 'eo',
             'da', 'eu', 'lt', 'bg', 'he', 'hr', 'sl', 'uz', 'vo', 'et', 'hi',
             'gl', 'nn', 'simple', 'hy', 'la', 'az', 'el', 'sh', 'oc', 'th',
             'ka', 'mk', 'new', 'be', 'pms', 'tl', 'ta', 'te', 'ht', 'tt',
             'be-x-old', 'lv', 'cy', 'sq', 'bs', 'mg', 'br', 'jv', 'lb', 'mr',
-            'is', 'ml', 'my', 'ba', 'yo', 'an', 'lmo', 'fy', 'af', 'pnb', 'bn',
-            'zh-yue', 'ga', 'ur', 'sw', 'bpy', 'io', 'ky', 'ne', 'gu', 'scn',
-            'tg', 'nds', 'ku', 'cv', 'ast', 'qu', 'su', 'sco', 'als', 'kn',
+            'is', 'ml', 'my', 'ba', 'yo', 'an', 'lmo', 'fy', 'af', 'pnb', 'ga',
+            'bn', 'zh-yue', 'ur', 'sw', 'bpy', 'io', 'ky', 'ne', 'gu', 'scn',
+            'tg', 'nds', 'ku', 'cv', 'ast', 'qu', 'sco', 'su', 'als', 'kn',
             'ia', 'bug', 'nap', 'bat-smg', 'am', 'map-bms', 'wa', 'ckb', 'gd',
             'hif', 'mn', 'zh-min-nan', 'arz', 'mzn', 'yi', 'vec', 'sah', 'nah',
             'sa', 'roa-tara', 'os', 'si', 'bar', 'pam', 'hsb', 'pa', 'se', 'li',
             'mi', 'fo', 'co', 'ilo', 'gan', 'bo', 'frr', 'glk', 'rue', 'bcl',
-            'nds-nl', 'fiu-vro', 'mrj', 'tk', 'ps', 'vls', 'ce', 'xmf', 'gv',
+            'nds-nl', 'fiu-vro', 'mrj', 'ce', 'tk', 'ps', 'vls', 'xmf', 'gv',
             'or', 'diq', 'zea', 'kv', 'km', 'pag', 'mhr', 'csb', 'dv', 'vep',
-            'nrm', 'hak', 'rm', 'koi', 'udm', 'lad', 'lij', 'wuu',
+            'nrm', 'hak', 'rm', 'koi', 'udm', 'lad', 'wuu', 'lij',
             'zh-classical', 'sc', 'fur', 'stq', 'mt', 'ug', 'ay', 'so', 'pi',
-            'bh', 'nov', 'ksh', 'gn', 'kw', 'gag', 'ang', 'pcd', 'as', 'eml',
-            'nv', 'ace', 'ext', 'szl', 'frp', 'ie', 'mwl', 'ln', 'pfl', 'krc',
+            'nov', 'bh', 'ksh', 'gn', 'kw', 'gag', 'ang', 'pcd', 'as', 'eml',
+            'ace', 'nv', 'szl', 'ext', 'frp', 'ie', 'mwl', 'ln', 'pfl', 'krc',
             'lez', 'xal', 'haw', 'pdc', 'rw', 'crh', 'dsb', 'to', 'arc', 'kl',
-            'myv', 'kab', 'sn', 'bjn', 'pap', 'tpi', 'lo', 'lbe', 'wo', 'mdf',
-            'kbd', 'jbo', 'cbk-zam', 'av', 'srn', 'ty', 'kg', 'ab', 'na', 'tet',
+            'myv', 'kab', 'sn', 'bjn', 'pap', 'tpi', 'lo', 'kbd', 'lbe', 'wo',
+            'mdf', 'jbo', 'cbk-zam', 'av', 'srn', 'ty', 'kg', 'ab', 'na', 'tet',
             'ltg', 'ig', 'bxr', 'nso', 'za', 'kaa', 'zu', 'chy', 'rmy', 'cu',
             'tn', 'chr', 'cdo', 'roa-rup', 'bi', 'got', 'pih', 'sm', 'bm', 'iu',
-            'ss', 'pnt', 'sd', 'ki', 'ee', 'tyv', 'ha', 'om', 'fj', 'ti', 'ts',
-            'ks', 'tw', 'sg', 've', 'rn', 'st', 'cr', 'dz', 'ak', 'tum', 'ik',
-            'lg', 'ff', 'ny', 'ch', 'xh',
+            'ss', 'sd', 'pnt', 'ki', 'ee', 'tyv', 'ha', 'om', 'fj', 'ti', 'ts',
+            'ks', 'tw', 'sg', 've', 'rn', 'st', 'cr', 'dz', 'ak', 'ff', 'tum',
+            'ik', 'lg', 'ny', 'ch', 'xh',
         ]

         self.langs = dict([(lang, '%s.wikipedia.org' % lang)
diff --git a/families/wiktionary_family.py b/families/wiktionary_family.py
index 66dc707..bb3ac27 100644
--- a/families/wiktionary_family.py
+++ b/families/wiktionary_family.py
@@ -11,16 +11,16 @@
         self.name = 'wiktionary'

         self.languages_by_size = [
-            'en', 'mg', 'fr', 'zh', 'lt', 'ru', 'el', 'pl', 'sv', 'ko', 'de',
-            'es', 'tr', 'nl', 'ku', 'ta', 'io', 'kn', 'fi', 'vi', 'hu', 'pt',
-            'chr', 'no', 'ml', 'my', 'id', 'it', 'li', 'et', 'ja', 'ro', 'te',
+            'en', 'mg', 'fr', 'zh', 'lt', 'ru', 'el', 'es', 'pl', 'sv', 'ko',
+            'de', 'tr', 'nl', 'ku', 'ta', 'io', 'kn', 'fi', 'vi', 'hu', 'pt',
+            'chr', 'no', 'ml', 'my', 'id', 'it', 'li', 'et', 'ro', 'ja', 'te',
             'fa', 'cs', 'ca', 'ar', 'eu', 'jv', 'gl', 'lo', 'uk', 'br', 'fj',
             'eo', 'bg', 'hr', 'th', 'oc', 'is', 'vo', 'ps', 'zh-min-nan',
-            'simple', 'cy', 'scn', 'sr', 'uz', 'af', 'ast', 'sw', 'fy', 'da',
+            'simple', 'cy', 'scn', 'uz', 'sr', 'af', 'ast', 'sw', 'da', 'fy',
             'tl', 'he', 'az', 'nn', 'wa', 'ur', 'la', 'sq', 'hy', 'sm', 'sl',
             'nah', 'pnb', 'ka', 'hi', 'tt', 'bs', 'lb', 'lv', 'tk', 'sk', 'hsb',
             'nds', 'kk', 'ky', 'be', 'mk', 'km', 'ga', 'wo', 'ms', 'ang', 'co',
-            'sa', 'gn', 'mr', 'csb', 'st', 'ia', 'sd', 'ug', 'sh', 'si', 'tg',
+            'sa', 'gn', 'mr', 'csb', 'st', 'ia', 'ug', 'sd', 'sh', 'si', 'tg',
             'mn', 'kl', 'or', 'jbo', 'an', 'vec', 'ln', 'fo', 'zu', 'gu', 'kw',
             'gv', 'rw', 'qu', 'ss', 'ie', 'mt', 'om', 'bn', 'roa-rup', 'iu',
             'pa', 'so', 'am', 'su', 'za', 'gd', 'mi', 'tpi', 'ne', 'yi', 'ti',

--
To view, visit
https://gerrit.wikimedia.org/r/94504
To unsubscribe, visit
https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: I3d20c63f89b009eae6f8352fcfb5a36fe05912b4
Gerrit-PatchSet: 1
Gerrit-Project: pywikibot/compat
Gerrit-Branch: master
Gerrit-Owner: Xqt <info(a)gno.de>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot