jenkins-bot has submitted this change and it was merged.
Change subject: [doc] Shortcuts for action command and further explanations.
......................................................................
[doc] Shortcuts for action command and further explanations.
Change-Id: I0c07d6739981628012f77a79a8d7fbdfe636cbd1
---
M scripts/redirect.py
1 file changed, 8 insertions(+), 4 deletions(-)
Approvals:
XZise: Looks good to me, approved
jenkins-bot: Verified
diff --git a/scripts/redirect.py b/scripts/redirect.py
index 0fde4ab..39212a7 100755
--- a/scripts/redirect.py
+++ b/scripts/redirect.py
@@ -12,13 +12,17 @@
where action can be one of these:
-double Fix redirects which point to other redirects
+double Fix redirects which point to other redirects.
+do Shortcut action command is "do".
+
broken Tries to fix broken redirect to the last moved target of the
- destination page. If this fails and -delete option is given
+br destination page. If this fails and -delete option is given
it deletes redirects where targets don't exist if bot has
admin rights otherwise it marks the page with a speedy deletion
- template if available.
-both Both of the above.
+ template if available. Shortcut action command is "br".
+
+both Both of the above. Retrieves redirect pages from live wiki,
+ not from a special page.
and arguments can be:
--
To view, visit https://gerrit.wikimedia.org/r/173770
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: I0c07d6739981628012f77a79a8d7fbdfe636cbd1
Gerrit-PatchSet: 1
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Xqt <info(a)gno.de>
Gerrit-Reviewer: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Ladsgroup <ladsgroup(a)gmail.com>
Gerrit-Reviewer: Merlijn van Deen <valhallasw(a)arctus.nl>
Gerrit-Reviewer: XZise <CommodoreFabianus(a)gmx.de>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot <>
jenkins-bot has submitted this change and it was merged.
Change subject: Category.py: upcast from Page to Category
......................................................................
Category.py: upcast from Page to Category
Instantiate a Category directly, instead of a Page with ns=14.
Change-Id: Iff7d591c0877140fa50c4a939e5430bf44c4d21b
---
M scripts/category.py
1 file changed, 1 insertion(+), 1 deletion(-)
Approvals:
Mpaa: Looks good to me, but someone else must approve
Xqt: Looks good to me, approved
jenkins-bot: Verified
diff --git a/scripts/category.py b/scripts/category.py
index 429fc8a..dc1bcdf 100755
--- a/scripts/category.py
+++ b/scripts/category.py
@@ -348,7 +348,7 @@
newcat = self.newcat
if not self.current_page.site.nocapitalize:
newcat = newcat[:1].upper() + newcat[1:]
- catpl = pywikibot.Page(self.current_page.site, newcat, ns=14)
+ catpl = pywikibot.Category(self.current_page.site, newcat)
if catpl in cats:
pywikibot.output(u"%s is already in %s."
% (self.current_page.title(), catpl.title()))
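The upcast in this patch can be illustrated with a toy sketch (stand-in classes, not the real pywikibot ones): constructing the subclass directly gives an object that carries category-specific methods, which a plain Page built with ns=14 would lack.

```python
# Toy illustration of why Category(site, title) beats Page(site, title, ns=14).
class Page:
    def __init__(self, site, title, ns=0):
        self.site, self.title, self.ns = site, title, ns


class Category(Page):
    def __init__(self, site, title):
        # The subclass fixes the namespace itself; callers no longer
        # need the magic number 14.
        super().__init__(site, title, ns=14)

    def members(self):
        # Category-only API, unavailable on a plain Page instance.
        return []


catpl = Category('enwiki', 'Category:Examples')
assert catpl.ns == 14                   # namespace is implied
assert isinstance(catpl, Page)          # still comparable with pages
assert hasattr(catpl, 'members')
assert not hasattr(Page('enwiki', 'X'), 'members')
```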
--
To view, visit https://gerrit.wikimedia.org/r/167371
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: Iff7d591c0877140fa50c4a939e5430bf44c4d21b
Gerrit-PatchSet: 1
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Mpaa <mpaa.wiki(a)gmail.com>
Gerrit-Reviewer: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Ladsgroup <ladsgroup(a)gmail.com>
Gerrit-Reviewer: Merlijn van Deen <valhallasw(a)arctus.nl>
Gerrit-Reviewer: Mpaa <mpaa.wiki(a)gmail.com>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot <>
jenkins-bot has submitted this change and it was merged.
Change subject: Skip expandtemplates if the page content is empty
......................................................................
Skip expandtemplates if the page content is empty
Bug: 73529
Change-Id: If75d7fffab8b765699576e2954f5fb03c4af8ef2
---
M pywikibot/page.py
M pywikibot/site.py
2 files changed, 8 insertions(+), 0 deletions(-)
Approvals:
XZise: Looks good to me, approved
jenkins-bot: Verified
diff --git a/pywikibot/page.py b/pywikibot/page.py
index 68bfd5c..bda181f 100644
--- a/pywikibot/page.py
+++ b/pywikibot/page.py
@@ -493,6 +493,10 @@
"""
if not hasattr(self, '_expanded_text') or (
self._expanded_text is None) or force:
+ if not self.text:
+ self._expanded_text = ''
+ return ''
+
self._expanded_text = self.site.expand_text(
self.text,
title=self.title(withSection=False),
diff --git a/pywikibot/site.py b/pywikibot/site.py
index 4a139fa..5b1399d 100644
--- a/pywikibot/site.py
+++ b/pywikibot/site.py
@@ -1945,6 +1945,10 @@
@return: unicode
"""
+ if not isinstance(text, basestring):
+ raise ValueError('text must be a string')
+ if not text:
+ return ''
req = api.Request(site=self, action='expandtemplates', text=text)
if title is not None:
req['title'] = title
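The guard added in both hunks follows a common pattern: validate the input, then short-circuit before an expensive remote call when the answer is trivially known. A minimal standalone sketch (the `do_request` callable stands in for the expandtemplates API request; it is not the real pywikibot code):

```python
def expand_text(text, do_request):
    """Validate input and skip the API round-trip for empty text."""
    if not isinstance(text, str):        # site.py guard (basestring on Py2)
        raise ValueError('text must be a string')
    if not text:
        return ''                        # empty in, empty out: no request
    return do_request(text)


calls = []


def fake_api(text):
    # Records that a request would have been sent.
    calls.append(text)
    return text.upper()


assert expand_text('', fake_api) == ''
assert calls == []                       # no request was made for empty text
assert expand_text('{{x}}', fake_api) == '{{X}}'
assert calls == ['{{x}}']                # non-empty text still goes through
```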
--
To view, visit https://gerrit.wikimedia.org/r/173994
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: If75d7fffab8b765699576e2954f5fb03c4af8ef2
Gerrit-PatchSet: 2
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Ladsgroup <ladsgroup(a)gmail.com>
Gerrit-Reviewer: Merlijn van Deen <valhallasw(a)arctus.nl>
Gerrit-Reviewer: XZise <CommodoreFabianus(a)gmx.de>
Gerrit-Reviewer: jenkins-bot <>
jenkins-bot has submitted this change and it was merged.
Change subject: New WikiStats module
......................................................................
New WikiStats module
wikistats.wmflabs.org provides aggregate statistics for families of
wikis. Pywikibot currently fetches a subset of the data using XML
to create a hard-coded list of languages by size for the multi-lang
WMF families and family 'anarchopedias'.
This new module provides raw access to all of the data stored in
wikistats and allows it to be fetched using the smaller CSV format;
the XML format is also supported and is used on Python 2 when the
unicodecsv module is not installed.
E.g. for wiktionaries, the XML is 300 KB vs 26 KB for the CSV.
Change-Id: Id0070092d2337c9fc86b01e2103999c6dcea42fa
---
A pywikibot/data/wikistats.py
M setup.py
A tests/wikistats_tests.py
3 files changed, 305 insertions(+), 0 deletions(-)
Approvals:
John Vandenberg: Looks good to me, but someone else must approve
XZise: Looks good to me, approved
jenkins-bot: Verified
diff --git a/pywikibot/data/wikistats.py b/pywikibot/data/wikistats.py
new file mode 100644
index 0000000..125eabc
--- /dev/null
+++ b/pywikibot/data/wikistats.py
@@ -0,0 +1,240 @@
+# -*- coding: utf-8 -*-
+"""Objects representing WikiStats API."""
+#
+# (C) Pywikibot team, 2014
+#
+# Distributed under the terms of the MIT license.
+
+import sys
+
+from io import BytesIO, StringIO
+
+import pywikibot
+
+if sys.version_info[0] > 2:
+ import csv
+else:
+ try:
+ import unicodecsv as csv
+ except ImportError:
+ pywikibot.warning(
+ 'WikiStats: unicodecsv package required for using csv in Python 2;'
+ ' falling back to using the larger XML datasets.')
+ csv = None
+
+from pywikibot.comms import threadedhttp
+
+
+class WikiStats(object):
+
+ """
+ Light wrapper around WikiStats data, caching responses and data.
+
+ The methods accept a Pywikibot family name as the WikiStats table name,
+ mapping the names before calling the WikiStats API.
+ """
+
+ FAMILY_MAPPING = {
+ 'anarchopedia': 'anarchopedias',
+ 'wikipedia': 'wikipedias',
+ 'wikiquote': 'wikiquotes',
+ 'wikisource': 'wikisources',
+ 'wiktionary': 'wiktionaries',
+ }
+
+ MISC_SITES_TABLE = 'mediawikis'
+
+ WMF_MULTILANG_TABLES = set([
+ 'wikipedias', 'wiktionaries', 'wikisources', 'wikinews',
+ 'wikibooks', 'wikiquotes', 'wikivoyage', 'wikiversity',
+ ])
+
+ OTHER_MULTILANG_TABLES = set([
+ 'uncyclomedia',
+ 'anarchopedias',
+ 'rodovid',
+ 'wikifur',
+ 'wikitravel',
+ 'scoutwiki',
+ 'opensuse',
+ 'metapedias',
+ 'lxde',
+ 'pardus',
+ 'gentoo',
+ ])
+
+ OTHER_TABLES = set([
+ # Farms
+ 'wikia',
+ 'wikkii',
+ 'wikisite',
+ 'editthis',
+ 'orain',
+ 'shoutwiki',
+ 'referata',
+
+ # Single purpose/manager sets
+ 'wmspecials',
+ 'gamepedias',
+ 'w3cwikis',
+ 'neoseeker',
+ 'sourceforge',
+ ])
+
+ ALL_TABLES = (set([MISC_SITES_TABLE]) | WMF_MULTILANG_TABLES |
+ OTHER_MULTILANG_TABLES | OTHER_TABLES)
+
+ ALL_KEYS = set(FAMILY_MAPPING.keys()) | ALL_TABLES
+
+ def __init__(self, url='https://wikistats.wmflabs.org/'):
+ """Constructor."""
+ self.url = url
+ self._raw = {}
+ self._data = {}
+
+ def fetch(self, table, format="xml"):
+ """
+ Fetch data from WikiStats.
+
+ @param table: table of data to fetch
+ @type table: basestring
+ @param format: Format of data to use
+ @type format: 'xml' or 'csv'.
+ @rtype: bytes
+ """
+ URL = self.url + '/api.php?action=dump&table=%s&format=%s'
+
+ if table not in self.ALL_KEYS:
+ pywikibot.warning('WikiStats unknown table %s' % table)
+
+ if table in self.FAMILY_MAPPING:
+ table = self.FAMILY_MAPPING[table]
+
+ o = threadedhttp.Http()
+ r = o.request(uri=URL % (table, format))
+ if isinstance(r, Exception):
+ raise r
+ return r[1]
+
+ def raw_cached(self, table, format):
+ """
+ Cache raw data.
+
+ @param table: table of data to fetch
+ @type table: basestring
+ @param format: Format of data to use
+ @type format: 'xml' or 'csv'.
+ @rtype: bytes
+ """
+ if format not in self._raw:
+ self._raw[format] = {}
+ if table in self._raw[format]:
+ return self._raw[format][table]
+
+ data = self.fetch(table, format)
+
+ self._raw[format][table] = data
+ return data
+
+ def csv(self, table):
+ """
+ Fetch and parse CSV for a table.
+
+ @param table: table of data to fetch
+ @type table: basestring
+ @rtype: list
+ """
+ if table in self._data.setdefault('csv', {}):
+ return self._data['csv'][table]
+
+ data = self.raw_cached(table, 'csv')
+
+ if sys.version_info[0] > 2:
+ f = StringIO(data.decode('utf8'))
+ else:
+ f = BytesIO(data)
+
+ reader = csv.DictReader(f)
+
+ data = [site for site in reader]
+
+ self._data['csv'][table] = data
+
+ return data
+
+ def xml(self, table):
+ """
+ Fetch and parse XML for a table.
+
+ @param table: table of data to fetch
+ @type table: basestring
+ @rtype: list
+ """
+ if table in self._data.setdefault('xml', {}):
+ return self._data['xml'][table]
+
+ from xml.etree import cElementTree
+
+ data = self.raw_cached(table, 'xml')
+
+ f = BytesIO(data)
+ tree = cElementTree.parse(f)
+
+ data = []
+
+ for row in tree.findall('row'):
+ site = {}
+
+ for field in row.findall('field'):
+ site[field.get('name')] = field.text
+
+ data.append(site)
+
+ self._data['xml'][table] = data
+
+ return data
+
+ def get(self, table, format=None):
+ """
+ Get a list of a table of data using format.
+
+ @param table: table of data to fetch
+ @type table: basestring
+ @param format: Format of data to use
+ @type format: 'xml' or 'csv', or None to autoselect.
+ @rtype: list
+ """
+ if csv or format == 'csv':
+ data = self.csv(table)
+ else:
+ data = self.xml(table)
+ return data
+
+ def get_dict(self, table, format=None):
+ """
+ Get dictionary of a table of data using format.
+
+ @param table: table of data to fetch
+ @type table: basestring
+ @param format: Format of data to use
+ @type format: 'xml' or 'csv', or None to autoselect.
+ @rtype: dict
+ """
+ return dict((data['prefix'], data)
+ for data in self.get(table, format))
+
+ def sorted(self, table, key):
+ """
+ Reverse numerical sort of data.
+
+ @param table: name of table of data
+ @param key: numerical key, such as id, total, good
+ """
+ return sorted(self.get(table),
+ key=lambda d: int(d[key]),
+ reverse=True)
+
+ def languages_by_size(self, table):
+ """ Return ordered list of languages by size from WikiStats. """
+ # This assumes they appear in order of size in the WikiStats dump.
+ return [d['prefix'] for d in self.get(table)]
diff --git a/setup.py b/setup.py
index 4fdacda..564ad08 100644
--- a/setup.py
+++ b/setup.py
@@ -27,6 +27,9 @@
'mwparserfromhell': ['mwparserfromhell>=0.3.3']
}
+if sys.version_info[0] == 2:
+ extra_deps['wikistats-csv'] = ['unicodecsv']
+
script_deps = {
'script_wui.py': ['irc', 'lunatic-python', 'crontab'],
# Note: None of the 'lunatic-python' repos on github support MS Windows.
diff --git a/tests/wikistats_tests.py b/tests/wikistats_tests.py
new file mode 100644
index 0000000..c9e8532
--- /dev/null
+++ b/tests/wikistats_tests.py
@@ -0,0 +1,62 @@
+# -*- coding: utf-8 -*-
+"""Test cases for the WikiStats dataset."""
+#
+# (C) Pywikibot team, 2014
+#
+# Distributed under the terms of the MIT license.
+#
+__version__ = '$Id$'
+#
+
+import sys
+
+from pywikibot.data.wikistats import WikiStats, csv
+
+from tests.aspects import unittest, TestCase
+
+if sys.version_info[0] == 3:
+ basestring = (str, )
+
+
+class WikiStatsTestCase(TestCase):
+
+ """Test WikiStats dump."""
+
+ net = True
+
+ def test_sort(self):
+ ws = WikiStats()
+ data = ws.sorted('wikipedia', 'total')
+ top = data[0]
+ self.assertIn('prefix', top)
+ self.assertIn('total', top)
+ self.assertEqual(top['prefix'], 'en')
+ self.assertIsInstance(top['total'], basestring)
+ self.assertEqual(ws.languages_by_size('wikipedia')[0], 'en')
+ self.assertEqual(ws.languages_by_size('wikisource')[0], 'fr')
+
+ def test_csv(self):
+ if not csv:
+ raise unittest.SkipTest('unicodecsv not installed.')
+ ws = WikiStats()
+ data = ws.get_dict('wikipedia', 'csv')
+ self.assertIsInstance(data, dict)
+ self.assertIn('en', data)
+ self.assertIn('ht', data)
+ self.assertGreater(int(data['en']['total']), 4000000)
+
+ def test_xml(self):
+ ws = WikiStats()
+ data = ws.get_dict('wikisource', 'xml')
+ self.assertIsInstance(data, dict)
+ self.assertIn('en', data)
+ self.assertIn('id', data)
+ self.assertGreater(int(data['fr']['total']), 1600000)
+
+
+if __name__ == '__main__':
+ try:
+ unittest.main()
+ except SystemExit:
+ pass
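The CSV path of the module above boils down to decoding the raw bytes and feeding them to `csv.DictReader`, then sorting on a numeric field stored as a string. A self-contained sketch with a hypothetical miniature dump (real wikistats dumps have more columns, but `prefix` and `total` are the ones the module relies on):

```python
import csv
from io import StringIO

# Hypothetical miniature of a wikistats CSV dump.
raw = b'prefix,total\nen,4000000\nfr,1600000\nht,70000\n'

# Python 3 branch of WikiStats.csv(): decode, then parse into dicts.
rows = list(csv.DictReader(StringIO(raw.decode('utf8'))))
assert [r['prefix'] for r in rows] == ['en', 'fr', 'ht']

# WikiStats.sorted(): reverse numerical sort on a string-valued field.
top = sorted(rows, key=lambda d: int(d['total']), reverse=True)[0]
assert top['prefix'] == 'en'
```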
--
To view, visit https://gerrit.wikimedia.org/r/172104
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: Id0070092d2337c9fc86b01e2103999c6dcea42fa
Gerrit-PatchSet: 4
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Ladsgroup <ladsgroup(a)gmail.com>
Gerrit-Reviewer: Merlijn van Deen <valhallasw(a)arctus.nl>
Gerrit-Reviewer: XZise <CommodoreFabianus(a)gmx.de>
Gerrit-Reviewer: jenkins-bot <>
jenkins-bot has submitted this change and it was merged.
Change subject: [FEAT] Process a single page with redirect.py
......................................................................
[FEAT] Process a single page with redirect.py
- Additional -page option enables working on a single page to solve
its problems.
- Re-enable XML file for broken redirect
- Remove old screen scraping code for broken redirect special page
Change-Id: I6e7da9ba91c7eb820b10b07a77076a093a2c2b2a
---
M scripts/redirect.py
1 file changed, 25 insertions(+), 32 deletions(-)
Approvals:
John Vandenberg: Looks good to me, approved
jenkins-bot: Verified
diff --git a/scripts/redirect.py b/scripts/redirect.py
index c39a931..0fde4ab 100755
--- a/scripts/redirect.py
+++ b/scripts/redirect.py
@@ -36,6 +36,8 @@
If neither of -xml -fullscan -moves is given, info will be
loaded from a special page of the live wiki.
+-page:title Work on a single page
+
-namespace:n Namespace to process. Can be given multiple times, for several
namespaces. If omitted, only the main (article) namespace is
treated.
@@ -82,7 +84,7 @@
def __init__(self, xmlFilename=None, namespaces=[], offset=-1,
use_move_log=False, use_api=False, start=None, until=None,
- number=None, step=None):
+ number=None, step=None, page_title=None):
self.site = pywikibot.Site()
self.xmlFilename = xmlFilename
self.namespaces = namespaces
@@ -95,6 +97,7 @@
self.api_until = until
self.api_number = number
self.api_step = step
+ self.page_title = page_title
def get_redirects_from_dump(self, alsoGetPageTitles=False):
"""
@@ -267,38 +270,22 @@
count += 1
if count >= self.api_number:
break
- elif not self.xmlFilename:
+ elif self.xmlFilename:
+ # retrieve information from XML dump
+ pywikibot.output(
+ u'Getting a list of all redirects and of all page titles...')
+ redirs, pageTitles = self.get_redirects_from_dump(
+ alsoGetPageTitles=True)
+ for (key, value) in redirs.items():
+ if value not in pageTitles:
+ yield key
+ elif self.page_title:
+ yield self.page_title
+ else:
# retrieve information from broken redirect special page
pywikibot.output(u'Retrieving special page...')
for redir_name in self.site.broken_redirects():
yield redir_name.title()
-
-# TODO: add XML dump support
-## elif self.xmlFilename == None:
-## # retrieve information from the live wiki's maintenance page
-## # broken redirect maintenance page's URL
-## path = self.site.broken_redirects_address(default_limit=False)
-## pywikibot.output(u'Retrieving special page...')
-## maintenance_txt = self.site.getUrl(path)
-##
-## # regular expression which finds redirects which point to a
-## # non-existing page inside the HTML
-## Rredir = re.compile('\<li\>\<a href=".+?" title="(.*?)"')
-##
-## redir_names = Rredir.findall(maintenance_txt)
-## pywikibot.output(u'Retrieved %d redirects from special page.\n'
-## % len(redir_names))
-## for redir_name in redir_names:
-## yield redir_name
-## else:
-## # retrieve information from XML dump
-## pywikibot.output(
-## u'Getting a list of all redirects and of all page titles...')
-## redirs, pageTitles = self.get_redirects_from_dump(
-## alsoGetPageTitles=True)
-## for (key, value) in redirs.items():
-## if value not in pageTitles:
-## yield key
def retrieve_double_redirects(self):
if self.use_move_log:
@@ -326,6 +313,8 @@
yield key
pywikibot.output(u'\nChecking redirect %i of %i...'
% (num + 1, len(redict)))
+ elif self.page_title:
+ yield self.page_title
else:
# retrieve information from double redirect special page
pywikibot.output(u'Retrieving special page...')
@@ -516,8 +505,8 @@
u"Won't delete anything."
% targetPage.title(asLink=True))
else:
- #we successfully get the target page, meaning that
- #it exists and is not a redirect: no reason to touch it.
+ # we successfully get the target page, meaning that
+ # it exists and is not a redirect: no reason to touch it.
pywikibot.output(
u'Redirect target %s does exist! Won\'t delete anything.'
% targetPage.title(asLink=True))
@@ -753,6 +742,8 @@
until = ''
number = None
step = None
+ pagename = None
+
for arg in pywikibot.handle_args(args):
if arg == 'double' or arg == 'do':
action = 'double'
@@ -796,6 +787,8 @@
number = int(arg[7:])
elif arg.startswith('-step:'):
step = int(arg[6:])
+ elif arg.startswith('-page:'):
+ pagename = arg[6:]
elif arg == '-always':
options['always'] = True
elif arg == '-delete':
@@ -812,7 +805,7 @@
else:
pywikibot.Site().login()
gen = RedirectGenerator(xmlFilename, namespaces, offset, moved_pages,
- fullscan, start, until, number, step)
+ fullscan, start, until, number, step, pagename)
bot = RedirectRobot(action, gen, number=number, **options)
bot.run()
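The new `-page:` option uses the same `-option:value` parsing convention as the rest of the script. A minimal sketch of that argument loop (toy function, not the real `main()`):

```python
def parse_args(args):
    """Sketch of the '-page:title' and 'do' shortcut handling."""
    action = None
    pagename = None
    for arg in args:
        if arg in ('double', 'do'):
            action = 'double'
        elif arg.startswith('-page:'):
            # Everything after the colon is the page title.
            pagename = arg[len('-page:'):]
    return action, pagename


assert parse_args(['do', '-page:Foo bar']) == ('double', 'Foo bar')
assert parse_args([]) == (None, None)
```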
--
To view, visit https://gerrit.wikimedia.org/r/173665
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: I6e7da9ba91c7eb820b10b07a77076a093a2c2b2a
Gerrit-PatchSet: 4
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Xqt <info(a)gno.de>
Gerrit-Reviewer: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Ladsgroup <ladsgroup(a)gmail.com>
Gerrit-Reviewer: Merlijn van Deen <valhallasw(a)arctus.nl>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot <>
jenkins-bot has submitted this change and it was merged.
Change subject: Bot.site property fails when set to None
......................................................................
Bot.site property fails when set to None
- Allow None value to be given to Bot.site property setter.
- Fix ReplaceBot.__init__ to not set it to None.
- Provide an explicit site when running ReplaceBot from replace.main.
Any other caller of ReplaceBot will now receive a warning from
Bot.site getter if they have not specified a site for the
bot to use.
Bug: 73494
Change-Id: Ib98334d820ec2e8ea14d17cb1346ac287ba19a59
---
M pywikibot/bot.py
M scripts/replace.py
2 files changed, 22 insertions(+), 13 deletions(-)
Approvals:
XZise: Looks good to me, approved
Martineznovo: Looks good to me, but someone else must approve
jenkins-bot: Verified
diff --git a/pywikibot/bot.py b/pywikibot/bot.py
index 1b5bfec..9fb6254 100644
--- a/pywikibot/bot.py
+++ b/pywikibot/bot.py
@@ -1070,6 +1070,10 @@
When Bot.run() is managing the generator and site property, this is
set each time a page is on a site different from the previous page.
"""
+ if not site:
+ self._site = None
+ return
+
if site not in self._sites:
log(u'LOADING SITE %s VERSION: %s'
% (site, unicode(site.version())))
diff --git a/scripts/replace.py b/scripts/replace.py
index 4a79ddc..cd9fcfb 100755
--- a/scripts/replace.py
+++ b/scripts/replace.py
@@ -164,7 +164,7 @@
constructor below.
"""
- def __init__(self, xmlFilename, xmlStart, replacements, exceptions):
+ def __init__(self, xmlFilename, xmlStart, replacements, exceptions, site):
"""Constructor."""
self.xmlFilename = xmlFilename
self.replacements = replacements
@@ -178,7 +178,10 @@
if "inside" in self.exceptions:
self.excsInside += self.exceptions['inside']
from pywikibot import xmlreader
- self.site = pywikibot.Site()
+ if site:
+ self.site = site
+ else:
+ self.site = pywikibot.Site()
dump = xmlreader.XmlDump(self.xmlFilename)
self.parser = dump.parse()
@@ -287,9 +290,8 @@
self.acceptall = acceptall
self.allowoverlap = allowoverlap
self.recursive = recursive
- self.site = site
- if self.site is None:
- self.site = pywikibot.Site()
+ if site:
+ self.site = site
if addedCat:
cat_ns = site.category_namespaces()[0]
self.addedCat = pywikibot.Page(self.site,
@@ -576,6 +578,8 @@
else:
commandline_replacements.append(arg)
+ site = pywikibot.Site()
+
if (len(commandline_replacements) % 2):
raise pywikibot.Error('require even number of replacements.')
elif (len(commandline_replacements) == 2 and fix is None):
@@ -583,7 +587,7 @@
commandline_replacements[1]))
if not summary_commandline:
edit_summary = i18n.twtranslate(
- pywikibot.Site(), 'replace-replacing',
+ site, 'replace-replacing',
{'description': ' (-%s +%s)' % (commandline_replacements[0],
commandline_replacements[1])}
)
@@ -598,7 +602,7 @@
for i in range(0, len(commandline_replacements), 2)]
replacementsDescription = '(%s)' % ', '.join(
[('-' + pair[0] + ' +' + pair[1]) for pair in pairs])
- edit_summary = i18n.twtranslate(pywikibot.Site(),
+ edit_summary = i18n.twtranslate(site,
'replace-replacing',
{'description':
replacementsDescription})
@@ -621,7 +625,7 @@
change += ' & -' + old + ' +' + new
replacements.append((old, new))
if not summary_commandline:
- default_summary_message = i18n.twtranslate(pywikibot.Site(),
+ default_summary_message = i18n.twtranslate(site,
'replace-replacing',
{'description': change})
pywikibot.output(u'The summary message will default to: %s'
@@ -645,10 +649,10 @@
regex = fix['regex']
if "msg" in fix:
if isinstance(fix['msg'], basestring):
- edit_summary = i18n.twtranslate(pywikibot.Site(),
+ edit_summary = i18n.twtranslate(site,
str(fix['msg']))
else:
- edit_summary = i18n.translate(pywikibot.Site(),
+ edit_summary = i18n.translate(site,
fix['msg'], fallback=True)
if "exceptions" in fix:
exceptions = fix['exceptions']
@@ -688,7 +692,7 @@
except NameError:
xmlStart = None
gen = XmlDumpReplacePageGenerator(xmlFilename, xmlStart,
- replacements, exceptions)
+ replacements, exceptions, site)
elif useSql:
whereClause = 'WHERE (%s)' % ' OR '.join(
["old_text RLIKE '%s'" % prepareRegexForMySQL(old_regexp.pattern)
@@ -717,8 +721,9 @@
preloadingGen = pagegenerators.PreloadingGenerator(gen)
bot = ReplaceRobot(preloadingGen, replacements, exceptions, acceptall,
- allowoverlap, recursive, add_cat, sleep, edit_summary)
- pywikibot.Site().login()
+ allowoverlap, recursive, add_cat, sleep, edit_summary,
+ site)
+ site.login()
bot.run()
# Explicitly call pywikibot.stopme().
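The bot.py hunk is a small property-setter pattern: accept a falsy value as an explicit reset instead of trying to load it as a site. A toy sketch (these are stand-in classes, not the real Bot, which also logs the site version and warns from the getter):

```python
class Bot:
    """Sketch of the site property pattern from the patch."""

    def __init__(self):
        self._site = None
        self._sites = set()

    @property
    def site(self):
        # The real Bot issues a warning here when _site is None.
        return self._site

    @site.setter
    def site(self, site):
        if not site:
            self._site = None        # allow explicit reset, as in the fix
            return
        if site not in self._sites:
            self._sites.add(site)    # stand-in for the LOADING SITE log
        self._site = site


b = Bot()
b.site = 'enwiki'
assert b.site == 'enwiki'
b.site = None                        # previously this path failed
assert b.site is None
```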
--
To view, visit https://gerrit.wikimedia.org/r/173675
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib98334d820ec2e8ea14d17cb1346ac287ba19a59
Gerrit-PatchSet: 1
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Ladsgroup <ladsgroup(a)gmail.com>
Gerrit-Reviewer: Martineznovo <martineznovo(a)gmail.com>
Gerrit-Reviewer: Merlijn van Deen <valhallasw(a)arctus.nl>
Gerrit-Reviewer: XZise <CommodoreFabianus(a)gmx.de>
Gerrit-Reviewer: jenkins-bot <>
jenkins-bot has submitted this change and it was merged.
Change subject: Email and IRC build notifications
......................................................................
Email and IRC build notifications
Always send build notifications to pywikibot-commits(a)lists.wikimedia.org
as even successes may have slightly different output, and the build
email signals that the build is completed.
Only notify IRC on changes, and reduce the IRC build notice to one
message which will wrap onto two lines.
Change-Id: I5cc6e11e6bf5ecb2da83f9b1c4d5c40ea9caa694
---
M .travis.yml
1 file changed, 8 insertions(+), 1 deletion(-)
Approvals:
John Vandenberg: Looks good to me, but someone else must approve
Merlijn van Deen: Looks good to me, approved
jenkins-bot: Verified
diff --git a/.travis.yml b/.travis.yml
index 4759d96..7bac86e 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -46,8 +46,15 @@
- LANGUAGE=fr FAMILY=wiktionary
notifications:
+ email:
+ recipients:
+ - pywikibot-commits(a)lists.wikimedia.org
+ on_success: always
+ on_failure: always
irc:
channels:
- "chat.freenode.net#pywikibot"
on_success: change
- on_failure: always
+ on_failure: change
+ template:
+ - "%{repository_slug}#%{build_number} (%{branch} - %{commit} : %{author}): %{message} %{build_url}"
--
To view, visit https://gerrit.wikimedia.org/r/152834
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: I5cc6e11e6bf5ecb2da83f9b1c4d5c40ea9caa694
Gerrit-PatchSet: 2
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Ladsgroup <ladsgroup(a)gmail.com>
Gerrit-Reviewer: Legoktm <legoktm.wikipedia(a)gmail.com>
Gerrit-Reviewer: Merlijn van Deen <valhallasw(a)arctus.nl>
Gerrit-Reviewer: jenkins-bot <>
jenkins-bot has submitted this change and it was merged.
Change subject: [FIX] Don't use deprecated parameter of redirect()
......................................................................
[FIX] Don't use deprecated parameter of redirect()
With 7aa43ba4a0c8e12bda4a62e32e062672ec8851fa the default parameter from
compat was deprecated. Because the decorator only supports deprecating
keyword arguments, all calls which add the default parameter via a
positional argument will fail.
This removes all usages of the default parameter in the code base.
Because only redirect() was used, only those method calls were fixed.
Bug: 73489
Change-Id: Ie91063b48a599b737d67a624d79532daecd23dc6
---
M scripts/redirect.py
M scripts/solve_disambiguation.py
2 files changed, 2 insertions(+), 2 deletions(-)
Approvals:
John Vandenberg: Looks good to me, approved
jenkins-bot: Verified
diff --git a/scripts/redirect.py b/scripts/redirect.py
index 5524483..c39a931 100755
--- a/scripts/redirect.py
+++ b/scripts/redirect.py
@@ -662,7 +662,7 @@
targetlink = targetPage.title(asLink=True, textlink=True)
text = self.site.redirectRegex().sub(
- '#%s %s' % (self.site.redirect(True),
+ '#%s %s' % (self.site.redirect(),
targetlink),
oldText, 1)
if redir.title() == targetPage.title() or text == oldText:
diff --git a/scripts/solve_disambiguation.py b/scripts/solve_disambiguation.py
index b32e0ef..e29b618 100644
--- a/scripts/solve_disambiguation.py
+++ b/scripts/solve_disambiguation.py
@@ -574,7 +574,7 @@
'to %s?' % (refPage.title(), target),
default=False, automatic_quit=False):
redir_text = '#%s [[%s]]' \
- % (self.mysite.redirect(default=True), target)
+ % (self.mysite.redirect(), target)
try:
refPage.put_async(redir_text, comment=self.comment)
except pywikibot.PageNotSaved as error:
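Why keyword-only deprecation decorators break positional callers can be shown in isolation. A toy version (not pywikibot's actual decorator): the wrapper can intercept a name in `**kwargs`, but a positional value passes straight through to the wrapped signature and raises `TypeError` there.

```python
import functools
import warnings


def deprecate_arg(name):
    """Toy keyword-only deprecation decorator."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if name in kwargs:
                warnings.warn('%r is deprecated' % name, DeprecationWarning)
                kwargs.pop(name)     # strip it before calling through
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecate_arg('default')
def redirect():
    return '#REDIRECT'


assert redirect() == '#REDIRECT'
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    assert redirect(default=True) == '#REDIRECT'   # kwarg: intercepted

try:
    redirect(True)                   # positional: cannot be intercepted
    raised = False
except TypeError:
    raised = True
assert raised
```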
--
To view, visit https://gerrit.wikimedia.org/r/173659
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie91063b48a599b737d67a624d79532daecd23dc6
Gerrit-PatchSet: 1
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: XZise <CommodoreFabianus(a)gmx.de>
Gerrit-Reviewer: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Ladsgroup <ladsgroup(a)gmail.com>
Gerrit-Reviewer: Merlijn van Deen <valhallasw(a)arctus.nl>
Gerrit-Reviewer: jenkins-bot <>
jenkins-bot has submitted this change and it was merged.
Change subject: [FEAT] Improved Site.sametitle
......................................................................
[FEAT] Improved Site.sametitle
This improves the Site.sametitle comparison with the following features:
- It uses (if available) the case-sensitivity option defined by the
namespace
- It collapses runs of underscores and spaces into a single space, so
'Fo__ar', 'Fo_ar' and 'Fo ar' are all treated as the same title.
- It works with servers which don't define an empty (main) namespace name.
Bug: 69118
Change-Id: I0b57ea6d7014b4ddfd8ceafbd859594b021e92b4
---
M pywikibot/site.py
M tests/site_tests.py
2 files changed, 83 insertions(+), 44 deletions(-)
Approvals:
John Vandenberg: Looks good to me, approved
jenkins-bot: Verified
diff --git a/pywikibot/site.py b/pywikibot/site.py
index 799d99c..4a139fa 100644
--- a/pywikibot/site.py
+++ b/pywikibot/site.py
@@ -376,9 +376,9 @@
# Discard leading colon
if count >= 2 and parts[0] == '' and parts[1]:
- return parts[1]
+ return parts[1].strip()
elif parts[0]:
- return parts[0]
+ return parts[0].strip()
return False
@staticmethod
@@ -806,55 +806,42 @@
re.IGNORECASE | re.UNICODE | re.DOTALL)
def sametitle(self, title1, title2):
- """Return True if title1 and title2 identify the same wiki page."""
- # title1 and title2 may be unequal but still identify the same page,
- # if they use different aliases for the same namespace
+ """
+ Return True if title1 and title2 identify the same wiki page.
- def valid_namespace(alias, ns):
- """Determine if a string is a valid alias for a namespace.
-
- @param alias: namespace alias
- @type alias: unicode
- @param ns: namespace
- @type ns: int
-
- @return: bool
- """
- for text in self.namespace(ns, all=True):
- if text.lower() == alias.lower():
- return True
- return False
+ title1 and title2 may be unequal but still identify the same page,
+ if they use different aliases for the same namespace.
+ """
+ def ns_split(title):
+ """Separate the namespace from the name."""
+ if ':' not in title:
+ title = ':' + title
+ ns, _, name = title.partition(':')
+ ns = Namespace.lookup_name(ns, self.namespaces) or default_ns
+ return ns, name
if title1 == title2:
return True
+ # Replace underscores with spaces and multiple combinations of them
+ # with only one space
+ title1 = re.sub(r'[_ ]+', ' ', title1)
+ title2 = re.sub(r'[_ ]+', ' ', title2)
+ if title1 == title2:
+ return True
+ default_ns = self.namespaces[0]
# determine whether titles contain namespace prefixes
- if ":" in title1:
- ns1, name1 = title1.split(":", 1)
- else:
- ns1, name1 = 0, title1
- if ":" in title2:
- ns2, name2 = title2.split(":", 1)
- else:
- ns2, name2 = 0, title2
- for space in self.namespaces(): # iterate over all valid namespaces
- if not isinstance(ns1, int) and valid_namespace(ns1, space):
- ns1 = space
- if not isinstance(ns2, int) and valid_namespace(ns2, space):
- ns2 = space
- if not isinstance(ns1, int):
- # no valid namespace prefix found, so the string followed by ":"
- # must be part of the title
- name1 = ns1 + ":" + name1
- ns1 = 0
- if not isinstance(ns2, int):
- name2 = ns2 + ":" + name2
- ns2 = 0
- if ns1 != ns2:
+ ns1_obj, name1 = ns_split(title1)
+ ns2_obj, name2 = ns_split(title2)
+ if ns1_obj != ns2_obj:
# pages in different namespaces
return False
- if self.case() == "first-letter":
- name1 = name1[:1].upper() + name1[1:]
- name2 = name2[:1].upper() + name2[1:]
+ name1 = name1.strip()
+ name2 = name2.strip()
+ # If the namespace has a case definition it's overriding the site's
+ # case definition
+ if (ns1_obj.case if hasattr(ns1_obj, 'case') else self.case()) == 'first-letter':
+ name1 = name1[0].upper() + name1[1:]
+ name2 = name2[0].upper() + name2[1:]
return name1 == name2
# namespace shortcuts for backwards-compatibility
diff --git a/tests/site_tests.py b/tests/site_tests.py
index 408c60b..c2b3140 100644
--- a/tests/site_tests.py
+++ b/tests/site_tests.py
@@ -144,6 +144,17 @@
self.assertFalse(mysite.isInterwikiLink("foo"))
self.assertIsInstance(mysite.redirectRegex().pattern, basestring)
self.assertIsInstance(mysite.category_on_one_line(), bool)
+ self.assertTrue(mysite.sametitle("Template:Test", "Template:Test"))
+ self.assertTrue(mysite.sametitle("Template: Test", "Template: Test"))
+ self.assertTrue(mysite.sametitle('Test name', 'Test name'))
+ self.assertFalse(mysite.sametitle('Test name', 'Test Name'))
+ # User, MediaWiki (both since 1.16) and Special are always
+ # first-letter (== only first non-namespace letter is case insensitive)
+ # See also: https://www.mediawiki.org/wiki/Manual:$wgCapitalLinks
+ self.assertTrue(mysite.sametitle("Special:Always", "Special:always"))
+ if LV(mysite.version()) >= LV('1.16'):
+ self.assertTrue(mysite.sametitle('User:Always', 'User:always'))
+ self.assertTrue(mysite.sametitle('MediaWiki:Always', 'MediaWiki:always'))
def testConstructors(self):
"""Test cases for site constructors."""
@@ -1611,6 +1622,47 @@
self.assertEqual(item.id, 'Q5296')
+class TestSameTitleSite(TestCase):
+
+ """Test APISite.sametitle on sites with known behaviour."""
+
+ sites = {
+ 'enwp': {
+ 'family': 'wikipedia',
+ 'code': 'en',
+ },
+ 'dewp': {
+ 'family': 'wikipedia',
+ 'code': 'de',
+ },
+ 'enwt': {
+ 'family': 'wiktionary',
+ 'code': 'en',
+ }
+ }
+
+ def check(self, site, case_sensitive):
+ self.assertEqual(site.sametitle('Foo', 'foo'), not case_sensitive)
+ self.assertTrue(site.sametitle('File:Foo', 'Image:Foo'))
+ self.assertTrue(site.sametitle(':Foo', 'Foo'))
+ self.assertFalse(site.sametitle('User:Foo', 'Foo'))
+
+ def test_enwp(self):
+ self.check(self.get_site('enwp'), False)
+ self.assertFalse(self.get_site('enwp').sametitle(
+ 'Template:Test template', 'Template:Test Template'))
+
+ def test_dewp(self):
+ site = self.get_site('dewp')
+ self.check(site, False)
+ self.assertTrue(site.sametitle('Benutzer:Foo', 'User:Foo'))
+ self.assertTrue(site.sametitle('Benutzerin:Foo', 'User:Foo'))
+ self.assertTrue(site.sametitle('Benutzerin:Foo', 'Benutzer:Foo'))
+
+ def test_enwt(self):
+ self.check(self.get_site('enwt'), True)
+
+
if __name__ == '__main__':
try:
unittest.main()
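The normalisation steps listed in the commit message can be sketched as a standalone helper (illustrative only; the real `sametitle()` additionally resolves namespace aliases and honours per-namespace case settings):

```python
import re


def normalize(title, first_letter=True):
    """Sketch of sametitle() normalisation: collapse runs of
    underscores/spaces, strip, and on 'first-letter' sites
    uppercase only the first character."""
    title = re.sub(r'[_ ]+', ' ', title).strip()
    if first_letter and title:
        title = title[0].upper() + title[1:]
    return title


# 'Fo__ar', 'Fo_ar' and 'Fo ar' all normalise identically.
assert normalize('Fo__ar') == normalize('Fo_ar') == normalize('Fo ar')
# first-letter sites: only the leading character is case-insensitive.
assert normalize('foo') == normalize('Foo')
# case-sensitive sites (e.g. en.wiktionary) keep the distinction.
assert normalize('foo', first_letter=False) != normalize('Foo', first_letter=False)
```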
--
To view, visit https://gerrit.wikimedia.org/r/151809
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: I0b57ea6d7014b4ddfd8ceafbd859594b021e92b4
Gerrit-PatchSet: 12
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: XZise <CommodoreFabianus(a)gmx.de>
Gerrit-Reviewer: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Ladsgroup <ladsgroup(a)gmail.com>
Gerrit-Reviewer: Merlijn van Deen <valhallasw(a)arctus.nl>
Gerrit-Reviewer: Mpaa <mpaa.wiki(a)gmail.com>
Gerrit-Reviewer: Nullzero <nullzero.free(a)gmail.com>
Gerrit-Reviewer: XZise <CommodoreFabianus(a)gmx.de>
Gerrit-Reviewer: jenkins-bot <>