[Pywikipedia-l] [ pywikipediabot-Feature Requests-1722782 ] interwiki.py should follow category redirect templates

SourceForge.net noreply at sourceforge.net
Mon Jan 12 22:08:20 UTC 2009


Feature Requests item #1722782, was opened at 2007-05-21 17:21
Message generated for change (Comment added) made by aronsson
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=603141&aid=1722782&group_id=93107

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Priority: 5
Private: No
Submitted By: Byrial Ole Jensen (byrial)
Assigned to: Nobody/Anonymous (nobody)
Summary: interwiki.py should follow category redirect templates

Initial Comment:
Sometimes when a category is moved, a template like [[en:template:Template:Category redirect]] (see its interwiki links for similar templates in other languages) is left at the old category page.

It would be good if interwiki.py could check for the presence of such a template and follow the redirect to the new category as given by the template argument.


----------------------------------------------------------------------

Comment By: Lars Aronsson (aronsson)
Date: 2009-01-12 23:08

Message:
Index: wikipedia.py
===================================================================
--- wikipedia.py	(revision 6250)
+++ wikipedia.py	(working copy)
@@ -830,6 +830,10 @@
                 self._redirarg = redirtarget
             else:
                 raise IsRedirectPage(redirtarget)
+        elif self.isCategoryRedirect(): # sets _redirarg
+            if not get_redirect:
+                self._getexception = IsRedirectPage
+                raise IsRedirectPage(self._redirarg)
         if self.section():
             # TODO: What the hell is this? Docu please.
             m = re.search("\.3D\_*(\.27\.27+)?(\.5B\.5B)?\_*%s\_*(\.5B\.5B)?(\.27\.27+)?\_*\.3D" % re.escape(self.section()), sectionencode(text,self.site().encoding()))
@@ -1078,6 +1082,122 @@
         """Return True if this is an image description page, False otherwise."""
         return self.namespace() == 6
 
+    def isCategoryRedirect(self):
+        if not self.isCategory():
+            return False
+        if not hasattr(self, "_isCategoryRedirect"):
+            if not hasattr(self.site(), "_categoryredirecttemplates"):
+                # Big table duplicated from category_redirect.py
+                # Where should this list reside? In family.py?
+                try:
+                    self._site._categoryredirecttemplates = {
+                        'wikipedia': {
+                            'ar': (u"تحويل تصنيف",
+                                   u"تحويلة تصنيف"
+                                   u"Category redirect",
+                                   u"تحويلة تصنيف"),
+                            'arz': (u'تحويل تصنيف',),
+                            'cs': (u'Zastaralá kategorie',),
+                            'da': (u'Kategoriomdirigering',),
+                            'de': (u'Kategorieweiterleitung',),
+                            'en': (u"Category redirect",
+                                   u"Category redirect3",
+                                   u"Categoryredirect",
+                                   u"Empty category",
+                                   u"CR",
+                                   u"Catredirect",
+                                   u"Cat redirect",
+                                   u"Emptycat",
+                                   u"Emptycategory",
+                                   u"Empty cat",
+                                   u"Seecat"),
+                            'es': (u'Categoría redirigida',),
+                            'eu': (u'Kategoria redirect',),
+                            'fa': (u'رده بهتر',
+                                   u'انتقال رده',
+                                   u'فیلم‌های امریکایی'),
+                            'fr': (u'Redirection de catégorie',),
+                            'hi': (u'श्रेणीअनुप्रेषित',
+                                   u'Categoryredirect'),
+                            'id': (u'Alih kategori',
+                                   u'Alihkategori'),
+                            # 'it' has removed its template
+                            # 'ja' is discussing removing this template
+                            'ja': (u"Category redirect",),
+                            'ko': (u'분류 넘겨주기',),
+                            'mk': (u'Премести категорија',),
+                            'ms': (u'Pengalihan kategori',
+                                   u'Categoryredirect',
+                                   u'Category redirect'),
+                            'mt': (u'Redirect kategorija',),
+                            # 'nl' has removed its template
+                            'no': (u"Category redirect",
+                                   u"Kategoriomdirigering",
+                                   u"Kategori-omdirigering"),
+                            'pl': (u'Przekierowanie kategorii',
+                                   u'Category redirect'),
+                            'pt': (u'Redirecionamento de categoria',
+                                   u'Redircat',
+                                   u'Redirect-categoria'),
+                            'ro': (u'Redirect categorie',),
+                            'ru': (u'Переименованная категория',
+                                   u'Categoryredirect',
+                                   u'CategoryRedirect',
+                                   u'Category redirect',
+                                   u'Catredirect'),
+                            'simple': (u"Category redirect",
+                                       u"Catredirect"),
+                            'sq': (u'Kategori e zhvendosur',
+                                   u'Category redirect'),
+                            'tl': (u'Category redirect',),
+                            'tr': (u'Kategori yönlendirme',
+                                   u'Kat redir'),
+                            'uk': (u'Categoryredirect',),
+                            'vi': (u'Đổi hướng thể loại',
+                                   u'Thể loại đổi hướng',
+                                   u'Chuyển hướng thể loại',
+                                   u'Categoryredirect',
+                                   u'Category redirect',
+                                   u'Catredirect',
+                                   u'Categoryredirect'),
+                            'yi': (u'קאטעגאריע
אריבערפירן'),
+                            'zh': (u'分类重定向',
+                                   u'Cr',
+                                   u'CR',
+                                   u'Cat-redirect'),
+                            'zh-yue': (u'Category redirect',
+                                       u'分類彈去',
+                                       u'分類跳轉'),
+                            },
+                        'commons': {
+                            'commons': (u'Category redirect',
+                                        u'Categoryredirect',
+                                        u'See cat',
+                                        u'Seecat',
+                                        u'Catredirect',
+                                        u'Cat redirect',
+                                        u'CatRed',
+                                        u'Cat-red',
+                                        u'Catredir',
+                                        u'Redirect category'),
+                            }
+                        }[self._site.family.name][self._site.lang]
+                except KeyError:
+                    self._site._categoryredirecttemplates = None
+
+            if self._site._categoryredirecttemplates is None:
+                self._isCategoryRedirect = False
+                return False
+            for (t, arg) in self.templatesWithParams():
+                if t in self.site()._categoryredirecttemplates:
+                    self._isCategoryRedirect = True
+                    # Get target
+                    self._redirarg = self._site.namespace(self._namespace) + ":" + arg[0]
+                    break
+            else:
+                self._isCategoryRedirect = False
+        return self._isCategoryRedirect
+
     def isDisambig(self):
         """Return True if this is a disambiguation page, False otherwise.
 
@@ -2958,6 +3078,8 @@
                     page2._revisionId = revisionId
                     page2._editTime = timestamp
                     section = page2.section()
+                    page2._contents = text
+
                     m = self.site.redirectRegex().match(text)
                     if m:
                         ## output(u"%s is a redirect" % page2.aslink())
@@ -2966,6 +3088,9 @@
                             redirectto = redirectto+"#"+section
                         page2._getexception = IsRedirectPage
                         page2._redirarg = redirectto
+                    elif page2.isCategoryRedirect():
+                        page2._getexception = IsRedirectPage
+                        
                     # This is used for checking deletion conflict.
                     # Use the data loading time.
                     page2._startTime = time.strftime('%Y%m%d%H%M%S', time.gmtime())
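
A side note on Python syntax for anyone maintaining a template-name table
like the one in this patch: parentheses alone do not create a tuple, and
adjacent string literals are silently concatenated. The sketch below is
standalone and illustrative only (the names are taken from the table as
examples); it shows why a one-name entry needs a trailing comma for the
"t in names" membership test to behave as an exact match.

```python
# Parentheses alone do not create a tuple; the trailing comma does.
single_str = ('Kategoriomdirigering')      # a plain str
single_tuple = ('Kategoriomdirigering',)   # a 1-tuple

# With a str, "in" is a substring test, so partial names match by accident.
substring_hit = 'Kategori' in single_str             # substring test
tuple_hit = 'Kategori' in single_tuple               # element test
exact_hit = 'Kategoriomdirigering' in single_tuple   # element test

# Adjacent string literals are concatenated, so a missing comma
# between two names merges them into one string.
merged = ('Category redirect'
          'Catredirect')
```

Here substring_hit is True while tuple_hit is False, and merged is the
single string 'Category redirectCatredirect' rather than two names.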


----------------------------------------------------------------------

Comment By: Lars Aronsson (aronsson)
Date: 2009-01-10 02:41

Message:
Thanks, I hadn't even looked in category_redirect.py. For the moment, I
just copied the list of template names to my version of wikipedia.py so all
my changes are in one file. I have updated the list with more template
names (and more synonyms).

The detection of #REDIRECT in wikipedia.py is done in two places, using
self.site.redirectRegex() both in Page._getEditPage() and GetAll.oneDone().
These are the two places I added an "elif" branch to look for category
redirects. I don't fully understand why there need to be two places to do
this test, but that's a matter of overall design. The naming of
redirectRegex() is also hardwired to the use of a single regex, which
doesn't scale to category redirects. Perhaps a refactoring would lead to
that function being renamed to isRedirect(). I think redirect detection
does belong in the Site object, since it depends on language-specific
synonyms to REDIRECT and to specific templates used for category redirects.
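
As a rough illustration of the refactoring idea above, here is a minimal
sketch of a site-level check that covers both #REDIRECT and category
redirect templates in one place. The Site class, redirect_target() and its
parameters are hypothetical and are not the pywikipedia API; the hardcoded
"Category:" prefix is also an assumption (real code would use the site's
localized namespace name).

```python
import re


class Site:
    """Hypothetical simplified site object; names are illustrative only."""

    def __init__(self, redirect_words, category_redirect_templates):
        # Language-specific synonyms of REDIRECT, e.g. ['REDIRECT', 'OMDIRIGERING'].
        words = '|'.join(re.escape(w) for w in redirect_words)
        self._redirect_re = re.compile(
            r'\s*#(?:%s)\s*:?\s*\[\[(.+?)(?:\|.*?)?\]\]' % words,
            re.IGNORECASE)
        self._cat_templates = tuple(category_redirect_templates)

    def redirect_target(self, text, templates=(), is_category=False):
        """Return the redirect target of a page, or None if it is not a redirect.

        templates is a list of (name, [params]) pairs already parsed
        from the page text.
        """
        m = self._redirect_re.match(text)
        if m:
            return m.group(1)
        if is_category:
            for name, params in templates:
                if name in self._cat_templates and params:
                    # Assumption: English namespace name, for illustration.
                    return 'Category:' + params[0]
        return None


site = Site(['REDIRECT'], ['Category redirect'])
hard = site.redirect_target('#REDIRECT [[Foo]]')
soft = site.redirect_target('text', [('Category redirect', ['Bar'])], True)
none = site.redirect_target('text', [('Category redirect', ['Bar'])], False)
```

With one entry point like this, both callers could ask the site object the
same question instead of each testing the regex and then the template.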

----------------------------------------------------------------------

Comment By: Russell Blau (russblau)
Date: 2009-01-09 14:17

Message:
category_redirect.py already contains a list of category redirect
templates, although only for a few sites.  If it is desired to use this
capability in other bots, then the template lists should probably be moved
into the family files, and an is_category_redirect() method added to the
Category object in catlib.py, or alternatively to the Page object.
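
A minimal sketch of that layout, assuming a per-family table keyed by
language code and an is_category_redirect() method on the category object.
The Family and Category classes here are simplified stand-ins written for
illustration, not the real family files or catlib.py, and the table is a
small subset.

```python
class Family:
    """Hypothetical stand-in for a family file."""

    name = 'wikipedia'
    # language code -> tuple of template names (illustrative subset)
    category_redirect_templates = {
        'en': ('Category redirect', 'Catredirect'),
        'da': ('Kategoriomdirigering',),
    }

    def category_redirects(self, lang):
        # An empty tuple means no template is known for this language.
        return self.category_redirect_templates.get(lang, ())


class Category:
    """Hypothetical minimal category page; templates are passed in pre-parsed."""

    def __init__(self, family, lang, templates):
        self.family = family
        self.lang = lang
        self.templates = templates  # list of (name, [params]) pairs

    def is_category_redirect(self):
        names = self.family.category_redirects(self.lang)
        return any(name in names for name, _params in self.templates)


danish = Category(Family(), 'da', [('Kategoriomdirigering', ['Ny kategori'])])
swedish = Category(Family(), 'sv', [('Category redirect', ['X'])])
```

Keeping the table in the family object means a language without an entry
simply never reports a category redirect, with no exception handling needed.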

----------------------------------------------------------------------

Comment By: Lars Aronsson (aronsson)
Date: 2009-01-09 12:40

Message:
I now have some code that I believe solves this. But since I'm a beginner
in Python, I'd like someone more experienced to look at my code before it
is submitted.

----------------------------------------------------------------------

Comment By: Lars Aronsson (aronsson)
Date: 2009-01-09 01:46

Message:
The previous comment was by me. I don't know why I wasn't logged in.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-01-09 01:43

Message:
Implementing this feature involves several steps.

First the template needs to be detected. This is similar to isDisambig()
in wikipedia.py. Perhaps that function should also require isCategory(), so
the template is only detected when used in category pages. Unfortunately,
there is no equivalent to the MediaWiki:Disambiguationspage to help us find
out what the template name is in each language, so we have to list the
template translations for each language. I think that should be manageable.

I propose the new function be called isCategoryRedirect(). Then this
function needs to be introduced where isRedirect() is used. Or perhaps
isRedirect() should call it? That would save a lot of work.

Are there some situations where it would be harmful to detect this
template? Should the use of the new function be configurable?
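
The detection step could look roughly like the sketch below. It is a
standalone illustration, not the bot's code: category_redirect_target() is
a hypothetical helper, the regex handles only simple, non-nested template
calls with a positional first argument, and the name normalization (first
letter case-insensitive, underscores treated as spaces) is a simplified
version of how MediaWiki compares template names.

```python
import re

# Matches {{Name|first argument ...}}; nested templates are not handled.
TEMPLATE_RE = re.compile(r'\{\{\s*([^|{}]+?)\s*\|\s*([^|{}]*?)\s*(?:\||\}\})')


def normalize(name):
    """Underscores become spaces; the first letter is upper-cased."""
    name = name.replace('_', ' ').strip()
    return name[:1].upper() + name[1:]


def category_redirect_target(wikitext, template_names):
    """Return the first argument of the first listed template, or None."""
    wanted = set(normalize(n) for n in template_names)
    for m in TEMPLATE_RE.finditer(wikitext):
        if normalize(m.group(1)) in wanted:
            return m.group(2)
    return None


names = ('Category redirect', 'Catredirect')
moved = category_redirect_target('{{ category_redirect | New name |reason=x}}', names)
plain = category_redirect_target('Some text. {{Other|x}} [[de:Kategorie:X]]', names)
```

Here moved is 'New name' and plain is None. A caller would apply this only
when the page is in the category namespace, as suggested above.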


----------------------------------------------------------------------
