I think that wikipedia.Page.getVersionHistory has a bug.
getVersionHistory returns entries without the edit summary, at least on the
Japanese Wikipedia.
The regular expression in the local variable editR is wrong.
The following version may work:
> editR = re.compile(r'<li>\([^\)]*\) \([^\)]*\) <[^>]*><[^>]*> <a href="[^\'"]*oldid=(\d*)"[^>]*>([^<]*)</a> <span class=[\'"]history-user[\'"]><a [^>]*>([^<]*)</a>(?:[^<]|<(?!span class="comment">)(?!/li>))*(?:<span class="comment">\((.*)\)</span>)?.*</li>', re.UNICODE)
I have only roughly tested this code.
Sorry for my poor English. Thank you.
----
[[w:ja:user:mizusumashi]]
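
A quick way to sanity-check the proposed pattern is to run it against a single history row. The sketch below does that in Python. The sample row is hand-written for illustration and is only an assumption about what the history page HTML looks like; it is not captured output from ja.wikipedia.org, so a match here does not prove the pattern works on the live site.

```python
import re

# The pattern proposed above, copied unchanged.
editR = re.compile(r'<li>\([^\)]*\) \([^\)]*\) <[^>]*><[^>]*> <a href="[^\'"]*oldid=(\d*)"[^>]*>([^<]*)</a> <span class=[\'"]history-user[\'"]><a [^>]*>([^<]*)</a>(?:[^<]|<(?!span class="comment">)(?!/li>))*(?:<span class="comment">\((.*)\)</span>)?.*</li>', re.UNICODE)

# Hypothetical history row (hand-written, not real server output).
sample_row = (
    '<li>(cur) (prev) <input type="radio" /><input type="radio" /> '
    '<a href="/w/index.php?title=Foo&amp;oldid=12345" title="Foo">12:00, 1 January 2009</a> '
    '<span class="history-user"><a href="/wiki/User:Bar" title="User:Bar">Bar</a></span> '
    '<span class="comment">(fixed typo)</span></li>'
)

match = editR.search(sample_row)
# Groups: revision id, timestamp text, user name, edit summary (None if absent).
fields = match.groups() if match else None
print(fields)
```

If the fourth group comes back as None on a row that does carry a summary, the comment part of the pattern is the place to look.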
I've never written a script before, and know very little about them. (I've
googled, and just got confused.)
If I want to run a heap of movepages.py commands, it seems like I should use
something like:
#!/bin/sh
python movepages.py -from:"abc (from Wikipedia)" -to:"abc" -log:movelog
python movepages.py -from:"xyz (from Wikipedia)" -to:"xyz" -log:movelog
etc
My questions:
- If I place the script in my pywikipedia directory, do I still need to
add a *cd pywikipedia/* or equivalent after the first line?
- I've seen *rundate=`date +"%FT%T%z"` *used after the cd command, but
I can't figure out what this does. Is it needed?
--
Chris Watkins (a.k.a. Chriswaterguy)
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
identi.ca/appropedia / twitter.com/appropedia
blogs.appropedia.org
I like this: five.sentenc.es
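
On the second question above: the command date +"%FT%T%z" prints the current local time as an ISO-8601-style stamp (date, a literal T, time, UTC offset), and the backticks store that text in the shell variable rundate. None of the movepages.py commands shown reads rundate, so they behave the same without it. Scripts often use such a stamp to name log files, but that is only a guess about where the snippet came from. A minimal Python sketch of the same formatting, with a made-up function name:

```python
from datetime import datetime

def run_stamp(now=None):
    """Format a time like `date +"%FT%T%z"`: YYYY-MM-DDTHH:MM:SS+hhmm."""
    if now is None:
        # astimezone() attaches the local UTC offset so %z is not empty.
        now = datetime.now().astimezone()
    return now.strftime("%Y-%m-%dT%H:%M:%S%z")

print(run_stamp())
```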
Bugs item #2605385, was opened at 2009-02-16 15:28
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2605385&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: interwiki
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Robert Ullmann (robertullmann)
Assigned to: Nobody/Anonymous (nobody)
Summary: wiktionary sort order for ms and da
Initial Comment:
It was pointed out to me that the order used for ms.wikt (default code-alpha) is incorrect, they use the order used by ms.wp: "alphabetic_revised"
Also: da.wikt wants the order to be "alphabetic"
Does anyone look at "feature requests"? Took me a while to realize they were separate from "bugs".
Pywikipedia [http] trunk/pywikipedia (r6353, Feb 14 2009, 14:36:27)
Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)]
----------------------------------------------------------------------
I need to move a bunch of pages, removing the last part of the title.
(Detail: I've edited the XML file of imported pages, to add "(from
Wikipedia)" on the end of each article's name. This helps avoid clashes
between articles of the same name in the imported pages and the existing
wiki pages. By leaving the redirects in place, it helps avoid importing each
article more than once, when I import more groups of pages in future.)
Now, I can use a spreadsheet to make up a list of commands, one line for
each page, of the form:
python movepages.py -from:"foo (from Wikipedia)" -to:"foo" "file:pagelist.txt"
and that will work... but is there a more efficient way of renaming these
pages?
Thanks!
--
Chris Watkins (a.k.a. Chriswaterguy)
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
identi.ca/appropedia / twitter.com/appropedia
blogs.appropedia.org
I like this: five.sentenc.es
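
One way to avoid building each command in a spreadsheet is to generate the commands from the list of titles. The sketch below is an illustration under stated assumptions, not a movepages.py feature: it assumes a plain-text file with one suffixed title per line (pagelist.txt is reused here only as a hypothetical name), it uses only the -from: and -to: options already shown above, and the helper names are made up for this example. The generator itself is written for Python 3.

```python
import subprocess

SUFFIX = " (from Wikipedia)"

def build_move_command(title):
    """Return the movepages.py argument list for one suffixed title."""
    if not title.endswith(SUFFIX):
        raise ValueError("title lacks the expected suffix: %r" % title)
    return ["python", "movepages.py",
            "-from:" + title, "-to:" + title[:-len(SUFFIX)]]

def move_all(list_path, dry_run=True):
    """Build (and optionally run) one command per non-empty line of list_path."""
    with open(list_path, encoding="utf-8") as handle:
        titles = [line.strip() for line in handle if line.strip()]
    commands = [build_move_command(title) for title in titles]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # No shell is involved, so spaces in titles need no extra quoting.
            subprocess.check_call(command)
    return commands
```

Calling move_all("pagelist.txt") only prints the commands. Passing dry_run=False would run them, and would need to be started from the pywikipedia directory so that movepages.py is found.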
Bugs item #2602058, was opened at 2009-02-15 08:48
Message generated for change (Comment added) made by multichill
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2602058&group_…
Category: interwiki
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Lars Aronsson (aronsson)
Assigned to: Nobody/Anonymous (nobody)
Summary: Interwiki link sorting for Swedish Wikipedia
Initial Comment:
Please modify families/wikipedia_family.py (near line 910) so that interwiki links on the Swedish Wikipedia (sv.wikipedia.org) are sorted by language name (self.interwiki_putfirst = {... 'sv': self.alphabetic, ...}, as for en.wikipedia.org and no.wikipedia.org) rather than by language code (which is the default).
----------------------------------------------------------------------
>Comment By: Multichill (multichill)
Date: 2009-02-15 12:17
Message:
Do you have community consensus on this?
----------------------------------------------------------------------
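
The difference between the two sort orders in this request can be seen in a small standalone sketch. The language table below is a hand-picked illustration, not the list that wikipedia_family.py actually uses, and the variable names are hypothetical.

```python
# Illustrative subset only: language code -> native language name.
language_names = {
    "de": "Deutsch",
    "fi": "suomi",
    "nl": "Nederlands",
    "sv": "svenska",
}

# Default behaviour described in the request: sort by language code.
by_code = sorted(language_names)

# Requested behaviour: sort by language name (case-insensitively here).
by_name = sorted(language_names, key=lambda code: language_names[code].lower())

print(by_code)
print(by_name)
```

In this subset the two orders differ only in where fi and nl land, which is the kind of visible change a community would be asked to agree on.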
Revision: 5914
Author: wikipedian
Date: 2008-09-22 16:15:37 +0000 (Mon, 22 Sep 2008)
Log Message:
-----------
Fixed the Esperanto X-convention bug [ 2006208 ] by rolling back many changes that
concerned Esperanto X-conv.
I fixed this on 2008-08-21 already, but somehow my commit seems to have failed
(sorry), so now I retry to commit it.
Modified Paths:
--------------
trunk/pywikipedia/families/wikipedia_family.py
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/families/wikipedia_family.py
===================================================================
--- trunk/pywikipedia/families/wikipedia_family.py 2008-09-22 09:58:41 UTC (rev 5913)
+++ trunk/pywikipedia/families/wikipedia_family.py 2008-09-22 16:15:37 UTC (rev 5914)
@@ -966,16 +966,4 @@
return self.code2encoding(code),
def shared_image_repository(self, code):
- return ('commons', 'commons')
-
- def post_get_convert(self, site, getText):
- if site.lang == 'eo':
- return wikipedia.decodeEsperantoX(getText)
- else:
- return getText
-
- def pre_put_convert(self, site, getText):
- if site.lang == 'eo':
- return wikipedia.encodeEsperantoX(getText)
- else:
- return getText
+ return ('commons', 'commons')
\ No newline at end of file
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2008-09-22 09:58:41 UTC (rev 5913)
+++ trunk/pywikipedia/wikipedia.py 2008-09-22 16:15:37 UTC (rev 5914)
@@ -799,12 +799,12 @@
else:
self._isWatched = False
# Now process the contents of the textarea
- # Unescape HTML characters, strip whitespace and postconvert
- pagetext = text[i1:i2]
- pagetext = unescape(pagetext)
- pagetext = pagetext.rstrip()
- pagetext = self.site().post_get_convert(pagetext)
-
+ # Unescape HTML characters, strip whitespace
+ pagetext = text[i1:i2]
+ pagetext = unescape(pagetext)
+ pagetext = pagetext.rstrip()
+ if self.site().lang == 'eo':
+ pagetext = decodeEsperantoX(pagetext)
m = self.site().redirectRegex().match(pagetext)
if m:
# page text matches the redirect pattern
@@ -1295,7 +1295,12 @@
import watchlist
watchArticle = watchlist.isWatched(self.title(), site = self.site())
newPage = not self.exists()
- newtext = self.site().pre_put_convert(newtext)
+ # if posting to an Esperanto wiki, we must e.g. write Bordeauxx instead
+ # of Bordeaux
+ if self.site().lang == 'eo':
+ newtext = encodeEsperantoX(newtext)
+ comment = encodeEsperantoX(comment)
+
return self._putPage(newtext, comment, watchArticle, minorEdit,
newPage, self.site().getToken(sysop = sysop), sysop = sysop)
@@ -2237,7 +2242,7 @@
reason = input(u'Please enter a reason for the deletion:')
answer = 'y'
if prompt and not hasattr(self.site(), '_noDeletePrompt'):
- answer = inputChoice(u'Do you want to delete %s?' % self.aslink(forceInterwiki = True), ['Yes', 'No', 'All'], ['Y', 'N', 'A'], 'N')
+ answer = inputChoice(u'Do you want to delete %s?' % self.aslink(forceInterwiki = True), ['yes', 'no', 'all'], ['y', 'N', 'a'], 'N')
if answer == 'a':
answer = 'y'
self.site()._noDeletePrompt = True
@@ -2939,6 +2944,9 @@
def getData(self):
address = self.site.export_address()
pagenames = [page.sectionFreeTitle() for page in self.pages]
+ # We need to use X convention for requested page titles.
+ if self.site.lang == 'eo':
+ pagenames = [encodeEsperantoX(pagetitle) for pagetitle in pagenames]
pagenames = u'\r\n'.join(pagenames)
if type(pagenames) is not unicode:
output(u'Warning: xmlreader.WikipediaXMLHandler.getData() got non-unicode page names. Please report this.')
@@ -3995,11 +4003,6 @@
linktrail: Return regex for trailing chars displayed as part of a link.
disambcategory: Category in which disambiguation pages are listed.
- post_get_convert: Converts text data from the site immediatly after get
- i.e. EsperantoX -> unicode
- pre_put_convert: Converts text data from the site immediatly before put
- i.e. unicode -> EsperantoX
-
Methods that yield Page objects derived from a wiki's Special: pages
(note, some methods yield other information in a tuple along with the
Pages; see method docs for details) --
@@ -5840,12 +5843,6 @@
"""Return regex for trailing chars displayed as part of a link."""
return self.family.linktrail(self.lang)
- def post_get_convert(self, getText):
- return self.family.post_get_convert(self, getText)
-
- def pre_put_convert(self, putText):
- return self.family.pre_put_convert(self, putText)
-
def language(self):
"""Return Site's language code."""
return self.lang
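
The comment added in this commit ("write Bordeauxx instead of Bordeaux") can be illustrated with a simplified sketch. It assumes that the Esperanto wiki reads an x after c, g, h, j, s or u as a diacritic marker and a doubled x as a literal x. The function below is a hypothetical stand-in written for this example; it is not the encodeEsperantoX in wikipedia.py, which may handle more cases.

```python
import re

# A letter that can take a diacritic, followed by one or more x's.
_X_RUN = re.compile(r'([cghjsuCGHJSU])([xX]+)')

def escape_x_sketch(text):
    """Double every x that directly follows c, g, h, j, s or u."""
    def double_run(match):
        letter, run = match.group(1), match.group(2)
        return letter + ''.join(ch * 2 for ch in run)
    return _X_RUN.sub(double_run, text)

print(escape_x_sketch(u'Bordeaux'))
```

Text without such letter-plus-x sequences passes through unchanged, which is why only Esperanto sites need the extra step.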