Patches item #3147585, was opened at 2010-12-29 05:55
Message generated for change (Comment added) made by valhallasw
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=3147585&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 6
Private: No
Submitted By: DixonD (dixond)
Assigned to: xqt (xqt)
Summary: Working with interwikis on subpage
Initial Comment:
I think I have implemented my feature request (ID 3146291).
Please review my code carefully; to be honest, I don't have much experience with Python.
A few points:
1) When adding or fixing interwikis on a subpage, I think we should remove any interwikis in the template itself.
2) It seems that the logic for determining whether a page has subpages, getting interwikis from included subpages, etc. should be decoupled from interwiki.py and moved to wikipedia.py.
3) Not really related, but cosmetic_changes.py removes interwikis from subpages.
----------------------------------------------------------------------
>Comment By: Merlijn S. van Deen (valhallasw)
Date: 2012-03-21 11:05
Message:
Housekeeper's note: the patch does not apply cleanly to r10035
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2012-03-21 10:00
Message:
Sorry for the very... slow response. The problem is that interwiki.py is
complicated (as you have noticed), and it's a somewhat complicated patch.
To be honest: I'm not even sure how much interwiki.py does in terms of
templates, subpages... and as such I'm scared as hell to even touch the
script, considering it sort of works at the moment.
----------------------------------------------------------------------
Comment By: DixonD (dixond)
Date: 2011-03-30 02:37
Message:
Any news?
----------------------------------------------------------------------
Comment By: DixonD (dixond)
Date: 2010-12-30 04:46
Message:
Uploaded new patch with some fixes.
----------------------------------------------------------------------
Patches item #3108310, was opened at 2010-11-12 21:42
Message generated for change (Comment added) made by valhallasw
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=3108310&group_…
Category: None
Group: None
Status: Open
Resolution: Accepted
Priority: 5
Private: No
Submitted By: lankier (lankier)
Assigned to: Nobody/Anonymous (nobody)
Summary: parameter expandtemplates for Page.linkedPages
Initial Comment:
Added parameter expandtemplates for Page.linkedPages. I think it is a useful parameter.
----------------------------------------------------------------------
>Comment By: Merlijn S. van Deen (valhallasw)
Date: 2012-03-21 11:04
Message:
housekeeper's note: patch applies cleanly to r10035.
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2012-03-21 09:53
Message:
I think so too, but I'm not sure if 'expandtemplates' is the clearest term
to use for this. Maybe 'includetranscluded'?
In any case, there should be some documentation on the parameter added.
Could you do that? Thanks!
(unfortunately there is no option for 'accepted, but waiting for updated
patch...')
----------------------------------------------------------------------
Patches item #3092870, was opened at 2010-10-22 03:26
Message generated for change (Comment added) made by valhallasw
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=3092870&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: lankier (lankier)
Assigned to: xqt (xqt)
Summary: non ascii in system messages and max retry
Initial Comment:
This patch fixes two issues:
1. Ubuntu has non-ASCII characters in system messages.
Test:
$ sudo ifconfig eth0 down
$ cat test.py
import wikipedia
site = wikipedia.getSite()
page = wikipedia.Page(site, 'S')
text = page.get()
$ LANG=ru_RU.utf8 python test.py
Error downloading data: 'ascii' codec can't decode byte 0xd0 in position 27: ordinal not in range(128)
Request ru:/w/api.php?inprop=protection%7Ctalkid%7Csubjectid%7Curl%7Creadable&format=json&rvprop=content%7Cids%7Cflags%7Ctimestamp%7Cuser%7Ccomment%7Csize&prop=revisions%7Cinfo&titles=S&rvlimit=1&action=query
Retrying in 1 minutes...
^C
After fix (added "e = unicode(str(e), locale.getpreferredencoding())"):
$ LANG=ru_RU.utf8 python test.py
<urlopen error [Errno 101] Сеть недоступна>
WARNING: Could not open [...]
2. MaxTriesExceededError is now raised when the maximum number of tries is exceeded.
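A minimal sketch of the second point, written for Python 3 and not taken from the patch: a retry loop that raises once the attempts are used up. The exception class here is a local stand-in, and fetch_with_retries and its parameters are hypothetical names chosen for illustration.

```python
import time


class MaxTriesExceededError(Exception):
    """Local stand-in; raised when every attempt has failed."""


def fetch_with_retries(fetch, max_tries=3, delay=0.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_tries attempts have failed."""
    last_error = None
    for attempt in range(1, max_tries + 1):
        try:
            return fetch()
        except IOError as error:
            last_error = error
            # Wait before the next attempt, but not after the final one.
            if attempt < max_tries:
                sleep(delay)
    raise MaxTriesExceededError(
        "gave up after %d tries: %s" % (max_tries, last_error))
```

Raising instead of retrying forever lets the calling script decide whether to skip the page or stop.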
----------------------------------------------------------------------
>Comment By: Merlijn S. van Deen (valhallasw)
Date: 2012-03-21 11:03
Message:
housekeeper's note: the patch does not apply cleanly to r10035
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2012-03-21 09:49
Message:
I think we should either
a) skip the entire output() machinery and use traceback.print_exc()
instead
or
b) write a wrapper that does what you propose here (but which can also
be used for traceback.format_exc).
and replace all exception printing with one of those two options.
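Option (b) could look roughly like the sketch below. It is written for Python 3, where traceback.format_exc() already returns text, so the decoding branch only matters for byte strings; to_text and format_current_exception are hypothetical names, not existing framework functions.

```python
import locale
import traceback


def to_text(value, encoding=None):
    """Return value as text, decoding bytes with the locale's encoding."""
    if isinstance(value, bytes):
        encoding = encoding or locale.getpreferredencoding()
        # "replace" keeps error reporting from raising a second error.
        return value.decode(encoding, "replace")
    return str(value)


def format_current_exception():
    """Return the traceback of the exception being handled, as text."""
    return to_text(traceback.format_exc())
```

Every place that prints an exception would then call one of these two helpers instead of formatting the exception itself.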
----------------------------------------------------------------------
Comment By: lankier (lankier)
Date: 2010-11-07 12:44
Message:
We can't fix it in output() because the exception is raised before we
enter output().
What about just replacing output(u'%s' %e) with output(str(e))? It works.
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2010-11-07 08:52
Message:
The output should be fixed in the output method. Would you please check the
following fix in the output method:
def output(...):
    ...
    try:
        text = unicode(text, 'utf-8')
    except UnicodeDecodeError:
        text = unicode(text, 'iso8859-1')
replace it with
    try:
        text = unicode(text, 'utf-8')
    except UnicodeDecodeError:
        text = unicode(text, locale.getpreferredencoding())
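The same fallback idea, as a self-contained Python 3 sketch (the thread's code is Python 2). decode_with_fallback is a hypothetical helper, and the final "replace" step is an addition of this sketch, not part of the proposed fix.

```python
import locale


def decode_with_fallback(raw, encodings=None):
    """Decode bytes by trying UTF-8 first, then the locale's encoding."""
    if encodings is None:
        encodings = ("utf-8", locale.getpreferredencoding())
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: never fail, mark undecodable bytes instead.
    return raw.decode("utf-8", "replace")
```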
----------------------------------------------------------------------
Patches item #3017517, was opened at 2010-06-17 02:38
Message generated for change (Comment added) made by valhallasw
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=3017517&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: BalaSundaraRaman L (lsundar)
Assigned to: Nobody/Anonymous (nobody)
Summary: cosmetic_changes.py to remove bad wikilinks
Initial Comment:
Translated articles created using http://translate.google.com/toolkit?hl=en suffer from one complex issue: the tool creates links to impossible pages in the target wiki. Take the example below:
( Excerpt from http://en.wikipedia.org/wiki/Corporate_governance )
A related but separate thread of discussions focuses on the impact of a corporate governance system in [[economic efficiency]], with a strong emphasis on shareholders' welfare.
When translated to Tamil, for example, this will have a single word for "in economic efficiency", and the tool wrongly links that whole word. Since an article title can't be of the form "in economic efficiency", it'll remain a red link forever. Since articles are littered with such red links, they are hard to read.
In view of the large-scale Google translation project ( http://wikimania2010.wikimedia.org/wiki/Submissions/Google_translation ) and the problems we faced ( http://wikimania2010.wikimedia.org/wiki/Submissions/A_Review_of_Google_Tran… ), I've developed a patch for cosmetic_changes.py which removes red links of the form [[some phrase]], leaving alone cases where the label is different from the target. I've attached the patch as well. The changes made by my bot running the modified code are at http://ta.wikipedia.org/wiki/Special:Contributions/SundarBot
If approved, I can give it to a dedicated bot operator with the translation team.
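The unlinking rule described above can be sketched as follows. This is an illustration in Python 3, not the attached patch: unlink_red_links is a hypothetical name, and page_exists is a caller-supplied function standing in for whatever existence check the bot uses.

```python
import re

# Matches [[target]] only. Links with a label ([[target|label]]) contain "|"
# and links with a section contain "#", so neither is matched.
PLAIN_LINK = re.compile(r"\[\[([^\[\]|#]+)\]\]")


def unlink_red_links(text, page_exists):
    """Replace [[phrase]] by phrase when page_exists(phrase) is false."""
    def replace(match):
        title = match.group(1)
        if page_exists(title.strip()):
            return match.group(0)   # blue link: keep it
        return title                # red link: keep only the text
    return PLAIN_LINK.sub(replace, text)
```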
----------------------------------------------------------------------
>Comment By: Merlijn S. van Deen (valhallasw)
Date: 2012-03-21 11:02
Message:
housekeeper's note: the patch does not apply cleanly to r10035
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2012-03-21 09:45
Message:
Shouldn't this be in fixes.py?
In any case, I'm having some trouble understanding the goal of the added
parameters in this patch.
Last but not least, what is the /generic/ use for this? The problem you're
solving (which is, as I understand it, unlinking all links, or all red
links) sounds awfully specific for tawiki, so I'm not sure if adding this
to pwb is useful.
----------------------------------------------------------------------
Comment By: BalaSundaraRaman L (lsundar)
Date: 2010-06-17 23:00
Message:
The changes will be visible when run in the following manner:
python cosmetic_changes.py -fewerlinks -keepblue -file:listofarticles.txt
For a diff of its changes, please check http://is.gd/cTJWr
----------------------------------------------------------------------
Patches item #3007742, was opened at 2010-05-26 20:19
Message generated for change (Comment added) made by valhallasw
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=3007742&group_…
Category: rewrite
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Pyr0 ()
Assigned to: Nobody/Anonymous (nobody)
Summary: rvdiffto parameter implementation
Initial Comment:
The framework has no function for loading the diff text between revisions. Here is one:
Changelog:
Modified site.loadrevisions() method to support rvdiffto parameter.
Added a Page.Revision.Diff class for storing the diff text and revto id.
Modified api.update_page() to save the new diff information.
A method in Page.py for getting diffs, the way you get a revision now, is still missing, but for now you can get the diff text directly from page._revision[id].diff.text.
----------------------------------------------------------------------
>Comment By: Merlijn S. van Deen (valhallasw)
Date: 2012-03-21 11:01
Message:
housekeeper's note: the patch does not apply cleanly to r10035
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2012-03-21 09:38
Message:
Sorry for the slow slow slow sloooow response.
I think diffs are a useful thing to have, but I'm not quite sure what the
goal here is - what is the advantage of using rvdiffto instead of getting
both revisions and comparing them with a python diff function?
I can see the use case for, for instance, an antivandalism bot, but I'm not
quite sure how you would use it with this.
Then on the implementation - I can imagine it makes sense to store diffs
for a certain revision, but I'd expect, for instance, a dict with revid's
such that
    page, revid=10001
    diffs = {10000: <diff object between 10000 and 10001>,
             9000: <diff object between 9000 and 10001>}
and storing e.g. revision.prev = 10000.
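A rough Python 3 sketch of that layout, combined with the "compare both revisions with a Python diff function" alternative mentioned above. The Revision class here is hypothetical and is not the framework's Page.Revision; the diff is computed locally with difflib instead of being requested through rvdiffto.

```python
import difflib


class Revision:
    """Hypothetical revision record holding diffs keyed by the other revid."""

    def __init__(self, revid, text, prev=None):
        self.revid = revid
        self.text = text
        self.prev = prev      # revid of the preceding revision, if known
        self.diffs = {}       # other revid -> unified diff up to this revision

    def diff_from(self, other):
        """Compute, cache, and return the diff from `other` to this revision."""
        if other.revid not in self.diffs:
            lines = difflib.unified_diff(
                other.text.splitlines(), self.text.splitlines(),
                fromfile="r%d" % other.revid, tofile="r%d" % self.revid,
                lineterm="")
            self.diffs[other.revid] = "\n".join(lines)
        return self.diffs[other.revid]
```

Keying the dict by revid allows several diffs against the same revision to be stored side by side.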
----------------------------------------------------------------------
Patches item #2790445, was opened at 2009-05-11 21:30
Message generated for change (Comment added) made by valhallasw
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2790445&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: sigmaoctantis (sigmaoctantis)
Assigned to: NicDumZ — Nicolas Dumazet (nicdumz)
Summary: Re 1843798: Add capability to remember pages to replace.py
Initial Comment:
A new patch to implement toobaz's function with the changes suggested by wikipedian.
https://sourceforge.net/tracker/?func=detail&aid=1843798&group_id=93107&ati…
- solve_disambiguation.py and pagegenerators.py:
1. Generator and logging function for -primary option moved
from solve_disambiguation.py to pagegenerators.py
2. TODO in solve_disambiguation.py done:
generator now starts yielding before all referring pages have been found
3. makes use of new TextfilePageGenerator
4. code is a few lines shorter
- replace.py:
5. "-exclude" option from toobaz's patch implemented.
Allows filtering the generator through a list of previously edited pages.
New pages are appended to the filter file based on choices made:
-exclude: logs to filter choice "N"
6. additional command line options for other settings:
-editonce: logs to filter choices "Y", "A"
-treatonce: logs to filter choices "Y", "A", "N"
-scanonce: logs to filter choices "Y", "A", "N"; no change
7. uses generator and file format from solve_disambiguation.py
(suggested by wikipedian below)
8. default filter filename is the name of the fix. Files are placed
in a subdirectory "replace".
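The option-to-choice mapping from items 5 and 6 can be written down as a small table. This is a Python 3 sketch for illustration, not code from the patch; LOGGED_CHOICES, NO_CHANGE, and should_log are hypothetical names, and NO_CHANGE reflects one reading of "no change" in the -scanonce line (pages where the replacement matched nothing).

```python
# Marker for a page that was fetched but where nothing was replaced
# (assumed meaning of "no change" in the -scanonce description).
NO_CHANGE = "no change"

# Choices that cause a page to be appended to the filter file, per option.
LOGGED_CHOICES = {
    "-exclude": {"N"},
    "-editonce": {"Y", "A"},
    "-treatonce": {"Y", "A", "N"},
    "-scanonce": {"Y", "A", "N", NO_CHANGE},
}


def should_log(option, choice):
    """Return True if `choice` under `option` adds the page to the filter."""
    return choice in LOGGED_CHOICES.get(option, set())
```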
----------------------------------------------------------------------
>Comment By: Merlijn S. van Deen (valhallasw)
Date: 2012-03-21 10:59
Message:
Nicdumz, do you have the time to work on this? It's been stale for.... a
while.
Sigmaoctantis, sorry for the very slow uptake. It's a general problem for
most patches that are larger than the 'glance over it, looks ok, commit'
language updates. I'll see if I can find the time to review it.
In any case, the patch does not apply cleanly currently, so it needs some
more fiddling.
----------------------------------------------------------------------
Comment By: siebrand (siebrand)
Date: 2009-10-02 02:49
Message:
Assigning to nicdumz for processing.
----------------------------------------------------------------------
Comment By: sigmaoctantis (sigmaoctantis)
Date: 2009-05-12 06:25
Message:
Thanks for the quick review. I have tried to address the
various points and have included a new version of the patch.
a. I added a bit more text to the source and reformatted
part of the code, but I didn't want to change existing
code more than needed.
b. generator:
- checks if the filter file exists
- reads it
- runs the next generator and skips pages in memory
Previously, it first ran the next generator and then deleted
from its result the pages that were in the filter file.
c. replace.py command line options
I added several command line options to define which
pages should be skipped the next time. One could edit
replace.py directly, but it seemed cleaner to provide
all options at command line level.
toobaz excluded pages where a replacement was manually
rejected ("N"). The option "-exclude" will keep this
functionality.
Personally, I find it more useful to filter pages that
were edited in a previous run. This prevents the bot from
repeating the same edit later, after someone has reverted
a previous edit. Option "-editonce" provides this.
"-treatonce" combines the two.
"-scanonce" prevents the bot from re-fetching the same page
in a 2nd run, even if the regex didn't match it in
the first run. (I fixed an omission for "skipped" in
the second patch.)
Without the different options, the additions to replace.py
would be much shorter ..
d. I had to insert several "break" statements in replace.py to
ensure that nothing but "N" gets to the stage confusingly
labeled "choice must be 'N'" in the code.
e. FilterFileAppend is based on the function from
solve_disambiguation. The advantage of writing each
page to the file is that it won't miss one if it's
interrupted or crashes. This mode from
solve_disambiguation remains unchanged.
f. The same goes for the file format. Up to now, I haven't
had any problems with it, and it worked OK with the
title "臺灣Taiwan&āàäà", which I just tested. urlname was also
used by PrimaryIgnoreManager. For backward compatibility,
maybe it should be kept.
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-05-12 03:47
Message:
Wow, that's a big patch =)
* codecs is fine with me
* can you avoid lines > 80 characters? I know that this is not something we
do everywhere, but that's bad looking code. Same goes for if foo: bar.
Please skip a line.
* can you document thoroughly what's being done? parameters in the
generators? In replace.py ? I find it really hard to understand the
"choice" table in the docstring explaining -scanonce & others.
* What's this:
+ f = codecs.open(filename, 'r', 'utf-8')
+ f.close()
??
I am also not convinced by the fact that after each page, FilterFileAppend
is called, and #1 path is computed, #2 a file is opened, written in, and
closed.
I'm thinking that a possible cleaner way to do this would be to have a
Filter object: put everything you need in it (an opened file descriptor, a
list of titles to ignore if you need to use this, etc...) and keep a
reference to it from the replace & disambig bots. How does that sound to
you?
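One possible shape for such a Filter object, as a Python 3 sketch only. PageFilter and its methods are hypothetical names, not part of the patch or the framework. The file is read once, kept open for appending, and flushed after every title so that an interrupted run loses nothing.

```python
import io
import os


class PageFilter:
    """Remembers page titles in a UTF-8 text file, one title per line."""

    def __init__(self, filename):
        self.seen = set()
        if os.path.exists(filename):
            with io.open(filename, "r", encoding="utf-8") as f:
                self.seen = {line.strip() for line in f if line.strip()}
        # Opened once and reused for every add().
        self._file = io.open(filename, "a", encoding="utf-8")

    def __contains__(self, title):
        return title in self.seen

    def add(self, title):
        """Record a title in memory and on disk, skipping known titles."""
        if title not in self.seen:
            self.seen.add(title)
            self._file.write(title + "\n")
            self._file.flush()

    def filter(self, titles):
        """Yield only the titles that have not been recorded yet."""
        for title in titles:
            if title not in self.seen:
                yield title

    def close(self):
        self._file.close()
```

Both the replace and disambiguation bots could hold a reference to one such object and call add() after each page.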
I also know that Daniel wanted first to keep the same file format, but... a
couple of things are wrong here:
* if you output titles with page.urlname() it will not be possible to read
the file with TextfilePageGenerator afaik. Think of special characters,
being url encoded, and not decoded.
* if you want to use a Page title for a filename, you want
Page.titleforFilename, not Page.urlname
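To illustrate the encoding concern, here is a Python 3 sketch using urllib's percent-encoding as a stand-in for what a URL-style title looks like. It does not call Page.urlname() or TextfilePageGenerator, and to_url_form and from_url_form are hypothetical helpers.

```python
from urllib.parse import quote, unquote


def to_url_form(title):
    """Percent-encode a title the way a URL would carry it."""
    return quote(title.replace(" ", "_"), safe="")


def from_url_form(encoded):
    """Undo to_url_form; needed before the text can be used as a title."""
    return unquote(encoded).replace("_", " ")


title = "臺灣Taiwan&āàäà"
encoded = to_url_form(title)
# The stored form differs from the title and must be decoded on reading.
assert encoded != title
assert from_url_form(encoded) == title
```

A reader that treats each line as a plain title would see the percent-encoded string, not the original title, unless it decodes first.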
Thank you!
----------------------------------------------------------------------
Patches item #3479070, was opened at 2012-01-24 10:34
Message generated for change (Comment added) made by valhallasw
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=3479070&group_…
Category: None
Group: None
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Hannes Röst (hroest)
Assigned to: Nobody/Anonymous (nobody)
Summary: Template parsing / general wikitext parsing
Initial Comment:
Improved support for wikitext parsing, especially template parsing.
Tests are enclosed:
nosetests -w tests --tests=test_textrange_parser.py,test_templateparser.py
----------------------------------------------------------------------
>Comment By: Merlijn S. van Deen (valhallasw)
Date: 2012-03-21 10:56
Message:
Closing because Hannes now has commit access.
----------------------------------------------------------------------
Patches item #3507199, was opened at 2012-03-17 13:12
Message generated for change (Comment added) made by valhallasw
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=3507199&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: JAn (jandudik)
Assigned to: Nobody/Anonymous (nobody)
Summary: noreferences at hsb and dsb
Initial Comment:
I made a patch for this script and these two languages.
----------------------------------------------------------------------
>Comment By: Merlijn S. van Deen (valhallasw)
Date: 2012-03-21 10:49
Message:
Please test your patches before submitting; there were some missing commas.
The corrected version for hsb and dsb (so not the sk change!) has been
committed in r10035 [ http://toolserver.org/~pywikipedia/r10035 ].
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2012-03-21 10:45
Message:
Is the change for skwiki on purpose?
----------------------------------------------------------------------