Revision: 5301
Author: nicdumz
Date: 2008-05-03 13:05:23 +0000 (Sat, 03 May 2008)
Log Message:
-----------
Reminder:
Do not commit a fix for something that was reported broken while you are working on some other stuff.
Modified Paths:
--------------
trunk/pywikipedia/reflinks.py
Modified: trunk/pywikipedia/reflinks.py
===================================================================
--- trunk/pywikipedia/reflinks.py 2008-05-03 12:48:24 UTC (rev 5300)
+++ trunk/pywikipedia/reflinks.py 2008-05-03 13:05:23 UTC (rev 5301)
@@ -90,7 +90,8 @@
ur'^\[\]\s<>"]+\([^\[\]\s<>"]+[^\[\]\s\.:;\\,<>\?"]+|'+
# unbracketed without ()
ur'[^\[\]\s<>"]+[^\[\]\s\)\.:;\\,<>\?"]+|[^\[\]\s<>"]+))[!?,\s]*\]?\s*</ref>')
-listof404pages = 'http://www.twoevils.org/files/wikipedia/404-links.txt'
+#http://www.twoevils.org/files/wikipedia/404-links.txt.gz
+listof404pages = '404-links.txt'
class XmlDumpPageGenerator:
def __init__(self, xmlFilename, xmlStart, namespaces):
@@ -286,7 +287,11 @@
Runs the Bot
"""
wikipedia.setAction(wikipedia.translate(self.site, msg))
- deadLinks = codecs.open(listof404pages, 'r', 'latin_1').read()
+ try:
+ deadLinks = codecs.open(listof404pages, 'r', 'latin_1').read()
+ except IOError:
+ wikipedia.output('You need to download http://www.twoevils.org/files/wikipedia/404-links.txt.gz and to ungzip it in the same directory')
+ raise
socket.setdefaulttimeout(30)
editedpages = 0
for page in self.generator:
@@ -322,7 +327,7 @@
headers = f.info()
contentType = headers.getheader('Content-Type')
if contentType and not self.MIME.search(contentType):
- if ref.link.lower().endswith('.pdf') and not ignorepdf:
+ if ref.link.lower().endswith('.pdf') and not self.ignorepdf:
# If file has a PDF suffix
self.getPDFTitle(ref, f)
else:
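The listof404pages change above means the list must now be present locally. As a minimal sketch, not part of the commit, this is one way the downloaded archive could be unpacked and read with the standard library. The function names ungzip and load_dead_links are hypothetical; only the file name and the latin_1 encoding come from the diff.

```python
import gzip
import io
import shutil


def ungzip(src_path, dst_path):
    """Decompress src_path (a .gz file) into dst_path."""
    with gzip.open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        shutil.copyfileobj(src, dst)


def load_dead_links(path='404-links.txt'):
    """Return the dead-links list decoded as latin_1.

    Prints a hint and re-raises when the file is missing, similar in
    spirit to the try/except added to run() in the diff.
    """
    try:
        with io.open(path, 'r', encoding='latin_1') as f:
            return f.read()
    except IOError:
        print('Download 404-links.txt.gz and ungzip it to %s first' % path)
        raise
```

Re-raising after the hint keeps the original traceback visible, which is the same choice the commit makes.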
Revision: 5299
Author: nicdumz
Date: 2008-05-03 07:59:59 +0000 (Sat, 03 May 2008)
Log Message:
-----------
some reflinks.py minor doc
Modified Paths:
--------------
trunk/pywikipedia/reflinks.py
Modified: trunk/pywikipedia/reflinks.py
===================================================================
--- trunk/pywikipedia/reflinks.py 2008-05-03 07:57:27 UTC (rev 5298)
+++ trunk/pywikipedia/reflinks.py 2008-05-03 07:59:59 UTC (rev 5299)
@@ -11,6 +11,10 @@
DumZiBoT is running that script on en: & fr: at every new dump, running it on de: is not allowed anymore.
+As it uses it, you need to configure noreferences.py for your wiki, or it will not work.
+
+pdfinfo is needed for parsing pdf titles.
+
See [[:en:User:DumZiBoT/refLinks]] for more information on the bot.
&params;
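The docstring above notes that pdfinfo is needed for parsing PDF titles. The sketch below shows how a title might be pulled out of pdfinfo's text output. It assumes the output contains a line of the form "Title: ..." and does not reproduce the script's own getPDFTitle method; the function name parse_pdfinfo_title is hypothetical.

```python
def parse_pdfinfo_title(pdfinfo_output):
    """Return the value of the 'Title:' line in pdfinfo output, or None.

    Assumption: pdfinfo prints one 'Key: value' pair per line.
    """
    for line in pdfinfo_output.splitlines():
        key, sep, value = line.partition(':')
        if sep and key.strip().lower() == 'title':
            # An empty title is treated the same as a missing one.
            return value.strip() or None
    return None
```

Splitting on the first colon only means that a title which itself contains a colon is returned whole.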
Bugs item #1797503, was opened at 2007-09-18 15:05
Message generated for change (Comment added) made by sf-robot
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1797503&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: DarkoNeko (darkoneko)
Assigned to: Nobody/Anonymous (nobody)
Summary: category.py
Initial Comment:
Environment: cmd.exe on Windows XP Home Edition
command used :
C:\Program Files\TortoiseSVN\pywikipedia>python category.py move -from:"Cumuni di Sicilia" -to:"Cumuna di Sicilia" -lang:co
error message :
There are more articles in Category:Cumuni di Sicilia.
Getting [[Category:Cumuni di Sicilia]] starting at Mistirjancu" class="new...
Changing page [[co:Longi]]
WARNING: No character set found.
Category page detection is not bug free. Please report this error!
substring not found
Changing page [[co:Lucca Sicula]]
(a few other pages)
Changing page [[co:Marineu]]
Getting [[Category:Cumuni di Sicilia]]...
There are more articles in Category:Cumuni di Sicilia.
Getting [[Category:Cumuni di Sicilia]] starting at Vita+%28Sicilia%29" class="new...
WARNING: No character set found.
Category page detection is not bug free. Please report this error!
Dumping to category.dump.bz2, please wait...
Traceback (most recent call last):
File "category.py", line 832, in <module>
bot.run()
File "category.py", line 365, in run
subcategories = self.oldCat.subcategoriesList(recurse = False)
File "C:\Program Files\TortoiseSVN\pywikipedia\catlib.py", line 298, in subcategoriesList
for cat in self.subcategories(recurse):
File "C:\Program Files\TortoiseSVN\pywikipedia\catlib.py", line 284, in subcategories
for tag, subcat in self._getContentsAndSupercats(recurse):
File "C:\Program Files\TortoiseSVN\pywikipedia\catlib.py", line 124, in _getContentsAndSupercats
for tag, page in self._parseCategory(purge, startFrom):
File "C:\Program Files\TortoiseSVN\pywikipedia\catlib.py", line 204, in _parseCategory
ibegin = txt.index('<!-- start content -->') # does not work for cats without text
ValueError: substring not found
Apparently the error occurs while retrieving the page list.
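The ValueError in the traceback comes from str.index, which raises when the marker is absent. As a hedged illustration only, and not the fix applied in catlib.py, the lookup could use str.find with a fallback. The helper name content_start and the choice of 0 as the fallback are assumptions.

```python
START_MARKER = '<!-- start content -->'


def content_start(html, marker=START_MARKER, default=0):
    """Return the position of marker in html, or default if it is absent.

    str.find returns -1 for a missing substring, where str.index
    would raise ValueError.
    """
    pos = html.find(marker)
    if pos == -1:
        return default
    return pos
```

Whether starting from the top of the page is acceptable for category pages without text would need checking against the real parser.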
----------------------------------------------------------------------
>Comment By: SourceForge Robot (sf-robot)
Date: 2008-05-02 19:20
Message:
Logged In: YES
user_id=1312539
Originator: NO
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).
----------------------------------------------------------------------
Comment By: Russell Blau (russblau)
Date: 2008-04-18 08:41
Message:
Logged In: YES
user_id=855050
Originator: NO
If this error is still occurring, please provide updated information. If
no information is received, this bug will be closed automatically.
----------------------------------------------------------------------
Bugs item #1803615, was opened at 2007-09-27 07:10
Message generated for change (Comment added) made by sf-robot
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1803615&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Another bug in category.py
Initial Comment:
I have no idea how this bug occurs, but here it is. It doesn't recognize that a simple category already exists in this case, which is strange, since the category I'm adding is Landskrona BoIS and in this case the DEFAULTSORT template is NOT being used in the article.
Current categories:
* Kategori:Födda 1984
* Kategori:Spelare i Häljarps IF
* Kategori:Spelare i IFK Hässleholm
<b>* Kategori:Spelare i Landskrona BoIS</b>
* Kategori:Svenska fotbollsspelare
Adding [[Kategori:Spelare i Landskrona BoIS|Dahlgren, Mikael]]
Very peculiar.
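For illustration, one way a bot could decide whether a category is already present, ignoring any sort key after the pipe, is sketched below. This is not the logic used by category.py; the helper names normalize and has_category and the normalisation rules (underscores as spaces, first letter capitalised) are assumptions.

```python
import re


def normalize(title):
    """Collapse whitespace and underscores, and capitalise the first letter."""
    title = ' '.join(title.replace('_', ' ').split())
    return title[:1].upper() + title[1:]


def has_category(wikitext, name, namespace='Kategori'):
    """True if wikitext links to the category, with or without a sort key."""
    pattern = re.compile(
        r'\[\[\s*%s\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]' % re.escape(namespace),
        re.IGNORECASE)
    wanted = normalize(name)
    return any(normalize(m.group(1)) == wanted
               for m in pattern.finditer(wikitext))
```

With such a check, [[Kategori:Spelare i Landskrona BoIS]] and [[Kategori:Spelare i Landskrona BoIS|Dahlgren, Mikael]] would count as the same category.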
----------------------------------------------------------------------
>Comment By: SourceForge Robot (sf-robot)
Date: 2008-05-02 19:20
Message:
Logged In: YES
user_id=1312539
Originator: NO
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).
----------------------------------------------------------------------
Comment By: Russell Blau (russblau)
Date: 2008-04-18 08:39
Message:
Logged In: YES
user_id=855050
Originator: NO
If this bug is still occurring, please provide updated information
including the specific command line used to run the bot. If no information
is provided, this bug will be closed automatically.
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2007-09-27 07:10
Message:
Logged In: NO
I didn't know that HTML code was disabled in bug descriptions... The HTML
code is of course not being printed in output.
----------------------------------------------------------------------
Feature Requests item #1956119, was opened at 2008-05-02 05:52
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603141&aid=1956119&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: interwiki to non-existing anchor
Initial Comment:
Please add the possibility to give an interwiki link to an anchor which is not one of the headers of the article.
Example:
There are articles on both Microsoft .NET and .NET Framework in several languages, but en: has only .NET Framework.
Some people repeatedly add a link to .NET Framework to the second group of articles.
If the link is written as [[en:.NET_Framework#top]], it works correctly in a browser, but the bot wants to remove it.
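To make the request concrete, here is a minimal sketch of separating the anchor from an interwiki link so that the page title can be compared on its own. It does not describe how interwiki.py handles sections; the helper names split_interwiki and same_target are hypothetical.

```python
def split_interwiki(link):
    """Split '[[en:.NET_Framework#top]]' into (lang, title, anchor).

    anchor is None when the link has no '#...' part.
    """
    inner = link.strip()
    if inner.startswith('[[') and inner.endswith(']]'):
        inner = inner[2:-2]
    lang, sep, rest = inner.partition(':')
    if not sep:
        raise ValueError('not an interwiki link: %r' % link)
    title, _, anchor = rest.partition('#')
    return lang, title, anchor or None


def same_target(link_a, link_b):
    """True when two links point at the same page, ignoring any anchor."""
    return split_interwiki(link_a)[:2] == split_interwiki(link_b)[:2]
```

Under this sketch, a link ending in #top resolves to the same page as the plain link, so a bot could validate the page and leave the anchor alone.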
----------------------------------------------------------------------