https://bugzilla.wikimedia.org/show_bug.cgi?id=55195
Web browser: ---
Bug ID: 55195
Summary: Invalid Title in flickrripper
Product: Pywikibot
Version: unspecified
Hardware: All
OS: All
Status: ASSIGNED
Severity: normal
Priority: Unprioritized
Component: General
Assignee: Pywikipedia-bugs@lists.wikimedia.org
Reporter: legoktm.wikipedia@gmail.com
Classification: Unclassified
Mobile Platform: ---
Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1466/
Reported by: betacommand
Created on: 2012-06-19 19:52:25
Subject: Invalid Title in flickrripper
Assigned to: xqt
Original description:

<Betacommand> multichill: I know you wrote flickrripper.py and I'm trying to fix an issue with it, and thought it might be easier for you to fix
<Betacommand> lines 157-161 where it grabs the description and uses it for the file name
<Betacommand> when you start working with non-Latin descriptions it doesn't handle multi-byte characters well, it ended up with a title over 320 bytes
<Betacommand> the max MediaWiki lets you have is 255
<multichill> Lol
<Betacommand> multichill: really rather a pain
<multichill> So the check should probably encode it and then see how long it is?
<Betacommand> correct
<multichill> Or just lower the limit a bit?
<Betacommand> Thai letters for example are 3 bytes
* Betacommand notes it was discovered with flickrripper.py -autonomous -user_id:40561337@N07 -addcategory:"Files from Abhisit Vejjajiva Flickr stream"
<multichill> Betacommand: Could you file a bug for this?
<Betacommand> multichill: you would need to cut it down to 85 to be safe
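The "85" figure in the log can be checked with a few lines of arithmetic. This is only a sketch: the 255-byte limit and the 3-byte Thai letters come from the discussion, while the constant name `MAX_TITLE_BYTES` and the 110-character sample description are made up for illustration.

```python
# Assumed name; the 255-byte limit is the value quoted in the discussion.
MAX_TITLE_BYTES = 255

thai_letter = "\u0e01"  # THAI CHARACTER KO KAI
bytes_per_letter = len(thai_letter.encode("utf-8"))

# Worst case for 3-byte characters: how many fit under the byte limit.
safe_chars = MAX_TITLE_BYTES // bytes_per_letter

# Hypothetical description: short enough by character count,
# but well over the limit once encoded.
description = thai_letter * 110
char_count = len(description)
byte_count = len(description.encode("utf-8"))
print(bytes_per_letter, safe_chars, char_count, byte_count)
```

Note that the full file name also contains the project name, the user name and the extension, so the budget left for the description itself is smaller than this worst-case figure.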
--- Comment #1 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- I guess the title is cut by MediaWiki and not by the slice operator, since slicing works correctly for unicode strings. len() also gives the number of characters, not the number of bytes. Do we have any size(object) method?
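To illustrate the difference between character count and byte count, here is a minimal sketch. The helper name `utf8_len` is hypothetical and not part of pywikipedia; it simply encodes the text and measures the result.

```python
def utf8_len(text):
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


title = "Abhisit \u0e19\u0e32\u0e22"  # 8 ASCII characters + 3 Thai letters

char_count = len(title)       # counts characters
byte_count = utf8_len(title)  # counts encoded bytes
sliced = title[:9]            # slicing also works on characters, not bytes
print(char_count, byte_count, len(sliced), utf8_len(sliced))
```

`sys.getsizeof` would not help here, since it reports the in-memory size of the Python object rather than the encoded length.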
--- Comment #2 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- An idea for getFilename (could anybody test whether it works?):
    if not title:
        # find the max length for a mw title
        maxBytes = 240 - len(project.encode('utf-8')) \
                       - len(username.encode('utf-8'))
        description = photoInfo.find('photo').find('description').text
        if description:
            descBytes = len(description.encode('utf-8'))
            if descBytes > maxBytes:
                # maybe we cut more than needed, anyway we do it
                items = max(0, len(description) - maxBytes + descBytes)
                description = description[:items]
            title = cleanUpTitle(description)
        else:
            title = u''
            # Should probably have the id of the photo as last resort.
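A note on the proposal above: whenever `descBytes > maxBytes`, the expression `len(description) - maxBytes + descBytes` is larger than `len(description)`, so the slice leaves the description unchanged. The sketch below reproduces that with made-up numbers and shows one possible byte-aware alternative. The function `truncate_utf8` is hypothetical and is not necessarily what was later committed.

```python
def truncate_utf8(text, max_bytes):
    """Return the longest prefix of `text` whose UTF-8 encoding fits in max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Cut at the byte limit, then drop any partial character left at the end.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


description = "\u0e01" * 100  # 100 Thai letters, 300 bytes in UTF-8
max_bytes = 240               # illustrative budget
desc_bytes = len(description.encode("utf-8"))

# The expression as posted: 100 - 240 + 300 = 160, which exceeds the
# string length, so the slice keeps everything.
items = max(0, len(description) - max_bytes + desc_bytes)
unchanged = description[:items]

# Byte-aware truncation keeps only what fits.
shortened = truncate_utf8(description, max_bytes)
print(items, len(unchanged), len(shortened), len(shortened.encode("utf-8")))
```

Truncating the encoded bytes and decoding with `errors="ignore"` avoids guessing a per-character byte width, at the cost of one extra encode and decode.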
--- Comment #3 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- - **assigned_to**: nobody --> xqt
--- Comment #4 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- fix committed in r10387, please check
--- Comment #5 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- - **summary**: Invalid Title --> Invalid Title in flickrripper
--- Comment #6 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- - **status**: open --> pending
--- Comment #7 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- - **status**: pending --> pending-fixed
--- Comment #8 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- Issue still not fixed, actually it's worse:

C:\Dev\SVN\pywikipedia>flickrripper.py -autonomous -user_id:40561337@N07 -addcategory:"Files from Abhisit Vejjajiva Flickr stream"
5703017392
Traceback (most recent call last):
  File "C:\Dev\SVN\pywikipedia\flickrripper.py", line 609, in <module>
    main()
  File "C:\Dev\SVN\pywikipedia\flickrripper.py", line 599, in main
    removeCategories, autonomous)
  File "C:\Dev\SVN\pywikipedia\flickrripper.py", line 257, in processPhoto
    filename = getFilename(photoInfo)
  File "C:\Dev\SVN\pywikipedia\flickrripper.py", line 172, in getFilename
    % (title, project, username)).exists():
  File "C:\Dev\SVN\pywikipedia\wikipedia.py", line 1284, in exists
    self.get()
  File "C:\Dev\SVN\pywikipedia\wikipedia.py", line 705, in get
    expandtemplates = expandtemplates)
  File "C:\Dev\SVN\pywikipedia\wikipedia.py", line 787, in _getEditPage
    raise BadTitle('BadTitle: %s' % self)
pywikibot.exceptions.BadTitle: BadTitle: [[commons:File:&#3609;&#3634;&#3618;&#3585;&#3619;&#3633;&#3600;&#3617;&#3609;&#3605;&#3619;&#3637; &#3649;&#3621;&#3632;&#3588;&#3603;&#3632;&#3648;&#3604;&#3636;&#3609;&#3607;&#3634;&#3591;&#3629;&#3629;&#3585;&#3592;&#3634;&#3585;&#3585;&#3619;&#3640;&#3591;&#3592;&#3634;&#3585;&#3634;&#3619;&#3660;&#3605;&#3634; &#3626;&#3634;&#3608;&#3634;&#3619;&#3603;&#3619;&#3633;&#3600;&#3629;&#3636;&#3609;&#3650;&#3604;&#3609;&#3637;&#3648;&#3595;&#3637;&#3618;&#3585;&#3621;&#3633;&#3610;&#3618;&#3633;&#3591;&#3611;&#3619;&#3632;&#3648;&#3607;&#3624;&#3652;&#3607;&#3618; &#3623;&#3633;&#3609;&#3629;&#3634;&#3607;&#3636;&#3605;&#3618;&#3660;&#3607;&#3637;&#3656; 8 &#3614;&#3620;&#3625;&#3616;&#3634;&#3588;&#3617; &#3614;.&#3624;.2554 (Photographer attached to the Prime Minister of the Kingdom of Thailand (H.E.Mr.Abhisit Vejjajiva) , Peerapat Wimolrungkarat - &#3614;&#3637;&#3619;&#3614;&#3633;&#3602;&#3609;&#3660; &#3623;&#3636;&#3617;&#3621;&#3619;&#3633;&#3591;&#3588;&#3619;&#3633;&#3605;&#3609;&#3660;) @is50mm - Flickr - Abhisit Vejjajiva.jpg]]
--- Comment #9 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- - **status**: pending-fixed --> open
--- Comment #10 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- Where are the html entities from? Are they part of the flickr page?
--- Comment #11 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- Those are the Thai parts of the page title; they are converted to entities when the exception is thrown.
--- Comment #12 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- I do not see a conversion by the exception. I converted the title from HTML entities to unicode in my last commit.
--- Comment #13 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- Line 787 doesn't print the title, it prints the whole page object (self). When you print the object rather than the title, the text gets converted there. I used a log to confirm that the title was UTF-8 before filing this bug.
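A sketch of how formatting an object, rather than its title, could produce numeric entities like those in the traceback. This assumes the page's string representation encodes with Python's `xmlcharrefreplace` error handler; `FakePage` is a hypothetical stand-in written for Python 3, not the pywikipedia Page class, and the real `__str__` may differ.

```python
class FakePage:
    """Hypothetical stand-in for a page object; not the pywikipedia Page class."""

    def __init__(self, title):
        self._title = title

    def title(self):
        # The title itself stays ordinary unicode text.
        return self._title

    def __str__(self):
        # Assumption: the console representation replaces every non-ASCII
        # character with a numeric character reference.
        ascii_bytes = self._title.encode("ascii", "xmlcharrefreplace")
        return "[[%s]]" % ascii_bytes.decode("ascii")


page = FakePage("\u0e19\u0e32\u0e22")
message = "BadTitle: %s" % page            # formats the object via __str__
title_message = "BadTitle: %s" % page.title()  # formats the plain title
print(message)
```

Under this assumption the entities appear only in the error message, while the title held by the object is unchanged.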
--- Comment #14 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- Thanks for testing. The length calculation was wrong. I've corrected it.
Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com changed:
What     |Removed  |Added
----------------------------------------------------------------------------
See Also |         |https://sourceforge.net/p/pywikipediabot/bugs/1466
xqt info@gno.de changed:
What     |Removed  |Added
----------------------------------------------------------------------------
Status   |ASSIGNED |NEW
CC       |         |info@gno.de
John Mark Vandenberg jayvdb@gmail.com changed:
What      |Removed  |Added
----------------------------------------------------------------------------
CC        |         |jayvdb@gmail.com
Component |General  |Other scripts
pywikipedia-bugs@lists.wikimedia.org