[ pywikipediabot-Bugs-3536400 ] Invalid Title in flickrripper - Pywikipedia-bugs

20 Jun 2012


Bugs item #3536400, was opened at 2012-06-19 12:52
Message generated for change (Comment added) made by xqt
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3536400...
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: betacommand (betacommand)
Assigned to: xqt (xqt)
...
Summary: Invalid Title in flickrripper
Initial Comment:
Betacommand	multichill: I know you wrote flickrripper.py and I'm trying to fix an issue with it, and thought it might be easier for you to fix
Betacommand	lines 157-161 where it grabs the description and uses it for the file name
Betacommand	when you start working with non-Latin descriptions it doesn't handle multi-byte characters well, it ended up with a title over 320 bytes
Betacommand	the max MediaWiki lets you have is 255 bytes
multichill	Lol
Betacommand	multichill: really rather a pain
multichill	So the check should probably encode it and then see how long it is?
Betacommand	correct
multichill	Or just lower the limit a bit?
Betacommand	Thai letters for example are 3 bytes
Betacommand	notes it was discovered with flickrripper.py -autonomous -user_id:40561337@N07 -addcategory:"Files from Abhisit Vejjajiva Flickr stream"
multichill	Betacommand: Could you file a bug for this?
Betacommand	multichill: you would need to cut it down to 85 to be safe
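
The figure of 85 follows from dividing the 255-byte limit by the 3 bytes each Thai letter needs in UTF-8. A small sketch of that arithmetic, with illustrative names (MAX_TITLE_BYTES and safe_char_limit are not part of flickrripper.py):

```python
# Illustrative names only; nothing here is taken from flickrripper.py.
MAX_TITLE_BYTES = 255  # title limit quoted in the chat above


def safe_char_limit(bytes_per_char, max_bytes=MAX_TITLE_BYTES):
    """Return how many characters fit if each one needs bytes_per_char bytes."""
    return max_bytes // bytes_per_char


thai_letter = "\u0e01"  # THAI CHARACTER KO KAI
thai_width = len(thai_letter.encode("utf-8"))  # bytes used by one Thai letter
limit_for_thai = safe_char_limit(thai_width)   # 255 // 3
```

A fixed character limit like this is only as safe as the widest character expected. UTF-8 uses up to 4 bytes per character, in which case the same calculation gives 63.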
----------------------------------------------------------------------
...
Comment By: xqt (xqt)
Date: 2012-06-20 07:17
Message:
fix committed in r10387, please check
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2012-06-20 00:08
Message:
An idea for getFilename (could anybody test whether it works?):
if not title:
        # find the max length (in bytes) for a MediaWiki title
        maxBytes = 240 - len(project.encode('utf-8')) \
                       - len(username.encode('utf-8'))
        description = photoInfo.find('photo').find('description').text
        if description:
            descBytes = len(description.encode('utf-8'))
            if descBytes > maxBytes:
                # Drop one character per excess byte. This may cut more
                # than needed, but the result always fits into maxBytes.
                items = max(0, len(description) - descBytes + maxBytes)
                description = description[:items]
            title = cleanUpTitle(description)
        else:
            title = u''
            # Should probably have the id of the photo as last resort.
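
An alternative that avoids cutting more than needed is to truncate the encoded bytes rather than the characters. This is only a sketch of that idea, not the change committed for this bug; truncate_utf8 is a hypothetical helper name.

```python
def truncate_utf8(text, max_bytes):
    """Cut text so that its UTF-8 encoding is at most max_bytes long."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing the bytes may split a multi-byte character at the end;
    # errors="ignore" drops that incomplete sequence instead of raising.
    return encoded[:max_bytes].decode("utf-8", "ignore")


# Example: 100 Thai letters (300 bytes) cut down to a 200-byte budget.
thai_description = "\u0e01" * 100
shortened = truncate_utf8(thai_description, 200)
```

With a 200-byte budget this keeps as many whole 3-byte letters as fit, whereas subtracting the excess byte count from the character count would leave nothing of the same input.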
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2012-06-19 23:03
Message:
I guess the title is cut by MediaWiki and not by the slice operator, since
slicing works correctly for unicode strings. Also, len() gives the number of
characters, not the number of bytes. Do we have any size(object) method?
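
One way to get the byte size without a dedicated method is to encode the string first and take len() of the result. A minimal sketch, assuming the limit is counted in UTF-8 bytes (utf8_len is a hypothetical helper name):

```python
def utf8_len(text):
    """Return the number of bytes text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


sample_title = "abc" + "\u0e01" * 4  # three ASCII letters plus four Thai letters
char_count = len(sample_title)       # counts characters
byte_count = utf8_len(sample_title)  # counts encoded bytes
```

Here the two counts differ because each Thai letter encodes to 3 bytes while each ASCII letter encodes to 1.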
----------------------------------------------------------------------