[Mediawiki-l] Does wikipedia use different methods to compute the hash part of an image path?

Daniel Friesen lists at nadir-seen-fire.com
Tue Dec 6 23:45:28 UTC 2011


Is batching API requests an option?
You can pass multiple titles to the API and get their info all at
once. Additionally, if you are already using an API query to get
your initial list of files, you can usually turn that into a generator
query that returns the image info as well, all in one request.
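Here is a minimal Scala sketch of the batching idea. It only builds the request URLs, reusing the action=query&prop=imageinfo&iiprop=url parameters from the example query quoted further down. ImageInfoBatch and batchUrls are hypothetical names. The cap of 50 titles per request is an assumption based on the usual limit for clients without higher API limits. If the file list itself comes from an API query, the same prop=imageinfo&iiprop=url can be attached to a generator (e.g. generator=categorymembers with gcmtype=file) instead of titles. That variant is not shown.

```scala
import java.net.URLEncoder

object ImageInfoBatch {
  // Assumption: 50 titles per request, the usual cap for clients
  // without higher API limits (bots may be allowed more).
  val MaxTitlesPerRequest = 50

  // Form-encode one query-string value as UTF-8 (":" -> %3A, "|" -> %7C).
  private def enc(s: String): String = URLEncoder.encode(s, "UTF-8")

  // Hypothetical helper: one imageinfo URL per batch of up to
  // MaxTitlesPerRequest titles, so N files need ceil(N / 50) requests.
  def batchUrls(apiBase: String, titles: Seq[String]): List[String] =
    titles.grouped(MaxTitlesPerRequest).map { batch =>
      val params = List(
        "action" -> "query",
        "prop"   -> "imageinfo",
        "iiprop" -> "url",
        "format" -> "json",
        "titles" -> batch.mkString("|") // several titles, "|"-separated
      )
      apiBase + "?" + params.map { case (k, v) => k + "=" + enc(v) }.mkString("&")
    }.toList
}
```

Each returned URL is one HTTP request. In the JSON response, every entry under query.pages carries an imageinfo list whose url field is the full upload URL for that title.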

On Tue, 06 Dec 2011 15:11:34 -0800, Tommy Chheng <tommy.chheng at gmail.com>  
wrote:

> The problem is that I'm bulk processing image URL data, so if I were to
> rely on the API, I would have to make twice as many HTTP calls.
>
>
> On Tue, Dec 6, 2011 at 10:25 AM, Daniel Friesen
> <lists at nadir-seen-fire.com> wrote:
>> You could also use Special:FilePath.
>>
>> Definitely a good idea. We long for a day when we can eliminate the
>> current way we organize files and do things like renaming a file in the
>> UI without having to rename the stored files, handling paths in a way
>> that doesn't suffer from negative caching effects, making the URL of an
>> old file version the same as when it was the main file, and maybe even
>> storing only one file instead of several when two files have the same
>> contents.
>>
>> Those kinds of changes we could make in the future would completely
>> break clients that hardcode this kind of handling.
>>
>> On Tue, 06 Dec 2011 06:40:12 -0800, C Stafford <c.stafford at gmail.com>
>> wrote:
>>
>>> You may be more future-proof by asking the API for the image URL
>>> rather than trying to figure it out yourself, as each wiki install
>>> may have other factors that determine the directory/hash structure
>>> (I've seen installs that have 3 levels, not 2).
>>>
>>> http://en.wikipedia.org/w/api.php?action=query&prop=imageinfo&iiprop=url&titles=File:Stewie_Griffin.png
>>>
>>> On Mon, Dec 5, 2011 at 6:25 PM, Tommy Chheng <tommy.chheng at gmail.com>
>>> wrote:
>>>> I'm computing the URL of an image as follows
>>>> (the first hex character of the MD5, then the first two hex characters,
>>>> then the file name, joined with slashes):
>>>>
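>>>>   import java.math.BigInteger
>>>>   import java.security.MessageDigest
>>>>
>>>>   // MD5 of the file name, rendered as a base-16 number
>>>>   val md = MessageDigest.getInstance("MD5")
>>>>   val messageDigest = md.digest(fileName.getBytes)
>>>>   val md5 = (new BigInteger(1, messageDigest)).toString(16)
>>>>
>>>>   // first hex character, then first two hex characters
>>>>   val hash1 = md5.substring(0, 1)
>>>>   val hash2 = md5.substring(0, 2)
>>>>
>>>>   val urlPart = hash1 + "/" + hash2 + "/" + fileName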
>>>>
>>>> Most of the time the function works correctly, but in a few cases it
>>>> is incorrect:
>>>>
>>>> For "Stewie_Griffin.png", I get 2/26/Stewie_Griffin.png but the real
>>>> one is 0/02/Stewie_Griffin.png
>>>>
>>>> The source file info is here:
>>>> http://en.wikipedia.org/wiki/File:Stewie_Griffin.png
>>>> http://upload.wikimedia.org/wikipedia/en/0/02/Stewie_Griffin.png
>>>>
>>>> Any ideas why the hashing scheme doesn't work sometimes?
>>>>
>>>> I posted this question on Stack Overflow, but I might be able to get a
>>>> better answer here:
>>>> http://stackoverflow.com/questions/8389616/does-wikipedia-use-different-methods-to-compute-the-hash-part-of-an-image-path
>>>>
>>>> --
>>>> @tommychheng
>>>> http://tommy.chheng.com
>>>>
>>>> _______________________________________________
>>>> Mediawiki-api mailing list
>>>> Mediawiki-api at lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
>>
>>
>> --
>> ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
>>
>> _______________________________________________
>> MediaWiki-l mailing list
>> MediaWiki-l at lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
>
>
>

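The inconsistent result in Tommy's question comes from BigInteger.toString(16), which drops leading zeros. The MD5 of "Stewie_Griffin.png" begins with "02", so the unpadded string begins "26..." and both directory levels shift, giving 2/26 instead of 0/02. Below is a minimal Scala sketch of a padded version. WikiHashPath, md5Hex and hashPath are hypothetical names. It assumes the usual MediaWiki layout, where directory level i is the first i hex digits of the MD5 of the file name in its underscore form, already normalized (no "File:" prefix, first letter capitalized as on Wikipedia). The levels parameter covers installs configured with more than 2 levels. As noted above, asking the API remains the more future-proof option.

```scala
import java.math.BigInteger
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object WikiHashPath {
  // MD5 of the name as exactly 32 lowercase hex digits. "%032x" left-pads
  // with zeros; BigInteger.toString(16) does not, which is what turns
  // "0/02/..." into "2/26/..." for Stewie_Griffin.png.
  def md5Hex(name: String): String = {
    val md = MessageDigest.getInstance("MD5")
    val digest = md.digest(name.getBytes(StandardCharsets.UTF_8))
    "%032x".format(new BigInteger(1, digest))
  }

  // Assumed layout: directory level i is the first i hex digits of the MD5
  // (2 levels by default; some installs are configured with more).
  def hashPath(fileName: String, levels: Int = 2): String = {
    // Assumption: the name is hashed in its underscore form ("Foo_bar.png").
    val key = fileName.replace(' ', '_')
    val hex = md5Hex(key)
    val dirs = (1 to levels).map(i => hex.substring(0, i))
    (dirs :+ key).mkString("/")
  }
}
```

With this version, WikiHashPath.hashPath("Stewie_Griffin.png") yields 0/02/Stewie_Griffin.png, matching the upload.wikimedia.org URL in the question. Encoding explicitly as UTF-8 also avoids depending on the JVM's default charset for non-ASCII file names.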

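The Special:FilePath suggestion quoted above can be sketched as follows: the client only builds a URL and lets the wiki redirect to wherever the file currently lives, so the hash layout never leaks into client code. FilePathUrl and filePathUrl are hypothetical names. The "/wiki/" article path is an assumption that holds for Wikipedia but may differ on other installs.

```scala
import java.net.URLEncoder

object FilePathUrl {
  // Hypothetical helper. Special:FilePath/<name> answers with a redirect to
  // the file's current upload URL, so the hash layout stays server-side.
  // Assumption: the wiki serves pages under "<base>/wiki/", as Wikipedia does.
  def filePathUrl(wikiBase: String, fileName: String): String = {
    // Underscores first: URLEncoder would otherwise turn spaces into "+".
    val key = fileName.replace(' ', '_')
    wikiBase + "/wiki/Special:FilePath/" + URLEncoder.encode(key, "UTF-8")
  }
}
```

Each file still costs a redirect hop, so this does not remove the extra HTTP call Tommy wanted to avoid. It only folds that call into the download, since most HTTP clients follow the redirect automatically.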
-- 
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]


