Hello,
XML files from http://en.wikipedia.org/wiki/Special:Export contain image references like:
[[Image:Flag_de-berlin_civil_300px.png|150px|Landesflagge Berlins]]
This one, for example, resolves to:
commons.wikimedia.org/upload/thumb/5/54/150px-Flag_de-berlin_civil_300px.png
That means that I don't get enough information to reconstruct the path.
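A minimal Python sketch of pulling the file name and requested width out of such a reference. It assumes the `[[Image:Name|NNNpx|caption]]` form shown above and that spaces in file names become underscores; the regex and `parse_image_ref` are hypothetical helpers, not a MediaWiki API.

```python
import re

# Hypothetical pattern for [[Image:Name|option|option|...]] links as they appear
# in Special:Export wikitext. Only the "Image:" prefix is matched, and captions
# containing nested [[links]] are not parsed.
IMAGE_REF = re.compile(r"\[\[Image:([^|\]]+)((?:\|[^|\]]*)*)\]\]")


def parse_image_ref(wikitext):
    """Return a list of (file_name, width_px or None) tuples."""
    refs = []
    for match in IMAGE_REF.finditer(wikitext):
        # Assumption: spaces in file names map to underscores, as in URLs.
        name = match.group(1).strip().replace(" ", "_")
        width = None
        # group(2) looks like "|150px|caption"; drop the empty piece before the first "|".
        for option in match.group(2).split("|")[1:]:
            size = re.fullmatch(r"(\d+)px", option.strip())
            if size:
                width = int(size.group(1))
        refs.append((name, width))
    return refs


text = "[[Image:Flag_de-berlin_civil_300px.png|150px|Landesflagge Berlins]]"
print(parse_image_ref(text))  # [('Flag_de-berlin_civil_300px.png', 150)]
```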
Is any help available? E.g. how can I tell if an img comes from upload.wikimedia.org or commons.wikimedia.org/upload?
What's the explanation for "thumb/5/54" in the example above? Are there rules for conversion?
Marek
http://en.wikipedia.org/wiki/User:Marek_Moehling bn548mm@g214mx.net (remove numbers to despam)
Marek Möhling wrote:
| XML files from http://en.wikipedia.org/wiki/Special:Export
| contain image references like:
| [[Image:Flag_de-berlin_civil_300px.png|150px|Landesflagge Berlins]]
| this e.g. resolves to:
| commons.wikimedia.org/upload/thumb/5/54/150px-Flag_de-berlin_civil_300px.png
|
| That means that I don't get enough information to reconstruct the path.
|
| Is any help available?
| E.g. how can I tell if an img comes from
| upload.wikimedia.org or commons.wikimedia.org/upload?
If it's in the local wiki's uploaded file set, that's used. Otherwise, the commons file set is checked.
| What's the explanation for "thumb/5/54" in the example above?
| Are there rules for conversion?
The 'thumb' subdirectory is used for thumbnails. Please check the source code for the exact details of how things are parsed, or look at the mailing list archives for past discussions.
-- brion vibber (brion @ pobox.com)
| If it's in the local wiki's uploaded file set, that's used. Otherwise,
| the commons file set is checked.
I need to import recent wiki content in different languages to my website (presently about 5 MB including images), so downloading 26 GB via the complete DB dump wouldn't be economical.
As I don't have access to the Wikipedia filesystem, I'd have to do the check above via HTTP requests.
I'd do it like this:
1) send a GET request to http://upload.wikimedia.org/somePathToImg
2) wait for the result
3) a - use the file if the response is 200
   b - else, if the response is 404, fetch the file at http://commons.wikimedia.org/somePathToImg
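The steps above can be sketched in Python under a few assumptions: it sends HEAD rather than GET, so nothing is downloaded just to test existence; it treats only a 404 as "not there"; and it takes the two base URLs as parameters rather than guessing the real upload paths. `url_exists` and `resolve_image` are hypothetical names, and the `exists` parameter lets you swap in a stub or a local cache of known paths.

```python
import urllib.error
import urllib.request


def url_exists(url, timeout=10):
    """HEAD the URL: True on 200, False on 404, re-raise any other HTTP error."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # 403, 500, ... are not treated as "file missing"


def resolve_image(rel_path, local_base, shared_base, exists=url_exists):
    """Try the local wiki's upload area first, then the shared (Commons) one."""
    for base in (local_base, shared_base):
        url = f"{base.rstrip('/')}/{rel_path.lstrip('/')}"
        if exists(url):
            return url
    return None  # found in neither place


# Usage with a stub instead of the network; the base URLs are placeholders.
known = {"http://commons.example.org/upload/5/54/Foo.png"}
print(resolve_image("5/54/Foo.png", "http://local.example.org/upload",
                    "http://commons.example.org/upload", exists=known.__contains__))
# prints http://commons.example.org/upload/5/54/Foo.png
```

This still costs one or two round trips per image; caching the results of `exists` across runs would avoid repeating them for images already seen.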
...which still causes overhead. Isn't there a more elegant solution?
Marek
On Sat, 29 Jan 2005 04:07:44 +0100, Marek Möhling bonnmm@t-online.de wrote:
| Hello,
|
| XML files from http://en.wikipedia.org/wiki/Special:Export contain img references like: [[Image:Flag_de-berlin_civil_300px.png|150px|Landesflagge Berlins]] this e.g. resolves to: commons.wikimedia.org/upload/thumb/5/54/150px-Flag_de-berlin_civil_300px.png
|
| That means that I don't get enough information to reconstruct the path.
Not directly, but you could look for the image on the wiki you are getting the export from, and if you don't find it there, try the commons.
| What's the explanation for "thumb/5/54" in the example above?
| Are there rules for conversion?
It's some sort of checksum of the image that determines the path.
| What's the explanation for "thumb/5/54" in the example above?
| Are there rules for conversion?
It's based on the MD5 hash of the image name. The MD5 hash of "150px-Flag_de-berlin_civil_300px.png" is 5490af54dffb956a9455f50a1f954eee. The first folder is the first character of the hash, and the subfolder is the first two characters.
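A minimal Python sketch of the rule as described here: take the MD5 hex digest of the thumbnail name (`<width>px-<name>`), use its first hex digit as the folder and its first two as the subfolder, all under `thumb/`. Encoding the name as UTF-8 before hashing is an assumption, `hash_dirs` and `thumb_path` are hypothetical helpers, and other MediaWiki versions or configurations may lay files out differently. If the hash quoted above is right, the example call should reproduce the `thumb/5/54/...` path from the first message.

```python
import hashlib


def hash_dirs(name):
    """Return 'a/ab/', where 'ab' are the first two hex digits of md5(name)."""
    # Assumption: the name is hashed as UTF-8 bytes.
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{digest[0]}/{digest[:2]}/"


def thumb_path(name, width):
    """Relative thumbnail path, hashing the thumbnail name as described above."""
    thumb_name = f"{width}px-{name}"
    return f"thumb/{hash_dirs(thumb_name)}{thumb_name}"


print(thumb_path("Flag_de-berlin_civil_300px.png", 150))
```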
Thanks, that helped. M
"Robert Jones" bob@jones-cliffe.freeserve.co.uk wrote in message news:001a01c50605$d7229c60$d1062bd9@68fffzwmmg78wn9...
wikitech-l@lists.wikimedia.org