Brion Vibber wrote:
One possibility is to embed the timestamp into the URL. So the goatse version might be: http://upload.wikimedia.org/wikipedia/en/2005/10/23/074223/Puppy.jpg
and the reverted image would get a different URL, a few minutes later: http://upload.wikimedia.org/wikipedia/en/2005/10/23/074506/Puppy.jpg
An alternative but very similar idea would be to embed the revision number in the URL, instead of the upload timestamp:
Example original: http://upload.wikimedia.org/wikipedia/en/P/1/Puppy.jpg
Example revised: http://upload.wikimedia.org/wikipedia/en/P/2/Puppy.jpg
Then internally there needs to be some translation/lookup table from image name --> current revision number, as opposed to a lookup table for image name --> upload date. (Integers are smaller than dates, so small memory saving perhaps).
Possibility of a very very very small bandwidth saving from slightly shorter URLs.
It might also help if two people upload Puppy.jpg in the exact same second (I'm not sure what happens then in a timestamp system that's only accurate to the second, but in a revision-number system one upload is always going to be first, even if only by a few microseconds).
Lastly, it's easy for a human with the URL to see what revisions come before/after by incrementing/decrementing the digit in the URL, whereas the date and time of the upload of a previous revision cannot be predicted just from the image name.
All other benefits as per timestamp system, I think.
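A minimal sketch of the revision-number scheme described above, assuming an in-memory dict stands in for the name-to-revision lookup table. The names `current_revision`, `record_upload` and `image_url` are hypothetical, and the first-letter directory is only inferred from the `/P/` in the example URLs.

```python
from typing import Dict, Optional

# Hypothetical lookup table: image name -> current revision number.
current_revision: Dict[str, int] = {}

BASE = "http://upload.wikimedia.org/wikipedia/en"


def record_upload(name: str) -> int:
    """Register a new upload and return its revision number (1, 2, 3, ...)."""
    current_revision[name] = current_revision.get(name, 0) + 1
    return current_revision[name]


def image_url(name: str, revision: Optional[int] = None) -> str:
    """Build the URL for a given revision, defaulting to the current one."""
    if revision is None:
        revision = current_revision[name]
    # Assumption: the directory is the first character of the name (/P/ for Puppy.jpg).
    return f"{BASE}/{name[0]}/{revision}/{name}"
```

Under these assumptions, the previous revision's URL is obtained by passing `revision - 1`, which is the "decrement the digit" property mentioned above.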
All the best, Nick.
Nick Jenkins wrote:
Brion Vibber wrote:
One possibility is to embed the timestamp into the URL. So the goatse version might be: http://upload.wikimedia.org/wikipedia/en/2005/10/23/074223/Puppy.jpg
and the reverted image would get a different URL, a few minutes later: http://upload.wikimedia.org/wikipedia/en/2005/10/23/074506/Puppy.jpg
An alternative but very similar idea would be to embed the revision number in the URL, instead of the upload timestamp:
Example original: http://upload.wikimedia.org/wikipedia/en/P/1/Puppy.jpg
Example revised: http://upload.wikimedia.org/wikipedia/en/P/2/Puppy.jpg
We had a lively discussion in #wikimedia-tech on this subject; as well as the revision ID numbers, another possibility discussed was using a content hash.
A content hash has the additional advantage that duplicate file versions only need to be stored once; for instance, reverting a file currently makes a new copy of the file on the filesystem, which wastes space. (However, you then need to be careful about deleting.)
So you might have something like: http://upload.wikimedia.org/584/590/5845907fdfc6eb1125129c4ce0da0704c496a7e4...
Obviously a disadvantage is that the filenames are ugly. One might tack a 'pretty' but ignored filename on the end, using rewrites or whatever tool to drop it on the backend:
http://upload.wikimedia.org/584/590/5845907fdfc6eb1125129c4ce0da0704c496a7e4...
This does, though, complicate the server configuration; I think a goal should be making it very easy to set up a file mirror that we can actually send requests to. Arbitrary filename additions may also have security implications for broken browsers like Internet Explorer, which like to interpret filetype information out of the "extension" on the URL.
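A rough sketch of the content-hash idea, under stated assumptions: the thread does not name the hash function, so SHA-1 is used here only because its 40 hex digits match the length of the example; the two three-character directory levels are read off the example URL; `content_hash_url` and `HashStore` are hypothetical names.

```python
import hashlib

BASE = "http://upload.wikimedia.org"


def content_hash_url(data: bytes, pretty_name: str = "") -> str:
    """Build a URL from the content hash, optionally with a 'pretty' name tacked on."""
    digest = hashlib.sha1(data).hexdigest()  # assumption: SHA-1, 40 hex characters
    url = f"{BASE}/{digest[:3]}/{digest[3:6]}/{digest}"
    if pretty_name:
        url += "/" + pretty_name
    return url


class HashStore:
    """Toy in-memory store keyed by content hash (stand-in for the filesystem)."""

    def __init__(self) -> None:
        self.blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha1(data).hexdigest()
        # Identical content maps to the same key, so a revert stores nothing new.
        self.blobs.setdefault(digest, data)
        return digest
```

Because a revert re-uses the earlier key, deleting one version would have to check that no other revision still refers to the same hash, which is the "careful about deleting" caveat above.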
Lastly, it's easy for a human with the URL to see what revisions come before/after by incrementing/decrementing the digit in the URL, whereas the date and time of the upload of a previous revision cannot be predicted just from the image name.
That might be kind of neat, but requires maintaining a consistent revision sequence _within_ each image. If using revision numbers, it's easier to work with the global row id numbers as the database can guarantee their uniqueness.
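To illustrate the difference, here is a small sketch using SQLite purely as a stand-in database; the table name `image_revision` and the helper `add_revision` are hypothetical. The database hands out globally unique row ids, but the ids belonging to one image are not consecutive, so incrementing the number in a URL would not necessarily land on the same image.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image_revision ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # global row id, unique across all images
    " name TEXT NOT NULL)"
)


def add_revision(name: str) -> int:
    """Insert a new upload row and return its global id."""
    cur = conn.execute("INSERT INTO image_revision (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid


# Uploads to different images interleave in the global sequence.
ids = [add_revision(n) for n in ("Puppy.jpg", "Kitten.jpg", "Puppy.jpg")]

puppy_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM image_revision WHERE name = ? ORDER BY id", ("Puppy.jpg",)
    )
]
print(ids, puppy_ids)  # [1, 2, 3] [1, 3] on this fresh database
```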
-- brion vibber (brion @ pobox.com)
Moin,
On Monday 24 October 2005 08:57, Brion Vibber wrote:
Nick Jenkins wrote:
Brion Vibber wrote:
One possibility is to embed the timestamp into the URL. So the goatse version might be: http://upload.wikimedia.org/wikipedia/en/2005/10/23/074223/Puppy.jpg
and the reverted image would get a different URL, a few minutes later: http://upload.wikimedia.org/wikipedia/en/2005/10/23/074506/Puppy.jpg
An alternative but very similar idea would be to embed the revision number in the URL, instead of the upload timestamp:
Example original: http://upload.wikimedia.org/wikipedia/en/P/1/Puppy.jpg
Example revised: http://upload.wikimedia.org/wikipedia/en/P/2/Puppy.jpg
We had a lively discussion in #wikimedia-tech on this subject; as well as the revision ID numbers, another possibility discussed was using a content hash.
A content hash has the additional advantage that duplicate file versions only need to be stored once; for instance currently when reverting a file it makes a new copy of the file on the filesystem, which wastes space. (However you then need to be careful about deleting.)
So you might have something like: http://upload.wikimedia.org/584/590/5845907fdfc6eb1125129c4ce0da0704c496a7e4.jpg
Obviously a disadvantage is that the filenames are ugly. One might tack a 'pretty' but ignored filename on the end, using rewrites or whatever tool to drop it on the backend:
http://upload.wikimedia.org/584/590/5845907fdfc6eb1125129c4ce0da0704c496a7e4/Puppy.jpg
Which is still very human-unfriendly. I couldn't remember this URL even if my life depended on it!
I rather like
http://upload.wikimedia.org/wikipedia/en/P/2/Puppy.jpg
although I am not sure why the "/P/" needs to be visible to the user (it is deterministic, after all), and it would be handy to have a "latest" revision URL. Which could be just:
http://upload.wikimedia.org/wikipedia/en/Puppy.jpg
and the software behind the scenes figures out what the exact latest revision is and under what /CapitalLetter/ directory it falls. These are things a human user shouldn't need to do or know about.
(Yes, I know, it is technically difficult. But I'd rather have you spend some time figuring it out and implementing it than have every Wikipedia user remember these little technicalities :)
If the plan is to hide all that, well, please forget my 0.02€.
Best wishes,
Tels
-- Signed on Mon Oct 24 18:58:02 2005 with key 0x93B84C15. Visit my photo gallery at http://bloodgate.com/photos/ PGP key on http://bloodgate.com/tels.asc or per email.
"Any sufficiently advanced technology is indistinguishable from a rigged demo." -- Andy Finkel, computer guy
Tels wrote:
although I am not sure why the "/P/" needs to be visible to the user (it is deterministic, after all), and it would be handy to have a "latest" revision URL. Which could be just: http://upload.wikimedia.org/wikipedia/en/Puppy.jpg and the software behind the scenes figures out what the exact latest revision is and under what /CapitalLetter/ directory it falls.
Please read Brion's original message again.
These are things a human user shouldn't need to do or know about.
Serving static content usually amounts to just pushing bits, and that's all it should be. If you want it to be more than that, you need to call into a language that will supply an additional layer of logic before you can start pushing bits, which is usually enormously slower than being able to say "here's a link to a static file". You can serve the latter with all sorts of optimizations -- one example being in-kernel httpd, or one of the very fast userland ones.
Moin,
On Monday 24 October 2005 19:21, Ivan Krstic wrote:
Tels wrote:
although I am not sure why the "/P/" needs to be visible to the user (it is deterministic, after all), and it would be handy to have a "latest" revision URL. Which could be just: http://upload.wikimedia.org/wikipedia/en/Puppy.jpg and the software behind the scenes figures out what the exact latest revision is and under what /CapitalLetter/ directory it falls.
Please read Brion's original message again.
Duh, sorry, I was mighty confused %~/ Please ignore any noise coming from me. :)
Best wishes,
Tels
-- Signed on Tue Oct 25 17:58:14 2005 with key 0x93B84C15. Visit my photo gallery at http://bloodgate.com/photos/ PGP key on http://bloodgate.com/tels.asc or per email.
"Now, _you_ behave!"
Various people wrote: [I've picked this response just because it's the most recent; various people have been making similar points.]
So you might have something like: http://upload.wikimedia.org/584/590/5845907fdfc6eb1125129c4ce0da0704c496a7e4.jpg
Obviously a disadvantage is that the filenames are ugly. One might tack a 'pretty' but ignored filename on the end, using rewrites or whatever tool to drop it on the backend:
http://upload.wikimedia.org/584/590/5845907fdfc6eb1125129c4ce0da0704c496a7e4/Puppy.jpg
Which is still very human-unfriendly. I couldn't remember this URL even if my life depended on it!
The question that occurs to me is, quite simply, why do humans ever *need* to know or manipulate such URIs?
* anyone wanting to "bookmark" a particular image will want to link to its description page (to show copyright info, possible replacements, etc); if that's not the case, we need to redesign our image description pages (this may be the case w.r.t. Commons).
* anyone *saving* the image ("downloading" it, as they would describe it) would only see the *filename* part (as the default name); as long as we tack on the "friendly name" at the end (even just to ignore) the rest of the URI can be anything at all
* anyone wanting to include an image *inline* in an external site is abusing our bandwidth (either maliciously or just through naivety)
* somebody mentioned bot authors; but what purpose do bot authors have with the absolute URI of an image? Creating a static dump by screen-scraping rather than parsing the wikitext dump?
* a user-side renderer (e.g. WikiWyg, Pilaf's Live Preview) might need to know them to render fully, I suppose; like distinguishing "red" and "blue" internal links, this could ideally be done through some minimal "API", question and answer style
In short, I don't see any need for making these URIs "pretty", or even providing a Special page that redirects, except as a [bad] substitute for a "bot API" that allows you to request the current full URI. I *do*, however, see some very good reasons for tacking a pretty *filename* on as the last part, even if it's actually ignored (e.g. for "save as", as mentioned above).
Actually, we might want to do more than just ignore the pretty name, because (esp. knowing IE) it might be dangerous if http://..../abc123/Puppy.jpeg and http://..../abc123/Puppy.txt are valid URIs for the same file. To keep things static, we don't really want to check this explicitly, but perhaps the "pretty bits" could actually exist on the filesystem, as symlinks or such:
http://.../abc123/abc123 [actual content]
http://..../abc123/Puppy.jpeg [symlink to above]
http://..../abc123/JSC0123.jpg [symlink to above; old or alternative name]
http://..../abc123/Haxx0r.txt [no symlink here, so returns HTTP 404]
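The symlink layout can be sketched roughly as follows, assuming a POSIX filesystem and a temporary directory standing in for the upload root. The helpers `publish` and `lookup` are hypothetical; `lookup` only imitates what a static web server would do when asked for a path.

```python
import os
import tempfile
from typing import List, Optional


def publish(root: str, key: str, data: bytes, names: List[str]) -> None:
    """Write the content to <root>/<key>/<key> and symlink each pretty name to it."""
    directory = os.path.join(root, key)
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, key), "wb") as fh:
        fh.write(data)
    for name in names:
        link = os.path.join(directory, name)
        if not os.path.lexists(link):
            os.symlink(key, link)  # relative link, resolved inside <root>/<key>/


def lookup(root: str, key: str, name: str) -> Optional[bytes]:
    """Return the file's bytes, or None where a static server would answer 404."""
    path = os.path.join(root, key, name)
    if not os.path.isfile(path):  # follows symlinks; False if nothing is there
        return None
    with open(path, "rb") as fh:
        return fh.read()


root = tempfile.mkdtemp()
publish(root, "abc123", b"image bytes", ["Puppy.jpeg", "JSC0123.jpg"])
```

Only names that were explicitly linked resolve, so a request for `Haxx0r.txt` finds nothing without any per-request logic on the server.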
-- Rowan Collins BSc [IMSoP]
I'm putting further notes at: http://www.mediawiki.org/wiki/1.6_image_storage
-- brion vibber (brion @ pobox.com)
wikitech-l@lists.wikimedia.org