I've found some nice classical ogg files online (CC-BY-SA-2.0). However, some are larger than 20 MB. Uploading those leads me back to a blank upload page, without comment or error. 20 MB seems to be a magical limit for PHP.
Is there a way to bypass that limit? I'd hate to have to cut perfectly good ogg files.
Magnus
On 8/21/06, Magnus Manske magnus.manske@web.de wrote:
I've found some nice classical ogg files online (CC-BY-SA-2.0). However, some are larger than 20 MB. Uploading those leads me back to a blank upload page, without comment or error. 20 MB seems to be a magical limit for PHP.
You mean a magical limit for uploading to MediaWiki? Maybe a nice person with access to the servers would copy them for you? :)
Steve
On 21/08/06, Steve Bennett stevage@gmail.com wrote:
On 8/21/06, Magnus Manske magnus.manske@web.de wrote:
I've found some nice classical ogg files online (CC-BY-SA-2.0). However, some are larger than 20 MB. Uploading those leads me back to a blank upload page, without comment or error. 20 MB seems to be a magical limit for PHP.
You mean a magical limit for uploading to MediaWiki? Maybe a nice person with access to the servers would copy them for you? :)
There are limits within MediaWiki, PHP and Apache. Magic can come from more than one direction.
Rob Church
Rob Church wrote:
On 21/08/06, Steve Bennett stevage@gmail.com wrote:
On 8/21/06, Magnus Manske magnus.manske@web.de wrote:
I've found some nice classical ogg files online (CC-BY-SA-2.0). However, some are larger than 20 MB. Uploading those leads me back to a blank upload page, without comment or error. 20 MB seems to be a magical limit for PHP.
You mean a magical limit for uploading to MediaWiki? Maybe a nice person with access to the servers would copy them for you? :)
There are limits within MediaWiki, PHP and Apache. Magic can come from more than one direction.
I don't think it's MediaWiki, as it doesn't display the file-is-too-large message, but just returns a blank upload form. And AFAIK, Apache doesn't have an upload size limit /per se/ (though you can set one, apparently, with LimitRequestBody). Thus, I'd bet on PHP to be the culprit. I know we've raised the PHP upload limit, but maybe not enough, or maybe there is a compiled-in limit?
Also, I don't think manually copying files would be a good idea. Even if you replaced an existing upload, at least the img_size field would be wrong. And manually copying a file, then manually updating the database so it shows the correct values, just to upload a larger-than-usual file is ... not good at all.
Magnus
Also, that error should at least display a message, not just return a blank upload form.
Magnus
On 21/08/06, Magnus Manske magnus.manske@web.de wrote:
I don't think it's MediaWiki, as it doesn't display the file-is-too-large message, but just returns a blank upload form. And AFAIK, Apache doesn't have an upload size limit /per se/ (though you can set one, apparently, with LimitRequestBody). Thus, I'd bet on PHP to be the culprit. I know we've raised the PHP upload limit, but maybe not enough, or maybe there is a compiled-in limit?
Yeah, I was advising Steve that more than one possible limit exists. :)
Rob Church
On 8/21/06, Rob Church robchur@gmail.com wrote:
Yeah, I was advising Steve that more than one possible limit exists. :)
Heh, I wasn't actually meaning to blame MediaWiki, but it sounded like it.
Ugly kludge of the day idea: add a "feature" where two files can be appended to create a new file. Then users could upload large files in 20 MB segments and concatenate them afterwards.
Steve
Magnus Manske wrote:
I've found some nice classical ogg files online (CC-BY-SA-2.0). However, some are larger than 20 MB. Uploading those leads me back to a blank upload page, without comment or error. 20 MB seems to be a magical limit for PHP.
Is there a way to bypass that limit? I'd hate to have to cut perfectly good ogg files.
PHP stores the entire contents of the POST request in memory, as it is receiving it. That's why we can't allow arbitrarily large uploads, the server would run out of memory. In any case, HTTP does not support resuming for uploads, so it's quite fragile.
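For reference, these are the knobs that typically cap upload size in a stock PHP/Apache setup (values here are illustrative, not Wikimedia's actual configuration). Exceeding post_max_size makes PHP drop the request data entirely, which would be consistent with the blank upload form described above:

    ; php.ini -- illustrative values only
    upload_max_filesize = 20M    ; per-file cap
    post_max_size       = 25M    ; whole-request cap; must be >= upload_max_filesize
    memory_limit        = 64M    ; relevant if, as described above, the whole POST body is held in memory

    # Apache httpd.conf -- optional request-body cap (unlimited by default)
    LimitRequestBody 26214400    # bytes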
Ideally, we should use a protocol which is designed for uploading large files efficiently and robustly. FTP is one such protocol, that's what archive.org use for their video and audio uploads. They do it like this:
1. When a web account is created, an FTP account and home directory are set up.
2. Via a PHP script, the user gives the name of the collection of files they want to upload. The script creates a directory for the upload on the FTP server.
3. The user logs in to the FTP server using their web username and password. They upload the files using an FTP client.
4. The user "checks in" the files. There is an HTML file in the FTP directory called "CLICK_HERE_WHEN_DONE.htm" which does this operation via a meta refresh. Alternatively, it's done automatically 48 hours after the directory creation.
So we could set up something like that. Or maybe we could just outsource our large file handling to them. It'd certainly save on hard drive costs, wouldn't it?
-- Tim Starling
Tim Starling wrote:
Magnus Manske wrote:
I've found some nice classical ogg files online (CC-BY-SA-2.0). However, some are larger than 20 MB. Uploading those leads me back to a blank upload page, without comment or error. 20 MB seems to be a magical limit for PHP.
Is there a way to bypass that limit? I'd hate to have to cut perfectly good ogg files.
PHP stores the entire contents of the POST request in memory, as it is receiving it. That's why we can't allow arbitrarily large uploads, the server would run out of memory. In any case, HTTP does not support resuming for uploads, so it's quite fragile.
Ideally, we should use a protocol which is designed for uploading large files efficiently and robustly. FTP is one such protocol, that's what archive.org use for their video and audio uploads.
Or we could use a "mixed" solution:
* I upload my file to a publicly accessible location (ftp or http, no matter), if it's not already online
* I call "Special:Upload?source=web"
* The upload <input> is replaced with a simple text input row for the URL
* Instead of using the PHP upload mechanism, MediaWiki just copies the file through ftp/http
Advantages:
* Simple changes to MediaWiki
* No need to set up ftp accounts etc.
Disadvantages:
* User needs a place to store files temporarily online (shouldn't be too hard these days)
* People might copy stuff from anywhere on the web (they can do that already, but only for small files ;-)). We might want to restrict this in some creative way; at least we could dynamically set the size limit (20 MB for newbies, 1 GB for admins ;-)
I'd volunteer to implement the above; sounds like just a few lines of code (with a simple hard limit, say, 100MB for everyone).
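A minimal sketch of such a copy-from-URL step with a hard byte cap, just to make the idea concrete (the function name and default cap are illustrative, not actual SpecialUpload.php code):

    <?php
    // Minimal sketch: copy a remote file to a local temp path, stopping at a hard cap.
    // Function name and default cap are illustrative only.
    function copyUploadFromUrl( $url, $destPath, $maxBytes = 104857600 /* ~100 MB */ ) {
        $in = fopen( $url, 'rb' );            // PHP's URL wrappers handle http:// and ftp://
        if ( !$in ) {
            return false;
        }
        $out = fopen( $destPath, 'wb' );
        $copied = 0;
        while ( !feof( $in ) ) {
            $chunk = fread( $in, 65536 );
            if ( $chunk === false ) {
                break;
            }
            $copied += strlen( $chunk );
            if ( $copied > $maxBytes ) {      // enforce the cap even if Content-Length lied
                fclose( $in );
                fclose( $out );
                unlink( $destPath );
                return false;
            }
            fwrite( $out, $chunk );
        }
        fclose( $in );
        fclose( $out );
        return $copied;                       // number of bytes copied
    }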
So we could set up something like that. Or maybe we could just outsource our large file handling to them. It'd certainly save on hard drive costs, wouldn't it?
Rely on others to store our valuable, multi-GB open content pr0n^W music files? Never! :-)
Magnus
Magnus Manske wrote:
Magnus Manske wrote:
I'd volunteer to implement the above; sounds like just a few lines of code (with a simple hard limit, say, 100MB for everyone).
OK, done :-)
Set $wgAllowCopyUploads to true to turn it on. $wgMaxUploadSize (default: 100 MB) limits the file size.
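For anyone who wants to try it, the LocalSettings.php side would look roughly like this (the byte value merely mirrors the 100 MB default mentioned above):

    <?php
    // LocalSettings.php (sketch)
    $wgAllowCopyUploads = true;              // enable the copy-from-URL upload source
    $wgMaxUploadSize    = 100 * 1024 * 1024; // hard size cap in bytes (~100 MB)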
There are very serious security problems with this, as discussed on IRC. (Please try to be in #mediawiki when making commits for feedback.)
-- brion vibber (brion @ pobox.com)
Hi Magnus,
Or we could use a "mixed" solution:
- I upload my file to a publicly accessible location (ftp or http, no matter), if it's not already online
- I call "Special:Upload?source=web"
- The upload <input> is replaced with a simple text input row for the URL
- Instead of using the PHP upload mechanism, MediaWiki just copies the file through ftp/http
Why are you suggesting an extra different upload page? Why not just add a radio button right there on the Upload page?
However, as Brion Vibber already mentioned, there are significant security issues with this. I have a suggestion that might solve them; if I have overlooked a security problem that this doesn't solve, please let me know.
My suggestion is thus:
* The upload page displays (if the "upload from web" option is selected) a randomly-generated token. This token is generated only once for every user, and then stays the same.
* When uploading a file, the user needs to submit two URLs:
  * One that points to a text file containing the above token
  * One to the actual file he wants to upload
* The upload is allowed only if the two files are on the same domain (or in the same directory, depending on how draconian you want it).
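A rough sketch of that check, just to make the idea concrete (the helper name and token handling are hypothetical):

    <?php
    // Hypothetical sketch of the token check described above; helper name is made up.
    function isCopyUploadAllowed( $tokenUrl, $fileUrl, $userToken ) {
        // Both URLs must point at the same host (tighten to same directory if preferred).
        $tokenHost = parse_url( $tokenUrl, PHP_URL_HOST );
        $fileHost  = parse_url( $fileUrl, PHP_URL_HOST );
        if ( !$tokenHost || $tokenHost !== $fileHost ) {
            return false;
        }
        // The text file at $tokenUrl must contain the user's stored token.
        $remoteToken = file_get_contents( $tokenUrl );
        return $remoteToken !== false && trim( $remoteToken ) === $userToken;
    }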
Ideas? Criticism? Timwi
Why is downloading from an unknown server any less secure than downloading from an unknown user? You have to ensure that the file is non-malicious either way.
On 8/24/06, Timwi timwi@gmx.net wrote:
The uploaded file itself is not the only source of a potential security problem here.
Is that BEANS I can smell?:)
Steve
Timwi wrote:
Hi Magnus,
Or we could use a "mixed" solution:
- I upload my file to a publicly accessible location (ftp or http, no matter), if it's not already online
- I call "Special:Upload?source=web"
- The upload <input> is replaced with a simple text input row for the URL
- Instead of using the PHP upload mechanism, MediaWiki just copies the file through ftp/http
Why are you suggesting an extra different upload page?
I don't.
Why not just add a radio button right there on the Upload page?
I have already implemented it. It is the same upload page, just with the textbox instead of the <input type=file>. It uses a little extra code in SpecialUpload.php, is all.
However, as Brion Vibber already mentioned, there are significant security issues with this. I have a suggestion that might solve them; if I have overlooked a security problem that this doesn't solve, please let me know.
Following concerns raised by Brion and Tim, I've rewritten the copy-from-URL part to use cURL, which makes the function less susceptible to malicious/broken sources.
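For the curious, a size cap can be enforced mid-transfer with cURL's write callback; a sketch under assumed names, not the committed code:

    <?php
    // Sketch: fetch $url with cURL, aborting as soon as more than $maxBytes arrive.
    function curlFetchWithCap( $url, $destPath, $maxBytes ) {
        $out = fopen( $destPath, 'wb' );
        $copied = 0;
        $ch = curl_init( $url );
        curl_setopt( $ch, CURLOPT_WRITEFUNCTION,
            function ( $ch, $data ) use ( $out, &$copied, $maxBytes ) {
                $copied += strlen( $data );
                if ( $copied > $maxBytes ) {
                    return 0; // returning a short count makes cURL abort the transfer
                }
                return fwrite( $out, $data );
            }
        );
        curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, false ); // don't follow redirects blindly
        curl_setopt( $ch, CURLOPT_TIMEOUT, 300 );          // give up on stalled transfers
        $ok = curl_exec( $ch );
        curl_close( $ch );
        fclose( $out );
        if ( !$ok || $copied > $maxBytes ) {
            unlink( $destPath );
            return false;
        }
        return $copied;
    }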
My suggestion is thus:
- The upload page displays (if the "upload from web" option is selected) a randomly-generated token. This token is generated only once for every user, and then stays the same.
- When uploading a file, the user needs to submit two URLs:
- One that points to a text file containing the above token
- One to the actual file he wants to upload
- The upload is allowed only if the two files are on the same domain (or in the same directory, depending on how draconian you want it).
This isn't really a security feature, as an Evil User (tm) can still upload any file (s)he wants.
It could, however, be a measure against newbies trying to copy random files from the web. They can already do that right now - they only have to save the file locally first, as long as it's not too large. So it would only prevent newbies without web space of their own from uploading large files. Is that really worth the bother?
If activated, my implementation by default only grants admins the right to upload large files. So, to solve my original problem, I'd have to find a commons admin, and write on his/her talk page to please upload the files I stored at (URL), maybe give the file description/license there or insert it myself once it's up. As long as the overall number of large files to upload is low, that should work just fine.
Or I'll have to run for admin myself. I have a feeling I might be accepted ;-)
Magnus
Actually, the "upload from web" available for anyone would improve things, as there wouldn't be so much "no source". We could find the url from where it was caught. I am assuming the original url appearing on the summary. Now, the instructions to "move to commons" are download file, go to commons, fill summary, *upload file*, and press upload. It could be, as simple as go to commons, fill summary, fill url, upload. Even easier if you use the commonshelper, as it would fill almost all for you. I think the part where i spent most time in the 'to commons' process is in the down/uploading, as the file full of bytes must cross the net. Then you could add more tricks to it, as having a bot checking license for uploads from flickr, auto-nfd blacklisted urls, etc.
Platonides
On Wed, Aug 23, 2006 at 03:03:27PM +0200, Platonides wrote:
Actually, the "upload from web" available for anyone would improve things, as there wouldn't be so much "no source". We could find the url from where it was caught. I am assuming the original url appearing on the summary.
It occurs to *me* that logging the source URL in the file history might be useful in many circumstances.
Cheers -- jra
I have already implemented it. It is the same upload page, just with the textbox instead of the <input type=file>.
Still sounds like a separate page (for the user) instead of a simple radio button. Why?
This isn't really a security feature, as an Evil User (tm) can still upload any file (s)he wants.
People can *already* upload any file they want (subject to size restrictions). I wasn't addressing that problem because it is not a problem of "upload via URL". I was trying to address the security issues that come from the user's ability to cause the server to perform any GET request to any server. But now that I think about it more, I haven't actually solved that issue at all: the necessity to retrieve the "token file" would still grant the user that ability... so scratch it all :)
So, to solve my original problem, I'd have to find a commons admin, and write on his/her talk page to please upload the files I stored at (URL), maybe give the file description/license there or insert it myself once it's up.
(You can actually place that information on the Image page even before it's up.)
Timwi
On 8/24/06, Timwi timwi@gmx.net wrote:
I was trying to address the security issues that come from the user's ability to cause the server to perform any GET request to any server.
This is a problem why, provided the server is careful about what it does with the response? It could potentially be used for, e.g., flooding a third party's server, but it wouldn't be hard to restrict the harm that could do (by throttling), and no one could do much more damage that way than they could do without the WMF's help. An overwhelming number of massive, reputable sites are willing to execute arbitrary GET requests -- it's necessary for spidering, to begin with.
Given that this feature is *not* currently implemented, I see no reason not to discuss its possible implications openly.
On 8/24/06, Simetrical Simetrical+wikitech@gmail.com wrote:
This is a problem why, provided the server is careful about what it does with the response? It could potentially be used for, e.g., flooding a third party's server, but it wouldn't be hard to restrict the harm that could do (by throttling), and no one could do much more damage that way than they could do without the WMF's help. An overwhelming number of massive, reputable sites are willing to execute arbitrary GET requests -- it's necessary for spidering, to begin with.
Given that this feature is *not* currently implemented, I see no reason not to discuss its possible implications openly.
I was trying to think of reasons too, and couldn't come up with much. Maybe since the GET operation is not exactly the same as the HTTP upload operation (again I don't know what I'm talking about), there would be a way of forcing MediaWiki to download something harmful to itself, such as an executable, or a file that would cause a buffer overrun? What if you set up a dodgy server that said it was going to download you a nice little .gif file, and instead sent you 10 gig of executable?
There are certainly lots of good uses for this feature, like slurping images from public domain repositories. Security aside, we shouldn't let the poor self-restraint of pokemon fans dictate what we can and can't do...
Steve
On 8/24/06, Steve Bennett stevage@gmail.com wrote:
Maybe since the GET operation is not exactly the same as the HTTP upload operation (again I don't know what I'm talking about), there would be a way of forcing MediaWiki to download something harmful to itself, such as an executable, or a file that would cause a buffer overrun?
Buffer overruns are critical security flaws that are always specific to a particular implementation's misprogramming. PHP does not have any known buffer overruns; if it did, they'd likely be patched within days if not hours.
What if you set up a dodgy server that said it was going to download you a nice little .gif file, and instead sent you 10 gig of executable?
Same as if a user tries to submit 10 gigs of executable as an uploaded image: you either discard it, or interpret it as whatever it's claimed to be.
Steve Bennett wrote:
On 8/24/06, Simetrical Simetrical+wikitech@gmail.com wrote:
This is a problem why, provided the server is careful about what it does with the response? It could potentially be used for, e.g., flooding a third party's server, but it wouldn't be hard to restrict the harm that could do (by throttling), and no one could do much more damage that way than they could do without the WMF's help. An overwhelming number of massive, reputable sites are willing to execute arbitrary GET requests -- it's necessary for spidering, to begin with.
Given that this feature is *not* currently implemented, I see no reason not to discuss its possible implications openly.
I was trying to think of reasons too, and couldn't come up with much. Maybe since the GET operation is not exactly the same as the HTTP upload operation (again I don't know what I'm talking about), there would be a way of forcing MediaWiki to download something harmful to itself, such as an executable, or a file that would cause a buffer overrun? What if you set up a dodgy server that said it was going to download you a nice little .gif file, and instead sent you 10 gig of executable?
My implementation uses a hard limit (default: 100MB) and won't copy any more data than that, even if the size is reported wrong (maliciously or otherwise).
Magnus
Simetrical wrote:
On 8/24/06, Timwi timwi@gmx.net wrote:
I was trying to address the security issues that come from the user's ability to cause the server to perform any GET request to any server.
This is a problem why, provided the server is careful about what it does with the response?
It's not the response that's the problem, it's the GET request itself.
Suppose some stupid web programmer programmed a forum where you can delete posts with a GET request. If you can fire GET requests to any server from Wikimedia's servers, then the forum's servers will only log Wikimedia's IPs, and the mass-deletion forum vandal is now untraceable.
I'm sure there are even more significant cases that I haven't thought of.
Timwi
On 8/24/06, Timwi timwi@gmx.net wrote:
Suppose some stupid web programmer programmed a forum where you can delete posts with a GET request. If you can fire GET requests to any server from Wikimedia's servers, then the forum's servers will only log Wikimedia's IPs, and the mass-deletion forum vandal is now untraceable.
1) Most web programmers aren't that stupid.
2) Even if they were that stupid, they wouldn't be stupid enough to allow an IP address completely unknown to their server to do anything bad to it.
3) Even if they were *that* stupid (and we're currently talking serious, serious stupid), they would have
On 24/08/06, Simetrical Simetrical+wikitech@gmail.com wrote:
On 8/24/06, Timwi timwi@gmx.net wrote:
Suppose some stupid web programmer programmed a forum where you can delete posts with a GET request. If you can fire GET requests to any server from Wikimedia's servers, then the forum's servers will only log Wikimedia's IPs, and the mass-deletion forum vandal is now untraceable.
- Most web programmers aren't that stupid.
Pfft. No, most web programmers are worse.
- Even if they were that stupid, they wouldn't be stupid enough to
allow an IP address completely unknown to their server to do anything bad to it.
Yes they bloody would...
- Even if they were *that* stupid (and we're currently talking
serious, serious stupid), they would have
Would have what?
Rob Church
Argh, post got sent too early.
On 8/24/06, Timwi timwi@gmx.net wrote:
Suppose some stupid web programmer programmed a forum where you can delete posts with a GET request. If you can fire GET requests to any server from Wikimedia's servers, then the forum's servers will only log Wikimedia's IPs, and the mass-deletion forum vandal is now untraceable.
1) Most web programmers aren't that stupid.
2) Even if they were that stupid, they wouldn't be stupid enough to allow an IP address completely unknown to their server to do anything bad to it.
3) Even if they were *that* stupid (and we're currently talking serious, serious stupid), even if it could cause irreparable harm to their website, in fact even if following arbitrary GET requests would bring about the Apocalypse and plunge the Earth into a bath of fire, it wouldn't matter that we did so, because there are literally tens of thousands of sites that will do it for you. Any web spider *automatically* sends *millions* of arbitrary GET requests, and has to for the Internet as we know it to function. There is no way that sending arbitrary GET requests can hurt *anything*.
I'm sure there are even more significant cases that I haven't thought of.
See point 3 above. If there were good reasons for not following arbitrary GET requests, Google would not exist.
On 24/08/06, Simetrical Simetrical+wikitech@gmail.com wrote:
See point 3 above. If there were good reasons for not following arbitrary GET requests, Google would not exist.
GOOGLE DELETED MY WIKI! THIEVING INFORMATION-SCABBING VANDALS!111oneone
You're correct, of course, but it doesn't mean we shouldn't be careful, too. :)
Rob Church
On 8/25/06, Rob Church robchur@gmail.com wrote:
You're correct, of course, but it doesn't mean we shouldn't be careful, too. :)
In any case we obviously wouldn't allow this for anon users, and presumably we would do something nasty to anyone who caused us to get a complaint like "Your user X deleted my website using GET requests!"
Steve
Steve Bennett wrote:
On 8/25/06, Rob Church robchur@gmail.com wrote:
You're correct, of course, but it doesn't mean we shouldn't be careful, too. :)
In any case we obviously wouldn't allow this for anon users
AFAIK, we don't allow uploads for anon users anyway...
, and presumably we would do something nasty to anyone who caused us to get a complaint like "Your user X deleted my website using GET requests!"
The only punishment severe enough for letting random IPs GET-delete stuff would be to flood that server with more GET requests ;-)
Magnus
Simetrical wrote:
Any web spider *automatically* sends *millions* of arbitrary GET requests, and has to for the Internet as we know it to function. There is no way that sending arbitrary GET requests can hurt *anything*.
That is simply not true. Web spiders only follow links.
The kinds of webmasters we are talking about here will assume that you can never fire a given GET URL if you never see a page with a link to it on it.
Timwi
Timwi wrote:
Simetrical wrote:
Any web spider *automatically* sends *millions* of arbitrary GET requests, and has to for the Internet as we know it to function. There is no way that sending arbitrary GET requests can hurt *anything*.
That is simply not true. Web spiders only follow links.
The kinds of webmasters we are talking about here will assume that you can never fire a given GET URL if you never see a page with a link to it on it.
Timwi
Still, it would be a shame to lose such an obviously useful feature just to protect badly-programmed websites from a very weak attack that can easily be launched in myriad other ways. From my own experience with the Commons, it can be really tedious to upload even a medium-sized file, and this feature would totally solve that.
Charlie
Charlie Reams wrote:
Timwi wrote:
Simetrical wrote:
Any web spider *automatically* sends *millions* of arbitrary GET requests, and has to for the Internet as we know it to function. There is no way that sending arbitrary GET requests can hurt *anything*.
That is simply not true. Web spiders only follow links.
The kinds of webmasters we are talking about here will assume that you can never fire a given GET URL if you never see a page with a link to it on it.
Timwi
Still, it would be a shame to lose such an obviously useful feature just to protect badly-programmed websites from a very weak attack that can easily be launched in myriad other ways. From my own experience with the Commons, it can be really tedious to upload even a medium-sized file, and this feature would totally solve that.
Well yes, I completely agree with that. I am only pointing out things; I am not against the feature.
On 25/08/06, Timwi timwi@gmx.net wrote:
The kinds of webmasters we are talking about here will assume that you can never fire a given GET URL if you never see a page with a link to it on it.
(Which is still a damn stupid assumption to make)
Not as bad as the ones who allow elementary SQL injection, etc. etc. There's probably still hundreds of thousands of web sites out there with basic flaws in. :)
Rob Church
On Fri, Aug 25, 2006 at 06:33:51PM +0100, Rob Church wrote:
On 25/08/06, Timwi timwi@gmx.net wrote:
The kinds of webmasters we are talking about here will assume that you can never fire a given GET URL if you never see a page with a link to it on it.
(Which is still a damn stupid assumption to make)
Not as bad as the ones who allow elementary SQL injection, etc. etc. There's probably still hundreds of thousands of web sites out there with basic flaws in. :)
Indeed; I can easily visualize a forum message page with a Delete Me link right on it.
Further authorization should clearly be required for that to actually happen, but the concept of such a link *existing* on a page isn't by any means beyond the pale...
Cheers, -- jra
On 8/25/06, Timwi timwi@gmx.net wrote:
That is simply not true. Web spiders only follow links.
And since when is following an HTTP link *not* sending a GET request? You'd have to Google-bomb it to get Google to do it, granted, and that only hours or days later, but many other sites (ImageShack comes to mind) will execute arbitrary GET requests immediately upon request. Heck, you could even grab some random stranger's e-mail address and say "Hey, follow this cool link!". Or just use an ISP that uses proxies. Or find a high-quality open proxy. Or use Tor. Or . . .
. . . you get the picture. There is literally *no* *security* *reason* *at all* for MediaWiki to not send arbitrary GET requests. Period. The only difference from our side is that we have a GET response instead of a POST, which is no security difference at all, and if anything can harm the recipient (which it overwhelmingly can't), we aren't going to make an already trivial task any easier.
Timwi wrote:
[someone else wrote:]
. . . you get the picture. There is literally *no* *security* *reason* *at all* for MediaWiki to not send arbitrary GET requests. Period.
OK, here's one scenario. This feature could be used for denial-of-service attacks against other sites, by using Wikipedia's high-bandwidth server farm as a download bandwidth amplifier: an attacker could simply set many downloads going at once to one server, at the cost of trivial bandwidth overhead to set up each connection.
-- N
Okay then, go ahead and introduce the feature :-)
On 8/26/06, Neil Harris neil@tonal.clara.co.uk wrote:
OK, here's one scenario. This feature could be used for denial-of-service attacks against other sites, by using Wikipedia's high-bandwidth server farm as a download bandwidth amplifier: an attacker could simply set many downloads going at once to one server, at the cost of trivial bandwidth overhead to set up each connection.
You could pretty much rule that out by limiting downloads to one at a time per login. And you could do that simply by checking the time since the last download started, and making sure it was at least 10 minutes ago or something. Or to be nicer, check when the *second last* download started, in case they made a mistake and want to try again.
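A toy sketch of that "second-last download" rule (where and how the timestamps are stored is left out and hypothetical):

    <?php
    // Toy sketch: allow a new copy-upload only if the second-most-recent one
    // started more than $window seconds ago. Timestamp storage is not shown.
    function canStartCopyUpload( array $recentStartTimes, $now, $window = 600 ) {
        rsort( $recentStartTimes );          // newest first
        if ( count( $recentStartTimes ) < 2 ) {
            return true;                     // fewer than two earlier downloads: always fine
        }
        return ( $now - $recentStartTimes[1] ) > $window;
    }

    // Example: downloads started 30 s and 700 s ago -> allowed, since the
    // second-last one is older than the 10-minute window.
    var_dump( canStartCopyUpload( array( time() - 30, time() - 700 ), time() ) ); // bool(true)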
Steve
Steve Bennett wrote:
On 8/26/06, Neil Harris neil@tonal.clara.co.uk wrote:
OK, here's one scenario. This feature could be used for denial-of-service attacks against other sites, by using Wikipedia's high-bandwidth server farm as a download bandwidth amplifier: an attacker could simply set many downloads going at once to one server, at the cost of trivial bandwidth overhead to set up each connection.
You could pretty much rule that out by limiting downloads to one at a time per login. And you could do that simply by checking the time since the last download started, and making sure it was at least 10 minutes ago or something. Or to be nicer, check when the *second last* download started, in case they made a mistake and want to try again.
Steve
That check would be easily worked around by, for example, creating many different accounts, and launching an attack from each one. Yes, you can build countermeasures to try and stop that, but there are counter-counter-measures to those, and so on...
-- Neil
On 8/26/06, Neil Harris neil@tonal.clara.co.uk wrote:
OK, here's one scenario. This feature could be used for denial-of-service attacks against other sites, by using Wikipedia's high-bandwidth server farm as a download bandwidth amplifier: an attacker could simply set many downloads going at once to one server, at the cost of trivial bandwidth overhead to set up each connection.
Nothing that can't be done already with, say, ImageShack. We could throttle per IP as well as per user (to a higher rate than one per ten minutes, though), and if someone's going to use lots of anonymous proxies or a botnet, they could just use them to download directly. We could also provide X-Forwarded-For to indicate directly who's causing the trouble, unless we're going to suppress that for privacy reasons (which would be slightly ironic given our situation with ISPs like AOL).
If someone uses Wikipedia for abuse, obviously that person could be dealt with. The abuse isn't disastrous, many existing sites would enable it equally well, and so it shouldn't be held against a potentially quite useful feature in the slightest.
The problem could easily be solved. When we ask the user to upload a text file containing the information, we can also specify what that file should be named. For example, if user X wants to load a file from ftp://name:pw@example.org/~user/myfile.ogg, MediaWiki could automatically look for ftp://name:pw@example.org/~user/mediawiki_access_id.txt (or something like that). So the user cannot specify an arbitrary GET target - thereby preventing the behaviour we fear.
greetings
On Friday, 25 August 2006 00:33, Timwi wrote:
Simetrical wrote:
On 8/24/06, Timwi timwi@gmx.net wrote:
I was trying to address the security issues that come from the user's ability to cause the server to perform any GET request to any server.
This is a problem why, provided the server is careful about what it does with the response?
It's not the response that's the problem, it's the GET request itself.
Suppose some stupid web programmer programmed a forum where you can delete posts with a GET request. If you can fire GET requests to any server from Wikimedia's servers, then the forum's servers will only log Wikimedia's IPs, and the mass-deletion forum vandal is now untraceable.
I'm sure there are even more significant cases that I haven't thought of.
Timwi
On 8/25/06, Warhog mediazilla@warhog.net wrote:
The problem could easily be solved. When we ask the user to upload a text file containing the information, we can also specify what that file should be named. For example, if user X wants to load a file from ftp://name:pw@example.org/~user/myfile.ogg, MediaWiki could automatically look for ftp://name:pw@example.org/~user/mediawiki_access_id.txt (or something like that). So the user cannot specify an arbitrary GET target - thereby preventing the behaviour we fear.
Err, that would kill off every use of this feature for me. If I already had a file on a machine that I control, I would simply upload it like normal. The point (I thought) was to avoid having to transfer a file to a machine that you control before uploading it.
It seems that:
* there are no real security issues with allowing arbitrary GETs to arbitrary sites (if throttled and restricted to some reasonable number of GETs per hour, like maybe 10-60)
* since it could make it even easier to upload copyrighted content, it should be a privilege that can be revoked from people
* it is fairly easy to implement
Therefore: Let's do it. (someone?) :)
Steve
On 25/08/06, Steve Bennett stevage@gmail.com wrote:
- it is fairly easy to implement
Therefore: Let's do it. (someone?) :)
You know how to submit a patch. We'll review it once you're done.
Rob Church
On 8/25/06, Rob Church robchur@gmail.com wrote:
You know how to submit a patch. We'll review it once you're done.
See my other post where I offer my lame excuses about not yet knowing PHP, and going on holidays. :) (I'm not kidding about the lameness)
Steve
On 25/08/06, Steve Bennett stevage@gmail.com wrote:
On 8/25/06, Rob Church robchur@gmail.com wrote:
You know how to submit a patch. We'll review it once you're done.
See my other post where I offer my lame excuses about not yet knowing PHP, and going on holidays. :) (I'm not kidding about the lameness)
In this case, I was semi-joking. :)
It's like those bug reports...I think this (semi-remembered) quote from Avar in IRC tells it well...
<avar> Yeah, "I do not predict that this would be too hard to implement" <avar> Well, why don't I predict you writing it and submitting a patch, then?
Rob Church
"Timwi" timwi@gmx.net wrote in message news:ecl9gm$dij$1@sea.gmane.org...
Simetrical wrote:
On 8/24/06, Timwi timwi@gmx.net wrote:
I was trying to address the security issues that come from the user's ability to cause the server to perform any GET request to any server.
This is a problem why, provided the server is careful about what it does with the response?
It's not the response that's the problem, it's the GET request itself.
Suppose some stupid web programmer programmed a forum where you can delete posts with a GET request. If you can fire GET requests to any server from Wikimedia's servers, then the forum's servers will only log Wikimedia's IPs, and the mass-deletion forum vandal is now untraceable.
I'm sure there are even more significant cases that I haven't thought of.
Timwi
It would not be hard to include appropriate trace information in the headers (referrer & useragent), which will show up in the remote website's logs. For example, IP/username of the uploader, link back to the resulting image page, etc.
We could even set the referrer URL to a non-editable page giving full details about the specific request, with further links that explain the feature and how it works, give details about how to report abuse/copyright infringement, etc.
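With a cURL-based fetch, adding such trace headers is only a couple of options; a sketch with purely illustrative values:

    <?php
    // Sketch: identify the wiki and the uploader in the outgoing request, so the
    // remote site's logs show who triggered the fetch. All values are illustrative.
    $url = 'http://www.example.com/some-file.ogg';
    $ch = curl_init( $url );
    curl_setopt( $ch, CURLOPT_USERAGENT,
        'MediaWiki copy-upload (uploader: ExampleUser; abuse reports: http://example.org/wiki/Project:Copy_uploads)' );
    curl_setopt( $ch, CURLOPT_REFERER, 'http://example.org/wiki/Special:Upload' );
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
    $data = curl_exec( $ch );
    curl_close( $ch );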
- Mark Clements (HappyDog)
"Mark Clements" wrote:
It would not be hard to include appropriate trace information in the headers (referrer & useragent), which will show up in the remote website's logs. For example, IP/username of the uploader, link back to the resulting image page, etc.
We could even set the referrer URL to a non-editable page giving full details about the specific request, with further links that explain the feature and how it works, give details about how to report abuse/copyright infringement, etc.
- Mark Clements (HappyDog)
We could even send a From: header pointing to a verified email address (with BIG letters saying so!!), directing the webmaster to the email's owner. But I don't see it being too useful.
P.S. There's already an implementation in SVN.
On 8/24/06, Timwi timwi@gmx.net wrote:
I was trying to address the security issues that come from the user's ability to cause the server to perform any GET request to any server. But now that I think about it more, I haven't actually solved that issue at all: the necessity to retrieve the "token file" would still grant the user that ability... so scratch it all :)
How is this solved in Open-ID implementations?
On 8/22/06, Tim Starling t.starling@physics.unimelb.edu.au wrote:
Or maybe we could just outsource our large file handling to them. It'd certainly save on hard drive costs, wouldn't it?
I think that would make sense, yes. Archive.org could get API access to push new metadata records to Commons, so that the files can be referenced using standard syntax, but external URLs are used whenever they are directly pointed to.
Put it in the non-existent Wikimedia roadmap. ;-)
Out of interest, how big is Wikipedia now? I mean the actual InnoDB file on the master server, and all the images in the media folder - not the ones that are released to the public.
Kind regards,
Alex
On 8/22/06, Erik Moeller eloquence@gmail.com wrote:
On 8/22/06, Tim Starling t.starling@physics.unimelb.edu.au wrote:
Or maybe we could just outsource our large file handling to them. It'd certainly save on hard drive costs, wouldn't it?
I think that would make sense, yes. Archive.org could get API access to push new metadata records to Commons, so that the files can be referenced using standard syntax, but external URLs are used whenever they are directly pointed to.
Put it in the non-existent Wikimedia roadmap. ;-)
Peace & Love, Erik _______________________________________________ Wikitech-l mailing list Wikitech-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l
Tim Starling wrote:
PHP stores the entire contents of the POST request in memory, as it is receiving it. That's why we can't allow arbitrarily large uploads, the server would run out of memory. In any case, HTTP does not support resuming for uploads, so it's quite fragile.
The file-sharing service YouSendIt.com, which now only offers 100 MB free uploads, used to support uploading files of up to 1 GB with nothing more than a basic web form.
Maybe they are willing to share how they do that, if asked.
Another idea might be to use one dedicated server for uploading large files. When you submit a request to upload a large file, it is not uploaded immediately but queued.
The user would then get a message like: "There are 14 uploads before you. Estimated time to completion: 3 hours 53 minutes."
That is not a fantastic system, but at least you could upload large files.
Also, when you currently upload a file in an unsupported format, I have the strong impression that the check of whether it is allowed happens only after the file has been uploaded. That is annoying for the user and a waste of resources.
On 26/08/06, Walter Vermeir walter@wikipedia.be wrote:
When you currently upload a file in an unsupported format, I have the strong impression that the check of whether it is allowed happens only after the file has been uploaded. That is annoying for the user and a waste of resources.
The pre-upload check couldn't determine that the user *was* passing a file which was, e.g., a GIF, a JPEG, or a PNG; that has to happen through server-side MIME detection, which is done post facto.
The most we could do is to check that the extension was on the allowed list; we'd still have to check *what* the user uploaded afterwards and make sure it was still allowed.
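Roughly, the two checks could look like this (a sketch; the allowed list and helper names are illustrative, not MediaWiki's actual code):

    <?php
    // Cheap pre-upload check: is the claimed extension on the allowed list?
    function extensionAllowed( $filename, array $allowed = array( 'png', 'gif', 'jpg', 'jpeg', 'ogg' ) ) {
        $ext = strtolower( pathinfo( $filename, PATHINFO_EXTENSION ) );
        return in_array( $ext, $allowed );
    }

    // Authoritative post-upload check: look at what actually arrived on the server.
    function detectedMimeType( $tempPath ) {
        return mime_content_type( $tempPath ); // server-side detection of the real content
    }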
Rob Church
On 8/26/06, Rob Church robchur@gmail.com wrote:
The most we could do is to check that the extension was on the allowed list; we'd still have to check *what* the user uploaded afterwards and make sure it was still allowed.
If by any chance, anyone works on this, would it also be possible to do the name check *before* the upload, rather than afterwards? It's happened to me a couple of times that I've uploaded a file of several meg, wandered off, and found out later that it's waiting for me to confirm that I really do want to take the incredibly drastic step of replacing the spaces in the name with underscores.
(even better, skip that confirmation altogether)
Steve
On 26/08/06, Steve Bennett stevage@gmail.com wrote:
If by any chance, anyone works on this, would it also be possible to do the name check *before* the upload, rather than afterwards? It's happened to me a couple of times that I've uploaded a file of several meg, wandered off, and found out later that it's waiting for me to confirm that I really do want to take the incredibly drastic step of replacing the spaces in the name with underscores.
That particular check should be binned, there's no real point in it; the image is addressed the same when editing, regardless.
You should find that the upload happens once; the code behind it "stashes" the file elsewhere when presenting warnings, and passes back a hidden token which lets MediaWiki know about the save state.
Rob Church
On 8/26/06, Rob Church robchur@gmail.com wrote:
You should find that the upload happens once; the code behind it "stashes" the file elsewhere when presenting warnings, and passes back a hidden token which lets MediaWiki know about the save state.
Yeah, I didn't mean that the upload happens twice; the process is like:
1. Press upload
2. Make coffee
3. Confirm that you want to rename the file
4. File is available
I don't know how long you can wait between steps 3 and 4.
Steve
Name errors can be avoided by JavaScript checks: "Please add an extension", "No, I don't want .jpeg files, use .jpg", and so on...