Greetings,
I'm trying to import a file of approx 1 Megabyte (1014 KB to be precise) but the import stops around 20% of the way and returns a blank page. I've been able to exclude a few things:
1) this isn't a problem with the import: it works with other (smaller) files.
2) it isn't a problem with PHP limits: they are set to 2 MB
3) it isn't a problem with the hidden input field, also set to 2 MB
4) it isn't a problem with the content of the file: the import stops at around 20% of the way, but not on the same page every time.
Am I missing something? Could it be a webserver timeout? Could it be a MySQL timeout?
Thanks for your help.
Manu
Emanuele D'Arrigo wrote:
Greetings,
I'm trying to import a file of approx 1 Megabyte (1014 KB to be precise) but the import stops around 20% of the way and returns a blank page. I've been able to exclude a few things:
- this isn't a problem with the import: it works with other (smaller) files.
- it isn't a problem with PHP limits: they are set to 2 MB
- it isn't a problem with the hidden input field, also set to 2MB
- it isn't a problem with the content of the file: the import stops at
around 20% of the way, but not on the same page every time.
Am I missing something? Could it be a webserver timeout? Could it be a MySQL timeout?
Thanks for your help.
Manu
A php timeout? http://www.php.net/set_time_limit
For importing files, it's better to use a command line script (which will also be faster).
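Something along these lines; WIKI and PHP are placeholders for your own MediaWiki directory and PHP binary, so adjust them to your installation:

```shell
# Sketch only: WIKI and PHP below are placeholders, not real paths.
import_dump() {
    WIKI=${WIKI:-/var/www/mediawiki}
    PHP=${PHP:-php}
    if [ ! -f "$1" ]; then
        echo "no such dump: $1" >&2
        return 1
    fi
    # importDump.php reads the XML dump from stdin;
    # rebuildrecentchanges.php makes the imported pages show up
    # in Special:RecentChanges afterwards.
    "$PHP" "$WIKI/maintenance/importDump.php" < "$1" &&
        "$PHP" "$WIKI/maintenance/rebuildrecentchanges.php"
}
# usage: import_dump /path/to/dump.xml
```

Being a plain script, it also sidesteps the web server and PHP request-time limits entirely.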
On 7/26/07, Platonides Platonides@gmail.com wrote:
A php timeout? http://www.php.net/set_time_limit
Thank you for this suggestion. I've investigated the matter by reading the page you pointed me to. It appears that set_time_limit() is subject to the max_execution_time setting in php.ini; the function just resets the timeout counter. In my case max_execution_time is set to 3000 seconds, which is plenty, yet my imports stop after about 60 seconds. In fact, in one case I also received an error message which unfortunately I didn't manage to record, but it said the timeout was enforced in includes/GlobalFunctions.php. Unfortunately I was not able to locate a meaningful timeout value to increase in it.
For importing files, it's better to use a command line script (which will also be faster).
I guess you are referring to maintenance/importDump.php. Indeed, if I don't find a workaround, that option will have to do. I was hoping not to go down that route, though. Had it been for myself only, as a one-off thing, it wouldn't be a problem. But whatever workflow I find is for the users of an application that exports XML, transformed through an XSLT stylesheet. Ideally they would just export the file and import it into MediaWiki through the Special:Import page; it's crucial to keep it that simple for them. Luckily, this all works for small files (<150-180 KB), which will help them already. But I'd love for them to be able to handle much bigger ones.
Besides, let's face it: if MediaWiki has a 2 MB limit, I should be able to upload at least 1.5 MB without a problem! No?
Ciao!
Manu
2007/7/26, Emanuele D'Arrigo manu3d@gmail.com:
Am I missing something? Could it be a webserver timeout? Could it be a MySQL timeout?
Thanks for your help.
Is it your own web server? If not, many (most?) webhosters have limits on the amount of time a script is allowed to run before it's automatically aborted.
On 7/26/07, Schneelocke schneelocke@gmail.com wrote:
2007/7/26, Emanuele D'Arrigo manu3d@gmail.com:
Am I missing something? Could it be a webserver timeout? Could it be a MySQL timeout?
Is it your own web server? If not, many (most?) webhosters have limits on the amount of time a script is allowed to run before it's automatically aborted.
It's an internal company webserver, Windows-based, IIS I think. I'll ask the sysadmins about this. So you're saying the webserver software has a limit on top of the PHP limit?
Ciao!
Manu
On Jul 26, 2007, at 9:57 AM, Emanuele D'Arrigo wrote:
Greetings,
I'm trying to import a file of approx 1 Megabyte (1014 KB to be precise) but the import stops around 20% of the way and returns a blank page. I've been able to exclude a few things:
1) this isn't a problem with the import: it works with other (smaller) files.
2) it isn't a problem with PHP limits: they are set to 2 MB
3) it isn't a problem with the hidden input field, also set to 2 MB
4) it isn't a problem with the content of the file: the import stops at around 20% of the way, but not on the same page every time.
Are you sure about this? 90% of my import problems are from invalid XML in the dump, usually unescaped html entities or bad characters in the page titles.
Am I missing something? Could it be a webserver timeout? Could it be a MySQL timeout?
Thanks for your help.
Manu
MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/mediawiki-l
=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU, Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054
On 7/26/07, Jim Hu jimhu@tamu.edu wrote:
- it isn't a problem with the content of the file: the import
stops at around 20% of the way, but not on the same page every time.
Are you sure about this? 90% of my import problems are from invalid XML in the dump, usually unescaped html entities or bad characters in the page titles.
I'm quite sure, because when I split the file into smaller ones they import fine. Out of my many import attempts, -one- generated an error message rather than an empty page. Unfortunately I was not able to record the message precisely, but it referred to a timeout limit somewhere in includes/GlobalFunctions.php. I have examined the source code of that file but wasn't able to find a timeout value to change.
Is this a problem that I should refer to wikitech-l?
Ciao!
Manu
(... large imports failing through the web interface likely due to timeouts...) I just realized there might be an easy way out of this.
My biggest concern right now is ease of use. A set of users generates a lot of data through a database, and I'd like this data to be uploaded into MW for the benefit of all other users. The Special:Import page is easy enough for them to use, but has the problem I described in previous messages. Even raising the timeout tolerances, e.g. in php.ini or in the webserver if necessary, is unlikely to solve this once and for all, especially given the lack of error messages, which would leave users puzzled whenever those tolerances are exceeded.
Unfortunately using command line tools such as those in mediawiki/maintenance would be uncomfortable to those users generating the data.
I've been wondering then: could a scheduled event (a cronjob or a windows equivalent) run a script to check the content of an import directory where users put the files to be imported in MW, and, if new files are found, trigger the execution of the import script? The same thing could be done with images I'd imagine.
This way the users would only have to worry about saving or transferring the files to be imported in the right location, confident that they'd be uploaded, say, every 15 minutes.
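Roughly what I have in mind, with every path a placeholder for the real setup, would be a script run from cron every 15 minutes:

```shell
# Cron-driven import sketch. IMPORT_DIR, DONE_DIR, WIKI and PHP are all
# assumptions -- point them at your own setup. Schedule with e.g.:
#   */15 * * * * /usr/local/bin/import_pending.sh
import_pending() {
    IMPORT_DIR=${IMPORT_DIR:-/var/wiki-imports}
    DONE_DIR=${DONE_DIR:-$IMPORT_DIR/done}
    WIKI=${WIKI:-/var/www/mediawiki}
    PHP=${PHP:-php}
    for f in "$IMPORT_DIR"/*.xml; do
        [ -e "$f" ] || continue                # glob matched nothing
        "$PHP" "$WIKI/maintenance/importDump.php" < "$f" &&
            mv "$f" "$DONE_DIR/"               # leave the file in place if the import failed
    done
}

import_pending
```

Failed files stay in the import directory, so at least there is a visible trace when something goes wrong.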
Does anybody see any drawbacks with this method?
Ciao!
Manu
On 29/07/07, Emanuele D'Arrigo manu3d@gmail.com wrote:
Does anybody see any drawbacks with this method?
Sounds pretty solid to me.
Rob Church
Hi there,
Does anyone know if it is possible to create "friendly urls" for MediaWiki on a Windows server?
Thanks,
Kristin
Does anyone know if it is possible to create "friendly urls" for mediawiki on a windows server?
Friendly URLs? As in eliminating index.php? This should cover it: http://www.mediawiki.org/wiki/Manual:Short_URL
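If you happen to be running Apache (it runs on Windows too), the manual's recipe boils down to something like the following; /w and /wiki are just the conventional directory names, so adjust to taste:

```apache
# LocalSettings.php side:
#   $wgScriptPath  = "/w";
#   $wgArticlePath = "/wiki/$1";
# Apache side (httpd.conf or a vhost):
RewriteEngine On
RewriteRule ^/?wiki/(.*)$ /w/index.php?title=$1 [L,QSA]
```

On IIS you would need a rewriting add-on (e.g. an ISAPI rewrite filter) carrying an equivalent rule.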
HTH.
-- F.
Thank you, Frederik. Yes, I want to change: http://example.com/wiki/index.php?title=PageTitle to: http://example.com/wiki/PageTitle
I am a little green at this, but I have been advised that "friendly urls" will increase my visibility in the search engines. Does anyone have experience with this? And is one way of creating "friendly urls" better than another, as far as the search engines go?
Thanks and thanks, Kristin
----- Original Message -----
From: "Frederik Dohr" fdg001@gmx.net
To: "MediaWiki announcements and site admin list" mediawiki-l@lists.wikimedia.org
Sent: Sunday, July 29, 2007 1:10 PM
Subject: Re: [Mediawiki-l] Friendly URLs on Windows Server?
Does anyone know if it is possible to create "friendly urls" for mediawiki on a windows server?
Friendly URLs? As in eliminating index.php? This should cover it: http://www.mediawiki.org/wiki/Manual:Short_URL
HTH.
-- F.
On 29/07/07, Kristin kristin@traditionalwellness.com wrote:
Thank you, Frederik. Yes, I want to change: http://example.com/wiki/index.php?title=PageTitle to: http://example.com/wiki/PageTitle
It's easier to have the real folder and the alias with different names, i.e. don't have /wiki/ in both URLs.
I am a little green with this, but have been advised that "friendly urls" will increase my visibility in the search engines. Does anyone have experience with this? And is there one better way to create "friendly urls" than another, as far as the search engines go?
Done correctly, the fact that they aren't the real URLs should be completely transparent to the outside world, so I can't see it making any difference how you do it. Any short URL should work just as well as any other.
I'm sure that those who know better will correct me, but I don't think most search engines care about the difference between
http://mywiki.org/wiki/pagename
and
http://mywiki.org/wiki/index.php/pagename (which works without changing the base installation)
However, some of them don't like:
http://mywiki.org/wiki/index.php?title=pagename
The other benefit of shorter URLs is that your users don't have to type as much or remember a long URL. (I foolishly named my wiki directory "colipedia" instead of "wiki" or "w".) But this can be accomplished with redirecting instead of mod_rewrite. I do it by using a PHP script, instead of an HTML page, to handle 404 Not Found errors on my site:
<?php
$url = 'http://ecoliwiki.net/colipedia/index.php/';
if (isset($_SERVER['REDIRECT_URL'])) {
    $url .= basename($_SERVER['REDIRECT_URL']);
}
if (isset($_SERVER['PATH_INFO'])) {
    $url .= $_SERVER['PATH_INFO'];
}
if (isset($_SERVER['QUERY_STRING']) && $_SERVER['QUERY_STRING'] !== '') {
    $url .= '?' . $_SERVER['QUERY_STRING'];  // the '?' separator is needed before the query string
}
header("Location: $url");
?>
I save this as error.php in my wiki's base directory and point Apache's error handling at it. It seems to work, insofar as it sends lots of mangled URLs to reasonable pages. Anyone have any thoughts on whether this is better/worse/neutral compared to the usual way? I did this after playing with similar redirection to handle people losing session info due to a combination of parser caching and coming in from the .org, .net, and .com domains.
Jim
On Jul 30, 2007, at 9:55 AM, Thomas Dalton wrote:
On 29/07/07, Kristin kristin@traditionalwellness.com wrote:
Thank you, Frederik. Yes, I want to change: http://example.com/wiki/index.php?title=PageTtitle to: http://example.com/wiki/PageTitle
It's easier to have the real folder and the alias with different names, i.e. don't have /wiki/ in both URLs.
I am a little green with this, but have been advised that "friendly urls" will increase my visibility in the search engines. Does anyone have experience with this? And is there one better way to create "friendly urls" than another, as far as the search engines go?
Done correctly, the fact that they aren't the real URLs should be completely transparent to the outside world, so I can't see it making any difference how you do it. Any short URL should work just as well as any other.
If you wanted to be even cuter, you could make a wiki page where users list what they want imported, and have a watchlist trigger the import instead of a cron job.
I.e. a change to the import list page sends an email to a robot account, which triggers on receipt of the email, loads whatever is desired, and clears the page. You could even send error messages to the Talk page!
Jim
On Jul 29, 2007, at 5:02 AM, Emanuele D'Arrigo wrote:
(... large imports failing through the web interface likely due to timeouts...) I just realized there might be an easy way out of this.
My biggest concern right now is ease of use. A set of users generates a lot of data through a database, and I'd like this data to be uploaded into MW for the benefit of all other users. The Special:Import page is easy enough for them to use, but has the problem I described in previous messages. Even raising the timeout tolerances, e.g. in php.ini or in the webserver if necessary, is unlikely to solve this once and for all, especially given the lack of error messages, which would leave users puzzled whenever those tolerances are exceeded.
Unfortunately using command line tools such as those in mediawiki/maintenance would be uncomfortable to those users generating the data.
I've been wondering then: could a scheduled event (a cronjob or a windows equivalent) run a script to check the content of an import directory where users put the files to be imported in MW, and, if new files are found, trigger the execution of the import script? The same thing could be done with images I'd imagine.
This way the users would only have to worry about saving or transferring the files to be imported in the right location, confident that they'd be uploaded, say, every 15 minutes.
Does anybody see any drawbacks with this method?
Ciao!
Manu
On 7/26/07, Emanuele D'Arrigo manu3d@gmail.com wrote:
Greetings,
I'm trying to import a file of approx 1 Megabyte (1014 KB to be precise) but the import stops around 20% of the way and returns a blank page. I've been able to exclude a few things:
1) this isn't a problem with the import: it works with other (smaller) files.
2) it isn't a problem with PHP limits: they are set to 2 MB
3) it isn't a problem with the hidden input field, also set to 2 MB
4) it isn't a problem with the content of the file: the import stops at around 20% of the way, but not on the same page every time.
Am I missing something? Could it be a webserver timeout? Could it be a MySQL timeout?
Following up on this initial message, and for posterity: I've been able to locate the limiting factor. After many blank pages, the import process returned an error message for a second time:
Fatal error: Maximum execution time of 240 seconds exceeded in D:\Inetpub\wwwroot\MediaWiki\includes\GlobalFunctions.php on line 245

The first time around I didn't manage to preserve the message. It turns out, however, that the message only points to the location of the function that exceeded the execution time limit, not to where the limit is set. The limit is set in the appropriately named max_execution_time variable in php.ini. The default value is 60 seconds.
One problem I'm facing is how to determine a value high enough to allow the import of an entire 2 MB file. The 240-second limit I just set has allowed about 750 KB to be imported; scaling linearly, a full 2 MB (2048 KB) would need roughly 2048 / 750 × 240 ≈ 655 seconds. But if the server is under heavy load, even that could still result in an incomplete import.
Interestingly, if the import process produced a consistent error output, or a live progress report with each page appearing in the output as soon as it has been stored in the wiki, the user could then edit the file and re-import whatever was left out.
Now, any suggestions on where I should document these findings so other people don't get stuck as I did? Help:Import maybe?
Ciao!
Manu
On 7/31/07, Emanuele D'Arrigo manu3d@gmail.com wrote:
It turns out however that the message only points to the location of the function exceeding the execution time limit, and not to where the limit is set. The limit is set in the appropriately named max_execution_time variable in php.ini. The default value is 60 seconds.
No, sorry, I mixed things up. PHP misleadingly returns a message about execution time, but the variable that actually came into play is max_input_time. max_execution_time is 3000 on my setup and doesn't seem to be the limiting factor.
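For the record, these are the php.ini directives involved, side by side. The values below are examples from my experiments, not recommendations:

```ini
; php.ini -- limits relevant to Special:Import
max_execution_time  = 240   ; limit on script execution time
max_input_time      = 240   ; limit on receiving/parsing request input -- the one that bit me
post_max_size       = 2M
upload_max_filesize = 2M
```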
Hope it helps.
Manu