On 6/3/15, Charles Andrès charles.andres@wikimedia.ch wrote:
Out of interest, how many processing threads were chosen in GWToolset for the job? It might be worth changing the input page to default to 2 threads, with a warning if you pick more than 8 or so. I can imagine 20 processing threads causing server issues for large files; in practice I used 4 or 5 for my largest upload jobs. Probably something useful to add to the user guide.
I kept the default setting of 5. My guess is that when our server gets overloaded, GWToolset starts making more requests than it naturally would, repeating the ones that were failing, but it's just my guess.
Charles
Hmm, I'm not sure if it would try multiple times for requests that fail, but from what is logged, that does not appear to be the case (though it might try multiple times per item logged as a failure).
Here's the day by day of your upload job (for User:Neuchâtel Herbarium)
MariaDB [commonswiki_p]> select substr( log_timestamp, 1, 8 ), log_action, count(*)
    -> from logging_logindex
    -> where log_type = 'gwtoolset'
    -> and log_timestamp > '20150500000000'
    -> and log_user = 2103899
    -> and log_action != 'metadata-job'
    -> group by 1, 2;
+-------------------------------+-------------------------+----------+
| substr( log_timestamp, 1, 8 ) | log_action              | count(*) |
+-------------------------------+-------------------------+----------+
| 20150526                      | mediafile-job-failed    |      378 |
| 20150526                      | mediafile-job-succeeded |      926 |
| 20150527                      | mediafile-job-failed    |      115 |
| 20150527                      | mediafile-job-succeeded |     3734 |
| 20150528                      | mediafile-job-failed    |     6431 |
| 20150528                      | mediafile-job-succeeded |     6327 |
| 20150529                      | mediafile-job-failed    |    12148 |
| 20150530                      | mediafile-job-failed    |    11915 |
| 20150531                      | mediafile-job-failed    |    12371 |
| 20150531                      | mediafile-job-succeeded |        6 |
| 20150601                      | mediafile-job-failed    |     7636 |
| 20150601                      | mediafile-job-succeeded |      225 |
+-------------------------------+-------------------------+----------+
12 rows in set (0.56 sec)
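The per-day failure rate is easier to see if the counts are summarized. A quick sketch (the numbers are copied from the result set above; the script itself is just an illustration, not anything from GWToolset):

```python
# Daily (failed, succeeded) counts from the logging query above.
counts = {
    "20150526": (378, 926),
    "20150527": (115, 3734),
    "20150528": (6431, 6327),
    "20150529": (12148, 0),
    "20150530": (11915, 0),
    "20150531": (12371, 6),
    "20150601": (7636, 225),
}

for day in sorted(counts):
    failed, succeeded = counts[day]
    total = failed + succeeded
    rate = 100.0 * failed / total
    print(f"{day}: {total:6d} attempts, {rate:5.1f}% failed")
```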
On May 28th, about 50% of the files failed; however, the number of files it attempted to fetch was roughly the same as on May 29th, when every single file failed.
I think this suggests that gwtoolset should have some sort of back-off feature when things start to fail (particularly due to "HTTP request timed out.") to slow down the request rate.
--bawolff