On 6/3/15, Charles Andrès <charles.andres(a)wikimedia.ch> wrote:
Out of interest, how many processing threads were chosen in GWT for
the job? It may be an idea if the input page is changed to default to
2 threads and there are warnings if you have more than 8 or so. I can
imagine 20 processing threads causing a server issue for large files
and in practice I used 4 or 5 for my largest upload jobs; probably
something to usefully add to the user guide.
I kept the default setting of 5. My guess is that when our server gets
overloaded, GWToolset starts making more requests than it naturally
would, repeating the ones that were failing, but it's just my guess.
Charles
Hmm, I'm not sure if it would try multiple times for requests that
fail, but in terms of things logged, that does not appear to be the
case (though it might try multiple times per item logged as a failure).
Here's the day-by-day breakdown of your upload job (for User:Neuchâtel Herbarium):
MariaDB [commonswiki_p]> select substr( log_timestamp, 1, 8 ),
log_action, count(*) from logging_logindex where log_type =
'gwtoolset' and log_timestamp > '20150500000000' and log_user =
2103899 and log_action != 'metadata-job' group by 1, 2;
+-------------------------------+-------------------------+----------+
| substr( log_timestamp, 1, 8 ) | log_action              | count(*) |
+-------------------------------+-------------------------+----------+
| 20150526                      | mediafile-job-failed    |      378 |
| 20150526                      | mediafile-job-succeeded |      926 |
| 20150527                      | mediafile-job-failed    |      115 |
| 20150527                      | mediafile-job-succeeded |     3734 |
| 20150528                      | mediafile-job-failed    |     6431 |
| 20150528                      | mediafile-job-succeeded |     6327 |
| 20150529                      | mediafile-job-failed    |    12148 |
| 20150530                      | mediafile-job-failed    |    11915 |
| 20150531                      | mediafile-job-failed    |    12371 |
| 20150531                      | mediafile-job-succeeded |        6 |
| 20150601                      | mediafile-job-failed    |     7636 |
| 20150601                      | mediafile-job-succeeded |      225 |
+-------------------------------+-------------------------+----------+
12 rows in set (0.56 sec)
On May 28th, about 50% of the files failed; however, the number of
files it attempted to fetch was roughly the same as on May 29th, when
every single file failed.
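A quick sanity check on those two days, using the counts from the log
table above:

```python
# Daily (failed, succeeded) counts for the two days being compared,
# taken from the gwtoolset logging query results above.
counts = {
    "20150528": (6431, 6327),   # ~50% failed, 12758 attempted
    "20150529": (12148, 0),     # 100% failed, 12148 attempted
}

for day, (failed, ok) in sorted(counts.items()):
    total = failed + ok
    print(f"{day}: {failed}/{total} failed ({100 * failed / total:.1f}%)")
```

So the attempt volume barely changed (12758 vs 12148) even as the
failure rate went from half to total.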
I think this suggests that GWToolset should have some sort of back-off
feature when requests start to fail (particularly due to "HTTP request
timed out.") so that it slows down the request rate.
--bawolff