I’m not really competent to understand the technical aspects, but in case it helps, here is an example of the requests that were actually made to our server while all jobs were failing:

208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:17:23 +0200] "HEAD /penard/Collection_Penard_MHNG_Specimen_343-5-5.tif HTTP/1.1" 200 0 "-" "MediaWiki/1.26wmf7 GWToolset/0.3.8"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:17:28 +0200] "HEAD /penard/Collection_Penard_MHNG_Specimen_344-1-2.tif HTTP/1.1" 200 0 "-" "MediaWiki/1.26wmf7 GWToolset/0.3.8"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:17:29 +0200] "HEAD /penard/Collection_Penard_MHNG_Specimen_344-2-2.tif HTTP/1.1" 200 0 "-" "MediaWiki/1.26wmf7 GWToolset/0.3.8"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:17:31 +0200] "HEAD /penard/Collection_Penard_MHNG_Specimen_344-1-1.tif HTTP/1.1" 200 0 "-" "MediaWiki/1.26wmf7 GWToolset/0.3.8"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:17:38 +0200] "HEAD /penard/Collection_Penard_MHNG_Specimen_344-1-3.tif HTTP/1.1" 200 0 "-" "MediaWiki/1.26wmf7 GWToolset/0.3.8"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:17:41 +0200] "GET /penard/Collection_Penard_MHNG_Specimen_339-17-3.tif HTTP/1.1" 200 3322941 "-" "MediaWiki/1.26wmf7"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:17:41 +0200] "HEAD /penard/Collection_Penard_MHNG_Specimen_344-2-3.tif HTTP/1.1" 200 0 "-" "MediaWiki/1.26wmf7 GWToolset/0.3.8"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:17:50 +0200] "GET /penard/Collection_Penard_MHNG_Specimen_345-4-2.tif HTTP/1.1" 200 867440 "-" "MediaWiki/1.26wmf7"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:17:52 +0200] "GET /penard/Collection_Penard_MHNG_Specimen_342-2-1.tif HTTP/1.1" 200 1837688 "-" "MediaWiki/1.26wmf7"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:17:55 +0200] "GET /penard/Collection_Penard_MHNG_Specimen_342-3-1.tif HTTP/1.1" 200 1195016 "-" "MediaWiki/1.26wmf7"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:18:01 +0200] "HEAD /penard/Collection_Penard_MHNG_Specimen_341-2-2.tif HTTP/1.1" 200 0 "-" "MediaWiki/1.26wmf7 GWToolset/0.3.8"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:18:02 +0200] "HEAD /penard/Collection_Penard_MHNG_Specimen_341-2-3.tif HTTP/1.1" 200 0 "-" "MediaWiki/1.26wmf7 GWToolset/0.3.8"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:18:04 +0200] "HEAD /penard/Collection_Penard_MHNG_Specimen_344-2-4.tif HTTP/1.1" 200 0 "-" "MediaWiki/1.26wmf7 GWToolset/0.3.8"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:18:07 +0200] "GET /penard/Collection_Penard_MHNG_Specimen_366-2-2.tif HTTP/1.1" 200 1035100 "-" "MediaWiki/1.26wmf7"



___________________________________________________________

Charles ANDRES, Chief Science Officer
"Wikimedia CH" – Association for the advancement of free knowledge –
www.wikimedia.ch
Office +41 (0)21 340 66 21
Mobile +41 (0)78 910 00 97
Skype: charles.andres.wmch
IRC://irc.freenode.net/wikimedia-ch

On 3 June 2015, at 16:10, Brian Wolff <bawolff@gmail.com> wrote:

On 6/3/15, Charles Andrès <charles.andres@wikimedia.ch> wrote:



Out of interest, how many processing threads were chosen in GWT for the job? It might be an idea to change the input page to default to 2 threads and to warn if more than 8 or so are selected. I can imagine 20 processing threads causing a server issue for large files; in practice I used 4 or 5 for my largest upload jobs. Probably something worth adding to the user guide.


I kept the default setting of 5. My guess is that when our server gets overloaded, GWToolset starts making more requests than it would have done naturally, repeating the ones that were failing, but it’s just my guess.


Charles


Hmm, I'm not sure whether it would retry requests that fail, but judging by what gets logged, that does not appear to be the case (though it might try multiple times per item logged as a failure).

Here's the day-by-day breakdown of your upload job (for User:Neuchâtel Herbarium):

MariaDB [commonswiki_p]> select substr( log_timestamp, 1, 8 ), log_action, count(*)
    from logging_logindex
    where log_type = 'gwtoolset'
      and log_timestamp > '20150500000000'
      and log_user = 2103899
      and log_action != 'metadata-job'
    group by 1, 2;
+-------------------------------+-------------------------+----------+
| substr( log_timestamp, 1, 8 ) | log_action              | count(*) |
+-------------------------------+-------------------------+----------+
| 20150526                      | mediafile-job-failed    |      378 |
| 20150526                      | mediafile-job-succeeded |      926 |
| 20150527                      | mediafile-job-failed    |      115 |
| 20150527                      | mediafile-job-succeeded |     3734 |
| 20150528                      | mediafile-job-failed    |     6431 |
| 20150528                      | mediafile-job-succeeded |     6327 |
| 20150529                      | mediafile-job-failed    |    12148 |
| 20150530                      | mediafile-job-failed    |    11915 |
| 20150531                      | mediafile-job-failed    |    12371 |
| 20150531                      | mediafile-job-succeeded |        6 |
| 20150601                      | mediafile-job-failed    |     7636 |
| 20150601                      | mediafile-job-succeeded |      225 |
+-------------------------------+-------------------------+----------+
12 rows in set (0.56 sec)

On May 28th, about 50% of the files failed; however, the number of files the toolset attempted to fetch was roughly the same as on May 29th, when every single file failed.
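
To make that comparison explicit, here is the same table reduced to totals and failure rates (a quick sketch; the numbers are copied from the query output above):

# Daily totals from the gwtoolset log counts above: (failed, succeeded).
daily = {
    "2015-05-26": (378, 926),
    "2015-05-27": (115, 3734),
    "2015-05-28": (6431, 6327),
    "2015-05-29": (12148, 0),
    "2015-05-30": (11915, 0),
    "2015-05-31": (12371, 6),
    "2015-06-01": (7636, 225),
}

for day, (failed, ok) in daily.items():
    total = failed + ok
    print(f"{day}: {total:6d} attempted, {100 * failed / total:5.1f}% failed")

That gives roughly 12,750 attempts on May 28 with about 50% failing, against roughly 12,150 attempts on May 29 with 100% failing, so the fetch volume stayed essentially constant while the failure rate shot up.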

I think this suggests that GWToolset should have some sort of back-off feature to slow down the request rate when things start to fail (particularly with "HTTP request timed out.").

--bawolff

_______________________________________________
Glamtools mailing list
Glamtools@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/glamtools