Hi everyone!
Lizzy Jongma had started uploading her set of approx. 2,600 images to Commons proper, after a test on beta.
However, the upload stopped last Friday, after approx. 300 images had been uploaded. https://commons.wikimedia.org/wiki/Special:ListFiles/LizzyJongma
Does anyone have any tips or ideas what the problem might be, and how to fix this?
We know the metadata (such as the creator templates) are still a bit wonky in some places - as soon as the upload is complete I’ll use VisualFileChange to fix this. And here’s the info page of the event for which we’re uploading, for those interested :-) https://nl.wikipedia.org/wiki/Wikipedia:GLAM/Rijksmuseum/Vogelen_in_het_Rijksmuseum
Best, Sandra
Hi Sandra,
I checked the logfile: https://commons.wikimedia.org/w/index.php?title=Special:Log/gwtoolset&of...
and it seems that it encountered a lot of duplicates... I haven't looked into the details, but perhaps the logfile will help you figure out what went wrong?
A lot of bird stuff happening, btw :)
Best, Jesse
2015-09-22 10:46 GMT+02:00 Sandra Fauconnier sandra.fauconnier@gmail.com:
Glamtools mailing list Glamtools@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/glamtools
Hi Lizzy and Sandra,
Just tick the GWToolset checkbox "Upload again/Opnieuw opladen" before submitting the job. Otherwise you get error messages like Lizzy's, because the requested filenames are already in use: GWToolset doesn't dare to overwrite files without permission. Regards, hansmuller
On Tue, 22 September 2015 11:54 am, Jesse de Vos wrote:
Hi Sandra,
I checked the logfile: https://commons.wikimedia.org/w/index.php?title=Special:Log/gwtoolset&of... et=20150918111102&type=gwtoolset&user=
and it seems that it encountered a lot of duplicates... I haven't looked into the details, but perhaps the logfile will help you figure out what went wrong?
A lot of bird stuff happening, btw :)
Best, Jesse
--
Kind regards,
*Jesse de Vos* Researcher Interactive and New Media
*T* 035 - 677 39 37 *In the office:* Mon to Thu
*Nederlands Instituut voor Beeld en Geluid* *Media Parkboulevard 1, 1217 WE Hilversum | Postbus 1060, 1200 BB Hilversum | * *beeldengeluid.nl* http://www.beeldengeluid.nl/
On 22 September 2015 at 12:06, Hans Muller j.m.muller@hccnet.nl wrote:
Hi Lizzy and Sandra,
Just tick the GWToolset checkbox "Upload again/Opnieuw opladen" before submitting the job. Otherwise you get error messages like Lizzy's, because the requested filenames are already in use: GWToolset doesn't dare to overwrite files without permission. Regards, hansmuller
I did not realise (or maybe I forgot) that GWT had this option. If a batch upload project needs more complex responses, say behaving differently depending on whether the "digitally identical" file already exists, whether the filename is in use, or whether the digitally identical file has previously been uploaded and deleted from Commons, then it can be worth testing and filtering the XML against the Commons API (which is somewhat complex) before pushing it through GWT, or considering whether GWT is the right solution for a project that is not a "clean" upload.
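The pre-flight check described above could start out like the Python sketch below. It assumes you call the MediaWiki action API yourself: `list=allimages` with `aisha1` looks for an existing file with the same SHA-1 hash (the "digitally identical" case), and a `titles=File:...` query shows whether the filename is already taken. The helper names are hypothetical, the network request is left out so only the parameter building and the response interpretation are shown, and the "previously deleted" case (which needs extra user rights to query) is not covered.

```python
import hashlib


def sha1_query(file_bytes: bytes) -> dict:
    """API parameters asking Commons for files whose SHA-1 matches this file."""
    return {
        "action": "query",
        "list": "allimages",
        "aisha1": hashlib.sha1(file_bytes).hexdigest(),
        "format": "json",
        "formatversion": "2",
    }


def title_query(filename: str) -> dict:
    """API parameters asking whether File:<filename> already exists."""
    return {
        "action": "query",
        "titles": "File:" + filename,
        "format": "json",
        "formatversion": "2",
    }


def classify(sha1_response: dict, title_response: dict) -> str:
    """Turn the two (already fetched) JSON responses into a simple verdict."""
    if sha1_response.get("query", {}).get("allimages"):
        return "identical file already on Commons"
    pages = title_response.get("query", {}).get("pages", [])
    if pages and pages[0].get("invalid"):
        return "invalid title"
    if pages and not pages[0].get("missing", False):
        return "filename already in use"
    return "clean"


# Example with hand-written responses (no network access needed).
print(classify({"query": {"allimages": []}},
               {"query": {"pages": [{"title": "File:Kievit.jpg", "missing": True}]}}))
# prints: clean
```

Records classified as anything other than "clean" could then be dropped from the XML, or routed to a different workflow, before the rest goes through GWT.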
For a *really* complex set of problems, such as deliberately mass-overwriting lower-resolution versions, upgrading existing image text pages with better metadata, or uploading the same media in different transcoded formats, I would recommend creating an on-wiki project page (under http://commons.wikimedia.org/wiki/Commons:BATCH, with a notice on the Village Pump to promote it) that explains the issues, potential solutions and test runs, and allowing at least a month for general volunteer feedback on the best approach before investing a lot of programming time.
Fae
On 9/22/15, Sandra Fauconnier sandra.fauconnier@gmail.com wrote:
Hi everyone!
Lizzy Jongma had started uploading her set of approx. 2,600 images to Commons proper, after a test on beta.
However, the upload stopped last Friday, after approx. 300 images had been uploaded. https://commons.wikimedia.org/wiki/Special:ListFiles/LizzyJongma
Does anyone have any tips or ideas what the problem might be, and how to fix this?
We know the metadata (such as the creator templates) are still a bit wonky in some places - as soon as the upload is complete I’ll use VisualFileChange to fix this. And here’s the info page of the event for which we’re uploading, for those interested :-) https://nl.wikipedia.org/wiki/Wikipedia:GLAM/Rijksmuseum/Vogelen_in_het_Rijksmuseum
Best, Sandra
I think your uploads were accidentally stopped when Hansmuller's uploads were canceled [1]. Just re-add your GWToolset job.
[1] https://phabricator.wikimedia.org/T112878
-- Brian Wolff
Hi all,
I looked into the log, but although the URLs may look similar, I don’t think they are the same, and I can imagine that my upload was stopped when Hans’ upload was cancelled. So I went with Brian’s advice and started the upload again, this time without the wrapping and with better categories: a nice side effect, actually!
I hope we will now be able to upload all images and make it just in time for the Dutch radio show Vroege Vogels, which will visit the museum for this project. :-)
Thank you all for your input and quick replies. So far you all have been super helpful and for me this is a steep learning curve.
Best wishes, Lizzy
On 22 Sep 2015, at 16:12, Brian Wolff bawolff@gmail.com wrote:
I think your uploads were accidentally stopped when Hansmuller's uploads were canceled [1]. Just re-add your GWToolset job.
[1] https://phabricator.wikimedia.org/T112878
-- Brian Wolff
Dear all,
I restarted my upload on September 22nd, uploading approx. 2,600 images of works of art depicting birds. Until 25 September the images were being ingested into Wikimedia Commons (they were slowly dripping in), but over the last two days no new images have been ingested. Does anyone know what went wrong, why the upload stopped (again), and how I can restart my job?
I also have difficulty working out how many images, and which ones, were ingested: which log should I check? I can’t find the relevant log showing what was uploaded or which problems were reported.
Thank you very much for your help.
Best wishes, Lizzy Jongma
On 22 Sep 2015, at 15:12, Brian Wolff bawolff@gmail.com wrote:
I think your uploads were accidentally stopped when Hansmuller's uploads were canceled [1]. Just re-add your GWToolset job.
[1] https://phabricator.wikimedia.org/T112878
-- Brian Wolff
On Sun, Sep 27, 2015 at 3:27 AM, Lizzy Jongma L.Jongma@rijksmuseum.nl wrote:
Dear all,
I restarted my upload on September 22nd, uploading approx. 2,600 images of works of art depicting birds. Until 25 September the images were being ingested into Wikimedia Commons (they were slowly dripping in), but over the last two days no new images have been ingested. Does anyone know what went wrong, why the upload stopped (again), and how I can restart my job?
I also have difficulty working out how many images, and which ones, were ingested: which log should I check? I can’t find the relevant log showing what was uploaded or which problems were reported.
Thank you very much for your help.
Best wishes, Lizzy Jongma
The logs for you in particular are at https://commons.wikimedia.org/w/index.php?title=Special%3ALog&type=gwtoo...
If you want the list in XML or JSON format, you can use https://commons.wikimedia.org/w/api.php?action=query&list=logevents&... . See the API docs for details. In particular, the leaction parameter can be used to filter by success/failure, and you need to use the lecontinue parameter to get the next page of results.
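For those who prefer a script, here is a minimal Python sketch of paging through a user's GWToolset log entries with the API call described above. It assumes the current continuation style, in which each response may carry a `continue` object whose keys (including `lecontinue`) are copied into the next request; the function names and the User-Agent string are made up for the example. The fetch function is injectable, so the paging logic can be tried without network access.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

API = "https://commons.wikimedia.org/w/api.php"


def log_params(user, continuation=None):
    """Query parameters for one page of a user's GWToolset log entries."""
    params = {
        "action": "query",
        "list": "logevents",
        "letype": "gwtoolset",
        "leuser": user,
        "lelimit": "max",
        "format": "json",
    }
    params.update(continuation or {})  # e.g. {"lecontinue": ..., "continue": ...}
    return params


def http_fetch(params):
    """Perform one GET request against the Commons API and decode the JSON."""
    url = API + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, headers={"User-Agent": "gwt-log-sketch/0.1 (hypothetical example)"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_log_events(user, fetch=http_fetch):
    """Yield every log entry, following the API's continuation until it stops."""
    continuation = {}
    while True:
        data = fetch(log_params(user, continuation))
        yield from data.get("query", {}).get("logevents", [])
        if "continue" not in data:
            return
        continuation = data["continue"]


def count_by_action(events):
    """Tally entries per log action, e.g. to compare successes and failures."""
    return Counter(event.get("action") for event in events)
```

Usage would be something like `count_by_action(iter_log_events("LizzyJongma"))`, which gives a rough count of how many records succeeded or failed without clicking through the log pages.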
There are several entries that appear to be "skipped" because the image already exists on Commons. The record numbers are in the 2800s (the record numbers will skip around a bit, but should very roughly go from low to high in order), so if you only have about 2,600 images, I would guess that the tool went through all the images.
It seems that perhaps the XML file has the same filename for several images. For example, https://commons.wikimedia.org/wiki/File:Inhoudsopgave.jpeg is marked as having been replaced by a second image after your initial upload, and in the log it looks like record 2815 was going to replace that image a third time, except that someone had edited the page in the meantime ( https://commons.wikimedia.org/w/index.php?title=File:Inhoudsopgave.jpeg&... ), so GWToolset failed instead of silently overwriting it.
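One way to catch this before submitting a job is to scan the metadata XML for titles that occur more than once. The Python sketch below assumes, purely for illustration, that each record is a `<record>` element with a `<title>` child; GWToolset lets you map your own element names, so these are parameters. The normalisation (underscores treated as spaces, whitespace collapsed, first letter capitalised) approximates how Commons compares file names and is an assumption, not a complete implementation.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def normalise(title: str) -> str:
    """Approximate Commons title comparison: '_' equals ' ', first letter uppercase."""
    title = " ".join(title.replace("_", " ").split())
    return title[:1].upper() + title[1:]


def duplicate_titles(xml_text: str, record_tag="record", title_tag="title"):
    """Return {normalised title: count} for titles used by more than one record."""
    root = ET.fromstring(xml_text)
    titles = [normalise(rec.findtext(title_tag, default=""))
              for rec in root.iter(record_tag)]
    counts = Counter(titles)
    return {title: n for title, n in counts.items() if n > 1}


sample = """<records>
  <record><title>Inhoudsopgave</title></record>
  <record><title>Kievit</title></record>
  <record><title>inhoudsopgave</title></record>
</records>"""
print(duplicate_titles(sample))  # {'Inhoudsopgave': 2}
```

A generic title such as "Inhoudsopgave" (table of contents) showing up here is a hint that the title field alone is not unique enough to serve as a file name.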
I also notice that several of the templates and file names have question marks in them. If the ? is unintentional, it might indicate an issue with character set conversion.
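To illustrate how stray question marks can appear, and how to spot the closely related problem of mis-decoded text, here is a small Python sketch. Encoding text into a character set that lacks a character, with `errors="replace"`, turns that character into "?"; UTF-8 text that is read as Windows-1252 turns characters like ’ into "â€™". The repair function only reverses that one specific mix-up: it is a heuristic, not a general fix, and the function names are hypothetical.

```python
def lossy_ascii(text: str) -> str:
    """Show what happens when text is forced into ASCII: unknown chars become '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")


def try_fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were wrongly decoded as Windows-1252, if that fits.

    If the text cannot be re-encoded as cp1252, or the resulting bytes are not
    valid UTF-8, it was probably not mangled this way and is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def flag_suspicious(titles):
    """List titles that contain '?' or that change under the mojibake repair."""
    return [t for t in titles if "?" in t or try_fix_mojibake(t) != t]


print(lossy_ascii("Kievit’s nest"))   # Kievit?s nest
print(try_fix_mojibake("Iâ€™ll"))     # I’ll
```

Running `flag_suspicious` over all titles and template values before upload gives a short list of records to check by hand.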
GWToolset is still quite rough around the edges, especially when it comes to showing progress and explaining what happened in case of errors.
-- -Brian
Dear all,
Lizzy: I'd say restart your job; I did the same at 13:48 (11:48 Wikipedia time).
My latest upload before that was at 11:04, then nothing, so the job had somehow disappeared. Now my small job to test the waters is running OK again.
Cheers, hansmuller
On Sun, 27 September 2015 4:46 pm, bawolff wrote:
The logs for you in particular are at https://commons.wikimedia.org/w/index.php?title=Special%3ALog&type=gwtoo... et&user=LizzyJongma&page=&year=&month=-1&tagfilter=&uselang=en
If you want the list in XML or JSON format, you can use https://commons.wikimedia.org/w/api.php?action=query&list=logevents&... =gwtoolset&leuser=LizzyJongma&lelimit=max&format=jsonfm . See the API docs for details. In particular, the leaction parameter can be used to filter by success/failure, and you need to use the lecontinue parameter to get the next page of results.
There are several entries that appear to be "skipped" because the image already exists on Commons. The record numbers are in the 2800s (the record numbers will skip around a bit, but should very roughly go from low to high in order), so if you only have about 2,600 images, I would guess that the tool went through all the images.
It seems that perhaps the XML file has the same filename for several images. For example, https://commons.wikimedia.org/wiki/File:Inhoudsopgave.jpeg is marked as having been replaced by a second image after your initial upload, and in the log it looks like record 2815 was going to replace that image a third time, except that someone had edited the page in the meantime ( https://commons.wikimedia.org/w/index.php?title=File:Inhoudsopgave.jpeg&... iff=173165602&oldid=173147674 ), so GWToolset failed instead of silently overwriting it.
I also notice that several of the templates and file names have question marks in them. If the ? is unintentional, it might indicate an issue with character set conversion.
GWToolset is still quite rough around the edges, especially when it comes to showing progress and explaining what happened in case of errors.
-- -Brian
I have seen unexplained drop-outs for my direct uploads (not GWT) since 25 Sept and I see 4 files of mine created today with empty image text pages. I'd agree with leaving it for a day and trying again rather than spending a lot of time investigating.
If there is a pattern of failures of the same type, then the best thing will be to raise a Phabricator task request to analyse the cause.
Perhaps the learning point is that GWT is not a guaranteed service, something to keep in mind if working to a schedule. If the backup plan is to find an unpaid volunteer with experience in alternative batch upload methods, there are only a handful active at any one time, and most of us are over-committed with a backlog of ideas and projects.
Alternative easy workflows like Flickr2Commons are worth knowing about: in a pinch you can "drag & drop" to a single-purpose Flickr account and then mass upload, so that the files go live on schedule. However, this is likely to leave a problematic amount of "housekeeping".
Fae
On 28 September 2015 at 13:27, Hans Muller j.m.muller@hccnet.nl wrote:
Dear all,
Lizzy: I'd say restart your job; I did the same at 13:48 (11:48 Wikipedia time).
My latest upload before that was at 11:04, then nothing, so the job had somehow disappeared. Now my small job to test the waters is running OK again.
Cheers, hansmuller
Flickr2Commons sounds interesting and I will definitely look into it.
I also think it is better to split the job up a bit (more): 500 or 1,000 objects per upload spreads the eggs over more baskets and is easier to keep track of. So my solution for next time will be to upload a set in smaller pieces. But if I notice a pattern or an unexplained drop-out, I will post it immediately!
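Splitting a set into smaller jobs is easy to script. A minimal Python sketch, assuming the records are already available as a list (for example, parsed from the metadata XML); the batch size of 1,000 comes from the suggestion above and the function name is hypothetical. Each batch could then be written to its own XML file and submitted as a separate GWToolset job.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(records: Sequence[T], size: int = 1000) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` records, preserving order."""
    if size <= 0:
        raise ValueError("batch size must be positive")
    for start in range(0, len(records), size):
        yield list(records[start:start + size])


# About 2,600 records split into upload jobs of at most 1,000.
sizes = [len(batch) for batch in batches(list(range(2600)), 1000)]
print(sizes)  # [1000, 1000, 600]
```

Keeping the original order means each job covers a contiguous range of records, which makes it easier to see from the log which job a failed record belonged to.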
Best wishes, Lizzy
On 28 Sep 2015, at 16:12, Fæ faewik@gmail.com wrote:
I have seen unexplained drop-outs for my direct uploads (not GWT) since 25 Sept and I see 4 files of mine created today with empty image text pages. I'd agree with leaving it for a day and trying again rather than spending a lot of time investigating.
If there is a pattern of failures of the same type, then the best thing will be to raise a Phabricator task request to analyse the cause.
Perhaps the learning point is that GWT is not a guaranteed service, something to keep in mind if working to a schedule. If the backup plan is to find an unpaid volunteer with experience in alternative batch upload methods, there are only a handful active at any one time, and most of us are over-committed with a backlog of ideas and projects.
Alternative easy workflows like Flickr2Commons are worth knowing about: in a pinch you can "drag & drop" to a single-purpose Flickr account and then mass upload, so that the files go live on schedule. However, this is likely to leave a problematic amount of "housekeeping".
Fae
-- faewik@gmail.com https://commons.wikimedia.org/wiki/User:Fae
Hi Lizzy, handling metadata is quite problematic in the Flickr approach, which causes that "housekeeping". We had to deal with it in our GLAM uploads, so we made a small tool called Flickr2GWToolset.
The idea is to split the data in the right places BEFORE uploading with GWToolset. For me, this was easier than doing it afterwards, since I'm not familiar with the tools used for Commons housekeeping.
It worked quite well for us. It *might* also be useful for you if you are going to follow the Flickr route.
http://wikihacks.opendimension.org/flickr2gwtoolset/
Regards, Ari
Sent from my Debian. http://www.opendimension.org/
2015-09-28 20:23 GMT+03:00 Lizzy Jongma L.Jongma@rijksmuseum.nl:
Flickr2Commons sounds interesting and I will definitely look into it.
I also think it is better to split the job up a bit (more): 500 or 1,000 objects per upload spreads the eggs over more baskets and is easier to keep track of. So my solution for next time will be to upload a set in smaller pieces. But if I notice a pattern or an unexplained drop-out, I will post it immediately!
Best wishes, Lizzy
Thank you for helping me with the log files!!! That is really helpful. There are quite a few duplicate titles: some are not actually duplicates but just have the same title. We need to come up with a new and better strategy for naming files! And indeed: some were uploaded before (for instance by me :-) We also encountered some character set issues: I cannot explain where that came from, but I will research it a bit more.
This has been a huge learning experience for me! So thank you for helping me. Next upload will hopefully go a lot smoother (we also have some rough edges to deal with on our side :-))
Best Lizzy
-----Original message----- From: Glamtools [mailto:glamtools-bounces@lists.wikimedia.org] On behalf of bawolff Sent: Sunday 27 September 2015 16:47 To: Conversations revolving around the development of GLAM Digital Tools glamtools@lists.wikimedia.org Subject: Re: [Glamtools] upload from Rijksmuseum has stalled?
The logs for you in particular are at https://commons.wikimedia.org/w/index.php?title=Special%3ALog&type=gwtoo...
If you want the list in XML or JSON format, you can use https://commons.wikimedia.org/w/api.php?action=query&list=logevents&... . See the API docs for details. In particular, the leaction parameter can be used to filter by success/failure, and you need to use the lecontinue parameter to get the next page of results.
There are several entries that appear to be "skipped" because the image already exists on Commons. The record numbers are in the 2800s (the record numbers will skip around a bit, but should very roughly go from low to high in order), so if you only have about 2,600 images, I would guess that the tool went through all the images.
It seems that perhaps the XML file has the same filename for several images. For example, https://commons.wikimedia.org/wiki/File:Inhoudsopgave.jpeg is marked as having been replaced by a second image after your initial upload, and in the log it looks like record 2815 was going to replace that image a third time, except that someone had edited the page in the meantime ( https://commons.wikimedia.org/w/index.php?title=File:Inhoudsopgave.jpeg&... ), so GWToolset failed instead of silently overwriting it.
I also notice that several of the templates and file names have question marks in them. If the ? is unintentional, it might indicate an issue with character set conversion.
GWToolset is still quite rough around the edges, especially when it comes to showing progress and explaining what happened in case of errors.
-- -Brian
On 29 September 2015 at 10:10, Lizzy Jongma L.Jongma@rijksmuseum.nl wrote: ...
We also encountered some character set issues: I cannot explain where that came from but I will research it a bit more.
Character encoding problems can be very tricky. There is a short bit of guidance in the GWT manual on XML-sensitive text to watch out for, which can also trip up the unwary. https://www.mediawiki.org/wiki/Help:Extension:GWToolset#Validating_your_xml
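A common trap of this kind is a bare &, < or > in a title or description, which makes the XML invalid. The minimal Python sketch below uses the standard library's `xml.sax.saxutils.escape` to produce the escaped form and then parses the result to confirm it is well-formed. It only covers those characters (plus double quotes, added via the `entities` argument); the wrapper name is hypothetical, and the sketch is not a substitute for the checks described in the manual.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def xml_safe(text: str) -> str:
    """Escape &, < and > (and double quotes) so text can go inside an XML element."""
    return escape(text, {'"': "&quot;"})


print(xml_safe('Vogels & vissen <ca. 1650> "study"'))
# Vogels &amp; vissen &lt;ca. 1650&gt; &quot;study&quot;

# Round trip: the escaped text parses cleanly and reads back unchanged.
element = ET.fromstring("<title>" + xml_safe("Vogels & vissen") + "</title>")
print(element.text)  # Vogels & vissen
```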
Fae
Thanks!
-----Original message----- From: Glamtools [mailto:glamtools-bounces@lists.wikimedia.org] On behalf of Fæ Sent: Tuesday 29 September 2015 11:15 To: Conversations revolving around the development of GLAM Digital Tools glamtools@lists.wikimedia.org Subject: Re: [Glamtools] upload from Rijksmuseum has stalled?
Character encoding problems can be very tricky. There is a short bit of guidance in the GWT manual on XML-sensitive text to watch out for, which can also trip up the unwary. https://www.mediawiki.org/wiki/Help:Extension:GWToolset#Validating_your_xml
Fae
Hi Lizzy,
In Switzerland, we often just add the inventory number at the end of the file name. This is particularly handy if you have similar items with the same or similar names; otherwise it takes ages to match the items on Commons with the items in your database later on (or when checking against log files ;-).
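This convention is easy to apply when generating file names for the XML. A minimal Python sketch, assuming a "Title (inventory number).ext" pattern and using made-up inventory numbers; the function name is hypothetical. The characters it replaces are those MediaWiki does not allow in page titles (# < > [ ] | { }) plus :, / and \, which uploads replace; treat that set as an assumption rather than an exhaustive rule list.

```python
import re

# Characters not allowed in MediaWiki titles (#<>[]|{}) plus : / \ which uploads
# replace; this set is an assumption, not a complete list of Commons naming rules.
_FORBIDDEN = re.compile(r"[#<>\[\]|{}:/\\]")


def make_filename(title: str, inventory_number: str, extension: str = "jpg") -> str:
    """Build 'Title (inventory number).ext' with forbidden characters replaced by '-'."""
    base = f"{title.strip()} ({inventory_number.strip()})"
    base = _FORBIDDEN.sub("-", base)
    base = " ".join(base.split())  # collapse runs of whitespace
    return f"{base}.{extension}"


# Two prints with the same title get distinct names (inventory numbers are made up).
print(make_filename("Kievit", "RP-P-OB-0001"))  # Kievit (RP-P-OB-0001).jpg
print(make_filename("Kievit", "RP-P-OB-0002"))  # Kievit (RP-P-OB-0002).jpg
```

Because the inventory number is unique within the collection, the resulting file names are unique too, and they can be matched back to the collection database (or the upload log) by that number.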
Cheers, Beat
-----Original Message----- From: Glamtools [mailto:glamtools-bounces@lists.wikimedia.org] On Behalf Of Lizzy Jongma Sent: Tuesday, 29 September 2015 11:11 To: bawolff+wn@gmail.com; Conversations revolving around the development of GLAM Digital Tools Subject: Re: [Glamtools] upload from Rijksmuseum has stalled?
Thank you for helping me with the log files!!! That is really helpful. There are quite a few duplicate titles: some are not actually duplicates but just have the same title. We need to come up with a new and better strategy for naming files! And indeed: some were uploaded before (for instance by me :-) We also encountered some character set issues: I cannot explain where that came from, but I will research it a bit more.
This has been a huge learning experience for me! So thank you for helping me. Next upload will hopefully go a lot smoother (we also have some rough edges to deal with on our side :-))
Best Lizzy
Thank you! That was indeed the solution I was thinking of, and it's good to hear others are doing it this way too.
-----Original message----- From: Glamtools [mailto:glamtools-bounces@lists.wikimedia.org] On behalf of Estermann Beat Sent: Tuesday 29 September 2015 11:16 To: Conversations revolving around the development of GLAM Digital Tools glamtools@lists.wikimedia.org Subject: Re: [Glamtools] upload from Rijksmuseum has stalled?
Hi Lizzy,
In Switzerland, we often just add the inventory number at the end of the file name. This is particularly handy if you have similar items with the same or similar names; otherwise it takes ages to match the items on Commons with the items in your database later on (or when checking against log files ;-).
Cheers, Beat
-----Original Message----- From: Glamtools [mailto:glamtools-bounces@lists.wikimedia.org] On Behalf Of Lizzy Jongma Sent: Dienstag, 29. September 2015 11:11 To: bawolff+wn@gmail.com; Conversations revolving around the development of GLAM Digital Tools Subject: Re: [Glamtools] upload from Rijksmuseum has stalled?
Thank you for helping me with the log files!!! That is really helpful. There are quite a few duplicate titles: some are actually not duplicate but just have the same title. We need to come up with a new and better strategy for naming files! And indeed: some were uploaded before (for instance by me :-) We also encountered some character set issues: I cannot explain where that came from but I will research it a bit more.
This has been a huge learning experience for me! So thank you for helping me. Next upload will hopefully go a lot smoother (we also have some rough edges to deal with on our side :-))
Best Lizzy
-----Original Message----- From: Glamtools [mailto:glamtools-bounces@lists.wikimedia.org] On Behalf Of bawolff Sent: Sunday, 27 September 2015 16:47 To: Conversations revolving around the development of GLAM Digital Tools glamtools@lists.wikimedia.org Subject: Re: [Glamtools] upload from Rijksmuseum has stalled?
On Sun, Sep 27, 2015 at 3:27 AM, Lizzy Jongma L.Jongma@rijksmuseum.nl wrote:
Dear all,
I restarted my upload on September 22nd: I started uploading approx. 2,600 images of works of art depicting birds. Until 25 September the images were being ingested into Wikimedia Commons (they were slowly dripping in), but over the last two days no new images have been ingested. Does anyone know what went wrong/why the upload stopped (again), and how I can restart my job?
I also have difficulty working out how many images, and which ones, were ingested: which log can I check? I can’t find the relevant log to see what was uploaded or what problems were reported, etc.
Thank you very much for your help.
best wishes Lizzy Jongma
The logs for you in particular are at https://commons.wikimedia.org/w/index.php?title=Special%3ALog&type=gwtoo...
If you want the list in XML or JSON format, you can use https://commons.wikimedia.org/w/api.php?action=query&list=logevents&... . See the API docs for details. In particular, the leaction parameter can be used to filter by success/failure, and you need to use the lecontinue parameter to get the next page of results.
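The API approach Brian describes can be sketched in Python as follows. This is a minimal sketch, assuming the standard `continue` mechanism of the MediaWiki Action API: each response may carry a `continue` object (including `lecontinue`) that is merged into the next request. `letype=gwtoolset` matches the `Special:Log/gwtoolset` log. The user name, the User-Agent string, and the helper names are illustrative. The exact GWToolset `leaction` values are not given in the thread, so they are left for the caller to supply after checking the log or API docs.

```python
import json
from typing import Callable, Dict, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://commons.wikimedia.org/w/api.php"
# Placeholder: Wikimedia asks API clients to send a descriptive User-Agent.
USER_AGENT = "gwtoolset-log-check/0.1 (replace with your contact address)"


def build_params(user: str, action: Optional[str] = None) -> Dict[str, str]:
    """Query parameters for list=logevents, restricted to one user's entries."""
    params = {
        "action": "query",
        "list": "logevents",
        "leuser": user,
        "lelimit": "500",   # maximum page size for non-bot accounts
        "format": "json",
        "continue": "",     # opt in to the "continue" style of paging
    }
    if action is None:
        params["letype"] = "gwtoolset"
    else:
        # leaction takes "type/action"; look up the real GWToolset action names first.
        params["leaction"] = action
    return params


def iter_log_events(params: Dict[str, str],
                    fetch: Callable[[Dict[str, str]], dict]) -> Iterator[dict]:
    """Yield every log entry, following continuation until the API stops sending it."""
    params = dict(params)
    while True:
        data = fetch(params)
        yield from data.get("query", {}).get("logevents", [])
        cont = data.get("continue")
        if not cont:
            return
        params.update(cont)  # carries lecontinue (and "continue") into the next request


def http_fetch(params: Dict[str, str]) -> dict:
    """Perform one GET request against the Commons API and decode the JSON reply."""
    req = Request(API_URL + "?" + urlencode(params), headers={"User-Agent": USER_AGENT})
    with urlopen(req) as resp:
        return json.load(resp)
```

Called as `iter_log_events(build_params("LizzyJongma"), http_fetch)`, this would walk through all of that account's GWToolset log entries page by page. Passing `fetch` in as a parameter keeps the paging logic testable without network access.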
There are several entries that appear to be "skipped" because the image already exists on Commons. The record numbers are in the 2800s (the record numbers will skip around a bit, but should very roughly go from low to high in order), so if you only have about 2,600 images, I would guess that the tool went through all the images.
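Once the log entries have been fetched, the question of how many were uploaded versus skipped, and when the job stopped, reduces to a simple tally. The sketch below assumes the entries are dicts shaped like `list=logevents` results, with `action` and `timestamp` keys. It does not assume any particular GWToolset action names, and the function name is hypothetical. MediaWiki timestamps are ISO 8601 strings such as `2015-09-25T10:00:00Z`, so sorting them as text also sorts them chronologically.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional, Tuple


def summarize_log(events: Iterable[Mapping]) -> Tuple[Counter, Optional[Tuple[str, str]]]:
    """Count log entries per action and return the (first, last) timestamp seen.

    Action names are taken as-is from the log; nothing here assumes what they are.
    """
    events = list(events)
    counts = Counter(e.get("action", "unknown") for e in events)
    # ISO 8601 UTC timestamps sort correctly as plain strings.
    stamps = sorted(e["timestamp"] for e in events if "timestamp" in e)
    span = (stamps[0], stamps[-1]) if stamps else None
    return counts, span
```

The last timestamp in `span` shows when the tool last logged anything, which helps distinguish a stalled job from one that simply finished.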
It seems that perhaps the XML file has the same filename for several images. For example, https://commons.wikimedia.org/wiki/File:Inhoudsopgave.jpeg is marked as being replaced by a second image after your initial upload, and in the log, it looks like record 2815 was going to replace that image a third time, except someone had edited that page in the meantime ( https://commons.wikimedia.org/w/index.php?title=File:Inhoudsopgave.jpeg&... ), so GWToolset failed instead of silently overwriting it.
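Duplicate filenames of this kind can be caught before an upload starts by scanning the metadata file. The sketch below uses Python's standard `xml.etree.ElementTree`. The `record` and `title` element names are hypothetical placeholders for whatever elements the metadata file maps to the Commons title. The normalization only roughly mimics how Commons treats titles: underscores become spaces and the first letter is uppercased. It therefore also flags titles that differ only in those respects.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict


def normalize_title(title: str) -> str:
    """Rough approximation of Commons title normalization:
    underscores become spaces, whitespace collapses, first letter is uppercased."""
    t = " ".join(title.replace("_", " ").split())
    return t[:1].upper() + t[1:]


def duplicate_titles(xml_text: str, record_tag: str = "record",
                     title_tag: str = "title") -> Dict[str, int]:
    """Return {normalized title: count} for titles used by more than one record.

    record_tag and title_tag are hypothetical defaults; adjust them to the
    actual structure of the metadata file.
    """
    root = ET.fromstring(xml_text)
    counts = Counter(
        normalize_title(rec.findtext(title_tag) or "")
        for rec in root.iter(record_tag)
    )
    return {title: n for title, n in counts.items() if n > 1}
```

Any title this reports would make later records overwrite, or collide with, earlier uploads under the same name, as happened with `File:Inhoudsopgave.jpeg`.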
I also notice several of the templates and file names have question marks in them. If the ? is unintentional, that might indicate an issue with charset conversion.
GWtoolset is still quite rough around the edges, especially for showing progress and explaining what happened in case of errors.
-- -Brian
On 29 Sep 2015 10:17, "Lizzy Jongma" L.Jongma@rijksmuseum.nl wrote:
In Switzerland, we often just add the inventory number at the end of the file name.
Thank you! That was indeed the solution I was thinking of & good to hear others are doing it this way too.
Yes, this is a fairly standard practice - and as such is documented in the almighty help page :)
https://commons.wikimedia.org/wiki/Commons:Guide_to_batch_uploading#Naming