Hi!
Last year I was very kindly provided with a list of all SVG files on Commons, that is, their then *real* http(s) paths. (Either by John phoenixoverride@gmail.com or by Ariel T. Glenn aglenn@wikimedia.org.)
Could I get a current version of this dump, please? (With the real paths, and only files that actually exist.)
Back then the dump was http://tools.wmflabs.org/betacommand-dev/reports/commonswiki_svg_list.txt.7z, as far as I remember.
(Someone told me I could create such a dump myself with some wiki-tools. Is this really possible?)
Greetings John
I sent this same message twice already, but it didn't show up. Sorry if it ends up appearing multiple times.
You should be able to extract such a list with a query such as:

  SELECT CONCAT('https://upload.wikimedia.org/wikipedia/commons/',
                SUBSTRING(MD5(img_name), 1, 1), '/',
                SUBSTRING(MD5(img_name), 1, 2), '/',
                img_name)
  FROM image
  WHERE img_media_type = 'DRAWING'
    AND img_major_mime = 'image'
    AND img_minor_mime LIKE 'svg%';

which yields 967065 images.
I've put a copy at http://tools.wmflabs.org/heritage/commonswiki_svg_list-2016-03-02.txt.xz for your convenience.
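Just to illustrate how those paths are built (this is only a sketch, not part of the original mail): the two directory levels are the first one and first two hex digits of the MD5 of the file name as stored in img_name, i.e. with underscores instead of spaces. In bash that would look roughly like:

  # hypothetical example file name; real names come from the list above
  name="Example_image.svg"
  # MD5 of the (underscored) name, as in the MD5(img_name) of the query
  md5=$(printf '%s' "$name" | md5sum | cut -c1-32)
  echo "https://upload.wikimedia.org/wikipedia/commons/${md5:0:1}/${md5:0:2}/${name}"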
Hello Platonides! Thank you for being so kind as to make the list. Originally I wanted to download the SVG files one after another, whenever my program uses one, but it turned out that's just not feasible.
Could I get a dump of all SVG files, please? Not just the list of the files with their paths, but the SVG files themselves? Then I would simply have them locally and wouldn't have to handle all the network exceptions that occur far too often on my side.
That would be a great help to me.
Greetings Dieter (not John, as I accidentally wrote in the last post)
As I am planning to download other file types from Commons and other wikis, and as I can't ask you to do it for me a hundred times, is it possible to access the database myself? On my last request for the SVG file names, Platonides suggested:

  SELECT CONCAT('https://upload.wikimedia.org/wikipedia/commons/',
                SUBSTRING(MD5(img_name), 1, 1), '/',
                SUBSTRING(MD5(img_name), 1, 2), '/',
                img_name)
  FROM image
  WHERE img_media_type = 'DRAWING'
    AND img_major_mime = 'image'
    AND img_minor_mime LIKE 'svg%';

Or do I need to download all dumps of all wikis and set up my own database?
Alternatively: could you make one last dump for me and simply put all the SVG files themselves (the files, not just the paths) of all the wikis within your scope into it, please?
Greetings Dieter
You can run your queries in http://quarry.wmflabs.org/ (don't forget to add USE commonswiki_p; if you want to query Commons).
But that seems to have a relatively short time limit, so if your query takes too long, I think you will need to get an account on https://tools.wmflabs.org/ and run your query there.
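For instance, the query from the earlier mail would be entered in Quarry roughly like this (the USE line selects the Commons database):

  USE commonswiki_p;
  SELECT CONCAT('https://upload.wikimedia.org/wikipedia/commons/',
                SUBSTRING(MD5(img_name), 1, 1), '/',
                SUBSTRING(MD5(img_name), 1, 2), '/',
                img_name)
  FROM image
  WHERE img_media_type = 'DRAWING'
    AND img_major_mime = 'image'
    AND img_minor_mime LIKE 'svg%';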
Petr Onderka User:Svick
Hello Dieter
That would be about as hard for me as it is for you.
I have run the equivalent of:
  mkdir dieter-svgs
  cd dieter-svgs
  time wget http://tools.wmflabs.org/heritage/commonswiki_svg_list-2016-03-02.txt.xz
  xz -d commonswiki_svg_list-2016-03-02.txt.xz
  time wget --force-directories -i commonswiki_svg_list-2016-03-02.txt
  cd ..
  tar -cjf dieter-svgs-2016-03-02.tar.bz2 dieter-svgs/
I have kept the directory structure in place (by using the --force-directories parameter) so that it isn't too stressful for the filesystem. Still, a GUI app will probably choke on it.
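Roughly, unpacking should look like this, assuming wget's usual --force-directories layout where the host name becomes the top-level directory inside dieter-svgs (the example path below is only an illustration):

  tar -xjf dieter-svgs-2016-03-02.tar.bz2
  # files then sit at paths mirroring their URLs, e.g.
  # dieter-svgs/upload.wikimedia.org/wikipedia/commons/a/ab/Some_file.svg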
And after quite a long wait, here is the result: https://archivos.wikimedia.es/dieter-svgs-2016-03-02.tar.bz2 (99 GB)
SHA256 checksum: 48966777cc5f5d733b2a1eaf4d11f86853e9545a56da329929c996e330b38e28 dieter-svgs-2016-03-02.tar.bz2
Best regards
Hi Platonides
First let me thank you very much!
99 GByte, how can that be? On https://dumps.wikimedia.org/commonswiki/20160305/ there are dumps like commonswiki-20160305-pages-articles-multistream.xml.bz2 at 6.7 GByte, and that file is supposed to contain "Articles, templates, media/file descriptions, and primary meta-pages, in multiple bz2 streams, 100 pages per stream". Doesn't that include the SVG files, too? (Though I haven't found out what it actually contains, because my GUI programs on Win 7 Premium 64-bit regularly crash when trying to open it, and trying to download the version split into 4 files now crashes my computer.)
If you say 99 GByte is correct, I will start downloading it. (I have downloaded files this big before, with the OpenStreetMap data, and it worked.)
Best regards Dieter
The dumps do not contain any images, just the description text that goes along with them. Platonides got you a raw copy of the actual files.
Hi Platonides,
I succeeded in downloading the file. (After some crashes.)
As I'm sure you will need the disk space back, I want to verify the checksum you provided, so I can tell you that the data actually arrived intact. As I'm fed up with using Windows for large files, I'll boot an Ubuntu system from a USB stick to check the SHA256 checksum.
Which program did you use, or which would you suggest, to compute the checksum?
(Just so I don't end up choosing a program that crashes after the first 20 minutes, as so many download programs did on Windows.)
Greetings Dieter
This is a basic sha256 checksum. The easiest would be to use sha256sum(1), for which you can:
a) run «sha256sum dieter-svgs-2016-03-02.tar.bz2» and verify that the output is identical to what I provided
b) run «sha256sum -c filename» with filename being a file containing that line (attaching one for your convenience).
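For illustration, option b) would look roughly like this (the checksum file name here is only an example; the attached file may be called something else):

  $ cat dieter-svgs-2016-03-02.tar.bz2.sha256
  48966777cc5f5d733b2a1eaf4d11f86853e9545a56da329929c996e330b38e28  dieter-svgs-2016-03-02.tar.bz2
  $ sha256sum -c dieter-svgs-2016-03-02.tar.bz2.sha256
  dieter-svgs-2016-03-02.tar.bz2: OK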
Best regards