Hi all,
Wikimedia Nederland has recently been approached by several institutions that would like to upload source material. Wikisource would be the preferred platform for this, as the material would be searchable (which it wouldn't be if it were only uploaded as PDF to Commons).
I would like to know if there have been previous projects involving large uploads by institutions, and if there's any documentation on how to proceed with these.
Thanks!
Arne Wossink
Projectleider / Project Lead Wikimedia Nederland
Tel. +31 (0)6 11000505
Postal address: Postbus 167, 3500 AD Utrecht
Visiting address: Mariaplaats 3, Utrecht
Hi Arne, you should speak with Jean-Fred (in CC). A few years ago they uploaded ~1200 books from Gallica to Commons, and then to Wikisource.
Aubrey
On Thu, Oct 15, 2015 at 12:18 PM, Arne Wossink wossink@wikimedia.nl wrote:
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
Arne Wossink, 15/10/2015 12:18:
Wikimedia Nederland has recently been approached by several institutions that would like to upload source material. Wikisource would be the preferred platform for this, as the material would be searchable (which it wouldn't be if it were only uploaded as PDF to Commons).
I would like to know if there have been previous projects involving large uploads by institutions, and if there's any documentation on how to proceed with these.
DjVu (and PDF?) files with a text layer *are* searchable on Commons, and have been since CirrusSearch was enabled (September 2014). Of course the search is only as good as the text: with poor OCR, it will be poor.
The main points are the same as for all batch uploads, see https://commons.wikimedia.org/wiki/Commons:Guide_to_batch_uploading . Actually creating pages in Wikisource is another matter: you must consider what the goals are and have a good plan.
BEIC uploaded about 1000 books in 2015 and will upload more in the future. We only created the Index pages we considered necessary, and we didn't touch namespace 0. See pointers at https://it.wikipedia.org/wiki/Progetto:GLAM/BEIC/2015-07 (search "Wikisource") and https://it.wikisource.org/wiki/Wikisource:Collaborazioni/BEIC .
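To try the point about text-layer search on Commons, one option is a query through the standard MediaWiki search API restricted to the File namespace. The sketch below only builds the request URL and does not contact the server. The function name, the phrase, and the result limit are illustrative assumptions, and whether a given file is found still depends on the quality of its OCR text.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"
FILE_NAMESPACE = 6  # the "File:" namespace on Wikimedia wikis


def build_commons_search_url(phrase: str, limit: int = 10) -> str:
    """Return an API URL for a full-text phrase search over Commons files.

    Hypothetical helper for illustration: it uses the MediaWiki
    action=query&list=search module and wraps the phrase in double
    quotes so that it is searched as an exact phrase.
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f'"{phrase}"',
        "srnamespace": FILE_NAMESPACE,
        "srlimit": limit,
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


if __name__ == "__main__":
    # Print the URL only; open it in a browser or fetch it with any HTTP client.
    print(build_commons_search_url("Wikimedia Nederland"))
```

Running the same search for a sentence taken from a freshly uploaded book is a quick way to see whether its text layer was indexed.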
Nemo
...and for those who don't read Italian, the series of articles about this on Outreach may be helpful: https://outreach.wikimedia.org/w/index.php?search=beic&title=Special%3AS...
On Thu, Oct 15, 2015 at 1:12 PM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Also note that User:Dominic was a Wikimedian in Residence with NARA in the States and had a large number of files uploaded, along with components of a transcription project for those uploads. They have their own template at Commons, so you should be able to dig them up.
Regards, Billinghurst
On Thu, Oct 15, 2015 at 9:18 PM Arne Wossink wossink@wikimedia.nl wrote:
Perhaps all of you already know this, but I only recently discovered that pdf2djvu converts a *searchable PDF* into a *searchable DjVu* (i.e. it carries everything over from PDF to DjVu, active links and metadata too), and I'd like to share my discovery. Conversion is extremely simple. Unfortunately, we use only a little of the DjVu text data: usually only the whole, unmapped text, the only exception being the hOCR tool by Phe, which outputs mapped text.
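For anyone who wants to script the conversion over many files, here is a minimal sketch that wraps the basic `pdf2djvu -o OUTPUT INPUT` invocation. The helper names and file names are assumptions made for illustration, and the script assumes pdf2djvu is installed and on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdf2djvu_command(pdf_path, djvu_path=None):
    """Return the argument list for converting one PDF to DjVu.

    If no output path is given, the input name is reused with a
    .djvu extension (an assumption of this sketch, not a pdf2djvu rule).
    """
    pdf = Path(pdf_path)
    djvu = Path(djvu_path) if djvu_path else pdf.with_suffix(".djvu")
    return ["pdf2djvu", "-o", str(djvu), str(pdf)]


def convert(pdf_path, djvu_path=None):
    """Run pdf2djvu and return the path of the DjVu file it wrote."""
    if shutil.which("pdf2djvu") is None:
        raise RuntimeError("pdf2djvu is not installed or not on PATH")
    command = build_pdf2djvu_command(pdf_path, djvu_path)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return Path(command[2])


if __name__ == "__main__":
    # Show the command that would be run for a hypothetical file name.
    print(" ".join(build_pdf2djvu_command("scan.pdf")))
```

Calling `convert()` in a loop over a directory of PDFs would give a simple batch conversion. Whether the text layer survives can then be checked by opening the result in a DjVu viewer and searching it.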
Alex
2015-10-15 13:45 GMT+02:00 billinghurst billinghurstwiki@gmail.com: