Hi all,
I've been tinkering with an idea I've had for importing Project Gutenberg books into Wikisource: http://tools.wmflabs.org/pg2ws/
The idea is that, if Wikidata makes a link between a PG ID number and a Wikisource Index page, then we can go through that Index page one page at a time, and copy the page's text from the PG book to the WS page.
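To make the "one page at a time" part concrete, here is a rough sketch of how the page titles for an Index could be listed. It assumes the usual naming where the pages of `Index:<file>` live at `Page:<file>/<n>`; the file name and the page count are made-up examples, and the function name `page_titles` is hypothetical, not something taken from the tool.

```python
def page_titles(index_title: str, page_count: int) -> list[str]:
    """Return the Page-namespace titles that belong to an Index page.

    Assumes the convention Index:<file> -> Page:<file>/<n>, with pages
    numbered from 1. The page count has to be supplied by the caller.
    """
    prefix = "Index:"
    if not index_title.startswith(prefix):
        raise ValueError(f"not an Index title: {index_title!r}")
    file_name = index_title[len(prefix):]
    return [f"Page:{file_name}/{n}" for n in range(1, page_count + 1)]


# Hypothetical example: a three-page scan.
for title in page_titles("Index:Example book.djvu", 3):
    print(title)
```

Each title would then receive one chunk of the PG text.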
The interface so far isn't very brilliant, but I'm just trying to figure out if this is worthwhile or not. Basically, it's a matter of selecting the right chunk of text in the right-most text box (the full PG text) and hitting the button to move it left into the centre box. Then cleaning it up (manually and with the magic cleaning button) to make it match the image, and then uploading it to Wikisource.
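As an illustration only, the cleaning step might look something like the sketch below. This is a guess at what a cleaning button could do, not a description of the tool's actual behaviour: it assumes the PG plain text is hard-wrapped with blank lines between paragraphs and marks italics with underscores. The function name `clean_pg_chunk` is hypothetical.

```python
import re


def clean_pg_chunk(chunk: str) -> str:
    """Tidy a chunk of Project Gutenberg plain text for a wiki page.

    Assumptions (not confirmed by the tool): paragraphs are separated by
    blank lines, lines inside a paragraph are hard-wrapped, and italics
    are written as _like this_.
    """
    paragraphs = re.split(r"\n\s*\n", chunk.strip())
    cleaned = []
    for para in paragraphs:
        # Join the hard-wrapped lines of one paragraph into a single line.
        text = " ".join(line.strip() for line in para.splitlines())
        # Turn _italic_ into wiki markup ''italic''.
        text = re.sub(r"_([^_]+)_", r"''\1''", text)
        cleaned.append(text)
    return "\n\n".join(cleaned)


print(clean_pg_chunk("It was the _best_ of\ntimes.\n\nSecond\nparagraph."))
```

Anything beyond this (headers, templates, links) would still need to be done by hand against the scan.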
It's a bad tool though, because it doesn't handle the running header, and the copy-across button doesn't do nice things with {{hws}} etc. — not to mention all the other things it doesn't do.
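For the {{hws}} problem, a minimal sketch of what "nice things" could mean is below. It assumes the two-parameter form of the templates (word part, then whole word), with {{hwe}} as the counterpart on the following page. The PG text does not record page breaks, so the split position has to come from looking at the scan. The function name `split_hyphenated` is hypothetical.

```python
def split_hyphenated(word: str, split_at: int) -> tuple[str, str]:
    """Build the template pair for a word broken across two pages.

    Returns (end-of-page markup, start-of-next-page markup). Assumes the
    two-parameter form {{hws|part|whole}} / {{hwe|part|whole}}.
    """
    if not 0 < split_at < len(word):
        raise ValueError("split_at must fall inside the word")
    start, end = word[:split_at], word[split_at:]
    return f"{{{{hws|{start}|{word}}}}}", f"{{{{hwe|{end}|{word}}}}}"


tail, head = split_hyphenated("Washington", 4)
print(tail)  # goes at the end of the first page
print(head)  # goes at the start of the next page
```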
Anyway, just thought I'd mention it. :-) Anyone think this is an avenue worth exploring? Certainly I'd love to be able to say we've got everything PG has /and more/!
—Sam
PS: changes made by this tool are all tagged as "OAuth CID: 638":
https://en.wikisource.org/w/index.php?title=Special:RecentChanges&tagfilter=OAuth+CID%3A+638
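If anyone wants to build that kind of link for another tag or another Wikisource, a small sketch with the standard library follows. The tag name is the one quoted above; the function name `tagged_changes_url` and the `site` default are just illustrative choices.

```python
from urllib.parse import urlencode


def tagged_changes_url(tag: str, site: str = "https://en.wikisource.org") -> str:
    """Return a Special:RecentChanges URL filtered by a change tag."""
    # urlencode percent-encodes the colon and turns spaces into '+'.
    query = urlencode({"title": "Special:RecentChanges", "tagfilter": tag})
    return f"{site}/w/index.php?{query}"


print(tagged_changes_url("OAuth CID: 638"))
```

Note that `urlencode` also encodes the colon in the page title as `%3A`, which is equivalent to the literal colon for this purpose.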
Hy Sam,
would be good, cause PG does not hat/show the scans,
But as I remember, there was (or still is) a policy at de.ws not to use texts from other projects (say: if text A is in PG, there won't be a similar text A in de.WS),
because back when de.WS did use PG texts, Google treated WS as a mirror of PG, and all the other (non-PG) texts were left out of the Google search results as well. The (small) visibility of WS was lost completely. That is the reason why there are no new projects on de.WS for texts that are already available in a (nearly) similar project
(besides the effort: why spend so much time on a text that is already available? You'd have to proofread it at least two times).
But that is a special German thing...
What do the others think about it? Anika
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
corr1: [...] does not ha*ve*/show the scans, [...]
Anika
I think the idea is good, but I would like to try it on my own Wikisource: could you also manage to take in the few Italian books that PG has? Thanks!
Andrea
I too am very interested in both the idea and its technical implementation, but I need some more "docs for dummies" to understand it fully :-(
About importing already-proofread texts into Wikisource: a text in Wikisource is different from a similar text on another web site, since it is "a node in the wiki network", and this goal IMHO deserves some pain to proofread (and re-format) it again, adding lots of wiki cross-links.
Alex
Hy Alex,
My comment was not about whether to spend some time on a PG project or no time at all.
The point/question (when it comes to de.WS) is a different one:
(A) Should we spend our valuable contributions on a project that is already freely available (in another format), or on a (related) project that is NOT already freely available? (And we do have a lot of them.)
// Note: it is not about not spending any time on proofreading or on the Wikisource project; it is about finding valuable projects/texts to invest our time in.
+ (B) Should we spend this time on a project that may cost us the findability of the whole Wikisource project (and of all other texts on Wikisource), because Google/Bing/others tag us as a fork/reuser/copy of ...? (This happened in the past, at least with de, when we had some texts from the commercial http://gutenberg.spiegel.de/, which is also supported by ABBY with a free software license.)
Anika
Back to the tool: is there some more documentation to understand, step by step, how to run it? I imagine that one needs a Gutenberg text and a Wikisource Index page coming from the same edition used for the Gutenberg text; then the tool allows something like a "manual match and split". But perhaps I didn't understand anything... I need to see the tool at work to understand it! :-(
In its early days, it.source uploaded many books from an Italian project, LiberLiber, somewhat similar to Project Gutenberg, and we often convert those ns0-only texts into proofread ones by various tricks; so I'd like to learn anything I can from Sam's tool.
Alex
Yeah, it's exactly like a "manual match-and-split" (or at least, I'm hoping it can be).
So yes, the first step is to make sure that the WD item has the two properties: one for PG ID, and one for Wikisource Index Page. Then the tool will show a link to 'transfer' the PG book to WS.
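A sketch of that first check, for the curious. It works on the JSON shape that Wikidata returns for an entity (`claims` keyed by property ID). P1957 is the Wikisource index page property; P2034 as the Project Gutenberg ebook ID property is my assumption here and worth double-checking. The sample entity and its values are invented, and the helper names are hypothetical.

```python
PG_ID = "P2034"       # assumed: Project Gutenberg ebook ID
INDEX_PAGE = "P1957"  # Wikisource index page


def first_value(entity: dict, prop: str):
    """Return the first concrete value of a property, or None if absent."""
    for claim in entity.get("claims", {}).get(prop, []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            return snak["datavalue"]["value"]
    return None


def can_transfer(entity: dict) -> bool:
    """True when the item has both a PG ID and a Wikisource Index page."""
    return (first_value(entity, PG_ID) is not None
            and first_value(entity, INDEX_PAGE) is not None)


# Invented sample in the shape of Wikidata entity JSON.
example_entity = {
    "claims": {
        "P2034": [{"mainsnak": {"snaktype": "value",
                                "datavalue": {"value": "1400"}}}],
        "P1957": [{"mainsnak": {"snaktype": "value",
                                "datavalue": {"value": "https://en.wikisource.org/wiki/Index:Example_book.djvu"}}}],
    }
}
print(can_transfer(example_entity))
```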
The interface has the full PG text, that you manually select the current page from. Click the button to transfer this to the WS text-box, clean it up a bit (adding links, templates, etc.), and then save it to WS.
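In code terms, the transfer step amounts to roughly the following. This is a simplified sketch: whether the selected text should also be removed from the full-text box is an assumption made for the example, and `transfer_selection` is a hypothetical name.

```python
def transfer_selection(full_text: str, start: int, end: int) -> tuple[str, str]:
    """Move the selected range out of the full PG text.

    Returns (text for the current page, remaining PG text). Removing the
    selection from the source text is an assumption of this sketch.
    """
    if not 0 <= start < end <= len(full_text):
        raise ValueError("selection is outside the text")
    page_text = full_text[start:end].strip()
    remaining = full_text[:start] + full_text[end:]
    return page_text, remaining


page, rest = transfer_selection("Page one. Page two.", 0, 9)
print(repr(page))
print(repr(rest))
```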
I'm making a little screencast of how it works; will send the link for that to this list soon.
—Sam
Hi Sam,
Good idea !
For me, the Wikidata linking part seems (maybe the most) important. It's a great tool for visualising that most books are poorly described in Wikidata (so many P1957 missing!).
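To get a list of the items concerned, a query along these lines could be pasted into the Wikidata Query Service. It is only a sketch: P2034 as the Project Gutenberg ebook ID property is an assumption to verify, and `missing_index_query` is a hypothetical helper that merely builds the query text.

```python
def missing_index_query(limit: int = 100) -> str:
    """Build a SPARQL query for items with a PG ID but no P1957.

    P2034 (Project Gutenberg ebook ID) is assumed; P1957 is the
    Wikisource index page property.
    """
    return f"""
SELECT ?item ?pgId WHERE {{
  ?item wdt:P2034 ?pgId .
  FILTER NOT EXISTS {{ ?item wdt:P1957 ?index . }}
}}
LIMIT {limit}
""".strip()


print(missing_index_query(10))
```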
The importing-from-PG part seems important too (but for fr.ws, IIRC, we already have most of the PG works).
Cdlt, ~nicolas
2016-10-14 12:55 GMT+02:00 Anika Born WikiAnika@wikipedia.de:
> Hy Alex,
> My comment was not about whether to spend some time on a PG project or no time at all.
> The point/question (when it comes to de.WS) is a different one:
> (A) Should we spend our valuable contributions on a project that is already freely available (in another format), or on a (related) project that is NOT already freely available? (And we do have a lot of them.)
> // Note: it is not about not spending any time on proofreading or on the Wikisource project; it is about finding valuable projects/texts to invest our time in.
I see the thing differently: when a text is already on Gutenberg, why should we redo it from scratch on Wikisource when we can just copy it?
> + (B) Should we spend this time on a project that may cost us the findability of the whole Wikisource project (and of all other texts on Wikisource), because Google/Bing/others tag us as a fork/reuser/copy of ...? (This happened in the past, at least with de, when we had some texts from the commercial http://gutenberg.spiegel.de/, which is also supported by ABBY with a free software license.)
I've never heard of this before. Did it happen only on de.ws? Was it really because of copying Gutenberg? (And was it before the proofreading, which changed pretty much everything?)
Cdlt, ~nicolas
That's a really good point Anika, I'd not considered that having PG books could be detrimental to Wikisource! :-(
I guess the reverse could also be true? That Google might think that PG is a mirror of WS, and decrease PG's page-rank. Either way, not great.
How can I investigate whether this is occurring? How did you figure it out for de.ws?
As for replicating the effort: I figure that if there are people interested in doing it, then why not! :-) Personally, I want to make Wikisource the best digital library it can be, and when I show it to people and they say "oh but you haven't got all of Dickens" or something, then I want to fix that. And it seems that importing other existing (free and open) digital libraries can help with this in a quicker fashion than straight-up proofreading. But I totally can see why people wouldn't want to spend time doing it! And that's cool.
:-)
—Sam
On 14/10/16 03:55, Anika Born wrote:
Hy Alex,
My comment was not about spending some time on a PG-Projekt or not spending any time at all.
The point/question (when it comes to de-WS) is a different one:
(A) to spend some of our valuable contributions into a project that already is freely available (in another format) or spend this time in a (related) project that is NOT already freely available? (and we do have a lot of them)
// note, it is not about not spending any time in proofreading or the Wikisourceproject... it is about finding valuable projects/texts to invest our time...
- (B) to spend this time in a project, that may cost us the
findability of the whole wikisource-project (and all other texts on wikisource) because Google/Bing/others do tag us as fork/reuser/copy of ... (as happened in the past, at least with de, when we had some texts of the commercial http://gutenberg.spiegel.de/ that is also supported by ABBY with a free softwarelizense)
Anika
2016-10-14 10:13 GMT+02:00 Alex Brollo <alex.brollo@gmail.com>:
I too am very interested in both the idea and its technical implementation, but I need some more docs for dummies to understand it fully :-(
About importing already-proofread texts into Wikisource: a text in Wikisource is different from a similar text on another website, since it is "a node in the wiki network", and this goal IMHO deserves some pain to proofread (and re-format) it again, adding lots of wiki cross-links.
Alex
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
Hm, it should work fine for it.ws too. Can you give me a WD item for a book with a PG ID and an it.ws Index page? I'll investigate further... :-)
One cool thing that I've only recently found is this list of PG's sources: http://www.pgdp.net/c/tools/project_manager/show_image_sources.php (you need to log in)
It's not very structured, but it's the only place I've found that links a PG ID to a scan on the Internet Archive or elsewhere. I'm thinking of writing a scraper to get the data so that it can at least link more PG IDs and IA identifiers on Wikidata.
—Sam
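A rough sketch of what the core of such a scraper might look like, for anyone curious. It assumes each row of the listing has already been fetched as plain text and mentions both a gutenberg.org ebook link and an archive.org details link. The real page layout is not known here, so the URL patterns, the `extract_pairs` name and the sample rows are all illustrative assumptions, not the tool's actual code.

```python
import re

# Assumed URL shapes; the real listing may format its links differently.
PG_RE = re.compile(r"gutenberg\.org/(?:ebooks|etext)/(\d+)")
IA_RE = re.compile(r"archive\.org/details/([A-Za-z0-9._-]+)")


def extract_pairs(rows):
    """Return (pg_id, ia_identifier) pairs for rows that mention both links."""
    pairs = []
    for row in rows:
        pg = PG_RE.search(row)
        ia = IA_RE.search(row)
        # Rows lacking either link are skipped rather than guessed at.
        if pg and ia:
            pairs.append((int(pg.group(1)), ia.group(1)))
    return pairs


# Invented sample rows, purely for illustration (not real catalogue data).
sample_rows = [
    "Example A https://www.gutenberg.org/ebooks/12345 https://archive.org/details/exampleitem00auth",
    "Example B (no scan listed) https://www.gutenberg.org/ebooks/67890",
]
print(extract_pairs(sample_rows))
```

Each resulting pair would still need a human check before being written to Wikidata, since a PG text and an IA scan can be different editions of the same work.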
On 13/10/16 23:27, Andrea Zanni wrote:
I think the idea is good, but I would like to try that in my wikisource: could you manage to take also the few italian books that PG has? Thanks!
On Fri, Oct 14, 2016 at 8:23 AM, Anika Born <WikiAnika@wikipedia.de> wrote:
corr1: [...] does not ha*ve*/show the scans, [...]
Anika
OK, I'll use https://www.wikidata.org/wiki/Q27245478 as an example, and I'll submit it to the it.source WD specialists to see whether we can retrieve or add data for a test work.
Alex
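For finding other candidate items of this kind, a query against the Wikidata Query Service might help. The sketch below only builds the SPARQL text and the request URL; it does not send anything. It assumes that P2034 is the Project Gutenberg ebook ID property and P1957 is the Wikisource index page URL property (both worth double-checking on Wikidata before relying on them). The function names are made up for this example.

```python
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(lang="it", limit=50):
    """Build a SPARQL query for items with a PG ID and an Index page on <lang>.wikisource.org.

    Assumption: P2034 = Project Gutenberg ebook ID, P1957 = Wikisource index page URL.
    `lang` is inserted as-is, so only pass a trusted subdomain such as "it" or "en".
    """
    prefix = f"https://{lang}.wikisource.org/"
    return f"""
SELECT ?item ?pgId ?indexPage WHERE {{
  ?item wdt:P2034 ?pgId .
  ?item wdt:P1957 ?indexPage .
  FILTER(STRSTARTS(STR(?indexPage), "{prefix}"))
}}
LIMIT {int(limit)}
""".strip()


def build_url(query):
    """Return a GET URL for the query service, asking for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = build_query("it", 10)
print(query)
print(build_url(query))
```

An empty result for it.ws would simply mean that no item carries both statements yet, which is the data that would need adding for a test work.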
2016-10-16 1:28 GMT+02:00 Sam Wilson sam@samwilson.id.au:
Anyway, just thought I'd mention it. :-) Anyone think this is an avenue worth exploring? Certainly I'd love to be able to say we've got everything PG has *and more*!
Hello Sam,
Great idea; moving texts over to WS is definitely worth doing, in my opinion. The text can be changed / reformatted / relinked if needed, and we have the ability to export texts on demand to many formats, tweaking the output if needed. PG only offers static files.
Jan
wikisource-l@lists.wikimedia.org