"There is also a difference in how we view copyright, as my own website can cut corners and scan some books that are "most likely" out of copyright, which is something Wikimedia's user communities never accept."
Some of the community does accept this. The Polish Wikisource project uploaded a translation of one of Montgomery's books as a "pseudonymous" work without any proof that the name is a pseudonym (and even if it is, the upload goes against COM:PRP). It is still on Commons, and AFAIK the deletion request was either rejected by admins or has not been decided yet.
Mateusz Malinowski
On Sun, 27 Dec 2020 at 13:02, <wikisource-l-request@lists.wikimedia.org> wrote:
Send Wikisource-l mailing list submissions to wikisource-l@lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit https://lists.wikimedia.org/mailman/listinfo/wikisource-l or, via email, send a message with subject or body 'help' to wikisource-l-request@lists.wikimedia.org
You can reach the person managing the list at wikisource-l-owner@lists.wikimedia.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of Wikisource-l digest..."
Today's Topics:
- Systems for proofreading scanned books (Lars Aronsson)
- Re: Systems for proofreading scanned books (J Hayes)
Message: 1
Date: Sat, 26 Dec 2020 19:23:02 +0100
From: Lars Aronsson lars@aronsson.se
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Cc: Wikisource wikisource-l@lists.wikimedia.org
Subject: [Wikisource-l] Systems for proofreading scanned books
Message-ID: e04dc83b-0da2-0c89-fc39-c5f28e0b5443@aronsson.se
Content-Type: text/plain; charset=utf-8; format=flowed
In 2005, at the first Wikimania in Frankfurt, Germany, Magnus Manske asked me if I could open up my Scandinavian book scanning website Project Runeberg to German and other languages, or release the software as open source.
I refused, as my software is just a rapid prototype that would need to be rewritten from scratch anyway. But I said that Wikisource could be used for this purpose. At the time, Wikisource was only a wiki for e-text. As a proof of concept, I put up "Meyers Blitz-Lexikon" as the first book with scanned page images in Wikisource, https://de.wikisource.org/wiki/Seite:LA2-Blitz-0005.jpg and soon after the "New Student's Reference Work", https://en.wikisource.org/wiki/Page:LA2-NSRW-1-0013.jpg
This was the basic inspiration for the "Proofread Page" extension, now used in Wikisource.
In 2010-2011 I tried to use Wikisource, but I thought this extension was too hard to work with. From scanner to finished presentation, Wikisource was so much slower to work with than my own system. My primary gripes are: it is too hard to upload PDF files to Commons, it is too hard to create the Index page, each page is not created immediately (which would make the raw OCR text searchable), and pages hidden in the Page: namespace are not always indexed by search engines. Unfortunately, the system hasn't improved much in the last decade.
(My criticism of my own website's system is a lot harsher, but hits different targets.)
There is also a difference in how we view copyright, as my own website can cut corners and scan some books that are "most likely" out of copyright, which is something Wikimedia's user communities never accept.
In 2012, I thought the time had finally come to rewrite my software, but I failed to organize a project around this, and instead I continued to use the existing system, just adding volume. Indeed, Project Runeberg has grown from 0.75 million book pages in 2012 to 3.1 million pages today.
Now in 2020, I'm finally tired of my existing system's limitations. What should I do? It's not 2005 or 2012 anymore. What has changed in that time?
I can't move everything over to Wikisource, because of the copyright differences.
Should I start to use MediaWiki + ProofreadPage and convert my collection to that format?
Should I develop my own modification of MediaWiki? Is that stable ground to work from?
It seems to me that PHP, MariaDB and the architecture of MediaWiki with extensions have now been the same for a long time. Will this last for the next 20 years?
Or are there today other existing systems that solve the same problem and that weren't available in 2005? (And that Wikisource would have picked up, if it were started today, instead of developing its own extension.)
-- Lars Aronsson (lars@aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org/
Message: 2
Date: Sat, 26 Dec 2020 18:20:06 -0500
From: J Hayes slowking4@gmail.com
To: "discussion list for Wikisource, the free library" wikisource-l@lists.wikimedia.org
Subject: Re: [Wikisource-l] Systems for proofreading scanned books
Message-ID: <CAN38RzKojj9K= nZ50Lbbvv5ZUND9WcA5kCRGeh++33ohfHB5Gg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
My suggestions:
A simplified UX to upload works is on the wishlist. But a tool that let the user interact on multiple projects to produce a "rough draft" work from a scan would be a great step forward.
Copyright might be eased for a local copy at Wikisource, not on Commons. But you would need some community consensus. If you were bringing tools, they might work with you; you should reach out to them.
You could also transfer the easy copyright works over to Wikisource, and retain the loose ones at your site. (The value of using Wikisource is the increased visibility from being integrated in Wikipedia, and the community-building potential.)
So I would brainstorm some goals, and begin a conversation / partnership with your Wikisource language community toward an action plan.
If I can be of help, let me know.
Cheers,
Jim Hayes