> 2. It's important to emphasize that what I was writing about wasn't something theoretical, but something that has already been happening for years at Hebrew Wikisource. We have been doing both critical editions and scholarly editing, have had much fruitful discussion and collaboration, and never even once has there ever been an edit war. Work on projects like these is indeed slower than the process of scanning and proofreading, but the final product is often a much more valuable contribution to the public. (The public domain scan was already available anyway. But where once a reliable edition was copyrighted and unavailable, now an even better edition is available online under a free license!)
>
> I would suggest learning from our experience, and looking at your own Wikisource with a generous eye that values many different types of collaboration on texts towards building a useful free library for the public. While you and others are proofreading, be appreciative at the same time towards others who are editing in other ways.
Sounds to me like a big mistake. Wikisource is a source, not an editor;
we do not have to decide what is more valuable for the public. And sooner or
later there will be edit wars.
I think we are forgetting that Wikisource is a WIKI!
>This is not the question; as I said: who decides what is a good
>critical edition?
The community decides through collaboration and discussion.
We are a wiki.
>Again, you talked about critical editions; who decides what is a good
>edition then? The quality of Wikisource cannot be based only on what
>contributors think to be good. This is a circle, and that doesn't make
>Wikisource reliable.
"The quality of an encyclopedia cannot be based only on what contributors
think to be good. This is a circle, and that doesn't make Wikipedia reliable.
That is why Britannica is a much better idea."
>I still wonder who decides what is good for the public. Besides, there are
>some rules that define Wikisource, what it is, and what it is not.
We are a wiki. In any Wikimedia project, the community needs to think about and
decide how it can best serve the public. The rules that define Wikisource are created
by the Wikisource community and such things are discussed (which is what
I hope we are doing here).
>If Wikisource publishes critical editions, there will be edit wars,
>because there is no criterion for this kind of edition except what the
>contributors decide.
So go argue with 8 years of experience instead of trying to learn from it... :-)
Have to go...
>Sounds to me like a big mistake. Wikisource is a source, not an editor;
>we do not have to decide what is more valuable for the public. And sooner or
>later there will be edit wars.
Why is it a "big mistake" to provide valuable, useful editions of classic works to
the public under a free license?
Almost all "sources" require good editing, and any good library requires quality
editions. If a good edition is not in the public domain, then just proofreading OCR
won't produce a quality edition for your "free library".
Beyond that, there is no need to declare that Wikisource is THIS and not THAT.
A more generous view of things will better serve both the project and the public.
And like I said, we've never had an edit war (in about 8 years). I tend to think that
is because the people who edit texts and the process of editing texts are both less
prone to edit wars than are Wikipedia articles. It is a different culture. Of course it
could still happen, but then maybe it would be better not to have Wikipedia either
since edit wars happen there?
In case anyone is interested: Alex Brollo is digging into the DjVu layer issue,
and we have a Dropbox folder with all the files.
If you would like to work on it, please drop me an email.
What we can show you right now is this:
https://www.dropbox.com/s/lu6re2a02xp0nyc/Dialogo%20della%20salute%20djvu%2…
As you can see, the text is not re-mapped word by word onto the DjVu page; it is
"stored" all together in a single region of the page (in this case, the
lower left corner).
It is very difficult to re-map the text; for example, when we use
the <ref> tag for footnotes we destroy the pattern :-(
The cool thing is that the text inside is already formatted in wikitext!
https://www.dropbox.com/s/s2c0op5e9jeu47o/Dialogo%20della%20salute%20WS%20s…
Alex assures me this is easy and just uses a few scripts from DjVuLibre
(which is already installed on the Toolserver).
The same could be done by uploading wiki-rendered HTML into the text layer.
This could be very interesting for other websites: they could just
copy and paste the HTML, or extract it with a simple Python script
calling DjVuLibre routines, and then use the Commons file as a
benchmark.
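For anyone who wants to try the round trip, here is a minimal Python sketch, assuming DjVuLibre's djvused command-line tool is installed. It builds a hidden-text s-expression that puts a whole page of wikitext into one region (DjVu coordinates start at the lower left corner), returns the djvused call that writes it into a page with set-txt (which, as far as I know, reads the same s-expression format that print-txt prints), and reads the text back with print-pure-txt. The function names, page size and region box are illustrative assumptions, not Alex's actual scripts.

```python
import subprocess


def djvu_quote(text: str) -> str:
    """Quote text as a double-quoted string for a djvused hidden-text s-expression."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def page_text_sexpr(text: str, width: int, height: int,
                    box: tuple[int, int, int, int]) -> str:
    """Hidden text for one page that puts all of `text` in a single region.

    DjVu coordinates start at the lower left corner, so a box such as
    (0, 0, 600, 200) is a region in the lower left of the page.
    """
    x0, y0, x1, y1 = box
    return (f"(page 0 0 {width} {height} "
            f"(region {x0} {y0} {x1} {y1} {djvu_quote(text)}))")


def set_text_cmd(djvu_path: str, page: int, sexpr_file: str) -> list[str]:
    """djvused call that replaces the text layer of one page (1-based) and saves (-s)."""
    return ["djvused", "-s", "-e", f"select {page}; set-txt {sexpr_file}", djvu_path]


def get_text_cmd(djvu_path: str, page: int) -> list[str]:
    """djvused call that prints one page's text without coordinates, as UTF-8 (-u)."""
    return ["djvused", "-u", "-e", f"select {page}; print-pure-txt", djvu_path]


def extract_page_text(djvu_path: str, page: int) -> str:
    """Run djvused (DjVuLibre must be installed) and return the stored text of a page."""
    result = subprocess.run(get_text_cmd(djvu_path, page),
                            capture_output=True, encoding="utf-8", check=True)
    return result.stdout
```

In practice the s-expression would be saved to a file (e.g. p3.txt) and passed to set_text_cmd; the sketch does not handle file names containing spaces.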
We could, maybe, give some of our books back to Project Gutenberg.
Or, maybe, give them back to GLAMs.
What do you think?
Aubrey and Alex
Thank you Lars! Two brief replies:
1. Thanks to you and Birgitte for the clarification about Wikidata. It looks like that isn't the place to go, but on the other hand, what has been suggested here about MediaWiki supporting TEI in the future could open a lot of doors. I would really like to know where to go for that and who is involved in it.
>I personally think that simple scanning and proofreading is
>the activity where we can most easily grow Wikisource. Since
>the job is mostly non-intellectual, many people can be
>instructed to help, without creating edit wars...
>Translation or scholarly
>editing requires more coordination and takes more time for
>a larger work than the sum of the parts.
2. It's important to emphasize that what I was writing about wasn't something theoretical, but something that has already been happening for years at Hebrew Wikisource. We have been doing both critical editions and scholarly editing, have had much fruitful discussion and collaboration, and never even once has there ever been an edit war. Work on projects like these is indeed slower than the process of scanning and proofreading, but the final product is often a much more valuable contribution to the public. (The public domain scan was already available anyway. But where once a reliable edition was copyrighted and unavailable, now an even better edition is available online under a free license!)
I would suggest learning from our experience, and looking at your own Wikisource with a generous eye that values many different types of collaboration on texts towards building a useful free library for the public. While you and others are proofreading, be appreciative at the same time towards others who are editing in other ways.
Dovi
Thanks Lars.
>Your examples 1 and 2 are the combination of two printed
>editions or variants into one digital product. That process is
>scholarly, text-critical editing, an intellectual exercise. For
>example, if the British and American editions would be found
>to differ not only in spelling but also in content, you would
>have to develop a policy for how to deal with that.
Absolutely correct, and that is exactly what we have done at Hebrew Wikisource. If there is a book that requires special editorial guidelines beyond just simple proofreading, then a page in the Wikisource namespace is created such as [[Wikisource:The Kinematics of Machinery]] where the community collaboratively develops those guidelines.
>The current
>process in Wikisource, as supported by the ProofreadPage
>extension, doesn't address such issues, but only converts one
>printed edition into a digital edition, through scanned images
>and human proofreading. It is a much more limited task, a
>mostly non-intellectual exercise, guided by simple rules.
Also correct to some degree for Wikisources in the larger Latin languages, but not all of Wikisource is this process, not even in English and certainly not in many other languages. There are still plenty of people at en.wikisource who edit and format texts without PP (e.g. working from Gutenberg files or typing texts themselves), produce Wikisource translations, etc. "Proofread Page" is a tool for Wikisource, not the definition of the project itself.
Even if many people at English Wikisource are not currently preoccupied with issues 1&2, wouldn't it be healthy to broaden horizons? Imagine Wikisource creating a modern version of the Loeb Classical Library based on collaborative work... It's wonderful to transcribe Mark Twain or the 1911 Britannica from scanned editions, but the full power and possibilities of the Wiki platform are so much more than that!
>It can't link to both. Ideally, ProofreadPage would be remade so
>that each position in the book (a certain chapter, a certain page,
>a certain paragraph) has only one unique address. This is
>an aspect that apparently was not considered when the current
>software and namespace architecture were developed.
Totally agree that would be a very important function. Equally important would be for the function to allow reference and citation with the simplest address possible: The title of the book plus completely flexible labels for the subsections so that links can be written manually in an intuitive way.
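To make the idea concrete, here is a minimal Python sketch of such an addressing scheme, assuming each work has stable section labels and that each edition's pagination is recorded separately, so that one hand-written address like "Title/1/3" resolves to the right page in every edition. The class, its methods and the sample data are hypothetical; nothing like this exists in ProofreadPage today.

```python
from dataclasses import dataclass, field


@dataclass
class Work:
    """A work whose sections have stable labels, independent of any edition's pagination."""
    title: str  # assumed not to contain "/"
    # label path, e.g. ("1", "3") -> {edition name: page number in that edition}
    pages: dict[tuple[str, ...], dict[str, int]] = field(default_factory=dict)

    def add_location(self, labels: tuple[str, ...], edition: str, page: int) -> None:
        """Record where a labelled section starts in one particular edition."""
        self.pages.setdefault(labels, {})[edition] = page

    def address(self, labels: tuple[str, ...]) -> str:
        """The one canonical address: book title plus section labels."""
        return "/".join((self.title, *labels))

    def resolve(self, address: str) -> dict[str, int]:
        """Map a hand-written address back to its page in every known edition."""
        title, *labels = address.split("/")
        if title != self.title:
            raise KeyError(address)
        return self.pages[tuple(labels)]
```

Usage: after recording ("1", "3") as page 12 of one edition and page 9 of a reprint, resolve("Title/1/3") returns both locations, so a citation never has to mention page numbers at all.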
I looked at Aubrey's onion layers again and it seems to me they actually might be able to include the kinds of things I mentioned in 1&2, but I'd like to hear from her about that.
As to her wondering whether Wikisource is the place for such things, it really shouldn't be such an issue. A simple analogy is called for: Let's say a Wikipedia article needs to be written about the 2012 US Presidential elections. Writing such an article requires a huge amount of fact finding, decisions about writing and presentation and balance. Those problems are solved when there is good faith collaborative editing, by documenting external sources and scholarship, and by a commitment to presenting all sides of an issue fairly (NPOV). That is why even a highly controversial topic like the US presidential elections can have an article in Wikipedia.
The obstacles in creating a critical or annotated version of a text at Wikisource are far *less* in terms of original research or NPOV than in creating almost any Wikipedia article. The best way to find out is to simply try it!
I looked at DPLA by the way and it looks like a wonderful thing. But I can't imagine it replacing Wikisource in terms of quite a few fundamentals: Open Licensing, full commitment to many languages and cultures with full localization, and creative collaboration not just to document the existing library, but to enhance it and improve it.
Does anyone understand whether the years of discussion of "Wikidata" might have anything to do with #1-2?
Dovi
I am cataloguing the non-WM contacts I made at Wikimania and reading up on them, making the time I didn't have during the conference to understand who they are. The BHL had three librarians at the conference, and I remember that they were very interested in a solution for getting OCR corrections back into DjVu files. They mentioned Wikisource in the blog post they wrote about the conference. I thought the list might be interested in what they had to say.
http://blog.biodiversitylibrary.org/2012/07/wikimania-2012.html
Birgitte SB
Thanks for all of the kind replies. Thanks Guarav for the links, I wish there was a clear explanation for all the elements in Aubrey’s layers. As for your example of annotation, it is gorgeous and is obviously the result of a lot of dedicated work.
Birgitte and Lars, maybe an example would be the best way to explain what I am trying to ask. I apologize in advance that the following example is contrived, because I am hard put to find a true example of the issues I’m talking about in actual English texts on Wikisource. So I’ll try to take you through my imaginary example step-by-step.
1. Imagine an English-language encyclopedia that was incredibly popular, so popular that it was published in both American and British editions. It’s the same encyclopedia in both editions, but the spelling is different. In a digital version of this encyclopedia that included database functions you would be able to tag words so as to allow the reader to choose which version s/he wants in terms of spelling with something like this:
{{spelling|A=color|B=colour}} (I understand that American versus British spelling is something so simplistic that an automatic function could probably deal with it even without tags or templates, but bear with me by considering that there might be other valid variations that are far more complicated, and which need to be tagged and documented in order to provide the user with options.)
2. Now further imagine that this encyclopedia was so popular that it was republished many times in the *same* edition. Each time the typesetting was manually reset, which allowed for small corrections to be made (e.g. typos but sometimes even greater variations) but at the very same time allowed new errors to creep in. So when you edit the text, you have several good editions of the same encyclopedia that cast light upon one another, but none of which is perfect. The best way to digitally republish such an encyclopedia would be to fully document the variations using a function something like this (where a,b,c,d are various reprints of the text):
{{variant|select=Wikisource is the Free Library and invites you to contribute!|=abd|c=Wikisource is the Free Libraries and invites you to contribute!|note=c is often sloppy about singular and plural nouns}}
For those who are familiar with the “critical apparatus” that often accompanies classical texts in scholarly editions, this is a way to take that kind of apparatus and embed it within the text itself on the edit page. A database function would further allow the user to display one particular version of his or her choosing, and to make the indications of variant readings and the notes on them appear or disappear by turning the function on or off.
3. Now further imagine that what we are talking about is not an encyclopedia, but rather a legalistic type of literature that is organized by numbered sections and subsections. Furthermore, this literature cites itself avidly, and certain subsections of this book might be cited elsewhere or appear in other contexts tens of thousands of times (literally). Because of the need for convenient citation (often through transclusion) along with the fact of numerous similar editions with different pagination, the page-based “Proofread Page” is no longer the optimal tool for creating digital editions of this literature, and actually makes things more difficult for contributors. Instead, wiki-pages based on the natural division of the text allow for easy citation while keeping things as simple as possible, plus links to various scanned editions can be provided for verification and further improvement of the text.
What I have described here in #3 is the main reason why “Proofread Page” is not heavily used in Hebrew Wikisource. It is installed but not well-supported with infrastructure. I emphasize that in my opinion it is an incredible and important tool, and certainly should be used where appropriate for huge numbers of texts. At Hebrew Wikisource there is certainly no policy against it, and of course we would love it if someone came and started to use it on appropriate texts and improved the Hebrew infrastructure for it. But that still wouldn’t make it appropriate for all texts.
However, in terms of database functions within the text itself I don’t think there is really any issue with Proofread Page. Because when all is said and done, the proofread text of a page is still wikitext. And the question is whether wikitext in general (not PP in particular) could be made to support the kinds of database functions described above.
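As a rough illustration of what such "database functions" could do at render time, here is a minimal Python sketch assuming two hypothetical templates: {{spelling|A=...|B=...}} as in #1, and a variant template in which every parameter name except note lists the sigla of the reprints that share a reading (e.g. abd=... and c=...). The reader picks a spelling, a reprint, and whether to show the apparatus. This is not an existing MediaWiki feature, and the sketch ignores nested templates and values containing "|".

```python
import re

# Matches one non-nested {{spelling|...}} or {{variant|...}} call (hypothetical templates).
TEMPLATE = re.compile(r"\{\{(spelling|variant)\|([^{}]*)\}\}")


def parse_params(body: str) -> dict[str, str]:
    """Split 'k1=v1|k2=v2' into a dict; only the first '=' separates key from value."""
    params = {}
    for part in body.split("|"):
        key, _, value = part.partition("=")
        params[key.strip()] = value
    return params


def render(wikitext: str, spelling: str = "A", witness: str = "a",
           apparatus: bool = False) -> str:
    """Resolve every tagged spot in the text according to one reader's choices."""

    def substitute(match):
        name, params = match.group(1), parse_params(match.group(2))
        if name == "spelling":
            # Fall back to the first listed spelling if the requested one is missing.
            return params.get(spelling, next(iter(params.values())))
        note = params.pop("note", None)
        # The reading whose siglum list contains the chosen reprint, else the first one.
        chosen = next((text for sigla, text in params.items() if witness in sigla),
                      next(iter(params.values())))
        if not apparatus:
            return chosen
        others = "; ".join(f"{sigla}: {text}" for sigla, text in params.items()
                           if text != chosen)
        if note:
            others += f" ({note})"
        return f"{chosen} [{others}]"

    return TEMPLATE.sub(substitute, wikitext)
```

With the apparatus switched off the reader sees a clean text of the chosen reprint; switched on, each tagged spot also shows the other readings and the editor's note in brackets.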
I hope all of this is clearer than my original inquiry. Was anything discussed at Wikimania (including Aubrey’s various layers) that might make solutions possible for functions like these?
Dovi
Hi, please forgive me in advance if my technical knowledge isn't up to speed and I don't entirely understand the issues.
From what I've seen, there is currently an effort to allow database functions for metadata about Wikisource texts.
That in itself is of course very cool.
My question is about the actual texts themselves (not just the metadata describing them):
Often there is more than one good way to format and present a single text. In the current Wikimedia environment this forces the community to decide on which format for any given text is the best one for readers and users. But in a true database environment it would be possible to tag all of the different possibilities within the text itself, allowing the reader or user to choose which format best serves his or her needs.
Is this possibility related to any of the current discussions?
Dovi