Lars Aronsson wrote:
At http://runeberg.org/ I digitize old
books, among them several encyclopedias. For the sake of
familiarity, you can think about scanned books in Wikisource
rather than my website.
In many cases an encyclopedia from 1889 is useful for knowing the
population of Aberdeen in 1889. It could be nice to report what
the current population is, but in some cases it is also important
to point out that the reported number for 1889 was indeed wrong.
But if scanning and OCRing one page takes 3 seconds and
proofreading takes 3 minutes, how long does it take to check all
the facts? Not knowing how this should best be addressed, it
seemed like a stupid idea to digitize more old works that are full
of errors.
The originals are the originals, errors and all. Correcting their
errors is a bit like changing history. We cannot accept responsibility
for a lack of neutrality in these old works. We can let readers know
that that's the way the facts appeared, and perhaps add footnotes when
we find an error. In some cases these inaccuracies became the
foundation of whole streams of thought that followed them. Students of
paleography are able to trace the origin of manuscripts by tracing
common errors. Each Wikipedia article is accompanied by a history
that documents every little change. Similarly, every error-filled old
work is as much a part of the history of its subject.
So one problem still exists: From the scanned book page, there is
no link to the Wikipedia article that provides more up-to-date
information. The reader of the scanned page can of course use a
search engine, and will often find the Wikipedia article. But is
this really the ultimate solution? And even if the Wikipedia
article is found, the other scanned pages that link to the same
article are not found from there.
Should each scanned book page include a list of links to Wikipedia
articles that are relevant for the page? Could such lists be
compiled (or suggested) automatically?
This depends on what you see as the relative roles of the scanned page
and the transcribed page. The former is a connection with the past and
the latter with the future. The scanned page needs to give us a
perfectly accurate representation of what we were given to work with.
Each time we mark it up, we move a little further from what it was.
Even something as simple as putting double square brackets around a word
could be questionable. The transcribed page is what makes Wikisource
special. Links and categories there should be encouraged. So should
all manner of annotations and translations.
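
On whether such link lists could be suggested automatically for the
transcribed page, here is a minimal sketch, not a worked-out tool. It
assumes we already have the transcribed text and some list of known
Wikipedia article titles; the function name suggest_links and the
sample titles are hypothetical, and a real version would need to deal
with spelling variants and ambiguous names.

```python
import re

def suggest_links(page_text, article_titles):
    """Return the known titles that occur as whole words or phrases in page_text.

    article_titles is an assumed, externally supplied collection of
    Wikipedia article titles; nothing here fetches it.
    """
    found = []
    for title in article_titles:
        # \b keeps "Aberdeen" from matching inside a longer word.
        pattern = r"\b" + re.escape(title) + r"\b"
        if re.search(pattern, page_text, flags=re.IGNORECASE):
            found.append(title)
    return sorted(found)

# Hypothetical sample data for illustration only.
sample_text = "The population of Aberdeen in 1889 was reported in the encyclopedia."
sample_titles = {"Aberdeen", "Scotland", "Ottoman Empire"}
suggestions = suggest_links(sample_text, sample_titles)
```

The result is only a list of candidates for a human editor to accept or
reject, which leaves the scanned page itself untouched.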
Should Wikisource have a [[category:Aberdeen]] that collects all
pages, chapters and books that pertain to this town? Today the
English Wikisource has one [[Category:Works by subject]], but
under this is a very small tree, compared to all articles in
Wikipedia. There is no category for Aberdeen, but one for
Scotland that has 15 links of which 4 are to articles in the 1911
Encyclopaedia Britannica. The 1911 EB article "Aberdeen (burgh)"
is not among these four,
http://en.wikisource.org/wiki/1911_Encyclop%C3%A6dia_Britannica/Aberdeen_%2…
I don't think that the category system is the best way of handling
this. Categorization can sometimes be highly subjective, and we do not
lack for individuals who make arguing about categories a priority. An
improved internal search engine would be better. Its options should
include Search titles, Search links, and Search whole texts. I have
long envisioned that links with Wiktionary could provide evidence of
how words have been used historically, and that concordances could be
developed for any work included in Wikisource.
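
To make the concordance idea concrete, here is a rough sketch of how
one might be built from a transcribed text. It is an illustration
under simple assumptions (plain English text, words split on
non-letters), not a description of any existing Wikisource feature;
the name build_concordance and the sample sentence are hypothetical.

```python
import re
from collections import defaultdict

def build_concordance(text, width=3):
    """Map each lowercased word to the contexts in which it occurs.

    Each context is the word plus up to `width` words on either side.
    """
    words = re.findall(r"[A-Za-z']+", text)
    concordance = defaultdict(list)
    for i, word in enumerate(words):
        start = max(0, i - width)
        context = " ".join(words[start:i + width + 1])
        concordance[word.lower()].append(context)
    return dict(concordance)

# Hypothetical sample text for illustration only.
sample = "Aberdeen is a city. The city of Aberdeen lies in Scotland."
index = build_concordance(sample, width=1)
```

Looking up index["aberdeen"] then gives every short passage in which
the word appears, which is the kind of evidence of historical usage
that a Wiktionary entry could point to.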
Wikisource also has a [[Category:Ottoman Empire]] that contains
four articles from the 1911 Encyclopaedia Britannica, one other
chapter and two other works. But the corresponding category on
the English Wikipedia has 56 pages and 12 immediate subcategories.
Even the sub-subcategory Ottoman railways has 6 Wikipedia
articles. On Wikisource there seem to be 6 mentions of the
"Orient Express", but these are found through Google and not
through links on the website,
http://www.google.com/search?q=%22orient+express%22+site%3Aen.wikisource.org
Sounds like we have a lot of work ahead. :-)
Ec