Jim Hu wrote:
For example, the web service at PubMed provides the abstract and links to full text (at yet another website) for a publication. My users would want to add things like: "This paper describes a resource that turned out to be useful for doing X" or "Figure 1 in this paper shows this thing that the authors didn't notice" or "The xxx gene described in this paper is also known as yyy; they were shown to be the same 10 years later" etc.
I have a similar problem. At http://runeberg.org/ I digitize old books, among them several encyclopedias. For the sake of familiarity, you can think about scanned books in Wikisource rather than my website.
In many cases an encyclopedia from 1889 is useful for knowing the population of Aberdeen in 1889. It could be nice to report what the current population is, but in some cases it is also important to point out that the reported number for 1889 was indeed wrong. But if scanning and OCRing one page takes 3 seconds and proofreading takes 3 minutes, how long does it take to check all the facts? Not knowing how this should best be addressed, it seemed like a stupid idea to digitize more old works that are full of errors.
When Wikipedia was started in 2001 and started to get off the ground, this became the obvious place to put information on the current and historic population of Aberdeen. The scanning of old texts no longer had to carry this role. It was really only in 2002 and 2003 that I got the energy to scan more works for my own site, and in 2005 I scanned this for Wikisource, http://en.wikisource.org/wiki/The_New_Student%27s_Reference_Work
Turns out Aberdeen's population in 1911 was 163,084, http://en.wikisource.org/wiki/The_New_Student%27s_Reference_Work/1-0016 http://en.wikisource.org/wiki/The_New_Student%27s_Reference_Work/Aberdeen but this bit of information is not linked to or included in http://en.wikipedia.org/wiki/Aberdeen#Population
So one problem still exists: From the scanned book page, there is no link to the Wikipedia article that provides more up-to-date information. The reader of the scanned page can of course use a search engine, and will often find the Wikipedia article. But is this really the ultimate solution? And even if the Wikipedia article is found, the other scanned pages that link to the same article are not found from there.
Should each scanned book page include a list of links to Wikipedia articles that are relevant for the page? Could such lists be compiled (or suggested) automatically?
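One way such lists might be suggested automatically is to scan the transcribed text of a page for phrases that match known Wikipedia article titles. The following is only a minimal sketch under that assumption: the function name `suggest_links`, the sample sentence and the short title list are all made up for illustration, and a real tool would need a full title index and some disambiguation.

```python
import re

def suggest_links(page_text, known_titles):
    """Return the known titles that occur in page_text as whole phrases,
    ordered by where they first appear on the page."""
    hits = []
    for title in known_titles:
        # \b keeps "Aberdeen" from matching inside a longer word
        pattern = r"\b" + re.escape(title) + r"\b"
        match = re.search(pattern, page_text)
        if match:
            hits.append((match.start(), title))
    return [title for _, title in sorted(hits)]

# Hypothetical sample text and title list, not taken from any scanned work
page = "Aberdeen is a city in Scotland."
titles = ["Scotland", "Aberdeen", "Ottoman Empire"]
suggestions = suggest_links(page, titles)
print(suggestions)
```

The output is only a list of candidates; a human proofreader would still have to decide which suggestions are relevant for the page.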
Should Wikisource have a [[category:Aberdeen]] that collects all pages, chapters and books that pertain to this town? Today the English Wikisource has one [[Category:Works by subject]], but under this is a very small tree, compared to all articles in Wikipedia. There is no category for Aberdeen, but one for Scotland that has 15 links of which 4 are to articles in the 1911 Encyclopaedia Britannica. The 1911 EB article "Aberdeen (burgh)" is not among these four, http://en.wikisource.org/wiki/1911_Encyclop%C3%A6dia_Britannica/Aberdeen_%28...
Wikisource also has a [[Category:Ottoman Empire]] that contains four articles from the 1911 Encyclopaedia Britannica, one other chapter and two other works. But the corresponding category on the English Wikipedia has 56 pages and 12 immediate subcategories. Even the sub-subcategory Ottoman railways has 6 Wikipedia articles. On Wikisource there seem to be 6 mentions of the "Orient Express", but these are found through Google and not through links on the website, http://www.google.com/search?q=%22orient+express%22+site%3Aen.wikisource.org
I think we are far away from having the subject categories worked out at en.WS. We are still slowly working on populating the easy categories (like dates and languages). Basically, categories in general need to be populated with manpower we do not currently have. Subject categories need discussion and decisions on how we are going to map those areas, and those are harder to produce than the manpower to populate the easy categories.
More to your point about Wikisource encyclopedia articles linking back to Wikipedia: en.WS currently uses a space called "notes" in the header of the page to say something like "See the modern Wikipedia entry at [[w:Ethiopia|Ethiopia]]". Of course this is all on the transcribed page rather than on the scanned page. But I would think the scanned pages are used primarily to proofread the transcribed pages, and the transcribed pages would be what readers are seeing. So I am not sure why the scanned page needs links to Wikipedia.
http://en.wikisource.org/wiki/The_New_Student%27s_Reference_Work/Abyssinia
Birgitte SB
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikisource-l
Birgitte SB wrote:
So I am not sure why the scanned page needs links to Wikipedia.
I'm not insisting that they need to. The sequence of chapters or articles runs in parallel to the sequence of scanned pages. Exactly how they should best be represented in digital form is still an open issue. Some formats (scanned PDF) are very close to the printed pages; other formats (e.g. classic Project Gutenberg e-text) try hard to distance themselves from the printed pages. Subject linking should at best refer to a point (or sometimes a range) in either sequence, the entry point where I should start to read this book if I want to learn about Aberdeen. The development in WS of "page:" and "notes" is very interesting to follow.
We tend to think of Wikipedia as something big, one of the biggest things to have happened on the Internet. But walk into any library and consider how small their encyclopedia shelf is in comparison to the rest of the library. We have a lot to do. In the end, Wikipedia might serve as a subject index to Wikisource. The index tends to be smaller than the book.
Lars Aronsson wrote:
We tend to think of Wikipedia as something big, one of the biggest things to have happened on the Internet. But walk into any library and consider how small their encyclopedia shelf is in comparison to the rest of the library. We have a lot to do. In the end, Wikipedia might serve as a subject index to Wikisource. The index tends to be smaller than the book.
In relation to what has been done in the past, Wikipedia is indeed huge.
I see it as an index to both Wikisource and Wikibooks, and even more.
Your point is well taken. Even if we eliminate all copyright-protected work from consideration, the task is still huge. I sometimes wonder whether those who complain about the notability of an article ever grasp the size of the task. It often seems that the only thing less notable than some of these subjects is the discussion about their notability. I would rather see all that manpower working on building the new library instead of obsessing about the graffiti that has been scribbled in a book.
Ec
Lars Aronsson wrote:
At http://runeberg.org/ I digitize old books, among them several encyclopedias. For the sake of familiarity, you can think about scanned books in Wikisource rather than my website.
In many cases an encyclopedia from 1889 is useful for knowing the population of Aberdeen in 1889. It could be nice to report what the current population is, but in some cases it is also important to point out that the reported number for 1889 was indeed wrong. But if scanning and OCRing one page takes 3 seconds and proofreading takes 3 minutes, how long does it take to check all the facts? Not knowing how this should best be addressed, it seemed like a stupid idea to digitize more old works that are full of errors.
The originals are the originals, errors and all. Correcting their errors is a bit like changing history. We cannot accept responsibility for a lack of neutrality in these old works. We can let readers know that that's the way the facts appeared, and perhaps add footnotes when we find an error. In some cases these inaccuracies became the foundation of whole streams of thought that followed them. Students of paleography are able to trace the origin of manuscripts by tracing common errors. Each Wikipedia article is accompanied by a history which documents every little change. Similarly, every error-filled old work is as much a part of the history of its subject.
So one problem still exists: From the scanned book page, there is no link to the Wikipedia article that provides more up-to-date information. The reader of the scanned page can of course use a search engine, and will often find the Wikipedia article. But is this really the ultimate solution? And even if the Wikipedia article is found, the other scanned pages that link to the same article are not found from there.
Should each scanned book page include a list of links to Wikipedia articles that are relevant for the page? Could such lists be compiled (or suggested) automatically?
This depends on what you see as the relative roles of the scanned page and the transcribed page. The former is a connection with the past and the latter with the future. The scanned page needs to give us a perfectly accurate representation of what we were given to work with. Each time we mark it up, we move a little further from what it was. Even something as simple as putting double square brackets around a word could be questionable. The transcribed page is what makes Wikisource special. Links and categories there should be encouraged. So should all manner of annotations and translations.
Should Wikisource have a [[category:Aberdeen]] that collects all pages, chapters and books that pertain to this town? Today the English Wikisource has one [[Category:Works by subject]], but under this is a very small tree, compared to all articles in Wikipedia. There is no category for Aberdeen, but one for Scotland that has 15 links of which 4 are to articles in the 1911 Encyclopaedia Britannica. The 1911 EB article "Aberdeen (burgh)" is not among these four, http://en.wikisource.org/wiki/1911_Encyclop%C3%A6dia_Britannica/Aberdeen_%28...
I don't think that the category system is the best way of handling this. Categorization can sometimes be highly subjective, and we do not lack for individuals who make arguing about categories a priority. An improved internal search engine would be better. Its options should include Search titles, Search links, and Search whole texts. I have also long envisioned the possibility that links with Wiktionary could provide evidence of how words have been used historically, or that we could develop concordances of any work included in Wikisource.
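To make the concordance idea concrete, here is a rough sketch of a keyword-in-context index built from plain text. It is an illustration only: the function name `build_concordance`, the `width` parameter and the sample sentence are assumptions made for this example, not an existing Wikisource or MediaWiki feature.

```python
import re
from collections import defaultdict

def build_concordance(text, width=3):
    """Map each lowercased word to the snippets in which it occurs,
    each snippet showing up to `width` words on either side."""
    words = re.findall(r"[A-Za-z']+", text)
    concordance = defaultdict(list)
    for i, word in enumerate(words):
        left = words[max(0, i - width):i]
        right = words[i + 1:i + 1 + width]
        concordance[word.lower()].append(" ".join(left + [word] + right))
    return concordance

# Hypothetical sample text, not a quotation from any work on Wikisource
sample = "The Orient Express left Paris. The express was late."
concordance = build_concordance(sample, width=2)
for snippet in concordance["express"]:
    print(snippet)
```

Looking a word up in such an index would be one form of "Search whole texts"; restricting the input to page titles or to link targets would give the other two search modes.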
Wikisource also has a [[Category:Ottoman Empire]] that contains four articles from the 1911 Encyclopaedia Britannica, one other chapter and two other works. But the corresponding category on the English Wikipedia has 56 pages and 12 immediate subcategories. Even the sub-subcategory Ottoman railways has 6 Wikipedia articles. On Wikisource there seem to be 6 mentions of the "Orient Express", but these are found through Google and not through links on the website, http://www.google.com/search?q=%22orient+express%22+site%3Aen.wikisource.org
Sounds like we have a lot of work ahead. :-)
Ec