About texts without supporting files and "Index:" pages

List overview All Threads
Download

newer

older

Converting pdf files into wiki...

Blog post about sources in...

David Cuenca

10 Jun 2013 10 Jun '13

5:17 p.m.

With the deployment of Wikidata it is a good moment to re-examine what "Index" pages are and what should be their function. The most direct transition to a Wikidata-supported Wikisource could be something like this: https://sites.google.com/site/dacuetu/BookData.pdf

That would allow: - to share data book data between Commons, Wikisource and Wikipedia - to update it, when any of the sites has been updated - to facilitate better search functions (like searches by author, or topic, limiting the date range or the language)

That would only apply to those texts which use a "Index:" page, so now the question is, what do we do with books that do not have supporting scans (and therefore no index page)?

Some possible options: a) ignore pages without sources and focus only on works with supporting scans b) use ns0 pages also as data containers (instead of, or in addition to "Index" pages) c) create "Index:" pages for all works, with or without scans. Use that instead of "Template:Textinfo"

Personally I prefer "option c", even if it would require to rename "Index:" to "Source:" to make more clear what are those pages, however I would like to hear the opinion of other wikisourcerors about this.

Cheers, Micru

Attachments:

attachment.htm (text/html — 1.4 KB)

Show replies by date

Aarti K. Dwivedi

10 Jun 10 Jun

6:45 p.m.

New subject: About texts without supporting files and "Index:" pages

Hi,

There was a thread some time ago where there were talks of having books which were born digital. These pages wouldn't have scans. What the 'Index' page would have in these cases is something I am not very sure about.

Cheers, Rtdwivedi

On Mon, Jun 10, 2013 at 10:47 PM, David Cuenca dacuetu@gmail.com wrote:

...

With the deployment of Wikidata it is a good moment to re-examine what "Index" pages are and what should be their function. The most direct transition to a Wikidata-supported Wikisource could be something like this: https://sites.google.com/site/dacuetu/BookData.pdf

That would allow:

to share data book data between Commons, Wikisource and Wikipedia

to update it, when any of the sites has been updated

to facilitate better search functions (like searches by author, or

topic, limiting the date range or the language)

That would only apply to those texts which use a "Index:" page, so now the question is, what do we do with books that do not have supporting scans (and therefore no index page)?

Some possible options: a) ignore pages without sources and focus only on works with supporting scans b) use ns0 pages also as data containers (instead of, or in addition to "Index" pages) c) create "Index:" pages for all works, with or without scans. Use that instead of "Template:Textinfo"

Personally I prefer "option c", even if it would require to rename "Index:" to "Source:" to make more clear what are those pages, however I would like to hear the opinion of other wikisourcerors about this.

Cheers, Micru

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Aarti K. Dwivedi

Alex Brollo

9:38 p.m.

New subject: About texts without supporting files and "Index:" pages

I don't see the need to change deeply Index/ns0 relationship, while I appreciate the idea "promote coherence reducing redundance" (many years ago I painfully used dBase III - dBase IV and I learned that principle by "try and learn").

Here: http://www.mediawiki.org/wiki/Extension_talk:Scribunto/Brainstorming a brief message about relationship among wikidata, commons, wikisource and any other project. Don't follow the link, it's so short that I copy it here (but if you like it, comment it there):

Scribunto-Lua and Wikidata I'd like a library to get Wikidata content; it would be a good idea IMHO to access to Wikidata data in plain form, just as such data would be Lua tables/variables. --Alex brollo (talk) 13:06, 10 June 2013 (UTC)

If such a Lua library could be built, to import data from wikidata would be as simple, as writing a template, and data will be self-aligned.

Alex

2013/6/10 Aarti K. Dwivedi ellydwivedi2093@gmail.com

...

Hi,
There was a thread some time ago where there were talks of having
books which were born digital. These pages wouldn't have scans. What the 'Index' page would have in these cases is something I am not very sure about.

Cheers, Rtdwivedi

On Mon, Jun 10, 2013 at 10:47 PM, David Cuenca dacuetu@gmail.com wrote:

...
With the deployment of Wikidata it is a good moment to re-examine what "Index" pages are and what should be their function. The most direct transition to a Wikidata-supported Wikisource could be something like this: https://sites.google.com/site/dacuetu/BookData.pdf

That would allow:

to share data book data between Commons, Wikisource and Wikipedia

to update it, when any of the sites has been updated

to facilitate better search functions (like searches by author, or

topic, limiting the date range or the language)

That would only apply to those texts which use a "Index:" page, so now the question is, what do we do with books that do not have supporting scans (and therefore no index page)?

Some possible options: a) ignore pages without sources and focus only on works with supporting scans b) use ns0 pages also as data containers (instead of, or in addition to "Index" pages) c) create "Index:" pages for all works, with or without scans. Use that instead of "Template:Textinfo"

Personally I prefer "option c", even if it would require to rename "Index:" to "Source:" to make more clear what are those pages, however I would like to hear the opinion of other wikisourcerors about this.

Cheers, Micru

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Aarti K. Dwivedi

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

David Cuenca

9:46 p.m.

New subject: About texts without supporting files and "Index:" pages

@Alex: but what do you think of storing the source information in "Index:" pages for all works stored in Wikisource, even if they don't have a supporting scan?

That was the original question :)

About your proposed library, it would be more useful if it could modify data in Wikidata, not only import it. Besides, if the Wikidata client is installed in Wikisource, the inclusion syntax already takes care of displaying data...

Micru

On Mon, Jun 10, 2013 at 5:38 PM, Alex Brollo alex.brollo@gmail.com wrote:

...

I don't see the need to change deeply Index/ns0 relationship, while I appreciate the idea "promote coherence reducing redundance" (many years ago I painfully used dBase III - dBase IV and I learned that principle by "try and learn").

Here: http://www.mediawiki.org/wiki/Extension_talk:Scribunto/Brainstorming a brief message about relationship among wikidata, commons, wikisource and any other project. Don't follow the link, it's so short that I copy it here (but if you like it, comment it there):

Scribunto-Lua and Wikidata I'd like a library to get Wikidata content; it would be a good idea IMHO to access to Wikidata data in plain form, just as such data would be Lua tables/variables. --Alex brollo (talk) 13:06, 10 June 2013 (UTC)

If such a Lua library could be built, to import data from wikidata would be as simple, as writing a template, and data will be self-aligned.

Alex

2013/6/10 Aarti K. Dwivedi ellydwivedi2093@gmail.com

Hi,

...
There was a thread some time ago where there were talks of having
books which were born digital. These pages wouldn't have scans. What the 'Index' page would have in these cases is something I am not very sure about.

Cheers, Rtdwivedi

On Mon, Jun 10, 2013 at 10:47 PM, David Cuenca dacuetu@gmail.com wrote:

...
With the deployment of Wikidata it is a good moment to re-examine what "Index" pages are and what should be their function. The most direct transition to a Wikidata-supported Wikisource could be something like this: https://sites.google.com/site/dacuetu/BookData.pdf

That would allow:

to share data book data between Commons, Wikisource and Wikipedia

to update it, when any of the sites has been updated

to facilitate better search functions (like searches by author, or

topic, limiting the date range or the language)

That would only apply to those texts which use a "Index:" page, so now the question is, what do we do with books that do not have supporting scans (and therefore no index page)?

Some possible options: a) ignore pages without sources and focus only on works with supporting scans b) use ns0 pages also as data containers (instead of, or in addition to "Index" pages) c) create "Index:" pages for all works, with or without scans. Use that instead of "Template:Textinfo"

Personally I prefer "option c", even if it would require to rename "Index:" to "Source:" to make more clear what are those pages, however I would like to hear the opinion of other wikisourcerors about this.

Cheers, Micru

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Aarti K. Dwivedi

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Etiamsi omnes, ego non

Alex Brollo

10 p.m.

New subject: About texts without supporting files and "Index:" pages

Simply there is no need to store data twice or more, if they are dinamically imported from wikidata. Such data would be simply generated by a normal template. Something similar to Commons media sharing: most wikipedians but beginners know that when you want to edit a shared media file, you must do you edit in Commons; there's no need to host a media file locally.

So, IMHO a good Lua wikidata-reading library could avoid at all to store data in wikisource, or wikipedia, or Commons.

Alex

2013/6/10 David Cuenca dacuetu@gmail.com

...

@Alex: but what do you think of storing the source information in "Index:" pages for all works stored in Wikisource, even if they don't have a supporting scan?

That was the original question :)

About your proposed library, it would be more useful if it could modify data in Wikidata, not only import it. Besides, if the Wikidata client is installed in Wikisource, the inclusion syntax already takes care of displaying data...

Micru

On Mon, Jun 10, 2013 at 5:38 PM, Alex Brollo alex.brollo@gmail.comwrote:

...
I don't see the need to change deeply Index/ns0 relationship, while I appreciate the idea "promote coherence reducing redundance" (many years ago I painfully used dBase III - dBase IV and I learned that principle by "try and learn").

Here: http://www.mediawiki.org/wiki/Extension_talk:Scribunto/Brainstorming a brief message about relationship among wikidata, commons, wikisource and any other project. Don't follow the link, it's so short that I copy it here (but if you like it, comment it there):

Scribunto-Lua and Wikidata I'd like a library to get Wikidata content; it would be a good idea IMHO to access to Wikidata data in plain form, just as such data would be Lua tables/variables. --Alex brollo (talk) 13:06, 10 June 2013 (UTC)

If such a Lua library could be built, to import data from wikidata would be as simple, as writing a template, and data will be self-aligned.

Alex

2013/6/10 Aarti K. Dwivedi ellydwivedi2093@gmail.com

Hi,

...
There was a thread some time ago where there were talks of having
books which were born digital. These pages wouldn't have scans. What the 'Index' page would have in these cases is something I am not very sure about.

Cheers, Rtdwivedi

On Mon, Jun 10, 2013 at 10:47 PM, David Cuenca dacuetu@gmail.comwrote:

...
With the deployment of Wikidata it is a good moment to re-examine what "Index" pages are and what should be their function. The most direct transition to a Wikidata-supported Wikisource could be something like this: https://sites.google.com/site/dacuetu/BookData.pdf

That would allow:

to share data book data between Commons, Wikisource and Wikipedia

to update it, when any of the sites has been updated

to facilitate better search functions (like searches by author, or

topic, limiting the date range or the language)

That would only apply to those texts which use a "Index:" page, so now the question is, what do we do with books that do not have supporting scans (and therefore no index page)?

Some possible options: a) ignore pages without sources and focus only on works with supporting scans b) use ns0 pages also as data containers (instead of, or in addition to "Index" pages) c) create "Index:" pages for all works, with or without scans. Use that instead of "Template:Textinfo"

Personally I prefer "option c", even if it would require to rename "Index:" to "Source:" to make more clear what are those pages, however I would like to hear the opinion of other wikisourcerors about this.

Cheers, Micru

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Aarti K. Dwivedi

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
-- Etiamsi omnes, ego non _______________________________________________ Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

David Cuenca

10:13 p.m.

New subject: About texts without supporting files and "Index:" pages

No, it won't be stored in Wikisource, but still there is the need to present the information in a consistent manner. If you want to display the information on ns0, you will end up needing the same fields that the "Index:" page is using now. So why not to have the same solution for both?

It could also be a template with a reduced set of fields that expands to show "Template:Book" with linked data from Wikidata, no matter if they have supporting scans or not.

Micru

On Mon, Jun 10, 2013 at 6:00 PM, Alex Brollo alex.brollo@gmail.com wrote:

...

Simply there is no need to store data twice or more, if they are dinamically imported from wikidata. Such data would be simply generated by a normal template. Something similar to Commons media sharing: most wikipedians but beginners know that when you want to edit a shared media file, you must do you edit in Commons; there's no need to host a media file locally.

So, IMHO a good Lua wikidata-reading library could avoid at all to store data in wikisource, or wikipedia, or Commons.

Alex

2013/6/10 David Cuenca dacuetu@gmail.com

...
@Alex: but what do you think of storing the source information in "Index:" pages for all works stored in Wikisource, even if they don't have a supporting scan?

That was the original question :)

About your proposed library, it would be more useful if it could modify data in Wikidata, not only import it. Besides, if the Wikidata client is installed in Wikisource, the inclusion syntax already takes care of displaying data...

Micru

On Mon, Jun 10, 2013 at 5:38 PM, Alex Brollo alex.brollo@gmail.comwrote:

...
I don't see the need to change deeply Index/ns0 relationship, while I appreciate the idea "promote coherence reducing redundance" (many years ago I painfully used dBase III - dBase IV and I learned that principle by "try and learn").

Here: http://www.mediawiki.org/wiki/Extension_talk:Scribunto/Brainstorming a brief message about relationship among wikidata, commons, wikisource and any other project. Don't follow the link, it's so short that I copy it here (but if you like it, comment it there):

Scribunto-Lua and Wikidata I'd like a library to get Wikidata content; it would be a good idea IMHO to access to Wikidata data in plain form, just as such data would be Lua tables/variables. --Alex brollo (talk) 13:06, 10 June 2013 (UTC)

If such a Lua library could be built, to import data from wikidata would be as simple, as writing a template, and data will be self-aligned.

Alex

2013/6/10 Aarti K. Dwivedi ellydwivedi2093@gmail.com

Hi,

...
There was a thread some time ago where there were talks of having
books which were born digital. These pages wouldn't have scans. What the 'Index' page would have in these cases is something I am not very sure about.

Cheers, Rtdwivedi

On Mon, Jun 10, 2013 at 10:47 PM, David Cuenca dacuetu@gmail.comwrote:

...
With the deployment of Wikidata it is a good moment to re-examine what "Index" pages are and what should be their function. The most direct transition to a Wikidata-supported Wikisource could be something like this: https://sites.google.com/site/dacuetu/BookData.pdf

That would allow:

to share data book data between Commons, Wikisource and Wikipedia

to update it, when any of the sites has been updated

to facilitate better search functions (like searches by author, or

topic, limiting the date range or the language)

That would only apply to those texts which use a "Index:" page, so now the question is, what do we do with books that do not have supporting scans (and therefore no index page)?

Some possible options: a) ignore pages without sources and focus only on works with supporting scans b) use ns0 pages also as data containers (instead of, or in addition to "Index" pages) c) create "Index:" pages for all works, with or without scans. Use that instead of "Template:Textinfo"

Personally I prefer "option c", even if it would require to rename "Index:" to "Source:" to make more clear what are those pages, however I would like to hear the opinion of other wikisourcerors about this.

Cheers, Micru

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Aarti K. Dwivedi

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
-- Etiamsi omnes, ego non _______________________________________________ Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Etiamsi omnes, ego non

Alex Brollo

11:33 p.m.

New subject: About texts without supporting files and "Index:" pages

I'm going to test what you are telling in a real Lua script; as you know, Lua can read the code of any page with one "expensive" server function only, so that a simple {{header|index name}} ns0 template call could read all the wiki code from index page, parse it, extract all its data content, and use it to build any html you like. No other field is needed. In it.wikisource we are testing something more complex, since we are exporting Index data into a local Lua data module, to be loaded with a mw.loadData function that is not listed as "server-expensive"; but I presume that wiki servers would not be overloaded by *one* server expensive call....

If Im not going wrong, such a script could be written tomorrow by a good Lua programmer.... I'll need some more time as a beginner. I'll test a "MediaWiki:Proofreadpage_index_template" Lua loader & parser working into ns0, just to see if all runs as I guess, then I'll tell you in this thread. In which wikisource project do you work usually?

Alex

2013/6/11 David Cuenca dacuetu@gmail.com

...

No, it won't be stored in Wikisource, but still there is the need to present the information in a consistent manner. If you want to display the information on ns0, you will end up needing the same fields that the "Index:" page is using now. So why not to have the same solution for both?

It could also be a template with a reduced set of fields that expands to show "Template:Book" with linked data from Wikidata, no matter if they have supporting scans or not.

Micru

On Mon, Jun 10, 2013 at 6:00 PM, Alex Brollo alex.brollo@gmail.comwrote:

...
Simply there is no need to store data twice or more, if they are dinamically imported from wikidata. Such data would be simply generated by a normal template. Something similar to Commons media sharing: most wikipedians but beginners know that when you want to edit a shared media file, you must do you edit in Commons; there's no need to host a media file locally.

So, IMHO a good Lua wikidata-reading library could avoid at all to store data in wikisource, or wikipedia, or Commons.

Alex

2013/6/10 David Cuenca dacuetu@gmail.com

...
@Alex: but what do you think of storing the source information in "Index:" pages for all works stored in Wikisource, even if they don't have a supporting scan?

That was the original question :)

About your proposed library, it would be more useful if it could modify data in Wikidata, not only import it. Besides, if the Wikidata client is installed in Wikisource, the inclusion syntax already takes care of displaying data...

Micru

On Mon, Jun 10, 2013 at 5:38 PM, Alex Brollo alex.brollo@gmail.comwrote:

...
I don't see the need to change deeply Index/ns0 relationship, while I appreciate the idea "promote coherence reducing redundance" (many years ago I painfully used dBase III - dBase IV and I learned that principle by "try and learn").

Here: http://www.mediawiki.org/wiki/Extension_talk:Scribunto/Brainstorming a brief message about relationship among wikidata, commons, wikisource and any other project. Don't follow the link, it's so short that I copy it here (but if you like it, comment it there):

Scribunto-Lua and Wikidata I'd like a library to get Wikidata content; it would be a good idea IMHO to access to Wikidata data in plain form, just as such data would be Lua tables/variables. --Alex brollo (talk) 13:06, 10 June 2013 (UTC)

If such a Lua library could be built, to import data from wikidata would be as simple, as writing a template, and data will be self-aligned.

Alex

2013/6/10 Aarti K. Dwivedi ellydwivedi2093@gmail.com

Hi,

...
There was a thread some time ago where there were talks of having
books which were born digital. These pages wouldn't have scans. What the 'Index' page would have in these cases is something I am not very sure about.

Cheers, Rtdwivedi

On Mon, Jun 10, 2013 at 10:47 PM, David Cuenca dacuetu@gmail.comwrote:

...
With the deployment of Wikidata it is a good moment to re-examine what "Index" pages are and what should be their function. The most direct transition to a Wikidata-supported Wikisource could be something like this: https://sites.google.com/site/dacuetu/BookData.pdf

That would allow:

to share data book data between Commons, Wikisource and Wikipedia

to update it, when any of the sites has been updated

to facilitate better search functions (like searches by author, or

topic, limiting the date range or the language)

That would only apply to those texts which use a "Index:" page, so now the question is, what do we do with books that do not have supporting scans (and therefore no index page)?

Some possible options: a) ignore pages without sources and focus only on works with supporting scans b) use ns0 pages also as data containers (instead of, or in addition to "Index" pages) c) create "Index:" pages for all works, with or without scans. Use that instead of "Template:Textinfo"

Personally I prefer "option c", even if it would require to rename "Index:" to "Source:" to make more clear what are those pages, however I would like to hear the opinion of other wikisourcerors about this.

Cheers, Micru

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Aarti K. Dwivedi

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
-- Etiamsi omnes, ego non _______________________________________________ Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
-- Etiamsi omnes, ego non

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

Thomas PT

11 Jun 11 Jun

6:41 a.m.

New subject: About texts without supporting files and "Index:" pages

Sorry if my answer is off-topic but if metadata are stored in WIkidata, is it really needed to create index pages to store the same data as Wikidata? As I see the things, we'll have bibliographical metadata on Wikidata (title, author, date of publication...) and data related to proofreading (proofreading level, table of content...) on the Index: pages. More, as the Proofread Page extension considers that an Index page is about a scan (ie one or more files) I'm not sure that Index pages about books without scan will be managed well by the extension.

{{header|index name}} is already done, for books with scan, by the Proofread Page extension with the header=1 feature. In fr Wikisource, we already use a Lua module to manage the Mediawiki:Proofreadpage_header_template template used by the header=1 feature. https://fr.wikisource.org/wiki/Module:Header_template This template outputs automatically metadata and navigation from the index page TOC (but it allows also to override data).

Tpt

Date: Tue, 11 Jun 2013 01:33:39 +0200 From: alex.brollo@gmail.com To: wikisource-l@lists.wikimedia.org Subject: Re: [Wikisource-l] About texts without supporting files and "Index:" pages

Alex

2013/6/11 David Cuenca dacuetu@gmail.com

No, it won't be stored in Wikisource, but still there is the need to present the information in a consistent manner.

If you want to display the information on ns0, you will end up needing the same fields that the "Index:" page is using now.

So why not to have the same solution for both?

It could also be a template with a reduced set of fields that expands to show "Template:Book" with linked data from Wikidata, no matter if they have supporting scans or not.

Micru

On Mon, Jun 10, 2013 at 6:00 PM, Alex Brollo alex.brollo@gmail.com wrote:

So, IMHO a good Lua wikidata-reading library could avoid at all to store data in wikisource, or wikipedia, or Commons. Alex

2013/6/10 David Cuenca dacuetu@gmail.com

@Alex: but what do you think of storing the source information in "Index:" pages for all works stored in Wikisource, even if they don't have a supporting scan?

That was the original question :)

Micru

On Mon, Jun 10, 2013 at 5:38 PM, Alex Brollo alex.brollo@gmail.com wrote:

Scribunto-Lua and WikidataI'd like a library to get Wikidata content; it would be a good idea IMHO to access to Wikidata data in plain form, just as such data would be Lua tables/variables. --Alex brollo (talk) 13:06, 10 June 2013 (UTC)

If such a Lua library could be built, to import data from wikidata would be as simple, as writing a template, and data will be self-aligned.

Alex

2013/6/10 Aarti K. Dwivedi ellydwivedi2093@gmail.com

Hi, There was a thread some time ago where there were talks of having books which were born digital. These pages wouldn't have scans.

What the 'Index' page would have in these cases is something I am not very sure about.

Cheers,Rtdwivedi

On Mon, Jun 10, 2013 at 10:47 PM, David Cuenca dacuetu@gmail.com wrote:

With the deployment of Wikidata it is a good moment to re-examine what "Index" pages are and what should be their function.

The most direct transition to a Wikidata-supported Wikisource could be something like this:

https://sites.google.com/site/dacuetu/BookData.pdf

That would allow: - to share data book data between Commons, Wikisource and Wikipedia

- to update it, when any of the sites has been updated

- to facilitate better search functions (like searches by author, or topic, limiting the date range or the language)

That would only apply to those texts which use a "Index:" page, so now the question is, what do we do with books that do not have supporting scans (and therefore no index page)?

Cheers, Micru

_______________________________________________

Wikisource-l mailing list

Wikisource-l@lists.wikimedia.org

https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Aarti K. Dwivedi _______________________________________________ Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l _______________________________________________ Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l -- Etiamsi omnes, ego non _______________________________________________ Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l _______________________________________________ Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l -- Etiamsi omnes, ego non _______________________________________________ Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l _______________________________________________ Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

Aarti K. Dwivedi

6:46 a.m.

New subject: About texts without supporting files and "Index:" pages

A slighly off-topic question: Even if we modify the extension to proofread books which do not have scans( I am assuming books that were born digital ), against what will these books be proofread?

On Tue, Jun 11, 2013 at 12:11 PM, Thomas PT thomaspt@hotmail.fr wrote:

...

Sorry if my answer is off-topic but if metadata are stored in WIkidata, is it really needed to create index pages to store the same data as Wikidata? As I see the things, we'll have bibliographical metadata on Wikidata (title, author, date of publication...) and data related to proofreading (proofreading level, table of content...) on the Index: pages. More, as the Proofread Page extension considers that an Index page is about a scan (ie one or more files) I'm not sure that Index pages about books without scan will be managed well by the extension.

{{header|index name}} is already done, for books with scan, by the Proofread Page extension with the header=1 feature. In fr Wikisource, we already use a Lua module to manage the Mediawiki:Proofreadpage_header_template template used by the header=1 feature. https://fr.wikisource.org/wiki/Module:Header_template This template outputs automatically metadata and navigation from the index page TOC (but it allows also to override data).

Tpt

Date: Tue, 11 Jun 2013 01:33:39 +0200 From: alex.brollo@gmail.com To: wikisource-l@lists.wikimedia.org Subject: Re: [Wikisource-l] About texts without supporting files and "Index:" pages

I'm going to test what you are telling in a real Lua script; as you know, Lua can read the code of any page with one "expensive" server function only, so that a simple {{header|index name}} ns0 template call could read all the wiki code from index page, parse it, extract all its data content, and use it to build any html you like. No other field is needed. In it.wikisource we are testing something more complex, since we are exporting Index data into a local Lua data module, to be loaded with a mw.loadData function that is not listed as "server-expensive"; but I presume that wiki servers would not be overloaded by *one* server expensive call....

If Im not going wrong, such a script could be written tomorrow by a good Lua programmer.... I'll need some more time as a beginner. I'll test a "MediaWiki:Proofreadpage_index_template" Lua loader & parser working into ns0, just to see if all runs as I guess, then I'll tell you in this thread. In which wikisource project do you work usually?

Alex

2013/6/11 David Cuenca dacuetu@gmail.com

No, it won't be stored in Wikisource, but still there is the need to present the information in a consistent manner. If you want to display the information on ns0, you will end up needing the same fields that the "Index:" page is using now. So why not to have the same solution for both?

It could also be a template with a reduced set of fields that expands to show "Template:Book" with linked data from Wikidata, no matter if they have supporting scans or not.

Micru

On Mon, Jun 10, 2013 at 6:00 PM, Alex Brollo alex.brollo@gmail.comwrote:

Simply there is no need to store data twice or more, if they are dinamically imported from wikidata. Such data would be simply generated by a normal template. Something similar to Commons media sharing: most wikipedians but beginners know that when you want to edit a shared media file, you must do you edit in Commons; there's no need to host a media file locally.

So, IMHO a good Lua wikidata-reading library could avoid at all to store data in wikisource, or wikipedia, or Commons.

Alex

2013/6/10 David Cuenca dacuetu@gmail.com

@Alex: but what do you think of storing the source information in "Index:" pages for all works stored in Wikisource, even if they don't have a supporting scan?

That was the original question :)

About your proposed library, it would be more useful if it could modify data in Wikidata, not only import it. Besides, if the Wikidata client is installed in Wikisource, the inclusion syntax already takes care of displaying data...

Micru

On Mon, Jun 10, 2013 at 5:38 PM, Alex Brollo alex.brollo@gmail.comwrote:

I don't see the need to change deeply Index/ns0 relationship, while I appreciate the idea "promote coherence reducing redundance" (many years ago I painfully used dBase III - dBase IV and I learned that principle by "try and learn").

Here: http://www.mediawiki.org/wiki/Extension_talk:Scribunto/Brainstorming a brief message about relationship among wikidata, commons, wikisource and any other project. Don't follow the link, it's so short that I copy it here (but if you like it, comment it there):

Scribunto-Lua and Wikidata I'd like a library to get Wikidata content; it would be a good idea IMHO to access to Wikidata data in plain form, just as such data would be Lua tables/variables. --Alex brollo (talk) 13:06, 10 June 2013 (UTC)

If such a Lua library could be built, to import data from wikidata would be as simple, as writing a template, and data will be self-aligned.

Alex

2013/6/10 Aarti K. Dwivedi ellydwivedi2093@gmail.com

Hi,
There was a thread some time ago where there were talks of having
books which were born digital. These pages wouldn't have scans. What the 'Index' page would have in these cases is something I am not very sure about.

Cheers, Rtdwivedi

On Mon, Jun 10, 2013 at 10:47 PM, David Cuenca dacuetu@gmail.com wrote:

With the deployment of Wikidata it is a good moment to re-examine what "Index" pages are and what should be their function. The most direct transition to a Wikidata-supported Wikisource could be something like this: https://sites.google.com/site/dacuetu/BookData.pdf

That would allow:

to share data book data between Commons, Wikisource and Wikipedia

to update it, when any of the sites has been updated

to facilitate better search functions (like searches by author, or

topic, limiting the date range or the language)

That would only apply to those texts which use a "Index:" page, so now the question is, what do we do with books that do not have supporting scans (and therefore no index page)?

Some possible options: a) ignore pages without sources and focus only on works with supporting scans b) use ns0 pages also as data containers (instead of, or in addition to "Index" pages) c) create "Index:" pages for all works, with or without scans. Use that instead of "Template:Textinfo"

Personally I prefer "option c", even if it would require to rename "Index:" to "Source:" to make more clear what are those pages, however I would like to hear the opinion of other wikisourcerors about this.

Cheers, Micru

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Aarti K. Dwivedi

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Etiamsi omnes, ego non _______________________________________________ Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Etiamsi omnes, ego non

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

_______________________________________________ Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Aarti K. Dwivedi

Andrea Zanni

7:19 a.m.

New subject: About texts without supporting files and "Index:" pages

@aarti: sometimes some books/text/documents are born-digital. Think about all the scientific literature, or Phd thesis. These files (if cc-by/sa licensed) could be stored in Wikisource, and be useful for the wikicommunity. We already have some means to link those text to their source (with a URL).

It's a long time "controversy" if we must or must not allow documents without scans on Wikisource. Every community should decide by itself. My personal POV (also as a "librarian"), is that if we leave out born digital documents we are forgetting the bulk of the stuff. I think that one of the most important added values of Wikisource is integrating texts with other Wikimedia projects, and (wiki)linking and connecting each other. No other digital library do that on the Internet, and we can do it because we have a community.

So, these texts will have a source. I do think that proofreading a born digital PDF is a waste of time.

Aubrey

On Tue, Jun 11, 2013 at 8:46 AM, Aarti K. Dwivedi <ellydwivedi2093@gmail.com

...

wrote:

...

A slighly off-topic question: Even if we modify the extension to proofread books which do not have scans( I am assuming books that were born digital ), against what will these books be proofread?

On Tue, Jun 11, 2013 at 12:11 PM, Thomas PT thomaspt@hotmail.fr wrote:

...
Sorry if my answer is off-topic but if metadata are stored in WIkidata, is it really needed to create index pages to store the same data as Wikidata? As I see the things, we'll have bibliographical metadata on Wikidata (title, author, date of publication...) and data related to proofreading (proofreading level, table of content...) on the Index: pages. More, as the Proofread Page extension considers that an Index page is about a scan (ie one or more files) I'm not sure that Index pages about books without scan will be managed well by the extension.

{{header|index name}} is already done, for books with scan, by the Proofread Page extension with the header=1 feature. In fr Wikisource, we already use a Lua module to manage the Mediawiki:Proofreadpage_header_template template used by the header=1 feature. https://fr.wikisource.org/wiki/Module:Header_template This template outputs automatically metadata and navigation from the index page TOC (but it allows also to override data).

Tpt

Date: Tue, 11 Jun 2013 01:33:39 +0200 From: alex.brollo@gmail.com To: wikisource-l@lists.wikimedia.org Subject: Re: [Wikisource-l] About texts without supporting files and "Index:" pages

I'm going to test what you are telling in a real Lua script; as you know, Lua can read the code of any page with one "expensive" server function only, so that a simple {{header|index name}} ns0 template call could read all the wiki code from index page, parse it, extract all its data content, and use it to build any html you like. No other field is needed. In it.wikisource we are testing something more complex, since we are exporting Index data into a local Lua data module, to be loaded with a mw.loadData function that is not listed as "server-expensive"; but I presume that wiki servers would not be overloaded by *one* server expensive call....

If Im not going wrong, such a script could be written tomorrow by a good Lua programmer.... I'll need some more time as a beginner. I'll test a "MediaWiki:Proofreadpage_index_template" Lua loader & parser working into ns0, just to see if all runs as I guess, then I'll tell you in this thread. In which wikisource project do you work usually?

Alex

2013/6/11 David Cuenca dacuetu@gmail.com

No, it won't be stored in Wikisource, but still there is the need to present the information in a consistent manner. If you want to display the information on ns0, you will end up needing the same fields that the "Index:" page is using now. So why not to have the same solution for both?

It could also be a template with a reduced set of fields that expands to show "Template:Book" with linked data from Wikidata, no matter if they have supporting scans or not.

Micru

On Mon, Jun 10, 2013 at 6:00 PM, Alex Brollo alex.brollo@gmail.comwrote:

Simply there is no need to store data twice or more, if they are dinamically imported from wikidata. Such data would be simply generated by a normal template. Something similar to Commons media sharing: most wikipedians but beginners know that when you want to edit a shared media file, you must do you edit in Commons; there's no need to host a media file locally.

So, IMHO a good Lua wikidata-reading library could avoid at all to store data in wikisource, or wikipedia, or Commons.

Alex

2013/6/10 David Cuenca dacuetu@gmail.com

@Alex: but what do you think of storing the source information in "Index:" pages for all works stored in Wikisource, even if they don't have a supporting scan?

That was the original question :)

About your proposed library, it would be more useful if it could modify data in Wikidata, not only import it. Besides, if the Wikidata client is installed in Wikisource, the inclusion syntax already takes care of displaying data...

Micru

On Mon, Jun 10, 2013 at 5:38 PM, Alex Brollo alex.brollo@gmail.comwrote:

I don't see the need to change deeply Index/ns0 relationship, while I appreciate the idea "promote coherence reducing redundance" (many years ago I painfully used dBase III - dBase IV and I learned that principle by "try and learn").

Here: http://www.mediawiki.org/wiki/Extension_talk:Scribunto/Brainstorming a brief message about relationship among wikidata, commons, wikisource and any other project. Don't follow the link, it's so short that I copy it here (but if you like it, comment it there):

Scribunto-Lua and Wikidata I'd like a library to get Wikidata content; it would be a good idea IMHO to access to Wikidata data in plain form, just as such data would be Lua tables/variables. --Alex brollo (talk) 13:06, 10 June 2013 (UTC)

If such a Lua library could be built, to import data from wikidata would be as simple, as writing a template, and data will be self-aligned.

Alex

2013/6/10 Aarti K. Dwivedi ellydwivedi2093@gmail.com

Hi,
There was a thread some time ago where there were talks of having
books which were born digital. These pages wouldn't have scans. What the 'Index' page would have in these cases is something I am not very sure about.

Cheers, Rtdwivedi

On Mon, Jun 10, 2013 at 10:47 PM, David Cuenca dacuetu@gmail.com wrote:

With the deployment of Wikidata it is a good moment to re-examine what "Index" pages are and what should be their function. The most direct transition to a Wikidata-supported Wikisource could be something like this: https://sites.google.com/site/dacuetu/BookData.pdf

That would allow:

to share data book data between Commons, Wikisource and Wikipedia

to update it, when any of the sites has been updated

to facilitate better search functions (like searches by author, or

topic, limiting the date range or the language)

That would only apply to those texts which use a "Index:" page, so now the question is, what do we do with books that do not have supporting scans (and therefore no index page)?

Some possible options: a) ignore pages without sources and focus only on works with supporting scans b) use ns0 pages also as data containers (instead of, or in addition to "Index" pages) c) create "Index:" pages for all works, with or without scans. Use that instead of "Template:Textinfo"

Personally I prefer "option c", even if it would require to rename "Index:" to "Source:" to make more clear what are those pages, however I would like to hear the opinion of other wikisourcerors about this.

Cheers, Micru

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Aarti K. Dwivedi

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Etiamsi omnes, ego non _______________________________________________ Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Etiamsi omnes, ego non

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

_______________________________________________ Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
-- Aarti K. Dwivedi

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

billinghurst

11:48 a.m.

New subject: About texts without supporting files and "Index:" pages

On Tue, 11 Jun 2013 12:16:54 +0530, "Aarti K. Dwivedi" ellydwivedi2093@gmail.com wrote:

...

A slighly off-topic question: Even if we modify the extension to

proofread

...

books which do not have scans( I am assuming books that were born

digital

...

), against what will these books be proofread?

I am not sure why we are looking to proofread a digital only file, unless of course it never had a text layer and it had to be OCR'd. Proofreading surely only relates to scanned images where there has been the need to proofread.

Regards, Billinghurst

David Cuenca

7:12 p.m.

New subject: About texts without supporting files and "Index:" pages

@Billinghurst, I think Aubrey was referring mainly to pdf files, which sometimes have text and format but they are not that easy to represent in Wikisource. The main problem is that our current workflow always assume that we are going to proofread a text and have it stored as a web page.

@others: for me it doesn't matter much if the representation of the metadata is done by a template, an index page, or something different (maybe related to the new Extension:BookManager?) However I think that from the user point of view it is better to have a consistent system that can handle: 1) representation of book/source metadata 2) give access to export/visualization options

I'm preparing a document with some ideas that we can discuss here.

Micru

On Tue, Jun 11, 2013 at 7:48 AM, billinghurst billinghurst@gmail.comwrote:

...

On Tue, 11 Jun 2013 12:16:54 +0530, "Aarti K. Dwivedi" ellydwivedi2093@gmail.com wrote:

...
A slighly off-topic question: Even if we modify the extension to

proofread

...
books which do not have scans( I am assuming books that were born

digital

...
), against what will these books be proofread?

I am not sure why we are looking to proofread a digital only file, unless of course it never had a text layer and it had to be OCR'd. Proofreading surely only relates to scanned images where there has been the need to proofread.

Regards, Billinghurst

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Etiamsi omnes, ego non

billinghurst

12 Jun 12 Jun

11:32 a.m.

New subject: About texts without supporting files and "Index:" pages

You need to be cautious talking about "PDF" documents, as it is not the document presentation format, it is the source of the text. So I like to talk as the source being digitally prepared (and not requiring validation, though may require formatting), or OCR'd (requiring validation, and probably formatting.)

If you are talking about how we represent digitally prepared text with the validation process. I would have no issue with the text being ripped and having a bot run through and taking it straight to level 4 (green), and then redefining green to say validated, or digitally prepared text not requiring validation.

At the same time, if someone proposed and generates a fifth colour to represent digitally prepared text not requiring proofreading, then I will be happy with that. It may make someone happier in being a truer representation, but in the end to me it is a moot point. In the end, each of those is a local community decision, though one that should be made in consideration of how the other wikis interpret their processes.

Regards, Billinghurst

On Tue, 11 Jun 2013 15:12:41 -0400, David Cuenca dacuetu@gmail.com wrote:

...

@Billinghurst, I think Aubrey was referring mainly to pdf files, which sometimes have text and format but they are not that easy to represent

...

Wikisource. The main problem is that our current workflow always assume that we are going to proofread a text and have it stored as a web page.

@others: for me it doesn't matter much if the representation of the metadata is done by a template, an index page, or something different (maybe related to the new Extension:BookManager?) However I think that from the user point of view it is better to have a consistent system that can handle:

representation of book/source metadata

give access to export/visualization options

I'm preparing a document with some ideas that we can discuss here.

Micru

On Tue, Jun 11, 2013 at 7:48 AM, billinghurst billinghurst@gmail.comwrote:

...
On Tue, 11 Jun 2013 12:16:54 +0530, "Aarti K. Dwivedi" ellydwivedi2093@gmail.com wrote:

...
A slighly off-topic question: Even if we modify the extension to

proofread

...
books which do not have scans( I am assuming books that were born

digital

...
), against what will these books be proofread?

I am not sure why we are looking to proofread a digital only file,

unless

...

...
of course it never had a text layer and it had to be OCR'd.

Proofreading

...

...
surely only relates to scanned images where there has been the need to proofread.

Regards, Billinghurst

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

Andrea Zanni

11:35 a.m.

New subject: About texts without supporting files and "Index:" pages

On Wed, Jun 12, 2013 at 1:32 PM, billinghurst billinghurst@gmail.comwrote:

...

If you are talking about how we represent digitally prepared text with the validation process. I would have no issue with the text being ripped and having a bot run through and taking it straight to level 4 (green), and then redefining green to say validated, or digitally prepared text not requiring validation.

At the same time, if someone proposed and generates a fifth colour to represent digitally prepared text not requiring proofreading, then I will be happy with that. It may make someone happier in being a truer representation, but in the end to me it is a moot point. In the end, each of those is a local community decision, though one that should be made in consideration of how the other wikis interpret their processes.

Thanks for clarifying this. I agree with you, and would welcome both solutions.

But a lot of wikisourcerors don't think this way, so better discuss :-)

Aubrey

Thibaut Horel

12:32 p.m.

New subject: About texts without supporting files and "Index:" pages

Hi everybody,

Here is my attempt at giving my point of view while trying to summarize the discussion:

1. I think the role of Index: pages should be to present the *source* of a work. This is true whether the source is a scanned edition (as is most often the case at the moment), or a digital PDF (that is, containing text and not images) as is the case for most "digital-born" documents. I think it is good to have a neat separation between the original source and how Wikisource presents the work in the main namespace. Indeed, even if Wikisource tries to be as true as possible to the original content, there are very often some changes in the way it is presented in the main namespace.

2. Ideally, the metadata about the source of a work (author, date of printing, etc.) should be located in Wikidata. But metadata related to proofreading (e.g. the proofreading level of each individual page), being specific to the mission of Wikisource, should be located in Wikisource. How to do this while keeping the interface simple (i.e. hide it from the user so that she doesn't have to go from Wikisource to Wikidata to Wikisource) is a valid and very important concern, but is also beyond my current understanding of Wikidata and its integration into Wikimedia projects.

3. The current system with 4 quality levels to represent the proofreading state of a page is not sufficient to represent the diversity of proofreading scenarios. Indeed, there is a distinction to make between the *correctness* of the text and its *formatting*. In the case of a scanned edition which has been OCRed, we do need several passes before reaching a satisfying level of confidence about the correctness of the text as well as a suitable formatting (proper use of the wikicode, etc.). For digital-born documents however, as billinghurst said, we can automatically assume that the extracted text is correct, but that still doesn't mean that the text is correctly formatted and ready to be transcluded in the main namespace. Maybe we should add another level meaning "text is correct, still needs formatting"? Ideally, we should have to scales of quality levels: one dealing with the correctness of the text, and one dealing with its formatting. This would probably be too heavy and confusing though...

Thibaut (user:Zaran on Wikisource)

On 06/12/2013 01:35 PM, Andrea Zanni wrote:

...

On Wed, Jun 12, 2013 at 1:32 PM, billinghurst <billinghurst@gmail.com mailto:billinghurst@gmail.com> wrote:
If you are talking about how we represent digitally prepared text
with the
validation process. I would have no issue with the text being
ripped and
having a bot run through and taking it straight to level 4
(green), and
then redefining green to say validated, or digitally prepared text not
requiring validation.

At the same time, if someone proposed and generates a fifth colour to
represent digitally prepared text not requiring proofreading, then
I will
be happy with that. It may make someone happier in being a truer
representation, but in the end to me it is a moot point. In the
end, each
of those is a local community decision, though one that should be
made in
consideration of how the other wikis interpret their processes.
Thanks for clarifying this. I agree with you, and would welcome both solutions.

But a lot of wikisourcerors don't think this way, so better discuss :-)

Aubrey

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

Andrea Zanni

12:48 p.m.

New subject: About texts without supporting files and "Index:" pages

On Wed, Jun 12, 2013 at 2:32 PM, Thibaut Horel thibaut.horel@gmail.comwrote:

...

The current system with 4 quality levels to represent the proofreading

state of a page is not sufficient to represent the diversity of proofreading scenarios. Indeed, there is a distinction to make between the *correctness* of the text and its *formatting*. In the case of a scanned edition which has been OCRed, we do need several passes before reaching a satisfying level of confidence about the correctness of the text as well as a suitable formatting (proper use of the wikicode, etc.). For digital-born documents however, as billinghurst said, we can automatically assume that the extracted text is correct, but that still doesn't mean that the text is correctly formatted and ready to be transcluded in the main namespace. Maybe we should add another level meaning "text is correct, still needs formatting"? Ideally, we should have to scales of quality levels: one dealing with the correctness of the text, and one dealing with its formatting. This would probably be too heavy and confusing though...

I couldn't agree more. I think this could be an opportunity also to make task *smaller* and *clearer* (in the direction of "microtask", which are contributions in crowdsourcing projects which are small, definite and simple. eg GalaxyZoo, reCAPTCHA).

We could define some tasks as * corrected the page * proofread the text * formatted the page * validated the formatting * OPTIONAL added optional templates/links/annotations *...

We could even have qualifiers (all/part of the page, ...)

Is this idea crazy, or somewhat doable?

Aubrey

David Cuenca

2:10 p.m.

New subject: About texts without supporting files and "Index:" pages

I think everything is doable, the problem is how to do it without cluttering the interface and keeping things simple.

Some levels might be redundant and we could take the chance to think if they are really necessary.

Some proposed changes: - Proofread page levels: "Unused", "Proofread", "Proofread with format", "Validated" (the "unused" level would mean: pages with no text, ocr text, pages with irrelevant content). - All pages would be created at start with the extracted ocr text at "unused" level, so finally search engines could also find our texts even if they are not started yet - A checkbox list to tag pages: "damaged scan", "missing scan", "contains media" (image, score, etc) - Color codes: like now plus orange for "Proofread with format". Page with tags would affect the color too. "damaged" would make the color half purple and half the corresponding proofread level color, "contains media" could add a (black?) square around the page number - Proofread book levels should be automatic to the lowest page level, plus two options, one to mark the book as "ready to export" and another one to mark it as "digital source", which would bring all pages at "proofread" level.

For the metadata interface I keep thinking about it, and my impression is that we should start working from Template:Book [1] until having a version that can be used across Commons, Index pages, and books without supporting scans (in this last case it could be the same header template with an option to expand it to show the whole template:book). That template also might need some coloring/reorganizing to reflect the Work/Edition distinction that Wikidata is bringing [2] And if with Lua it is possible to read/write Wikidata, then the possible migration towards a Wikidata-powered Wikisource shouldn't be that far away.

Cheers, Micru

[1] http://commons.wikimedia.org/wiki/Template:Book [2] http://www.wikidata.org/wiki/Wikidata:Books_task_force

On Wed, Jun 12, 2013 at 8:48 AM, Andrea Zanni zanni.andrea84@gmail.comwrote:

...

On Wed, Jun 12, 2013 at 2:32 PM, Thibaut Horel thibaut.horel@gmail.comwrote:

...

The current system with 4 quality levels to represent the proofreading

state of a page is not sufficient to represent the diversity of proofreading scenarios. Indeed, there is a distinction to make between the *correctness* of the text and its *formatting*. In the case of a scanned edition which has been OCRed, we do need several passes before reaching a satisfying level of confidence about the correctness of the text as well as a suitable formatting (proper use of the wikicode, etc.). For digital-born documents however, as billinghurst said, we can automatically assume that the extracted text is correct, but that still doesn't mean that the text is correctly formatted and ready to be transcluded in the main namespace. Maybe we should add another level meaning "text is correct, still needs formatting"? Ideally, we should have to scales of quality levels: one dealing with the correctness of the text, and one dealing with its formatting. This would probably be too heavy and confusing though...

I couldn't agree more. I think this could be an opportunity also to make task *smaller* and *clearer* (in the direction of "microtask", which are contributions in crowdsourcing projects which are small, definite and simple. eg GalaxyZoo, reCAPTCHA).

We could define some tasks as

corrected the page

proofread the text

formatted the page

validated the formatting

OPTIONAL added optional templates/links/annotations

*...

We could even have qualifiers (all/part of the page, ...)

Is this idea crazy, or somewhat doable?

Aubrey

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Etiamsi omnes, ego non

Lars Aronsson

2:28 p.m.

New subject: About texts without supporting files and "Index:" pages

On 06/12/2013 02:48 PM, Andrea Zanni wrote:

...

We could define some tasks as

corrected the page

OPTIONAL added optional templates/links/annotations

*...

Geotagged all the photos, ...

The list doesn't end. You need a generic mechanism for any new feature you can invent. But aren't our existing templates and categories the best way to do this? You could just add to each page: {{done|proofread=user1|validated=user2|geotagged=user4|...}}

-- Lars Aronsson (lars@aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org/

Aarti K. Dwivedi

2:47 p.m.

New subject: About texts without supporting files and "Index:" pages

If I am not wrong, as of today, most books that were born digital, are still under copyright. Of course, they are available freely on the internet. But we can't use the pirated copies. How would we go about the procurement of these books? If we procure these copyrighted books, then the only we would have to do is to check for proper formatting. Isn't it?

On Wed, Jun 12, 2013 at 7:58 PM, Lars Aronsson lars@aronsson.se wrote:

...

On 06/12/2013 02:48 PM, Andrea Zanni wrote:

...
We could define some tasks as

corrected the page

OPTIONAL added optional templates/links/annotations

*...

Geotagged all the photos, ...

The list doesn't end. You need a generic mechanism for any new feature you can invent. But aren't our existing templates and categories the best way to do this? You could just add to each page: {{done|proofread=user1|**validated=user2|geotagged=**user4|...}}

-- Lars Aronsson (lars@aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org/

______________________________**_________________ Wikisource-l mailing list Wikisource-l@lists.wikimedia.**org Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/**mailman/listinfo/wikisource-l https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Aarti K. Dwivedi

David Cuenca

2:54 p.m.

New subject: About texts without supporting files and "Index:" pages

Nobody is saying anything about using copyrighted works, there are many books that have an open license that would allow to include them in Wikisource.

For instance in ca-ws we have this translation from 2009: http://ca.wikisource.org/wiki/Llibre:El_secret_de_l%E2%80%99or_que_creix_%28...

The original is in the PD, and the translator gave away his rights. It would have been much easier to work directly with the pdf, instead of converting to djvu.

Micru

On Wed, Jun 12, 2013 at 10:47 AM, Aarti K. Dwivedi < ellydwivedi2093@gmail.com> wrote:

...

If I am not wrong, as of today, most books that were born digital, are still under copyright. Of course, they are available freely on the internet. But we can't use the pirated copies. How would we go about the procurement of these books? If we procure these copyrighted books, then the only we would have to do is to check for proper formatting. Isn't it?

On Wed, Jun 12, 2013 at 7:58 PM, Lars Aronsson lars@aronsson.se wrote:

...
On 06/12/2013 02:48 PM, Andrea Zanni wrote:

...
We could define some tasks as

corrected the page

OPTIONAL added optional templates/links/annotations

*...

Geotagged all the photos, ...

The list doesn't end. You need a generic mechanism for any new feature you can invent. But aren't our existing templates and categories the best way to do this? You could just add to each page: {{done|proofread=user1|**validated=user2|geotagged=**user4|...}}

-- Lars Aronsson (lars@aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org/

______________________________**_________________ Wikisource-l mailing list Wikisource-l@lists.wikimedia.**org Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/**mailman/listinfo/wikisource-l https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Aarti K. Dwivedi

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Etiamsi omnes, ego non

Alex Brollo

4:38 p.m.

New subject: About texts without supporting files and "Index:" pages

When we tried to convert into wiki code (a needed step to add links and to convert files into a "wiki hypertext") a pdf file, that's a opaque, closed format, such a work turned off in a nightmare. If we simply load free pdf books "as they are", I don't see any advantage, but "feed wikisource numbers/statistics" nd this in presently far from my personal interest.

As you guess, I'm one of users who don't support Aubrey's enthusiasm about texts born digital, even if free. :-)

Alex

2013/6/12 David Cuenca dacuetu@gmail.com

...

Nobody is saying anything about using copyrighted works, there are many books that have an open license that would allow to include them in Wikisource.

For instance in ca-ws we have this translation from 2009:

http://ca.wikisource.org/wiki/Llibre:El_secret_de_l%E2%80%99or_que_creix_%28...

The original is in the PD, and the translator gave away his rights. It would have been much easier to work directly with the pdf, instead of converting to djvu.

Micru

On Wed, Jun 12, 2013 at 10:47 AM, Aarti K. Dwivedi < ellydwivedi2093@gmail.com> wrote:

...
If I am not wrong, as of today, most books that were born digital, are still under copyright. Of course, they are available freely on the internet. But we can't use the pirated copies. How would we go about the procurement of these books? If we procure these copyrighted books, then the only we would have to do is to check for proper formatting. Isn't it?

On Wed, Jun 12, 2013 at 7:58 PM, Lars Aronsson lars@aronsson.se wrote:

...
On 06/12/2013 02:48 PM, Andrea Zanni wrote:

...
We could define some tasks as

corrected the page

OPTIONAL added optional templates/links/annotations

*...

Geotagged all the photos, ...

The list doesn't end. You need a generic mechanism for any new feature you can invent. But aren't our existing templates and categories the best way to do this? You could just add to each page: {{done|proofread=user1|**validated=user2|geotagged=**user4|...}}

-- Lars Aronsson (lars@aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org/

______________________________**_________________ Wikisource-l mailing list Wikisource-l@lists.wikimedia.**org Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/**mailman/listinfo/wikisource-l https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Aarti K. Dwivedi

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

-- Etiamsi omnes, ego non _______________________________________________ Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

Andrea Zanni

13 Jun 13 Jun

8:31 a.m.

New subject: About texts without supporting files and "Index:" pages

On Wed, Jun 12, 2013 at 4:47 PM, Aarti K. Dwivedi <ellydwivedi2093@gmail.com

...

wrote:

...

If I am not wrong, as of today, most books that were born digital, are still under copyright. Of course, they are available freely on the internet. But we can't use the pirated copies. How would we go about the procurement of these books? If we procure these copyrighted books, then the only we would have to do is to check for proper formatting. Isn't it?

You are thinking of *books*, which are not the only documents Wikisource can host. For example, I am thinking about Open Access literature, which counts in hundred thousands CC-BY licensed articles, for example. Just look in DOAJ: http://www.doaj.org/

One of the wikimedians most involved in Open Access - Wiki collaboration is Daniel Mietchen (cc'ed). He's working on a bot who could grab the XML/HTML of an online article, format it in wikicode, and post it wherever he wants (maybe, Wikisources). The bot is aming to download automatically all images within the articles, and post them on Commons.

I personally think that this project is beyond awesomeness, IF we manage to solve particular and specific issues (as converting hyperlinks to other articles in wikilinks to those articles posted on WIkisource...)

As I said before, I see Wikisource as a broad, international, connected, hypertextual digital library, which has a thing no other digital library in the world has: a dedicated community[*].

It is my personal opinion, I know some people don't see it that way (like Alex :-D)

Aubrey

[*] there is Project Gutenberg, but I would argue they are not a digital library...

Andrea Zanni

11 Jun 11 Jun

7:25 a.m.

New subject: About texts without supporting files and "Index:" pages

On Tue, Jun 11, 2013 at 8:41 AM, Thomas PT thomaspt@hotmail.fr wrote:

...

Sorry if my answer is off-topic but if metadata are stored in WIkidata, is it really needed to create index pages to store the same data as Wikidata? As I see the things, we'll have bibliographical metadata on Wikidata (title, author, date of publication...) and data related to proofreading (proofreading level, table of content...) on the Index: pages. More, as the Proofread Page extension considers that an Index page is about a scan (ie one or more files) I'm not sure that Index pages about books without scan will be managed well by the extension.

I think that this is a matter of usability and user experience.

If we are going to use Index pages, we'll let users *stay on Wikisource* the whole time, while the complexity and data workflow would be hidden to them. It's a *bad* thing to ask newbies to navigate through Wikisource (entry), then Commons (file upload), the Wikisource(create Index page), then Wikidata(fetch data), then Wikisource(start working on the book) again to work on just a book.

For me this is one of the main obstacles to beginners, and we should try to ease things for people, IMHO.

Aubrey

Alex Brollo

11:38 a.m.

New subject: About texts without supporting files and "Index:" pages

You're right Aubrey nevertheless while promoving a user friendly interface the result is that data and wiki code is extremely difficult to use as a clean "data base". Think only to wiki markup and the "simple" trick to mark bold and italic text with apostophes.... very user friendly, but something like a nightmare for a poor programmer which needs to find the algorithm to understand which apostophes are text and which are code. The server too can't solve solve apostrophes concatenation. Was it less user friendly to use something like <b>...</b>? Yes; but.... how much cleaner raw wiki text would be!

Distributed Proofreaders uses a completely different approach: there's a rigid set of increasing abilitations for users, and unexperienced users can do simple task only. This is far from "wiki mentality", but we can't expect to keep things too much easy.

Alex

2013/6/11 Andrea Zanni zanni.andrea84@gmail.com

...

On Tue, Jun 11, 2013 at 8:41 AM, Thomas PT thomaspt@hotmail.fr wrote:

...
Sorry if my answer is off-topic but if metadata are stored in WIkidata, is it really needed to create index pages to store the same data as Wikidata? As I see the things, we'll have bibliographical metadata on Wikidata (title, author, date of publication...) and data related to proofreading (proofreading level, table of content...) on the Index: pages. More, as the Proofread Page extension considers that an Index page is about a scan (ie one or more files) I'm not sure that Index pages about books without scan will be managed well by the extension.

I think that this is a matter of usability and user experience.

If we are going to use Index pages, we'll let users *stay on Wikisource* the whole time, while the complexity and data workflow would be hidden to them. It's a *bad* thing to ask newbies to navigate through Wikisource (entry), then Commons (file upload), the Wikisource(create Index page), then Wikidata(fetch data), then Wikisource(start working on the book) again to work on just a book.

For me this is one of the main obstacles to beginners, and we should try to ease things for people, IMHO.

Aubrey

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

Alex Brollo

11:39 a.m.

New subject: About texts without supporting files and "Index:" pages

I apologyze....

" The server too can't solve *some* apostrophes concatenation"

Alex

2013/6/11 Alex Brollo alex.brollo@gmail.com

...

You're right Aubrey nevertheless while promoving a user friendly interface the result is that data and wiki code is extremely difficult to use as a clean "data base". Think only to wiki markup and the "simple" trick to mark bold and italic text with apostophes.... very user friendly, but something like a nightmare for a poor programmer which needs to find the algorithm to understand which apostophes are text and which are code. The server too can't solve solve apostrophes concatenation. Was it less user friendly to use something like <b>...</b>? Yes; but.... how much cleaner raw wiki text would be!

Distributed Proofreaders uses a completely different approach: there's a rigid set of increasing abilitations for users, and unexperienced users can do simple task only. This is far from "wiki mentality", but we can't expect to keep things too much easy.

Alex

2013/6/11 Andrea Zanni zanni.andrea84@gmail.com

...
On Tue, Jun 11, 2013 at 8:41 AM, Thomas PT thomaspt@hotmail.fr wrote:

...
Sorry if my answer is off-topic but if metadata are stored in WIkidata, is it really needed to create index pages to store the same data as Wikidata? As I see the things, we'll have bibliographical metadata on Wikidata (title, author, date of publication...) and data related to proofreading (proofreading level, table of content...) on the Index: pages. More, as the Proofread Page extension considers that an Index page is about a scan (ie one or more files) I'm not sure that Index pages about books without scan will be managed well by the extension.

I think that this is a matter of usability and user experience.

If we are going to use Index pages, we'll let users *stay on Wikisource* the whole time, while the complexity and data workflow would be hidden to them. It's a *bad* thing to ask newbies to navigate through Wikisource (entry), then Commons (file upload), the Wikisource(create Index page), then Wikidata(fetch data), then Wikisource(start working on the book) again to work on just a book.

For me this is one of the main obstacles to beginners, and we should try to ease things for people, IMHO.

Aubrey

Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

4054

Age (days ago)

4057

Last active (days ago)

wikisource-l@lists.wikimedia.org

24 comments

8 participants

tags (0)

participants (8)

Aarti K. Dwivedi
Alex Brollo
Andrea Zanni
billinghurst
David Cuenca
Lars Aronsson
Thibaut Horel
Thomas PT