Hi, everyone.

I do have some experience with TEI markup (but don't know anything about the MW extension).

TEI is an XML dialect.  As such, there are general XML tools useful for editing it, and one of the favorite tools among TEI practitioners is the (proprietary and commercial) Oxygen XML Editor[0], which has significant customized support for TEI, but is a desktop application and is not collaborative.

Lars asked a good question, in that the goals of those scholars are rather different from ours: TEI practitioners are generally in the business of creating critical editions[1] and multitexts[2].  They use TEI to denote textual details such as emendations, corrections, crossed-out words, divergences between different textual "witnesses", scribal notes, sometimes even a change of ink or scribal hand; see for example [3].  They spend years and six- or seven-figure sums creating things like [2], [4], or the unfortunately-firewalled-but-huge [5].
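
To make that kind of markup a bit more concrete, here is a minimal sketch in Python.  The element names (del, add, app, rdg) and the namespace are TEI's own, but the sample sentence, the two witnesses #A and #B, and the summarize() helper are all invented for illustration; real critical editions are far richer than this.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; every TEI element lives in it.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical transcription, invented for this example:
# one crossed-out word with a replacement written above it,
# and one place where two witnesses (#A and #B) disagree.
SAMPLE = """
<p xmlns="http://www.tei-c.org/ns/1.0">It was a
  <del rend="strikethrough">dark</del><add place="above">bright</add>
  and
  <app>
    <rdg wit="#A">stormy</rdg>
    <rdg wit="#B">windy</rdg>
  </app>
  night.</p>
"""


def summarize(xml_text):
    """Collect deletions, additions, and variant readings from a TEI fragment."""
    root = ET.fromstring(xml_text.strip())
    deleted = [d.text for d in root.iterfind(".//tei:del", NS)]
    added = [a.text for a in root.iterfind(".//tei:add", NS)]
    # Map each witness identifier to the reading it attests.
    readings = {r.get("wit"): r.text for r in root.iterfind(".//tei:rdg", NS)}
    return {"deleted": deleted, "added": added, "readings": readings}


summary = summarize(SAMPLE)
print(summary)
```

Even this toy fragment records three separate editorial facts about a single sentence, which is the level of granularity the TEI folks work at routinely.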

Whereas the TEI folks are generally:
a. academics and paid professionals
b. well-funded
c. radically detail-oriented
d. working until it is done/perfect
e. focused on depth of critical edition over quantity of works produced (dozens of man-years per critical edition)

We Wikisourcerors and gonzo librarians[7] are generally:
a. volunteers
b. not funded (though we do have access to grants if we need them)
c. fairly detail-oriented
d. releasing/publishing early and often; fixing things as we go; tolerating errors and trusting an asymptotic improvement curve
e. focused on quantity of works made accessible (dozens/hundreds of _hours_ per work)

This, to me, suggests that there should be a reasonable limit to how far we go out of our way to support the TEI community on the MediaWiki platform.  While a powerful collaborative MediaWiki-powered TEI editor would be neat, I don't think it's the Wikimedia movement that should be the main force driving (and funding) its development.

Hope this helps,


[0] http://www.oxygenxml.com/
[1] https://en.wikipedia.org/wiki/Textual_criticism
[2] http://www.homermultitext.org/
[3] e.g. http://www.janeausten.ac.uk/manuscripts/blpers/23.html
[4] http://dare.uni-koeln.de/ [6]
[5] https://faustedition.uni-wuerzburg.de/dev/project/about [6]
[6] disclosure: I have collaborated and am friends with some of the engineers working on these two projects.
[7] To those who may not know, Lars runs Project Runeberg; I run Project Ben-Yehuda.  Both are free volunteer-run digital libraries of public domain texts.

On Thu, Nov 21, 2013 at 11:02 AM, Thomas Tanon <thomaspt@hotmail.fr> wrote:
It’s possible to create a new content type called something like ‘TEI’ that would replace wikitext with its own renderer. This capability was added to MediaWiki for Wikidata development. Making the ProofreadPage extension use it is possible, but not entirely easy.

The extension pointed to only adds some TEI tags to wikitext markup and so isn’t a solution for this problem. Implementing a full solution is a big task, but not really difficult if we have a good TEI -> HTML renderer. It can be done, I think, in one or two months of work.


On 21 Nov 2013, at 17:18, Andrea Zanni <zanni.andrea84@gmail.com> wrote:

On Thu, Nov 21, 2013 at 4:27 PM, Lars Aronsson <lars@aronsson.se> wrote:
On 11/21/2013 03:55 PM, Andrea Zanni wrote:
I stumbled across this extension because today my colleagues came to me asking about the Proofread extension and TEI.
They have a very common problem: they need a collaborative TEI editor for transcribing a scanned manuscript.
This is something we know, as many GLAMs and professors have the same issue;
and this is something that Wikisource did not know how to handle.

Why are your colleagues doing TEI markup?
What is the output? How is it used? What is
it that TEI markup gives them, that the current
Wikisource process (scanning + proofreading
+ wiki markup + transclusion) doesn't give them?

My colleagues (from the University of Bologna digital library)
are working on a Digital Humanities project.
In the DH field (Asaf, you can weigh in here :-)
you often need to mark up corrections, notes, sequences of corrections, misspellings, differences between manuscripts and printed works, etc.
We can't get that level of "granularity" on Wikisource (or, we could, via templates).

TEI is the de facto standard markup language for these kinds of philological issues.

My idea (from my limited understanding) is that if we develop a good MediaWiki extension for this, many researchers will then use MediaWiki + the Proofread extension + a TEI extension
as a collaborative TEI editor. Then they can visualize these data as they want, as in these projects:

Some of them, I hope, could actually use Wikisource if we already had all the software needed.


  Lars Aronsson (lars@aronsson.se)
  Project Runeberg - free Nordic literature - http://runeberg.org/

Wikisource-l mailing list

    Asaf Bartov
    Wikimedia Foundation

Imagine a world in which every single human being can freely share in the sum of all knowledge. Help us make it a reality!