[Wikisource-l] DIY Semantic Wikisource

Andrea Zanni zanni.andrea84 at gmail.com
Fri Oct 22 14:22:41 UTC 2010


Sorry, everyone:
I forgot to change the subject line of the message, and Alex replied to the
whole list rather than only to me.
My bad, please ignore it.
I'm resending the original message below.

Aubrey
----

Metadata has been a long-standing pain for Wikimedia projects from the
beginning, and Wikisource is no exception.

If we want to be a reliable digital library, we have to face the fact that
our information about books neither follows standards (e.g. Dublin Core) nor
is machine-readable. MediaWiki still doesn't have a proper extension for
handling metadata, and Semantic MediaWiki is not used for reasons I only
partially understand (security and scalability issues, as far as I know).

What I'm going to present is just a proposal, made by Alex Brollo, in the
quick-and-dirty DIY style we often see in wikis.

As all of you know, Wikisource uses the (beloved) #lst extension (Labeled
Section Transclusion), usually only within the proofreading workflow.
On it.source, we are exploring some other possible uses of #lst; the main
one is to create something like "variables" that can be used anywhere on
the site. In a nutshell, it's a DIY "semantization" of text.

Simply put, if you place the code
<section begin="birth date" />May 6, 1876<section end="birth date" />
in the page "Author:Pinco Pallino", you can then retrieve the birth date of
the imaginary author Pinco Pallino from any other page with #lst:

{{#section:Author:Pinco Pallino|birth date}}

which is pretty intuitive.
(It should also work with the brand-new ## ## syntax.)
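To make the lookup concrete, here is a minimal Python sketch of what {{#section:...|birth date}} does conceptually: find the text between a matching pair of begin/end section tags. It is only an imitation under simplifying assumptions (double-quoted labels, first match only); the function name extract_section is hypothetical and this is not the Labeled Section Transclusion code itself.

```python
import re


def extract_section(wikitext: str, label: str) -> str:
    """Return the text between <section begin="label" /> and <section end="label" />.

    Hypothetical, simplified imitation of {{#section:Page|label}}:
    it assumes double-quoted labels and returns only the first match,
    or "" when the label is not found.
    """
    tag = re.escape(label)  # labels may contain spaces or punctuation
    pattern = re.compile(
        rf'<section\s+begin="{tag}"\s*/>(.*?)<section\s+end="{tag}"\s*/>',
        re.DOTALL,  # a value may span several lines
    )
    match = pattern.search(wikitext)
    return match.group(1) if match else ""


# Wikitext of the imaginary page "Author:Pinco Pallino".
page = 'Intro <section begin="birth date" />May 6, 1876<section end="birth date" /> more'
print(extract_section(page, "birth date"))
```

Running it prints the stored value, May 6, 1876; asking for a label that is not on the page returns an empty string.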

Using this "emerging feature", we are converting, by bot, all the parameters
of our main standard templates into "DIY variables", with very interesting
and unexpected results.
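The bot conversion can be pictured as wrapping each template parameter value in its own labeled section, so that #lst can later retrieve it by name. The sketch below assumes the bot has already parsed the template call into a name/value mapping; the function name wrap_params and the parameter names are hypothetical, not the actual it.source bot or template fields.

```python
def wrap_params(params: dict[str, str]) -> str:
    """Turn template parameters into "DIY variables" (labeled sections).

    Hypothetical sketch: each parameter becomes
    <section begin="name" />value<section end="name" />,
    one per line, in the order the parameters were given.
    """
    parts = []
    for name, value in params.items():
        parts.append(f'<section begin="{name}" />{value}<section end="{name}" />')
    return "\n".join(parts)


# Illustrative parameters for the imaginary author used above.
print(wrap_params({"author": "Pinco Pallino", "birth date": "May 6, 1876"}))
```

Each emitted line can then be read back with {{#section:...}} using the parameter name as the label.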

You obviously need to put the "extracted data" somewhere: on it.source we
decided to test both Talk pages (nsPage:) and the main page itself as data
containers, using a JavaScript system to hide the code from users (but it's
there). Of course, if things go ahead, we could even request a Data:
namespace...

Currently, we use this feature to show, with a single template, the year
and the status of a given text or page. For examples,
see http://it.wikisource.org/wiki/Pagina_principale/Sezioni#Ultimi_arrivi.
We also use it to show the page status in the transcluded version:
http://it.wikisource.org/wiki/Storia_di_una_capinera/II

At the end of the day, we have a system to "extract data" and do whatever we
want with it. If we had proper templates following Dublin Core (and maybe
an Upload-like form to enter the data), we could even solve the bulk of
our metadata issue. At the GLAM conference in Paris, we'll probably discuss
this a lot (I definitely will).
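Once values are stored as labeled sections, exposing them under Dublin Core element names is mostly a renaming step. This sketch assumes a hypothetical mapping from local section labels to a few of the fifteen Dublin Core elements (dc:title, dc:creator, dc:date, dc:language); neither the labels nor the function to_dublin_core are existing it.source conventions.

```python
# Hypothetical mapping from local section labels to Dublin Core elements.
DC_MAP = {
    "title": "dc:title",
    "author": "dc:creator",
    "year": "dc:date",
    "language": "dc:language",
}


def to_dublin_core(fields: dict[str, str]) -> dict[str, str]:
    """Rename known labels to Dublin Core element names.

    Labels with no Dublin Core counterpart (e.g. a proofreading
    status) are dropped rather than guessed.
    """
    return {DC_MAP[label]: value for label, value in fields.items() if label in DC_MAP}


print(to_dublin_core({"title": "Example Title", "author": "Pinco Pallino", "status": "proofread"}))
```

Dropping unmapped labels keeps the output strictly within the standard vocabulary; a real template set would decide explicitly where wiki-specific fields such as page status should go.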

So we'd like to ask whether you have had any similar experiences, and what
drawbacks you see.
We are aware that the system is heavy, that it's not the proper way to fix
this kind of problem, and so on. If you can improve it, or suggest better
practices/procedures, you are definitely welcome.
For once, it would be very good to coordinate and collaborate in developing
what seems to be an important tool.

Cheers,

Aubrey
WMI Board