Metadata has been a long-standing pain point for Wikimedia projects since the beginning, and Wikisource is no exception.
If we want to be a reliable digital library, we should face the fact that our information about books neither follows standards (e.g. Dublin Core) nor is machine-readable. MediaWiki still doesn't have a proper extension for handling metadata, and SemanticMediaWiki is not used for reasons I only partially understand (security and scalability issues, as far as I know).
So what I'm going to present is just a proposal, made by Alex Brollo, in the quick-and-dirty DIY style that we often see on wikis.
As you all know, Wikisource uses the (beloved) #lst extension, usually only within the proofreading procedure. On it.source we are exploring some other possible uses of #lst; the main one is to create something like "variables" that can be used anywhere on our site. In a nutshell, it's a DIY "semantization" of text.
Simply put, if you put the code <section begin="birth date" />May 6, 1876<section end="birth date" /> into the page "Author:Pinco Pallino", you can access the birth date of the imaginary author Pinco Pallino with #lst (also available under the name #section):
{{#section:Author:Pinco Pallino|birth date}}
which is pretty intuitive. (It should also work with the brand-new ## ## syntax.)
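To make the mechanism concrete, here is a minimal Python sketch of what a labeled-section lookup does to the raw wikitext of a page. It is not the extension's code: the function name extract_section and the sample page text are made up for illustration, and joining repeated sections with the same label is an assumption of this sketch.

```python
import re


def extract_section(wikitext: str, label: str) -> str:
    """Return the text between <section begin=label /> and <section end=label />.

    Hypothetical helper: it only imitates the lookup on a plain string.
    If the label occurs several times, the pieces are joined (an assumption).
    """
    name = re.escape(label)
    pattern = re.compile(
        r'<section\s+begin="' + name + r'"\s*/>(.*?)<section\s+end="' + name + r'"\s*/>',
        re.DOTALL,
    )
    return "".join(match.group(1) for match in pattern.finditer(wikitext))


# Illustrative page text for the imaginary author used above.
page = (
    'Pinco Pallino was born on '
    '<section begin="birth date" />May 6, 1876<section end="birth date" />.'
)

birth_date = extract_section(page, "birth date")
missing = extract_section(page, "death date")  # label not present: empty string
```

The point is that each labeled section behaves like a named variable: the page holds the value once, and any other page can ask for it by label.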
Using this "emergent feature", we are converting by bot all the parameters of the main standard templates into "DIY variables", with very interesting and unexpected results.
You obviously need to put the "extracted data" somewhere: on it.source we decided on Talk pages, with a JavaScript system that hides the code from users (but it's there). Of course, if things go ahead we could even request a Data: namespace...
Currently, we are using this feature to show, with a single template, the year and the status of a single text or page. For examples, see http://it.wikisource.org/wiki/Pagina_principale/Sezioni#Ultimi_arrivi. It is also used to show the page status in the transcluded version: http://it.wikisource.org/wiki/Storia_di_una_capinera/II
At the end of the day, we have a system to "extract data" and do whatever we want with it. If we had proper templates following Dublin Core (and maybe an Upload-like form for entering the data), we could even solve the bulk of our metadata issue. In Paris, at the GLAM conference, we'll probably discuss this a lot (I definitely will).
So we want to ask you whether there are any similar experiences, and which drawbacks you see. We are aware that the system is heavy, that it's not the proper way to fix this kind of problem, and so on. If you can improve it, or suggest better practices or procedures, you are definitely welcome. For once, it would be very good to coordinate and collaborate in developing a tool as important as this seems to be.
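As a rough illustration of the Dublin Core idea, the following Python sketch collects labeled sections from a page and maps them to Dublin Core element names. Everything specific here is an assumption: the labels "title", "author" and "year", the mapping table, the sample values and the function names are hypothetical and do not describe the templates actually used on it.source.

```python
import re
from xml.sax.saxutils import quoteattr

# Matches <section begin="label" />value<section end="label" /> pairs.
SECTION_RE = re.compile(
    r'<section\s+begin="([^"]+)"\s*/>(.*?)<section\s+end="\1"\s*/>',
    re.DOTALL,
)

# Hypothetical mapping from section labels to Dublin Core elements.
LABEL_TO_DC = {
    "title": "DC.title",
    "author": "DC.creator",
    "year": "DC.date",
}


def dublin_core_record(wikitext: str) -> dict:
    """Build a {Dublin Core element: value} dict from labeled sections."""
    record = {}
    for label, value in SECTION_RE.findall(wikitext):
        element = LABEL_TO_DC.get(label)
        if element is not None:  # labels without a mapping are skipped
            record[element] = value.strip()
    return record


def to_meta_tags(record: dict) -> list:
    """Render the record as HTML <meta> tags, one per element."""
    return [
        f"<meta name={quoteattr(name)} content={quoteattr(value)}>"
        for name, value in record.items()
    ]


# Made-up sample data, in the spirit of the imaginary author above.
sample = (
    '<section begin="title" />An imaginary book<section end="title" />\n'
    '<section begin="author" />Pinco Pallino<section end="author" />\n'
    '<section begin="year" />1900<section end="year" />\n'
    '<section begin="status" />75%<section end="status" />\n'
)

record = dublin_core_record(sample)
tags = to_meta_tags(record)
```

Once the data sits in labeled sections, producing a standard, machine-readable record is mostly a matter of agreeing on the label names.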
Cheers,
Aubrey WMI Board
Magnificent! I have marked in red the only inaccuracy: at the moment the data are kept in the talk page only for the Pagina namespace, while in Ns0 they are still in the page itself.
2010/10/22 Andrea Zanni zanni.andrea84@gmail.com
You obviously need to put the "extracted data" somewhere: on it.source we decided to test, as data containers, both Talk pages (nsPage:) and the main page itself, with a JavaScript system that hides the code from users (but it's there). Of course, if things go ahead we could even request a Data: namespace...
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l