Hello.
Some of you have heard me rant about this for a couple of years now. So, I finally wrote something up:
https://meta.wikimedia.org/wiki/Massively-Multiplayer_Online_Bibliography
Much, much to be added, but I'd love for this to be a group conversation, so by all means, dig in! :)
A.
Hi Asaf,
It's an interesting idea and suggests to me crowdsourcing subjects through something like tagging. For many years library staff have known that the manner in which subject headings are assigned for library materials is questionable, yet to come up with a better scheme is difficult. One problem that tagging shows is that people's command of language is very different. If there wasn't a controlled vocabulary or thesaurus, people would create numerous tags that, once combined with other articles, might be less useful than doing a full-text search.
It's interesting to me that databases like JSTOR don't use subject headings except with regard to the discipline of the journal where the article first appeared. They depend on relevancy rankings to assist users in finding articles. Then there's the RILM database which established what it thought were fixed broad subject areas, but which are messy once interdisciplinary articles show up.
Perhaps you can show an example of a single article with the kind of system you're proposing?
Thanks.
Bob Kosovsky, Ph.D. -- Curator, Rare Books and Manuscripts, Music Division, The New York Public Library for the Performing Arts
blog: http://www.nypl.org/blog/author/44
Twitter: @kos2
Listowner: OPERA-L ; SMT-TALK ; SMT-ANNOUNCE ; SoundForge-users
- My opinions do not necessarily represent those of my institutions -
On Sun, Aug 18, 2013 at 1:38 AM, Asaf Bartov abartov@wikimedia.org wrote:
Hello.
Some of you have heard me rant about this for a couple of years now. So, I finally wrote something up:
https://meta.wikimedia.org/wiki/Massively-Multiplayer_Online_Bibliography
Much, much to be added, but I'd love for this to be a group conversation, so by all means, dig in! :)
A.
Asaf Bartov Wikimedia Foundation <http://www.wikimediafoundation.org>
Imagine a world in which every single human being can freely share in the sum of all knowledge. Help us make it a reality! https://donate.wikimedia.org
Libraries mailing list Libraries@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/libraries
Hi, everyone.
First of all, thanks very much for the positive feedback, and for signing at the page on Meta. That's a pretty stellar team already! :)
I'll respond to some of the comments here, but my aim is to keep the useful information up-to-date on Meta, so I'll be pasting the useful stuff back to Meta, and I encourage those of you who can deal with basic markup (surely at least most of you?) to continue refining the idea on the Meta page and to continue the conversation on its talk page.
On Sun, Aug 18, 2013 at 9:35 AM, Bob Kosovsky bobkosovsky@nypl.org wrote:
It's an interesting idea and suggests to me crowdsourcing subjects through something like tagging. For many years library staff have known that the manner in which subject headings are assigned for library materials is questionable, yet to come up with a better scheme is difficult. One problem that tagging shows is that people's command of language is very different. If there wasn't a controlled vocabulary or thesaurus, people would create numerous tags that, once combined with other articles, might be less useful than doing a full-text search.
Agreed. That's why I'm proposing using controlled vocabularies, alongside curated datasets such as Wikidata. That way we'll avoid having multiple unlinked variations on [[G. K. Chesterton]]'s name, for example.
It's interesting to me that databases like JSTOR don't use subject headings except with regard to the discipline of the journal where the article first appeared. They depend on relevancy rankings to assist users in finding articles.
Isn't that most likely because JSTOR don't have ready access to, or experience with engaging, a massive volunteer base who would undertake the work of classifying articles by subject headings? What I'm suggesting is obviously useful for JSTOR content as well, though I'd bet a good portion of JSTOR material is already topic-indexed fairly well in disciplinary bibliographical journals and databases (e.g. in my own academic field, classics, that would be L'Année Philologique[1]), so I'm leaving JSTOR out of the initial scope, for now.
Then there's the RILM database which established what it thought were fixed broad subject areas, but which are messy once interdisciplinary articles show up.
My approach is to allow multiple, overlapping classifications, in multiple languages and according to multiple classification systems.[2] With upvoting/downvoting or a similar mechanism, I believe this can handle interdisciplinary works adequately.
Perhaps you can show an example of a single article with the kind of system you're proposing?
Sure, let's see:
So, T.S. Eliot's "Hamlet and His Problems" -- http://en.wikisource.org/wiki/The_Sacred_Wood/Hamlet_and_His_Problems -- could be classified as ABOUT (or dc:subject[3], etc.):
1. http://lccn.loc.gov/sh85058566 -- "Hamlet (Legendary character)" (this is from the Library of Congress Subject Headings)
2. http://lccn.loc.gov/sh2008112835 -- "Theater--England--History--16th century" (likewise)
3. http://www.wikidata.org/wiki/Q2447542 -- "Prince Hamlet" (an item on Wikidata, about the fictional character Hamlet) -- sufficient to retrieve multi-lingual labels, link to Wikipedia articles, etc.
4. http://www.wikidata.org/wiki/Q41567 -- "Hamlet" (an item on Wikidata, about the play by Shakespeare) -- likewise
5-8. (a few more in English)
9-19. subject headings from some other thesaurus (do the DNB or BNF share their subject authority files like the LoC?)
All of these classifications are stored (either as Linked Data triples or in some conventional RDBMS [exposable as triples]) and can then be reviewed, revised, upvoted/downvoted, and of course searched.
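To make the storage idea concrete, here is a minimal in-memory sketch of such a store, not a proposed implementation. It assumes the Dublin Core Terms `subject` predicate for "aboutness" and a simple net score for upvotes/downvotes; the class and method names (`ClassificationStore`, `assert_about`, `vote`, `works_about`) are hypothetical and appear nowhere in the proposal.

```python
from dataclasses import dataclass

# Assumption: "aboutness" is expressed with the Dublin Core Terms predicate.
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"


@dataclass
class Classification:
    work: str        # URL of the essay or article
    subject: str     # URI from a controlled vocabulary or Wikidata
    label: str       # human-readable heading, kept for display only
    score: int = 0   # upvotes minus downvotes

    def as_triple(self):
        return (self.work, DCTERMS_SUBJECT, self.subject)


class ClassificationStore:
    """Hypothetical store; a real one would sit on a triple store or an RDBMS."""

    def __init__(self):
        self._by_key = {}      # (work, subject) -> Classification
        self._by_subject = {}  # subject -> set of works

    def assert_about(self, work, subject, label):
        key = (work, subject)
        if key not in self._by_key:
            self._by_key[key] = Classification(work, subject, label)
            self._by_subject.setdefault(subject, set()).add(work)
        return self._by_key[key]

    def vote(self, work, subject, delta):
        self._by_key[(work, subject)].score += delta

    def works_about(self, subject, min_score=0):
        works = self._by_subject.get(subject, set())
        return sorted(w for w in works
                      if self._by_key[(w, subject)].score >= min_score)

    def triples(self):
        return [c.as_triple() for c in self._by_key.values()]


store = ClassificationStore()
essay = "http://en.wikisource.org/wiki/The_Sacred_Wood/Hamlet_and_His_Problems"
store.assert_about(essay, "http://lccn.loc.gov/sh85058566",
                   "Hamlet (Legendary character)")
store.assert_about(essay, "http://www.wikidata.org/wiki/Q41567", "Hamlet")
store.vote(essay, "http://www.wikidata.org/wiki/Q41567", +1)
print(store.works_about("http://www.wikidata.org/wiki/Q41567"))
```

Keeping each assertion as its own record is what lets several overlapping classifications of the same essay coexist and be reviewed or voted on separately.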
I hope that is clear?
Cheers,
Asaf
P.S. I'm posting on this list with my Wikimedia Foundation address, because that's the one I'm subscribed to the list with, but let me clarify once again -- this is my own volunteer initiative, stemming from a longtime personal interest, and is neither endorsed by nor officially on the agenda of the Wikimedia Foundation.
[1] http://en.wikipedia.org/wiki/L%27Ann%C3%A9e_philologique
[2] Do people still say "folksonomy"? ;)
[3] http://dublincore.org/documents/2012/06/14/dcmi-terms/?v=elements#subject
On 08/20/2013 08:02 AM, Asaf Bartov wrote:
On Sun, Aug 18, 2013 at 9:35 AM, Bob Kosovsky <bobkosovsky@nypl.org> wrote:
It's interesting to me that databases like JSTOR don't use subject headings except with regard to the discipline of the journal where the article first appeared.
Isn't that most likely because JSTOR don't have ready access to, or experience with engaging, a massive volunteer base who would undertake the work of classifying articles by subject headings?
Wikisource does have access to volunteers, but the individual articles in Popular Science Monthly and other journals or magazines aren't being cataloged and indexed (or categorized) as systematically as they could be. This is because our supply of volunteers is not infinite, even if the project is open to anybody.
Similarly, Wikipedia is quite large, but only in very few languages. In most languages it is quite small, because of the limited number of volunteers.
Hi Asaf,
It's an interesting idea, thanks for throwing it out there. Just to play devil's advocate a little bit, aren't most of the citations and external links in Wikipedia articles assertions of "aboutness"? How is what you are proposing different? For example, from the English Wikipedia Article for Friendship you could derive the following RDF assertion:
<https://en.wikisource.org/wiki/Essays:_First_Series/Friendship> dcterms:subject <http://www.wikidata.org/entity/Q491> .
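As a rough sketch of that derivation: given the Wikidata item for an article's topic and the works the article links to, one assertion per link can be written out as N-Triples. The function names are hypothetical, the link list is typed in by hand rather than fetched from Wikipedia, and `dcterms:subject` is expanded to its full Dublin Core Terms IRI.

```python
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"


def to_ntriple(subject_iri, predicate_iri, object_iri):
    """Format one statement whose three terms are all IRIs as an N-Triples line."""
    return f"<{subject_iri}> <{predicate_iri}> <{object_iri}> ."


def aboutness_from_links(topic_item, linked_works):
    """Naively assert that every linked work is about the article's topic."""
    return [to_ntriple(work, DCTERMS_SUBJECT, topic_item) for work in linked_works]


# Hand-typed sample input (an assumption, not scraped from the article).
friendship_item = "http://www.wikidata.org/entity/Q491"
further_reading = ["https://en.wikisource.org/wiki/Essays:_First_Series/Friendship"]

for line in aboutness_from_links(friendship_item, further_reading):
    print(line)
```

The sketch treats every link as a full "is about" claim, which is exactly the simplification the question raises.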
I guess answering my own question a bit, perhaps it could be easier for people to make these assertions as they are reading material on the web...and that perhaps not all of them belong in the citation or external links sections of Wikipedia articles? Some articles could get a bit long and unwieldy. I remember a social bookmarking site called Faviki that uses Wikipedia as a controlled vocabulary for tagging content while bookmarking it. Is that similar to what you are thinking about?
//Ed
Hi all. I think that Asaf's idea is very interesting, but of course my ultimate and never-ending goal is to have Wikisource be a part/partner of it :-)
I have very unclear ideas about this, but:
* couldn't the project completely rely on Wikidata? You can have an item for (almost) every record: http://www.wikidata.org/wiki/Help:Sources Micru (in copy) can explain more about this.
* couldn't we take all the Open Library data? Are they CC0?
* how do you see the relationship of this with Wikipedia and Wikisource? One of the things I think about most is the fact that in Wikisource we actually use some ad hoc templates for cited authors and cited works, e.g. http://it.wikisource.org/wiki/Storia_della_letteratura_italiana_(De_Sanctis)... Every blue link is a wikilink to another Wikisource work/author page. Moreover, at the bottom of the page you can see categories that list every citation of every author/work in Wikisource. I mapped this kind of relationship from the "mentions" property of schema.org to a Wikidata property (the whole mapping we used as a draft is here: https://docs.google.com/spreadsheet/ccc?key=0AlPNcNlN2oqvdFQyR2F5YmhrMWpXaUF... )
I think that these templates could convey (in a way I don't know) a "mentions" property in Wikidata: ex. Book Q98 mentions Author Q42, or something like this.
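One possible way to picture this, purely as a sketch: scan a page's wikitext for the citation templates and emit one "mentions" statement per distinct target. The template names `AutoreCitato` and `TestoCitato`, the sample wikitext, and the tuple layout are assumptions made for illustration; resolving the cited names to Wikidata items (the Q98/Q42 part) is left out entirely.

```python
import re

# Assumed template names for cited authors and cited works; the first
# parameter is taken to be the target page, any later parameters are ignored.
CITATION_TEMPLATE = re.compile(
    r"\{\{\s*(AutoreCitato|TestoCitato)\s*\|\s*([^|}]+?)\s*(?:\|[^}]*)?\}\}"
)


def extract_mentions(page_title, wikitext):
    """Return (page, "mentions", kind, target) tuples, without duplicates."""
    mentions = []
    for template, target in CITATION_TEMPLATE.findall(wikitext):
        kind = "author" if template == "AutoreCitato" else "work"
        statement = (page_title, "mentions", kind, target)
        if statement not in mentions:
            mentions.append(statement)
    return mentions


# Invented sample wikitext, not copied from any Wikisource page.
sample = (
    "Come scrisse {{AutoreCitato|Dante Alighieri|Dante}} nella "
    "{{TestoCitato|Divina Commedia}}, e ancora {{AutoreCitato|Dante Alighieri}}."
)

for statement in extract_mentions("Storia della letteratura italiana (De Sanctis)", sample):
    print(statement)
```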
Do we want a "cited thing/concept/item" template? That could link directly to Wikipedia, for example.
In my ideal digital library, this kind of annotation would be made on a separate layer, and not in the wikitext (as we are doing now). Of course, I can and will discuss this with the Pund.it folks (http://thepund.it) at the biblio-hackathon we will host at the National Library of Florence in October.
Finally, I would recommend discussing all these things in our beloved Books task force: https://www.wikidata.org/wiki/Wikidata:Books_task_force :-)
Aubrey
Hi all, Thanks for sharing your thoughts, Asaf. I have discussed several times with Andrea, Gerard, and others the need for a portal that presents all the bibliographic information from Wikipedia and Wikisource in a user-friendly format, and I am glad that you put it into words. The key for this to happen is of course Wikidata: once the structure is defined (done[1]) and the information from infoboxes, citation templates and Wikisource is imported (that might take some months), it should be quite easy to have a portal like the one you suggest. For items representing people we already have something like that: http://tools.wmflabs.org/reasonator/?q=Q42
As for the aboutness, the needed property is already under discussion [2]; however, as Bob has mentioned, that is only part of the solution. For searches to yield more results, the Wikidata implementation of Wiktionary should be in place; then it would be easy to connect synonyms and related words without having to resort to controlled vocabularies.
Even then, I would like to ask you: is it really that useful? I think that a finer granularity might be more interesting for researchers (thepund.it seems like a good candidate), and if it is about reading recommendations, then recommender engines work quite well, but that is a different story.
About importing all the metadata, I am not sure that would fall within the scope of Wikidata. Its mission is to support WM projects, so the fact that there is metadata at all is just a byproduct, not the primary aim.
Micru
[1] http://www.wikidata.org/wiki/Wikidata:Books_task_force
[2] http://www.wikidata.org/wiki/Wikidata:Property_proposal/Creative_work#main_t...
On Sun, Aug 18, 2013 at 2:27 PM, Andrea Zanni zanni.andrea84@gmail.com wrote:
On Sun, Aug 18, 2013 at 10:44 AM, Ed Summers ehs@pobox.com wrote:
It's an interesting idea, thanks for throwing it out there. Just to play devil's advocate a little bit, aren't most of the citations and external links in Wikipedia articles assertions of "aboutness"?
Some are, to be sure! And it would be an interesting path to explore to try to figure out how much and how useful it would be to "seed" or "suggest" (to human volunteers) classifications for essays in this semi-automatic way!
How is what you are proposing different?
I am proposing a human volunteer reading an essay (begin with the text), then (conveniently! a UI/usability challenge!) selecting topics from controlled vocabularies and Wikidata item titles and asserting the essay is about those things.
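The selection step could, for instance, be backed by a lookup that searches labels from several sources at once. The sketch below is only an illustration: the `suggest` function, the ranking rule, and the tiny hard-coded candidate list (reusing headings from the Hamlet example earlier in the thread) are assumptions; a real tool would query the authority files and Wikidata instead.

```python
# (source, identifier, label) triples; hard-coded here for illustration only.
CANDIDATES = [
    ("LCSH", "http://lccn.loc.gov/sh85058566", "Hamlet (Legendary character)"),
    ("LCSH", "http://lccn.loc.gov/sh2008112835", "Theater--England--History--16th century"),
    ("Wikidata", "http://www.wikidata.org/wiki/Q2447542", "Prince Hamlet"),
    ("Wikidata", "http://www.wikidata.org/wiki/Q41567", "Hamlet"),
]


def suggest(typed, candidates=CANDIDATES, limit=10):
    """Return candidates whose label contains the typed text, best match first."""
    needle = typed.casefold().strip()
    if not needle:
        return []
    hits = [c for c in candidates if needle in c[2].casefold()]
    # Labels that start with the typed text come first, then shorter labels.
    hits.sort(key=lambda c: (not c[2].casefold().startswith(needle), len(c[2])))
    return hits[:limit]


for source, identifier, label in suggest("hamlet"):
    print(f"{label}  [{source}]  {identifier}")
```

Showing the source next to each label lets the volunteer pick the same concept from more than one vocabulary.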
For example, from the English Wikipedia Article for Friendship you could derive the following RDF assertion:
<https://en.wikisource.org/wiki/Essays:_First_Series/Friendship> dcterms:subject <http://www.wikidata.org/entity/Q491> .
Yes, but equally, from the same "Further reading" section, your algorithm would assert that Aristotle's _Nicomachean Ethics_ is about Friendship. Friendship is certainly a theme in that book, taking up perhaps 15% of the discussion, but it would be misleading to assert that the entire book is about friendship, _unless_ you also assert all the other topics it includes (virtue, moderation, the examined life, etc.).
That very issue is another reason I'm focusing on individual essays and articles for this particular project (note that the MMOB vision is very broad and foresees multiple projects, some infrastructural and some higher level, all long term and ongoing). Complete works (such as the Ethics) are already catalogued and classified reasonably usefully by traditional catalogues. The goal here is to extend this to the vast space of essays, which are on the one hand nearly invisible to topical searches (as distinct from full-text searches), while on the other hand usually confined to one (or rarely two) clear topics, making the human classifier's work simpler.
I guess answering my own question a bit, perhaps it could be easier for people to make these assertions as they are reading material on the web...and that perhaps not all of them belong in the citation or external links sections of Wikipedia articles? Some articles could get a bit long and unwieldy. I remember a social bookmarking site called Faviki that uses Wikipedia as a controlled vocabulary for tagging content while bookmarking it. Is that similar to what you are thinking about?
Hmm, yes! Thanks for this reference! Yes, Faviki is very much along the lines I'm thinking about. The obvious differences I see, having only read Faviki's about page so far, are that it classifies arbitrary Web pages (rather than a well-defined set of works), i.e. is broader in its target scope, and that it relies only on DBpedia concepts, which is narrower than the combined authority-files-and-Wikidata approach I have in mind. It's also not immediately clear where the data resides, how re-usable it is, etc., but perhaps further inquiry will reveal this. But again, this is very much the direction I was thinking of. It would be interesting to see if the Faviki maintainer would be interested in joining this conversation.
Thanks again for engaging!
Asaf