On Sun, Aug 18, 2013 at 10:44 AM, Ed Summers <ehs@pobox.com> wrote:

It's an interesting idea, thanks for throwing it out there. Just to
play devil's advocate a little bit, aren't most of the citations and
external links in Wikipedia articles assertions of "aboutness"?

Some are, to be sure!  It would be interesting to explore how far, and how usefully, classifications for essays could be "seeded" or "suggested" (to human volunteers) in this semi-automatic way!
 
How is
what you are proposing different?

I am proposing that a human volunteer read an essay (beginning with the text), then select topics from controlled vocabularies and Wikidata item titles (conveniently! a UI/usability challenge!) and assert that the essay is about those things.
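
To make this concrete, here is a minimal sketch in Python of the data such a volunteer selection could yield. The function name `subject_triples` and the choice of N-Triples as output are illustrative assumptions on my part, not part of any existing tool; the predicate is the Dublin Core `dcterms:subject` term, and the example URIs are the ones used for the Friendship essay elsewhere in this thread.

```python
# Hypothetical helper: turns a volunteer's topic selection into RDF statements.
# Assumption: output is serialized as N-Triples, one statement per line.

DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"


def subject_triples(work_uri, topic_uris):
    """Return one N-Triples line per topic the volunteer selected for the work."""
    return [
        f"<{work_uri}> <{DCTERMS_SUBJECT}> <{topic_uri}> ."
        for topic_uri in topic_uris
    ]


# The essay the volunteer has just read, and the single topic they picked.
essay = "https://en.wikisource.org/wiki/Essays:_First_Series/Friendship"
topics = ["http://www.wikidata.org/entity/Q491"]

lines = subject_triples(essay, topics)
for line in lines:
    print(line)
```

A work with several topics would simply get several such lines, one per selected topic, so the volunteer's judgment (not an algorithm) decides which subjects are asserted.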

For example, from the English
Wikipedia Article for Friendship you could derive the following RDF
assertion:

    <https://en.wikisource.org/wiki/Essays:_First_Series/Friendship>
        dcterms:subject <http://www.wikidata.org/entity/Q491> .

Yes, but equally, from the same "Further reading" section, your algorithm would assert that Aristotle's _Nicomachean Ethics_ is about Friendship. Friendship is certainly a theme in that book, taking up perhaps 15% of the discussion, but it would be misleading to assert the entire book is about friendship _unless_ you also assert all the other topics it includes (virtue, moderation, the examined life, etc.).

That very issue is another reason I'm focusing on individual essays and articles for this particular project (note that the MMOB vision is very broad and foresees multiple projects, some infrastructural and some higher level, all long term and ongoing).  Complete works (such as the Ethics) are already catalogued and classified reasonably well by traditional catalogues.  The goal here is to extend this to the vast space of essays, which are on the one hand nearly invisible to topical searches (as distinct from full-text searches), and on the other hand usually confined to one (or rarely two) clear topics, which makes the human classifier's work simpler.

I guess answering my own question a bit, perhaps it could be easier
for people to make these assertions as they are reading material on
the web...and that perhaps not all of them belong in the citation or
external links sections of Wikipedia articles? Some articles could get
a bit long and unwieldy. I remember a social bookmarking site called
Faviki that uses Wikipedia as a controlled vocabulary for tagging
content while bookmarking it. Is that similar to what you are thinking
about?

Hmm, yes!  Thanks for this reference!  Yes, Faviki is very much along the lines I'm thinking about.  The obvious differences I see, having only read Faviki's about page so far, are that it classifies arbitrary Web pages (rather than a well-defined set of works), i.e. it is broader in its target scope, and that it relies only on DBpedia concepts, which is narrower than the combined authority-files-and-Wikidata approach I have in mind.  It's also not immediately clear where the data resides, how re-usable it is, etc., but perhaps further inquiry will reveal this.  But again, this is very much the direction I was thinking of.  I wonder whether the Faviki maintainer would be interested in joining this conversation.

Thanks again for engaging!

   Asaf

--
    Asaf Bartov
    Wikimedia Foundation