Concepts and words are different things, or better yet, words (word senses, ...) are a special kind of concept.

I was looking at what the data model would be for a system that supports logical representation of 100% of the critical knowledge in business and technical documents over narrow domains.

One thing I tried was (more or less) Wikidata+Wordnet, and I found the Wordnet part difficult to apply. Where Wikidata concepts match text chunks it works OK, but trying to deal with the verbs, prepositions and all that stuff is labor-intensive, hard to do correctly, and doesn't contribute much to machine-readable semantics. It is more useful to model verb functions in terms of discontinuous chunks which form templates, i.e. the verb and its associated prepositions together are often a good unit of modelling.
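
To make the template idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the data model of any real system: the Template class, the match function, the slot names and the sample sentence are all hypothetical. A template is an ordered list of fixed chunks (the verb and its preposition), and the text between those chunks fills named slots.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Template:
    """Hypothetical template: fixed chunks (anchors) with named gaps around them."""
    name: str
    anchors: Tuple[str, ...]  # e.g. the verb and its preposition, in order
    slots: Tuple[str, ...]    # one slot before the first anchor, one after each anchor


def match(template: Template, tokens: List[str]) -> Optional[Dict[str, str]]:
    """Return slot fillers if every anchor occurs in order with non-empty gaps."""
    positions = []
    start = 0
    for anchor in template.anchors:
        try:
            idx = tokens.index(anchor, start)  # first occurrence at or after start
        except ValueError:
            return None
        positions.append(idx)
        start = idx + 1

    # Gap boundaries: before the first anchor, between anchors, after the last.
    starts = [0] + [p + 1 for p in positions]
    ends = positions + [len(tokens)]

    fillers = {}
    for slot, lo, hi in zip(template.slots, starts, ends):
        chunk = tokens[lo:hi]
        if not chunk:
            return None
        fillers[slot] = " ".join(chunk)
    return fillers


# Assumed example: "transfers ... to ..." treated as one discontinuous unit.
TRANSFER = Template(
    name="transfer",
    anchors=("transfers", "to"),
    slots=("agent", "theme", "recipient"),
)

sentence = "the seller transfers the title to the buyer".split()
result = match(TRANSFER, sentence)
print(result)
```

The sketch matches each anchor at its first occurrence and does no lemmatization or parsing, so it would need real linguistic preprocessing before it could be applied to actual documents.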

Super-Wordnet, however, will still be interesting to humans who might want to pin down exact word senses in a contract.

On Fri, May 8, 2015 at 11:35 AM, Luca Martinelli <martinelliluca@gmail.com> wrote:
2015-05-08 15:33 GMT+02:00 Federico Leva (Nemo) <nemowiki@gmail.com>:
> +1. The Wikimedia community has long been able to think of all the Wikimedia
> projects as an organic whole. Software, on the other hand, has too often forced
> unnatural divisions.
>
> Wiktionary, Wikipedia, Commons and Wikiquote (to name the main cases) link
> to each other all the time in a constructive division of labour. It makes no
> sense to make connections between them harder.


I start from here, since Nemo got the point IMHO: the fact that every
project has its own scope doesn't imply that the whole of the
community works on different scopes - we just decided to split up our
duties among ourselves. But it's not just that.

TL;DR: Wikidata and Wiktionary deal with the same things (concepts),
therefore are best-suited for each other, given some needed
adaptations. Structured Data and Structured Wikiquote deal with
different things (objects), therefore are not to be considered good
examples.

Long version here:

In theory, one might just agree that a separate instance of Wikibase
might be the best solution for Wiktionary, but Structured Data and
Structured Wikiquote are different from a theoretical "Structured
Wiktionary", because they respectively deal with images, quotes and
words.

Images and quotes are describable *objects*, as the Wiki*
articles/pages are, and there are billions and billions of those
objects out there. This is the main, if not the only, reason why
we *have* to put up a separate instance of Wikibase to deal with them:
thinking that Wikidata might deal with such an infinite task is just
nuts.

Words, on the other hand, are describable *concepts*, not objects.
They can be linked to one another by relations, they have synonyms and
opposites, they can be regrouped or separated, etcetera, which is
exactly what we're currently doing with Wikidata items.

I know, there are even more words than images and quotes, so it would
be even more nuts to think of dealing with them just with Wikidata -
but Wikidata is *already* structured for dealing with concepts, making
it the best choice for integrating data from Wiktionary.

In other words, Wikidata and Wiktionary both work with *concepts*,
while all the other projects work with *objects*. From a more
practical point of view, why should I have a Wikidata item about, say,
present tense[1] *AND* a completely similar item on "Structured
Wiktionary"? It's the same concept, why should I have it in two
different-yet-linked databases, belonging to and maintained by the
very same community? Why can't we work something out to keep all
the information in just one database?

This is why I think that setting up a separate Wikibase for Wiktionary
might end up doubling our efforts and splitting our communities,
which is exactly the opposite of what we need to do (halving the
efforts and doubling the community).[2]

Sorry for the long post. :)


[1] https://www.wikidata.org/wiki/Q192613
[2] Not sure if I have to remark this, but please, PLEASE, note this
is just an exaggeration for argument's sake, I have of course no data
that might confirm factually that the WD community will surge by 100%.
I just want to make my concept clear (heh).

--
Luca "Sannita" Martinelli
http://it.wikipedia.org/wiki/Utente:Sannita

_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l



--
Paul Houle

Applying Schemas for Natural Language Processing, Distributed Systems, Classification and Text Mining and Data Lakes

(607) 539 6254    paul.houle on Skype   ontology2@gmail.com