Hoi,
The big point of having data that is consistently "different" is that it
can be queried, and by way of queries it can be changed. There is a problem
in having so much data, but the problem of not having data is even bigger.
A good example is "professor": it has its own structures (multiple), and
in essence they are a "position". Like a president, a professor retains
this title after the period of active university involvement. Some people
insist that it is a "profession"; I am of the opposite point of view. Now
someone who holds the profession of professor must be, or must have been,
in the employment of a university. So it is possible to find problematic
professors.
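A minimal sketch of such a check, in Python over hand-made records. The
names, the "position" and "employers" keys and the set of universities are
all invented for illustration; they are not real Wikidata items or
property IDs, and a real check would more likely be a query against the
live data.

```python
# Hypothetical, hand-made records; not real Wikidata items or property IDs.
PEOPLE = [
    {"name": "A", "position": "professor", "employers": ["Example University"]},
    {"name": "B", "position": "professor", "employers": []},
    {"name": "C", "position": "president", "employers": []},
    {"name": "D", "position": "professor", "employers": ["Example Bakery"]},
]

# Assumed set of items that are known to be universities.
UNIVERSITIES = {"Example University"}


def problematic_professors(people, universities):
    """Return names of professors with no past or present university employer."""
    return [
        person["name"]
        for person in people
        if person["position"] == "professor"
        and not any(employer in universities for employer in person["employers"])
    ]


print(problematic_professors(PEOPLE, UNIVERSITIES))
```

The check only works because every record has the same shape, which is the
point about structure made above.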
So yes it is a scaffold and yes things may change because there is all too
much that is severely problematic. This is not a problem when the data has
some kind of structure.
Thanks,
GerardM
On 17 May 2017 at 18:30, Joakim Soderberg <joakim.soderberg(a)blippar.com>
wrote:
Aubrey,
Thanks for the explanation and enlightening examples.
Here are my two cents’ worth.
Having an open community defining semantics for cross-domain items is
powerful and challenging.
To mention one problem, I have come across several statements where
‘instance of’ has been confused with ‘about’, which is harmful when you
want to query for a type.
Perhaps manual curation or bots can help out more?
The challenge is that as Wikidata's complexity grows, it becomes more
difficult for the community to choose the right properties and classes.
A somewhat related example from Wikipedia: the article about Amy_Brand was
written using the infobox template “adult biography” instead of “person”,
which had the consequence of the entity being classified as “adult actor”
in DBpedia.
Best regards
Joakim
On May 17, 2017, at 8:17 AM, Andrea Zanni <zanni.andrea84(a)gmail.com>
wrote:
[Sorry for cross-posting. I just sent this mail to the Wikicite mailing
list, but I thought it could be of interest also here. Moreover, the
community here could provide some important insight to better explain
wikidata to "newbies" and set the right path for a meaningful conference.]
Dear all,
sorry for the messy email, but I'd like to express a small concern I
have regarding the awesome Wikicite conference we'll have in a few days.
My main point is that Wikidata is *complex*.
It's not just the data model (which is not easy, per se), but the whole
idea behind Wikidata, the policies, the community, the workflow, the tools,
and the whole halo of vagueness that surrounds it.
My favourite example about this is this little story:
https://en.wikipedia.org/wiki/Blind_men_and_an_elephant
(which was actually a very good metaphor used by my professors introducing
the Information Science course...)
Different people with different backgrounds work with data in different
manners: the word "data" (as "information") means anything and its
contrary, so we must be careful. You know better than me that everyone
projects her own dreams and delusions upon Wikidata, so we must work
towards a good understanding at the beginning, to avoid the painful and
time-consuming job of making people un-learn things they think they know.
This was quite evident especially last year, when a lot of librarians and
wikimedians were in the same room, and everyone knew many things about
metadata and their metadata model: librarians had difficulties grasping
things about Wikibase, Wikidata, policies and communities, and wikimedians
about bibliographic models and complexities.
We spent at least 30 minutes explaining Wikidata from the beginning, also
adding some "color" about strategies, policies and community.
I'll try to give some examples.
* we need to explain that in Wikidata (and Wikibase) everything is an
*item*. Everything. Every item has properties, values, qualifiers. What's
not possible is to have sets or clusters of items, and to give properties
to such sets and clusters. This is somehow related to the book topic.
* we need to explain that there are at least 2 different possible
strategies: create few general properties and many specific items,
or create many specific properties and fewer general items. Wikidata chose
the former.
* we need to explain the "scaffolding principle", meaning that we don't
need to put *all the info in all the items*. We need to create and organize
items that are *queryable*, in such a way that I can make a query that gets
all the data and details I need, scattered among different items. If the
items in question are built "on top of each other", this is doable. It is
actually very important to understand this, because people get confused
about how many things to say inside one item. This principle (and the
former) was explained to me by Tobias1984, and helped me a lot in my
understanding of Wikidata.
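To make the scaffolding idea a bit more concrete, here is a minimal Python
sketch. It is only an illustration under stated assumptions: the item IDs
(Q_edition, Q_work, ...) and property names (edition_of, author, country)
are made up and are not real Wikidata identifiers, and the dictionary
stands in for the real store. Each item says only a little, and a "query"
collects the rest by walking from one item to the next.

```python
# Hypothetical items; the IDs and property names are made up for illustration.
ITEMS = {
    "Q_edition": {
        "label": "Example Book, 2nd edition",
        "edition_of": "Q_work",
        "publication_year": 1999,
    },
    "Q_work": {"label": "Example Book", "author": "Q_author"},
    "Q_author": {"label": "Example Author", "country": "Q_country"},
    "Q_country": {"label": "Exampleland"},
}


def follow(items, start, path):
    """Walk from item `start` along the properties in `path`.

    Returns the value reached at the end of the path, or None if any
    step is missing.
    """
    current = start
    for prop in path:
        current = items.get(current, {}).get(prop)
        if current is None:
            return None
    return current


def label_at(items, start, path):
    """Return the label of the item reached via `path`, or None."""
    target = follow(items, start, path)
    return items[target]["label"] if target in items else None


# The edition item does not state the author's country itself; the
# answer is assembled from three linked items.
print(label_at(ITEMS, "Q_edition", ["edition_of", "author", "country"]))
```

The edition item stays small, yet the author's country is still reachable
from it, which is why there is no need to repeat that statement on every
edition.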
I think that this kind of insight is crucial for working with Wikidata in
a meaningful way, because Wikidata offers *one product*, with *one data
model*, and it is simply impossible to adapt and stretch the Complexity Of
The World to Wikidata without loss of information.
I'm sure many of you know this perfectly, but other people maybe don't, or
at least they will struggle with it.
We all come from different backgrounds and are emotionally very attached
to our models and our crucial pieces of information we don't want to lose:
in this sense, Wikidata is much more a "negotiation" than Wikipedia,
because Wikipedia is not structured and allows for much more messiness.
Every model and decision we will make will be a trade-off, and I think
it's worth trying to save time and effort trying to establish these
boundaries at the beginning.
Of course, there are many insights about Wikidata I don't have, but those
are kinda the things we want to understand first in these kinds of
conferences.
Hope this makes sense.
Aubrey
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata