Hoi, The big point of having data that is consistently "different" is that it can be queried, and through queries it can be changed. Having so much data is a problem, but not having the data is an even bigger one. A good example is "professor": it has its own (multiple) structures, and in essence it is a "position". Like a president, a professor retains the title after the period of active university involvement. Some people insist that it is a "profession"; I am of the opposite point of view. Now, someone who holds the profession of professor must be, or must have been, in the employment of a university. So it is possible to find problematic professors.
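A minimal sketch of the kind of consistency check described above, run against a toy in-memory data set rather than Wikidata itself. The item identifiers, the property names (`occupation`, `employer`, `instance_of`) and the function names are all hypothetical and chosen only for illustration.

```python
# Toy data: every identifier and property name here is hypothetical,
# not a real Wikidata item or property ID.
ITEMS = {
    "person_a": {"occupation": ["professor"], "employer": ["univ_x"]},
    "person_b": {"occupation": ["professor"], "employer": ["company_y"]},
    "person_c": {"occupation": ["professor"]},
    "person_d": {"occupation": ["painter"]},
    "univ_x": {"instance_of": ["university"]},
    "company_y": {"instance_of": ["business"]},
}


def is_university(items, item_id):
    """Return True if the item is stated to be an instance of 'university'."""
    return "university" in items.get(item_id, {}).get("instance_of", [])


def problematic_professors(items):
    """List items whose occupation is 'professor' but that have no
    employer that is a university (past or present is not distinguished)."""
    flagged = []
    for item_id, statements in items.items():
        if "professor" not in statements.get("occupation", []):
            continue
        employers = statements.get("employer", [])
        if not any(is_university(items, e) for e in employers):
            flagged.append(item_id)
    return sorted(flagged)


if __name__ == "__main__":
    for item_id in problematic_professors(ITEMS):
        print(item_id)
```

The check only works because every item uses the same structure for the same kind of statement, which is the point being made about consistent data.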
So yes, it is a scaffold, and yes, things may change, because there is all too much that is severely problematic. This is not a problem when the data has some kind of structure. Thanks, GerardM
On 17 May 2017 at 18:30, Joakim Soderberg joakim.soderberg@blippar.com wrote:
Aubrey, Thanks for the explanation and enlightening examples. Here are my two cents’ worth.
Having an open community defining semantics for cross-domain items is powerful and challenging. To mention one problem, I have come across several statements where ‘instance of’ has been confused with ‘about’, which is harmful when you want to query for a type. Perhaps manual curation or bots can help out more?
The challenge is that as Wikidata's complexity grows, it becomes more difficult for the community to choose the right properties and classes. A somewhat related example from Wikipedia: the article about Amy_Brand was written using the infobox template “adult biography” instead of “person”, which had the consequence of the entity being classified as “adult actor” in DBpedia.
Best regards Joakim
On May 17, 2017, at 8:17 AM, Andrea Zanni zanni.andrea84@gmail.com wrote:
[Sorry for cross-posting. I just sent this mail to the Wikicite mailing list, but I thought it could be of interest also here. Moreover, the community here could provide some important insight to better explain wikidata to "newbies" and set the right path for a meaningful conference.]
Dear all, sorry for the messy email, but I'd like to express a small concern I have regarding the awesome Wikicite conference we'll have in a few days.
My main point is that Wikidata is *complex*. It's not just the data model (which is not easy, per se), but the whole idea behind Wikidata, the policies, the community, the workflow, the tools, and the whole halo of vagueness that surrounds it.
My favourite example about this is this little story: https://en.wikipedia.org/wiki/Blind_men_and_an_elephant (which was actually a very good metaphor used by my professors introducing the Information Science course...)
Different people with different backgrounds work with data in different manners: the word "data" (as "information") means anything and its contrary, so we must be careful. You know better than me that everyone projects her own dreams and delusions upon Wikidata, so we must work towards a good understanding at the beginning, to avoid the painful and time-consuming job of making people un-learn things they think they know. This was quite evident especially last year, when a lot of librarians and wikimedians were in the same room, and everyone knew many things about metadata and their metadata model: librarians had difficulties grasping things about Wikibase, Wikidata, policies and communities, and wikimedians about bibliographic models and complexities.
We spent at least 30 minutes explaining Wikidata from the beginning, also adding some "color" about strategies, policies and community.
I'll try to give some examples.
- we need to explain that in Wikidata (and Wikibase) everything is an *item*. Everything. Every item has properties, values, qualifiers. What's not possible is to have sets or clusters of items, and to give properties to such sets and clusters. This is somehow related to the book topic.
- we need to explain that there are at least 2 different possible strategies: create few general properties and many specific items, or create many specific properties and fewer general items. Wikidata chose the former.
- we need to explain the "scaffolding principle", meaning that we don't need to put *all the info in all the items*. We need to create and organize items that are *queryable*, in such a way that I can make a query that gets all the data and details I need, scattered among different items. If the items in question are built "on top of each other", this is doable. It is actually very important to understand this, because people get confused about how many things to say inside one item. This principle (and the former) was explained to me by Tobias1984, and helped me a lot in my understanding of Wikidata.
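To make the "scaffolding principle" a bit more concrete, here is a small sketch over toy data, not over Wikidata itself. All item identifiers, property names and the `follow` helper are hypothetical. Each item holds only the statements that belong to it, and a query walks the links between items to collect details that are scattered among them.

```python
# Toy data: identifiers and property names are hypothetical,
# not real Wikidata items or properties.
ITEMS = {
    "edition_1": {"label": "Example edition", "edition_of": "work_1",
                  "publication_year": 1999},
    "work_1": {"label": "Example work", "author": "author_1"},
    "author_1": {"label": "Example author", "place_of_birth": "city_1"},
    "city_1": {"label": "Example city", "country": "country_1"},
    "country_1": {"label": "Example country"},
}


def follow(items, start, path):
    """Walk a chain of properties from one item to the next.

    Returns the value reached at the end of the path, or None if any
    step along the way is missing.
    """
    current = start
    for prop in path:
        current = items.get(current, {}).get(prop)
        if current is None:
            return None
    return current


if __name__ == "__main__":
    # The edition does not store the author's country of birth;
    # the query reaches it through work, author and city.
    country = follow(
        ITEMS, "edition_1",
        ["edition_of", "author", "place_of_birth", "country"],
    )
    print(ITEMS[country]["label"])
```

Nothing about the author or the country is repeated on the edition item: the information stays where it belongs and the query assembles it.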
I think that this kind of insight is crucial for working with Wikidata in a meaningful way, because Wikidata offers *one product*, with *one data model*, and it is simply impossible to adapt and stretch the Complexity Of The World to Wikidata without loss of information. I'm sure many of you know this perfectly, but other people maybe don't, or at least they will struggle with it. We all come from different backgrounds and are emotionally very attached to our models and our crucial pieces of information we don't want to lose: in this sense, Wikidata is much more a "negotiation" than Wikipedia, because Wikipedia is not structured and allows for much more messiness.
Every model and decision we will make will be a trade-off, and I think it's worth trying to save time and effort by establishing these boundaries at the beginning. Of course, there are many insights about Wikidata I don't have, but those are kinda the things we want to understand first in this kind of conference. Hope this makes sense.
Aubrey
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata