Stas,
Always agreed, it's a classification problem.
So which claims/statements should I rule out? Or, put the other way, which claims/statements should I rule in when I want to return only "real" entities? Can someone help with the negative claims/statements I am looking for? So far, I have only this one:
1. Filtering out any entry with P31:Q13406463 (instance of: Wikimedia list article) should omit most of the list entries from your results.
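For anyone who wants to try this, here is a rough sketch of that filter in Python. It is only an illustration, not an agreed-upon definition of a "real" entity. It assumes each entity is a dict in the Wikidata JSON layout (claims, then P31, then mainsnak, datavalue, value); the names instance_of_ids and keep_entity are made up for this example, and Q5 (human) is used only as a sample "rule in" class.

```python
# Sketch only: decide whether to keep an entity based on its P31 ("instance of") values.
LIST_ARTICLE = "Q13406463"  # Wikimedia list article, the class mentioned above


def instance_of_ids(entity):
    """Return the set of item IDs the entity is declared an instance of (P31)."""
    ids = set()
    for statement in entity.get("claims", {}).get("P31", []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # "somevalue" / "novalue" snaks carry no item
        value = snak.get("datavalue", {}).get("value", {})
        if "id" in value:
            ids.add(value["id"])
        elif "numeric-id" in value:
            # Some serializations only carry the numeric part of the ID
            ids.add("Q%d" % value["numeric-id"])
    return ids


def keep_entity(entity, rule_out=frozenset({LIST_ARTICLE}), rule_in=None):
    """Hypothetical filter: drop ruled-out classes, optionally require ruled-in ones."""
    classes = instance_of_ids(entity)
    if classes & set(rule_out):
        return False  # negative claim matched, e.g. a list article
    if rule_in is not None:
        return bool(classes & set(rule_in))  # allow-list mode
    return True


# Hand-built sample record for illustration (Q42, Douglas Adams, instance of Q5, human)
sample = {
    "id": "Q42",
    "claims": {"P31": [{"mainsnak": {
        "snaktype": "value",
        "datavalue": {"value": {"entity-type": "item", "numeric-id": 5}},
    }}]},
}
print(keep_entity(sample))                   # True: not a list article
print(keep_entity(sample, rule_in={"Q515"}))  # False: not in the allow-list
```

The rule_out set is where further negative claims would go as people suggest them; passing rule_in instead switches to asking only for the classes you actually want.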
All,
Freebase simply decided not to keep Wikipedia topic pages that merely held "lists of entities"; instead, Freebase generated those same "lists of entities" with queries. There was no need for hand-coded lists in Freebase. It was a graph database, so it could generate all kinds of lists programmatically for a user and keep those lists as views against our user profile (stored user queries) for easy tweaking or re-use whenever we wanted.
Thad +ThadGuidry https://www.google.com/+ThadGuidry
On Mon, Jun 15, 2015 at 2:56 PM, Stas Malyshev smalyshev@wikimedia.org wrote:
Hi!
In Freebase, we had bot scripts that went through and removed "Lists of Things" topic entities, since they are lists of entities and are not useful when clumped together and normalized in a graph database.
Why delete them? Wikidata has a number of things which are not your standard "entity": lists, sources, news, quotes, service entries, narrative articles (e.g. https://en.wikipedia.org/wiki/Control_of_fire_by_early_humans, which is not exactly an "entity" like "human" or "fire"), etc.

So I don't think an approach that singles out and excludes lists would help much. If you have an application that needs "individual entities" like "Douglas Adams" or "London" and excludes other types, it will have to exclude much more than just lists. I think the approach of asking for exactly what you need and ignoring the rest may prove more efficient.

I'm not sure there are really well-defined criteria to specify what an "individual entity" actually is. I'm sure you have one that matches your application, but some other application may have a completely different one. Generally, I think this can be solved by better classification, but so far I'm not sure what to base this classification on.

--
Stas Malyshev
smalyshev@wikimedia.org
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata