Stas,

Agreed, as always: it's a classification problem.

So which claims/statements should I rule out? Or, put the other way, which claims/statements should I rule in when I want to return only "real" entities? Can someone help me identify the negative claims/statements I am looking for?
So far, I have only this one:

1. Filtering out any entry with P31:Q13406463 should omit most of them from your results.
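
To make that rule concrete, here is a minimal sketch in Python of how such a filter might look. It assumes the entities are already loaded as dicts in the standard Wikidata entity JSON layout (claims, then property, then mainsnak, then datavalue). The function names and the EXCLUDED_CLASSES set are my own, purely for illustration, and further "negative" classes could be added to the set as people suggest them.

```python
# Q13406463 is the item for "Wikimedia list article".
EXCLUDED_CLASSES = frozenset({"Q13406463"})


def instance_of_ids(entity):
    """Return the set of item IDs used as P31 (instance of) values."""
    ids = set()
    for statement in entity.get("claims", {}).get("P31", []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # "novalue" / "somevalue" snaks carry no item ID
        value = snak.get("datavalue", {}).get("value", {})
        if "id" in value:
            ids.add(value["id"])
        elif "numeric-id" in value:
            # Some serializations only carry the numeric form.
            ids.add("Q%d" % value["numeric-id"])
    return ids


def is_excluded(entity, excluded=EXCLUDED_CLASSES):
    """True if any P31 value of the entity is in the excluded set."""
    return bool(instance_of_ids(entity) & excluded)


def drop_excluded(entities, excluded=EXCLUDED_CLASSES):
    """Keep only the entities that are not instances of an excluded class."""
    return [e for e in entities if not is_excluded(e, excluded)]
```

Note that an entity with no P31 statement at all passes this filter, so it only rules things out; it does not by itself prove that what remains is a "real" entity.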

All,

Freebase simply decided not to keep Wikipedia topic pages that merely held "lists of entities"; instead, it generated those same lists with queries. There was no need to have hand-coded lists in Freebase. It was a graph database, so it could generate all kinds of lists programmatically for a user and keep those lists as views against our user profiles (stored user queries) for easy tweaking or reuse whenever we wanted.
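
As a rough illustration of the "lists as stored queries" idea, here is a toy sketch in Python. It is not Freebase's actual query language or data model; the triples, the SAVED_QUERIES dict, and the function names are all made up for the example.

```python
# A tiny in-memory "graph": (subject, predicate, object) triples.
TRIPLES = [
    ("Douglas Adams", "instance of", "human"),
    ("Douglas Adams", "occupation", "novelist"),
    ("Terry Pratchett", "instance of", "human"),
    ("Terry Pratchett", "occupation", "novelist"),
    ("London", "instance of", "city"),
]

# Each "list" is stored as a query (predicate, object), not as hand-kept members.
SAVED_QUERIES = {
    "list of novelists": ("occupation", "novelist"),
    "list of cities": ("instance of", "city"),
}


def subjects_with(triples, predicate, obj):
    """Return the sorted subjects that have the given predicate/object pair."""
    return sorted({s for s, p, o in triples if p == predicate and o == obj})


def run_saved(name, triples=TRIPLES):
    """Regenerate a named list on demand from the current graph."""
    predicate, obj = SAVED_QUERIES[name]
    return subjects_with(triples, predicate, obj)
```

Because the list is recomputed from the graph each time, adding a new novelist triple updates "list of novelists" without anyone editing a list entity by hand.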


On Mon, Jun 15, 2015 at 2:56 PM, Stas Malyshev <smalyshev@wikimedia.org> wrote:
Hi!

> In Freebase, we had bot scripts that went through and removed "Lists of
> Things" topic entities since they are lists of entities and not useful
> clumped together and normalized in a graph database.

Why delete them? Wikidata has a number of things which are not your
standard "entity" - lists, sources, news, quotes, service entries,
narrative articles (e.g.
https://en.wikipedia.org/wiki/Control_of_fire_by_early_humans - it's not
exactly "entity" like "human" or "fire"), etc. So I don't think the
approach that singles out and excludes lists would help much - if you
have an application that needs "individual entities" like "Douglas
Adams" or "London" and exclude other types will have to exclude much
more than just lists - but I think the approach of asking for exactly
what you need and ignoring the rest may prove more efficient. I'm not
sure there's really well-defined criteria to specify what "individual
entity" actually is - I'm sure you have one that matches your
application, but some other application may have completely different one.
Generally, this can be solved by better classification I think, but so
far I'm not sure what to base this classification on.
--
Stas Malyshev
smalyshev@wikimedia.org

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata