In Freebase, we had bot scripts that went through and
removed "Lists of
Things" topic entities since they are lists of entities and not useful
clumped together and normalized in a graph database.
Why delete them? Wikidata has a number of things which are not your
standard "entity" - lists, sources, news, quotes, service entries,
narrative articles (e.g.
- it's not
exactly "entity" like "human" or "fire"), etc. So I
don't think the
approach that singles out and excludes lists would help much - if you
have an application that needs "individual entities" like "Douglas
Adams" or "London" and exclude other types will have to exclude much
more than just lists - but I think the approach of asking for exactly
what you need and ignoring the rest may prove more efficient. I'm not
sure there's really well-defined criteria to specify what "individual
entity" actually is - I'm sure you have one that matches your
application, but some other application may have completely different one.
Generally, this can be solved by better classification I think, but so
far I'm not sure what to base this classification on.