On Thu, Dec 19, 2019 at 11:16 PM Aidan Hogan <aidhog(a)gmail.com> wrote:
> - @Lydia, good point! I was thinking that filtering by wikilinks will
> just drop some more obscure nodes (like Q51366847 for example), but had
> not considered that there are some more general "concepts" that do not
> have a corresponding Wikipedia article. All the same, in a lot of the
> research we use Wikidata for, we are not particularly interested in one
> thing or another, but more interested in facilitating what other people
> are interested in. Examples would be query performance, finding paths,
> versioning, finding references, etc. But point taken! Maybe there is a
> way to identify "general entities" that do not have wikilinks, but do
> have a high degree or centrality, for example? Would a degree-based or
> centrality-based filter be possible in something like WDumper (perhaps
> it goes beyond the original purpose; certainly it does not seem trivial
> in terms of resources used)? Would it be a good idea?

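To make the degree-based idea a bit more concrete, here is a rough sketch of what such a filter could look like. It is only an illustration under stated assumptions, not how WDumper works: the function name, the edge format, the threshold, and the toy IDs are all made up, and a real dump would need a streaming pass rather than an in-memory list.

```python
from collections import Counter


def select_entities(edges, has_sitelink, min_degree):
    """Keep every entity that has a sitelink, plus every entity whose
    degree (incoming + outgoing statements) is at least min_degree.

    edges: iterable of (subject, property, object) ID triples (hypothetical format).
    has_sitelink: set of entity IDs known to have a Wikipedia article.
    """
    degree = Counter()
    for subject, _prop, obj in edges:
        degree[subject] += 1
        degree[obj] += 1

    keep = set(has_sitelink)
    keep.update(entity for entity, d in degree.items() if d >= min_degree)
    return keep


# Toy data with made-up IDs: "hub" has no sitelink but many links to it.
edges = [
    ("item_a", "p_instance_of", "hub"),
    ("item_b", "p_instance_of", "hub"),
    ("item_c", "p_instance_of", "hub"),
    ("item_c", "p_subclass_of", "rare"),
]
has_sitelink = {"item_a", "item_b"}

kept = select_entities(edges, has_sitelink, min_degree=3)
```

With this toy input, "hub" is kept despite having no sitelink, while "item_c" and "rare" are dropped. Plain degree is the cheapest variant; a centrality measure would need a more expensive pass over the whole graph.
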
I think it's definitely worth exploring, but I fear it needs someone to
actually sit down, collect the different use cases for the dumps, and talk
to people to figure out which part of the data they need. Based on
that we could identify common patterns. (I think this is something
that needs to be done, but unfortunately I can't dedicate time to it in
the foreseeable future.)
https://phabricator.wikimedia.org/T46581 is a good place for people who
want to help think it through.
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
(Society for the Promotion of Free Knowledge). Registered in the register
of associations of the Amtsgericht Berlin-Charlottenburg under number
23855 Nz. Recognized as a charitable organization by the Finanzamt für
Körperschaften I Berlin, tax number 27/029/42207.