On Thu, Dec 19, 2019 at 11:16 PM Aidan Hogan aidhog@gmail.com wrote:
- @Lydia, good point! I was thinking that filtering by wikilinks will just drop some more obscure nodes (like Q51366847 for example), but had not considered that there are some more general "concepts" that do not have a corresponding Wikipedia article. All the same, in a lot of the research we use Wikidata for, we are not particularly interested in one thing or another, but more interested in facilitating what other people are interested in. Examples would be query performance, finding paths, versioning, finding references, etc. But point taken! Maybe there is a way to identify "general entities" that do not have wikilinks, but do have a high degree or centrality, for example? Would a degree-based or centrality-based filter be possible in something like WDumper (perhaps it goes beyond the original purpose; certainly it does not seem trivial in terms of resources used)? Would it be a good idea?
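For illustration only, the degree-based idea raised above could be sketched roughly as follows. This is not a WDumper feature; the function name select_entities, the (subject, object) edge format, and the min_degree threshold are all hypothetical, and the sketch assumes the item-to-item edges have already been extracted from a dump.

```python
from collections import Counter

def select_entities(edges, has_sitelink, min_degree):
    """Keep entities that have a sitelink OR a degree of at least min_degree.

    edges: iterable of (subject_id, object_id) item-to-item statements
           (hypothetical input format, assumed to be pre-extracted).
    has_sitelink: set of entity ids that have at least one sitelink.
    min_degree: hypothetical cut-off for counting as a "general entity".
    """
    degree = Counter()
    for subject, obj in edges:
        # Undirected degree: count both outgoing and incoming edges.
        degree[subject] += 1
        degree[obj] += 1
    high_degree = {entity for entity, d in degree.items() if d >= min_degree}
    return set(has_sitelink) | high_degree

# Toy data with made-up ids: "C" has no sitelink but is linked from three items.
edges = [("A", "C"), ("B", "C"), ("D", "C"), ("D", "E")]
kept = select_entities(edges, has_sitelink={"A"}, min_degree=3)
```

On real data the degree counter would hold one entry per item, which is where the resource concern mentioned above comes in; a centrality measure would be more expensive still.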
I think it's definitely worth exploring, but I fear it needs someone to actually sit down, collect the different use cases for the dumps, and talk to people to figure out which parts of the data they need. Based on that we could identify common patterns. (I think this is something that needs to be done, but unfortunately I can't dedicate time to it in the foreseeable future.) https://phabricator.wikimedia.org/T46581 is a good place for people who want to help think it through.
Cheers,
Lydia