On 02.08.2016 22:28, Yuri Astrakhan wrote:
Is there a way we could have more than just the number of language links? E.g., the number of incoming links from other Wikipedia pages?
One could have other data added to the store, but this may be more work depending on what you want. You ask about links from "Wikipedia pages". If you really mean this (and not Wikidata items), then this would be a lot of work, since one would have to update the RDF whenever any Wikipedia page changes. I guess we do not have the infrastructure for doing this in a live update mode. Also note that the number of these links is different in each language, so one would have to store many numbers. Overall, this link count would really be (meta)data about Wikipedia pages and their relations, and not so much about Wikidata. I think you could get such Wikipedia-specific data from DBpedia, but I am not sure how well their live endpoint keeps track of this data (since it is tricky). Maybe an offline solution that combines RDF dumps is the most practical approach for now if you really need this data.
Even storing the number of incoming links (properties) from other Wikidata items would actually be tricky. Currently, the RDF data about each item only depends on the content of this item's Wikidata page. The number of inlinks depends on other Wikidata pages, and therefore it is much more work to keep it up to date when there are edits.
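The dependency problem described above can be sketched in a few lines of Python. This is only a toy illustration, not how the Wikidata RDF export works: the `items` dictionary, the item IDs, and both helper functions are hypothetical. It shows that an outgoing-link count can be computed from one item alone, while an edit to one item changes the incoming-link count stored for a different item.

```python
from collections import Counter

# Hypothetical toy data: each item maps to the set of items its statements link to.
items = {
    "Q1": {"Q2", "Q3"},
    "Q2": {"Q3"},
    "Q3": set(),
}

def outlink_count(items, item_id):
    """Computable from the item's own page alone."""
    return len(items[item_id])

def inlink_counts(items):
    """Needs a pass over every item, since any page may link to any other."""
    counts = Counter({item_id: 0 for item_id in items})
    for source, targets in items.items():
        for target in targets:
            if target != source:
                counts[target] += 1
    return counts

before = inlink_counts(items)
items["Q2"].discard("Q3")  # an edit that touches only the page of Q2
after = inlink_counts(items)

# Items whose stored inlink count would now be out of date.
changed = sorted(q for q in items if before[q] != after[q])
print(changed)
```

In this toy example the edited page is Q2, but the item whose count becomes stale is Q3, so an updater that only regenerates the RDF of the edited item would miss it.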
Markus
On Aug 2, 2016 10:41 PM, "Markus Kroetzsch" <markus.kroetzsch@tu-dresden.de> wrote:
On 02.08.2016 20:59, Daniel Kinzler wrote:
On 02.08.2016 at 20:19, Markus Kroetzsch wrote:
Oh, there is a little misunderstanding here. I did not suggest creating a property "number of sitelinks in this document". What I propose instead is to create a property "number of sitelinks for the document associated with this entity". The domain of this suggested property is entity. The advantage of this proposal over what you understood is that it makes queries much simpler, since you usually want to sort items by this value, not documents. One could also have a property for the number of sitelinks per document, but I don't think it has such a clear use case.

"number of sitelinks for the document associated with this entity" strikes me as semantically odd, which was the point of my earlier mail. I'd much rather have "number of sitelinks in this document". You are right that the primary use would be to "rank" items, and that it would be more convenient to have the count associated directly with the item (the entity), but I fear it will lead to a blurring of the line between information about the entity and information about the document. That is already a common point of confusion, and I'd rather keep that separation very clear. I also don't think that one level of indirection would be horribly complicated. To me it's just natural to include the sitelink info on the same level as we provide a timestamp or revision id: for the document.

I just proposed the simple and straightforward way to solve the practical problem at hand. It leads to shorter, more readable queries that execute faster. (I don't claim originality for this; it is the obvious solution to the problem and most people would arrive at exactly the same conclusion.) Your concern is based on the assumption that there is some kind of psychological effect that a particular RDF encoding would have on users. I don't think that there is any such effect. Our users will not confuse the city of Paris with an RDF document just because of some data in the RDF store.

Markus

--
Prof. Dr. Markus Kroetzsch
Knowledge-Based Systems Group
Faculty of Computer Science
TU Dresden
+49 351 463 38486
https://iccl.inf.tu-dresden.de/web/KBS/en
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata