In Freebase, we had bot scripts that went through and removed "Lists of Things" topic entities since they are lists of entities and not useful clumped together and normalized in a graph database.
Does Wikidata have something similar, or a user review process for deleting these?
Ex. List of tallest buildings in Wuhan - https://www.wikidata.org/wiki/Q6642364
Thad +ThadGuidry https://www.google.com/+ThadGuidry
Why should they be deleted? Have you looked at our notability policy?
Greetings,
Sjoerd de Bruin sjoerddebruin@me.com
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
By this reasoning we should also delete items about categories or disambiguation pages.
Thad Guidry, 15/06/2015 17:21:
Ex. List of tallest buildings in Wuhan - https://www.wikidata.org/wiki/Q6642364
What's the issue here? The item doesn't actually contain any list, there is no duplication or information "clumped together".
Nemo
This is an important question. There are apparently 196,839 known list items based on a query for instanceOf Wikipedia list item (CLAIM[31:13406463]) http://tools.wmflabs.org/autolist/autolist1.html?q=CLAIM%5B31%3A13406463%5D
I tend to agree with Thad that these kinds of items aren't really what we want filling up Wikidata. In fact, replacing them with the ability to generate them automatically from queries is a primary use case for Wikidata. But just deleting them doesn't entirely make sense either, because they are key signposts to things that ought to be brought into Wikidata properly. The items in these lists clearly matter.
Ideally we could build a bot that would examine each of these lists and identify the unifying properties that should be added to the items within the list, so that the list could be reproduced by a query.
I disagree that this reasoning suggests deleting items about categories and disambiguation pages; both of these clearly have functions in Wikidata. I'm not sure what the function of a list entity is.
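As a rough illustration of the bot idea above, the following is a minimal sketch that finds the property/value pairs shared by the members of one list. It assumes the member items and their statements have already been fetched into a plain dictionary. The function name `shared_statements`, the `min_share` parameter, and all item and property IDs other than P31 are hypothetical placeholders, not real Wikidata identifiers.

```python
from collections import Counter


def shared_statements(members, min_share=1.0):
    """Return the (property, value) pairs held by at least min_share of members.

    `members` maps an item ID to a set of (property, value) pairs.
    With min_share=1.0 only pairs present on every member are returned.
    """
    if not members:
        return set()
    counts = Counter()
    for statements in members.values():
        # set() guards against a pair being counted twice for one item.
        counts.update(set(statements))
    threshold = min_share * len(members)
    return {pair for pair, n in counts.items() if n >= threshold}


# Hand-written placeholder data standing in for the members of a list page.
members = {
    "Q_A": {("P31", "Q_building"), ("P_location", "Q_city"), ("P_height", "330")},
    "Q_B": {("P31", "Q_building"), ("P_location", "Q_city"), ("P_height", "275")},
    "Q_C": {("P31", "Q_building"), ("P_location", "Q_city")},
}

# Pairs common to every member are candidates for the query that
# could reproduce the list.
print(sorted(shared_statements(members)))
```

A real bot would also have to decide which shared pairs are specific enough to define the list, which this sketch does not attempt.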
Benjamin has the right idea, and we handled it in a similar way in Freebase. Sometimes it was a manual labor of love; most of the time we just deleted them and hoped that Wikipedia would later make them real topic entities for us to properly absorb.
How Wikidata decides to handle them, I don't care. If you keep them around, then just give users a way to filter them out in your APIs; that is all I ask. :)
Thad +ThadGuidry https://www.google.com/+ThadGuidry
In General,
I think Wikidata needs to decide, going forward, whether it will be a strict Entity Graph or a Big Graph of all things Wikipedia. It's an important question. If it decides on the latter, then just give API and Search users a way to filter out non-entities.
Thad +ThadGuidry https://www.google.com/+ThadGuidry
I think this is clearly an evolutionary process. In the short term, Wikidata needs to support Wikipedia use cases as Andrew mentioned above (thank you for the clarification). In the long term, this function and all other functions will (in my opinion) best be served by a transition into more and more of an entity graph, where claims are made about things in the world rather than about constructs in a database. Perhaps some form of the Wikidata game could be created to support this process for lists.
The intervening period is going to be a challenge in terms of modeling and in application-level hiding of weird ontological situations, where objects are described like (item1: instanceOf WikipediaList AND item1: subclassOf moons of Jupiter), but there is no way around it. And it's 100% worthwhile to do whatever it takes to keep things integrated with Wikipedia and to further establish Wikidata as indispensable there.
-Ben
On 15.06.2015 at 18:11, Thad Guidry wrote:
In General,
I think Wikidata needs to decide going forward if it will be a strict Entity Graph...or if it will be a Big Graph of all things Wikipedia. It's an important question...if it decides on the latter...then just give a way to filter out non-entities for the API and Search users.
I think there is a misunderstanding here. For practical reasons, Wikidata allows items about Wikipedia *pages*. Items that refer to Wikipedia list pages, or categories, or disambiguation pages, or policy pages, etc, are useful for managing these pages. They are conceptually different from items about "real" things.
I agree that Wikidata should not have items that *model* lists. But it can have items about list *pages* on Wikipedia.
That being said, I would love to have a clear distinction between items about pages and "real" items. To an extent, this is done via instanceof statements, e.g. instanceof -> Wikimedia Disambiguation Page. But it would be nice to have an easier way to filter those out in contexts where they are not relevant.
Federico,
As a Data Architect, I only care about individual Entities. I do not care what Wikidata needs internally for coordination with Wikipedia, etc.
There is no contradiction, as long as Wikidata provides a good mechanism for me to filter out non-entities. Ideally an API or Search parameter that says "give me only 'real' items / entities".
Thad +ThadGuidry https://www.google.com/+ThadGuidry
On Mon, Jun 15, 2015 at 12:31 PM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Thad Guidry, 15/06/2015 18:11:
I think Wikidata needs to decide going forward if it will be a strict Entity Graph...or if it will be a Big Graph of all things Wikipedia.
I understand the question, but why are the two things in contradiction?
Nemo
On 15.06.2015 at 20:09, Thad Guidry wrote:
Federico,
As a Data Architect, I only care about individual Entities. I do not care what Wikidata needs internally for coordination with Wikipedia, etc...
There is no contradiction...as long as Wikidata provides a good mechanism for me to filter out non-entities. Ideally an API or Search parameter that says "give me only 'real' items / entities".
I would also like that, for convenience.
But conceptually, Wikipedia pages are things in the world, and are "real" in that sense. So if we don't want to introduce a nasty hack into the data model, you'd have to do this by saying "exclude everything that is an instance of MediaWiki page (Q15474042)". That's a rather expensive operation...
Can you think of a way to do this nicely, that doesn't need special case hacks in the software?
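To make the cost concrete, here is a minimal sketch of the "exclude everything that is an instance of MediaWiki page (Q15474042)" approach, run against a small in-memory class hierarchy rather than live Wikidata. The subclass links and the instance-of values in the toy data are assumptions for illustration and have not been checked against Wikidata; the function names are hypothetical as well.

```python
def subclasses_of(root, subclass_of):
    """Return root plus every class that reaches root via subclass-of links.

    `subclass_of` maps a class ID to the set of its direct parent classes.
    """
    # Invert the mapping once so the walk can go from parent to children.
    children = {}
    for child, parents in subclass_of.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)

    seen = {root}
    stack = [root]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:  # also protects against cycles
                seen.add(child)
                stack.append(child)
    return seen


def exclude_instances(items, excluded_classes):
    """Keep only items with no instance-of value among excluded_classes."""
    return {
        item_id: classes
        for item_id, classes in items.items()
        if not (classes & excluded_classes)
    }


# Toy hierarchy: the subclass link below is an assumption for illustration.
subclass_of = {"Q13406463": {"Q15474042"}}

# Toy items mapped to their instance-of values (also assumed).
items = {
    "Q6642364": {"Q13406463"},  # the list page from the start of the thread
    "Q42": {"Q5"},              # an ordinary item
}

page_classes = subclasses_of("Q15474042", subclass_of)
print(exclude_instances(items, page_classes))
```

The expensive part is the transitive walk over the subclass tree, which has to be computed (or cached) before any item can be tested.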
We already do: all Wikimedia entities are marked as instances of ''Wikidata entity'' (or Wikidata page) or of one of their subclasses.
Plus, the usual properties generally do not apply to them, only dedicated properties, so it's unlikely we would find one of them by mistake
(except for list items, which are a mess because they are essentially classes and are sometimes merged with actual entities).
Hoi, Wikidata is not Freebase and no thanks. Thanks, GerardM
Also the list entity has a function. The function of *instance of* is to identify what a page is about. A database is built on consistency, and the list entity provides that for lists. A list is a very special type of subject in comparison to other articles. It isn't linked through topic-type properties. By using a list entity, these kinds of items are identified as such. Likewise for disambiguation pages, categories, templates, etc.
Romaine
We can create as many specialized classes as we want. That lists are more specific than classes is not a problem.
I think having a list of instances of a concept proves the concept is useful, so the class is something that could exist. Moreover, if we manually mark an item as an instance of such a class, a single statement adds a lot of information, and maybe a few properties could be added automatically by a bot.
Hoi, I have been REALLY active in adding statements with "is a list of". They do have a function: they show the content of a list in Reasonator. I do appreciate it when they are retained. They are both lists and categories. Thanks, GerardM
Hi Thad,
These are in scope for Wikidata and so should be retained - as there are Wikipedia articles on those topics, we need to use the entries in Wikidata in order to provide cross-language functionality for those articles.
However, if you have concerns about them getting mixed in with 'real' entities, filtering out any entry with P31:Q13406463 should omit most of them from your results.
(There is a related grey area in that many "Lists of X", say for office-holders or prize-winners, often map directly to another language's entry on the office or prize; what is the Wikidata item really "about"? But that's something that hopefully will resolve itself over time.)
Andrew.
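The P31:Q13406463 filter suggested above could be sketched roughly as follows on the client side. This assumes entities are available in the Wikibase JSON layout, where each statement carries a `mainsnak` whose `datavalue.value` holds a `numeric-id`; the helper names and the two sample entities are hypothetical and hand-written for illustration.

```python
LIST_CLASS = "Q13406463"  # class ID quoted in the thread


def instance_of_ids(entity):
    """Collect the item IDs used as P31 (instance of) values on an entity."""
    ids = set()
    for statement in entity.get("claims", {}).get("P31", []):
        snak = statement.get("mainsnak", {})
        # "novalue" and "somevalue" snaks carry no datavalue, so skip them.
        value = snak.get("datavalue", {}).get("value", {})
        if isinstance(value, dict) and "numeric-id" in value:
            ids.add("Q%d" % value["numeric-id"])
    return ids


def drop_list_pages(entities):
    """Return the entities that are not direct instances of LIST_CLASS."""
    return [e for e in entities if LIST_CLASS not in instance_of_ids(e)]


def make_entity(entity_id, class_numeric_id):
    """Build a minimal entity dict for this example (hypothetical helper)."""
    return {
        "id": entity_id,
        "claims": {
            "P31": [
                {
                    "mainsnak": {
                        "snaktype": "value",
                        "property": "P31",
                        "datavalue": {
                            "type": "wikibase-entityid",
                            "value": {
                                "entity-type": "item",
                                "numeric-id": class_numeric_id,
                            },
                        },
                    }
                }
            ]
        },
    }


# Sample data: the P31 values here are assumed, not fetched from Wikidata.
sample = [make_entity("Q6642364", 13406463), make_entity("Q42", 5)]
print([e["id"] for e in drop_list_pages(sample)])
```

This only checks direct P31 values, so items that are unclassified or typed with some other page-like class would still pass through.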
Hi!
In Freebase, we had bot scripts that went through and removed "Lists of Things" topic entities since they are lists of entities and not useful clumped together and normalized in a graph database.
Why delete them? Wikidata has a number of things which are not your standard "entity": lists, sources, news, quotes, service entries, narrative articles (e.g. https://en.wikipedia.org/wiki/Control_of_fire_by_early_humans, which is not exactly an "entity" like "human" or "fire"), etc. So I don't think an approach that singles out and excludes lists would help much. If you have an application that needs "individual entities" like "Douglas Adams" or "London" and excludes other types, it will have to exclude much more than just lists. I think the approach of asking for exactly what you need and ignoring the rest may prove more efficient. I'm not sure there are really well-defined criteria for what an "individual entity" actually is. I'm sure you have one that matches your application, but some other application may have a completely different one. Generally, I think this can be solved by better classification, but so far I'm not sure what to base this classification on.
Stas,
Always agreed, it's a classification problem.
So what claims/statements do I rule out? Or which claims/statements should I rule in when I want to return only "real" entities? Can someone help with the negative claims/statements that I am looking for? So far, I have only got:
1. filtering out any entry with P31:Q13406463 should omit most of them from your results.
All,
Freebase simply decided not to keep Wikipedia topic pages that merely held "lists of entities"; instead, Freebase generated those same "lists of entities" with queries. There was no need to have hand-coded lists in Freebase. It was a graph database and could generate all kinds of lists programmatically for a user, and keep those lists as views against our user profile for easy tweaking or re-use when we wanted to (stored user queries).
Thad +ThadGuidry https://www.google.com/+ThadGuidry
Hi!
I guess it somewhat depends on what you call "real". Unfortunately, not all items are even classified. A random example: https://www.wikidata.org/wiki/Q16515271 is a Wikiquote-only page, but it doesn't have any markers to say so, so I see no easy way to exclude it. OTOH, there are things like https://www.wikidata.org/wiki/Q17442446 or https://www.wikidata.org/wiki/Q17379835; items in their hierarchy are probably candidates for exclusion.
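To make the filtering idea concrete, here is a minimal Python sketch of excluding items marked P31:Q13406463. The item IDs come from this thread, but the P31 values in the toy data are simplified assumptions rather than a dump of the live claims, and `exclude_lists` is a hypothetical helper name.

```python
# Class ID quoted in the thread for Wikipedia/Wikimedia list items.
WIKIMEDIA_LIST = "Q13406463"


def exclude_lists(items):
    """Return the IDs of items whose P31 values do not include Q13406463."""
    return [qid for qid, p31 in items.items() if WIKIMEDIA_LIST not in p31]


# Toy data (assumed for illustration): item ID -> set of P31 values.
items = {
    "Q6642364": {"Q13406463"},  # "List of tallest buildings in Wuhan"
    "Q42": {"Q5"},              # Douglas Adams, an instance of human
    "Q16515271": set(),         # the unclassified Wikiquote-only page
}

kept = exclude_lists(items)
print(kept)
```

Note that the unclassified item stays in the result, which is exactly the limitation described above: a negative filter can only drop items that carry a marker.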
Thanks Stas,
Those are useful.
Thad +ThadGuidry https://www.google.com/+ThadGuidry
In any case, as I said before, if you query with properties like date of birth and so on and a Wikimedia item turns up in the results, that item should be edited. So it's safe to assume you won't get Wikimedia items from such a query.
Do you have examples where that happened, though?
Hi Thad
Freebase simply decided not to keep Wikipedia topic pages that merely held "lists of entities"; instead, Freebase generated those same "lists of entities" with queries. There was no need to have hand-coded lists in Freebase. It was a graph database and could generate all kinds of lists programmatically for a user, and keep those lists as views against our user profile for easy tweaking or re-use when we wanted to (stored user queries).
We also plan to have "Query" entities at some point (in a dedicated namespace). They will be equivalent to current Wikipedia list articles but will be generated automatically by some kind of (SPARQL?) query. [1]
What we might consider is adding sitelinks to those entities as well, and allowing them to link to the Wikipedia pages that use this Query. This way we would still be able to serve Wikipedia's need to link corresponding pages on different sites, while keeping them in a place where they make sense semantically and don't conflict with "real" items.
Best regards Bene
[1] https://www.wikidata.org/wiki/Wikidata:Glossary#Entities.2C_items.2C_propert...
Hi Bene,
Yes, I have played with a few of the SPARQL endpoints. Interesting things do sometimes pop up, excitingly unexpected. I like those. But many assumptions have to be made in certain corner cases. But that's why I have a brain, to make my own internal DAG whenever I need to. :)
ALL, Thanks for those additional suggestions and tips on filtering claims/statements.
Thad +ThadGuidry https://www.google.com/+ThadGuidry
On 15/06/2015 23:22, Thad Guidry wrote:
Freebase simply decided not to keep Wikipedia topic pages that merely held "lists of entities"; instead, Freebase generated those same "lists of entities" with queries. There was no need to have hand-coded lists in Freebase. It was a graph database and could generate all kinds of lists programmatically for a user, and keep those lists as views against our user profile for easy tweaking or re-use when we wanted to (stored user queries).
Thad +ThadGuidry https://www.google.com/+ThadGuidry
Wikidata is not Freebase. Its primary use is still to link pages between projects; it will take some time before it can serve as a real data source, with queries and all that stuff.
I'd agree with Stas: it depends immensely on what you mean by "real" entities, and it would be best to define your desired subjects explicitly if possible rather than relying on removal (e.g. if you want to know about people, put in a filter for P31:Q5).
To remove things like disambiguation pages, categories, or lists, you would want (to use Magnus's WDQ syntax) something like claim[31:(tree[17379835][][279])] - anything that is an instance of a subclass of Q17379835, "Wikimedia page outside the main knowledge tree". This will remove (probably) all of our internal admin content.
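As a rough illustration of what the tree-based claim filter expresses, here is a small in-memory Python sketch: collect every class reachable from Q17379835 by following P279 (subclass of) links in reverse, then drop any item whose P31 falls in that set. The subclass edges and item data below are assumptions for illustration and have not been checked against live Wikidata; `subclass_tree` and `keep_outside_tree` are hypothetical helper names.

```python
from collections import deque

ROOT = "Q17379835"  # "Wikimedia page outside the main knowledge tree"


def subclass_tree(root, subclass_of):
    """Return root plus every class that reaches it via P279 links."""
    # Invert child -> parents into parent -> children for a downward walk.
    children = {}
    for child, parents in subclass_of.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    seen = {root}
    queue = deque([root])
    while queue:
        for child in children.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def keep_outside_tree(instance_of, subclass_of, root=ROOT):
    """Return IDs of items with no P31 value inside the subclass tree."""
    excluded = subclass_tree(root, subclass_of)
    return [qid for qid, classes in instance_of.items()
            if not (classes & excluded)]


# Assumed toy hierarchy: class -> set of P279 parents.
subclass_of = {
    "Q17442446": {"Q17379835"},
    "Q13406463": {"Q17442446"},
}
# Assumed toy items: item -> set of P31 values.
instance_of = {
    "Q6642364": {"Q13406463"},  # a list article
    "Q42": {"Q5"},              # Douglas Adams
    "Q16515271": set(),         # unclassified page
}

kept = keep_outside_tree(instance_of, subclass_of)
print(kept)
```

A positive filter (keeping only items with, say, Q5 among their P31 values) would also drop the unclassified item, which this negative filter lets through.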
However, defining what constitutes a "list" is challenging; many (most?) WP list articles contain a non-trivial amount of content in addition to the list per se. As a result, some are labelled "lists" (like Q6642364, a list of buildings in a city), while others are notionally articles on the class which also include a list of members (eg Q8344876, a list of recipients of an award). This complexity is not likely to go away any time soon, especially given cases where (say) the English article thinks it's a list of the set X, and the Spanish one thinks it's an article about the set X.
Andrew.
On 15 June 2015 at 22:22, Thad Guidry thadguidry@gmail.com wrote:
Stas,
Always agreed, it's a classification problem.
So what claims/statements do I rule out ? Or what should I only rule in (claims/statements) when wanting to return only "real" entities ? Can someone help with those negative claims/statements that I am looking for ? So far, I only have got
filtering out any entry with P31:Q13406463 should omit most of them from your results.