[Moving to wikidata-tech; previous conversation inline below]
Hi Polyglot,
ah, now I see. The Wikidata Toolkit method you call is looking for items by Wikipedia page title, not for items by label. Labels and titles are not related in Wikidata. The search by title is supported by the wbgetentities API action for which we have a wrapper class, but this API action does not support the search by label.
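To make the title lookup concrete, here is a minimal sketch in Python 3 of the request that a wbgetentities title lookup amounts to. The helper name `title_lookup_url` and the constant `API_URL` are made up for this illustration and are not part of Wikidata Toolkit. Note that the request only carries a site key and a page title; there is no parameter through which a label could be passed.

```python
from urllib.parse import urlencode

# Public Wikidata API endpoint.
API_URL = "https://www.wikidata.org/w/api.php"


def title_lookup_url(site, title):
    """Build a wbgetentities request that resolves a wiki page title to an item.

    Hypothetical helper for illustration only.
    """
    params = {
        "action": "wbgetentities",
        "sites": site,      # site key, e.g. "enwiki"
        "titles": title,    # page title on that site, not an item label
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


url = title_lookup_url("enwiki", "Douglas Adams")
print(url)
```

A title that has no page on the given site, such as the school in this thread, cannot be resolved this way, whatever label the item carries.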
In fact, I am not sure that there is any API action for doing what you want. There is only wbsearchentities, but this search will return near matches and also look for aliases. Maybe this is not a big issue for long strings as in your case, but for shorter strings you would get many results and you would still need to check if they really match.
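The extra check mentioned above could look roughly like the following Python 3 sketch. It assumes the JSON response of wbsearchentities has already been fetched and parsed, and that each hit carries a `match` object with `type`, `language` and `text` fields. The function name `exact_label_matches` is hypothetical, and the sample data is hand-written for illustration (only Q22695926 and its label come from this thread; the other entries are invented placeholders, not real API output).

```python
def exact_label_matches(response, label, language="en"):
    """Return ids of search hits whose matched text is a label equal to `label`.

    Hits that matched on an alias, or only as a near match, are dropped.
    """
    hits = []
    for entry in response.get("search", []):
        match = entry.get("match", {})
        if (match.get("type") == "label"
                and match.get("language") == language
                and match.get("text") == label):
            hits.append(entry["id"])
    return hits


# Hand-written sample shaped like a wbsearchentities response.
sample = {
    "search": [
        {"id": "Q22695926",
         "match": {"type": "label", "language": "en",
                   "text": "Kasega Church of Uganda Primary School"}},
        {"id": "Q-PLACEHOLDER-ALIAS",  # invented: matched on an alias
         "match": {"type": "alias", "language": "en",
                   "text": "Kasega Church of Uganda Primary School"}},
        {"id": "Q-PLACEHOLDER-NEAR",  # invented: near match on a longer label
         "match": {"type": "label", "language": "en",
                   "text": "Kasega Church of Uganda Primary School Annex"}},
    ]
}

found = exact_label_matches(sample, "Kasega Church of Uganda Primary School")
print(found)
```

For short search strings the list of raw hits grows quickly, which is why filtering on the client side matters more there.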
Anyway, you are right that it would be nice if we implemented support for label/alias search as well. For this, we need to write a wrapper class for wbsearchentities. I created an issue to track this:
https://github.com/Wikidata/Wikidata-Toolkit/issues/228
Cheers,
Markus
On 13.02.2016 23:22, Jo wrote:
Hi Markus,
I'm searching for a Wikidata item with that label. It would be even better if it were possible to search for a label/description combination.
This is the item I'm looking for: https://www.wikidata.org/wiki/Q22695926
I mostly want to make sure that I'm not creating duplicate entries in Wikidata. Most of those schools are not noteworthy enough to get an article on Wikipedia, but since they have objects in OpenStreetMap, I would think they are interesting enough for Wikidata.
Polyglot
2016-02-13 23:13 GMT+01:00 Markus Krötzsch <markus@semantic-mediawiki.org>:
Hi Jo,
You are searching for an item that is assigned to the article "Kasega Church of Uganda Primary School" on English Wikipedia. However, there is no article of this name on English Wikipedia. Maybe there is a typo? Can you tell me which Wikidata item should be returned here?
Cheers,
Markus
P.S. If you agree, I would prefer to continue this discussion on wikidata-tech for the benefit of others who may have similar questions.
On 13.02.2016 14:47, Jo wrote:
Hi Markus,
I had started to write my own implementation of a Wikidata bot in Jython, so I could use it in JOSM but still get to code in Python. This worked well for a while, but now apparently something was changed in the login API. Anyway, I can't code for all the possible things that can go wrong, so it makes more sense to reuse an existing framework.
What I want to do is add items, but I want to check if they already exist first. Try as I may, I can't seem to retrieve the items I create myself, like:
Kasega Church of Uganda Primary School
Douglas Adams, on the other hand, doesn't pose a problem. I can't figure out why this is. Some things can be found, others can't. I tried with a few more entries from recent changes.
In my own bot, I had more success with searchEntities than with getEntities. Was this implemented in WDTK?
I hope you can help. I'm stuck, as it doesn't make a lot of sense to continue with the conversion if I can't even get a trivial thing like this to work.
from org.wikidata.wdtk.datamodel.helpers import Datamodel
from org.wikidata.wdtk.datamodel.helpers import ItemDocumentBuilder
from org.wikidata.wdtk.datamodel.helpers import ReferenceBuilder
from org.wikidata.wdtk.datamodel.helpers import StatementBuilder
from org.wikidata.wdtk.datamodel.interfaces import DatatypeIdValue
from org.wikidata.wdtk.datamodel.interfaces import EntityDocument
from org.wikidata.wdtk.datamodel.interfaces import ItemDocument
from org.wikidata.wdtk.datamodel.interfaces import ItemIdValue
from org.wikidata.wdtk.datamodel.interfaces import PropertyDocument
from org.wikidata.wdtk.datamodel.interfaces import PropertyIdValue
from org.wikidata.wdtk.datamodel.interfaces import Reference
from org.wikidata.wdtk.datamodel.interfaces import Statement
from org.wikidata.wdtk.datamodel.interfaces import StatementDocument
from org.wikidata.wdtk.datamodel.interfaces import StatementGroup
from org.wikidata.wdtk.util import WebResourceFetcherImpl
from org.wikidata.wdtk.wikibaseapi import ApiConnection
from org.wikidata.wdtk.wikibaseapi import LoginFailedException
from org.wikidata.wdtk.wikibaseapi import WikibaseDataEditor
from org.wikidata.wdtk.wikibaseapi import WikibaseDataFetcher
from org.wikidata.wdtk.wikibaseapi.apierrors import MediaWikiApiErrorException
# print dir(ItemDocument)
# print dir(ApiConnection)
# connection and siteIri are set up elsewhere (not shown in this excerpt)
dataFetcher = WikibaseDataFetcher(connection, siteIri)
# print dir(dataFetcher)
# itemDocuments = dataFetcher.getEntityDocumentsByTitle('enwiki',['Kasega Church of Uganda Primary School'])
# itemDocuments = dataFetcher.getEntityDocuments('Q22695926')
# Looks up by page title on enwiki, not by item label
itemDocuments = dataFetcher.getEntityDocumentsByTitle('enwiki','Kasega Church of Uganda Primary School')
# print dir(itemDocuments)
print str(len(itemDocuments)) + ' resulting items'
print itemDocuments.toString()
# for itemDocument in itemDocuments:
#     print '=========================='
#     print itemDocument.toString()
Oops, now I have 2 places to respond... I added the following to the ticket:
Hi Markus, I created some Python code myself: https://github.com/PolyglotOpenstreetmap/Python-scripts-to-automate-JOSM/blo...
Lines 261 to 265 do what I was using before.
Then I use it in lines 387 to 397: it searches on the label first, then checks whether the descriptions match. It's not the greatest code, but it did the trick.
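As an illustration of that label-then-description check (this is not the code from the linked script, which is not reproduced here), a minimal Python 3 sketch could look as follows. It assumes the candidates are search hits that each carry `id`, `label` and `description` fields. The function name `find_existing_item` and the description strings in the sample are hypothetical.

```python
def find_existing_item(candidates, label, description):
    """Return the id of the first candidate matching both label and description.

    Returns None when no candidate matches, i.e. creating a new item
    would not duplicate any of the given candidates.
    """
    for candidate in candidates:
        if (candidate.get("label") == label
                and candidate.get("description") == description):
            return candidate["id"]
    return None


# Hand-written candidates; the descriptions are invented for this example.
candidates = [
    {"id": "Q-PLACEHOLDER-OTHER",
     "label": "Kasega Church of Uganda Primary School",
     "description": "hypothetical other school with the same name"},
    {"id": "Q22695926",
     "label": "Kasega Church of Uganda Primary School",
     "description": "hypothetical description of the school"},
]

existing = find_existing_item(
    candidates,
    "Kasega Church of Uganda Primary School",
    "hypothetical description of the school",
)
print(existing)
```

Comparing descriptions as exact strings is the simplest possible choice; a real bot would probably want something more tolerant.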
Why was I coding this in Python? Well, I'm creating a prototype in the JOSM editor, which is written in Java. Hopefully this can be incorporated in core at some point and then it will be better to use a Java Toolkit. I had started coding one myself, but that doesn't make much sense. Better to stand on the shoulders of giants and reach up from there.
Jo
On 13.02.2016 23:57, Jo wrote:
Oops, now I have 2 places to respond... I added the following to the ticket:
No worries: we can keep the discussion on the ticket only. Anyone on this list who is interested in this feature now or later can always look it up to see the status (hopefully it will be closed by then).
Markus
wikidata-tech@lists.wikimedia.org