Hello dear wikidata list participants,
I use the MediaWiki API <https://www.wikidata.org/w/api.php> to query for
Wikidata items. I ran into a problem when using the 'wbsearchentities' action.
I have a query string 'Blue Angel' and I need to find the entities that
have a title containing this query string, in particular the entity with
ID 'Q158047'. But to find it I have to search for
'*The* Blue Angel'. I tried increasing the maximum number of
results to 50, but entity Q158047 still does not appear in the
result list.
Is there a way to have partial matches?
Thanks,
Almer
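
For reference, here is a minimal Python sketch of the request described above. As far as I understand, 'wbsearchentities' matches from the start of labels and aliases, which would explain why 'Blue Angel' misses an item labelled 'The Blue Angel'. The second helper builds a request to the generic MediaWiki full-text search (list=search) as a possible workaround; whether it returns Q158047 for this query is an assumption and has not been checked here. Both helper names are made up for illustration.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def wbsearchentities_url(search, language="en", limit=50):
    """Build a wbsearchentities request URL (the search from the email)."""
    params = {
        "action": "wbsearchentities",
        "search": search,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def fulltext_search_url(search, limit=50):
    """Build a generic MediaWiki full-text search URL (possible workaround)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": search,
        "srlimit": limit,
        "format": "json",
    }
    return API + "?" + urlencode(params)


if __name__ == "__main__":
    print(wbsearchentities_url("Blue Angel"))
    print(fulltext_search_url("Blue Angel"))
```

Fetching either URL with any HTTP client returns JSON; the sketch only builds the URLs so that it runs without network access.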
Hi,
Here is the technical counterpart of the question on globe coordinates I
just sent to wikidata-l:
"""
Which of the following statements are most accurate given the technical
roadmap of Wikibase?
(1a) Wikibase will continue to support arbitrary precision values for
coordinates, and the UI will be extended so people can actually enter them.
(1b) Wikibase will restrict the set of supported precision values for
coordinates to those already supported in the UI. Other values are
considered an error that will have to be fixed in the future.
(2a) Null values for precision are an error that should be fixed in the
data. Wikibase will reject such data in the future.
(2b) Null values for precision have a meaning. It is as follows (please
explain): ...
"""
It would really be useful for third parties to know which way this is going.
Thanks,
Markus
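
To make the cases in the question concrete, here is a small Python sketch that sorts a globe coordinate value into "null", "supported" or "arbitrary" precision. The field names follow the Wikibase JSON for globe coordinates (latitude, longitude, altitude, precision, globe). The set of supported precisions is a placeholder assumption: the real list offered by the UI is not given in this thread. The function name is also made up.

```python
# Hypothetical set of precisions (in degrees) that a UI might offer.
# The real list used by the Wikibase UI is not stated in this thread.
ASSUMED_UI_PRECISIONS = {10.0, 1.0, 0.1, 0.01, 0.001, 0.0001}


def classify_precision(value, supported=ASSUMED_UI_PRECISIONS):
    """Return 'null', 'supported' or 'arbitrary' for a coordinate value."""
    precision = value.get("precision")
    if precision is None:
        return "null"        # cases (2a)/(2b) in the question
    if precision in supported:
        return "supported"   # value the (assumed) UI can produce
    return "arbitrary"       # cases (1a)/(1b) in the question


# Example value with a null precision (coordinates are illustrative).
sample = {
    "latitude": 52.516666666667,
    "longitude": 13.383333333333,
    "altitude": None,
    "precision": None,
    "globe": "http://www.wikidata.org/entity/Q2",
}

if __name__ == "__main__":
    print(classify_precision(sample))
```

A third party consuming the data could use a check like this to count how many values fall into each case before deciding how to handle them.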
Hi:
Is there a definition of what queries against Wikidata are supposed to return?
In particular, how will queries interact with P1647 (subproperty of) and
similar aspects of Wikidata?
peter
I have filed an RFC about ways to improve the performance of the wb_terms table
(or rather, how to replace it with something that works better).
https://phabricator.wikimedia.org/T86530
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
I have filed an RFC about how units for quantities should be localized during
input and rendering: https://phabricator.wikimedia.org/T86528
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Hello wikidata-tech!
tl;dr: we're going to move forward building a query service against
Titan/Cassandra rather than OrientDB or ArangoDB or anything else.
We finally finished up the exploratory phase of building a Wikidata Query
Service we can run on the cluster. As a reminder, these were our goals:
1. Horizontal scalability for handling more queries and large data sets
2. Active user community outside of WMF
The system has to be able to answer questions like "find me all the humans
without a date of death born more than 115 years ago" and "list the 10
biggest cities in Europe with female mayors" or "find me all writers who
are not authors".
We're perfectly happy if it's something we have to hack on, but we don't
want to be the _only_ ones using it. That lands you in lsearchd-like
situations where the open source world moves in a different direction than
you do and then no one knows how to support your software. You become
afraid to restart it, much less release new versions of it. (BTW, Chad
just pulled the last remaining trickle of traffic from it today, party!)
So we identified three good candidates: Titan (backed by Cassandra),
OrientDB and ArangoDB. We prototyped against both Titan and OrientDB and
both worked pretty well. We didn't have time to prototype against ArangoDB
and we also had a communications mixup with upstream. Anyway, when it came
down to it we had two working prototypes so taking time to build a third
felt a bit redundant. You can see many of our notes here
<https://www.mediawiki.org/wiki/Wikibase/Indexing>.
The trouble with two working prototypes is that you can't just flip a coin
to pick one. I guess you could, but instead we made a spreadsheet
<https://docs.google.com/a/wikimedia.org/spreadsheets/d/1MXikljoSUVP77w7JKf9…>.
We rated Titan and OrientDB in 25ish categories, weighted the categories,
added up the results, and picked a winner. The process wasn't perfect, but
it was a thing of beauty to watch four people simultaneously edit the
spreadsheet, leaving comments explaining most of the numbers. Titan
eventually won by a fairly wide margin so we're proceeding with work on it.
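
The scoring method amounts to a weighted sum per candidate. Here is a minimal Python sketch of that arithmetic. The category names, weights and ratings are invented placeholders, not the numbers from the spreadsheet, and the candidates are deliberately left anonymous.

```python
def weighted_total(ratings, weights):
    """Sum of weight * rating over all categories."""
    return sum(weights[category] * ratings[category] for category in weights)


# Placeholder weights and ratings, NOT the values from the real spreadsheet.
weights = {"scalability": 3, "community": 2, "ease_of_setup": 1}
candidates = {
    "candidate_a": {"scalability": 4, "community": 3, "ease_of_setup": 2},
    "candidate_b": {"scalability": 3, "community": 4, "ease_of_setup": 5},
}

totals = {name: weighted_total(r, weights) for name, r in candidates.items()}
winner = max(totals, key=totals.get)

if __name__ == "__main__":
    print(totals, winner)
```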
I expect lots of people will have comments. Please reply here and/or
comment directly on the spreadsheet. Everyone in WMF can leave comments on
the sheet and most interested parties at WMDE have been given rights to do
so. I'm loath to set the document to world comment-able for some reason.
I don't think it's particularly likely we'll end up reworking the
spreadsheet to the point where Titan isn't still the victor, but if we do
let's try to do so quickly so we can stop work on it.
We've started to try and use our workboard
<https://phabricator.wikimedia.org/project/board/891/> to keep track of
work left to do. The next big steps are:
* Draw up an architecture so we know how many and what kind of servers to ask
ops for
* Port what we have from Titan 0.5.0 to Titan 0.9.0-M1 so we're on the
most current development line (also, 0.9.0-M1 supports reverse-i-search,
and who can live without that?)
* Implement incremental updates
* Start prototyping a public query API
So, any questions?
Note: I've also sent this email to WMF-ops but can't add both lists to the
same conversation because ops emails are moderated and any conversation here
would create a moderation nightmare.
Nik
Hello New Freebase :)
I have updated the OpenRefine reconciliation redo task to support
reconciling against Wikidata here:
https://github.com/OpenRefine/OpenRefine/issues/805
However, in our inspection of your API, we could not find a close
approximation of something that the Freebase API had, which was OUTPUT of
Properties for an Entity / Item.
Looking at this query syntax:
https://www.wikidata.org/w/api.php?action=wbsearchentities&search=Valve&lan…
I would ideally like to have a way to output additional metadata or
properties and their values "for each item".
Is there a way to do so? Or is a second request always needed to query
the ids (a pain point)?
Thanks in advance,
Thad
+ThadGuidry <https://www.google.com/+ThadGuidry>
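
For reference, the two-request pattern mentioned above could look roughly like this in Python: one 'wbsearchentities' call to get ids, then one 'wbgetentities' call that fetches labels and claims for all of them at once (ids joined with '|'). This is only a sketch of the workaround, not an answer to whether a single-request option exists. The helper names are made up, and the ids in the usage example are placeholders rather than real search results.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def search_url(term, language="en"):
    """First request: find candidate items by label or alias."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def ids_from_search(response):
    """Extract entity ids from a decoded wbsearchentities JSON response."""
    return [hit["id"] for hit in response.get("search", [])]


def entities_url(ids, props=("labels", "claims"), language="en"):
    """Second request: fetch the chosen properties for all ids in one call."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),
        "props": "|".join(props),
        "languages": language,
        "format": "json",
    }
    return API + "?" + urlencode(params)


if __name__ == "__main__":
    print(search_url("Valve"))
    # Placeholder response; a real one would come from fetching search_url().
    fake_response = {"search": [{"id": "Q1"}, {"id": "Q2"}]}
    print(entities_url(ids_from_search(fake_response)))
```

Batching the ids into a single 'wbgetentities' call keeps the overhead to one extra request per search rather than one per item.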