Hi!
As I am working on improving Wikidata fulltext search[1], I'd like to
talk about the search results page. Right now the search results page
for Wikidata is less than ideal. Here are the issues I see with it:
- No match highlighting
- Meaningless data, like word count (anybody care to guess what it is
counting? Has anybody ever used it?) and byte count (more useful than
word count, but not by much)
- Obviously, search quality is not super high, but that should be
improved with proper description indexing
While working on improving the situation, I would like to solicit
opinions on a set of questions about how the search results page
should look. Namely:
1. If the match is made on a label/description in a language other than
the current display language, we could opt for:
a) Displaying the description that matched, highlighted. Optionally,
also display the language of the match (in the display language?)
b) Displaying the description in the display language, un-highlighted.
Which option is preferable?
2. What do we do if the match is on an alias? Do we display the matching
alias, the original label, or both? The question above also applies if
the match is on an alias in another language.
3. It looks clear to me that word count is useless. Is byte count
useful, and does it need to be kept?
4. Do we want to display any other parameters of the entity? E.g. we
have in the index: statement_count, sitelink_count, label_count,
incoming_links, etc. Do we want to display any?
5. Display format for Wikidata and for Wikipedia sites is different:
Wikipedia:
Title
Snippet
Wikidata:
Title: Description
I.e. Wikipedia puts the title on a separate line, while Wikidata keeps
it on the same line, separated by a colon. Is there any reason for this
difference? Do we want to go back to the common format?
Also, if you have any other ideas or comments about what fulltext
search output for Wikidata should look like, please tell me.
I am sending this only to the wikidata-tech and discovery team lists for
now, since it's still a work in progress and half-baked. We could open
this for wider discussion later if necessary.
[1] https://phabricator.wikimedia.org/T178851
Thanks,
--
Stas Malyshev
smalyshev(a)wikimedia.org
Hi there,
I'm currently working on a .NET implementation of a Wikibase API client,
and have some (perhaps somewhat trivial) questions.
Take the API response for wbgetentities on Q513
<https://www.wikidata.org/wiki/Special:ApiSandbox#action=wbgetentities&format=json&uselang=en&ids=Q513&redirects=yes&props=claims&normalize=1>
for example. In the response, we have a claim
.
"P625": [
  {
    "mainsnak": {
      "snaktype": "value",
      "property": "P625",
      "hash": "72a03bd4ecbba1e7d0e0dfc2779ceccf9b2e0bcb",
      "datavalue": {
        "value": {
          "latitude": 27.988055555556,
          "longitude": 86.925277777778,
          "altitude": null,
          "precision": 0.00027777777777778,
          "globe": "http://www.wikidata.org/entity/Q2"
        },
        "type": "globecoordinate"
      },
      "datatype": "globe-coordinate"
    },
    "type": "statement",
    "id": "q513$6dcddd25-48a6-229a-0e36-31b122c2c813",
    "rank": "normal",
    "references": [
      {
.
1. Is the "altitude" attribute in the "globe-coordinate" property type
still in use? I see this attribute in the API responses, but as far as I
can observe, it is always null. Is this attribute reserved for future
use, or obsolete? (I've opened T177269
<https://phabricator.wikimedia.org/T177269> but later thought that maybe
the mailing list is more appropriate for this.)
2. Currently the "type" (in the "datavalue" node) and "datatype"
attributes seem somewhat redundant. Are you planning to make it possible
for one "datatype" to support some other "type"s, to some extent similar
to polymorphism?
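
To make the two questions concrete, here is a minimal sketch in Python (not the .NET client itself) of how a client might read the snak above. It assumes only the fields visible in the quoted response; the names `GlobeCoordinate` and `parse_snak` are hypothetical. The value parser is keyed on the datavalue "type", while "datatype" is carried along separately, and "altitude" is treated as optional since it is observed to be null.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class GlobeCoordinate:
    latitude: float
    longitude: float
    altitude: Optional[float]   # observed as null in the response above
    precision: Optional[float]
    globe: str


def parse_snak(snak: dict) -> Tuple[Optional[str], Any]:
    """Return (datatype, value); the value parser is chosen by datavalue 'type'."""
    datatype = snak.get("datatype")
    if snak.get("snaktype") != "value":
        # "somevalue" / "novalue" snaks carry no datavalue node.
        return datatype, None
    datavalue = snak["datavalue"]
    if datavalue["type"] == "globecoordinate":
        v = datavalue["value"]
        return datatype, GlobeCoordinate(
            latitude=v["latitude"],
            longitude=v["longitude"],
            altitude=v.get("altitude"),
            precision=v.get("precision"),
            globe=v["globe"],
        )
    # Other value types are passed through unparsed in this sketch.
    return datatype, datavalue["value"]


# The mainsnak from the response quoted above.
SAMPLE_SNAK = json.loads("""
{
  "snaktype": "value",
  "property": "P625",
  "hash": "72a03bd4ecbba1e7d0e0dfc2779ceccf9b2e0bcb",
  "datavalue": {
    "value": {
      "latitude": 27.988055555556,
      "longitude": 86.925277777778,
      "altitude": null,
      "precision": 0.00027777777777778,
      "globe": "http://www.wikidata.org/entity/Q2"
    },
    "type": "globecoordinate"
  },
  "datatype": "globe-coordinate"
}
""")

datatype, coordinate = parse_snak(SAMPLE_SNAK)
```

With this split, a second "datatype" that reuses an existing value "type" would need no new parser, which is the kind of reuse question 2 asks about.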
Regards,
Xinyan
Just a reminder,
We are starting to phase out the wb_entity_per_page table. It will stop
being updated at the end of next week and will be dropped two or three
weeks from now. The wb_terms table now has a term_full_entity_id column
instead, which is not replicated to Labs yet [0] but will be very soon.
[0]: https://phabricator.wikimedia.org/T167114
Best
>
> From: Magnus Manske <magnusmanske(a)googlemail.com>
> Date: Thu, Jun 1, 2017 at 5:21 PM
> Subject: Re: [Wikidata-tech] Breaking change: "wb_entity_per_page" table
> will not be updated and replicated on ToolLabs anymore
> To: Daniel Kinzler <daniel.kinzler(a)wikimedia.de>, Wikidata technical
> discussion <wikidata-tech(a)lists.wikimedia.org>
>
>
> The original code predated SPARQL, so I have to change it anyway. The
> example I gave is small enough for SPARQL, but others will not be.
>
> On Thu, Jun 1, 2017 at 4:11 PM Daniel Kinzler <daniel.kinzler(a)wikimedia.de>
> wrote:
>
>> Am 01.06.2017 um 16:59 schrieb Magnus Manske:
>> > As an example from my BEACON tool, I want all properties that have a
>> > formatter property, with English label. That SQL is now:
>> >
>> > SELECT DISTINCT page_title,term_text FROM pagelinks,page,wb_terms WHERE
>> > page_namespace=120 AND substr(page_title,2)=term_entity_id and
>> > term_entity_type='property' and term_language='en' and
>> > term_type='label' and pl_from=page_id and pl_title='P1630' and
>> > pl_namespace=120 and pl_from_namespace=120 ORDER BY term_text
>> >
>> > Note the "substr". My first attempt was
>> > "page_title=concat('Q',term_entity_id)", but that took forever.
>> >
>> > If we indeed get a full entity ID=page title column for wb_terms, and
>> > for wb_items_per_site etc., that would at least fix the on-the-fly
>> > compute. I shall thus wait with code updates until I get the full
>> > story, and not just piece-by-piece...
>>
>> There is currently no plan to put the full ID into wb_items_per_site or
>> wb_property_info, because these tables are bound to a specific entity
>> type. Whether we want to do this would be a whole new discussion.
>>
>> For what you are doing there, it's probably a lot easier to use the query
>> service. SPARQL:
>>
>> SELECT DISTINCT ?property ?propertyLabel
>> WHERE {
>>   ?property a wikibase:Property .
>>   ?property wdt:P1630 ?format .
>>   SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
>> }
>>
>> --
>> Daniel Kinzler
>> Principal Platform Engineer
>>
>> Wikimedia Deutschland
>> Gesellschaft zur Förderung Freien Wissens e.V.
>>
> _______________________________________________
> Wikidata-tech mailing list
> Wikidata-tech(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
>
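
For tools moving from the SQL to the SPARQL query quoted above, a rough Python sketch of the client side might look like the following. It assumes the public query service endpoint at https://query.wikidata.org/sparql accepts `query` and `format=json` parameters and returns the standard SPARQL JSON results layout. The helper names are hypothetical, and the sample response is hand-written for illustration, not real query output.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the public Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """SELECT DISTINCT ?property ?propertyLabel
WHERE {
  ?property a wikibase:Property .
  ?property wdt:P1630 ?format .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}"""


def build_request_url(query: str, endpoint: str = WDQS_ENDPOINT) -> str:
    """URL-encode the query into a GET request URL asking for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


def extract_rows(response_text: str) -> list:
    """Turn SPARQL JSON results into (property ID, label) pairs sorted by label."""
    data = json.loads(response_text)
    rows = []
    for binding in data["results"]["bindings"]:
        uri = binding["property"]["value"]
        label = binding.get("propertyLabel", {}).get("value", "")
        # The entity URI ends in the property ID, e.g. ".../entity/P214".
        rows.append((uri.rsplit("/", 1)[-1], label))
    return sorted(rows, key=lambda row: row[1])


# Hand-written sample in the SPARQL JSON results format (illustrative only).
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["property", "propertyLabel"]},
    "results": {"bindings": [
        {"property": {"type": "uri", "value": "http://www.wikidata.org/entity/P214"},
         "propertyLabel": {"type": "literal", "xml:lang": "en", "value": "VIAF ID"}},
        {"property": {"type": "uri", "value": "http://www.wikidata.org/entity/P213"},
         "propertyLabel": {"type": "literal", "xml:lang": "en", "value": "ISNI"}},
    ]},
})

url = build_request_url(QUERY)
rows = extract_rows(SAMPLE_RESPONSE)
```

Sorting on the client stands in for the ORDER BY term_text of the SQL version; adding ORDER BY ?propertyLabel to the query itself would be the other option.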
On Oct 3, 2017 5:17 PM, "Stas Malyshev" <smalyshev(a)wikimedia.org> wrote:
Hi!
On 10/3/17 4:49 PM, Marco Neumann wrote:
> thank you Lucas and Stas, this works for me.
>
> so it would be fair to say that p:P39 by-passes the semantics of
> wdt:P39 with ranking*. for my own understanding why is a wdt property
> called a direct property**?
Because wdt: links directly to the value, while p: links to a statement
(where ps: links to the value). But that's not the only property of
wdt:. Another is that it links to the "truthy" (current, best, etc.)
value, i.e. the one that has the best rank for this property (hence the
letter "t"). This may or may not be what you want, depending on general
semantics and your particular case. For many properties, ranks do not
play a significant role, since these properties do not change with time
and do not have temporally limited statements. So for these, using wdt:
is always ok. For some, like positions, offices, relationships between
humans, etc., the values can have temporal limits, and if you want the
best/current one, you use wdt:, otherwise you use p:/ps:. If you still
want to account for rank using p:/ps:, there are rank triples and the
wikibase:BestRank class (see
https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Statement_representation).
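
As a rough illustration of the difference, here is a small Python model of the selection (a sketch, not the actual RDF export code). It assumes the usual best-rank rule: preferred statements win if there are any, otherwise normal ones are used, and deprecated statements are never truthy. The `Statement` class, the function names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Statement:
    value: str
    rank: str  # "preferred", "normal" or "deprecated"


def best_rank_statements(statements: List[Statement]) -> List[Statement]:
    """Statements that would carry best rank: preferred if any, else normal."""
    preferred = [s for s in statements if s.rank == "preferred"]
    if preferred:
        return preferred
    return [s for s in statements if s.rank == "normal"]


def truthy_values(statements: List[Statement]) -> List[str]:
    """Roughly what wdt: exposes: values of best-rank statements only."""
    return [s.value for s in best_rank_statements(statements)]


def all_statement_values(statements: List[Statement]) -> List[str]:
    """Roughly what p:/ps: exposes: every statement's value, whatever its rank."""
    return [s.value for s in statements]


# Hypothetical "position held" statements for one person.
positions = [
    Statement("former office", "normal"),
    Statement("current office", "preferred"),
    Statement("disputed office", "deprecated"),
]

current = truthy_values(positions)
everything = all_statement_values(positions)
```

Here `current` holds only the preferred value, while `everything` keeps all three, which is why a query over p:/ps: needs its own rank filter if rank matters.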
--
Stas Malyshev
smalyshev(a)wikimedia.org