Hi Stas,

While you are at it, a few things would be very useful to have searchable (some may already be by now):
* "primary" (not references/qualifiers) years, for birth/death/floruit etc.
* "primary" string/monolingual values (title, taxon name, etc.)
* "primary" IDs, e.g. VIAF (might cause confusion with years, so maybe only add numerical IDs if 5+ digits?)
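
To make the "5+ digits" idea concrete, here is a rough sketch of the filter I have in mind. The function name and the threshold constant are made up for illustration; nothing like this exists in the indexing code as far as I know.

```python
import re

# Assumption: threshold taken from the "5+ digits" suggestion; tune as needed.
MIN_ID_DIGITS = 5


def is_searchable_id(value: str) -> bool:
    """Decide whether an external ID value should be indexed for search.

    Purely numeric IDs shorter than MIN_ID_DIGITS are skipped, since they
    could be confused with years. Non-numeric IDs are always kept.
    """
    value = value.strip()
    if not value:
        return False
    if re.fullmatch(r"[0-9]+", value):
        return len(value) >= MIN_ID_DIGITS
    return True
```

With this, a four-digit ID like "1952" would be left out, while a long numeric ID or an ID containing letters would be indexed.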

Cheers,
Magnus

On Wed, Oct 25, 2017 at 1:50 AM Stas Malyshev <smalyshev@wikimedia.org> wrote:
Hi!

As I am working on improving Wikidata fulltext search[1], I'd like to
talk about the search results page. Right now the search results page
for Wikidata is less than ideal. Here are the issues I see with it:

- No match highlighting
- Meaningless data, like word count (does anybody care to guess what it
is counting? Has anybody ever used it?) and byte count (more useful than
word count, but not by much)
- Obviously, search quality is not super high, but that should be
improved with proper description indexing

While working on improving the situation, I would like to solicit
opinions on a set of questions about how the search results page
should look. Namely:

1. If the match is made on a label/description that does not match the
current display language, we could opt for:
a) Displaying the description that matched, highlighted. Optionally
maybe display the language of the match (in display language?)
b) Displaying the description in display language, un-highlighted.
Which option is preferable?

2. What do we do if the match is on an alias? Do we display the matching
alias, the original label, or both? The question above also applies if
the match is on an alias in another language.

3. It looks clear to me that word count is useless. Is byte count
useful and does it need to be kept?

4. Do we want to display any other parameters of the entity? E.g. we
have in the index: statement_count, sitelink_count, label_count,
incoming_links, etc. Do we want to display any?

5. Display format for Wikidata and for Wikipedia sites is different:
Wikipedia:

Title
Snippet

Wikidata:

Title: Description

I.e. Wikipedia puts the title on a separate line, while Wikidata keeps
it on the same line, separated by a colon. Is there any reason for this
difference? Do we want to go back to the common format?

Also, if you have any other ideas or comments about how fulltext
search output for Wikidata should look, please tell me.

I am sending this to the wikidata-tech and discovery team lists only for
now, since it's still a work in progress and half-baked. We could open
this up for wider discussion later if necessary.

[1] https://phabricator.wikimedia.org/T178851

Thanks,
--
Stas Malyshev
smalyshev@wikimedia.org

_______________________________________________
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech