I got 99 problems, but a bug ain't one.


On Mon, Aug 3, 2015 at 1:30 PM, Dan Garry <dgarry@wikimedia.org> wrote:
In a twist of irony, this issue was actually caused by a patch I wrote to fix an annoying little bug in the app where the namespace of some pages was being set to null when they were saved to the user's storage.

You can see in the changes I made to the persistence helper that I took the column that was the timestamp and used it for the namespace instead. This was my first change to the database layer of the app, and I didn't quite realise the ramifications of doing what I did. Since Dmitry's fix noted that it was silly to ever use column indices rather than looking them up by name, I don't feel too bad about it. ;-)
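
To illustrate the index-versus-name point, here is a minimal sketch using Python's built-in sqlite3 module. It is not the app's actual persistence code: the table name `saved_pages`, its columns, and both helper functions are hypothetical, chosen only to show how a positional read silently returns the wrong column once the column order changes, while a lookup by name keeps working.

```python
import sqlite3

# Hypothetical schema, for illustration only (not the app's real table).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets rows be read by column name as well as by index
conn.execute(
    "CREATE TABLE saved_pages (title TEXT, namespace TEXT, timestamp INTEGER)"
)
conn.execute(
    "INSERT INTO saved_pages VALUES (?, ?, ?)",
    ("2019 FIFA Women's World Cup", "Talk", 1436336857594),
)


def read_timestamp_by_index(row):
    """Fragile: assumes the timestamp is always the third column."""
    return row[2]


def read_timestamp_by_name(row):
    """Robust: looks the column up by name, whatever its position."""
    return row["timestamp"]


# Same data, two different column orders.
row_original = conn.execute(
    "SELECT title, namespace, timestamp FROM saved_pages"
).fetchone()
row_reordered = conn.execute(
    "SELECT title, timestamp, namespace FROM saved_pages"
).fetchone()

# The name-based read is unaffected by the reordering; the index-based
# read returns the namespace for the reordered row instead.
print(read_timestamp_by_name(row_original), read_timestamp_by_name(row_reordered))
print(read_timestamp_by_index(row_original), read_timestamp_by_index(row_reordered))
```

The same failure mode works in reverse: write a value to the wrong position and a timestamp can end up where a namespace was expected.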

99 little bugs in the code, 99 little bugs, take one down, patch it around, 127 little bugs in the code.

Dan

On 2 August 2015 at 17:14, Oliver Keyes <okeyes@wikimedia.org> wrote:
Hey all,

This Friday, Trey Jones (our awesome Relevance Engineer) and I spent
some time playing detective with the sampled request logs and a list
of the most common queries resulting in zero results. We found a lot
of interesting things. In particular:

1. A common pattern in which queries, for no particular reason, had a
UNIX timestamp preceding them (example: "1436336857594:2019 FIFA
Women's World Cup"). This is responsible, on its own, for 3% of zero
results queries - and it appears to be caused by the Wikimedia Apps.
2. A search for strings in quotes followed by 'film' (example:
"\"Seventh Son\" film"). This is caused by a media player and is
responsible for around 0.5% of zero results queries.
3. A search for "quot" strings (example: " quot James Tree quot").
This is from the National Library of Australia and is again around
0.5% of zero results queries.
4. A search for a page title and the name of a page that appears as a
link within that page (example: "\"2C-T-19\" AND \"JWH-081\""). This
is about 6% of queries and appears to come from a German IP address.
We're unaware of who this person is or what they're trying, so if
anyone knows what on earth this is, we'd appreciate the hint ;).
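
For anyone who wants to look for these patterns in their own logs, the four cases above can be roughly sketched as regular expressions. This is an illustrative guess based only on the example queries quoted in this email, not the method we actually used; the pattern names, the `classify` and `strip_timestamp_prefix` helpers, and the assumption that the timestamp prefix is 10 to 13 digits followed by a colon are all hypothetical.

```python
import re

# Heuristic patterns inferred from the example queries only (assumptions).
TIMESTAMP_PREFIX = re.compile(r"^(\d{10,13}):(.*)$")   # e.g. 1436336857594:Title
QUOTED_FILM = re.compile(r'^"[^"]+" film$')             # e.g. "Seventh Son" film
TITLE_AND_LINK = re.compile(r'^"[^"]+" AND "[^"]+"$')   # e.g. "2C-T-19" AND "JWH-081"
QUOT_WORD = re.compile(r"\bquot\b")                     # e.g.  quot James Tree quot


def classify(query):
    """Return a label for the first pattern the query matches, else 'other'."""
    if TIMESTAMP_PREFIX.match(query):
        return "timestamp_prefix"
    if QUOTED_FILM.match(query):
        return "quoted_film"
    if TITLE_AND_LINK.match(query):
        return "title_and_link"
    if QUOT_WORD.search(query):
        return "quot_strings"
    return "other"


def strip_timestamp_prefix(query):
    """Drop a leading 'digits:' prefix if present; otherwise return the query unchanged."""
    match = TIMESTAMP_PREFIX.match(query)
    return match.group(2) if match else query


examples = [
    "1436336857594:2019 FIFA Women's World Cup",
    '"Seventh Son" film',
    " quot James Tree quot",
    '"2C-T-19" AND "JWH-081"',
    "Barack Obama",
]
for q in examples:
    print(repr(q), "->", classify(q))
```

Patterns this loose will produce false positives on a real log (a legitimate query can contain the word "quot", for instance), so treat the labels as a starting point for manual review rather than a count.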

https://phabricator.wikimedia.org/T107724 is a card representing the
need to reach out to these people, where possible (obviously this will
be easier for the app team than anyone else ;p). If we can get all of
these resolved, we could drop the zero results rate for full text by
about 10%. Obviously cutting /all/ of it out is improbable, but we're
hopeful that we can drop this number and get a better understanding of
what third-party users are trying to achieve, to boot.

--
Oliver Keyes
Count Logula
Wikimedia Foundation

_______________________________________________
Wikimedia-search mailing list
Wikimedia-search@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimedia-search



--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation
