Thanks to Dan for the great writeup; I've been finding this fantastic primarily because half of my bugs are indices-related and this makes me feel better :D.
Actually, thanks to an overflow problem it's now -2147483648 bugs. We've fixed MediaWiki, let's all go home!
On 3 August 2015 at 13:30, Dan Garry dgarry@wikimedia.org wrote:
In a twist of irony, this issue was actually caused by a patch I wrote to fix an annoying little bug in the app where the namespace of some pages was being set to null when they were saved to the user's storage.
You can see in the changes I made to the persistence helper that I took the column that was the timestamp and used it for the namespace instead. This was my first change to the database layer of the app, and I didn't quite realise the ramifications of doing what I did. Since Dmitry's fix noted that it was silly to ever use column indices rather than looking them up by name, I don't feel too bad about it. ;-)
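To illustrate the class of bug Dmitry's fix addressed, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the app's actual persistence helper: the `saved_pages` table and its columns are hypothetical. Reading a row by position quietly returns the wrong field once the column order changes (here, the timestamp ends up read as the namespace). `sqlite3.Row` allows lookup by column name instead.

```python
import sqlite3

# Hypothetical schema, loosely modelled on a "saved pages" table; not the app's real one.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows support lookup by index and by column name
conn.execute("CREATE TABLE saved_pages (title TEXT, namespace INTEGER, timestamp INTEGER)")
conn.execute("INSERT INTO saved_pages VALUES (?, ?, ?)", ("Main Page", 0, 1436336857594))

def namespace_by_index(row):
    # Fragile: assumes the namespace is always the second column.
    return row[1]

def namespace_by_name(row):
    # Robust: looks the column up by name, so column order doesn't matter.
    return row["namespace"]

row = conn.execute("SELECT title, namespace, timestamp FROM saved_pages").fetchone()
print(namespace_by_index(row), namespace_by_name(row))  # 0 0

# Same data, different column order (e.g. after someone edits the query or schema).
row = conn.execute("SELECT title, timestamp, namespace FROM saved_pages").fetchone()
print(namespace_by_index(row), namespace_by_name(row))  # 1436336857594 0
```

Only the name-based lookup survives the reordering, which is why looking columns up by name is the safer habit.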
99 little bugs in the code, 99 little bugs, take one down, patch it around, 127 little bugs in the code.
Dan
On 2 August 2015 at 17:14, Oliver Keyes okeyes@wikimedia.org wrote:
Hey all,
This Friday, Trey Jones (our awesome Relevance Engineer) and I spent some time playing detective with the sampled request logs and a list of the most common queries resulting in zero results. We found a lot of interesting things. In particular:
1. A common pattern in which queries, for no particular reason, had a UNIX timestamp preceding them (example: "1436336857594:2019 FIFA Women's World Cup"). This alone is responsible for 3% of zero results queries, and it appears to be caused by the Wikimedia Apps.
2. A search for strings in quotes followed by 'film' (example: ""Seventh Son" film"). This is caused by a media player and is responsible for around 0.5% of zero results queries.
3. A search for "quot" strings (example: " quot James Tree quot"). This is from the National Library of Australia and is again around 0.5% of zero results queries.
4. A search for a page title and the name of a page that appears as a link within that page (example: ""2C-T-19" AND "JWH-081""). This is about 6% of zero results queries and appears to come from a German IP address. We don't know who this person is or what they're trying to do, so if anyone knows what on earth this is, we'd appreciate the hint ;).
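The four patterns above are regular enough to flag automatically. Below is a minimal, hypothetical classifier sketch, assuming the queries look exactly like the quoted examples: a 13-digit millisecond timestamp plus a colon, a quoted string followed by `film`, standalone `quot` tokens, and two quoted strings joined by `AND`. The pattern names and regexes are illustrative only; real traffic would almost certainly need looser rules. The shares are the approximate figures reported above, and they add up to the roughly 10% mentioned below.

```python
import re

# Hypothetical regexes inferred only from the example queries in this thread.
PATTERNS = [
    ("app_timestamp_prefix", re.compile(r"^\d{13}:")),
    ("quoted_film", re.compile(r'^".+" film$')),
    ("quot_tokens", re.compile(r"\bquot\b")),
    ("quoted_and_quoted", re.compile(r'^".+" AND ".+"$')),
]

# Approximate share (%) of zero results queries per pattern, as reported above.
ZERO_RESULT_SHARE = {
    "app_timestamp_prefix": 3.0,
    "quoted_film": 0.5,
    "quot_tokens": 0.5,
    "quoted_and_quoted": 6.0,
}

def classify(query):
    """Return the name of the first matching pattern, or None."""
    for name, pattern in PATTERNS:
        if pattern.search(query):
            return name
    return None

def strip_timestamp_prefix(query):
    """Remove a leading 13-digit millisecond timestamp and colon, if present."""
    return re.sub(r"^\d{13}:", "", query)

print(classify("1436336857594:2019 FIFA Women's World Cup"))  # app_timestamp_prefix
print(strip_timestamp_prefix("1436336857594:2019 FIFA Women's World Cup"))  # 2019 FIFA Women's World Cup
print(sum(ZERO_RESULT_SHARE.values()))  # 10.0
```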
https://phabricator.wikimedia.org/T107724 is a card representing the need to reach out to these people, where possible (obviously this will be easier for the app team than anyone else ;p). If we can get all of these solved, we could cut the full-text zero results rate by about 10%. Obviously cutting /all/ of it out is improbable, but we're hopeful that we can drop this number and, to boot, get a better understanding of what third-party users are trying to achieve.
-- Oliver Keyes Count Logula Wikimedia Foundation
Wikimedia-search mailing list Wikimedia-search@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikimedia-search
-- Dan Garry Lead Product Manager, Discovery Wikimedia Foundation