Maybe this isn't the best place to talk about it, but...
I'd like to categorize some phab tasks so that I can access them quickly
in the future. At first I thought that tags would be a perfect fit by
creating my own custom tags. But as far as I understand, tags are
projects, and I'm not allowed to create them.
I suppose that if this feature is protected behind permissions, it's
because phab admins do not want someone to pollute the system with
unneeded projects.
My use case is:
Sometimes users report queries that are not performing very well.
Usually by reading the query I can identify and classify the cause. This
cause can be something like:
- bad weighting of words in the title
- text analysis issue
- index/db discrepancies
This list is quite vague...
While it's not worth fixing a particular issue that mentions a specific
query, it's sometimes helpful to retrieve such tickets (where I
sometimes added a comment) while I'm working on this class of problems:
- just to have more examples to test
- maybe I was wrong with the initial classification and the problem is
actually something else
Retrieving such tickets is painful today because I have to rely on
search. Not to blame the phab developers; search is hard, as we all know :)
Today I used the parent/child relationship, e.g.
https://phabricator.wikimedia.org/T128073, but I don't think it's the
proper approach because when I classify tickets I don't necessarily have
a parent task ready.
Thanks for your suggestions.
Is there a particular term for search engine sidebars of Wikipedia content?
For example, do we call them "search engine previews" or "Wikipedia
sidebars on search pages"? I imagine that Google and Microsoft have certain
terminology, and I'd like to be consistent when I'm referring to them in
the LearnWiki videos, provided that the term is something that the average
user would understand.
Mikhail has written up and should soon release his report on our recent
TextCat A/B tests; the results look good, and language identification and
cross-wiki searching definitely improve the results (in terms of results
shown and results clicked) for otherwise poorly performing queries (those
that get fewer than 3 results).
Mikhail's report also suggests looking at some measure of confidence for
the language identification to see if that has any effect on the quality
(in terms of number of results, but more importantly clicks) of the
crosswiki (also "interwiki") results. This sounds like a good idea, but
TextCat doesn't make it super easy to do. I have some ideas, though, and I
would love suggestions from anyone else.
The details are kind of technical, so if that kind of thing makes your eyes
glaze over, you should avert your gaze now.
Otherwise, check out my write-up on TextCat and confidence
and share your ideas here, or on the talk page.
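To make the idea concrete, here's a minimal sketch (in Python, and very much
not our actual code) of one possible confidence measure: take a TextCat-style
out-of-place distance for each candidate language (lower is a better match)
and use the relative margin between the best and second-best scores. All the
names and numbers below are hypothetical.

    # One possible confidence measure for TextCat-style language
    # identification: the relative margin between the two best-scoring
    # candidate languages. Illustrative only.
    def identify_with_confidence(scores):
        """scores: dict of language code -> distance (lower is better)."""
        ranked = sorted(scores.items(), key=lambda kv: kv[1])
        best_lang, best = ranked[0]
        if len(ranked) < 2:
            return best_lang, 1.0  # single candidate: trivially confident
        second = ranked[1][1]
        margin = (second - best) / second if second > 0 else 0.0
        return best_lang, margin  # margin in [0, 1]; larger = more confident

    # Hypothetical distances for a short query:
    lang, margin = identify_with_confidence({"en": 1200.0, "fr": 1350.0, "de": 1900.0})
    print(lang, round(margin, 3))  # en 0.111 -- a narrow margin, i.e. low confidence

A threshold on a margin like that could then gate whether we run the
cross-wiki search at all; where to set it is exactly the kind of thing the
A/B data could help decide.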
Software Engineer, Discovery
While looking at the elasticsearch dashboard on Grafana, I see that
we have weekly spikes in response times from codfw. My guess is that
this is related to the weekly update of page rank.
We see fairly large spikes on the overall 95%-ile for codfw (from a
usual ~300[ms] to ~1-1.5[s]). Those spikes are more visible on codfw
than on eqiad as we have less overall traffic on codfw compared to
eqiad. This makes indexing load more visible relative to reads. So far, no
problem: the graphs look bad, but this can be explained and does not
show user impact.
We also see weekly spikes on the 75%-ile of more-like queries (from a
usual ~200-300[ms] to 300-400[ms]). More-like queries are the only
queries sent to codfw. This is not yet worrisome, but it is probably
something we should keep an eye on and improve before it starts to be
a real problem.
I have mostly no idea how those page rank updates work. Would it be
possible to throttle the index updates from those jobs? Or to increase
the frequency of those updates so that each run has less impact?
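For what it's worth, here's a minimal sketch of what client-side throttling
could look like, assuming the updater pushes documents in bulk batches.
send_bulk() and the batch/delay parameters are hypothetical stand-ins, not
the actual job's API; the point is just to spread the write load over time
instead of bursting it once a week.

    import time

    # Illustrative only: send docs in fixed-size batches, sleeping between
    # batches to cap the sustained indexing throughput.
    def throttled_bulk_update(docs, send_bulk, batch_size=500, delay_s=1.0):
        for i in range(0, len(docs), batch_size):
            send_bulk(docs[i:i + batch_size])  # e.g. one bulk indexing request
            time.sleep(delay_s)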
Operations Engineer, Discovery