Right now, I think many of us profile locally or in VMs, which can be useful for relative metrics or quickly identifying bottlenecks, but doesn't really get us the kind of information you're talking about from any sort of real-world setting, or in any way that would be consistent from engineer to engineer, or even necessarily from day to day. From network topology to article counts/sizes/etc and everything in between, there's a lot we can't really replicate or accurately profile against. Are there plans to put together and support infrastructure for this? It seems to me that this proposal is contingent upon a consistent environment accessible by engineers for performance testing.
On Thu, Mar 21, 2013 at 10:55 PM, Yuri Astrakhan <yastrakhan@wikimedia.org> wrote:
The API is fairly complex to measure and to set performance targets for. If a bot requests 5000 pages in one call, together with all of their links and categories, it might take a very long time (seconds, if not tens of seconds). Comparing that to another API request that gets an HTML section of a page, which takes a fraction of a second (especially when coming from cache), is not very useful.
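To make the contrast concrete, here's a minimal sketch (assuming Python with the requests library, the public en.wikipedia.org api.php endpoint, and illustrative parameter values) of the two request shapes being compared:

    # A minimal sketch; the endpoint and parameter values are illustrative only.
    import time
    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def timed_get(params):
        """Issue one API request and return (elapsed seconds, response)."""
        start = time.time()
        resp = requests.get(API, params=params)
        return time.time() - start, resp

    # Heavy, bot-style query: a batch of pages with their links and categories.
    heavy = {"action": "query", "format": "json", "generator": "allpages",
             "gaplimit": "max", "prop": "links|categories"}

    # Light request: the parsed HTML of a single section of one page.
    light = {"action": "parse", "format": "json", "page": "Wikipedia",
             "prop": "text", "section": "0"}

    for name, params in (("heavy", heavy), ("light", light)):
        elapsed, _ = timed_get(params)
        print(f"{name}: {elapsed:.3f}s")

A single latency number across both shapes wouldn't tell you much, which is why per-module or per-request-class targets seem necessary.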
On Fri, Mar 22, 2013 at 1:32 AM, Peter Gehres <lists@pgehres.com> wrote:
From where would you propose measuring these data points? Obviously network latency will have a great impact on some of the metrics, and a consistent location would help to define the pass/fail of each test. I do think another benchmark Ops could provide would be a set of latency-to-datacenter values, but I know that is a much harder task.
Thanks for putting this together.
On Thu, Mar 21, 2013 at 6:40 PM, Asher Feldman <afeldman@wikimedia.org> wrote:
I'd like to push for a codified set of minimum performance standards that new MediaWiki features must meet before they can be deployed to larger Wikimedia sites such as English Wikipedia, or be considered complete.
These would look like (numbers pulled out of a hat, not actual suggestions; a rough sketch of how such targets might be checked follows the list):
- p999 (long tail) full page request latency of 2000ms
- p99 page request latency of 800ms
- p90 page request latency of 150ms
- p99 banner request latency of 80ms
- p90 banner request latency of 40ms
- p99 db query latency of 250ms
- p90 db query latency of 50ms
- 1000 write requests/sec (if applicable; write operations must be free from concurrency issues)
- guidelines about degrading gracefully
- specific limits on total resource consumption across the stack per request
- etc.
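As a rough sketch of what checking such targets could look like, assuming latency samples in milliseconds and the hat-numbers above as thresholds (the simulated samples stand in for real measurements):

    import random

    # Hypothetical targets for full page requests, mirroring the hat-numbers
    # above: (label, quantile, limit in ms).
    TARGETS = [("p90", 0.90, 150), ("p99", 0.99, 800), ("p999", 0.999, 2000)]

    def percentile(samples, q):
        """Approximate nearest-rank percentile of a list of latency samples."""
        ordered = sorted(samples)
        idx = min(len(ordered) - 1, int(q * len(ordered)))
        return ordered[idx]

    # Stand-in for real measurements: 10,000 simulated request latencies in ms.
    latencies_ms = [random.lognormvariate(4.5, 0.6) for _ in range(10000)]

    for label, q, limit in TARGETS:
        value = percentile(latencies_ms, q)
        status = "OK" if value <= limit else "FAIL"
        print(f"{label}: {value:.0f}ms (target {limit}ms) -> {status}")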
Right now, varying amounts of effort are made to highlight potential performance bottlenecks in code review, and engineers are encouraged to profile and optimize their own code. But beyond "is the site still up for everyone / are users complaining on the village pump / am I ranting in irc", we've offered no guidelines as to what sort of request latency is reasonable or acceptable. If a new feature (like aftv5, or flow) turns out not to meet perf standards after deployment, that would be a high priority bug, and the feature might be disabled depending on the impact or if it isn't addressed in a reasonable time frame. Obviously standards like this can't be applied to certain existing parts of MediaWiki, but systems other than the parser or preprocessor that don't meet new standards should at least be prioritized for improvement.
Thoughts?
Asher
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l