The duration log shows
I think you're focusing too much on the duration log, which isn't graphed yet. Implementing graphs for that data has been repeatedly postponed in our cycle planning because it's been considered lower priority than the rest of the work. We can deal with challenges specific to that data whenever it gets picked up.
We also want to display a global average loading time, which is an average of all the logged loading times (which, per above, use different sampling).
We might even want to display per-country loading times, which would be an even more arbitrary mix of data from different wikis.
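To make that concern concrete, here's a minimal sketch, with entirely hypothetical wiki names, numbers and sampling rates, of why a naive mean over a log with mixed sampling is biased, and how weighting each event by the inverse of its sampling rate would correct for it:

```python
# Hypothetical events: a heavily sampled wiki contributes many log rows
# even if it has far less real traffic than a lightly sampled one.
logged_events = [
    # (wiki, load_time_ms, sampling_rate)
    ("wiki_a", 800, 1 / 10),    # wiki_a logs 1 in 10 page loads
    ("wiki_a", 900, 1 / 10),
    ("wiki_a", 850, 1 / 10),
    ("wiki_b", 2000, 1 / 1000), # wiki_b logs 1 in 1000 page loads
]

# Naive mean: dominated by wiki_a simply because it logs more rows.
naive_mean = sum(t for _, t, _ in logged_events) / len(logged_events)

# Weighted mean: each logged event stands in for 1/rate real page loads.
total_weight = sum(1 / rate for _, _, rate in logged_events)
weighted_mean = sum(t / rate for _, t, rate in logged_events) / total_weight

print(naive_mean)     # pulled toward wiki_a's times
print(weighted_mean)  # pulled toward wiki_b, which has far more real traffic
```

The naive mean here lands near wiki_a's times even though wiki_b represents vastly more real page loads, which is the "random mix" problem in a nutshell.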
Having every graph and metric possible isn't necessarily a useful goal. Specific graphs are only worth having if they provide actionable conclusions that can't be found by looking at other graphs. For example, not being able to generate global graphs isn't that big a deal if we can draw the same conclusions they would provide by looking at the graphs of very large wikis. An entertaining graph isn't necessarily useful.
At this point the action log is the only one likely to have mixed sampling, but we only use that one for totals, not averages/percentiles. The only metrics we're displaying averages and percentiles for have consistent sampling across all wikis. Even the duration log has consistent sampling at the moment, and it's so similar to the other sampled metrics we currently have that I don't foresee the need to introduce mixed sampling.
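This is why totals are safe under mixed sampling while averages aren't: each logged event can simply be extrapolated by the inverse of its sampling rate, so per-wiki rate differences don't distort the sum. A tiny sketch with hypothetical rates and counts:

```python
# Hypothetical action-log rows: the estimated total is the sum of
# 1/rate over logged events, regardless of per-wiki sampling rates.
action_log = [
    # (wiki, sampling_rate)
    ("wiki_a", 1 / 10),   # three events logged at 1:10 sampling
    ("wiki_a", 1 / 10),
    ("wiki_a", 1 / 10),
    ("wiki_b", 1 / 100),  # one event logged at 1:100 sampling
]

estimated_total = sum(1 / rate for _, rate in action_log)
print(estimated_total)  # 3*10 + 1*100 = 130 estimated real actions
```

An average, by contrast, only comes out right if every contributing event shares the same rate (or is explicitly reweighted), which matches the rule of only using the action log for totals.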
As for adapting the consistent sampling we currently apply to our sampled logs in order to improve the accuracy of metrics for small countries or small wikis where the sample size is too small: is that really useful? Are we likely to find that increasing the accuracy of a specific metric in a given African country will tell us something we don't already know? There's plenty of useful data on metrics with decent sample sizes; I think trying to increase the sample size of every small metric for every small country is a little futile.
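For a sense of the cost involved, a rough sketch (hypothetical spread, normal approximation): the margin of error of an estimated mean shrinks only as 1/sqrt(n), so halving the uncertainty for a small country means quadrupling its sample size.

```python
import math

def margin_of_error(stddev, n):
    # Half-width of an approximate 95% confidence interval for a mean,
    # using the normal approximation (1.96 standard errors).
    return 1.96 * stddev / math.sqrt(n)

stddev = 500  # ms, a hypothetical spread of load times
for n in (25, 100, 400):
    print(n, margin_of_error(stddev, n))  # 196.0, 98.0, 49.0
```

Each halving of the error bar costs four times the sampled traffic, which is why chasing precision on tiny populations gets expensive fast.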