I'm also more worried about memory consumption than speed.
By far the biggest performance issue regarding speed is the fact that we load
entire entities just to look up a single label. This has been known for a while.
But with the new data model, we no longer do deferred unstubbing. Everything is
unserialized right away, always, even if in the end, all we need is a single
label of the entity. That's especially bad if there are a lot of referenced
entities, of course.
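
To make the contrast concrete, here is a rough sketch; the accessor chain and
the JSON layout are just illustrative, not our actual code. A label lookup only
needs a single array access on the decoded JSON, while today we build the whole
object graph first:

<?php
// Illustration only: what a label lookup needs vs. what we currently pay for.

// Roughly what happens now: the whole object graph gets built first.
function getLabelViaFullDeserialization( $json, $languageCode, $entityDeserializer ) {
	$entity = $entityDeserializer->deserialize( json_decode( $json, true ) ); // expensive
	return $entity->getFingerprint()->getLabel( $languageCode ); // hypothetical accessor chain
}

// What the lookup actually needs: one array access on the decoded data.
function getLabelFromArray( $json, $languageCode ) {
	$data = json_decode( $json, true );
	return isset( $data['labels'][$languageCode]['value'] )
		? $data['labels'][$languageCode]['value']
		: null;
}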
On top of that, PHP seems to "sometimes" get confused when memory is running
low. This seems "somehow" connected to ArrayObject. These effects are hard to
reproduce, though; we are not sure what exactly is going on.
In any case, we should try to be less wasteful with memory. Having a stub
implementation for StatementList would already help a lot. I'll be working on
removing the need to load so many entities in the first place (we already had
TermsLookup in the sprint, but didn't get around to working on it - partially
due to the problems on the live site).
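
To sketch what I mean by a stub (class and method names are made up, this is
just to illustrate the idea): hold on to the raw statements serialization and
only run the deserializer when somebody actually asks for the statements.

<?php
// Rough sketch of a statements stub; names are made up for illustration.
class LazyStatementListStub {

	private $serialization; // raw array slice from json_decode
	private $deserializer;  // callable: array -> StatementList
	private $statementList = null;

	public function __construct( array $serialization, $deserializer ) {
		$this->serialization = $serialization;
		$this->deserializer = $deserializer;
	}

	public function getStatementList() {
		if ( $this->statementList === null ) {
			// Unstub on first access only.
			$this->statementList = call_user_func( $this->deserializer, $this->serialization );
			$this->serialization = null; // free the raw data
		}

		return $this->statementList;
	}

	public function isUnstubbed() {
		return $this->statementList !== null;
	}
}

As long as nothing touches the statements, the entity would only carry the
decoded array around, which is much cheaper than the full object graph.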
On 08.10.2014 09:43, Markus Krötzsch wrote:
Btw, when doing such performance measurements, it would be great to get some
memory statistics from PHP as well. From my past as an SMW developer, I
remember seeing
incredible memory footprints of apparently simple PHP objects. OoM would be one
of the most common causes for blank pages, much more common than timeouts, and
even a single object in PHP can take up huge amounts of memory.
Markus
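
For reference, gathering such numbers is cheap in PHP; a minimal sketch, with
the profiled call kept as a placeholder, could look like this:

<?php
// Minimal sketch: report the memory delta and peak around some call.
// $callable is a placeholder, e.g. a closure wrapping the entity deserializer.
function measureMemory( $label, $callable ) {
	$before = memory_get_usage();
	$result = call_user_func( $callable );
	$after = memory_get_usage();

	printf( "%s: delta %.1f MB, peak %.1f MB\n",
		$label,
		( $after - $before ) / ( 1024 * 1024 ),
		memory_get_peak_usage() / ( 1024 * 1024 )
	);

	return $result;
}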
On 07.10.2014 23:44, Jeroen De Dauw wrote:
Hey,
Thank you for making the measurements. Can you estimate the time for
item Q183 specifically? Since it is 1000 entities weighing 19 MB,
this means that on average the entities were 19 KB. Germany on the
other hand is much larger, and it makes me wonder how it scales to
that size.
Good point - I did not realize the outliers are that big. Q183 takes
~415ms, which is rather long. ~25ms json_decode, ~390ms array ->
objects. In itself that is not a problem, though perhaps something to
look at after we fixed the critical performance issues. This also does
illustrate that one should be careful to not fully deserialize entities
when that is not needed, and that fully deserializing a collection of
entities in one request is something to avoid.
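
A rough way to reproduce such a split, with $json and $deserializer standing in
for the serialization and for whatever turns the decoded array into entity
objects:

<?php
// Time the two deserialization steps separately ($json and $deserializer
// are placeholders, not our actual API).
$t0 = microtime( true );
$data = json_decode( $json, true );
$t1 = microtime( true );
$entity = $deserializer->deserialize( $data );
$t2 = microtime( true );

printf( "json_decode: %.1f ms, array -> objects: %.1f ms\n",
	( $t1 - $t0 ) * 1000,
	( $t2 - $t1 ) * 1000
);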
Do we have code that falls in the latter category? Even if we do only
partial deserialization, this is still going to be too costly for an
action done dozens of times during a request. We should also not simply
assume this is the case now and stop looking for what the critical
issues are.
Cheers
--
Jeroen De Dauw -
http://www.bn2vs.com
Software craftsmanship advocate
Evil software architect at Wikimedia Germany
~=[,,_,,]:3
_______________________________________________
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.