I'm also more worried about memory consumption than speed.
By far the biggest performance issue regarding speed is the fact that we load
entire entities just to look up a single label. This has been known for a while.
But with the new data model, we no longer do deferred unstubbing. Everything is
unserialized right away, always, even if in the end, all we need is a single
label of the entity. That's especially bad if there are a lot of referenced
entities, of course.
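
To make the contrast concrete, here is a rough sketch; the accessor chain and
the JSON layout are just illustrative, not our actual code. A label lookup only
needs a single array access on the decoded JSON, while today we build the whole
object graph first:

<?php
// Illustration only: what a label lookup needs vs. what we currently pay for.

// Roughly what happens now: the whole object graph gets built first.
function getLabelViaFullDeserialization( $json, $languageCode, $entityDeserializer ) {
	$entity = $entityDeserializer->deserialize( json_decode( $json, true ) ); // expensive
	return $entity->getFingerprint()->getLabel( $languageCode ); // hypothetical accessor chain
}

// What the lookup actually needs: one array access on the decoded data.
function getLabelFromArray( $json, $languageCode ) {
	$data = json_decode( $json, true );
	return isset( $data['labels'][$languageCode]['value'] )
		? $data['labels'][$languageCode]['value']
		: null;
}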
On top of that, PHP seems to "sometimes" get confused when memory is running
low. This seems "somehow" connected to ArrayObject. These effects are hard to
reproduce, though; we are not sure what exactly is going on.
In any case, we should try to be less wasteful with memory. Having a stub
implementation for StatementList would already help a lot. I'll be working on
removing the need to load so many entities in the first place (we already had
TermsLookup in the sprint, but didn't get around to working on it - partially
due to the problems on the live site).
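
To sketch what I mean by a stub (class and method names are made up, this is
just to illustrate the idea): hold on to the raw statements serialization and
only run the deserializer when somebody actually asks for the statements.

<?php
// Rough sketch of a statements stub; names are made up for illustration.
class LazyStatementListStub {

	private $serialization; // raw array slice from json_decode
	private $deserializer;  // callable: array -> StatementList
	private $statementList = null;

	public function __construct( array $serialization, $deserializer ) {
		$this->serialization = $serialization;
		$this->deserializer = $deserializer;
	}

	public function getStatementList() {
		if ( $this->statementList === null ) {
			// Unstub on first access only.
			$this->statementList = call_user_func( $this->deserializer, $this->serialization );
			$this->serialization = null; // free the raw data
		}

		return $this->statementList;
	}

	public function isUnstubbed() {
		return $this->statementList !== null;
	}
}

As long as nothing touches the statements, the entity would only carry the
decoded array around, which is much cheaper than the full object graph.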
On 08.10.2014 09:43, Markus Krötzsch wrote:
Btw, when doing such performance measurements, it would be great to get some
memory statistics from PHP as well. From my past as an SMW developer, I
remember seeing
incredible memory footprints of apparently simple PHP objects. OoM would be one
of the most common causes for blank pages, much more common than timeouts, and
even a single object in PHP can take up huge amounts of memory.
Markus
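
For reference, gathering such numbers is cheap in PHP; a minimal sketch, with
the profiled call kept as a placeholder, could look like this:

<?php
// Minimal sketch: report the memory delta and peak around some call.
// $callable is a placeholder, e.g. a closure wrapping the entity deserializer.
function measureMemory( $label, $callable ) {
	$before = memory_get_usage();
	$result = call_user_func( $callable );
	$after = memory_get_usage();

	printf( "%s: delta %.1f MB, peak %.1f MB\n",
		$label,
		( $after - $before ) / ( 1024 * 1024 ),
		memory_get_peak_usage() / ( 1024 * 1024 )
	);

	return $result;
}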
On 07.10.2014 23:44, Jeroen De Dauw wrote:
Hey,
Thank you for making the measurements. Can you estimate the time for
item Q183 specifically? Since it is 1000 entities weighing 19 MB,
this means that on average the entities were 19 KB. Germany on the
other hand is much larger, and it makes me wonder how it scales to
that size.
Good point - I did not realize the outliers are that big. Q183 takes
~415ms, which is rather long. ~25ms json_decode, ~390ms array ->
objects. In itself that is not a problem, though perhaps something to
look at after we fixed the critical performance issues. This also does
illustrate that one should be careful to not fully deserialize entities
when that is not needed, and that fully deserializing a collection of
entities in one request is something to avoid.
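
A rough way to reproduce such a split, with $json and $deserializer standing in
for the serialization and for whatever turns the decoded array into entity
objects:

<?php
// Time the two deserialization steps separately ($json and $deserializer
// are placeholders, not our actual API).
$t0 = microtime( true );
$data = json_decode( $json, true );
$t1 = microtime( true );
$entity = $deserializer->deserialize( $data );
$t2 = microtime( true );

printf( "json_decode: %.1f ms, array -> objects: %.1f ms\n",
	( $t1 - $t0 ) * 1000,
	( $t2 - $t1 ) * 1000
);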
Do we have code that falls in the latter category? Even if we do only
partial deserialization, this is still going to be too costly for an
action done dozens of times during a request. We should also not simply
assume this is the case now and stop looking for what the critical
issues are.
Cheers
--
Jeroen De Dauw -
http://www.bn2vs.com
Software craftsmanship advocate
Evil software architect at Wikimedia Germany
~=[,,_,,]:3
_______________________________________________
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.