Re: [Wikidata-tech] Report on loading wikidata - Wikidata-tech

7 Dec 2017

Did you try to point the WDQS copy to your TDB/Fuseki endpoint?

On Thu, 7 Dec 2017 at 18:58, Andy Seaborne <andy(a)apache.org> wrote:

...
  Dell XPS 13 (model 9350) - the 2015 model.
 Ubuntu 17.10, not a VM.
 1T SSD.
 16G RAM.
 Two volumes = root and user.
 Swappiness = 10

 java version "1.8.0_151" (OpenJDK)

 Data: latest-truthy.nt.gz (version of 2017-11-24)

 == TDB1, tdbloader2
    8 hours // 76,164 TPS

 Using SORT_ARGS: --temporary-directory=/home/afs/Datasets/tmp
 to make sure the temporary files are on the large volume.

 The run took 28877 seconds and resulted in a 173G database.

 All the index files are the same size.

 node2id : 12G
 OSP     : 53G
 SPO     : 53G
 POS     : 53G

 Algorithm:

 Data phase:

 parse the file, create the node table and a temporary file of all triples
 (3x 64-bit numbers per triple, written as text).

 Index phase:

 for each index, sort the temp file (using sort(1), an external sort
 utility), and make the index file by writing the sorted results, filling
 the data blocks and creating any tree blocks needed. This is a
 stream-write process - calculate the data block, write it out when full
 and never touch it again.

 This results in data blocks being completely full, unlike the standard
 B+Tree insertion algorithm. It is why indexes are exactly the same size.
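
 The two phases above can be sketched in a few lines of Python. This is a
 toy illustration, not the tdbloader2 code: the function name load, the
 block_size parameter and the in-memory sorted() call (standing in for
 sort(1) on a temporary file) are all assumptions made for the example,
 and tree blocks are not modelled.

```python
def load(triples, block_size=4):
    """Toy two-phase bulk load: returns (node_table, {index_name: [blocks]})."""
    # Data phase: give each distinct term an id in order of first appearance
    # and record every triple as three ids (the "temporary file").
    node_ids = {}
    rows = []
    for triple in triples:
        rows.append(tuple(node_ids.setdefault(term, len(node_ids)) for term in triple))

    # Index phase: for each index, sort the rows in that index's key order,
    # then cut the sorted stream into blocks that are written once, full.
    orders = {"SPO": (0, 1, 2), "POS": (1, 2, 0), "OSP": (2, 0, 1)}
    indexes = {}
    for name, order in orders.items():
        keys = sorted(tuple(row[i] for i in order) for row in rows)  # stand-in for sort(1)
        indexes[name] = [keys[i:i + block_size] for i in range(0, len(keys), block_size)]
    return node_ids, indexes


triples = [
    ("s1", "p1", "o1"),
    ("s1", "p2", "o2"),
    ("s2", "p1", "o1"),
    ("s2", "p2", "s1"),
    ("s3", "p1", "o2"),
]
nodes, indexes = load(triples, block_size=2)
for name, blocks in indexes.items():
    print(name, len(blocks), "blocks:", blocks)
```

 Every index holds the same rows packed the same way, so each ends up with
 the same number of blocks, and only the last block can be partly empty.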

 Building SPO is faster because the data is nearly sorted to start with.
 Data often tends to arrive grouped by subject.

 tdbloader2 is doing stream (append) I/O on index files, not a random
 access pattern.

 == TDB1 tdbloader1
    29 hours 43 minutes // 20,560 TPS

 106,975 seconds
 297G    DB-truthy

 node2id: 12G
 OSP:     97G
 SPO:     96G
 POS:     98G

 Same size node2id table, larger indexes.
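
 As a quick cross-check of the two sets of figures, elapsed seconds times
 triples per second should give roughly the same triple count for both
 loaders, since they loaded the same file. The snippet below only
 re-derives numbers from the values quoted in this report; the dictionary
 layout and variable names are made up for the example.

```python
runs = {
    "tdbloader2": {"seconds": 28_877, "tps": 76_164},
    "tdbloader1": {"seconds": 106_975, "tps": 20_560},
}

implied = {}
for name, run in runs.items():
    hours, rem = divmod(run["seconds"], 3600)
    implied[name] = run["seconds"] * run["tps"]  # triples loaded, from rate x time
    print(f"{name}: {hours}h {rem // 60}m, about {implied[name] / 1e9:.2f} billion triples")

speedup = runs["tdbloader1"]["seconds"] / runs["tdbloader2"]["seconds"]
print(f"tdbloader2 finished about {speedup:.1f}x sooner")
```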

 Algorithm:

 Data phase:

 parse the file and create the node table and the SPO index.
 The creation of SPO is by B+Tree insert so blocks are partially full
 (average is empirically about 2/3 full). When a block fills up, it is
 split into 2.  The node table is exactly the same as tdbloader2 because
 nodes are stored in the same order.
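
 The effect of splitting on block fill can be seen in a small simulation.
 This is a sketch under simplifying assumptions, not TDB code: only leaf
 blocks are modelled, keys are plain integers inserted in random order,
 and the capacity of 64 keys per block is an arbitrary choice for the
 example.

```python
import bisect
import random


def insert(leaves, firsts, key, capacity):
    """Insert key into the leaf that covers it; split the leaf in two on overflow."""
    i = max(bisect.bisect_right(firsts, key) - 1, 0)
    leaf = leaves[i]
    bisect.insort(leaf, key)
    if len(leaf) > capacity:
        mid = len(leaf) // 2
        right = leaf[mid:]          # upper half moves to a new block
        del leaf[mid:]
        leaves.insert(i + 1, right)
        firsts.insert(i + 1, right[0])
    firsts[i] = leaf[0]             # keep the routing key equal to the leaf minimum


def average_fill(keys, capacity=64):
    """Fraction of leaf slots in use after inserting all keys one by one."""
    leaves, firsts = [[]], [float("-inf")]
    for key in keys:
        insert(leaves, firsts, key, capacity)
    return len(keys) / (len(leaves) * capacity)


rng = random.Random(42)
keys = rng.sample(range(10**9), 50_000)
fill = average_fill(keys)
print(f"average leaf fill after random inserts: {fill:.2f}")
```

 A bulk loader that packs sorted keys into blocks would score close to
 1.0 on the same measure, which is the size difference seen between the
 two loaders.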

 Index phase:

 for each index, copy SPO to the index.  This is a tree sort and the
 access pattern on blocks is fairly random, which is a bad thing. Doing
 one at a time is faster than two together because more of the RAM in the
 OS-managed file system cache is devoted to caching one index.  A cache
 miss is a possible write to disk, and always a read from disk, which is
 a lot of work even with an SSD.

 Stream reading SPO is efficient - it is not random I/O, it is stream I/O.

 Once the cache-efficiency of the OS disk cache drops, tdbloader slows
 down markedly.

 == Comparison of TDB1 loaders.

 Building an index is a sort because the B+Trees hold data sorted.

 The approach of tdbloader2 is to use an external sort algorithm (i.e.
 sort larger than RAM using temporary files) done by a highly tuned
 utility, unix sort(1).

 The approach of tdbloader1 is to copy into a sorted data structure. For
 example, when copying index SPO to POS, it is creating a file with keys
 sorted by P then O then S, which is not the arrival order, which is
 S-sorted.  tdbloader1 maximises OS caching of memory mapped files by
 doing indexes one at a time.  Experimentation shows that doing two at
 once is slower, and doing two in parallel is no better, and sometimes
 worse, than doing them sequentially.
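
 The caching argument can be illustrated with a toy LRU model. This is a
 sketch under loud assumptions, not a model of the real OS page cache or
 of TDB: each row is taken to touch the block where it ends up in the
 finished index, the data is random, and the block size (100 rows) and
 cache size (150 blocks) are arbitrary.

```python
import random
from collections import OrderedDict


def block_trace(rows, order, block_size):
    """Block touched by each row, in arrival order, for an index sorted by `order`."""
    rank = {row: n for n, row in enumerate(sorted(rows, key=lambda r: tuple(r[i] for i in order)))}
    return [rank[row] // block_size for row in rows]


def misses(trace, cache_blocks):
    """Count misses for an LRU cache that holds `cache_blocks` blocks."""
    cache, count = OrderedDict(), 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)
        else:
            count += 1
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)   # evict the least recently used block
    return count


rng = random.Random(1)
rows = sorted({(rng.randrange(5000), rng.randrange(20), rng.randrange(5000))
               for _ in range(40_000)})     # arrival order is S-sorted
spo = block_trace(rows, (0, 1, 2), 100)
pos = [("POS", b) for b in block_trace(rows, (1, 2, 0), 100)]
osp = [("OSP", b) for b in block_trace(rows, (2, 0, 1), 100)]

cache = 150
stream = misses(spo, cache)                                   # sequential writes
one_at_a_time = misses(pos, cache) + misses(osp, cache)       # each index gets the whole cache
interleaved = misses([b for pair in zip(pos, osp) for b in pair], cache)  # both share it
print("SPO:", stream, "POS then OSP:", one_at_a_time, "POS with OSP:", interleaved)
```

 In this model the SPO trace misses once per block, while POS and OSP
 miss far more often, and more often still when they compete for the
 same cache.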

 == TDB2

 TDB2 is experimental.  The current TDB2 loader is a functional placeholder.

 It is writing all three indexes at the same time.  While for SPO this is
 not a bad access pattern (subjects are naturally grouped), for POS and
 OSP, the I/O is a random pattern, not a stream pattern.  There is more
 than double contention for OS disk cache, hence it is slow and gets
 slower faster.

 == More details.

 For more information, consult the Jena dev@ and user@ archives and the
 code.
 -- 

---
Marco Neumann
KONA