Hi!
I will look into the size of the .jnl file, but shouldn't that be located wherever Blazegraph is running for the SPARQL endpoint, or is this a special flavour? I was also thinking of looking into a GitLab runner that could occasionally generate an HDT file from the TTL dump, if our server can handle it. For that, an MD5 sum file would be preferable, or would a timestamp be sufficient? (A rough sketch of the check I have in mind is below.)
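Something like this is what I'm imagining for the runner: compare the dump's MD5 against the sum recorded from the last HDT build, and only regenerate when they differ. The paths and file names here are just placeholders, not the real dump layout:

    import hashlib
    import pathlib

    DUMP = pathlib.Path("dump/latest-all.ttl")   # hypothetical dump location
    STAMP = pathlib.Path("hdt/last-built.md5")   # MD5 of the dump we last converted

    def md5sum(path: pathlib.Path, chunk: int = 1 << 20) -> str:
        """Stream the file through MD5 so we don't hold the whole dump in memory."""
        h = hashlib.md5()
        with path.open("rb") as f:
            for block in iter(lambda: f.read(chunk), b""):
                h.update(block)
        return h.hexdigest()

    current = md5sum(DUMP)
    previous = STAMP.read_text().strip() if STAMP.exists() else None

    if current != previous:
        # here the runner would invoke the TTL-to-HDT conversion on DUMP, then:
        STAMP.write_text(current + "\n")
        print("dump changed, HDT regenerated")
    else:
        print("dump unchanged, skipping HDT build")

A timestamp would be cheaper to check, but a checksum also catches the case where the dump is re-uploaded unchanged.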
Publishing the .jnl file for Blazegraph may not be as useful as one would think, because a .jnl file is tied to a specific vocabulary and certain other settings - i.e., unless you run the same WDQS code (which customizes some of these) of the same version, you won't be able to use the same file. Of course, since the WDQS code is open source, that may be good enough, so in general publishing such a file may be possible.
Currently, it's about 300 GB uncompressed. No idea how much compressed. Loading it takes a couple of days on a reasonably powerful machine, and longer on Labs ones (I haven't tried to load a full dump on Labs in a while, since Labs VMs are too weak for that).
In general, I'd say it takes about 100 MB per million triples. Less if the triples reuse URIs, probably more if they contain a ton of text data.
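As a quick sanity check of that rule of thumb against the 300 GB figure above (rounded numbers, purely an estimate):

    # rough journal-size estimate from the ~100 MB per million triples rule of thumb
    MB_PER_MILLION_TRIPLES = 100

    def jnl_size_gb(triples_millions: float) -> float:
        return triples_millions * MB_PER_MILLION_TRIPLES / 1024

    # ~3 billion triples -> roughly 300 GB, consistent with the current journal size
    print(round(jnl_size_gb(3000)))  # ~293 (GB)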