Hi,
On Wed, Oct 02, 2013 at 12:10:51AM +0100, Magnus Manske wrote:
> 300GB per year should be trivial. It might increase slightly if we decide to use a relational database for everything; [...]
I wholeheartedly agree. Relational databases come with many features and downsides that we do not need in our setting, and that in fact even get in the way. However, so far the request is to get the data into MySQL, and some voices are a bit concerned about seeing even 300GB in a MySQL database.
But I am not sure which parts of the requirements are set in stone :-)
[ exploiting structure in data ]
Yes, I completely agree. The data comes with lots and lots of structure. We can easily get, for example, the date down to way below 1 byte. And the page counts are screaming to have their structure exploited, too.
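To illustrate the point about the date, here is a minimal Python sketch. It assumes hourly counts per page and a made-up record layout; the names `EPOCH`, `pack_day` and `unpack_day` are hypothetical and not part of any existing tool. Each record stores the day once as a 2-byte offset, followed by 24 variable-length counts, so the date costs 2/24 of a byte per count.

```python
import struct
from datetime import date, timedelta

# Assumption: an arbitrary reference day; any day before the first data point works.
EPOCH = date(2007, 1, 1)


def encode_varint(n):
    """Encode a non-negative integer in 7-bit groups, low group first."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low)
            return bytes(out)


def decode_varint(buf, pos):
    """Decode one varint starting at buf[pos]; return (value, next position)."""
    value = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return value, pos


def pack_day(day, hourly_counts):
    """Pack one day for one page: 2-byte day offset + 24 varint counts."""
    if len(hourly_counts) != 24:
        raise ValueError("expected 24 hourly counts")
    out = bytearray(struct.pack("<H", (day - EPOCH).days))
    for count in hourly_counts:
        out += encode_varint(count)
    return bytes(out)


def unpack_day(blob):
    """Inverse of pack_day: return (day, list of 24 hourly counts)."""
    (offset,) = struct.unpack_from("<H", blob, 0)
    pos = 2
    counts = []
    for _ in range(24):
        count, pos = decode_varint(blob, pos)
        counts.append(count)
    return EPOCH + timedelta(days=offset), counts
```

With this layout, a page whose hourly counts all stay below 128 needs 26 bytes per day, and the hour is implied by the position of each count.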
However:
> [...] at the cost of more complex low-level queries.
That's the real problem. Let's find a sweet spot that still allows us to run queries easily enough, while keeping data sizes manageable on the backend. Give us some ammunition to discuss the requirements :-)
* Do we need to have all of the data in MySQL in a trivial schema at all costs, or
* is it better to have the data at least in MySQL, but with a not totally trivial way to query it, or
* would it be even better to have a nice clean interface to the data that you maybe cannot query by SQL, but that lets you formulate queries in a straightforward way?
What would you prefer?
Best regards,
Christian