Hi,
On Wed, Oct 02, 2013 at 12:10:51AM +0100, Magnus Manske wrote:
> 300GB per year should be trivial. It might increase
> slightly if we decide
> to use a relational database for everything; [...]
I wholeheartedly agree.
Relational databases come with many features and downsides that we do
not need in our setting, and that in fact even get in the way. However, up
to now, the request has been to get the data into MySQL. And there were some
voices that are a bit concerned about seeing even 300GB in a MySQL database.
But I am not sure which parts of the requirements are set in stone :-)
> [ exploiting structure in data ]
Yes, I completely agree. The data comes with lots and lots of
structure. We can, for example, easily get the date down to well below 1
byte. And the page counts are also screaming to have their structure
exploited.
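As a rough illustration of how a date can cost far less than 1 byte per
row, here is a minimal Python sketch. It assumes the rows are stored sorted
by date, so each date is written once together with the number of rows it
covers (run-length encoding). The function names, the 8 bytes per run, and
the sample data are made up for illustration; this is not a proposal for
the actual storage format.

```python
from datetime import date
from itertools import groupby


def run_length_encode_dates(dates):
    """Collapse a date-sorted sequence into (date, run_length) pairs."""
    return [(d, sum(1 for _ in run)) for d, run in groupby(dates)]


def run_length_decode_dates(runs):
    """Expand (date, run_length) pairs back into one date per row."""
    return [d for d, n in runs for _ in range(n)]


def amortized_bytes_per_row(runs, bytes_per_run=8):
    """Storage cost of the date column, spread over all rows.

    bytes_per_run is an assumed size for one (date, run_length) pair.
    """
    rows = sum(n for _, n in runs)
    return len(runs) * bytes_per_run / rows


# Hypothetical sample: 1000 rows on each of three consecutive days.
dates = [date(2013, 10, day) for day in (1, 2, 3) for _ in range(1000)]
runs = run_length_encode_dates(dates)

print(runs)
print(amortized_bytes_per_row(runs))
```

The more rows share a date, the smaller the amortized cost gets. The price
is the one mentioned below: a query can no longer read the date from the row
itself, but has to locate the run that the row belongs to.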
However:
> [...] at the cost of more complex
> low-level queries.
That's the real problem. Let's find a sweet spot that still lets us
perform queries easily enough, while keeping data sizes
manageable on the backend. Give us some ammunition to discuss the
requirements :-)
* Do we need to have all of the data in MySQL in a trivial schema at
all costs, or
* is it better to have the data at least in MySQL, but with a not entirely
trivial way to query it, or
* would it be even better to have a nice clean interface to the data
that you maybe cannot query by SQL, but that allows you to formulate queries in
a straightforward way?
What would you prefer?
Best regards,
Christian
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Gruendbergstrasze 65a Email: christian(a)quelltextlich.at
4040 Linz, Austria Phone: +43 732 / 26 95 63
Fax: +43 732 / 26 95 63
Homepage:
http://quelltextlich.at/
---------------------------------------------------------------