On Wed, Oct 2, 2013 at 5:16 AM, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
Magnus Manske, 02/10/2013 10:12:
Depending on the absolute value of "all costs", I'd prefer #1, or a
combination of #2&#3.

For GLAM (which is what I am mostly involved in), monthly page views
would suffice, and those should be easily done in MySQL.

Daily views would be nice-to-have, but do not need to be in MySQL. [...]

I'd second this. We have partners (but also, say, internal WikiProjects) working on a long tail of tens or hundreds of thousands of pages within their own projects: cutting this long tail, including redlinks, would be a bigger loss than a decrease in resolution.


Thank you both for the responses; this is very useful to know. If I'm hearing people correctly so far:

* reduced resolution is OK, handle requests for higher resolution data further down the line.
* hacking the data to reduce size is OK if needed, but preferably the hacks should not be lossy.
* a database is not absolutely 100% necessary but is preferred.
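
To make the "reduced resolution" point concrete, here is a rough sketch of rolling daily counts up to monthly ones without dropping any pages (redlinks included). It is illustrative only: the table name `pageviews_daily`, its columns, and the sample rows are made up, and it uses Python's built-in SQLite as a stand-in for MySQL so that it runs anywhere. It is not a description of any schema we actually have.

```python
import sqlite3


def monthly_views(daily_rows):
    """Roll (page, day, views) rows up to (page, month, total_views).

    `pageviews_daily` and its columns are hypothetical names used only
    for this sketch; SQLite stands in for MySQL here.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pageviews_daily (page TEXT, day TEXT, views INTEGER)"
    )
    conn.executemany(
        "INSERT INTO pageviews_daily VALUES (?, ?, ?)", daily_rows
    )
    # Keep every page (the whole long tail); only the time axis is coarsened.
    rows = conn.execute(
        "SELECT page, strftime('%Y-%m', day) AS month, SUM(views) "
        "FROM pageviews_daily "
        "GROUP BY page, month "
        "ORDER BY page, month"
    ).fetchall()
    conn.close()
    return rows


# Invented sample data, including a page that does not exist (a redlink).
sample = [
    ("Mona_Lisa", "2013-09-30", 120),
    ("Mona_Lisa", "2013-10-01", 100),
    ("Mona_Lisa", "2013-10-02", 80),
    ("Some_redlink", "2013-10-01", 3),
]

for row in monthly_views(sample):
    print(row)
```

The aggregation keeps one row per page per month, so the long tail survives and only the daily detail is lost.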

If that's right, I have an additional question: would a non-relational database be acceptable? I'm not saying we're planning this, just wondering what people think. If, for example, the data were available in a public Cassandra cluster, would people be willing to learn how CQL [1] works?


[1] - http://cassandra.apache.org/doc/cql/CQL.html