IR-Cache provides its traces at sub-second granularity, and has been doing so for years. They deal with the storage problem by keeping a rotating log that covers at most one week, so when they add a new file for today, they delete the one for Monday of last week. Anyone who needs more than one week of data has to write their own script or download the files at least once a week.
Should Wikimedia provide such data with a similar one-week rotation, there shouldn't be a storage problem.
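
To make the rotation concrete, here is a minimal sketch in Python of a one-week retention rule. It is only an illustration: the function name, the `keep_days` parameter, and the assumption of one trace file per day are mine, not a description of how IR-Cache actually implements it.

```python
from datetime import date, timedelta


def files_to_delete(existing_dates, today, keep_days=7):
    """Return the dates whose daily trace files fall outside the window.

    Assumes (hypothetically) one trace file per calendar day, identified
    by its date. A file is dropped once it is keep_days or more days old.
    """
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in existing_dates if d <= cutoff)


# Example: files for Sep 15 through Sep 22, 2014, with Sep 22 just added.
stored = [date(2014, 9, 15) + timedelta(days=i) for i in range(8)]
expired = files_to_delete(stored, today=date(2014, 9, 22))
print(expired)
```

With the new Monday file in place, only the previous Monday's file is selected for deletion, so the stored volume stays bounded at about one week of traces regardless of how fine the granularity is.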


On Mon, Sep 22, 2014 at 7:13 AM, Pine W <wiki.pine@gmail.com> wrote:

Hm, on the second point the person to ask is Toby, but it sounds like there are reasons for the minimum one-hour granularity, and with Oliver's point it sounds like this research approach won't produce the intended benefits anyway. Perhaps another reason for the one-hour minimum granularity is that the storage and other resource requirements for highly granular data are too expensive to justify the benefits.



Wiki-research-l mailing list