web log data - Analytics

3 Mar 2017

Hi All -

Emanuele Rocca suggested we reach out to you and ask whether you would be
willing to share web log/content access data.

Jelena and I are network security researchers at the University of Southern
California's Information Sciences Institute. We're working on a project on
application-level DDoS defences, and are evaluating our defences for web
applications.

Our defences model how legitimate users interact with served content, and
we use these models to differentiate between legitimate users and attacking
bots during high load (i.e., a potential attack). Our models are based on
the timing between user requests and the semantic connections (or lack
thereof) between content requests. More information on our NSF-funded
project can be found here: https://steel.isi.edu/Projects/frade/
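
To make the timing feature concrete, here is a minimal sketch of how
per-client inter-request gaps could be derived from access-log records. It
is an illustration only, not our actual implementation: the record format
(client id, timestamp in seconds) and the function name inter_request_gaps
are hypothetical.

```python
from collections import defaultdict


def inter_request_gaps(records):
    """Return {client_id: [seconds between consecutive requests]}.

    `records` is an iterable of (client_id, timestamp_seconds) pairs.
    This layout is an assumption made for the example, not a real log format.
    """
    times = defaultdict(list)
    for client_id, timestamp in records:
        times[client_id].append(timestamp)

    gaps = {}
    for client_id, stamps in times.items():
        stamps.sort()  # log lines may arrive out of order
        gaps[client_id] = [later - earlier
                           for earlier, later in zip(stamps, stamps[1:])]
    return gaps


# Hypothetical records: client "a" makes three requests, client "b" two.
records = [("a", 0.0), ("b", 1.0), ("a", 4.5), ("b", 1.5), ("a", 10.0)]
gaps = inter_request_gaps(records)
```

Gap lists like these are the kind of per-client signal a timing model could
be fit to; real logs would also need anonymized client identifiers.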

Right now, to collect data for evaluation, we've mirrored several sites
(Wikipedia is one of them :) and hired ~200 users to interact with our
mirrored sites (for app-level attack data, we simulate attacks). Of course,
this isn't an ideal way of getting data on human-content interaction, and
we would be thrilled to augment our evaluation with "real world" data.

Wikipedia is of particular interest to us given the number of "good" bots
that access content and whose access patterns may not exist in our current
models, which are trained on human interactions with mirrored Wikipedia
content.

We would be extremely grateful for any information you are willing to
share. We understand and fully support the need to preserve the privacy of
Wikipedia and its users, and we regularly work with anonymized datasets.
If there's any need for NDAs or similar agreements, we are very open to
whatever is necessary. In addition to web/access logs, we would be grateful
for any information on application-level DoS attacks or flash crowds you
have experienced.

cheers,
gen