Hi Alexander!
This indeed seems like an interesting project. Responding to your suggestions:
First, I am ready to collaborate with you on making this data available as other researchers have done in the past. I would appreciate if you let me know which steps I need to take in order to work with you on this task.
I'd suggest you apply for a research project here[1]. The research team will discuss the project with you, and if it is approved, you can sign an NDA and get access to the raw data. You can also apply for a grant here[2].
Second, you can consider making this data available after achieving the necessary level of confidentiality. For example, you can group request types so that each group has at least 1000 unique IP-addresses.
There are a couple of tasks[3] in our backlog about effectively anonymizing the pageview data for general-purpose use. We used an algorithm similar to the one you proposed. Our experience, though, is that general-purpose anonymization is a non-trivial task. We plan to work on this in the mid-term (actually, we have already started; see the tasks), but we have other priorities for the next quarter. Again, I'd suggest that you apply for a specific project for the needs of your study here[1][2].
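To make the thresholding idea concrete, here is a minimal sketch in Python of the kind of grouping you describe. It is only an illustration, not the code we run: the function name, the (ip, request_type) input shape, and the group_of mapping are all assumptions made up for this example. It counts requests per group and publishes a count only when the group was reached from at least min_unique_ips distinct IP addresses.

```python
from collections import defaultdict


def counts_for_safe_groups(requests, group_of, min_unique_ips=1000):
    """Return {group: request_count} for groups with enough distinct IPs.

    requests: iterable of (ip, request_type) pairs (hypothetical input shape).
    group_of: function mapping a request type to a group label (hypothetical).
    min_unique_ips: smallest number of distinct IPs a group needs to be published.
    """
    ips_per_group = defaultdict(set)
    count_per_group = defaultdict(int)

    for ip, request_type in requests:
        group = group_of(request_type)
        ips_per_group[group].add(ip)
        count_per_group[group] += 1

    # Groups below the threshold are dropped entirely, not reported as zero.
    return {
        group: count
        for group, count in count_per_group.items()
        if len(ips_per_group[group]) >= min_unique_ips
    }
```

Note that a threshold like this is only one ingredient: small groups that get dropped, or counts compared across overlapping groupings, can still leak information, which is part of why the general-purpose case is hard.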
Another challenge, I guess, would be categorizing the articles as educational or entertainment. The categories in Wikipedia are a cool way to browse, but not an exact way of clustering content. And I guess the boundary between educational and entertainment can sometimes be fuzzy, no? A very interesting challenge anyway.
cheers!
[3]