Since this would be for a research project for which I might request funding, I would like to know whether I could count on that, what the nature of the available data is, what the procedure to obtain this data would be, and whether there would be any implications because of privacy concerns.

We grant access to webrequest log data and its non-public derivatives only infrequently. When we do, it is through formal collaborations with the researchers. What these collaborations are and how we set them up is explained at https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations.

To provide more context:

Requiring a formal collaboration as a necessary step for accessing the data means that we cannot scale rapidly, i.e., each researcher on our team can only be involved in so many of them. In my experience, the practical cap is somewhere around 3 collaborations per researcher. We understand that this is a problem, as we would like more researchers to work with this data. We frequently reconsider ways to expand our capacity to collaborate. We also always consider releasing more datasets publicly since, ultimately, that's one of the best ways for us to empower others to do the work they want to do and find value in.


