We at WikiTeam did not originally intend for researchers to work from these media tarballs. We created them so that everyone in the Wikimedia movement can rest assured that there is one backup copy of their media on the Internet Archive. Trust me, the number of people who actually use these tarballs is going to be smaller than the number of people editing the smaller wikis combined; everyone else will certainly be using the data on Commons itself. So we can focus fully on making Commons more data-accessible, without the risk of people working from the tarballs on the Internet Archive for research instead.
That being said, we can't even guarantee that the images in these tarballs are up to date. Each tarball should be regarded as a snapshot of the images at the time of download, not as an effective live backup of all the images on Commons. We are looking into creating subsequent tarballs that take new uploads and re-uploads into account, so that Commons is actually backed up.
I guess the way we presented the tarballs on the Internet Archive is enough to deter anyone from conducting research directly from them, unless they do some in-depth mining of the data to get what they want. Even then, it is certainly going to be much tougher than mining the information from Commons directly in its current state.