I have integrated the wikix program, which extracts image files and analyzes image template usage in the enwiki dumps, into the AI engine I use for machine translation projects. I use wikix to sync up with the Wikipedia image repository. As a side effect, it has yielded some results that may be useful to the community on the English Wikipedia.
During analysis of the most recent dumps, posted as enwiki-20070206, the program identified all tags used in templates for image tagging on the English Wikipedia, as well as all suspect image files, which may be trojans, viruses, or other types of content that have been uploaded to the site as images.
The image files and data from the English Wikipedia are grouped into the following output logs. Not all of the files are trojans, and some of them are probably OK, but some may not be, particularly files named "spoof" and MS Word files, which can contain VB5 virus code if downloaded from Wikipedia. At any rate, the list of files and the articles that link to them is provided, and it may be useful for someone to review these files, since they appear to be file types which can harbor viruses and trojans. They are files I will not be hosting or pulling into Wikigadugi, since they may contain malicious code.
images.log - all image files referenced in the last enwiki dumps
reject.log - all suspect files which may be viruses or trojans, listed by the titles of the articles that link to them
fragment.log - all templates and image tags used in templates which alias to the Image: directive at some point through the website logic, and the first article title in which they appear. (It is interesting to see how many tags people create in templates to map to Image:.)
These logs can be downloaded from:
ftp://www.wikigadugi.org/wiki/xml/wikix-logs.tar.gz
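
For anyone who wants to run a similar check against their own list of file names, here is a rough sketch in Python of the general idea of flagging uploads whose names do not end in a known image extension. This is not the wikix code and does not read the log files above, whose format is not described here. The allow-list of extensions and the function names are my assumptions for illustration only.

```python
import os

# Assumed allow-list of image extensions; illustrative only, not taken from wikix.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".svg"}


def is_suspect(filename):
    """Return True if the name does not end in a known image extension."""
    ext = os.path.splitext(filename)[1].lower()
    return ext not in IMAGE_EXTENSIONS


def suspect_files(filenames):
    """Return the names that deserve a manual look, in their original order."""
    return [name for name in filenames if is_suspect(name)]


if __name__ == "__main__":
    # Hypothetical file names, made up for this example.
    sample = ["Photo.JPG", "report.doc", "spoof", "map.svg"]
    for name in suspect_files(sample):
        print(name)
```

A check like this only looks at the name, so it can flag harmless files and miss a malicious file that carries an image extension. Anything it flags still needs to be reviewed by hand.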
Jeff
wikimedia-l@lists.wikimedia.org