Hey folks!
A few months back a colleague of mine was looking for some unstructured images to analyze as part of a demo for the Google Cloud Vision API https://cloud.google.com/blog/big-data/2016/05/explore-the-galaxy-of-images-with-cloud-vision-api. Luckily, I knew just the place https://commons.wikimedia.org/wiki/Category:Media_needing_categories, and the resulting demo http://vision-explorer.reactive.ai/, built by Reactive Inc., is pretty awesome. It was shared on stage by Jeff Dean during the keynote https://www.youtube.com/watch?v=HgWHeT_OwHc&feature=youtu.be&t=2h1m19s at GCP NEXT 2016.
I wanted to quickly share the data from the programmatically analyzed images so it can be used to help categorize the media in the Commons. There's data for about 80,000 images:
- map.txt https://storage.googleapis.com/gcs-samples2-explorer/reprocess/map.txt (5.9MB): a single text file mapping id to filename, one "id : filename" pair per line
- results.tar.gz https://storage.googleapis.com/gcs-samples2-explorer/reprocess/results.tar.gz (29.6MB): a tgz'd directory of JSON files, named "${id}.jpg.json", each representing the output of the API https://cloud.google.com/vision/reference/rest/v1/images/annotate#response-body for one image
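To give a sense of how the two files fit together, here's a minimal Python sketch that joins a map.txt entry with its per-image annotation JSON. It assumes the "id : filename" format described above and that each JSON file contains a "labelAnnotations" list with "description" and "score" fields, as in the Vision API response body; the sample data below is made up for illustration.

```python
import json

def parse_map(lines):
    """Parse map.txt lines of the form "id : filename" into a dict."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        image_id, _, filename = line.partition(" : ")
        mapping[image_id.strip()] = filename.strip()
    return mapping

def labels_for(annotation, min_score=0.5):
    """Return label descriptions above min_score from one image's JSON."""
    return [
        ann["description"]
        for ann in annotation.get("labelAnnotations", [])
        if ann.get("score", 0) >= min_score
    ]

# Hypothetical sample data, not from the actual dump:
sample_map = ["12345 : Example_photo.jpg"]
mapping = parse_map(sample_map)

sample_annotation = json.loads(
    '{"labelAnnotations": [{"description": "galaxy", "score": 0.92}]}'
)
print(mapping["12345"], labels_for(sample_annotation))
```

In practice you'd read map.txt once into the dict, then open "${id}.jpg.json" from the untarred results directory for each id you care about.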
We're making this data available under the CC0 license, and these links will likely be live for at least a few weeks.
If you're interested in working with the Cloud Vision API to tag other images in the Commons, talk to the WMF Community Tech team.
Thanks for your help!