Hey everyone,
apologies for the cross-posting, we're just too excited:
we're looking for a new member for our team [0], who'll dive right into
the promising Structured Data project. [1]
Is our future colleague hiding among the tech ambassadors, translators,
GLAM people, and community members we usually work with? We look forward to
finding out soon.
So please: check the full job description [2], apply, or pass it along to
anyone you think may be a good fit. For any questions, please contact
me personally (not here).
Thanks!
Elitre (WMF)
Senior Community Liaison, Technical Collaboration
[0] https://meta.wikimedia.org/wiki/Community_Liaisons
[1] https://commons.wikimedia.org/wiki/Commons:Structured_data
[2]
https://boards.greenhouse.io/wikimedia/jobs/610643?gh_src=o3gjf21#.WMGV0Rih…
Looks like some of these images still need categorization. I think there's
still an unrealized opportunity here to use the results I shared to work
through the category backlog on Commons.
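For anyone who wants to pick this up, here's a rough, untested Python sketch of
one way to cross-reference the shared data with what's still uncategorized. It
assumes map.txt (described in the forwarded message below) uses an "id : filename"
separator and that the filenames match Commons titles without the "File:" prefix;
adjust as needed.

    import requests

    API = "https://commons.wikimedia.org/w/api.php"

    def backlog_filenames():
        """Yield file titles currently in Category:Media_needing_categories."""
        params = {
            "action": "query",
            "list": "categorymembers",
            "cmtitle": "Category:Media_needing_categories",
            "cmtype": "file",
            "cmlimit": "500",
            "format": "json",
        }
        while True:
            data = requests.get(API, params=params).json()
            for member in data["query"]["categorymembers"]:
                yield member["title"]            # e.g. "File:Example.jpg"
            if "continue" not in data:
                break
            params.update(data["continue"])      # follow cmcontinue pagination

    def load_map(path="map.txt"):
        """Parse 'id : filename' lines into {filename: id} (separator assumed)."""
        mapping = {}
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                image_id, _, filename = line.partition(" : ")
                mapping[filename.strip()] = image_id.strip()
        return mapping

    labeled = load_map()
    still_open = [t for t in backlog_filenames()
                  if t.removeprefix("File:") in labeled]
    print(f"{len(still_open)} backlog files already have Vision labels")

Nothing official, just a starting point for anyone working the backlog.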
On Thu, Aug 11, 2016 at 1:47 PM Pine W <wiki.pine(a)gmail.com> wrote:
Forwarding.
Pine
---------- Forwarded message ----------
From: "Jordan Adler" <jmadler(a)google.com>
Date: Aug 11, 2016 13:06
Subject: [Commons-l] Programmatically categorizing media in the Commons
with Machine Learning
To: "commons-l(a)wikimedia.org" <commons-l(a)lists.wikimedia.org>
Cc: "Ray Sakai" <rsakai(a)reactive.co.jp>, "Ram Ramanathan" <
ramramanathan(a)google.com>, "Kazunori Sato" <kazsato(a)google.com>
Hey folks!
A few months back a colleague of mine was looking for some unstructured
images to analyze as part of a demo for the Google Cloud Vision API
<https://cloud.google.com/blog/big-data/2016/05/explore-the-galaxy-of-images…>.
Luckily, I knew just the place
<https://commons.wikimedia.org/wiki/Category:Media_needing_categories>, and
the resulting demo <http://vision-explorer.reactive.ai/>, built by Reactive
Inc., is pretty awesome. It was shared on-stage by Jeff Dean during the
keynote
<https://www.youtube.com/watch?v=HgWHeT_OwHc&feature=youtu.be&t=2h1m19s> at
GCP NEXT 2016.
I wanted to quickly share the data from the programmatically identified
images so it could be used to help categorize the media in the Commons.
There are about 80,000 images' worth of data:
- map.txt
  <https://storage.googleapis.com/gcs-samples2-explorer/reprocess/map.txt>
  (5.9MB): a single text file mapping id to filename in an "id : filename"
  format, one per line
- results.tar.gz
  <https://storage.googleapis.com/gcs-samples2-explorer/reprocess/results.tar.…>
  (29.6MB): a tgz'd directory of JSON files representing the output of the API
  <https://cloud.google.com/vision/reference/rest/v1/images/annotate#response-…>,
  in the format "${id}.jpg.json"
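For anyone who wants to poke at the results before the links go away, here's a
rough, untested Python sketch for pulling high-confidence labels out of the
archive. It assumes each ${id}.jpg.json file holds a single images.annotate
response with a "labelAnnotations" array; adjust the path handling if the
tarball nests the files differently.

    import json
    import tarfile

    def iter_labels(archive="results.tar.gz", min_score=0.75):
        """Yield (image_id, label, score) for labels at or above min_score."""
        with tarfile.open(archive, "r:gz") as tar:
            for member in tar:
                if not member.isfile() or not member.name.endswith(".jpg.json"):
                    continue
                # File names are assumed to follow the "${id}.jpg.json" pattern.
                image_id = member.name.rsplit("/", 1)[-1].removesuffix(".jpg.json")
                payload = json.load(tar.extractfile(member))
                for ann in payload.get("labelAnnotations", []):
                    if ann.get("score", 0) >= min_score:
                        yield image_id, ann["description"], ann["score"]

    for image_id, label, score in iter_labels():
        print(f"{image_id}\t{label}\t{score:.2f}")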
We're making this data available under the CC0 license, and these links
will likely be live for at least a few weeks.
If you're interested in working with the Cloud Vision API to tag other
images in the Commons, talk to the WMF Community Tech team.
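If it helps anyone get started, here's a minimal, untested sketch of a
label-detection call against the Vision REST endpoint; the API key and the
example image URL are placeholders you'd supply yourself.

    import requests

    VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"
    API_KEY = "YOUR_API_KEY"  # placeholder; create one in the Google Cloud console

    def label_image(image_url, max_results=10):
        """Request label annotations for a single publicly reachable image URL."""
        body = {
            "requests": [{
                "image": {"source": {"imageUri": image_url}},
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }]
        }
        resp = requests.post(VISION_ENDPOINT, params={"key": API_KEY}, json=body)
        resp.raise_for_status()
        return resp.json()["responses"][0].get("labelAnnotations", [])

    # Example (placeholder URL): print labels for one Commons file by its direct URL.
    # for ann in label_image("https://upload.wikimedia.org/wikipedia/commons/..."):
    #     print(ann["description"], ann["score"])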
Thanks for your help!
_______________________________________________
Commons-l mailing list
Commons-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/commons-l