Dear all,
Thanks for all your help with answering questions and giving feedback
over the last couple of months. I'm happy to say that we're finally at
a stage where we've hashed 22,452,638 images from Wikimedia Commons
and launched Elog.io in public beta: http://elog.io/
Elog.io consists of an open API and browser plugins that can look up
information about images using a perceptual hash that's easy and
quick to calculate in a browser.
What the browser extensions allow you to do is match an image you find
"in the wild" against Wikimedia Commons. If it can be matched against
an image from Commons, it'll show you the title, author, and license,
and give you links back to Wikimedia, the license, and a quick and
handy "Copy as HTML" to copy the image and attribution as a HTML
snippet for pasting into Word, LibreOffice, Wordpress, etc.
Our API provides lookup functions to find information using a URL (the
Commons' page name URL) or using the perceptual hash. You get
information back as JSON in W3C Media Annotations format. Of course,
the information you get back is no better than what the Commons API
provides, so if you already have a page name URL, you may as well
query it directly, and rely on our API only for searching by
perceptual hashes.
The algorithm we use for calculating perceptual hashes, which you'll
need to query our API, is at http://blockhash.io/
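To give a feel for the hashing side, here's a rough Python sketch of
the idea (the real algorithm at blockhash.io works per colour band
with per-band medians, so use one of its reference implementations in
practice; the lookup endpoint below is a placeholder I made up for
illustration, not a documented URL):

    from statistics import median

    import requests
    from PIL import Image  # Pillow

    def blockhash_sketch(path, grid=16):
        # Grayscale + resize so that each of the grid x grid blocks
        # covers exactly 4x4 pixels.
        img = Image.open(path).convert("L").resize((grid * 4, grid * 4))
        px = list(img.getdata())
        width = grid * 4
        # Sum the pixel values inside each block.
        sums = []
        for by in range(grid):
            for bx in range(grid):
                sums.append(sum(px[(by * 4 + y) * width + (bx * 4 + x)]
                                for y in range(4) for x in range(4)))
        # Each block contributes one bit: 1 if its sum is above the median.
        m = median(sums)
        bits = "".join("1" if s > m else "0" for s in sums)
        return "%064x" % int(bits, 2)  # 256 bits -> 64 hex characters

    # Placeholder lookup call -- see http://elog.io/ for the real endpoint.
    h = blockhash_sketch("photo.jpg")
    resp = requests.get("https://catalog.elog.io/lookup/hash",
                        params={"hash": h})
    print(resp.json())  # JSON in W3C Media Annotations format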
Sincerely,
Jonas
Greetings,
As many of you are aware, we're currently in the process of
collectively adding machine-readable metadata to many files and
templates that don't have them, both on Commons and on all other
Wikimedia wikis with local uploads [1,2]. This makes it much easier to
see and re-use multimedia files consistently with best practices for
attribution across a variety of channels (offline, PDF exports, mobile
platforms, MediaViewer, WikiWand, etc.).
In October, I created a dashboard to track how many files were missing
the machine-readable markers on each wiki [3]. Unfortunately, due to
the size of Commons, I needed to find another way to count them there.
Yesterday, I finished implementing the script for Commons and started
running it. As of today, we have accurate numbers for the quantity of
files missing machine-readable metadata on Commons: ~533,000, out of
~24 million [4]. It may seem like a lot, but I personally think it's a
great testament to the dedication of the Commons community.
Now that we have numbers, we can work on going through those files and
fixing them. Many of them are missing the {{information}} template,
but many of those are also part of a batch: either they were uploaded
by the same user, or they were mass-uploaded by a bot. In either case,
this makes it easier to parse the information and add the
{{information}} template automatically with a bot, thus avoiding
painful manual work.
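As a purely hypothetical illustration of the core of such a bot edit
(the helper and field values below are invented; a real bot, e.g. one
built on pywikibot, would parse each batch's existing text to fill in
the fields):

    # Hypothetical sketch: wrap a bare file description in {{Information}}.
    def wrap_in_information(description, date="", source="", author=""):
        return ("{{Information\n"
                "|description = {{en|1=%s}}\n"
                "|date = %s\n"
                "|source = %s\n"
                "|author = %s\n"
                "}}" % (description.strip(), date, source, author))

    print(wrap_in_information("View of the old town hall.",
                              date="2008-05-01",
                              source="originally uploaded to en.wikipedia",
                              author="[[User:Example|Example]]"))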
I invite you to take a look at the list of files at
https://tools.wmflabs.org/mrmetadata/commons/commons/index.html and
see if you can find such groups and patterns.
Once you identify a pattern, you're encouraged to add a section to the
Bot Requests page on Commons, so that a bot owner can fix them:
https://commons.wikimedia.org/wiki/Commons:Bots/Work_requests#Adding_the_In…
I believe we can make a lot of progress rapidly if we dive into the
list of files and fix all the groups we can find. The list and
statistics will be updated daily so it'll be easy to see our progress.
Let me know if you'd like to help but are unsure how!
[1] https://meta.wikimedia.org/wiki/File_metadata_cleanup_drive
[2] https://blog.wikimedia.org/2014/11/07/cleaning-up-file-metadata-for-humans-…
[3] https://tools.wmflabs.org/mrmetadata/
[4] https://tools.wmflabs.org/mrmetadata/commons/commons/index.html
--
Guillaume Paumier
On Fri, Dec 12, 2014 at 2:41 AM, Ricordisamoa
<ricordisamoa(a)openmailbox.org> wrote:
> On 11/12/2014 at 23:28, Dan Garry wrote:
>>
>> THIS IS AWESOME
>>
>> Do you know when we are going to be able to start querying this via an
>> API in production?
>>
>> The Mobile Apps Team would love to consume this data, as opposed to the
>> present data exposed via the CommonsMetadata API (which is scraped, eugh).
>
> As far as I understand, the information Guillaume is talking about is
> exactly what CommonsMetadata scrapes.
> See https://tools.wmflabs.org/mrmetadata/how_it_works.html:
> «The script needs to go through all file description pages of a wiki, and
> check for machine-readable metadata by querying the CommonsMetadata
> extension.»
That's correct. However, just to be clear, CommonsMetadata doesn't
just scrape the HTML (or the wikitext) wholesale: it scrapes the HTML
for the machine-readable markers and exposes that information through
the API.
Until we have Structured Data (which is /at least/ a year out),
CommonsMetadata is still the best way to access that information.
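For instance, a minimal Python sketch of reading those markers through
the standard API (prop=imageinfo with iiprop=extmetadata) could look
like this:

    import requests

    API = "https://commons.wikimedia.org/w/api.php"

    def extmetadata(title):
        params = {"action": "query", "titles": title, "prop": "imageinfo",
                  "iiprop": "extmetadata", "format": "json"}
        data = requests.get(API, params=params).json()
        page = next(iter(data["query"]["pages"].values()))
        return page["imageinfo"][0]["extmetadata"]

    # Fields such as Artist, LicenseShortName and Credit come from the
    # machine-readable markers on the file description page.
    meta = extmetadata("File:Example.jpg")
    for key in ("Artist", "LicenseShortName", "Credit"):
        if key in meta:
            print(key, "=", meta[key]["value"])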
--
Guillaume Paumier
Now with the correct sender; apologies that this didn't go through at first.
---------- Forwarded message ----------
From: "Jonas Öberg" <jonas(a)shuttleworthfoundation.org>
Date: 11 Dec 2014 16:01
Subject: Re: [Commons-l] Elog.io now up w/ Commons data
To: "Wikimedia Commons Discussion List" <commons-l(a)lists.wikimedia.org>
Cc: <wikitech-l(a)lists.wikimedia.org>
Hi Cornelius!
For images that it matches against the catalog, it should give accurate
information. If it doesn't, use the "report" link to let us know!
You're right, though, that for images it doesn't find in its catalog, we
don't provide any information. That's the equivalent of saying "this
picture may or may not be openly licensed, but right now we have no
information to tell either way".
Sincerely,
Jonas
On 11 Dec 2014 15:57, "Cornelius Kibelka" <cornelius.kibelka(a)wikimedia.de>
wrote:
> Wow, what a nice and interesting browser extension. Congrats!
>
> Just a question: as far as I can see, the tool doesn't give the complete
> and correct licensing information, as the source is missing. Or am I
> mistaken?
>
> Best
> Cornelius
>
> 2014-12-10 19:30 GMT+01:00 Jonas Öberg <jonas(a)commonsmachinery.se>:
>
>> [...]
>
> --
> Cornelius Kibelka
>
> International Affairs
> Werkstudent | student trainee
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
>
> Tel.: +49 30 219158260
> http://wikimedia.de
>
> Imagine a world in which every human being has free access to the sum
> of all human knowledge. Help us achieve that!
> http://spenden.wikimedia.de/
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
> Registered in the register of associations of the Amtsgericht
> Berlin-Charlottenburg under number 23855 B. Recognized as charitable by
> the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
Hey everyone :)

I've been asked to enable access to the data on Wikidata for Commons.
I'm happy to make that happen. We'll enable access on December 2nd.

What does this mean? You will be able to access data from an item on
Wikidata, like the date of birth of an artist or the name of a city in
different languages. Where and how much you make use of that is up to
you to decide. You will be able to access the data in two ways. The
first one is the #property parser function
(https://meta.wikimedia.org/wiki/Wikidata/Notes/Inclusion_syntax). The
second one is via Lua
(https://www.mediawiki.org/wiki/Extension:Wikibase_Client/Lua).

There are two big caveats at this point:
1) You will only be able to access data for items that are connected
via a sitelink to the page you want to show the data on. We're
currently working on allowing access to data from any item. This
should be available around January/February.
2) You cannot use this to store metadata (like the date a picture
was taken or who took it) about individual files. This will in the
future be stored on Commons itself as part of the structured data
project (https://commons.wikimedia.org/wiki/Commons:Structured_data).
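For illustration, on a page whose item is connected by a sitelink,
{{#property:P571}} would render that item's "inception" value, and a
Lua module could do something along these lines (the property ID is
just an example; see the linked pages for the authoritative syntax):

    local p = {}
    function p.inception(frame)
        -- the item connected via sitelink to the current page (caveat 1)
        local entity = mw.wikibase.getEntityObject()
        if not entity then return "" end
        -- P571 is "inception"; formatPropertyValues renders its value(s)
        return entity:formatPropertyValues("P571").value
    end
    return p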
Please let me know if you have any questions. I am looking forward to
more integration between Commons and Wikidata and all the things this
will make possible. It'd be great if you could help with updating and
expanding https://commons.wikimedia.org/wiki/Commons:Wikidata. The
relevant page on Wikidata is
https://www.wikidata.org/wiki/Wikidata:Wikimedia_Commons.
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as charitable by
the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
[sorry for cross-posting]
Hi there,
maybe some of you have seen it already: Wikimedia Deutschland, the German
Commission for UNESCO, and the North Rhine-Westphalian Library Service
Centre have just published a guide on how to correctly use Creative
Commons licenses.
You can read all about it here:
https://blog.wikimedia.org/2014/12/09/using-licenses-easy-and-legal/.
The guide also has a pretty nice Meta page (
https://meta.wikimedia.org/wiki/Open_Content_-_A_Practical_Guide_to_Using_C…)
where you can read the full text or download the PDF. Thanks to Jean-Fred
for turning on the translation tool! I am looking forward to the guide
being available in many, many languages.
If you have any comments or questions, please get in touch with me via
e-mail or the talk page on Meta (
https://meta.wikimedia.org/wiki/Talk:Open_Content_-_A_Practical_Guide_to_Us…
).
Best,
Katja
--
Katja Ullrich
Politik & Gesellschaft
-------------------------------------
Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Telefon 030 - 219 158 26-0
www.wikimedia.de
Imagine a world in which every human being can freely share in the sum
of all knowledge. Help us achieve that!
http://spenden.wikimedia.de/
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as charitable by
the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
> > Message: 4
> > Date: Thu, 4 Dec 2014 14:58:37 -0500
> > From: "Sreejith K." <sreejithk2000(a)gmail.com>
> > To: Wikimedia Commons Discussion List <commons-l(a)lists.wikimedia.org>
> > Subject: Re: [Commons-l] Duplicate removal?
> >
> > I am using Wikimedia APIs to create a gallery of duplicates and
> > routinely clean them. You can see the results here.
> >
> > https://commons.wikimedia.org/wiki/User:Sreejithk2000/Duplicates
> >
> > The page also has a link to the script. If anyone is interested in using
> > this script, let me know and I can work with you to customize it.
> >
> > - Sreejith K.
> >
> >
>
See also https://commons.wikimedia.org/wiki/Special:ListDuplicatedFiles,
which lists the files with the most byte-for-byte duplicates (most of
the time those should really be file redirects).
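For anyone who wants to script something similar, here's a minimal
Python sketch (not Sreejith's actual script) using the standard API's
prop=duplicatefiles module, which matches files by their SHA-1 hash:

    import requests

    API = "https://commons.wikimedia.org/w/api.php"

    def duplicates(title):
        params = {"action": "query", "titles": title,
                  "prop": "duplicatefiles", "dflimit": "max",
                  "format": "json"}
        data = requests.get(API, params=params).json()
        for page in data["query"]["pages"].values():
            for dup in page.get("duplicatefiles", []):
                # "shared" marks duplicates that live on a shared repo
                print(dup["name"],
                      "(shared)" if "shared" in dup else "(local)")

    duplicates("File:Example.jpg")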
--
Thanks Jonas for experimenting with this sort of thing. I always wished we
did something with perceptual hashes internally in addition to the sha1
hashes we use currently.
--bawolff
Hi Jesse,
Thanks for sharing this nice success story!
I’m really happy to hear that over 400 videos were added to articles within three weeks of your event.
This is a great accomplishment, given that there is still not a lot of video on Wikipedia at this time. Nicely done!
I just tweeted it here, if you’d like to retweet:
https://twitter.com/fabriceflorin/status/539847861178212353
I also recommended it to our social media team, and am sharing it with our multimedia and commons mailing lists.
Keep up the great work :)
Be well,
Fabrice
> On Dec 2, 2014, at 5:24 AM, Jesse de Vos <jdvos(a)beeldengeluid.nl> wrote:
>
> Hi all,
>
> I have written a blog post about our positive(!) experiences with organizing a video challenge on the UNESCO World Day for Audiovisual Heritage. You can find it here:
>
> http://www.beeldengeluid.nl/en/blogs/research-amp-development-en/201412/vid…
>
> Most notably, over 400 videos were added to articles within three weeks.
> We're very open to suggestions on how we can improve this type of 'contest', and perhaps there are people who would like to join in for next year's World Day of Audiovisual Heritage (7 October 2015)? :)
>
> Best,
> Jesse
> --
> Kind regards,
>
> Jesse de Vos
> GLAM-wiki coordinator
>
> T 035 - 677 39 37
> Available: Mon, Tue, Thu
>
> Nederlands Instituut voor Beeld en Geluid
> Media Parkboulevard 1, 1217 WE Hilversum | Postbus 1060, 1200 BB Hilversum | beeldengeluid.nl
_______________________________
Fabrice Florin
Product Manager, Multimedia
Wikimedia Foundation
https://www.mediawiki.org/wiki/User:Fabrice_Florin_(WMF)