Hi Adrien,
This looks very interesting - I'm happy to see your work, and I briefly
looked into your sources and API. With your 440 000 images, do you
have any clear idea about the accuracy of ORB? To explain: I'm working
on Elog.io, which provides a *similar* service and API[1] to yours,
but uses a rather different algorithm and store, and a different use
case. Our algorithm is a variant of a Blockhash[2] algorithm, which
does not do any feature detection at all, but which can easily run in
a browser or mobile platform (we have versions for JavaScript, C and
Python) to generate 256-bit hashes of images. With a Hamming distance
calculation, we then determine the quality of a match.
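To make the matching step concrete, here is a minimal sketch in Python of a Hamming-distance comparison between two 256-bit hashes. The hash values and the match threshold are made-up illustrations, not real Blockhash output or our production settings:

```python
def hamming_distance(hash1: str, hash2: str) -> int:
    """Count the number of differing bits between two
    equal-length hex-encoded hash strings."""
    a = int(hash1, 16)
    b = int(hash2, 16)
    return bin(a ^ b).count("1")

# Two hypothetical 256-bit hashes (64 hex characters each),
# differing in exactly one bit (last digit: 0xf vs 0xe).
h1 = "f" * 64
h2 = "f" * 63 + "e"

distance = hamming_distance(h1, h2)
print(distance)  # 1

# A match decision would then compare against some threshold;
# the value 10 here is purely illustrative.
THRESHOLD = 10
is_match = distance <= THRESHOLD
```

The nice property of this scheme is that the comparison is a single XOR plus a popcount, so it stays cheap even when scanning large candidate sets.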
We work primarily on a use case of verbatim use, with a user getting
images from Wikimedia and re-using them elsewhere. Algorithms without
feature detection give very bad results for any modifications to an
image, like rotating, cropping, etc. But since that's not within our
use case, it works. The flip side of this is of course that you can't
expect to photograph something (a newspaper article with an image, for
instance) and then match it against a set of images, as you would
expect to be able to do with your approach.
The other difference is that our database store isn't specifically
tailored to our hashes: we use W3C Media Annotations to store any kind
of metadata about images, and could equally well store your ORB
signatures assuming they can be serialised.
To give you some numbers: for our use case (verbatim use, potentially
with a format change such as JPG->PNG, and scaling down to 100px
width) we can successfully match ca. 87% of cases, and we have a
collision rate (different images resulting in the same or nearly the
same hash) of ca. 1.2%. Both numbers are against the Wikimedia Commons
set.
While we currently have the full ~22M images from Wikimedia Commons in
our database, we're still ironing out the kinks of the system and
making some additional improvements. If you think that we should
consider ORB instead of or in addition to our current algorithms, we'd
love to give that a try, and it would obviously be very interesting if
we could end up with signatures compatible with your database.
Sincerely,
Jonas
[1] http://docs.cmcatalog.apiary.io
[2] http://blockhash.io
On 24 November 2014 at 11:25, Adrien Maglo <adrien(a)visualink.io> wrote:
Hello,
I am not sure this is the right mailing list to introduce this project but I
have just released Displee, a small Android app that lets you search
for images on the English Wikipedia by taking pictures:
https://play.google.com/store/apps/details?id=org.visualink.displee
It is a kind of open source Google Goggles for images from
en.wikipedia.org.
I have developed Displee as a demonstrator of Pastec
(http://pastec.io), my open-source image recognition index and search
engine for mobile apps.
The index hosted on my server in France currently contains about 440 000
images. They may not be the most relevant ones but this is a start. ;-)
I also have other ideas for improving this tiny app if it is of
interest to the community.
Displee source code (MIT) is available here:
https://github.com/Visu4link/displee
Pastec source code (LGPL) is available here:
https://github.com/Visu4link/pastec
The source code of the Displee back-end is not released yet. It is
basically a Python 3 Django application.
I will be glad to receive your feedback and answer any question!
Best regards,
--
Adrien Maglo
Pastec developer
http://www.pastec.io
+33 6 27 94 34 41
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
--
Jonas Öberg, Founder & Shuttleworth Foundation Fellow
Commons Machinery | jonas(a)commonsmachinery.se
E-mail is the fastest way to my attention