Hi Adrien,
This looks very interesting - I'm happy to see your work, and I briefly looked into your sources and API. With your 440 000 images, do you have any clear idea about the accuracy of ORB? To explain: I'm working on Elog.io, which provides a *similar* service and API[1] to yours, but uses a rather different algorithm and store, and targets a different use case. Our algorithm is a variant of the Blockhash[2] algorithm, which does no feature detection at all, but which can easily run in a browser or on a mobile platform (we have versions for JavaScript, C and Python) to generate 256-bit hashes of images. We then determine the quality of a match with a Hamming distance calculation.
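To make the match step concrete, here is a minimal sketch in Python of comparing two such hashes by Hamming distance. The threshold value is purely illustrative, not Elog.io's actual setting:

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the number of differing bits between two equal-length hex hash strings."""
    a = int(hash_a, 16)
    b = int(hash_b, 16)
    # XOR leaves a 1 bit wherever the hashes differ; count those bits.
    return bin(a ^ b).count("1")

# Illustrative threshold (assumption): two 256-bit hashes are considered
# a match if they differ in at most this many bits.
MATCH_THRESHOLD = 10

def is_match(hash_a: str, hash_b: str) -> bool:
    """Decide whether two hashes are close enough to call a match."""
    return hamming_distance(hash_a, hash_b) <= MATCH_THRESHOLD
```

With 256-bit hashes ("ff" * 32 is one such hash in hex), two hashes differing in a single bit would comfortably pass the threshold, while unrelated images should differ in roughly half their bits.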
We work primarily on a use case of verbatim use, with a user getting images from Wikimedia and re-using them elsewhere. Algorithms without feature detection give very bad results for any modification of an image, like rotation or cropping. Since that's not within our use case, it works for us, though the flip side is of course that you can't photograph something (a newspaper article with an image, for instance) and then match it against a set of images, as a feature-based approach lets you do.
The other difference is that our database store isn't specifically tailored to our hashes: we use W3C Media Annotations to store any kind of metadata about images, and could equally well store your ORB signatures assuming they can be serialised.
To give you some numbers: for our use cases (verbatim use, potentially with a format change such as jpg->png, and scaling down to 100px width) we can successfully match ca. 87% of cases, and we have a collision rate (different images resulting in the same or nearly the same hashes) of ca. 1.2%. Both numbers are measured against the Wikimedia Commons set.
While we currently have the full ~22M images from Wikimedia Commons in our database, we're still ironing out the kinks of the system and making some additional improvements. If you think we should consider ORB instead of, or in addition to, our current algorithms, we'd love to give that a try, and it would obviously be very interesting if we ended up with signatures compatible with your database.
Sincerely, Jonas
[1] http://docs.cmcatalog.apiary.io
[2] http://blockhash.io
On 24 November 2014 at 11:25, Adrien Maglo adrien@visualink.io wrote:
Hello,
I am not sure this is the right mailing list to introduce this project, but I have just released Displee. It is a small Android app that lets you search for images in the English Wikipedia by taking pictures: https://play.google.com/store/apps/details?id=org.visualink.displee It is a kind of open source Google Goggles for images from en.wikipedia.org.
I developed Displee as a demonstrator of Pastec http://pastec.io, my open source image recognition index and search engine for mobile apps. The index hosted on my server in France currently contains about 440 000 images. They may not be the most relevant ones, but this is a start. ;-) I also have other ideas to improve this tiny app if it is of interest to the community.
Displee source code (MIT) is available here: https://github.com/Visu4link/displee
Pastec source code (LGPL) is available here: https://github.com/Visu4link/pastec
The source code of the Displee back-end is not released yet. It is basically a Python 3 Django application.
I will be glad to receive your feedback and answer any question!
Best regards,
-- Adrien Maglo Pastec developer http://www.pastec.io +33 6 27 94 34 41
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l