On 8/25/07, Erik Moeller <erik(a)wikimedia.org> wrote:
On 8/26/07, Bryan Tong Minh <bryan.tongminh(a)gmail.com> wrote:
What would be really useful is an SHA archive of the internet ;P
Imagine that we can find the source of an image by just looking it up
in the archive.
That actually should be doable for the major image search engines.
I'll try to get the idea passed around a bit at least.
If you do end up in one of these conversations: also try to get a
sense of how they'd feel about generating a lookup key for fuzzy
matching as well.
SHA-1 will allow us to catch bit-identical duplicates, but it fails if
someone resizes, crops, recompresses, or strips EXIF, even if the
change isn't visible. It would be a good first step, but it is trivial
to evade, even accidentally.
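
To make the limitation concrete, here is a minimal sketch of an
exact-match lookup keyed on SHA-1. The in-memory index, the helper
names (sha1_key, register, lookup), and the example URL are all
hypothetical; a real archive would need a proper datastore. The byte
string stands in for an image file.

```python
import hashlib

def sha1_key(data: bytes) -> str:
    """Return the hex SHA-1 digest used as the exact-match lookup key."""
    return hashlib.sha1(data).hexdigest()

# Hypothetical in-memory index: digest -> first known source.
index = {}

def register(data: bytes, source: str) -> None:
    """Record the source for this exact byte sequence, keeping the first one seen."""
    index.setdefault(sha1_key(data), source)

def lookup(data: bytes):
    """Return the recorded source, or None if these exact bytes are unknown."""
    return index.get(sha1_key(data))

# Stand-in for the bytes of an image file.
original = bytes(range(256)) * 4
register(original, "http://example.org/original.jpg")

# Flip a single bit, as an invisible recompression or EXIF edit might.
tweaked = original[:-1] + bytes([original[-1] ^ 1])

print(lookup(original))  # the registered source
print(lookup(tweaked))   # None: one changed bit defeats the match
```

Any change to the file, however small, yields an unrelated digest, so
the lookup only ever finds bit-identical copies.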
I've been working on writing software for doing fuzzy image matching.
It has been a low priority project that I've worked on off-and-on for
the last few months so it is slow in coming, but I will eventually
produce something good or someone else will beat me to it.
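
For illustration only, here is one generic way a fuzzy lookup key can
be built: an "average hash" over a coarse grid. This is not the
software described above, whose method this message does not specify;
the function names and the synthetic test image are assumptions. It
works on a plain 2D list of grayscale values so that it needs no
imaging library.

```python
def average_hash(pixels, size=8):
    """Return a size*size-bit key for a grayscale image.

    pixels is a 2D list of rows; this sketch assumes both dimensions
    are multiples of size.
    """
    height, width = len(pixels), len(pixels[0])
    bh, bw = height // size, width // size
    cells = []
    for r in range(size):
        for c in range(size):
            # Average each block, which shrinks the image to size x size.
            block = [pixels[y][x]
                     for y in range(r * bh, (r + 1) * bh)
                     for x in range(c * bw, (c + 1) * bw)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per cell: is it brighter than the overall mean?
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming(a, b):
    """Count the bits that differ between two keys."""
    return bin(a ^ b).count("1")

# Synthetic 32x32 gradient standing in for a decoded image.
image = [[x * 4 + y * 2 for x in range(32)] for y in range(32)]
brighter = [[p + 10 for p in row] for row in image]   # uniform brightness shift
half = [row[::2] for row in image[::2]]               # crude 16x16 downscale
inverted = [[255 - p for p in row] for row in image]  # a clearly different image

key = average_hash(image)
print(hamming(key, average_hash(brighter)))
print(hamming(key, average_hash(half)))
print(hamming(key, average_hash(inverted)))
```

Near-duplicates produce keys that differ in few or no bits, so a match
becomes "Hamming distance below some threshold" rather than equality.
A key this simple does not survive cropping, which is part of why the
real problem is harder.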
It isn't something that we should allow to slow down the introduction
of exact match searches, but it would be good to have the contacts
ready when we can propose doing something more.
Also related to this subject is the request I sent to the board a
while back on contacting copyright violation detection companies. I
never heard any response:
---------- Forwarded message ----------
From: Gregory Maxwell <gmaxwell(a)wikimedia.org>
Date: Feb 28, 2007 7:34 PM
Subject: Contacting copyright violation detection companies.
To: board-l(a)lists.wikimedia.org
There are several commercial companies that exist to help copyright
holders locate web sites which are infringing their copyrights.
The exact method of operation differs from company to company, but
all appear to involve the company running a web spider that goes out
and looks for possibly infringing content and all that I've found sell
this as a service to content holders.
For example, one company is:
Digimarc (http://www.digimarc.com/). With Digimarc's approach, content
holders add invisible watermarks to their content which Digimarc web
spiders detect. Digimarc also offers a no-cost software tool for
Windows/Mac which decodes and displays any embedded watermarks.
What I'd like to do is contact one or more of these companies to
explore opportunities for us to cooperate for our mutual benefit.
I see a number of benefits and a number of potential risks:
Benefits:
* Reduction in copyright violating content on our projects.
* Increased speed in detection of copyright violations.
* An independent indicator of the effectiveness of our communities'
ability to detect copyright violations.
* An opportunity to make public statements about our efforts,
differentiate ourselves from many other web 2.0 services, and
highlight our higher goals.
* Increased evidence of due care on our part which may be useful in
future legal disputes.
* Improved efficiency, since some of these services would spider us
anyway. Cooperation may yield decreased bandwidth usage, and without
our cooperation our method of notice will be DMCA takedown requests.
* Establishing a relationship before a possible change in legal
climate switches these companies into a 'charge the service provider'
business model.
Risks:
* Incorrect detection: some companies may falsely claim ownership of
public domain content.
* The detection company may consider us a potential customer and nag
us to purchase services.
* Loss of goodwill from interacting with companies whose purpose can
be publicly unpopular.
-- Forcing the takedown of illegally copied videos on YouTube garners
enough dislike, but many of these companies also play in the digital
restrictions management space.
It also may be possible that such companies might be interested in a
live media feed, possibly a service we could sell them, or possibly
income we forgo in the spirit of cooperation and mutual benefit. I
suspect that we're an unattractive enough target and good enough at
policing ourselves that no one would be interested in paying.
I believe that the first risk can be resolved by setting this up to
provide input to the community rather than some sort of automatic
upload restriction. The second point is harder to address.
What I'm looking for is permission to make contact and see what
possibilities exist. I would then report back to the board with my
findings.
I'm also interested in any guidance on what we are willing to do
which I could use in my initial discussions. For example, would we be
willing to run a non-free, company-provided watermark detection tool
to avoid having to send all our uploads off for checking?
I've been on the lookout for researchers interested in developing open
source fuzzy image comparison tools for our own checking purposes (for
example, to detect uploads of previously deleted content). I think
that such tools will be important in the long term, but the proposed
cooperation would not be mutually exclusive and would serve a
different but related purpose.
(I'm not on the board list, so remember to copy me on replies)