I have proposed using locality-sensitive hashing (LSH) algorithms for at least three different purposes in the past. All were turned down. That is probably because LSHs are difficult to understand, and, not to forget, there is a fair bit of fighting over what is and what is not a "real" LSH. There has also been a proposal to remove the SHA-1 digest for the revision, which I guess shows how hard it is to argue for the necessity of hashes.
If we want to do LSH for media, then we should probably check which DCT variant gives the best performance. In particular, we should check whether there are methods that give smaller footprints and faster calculation and comparison. Media streams can also be fingerprinted by using clip points. Also, as the DCT is closely related to the Fourier transform (it amounts to a real-valued, cosine-only Fourier transform), it could be interesting to check out cepstrum-based transforms.
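To make the DCT idea a bit more concrete, here is a minimal sketch of a DCT-based fingerprint in the style of the common "pHash" recipe. It is only an illustration under my own assumptions, not a proposal for a specific implementation: the function names (dct_matrix, dct_hash, hamming), the 32x32 working size, the 8x8 low-frequency block, and the crude nearest-neighbour downscale are all hypothetical choices, and it uses only NumPy.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix (rows are frequencies)."""
    k = np.arange(n).reshape(-1, 1)   # frequency index
    i = np.arange(n).reshape(1, -1)   # sample index
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)        # DC row has its own scale factor
    return m

def dct_hash(gray, size=32, keep=8):
    """Return a keep*keep-bit boolean fingerprint of a 2-D grayscale array.

    Assumption: `gray` is at least size x size; the downscale here is a
    crude nearest-neighbour pick, a real tool would low-pass filter first.
    """
    gray = np.asarray(gray, dtype=float)
    rows = np.arange(size) * gray.shape[0] // size
    cols = np.arange(size) * gray.shape[1] // size
    small = gray[np.ix_(rows, cols)]
    d = dct_matrix(size)
    coeffs = d @ small @ d.T          # separable 2-D DCT
    low = coeffs[:keep, :keep].ravel()
    # Threshold against the median of the AC terms; the DC term (index 0)
    # is left out of the median so overall brightness does not shift it.
    return low > np.median(low[1:])

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return int(np.count_nonzero(a != b))
```

Two files would then be compared with hamming(dct_hash(a), dct_hash(b)), where a small distance suggests a "close" match. The threshold for "small" is something we would have to tune on real uploads.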
Related to this is also face recognition, but then we must discuss various methods for generating eigenfaces. Not sure if this is the proper forum for that!
On Sun, May 12, 2019 at 1:41 PM Fæ faewik@gmail.com wrote:
A couple of years ago, one proposed project was for the WMF to pay for access to the Google image matching API so we could run a copyvio bot on the live new uploads list. Such a bot would not be terribly hard to get working, and would be a great experiment to see if this aspect of the more boring side of sysop tools could be reduced.[1]
Not specifically advocating auto-deletion, but daily housekeeping image matches to highly likely copyrighted categories would make mass housekeeping very easy.
A separate old chestnut was my proposal to introduce systemic image hashes, which neatly show "close" image matches.[2] With a Commons hat on, such a project would be of far more immediate pragmatic use than mobile-related and structured data-related projects that seem to suck up all the oxygen and volunteer time available.
Note that the history of these project/funding ideas is so long, that several of the most experienced long term volunteers that were originally interested have since retired. Without some positive short term encouragement, not only do these ideas never reach the useful experiment stage, but the volunteers involved simply fade away.
Links
- https://commons.wikimedia.org/wiki/Commons:Village_pump/Archive/2016/02#Goog...
- https://commons.wikimedia.org/wiki/User:Fae/Imagehash
Fae
On Sun, 12 May 2019 at 12:21, Amir Sarabadani ladsgroup@gmail.com wrote:
IMO Commons needs either a ClueBot NG for new uploads or ORES support for images that might be copyright violations, or both.
Best
On Sun, May 12, 2019 at 1:10 PM Yaroslav Blanter ymbalt@gmail.com wrote:
Just the active community itself is too small, compared with the amount of material it has to deal with.
Cheers Yaroslav
On Sun, May 12, 2019 at 1:07 PM Benjamin Ikuta benjaminikuta@gmail.com wrote:
Is the shortage of admins due to a lack of people willing or able to do the job, or increasing difficulty in obtaining the bit?
On May 12, 2019, at 3:55 AM, Tomasz Ganicz polimerek@gmail.com wrote:
Well, actually, at the moment it looks like they are all undeleted.
The good habit, which I kept when organizing several GLAM-related mass uploads, was to create a project page on Commons describing what is intended to be uploaded, preferably in English. Then you can create a project template to mark all the uploads with.
See: https://commons.wikimedia.org/wiki/Commons:Partnerships
Besides the practical benefit of avoiding unnecessary clashes with Commons admins, creating a template and project page helps to promote your project across Wikimedia communities and may inspire others to do something similar.
Commons is indeed quite a hostile environment for uploaders, but on the other hand it is constantly flooded by hundreds of copyright-violating files a day.
See the list from just one day:
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/2019/05/01
So this hostility works both ways: Commons admins have to cope with aggressive, hostile copyright violators every day, and after some time they decide to leave or become hostile themselves... And the other issue is the decreasing number of active admins and OTRS agents.
I think that, sooner or later, this whole system (uploads, screening of uploads by admins, and OTRS agreements) needs deep rethinking.
On Sun, 12 May 2019 at 10:48, Mister Thrapostibongles <thrapostibongles@gmail.com> wrote:
Hello all,
There seems to be a dispute between the Outreach and the Commons components of The Community, judging by the article "Wikimedia Commons: a highly hostile place for multimedia students contributions" at the Education Newsletter:
https://outreach.wikimedia.org/wiki/Education/News/April_2019/Wikimedia_Comm...
As far as I can understand it, some students on an Outreach project uploaded some rather well-made video material, and someone on Commons deleted it because it appeared too well-made to be student work and so concluded it was a copyright violation. But some rather odd remarks were made: "Commons has to fight the endless stream of uploaded copyrighted content on behalf of a headquarters in San Francisco that doesn't care." and "you have regarded Commons as little more than free cloud storage for images you intend to use on Wikipedia".
Perhaps the Foundation needs to resolve this dispute?
Thrapostibongles
_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe
-- Tomek "Polimerek" Ganicz http://pl.wikimedia.org/wiki/User:Polimerek http://www.ganicz.pl/poli/
-- Amir (he/him)
faewik@gmail.com https://commons.wikimedia.org/wiki/User:Fae