Dear Wikimedia friends,
We’re happy to be able to share some news and information around the
Wikimedia Conference with you:
1) Report
Today, we published our report on the Wikimedia Conference 2016. It
not only describes our tasks and activities before, during and after the
conference, but also provides a comprehensive overview of our learnings and
experience. The report is meant to serve as a learning resource, and based
on it, we will share more learning patterns in the upcoming months.
https://meta.wikimedia.org/wiki/Wikimedia_Conference_2016/Report
2) Follow-up page
To better show what continuities and links exist between the Wikimedia
Conferences, we have created a follow-up page which includes the major topics
of the last two conferences. We will share more updates in the upcoming months.
https://meta.wikimedia.org/wiki/Wikimedia_Conference/Follow-Up
3) First learning pattern
Together with our “Visiting Wikimedian” Teele Vaalma, who supported us
tremendously in the organization of the Wikimedia Conference, we have
written a first learning pattern on how to find the right venue for a
conference (including a checklist!).
https://meta.wikimedia.org/wiki/Grants:Learning_patterns/Step_by_step_guide…
4) Wikimedia Conference 2017
The next edition of the Wikimedia Conference will take place in Berlin
between March 31st and April 2nd, 2017. We expect the registration to begin
in mid-November and will share a timeline in the upcoming weeks.
https://meta.wikimedia.org/wiki/Wikimedia_Conference_2017
Cheers & good reading,
Cornelius, Nicole, Wenke and Daniela
(WMCON organizing team)
--
Cornelius Kibelka
Program and Engagement Coordinator (PEC), GHM
for the Wikimedia Conference
Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Tel. (030) 219 158 26-0
http://wikimedia.de
Imagine a world in which every person can freely share in the sum of all
knowledge. Help us make it happen!
http://spenden.wikimedia.de/
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/029/42207
I would like to hear from you whether it was worth having the AffCom seal,
and what the advantages were of stopping the process then under way and
restarting along the required lines.
What were the benefits of driving volunteers away, to the point of permanent
blocks without reason [1]?
And whether, in your view, it was good for the Wikimedia Movement in Brazil
as a whole.
Was it worth it?
----
And another question, for those who said that I was driving volunteers away:
after 3 years without my being active here, was I the problem?
Kisses.
[1]
https://br.wikimedia.org/wiki/Usu%C3%A1rio_Discuss%C3%A3o:Rodrigo_Tetsuo_Ar…
--
Rodrigo Tetsuo Argenton
rodrigo.argenton(a)gmail.com
+55 11 979 718 884
Sorry but I had a very different reaction to that essay. While there are
some useful kernels of truth, they were overshadowed by dreck.
I get why nonprofits prefer unrestricted grants.
I get why donors prefer restricted grants.
In general, my sympathy is with the donors. There may be times, hopefully
many times, when the donor is generally supportive of the work of the
nonprofit; then an unrestricted grant is the best way to go. There are
other times when a donor is seeking to accomplish some goal, casts about to
locate a nonprofit that might be able to achieve that goal, and undertakes
to provide a grant. In this case it would be absurd for the donor to
provide an unrestricted grant.
The essay started with an anecdote about onerous reporting requirements.
However, my response to that anecdote is different from the author's. The
grant wording didn't just pop up out of the blue; it almost certainly was
known to the nonprofit before the money changed hands. Did anyone read
it and realize that there was a mismatch in line items? A relatively naïve
donor may have just asked some low-level employee to dream up some sort of
reporting requirements. In that case, the time before acceptance would have
been a good time for a discussion to point out the mismatch in line items.
It is very possible the donor just wants some type of accounting and might
be flexible and willing to accept the nonprofit's line item structure.
Another possibility is that the donor is experienced and the line items
make sense, in which case some introspection about one's own line item
structure is warranted. The main point is that the discussion about the
reporting requirements should take place before the grant is accepted, not
be whined about after the grant is accepted.
Sphilbrick
On Mon, Jul 18, 2016 at 8:00 AM, <wikimedia-l-request(a)lists.wikimedia.org>
wrote:
> Send Wikimedia-l mailing list submissions to
> wikimedia-l(a)lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
> or, via email, send a message with subject or body 'help' to
> wikimedia-l-request(a)lists.wikimedia.org
>
> You can reach the person managing the list at
> wikimedia-l-owner(a)lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wikimedia-l digest..."
>
>
> Today's Topics:
>
> 1. Re: With my thanks to everyone ... (Jan-Bart de Vreede)
> 2. Essay: "We need to stop treating nonprofits the way society
> treats poor people" (Pine W)
> 3. Re: Essay: "We need to stop treating nonprofits the way
> society treats poor people" (Gerard Meijssen)
> 4. 100k articles written with Content Translation tool
> (Runa Bhattacharjee)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 17 Jul 2016 14:09:20 +0200
> From: Jan-Bart de Vreede <jdevreede(a)wikimedia.org>
> To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
> Subject: Re: [Wikimedia-l] With my thanks to everyone ...
> Message-ID: <E3088B2E-0ADC-47AB-9D88-A04C2717B553(a)wikimedia.org>
> Content-Type: text/plain; charset=utf-8
>
> Hi Geoff
>
> What can I say that others have not said earlier in this thread…
>
> Well… perhaps that you were a Rock(star) during my tenure as (vice-)chair
> of the Foundation Board of Trustees. Relying on you to give sound advice
> was one of the easiest decisions I ever made. You have shaped many of our
> governance aspects, and can take credit for a lot of our growth in
> professionalism and have given us great examples of how to have a
> department conduct community consultations. Your enthusiasm for the mission
> combined with your warm human side always made it a pleasure to work with
> you. There is a reason that there is a stereotype of the typical “Lawyer”,
> the reason is that you were able to clearly break through that ;) The world
> is a small place and I am sure that some of us will run into you sooner or
> later, if only because you simply get to enjoy Wikimania in Montreal as a
> volunteer?
>
> Many many many thanks for everything you have contributed to the
> Foundation over the past years.
>
> Jan-Bart
>
>
>
> > On 13 Jul 2016, at 23:25, Geoff Brigham <gbrigham(a)wikimedia.org> wrote:
> >
> > Hi all,
> >
> > Over the past five years, I’ve been honored to serve as the General
> > Counsel and Secretary of the Wikimedia Foundation. This job has been
> > amazing, and I’m grateful to everyone who has made it so rewarding. It's
> > now time for my next step, so, in the coming days, I will be leaving the
> > Foundation to pursue a new career opportunity.
> >
> > I depart with such love for the mission, the Foundation, the Wikimedia
> > communities, and my colleagues at work. I thank my past and present
> > bosses as well as the Board for their support and guidance. I stand in
> > awe of the volunteer writers, editors, and photographers who contribute
> > every day to the Wikimedia projects. And I will hold special to my heart
> > my past and current teams, including legal and community advocacy. :)
> > You have taught, given, and enriched me so much.
> >
> > After my departure, Michelle Paulson will serve as interim head of Legal,
> > and, subject to Board approval, Stephen LaPorte will serve as interim
> > Secretary to the Board. I can happily report that they have the
> > experience and expertise to ensure a smooth and professional transition.
> >
> > The future of the Foundation under Katherine's leadership is exciting.
> > Having had the pleasure of working for her, I know Katherine will take
> > the Foundation to its next level in promoting and defending the
> > outstanding mission and values of the Wikimedia movement. Although I'm
> > delighted about my next opportunity, I will miss this new chapter in the
> > Foundation's story.
> >
> > My last day at the Foundation will be July 18th. After that, I will take
> > a month off to recharge my batteries, and then I start my new gig at
> > YouTube in the Bay Area. There, I will serve as Director of YouTube Trust
> > & Safety, managing global teams for policy, legal, and anti-abuse
> > operations. As with Wikimedia, I look forward to learning from those
> > teams and tackling together a new set of exciting, novel challenges.
> >
> > For those who want to stay in touch, please do! My personal email is:
> > geoffrey.r.brigham(a)gmail.com.
> >
> > With respect, admiration, and gratitude,
> >
> > Geoff
> > _______________________________________________
> > Wikimedia-l mailing list, guidelines at:
> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
> > New messages to: Wikimedia-l(a)lists.wikimedia.org
> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 18 Jul 2016 00:50:02 -0700
> From: Pine W <wiki.pine(a)gmail.com>
> To: Wikimedia Mailing List <Wikimedia-l(a)lists.wikimedia.org>,
> Wikimedia Movement Affiliates discussion list
> <affiliates(a)lists.wikimedia.org>
> Subject: [Wikimedia-l] Essay: "We need to stop treating nonprofits the
> way society treats poor people"
> Message-ID:
> <CAF=
> dyJj34P_ocgSdZTjmhiZh3bWqpKsseN4GZt74k96u0MY13g(a)mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> This essay just landed in my inbox. It's good food for thought about how we
> (WMF and affiliates) manage our grant allocations, our relationships with
> major donors, and our reporting systems.
>
>
> http://nonprofitwithballs.com/2016/07/we-need-to-stop-treating-nonprofits-t…
>
> Pine
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 18 Jul 2016 11:14:44 +0200
> From: Gerard Meijssen <gerard.meijssen(a)gmail.com>
> To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
> Subject: Re: [Wikimedia-l] Essay: "We need to stop treating nonprofits
> the way society treats poor people"
> Message-ID:
> <CAO53wxUuq=
> hOncKPr5jpnvwuh0t2scDQhy3FApZcp-NVumGjoQ(a)mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> Hoi,
> It resonates powerfully. I had a grant for a very specific deliverable. The
> money was earmarked for this, and instead of payment on delivery, I had to
> jump through all kinds of hoops.
>
> It may be that all the effort put into giving money to do things is well
> intentioned. However, have we ever looked at its effectiveness? Did we ever
> consider the cost of the effort of financial reporting? What we pay for
> is executing a project, and yet we require a financial report. It is
> demotivating.
> Thanks,
> GerardM
>
>
>
> ------------------------------
>
> Message: 4
> Date: Mon, 18 Jul 2016 16:45:37 +0530
> From: Runa Bhattacharjee <rbhattacharjee(a)wikimedia.org>
> To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
> Subject: [Wikimedia-l] 100k articles written with Content Translation
> tool
> Message-ID:
> <CAE7QTsQTOym=
> nAc3ze60Gj+rNkP4bdOME_QNBCQRWc+k5G8KKA(a)mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> Hello everyone,
>
> Our barely 1-year old Content Translation Tool just passed 100,000
> translations <https://en.wikipedia.org/wiki/Special:ContentTranslationStats>.
> We've made a video to celebrate the achievement:
> Wikimedia Commons:
>
> https://commons.wikimedia.org/wiki/File:The_Wikipedia_Content_Translation_T…
> Facebook:
> https://www.facebook.com/wikipedia/
> Twitter:
> https://twitter.com/Wikipedia/status/754377307377197060
> YouTube:
> https://www.youtube.com/watch?v=3btQ5fpn4sA
> Vimeo:
> https://vimeo.com/174526242
>
> You can read more about it on the blog:
> https://blog.wikimedia.org/2016/07/16/content-translation-milestone/
>
> Thanks
> Runa
>
> --
> Language Engineering Manager
> Outreach and QA Coordinator
> Wikimedia Foundation
>
>
>
> ------------------------------
>
> End of Wikimedia-l Digest, Vol 148, Issue 29
> ********************************************
>
Hey, after discussion with WMF folks and others, I figured I should disclose
the following, out of respect.
In addition to founding craigslist, I've been focused on philanthropy via
craigconnects.org, specifically supporting nonprofit journalism, which I
feel includes Wikipedia. I just posted regarding the ethics of funding
nonprofit journalism, and thought this would be relevant here.
Further, given my substantial existing and future Wikipedia grants, I've
decided I need to disclose everything I do that's related, limited only by
the do no harm principle. That is, I'll be subjecting everything I do re
Wikipedia to public scrutiny, and this is the start of that.
Related:
https://blog.wikimedia.org/2016/06/08/craig-newmark-wikipedia-future/
http://craigconnects.org/2016/07/funding-non-profit-journalism-be-transpare…
Craig Newmark
founder, craigslist
Dear all,
This letter is a public call for interested people to serve as Board
Governance Committee (BGC) Volunteer and Advisory members. The BGC can have
non-voting members participate in the Committee on an annual basis [1].
This opportunity has never been used by the BGC before, so we decided to
frame the process the same way the Audit Committee did in April [2].
The BGC has identified our priorities for the next 12 months. The minutes
of our first meeting are published here [3]. Please read the document and
consider whether you can help us achieve them (and which tasks exactly you
would like to work on).
We need people who are interested in results and constructive work, not in
“one more hat to wear”, so you need to be seriously committed and be aware
that if your participation does not add value, we shall remove you from
this position.
*To submit your candidacy, please send me (ntymkiv at wikimedia.org) your
resume, the top 3 reasons why you want to do this and the top 3 things you
will add to the Board Governance Committee. Please also indicate the
priorities you would like to work on and/or other things that (to your
mind) we failed to include there.*
The selection criteria:
- 2+ years of being a board member (Wikimedia affiliates / other comparable
organizations; experience in a large NGO would be a bonus)
- Solid understanding of the mission of Wikimedia Foundation, general
understanding of the Movement, and the Foundation's affiliates and partners
and willingness to be a trusted advisor to the Board of Trustees
- Capacity to commit an estimated 20-30 hours annually to attend both
quarterly and other ad-hoc meetings, prepare or review required materials,
and interact with committee, staff and/or board as required
- Prior executive experience (CEO, Finance, HR, Operations roles etc) would
be a bonus
- Demonstrated track record of involvement with the Wikimedia movement
outside one's own affiliate or community (would be a bonus)
The timeline for the selection process (tight so as to have selected
volunteers attend the next meeting):
- Candidates submit their interest and the above information to me no later
than July 21
- The BGC will interview top candidates by July 31
- The BGC will select the candidates by August 05
- Selected candidates join the committee by August 15 and attend the next
Committee meeting.
Please forward this letter to any list you think is appropriate and
directly to people who may be interested.
Best regards,
antanana / Nataliia Tymkiv
[1]
https://wikimediafoundation.org/wiki/Resolution:Approving_the_revised_Board…
[2] https://lists.wikimedia.org/pipermail/wikimedia-l/2016-April/083638.html
[3]
https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Board_Governance_Commi…
This message is available on Meta-Wiki:
https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Board_Governance_Commi…
I use search to find typos and misused words, so I'm guilty of some of the
gibberish-looking searches
<https://en.wikipedia.org/wiki/User:WereSpielChequers/searches>.
If we are concerned that some common searches could have privacy
implications, why not create the list as a deleted page and announce its
(non)existence on the admins' noticeboard?
WSC
On 15 July 2016 at 19:25, <wikimedia-l-request(a)lists.wikimedia.org> wrote:
> Today's Topics:
>
> 1. Re: [discovery] Fwd: Improving search (sort of) (Dan Garry)
> 2. Re: [discovery] Fwd: Improving search (sort of) (James Heilman)
> 3. Re: [discovery] Fwd: Improving search (sort of) (James Heilman)
> 4. Re: [discovery] Fwd: Improving search (sort of) (Robert Fernandez)
> 5. Re: [discovery] Fwd: Improving search (sort of) (Nathan)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 15 Jul 2016 09:05:54 -0700
> From: Dan Garry <dgarry(a)wikimedia.org>
> To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
> Cc: A public mailing list about Wikimedia Search and Discovery
> projects <discovery(a)lists.wikimedia.org>, Trey Jones
> <tjones(a)wikimedia.org>
> Subject: Re: [Wikimedia-l] [discovery] Fwd: Improving search (sort of)
> Message-ID:
> <
> CAOW03MHsgowW-gAd6uDJs_ONvA8ZNiUyKcCrP2evOK1B+2DOZA(a)mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> On 15 July 2016 at 08:44, James Heilman <jmh649(a)gmail.com> wrote:
> >
> > Thanks for the in depth discussion. So if the terms people are using that
> > result in "zero search results" are typically gibberish why do we care if
> > 30% of our searches result in "zero search results"? A big deal was made
> > about this a while ago.
> >
>
> Good question! I originally used to say that it was my aspiration that
> users should never get zero results when searching Wikipedia. As a result
> of Trey's analysis, I don't say that any more. ;-) There are many
> legitimate cases where users should get zero results. However, there are
> still tons of examples of where giving users zero results is incorrect;
> "jurrasic world" was a prominent example of that.
>
> It's still not quite right to say that *all* the terms that people use to
> get zero results are gibberish. There is an extremely long tail
> <https://en.wikipedia.org/wiki/Long_tail> of zero results queries that
> aren't gibberish, it's just that the top 100 are dominated by gibberish.
> This would mean we'd have to release many, many more than the top 100,
> which significantly increases the risk of releasing personal information.
>
>
> > If one were just to look at those search terms that more than 100 IPs
> > searched for, would that not remove the concerns about anonymity? One
> > could also limit the length of the searches displayed to 50 characters,
> > and just provide the first 100 with an initial human review to make sure
> > we are not missing anything.
> >
>
> The problem with this is that there are still no guarantees. What if you
> saw the search query "DF198671E"? You might not think anything of it, but I
> would recognise it as an example of a national insurance number
> <https://en.wikipedia.org/wiki/National_Insurance_number>, the British
> equivalent of a social security number [1]. There's always going to be the
> potential that we accidentally release something sensitive when we release
> arbitrary user input, even if it's manually examined by humans.
>
> So, in summary:
>
> - The top 100 zero results queries are dominated by gibberish.
> - There's a long tail of zero results queries, meaning we'd have to
> release many more than the top 100.
> - Manually examining the top zero results queries is not a foolproof way
> of eliminating personal data since it's arbitrary user input.
>
> I'm happy to answer any questions. :-)
>
> Thanks,
> Dan
>
> [1]: Don't panic, this example national insurance number is actually
> invalid. ;-)
>
> --
> Dan Garry
> Lead Product Manager, Discovery
> Wikimedia Foundation
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 15 Jul 2016 10:19:08 -0600
> From: James Heilman <jmh649(a)gmail.com>
> To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
> Cc: A public mailing list about Wikimedia Search and Discovery
> projects <discovery(a)lists.wikimedia.org>, Trey Jones
> <tjones(a)wikimedia.org>
> Subject: Re: [Wikimedia-l] [discovery] Fwd: Improving search (sort of)
> Message-ID:
> <CAF1en7WBrxDJ_H3J=
> eN5NZmGueQEZ+txOGAG5u4af3FwTVV55Q(a)mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> The "jurrasic world" example is a good one as it was "fixed" by User:Foxj
> adding a redirect
> https://en.wikipedia.org/w/index.php?title=Jurrasic_world&action=history
>
> Agreed, we would need to be careful. The chance of many different IPs all
> searching for "DF198671E" is low, but I agree it is not zero, and we would
> need to have people review the results before they are displayed.
>
> I guess the question is how much work it would take to look at this sort of
> data for more examples like "jurrasic world".
>
> James
>
>
>
>
>
> --
> James Heilman
> MD, CCFP-EM, Wikipedian
>
> The Wikipedia Open Textbook of Medicine
> www.opentextbookofmedicine.com
>
>
> ------------------------------
>
> Message: 3
> Date: Fri, 15 Jul 2016 10:25:54 -0600
> From: James Heilman <jmh649(a)gmail.com>
> To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
> Cc: A public mailing list about Wikimedia Search and Discovery
> projects <discovery(a)lists.wikimedia.org>, Trey Jones
> <tjones(a)wikimedia.org>
> Subject: Re: [Wikimedia-l] [discovery] Fwd: Improving search (sort of)
> Message-ID:
> <
> CAF1en7VYkakrzZf6bMcCtv1dBj2NROSnY1Gv8BwyOEkg+yiTSw(a)mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> Forwarded at the request of Trey Jones
>
> Hey James,
>
> When we first started looking at zero results rate (ZRR), it was an easy
> metric to calculate, and it was surprisingly high. We still look at ZRR
> <https://searchdata.wmflabs.org/metrics/#failure_rate> because it is so
> easy to measure, and anything that improves it is probably a net positive
> (note the big dip when the new completion suggester was deployed!!), but we
> have more complex metrics that we prefer. There's user engagement
> <https://searchdata.wmflabs.org/metrics/#kpi_augmented_clickthroughs>/augmented
> clickthroughs, which combines clicks and dwell time and other user
> activity. We also use historical click data in a metric that improves when
> we move clicked-on results higher in the results list, which we use with
> the Relevance Forge
> <https://gerrit.wikimedia.org/r/#/admin/projects/wikimedia/discovery/relevan…>.
>
> And I didn't mean to give the impression that *most* zero-results queries
> are gibberish, though many, many are. And that was something we didn't
> really know a year ago. There are also non-gibberish queries that correctly
> get zero results, like most DOI
> <https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Survey_of_Zero-Resul…>
> and many media player
> <https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Survey_of_Zero-Resul…>
> queries.
> We also see a lot of non-notable (not-yet-notable?) public figures (local
> bands, online artists, youtube musicians), and sometimes just random names.
>
> The discussion in response to Dan's original comment in Phab mentions some
> approaches to reduce the risk of automatically releasing private info, but
> I still take an absolute stand against unreviewed release. If I can get a
> few hundred people to click on a link like this
> <https://en.wikipedia.org/w/index.php?title=Special:Search&profile=default&f…>,
> I can get any message I want on that list. (Curious? Did you click?) The
> message could be less anonymous and much more obnoxious, obviously.
>
> 50 character limits won't stop emails and phone numbers from making the
> list (which invites spam and cranks). Those can be filtered, but not
> perfectly.
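
To make the heuristics discussed in this thread concrete (a distinct-IP threshold, a 50-character cap, and pattern filters for emails and phone numbers), here is a minimal Python sketch. It is only an illustration under stated assumptions: the function name `candidate_queries`, the `(ip, query)` row shape, and both regular expressions are hypothetical and are not the Discovery team's actual tooling or log format.

```python
import re
from collections import defaultdict

# Thresholds taken from the proposal in this thread; both are tunable.
IP_THRESHOLD = 100      # keep queries searched by MORE than this many IPs
MAX_QUERY_LENGTH = 50   # drop queries longer than this many characters

# Deliberately rough patterns (assumptions): they will miss some cases.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
PHONE_RE = re.compile(r"(?:\d[\s().-]*){7,}")  # seven or more digits


def candidate_queries(log_rows, ip_threshold=IP_THRESHOLD,
                      max_len=MAX_QUERY_LENGTH):
    """Return (query, distinct_ip_count) pairs that pass the filters.

    log_rows is an iterable of (ip, query) pairs for zero-result searches.
    The output is only a candidate list for human review, not a list that
    is safe to publish automatically.
    """
    ips_by_query = defaultdict(set)
    for ip, query in log_rows:
        # Normalise case and whitespace so variants count together.
        ips_by_query[query.strip().lower()].add(ip)

    candidates = []
    for query, ips in ips_by_query.items():
        if len(ips) <= ip_threshold:
            continue  # too few distinct IPs
        if len(query) > max_len:
            continue  # too long to display
        if EMAIL_RE.search(query) or PHONE_RE.search(query):
            continue  # looks like an email address or phone number
        candidates.append((query, len(ips)))

    # Most widely searched first; ties broken alphabetically.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates
```

Note that a string such as "DF198671E" would pass all three filters if enough distinct IPs searched for it, and a coordinated click campaign can push any message past the IP threshold, so the sketch does not remove the need for the human review argued for above.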
>
> I've only looked at these top lists by day in the past, but on that time
> scale the top results are usually under 1000 count (and that includes IP
> duplicates), so the list of queries with 100 IPs might also be very small.
>
> As I said, I'm happy to do the data slogging to try this in a better
> fashion if this task is prioritized, and I'd be happy to be wrong about the
> quality of the results, but I'm still not hopeful.
>
> —Trey
>
> Trey Jones
> Software Engineer, Discovery
> Wikimedia Foundation
>
>
> On Fri, Jul 15, 2016 at 10:19 AM, James Heilman <jmh649(a)gmail.com> wrote:
>
> > The "jurrasic world" example is a good one as it was "fixed" by User:Foxj
> > adding a redirect
> > https://en.wikipedia.org/w/index.php?title=Jurrasic_world&action=history
> >
> > Agree we would need to be careful. The chance of many different IPs all
> > searching for "DF198671E" is low but I agree not zero and we would need
> > to have people run the results before they are displayed.
> >
> > I guess the question is how much work would it take to look at this sort
> > of data for more examples like "jurrasic world"?
> >
> > James
> >
> > On Fri, Jul 15, 2016 at 10:05 AM, Dan Garry <dgarry(a)wikimedia.org>
> wrote:
> >
> >> On 15 July 2016 at 08:44, James Heilman <jmh649(a)gmail.com> wrote:
> >> >
> >> > Thanks for the in depth discussion. So if the terms people are using
> >> that
> >> > result in "zero search results" are typically gibberish why do we care
> >> if
> >> > 30% of our searches result in "zero search results"? A big deal was
> made
> >> > about this a while ago.
> >> >
> >>
> >> Good question! I originally used to say that it was my aspiration that
> >> users should never get zero results when searching Wikipedia. As a
> result
> >> of Trey's analysis, I don't say that any more. ;-) There are many
> >> legitimate cases where users should get zero results. However, there are
> >> still tons of examples of where giving users zero results is incorrect;
> >> "jurrasic world" was a prominent example of that.
> >>
> >> It's still not quite right to say that *all* the terms that people use to
> >> get zero results are gibberish. There is an extremely long tail
> >> <https://en.wikipedia.org/wiki/Long_tail> of zero results queries that
> >> aren't gibberish, it's just that the top 100 are dominated by gibberish.
> >> This would mean we'd have to release many, many more than the top 100,
> >> which significantly increases the risk of releasing personal information.
> >>
> >>
> >> > If one was just to look at those search terms that more than 100 IPs
> >> > searched for would that not remove the concerns about anonymity? One could
> >> > also limit the length of the searches displayed to 50 characters. And just
> >> > provide the first 100 with an initial human review to make sure we are not
> >> > missing anything.
> >> >
> >>
> >> The problem with this is that there are still no guarantees. What if you
> >> saw the search query "DF198671E"? You might not think anything of it, but I
> >> would recognise it as an example of a national insurance number
> >> <https://en.wikipedia.org/wiki/National_Insurance_number>, the British
> >> equivalent of a social security number [1]. There's always going to be the
> >> potential that we accidentally release something sensitive when we release
> >> arbitrary user input, even if it's manually examined by humans.
> >>
> >> So, in summary:
> >>
> >> - The top 100 zero results queries are dominated by gibberish.
> >> - There's a long tail of zero results queries, meaning we'd have to
> >> release many more than the top 100.
> >> - Manually examining the top zero results queries is not a foolproof way
> >> of eliminating personal data since it's arbitrary user input.
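
The filter proposed earlier in this thread (only terms searched by more than 100 IPs, at most 50 characters long, first 100 shown) could be sketched roughly as follows. This is an illustration only, not how the Discovery team processes search logs: the `(ip, query)` log format, the function name and the parameter names are all assumptions. As noted above, a filter like this narrows the list for human review but does not guarantee that no personal data gets through.

```python
from collections import defaultdict


def candidate_queries(log, min_distinct_ips=100, max_len=50, top_n=100):
    """Return up to top_n (query, distinct_ip_count) pairs for human review.

    log is an iterable of (ip, query) pairs for zero-result searches.
    This input format is hypothetical, chosen only for the sketch.
    """
    ips_by_query = defaultdict(set)
    for ip, query in log:
        normalized = query.strip().lower()
        # Skip empty queries and anything longer than the length cap.
        if normalized and len(normalized) <= max_len:
            ips_by_query[normalized].add(ip)

    # Keep only queries seen from MORE than min_distinct_ips distinct IPs.
    counted = [
        (query, len(ips))
        for query, ips in ips_by_query.items()
        if len(ips) > min_distinct_ips
    ]
    # Most widely searched first; ties broken alphabetically for stable output.
    counted.sort(key=lambda item: (-item[1], item[0]))
    return counted[:top_n]
```

The output is meant as input to a manual review step, not as something to publish directly.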
> >>
> >> I'm happy to answer any questions. :-)
> >>
> >> Thanks,
> >> Dan
> >>
> >> [1]: Don't panic, this example national insurance number is actually
> >> invalid. ;-)
> >>
> >> --
> >> Dan Garry
> >> Lead Product Manager, Discovery
> >> Wikimedia Foundation
> >> _______________________________________________
> >> Wikimedia-l mailing list, guidelines at:
> >> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
> >> New messages to: Wikimedia-l(a)lists.wikimedia.org
> >> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> >> <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
> >
> >
> >
> >
> > --
> > James Heilman
> > MD, CCFP-EM, Wikipedian
> >
> > The Wikipedia Open Textbook of Medicine
> > www.opentextbookofmedicine.com
> >
>
>
>
> --
> James Heilman
> MD, CCFP-EM, Wikipedian
>
> The Wikipedia Open Textbook of Medicine
> www.opentextbookofmedicine.com
>
>
> ------------------------------
>
> Message: 4
> Date: Fri, 15 Jul 2016 14:15:31 -0400
> From: Robert Fernandez <wikigamaliel(a)gmail.com>
> To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
> Subject: Re: [Wikimedia-l] [discovery] Fwd: Improving search (sort of)
> Message-ID: <CAMY8yAWisp507c_F3hJbcRWT20NZjyczN0oiZWPZ-UwggRi38Q(a)mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> > If I can get a few hundred people to click on a link like this
> > <https://en.wikipedia.org/w/index.php?title=Special:Search&profile=default&f…>,
> > I can get any message I want on that list. (Curious? Did you click?) The
> > message could be less anonymous and much more obnoxious, obviously
>
> They could vandalize any one of over ten million pages on the English
> Wikipedia and get the same result. We should be conscious of the
> dangers but we can easily route around them like we do with other
> kinds of vandalism.
>
>
>
> ------------------------------
>
> Message: 5
> Date: Fri, 15 Jul 2016 14:25:08 -0400
> From: Nathan <nawrich(a)gmail.com>
> To: wikigamaliel(a)gmail.com, Wikimedia Mailing List
> <wikimedia-l(a)lists.wikimedia.org>
> Subject: Re: [Wikimedia-l] [discovery] Fwd: Improving search (sort of)
> Message-ID: <CALKX9dTwh=BDVPFtiT6tGw53XRccC8TbyZd2kJ9benKx18Jj5w(a)mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> How hard would it be to ask for search feedback on search results, perhaps
> piloting with some small subset of zero-result searches? For 1/1000 ZRRs,
> prompt the user to provide some type of useful information about why there
> should be results, or if there ought to be, or what category of information
> the searcher was looking for, etc. You'd get junk and noise, but it might
> be one way to filter out a lot of the gibberish. You could also ask people
> to agree to make their failed search part of a publicly visible list,
> although this could of course be gamed.
>
>
> ------------------------------
>
>
> End of Wikimedia-l Digest, Vol 148, Issue 26
> ********************************************
>