As my final hurrah, I've released the data access guidelines used by
the Discovery team in research and analysis onto Meta. They can be
found at https://meta.wikimedia.org/wiki/Discovery/Data_access_guidelines
The intent here is to ensure that we are transparent about what
information we have, what we do with it, and what expectations we have
on how employees will guarantee the safety and security of the
information and the people behind it.
To my knowledge the Discovery team is the first team in Engineering to
have released this kind of information so prominently. I am pretty
happy we could lead the way, and look forward to other groups with
research interests hopefully doing the same!
I’m writing to solicit feedback on a plan to slightly rework the Discovery
workboards on Phabricator.
Back when we were assembling the team and putting the process together, I
expressed a strong desire to have a centralised Discovery workboard which
would contain every task related to Discovery’s work. In the end, I’d say
that hasn’t really worked out, because there are simply too many tasks in the
workboard and too much in flight. It’s been hard to sensibly break things
down into different categories of task on a per-project basis. To help
alleviate this problem, here’s my proposal:
- The Discovery workboard is disabled.
- The Discovery tag will continue to exist as a parent-tag for
Discovery’s work, but there will be no associated workboard.
- Individual backlog workboards can be created for individual projects
at the request of that team, for example:
- Sprint boards will remain exactly as they are presently.
How will this affect you? Firstly, this won’t affect you at all unless you
spend a significant amount of time in the Discovery backlog, which should
mostly be! Otherwise, nothing will really change for you. The workflow of
engineers working on Maps and Wikidata Query Service, for example, will
likely be completely unaffected by this change.
As I’m the primary consumer of the Discovery board, and it’s not working
very well for me at the minute, I’d like to move forwards with this
proposal unless there are any strong objections. Kevin and I will handle
all the logistics.
Lead Product Manager, Discovery
[resending due to mailing list byte limits]
Slightly off-topic - You may try separating latency effects in an A/B test
by running a counterfactual test: run your feature, but don't display any
changes (an A/B/CF test). The experiment scorecard would then compare your
feature and control to the counterfactual test, highlighting indirect changes
like added latency.
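
A rough sketch of how such a scorecard could split the effects, assuming each bucket reports a single numeric metric per sample. The function name, the metric, and the sample values are all hypothetical and only illustrate the idea:

```python
from statistics import mean


def decompose_effect(control, counterfactual, treatment):
    """Split the total treatment effect into indirect and direct parts.

    control:        feature code does not run
    counterfactual: feature code runs, but nothing is displayed
    treatment:      feature code runs and changes are displayed
    """
    c, cf, t = mean(control), mean(counterfactual), mean(treatment)
    return {
        "indirect": cf - c,  # cost of merely running the feature (e.g. latency)
        "direct": t - cf,    # effect of the visible change itself
        "total": t - c,      # what a plain A/B test would report
    }


# Hypothetical per-sample click-through rates for each bucket.
scorecard = decompose_effect(
    control=[0.30, 0.32, 0.31],
    counterfactual=[0.29, 0.30, 0.28],
    treatment=[0.33, 0.35, 0.34],
)
print(scorecard)
```

With only control and treatment buckets you would see just the "total" figure; the counterfactual bucket is what lets the indirect cost be read separately from the benefit of the visible change.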
Greetings Discovery Friends, Foes, and Other Enthusiasts!
For various reasons, we're looking to rename the Relevanc(e|y) Lab so that
it doesn't have "lab" in its name.
If you have any clever or amusing ideas, jot them down over in Phabricator:
If you aren't Phab-enabled, you can read the discussion so far to get
calibrated, and reply here and I'll post them back to Phab.
Forwarding to the Discovery mailing list, although it sounds like the OP
hopes to have any possible discussion on Wikitech-l.
I wonder if there would be ways for WMF Discovery to leverage the work
that's being done already on Commoncrawl and Commonsearch for use in
Wikimedia internal search.
---------- Forwarded message ----------
From: Sylvain Zimmer <sylvain(a)sylvainzimmer.com>
Date: Sun, Mar 6, 2016 at 11:46 AM
Subject: [Wikitech-l] Using Wikipedia/Wikidata in a nonprofit search engine
Some of you may be familiar with http://commoncrawl.org ; they are
doing an excellent job of making large crawls of the web accessible to
I've been working on an open search engine based on these crawls for a
while, and I would love to have your feedback on the project:
Specifically, I would be curious to know what you would consider to be
the best possible integration of Wikipedia & Wikidata in a general
As a first step, we have just started using the "official website"
property from Wikidata and we are considering importing the Wikipedia
abstracts next (https://github.com/commonsearch/cosr-back/issues/11).
I'm looking forward to your feedback... or contributions! :-)
Thanks in advance,
PS: A few Wikimedians recommended that I post on wikitech-l to keep the
focus on the technical aspects of the project and hopefully avoid
linking this project in any way to the KE stuff, which it actually
predates by far (https://news.ycombinator.com/item?id=6209088).
Wikitech-l mailing list
You may also find these diagrams useful:
On Feb 24, 2016 6:58 PM, "Mukunda Modell" <mmodell(a)wikimedia.org> wrote:
> >> On Feb 17, 2016 1:50 AM, "Guillaume Lederrey" <glederrey(a)wikimedia.org>
> >> wrote:
> >>> * I still have not found a global architecture schema (something like
> >>> a high-level component or deployment diagram). But I have never seen
> >>> any company having those...
> I made a diagram of the scap (mediawiki) deployment architecture a while
> back: https://commons.wikimedia.org/wiki/File:Scap-diagram.png ..
> That does not exactly apply to the new scap3 architecture but it's not too
> far off.
> On Thu, Feb 18, 2016 at 10:37 AM, Giuseppe Lavagetto
> <glavagetto(a)wikimedia.org> wrote:
> > About cherry-picks in beta: the problem is not cherry-picking (I think
> > it's a reasonable way to test things) but persistent cherry-picking to
> > monkey patch problems is. I think if we follow the flow of:
> > - writing a patch
> > - testing it on beta with a cherry-pick
> > - get it merged on ops/puppet and production
> There are a lot of patches on beta these days and there have been a lot of
> different people cherry-picking without much coordination. This has led
> to breakage quite often. Patches also get lost regularly. I assume this
> usually happens because someone has rebased the HEAD and accidentally
> dropped a patch.
> It can be really difficult to get a patch merged in ops/puppet within a
> week (or even a month). I've seen a lot of patches sit around for weeks and
> even now with the Puppet SWAT windows, it's still sometimes unrealistic to
> expect patches to get merged into production that quickly. (+CC Tyler)
> Without a system to manage things, and with very little coordination
> between everyone who is working on beta, I don't expect the situation to
> improve too much.
> I intend to propose a solution for beta & puppet patch cherry-picks very
> soon, however, I haven't fully formulated my proposal yet. I will write to
> the ops list when I have something written in a clear and presentable way.
> Ops mailing list
I'm thinking of going to DevopsDays Amsterdam (June 29th, 30th and
July 1st 2016). It might be a good occasion to meet some of you in
real life if you are so inclined.
For those who do not know what DevopsDays are, it is a fun conference
format, with a fairly small audience (200-400 people) centered around
Devops themes (no surprise). The really fun part of this conference is
the Open Spaces. It is a good occasion to have great conversations
with smart people, exchange ideas, spread our knowledge, ...
Let me know...
I have been wanting to get a deeper look at Salt for a long long
time... Seems that the time has come!
On Tue, Mar 1, 2016 at 10:48 PM, Andrew Otto <otto(a)wikimedia.org> wrote:
> Could also put it in puppet in elasticsearch module or role.
> We often use salt for things like this. It works (sometimes?)!
> On Tue, Mar 1, 2016 at 2:47 PM, Daniel Zahn <dzahn(a)wikimedia.org> wrote:
>> On Tue, Mar 1, 2016 at 10:02 AM, Guillaume Lederrey
>> <glederrey(a)wikimedia.org> wrote:
>>> Do we have a central place for those kinds of scripts? I'd like to
>>> version it in a more obvious place than my personal Github repo.
>> +1 for not using personal Github. If in doubt you can always use this:
>> operations/software Random software tools for ops tasks (svn2git,
>> udpprofile, etc)
>> and then there are a lot of repos under operations/software/foo
>> Daniel Zahn <dzahn(a)wikimedia.org>
>> Operations Engineer