[Wikimedia-search] Concerns about production-like projects in labs

Kevin Smith

16 Jun 2015

11:46 p.m.

In a recent meeting, Oliver expressed concerns about us having services running in labs which are treated sort of like they were in production. Examples include WDQS (already) and maps (potentially).

We agreed to have this discussion on the mailing list, so this is an invitation to do so. I am not familiar enough with the various technical issues to explain them properly, so hopefully someone else will step in and do so. I believe one big area of concern is analytics.

Kevin Smith
Agile Coach
Wikimedia Foundation

*Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment. Help us make it a reality.*

Max Semenik

17 Jun 2015

2:28 a.m.

Maps is not going to be production-like.

-- Best regards, Max Semenik ([[User:MaxSem]])

Stas Malyshev

2:50 a.m.

Hi!

> In a recent meeting, Oliver expressed concerns about us having services running in labs which are treated sort of like they were in production. Examples include WDQS (already) and maps (potentially).

What specifically are the concerns? As I understand it, the requirements for running a service in production are much higher than in labs, so we can either run it in labs or not run it at all, at least for the time it takes to complete all the work required to close the gap.

> issues to explain them properly, so hopefully someone else will step in and do so. I believe one big area of concern is analytics.

What about analytics? Is it about analyzing WDQS? I'd be glad to help if I can, though I'm not sure what needs to be done there.

-- Stas Malyshev smalyshev(a)wikimedia.org

Oliver Keyes

3:16 a.m.

On 16 June 2015 at 20:50, Stas Malyshev <smalyshev(a)wikimedia.org> wrote:

> Hi!
>
> > In a recent meeting, Oliver expressed concerns about us having services running in labs which are treated sort of like they were in production. Examples include WDQS (already) and maps (potentially).

See below.

> > issues to explain them properly, so hopefully someone else will step in and do so. I believe one big area of concern is analytics.
>
> What about analytics? Is it about analyzing WDQS? I'd be glad to help if I can though not sure what needs to be done there.

The problem, as we've gone back and forth about for a while on Phabricator, is that labs has absolutely zero inbuilt infrastructure for analytics. If things are in production they go through the frontend varnishes, which are hooked up to HDFS, and all is fine. We have the request logs. If things are in labs... nothing. There is no access to HDFS, there is no consistent varnish setup that pipes things there, and analytics engineering has pretty much no plans to set up that sort of infrastructure.

-- Oliver Keyes Research Analyst Wikimedia Foundation

Oliver Keyes

3:16 a.m.

Max, could you explain that?

-- Oliver Keyes Research Analyst Wikimedia Foundation

Max Semenik

3:52 a.m.

Explain what? Everything we have in labs is for experimentation and prototyping. It should not be used for prod.

On 16 June 2015 at 18:16, "Oliver Keyes" <okeyes(a)wikimedia.org> wrote:

> Max, could you explain that?

Stas Malyshev

9:22 a.m.

Hi!

...

Right. What I am still missing is that HDFS, varnish, etc. are means to an end, the end being delivering info (in this case, usage logs) somewhere and then doing something with it. Right now I do not have a clear picture of what that somewhere/something is, or what data it consumes and in what form. If I were more up to speed on this, or at least understood what inputs are required and which forms of those inputs are acceptable, I could have a better picture.

-- Stas Malyshev smalyshev(a)wikimedia.org

Kevin Smith

5:01 p.m.

Max: I thought we were hoping to use the labs map servers to support production-like cases. For example, possibly "beta" mobile apps, or maybe hooked into some small production wiki.

Even if that is not the case, it seems like for anything we do run in labs for a long time, we would want to be able to measure usage and speed.

Oliver Keyes

6:08 p.m.

Is it running for a really long time? Is it designed to not fall over? Is it no longer being actively looked at with the look you give a rickety Jenga tower because you're confident it's bug-free? Are users being pointed at it without caveats? Then it's production.

-- Oliver Keyes Research Analyst Wikimedia Foundation

James Douglas

6:09 p.m.

Do we have our definition of production posted anywhere?

On Wed, Jun 17, 2015 at 9:08 AM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:

> On 16 June 2015 at 21:52, Max Semenik <maxsem.wiki(a)gmail.com> wrote:
> > Explain what? Everything we have in labs is for experimentation and prototyping. It should not be used for prod.

Oliver Keyes

6:10 p.m.

Strictly speaking, "production" is the "primary" cluster, but the point here is about the reliability users assume around a service rather than the reliability it has.

-- Oliver Keyes Research Analyst Wikimedia Foundation

James Douglas

6:12 p.m.

It would be easier to manage users' assumptions if we officially defined reliability (and latency, availability, etc.) for our various environments.

Oliver Keyes

6:13 p.m.

Well, no, HDFS is a means to the end of storing data in a form that can be cleaned with ETL processes so that /then/ it can go to the somewhere/something, which is a lot of use cases but most prominently our dashboards and ad-hoc research tasks.

Let me be clear here that this isn't a theoretical exercise existing in a vacuum; we do not want /an/ answer that can be hooked up to the dashboards. That's easy. That's a hideous shell script that scaps nginx files over. We want an answer that can be hooked up to the dashboards for many, many, many things, because we're not just wanting metrics and analytics for WDQS, we're also wanting them for the production API and for user events and for the cirrus logs and for high-level KPIs, and that's just the things we've wanted this month.

I can't be building out an entirely new pipeline every single time someone builds a thing. That's not an efficient use of our analysts' time and it massively increases the chance that something will go wrong. I'm not asking for an alternative to HDFS, because I don't want to be doing that. I'm asking for HDFS because then we don't need to reinvent the wheel every time we build a thing.

If we can't do HDFS and going to production isn't going to work, then let's talk about what the alternatives are. Until then the use case is "the data being in HDFS so that analysts can consume it" and higher-level use cases are overthinking.

-- Oliver Keyes Research Analyst Wikimedia Foundation
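
For illustration only, a one-off, per-service log summary of the kind this message describes as easy but not scalable might look roughly like the Python sketch below. It is not the Wikimedia pipeline. The log layout (nginx "combined" format with `$request_time` appended), the sample lines, and the names `LINE_RE`, `parse_line`, and `summarize` are all assumptions made for the example.

```python
import re
from collections import Counter
from statistics import median

# Assumed layout: nginx "combined" log format with $request_time appended.
# A service logging in any other layout would need a different pattern.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"[^"]*" "[^"]*" (?P<rt>\d+\.\d+)'
)


def parse_line(line):
    """Return a small record for one log line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        # Drop the query string so hits are grouped per endpoint.
        "path": m.group("path").split("?", 1)[0],
        "status": int(m.group("status")),
        "request_time": float(m.group("rt")),
    }


def summarize(lines):
    """Aggregate usage (hits per path) and speed (median request time)."""
    records = [r for r in map(parse_line, lines) if r is not None]
    hits = Counter(r["path"] for r in records)
    times = [r["request_time"] for r in records]
    return {
        "requests": len(records),
        "hits_by_path": dict(hits),
        "median_request_time": median(times) if times else None,
    }


# Hypothetical sample lines, including one that does not parse.
SAMPLE = [
    '10.0.0.1 - - [17/Jun/2015:16:13:00 +0000] "GET /sparql?query=x HTTP/1.1" 200 512 "-" "curl/7.40" 0.120',
    '10.0.0.2 - - [17/Jun/2015:16:13:01 +0000] "GET /sparql?query=y HTTP/1.1" 200 2048 "-" "curl/7.40" 0.300',
    '10.0.0.3 - - [17/Jun/2015:16:13:02 +0000] "GET /status HTTP/1.1" 500 0 "-" "curl/7.40" 0.010',
    'not a log line',
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Every service with a different log layout would need its own variant of `LINE_RE` and its own copy step, which is the per-service duplication a shared pipeline is meant to avoid.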

Jeff Gage

6:41 p.m.

Labs hosts a special project called Tool Labs, which https://wikitech.wikimedia.org/wiki/Help:FAQ describes as "designed to host tools that are not so complicated that they would require a dedicated instance". One notable tool is Geohack (https://tools.wmflabs.org/geohack/): every article about a location links to it. Example from https://en.wikipedia.org/wiki/Aberdeen:

Coordinates: 57.1526°N 2.1100°W <https://tools.wmflabs.org/geohack/geohack.php?pagename=Aberdeen&params=57.1526_N_-2.1100_E_type:city_region:GB>

There are also various analytics dashboards in labs which might be considered "production":

https://glam-metrics.wmflabs.org/
https://metrics.wmflabs.org/
http://reportcard.wmflabs.org/
http://searchdata.wmflabs.org/

As others have mentioned, part of the problem is the ease of spinning up a labs instance versus hardware procurement for a deployment into production. Ops (Alex) is working on virtualization infrastructure for use in prod called Ganeti (an alternative to the OpenStack virtualization software which powers labs); the first service, etherpad, was recently migrated from hardware to a Ganeti instance. Ganeti could be the way forward for promoting services from labs to prod without requiring dedicated hardware.

https://wikitech.wikimedia.org/wiki/Ganeti

Oliver Keyes

6:43 p.m.

Does Ganeti live in the production cluster and therefore sit behind the firewalling that prohibits loading data into our analytics pipelines from Labs?

-- Oliver Keyes Research Analyst Wikimedia Foundation

Jeff Gage

6:46 p.m.

> Does Ganeti live in the production cluster and therefore sit behind the firewalling that prohibits loading data into our analytics pipelines from Labs?

Yes, but it seems possible to me that the firewalls could be modified to support specific data flows.

Federico Leva (Nemo)

7:44 p.m.

James Douglas, 17/06/2015 18:12:

> It would be easier to manage users' assumptions if we officially defined reliability (and latency, availability, etc.) for our various environments.

Huge +1.

Nemo

aude

8:04 p.m.

On Wed, Jun 17, 2015 at 5:01 PM, Kevin Smith <ksmith(a)wikimedia.org> wrote:

> Max: I thought we were hoping to use the labs map servers to support production-like cases.

Labs is really only suitable for temporary, proof-of-concept, or experimental maps stuff.

> For example, possibly "beta" mobile apps, or maybe hooked into some small production wiki.

We definitely need to move beyond that.

> Even if that is not the case, it seems like for anything we do run in labs for a long time, we would want to be able to measure. Usage and speed.

Agreed, having metrics is certainly important for stuff in labs.

Cheers,
Katie

-- @wikimediadc / @wikidata

Stas Malyshev

10:33 p.m.

Hi!

...

Thanks for explaining more! I think I understand your concern better now. With the renewed attention to WDQS productization, the point may be moot soon, but in case it isn't, I just wanted to explore the possibility of using the same infrastructure with different inputs, or maybe of building a bridge between HDFS and whatever we have in labs. I'm not saying this necessarily makes sense, but if it doesn't, I'd like to know why.

...

> reinvent the wheel every time we build a thing. If we can't do HDFS and going to production isn't going to work, then let's talk about what the alternatives are. Until then the use case is "the data being in HDFS so that analysts can consume it" and higher-level use cases are overthinking.

OK. Then if we go to production soon (hopefully), I assume we have an existing workflow allowing us to get stuff into HDFS. If not, we _may_ (again, if that doesn't make sense, fine, but I would like to hear the reasons) explore the possibility of some process that would allow us to get data from whatever we have now (which can be rather flexible) into HDFS.

-- Stas Malyshev smalyshev(a)wikimedia.org

Oliver Keyes

11:10 p.m.

On 17 June 2015 at 16:33, Stas Malyshev <smalyshev(a)wikimedia.org> wrote:

...

Hi!

Thanks! It does make sense; at the moment the blocker is somewhere woolly around analytics and ops.

So, on the same infrastructure: Analytics Engineering have indicated (iirc) they're comfortable standing up an HDFS instance but not so keen on maintaining it indefinitely. This makes total sense with their priorities.

On building the bridge: at the moment the HDFS cluster is very deliberately firewalled. We'd need to deal with that (perhaps, as suggested, by making highly specific and authenticated holes in the firewall?) before it was possible, and that seems to be an Analytics/Opsen thing.

...

We do! If the WDQS queries are going through production's varnish caches (an existing cluster), they go in automatically. If they go through a new frontend cluster, the cost of switching them in is fairly small.

...

-- Oliver Keyes Research Analyst Wikimedia Foundation
