In a recent meeting, Oliver expressed concerns about us having services running in labs which are treated sort of like they were in production. Examples include WDQS (already) and maps (potentially).
We agreed to have this discussion on the mailing list, so this is an invitation to do so. I am not familiar enough with the various technical issues to explain them properly, so hopefully someone else will step in and do so. I believe one big area of concern is analytics.
Kevin Smith Agile Coach Wikimedia Foundation
*Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment. Help us make it a reality.*
Maps is not going to be production-like.
On Tue, Jun 16, 2015 at 2:46 PM, Kevin Smith ksmith@wikimedia.org wrote:
Wikimedia-search mailing list Wikimedia-search@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikimedia-search
Max, could you explain that?
On 16 June 2015 at 20:28, Max Semenik maxsem.wiki@gmail.com wrote:
-- Best regards, Max Semenik ([[User:MaxSem]])
Explain what? Everything we have in labs is for experimentation and prototyping. It should not be used for prod.
On June 16, 2015 at 18:16, "Oliver Keyes" okeyes@wikimedia.org wrote:
-- Oliver Keyes Research Analyst Wikimedia Foundation
Max: I thought we were hoping to use the labs map servers to support production-like cases. For example, possibly "beta" mobile apps, or maybe hooked into some small production wiki.
Even if that is not the case, it seems like for anything we do run in labs for a long time, we would want to be able to measure usage and speed.
Kevin Smith Agile Coach Wikimedia Foundation
On Tue, Jun 16, 2015 at 6:52 PM, Max Semenik maxsem.wiki@gmail.com wrote:
Labs hosts a special project called Tool Labs, which https://wikitech.wikimedia.org/wiki/Help:FAQ describes as "designed to host tools that are not so complicated that they would require a dedicated instance". One notable tool is Geohack (https://tools.wmflabs.org/geohack/): every article about a location links to it. Example from https://en.wikipedia.org/wiki/Aberdeen:
Coordinates: 57.1526°N 2.1100°W
https://tools.wmflabs.org/geohack/geohack.php?pagename=Aberdeen&params=57.1526_N_-2.1100_E_type:city_region:GB
There are also various analytics dashboards in labs which might be considered "production":
https://glam-metrics.wmflabs.org/
https://metrics.wmflabs.org/
http://reportcard.wmflabs.org/
http://searchdata.wmflabs.org/
As others have mentioned, part of the problem is the ease of spinning up a labs instance compared with hardware procurement for a deployment into production. Ops (Alex) is working on virtualization infrastructure for use in prod called Ganeti (an alternative to the OpenStack virtualization software which powers labs); the first service, etherpad, was recently migrated from hardware to a Ganeti instance. Ganeti could be the way forward for promoting services from labs to prod without requiring dedicated hardware.
https://wikitech.wikimedia.org/wiki/Ganeti
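To make the Geohack dependency concrete, here is a rough sketch of how a link like the Aberdeen one could be assembled. It is inferred only from that single example URL: the function name `geohack_url`, the four-decimal formatting, and the "signed value followed by N/E" layout are assumptions, not a description of how the wikis or Geohack actually build or parse these links.

```python
from urllib.parse import urlencode

# Base URL taken from the Aberdeen example in this thread.
GEOHACK_BASE = "https://tools.wmflabs.org/geohack/geohack.php"


def geohack_url(pagename, lat, lon, extra=""):
    """Build a Geohack-style link (hypothetical helper, modelled on one example).

    Assumption: coordinates are written as signed decimal degrees followed by
    N and E, exactly as in the Aberdeen link (57.1526_N_-2.1100_E).
    """
    params = f"{lat:.4f}_N_{lon:.4f}_E"
    if extra:
        # e.g. "type:city_region:GB", appended after the coordinates
        params += f"_{extra}"
    # Keep ':' unescaped so the result matches the example link.
    query = urlencode({"pagename": pagename, "params": params}, safe=":")
    return f"{GEOHACK_BASE}?{query}"


print(geohack_url("Aberdeen", 57.1526, -2.11, "type:city_region:GB"))
```

Every article that carries coordinates emits a link of this shape, which is why a tool hosted in labs ends up being relied on like a production service.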
On Wed, Jun 17, 2015 at 8:01 AM, Kevin Smith ksmith@wikimedia.org wrote:
Does Ganeti live in the production cluster and therefore sit behind the firewalling that prohibits loading data into our analytics pipelines from Labs?
On 17 June 2015 at 12:41, Jeff Gage jgage@wikimedia.org wrote:
Does Ganeti live in the production cluster and therefore sit behind the firewalling that prohibits loading data into our analytics pipelines from Labs?
Yes, but it seems possible to me that the firewalls could be modified to support specific data flows.
On Wed, Jun 17, 2015 at 5:01 PM, Kevin Smith ksmith@wikimedia.org wrote:
> Max: I thought we were hoping to use the labs map servers to support production-like cases.
Labs is really only suitable for temporary, proof-of-concept, or experimental maps stuff.
> For example, possibly "beta" mobile apps, or maybe hooked into some small production wiki.
We definitely need to move beyond that.
> Even if that is not the case, it seems like for anything we do run in labs for a long time, we would want to be able to measure. Usage and speed.
Agreed, having metrics is certainly important for stuff in labs.
Cheers, Katie
Is it running for a really long time? Is it designed to not fall over? Is it no longer being actively looked at with the look you give a rickety Jenga tower because you're confident it's bug-free? Are users being pointed at it without caveats? Then it's production.
Do we have our definition of production posted anywhere?
On Wed, Jun 17, 2015 at 9:08 AM, Oliver Keyes okeyes@wikimedia.org wrote:
Strictly speaking, "production" is the "primary" cluster, but the point here is about the reliability users assume a service has rather than the reliability it actually has.
On 17 June 2015 at 12:09, James Douglas jdouglas@wikimedia.org wrote:
It would be easier to manage users' assumptions if we officially defined reliability (and latency, availability, etc.) for our various environments.
On Wed, Jun 17, 2015 at 9:10 AM, Oliver Keyes okeyes@wikimedia.org wrote:
Hi!
> In a recent meeting, Oliver expressed concerns about us having services running in labs which are treated sort of like they were in production. Examples include WDQS (already) and maps (potentially).
What specifically are the concerns?
As I understand it, the requirements to run a service in production are much higher than in labs, so we can either run it in labs or not run it at all, at least for the time it takes to complete all the work required to close the gap.
> issues to explain them properly, so hopefully someone else will step in and do so. I believe one big area of concern is analytics.
What about analytics? Is it about analyzing WDQS? I'd be glad to help if I can, though I'm not sure what needs to be done there.
On 16 June 2015 at 20:50, Stas Malyshev smalyshev@wikimedia.org wrote:
> Hi!
>> In a recent meeting, Oliver expressed concerns about us having services running in labs which are treated sort of like they were in production. Examples include WDQS (already) and maps (potentially).
> What specifically are the concerns?
> As I understand, the requirements to run service in production are much higher than in labs, so we can either run it on labs, or not run it at all, at least for the time it takes to complete all the work required to fill the delta.
See below.
>> issues to explain them properly, so hopefully someone else will step in and do so. I believe one big area of concern is analytics.
> What about analytics? Is it about analyzing WDQS? I'd be glad to help if I can though not sure what needs to be done there.
The problem, as we've gone back and forth about for a while on phabricator, is that labs has absolutely zero inbuilt infrastructure for analytics.
If things are in production they go through the frontend varnishes, which are hooked up to HDFS, and all is fine. We have the request logs. If things are in labs...nothing. There is no access to HDFS, there is no consistent varnish setup that pipes things there, and analytics engineering has pretty much no plans to set up that sort of infrastructure.
-- Stas Malyshev smalyshev@wikimedia.org
Hi!
> The problem, as we've gone back and forth about for a while on phabricator, is that labs has absolutely zero inbuilt infrastructure for analytics.
> If things are in production they go through the frontend varnishes, which are hooked up to HDFS, and all is fine. We have the request logs. If things are in labs...nothing. There is no access to HDFS, there is no consistent varnish setup that pipes things there, and analytics engineering has pretty much no plans to set up that sort of infrastructure.
Right. What I am still missing is that HDFS, varnish, etc. are means to an end, the end being delivering info (in this case, usage logs) somewhere and then doing something with it. Right now I do not have a clear picture of what that somewhere/something is, or what data it consumes and in what form. If I were more up to speed on this, or at least understood what inputs are required and which forms of those inputs are acceptable, I could have a better picture.
Well, no, HDFS is a means to an end of storing data in a form that can be cleaned with ETL processes so that /then/ they can go to the somewhere/something - which is a lot of use cases but most prominently our dashboards and ad-hoc research tasks.
Let me be clear here that this isn't a theoretical exercise existing in a vacuum; we do not want /an/ answer that can be hooked up to the dashboards. That's easy. That's a hideous shell script that scaps nginx files over. We want an answer that can be hooked up to the dashboards for many, many, many things, because we're not just wanting metrics and analytics for WDQS, we're also wanting them for the production API and for user events and for the cirrus logs and for high-level KPIs, and that's just the things we've wanted this month.
I can't be building out an entirely new pipeline every single time someone builds a thing. That's not an efficient use of our analysts' time and it massively increases the chance that something will go wrong. I'm not asking for an alternative to HDFS, because I don't want to be doing that. I'm asking for HDFS because then we don't need to reinvent the wheel every time we build a thing. If we can't do HDFS and going to production isn't going to work, then let's talk about what the alternatives are. Until then the use case is "the data being in HDFS so that analysts can consume it" and higher-level use cases are overthinking.
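For illustration only, the kind of one-off, per-service ingestion script argued against above might look roughly like the sketch below. It assumes access logs in nginx's default "combined" format; the function name `count_requests_per_day`, the `/sparql` path, and the sample lines are all made up for the example and do not come from any real WDQS log.

```python
import re
from collections import Counter

# Matches the start of a line in nginx's default "combined" log format:
# addr - user [time_local] "request" status bytes ...
LINE_RE = re.compile(
    r'(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def count_requests_per_day(lines):
    """Count successful (2xx) requests per day from access-log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything that does not look like a log entry
        if not match.group("status").startswith("2"):
            continue  # ignore redirects and errors
        # time_local looks like "17/Jun/2015:12:41:00 +0000"; keep the date
        day = match.group("time").split(":", 1)[0]
        counts[day] += 1
    return counts


# Invented sample input (documentation-range IPs, hypothetical paths).
sample = [
    '203.0.113.5 - - [17/Jun/2015:12:41:00 +0000] "GET /sparql?query=x HTTP/1.1" 200 512 "-" "curl/7.38"',
    '203.0.113.9 - - [17/Jun/2015:12:42:10 +0000] "GET /sparql?query=y HTTP/1.1" 500 0 "-" "curl/7.38"',
    '203.0.113.5 - - [18/Jun/2015:08:00:01 +0000] "GET /sparql?query=z HTTP/1.1" 200 734 "-" "curl/7.38"',
]
print(count_requests_per_day(sample))
```

Each service would need its own variant of this (different log format, different fields, different transfer step), which is the duplication that a shared HDFS-backed pipeline avoids.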
On 17 June 2015 at 03:22, Stas Malyshev smalyshev@wikimedia.org wrote:
Hi!
The problem, as we've gone back and forth about for a while on phabricator, is that labs has absolutely zero inbuilt infrastructure for analytics.
If things are in production they go through the frontend varnishes, which are hooked up to HDFS, and all is fine. We have the request logs. If things are in labs...nothing. There is no access to HDFS, there is no consistent varnish setup that pipes things there, and analytics engineering has pretty much no plans to set up that sort of infrastructure.
Right. What I am still missing is that HDFS, varnish, etc. are means to an end, the end being delivering info (in this case, usage logs) somewhere, and then doing something. So right now I do not have a clear picture of what that somewhere/something is, and what data it consumes in what form. Maybe if I were more up to speed on this - or at least understood what inputs are required and which forms of these inputs are acceptable - I could have a better picture. -- Stas Malyshev smalyshev@wikimedia.org
Hi!
Well, no, HDFS is a means to the end of storing data in a form that can be cleaned with ETL processes so that /then/ it can go to the somewhere/something - which is a lot of use cases but most prominently our dashboards and ad-hoc research tasks.
Thanks for explaining more! I think I understand your concern better now. With the renewed attention to WDQS productization, the point may be moot soon, but in case it isn't, I just wanted to explore the possibility of using the same infrastructure but with different inputs - or maybe the possibility of building a bridge between HDFS and whatever we have in labs. I'm not saying this necessarily makes sense, but if it doesn't, I'd like to know why.
reinvent the wheel every time we build a thing. If we can't do HDFS and going to production isn't going to work, then let's talk about what the alternatives are. Until then the use case is "the data being in HDFS so that analysts can consume it" and higher-level use cases are overthinking.
OK. Then if we go to production soon (hopefully) I assume we have an existing workflow allowing us to get stuff to HDFS. If not, we _may_ (again, if that doesn't make sense, fine, but I would like to hear the reasons) explore the possibility of some process that would allow us to get data from whatever we have now (which can be rather flexible) into HDFS.
On 17 June 2015 at 16:33, Stas Malyshev smalyshev@wikimedia.org wrote:
Hi!
Well, no, HDFS is a means to the end of storing data in a form that can be cleaned with ETL processes so that /then/ it can go to the somewhere/something - which is a lot of use cases but most prominently our dashboards and ad-hoc research tasks.
Thanks for explaining more! I think I understand your concern better now. With the renewed attention to WDQS productization, the point may be moot soon, but in case it isn't, I just wanted to explore the possibility of using the same infrastructure but with different inputs - or maybe the possibility of building a bridge between HDFS and whatever we have in labs. I'm not saying this necessarily makes sense, but if it doesn't, I'd like to know why.
Thanks! It does make sense; at the moment the blocker is somewhere woolly around analytics and ops. So, on the same infrastructure: Analytics Engineering have indicated (iirc) they're comfortable standing up an HDFS instance but not so keen on maintaining it indefinitely. This makes total sense with their priorities. On building the bridge: at the moment the HDFS cluster is very deliberately firewalled. We'd need to deal with that (perhaps, as suggested, making highly specific and authenticated holes in the firewall?) before it was possible, and that seems to be an Analytics/Opsen thing.
reinvent the wheel every time we build a thing. If we can't do HDFS and going to production isn't going to work, then let's talk about what the alternatives are. Until then the use case is "the data being in HDFS so that analysts can consume it" and higher-level use cases are overthinking.
OK. Then if we go to production soon (hopefully) I assume we have an existing workflow allowing us to get stuff to HDFS. If not, we _may_ (again, if that doesn't make sense, fine, but I would like to hear the reasons) explore the possibility of some process that would allow us to get data from whatever we have now (which can be rather flexible) into HDFS.
We do! If the WDQS queries are going through Production's varnish caches (an existing cluster) they go in automatically. If they go through a new frontend cluster, the cost of switching them in is fairly small.
-- Stas Malyshev smalyshev@wikimedia.org