I am kicking off this thread after a good conversation with Nuria and Kaldari on pain points and opportunities we have around data QA for EventLogging.
Kaldari, Leila and I have gone through several rounds of data QA before and after the deployment of new features on Mobile, and we haven’t yet found a good solution for catching data quality issues early enough in the deployment cycle. Data quality issues with EventLogging typically fall under one of these 5 scenarios:
1) events are logged and schema-compliant but don’t capture data correctly (for example: a wrong value is logged; event counts that should match don’t)
2) events are logged but are not schema-compliant (e.g.: a required field is missing)
3) events are missing due to issues with the instrumentation (e.g.: a UI element is not instrumented)
4) events are missing due to client issues (a specific UI element is not correctly rendered on a given browser/platform and as a result the event is not fired)
5) events are missing due to EventLogging outages
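Of these five, scenario 2 is the most amenable to automated checking. As a rough sketch of the kind of check involved (the schema and field names below are invented for illustration; EventLogging's actual validation is schema-driven and more complete):

```python
# Minimal sketch of a schema-compliance check. The schema format here
# is a simplification; EventLogging uses full JSON Schema.
def find_violations(event, schema):
    """Return a list of human-readable constraint violations."""
    violations = []
    for field in schema.get("required", []):
        if field not in event:
            violations.append("missing required field: %s" % field)
    for field, expected in schema.get("types", {}).items():
        if field in event and not isinstance(event[field], expected):
            violations.append("wrong type for %s" % field)
    return violations

# Hypothetical schema, for illustration only.
SCHEMA = {"required": ["action", "pageId"], "types": {"pageId": int}}

print(find_violations({"action": "click"}, SCHEMA))
```

A check like this catches scenario 2 mechanically; scenarios 1 and 3-5 need counts and cross-checks rather than per-event validation.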
In the early days, Ori and I floated the idea of unit tests for instrumentation to capture constraint violations that are not easily detected via manual testing or the existing client-side validation, but this never happened. When it comes to feature deployments, beta labs is a great starting point for running manual data QA in an environment that is as close as possible to prod. However, there are types of data quality issues that we only discover when collecting data at scale and in the wild (on browsers/platforms that we don’t necessarily test for internally).
Having a full-fledged set of unit tests for data would be terrific, but in the short term I’d like to find a better way to at least identify events that fail validation as early as possible.
- the SQL log database has real-time data, but only for events that pass client-side validation
- the JSON logfiles on stat1003 include invalid events, but the data is only rsync’ed from vanadium once a day
Is there a way to inspect invalid events in near real time without having access to vanadium? For example, could we either create a dedicated database for invalid events only, or rsync a logfile of validation errors to stat1003 more frequently than once a day?
Thoughts?
Dario
Thanks Dario, et al.
A +1 from me -- this will make integration a lot easier. Let's see if we can address this in the Q3 project about dashboarding.
-Toby
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
On Thu, Dec 11, 2014 at 4:11 PM, Dario Taraborelli < dtaraborelli@wikimedia.org> wrote:
is there a way to inspect invalid events in near real time without having access to vanadium?
There's this graph: https://graphite.wikimedia.org/render/?width=586&height=308&_salt=1418343627.977&from=-1weeks&target=movingMedian(diffSeries(eventlogging.overall.raw.rate%2Ceventlogging.overall.valid.rate)%2C20)
The key is 'diffSeries(eventlogging.overall.raw.rate,eventlogging.overall.valid.rate)', which gets you the rate of invalid events per second.
It is not broken down by schema, though.
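For scripted monitoring, the same target can be pulled from the render API as JSON rather than as a PNG. A sketch (the endpoint and target are from the link above; the helper functions and their names are my own):

```python
# Sketch: build a Graphite render-API URL for the invalid-event rate
# and extract the latest non-null datapoint from the JSON response.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GRAPHITE = "https://graphite.wikimedia.org/render/"
TARGET = "diffSeries(eventlogging.overall.raw.rate,eventlogging.overall.valid.rate)"

def render_url(target, since="-1hours"):
    return GRAPHITE + "?" + urlencode(
        {"target": target, "from": since, "format": "json"})

def latest_value(series_json):
    """series_json is the decoded render-API response: a list of
    {"target": ..., "datapoints": [[value, timestamp], ...]}."""
    points = series_json[0]["datapoints"]
    non_null = [v for v, t in points if v is not None]
    return non_null[-1] if non_null else None

# e.g.: latest_value(json.load(urlopen(render_url(TARGET))))
```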
We can't write invalid events to a database -- at least not the same way we write well-formed events. The table schema is derived from the event schema, so an invalid event would violate the constraints of the table as well.
It's possible (and easy) to set something up that watches invalid events in real-time and does something with them. The question is: what? E-mail an alert? Produce a daily report? Generate a graph?
If you describe how you'd like to consume the data, I can try to hash out an implementation with Nuria and Christian.
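One concrete shape for that "does something with them" step could be a per-schema threshold alert. A sketch, with the threshold, window size, and notification hook all placeholders of my own:

```python
# Sketch of a per-schema alert on invalid-event counts. Threshold,
# window, and the notify hook are illustrative, not a real design.
from collections import Counter, deque
import time

class InvalidEventWatcher:
    def __init__(self, threshold=10, window_seconds=60, notify=print):
        self.threshold = threshold
        self.window = window_seconds
        self.notify = notify
        self.events = deque()  # (timestamp, schema) pairs

    def observe(self, schema, now=None):
        """Record one invalid event; alert when a schema crosses the threshold."""
        now = time.time() if now is None else now
        self.events.append((now, schema))
        # Drop observations that have fallen out of the sliding window.
        while self.events and self.events[0][0] < now - self.window:
            self.events.popleft()
        counts = Counter(s for _, s in self.events)
        if counts[schema] == self.threshold:  # fire once per crossing
            self.notify("schema %s: %d invalid events in %ds"
                        % (schema, counts[schema], self.window))
```

The same skeleton could feed a daily report or a graph instead of an alert; only the notify hook changes.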
Captured in Phab:
https://phabricator.wikimedia.org/T78355
Please wordsmith and add other projects as appropriate. Thanks!
thanks for the quick turnaround.
On Dec 11, 2014, at 4:28 PM, Ori Livneh ori@wikimedia.org wrote:
There's this graph: https://graphite.wikimedia.org/render/?width=586&height=308&_salt=1418343627.977&from=-1weeks&target=movingMedian(diffSeries(eventlogging.overall.raw.rate%2Ceventlogging.overall.valid.rate)%2C20)
The key is 'diffSeries(eventlogging.overall.raw.rate,eventlogging.overall.valid.rate)', which gets you the rate of invalid events per second.
It is not broken down by schema, though.
this is great for monitoring, but for QA purposes we really need the raw data
We can't write invalid events to a database -- at least not the same way we write well-formed events. The table schema is derived from the event schema, so an invalid event would violate the constraints of the table as well.
rrright
It's possible (and easy) to set something up that watches invalid events in real-time and does something with them. The question is: what? E-mail an alert? Produce a daily report? Generate a graph?
If you describe how you’d like to consume the data, I can try to hash out an implementation with Nuria and Christian.
a JSON log like all-events.log but sync’ed from vanadium more frequently would do the job for me. It can also be truncated as we probably only need a relatively short time window and the complete data is captured in all-events anyway.
D
Team:
Besides the ability to test in beta labs and the monitoring that Ori highlighted, the incoming raw stream of events is available on stat1003/stat1002 on port 8600.
From stat1002 or stat1003 you can run: zsub vanadium.eqiad.wmnet:8600 and see the incoming stream.
I am not sure that anything beyond that is needed; please check it out and let us know.
Thanks,
Nuria
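For reference, zsub is roughly a plain ZeroMQ SUB socket with no topic filter. A sketch of the equivalent in Python (requires pyzmq; the endpoint is the one given above, and one-JSON-object-per-message is an assumption about the stream format, so adjust to what actually arrives):

```python
# Sketch: roughly what `zsub vanadium.eqiad.wmnet:8600` does.
# Requires pyzmq (third-party). One JSON object per message is an
# assumption; lines that don't parse are skipped.
import json

def parse_event(message):
    """Decode one stream message; return None if it isn't JSON."""
    try:
        return json.loads(message)
    except ValueError:
        return None

def tail_stream(endpoint="tcp://vanadium.eqiad.wmnet:8600"):
    import zmq  # pip install pyzmq
    sock = zmq.Context().socket(zmq.SUB)
    sock.setsockopt_string(zmq.SUBSCRIBE, "")  # subscribe to everything
    sock.connect(endpoint)
    try:
        while True:
            event = parse_event(sock.recv_string())
            if event is not None:
                print(event.get("schema"), event)
    finally:
        sock.close()  # disconnect promptly: subscribing adds load to vanadium
```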
Hi,
On Thu, Dec 11, 2014 at 06:03:15PM -0800, Nuria Ruiz wrote:
Besides the ability of testing in beta labs and the monitoring that ori highlited the incoming raw stream of events is available in 1003/1002 on port 8600.
That's not the raw stream, but the multiplexed stream of validated events. Hence, it does not contain the invalid events Dario is looking for.
From 1002 or 1003 you can run: zsub vanadium.eqiad.wmnet:8600 and see the incoming stream.
This command adds to the load on vanadium -- especially the network load, which jumps up by 20% when the command is running. Since some parts of the pipeline on vanadium use UDP, it's better not to saturate the network.
So please only run the command if you have to, and only as long as you have to.
Have fun, Christian
Hi Dario,
On Thu, Dec 11, 2014 at 04:11:49PM -0800, Dario Taraborelli wrote:
I am kicking off this thread [...]
Thanks!
However, there are types of data quality issues that we only discover when collecting data at scale and in the wild (on browsers/platforms that we don’t necessarily test for internally).
Full ACK.
However, that sounds like we're only talking about schemas where the collection code got tested using Vagrant or beta, and is known to work on the relevant portion of the traffic.
And since you say that it's on browsers/platforms that we don't necessarily test for internally, I assume we're actually talking only about a small fraction of the traffic.
I assume that scope for the rest of the reply.
is there a way to inspect invalid events in near real time without having access to vanadium?
* Urgent, ad-hoc needs
For urgent, ad-hoc needs, (which should happen really seldom, given the scope), ping us in IRC in #wikimedia-analytics. At least qchris, milimetric, and nuria should be able to ssh into vanadium and can take a look right away.
If none of them are around, Ops of course have access to the relevant files on vanadium [1]. And since we're in the case of urgent, ad-hoc needs, I am sure they'd help out.
* Not so urgent needs
For not so urgent needs, since it's only a small fraction of the traffic, I am not sure real-time access is worth it.
Sure, it would be nice to provide near real-time access to those files, but we should also get the cluster into a more reliable state, implement UDFs for researchers to make their lives easier, and get the data-warehouse up and running ;-)
But I see that meanwhile a Phabricator task got added, and I guess I am alone with my judgement :-)
Have fun, Christian
[1] Either
/srv/log/eventlogging/client-side-events.log
or
/srv/log/eventlogging/server-side-events.log
depending on the kind of event you're looking for.
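Given those paths, a one-off scan on vanadium could look like the sketch below (the schema name is a placeholder, and one-JSON-object-per-line is an assumption about the log format; unparseable lines are skipped):

```python
# Sketch: pull events for one schema out of an EventLogging logfile.
# File path from the footnote above; the schema name is hypothetical.
import json

def scan(lines, schema):
    """Yield decoded events for the given schema, skipping unparseable lines."""
    for line in lines:
        try:
            event = json.loads(line)
        except ValueError:
            continue  # raw or garbled line
        if event.get("schema") == schema:
            yield event

# e.g.:
# with open("/srv/log/eventlogging/client-side-events.log") as f:
#     for ev in scan(f, "MobileWebClickTracking"):
#         print(ev)
```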
But I see that meanwhile a Phabricator task got added, and I guess I am alone with my judgement :-)
Actually, I fully agree with you that no more infrastructure in this regard is needed, and I think we were a little fast filing tasks here. I really think that every time we find ourselves testing in production we should evaluate what we can do better in the testing pipeline, rather than augment production with more "testing" tools.
For now we should be able to help in irc and do as much testing as possible in beta labs. How to access data in beta labs is documented here: https://wikitech.wikimedia.org/wiki/EventLogging/Testing/BetaLabs
I talked to the mobile team about testing in beta labs (as it was an issue with mobile instrumentation that sparked this discussion) and they have used it recently.
Thanks,
Nuria
I closed the Phabricator task with links to this thread and the wikitech doc for testing on the beta cluster. https://phabricator.wikimedia.org/T78355
Hi,
On Mon, Dec 15, 2014 at 08:34:39AM -0800, Kevin Leduc wrote:
I closed the Phabricator task with links to this thread and the wikitech doc for testing on the beta cluster.
I am fine with keeping the task closed.
But I am somewhat surprised to see beta mentioned in the resolution. Note that Dario's request set scope as [1]
However, there are types of data quality issues that we only discover when collecting data at scale and in the wild (on browsers/platforms that we don’t necessarily test for internally).
That's a valid scope, but from my point of view, beta does not match that scope.
Neither is beta large scale, nor is it hammered on with crazy devices.
Beta just halves the distance between EventLogging's devserver (Vagrant!) and production.
Have fun, Christian
[1] https://lists.wikimedia.org/pipermail/analytics/2014-December/002884.html
I share Christian's concerns -
Dario/Leila - can you comment based on your recent experiences with WikiGrok?
Thanks
-Toby
I filed a bug about the difficulty of debugging schema failures back in November, but no one ever responded to it: https://phabricator.wikimedia.org/T75678
On Mon, Dec 15, 2014 at 10:06 AM, Toby Negrin tnegrin@wikimedia.org wrote:
I share Christian's concerns -
Dario/Leila - can you comment based on your recent experiences with WikiGrok?
I agree with Christian.
QA in beta labs is good but not enough. We still need to do QA when a feature goes to production and currently, it's very hard to figure out if there's a problem with logging. An example:
While testing WikiGrok in production, we learned that at some point tests from the Firefox browser on my machine were not being logged. We did not get any errors for this. I found out about it because I was trying to manually trace my activities and see if I could stitch them together and make sense of them. We eventually figured out what was going on in that case [1], but it concerns me that there may be other important events that we don't log in the DB, and that we never know we're not logging.
Leila [1] https://lists.wikimedia.org/pipermail/analytics/2014-December/002864.html
QA in beta labs is good but not enough. We still need to do QA when a feature goes to production and currently...
This is true, but at the same time I do not see anything in the description of your FF events that could not be tested on beta labs. If we are talking ad-block, that can be tested even earlier; vagrant would be a fine venue. All the issues related to the client (browser) not emitting events can be tested on the development environment with ease.
I reopened the task because discussions on this are still ongoing and the issue isn't entirely resolved.
I'd like to move this to a video conference call between analytics developers and analytics engineering to come to a mutual understanding of what the current pain points are and what's the biggest priority. We'll then communicate a plan back to the list and update the tasks involved.
On Mon, Dec 15, 2014 at 4:37 PM, Nuria Ruiz nuria@wikimedia.org wrote:
QA in beta labs is good but not enough. We still need to do QA when a
feature goes to production and currently This is true but at the same time, I do not see anything in the description of your FF events that could not be tested on beta-labs. If we are talking add-block that can be tested even earlier, vagrant will be a fine venue. All the issues related to the client (browser) not emitting events can be tested on the development environment with ease.
On Mon, Dec 15, 2014 at 4:18 PM, Leila Zia leila@wikimedia.org wrote:
On Mon, Dec 15, 2014 at 10:06 AM, Toby Negrin tnegrin@wikimedia.org wrote:
I share Christian's concerns -
Dario/Leila - can you comment based on your recent experiences with WikiGrok?
I agree with Christian.
QA in beta labs is good but not enough. We still need to do QA when a feature goes to production and currently, it's very hard to figure out if there's a problem with logging. An example:
While testing WikiGrok in production, we learned that after some point tests from Firefox browser from my machine were not logged. We did not get any errors for this. I found out about this because I was trying to manually make a trace of activities and see if I can stitch them together and make sense of them. We eventually figured out what was going on in that case [1], but it concerns me that there may be other important events that we don't log in the DB and we never know that we're not logging.
Leila [1] https://lists.wikimedia.org/pipermail/analytics/2014-December/002864.html
Thanks
-Toby
On Dec 15, 2014, at 9:42 AM, Christian Aistleitner <
christian@quelltextlich.at> wrote:
Hi,
On Mon, Dec 15, 2014 at 08:34:39AM -0800, Kevin Leduc wrote: I closed the Phabricator task with a links to this thread and the
wikitech
doc for testing on beta cluster.
I am fine with keeping the task closed.
But I am somewhat surprised to see beta mentioned in the resolution. Note that Dario's request set the scope as [1]:

> However, there are types of data quality issues that we only discover when collecting data at scale and in the wild (on browsers/platforms that we don’t necessarily test for internally).

That's a valid scope, but from my point of view, beta does not match it. Beta is neither large scale, nor is it hammered on by crazy devices. Beta just halves the distance between EventLogging's devserver (Vagrant!) and production.
Have fun, Christian
[1]
https://lists.wikimedia.org/pipermail/analytics/2014-December/002884.html
--
---- quelltextlich e.U. ----  \  ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3     Email: christian@quelltextlich.at
4293 Gutau, Austria          Phone: +43 7946 / 20 5 81
                             Fax:   +43 7946 / 20 5 81
                             Homepage: http://quelltextlich.at/
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
On Monday, December 15, 2014, Kevin Leduc kevin@wikimedia.org wrote:
I'd like to move this to a video conference call between analytics developers and analytics engineering to come to a mutual understanding of what the current pain points are and what's the biggest priority.
It probably makes sense to have someone from R&D with experience in QA in that meeting (Dario if you want a more experienced person, myself otherwise). Not sure if you meant the same when you said analytics engineering.
Leila
I added a comment to the ticket requesting a simple error log for validation errors. I think that would solve about 50% of the problem and should be easy to implement.
Kaldari
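To make the request concrete, here is a minimal sketch of what such a validation-error log could look like. Everything here is hypothetical: the field names, the log path, and the single-schema check are stand-ins, and the real EventLogging pipeline does full JSON Schema validation rather than this simplified required-fields test. The point is only that events failing validation get written somewhere inspectable instead of being silently dropped.

```python
import json
import logging

# Hypothetical required fields for one event schema; real EventLogging
# validates against full JSON Schemas fetched from the schema wiki.
REQUIRED_FIELDS = {"name", "mobileMode", "userEditCount"}

# Hypothetical destination; the idea is a file that could be rsync'ed to
# stat1003 (or tailed) far more often than once a day.
logging.basicConfig(
    filename="eventlogging-validation-errors.log",
    level=logging.WARNING,
    format="%(asctime)s %(message)s",
)

def validate_event(raw_event: str) -> bool:
    """Return True if the event passes; otherwise log why it failed."""
    try:
        event = json.loads(raw_event)
    except ValueError as err:
        logging.warning("malformed JSON: %r (%s)", raw_event, err)
        return False
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        logging.warning("missing required fields %s in %r",
                        sorted(missing), raw_event)
        return False
    return True
```

A consumer could then watch this log (or a dedicated invalid-events table fed the same way) to catch schema-compliance problems within minutes of a deployment instead of after the daily rsync.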
On Mon, Dec 15, 2014 at 5:58 PM, Leila Zia leila@wikimedia.org wrote:
On Monday, December 15, 2014, Kevin Leduc kevin@wikimedia.org wrote:
I'd like to move this to a video conference call between analytics developers and analytics engineering to come to a mutual understanding of what the current pain points are and what's the biggest priority.
It probably makes sense to have someone from R&D with experience in QA in that meeting (Dario if you want a more experienced person, myself otherwise). Not sure if you meant the same when you said analytics engineering.
Leila