Hey yo,
Just a note that EventLogging had replication problems and needed to be backfilled yesterday. This means that if you had scripts running early this morning over EventLogging data from yesterday or the last few days, you'll probably need to rerun them, so please check whether yours are affected.
Clarification: it's backfilling from the database consumer's POV, but no data was actually dropped. It was just replication lag :)
On 12 January 2016 at 10:01, Oliver Keyes okeyes@wikimedia.org wrote:
-- Oliver Keyes Count Logula Wikimedia Foundation
Update: still backlogged, so be aware of this if you're relying on EL for day-to-day events.
On 12 January 2016 at 10:06, Oliver Keyes okeyes@wikimedia.org wrote:
Update: partial resolution thus far. Schemas that produce fewer than 1,000 events in the time before the replication script reaches them (i.e. most of the smaller ones) are now working again. The others are still lagging. Basically, you should check your tables.
Many thanks to Nuria and Mr Otto for resolving so much of the problem; it's a very FUD-like process and their ability to cut through it with clarity is most admirable :).
On 13 January 2016 at 11:16, Oliver Keyes okeyes@wikimedia.org wrote:
Monday update!
Jaime is looking into the problem, and you can see the commentary and regular updates at https://phabricator.wikimedia.org/T123634 . It looks like many long-running queries are gradually causing the lag to accumulate, and Faidon's commentary on the Ops list was accurate. So please keep your queries short, or run them on Quarry, if you possibly can.
In the long term I suspect we want a second box, so that we have "all the databases up to date" to draw from for reporting and "all the databases maybe a bit lagged" for the queries that take a while to run, but we shall see what we shall see. Thanks to Andrew and Nuria for staying on top of this, and to Jaime for jumping right back in so soon after returning from holiday.
On 15 January 2016 at 10:37, Oliver Keyes okeyes@wikimedia.org wrote:
Thanks for keeping the list updated on this, Oliver. You are awesome :)
On Mon, Jan 18, 2016 at 12:37 PM, Oliver Keyes okeyes@wikimedia.org wrote:
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics