Looks like I was right.  

I had a query writing to a table. I thought it would finish last night, but it ran for more than 24 hours. I've killed it and changed the query so that it writes to an output file instead. I restarted the query, and lag now seems to be recovering. Lag is ~14 hours for enwiki and ~18 hours for log.

-Aaron




On Tue, Jul 29, 2014 at 5:56 PM, Aaron Halfaker <ahalfaker@wikimedia.org> wrote:
This might be me. I'm killing the query I'm worried about. I'll report back.


On Tue, Jul 29, 2014 at 5:46 PM, Christian Aistleitner <christian@quelltextlich.at> wrote:
Hi,

just a quick heads-up that the replication lag on
analytics-store.eqiad.wmnet (aka “The one machine to rule them all”)
has risen to >12 hours for s1 replicas. Other replicas are fine.

So on analytics-store.eqiad.wmnet:
* enwiki is affected.
* log (EventLogging) is affected.

Other databases (like dewiki, eswiki, ...) on
analytics-store.eqiad.wmnet are /not/ affected.



For queries that rely only on enwiki or log, you can use

  s1-analytics-slave.eqiad.wmnet

as a drop-in replacement. enwiki and log are not lagging there.
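If your scripts choose a host automatically, the routing rule above can be sketched as below. This is only an illustration under stated assumptions: the host names and the lagging databases (enwiki, log) come from this mail, but the helper name pick_host and the constants are hypothetical. It also assumes that s1-analytics-slave only serves enwiki and log, so a query that mixes in another database (e.g. dewiki) has to stay on analytics-store and will still see the lag.

```python
# Hypothetical helper; host names and lagging databases are taken from the
# 2014-07-29 heads-up mail, everything else is an assumption.
ANALYTICS_STORE = "analytics-store.eqiad.wmnet"
S1_SLAVE = "s1-analytics-slave.eqiad.wmnet"

# Databases reported as lagging on analytics-store (and fine on the s1 slave).
S1_LAGGING_DATABASES = {"enwiki", "log"}


def pick_host(databases):
    """Return the host to query, given the databases the query touches.

    Queries that use only enwiki and/or log go to the s1 slave, which is not
    lagging for them. Anything else (including mixed queries) stays on
    analytics-store, assuming the s1 slave does not serve other databases.
    """
    dbs = set(databases)
    # A non-empty subset of the lagging databases can be moved to the s1 slave.
    if dbs and dbs <= S1_LAGGING_DATABASES:
        return S1_SLAVE
    return ANALYTICS_STORE
```

For example, pick_host(["enwiki", "log"]) picks s1-analytics-slave.eqiad.wmnet, while pick_host(["enwiki", "dewiki"]) falls back to analytics-store.eqiad.wmnet.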

I filed RT ticket 8032:
  https://rt.wikimedia.org/Ticket/Display.html?id=8032

Best regards,
Christian


--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
                           Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3     Email:  christian@quelltextlich.at
4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
                             Fax:            +43 7946 / 20 5 81
                             Homepage: http://quelltextlich.at/
---------------------------------------------------------------

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics