Hi!
The speed bumps from the eventlogging migration are almost ironed out:
1. db1048 has had its eventlogging uuid fields formally made UNIQUE KEYs (see the sketch below). I gather Ori will now run some validation against the logs to check for remaining gaps.
2. db1046 which died mid-migration has been restored and is catching up. This doesn't really affect Analytics except that it's to be part of db1047's replication chain for eventlogging.
3. db1047 is finishing up reloading log data and removing the CONNECT federated tables involved in bug 64445[1].
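For the curious, the uuid change on db1048 was presumably something along these lines. A sketch only: real EventLogging tables are named after their schema and revision, so SomeSchema_1234567 is a made-up stand-in.

    -- Illustrative sketch: promote the uuid column to a UNIQUE KEY so
    -- the database rejects duplicate events instead of storing them twice.
    ALTER TABLE log.SomeSchema_1234567
        ADD UNIQUE KEY uuid (uuid);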
As something of a consolation prize, "analytics-store.eqiad.wmnet" is now open for SELECT queries from the 'research' user. This box:
- Is a CNAME for dbstore1002.eqiad.wmnet.
- Replicates all wikis in one place.
- Can be hammered. Please feel free.
- Can have scratch space for temporary writes (but doesn't yet).
- Can replicate eventlogging too (but doesn't yet).
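By way of a smoke test, any simple read-only query should do; for example, assuming the per-wiki databases keep their production names (enwiki and so on):

    -- Fetch the five most recent revisions from the enwiki replica.
    -- Substitute any replicated wiki for enwiki.
    SELECT rev_id, rev_timestamp
    FROM enwiki.revision
    ORDER BY rev_timestamp DESC
    LIMIT 5;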
I would appreciate it if anyone with suitable read-only reports could try them out and report back.
BR Sean
[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=64445
On Wed, Apr 30, 2014 at 1:20 AM, Sean Pringle <springle@wikimedia.org> wrote:
- Is a CNAME for dbstore1002.eqiad.wmnet.
Just to be contrary I've already messed with the CNAME to point to dbstore1001 instead :-) Am doing more work on 02.
The dbstore* boxes are the same setup, each replicating all shards, etc. Just use the CNAME and all should be well.
Sean, calling this a consolation prize is an understatement; this is terrific.
I just noticed that centralauth is not included; after EventLogging data, this is the most useful database to have replicated on the One Box.
Dario
On Wed, Apr 30, 2014 at 6:01 AM, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:
I just noticed that centralauth is not included [...]
Good point. I had not granted access to centralauth for the 'research' user. Should work now.
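A sketch of the kind of grant involved; the host pattern here is an assumption, not the production value:

    -- Illustrative only: give the 'research' account read-only access
    -- to the centralauth database. The real host pattern differs.
    GRANT SELECT ON centralauth.* TO 'research'@'10.%';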
One word: YAY!
Thank you so much for this, Sean :D
Okay, so, I have tested (to a limited degree; the database work I'm doing involves eventlogging, so this is mostly me making up excuses to run queries). Thoughts:
* We should probably put in some kind of restrictions around what we care about. For example, I see the tables relating to the Wikimania and ArbCom wikis in there. This is not data I think we're ever going to care about, but it is *data*, which means we'll either have to write really complex UNIONs to gather global data (see the sketch below), with a constantly maintained list of dbs-we-don't-care-about, or accept inaccuracies in our data. My suggestion would be for these dbs to be removed and excluded from replication, using the noc dblists to identify the ones we don't care about; generally the "deleted", "closed", "special", and "wikimedia" wikis aren't things we want to be running queries over.
* This is probably my bad, but I understood the goal to be having a single db containing unified core tables. So we'd have one db, with one revision table, that'd have an extra column, "wiki", denoting the project the entry referred to. This would let us perform global queries without the complex UNIONs mentioned above. Is this still the goal, or...?
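To make the pain concrete, here is roughly what a global query looks like without unified views. A sketch: the real list would run to hundreds of wikis.

    -- A "global" edit count across separate per-wiki databases.
    -- Every wiki in scope needs its own UNION branch, and the list
    -- has to be maintained by hand.
    SELECT 'enwiki' AS wiki, COUNT(*) AS revisions FROM enwiki.revision
    UNION ALL
    SELECT 'dewiki', COUNT(*) FROM dewiki.revision
    UNION ALL
    SELECT 'frwiki', COUNT(*) FROM frwiki.revision;
    -- ...and so on for every remaining wiki.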
Also, where are you replicating from? Only, there are kind of a lot of tables here I don't recognise.
On Wed, Apr 30, 2014 at 3:39 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
Also, where are you replicating from? [...]
From each shard, using the same replication streams as the sX-analytics-slaves use.
Which tables?
Hurrh; that's...weird. So, looking at s1, I now see the same tables as on the One Box, but I swear I haven't seen some of them there before. Examples are exorphans, log_search and filejournal.
Aaron, Dario, Leila: am I crazy, or were these not previously there?
There's also "prefstats", the last entry in which is dated 20120321195103. Either I'm mad or these are legacy tables that were previously excluded from replication to the analytics slaves (it's probable that I'm mad).
On Wed, Apr 30, 2014 at 7:17 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
There's also "prefstats" [...]
At least log_search, filejournal, and prefstats also exist on s1-analytics-slave enwiki. Maybe some quirk of account permissions was affecting what you saw last time.
Still, I'm leaning toward the mad option ;)
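An easy way to settle it either way; note that table visibility depends on the account's grants, so two users can legitimately see different lists:

    -- Check whether the disputed tables are visible to the current
    -- account on the enwiki replica. An empty result means the table
    -- is absent or hidden by the account's grants.
    SHOW TABLES FROM enwiki LIKE 'prefstats';
    SHOW TABLES FROM enwiki LIKE 'log_search';
    SHOW TABLES FROM enwiki LIKE 'filejournal';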
Oliver asked me to confirm that he's not hallucinating, and I too am seeing tables that were not previously visible to the research user on the slaves. Not a big deal and probably a legacy permission issue.
On Wed, Apr 30, 2014 at 12:44 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
We should probably put in some kind of restrictions around what we care about. [...] My suggestion would be for these dbs to be removed and excluded from replication [...]
If there are wikis you guys know for sure nobody using the 'research' user will ever want, then they can simply be hidden by modifying the account grants.
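A sketch of what hiding by grants could look like; the wiki and host pattern are illustrative, and this assumes access was granted per database rather than globally:

    -- Illustrative only: hide one wiki from the 'research' account by
    -- revoking its per-database grant, leaving replication untouched.
    REVOKE SELECT ON arbcom_enwiki.* FROM 'research'@'10.%';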
This is probably my bad, but I understood the goal to be having a single db containing unified core tables. So we'd have one db, with one revision table, that'd have an extra column, "wiki", denoting the project the entry referred to. This would let us perform global queries without the complex UNIONs mentioned above. Is this still the goal, or...?
No, that wasn't the goal. Sorry if there was miscommunication. The actual data will remain in separate wikis using regular replication.
However, it's quite possible to create one or more unified databases with (for example) SQL VIEWs that union all tables from a set of pre-defined wikis, with 'wiki' columns, just as you describe. Same thing, really. We could even allow ad-hoc creation of unified views for whatever .dblist is appropriate for the project. I don't think anything need be ruled out yet -- that's the whole point of SQL, right? Slow, but flexible. :-)
Sean
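A minimal sketch of the union-ified view Sean describes. The 'datasets' database is an assumption (somewhere writable for such views to live), and the two-wiki list is illustrative:

    -- A unified view over per-wiki revision tables, with a 'wiki'
    -- column identifying the source project.
    CREATE VIEW datasets.global_revision AS
        SELECT 'enwiki' AS wiki, rev_id, rev_page, rev_timestamp
        FROM enwiki.revision
        UNION ALL
        SELECT 'dewiki', rev_id, rev_page, rev_timestamp
        FROM dewiki.revision;

    -- A global query then collapses to a single statement:
    SELECT wiki, COUNT(*) FROM datasets.global_revision GROUP BY wiki;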
On Apr 30, 2014, at 8:40 AM, Sean Pringle <springle@wikimedia.org> wrote:
If there are wikis you guys know for sure nobody using the 'research' user will ever want, then they can simply be hidden by modifying the account grants.
Oliver, I am not sure how we define “data we’re [n]ever going to care about”. I do expect we will receive occasional requests for data related to closed or special wikis (see https://office.wikimedia.org/wiki/File:Officewiki_ae.png just to mention a recent example).
The point about global queries is well taken, but I think it should be handled differently (see below). Since we’re not talking about privacy here (uncensored data can be obtained by anyone with access to the production DBs), but usability, I’d avoid making assumptions about which wikis should *always* be excluded. We should have an equivalent of the API’s sitematrix with project metadata to allow flexible filtering.
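A sketch of what that sitematrix equivalent could look like, reusing the hypothetical global_revision view from above; every name and column here is made up for illustration:

    -- Hypothetical sitematrix-style metadata table for filtering wikis
    -- at query time instead of excluding them from replication.
    CREATE TABLE datasets.sitematrix (
        dbname     VARCHAR(64) PRIMARY KEY,  -- e.g. 'enwiki'
        family     VARCHAR(32),              -- e.g. 'wikipedia'
        is_closed  TINYINT(1),
        is_special TINYINT(1)
    );

    -- Filter a global query down to open, non-special Wikipedias.
    SELECT r.wiki, COUNT(*) AS revisions
    FROM datasets.global_revision r
    JOIN datasets.sitematrix s ON s.dbname = r.wiki
    WHERE s.family = 'wikipedia' AND s.is_closed = 0 AND s.is_special = 0
    GROUP BY r.wiki;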
However, it's quite possible to create one or more unified databases with (for example) SQL VIEWs that union all tables from a set of pre-defined wikis, with 'wiki' columns, just as you describe. [...]
That would work, and Oliver is right that creating views for core tables in pre-defined wikis (say, all Wikipedias) would be valuable. Sean, how about we create a page on wikitech with requirements for these views and take it from there?
Dario
This is awesome, thank you Sean
Union-ified views sound great here. Let's see how they perform. I bet they'll be fine, but if they're not, maybe we can throw them into Hadoop? Using the views to do the MySQL -> Hadoop replication would be so much easier than going to each database individually.
Hi Sean,
I am very excited about this. Thank you. :-) Re unified views:
Like Oliver, I also thought we would have everything in one database. I guess Oliver and I talk a lot to each other. ;-)
It will be great to have unified views.
Thanks, Leila
P.S. Oliver, you are not hallucinating, as Dario confirmed, too.
On 30 April 2014 06:59, Dan Andreescu <dandreescu@wikimedia.org> wrote:
Union-ified views sound great here. [...] Using the views to do the MySQL -> Hadoop replication would be so much easier than going to each database individually.
Totally down for that, but...
https://bugzilla.wikimedia.org/show_bug.cgi?id=64262
I think we'll put everything on Hadoop at some point but we're focusing on the page views now.
Regarding the bug: if you're ready to use it, I can see if Andrew can install the Java package.
-Toby
Not quite there yet; just pointing to it as a potential blocker to the "let's move everything to Hadoop!" idea (which I fully support). If the goal is to enable research using unified data, but the unified data is more difficult to access than the non-unified data, we probably haven't moved the needle enough to justify it. "A sane way to access this stuff from Python and R" should probably be considered a pretty firm prerequisite, because without that, the utility isn't tremendously increased.
On Wed, Apr 30, 2014 at 12:44 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
The work I'm doing that involves the dbs involves eventlogging [...]
eventlogging should now be accessible from the One Box. It will still be playing catch-up replication for a few hours though...
Whee!
~30 hours of replag as I write, but this is very exciting. Thanks Sean!
In case you’re wondering, the EventLogging DB is called “log”, same as the previous one.
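One way to eyeball the replag from the data itself; a sketch, with an illustrative table name, assuming EventLogging timestamps use MediaWiki's yyyymmddhhmmss format:

    -- Compare the newest replicated event against the current time.
    -- 'SomeSchema_1234567' stands in for any real EventLogging table.
    SELECT
        MAX(timestamp) AS newest_event,
        DATE_FORMAT(NOW(), '%Y%m%d%H%i%s') AS now_mw_format
    FROM log.SomeSchema_1234567;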
Sean, thanks so much!
On Tue, Apr 29, 2014 at 8:20 AM, Sean Pringle <springle@wikimedia.org> wrote:
- db1048 has had its eventlogging uuid fields formally made UNIQUE KEYs. I gather Ori will now run some validation against the logs to check for remaining gaps.
This should finish in 16 hours or so.
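For anyone curious what such validation might involve, a hedged sketch; the table name is illustrative, and the real check presumably reconciles against the raw logs rather than just the table:

    -- With uuid now a UNIQUE KEY, new duplicates are impossible, but a
    -- sanity check over previously loaded rows might look like this.
    SELECT uuid, COUNT(*) AS copies
    FROM log.SomeSchema_1234567
    GROUP BY uuid
    HAVING copies > 1;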
As something of a consolation prize, "analytics-store.eqiad.wmnet" is now open for SELECT queries from the 'research' user. This box: [...]
- Can replicate eventlogging too (but doesn't yet).
Could we please? :)
On Wed, Apr 30, 2014 at 12:58 PM, Ori Livneh <ori@wikimedia.org> wrote:
Could we please? :)
(Just realized this is already done. Awesome.)
Where might I find the credentials of the research user? I only have research_prod's password, which I was using to connect to db1047. That one doesn't seem to work on analytics-store.eqiad.wmnet.
Hi Gilles,
you shouldn’t use “research_prod” if you simply need to perform read-only queries against the slaves (the “research” user is the one you should use instead, at least until we revisit the policy on SQL credentials with ops). I’ll drop you a line off-list with instructions on the credentials.
D