Hi, This is not urgetn, but if anyone is interested in troubleshooting a (probably basic) problem I am having with mysql, I would appreciate it!
I am trying to query the MobileWebClickTracking table, which is ginormous and keeps timing out or boiling my RAM So Dario suggested I use screen to dump a section of the table into a more workable table.
Here is his test query that seemed to work:
*screen mysql -h analytics-store.eqiad.wmnet -B -e "CREATE TABLE staging.jkatz_foo SELECT * FROM enwiki.user LIMIT 300;"*
I tried to do this on my own (I use analytics-slave instead, as my credentials don't seem to work on analytics-store), and it doesn't seem to do anything. Can you try on your end and let me know if you're having any luck?
Here is the query:
jkatz@bast1001:~$ screen
jkatz@bast1001:~$ mysql -h analytics-slave.eqiad.wmnet -u research -pJoFjnA90Ajyp -B -e "Insert into staging.jkatz_clicktracking1 Select * from log.MobileWebClickTracking_5929948 WHERE ('timestamp' between 20141101000000 and 20141130000000) and wiki like 'enwiki';"
When I look in processes in stat1003, I only see a sleep command--no query: [image: Inline image 2]
Is the query within that sleep? Anyway, I seem to be creating tables but they do not have any rows in them. I have double checked that the data between those dates exists in that table.
Any thoughts?
Best,
J
Actually, no; those values don't exist.
mysql:research@s1-analytics-slave.eqiad.wmnet [staging]> DESCRIBE log.MobileWebClickTracking_5929948 -> ; +---------------------+--------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +---------------------+--------------+------+-----+---------+-------+ | id | int(11) | NO | PRI | NULL | | | uuid | char(32) | NO | UNI | NULL | | | clientIp | varchar(191) | YES | | NULL | | | clientValidated | tinyint(1) | YES | | NULL | | | isTruncated | tinyint(1) | YES | | NULL | | | timestamp | varchar(14) | YES | MUL | NULL | | | webHost | varchar(191) | YES | | NULL | | | wiki | varchar(191) | YES | | NULL | | | event_destination | varchar(191) | YES | | NULL | | | event_mobileMode | varchar(191) | YES | | NULL | | | event_name | varchar(191) | YES | | NULL | | | event_userEditCount | int(11) | NO | | NULL | | | event_username | varchar(191) | NO | | NULL | | | userAgent | varchar(191) | YES | | NULL | | +---------------------+--------------+------+-----+---------+-------+
MediaWiki timestamps are stored as character strings, not numeric values, because it allows for convenient alpha-sorting. You need to be using timestamp BETWEEN '2014110100000' AND..., with quotes, rather than treating them as numeric values.
On 29 December 2014 at 20:21, Jon Katz jkatz@wikimedia.org wrote:
Hi, This is not urgetn, but if anyone is interested in troubleshooting a (probably basic) problem I am having with mysql, I would appreciate it!
I am trying to query the MobileWebClickTracking table, which is ginormous and keeps timing out or boiling my RAM So Dario suggested I use screen to dump a section of the table into a more workable table.
Here is his test query that seemed to work:
*screen mysql -h analytics-store.eqiad.wmnet -B -e "CREATE TABLE staging.jkatz_foo SELECT * FROM enwiki.user LIMIT 300;"*
I tried to do this on my own (I use analytics-slave instead, as my credentials don't seem to work on analytics-store), and it doesn't seem to do anything. Can you try on your end and let me know if you're having any luck?
Here is the query:
jkatz@bast1001:~$ screen
jkatz@bast1001:~$ mysql -h analytics-slave.eqiad.wmnet -u research -pJoFjnA90Ajyp -B -e "Insert into staging.jkatz_clicktracking1 Select * from log.MobileWebClickTracking_5929948 WHERE ('timestamp' between 20141101000000 and 20141130000000) and wiki like 'enwiki';"
When I look in processes in stat1003, I only see a sleep command--no query: [image: Inline image 2]
Is the query within that sleep? Anyway, I seem to be creating tables but they do not have any rows in them. I have double checked that the data between those dates exists in that table.
Any thoughts?
Best,
J
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
You're my hero, Oliver. Thanks!
On Mon, Dec 29, 2014 at 5:26 PM, Oliver Keyes okeyes@wikimedia.org wrote:
Actually, no; those values don't exist.
mysql:research@s1-analytics-slave.eqiad.wmnet [staging]> DESCRIBE log.MobileWebClickTracking_5929948 -> ; +---------------------+--------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +---------------------+--------------+------+-----+---------+-------+ | id | int(11) | NO | PRI | NULL | | | uuid | char(32) | NO | UNI | NULL | | | clientIp | varchar(191) | YES | | NULL | | | clientValidated | tinyint(1) | YES | | NULL | | | isTruncated | tinyint(1) | YES | | NULL | | | timestamp | varchar(14) | YES | MUL | NULL | | | webHost | varchar(191) | YES | | NULL | | | wiki | varchar(191) | YES | | NULL | | | event_destination | varchar(191) | YES | | NULL | | | event_mobileMode | varchar(191) | YES | | NULL | | | event_name | varchar(191) | YES | | NULL | | | event_userEditCount | int(11) | NO | | NULL | | | event_username | varchar(191) | NO | | NULL | | | userAgent | varchar(191) | YES | | NULL | | +---------------------+--------------+------+-----+---------+-------+
MediaWiki timestamps are stored as character strings, not numeric values, because it allows for convenient alpha-sorting. You need to be using timestamp BETWEEN '2014110100000' AND..., with quotes, rather than treating them as numeric values.
On 29 December 2014 at 20:21, Jon Katz jkatz@wikimedia.org wrote:
Hi, This is not urgetn, but if anyone is interested in troubleshooting a (probably basic) problem I am having with mysql, I would appreciate it!
I am trying to query the MobileWebClickTracking table, which is ginormous and keeps timing out or boiling my RAM So Dario suggested I use screen to dump a section of the table into a more workable table.
Here is his test query that seemed to work:
*screen mysql -h analytics-store.eqiad.wmnet -B -e "CREATE TABLE staging.jkatz_foo SELECT * FROM enwiki.user LIMIT 300;"*
I tried to do this on my own (I use analytics-slave instead, as my credentials don't seem to work on analytics-store), and it doesn't seem to do anything. Can you try on your end and let me know if you're having any luck?
Here is the query:
jkatz@bast1001:~$ screen
jkatz@bast1001:~$ mysql -h analytics-slave.eqiad.wmnet -u research -pJoFjnA90Ajyp -B -e "Insert into staging.jkatz_clicktracking1 Select * from log.MobileWebClickTracking_5929948 WHERE ('timestamp' between 20141101000000 and 20141130000000) and wiki like 'enwiki';"
When I look in processes in stat1003, I only see a sleep command--no query: [image: Inline image 2]
Is the query within that sleep? Anyway, I seem to be creating tables but they do not have any rows in them. I have double checked that the data between those dates exists in that table.
Any thoughts?
Best,
J
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
-- Oliver Keyes Research Analyst Wikimedia Foundation
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Hi,
On Mon, Dec 29, 2014 at 08:26:25PM -0500, Oliver Keyes wrote:
MediaWiki timestamps are stored as character strings, not numeric values, because it allows for convenient alpha-sorting. You need to be using timestamp BETWEEN '2014110100000' AND..., with quotes, rather than treating them as numeric values.
EventLogging stores timestamps as characters. True. But MySQL is good at automatic type conversion. So one can use numeric values just fine:
mysql:research@dbstore1002.eqiad.wmnet [(none)]> SELECT timestamp FROM log.MobileWebClickTracking_5929948 WHERE timestamp BETWEEN 20140101000000 and 20140101000000 LIMIT 1; +----------------+ | timestamp | +----------------+ | 20140101000000 | +----------------+ 1 row in set (0.02 sec)
So no real requirement to quote the values in the BETWEEN in this case. But quoting them makes a few things easier (like allowing to truncate values).
The real issue with the original query in
On 29 December 2014 at 20:21, Jon Katz jkatz@wikimedia.org wrote:
mysql [...] -e "Insert into staging.jkatz_clicktracking1 Select * from log.MobileWebClickTracking_5929948 WHERE ('timestamp' between 20141101000000 and 20141130000000) and wiki like 'enwiki';"
is elsewhere in the WHERE part. The first condition is
'timestamp' between 20141101000000 and 20141130000000
which (due to the quotes) checks if the literal string “timestamp” (and not the value of the column called timestamp) is between the given values.
This BETWEEN condition on the literal string “timestamp” is never met. Hence, the SELECT gave the empty result set.
Have fun, Christian
Huh; interesting! I remember running into a lot of problems with the revision table due to conversion.
On 3 January 2015 at 08:05, Christian Aistleitner < christian@quelltextlich.at> wrote:
Hi,
On Mon, Dec 29, 2014 at 08:26:25PM -0500, Oliver Keyes wrote:
MediaWiki timestamps are stored as character strings, not numeric values, because it allows for convenient alpha-sorting. You need to be using timestamp BETWEEN '2014110100000' AND..., with quotes, rather than
treating
them as numeric values.
EventLogging stores timestamps as characters. True. But MySQL is good at automatic type conversion. So one can use numeric values just fine:
mysql:research@dbstore1002.eqiad.wmnet [(none)]> SELECT timestamp FROM log.MobileWebClickTracking_5929948 WHERE timestamp BETWEEN 20140101000000 and 20140101000000 LIMIT 1; +----------------+ | timestamp | +----------------+ | 20140101000000 | +----------------+ 1 row in set (0.02 sec)
So no real requirement to quote the values in the BETWEEN in this case. But quoting them makes a few things easier (like allowing to truncate values).
The real issue with the original query in
On 29 December 2014 at 20:21, Jon Katz jkatz@wikimedia.org wrote:
mysql [...] -e "Insert into staging.jkatz_clicktracking1 Select * from log.MobileWebClickTracking_5929948 WHERE ('timestamp' between 20141101000000 and 20141130000000) and wiki like 'enwiki';"
is elsewhere in the WHERE part. The first condition is
'timestamp' between 20141101000000 and 20141130000000
which (due to the quotes) checks if the literal string “timestamp” (and not the value of the column called timestamp) is between the given values.
This BETWEEN condition on the literal string “timestamp” is never met. Hence, the SELECT gave the empty result set.
Have fun, Christian
-- ---- quelltextlich e.U. ---- \ ---- Christian Aistleitner ---- Companies' registry: 360296y in Linz Christian Aistleitner Kefermarkterstrasze 6a/3 Email: christian@quelltextlich.at 4293 Gutau, Austria Phone: +43 7946 / 20 5 81 Fax: +43 7946 / 20 5 81 Homepage: http://quelltextlich.at/
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Huh; interesting! I remember running into a lot of problems with the revision table due to conversion.
You're not imagining things, Oliver. The explanation is too complicated to mention here, and I always forget it. Whether you are ok quoting dates or not depends on what indices are set on those fields and this varies from labs to the private slaves. In short, you're better off using quoted dates in all cases, just to keep it simple.
On 3 January 2015 at 08:05, Christian Aistleitner < christian@quelltextlich.at> wrote:
Hi,
On Mon, Dec 29, 2014 at 08:26:25PM -0500, Oliver Keyes wrote:
MediaWiki timestamps are stored as character strings, not numeric
values,
because it allows for convenient alpha-sorting. You need to be using timestamp BETWEEN '2014110100000' AND..., with quotes, rather than
treating
them as numeric values.
EventLogging stores timestamps as characters. True. But MySQL is good at automatic type conversion. So one can use numeric values just fine:
mysql:research@dbstore1002.eqiad.wmnet [(none)]> SELECT timestamp FROM log.MobileWebClickTracking_5929948 WHERE timestamp BETWEEN 20140101000000 and 20140101000000 LIMIT 1; +----------------+ | timestamp | +----------------+ | 20140101000000 | +----------------+ 1 row in set (0.02 sec)
So no real requirement to quote the values in the BETWEEN in this case. But quoting them makes a few things easier (like allowing to truncate values).
The real issue with the original query in
On 29 December 2014 at 20:21, Jon Katz jkatz@wikimedia.org wrote:
mysql [...] -e "Insert into staging.jkatz_clicktracking1 Select * from log.MobileWebClickTracking_5929948 WHERE ('timestamp' between 20141101000000 and 20141130000000) and wiki like 'enwiki';"
is elsewhere in the WHERE part. The first condition is
'timestamp' between 20141101000000 and 20141130000000
which (due to the quotes) checks if the literal string “timestamp” (and not the value of the column called timestamp) is between the given values.
This BETWEEN condition on the literal string “timestamp” is never met. Hence, the SELECT gave the empty result set.
Have fun, Christian
-- ---- quelltextlich e.U. ---- \ ---- Christian Aistleitner ---- Companies' registry: 360296y in Linz Christian Aistleitner Kefermarkterstrasze 6a/3 Email: christian@quelltextlich.at 4293 Gutau, Austria Phone: +43 7946 / 20 5 81 Fax: +43 7946 / 20 5 81 Homepage: http://quelltextlich.at/
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
-- Oliver Keyes Research Analyst Wikimedia Foundation
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Ahh. Rule 5!
On 6 January 2015 at 17:07, Dan Andreescu dandreescu@wikimedia.org wrote:
Huh; interesting! I remember running into a lot of problems with the revision table due to conversion.
You're not imagining things, Oliver. The explanation is too complicated to mention here, and I always forget it. Whether you are ok quoting dates or not depends on what indices are set on those fields and this varies from labs to the private slaves. In short, you're better off using quoted dates in all cases, just to keep it simple.
On 3 January 2015 at 08:05, Christian Aistleitner christian@quelltextlich.at wrote:
Hi,
On Mon, Dec 29, 2014 at 08:26:25PM -0500, Oliver Keyes wrote:
MediaWiki timestamps are stored as character strings, not numeric values, because it allows for convenient alpha-sorting. You need to be using timestamp BETWEEN '2014110100000' AND..., with quotes, rather than treating them as numeric values.
EventLogging stores timestamps as characters. True. But MySQL is good at automatic type conversion. So one can use numeric values just fine:
mysql:research@dbstore1002.eqiad.wmnet [(none)]> SELECT timestamp FROM log.MobileWebClickTracking_5929948 WHERE timestamp BETWEEN 20140101000000 and 20140101000000 LIMIT 1; +----------------+ | timestamp | +----------------+ | 20140101000000 | +----------------+ 1 row in set (0.02 sec)
So no real requirement to quote the values in the BETWEEN in this case. But quoting them makes a few things easier (like allowing to truncate values).
The real issue with the original query in
On 29 December 2014 at 20:21, Jon Katz jkatz@wikimedia.org wrote:
mysql [...] -e "Insert into staging.jkatz_clicktracking1 Select * from log.MobileWebClickTracking_5929948 WHERE ('timestamp' between 20141101000000 and 20141130000000) and wiki like 'enwiki';"
is elsewhere in the WHERE part. The first condition is
'timestamp' between 20141101000000 and 20141130000000
which (due to the quotes) checks if the literal string “timestamp” (and not the value of the column called timestamp) is between the given values.
This BETWEEN condition on the literal string “timestamp” is never met. Hence, the SELECT gave the empty result set.
Have fun, Christian
-- ---- quelltextlich e.U. ---- \ ---- Christian Aistleitner ---- Companies' registry: 360296y in Linz Christian Aistleitner Kefermarkterstrasze 6a/3 Email: christian@quelltextlich.at 4293 Gutau, Austria Phone: +43 7946 / 20 5 81 Fax: +43 7946 / 20 5 81 Homepage: http://quelltextlich.at/
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
-- Oliver Keyes Research Analyst Wikimedia Foundation
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics