Hey guys,
I'm trying to use the sha1 hashes of Wiki content for the first time (woot! Props to D et al. for seeing it through) but I'm having some trouble actually getting them out of the API/databases. It looks like the checksums only go back to April 19th. Is this true of all pages? Is there any plan to propagate the metric backwards?
-Aaron
Which machine are you accessing? D
I'm using the web API and db42.
API example: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles...
MySQL example:

mysql> select rev_id, rev_sha1 from revision where rev_page = 15661504 and rev_id <= 488033783 order by rev_timestamp desc limit 2;
+-----------+---------------------------------+
| rev_id    | rev_sha1                        |
+-----------+---------------------------------+
| 488033783 | i8x0e29kxs2t1o9f1h03ks7q59yyyrv |
| 485404713 |                                 |
+-----------+---------------------------------+
2 rows in set (0.00 sec)
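A quick way to pin down where the hashes start on a given page seems to be looking for the latest revision that still has an empty rev_sha1 (just a sketch against the same replica, using the page ID from the example above):

-- latest revision on this page whose hash has not been populated yet;
-- everything after it should carry a real sha1
select rev_id, rev_timestamp
from revision
where rev_page = 15661504
  and rev_sha1 = ''
order by rev_timestamp desc
limit 1;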
-Aaron
Hi Aaron,
You are right; it seems the hashes have not been calculated for older revisions (I checked for eswiki). I was under the impression that this had either finished or was in progress. Probably best to ask Asher for more details.
Best,
Diederik
Asher,
As you can see in the conversation above, I discovered that sha1 hashes have not been generated historically for revision content in Wikipedia -- only from the time of release (mid-April) forward. Is this intended? If so, is there a plan to fill in the sha1s historically, or should we (data monkeys) keep track of THE GREAT SHA1 EPOCH of April 2012 when performing our analysis from here forward?
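(In the meantime, one workaround might be to treat the empty rev_sha1 values as missing rather than as real hashes, so that unpopulated rows don't all "match" each other when, say, looking for identity reverts. A sketch, again using the example page ID from earlier:)

-- NULLIF turns the unpopulated '' values into NULL so they never compare equal
select rev_id, rev_timestamp, nullif(rev_sha1, '') as sha1
from revision
where rev_page = 15661504
order by rev_timestamp;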
Thanks! -Aaron
That was intended, as far as the original deployment goes. AFAIK there's a script to backfill values, but RobLa should be able to tell you more.
Actually, I'll defer to Aaron Schulz on this (cc'd). Reading the server admin log[1], it looks like he started this on April 24 on "all.dblist", which I'm assuming means a sequential run through all of our dbs. No idea whatsoever what the ETA is for this or how to find out, other than asking Aaron.
As Federico pointed out, this is tracked here: https://bugzilla.wikimedia.org/show_bug.cgi?id=36081
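(Until that run finishes, a rough way to check how far along a given wiki is might be a query like the one below against a replica; note that it scans the whole revision table, so it isn't cheap:)

-- count revisions whose sha1 has not been backfilled yet
select sum(rev_sha1 = '') as unpopulated, count(*) as total
from revision;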
Rob
There's nothing strange in this behaviour; it's always been like that for similar new columns: they need to be populated separately (usually they are, after a few years). Bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=36081
Nemo