The only reason I didn't break this down by
namespace was because they
queries would have taken an order of magnitude longer to join the revision
and page tables. The query I used didn't even need to read the revision or
archive tables. It only read an index on those tables. That made it go
pretty fast. :) I'd be interested in taking another pass if you guys
don't mind dealing with a heavier server load.
On Thu, Jun 12, 2014 at 7:05 PM, Dario Taraborelli <
dtaraborelli(a)wikimedia.org> wrote:
Aaron – this is fantastic.
Two quick questions:
- was the decision not to break down the data by namespace (matching Erik
Zachte’s master editor data dump) intentional?
- are we expecting to refresh the archived revision count field every
month?
Dario
On Jun 12, 2014, at 2:33 PM, Aaron Halfaker <ahalfaker(a)wikimedia.org>
wrote:
+1
For example, the last time I sent a similar email to the list, it was for
the wiki_info table. One of the tasks I have is to break the code for
generating that table out of the analysis project it lives in and make it a
separate repo so that Oliver can send pull requests to fix issues and/or
maintain his own managed table.
It would be great to work towards an architecture that allows us to keep
these tables up-to-date without user-based cron jobs.
-Aaron
On Thu, Jun 12, 2014 at 4:24 PM, Dan Andreescu <dandreescu(a)wikimedia.org>
wrote:
This is great. I'd like to go on record
saying that this is leaning
towards a data warehouse kind of approach - basically pre-aggregating
useful datasets. So we might want to do this in a more organized way down
the line.
On Thu, Jun 12, 2014 at 2:57 PM, Oliver Keyes <okeyes(a)wikimedia.org>
wrote:
This is fricking awesome!
On 12 June 2014 10:58, Aaron Halfaker <ahalfaker(a)wikimedia.org> wrote:
> I created a new table on analytics-store.eqiad.wmnet. It contains the
> monthly edit counts for all wikis. See a brief overview below.
>
> Note that the "revisions" column contains a count of all revisions --
> archived or not. The "archived" column contains a count of archived
> revisions. So revisions - archived == non-archived revisions.
>
> analytics-store.eqiad.wmnet [staging]> explain editor_month;
> +-------------------+----------------+------+-----+---------+-------+
> | Field | Type | Null | Key | Default | Extra |
> +-------------------+----------------+------+-----+---------+-------+
> | wiki | varbinary(50) | NO | PRI | | |
> | month | varbinary(7) | NO | PRI | | |
> | user_id | int(11) | NO | PRI | 0 | |
> | user_name | varbinary(191) | YES | | NULL | |
> | user_registration | varbinary(14) | YES | | NULL | |
> | archived | int(11) | YES | | NULL | |
> | revisions | int(11) | YES | | NULL | |
> +-------------------+----------------+------+-----+---------+-------+
> 7 rows in set (0.01 sec)
>
> analytics-store.eqiad.wmnet [staging]> select * from editor_month
> limit 3;
>
>
+--------+---------+---------+------------+-------------------+----------+-----------+
> | wiki | month | user_id | user_name | user_registration |
> archived | revisions |
>
>
+--------+---------+---------+------------+-------------------+----------+-----------+
> | enwiki | 2001-01 | 34 | WojPob | 20010129110725 |
> 0 | 13 |
> | enwiki | 2001-01 | 99 | RoseParks | 20010121021221 |
> 0 | 7 |
> | enwiki | 2001-01 | 479 | JimboWales | 20010123223416 |
> 0 | 13 |
>
>
+--------+---------+---------+------------+-------------------+----------+-----------+
> 3 rows in set (0.03 sec)
>
> Feedback is welcome. One of the next things, I'd like to do is
> remove the "-" from the month column as it ruins comparison with MW
> timestamps.
>
> -Aaron
>
> _______________________________________________
> wmfresearch mailing list
> wmfresearch(a)lists.wikimedia.org
>
https://lists.wikimedia.org/mailman/listinfo/wmfresearch
>
>
--
Oliver Keyes
Research Analyst
Wikimedia Foundation
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org