I polled https://www.mediawiki.org/w/api.php?action=sitematrix&format=jsonfm to get a list of wikis and some metadata, then loaded it into a table in the new analytics-store DB.
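For anyone who wants to reproduce the pull, here is a minimal Python sketch that flattens a sitematrix response into rows shaped like wiki_info. It asks for format=json rather than jsonfm, because jsonfm is the HTML-wrapped, human-readable variant. It also assumes a response layout in which languages sit under numeric-string keys next to "count" and "specials", and it assumes lang_id is that numeric key. Neither assumption is confirmed in this thread. The function names are illustrative, and the caller supplies the User-Agent string.

```python
import json
import urllib.request
from typing import Any

# format=json rather than jsonfm: jsonfm is the HTML-wrapped, human-readable variant.
SITEMATRIX_URL = "https://www.mediawiki.org/w/api.php?action=sitematrix&format=json"

# wiki_info columns, in the order EXPLAIN lists them.
COLUMNS = ("wiki", "code", "sitename", "url",
           "lang_id", "lang_code", "lang_name", "lang_local_name")


def fetch_sitematrix(user_agent: str) -> dict[str, Any]:
    """Download and decode the sitematrix (needs network access)."""
    req = urllib.request.Request(SITEMATRIX_URL, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def rows_from_sitematrix(data: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a sitematrix response into one dict per wiki, keyed by COLUMNS.

    Assumes languages sit under numeric-string keys ("0", "1", ...) beside the
    non-language keys "count" and "specials", each holding a "site" list.
    """
    rows = []
    for key, lang in data["sitematrix"].items():
        if not key.isdigit():  # skip "count" and "specials"
            continue
        for site in lang.get("site", []):
            rows.append({
                "wiki": site["dbname"],
                "code": site.get("code"),
                "sitename": site.get("sitename"),
                "url": site.get("url"),
                "lang_id": int(key),  # assumption: lang_id is the numeric key
                "lang_code": lang.get("code"),
                "lang_name": lang.get("name"),
                "lang_local_name": lang.get("localname"),
            })
    return rows
```

Typical use would be `rows_from_sitematrix(fetch_sitematrix("<your tool name and contact>"))`, followed by a bulk insert into the table. Special wikis such as commons are deliberately left out here because they have no language entry.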
The data should be complete as of the time I pulled it. It will be relatively cheap to update, so we could set up a cron job to check against sitematrix every night. See details below.
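A nightly check against sitematrix only needs to touch rows that differ. Below is a minimal sketch of the comparison step. It assumes each row is a dict keyed by the wiki_info column names, with `wiki` as the primary key. The function name is hypothetical, and the https URL change in the usage lines is made-up data for illustration.

```python
from typing import Any

Row = dict[str, Any]


def diff_wiki_info(stored: list[Row], fresh: list[Row]) -> tuple[list[str], list[str], list[str]]:
    """Compare the current table with a fresh sitematrix pull, keyed on `wiki`.

    Returns sorted lists of wiki names that were added, removed, or changed.
    """
    old = {row["wiki"]: row for row in stored}
    new = {row["wiki"]: row for row in fresh}
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(w for w in new.keys() & old.keys() if new[w] != old[w])
    return added, removed, changed


# Usage with illustrative rows (the https URL change is hypothetical).
stored = [{"wiki": "rnwiki", "url": "http://rn.wikipedia.org"},
          {"wiki": "rowiki", "url": "http://ro.wikipedia.org"}]
fresh = [{"wiki": "rnwiki", "url": "https://rn.wikipedia.org"},
         {"wiki": "rnwiktionary", "url": "http://rn.wiktionary.org"}]
print(diff_wiki_info(stored, fresh))  # (['rnwiktionary'], ['rowiki'], ['rnwiki'])
```

On the MySQL side, added and changed rows could be written with REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE, and removed ones handled with DELETE. Whether vanished wikis should be deleted or kept and flagged is a separate judgment call.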
analytics-store.eqiad.wmnet [staging]> explain wiki_info;
+-----------------+----------------+------+-----+---------+-------+
| Field           | Type           | Null | Key | Default | Extra |
+-----------------+----------------+------+-----+---------+-------+
| wiki            | varbinary(100) | NO   | PRI |         |       |
| code            | varbinary(100) | YES  |     | NULL    |       |
| sitename        | varbinary(100) | YES  |     | NULL    |       |
| url             | varbinary(255) | YES  |     | NULL    |       |
| lang_id         | int(11)        | YES  |     | NULL    |       |
| lang_code       | varbinary(100) | YES  |     | NULL    |       |
| lang_name       | varbinary(255) | YES  |     | NULL    |       |
| lang_local_name | varbinary(255) | YES  |     | NULL    |       |
+-----------------+----------------+------+-----+---------+-------+
8 rows in set (0.00 sec)
analytics-store.eqiad.wmnet [staging]> select * from wiki_info limit 3;
+--------------+------------+-----------+--------------------------+---------+-----------+-----------+-----------------+
| wiki         | code       | sitename  | url                      | lang_id | lang_code | lang_name | lang_local_name |
+--------------+------------+-----------+--------------------------+---------+-----------+-----------+-----------------+
| rnwiki       | wiki       | Wikipedia | http://rn.wikipedia.org  |     216 | rn        | Kirundi   | Rundi           |
| rnwiktionary | wiktionary | Wikipedia | http://rn.wiktionary.org |     216 | rn        | Kirundi   | Rundi           |
| rowiki       | wiki       | Wikipedia | http://ro.wikipedia.org  |     217 | ro        | română    | Romanian        |
+--------------+------------+-----------+--------------------------+---------+-----------+-----------+-----------------+
3 rows in set (0.02 sec)
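To experiment with the table without access to analytics-store, you can build a local SQLite stand-in from the EXPLAIN output and the three sample rows. This is a sketch only. SQLite has no varbinary, so those columns are declared TEXT and int(11) becomes INTEGER. The per-project count is just one example of the kind of question the table answers, and the same SELECT should also run unchanged in MySQL.

```python
import sqlite3

# Local SQLite stand-in for staging.wiki_info (MySQL on analytics-store).
# varbinary(n) -> TEXT and int(11) -> INTEGER in this sketch.
DDL = """
CREATE TABLE wiki_info (
    wiki            TEXT PRIMARY KEY NOT NULL,
    code            TEXT,
    sitename        TEXT,
    url             TEXT,
    lang_id         INTEGER,
    lang_code       TEXT,
    lang_name       TEXT,
    lang_local_name TEXT
)
"""

# The three rows shown by `select * from wiki_info limit 3`.
SAMPLE_ROWS = [
    ("rnwiki", "wiki", "Wikipedia", "http://rn.wikipedia.org", 216, "rn", "Kirundi", "Rundi"),
    ("rnwiktionary", "wiktionary", "Wikipedia", "http://rn.wiktionary.org", 216, "rn", "Kirundi", "Rundi"),
    ("rowiki", "wiki", "Wikipedia", "http://ro.wikipedia.org", 217, "ro", "română", "Romanian"),
]


def wikis_per_project(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Count wikis per project code (wiki, wiktionary, ...)."""
    return conn.execute(
        "SELECT code, COUNT(*) FROM wiki_info GROUP BY code ORDER BY code"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
conn.executemany("INSERT INTO wiki_info VALUES (?, ?, ?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)
print(wikis_per_project(conn))  # [('wiki', 2), ('wiktionary', 1)]
```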
-Aaron
awesome!
On Mon, Jun 2, 2014 at 7:49 PM, Aaron Halfaker ahalfaker@wikimedia.org wrote:
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
that’s nifty, thanks Aaron.
On Jun 3, 2014, at 5:14 AM, Dan Andreescu dandreescu@wikimedia.org wrote:
Point of clarification: is this all wikis, or only active wikis, or...? For example, it would be useful to be able to exclude special or closed wikis if necessary.
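One way to answer this from the same API response is to tag every wiki rather than filter at pull time. This minimal sketch assumes three things about the legacy JSON output. First, a flag such as "closed" or "private" appears as an empty-string value when true and is omitted when false. Second, special wikis (commons, meta, etc.) are listed under a separate "specials" key instead of under a language. Third, a real boolean True may also appear. The function names are hypothetical.

```python
from typing import Any


def flag_set(site: dict[str, Any], name: str) -> bool:
    # Legacy JSON marks a true flag with an empty string ("closed": "") and
    # omits the key when false; a real boolean True is accepted as well.
    return name in site and site[name] is not False


def classify_sites(data: dict[str, Any]) -> dict[str, dict[str, bool]]:
    """Map each dbname to its special/closed/private flags."""
    matrix = data["sitematrix"]
    out = {}
    for key, lang in matrix.items():
        if key.isdigit():  # language-keyed entries
            for site in lang.get("site", []):
                out[site["dbname"]] = {"special": False,
                                       "closed": flag_set(site, "closed"),
                                       "private": flag_set(site, "private")}
    for site in matrix.get("specials", []):  # commons, meta, etc.
        out[site["dbname"]] = {"special": True,
                               "closed": flag_set(site, "closed"),
                               "private": flag_set(site, "private")}
    return out


def analysis_wikis(flags: dict[str, dict[str, bool]]) -> list[str]:
    """Wikis that are neither special, closed, nor private."""
    return sorted(db for db, f in flags.items() if not any(f.values()))
```

Keeping the flags rather than dropping rows lets each analysis decide what to exclude.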
On 3 June 2014 07:51, Dario Taraborelli dtaraborelli@wikimedia.org wrote:
+1, shall we add private and closed as separate fields?
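If private and closed do become separate fields, the schema change might look like the following SQLite sketch. The column names `closed` and `private` and the 0/1 encoding are assumptions. In MySQL the equivalent would be ALTER TABLE ... ADD COLUMN with a TINYINT(1) (or BOOLEAN, which MySQL treats as TINYINT(1)) column. "examplewiki" is a made-up closed wiki used only for the demo.

```python
import sqlite3


def add_flag_columns(conn: sqlite3.Connection) -> None:
    # Hypothetical column names; 0 = no, 1 = yes.
    for col in ("closed", "private"):
        conn.execute(f"ALTER TABLE wiki_info ADD COLUMN {col} INTEGER NOT NULL DEFAULT 0")


def set_flags(conn: sqlite3.Connection, flags: dict[str, dict[str, bool]]) -> None:
    """flags maps wiki (dbname) -> {"closed": bool, "private": bool}."""
    conn.executemany(
        "UPDATE wiki_info SET closed = ?, private = ? WHERE wiki = ?",
        [(int(f["closed"]), int(f["private"]), wiki) for wiki, f in flags.items()],
    )


def open_public_wikis(conn: sqlite3.Connection) -> list[str]:
    """Wikis that are neither closed nor private."""
    cur = conn.execute(
        "SELECT wiki FROM wiki_info WHERE closed = 0 AND private = 0 ORDER BY wiki")
    return [wiki for (wiki,) in cur]


# Usage with a pared-down table; "examplewiki" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wiki_info (wiki TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO wiki_info VALUES (?)",
                 [("rnwiki",), ("rowiki",), ("examplewiki",)])
add_flag_columns(conn)
set_flags(conn, {"examplewiki": {"closed": True, "private": False}})
print(open_public_wikis(conn))  # ['rnwiki', 'rowiki']
```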
On Jun 3, 2014, at 5:15 PM, Oliver Keyes okeyes@wikimedia.org wrote:
--
Oliver Keyes
Research Analyst
Wikimedia Foundation
That would be awesome. I mean, this is already awesome, but more awesome :). Also, uggggggh at our API's JSON output being a thinly veiled hack straight on top of XML.
On 3 June 2014 17:29, Dario Taraborelli dtaraborelli@wikimedia.org wrote: