TLDR:
- PAWS can now connect to the new replicas, see News/Wiki Replicas 2020 Redesign#How should I connect to databases in PAWS? https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign#How_should_I_connect_to_databases_in_PAWS%3F for more info.
- Report issues here: T276284 Establish a working setup for PAWS with multi-instance wikireplicas https://phabricator.wikimedia.org/T276284

Hi!
PAWS is now capable of connecting to and using the new replicas.
Here are some resources you can check:
- News/Wiki Replicas 2020 Redesign#How should I connect to databases in PAWS? https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign#How_should_I_connect_to_databases_in_PAWS%3F
- Accessing the new replicas, changes from the previous cluster https://public.paws.wmcloud.org/User:JHernandez_(WMF)/Accessing%20the%20new%20replicas,%20changes%20from%20the%20previous%20cluster.ipynb
- Using Wikireplicas from PAWS with Python https://public.paws.wmcloud.org/User:JHernandez_(WMF)/Accessing%20Wikireplicas%20from%20PAWS.ipynb

In summary, due to issues with mysql-proxy and the new architecture, connecting to the replicas will be more in line with the Toolforge approach.
There is a credentials file in $HOME/.my.cnf that you can use when connecting, instead of the environment variables. For the host name, you can use the same ones you would use when connecting from Toolforge ("{wiki}.{analytics,web}.db.svc.wikimedia.cloud").
To update a notebook, here is an example of the few changes needed when connecting:

- import os
  import pymysql

  conn = pymysql.connect(
-     host = os.environ['MYSQL_HOST'],
+     host = "eswiki.analytics.db.svc.wikimedia.cloud",
-     user = os.environ['MYSQL_USERNAME'],
-     password = os.environ['MYSQL_PASSWORD'],
+     read_default_file = ".my.cnf",
      database = "eswiki_p"
  )

Note you have to connect to the host name of the DB you are going to query against.
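
Because the host now depends on the database, it can be convenient to derive one from the other. The following is only a minimal sketch, not part of PAWS: the helper names replica_host and connect are made up for illustration, and it assumes that replica databases are named "<wiki>_p", as in the eswiki_p example. It also expands the path to the credentials file so that it does not depend on the notebook's working directory.

```python
import os


def replica_host(database: str, cluster: str = "analytics") -> str:
    """Build the replica host name for a database such as "eswiki_p".

    Hypothetical helper: assumes the database is named "<wiki>_p".
    """
    if cluster not in ("analytics", "web"):
        raise ValueError(f"unknown cluster: {cluster!r}")
    # The host name uses the bare wiki name, without the "_p" suffix.
    wiki = database[:-2] if database.endswith("_p") else database
    return f"{wiki}.{cluster}.db.svc.wikimedia.cloud"


def connect(database: str, cluster: str = "analytics"):
    """Open a connection to the replica host that serves `database`."""
    import pymysql  # imported here so replica_host() works without it

    return pymysql.connect(
        host=replica_host(database, cluster),
        # Credentials come from the file in the home directory.
        read_default_file=os.path.expanduser("~/.my.cnf"),
        database=database,
    )
```

With this in a notebook cell, connect("eswiki_p") would connect to eswiki.analytics.db.svc.wikimedia.cloud, and connect("enwiki_p") to the enwiki host, so a notebook that queries several wikis opens one connection per wiki.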
Existing notebooks remain readable with the output cached, and we are working on updating the documentation.
In two weeks, on April 15, the old cluster will be migrated to use the new replication hosts, at which point replication may stop and running PAWS notebooks that connect to the old cluster may return stale results.
In about four weeks, on April 28, the old hostnames will be redirected to the new cluster. From then on, notebooks that connect to MYSQL_HOST will stop working until their credentials and DB host name are updated.
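
To find notebooks that still need updating, one option is to search their code cells for the old environment variables. This is only a rough sketch under stated assumptions: the function name find_old_variables is made up for illustration, and it relies on the standard .ipynb JSON layout, where each cell has a "cell_type" and a "source" that is either a string or a list of lines.

```python
import json
from pathlib import Path

# Environment variables used by the old connection method.
OLD_VARIABLES = ("MYSQL_HOST", "MYSQL_USERNAME", "MYSQL_PASSWORD")


def find_old_variables(notebook_path):
    """Return (cell_index, variable) pairs for code cells that mention the old variables."""
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    hits = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # only code cells can open a connection
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        for name in OLD_VARIABLES:
            if name in source:
                hits.append((index, name))
    return hits
```

For example, looping over Path.home().glob("**/*.ipynb") and printing the paths for which find_old_variables returns a non-empty list gives a list of notebooks to review.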
If you run into any problems or need help, please reach out via IRC, the mailing list, or the Phabricator task T276284 Establish a working setup for PAWS with multi-instance wikireplicas https://phabricator.wikimedia.org/T276284