TLDR:
Hi!
PAWS is now capable of connecting to and using the new replicas.
Here are some resources you can check:
- News/Wiki Replicas 2020 Redesign#How should I connect to databases in PAWS?
- Accessing the new replicas, changes from the previous cluster
- Using Wikireplicas from PAWS with Python
In summary, due to issues with mysql-proxy and the new architecture, connecting to the replicas will be more in line with the Toolforge approach.
There is a credentials file in $HOME/.my.cnf that you can use when connecting, instead of the environment variables. For the host name, you can use the same ones you would use when connecting from Toolforge ("{wiki}.{analytics,web}.db.svc.wikimedia.cloud").
To update a notebook, here is an example of the changes needed when connecting:
- import os
import pymysql
conn = pymysql.connect(
- host = os.environ['MYSQL_HOST'],
+ host = "eswiki.analytics.db.svc.wikimedia.cloud",
- user = os.environ['MYSQL_USERNAME'],
- password = os.environ['MYSQL_PASSWORD'],
+ read_default_file = ".my.cnf",
database = "eswiki_p"
)
Note that you have to connect to the host name of the DB you are going to query against.
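Since each wiki now needs its own host name, a small helper can keep notebooks tidy. The following is only a sketch: the function names (replica_host, connection_kwargs, connect) are made up for illustration, and the "{wiki}_p" database name is an assumption generalized from the eswiki_p example above. It expands the path to the credentials file so it resolves regardless of the notebook's working directory.

```python
import os

# Clusters named in the host pattern above.
CLUSTERS = ("analytics", "web")


def replica_host(wiki, cluster="analytics"):
    """Build the replica host name for a wiki, following the pattern above."""
    if cluster not in CLUSTERS:
        raise ValueError(f"cluster must be one of {CLUSTERS}, got {cluster!r}")
    return f"{wiki}.{cluster}.db.svc.wikimedia.cloud"


def connection_kwargs(wiki, cluster="analytics"):
    """Keyword arguments for pymysql.connect() for the given wiki."""
    return {
        "host": replica_host(wiki, cluster),
        # Credentials come from the file mentioned above, not from env variables.
        "read_default_file": os.path.expanduser("~/.my.cnf"),
        # Assumption: the database is the wiki name plus "_p", as in eswiki_p.
        "database": f"{wiki}_p",
    }


def connect(wiki, cluster="analytics"):
    """Open a connection to the replica that hosts the given wiki."""
    # Imported here so the helpers above can be used without pymysql installed.
    import pymysql

    return pymysql.connect(**connection_kwargs(wiki, cluster))
```

In a notebook this would be used as conn = connect("eswiki"), with a separate connect() call for each wiki you query.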
Existing notebooks remain readable with their cached output, and we are working on updating the documentation.
In two weeks, on April 15, the old cluster will be migrated to use the new replication hosts, at which point replication may stop and running PAWS notebooks that connect to the old cluster may get stale results.
In about four weeks, on April 28, the old host names will be redirected to the new cluster. Running notebooks that connect to MYSQL_HOST will then stop working, and their credentials and DB host name will need to be updated.
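If you have many notebooks, something like the following might help you find the ones that still need updating. It is a rough sketch, not an official tool: the function name notebooks_using_old_vars is hypothetical, and it simply looks for the old environment variable names in the code cells of .ipynb files under a directory.

```python
import json
from pathlib import Path

# Environment variables used by the old connection method.
OLD_VARS = ("MYSQL_HOST", "MYSQL_USERNAME", "MYSQL_PASSWORD")


def notebooks_using_old_vars(root):
    """Map each notebook path under root to the old variable names it mentions."""
    hits = {}
    for path in sorted(Path(root).rglob("*.ipynb")):
        # Skip Jupyter's autosave copies.
        if ".ipynb_checkpoints" in path.parts:
            continue
        try:
            nb = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or not valid JSON
        if not isinstance(nb, dict):
            continue
        found = set()
        for cell in nb.get("cells", []):
            if cell.get("cell_type") != "code":
                continue
            source = cell.get("source", "")
            # Cell source may be stored as a list of lines or a single string.
            if isinstance(source, list):
                source = "".join(source)
            found.update(var for var in OLD_VARS if var in source)
        if found:
            hits[str(path)] = sorted(found)
    return hits
```

For example, notebooks_using_old_vars(Path.home()) would list the notebooks in your home directory that still reference the old variables.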
--
Joaquin Oltra Hernandez
Developer Advocate - Wikimedia Foundation