Hi all, Is it possible to store data into user tables through queries on Wikireplica DBs? Or is it only possible by mysqldump'ing from the replica DB and loading into the user table in a separate step? I am thinking of aggregate data. Thanks!
No, the wikireplicas are read-only.
That said, ToolsDB (Toolforge user DB) is not equipped to handle becoming an aggregate datamart for the wiki replicas at any thing close to large (wiki) scale--just small chunks that are ideally regularly cleaned up. Are we talking about a database on a Cloud VPS instance or ToolsDB?
Brooke Storm SRE Wikimedia Cloud Services bstorm@wikimedia.org IRC: bstorm_
On 5/8/20 12:31 PM, Huji Lee wrote:
Hi all, Is it possible to store data into user tables through queries on Wikireplica DBs? Or is it only possible by mysqldump'ing from the replica DB and loading into the user table in a separate step? I am thinking of aggregate data. Thanks!
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
I was thinking of running a query, temporarily store its output on ToolsDB. Join it with some other query and then throw it away. All in small scale and for short-term.
On Fri, May 8, 2020 at 5:01 PM Brooke Storm bstorm@wikimedia.org wrote:
No, the wikireplicas are read-only.
That said, ToolsDB (Toolforge user DB) is not equipped to handle becoming an aggregate datamart for the wiki replicas at any thing close to large (wiki) scale--just small chunks that are ideally regularly cleaned up. Are we talking about a database on a Cloud VPS instance or ToolsDB?
Brooke Storm SRE Wikimedia Cloud Servicesbstorm@wikimedia.org IRC: bstorm_
On 5/8/20 12:31 PM, Huji Lee wrote:
Hi all, Is it possible to store data into user tables through queries on Wikireplica DBs? Or is it only possible by mysqldump'ing from the replica DB and loading into the user table in a separate step? I am thinking of aggregate data. Thanks!
Wikimedia Cloud Services mailing listCloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org)https://lists.wikimedia.org/mailman/listinfo/cloud
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Hmm, does create temporary table work? If not, can it?
Martin
On Fri, May 8, 2020, 11:04 PM Huji Lee huji.huji@gmail.com wrote:
I was thinking of running a query, temporarily store its output on ToolsDB. Join it with some other query and then throw it away. All in small scale and for short-term.
On Fri, May 8, 2020 at 5:01 PM Brooke Storm bstorm@wikimedia.org wrote:
No, the wikireplicas are read-only.
That said, ToolsDB (Toolforge user DB) is not equipped to handle becoming an aggregate datamart for the wiki replicas at any thing close to large (wiki) scale--just small chunks that are ideally regularly cleaned up. Are we talking about a database on a Cloud VPS instance or ToolsDB?
Brooke Storm SRE Wikimedia Cloud Servicesbstorm@wikimedia.org IRC: bstorm_
On 5/8/20 12:31 PM, Huji Lee wrote:
Hi all, Is it possible to store data into user tables through queries on Wikireplica DBs? Or is it only possible by mysqldump'ing from the replica DB and loading into the user table in a separate step? I am thinking of aggregate data. Thanks!
Wikimedia Cloud Services mailing listCloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org)https://lists.wikimedia.org/mailman/listinfo/cloud
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Temporary tables are awesome! But they are session-dependent. I am thinking of a scenario in which different scripts (different sessions) would generate data sets in parallel, and then they are combined.
Obviously, I can do all of this using flat file data dumps (what I do right now). But I was trying to understand my options.
On Fri, May 8, 2020 at 5:07 PM Martin Urbanec martin.urbanec@wikimedia.cz wrote:
Hmm, does create temporary table work? If not, can it?
Martin
On Fri, May 8, 2020, 11:04 PM Huji Lee huji.huji@gmail.com wrote:
I was thinking of running a query, temporarily store its output on ToolsDB. Join it with some other query and then throw it away. All in small scale and for short-term.
On Fri, May 8, 2020 at 5:01 PM Brooke Storm bstorm@wikimedia.org wrote:
No, the wikireplicas are read-only.
That said, ToolsDB (Toolforge user DB) is not equipped to handle becoming an aggregate datamart for the wiki replicas at any thing close to large (wiki) scale--just small chunks that are ideally regularly cleaned up. Are we talking about a database on a Cloud VPS instance or ToolsDB?
Brooke Storm SRE Wikimedia Cloud Servicesbstorm@wikimedia.org IRC: bstorm_
On 5/8/20 12:31 PM, Huji Lee wrote:
Hi all, Is it possible to store data into user tables through queries on Wikireplica DBs? Or is it only possible by mysqldump'ing from the replica DB and loading into the user table in a separate step? I am thinking of aggregate data. Thanks!
Wikimedia Cloud Services mailing listCloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org)https://lists.wikimedia.org/mailman/listinfo/cloud
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Coming back to this comment from Martin: how can we use temporary tables on Labs? They don't seem to be available on a connection to wiki replica DBs:
MariaDB [fawiki_p]> create temporary table tmp(i int); ERROR 1044 (42000): Access denied for user 's12345'@'%' to database 'fawiki_p'
On Fri, May 8, 2020 at 5:07 PM Martin Urbanec martin.urbanec@wikimedia.cz wrote:
Hmm, does create temporary table work? If not, can it?
Martin
On Fri, May 8, 2020, 11:04 PM Huji Lee huji.huji@gmail.com wrote:
I was thinking of running a query, temporarily store its output on ToolsDB. Join it with some other query and then throw it away. All in small scale and for short-term.
On Fri, May 8, 2020 at 5:01 PM Brooke Storm bstorm@wikimedia.org wrote:
No, the wikireplicas are read-only.
That said, ToolsDB (Toolforge user DB) is not equipped to handle becoming an aggregate datamart for the wiki replicas at any thing close to large (wiki) scale--just small chunks that are ideally regularly cleaned up. Are we talking about a database on a Cloud VPS instance or ToolsDB?
Brooke Storm SRE Wikimedia Cloud Servicesbstorm@wikimedia.org IRC: bstorm_
On 5/8/20 12:31 PM, Huji Lee wrote:
Hi all, Is it possible to store data into user tables through queries on Wikireplica DBs? Or is it only possible by mysqldump'ing from the replica DB and loading into the user table in a separate step? I am thinking of aggregate data. Thanks!
Wikimedia Cloud Services mailing listCloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org)https://lists.wikimedia.org/mailman/listinfo/cloud
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
On Sun, May 10, 2020 at 4:55 PM Huji Lee huji.huji@gmail.com wrote:
Coming back to this comment from Martin: how can we use temporary tables on Labs? They don't seem to be available on a connection to wiki replica DBs:
MariaDB [fawiki_p]> create temporary table tmp(i int); ERROR 1044 (42000): Access denied for user 's12345'@'%' to database 'fawiki_p'
Temporary tables are not supported on the Wiki Replica servers. See https://phabricator.wikimedia.org/T179628 for discussion. The TL;DR is that MySQL/MariaDB's temporary tables can cause locking which in turn causes replication lag.
Bryan
On Fri, May 8, 2020 at 3:01 PM Brooke Storm bstorm@wikimedia.org wrote:
No, the wikireplicas are read-only.
There is slightly more context for this restriction in the blog post that announced the "new" Wiki Replicas databases in 2017 [0].
The prior generation of Wiki Replica databases did allow arbitrary table creation by users. This was both a powerful feature and a cause of pain for system administration. User created tables were not replicated across the cluster which meant that when system maintenance was performed tables or data would mysteriously vanish from the point of view of the user if traffic was routed to a different backend server. Data would also be irretrievably lost when a server experienced major system failures. It was a difficult decision for the Cloud Services and DBA teams to make to remove this useful feature, but doing so has allowed much greater ease of administration and by extension better availability and less replication drift issues for the wiki databases than was possible with the feature in place.
[0]: https://phabricator.wikimedia.org/phame/post/view/70/new_wiki_replica_server...
Bryan
One clarification,
While everything Bryan says is right, I have to say that the main factor in not allowing writes on wikireplicas wasn't pain for administration (which was real) as much as avoiding continuous blockage of data due to other users creating long-running write queries (pain for users). As Bryan says, indeed a read only service allows automatic load balancing to spread the load, avoiding outages. Long running read queries can cause overload, but not blocking.
Creating regularly summary tables that are useful for several users is something that I would like to explore, but as I mentioned here: https://lists.wikimedia.org/pipermail/cloud/2020-April/001051.html there is a lot of programming and a list of problems to solve before that could be setup. This is right now not a priority, but like with every task on Wikimedia (see the linked task), volunteer work could help speed it up 0:-).
On Sat, May 9, 2020 at 3:01 AM Bryan Davis bd808@wikimedia.org wrote:
On Fri, May 8, 2020 at 3:01 PM Brooke Storm bstorm@wikimedia.org wrote:
No, the wikireplicas are read-only.
There is slightly more context for this restriction in the blog post that announced the "new" Wiki Replicas databases in 2017 [0].
The prior generation of Wiki Replica databases did allow arbitrary table creation by users. This was both a powerful feature and a cause of pain for system administration. User created tables were not replicated across the cluster which meant that when system maintenance was performed tables or data would mysteriously vanish from the point of view of the user if traffic was routed to a different backend server. Data would also be irretrievably lost when a server experienced major system failures. It was a difficult decision for the Cloud Services and DBA teams to make to remove this useful feature, but doing so has allowed much greater ease of administration and by extension better availability and less replication drift issues for the wiki databases than was possible with the feature in place.
Bryan
Bryan Davis Technical Engagement Wikimedia Foundation Principal Software Engineer Boise, ID USA [[m:User:BDavis_(WMF)]] irc: bd808
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Super informative! Thank you Bryan and Jaime. I will take a look at that task and see if I can be of any help.
On Sat, May 9, 2020 at 1:38 AM Jaime Crespo jcrespo@wikimedia.org wrote:
One clarification,
While everything Bryan says is right, I have to say that the main factor in not allowing writes on wikireplicas wasn't pain for administration (which was real) as much as avoiding continuous blockage of data due to other users creating long-running write queries (pain for users). As Bryan says, indeed a read only service allows automatic load balancing to spread the load, avoiding outages. Long running read queries can cause overload, but not blocking.
Creating regularly summary tables that are useful for several users is something that I would like to explore, but as I mentioned here: https://lists.wikimedia.org/pipermail/cloud/2020-April/001051.html there is a lot of programming and a list of problems to solve before that could be setup. This is right now not a priority, but like with every task on Wikimedia (see the linked task), volunteer work could help speed it up 0:-).
On Sat, May 9, 2020 at 3:01 AM Bryan Davis bd808@wikimedia.org wrote:
On Fri, May 8, 2020 at 3:01 PM Brooke Storm bstorm@wikimedia.org wrote:
No, the wikireplicas are read-only.
There is slightly more context for this restriction in the blog post that announced the "new" Wiki Replicas databases in 2017 [0].
The prior generation of Wiki Replica databases did allow arbitrary table creation by users. This was both a powerful feature and a cause of pain for system administration. User created tables were not replicated across the cluster which meant that when system maintenance was performed tables or data would mysteriously vanish from the point of view of the user if traffic was routed to a different backend server. Data would also be irretrievably lost when a server experienced major system failures. It was a difficult decision for the Cloud Services and DBA teams to make to remove this useful feature, but doing so has allowed much greater ease of administration and by extension better availability and less replication drift issues for the wiki databases than was possible with the feature in place.
Bryan
Bryan Davis Technical Engagement Wikimedia Foundation Principal Software Engineer Boise, ID USA [[m:User:BDavis_(WMF)]] irc: bd808
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
-- Jaime Crespo http://wikimedia.org _______________________________________________ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud