Toolserver-l May 2011

toolserver-l@lists.wikimedia.org

13 participants
21 discussions

by Ryan Kaldari

Sorry for the listserv spam, but if anyone can take care of this JIRA ticket this weekend, it would be greatly appreciated: https://jira.toolserver.org/browse/TS-1038 My laptop died right before I left for the Wiki-GLAMcamp in New York. I'm hoping to work with Multichill on developing some GLAM tools this weekend, but unfortunately, I'm locked out of the toolserver at the moment. Ryan Kaldari

12 years, 11 months

Change to misc sites web server

by River Tarnell

Hi,

The front-end web server on amaranth (which serves JIRA, MediaWiki, FishEye and WordPress) has been changed from Sun Java System Web Server to Apache. This should not cause any noticeable changes, but please report any problems to JIRA or ts-admins(a)toolserver.org.

- river.

12 years, 11 months

Small (non-breaking) change of toolserver.namespacename

by DaB.

Hello all,

some users complained about the new way toolserver.namespacename handles cases in which the primary name of a namespace is identical to the canonical name. To fix this without breaking tools that rely on the new behavior, I have added a new column "ns_is_favorite" to the table today (this solution was found with the help of valhallasw and Krinkle in IRC yesterday). The column is a boolean field that describes whether this ns_name is the best choice for an ns_id or not. See the following examples:

mysql> SELECT ns_name,ns_type FROM namespacename WHERE domain="en.wikipedia.org" AND ns_id=4;
+-----------+-----------+
| ns_name   | ns_type   |
+-----------+-----------+
| Wikipedia | primary   |
| Project   | canonical |
| WP        | alias     |
+-----------+-----------+

mysql> SELECT ns_name,ns_type FROM namespacename WHERE domain="en.wikipedia.org" AND ns_id=4 AND ns_is_favorite=1;
+-----------+---------+
| ns_name   | ns_type |
+-----------+---------+
| Wikipedia | primary |
+-----------+---------+

mysql> SELECT ns_name,ns_type FROM namespacename WHERE domain="en.wikipedia.org" AND ns_id=2;
+---------+-----------+
| ns_name | ns_type   |
+---------+-----------+
| User    | canonical |
+---------+-----------+

mysql> SELECT ns_name,ns_type FROM namespacename WHERE domain="en.wikipedia.org" AND ns_id=2 AND ns_is_favorite=1;
+---------+-----------+
| ns_name | ns_type   |
+---------+-----------+
| User    | canonical |
+---------+-----------+

As you can see, if you use "ns_is_favorite=1" you will always get exactly 1 result, whether there are both a primary and a canonical name or just a canonical one.

If you have questions or suggestions, please reply.

Sincerely,
DaB.

--
Userpage: [[:w:de:User:DaB.]] - PGP: 2B255885
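
For tool authors who want to try the lookup outside the Toolserver, the following is a minimal sketch of the "ns_is_favorite=1" query. It uses an in-memory SQLite table as a stand-in for toolserver.namespacename; the column types and the helper name favorite_name are assumptions for illustration, and the flag values for "Project" and "WP" are inferred from the example output above rather than read from the real table.

```python
import sqlite3

# In-memory SQLite stand-in for toolserver.namespacename.
# Column types are assumed; only the column names come from the post.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE namespacename (
           domain TEXT, ns_id INTEGER, ns_name TEXT,
           ns_type TEXT, ns_is_favorite INTEGER)"""
)
rows = [
    ("en.wikipedia.org", 4, "Wikipedia", "primary", 1),
    ("en.wikipedia.org", 4, "Project", "canonical", 0),  # flag inferred
    ("en.wikipedia.org", 4, "WP", "alias", 0),           # flag inferred
    ("en.wikipedia.org", 2, "User", "canonical", 1),
]
conn.executemany("INSERT INTO namespacename VALUES (?, ?, ?, ?, ?)", rows)


def favorite_name(conn, domain, ns_id):
    """Return the preferred namespace name, or None if no row matches."""
    cur = conn.execute(
        "SELECT ns_name FROM namespacename "
        "WHERE domain = ? AND ns_id = ? AND ns_is_favorite = 1",
        (domain, ns_id),
    )
    row = cur.fetchone()
    return row[0] if row else None


print(favorite_name(conn, "en.wikipedia.org", 4))
print(favorite_name(conn, "en.wikipedia.org", 2))
```

Against the real database the same WHERE clause would be sent through a MySQL client instead of sqlite3.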

12 years, 11 months

Replag

by John

I've seen the primary SQL servers for 1,2/5 lagged about 12 hours for the last day, while the fast servers are current. Any idea what the source of the issue is?

John

12 years, 11 months

fixing problems regarding PHP's multi-byte string processing

by Yusuke M

Hi Toolserver users and admins,

We've seen a problem regarding non-Latin (Unicode) texts in PHP [1], and this is already a long-standing issue. I'd like to summarize the situation and discuss how to improve it.

Here is the summary of the problem: Several widely-used string functions of PHP, including strtoupper() and ucfirst(), are known to "corrupt" strings when used under a UTF-8 locale [2], which is the current setting at the Toolserver. The problem is that those functions can incorrectly recognize a part of a multi-byte character sequence as a single-byte character. When those parts are converted into upper/lower case, the resulting string is corrupted. This problem has been breaking a number of major tools on the Toolserver, including Vvv's sulutils and SoxRed93's edit counter.

For example, the Chinese/Japanese string "利用者" (meaning "user") doesn't have a capitalized form. However, when it's passed to a tool which (I assume) uses ucfirst, the first character is converted into a non-existent character [3], and the result doesn't make sense. An incomplete list of the affected tools is available at [4]. See also TS-923 [1] for more details.

River suggested [1] solving it by migrating to multi-byte-aware functions such as mb_strtoupper [5], but I think it's not an ideal solution. I'd totally encourage the migration too, but it would take time for all developers to fix their tools appropriately. I hope we can have a more fundamental, instant solution.

The timing of the reports of similar problems [4] suggests that there was an underlying common cause. The behavior of string processing seems to have changed in different programs almost simultaneously, somewhere around October 2010. The underlying cause might be a side effect of some change in the PHP platform on the Toolserver, but I don't have any clue what it really was. If someone could point out the original cause, it would be a great help in moving toward a better solution.

Wikimedians and Toolserver users using multi-byte characters (including Arabic, Chinese, Korean and Japanese characters) have apparently been unhappy about this problem for more than half a year. I hope all the tools can (again) work more multilingually.

Any comments or suggestions?

[1] https://jira.toolserver.org/browse/TS-923
[2] http://www.phpwact.org/php/i18n/utf-8
[3] http://toolserver.org/~vvv/sulutil.php?user=%E5%88%A9%E7%94%A8%E8%80%85
[4] https://jira.toolserver.org/secure/ManageLinks.jspa?id=24486
[5] http://php.net/manual/en/function.mb-strtoupper.php

Cheers,
Whym
--
http://toolserver.org/~whym/
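
The failure mode described above can be reproduced outside PHP. The sketch below is written in Python and only models the mechanism: it assumes the byte-oriented function upper-cases the first byte using a Latin-1 style case mapping, which may or may not match what PHP did on the Toolserver. The function names are hypothetical. The first byte of "利" in UTF-8 is 0xE5, which a Latin-1 mapping treats as "å" and turns into 0xC5, leaving a byte sequence that is no longer valid UTF-8.

```python
def latin1_upper_byte(b: int) -> int:
    """Upper-case one byte as if it were a Latin-1 character (assumed mapping)."""
    upper = bytes([b]).decode("latin-1").upper()
    try:
        encoded = upper.encode("latin-1")
    except UnicodeEncodeError:
        return b  # no Latin-1 upper-case form; leave the byte unchanged
    return encoded[0] if len(encoded) == 1 else b


def bytewise_ucfirst(data: bytes) -> bytes:
    """Byte-oriented ucfirst: only the first byte is inspected and changed."""
    if not data:
        return data
    return bytes([latin1_upper_byte(data[0])]) + data[1:]


def charwise_ucfirst(text: str) -> str:
    """Character-aware ucfirst: operates on whole code points."""
    return text[:1].upper() + text[1:]


original = "利用者"
raw = original.encode("utf-8")
broken = bytewise_ucfirst(raw)
print(raw[:3].hex(), "->", broken[:3].hex())

try:
    broken.decode("utf-8")
    still_valid = True
except UnicodeDecodeError:
    still_valid = False
print("still valid UTF-8:", still_valid)
print("character-aware result unchanged:", charwise_ucfirst(original) == original)
```

The character-aware version leaves the string untouched because it works on code points rather than bytes, which is the kind of behavior the mb_* functions are meant to provide.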

12 years, 11 months

s1 servers changed

by River Tarnell

Hi,

Due to high replication lag on thyme (s1) I've switched 'sql-s1' to rosemary. If you have scripts which access user databases on s1 and haven't been updated to use sql-s1-user instead, you will find your databases are missing. The fix is to connect to sql-s1-user instead of sql-s1.

Scripts which do not use user databases will not be affected and should not be changed.

This change will be reverted once thyme has caught up.

- river.

12 years, 11 months

Schema-only database backups

by River Tarnell

Hi,

It is now possible to mark a user database to be backed up without its data, i.e. only the schema is dumped (mysqldump -d). To do this, include the string "_transient" in the database name:

  u_jsmith_transient
  u_jsmith_my_transient_data_p

If you have databases whose data doesn't need to be backed up, it would be very helpful if you could rename them to include _transient in the name, to reduce the load / disk space requirements of the nightly backup job.

Unfortunately MySQL doesn't provide a way to rename a database, but you can copy the database to a new name like this:

  $ mysqladmin create u_jsmith_transient
  $ mysqldump --opt u_jsmith | mysql u_jsmith_transient

The old database should then be dropped.

- river.
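
As a rough illustration of the naming rule, the sketch below checks whether a database name would be treated as schema-only and builds the two copy commands quoted in the message. It is only a sketch: the helper names are hypothetical, and it assumes the backup job does a plain, case-sensitive substring match on "_transient", which the message does not spell out. The commands are returned as strings and not executed.

```python
import shlex

# Marker string taken from the announcement.
TRANSIENT_MARKER = "_transient"


def is_schema_only(db_name: str) -> bool:
    """True if the name contains the marker (assumed: plain substring match)."""
    return TRANSIENT_MARKER in db_name


def copy_commands(old: str, new: str) -> list:
    """Build the shell commands from the message for copying old to new."""
    return [
        f"mysqladmin create {shlex.quote(new)}",
        f"mysqldump --opt {shlex.quote(old)} | mysql {shlex.quote(new)}",
    ]


for name in ("u_jsmith", "u_jsmith_transient", "u_jsmith_my_transient_data_p"):
    print(name, is_schema_only(name))

for command in copy_commands("u_jsmith", "u_jsmith_transient"):
    print(command)
```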

12 years, 11 months

Quick response if possible / re. teacher, class editing

by c h

Briefly: an English teacher is insisting you / WMF (via emails) have said it's OK for his class to share an account. We've tried very nicely helping him make drafts and not share, but he's insistent; indeed now claiming it is a 'crime against humanity'. It's ongoing; they've created a couple of test live articles, and been warned by several. I've tried my damnedest to help 'em, but...they're about to be blocked, I think. I just want to check - did you indicate anything to him, giving him the OK to share accounts? See http://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard/Incide… Cheers.

12 years, 11 months

Re: [Toolserver-l] Toolserver-l Digest, Vol 67, Issue 5

by c h

Ignore last message; clearly sent to wrong addy. Sorry. -Chzz

12 years, 11 months

Query Service Inquiry

by Jim Hutchinson

(I tried to post this question before but was not properly registered for the mailing list. If this is a repeat I apologize.)

I am in need of some guidance on how to get some data out of the query service. I signed up for an account, but I'm not sure if I'm supposed to create an "issue" or if there is some other process I should follow. I'm also not sure if the query service is the right way to go about this.

In short, as part of a graduate research project, I need to select about 100 articles in Wikipedia, find the most frequent editors of each (there is a "contributors" tool which ranks them), such as those with more than 10 edits to the article, and then, for each of these editors, generate a list of all pages they have edited with a frequency count for each.

Does the query service sound like the right way to go about collecting this data, and if so, can someone point me to the proper procedure for making such a request?

Thanks.

--
Jim
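
The two steps described above (finding frequent editors of an article, then counting the pages each of them edited) can be sketched as follows. This is only an illustration of the counting logic over an in-memory list of (page, editor) pairs; the function names and the sample data are made up, and it says nothing about how the query service or the database would actually deliver the revisions.

```python
from collections import Counter


def frequent_editors(revisions, article, min_edits=10):
    """Editors with more than min_edits edits to article, most active first."""
    counts = Counter(editor for page, editor in revisions if page == article)
    return [editor for editor, n in counts.most_common() if n > min_edits]


def pages_edited(revisions, editor):
    """Frequency count of every page the given editor has edited."""
    return Counter(page for page, ed in revisions if ed == editor)


# Hypothetical sample data: one (page, editor) pair per edit.
revisions = (
    [("Article A", "Alice")] * 12
    + [("Article A", "Bob")] * 3
    + [("Article B", "Alice")] * 2
    + [("Article C", "Alice")]
)

for editor in frequent_editors(revisions, "Article A"):
    print(editor, dict(pages_edited(revisions, editor)))
```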

12 years, 11 months
