December 2017 - Cloud - lists.wikimedia.org

Changing my shell preference
by Huji Lee 31 Jan '18

Is there a way to change my preferred shell on the Clouds to zsh? When I try chsh -s `which zsh` it asks for a password, which I don't have of course. Thanks, Huji

Help with a query
by Huji Lee 31 Dec '17

I wrote this query to find all page moves made on fawiki in 2017 and to determine how many edits the performing user had made before each move. The query uses indexes wherever I could think of, yet it runs for a very long time (more than 20 minutes, at which point it gets killed). Any ideas on how to further optimize this query are appreciated!

Thanks,
Huji

    use fawiki_p;
    select
      log_id,
      log_timestamp,
      log_user,
      log_user_text,
      log_title,
      log_comment,
      log_page,
      page_namespace,
      case when ug_group = 'bot' then 1 else 0 end as user_is_bot,
      (
        select count(*)
        from revision_userindex
        where rev_user = log_user
          and rev_timestamp < log_timestamp
      ) as rev_count_before_move
    from logging
    join page
      on page_id = log_page
    left join user_groups
      on log_user = ug_user
      and ug_group = 'bot'
    where log_action = 'move'
      and log_timestamp > '20170101000000'
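
The correlated subquery above recounts a user's revisions for every move row. One possible alternative, sketched here in Python, is to fetch the move rows and each mover's revision timestamps separately and do the counting on the client with a binary search. This is only an illustration of the counting step, not a measured optimization: the function name and the sample rows are hypothetical, and it assumes the two result sets are small enough to hold in memory.

```python
from bisect import bisect_left
from collections import defaultdict


def count_edits_before_moves(moves, revisions):
    """For each (log_id, user, move_timestamp) in moves, count that user's
    revisions with a timestamp strictly earlier than the move."""
    # Group revision timestamps by user and sort each list once.
    by_user = defaultdict(list)
    for user, rev_timestamp in revisions:
        by_user[user].append(rev_timestamp)
    for timestamps in by_user.values():
        timestamps.sort()

    counts = {}
    for log_id, user, move_timestamp in moves:
        # bisect_left returns how many sorted timestamps are < move_timestamp.
        counts[log_id] = bisect_left(by_user.get(user, []), move_timestamp)
    return counts


# Hypothetical sample rows. MediaWiki timestamps are 14-digit strings,
# so plain string comparison orders them chronologically.
sample_revisions = [
    (7, "20161201000000"),
    (7, "20170215000000"),
    (7, "20170301000000"),
    (9, "20170101000000"),
]
sample_moves = [
    (100, 7, "20170220000000"),
    (101, 9, "20170101000000"),
    (102, 5, "20170601000000"),
]
print(count_edits_before_moves(sample_moves, sample_revisions))
```

Sorting each user's timestamps once means a user who performed many moves is not rescanned for every move, which is the cost the subquery pays per row.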

Quarry slower than before
by Huji Lee 30 Dec '17

I feel like Quarry is slower than it was last week or last month. Queries almost always get queued, and once they run, even simple queries take longer to return results. I have no idea how to investigate this, though. Any thoughts? Huji

Re: [Cloud] Tools that need user tables joins
by Martin Domdey 28 Dec '17

Browser extension for unsourced Wikipedia articles
by Guilherme Gonçalves 24 Dec '17

Hi everyone,

I've been hacking on a new tool and I thought I'd share what (little) I have so far to get some comments and learn of related approaches from the community.

The basic idea would be to have a browser extension that tells the user if the current page they're viewing looks like a good reference for a Wikipedia article, for some whitelisted domains like news websites. This would hopefully prompt casual/opportunistic edits, especially for articles that may be overlooked normally.

As a proof of concept for a backend, I built a simple bag-of-words model of the TextExtracts of enwiki's Category:All_articles_needing_additional_references. I then set up a tool [1] to receive HTML input and retrieve the 5 most similar articles to that input. You can try it out in your browser [2], or on the command line [3].

The results could definitely be better, but having tried it on a few different articles over the past few days, I think there's some potential there.

I'd be interested in hearing your thoughts on this. Specifically:

* If such a backend/API were available, would you be interested in using it for other tools? If so, what functionality would you expect from it?
* I'm thinking of just throwing away the above proof of concept and using ElasticSearch, though I don't know a lot about it. Is anyone aware of a similar dataset that already exists there, by any chance? Or any reasons not to go that way?
* Any other comments on the overall idea or implementation?

Thanks!

[1] https://github.com/eggpi/similarity
[2] https://tools.wmflabs.org/similarity/
[3] Example: curl https://www.nytimes.com/2017/09/22/opinion/sunday/portugal-drug-decriminali… | curl -X POST http://tools.wmflabs.org/similarity/search --form "text=<-"

--
Guilherme P. Gonçalves
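
The proof of concept described above is a bag-of-words model that returns the 5 articles most similar to some input text. The sketch below is not the code from the linked repository. It is a minimal illustration of the general technique, assuming cosine similarity over raw word counts (the post does not say which similarity measure the tool uses). All function names and the sample articles are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count word occurrences."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_text, articles, k=5):
    """Return the titles of the k articles most similar to query_text."""
    query = bag_of_words(query_text)
    scored = [
        (cosine_similarity(query, bag_of_words(body)), title)
        for title, body in articles.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]


# Hypothetical stand-ins for article extracts.
articles = {
    "Drug policy of Portugal": "portugal decriminalised drug possession in 2001",
    "Lisbon": "lisbon is the capital city of portugal",
    "Bag-of-words model": "a bag of words model counts word occurrences",
}
top = most_similar("Portugal drug decriminalisation policy", articles, k=2)
print(top)
```

Raw counts favour common words, so a real backend would likely need stop-word removal or TF-IDF weighting; that choice is left open here.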

Re: [Cloud] interwiki languages
by Martin Domdey 24 Dec '17

Re: [Cloud] Tools that need user tables joins
by Martin Domdey 23 Dec '17

Tools that need user tables joins
by Maarten Dammers 23 Dec '17

Hi everyone, In the new database setup, user databases can no longer live on the same servers as the production database replicas. I noticed Daniel saying "Death blow for GHEL coordinate extraction and WikiMiniAtlas." on https://phabricator.wikimedia.org/T142807, and https://phabricator.wikimedia.org/T183066 reports that several tools broke down. Do we have an overview of the tools that are now broken? Did the database admins actually contact the tool maintainers about the loss of functionality, or was this just sent to the -announce list? Maarten

Problem with one of the database servers?
by Russell Blau 18 Dec '17

One of tools.dplbot's daily tasks has been having repeated problems since yesterday. A script that ran without errors and completed in about 10 minutes on Friday ran for over 90 minutes on Saturday, and died with a "MySQL server has gone away" error. There were no edits to the script in between Friday and Saturday, so I have to assume that something changed on the server side.

The script reads from enwiki.analytics.db.svc.eqiad.wmflabs, and both reads from and writes to tools.labsdb. All of the errors occurred on writes to the user database.

I was able to work around the errors by dropping the database connection and opening a new one immediately before writing (I have no idea why this works, since the timeout setting on the database for inactive connections is 8 hours, and this script was not even running for two hours; but it did work). However, the script continues to run for an order of magnitude longer than it did on Friday (~100 minutes vs. ~10 minutes).

Is anyone else experiencing similar issues?

--
Russell Blau
russblau(a)imapmail.org

Grid hosts
by John 18 Dec '17

Given that Trusty was released almost 4 years ago, are there any plans for getting a newer platform for grid users? This is partially in relation to T183090; there are some areas where k8s just fails. What prospects are there for moving to newer grid exec nodes? I expect that we will see more and more cases of software incompatibility or security issues as time passes, and given the glacial speed at which such a move would proceed, I am surprised we have not seen the first stages of a migration already in progress.
