One of the conclusions from the recent SessionManager rollout failure [0] was:
"we should have recruited and coordinated testing by developers and
users inside and outside of the WMF while the code was only on the
beta testing cluster"
SessionManager is back on the WMF beta cluster [1] now after being
briefly removed for the 1.27.0-wmf.12 release cycle, so an
announcement seems in order. The beta cluster implements a SUL
authenticated wiki farm that is completely separate from the Wikimedia
production SUL system. Helping test SessionManager there would involve
logging in, logging out, creating new user accounts and generally
wandering around the wikis doing things you would normally do in
production while keeping an eye out for session-related issues.
If you spot something (or just think you spotted something), file a
Phabricator task with as many details as you can provide and tag it
with the #reading-infrastructure-team project. For session-related
issues, traces of the headers and cookies used in the failing
requests are the most helpful. You can also poke around in the
logging interface [2] to try to find associated error messages.
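For anyone unsure what a useful trace looks like, here is a minimal sketch of one way to collect headers into a pasteable report. It is not an official tool: `format_trace`, `redact_cookie_header`, and the cookie names in the usage note are hypothetical, and masking cookie values is an assumption on my part (tasks are usually public, so a live session value should not end up in one).

```python
from http.cookies import SimpleCookie


def redact_cookie_header(value):
    """Return the cookie names from a Cookie/Set-Cookie header, values masked."""
    cookie = SimpleCookie()
    cookie.load(value)
    return "; ".join(f"{name}=<redacted>" for name in cookie)


def format_trace(method, url, request_headers, response_headers):
    """Build a plain-text trace suitable for pasting into a bug report."""
    lines = [f"{method} {url}", "-- request headers --"]
    for name, value in request_headers.items():
        if name.lower() == "cookie":
            value = redact_cookie_header(value)
        lines.append(f"{name}: {value}")
    lines.append("-- response headers --")
    for name, value in response_headers.items():
        if name.lower() == "set-cookie":
            # Only the cookie name survives; value and attributes are dropped.
            value = redact_cookie_header(value)
        lines.append(f"{name}: {value}")
    return "\n".join(lines)
```

Calling `format_trace("GET", url, {"Cookie": "exampleSession=abc123"}, {"Set-Cookie": "exampleSession=def456; Path=/"})` yields a report that shows which cookies were sent and set without exposing their values.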
If you find other bugs, report them in Phabricator too. :)
Also please remember NOT TO USE passwords in the beta cluster that
match the passwords you use anywhere else on the planet!
[0]: https://wikitech.wikimedia.org/wiki/Incident_documentation/20160123-Session…
[1]: http://deployment.wikimedia.beta.wmflabs.org/wiki/Main_Page
[2]: https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/default
Bryan
--
Bryan Davis Wikimedia Foundation <bd808(a)wikimedia.org>
[[m:User:BDavis_(WMF)]] Sr Software Engineer Boise, ID USA
irc: bd808 v:415.839.6885 x6855
Hello,
this is for anyone who is a client of stream.wikimedia.org
(https://wikitech.wikimedia.org/wiki/RCStream), that is, people who run tools
relying on the RC stream.
In about 48 hours, on February 3rd at 20:00 UTC, we will have to reboot
the backend servers of the stream.wikimedia.org service, rcs1001 and rcs1002.
This service is load-balanced and we will reboot the two servers one at a time,
so there should be no service downtime.
Nevertheless, your clients will get disconnected and may need your
intervention to reconnect.
Please be prepared to do so if your client does not reconnect automatically.
I will send another mail once this has happened.
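If your client has no reconnect logic yet, the following is a minimal sketch of a retry loop with exponential backoff. It makes assumptions: `connect` is a hypothetical callable standing in for whatever opens your stream connection, and it is assumed to raise `ConnectionError` when the link drops. The delay values are illustrative, not recommendations from the operations team.

```python
import random
import time


def backoff_delays(base=1.0, cap=60.0, attempts=8):
    """Yield exponentially growing delays in seconds, capped at `cap`."""
    for attempt in range(attempts):
        yield min(cap, base * (2 ** attempt))


def run_with_reconnect(connect, sleep=time.sleep, **backoff):
    """Call `connect()` until it returns normally, retrying on ConnectionError.

    `connect` is a hypothetical callable that opens the stream and consumes
    events; it is assumed to raise ConnectionError when the server goes away.
    """
    for delay in backoff_delays(**backoff):
        try:
            return connect()
        except ConnectionError:
            # Add jitter so many clients do not reconnect in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise ConnectionError("gave up reconnecting")
```

The jitter matters here because both backends are rebooted in turn, so many clients will be dropped at roughly the same moment. A long-running client would also want to reset the backoff after a connection has been stable for a while, which this sketch does not do.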
Best regards,
Daniel
--
Daniel Zahn <dzahn(a)wikimedia.org>
Operations Engineer
Hi folks,
In the ArchCom meeting earlier today, Daniel, Timo, Tim and I discussed the
way we handle RFC assignments in Phabricator. Previously, the RFC would
frequently be assigned to the person writing the RFC. As we try out the Rust
model (per T123606 <https://phabricator.wikimedia.org/T123606>), and as we
try to increase the speed at which RFCs move through the process, we thought
it would make sense to also assign RFCs to shepherds on the ArchCom.
We didn't discuss all of the implications of this in the meeting today, but
we think this might help us scale our RFC triage process. What do you all
think?
Rob
A security vulnerability has been discovered in MediaWiki setups which
use both Wikibase and MobileFrontend.
All projects in the Wikimedia cluster have since been patched, but if
you use these two extensions please be sure to apply the fix.
The patch file and the issue are documented at https://phabricator.wikimedia.org/T125684
I came here to post about the same problem. It appears to have broken
this morning. Did the API change again?
On 4 February 2016 at 07:33, Dr. Michael Bonert
<michael(a)librepathology.org> wrote:
> In September 2014, I described an intermittent recurrent problem associated
> with the InstantCommons images.
>
> I hadn't seen the problem since April or May of 2015.
>
> It is now back -- after I upgraded to MediaWiki 1.26.2.
> Unlike in the past, it hasn't resolved with a server re-boot.
>
> I did have record traffic earlier in the week... but the problem --like in
> the past-- doesn't seem to be traffic related.
>
> I cannot find a relation to memory. I'm not running out of memory.
> The underlying LAMP (Debian Linux Apache MySQL PHP) stack hasn't changed.
>
> Product Versions
> Operating System: Debian (stable)
> MediaWiki 1.26.2 (f465524)
> PHP 5.6.13-0+deb8u1 (apache2handler)
> MySQL 5.5.44-0+deb8u1
>
> The details of what is installed is here:
> http://librepathology.org/wiki/index.php/Special:Version
>
> I do have git installed -- and am using it to upgrade.
>
> Like in the past-- the image that is local to the site, i.e.
> non-InstantCommons,
> (
> http://librepathology.org/wiki/index.php/File:Atypical_ductal_hyperplasia_-…
> )
> is still there-- and displays properly. At the same time, all the images
> from the InstantCommons are gone.
>
>
> A list of previous posts I did can be found here:
> https://lists.wikimedia.org/pipermail/mediawiki-l/2014-September/043340.html
> https://lists.wikimedia.org/pipermail/mediawiki-l/2014-September/043345.html
> https://lists.wikimedia.org/pipermail/mediawiki-l/2014-September/043348.html
> https://lists.wikimedia.org/pipermail/mediawiki-l/2014-September/043349.html
>
> Help on this would be much appreciated...
>
> I have a testing site (pathologyprotocols.org) that is running the exact
> same set-up (behind a login).
> It is also failing suddenly in the same way; the InstantCommons images are
> gone.
>
> I wonder whether...
> - It is the InstantCommons server
> - There is "talk" between the InstantCommons server and a MediaWiki install
> (with the InstantCommons activated).
> I know this as images that are removed from the InstantCommons... are
> then removed on the
> MediaWiki install with InstantCommons activated.
>
> I think the communication is governed by 'descriptionCacheExpiry' and
> 'apiThumbCacheExpiry'
> https://www.mediawiki.org/wiki/Manual:$wgUseInstantCommons
> ? I wonder whether setting those to infinity would solve anything.
> - I suspect the thumb cache is purged... and then there is a bug preventing
> the image from being
> confirmed as being the same as on the InstantCommons... or there is a
> processing bottleneck as the thumbnails have to be re-created.
>
> Thanks in Advance,
> Michael
Hi,
This is the monthly report from the Wikimedia Performance Team for January
2016.
## Our progress ##
### Multi-datacenter
* The central login system started to use DB slaves for some actions
instead of master DB.
* MediaWiki Special pages and Action classes now support defining DB query
and write expectations (with logging for violations thereof).
* Clean up of haphazard database transaction methods largely finished in
MediaWiki core and extensions.
* Lock acquisition time reduced. Logic was inefficient and resulted in
wasted time. This reduced time spent in backend when saving edits.
* "Rebound purges" enabled in production. To compensate for DB lag, a
secondary purge for articles avoids stale content in Varnish. –
https://phabricator.wikimedia.org/T113192
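To illustrate the idea behind the "rebound purges" item, here is a minimal sketch, not the MediaWiki implementation. The class name, the dict standing in for Varnish, and the delay value are all assumptions made for illustration. The point is only that a purge is issued twice: once immediately, and once after a delay long enough for lagged database replicas to catch up.

```python
import heapq


class ReboundPurger:
    """Purge a cache key immediately and once more after `delay` seconds."""

    def __init__(self, cache, delay):
        self.cache = cache      # dict standing in for the HTTP cache
        self.delay = delay      # assumed upper bound on replication lag
        self._pending = []      # min-heap of (due_time, key)

    def purge(self, key, now):
        """Drop the key now and schedule the secondary (rebound) purge."""
        self.cache.pop(key, None)
        heapq.heappush(self._pending, (now + self.delay, key))

    def run_due(self, now):
        """Apply every secondary purge whose due time has passed."""
        while self._pending and self._pending[0][0] <= now:
            _, key = heapq.heappop(self._pending)
            self.cache.pop(key, None)
```

Without the second purge, a request arriving just after the first one could re-populate the cache from a replica that has not yet seen the edit, and that stale copy would then be served until it expired.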
### Navigation Timing
* We experimented with creating a new metric, "Time to first image" (how
long it takes for the principal image to show). Based on video capture, we were
unable to correlate User Timing API measurements with when an image
actually becomes visible. To be revisited at a later time. –
https://phabricator.wikimedia.org/T115600
### Media handling (Thumbor)
* Added cgroup support for controlling resource consumption.
* Implemented ability to pre-render multiple thumbnail sizes with a single
request.
* Video thumbnails render faster by loading only the relevant frame from
Swift (not the whole video).
* Working on a strategy for Thumbor deployment in Wikimedia production.
Thumbor is stateless and acts as drop-in replacement for current MediaWiki
PHP image scalers. – https://phabricator.wikimedia.org/T121388
* Our Thumbor plugins have been consolidated into a single Git repo and
moved from Gerrit to Phabricator Diffusion.
### ResourceLoader
* Work has started on the solution for the cache performance problem with
static MediaWiki resources. – https://phabricator.wikimedia.org/T99096
### Metric dashboards
* Earlier this month, Timo Tijhof gave a tech talk on "Creating Useful
Dashboards with Grafana" – https://www.youtube.com/watch?v=UlL6UoRUQAM
* New dashboard: https://grafana.wikimedia.org/dashboard/db/edit-count (Global
edit rate of Wikimedia wikis)
* New dashboard:
https://grafana.wikimedia.org/dashboard/db/time-to-first-byte (Navigation
Timing "responseStart" metric)
## How are we doing? ##
### Metrics
Client-side performance has remained stable over the past month. Save
Timing has also remained stable, around the 1s median mark.
Backend Save Processing Timing has improved slightly and was consistently
45ms (median) and 95ms (p75) lower in January compared to December.
https://performance.wikimedia.org/#!/month
https://grafana.wikimedia.org/dashboard/db/save-timing?from=now-50d
https://grafana.wikimedia.org/dashboard/db/navigation-timing?from=now-50d
https://grafana.wikimedia.org/dashboard/db/time-to-first-byte?from=now-50d
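For readers less familiar with the "median" and "p75" figures quoted above, here is a small sketch of how such summaries can be computed from raw timing samples. The function names are hypothetical and the sample values in the usage note are synthetic; this is not how the dashboards themselves are built.

```python
import statistics


def summarize(samples_ms):
    """Return the median and 75th percentile of timing samples (milliseconds)."""
    quartiles = statistics.quantiles(samples_ms, n=4, method="inclusive")
    return {"median": statistics.median(samples_ms), "p75": quartiles[2]}


def improvement(before_ms, after_ms):
    """Return how much each summary value dropped between two sample sets."""
    before, after = summarize(before_ms), summarize(after_ms)
    return {name: before[name] - after[name] for name in before}
```

Given one list of samples per month, `improvement(december, january)` returns the per-percentile difference, which is the kind of comparison the paragraph above reports.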
### Job Queue
On January 1st, the Job Queue started growing rapidly with htmlCacheUpdate
jobs. This was mitigated on January 21st by adding a dedicated runner for
that job type. Total queue size reached over 7 million before the dedicated
runner went live (typical size is under 300K). –
https://phabricator.wikimedia.org/T124194
The increase in those jobs is still being investigated:
https://phabricator.wikimedia.org/T124418
https://grafana.wikimedia.org/dashboard/db/job-queue-health?from=now-50d
Until the next time,
Gilles, Peter, Aaron, Ori, and Timo.
https://www.mediawiki.org/wiki/Wikimedia_Performance_Team
Hello,
I am writing to let you know that Pygments, the library we use for
syntax-highlighting, has been updated to the latest release (version 2.1)
in the SyntaxHighlight_GeSHi extension and across Wikimedia wikis. This
release includes support for 53 new languages and language variants.
They are listed below, with the lexer name (the string to use as a value
for the 'lang' attribute of <syntaxhighlight lang=...> tags) in parentheses:
- ABNF ("abnf")
- ADL ("adl")
- Arduino ("arduino")
- BC ("bc")
- BNF ("bnf")
- Boogie ("boogie")
- CAmkES ("camkes", "idl4")
- CPSA ("cpsa")
- CSS+Mako ("css+mako")
- Component Pascal ("cp", "componentpascal")
- Crmsh ("pcmk", "crmsh")
- Csound Document ("csound-csd", "csound-document")
- Csound Orchestra ("csound", "csound-orc")
- Csound Score ("csound-score", "csound-sco")
- Earl Grey ("earlgrey", "earl-grey", "eg")
- Easytrieve ("easytrieve")
- Elm ("elm")
- Ezhil ("ezhil")
- Fish ("fishshell", "fish")
- FortranFixed ("fortranfixed")
- HTML+Mako ("html+mako")
- Hexdump ("hexdump")
- J ("j")
- JCL ("jcl")
- JavaScript+Mako ("js+mako", "javascript+mako")
- LessCss ("less")
- MSDOS Session ("doscon")
- Mako ("mako")
- ODIN ("odin")
- PacmanConf ("pacmanconf")
- ParaSail ("parasail")
- PkgConfig ("pkgconfig")
- PowerShell Session ("ps1con")
- Praat ("praat")
- QML ("qbs")
- QVTO ("qvt", "qvto")
- Roboconf Graph ("roboconf-graph")
- Roboconf Instances ("roboconf-instances")
- Shen ("shen")
- SuperCollider ("sc", "supercollider")
- TAP ("tap")
- Tcsh Session ("tcshcon")
- Termcap ("termcap")
- Terminfo ("terminfo")
- Terraform ("terraform", "tf")
- Thrift ("thrift")
- TrafficScript ("trafficscript", "rts")
- Turtle ("turtle")
- TypeScript ("typescript")
- X10 ("xten", "x10")
- XML+Mako ("xml+mako")
- cADL ("cadl")
J and BNF were previously supported by mapping them to other languages with
close-enough syntax: Objective-J and Extended BNF, respectively. Each of
those now has a dedicated lexer, so the highlighted output should hew more
closely to the language definition.
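As an illustration of how the lexer names above map onto the 'lang' attribute, here is a small sketch. The helper names are hypothetical, the table is only an excerpt of the list above, and exact-match lookup is an assumption; it does not call Pygments or the extension.

```python
# A small excerpt of the lexer names listed above; not the full Pygments table.
NEW_LEXER_ALIASES = {
    "Elm": ("elm",),
    "Fish": ("fishshell", "fish"),
    "Terraform": ("terraform", "tf"),
    "TypeScript": ("typescript",),
}


def language_for(alias):
    """Return the language a lexer name belongs to, or None if not in the excerpt."""
    for language, aliases in NEW_LEXER_ALIASES.items():
        if alias in aliases:
            return language
    return None


def syntaxhighlight_tag(alias, code):
    """Wrap `code` in a <syntaxhighlight> tag using the given lexer name."""
    if language_for(alias) is None:
        raise ValueError(f"unknown lexer name: {alias}")
    return f'<syntaxhighlight lang="{alias}">\n{code}\n</syntaxhighlight>'
```

Where a language lists several names, such as "terraform" and "tf", any of them selects the same lexer.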
For more information on SyntaxHighlight_GeSHi and Pygments, see:
- https://www.mediawiki.org/wiki/Extension:SyntaxHighlight_GeSHi
- http://pygments.org/
Ori Livneh
Perf Team
wikitech.wikimedia.org will be unavailable for a short time on
Friday, beginning around 7AM San Francisco time while we apply kernel
updates and reboot[1]. Ideally the downtime will only last 5-10 minutes
but I've scheduled an hour window on the deploy calendar in case any of
the boxes have trouble reviving.
Labs instances and Toollabs services should not be affected. Some
alerts may fire during the window as we're planning to reboot one of the
monitoring hosts and the effects of that are unpredictable.
In case of emergency, the content of wikitech (minus some images)
is always available on the external backup:
https://wikitech-static.wikimedia.org/wiki/Main_Page
-Andrew
[1] For the curious (and my future reference), we're rebooting silver,
holmium, labcontrol1002, labnet1001, labmon1001
= 2016-02-03 =
== Product ==
=== Reading ===
==== Web ====
* New related pages desktop design pushed, will measure engagement
* Instrumentation in prep for language switcher change [analytics]
* new user page designs on mobile web beta going live
* Extension:Gather - PageImages showing non-free images - meeting
scheduled to iron out API request/response format to first try in
Related Articles on web to adjust output
==== Android ====
* New beta coming soon with improved memory usage for images and support
for animations. Also includes A/B testing for CirrusSearch (T125393).
==== iOS ====
* Will be integrating "top articles" feature using pageviews API
* Looking forward to mobileview API change from web team to get article
namespaces!
==== Reading Infrastructure ====
* Nothing much this week. SessionManager should be coming back into
master once wmf.12 is verified not to break stuff.
=== Community Tech ===
* No update.
=== Editing ===
==== Collaboration ====
* '''Blocking''':
** Dry run patch for external store migration is merged. Now we need to
set External Store up on Beta, then test the dry run patch there:
https://phabricator.wikimedia.org/T119567
* '''Blocked''':
** Flow dump generation on dumps.wikimedia.org:
https://phabricator.wikimedia.org/T119511
* '''Updates''':
** We're still working on human-readable names for cross-wiki
notifications: https://phabricator.wikimedia.org/T121936
** MediaWiki presence at FOSDEM went well.
==== Language ====
* No update.
==== Multimedia ====
* '''Blocking''': none
* '''Blocked''': none
* '''Updates''':
** Work on image tweaks extension continues; may need input later.
==== Parsing ====
* '''Blocking''': none
* '''Blocked''': Need input from Collaboration, see below.
* '''Updates''':
** ruthenium updated to jessie and node 4.2 with almost everything
puppetized (thanks to ops, Marko, Ori). Tests look good. Need to run
separate memory load tests before making a decision to move production
parsoid to node 4.2
** Need input from collaboration team about
https://phabricator.wikimedia.org/T124837 (migrating Flow to talk with
RESTBase) since it will simplify support when we remove inlined data-mw
from Parsoid HTML (I consider this ticket a weak blocker)
*** Matt: Should we schedule a meeting about this? Sure .. let us do it
this week.
** Will work with services team to finalize REST API versioning policy
this week -- last chance to provide input on
https://phabricator.wikimedia.org/T124365 ... Policy will be applied
when Parsoid HTML version is changed after inlined data-mw is moved out
of Parsoid HTML
** Heads up (VE, Language, Flow): We might be able to deploy
templatedata-based serialization of transclusions next week (depends on
reviews this week).
*** https://gerrit.wikimedia.org/r/#/c/264043/ if you want to test your
respective clients against it (
https://gerrit.wikimedia.org/r/#/c/264043/16/tests/mocha/templatedata.js
and https://gerrit.wikimedia.org/r/#/c/264043/16/tests/mockAPI.js has
tests that spec behavior if you want to take a closer look)
==== VisualEditor ====
* '''Blocking''': none known
* '''Blocked''':
** https://phabricator.wikimedia.org/T58337 being worked on in review
from Krinkle for https://gerrit.wikimedia.org/r/#/c/259771/ and
https://gerrit.wikimedia.org/r/#/c/265878/ and so
https://gerrit.wikimedia.org/r/#/c/265879/
* '''Updates''':
** Released yesterday (wmf.12), editing via jQuery.IME (thanks to
Language for their support); table editing improvements (move
columns/rows; copy-paste multiple cells; make/unmake tables
sortable/wikitable; cell and table contexts)
** wmf.13 will contain a split up version of OOUI; see
https://phabricator.wikimedia.org/T113677 for work on this and some
numbers. This is not a breaking change except for non-MediaWiki users of
OOUI like VE, for whom we'll flag this.
=== Discovery ===
* Data import from analytics to ES started
* Working on integrating completion suggester for all prefix searches
(will involve some small API changes in SearchEngine)
* TextCat is ready for inclusion in mediawiki/vendors, waiting for final
security signoff
* Preparing for A/B test to use opening_text instead of text in morelike
query, to improve performance
* Upgrading Wikidata Query Service to Blazegraph 2.0, so far working ok
but some weird exceptions, investigating
* '''Blocking''': none
* '''Blocked''': security final signoff for textcat
== Technology ==
=== Analytics ===
* Dashiki: implemented limn-like layout, will end-of-life most limn
dashboards soon
* Event Logging: problems were largely due to large tables, getting
better as we're trimming some of those
* Wikimetrics: finished program metrics feature, deploying soon
* Jobs to count Uniques based on the Last-Access cookie are being
productionized, will be available soon
* Bot convention thread on analytics-l concluded with us asking
non-human user agents to include the word "Bot", for analytics
purposes. We'll communicate that here soon:
https://meta.wikimedia.org/wiki/User-Agent_policy
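A minimal sketch of how the "Bot" convention could be applied when classifying requests follows. The function name and sample user agents are hypothetical, and case-insensitive substring matching is an assumption; the policy page may define the rule differently.

```python
import re

# Assumption: match "bot" anywhere in the string, ignoring case.
BOT_WORD = re.compile(r"bot", re.IGNORECASE)


def looks_like_bot(user_agent):
    """Return True if the user agent string contains the word "bot"."""
    return bool(BOT_WORD.search(user_agent or ""))
```

Under this rule a tool identifying itself as `ExampleBot/1.0 (https://example.org/; ops@example.org)` is counted as automated, while a typical browser string is not.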
=== Performance ===
* No update.
=== Release Engineering ===
* Blocking:
** Phase out antimony.wikimedia.org,
https://phabricator.wikimedia.org/T123718
** /var/log/phd/daemons.log growing on iridium,
https://phabricator.wikimedia.org/T124651
* Blocked: none
* Updates:
** 1.27.0-wmf.11 was skipped (and burned in effigy)
** 1.27.0-wmf.12 delayed by staging issues yesterday but train is
starting today
*** fun firefighting due to /srv/mediawiki-staging being wiped out
*** wmf.12 does not contain SessionManager due to an outstanding bug
that remains unreproducible
** scap 3.0 tagged and packaged! thanks ops!
** Finishing up work on puppet scap provider
=== Research ===
* No update.
=== Security ===
* No update.
=== Services ===
* Decommissioning parsoid-lb.wikimedia.org around Feb 22 --
https://phabricator.wikimedia.org/T110474
:* use RESTBase instead
* RESTBase
:* added streaming support
:* refactor: separating out the framework part of RESTBase
:* minor tweaks and bug fixes
* EventBus
:* enabled on all wikis (modulo private ones)
:* final tweaks to the schemas -
https://phabricator.wikimedia.org/T124741
=== Technical Operations ===
* No update.
== Advancement ==
=== Fundraising Tech ===
* Adam Wight and Andrew Green taking the month off fr-tech to work on
Education Program extension
* Got CI jobs running against paymentswiki branch of mediawiki (voting)
with 1.27 non-voting (thank you releng)
* more CiviCRM enhancements
* fixes and enhancements for backup credit card processor
* prep for Latin America fundraising expansion
* investigating banner impression data outage that started yesterday
(https://phabricator.wikimedia.org/T125676)
--
| Greg Grossmeier GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg A18D 1138 8E47 FAC8 1C7D |