Hi,
I rely quite a lot on Phabricator notifications for checking new activity
on my subscribed tasks (I've found it suits me better than email).
After using the unread view in Phab quite a lot, I noticed that I really
wanted to see the notifications grouped by task, since that lets me
see the new activity on a task at a glance and open it to read/act if I
need to.
So I've made a user script that does exactly that:
https://gist.github.com/joakin/c0e5dffc23aaf05175a580d24a2adefe
There's a gif of what this does in the README and the code is really short.
I hope this is useful to some of you, it certainly has been making my life
easier.
I apply it to https://phabricator.wikimedia.org/notification/query/unread/
only, but I guess there's no reason why it wouldn't work in
https://phabricator.wikimedia.org/notification/*
Cheers.
Hi everyone,
For this week's office hour, we'd like to discuss [T589: image and
oldimage tables][1]. Timo brought this up on the list a couple of weeks
ago and didn't get much of a response. (But thank you, Jaime, for
responding!) We discussed this one fairly recently ([July 13][2]). This
area of our system is still in need of simplification and optimization.
The discussion is scheduled for 2016-08-31:
Time: Wednesday 21:00 UTC (2pm PDT, 23:00 CEST)
Place: #wikimedia-office
Phab event: [E266][3]
[ArchCom/Status][4]
Rob
[1]: <https://phabricator.wikimedia.org/T589>
[2]: <https://phabricator.wikimedia.org/E228> July 13 meeting
[3]: <https://phabricator.wikimedia.org/E266> Upcoming meeting
[4]: <https://www.mediawiki.org/wiki/Architecture_committee/Status>
---------- Forwarded message ----------
From: Krinkle <krinklemail(a)gmail.com>
Date: Wed, Aug 10, 2016 at 1:54 PM
Subject: [Wikitech-l] Schema migration for 'image' and 'oldimage' tables
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
TL;DR: Participate on T589 and help decide what the upcoming schema change
should entail, and how we'll migrate existing data.
Hey all,
Couple weeks ago we dedicated an IRC office hour to
https://phabricator.wikimedia.org/T589 (RFC: image and oldimage tables).
Updated draft at:
https://www.mediawiki.org/wiki/Requests_for_comment/image_and_oldimage_tabl…
We clarified the scope and purpose of this particular RFC. Other issues
are still important but considered orthogonal, to be dealt with in
parallel (or at a later time).
Revised problem statement:
1. File revisions should have unique identifiers, better than "current
file title + upload timestamp" (which is subject to race conditions, hard
to index/query, etc.).
2. Uploading new file revisions must not involve rows moving across tables,
or rows being replaced.
Participants agreed with the revised problem statement: it makes sense
not to merely add primary keys to the existing tables ("Proposal 1" on
the RFC draft), as that wouldn't adequately solve Problem 2.
The second proposal was to separate information about image revisions
from the image entity itself, similar to the page/revision tables. This
was generally accepted as a good idea, but details are still to be
determined.
The general idea is that all revision-specific information (except for a
pointer to the current revision) would no longer live in the 'image' table.
Instead, information about all revisions (both current and past) would
live in the same table (instead of rows being moved from one table to
another when a revision is no longer the current one).
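As a rough illustration of that split, here is a sketch modeled on the page/revision tables. The file/filerevision naming comes from the rename idea later in this mail; the individual field names are only assumptions for illustration, not the final schema.

```python
from dataclasses import dataclass

# Illustrative sketch only: field names are assumptions modeled on the
# page/revision split, not the schema decided in the RFC.

@dataclass
class File:
    file_id: int       # stable primary key for the file entity
    file_name: str
    file_latest: int   # pointer to the current FileRevision

@dataclass
class FileRevision:
    fr_id: int         # stable primary key for one revision
    fr_file: int       # foreign key -> File.file_id
    fr_timestamp: str
    fr_sha1: str

# Uploading a new version inserts one FileRevision row and updates
# file_latest; no rows move between tables.
f = File(file_id=1, file_name="Example.jpg", file_latest=7)
new_rev = FileRevision(fr_id=8, fr_file=1,
                       fr_timestamp="20160831000000", fr_sha1="abc")
f.file_latest = new_rev.fr_id
print(f.file_latest)
```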
Details at:
https://www.mediawiki.org/wiki/Requests_for_comment/image_and_oldimage_tabl…
Some open questions I'd like to see discussed on Phabricator (or here on
wikitech):
1. Which fields do we keep in the 'image' table (img_id, img_name,
img_latest, anything else?).
All fields currently being queried from both tables will probably stay
only in the image revision table. But there are a few fields that we
intentionally want to query only for current versions, for example
'img_sha1'. For duplicate detection, we need to consider only the latest
revisions. Doing this by keeping img_sha1 means uploading a new revision
will involve updating two fields instead of one (img_latest and img_sha1).
This isn't unprecedented as we do this for page as well
(WikiPage::updateRevisionOn; page_latest, page_touched, page_is_redirect,
page_len).
Are there other fields we need to keep besides img_sha1? Or should we
solve the img_sha1 use case in a different manner?
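To make the img_sha1 use case concrete, a minimal sketch (hypothetical data; it assumes, per the above, that duplicate detection compares only the SHA-1s of current revisions):

```python
# Hypothetical sketch of the duplicate-detection use case: keeping
# img_sha1 on the file row lets us group duplicates by current-revision
# hash without touching the revision table. Data below is made up.

from collections import defaultdict

def find_duplicates(files):
    """Group file names by the SHA-1 of their current revision."""
    by_sha1 = defaultdict(list)
    for name, img_sha1 in files:
        by_sha1[img_sha1].append(name)
    return {sha1: names for sha1, names in by_sha1.items() if len(names) > 1}

files = [
    ("Example.jpg", "aaa111"),
    ("Copy_of_example.jpg", "aaa111"),  # same current content as Example.jpg
    ("Other.png", "bbb222"),
]
print(find_duplicates(files))  # {'aaa111': ['Example.jpg', 'Copy_of_example.jpg']}
```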
2. img_metadata
This field is a blob of serialised PHP (typically representing the Exif
data of an image).
Tim (correct me if I got it wrong) mentioned we could potentially make
migration easier by changing img_metadata to be stored in a separate table
and change the img_metadata field (in the image revision table) to instead
be a pointer to a primary key.
This could potentially be done separately later, but if it helps migration,
we should consider doing it now.
How will this interact with file deletion? Will it be difficult to
garbage collect this? Do we need to? (We don't seem to do it for the
'text' table / external store; is it worth moving this to an external
store?)
3. Migration
If we rename both tables (image/oldimage -> file/filerevision), we'd have
the ability to run the migration in the background without interfering
with the live site, and without requiring a long read-only period or the
development of duplicate, additional code.
Is there a way we can do the migration without creating two new tables?
Using the oldimage table as the import destination for current rows isn't
straightforward, as existing scan queries would need to skip the current
rows somehow while in the midst of this migration. It seems possible, but
is it worth the complexity? (We'd need extra code that knows about that
migration field, and how long do we keep that code? It also complicates
migration for third parties using update.php.)
Is creating the new tables separately viable at the scale of Wikimedia
Commons (and dropping the old ones once finished)? Is this a concern from
a DBA perspective with regard to storage space? (We'd temporarily need
about twice the space for these tables.) So far I understood that it
wouldn't be a problem per se, but that there are also other options we
can explore for Wikimedia. For example, we could use a separate set of
slaves and alter those while depooled (essentially using an entirely
separate set of db slaves instead of a separate table within each slave).
Do we create the new table(s) separately and switch over once it's caught
up? This would require doing multiple passes as we depool slaves one by one
(we've done that before at Wikimedia). Switch-over could be done by
migrating before the software upgrade, with a very short read-only period
after the last pass is finished. It wouldn't require maintaining multiple
code paths, which is attractive.
Other ideas?
-- Timo
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
https://www.mediawiki.org/wiki/Scrum_of_scrums/2016-08-31
== Product ==
=== Reading ===
==== Reading Web ====
* Current sprint:
* Fixing lazy-loaded image bugs (e.g. MathML formulas)
* Diagnosed problem with hovercards EL data. Will submit fix next sprint
* Train blocked and unblocked on MediawikiServices introduction on
MobileFrontend
* Next sprint:
* Continue work on the footer and on lazy-loaded images
* Related pages improvements
* Shipping wikidata descriptions in mobile web to some wikis
==== Mobile Content Service (MCS) ====
* Plus symbol in title fix work in progress
** blocked on issue with url fragments (e.g. /wiki/[title]#[section])
* Trending service standup ongoing, some field renaming work, probably
better to move from rcstream to ChangeProp
* Likely 'On this day' service work to start in next several weeks
==== Android native app ====
* Current sprint: https://phabricator.wikimedia.org/project/view/2178/
** continuing navigation overhaul; forecasting to have it complete this
sprint.
** made an 'interim' release to production, with some Feed features that
were most requested by users.
* Next sprint:
* planning to complete design touch-ups and get ready to release.
==== iOS native app ====
* Current release board:
https://phabricator.wikimedia.org/project/board/1736/
** 5.1 set to be released today or tomorrow
** We found a late minor regression affecting citation links, but likely
* Next board (no change):
https://phabricator.wikimedia.org/project/view/2150/
** 5.2 is in development, with expected deployment alongside the iOS 10
release in late September
*** Adding iOS 10 support (with widgets)
*** Dropping iOS 8 support
==== Reading Infrastructure ====
* nothing blocking/blocked
=== Community Tech ===
* Currently rolling out numeric collation to English Wikipedia (will take a
few more days for the script to complete)
* Rolling out PageAssessments to English Wikivoyage this week (possibly
English Wikipedia next week). Jamie will help us monitor.
* No blockers
=== Editing ===
==== Collaboration ====
* '''Blocking''':
** Continuing work on Flow caching rewrite for multi-DC. We're now 1)
using WanCache, 2) deleting on write and setting cache on read. Still
verifying that everything is working properly.
* '''Blocked''':
* '''Updates''':
** Finished the work to unwatch from Echo notifications.
** Flow VE fixes
** Added a server-side message poster. This is a way to post to a talk
page without knowing whether it uses Flow or wikitext. We already have
this on the client as well.
** Issues with mw.notify. We've temporarily re-implemented locally, but
want to resolve the core issues and use that. See
https://gerrit.wikimedia.org/r/#/c/306560/ .
==== Language ====
* Blocking:
* Blocked:
* Updates:
** Apertium packaging work finished (except kaz/kaz-tat); Kartik/Alex to
start work on the Jessie migration for the service.
** New CXStats page:
https://test.wikipedia.org/wiki/Special:ContentTranslationStats
** Work related to template adaptation continues.
** MLEB released last week.
==== Parsing ====
* Blocked on security review of Parser Migration extension (I see now that
Security is on it)
* Ongoing work to clean up parser tests infrastructure
* Resumed work on Language Variants support in Parsoid (initial work to
attain rendering parity with PHP Parser output)
* Ongoing work with Linker rewrite as part of cleanup for the shadow
namespaces work
== Analytics ==
* loading of the new AQS (pageview API) cluster ongoing, will switch over
to it when done; scaling and load testing docs available here:
https://wikitech.wikimedia.org/wiki/Analytics/AQS/Scaling
* new event bus logging from mw hooks merged, will start being available on
event bus soon
* browser dashboards with loads of traction as of late thanks to Twitter
and a blog post: https://blog.wikimedia.org/2016/08/19/most-popular-browser/ (we
reached 2000 unique visits):
https://analytics.wikimedia.org/dashboards/browsers/#all-sites-by-os
* Currently evaluating druid/clickhouse as viable datastore for edit data
=== Security ===
* Security reviews this week:
** Youdao MT
** Catching up on overdue reviews -- comments for
https://phabricator.wikimedia.org/T141591 (HTML5Depurate and
ParserMigration will be posted later today)
* Darian working on hiring tasks
* Darian out next week (Sept. 6-9)
** A substitute with Phabricator understanding is needed for the Data
Breach Training on Sept. 9th at 11:00 a.m. Pacific; they just have to
answer questions that may come up with regard to setting Phab tickets
private
=== Services ===
* Blocking: none
* Blocked: none
* Updates
** Parsoid move to scap3 completed
** Change-Prop is updating summary on wikidata item change now
** Summary endpoint includes wikidata description now
** New sections transform API to be deployed today
=== Technical Operations ===
* '''Blocking'''
** None
* '''Blocked'''
** HTML (RESTBase) dumps: script does not account for deleted
pages/content. - blocked by services -
https://phabricator.wikimedia.org/T133547
* Updates
** wikidiff2 upgraded to 1.4.1 across the cluster
** got a new DBA hire (less work for Jcrespo)
** TechOps offsite happening week of Sept 25 (last week of quarter), please
work around this for deployments
=== ArchCom ===
* https://www.mediawiki.org/wiki/ArchComStatus
* '''Last week''': 2016-08-24 (Wednesday, 2016W34)
** [[Phab:E262|E262: ArchCom Planning meeting]] ([[Architecture
committee/2016-08-24|notes]])
** [[Phab:E263|E263: ArchCom-RFC office hour]]
*** Topic: '''[[Phab:T69223|Schema change for page content language
(T69223)]]'''
*** '''''Final comment''' ends 2016-08-31 (Wednesday)''
* '''This week''': 2016-08-31 (Wednesday, 2016W35)
** [[Phab:E265|E265: ArchCom Planning meeting]] ([[Architecture
committee/2016-08-31|notes]])
** [[Phab:E266|E266: ArchCom-RFC office hour]]
*** Topic: [[Phab:T589|'''image and oldimage tables (T589)''']]
*** RelEng participation would be especially helpful (late breaking request)
=== Discovery ===
* No blockers
* Working on BM25 implementation
* Working on multi-wiki indexes
* Working on integrating Polestar (http://vega.github.io/polestar/),
working on beta site (wdqs-test.wmflabs.org)
* SPARQL Workshop on September 8th:
https://office.wikimedia.org/wiki/SPARQL_workshop
==== Maps ====
* Enabling <maplink> everywhere (https://phabricator.wikimedia.org/T144062 )
=== RelEng ===
* '''Blocking'''
* '''Blocked'''
** (ops) https://gerrit.wikimedia.org/r/#/c/300092/ ("contint: tidy
Nodepool slaves config history")
** (ops) Help requested: Upgrade base MW-Vagrant image to Jessie -
https://phabricator.wikimedia.org/T136429
*** Outline from bd808: https://phabricator.wikimedia.org/T136429#2572195
*** Ori suggesting Ops support:
https://phabricator.wikimedia.org/T136429#2572433
* '''Updates'''
=== Performance ===
* No blockers
* More ResourceLoader work (cached module load performance improvements)
* More transactions work
* mcrouter for WANCache support added
* Multi-DC ChronologyProtector improvements (masking latency)
* Thumbor fully set up on beta, a few things to improve before switching it
on
* PerformanceInspector bugfixes based on beta, getting ready for community
outreach
* WebPageTest traffic shaping bugfixes
=== Fundraising Tech ===
* Deploying Redis consumers, decommissioning activemq
* More dedupe work
* CentralNotice geolocation changes
https://phabricator.wikimedia.org/T143271#2562534
* Large civi upgrade tonight
== Wikidata ==
* No blockers.
* Contributing in ArchCom discussions (content languages, multi content, …).
* Refactoring our jQuery UI based code base (
https://phabricator.wikimedia.org/T142694).
* Figuring out how to show usage tracking data (
https://phabricator.wikimedia.org/T103091).
* Restoring "purge without confirm" user right (
https://phabricator.wikimedia.org/T143435).
== WMDE TCB ==
* Wondering if there's any update on the ETA for adding watchlist IDs to
the production database (<https://phabricator.wikimedia.org/T125990> for
<https://phabricator.wikimedia.org/T8964>)
> https://phabricator.wikimedia.org/diffusion/MW/browse/master/RELEASE-NOTES-…
> * User::isBot() method for checking if an account is a bot role account.
> * Added a new hook, 'UserIsBot', to aid in determining if a user is a bot.
>
Sounds great! We have been waiting for it for a long time.
Will that work with a time parameter? Can I somehow find out that the
user WAS a bot at the time of an edit?
Another question: the release notes say, "1.28 has several database
changes since 1.27, and will not work without schema updates."
If I want to install a private wiki for private purposes on my own laptop
(one user, offline work, no access from outside), which is the better
choice to avoid database changes later: the stable version or the 1.28
alpha?
--
Bináris
Hi,
How can I find out from within a wiki what namespaces it uses? I thought
Special:Version would be the place to find them, but no.
(In the end, I mined them from Pywikibot.)
Additional question: where can I see available aliases as a user?
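(For reference, the action API exposes both namespaces and their aliases via meta=siteinfo. A minimal sketch of such a request, using wiki.example.org as a placeholder host:)

```python
# Sketch: build a MediaWiki action API request for the wiki's namespaces
# and namespace aliases (siprop=namespaces|namespacealiases).
# wiki.example.org is a placeholder; substitute the wiki's api.php URL.

from urllib.parse import urlencode

def siteinfo_namespaces_url(api_base):
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "namespaces|namespacealiases",
        "format": "json",
    }
    return api_base + "?" + urlencode(params)

print(siteinfo_namespaces_url("https://wiki.example.org/w/api.php"))
```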
--
Bináris
Hello Everyone,
I am writing to share with you an effort from the Android team to start
identifying themes of products
<https://www.mediawiki.org/wiki/Reading/Readers_contributions> [0] that
would allow readers to make micro-contributions that are welcomed and
actually needed by fellow Wikipedia editors.
The team has already identified 18 ideas as examples of tasks readers can
do to help editors. We would like to expand the conversation to help us
evaluate the importance of the ideas. While thinking about this, the team
already drafted criteria for evaluating the ideas
<https://www.mediawiki.org/wiki/Reading/Readers_contributions/Reading_team_t…>,
but this is still missing community input on how ideas are evaluated and
what would actually get high votes for being something that matters, in
order for the team to start working on them. Please feel encouraged to
add more ideas and adjust the criteria for evaluation if needed.
This work is a continuation of the reading consultation
<https://www.mediawiki.org/wiki/User_Interaction_Consultation> done
earlier in April. The team is excited to continue the conversation early
with the community in order to define product themes.
Ideas promoted from this conversation will be designed in Android first,
given the consideration of lower traffic and relative ease of
implementation, but the team will be excited and watching for lessons
learned in order to move ideas to the web.
This work is made possible by Jon Katz, the Reading team's senior PM, and
Dmitry Brant, the product owner of Android. Thanks for their thoughtful
and collaborative approach.
We will allow the conversation to run for a month, after which we can
start exploring ideas for implementation in Q3. Please help spread the
word across village pumps.
Looking forward to your input --
Best,
Moushira
Community Liaison for Reading team
[0] https://www.mediawiki.org/wiki/Reading/Readers_contributions
[1] https://www.mediawiki.org/wiki/Reading/Readers_contributions/Reading_team_thoughts
[2] https://www.mediawiki.org/wiki/User_Interaction_Consultation
Hey,
This is the 19th weekly update from the revision scoring team that we
have sent to this mailing list.
Deployments:
- We deployed a set of new models to ORES that reduce our memory usage
and slightly increase fitness. [1] These models were discussed in an email
to the "ai" mailing list. [2]
   - We also completed a major quarterly goal. The ORES review tool is now
   deployed as a beta feature on 8 wikis! [3] This came with some quick fixes
   to address confusion and usability issues. [4] The beta feature is now
   available on English, Polish, Portuguese, Russian, Dutch, Persian, and
   Turkish Wikipedias, as well as Wikidata.
New development:
- We discussed and came to a rough consensus about how to integrate ORES
into api.php. [5]
- We deployed a new edit quality campaign on English Wikipedia to gather
more data for training ORES. [6, 7]
- We added a specific set of user groups to the ORES models for Turkish
Wikipedia and saw an increase in model fitness. [8]
Maintenance and robustness:
- We fixed bugs in our maintenance scripts for purging old model
versions [9, 10]
   - We switched to using our production models on the beta labs cluster,
   so now we can catch vandalism there too (and know that the models
   actually work) [11]
- We improved the error messages reported from Wiki Labels so that the
actual error appears when the API responds with non-200 HTTP status [12]
1. https://phabricator.wikimedia.org/T144101 -- Deploy ORES at 2016-08-29
2. https://lists.wikimedia.org/pipermail/ai/2016-August/000068.html
3. https://phabricator.wikimedia.org/T140002 -- [Epic] Deploy ORES review
tool
4. https://phabricator.wikimedia.org/T143988 -- $wgOresModels set all
models true
5. https://phabricator.wikimedia.org/T122689 -- [Discuss] api.php
integration with ORES
6. https://phabricator.wikimedia.org/T143745 -- Deploy 2016 edit quality
campaign to English Wikipedia
7. https://en.wikipedia.org/wiki/Wikipedia:Labels/Edit_quality
8. https://phabricator.wikimedia.org/T140474 -- Include specific user
groups in the trwiki edit quality model
9. https://phabricator.wikimedia.org/T144216 -- Purge model score should
clean when there is no row in ores_model too
10. https://phabricator.wikimedia.org/T143798 -- Update model versions is
badly broken in ORES extension
11. https://phabricator.wikimedia.org/T143567 -- Switch beta to use the
proper wiki models for scoring (rather than "testwiki")
12. https://phabricator.wikimedia.org/T138255 -- Wikilabels UI reports
non-200 status errors badly
Sincerely,
Aaron from the Revision Scoring team
Can we get a repo for each major release that contains core, skins, and
all extensions in a single checkout? Right now I am updating, and with
all of the submodules I am running into issues where I can't set a branch
on the extensions/ repo for REL1_27 and get all extensions as of that
branching. Instead I am forced to go through them one by one and manually
set the branch, if and only if a given extension has said branch;
otherwise I am out of luck.
Getting everything together would make for a larger checkout, but keeps
things together for those who want to pick and choose.
Hey folks,
We've been working on generating some updated models for ORES. These
models will behave slightly differently from the models that we currently
have deployed. This is a natural artifact of retraining the models on the
*exact same data* again because of some random properties of the learning
algorithms. So, for the most part, this should be a non-issue for any
tools that use ORES. However, I wanted to take this opportunity to
highlight some of the facilities ORES provides to help automatically detect
and adjust for these types of changes.
*== Versions ==*
ORES provides information about all of the models. This information
includes a model version number. If you are caching ORES scores locally,
we recommend invalidating old scores whenever this model number changes.
For example, https://ores.wikimedia.org/v2/scores/enwiki/damaging/12345678
currently returns
{
  "scores": {
    "enwiki": {
      "damaging": {
        "scores": {
          "12345678": {
            "prediction": false,
            "probability": {
              "false": 0.7141333465390294,
              "true": 0.28586665346097057
            }
          }
        },
        "version": "0.1.1"
      }
    }
  }
}
This score was generated with the "0.1.1" version of the model. But once
we deploy the new models, the same request will return:
{
  "scores": {
    "enwiki": {
      "damaging": {
        "scores": {
          "12345678": {
            "prediction": false,
            "probability": {
              "false": 0.8204647324045306,
              "true": 0.17953526759546945
            }
          }
        },
        "version": "0.1.2"
      }
    }
  }
}
Note that the version number changes to "0.1.2" and the probabilities
change slightly. In this case, we're essentially re-training the same
model in a similar way, so we increment the "patch" number.
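The recommended invalidation rule above can be sketched as follows (a hypothetical local cache, not part of ORES itself):

```python
# Hypothetical local score cache: a cached score is reused only if it was
# generated by the model version the API currently reports; otherwise the
# caller should re-request the score from ORES.

cache = {}  # (wiki, model, rev_id) -> {"version": ..., "score": ...}

def get_cached_score(wiki, model, rev_id, current_version):
    entry = cache.get((wiki, model, rev_id))
    if entry is None or entry["version"] != current_version:
        return None  # miss or stale: fetch a fresh score
    return entry["score"]

cache[("enwiki", "damaging", 12345678)] = {
    "version": "0.1.1",
    "score": {"prediction": False},
}

print(get_cached_score("enwiki", "damaging", 12345678, "0.1.1"))  # cache hit
print(get_cached_score("enwiki", "damaging", 12345678, "0.1.2"))  # None: stale
```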
However, we're switching modeling strategies for the article quality
models (enwiki-wp10, frwiki-wp10 & ruwiki-wp10), so those increment the
minor version from "0.3.2" to "0.4.0". You may see larger changes in
prediction probabilities with those models, but quick spot-checking
suggests that the changes are not substantial.
*== Test statistics and thresholding ==*
Many tools that use our edit quality models (reverted, damaging, and
goodfaith) set thresholds for flagging edits for review. In order to
support these tools, we produce test statistics that suggest useful
thresholds.
https://ores.wmflabs.org/v2/scores/enwiki/damaging/?model_info=test_stats
produces:
...
"filter_rate_at_recall(min_recall=0.75)": {
"filter_rate": 0.869,
"recall": 0.752,
"threshold": 0.492
},
"filter_rate_at_recall(min_recall=0.9)": {
"filter_rate": 0.753,
"recall": 0.902,
"threshold": 0.173
},
...
These two statistics show useful thresholds for detecting damaging edits.
E.g. if you want to be sure that you catch nearly all vandalism (and are
OK with a higher false-positive rate), set the threshold at 0.173; but if
you'd like to catch most vandalism with almost no false positives, set
the threshold at 0.492. These fields can be read automatically by tools
so that they do not need to be manually updated every time we deploy a
new model.
Let me know if you have any questions and happy hacking!
-Aaron