TL;DR: Participate in T589 and help decide what the upcoming schema change
should entail, and how we'll migrate existing data.
Hey all,
A couple of weeks ago we dedicated an IRC office hour to
https://phabricator.wikimedia.org/T589 (RFC: image and oldimage tables).
Updated draft at:
https://www.mediawiki.org/wiki/Requests_for_comment/image_and_oldimage_tabl…
We clarified the scope and purpose of this particular RFC. Other issues are
still important but considered orthogonal, to be dealt with in parallel
(or at a later time).
Revised problem statement:
1. File revisions should have unique identifiers, better than "current file
title + upload timestamp" (which is subject to race conditions, hard to
index/query, etc.).
2. Uploading new file revisions must not involve rows moving across tables,
or rows being replaced.
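To make Problem 1 concrete, here is a minimal sketch (a simplified model, not actual MediaWiki code) of why a (title, timestamp) composite identifier is fragile while a surrogate key is not: two uploads in the same second silently collide under the former, and a rename changes the identifier entirely.

```python
# Sketch: identifying file revisions by (title, timestamp) vs. a
# surrogate primary key. All names here are illustrative assumptions.

revisions = {}          # keyed by (title, timestamp) -- the status quo
revisions_by_id = {}    # keyed by a surrogate auto-increment id
next_id = 1

def upload(title, timestamp):
    """Record one upload under both identification schemes."""
    global next_id
    # (title, timestamp): a second upload in the same second collides.
    collided = (title, timestamp) in revisions
    revisions[(title, timestamp)] = title
    # Surrogate id: always unique, regardless of timing or renames.
    rid = next_id
    next_id += 1
    revisions_by_id[rid] = (title, timestamp)
    return collided, rid

# Two uploads of the same file within one second:
c1, id1 = upload("Example.jpg", "20160810120000")
c2, id2 = upload("Example.jpg", "20160810120000")
```

Under the composite key the second upload overwrites the first; under the surrogate key both revisions survive with distinct ids.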
Participants agreed with the revised problem statement: it makes sense not
to merely add primary keys to the existing tables ("Proposal 1" on the RFC
draft), as that wouldn't adequately solve Problem 2.
The second proposal was to separate information about image revisions from
the image entity itself, similar to the page/revision tables. This was
generally accepted as a good idea, but details are still to be determined.
The general idea is that all revision-specific information (except for a
pointer to the current revision) would no longer live in the 'image' table.
Instead, information about all revisions (both current and past) would
live in the same table, rather than being moved from one table to another
when a revision is no longer the current one.
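The page/revision-style split described above can be sketched as follows; the table and column names (file/filerevision, file_latest, fr_*) are assumptions based on the RFC's analogy, not a final schema. The key property is that an upload only inserts one revision row and updates one pointer; no rows move between tables or get replaced.

```python
# Sketch of the proposed file/filerevision split (names are assumed).

files = {}          # 'file' table: one row per file, pointer to latest
filerevisions = []  # 'filerevision' table: one row per revision, append-only

def upload_revision(name, sha1):
    """Insert a revision row and update the current-revision pointer."""
    rev_id = len(filerevisions) + 1
    filerevisions.append({"fr_id": rev_id, "fr_name": name, "fr_sha1": sha1})
    if name not in files:
        files[name] = {"file_name": name, "file_latest": rev_id}
    else:
        # Re-upload: a single pointer update, no row is moved or deleted.
        files[name]["file_latest"] = rev_id
    return rev_id

upload_revision("Example.jpg", "aaa")
upload_revision("Example.jpg", "bbb")
```

Contrast this with today's image/oldimage scheme, where the second upload would move the first row into a different table.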
Details at:
https://www.mediawiki.org/wiki/Requests_for_comment/image_and_oldimage_tabl…
Some open questions I'd like to see discussed on Phabricator (or here on
wikitech):
1. Which fields do we keep in the 'image' table (img_id, img_name,
img_latest, anything else?).
All fields currently queried from both tables will probably stay only
in the image revision table. But there are a few fields that we
intentionally want to query only for current versions, for example
'img_sha1': for duplicate detection, we need to consider only the latest
revisions. Doing this by keeping img_sha1 means uploading a new revision
will involve updating two fields instead of one (img_latest and img_sha1).
This isn't unprecedented as we do this for page as well
(WikiPage::updateRevisionOn; page_latest, page_touched, page_is_redirect,
page_len).
Are there other fields we need to keep besides img_sha1? Or should we
solve the img_sha1 use case in a different manner?
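The trade-off for img_sha1 can be sketched like this (column names are illustrative assumptions): with the sha1 denormalized into the 'file' table, duplicate detection is a single-table scan, at the cost of keeping two fields in sync on upload; without it, every duplicate check must follow the latest-revision pointer into the revision table.

```python
# Sketch of duplicate detection with and without a denormalized sha1.

files = [
    {"name": "A.jpg", "latest": 2, "sha1": "bbb"},
    {"name": "B.jpg", "latest": 3, "sha1": "bbb"},
]
revision_sha1 = {1: "aaa", 2: "bbb", 3: "bbb"}  # fr_id -> sha1

def duplicates_denormalized(sha1):
    # Single-table scan; uploads must update both 'latest' and 'sha1'.
    return [f["name"] for f in files if f["sha1"] == sha1]

def duplicates_joined(sha1):
    # No extra field to keep in sync, but needs a join per file row.
    return [f["name"] for f in files if revision_sha1[f["latest"]] == sha1]
```

Both approaches return the same answer; the difference is purely in which table carries the index and how many fields an upload must touch.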
2. img_metadata
This field is a blob of serialised PHP (typically representing the Exif
data of an image).
Tim (correct me if I got it wrong) mentioned we could potentially make
migration easier by storing img_metadata in a separate table and changing
the img_metadata field (in the image revision table) to a pointer to that
table's primary key.
This could potentially be done separately later, but if it helps migration,
we should consider doing it now.
How will this interact with file deletion? Will it be difficult to garbage
collect this? Do we need to? (We don't seem to do it for the 'text' table /
external store; is it worth moving this to an external store?)
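The metadata-pointer idea can be sketched as follows (table and field names are assumptions): the serialized blob lives in its own table, and the revision row stores only a small integer pointer, so copying or rewriting revision rows during migration no longer drags large blobs along.

```python
# Sketch: splitting the serialized metadata blob out of the revision row.

metadata_store = {}   # hypothetical 'filemetadata' table: id -> blob
revision_rows = []    # revision table rows carry only a pointer

def store_revision(name, metadata_blob):
    """Write the blob once, then reference it by id from the revision."""
    meta_id = len(metadata_store) + 1
    metadata_store[meta_id] = metadata_blob
    revision_rows.append({"name": name, "meta_id": meta_id})
    return meta_id

# Example blob in the serialized-PHP style img_metadata uses today:
mid = store_revision("Example.jpg", 'a:1:{s:4:"Make";s:5:"Canon";}')
```

Garbage collection then becomes a question of finding metadata ids no revision row references, which is the same problem the 'text' table currently leaves unsolved.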
3. Migration
If we rename both tables (image/oldimage -> file/filerevision), we'd be
able to run the migration in the background without interfering with the
live site, and without requiring a long read-only period and/or developing
duplicate, additional code paths.
Is there a way we can do the migration without creating two new tables?
Using the oldimage table as the import destination for current rows isn't
straightforward, as existing scan queries would need to skip the current
rows somehow in the midst of this migration. It seems possible, but is
it worth the complexity? (We'd need extra code that knows about that
migration field, and how long do we keep that code? It also complicates
migration for third parties using update.php.)
Is creating the new tables separately (and dropping the old ones once
finished) viable at the scale of Wikimedia Commons? Is this a concern from
a DBA perspective with regard to storage space? (We'd temporarily need
about twice the space for these tables.) So far I understood that it
wouldn't be a problem per se, but that there are also other options we can
explore for Wikimedia. For example, we could use a separate set of slaves
and alter those while depooled (essentially using an entirely separate set
of db slaves instead of a separate table within each slave).
Do we create the new table(s) separately and switch over once they've
caught up? This would require multiple passes as we depool slaves one by
one (we've done that before at Wikimedia). The switch-over could be done by
migrating before the software upgrade, with a very short read-only period
after the last pass finishes. It wouldn't require maintaining multiple
code paths, which is attractive.
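The background-migration approach sketched above boils down to a batched backfill: copy rows from the old tables into the new ones in small batches while the site stays read-write, then handle the stragglers during a short read-only window. A minimal sketch (simplified model, not the actual maintenance script):

```python
# Sketch of a batched background backfill into new tables.

old_rows = [{"id": i} for i in range(1, 11)]  # stand-in for oldimage rows
new_rows = []                                  # stand-in for filerevision

def migrate_batch(start, batch_size):
    """Copy rows with start < id <= start + batch_size; return new start.

    Small batches keep each transaction short so replication lag and
    lock contention on the live site stay bounded.
    """
    batch = [r for r in old_rows if start < r["id"] <= start + batch_size]
    new_rows.extend(batch)
    return start + batch_size

pos = 0
while pos < len(old_rows):
    pos = migrate_batch(pos, 3)
```

A final pass over rows written after the copy started (tracked by the highest migrated id) is what the short read-only period at switch-over would cover.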
Other ideas?
-- Timo
Hi,
Nodepool isn't happy, so our CI is having trouble spinning up new instances
for testing. We're
well aware and working on it.
Since a couple of people have already asked us what they can do to
help... the best thing you can do is avoid pushing changes for review (if
they're not urgent) for a bit. And *definitely* don't run "recheck" on
existing changes right now.
Thanks for understanding!
-Chad
https://www.mediawiki.org/wiki/Scrum_of_scrums/2016-08-10
= 2016-08-10 =
== Product ==
=== Reading ===
==== Reading Web ====
* Current sprint: https://phabricator.wikimedia.org/project/view/2115/
* Next sprint : https://phabricator.wikimedia.org/project/view/2126/
* New language switcher button deployed this morning (about 5 minutes ago;
changes will become visible as pages are purged naturally)
* Lazy loaded images for mobile web Wikipedias shipping next sprint in 3
deployments: all small wikis, all medium wikis, all large wikis
* Adding a contribution tab to the hamburger menu next sprint
* Enhancing watchstar next sprint, primarily for non-js users
==== iOS native app ====
* 5.0.6 Passed Regression last week
** Holding while we diagnose a memory crash
** Working with beta testers to verify fixes
https://phabricator.wikimedia.org/tag/ios-app-v5.0.6-hotfix/
* 5.1 is in Development
** Major features are iPad and Find in Page
** Expected to be feature complete by end of week
** Expected to go to beta next week
https://phabricator.wikimedia.org/project/board/1736/query/open/
* 5.2 Scheduled to begin development next week
** iOS 10 release
** Major features are Widgets and Feed improvements
** Expected to go to beta in early September
==== Android native app ====
* Current sprint: https://phabricator.wikimedia.org/project/view/2091/
* Next sprint: https://phabricator.wikimedia.org/project/board/2142/
* Working on navigation overhaul
==== Mobile Content Service ====
* Working on fixing links with + signs (
https://phabricator.wikimedia.org/T136223)
==== Reading Infrastructure ====
* Gergo working on pywikibot issue, Brad is OOO
===Community Tech===
* Requested new hardware for cross-wiki watchlists (
https://phabricator.wikimedia.org/T142538 )
* Will be switching Macedonian and/or Swedish Wikipedias to numeric sorting
next week for wider testing
* Working on Programs Dashboard (https://outreachdashboard.wmflabs.org/)
* Working on IABot improvements/stability
== Editing ==
=== VisualEditor ===
* Blocked:
** We're waiting on Design Research with T141069 so we can work with them
on T141068 for the new wikitext editor work.
* Blocking:
** Parsing team are waiting for our response on "native" Parsoid <gallery>
implementation. Thalia will look at it.
* Updates:
** Now live for logged-in users on Arabic-script Wikipedias; logged-out
users planned for next week, then Indic script Wikipedias
** Lots of the team in town this week. Working on a few things. Come talk
to us if you're also in town and want to say hi.
** HTML diffs for partial page saving (so faster saves); needs input from
Services team on https://github.com/edg2s/restbase/tree/sections branch of
RESTbase; PR forthcoming
** New wikitext editor work continues;
https://phabricator.wikimedia.org/T142138 and so on
=== Parsing ===
* Blocked:
** https://phabricator.wikimedia.org/T141591 -- Security review needed
** https://gerrit.wikimedia.org/r/#/c/264026/ -- Parsoid-native
implementation of <gallery>; need review and feedback from VE team
* Blocking:
** Shadow Namespaces: Please review my comments at
https://phabricator.wikimedia.org/T91162#2408633 -- I think this is an RFC
discussion matter, but happy to read through it and review it. :)
* Updates:
** Parsoid cluster upgraded to node v4 & Jessie (thanks ops) - deployments
resumed yesterday
** Next step: migrate deployment process to scap3 -- to be scheduled with
Services
** https://gerrit.wikimedia.org/r/#/c/303912/ Implement magic word based
opt-out of global user pages -- bikeshed on __NOGLOBAL__ or __NOFOREIGN__
...
=== Multimedia ===
* Blocked:
** Is there an update on thumbor production status? (Performance?) It's
still blocking ImageTweaks deployment.
* Blocking: None of which we're aware.
* Updates:
** Continuing our work on FileAnnotations; currently pending on
=== Language ===
* Blocked:
* Blocking:
* Updates:
** ContentTranslation dumps are available at,
https://dumps.wikimedia.org/other/contenttranslation/
** CX MT card and templates getting new changes this and next week.
** Apertium -> Jessie work in progress. Few packages left to upload.
=== Collaboration ===
* Blocked
* Blocking
** We are working on rewriting the Flow caching for the data center
migration - https://phabricator.wikimedia.org/T120009
* Updates
** Finishing the ptwikibooks LiquidThreads -> Flow conversion, which had
some issues - https://phabricator.wikimedia.org/T119509
** Working on allowing dynamic actions in the Echo secondary links (e.g.
unwatching a Flow board through AJAX) -
https://phabricator.wikimedia.org/T132975
== Technology ==
=== Analytics ===
* refinery deployment migrated to scap3, tested and everything works well
* sqooping all mediawiki databases out to hdfs, from dbstore1002, so far
seems to not impact that machine much
* pagecounts-raw and pagecounts-all-sites dataset updates have been
stopped, old files remain, upgrade to the new pageviews or pagecounts-ez
datasets as per: https://dumps.wikimedia.org/other/analytics/
* eventlogging kafka pipeline upgraded to fix broker restart bug
* follow-up on pageview spike: Windows update caused a problem with the TLS
handshake in Chrome 41
=== Architecture ===
* https://www.mediawiki.org/wiki/Architecture_committee/Status
** Last week: ArchCom meetings 2016W31: 2016-08-03 (Wed)
*** 1pm PDT (20 UTC) Planning meeting: [[Phab:E250]]
*** 2pm PDT (21 UTC) IRC #wikimedia-office [[Phab:E251]]
***** [[Phab:T128351]]: Notifications in core
** This week: ArchCom meetings 2016W32: 2016-08-10 (Wed)
*** 1pm PDT (20 UTC) Planning meeting: [[Phab:E258]]
*** 2pm PDT (21 UTC) IRC #wikimedia-office [[Phab:E259]]
***** [[Wikitext]] (and whether Wikimedia should invest in a spec)
=== Services ===
* Parsoid move to Jessie and node 4 complete
* Electron PDF rendering service discussion:
https://phabricator.wikimedia.org/T142226
* Working on a new kafka driver for Change-Prop - to deploy as soon as
kafka is upgraded to 0.9
=== Technical Operations ===
* '''Blocked'''
** None
* '''Blocking'''
** None
* Updates
** Parsoid clusters upgraded to jessie throughout the week
** Worked with research to get ORES workers their own dedicated cluster.
Hardware estimations done
** xkey meeting held.
** Review of ContentTranslation apertium packages ongoing
** labs got openstack liberty upgrade
=== Discovery ===
* No blockers
* Working on BM25 implementation
* Working on multi-wiki indexes
* Question mark removal deployed
* Bugfixes for cross-language search deployed
* Portal stats updated + fixes for small languages
=== Interactive Team ===
* Working on deploying <mapframe> and <maplink> to all non-Wikipedia sites
* Working on replacing GeoHack with <maplink> + info screen on all
Wikipedias
* Working on deploying Tabular data on Commons. Already synced up with
Wikidata.
=== Security ===
* MediaWiki 1.27.1 will be released this week
* Notifications for OAuth consumer proposals are being deployed this week:
** https://phabricator.wikimedia.org/T61772
** https://phabricator.wikimedia.org/T62528
* Planning for captcha improvements continues
** https://phabricator.wikimedia.org/T125132
** https://phabricator.wikimedia.org/T141490
** First step will be re-generating images using current script
== Wikidata ==
* No blockers.
* New features on query.wikidata.org (live preview of example queries, map
layers).
* Working on first feedback we got from publishing the Commons MediaInfo
prototype.
* Layout tweaks to the default Wikidata UI.
== Fundraising Tech ==
* No blockers
* Deploying and monitoring first activemq replacement changes
* Minor Central Notice deploy with tonight's SWAT
* Testing background contact de-duping script for CiviCRM
== Performance ==
* No blockers
* Ori still out
* Still doing heavy refactoring in ResourceLoader/OutputPage for the
critical rendering path
* Still working on transaction bugfixes and improvements for multi-DC
* WebpageTest now supports Opera Mini & UC Mini
* Thumbor plugins test suite complete, production work to resume next week
* Team offsite to be in NY area end of November
Dear users, developers and all people interested in semantic wikis,
We are very happy to announce that early bird registration to the 13th
Semantic MediaWiki Conference is now open!
Important facts reminder:
* Dates: September 28th to September 30th 2016 (Wednesday to Friday)
* Location: German Institute for International Educational Research
(DIPF), Schloßstraße 29, Frankfurt am Main, Germany.
* Conference page: https://www.semantic-mediawiki.org/wiki/SMWCon_Fall_2016
* Participants: Everybody interested in semantic wikis, especially in
Semantic MediaWiki, e.g. users, developers, consultants, business
representatives and researchers.
We welcome new contributions from you:
* We encourage contributions about applications and development of
semantic wikis; for a list of topics, see [0].
* Please propose regular talks, posters or workshops on the conference
website. We will do our best to consider your proposal in the conference
program. An interesting variety of contributions has already been
proposed, see [1].
* Presentations will generally be video and audio recorded and made
available for others after the conference.
News on participation and tutorials:
* You can now officially register for the conference [2] and benefit
from early bird fees until September 11, 2016.
Organization:
* German Institute for International Educational Research (DIPF) [3] and
Open Semantic Data Association e. V. [4] have become the official
organisers of SMWCon Fall 2016
* Thanks go to Wikimedia Germany [5] for supporting SMWCon Fall 2016
If you have questions you can contact Sabine Melnicki, Kendra Sticht and
Christoph Schindler (Program Chairs), Karsten Hoffmeyer (General Chair)
or Lia Veja (Local Chair) per e-mail (Cc).
We will be happy to see you in Frankfurt!
Sabine Melnicki, Kendra Sticht and Christoph Schindler
(Program Board)
[0] <https://www.semantic-mediawiki.org/wiki/SMWCon_Fall_2016/Announcement>
[1]
<https://www.semantic-mediawiki.org/wiki/SMWCon_Fall_2016#Program_proposals>
[2] <https://ti.to/smwconfall2016/frankfurt>
[3] <http://www.dipf.de/en/dipf-news>
[4] <http://www.opensemanticdata.org/>
[5] <https://www.wikimedia.de/>
Hi everyone,
This week's office hour: Wikitext! This discussion is intended to be
a continuation of the "Loosing the history of our projects to bitrot."
thread.
Coren stated the work in front of us very well at the start of it.
> You know, this is actually quite troublesome: as the platform evolves
> the older data becomes increasingly hard to use at all - making it
> effectively lost even if we kept the bits around. This is a rather
> widespread issue in computing as a rule; but I now find myself distressed
> at its unavoidable effect on what we've always intended to be a permanent
> contribution to humanity.
The thread he started had pretty robust participation on a really
important topic, which seemed to us in ArchCom worth continuing in one
of our weekly office hours. So, after checking with Subbu (in my list
message), that's what ended up as the top candidate.
Subbu did some work to structure the conversation ([A Spec For
Wikitext][1]) and I did some cleanup of the [Wikitext page on
mw.org][2] as a possible hub for information on this topic, with
[Talk:Wikitext][3] providing a durable conversation venue.
Please join us to discuss this further on Wednesday:
Time: Wednesday 21 UTC (2pm PDT, 23 CEST)
Place: #wikimedia-office
Phab event: [E259][4]
Rob
p.s. as always, ArchCom status updates continue on [ArchCom/Status][5]
[1]: https://www.mediawiki.org/wiki/Parsing/Notes/A_Spec_For_Wikitext
[2]: https://www.mediawiki.org/wiki/Wikitext
[3]: https://www.mediawiki.org/wiki/Talk:Wikitext
[4]: https://phabricator.wikimedia.org/E259
[5]: https://www.mediawiki.org/wiki/Architecture_committee/Status
Hey folks,
This is the 16th weekly update from the revision scoring team sent to
this mailing list.
New developments:
- We created dashboards for the ORES service in the Beta cluster[1] and
created panes for tracking failed jobs[2].
- We extended the documentation for the ORES review tool[3,4]
Maintenance:
- We did some work to make the Beta cluster look more like production so
that we can do better testing before the next deployment
- We set up a password on the Beta redis server[5]
- We configured the Beta ORES extension to actually use the Beta ORES
service[6]
- We also prepared a set of puppet changes for the deployment of a
refactored version of ORES to production[7]
Issues in WMFLabs
- We investigated a series of timeout errors that were appearing in the
logs[8]
- We investigated a periodic redis-related error that showed up when
scoring edits[9]
- We fixed our "05" web node that was periodically running out of
memory[10]
Estimating future resource needs
- In preparation for buying new hardware, we measured our past memory
usage and extrapolated forward two years to estimate what hardware
requirements we'll have[11]
1. https://phabricator.wikimedia.org/T142294 - Dashboard or pane for
ORES service in beta
2. https://phabricator.wikimedia.org/T142119 - Dashboard or pane for
ORES failed jobs on beta
3. https://phabricator.wikimedia.org/T140150 - Make user-centered
documentation for review tool
4. https://www.mediawiki.org/wiki/ORES_review_tool
5. https://phabricator.wikimedia.org/T141823 - Set up password on ORES
Beta redis server
6. https://phabricator.wikimedia.org/T141825 - Config beta ORES
extension to use the beta ORES service
7. https://phabricator.wikimedia.org/T141575 - Puppet config changes for
ORES refactor
8. https://phabricator.wikimedia.org/T141368 - [Investigate] ORES time
out errors in logs
9. https://phabricator.wikimedia.org/T141946 - [Investigate] Periodic
redis related errors in wmflabs
10. https://phabricator.wikimedia.org/T141523 - [Investigate] web-05
downtime
11. https://phabricator.wikimedia.org/T142046 - Extrapolate memory usage
per worker forward 2 years
Sincerely,
Aaron from the Revision Scoring team