[x-posted announcement]
Hello,
The next online office hour session of the Wikimedia Language team is
scheduled for next Wednesday, March 2nd 2016 at 14:00 UTC. This session is
going to be an online discussion over Google Hangouts/YouTube with a
simultaneous IRC conversation. Due to the limitations of Google Hangouts,
only a limited number of participation slots are available, so please let
us know (on the event page
<https://plus.google.com/u/0/events/cbn4a2gubl4m6au3jv0gllh5t0k>) if you
would like to join the Hangout. The IRC channel #wikimedia-office and
the Q&A channel for the YouTube broadcast will be open for interaction
during the session.
Our last online round-table session was held in November 2015. You can
watch the recording here: https://www.youtube.com/watch?v=eYWZ6C4N93Y
Please read below for the event details, including local time, and do let us
know if you have any questions.
Thank you
Runa
== Details ==
# Event: Wikimedia Language team's office hour session
# When: March 2nd, 2016 (Wednesday) at 14:00 UTC (check local time
http://www.timeanddate.com/worldclock/fixedtime.html?iso=20160302T1400)
# Where: https://plus.google.com/u/0/events/cbn4a2gubl4m6au3jv0gllh5t0k and
on IRC #wikimedia-office (Freenode)
# Agenda: Content Translation updates and Q & A
--
Language Engineering Manager
Outreach and QA Coordinator
Wikimedia Foundation
Greetings,
Wikimedia is a mentoring organization for Outreachy round 12 and the
application period is now open[1]. More details can be found on the
program page[2].
We have a lot of project ideas in the Possible-Tech-Projects board[3],
where students can look for ideas.
Some of them have a well-defined scope with micro-tasks and are ready to be
featured, while many others require your love.
Do consider stepping forward and adding yourself as a mentor if you feel
you could help move any of the projects forward, especially those in the
"*Discussion*" and "*Missing mentors*" columns. A number of recently added
projects in the "*Missing mentors*" column are Community Wishlist items
and stand a good chance of being featured.
Your comments would be greatly appreciated in shaping these ideas into
featured projects for the students.
Also, if you know of other projects/ideas which could be a good fit for a
3-month GSoC/Outreachy internship, feel free to add them to the board.
[1] - https://outreachy.gnome.org/?q=program_home&prg=6
[2] - https://www.mediawiki.org/wiki/Outreachy/Round_12
[3] - https://phabricator.wikimedia.org/tag/possible-tech-projects/
-Thank You!
-Sumit
-Co-organizer of Outreachy 12 along with Tony Thomas
Hi,
we are considering a policy for REST API endpoint result format
versioning and negotiation. The background and considerations are
spelled out in a task and an mw.org page:
https://phabricator.wikimedia.org/T124365
https://www.mediawiki.org/wiki/Talk:API_versioning
Based on the discussion so far, we have come up with the following
candidate solution (a small client-side sketch follows the list below):
1) Clearly advise clients to explicitly request the expected mime type
with an Accept header. Support older mime types (with on-the-fly
transformations) until usage has fallen below a very low percentage,
with an explicit sunset announcement.
2) Always return the latest content type if no explicit Accept header
was specified.
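As an illustration of (1) from a client's perspective, here is a minimal
Python sketch. The profile string and endpoint below are made-up examples
for illustration, not values this policy would actually mandate:

    import requests

    # Hypothetical versioned mime type; the profile string here is an
    # illustration, not the value the policy would define.
    ACCEPT = 'text/html; profile="https://www.mediawiki.org/wiki/Specs/HTML/1.2.0"'

    # Hypothetical REST endpoint, used purely for illustration.
    url = "https://en.wikipedia.org/api/rest_v1/page/html/Earth"

    resp = requests.get(url, headers={"Accept": ACCEPT})

    # Per (2), a request without an Accept header would simply get the latest
    # format; with the header, the server either returns the requested version
    # (possibly via an on-the-fly transformation) or rejects it once that
    # version has been sunset.
    print(resp.status_code, resp.headers.get("Content-Type"))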
We are interested in hearing your thoughts on this.
Once we have reached rough consensus on the way forward, we intend to
apply the newly minted policy to an evolution of the Parsoid HTML
format, which will move the data-mw attribute to a separate metadata
blob.
Gabriel Wicke
Hey all,
TLDR: The ORES extension [1], which integrates the ORES service [2] with
Wikipedia to make fighting vandalism easier and more efficient, is in the
process of being deployed. You can test it at
https://mw-revscoring.wmflabs.org (enable it in your preferences first).
You probably know ORES. It's an API service that gives the probability of an
edit being vandalism; it also does other AI-related things like guessing the
quality of articles in Wikipedia. We have a nice post on the Wikimedia
Blog [3], and the media paid some attention to it [4]. Thanks to Aaron Halfaker
and others [5] for their work in building this service. There are several
tools that use ORES to highlight possibly-vandalistic edits: Huggle, gadgets
like ScoredRevisions, etc. But an extension does this job much more
efficiently.
The extension, which is being developed by Adam Wight, Kunal Mehta and me,
highlights unpatrolled edits in recent changes, watchlists, related changes
and, in the future, user contributions, if the ORES score of those edits
passes a certain threshold. The GUI design was done by May Galloway. The ORES
API (ores.wmflabs.org) only gives you a score between 0 and 1: zero means it's
not vandalism at all and one means it's vandalism for sure. You can test
its simple GUI at https://ores.wmflabs.org/ui/. It's possible to change the
threshold in your preferences in the recent changes tab (you get named options
instead of numbers because we thought numbers are not very intuitive).
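In case it helps to picture how a client consumes these scores, here is a
rough Python sketch. The endpoint path, parameters and response layout are
assumptions based on the public Labs instance and may differ:

    import requests

    # Sketch of checking one edit against a threshold, roughly what the
    # extension does server-side.  The endpoint and the response shape below
    # are assumptions and may not match the real service exactly.
    WIKI = "fawiki"
    REV_ID = 123456
    THRESHOLD = 0.8  # illustrative; the extension exposes named options instead

    resp = requests.get(
        f"https://ores.wmflabs.org/scores/{WIKI}/",
        params={"models": "damaging", "revids": REV_ID},
    )
    data = resp.json()

    # Assumed shape: {"<revid>": {"damaging": {"probability": {"true": 0.87, ...}}}}
    score = data[str(REV_ID)]["damaging"]["probability"]["true"]

    if score >= THRESHOLD:
        print(f"rev {REV_ID}: likely vandalism ({score:.0%}), highlight it")
    else:
        print(f"rev {REV_ID}: looks fine ({score:.0%})")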
Also, we enabled it on a test wiki so you can test it:
https://mw-revscoring.wmflabs.org. You need to make an account (use a dummy
password) and then enable it in the beta features tab. Note that building an
AI tool to detect vandalism on a test wiki sounds a little bit silly ;) so we
set up a dummy model in which the probability of an edit being vandalism is
the last two digits of the diff id reversed (e.g. diff id 12345 = score 54%).
On a more technical note, we store these scores in the ores_classification
table so we can do a lot more analysis with them once the extension is
deployed, with fun use cases such as the average score of a certain page, the
contributions of a user, or the members of a category.
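For clarity, the dummy model's arithmetic amounts to roughly this (an
illustrative Python sketch, not the actual extension code):

    def dummy_score(diff_id):
        # Take the last two digits of the diff id, reverse them, and read the
        # result as a percentage, e.g. diff id 12345 -> "45" -> "54" -> 0.54.
        last_two = "%02d" % (diff_id % 100)
        return int(last_two[::-1]) / 100.0

    print(dummy_score(12345))  # 0.54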
We passed security review and have consensus to enable it on Persian
Wikipedia. We are only blocked on ORES moving from Labs to production
(T106867 [6]). The next wiki is Wikidata; we are good to go once the
community finishes labeling edits so we can build the "damaging" model. We
can enable it on Portuguese and Turkish Wikipedia after March, because s2 and
s3 have database storage issues right now. For other wikis, you need to
check whether ORES supports the wiki and whether the community has finished
labeling edits for ORES (check the table at [2]).
If you want to report bugs or request features, you can do so here
[7].
[1]: https://www.mediawiki.org/wiki/Extension:ORES
[2]: https://meta.wikimedia.org/wiki/Objective_Revision_Evaluation_Service
[3]:
https://blog.wikimedia.org/2015/11/30/artificial-intelligence-x-ray-specs/
[4]:
https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Media
[5]:
https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service#Team
[6]: https://phabricator.wikimedia.org/T106867
[7]: https://phabricator.wikimedia.org/tag/mediawiki-extensions-ores/
Best
My apologies for the short notice. Normally we announce these more than one
hour in advance, but I forgot.
In today's RFC meeting, we will discuss the following RFC:
* Standardise on how to access/register JavaScript interfaces
<https://phabricator.wikimedia.org/T108655>
<https://phabricator.wikimedia.org/E144>
<https://phabricator.wikimedia.org/E140>
The meeting will be on the IRC channel #wikimedia-office on
chat.freenode.net at the following time:
* UTC: Wednesday 22:00
* US PST: Wednesday 14:00
* Europe CET: Wednesday 23:00
* Australia AEDT: Thursday 09:00
Roan
https://www.mediawiki.org/wiki/Scrum_of_scrums/2016-02-24
= 2016-02-24 =
== Technology ==
=== Analytics ===
* '''Blocking''': (nobody we know)
* '''Blocked''': (on nothing)
* '''Updates''':
** Upgraded to CDH 5.5, comes with lots of improvements for those using the
Hadoop cluster:
http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_rn_new_i…
** Internally released data that estimates the number of Unique Devices
hitting each of our domains, using the Last Access cookie. This is a major
release, and it's available in the wmf database in hive, in the
last_access_uniques_daily table.
** Fixed handling of uri-encoded page titles in the pageview API
=== Architecture ===
* '''Blocking''':
** ???
* '''Blocked''':
** ???
* '''Updates''':
** ???
=== Performance ===
* '''Blocking''':
** ???
* '''Blocked''':
** ???
* '''Updates''':
** ???
=== Release Engineering ===
* '''Blocking''':
** https://phabricator.wikimedia.org/T111259
** Update train email as to why stalled
* '''Blocked''':
** None
* '''Updates''':
** AQS deployed via Scap3, (hooray \o/ +1) ready for new services w/new
version
** Phabricator updates happened, puppet work continues
** Train (still) not running wmf.14 on testwiki and that's all
=== Research ===
* '''Blocking''':
** nothing we know of
* '''Blocked''':
** blocked on ops for ORES in production
*** also blocks deployment of ORES extension to fawiki and wikidata
*** halfak would like to engage with ops - could someone contact him?
* '''Updates''':
** none
=== Security ===
* '''Blocking''':
** ???
* '''Blocked''':
** ???
* '''Updates''':
** Working through lots of security bugs
** PageViewInfo review in progress
=== Services ===
* '''Blocking''':
** ???
* '''Blocked''':
** ???
* '''Updates''':
** ???
=== Technical Operations ===
* '''Blocking''':
** ???
* '''Blocked''':
** ???
* '''Updates''':
** ???
== Product ==
=== Community Tech ===
* '''Blocking''':
** ???
* '''Blocked''':
** ???
* '''Updates''':
** ???
=== Discovery ===
* '''Blocking''':
** none afaik
* '''Blocked''':
** Would like Ops input for https://phabricator.wikimedia.org/T126730 (caching
model for WDQS)
** Would like Sec review on SVG sanitizer JS lib
** Would like Sec review on Schema validator php lib
* '''Updates''':
** Preparing to switch completion suggester into production (March)
** A number of new interesting graphs at http://discovery.wmflabs.org/ e.g.
http://discovery.wmflabs.org/metrics/#failure_langproj,
http://discovery.wmflabs.org/portal/#browser_breakdown
** Not much new, mostly bugfixes, tweaks and maintenance
==== Graphs ====
** Pageview API graphs getting popular
=== Editing ===
==== Collaboration ====
* '''Blocking''':
** External Store - In progress. Will soon enable External Store on Beta
Cluster as a prerequisite for this. If you want to look at or give feedback
on the Beta change, see https://phabricator.wikimedia.org/T95871
* '''Blocked''':
** Flow dumps on dumps.wikimedia.org:
https://phabricator.wikimedia.org/T119511
** Schema change to make a column NOT NULL in production:
https://phabricator.wikimedia.org/T122111#2050844
* '''Updates''':
** Enabled Echo cross-wiki notifications feature on initial wave of wikis.
Good feedback so far.
** Working on some issues with Flow board moves.
** Also, not a Collaboration team thing, but we've asked for feedback on
some Code of Conduct proposed changes:
https://www.mediawiki.org/wiki/Talk:Code_of_Conduct/Draft#Suggested_changes
==== Language ====
* '''Blocking''':
** None
* '''Blocked''':
** None
* '''Updates''':
**
==== Multimedia ====
* '''Blocking''':
** ???
* '''Blocked''':
** ???
* '''Updates''':
** ???
==== Parsing ====
* '''Blocking''':
** None?
* '''Blocked''':
** None
* '''Updates''':
** TemplateData-based serialization being deployed today (
https://phabricator.wikimedia.org/T111674 and
https://phabricator.wikimedia.org/T104599 )
** Kunal and Ori have been investigating
https://phabricator.wikimedia.org/T124356 ... Ori might have made some
headway there.
*** Filed https://phabricator.wikimedia.org/T127757 to fix getText()
semantics to prevent this kind of sneaky bug in the future.
** Heads up for release engineering:
https://phabricator.wikimedia.org/T111259 bit us once more recently.
==== VisualEditor ====
* '''Blocking''':
** None known.
* '''Blocked''':
** Waiting on Design Research availability for user testing of Single Edit
Tab integration
* '''Updates''':
** Single Edit Tab went to Hungarian Wikipedia yesterday; now waiting on
user feedback.
** Some improvements to OOUI; note the breaking change for wmf.15+ (no
known issues in gerrit master code).
** Last week we said we'd update on assessing the performance impact of
OOUI on all read pages; this is not firm yet, but appears to be a trivial
additional cost.
=== Fundraising Tech ===
* No blockers/blocking
* Investigating anomalies
* Improving CiviCRM reporting
* Testing backup processor improvements
* Further Latin America processor work
=== Reading ===
==== Android ====
* '''Updates''':
** Nothing to report.
==== Reading Infrastructure ====
* We've mostly been chasing performance issues that people found in code
paths that SessionManager also touched.
* In the not-too-distant future, load.php is going to start enforcing that
it must not depend on session or request data.
** See https://phabricator.wikimedia.org/T127233 and subtasks.
** Check your ResourceLoaderModule subclasses and your
'ResourceLoaderGetConfigVars' hook functions to make sure you're not using
$wgUser or $wgLang (or their equivalents via RequestContext). You'll
generally want to use the user and language from the ResourceLoaderContext
or use the 'MakeGlobalVariablesScript' hook instead. Remember that Message
objects will use $wgLang by default.
** Check your parser hooks to make sure you're using
$parserOptions->getUser() or $parser->getTargetLanguage() instead of
$wgUser or $wgLang (or their equivalents via RequestContext), respectively.
Otherwise you're liable to blow things up if your hook gets used in a
message somewhere.
** Timo says he'll send an announcement of some sort once details are
settled.
Hello,
As we know, wiki (mainly Wikipedia) articles go into a lot of detail about
their subject. They often tend to become verbose, and sometimes individual
sections become as long as articles.
The information about a topic is split across various pages which are
linked from the article. We have to open several such links to get a good
understanding of the article.
Navigation popups/Hovercards make this a bit simpler, but the info they
provide is often out of context. They are more of an introduction to the
linked article than an explanation of its connection to the current page,
which makes them feel disconnected and muddled. They help a reader gauge the
importance of a page, but not its relevance.
As part of a GSoC project, I was thinking of building a summarization tool
that could automatically create a comprehensive summary of an article. The
links, categories, infoboxes and other uniquely wiki features make this quite
different from, and more interesting than, plain text summarization. They
make it easier to gauge the context and relevance of articles, and the link
structure makes it possible to crawl to related pages (as Hovercards do).
Finally, by combining only the important and relevant information from all
sections, we can form a coherent and lucid summary for the reader. The intro
paragraphs just provide an introduction to the article, whereas this tool
would provide a gist of the entire article (and hence would be longer in most
cases).
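To make this concrete, below is a deliberately naive extractive sketch in
Python. The heuristic of ranking sentences by wikilink count is purely
illustrative, not a proposed design; a real tool would also use sections,
categories and infoboxes as described above:

    import re

    def naive_summary(wikitext, max_sentences=5):
        # Split into rough sentences; real wikitext needs proper parsing.
        sentences = re.split(r"(?<=[.!?])\s+", wikitext)
        # Score each sentence by the number of [[wikilinks]] it carries, on
        # the (illustrative) assumption that links signal importance.
        scored = sorted(
            enumerate(sentences),
            key=lambda pair: len(re.findall(r"\[\[[^\]]+\]\]", pair[1])),
            reverse=True,
        )
        # Keep the top sentences, restoring original order for readability.
        top = sorted(scored[:max_sentences])
        return " ".join(sentence for _, sentence in top)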
Though there has been some independent research
<http://lms.comp.nus.edu.sg/sites/default/files/publication.../acl09-yesr.pdf>
done on this, the possibility of such a tool has never been discussed at
length within Wikimedia.
So, I want to ask everyone's opinion of such a tool, in the above or some
other form. Also, does it seem like something that could be done as a GSoC
project (MVP)? Would any mentors be interested?
Hi all,
A request has come up (https://phabricator.wikimedia.org/T126832) to
re-create pt.wikimedia.org on the wikimedia cluster. Unfortunately it was
previously hosted there and so the 'ptwikimedia' database name is already
taken.
Since database renaming does not really appear to be an option, does anyone
have any objections to using 'pt2wikimedia' (or similar; suggestions
welcome) instead for the new wiki? I know this doesn't fit the existing
pattern, so I'm unsure about just going ahead without asking for input from
a wider audience.
Alex