GCP has a number of models-as-a-service
<https://cloud.google.com/products/machine-learning/> that might be useful.
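For anyone who wants to work the backlog, the map.txt and results.tar.gz
formats Jordan describes below can be loaded with a short script along these
lines (a sketch; the delimiter handling is an assumption beyond the stated
"id : filename" format):

```python
import json
import os

def load_map(path="map.txt"):
    # map.txt: one "id : filename" pair per line, per Jordan's description.
    mapping = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if " : " in line:
                id_, filename = line.rstrip("\n").split(" : ", 1)
                mapping[id_.strip()] = filename.strip()
    return mapping

def load_result(id_, results_dir="results"):
    # The unpacked results.tar.gz holds one Vision API response per image,
    # named "${id}.jpg.json".
    path = os.path.join(results_dir, "%s.jpg.json" % id_)
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```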
On Mon, Apr 3, 2017 at 6:46 PM Daniel Mietchen <
daniel.mietchen(a)googlemail.com> wrote:
> Hi Jordan,
> can your pipeline help with video or perhaps even audio as well?
> There are lots of such files as well that need categorization.
> Thanks,
> Daniel
>
> On Tue, Apr 4, 2017 at 12:05 AM, Jordan Adler <jmadler(a)google.com> wrote:
> > Looks like some of these images still need categorization. I think
> there's
> > still an unrealized opportunity here to use the results I shared to work
> the
> > backlog of the category on the Commons.
> >
> > On Thu, Aug 11, 2016 at 1:47 PM Pine W <wiki.pine(a)gmail.com> wrote:
> >>
> >> Forwarding.
> >>
> >> Pine
> >>
> >> ---------- Forwarded message ----------
> >> From: "Jordan Adler" <jmadler(a)google.com>
> >> Date: Aug 11, 2016 13:06
> >> Subject: [Commons-l] Programmatically categorizing media in the Commons
> >> with Machine Learning
> >> To: "commons-l(a)wikimedia.org" <commons-l(a)lists.wikimedia.org>
> >> Cc: "Ray Sakai" <rsakai(a)reactive.co.jp>, "Ram Ramanathan"
> >> <ramramanathan(a)google.com>, "Kazunori Sato" <kazsato(a)google.com>
> >>
> >> Hey folks!
> >>
> >>
> >> A few months back a colleague of mine was looking for some unstructured
> >> images to analyze as part of a demo for the Google Cloud Vision API.
> >> Luckily, I knew just the place, and the resulting demo, built by
> Reactive
> >> Inc., is pretty awesome. It was shared on-stage by Jeff Dean during the
> >> keynote at GCP NEXT 2016.
> >>
> >>
> >> I wanted to quickly share the data from the programmatically identified
> >> images so it could be used to help categorize the media in the Commons.
> >> There's about 80,000 images worth of data:
> >>
> >>
> >> map.txt (5.9MB): a single text file mapping id to filename in an "id :
> >> filename" format, one per line
> >>
> >> results.tar.gz (29.6MB): a tgz'd directory of json files representing
> the
> >> output of the API, in the format "${id}.jpg.json"
> >>
> >>
> >> We're making this data available under the CC0 license, and these links
> >> will likely be live for at least a few weeks.
> >>
> >>
> >> If you're interested in working with the Cloud Vision API to tag other
> >> images in the Commons, talk to the WMF Community Tech team.
> >>
> >>
> >> Thanks for your help!
> >>
> >>
> >> _______________________________________________
> >> Commons-l mailing list
> >> Commons-l(a)lists.wikimedia.org
> >> https://lists.wikimedia.org/mailman/listinfo/commons-l
> >>
> >
>
Hello,
As of the release of service-runner v2.3.0 [1] earlier today, we no
longer support Node.js v0.1x platforms. The minimum Node version needed
to power your services is now v4.2.2, but we encourage the library's
users to develop and run their services on Node v6.x, the current Node LTS
release.
If this change affects your services negatively, please let us know here
on-list or by filing a task in Phabricator against the service-runner
tag [2].
Best,
Marko Obrovac, PhD
Senior Services Engineer
Wikimedia Foundation
[1] https://github.com/wikimedia/service-runner/releases/tag/v2.3.0
[2] https://phabricator.wikimedia.org/project/board/1062/
https://www.mediawiki.org/wiki/Scrum_of_scrums/2017-04-19
= 2017-04-19 =
contact: https://www.mediawiki.org/wiki/Wikimedia_Engineering
== Call outs ==
* Releng: if you have a scap3-deployed repo that has a patch
https://gerrit.wikimedia.org/r/#/q/topic:T162814+%28status:open%29 please
merge
* Analytics: Piwik is being upgraded tomorrow, April 20th; there may be a
30-minute downtime
* Analytics: Wikistats 2.0 prototype consultation going on at
https://www.mediawiki.org/wiki/Wikistats_2.0_Design_Project/RequestforFeedb…
== Product ==
=== Reading ===
==== iOS ====
* Last Week
** Continued work on 5.4.1 -
https://phabricator.wikimedia.org/project/view/2600/
*** Background feed loading & coalescing
*** Crash fixes & performance enhancements
** 5.5 - https://phabricator.wikimedia.org/project/view/2602/
*** Places
*** JavaScript consolidation with Android
*** Move footer content to WebView
* This Week
** Testing 5.4.1
** Continue work on 5.5 (Places, JS consolidation)
==== Android ====
* Beta release this week containing Wikidata title description editing
expanded to many more languages, as well as various offline UX improvements
* Further improving offline functionality and surrounding UX polish
* Continuing work on cross-platform consolidation of CSS & JS
* Beginning discussion of implementing offline ZIM collections (Q4 goal)
* Current release board:
https://phabricator.wikimedia.org/project/view/2352/
==== Reading Infrastructure ====
* TemplateStyles code review; familiarizing ourselves with OCG
* MCS: Finally updating Parsoid version requested by MCS to 1.3.0. Working
on refactoring mobile-sections to a new, intermediary, mobile HTML endpoint.
=== Web ===
Wrapping up page previews work
Beginning work on a print-specific stylesheet
=== Editing ===
==== Collaboration ====
* No deploys this week, but on Monday, planning to enable new RC Filters as
a Beta Feature on English Wikipedia (which does have ORES), plus all
non-ORES wikis (with the possible exception of German Wikipedia).
* Preview for when deployments restart:
** Working on transforming Wikidata user IDs so propagated edits show user
responsible
** Optimization so if we know a query will return 0 results, we won't do
the query at all. Some of these no-result queries have extremely poor
performance.
** Other bug fixes
==== Parsing ====
* Linter: Continuing to address bug reports and tweak the output. Linting
was disabled on large wikis last Friday because of performance issues (
https://phabricator.wikimedia.org/T148609 ). The problem is now fixed and
linting will be re-enabled next week. We decided to finish tweaking and
improving the output before a wider announcement.
==== Language ====
* ContentTranslation is disabled on all wikis due to high load on x1 during
the DC switchover. See: https://phabricator.wikimedia.org/T163344 Ops/DBA
are aware; the team will debug it further.
* Work on CX + OOjs continues.
==== UI Standardization ====
* This week:
** Continued work to provide WikimediaUI Base variables in core
https://phabricator.wikimedia.org/T123359
* Updates:
** OOjs UI:
*** Release of v0.21.1 with 11 UI/a11y improvements
https://phabricator.wikimedia.org/diffusion/GOJU/browse/master/History.md –
among those:
**** MediaWiki theme: Ensure WCAG level AA contrast on unsupported
SelectFileWidget
**** MediaWiki theme: Make readonly TextInputWidget appearance clearer
**** MediaWiki theme: TagMultiselectWidget outlined UI improvements
**** MenuOptionWidget: Remove theme-independent 'check' icon (Prateek
Saxena)
**** DropdownInput-/RadioSelectInputWidget: Remove unnecessary ARIA
attributes
=== Wikidata ===
* continue work on federation and structured wiktionary
* deploying geoshape data type on Wikidata next Monday
* also enabling Cognate extension (interwiki links) on Wiktionary next
Monday
== Technology ==
=== Security ===
* Reviews
** Extension:WikibaseMediaInfo
** TemplateStyles re-review
=== Services ===
* Blockers: none
* Updates:
** Services DC switchover yesterday
** RESTBase summary endpoint now allows 5 minutes client-side caching
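That five-minute window is what a client would see in the response's
Cache-Control header; a client that wants to honor it could use a tiny cache
like this sketch (class and function names are illustrative, not part of
RESTBase):

```python
import re
import time

def max_age(cache_control):
    # Parse the max-age directive (seconds) from a Cache-Control value,
    # e.g. "max-age=300" for the five-minute summary-endpoint window.
    m = re.search(r"max-age=(\d+)", cache_control or "")
    return int(m.group(1)) if m else 0

class SummaryCache:
    # Minimal client-side cache: reuse a response until its max-age expires.
    def __init__(self):
        self._store = {}

    def get(self, title, fetch, now=time.time):
        hit = self._store.get(title)
        if hit and now() < hit[0]:
            return hit[1]
        body, cache_control = fetch(title)  # fetch returns (json, header)
        self._store[title] = (now() + max_age(cache_control), body)
        return body
```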
=== Analytics ===
* Ongoing work on EventLogging analysis support in Hadoop
* Ongoing work on Wikistats 2.0 data back-end
* Piwik being upgraded tomorrow, will have a short (30-minute or so)
downtime
* Wikistats 2.0 consultation on the visual design prototype happening now:
https://www.mediawiki.org/wiki/Wikistats_2.0_Design_Project/RequestforFeedb…
(prototype at https://analytics-prototype.wmflabs.org )
* Dashiki configuration articles on Meta are all broken; we can't fix them
until the codfw-related deployment moratorium is over
=== RelEng ===
* Blockers: none
* Blocking: none?
* '''Updates'''
** If you have a scap3-deployed repo that has a patch in
https://gerrit.wikimedia.org/r/#/q/topic:T162814+%28status:open%29 please
merge
=== Discovery ===
* No blockers
* New blog post about search:
https://blog.wikimedia.org/2017/04/10/searching-wikipedia/
* Made plan to deploy archive search:
https://phabricator.wikimedia.org/T163235 comments welcome
* Portal updates: https://phabricator.wikimedia.org/T128546
* Building infrastructure for machine learning assisted ranking (aka
MjoLniR)
* Working on Wikidata search improvement
=== Fundraising Tech ===
* More Paypal Express Checkout fixes
* Coordinating with Comms to update the WMF logo in various places:
https://phabricator.wikimedia.org/T144254
* CentralNotice: Banner sequence feature is in code review
https://phabricator.wikimedia.org/T144453
* CiviCRM: getting rid of the rest of our local core hacks
=== Community Tech ===
No blockers
* Pushed out Special:AutoblockList, enhancements coming
* Getting community feedback on LoginNotify (
https://www.mediawiki.org/wiki/Extension:LoginNotify)
* Analyzing cookie blocking on English Wikipedia prior to broader roll-out
to all wikis
* Work continuing on CodeMirror (syntax highlighting) (
https://www.mediawiki.org/wiki/Extension:CodeMirror)
TemplateStyles could really use this, since people object to e.g. the
TemplateStyles CSS being able to mess with the diff tables. I posted an
analysis and some options at
https://phabricator.wikimedia.org/T37247#3181097. Feedback would be
appreciated, particularly from someone familiar with how exactly content
gets into VE and Flow, as to what (if anything) else might be needed to get
the new div output in those extensions.
Thanks.
--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation
Hi everybody!
As a reminder the CREDIT Showcase is next week on Wednesday,
1-February-2017 (see https://www.mediawiki.org/wiki/CREDIT_showcase for
details). Also, as I mentioned previously we're conducting a survey about
CREDIT. We'd appreciate your feedback! Here is a link to the survey (which
is hosted on a third-party service), and, for information about privacy and
data handling, the survey privacy statement.
https://docs.google.com/a/wikimedia.org/forms/d/e/1FAIpQLSedAtyPfcEhT6OVd26…
https://wikimediafoundation.org/wiki/CREDIT_Feedback_Survey_Privacy_Stateme…
This email is being sent to several mailing lists in order to reach
multiple audiences. As always, please follow the list link at the very
bottom of this email in case you want to manage your list subscription
options such as digest, unsubscribe, and so on.
And, as usual, if you'd like to share the news about the upcoming CREDIT,
here's some suggested verbiage.
*Hi <FNAME>*
*I hope all is well with you! I wanted to let you know about CREDIT, a
monthly demo series that we’re running to showcase open source tech
projects from Wikimedia’s Community, Reading, Editing, Discovery,
Infrastructure and Technology teams. *
*CREDIT is open to the public, and we welcome questions and discussion. The
next CREDIT will be held on February 1st at 11am PT / 2pm ET / 19:00 UTC. *
*There’s more info on MediaWiki
<https://www.mediawiki.org/wiki/CREDIT_showcase>, and on Etherpad
<https://etherpad.wikimedia.org/p/CREDIT>, which is where we take notes and
ask questions. You can also ask questions on IRC in the Freenode chatroom
#wikimedia-office (web-based access here
<https://webchat.freenode.net/?channels=%23wikimedia-office>). Links to
video will become available at these locations shortly before the event.*
*Please feel free to pass this information along to any interested folks.
Our projects tend to focus on areas that might be of interest to folks
working across the open source tech community: language detection,
numerical sort, large data visualizations, maps, and all sorts of other
things.*
*If you have any questions, please let me know! Thanks, and I hope to see
you at CREDIT.*
*YOURNAME*
Thanks!
Adam Baso
Director of Engineering, Reading
Wikimedia Foundation
abaso(a)wikimedia.org
This affects your wiki if you are using both Flow and Nuke.
We recently fixed https://phabricator.wikimedia.org/T162621, an issue
with Flow's Nuke integration.
This has now been merged to master as well as the two supported Flow
release branches (1.27 and 1.28):
master - https://gerrit.wikimedia.org/r/#/c/348407/ (merged)
1.27 - https://gerrit.wikimedia.org/r/#/c/348408/1
1.28 - https://gerrit.wikimedia.org/r/#/c/348409/1
This has already been deployed to WMF production.
There is an unrelated Jenkins issue affecting the 1.27 and 1.28 branches.
Until those patches are merged, you can download them using
Download->Checkout in the top-right of Gerrit. Sorry for the inconvenience.
Matt Flaschen
Hey folks,
In this update, I'm going to change some things up to try and make this
update easier for you to consume. The biggest change you'll notice is that
I've broken up the [#] references in each section. I hope that saves you
some scrolling and confusion. You'll also notice that I have changed the
subject line from "Revision scoring" to "Scoring Platform" because it's now
clear that, come July, I'll be leading a new team with that name at the
Wikimedia Foundation. There'll be an announcement about that coming once
our budget is finalized. I'll try to keep this subject consistent for the
foreseeable future so that your email clients will continue to group the
updates into one big thread.
*Deployments & maintenance:*
In this cycle, we've gotten better at tracking our deployments and noting
what changes go out with each deployment. You can click on the phab task
for a deployment and check its sub-tasks to find out what was deployed.
We've had three deployments of ORES since mid-March[1,2,3], two
deployments to Wikilabels[4,5], and we've added a maintenance notice for a
short period of downtime that's coming up on April 21st[6,7].
1. https://phabricator.wikimedia.org/T160279 -- Deploy ores in prod
(Mid-March)
2. https://phabricator.wikimedia.org/T160638 -- Deploy ORES late March
3. https://phabricator.wikimedia.org/T161748 -- Deploy ORES early April
4. https://phabricator.wikimedia.org/T161002 -- Late March Wikilabels
deployment
5. https://phabricator.wikimedia.org/T163016 -- Deploy Wikilabels mid-April
6. https://phabricator.wikimedia.org/T162888 -- Add header to Wikilabels
that warns of upcoming maintenance.
7. https://phabricator.wikimedia.org/T162265 -- Manage wikilabels for
labsdb1004 maintenance
*Making ORES better:*
We've been working to make ORES easier to extend and more useful. ORES now
reports its relevant versions at https://ores.wikimedia.org/versions [8].
We've also reduced the complexity of our "precaching" system that scores
edits before you ask for them[9,10]. We're taking advantage of logstash to
store and query our logs[11]. We've also implemented some nice
abstractions for requests and responses in ORES[12] that allowed us to
improve our metrics tracking substantially[13].
8. https://phabricator.wikimedia.org/T155814 -- Expose version of the
service and its dependencies
9. https://phabricator.wikimedia.org/T148714 -- Create generalized
"precache" endpoint for ORES
10. https://phabricator.wikimedia.org/T162627 -- Switch `/precache` to be a
POST end point
11. https://phabricator.wikimedia.org/T149010 -- Send ORES logs to logstash
12. https://phabricator.wikimedia.org/T159502 -- Exclude precaching
requests from cache_miss/cache_hit metrics
13. https://phabricator.wikimedia.org/T161526 -- Implement
ScoreRequest/ScoreResponse pattern in ORES
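The metrics change in [12] boils down to skipping precaching requests when
counting cache hits and misses, so automated pre-scoring doesn't skew the
ratio. A minimal sketch of that idea (names are illustrative, not ORES's
actual code):

```python
from collections import Counter

class CacheMetrics:
    # Count cache hits/misses for user-facing requests only; precache
    # traffic is excluded so it doesn't distort the hit/miss ratio.
    def __init__(self):
        self.counts = Counter()

    def record(self, hit, precache=False):
        if precache:
            return  # precaching intentionally warms the cache; don't count it
        self.counts["cache_hit" if hit else "cache_miss"] += 1
```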
*New functionality:*
In the last month and a half, we've added basic support for Korean
Wikipedia[14,15]. Props to Revi for helping us work through a bunch of
issues with our Korean language support[16,17,18].
We've also gotten the ORES Review tool deployed to Hebrew
Wikipedia[19,20,21,22] and Estonian Wikipedia[23,24,25]. We're also
working with the Collaboration team to implement the threshold test
statistics that they need to tune their new Edit Review interface[26] and
we're working towards making this kind of work self-serve, so that the
product team and other tool developers won't have to wait on us to
implement threshold stats in the future[27].
14. https://phabricator.wikimedia.org/T161617 -- Deploy reverted model for
kowiki
15. https://phabricator.wikimedia.org/T161616 -- Train/test reverted model
for kowiki
16. https://phabricator.wikimedia.org/T160752 -- Korean generated word
lists are in Chinese
17. https://phabricator.wikimedia.org/T160757 -- Add language support for
Korean
18. https://phabricator.wikimedia.org/T160755 -- Fix tokenization for Korean
19. https://phabricator.wikimedia.org/T161621 -- Deploy ORES Review Tool
for hewiki
20. https://phabricator.wikimedia.org/T130284 -- Deploy edit quality models
for hewiki
21. https://phabricator.wikimedia.org/T160930 -- Train damaging and
goodfaith models for hewiki
22. https://phabricator.wikimedia.org/T130263 -- Complete hewiki edit
quality campaign
23. https://phabricator.wikimedia.org/T159609 -- Deploy ORES review tool to
etwiki
24. https://phabricator.wikimedia.org/T130280 -- Deploy edit quality models
for etwiki
25. https://phabricator.wikimedia.org/T129702 -- Complete etwiki edit
quality campaign
26. https://phabricator.wikimedia.org/T162377 -- Implement additional
test_stats in editquality
27. https://phabricator.wikimedia.org/T162217 -- Implement "thresholds",
deprecate "pile of tests_stats"
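The "thresholds" work in [26,27] amounts to answering questions like "what
score cutoff gives at least 90% precision?" from a labeled test set. A
minimal sketch of that computation (illustrative only, not the editquality
implementation):

```python
def threshold_at_precision(scored, target=0.9):
    # scored: list of (score, is_damaging) pairs from a labeled test set.
    # Returns the lowest score cutoff whose precision meets the target,
    # or None if no cutoff qualifies.
    for cutoff in sorted({score for score, _ in scored}):
        flagged = [label for score, label in scored if score >= cutoff]
        if flagged and sum(flagged) / len(flagged) >= target:
            return cutoff
    return None
```

A product team can then pick the cutoff matching its tolerance for false
positives, instead of asking the Scoring Platform team to compute it.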
*ORES training / labeling campaigns:*
Thanks to a lot of networking at the Wikimedia Conference and some help from
Ijon (Asaf Bartov), we've found a bunch of new collaborators to help us
deploy ORES to new wikis. A critical step in this process is deploying
labeling campaigns so that Wikipedians can help us train ORES.
We've got new editquality labeling campaigns deployed to Albanian[28],
Finnish[29], Latvian[30], Korean[31], and Turkish[32] Wikipedias.
We've also been working on a new type of model: "Item quality" in
Wikidata. We've deployed, labeled, and analyzed a pilot[33], fixed some
critical bugs that came up[34,35], and we've finally launched a 5k item
campaign which is already 17% done[36]! See
https://www.wikidata.org/wiki/Wikidata:Item_quality_campaign if you'd like
to help us out.
28. https://phabricator.wikimedia.org/T161981 -- Edit quality campaign for
Albanian Wikipedia
29. https://phabricator.wikimedia.org/T161905 -- Edit quality campaign for
Finnish Wikipedia
30. https://phabricator.wikimedia.org/T162032 -- Edit quality campaign for
Latvian Wikipedia
31. https://phabricator.wikimedia.org/T161622 -- Deploy editquality
campaign in Korean Wikipedia
32. https://phabricator.wikimedia.org/T161977 -- Start v2 editquality
campaign for trwiki
33. https://phabricator.wikimedia.org/T159570 -- Deploy the pilot of
Wikidata item quality campaign
34. https://phabricator.wikimedia.org/T160256 -- Wikidata items render
badly in Wikilabels
35. https://phabricator.wikimedia.org/T162530 -- Implement "unwanted pages"
filtering strategy for Wikidata
36. https://phabricator.wikimedia.org/T157493 -- Deploy Wikidata item
quality campaign
*Bug fixing:*
As usual, a few weird bugs got in our way. We needed to move to a
bigger virtual machine in "Beta Labs" because our models take up a
bunch of hard drive space[37]. We found that Wikilabels wasn't removing
expired tasks correctly, which was making it difficult to finish
labeling campaigns[38]. We also hit a lot of right-to-left issues when we
upgraded OOjs UI[39]. And we fixed an old bug with
https://translatewiki.net in one of our message keys[40].
37. https://phabricator.wikimedia.org/T160762 -- deployment-ores-redis
/srv/ redis is too small (500MBytes)
38. https://phabricator.wikimedia.org/T161521 -- Wikilabels is not cleaning
up expired tasks for Wikidata item quality campaign
39. https://phabricator.wikimedia.org/T161533 -- Fix RTL issues in
Wikilabels after OOjs UI upgrade
40. https://phabricator.wikimedia.org/T132197 -- qqq for a wiki-ai message
cannot be loaded
-Aaron
Principal Research Scientist
Head of the Scoring Platform Team
Hiya!
tl;dr: if your repo has a patch from me[0], please merge it :)
The longer explanation for these patches is that the deployment server
from which your code is fetched by targets is set via the git_server
configuration variable. This variable will be updated in Puppet when the
primary deployment server changes; however, updating it in every repo
would be time consuming. Yesterday, I made a bunch of patches to remove
this configuration variable from any repo where it is set. By removing
this configuration variable from individual repos, all repos will
respect the global value for git_server that is set in Puppet meaning
that repo owners shouldn't have to worry about making updates when a
deployment server is changed.
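The precedence described above, a repo-level git_server overriding the
Puppet-managed global value, can be sketched like this (illustrative only,
not scap's actual code):

```python
def effective_git_server(global_cfg, repo_cfg):
    # A repo-level git_server wins if present; deleting it from the repo
    # config (what these patches do) makes the repo follow the global,
    # Puppet-managed value automatically.
    return repo_cfg.get("git_server", global_cfg["git_server"])
```

Once the per-repo key is gone, changing the primary deployment server only
requires updating the one global value in Puppet.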
If you have any questions let me know via email or IRC in
#wikimedia-releng.
Thank you for your help!
-- Tyler
[0]. <https://gerrit.wikimedia.org/r/#/q/topic:T162814+status:open>