Re: [Analytics] [Research-Internal] Article about ML in production woes

7 Feb 2019

Hey Andrew!

Thank you so much for sharing this and starting this conversation. We had a
meeting at All Hands with all people interested in "Image Classification"
https://phabricator.wikimedia.org/T215413 , and one of the open questions
was exactly how to find a "common repository" for ML models that different
groups and products within the organization can use. So, please, count me
in!

Thanks,

M

On Thu, Feb 7, 2019 at 4:38 PM Aaron Halfaker <ahalfaker(a)wikimedia.org>
wrote:

...
 Just gave the article a quick read.  I think this article pushes on some
 key issues for sure.  I definitely agree with the focus on python/jupyter
 as essential for a productive workflow that leverages the best from
 research scientists.  We've been thinking about what ORES 2.0 would look
 like and event streams are the dominant proposal for improving on the
 limitations of our queue-based worker pool.

 One of the nice things about ORES/revscoring is that it provides a nice
 framework for operating using the *exact same code* no matter the
 environment.  E.g. it doesn't matter if we're calling out to an API to get
 data for feature extraction or providing it via a stream.  By investing in
 a dependency injection strategy, we get that flexibility.  So to me, the
 hardest problem -- the one I don't quite know how to solve -- is how we'll
 mix and merge streams to get all of the data we want available for feature
 extraction.  If I understand correctly, that's where Kafka shines.  :)
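
 A minimal sketch of the dependency injection idea described above, in
 Python. This is not actual ORES/revscoring code: `extract_features`,
 `api_source`, and `stream_source` are hypothetical names, and the "API" and
 "stream" are simple in-memory stand-ins. It only illustrates how the same
 extraction code could run unchanged whichever way the data is supplied.

```python
from typing import Callable, Dict, Iterable

# Hypothetical names throughout; this is not the revscoring API.
# A "text source" is any callable that returns the text of a revision.
TextSource = Callable[[int], str]


def extract_features(rev_id: int, get_text: TextSource) -> Dict[str, float]:
    """Feature extraction depends only on the injected `get_text` callable."""
    text = get_text(rev_id)
    words = text.split()
    return {
        "chars": float(len(text)),
        "words": float(len(words)),
    }


def api_source(fake_api: Dict[int, str]) -> TextSource:
    """Stand-in for calling out to an API (here just a dict lookup)."""
    return lambda rev_id: fake_api[rev_id]


def stream_source(events: Iterable[dict]) -> TextSource:
    """Stand-in for data arriving on an event stream, indexed by revision id."""
    cache = {event["rev_id"]: event["text"] for event in events}
    return lambda rev_id: cache[rev_id]


# The same extraction code runs against either source.
from_api = extract_features(1, api_source({1: "hello wiki world"}))
from_stream = extract_features(
    1, stream_source([{"rev_id": 1, "text": "hello wiki world"}])
)
```

 The extraction function never learns where the text came from, so swapping
 the API for a stream only means injecting a different source.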

 I'm definitely interested in fleshing out this proposal.  We should
 probably be exploring the processes for training new types of models (e.g.
 image processing) using different strategies than ORES.  In ORES, we're
 almost entirely focused on using sklearn but we have some basic
 abstractions for other estimator libraries.  We also make some strong
 assumptions about running on a single CPU that could probably be broken for
 some performance gains using real concurrency.

 -Aaron

 On Thu, Feb 7, 2019 at 10:05 AM Goran Milovanovic
 <goran.milovanovic_ext(a)wikimedia.de> wrote:

  Hi Andrew,

 I have recently started a six-month AI/Machine Learning Engineering
 course which focuses exactly on the topics that you've shown interest in.

 So,

 >> I'd love it if we had a working group (or whatever) that focused on how
 >> to standardize how we train and deploy ML for production use.

 Count me in.

 Regards,
 Goran

 Goran S. Milovanović, PhD
 Data Scientist, Software Department
 Wikimedia Deutschland

 ------------------------------------------------
 "It's not the size of the dog in the fight,
 it's the size of the fight in the dog."
 - Mark Twain
 ------------------------------------------------

 On Thu, Feb 7, 2019 at 4:16 PM Andrew Otto <otto(a)wikimedia.org> wrote:

  Just came across

 https://www.confluent.io/blog/machine-learning-with-python-jupyter-ksql-ten…

 In it, the author discusses some of what he calls the 'impedance
 mismatch' between data engineers and production engineers.  The links to
 Uber's Michelangelo <https://eng.uber.com/michelangelo/> (which as far
 as I can tell has not been open sourced) and the Hidden Technical Debt
 in Machine Learning Systems paper
 <https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf>
 are also very interesting!

 At All Hands I've been hearing more and more about using ML in
 production, so these things seem very relevant to us.  I'd love it if we
 had a working group (or whatever) that focused on how to standardize how we
 train and deploy ML for production use.

 :)
 _______________________________________________
 Analytics mailing list
 Analytics(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics

 --

 Aaron Halfaker

 Principal Research Scientist

 Head of the Scoring Platform team
 Wikimedia Foundation
 _______________________________________________
 Research-Internal mailing list
 Research-Internal(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/research-internal
