Hi!
Well, no, HDFS is a means to an end: storing data in a form that
can be cleaned with ETL processes so that /then/ it can go
somewhere/something - which covers a lot of use cases, but most
prominently our dashboards and ad-hoc research tasks.
Thanks for explaining more! I think I understand your concern better now.
With the renewed attention to WDQS productization, the point may be moot
soon, but in case it isn't, I just wanted to explore the possibility of
using the same infrastructure with different inputs - or maybe the
possibility of building a bridge between HDFS and whatever we have in
labs. I'm not saying this necessarily makes sense, but if it doesn't,
I'd like to know why.
reinvent the wheel every time we build a thing. If we
can't do HDFS
and going to production isn't going to work, then let's talk about
what the alternatives are. Until then the use case is "the data being
in HDFS so that analysts can consume it", and higher-level use cases
are overthinking it.
OK. Then if we go to production soon (hopefully), I assume we have an
existing workflow allowing us to get stuff into HDFS. If not, we _may_
(again, if that doesn't make sense, fine, but I'd like to hear the
reasons) explore some process that would allow us to get data from
whatever we have now (which can be rather flexible) into HDFS.
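For what it's worth, one candidate bridge is HDFS's standard WebHDFS REST
API, which lets any host with HTTP access push files in without Hadoop
client libraries. A minimal sketch of building the CREATE request URL
(the namenode host, port, target path, and user here are all hypothetical
placeholders, not real WMF infrastructure names):

```python
# Hypothetical sketch: writing a file into HDFS via the WebHDFS REST API.
# The actual upload is a two-step dance: an HTTP PUT to this CREATE URL
# returns a 307 redirect to a datanode, and a second PUT to that redirect
# location carries the file body.

def webhdfs_create_url(host: str, port: int, hdfs_path: str, user: str) -> str:
    """Build the WebHDFS CREATE URL for writing a file at hdfs_path."""
    return (
        f"http://{host}:{port}/webhdfs/v1{hdfs_path}"
        f"?op=CREATE&user.name={user}&overwrite=true"
    )

# Placeholder names for illustration only.
url = webhdfs_create_url(
    "namenode.example.org", 9870, "/wmf/data/raw/wdqs/dump.nt", "analytics"
)
print(url)
```

Whether this makes sense versus a proper Hadoop-client or Camus/Gobblin-style
pipeline is exactly the kind of question I'd want input on.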
--
Stas Malyshev
smalyshev(a)wikimedia.org