Hi!
Well, no, HDFS is a means to an end: storing data in a form that
can be cleaned with ETL processes so that /then/ it can go
somewhere/something - which covers a lot of use cases, but most
prominently our dashboards and ad-hoc research tasks.
Thanks for explaining more! I think I understand your concern better now.
With the renewed attention to WDQS productization, the point may be moot
soon, but in case it isn't, I just wanted to explore the possibility of
using the same infrastructure with different inputs - or maybe the
possibility of building a bridge between HDFS and whatever we have in
labs. I'm not saying this necessarily makes sense, but if it doesn't,
I'd like to know why.
reinvent the wheel every time we build a thing. If we
can't do HDFS
and going to production isn't going to work, then let's talk about
what the alternatives are. Until then the use case is "the data being
in HDFS so that analysts can consume it", and higher-level use cases
are overthinking it.
OK. Then if we go to production soon (hopefully), I assume we have an
existing workflow allowing us to get stuff into HDFS. If not, we _may_
(again, if that doesn't make sense, fine, but I'd like to hear the
reasons) explore some process that would allow us to get data from
whatever we have now (which can be rather flexible) into HDFS.
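For what it's worth, one candidate bridge is HDFS's standard WebHDFS REST
API, which lets any host with HTTP access push files in without Hadoop
client libraries. A minimal sketch of building the CREATE request URL
(the namenode host, port, target path, and user here are all hypothetical
placeholders, not real WMF infrastructure names):

```python
# Hypothetical sketch: writing a file into HDFS via the WebHDFS REST API.
# The actual upload is a two-step dance: an HTTP PUT to this CREATE URL
# returns a 307 redirect to a datanode, and a second PUT to that redirect
# location carries the file body.

def webhdfs_create_url(host: str, port: int, hdfs_path: str, user: str) -> str:
    """Build the WebHDFS CREATE URL for writing a file at hdfs_path."""
    return (
        f"http://{host}:{port}/webhdfs/v1{hdfs_path}"
        f"?op=CREATE&user.name={user}&overwrite=true"
    )

# Placeholder names for illustration only.
url = webhdfs_create_url(
    "namenode.example.org", 9870, "/wmf/data/raw/wdqs/dump.nt", "analytics"
)
print(url)
```

Whether this makes sense versus a proper Hadoop-client or Camus/Gobblin-style
pipeline is exactly the kind of question I'd want input on.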
--
Stas Malyshev
smalyshev(a)wikimedia.org