Re: [AI] Another round of name that thing

5 Apr 2017

On Wed, Apr 5, 2017 at 12:55 PM, Aaron Halfaker <aaron.halfaker@gmail.com>
wrote:
...
Link to code?
No code yet, although there is proof-of-concept code that will inform
this work at
stat1002.eqiad.wmnet:/a/ebernhardson/spark_feature_log/code
...
"ltr" means "left to right" to me.  Maybe you could do something like
"ltrank"
Sounds like LTR is out as the term is already used elsewhere and is more
widely known. LTRank isn't a bad compromise with spelling out the whole
thing.
...
On Wed, Apr 5, 2017 at 2:28 PM, Erik Bernhardson <
ebernhardson@wikimedia.org> wrote:
...
We seem to have some consensus that for the upcoming learning to rank
work we will build out a python library to handle the bulk of the backend
data plumbing work. The library will primarily be code integrating with
pyspark to do various pieces such as:
# Sampling from the click logs to generate the set of queries + pages
that will be labeled with click models
# Distributing the work of running click models against those sampled
data sets
# Pushing queries we use for feature generation into kafka, and reading
back the resulting feature vectors (the other end of this will run those
generated queries against either the hot-spare elasticsearch cluster or the
relforge cluster to get feature scores)
# Merging feature vectors with labeled data, splitting into
test/train/validate sets, and writing out files formatted for whichever
training library we decide on (xgboost, lightgbm and ranklib are in the
running currently)
# Whatever plumbing is necessary to run the actual model training and do
hyperparameter optimization
# Converting the resulting models into a format suitable for use with the
elasticsearch learn to rank plugin
# Reporting on the quality of models vs some baseline
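To make the merge/split/write step in the list above concrete, here is a minimal sketch in plain Python (no pyspark), not the actual library code. The function names, the 80/10/10 split and the in-memory dict inputs are all assumptions for illustration. The output lines use the "label qid:N 1:v 2:v" text format that ranklib reads; xgboost and lightgbm take query grouping differently, so this covers only one of the candidates.

```python
import hashlib


def assign_split(query, train=0.8, test=0.1):
    """Deterministically map a query string to train/test/validate.

    Hashing the query (rather than each row) keeps every page for a
    given query in the same split. The ratios are assumed defaults.
    """
    digest = hashlib.md5(query.encode("utf-8")).hexdigest()
    bucket = (int(digest, 16) % 1000) / 1000.0  # stable value in [0, 1)
    if bucket < train:
        return "train"
    if bucket < train + test:
        return "test"
    return "validate"


def to_ranklib_line(label, qid, features):
    """Format one labeled feature vector as 'label qid:N 1:v 2:v ...'."""
    feats = " ".join("%d:%g" % (i, v) for i, v in enumerate(features, start=1))
    return "%d qid:%d %s" % (label, qid, feats)


def merge_and_split(labels, vectors):
    """Join labels with feature vectors and bucket the rows by split.

    labels:  {(query, page_id): int relevance label}   (hypothetical shape)
    vectors: {(query, page_id): [float, ...]}          (hypothetical shape)
    Rows without a feature vector are dropped.
    """
    qids = {}
    out = {"train": [], "test": [], "validate": []}
    # Sorting keeps all rows of one query contiguous in the output.
    for key in sorted(labels):
        if key not in vectors:
            continue
        query, _page_id = key
        qid = qids.setdefault(query, len(qids) + 1)
        line = to_ranklib_line(labels[key], qid, vectors[key])
        out[assign_split(query)].append(line)
    return out
```

In the real pipeline the join and split would presumably happen on pyspark DataFrames; the point here is only that splitting on the query, not on individual rows, avoids leaking a query across train and test.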
The high level goal is that we would have relatively simple python
scripts in our analytics repository that are called from oozie. Those
scripts would know the appropriate locations to load/store data and would
hand off to this library for the bulk of the processing. There will also
be some script, probably within the library, that combines many of these
steps for feature engineering purposes, to take some set of features and
run the whole thing.
So, what do we call this thing? Horrible first attempts:

ltr-pipeline
learn-to-rank-pipeline
bob
cirrussearch-ltr
???

AI mailing list
AI@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/ai

