Looks like the job now needs more memory on the executors to run reliably.
This is due to the coalesce introduced in
https://phabricator.wikimedia.org/T323614, which creates bigger Parquet
files that require more memory buffering before they are written.
Re-running with executor memory bumped from 9G to 12G.
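For reference, here is a minimal sketch of the bump. It assumes the job's memory is controlled through the standard Spark property `spark.executor.memory`; the actual Airflow/Skein submission code may set it differently. `spark_submit_conf_args` is a hypothetical helper that renders Spark properties as `spark-submit --conf` arguments:

```python
from __future__ import annotations

# Assumed value after the bump described above (previously "9g").
NEW_EXECUTOR_MEMORY = "12g"


def spark_submit_conf_args(conf: dict[str, str]) -> list[str]:
    """Render Spark properties as a flat list of spark-submit --conf arguments."""
    args: list[str] = []
    # Sort keys so the generated command line is deterministic.
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Only the executor memory changes; the rest of the job config is untouched.
    conf = {"spark.executor.memory": NEW_EXECUTOR_MEMORY}
    print(" ".join(spark_submit_conf_args(conf)))
```

When the session is built directly in PySpark, the same property can be passed via `SparkSession.builder.config("spark.executor.memory", "12g")`.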
On Thu, Jan 5, 2023 at 5:29 AM Marco Fossati <mfossati(a)wikimedia.org> wrote:
> Hola Xabriel,
>
> Forwarding this image suggestions alert, not sure if you get them.
> Can you please look into it?
>
> Thanks!
> Marco
>
> ---------- Forwarded message ---------
> From: <airflow-platform_eng(a)an-airflow1004.eqiad.wmnet>
> Date: Thu, Jan 5, 2023 at 5:47 AM
> Subject: [Sd-alerts] Airflow alert: <TaskInstance:
> image-suggestions.commons_index 2022-12-26T00:00:00+00:00 [failed]>
> To: <sd-alerts(a)lists.wikimedia.org>
>
>
> Try 6 out of 6
> Exception:
> SkeinHook Airflow SparkSkeinSubmitHook skein launcher
> image-suggestions__commons_index__20221226 application_1663082229270_669385
> Log:
> <http://localhost:8080/log?execution_date=2022-12-26T00%3A00%3A00%2B00%3A00&…>
> Host: an-airflow1004.eqiad.wmnet
> Log file:
> /srv/airflow-platform_eng/logs/image-suggestions/commons_index/2022-12-26T00:00:00+00:00.log
> Mark success:
> <http://localhost:8080/success?task_id=commons_index&dag_id=image-suggestion…>
>
--
-xabriel