Looks like the job now needs more executor memory to run reliably. This is due to the coalesce introduced in https://phabricator.wikimedia.org/T323614, which produces bigger Parquet files that need more memory for buffering before they are written.

Re-running with executor memory bumped from 9G to 12G.
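For reference, here is a minimal sketch of the equivalent plain `spark-submit` invocation with an explicit `--executor-memory`. It also shows the YARN container size each executor would request, assuming Spark's default overhead settings (`spark.executor.memoryOverhead` unset, overhead factor 0.10, 384 MiB floor). The helper names and `commons_index.py` are hypothetical. The real job is launched through Airflow's SparkSkeinSubmitHook, whose parameters are not shown here.

```python
from typing import Dict, List, Optional

# Spark's default executor overhead: 10% of executor memory, at least 384 MiB
# (applies only when spark.executor.memoryOverhead is not set explicitly).
OVERHEAD_FACTOR = 0.10
OVERHEAD_MIN_MIB = 384


def build_spark_submit_args(
    app_path: str,
    executor_memory: str = "12G",
    extra_conf: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Hypothetical helper: build a spark-submit argv with explicit executor memory."""
    args = ["spark-submit", "--master", "yarn", "--executor-memory", executor_memory]
    # Emit any extra --conf key=value pairs in a stable (sorted) order.
    for key, value in sorted((extra_conf or {}).items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args


def default_container_mib(executor_mib: int) -> int:
    """Hypothetical helper: executor heap plus default overhead, in MiB."""
    overhead = max(OVERHEAD_MIN_MIB, int(executor_mib * OVERHEAD_FACTOR))
    return executor_mib + overhead


if __name__ == "__main__":
    print(" ".join(build_spark_submit_args("commons_index.py")))
    for gib in (9, 12):
        print(f"{gib}G executor -> {default_container_mib(gib * 1024)} MiB container")
```

Under those assumptions, the 9G to 12G bump raises each executor's container request from 10137 MiB to 13516 MiB, so the cluster needs correspondingly more headroom per executor.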

On Thu, Jan 5, 2023 at 5:29 AM Marco Fossati <mfossati@wikimedia.org> wrote:
Hi Xabriel,

Forwarding this image suggestions alert; not sure whether you receive them.
Can you please look into it?

Thanks!
Marco

---------- Forwarded message ---------
From: <airflow-platform_eng@an-airflow1004.eqiad.wmnet>
Date: Thu, Jan 5, 2023 at 5:47 AM
Subject: [Sd-alerts] Airflow alert: <TaskInstance: image-suggestions.commons_index 2022-12-26T00:00:00+00:00 [failed]>
To: <sd-alerts@lists.wikimedia.org>


Try 6 out of 6
Exception:
SkeinHook Airflow SparkSkeinSubmitHook skein launcher image-suggestions__commons_index__20221226 application_1663082229270_669385
Host: an-airflow1004.eqiad.wmnet
Log file: /srv/airflow-platform_eng/logs/image-suggestions/commons_index/2022-12-26T00:00:00+00:00.log


--
-xabriel