Good morning data folks,
This morning we are migrating our webrequest dataset to feed from HAProxy instead of Varnish. The expected differences, the detailed analysis, and the migration plan are documented in this Google doc: https://docs.google.com/document/d/1cCSGzLUfVWUHjqG5v5VdLADsbzmMklczQ1YG7oghGl8/edit?tab=t.0#heading=h.501d0uw4oyze. We expect to be done in a few hours. In the meantime, queries on the webrequest data may fail, and new data will not flow until we are done. For those interested in following the operation, we'll post on Slack in this thread: https://wikimedia.slack.com/archives/C05RHK7PS6Q/p1743494545802669. We'll post here again at the end of the operation. Thank you for your understanding :)
Hi again folks,
The operation has successfully finished.
The `wmf.webrequest` hive table now contains data coming from HAProxy. We still have to rerun downstream jobs (mostly pageviews) for the first hours of today so that we have a clean cut in the data at April 1st. This will be done in the next few hours but shouldn't disrupt your work.
*An important thing to note* is that, for cluster space reasons, we only have data since mid-March in the `wmf.webrequest` table. It will build up to the 90-day retention over the coming months. If you need older data, you can use the `wmf_deprecated.webrequest` hive table, which contains the 'old' Varnish data we are keeping in case something goes wrong with HAProxy. We will keep this table for at least a month.
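For anyone scripting queries across the cutover, here is a minimal Python sketch of the table choice described above. The function name and the cutoff date are assumptions for illustration only: this thread only says "mid-March", so `2025-03-15` is a placeholder to replace with the actual first day available in `wmf.webrequest`.

```python
from datetime import date

# Assumption: the exact first day available in wmf.webrequest is not stated
# in this thread ("mid-March"); 2025-03-15 is a placeholder to adjust.
ASSUMED_HAPROXY_START = date(2025, 3, 15)


def webrequest_table_for(day: date, haproxy_start: date = ASSUMED_HAPROXY_START) -> str:
    """Return the Hive table expected to hold webrequest data for `day`.

    Hypothetical helper: days on or after `haproxy_start` map to the new
    HAProxy-fed table, earlier days map to the retained Varnish copy.
    """
    if day >= haproxy_start:
        return "wmf.webrequest"
    return "wmf_deprecated.webrequest"


if __name__ == "__main__":
    for d in (date(2025, 2, 1), date(2025, 4, 1)):
        print(d.isoformat(), "->", webrequest_table_for(d))
```

Keep in mind that `wmf_deprecated.webrequest` is only guaranteed to stay around for about a month, so a fallback like this is a temporary measure.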
Of course, if you spot something odd or have questions, please let us know :)
-- Joseph Allemandou (joal) (he / him) Staff Data Engineer Wikimedia Foundation
awesome job, thank you!
analytics-announce@lists.wikimedia.org