Hey all,
Trying to upload a cohort of about 700 users just gives me a 504. What gives?
All of labs is undergoing maintenance right now.
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Hi Steven,
It could be related to the general Labs maintenance that is happening right now, but to be sure, can you email me your cohort so we can test it ourselves tomorrow? D
I'll try it again when Labs is not undergoing maintenance. It's not critical.
Hi Steven -- is this working now?
Cheers,
-Toby
I think we should set the right expectations about working with large cohorts.
Quoting Dan's response from September 17:
I just wanted to correct one small misunderstanding. Running large cohorts does *not* work in wikimetrics at this time for two reasons:
- You'll have a problem uploading them as Dario mentioned (because it validates each user individually against the database, as Dario guessed). The best solution for this is to create a temp table of all the users we are trying to upload and verify them in one query. This would be very fast and not too hard to implement.
- A large cohort will not fit in the "IN" clause of a SQL query. This is a known limitation and we have to fix it by creating a temporary table from the cohort. We can then join to the temp table for any metrics. The reason I've delayed this is because the same mechanism could be used to implement dynamic cohorts, boolean cohort combinations, and project level cohorts. We should prioritize these technically related features and then I can come up with a plan to do the minimally viable thing without shooting ourselves in the foot.
Hope that makes sense. As for the rest, I leave prioritization up to you guys except where it touches on technical issues, as above.
Dario
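[Editor's note: Dan's temp-table idea above could be sketched roughly as follows. This is a hypothetical illustration only: it uses SQLite in place of the MySQL replicas Wikimetrics actually queries, and the table and column names are stand-ins, not the real schema.]

```python
import sqlite3

# Toy stand-in for a replica database (assumption: the real system
# queries MySQL replicas holding the MediaWiki `user` table).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_name TEXT)")
db.executemany("INSERT INTO user VALUES (?, ?)",
               [(1, "Alice"), (2, "Bob"), (3, "Carol")])

def validate_cohort(db, names):
    """Validate every uploaded name with ONE query, via a temp table,
    instead of one round-trip per user."""
    db.execute("DROP TABLE IF EXISTS cohort_upload")
    db.execute("CREATE TEMP TABLE cohort_upload (user_name TEXT)")
    db.executemany("INSERT INTO cohort_upload VALUES (?)",
                   [(n,) for n in names])
    valid = {row[0] for row in db.execute(
        "SELECT u.user_name FROM cohort_upload c"
        " JOIN user u ON u.user_name = c.user_name")}
    return valid, set(names) - valid

valid, invalid = validate_cohort(db, ["Alice", "Bob", "Mallory"])
print(sorted(valid), sorted(invalid))  # ['Alice', 'Bob'] ['Mallory']
```

The same temp table could then be joined against for any metric query, which also sidesteps the "IN" clause limit in the second bullet.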
What is meant by large here?
On Fri, Oct 11, 2013 at 10:42 AM, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:
I think we should set the right expectations about working with large cohorts.
I would not call 700 editors a large cohort :) this should just work fine. We don't know yet whether this was related to labs maintenance or not and I asked Steven to share the cohort with us to verify that it works.
Quoting Dan's response from September 17:
I just wanted to correct one small misunderstanding. Running large cohorts does *not* work in wikimetrics at this time for two reasons:
- You'll have a problem uploading them as Dario mentioned (because it
validates each user individually against the database, as Dario guessed). The best solution for this is to create a temp table of all the users we are trying to upload and verify them in one query. This would be very fast and not too hard to implement.
Uploading a cohort of this size should just work, but it's a blocking operation, which is not very user-friendly; Mingle card 818 addresses this issue.
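[Editor's note: the non-blocking flow that card 818 presumably calls for could look something like this. A sketch under assumptions: plain Python threads stand in for whatever worker queue Wikimetrics actually uses, and every name here is made up for illustration.]

```python
from concurrent.futures import ThreadPoolExecutor
import time, uuid

executor = ThreadPoolExecutor(max_workers=2)
jobs = {}  # job_id -> Future

def validate_user(name):
    time.sleep(0.01)          # stand-in for one DB lookup
    return name != "Mallory"

def start_validation(names):
    """Kick off validation in the background and return immediately,
    so the HTTP upload request never sits behind the whole job."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(
        lambda: {n: validate_user(n) for n in names})
    return job_id  # the upload endpoint would return this right away

def poll(job_id):
    f = jobs[job_id]
    return f.result() if f.done() else None  # the UI polls until done

job = start_validation(["Alice", "Bob", "Mallory"])
while poll(job) is None:
    time.sleep(0.05)
print(poll(job))  # {'Alice': True, 'Bob': True, 'Mallory': False}
```

The point is only that the upload request returns a job id instantly; the slow per-user validation happens off the request thread, so no gateway timeout can fire.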
- A large cohort will not fit in the "IN" clause of a SQL query. This is
a known limitation and we have to fix it by creating a temporary table from the cohort. We can then join to the temp table for any metrics. The reason I've delayed this is because the same mechanism could be used to implement dynamic cohorts, boolean cohort combinations, and project level cohorts. We should prioritize these technically related features and then I can come up with a plan to do the minimally viable thing without shooting ourselves in the foot.
I did some calculations and it seems that this is only an issue with
cohorts larger than 200k editors.
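[Editor's note: Diederik's actual arithmetic isn't shown in the thread. One way to land near 200k is the following back-of-envelope, but note that both the packet limit and the bytes-per-id figure below are assumed, not taken from the thread.]

```python
# Assumption: the binding constraint is MySQL's max_allowed_packet,
# historically 1 MiB by default, and each user_id costs ~5 bytes
# of query text ("1234," = four digits plus a comma).
max_allowed_packet = 1 * 1024 * 1024   # 1 MiB
bytes_per_id = 5
overhead = 1024                        # the rest of the SELECT text

ids_that_fit = (max_allowed_packet - overhead) // bytes_per_id
print(ids_that_fit)  # 209510, i.e. roughly the 200k figure above
```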
Hope that makes sense. As for the rest, I leave prioritization up to you guys except where it touches on technical issues, as above.
Dario
I tried uploading a CSV with 500 user_ids and got the same 504 Gateway Time-out error as Steven.
Uploading a cohort of this size should just work, but it's a blocking operation, which is not very user-friendly; Mingle card 818 addresses this issue.
Thanks, bookmarked :)
I did some calculations and it seems that this is only an issue with cohorts larger than 200k editors.
We will need to discuss and benchmark performance for these jobs, this is one of the issues that I'd personally like to see prioritized over UX as it's something the entire Product team would benefit from.
Dario
On Fri, Oct 11, 2013 at 10:58 AM, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:
I tried uploading a CSV with 500 user_ids and got the same 504 Gateway Time-out error as Steven
It might have to do with the proxy server we are using for the SSL certificate. I believe Dan is looking into this right now. (Thanks, Yuvi, for pointing this out.)
Dear All,
It turns out it was a timeout setting issue with the new HTTPS gateway. It was set to 60 seconds, which is reasonable, but not if all that heavy validation is happening synchronously. So we upped it temporarily, but this just means we need to schedule Mingle #818 ASAP.
Thanks for your patience, all should be well now.
Dan
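[Editor's note: the rough arithmetic behind the timeout makes Dan's point concrete. Only the 60 s gateway timeout and the ~700-user cohort come from the thread; the per-user validation latency below is an assumed figure.]

```python
gateway_timeout_s = 60
cohort_size = 700
per_user_latency_ms = 100  # assumption: one DB round-trip per user

total_s = cohort_size * per_user_latency_ms / 1000
print(total_s, total_s > gateway_timeout_s)  # 70.0 True
```

Synchronous per-user validation of even a modest cohort overruns the old limit, which is why raising the timeout only buys time until #818 makes validation non-blocking.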
Thank you, guys.
Temporarily fixed with https://gerrit.wikimedia.org/r/#/c/89247/. A more permanent fix is to make it not block that long :)
Thanks Dan. It's working fine for me now.
I'm looking into this, guys. The largest cohort we have so far has over 2,500 editors in it, so 700 is definitely not too big. I'm not sure how large something would have to be before it broke the system, but I'm guessing tens of thousands, if not hundreds of thousands, like Dario says.
The 504 timeout should not be happening. Could one of you email me the cohort so I can test?
Dan