Google Code-in is an annual contest for 13-to-17-year-old students. It will take place from Nov 28 to Jan 17 and is not only about coding tasks.
While we wait to hear whether Wikimedia will be accepted:
* You have small, self-contained bugs you'd like to see fixed?
* Your documentation needs specific improvements?
* Your user interface has small design issues?
* Your Outreachy/Summer of Code project welcomes small tweaks?
* You'd enjoy helping someone port your template to Lua?
* Your gadget code uses some deprecated API calls?
* You have tasks in mind that welcome some research?
Also note that "Beginner tasks" (e.g. "Set up Vagrant") and "generic" tasks (e.g. "Choose & fix 2 PHP7 issues from the list in https://phabricator.wikimedia.org/T120336") are very welcome, because we will need hundreds of tasks. :)
And we also have more than 400 unassigned open 'easy' tasks listed: https://phabricator.wikimedia.org/maniphest/query/HCyOonSbFn.z/#R Would you be willing to mentor some of those in your area?
Please take a moment to find / update [Phabricator etc.] tasks in your project(s) which would take an experienced contributor 2-3 hours. Check
https://www.mediawiki.org/wiki/Google_Code-in/Mentors
and please ask if you have any questions!
For some achievements from last round, see https://blog.wikimedia.org/2017/02/03/google-code-in/
Thanks!
andre
On 10/16/2017 03:15 PM, Andre Klapper wrote:
Google Code-in is an annual contest for 13-17 year old students. It will take place from Nov28 to Jan17 and is not only about coding tasks.
[snip]
Would it be possible to add T144714, or something based on it, to the list?
https://phabricator.wikimedia.org/T144714
It's not that hard or complex and so should be quite good for that age group.
/Lars
Lars Noodén, 17/10/2017 16:16:
Would it be possible to add T144714, or something based on it, to the list?
No, because it would require the minor to sign an NDA.
Nemo
On 10/17/2017 04:45 PM, Federico Leva (Nemo) wrote:
Lars Noodén, 17/10/2017 16:16:
Would it be possible to add T144714, or something based on it, to the list?
No, because it would require the minor to sign an NDA.
Nemo
Ok. Is there a checklist of things to do that I may work on that task instead? I think the general formula can be reused for other books so it will be useful to figure out.
/Lars
Lars Noodén, 17/10/2017 17:13:
Ok. Is there a checklist of things to do that I may work on that task instead?
In theory https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations which was linked from https://phabricator.wikimedia.org/T144714#2618396 via https://meta.wikimedia.org/wiki/Research:FAQ#collaborations.
Nemo
On 10/17/2017 05:53 PM, Federico Leva (Nemo) wrote:
Lars Noodén, 17/10/2017 17:13:
Ok. Is there a checklist of things to do that I may work on that task instead?
In theory https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations which was linked from https://phabricator.wikimedia.org/T144714#2618396 via https://meta.wikimedia.org/wiki/Research:FAQ#collaborations.
Nemo
Thanks. The Formal Collaborations page [1] says that a research project page should be started in the Metawiki as a first step. That looks like it might be this page:
https://meta.wikimedia.org/wiki/Research
Is it fine to just create a page for the project proposal and add it there?
/Lars
[1] https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations
The research page ( https://meta.wikimedia.org/wiki/Research ) seems to be automatically generated.
How would I go about finding the next step towards establishing a project? I now have a preliminary draft of a proposal:
https://meta.wikimedia.org/wiki/Research:Finding_Search_Engine_Terms_Used_to...
/Lars
Hi Lars,
On Fri, Nov 3, 2017 at 4:46 AM, Lars Noodén lars.nooden@gmail.com wrote:
The research page ( https://meta.wikimedia.org/wiki/Research ) seems to be automatically generated.
How would I go about finding the next step towards establishing a project?
I assume by establishing a project you mean finding a way to get access to the data that your research proposal is going to use. If that is correct:
I now have a preliminary draft of a proposal:
https://meta.wikimedia.org/wiki/Research:Finding_Search_Engine_Terms_Used_to_Retrieve_Wikibooks
I will review this page and get back to you next week. To set expectations: all I can promise is that we will review the page and discuss if we can find a light-weight format to help you with it. I can't promise that we can actually make it happen as the resources are very tight on our end. We will do our best.
The ticket for tracking this task is https://phabricator.wikimedia.org/T179693 .
Best, Leila
-- Leila Zia Senior Research Scientist Wikimedia Foundation
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
On 11/03/2017 04:12 PM, Leila Zia wrote: [snip]
I assume by establishing a project you mean finding a way to get access to the data that your research proposal is going to use. If that is correct:
Yes.
I now have a preliminary draft of a proposal:
https://meta.wikimedia.org/wiki/Research:Finding_Search_Engine_Terms_Used_to_Retrieve_Wikibooks
I will review this page and get back to you next week. To set expectations: all I can promise is that we will review the page and discuss if we can find a light-weight format to help you with it. I can't promise that we can actually make it happen as the resources are very tight on our end. We will do our best.
Thanks. I appreciate it.
The ticket for tracking this task is https://phabricator.wikimedia.org/T179693 .
Excellent.
/Lars
By the way, the referer header would only have the search query if the user was using Google/Bing/etc. over HTTP, not HTTPS. For Google searchers using HTTPS, we'd only see that they came from "https://www.google.com/", due to Google's "origin" meta referrer setting (https://w3c.github.io/webappsec-referrer-policy/#referrer-policy-origin).
Since Google & Bing force you into HTTPS, we actually only end up with search queries from the few people who use very out-of-date browsers that support neither meta referrer nor HTTPS, since the latest versions of major browsers now do (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer#Browser_co...). So keep in mind that any retrieved data would be unrepresentative of the overall population, but it doesn't look like Lars is planning to do any statistical analysis.
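To make the point above concrete, here is a minimal Python sketch (an illustration only, not anything the Analytics team actually runs) of pulling the search query out of a Referer value. It assumes the search engine passes the query in a `q` parameter, as Google and Bing result URLs do; the function name and example URLs are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def search_query_from_referer(referer: str) -> Optional[str]:
    """Return the search query carried in a Referer URL, or None.

    Assumes the query sits in a `q` parameter (as on Google and Bing
    result pages); an origin-only referer has no query string at all.
    """
    params = parse_qs(urlsplit(referer).query)  # decodes %XX and '+'
    values = params.get("q")
    return values[0] if values else None


# Full URL, as sent by the few browsers that still send one:
print(search_query_from_referer("http://www.google.com/search?q=bash+scripting"))
# Origin-only referer under Google's policy: nothing to extract.
print(search_query_from_referer("https://www.google.com/"))
```

With an origin-only referer there is simply nothing to parse, which is why only the browsers still sending full URLs would contribute any search terms.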
Another thing I'd note is that a search term may still contain sensitive information even outside the context of the rest of the search query. A phone number or an email address might show up as a single search term, and that's still PII.
- Mikhail
On 11/07/2017 01:44 AM, Mikhail Popov wrote:
By the way, the referer header would only have the search query if the user was using Google/Bing/etc. over HTTP, not HTTPS. For Google searchers using HTTPS, we'd only see they came from "https://www.google.com/", due to Google's "origin" meta referer setting ( https://w3c.github.io/webappsec-referrer-policy/#referrer-policy-origin)
Since Google & Bing force you into HTTPS, we actually only end up with search queries from a few people who use very out of date browsers that don't support meta referer or HTTPS, since the latest versions of major browsers now do ( https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer#Browser_co...) So keep in mind that any retrieved data would be unrepresentative of overall population, but it doesn't look like Lars is planning to do any statistical analysis.
Correct. Although I would welcome help from someone with the skill and interest, I don't plan to do any statistical analysis myself.
About the Referer header: from what I have read, the header is not sent only if "an unsecured HTTP request is used and the referring page was received with a secure protocol (HTTPS)" [1]. That should be rare, since the search engines now redirect to HTTPS right away and people would be entering their search terms in a form submitted over HTTPS.
A spot check of a few browsers shows the Referer header in use for these, at least when using HTTPS:
Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0
Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) QupZilla/1.8.9 Safari/538.1
Mozilla/5.0 (X11; CrOS x86_64 9765.85.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.123 Safari/537.36
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36
Lynx/2.8.9dev.11 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/3.5.6
The Referrer Policy is in the draft stage [2] and might or might not affect source web sites in the future, but if it does, it looks like it is a very long way from being widely deployed: years, if ever. So it is unlikely to be a factor in Q1 or Q2 2018.
Another thing I'd note is that a search term may still contain sensitive information even outside the context of the rest of the search query. A phone number or an email address might show up as a single search term, and that's still PII.
It may be possible. Any suggestions on work-arounds other than manual intervention on the database results?
/Lars
[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer
[2] https://w3c.github.io/webappsec-referrer-policy/#referrer-policy-origin
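One possible answer to the workaround question, sketched in Python under loud assumptions: drop any single search term that looks like an email address or a phone number before results leave the database. The patterns and the seven-digit threshold below are hypothetical heuristics, not a complete definition of PII, and they will also catch harmless strings such as hyphenated ISBNs.

```python
import re
from typing import Iterable, List

# Hypothetical heuristics, not a complete definition of PII.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
# Optional leading '+', then only digits and common phone separators.
PHONE = re.compile(r"\+?[\d().\-]+")
MIN_PHONE_DIGITS = 7  # assumption: shorter digit runs (years, page numbers) are kept


def looks_like_pii(term: str) -> bool:
    """True if a single whitespace-free term resembles an email address or phone number."""
    t = term.strip()
    if EMAIL.fullmatch(t):
        return True
    digits = sum(ch.isdigit() for ch in t)
    return bool(PHONE.fullmatch(t)) and digits >= MIN_PHONE_DIGITS


def scrub_terms(terms: Iterable[str]) -> List[str]:
    """Keep only the terms that do not look like an email address or phone number."""
    return [t for t in terms if not looks_like_pii(t)]


print(scrub_terms(["bash", "someone@example.org", "555-123-4567", "2017"]))
```

Filtering this way errs toward discarding terms; it narrows, but does not replace, a human review before any data is released.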
The Referrer Policy is in the draft stage [2] and might or might not affect source web sites in the future, but if it does it looks like it is a very long way from becoming widely deployed, years if ever. So it is unlikely to be a factor in Q1 2018 or Q2 2018
[2] https://w3c.github.io/webappsec-referrer-policy/#referrer-policy-origin
The referrer policy is already in use at Google, which is why we don't see users' search queries in the referer field in our request logs; just that they came from Google.
- Mikhail
On 11/07/2017 09:49 PM, Mikhail Popov wrote:
The referrer policy is already in use at Google, which is why we don't see users' search queries in referer field in our request logs; just that they came from Google.
Thanks. I'm looking at the current version: https://www.w3.org/TR/referrer-policy/
Are there any published articles, statistics, or reports about how widely referrer policy has already been deployed?
/Lars
I would say that the referrer policy "origin-when-cross-origin" (send a full URL when performing a same-origin request, but only send the origin of the document in other cases) is probably the most widely deployed default on the internet. We use it, as do Google, Facebook, and others.
For Wikipedia, see: https://phabricator.wikimedia.org/T87276
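The policies discussed in this thread can be compared with a small Python model of how a browser chooses the Referer value when following a link. This is a simplified sketch of three policies from the W3C Referrer Policy spec (no-referrer-when-downgrade, origin, origin-when-cross-origin); it ignores details such as URL length limits and non-HTTP(S) schemes, and the function names and example page are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin_key(url: str) -> Tuple[str, Optional[str], Optional[int]]:
    """(scheme, host, port) with default ports filled in, for same-origin checks."""
    p = urlsplit(url)
    return p.scheme, p.hostname, p.port or DEFAULT_PORTS.get(p.scheme)


def _stripped(url: str, origin_only: bool) -> str:
    """Drop userinfo and fragment; optionally also drop path and query."""
    p = urlsplit(url)
    netloc = p.hostname or ""
    if p.port and p.port != DEFAULT_PORTS.get(p.scheme):
        netloc += f":{p.port}"
    if origin_only:
        return urlunsplit((p.scheme, netloc, "/", "", ""))
    return urlunsplit((p.scheme, netloc, p.path or "/", p.query, ""))


def referer_for(policy: str, page_url: str, target_url: str) -> Optional[str]:
    """Referer a browser would send when following a link from page_url to target_url."""
    downgrade = urlsplit(page_url).scheme == "https" and urlsplit(target_url).scheme == "http"
    same_origin = _origin_key(page_url) == _origin_key(target_url)
    if policy == "no-referrer-when-downgrade":  # the usual browser default at the time
        return None if downgrade else _stripped(page_url, origin_only=False)
    if policy == "origin":
        return _stripped(page_url, origin_only=True)
    if policy == "origin-when-cross-origin":
        return _stripped(page_url, origin_only=not same_origin)
    raise ValueError(f"policy not modelled in this sketch: {policy}")


results = "https://www.google.com/search?q=bash+scripting"
book = "https://en.wikibooks.org/wiki/Example_Book"  # hypothetical page
for policy in ("no-referrer-when-downgrade", "origin", "origin-when-cross-origin"):
    print(policy, "->", referer_for(policy, results, book))
```

Under either "origin" policy, a click from a results page to a book page yields only "https://www.google.com/", so the query never reaches the server logs.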
An update on this request:
Lars and I went off-list for a bit (Nuria and Mikhail are cc-ed in those conversations). Research doesn't have capacity to pick this task up at the moment, but if other people with appropriate access have bandwidth to pick it up and respond to it, they should feel free to. A few things for those who may be able to help:
* Lars confirmed that even the skewed data from very specific browsers may help them gain some insight, and it can be better than not knowing anything extra at all (the current case).
* If you decide to work on a query and release the data, please ping Research and Legal before releasing it unless the data is highly aggregated. This is especially important in this case, where only a few not-very-widely-used browsers are sending this information to our servers.
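As an illustration of what "highly aggregated" could mean for this request, here is a hypothetical Python sketch: count search terms only for pages under one book and drop any term seen fewer than `min_count` times. The record layout `(uri_path, search_query)`, the book path, and the threshold are all assumptions; passing such a threshold does not replace the Research and Legal review described above.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def aggregate_book_terms(
    records: Iterable[Tuple[str, Optional[str]]],
    book_path: str,
    min_count: int = 10,
) -> Dict[str, int]:
    """Count search terms for pages of one book, hiding rare terms.

    records: hypothetical (uri_path, search_query) pairs; search_query is
    None when the referer carried no query.
    """
    counts: Counter = Counter()
    for path, query in records:
        # Match the book's main page and its subpages, not similarly named books.
        in_book = path == book_path or path.startswith(book_path + "/")
        if in_book and query:
            counts.update(query.lower().split())
    # Suppress terms seen fewer than min_count times (assumed threshold).
    return {term: n for term, n in counts.items() if n >= min_count}


records = [
    ("/wiki/Example_Book", "example book"),
    ("/wiki/Example_Book/Intro", "example book tutorial"),
    ("/wiki/Example_Book/Ch1", "example someone@example.org"),
    ("/wiki/Other_Book", "example"),
    ("/wiki/Example_Book", None),
]
print(aggregate_book_terms(records, "/wiki/Example_Book", min_count=2))
```

A rare term such as the email address above is suppressed simply because it occurs only once; the threshold is what does the privacy work here, so choosing it is a policy question rather than a coding one.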
Lars: I'm sorry that Research was not able to be of help. With the best of our intentions, we have to say no to so many requests. We need to be aware of our already long backlogs, but also aware of other teams' backlogs that will be affected by our decision. In this case, depending on which path we go with, Research commitment can mean Security, Legal, Analytics, and Tech Ops commitment and work.
Thank you for your understanding, and I'm here to help if someone else picks up this task and they need Research input.
Best, Leila
-- Leila Zia Senior Research Scientist Wikimedia Foundation
Lars:
This is not so much a request for a research project as an ad-hoc request for data. It is similar to the one you mentioned in a prior thread: https://www.mail-archive.com/analytics@lists.wikimedia.org/msg03760.html, for which you filed this Phabricator ticket: https://phabricator.wikimedia.org/T144714
As Leila mentioned, our resources for attending to such requests are very limited. More importantly, in this case we do not have most of the data you are interested in, for the reasons explained earlier. To be honest, I am not sure we have any data at all that could help you.
In case anyone wonders, our FAQ for ad-hoc data requests is here: https://meta.wikimedia.org/wiki/Research:FAQ#Where_do_I_find_data_or_statist...
Thanks,
Nuria
On 11/18/2017 12:40 AM, Nuria Ruiz wrote:
In case anyone wonders our FAQ for ad-hoc data requests is here: https://meta.wikimedia.org/wiki/Research:FAQ#Where_do_I_find_data_or_statist...
Thanks. That link suggests that the "Discovery" team would be the right one to contact about referred traffic. Maybe that subsection [1] could benefit from a few more lines mentioning referred traffic directly, since it appears in the main description in the FAQ and is also what I am looking for:
https://phabricator.wikimedia.org/T144714
As mentioned, I expect that from an author perspective, it is similar for other books. Also, getting some data as clues from search terms would be better than the current situation of no data.
Given the changes going on in the world with the phasing out of the "referer" and phasing in of the "origin" HTTP headers, there is less and less of that particular data going forward. However, even historical data is still somewhat useful in my case, and the same goes for even partial coverage with current header data. Because of the restricted nature of the data being searched, there seem to be only a few people able to do task T144714, even though the query formula seems, from the outside at least, easily reusable for other books. I don't want to bother people with repeated requests, but at the same time I would not want the task neglected indefinitely.
What would be my best course of action to get someone new or old to work through T144714 as a smaller task rather than a larger project?
/Lars
[1] https://www.mediawiki.org/wiki/Wikimedia_Product#Discovery