Hi,
I[0] am an Outreachy Round 15 intern, and I am working on the AICaptcha[1]
project. This project aims to create a better captcha system (like
the Google invisible captcha) which can prevent/reduce the incidence of
bots creating user accounts and spamming Wikipedia. My mentors on this
project are Gergő Tisza[2] and Adam Roses Wight[3].
The key aspects of this project are:
1. Data capture for training a machine learning classifier, which is
elaborated in Phabricator task[4]. The data can be captured from the
registration page using the WikimediaEvents extension.
2. Feature selection: choosing the most appropriate features to
improve the classification model, explained in Phabricator task[5].
3. Finding an appropriate machine learning classifier to create the model[6].
Kindly provide suggestions/ideas on Phabricator, so that any idea missed by
oversight can be discussed. Also if there are any possible issues which I
have not thought about yet, please comment on the tasks so that I can take
care of them sooner rather than later.
Links:
[0]: https://meta.wikimedia.org/wiki/User:Groovier
[1]: https://phabricator.wikimedia.org/project/profile/3137/
[2]: https://www.mediawiki.org/wiki/User:Tgr_(WMF)
[3]: https://www.mediawiki.org/wiki/User:Adamw
[4]: https://phabricator.wikimedia.org/T183991
[5]: https://phabricator.wikimedia.org/T183998
[6]: https://phabricator.wikimedia.org/T184013
Thanks
Vinitha
Hi there wiki-research folks,
This is just a heads-up that English Wikipedia has adopted a new policy[1]
about research on that project. The policy codifies some new requirements
for community notification and disclosure that potentially apply to all
research projects (regardless of the affiliation of the researcher).
You can read more about the policy on WP:NOT[1], but I've included the
major points below for your convenience:
- any research project that involves directly changing article content,
surveying a large number of editors, or asking editors sensitive questions
about their real-life identities needs to be discussed on Wikipedia's
Village Pump[2] before it is begun[3]
- researchers should disclose who they are on their user pages,
including their institutional affiliation, sources of research funding (if
applicable), and the intentions behind their research[4]
Many aspects of this policy boil down to either common sense, existing
ethical standards for human subjects research, or both. However, this
policy also leaves certain definitions and thresholds undefined. What is a
"large number" of surveyed users? What is a "sensitive question"?
There are no concrete answers to these questions yet, and that's probably a
good thing. The best way to keep this policy from becoming overly
restrictive[5] is for researchers to follow its guidance in good faith, and
ask questions when they're uncertain.
Projects that are deemed to be in violation of these guidelines may lose
editing privileges. If the violations are deemed particularly frequent or
severe, the EnWiki community may decide to make even more rules, which
could have a chilling effect on wikiresearch in general. Nobody wants
that.
If you have general questions about this policy or its application, the
best place to ask is the WP:NOT talkpage.[6]
If you have questions related to a specific planned research project, the
best thing to do is to err on the side of caution and open up a discussion
on the Village Pump before you begin.
You are also welcome to post your project plan to this list, where we, your
friendly peers, will hopefully offer constructive feedback and links to
relevant resources.
Wikimedia Foundation research staff are not in charge of these guidelines,
but are happy to offer advice "from the trenches" so to speak if asked. We
are on this list too.
As always, if you are currently researching Wikipedia, or plan to do so,
please create a Research Project page on MetaWiki[7] (example[8], tips[9]),
keep it up to date, and link to it from your userpage[10]. That way
interested parties can follow your research and ask questions, and you
won't need to constantly re-explain what you're doing every time someone
asks.
Happy researching,
Jonathan
1. https://en.wikipedia.org/wiki/Wikipedia:What_Wikipedia_is_not#Wikipedia_is_…
2. https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(proposals)
3. https://en.wikipedia.org/wiki/Wikipedia:What_Wikipedia_is_not#cite_note-7
4. https://en.wikipedia.org/wiki/Wikipedia:What_Wikipedia_is_not#cite_note-8
5. https://meta.wikimedia.org/wiki/Instruction_creep
6. https://en.wikipedia.org/wiki/Wikipedia_talk:What_Wikipedia_is_not
7. https://meta.wikimedia.org/wiki/Research:Projects
8. https://meta.wikimedia.org/wiki/Research:Supporting_Commons_contribution_by…
9. https://meta.wikimedia.org/wiki/Research:Project_documentation_best_practic…
10. https://meta.wikimedia.org/wiki/User:LZia_(WMF)
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
Hi Jonathan,
Can you please give a concrete example of what, for example, the
http://ide.mit.edu/sites/default/files/publications/SSRN-id3039505.pdf
researchers would have had to do differently under this new policy?
Best regards,
Jim
> Date: Tue, 2 Jan 2018 15:29:03 -0800
> From: Jonathan Morgan <jmorgan(a)wikimedia.org>
> To: Wiki Research-l <Wiki-research-l(a)lists.wikimedia.org>
> Subject: [Wiki-research-l] New policy about performing research on English Wikipedia
Dear Users,
Thank you for your efforts. The Call for Proposals for WikiIndaba 2018, the 3rd African Wikimedia community conference, is now open. If you want to participate and share your experience, tools, skills, knowledge, opinions or ideas with mostly active African Wikimedians, please submit your proposals at https://meta.m.wikimedia.org/wiki/WikiIndaba_conference_2018/Submissions. The deadline is January 15th.
Yours Sincerely,
Houcemeddine Turki
Dear Users,
I have the great honour to inform you that the Call for Proposals for WikiIndaba 2018 is now open. WikiIndaba 2018 is the 3rd conference of the African Wikimedia movement and will give participants the opportunity to share their Wikimedia-related experience and skills with a wide and active African Wikimedia audience. The conference will be held in Tunisia from 16 to 18 March 2018. If you want to participate in WikiIndaba and share your work and thoughts with African Wikimedians, feel free to submit your proposal at https://meta.m.wikimedia.org/wiki/WikiIndaba_conference_2018/Submissions. The deadline for submitting proposals is January 15th, 2018.
If you need a scholarship to attend WikiIndaba 2018, you can apply for one at https://docs.google.com/forms/d/e/1FAIpQLSdJJ2I0FBqp4SuiW5ypj-9lnLaAidUmhMs….
Looking forward to seeing you in Tunis next March.
Yours Sincerely,
Houcemeddine Turki
Felix Nartey
Isla Haddow-Flood
Cross-posting from analytics – very excited about this announcement.
Congrats on the launch!
---------- Forwarded message ----------
From: Nuria Ruiz <nuria(a)wikimedia.org>
Date: Wed, Dec 13, 2017 at 8:25 PM
Subject: [Analytics] Wikistats gets a facelift - Alpha Launch of Wikistats 2
To: "A mailing list for the Analytics Team at WMF and everybody who has an
interest in Wikipedia and analytics." <analytics(a)lists.wikimedia.org>
Hello from Analytics Team!
We are happy to announce the Alpha release of Wikistats 2. Wikistats has
been redesigned for architectural simplicity, faster data processing, and a
more dynamic and interactive user experience. The first goal is to match the
numbers of the current system, and to provide the most important reports,
as decided by the Wikistats community (see survey) [1]. Over time, we will
continue to migrate reports and add new ones that you find useful. We can
also analyze the data in new and interesting ways, and look forward to
hearing your feedback and suggestions. [2]
You can go directly to Spanish Wikipedia
https://stats.wikimedia.org/v2/#/es.wikipedia.org
or browse all projects
https://stats.wikimedia.org/v2/#/all-projects
The new site comes with a whole new set of APIs, similar to our existing
Pageview API but with edit data. You can start using them today; they are
documented here:
https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats
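For readers who have not used the existing Pageview API that the new APIs are compared to, here is a minimal sketch of how a per-article request URL is put together. It only builds the URL and makes no network call. The helper name pageview_url and its defaults are assumptions made for illustration; the routes for the new edit-data APIs are not shown here and should be taken from the documentation linked above.

```python
from urllib.parse import quote

# Base path of the existing per-article Pageview API.
PAGEVIEW_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageview_url(project, article, start, end,
                 access="all-access", agent="user", granularity="daily"):
    """Build a per-article Pageview API URL (hypothetical helper).

    start and end are dates formatted as YYYYMMDD.
    """
    # Article titles use underscores for spaces and must be URL-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join(
        [PAGEVIEW_BASE, project, access, agent, title, granularity, start, end]
    )


if __name__ == "__main__":
    print(pageview_url("es.wikipedia.org", "Madrid", "20171101", "20171130"))
```

The resulting URL can be fetched with any HTTP client; the response is JSON.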
FAQ:
Why is this an alpha?
There are features that we feel a full-fledged product should have that are
still missing, such as localization. The data-processing pipeline for the
new Wikistats has been rebuilt from scratch (it uses distributed-computing
tools such as Hadoop) and we want to see how it is used before calling it
final. Also, while we aim to update data monthly, updates will happen a few days
after the month rolls over because of the amount of data to move and compute.
How about comparing data between two wikis?
You can do it with two tabs but we are aware this UI might not solve all
use cases for the most advanced Wikistats users. We aim to tackle those in
the future.
How do I file bugs?
Use the handy link in the footer:
https://phabricator.wikimedia.org/maniphest/task/edit/?title=Wikistats%20Bug&projectPHIDs=Analytics-Wikistats,Analytics
How do I comment on design?
The consultation on design already happened but we are still watching the
talk page:
https://www.mediawiki.org/wiki/Wikistats_2.0_Design_Project/RequestforFeedback/Round2
[1] https://www.mediawiki.org/wiki/Analytics/Wikistats/DumpReports/Future_per_report
[2] https://wikitech.wikimedia.org/wiki/Talk:Analytics/Systems/Wikistats
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
Dario Taraborelli, Director, Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>
Apologies, a clarification about the time:
This will be at 11:30 AM (PST), 19:30 UTC.
On Mon, Dec 11, 2017 at 4:21 PM, Lani Goto <lgoto(a)wikimedia.org> wrote:
> Hi Everyone,
>
> The next Research Showcase will be live-streamed this Wednesday, December
> 13, 2017 at 11:15 AM (PST) 18:15 UTC.
>
> YouTube stream: https://www.youtube.com/watch?v=OoVwus1Owtk
>
> As usual, you can join the conversation on IRC at #wikimedia-research.
> And, you can watch our past research showcases here.
>
> This month's presentation:
> *The State of the Article Expansion Recommendation System*
> By Leila Zia
> Only 1% of English Wikipedia articles are labeled with quality class Good
> or better, and 37% of the articles are stubs. We are building an article
> expansion recommendation system to change this in Wikipedia, across many
> languages. In this presentation, I will talk with you about our current
> thinking of the vision and direction of the research that can help us build
> such a recommendation system, and share more about one specific area of
> research we have heavily focused on in the past months: building a
> recommendation system that can help editors identify what sections to add
> to an already existing article. I present some of the challenges we faced,
> the methods we devised or used to overcome them, and the result of the
> first line of experiments on the quality of such recommendations (teaser:
> the results are really promising. The precision and recall at 10 is 80%.)
>
>
> --
> Lani Goto
> Project Assistant, Engineering Admin
>
--
Lani Goto
Project Assistant, Engineering Admin
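
A note on the "precision and recall at 10" figure in the abstract above: the sketch below shows the standard textbook definition of precision and recall at k for a single ranked list. It is not necessarily the exact evaluation the presenters used, and the function name and the section titles in the example are hypothetical. It assumes the recommended list has no duplicates.

```python
def precision_recall_at_k(recommended, relevant, k=10):
    """Return (precision@k, recall@k) for one ranked recommendation list.

    recommended: ranked list of items, best first (assumed duplicate-free).
    relevant: set of items considered correct for this article.
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    # Precision: share of the k recommended slots that are relevant.
    precision = hits / k if k else 0.0
    # Recall: share of all relevant items that appear in the top k.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical section recommendations for one article.
    recommended = ["History", "Geography", "Demographics", "Economy",
                   "Culture", "Education", "Transport", "Climate",
                   "Sports", "Media"]
    relevant = {"History", "Geography", "Demographics", "Economy",
                "Culture", "Education", "Transport", "Climate",
                "Government", "Notable people"}
    print(precision_recall_at_k(recommended, relevant, k=10))
```

In practice the per-article values are averaged over a test set of articles to get a single reported number.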
Leila and Lani,
The Article Expansion Recommendation System is an absolutely
spectacular project, which will clearly improve the encyclopedia very
substantially, in ways that perhaps no other single effort has come
close to achieving, so I can't wait to learn more about it. But I might
not be able to make the live-stream time, so I want to ask this
question in advance:
Are you using or do you plan to use ORES quality predictions, the
upcoming article importance predictions, and pageview statistics to
rank article expansion recommendations?