VcamX created this task. VcamX claimed this task. VcamX added subscribers: Aklapper, NiharikaKohli, JeanFred, Legoktm, Qgil, Ricordisamoa, Halfak, Krinkle, jayvdb, valhallasw, MZMcBride, csteipp, pywikipedia-bugs, VcamX. VcamX added projects: Outreachy-Round-10, Google-Summer-of-Code-2015, Pywikibot-network, Pywikibot-login.py, Possible-Tech-Projects, Google-Code-in-2014, pywikibot-core, Pywikibot-General.
TASK DESCRIPTION
# Implement OAuth Support for Pywikibot
## Abstract
This project focuses on implementing OAuth support for Pywikibot, a collection of tools that automate work on MediaWiki sites. OAuth will offer a safer and more robust way to authenticate users who maintain MediaWiki sites with Pywikibot.
## Name and contact information
- Name: Jiarong Wei
- Email: vcamx3@gmail.com
- IRC Nickname: VcamX
- Blog: [http://vcamx.net](http://vcamx.net)
- GitHub: [https://github.com/VcamX](https://github.com/VcamX)
- Resume: [http://vcamx.net/resume/](http://vcamx.net/resume/)
- Location: Hangzhou, Zhejiang
- Typical working hours: 9:30 to 17:00 (UTC+8:00). Waking hours are 8:30 to 23:00.
## Synopsis
OAuth is a popular open standard that allows third-party applications to access protected resources on a website on behalf of a user. Without it, an application that needs a user's data on another website typically asks for the user's username and password, which risks leaking those credentials. OAuth solves this with tokens: the application receives a token instead of the username and password. Different tokens carry different privileges, and tokens can be granted and revoked, so an application can access only the resources the user wants it to access.
As a widely used toolkit for MediaWiki, Pywikibot should offer its users a safe and robust authentication method. MediaWiki supports OAuth 1.0a through the OAuth extension. Because Pywikibot automates the work of people who manage wiki sites, it may run under high-privilege accounts such as sysop accounts. Less careful users could also leak their passwords through logs. Leaks of this kind are very serious. It is therefore reasonable to give applications that serve MediaWiki projects, e.g. Pywikibot, tokens (limited privilege) instead of passwords (unlimited privilege). Thus, alongside the default password authentication, supporting OAuth in Pywikibot is a necessary and urgent task.
Possible mentors: jayvdb and Halfak
## Deliverables
According to the description of [T74065](https://phabricator.wikimedia.org/T74065) on Phabricator, the requirement is very clear:
### 1. OAuth support
The implementation of OAuth 1.0a support.
### 2. Unit tests and deploying
Two mandatory unit tests:
- A unit test should perform a login and logout using OAuth with assertions that verify APISite._userinfo is correct.
- A unit test should login, edit a userpage, and confirm the edit was performed using the OAuth-authenticated account.

These two are mandatory, but I assume more tests may be needed, since further testing requirements could arise during development.
The unit tests should be configured to run on Travis CI when the secret key is available in the Travis CI configuration, and skipped when it isn't.
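A minimal sketch of the "run when the secret is available, skip otherwise" behaviour, using only the standard `unittest` module. The environment variable name `PYWIKIBOT_OAUTH_TOKENS`, its colon-separated format, and the class name are assumptions made for illustration; the proposal does not specify how Travis CI would expose the secret.

```python
import os
import unittest

# Hypothetical variable name and format ("key:secret:token:token_secret").
OAUTH_ENV_VAR = "PYWIKIBOT_OAUTH_TOKENS"


def oauth_tokens_from_env(environ=os.environ):
    """Return the four OAuth values as a tuple, or None if not configured."""
    raw = environ.get(OAUTH_ENV_VAR, "")
    parts = tuple(p.strip() for p in raw.split(":"))
    if len(parts) != 4 or not all(parts):
        return None
    return parts


@unittest.skipUnless(oauth_tokens_from_env(),
                     "OAuth tokens not set in the environment")
class OAuthLoginTestCase(unittest.TestCase):
    """Placeholder for the two mandatory OAuth tests (hypothetical class)."""

    def test_tokens_are_complete(self):
        # Only runs when the secret is present, e.g. on a Travis CI build
        # that has the encrypted variable configured.
        self.assertEqual(len(oauth_tokens_from_env()), 4)
```

On a build without the secret, the whole class is reported as skipped rather than failed, which is the behaviour the requirement asks for.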
### 3. Documentation
OAuth is a new authentication method for Pywikibot, so a how-to document on its usage is necessary. Documentation for developers may also be needed.
## Timeline
- **Before May 25, 2015**
  - Plan and confirm implementation details (see below).
  - Implement OAuth support.
- **May 25, 2015 (Students begin coding for their Google Summer of Code projects) to June 25, 2015**
  - Implement OAuth support.
- **June 26, 2015 (Mentors and students can begin submitting mid-term evaluations) to Aug 16, 2015**
  - Implement two mandatory unit tests (maybe more if needed) and deploy test running on Travis CI.
  - Write How-to and other related documents.
  - Fix bugs.
- **Aug 17, 2015 (Suggested 'pencils down' date) to August 21, 2015 (Firm 'pencils down' date)**
  - Finish the final report, present the result to the community and Google.
- **August 28, 2015 (Students can begin submitting required code samples to Google)**
  - Submit required code samples to Google.
## Project Details
### Preparation
I'd like to separate it into four parts:
1. Get familiar with Pywikibot code and the MediaWiki API
2. Get familiar with OAuth-related knowledge, libraries (e.g. MediaWiki-OAuth) and the OAuth extension of the MediaWiki software
3. Get familiar with Pywikibot test code, Travis CI and its configuration
4. Build a development environment

Frankly speaking, I only started working with Pywikibot and the MediaWiki software in February this year. I'm getting familiar with the Pywikibot code and its internal implementation. With the help of jayvdb and Nullzero, I've fixed some bugs ([T74974](https://phabricator.wikimedia.org/T74974) and [T57140](https://phabricator.wikimedia.org/T57140)) in Pywikibot-login.py and got the fixes merged. I set up a local MediaWiki site with the Bitnami stack for testing and bug replication. I also read the source code of MediaWiki-OAuth and learned how to use Travis CI.
I think the preparation for me is mostly done.
### Planning
The mentors discussed the implementation details on [Phabricator](https://phabricator.wikimedia.org/T74065). Thanks to Halfak's work, there is a generic OAuth handshake helper in Python, MediaWiki-OAuth, dedicated to MediaWiki OAuth. This is a big help for me. What's left is how to sign new requests with the access token obtained through MediaWiki-OAuth, just as Halfak said. The discussion concluded with two available approaches for signing new requests. The first is to stick with httplib2 and implement our own signing strategy with oauthlib. The second is to switch from httplib2 to requests as the HTTP communication handler and use requests-oauthlib to sign requests. I've investigated both, and I think the two approaches focus on different points.
The first approach is to implement our own OAuth signing strategy. It's closer to the goal of this project, which is to code OAuth support. However, as Halfak said, this will be hard to do and will need experienced reviewers for the implementation, because a bug in authentication would be critical.
The second approach is more about migration, I think. Since requests and requests-oauthlib support OAuth 1/2 natively and are widely used, OAuth authentication built on them should be more robust. But for the sake of consistency, it's not sensible in my opinion to use requests/requests-oauthlib and httplib2 simultaneously, so a full migration to requests/requests-oauthlib would be necessary. Pywikibot doesn't just use httplib2 directly; it adds some extras, e.g. cookie support, multi-threading and a connection pool. So the main work would be confirming that requests has equivalent features and implementing some wrappers for it.
Both approaches have their own pros and cons. Httplib2 is older and more compact. Requests is more popular and powerful. It's hard to judge which one is better. But I think it's more painful to fully migrate to requests, since httplib2 is integrated so tightly into Pywikibot and a lot of work has been done to adapt it. Migrating may be less worthwhile considering we just need OAuth. So I prefer the first approach. That's just my opinion on this dilemma. (I don't have any strong bias here. Both approaches make sense to me. Discussion with the two mentors is necessary.)
Some other implementation details also need to be considered: storing keys and tokens for OAuth, Site object adaptation, exception handling and so on.
### Implementation details
#### OAuth implementation
To meet the requirements of the OAuth implementation, some changes and updates are needed:
1. **pywikibot/comms**: This is a sub-package which provides basic HTTP request/response handlers. MediaWiki-OAuth needs to be integrated here to handle OAuth handshakes between Pywikibot and the web server. Signing requests with the access token when using OAuth authentication also goes here. The first approach needs to extend **pywikibot/comms/threadedhttp.py** by adding our own OAuth request signing. The second approach is more complicated: most parts of **pywikibot/comms/http.py** and **pywikibot/comms/threadedhttp.py** need to be changed (there's a [commit](https://gerrit.wikimedia.org/r/#/c/189821/) on Gerrit about this, which is mentioned in the discussion on Phabricator).
2. **pywikibot/config2.py**: This works as a template for user-config.py, which is provided by users. Since OAuth is different from password authentication, we need to add new configuration items here.
3. **pywikibot/login.py**: This is the implementation of the basic login mechanism, so this module needs updating.
4. **pywikibot/data/api.py**: This contains a wrapper for LoginManager in pywikibot/login.py, so I assume this also needs to be updated.
5. **pywikibot/site.py**: This contains the abstraction of wiki sites. If users choose to use OAuth, the access token might be stored in the Site object, along with some flags indicating that.
6. **pywikibot/exception.py**: This contains the exceptions that might be raised. Exceptions which inform users about what went wrong during OAuth authentication need to be added.
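As a rough illustration of items 2, 3 and 6 above, the sketch below shows one possible shape for a new configuration item, a dedicated exception, and the check that decides which login path to take. All names here (`authenticate`, `OAuthAuthenticationError`, `login_method`) and the placeholder token values are hypothetical; the actual design is to be agreed with the mentors.

```python
class OAuthAuthenticationError(Exception):
    """Hypothetical exception for failed or misconfigured OAuth logins."""


# Hypothetical user-config.py item: hostname -> (consumer key, consumer
# secret, access token, access secret). Hosts without an entry keep using
# password authentication.
authenticate = {
    "test.wikipedia.org": ("consumer-key", "consumer-secret",
                           "access-token", "access-secret"),
}


def login_method(hostname, auth_config=authenticate):
    """Return 'oauth' or 'password' for a host, validating any OAuth entry."""
    entry = auth_config.get(hostname)
    if entry is None:
        return "password"
    if len(entry) != 4 or not all(isinstance(v, str) and v for v in entry):
        raise OAuthAuthenticationError(
            "OAuth entry for {} must hold four non-empty strings"
            .format(hostname))
    return "oauth"
```

Keeping password login as the default means existing user-config.py files would keep working unchanged, and a malformed OAuth entry fails early with a specific error instead of a confusing HTTP failure later.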
#### Unit tests
For OAuth support, we should test that Pywikibot can obtain the right user identity through OAuth authentication and use that identity to perform the expected actions.
My suggestion is to add a separate test module such as **pywikibot/test/oauth_tests.py**, under **pywikibot/test**, to hold the two mandatory tests and any related ones. To support these tests, changes may also be needed in:
1. **pywikibot/test/aspects.py**: This module provides some building blocks for tests. The **RequireUserMixin** provides user login checking. The **MetaTestCaseClass** provides metadata for configuration. The corresponding code may be added to these classes. We should also provide something like OAuthSiteTestCase alongside DefaultSiteTestCase to distinguish the two authentication methods, and use it in our tests.
2. **pywikibot/test/http_tests.py**: This is for **pywikibot/comms**. All existing tests should still pass, and additional tests may be needed here if we choose to migrate from httplib2 to the requests library.
#### Documentation
This part may include comments in code, documentation in Pywikibot's manual and documentation for developers.
The comments in code should be meaningful and concise.
The How-to documentation for the usage of OAuth authentication could be added to [Manual:Pywikibot/Basic use](http://www.mediawiki.org/wiki/Manual:Pywikibot/Basic_use).
The documentation for developers should describe the design rationale and the basic structure, to make bug fixing and further improvement easier.
#### Note
The implementation details above are based on my current understanding of the code. If there are bugs or mistakes, I'd appreciate it if you could point them out and help me fix them, so I can improve the details :D
## Participation
### Communication of progress
- IRC Channel: This is always my first option for help whenever I am stuck on something. I'll be available on the IRC channels pywikibot and wikimedia-dev under the nickname VcamX.
- Mailing list: I subscribed to mailing lists such as wikitech-l and Pywikipedia-l. If I can't get an instant response, the mailing list is my second choice.
- Weekly report: A weekly report is helpful for summing up, reviewing what I have done and what I need to change. It's a good way of communicating progress.
### Publishing source code
- Gerrit: Wikimedia Code Review
### How and where you plan to ask for help?
- Try to solve it by myself: read documentation, search online and so on.
- Ask for help from the mentors and community through IRC and the mailing lists.
## About me
**Education completed or in progress**
B.S. in Computer Science, Zhejiang University, Hangzhou, China
**How did you hear about this program?**
I searched the organizations that took part in GSoC 2014 and found MediaWiki. On its Phabricator, I found this project, which seemed a good fit for me. The project was originally for GCI 2014 and I was not sure whether it was available for GSoC 2015, so I asked jayvdb and got confirmation.
**Will you have any other time commitments, such as school work, another job, planned vacation, etc., during the duration of the program?**
Up to and including June, I must spend some time on my graduation project and graduation affairs, so I have decided to start coding earlier to make up for the lost time.
**We advise all candidates eligible to Google Summer of Code and FOSS Outreach Program for Women to apply for both programs. Are you planning to apply to both programs and, if so, with what organization(s)?**
Only GSoC.
## Past experience
### FOSS Projects
**[Xapian](http://www.xapian.org)**
Xapian is an open source search engine library written in C++. In GSoC 2014, my work mainly focused on refactoring the LETOR module. [Link](https://www.google-melange.com/gsoc/project/details/google/gsoc2014/vcamx/56...)
### Personal Repositories
[GitHub](https://github.com/VcamX)
[BitBucket](https://bitbucket.org/VcamX)
### Relevant projects
I've been using Python for many years. I like its conciseness and power.
During the second year of university, I took a part-time internship at a small startup in Hangzhou, China. My job was to write web spiders in Python for B2C online retailers in Mainland China: Tmall/Taobao, JD.com, Amazon and Yixun. As a course project, I used Django to build a price comparison website, which has a back end service for scraping prices and a front end for displaying detailed information about goods. Also, my group's course project for Information Retrieval was a tiny picture search engine based on Python and the MIRFLICKR dataset. It can search by both text and picture. I implemented the text search using TF-IDF and the website front end using Flask and JavaScript.
Besides those, I have written some Python code for fun. I also have project experience with C/C++ and Java.
### Interested projects
I like writing Python code, and Pywikibot is what I need. I am not applying for other projects. For me, concentrating on one single project is better than spreading my energy over many. Focusing makes me more efficient.
## Any other info
The Wikimedia Foundation is one of the greatest nonprofit organizations in the world. Like everyone on earth, I have benefited so much from Wikipedia and its sibling projects. I'm very willing to get involved in the Pywikibot project and MediaWiki to learn and contribute.
TASK DETAIL https://phabricator.wikimedia.org/T93352