This is Peng Wan. I have submitted my application to Wikimedia for GSoC. My project title is "Figuring out the most popular pages".
Here is the project's short description: the feature aims to identify the most popular and favorite pages on Wikimedia. The most popular pages would be determined from page clicks: each click event would send a record to the database, and we could then work out the most popular pages by querying the destination URLs. As for the favorite pages, I want to add a "like" or "+1" link to every page; if a user likes the content of a page, s/he just needs to click the "like" link to increment that page's "like" count in the database.
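As a rough illustration, here is a minimal sketch of what such a "like" counter and popularity query might look like; the table and column names (page_likes, page_title, likes) are hypothetical and not part of any existing MediaWiki schema, and the same pattern would apply to logging click events:

import sqlite3

conn = sqlite3.connect("page_feedback.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS page_likes (
        page_title TEXT PRIMARY KEY,
        likes      INTEGER NOT NULL DEFAULT 0
    )
""")

def record_like(title):
    # Called when a user clicks the "like" / "+1" link on a page.
    conn.execute("INSERT OR IGNORE INTO page_likes (page_title) VALUES (?)", (title,))
    conn.execute("UPDATE page_likes SET likes = likes + 1 WHERE page_title = ?", (title,))
    conn.commit()

def most_liked(n=10):
    # Return the n pages with the highest "like" counts.
    return conn.execute(
        "SELECT page_title, likes FROM page_likes ORDER BY likes DESC LIMIT ?",
        (n,)).fetchall()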
Here is my proposal link: http://www.google-melange.com/gsoc/proposal/review/google/gsoc2011/buaajacks...
I would appreciate your advice on my proposal.
Thanks Peng Wan
On 6 April 2011 16:02, Peng Wan buaajackson@gmail.com wrote:
This is Peng Wan. I have submitted my application to Wikimedia for GSoC. My project title is "Figuring out the most popular pages".
Does it do anything http://stats.grok.se/ doesn't?
- d.
See also: http://dammit.lt/wikistats/
I've parsed every one of these files (at hour granularity; grok.se aggregates at day-level, I believe) since Jan. 2010 into a DB structure indexed by page title. It takes up about 400GB of space, at the moment.
While a comprehensive measurement study over this data would be interesting (long term trends, traffic spikes during cultural events, etc.) -- the technical infrastructure is already in place. I doubt a measurement study meets GSoC requirements.
Thanks, -AW
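For anyone curious what that parsing involves, here is a minimal sketch, assuming the usual per-line format of these hourly dump files ("project page_title view_count bytes", gzipped); the file names, paths and table layout are illustrative only, not the actual database described above:

import gzip
import sqlite3

conn = sqlite3.connect("pageviews.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS hourly_views (
        page_title TEXT,
        hour       TEXT,     -- e.g. '2010-01-15T07'
        views      INTEGER,
        PRIMARY KEY (page_title, hour)
    )
""")

def load_hour(path, hour, project="en"):
    # Parse one gzipped hourly pagecounts file, keeping only one project.
    rows = []
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            parts = line.split(" ")
            if len(parts) != 4 or parts[0] != project or not parts[2].isdigit():
                continue
            rows.append((parts[1], hour, int(parts[2])))
    conn.executemany("INSERT OR REPLACE INTO hourly_views VALUES (?, ?, ?)", rows)
    conn.commit()

# Example: load_hour("pagecounts-20100115-070000.gz", "2010-01-15T07")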
On 04/06/2011 11:05 AM, David Gerard wrote:
On 6 April 2011 16:02, Peng Wan buaajackson@gmail.com wrote:
This is Peng Wan. I have submitted my application to Wikimedia for GSoC. My project title is "Figuring out the most popular pages".
Does it do anything http://stats.grok.se/ doesn't?
- d.
Hoi, While it is interesting to know which articles are popular, that serves no real purpose beyond curiosity. If the information instead showed which articles are missed most often, you would provide functionality that we do not have and that has a really practical application.
When people in India, Pakistan and Sri Lanka all have a bout of cricket fever at the same time, it will show in the traffic numbers from those areas. When a Pakistani cricketer is really popular on Wikipedia but missing from the Tamil or Sinhala Wikipedia, that is the kind of intelligence that points to an article likely to be really popular.
There is a big need for statistics that point to the articles that are missing, and there are several approaches to such data. In my opinion, given the right take on this issue, it is definitely a GSoC-worthy project. Thanks, GerardM
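A minimal sketch of the kind of cross-wiki comparison this suggests; the inputs (a view-count mapping for the source wiki and the set of source titles that already have a counterpart on the target-language wiki, e.g. derived from interlanguage links) are assumed to be available from elsewhere:

def popular_but_missing(view_counts, titles_on_target, top_n=100):
    # view_counts: {title: views} for the source wiki.
    # titles_on_target: source-wiki titles that already have an article
    # (via an interlanguage link) on the target-language wiki.
    ranked = sorted(view_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(title, views) for title, views in ranked[:top_n]
            if title not in titles_on_target]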
On 6 April 2011 17:46, Andrew G. West westand@cis.upenn.edu wrote:
See also: http://dammit.lt/wikistats/
I've parsed every one of these files (at hour granularity; grok.se aggregates at day-level, I believe) since Jan. 2010 into a DB structure indexed by page title. It takes up about 400GB of space, at the moment.
While a comprehensive measurement study over this data would be interesting (long term trends, traffic spikes during cultural events, etc.) -- the technical infrastructure is already in place. I doubt a measurement study meets GSoC requirements.
Thanks, -AW
-- Andrew G. West, Doctoral Student Dept. of Computer and Information Science University of Pennsylvania, Philadelphia PA Website: http://www.cis.upenn.edu/~westand
Andrew G. West wrote:
I've parsed every one of these files (at hour granularity; grok.se aggregates at day-level, I believe) since Jan. 2010 into a DB structure indexed by page title. It takes up about 400GB of space, at the moment.
Is your database available to the public? The Toolserver folks have been talking about getting the page view stats into usable form for quite some time, but nothing's happened yet. If you have an API or something similar, that would be fantastic. (stats.grok.se has a rudimentary API that I don't imagine many people are aware of.)
MZMcBride
Not sure I want to throw the API open to the public (the grok.se folks, and others, have a fine service for casual experimentation).
However, I am willing to share the data with interested researchers who need to do some serious crunching (I have a Java API and could distribute database credentials on a per-case basis).
I'll note that I only parse English Wikipedia at this time. I've found it useful in my anti-vandalism research (i.e., "given that an edit survived between time [w] and [x] on article [y], we estimate it received [z] views"). Thanks, -AW
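Using the hypothetical hourly_views table from the earlier sketch, such an estimate might be no more than summing the hourly counts for the article between the two timestamps:

def estimated_views(conn, title, start_hour, end_hour):
    # Sum the hourly view counts for `title` over the interval [w, x],
    # with hours stored as sortable strings like '2010-01-15T07'.
    row = conn.execute(
        "SELECT COALESCE(SUM(views), 0) FROM hourly_views "
        "WHERE page_title = ? AND hour BETWEEN ? AND ?",
        (title, start_hour, end_hour)).fetchone()
    return row[0]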
On 04/06/2011 08:44 PM, MZMcBride wrote:
Andrew G. West wrote:
I've parsed every one of these files (at hour granularity; grok.se aggregates at day-level, I believe) since Jan. 2010 into a DB structure indexed by page title. It takes up about 400GB of space, at the moment.
Is your database available to the public? The Toolserver folks have been talking about getting the page view stats into usable form for quite some time, but nothing's happened yet. If you have an API or something similar, that would be fantastic. (stats.grok.se has a rudimentary API that I don't imagine many people are aware of.)
MZMcBride