We’re glad to announce the release of an aggregate clickstream dataset extracted from English Wikipedia:
http://dx.doi.org/10.6084/m9.figshare.1305770
This dataset contains counts of (referer, article) pairs aggregated from the HTTP request logs of English Wikipedia. This snapshot captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015.
This data can be used for various purposes:
• determining the most frequent links people click on for a given article
• determining the most common links people followed to an article
• determining how much of the total traffic to an article clicked on a link in that article
• generating a Markov chain over English Wikipedia
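A minimal sketch of the uses listed above, assuming the data has already been loaded as (referer, article, count) tuples. The sample rows, their counts, and all function names are made up for illustration; they are not taken from the dataset or its documentation.

```python
from collections import defaultdict

# Hypothetical sample rows: (referer, article, count).
# Assumes each (referer, article) pair appears at most once, as in an aggregated dataset.
rows = [
    ("A", "B", 30),
    ("A", "C", 10),
    ("B", "C", 5),
    ("other", "A", 60),
]

def outgoing(rows, source):
    """Links clicked from `source`, most frequent first."""
    hits = [(art, n) for ref, art, n in rows if ref == source]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

def incoming(rows, target):
    """Referers that led to `target`, most frequent first."""
    hits = [(ref, n) for ref, art, n in rows if art == target]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

def click_through_rate(rows, article):
    """Share of requests for `article` that were followed by a click on one of its links."""
    arrivals = sum(n for _, art, n in rows if art == article)
    clicks = sum(n for ref, _, n in rows if ref == article)
    return clicks / arrivals if arrivals else None

def transition_probabilities(rows):
    """Normalise counts per referer into Markov chain transition probabilities."""
    totals = defaultdict(int)
    for ref, _, n in rows:
        totals[ref] += n
    return {(ref, art): n / totals[ref] for ref, art, n in rows}
```

For example, outgoing(rows, "A") ranks the links clicked from article "A", and transition_probabilities(rows) gives the probability of each (referer, article) step given the referer.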
We created a page on Meta for feedback and discussion about this release: https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream
Ellery and Dario
Hi all, we produced a prototype of an editor-editor interaction network visualization for individual articles, based on the words/tokens deleted and reintroduced by editors. It will be presented as a demo at the WWW conference this year [1], but we would love to also get some feedback on it from this list. It's at an early stage and pretty slow when loading, so have patience when you try it out here: http://km.aifb.kit.edu/sites/whovis/index.html and be sure to read the "how to" section on the site. Alternatively, you can watch the (semi-professional) screencast I did :P, which explains most of the functions.
The (disagreement) interactions are based on an extended version of the authorship extraction we do with wikiwho [2], and the graph drawing follows almost exactly the nice method proposed by Brandes et al. [3]. The code can be found on GitHub, both for the interaction-extraction extension of wikiwho [4] and for the visualization itself [5], which basically produces JSON output to feed the D3 visualization libraries we use. We have yet to generate output for more articles; so far we only show a handful for demonstration purposes. The whole thing also fits nicely (and was supposed to go along) with the IEG proposal that Pine had started on editor interaction [6].
word provenance/authorship API prototype:
Also, we have worked a bit on our early prototype for an API for word provenance/authorship:
You can get word/token-wise information from which revision what content originated (and thereby which editor originally authored the word) at
http://193.175.238.123/wikiwho/wikiwho_api.py?revid=<REV_ID>&name=<ARTICLENAME>&format=json
(<ARTICLENAME> -> name of the article in ns:0 in the English Wikipedia, <REV_ID> -> rev_id of that article for which you want the authorship information; format is currently only json)
Example: http://193.175.238.123/wikiwho/wikiwho_api.py?revid=649876382&name=Laura_Bu…
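A small sketch of how such a request URL could be assembled in Python. The host and parameter names are the ones given above; the helper name build_wikiwho_url and the article name "Example_Article" are placeholders made up for illustration, and the sketch does not check whether the prototype endpoint is reachable.

```python
from urllib.parse import urlencode

# Endpoint as given in this message (a prototype; availability is not guaranteed).
BASE_URL = "http://193.175.238.123/wikiwho/wikiwho_api.py"

def build_wikiwho_url(rev_id, article_name):
    """Build a query URL from a revision id and an ns:0 article name."""
    # urlencode percent-encodes special characters in the article name.
    query = urlencode({"revid": rev_id, "name": article_name, "format": "json"})
    return f"{BASE_URL}?{query}"

# "Example_Article" is a placeholder, not a real query from this message.
url = build_wikiwho_url(649876382, "Example_Article")
```

The resulting string can then be passed to any HTTP client, for instance urllib.request.urlopen from the standard library.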
Output format is currently:
{"tokens": [{"token": "<FIRST TOKEN IN THE WIKI MARKUP TEXT>", "author_name": "<NAME OF AUTHOR OF THE TOKEN>", "rev_id": "<REV_ID WHEN TOKEN WAS FIRST ADDED>"}, {"token": "<SECOND TOKEN IN THE WIKI MARKUP TEXT>", "author_name": "<NAME OF AUTHOR OF THE TOKEN>", "rev_id": "<REV_ID WHEN TOKEN WAS FIRST ADDED>"}, {"token": "<THIRD TOKEN …
… ], "message": null, "success": "true", "revision": {"article": "<NAME OF REQUESTED ARTICLE>", "time": "<TIMESTAMP OF REQUESTED REV_ID>", "reviid": <REQUESTED REV_ID>, "author": "<AUTHOR OF REQUESTED REV_ID>"}}
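To illustrate how this output could be consumed, here is a minimal Python sketch that counts tokens per original author. The sample payload is invented and only mirrors the field names shown above (including "success" as the string "true"); the function name tokens_per_author is likewise an assumption for the example.

```python
import json
from collections import Counter

# Invented sample response that follows the format described above.
sample = """
{"tokens": [
   {"token": "foo", "author_name": "Alice", "rev_id": "1"},
   {"token": "bar", "author_name": "Bob",   "rev_id": "2"},
   {"token": "baz", "author_name": "Alice", "rev_id": "3"}],
 "message": null, "success": "true",
 "revision": {"article": "Example", "time": "2015-03-01T00:00:00Z",
              "reviid": 3, "author": "Alice"}}
"""

def tokens_per_author(payload):
    """Return a Counter mapping each author name to the number of tokens they originated."""
    data = json.loads(payload)
    # In the documented format "success" is the string "true", not a JSON boolean.
    if data.get("success") != "true":
        raise ValueError(data.get("message") or "request was not successful")
    return Counter(tok["author_name"] for tok in data["tokens"])

counts = tokens_per_author(sample)
```

counts.most_common() then lists authors by how many tokens of the requested revision they originally wrote.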
DISCLAIMER: there are problems with getting/processing the XML for larger articles right now, so don't be surprised if that gives you an error sometimes (i.e. querying "Barack Obama" for instance and similar sizes will *not* succeed for higher revision numbers). Also, we are working on the speed and providing more precomputed articles (right now almost all are computed on request, although we save intermediary results). Still, for most articles it works fine and the output has been tested for accuracy (cf. [2]).
At some point in the future, this API will also be able to deliver the interaction data that the visualization is built on.
I'm looking forward to your feedback :)
Cheers,
Fabian
[1] http://f-squared.org/wikiwho/demo32.pdf
[2] http://f-squared.org/wikiwho/
[3] http://dl.acm.org/citation.cfm?id=1526808
[4] https://github.com/maribelacosta/wikiwho
[5] https://github.com/wikiwho/whovis
[6] https://meta.wikimedia.org/wiki/Grants:IEG/Editor_Interaction_Data_Extracti…
--
Fabian Flöck
Research Associate
Computational Social Science department @GESIS
Unter Sachsenhausen 6-8, 50667 Cologne, Germany
Tel: + 49 (0) 221-47694-208
fabian.floeck(a)gesis.org
www.gesis.org
www.facebook.com/gesis.org
Wikipedia Research Track at OpenSym 2015!
Wikis and Open Collaboration Track at OpenSym 2015!
We have extended the research paper submission deadline to *April 13th,
2015* AoE (Monday night) to give authors more time to finish their paper
submissions.
http://www.opensym.org/os2015/call-for-papers/wikis-and-open-collaboration-…
http://www.opensym.org/os2015/call-for-papers/wikipedia-research-track/
We would appreciate an abstract submission by the old deadline, March
29th, 2015. Abstract submission is not required but should help us plan
the review process. Please submit through
https://easychair.org/conferences/?conf=opensym2015
Those who will attend OpenSym 2015 at San Francisco's Golden Gate Club
http://www.presidio.gov/venues/Pages/Golden-Gate-Club.aspx
will not only experience a great research program and community events,
but also get to listen to and engage with keynote speakers
- Peter Norvig of Google,
- Richard Gabriel of IBM,
- Robert Glushko of UC Berkeley, and
- Tony Wasserman of CMU (Silicon Valley).
See you in San Francisco!
--
Prof. Dr. Dirk Riehle, Friedrich-Alexander-University Erlangen-Nürnberg
Open Source Research Group, Applied Software Engineering
Web: http://osr.cs.fau.de, Email: dirk.riehle(a)fau.de
Cell phone: +49 157 8153 4150 or +1 650 450 8550
Hi all,
Does anyone know if data about gender of contributors on projects other
than English Wikipedia is available? In addition to other language
Wikipedias, it would be interesting if we had data about Commons,
Wikisource, Wikivoyage, Wiktionary, Wikispecies, etc.
Also, anecdotally I observe a relatively high percentage of female
participation in education and GLAM activities. Do we have data about the
gender of participants in education and GLAM, particularly in leadership
roles?
Thanks,
Pine
Hi,
Bob West, Jure Leskovec, and I are organizing a workshop at ICWSM
focused on the challenges and opportunities of Wikipedia. You can find more
information about the workshop and the call for papers below.
Looking forward to seeing many of you in person at the workshop.
Best,
Leila
*Call for Workshop Papers*
Workshop on Wikipedia, a Social Pedia: Research Challenges and Opportunities
May 26, Oxford, England
co-located with the 9th International Conference on Weblogs and Social
Media (ICWSM 2015)
http://snap.stanford.edu/wiki-icwsm15/
Deadline for papers: Tuesday, March 24, 2015, 23:59 AoE
Wikipedia is one of the most popular sites on the Web, a main source of
knowledge for a large fraction of Internet users, and, in the light of its
collaborative nature, an inherently social medium. Therefore, and since not
only all content but also many activity logs are available to the public,
Wikipedia has become an important object of study for researchers across
many subfields of the computational and social sciences, such as
social-network analysis, social psychology, education, anthropology,
political science, human-computer interaction, cognitive science,
artificial intelligence, linguistics, and natural-language processing.
This workshop is a venue for all researchers exploring social aspects of
Wikipedia. The workshop will feature high-profile speakers from academia
and the Wikimedia Foundation and aims to create a forum where participants
can connect both among each other and with researchers at the Wikimedia
Foundation.
Topics of interest include, but are not limited to:
- Collaborative content creation
- Consensus-finding and conflict resolution on editorial issues
- Content consumption on Wikipedia
- Participation in discussions and their dynamics
- Collaborative task management
- Evolution of hierarchies
- Wikipedia as a sensor for real-world events, culture, etc.
- Demographics of Wikipedia readers and editors
- Engagement and incentivization of editors
We invite the submission of regular research papers (6–8 pages) as well as
position papers (2–4 pages). Authors whose papers are accepted to the
workshop will have the opportunity to participate in a poster session.
*Submission instructions*
Regular and position papers should be formatted according to AAAI
formatting guidelines (http://www.aaai.org/Publications/Author/author.php).
Please submit papers using EasyChair at
https://easychair.org/conferences/?conf=wikiicwsm2015
*Review and the archival of papers*
Authors will be notified of acceptance or rejection on or before Tuesday,
March 31, 2015.
The accepted papers will be published on the workshop webpage (unless the
authors object), and authors whose papers are accepted will have the
opportunity to participate in a poster session.
*Organizing committee*
Robert West, Stanford University
Jure Leskovec, Stanford University
Leila Zia, Wikimedia Foundation
Hi all,
I'm representing a team of researchers from Drexel University who are
researching privacy practices among Wikipedia editors. If you have ever
thought about your privacy when editing Wikipedia or taken steps to protect
your privacy when you edit, we’d like to learn from you about it.
The study is titled “Privacy, Anonymity, and Peer Production.” Details can
be found on Meta, where the project was discussed before recruitment began:
https://meta.wikimedia.org/wiki/Research:Anonymity_and_Peer_Production
If you would like to help us out, you need to read and complete the online
consent form linked here and we will get in contact with you:
http://andreaforte.net/wp.html.
We are planning to conduct interviews that will last anywhere from 30-90
minutes (depending on how much you have to say) by phone or Skype and we
can offer you $20 for your time, but you do not need to accept payment to
participate.
I have been researching Wikipedia since 2004 and have conducted many
studies, most of which have resulted in papers that you can find here:
http://andreaforte.net.
Thanks for considering it. Please contact me if you have questions!
Andrea Forte
http://en.wikipedia.org/wiki/User:Andicat
and
Rachel Greenstadt
Nazanin Andalibi
--
:: Andrea Forte
:: Assistant Professor
:: College of Computing and Informatics, Drexel University
:: http://www.andreaforte.net
Hi all,
I am compiling some stats regarding the work done on the Art & Feminism
edit-a-thons for my local chapter and while checking the state of the wikis
regarding female artists I noticed that there are huge local differences
per language wiki regarding "who is notable". One of the things I love
about Mix-and-Match is the way you can easily check the sitelinks per wiki.
You can also download the data with autolist to see which biographies are
popular across different languages. I noticed that in the case of women
this seems to be way different than for men. Women artists are more likely
to be notable in one or two languages only, possibly because they travel
less, making their art known more locally than otherwise - who knows?
In any case, here is something to chew on:
https://commons.wikimedia.org/wiki/File:Women_vs_Men_per_external_db_using_…
I wish we had more databases from more countries that only contain artists
that we could load into Mix-and-Match!
Jane
Has anyone compared the use of http://erg.delph-in.net/logon
with traditional measures of text reading level for locating potentially
confusing sentences?
Hi,
This month's research showcase
<https://www.mediawiki.org/w/index.php?title=Analytics/Research_and_Data/Sho…>
is scheduled for Wednesday, March 25, 11:30 (PST).
We will have two presentations: one on user session identification by Aaron
Halfaker, and one on mining missing hyperlinks in Wikipedia by Bob West.
As usual, the event will be recorded and publicly streamed on YouTube
(links will follow). We will hold a discussion and take questions from
remote participants via the Wikimedia Research IRC channel
(#wikimedia-research on freenode).
Looking forward to seeing you there.
Leila