(cross-posting Sebastiano’s post from the analytics list, this may be of interest to both the wikidata and wiki-research-l communities)
Begin forwarded message:
> From: Sebastiano Vigna <vigna(a)di.unimi.it>
> Subject: [Analytics] Distributing an official graph
> Date: December 9, 2013 at 10:09:31 PM PST
>
> [Reposted from private discussion after Dario's request]
>
> My problem is that of exploring the graph structure of Wikipedia
>
> 1) easily;
> 2) reproducibly;
> 3) in a way that does not depend on parsing artifacts.
>
> Presently, when people want to do this, they either do their own parsing of the dumps, or they use the SQL data, or they download a dataset like
>
> http://law.di.unimi.it/webdata/enwiki-2013/
>
> which has everything "cooked up".
>
> My frustration in the last few days came when trying to add the category links. I didn't realize (well, it's not very well documented) that bliki extracts all links and renders them in HTML *except* for the category links, which are instead accessible programmatically. Once I got there, I was able to make some progress.
>
> Nonetheless, I think that the graph of Wikipedia connections (hyperlinks and category links) is really a mine of information, and it is a pity that so much huffing and puffing is necessary to do something as simple as a reverse visit of the category links from "People" to get all people pages (in practice this is a bit more complicated, as there are many false positives, but after a couple of fixes it worked quite well).
>
> Moreover, one has continuously this feeling of walking on eggshells: a small change in bliki, a small change in the XML format, and everything might stop working in such a subtle manner that you realize it only after a long time.
>
> I was wondering if Wikimedia would be interested in distributing in compressed form the Wikipedia graph. That would be the "official" Wikipedia graph--the benefits, in particular for people working on leveraging semantic information from Wikipedia, would be really significant.
>
> I would (obviously) propose to use our Java framework, WebGraph, which is actually quite standard for distributing large (well, actually much larger) graphs, such as ClueWeb09 http://lemurproject.org/clueweb09/, ClueWeb12 http://lemurproject.org/clueweb12/ and the recent Common Crawl hyperlink graph http://webdatacommons.org/hyperlinkgraph/index.html. But any format is OK, even a pair of integers per line. The advantages of a binary compressed form are reduced network utilization, instantaneous availability of the information, etc.
>
> Probably it would be useful to actually distribute several graphs with the same dataset--e.g., the category links, the content links, etc. It is immediate, using WebGraph, to build a union (i.e., a superposition) of any set of such graphs and use it transparently as a single graph.
>
> In my mind the distributed graph should have a contiguous ID space, say, induced by the lexicographical order of the titles (possibly placing template pages at the start or at the end of the ID space). We should provide graphs, and a bidirectional node<->title map. All such information would use about 300M of space for the current English Wikipedia. People could then associate pages to nodes using the title as a key.
>
> But this last part is just rambling. :)
>
> Let me know if you people are interested. We can of course take care of the process of cooking up the information once it is out of the SQL database.
>
> Ciao,
>
> seba
>
>
> _______________________________________________
> Analytics mailing list
> Analytics(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
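Sebastiano's proposal above (contiguous node IDs induced by lexicographic title order, a bidirectional node<->title map, and unions of per-edge-type graphs) is easy to prototype. A minimal toy sketch in Python over a hypothetical pair-of-integers edge format; this is only an illustration of the idea, not WebGraph, and all names are made up:

```python
def build_id_space(titles):
    """Assign contiguous node IDs induced by lexicographic title order."""
    ordered = sorted(set(titles))
    title_to_id = {t: i for i, t in enumerate(ordered)}
    return title_to_id, ordered  # the list doubles as the id -> title map

def union_graphs(*edge_sets):
    """Superpose several graphs (e.g. content links and category links)
    defined over the same node ID space."""
    merged = set()
    for edges in edge_sets:
        merged |= edges
    return merged

def reverse_neighbors(edges, target):
    """One step of a reverse visit: all sources linking to `target`."""
    return {s for s, t in edges if t == target}
```

In WebGraph proper the union and the transposed graph are built by the library itself; the point of the sketch is only that, once IDs are contiguous and shared across graphs, superposition and reverse traversal become trivial set operations.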
Maximilian Klein wrote:
>... Can you also think of any other dimensions or heuristics
> to programmatically rate?
Ref tags per article text bytes works pretty well, even by itself.
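A sketch of that heuristic (illustrative Python; the regex is a rough approximation of ref-tag matching, not MediaWiki's parser):

```python
import re

# Matches opening <ref> tags, including <ref name="..."> and <ref/>,
# but not closing </ref> tags.
REF_TAG = re.compile(r"<ref[\s>/]", re.IGNORECASE)

def ref_density(wikitext):
    """Heuristic quality signal: ref tags per byte of article text."""
    nbytes = len(wikitext.encode("utf-8"))
    if nbytes == 0:
        return 0.0
    return len(REF_TAG.findall(wikitext)) / nbytes
```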
Also, please consider readability metrics. At this point on enwiki, I
would say about a third of our real reader-impeding quality issues have
more to do with overly technical, jargon-laden articles (which usually
also have word- and sentence-length problems) than with underdeveloped
exposition. This is especially true of our math articles, many of which
are almost useless for undergraduates, let alone students at the earlier
grade levels where the corresponding concepts are introduced.
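Word- and sentence-length signals of the kind mentioned are cheap to compute. A crude sketch (illustrative Python, not a validated readability formula):

```python
import re

def length_stats(text):
    """Crude readability signals: mean sentence length in words and
    mean word length in characters."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0, 0.0
    return len(words) / len(sentences), sum(map(len, words)) / len(words)
```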
The good news is that this doesn't seem to be happening in other topic
areas like biology, physics, or medicine. But math is kind of a disaster
area in that respect, and it's not getting better with time.
Not wiki-related per se, but many people on this list might be interested.
G
> *From: *Rayid Ghani <rayid(a)uchicago.edu <mailto:rayid@uchicago.edu>>
> *Subject: **Data Science for Social Good Summer Fellowship*
> *Date: *December 9, 2013 3:00:10 PM EST
> *To: *Rayid Ghani <rayid(a)uchicago.edu <mailto:rayid@uchicago.edu>>
>
> Hi,
> I'm running the Eric & Wendy Schmidt "Data Science for Social Good" Summer
> Fellowship again this year at the University of Chicago and need help in
> recruiting strong students (grad students or junior/senior undergrads with
> CS, Machine Learning, and/or Stats background). The goal is to get up to
> 50 students in Chicago this summer and have them work on high-impact
> social problems (in education, healthcare, energy, transportation, crime,
> etc.) using Machine Learning, Data Mining, and other related buzzwords. The
> students will work with full-time mentors from academia and industry. The
> fellowships are paid competitively and we will provide housing as well.
>
> More details are at http://dssg.uchicago.edu <http://dssg.uchicago.edu/>.
> Applications for the fellowship are due February 1, 2014.
>
> If you have (or know of) strong CS/Stats/Econometrics/Applied Math/Policy
> students who have an interest in making an impact by working on high-impact
> social problems using machine learning/data mining/stats, please forward this
> to them.
>
> Thanks,
> Rayid
>
> P.S. We’re also looking for full-time mentors (strong technical folks with
> real-world experience who want to spend the summer in Chicago working with a
> team of fellows).
>
> Rayid Ghani
> Computation Institute & Harris School of Public Policy
> University of Chicago
> rayid(a)uchicago.edu <mailto:rayid@uchicago.edu>
> http://www.rayidghani.com <http://www.rayidghani.com/>
Has there ever been a general purpose encyclopedia which was found
suitable for medical student instruction?
What are our median level readers going to do if we suddenly start
including enough pathophysiology images to please the med school
instructors? I'm not entirely sure it will help them, although on the
other hand it might encourage them to see a professional, which is what
they often should be doing instead of reading Wikipedia. (But if
wishes were horses, beggars would ride....)
>... Daniel Mietchen wrote:
>>
>> A similar paper on 39 gastroenterology/ hepatology articles on the
>> English Wikipedia came to different conclusions:
>> https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Medicine#Paper:_.2…
(a la the old Nature study)
http://www.helsinkitimes.fi/finland/finland-news/domestic/8619-world-s-larg…
World’s largest study on Wikipedia: Better than its reputation
05 Dec 2013
Wikipedia is the most popular encyclopedia in the world, and now in many
countries it is also the only one. But can the information in Wikipedia be
trusted at all?
Helsingin Sanomat newspaper evaluated 134 articles in the Finnish-language
version of Wikipedia with 96 experts. As far as we know, our study is the
most extensive individual investigation on the trustworthiness of Wikipedia
in the whole world.
We found out that Wikipedia is better than its reputation: seventy per
cent of the articles got good points for accuracy. But in many ways the
Finnish Wikipedia is also far from flawless.
Olavi Koistinen HS
Hundreds of thousands of people use the Finnish-language version of
Wikipedia every day. Its articles rank at the top in Google search results,
whether you are looking for information on climate change, Michael Jackson,
Sydney, ball bearings, cancer, allosaurus, or any number of other topics.
Wikipedia is the only Finnish encyclopedia that is still being updated. The
golden age of printed encyclopedias was in the 1990s, but with the advent
of the internet, their sales collapsed. The last traditional Finnish
encyclopedia, the online version of WSOY’s Facta, was closed down in 2011.
The reliability of the English language Wikipedia has been studied for
years. The most famous study was published by the science magazine Nature
in 2005, comparing the number of errors in Wikipedia with those in the
Encyclopaedia Britannica. The result of the comparison was that Wikipedia
was almost as accurate as the Encyclopaedia Britannica. The study involved
42 articles from each encyclopaedia.
However, no thorough investigations have been made about the reliability of
the Finnish Wikipedia – until now.
We first chose 134 articles in the Finnish Wikipedia, covering different
areas of life, to be studied. Then we asked an expert with a thorough
knowledge on the subject matter to evaluate each article. There were a
total of 96 people making assessments, most of them professors or other
university researchers. They work in eight different Finnish universities.
The Finnish Wikipedia currently has nearly 340 000 articles, so the random
selection of 134 articles naturally does not give a complete picture of the
reliability of the encyclopedia. Our study is nevertheless the most
extensive single investigation of the trustworthiness of Wikipedia in the
whole world, as far as we know, at least if measured by the number of
articles examined.
“With this measure the study is unique in its scale”, says researcher Arto
Lanamäki of the University of Bergen in Norway. Not even the
English-language Wikipedia has undergone such an extensive single survey.
The experts read the articles and scored them based on six different
indicators, which were lack of errors, coverage and balance, sourcing,
topicality, neutrality, and clarity. The articles were graded on a scale of
1-5, in which 1 was the worst mark and 5 was the best.
The entire result dataset was published as open data, and is available
here.
First the good news: The Finnish Wikipedia is largely error-free. The lack
of errors is the area in which Wikipedia clearly got its best score. For
instance, if someone uses Wikipedia to find the year in which Christopher
Columbus discovered America, the information in the article is most
likely to be correct.
No less than 70 per cent of the articles were judged to be good (4) or
excellent (5) with respect to lack of errors. According to the indicative
evaluation scale, a four means that the article has only “scattered
small errors, no big ones”.
This is how we evaluated the Finnish language Wikipedia
Journalists of Helsingin Sanomat first listed 150 topics from different
walks of life. Only after that did we check if any Wikipedia articles
actually existed on the topic (the vast majority did exist). 134 Wikipedia
articles were selected for the final evaluation.
To evaluate each article we chose a university-level researcher with
knowledge on the subject matter to be an evaluator. There were a total of
96 evaluators, of which 94 were researchers. In addition, two
experienced sports journalists evaluated 8 Wikipedia articles about
sports.
Some of the evaluators assessed more than one article.
The evaluation was done via a web form.
The basic principle of the evaluation was: if an amateur read the
article, would he or she get a truthful impression of the subject matter?
Each evaluator scored the article on the basis of six criteria: lack of
errors, coverage and balance, sourcing, topicality, neutrality, and
clarity. The scale of the points was 5=excellent, 4=good, 3=adequate,
2=tolerable, 1=poor. The evaluators were also given the chance to comment
on the article under each criterion.
Most of the evaluations were made in early November. They were based on
copies of Wikipedia articles as they were on October 22, 2013.
On the question of lack of errors the most common mark was four. What does
it mean in practice? According to the evaluators, factors such as these:
“No misleading errors, but some imprecision.” (Higgs boson)
“Quite proficient text, hardly any actual factual errors. However, odd
transliteration stands out.” (Icon)
“Geological periods do not quite hit the mark, the factual errors are
insignificant.” (Climate change)
However, the conclusion should not be drawn that Wikipedia can be trusted
naively. A total of 14 articles were graded as poor (1) or tolerable (2).
“Already in the first sentence the definition of photosynthesis is odd, and
several mistakes were found.” (Photosynthesis)
“The article is very uneven in quality, and at times quite propagandistic.”
(Syrian civil war)
Many of the articles containing the most errors were about complicated or
abstract phenomena, such as the eurozone debt crisis, evolution, or the
Syrian civil war. On the other hand, several articles dealing with equally
complex subjects got high marks for lack of errors. For instance, the
articles for the big bang and climate change were praised for their
precision.
Although most of the articles were extolled for their accuracy, it can be
concluded that there is more variation in the quality of articles in
Finnish Wikipedia than in traditional encyclopedias. According to studies,
the same also applies to English Wikipedia: the best articles are
brilliant, but many are weak.
How can a reader assess the quality of a Wikipedia article?
A good way to examine the reliability of any particular article is to check
its factual sources. If the sources of information are listed at the end
of an entry or inside the text itself, the reader can become acquainted
with them and form an opinion of the credibility of the information on that
basis.
According to our study there are serious shortcomings in the Finnish
Wikipedia specifically in the sourcing of articles. No less than 38 per
cent of the articles got a grade of poor (1) or tolerable (2). The
evaluators had harsh criticism for these articles.
“Sources were not used at all and it shows.” (Internal combustion engine)
“On the basis of the text it would seem that sources have mainly included
TV documentaries or children’s dinosaur books.” (Allosaurus)
“Based primarily on a single disputed work. Sources from antiquity are
considered inadequate and hostile.” (Caligula)
The last comment highlights a broader problem. The evaluators were critical
that many articles were based mostly or partly on one source. This
sometimes causes problems: the information from individual works can be
selective, and the interpretations biased.
“The article seeks to be neutral. However, the points of emphasis of the
material that it was based on can clearly be seen.” (Protestant Reformation)
Of Wikipedia’s own ideals, neutrality is one of the most important.
Wikipedia emphasises that the articles need to be written from a “neutral”
point of view.
On the basis of our survey the Finnish Wikipedia has been written mainly
with a balanced approach, with 56 per cent of the articles getting good (4)
or excellent (5) marks. If some articles in the sample were slanted in
their statements or points of view, it usually was not attributable to the
writer’s deliberate partiality.
“The article is positive in its attitude and aims at neutrality. The
problems in the content are connected with the source material and the
writer’s basic knowledge: it is clearly not a result of anything
deliberate.” (Middle Ages)
Of the 134 articles that were surveyed, only six were found to have been
written with a clear bias, in the opinion of the reviewer. The articles
were connected with politics of the United States (Osama bin Laden’s death,
the US Democratic Party, the 9/11 terror attacks, Alan Greenspan) and
events in the Middle East (The civil war in Syria, the second intifada).
“In conditions of war and with an intense conflict dominating, the
neutrality requirement for the article is difficult. Occasionally, when
reading this article it seems, however, that no actual attempt was made to
attain neutrality.” (Civil war in Syria)
“A fairly flattering article for the Democrats. Other points of view would
also exist.” (US Democratic Party)
It is interesting that the experts felt that the wrong kind of “neutrality”
and avoiding expressing points of view is also a problem. If the writer is
afraid to make interpretations of any kind, the article sometimes ends up
being superficial.
“There are no interpretations and consequently no points of view either.”
(Kingdom of Mali)
“Generally a text only lists things instead of pondering them and
presenting well-founded evaluations.” (Finnish composer Kaija Saariaho)
“As the article seeks to be very objective, it starts strangely by
describing Pinochet as a ‘president’ – after all, he was one of the
best-known dictators of the 20th century.” (Augusto Pinochet)
Sometimes individual facts in an article are correct, but the text fails to
mention relevant aspects of the topic, or mentions them too briefly. The
article might also ramble on excessively about insignificant details. In
such a situation the lay reader might get an inaccurate image of, for
instance, what the most important turning points of a country’s history
are, or what the most important achievements of a researcher might be.
This is a quality factor which the evaluators measured by scoring each
article based on “coverage and balance”. It has an impact on how well
Wikipedia can convey an overall image of the matter to the reader.
The experts’ marks on coverage and balance were divided almost equally
between good and bad. So at least half of the articles could have been more
comprehensive and balanced.
“The article primarily describes marriage in the Western tradition, mainly
Roman law and Christianity. Other cultures and religions have been left
completely outside the examination.” (Marriage)
“The article puts far too much emphasis on personal history and even on
related insignificant details. The presentation of the main topic, the
scientific work, is far too short and superficial compared with the rest of
the material.” (Albert Einstein)
Printed encyclopedias were often criticised as containing obsolete
information even when they were fresh off the press. Wikipedia has a better
chance to stay topical, since articles can be updated at any time.
The evaluators gave Finnish Wikipedia fairly high marks for topicality,
with 43 per cent of the articles getting either excellent or good marks,
but on the other hand, 31 per cent were poor or just tolerable in terms of
freshness. When something big happens, breaking news is often added to
Wikipedia quickly, but the follow-up on events is poor.
“All sources are old, which means that the content of the text has not been
updated after 2011 to any practical degree.” (Osama bin Laden’s death)
On the other hand, articles can become obsolete even if the topic is a
mountain range that rose from inside the earth in prehistoric times, or a
notable person who died in the 1930s; research on these topics often still
goes on.
“The biggest problem of the article is that it does not reflect the current
situation of international research.” (Middle Ages)
“Within psychoanalysis, much has happened since Freud, and continues to
happen, and Freud’s ideas have been re-evaluated many times. There is
almost nothing about this in the article.” (Sigmund Freud)
Table of results on the study of Finnish Wikipedia
Grade           1: Lack     2: Coverage   3: Sourcing   4: Topicality   5: Neutrality   6: Clarity
                of errors   and balance
1 (poor)          3 %          5 %           17 %           13 %             5 %            2 %
2 (tolerable)     7 %         24 %           21 %           19 %            16 %           11 %
3 (adequate)     19 %         41 %           34 %           25 %            22 %           31 %
4 (good)         47 %         25 %           21 %           30 %            34 %           42 %
5 (excellent)    23 %          5 %            7 %           13 %            22 %           13 %

% = share of articles within the whole sample of 134 articles
So what are we supposed to think about all of these results? Can
information from Finnish Wikipedia be trusted or not?
There are two points of view on this matter.
Those who have felt so far that everything that is in Wikipedia is true
would do well to re-examine their naiveté to a certain degree.
Those who have felt that Wikipedia’s content is nothing but inaccurate and
biased pseudo-information, should ease up a little. Based on our survey,
this is not the case; as a source of information, Wikipedia is better
than its reputation – which is not a particularly good one.
Arto Lanamäki, who has studied Wikipedia at the University of Bergen, says
that people often take a very suspicious view of Wikipedia, even if they
themselves use it regularly for seeking information.
“In studies, the same article has been placed in the framework of the
Encyclopaedia Britannica and Wikipedia, and brought to different people for
evaluation. It is quite common for people to take a more suspicious view of
the article when it is presented in the framework of Wikipedia”, Lanamäki
says.
On the other hand, there are some good reasons for the doubts: the quality
of Wikipedia fluctuates considerably. Seventy per cent of the articles got
good points for accuracy in our study. If the bar is lowered slightly, then
90 per cent of the articles were at least “adequate” in the view of the
reviewers. That is a vast majority, but you might also ask yourself a
question: would you trust a printed encyclopedia if you knew that every
tenth article in it was inaccurate?
Wikipedia’s undeniable strength, however, is that the information is
updated and upgraded all the time. If you paid good money in the early
2000s for a set of books for your bookshelf, it is already obsolete in many
respects. Wikipedia’s massive popularity indicates that our view of factual
information is in flux. Rising alongside information confirmed by experts
and printed in dignified books is peer-produced information that is
constantly spreading and accumulating on the internet.
If you see that something written in Wikipedia makes no sense, do everyone
a favour and edit the article so that it is better. You don’t even need a
Wikipedia user account for it.
Ask not how Wikipedia can help you – ask how you can help Wikipedia.
(This article was originally published in Finnish in Helsingin Sanomat on
November 30th. The HS working group taking part in the drafting of the
survey included Riikka Haikarainen, Tuomas Kaseva, Niko Kettunen, Olavi
Koistinen, Veikko Lautsi, Siri Markula, Sami Simola and Timo Paukku.)
Finnish Wikipedia has only a few hundred regular writers
By Olavi Koistinen HS
Wikipedia can be written or edited by anyone. About 70,000 people have
made one or more edits on the Finnish Wikipedia.
However, most of the content comes from a much smaller group of active
contributors.
About 200 people write or edit Finnish Wikipedia regularly, says Joonas
Lyytinen, a veteran Wikipedia contributor. He also says that there is an
even more active core group, consisting of only 20–30 people, who produce
a very large amount of content and take care of many administrative tasks
in the Finnish Wikipedia. As part of that group, Lyytinen himself has
written about 2,000 articles for the Finnish version of Wikipedia.
A typical Wikipedia author is a male university student. Often he is a
layman who enjoys writing Wikipedia articles because it is a nice way to
learn new things – by researching and writing about them.
Researchers also write for Wikipedia, but less than laypeople do. At least
some of the researchers feel hesitant with respect to Wikipedia, says
University of Bergen researcher Arto Lanamäki, who has studied the Finnish
Wikipedia.
The articles in Wikipedia are often the result of teamwork, and there is an
ongoing debate about the content in the discussion pages of Wikipedia.
In this debate, a researcher thoroughly familiar with the subject does not
get any credibility over a layman just because of his or her position or
title – the best argument wins.
Arguing with hobbyists can be stressful for many researchers. “Often a
writer has tried to explain his or her point of view by saying ‘don’t you
understand that I have a doctorate and I have researched this topic?’
However, this will not work with Wikipedia”, Lanamäki says.
There is no advance censorship, but active users of Wikipedia monitor new
updates constantly to counter vandalism. Middle school students, for
instance, sometimes try to sabotage articles.
Finnish Wikipedia would benefit from having more writers. There are just 5
million Finnish speakers and only a few real experts in some fields of
knowledge. Wikipedia writers also suffer from a somewhat negative image
among occasional users of the service. This might even hinder some people
from participating in writing Wikipedia articles.
“It has been observed in academic studies that an impediment to
participation in Wikipedia is that people do not want to be labelled
Wikipedia nerds”, Lanamäki says.
Olavi Koistinen HS
--
* I use this address for lists; send personal messages to phoebe.ayers <at>
gmail.com *
Google has released over time a huge amount of open data from or about Wikipedia. Check them out:
http://googleresearch.blogspot.com/2013/12/free-language-lessons-for-comput…
Some highlights:
50,000 Lessons on How to Read: a Relation Extraction Corpus
What is it: A human-judged dataset of two relations involving public figures on Wikipedia: about 10,000 examples of “place of birth” and 40,000 examples of “attended or graduated from an institution.”
40 Million Entities in Context
What is it: A disambiguation set consisting of pointers to 10 million web pages with 40 million entities that have links to Wikipedia. This is another entity resolution corpus, since the links can be used to disambiguate the mentions, but unlike the ClueWeb example above, the links are inserted by the web page authors and can therefore be considered human annotation.
Distributing the Edit History of Wikipedia Infoboxes
What is it: The edit history of 1.8 million infoboxes in Wikipedia pages in one handy resource. Attributes on Wikipedia change over time, and some of them change more than others. Understanding attribute change is important for extracting accurate and useful information from Wikipedia.
Dictionaries for linking Text, Entities, and Ideas
What is it: We created a large database of 175 million strings paired with 7.5 million concepts, annotated with counts, all mined from Wikipedia. The concepts in this case are Wikipedia articles, and the strings are anchor-text spans that link to the concepts in question.
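A string-to-concept dictionary with counts supports a simple most-frequent-sense baseline for entity linking. A minimal sketch, assuming a hypothetical (anchor string, concept, count) triple format; the real dataset's schema may differ:

```python
from collections import defaultdict

def build_dictionary(triples):
    """Build string -> {concept: count} from (anchor, concept, count) rows."""
    d = defaultdict(dict)
    for s, c, n in triples:
        d[s][c] = d[s].get(c, 0) + n
    return d

def most_likely_concept(d, s):
    """Baseline linker: map a string to its most frequently linked concept."""
    candidates = d.get(s)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

Real entity-linking systems condition on surrounding context rather than picking the globally most frequent sense, but this baseline is a common starting point with anchor-text dictionaries of this kind.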
Dario
(ht Nicolas Torzec)
The November 2013 issue of the Wikimedia Research Newsletter is BIG - check it out:
https://meta.wikimedia.org/wiki/Research:Newsletter/2013/November
In this issue:
• 1 What drives people to contribute to Wikipedia? Experiment suggests reciprocity and social image motivations
• 2 Does "cultural imperialism" prevent the incorporation of indigenous knowledge on Wikipedia?
• 3 How PR professionals see Wikipedia: Trends from second US survey
• 4 Report from the inaugural L2 Wiki Research Hackathon
• 5 Briefly
• 5.1 "Iron Law of Oligarchy" (1911) confirmed on Wikia wikis
• 5.2 Twitter activity leads Wikipedia activity by an hour
• 5.3 "Google loves Wikipedia"
• 5.4 New article assessment algorithm scores quality of editors, too
• 5.5 "How do metrics of link analysis correlate to quality, relevance and popularity in Wikipedia?"
• 5.6 Usage of images and sounds is related to the quality of Wikipedia articles
• 5.7 Student perception of Wikipedia's credibility is significantly influenced by their professors' opinion
• 5.8 Non-participation of female students on Wikipedia influenced by school, peers and lack of community awareness
• 5.9 Gender gap coverage in media and blogs
• 5.10 German Wikipedia articles become static while English ones continue to develop
• 5.11 New sockpuppet corpus
• 5.12 Workshop on "User behavior and content generation on Wikipedia"
••• 18 publications were covered in this issue •••
Thanks to Piotr Konieczny, Brian Keegan, Nicolas Jullien, Amir E. Aharoni, Henrique Andrade, Daniel Mietchen, Giovanni Luca Ciampaglia, and Aaron Halfaker for contributing.
Dario Taraborelli and Tilman Bayer
--
Wikimedia Research Newsletter
https://meta.wikimedia.org/wiki/Research:Newsletter/
* Follow us on Twitter/Identi.ca: @WikiResearch
* Receive this newsletter by mail: https://lists.wikimedia.org/mailman/listinfo/research-newsletter
* Subscribe to the RSS feed: http://blog.wikimedia.org/c/research-2/wikimedia-research-newsletter/feed/
** Apologies for multiple postings; please circulate widely **
Websci'14 Call for Data Visualization Challenge
===============================================
We are delighted to announce the Web Science 2014 Visualization Challenge!
The Web has generated huge amounts of data at massive scale, but making
sense of these datasets and representing them in a compact and
easily-interpretable way remains very difficult. The goal of this challenge
is to encourage innovative visualizations of Web data. We particularly
encourage entries that reflect the interdisciplinary spirit of the Web
Science conference. To enable this visualization, we have prepared several
large-scale, easy-to-use, publicly-available datasets:
1. Web traffic data, including more than 200 million HTTP requests from
browsers to servers;
2. Twitter data, including a sample of more than 22
million tweets;
3. Social bookmarking data, consisting of about 430,000 bookmarked pages;
4. Co-authorship of academic papers, consisting of about 21.5 million
papers and 10.8 million authors.
Complete details on these datasets are available here:
http://cnets.indiana.edu/groups/nan/webtraffic/websci14-data. All of the
datasets are stored in simple file formats, so that they can be easily used
without much technical expertise.
We are pleased to offer a cash prize of at least $1000 to be split among
the winning entries. Winners will be announced and displayed at the
Web Science conference in June 2014, presented on the Web Science website
(http://websci14.org), and the winners will be encouraged to present a
poster at the conference describing their work. The entries will be judged
based on four criteria: (1) innovative use of data, (2) clarity of
visualization, (3) quality of design, and (4) potential impact.
Rules
1. For fairness, the visualization must be primarily based on the
data that we provide. Other datasets may be used to augment ours, but these
datasets must be publicly-available and described in detail in the
documentation (see #4 below).
2. The visualization must be a static image, and must be submitted as a
PDF. In addition to the main PDF, please submit a PNG version at a
resolution of about 640x480, for display on Web pages, social media sites,
mobile devices, etc. This PNG version need not contain the full
visualization, but should be an appropriate representation (e.g. a subset
of the full PDF).
3. Please include a separate PDF file containing a description of the
visualization, including: (1) name(s), affiliation(s), and contact
information of the creator(s), (2) the purpose of the visualization, (3)
which dataset(s) were used, (4) a brief description of how the
visualization was created, and (5) any other information you would like to
share with the judges.
4. By submitting your visualization, you agree to allow us to display your
visualization at the conference and on the Web Science website and social
media channels. (We will give proper attribution, of course.) You also
certify that you are the copyright holder of the visualization and are
authorized to give us this permission.
5. Entries are due by 11:59PM Hawaii time on April 15, 2014. Please e-mail
your entry to David Crandall <djcran(a)indiana.edu>. (If you do not receive a
confirmation email within 24 hours, your entry has not been received and
should be re-sent.)
Panel of judges
* Yong-Yeol Ahn, Indiana University
* Katy Borner, Indiana University
* Mark Meiss, Google
* Dimitar Nikolov, Indiana University
* Maximilian Schich, University of Texas
For questions, please contact David Crandall <djcran(a)indiana.edu>.
For more information about the 2014 Web Science Conference, please see
http://websci14.org.
--
Giovanni Luca Ciampaglia
Postdoctoral fellow
Center for Complex Networks and Systems Research
Indiana University
✎ 910 E 10th St ∙ Bloomington ∙ IN 47408
☞ http://cnets.indiana.edu/
✉ gciampag(a)indiana.edu
✆ 1-812-855-7261