Hi!
I am doing a PhD on online civic participation projects
(e-participation). As part of my research, I have carried out a user
survey in which I asked whether respondents had ever edited or created a
page on a wiki. Now I would like to compare the results with the overall
rate of wiki editing/creation at country level.
I've found some country-level statistics on Wikipedia Statistics (e.g.
3,000 editors of Wikipedia articles in Italy) but data for UK and
France are not available since Wikipedia provides statistics by
languages, not by countries. I'm thus looking for statistics on UK and
France (but am also interested in alternative ways of measuring wiki
editing/creation in Sweden and Italy).
I would be grateful for any tips!
Sunny regards, Alina
--
Alina ÖSTLING
PhD Candidate
European University Institute
www.eui.eu
Hi all;
I'm starting a new project, a wiki search engine. It uses MediaWiki,
Semantic MediaWiki and other minor extensions, and some tricky templates
and bots.
I remember Wikia Search and how it failed. It had the mini-article thingy
for the introduction, and then a lot of links compiled by a crawler. Also
something similar to a social network.
My project idea (which still needs a cool name) is different. Although it
uses an introduction and images copied from Wikipedia, and some links from
the "External links" sections, that is only a start. The purpose is for the
community to add, remove and order the results for each term, and to create
redirects for similar terms to avoid duplicates.
Why this? I think that Google PageRank isn't enough. It is frequently
abused by link farms, SEOs and other people trying to push their websites
higher.
Search "Shakira" in Google for example. You see 1) Official site, 2)
Wikipedia 3) Twitter 4) Facebook, then some videos, some news, some images,
Myspace. It wastes 3 or more results on obvious, nice sites (WP, TW, FB).
The wiki search engine puts these sites at the top, along with an
introduction and related terms, leaving all the space below for less
obvious but interesting websites. Also, if you search for "semantic
queries" like "right-wing newspapers" in Google, you won't find real
newspapers but "people and sites discussing right-wing newspapers". Or
latex and LaTeX are shown in the same results pages. These issues can be
resolved with disambiguation result pages.
How do we choose which results go above or below? The rules are not fully
designed yet, but we can put official sites in first place, then .gov
or .edu domains, which are important ones, and later unofficial websites
and blogs, giving priority to the local language, etc. And by reaching
consensus.
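To make the ordering idea concrete, here is a minimal sketch in Python. It is only an illustration, not code from the project: the `Result` fields, the tier numbers and the choice to place blogs after other unofficial sites are all assumptions, since the rules are not designed yet.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Result:
    """One entry on a results page (hypothetical structure)."""
    url: str
    official: bool = False   # flagged by the community as the official site
    is_blog: bool = False    # flagged by the community as a blog
    language: str = "en"     # language code of the target site


def tier(result):
    """Return a coarse priority tier; lower numbers are listed first."""
    host = urlparse(result.url).hostname or ""
    if result.official:
        return 0
    if host.endswith(".gov") or host.endswith(".edu"):
        return 1
    if result.is_blog:
        return 3  # assumption: blogs go after other unofficial websites
    return 2


def rank(results, local_language):
    """Order by tier, then put local-language sites first within a tier.

    sorted() is stable, so ties keep the order chosen by the community.
    """
    return sorted(results, key=lambda r: (tier(r), r.language != local_language))
```

Because the sort is stable, any manual ordering agreed by consensus survives inside each tier; the automatic rules only decide between tiers.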
We can control aggressive spam with spam blacklists, semi-protect or protect
highly visible pages, and use bots or tools to check changes.
It obviously has a CC BY-SA license and results can be exported. I think
that this approach is the opposite of Google's today.
For weird queries like "Albert Einstein birthplace" we can redirect to the
most obvious results page (in this case Albert Einstein) using a hand-made
redirect or by software (some little change in MediaWiki).
You can check a pretty alpha version here http://www.todogratix.es (only
in Spanish for now, sorry), which I'm feeding with some bots.
I think that it is an interesting experiment. I'm open to your questions
and feedback.
Regards,
emijrp
--
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT <http://code.google.com/p/avbot/> |
StatMediaWiki<http://statmediawiki.forja.rediris.es>
| WikiEvidens <http://code.google.com/p/wikievidens/> |
WikiPapers<http://wikipapers.referata.com>
| WikiTeam <http://code.google.com/p/wikiteam/>
Personal website: https://sites.google.com/site/emijrp/
Is favicon only in the Chinese Wikipedia top 100?
It seems so, which is odd if the problem is a web browser bug.
John Vandenberg.
sent from Galaxy Note
On Dec 28, 2012 4:07 PM, "Johan Gunnarsson" <johan.gunnarsson(a)gmail.com>
wrote:
> On Fri, Dec 28, 2012 at 5:33 AM, John Vandenberg <jayvdb(a)gmail.com> wrote:
> > Hi Johan,
> >
> > Thank you for the lovely data at
> >
> > https://toolserver.org/~johang/2012.html
> >
> > I posted that link to my facebook (below if you want to join in
> > there), and a few language specific facebook groups, and there have
> > been some concerns raised about the results, which I'll list below.
> >
> > These lists are getting some traction in the press so it would be good
> > to understand it better.
> >
> > http://guardian.co.uk/technology/blog/2012/dec/27/wikipedia-most-viewed
>
> Cool, cool.
>
> >
> > Why is [[zh:Favicon]] #2?
> >
> > The data doesn't appear to support that
> >
> > http://stats.grok.se/zh/201201/Favicon
> > http://stats.grok.se/zh/latest90/Favicon
>
> My post-processing filtering follows redirects to find the "true"
> title. In this case the page Favicon.ico redirects to Favicon. This is
> probably due to broken browsers trying to load the icon.
>
> >
> > Number 1 in French is a plant native to Asia. The stats for December
> > disagree
> > https://en.wikipedia.org/wiki/Ilex_crenata
> > http://stats.grok.se/fr/201212/Houx_cr%C3%A9nel%C3%A9
>
> The French Ilex_crenata redirects to Houx_crénelé.
>
> Ilex_crenata had huge traffic in April:
> http://stats.grok.se/fr/201204/Ilex_crenata
>
> There are a bunch of spikes like this. I can't really explain it. I
> talked to Domas Mituzas (the maintainer of the original dumps I use)
> yesterday and he suggested it might be bots going crazy for whatever
> reason. I'd love to filter all these false positives, but haven't been
> able to come up with an easy way to do it.
>
> Might be possible with access to logs with the user-agent string, but
> that would probably inflate the dataset size even more. It's already
> past the terabyte. However that could probably be solved by sampling
> (for example) 1/100 of the entries.
>
> Comments and ideas are welcome!
>
> >
> > Number 1 in German is Cul de sac. This is odd, but matches the stats
> > http://stats.grok.se/de/201207/Sackgasse
>
> Right. This one is funny. It has huge traffic on weekdays only.
> Deserted on weekends.
>
> >
> > Number 1 in Dutch is a Chinese mountain. The stats for December disagree
> > http://stats.grok.se/nl/201212/Hua_Shan
>
> July/August agree: http://stats.grok.se/nl/201208/Hua_Shan
>
> >
> > Number 4 in Hebrew is zipper. The stats for December disagree
> > http://stats.grok.se/he/201212/%D7%A8%D7%95%D7%9B%D7%A1%D7%9F
>
> April agrees:
> http://stats.grok.se/he/201204/%D7%A8%D7%95%D7%9B%D7%A1%D7%9F
>
> >
> > Number 2 in Spanish is '@'. This is odd, but matches the stats
> > http://stats.grok.se/es/201212/Arroba_%28s%C3%ADmbolo%29
> >
> > --
> > John Vandenberg
> > https://www.facebook.com/johnmark.vandenberg
>
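A minimal sketch of the redirect-following step described in the quoted reply above, where views of Favicon.ico are counted under Favicon. This is not the actual toolserver code: the dictionary inputs, the function names and the hop limit are assumptions for illustration, and the view counts in the usage note are made up.

```python
from collections import defaultdict


def resolve(title, redirects, max_hops=10):
    """Follow redirects until a non-redirect title is reached.

    `redirects` maps a redirect title to its target (hypothetical input).
    The `seen` set and hop limit guard against redirect loops.
    """
    seen = set()
    while title in redirects and title not in seen and len(seen) < max_hops:
        seen.add(title)
        title = redirects[title]
    return title


def fold_views(views, redirects):
    """Sum page views under the "true" title of each page."""
    totals = defaultdict(int)
    for title, count in views.items():
        totals[resolve(title, redirects)] += count
    return dict(totals)
```

With `views = {"Favicon.ico": 900, "Favicon": 100}` and `redirects = {"Favicon.ico": "Favicon"}`, all 1000 views end up under "Favicon", which is how a page with little direct traffic can still rank high in the yearly list.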
I'm sending this to Wikimedia-l, Wikitech-l, and Research-l in case other people in the Wikimedia movement or staff are interested in "big data" as it relates to Wikimedia. I hope that those who are interested in discussions about WMF editor engagement efforts, WMF fundraising, or WMF HR practices will also find that this email interests them. Feel free to skip straight to the links in the latter portion of this email if you're already familiar with "big data" and its analysis and if you just want to see what other people are writing about the subject.
* Introductory comments / my personal opinion
"Big data" refers to quantities of information so large that they are difficult to analyze and may not be internally related in an obvious way. See https://en.wikipedia.org/wiki/Big_data
I think that most of us would agree that moving much of an organization's information into "the Cloud", and/or directing people to analyze massive quantities of information, will not automatically result in better, or even good, decisions based on that information. Also, I think that most of us would agree that bigger and/or more accessible quantities of data do not necessarily imply that the data are more accurate or more relevant for a particular purpose. Another concern is the possibility of unwelcome intrusions into sensitive information, including the possibility of data breaches; imagine the possible consequences if a hacker broke into supposedly secure databases held by Facebook or the Securities and Exchange Commission.
We have an enormous quantity of data on Wikimedia projects, and many ways that we can examine those data. As this Dilbert strip points out, context is important, and looking at statistics devoid of their larger contexts can be problematic. http://dilbert.com/strips/comic/1993-02-07/
Since data analysis is also something that Wikimedia does in the areas I mentioned previously, I'm passing along a few links for those who may be interested in the benefits and limitations of big data.
* Links:
From the Harvard Business Review
http://hbr.org/2012/04/good-data-wont-guarantee-good-decisions/ar/1
From the New York Times
https://www.nytimes.com/2012/12/30/technology/big-data-is-great-but-dont-fo…
and
https://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-wo…
From the Wall Street Journal. This may be especially interesting to those who are participating in the discussions on Wikimedia-l regarding how Wikimedia selects, pays, and manages its staff.
http://online.wsj.com/article/SB10000872396390443890304578006252019616768.h…
And from English Wikipedia (:
https://en.wikipedia.org/wiki/Big_data
and
https://en.wikipedia.org/wiki/Data_mining
and
https://en.wikipedia.org/wiki/Business_intelligence
Cheers,
Pine
Yikes. I've dropped the ball on some keyword searching where I promised to help.
Worse yet, I've lost the email correspondence with the people I was helping.
I am working on repackaging the PEG Exploratory Parsing tool I built a couple of years ago. This is my experiment in building a wiki "laboratory" for posing and answering a certain class of questions about what people write in Wikipedia.
I've been distracted from this work and now find that my email has glitched so I no longer have the test case I'd hoped to pursue. If I promised to help you, please forgive my tardiness and renew the correspondence.
Thanks and best regards. -- Ward
Dear semantic and non-semantic wiki communities,
please find below the full CfP for CICM (Intelligent Computer
Mathematics), 8-12 July in Bath, UK (submission deadline 8 March).
Wikis are widely used to author and publish mathematical knowledge and
thus particularly relevant to the
* DML (Digital Mathematical Libraries) and
* MKM (Mathematical Knowledge Management)
conference tracks, and I'm sure there are a lot of ongoing activities
that could be presented in the
* Systems & Projects
track as well.
Just some examples of where wiki technology (beyond mere LaTeX to PNG
rendering) has previously been used in connection with mathematical
knowledge:
* 2011 workshop on mathematical wikis (http://www.cs.ru.nl/mwitp/)
* The management of the Mizar Mathematical Library and other collections
of formal mathematical knowledge is being facilitated by wikis
(http://arxiv.org/abs/1107.3209, http://arxiv.org/abs/1005.4552)
* Besides Wikipedia there are further, math-specific community wikis,
e.g. http://www.proofwiki.org, http://www.planetmath.org,
http://michaelnielsen.org/polymath1/
Cheers,
Christoph
--- %< --- %< --- %< --- %< --- %< --- %< --- %< --- %< --- %< --- %< ---
CICM 2013 - Conferences on Intelligent Computer Mathematics
July 8-12, 2013 at University of Bath, Bath, UK
http://www.cicm-conference.org/2013/cicm.php
Call for Papers
----------------------------------------------------------------
As computers and communications technology advance, greater
opportunities arise for intelligent mathematical computation. While
computer algebra, automated deduction, mathematical publishing and
novel user interfaces individually have long and successful histories,
we are now seeing increasing opportunities for synergy among these
areas. The Conferences on Intelligent Computer Mathematics offers a
venue for discussing these areas and their synergy.
The conference will take place at the University of Bath (www.bath.ac.uk),
with James Davenport as the local organiser. It consists of four tracks:
Calculemus
Chair: Wolfgang Windsteiger
Digital Mathematical Libraries (DML)
Chair: Petr Sojka
Mathematical Knowledge Management (MKM)
Chair: David Aspinall
Systems and Projects
Chair: Christoph Lange
As in previous years, there are plans to organise a workshop for
presentations by Doctoral students.
The overall programme will be organised by the General Program Chair
Jacques Carette.
----------------------------------------------------------------
Important dates
----------------------------------------------------------------
Abstract submission: 1 March 2013
Submission deadline: 8 March 2013
Reviews sent to authors: 5 April 2013
Rebuttals due: 8 April 2013
Notification of acceptance: 14 April 2013
Camera ready copies due: 26 April 2013
Conference: 8-12 July 2013
----------------------------------------------------------------
Tracks
----------------------------------------------------------------
==========
Calculemus
==========
Calculemus 2013 invites the submission of original research contributions
to be considered for publication and presentation at the conference.
Calculemus is a series of conferences dedicated to the integration of
computer algebra systems (CAS) and systems for mechanised reasoning like
interactive proof assistants (PA) or automated theorem provers (ATP).
Currently, symbolic computation is divided into several (more or less)
independent branches: traditional ones (e.g., computer algebra and
mechanised reasoning) as well as newly emerging ones (on user interfaces,
knowledge management, theory exploration, etc.) The main concern of the
Calculemus community is to bring these developments together in order to
facilitate the theory, design, and implementation of integrated
mathematical assistant systems that will be used routinely by
mathematicians, computer scientists and all others who need
computer-supported mathematics in their every day business.
All topics in the intersection of computer algebra systems and automated
reasoning systems are of interest for Calculemus. These include but are not
limited to:
* Automated theorem proving in computer algebra systems.
* Computer algebra in theorem proving systems.
* Adding reasoning capabilities to computer algebra systems.
* Adding computational capabilities to theorem proving systems.
* Theory, design and implementation of interdisciplinary systems for
computer mathematics.
* Case studies and applications that involve a mix of computation and
reasoning.
* Case studies in formalization of mathematical theories.
* Representation of mathematics in computer algebra systems.
* Theory exploration techniques.
* Combining methods of symbolic computation and formal deduction.
* Input languages, programming languages, types and constraint languages,
and modeling languages for mathematical assistant systems.
* Homotopy type theory.
* Infrastructure for mathematical services.
===
DML
===
Mathematicians dream of a digital archive containing all peer-reviewed
mathematical literature ever published, properly linked, validated and
verified. It is estimated that the entire corpus of mathematical
knowledge published over the centuries does not exceed 100,000,000
pages, an amount easily manageable by current information technologies.
The track objective is to provide a forum for the development of
math-aware technologies, standards, algorithms and formats towards
fulfillment of the dream of a global digital mathematical library (DML).
Computer scientists (D) and librarians of the digital age (L) are
especially welcome to join mathematicians (M) and discuss many aspects
of DML preparation.
Track topics are all topics of mathematical knowledge management
and digital libraries applicable in the context of DML building --
processing of math knowledge expressed in scientific papers in
natural languages, namely:
* Math-aware text mining (math mining) and MSC classification
* Math-aware representations of mathematical knowledge
* Math-aware computational linguistics and corpora
* Math-aware tools for [meta]data and fulltext processing
* Math-aware OCR and document analysis
* Math-aware information retrieval
* Math-aware indexing and search
* Authoring languages and tools
* MathML, OpenMath, TeX and other mathematical content standards
* Web interfaces for DML content
* Mathematics on the web, math crawling and indexing
* Math-aware document processing workflows
* Archives of written mathematics
* DML management, business models
* DML rights handling, funding, sustainability
* DML content acquisition, validation and curation
===
MKM
===
Mathematical Knowledge Management is an interdisciplinary field of
research in the intersection of mathematics, computer science, library
science, and scientific publishing. The objective of MKM is to develop
new and better ways of managing sophisticated mathematical knowledge,
based on innovative technology of computer science, the Internet, and
intelligent knowledge processing. MKM is expected to serve
mathematicians, scientists, and engineers who produce and use
mathematical knowledge; educators and students who teach and learn
mathematics; publishers who offer mathematical textbooks and
disseminate new mathematical results; and librarians and
mathematicians who catalog and organize mathematical knowledge.
The conference is concerned with all aspects of mathematical knowledge
management. A non-exclusive list of important topics includes:
* Representations of mathematical knowledge
* Authoring languages and tools
* Repositories of formalized mathematics
* Deduction systems
* Mathematical digital libraries
* Diagrammatic representations
* Mathematical OCR
* Mathematical search and retrieval
* Math assistants, tutoring and assessment systems
* MathML, OpenMath, and other mathematical content standards
* Web presentation of mathematics
* Data mining, discovery, theory exploration
* Computer algebra systems
* Collaboration tools for mathematics
* Challenges and solutions for mathematical workflows
====================
Systems and Projects
====================
The Systems and Projects track of the Conferences on Intelligent Computer
Mathematics is a forum for presenting available systems and new and
ongoing projects in all areas and topics related to the CICM conferences:
* Deduction and Computer Algebra (Calculemus)
* Digital Mathematical Libraries (DML)
* Mathematical Knowledge Management (MKM)
* Artificial Intelligence and Symbolic Computation (AISC)
The track aims to provide an overview of the latest developments and
trends within the CICM community as well as to exchange ideas between
developers and introduce systems to an audience of potential users.
----------------------------------------------------------------
Submission Instructions
----------------------------------------------------------------
Submissions to the research tracks must not exceed 15 pages and will be
reviewed and evaluated with respect to relevance, clarity, quality,
originality, and impact. Shorter papers, e.g., for system
descriptions, are welcome. Authors will have an opportunity to respond
to their papers' reviews before the programme committee makes a
decision.
System descriptions and project descriptions should be 2-4 pages and
should present
* newly developed systems,
* systems that have not previously been presented to the CICM community,
or
* significant updates to existing systems.
Systems must be available for download.
Project presentations should describe
* projects that are new or about to start,
* ongoing projects that have not yet been presented to the CICM community,
or
* significant new developments in ongoing previously presented projects.
Presentations of new projects should mention relevant previous work and
include a roadmap that outlines concrete steps. All submissions should
contain links to demos, downloadable systems, or project websites.
Accepted conference submissions from all tracks are intended to be published
as a volume in the series Lecture Notes in Artificial Intelligence (LNAI)
by Springer. In addition to these formal proceedings, authors are permitted
and encouraged to publish the final versions of their papers on arXiv.org.
Work-in-progress submissions are intended to provide a forum for the
presentation of original work that is not (yet) in a suitable form for
submission as a full or system description paper. This includes work
in progress and emerging trends. Their size is not limited, but we
recommend 5-10 pages.
The programme committee may offer authors of rejected formal
submissions to publish their contributions as work-in-progress papers
instead. Depending on the number of work-in-progress papers accepted,
they will be presented at the conference either as short talks or as
posters. The work-in-progress proceedings will be published as a
technical report, as well as online with CEUR-WS.org.
All papers should be prepared in LaTeX and formatted according to the
requirements of Springer's LNCS series (the corresponding style files
can be downloaded from
http://www.springer.de/comp/lncs/authors.html). By submitting a paper
the authors agree that if it is accepted at least one of the authors
will attend the conference to present it.
Electronic submission is done through EasyChair:
http://www.easychair.org/conferences/?conf=cicm2013
----------------------------------------------------------------
Programme Committee
----------------------------------------------------------------
Jacques Carette, McMaster University, Canada
Wolfgang Windsteiger, RISC Institute, JKU Linz, Austria
Petr Sojka, Masaryk University, Faculty of Informatics, Czech Republic
David Aspinall, University of Edinburgh, UK
Christoph Lange, University of Birmingham, UK
Till Mossakowski, DFKI Bremen, Germany
Jónathan Heras, University of Dundee, UK
Josef Urban, Radboud University, Netherlands
Deyan Ginev, Jacobs University Bremen, Germany
Rob Arthan, Queen Mary University of London, UK
Makarius Wenzel, Université Paris-Sud 11, France
Hendrik Tews, TU Dresden, Germany
Simon Colton, Department of Computing, Imperial College, London, UK
Paul Libbrecht, Martin Luther University Halle-Wittenberg, Germany
Cezary Kaliszyk, University of Innsbruck, Austria
Andrea Kohlhase, Jacobs University Bremen, Germany
Yannis Haralambous, Télécom Bretagne, France
Florian Rabe, Jacobs University Bremen, Germany
Akiko Aizawa, NII, The University of Tokyo, Japan
Carsten Schuermann, IT University of Copenhagen, Denmark
Magnus O. Myreen, University of Cambridge, UK
Janka Chlebíková, School of Computing, University of Portsmouth, UK
Richard Zanibbi, Rochester Institute of Technology, US
Michael Kohlhase, Jacobs University Bremen, Germany
Adam Kilgarriff, Lexical Computing Ltd, UK
Leo Freitas, Newcastle University, UK
Frank Tompa, University of Waterloo, Canada
Gudmund Grov, Heriot-Watt University, Edinburgh, UK
Jeremy Avigad, Carnegie Mellon University, US
Stephen Watt, University of Western Ontario, Canada
Temur Kutsia, RISC Institute, JKU Linz, Austria
Manfred Kerber, University of Birmingham, UK
Hoon Hong, North Carolina State University, US
Christoph Lüth, DFKI Bremen, Germany
Thierry Bouche, Université Joseph Fourier (Grenoble), France
Andrea Asperti, University of Bologna, Italy
Jesse Alama, CENTRIA, FCT, Universidade Nova de Lisboa, Portugal
Jiří Rákosník, Institute of Mathematics, Academy of Sciences, Czech Republic
Thomas Hales, University of Pittsburgh, US
Predrag Janičić, Department for Computer Science, University of
Belgrade, Serbia
(more names will be added as confirmations arrive)
--
Christoph Lange, School of Computer Science, University of Birmingham
http://cs.bham.ac.uk/~langec/, Skype duke4701
→ Enabling Domain Experts to use Formalised Reasoning @ AISB 2013
2–5 April 2013, Exeter, UK. Deadline 14 Jan
http://cs.bham.ac.uk/research/projects/formare/events/aisb2013/
→ Intelligent Computer Mathematics, 7–12 Jul 2013, Bath, UK; Deadline 8 Mar
http://cicm-conference.org/2013/
CICM 2013 - Conferences on Intelligent Computer Mathematics
July 8-12, 2013 at the University of Bath, UK
http://www.cicm-conference.org/2013
Call for Workshop Proposals
----------------------------------------------------------------------
As computers and communications technology advance, greater
opportunities arise for intelligent mathematical computation. While
computer algebra, automated deduction, mathematical publishing and
novel user interfaces individually have long and successful histories,
we are now seeing increasing opportunities for synergy among these
areas.
Workshop proposals for CICM 2013 are solicited. Both well-established
workshops and newer or brand new ones are encouraged.
Please provide the following information:
+ Workshop title.
+ Names and affiliations of organizers.
+ Brief description of workshop goals and/or topics.
+ Proposed workshop duration (half a day up to two days is possible).
+ If the workshop has met previously, please include the conference
affiliation for the previous meeting. If the workshop is new,
please indicate so.
CICM conference fees will be levied on a per-day basis, so that
workshop-only participation is possible. The CICM organizers plan to
make available a small amount towards partial reimbursement for travel
expenses of invited speakers. Also, CICM will take care of copying and
distributing informal printed proceedings for workshops that would
like this service, as well as permanently archived open access online
proceedings with CEUR-WS.org.
All proposals should be sent via email to
cicm-organizers(a)jacobs-university.de
for consideration by the CICM 2013 organizers:
James Davenport (University of Bath, UK): Conference Chair
Jacques Carette (McMaster University, Canada): Program Chair
David Aspinall (University of Edinburgh, Scotland): MKM Track Chair
Christoph Lange (Univ of Birmingham, UK): Systems & Projects Track Chair
Petr Sojka (Masaryk University, CZ): DML Track Chair
Wolfgang Windsteiger (RISC, Austria): Calculemus Track Chair
Important dates:
Deadline for proposal submissions: January 28, 2013
Acceptance/rejection notification: February 8, 2013
Workshop dates: July 8-12, 2013
-----------------------------------------------------------------------