I'm adding some tweaks to the WikiXRay parser of meta-history dumps. I
now extract internal links, external links, and so on, but I'd also
like to extract the plain text (without HTML code and, possibly, also
filtering out the remaining wiki markup).
Does anyone know a good Python library to do that? I believe there
should be something out there, as there are bots and crawlers that
automate data extraction from one wiki to another.
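In case a sketch helps make the kind of filtering I mean concrete, here
is a rough, illustrative pass using only Python's standard re module
(the strip_markup name and the patterns are just mine, not anything
from WikiXRay, and a proper library would handle nesting and corner
cases much better):

    import re

    def strip_markup(wikitext):
        """Very rough plain-text extraction from MediaWiki markup."""
        text = wikitext
        # Drop templates like {{...}} (non-nested only).
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)
        # Keep only the label of internal links: [[Target|label]] -> label.
        text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)
        # Keep only the label of external links: [http://... label] -> label.
        text = re.sub(r"\[(?:https?|ftp)://\S+\s*([^\]]*)\]", r"\1", text)
        # Remove HTML/XML tags such as <ref> markers and <br/>.
        text = re.sub(r"<[^>]+>", "", text)
        # Strip bold/italic quoting and heading markers.
        text = re.sub(r"'{2,}", "", text)
        text = re.sub(r"={2,}", "", text)
        return text

For example, strip_markup("'''Foo''' is a [[Bar|language]].") returns
"Foo is a language.", but anything more robust than this is exactly why
I am looking for a library.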
Thanks in advance for your comments.
---------- Forwarded message ----------
From: lorenzo benussi <lorenzo.benussi(a)unito.it>
Date: 21 Jan 2008 22:51
Subject: [Icommons] Call for paper - CREATING VALUE THROUGH DIGITAL COMMONS
CALL FOR PAPERS
TRACK WITHIN THE EURAM Conference 2008
CREATING VALUE THROUGH DIGITAL COMMONS.
How collective management of IPRs, open innovation models, and digital
communities shape industrial dynamics in the 21st century.
Ljubljana & Bled – May 15th-17th, 2008
The track, hosted within the EURAM Conference 2008
(http://www.euram2008.org), focuses on how digital commons (DC) create
value through open ways
of managing knowledge and innovation. We encourage the submission of
papers addressing how open accessibility, mainly through digital
networks, is affecting economic activities.
In recent years, information has become a primary wealth-creating
asset, while technological developments have transformed the
production process from physically-based to knowledge-based. As a
consequence, the management of technology and knowledge is a key
factor in competing in the market.
Following this path, new ways of managing IPRs that foster rather than
limit access to information (Benkler, 2006) have emerged as reliable
business opportunities for firms in different sectors (software,
biotechnology, pharmaceuticals, media, industrial design).
Moreover, the creativity and competitiveness of companies are
benefiting from open production and innovation methods (Chesbrough,
2003; von Hippel, 2005), relying on an emergent division of labour,
collective ownership of intellectual property, and information
sharing in on-line communities. Leading examples of these new trends
are the success of Free/Open Source Software (FOSS) solutions; the
increasing use of Creative Commons licenses; the phenomenal success of
the on-line encyclopaedia Wikipedia; and, more generally, the
flourishing of user-generated content.
This track aims to contribute to the research agenda on the economic
exploitation of digital commons, addressing intriguing research
questions such as: how is it possible to profit from public knowledge?
How does the open model perform compared to the proprietary one? Do
digital commons favour the creation of new ventures? How can
incentives and the division of labour be managed in open environments?
What are the strategies for competing in such new markets?
Extended abstracts will be considered, but preference will be given
to full papers. We invite well-crafted papers contributing original
ideas on sectors such as software, biotechnology, media,
pharmaceuticals, and industrial design. All submissions will undergo a
double-blind review process.
A non-exhaustive list of themes is as follows:
- BUSINESS MODELS: taxonomies and case studies on how firms
extract value from digital commons; sustainability of open business
models; relationships between firms and virtual communities; division
of labour and competitive strategies in open environments;
- OPEN INSTITUTIONAL REGIMES: new ways of managing IPRs;
comparison between open and traditional IPR regimes; free riding vs.
trust in on-line communities; policies to increase knowledge
production through sharing and re-use;
- FREE/OPEN SOURCE ENTREPRENEURSHIP: FOSS-based firms as a
special case of New Technology-Based Firms; fund-raising by FOSS-based
enterprises; FOSS strategies of large and small software companies;
hybridization between commercial and Free/Open Source software; FOSS
strategies in developing countries;
- OPEN INNOVATION: internal vs. external sources of knowledge;
new ways of organising R&D; types of knowledge and models of
governance; open management and integration of intangible assets;
assessment of users' integration inside firms' boundaries; on-line
community building and management;
- COMMONS-BASED CREATIVITY: how public availability of content
through digital networks fosters creativity; analysis of collective
invention/creation processes (Wikipedia, Linux, SETI@home); reuse and
remixing of existing content as a lever of creativity; economic
exploitation of publicly licensed content.
We are making arrangements for a Special Issue of an International
Journal and are planning to invite two keynote speakers from among
the most important scholars in the field.
Cristina Rossi, Politecnico di Milano, cristina1.rossi(a)polimi.it
Lorenzo Benussi, University of Turin, lorenzo.benussi(a)unito.it
Jean Michel Dalle, Université Pierre et Marie Curie, jean-michel.dalle(a)upmc.fr
Authors should submit their potential contribution to this EURAM track
by 27 January 2008, 24:00 (CET), through the conference web site.
Decisions on paper acceptance will be made by 22 February 2008.
Laboratory of Economics of Innovation "F. Momigliano"
Department of Economics "S. Cognetti de Martiis"
University of Turin
Via Po 53
10124 Turin, Italy
Tel +39 011 6704980
Fax +39 011 6703895
"Knowledge is like a candle. Even as it lights a new candle, the
strength of the original flame is not diminished." Thomas Jefferson
For those who were interested in a MediaWiki grammar etc., here is a
relevant announcement:
For research purposes as well as the Wiki Creole community's
convenience, we are making our EBNF grammar, the XML schema definition,
and the to/from XML transformations available. You can use these
specifications to create your own parsers as well as use standard
technology (DOM, XSLT) to work with wiki pages and display or save them.
For more, see the dedicated Wiki Creole page at
http://www.riehle.org/wiki-creole as well as the WikiCreole community at
Phone: + 1 (650) 215 3459
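As a rough illustration of the DOM/XSLT route mentioned above, something
along these lines works with the lxml package in Python (both file
names below are placeholders, not files shipped with the Wiki Creole
specifications):

    from lxml import etree

    # Parse the XML serialisation of a wiki page and an XSLT stylesheet
    # that renders it to HTML; both file names are hypothetical.
    page = etree.parse("page.xml")
    transform = etree.XSLT(etree.parse("creole-to-html.xsl"))

    # Apply the transformation and print the resulting document.
    html = transform(page)
    print(etree.tostring(html, pretty_print=True).decode("utf-8"))

The same parsed document can just as well be walked with any DOM-style
API instead of, or in addition to, running it through XSLT.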
I am trying to get hold of the latest dump of all English pages, all
revisions. However, I am having real trouble tracking it down.
The most recent dump of the file (18th October 2007) seems to have
The directory for the December 18th dump is a blank page:
The progress page for all dumps shows 'abort' across the board:
Does anyone know what is going on here?
I would really appreciate it if someone could point me in the right
direction.
Thank you for your time,
University of Teesside
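For what it is worth, once a dump run does complete, pulling the file
down can be scripted; here is a minimal sketch (the host, date, and
file name in the URL are placeholders following the usual enwiki
naming scheme, so check the dump index for a run that actually
finished):

    import urllib.request

    # Placeholder URL: adjust the date and file name to a dump run that
    # really completed according to the status pages.
    DUMP_URL = ("http://download.wikimedia.org/enwiki/20071018/"
                "enwiki-20071018-pages-meta-history.xml.bz2")

    def fetch(url, dest):
        """Stream a very large dump file to disk in 1 MiB chunks."""
        with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
            while True:
                chunk = response.read(1024 * 1024)
                if not chunk:
                    break
                out.write(chunk)

    fetch(DUMP_URL, "enwiki-20071018-pages-meta-history.xml.bz2")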
Hey everyone, if anyone has a full history dump of the English Wikipedia and
is willing to give me a copy, let me know. I can download at up to roughly
10MB/s from a high speed source or I can send you a hard drive and pay for
all postage. I'm interested in the most recent dump available.
Happy New Years =)