JWPL - Java Wikipedia Library
version 0.3 beta is now available
http://www.ukp.tu-darmstadt.de/software/JWPL
INTRODUCTION
Lately, Wikipedia has been recognized as a promising lexical semantic resource. We present JWPL, a free Java-based Wikipedia application programming interface that enables the use of Wikipedia as an NLP resource by providing efficient programmatic access to the knowledge therein.
FUNCTIONALITY
Fast access to:
- article text
- categories
- redirects
- links between articles (ingoing and outgoing).
Discrimination between
- article pages
- disambiguation pages
- redirect pages.
Available languages:
- English
- German
- Czech
- Ukrainian
Other languages will be added step by step.
DOWNLOAD
JWPL Java library http://www.ukp.tu-darmstadt.de/software/JWPL
Wikipedia data (with database schema optimized for large-scale NLP tasks) ftp://ftp.tu-darmstadt.de/pub/tud/informatik/JWPL_data
LICENCE
JWPL is free for non-profit and non-commercial use.
ABOUT
JWPL was developed by the Ubiquitous Knowledge Processing Lab at Darmstadt University of Technology.
http://www.ukp.tu-darmstadt.de
Why don't you release this under a free license so that the Wikimedia Foundation could use it?
On Jul 2, 2007, at 8:33 AM, Torsten Zesch wrote:
JWPL - Java Wikipedia Library version 0.3 beta is now available http://www.ukp.tu-darmstadt.de/software/JWPL
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Pywikipediabot provides this functionality under a free license.
/Brian
On 7/2/07, Jimmy Wales jwales@wikia.com wrote:
Why don't you release this under a free license so that the Wikimedia Foundation could use it?
Hi, it is nice that Pywikipediabot provides certain functionality. The question is how usable it is in the first place when this information is needed interactively. Also, when there are more tools that do a great job, we will get a situation where advances in one tool egg on the people behind another tool to do even better.
Thanks, GerardM
On 7/2/07, Brian Brian.Mingus@colorado.edu wrote:
Pywikipediabot provides this functionality under a free license.
/Brian
Hi,
You may also be interested in the REST way: http://www.mediawiki.org/wiki/API
Regards, /Mike/
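For readers who want to try the API route, here is a minimal sketch that only builds a request URL for the raw text of one article. The endpoint `https://en.wikipedia.org/w/api.php`, the helper name `build_query_url`, and the chosen parameters are assumptions for illustration; none of them appear in this thread, so check the API page linked above before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint; it is not stated anywhere in this thread.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_query_url(title, endpoint=API_ENDPOINT):
    """Return a URL asking the API for the latest wikitext of `title`.

    Hypothetical helper: it only builds the URL and sends no request.
    """
    params = {
        "action": "query",     # read-only query module
        "prop": "revisions",   # ask for revision data
        "rvprop": "content",   # include the revision text
        "titles": title,
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query_url("Natural language processing"))
```

Fetching the URL (for example with `urllib.request.urlopen`) is left out on purpose so that the sketch runs without network access.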
GerardM wrote:
Hi, it is nice that Pywikipediabot provides certain functionality. The question is how usable it is in the first place when this information is needed interactively. Also, when there are more tools that do a great job, we will get a situation where advances in one tool egg on the people behind another tool to do even better.
Thanks, GerardM
We used a Python MySQL library to access the article text and it was quite fast. The critical step for increased speed was the following index:
dbCursor.execute("ALTER TABLE page ADD INDEX (page_title);")
We've got several thousand lines of Python code that are already GPL'ed (since the copyleft notice goes at the top, it's the first code I write ;) and will be released at Wikimania.
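The effect of such an index can be sketched without a MySQL server. The example below swaps in Python's built-in `sqlite3` module for the MySQL library mentioned above, so it is an illustration and not the code referred to in this message; the two-column `page` table, the index name `idx_page_title`, and the helper `query_plan` are hypothetical.

```python
import sqlite3


def query_plan(conn, title):
    """Return SQLite's plan text for a lookup by page_title (hypothetical helper)."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT page_id FROM page WHERE page_title = ?",
        (title,),
    ).fetchall()
    # The last column of each plan row is the human-readable detail string.
    return " ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Simplified stand-in for the real page table: only two columns.
conn.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT)")
conn.executemany(
    "INSERT INTO page (page_title) VALUES (?)",
    [("Article_%d" % i,) for i in range(1000)],
)

plan_before = query_plan(conn, "Article_500")  # no index yet: full table scan
conn.execute("CREATE INDEX idx_page_title ON page (page_title)")
plan_after = query_plan(conn, "Article_500")   # the lookup can now use the index

print(plan_before)
print(plan_after)
```

The exact wording of the plan text differs between SQLite versions, but the second plan should name `idx_page_title` while the first one should not.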
On 7/2/07, GerardM gerard.meijssen@gmail.com wrote:
Hi, it is nice that Pywikipediabot provides certain functionality. The question is how usable it is in the first place when this information is needed interactively. Also, when there are more tools that do a great job, we will get a situation where advances in one tool egg on the people behind another tool to do even better.
Thanks, GerardM
Gerard provides an important insight below: The more we are able to decouple the software components that run Wikipedia, the more open it becomes to innovation. It is much harder to change one large piece of software than it is to change and innovate on multiple components that are decoupled from each other through well-defined standards.
I'm making this point because the recent announcement of Wiki Creole is an important step in this direction. It is only a subset of what MediaWiki provides, but it may work well for 90% of all users, and it can serve as a critical standard that decouples a (UI-less) wiki engine, used through services, from wiki editors and from additional wiki tooling like bots and converters.
I don't understand MediaWiki in enough detail, but I would argue that separating the rendering from the provision and storage of wiki pages would save the WMF a lot of money. If you managed to offload the rendering to clients, I'd assume the server capacity you provide could be reduced, perhaps significantly.
Dirk
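The separation described above can be illustrated with a toy sketch: one component only stores and serves raw markup, and a separate client-side function renders it. The names `PageStore` and `render_creole_subset` are hypothetical, and the renderer handles only three Wiki Creole constructs (`= Heading`, `**bold**`, `//italic//`); it is nowhere near a full Creole or MediaWiki parser.

```python
import html
import re


class PageStore:
    """Storage side: keeps and serves raw markup, never renders it."""

    def __init__(self):
        self._pages = {}

    def put(self, title, markup):
        self._pages[title] = markup

    def get_raw(self, title):
        return self._pages[title]


def render_creole_subset(markup):
    """Client side: turn a tiny Creole subset into HTML.

    Known limitation of this sketch: plain URLs containing "//" would be
    mistaken for italic markers.
    """
    out = []
    for line in markup.splitlines():
        text = html.escape(line.strip())
        # Inline markup: **bold** and //italic//
        text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
        text = re.sub(r"//(.+?)//", r"<em>\1</em>", text)
        heading = re.match(r"^(=+)\s*(.*?)\s*=*$", text)
        if heading:
            level = min(len(heading.group(1)), 6)
            out.append("<h%d>%s</h%d>" % (level, heading.group(2), level))
        elif text:
            out.append("<p>%s</p>" % text)
    return "\n".join(out)


store = PageStore()
store.put("Sandbox", "= Sandbox\nThis is **bold** and //italic// text.")
print(render_creole_subset(store.get_raw("Sandbox")))
```

Because `PageStore` never touches the markup, the rendering function could run in any client that understands the same markup standard, which is the decoupling the message argues for.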
GerardM wrote:
Hi, it is nice that Pywikipediabot provides certain functionality. The question is how usable it is in the first place when this information is needed interactively. Also, when there are more tools that do a great job, we will get a situation where advances in one tool egg on the people behind another tool to do even better.
Thanks, GerardM
There is a Java program under a free license for searching for related terms in Wikipedia: Synarcher. See http://synarcher.sourceforge.net
It contains functions to access Wikipedia data and several algorithms (see the Bibliography on the site). At the moment, however, the Wikipedia API in Synarcher is not as clear as JWPL's (I think it is too complex now). I am planning to improve the design of the Synarcher code in the future.
Jimmy Wales wrote: Why don't you release this under a free license so that the Wikimedia Foundation could use it?
Another GNU GPL-licensed Python tool is WikiXRay, which is oriented more toward automating research jobs.
In the coming days we will release a major update of the tools and graphics, as explained at http://meta.wikimedia.org/wiki/Talk:WikiXRay#Following_Updates
The new Python parser will be (as far as we know) the most complete and precise parser for processing the XML dumps for research purposes.
Regards.
Felipe Ortega.
Jimmy Wales jwales@wikia.com wrote: Why don't you release this under a free license so that the Wikimedia Foundation could use it?