Hello,

I'm seeking help in order to make a definitive decision. For some months I have been looking for a Java API that lets me access Wikipedia and get the content of articles. My project is to build a taxonomy of concepts for a given domain.

Details:
1. I have a corpus of domain texts from which I extract a first set of terms that represents the domain.
2. I search Wikipedia for the articles on these terms in order to extract their definitions. The definition of a word helps me find its hypernym. The calls to Wikipedia will be made in a Java loop.
3. I search the definitions of the hypernyms found in the previous step to find their hypernyms, and so on.
4. I draw a graph linking the words to their hypernyms (a sketch of the structure I have in mind follows this list).
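To make step 4 concrete, the graph I have in mind is nothing more than an adjacency map from each word to the set of its hypernyms. A minimal sketch (class and method names are only illustrative):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal sketch of the word -> hypernym graph of step 4:
// an adjacency map from each term to the set of its hypernyms.
public class HypernymGraph {

    private final Map<String, Set<String>> hypernyms =
            new HashMap<String, Set<String>>();

    // Record that 'hypernym' is a broader term for 'word'.
    public void addEdge(String word, String hypernym) {
        Set<String> parents = hypernyms.get(word);
        if (parents == null) {
            parents = new HashSet<String>();
            hypernyms.put(word, parents);
        }
        parents.add(hypernym);
    }

    // All hypernyms recorded so far for 'word' (possibly empty).
    public Set<String> hypernymsOf(String word) {
        Set<String> parents = hypernyms.get(word);
        return parents != null ? parents : new HashSet<String>();
    }
}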
My problem is that for step 2 I cannot make a definitive decision.
1. I wrote Java code to access Wikipedia online. It works, but the speed of my connection determines whether the execution succeeds or fails with a set of exceptions; sometimes the execution gives me only 2 or 3 articles (a minimal sketch of this code follows the list of links below).
2. I tried to use JWPL to process Wikipedia dumps. I failed because I do not have enough RAM.
3. I'm now hesitating between several Java APIs. Please give me your points of view if you have already done something along these lines. I did a serious investigation and found the following links:
1- http://wdm.cs.waikato.ac.nz:8080/wiki/Wiki.jsp?page=Installing%20the%20Java%...
2- http://jwikiapi.sourceforge.net/index.html
3- http://code.google.com/p/gwtwiki/
4- http://www.mediawiki.org/wiki/API%3aMain_page
5- http://jwbf.sourceforge.net/
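For reference, here is roughly what my online-access code from point 1 looks like, reduced to a minimal sketch that goes through the plain MediaWiki web API (link 4). The retry count, timeouts and class names are only illustrative:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

/**
 * Minimal sketch: fetch the wikitext of one article through the plain
 * MediaWiki web API and retry a few times when the connection is slow
 * or drops.
 */
public class WikipediaFetcher {

    private static final String API = "http://en.wikipedia.org/w/api.php";
    private static final int MAX_RETRIES = 3;

    public static String fetchArticleJson(String title) throws IOException {
        String query = API
                + "?action=query&prop=revisions&rvprop=content&format=json"
                + "&titles=" + URLEncoder.encode(title, "UTF-8");

        IOException lastError = null;
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            BufferedReader in = null;
            try {
                HttpURLConnection conn =
                        (HttpURLConnection) new URL(query).openConnection();
                conn.setConnectTimeout(10000);  // fail fast on a dead connection
                conn.setReadTimeout(30000);     // but let a slow one finish
                conn.setRequestProperty("User-Agent",
                        "TaxonomyBuilder/0.1 (research project)");

                in = new BufferedReader(
                        new InputStreamReader(conn.getInputStream(), "UTF-8"));
                StringBuilder body = new StringBuilder();
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line).append('\n');
                }
                return body.toString();         // JSON wrapping the article wikitext
            } catch (IOException e) {
                lastError = e;                  // slow or broken connection: retry
            } finally {
                if (in != null) {
                    try { in.close(); } catch (IOException ignored) { }
                }
            }
        }
        throw lastError;
    }

    public static void main(String[] args) throws IOException {
        // The Java loop over the domain terms mentioned in step 2.
        String[] terms = { "Taxonomy", "Hypernym" };
        for (String term : terms) {
            String json = fetchArticleJson(term);
            System.out.println(term + " -> "
                    + json.substring(0, Math.min(200, json.length())));
        }
    }
}

Even with the retries, a long run over many terms still depends heavily on the connection, which is why I'm also considering the dump-based APIs in the links above.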
Please share your suggestions.
Regards,
Khalida Ben Sidi Ahmed
On 12/1/2011 6:56 PM, Khalida BEN SIDI AHMED wrote:
Hello, I'm seeking help in order to make a definitive decision. For some months I have been looking for a Java API that lets me access Wikipedia and get the content of articles. My project is to build a taxonomy of concepts for a given domain.
Look up DBpedia. They extract a lot of stuff out of Wikipedia. They have open-source extraction software that you can download, run and modify. If you like DBpedia, you'll probably also like Freebase.
Right now Freebase is assimilating concepts from WordNet, which would help you on your mission.
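To make that concrete, here is a rough Java sketch of pulling "broader-than" relations out of DBpedia's public SPARQL endpoint. Treating dcterms:subject plus skos:broader as an approximation of hypernymy is my own assumption, and the class name and example resource are only illustrative:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Rough sketch: ask DBpedia's public SPARQL endpoint for the broader
// categories of a concept, as a stand-in for hypernym links.
public class DbpediaBroaderLookup {

    public static void main(String[] args) throws IOException {
        String sparql =
              "PREFIX dcterms: <http://purl.org/dc/terms/> "
            + "PREFIX skos: <http://www.w3.org/2004/02/skos/core#> "
            + "SELECT ?broader WHERE { "
            + "  <http://dbpedia.org/resource/Dog> dcterms:subject ?cat . "
            + "  ?cat skos:broader ?broader . "
            + "} LIMIT 20";

        URL url = new URL("http://dbpedia.org/sparql?query="
                + URLEncoder.encode(sparql, "UTF-8"));
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Ask for SPARQL JSON results via standard content negotiation.
        conn.setRequestProperty("Accept", "application/sparql-results+json");

        BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);  // JSON bindings: one broader category URI each
        }
        in.close();
    }
}

The same loop you planned for the Wikipedia definitions would work here, just issuing SPARQL queries instead of fetching and parsing article text.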