Hi all, I am new to the Wikipedia API. Can you help me with the following? I want to grab all the content of the "United States of America" article into a text file, without images. I am looking for a response in text format.
How can I do that? I am looking for the contents of the http://en.wikipedia.org/wiki/United_States page.
I constructed this URL: http://en.wikipedia.org/w/api.php?format=xml&action=query&titles=uni...
But I am not getting what I want. :( Maybe I am missing something basic. I based it on http://en.wikipedia.org/w/api.php.
1. How can I get the content of whatever string I give in the query? Please help me with the URL.
2. I am trying to put this in a text file. Can I get the response in text format, other than XML and JSON?
3. In the United States example, I want to get the first column of the cities (Leading population centers). How can I get that?
-pavi
On Thu, Aug 9, 2012 at 8:16 AM, Pavan Kumar pavankumarstudent@yahoo.com wrote:
Hi all, I am new to the Wikipedia API. Can you help me with the following? I want to grab all the content of the "United States of America" article into a text file, without images. I am looking for a response in text format.
How can I do that? I am looking for the contents of the http://en.wikipedia.org/wiki/United_States page.
I constructed this URL:
http://en.wikipedia.org/w/api.php?format=xml&action=query&titles=uni...
But I am not getting what I want. :( Maybe I am missing something basic.
A simple mistake: you are writing the article title in lowercase.
Try with http://en.wikipedia.org/w/api.php?format=xml&action=query&titles=Uni...
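Since the request URLs in this thread are truncated, here is a minimal sketch, assuming Python 3 and the standard `action=query&prop=revisions&rvprop=content` parameters, of how a request for the wikitext of an arbitrary title could be assembled. The `redirects` flag and the helper name `build_content_url` are additions for illustration, not part of the URLs above.

```python
from urllib.parse import urlencode

# English Wikipedia's API endpoint (served over HTTPS today)
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_content_url(title, fmt="json"):
    """Return a URL asking for the current wikitext of `title`.

    The API normalizes the title itself (on Wikipedia the first letter is
    capitalized), and redirects=1 asks it to follow redirect pages to
    their targets.
    """
    params = {
        "action": "query",
        "prop": "revisions",   # read the page's revisions...
        "rvprop": "content",   # ...and include their wikitext
        "redirects": "1",
        "titles": title,       # urlencode escapes spaces and symbols
        "format": fmt,
    }
    return API_ENDPOINT + "?" + urlencode(params)
```

For example, `build_content_url("United States")` yields `https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&redirects=1&titles=United+States&format=json`; the reply then lists any title normalization and redirect the API applied.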
I based it on http://en.wikipedia.org/w/api.php.
1. How can I get the content of whatever string I give in the query? Please help me with the URL.
2. I am trying to put this in a text file. Can I get the response in text format, other than XML and JSON?
See the list of formats at https://www.mediawiki.org/wiki/API:Data_formats#Output
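The API answers in a structured format, so one hedged way to end up with a plain-text file is to request JSON and write the wikitext out yourself. This sketch assumes Python 3 and the reply layout of `prop=revisions` (pages keyed by page ID with the text under `"*"`, or under `slots.main` when `rvslots=main` is sent); `extract_wikitext` and `save_as_text` are hypothetical helper names. Note that the result is wikitext markup, so images still appear as `[[File:...]]` links rather than being removed.

```python
def extract_wikitext(response):
    """Map each returned page title to its wikitext.

    `response` is the already-decoded JSON reply (a dict). Pages that do
    not exist carry no "revisions" key and are skipped.
    """
    pages = response.get("query", {}).get("pages", {})
    if isinstance(pages, dict):  # default format: {page_id: page}
        pages = pages.values()   # formatversion=2 already gives a list
    texts = {}
    for page in pages:
        revisions = page.get("revisions")
        if not revisions:
            continue
        rev = revisions[0]
        if "slots" in rev:       # layout used when rvslots=main is sent
            rev = rev["slots"]["main"]
        # legacy JSON puts the text under "*", formatversion=2 under "content"
        text = rev.get("*", rev.get("content"))
        if text is not None:
            texts[page["title"]] = text
    return texts


def save_as_text(response, path):
    """Write every page's wikitext to one UTF-8 plain-text file."""
    with open(path, "w", encoding="utf-8") as out:
        for title, text in extract_wikitext(response).items():
            out.write(title + "\n\n" + text + "\n")
```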
3. In the United States example, I want to get the first column of the cities (Leading population centers). How can I get that?
Extracting content from inside the article will require you to do some parsing of the wikitext.
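As a hedged illustration of that kind of parsing: if the data you need sits in an ordinary wikitext table (the "Leading population centers" box may instead be generated by a template, in which case this does not apply as-is), its first column could be pulled out with a small line-based parser. `first_column` is a hypothetical helper; it does not strip cell attributes and ignores rowspan/colspan and nested tables.

```python
def first_column(wikitable):
    """Return the first data cell of every row of a simple wikitext table.

    Handles '{|' ... '|}' tables whose rows are separated by '|-' and whose
    cells start with '|' (one per line, or several joined by '||').
    Header cells ('!') are skipped; cell attributes are not stripped.
    """
    column = []
    expecting_first_cell = False
    for raw_line in wikitable.splitlines():
        line = raw_line.strip()
        if line.startswith("{|"):
            expecting_first_cell = True   # the first row needs no '|-'
        elif line.startswith("|-"):
            expecting_first_cell = True   # a new row begins
        elif line.startswith("|}") or line.startswith("|+"):
            continue                      # table end or caption
        elif line.startswith("|") and expecting_first_cell:
            # '||' separates cells on one line; the single '|' inside
            # [[target|label]] links is left alone
            column.append(line[1:].split("||")[0].strip())
            expecting_first_cell = False  # skip the row's other cells
    return column
```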
Thank you for the reply. With the case change, that worked. But when I try to get the data in JSON, which I think is easier to parse: http://en.wikipedia.org/w/api.php?format=json&action=query&titles=Un...
I see that I am getting a lot of data like:
\u0906\u0923\u093f \u092a\u094d\u0930\u0926\u0947\u0936]]\n[[ms:Negeri dan wilayah di India]]\n[[nl:Lijst van staten en territoria van India]]\n[[ne:\u092d\u093e\u0930\u0924\u0915\u093e \u0930\u093e\u091c\u094d\u092f\u0939\u0930\u0941 \u0930 \u0915\u0947\u0928\u094d\u0926\u094d\u0930 \u0936\u093e\u0938\u093f\u0924 \u0930\u093e\u091c\u094d\u092f\u0939\u0930\u0941]]\n[[ja:\u30a4\u30f3\u30c9\u306e\u5730\u65b9\u884c\u653f\u533a\u753b]]\n[[no:Indias delstater og territorier]]\n[[nn:Statar og territorium i India]]\n[[or:\u0b2d\u0b3e\u0b30\u0b24\u0b30
Is my query correct? All I need is to get the Leading population centers.
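The `\uXXXX` sequences in that output are JSON's standard escaping of non-ASCII characters; here they spell out interlanguage links such as `[[ne:...]]` and `[[ja:...]]` near the end of the wikitext, and any JSON parser turns them back into text. A minimal sketch, assuming Python 3; the regular expression for interlanguage links is an approximation (it matches any `[[xx:...]]` or `[[xx-yy:...]]` link with a lowercase prefix, which can also catch some interwiki links), and the helper names are hypothetical.

```python
import json
import re

# Approximate pattern for interlanguage links such as [[ms:Title]] or
# [[zh-min-nan:Title]], plus the newline that usually follows each one.
INTERLANGUAGE_LINK = re.compile(r"\[\[[a-z]{2,3}(?:-[a-z]+)*:[^\]]*\]\]\n?")


def load_api_json(raw_bytes):
    """Decode an API reply; json.loads turns \\uXXXX escapes into characters."""
    return json.loads(raw_bytes.decode("utf-8"))


def strip_interlanguage_links(wikitext):
    """Remove interlanguage links so only the article body remains."""
    return INTERLANGUAGE_LINK.sub("", wikitext)
```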
(In reply to: Platonides (platonides@gmail.com), Thursday, August 9, 2012 4:14 AM, "Re: [Mediawiki-api] getting data for a topic")
If your work focuses on getting structured data, I recommend using dbpedia.org or freebase.com. Both structure Wikipedia data and provide structured query languages.
-- Tommy Chheng
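A sketch of what the DBpedia suggestion could look like against its public SPARQL endpoint (Freebase has since been shut down, so it is left out). It assumes Python 3, the endpoint URL `https://dbpedia.org/sparql` with its Virtuoso-style `format` parameter, and ontology terms (`dbo:City`, `dbo:country`, `dbo:populationTotal`) that should be checked against the current DBpedia ontology; the query and helper names are hypothetical. Results come back in the standard W3C SPARQL JSON results format.

```python
from urllib.parse import urlencode

# DBpedia's public SPARQL endpoint (assumed; check it is still current)
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"

# Hypothetical query: the twenty most populous US cities in DBpedia's data
CITIES_QUERY = """\
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:country dbr:United_States ;
        dbo:populationTotal ?population .
}
ORDER BY DESC(?population)
LIMIT 20
"""


def sparql_url(query):
    """Build a GET URL; the `format` parameter asks the endpoint for JSON."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return DBPEDIA_SPARQL + "?" + urlencode(params)


def bindings_to_rows(results, variables):
    """Flatten SPARQL JSON results into tuples (None for unbound variables)."""
    return [
        tuple(b[v]["value"] if v in b else None for v in variables)
        for b in results["results"]["bindings"]
    ]
```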
(In reply to: Pavan Kumar, Thursday, August 9, 2012 10:53 PM)
Mediawiki-api mailing list
Mediawiki-api@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
Thanks, Tommy Chheng, for the reply.
My requirement is to get information about whatever I put in the query. For example, I query the following page: http://en.wikipedia.org/wiki/List_of_United_States_cities_by_population
I want to get all the states in the USA, so I was looking at whether I can get JSON output and use some tool to extract all the states from it.
I was a bit hesitant to use new APIs, but I will look into that as well.
Can you tell me if there are any other good tools that can process the JSON to get the information I am looking for?
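Python's standard library can already handle the JSON and HTML, so no extra tool is strictly needed. As a hedged sketch: ask the API to render the list page with `action=parse&prop=text`, take the HTML from the reply, and read one column out of the first table with class `wikitable`. Which column holds the state, and whether the page's layout still matches, are assumptions to check against the page itself; `parse_url`, `html_from_parse_reply` and `column_from_html` are hypothetical names, and nested tables, rowspans and footnote markers are not handled.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


def parse_url(page):
    """URL asking the API to render `page` and return only its HTML."""
    params = {"action": "parse", "page": page, "prop": "text", "format": "json"}
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)


def html_from_parse_reply(reply):
    """Pull the HTML out of a decoded action=parse JSON reply."""
    text = reply["parse"]["text"]
    return text["*"] if isinstance(text, dict) else text  # legacy vs formatversion=2


class _ColumnCollector(HTMLParser):
    """Collect the text of cell number `index` (0-based) in every row of the
    first <table class="wikitable">; header cells are included."""

    def __init__(self, index):
        super().__init__()
        self.index = index
        self.state = "before"   # before -> inside -> after the table
        self.cell = -1
        self.buffer = None      # a list while capturing a cell, else None
        self.values = []

    def handle_starttag(self, tag, attrs):
        if self.state == "before" and tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if "wikitable" in classes:
                self.state = "inside"
        elif self.state == "inside" and tag == "tr":
            self.cell = -1
        elif self.state == "inside" and tag in ("td", "th"):
            self.cell += 1
            self.buffer = [] if self.cell == self.index else None

    def handle_endtag(self, tag):
        if self.state != "inside":
            return
        if tag in ("td", "th") and self.buffer is not None:
            self.values.append("".join(self.buffer).strip())
            self.buffer = None
        elif tag == "table":
            self.state = "after"

    def handle_data(self, data):
        if self.buffer is not None:
            self.buffer.append(data)


def column_from_html(html, index):
    """Return the texts of column `index` from the first wikitable in `html`."""
    collector = _ColumnCollector(index)
    collector.feed(html)
    collector.close()
    return collector.values
```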
(In reply to: Tommy Chheng (tommy.chheng@gmail.com), Thursday, August 9, 2012 10:56 PM, "RE: [Mediawiki-api] getting data for a topic")
Hi, please refer to Freebase or DBpedia for getting that information in JSON form.
Here are some links that might be helpful:
http://stackoverflow.com/questions/9989727/freebase-api-for-sorting-by-city-...
https://plus.google.com/u/0/109936836907132434202/posts/AXWkBuX5Umi
I recommend following up on either the Freebase or the DBpedia message boards.
-- Tommy Chheng
(In reply to: Pavan Kumar, Thursday, August 9, 2012 11:28 PM)
Hi, I know Wikipedia has information on a lot of actors. Is there a way I can use the MediaWiki API to get the list of all actors in Wikipedia and their corresponding links? I am asking for a search in Wikipedia :-)))) Is it available?
Please let me know.
(In reply to: Pavan Kumar (pavankumarstudent@yahoo.com), 12 August 2012 09:58)
Well it's not as simple as you were probably hoping.
There is only a "MediaWiki API", which is the same on all the projects related to Wikipedia. This means it is totally unaware of the content: it knows nothing about encyclopedias or how one might be formatted to fit in a MediaWiki wiki.
It only knows about the generic concepts and operations of a wiki.
This means you can't query for "all actors".
Fortunately, one of the generic concepts of a wiki is that of "categories", and you can use the API to investigate what's in a category:
http://www.mediawiki.org/wiki/API:Categorymembers
You can query Category:Actors like so: http://en.wikipedia.org/w/api.php?action=query&list=categorymembers&...
This will return not just the pages about actors, though, but anything people have put in the category, some of which may surprise you. Here's an example in XML:
<?xml version="1.0"?>
<api>
  <query>
    <categorymembers>
      <cm pageid="35149376" ns="0" title="Tany Youne" />
      <cm pageid="35963938" ns="0" title="Donna Wyant" />
      <cm pageid="35778902" ns="14" title="Category:Actors by award" />
    </categorymembers>
  </query>
  <query-continue>
    <categorymembers cmcontinue="..." />
  </query-continue>
</api>
Most importantly, in a category you will usually find subcategories, so to build up a list of all the actors in Wikipedia you will need to descend recursively into some of those subcategories.
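A minimal sketch, assuming Python 3 and XML replies shaped like the example above, of splitting one batch into article titles and subcategory titles, and of reading the continuation value to send back as `cmcontinue` for the next batch. Newer MediaWiki versions report continuation on a top-level `<continue>` element instead of `<query-continue>`, so both are checked; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

CATEGORY_NS = "14"   # namespace number of Category: pages
ARTICLE_NS = "0"     # main (article) namespace


def categorymembers_params(category, cmcontinue=None):
    """Query parameters for one batch of list=categorymembers."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "500",   # per-request maximum for ordinary clients
        "format": "xml",
    }
    if cmcontinue:
        params["cmcontinue"] = cmcontinue
    return params


def parse_categorymembers(xml_text):
    """Return (article titles, subcategory titles, cmcontinue or None)."""
    root = ET.fromstring(xml_text)
    pages, subcats = [], []
    for cm in root.iter("cm"):
        if cm.get("ns") == CATEGORY_NS:
            subcats.append(cm.get("title"))
        elif cm.get("ns") == ARTICLE_NS:
            pages.append(cm.get("title"))
        # members in other namespaces (files, templates, ...) are ignored
    cont = root.find("query-continue/categorymembers")  # older replies
    if cont is None:
        cont = root.find("continue")                    # newer replies
    next_value = cont.get("cmcontinue") if cont is not None else None
    return pages, subcats, next_value
```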
The biggest problem here is that, counter to the expectations of many, the categories in Wikipedia are not arranged in a strict hierarchy. There is not a "tree" of categories but a "graph", connected in all kinds of whimsical ways.
So you will need to analyse the subcategories of the actor category yourself and make a hard-coded list of which to include, or you will need to design some clever heuristics to decide which subcategory paths to follow and which to ignore.
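Either approach can be sketched as a breadth-first walk that remembers which categories it has already visited, so cycles in the category graph cannot loop forever. This is a hedged sketch in Python 3: `fetch_members(category)` is a hypothetical caller-supplied function returning `(pages, subcategories)` for one category (for example, built on repeated `list=categorymembers` requests), and `should_follow` stands for either the hard-coded list or the heuristic.

```python
from collections import deque


def collect_pages(root, fetch_members, should_follow, max_depth=3):
    """Breadth-first walk over a category graph, returning all page titles.

    `seen` guards against cycles, since Wikipedia categories form a graph
    rather than a tree; `max_depth` is an extra safety limit.
    """
    seen = {root}
    pages = set()
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        members, subcats = fetch_members(category)
        pages.update(members)
        if depth >= max_depth:
            continue  # do not descend any further from here
        for sub in subcats:
            if sub not in seen and should_follow(sub):
                seen.add(sub)
                queue.append((sub, depth + 1))
    return pages
```

For instance, `should_follow=lambda c: c in ALLOWED`, with `ALLOWED` a hand-picked set of category titles, gives the hard-coded-list variant.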
Some of this work may or may not have already been turned into an "ontology" that you can query using SPARQL in DBpedia, which is data mined from Wikipedia:
Good luck.
Andrew Dunbar (hippietrail)