Hi Pablo,
Thank you for pointing me to the right link. DBpedia seems to be the right place for querying this information.
Regards, Venkatesh
On Tue, Oct 9, 2012 at 3:01 PM, Pablo N. Mendes <pablomendes@gmail.com> wrote:
You should probably take a look at DBpedia.
Go to http://dbpedia.org/sparql and try this query:
select distinct * where {
  ?song <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Hindi_songs> .
  ?song rdf:type <http://dbpedia.org/ontology/Song> .
  ?song <http://dbpedia.org/ontology/artist> ?artist .
  ?song <http://dbpedia.org/ontology/runtime> ?runtime .
} limit 100
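If you'd rather run it from a script than from the web form, the endpoint answers standard SPARQL protocol requests over HTTP. A minimal sketch in Python, assuming the third-party requests library and the JSON results format (both are my assumptions, not anything DBpedia-specific):

import requests

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """
select distinct * where {
  ?song <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Hindi_songs> .
  ?song rdf:type <http://dbpedia.org/ontology/Song> .
  ?song <http://dbpedia.org/ontology/artist> ?artist .
  ?song <http://dbpedia.org/ontology/runtime> ?runtime .
} limit 100
"""

# Ask the endpoint for SPARQL JSON results instead of the default HTML page.
response = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
response.raise_for_status()

for row in response.json()["results"]["bindings"]:
    print(row["song"]["value"], row["artist"]["value"], row["runtime"]["value"])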
If that's interesting for you, there is more info at: http://dbpedia.org/
In that case, the right list would be: https://lists.sourceforge.net/lists/listinfo/dbpedia-users
Cheers, Pablo
On Mon, Oct 8, 2012 at 4:45 PM, Venkatesh Channal <venkateshchannal@gmail.com> wrote:
Hi,
I would like to fetch the page text of all wiki pages that belong to a movie-related category, e.g. http://en.wikipedia.org/wiki/Category:Hindi_songs.
From the page text I would like to extract information such as the song title, song length, singer, and the name of the movie/album. I am not interested in extracting images, just the information about the song (the sketch below shows the kind of extraction I have in mind).
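For the extraction step, a minimal sketch of what I have in mind, assuming the mwparserfromhell wikitext parser and an "Infobox song" template; the library choice and the field names are my assumptions and would need to be checked against the actual pages:

import mwparserfromhell

def infobox_fields(wikitext):
    # Parse the wikitext and return the parameters of the first
    # infobox template found (e.g. song name, artist, length, album).
    code = mwparserfromhell.parse(wikitext)
    for template in code.filter_templates():
        if str(template.name).strip().lower().startswith("infobox"):
            return {
                str(param.name).strip(): str(param.value).strip()
                for param in template.params
            }
    return {}

sample = """{{Infobox song
| name   = Example Song
| artist = Example Singer
| length = 4:05
| album  = Example Album
}}"""
print(infobox_fields(sample))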
My questions:
- Is there a way to download only the pages I am interested in, i.e. those that belong to a particular category, instead of downloading the entire dump? (See the sketch after this list.)
- Is PHP knowledge required to install the db dump on a local machine?
- Are there tools that extract the information and provide the required data to be stored in a MySQL database?
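On the first question, a minimal sketch of fetching just one category's pages through the public MediaWiki API instead of the dumps; the endpoint and parameters below are what I understand the API to offer and should be verified against its documentation:

import requests

API = "https://en.wikipedia.org/w/api.php"

def category_members(category):
    # Page through list=categorymembers until no continuation token remains.
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "500",
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params, timeout=30).json()
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        params.update(data["continue"])

def page_wikitext(title):
    # Fetch the latest revision's raw wikitext for a single page.
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    }
    data = requests.get(API, params=params, timeout=30).json()
    page = next(iter(data["query"]["pages"].values()))
    revisions = page.get("revisions", [])
    return revisions[0]["*"] if revisions else ""

for title in category_members("Category:Hindi_songs"):
    print(title, len(page_wikitext(title)))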
If this is not the right forum to have my questions answered, could you please redirect me to the appropriate one?
Thanks and regards, Venkatesh Channal
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
--
Pablo N. Mendes
http://pablomendes.com
Events: http://wole2012.eurecom.fr