Hi Pablo,
Thank you for pointing to the right link. DBPedia seems to be the place for
me to use for querying the information.
Regards,
Venkatesh
On Tue, Oct 9, 2012 at 3:01 PM, Pablo N. Mendes <pablomendes(a)gmail.com> wrote:
You should probably take a look at DBpedia.
Go to
http://dbpedia.org/sparql and try this query:
select distinct * where {
  ?song <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Hindi_songs> .
  ?song rdf:type <http://dbpedia.org/ontology/Song> .
  ?song <http://dbpedia.org/ontology/artist> ?artist .
  ?song <http://dbpedia.org/ontology/runtime> ?runtime .
}
limit 100
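For anyone wanting to run that query from a script rather than the web form, here is a minimal Python sketch. It assumes the public Virtuoso endpoint at http://dbpedia.org/sparql accepts "query" and "format" GET parameters (it did at the time of writing); the function name build_request_url is just for illustration.

```python
# Sketch: querying the public DBpedia SPARQL endpoint from Python.
# Endpoint URL and query are from this thread; the "query"/"format"
# parameter names follow the Virtuoso endpoint's convention.
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """
select distinct * where {
  ?song <http://purl.org/dc/terms/subject>
        <http://dbpedia.org/resource/Category:Hindi_songs> .
  ?song rdf:type <http://dbpedia.org/ontology/Song> .
  ?song <http://dbpedia.org/ontology/artist> ?artist .
  ?song <http://dbpedia.org/ontology/runtime> ?runtime .
}
limit 100
"""

def build_request_url(endpoint, query, fmt="application/sparql-results+json"):
    """Return a GET URL asking the endpoint for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": fmt})

url = build_request_url(ENDPOINT, QUERY)

# To actually fetch the results (requires network access):
#   from urllib.request import urlopen
#   import json
#   data = json.load(urlopen(url))
#   for row in data["results"]["bindings"]:
#       print(row["song"]["value"], row["runtime"]["value"])
```

The JSON result format ("results" → "bindings" → one dict per variable) is the standard SPARQL 1.1 JSON results layout, so the commented fetch loop should work with any conforming endpoint, not just DBpedia.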
If that's interesting to you, there is more info at:
http://dbpedia.org/
In that case, the right list would be:
https://lists.sourceforge.net/lists/listinfo/dbpedia-users
Cheers,
Pablo
On Mon, Oct 8, 2012 at 4:45 PM, Venkatesh Channal <
venkateshchannal(a)gmail.com> wrote:
Hi,
I would like to fetch all page text information of all wiki pages that
belong to a movie category. Eg:
http://en.wikipedia.org/wiki/Category:Hindi_songs
From the page text I would like to extract information related to song
title, song length, singer, name of movie/album, etc. I am not interested
in extracting images, just the information about the songs.
My questions:
1) Is there a way to download only the pages that belong to a particular
category, instead of downloading the entire dump?
2) Is PHP knowledge required to install the DB dump on a local machine?
3) Are there any tools that extract the information and provide the
required data to be stored in a MySQL database?
If this is not the right forum to have my questions answered, could you
please redirect me to the appropriate one?
Thanks and regards,
Venkatesh Channal
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
--
---
Pablo N. Mendes
http://pablomendes.com
Events:
http://wole2012.eurecom.fr