AUTO EXTRACTION FROM WWW - Wikitech-l

14 Oct 2003

Dear Sir,
	 We are a group of three students currently pursuing our B.E.-IT
(Bachelor of Engineering in Information Technology) at Mumbai
University, India.
We are currently working on a project titled "AUTO EXTRACTION
OF CONTENTS FROM THE WORLD WIDE WEB" as part of our B.E.
project at the renowned institute HBCSE-TIFR
(Homi Bhabha Center for Science Education, Tata Institute of
Fundamental Research), under the guidance of scientist
Dr. Nagarjuna G.

	Our project is based on:
				OS           - GNU/Linux
				Language     - Python
				Server       - Zope
				Application  - GNOWSYS

GNOWSYS (Gnowledge Networking and Organizing System) is a web
application for developing and maintaining semantic web content.
It is written in Python and runs as an installed product in Zope.
Our project involves automatically extracting data from the
World Wide Web (WWW) and using GNOWSYS to handle this vast
amount of data. This will not only let us store the data in the
Gnowledge base in the form of meaningful relationships but also
show us how GNOWSYS handles large amounts of data.
The URL for our site is http://www.gnowledge.org

In this regard, we could think of no better source than
Wikipedia, which is a phenomenon in itself.

We would be glad if you could answer a few of our queries:

1] In what format is the data stored in Wikipedia?
2] Apart from HTTP or FTP, are there any other specific
   protocols required to communicate with the Wikipedia server?
3] How can we utilize the SQL dump?

We hope you will answer our queries at your earliest convenience.
With warm regards,
							      Thanking you,

						[ Rameez Don , Jaymin Darbari, Ulhas Dhuri ]