李琴 wrote:
Hi all, I have built a LocalWiki and I want to keep its data consistent with Wikipedia, so one task is to fetch the recent updates from Wikipedia. I get the URLs by parsing the RSS feed (http://zh.wikipedia.org/w/index.php?title=Special:%E6%9C%80%E8%BF%91%E6%9B%B...), and for each URL I open the page, click 'edit this page', and download the HTML content of the edit box. (For example, for http://zh.wikipedia.org/w/index.php?title=%E8%B2%A1%E7%A5%9E%E5%88%B0_(%E9%8... the edit interface is http://zh.wikipedia.org/w/index.php?title=%E8%B2%A1%E7%A5%9E%E5%88%B0_(%E9%8... .) However, I have run into two problems.

First, sometimes I cannot open a URL taken from the RSS feed and I don't know why. Is it because I request pages too frequently and my IP address has been blocked, or is the network just too slow? If it is the former, how often may I request a page from Wikipedia? Is there a timeout?

Second, as mentioned above, I want to download all of the HTML content of the edit box. Sometimes this works, but other times I only get part of it. What could be the reason?
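(A rough sketch of this workflow in Python, assuming the third-party feedparser and requests libraries; the feed URL uses the canonical Special:RecentChanges name and the User-Agent string is only a placeholder, so this is illustrative rather than the exact code in use:)

    import time
    import urllib.parse

    import feedparser  # third-party: pip install feedparser
    import requests    # third-party: pip install requests

    # RSS feed of recent changes; the canonical English name
    # Special:RecentChanges also works on zh.wikipedia.org.
    FEED_URL = "http://zh.wikipedia.org/w/index.php?title=Special:RecentChanges&feed=rss"

    # Identify the client so operators can contact you if something goes wrong.
    HEADERS = {"User-Agent": "LocalWikiSyncBot/0.1 (contact: you@example.com)"}

    feed = feedparser.parse(FEED_URL)
    for entry in feed.entries:
        # entry.title is the page title; request its edit view to get the
        # HTML page that contains the edit box.
        edit_url = ("http://zh.wikipedia.org/w/index.php?title="
                    + urllib.parse.quote(entry.title) + "&action=edit")
        resp = requests.get(edit_url, headers=HEADERS, timeout=30)
        resp.raise_for_status()
        html = resp.text  # full HTML of the edit page; the wikitext sits in the <textarea>
        time.sleep(1)     # pause between requests to avoid hammering the servers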
Thanks
vanessa
Using the API or Special:Export you can request several pages per HTTP request, which is easier on the servers. You should also add a maxlag parameter. And obviously you must set a proper User-Agent, so that if your bot causes problems you can be contacted (or banned).
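A minimal sketch of the API approach, in Python with the requests library (the page titles and the User-Agent string are placeholders):

    import requests

    API_URL = "http://zh.wikipedia.org/w/api.php"

    # Identify the bot so operators can contact you if it misbehaves.
    HEADERS = {"User-Agent": "LocalWikiSyncBot/0.1 (contact: you@example.com)"}

    def fetch_wikitext(titles):
        """Fetch the current wikitext of several pages in one API request."""
        params = {
            "action": "query",
            "prop": "revisions",
            "rvprop": "content",
            "titles": "|".join(titles),  # several pages per request
            "format": "json",
            "maxlag": 5,                 # back off when the servers are lagged
        }
        resp = requests.get(API_URL, params=params, headers=HEADERS, timeout=30)
        resp.raise_for_status()
        data = resp.json()
        if "error" in data and data["error"].get("code") == "maxlag":
            # Server replication lag is above the threshold: wait and retry.
            raise RuntimeError("maxlag hit, retry after a pause")
        result = {}
        for page in data["query"]["pages"].values():
            if "revisions" in page:
                result[page["title"]] = page["revisions"][0]["*"]
        return result

    pages = fetch_wikitext(["Wikipedia", "MediaWiki"])  # placeholder titles

If the maxlag error does come back, wait a bit (the response carries a Retry-After header) and retry. Special:Export likewise accepts several page titles in one request and returns their wikitext wrapped in XML.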
The Wikimedia Foundation also offers a live feed to keep wikis up to date; see http://meta.wikimedia.org/wiki/Wikimedia_update_feed_service