Hi all,
I have built a local wiki, and I want to keep its data consistent with
Wikipedia. One task I need to do is to fetch the recent updates from
Wikipedia.
I get the URLs by parsing the recent-changes RSS feed
(
http://zh.wikipedia.org/w/index.php?title=Special:%E6%9C%80%E8%BF%91%E6%9B%…)
and then fetch the HTML content of the edit box by opening each URL
and following its ’edit this page’ link.
(e.g.
http://zh.wikipedia.org/w/index.php?title=%E8%B2%A1%E7%A5%9E%E5%88%B0_(%E9%…
whose edit interface is
http://zh.wikipedia.org/w/index.php?title=%E8%B2%A1%E7%A5%9E%E5%88%B0_(%E9%…
).
However, I have run into two problems.
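To make the workflow concrete, here is a minimal sketch of the RSS-parsing
step (illustrative Python, not my actual code; the two-item sample feed
below is made up for the example, and real recent-changes feeds carry more
fields per item):

```python
import xml.etree.ElementTree as ET

# Made-up two-item sample of a recent-changes RSS feed, only to
# illustrate the parsing step; a real feed has many more fields.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <item><link>http://zh.wikipedia.org/w/index.php?title=Foo&amp;diff=1&amp;oldid=0</link></item>
  <item><link>http://zh.wikipedia.org/w/index.php?title=Bar&amp;diff=2&amp;oldid=1</link></item>
</channel></rss>"""

def extract_links(rss_text):
    """Collect the <link> of every <item> in the feed."""
    root = ET.fromstring(rss_text)
    return [item.findtext("link") for item in root.iter("item")]

def edit_url(page_url):
    """Turn a page URL into its 'edit this page' URL by appending action=edit."""
    sep = "&" if "?" in page_url else "?"
    return page_url + sep + "action=edit"

links = extract_links(SAMPLE_RSS)
edit_links = [edit_url(u) for u in links]
```

From each link I then request the edit page and pull the content out of the
edit box.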
First, sometimes I can’t open a URL taken from the RSS feed, and I don’t
know why.
Is it because I visit too frequently and my IP address has been blocked,
or because the network is too slow?
If it is the former, how often am I allowed to request a page from
Wikipedia? Is there a required delay between requests?
Second, as mentioned above, I want to download the full HTML of the
content in the edit box from Wikipedia.
Sometimes this works, but other times I only get part of it. What could
be the reason?
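For reference, a minimal sketch of the kind of download loop I mean, which
reads until end-of-stream and waits between retries (this is illustrative
Python rather than my actual code, and the retry count and delay are just
guesses):

```python
import io
import time
from urllib.request import urlopen

def read_all(resp, chunk_size=8192):
    """Read until EOF. A single resp.read() on a slow connection may
    return before the whole body has arrived, which might explain
    getting only part of a page."""
    chunks = []
    while True:
        chunk = resp.read(chunk_size)
        if not chunk:          # empty read means end of stream
            break
        chunks.append(chunk)
    return b"".join(chunks)

def fetch(url, retries=3, delay=5):
    """Fetch a URL, pausing between attempts so as not to hammer the server."""
    for attempt in range(retries):
        try:
            with urlopen(url, timeout=30) as resp:
                return read_all(resp)
        except OSError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)  # back off before retrying

# Offline check of the chunked read against a fake 20000-byte response:
body = read_all(io.BytesIO(b"x" * 20000))
```

Is looping until an empty read like this enough to guarantee the full
body, or can the server itself cut the response short?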
Thanks
vanessa