Mark Wagner wrote:
I'm working on a bot to deal with the flood of
no-source and untagged images
on the English Wikipedia. My current design calls for, once a day,
downloading the upload log for the previous 24 hours, then checking each
image description page and adding a template as appropriate. About 2000
images are uploaded each day, and only around 15% need tagging. What's the
best way of getting the wikitext of an article if there's an 85% chance that
you won't be editing it? Is Special:Export faster than starting an edit, or
is there some other method?
Thanks,
Mark
[[en:User:Carnildo]]
Use this code:
<?php
// Fetch the raw wikitext of a page via index.php?action=raw.
// MakeCookieString() and UpdateSessionCookie() are the poster's own
// session-cookie helpers and are not shown here.
function GetPageSource($page) {
    $wikiIndexPHP = "/w/index.php";
    $wikiSrv = "en.wikipedia.org";
    $fp = fsockopen($wikiSrv, 80, $errno, $errstr, 30);
    if (!$fp) {
        echo "$errstr ($errno)<br />\n";
        return false;
    }
    // Every HTTP header line must end in \r\n, and the request ends
    // with a blank line. "Connection: close" makes the server close
    // the socket, so the feof() loop below terminates cleanly.
    $request  = "GET " . $wikiIndexPHP . "?title=" . urlencode($page) . "&action=raw HTTP/1.0\r\n";
    $request .= "Host: " . $wikiSrv . "\r\n";
    $request .= "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1\r\n";
    $request .= "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n";
    $request .= "Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3\r\n";
    $request .= "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n";
    $request .= "Cookie: " . MakeCookieString() . "\r\n";
    $request .= "Connection: close\r\n";
    $request .= "Cache-Control: max-age=0\r\n\r\n";
    fputs($fp, $request);
    $buf = "";
    while (!feof($fp)) {
        $buf .= fgets($fp, 128);
    }
    fclose($fp);
    UpdateSessionCookie($buf);
    // Strip the response headers: the body starts after the first blank line.
    if (preg_match("/\r\n\r\n(.*)$/s", $buf, $hit)) {
        return $hit[1];
    }
    return false;
}
?>
Calling this function returns the raw wikitext of the page, with the HTTP response headers stripped off (or false on failure).
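For comparison, here is a small sketch of the two fetch URLs the original question mentions, built for an example title (the title "File:Example.jpg" is just an illustration, not from the thread). action=raw returns bare wikitext, while Special:Export wraps the same text in an XML dump:

```php
<?php
// Build both styles of fetch URL for one page title.
$title = "File:Example.jpg";

// index.php?action=raw: returns the wikitext only.
$rawUrl = "https://en.wikipedia.org/w/index.php?title="
        . urlencode($title) . "&action=raw";

// Special:Export: returns an XML document containing the wikitext
// plus revision metadata, so there is more to download and parse.
$exportUrl = "https://en.wikipedia.org/wiki/Special:Export/"
           . rawurlencode($title);

echo $rawUrl . "\n";
echo $exportUrl . "\n";
?>
```

Since roughly 85% of the pages will only be read and never edited, the smaller action=raw response is the cheaper choice for the first pass; an edit token is only needed for the minority of pages that actually get tagged.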
Greets,
Marco
[[:de:User:HardDisk]]