Mark Wagner wrote:
I'm working on a bot to deal with the flood of no-source and untagged images on the English Wikipedia. My current design is to download the upload log for the previous 24 hours once a day, then check each image description page and add a template where appropriate. About 2000 images are uploaded each day, and only around 15% need tagging. What's the best way of getting the wikitext of a page when there's an 85% chance you won't be editing it? Is Special:Export faster than starting an edit, or is there some other method?
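For reference, Special:Export can batch several titles into one request; a minimal sketch of that form follows (the function name and the use of file_get_contents/stream contexts are illustrative assumptions, not code from this thread):

<?php
// Sketch: fetch several pages in one request via Special:Export.
// The "pages" parameter takes one title per line; curonly=1 limits
// the export to the current revision. The result is an XML dump
// that still has to be parsed to get the wikitext out.
function ExportPages($titles) {
    $ctx = stream_context_create(array('http' => array(
        'method'  => 'POST',
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content' => http_build_query(array(
            'pages'   => implode("\n", $titles),
            'curonly' => '1',
        )),
    )));
    return file_get_contents('http://en.wikipedia.org/wiki/Special:Export', false, $ctx);
}
?>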
Thanks, Mark [[en:User:Carnildo]]
Use this code:

<?php
// Fetch the raw wikitext of a page via index.php?action=raw.
// MakeCookieString() and UpdateSessionCookie() are the author's
// session-cookie helpers (not shown here).
function GetPageSource($page) {
    $wikiIndexPHP = "/w/index.php";
    $wikiSrv = "en.wikipedia.org";

    $fp = fsockopen($wikiSrv, 80, $errno, $errstr, 30);
    if (!$fp) {
        echo "$errstr ($errno)<br />\n";
        return false;
    }

    // Each header ends with \r\n; a blank line terminates the request.
    // "Connection: close" (rather than keep-alive) makes the server shut
    // the socket when it is done, so the feof() loop below terminates.
    fputs($fp, "GET " . $wikiIndexPHP . "?title=" . urlencode($page) . "&action=raw HTTP/1.0\r\n"
             . "Host: " . $wikiSrv . "\r\n"
             . "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1\r\n"
             . "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n"
             . "Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3\r\n"
             . "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n"
             . "Cookie: " . MakeCookieString() . "\r\n"
             . "Cache-Control: max-age=0\r\n"
             . "Connection: close\r\n\r\n");

    $buf = "";
    while (!feof($fp)) {
        $buf .= fgets($fp, 128);
    }
    fclose($fp);

    UpdateSessionCookie($buf);

    // The body starts after the first blank line of the HTTP response.
    preg_match("/\r\n\r\n(.*)$/s", $buf, $hit);
    return $hit[1];
}
?>

Calling this function returns the raw wikitext of the page.
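A minimal usage sketch (the page title and the template check are placeholders, not actual en.wikipedia conventions):

<?php
// Illustrative only: fetch an image description page and do a crude
// check for a license template before deciding whether to tag it.
$text = GetPageSource("Image:Example.jpg");
if ($text !== false && strpos($text, "{{SomeLicenseTemplate}}") === false) {
    // No license template found -- this page would get tagged.
}
?>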
Regards,
Marco [[:de:User:HardDisk]]