Mark Wagner wrote:
I'm working on a bot to deal with the flood of
no-source and untagged images
on the English Wikipedia. My current design calls for, once a day,
downloading the upload log for the previous 24 hours, then checking each
image description page and adding a template as appropriate. About 2000
images are uploaded each day, and only around 15% need tagging. What's the
best way of getting the wikitext of an article if there's an 85% chance that
you won't be editing it? Is Special:Export faster than starting an edit, or
is there some other method?
Thanks,
Mark
[[en:User:Carnildo]]
Use this code:
<?php
// Fetch the raw wikitext of a page via index.php?action=raw.
// MakeCookieString() and UpdateSessionCookie() are the poster's own
// session-cookie helpers and are not shown here.
function GetPageSource($page) {
    $wikiIndexPHP = "/w/index.php";
    $wikiSrv = "en.wikipedia.org";
    $fp = fsockopen($wikiSrv, 80, $errno, $errstr, 30);
    if (!$fp) {
        echo "$errstr ($errno)<br />\n";
        return false;
    }
    // Every HTTP header line must end in \r\n, and the request ends
    // with a blank line. "Connection: close" makes the server close
    // the socket, so the feof() loop below terminates cleanly.
    $request  = "GET " . $wikiIndexPHP . "?title=" . urlencode($page) . "&action=raw HTTP/1.0\r\n";
    $request .= "Host: " . $wikiSrv . "\r\n";
    $request .= "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1\r\n";
    $request .= "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n";
    $request .= "Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3\r\n";
    $request .= "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n";
    $request .= "Cookie: " . MakeCookieString() . "\r\n";
    $request .= "Connection: close\r\n";
    $request .= "Cache-Control: max-age=0\r\n\r\n";
    fputs($fp, $request);
    $buf = "";
    while (!feof($fp)) {
        $buf .= fgets($fp, 128);
    }
    fclose($fp);
    UpdateSessionCookie($buf);
    // Strip the response headers: the body starts after the first blank line.
    if (preg_match("/\r\n\r\n(.*)$/s", $buf, $hit)) {
        return $hit[1];
    }
    return false;
}
?>
Calling this function returns the raw wikitext of the page, with the HTTP response headers stripped off (or false on failure).
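For comparison, here is a small sketch of the two fetch URLs the original question mentions, built for an example title (the title "File:Example.jpg" is just an illustration, not from the thread). action=raw returns bare wikitext, while Special:Export wraps the same text in an XML dump:

```php
<?php
// Build both styles of fetch URL for one page title.
$title = "File:Example.jpg";

// index.php?action=raw: returns the wikitext only.
$rawUrl = "https://en.wikipedia.org/w/index.php?title="
        . urlencode($title) . "&action=raw";

// Special:Export: returns an XML document containing the wikitext
// plus revision metadata, so there is more to download and parse.
$exportUrl = "https://en.wikipedia.org/wiki/Special:Export/"
           . rawurlencode($title);

echo $rawUrl . "\n";
echo $exportUrl . "\n";
?>
```

Since roughly 85% of the pages will only be read and never edited, the smaller action=raw response is the cheaper choice for the first pass; an edit token is only needed for the minority of pages that actually get tagged.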
Greets,
Marco
[[:de:User:HardDisk]]