I am writing a PHP script whose aim is to retrieve the size of each row of the history of several articles. To do this, I spent some time looking at the different PHP files in wiki/includes to try to understand the overall architecture of MediaWiki. I am not an expert in PHP, but my skills in C/C++ and Java (a long time ago, though :) helped me. My script reads a list of articles from an input file and writes to an output file the total length and the relative size of each edit (the difference between the current length and the previous one). I am using the function Article::getRevisionText to get the full-text version of old_text.

It works perfectly for a limited number of articles, but the script stops randomly when the list is longer. It never stops at the same place, although it is always around the same article (but never at the same SQL row), which makes me think that the error is probably related to memory or buffer issues, perhaps with the SQL queries, or at least to something filling up that needs to be emptied.

I have spent several days on this, but my limited PHP skills do not allow me to fix the bug. I am not asking anyone to debug the PHP file, but could anyone have a quick look at the code and tell me whether there are problems in freeing memory, buffers, SQL results, or anything else that would explain the behaviour of the script? That would be a great help for my master's thesis.
Thank you.
Kevin Carillo
<?php
define( 'MEDIAWIKI', true );
require_once( './includes/Defines.php' );
require_once( './LocalSettings.php' );
require_once( 'includes/Setup.php' );

$article_title = "";
$prev_article_size = 0;
$contrib_size = 0;
$count_edit = 0;
$log_file_nb = 1;
$count_article_20 = 1;
$count_article = 1;
# get database handler
$dbw =& wfGetDB( DB_MASTER );

# open log_file
$log_file = fopen("c:/www/contrib/nms0/nms0_archive_$log_file_nb.txt",'w');

# open file which contains titles of all articles
$article_title_file = fopen('c:/www/contrib/article_title_list.txt','r');
# main while loop that reads the file containing the titles of articles
while (!feof($article_title_file)) {

    # prepare log_file: open a new file if 20 articles have been logged in the current log_file
    if ($count_article_20 > 20) {
        # close current handler
        fclose($log_file);
        # prepare name of new log_file
        $log_file_nb++;
        # reset nb of edits of the current articles
        $count_edit = 0;
        # open new log file
        $log_file = fopen("c:/www/contrib/nms0/nms0_archive_$log_file_nb.txt",'w');
        # reset counter
        $count_article_20 = 1;
    }
    # get current article title
    $article_title_temp = fgets($article_title_file, 300);

    # fgets keeps the line ending at the end of the article title -> remove it
    $article_title = rtrim($article_title_temp, "\r\n");
    fwrite($log_file,"\n Article: $article_title Nb: $count_article\n");
    fwrite($log_file,"------------------------------------------------\n");
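    # start each article from a previous length of 0, so the first edit is not
    # compared with the last revision of the previous article
    $prev_article_size = 0;

    # get history of the article (the inner double quotes must be escaped)
    $query="select old_id, old_title, old_text, old_flags, old_timestamp from old where old_namespace=0 and old_title = \"$article_title\" order by old_timestamp;";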
$obj = $dbw->doQuery($query); #$dbw->deadlockLoop();
if ( $dbw->numRows( $obj ) ) {
while ( $row = $dbw->fetchObject( $obj ) ) {
$count_edit++;
$old_id = $row->old_id; $old_title = $row->old_title; $old_timestamp = $row->old_timestamp;
$old_text_full = Article::getRevisionText($row);
            # calculate length
            $length = strlen($old_text_full);
            # calculate contribution size
            $contrib_size = $length - $prev_article_size;
            # keep length of article in $prev_article_size
            $prev_article_size = $length;

            fwrite($log_file,"old_title: $old_title edit nb:$count_edit ");
            fwrite($log_file," length:$length old_timestamp: $old_timestamp contrib_size: $contrib_size \n");
        } # end while loop SQL results

        $dbw->freeResult( $obj );
    } # end if
    else {
        fwrite($log_file,"$article_title : No History. \n ");
    }
    # update count of articles to track for log_file
    $count_article_20++;
    # update total nb of articles processed
    $count_article++;
} # end while loop article titles
# close file handlers
fclose($log_file);
fclose($article_title_file);
?>
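
To make the intended calculation explicit, here is a minimal standalone sketch of the per-revision arithmetic described above, independent of MediaWiki and of the database. The function name contribSizes and the sample texts are hypothetical and only for illustration. It assumes the revision texts of one article are already available in chronological order and that the previous length starts at 0 for each article.

```php
<?php
// Hypothetical helper (not part of MediaWiki): given the revision texts of
// ONE article in chronological order, return the length of each revision
// and its size relative to the previous revision.
function contribSizes(array $revisionTexts) {
    $prevLength = 0; // assumption: the first edit is measured against 0
    $rows = array();
    foreach ($revisionTexts as $text) {
        $length = strlen($text);
        $rows[] = array(
            'length'       => $length,
            'contrib_size' => $length - $prevLength,
        );
        $prevLength = $length; // remember the length for the next revision
    }
    return $rows;
}

// Made-up sample data standing in for three revisions of one article.
$rows = contribSizes(array("Hello", "Hello world", "Hello"));
foreach ($rows as $i => $row) {
    echo "edit nb:" . ($i + 1)
        . " length:" . $row['length']
        . " contrib_size:" . $row['contrib_size'] . "\n";
}
```

A negative contrib_size simply means that the edit removed more text than it added.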