I am writing a PHP script whose aim is to retrieve the size of each row of
the history of several articles.
For this, I've spent some time looking at the different PHP files in
wiki/includes to try to understand the overall architecture of MediaWiki.
I am not an expert in PHP, but my skills in C/C++ and Java (a long time ago
though :) helped me.
My script reads a list of articles from an input file and writes to an output
file the total length and the relative size of each edit (the difference
between the current length and the previous one). I am using the function
Article::getRevisionText in order to get the full text version of old_text.
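To make the calculation concrete, here is a minimal sketch of what I mean by
"relative size", independent of MediaWiki. The helper name edit_sizes is made
up for illustration, and the revision texts are assumed to be given in
chronological order:

```php
<?php
// Hypothetical helper (not a MediaWiki function): given the revision texts of
// one article in chronological order, return the length of each revision and
// the size change introduced by each edit.
function edit_sizes(array $revision_texts) {
    $sizes = array();
    // An article starts empty, so the first delta is the full first length.
    $prev_length = 0;
    foreach ($revision_texts as $text) {
        // strlen counts bytes, not characters.
        $length = strlen($text);
        $sizes[] = array(
            'length'       => $length,
            'contrib_size' => $length - $prev_length,
        );
        $prev_length = $length;
    }
    return $sizes;
}
```

A negative contrib_size simply means that the edit removed more text than it
added.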
It works perfectly for a limited number of articles, but the script stops
randomly when the list is longer.
It never stops at exactly the same place, although it is always around the
same article (but never at the same SQL row), which makes me think that the
error is probably related to memory, buffers or the SQL queries, or at least
to something being full that needs to be emptied.
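One way this guess could be tested is sketched below. It assumes (this is only
a guess on my part) that the script is hitting PHP's execution time limit or
memory limit, and progress_line is a made-up helper name:

```php
<?php
// Assumption: the stop is caused by max_execution_time or memory_limit.
// Passing 0 to set_time_limit removes the execution time limit.
set_time_limit(0);
// Report every error so that a fatal error is not hidden.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Hypothetical helper: build one progress line with the current memory usage
// and the configured limits, to be written once per article.
function progress_line($article_nb) {
    return sprintf(
        "article %d: %d bytes in use, memory_limit=%s, max_execution_time=%s",
        $article_nb,
        memory_get_usage(),
        ini_get('memory_limit'),
        ini_get('max_execution_time')
    );
}

echo progress_line(1), "\n";
```

If the memory figure keeps growing from article to article, memory is the
likely culprit; if the script now runs to the end, the time limit was.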
I've spent several days on this, but my limited PHP skills do not allow me to
fix the bug.
I am not asking anyone to debug the PHP file, but could anyone have a quick
look at the code and tell me whether there are problems in freeing memory,
buffers, SQL results or anything else that would explain the behaviour of the
script?
That would be a great help for my master's thesis.
Thank you.
Kevin Carillo
<?php
define( 'MEDIAWIKI', true );
require_once( './includes/Defines.php' );
require_once( './LocalSettings.php' );
require_once( 'includes/Setup.php' );
$article_title = "";
$prev_article_size =0;
$contrib_size =0;
$count_edit =0;
$log_file_nb =1;
$count_article_20 =1;
$count_article =1;
#get database handler
$dbw =& wfGetDB( DB_MASTER );
#open log_file
$log_file = fopen("c:/www/contrib/nms0/nms0_archive_$log_file_nb.txt",'w');
# open file which contains titles of all articles
$article_title_file = fopen('c:/www/contrib/article_title_list.txt','r');
# main while loop that reads the file containing the titles of articles
while (!feof($article_title_file)) {
# prepare log_file: open a new file if 20 articles have been logged in the
# current log_file
if ($count_article_20 >20) {
# close current handler
fclose($log_file);
# prepare name of new log_file
$log_file_nb ++;
# reset nb of edits of the current article
$count_edit = 0;
# open new log file
$log_file = fopen("c:/www/contrib/nms0/nms0_archive_$log_file_nb.txt",'w');
# reset counter
$count_article_20 =1;
}
# get current article title
$article_title_temp = fgets($article_title_file, 300);
# fgets returns FALSE at end of file: stop instead of querying an empty title
if ($article_title_temp === FALSE) {
break;
}
# fgets keeps the line ending (\r\n or \n) at the end of the title -> remove it
$article_title = rtrim($article_title_temp, "\r\n");
# skip blank lines in the title list
if ($article_title == '') {
continue;
}
fwrite($log_file,"\n Article: $article_title Nb: $count_article\n");
fwrite($log_file,"------------------------------------------------\n");
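# reset the per-article counters so edit numbers and sizes start fresh
$count_edit = 0;
$prev_article_size = 0;
# get history of the article; the title is escaped so that a quote in it
# cannot break the query (assumes Database::addQuotes exists in this version)
$query = "select old_id, old_title, old_text, old_flags, old_timestamp
from old where old_namespace=0 and old_title = " .
$dbw->addQuotes($article_title) . "
order by old_timestamp";
# query() reports SQL errors, whereas the low-level doQuery() fails silently
$obj = $dbw->query($query, 'history_size_script');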
#$dbw->deadlockLoop();
if ( $dbw->numRows( $obj ) ) {
while ( $row = $dbw->fetchObject( $obj ) ) {
$count_edit++;
$old_id = $row->old_id;
$old_title = $row->old_title;
$old_timestamp = $row->old_timestamp;
$old_text_full = Article::getRevisionText($row);
# calculate length
$length = strlen($old_text_full);
# calculate contribution size
$contrib_size=$length - $prev_article_size;
#keep length of article in $prev_article_size
$prev_article_size = $length;
fwrite($log_file,"old_title: $old_title edit nb:$count_edit ");
fwrite($log_file," length:$length old_timestamp: $old_timestamp contrib_size: $contrib_size \n");
} # end while loop SQL results
$dbw->freeResult( $obj );
} # end if
else {
fwrite($log_file,"$article_title : No History. \n ");
}
# update count of articles to track for log_file
$count_article_20++;
# update total nb of articles processed
$count_article++;
} # end while loop article titles
# close file handlers
fclose($log_file);
fclose($article_title_file);
?>