Hi,
On 25.07.2009 at 17:13, River Tarnell river@loreley.flyingparchment.org.uk wrote:
> .NET is not, itself, non-free. Microsoft's implementation (the most common one) is, but Mono (http://mono-project.com/Main_Page) is not. Perhaps the AWB developers could make whatever changes are needed to run it on a free implementation.
Mono works great. I run bots built on the DotNetWikiBot framework on the toolserver.
For simple parsing of a pages-articles.xml file, you could try a script I used some time ago. It is a very simple XML parser (written for the pages-articles.xml structure) that calls a function named "test" with the title and the text of each article. It's not the perfect solution, but it is the solution implemented in five minutes ;)
<?php
// Callback invoked once per page with the article title and its text.
function test($title, $text) {
    // do something here
}

$filename = "enwiki-200XXXXX-pages-articles.xml";
$dataFile = fopen($filename, "r");
if ($dataFile) {
    // $status: 0 = outside <page>, 1 = inside <page>,
    //          2 = inside <revision>, 3 = inside a multi-line <text>
    $status = 0;
    $title = "";
    $text = "";
    // Read whole lines (no length limit) so a tag is never split across two reads.
    while (($buffer = fgets($dataFile)) !== false) {
        if (($status == 0) && (stripos($buffer, "<page>") !== false))
            $status = 1;
        elseif (($status == 1) && (stripos($buffer, "<title>") !== false))
            $title = strip_tags($buffer);
        elseif (($status == 1) && (stripos($buffer, "<revision>") !== false))
            $status = 2;
        elseif (($status == 2) && (stripos($buffer, "<text ") !== false)) {
            $status = 3;
            $text = strip_tags($buffer);
            // Text that opens and closes on the same line
            if (stripos($buffer, "</text>") !== false) {
                $status = 2;
            }
        }
        elseif (($status == 3) && (stripos($buffer, "</text>") === false))
            $text .= strip_tags($buffer);
        elseif ($status == 3) {
            // Last line of the text element
            $text .= strip_tags($buffer);
            $status = 2;
        }
        elseif (($status == 2) && (stripos($buffer, "</revision>") !== false))
            $status = 1;
        elseif (($status == 1) && (stripos($buffer, "</page>") !== false)) {
            test(trim($title), trim($text));
            $title = "";
            $text = "";
            $status = 0;
        }
    }
    fclose($dataFile);
} else {
    die("File not found: $filename");
}
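As a rough idea of what could go into "test", here is a minimal sketch of a callback that collects the titles of all articles whose text contains a given string. The names $needle and $matches and the search string "{{Infobox" are only assumptions for illustration and are not part of the script above; the two calls at the end stand in for what the parser loop would do for each page.

```php
<?php
// Hypothetical example callback: remember every title whose text
// contains $needle. $needle and $matches are illustrative names only.
$needle = "{{Infobox";
$matches = array();

function test($title, $text) {
    global $needle, $matches;
    // stripos() is case-insensitive and returns false when nothing is found
    if ($title !== "" && stripos($text, $needle) !== false) {
        $matches[] = $title;
    }
}

// Stand-in for the parser loop: call the callback by hand with sample data.
test("Example A", "{{Infobox person}} Some article text");
test("Example B", "No template in this one");

print_r($matches);
```

To use it with the dump, this function would replace the empty "test" in the script, and $matches could be printed after fclose(). Keeping only the titles in memory, rather than the texts, should matter with a dump of this size.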