Hi,
On 25.07.2009 at 17:13, River Tarnell
<river(a)loreley.flyingparchment.org.uk> wrote:
> .NET is not, itself, non-free. Microsoft's implementation (the most common
> one) is, but Mono (http://mono-project.com/Main_Page) is not. Perhaps the
> AWB developers could make whatever changes are needed to run it on a free
> implementation.
Mono works great; I run bots built on the DotNetWikiBot framework on the
toolserver.
For simple parsing of a pages-articles.xml file, you could try a script I
used some time ago. It is a very simple line-based XML parser (written for
the pages-articles.xml structure) that calls a function named "test" with
the title and the text of each article. It's not the perfect solution, but
the solution implemented in five minutes ;)
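<?php
// Callback invoked once per page with the trimmed title and article text.
// Both values are still XML-escaped (e.g. &lt; and &amp;); pass them
// through html_entity_decode() if the raw wikitext is needed.
function test($title, $text)
{
    // do something here
}

$filename = "enwiki-200XXXXX-pages-articles.xml";
$dataFile = fopen($filename, "r");
if ($dataFile)
{
    // Parser state: 0 = outside <page>, 1 = inside <page>,
    // 2 = inside <revision>, 3 = inside a multi-line <text> element.
    $status = 0;
    $title = "";
    $text = "";
    // Read whole lines: a fixed 4096-byte limit can cut a long line in the
    // middle of a tag such as </text>, which would then be missed.
    while (($buffer = fgets($dataFile)) !== false)
    {
        if (($status == 0) && (stripos($buffer, "<page>") !== false))
            $status = 1;
        elseif (($status == 1) && (stripos($buffer, "<title>") !== false))
            $title = strip_tags($buffer);
        elseif (($status == 1) && (stripos($buffer, "<revision>") !== false))
            $status = 2;
        elseif (($status == 2) && (stripos($buffer, "<text ") !== false))
        {
            $text = strip_tags($buffer);
            // Stay inside <text> only if it is neither closed on this line
            // nor an empty, self-closing element (<text ... />).
            if ((stripos($buffer, "</text>") === false)
                && (stripos($buffer, "/>") === false))
                $status = 3;
        }
        elseif (($status == 3) && (stripos($buffer, "</text>") === false))
            $text .= strip_tags($buffer);
        elseif ($status == 3)
        {
            // Last line of the article text.
            $text .= strip_tags($buffer);
            $status = 2;
        }
        elseif (($status == 2) && (stripos($buffer, "</revision>") !== false))
            $status = 1;
        elseif (($status == 1) && (stripos($buffer, "</page>") !== false))
        {
            test(trim($title), trim($text));
            $title = "";
            $text = "";
            $status = 0;
        }
    }
    fclose($dataFile);
}
else
{
    die("Cannot open file: $filename");
}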
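
If the line-based approach turns out to be too fragile, the same loop can be sketched with PHP's built-in XMLReader, which also streams the file but decodes XML entities and handles empty <text /> elements itself. This is not the script from above but a swapped-in technique; the function name forEachPage and its callback parameter are made up for illustration, and the sketch assumes the usual <page>/<title>/<revision>/<text> layout of the dump.

```php
<?php
// Hypothetical helper (not part of the original script): streams a
// pages-articles dump and calls $callback(title, text) once per <page>.
// Returns the number of pages seen.
function forEachPage($filename, $callback)
{
    $reader = new XMLReader();
    if (!$reader->open($filename)) {
        throw new RuntimeException("Cannot open file: $filename");
    }

    $count = 0;
    $title = "";
    $text = "";
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            // readString() returns the element's text with entities decoded.
            if ($reader->localName === "title") {
                $title = $reader->readString();
            } elseif ($reader->localName === "text") {
                $text = $reader->readString();
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT
            && $reader->localName === "page") {
            // End of a page: hand over what was collected and reset.
            call_user_func($callback, trim($title), trim($text));
            $title = "";
            $text = "";
            $count++;
        }
    }
    $reader->close();
    return $count;
}
```

With the test() function from the script above, a call would look like forEachPage("enwiki-200XXXXX-pages-articles.xml", "test"). If a page carried several revisions, this sketch would pass on only the text of the last one.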