Re: [Toolserver-l] Five questions

21 Sep 2006


On Thu, 21 Sep 2006 09:54:31 +0200, toolserver-l-request@Wikipedia.org wrote:
...
Message: 13
Date: Thu, 21 Sep 2006 09:54:29 +0200
From: Stefan Kühn kuehn-s@gmx.net
Subject: [Toolserver-l] Five questions
To: Mailingliste Toolserver toolserver-l@Wikipedia.org
Message-ID: 45124535.8050706@gmx.net
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
...
Can I use Perl to get some data from the database? (I have been working
with Perl and CGI for three weeks, so I need a small example.)
Yes. I have no experience with it (I use Python, which IMO is easier
than Perl), but Google turned up
http://www.codeproject.com/perl/perldbi.asp
which could be useful. Perl with SQL is used at slashdot.org, for example.
...
Question 2:
Can I run an SQL query from Perl? For example: I want the categories
of page "xy" in DE.
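See the previous link. Write an SQL query for it; something like
"select * from dewiki_p.templatelinks where tl_namespace=<template
namespace> and tl_title=<template title>"
lists the pages that use a template. For the categories of a page, the
categorylinks table (cl_from, cl_to) is the one to query in the same way.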
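To make the category lookup concrete, here is a minimal sketch in Python rather than Perl (Perl's DBI module follows a similar connect/prepare/execute/fetch pattern). Assumptions: an in-memory SQLite database stands in for the MySQL server so the snippet runs anywhere; the table and column names (page, categorylinks, cl_from, cl_to) follow the MediaWiki schema; the sample rows and the function names categories_of and make_demo_db are made up for illustration.

```python
import sqlite3


def categories_of(conn, title, namespace=0):
    """Return the sorted category names of one page."""
    rows = conn.execute(
        "SELECT cl_to FROM categorylinks "
        "JOIN page ON cl_from = page_id "
        "WHERE page_namespace = ? AND page_title = ? "
        "ORDER BY cl_to",
        (namespace, title),  # parameters are passed separately, never pasted into the SQL
    ).fetchall()
    return [row[0] for row in rows]


def make_demo_db():
    """Build a tiny stand-in database with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                           page_namespace INTEGER, page_title TEXT);
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
        INSERT INTO page VALUES (1, 0, 'xy'), (2, 0, 'other');
        INSERT INTO categorylinks VALUES
            (1, 'Stadt'), (1, 'Ort_in_Sachsen'), (2, 'Fluss');
    """)
    return conn


if __name__ == "__main__":
    print(categories_of(make_demo_db(), "xy"))
```

Against the real server the connection call and the placeholder style would differ (the MySQLdb driver, for instance, uses %s instead of ?), but the query itself should carry over.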
...
Question 3.
When I scan all articles from the dump (XML file) for coordinates with
Perl, it takes 45 minutes for EN and 15 minutes for DE. If I use the
MySQL database in the future, as I hope to, I think this process will take
too long for the database, so that all other services suffer. Is this right?
Or is the database powerful enough for this full-text search?
There is no need for a full-text search. If you get all articles that use
the template, you will only need the text of those pages. Fetching
the text and then running a regex over it on hemlock (at low priority) is
probably the best way to do it. You will need the text tables for that,
though, and they are not yet available.
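The regex step could look roughly like this in Python. A sketch only: the template name Koordinate and its parameter layout are hypothetical placeholders (the thread does not say which coordinate template is meant), the function names are invented, and the pattern does not handle templates nested inside the call.

```python
import re


def template_calls(wikitext, name):
    """Return the raw parameter string of every {{name|...}} call in wikitext.

    Limitation: a call containing nested {{...}} is cut at the first '}}'.
    """
    pattern = re.compile(
        r"\{\{\s*" + re.escape(name) + r"\s*\|(.*?)\}\}",
        re.IGNORECASE | re.DOTALL,
    )
    return [m.group(1).strip() for m in pattern.finditer(wikitext)]


def parse_params(raw):
    """Split 'a|k=v|b' into a list of positional and a dict of named parameters."""
    positional, named = [], {}
    for part in raw.split("|"):
        if "=" in part:
            key, value = part.split("=", 1)
            named[key.strip()] = value.strip()
        else:
            positional.append(part.strip())
    return positional, named


if __name__ == "__main__":
    # Invented sample page text.
    text = "Intro text. {{Koordinate|NS=51.05|EW=13.74|type=city}} More text."
    for raw in template_calls(text, "Koordinate"):
        print(parse_params(raw))
```

To keep the load on the machine down, the script can be started under nice, e.g. "nice -n 19 python scan.py" (scan.py being whatever the script is called).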
...
Question 4.
After the full-text search I will put the results in the database.
I also need a Perl example for that. Please help me.
See the same link.
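Writing results back is the same pattern with INSERT instead of SELECT. A minimal sketch in Python, with SQLite standing in for MySQL so it runs anywhere; the table name coordinates, its columns, and the function name store_coordinates are invented for the example and are not an existing toolserver table or API.

```python
import sqlite3


def store_coordinates(conn, rows):
    """Insert (page_title, lat, lon) tuples; return the total row count afterwards."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS coordinates ("
        "page_title TEXT PRIMARY KEY, lat REAL, lon REAL)"
    )
    with conn:  # commits on success, rolls back if an insert fails
        # INSERT OR REPLACE is SQLite syntax; MySQL offers REPLACE INTO
        # or INSERT ... ON DUPLICATE KEY UPDATE for the same purpose.
        conn.executemany(
            "INSERT OR REPLACE INTO coordinates (page_title, lat, lon) "
            "VALUES (?, ?, ?)",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM coordinates").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Invented sample values.
    print(store_coordinates(conn, [("Dresden", 51.05, 13.74),
                                   ("Berlin", 52.52, 13.40)]))
```

Passing the values as parameters rather than building the SQL string by hand avoids quoting problems with page titles that contain apostrophes.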
...
Question 5.
If I cannot do the full-text search on the MySQL database, I think it
would be very helpful to have one shared directory with all dumps
(XML files), so that every user can use these dumps. At the moment I have
the current dumps (DE+EN) in my home directory, but I am sure that other
users also have dumps. What do you think about this?
Wait until the text tables are there ;) - a FULLTEXT index on the text
table would be useful, but probably very space-consuming. However, if
zedler is overloaded by all the database requests, it could be an idea to
do XML full-text searches on hemlock (at low priority). Just my $0.02
-valhallasw
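
For the fallback of scanning the XML dumps directly, a streaming parser keeps memory use low however large the dump is. A rough sketch in Python: it assumes the usual MediaWiki export layout of <page> elements containing <title> and <revision><text>, strips the XML namespace instead of hard-coding an export version, and uses an invented two-page sample and a placeholder search string.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def pages_containing(source, needle):
    """Yield titles of pages whose text contains needle, one page at a time.

    source may be a file name or a binary file object.
    """
    title, text = None, ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if needle in text:
                yield title
            title, text = None, ""
            elem.clear()  # free the finished page's subtree


if __name__ == "__main__":
    # Invented sample in the shape of a dump; a real run would pass the dump path.
    dump = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page><title>Dresden</title><revision><text>{{Koordinate|NS=51.05}}</text></revision></page>
  <page><title>Elbe</title><revision><text>A river.</text></revision></page>
</mediawiki>"""
    print(list(pages_containing(io.BytesIO(dump), "{{Koordinate")))
```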
