Liz Kim wrote:
We have a wiki directory site, which is a simple table with names, phone numbers, etc. I was thinking about creating a search script to do a lookup: go through this file and find matches by anything in the content. I can think of two ways to approach this:
1. Somehow customize the search to ONLY search within the page for this directory page.
2. Write a script that goes into the database to do a search.
Any inputs/suggestions?
If you use a MediaWiki template to enter the information in the tables, you get a kind of semantic markup of the data. This can then be harvested, either from an XML dump, or by modifying the MediaWiki software to do the harvesting when a page is saved.
Instead of writing in the page:
{|
|-
! Name || Phone No.
|-
| Lars || 47
|-
| Liz || 32
|}
You can write:
{|
|-
! Name || Phone No.
{{phonebookentry|name=Lars|no=47}}
{{phonebookentry|name=Liz|no=32}}
|}
That kind of markup is a lot easier to harvest and analyze, because the parameter names hint at what the values are supposed to mean. To make the table render as before, you let Template:Phonebookentry contain this:
|-
| {{{name}}} || {{{no}}}
One such harvesting attempt for Wikipedia's contents is described at http://meta.wikimedia.org/wiki/User:LA2/Extraktor
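As a rough illustration of why the template form is easy to harvest, here is a minimal Python sketch. It is not the Extraktor code and not part of MediaWiki; the names ENTRY_RE, harvest and search are made up for this example. It assumes you already have the page's wikitext as a string (from an XML dump or otherwise) and that the parameter values contain no nested templates or pipe characters.

```python
import re

# Matches one {{phonebookentry|...}} call. Assumption: parameter values
# contain no nested templates, braces, or pipe characters.
# IGNORECASE is looser than MediaWiki, which only ignores the case of the
# first letter of a template name.
ENTRY_RE = re.compile(r"\{\{\s*phonebookentry\s*\|([^{}]*)\}\}", re.IGNORECASE)


def harvest(wikitext):
    """Return a list of dicts, one per phonebookentry template call."""
    entries = []
    for match in ENTRY_RE.finditer(wikitext):
        fields = {}
        for part in match.group(1).split("|"):
            key, sep, value = part.partition("=")
            if sep:  # skip unnamed (positional) parameters
                fields[key.strip()] = value.strip()
        entries.append(fields)
    return entries


def search(entries, term):
    """Return the entries where any field contains term, ignoring case."""
    needle = term.lower()
    return [e for e in entries if any(needle in v.lower() for v in e.values())]


# Hypothetical page content, copied from the example above.
page = """{|
|-
! Name || Phone No.
{{phonebookentry|name=Lars|no=47}}
{{phonebookentry|name=Liz|no=32}}
|}"""

entries = harvest(page)
print(search(entries, "liz"))  # [{'name': 'Liz', 'no': '32'}]
```

Because every value arrives with its parameter name attached, the search could just as easily be restricted to one field (only names, only numbers) instead of matching anything in the row.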