Hi, We have a wiki directory site which is a simple table with names, phone numbers, etc. I was thinking about creating a search script to do a lookup: go through this page and find entries by anything in the content. I can think of two ways to approach this:
1. Somehow customize the search to search ONLY within this directory page.
2. Write a script that goes into the database to do the search.
Any input or suggestions? Thank you
Liz Kim wrote:
We have a wiki directory site which is a simple table with names, phone numbers, etc. I was thinking about creating a search script to do a lookup: go through this page and find entries by anything in the content. I can think of two ways to approach this:
1. Somehow customize the search to search ONLY within this directory page.
2. Write a script that goes into the database to do the search.
Any input or suggestions?
If you use a MediaWiki template to enter the information in the tables, you get a kind of semantic markup of the data. This can then be harvested, either from an XML dump, or by modifying the MediaWiki software to do the harvesting when a page is saved.
Instead of writing in the page:
{|
|-
! Name || Phone No.
|-
| Lars || 47
|-
| Liz || 32
|}
You can write:
{|
|-
! Name || Phone No.
{{phonebookentry|name=Lars|no=47}}
{{phonebookentry|name=Liz|no=32}}
|}
That kind of markup is a lot easier to harvest and analyze, because it hints at what the values are supposed to mean. And then you let the Template:Phonebookentry contain this:
|-
| {{{name}}} || {{{no}}}
One such harvesting attempt for Wikipedia's contents is described on http://meta.wikimedia.org/wiki/User:LA2/Extraktor
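As a rough illustration of the harvesting idea above, here is a minimal Python sketch that pulls the phonebookentry template calls out of a page's wikitext and searches them. It is not the Extraktor code; the function names are made up for this example, and it assumes the parameter values contain no nested templates or pipe characters. The wikitext would come from an XML dump or from the page text, which is not shown here.

```python
import re

# Matches one {{phonebookentry|...}} call.
# Assumption: values contain no nested templates, braces, or pipes.
ENTRY_RE = re.compile(r"\{\{\s*phonebookentry\s*\|([^{}]*)\}\}", re.IGNORECASE)


def harvest_entries(wikitext):
    """Return a list of dicts, one per phonebookentry template call."""
    entries = []
    for match in ENTRY_RE.finditer(wikitext):
        fields = {}
        for part in match.group(1).split("|"):
            key, sep, value = part.partition("=")
            if sep:  # keep only named parameters such as name=Lars
                fields[key.strip()] = value.strip()
        entries.append(fields)
    return entries


def search_entries(entries, term):
    """Case-insensitive substring match against every field value."""
    needle = term.lower()
    return [e for e in entries if any(needle in v.lower() for v in e.values())]


# Sample page text, copied from the template-based table in this message.
SAMPLE = """{|
|-
! Name || Phone No.
{{phonebookentry|name=Lars|no=47}}
{{phonebookentry|name=Liz|no=32}}
|}"""

if __name__ == "__main__":
    print(search_entries(harvest_entries(SAMPLE), "liz"))
```

Because every value carries its parameter name, the same function could search by name or by number without knowing the column order of the table.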
On Friday 08 September 2006 09:40, Lars Aronsson wrote:
Liz Kim wrote:
We have a wiki directory site which is a simple table with names, phone numbers, etc. I was thinking about creating a search script to do a lookup: go through this page and find entries by anything in the content. I can think of two ways to approach this:
1. Somehow customize the search to search ONLY within this directory page.
2. Write a script that goes into the database to do the search.
Any input or suggestions?
If you use a MediaWiki template to enter the information in the tables, you get a kind of semantic markup of the data. This can then be harvested, either from an XML dump, or by modifying the MediaWiki software to do the harvesting when a page is saved.
Or you could use Semantic MediaWiki [1] to enter the data (this can also be combined with a template, but need not be, so the syntax of the final pages could be similar to the template approach). The software then does the extraction for you and provides the data in RDF/XML format. We showed at Wikimania 2006 how 7 lines of PHP suffice to load data from this format, even on the fly and over the web (some slides for the tutorial are at [2]). Most other common programming languages have good RDF support as well.
But maybe you would not even need the extraction, since Semantic MediaWiki already has some built-in search functions (which may or may not be useful for your setting).
We also use Semantic MediaWiki in our group-wiki to store our telephone numbers. We do it by putting the numbers on the user pages of our members. A list with all telephone numbers is then created automatically elsewhere in the wiki, and you can directly search for numbers by person. If you do not want to have extra articles for everything that has a telephone number, then Semantic MediaWiki can probably just help you in part of the extraction (e.g. you could get strings of the form "Name: some number" and continue processing these). At least you avoid parsing the wiki articles yourself.
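For the fallback case of post-processing strings of the form "Name: some number", a minimal Python sketch might look like the following. The function names and the sample strings are hypothetical, and how the strings are obtained from Semantic MediaWiki is not shown; the sketch only assumes one "Name: number" pair per line.

```python
def parse_name_number(lines):
    """Turn 'Name: number' strings into a {name: number} dict.

    Lines without a colon, or with an empty name or number, are skipped.
    """
    book = {}
    for line in lines:
        name, sep, number = line.partition(":")
        name, number = name.strip(), number.strip()
        if sep and name and number:
            book[name] = number
    return book


def lookup(book, query):
    """Return entries whose name or number contains the query (case-insensitive)."""
    q = query.lower()
    return {n: num for n, num in book.items()
            if q in n.lower() or q in num.lower()}


if __name__ == "__main__":
    # Hypothetical input strings, using the names from earlier in the thread.
    book = parse_name_number(["Lars: 47", "Liz: 32", "not an entry"])
    print(lookup(book, "li"))
```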
Cheers,
Markus
[1] http://ontoworld.org/wiki/Semantic_MediaWiki
[2] http://wikimania2006.wikimedia.org/wiki/Proceedings:MK1
2006/9/8, Liz Kim lizkim270@gmail.com:
Hi, We have a wiki directory site which is a simple table with names, phone numbers, etc. I was thinking about creating a search script to do a lookup: go through this page and find entries by anything in the content. I can think of two ways to approach this:
1. Somehow customize the search to search ONLY within this directory page.
2. Write a script that goes into the database to do the search.
Any input or suggestions? Thank you
Number 1 could be done by putting the directory in a separate namespace and restricting the search to it, or by having a template on all those pages and searching for the wanted text in combination with text that appears in the template.
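The namespace variant could be sketched roughly as below, assuming a MediaWiki version that exposes the api.php search module (list=search with the srsearch and srnamespace parameters). The wiki URL and the namespace number 100 are placeholders for this example, not values from this thread; custom namespaces are normally numbered from 100 upward. The sketch only builds the request URL and does not contact any server.

```python
from urllib.parse import urlencode

# Placeholder endpoint and namespace ID; replace with your wiki's values.
API_URL = "https://wiki.example.org/w/api.php"
DIRECTORY_NAMESPACE = 100


def build_search_url(term, namespace=DIRECTORY_NAMESPACE, api_url=API_URL):
    """Build a search request limited to a single namespace."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,          # text to look for
        "srnamespace": namespace,  # restrict results to the directory namespace
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("Liz"))
```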
wikitech-l@lists.wikimedia.org