It'd be great if you could train this on different OSS packages like WordPress, MediaWiki and Vanilla forum to generate a site-wide content search, and also to extract related posts, articles and discussions for a given topic.
Hi,
We have developed a fast similarity search algorithm (FastSS) for keyword search and have tested it on English Wikipedia articles. The similarity metric is the edit distance between words, which is language-independent. Results are displayed ordered by occurrence and edit distance. A demo of our prototype is available at http://fastss.csg.uzh.ch/.
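To illustrate the similarity metric only, here is a minimal sketch of edit-distance matching against a word list. It is a brute-force scan, not the FastSS algorithm itself; the function names (edit_distance, similar_words), the max_distance parameter and the sample vocabulary are hypothetical and chosen just for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def similar_words(query, vocabulary, max_distance=2):
    """Return (distance, word) pairs within max_distance, closest first.

    Hypothetical helper: it compares the query with every word, which is
    the naive baseline that an index is meant to avoid.
    """
    scored = ((edit_distance(query, word), word) for word in vocabulary)
    return sorted(pair for pair in scored if pair[0] <= max_distance)


if __name__ == "__main__":
    words = ["wikipedia", "wikimedia", "mediawiki"]
    print(similar_words("wikipedia", words))
```

Because the metric only counts character edits, it needs no language-specific rules such as stemming or dictionaries.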
The indexing of the complete English Wikipedia takes ~3 days. We are still working on improving the performance, solving issues with umlauts, improving the rendering of the output page and integrating the title in the indexing phase.
Should we work towards a MediaWiki integration of FastSS? Would it be of interest to you to include FastSS in Wikimedia? Any comments are welcome.
Regards,
Thomas Bocek
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l