Markus Krötzsch wrote:
I did not follow this discussion, but it seems appropriate to point to the Semantic MediaWiki extension, which computes "implied categories" like "American actors" on request (it can combine unions, intersections, namespace membership, and further "semantic properties").
We will assist in any effort towards developing an efficient way of doing this, since our current implementation is probably not fast enough for large wikis.
Hi Markus. The Semantic MediaWiki extension is very cool, but I think the main issue at this point is exactly what you said in your second paragraph: an efficient way to do the data retrieval portion of this stuff (specifically, for me, category intersections). There are a few very neat extensions (Semantic MediaWiki, DPL, a home-brewed category intersections special page I did for MediaWiki 1.4.x) but they are not fast enough for a large wiki. This is the problem I'm trying to solve (for category intersections, anyway), and then we can hash out interfaces etc. I've got a test script using a MySQL fulltext index that may be good enough, and if it isn't, I'll do one using Lucene (the PHP version).
Maybe, though, it's appropriate to talk about what features category intersections and Semantic MediaWiki share, and see if we can't find (data retrieval) solutions for both. I'm not familiar with the backend of Semantic MediaWiki at all, so I can't comment on that.
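The test script itself is not shown in this thread. As a rough illustration of why a text index might help with category intersections, the sketch below treats each page as a "document" whose terms are its category names, so that an intersection becomes an AND query over posting lists. A pure-Python inverted index stands in here for the MySQL fulltext index or Lucene; the function names and sample data are made up for the example.

```python
from collections import defaultdict

def build_index(page_categories):
    """Map each category name to the set of pages tagged with it."""
    index = defaultdict(set)
    for page, categories in page_categories.items():
        for category in categories:
            index[category].add(page)
    return index

def intersect(index, categories):
    """Return the pages that carry every category in `categories`."""
    # Start from the shortest posting list so the working set stays small.
    postings = sorted((index.get(c, set()) for c in categories), key=len)
    if not postings:
        return set()
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
        if not result:
            break  # no page can match any more
    return result

# Hypothetical sample data: page title -> category names.
PAGES = {
    "Page A": ["Americans", "Actors"],
    "Page B": ["Americans", "Politicians"],
    "Page C": ["Actors"],
    "Page D": ["Americans", "Actors", "Politicians"],
}

index = build_index(PAGES)
american_actors = intersect(index, ["Americans", "Actors"])
```

Each lookup touches only the posting lists of the requested categories, which is the property a fulltext engine would be relied on to provide at scale.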
Best Regards, Aerik
On Thursday 11 January 2007 18:31, Aerik Sylvan wrote:
Markus Krötzsch wrote:
I did not follow this discussion, but it seems appropriate to point to the Semantic MediaWiki extension, which computes "implied categories" like "American actors" on request (it can combine unions, intersections, namespace membership, and further "semantic properties").
We will assist in any effort towards developing an efficient way of doing this, since our current implementation is probably not fast enough for large wikis.
Hi Markus. The Semantic MediaWiki extension is very cool, but I think the main issue at this point is exactly what you said in your second paragraph: an efficient way to do the data retrieval portion of this stuff (specifically, for me, category intersections). There are a few very neat extensions (Semantic MediaWiki, DPL, a home-brewed category intersections special page I did for MediaWiki 1.4.x) but they are not fast enough for a large wiki. This is the problem I'm trying to solve (for category intersections, anyway), and then we can hash out interfaces etc. I've got a test script using a MySQL fulltext index that may be good enough, and if it isn't, I'll do one using Lucene (the PHP version).
Maybe, though, it's appropriate to talk about what features category intersections and Semantic MediaWiki share, and see if we can't find (data retrieval) solutions for both. I'm not familiar with the backend of Semantic MediaWiki at all, so I can't comment on that.
SMW's backend trivially extends the DB layout to add some tables for storing all semantic information. This is fast enough for small wikis, and only for those. Querying generates a lot of joins among a few large tables, quite similar to the situation with category intersections.
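To make the "lot of joins" concrete, here is a minimal sketch of a category intersection written as self-joins over a link table. The table is modeled on MediaWiki's categorylinks table (cl_from for the page id, cl_to for the category name); the sample rows and the function names are made up, and SQLite is used only so the example runs on its own.

```python
import sqlite3

def make_demo_db():
    """Create an in-memory link table with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE categorylinks ("
        " cl_from INTEGER NOT NULL,"   # page id
        " cl_to TEXT NOT NULL,"        # category name
        " PRIMARY KEY (cl_from, cl_to))"
    )
    conn.executemany(
        "INSERT INTO categorylinks VALUES (?, ?)",
        [
            (1, "Americans"), (1, "Actors"),
            (2, "Americans"), (2, "Politicians"),
            (3, "Actors"),
            (4, "Americans"), (4, "Actors"), (4, "Politicians"),
        ],
    )
    return conn

def pages_in_all(conn, categories):
    """Return ids of pages that are in every listed category."""
    if not categories:
        return []
    # One extra self-join per additional category: this is the part that
    # becomes expensive when the link table is large.
    sql = ["SELECT c0.cl_from FROM categorylinks AS c0"]
    for i in range(1, len(categories)):
        sql.append(
            f"JOIN categorylinks AS c{i}"
            f" ON c{i}.cl_from = c0.cl_from AND c{i}.cl_to = ?"
        )
    sql.append("WHERE c0.cl_to = ? ORDER BY c0.cl_from")
    # Placeholders appear in the JOINs first, then in the WHERE clause.
    params = list(categories[1:]) + [categories[0]]
    return [row[0] for row in conn.execute(" ".join(sql), params)]

conn = make_demo_db()
american_actors = pages_in_all(conn, ["Americans", "Actors"])  # [1, 4]
```

Every additional category adds another pass over the same large table, which is the scaling problem both approaches share.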
SMW differs from the category intersection problem in that it also considers other properties (in addition to "is element of category"). In general, it stores data of the form
A has_property B with value C
i.e. triples. Our current storage model is a so-called "single table approach": have (essentially) one large table with (essentially) three columns A, B, and C. Another approach is to have one table for each B, with two columns A and C. This generates smaller tables, but you can end up with a large number of tables. There are hybrid approaches that are better. But it seems that smart caching strategies and a not-quite-realtime computation could be more robust solutions to achieve practical scalability.
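The two layouts can be sketched side by side. This is only an illustration under assumed names, not SMW's actual schema: the table names (triples, prop_0, prop_1, ...), the column names a, b, c, and the sample triples are all hypothetical.

```python
import sqlite3

# Made-up triples of the form (A, B, C): "A has_property B with value C".
TRIPLES = [
    ("CityA", "located_in", "CountryX"),
    ("CityB", "located_in", "CountryX"),
    ("CityC", "located_in", "CountryY"),
    ("CityA", "has_mayor", "PersonP"),
]

def build_single_table(conn, triples):
    """Single table approach: one table with three columns A, B, C."""
    conn.execute("CREATE TABLE triples (a TEXT, b TEXT, c TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)

def build_per_property_tables(conn, triples):
    """One table per property B, each with two columns A and C."""
    tables = {}  # property name -> generated table name
    for a, b, c in triples:
        if b not in tables:
            # Generated names avoid using property strings as identifiers.
            tables[b] = f"prop_{len(tables)}"
            conn.execute(f"CREATE TABLE {tables[b]} (a TEXT, c TEXT)")
        conn.execute(f"INSERT INTO {tables[b]} VALUES (?, ?)", (a, c))
    return tables

def lookup_single(conn, b, c):
    """All A with property b = c, filtering the one big table."""
    rows = conn.execute(
        "SELECT a FROM triples WHERE b = ? AND c = ?", (b, c))
    return sorted(r[0] for r in rows)

def lookup_per_property(conn, tables, b, c):
    """All A with property b = c, reading only that property's table."""
    name = tables.get(b)
    if name is None:
        return []
    rows = conn.execute(f"SELECT a FROM {name} WHERE c = ?", (c,))
    return sorted(r[0] for r in rows)

conn = sqlite3.connect(":memory:")
build_single_table(conn, TRIPLES)
tables = build_per_property_tables(conn, TRIPLES)
in_x_single = lookup_single(conn, "located_in", "CountryX")
in_x_split = lookup_per_property(conn, tables, "located_in", "CountryX")
```

Both lookups return the same subjects; the per-property layout scans a smaller table, at the cost of one table per distinct property.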
Considering text-indexing for category intersection, I do not see how this could be used for SMW, since the property (B) is implicit. A typical SMW-search would be: give me all As with a property B1 with unknown value X that has a property B2 with value C, i.e. search for the pattern "A -B1-> * -B2-> C" (example: give me all cities which have a mayor that is a member of the democratic party). Not an easy task.
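On a single triple table, the pattern "A -B1-> * -B2-> C" corresponds to a self-join in which the value of the first triple is the subject of the second. The following is a sketch under assumed names (a triples table with columns a, b, c, and made-up city, mayor, and party data), not SMW's actual query code.

```python
import sqlite3

def make_triple_db():
    """Create an in-memory triple table with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (a TEXT, b TEXT, c TEXT)")
    conn.executemany(
        "INSERT INTO triples VALUES (?, ?, ?)",
        [
            ("CityA", "has_mayor", "PersonP"),
            ("CityB", "has_mayor", "PersonQ"),
            ("CityC", "has_mayor", "PersonR"),
            ("PersonP", "member_of", "Democratic Party"),
            ("PersonQ", "member_of", "Other Party"),
            ("PersonR", "member_of", "Democratic Party"),
        ],
    )
    return conn

def two_hop(conn, b1, b2, c):
    """Find all A matching the pattern A -b1-> X -b2-> c for any X."""
    rows = conn.execute(
        "SELECT DISTINCT t1.a"
        " FROM triples AS t1"
        " JOIN triples AS t2 ON t2.a = t1.c"  # the unknown X links both hops
        " WHERE t1.b = ? AND t2.b = ? AND t2.c = ?"
        " ORDER BY t1.a",
        (b1, b2, c),
    )
    return [r[0] for r in rows]

conn = make_triple_db()
cities = two_hop(conn, "has_mayor", "member_of", "Democratic Party")
```

The intermediate value X never appears in the query text, which is why indexing the text of a page's own annotations does not answer this kind of question directly.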
Anyway, if you have results for category intersections, we would be interested to hear about them. We can also provide our not-too-slow Wikipedia test server for large-scale experiments.
Regards,
Markus