Hi All, I was really hoping to get some feedback on the performance of my proof-of-concept intersections page at http://aerik.com/wikintersections.php - anybody?
This is using the MyISAM table with categories stored as words, one row per page, with a fulltext index. It was a bit faster and much more consistent on my local machine, but I'd really like anybody interested in intersections to throw queries at it and beat it up, to see whether this might be an efficient enough solution for prime time.
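A minimal sketch of the layout described above, assuming a hypothetical table `page_categories` with a fulltext-indexed text column `categories` (the real schema behind the page is not shown). Each page gets one row whose text is its category names joined by spaces, with spaces inside a name turned into underscores so each category is one "word"; an intersection is then a MySQL `MATCH ... AGAINST (... IN BOOLEAN MODE)` search where `+` marks every word as required.

```python
# Hypothetical table/column names; the actual schema is not given in the post.
SQL = (
    "SELECT page_id FROM page_categories "
    "WHERE MATCH(categories) AGAINST(%s IN BOOLEAN MODE)"
)


def category_token(name: str) -> str:
    """Turn a category title into a single fulltext word (spaces -> underscores)."""
    return name.strip().replace(" ", "_")


def row_text(categories: list[str]) -> str:
    """Build the one text value stored (and fulltext-indexed) for a page."""
    return " ".join(category_token(c) for c in categories)


def intersection_query(categories: list[str]) -> tuple[str, str]:
    """Return SQL plus the boolean-mode search string requiring every category."""
    terms = " ".join("+" + category_token(c) for c in categories)
    return SQL, terms
```

For example, `intersection_query(["Living people", "Articles with unsourced statements"])` yields the search string `+Living_people +Articles_with_unsourced_statements`, which would be passed as a bound parameter to a MySQL DB-API cursor that uses `%s` placeholders rather than pasted into the SQL.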
Of course, one difficult-to-anticipate factor is that if category intersections are adopted and become popular, we will likely see a movement towards more implied categories ("Americans" and "Actors" instead of "American Actors") and fewer deep categories. I can imagine the effect this will have on the index: fewer keywords, each with more entries. I think this approach copes with that, since "+Living_people +Articles_with_unsourced_statements" (two very large categories) performs well.
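To make the "fewer keywords, each with more entries" concern concrete, here is a toy in-memory inverted index (category -> set of page ids). This is an illustrative model only, not how MyISAM's fulltext engine executes a query: in this model an AND query starts from the smallest posting set, so its cost is bounded by the rarest category, and broad implied categories only hurt when every requested category is large.

```python
def intersect(index: dict[str, set[int]], categories: list[str]) -> set[int]:
    """Return the page ids that appear in every requested category."""
    postings = [index.get(c, set()) for c in categories]
    if not postings:
        return set()
    postings.sort(key=len)  # begin with the smallest posting set
    result = set(postings[0])  # copy so the index itself is never mutated
    for posting in postings[1:]:
        if not result:
            break  # nothing left to narrow down
        result &= posting
    return result
```

With broad categories such as "Americans" and "Actors", each set is large, but the result can never be bigger than the smaller of the two.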
Thanks, Aerik
P.S. I've been testing this by clicking the links, then picking some other existing article from the Wikipedia entry at the previous intersection. There is a lot of noise in my result times: in pure form I think the approach is good, with results often coming in under 0.5 seconds, but sometimes the same query, or a query of similar complexity, takes 2 seconds. I don't know how to extrapolate this to a theory of how it would perform on the live servers.
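One way to separate query cost from noise is to repeat the same query several times and look at the minimum and median rather than a single run. A small sketch, assuming `run` is any zero-argument callable that executes one query (for example a lambda wrapping a cursor call); it measures wall-clock time on the client, so network and server load still show up in the numbers.

```python
import statistics
import time
from typing import Callable


def time_query(run: Callable[[], object], repeats: int = 10) -> dict[str, float]:
    """Run one query repeatedly and summarise wall-clock timings in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),  # closest to the query's intrinsic cost
        "median": statistics.median(samples),  # typical case, robust to outliers
        "max": max(samples),  # worst case seen, usually where the noise lives
    }
```

A large gap between `min` and `max` for the same query points to noise (caching, load) rather than query complexity.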
Aerik Sylvan wrote:
Hi All, I was really hoping to get some feedback on the performance of my proof-of-concept intersections page at http://aerik.com/wikintersections.php - anybody?
The performance was acceptable to me on most of the tests I did. A few outliers came in at over 4 seconds.
Of course, one difficult-to-anticipate factor is that if category intersections are adopted and become popular, we will likely see a movement towards more implied categories ("Americans" and "Actors" instead of "American Actors")
This is why people want the intersections, so the system should be able to scale efficiently enough for it.
Matthew Flaschen
On 1/6/07, Matthew Flaschen matthew.flaschen@gatech.edu wrote:
Of course, one difficult-to-anticipate factor is that if category intersections are adopted and become popular, we will likely see a movement towards more implied categories ("Americans" and "Actors" instead of "American Actors")
This is why people want the intersections, so the system should be able to scale efficiently enough for it.
Indeed, what we probably want is a way to formalise these "implied categories". Like creating a category "American actors" which simply contains the text "Category:Americans Category:Actors", and having the whole thing work just as you'd expect.
Steve
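A sketch of how such an "implied category" could be resolved, assuming a hypothetical mapping from an implied category name to the categories it stands for (e.g. "American actors" -> "Americans", "Actors"). Definitions may nest; the resolver expands them down to base categories and refuses cycles. Membership then means "the page carries every base category".

```python
def expand(category: str, definitions: dict[str, list[str]],
           seen: frozenset[str] = frozenset()) -> frozenset[str]:
    """Resolve a possibly implied category into the base categories it requires."""
    if category in seen:
        raise ValueError(f"cyclic definition at {category!r}")
    parts = definitions.get(category)
    if parts is None:
        return frozenset({category})  # an ordinary, non-implied category
    required: set[str] = set()
    for part in parts:
        required |= expand(part, definitions, seen | {category})
    return frozenset(required)


def members(category: str, definitions: dict[str, list[str]],
            page_categories: dict[str, set[str]]) -> set[str]:
    """Pages whose own categories include every base category required."""
    required = expand(category, definitions)
    return {page for page, cats in page_categories.items() if required <= cats}
```

For example, with "American actors" defined as ["Americans", "Actors"], a page tagged only "Americans" and "Actors" is listed under "American actors" without being tagged with it directly.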
Steve Bennett wrote:
Indeed, what we probably want is a way to formalise these "implied categories". Like creating a category "American actors" which simply contains the text "Category:Americans Category:Actors", and having the whole thing work just as you'd expect.
That shouldn't even be necessary. That just creates the semantic problems we have. It could be as simple as
[[:Intersect:American|Actors|Link text]]
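The proposed syntax is not fully specified; one possible reading is that every pipe-separated segment is a category except the last, which is the link text. A parser sketch under that assumption (the `Intersect:` prefix and this rule are hypothetical, not an existing MediaWiki feature):

```python
import re
from typing import NamedTuple

# Hypothetical link form: [[:Intersect:Cat1|Cat2|...|Link text]]
LINK_RE = re.compile(r"\[\[:Intersect:([^\]]+)\]\]")


class IntersectLink(NamedTuple):
    categories: list[str]
    text: str


def parse_intersect_links(wikitext: str) -> list[IntersectLink]:
    """Find intersection links, treating the final segment as the link text."""
    links = []
    for match in LINK_RE.finditer(wikitext):
        parts = [p.strip() for p in match.group(1).split("|")]
        if len(parts) < 2:
            continue  # need at least one category plus link text
        links.append(IntersectLink(categories=parts[:-1], text=parts[-1]))
    return links
```

Under this rule `[[:Intersect:American|Actors|Link text]]` parses to the categories "American" and "Actors" with the text "Link text"; a link with no text would need a different convention.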
That reminds me. What about category unions?
Matthew Flaschen
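With the fulltext layout used by the proof of concept, a union can be expressed in the same MySQL boolean-mode search string: a word prefixed with `+` is required, while a word with no operator is optional, so a row containing at least one of the words matches. A sketch, assuming the same underscore-joined category tokens:

```python
def boolean_terms(categories: list[str], mode: str) -> str:
    """Build a MySQL boolean-mode search string for an intersection or a union."""
    tokens = [c.strip().replace(" ", "_") for c in categories]
    if mode == "intersection":
        return " ".join("+" + t for t in tokens)  # every word required
    if mode == "union":
        return " ".join(tokens)  # no operator: any one word is enough
    raise ValueError(f"unknown mode: {mode!r}")
```

Mixing the two needs care: `+Actors Americans Canadians` requires only "Actors" and merely ranks the others higher; requiring "Actors" and at least one of the other two would be written with a group, `+Actors +(Americans Canadians)`.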
On Wednesday 10 January 2007 01:31, Matthew Flaschen wrote:
Steve Bennett wrote:
Indeed, what we probably want is a way to formalise these "implied categories". Like creating a category "American actors" which simply contains the text "Category:Americans Category:Actors", and having the whole thing work just as you'd expect.
I did not follow this discussion, but it seems appropriate to point to the Semantic MediaWiki extension, which computes "implied categories" like "American actors" on request (it can combine unions, intersections, namespace membership, and further "semantic properties").
We will assist in any effort towards developing an efficient way of doing this, since our current implementation is probably not fast enough for large wikis. Anyway, APIs and basic user interfaces exist. The homepage of this extension is [1]. See [2] for an example of automatically combining information into new listings, and [3] for a dynamic category intersection of two categories (I fear that ontoworld does not have many examples of things with more than one category). The query syntax used is described at [4].
-- Markus
[1] http://ontoworld.org/wiki/SMW
[2] http://ontoworld.org/wiki/Africa
[3] http://ontoworld.org/wiki/Special:Ask?query=%5B%5BCategory%3ASemantic+browse...
[4] http://ontoworld.org/wiki/Help:Semantic_search
Aerik Sylvan wrote:
Hi All, I was really hoping to get some feedback on the performance of my proof-of-concept intersections page at http://aerik.com/wikintersections.php - anybody?
The performance (as in, speed) is very good.
The implementation is not. It strips out all apostrophes, so it is impossible to use categories whose names contain an apostrophe, e.g. [[Category:Children's literature]]. I find it hard to imagine this was an accident; you must have written code explicitly to remove apostrophes instead of escaping them properly. Why didn't this ring an alarm bell for you? :-p
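The usual alternative to stripping or hand-escaping is to bind user input as query parameters. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL and a hypothetical toy table `page_category(page, category)`; it also swaps in a relational intersection (GROUP BY / HAVING COUNT) instead of the fulltext search, purely to show that a name like "Children's_literature" survives intact when passed as a parameter. Only the `?` placeholders are generated into the SQL text; the category names never are.

```python
import sqlite3


def pages_in_all(conn: sqlite3.Connection, categories: list[str]) -> set[str]:
    """Pages in every given category; names are bound, never spliced into SQL."""
    if not categories:
        return set()
    placeholders = ",".join("?" for _ in categories)  # only "?" marks go in the SQL
    sql = (
        "SELECT page FROM page_category "
        f"WHERE category IN ({placeholders}) "
        "GROUP BY page HAVING COUNT(DISTINCT category) = ?"
    )
    rows = conn.execute(sql, [*categories, len(categories)]).fetchall()
    return {page for (page,) in rows}


# Toy data; apostrophes are stored and matched exactly as written.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_category (page TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO page_category VALUES (?, ?)",
    [
        ("Charlotte's Web", "Children's_literature"),
        ("Charlotte's Web", "1952_novels"),
        ("Matilda", "Children's_literature"),
    ],
)
```

With a MySQL driver the same idea applies to the fulltext search string; whether MySQL's fulltext parser then treats the apostrophe as part of the word is a separate question worth checking against its documentation.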
wikitech-l@lists.wikimedia.org