On 10/24/2010 8:42 PM, Aryeh Gregor wrote:
> My first thought was to write a GPU program to crack MediaWiki
> password hashes as quickly as possible, then use what we've studied in
> class about GPU architecture to design a hash function that would be
> as slow as possible to crack on a GPU relative to its PHP execution
> speed, as Tim suggested a while back. However, maybe there's
> something more interesting I could do.
Boring.
I want Wikipedia converted into facts in a representation system
that supports modal, temporal, and "microtheory" reasoning. You
know, in the "real" world, :James_T_Kirk is a :Fictional_Character,
but in the Star Trek universe, he's a :Person.
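Something like this, say, sketched with rdflib named graphs standing in
for microtheories (the example.org namespace, context URIs, and class
names are placeholders I made up for illustration):

from rdflib import Dataset, Namespace, RDF, URIRef

EX = Namespace("http://example.org/")   # hypothetical namespace
ds = Dataset()

# One named graph per "microtheory" / context.
real_world = ds.graph(URIRef("http://example.org/context/RealWorld"))
star_trek = ds.graph(URIRef("http://example.org/context/StarTrekUniverse"))

# The same resource gets a different type in each context.
real_world.add((EX.James_T_Kirk, RDF.type, EX.Fictional_Character))
star_trek.add((EX.James_T_Kirk, RDF.type, EX.Person))

# Query within one context: "what is Kirk in the Star Trek universe?"
for _, _, o in star_trek.triples((EX.James_T_Kirk, RDF.type, None)):
    print(o)   # -> http://example.org/Person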
Of course, you'd have to pick some chunk out of that big task
that's doable. One thing I'd like is something that extracts the
"meaning" of hyperlinks. For instance, if we look at
http://en.wikipedia.org/wiki/Bruce_Lee
We see a link to :Wong_Jack_Man, and in DBpedia right now this is
represented as a unidirectional hyperlink without semantics. Now, a
smarter system could say
:Bruce_Lee :Had_A_Fight_With :Wong_Jack_Man.
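Here's a rough sketch of the cheapest version of that, with a
hand-written pattern standing in for a learned template (the pattern,
the predicate name, and the sample sentence are all made up for
illustration):

import re

# Toy pattern -> predicate table; a real system would learn these
# templates from the corpus instead of hard-coding them.
PATTERNS = [
    (re.compile(r"fight (?:with|against) \[\[(?P<obj>[^\]|]+)"), "Had_A_Fight_With"),
]

def extract_relations(subject, wikitext):
    """Yield (subject, predicate, object) triples guessed from link context."""
    # Crude sentence split; real wikitext needs a proper parser.
    for sentence in re.split(r"(?<=[.!?])\s+", wikitext):
        for pattern, predicate in PATTERNS:
            for m in pattern.finditer(sentence):
                obj = m.group("obj").strip().replace(" ", "_")
                yield (f":{subject}", f":{predicate}", f":{obj}")

text = "In 1964 Lee had a private fight with [[Wong Jack Man]] in Oakland."
for triple in extract_relations("Bruce_Lee", text):
    print(*triple)   # :Bruce_Lee :Had_A_Fight_With :Wong_Jack_Man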
Although Wikipedia is a relatively difficult corpus to work with using
typical bag-of-words (BOW) and NLP methods, it's got enough semantic
structure that hybrid semantic-BOW/NLP methods ought to be able to work
miracles. I
think that the way hyperlinks are used in text could be mined to learn
templates for detecting named-entity references. I think it also ought
to be possible to build linguistic models for classification. For
instance, if you're having trouble telling your Jaguars apart,
http://en.wikipedia.org/wiki/Jaguar_(disambiguation)
and related documents might help you make a filter that can tell the
difference between "jaguar the cat" and "jaguar the car".