Hi,
I want to introduce a *mathematical* search engine that works over an English Wikipedia dump. The key advantage is simple: *it works* ;). A real demo is better than a nice speech, and it can be found here: http://egomath.projekty.ms.mff.cuni.cz
If you are interested or just want to share your thoughts, do not hesitate to contact me.
Best regards,
Jozef Misutka
__________________________________
Charles University in Prague, Department of Software Engineering
www: http://www.ksi.mff.cuni.cz/cs/~misutka
Hi Jozef,
I just played around a bit and liked what I saw, though I didn't see much, as the site was very slow.
How did you strip the dump of the non-mathematical articles? I am asking because one of the major uses that I have in mind for a good mathematical search engine would be to identify areas around topic A (say, theoretical biology) that use the same concepts as those in topic B (say, economics). Very often such distant fields are only weakly connected, but solutions or approaches that work in one of them are not infrequently transferable. In order to be useful for such purposes, your corpus would still have to contain the economics/ theoretical biology articles (at least those that use equations), but I couldn't find evidence for that.
Daniel
On Mon, Apr 2, 2012 at 2:21 PM, Jozef Misutka misutka@ksi.mff.cuni.cz wrote:
[...]
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Hi Daniel,
On Tue, Apr 3, 2012 at 12:15 AM, Daniel Mietchen <daniel.mietchen@googlemail.com> wrote:
Hi Jozef,
I just played around a bit and liked what I saw, though I didn't see much, as the site was very slow.
It was a hardware failure (RAID5, I think...). Anyway, it was fixed several hours ago.
How did you strip the dump of the non-mathematical articles?
Very simply: a mathematical article is any article that contains "</math".
I do not claim to have a perfect Wikipedia tag parser, but the vast majority of the formulae in Wikipedia are typeset using the standard Wikipedia markup and simply sit inside the article text, so this works fine.
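The filter described above can be sketched in a few lines of Python. This is only an illustration, not the actual egomath code: the function names, the regular expression, and the two sample pages are assumptions made up for the example.

```python
import re

# Assumption: formulae appear as <math ...>...</math> in the raw wikitext.
MATH_RE = re.compile(r"<math(?:\s[^>]*)?>(.*?)</math\s*>", re.DOTALL | re.IGNORECASE)


def is_mathematical(wikitext):
    """Heuristic from the thread: the article contains "</math"."""
    return "</math" in wikitext


def extract_formulae(wikitext):
    """Return the LaTeX source of every <math> element, in document order."""
    return [body.strip() for body in MATH_RE.findall(wikitext)]


# Hypothetical sample pages standing in for a real dump.
pages = {
    "Vasicek model": "The model is <math>dr_t = a(b-r_t)\\, dt + \\sigma \\, dW_t</math>.",
    "Prague": "Prague is the capital of the Czech Republic.",
}

# Keep only the pages that pass the heuristic, together with their formulae.
math_pages = {
    title: extract_formulae(text)
    for title, text in pages.items()
    if is_mathematical(text)
}
print(math_pages)
```

A plain substring test like this cannot tell real formulae from `</math` inside comments or `<nowiki>` sections, which is where a full wikitext parser would help.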
I am asking because one of the major uses that I have in mind for a good mathematical search engine would be to identify areas around topic A (say, theoretical biology) that use the same concepts as those in topic B (say, economics). Very often such distant fields are only weakly connected, but solutions or approaches that work in one of them are not infrequently transferable.
That is exactly one of the interesting applications for a mathematical search engine.
I wanted to reply with something interesting, so I called a friend and asked him about interesting formulae from economics. He told me about the Vasicek model, so I searched for the formula dr_t = a(b-r_t) dt + \sigma dW_t, which returned 2 hits with no abstraction level - no big deal. But then I tried to abstract it and another hit came up, which is imho interesting (different variables used, but the same formula).
Vasicek model: http://egomath.projekty.ms.mff.cuni.cz/index.php?q=&math=dr_t+%3D+a%28b-r_t%29+dt+%2B+%5Csigma+dW_t&hide_snippets=0
Vasicek model and similar: http://egomath.projekty.ms.mff.cuni.cz/index.php?q=&math=dr_t+%3D+a%28b-r_t%29+dt+%2B+%5Csigma+dW_t&level=9&hide_snippets=0
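One way to picture "different variables used but the same formula" is to rename variables by order of first appearance, as in the toy Python sketch below. How egomath actually abstracts formulae is not described in this thread, so the tokenizer, the list of Greek commands, and the `v1, v2, ...` naming are all assumptions for illustration only.

```python
import re

# Tokens: LaTeX commands, escaped symbols (e.g. "\,"), single letters, numbers, other symbols.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|\\.|[A-Za-z]|\d+|\S")

# Spacing commands carry no meaning and are dropped.
SPACING = {"\\,", "\\;", "\\:", "\\!", "\\ "}

# Assumption: a small, incomplete set of commands treated as variable names.
GREEK = {"\\alpha", "\\beta", "\\gamma", "\\delta", "\\theta",
         "\\lambda", "\\mu", "\\rho", "\\sigma", "\\tau"}


def abstract(formula):
    """Rename each variable to v1, v2, ... in order of first appearance."""
    names = {}
    out = []
    for tok in TOKEN_RE.findall(formula):
        if tok in SPACING:
            continue
        if tok in GREEK or (len(tok) == 1 and tok.isalpha()):
            # Note: the differential "d" is treated like any other letter here.
            tok = names.setdefault(tok, "v%d" % (len(names) + 1))
        out.append(tok)
    return " ".join(out)


vasicek = r"dr_t = a(b-r_t) dt + \sigma dW_t"
ornstein_uhlenbeck = r"dx_t = \theta (\mu-x_t)\,dt + \sigma\, dW_t"

print(abstract(vasicek))
print(abstract(vasicek) == abstract(ornstein_uhlenbeck))
```

Both formulae reduce to the same token sequence, so an index keyed on the abstracted form would return both articles for either query. A real system would also need to handle operator names, commutativity, and structurally equivalent rewrites, none of which this sketch attempts.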
In order to be useful for such purposes, your corpus would still have to contain the economics/ theoretical biology articles (at least those that use equations), but I couldn't find evidence for that.
See the number of documents (and categories) when you search for simple text, e.g.:
economy: http://egomath.projekty.ms.mff.cuni.cz/index.php?math=&q=economy
biology: http://egomath.projekty.ms.mff.cuni.cz/index.php?math=&q=biology
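The links in this message all follow the same pattern, so they can be built programmatically. The sketch below uses only the query parameters visible in those links (`q`, `math`, `level`, `hide_snippets`). Reading `level` as the abstraction level is an inference from the two Vasicek links, and `search_url` is a hypothetical helper, not part of any published API.

```python
from urllib.parse import urlencode

BASE = "http://egomath.projekty.ms.mff.cuni.cz/index.php"


def search_url(text="", math="", level=None, hide_snippets=None):
    """Build a search link from a text query and/or a LaTeX formula."""
    params = {"q": text, "math": math}
    # Optional parameters are only added when given, as in the links above.
    if level is not None:
        params["level"] = level
    if hide_snippets is not None:
        params["hide_snippets"] = hide_snippets
    return BASE + "?" + urlencode(params)


formula = r"dr_t = a(b-r_t) dt + \sigma dW_t"

print(search_url(math=formula, hide_snippets=0))           # exact formula
print(search_url(math=formula, level=9, hide_snippets=0))  # with abstraction
print(search_url(text="economy"))                          # plain text search
```

`urlencode` takes care of the percent-encoding (`=` becomes `%3D`, the backslash becomes `%5C`, spaces become `+`), which is the same encoding seen in the formula links.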
Jozef
Dear Jozef,
A good example - abstractions like that from dr_t = a(b-r_t) dt + \sigma dW_t in http://en.wikipedia.org/wiki/Vasicek_model to dx_t = \theta (\mu-x_t)\,dt + \sigma\, dW_t in http://en.wikipedia.org/wiki/Ornstein%E2%80%93Uhlenbeck_process are indeed very useful and an invitation to play.
Thank you!
Daniel
On Wed, Apr 4, 2012 at 3:03 AM, Jozef Misutka misutka@ksi.mff.cuni.cz wrote:
[...]
Hi Daniel,
If you want precise extraction, try the Sweble parser: http://sweble.org
Cheers, Dirk