Hi Daniel,


On Tue, Apr 3, 2012 at 12:15 AM, Daniel Mietchen <daniel.mietchen@googlemail.com> wrote:
> Hi Jozef,
>
> I just played around a bit and liked what I saw, though I didn't see
> much, as the site was very slow.

It was a hardware failure (RAID5, I think). Anyway, it was fixed several hours ago.

 

> How did you strip the dump of the non-mathematical articles?

Very simply: a mathematical article is any article that contains the string "</math".

I do not claim to have a perfect Wikipedia tag parser, but the vast majority of the formulae in Wikipedia are typeset using the standard Wikipedia markup and sit directly in the article text, so this simple test works fine.
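
To make the rule concrete, here is a minimal Python sketch of such a filter. It is an illustration only, not the code the engine actually runs: the function names are made up for this example, and it assumes the usual MediaWiki XML export layout (`page`, `title` and `text` elements).

```python
import io
import xml.etree.ElementTree as ET


def is_math_article(wikitext: str) -> bool:
    # The whole rule: the article text contains a closing math tag.
    return "</math" in wikitext


def local_name(tag: str) -> str:
    # Drop an XML namespace prefix such as "{http://...}page" if present.
    return tag.rsplit("}", 1)[-1]


def math_titles(dump_file):
    """Yield titles of pages whose text contains "</math".

    dump_file is a path or a binary file object holding a MediaWiki
    XML export (assumed layout, see the note above).
    """
    for _, elem in ET.iterparse(dump_file, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        if is_math_article(text):
            yield title
        elem.clear()  # free memory, dumps are large


if __name__ == "__main__":
    sample = (
        b"<mediawiki>"
        b"<page><title>Vasicek model</title><revision>"
        b"<text>&lt;math&gt;dr_t&lt;/math&gt;</text></revision></page>"
        b"<page><title>Prague</title><revision>"
        b"<text>A city.</text></revision></page>"
        b"</mediawiki>"
    )
    print(list(math_titles(io.BytesIO(sample))))
```

A plain substring test will miss formulae written without `<math>` tags and may pick up a few pages that only mention the tag, which is the imperfection admitted above.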


> I am
> asking because one of the major uses that I have in mind for a good
> mathematical search engine would be to identify areas around topic A
> (say, theoretical biology) that use the same concepts as those in
> topic B (say, economics). Very often such distant fields are only
> weakly connected, but solutions or approaches that work in one of them
> are not infrequently transferable.

That is exactly one of the interesting applications for a mathematical search engine.

I wanted to reply with something interesting, so I called a friend and asked him about interesting formulae from economics. He told me about the Vasicek model, so I tried searching for the formula
dr_t = a(b-r_t) dt + \sigma dW_t
which resulted in 2 hits with no abstraction - no big deal. But then I tried to abstract it, and another hit came up which is imho interesting (different variables used, but the same formula).
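
To give a rough idea of what "abstracting" a formula means here, the toy Python sketch below renames every variable to a positional placeholder, so that two formulae with the same structure but different letters compare equal. This is only an illustration under simplifying assumptions (single-letter variables, a small hand-picked set of Greek commands, a hypothetical `abstract` function); it is not how the engine itself is implemented.

```python
import re

# Tokens: a LaTeX command, a single Latin letter, a number, or any other symbol.
TOKEN = re.compile(r"\\[A-Za-z]+|[A-Za-z]|\d+|\S")

# Greek commands treated as variables in this toy example (assumed list).
GREEK = {"alpha", "beta", "gamma", "delta", "eta", "theta", "kappa",
         "lambda", "mu", "nu", "rho", "sigma", "tau", "phi", "omega"}


def is_variable(tok: str) -> bool:
    if len(tok) == 1 and tok.isalpha():
        return True
    return tok.startswith("\\") and tok[1:] in GREEK


def abstract(formula: str) -> str:
    """Replace each distinct variable by id1, id2, ... in order of first use."""
    names = {}
    out = []
    for tok in TOKEN.findall(formula):
        if is_variable(tok):
            names.setdefault(tok, f"id{len(names) + 1}")
            out.append(names[tok])
        else:
            out.append(tok)
    return " ".join(out)


if __name__ == "__main__":
    vasicek = r"dr_t = a(b-r_t) dt + \sigma dW_t"
    renamed = r"dx_s = k(\theta-x_s) ds + \eta dB_s"
    print(abstract(vasicek) == abstract(renamed))
```

The second formula in the demo is a made-up renaming of the first, not an actual search result.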

Vasicek model
Vasicek model and similar

 
> In order to be useful for such
> purposes, your corpus would still have to contain the economics/
> theoretical biology articles (at least those that use equations), but
> I couldn't find evidence for that.

See the number of documents (and categories) returned when you search for simple text, e.g.:
economy
http://egomath.projekty.ms.mff.cuni.cz/index.php?math=&q=economy
biology
http://egomath.projekty.ms.mff.cuni.cz/index.php?math=&q=biology
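
If you want to try more keywords, the links above follow a simple pattern that can be built in a couple of lines of Python. The `search_url` helper is just a name made up for this sketch; the `math` and `q` parameters are taken from the URLs above, and passing a LaTeX string in `math` is an assumption on top of that.

```python
from urllib.parse import urlencode

BASE = "http://egomath.projekty.ms.mff.cuni.cz/index.php"


def search_url(text: str, math: str = "") -> str:
    # "q" carries the plain-text query; "math" is left empty for text-only
    # searches, as in the two links above.
    return BASE + "?" + urlencode({"math": math, "q": text})


if __name__ == "__main__":
    for keyword in ("economy", "biology"):
        print(search_url(keyword))
```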

Jozef
 

Daniel

On Mon, Apr 2, 2012 at 2:21 PM, Jozef Misutka <misutka@ksi.mff.cuni.cz> wrote:
> Hi,
>
> I want to introduce a *mathematical* search engine working over English
> Wikipedia dump. The key advantage is simple - *it works* ;).
> Better than a nice speech is a real demo which can be found here:
> http://egomath.projekty.ms.mff.cuni.cz
>
> If you are somehow interested or just want to share your thoughts do not
> hesitate to contact me.
>
> Best regards,
> Jozef Misutka
> __________________________________
> Charles University in Prague,
> Department of Software Engineering,
> www: http://www.ksi.mff.cuni.cz/cs/~misutka
>
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
