Hi Markus,
regardless of the beta label of mathoid: If something is broken, report it on Phabricator
and it will be fixed quite soon.
Maybe it would be wise to release Mathoid 1.0 in order to get rid of that label.
I think semantic math is possible, but not today.
I invite you to discuss the efforts in building a global digital mathematical library, but
maybe this is not the best thread for that.
From my understanding, Wikidata is in phase 2
according to
https://de.wikipedia.org/wiki/Wikidata
thus data from Wikipedia can be exported to Wikidata and displayed on Wikipedia
thereafter.
What we are doing here is supporting math according to the definition of <math> tag
in wikitext.
Nothing more and nothing less.
I'm convinced that starting with this approach is the best thing we can do.
Writing a script that would convert that to MathML in the future is well studied.
Going back from MathML to TeX is currently not possible, to the best of my knowledge.
Best
Moritz
Moritz Schubotz
TU Berlin, Fakultät IV
DIMA - Sekr. EN7
Raum E-N 741
Einsteinufer 17
D-10587 Berlin
Germany
Tel.: +49 30 314 22784
Mobil:+49 1578 047 1397
E-Mail: schubotz(a)tu-berlin.de
Skype: Schubi87
ICQ: 200302764
Msn: Moritz(a)Schubotz.de
-----Original Message-----
From: Markus Krötzsch [mailto:markus@semantic-mediawiki.org]
Sent: Thursday, 4 February 2016 18:47
To: Schubotz, Moritz; Discussion list for the Wikidata project.
Subject: Re: AW: AW: AW: [Wikidata] upcoming deployments/features
Hi Moritz,
On 04.02.2016 12:08, Schubotz, Moritz wrote:
Hi Markus,
with regard to the beta status: I think if Word were open source it would be called
Word-beta.
:-D
You sound like me when I was a student. However, let's be serious: there are quality
criteria in software engineering, and "open source" is just a license model and
has nothing to do with these. The site you linked says "beta", but the API input
underneath says "unstable", which sounds as if it might change in incompatible
ways in the future. This sounds like more work for me as a user who would need to rely on
this.
Mathoid can be installed locally, via npm install
mathoid.
That's not what I meant. As I understand, you could have a Javascript application
using MathJax that, once loaded, will be able to render all MathJax-compatible math even
when you are offline, without requiring you to install anything on your machine. You could
also provide these resources locally on your intranet without running a dedicated
service.
Also, to be clear: for me, this is not a question of "Mathoid vs MathJax". What
I was suggesting was to use a subset of LaTeX that works in both, so that implementers
have a choice of what they like most. I am sure they both have their pros and cons.
If your application supports HTML5 you can just
include the MathML returned by your local Mathoid installation (or take the cached version
from the service the WMF provides for the public benefit).
You could also install your own RESTBase instance for caching.
However, this requires some technical skills. Gabriel Wicke and I are working on Docker
containers to simplify the installation procedure.
When you use MathML there are (in theory) no problems with the styling and the
integration to your custom application.
However, for devices that do not fully support HTML5 the fallback images are not optimal,
since their shape is fixed.
They look similar to how LaTeX would render the input and do not adjust to the
layout.
While LaTeX is appreciated by many scientists, web browsers are not TeX renderers; they
display declarative style information.
Note that this is a completely different approach from imperative TeX typesetting
instructions.
MathJax now tries to support imperative typesetting instructions within a declarative
document.
While this is a nice bridge technology, we should ultimately aim for a fully declarative approach.
I could not agree more, but for this very reason it is hard to see how TeX could be a good
basis for this in general. Any tool that translates from TeX to MathML must make some
sense of the spatial relationship of symbols as typeset by a LaTeX program, which can only
ever be an approximate, heuristic process. This is all good, but as you put it, it is
necessarily bridge technology.
Putting that in a broader picture, I completely share your initial skepticism in starting
with this texvc dialect.
The optimal way would certainly be to support content MathML to support all the formula
semantics
https://www.w3.org/TR/MathML3/chapter4.html .
However, while this is a nice idea I think it's not very likely at the moment that
people would enter content MathML expressions.
(yes, obviously, this is not a workable surface syntax)
Therefore, I think it's reasonable to start with
the same format that is used within Wikipedia, and keep the formats in sync.
For the future one could imagine a TeX dialect that includes semantic
macros that link to the semantic concepts as defined in the MathML
spec.
You are arguing as if we could change datatypes every month or so.
Wikidata is a big, slow-moving project. People rely on it. When you introduce a new type
of data, you get a significant number of people who start entering such data, editing it,
developing their own quality criteria and guidelines for it, building external applications that
use special libraries, and so on. Why is it so hard to build a visual editor for
Wikipedia? It's not because there is no technical route to do it, it is because we
have to deal with more than a decade of history in our databases. Your arguments above are
only about technology, disregarding the users (it's like saying "Let's just
start with Wikitext -- we can always move to a visual editor in a few years!"). It
seems, however, that the move you make now is what will hinder and slow down the
transition you are hoping for.
I also have the feeling that you are not aware of the way in which Wikidata is used. Our
data is not just an internal source code like the wikitext in Wikipedia might be -- it is
our main content. Your mathoid display, however good it is, is just a UI. The real data is
LaTeX now:
you made it into the main *exchange format* for math on Wikidata.
Don't get me wrong: it might still be for the best. Maybe a real, semantic math markup
is so far away that we just cannot wait for it.
Just like Wikipedia would never have happened if they had waited for visual editors
to become available. On the other hand, Wikipedia is also one of the last sites on the
planet that will get to use such technology. There is always a trade-off. I have been (and
still am) missing a discussion of the costs and benefits of this. This whole issue is a
communication problem much more than a technical problem (in particular, don't apply
my critique to your programming work).
For example, the following input
$ Z(t) = \exp@{\iunit \vartheta(t)} \RiemannZeta@{\tfrac{1}{2}+\iunit t} $
would be rendered as displayed here
http://drmf.wmflabs.org/wiki/Formula:DLMF:25.10:E1
The input form above was used by the editors of the DLMF
http://dlmf.nist.gov/ to produce a digital version of the Handbook of Mathematical
Functions with Formulas, Graphs, and Mathematical Tables, edited by Milton Abramowitz and
Irene A. Stegun.
While there are many plans for what could be done in the future, for instance
- Identifier Namespaces in Mathematical Notation
(http://de.slideshare.net/AlexeyGrigorev/identifier-namespaces-in-mathematical-notation)
- Wolfram Alpha integration
(https://www.dima.tu-berlin.de/menue/theses/open_theses/msc_integrating_computer_algebra_systems_and_word_processors_formulae/)
we still need to walk before we run, i.e. start with something simple and plan more
advanced stuff for the future.
The intermediate next steps are discussed here
https://phabricator.wikimedia.org/T67397
If you think that there is an idea that is ready to implement please share it in the
structured task tracker.
I am not sure I really got all the "intermediate next steps" from the rather
long bug discussion there, but thanks for the useful pointer to where this design has come
into being (I had not seen this in previous discussions). Here is what I got:
It all started with the desire to display and calculate, but the latter was given up. It
seems that the option of using a smaller subset of LaTeX to improve compatibility with
other tools has not been considered there. What I was also missing is some technical
description of what the datatype should eventually contain (like a simple grammar that
defines all strings that are permitted as input).
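To illustrate what such a grammar-style description could look like, here is a minimal Python sketch of a checker for a small LaTeX subset. It is not taken from the Math extension or from Mathoid: the function name check_formula and the command whitelist ALLOWED_COMMANDS are hypothetical, and the real texvc command set is much larger. The sketch only tokenizes the input, rejects commands outside the whitelist, and verifies that braces are balanced.

```python
import re

# Hypothetical whitelist for illustration; NOT the actual texvc command set.
ALLOWED_COMMANDS = {"frac", "sqrt", "alpha", "beta", "sum", "wedge", "vee", "exp"}

# A token is a command (\name), a brace or script marker, or one plain character.
TOKEN = re.compile(r"\\[A-Za-z]+|[{}^_]|[A-Za-z0-9+\-*/=(),.\s]")


def check_formula(source):
    """Return a list of problems; an empty list means the string is accepted."""
    problems = []
    depth = 0  # current brace nesting depth
    pos = 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            problems.append(f"unexpected character {source[pos]!r} at {pos}")
            pos += 1
            continue
        token = match.group()
        if token.startswith("\\") and token[1:] not in ALLOWED_COMMANDS:
            problems.append(f"command {token} is not in the subset")
        elif token == "{":
            depth += 1
        elif token == "}":
            depth -= 1
            if depth < 0:
                problems.append(f"unmatched '}}' at {pos}")
                depth = 0
        pos = match.end()
    if depth > 0:
        problems.append(f"{depth} unclosed '{{'")
    return problems
```

A tool that wants to stay compatible with several renderers could run such a check before storing a value, and publish the whitelist as the documented definition of the datatype.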
There are some interesting confusions about what "semantics" of an operator
would even mean. For example, there is the question whether the join and meet in a Boolean
algebra are the same as the conjunction and disjunction in first-order logic (clearly not,
and even less so than TomTom said already: one is a "semantic" operation on
Boolean algebras while the other is an operator in a term algebra used to define the
*syntax* of logic, which is not a Boolean algebra at all; the corresponding semantic
operations in first-order logic would be a join and a union, which would agree with meet
and join on the two-element Boolean algebra if you would define your semantics bottom up
as in most logic textbooks -- and, yes, you could factorise the syntax into a Lindenbaum
algebra too, which gives you yet another way to get a Boolean algebra here).
If you look at that case, you can see that the idea of a semantic markup might be
hopelessly futuristic. I am ready to accept that layout is the only thing we can hope for
in this domain, and that therefore LaTeX is the best choice. But then one could still use
a standard format rather than one with those weird pseudo-semantic macros like \and and
\or that pretend to refer to an operator when in reality they do not. It is interesting to
note that LaTeX primarily uses command names that describe the shape of the glyphs and
avoid any semantic connotation (\bowtie rather than \join, \wedge rather than \and, etc.).
I think there is a lot of wisdom in that.
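As a rough illustration of the difference, the following Python sketch rewrites pseudo-semantic macros into glyph-named commands. It assumes that \and and \or stand for the glyphs \wedge and \vee, as the pairing above suggests; the table GLYPH_NAMES and the function to_glyph_names are hypothetical names, and the two entries are examples rather than the full texvc macro list.

```python
import re

# Illustrative mapping only (an assumption, not the texvc macro table):
# pseudo-semantic macro name -> glyph-named standard LaTeX command name.
GLYPH_NAMES = {"and": "wedge", "or": "vee"}

# Matches a whole command name, so \or is never confused with \oplus.
COMMAND = re.compile(r"\\([A-Za-z]+)")


def to_glyph_names(source):
    """Replace pseudo-semantic macros by glyph-named commands, leaving others as-is."""
    def swap(match):
        name = match.group(1)
        return "\\" + GLYPH_NAMES.get(name, name)
    return COMMAND.sub(swap, source)
```

Because the replacement is purely a renaming of glyphs, it does not claim anything about which operator the author meant, which is exactly the point made above.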
Thank you again for all your input and the interest in that project.
Thanks for communicating ;-)
Markus
P.S. You mentioned that you are not on this mailing list. You really should be, given that
you are apparently the main contact person for one of the (small number of) datatypes we
have in Wikidata. It is a huge responsibility.
P.P.S. I can see now in your footer that you are also involved with MathML. I find it
quite ironic that I am trying to convince you that a custom LaTeX dialect is maybe not
what we should have picked to exchange math on the Web. Do you realise that you could have
pushed this whole thing into using MathML internally, offering LaTeX only as a surface
syntax and compatibility mode for texvc? It even seems as if you have the technology ready
to do this.
*Disclaimer: I'm a PhD student in the Database Systems and Information Management
Group. While this message reflects my personal opinion, I might have been influenced by
database research ideas. Moreover, I'm director of the MathML association and therefore
committed to the association's goal of enabling math rendering in all Web rendering engines
http://mathml-association.org/ . In addition, I'm an offsite collaborator of the
National Institute of Standards and Technology in the USA and I really appreciate
standards.
-----Original Message-----
From: Markus Krötzsch [mailto:markus@semantic-mediawiki.org]
Sent: Thursday, 4 February 2016 08:20
To: Schubotz, Moritz; Discussion list for the Wikidata project.
Subject: Re: AW: AW: [Wikidata] upcoming deployments/features
Hi Moritz,
On 03.02.2016 15:25, Schubotz, Moritz wrote:
Hi Markus,
I think we agree on the goals, cf.
http://arxiv.org/abs/1404.6179 . By
the way, the texvc dialect is now at least 13 years old.
For now it's required to be 100% compatible with the texvc dialect in order to use
Wikidata in MediaWiki instances.
However, for the future there are also plans to support more markup.
But all new options are blocked by
https://phabricator.wikimedia.org/T74240
Mathoid, the service that converts the texvc dialect to MathML, SVG + PNG can also be
used without a MediaWiki instance.
I posted links to the Restbase Web UI before.
api.formulasearchengine.com (with experimental features)
de.wikipedia.org/api (stable)
This is the API you said "has been opened to the public just moments ago" and
which describes itself as "currently in beta testing"? That seems a bit shaky to
say the least. In your email, you said that this API was for extracting LaTeX package
names and identifiers, not for rendering content, so I have not looked at it for this
purpose. How does this compare to MathJax in terms of usage? Are the output types
similar?
It seems your solution adds the dependency on an external server, so this cannot be used
in offline mode, I suppose? How does it support styling of content for your own
application, e.g., how do you select the fonts to be used?
I think we agree that real documentation should be a bit more than an unexplained link in
an email. Anyway, it is not your role to provide documentation on new Wikidata features or
to make sure that stakeholders are taken along when new features are deployed, so
don't worry too much about this. I am sure your students did a good job implementing
this, and from there on it is really in other people's hands.
Cheers,
Markus
On 03.02.2016 at 14:31, Markus Krötzsch wrote:
Hi Moritz,
I must say that this is not very reassuring. So basically what we
have in this datatype now is a "LaTeX-like" markup language that is
only supported by one implementation that was created for MediaWiki,
and partially by a LaTeX package that you created.
Markus, this TeX dialect is not a new invention by Moritz. It's what
the Math extension for MediaWiki has been using for over a decade
now, and it's used on hundreds of thousands of pages on Wikipedia.
All that we are doing now is making this same exact syntax available
for property values on wikibase, using the same exact code for rendering it.
I think having consistent handling for math formulas between wikitext
and wikibase is the right thing to do. Of course it would have been
nice for MediaWiki to not invent its own TeX dialect for this, but
it's 10 years too late for that complaint now.
Moritz, I seem to recall that the new Math extension uses a
standalone service for rendering TeX to PNG, SVG, or MathML. Can that
service easily be used outside the context of MediaWiki?