Microsoft has unveiled "Ideas in Word", a grammar and style tool. [1] I proposed something similar for detecting problematic grammatical constructs in the content translation tools. [2] That was a couple of years ago, and I have since closed the task.
[1] https://venturebeat.com/2019/05/06/microsoft-debuts-ideas-in-word-a-grammar-...
[2] https://phabricator.wikimedia.org/T162525
Perhaps I'll explain this a bit better…
Words can be converted into vector representations by a word2vec algorithm [1]. After conversion, each word is a point in a high-dimensional space, and a relation between two words is the vector between their points. Similar (or related) relations can then be found by operations on such vectors, or on sets of vectors. This is often illustrated with analogies such as "queen is to king as woman is to man".
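To make the vector arithmetic concrete, here is a minimal sketch in Python. The three-dimensional vectors are made up by hand for illustration; real word2vec embeddings are learned from a corpus and have far more dimensions. The names `vectors`, `cosine` and `analogy` are hypothetical helpers, not part of any library.

```python
import math

# Hand-made toy vectors, for illustration only (an assumption, not real
# word2vec output). Real embeddings are learned and much larger.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "car":   [0.5, 0.5, 0.5],
}

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Return the word d for which 'a is to b as c is to d' fits best.

    The relation a -> b is the offset vector (b - a); it is applied to c,
    and the remaining word closest to the result is returned.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("man", "king", "woman"))
```

With these toy values the relation man -> king, applied to "woman", lands closest to "queen". With learned embeddings the match is only approximate, which is part of why a probability model over relations is needed.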
Some relations are quite obvious and common, but some simply do not exist. If we can build a probability model over relations (a regression model), then we can estimate the probability of observing a specific relation, and thus be able to say "this does not seem to be a probable word". (Typically one of several sequence models, such as a recurrent neural network [2], would be used for the estimation, and triplet loss [3] for the training phase.)
It would be like having a "spell right"-metric for text fragments.
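For reference, the triplet loss mentioned above can be sketched in a few lines of Python. This is one common form of the loss, using plain Euclidean distance; the vectors stand in for whatever representation a model would produce, and the function names and the `margin` default are assumptions made for illustration, not something taken from the proposal in [2].

```python
import math

def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """One common form of triplet loss.

    The loss is zero once the anchor is closer to the positive example
    than to the negative example by at least `margin`; otherwise it
    grows with the size of the violation.
    """
    return max(0.0,
               euclidean(anchor, positive)
               - euclidean(anchor, negative)
               + margin)

# Positive is near the anchor, negative is far away: no loss.
print(triplet_loss([0, 0], [0, 1], [3, 4]))
# Roles swapped: the triplet is badly violated, so the loss is large.
print(triplet_loss([0, 0], [3, 4], [0, 1]))
```

During training, minimizing this loss would pull representations of observed (probable) fragments together and push improbable ones away, so that distance can later serve as the "spell right" score.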
Note that this isn't quite as easy as described: words can have multiple interpretations, which makes it difficult to build a stable vector representation. An example is "car", which is typically something you drive on a road, but it can also be part of a train, or a toy.
[1] https://en.wikipedia.org/wiki/Word2vec
[2] https://en.wikipedia.org/wiki/Recurrent_neural_network
[3] https://en.wikipedia.org/wiki/Triplet_loss
On Sun, May 19, 2019 at 2:55 PM John Erling Blad jeblad@gmail.com wrote: