Thursday, September 1st, 2005, 8:21 am
Machine Learning and Grammar
The Science Blog outlines a method for teaching a computer various languages without any human involvement, i.e. in a data-driven manner.
[computers can learn languages] “…autonomously and without previous information infer the underlying rules of grammar. The rules can then be used to generate new and meaningful sentences. The method also works for such data as sheet music or protein sequences.”
Having quoted the above, I believe there is nothing truly novel to find in the study. The blog item makes no mention of Markov models, which, in my blunt opinion, shows that the author fails to grasp the real merits of the method. It also appears that the author has limited knowledge of the field under discussion. The ability to look at sequences of words up to a certain depth (as far as brute force permits) already produces nice textures in graphics and a good flow of coherent text (a paper-generating tool from MIT comes to mind).
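To make the Markov-model connection concrete, here is a minimal sketch (my own illustration, not the study's actual algorithm) of the standard n-gram technique: record which word follows each sequence of n words in a corpus, then walk that table to generate new text. The function names and the toy corpus are mine.

```python
import random
from collections import defaultdict

def build_model(words, order=2):
    """Map each n-gram (tuple of `order` consecutive words) to the
    list of words observed to follow it in the corpus."""
    model = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        model[key].append(words[i + order])
    return model

def generate(model, length=8, seed=0):
    """Walk the chain: start at a random n-gram, then repeatedly
    append a randomly chosen successor of the current n-gram."""
    rng = random.Random(seed)
    state = rng.choice(list(model.keys()))
    out = list(state)
    for _ in range(length):
        successors = model.get(state)
        if not successors:
            break  # dead end: this n-gram only appears at the corpus tail
        out.append(rng.choice(successors))
        state = tuple(out[-len(state):])
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ate the rat".split()
model = build_model(corpus, order=2)
print(generate(model))
```

The depth of the look-back (`order`) is exactly the "certain depth" mentioned above: larger orders give more coherent output but need exponentially more data.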
The method can indeed perform more admirably than probabilistic models alone. It can extract grammatical content, but so can logical inference and theorem provers like Vampire (developed by Voronkov and his students at Manchester University). As a matter of fact, a few years ago, as part of a course on languages and semantics, we wrote programs in ML that translated English sentences into first-order logic. This technology has been out in the wild for many years.
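For a flavour of that coursework exercise, here is a toy compositional translation in Python rather than ML (the function and the tiny "Determiner Noun Verb" grammar are my own illustration, far cruder than what a real course assignment would cover):

```python
def sentence_to_fol(sentence):
    """Translate a 'Det Noun Verb' sentence into a first-order formula.
    'every' becomes a universal with implication; 'some' becomes an
    existential with conjunction -- the classic textbook mapping."""
    det, noun, verb = sentence.lower().split()
    if det == "every":
        return f"forall x. ({noun}(x) -> {verb}(x))"
    if det == "some":
        return f"exists x. ({noun}(x) & {verb}(x))"
    raise ValueError(f"unknown determiner: {det}")

print(sentence_to_fol("every man walks"))
print(sentence_to_fol("some dog barks"))
```

Formulas in this shape are exactly what a prover such as Vampire consumes, which is the point: grammar extraction by logic predates the data-driven approach.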