"...it is notorious that the ideal of a grammar which fully succeeds in correctly distinguishing grammatical from ungrammatical sequences has never been attained for even one speaker."
From a discussion about "loss of generality" on the Funknet discussion list (http://lloyd.emich.edu/cgi-bin/wa?A2=ind0406&L=funknet&D=0&P=3622):
'The particular analysis which interests me is one I found in a historical
retrospective by Fritz Newmeyer and others "Chomsky's 1962 programme for
linguistics" (in Newmeyer's "Generative Linguistics -- A Historical
Perspective", Routledge, 1996, and apparently also published in "Proc. of the
XVth International Congress of Linguists".)
Newmeyer is talking mostly about Chomsky's "Logical basis of linguistic
"...the real bone of contention was phonology and the phoneme concept, as Murray (1981:110-111) has pointed out; compare Archibald A. Hill's observation:
I think that if one can speak of partial survival [in the revolution of Chomskyan and post-Chomskyan linguistics], I have partially survived it. [...]. I could stay with the Transformationalists pretty well, until they attacked my darling, the phoneme. I will never be a complete transformationalist because I am still a phonemicist. (Hill 1980:75)"
Similarity modeling - Similarity modeling uses the same parameters as distributional analysis, but it assumes nice clusters are not possible. Instead it makes ad-hoc analogies by averaging sets of properties, as required:
E.g. Dagan, Marcus, Markovitch '95: (p. 32, long version) "It has been
traditionally assumed that ... information about words should be
generalized using word classes ... However, it was never clearly shown
that unrestricted language is indeed structured in accordance with
(and previously on p. 4) "... our method assumes that
Berkeley Linguistics Society, vol. 13 (1987), 139-157
"I am concerned in this paper with the ... the assumption of an
abstract, mentally represented rule system which is somehow implemented when
...'Culture is temporal, emergent, and disputed'
(Clifford 1986:19). I believe the same is true of grammar, which like speech
itself must be viewed as a real-time, social phenomenon, and therefore is
temporal; its structure is always deferred, always in a process but never
arriving, and therefore emergent; and since I can only choose a tiny fraction
Work like the classic study by Pawley and Syder demonstrates that natural language is far from random, but is equally far from regular:
"The problem we are addressing is that native speakers do not
exercise the creative potential of syntactic rules to anything like
their full extent, and that, indeed, if they did do so they would
not be accepted as exhibiting nativelike control of the language.
The fact is that only a small proportion of the total set of grammatical
sentences are nativelike in form - in the sense of being
Natural language appears to be random (cf. the Hutter Prize page):
"...in 1950, Claude Shannon estimated the entropy (compression limit)
of written English to be about 1 bit per character. To date, no
compression program has achieved this level."
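To make "bits per character" concrete, here is a minimal sketch of the simplest possible entropy estimate: the order-0 (single-character frequency) entropy of a text. It is not Shannon's method, which relied on human prediction with long contexts. The function name and the sample string are assumptions chosen for illustration only.

```python
import math
from collections import Counter

def order0_entropy_bits_per_char(text: str) -> float:
    """Empirical order-0 entropy of `text` in bits per character.

    Only single-character frequencies are used; no context is modeled.
    """
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # H = -sum over symbols of p * log2(p)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical sample text, for illustration only.
sample = "the quick brown fox jumps over the lazy dog"
print(order0_entropy_bits_per_char(sample))
```

Because it ignores context, an order-0 estimate for English comes out well above 1 bit per character. The gap between the two figures is the part of the predictability that depends on context.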
The most successful contemporary natural language technologies are probabilistic.
The usual explanation is that something external selects between alternatives which are equally probable on linguistic grounds. Commonly this external factor is assumed to be "meaning".
From Heisenberg to Goedel via Chaitin
Authors: C.S. Calude, M.A. Stay
(Submitted on 26 Feb 2004 (v1), last revised 11 Jul 2006 (this version, v6))
From a discussion on the "Hutter-Prize" Google group (http://groups.google.com/group/Hutter-Prize/browse_thread/thread/bfea185...):
...If A_1, A_2,... A_n
are the contexts of A in some text, and X_1, X_2,...X_n are contexts of
other tokens, then the number of ways A can have common contexts with
other tokens in the text, and thus uniquely specify some new
paradigmatic class, are just Matt's "(n choose k) = n!/(k!(n-k)!)
possible sets", where k is the number of common contexts between A and
some other token.
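
(An aside, not part of the quoted post: the "(n choose k)" count above can be checked with a short sketch. The context labels and the helper name are hypothetical, and the code only counts the possible sets; it does not model how classes are actually formed.)

```python
import math
from itertools import combinations

def count_common_context_sets(n: int, k: int) -> int:
    """Number of ways to pick k shared contexts out of n: n!/(k!(n-k)!)."""
    return math.comb(n, k)

# Hypothetical contexts A_1..A_4 of a token A, with k = 2 shared contexts.
contexts = ["A_1", "A_2", "A_3", "A_4"]
k = 2

# Enumerate the candidate sets explicitly and compare with the formula.
subsets = list(combinations(contexts, k))
print(len(subsets), count_common_context_sets(len(contexts), k))
```

The count grows quickly with n, which is the point of the quoted argument: even a modest number of contexts admits a very large number of candidate classes.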
The syntagmatic distribution of sequences AX_? specified by these
From a thread on the comp.compression discussion list, 2007 (http://groups.google.com/group/Hutter-Prize/browse_thread/thread/bfea185...):
I think the insight we have been looking for to model AI/NLP is that
the information needed to code different ways of ordering a system
(knowledge) is always greater than the information to code the system
itself (for a random system.)
In the context of AI/NLP it is important to note that random need not
mean indeterminate. I hope I demonstrated this in our earlier thread on