Wednesday, May 14, 2025

Semantics of syntactically incorrect language

As anyone who has talked with a language-learner knows, syntactically incorrect sentences often succeed in expressing a proposition. This is true even in the case of formal languages.

Formal semantics, say of the Tarski sort, has difficulties with syntactically incorrect sentences. One approach to saving the formal semantics is as follows: Given a syntactically incorrect sentence, we find a contextually appropriate syntactically correct sentence in the vicinity (and what counts as vicinity depends on the pattern of errors made by the language user), and apply the formal semantics to that. For instance, if someone says “The sky are blue”, we replace it with “The sky is blue” in typical contexts and “The skies are blue” in some atypical contexts (e.g., discussion of multiple planets), and then apply formal semantics to that.
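
This "repair then interpret" strategy can be sketched in Python (a toy model of my own: the candidate lists and the use of string similarity as "vicinity" are illustrative assumptions, not a serious proposal):

from difflib import get_close_matches

# Contextually appropriate candidate corrections.
candidates = {
    "typical": ["the sky is blue", "the grass is green"],
    "multi-planet": ["the skies are blue", "the grass is green"],
}

def semantic_value(sentence):
    # Stand-in for a Tarski-style semantics defined only on well-formed sentences.
    return "[[" + sentence + "]]"

def interpret(utterance, context="typical"):
    # Find the closest well-formed sentence in the vicinity and interpret that.
    matches = get_close_matches(utterance, candidates[context], n=1)
    return semantic_value(matches[0]) if matches else None

print(interpret("the sky are blue"))                  # [[the sky is blue]]
print(interpret("the sky are blue", "multi-planet"))  # [[the skies are blue]]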

Sometimes this is what we actually do when communicating with someone who makes grammatical errors. But typically we don’t bother to translate to a correct sentence: we can just tell what is meant. In fact, in some cases, we might not even ourselves know how to translate to a correct sentence, because the proposition being expressed is such that it is very difficult even for a native speaker to get the grammar right.

There can even be cases where there is no grammatically correct sentence that expresses the exact idea. For instance, English has a simple present and a present continuous, while many other languages have just one present tense. In those languages, we sometimes cannot produce an exact grammatically correct translation of an English sentence. One can use some explicit markers to compensate for the lack of, say, a present continuous, but the semantic value of a sentence using these markers is unlikely to correspond exactly to the meaning of the present continuous (the markers may have a more determinate semantics than the present continuous). But we can imagine a speaker of such a language who imitates the English present continuous by a literal word-by-word translation of “I am” followed by the other language’s closest equivalent to a gerund, even when such translation is grammatically incorrect. In such a case, assuming the listener knows English, the meaning may be grasped, but nobody is capable of expressing the exact meaning in a syntactically correct way. (One might object that one can just express the meaning in English. But that need not be true. The verb in question may be one that does not have a precise equivalent in English.)

Thus we cannot account for the semantics of syntactically incorrect sentences by applying semantics to a syntactically corrected version. We need a semantics that works directly for syntactically incorrect sentences. This suggests that formal semantics are necessarily mere approximate models.

Similar issues, of course, arise with poetry.

Friday, August 26, 2016

Do stipulations change the language?

Technical and legal writing often contains stipulations. The stipulations change the meanings of words already in the language and sometimes introduce neologisms. It seems, however, that technical and legal writing in English is still writing in English. After all, the stipulations are given in English, and stipulation is a mechanism of the English language, akin to macros in some computer programming languages. Now suppose that there is a pair of genuinely distinct natural languages, A and B, such that the grammatical structure of A is a subset of the grammatical structure of B, so that any sentence of A can be translated word-by-word or word-by-phrase into a sentence of B. Imagine that Jan is a speaker of B and that, as a preamble, she goes through all the vocabulary of A and stipulates its meaning in B. She then speaks just like a speaker of A.
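
The macro analogy can be made concrete with a toy sketch (the vocabulary and the word-by-word expansion are my illustrative assumptions): a preamble is a table of stipulations, and speaking within its scope works like macro expansion.

# Jan's preamble: each vocabulary item of A is stipulated a meaning in B.
preamble = {"kot": "cat", "mata": "mat"}

def expand(sentence, stipulations):
    # Macro-expand each stipulated term into the phrase it was given.
    return " ".join(stipulations.get(word, word) for word in sentence.split())

print(expand("the kot is on the mata", preamble))  # "the cat is on the mat"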

When Jan utters something that sounds just like a sentence of A, and means the same thing as the sentence of A, is she speaking B or A? It seems she is speaking B. Stipulation is a mechanism of B, after all, and she is simply heavily relying on this mechanism.

Of course, there probably is no such pair of natural languages. But there will be partial cases of this, particularly if A is restricted to, say, a technical subset, and if we have a high tolerance for artificial-sounding sentences of B. And we can imagine that eventually a human language will develop (whether "naturally" or by explicit construction) that not only allows the stipulation of terms, but has highly flexible syntax, like some programming languages. At that point, its speakers will be able to speak their extensible language, but with one preamble sound just like speakers of French and with another just like speakers of Mandarin. But the language itself wouldn't be a superset of French or Mandarin. And eventually the preamble could be skipped. The language could have a convention where, by adopting a particular accent and intonation, one is implicitly speaking within the scope of a preamble made by another speaker, a preamble that stipulated which accent and intonation counted as a switch to the scope of that preamble. Then all we would need to do is to have a speaker (or a family of speakers) give a French-preamble and another speaker give a Mandarin-preamble. As soon as any speaker of our flexible language starts accenting and intoning as in French or Mandarin, their language falls under the scope of the relevant preamble. (The switch of accent and intonation will be akin to importing a module into a computer program.) But it's important to note that the production of a preamble should not be thought of as a change in the language any more than saying "Let's call the culprit x" changes English. It's just another device within the old language.
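
In the same toy model (again, the names are my illustrative assumptions), the accent-and-intonation convention works like a module import:

registered = {}

def register_preamble(accent, stipulations):
    # Another speaker's preamble, which stipulates the accent that imports it.
    registered[accent] = stipulations

def interpret(sentence, accent=None):
    # Adopting a registered accent implicitly brings the corresponding
    # preamble's stipulations into scope, like importing a module.
    stipulations = registered.get(accent, {})
    return " ".join(stipulations.get(w, w) for w in sentence.split())

register_preamble("A-accent", {"kot": "cat", "mata": "mat"})
print(interpret("the kot is on the mata", accent="A-accent"))  # "the cat is on the mat"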

What's the philosophical upshot of these thought experiments? Maybe not that much. But I think they confirm some thoughts about language that people have had already. First, the question of when a language is being changed and when one is simply making use of the flexible facilities of the original language is probably not well-defined. Second, given linguistic flexibility, the idea of context-free sentences and of lexical meaning independent of context is deeply problematic. Stipulative preambles are a kind of context, and any sentence can have its meaning affected by them. There might be default meanings in the absence of some marker, but the absence of a marker is itself a marker. Third, we get further confirmation of the point here that syntax is in general semantically fraught, since it is possible to make the choice of preamble be conditional on how the world is. Fourth, this line of thought makes more plausible the idea that in some important sense we are all speaking subsets of the same language (cf. universal grammar).

This post is based on a line of inquiry I'm pursuing: What can we learn about language from computer languages?

Thursday, August 25, 2016

Syntax and semantics

One of the things that I've been puzzled by for a long time is the distinction between syntax and semantics. Start with this syntactically flawed bit of English:

  1. Obama equals.
It is syntactically flawed, because "to equal" is a transitive verb, and a sentence that applies a transitive verb to only a single argument is ungrammatical, just as an atomic sentence in First Order Logic that applies a binary predicate to one argument is ungrammatical. (I leave open the further question whether "Obama equals Obama" is grammatically correct; maybe the arguments of the English "equals" have to be quantities.) This is a matter of syntax. But now consider this more complicated bit of language:
  2. Let xyzzing be sitting if the temperature is more than 34 degrees and let it be equalling otherwise. Obama xyzzes.
My second sentence makes perfect sense when the temperature is 40 degrees, but when the temperature is 30 degrees it is ungrammatical in exactly the same way that (1) is. Its grammaticality is, thus, semantically dependent.

One might object that the second sentence of (2) is syntactically correct even when the temperature is 30 degrees. It's just that it then has a semantic value of undefined. This move is similar to how we might analyze this bit of Python code:

def a(f): print(f(1))       # applies its argument to a single value
def g(x, y): return x + y   # a two-argument function
def h(x): return 2 * x      # a one-argument function
# temperatureSensor() is assumed to be defined elsewhere and to return the current temperature
a(h if temperatureSensor() > 34 else g)
This code will crash (under Python 3) with
TypeError: g() missing 1 required positional argument: 'y'
when the temperature sensor value is, say, 30. But the behavior of a program, including crashing, is a matter of semantics. The Python standard (I assume) specifies that the program is going to crash in this way. I could catch the TypeError if I liked with try/except, and make the program politely print "Sorry!" when that happens instead of crashing. There is no syntactic problem: print(f(1)) is always a perfectly syntactically correct bit of code, even though it throws a TypeError whenever it's called with an f that requires two arguments.
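
For instance, the polite variant might look like this (continuing the sketch above):

try:
    a(h if temperatureSensor() > 34 else g)
except TypeError:
    print("Sorry!")  # polite failure instead of a crash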

I think the move to say that it is the semantic value of the second sentence of (2) that depends on temperature, not its grammaticality, is plausible. But this move allows for a different way of attacking the distinction between syntax and semantics. Once we've admitted that the second sentence of (2) is always grammatical but sometimes has the undefined value, we can say that (1) is grammatically correct, but always has the semantic value of undefined, and the same is true for anything else that we didn't want to consider grammatically correct.

One might then try to recapture something like the syntax/semantics distinction by saying things like this: an item is syntactically incorrect in a given context provided that it's a priori that its semantic value in that context is undefined. This would mean that (2) is syntactically correct, but the following is not:

  3. Let xyzzing be sitting if Fermat's Last Theorem is false and let it be equalling otherwise. Obama xyzzes.
For it's a priori that Fermat's Last Theorem is true. I think, though, that a syntax/semantics distinction that distinguishes (2) from (3) is too far from the classical distinction to count as an account of it.

It may, however, be the case that even if there is no general distinction between syntax and semantics, in the case of particular languages or families of languages one can draw a line in the sand for convenience of linguistic analysis. But as a rule of thumb, nothing philosophically or semantically deep should rely on that line.

Now it's time to be a bit of a hypocrite and prepare my intermediate logic lecture, where instilling the classical distinction between syntax and semantics is one of my course objectives. But FOL is a special case where the distinction makes good sense.

Wednesday, April 22, 2015

System-relativity of proofs

There is a generally familiar way in which the question whether a mathematical statement has a proof is relative to a deductive system: for a proof is a proof in some system L, i.e., the proof starts with the axioms of L and proceeds by the rules of L. Something can be provable in one system—say, Euclidean geometry—but not provable in another—say, Riemannian geometry.

But there is a less familiar way in which the provability of a statement is relative. The question whether a sentence p is provable in a system L is itself a mathematical question. Proofs are themselves mathematical objects—they are directly the objects in a mathematical theory of strings of symbols and indirectly they are the objects of arithmetic when we encode them using something like Goedel numbering. The question whether there exists a proof of p in L is itself a mathematical question, and thus it makes sense to ask this question in different mathematical systems, including L itself.

If we want to make explicit both sorts of relativity, we can say things like:

  1. p has (does not have) a proof in a system L according to M.
Here, M might itself be a deductive system, in which case the claim is that the sentence "p has (does not have) a proof in L" can itself be proved in M (or else we can talk of the Goedel number translation of this), or M might be a model in which case the claim is that "p has a proof in L" is true in that model.
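
In standard notation (a gloss I am adding, with Prov_L a provability predicate for L and corner quotes for Gödel numbering), the two readings of (1) can be written as:

M \vdash \mathrm{Prov}_L(\ulcorner p \urcorner)    % M a deductive system: M proves the claim
M \models \mathrm{Prov}_L(\ulcorner p \urcorner)   % M a model: the claim is true in M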

This is not just pedantry. Assume Peano Arithmetic (PA) is consistent. Goedel's second incompleteness theorem then tells us that the consistency of PA cannot be proved in PA. Skipping over the distinction between a sentence and its Goedel number, let "Con(PA)" say that PA is consistent. Then what we learn from the second incompleteness theorem is that:

  2. Con(PA) has no proof in PA.
Now, statement (2), while true, is itself not provable in PA.[note 1] Hence there are non-standard models of PA according to which (2) is false. But there are also models of PA according to which (2) is true, since (2) is in fact true. Thus, there are models of PA according to which Con(PA) has no proof and there are models of PA according to which Con(PA) has a proof.

This has an important consequence for philosophy of mathematics. Suppose we want to de-metaphysicalize mathematics, to move away from questions about which axioms are and are not actually true. Then we are apt to say something like this: mathematics is not about discovering which mathematical claims are true, but about discovering which mathematical claims can be proved in which systems. However, what we learn from the second incompleteness theorem is that the notion of provability carries the same kind of exposure to mathematical metaphysics, to questions about the truth of axioms, as naively looking for mathematical truths did.

And if one tries to de-metaphysicalize provability by saying that what we are after in the end is not the question whether p is provable in L, but whether p is provable in L according to M, then that simply leads to a regress. For the question whether p is provable in L according to M is in turn a mathematical question, and then it makes sense to ask according to which system we are asking it. The only way to arrest the regress seems to be to suppose that at some level we simply are talking of how things really are, rather than how they are in or according to a system.

Maybe, though, one could say the following to limit one's metaphysical exposure: Mathematics is about discovering proofs rather than about discovering what has a proof. However, this is a false dichotomy, since by discovering a proof of p, one discovers that p has a proof.

Tuesday, April 10, 2012

Top-down and bottom-up syntax

There are two fundamentally different approaches to syntax. One way starts at the bottom, with fundamental building blocks like names, variables and predicates, and thinks of a sentence as built up out of these by applying various operators. Thus, we get "The cat is on the mat and the dog is beside the mat" from elements like "the cat", "is on", "the mat", "the dog" and "is beside", by using operators like conjunction and binary-predication:

  1. "The cat is on the mat and the dog is beside the mat" = conjunction(binary-predication("is on", "the cat", "the mat"), binary-predication("is beside", "the cat", "the mat")).
We can then parse the sentence back down into the elements it came from by inverting the operators (and if the operators are many-to-one there will be parsing ambiguity).

The other approach starts at the top with a sentence (or, more generally, well-formed formula) and then parses it by using parsing relations like conjoins (e.g., "p and q" conjoins "p" and "q") or binarily-applies (e.g., "the cat is on the mat" binarily-applies "is on" to "the cat" and "the mat").
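
Here is a toy illustration of the contrast in Python (the operator and relation names follow the ones above; representing sentences as strings is a simplifying assumption of mine):

# Bottom-up: build the sentence by applying operators to elements.
def binary_predication(pred, subj, obj):
    return subj + " " + pred + " " + obj

def conjunction(p, q):
    return p + " and " + q

s = conjunction(binary_predication("is on", "the cat", "the mat"),
                binary_predication("is beside", "the dog", "the mat"))

# Top-down: start from the sentence and apply a parsing relation.
def conjoins(sentence):
    # Return the conjuncts if the sentence splits as a conjunction, else None.
    parts = sentence.split(" and ")
    return tuple(parts) if len(parts) == 2 else None

print(conjoins(s))  # ('the cat is on the mat', 'the dog is beside the mat')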

There are four reasons I know of for preferring the top-down approach.

A. The possibility of multiple ways of expressing the same structure. For instance, "p and q" conjoins "p" and "q", but it's not the only way of conjoining these: "p but q" also conjoins "p" and "q". The bottom-up approach can handle this by having multiple conjunction operators like conjoin-with-and, conjoin-with-but and conjoin-with-and-also, but then we need to introduce a higher order property of these operators that says that they are conjunctions. Moreover, we should not suppose separate operators in cases where the meaning is the same, and sometimes the meaning will be exactly the same.

B. Partial sense. There is no way of forming the sentence

  2. 2+2=5 and the borogove is mimsy
in the bottom-up approach, because "borogove" is not a noun of English and "is mimsy" is not a predicate of English: there is nothing to plug into a unary-predication operator to form the second conjunct. But on the top-down approach, we can do a first step of parsing the sentence: (2) conjoins "2+2=5" and "the borogove is mimsy". And we know that one conjunct is false, so we conclude that (2) isn't true before we even start asking whether the second conjunct makes sense.

C. Ungrammatical sentences. The bottom-up approach has no way of making sense of ungrammatical sentences like a non-native speaker's

  3. Jane love Bob.
For there is no predicate F such that the sentence is equal to binary-predication(F, "Jane", "Bob"), so there is no way of parsing it. But the top-down approach is not committed to all sentences coming from the application of specified predicates: it can say that (3) binarily-applies "loves" to "Jane" and "Bob", school-marmish opinions to the contrary notwithstanding. The bottom-up approach can handle ungrammatical sentences in two different ways. One way is to suppose that any particular ungrammatical sentence is in fact a mistaken version of a grammatically correct sentence. Maybe that's true for (3), but I doubt that this is tenable for the full range of understandable but grammatically incorrect sentences. The second is to include a range of ungrammatical operators, such as binary-predication-dropping-suffix-s. This is not satisfactory—there are too many such operators.

D. Extensibility. It's an oversimplification to think that a sentence that applies a predicate is formed simply out of the predicate and its arguments by means of a predication operator. There are other elements that need to be packaged up into the sentence, such as emphasis, degree of confidence, connotation, etc. These may be conveyed by tone of voice, context or choice of "synonym". One could handle this in two ways on the bottom-up view. One way is to add additional argument slots to the predication operators, slots for emphasis type, confidence, connotation, etc. This is messy, because as we discover new features of our language, we will have to keep on revising the arity of these operators. The second approach is to suppose that a sentence is formed by applying additional operators, such as an emphasis operator or a confidence operator, after applying, say, the last predication operator. Thus, a particular instance of "Socrates is wise" might be the result of:

  4. confidence(emphasis(predication("Socrates", "is wise"), 3.4), .98).
But now we can't take the resulting sentence and directly parse it into subject and predicate by simply inverting the predication operator. We first have to invert the confidence operator, and then we have to invert the emphasis operator. In other words, parsing requires a large number of other operators to invert. But on the top-down approach, this is easy. For if S is our confidenced and emphasized token of "Socrates is wise", then applies(S, "is wise", "Socrates"). No need to invert several additional operators to say that. If we are interested in the other features of S, however, then we can see what other parsing predicates, such as has-confidence, can be applied to S. But that's optional. Because we are not parsing in principle by inverting compositional operators, we don't need to worry about the other operators when we don't care about that aspect of the communicative content.


There is also a down-side to the top-down approach. Because of point C, we have no way of codifying its parsing predicates like binarily-applies for natural languages. That, I think, is exactly how it should be.

Monday, October 12, 2009

Some naive thoughts on syntax

I am neither a linguist nor a philosopher of language, so what I will say is naive and may be completely silly.

It seems to be common to divide up the task of analyzing language between syntax and semantics. Syntax determines how to classify linguistic strings into categories such as "sentence", "well-formed formula", "predicate", "name", etc. If the division is merely pragmatic, that's fine. But if something philosophical is supposed to ride on the division, we should be cautious. Concepts like "sentence" and "predicate" are ones that we need semantic vocabulary to explain—a sentence is the sort of thing that could be true or false, or maybe the sort of thing that is supposed to express a proposition. A predicate is the sort of thing that can be applied to one or more referring expressions.

If one wants syntax to be purely formal, we should see it as classifying permissible utterances into a bunch of formal categories. As pure syntacticians, we should not presuppose any particular set of categories into which the strings are to be classified. If we are not to suppose any specific semantic concepts, the basic category should be, I think, that of a "permissible conversation" (it may well be that the concept of a "conversation" is itself semantic—but it will be the most general semantic concept). Then, as pure syntacticians, we study permissible conversations, trying to classify their components. We can model a permissible conversation as a string of characters tagged by speaker (we could model the tagging as colors—we put what is spoken by different people in different colors). Then as pure syntacticians, we study the natural rules for generating permissible conversations.
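
The proposed model is easy to state as a data structure (a minimal sketch; the names are mine):

from typing import NamedTuple

class Turn(NamedTuple):
    speaker: str  # the tag; one could equally model tags as colors
    text: str

# A permissible conversation: a speaker-tagged sequence of strings.
conversation = [
    Turn("A", "Jones was walking"),
    Turn("B", "under the bridge"),
]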

It may well be that in the case of a human language, the natural generating rules for speakers will make use of concepts such as "sentence" and "well-formed formula", but this should not be presupposed at the outset.

Here is an interesting question: Do we have good reason to suppose that if we restricted syntax to something to be discovered by this methodology, the categories we would come up with would be anything like the familiar linguistic categories? I think we are not in a position to know the answer to this. The categories that we in fact have were not discovered by this methodology. They were discovered by a mix of this methodology and semantic considerations. And that seems the better way to go to generate relevant syntactic categories than the road of pure syntax. But the road that we in fact took does not allow for a neat division of labor between syntax and semantics, since many of our syntactic categories are also natural semantic ones, and it is their semantic naturalness that goes into making them linguistically relevant.

Thursday, November 1, 2007

Syntax and context (Language, Part II)

To resume my attempt to erase the distinction between utterance and context, I shall argue that judging whether a sentence is syntactically correct can require information about what looks like "context".

First, note that the fact that a given uttered sequence of sounds is in language or dialect L1 rather than in some other language L2 is surely very much a contextual feature, and one that has to be known in order to judge syntactic correctness. It may be that the speaker announced which language she was speaking, or that she is assumed to be speaking the same language as her interlocutor previously was speaking, or that she is speaking the majority language in the culture. Often one can guess from the words said, or the accent with which they are said, what language is being spoken. Often, but not always. Consider a case where we have two closely cognate languages. Then one will sometimes have a case where the same set of sounds could be parsed either as a correct sentence of one language or as a somewhat mispronounced sentence of the other language. Which is the right interpretation depends on which language was contextually established as the one in which the utterance was being made, and this in turn will answer the question whether the utterance was syntactically correct. So, if we recognize such a thing as context at all, we should likewise recognize as part of the context the fact that a given language was in play, and hence we should conclude that context is relevant to syntax.

Second, we can come up with some somewhat odd-ball cases. You say: "Jones was walking" and I add "under the bridge". Whether what I said was syntactically correct depends on what you had said--had you said nothing, my addition would have been nonsense. So, what you say helps determine syntactic correctness.

Third, it seems to me that in gendered languages to use the wrong gender in words that refer to the speaker is to make a syntactic mistake--but then whether a sentence is syntactically correct will depend on whether the speaker is male or female, an apparently contextual feature. We can imagine even more radical versions of this--we can imagine a language where, say, completely different word order is to be used by men and by women. Similarly, it seems a syntactic mistake for a collective to speak in the first person singular. However, I am aware that there are other ways of interpreting the gender/number case (one might say that all the sentences in question are syntactically correct regardless of who says them, but there is some other kind of error in them).

So what? Well, if we need to know what we normally think of as context to determine syntactic correctness, then it seems that the choice of the setting in which we utter a set of noises is just as much a linguistic choice as the choice of what noises to make, because the setting and the noises interact to produce a syntactically correct or incorrect sentence.