I am reading up on CPP macro expansion and wanted to understand what happens when the (optional) token-string is not provided. I found that gcc v4.8.4 does this:

$ cat zz.c
#define B
(B)
|B|
$ gcc -E zz.c
# 1 "zz.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "zz.c"

()
| |

Can anyone explain why the expansion is zero spaces in one instance and one in the other?

  • Probably because | and || are different operators. I hope someone writes an answer with the exact rules the cpp uses. Commented Jun 12, 2015 at 23:42
  • It can be funny seeing somebody write x | value | SOME_FLAG and it turns out to be x || SOME_FLAG, it's probably this way so it won't compile. Commented Jun 12, 2015 at 23:53
  • But that would mean || is treated as a special case. This could be avoided by always inserting spaces, which would not be against C rules (except for macro concatenations). Commented Jun 12, 2015 at 23:56

3 Answers

7

The C preprocessor operates on "tokens", and wherever printing two tokens next to each other could create ambiguity or change the meaning, it adds whitespace to preserve the meaning.

Consider your example,

(B)

Here, whether or not a space is added between ( and ), there is no ambiguity and no change in meaning, irrespective of the value of the macro B.

But it's not the case with

|B|

Depending on the macro B, the above could become either || or |something|. So the preprocessor is forced to add whitespace in order to keep C's lexical rules.

The same behaviour can be seen with any other token that could alter the meaning. For example,

#define B +
B+

would produce

+ +

as opposed to

++

for the said reason.

However, this applies only to the preprocessor that follows the C lexical rules. GCC also supports an older mode, the traditional preprocessor, which does not add any extra whitespace. For example, if you invoke the preprocessor in traditional mode:

gcc -E -traditional-cpp file.c

then

#define B 

(B)
|B|

produces (without the whitespace)

()
||

4

The output of gcc -E intentionally does not match the exact rules specified by the C standard. The C standard does not describe any particular way the preprocessor result should be visible, and does not even require such a way exist.

The only time some sort of preprocessor output is required to be visible is when the # operator is used. And if you use this, you can see that there isn't any space.

flaming.toaster's answer rightly points out that the reason the gcc -E output inserts a space is to prevent the two consecutive |s from being parsed as a single || token. The following program is required to give a diagnostic for the syntax error:

#define EMPTY
int main() { return 0 |EMPTY| 0; }

and the space is there to make sure the compiler still has enough information to actually generate the error.

8 Comments

So files get scanned for operators with a macro in between that would lead to a single operator when empty? (A quick list: ++, --, ->, and the boolean operators. Any more?)
@Jongware Consider also the case of two consecutive identifiers: #define F(X) X, and then F(int)F(main)(){}. Actually, GCC's approach is to insert a space between two consecutive tokens if some tokens of those types would be misinterpreted when left adjacent, even if that isn't a problem for the specific tokens in question. So you might see an extra space in a few corner cases where it isn't strictly needed, for example between 1 and +, since + could appear in a pp-number (after an E).
@Jongware From a standard's POV, that would be perfectly fine, but when human readers are inspecting the preprocessor's output, it's nice to only get extra spaces when needed. Macros that expand to expressions are very frequently parenthesised, and may contain other macros. There's no need to change (((...))) to ( ( ( ... ) ) ), that would just be a lot of extra noise.
@Jongware If you use -save-temps, then the exact output as given by -E is saved to a file, and it's that file that gets re-parsed. If you don't use -save-temps, then the internal preprocessor gets used. You're right to wonder, you'd even be right to worry, that -E could end up saving incorrect preprocessor output to a file, which then can't be re-parsed correctly: that did happen. I know because I reported one such case as a bug (now fixed) where the space handling hadn't been updated for new C++11 token types. :)
@BlueMoon That's actually considered a mis-use by the GCC devs unless extra command-line options are used. It's true that many programs and scripts rely on parsing gcc -E's output, but when those programs and scripts broke when a new version of GCC changed the way gcc -E displayed its output, that was not considered a bug in GCC, as the precise format was never specified.
1

edit: see hvd's answer about gcc's preprocessor implementation

This may be to differentiate between the bitwise and logical OR operators.

This sample:

if (x | 4) printf("true\n"); // Bitwise OR, may or may not be true

Is different from:

if (x || 4) printf("true\n"); // Always true

Since they are different operators with different functions, it is necessary for the preprocessor to add whitespace to avoid changing the intended meaning of the statement.

1 Comment

It doesn't answer the question: why is it done? It need not have been done in the first place.
