cpp expansion of macro with no token-string

Question

I am reading up on C preprocessor (cpp) macro expansion and want to understand what happens when the (optional) token-string is not provided. I found that gcc v4.8.4 does this:

$ cat zz.c
#define B
(B)
|B|
$ gcc -E zz.c
# 1 "zz.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "zz.c"

()
| |

Can anyone explain why the expansion produces zero spaces in one case and one space in the other?

Probably because | and || are different operators. I hope someone writes an answer with the exact rules cpp uses.
– hyde, Commented Jun 12, 2015 at 23:42
It would be funny to see somebody write x | value | SOME_FLAG and have it turn out to be x || SOME_FLAG; it's probably done this way so that it won't compile.
– Mark Segal, Commented Jun 12, 2015 at 23:53
But that would mean || is treated as a special case. This could be avoided by always inserting spaces, which would not be against C rules (except for macro concatenations).
– Jongware, Commented Jun 12, 2015 at 23:56

P.P · Accepted Answer · 2015-06-13 00:09:03Z

The C preprocessor operates on "tokens", and whenever printing two tokens next to each other could change their meaning or make them ambiguous, it adds whitespace to preserve the meaning.

Consider your example,

(B)

Here it makes no difference whether a space is added between ( and ): the meaning is the same whatever B expands to.

But it's not the case with

|B|

Depending on how B is defined, this could become either || or |something|. So the preprocessor is forced to add whitespace to keep the two | tokens separate under C's lexical rules.

The same behaviour can be seen with any other pair of tokens that would read as a different token if written together. For example,

#define B +
B+

would produce

+ +

as opposed to

++

for the said reason.
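
The same separation can be observed in compiled code, not only in the -E output. Below is a minimal sketch, assuming a C11 compiler; the names EMPTY and DIFFERENCE are made up for this illustration. The compiler works on the token sequence rather than on re-read text, so the two - tokens around the empty macro stay separate and are never merged into --.

```c
/* Hypothetical names: EMPTY and DIFFERENCE are illustrative only. */
#define EMPTY

/* After expansion the token sequence is: 3 - - 2, i.e. 3 - (-2).
   If the two '-' tokens were glued into '--', this would not compile. */
enum { DIFFERENCE = 3 -EMPTY- 2 };

/* C11 compile-time check of the value computed above. */
_Static_assert(DIFFERENCE == 5, "the two '-' tokens stay separate");
```

The space that gcc -E prints between the two tokens is the textual way of keeping this same token sequence intact when the output is read back in.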

However, this applies only to the preprocessor that complies with C's lexical rules. GCC also supports an old preprocessor mode, called traditional mode, which doesn't add any extra whitespace. For example, if you run the preprocessor in traditional mode:

gcc -E -traditional-cpp file.c

then

#define B 

(B)
|B|

produces (without the whitespace)

()
||

user743382 · Accepted Answer · 2015-06-12 23:57:00Z

The output of gcc -E intentionally does not match the exact rules specified by the C standard. The C standard does not describe any particular way the preprocessor result should be visible, and does not even require such a way exist.

The only time some sort of preprocessor output is required to be visible is when the # operator is used. And if you use this, you can see that there isn't any space.
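
A minimal sketch of that check, assuming a C11 compiler; the macro names EMPTY, STR and XSTR and the two array names are invented for this illustration. XSTR expands its argument before stringifying it, so the empty macro disappears, while STR stringifies the argument exactly as written.

```c
/* Hypothetical names: EMPTY, STR, XSTR, expanded, unexpanded. */
#define EMPTY
#define STR(x)  #x       /* stringifies the argument as written */
#define XSTR(x) STR(x)   /* expands the argument first, then stringifies */

/* EMPTY is expanded away before '#' is applied. */
const char expanded[] = XSTR(|EMPTY|);

/* '#' is applied directly, so the macro name is kept as text. */
const char unexpanded[] = STR(|EMPTY|);

/* Sizes include the terminating null character:
   "||" needs 3 bytes, "|EMPTY|" needs 8. */
_Static_assert(sizeof(expanded) == 3, "no space between the two '|'");
_Static_assert(sizeof(unexpanded) == 8, "macro name kept verbatim");
```

Printing the two arrays with puts should show || and |EMPTY| respectively.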

flaming.toaster's answer rightly points out that the reason the gcc -E output inserts a space is to prevent the two consecutive |s from being parsed as a single || token. The following program is required to give a diagnostic for the syntax error:

#define EMPTY
int main() { return 0 |EMPTY| 0; }

and the space is there to make sure the compiler still has enough information to actually generate the error.

8 Comments

Jongware Over a year ago

So files get scanned for operators with a macro in between that would lead to a single operator when empty? (A quick list: ++, --, ->, and the boolean operators. Any more?)

user743382 Over a year ago

@Jongware Consider also the case of two consecutive identifiers: #define F(X) X, and then F(int)F(main)(){}. Actually, GCC's approach is that two consecutive token types get a space inserted between them if some tokens of those types would be misinterpreted if left adjacent, even if it isn't a problem for the specific tokens in question. So you might see an extra space in a few corner cases where it isn't strictly needed, for example between 1 and +, since + could appear in a pp-number (after an E).

user743382 Over a year ago

@Jongware From a standard's POV, that would be perfectly fine, but when human readers are inspecting the preprocessor's output, it's nice to only get extra spaces when needed. Macros that expand to expressions are very frequently parenthesised, and may contain other macros. There's no need to change (((...))) to ( ( ( ... ) ) ), that would just be a lot of extra noise.

user743382 Over a year ago

@Jongware If you use -save-temps, then the exact output as given by -E is saved to a file, and it's that file that gets re-parsed. If you don't use -save-temps, then the internal preprocessor gets used. You're right to wonder, you'd even be right to worry, that -E could end up saving incorrect preprocessor output to a file, which then can't be re-parsed correctly: that did happen. I know because I reported one such case as a bug (now fixed) where the space handling hadn't been updated for new C++11 token types. :)

user743382 Over a year ago

@BlueMoon That's actually considered a mis-use by the GCC devs unless extra command-line options are used. It's true that many programs and scripts rely on parsing gcc -E's output, but when those programs and scripts broke when a new version of GCC changed the way gcc -E displayed its output, that was not considered a bug in GCC, as the precise format was never specified.

flaming.toaster · Accepted Answer · 2015-06-13 00:13:14Z

edit: see hvd's answer about gcc's preprocessor implementation

This may be to differentiate between the bitwise and logical OR operators.

This sample:

if (x | 4) printf("true\n"); // Bitwise OR: x with bit 2 set (nonzero, so the branch is taken)

Is different from:

if (x || 4) printf("true\n"); // Logical OR: always evaluates to 1

Since they are different operators with different functions, it is necessary for the preprocessor to add whitespace to avoid changing the intended meaning of the statement.
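
To make the difference concrete, here is a small sketch, assuming a C11 compiler; the enumerator names BITWISE and LOGICAL are invented for this illustration. The same two operands give different values depending on which operator the compiler sees.

```c
/* Hypothetical names: BITWISE and LOGICAL are illustrative only. */
enum {
    BITWISE = 2 | 4,   /* combines the bits of both operands */
    LOGICAL = 2 || 4   /* yields only a truth value */
};

/* C11 compile-time checks of the two results. */
_Static_assert(BITWISE == 6, "2 | 4 sets bits 1 and 2");
_Static_assert(LOGICAL == 1, "2 || 4 is the truth value 1");
```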

edited Jun 13, 2015 at 0:13

answered Jun 12, 2015 at 23:50

1 Comment

Mark Segal Over a year ago

That doesn't answer the question: why is it done? It could simply not have been done in the first place.
