[clang][Sema] Fix type of an statement expression ending with an atomic type #119711

alejandro-alvarez-sonarsource · 2024-12-12T15:25:50Z

When a statement expression's last statement is an atomic variable, GCC and Clang disagree on the type of the expression. This can be made apparent using typeof and forcing a diagnostic message:

_Atomic int a = 0;
typeof(({a;})) x = "0";

GCC complains about initializing int with char*
Clang complains about initializing _Atomic(int) with a char[2]

Due to the type of the statement expression being deduced to be atomic, we end with three implicit casts inside the StmtExpr on the AST:

LValueToRValue -> AtomicToNonAtomic -> NonAtomicToAtomic

In some situations, this can end on an assertion inside IntExprEvaluator, as reported in #106576.

With this patch, we now have two implicit casts, since the type of the statement expression is deduced to be non-atomic:

LValueToRValue -> AtomicToNonAtomic

This is consistent with the C standard (6.7.2.4, p4)

The properties associated with atomic types are meaningful only for expressions that are lvalues.

But a statement expression is an rvalue.

IntExprEvaluator assumptions are now satisfied and there is no assertion error.
Additionally, the typeof trick mentioned above shows that the type is consistently deduced between GCC and Clang.

Fixes #106576

llvmbot · 2024-12-12T15:26:31Z

@llvm/pr-subscribers-clang

Author: Alejandro Álvarez Ayllón (alejandro-alvarez-sonarsource)

Changes

When a statement expression's last statement is an atomic variable, GCC and Clang disagree on the type of the expression. This can be made apparent using typeof and forcing a diagnostic message:

_Atomic int a = 0;
typeof(({a;})) x = "0";

GCC complains about initializing int with char*
Clang complains about initializing _Atomic(int) with a char[2]

Due to the type of the statement expression being deduced to be atomic, we end with three implicit casts inside the StmtExpr on the AST:

LValueToRValue -> AtomicToNonAtomic -> NonAtomicToAtomic

In some situations, this can end on an assertion inside IntExprEvaluator, as reported in #106576.

With this patch, we now have two implicit casts, since the type of the statement expression is deduced to be non-atomic:

LValueToRValue -> AtomicToNonAtomic

This is consistent with the C standard (6.7.2.4, p4)

> The properties associated with atomic types are meaningful only for expressions that are lvalues.

But a statement expression is an rvalue.

IntExprEvaluator assumptions are now satisfied and there is no assertion error.
Additionally, the typeof trick mentioned above shows that the type is consistently deduced between GCC and Clang.

Fixes #106576

Full diff: https://github.com/llvm/llvm-project/pull/119711.diff

2 Files Affected:

(modified) clang/lib/Sema/SemaExpr.cpp (+9-2)
(added) clang/test/Sema/gh106576.c (+16)

diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 20bf6f7f6f28ff..45ae97807c2032 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -15861,10 +15861,17 @@ ExprResult Sema::ActOnStmtExprResult(ExprResult ER) {
   if (Cast && Cast->getCastKind() == CK_ARCConsumeObject)
     return Cast->getSubExpr();
 
+  auto Ty = E->getType().getUnqualifiedType();
+
+  // If the type is an atomic, the statement type is the underlying type.
+  if (const AtomicType *AT = Ty->getAs<AtomicType>()) {
+    Ty = AT->getValueType().getUnqualifiedType();
+    return PerformImplicitConversion(E, Ty, AssignmentAction::Casting);
+  }
+
   // FIXME: Provide a better location for the initialization.
   return PerformCopyInitialization(
-      InitializedEntity::InitializeStmtExprResult(
-          E->getBeginLoc(), E->getType().getUnqualifiedType()),
+      InitializedEntity::InitializeStmtExprResult(E->getBeginLoc(), Ty),
       SourceLocation(), E);
 }
 
diff --git a/clang/test/Sema/gh106576.c b/clang/test/Sema/gh106576.c
new file mode 100644
index 00000000000000..792977dea14132
--- /dev/null
+++ b/clang/test/Sema/gh106576.c
@@ -0,0 +1,16 @@
+// RUN: %clang_cc1 -fsyntax-only -verify %s
+
+typedef _Atomic char atomic_char;
+
+typedef _Atomic char atomic_char;
+
+atomic_char counter;
+
+char load_plus_one(void) {
+  return ({counter;}) + 1; // no crash
+}
+
+char type_of_stmt_expr(void) {
+  typeof(({counter;})) y = ""; // expected-error-re {{incompatible pointer to integer conversion initializing 'typeof (({{{.*}}}))' (aka 'char') with an expression of type 'char[1]'}}
+  return y;
+}

shafik

This will eventually need a release note.

alejandro-alvarez-sonarsource · 2025-01-09T15:35:52Z

Ping?

shafik

This looks about right to me but I would like another set of eyes.

AaronBallman · 2025-01-27T20:09:33Z

clang/lib/Sema/SemaExpr.cpp

@@ -15919,10 +15919,17 @@ ExprResult Sema::ActOnStmtExprResult(ExprResult ER) {
  if (Cast && Cast->getCastKind() == CK_ARCConsumeObject)
    return Cast->getSubExpr();

+  auto Ty = E->getType().getUnqualifiedType();


What I'm confused by is actually a few lines farther up. Why do we not want to do an lvalue-to-rvalue conversion when the result of the statement expression is an rvalue? That conversion would strip all the qualifiers, including the atomic ones, which seems like exactly the behavior we want.

That comment comes from 34376a68

CC @rjmccall in case you have some recollection here.

I believe it's because we're setting up for a PerformCopyInitialization below, so the l-value conversion would be redundant (or possibly slightly wrong for some C++ class types?). Certainly could be clearer in the comment.

As far as this patch goes, my longstanding thought is that _Atomic is clearly a qualifier. I know the C standard doesn't agree, but the C standard is wrong, the same way that it's wrong when it says that function references aren't l-values; sometimes the formal system has an obvious truth that doesn't align with the glossary. In this case, the language treats _Atomic as a qualifier for almost every purpose, and the compiler generally reflects that: most of the places that uniformly look through qualifiers also look through _Atomic, and most of the places that need to look specifically for _Atomic also need to look for certain qualifiers.

So my suggestion is that common functions like getUnqualifiedType() should just be looking through _Atomic, which would have fixed this bug before it happened. If there's a need for a narrower API that only looks through non-_Atomic qualifiers (or perhaps some subset of "non-ABI-affecting" qualifiers?), we can add that with a nice name that suggests when it should be used.

So my suggestion is that common functions like getUnqualifiedType() should just be looking through _Atomic, which would have fixed this bug before it happened.

While I understand the reasoning, I am wary of touching a method that is used (roughly counted) over 400 times. Unfortunately, I do not currently have the bandwith to commit into carefully combing all of them.

As far as this patch goes, my longstanding thought is that _Atomic is clearly a qualifier.

Errrr, I don't agree. I think _Atomic is a type specifier. _Atomic int and int are not the same type (heck, they can even have different sizes and alignments!). It's somewhat like a qualifier in that it impacts how you access the underlying object (so in that regard, it is a bit like volatile int and int), but I think the fundamental thing is that you rarely want int semantics directly when the type is spelled _Atomic int.

So my suggestion is that common functions like getUnqualifiedType() should just be looking through _Atomic, which would have fixed this bug before it happened. If there's a need for a narrower API that only looks through non-_Atomic qualifiers (or perhaps some subset of "non-ABI-affecting" qualifiers?), we can add that with a nice name that suggests when it should be used.

On the one hand, this specific bug has bitten us so many times I want to agree with your proposed approach.

On the other hand, that's going to be a massive undertaking that's pretty risky. We have times when we definitely do not want the qualified type to drop the atomic qualifier. For example, when merging function types, we call getUnqualifiedType() to determine whether two parameters are compatible. void func(const int i); and void func(int i); are compatible redeclarations, while void func(_Atomic int i); and void func(int i); are not. There are other times when we do want the atomic qualifier dropped. It's a frustrating mess and I'm certain we have numerous bugs in this area.

So the question to me becomes: do we fix this narrow issue and kick the can down the road? Or do we try to find someone willing to undergo the more significant cleanup?

As far as this patch goes, my longstanding thought is that _Atomic is clearly a qualifier.

Errrr, I don't agree. I think _Atomic is a type specifier. _Atomic int and int are not the same type

const int and int are also not the same type.

(heck, they can even have different sizes and alignments!).

Size and alignment is the (back-justified) reason we draw the line where we currently do. It is not a particularly sensible way to do it. Most of our existing qualifiers affect code generation and ABI, just in ways that happen to not affect size and alignment. Several qualifiers affect the ABI rules for containing aggregates, for example.

It's somewhat like a qualifier in that it impacts how you access the underlying object (so in that regard, it is a bit like volatile int and int), but I think the fundamental thing is that you rarely want int semantics directly when the type is spelled _Atomic int.

I'm not quite sure what you're saying here. On the implementation level, the compiler can never ignore qualifiers when performing an access to an object; the whole point of qualifiers is that they affect the semantics of accesses. On the language design level, I agree with you that programmers should never perform normal accesses to _Atomic objects instead of using the intrinsics. However, that is an argument that it was an unfortunate language design decision that _Atomic behaves like a qualifier. In the language that we've actually got, _Atomic has most of the characteristics of qualifiers, and there is no benefit to the compiler implementation in pretending otherwise.

The common characteristics of qualifiers:

They convey an aspect of how a value is stored or accessed other than the abstract value itself.

They are typically disallowed or silently dropped at the top level in type contexts that primarily describe an abstract value rather than storage, such as casts, return values, and parameters (when determining the function type).

R-value expressions represent an abstract value and are (for the most part) unqualified at the top level. Using a qualified l-value in a context that requires an r-value drops all qualifiers as part of the conversion to an r-value. The qualifiers typically affect the code-generation of the load.

Assignment consumes an abstract value and therefore expects an r-value on the RHS, which (per above) is of unqualified type. The qualifier typically affects code-generation of the store.

The qualifier (if it can apply to structs and unions at all) propagates along member accesses, such that the type of x.member has all the top-level qualifiers of both x and the member.

Most of these apply to _Atomic. The most notable exception is the last, which C makes UB. But it is not uncommon for qualifiers to have tweaks to various of these rules, like how C++ allows qualified return values of class type for some reason.

The subtyping rules for CV qualifiers — the ability to e.g. convert an int** to const int * const * — do not generally work for other qualifiers. For example, even just considering semi-standardized qualifiers, you clearly cannot add an address space in a nested position this way. I believe you have to understand this as a useful special case for the CV qualifiers. (It really should not work for restrict, although I can't remember what rule we actually implement.)

So my suggestion is that common functions like getUnqualifiedType() should just be looking through _Atomic, which would have fixed this bug before it happened. If there's a need for a narrower API that only looks through non-_Atomic qualifiers (or perhaps some subset of "non-ABI-affecting" qualifiers?), we can add that with a nice name that suggests when it should be used.

On the one hand, this specific bug has bitten us so many times I want to agree with your proposed approach.

On the other hand, that's going to be a massive undertaking that's pretty risky.

Yeah, I don't disagree with this, and I'm not trying to force the problem to be solved immediately. I would like to try to get consensus on the right thing to do, though.

We have times when we definitely do not want the qualified type to drop the atomic qualifier. For example, when merging function types, we call getUnqualifiedType() to determine whether two parameters are compatible. void func(const int i); and void func(int i); are compatible redeclarations, while void func(_Atomic int i); and void func(int i); are not. There are other times when we do want the atomic qualifier dropped. It's a frustrating mess and I'm certain we have numerous bugs in this area.

In this case, I have to wonder if the standard is in the same place Clang is — if this was really an intentional allowance for _Atomic T as a parameter in function types or just an oversight. It's hard to imagine what an atomic argument could possibly mean. Perhaps it was an oversight that they no longer feel comfortable fixing.

So the question to me becomes: do we fix this narrow issue and kick the can down the road? Or do we try to find someone willing to undergo the more significant cleanup?

If we agree that the compiler probably ought to be defaulting to treating atomic types as qualifiers, I think the path is:

Introduce a function that looks through _Atomic as well as the other qualifiers.

Make a best effort to switch existing call sites of getUnqualifiedType() that immediately also look through _Atomic to instead use the common function. Probably ought to be the same patch as (1).

Rework this patch to just use the new function.

Introduce a function that explicitly does not look through _Atomic.

Gradually go through the remaining getUnqualifiedType() uses and figure out which of the explicit functions they should use, adding tests as necessary.

When that's done, decide if we've got compelling evidence that getUnqualifiedType() should just default to looking through _Atomic.

I think asking Alejandro to do steps 1-3 as part of this bug fix is reasonable, and then the rest can be an open project.

As far as this patch goes, my longstanding thought is that _Atomic is clearly a qualifier.

Errrr, I don't agree. I think _Atomic is a type specifier. _Atomic int and int are not the same type

const int and int are also not the same type.

Maybe I'm being imprecise, sorry for that. const int and int are both "morally" of type int in that the allowed accesses to either type will give the same code generation and behavior. There's no difference between read accesses to int and const int: https://godbolt.org/z/GzhKWr8W6

That is mostly a special case of const. Even volatile requires code-gen differences, at least when generating a high-level IR subject to optimization.

(heck, they can even have different sizes and alignments!).

Size and alignment is the (back-justified) reason we draw the line where we currently do. It is not a particularly sensible way to do it. Most of our existing qualifiers affect code generation and ABI, just in ways that happen to not affect size and alignment. Several qualifiers affect the ABI rules for containing aggregates, for example.

Do you have an example of impacting the ABI rules in C?

If you mean qualifiers officially blessed in any way by WG14, I believe that's just const, volatile, restrict, and the address-space qualifiers from TR 18037. Address space qualifiers do affect ABI, but they don't specifically affect struct layout because they are not allowed on struct members.

Loading an int to get its value is a different operation than loading an atomic int to get its value; the latter may require entirely different instruction selection. Probably int is a bad example, perhaps struct values makes the distinction more clear?

Needing different instruction selection is common for extended qualifiers. Even among the official qualifiers, address spaces typically require different instruction selection.

Another notable exception is that _Atomic is not silently dropped at the top level in type for return values and parameters when determining the function type, as mentioned above.

It actually is dropped for return types; it's just parameters that are different.

In this case, I have to wonder if the standard is in the same place Clang is — if this was really an intentional allowance for _Atomic T as a parameter in function types or just an oversight. It's hard to imagine what an atomic argument could possibly mean. Perhaps it was an oversight that they no longer feel comfortable fixing.

Doesn't it matter in the same way it matters for const parameters: it impacts the accesses to the parameter within the function body itself?

Sure, but it is not in any way necessary for this to be part of the function type, just as it isn't part of the function type for const. The parameter is an independent object and has no relation to anything in the caller, so making the atomic-ness part of the signature is pointless.

If we agree that the compiler probably ought to be defaulting to treating atomic types as qualifiers, I think the path is:

1. Introduce a function that looks through `_Atomic` as well as the other qualifiers. 2. Make a best effort to switch existing call sites of `getUnqualifiedType()` that immediately also look through `_Atomic` to instead use the common function. Probably ought to be the same patch as (1).

I think we've already done this though. We have getAtomicUnqualifiedType() which we use when we want to get a non-atomic, unqualified type.

3. Rework this patch to just use the new function. 4. Introduce a function that explicitly does not look through `_Atomic`.

Which is getUnqualifiedType() currently.

Right, it would have the same behavior as getUnqualifiedType() (and presumably would be implemented as a call to it). Making it a separate function lets you keep track of the fact that you considered a caller and deliberately did not want it to look through _Atomic, though.

I think asking Alejandro to do steps 1-3 as part of this bug fix is reasonable, and then the rest can be an open project.

So I think what I'm getting is: we should audit all existing calls to getUnqualifiedType() and getAtomicUnqualifiedType() to determine whether we're correctly handling atomics or not, and if there are more uses of getAtomicUnqualifiedType() after correcting issues, then maybe we want getUnqualifiedType() to look through atomics and rename getAtomicUnqualifiedType() to something else?

Right, with the tweak that the audit always switches away from getUnqualifiedType(): each call either becomes getAtomicUnqualifiedType() (I forgot that this already exists) or getUnqualifiedTypePreservingAtomic(), and then the audit is done when there are no remaining calls to getUnqualifiedType(). At that point, we decide if having a getUnqualifiedType() is a good idea or if we should just force all callers to explicitly choose.

And as mentioned, there's a short-term subset of the plan that unblocks this fix.

And as mentioned, there's a short-term subset of the plan that unblocks this fix.

Thanks for the feedback and for the enlightening discussion. I will go in this direction, but it will take me a while, since I have quite a few things on the backlog.

Right, with the tweak that the audit always switches away from getUnqualifiedType(): each call either becomes getAtomicUnqualifiedType() (I forgot that this already exists) or getUnqualifiedTypePreservingAtomic(), and then the audit is done when there are no remaining calls to getUnqualifiedType(). At that point, we decide if having a getUnqualifiedType() is a good idea or if we should just force all callers to explicitly choose.

Okay, this is a plan I can get behind -- thank you for the good discussion!

And as mentioned, there's a short-term subset of the plan that unblocks this fix.

Thanks for the feedback and for the enlightening discussion. I will go in this direction, but it will take me a while, since I have quite a few things on the backlog.

Before you get too far into anything else, could you try just changing the call to getUnqualifiedType() to getAtomicUnqualifiedType() in the original code and seeing if that fixes the problem? That should be fine for the local, short-term fix.

Before you get too far into anything else, could you try just changing the call to getUnqualifiedType() to getAtomicUnqualifiedType() in the original code and seeing if that fixes the problem? That should be fine for the local, short-term fix.

It fixes it, yes, thanks!

(Sorry for the force-push, I saw some seemingly unrelated tests were broken and I wanted to make sure it wasn't due to this, after un-merging main all tests are green)

clang/docs/ReleaseNotes.rst

rjmccall · 2025-02-12T19:55:10Z

clang/test/Sema/gh106576.c

+
+typedef _Atomic char atomic_char;
+
+typedef _Atomic char atomic_char;


Presumably the redundant typedef is not a critical part of this test.

It isn't, I have removed the duplicated line.

Co-authored-by: John McCall <[email protected]>

rjmccall

Thanks, LGTM.

alejandro-alvarez-sonarsource · 2025-02-18T07:57:13Z

Thanks! Do you mind doing the merge, since I can't?

…ic type (llvm#119711) When a statement expression's last statement is an atomic variable, GCC and Clang disagree on the type of the expression. This can be made apparent using `typeof` and forcing a diagnostic message: ```cpp _Atomic int a = 0; typeof(({a;})) x = "0"; ``` * GCC complains about initializing `int` with `char*` * Clang complains about initializing `_Atomic(int)` with a `char[2]` Due to the type of the statement expression being deduced to be atomic, we end with three implicit casts inside the `StmtExpr` on the AST: * `LValueToRValue` -> `AtomicToNonAtomic` -> `NonAtomicToAtomic` In some situations, this can end on an assertion inside `IntExprEvaluator`, as reported in llvm#106576. With this patch, we now have two implicit casts, since the type of the statement expression is deduced to be non-atomic: * `LValueToRValue` -> `AtomicToNonAtomic` This is consistent with the C standard (6.7.2.4, p4) > The properties associated with atomic types are meaningful only for expressions that are lvalues. But a statement expression is an rvalue. `IntExprEvaluator` assumptions are now satisfied and there is no assertion error. Additionally, the `typeof` trick mentioned above shows that the type is consistently deduced between GCC and Clang. Fixes llvm#106576 --------- Co-authored-by: John McCall <[email protected]>

alejandro-alvarez-sonarsource added 3 commits December 12, 2024 15:08

Add regression test

d398fa1

Tentative fix

296aa35

Use PerformImplicitConversion

d13e109

llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Dec 12, 2024

cor3ntin requested a review from AaronBallman December 12, 2024 17:22

shafik reviewed Dec 18, 2024

View reviewed changes

alejandro-alvarez-sonarsource added 3 commits December 18, 2024 09:16

Merge branch 'main' into aa/fix_gh106576

764c452

Add release note

02d8385

Merge branch 'main' into aa/fix_gh106576

62a63cc

Merge branch 'main' into aa/fix_gh106576

f1b8d34

shafik reviewed Jan 24, 2025

View reviewed changes

shafik requested a review from erichkeane January 24, 2025 23:00

AaronBallman reviewed Jan 27, 2025

View reviewed changes

Fix by using getAtomicUnqualifiedType

2cfef98

alejandro-alvarez-sonarsource force-pushed the aa/fix_gh106576 branch from 75b12bb to 2cfef98 Compare February 12, 2025 12:04

Merge remote-tracking branch 'upstream/main' into aa/fix_gh106576

0c181e0

rjmccall reviewed Feb 12, 2025

View reviewed changes

alejandro-alvarez-sonarsource and others added 2 commits February 13, 2025 08:59

Update clang/docs/ReleaseNotes.rst

87f6644

Co-authored-by: John McCall <[email protected]>

Remove duplicated typedef from test

81e4efb

rjmccall approved these changes Feb 17, 2025

View reviewed changes

cor3ntin merged commit 252c83b into llvm:main Feb 18, 2025
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[clang][Sema] Fix type of an statement expression ending with an atomic type #119711

[clang][Sema] Fix type of an statement expression ending with an atomic type #119711

alejandro-alvarez-sonarsource commented Dec 12, 2024

llvmbot commented Dec 12, 2024

shafik left a comment

alejandro-alvarez-sonarsource commented Jan 9, 2025

shafik left a comment

AaronBallman Jan 27, 2025

rjmccall Jan 27, 2025

alejandro-alvarez-sonarsource Feb 3, 2025

AaronBallman Feb 3, 2025

rjmccall Feb 3, 2025 •

edited

Loading

rjmccall Feb 4, 2025

alejandro-alvarez-sonarsource Feb 6, 2025

AaronBallman Feb 6, 2025

rjmccall Feb 6, 2025

alejandro-alvarez-sonarsource Feb 12, 2025

rjmccall Feb 12, 2025

alejandro-alvarez-sonarsource Feb 13, 2025

rjmccall left a comment

alejandro-alvarez-sonarsource commented Feb 18, 2025


		typedef _Atomic char atomic_char;

		typedef _Atomic char atomic_char;

[clang][Sema] Fix type of an statement expression ending with an atomic type #119711

[clang][Sema] Fix type of an statement expression ending with an atomic type #119711

Conversation

alejandro-alvarez-sonarsource commented Dec 12, 2024

llvmbot commented Dec 12, 2024

shafik left a comment

Choose a reason for hiding this comment

alejandro-alvarez-sonarsource commented Jan 9, 2025

shafik left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rjmccall Feb 3, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rjmccall left a comment

Choose a reason for hiding this comment

alejandro-alvarez-sonarsource commented Feb 18, 2025

rjmccall Feb 3, 2025 •

edited

Loading