Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang][Sema] Fix type of an statement expression ending with an atomic type #119711

Merged
merged 11 commits into from
Feb 18, 2025

Conversation

alejandro-alvarez-sonarsource
Copy link
Contributor

When a statement expression's last statement is an atomic variable, GCC and Clang disagree on the type of the expression. This can be made apparent using typeof and forcing a diagnostic message:

_Atomic int a = 0;
typeof(({a;})) x = "0";
  • GCC complains about initializing int with char*
  • Clang complains about initializing _Atomic(int) with a char[2]

Due to the type of the statement expression being deduced to be atomic, we end with three implicit casts inside the StmtExpr on the AST:

  • LValueToRValue -> AtomicToNonAtomic -> NonAtomicToAtomic

In some situations, this can end on an assertion inside IntExprEvaluator, as reported in #106576.

With this patch, we now have two implicit casts, since the type of the statement expression is deduced to be non-atomic:

  • LValueToRValue -> AtomicToNonAtomic

This is consistent with the C standard (6.7.2.4, p4)

The properties associated with atomic types are meaningful only for expressions that are lvalues.

But a statement expression is an rvalue.

IntExprEvaluator assumptions are now satisfied and there is no assertion error.
Additionally, the typeof trick mentioned above shows that the type is consistently deduced between GCC and Clang.

Fixes #106576

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Dec 12, 2024
@llvmbot
Copy link
Member

llvmbot commented Dec 12, 2024

@llvm/pr-subscribers-clang

Author: Alejandro Álvarez Ayllón (alejandro-alvarez-sonarsource)

Changes

When a statement expression's last statement is an atomic variable, GCC and Clang disagree on the type of the expression. This can be made apparent using typeof and forcing a diagnostic message:

_Atomic int a = 0;
typeof(({a;})) x = "0";
  • GCC complains about initializing int with char*
  • Clang complains about initializing _Atomic(int) with a char[2]

Due to the type of the statement expression being deduced to be atomic, we end with three implicit casts inside the StmtExpr on the AST:

  • LValueToRValue -> AtomicToNonAtomic -> NonAtomicToAtomic

In some situations, this can end on an assertion inside IntExprEvaluator, as reported in #106576.

With this patch, we now have two implicit casts, since the type of the statement expression is deduced to be non-atomic:

  • LValueToRValue -> AtomicToNonAtomic

This is consistent with the C standard (6.7.2.4, p4)

> The properties associated with atomic types are meaningful only for expressions that are lvalues.

But a statement expression is an rvalue.

IntExprEvaluator assumptions are now satisfied and there is no assertion error.
Additionally, the typeof trick mentioned above shows that the type is consistently deduced between GCC and Clang.

Fixes #106576


Full diff: https://github.com/llvm/llvm-project/pull/119711.diff

2 Files Affected:

  • (modified) clang/lib/Sema/SemaExpr.cpp (+9-2)
  • (added) clang/test/Sema/gh106576.c (+16)
diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 20bf6f7f6f28ff..45ae97807c2032 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -15861,10 +15861,17 @@ ExprResult Sema::ActOnStmtExprResult(ExprResult ER) {
   if (Cast && Cast->getCastKind() == CK_ARCConsumeObject)
     return Cast->getSubExpr();
 
+  auto Ty = E->getType().getUnqualifiedType();
+
+  // If the type is an atomic, the statement type is the underlying type.
+  if (const AtomicType *AT = Ty->getAs<AtomicType>()) {
+    Ty = AT->getValueType().getUnqualifiedType();
+    return PerformImplicitConversion(E, Ty, AssignmentAction::Casting);
+  }
+
   // FIXME: Provide a better location for the initialization.
   return PerformCopyInitialization(
-      InitializedEntity::InitializeStmtExprResult(
-          E->getBeginLoc(), E->getType().getUnqualifiedType()),
+      InitializedEntity::InitializeStmtExprResult(E->getBeginLoc(), Ty),
       SourceLocation(), E);
 }
 
diff --git a/clang/test/Sema/gh106576.c b/clang/test/Sema/gh106576.c
new file mode 100644
index 00000000000000..792977dea14132
--- /dev/null
+++ b/clang/test/Sema/gh106576.c
@@ -0,0 +1,16 @@
+// RUN: %clang_cc1 -fsyntax-only -verify %s
+
+typedef _Atomic char atomic_char;
+
+typedef _Atomic char atomic_char;
+
+atomic_char counter;
+
+char load_plus_one(void) {
+  return ({counter;}) + 1; // no crash
+}
+
+char type_of_stmt_expr(void) {
+  typeof(({counter;})) y = ""; // expected-error-re {{incompatible pointer to integer conversion initializing 'typeof (({{{.*}}}))' (aka 'char') with an expression of type 'char[1]'}}
+  return y;
+}

Copy link
Collaborator

@shafik shafik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will eventually need a release note.

@alejandro-alvarez-sonarsource
Copy link
Contributor Author

Ping?

Copy link
Collaborator

@shafik shafik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks about right to me but I would like another set of eyes.

@shafik shafik requested a review from erichkeane January 24, 2025 23:00
@@ -15919,10 +15919,17 @@ ExprResult Sema::ActOnStmtExprResult(ExprResult ER) {
if (Cast && Cast->getCastKind() == CK_ARCConsumeObject)
return Cast->getSubExpr();

auto Ty = E->getType().getUnqualifiedType();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm confused by is actually a few lines farther up. Why do we not want to do an lvalue-to-rvalue conversion when the result of the statement expression is an rvalue? That conversion would strip all the qualifiers, including the atomic ones, which seems like exactly the behavior we want.

That comment comes from 34376a68

CC @rjmccall in case you have some recollection here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's because we're setting up for a PerformCopyInitialization below, so the l-value conversion would be redundant (or possibly slightly wrong for some C++ class types?). Certainly could be clearer in the comment.

As far as this patch goes, my longstanding thought is that _Atomic is clearly a qualifier. I know the C standard doesn't agree, but the C standard is wrong, the same way that it's wrong when it says that function references aren't l-values; sometimes the formal system has an obvious truth that doesn't align with the glossary. In this case, the language treats _Atomic as a qualifier for almost every purpose, and the compiler generally reflects that: most of the places that uniformly look through qualifiers also look through _Atomic, and most of the places that need to look specifically for _Atomic also need to look for certain qualifiers.

So my suggestion is that common functions like getUnqualifiedType() should just be looking through _Atomic, which would have fixed this bug before it happened. If there's a need for a narrower API that only looks through non-_Atomic qualifiers (or perhaps some subset of "non-ABI-affecting" qualifiers?), we can add that with a nice name that suggests when it should be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my suggestion is that common functions like getUnqualifiedType() should just be looking through _Atomic, which would have fixed this bug before it happened.

While I understand the reasoning, I am wary of touching a method that is used (roughly counted) over 400 times. Unfortunately, I do not currently have the bandwith to commit into carefully combing all of them.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as this patch goes, my longstanding thought is that _Atomic is clearly a qualifier.

Errrr, I don't agree. I think _Atomic is a type specifier. _Atomic int and int are not the same type (heck, they can even have different sizes and alignments!). It's somewhat like a qualifier in that it impacts how you access the underlying object (so in that regard, it is a bit like volatile int and int), but I think the fundamental thing is that you rarely want int semantics directly when the type is spelled _Atomic int.

So my suggestion is that common functions like getUnqualifiedType() should just be looking through _Atomic, which would have fixed this bug before it happened. If there's a need for a narrower API that only looks through non-_Atomic qualifiers (or perhaps some subset of "non-ABI-affecting" qualifiers?), we can add that with a nice name that suggests when it should be used.

On the one hand, this specific bug has bitten us so many times I want to agree with your proposed approach.

On the other hand, that's going to be a massive undertaking that's pretty risky. We have times when we definitely do not want the qualified type to drop the atomic qualifier. For example, when merging function types, we call getUnqualifiedType() to determine whether two parameters are compatible. void func(const int i); and void func(int i); are compatible redeclarations, while void func(_Atomic int i); and void func(int i); are not. There are other times when we do want the atomic qualifier dropped. It's a frustrating mess and I'm certain we have numerous bugs in this area.

So the question to me becomes: do we fix this narrow issue and kick the can down the road? Or do we try to find someone willing to undergo the more significant cleanup?

Copy link
Contributor

@rjmccall rjmccall Feb 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as this patch goes, my longstanding thought is that _Atomic is clearly a qualifier.

Errrr, I don't agree. I think _Atomic is a type specifier. _Atomic int and int are not the same type

const int and int are also not the same type.

(heck, they can even have different sizes and alignments!).

Size and alignment is the (back-justified) reason we draw the line where we currently do. It is not a particularly sensible way to do it. Most of our existing qualifiers affect code generation and ABI, just in ways that happen to not affect size and alignment. Several qualifiers affect the ABI rules for containing aggregates, for example.

It's somewhat like a qualifier in that it impacts how you access the underlying object (so in that regard, it is a bit like volatile int and int), but I think the fundamental thing is that you rarely want int semantics directly when the type is spelled _Atomic int.

I'm not quite sure what you're saying here. On the implementation level, the compiler can never ignore qualifiers when performing an access to an object; the whole point of qualifiers is that they affect the semantics of accesses. On the language design level, I agree with you that programmers should never perform normal accesses to _Atomic objects instead of using the intrinsics. However, that is an argument that it was an unfortunate language design decision that _Atomic behaves like a qualifier. In the language that we've actually got, _Atomic has most of the characteristics of qualifiers, and there is no benefit to the compiler implementation in pretending otherwise.

The common characteristics of qualifiers:

  • They convey an aspect of how a value is stored or accessed other than the abstract value itself.
  • They are typically disallowed or silently dropped at the top level in type contexts that primarily describe an abstract value rather than storage, such as casts, return values, and parameters (when determining the function type).
  • R-value expressions represent an abstract value and are (for the most part) unqualified at the top level. Using a qualified l-value in a context that requires an r-value drops all qualifiers as part of the conversion to an r-value. The qualifiers typically affect the code-generation of the load.
  • Assignment consumes an abstract value and therefore expects an r-value on the RHS, which (per above) is of unqualified type. The qualifier typically affects code-generation of the store.
  • The qualifier (if it can apply to structs and unions at all) propagates along member accesses, such that the type of x.member has all the top-level qualifiers of both x and the member.

Most of these apply to _Atomic. The most notable exception is the last, which C makes UB. But it is not uncommon for qualifiers to have tweaks to various of these rules, like how C++ allows qualified return values of class type for some reason.

The subtyping rules for CV qualifiers — the ability to e.g. convert an int** to const int * const * — do not generally work for other qualifiers. For example, even just considering semi-standardized qualifiers, you clearly cannot add an address space in a nested position this way. I believe you have to understand this as a useful special case for the CV qualifiers. (It really should not work for restrict, although I can't remember what rule we actually implement.)

So my suggestion is that common functions like getUnqualifiedType() should just be looking through _Atomic, which would have fixed this bug before it happened. If there's a need for a narrower API that only looks through non-_Atomic qualifiers (or perhaps some subset of "non-ABI-affecting" qualifiers?), we can add that with a nice name that suggests when it should be used.

On the one hand, this specific bug has bitten us so many times I want to agree with your proposed approach.

On the other hand, that's going to be a massive undertaking that's pretty risky.

Yeah, I don't disagree with this, and I'm not trying to force the problem to be solved immediately. I would like to try to get consensus on the right thing to do, though.

We have times when we definitely do not want the qualified type to drop the atomic qualifier. For example, when merging function types, we call getUnqualifiedType() to determine whether two parameters are compatible. void func(const int i); and void func(int i); are compatible redeclarations, while void func(_Atomic int i); and void func(int i); are not. There are other times when we do want the atomic qualifier dropped. It's a frustrating mess and I'm certain we have numerous bugs in this area.

In this case, I have to wonder if the standard is in the same place Clang is — if this was really an intentional allowance for _Atomic T as a parameter in function types or just an oversight. It's hard to imagine what an atomic argument could possibly mean. Perhaps it was an oversight that they no longer feel comfortable fixing.

So the question to me becomes: do we fix this narrow issue and kick the can down the road? Or do we try to find someone willing to undergo the more significant cleanup?

If we agree that the compiler probably ought to be defaulting to treating atomic types as qualifiers, I think the path is:

  1. Introduce a function that looks through _Atomic as well as the other qualifiers.
  2. Make a best effort to switch existing call sites of getUnqualifiedType() that immediately also look through _Atomic to instead use the common function. Probably ought to be the same patch as (1).
  3. Rework this patch to just use the new function.
  4. Introduce a function that explicitly does not look through _Atomic.
  5. Gradually go through the remaining getUnqualifiedType() uses and figure out which of the explicit functions they should use, adding tests as necessary.
  6. When that's done, decide if we've got compelling evidence that getUnqualifiedType() should just default to looking through _Atomic.

I think asking Alejandro to do steps 1-3 as part of this bug fix is reasonable, and then the rest can be an open project.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as this patch goes, my longstanding thought is that _Atomic is clearly a qualifier.

Errrr, I don't agree. I think _Atomic is a type specifier. _Atomic int and int are not the same type

const int and int are also not the same type.

Maybe I'm being imprecise, sorry for that. const int and int are both "morally" of type int in that the allowed accesses to either type will give the same code generation and behavior. There's no difference between read accesses to int and const int: https://godbolt.org/z/GzhKWr8W6

That is mostly a special case of const. Even volatile requires code-gen differences, at least when generating a high-level IR subject to optimization.

(heck, they can even have different sizes and alignments!).

Size and alignment is the (back-justified) reason we draw the line where we currently do. It is not a particularly sensible way to do it. Most of our existing qualifiers affect code generation and ABI, just in ways that happen to not affect size and alignment. Several qualifiers affect the ABI rules for containing aggregates, for example.

Do you have an example of impacting the ABI rules in C?

If you mean qualifiers officially blessed in any way by WG14, I believe that's just const, volatile, restrict, and the address-space qualifiers from TR 18037. Address space qualifiers do affect ABI, but they don't specifically affect struct layout because they are not allowed on struct members.

Loading an int to get its value is a different operation than loading an atomic int to get its value; the latter may require entirely different instruction selection. Probably int is a bad example, perhaps struct values makes the distinction more clear?

Needing different instruction selection is common for extended qualifiers. Even among the official qualifiers, address spaces typically require different instruction selection.

Another notable exception is that _Atomic is not silently dropped at the top level in type for return values and parameters when determining the function type, as mentioned above.

It actually is dropped for return types; it's just parameters that are different.

In this case, I have to wonder if the standard is in the same place Clang is — if this was really an intentional allowance for _Atomic T as a parameter in function types or just an oversight. It's hard to imagine what an atomic argument could possibly mean. Perhaps it was an oversight that they no longer feel comfortable fixing.

Doesn't it matter in the same way it matters for const parameters: it impacts the accesses to the parameter within the function body itself?

Sure, but it is not in any way necessary for this to be part of the function type, just as it isn't part of the function type for const. The parameter is an independent object and has no relation to anything in the caller, so making the atomic-ness part of the signature is pointless.

If we agree that the compiler probably ought to be defaulting to treating atomic types as qualifiers, I think the path is:

1. Introduce a function that looks through `_Atomic` as well as the other qualifiers.

2. Make a best effort to switch existing call sites of `getUnqualifiedType()` that immediately also look through `_Atomic` to instead use the common function. Probably ought to be the same patch as (1).

I think we've already done this though. We have getAtomicUnqualifiedType() which we use when we want to get a non-atomic, unqualified type.

3. Rework this patch to just use the new function.

4. Introduce a function that explicitly does not look through `_Atomic`.

Which is getUnqualifiedType() currently.

Right, it would have the same behavior as getUnqualifiedType() (and presumably would be implemented as a call to it). Making it a separate function lets you keep track of the fact that you considered a caller and deliberately did not want it to look through _Atomic, though.

I think asking Alejandro to do steps 1-3 as part of this bug fix is reasonable, and then the rest can be an open project.

So I think what I'm getting is: we should audit all existing calls to getUnqualifiedType() and getAtomicUnqualifiedType() to determine whether we're correctly handling atomics or not, and if there are more uses of getAtomicUnqualifiedType() after correcting issues, then maybe we want getUnqualifiedType() to look through atomics and rename getAtomicUnqualifiedType() to something else?

Right, with the tweak that the audit always switches away from getUnqualifiedType(): each call either becomes getAtomicUnqualifiedType() (I forgot that this already exists) or getUnqualifiedTypePreservingAtomic(), and then the audit is done when there are no remaining calls to getUnqualifiedType(). At that point, we decide if having a getUnqualifiedType() is a good idea or if we should just force all callers to explicitly choose.

And as mentioned, there's a short-term subset of the plan that unblocks this fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And as mentioned, there's a short-term subset of the plan that unblocks this fix.

Thanks for the feedback and for the enlightening discussion. I will go in this direction, but it will take me a while, since I have quite a few things on the backlog.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, with the tweak that the audit always switches away from getUnqualifiedType(): each call either becomes getAtomicUnqualifiedType() (I forgot that this already exists) or getUnqualifiedTypePreservingAtomic(), and then the audit is done when there are no remaining calls to getUnqualifiedType(). At that point, we decide if having a getUnqualifiedType() is a good idea or if we should just force all callers to explicitly choose.

Okay, this is a plan I can get behind -- thank you for the good discussion!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And as mentioned, there's a short-term subset of the plan that unblocks this fix.

Thanks for the feedback and for the enlightening discussion. I will go in this direction, but it will take me a while, since I have quite a few things on the backlog.

Before you get too far into anything else, could you try just changing the call to getUnqualifiedType() to getAtomicUnqualifiedType() in the original code and seeing if that fixes the problem? That should be fine for the local, short-term fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before you get too far into anything else, could you try just changing the call to getUnqualifiedType() to getAtomicUnqualifiedType() in the original code and seeing if that fixes the problem? That should be fine for the local, short-term fix.

It fixes it, yes, thanks!

(Sorry for the force-push, I saw some seemingly unrelated tests were broken and I wanted to make sure it wasn't due to this, after un-merging main all tests are green)

clang/docs/ReleaseNotes.rst Outdated Show resolved Hide resolved

typedef _Atomic char atomic_char;

typedef _Atomic char atomic_char;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably the redundant typedef is not a critical part of this test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't, I have removed the duplicated line.

Copy link
Contributor

@rjmccall rjmccall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM.

@alejandro-alvarez-sonarsource
Copy link
Contributor Author

Thanks! Do you mind doing the merge, since I can't?

@cor3ntin cor3ntin merged commit 252c83b into llvm:main Feb 18, 2025
9 checks passed
wldfngrs pushed a commit to wldfngrs/llvm-project that referenced this pull request Feb 19, 2025
…ic type (llvm#119711)

When a statement expression's last statement is an atomic variable, GCC
and Clang disagree on the type of the expression. This can be made
apparent using `typeof` and forcing a diagnostic message:

```cpp
_Atomic int a = 0;
typeof(({a;})) x = "0";
```

* GCC complains about initializing `int` with `char*`
* Clang complains about initializing `_Atomic(int)` with a `char[2]`

Due to the type of the statement expression being deduced to be atomic,
we end with three implicit casts inside the `StmtExpr` on the AST:

* `LValueToRValue` -> `AtomicToNonAtomic`  ->  `NonAtomicToAtomic`

In some situations, this can end on an assertion inside
`IntExprEvaluator`, as reported in llvm#106576.

With this patch, we now have two implicit casts, since the type of the
statement expression is deduced to be non-atomic:

* `LValueToRValue` -> `AtomicToNonAtomic`

This is consistent with the C standard (6.7.2.4, p4)

> The properties associated with atomic types are meaningful only for
expressions that are lvalues.

But a statement expression is an rvalue.

`IntExprEvaluator` assumptions are now satisfied and there is no
assertion error.
Additionally, the `typeof` trick mentioned above shows that the type is
consistently deduced between GCC and Clang.

Fixes llvm#106576

---------

Co-authored-by: John McCall <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash on code with StmtExpr and atomic char load in Expr::EvaluateAsRValue
6 participants