Merge pull request #1905 from xlsynth:cdleary/2024-12-23-exhasutiveness

PiperOrigin-RevId: 723651667
google · Feb 6, 2025 · 5ff8c64 · 5ff8c64
2 parents 06bb700 + 8247a72
commit 5ff8c64
Show file tree

Hide file tree

Showing 63 changed files with 2,745 additions and 175 deletions.
diff --git a/.github/workflows/continuous-integration.yml b/.github/workflows/continuous-integration.yml
@@ -44,7 +44,7 @@ jobs:
 
       - name: Bazel Test xls/... (opt)
         run: |
-          bazel test -c opt --noshow_progress --test_output=errors -- //xls/... -//xls/contrib/... -//xls/dev_tools/...
+          bazel test -c opt --noshow_progress --test_output=errors -- //xls/... -//xls/contrib/... -//xls/dev_tools/... //docs_src/... //xls/tests:documentation_test
 
       - name: Bazel Test Including Contrib & DevTools (opt)
         run: |

diff --git a/docs_src/design_docs/dslx_pattern_exhaustiveness.md b/docs_src/design_docs/dslx_pattern_exhaustiveness.md
@@ -0,0 +1,169 @@
+# DSLX Pattern Exhaustiveness
+
+Document contact: [cdleary](https://github.com/cdleary), first written
+2025-02-03
+
+## Overview
+
+DSLX supports pattern matching for a limited pattern grammar/syntax. This
+document describes the approach we take to checking the exhaustiveness of
+pattern matches.
+
+## Pattern Matching
+
+The pattern matching syntax supports the following:
+
+*   `_` -- a wildcard pattern.
+*   `name` -- a name pattern -- this can either bind a name, if it is currently
+    unbound, or can be used to check for equality if it is bound.
+
+    (Aside: This is admittedly a little scary, but it is identical to Rust, and
+    the defined-but-did-not-use warning usually flags any misuse.)
+
+*   `range` -- a range pattern.
+
+*   `colon::reference` -- a colon-ref pattern -- equality to some
+    externally-defined entity.
+
+*   `..` -- usable within a tuple, discards an arbitrary number of elements
+
+*   `()` -- tuples and nesting of tuple patterns within tuple patterns.
+
+Notably there is not support for structure pattern matching or arrays; these are
+desirable to match expressive ability of Rust but are not yet implemented.
+
+## Exhaustiveness Checking
+
+Exhaustiveness checking works by projecting types into a sparse N-dimensional
+array.
+
+At the start of exhaustiveness checking we use the type being matched on to make
+an N-dimensional array with the extent in each dimension being determined by the
+type; e.g. if we're matching on a `u32` we make an interval that represents the
+range `[0, 2^32-1]`.
+
+As we add patterns, we subtract out the space that the pattern covers from the
+overall space.
+
+If we manage to subtract the entire space, then we know that the pattern is
+exhaustive.
+
+If we don't manage to subtract the entire space, then we know that the pattern
+is not exhaustive.
+
+### Monolithic N-dimensional Space
+
+The exhaustiveness checker works by representing the entire space of values that
+can be matched on as a single N-dimensional space. This is conceptually simple
+but it flattens all values being matched on into a single N-dimensional sparse
+array, which means we discard all hierarchy present in the value being matched
+upon. It seems worth noting that this approach is monolithic and flattens the
+entire value space down to leaves, razing the tuple structure of the value being
+matched upon.
+
+### Subtracting from N-dimensional Space
+
+Note that there are two ways you could conceptually approach the idea of
+exhaustiveness -- additive and subtractive. In an additive approach you might
+unify spaces until you arrived at a space that was equal to the full space.
+Instead, our data structure subtracts, as it's easier to distribute subtraction
+over disjoint pieces. For example, consider a 2D rectangle where we subtract out
+the center of the rectangle, leaving just the left hand side and left hand side
+disjointly. We store that in our `NdRegion` as two disjoint contiguous
+`NdIntervals`, and then when we go to subtract out a new pattern (which can be
+represented as a `NdInterval` as a single pattern always occupies contiguous
+space) we can just subtract the new "user added interval" from both pieces of
+the space in "brute force" fashion without worrying about merging or which
+sub-space might be affected. This definitely seems to simplify things. The
+canonical paper on different approaches for this is Luc Maranget's "Warnings for
+pattern matching" -- admittedly I didn't do a lot of research for this
+implementation, though, the N-dimensional interval space was intuitive enough,
+so there may be better approaches we want to try over time.
+
+### Semantically Load-Bearing
+
+To be clear, exhaustiveness checking is semantically load-bearing; i.e. it is
+not just a lint, but a semantic check. Once exhaustive pattern match is proven
+it means we can turn the final arm of a match statement into an "else"
+condition, e.g. for a
+[priority select](https://google.github.io/xls/ir_semantics/#priority_sel) IR
+lowering in the `default` case.
+
+It was discussed among XLS steering folks whether to make this an opt-in
+language feature for purposes of landing, but consensus was that it was useful
+enough to enable by default -- this is one of the top reported pain points for
+DSLX writing -- just with warnings for redundant patterns default-off for now so
+that real-world code bases have time to transition.
+
+## Structure
+
+The code is initially structured as follows:
+
+*   `DeduceMatch` -- the main entry point for pattern matching and
+    exhaustiveness checking from the type system, we call into the
+    `MatchExhaustivenessChecker` for each pattern encountered in the match
+    statement.
+*   `match_exhaustiveness_checker.h` -- the main class for checking
+    exhaustiveness.
+
+    The deduce rule feeds this object pattern by pattern to check whether a
+    pattern has led us to the point of exhaustion. This streaming
+    pattern-at-a-time interface allows us also give a helpful warning when a
+    pattern is fully redundant with previous patterns, or if we have similarly
+    added patterns even though we've passed the point of exhaustion.
+
+    DSLX types and values of particular types are translated into intervals and
+    points at this level to subtract from the `NdRegion` that we maintain to
+    determine exhaustiveness.
+
+*   `nd_region.h` -- an N-dimensional region type `NdRegion` for representing
+    the space we're whittling down, with patterns, towards exhaustiveness. Note
+    that an `NdRegion` is a collection of disjoint `NdInterval`s that we
+    subtract from in "brute force" fashion as user-written patterns are
+    introduced to the space, as described above.
+
+    Each pattern can conceptually be translated into an `NdInterval` as it
+    represents some contiguous space of values in the overall N-dimensional
+    space.
+
+*   `interp_value_interval.h` -- an interval type `InterpValueInterval` for
+    representing the range of DSLX values for a given type; provides some basic
+    facilities for 1D interval arithmetic and queries like `Contains`,
+    `Intersects`, `Covers`, etc.
+
+## Notes on Wrinkles
+
+**Enums** in DSLX are conceptually a set of names that live in some underlying
+bit space representation, and may be sparse within that space. That is, there's
+nothing wrong with making `enum E : u8 { A = u8:5, B = u8:10 }`. The language
+contract is that there can never be an enum value that takes on an
+out-of-defined-namespace value. As a result, we project the enum namespace into
+a dense unsigned bit space and make intervals over that dense space. i.e. for
+the enum `E` above we would require two values to cover the entire space.
+Effectively, we don't care what the underlying bit representation is for the
+purpose of pattern matching exhaustion.
+
+Empty enums (i.e. enums with no defined values in its namespace) are similar to
+**zero-bit values** in that they have no real representable values. These
+bit-space could be defined to be trivially exhaustive, or impossible to match
+on -- we choose the latter for now because it's more convenient for
+implementation, we can call it a one-bit space and any value with this type
+trivially that one value (so we need to have one pattern covering it, but the
+pattern matches by definition).
+
+**Tokens**: Note that there is also a question of tokens which are
+zero-bit-like, but I imagine we don't want to re-bind tokens through a pattern
+match so we can observe its linear dataflow in a given function.
+
+**Arrays**: We don't currently support any interesting pattern syntax for
+arrays, and they can conceptually create large spaces of values in the
+flattening process, so this initial change for exhaustiveness makes them
+disallowed in matched expressions until support can be added more
+comprehensively. It's not for very serious reasons, however, they could be
+flattened in a similar fashion to tuples.
+
+**Zero-Element Ranges** are possible to write in DSLX, and so in building up
+intervals we have to keep a maybe-interval concept until we have resolved fully
+that there are nonzero values in the interval. These wind up being N-dimensional
+intervals with zero volume so they never subtract out any space in the
+exhaustion process.
diff --git a/docs_src/dslx_std.md b/docs_src/dslx_std.md
@@ -636,13 +636,15 @@ enum EnumType : u2 {
 }
 
 fn main(x: EnumType) -> u32 {
-    match x {
-        EnumType::FIRST => u32:0,
-        EnumType::SECOND => u32:1,
+    if x == EnumType::FIRST {
+        u32:0
+    } else if x == EnumType::SECOND {
+        u32:1
+    } else {
         // This should not be reachable.
         // But, if we synthesize hardware, under this condition the function is
         // well-defined to give back zero.
-        _ => fail!("unknown_EnumType", u32:0),
+        fail!("unknown_EnumType", u32:0)
     }
 }
 ```

diff --git a/mkdocs.yml b/mkdocs.yml
@@ -68,6 +68,7 @@ nav:
     - Design Docs:
       - Legalize Multiple Channel Ops Per Channel: 'design_docs/legalize_multiple_channel_ops_per_channel.md'
       - Proc-scoped channels: 'design_docs/proc_scoped_channels.md'
+      - DSLX Pattern Exhaustiveness: 'design_docs/dslx_pattern_exhaustiveness.md'
     - Releasing: 'releasing.md'
   - NoC:
     - Overview: 'noc/xls_noc_readme.md'

diff --git a/xls/dslx/BUILD b/xls/dslx/BUILD
@@ -703,6 +703,7 @@ cc_test(
         ":warning_kind",
         "//xls/common:xls_gunit_main",
         "//xls/common/status:matchers",
+        "@com_google_absl//absl/status:status_matchers",
         "@com_google_googletest//:gtest",
     ],
 )

diff --git a/xls/dslx/bytecode/bytecode_interpreter_test.cc b/xls/dslx/bytecode/bytecode_interpreter_test.cc
@@ -921,8 +921,9 @@ fn main(x: u32) -> u32 {
 
   absl::StatusOr<InterpValue> value =
       Interpret(kProgram, "main", {InterpValue::MakeU32(2)});
-  EXPECT_THAT(value.status(), StatusIs(absl::StatusCode::kInternal,
-                                       HasSubstr("The value was not matched")));
+  EXPECT_THAT(value.status(),
+              StatusIs(absl::StatusCode::kInvalidArgument,
+                       HasSubstr("Match pattern is not exhaustive")));
 }
 
 TEST_F(BytecodeInterpreterTest, RunMatchWithNameRefs) {

diff --git a/xls/dslx/create_import_data.cc b/xls/dslx/create_import_data.cc
@@ -38,14 +38,14 @@ ImportData CreateImportData(
   return import_data;
 }
 
-ImportData CreateImportDataForTest(
-    std::unique_ptr<VirtualizableFilesystem> vfs) {
+ImportData CreateImportDataForTest(std::unique_ptr<VirtualizableFilesystem> vfs,
+                                   WarningKindSet warnings) {
   if (vfs == nullptr) {
     vfs = std::make_unique<RealFilesystem>();
   }
   absl::Span<const std::filesystem::path> additional_search_paths = {};
   ImportData import_data(xls::kDefaultDslxStdlibPath, additional_search_paths,
-                         kDefaultWarningsSet, std::move(vfs));
+                         warnings, std::move(vfs));
   import_data.SetBytecodeCache(std::make_unique<BytecodeCache>());
   return import_data;
 }

diff --git a/xls/dslx/create_import_data.h b/xls/dslx/create_import_data.h
@@ -25,6 +25,7 @@
 #include "xls/dslx/import_data.h"
 #include "xls/dslx/virtualizable_file_system.h"
 #include "xls/dslx/warning_kind.h"
+
 namespace xls::dslx {
 
 // Creates an ImportData with the given stdlib and search paths and assigns a
@@ -37,7 +38,8 @@ ImportData CreateImportData(
 // Creates an ImportData with reasonable defaults (standard path to the stdlib
 // and no additional search paths).
 ImportData CreateImportDataForTest(
-    std::unique_ptr<VirtualizableFilesystem> vfs = nullptr);
+    std::unique_ptr<VirtualizableFilesystem> vfs = nullptr,
+    WarningKindSet warnings = kAllWarningsSet);
 
 std::unique_ptr<ImportData> CreateImportDataPtrForTest();
 

diff --git a/xls/dslx/exhaustiveness/BUILD b/xls/dslx/exhaustiveness/BUILD
@@ -0,0 +1,107 @@
+# Copyright 2025 The XLS Authors
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+package(
+    default_applicable_licenses = ["//:license"],
+    default_visibility = ["//xls:xls_internal"],
+    licenses = ["notice"],  # Apache 2.0
+)
+
+cc_library(
+    name = "interp_value_interval",
+    srcs = ["interp_value_interval.cc"],
+    hdrs = ["interp_value_interval.h"],
+    deps = [
+        "//xls/dslx:interp_value",
+        "@com_google_absl//absl/log",
+        "@com_google_absl//absl/log:check",
+        "@com_google_absl//absl/strings:str_format",
+    ],
+)
+
+cc_test(
+    name = "interp_value_interval_test",
+    srcs = ["interp_value_interval_test.cc"],
+    deps = [
+        ":interp_value_interval",
+        "//xls/common:xls_gunit_main",
+        "//xls/common/status:matchers",
+        "//xls/common/status:status_macros",
+        "@com_google_googletest//:gtest",
+    ],
+)
+
+cc_library(
+    name = "nd_region",
+    srcs = ["nd_region.cc"],
+    hdrs = ["nd_region.h"],
+    deps = [
+        ":interp_value_interval",
+        "//xls/dslx:interp_value",
+        "@com_google_absl//absl/base:core_headers",
+        "@com_google_absl//absl/log",
+        "@com_google_absl//absl/strings",
+        "@com_google_absl//absl/types:span",
+    ],
+)
+
+cc_test(
+    name = "nd_region_test",
+    srcs = ["nd_region_test.cc"],
+    deps = [
+        ":nd_region",
+        "//xls/common:xls_gunit_main",
+        "//xls/common/fuzzing:fuzztest",
+        "//xls/common/status:matchers",
+        "//xls/common/status:status_macros",
+        "@com_google_googletest//:gtest",
+    ],
+)
+
+cc_library(
+    name = "match_exhaustiveness_checker",
+    srcs = ["match_exhaustiveness_checker.cc"],
+    hdrs = ["match_exhaustiveness_checker.h"],
+    deps = [
+        ":interp_value_interval",
+        ":nd_region",
+        "//xls/common:visitor",
+        "//xls/dslx:import_data",
+        "//xls/dslx:interp_value",
+        "//xls/dslx/frontend:ast",
+        "//xls/dslx/frontend:pos",
+        "//xls/dslx/type_system:type",
+        "//xls/dslx/type_system:type_info",
+        "@com_google_absl//absl/log",
+    ],
+)
+
+cc_test(
+    name = "exhaustiveness_match_test",
+    srcs = ["exhaustiveness_match_test.cc"],
+    deps = [
+        ":match_exhaustiveness_checker",
+        "//xls/common:visitor",
+        "//xls/common:xls_gunit_main",
+        "//xls/common/status:matchers",
+        "//xls/dslx:create_import_data",
+        "//xls/dslx:parse_and_typecheck",
+        "//xls/dslx/frontend:ast",
+        "//xls/dslx/type_system:type",
+        "//xls/dslx/type_system:type_info",
+        "@com_google_absl//absl/log",
+        "@com_google_absl//absl/status:status_matchers",
+        "@com_google_googletest//:gtest",
+    ],
+)