Commit 8d7ffb4

idavis and orpuente-MS authored
Reimplement Parser (#2149)
## An intro to the PR with some useful links for the relevant sections:

This PR has two major components: a custom QASM3 parser, which replaces the IBM parser, and a refactored version of the compiler.

The custom QASM3 parser has three components:
1. The raw lexer.
2. The cooked lexer.
3. Parser.

The refactored compiler has two components:
4. Lowerer.
5. Compiler.

Below is a more detailed overview of the purpose of each component:

The OpenQASM 3.0 grammar will be useful while reviewing sections 1-2 https://openqasm.com/grammar/index.html

1. The raw lexer: Takes the source code as input and returns a stream of "raw tokens". You can think of these mostly as characters like '(' and type literals.
2. The cooked lexer: Takes the raw token stream and returns a stream of cooked tokens. Cooked tokens have more knowledge about QASM3 syntax, and they make parsing easier.
3. Parser: Takes the stream of cooked tokens as input and returns an AST. We call it the syntax AST, to differentiate it from the semantic AST, which will be introduced later. The grammar and specification of QASM3 don't fully agree. We had to make some decisions while implementing the parser. They are documented in the Rust code. If you want to double check anything, the language spec will come in handy: https://openqasm.com/language/index.html
4. Lowerer: Takes the AST returned by the parser and performs semantic analysis on it. This is where we report any QASM3 related errors, like "functions can only be defined on the global scope." The lowerer returns a semantic AST. You will want to use the language spec to verify that the errors that we are checking for are actually in the spec, and that we are not missing any checks. https://openqasm.com/language/index.html
5. Compiler: Takes the semantic AST and compiles it to a Q# AST. This is where we report any Q# related errors, like unsupported QASM3 features because they don't make sense in Q#.
The compiler is a very straightforward mapping from the semantic AST to the Q# AST, since most of the heavy lifting is done during lowering.

## Recommended Sections for reviewers:

- qsc_qasm3/src/lib.rs (Entry point and general structure). Reviewers: Mine, Stefan
- qsc_qasm3/src/parser/completion.rs (Completion), qsc_qasm3/src/io.rs (IO). Reviewers: Mine
- qsc_qasm3/src/lex.rs (Raw and Cooked lexer). Reviewers: Scott
- qsc_qasm3/src/parser.rs (Parser), qsc_qasm3/src/parser/ast.rs (QASM3 AST, double check against grammar & Spec). Reviewers: Scott
- qsc_qasm3/src/oqasm_helpers.rs, qsc_qasm3/src/semantic/types.rs (QASM3 types), qsc_qasm3/src/semantic.rs (Lowerer entry point), qsc_qasm3/src/lowerer.rs. This requires QASM3 specific knowledge. We can use as many eyes as possible. Reviewers: Dmitry, Scott
- qsc_qasm3/src/stdlib.rs, qsc_qasm3/src/stdlib/angle.rs, qsc_qasm3/src/stdlib/QasmStdrs. Reviewers: Dmitry
- qsc_qasm3/src/types.rs (Types), qsc_qasm3/src/ast_builder.rs, qsc_qasm3/src/compiler.rs (Compiler), qsc_qasm3/src/runtime.rs (Runtime Features). Reviewers: Stefan
- fuzz/ (Testing). Reviewers: Mine, Stefan
- pip/src/ (Interop), qsc/src/, qsc_codegen/src/. Reviewers: Mine, Stefan
- .github/fuzz (GitHub pipeline). Reviewers: Ian, Stefan

# Tracking items

## Lexing

- [x] Basic raw tokens
- [x] Basic cooked tokens
- [x] Cook pragma and annotation

## Parsing

- [x] Version #2148
- [x] Include #2148
- [x] pragma #2148
- [x] annotation #2148
- [x] Statements
- [x] alias #2191
- [x] array concatenation #2191
- [x] assignment #2191
- [x] barrier #2200
- [x] box #2200
- [x] break #2191
- [x] cal #2200
- [x] calibration grammar #2200
- [x] classical decl
- [x] Bit, int, uint, float, angle, bool, duration, stretch #2165
- [x] Arrays #2180
- [x] Measurement result #2180
- [x] const decls #2165
- [x] continue #2191
- [x] def #2176
- [x] defcal #2200
- [x] delay #2200
- [x] end #2191
- [x] expression statement #2191
- [x] extern #2176
- [x] for loops #2191
- [x] gate call #2200
- [x] gate statement #2176
- [x] gphase #2200
- [x] if statement #2191
- [x] IO decls #2165
- [x] measure arrow #2200
- [x] old style decl #2176
- [x] quantum decl #2160
- [x] reset #2200
- [x] return #2176
- [x] switch statements #2178
- [x] while loops #2191
- [x] gate def #2176
- [x] Precedence #2166
- [x] Expressions
- [x] ident #2148
- [x] unary #2166
- [x] binary #2166
- [x] literal #2160
- [x] timing literal #2200
- [x] function call #2166
- [x] cast #2166
- [x] assignment #2216
- [x] assignment op #2216
- [x] Check for cyclic includes #2278
- [x] Document parsing functions with grammar defs and spec notes #2222
- [x] Remove measurement exprs from const decl parsing. Update notes on the grammar docs as this is a spec bug. #2272

## Lowering

- [x] Constant Evaluation for array sizes and type widths #2246
- [x] Statements
- [x] alias #2221
- [x] assignment
- [x] simple #2221
- [x] indexed #2221
- [x] barrier #2246
- [x] box (unimplemented)
- [x] break #2272
- [x] cal (unimplemented)
- [x] calibration grammar (unimplemented)
- [x] classical decls with default and literal initializers
- [x] Bit, int, uint, float, angle, bool, duration, stretch #2221
- [x] creg #2246
- [x] qreg #2246
- [x] Bitarrays #2246
- [x] Arrays (unimplemented)
- [x] Measurement result #2246
- [x] classical decls with all expressions and casting #2214
- [x] Bit, int, uint, float, angle, bool, duration, stretch #2221
- [x] creg #2246
- [x] tests see bit[] tests for cases #2246
- [x] qreg #2246
- [x] tests - see qubit[] tests for cases #2246
- [x] Bitarrays #2246
- [x] Arrays (unimplemented)
- [x] Measurement result #2246
- [x] continue #2272
- [x] def #2254
- [x] defcal (unimplemented)
- [x] delay (unimplemented)
- [x] end #2271
- [x] expression statement #2221
- [x] extern #2274
- [x] for loops #2232
- [x] gate call #2246
- [x] broadcast (unimplemented)
- [x] gphase #2254
- [x] if statement #2232
- [x] IO decls
- [x] measure arrow #2246
- [x] old style decl #2246
- [x] quantum decl #2246
- [x] reset #2246
- [x] return #2254
- [x] add cast of return value to containing function return ty #2272
- [x] switch statements #2232
- [x] while loops #2232
- [x] gate def #2254
- [x] Return semantic error on anonymous blocks and switch statements if version is set to 3.0 as they were introduced in 3.1 #2232
- [x] Expressions
- [x] binary #2239
- [x] unary #2239
- [x] function call #2254
- [x] literals
- [x] bit, bool, float, int, uint #2221
- [x] angle #2267
- [x] complex #2268
- [x] duration #2272
- [x] cast (unimplemented)
- [x] Review ignored tests from old parser #2272
- [x] SimulatableIntrinsic support #2271
- [x] Error when not applied to a gate or def #2271

## Compiling

- [x] Only set `EntryPoint()` attr on operation when compiling in file mode
- [x] Statements
- [x] alias (unsupported) #2239
- [x] array concatenation
- [x] assignment #2239
- [x] barrier #2246
- [x] box (unimplemented)
- [x] pragma impls (unimplemented)
- [x] break (unsupported) #2272
- [x] cal (unimplemented)
- [x] calibration grammar (unimplemented)
- [x] classical decls #2239
- [x] Bit, int, float, angle, bool, duration, stretch #2239
- [x] creg #2246
- [x] qreg #2246
- [x] Arrays (unimplemented)
- [x] Measurement result #2246
- [x] continue (unsupported) #2272
- [x] def #2254
- [x] simulatable intrinsic #2271
- [x] end #2271
- [x] expression statement #2239
- [x] extern (unimplemented)
- [x] for loops #2254
- [x] gate call #2246
- [x] gphase #2254
- [x] if statement #2254
- [x] IO decls #2239
- [x] measure arrow #2246
- [x] old style decl #2246
- [x] quantum decl #2246
- [x] reset #2246
- [x] return #2254
- [x] switch statements #2254
- [x] while loops #2239
- [x] gate def #2254
- [x] simulatable intrinsic #2272
- [x] gate scopes capture const variables in the outside scopes. We compiled gates to lambda operations to get this behavior. But this breaks simulatable intrinsics. Now that we have const evaluation, we can overcome this. #2270
- [x] delay (unimplemented)
- [x] barrier #2246
- [x] Expressions
- [x] binary #2239
- [x] unary #2239
- [x] function call #2254
- [x] literals
- [x] bit, bool, float, int, uint #2221
- [x] angle #2267
- [x] complex #2268
- [x] duration (unsupported) #2239
- [x] cast (unimplemented)
- [x] Move runtime functions in new dependency package so that they are parsed as a unit and imported by user code. This enables spans and a file name for runtime calls that get generated.

## Fit and Finish

- [x] Move Q# related semantic errors from Lowerer to Compiler
- [x] Add better error messages to const evaluator. We can pass `&mut Lowerer` as an argument so that it can push semantic errors. #2277
- [x] Clean up TODO items #2285
- [x] Review compiler error messages #2286
- [x] Review parser error messages #2286
- [x] Review lexer error messages #2286
- [x] Delete old compiler internals that are no longer used and remove stale refs #2284
- [x] Benchmark parser #2288
- [ ] Remove dead_code

## Postponed

- [ ] Update ast_builder calls with module names instead of old ns names
- [ ] `bit x = 0; bit y = ~x;` is valid QASM3, but it gets compiled to `let x = Zero; let y = ~~~x;` which is invalid Q# code, since the `Result` type in Q# doesn't support unary bitwise negation. The same applies to all other operations that bit should support; Q# `Result` only supports equality.
- [ ] We are currently not enforcing that qasm3 `uint` types must be non-negative, since they are compiled to Q# `Int` which are signed. We might need a `UInt` type similar to the `Angle` type introduced in #2267 to be able to enforce this constraint.
- [ ] uint 63 bitness with 64 bigint. Separate ops for uint?
- [ ] Lower `Ident` as `Rc<Symbol>` and not as `SymbolId` to avoid an unnecessary `SymbolTable` lookup? In that way we don't even need to pass the symbol table to the compiler.
- [ ] Use the formal parameters' `SymbolId`s when creating the function type, so that we can give the user a better error message when they pass an argument that fails implicit casting to a function. There is a catch: we need to insert the function symbol into the symbol table before pushing the scope where the formal parameters live, but we need the `SymbolId`s of the formal parameters to construct the function symbol.
- [ ] Create qubit cleanup calls which can be conditionally added to end of program during compile, not done for fragments #2281
- [ ] Every time we allow an arithmetic lint in `compiler/qsc_qasm3/src/semantic/ast/const_eval.rs` we need to issue the same lint in Q#.
- [ ] Double check cast from `int` to `uint` and cast from `uint` to `int` in const evaluator. Both types are represented as `i64` so there is nothing to do, which is confusing.
- [ ] Profile the usage of `Box<[Box<T>]>` with iai-callgrind in a large OpenQASM3 sample to verify that it is actually faster than using `Vec<T>`. Even though `Box<T>` uses less stack space, it reduces cache locality, because now you need to jump around in memory to read contiguous elements of a list. I suspect that what we really want to use is `Box<[T]>`.
- [ ] Input decls are pushed to the symbol table, but should not be in the stmts list. This may be an issue for tooling as there isn't a way to have a forward declared variable in Q#. `compiler/qsc_qasm3/src/compiler.rs::QasmCompiler::compile_output_decl_stmt`.
- [ ] See the comment on `compiler/qsc_qasm3/src/compiler.rs::QasmCompiler::create_entry_item` saying "This can create a collision on multiple compiles when interactive. We also have issues with the new entry point inference logic."
- [ ] Minimize number of errors reported in the compiler. There are many situations where we report duplicated or unnecessary extra errors that could make it harder for the user to fix the actual problem. For example, see the test named `fuzzer_issue_2294`. The one valuable error there is the one saying "undefined symbol _".

---------

Co-authored-by: orpuente-MS <[email protected]>
Co-authored-by: Oscar Puente <[email protected]>
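
The raw-lexer and cooked-lexer split described in the intro can be sketched roughly as below. This is only an illustration of the two-stage idea, not the code in `qsc_qasm3/src/lex.rs`: the names `RawToken`, `CookedToken`, `KEYWORDS`, `raw_lex`, and `cook` are hypothetical, and the keyword list is a tiny made-up subset of QASM3.

```rust
/// Raw tokens stay close to the characters; they know nothing about QASM3 keywords.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, PartialEq)]
enum RawToken {
    Ident(String),
    Number(String),
    Punct(char),
}

/// Cooked tokens carry more syntax knowledge, which makes parsing easier.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, PartialEq)]
enum CookedToken {
    Keyword(String),
    Ident(String),
    IntLiteral(i64),
    Open(char),
    Close(char),
    Semicolon,
    Other(char),
}

/// Illustrative subset of keywords; the real lexer handles far more.
const KEYWORDS: [&str; 3] = ["qubit", "bit", "measure"];

/// Stage 1: split the source into raw tokens, skipping whitespace.
fn raw_lex(src: &str) -> Vec<RawToken> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_ascii_alphabetic() || c == '_' {
            let mut word = String::new();
            while let Some(&d) = chars.peek() {
                if !(d.is_ascii_alphanumeric() || d == '_') {
                    break;
                }
                word.push(d);
                chars.next();
            }
            tokens.push(RawToken::Ident(word));
        } else if c.is_ascii_digit() {
            let mut digits = String::new();
            while let Some(&d) = chars.peek() {
                if !d.is_ascii_digit() {
                    break;
                }
                digits.push(d);
                chars.next();
            }
            tokens.push(RawToken::Number(digits));
        } else {
            tokens.push(RawToken::Punct(c));
            chars.next();
        }
    }
    tokens
}

/// Stage 2: classify raw tokens using knowledge of the language.
/// Lexical errors (here: an integer that does not fit in i64) surface at this stage.
fn cook(raw: Vec<RawToken>) -> Result<Vec<CookedToken>, String> {
    raw.into_iter()
        .map(|tok| match tok {
            RawToken::Ident(s) if KEYWORDS.contains(&s.as_str()) => Ok(CookedToken::Keyword(s)),
            RawToken::Ident(s) => Ok(CookedToken::Ident(s)),
            RawToken::Number(s) => s
                .parse::<i64>()
                .map(CookedToken::IntLiteral)
                .map_err(|e| format!("bad integer literal {s}: {e}")),
            RawToken::Punct(';') => Ok(CookedToken::Semicolon),
            RawToken::Punct(c @ ('(' | '[' | '{')) => Ok(CookedToken::Open(c)),
            RawToken::Punct(c @ (')' | ']' | '}')) => Ok(CookedToken::Close(c)),
            RawToken::Punct(c) => Ok(CookedToken::Other(c)),
        })
        .collect()
}
```

Under these assumptions, `cook(raw_lex("qubit q;"))` yields a keyword, an identifier, and a semicolon token, which a parser could then consume without re-inspecting characters.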
1 parent 4dc9e93 commit 8d7ffb4

243 files changed, +54215 -10395 lines changed

.github/workflows/fuzz.yml

+21 -17
@@ -7,7 +7,6 @@ env:
   TMIN_LOG_FNAME: fuzz.tmin.log # File name to redirect the fuzzing input minimization log to.
   GH_ISSUE_TEMPLATE_RFPATH: .github/ISSUE_TEMPLATE/fuzz_bug_report.md
   # GitHub issue template rel file path.
-  TARGET_NAME: compile # Fuzzing target name. Fuzzes the `compile` func of the Q# compiler.
   ARTIFACTS_RDPATH: fuzz/artifacts # Fuzzing artifacts rel dir path.
   SEEDS_RDPATH: fuzz/seed_inputs # Fuzzing seed inputs rel dir path.
   SEEDS_FNAME: list.txt # Fuzzing seed inputs list file name.
@@ -31,11 +30,15 @@ jobs:
   fuzz:
     name: Fuzzing
     strategy:
+      fail-fast: false
       matrix:
         os: [ubuntu-latest] # Fuzzing is not supported on Win. The macos is temporarily removed
         # because of low availability.
-    runs-on: ${{ matrix.os }}
+        target_name: [qsharp, qasm]

+    runs-on: ${{ matrix.os }}
+    permissions:
+      issues: write
     steps:
       - name: Install and Configure Tools
         run: |
@@ -49,29 +52,30 @@ jobs:
           submodules: "true"

       - name: Gather the Seed Inputs
+        if: matrix.target_name == 'qsharp'
         run: |
           cd $OWNER_RDPATH # Enter the dir containing the fuzzing infra.

           # Clone the submodules of QDK:
           REPOS="Quantum Quantum-NC QuantumKatas QuantumLibraries iqsharp qdk-python qsharp-compiler qsharp-runtime"
           for REPO in $REPOS ; do
             git clone --depth 1 --single-branch --no-tags --recurse-submodules --shallow-submodules --jobs 4 \
-              https://github.com/microsoft/$REPO.git $SEEDS_RDPATH/$TARGET_NAME/$REPO
+              https://github.com/microsoft/$REPO.git $SEEDS_RDPATH/${{ matrix.target_name }}/$REPO
           done

           # Build a comma-separated list of all the .qs files in $SEEDS_FNAME file:
-          find $SEEDS_RDPATH/$TARGET_NAME -name "*.qs" | tr "\n" "," > \
-            $SEEDS_RDPATH/$TARGET_NAME/$SEEDS_FNAME
+          find $SEEDS_RDPATH/${{ matrix.target_name }} -name "*.qs" | tr "\n" "," > \
+            $SEEDS_RDPATH/${{ matrix.target_name }}/$SEEDS_FNAME

       - name: Build and Run the Fuzz Target
         run: |
           cd $OWNER_RDPATH # Enter the dir containing the fuzzing infra.
-          cargo fuzz build --release --sanitizer=none --features do_fuzz $TARGET_NAME # Build the fuzz target.
+          cargo fuzz build --release --sanitizer=none --features do_fuzz ${{ matrix.target_name }} # Build the fuzz target.

           # Run fuzzing for specified number of seconds and redirect the `stderr` to a file
           # whose name is specified by the STDERR_LOG_FNAME env var:
-          RUST_BACKTRACE=1 cargo fuzz run --release --sanitizer=none --features do_fuzz $TARGET_NAME -- \
-            -seed_inputs=@$SEEDS_RDPATH/$TARGET_NAME/$SEEDS_FNAME \
+          RUST_BACKTRACE=1 cargo fuzz run --release --sanitizer=none --features do_fuzz ${{ matrix.target_name }} -- \
+            -seed_inputs=@$SEEDS_RDPATH/${{ matrix.target_name }}/$SEEDS_FNAME \
             -max_total_time=$DURATION_SEC \
             -rss_limit_mb=4096 \
             -max_len=20000 \
@@ -116,33 +120,33 @@ jobs:
           # the subsequent `run:` and `uses:` steps.

           # Determine the name of a file containing the input of interest (that triggers the panic/crash):
-          if [ -e $ARTIFACTS_RDPATH/$TARGET_NAME/crash-* ]; then # Panic and Stack Overflow Cases.
+          if [ -e $ARTIFACTS_RDPATH/${{ matrix.target_name }}/crash-* ]; then # Panic and Stack Overflow Cases.
             TO_MINIMIZE_FNAME=crash-*;
-          elif [ -e $ARTIFACTS_RDPATH/$TARGET_NAME/oom-* ]; then # Out-of-Memory Case.
+          elif [ -e $ARTIFACTS_RDPATH/${{ matrix.target_name }}/oom-* ]; then # Out-of-Memory Case.
             TO_MINIMIZE_FNAME=oom-*;
           else
-            echo -e "File to minimize not found.\nContents of artifacts dir \"$ARTIFACTS_RDPATH/$TARGET_NAME/\":"
-            ls $ARTIFACTS_RDPATH/$TARGET_NAME/
+            echo -e "File to minimize not found.\nContents of artifacts dir \"$ARTIFACTS_RDPATH/${{ matrix.target_name }}/\":"
+            ls $ARTIFACTS_RDPATH/${{ matrix.target_name }}/
           fi

           if [ "$TO_MINIMIZE_FNAME" != "" ]; then
             echo "TO_MINIMIZE_FNAME: $TO_MINIMIZE_FNAME"

             # Minimize the input:
-            ( cargo fuzz tmin --release --sanitizer=none --features do_fuzz -r 10000 $TARGET_NAME $ARTIFACTS_RDPATH/$TARGET_NAME/$TO_MINIMIZE_FNAME 2>&1 ) > \
+            ( cargo fuzz tmin --release --sanitizer=none --features do_fuzz -r 10000 ${{ matrix.target_name }} $ARTIFACTS_RDPATH/${{ matrix.target_name }}/$TO_MINIMIZE_FNAME 2>&1 ) > \
               $TMIN_LOG_FNAME || MINIMIZATION_FAILED=1

             # Get the minimized input relative faile path:
             if [ "$MINIMIZATION_FAILED" == "1" ]; then
               # Minimization failed, get the latest successful minimized input relative faile path:
               MINIMIZED_INPUT_RFPATH=`
                 cat $TMIN_LOG_FNAME | grep "CRASH_MIN: minimizing crash input: " | tail -n 1 |
-                sed "s|^.*\($ARTIFACTS_RDPATH/$TARGET_NAME/[^\']*\).*|\1|"`
+                sed "s|^.*\($ARTIFACTS_RDPATH/${{ matrix.target_name }}/[^\']*\).*|\1|"`
             else
               # Minimization Succeeded, get the reported minimized input relative faile path::
               MINIMIZED_INPUT_RFPATH=`
                 cat $TMIN_LOG_FNAME | grep "failed to minimize beyond" |
-                sed "s|.*\($ARTIFACTS_RDPATH/$TARGET_NAME/[^ ]*\).*|\1|" `
+                sed "s|.*\($ARTIFACTS_RDPATH/${{ matrix.target_name }}/[^ ]*\).*|\1|" `
             fi
             echo "MINIMIZED_INPUT_RFPATH: $MINIMIZED_INPUT_RFPATH"
             echo "MINIMIZED_INPUT_RFPATH=$MINIMIZED_INPUT_RFPATH" >> "$GITHUB_ENV"
@@ -187,8 +191,8 @@ jobs:
           path: |
             ${{ env.OWNER_RDPATH }}/${{ env.STDERR_LOG_FNAME }}
             ${{ env.OWNER_RDPATH }}/${{ env.TMIN_LOG_FNAME }}
-            ${{ env.OWNER_RDPATH }}/${{ env.ARTIFACTS_RDPATH }}/${{ env.TARGET_NAME }}/*
-            ${{ env.OWNER_RDPATH }}/${{ env.SEEDS_RDPATH }}/${{ env.TARGET_NAME }}/${{ env.SEEDS_FNAME }}
+            ${{ env.OWNER_RDPATH }}/${{ env.ARTIFACTS_RDPATH }}/${{ matrix.target_name }}/*
+            ${{ env.OWNER_RDPATH }}/${{ env.SEEDS_RDPATH }}/${{ matrix.target_name }}/${{ env.SEEDS_FNAME }}
           if-no-files-found: error

       - name: "If Fuzzing Failed: Report GutHub Issue"
