Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support parsing Doris sql #31508

Open
10 tasks
iamhucong opened this issue May 31, 2024 · 10 comments
Open
10 tasks

Support parsing Doris sql #31508

iamhucong opened this issue May 31, 2024 · 10 comments

Comments

@iamhucong
Copy link
Contributor

Background

Hi community.
The ShardingSphere SQL parser engine helps users to parse SQL to create the AST (Abstract Syntax Tree) and visit the AST to get SQLStatement (Java Object). Currently, we are planning to enhance the support for Doris SQL parsing in ShardingSphere.

More details:
https://shardingsphere.apache.org/document/current/en/reference/sharding/parse/

Issue Background Explanation

The current issue involves using a custom web scraping script to fetch SQL cases from the official website of a corresponding database. These SQL cases are then passed to ShardingSphere's parsing engine for analysis. For SQL cases that fail to be parsed successfully, every 3 to 5 SQL cases are grouped together as an issue.

  1. Since SQL cases are obtained through web scraping, it cannot be guaranteed that all SQL cases are correct. Please follow the following process to handle this pull request (PR).
  2. Some SQL cases may have already been fixed in other PRs. For cases that can already be executed successfully, simply leave a comment to ignore them.
  3. If a SQL case can be executed successfully without any code changes, there is no need to add a corresponding test assert file.

Task

This issue is to support more Doris sql parsing, as follows:

BITXOR
INSTR
RIGHT
EXTRACT URL PARAMETER
BITSHIFT, BITSHIFTRIGHT
LTRIM
RANK() OVER(partition_by_clause order_by_clause)
WINDOW,FUNCTION,RANK
ALTER RESOURCE 'spark0' PROPERTIES ("working_dir" = "hdfs://127.0.0.1:10000/tmp/doris_new")
ALTER RESOURCE 'remote_s3' PROPERTIES ("s3.connection.maximum" = "100")

Overall Procedure

If you intend to participate in fixing this issue, please feel free to leave a comment below the issue. Community members will assign the issue accordingly.

For example, you can leave a comment like this: "Hi, please assign this issue to me. Thank you!"

Once you have claimed the issue, please review the syntax of the SQL on the official website of the corresponding database. Execute the SQL on the respective database to ensure the correctness of the SQL syntax.

You can check the corresponding source of each SQL case on the official database website by clicking on the link provided below each case.

Next, execute the problematic SQL cases mentioned above in the database (you can quickly start the corresponding database using the Docker image for that database, and then connect to it using
a client you are familiar with), to ensure that the SQL syntax itself is correct.

Fixing ANTLR Grammar Parsing Issue

Once you have confirmed the correctness of the SQL syntax, you can validate and fix the grammar parsing issue in ShardingSphere.

If you are using IntelliJ IDEA, you will need to install the ANTLR plugin before proceeding.

If it is an ANTLR parsing error message, try to repair the .g4 file by comparing it with the official database syntax until the SQL can be correctly parsed by ANTLR.

When there is no error message in the ANTLR Preview window, it means that ANTLR can correctly parse the SQL.

Visitor problem fix

After ANTLR parses SQL into an abstract syntax tree, ShardingSphere will access the abstract syntax tree through Visitor and extract the required information.
If you need to extract Segments, you need to first execute:

mvn -T 2C clean install -DskipTests

Under the shardingsphere-parser module to compile the entire parser module.
Then rewrite the corresponding visit method in SQLStatementVisitorr as needed to extract the corresponding Segment.

Add assertion test file

After the above SQL parsing problem is repaired, the corresponding Test needs to be added.
The steps are as follows:

  1. Add the corresponding sql-case in the sql/supported directory.
  2. Add case assertions in the case directory of the shardingsphere-test-it-parser module.
  3. Run org.apache.shardingsphere.test.it.sql.parser.internal.InternalSQLParserIT
    After SQL Parser IT runs successfully, you can submit a PR.

Relevant Skills

  1. Master JAVA language
  2. Have a basic understanding of Antlr g4 file
  3. Be familiar with Doris SQLs
Copy link

There hasn't been any activity on this issue recently, and in order to prioritize active issues, it will be marked as stale.

@github-actions github-actions bot added the stale label Jun 30, 2024
@terrymanu terrymanu removed the stale label Jul 7, 2024
@danigiri
Copy link
Contributor

danigiri commented Oct 9, 2024

I could look at this (and some of the rest of Doris issues) as part of hacktoberfest

@iamhucong iamhucong modified the milestones: 5.5.1, 5.5.2 Oct 10, 2024
@danigiri
Copy link
Contributor

In the meantime I will look into setting up the environment and getting master to build (it doesn't)

@danigiri
Copy link
Contributor

I can't get it to build in MacOS but in Linux it does build first try. I am trying to copy the generated .class files and this seems to be a workaround.

Feel free to assign this to me, in the meantime I am giving it a try.

@danigiri
Copy link
Contributor

Thank you, at the moment working on a PoC for a single operation to get the hang of it.

@iamhucong
Copy link
Contributor Author

You are welcome. Feel free to ask if you have any questions at all.

@danigiri
Copy link
Contributor

No worries, I have the first PR ready, but I need to try the SQL in Apache Doris to verify it. Doris requires a AVX2 CPU and I do not have that hardware available. It asks to recompile, and after 24h of CPU time using 2 cores it still has not finished :)

CPU%   MEM%  VIRT  RES       PID USER          TIME+ THR  NI S  R/s W/s  
>271   26.9  4.94G 2.06G 1422354 root        44h9:27 20    0 S    ? ?    java 

@danigiri
Copy link
Contributor

First PR ready and managed to compile doris (by increasing heap size)

iamhucong added a commit that referenced this issue Oct 16, 2024
Added BITXOR to Doris parsing support (#31508)
danigiri added a commit to danigiri/shardingsphere that referenced this issue Oct 16, 2024
- related to apache#31508
- includes test
- `mvn spotless:apply -Pcheck` and `mvn test -Dtest=InternalDorisParserIT`
- will update release notes in next commit
- will check if MySQL supports INSTR in next commit
- verified query in Doris
danigiri added a commit to danigiri/shardingsphere that referenced this issue Oct 17, 2024
- related to apache#31508
- includes test
- `mvn spotless:apply -Pcheck` and `mvn test -Dtest=InternalDorisParserIT`
- will update release notes in next commit
- will check if MySQL supports INSTR in next commit
- verified query in Doris
danigiri added a commit to danigiri/shardingsphere that referenced this issue Oct 17, 2024
- related to apache#31508
- includes test
- `mvn spotless:apply -Pcheck` and `mvn test -Dtest=InternalDorisParserIT`
- will update release notes in next commit
- will check if MySQL supports INSTR in next commit
- verified query in Doris
danigiri added a commit to danigiri/shardingsphere that referenced this issue Oct 17, 2024
- related to apache#31508
- includes test
- `mvn spotless:apply -Pcheck` and `mvn test -Dtest=InternalDorisParserIT`
- will update release notes in next commit
- will check if MySQL supports INSTR in next commit
- verified query in Doris
danigiri added a commit to danigiri/shardingsphere that referenced this issue Oct 17, 2024
- related to apache#31508
- includes test
- `mvn spotless:apply -Pcheck` and `mvn test -Dtest=InternalDorisParserIT`
- will update release notes in next commit
- will check if MySQL supports INSTR in next commit
- verified query in Doris
danigiri added a commit to danigiri/shardingsphere that referenced this issue Oct 17, 2024
- related to apache#31508
- includes test
- `mvn spotless:apply -Pcheck` and `mvn test -Dtest=InternalDorisParserIT`
- will update release notes in next commit
- will check if MySQL supports INSTR in next commit
- verified query in Doris
strongduanmu pushed a commit that referenced this issue Oct 20, 2024
* Added support for INSTR to Doris parsing

- related to #31508
- includes test
- `mvn spotless:apply -Pcheck` and `mvn test -Dtest=InternalDorisParserIT`
- will update release notes in next commit
- will check if MySQL supports INSTR in next commit
- verified query in Doris

* Updated release notes to reflect #33258

* Added comment markers for INSTR and BIT_XOR

Small additional changes:
- Also moved the java code to match the order in g4 more.
- Added missing import as well.

* Added release notes

Also added missing `// DORIS ADDED BEGIN|END` marker
strongduanmu pushed a commit that referenced this issue Oct 27, 2024
* Adding parse support for STRRIGHT alias

RIGHT is already implemented and supported by both MySQL and Doris
- Added parsing suppor t for the `STRRIGHT` alias in Doris
- Updated existing tests to reflect that `RIGHT` is also supported by Doris
- Added STRRIGHT tests
- Marked `// DORIS ADDED BEGIN|END` Doris-specific changes
- Added as a `regularFunctionName` which is where `RIGHT` is

* fixed test for STRRIGHT

* updated release notes

* fixed tests (x2)

also removed MySQL as target for the alias

* double checked test results
@danigiri
Copy link
Contributor

danigiri commented Nov 4, 2024

looking at EXTRACT_URL_PARAMETER next

@linghengqian
Copy link
Member

No worries, I have the first PR ready, but I need to try the SQL in Apache Doris to verify it. Doris requires a AVX2 CPU and I do not have that hardware available. It asks to recompile, and after 24h of CPU time using 2 cores it still has not finished :)

strongduanmu pushed a commit that referenced this issue Dec 11, 2024
* added extract_url_parameter as extra

- added test
- added parser g4 in  `extractUrlParameterFunction`
- added visitor logic
- ran spotless
- ran (though will check again once committed and do a full test)
```
cd test/it/parser/
mvn -DskipTests=true generate-sources compile install -q
cd parser/sql/dialect/doris/
mvn test -Dit.test=InternalDorisParserIT
```

 Question
- do we collapse two consecutive changes?
So
```
// DORIS ADDED END
// DORIS ADDED BEGIN
```

would be
```
```

* fixed test index

* fixed all test indexes

* updated release notes
@taojintianxia taojintianxia modified the milestones: 5.5.2, 5.5.3 Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants