Commit eadd43e

[Feat] Add english doc.
1 parent 03e1f5f commit eadd43e

11 files changed: +2788 -15 lines changed

Diff for: README.md

+15-12
Original file line number | Diff line number | Diff line change
@@ -1,9 +1,10 @@
1-
# CodeFuse-Query A Data-Centric Static Code Analysis System
2-
<p align="center">
1+
# CodeFuse-Query: A Data-Centric Static Code Analysis System
2+
<div align="center">
33
<img src="https://github.com/codefuse-ai/MFTCoder/blob/main/assets/github-codefuse-logo-update.jpg" width="50%" />
4-
</p>
4+
</div>
55

6-
<p align="center">
6+
<div align="center">
7+
<p>
78
<a href="https://github.com/codefuse-ai/CodeFuse-Query">
89
<img alt="stars" src="https://img.shields.io/github/stars/codefuse-ai/CodeFuse-Query?style=social" />
910
</a>
@@ -22,9 +23,11 @@
2223
<a href="https://marketplace.visualstudio.com/items?itemName=CodeFuse-Query.codefuse-query-extension">
2324
<img alt="VSCode Plugin" src="https://img.shields.io/visual-studio-marketplace/i/CodeFuse-Query.codefuse-query-extension?style=social&logo=visualstudiocode&logoColor=%23007ACC" />
2425
</a>
25-
</p>
26-
27-
[中文文档](./README_zh.md)
26+
</p>
27+
<p>
28+
[[中文]](README_cn.md) | [**English**]
29+
</p>
30+
</div>
2831

2932
## What is CodeFuse-Query?
3033
In the domain of large-scale software development, the demands for dynamic and multifaceted static code analysis exceed the capabilities of traditional tools. To bridge this gap, we present CodeFuse-Query, a system that redefines static code analysis through the fusion of Domain Optimized System Design and Logic Oriented Computation Design.
@@ -66,12 +69,12 @@ Note: The maturity level of the language status is determined based on the types
6669
[Installation, Configuration, and Running](./doc/3_install_and_run.md)
6770

6871
## Documentation
69-
- [Abstract](./doc/1_abstract.md)
70-
- [Introduction](./doc/2_introduction.md)
72+
- [Abstract](./doc/1_abstract.en.md)
73+
- [Introduction](./doc/2_introduction.en.md)
7174
- [User Case](./doc/user_case.en.md)
72-
- [Installation, Configuration, and Running](./doc/3_install_and_run.md)
73-
- [GödelScript Query Language](./doc/4_godelscript_language.md)
74-
- [Developing Plugins (VSCode)](./doc/5_toolchain.md)
75+
- [Installation, Configuration, and Running](./doc/3_install_and_run.en.md)
76+
- [GödelScript Query Language](./doc/4_godelscript_language.en.md)
77+
- [Developing Plugins (VSCode)](./doc/5_toolchain.en.md)
7578
- [COREF API](https://codefuse-ai.github.io/CodeFuse-Query/godel-api/coref_library_reference.html)
7679

7780
## Tutorial

Diff for: README_zh.md renamed to README_cn.md

+7-2
@@ -3,7 +3,8 @@
33
<img src="https://github.com/codefuse-ai/MFTCoder/blob/main/assets/github-codefuse-logo-update.jpg" width="50%" />
44
</p>
55

6-
<p align="center">
6+
<div align="center">
7+
<p>
78
<a href="https://github.com/codefuse-ai/CodeFuse-Query">
89
<img alt="stars" src="https://img.shields.io/github/stars/codefuse-ai/CodeFuse-Query?style=social" />
910
</a>
@@ -22,7 +23,11 @@
2223
<a href="https://marketplace.visualstudio.com/items?itemName=CodeFuse-Query.codefuse-query-extension">
2324
<img alt="VSCode Plugin" src="https://img.shields.io/visual-studio-marketplace/i/CodeFuse-Query.codefuse-query-extension?style=social&logo=visualstudiocode&logoColor=%23007ACC" />
2425
</a>
25-
</p>
26+
</p>
27+
<p>
28+
[**中文**] | [English](README.md)
29+
</p>
30+
</div>
2631

2732
[English Documentation](./README.md)
2833

Diff for: doc/1_abstract.en.md

+17
@@ -0,0 +1,17 @@
1+
# Abstract
2+
With the increasing popularity of large-scale software development, the demand for scalable and adaptable static code analysis techniques is growing. Traditional static analysis tools such as Clang Static Analyzer (CSA) or PMD have shown good results in checking programming rules or style issues. However, these tools are often designed for specific objectives and are unable to meet the diverse and changing needs of modern software development environments. These needs may relate to Quality of Service (QoS), various programming languages, different algorithmic requirements, and various performance needs. For example, a security team might need sophisticated algorithms like context-sensitive taint analysis to review smaller codebases, while project managers might need a lighter algorithm, such as one that calculates cyclomatic complexity, to measure developer productivity on larger codebases.
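To make the "lighter algorithm" end of that spectrum concrete, here is a minimal sketch of a cyclomatic complexity estimate. It is not taken from CodeQuery: it assumes Python source as input, uses the standard `ast` module, and counts only a simplified set of decision points. The names `cyclomatic_complexity`, `_BRANCH_NODES`, and `SAMPLE` are hypothetical.

```python
import ast

# Node types counted as decision points in this simplified sketch (an assumption,
# not a complete McCabe implementation).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity as 1 + the number of decision points."""
    tree = ast.parse(source)
    complexity = 1
    for node in ast.walk(tree):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` has two operators, each adding one extra path.
            complexity += len(node.values) - 1
    return complexity


# Hypothetical input used only for illustration.
SAMPLE = '''
def classify(n):
    if n < 0 and n % 2:
        return "negative odd"
    for i in range(n):
        if i == 3:
            return "has three"
    return "other"
'''

if __name__ == "__main__":
    print(cyclomatic_complexity(SAMPLE))
```

A single pass over the syntax tree is enough here, which is why a metric like this stays cheap on large codebases, whereas a context-sensitive taint analysis has to track data flow across calls.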
3+
4+
These diversified needs, coupled with the common computational resource constraints in large organizations, pose a significant challenge. Traditional tools, with their problem-specific computation methods, often fail to scale in such environments. This is why we introduced CodeQuery, a centralized data platform specifically designed for large-scale static analysis.
5+
In implementing CodeQuery, we treat source code and analysis results as data and the execution process as big data processing, a significant departure from traditional tool-centric approaches. CodeQuery builds on systems that large organizations already operate, such as data warehouses, data computation facilities like MaxCompute and Hive, OSS object storage, and elastic computing resources like Kubernetes, so it integrates into them directly. This makes CodeQuery maintainable and scalable, and able to support diverse and changing demands. CodeQuery's open architecture also encourages interoperability between internal systems, facilitating interaction and data exchange. This integration increases the degree of automation within the organization, improves efficiency, and reduces the likelihood of manual errors. By breaking down information silos, CodeQuery improves the overall productivity of the software development process.
6+
Moreover, CodeQuery's data-centric approach offers distinct advantages for domain-specific challenges in static source code analysis. Source code is typically a highly structured and interconnected dataset, with strong informational and relational ties to other code and configuration files. By treating code as data, CodeQuery can model these relationships directly. This makes it especially suitable for large organizations, where codebases evolve continuously but incrementally: most code sees only minor changes each day and is otherwise stable. CodeQuery also supports use cases such as Business Intelligence (BI) over code data, generating reports and dashboards to aid monitoring and decision-making. Additionally, CodeQuery plays an important role in analyzing training data for large language models (LLMs), providing insights that help improve the effectiveness of these models.
7+
8+
CodeQuery introduces a new paradigm to the field of static analysis. It meets the needs of analyzing large, complex codebases and adapts to the changing and diverse scenarios in which static analysis is applied. Because it is designed for large-scale software development settings and views both source code and analysis results as data, it can be integrated flexibly into the various systems of a large organization. This approach handles large codebases efficiently and accommodates complex analysis needs, making static analysis work more effective and accurate.
9+
10+
The characteristics and advantages of CodeQuery can be summarized as follows:
11+
12+
- **Highly Scalable**: CodeQuery can handle large codebases and adapt to different analysis needs. This high level of scalability makes CodeQuery particularly valuable in large organizations.
13+
- **Data-Centric**: By treating source code and analysis results as data, CodeQuery gains a distinct edge in addressing code analysis problems in big data environments.
14+
- **Highly Integrated**: CodeQuery can integrate seamlessly into various systems within large organizations, including data warehouses, data computation facilities, object storage, and flexible computing resources. This high level of integration makes the use of CodeQuery in large organizations more convenient and efficient.
15+
- **Supports Diverse Needs**: CodeQuery can process large codebases and accommodate various complex analysis needs, including QoS analysis, cross-language analysis, algorithmic needs, and performance requirements.
16+
17+
CodeQuery is a powerful static code analysis platform, suitable for large-scale, complex codebase analysis scenarios. Its data-centric approach and high scalability give it a unique advantage in the modern software development environment. As static code analysis technology continues to evolve, CodeQuery is expected to play an increasingly important role in this field.
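
The idea of treating source code as data, described in the abstract above, can be illustrated with a small sketch. This is not CodeQuery's actual schema, extractor, or query language: it assumes Python source files as input, uses the standard `ast` module as a stand-in extractor and an in-memory SQLite database as a stand-in for a data warehouse. The table names `functions` and `calls` and every function name below are hypothetical.

```python
import ast
import sqlite3


def extract_facts(path: str, source: str):
    """Yield (table, row) facts for functions and the direct calls inside them."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            yield "functions", (path, node.name, node.lineno)
            for inner in ast.walk(node):
                # Only plain-name calls such as `helper()` are recorded here.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    yield "calls", (path, node.name, inner.func.id)


def build_database(files: dict) -> sqlite3.Connection:
    """Load the extracted facts into relational tables (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE functions (path TEXT, name TEXT, line INTEGER)")
    db.execute("CREATE TABLE calls (path TEXT, caller TEXT, callee TEXT)")
    for path, source in files.items():
        for table, row in extract_facts(path, source):
            marks = ",".join("?" * len(row))
            db.execute(f"INSERT INTO {table} VALUES ({marks})", row)
    return db


def uncalled_functions(db: sqlite3.Connection):
    """An analysis expressed as a query: functions that no recorded call targets."""
    rows = db.execute(
        "SELECT f.path, f.name FROM functions f "
        "LEFT JOIN calls c ON c.callee = f.name "
        "WHERE c.callee IS NULL ORDER BY f.path, f.name"
    )
    return rows.fetchall()


# Hypothetical two-file codebase used only for illustration.
FILES = {
    "a.py": "def helper():\n    return 1\n\ndef main():\n    return helper() + helper()\n",
    "b.py": "def unused():\n    return 0\n",
}

if __name__ == "__main__":
    print(uncalled_functions(build_database(FILES)))
```

Once the facts are in tables, a new analysis is a new query rather than a new tool, and when a file changes only its rows need to be re-extracted, which fits codebases that evolve incrementally.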

Diff for: doc/1_abstract.md

+1
@@ -1,3 +1,4 @@
1+
# Introduction
12
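With the spread of large-scale software development, demand for scalable and easily adaptable static code analysis techniques is increasing. Traditional static analysis tools, such as Clang Static Analyzer (CSA) or PMD, have shown good results in checking programming rules or style issues. However, these tools are usually designed for specific objectives and often cannot meet the varied and changing needs of modern software development environments. These needs may involve Quality of Service (QoS), various programming languages, different algorithmic requirements, and various performance requirements. For example, a security team may need sophisticated algorithms, such as context-sensitive taint analysis, to review smaller codebases, while a project manager may need a relatively lightweight algorithm, such as one that computes cyclomatic complexity, to measure developer productivity on larger codebases.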
23

34
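These diverse needs, together with the computational resource constraints common in large organizations, pose a major challenge. Because traditional tools use problem-specific computation methods, they often fail to scale in such environments. We therefore introduced CodeQuery, a centralized data platform designed specifically for large-scale static analysis.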
