Commit a264311

update readme doc to the latest research and community status. (#134)
* update readme doc to the latest research and community status.
1 parent d42efd8 commit a264311

2 files changed: +48 -2 lines changed

README.md

+48-2
@@ -1,4 +1,4 @@
1-
# JiaoziFS
1+
# JiaoziFS (JZFS)
22
A version control file system for data centric applications & teams.
33

44
<p align="left">
@@ -8,6 +8,40 @@ A version control file system for data centric applications & teams.
88
<br>
99
</p>
1010

11+
<img src="https://github.com/jiaozifs/jiaozifs/docs/logo/jiaozifs.png" width="100">
12+
13+
----
14+
### What is JiaoziFS?
15+
JiaoziFS is an industry-leading **Data-Centric Version Control** file system that helps ensure Responsible AI Engineering by improving **Data Versioning**, **Provenance**, and **Reproducibility**.
16+
17+
Note:
18+
* The name Jiaozi pays tribute to the world's earliest paper money: [Song Dynasty Jiaozi](https://en.wikipedia.org/wiki/Jiaozi_(currency)).
19+
* JiaoziFS is yet another implementation of [IPFS (InterPlanetary File System)](https://ipfs.tech/): it will be compatible with the IPFS [implementation requirements](https://specs.ipfs.tech/architecture/principles/#ipfs-implementation-requirements).
20+
* As a file system for data versioning at scale, JiaoziFS is built for machine learning, but it has a wide range of use cases (see A Universe of Uses) and can be seamlessly integrated into your entire data stack.
21+
22+
Data-centric AI is the practice of programmatically iterating and collaborating on the data used to build AI systems. Machine learning pioneer Andrew Ng [argues that focusing on the quality of data fueling AI systems will help unlock its full power](https://youtu.be/TU6u_T-s68Y).
23+
24+
----
25+
### Why JiaoziFS?
26+
In production systems with machine learning components, updates and experiments are frequent. New updates to models (data products) may be released every day or every few minutes, and different users may see the results of different models as part of A/B experiments or canary releases.
27+
28+
* **Version Everything**: Data scientists are often criticized for being undisciplined about versioning their experiments (data, pipelines, code, and models), especially when using computational notebooks.
29+
* **Track Data Provenance**: This applies to all processing steps in an AI/ML pipeline, including data collection/acquisition, data merging, data cleaning, feature extraction, learning, or deployment.
30+
* **Reproducibility**: A final question of AI/ML that is often relevant for debugging, audits, and also science more broadly is to what degree data, models, and decisions can be reproduced.
31+
32+
----
33+
### A Universe of Uses
34+
JiaoziFS is versatile across different industries, making it a multi-purpose tool for **data-centric applications and teams**.
35+
36+
* **Enterprise DataHub & Data Collaboration**: Depending on your operating scale, you may be managing multiple team members who are spread across different locations. JiaoziFS enables collaborative dataset version management at scale, so you can share insights instantly and co-edit with your team.
37+
* **DataOps & Data Products & Data Mesh**: Augmenting enterprise data development and operations, JiaoziFS ensures Responsible DataOps/AIOps/MLOps by improving Data Versioning, Provenance, and Reproducibility. JiaoziFS fuses data science and product development and allows data to be containerized into shareable, tradeable, and trackable assets (data products or data NFTs). When data products in a maturing Data Mesh environment are versioned through standard processes, data consumers can be informed about both breaking and non-breaking changes in a data product, as well as its retirement.
38+
* **Digital Twins for Manufacturing**: Developing digital twins for manufacturing involves managing tons of large files and multiple iterations of a project. All of the data collected and created in the digital twin process (and there is a lot of it) needs to be managed carefully. JiaoziFS allows you to manage changes to files over time and store these modifications in a database.
39+
40+
----
41+
### Spec
42+
[JiaoziFS Specification](https://github.com/jiaozifs/Spec)
43+
44+
----
1145
### Basic Build And Usage
1246

1347
#### Requirement
@@ -35,6 +69,18 @@ After following the above steps, you should be able to see an executable file na
3569
```bash
3670
docker run -v <data>:/app -p 34913:34913 gitdatateam/jzfs:latest --db "postgres://<user>:<password>@192.168.1.16:5432/jiaozifs?sslmode=disable" --bs_path /app/data --listen http://0.0.0.0:34913 --config /app/config.toml
3771
```
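
The `docker run` command above packs several placeholders into one line. As a minimal sketch, the same invocation could be assembled from shell variables so that each placeholder is set in one place. The variable names (`DATA_DIR`, `DB_URL`, `PORT`), the `jzfs_run` helper, and the `DRY_RUN` switch are hypothetical and not part of JiaoziFS; only the image name and flags are taken from the command above.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical settings; replace the placeholder values with your own.
DATA_DIR="${DATA_DIR:-$PWD/jzfs-data}"   # host directory mounted at /app
DB_URL="${DB_URL:-postgres://user:password@192.168.1.16:5432/jiaozifs?sslmode=disable}"
PORT="${PORT:-34913}"                    # port published and listened on

# Hypothetical helper: assembles the docker command shown above.
# It prints the command by default; set DRY_RUN=0 to actually run it.
jzfs_run() {
  set -- docker run -v "${DATA_DIR}:/app" -p "${PORT}:${PORT}" \
    gitdatateam/jzfs:latest \
    --db "${DB_URL}" \
    --bs_path /app/data \
    --listen "http://0.0.0.0:${PORT}" \
    --config /app/config.toml
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

jzfs_run
```

With the defaults this only prints the assembled command. Running it with `DRY_RUN=0` would start the container, assuming Docker is installed and the database URL points at a reachable PostgreSQL instance.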
38-
## License
72+
73+
----
74+
### Cloud
75+
[Try without installing](https://cloud.jiaozifs.com)
76+
77+
----
78+
### Contributors
79+
80+
<a href="https://github.com/hunjixin" target="_blank"><img src="https://avatars.githubusercontent.com/u/41407352?v=4" width="5%" height="5%"/></a> <a href="https://github.com/Brownjy" target="_blank"><img src="https://avatars.githubusercontent.com/u/54040689?v=4" width="5%" height="5%"/></a> <a href="https://github.com/TsumikiQAQ" target="_blank"><img src="https://avatars.githubusercontent.com/u/116857998?v=4" width="5%" height="5%"/></a> <a href="https://github.com/taoshengshi" target="_blank"><img src="https://avatars.githubusercontent.com/u/33315004?v=4" width="5%" height="5%"/></a> <a href="https://github.com/gitdata001" target="_blank"><img src="https://avatars.githubusercontent.com/u/157772574?v=4" width="5%" height="5%"/></a>
81+
82+
----
83+
### License
3984

4085
Dual-licensed under [MIT](https://github.com/jiaozifs/jiaozifs/blob/main/LICENSE-MIT) + [Apache 2.0](https://github.com/jiaozifs/jiaozifs/blob/main/LICENSE-APACHE)
86+

docs/logo/jiaozifs.png (31.3 KB)