| 1 | +--- |
| 2 | +sidebar_position: 1 |
| 3 | +sidebar_label: Introduction |
| 4 | +--- |
| 5 | + |
| 6 | +# The Zed Project |
| 7 | + |
| 8 | +Zed offers a new approach to data that makes it easier to manipulate and manage |
| 9 | +your data. |
| 10 | + |
| 11 | +With Zed's new [super-structured data model](formats/README.md#2-zed-a-super-structured-pattern), |
| 12 | +messy JSON data can easily be given the fully-typed precision of relational tables |
| 13 | +without giving up JSON's uncanny ability to represent eclectic data. |
| 14 | + |
| 15 | +## Getting Started |
| 16 | + |
| 17 | +Trying out Zed is easy: just [install](install.md) the command-line tool |
| 18 | +[`zq`](commands/zq.md) and run through the [zq tutorial](tutorials/zq.md). |
| 19 | + |
| 20 | +`zq` is a lot like [`jq`](https://stedolan.github.io/jq/) |
| 21 | +but is built from the ground up as a search and analytics engine based |
| 22 | +on the [Zed data model](formats/zed.md). Since Zed data is a |
| 23 | +proper superset of JSON, `zq` also works natively with JSON. |
| 24 | + |
| 25 | +While `zq` and the Zed data formats are production quality, the Zed project's |
| 26 | +[Zed data lake](commands/zed.md) is a bit [earlier in development](commands/zed.md#status). |
| 27 | + |
| 28 | +For a non-technical user, Zed is as easy to use as web search, |
| 29 | +while for a technical user, Zed exposes its technical underpinnings |
| 30 | +in a gradual slope, providing as much detail as desired, |
| 31 | +packaged up in the easy-to-understand |
| 32 | +[ZSON data format](formats/zson.md) and |
| 33 | +[Zed language](language/README.md). |
| 34 | + |
| 35 | +## Terminology |
| 36 | + |
| 37 | +"Zed" is an umbrella term that describes |
| 38 | +a number of different elements of the system: |
| 39 | +* The [Zed data model](formats/zed.md) is the abstract definition of the data types and semantics |
| 40 | +that underlie the Zed formats. |
| 41 | +* The [Zed formats](formats/README.md) are a family of |
| 42 | +[sequential (ZNG)](formats/zng.md), [columnar (VNG)](formats/vng.md), |
| 43 | +and [human-readable (ZSON)](formats/zson.md) formats that all adhere to the |
| 44 | +same abstract Zed data model. |
| 45 | +* A [Zed lake](commands/zed.md) is a collection of Zed data stored |
| 46 | +across one or more [data pools](commands/zed.md#data-pools) with ACID commit semantics and |
| 47 | +accessed via a [Git](https://git-scm.com/)-like API. |
| 48 | +* The [Zed language](language/README.md) is the system's dataflow language for performing |
| 49 | +queries, searches, analytics, transformations, or any combination of the above. |
| 50 | +* A [Zed query](language/overview.md) is a Zed script that performs |
| 51 | +search and/or analytics. |
| 52 | +* A [Zed shaper](language/shaping.md) is a Zed script that performs |
| 53 | +data transformation to _shape_ |
| 54 | +the input data into the desired set of organizing Zed data types called "shapes", |
| 55 | +which are traditionally called _schemas_ in relational systems but are |
| 56 | +much more flexible in the Zed system. |
| 57 | + |
| 58 | +## Digging Deeper |
| 59 | + |
| 60 | +The [Zed language documentation](language/README.md) |
| 61 | +is the best way to learn about `zq` in depth. |
| 62 | +All of its examples use `zq` commands run on the command line. |
| 63 | +Run `zq -h` for a list of command options and online help. |
| 64 | + |
| 65 | +The [Zed lake documentation](commands/zed.md) |
| 66 | +is the best way to learn about `zed`. |
| 67 | +All of its examples use `zed` commands run on the command line. |
| 68 | +Run `zed -h` or `-h` with any subcommand for a list of command options |
| 69 | +and online help. The same language query that works for `zq` operating |
| 70 | +on local files or streams also works for `zed query` operating on a lake. |
| 71 | + |
| 72 | +## Design Philosophy |
| 73 | + |
| 74 | +The design philosophy for Zed is based on composable building blocks |
| 75 | +built from self-describing data structures. Everything in a Zed lake |
| 76 | +is built from Zed data and each system component can be run and tested in isolation. |
| 77 | + |
| 78 | +Since Zed data is self-describing, this approach makes stream composition |
| 79 | +very easy. Data from a Zed query can trivially be piped to a local |
| 80 | +instance of `zq` by feeding the resulting Zed stream to stdin of `zq`, for example, |
| 81 | +``` |
| 82 | +zed query "from pool | ...remote query..." | zq "...local query..." - |
| 83 | +``` |
| 84 | +There is no need to configure the Zed entities with schema information |
| 85 | +like [protobuf configs](https://developers.google.com/protocol-buffers/docs/proto3) |
| 86 | +or connections to |
| 87 | +[schema registries](https://docs.confluent.io/platform/current/schema-registry/index.html). |
| 88 | + |
| 89 | +A Zed lake is completely self-contained, requiring no auxiliary databases |
| 90 | +(like the [Hive metastore](https://cwiki.apache.org/confluence/display/hive/design)) |
| 91 | +or other third-party services to interpret the lake data. |
| 92 | +Once a lake is copied, a new service can be instantiated by pointing |
| 93 | +`zed serve` at the copy. |
| 94 | + |
| 95 | +Functionality like [data compaction](commands/zed.md#manage) and retention is all API-driven. |
| 96 | + |
| 97 | +Bite-sized components are unified by Zed data, usually in the ZNG format: |
| 98 | +* All lake meta-data is available via meta-queries. |
| 99 | +* All lake operations available through the service API are also available |
| 100 | +directly via the `zed` command. |
| 101 | +* Lake management is agent-driven through the API. For example, instead of complex policies |
| 102 | +like data compaction being implemented in the core with some fixed set of |
| 103 | +algorithms and policies, an agent can simply hit the API to obtain the meta-data |
| 104 | +of the objects in the lake, analyze the objects (e.g., looking for too much |
| 105 | +key space overlap) and issue API commands to merge overlapping objects |
| 106 | +and delete the old fragmented objects, all with the transactional consistency |
| 107 | +of the commit log. |
| 108 | +* Components are easily tested and debugged in isolation. |
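
The agent-driven compaction described in the list above can be sketched roughly as follows. This is a minimal illustration in Python, not Zed's actual implementation or API: the `DataObject` record, its fields, and the `find_overlapping_groups` helper are all hypothetical, and the object metadata is hard-coded rather than fetched from a lake.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    """Hypothetical metadata for one data object in a pool (not a Zed type)."""
    object_id: str
    min_key: int
    max_key: int


def find_overlapping_groups(objects):
    """Group objects whose [min_key, max_key] ranges overlap transitively.

    Returns only groups with more than one object, since a lone object
    has nothing to merge with.
    """
    groups = []
    current = []
    current_max = None
    for obj in sorted(objects, key=lambda o: (o.min_key, o.max_key)):
        if current and obj.min_key <= current_max:
            # Range starts inside the running span: same merge candidate group.
            current.append(obj)
            current_max = max(current_max, obj.max_key)
        else:
            if len(current) > 1:
                groups.append(current)
            current = [obj]
            current_max = obj.max_key
    if len(current) > 1:
        groups.append(current)
    return groups


# Hard-coded stand-in for metadata an agent would obtain from the lake.
objects = [
    DataObject("a", 0, 10),
    DataObject("b", 5, 20),
    DataObject("c", 30, 40),
    DataObject("d", 18, 25),
]

for group in find_overlapping_groups(objects):
    print("merge candidates:", [o.object_id for o in group])
```

A real agent would replace the hard-coded list with metadata obtained through the lake API and would then issue the merge and delete commands, both of which this sketch leaves out.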