
Commit efc06be

Bruce Irschick and birschick-bq authored
[AD-272] Refactor documentation. (#27)
### Summary

[AD-272] Refactor documentation.

### Description

- Move content from README to separate sections under the documentation node.
- Update the index and other sections to use the new sections.
- Separate the BI Tool setup section.

### Related Issue

https://bitquill.atlassian.net/browse/AD-272

Commits:

* [AD-272] Refactor documentation.
* Commit Code Coverage Badge
* [AD-272] Refactor documentation.
* Commit Code Coverage Badge
* [AD-272] Refactor documentation.
* Commit Code Coverage Badge
* [AD-272] Change images for "DocumentDB by Amazon". Removed directions for unsigned TACO file.
* [AD-272] Added generated PDF file.
* Commit Code Coverage Badge
* [AD-272] Removed single document. Merged previous documentation properly from README.md.
* Commit Code Coverage Badge

Co-authored-by: birschick-bq <[email protected]>
1 parent eee9215 commit efc06be

9 files changed: +649 −631 lines

README.md (+11 −487)

Large diffs are not rendered by default.

src/markdown/index.md (+58 −3)

@@ -1,14 +1,69 @@

# Amazon DocumentDB JDBC Driver Documentation

## Overview

The JDBC driver for the Amazon DocumentDB managed document database provides an
SQL-relational interface for developers and BI tool users.

## License

This project is licensed under the Apache-2.0 License.

## Documentation

- Setup
    - [Amazon DocumentDB JDBC Driver Setup](setup/setup.md)
    - [Using an SSH Tunnel to Connect to Amazon DocumentDB](setup/ssh-tunnel.md)
    - [BI Tool Setup](setup/bi-tool-setup.md)
    - [Connection String Syntax and Options](setup/connection-string.md)
- Managing Schema
    - [Schema Discovery and Generation](schema/schema-discovery.md)
    - [Managing Schema Using the Command Line Interface](schema/manage-schema-cli.md)
    - [Table Schemas JSON Format](schema/table-schemas-json-format.md)
- SQL and JDBC Compatibility
    - [SQL and JDBC Support and Limitations](sql/sql-jdbc-limitations.md)

## Getting Started

Follow the [requirements and setup directions](setup/setup.md) to get your environment ready to use the
Amazon DocumentDB JDBC driver. Assuming your Amazon DocumentDB cluster is hosted in a private VPC,
you'll want to [create an SSH tunnel](setup/ssh-tunnel.md) to bridge to your cluster in the VPC.
If you're a Tableau or other BI tool user, follow the directions on how to
[set up and use BI tools](setup/bi-tool-setup.md) with the driver.

## Setup and Usage

To set up and use the DocumentDB JDBC driver, see [Amazon DocumentDB JDBC Driver Setup](setup/setup.md).

## Connection String Syntax

```
jdbc:documentdb://[<user>[:<password>]@]<hostname>[:<port>]/<database-name>[?<option>=<value>[&<option>=<value>[...]]]
```

For more information about connecting to an Amazon DocumentDB database using this JDBC driver, see
the [connection string documentation](setup/connection-string.md).
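
As an illustration of the syntax above, here is a minimal sketch of opening a connection from Java
with `DriverManager`. The host, port, database, and credentials are hypothetical placeholders (for
example, a local SSH tunnel endpoint); only the `jdbc:documentdb://` URL format comes from this
documentation.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ConnectExample {
    public static void main(String[] args) throws SQLException {
        // Hypothetical values; replace with your own endpoint, database, and credentials.
        String url = "jdbc:documentdb://exampleUser:examplePassword@localhost:27117/exampleDatabase";

        // JDBC 4+ locates the registered driver for the jdbc:documentdb scheme automatically,
        // provided the driver JAR is on the classpath.
        try (Connection connection = DriverManager.getConnection(url)) {
            System.out.println("Connected: " + !connection.isClosed());
        }
    }
}
```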

## Schema Discovery

The Amazon DocumentDB JDBC driver can perform automatic schema discovery and generate an SQL to
DocumentDB schema mapping. See the [schema discovery documentation](schema/schema-discovery.md)
for more details of this process.

## Schema Management

The SQL to DocumentDB schema mapping can be managed in the following ways:

- generated
- removed
- listed
- exported
- imported

See the [schema management documentation](schema/manage-schema-cli.md) and
[table schemas JSON format](schema/table-schemas-json-format.md) for further
information.

## SQL and JDBC Limitations

The Amazon DocumentDB JDBC driver has a number of important limitations. See the
[SQL and JDBC limitations documentation](sql/sql-jdbc-limitations.md) for more information.
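
Many of these capabilities can also be probed at runtime through the standard JDBC
`DatabaseMetaData` interface. The sketch below is illustrative only: the connection string is a
hypothetical placeholder, and the values returned depend on the driver's implementation as
described in the limitations page linked above.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.SQLException;

public class CapabilitiesExample {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection string; see the Connection String Syntax section above.
        String url = "jdbc:documentdb://exampleUser:examplePassword@localhost:27117/exampleDatabase";

        try (Connection connection = DriverManager.getConnection(url)) {
            DatabaseMetaData metaData = connection.getMetaData();

            // Standard JDBC capability checks; results reflect the driver's documented limitations.
            System.out.println("Read-only connection:     " + connection.isReadOnly());
            System.out.println("Supports transactions:    " + metaData.supportsTransactions());
            System.out.println("Supports full outer join: " + metaData.supportsFullOuterJoins());
            System.out.println("Max column name length:   " + metaData.getMaxColumnNameLength());
        }
    }
}
```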

src/markdown/schema/schema-discovery.md (+294)

@@ -0,0 +1,294 @@

# Schema Discovery

## Automated Discovery Behavior

When the JDBC driver connection needs the schema for a collection in the database,
it will poll for all the collections in the database.

The driver will determine whether a cached version of the schema for that collection already exists.
If a cached version does not exist, it will sample the collection for documents and create a schema
based on the following behavior.

### Schema Generation Limitations

The DocumentDB JDBC driver limits the length of identifiers to 128 characters.
The schema generator may truncate generated identifiers (table names and column names)
to ensure they fit within that limit.

### Scanning Method Options

The sampling behavior can be modified using connection string or datasource options
(a usage sketch follows the list below).

- `scanMethod=<option>`
    - `random` - (default) The sample documents are returned in _random_ order.
    - `idForward` - The sample documents are returned in order of id.
    - `idReverse` - The sample documents are returned in reverse order of id.
    - `all` - Sample all the documents in the collection.
- `scanLimit=<n>` - The number of documents to sample. The value must be a positive integer.
  The default value is `1000`. If `scanMethod` is set to `all`, this option is ignored.
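
As a sketch of how these options might be applied, they can be appended to the connection string
like any other option. The option names come from the list above; the endpoint, database, and
credentials are hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ScanOptionsExample {
    public static void main(String[] args) throws SQLException {
        // Hypothetical endpoint and database; scanMethod and scanLimit are the options described above.
        String url = "jdbc:documentdb://exampleUser:examplePassword@localhost:27117/exampleDatabase"
                + "?scanMethod=idForward&scanLimit=5000";

        try (Connection connection = DriverManager.getConnection(url)) {
            // Schema discovery for each collection then uses the sampling settings supplied above.
            System.out.println("Connected with custom sampling options.");
        }
    }
}
```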

### DocumentDB Data Types

The DocumentDB server supports a number of MongoDB data types. Listed below are the supported data
types and their associated JDBC data types.

| MongoDB Data Type | Supported in DocumentDB | JDBC Data Type |
| ----------- | ----------- | ----------- |
| Binary Data | Yes | VARBINARY |
| Boolean | Yes | BOOLEAN |
| Double | Yes | DOUBLE |
| 32-bit Integer | Yes | INTEGER |
| 64-bit Integer | Yes | BIGINT |
| String | Yes | VARCHAR |
| ObjectId | Yes | VARCHAR |
| Date | Yes | TIMESTAMP |
| Null | Yes | VARCHAR |
| Regular Expression | Yes | VARCHAR |
| Timestamp | Yes | VARCHAR |
| MinKey | Yes | VARCHAR |
| MaxKey | Yes | VARCHAR |
| Object | Yes | _virtual table_ |
| Array | Yes | _virtual table_ |
| Decimal128 | No | DECIMAL |
| JavaScript | No | VARCHAR |
| JavaScript (with scope) | No | VARCHAR |
| Undefined | No | VARCHAR |
| Symbol | No | VARCHAR |
| DBPointer (4.0+) | No | VARCHAR |

### Mapping Scalar Document Fields

When scanning a sample of documents from a collection, the JDBC driver will create one or more
schemas to represent the samples in the collection. In general, a scalar field in a document
maps to a column in the table schema. For example, a collection named `team` containing the single
document `{ "_id" : "112233", "name" : "Alastair", "age" : 25 }` would map to the schema:

| Table Name | Column Name | Data Type | Key |
| ---| --- | --- | --- |
| team | _**team__id**_ | VARCHAR | PK |
| team | name | VARCHAR | |
| team | age | INTEGER | |
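
One way to see the generated mapping from a client application is through the standard JDBC
metadata API. The following minimal sketch (the connection string is a hypothetical placeholder and
assumes the database contains the sample `team` collection) lists the columns the driver reports
for the generated `team` table.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ListTeamColumnsExample {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection string; the database holds the sample `team` collection.
        String url = "jdbc:documentdb://exampleUser:examplePassword@localhost:27117/exampleDatabase";

        try (Connection connection = DriverManager.getConnection(url)) {
            DatabaseMetaData metaData = connection.getMetaData();

            // Standard JDBC call: list all columns of the generated "team" table.
            try (ResultSet columns = metaData.getColumns(null, null, "team", null)) {
                while (columns.next()) {
                    System.out.printf("%s.%s -> %s%n",
                            columns.getString("TABLE_NAME"),
                            columns.getString("COLUMN_NAME"),
                            columns.getString("TYPE_NAME"));
                }
            }
        }
    }
}
```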

### Data Type Conflict Promotion

When scanning the sampled documents, it is possible that the data types for a field are not
consistent from document to document. In this case, the JDBC driver will _promote_ the JDBC data
type to a common data type that will suit all data types from the sampled documents.

#### Example

```json
{
    "_id" : "112233",
    "name" : "Alastair",
    "age" : 25
}
```

```json
{
    "_id" : "112244",
    "name" : "Benjamin",
    "age" : "32"
}
```

The `age` field is of type _32-bit integer_ in the first document but _string_ in the second document.
Here the JDBC driver will promote the JDBC data type to VARCHAR to handle either data type when
encountered.

| Table Name | Column Name | Data Type | Key |
| ---| --- | --- | --- |
| team | _**team__id**_ | VARCHAR | PK |
| team | name | VARCHAR | |
| team | age | VARCHAR | |
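
A practical consequence of this promotion is that client code should read the column using the
promoted JDBC type. A brief sketch follows; the connection string is a hypothetical placeholder and
the data is the two sample documents above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PromotedColumnExample {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection string; the database holds the sample `team` collection.
        String url = "jdbc:documentdb://exampleUser:examplePassword@localhost:27117/exampleDatabase";

        try (Connection connection = DriverManager.getConnection(url);
             Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery("SELECT \"name\", \"age\" FROM \"team\"")) {
            while (resultSet.next()) {
                // Because `age` was promoted to VARCHAR, read it as a string;
                // values like 25 and "32" both come back as text.
                String name = resultSet.getString("name");
                String age = resultSet.getString("age");
                System.out.println(name + " : " + age);
            }
        }
    }
}
```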

### Scalar-Scalar Conflict Promotion

The following diagram shows the way in which scalar-scalar data type conflicts are resolved.

![Scalar-Scalar Promotion](src/markdown/images/ScalarDataTypePromotion-transparent.png)

### Object and Array Data Type Handling

So far, we've only described how scalar data types are mapped. Object and Array data types are
(currently) mapped to virtual tables. The JDBC driver will create a virtual table to represent
either object or array fields in a document. The name of the mapped virtual table concatenates the
original collection's name and the field's name, separated by an underscore character ("_").

The base table's primary key ("_id") takes on a new name in the new
virtual table and is provided as a foreign key to the associated base table.

For embedded array type fields, index columns are generated to represent the
index into the array at each level of the array.

#### Embedded Object Field Example

For object fields in a document, a mapping to a virtual table is created by the
JDBC driver.

Collection: `customer`

```json
{
    "_id" : "112233",
    "name" : "George Jackson",
    "address" : {
        "address1" : "123 Avenue Way",
        "address2" : "Apt. 5",
        "city" : "Hollywood",
        "region" : "California",
        "country" : "USA",
        "code" : "90210"
    }
}
```

maps to the schema for the `customer` table, ...

| Table Name | Column Name | Data Type | Key |
| --- | --- | --- | --- |
| customer | _**customer__id**_ | VARCHAR | PK |
| customer | name | VARCHAR | |

... and the `customer_address` virtual table:

| Table Name | Column Name | Data Type | Key |
| --- | --- | --- | --- |
| customer_address | _**customer__id**_ | VARCHAR | PK/FK |
| customer_address | address1 | VARCHAR | |
| customer_address | address2 | VARCHAR | |
| customer_address | city | VARCHAR | |
| customer_address | region | VARCHAR | |
| customer_address | country | VARCHAR | |
| customer_address | code | VARCHAR | |

So the resulting data in the two tables would look like this...

#### Table: customer

| _**customer__id**_ | name |
| --- | --- |
| "112233" | "George Jackson" |

#### Virtual Table: customer_address

| _**customer__id**_ | address1 | address2 | city | region | country | code |
| --- | --- | --- | --- | --- | --- | --- |
| "112233" | "123 Avenue Way" | "Apt. 5" | "Hollywood" | "California" | "USA" | "90210" |

To query the data and return all columns, use the following query with a
JOIN statement to get the matching address data.

```mysql-sql
SELECT * FROM "customer"
    INNER JOIN "customer_address"
        ON "customer"."customer__id" = "customer_address"."customer__id"
```
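
The same join can be issued through JDBC. The sketch below is illustrative: the connection string is
a hypothetical placeholder, while the SQL is the query shown above. It prints each joined row.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

public class CustomerAddressJoinExample {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection string; the database holds the sample `customer` collection.
        String url = "jdbc:documentdb://exampleUser:examplePassword@localhost:27117/exampleDatabase";

        // The join from the documentation above: base table joined to its virtual table.
        String sql = "SELECT * FROM \"customer\""
                + " INNER JOIN \"customer_address\""
                + " ON \"customer\".\"customer__id\" = \"customer_address\".\"customer__id\"";

        try (Connection connection = DriverManager.getConnection(url);
             Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery(sql)) {
            ResultSetMetaData resultMetaData = resultSet.getMetaData();
            while (resultSet.next()) {
                // Print every column of the joined row: customer__id, name, address1, city, ...
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= resultMetaData.getColumnCount(); i++) {
                    row.append(resultSet.getString(i)).append('\t');
                }
                System.out.println(row.toString().trim());
            }
        }
    }
}
```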

#### Embedded Array Field Example

For array fields in a document, a mapping to a virtual table is also created by the
JDBC driver.

Collection: `customer1`

```json
{
    "_id" : "112233",
    "name" : "George Jackson",
    "subscriptions" : [
        "Vogue",
        "People",
        "USA Today"
    ]
}
```

maps to the schema for the `customer1` table, ...

| Table Name | Column Name | Data Type | Key |
| --- | --- | --- | --- |
| customer1 | _**customer1__id**_ | VARCHAR | PK |
| customer1 | name | VARCHAR | |

... and the `customer1_subscriptions` virtual table:

| Table Name | Column Name | Data Type | Key |
| --- | --- | --- | --- |
| customer1_subscriptions | _**customer1__id**_ | VARCHAR | PK/FK |
| customer1_subscriptions | subscriptions_index_lvl0 | BIGINT | PK |
| customer1_subscriptions | value | VARCHAR | |

So the resulting data in the two tables would look like this...

#### Table: customer1

| _**customer1__id**_ | name |
| --- | --- |
| "112233" | "George Jackson" |

#### Virtual Table: customer1_subscriptions

| _**customer1__id**_ | subscriptions_index_lvl0 | value |
| --- | --- | --- |
| "112233" | 0 | "Vogue" |
| "112233" | 1 | "People" |
| "112233" | 2 | "USA Today" |

To query the data and return all columns, use the following query with a
JOIN statement to get the matching _subscriptions_ data.

```mysql-sql
SELECT * FROM "customer1"
    INNER JOIN "customer1_subscriptions"
        ON "customer1"."customer1__id" = "customer1_subscriptions"."customer1__id"
```

### Scalar-Complex Type Conflict Promotion

Like the scalar-scalar type conflicts, the same field in different documents can have conflicting
data types between complex (array and object) and scalar (integer, boolean, etc.). All of these
conflicts are resolved (promoted) to VARCHAR for those fields. In this case, array and object data
is returned as the JSON representation.

#### Embedded Array - String Field Conflict Example

Collection: `customer2`

```json
{
    "_id" : "112233",
    "name" : "George Jackson",
    "subscriptions" : [
        "Vogue",
        "People",
        "USA Today"
    ]
}
```

```json
{
    "_id" : "112244",
    "name" : "Joan Starr",
    "subscriptions" : 1
}
```

maps to the schema for the `customer2` table, ...

| Table Name | Column Name | Data Type | Key |
| --- | --- | --- | --- |
| customer2 | _**customer2__id**_ | VARCHAR | PK |
| customer2 | name | VARCHAR | |
| customer2 | subscriptions | VARCHAR | |

So the resulting data in the table would look like this...

#### Table: customer2

| _**customer2__id**_ | name | subscriptions |
| --- | --- | --- |
| "112233" | "George Jackson" | "\[ \\"Vogue\\", \\"People\\", \\"USA Today\\" \]" |
| "112244" | "Joan Starr" | "1" |

src/markdown/setup/amazon-ca-certs.md (+3 −3)

@@ -5,12 +5,12 @@ file which should work when connecting to Amazon DocumentDB clusters using SSL/T

However, if you want to provide a new Amazon RDS Certificate Authority root certificate, follow the
directions below.

The certificate is distributed
[unbundled here](https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem).

1. [Download the root CA certificate](https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem)
1. We recommend you relocate the file into your user's home directory:
    1. Windows: `$HOME`
    1. macOS/Linux: `~`
1. Add the `tlsCAFile` option to your [JDBC connection string](connection-string.md).
   For example, `jdbc:documentdb://localhost:27117/customer?tlsAllowInvalidHostnames=true&tlsCAFile=rds-ca-2019-root.pem`
