diff --git a/README.md b/README.md
Lines changed: 11 additions & 487 deletions
diff --git a/src/markdown/index.md b/src/markdown/index.md
Lines changed: 58 additions & 3 deletions
diff --git a/src/markdown/schema/schema-discovery.md b/src/markdown/schema/schema-discovery.md
Lines changed: 294 additions & 0 deletions
diff --git a/src/markdown/setup/amazon-ca-certs.md b/src/markdown/setup/amazon-ca-certs.md
Lines changed: 3 additions & 3 deletions
@@ -1,14 +1,69 @@
-# DocumentDB JDBC Driver Documentation
+# Amazon DocumentDB JDBC Driver Documentation
 
-These sections cover the setup and use of the Amazon DocumentDB JDBC driver.
+## Overview
 
-## Topics
+The JDBC driver for the Amazon DocumentDB managed document database provides an
+SQL-relational interface for developers and BI tool users.
+
+## License
+
+This project is licensed under the Apache-2.0 License.
+
+## Documentation
 
 - Setup
     - [Amazon DocumentDB JDBC Driver Setup](setup/setup.md)
     - [Using an SSH Tunnel to Connect to Amazon DocumentDB](setup/ssh-tunnel.md)
+    - [BI Tool Setup](setup/bi-tool-setup.md)  
     - [Connection String Syntax and Options](setup/connection-string.md)
 - Managing Schema
+    - [Schema Discovery and Generation](schema/schema-discovery.md)
     - [Managing Schema Using the Command Line Interface](schema/manage-schema-cli.md)
     - [Table Schemas JSON Format](schema/table-schemas-json-format.md)
+- SQL and JDBC Compatibility
+    - [SQL and JDBC Support and Limitations](sql/sql-jdbc-limitations.md)
+  
+## Getting Started
+
+Follow the [requirements and setup directions](setup/setup.md) to get your environment ready to use the
+Amazon DocumentDB JDBC driver. If your Amazon DocumentDB cluster is hosted in a private VPC,
+you'll want to [create an SSH tunnel](setup/ssh-tunnel.md) to bridge to your cluster in the VPC.
+If you're a Tableau or other BI tool user, follow the directions on how to
+[set up and use BI tools](setup/bi-tool-setup.md) with the driver.
+
+## Setup and Usage
+
+To set up and use the DocumentDB JDBC driver, see [Amazon DocumentDB JDBC Driver Setup](setup/setup.md).
+
+## Connection String Syntax
+
+```
+jdbc:documentdb://[<user>[:<password>]@]<hostname>[:<port>]/<database-name>[?<option>=<value>[&<option>=<value>[...]]]
+```
+
+For more information about connecting to an Amazon DocumentDB database using this JDBC driver, see
+the [connection string documentation](setup/connection-string.md).
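
As a rough illustration of the syntax above, the following Java sketch assembles a connection string from its parts. The class and method names (`DocumentDbUrlSketch`, `buildUrl`) are hypothetical and not part of the driver, and the sketch does not escape special characters in user names or passwords; check the connection string documentation for how the driver expects those to be encoded.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the driver: builds a URL of the form
// jdbc:documentdb://[<user>[:<password>]@]<hostname>[:<port>]/<database-name>[?<option>=<value>[&...]]
final class DocumentDbUrlSketch {

    static String buildUrl(String user, String password, String host, Integer port,
                           String database, Map<String, String> options) {
        StringBuilder url = new StringBuilder("jdbc:documentdb://");
        if (user != null) {
            // Credentials are appended verbatim; characters such as '@' or ':' are NOT escaped here.
            url.append(user);
            if (password != null) {
                url.append(':').append(password);
            }
            url.append('@');
        }
        url.append(host);
        if (port != null) {
            url.append(':').append(port);
        }
        url.append('/').append(database);
        if (options != null && !options.isEmpty()) {
            // Options are joined as ?name=value&name=value in map iteration order.
            StringJoiner query = new StringJoiner("&", "?", "");
            options.forEach((name, value) -> query.add(name + "=" + value));
            url.append(query);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("tlsAllowInvalidHostnames", "true");
        options.put("scanMethod", "random");
        // Prints: jdbc:documentdb://user:secret@localhost:27017/customer?tlsAllowInvalidHostnames=true&scanMethod=random
        System.out.println(buildUrl("user", "secret", "localhost", 27017, "customer", options));
    }
}
```

A `LinkedHashMap` is used so the options appear in the URL in the order they were added.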
+
+## Schema Discovery
+
+The Amazon DocumentDB JDBC driver can perform automatic schema discovery and generate an SQL to
+DocumentDB schema mapping. See the [schema discovery documentation](schema/schema-discovery.md)
+for details of this process.
+
+## Schema Management
+
+The SQL to DocumentDB schema mapping can be managed in the following ways:
+
+- generated
+- removed
+- listed
+- exported
+- imported
+
+See the [schema management documentation](schema/manage-schema-cli.md) and
+[table schemas JSON format](schema/table-schemas-json-format.md) for further
+information.
+
+## SQL and JDBC Limitations
 
+The Amazon DocumentDB JDBC driver has a number of important limitations. See the
+[SQL and JDBC limitations documentation](sql/sql-jdbc-limitations.md) for more information.
@@ -0,0 +1,294 @@
+# Schema Discovery
+
+## Automated Discovery Behavior
+
+When a JDBC driver connection needs the schema for a collection in the database,
+it polls the database for all of its collections.
+
+For each collection, the driver determines whether a cached version of its schema already exists.
+If no cached version exists, the driver samples documents from the collection and creates a schema
+based on the behavior described below.
+
+### Schema Generation Limitations
+
+The DocumentDB JDBC driver limits identifiers to 128 characters.
+The schema generator may truncate generated identifiers (table names and column names)
+to fit within that limit.
+
+### Scanning Method Options
+
+The sampling behavior can be modified using connection string or datasource options.
+
+- `scanMethod=<option>`
+    - `random` - (default) The sample documents are returned in _random_ order.
+    - `idForward` - The sample documents are returned in order of `_id`.
+    - `idReverse` - The sample documents are returned in reverse order of `_id`.
+    - `all` - Sample all the documents in the collection.
+- `scanLimit=<n>` - The number of documents to sample. The value must be a positive integer.
+  The default value is `1000`. If `scanMethod` is set to `all`, this option is ignored.
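
To make the difference between the `scanMethod` values concrete, here is a small Java model of which documents end up in the sample. It works on an in-memory list of integer `_id` values and never talks to DocumentDB; the class name `ScanMethodSketch` is hypothetical, and the ordering logic is an assumption based on the option descriptions above, not the driver's code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Illustrative model only: selects which _id values would be sampled for a given
// scanMethod/scanLimit combination. It does not query DocumentDB.
final class ScanMethodSketch {

    enum ScanMethod { RANDOM, ID_FORWARD, ID_REVERSE, ALL }

    static List<Integer> sample(List<Integer> ids, ScanMethod method, int scanLimit, Random rng) {
        List<Integer> ordered = new ArrayList<>(ids);
        if (method == ScanMethod.ALL) {
            return ordered; // every document is sampled; scanLimit is ignored
        }
        if (scanLimit <= 0) {
            throw new IllegalArgumentException("scanLimit must be a positive integer");
        }
        switch (method) {
            case ID_FORWARD:
                ordered.sort(Comparator.naturalOrder());
                break;
            case ID_REVERSE:
                ordered.sort(Comparator.reverseOrder());
                break;
            default: // RANDOM
                Collections.shuffle(ordered, rng);
                break;
        }
        // Keep at most scanLimit documents from the chosen ordering.
        return new ArrayList<>(ordered.subList(0, Math.min(scanLimit, ordered.size())));
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(3, 1, 2, 5, 4);
        System.out.println(sample(ids, ScanMethod.ID_FORWARD, 3, new Random(42))); // [1, 2, 3]
        System.out.println(sample(ids, ScanMethod.ID_REVERSE, 2, new Random(42))); // [5, 4]
        System.out.println(sample(ids, ScanMethod.ALL, 1, new Random(42)).size());  // 5
    }
}
```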
+
+### DocumentDB Data Types
+
+The DocumentDB server supports a number of MongoDB data types. The table below lists MongoDB data
+types, whether DocumentDB supports them, and the JDBC data type each one maps to.
+
+| MongoDB Data Type | Supported in DocumentDB | JDBC Data Type |
+| ----------- | ----------- | ----------- |
+| Binary Data| Yes | VARBINARY |
+| Boolean | Yes | BOOLEAN |
+| Double | Yes | DOUBLE |
+| 32-bit Integer | Yes | INTEGER |
+| 64-bit Integer | Yes | BIGINT |
+| String | Yes | VARCHAR |
+| ObjectId | Yes | VARCHAR |
+| Date | Yes | TIMESTAMP |
+| Null | Yes | VARCHAR |
+| Regular Expression | Yes | VARCHAR |
+| Timestamp | Yes | VARCHAR |
+| MinKey | Yes | VARCHAR |
+| MaxKey | Yes | VARCHAR |
+| Object | Yes | _virtual table_ |
+| Array | Yes | _virtual table_ |
+| Decimal128 | No | DECIMAL |
+| JavaScript | No | VARCHAR |
+| JavaScript (with scope) | No | VARCHAR |
+| Undefined | No | VARCHAR |
+| Symbol | No | VARCHAR |
+| DBPointer (4.0+) | No | VARCHAR |
+
+### Mapping Scalar Document Fields
+
+When scanning a sample of documents from a collection, the JDBC driver creates one or more
+table schemas to represent the samples in the collection. In general, a scalar field in the document
+maps to a column in the table schema. For example, a collection named `team` containing the single
+document `{ "_id" : "112233", "name" : "Alastair", "age" : 25 }` maps to the following schema:
+
+| Table Name | Column Name | Data Type | Key |
+| ---| --- | --- | --- |
+| team | _**team__id**_ | VARCHAR | PK |
+| team | name | VARCHAR | |
+| team | age | INTEGER | |
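
The mapping in the table above can be sketched in Java, with plain Java values standing in for BSON scalars: each field becomes a column, and `_id` is renamed to the collection name followed by `_` and `_id`. The class `ScalarColumnSketch` and its simplified type checks are illustrative assumptions that cover only a few rows of the data type table; they are not the driver's implementation.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch, not the driver's implementation: maps one document's scalar
// fields to (column name -> JDBC type) using plain Java values in place of BSON types.
final class ScalarColumnSketch {

    // Covers a subset of the data type table; everything else falls back to VARCHAR,
    // as String, ObjectId, Null, and several other BSON types do in that table.
    static String jdbcType(Object value) {
        if (value instanceof Boolean) return "BOOLEAN";
        if (value instanceof Integer) return "INTEGER";
        if (value instanceof Long) return "BIGINT";
        if (value instanceof Double) return "DOUBLE";
        if (value instanceof byte[]) return "VARBINARY";
        if (value instanceof Date) return "TIMESTAMP";
        return "VARCHAR";
    }

    static Map<String, String> columns(String collection, Map<String, Object> document) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : document.entrySet()) {
            // "_id" becomes "<collection>__id" (collection name + "_" + "_id"), e.g. team__id.
            String column = field.getKey().equals("_id")
                    ? collection + "_" + field.getKey()
                    : field.getKey();
            result.put(column, jdbcType(field.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", "112233");
        doc.put("name", "Alastair");
        doc.put("age", 25);
        System.out.println(columns("team", doc)); // {team__id=VARCHAR, name=VARCHAR, age=INTEGER}
    }
}
```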
+
+### Data Type Conflict Promotion
+
+When scanning the sampled documents, it is possible that the data types for a field are not
+consistent from document to document. In this case, the JDBC driver will _promote_ the JDBC data
+type to a common data type that will suit all data types from the sampled documents.
+
+#### Example
+
+```json
+{
+  "_id" : "112233",
+  "name" : "Alastair",
+  "age" : 25
+}
+```
+
+```json
+{
+  "_id" : "112244",
+  "name" : "Benjamin",
+  "age" : "32"
+}
+```
+
+The `age` field is of type _32-bit integer_ in the first document but _string_ in the second document.
+Here the JDBC driver will promote the JDBC data type to VARCHAR to handle either data type when
+encountered.
+
+| Table Name | Column Name | Data Type | Key |
+| ---| --- | --- | --- |
+| team | _**team__id**_ | VARCHAR | PK |
+| team | name | VARCHAR | |
+| team | age | VARCHAR | |
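
The promotion step amounts to folding the types seen for one field across all sampled documents. The Java sketch below is a simplified, hypothetical version: identical types are kept and, as an assumption, any mismatch falls back to VARCHAR. That fallback matches the `age` example above, but the driver's full rules are the ones in the scalar-scalar promotion diagram and may promote some pairs differently.

```java
import java.util.List;

// Hypothetical, simplified promotion model. It only encodes "same type stays the same"
// and falls back to VARCHAR for every mismatch; the scalar-scalar diagram has the real rules.
final class TypePromotionSketch {

    static String promote(String current, String observed) {
        if (current == null) {
            return observed;          // first sampled value for this field
        }
        if (current.equals(observed)) {
            return current;           // no conflict
        }
        return "VARCHAR";             // assumed fallback for any conflict
    }

    // Folds the JDBC types observed for one field across all sampled documents.
    static String columnType(List<String> observedTypes) {
        String result = null;
        for (String type : observedTypes) {
            result = promote(result, type);
        }
        return result;
    }

    public static void main(String[] args) {
        // "age" is INTEGER in the first document and VARCHAR (string) in the second.
        System.out.println(columnType(List.of("INTEGER", "VARCHAR"))); // VARCHAR
        System.out.println(columnType(List.of("INTEGER", "INTEGER"))); // INTEGER
    }
}
```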
+
+### Scalar-Scalar Conflict Promotion
+
+The following diagram shows the way in which scalar-scalar data type conflicts are resolved.
+
+![Scalar-Scalar Promotion](../images/ScalarDataTypePromotion-transparent.png)
+
+### Object and Array Data Type Handling
+
+So far, we've only described how scalar data types are mapped. Object and Array data types are
+(currently) mapped to virtual tables. The JDBC driver creates a virtual table to represent
+each object or array field in a document. The name of the mapped virtual table concatenates the
+original collection's name and the field's name, separated by an underscore character ("_").
+
+The base table's primary key (`_id`, exposed as `<collection>__id`) is carried into the
+virtual table, where it also serves as a foreign key to the associated base table.
+
+For embedded array type fields, index columns are generated to represent the
+index into the array at each level of nesting.
+
+#### Embedded Object Field Example
+
+For object fields in a document, a mapping to a virtual table is created by the
+JDBC driver.
+
+Collection: `customer`
+
+```json
+{
+  "_id" : "112233",
+  "name" : "George Jackson",
+  "address" : {
+    "address1" : "123 Avenue Way",
+    "address2" : "Apt. 5",
+    "city" : "Hollywood",
+    "region" : "California",
+    "country" : "USA",
+    "code" : "90210"
+  }
+}
+```
+
+maps to the schema for the `customer` table, ...
+
+| Table Name | Column Name | Data Type | Key |
+| --- | --- | --- | --- |
+| customer | _**customer__id**_ | VARCHAR | PK |
+| customer | name | VARCHAR | |
+
+... and the `customer_address` virtual table
+
+| Table Name | Column Name | Data Type | Key |
+| --- | --- | --- | --- |
+| customer_address | _**customer__id**_ | VARCHAR | PK/FK |
+| customer_address | address1 | VARCHAR | |
+| customer_address | address2 | VARCHAR | |
+| customer_address | city | VARCHAR | |
+| customer_address | region | VARCHAR | |
+| customer_address | country | VARCHAR | |
+| customer_address | code | VARCHAR | |
+
+So the resulting data in the two tables would look like this...
+
+#### Table: customer
+
+| _**customer__id**_ | name |
+| --- | --- |
+| "112233" | "George Jackson" |
+
+#### Virtual Table: customer_address
+
+| _**customer__id**_ | address1 | address2 | city | region | country | code |
+| --- | --- | --- | --- | --- | --- | --- |
+| "112233" | "123 Avenue Way" | "Apt. 5" | "Hollywood" | "California" | "USA" | "90210" |
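
The object-to-virtual-table mapping above can be sketched in Java: the virtual table is named `<collection>_<field>`, and each embedded object contributes one row keyed by `<collection>__id`. `ObjectVirtualTableSketch` is a hypothetical illustration using plain `Map` values, not the driver's implementation, and it only handles a single level of embedding.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: flattens one embedded object field into a row of the
// virtual table "<collection>_<field>", keyed by "<collection>__id".
final class ObjectVirtualTableSketch {

    static String virtualTableName(String collection, String field) {
        return collection + "_" + field;
    }

    static Map<String, Object> virtualRow(String collection, Map<String, Object> document, String field) {
        Object embedded = document.get(field);
        if (!(embedded instanceof Map)) {
            throw new IllegalArgumentException(field + " is not an embedded object");
        }
        Map<String, Object> row = new LinkedHashMap<>();
        // The base document's _id is copied in; it is the PK here and the FK back to the base table.
        row.put(collection + "__id", document.get("_id"));
        for (Map.Entry<?, ?> entry : ((Map<?, ?>) embedded).entrySet()) {
            row.put(String.valueOf(entry.getKey()), entry.getValue());
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("address1", "123 Avenue Way");
        address.put("city", "Hollywood");
        Map<String, Object> customer = new LinkedHashMap<>();
        customer.put("_id", "112233");
        customer.put("name", "George Jackson");
        customer.put("address", address);
        System.out.println(virtualTableName("customer", "address")); // customer_address
        System.out.println(virtualRow("customer", customer, "address"));
        // {customer__id=112233, address1=123 Avenue Way, city=Hollywood}
    }
}
```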
+
+To query the data and return all columns, use the following query with a
+JOIN statement to get the matching address data.
+
+```mysql-sql
+SELECT * FROM "customer"
+  INNER JOIN "customer_address"
+    ON "customer"."customer__id" = "customer_address"."customer__id"
+```
+
+#### Embedded Array Field Example
+
+For array fields in a document, a mapping to a virtual table is also created by the
+JDBC driver.
+
+Collection: `customer1`
+
+```json
+{
+  "_id" : "112233",
+  "name" : "George Jackson",
+  "subscriptions" : [
+    "Vogue",
+    "People",
+    "USA Today"
+  ]
+}
+```
+
+maps to the schema for the `customer1` table, ...
+
+| Table Name | Column Name | Data Type | Key |
+| --- | --- | --- | --- |
+| customer1 | _**customer1__id**_ | VARCHAR | PK |
+| customer1 | name | VARCHAR | |
+
+... and the `customer1_subscriptions` virtual table
+
+| Table Name | Column Name | Data Type | Key |
+| --- | --- | --- | --- |
+| customer1_subscriptions | _**customer1__id**_ | VARCHAR | PK/FK |
+| customer1_subscriptions | subscriptions_index_lvl0 | BIGINT | PK |
+| customer1_subscriptions | value | VARCHAR | |
+
+So the resulting data in the two tables would look like this...
+
+#### Table: customer1
+
+| _**customer1__id**_ | name |
+| --- | --- |
+| "112233" | "George Jackson" |
+
+#### Virtual Table: customer1_subscriptions
+
+| _**customer1__id**_ | subscriptions_index_lvl0 | value |
+| --- | --- | --- |
+| "112233" | 0 | "Vogue" |
+| "112233" | 1 | "People" |
+| "112233" | 2 | "USA Today" |
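
The array expansion above can be sketched the same way: each array element becomes one row carrying the base `_id`, a BIGINT index column named `<field>_index_lvl0`, and a `value` column. `ArrayVirtualTableSketch` is hypothetical and uses plain Java collections; it only covers a single (non-nested) array level and is not the driver's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: expands a single-level array field into rows of the
// virtual table "<collection>_<field>" with "<collection>__id", "<field>_index_lvl0",
// and "value" columns. Nested arrays are not handled.
final class ArrayVirtualTableSketch {

    static List<Map<String, Object>> virtualRows(String collection, Map<String, Object> document, String field) {
        Object array = document.get(field);
        if (!(array instanceof List)) {
            throw new IllegalArgumentException(field + " is not an array");
        }
        List<?> values = (List<?>) array;
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put(collection + "__id", document.get("_id")); // PK part 1, FK to the base table
            row.put(field + "_index_lvl0", (long) i);           // PK part 2, BIGINT array index
            row.put("value", values.get(i));
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Object> customer = new LinkedHashMap<>();
        customer.put("_id", "112233");
        customer.put("name", "George Jackson");
        customer.put("subscriptions", List.of("Vogue", "People", "USA Today"));
        virtualRows("customer1", customer, "subscriptions").forEach(System.out::println);
        // {customer1__id=112233, subscriptions_index_lvl0=0, value=Vogue}
        // {customer1__id=112233, subscriptions_index_lvl0=1, value=People}
        // {customer1__id=112233, subscriptions_index_lvl0=2, value=USA Today}
    }
}
```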
+
+To query the data and return all columns, use the following query with a
+JOIN statement to get the matching _subscriptions_ data.
+
+```mysql-sql
+SELECT * FROM "customer1"
+  INNER JOIN "customer1_subscriptions"
+    ON "customer1"."customer1__id" = "customer1_subscriptions"."customer1__id"
+```
+
+### Scalar-Complex Type Conflict Promotion
+
+Like scalar-scalar type conflicts, the same field in different documents can have conflicting
+data types between complex (array and object) and scalar (integer, boolean, etc.) types. All of these
+conflicts are resolved (promoted) to VARCHAR for those fields. In this case, array and object data
+is returned as its JSON representation.
+
+#### Embedded Array - String Field Conflict Example
+
+Collection: `customer2`
+
+```json
+{
+  "_id" : "112233",
+  "name" : "George Jackson",
+  "subscriptions" : [
+    "Vogue",
+    "People",
+    "USA Today"
+  ]
+}
+```
+
+```json
+{
+  "_id" : "112244",
+  "name" : "Joan Starr",
+  "subscriptions" : 1
+}
+```
+
+maps to the schema for the `customer2` table, ...
+
+| Table Name | Column Name | Data Type | Key |
+| --- | --- | --- | --- |
+| customer2 | _**customer2__id**_ | VARCHAR | PK |
+| customer2 | name | VARCHAR | |
+| customer2 | subscriptions | VARCHAR | |
+
+So the resulting data in the table would look like this...
+
+#### Table: customer2
+
+| _**customer2__id**_ | name | subscriptions |
+| --- | --- | --- |
+| "112233" | "George Jackson" | "\[ \\"Vogue\\", \\"People\\", \\"USA Today\\" \]" |
+| "112244" | "Joan Starr" | "1" |
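
The promoted VARCHAR values in this table can be approximated with a small Java sketch: array values are rendered as JSON text and scalars with their string form. `ComplexConflictSketch` is hypothetical, only handles arrays of strings and numbers, and the exact spacing of the driver's JSON output is an assumption modeled on the table above.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: renders a field value for a column that was promoted to VARCHAR
// because of a scalar/complex conflict. Arrays become JSON text; scalars use toString().
final class ComplexConflictSketch {

    static String toVarchar(Object value) {
        if (value instanceof List) {
            List<?> items = (List<?>) value;
            if (items.isEmpty()) {
                return "[]";
            }
            return items.stream()
                    .map(ComplexConflictSketch::jsonScalar)
                    .collect(Collectors.joining(", ", "[ ", " ]"));
        }
        return String.valueOf(value);
    }

    // Minimal JSON encoding for strings (backslashes and quotes escaped) and numbers.
    private static String jsonScalar(Object item) {
        if (item instanceof String) {
            String escaped = ((String) item).replace("\\", "\\\\").replace("\"", "\\\"");
            return "\"" + escaped + "\"";
        }
        return String.valueOf(item);
    }

    public static void main(String[] args) {
        System.out.println(toVarchar(List.of("Vogue", "People", "USA Today"))); // [ "Vogue", "People", "USA Today" ]
        System.out.println(toVarchar(1)); // 1
    }
}
```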
@@ -5,12 +5,12 @@ file which should work when connecting to Amazon DocumentDB clusters using SSL/T
 However, if you want to provide a new Amazon RDS Certificate Authority root certificate, follow the
 directions below.
 
-The certificates are distributed 
-[bundled or unbundled](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.SSL.html).
+The certificate is distributed 
+[unbundled here](https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem).
 
 1. [Download the root CA certificate](https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem)
 1. We recommend you relocate the file into your user's home directory:
    1. Windows: `$HOME`
    1. macOS/Linux: `~`
 1. Add the `tlsCAFile` option to your [JDBC connection string](connection-string.md).
-   For example, `jdbc:documentdb://localhost:27117/customer?tlsAllowInvalidHostnames=true&tlsCAFile=~/rds-ca-2019-root.pem`
+   For example, `jdbc:documentdb://localhost:27117/customer?tlsAllowInvalidHostnames=true&tlsCAFile=rds-ca-2019-root.pem`
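
Before connecting, you might check that the certificate is actually in your home directory and build the connection string exactly as in the example above. The Java sketch below does that using only the JDK; `CaFileSketch` and its methods are hypothetical, and the sketch makes no assumption about how the driver itself resolves the `tlsCAFile` value.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: checks that the CA certificate sits in the user's home directory
// and builds the example connection string that refers to it by file name.
final class CaFileSketch {

    static final String CA_FILE_NAME = "rds-ca-2019-root.pem";

    // Home directory as reported by the JVM's "user.home" system property.
    static Path expectedCaPath() {
        return Paths.get(System.getProperty("user.home"), CA_FILE_NAME);
    }

    static String exampleUrl(String host, int port, String database) {
        return "jdbc:documentdb://" + host + ":" + port + "/" + database
                + "?tlsAllowInvalidHostnames=true&tlsCAFile=" + CA_FILE_NAME;
    }

    public static void main(String[] args) {
        Path caPath = expectedCaPath();
        if (!Files.isRegularFile(caPath)) {
            System.err.println("Certificate not found at " + caPath);
        }
        System.out.println(exampleUrl("localhost", 27117, "customer"));
    }
}
```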