```diff
-1. [Senzing APIs Bare Metal Usage](#senzing-apis-bare-metal-usage)
-1. [Configuration](#configuration)
-2. [Usage](#usage)
-1. [Docker Usage](#docker-usage)
-1. [Configuration](#configuration-1)
-2. [Usage](#usage-1)
-1. [Items of Note](#items-of-note)
-1. [With Info](#with-info)
-2. [Parallel Processing](#parallel-processing)
-3. [Scalability](#scalability)
-4. [Randomize Input Files](#randomize-input-files)
-5. [Purging Senzing Repository Between Examples](#purging-senzing-repository-between-examples)
-6. [Input Load File Sizes](#input-load-file-sizes)
+1. [Legend]
+1. [Warning]
+1. [Senzing Engine Configuration]
+1. [Senzing APIs Bare Metal Usage]
+1. [Configuration]
+2. [Usage]
+1. [Docker Usage]
+1. [Configuration]
+2. [Usage]
+1. [Items of Note]
+1. [With Info]
+2. [Parallel Processing]
+3. [Scalability]
+4. [Randomize Input Data]
+5. [Purging Senzing Repository Between Examples]
+6. [Input Load File Sizes]
 
 ### Legend
 
```
```diff
@@ -33,7 +40,7 @@ Succinct examples of how you might use the Senzing APIs for operational tasks.
 
 ## Warning
 
-:warning::warning::warning:**Only run the code snippets against a test Senzing database instance.** Running the snippets adds and deletes data, and some snippets purge the entire database of currently ingested data. It is recommended to create a separate test Senzing project if you are using a bare metal Senzing install, or if using Docker a separate Senzing database to use only with the snippets. If you are getting started and are unsure please contact [Senzing Support](https://senzing.zendesk.com/hc/en-us/requests/new). :warning::warning::warning:
+:warning::warning::warning:**Only run the code snippets against a test Senzing database instance.** Running the snippets adds and deletes data, and some snippets purge the entire database of currently ingested data. It is recommended to create a separate test Senzing project if you are using a bare metal Senzing install, or if using Docker a separate Senzing database to use only with the snippets. If you are getting started and are unsure please contact [Senzing Support]. :warning::warning::warning:
 
 ## Senzing Engine Configuration
 
```
```diff
@@ -56,27 +63,32 @@ The JSON configuration string is set via the environment variable `SENZING_ENGIN
 
 ## Senzing APIs Bare Metal Usage
 
-You may already have installed the Senzing APIs and created a Senzing project by following the [Quickstart Guide](https://senzing.zendesk.com/hc/en-us/articles/115002408867-Quickstart-Guide). If not, and you would like to install the Senzing APIs directly on a machine, follow the steps in the [Quickstart Guide](https://senzing.zendesk.com/hc/en-us/articles/115002408867-Quickstart-Guide). Be sure to review the API [Quickstart Roadmap](https://senzing.zendesk.com/hc/en-us/articles/115001579954-API-Quickstart-Roadmap), especially the [System Requirements](https://senzing.zendesk.com/hc/en-us/articles/115010259947).
+You may already have installed the Senzing APIs and created a Senzing project by following the [Quickstart Guide]. If not, and you would like to install the Senzing APIs directly on a machine, follow the steps in the [Quickstart Guide]. Be sure to review the API [Quickstart Roadmap], especially the [System Requirements].
 
 ### Configuration
 
 When using a bare metal install, the initialization parameters used by the Senzing Python utilities are maintained within `<project_path>/etc/G2Module.ini`.
 
 🤔 To convert an existing Senzing project G2Module.ini file to a JSON string use one of the following methods:
```
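The README's own list of conversion methods falls outside this diff. As a hedged alternative sketch (not necessarily one of those methods), Python's standard `configparser` can turn a `G2Module.ini`-style file into a JSON string, assuming each INI section (for example `[PIPELINE]` or `[SQL]`) should become a JSON object of string values. `ini_to_json` is a hypothetical helper name.

```python
import configparser
import json


def ini_to_json(ini_path: str) -> str:
    """Convert a G2Module.ini-style file into a compact JSON string.

    Sketch only: every section becomes a JSON object and every value
    is kept as a string, exactly as it appears in the file.
    """
    # interpolation=None so '%' characters (e.g. in a password inside a
    # CONNECTION string) are passed through untouched.
    parser = configparser.ConfigParser(interpolation=None)
    # configparser lowercases keys by default; keep them as written.
    parser.optionxform = str
    with open(ini_path, encoding="utf-8") as ini_file:
        parser.read_file(ini_file)
    config = {
        section: dict(parser.items(section))
        for section in parser.sections()
    }
    return json.dumps(config)
```

Usage would look like `print(ini_to_json("<project_path>/etc/G2Module.ini"))`, with `<project_path>` pointing at your project.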
```diff
 :pencil2: `<project_path>` in the above examples should point to your project.
 
 ## Docker Usage
 
-The included Dockerfile leverages the [Senzing API runtime](https://github.com/Senzing/senzingapi-runtime) image to provide an environment to run the code snippets.
+The included Dockerfile leverages the [Senzing API runtime] image to provide an environment to run the code snippets.
 
-### Configuration
+### Configuration for Docker usage
 
 When used with a container, the JSON configuration is relative to the paths within the container. The JSON configuration should look like:
 
```
```diff
@@ -125,23 +137,23 @@ When used with a container, the JSON configuration is relative to the paths with
 
 ✏️ You only need to modify the `CONNECTION` string to point to your Senzing database.
 
-### Usage
+### Usage for Docker usage
 
 1. Clone this repository
-2. Export the engine configuration environment variable
+1. Export the engine configuration environment variable
```
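The export command itself lies beyond this hunk. As a hedged sketch of what the snippets then consume, the Python below reads and sanity-checks the exported configuration. Two assumptions: the variable is named `SENZING_ENGINE_CONFIGURATION_JSON` (its name is cut off in the `@@ -56,27 +63,32 @@` hunk header), and the `CONNECTION` string sits under an `SQL` object. `load_engine_config` is a hypothetical helper, not part of the Senzing SDK.

```python
import json
import os

# Assumption: full variable name (truncated in the diff's hunk header).
ENV_VAR = "SENZING_ENGINE_CONFIGURATION_JSON"


def load_engine_config(env_var: str = ENV_VAR) -> dict:
    """Parse and minimally validate the engine configuration JSON."""
    raw = os.environ.get(env_var)
    if raw is None:
        raise RuntimeError(f"{env_var} is not set")
    # Raises json.JSONDecodeError if the exported string is malformed.
    config = json.loads(raw)
    # Assumption: CONNECTION is nested under an "SQL" object.
    if not config.get("SQL", {}).get("CONNECTION"):
        raise ValueError(f"{env_var} has no SQL.CONNECTION value")
    return config
```

Failing fast like this surfaces a missing or malformed export before any engine initialization is attempted.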
````diff
@@ -174,7 +186,7 @@ A feature of Senzing is the capability to pass changes from data manipulation AP
 }
 ```
 
-The AFFECTED_ENTITIES object contains a list of all entity IDs affected. Separate processes can query the affected entities and synchronize changes and information to downstream systems. For additional information see [Real-time replication and analytics](https://senzing.zendesk.com/hc/en-us/articles/4417768234131--Advanced-Real-time-replication-and-analytics).
+The AFFECTED_ENTITIES object contains a list of all entity IDs affected. Separate processes can query the affected entities and synchronize changes and information to downstream systems. For additional information see [Real-time replication and analytics].
 
 ### Parallel Processing
 
````
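The with-info paragraph describes separate processes consuming `AFFECTED_ENTITIES`. A minimal Python sketch of that first step, assuming each element of the list is an object carrying an `ENTITY_ID` key (the full example response is truncated in this diff). `affected_entity_ids` and the sample payload are hypothetical.

```python
import json


def affected_entity_ids(with_info: str) -> list[int]:
    """Return the entity IDs listed under AFFECTED_ENTITIES.

    Assumes each list element looks like {"ENTITY_ID": <int>}.
    """
    info = json.loads(with_info)
    return [item["ENTITY_ID"] for item in info.get("AFFECTED_ENTITIES", [])]


# Hypothetical with-info payload, for illustration only.
sample = (
    '{"DATA_SOURCE": "TEST", "RECORD_ID": "1001", '
    '"AFFECTED_ENTITIES": [{"ENTITY_ID": 1}, {"ENTITY_ID": 7}]}'
)
print(affected_entity_ids(sample))  # → [1, 7]
```

A replication process could, for example, queue these IDs and re-fetch each entity so a downstream copy reflects the latest resolution.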
```diff
@@ -190,14 +202,44 @@ If a single very large load file and 3 machines were available for performing da
 
 When providing your own input file(s) to the snippets or your own applications and processing data manipulation tasks (adding, deleting, replacing), it is important to randomize the file(s) or other input methods when running multiple threads. If source records that pertain to the same entity are clustered together, multiple processes or threads could all be trying to work on the same entity concurrently. This causes contention and overhead resulting in slower performance. To prevent this contention always randomize input data.
 
-You may be able to randomize your input files during ETL and mapping the source data to the [Senzing Entity Specification](https://senzing.zendesk.com/hc/en-us/articles/231925448-Generic-Entity-Specification). Otherwise utilities such as [shuf](https://man7.org/linux/man-pages/man1/shuf.1.html) or [terashuf](https://github.com/alexandres/terashuf) for large files can be used.
+You may be able to randomize your input files during ETL and mapping the source data to the [Senzing Entity Specification]. Otherwise utilities such as [shuf] or [terashuf] for large files can be used.
```
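Where `shuf` is unavailable and the file fits in memory, the same randomization can be sketched in Python. This assumes one JSON record per line (JSON Lines), which is an assumption about the load files rather than something stated here; `shuffle_lines` is a hypothetical helper.

```python
import random
from typing import Optional


def shuffle_lines(src_path: str, dst_path: str, seed: Optional[int] = None) -> int:
    """Write the non-blank lines of src_path to dst_path in random order.

    In-memory shuffle: only suitable for files that fit in RAM.
    Returns the number of lines written.
    """
    with open(src_path, encoding="utf-8") as src:
        lines = [line for line in src if line.strip()]  # drop blank lines
    rng = random.Random(seed)  # seed makes a shuffle reproducible
    rng.shuffle(lines)
    with open(dst_path, "w", encoding="utf-8") as dst:
        for line in lines:
            # Ensure every record ends with a newline, including the last.
            dst.write(line if line.endswith("\n") else line + "\n")
    return len(lines)
```

A fixed seed keeps test runs comparable; for files larger than memory, a tool such as `terashuf` remains the better fit.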
```diff
 
 ### Purging Senzing Repository Between Examples
 
 When trying out different examples you may notice consecutive tasks complete much faster than an initial run. For example, running a loading task for the first time without the data in the system will be representative of load rate. If the same example is subsequently run again without purging the system it will complete much faster. This is because Senzing knows the records already exist in the system and it skips them.
 
-To run the same example again and see representative performance, first [purge](Python/Tasks/Initialization/PurgeRepository.py) the Senzing repository of the loaded data. Some examples don't require purging between running them, an example would be the deleting examples that require data to be ingested first. See the usage notes for each task category for an overview of how to use the snippets.
+To run the same example again and see representative performance, first [purge] the Senzing repository of the loaded data. Some examples don't require purging between running them, an example would be the deleting examples that require data to be ingested first. See the usage notes for each task category for an overview of how to use the snippets.
 
 ### Input Load File Sizes
 
-There are different sized load files within the [Data](Resources/Data/) path that can be used to decrease or increase the volume of data loaded depending on the specification of your hardware. The files are named loadx.json, where the x specifies the number of records in the file.
+There are different sized load files within the [Data] path that can be used to decrease or increase the volume of data loaded depending on the specification of your hardware. The files are named loadx.json, where the x specifies the number of records in the file.
```