Commit 80a7524

committed
#28 Add warning
1 parent eb62958 commit 80a7524

5 files changed

+162
-98
lines changed

.github/workflows/pylint.yaml

+13-13
Original file line numberDiff line numberDiff line change
@@ -13,19 +13,19 @@ jobs:
1313
python-version: ["3.8", "3.9", "3.10"]
1414

1515
steps:
16-
- uses: actions/checkout@v4
16+
- uses: actions/checkout@v4
1717

18-
- name: set up Python ${{ matrix.python-version }}
19-
uses: actions/setup-python@v5
20-
with:
21-
python-version: ${{ matrix.python-version }}
18+
- name: set up Python ${{ matrix.python-version }}
19+
uses: actions/setup-python@v5
20+
with:
21+
python-version: ${{ matrix.python-version }}
2222

23-
- name: install dependencies
24-
run: |
25-
python -m pip install --upgrade pip
26-
pip install pylint
23+
- name: install dependencies
24+
run: |
25+
python -m pip install --upgrade pip
26+
pip install pylint
2727
28-
- name: analysing the code with pylint
29-
run: |
30-
# shellcheck disable=SC2046
31-
pylint $(git ls-files '*.py')
28+
- name: analysing the code with pylint
29+
run: |
30+
# shellcheck disable=SC2046
31+
pylint $(git ls-files '*.py')

.gitignore

+1
Original file line numberDiff line numberDiff line change
@@ -127,3 +127,4 @@ dmypy.json
127127

128128
# Pyre type checker
129129
.pyre/
130+
.history

.project

+1-1
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
11
<?xml version="1.0" encoding="UTF-8"?>
22
<projectDescription>
3-
<name>code-snippets</name>
3+
<name>code-snippets-v3</name>
44
</projectDescription>

CHANGELOG.md

+6-3
Original file line numberDiff line numberDiff line change
@@ -2,9 +2,8 @@
22

33
All notable changes to this project will be documented in this file.
44

5-
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6-
[markdownlint](https://dlaa.me/markdownlint/),
7-
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
5+
The format is based on [Keep a Changelog], [markdownlint],
6+
and this project adheres to [Semantic Versioning].
87

98
## [1.1.1] - 2024-05-24
109

@@ -36,3 +35,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
3635
### Added to 1.0.0
3736

3837
- Initial
38+
39+
[Keep a Changelog]: https://keepachangelog.com/en/1.0.0/
40+
[markdownlint]: https://dlaa.me/markdownlint/
41+
[Semantic Versioning]: https://semver.org/spec/v2.0.0.html

README.md

+141-81
Original file line numberDiff line numberDiff line change
@@ -1,26 +1,34 @@
1-
# code-snippets
1+
# code-snippets-v3
2+
3+
## :warning: Warning
4+
5+
This repository is specifically for Senzing API V3.
6+
It is not designed to work with Senzing SDK V4.
7+
8+
To find the Senzing API V4 version of this repository, visit [code-snippets-v4].
29

310
## Overview
411

5-
Succinct examples of how you might use the Senzing APIs for operational tasks.
12+
Succinct examples of how you might use the Senzing APIs for operational tasks.
13+
614
## Contents
715

8-
1. [Legend](#legend)
9-
1. [Warning](#warning)
10-
1. [Senzing Engine Configuration](#senzing-engine-configuration)
11-
1. [Senzing APIs Bare Metal Usage](#senzing-apis-bare-metal-usage)
12-
1. [Configuration](#configuration)
13-
2. [Usage](#usage)
14-
1. [Docker Usage](#docker-usage)
15-
1. [Configuration](#configuration-1)
16-
2. [Usage](#usage-1)
17-
1. [Items of Note](#items-of-note)
18-
1. [With Info](#with-info)
19-
2. [Parallel Processing](#parallel-processing)
20-
3. [Scalability](#scalability)
21-
4. [Randomize Input Files](#randomize-input-files)
22-
5. [Purging Senzing Repository Between Examples](#purging-senzing-repository-between-examples)
23-
6. [Input Load File Sizes](#input-load-file-sizes)
16+
1. [Legend]
17+
1. [Warning]
18+
1. [Senzing Engine Configuration]
19+
1. [Senzing APIs Bare Metal Usage]
20+
1. [Configuration]
21+
2. [Usage]
22+
1. [Docker Usage]
23+
1. [Configuration]
24+
2. [Usage]
25+
1. [Items of Note]
26+
1. [With Info]
27+
2. [Parallel Processing]
28+
3. [Scalability]
29+
4. [Randomize Input Data]
30+
5. [Purging Senzing Repository Between Examples]
31+
6. [Input Load File Sizes]
2432

2533
### Legend
2634

@@ -30,81 +38,89 @@ Succinct examples of how you might use the Senzing APIs for operational tasks.
3038
1. :pencil2: - A "pencil" icon means that the instructions may need modification before performing.
3139
1. :warning: - A "warning" icon means that something tricky is happening, so pay attention.
3240

33-
3441
## Warning
3542

36-
:warning::warning::warning: __Only run the code snippets against a test Senzing database instance.__ Running the snippets adds and deletes data, and some snippets purge the entire database of currently ingested data. It is recommended to create a separate test Senzing project if you are using a bare metal Senzing install, or if using Docker a separate Senzing database to use only with the snippets. If you are getting started and are unsure please contact [Senzing Support](https://senzing.zendesk.com/hc/en-us/requests/new). :warning::warning::warning:
43+
:warning::warning::warning: **Only run the code snippets against a test Senzing database instance.** Running the snippets adds and deletes data, and some snippets purge the entire database of currently ingested data. It is recommended to create a separate test Senzing project if you are using a bare metal Senzing install, or if using Docker a separate Senzing database to use only with the snippets. If you are getting started and are unsure please contact [Senzing Support]. :warning::warning::warning:
3744

3845
## Senzing Engine Configuration
3946

4047
A JSON configuration string is used by the snippets to specify initialization parameters to the Senzing engine:
4148

4249
```json
4350
{
44-
"PIPELINE":
45-
{
46-
"SUPPORTPATH": "/home/senzing/mysenzproj1/data",
47-
"CONFIGPATH": "/home/senzing/mysenzproj1/etc",
48-
"RESOURCEPATH": "/home/senzing/mysenzproj1/resources"
49-
},
50-
"SQL":
51-
{
52-
"CONNECTION": "postgresql://user:password@host:5432:g2"
53-
}
51+
"PIPELINE": {
52+
"SUPPORTPATH": "/home/senzing/mysenzproj1/data",
53+
"CONFIGPATH": "/home/senzing/mysenzproj1/etc",
54+
"RESOURCEPATH": "/home/senzing/mysenzproj1/resources"
55+
},
56+
"SQL": {
57+
"CONNECTION": "postgresql://user:password@host:5432:g2"
58+
}
5459
}
5560
```
5661

5762
The JSON configuration string is set via the environment variable `SENZING_ENGINE_CONFIGURATION_JSON`.
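A snippet could read and sanity-check that variable before initializing the engine. The following is a minimal sketch, not code from this repository: the helper name `load_engine_config` and the choice to require the `PIPELINE` and `SQL` sections (the two shown in the example above) are assumptions made for illustration.

```python
import json
import os

ENV_VAR = "SENZING_ENGINE_CONFIGURATION_JSON"
# Assumption: treat the two sections from the example above as required.
REQUIRED_SECTIONS = ("PIPELINE", "SQL")


def load_engine_config(environ=os.environ):
    """Return the engine configuration as a dict, or raise ValueError."""
    raw = environ.get(ENV_VAR)
    if not raw:
        raise ValueError(f"{ENV_VAR} is not set")
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(f"{ENV_VAR} is not valid JSON: {err}") from err
    if not isinstance(config, dict):
        raise ValueError(f"{ENV_VAR} must be a JSON object")
    missing = [name for name in REQUIRED_SECTIONS if name not in config]
    if missing:
        raise ValueError(f"{ENV_VAR} is missing sections: {missing}")
    return config
```

Failing early with a clear message is usually easier to debug than an engine initialization error caused by a malformed or missing variable.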
5863

5964
## Senzing APIs Bare Metal Usage
60-
You may already have installed the Senzing APIs and created a Senzing project by following the [Quickstart Guide](https://senzing.zendesk.com/hc/en-us/articles/115002408867-Quickstart-Guide). If not, and you would like to install the Senzing APIs directly on a machine, follow the steps in the[ Quickstart Guide](https://senzing.zendesk.com/hc/en-us/articles/115002408867-Quickstart-Guide). Be sure to review the API [Quickstart Roadmap](https://senzing.zendesk.com/hc/en-us/articles/115001579954-API-Quickstart-Roadmap), especially the [System Requirements](https://senzing.zendesk.com/hc/en-us/articles/115010259947).
65+
66+
You may already have installed the Senzing APIs and created a Senzing project by following the [Quickstart Guide]. If not, and you would like to install the Senzing APIs directly on a machine, follow the steps in the [Quickstart Guide]. Be sure to review the API [Quickstart Roadmap], especially the [System Requirements].
6167

6268
### Configuration
6369

64-
When using a bare metal install, the initialization parameters used by the Senzing Python utilities are maintained within ```<project_path>/etc/G2Module.ini```.
70+
When using a bare metal install, the initialization parameters used by the Senzing Python utilities are maintained within `<project_path>/etc/G2Module.ini`.
6571

6672
🤔To convert an existing Senzing project G2Module.ini file to a JSON string use one of the following methods:
6773

68-
* [G2ModuleIniToJson.py](Python/Tasks/Initialization/)
69-
* Modify the path to your projects G2Module.ini file.
70-
71-
* [jc](https://github.com/kellyjonbrazil/jc)
72-
* ```console
73-
cat <project_path>/etc/G2Module.ini | jc --ini
74-
```
75-
* Python one liner
76-
* ```python
77-
python3 -c $'import configparser; ini_file_name = "<project_path>/etc/G2Module.ini";engine_config_json = {};cfgp = configparser.ConfigParser();cfgp.optionxform = str;cfgp.read(ini_file_name)\nfor section in cfgp.sections(): engine_config_json[section] = dict(cfgp.items(section))\nprint(engine_config_json)'
78-
```
79-
80-
* [SenzingGo.py](https://github.com/Senzing/senzinggo)
81-
* ```console
82-
<project_path>/python/SenzingGo.py --iniToJson
83-
```
84-
74+
- [G2ModuleIniToJson.py]
75+
76+
- Modify the path to your project's G2Module.ini file.
77+
78+
- [jc]
79+
80+
- ```console
81+
cat <project_path>/etc/G2Module.ini | jc --ini
82+
```
83+
84+
- Python one liner
85+
86+
- ```python
87+
python3 -c $'import configparser; ini_file_name = "<project_path>/etc/G2Module.ini";engine_config_json = {};cfgp = configparser.ConfigParser();cfgp.optionxform = str;cfgp.read(ini_file_name)\nfor section in cfgp.sections(): engine_config_json[section] = dict(cfgp.items(section))\nprint(engine_config_json)'
88+
```
89+
90+
- [SenzingGo.py]
91+
92+
- ```console
93+
<project_path>/python/SenzingGo.py --iniToJson
94+
```
95+
8596
:pencil2: `<project_path>` in the above example should point to your project.
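The Python one-liner above prints a Python dictionary, which uses single quotes and is therefore not strict JSON. A slightly longer sketch of the same idea that emits valid JSON is shown here. The function name `ini_to_json` is hypothetical, and disabling interpolation is an assumption made so that a `%` character in a connection string is passed through unchanged.

```python
import configparser
import json


def ini_to_json(ini_text):
    """Convert the text of a G2Module.ini-style file to a JSON string."""
    # interpolation=None keeps literal '%' characters (e.g. in passwords) intact.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve key case, as in the one-liner
    parser.read_string(ini_text)
    config = {section: dict(parser.items(section)) for section in parser.sections()}
    return json.dumps(config)
```

To use it with a project, read the file contents first, for example `ini_to_json(open("<project_path>/etc/G2Module.ini").read())`, and export the result as `SENZING_ENGINE_CONFIGURATION_JSON`.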
8697

8798
### Usage
99+
88100
1. Clone this repository
89-
2. Export the engine configuration obtained for your project from [Configuration](#configuration), e.g.,
101+
1. Export the engine configuration obtained for your project from [Configuration], e.g.,
102+
90103
```console
91104
export SENZING_ENGINE_CONFIGURATION_JSON='{"PIPELINE": {"SUPPORTPATH": "/<project_path>/data", "CONFIGPATH": "<project_path>/etc", "RESOURCEPATH": "<project_path>/resources"}, "SQL": {"CONNECTION": "postgresql://user:password@host:5432:g2"}}'
92105
```
93-
3. Source the Senzing project setupEnv file
106+
107+
1. Source the Senzing project setupEnv file
108+
94109
```console
95110
source <project_path>/setupEnv
96111
```
97-
4. Run code snippets
112+
113+
1. Run code snippets
98114

99115
:pencil2: `<project_path>` in the above examples should point to your project.
100-
101-
116+
102117
## Docker Usage
103118

104-
The included Dockerfile leverages the [Senzing API runtime](https://github.com/Senzing/senzingapi-runtime) image to provide an environment to run the code snippets.
119+
The included Dockerfile leverages the [Senzing API runtime] image to provide an environment to run the code snippets.
105120

106-
### Configuration
107-
When used with a container, the JSON configuration is relative to the paths within the container. The JSON configuration should look like:
121+
### Configuration for Docker usage
122+
123+
When used with a container, the JSON configuration is relative to the paths within the container. The JSON configuration should look like:
108124

109125
```json
110126
{
@@ -121,65 +137,109 @@ The included Dockerfile leverages the [Senzing API runtime](https://github.com/S
121137

122138
✏️You only need to modify the `CONNECTION` string to point to your Senzing database.
123139

124-
### Usage
140+
### Usage for Docker usage
141+
125142
1. Clone this repository
126-
2. Export the engine configuration environment variable
143+
1. Export the engine configuration environment variable
144+
127145
```console
128146
export SENZING_ENGINE_CONFIGURATION_JSON='{"PIPELINE": {"CONFIGPATH": "/etc/opt/senzing", "RESOURCEPATH": "/opt/senzing/g2/resources", "SUPPORTPATH": "/opt/senzing/data"}, "SQL": {"CONNECTION": "postgresql://user:password@host:5432:g2"}}'
129147
```
130-
3. Build the Docker image
131-
```console
148+
149+
1. Build the Docker image
150+
151+
```console
132152
cd <repository_dir>
133-
docker build --tag senzing/code-snippets .
153+
docker build --tag senzing/code-snippets-v3 .
134154
```
135-
4. Run a container
155+
156+
1. Run a container
157+
136158
```console
137159
docker run \
138160
--env SENZING_ENGINE_CONFIGURATION_JSON \
139161
--interactive \
140162
--tty \
141163
--rm \
142-
senzing/code-snippets
164+
senzing/code-snippets-v3
143165
```
144166

145167
✏️You only need to modify the `CONNECTION` string to point to your Senzing database.
146168

147169
## Items of Note
148-
170+
149171
### With Info
172+
150173
A feature of Senzing is the capability to pass changes from data manipulation API calls to downstream systems for analysis, consolidation and replication. Any API that can change the outcome of entity resolution has a "WithInfo" version, for example addRecord and addRecordWithInfo. The "WithInfo" version of the API returns a response message detailing any entities that were affected by the API. In the following example (from addRecordWithInfo) a single entity with the ID 7903 was affected.
174+
151175
```json
152176
{
153-
"DATA_SOURCE": "TEST",
154-
"RECORD_ID": "10945",
155-
"AFFECTED_ENTITIES": [
156-
{
157-
"ENTITY_ID": 7903,
158-
"LENS_CODE": "DEFAULT"
159-
}
160-
],
161-
"INTERESTING_ENTITIES": []
177+
"DATA_SOURCE": "TEST",
178+
"RECORD_ID": "10945",
179+
"AFFECTED_ENTITIES": [
180+
{
181+
"ENTITY_ID": 7903,
182+
"LENS_CODE": "DEFAULT"
183+
}
184+
],
185+
"INTERESTING_ENTITIES": []
162186
}
163187
```
164-
The AFFECTED_ENTITIES object contains a list of all entity IDs affected. Separate processes can query the affected entities and synchronize changes and information to downstream systems. For additional information see [Real-time replication and analytics](https://senzing.zendesk.com/hc/en-us/articles/4417768234131--Advanced-Real-time-replication-and-analytics).
188+
189+
The AFFECTED_ENTITIES object contains a list of all entity IDs affected. Separate processes can query the affected entities and synchronize changes and information to downstream systems. For additional information see [Real-time replication and analytics].
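A downstream process typically only needs the entity IDs from that response. The following minimal sketch extracts them from a "WithInfo" response string. The function name `affected_entity_ids` is hypothetical, and it assumes only the response shape shown in the example above.

```python
import json


def affected_entity_ids(with_info_json):
    """Return the ENTITY_ID values listed under AFFECTED_ENTITIES."""
    info = json.loads(with_info_json)
    # A missing AFFECTED_ENTITIES key is treated as "no entities affected".
    return [entity["ENTITY_ID"] for entity in info.get("AFFECTED_ENTITIES", [])]
```

The returned IDs could then be placed on a queue for a separate process to fetch and synchronize.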
165190

166191
### Parallel Processing
192+
167193
Many of the example tasks demonstrate concurrent execution with threads. Because the entity resolution process involves IO operations, using concurrent processes and threads when calling the Senzing APIs provides scalability and performance. If using multiple processes, each process should have its own instance of a Senzing engine, for example G2Engine. Each engine object can support multiple threads.
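The threading pattern can be sketched with the standard library alone. In this sketch `add_record` is a hypothetical stand-in for whatever callable wraps the engine call in a real snippet; one engine object would be shared by all threads in the pool. It is an illustration of the pattern, not the code used by the snippets in this repository.

```python
import concurrent.futures


def load_records(records, add_record, max_workers=4):
    """Call add_record(record) for every record on a thread pool.

    Returns (loaded_count, errors), where errors is a list of
    (record, exception) pairs for the calls that failed.
    """
    loaded = 0
    errors = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its record so failures can be reported.
        futures = {executor.submit(add_record, record): record for record in records}
        for future in concurrent.futures.as_completed(futures):
            try:
                future.result()
                loaded += 1
            except Exception as err:  # keep going; report failures at the end
                errors.append((futures[future], err))
    return loaded, errors
```

Collecting errors instead of stopping at the first one lets a long load finish and report all of the records that need attention.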
168194

169195
### Scalability
170-
Many of the examples demonstrate using multiple threads to utilize the resources available on the machine. Consider loading data into Senzing and increasing the load rate, loading (and other tasks) can be horizontally scaled by utilizing additional machines.
171196

172-
If a single very large load file and 3 machines were available for performing data load, the file can be split into 3 with each machine running the sample code or your own application. Horizontal scaling such as this does require the Senzing database to have the capacity to accept the additional workload and not become the bottleneck.
197+
Many of the examples demonstrate using multiple threads to utilize the resources available on the machine. Consider loading data into Senzing and increasing the load rate, loading (and other tasks) can be horizontally scaled by utilizing additional machines.
198+
199+
If a single very large load file and 3 machines were available for performing data load, the file can be split into 3 with each machine running the sample code or your own application. Horizontal scaling such as this does require the Senzing database to have the capacity to accept the additional workload and not become the bottleneck.
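One way to split a load file for several machines is to deal its lines out round-robin. This is a minimal in-memory sketch with a hypothetical function name, `split_round_robin`; for very large files a streaming approach or an existing command-line tool is likely a better fit.

```python
def split_round_robin(lines, parts):
    """Distribute lines across `parts` lists, one line at a time."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    buckets = [[] for _ in range(parts)]
    for index, line in enumerate(lines):
        buckets[index % parts].append(line)
    return buckets
```

Each resulting list can be written to its own file and loaded by a separate machine. Round-robin distribution keeps the parts within one line of each other in size.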
173200

174201
### Randomize Input Data
175-
When providing your own input file(s) to the snippets or your own applications and processing data manipulation tasks (adding, deleting, replacing), it is important to randomize the file(s) or other input methods when running multiple threads. If source records that pertain to the same entity are clustered together, multiple processes or threads could all be trying to work on the same entity concurrently. This causes contention and overhead resulting in slower performance. To prevent this contention always randomize input data.
176202

177-
You may be able to randomize your input files during ETL and mapping the source data to the [Senzing Entity Specification](https://senzing.zendesk.com/hc/en-us/articles/231925448-Generic-Entity-Specification). Otherwise utilities such as [shuf](https://man7.org/linux/man-pages/man1/shuf.1.html) or [terashuf](https://github.com/alexandres/terashuf) for large files can be used.
203+
When providing your own input file(s) to the snippets or your own applications and processing data manipulation tasks (adding, deleting, replacing), it is important to randomize the file(s) or other input methods when running multiple threads. If source records that pertain to the same entity are clustered together, multiple processes or threads could all be trying to work on the same entity concurrently. This causes contention and overhead resulting in slower performance. To prevent this contention always randomize input data.
204+
205+
You may be able to randomize your input files during ETL and mapping the source data to the [Senzing Entity Specification]. Otherwise utilities such as [shuf] or [terashuf] for large files can be used.
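When a file fits in memory, the shuffle can also be done in Python. This is a minimal sketch with a hypothetical function name, `shuffle_lines`; it assumes one record per line and is not intended for files larger than available memory, where the utilities named above are more appropriate.

```python
import random


def shuffle_lines(lines, seed=None):
    """Return a new list with the lines in random order.

    Pass a seed to make the order reproducible between runs.
    """
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

A typical use would be to read the lines of a load file, pass them through `shuffle_lines`, and write the result to a new file before loading.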
178206

179207
### Purging Senzing Repository Between Examples
208+
180209
When trying out different examples you may notice consecutive tasks complete much faster than an initial run. For example, running a loading task for the first time without the data in the system will be representative of load rate. If the same example is subsequently run again without purging the system it will complete much faster. This is because Senzing knows the records already exist in the system and it skips them.
181210

182-
To run the same example again and see representative performance, first [purge](Python/Tasks/Initialization/PurgeRepository.py) the Senzing repository of the loaded data. Some examples don't require purging between running them, an example would be the deleting examples that require data to be ingested first. See the usage notes for each task category for an overview of how to use the snippets.
211+
To run the same example again and see representative performance, first [purge] the Senzing repository of the loaded data. Some examples don't require purging between running them, an example would be the deleting examples that require data to be ingested first. See the usage notes for each task category for an overview of how to use the snippets.
183212

184213
### Input Load File Sizes
185-
There are different sized load files within the [Data](Resources/Data/) path that can be used to decrease or increase the volume of data loaded depending on the specification of your hardware. The files are named loadx.json, where the x specifies the number of records in the file.
214+
215+
There are different sized load files within the [Data] path that can be used to decrease or increase the volume of data loaded depending on the specification of your hardware. The files are named loadx.json, where the x specifies the number of records in the file.
216+
217+
[code-snippets-v4]: https://github.com/Senzing/code-snippets-v4
218+
[Configuration]: #configuration
219+
[Data]: Resources/Data/
220+
[Docker Usage]: #docker-usage
221+
[G2ModuleIniToJson.py]: Python/Tasks/Initialization/
222+
[Input Load File Sizes]: #input-load-file-sizes
223+
[Items of Note]: #items-of-note
224+
[jc]: https://github.com/kellyjonbrazil/jc
225+
[Legend]: #legend
226+
[Parallel Processing]: #parallel-processing
227+
[purge]: Python/Tasks/Initialization/PurgeRepository.py
228+
[Purging Senzing Repository Between Examples]: #purging-senzing-repository-between-examples
229+
[Quickstart Guide]: https://senzing.zendesk.com/hc/en-us/articles/115002408867-Quickstart-Guide
230+
[Quickstart Roadmap]: https://senzing.zendesk.com/hc/en-us/articles/115001579954-API-Quickstart-Roadmap
231+
[Randomize Input Data]: #randomize-input-data
232+
[Real-time replication and analytics]: https://senzing.zendesk.com/hc/en-us/articles/4417768234131--Advanced-Real-time-replication-and-analytics
233+
[Scalability]: #scalability
234+
[Senzing API runtime]: https://github.com/Senzing/senzingapi-runtime
235+
[Senzing APIs Bare Metal Usage]: #senzing-apis-bare-metal-usage
236+
[Senzing Engine Configuration]: #senzing-engine-configuration
237+
[Senzing Entity Specification]: https://senzing.zendesk.com/hc/en-us/articles/231925448-Generic-Entity-Specification
238+
[Senzing Support]: https://senzing.zendesk.com/hc/en-us/requests/new
239+
[SenzingGo.py]: https://github.com/Senzing/senzinggo
240+
[shuf]: https://man7.org/linux/man-pages/man1/shuf.1.html
241+
[System Requirements]: https://senzing.zendesk.com/hc/en-us/articles/115010259947
242+
[terashuf]: https://github.com/alexandres/terashuf
243+
[Usage]: #usage
244+
[Warning]: #warning
245+
[With Info]: #with-info
