Add Link Resolver Module with CRUD functionality and DTOs #1

Open
wants to merge 11 commits into
base: staging
11 changes: 10 additions & 1 deletion .env
Original file line number Diff line number Diff line change
Expand Up @@ -9,4 +9,13 @@ DATABASE_URL="postgres://tkawcvceqtdlrt:b63809a94a8d366a92eca1d489dbc43bcc7a58fd
SHADOW_DATABASE_URL="postgres://kyrhydpwvpavke:7917c48b5ad1e3cfc294df930e053075270752c19bd13c1ea6fd31280722735c@ec2-44-205-112-253.compute-1.amazonaws.com:5432/dfdm5lo7eed2pb"


JWT_SECRET='mSSS9Zrd'
JWT_SECRET='mSSS9Zrd'

# MinIO Configuration
MINIO_ENDPOINT=localhost
MINIO_PORT=9000
MINIO_USE_SSL=false
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_BUCKET=link-resolvers
MINIO_REGION=us-east-1
91 changes: 91 additions & 0 deletions DATA_INTEGRITY.md
@@ -0,0 +1,91 @@
# GS1 Identity Resolver Data Integrity

The GS1 Identity Resolver stores its data in AWS S3 or MinIO. Both return the latest version of an object after a successful write, but neither coordinates concurrent writers: with plain PUTs the last write wins, and there are no multi-object transactions. We've therefore implemented additional mechanisms to ensure data integrity and handle concurrent modifications.

## ETag-Based Optimistic Concurrency Control

Our system uses the built-in ETag feature of S3/MinIO to provide optimistic concurrency control:

1. **ETags for File Versioning**: Every object in S3/MinIO has an ETag that changes whenever the object's content changes. We use this to detect concurrent modifications.

2. **Conditional Operations**: When updating data, we include the original ETag to ensure that the object hasn't been modified by another process since it was retrieved.

3. **Optimistic Locking**: If the ETags don't match during an update, the operation fails, and the client must retrieve the latest version and try again.

4. **Immutable History Records**: All changes to product data are recorded in immutable history files, with each historical version being preserved with its own ETag.
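
The four mechanisms above amount to a compare-and-swap keyed on the ETag. The sketch below is a minimal illustration, not the PR's implementation: `InMemoryBucket`, `PutResult`, and the counter-based ETags are hypothetical stand-ins for what S3/MinIO would provide.

```typescript
// Hypothetical in-memory stand-in for an S3/MinIO bucket; not the PR's MinioService.
type StoredObject = { body: string; etag: string };
type PutResult =
  | { ok: true; etag: string }
  | { ok: false; currentEtag: string | undefined };

class InMemoryBucket {
  private objects: Record<string, StoredObject | undefined> = {};
  private versions: Record<string, StoredObject[] | undefined> = {};
  private counter = 0;

  get(key: string): StoredObject | undefined {
    const found = this.objects[key];
    return found ? { ...found } : undefined;
  }

  // Writes `body` only if `ifMatch` equals the stored ETag.
  // Omitting `ifMatch` is an unconditional write (last writer wins).
  put(key: string, body: string, ifMatch?: string): PutResult {
    const current = this.objects[key];
    if (ifMatch !== undefined && (!current || current.etag !== ifMatch)) {
      return { ok: false, currentEtag: current ? current.etag : undefined };
    }
    this.counter += 1;
    const next: StoredObject = { body, etag: `v${this.counter}` };
    this.objects[key] = next;
    // Append-only history: earlier versions are never rewritten.
    const past = this.versions[key] || [];
    this.versions[key] = past.concat([{ ...next }]);
    return { ok: true, etag: next.etag };
  }

  history(key: string): StoredObject[] {
    return (this.versions[key] || []).map((v) => ({ ...v }));
  }
}

// Two clients read the same version, then both try to update it.
const key = "product/01/12345678901234";
const bucket = new InMemoryBucket();
bucket.put(key, '{"name":"Product Name"}');
const seenByA = bucket.get(key)!;
const seenByB = bucket.get(key)!;

const writeA = bucket.put(key, '{"name":"From A"}', seenByA.etag);
const writeB = bucket.put(key, '{"name":"From B"}', seenByB.etag);

console.log(writeA.ok); // true: A's ETag was still current
console.log(writeB.ok); // false: A's write changed the ETag, so B must re-read
```

Client B's write is rejected rather than silently overwriting A's change, and `bucket.history(key)` still holds both accepted versions.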

## Verification and Usage

### API Endpoints

The system provides API endpoints to verify ETag-based concurrency control:

- `GET /gs1/verify/{entityType}/{entityId}` - Verify ETag exists for a specific entity
- `GET /gs1/verify/metadata` - Verify ETag exists for system metadata

### CLI Commands

CLI commands for verification:

```bash
# Verify ETag concurrency control for a product
yarn gs1:verify -t product -i 01/12345678901234

# Verify ETag concurrency control for system metadata
yarn gs1:verify -t metadata -i system

# Using the verification script
./verify-integrity.sh product 01/12345678901234
```

## Usage Examples

### Retrieving Data with ETag for Updates

To update data, first retrieve it with the ETag:

```http
GET /gs1/products/01/12345678901234?includeETag=true
```

Response:
```json
{
  "id": "01/12345678901234",
  "name": "Product Name",
  "data": "...",
  "_etag": "a1b2c3d4..."
}
```

### Updating with ETag Validation

When updating, include the original ETag:

```http
POST /gs1/products/01/12345678901234
Content-Type: application/json

{
  "id": "01/12345678901234",
  "name": "Updated Product Name",
  "data": "...",
  "_etag": "a1b2c3d4..."
}
```

If the object has been modified by another process since retrieval, the update will fail with a 409 Conflict response, indicating that the client needs to fetch the latest version and try again.
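
A minimal sketch of how such a check could map onto HTTP status codes follows. It is written under assumptions: `applyUpdate`, the record shapes, and the `currentEtag` field in the error body are hypothetical, and the new ETag is passed in as `nextEtag` to keep the example deterministic (the store would normally generate it). Treating a missing `_etag` as an unconditional write mirrors the concurrent update in `api-examples/etag-update-flow.sh`, which posts without one.

```typescript
// Hypothetical request handling; names and response shapes are assumptions.
type ProductUpdate = { name?: string; data?: string; _etag?: string };
type StoredProduct = { fields: Record<string, string>; etag: string };
type HttpResult = { status: 200 | 409; body: Record<string, string> };

function applyUpdate(
  stored: StoredProduct,
  update: ProductUpdate,
  nextEtag: string
): HttpResult {
  // Reject when the client supplied an ETag that is no longer current.
  if (update._etag !== undefined && update._etag !== stored.etag) {
    return {
      status: 409,
      body: { message: "Concurrency conflict", currentEtag: stored.etag },
    };
  }
  // Merge the submitted fields; `_etag` itself is never stored as data.
  const fields: Record<string, string> = { ...stored.fields };
  if (update.name !== undefined) fields["name"] = update.name;
  if (update.data !== undefined) fields["data"] = update.data;
  stored.fields = fields;
  stored.etag = nextEtag;
  return { status: 200, body: { ...fields, _etag: nextEtag } };
}

const product: StoredProduct = { fields: { name: "Product Name" }, etag: "a1" };
const first = applyUpdate(product, { name: "Updated Product Name", _etag: "a1" }, "b2");
const stale = applyUpdate(product, { name: "Late write", _etag: "a1" }, "c3");
console.log(first.status, stale.status); // 200 409
```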

## Benefits

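1. **Tamper Detection**: Comparing ETags reveals when an object changed between a client's read and its write. An ETag is a content fingerprint rather than a signature, so it flags unexpected changes but does not by itself protect against anyone who has write access to the bucket.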

2. **Corruption Prevention**: By verifying ETags before updates, we prevent accidental overwrites of data changed by other processes.

3. **Optimistic Concurrency**: The system provides optimistic concurrency control, allowing high throughput while still preventing data corruption from concurrent modifications.

4. **Audit Trail**: Immutable history records provide a complete audit trail of all changes to product data.

## Implementation Details

- `MinioService` includes methods for handling ETags, such as `getFileWithETag`, `uploadFile` (with optional ETag parameter), and `getETag`.
- `GS1StorageService` leverages these methods to implement ETag-based concurrency control at a higher level.
- `GS1ResolverService` provides a clean API that handles ETags transparently for client applications.
84 changes: 84 additions & 0 deletions ETAG_CONCURRENCY.md
@@ -0,0 +1,84 @@
# ETag-Based Concurrency Control for GS1 Identity Resolver

This document provides an overview of the ETag-based concurrency control implemented in the GS1 Identity Resolver system.

## What are ETags?

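An ETag (entity tag) is an HTTP response header used for web cache validation and for conditional requests that prevent the "lost update" problem. In S3/MinIO, the ETag is derived from the object content and changes whenever that content changes. For objects uploaded in a single PUT without SSE-KMS or SSE-C encryption it is the MD5 digest of the content; for multipart uploads it is not a plain MD5 of the whole object, so clients should treat it as an opaque version identifier.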
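
To make the two ETag shapes concrete, here is a small sketch using Node's built-in `crypto` module. `singlePartETag` and `multipartETag` are hypothetical helper names, and the multipart formula (MD5 over the concatenated part digests plus a `-<parts>` suffix) reflects commonly observed S3/MinIO behaviour rather than a documented contract, so it should be read as an assumption.

```typescript
import { createHash } from "crypto";

// Single PUT without SSE-KMS/SSE-C: the ETag is the hex MD5 of the bytes.
function singlePartETag(body: Buffer): string {
  return createHash("md5").update(body).digest("hex");
}

// Multipart upload (assumed formula): MD5 over the concatenated binary
// digests of each part, followed by "-<number of parts>".
function multipartETag(parts: Buffer[]): string {
  const digests = parts.map((p) => createHash("md5").update(p).digest());
  const combined = createHash("md5").update(Buffer.concat(digests)).digest("hex");
  return `${combined}-${parts.length}`;
}

console.log(singlePartETag(Buffer.from("hello"))); // 5d41402abc4b2a76b9719d911017c592
console.log(multipartETag([Buffer.from("hel"), Buffer.from("lo")])); // ends in "-2"
```

The same bytes produce different ETags depending on how they were uploaded, which is why comparing an ETag against a locally computed MD5 is unreliable, while comparing it against a previously returned ETag is safe.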

## Why ETags instead of Custom Hash Files?

We've migrated from our previous approach of managing separate hash files to using the built-in ETags provided by S3/MinIO for several reasons:

1. **Simplicity**: Using native ETags eliminates the need for maintaining separate hash files.
2. **Efficiency**: No extra storage or computation costs associated with calculating and storing custom hashes.
3. **Native Integration**: S3/MinIO already provides ETags for all objects by default.
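4. **Atomicity**: When the ETag is sent as a precondition on the write itself (for example an `If-Match` header on a conditional PUT, where the store version supports it), the comparison and the write happen as one atomic step. A check performed in application code before a separate, unconditional write leaves a window in which another writer can slip in.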

## How ETag Concurrency Works

### Basic Flow

1. **Retrieve with ETag**: When retrieving a resource for update purposes, include the query parameter `?includeETag=true` to get the current ETag.
   ```
   GET /gs1/products/01/12345678901234?includeETag=true
   ```

2. **Submit with ETag**: When updating, include the original ETag in your request body:
   ```json
   {
     "name": "Updated Product",
     "_etag": "a1b2c3d4..."
   }
   ```

3. **Server Verification**: The server validates that the ETag matches before applying the update. If the ETags don't match, the update is rejected with a 409 Conflict response.

Comment on lines +18 to +36

⚠️ Potential issue

Add language specifiers to fenced code blocks.

The code examples for ETag usage are helpful, but the Markdown code fence blocks are missing language specifiers.

-   ```
+   ```http
    GET /gs1/products/01/12345678901234?includeETag=true
    ```
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

23-23: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

### Handling Conflicts

If you receive a 409 Conflict response, it means someone else has modified the resource since you retrieved it. To resolve this:

1. Retrieve the resource again to get the latest version and ETag.
2. Apply your changes to this latest version.
3. Submit the update with the new ETag.
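
The three steps above can be sketched as a bounded retry loop. This is an illustration under assumptions: `updateWithRetry`, `Versioned`, and `WriteOutcome` are hypothetical names, and the calls are synchronous to keep the example short, whereas a real client would issue its HTTP requests asynchronously.

```typescript
// Hypothetical client-side helper; not part of the PR.
type Versioned<T> = { value: T; etag: string };
type WriteOutcome = { status: 200; etag: string } | { status: 409 };

function updateWithRetry<T>(
  read: () => Versioned<T>,
  write: (value: T, etag: string) => WriteOutcome,
  change: (latest: T) => T,
  maxAttempts = 3
): string {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const latest = read(); // 1. fetch the latest version and ETag
    const outcome = write(change(latest.value), latest.etag); // 2-3. reapply, submit
    if (outcome.status === 200) return outcome.etag;
  }
  throw new Error(`Gave up after ${maxAttempts} conflicting attempts`);
}

// Fake server state, plus one interfering writer that fires before the first write.
let server: Versioned<{ name: string }> = { value: { name: "Product Name" }, etag: "v1" };
let interfere = true;
let reads = 0;

const finalEtag = updateWithRetry(
  () => {
    reads += 1;
    return { value: { ...server.value }, etag: server.etag };
  },
  (value, etag) => {
    if (interfere) {
      interfere = false;
      server = { value: { name: "Changed elsewhere" }, etag: "v2" };
    }
    if (etag !== server.etag) return { status: 409 };
    server = { value, etag: `v${Number(server.etag.slice(1)) + 1}` };
    return { status: 200, etag: server.etag };
  },
  (latest) => ({ ...latest, name: "Updated Product Name" })
);

console.log(reads, finalEtag, server.value.name); // 2 v3 Updated Product Name
```

Bounding the number of attempts keeps a heavily contended resource from trapping the client in an endless loop; the caller can surface the final conflict to the user instead.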

## Implementation Details

- `MinioService` handles the low-level interactions with S3/MinIO, including ETag retrieval and conditional operations.
- `GS1StorageService` implements higher-level logic for managing GS1 entities with ETag concurrency control.
- `GS1ResolverService` provides a clean API that handles the business logic, including ETag verification.

## Testing ETag Concurrency

You can test ETag concurrency control using the provided tools:

### API Endpoints
```
GET /gs1/verify/{entityType}/{entityId}
```

### Command Line
```bash
# Using npm/yarn scripts
yarn gs1:verify:etag -t product -i 01/12345678901234

# Using the shell script
./verify-integrity.sh product 01/12345678901234
```

### API Client Example
We've included an example script showing how to use ETags in an API client:
```bash
# Run the example
./api-examples/etag-update-flow.sh

# Simulate a conflict
./api-examples/etag-update-flow.sh --simulate-conflict
```
Comment on lines +51 to +77

⚠️ Potential issue

Comprehensive testing instructions with missing language specifier.

The testing instructions are detailed and cover multiple approaches (API, CLI, sample script), but there's a code block missing a language specifier.

-```
+```http
 GET /gs1/verify/{entityType}/{entityId}

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.17.2)</summary>

56-56: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

</details>

</details>



## Benefits

- **Prevents Data Loss**: Ensures updates are based on the most recent version of the data.
- **Optimistic Concurrency**: Allows high throughput without locking, only rejecting conflicting updates.
- **Data Integrity**: Helps maintain consistency and integrity of your GS1 data.
- **Simplified Architecture**: Leverages native S3/MinIO capabilities without additional complexity.
92 changes: 92 additions & 0 deletions api-examples/etag-update-flow.sh
@@ -0,0 +1,92 @@
#!/bin/bash

# Example script showing how to use ETags for optimistic concurrency control
# when updating resources in the GS1 Identity Resolver

API_URL="http://localhost:3000/gs1"
PRODUCT_ID="01/12345678901234"

echo "GS1 Identity Resolver API - ETag Concurrency Control Example"
echo "==========================================================="
echo

# Step 1: Get the product with ETag
echo "Step 1: Retrieving product with ETag..."
PRODUCT_INFO=$(curl -s "${API_URL}/products/${PRODUCT_ID}?includeETag=true")
echo "Product retrieved:"
echo "$PRODUCT_INFO" | python3 -m json.tool
echo

# Extract the ETag from the product information
ETAG=$(echo "$PRODUCT_INFO" | grep -o '"_etag"[[:space:]]*:[[:space:]]*"[^"]*"' | sed -E 's/^"_etag"[[:space:]]*:[[:space:]]*"//; s/"$//')
echo "ETag extracted: $ETAG"
echo

# Simulate another client modifying the product
simulate_concurrent_update() {
  echo "Simulating concurrent update by another client..."

  # Create a temporary product data file with a different description
  cat > /tmp/concurrent_update.json <<EOF
{
  "name": "Product modified by other client",
  "description": "This change was made by a different client"
}
EOF

  # Make the concurrent update (no _etag, so it is applied unconditionally)
  curl -s -X POST \
    -H "Content-Type: application/json" \
    -d @/tmp/concurrent_update.json \
    "${API_URL}/products/${PRODUCT_ID}" > /dev/null

  rm /tmp/concurrent_update.json

  echo "Concurrent update completed."
  echo
}

# Step 2: Try to update with valid ETag
echo "Step 2: Updating product with ETag..."

# Create a temporary product data file
cat > /tmp/product_update.json <<EOF
{
  "name": "Updated Product Name",
  "description": "This is an updated description with ETag validation",
  "_etag": "$ETAG"
}
EOF

# Simulate concurrent update if requested
if [ "$1" == "--simulate-conflict" ]; then
  simulate_concurrent_update
fi

# Make the update request
UPDATE_RESULT=$(curl -s -X POST \
  -H "Content-Type: application/json" \
  -d @/tmp/product_update.json \
  "${API_URL}/products/${PRODUCT_ID}")

rm /tmp/product_update.json

# Check if the update succeeded or failed due to ETag mismatch
if echo "$UPDATE_RESULT" | grep -q "Concurrency conflict"; then
  echo "❌ Update FAILED due to ETag mismatch (concurrent modification detected)"
  echo "Error message:"
  echo "$UPDATE_RESULT" | python3 -m json.tool
  echo
  echo "To resolve this conflict:"
  echo "1. Get the latest version of the resource with a fresh ETag"
  echo "2. Apply your changes to this latest version"
  echo "3. Submit the update with the new ETag"
else
  echo "✅ Update SUCCEEDED - No conflicts detected"
  echo "Updated product:"
  echo "$UPDATE_RESULT" | python3 -m json.tool
fi

echo
echo "To simulate a conflict, run this script with the --simulate-conflict parameter:"
echo " $0 --simulate-conflict"
17 changes: 17 additions & 0 deletions docker-compose.minio.yml
@@ -0,0 +1,17 @@
version: '3.7'

services:
  minio:
    image: minio/minio
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    volumes:
      - minio-data:/data
    command: server --console-address ":9001" /data

volumes:
  minio-data:
38 changes: 38 additions & 0 deletions initialize-gs1.sh
@@ -0,0 +1,38 @@
#!/bin/bash

# Start MinIO in the background using Docker Compose
echo "Starting MinIO..."
docker compose -f docker-compose.minio.yml up -d

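# Wait for MinIO to report healthy (up to about 30 seconds) rather than relying on a fixed delay
echo "Waiting for MinIO to start..."
for _ in $(seq 1 30); do
  curl -sf "http://localhost:9000/minio/health/live" > /dev/null && break
  sleep 1
done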

# Run the GS1 initialization command
echo "Initializing GS1 identity resolver with sample data..."
yarn ts-node src/cli.ts initialize-gs1

# Verify data integrity after initialization
echo ""
echo "Verifying data integrity of the system metadata..."
yarn ts-node src/cli.ts verify-integrity -t metadata -i system

echo ""
echo "Verifying data integrity of a sample product..."
yarn ts-node src/cli.ts verify-integrity -t product -i "01/12345678901234"

echo ""
echo "GS1 Identity Resolver has been initialized!"
echo "You can access MinIO console at: http://localhost:9001"
echo "Login with: minioadmin / minioadmin"
echo "Check the gs1-identity-resolver bucket for your data"
echo ""
echo "To test the API, try these endpoints:"
echo "- http://localhost:3000/gs1/products/01/12345678901234"
echo "- http://localhost:3000/gs1/products/01/12345678901235/10/ABC123"
echo "- http://localhost:3000/gs1/01/12345678901234 (Digital Link)"
echo ""
echo "To verify data integrity via API:"
echo "- http://localhost:3000/gs1/verify/product/01/12345678901234"
echo "- http://localhost:3000/gs1/verify/metadata/system"
echo ""