
Add dtos and other classes for task #2446


Merged 92 commits on Feb 13, 2025

Commits
753618b
Util classes for data loader
inv-jishnu Dec 4, 2024
8d39d02
Fix spotbug issue
inv-jishnu Dec 4, 2024
bf94c49
Removed error message and added core error
inv-jishnu Dec 6, 2024
47be388
Applied spotless
inv-jishnu Dec 6, 2024
913eb1c
Fixed unit test failures
inv-jishnu Dec 6, 2024
1f204b8
Merge branch 'master' into feat/data-loader/utils
ypeckstadt Dec 11, 2024
6cfa83a
Basic data import enum and exception
inv-jishnu Dec 11, 2024
d381b2b
Removed exception class for now
inv-jishnu Dec 11, 2024
67f2474
Added DECIMAL_FORMAT
inv-jishnu Dec 12, 2024
14e3593
Path util class updated
inv-jishnu Dec 12, 2024
a096d51
Feedback changes
inv-jishnu Dec 13, 2024
dbf1940
Merge branch 'master' into feat/data-loader/utils
ypeckstadt Dec 13, 2024
cd8add9
Merge branch 'master' into feat/data-loader/utils
ypeckstadt Dec 16, 2024
52890c8
Changes
inv-jishnu Dec 16, 2024
5114639
Merge branch 'master' into feat/data-loader/import-data-1
inv-jishnu Dec 17, 2024
4f9cd75
Merge branch 'feat/data-loader/utils' into feat/data-loader/scaladb-dao
inv-jishnu Dec 17, 2024
1997eb8
Added ScalarDB Dao
inv-jishnu Dec 17, 2024
91e6310
Merge branch 'master' into feat/data-loader/scaladb-dao
inv-jishnu Dec 17, 2024
8a7338b
Remove unnecessary files
inv-jishnu Dec 17, 2024
2b52eeb
Initial commit [skip ci]
inv-jishnu Dec 17, 2024
e206073
Changes
inv-jishnu Dec 17, 2024
26d3144
Changes
inv-jishnu Dec 18, 2024
b86487d
spotbugs exclude
inv-jishnu Dec 18, 2024
818a2b4
spotbugs exclude -2
inv-jishnu Dec 18, 2024
90c4105
Added a file [skip ci]
inv-jishnu Dec 18, 2024
3d5d3e0
Added unit test files [skip ci]
inv-jishnu Dec 18, 2024
6495202
Spotbug fixes
inv-jishnu Dec 19, 2024
90abd9e
Removed use of List.of to fix CI error
inv-jishnu Dec 19, 2024
ba2b3dd
Merged changes from master after resolving conflict
inv-jishnu Dec 19, 2024
b1b811b
Merge branch 'master' into feat/data-loader/metadata-service
inv-jishnu Dec 19, 2024
30db988
Applied spotless
inv-jishnu Dec 19, 2024
e9bb004
Added export options validator
inv-jishnu Dec 19, 2024
03324e1
Minor change in test
inv-jishnu Dec 19, 2024
d6aaf85
Applied spotless on CoreError
inv-jishnu Dec 19, 2024
4439dea
Make constructor private and improve javadocs
ypeckstadt Dec 19, 2024
ccb1ace
Improve javadocs
ypeckstadt Dec 20, 2024
a374f1a
Add private constructor to TableMetadataUtil
ypeckstadt Dec 20, 2024
a65c9b5
Apply spotless fix
ypeckstadt Dec 20, 2024
b3279ba
Fix the validation for partition and clustering keys
ypeckstadt Dec 23, 2024
78a8170
Fix spotless format
ypeckstadt Dec 23, 2024
acedabe
Partial feedback changes
inv-jishnu Dec 24, 2024
bf31a01
Data chunk and task result enums and dtos
inv-jishnu Dec 24, 2024
57cd330
Spotless applied
inv-jishnu Dec 24, 2024
7a39564
Changes
inv-jishnu Dec 26, 2024
a95a858
Resolved conflicts and merged latest changes from main
inv-jishnu Dec 26, 2024
093cb1d
Merge branch 'feat/data-loader/scaladb-dao' into feat/data-loader/imp…
inv-jishnu Dec 31, 2024
bfebd95
Merge branch 'feat/data-loader/metadata-service' into feat/data-loade…
inv-jishnu Dec 31, 2024
fd1c186
Control file files
inv-jishnu Dec 31, 2024
e2cc6ac
Added task files and dtos
inv-jishnu Jan 2, 2025
8c75b79
Fix unit test failure
inv-jishnu Jan 2, 2025
98618aa
Fix spot bugs failure
inv-jishnu Jan 2, 2025
c05286d
Merge branch 'feat/data-loader/scaladb-dao' into feat/data-loader/exp…
inv-jishnu Jan 2, 2025
0d3f79e
Export tasks added
inv-jishnu Jan 2, 2025
2365460
Merge branch 'feat/data-loader/metadata-service' into feat/data-loade…
inv-jishnu Jan 2, 2025
95022a9
Initial commit [skip ci]
inv-jishnu Jan 2, 2025
89fea78
Added changes
inv-jishnu Jan 6, 2025
29a8c25
Fix spot less issue
inv-jishnu Jan 6, 2025
45adc95
Merge branch 'master' into feat/data-loader/export-tasks
inv-jishnu Jan 6, 2025
67dcb06
Merge branch 'master' into feat/data-loader/scaladb-dao
ypeckstadt Jan 7, 2025
cebb543
Merge branch 'master' into feat/data-loader/control-file
ypeckstadt Jan 8, 2025
8ecb39c
Changes -1
inv-jishnu Jan 9, 2025
f6c54ec
Updated test code to remove warning
inv-jishnu Jan 13, 2025
b92758c
Merged latest changes from main after resolving conflicts
inv-jishnu Jan 13, 2025
90c4830
Changes added
inv-jishnu Jan 16, 2025
39c43de
Removed scalardb manager file
inv-jishnu Jan 16, 2025
3fe30a3
Merge branch 'master' into feat/data-loader/scaladb-dao
inv-jishnu Jan 20, 2025
4df4acd
Removed wildcard import
inv-jishnu Jan 20, 2025
53cd523
Merge branch 'master' into feat/data-loader/scaladb-dao
inv-jishnu Jan 23, 2025
f4f253e
Changes
inv-jishnu Jan 28, 2025
6d43bdc
Merge branch 'master' into feat/data-loader/scaladb-dao
inv-jishnu Jan 28, 2025
9c4ae23
Resolved conflicts and merge latest changes from master
inv-jishnu Jan 28, 2025
c9d01cb
Added default case in switch to resolve sportbugs warning
inv-jishnu Jan 28, 2025
9224c7b
Merge branch 'master' into feat/data-loader/import-task
inv-jishnu Jan 28, 2025
5e61fd1
Merge branch 'master' into feat/data-loader/control-file
inv-jishnu Jan 28, 2025
f024670
Merge branch 'feat/data-loader/scaladb-dao' into feat/data-loader/con…
inv-jishnu Jan 29, 2025
39ecefe
Merge changes from master after resolving conflicts
inv-jishnu Jan 31, 2025
0984d51
Resolved conflicts and merged latest changes from main
inv-jishnu Feb 3, 2025
aadf3e1
Resolved conflicts and merged latest changes from master
inv-jishnu Feb 3, 2025
6998b68
Changes
inv-jishnu Feb 3, 2025
3c79ab6
Merged changes from master after resolving conflicts
inv-jishnu Feb 3, 2025
3ff03d9
Merge branch 'master' into feat/data-loader/export-tasks
inv-jishnu Feb 4, 2025
f3fb8d8
Merge branch 'master' into feat/data-loader/control-file
inv-jishnu Feb 4, 2025
6dd2ce2
Resolved conflicts and merged changes from branch feat/data-loader/co…
inv-jishnu Feb 4, 2025
1996865
Reverted new line removal
inv-jishnu Feb 4, 2025
da2e241
Merge export tasks branch after resolving conflicts
inv-jishnu Feb 4, 2025
7d7ec91
Revert "Merge export tasks branch after resolving conflicts"
inv-jishnu Feb 4, 2025
31094b1
Resolved conflicts and merged latest changes from main
inv-jishnu Feb 7, 2025
aebcef6
Removing unwanted changes [skip ci]
inv-jishnu Feb 7, 2025
b5134b1
Changes
inv-jishnu Feb 10, 2025
285f51d
Java doc minor change [skip ci]
inv-jishnu Feb 10, 2025
c4a4eb4
Updated test class to be package private
inv-jishnu Feb 12, 2025
6d8ecf3
Updated error code ids to make numbering consistent
inv-jishnu Feb 13, 2025
14 changes: 14 additions & 0 deletions core/src/main/java/com/scalar/db/common/error/CoreError.java
@@ -802,6 +802,20 @@ public enum CoreError implements ScalarDbError {
"Duplicated data mappings found for column '%s' in table '%s'",
"",
""),
DATA_LOADER_MISSING_CLUSTERING_KEY_COLUMN(
Category.USER_ERROR,
"0174",
"Missing required field or column mapping for clustering key %s",
"",
""),
DATA_LOADER_MISSING_PARTITION_KEY_COLUMN(
Category.USER_ERROR,
"0175",
"Missing required field or column mapping for partition key %s",
"",
""),
DATA_LOADER_MISSING_COLUMN(
Category.USER_ERROR, "0176", "Missing field or column mapping for %s", "", ""),

//
// Errors for the concurrency error category
@@ -0,0 +1,38 @@
package com.scalar.db.dataloader.core.dataimport;

import com.scalar.db.dataloader.core.FileFormat;
import com.scalar.db.dataloader.core.dataimport.controlfile.ControlFile;
import com.scalar.db.dataloader.core.dataimport.controlfile.ControlFileValidationLevel;
import com.scalar.db.dataloader.core.dataimport.log.LogMode;
import lombok.Builder;
import lombok.Data;

/** Options used when importing data into one or more ScalarDB tables. */
@Builder
@Data
public class ImportOptions {

@Builder.Default private final ImportMode importMode = ImportMode.UPSERT;
@Builder.Default private final boolean requireAllColumns = false;
@Builder.Default private final FileFormat fileFormat = FileFormat.JSON;
@Builder.Default private final boolean prettyPrint = false;
@Builder.Default private final boolean ignoreNullValues = false;
@Builder.Default private final LogMode logMode = LogMode.SPLIT_BY_DATA_CHUNK;

@Builder.Default
private final ControlFileValidationLevel controlFileValidationLevel =
ControlFileValidationLevel.MAPPED;

@Builder.Default private final char delimiter = ',';

@Builder.Default private final boolean logSuccessRecords = false;
@Builder.Default private final boolean logRawRecord = false;

private final int dataChunkSize;
private final int transactionBatchSize;
private final ControlFile controlFile;
private final String namespace;
private final String tableName;
private final int maxThreads;
private final String customHeaderRow;
}
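
A minimal usage sketch of the Lombok-generated builder; the namespace and table values here are illustrative, not from this PR. Unset options fall back to the @Builder.Default values above (UPSERT mode, JSON format, SPLIT_BY_DATA_CHUNK log mode, ',' delimiter):

import com.scalar.db.dataloader.core.dataimport.ImportOptions;

public class ImportOptionsExample {
  public static void main(String[] args) {
    // Only the non-defaulted options are set explicitly here.
    ImportOptions options =
        ImportOptions.builder()
            .namespace("sample_ns") // illustrative name
            .tableName("sample_table") // illustrative name
            .dataChunkSize(500)
            .transactionBatchSize(100)
            .maxThreads(4)
            .build();
    // @Data generates the getters used below.
    System.out.println(options.getNamespace() + "." + options.getTableName());
  }
}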
@@ -0,0 +1,7 @@
package com.scalar.db.dataloader.core.dataimport.log;

/** Log modes available for import logging */
public enum LogMode {
SINGLE_FILE,
SPLIT_BY_DATA_CHUNK
}
@@ -0,0 +1,28 @@
package com.scalar.db.dataloader.core.dataimport.task.mapping;

import com.fasterxml.jackson.databind.node.ObjectNode;
import com.scalar.db.dataloader.core.dataimport.controlfile.ControlFileTable;
import com.scalar.db.dataloader.core.dataimport.controlfile.ControlFileTableFieldMapping;

/** Applies the control file's field-to-column mappings to a source record. */
public class ImportDataMapping {

/**
* Updates the source data, replacing each source field name with the target column name
* according to the control file table mappings.
*
* @param source source data
* @param controlFileTable control file table used to map the source data
*/
public static void apply(ObjectNode source, ControlFileTable controlFileTable) {
// Move the source field value to the target column if the target is not already present
for (ControlFileTableFieldMapping mapping : controlFileTable.getMappings()) {
String sourceField = mapping.getSourceField();
String targetColumn = mapping.getTargetColumn();

if (source.has(sourceField) && !source.has(targetColumn)) {
source.set(targetColumn, source.get(sourceField));
source.remove(sourceField);
}
}
}
}
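
For illustration, a small sketch of the rename behavior, mirroring the constructors used in ImportDataMappingTest further down; all literal values are made up:

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import com.scalar.db.dataloader.core.dataimport.controlfile.ControlFileTable;
import com.scalar.db.dataloader.core.dataimport.controlfile.ControlFileTableFieldMapping;
import com.scalar.db.dataloader.core.dataimport.task.mapping.ImportDataMapping;

public class ImportDataMappingExample {
  public static void main(String[] args) {
    ObjectNode record = new ObjectMapper().createObjectNode();
    record.put("source_id", "111");
    record.put("other_field", "unchanged");

    ControlFileTable table = new ControlFileTable("ns", "tbl");
    table.getMappings().add(new ControlFileTableFieldMapping("source_id", "target_id"));

    ImportDataMapping.apply(record, table);
    // Prints {"other_field":"unchanged","target_id":"111"}; source_id was renamed in place.
    System.out.println(record);
  }
}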
@@ -0,0 +1,48 @@
package com.scalar.db.dataloader.core.dataimport.task.validation;

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.annotation.concurrent.Immutable;

/** The validation result for a data source record */
@Immutable
public final class ImportSourceRecordValidationResult {

private final List<String> errorMessages;
private final Set<String> columnsWithErrors;

/** Creates an empty validation result. */
public ImportSourceRecordValidationResult() {
this.errorMessages = new ArrayList<>();
this.columnsWithErrors = new HashSet<>();
}

/**
* Adds a validation error message for a column and marks the column as containing an error.
*
* @param columnName column name
* @param errorMessage error message
*/
public void addErrorMessage(String columnName, String errorMessage) {
this.columnsWithErrors.add(columnName);
this.errorMessages.add(errorMessage);
}

/** @return Immutable list of validation error messages */
public List<String> getErrorMessages() {
return Collections.unmodifiableList(this.errorMessages);
}

/** @return Immutable set of columns that had errors */
public Set<String> getColumnsWithErrors() {
return Collections.unmodifiableSet(this.columnsWithErrors);
}

/** @return whether the record passed validation, i.e. no error messages were recorded */
public boolean isValid() {
return this.errorMessages.isEmpty();
}
}
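
A small sketch of how the result accumulates errors (the column name and message below are illustrative):

ImportSourceRecordValidationResult result = new ImportSourceRecordValidationResult();
result.addErrorMessage("id", "Missing field or column mapping for id");
// isValid() is false once any error message has been recorded.
assert !result.isValid();
assert result.getColumnsWithErrors().contains("id");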
@@ -0,0 +1,119 @@
package com.scalar.db.dataloader.core.dataimport.task.validation;

import com.fasterxml.jackson.databind.JsonNode;
import com.scalar.db.api.TableMetadata;
import com.scalar.db.common.error.CoreError;
import com.scalar.db.dataloader.core.DatabaseKeyType;
import com.scalar.db.transaction.consensuscommit.ConsensusCommitUtils;
import java.util.Set;
import lombok.AccessLevel;
import lombok.NoArgsConstructor;

/** Utility for validating a source record against the table's key and column metadata. */
@NoArgsConstructor(access = AccessLevel.PRIVATE)
public class ImportSourceRecordValidator {

/**
* Validates a source record and collects every validation error message. Validation does not
* stop at the first error; reporting all errors at once avoids trial-and-error imports where a
* new error surfaces on each attempt.
*
* @param partitionKeyNames list of partition key names in the table
* @param clusteringKeyNames list of clustering key names in the table
* @param columnNames list of all column names in the table
* @param sourceRecord source data
* @param allColumnsRequired if true, treat missing columns as an error
* @param tableMetadata metadata of the target table
* @return source record validation result
*/
public static ImportSourceRecordValidationResult validateSourceRecord(
Set<String> partitionKeyNames,
Set<String> clusteringKeyNames,
Set<String> columnNames,
JsonNode sourceRecord,
boolean allColumnsRequired,
TableMetadata tableMetadata) {
ImportSourceRecordValidationResult validationResult = new ImportSourceRecordValidationResult();

// check if partition keys are found
checkMissingKeys(DatabaseKeyType.PARTITION, partitionKeyNames, sourceRecord, validationResult);

// check if clustering keys are found
checkMissingKeys(
DatabaseKeyType.CLUSTERING, clusteringKeyNames, sourceRecord, validationResult);

// Check if the record is missing any columns
if (allColumnsRequired) {
checkMissingColumns(
sourceRecord,
columnNames,
validationResult,
validationResult.getColumnsWithErrors(),
tableMetadata);
}

return validationResult;
}

/**
* Checks whether the required key columns are present in the source record; each missing key
* adds an error to the validation result.
*
* @param keyType Type of key to validate
* @param keyColumnNames List of required column names
* @param sourceRecord source data
* @param validationResult Source record validation result
*/
public static void checkMissingKeys(
DatabaseKeyType keyType,
Set<String> keyColumnNames,
JsonNode sourceRecord,
ImportSourceRecordValidationResult validationResult) {
for (String columnName : keyColumnNames) {
if (!sourceRecord.has(columnName)) {
String errorMessage =
keyType == DatabaseKeyType.PARTITION
? CoreError.DATA_LOADER_MISSING_PARTITION_KEY_COLUMN.buildMessage(columnName)
: CoreError.DATA_LOADER_MISSING_CLUSTERING_KEY_COLUMN.buildMessage(columnName);
validationResult.addErrorMessage(columnName, errorMessage);
}
}
}

/**
* Makes sure the JSON object is not missing any columns. Errors are added to the validation
* result.
*
* @param sourceRecord source JSON object
* @param columnNames list of column names for a table
* @param validationResult source record validation result
* @param ignoreColumns columns that can be skipped in the check
* @param tableMetadata metadata of the target table, used to exclude transaction metadata columns
*/
public static void checkMissingColumns(
JsonNode sourceRecord,
Set<String> columnNames,
ImportSourceRecordValidationResult validationResult,
Set<String> ignoreColumns,
TableMetadata tableMetadata) {
for (String columnName : columnNames) {
// Report the column if it is missing, not ignored, and not a transaction metadata column
if ((ignoreColumns == null || !ignoreColumns.contains(columnName))
&& !ConsensusCommitUtils.isTransactionMetaColumn(columnName, tableMetadata)
&& !sourceRecord.has(columnName)) {
validationResult.addErrorMessage(
columnName, CoreError.DATA_LOADER_MISSING_COLUMN.buildMessage(columnName));
}
}
}

/**
* Makes sure the JSON object is not missing any columns. Errors are added to the validation
* result. Overload without an ignore list.
*
* @param sourceRecord source JSON object
* @param columnNames list of column names for a table
* @param validationResult source record validation result
* @param tableMetadata metadata of the target table
*/
public static void checkMissingColumns(
JsonNode sourceRecord,
Set<String> columnNames,
ImportSourceRecordValidationResult validationResult,
TableMetadata tableMetadata) {
ImportSourceRecordValidator.checkMissingColumns(
sourceRecord, columnNames, validationResult, null, tableMetadata);
}
}
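
A condensed sketch of the call pattern that ImportSourceRecordValidatorTest below exercises; UnitTestUtils is the test helper used in this PR:

TableMetadata metadata = UnitTestUtils.createTestTableMetadata();
JsonNode record = UnitTestUtils.getOutputDataWithoutMetadata();

ImportSourceRecordValidationResult result =
    ImportSourceRecordValidator.validateSourceRecord(
        metadata.getPartitionKeyNames(),
        metadata.getClusteringKeyNames(),
        metadata.getColumnNames(),
        record,
        true, // allColumnsRequired: also report missing non-key columns
        metadata);

// All errors are collected in one pass instead of failing on the first.
if (!result.isValid()) {
  result.getErrorMessages().forEach(System.out::println);
}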
@@ -0,0 +1,45 @@
package com.scalar.db.dataloader.core.dataimport.task.mapping;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import com.scalar.db.dataloader.core.dataimport.controlfile.ControlFileTable;
import com.scalar.db.dataloader.core.dataimport.controlfile.ControlFileTableFieldMapping;
import java.util.ArrayList;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

class ImportDataMappingTest {

ControlFileTable controlFileTable;

@BeforeEach
void setup() {
controlFileTable = new ControlFileTable("namespace", "table");
ControlFileTableFieldMapping m1 = new ControlFileTableFieldMapping("source_id", "target_id");
ControlFileTableFieldMapping m2 =
new ControlFileTableFieldMapping("source_name", "target_name");
ControlFileTableFieldMapping m3 =
new ControlFileTableFieldMapping("source_email", "target_email");
ArrayList<ControlFileTableFieldMapping> mappingArrayList = new ArrayList<>();
mappingArrayList.add(m1);
mappingArrayList.add(m2);
mappingArrayList.add(m3);
controlFileTable.getMappings().addAll(mappingArrayList);
}

@Test
void apply_withValidData_shouldUpdateSourceData() throws JsonProcessingException {
ObjectMapper objectMapper = new ObjectMapper();
ObjectNode source = objectMapper.createObjectNode();
source.put("source_id", "111");
source.put("source_name", "abc");
source.put("source_email", "[email protected]");
ImportDataMapping.apply(source, controlFileTable);
// Assert changes
Assertions.assertEquals("111", source.get("target_id").asText());
Assertions.assertEquals("abc", source.get("target_name").asText());
Assertions.assertEquals("[email protected]", source.get("target_email").asText());
}
}
@@ -0,0 +1,87 @@
package com.scalar.db.dataloader.core.dataimport.task.validation;

import com.fasterxml.jackson.databind.JsonNode;
import com.scalar.db.api.TableMetadata;
import com.scalar.db.common.error.CoreError;
import com.scalar.db.dataloader.core.UnitTestUtils;
import java.util.HashSet;
import java.util.Set;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;

class ImportSourceRecordValidatorTest {

TableMetadata mockMetadata = UnitTestUtils.createTestTableMetadata();

@Test
void
validateSourceRecord_withValidData_shouldReturnValidImportSourceRecordValidationResultWithoutErrors() {
Set<String> partitionKeyNames = mockMetadata.getPartitionKeyNames();
Set<String> clusteringKeyNames = mockMetadata.getClusteringKeyNames();
Set<String> columnNames = mockMetadata.getColumnNames();
JsonNode sourceRecord = UnitTestUtils.getOutputDataWithoutMetadata();
ImportSourceRecordValidationResult result =
ImportSourceRecordValidator.validateSourceRecord(
partitionKeyNames, clusteringKeyNames, columnNames, sourceRecord, false, mockMetadata);
Assertions.assertTrue(result.getColumnsWithErrors().isEmpty());
}

@Test
void
validateSourceRecord_withValidDataWithAllColumnsRequired_shouldReturnValidImportSourceRecordValidationResultWithoutErrors() {
Set<String> partitionKeyNames = mockMetadata.getPartitionKeyNames();
Set<String> clusteringKeyNames = mockMetadata.getClusteringKeyNames();
Set<String> columnNames = mockMetadata.getColumnNames();
JsonNode sourceRecord = UnitTestUtils.getOutputDataWithoutMetadata();
ImportSourceRecordValidationResult result =
ImportSourceRecordValidator.validateSourceRecord(
partitionKeyNames, clusteringKeyNames, columnNames, sourceRecord, true, mockMetadata);
Assertions.assertTrue(result.getColumnsWithErrors().isEmpty());
}

@Test
void
validateSourceRecord_withInvalidPartitionKey_shouldReturnImportSourceRecordValidationResultWithErrors() {
Set<String> partitionKeyNames = new HashSet<>();
partitionKeyNames.add("id1");
Set<String> clusteringKeyNames = mockMetadata.getClusteringKeyNames();
Set<String> columnNames = mockMetadata.getColumnNames();
JsonNode sourceRecord = UnitTestUtils.getOutputDataWithoutMetadata();
ImportSourceRecordValidationResult result =
ImportSourceRecordValidator.validateSourceRecord(
partitionKeyNames, clusteringKeyNames, columnNames, sourceRecord, false, mockMetadata);
Assertions.assertFalse(result.getColumnsWithErrors().isEmpty());
}

@Test
void
validateSourceRecord_withInvalidPartitionKeyWithAllColumnsRequired_shouldReturnImportSourceRecordValidationResultWithErrors() {
Set<String> partitionKeyNames = new HashSet<>();
partitionKeyNames.add("id1");
Set<String> clusteringKeyNames = mockMetadata.getClusteringKeyNames();
Set<String> columnNames = mockMetadata.getColumnNames();
JsonNode sourceRecord = UnitTestUtils.getOutputDataWithoutMetadata();
ImportSourceRecordValidationResult result =
ImportSourceRecordValidator.validateSourceRecord(
partitionKeyNames, clusteringKeyNames, columnNames, sourceRecord, true, mockMetadata);
Assertions.assertFalse(result.getColumnsWithErrors().isEmpty());
Assertions.assertEquals(1, result.getErrorMessages().size());
}

@Test
void
validateSourceRecord_withInvalidClusteringKey_shouldReturnImportSourceRecordValidationResultWithErrors() {
Set<String> partitionKeyNames = mockMetadata.getPartitionKeyNames();
Set<String> clusteringKeyNames = new HashSet<>();
clusteringKeyNames.add("id1");
Set<String> columnNames = mockMetadata.getColumnNames();
JsonNode sourceRecord = UnitTestUtils.getOutputDataWithoutMetadata();
ImportSourceRecordValidationResult result =
ImportSourceRecordValidator.validateSourceRecord(
partitionKeyNames, clusteringKeyNames, columnNames, sourceRecord, false, mockMetadata);
Assertions.assertFalse(result.getColumnsWithErrors().isEmpty());
Assertions.assertEquals(
CoreError.DATA_LOADER_MISSING_CLUSTERING_KEY_COLUMN.buildMessage("id1"),
result.getErrorMessages().get(0));
}
}