Skip to content

Add import log classes and utils #2591

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
May 8, 2025
Merged

Conversation

inv-jishnu
Copy link
Contributor

@inv-jishnu inv-jishnu commented Apr 11, 2025

Description

In this PR I have added the classes and util files to add logs for the import process. The generated logs files are summary logs, success logs and failure logs

Related issues and/or PRs

Please review this PR once the following PR is reviewed and merged and the master is then merged to this branch.

Changes made

Added log classes and utils for import module. The log implementations are primarily single, and split log mode. The single log only generates a single summary, failure and success log file for the import while the split log mode generates a summary, failure and success log for each data chunk.

Checklist

The following is a best-effort checklist. If any items in this checklist are not applicable to this PR or are dependent on other, unmerged PRs, please still mark the checkboxes after you have read and understood each item.

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes.
  • Any remaining open issues linked to this PR are documented and up-to-date (Jira, GitHub, etc.).
  • Tests (unit, integration, etc.) have been added for the changes.
  • My changes generate no new warnings.
  • Any dependent changes in other PRs have been merged and published.

Additional notes (optional)

Road map to merge remaining data loader core files. Current status

Release notes

NA

@inv-jishnu inv-jishnu self-assigned this Apr 11, 2025
@inv-jishnu inv-jishnu added the enhancement New feature or request label Apr 11, 2025
@inv-jishnu inv-jishnu marked this pull request as draft April 11, 2025 03:48
@inv-jishnu inv-jishnu mentioned this pull request Apr 11, 2025
6 tasks
@inv-jishnu inv-jishnu changed the title Add import log module Add import log classes and utils Apr 11, 2025
@ypeckstadt ypeckstadt marked this pull request as ready for review April 15, 2025 05:32
.rowNumber(taskResult.getRowNumber());

// Only add the raw record if the configuration is set to log raw source data
if (config.isLogRawSourceRecords()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config.isLogRawSourceRecords()

config.logRawSourceRecordsEnabled() or config.isLogRawSourceRecordsEnabled() or config.shouldLogRawSourceRecords() is a bit better?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@inv-jishnu How about this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brfrn169 san,
I had renamed it again in0f34395 now to isLogRawSourceRecordsEnabled from isLogRawSourceRecords. Sorry for the confusion.
Thank you.

Comment on lines 36 to 37
private LogWriter successLogWriter;
private LogWriter failureLogWriter;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like these 2 log writers can be final.

* @throws IOException if an I/O error occurs while writing to the log
*/
private void logDataChunkSummary(ImportDataChunkStatus dataChunkStatus) throws IOException {
if (summaryLogWriter == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this method called by multiple threads? If so, this initialization for summaryLogWriter requires something like synchronized block.

* been completed.
*/
private void closeAllLogWriters() {
closeLogWriter(summaryLogWriter);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Null check isn't needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@komamitsu san,
The null check is added in called closeLogWriter method.on AbstractImportLogger class. Should I move it from there to here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. It sounds good 👍 . (closeLogWriterIfNeeded or something might be clearer in this case, though.)

throws IOException {
String logFileName = batchResult.isSuccess() ? SUCCESS_LOG_FILE_NAME : FAILURE_LOG_FILE_NAME;
LogWriter logWriter = batchResult.isSuccess() ? successLogWriter : failureLogWriter;
if (logWriter == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both log writers are not null since they are created in the constructor, right?

case SUMMARY:
logWriterMap = summaryLogWriters;
break;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add default: throw new AssertionError() just in case?

private LogWriter initializeLogWriterIfNeeded(LogFileType logFileType, int dataChunkId)
throws IOException {
Map<Integer, LogWriter> logWriters = getLogWriters(logFileType);
if (!logWriters.containsKey(dataChunkId)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this method called by multiple threads? If so, it needs to be thread safe.

try {
writeImportTaskResultDetailToLogs(taskResult);
} catch (IOException e) {
LOGGER.error("Failed to write success/failure logs");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LOGGER.error("Failed to write success/failure logs");
LOGGER.error("Failed to write success/failure logs", e);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what's the expected behavior, but don't need to throw an exception to tell the user the problem? Just outputting error log is enough?

* @throws IOException if an I/O error occurs while writing the record
*/
@Override
public void write(JsonNode sourceRecord) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public void write(JsonNode sourceRecord) throws IOException {
public void write(@Nullable JsonNode sourceRecord) throws IOException {

if (sourceRecord == null) {
return;
}
synchronized (logWriter) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ObjectMapper is fully thread-safe. So, I don't think this synchronization is needed.

@feeblefakie feeblefakie requested a review from Copilot April 22, 2025 01:50
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds new logging classes and utilities for the import module that generate summary, success, and failure log files, supporting both single and split log file modes.

  • Introduces implementations for log writing (LocalFileLogWriter, DefaultLogWriterFactory) and corresponding logger classes (SingleFileImportLogger, SplitByDataChunkImportLogger).
  • Updates and adds comprehensive tests for the new logging functionalities.

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
data-loader/core/src/test/java/com/scalar/db/dataloader/core/dataimport/log/writer/DefaultLogWriterFactoryTest.java Test for verifying creation of LocalFileLogWriter.
data-loader/core/src/test/java/com/scalar/db/dataloader/core/dataimport/log/SplitByDataChunkImportLoggerTest.java Tests for split log mode behavior and file creation.
data-loader/core/src/test/java/com/scalar/db/dataloader/core/dataimport/log/SingleFileImportLoggerTest.java Tests for single file log mode functionality.
data-loader/core/src/main/java/com/scalar/db/dataloader/core/dataimport/log/writer/*.java New log writer interfaces and implementations.
data-loader/core/src/main/java/com/scalar/db/dataloader/core/dataimport/log/*.java New logger implementations and configuration for import logging.
data-loader/core/src/main/java/com/scalar/db/dataloader/core/dataimport/log/ImportLoggerConfig.java Added immutable configuration for import logging.
data-loader/core/src/main/java/com/scalar/db/dataloader/core/dataimport/log/AbstractImportLogger.java Base class for import loggers with common logging functionality.

Comment on lines 130 to 131
.dataChunkId(taskResult.getDataChunkId())
.rowNumber(taskResult.getRowNumber());
Copy link
Preview

Copilot AI Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ImportTaskResult builder is calling rowNumber(taskResult.getRowNumber()) twice; remove the duplicate call to avoid potential confusion and ensure clean builder initialization.

Suggested change
.dataChunkId(taskResult.getDataChunkId())
.rowNumber(taskResult.getRowNumber());
.dataChunkId(taskResult.getDataChunkId());

Copilot uses AI. Check for mistakes.

@inv-jishnu inv-jishnu requested a review from komamitsu April 23, 2025 09:46
Copy link
Collaborator

@brfrn169 brfrn169 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few questions. PTAL!

.rowNumber(taskResult.getRowNumber());

// Only add the raw record if the configuration is set to log raw source data
if (config.isLogRawSourceRecords()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@inv-jishnu How about this?

closeLogWriter(summaryLogWriter);
closeLogWriter(successLogWriter);
closeLogWriter(failureLogWriter);
summaryLogWriter = null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are successLogWriter and failureLogWriter also reinitialized after being closed?
It looks like they are only initialized once in the constructor, and I don’t see this class being reused after calling closeAllLogWriters().

@inv-jishnu inv-jishnu requested a review from brfrn169 May 2, 2025 12:30
@inv-jishnu
Copy link
Contributor Author

@brfrn169 san,
I have made a few changes based on feedback and replied to your questions.
PTAL again when you get a chance.
Thank you.

Copy link
Collaborator

@brfrn169 brfrn169 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

Copy link
Contributor

@ypeckstadt ypeckstadt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you.

Copy link
Contributor

@feeblefakie feeblefakie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

@feeblefakie feeblefakie merged commit 7450430 into master May 8, 2025
51 checks passed
@feeblefakie feeblefakie deleted the feat/data-loader/import-log-2 branch May 8, 2025 04:09
feeblefakie pushed a commit that referenced this pull request May 8, 2025
feeblefakie pushed a commit that referenced this pull request May 8, 2025
feeblefakie pushed a commit that referenced this pull request May 8, 2025
feeblefakie pushed a commit that referenced this pull request May 8, 2025
feeblefakie pushed a commit that referenced this pull request May 8, 2025
feeblefakie pushed a commit that referenced this pull request May 8, 2025
feeblefakie pushed a commit that referenced this pull request May 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants