
#366 Track raw files at sources and in the metastore by size instead of by count #371

Merged (2 commits) on Mar 13, 2024


yruslan (Collaborator) commented Mar 13, 2024

Closes #366

Since the choice of what to track affects the whole pipeline and can't be handled at the level of RawFileSource alone, I decided to always track raw files by size and not make it optional.

If we need to track raw files differently in the future, we will add an option then.
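To illustrate why size is a more sensitive signal than count, here is a minimal Scala sketch. It assumes only that a change is detected when the tracked number differs between two runs; `RawFile`, `RawFileTracking`, `byCount`, `bySize` and `changed` are hypothetical names for illustration, not Pramen's actual API.

```scala
// Hypothetical model of a file found at a raw file source (not Pramen's actual class).
final case class RawFile(path: String, sizeBytes: Long)

object RawFileTracking {
  // Count-based metric: the number of files.
  def byCount(files: Seq[RawFile]): Long = files.size.toLong

  // Size-based metric: the total number of bytes across all files.
  def bySize(files: Seq[RawFile]): Long = files.iterator.map(_.sizeBytes).sum

  // Assumption: a change is reported when the tracked metric differs between two runs.
  def changed(metric: Seq[RawFile] => Long)(before: Seq[RawFile], after: Seq[RawFile]): Boolean =
    metric(before) != metric(after)

  def main(args: Array[String]): Unit = {
    val before = Seq(RawFile("/landing/a.csv", 100L), RawFile("/landing/b.csv", 200L))
    // Same number of files, but b.csv was overwritten with a larger version.
    val after = Seq(RawFile("/landing/a.csv", 100L), RawFile("/landing/b.csv", 250L))

    println(s"changed by count: ${changed(byCount)(before, after)}") // false
    println(s"changed by size:  ${changed(bySize)(before, after)}")  // true
  }
}
```

When a landed file is replaced in place, the file count stays the same and the change goes unnoticed, while the total size usually differs.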

Here is what notifications that involve file landing jobs look like:
[Screenshot (2024-03-13, 11:45:12): notification involving file landing jobs]


Unit Test Coverage

| File | Coverage [87.61%] 🍏 |
|---|---|
| TaskResult.scala | 100% 🍏 |
| RawFileSource.scala | 98.67% 🍏 |
| MetastorePersistenceRaw.scala | 98.23% 🍏 |
| SparkSource.scala | 94.49% 🍏 |
| PipelineNotificationBuilderHtml.scala | 91.69% 🍏 |
| FsUtils.scala | 84.57% 🍏 |
| OrchestratorImpl.scala | 82.44% 🍏 |
| TaskRunnerBase.scala | 79.92% |
| ConcurrentJobRunnerImpl.scala | 79.28% |
| Total Project Coverage | 83.3% 🍏 |

yruslan marked this pull request as ready for review on March 13, 2024 10:49

yruslan (Collaborator, Author) commented Mar 13, 2024

Thanks for the review!

yruslan merged commit 940c49c into main on Mar 13, 2024
9 checks passed
yruslan deleted the feature/366-track-raw-files-by-size branch on March 13, 2024 11:57
Linked issue: Add ability for RawFileSource to track changes by file size not file count