Move StoryBookCreateFromEPubController method to a new Util class #1919

Aryant-Tripathi · 2024-10-22T18:06:37Z

Issue Number

Resolves #1825 (Move machine learning code into utils class)

Purpose

Move the StoryBookCreateFromEPubController method to a new Util class

Technical Details

Added a sample `.pmml` model file and test cases.

…loses elimu-ai#1825)

codecov · 2024-10-22T18:08:25Z

Codecov Report

Attention: Patch coverage is 66.66667% with 7 lines in your changes missing coverage. Please review.

Project coverage is 15.97%. Comparing base (e65a7d4) to head (4db44a8).
Report is 12 commits behind head on main.

Files with missing lines	Patch %	Lines
...t/storybook/StoryBookCreateFromEPubController.java	0.00%	4 Missing ⚠️
...main/java/ai/elimu/util/ReadingLevelConstants.java	0.00%	2 Missing ⚠️
...c/main/java/ai/elimu/util/ml/ReadingLevelUtil.java	93.33%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #1919      +/-   ##
============================================
+ Coverage     15.05%   15.97%   +0.92%     
- Complexity      457      478      +21     
============================================
  Files           250      254       +4     
  Lines          7731     7798      +67     
  Branches        806      816      +10     
============================================
+ Hits           1164     1246      +82     
+ Misses         6517     6502      -15     
  Partials         50       50

coderabbitai · 2024-10-22T18:14:07Z

Walkthrough

A new utility class, ReadingLevelUtil, has been introduced to handle reading level predictions based on chapter, paragraph, and word counts. The predictReadingLevel method utilizes a pre-trained machine learning model for predictions. The StoryBookCreateFromEPubController class has been refactored to call this new utility method, simplifying its logic. Additionally, a new class ReadingLevelConstants has been created to store constants related to reading levels, and unit tests for the ReadingLevelUtil class have been added to ensure functionality.

Changes

File Path	Change Summary
`src/main/java/ai/elimu/util/ml/ReadingLevelUtil.java`	Added class `ReadingLevelUtil` with method `predictReadingLevel(int, int, int)` for reading level prediction.
`src/main/java/ai/elimu/web/content/storybook/StoryBookCreateFromEPubController.java`	Refactored `predictReadingLevel` method to call `ReadingLevelUtil.predictReadingLevel`. Removed old model loading and prediction logic.
`src/test/java/ai/elimu/util/ml/ReadingLevelUtilTest.java`	Added unit tests for `predictReadingLevel`, including a test for reading level 4.
`src/main/java/ai/elimu/util/ReadingLevelConstants.java`	Added class `ReadingLevelConstants` with inner class `READING_LEVEL_CONSTANTS` containing static constants for reading level attributes.
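
The delegation summarized in the walkthrough can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the merged implementation: `ReadingLevelModel` is a hypothetical stand-in for the PMML-backed model, the local `ReadingLevel` enum stands in for `ai.elimu.model.v2.enums.ReadingLevel`, and the extra `model` parameter exists only to keep the sketch self-contained. The feature keys and the `LEVEL` prefix follow the constants named in this review.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for ai.elimu.model.v2.enums.ReadingLevel (values assumed for this sketch).
enum ReadingLevel { LEVEL1, LEVEL2, LEVEL3, LEVEL4 }

// Hypothetical abstraction over the pre-trained model; the real class loads a PMML file.
interface ReadingLevelModel {
    double predict(Map<String, Double> features);
}

final class ReadingLevelUtilSketch {

    // Feature keys as reported in the review; they must match the model's field names exactly.
    static final String CHAPTER_COUNT_KEY = "chapter_count";
    static final String PARAGRAPH_COUNT_KEY = "paragraph_count";
    static final String WORD_COUNT_KEY = "word_count";
    // Prefix used to build the enum constant name, e.g. "LEVEL2".
    static final String LEVEL = "LEVEL";

    private ReadingLevelUtilSketch() {}

    static ReadingLevel predictReadingLevel(
            int chapterCount, int paragraphCount, int wordCount, ReadingLevelModel model) {
        Map<String, Double> features = new LinkedHashMap<>();
        features.put(CHAPTER_COUNT_KEY, (double) chapterCount);
        features.put(PARAGRAPH_COUNT_KEY, (double) paragraphCount);
        features.put(WORD_COUNT_KEY, (double) wordCount);

        // The model returns a numeric level; round it and look up the matching enum constant.
        long level = Math.round(model.predict(features));
        return ReadingLevel.valueOf(LEVEL + level);
    }
}
```

With this shape, the controller only has to pass the three counts and log the result; everything model-specific stays in the utility class.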

Assessment against linked issues

Objective	Addressed	Explanation
Move machine learning code into utils class (#1825)	✅
Add unit tests for utils class (#1825)	✅

Possibly related PRs

#1912 Remove most recent contributions feature #1916: This PR is unrelated to the main PR as it focuses on removing a feature related to contributions, while the main PR introduces a new utility class for predicting reading levels.

Suggested reviewers

vrudas
jo-elimu
Souvik-Cyclic
alexander-kuruvilla
nya-elimu


coderabbitai

Actionable comments posted: 12

🧹 Outside diff range and nitpick comments (2)

src/test/java/ai/elimu/util/ml/ReadingLevelUtilTest.java (2)

1-11: Add class-level Javadoc documentation.

Consider adding class-level Javadoc to describe the purpose of these tests and their relationship to the reading level prediction functionality.

 package ai.elimu.util.ml;

 import ai.elimu.model.v2.enums.ReadingLevel;
 import org.junit.jupiter.api.Test;

 import java.io.IOException;

 import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertThrows;

+/**
+ * Test cases for {@link ReadingLevelUtil} class.
+ * Validates the reading level prediction functionality using PMML model
+ * for different combinations of chapter, paragraph, and word counts.
+ */
 public class ReadingLevelUtilTest {

26-50: Document test data thresholds.

The test cases show a progression in the input parameters, but it would be helpful to document why these specific values were chosen and what they represent in terms of reading level thresholds.

     @Test
+    @DisplayName("Should predict LEVEL2 for a story with moderate content")
     public void testPredictReadingLevel_Level2() {
+        // Threshold values for Level 2:
+        // - Chapters: > 10
+        // - Paragraphs: > 20
+        // - Words: > 200
         int chapterCount = 12;
         int paragraphCount = 22;
         int wordCount = 250;

     @Test
+    @DisplayName("Should predict LEVEL3 for a story with advanced content")
     public void testPredictReadingLevel_Level3() {
+        // Threshold values for Level 3:
+        // - Chapters: > 10
+        // - Paragraphs: > 24
+        // - Words: > 300

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between e65a7d4 and 1a03cae.

📒 Files selected for processing (5)

src/main/java/ai/elimu/util/AppConstants.java (1 hunks)
src/main/java/ai/elimu/util/ml/ReadingLevelUtil.java (1 hunks)
src/main/java/ai/elimu/web/content/storybook/StoryBookCreateFromEPubController.java (3 hunks)
src/test/java/ai/elimu/util/ml/ReadingLevelUtilTest.java (1 hunks)
src/test/resources/ai/elimu/util/reading_level/model1.pmml (1 hunks)

🧰 Additional context used

🔇 Additional comments (7)

src/main/java/ai/elimu/util/AppConstants.java (1)
6-9: Verify constant values match PMML model expectations.

The constant values use different casing styles (chapter_count vs LEVEL). We should verify that these keys exactly match what's expected by the PMML model to avoid runtime issues.

✅ Verification successful

Let me try to locate the PMML files first, as the initial path assumption was incorrect.

Let's examine the PMML files and the ReadingLevelUtil to understand how these constants are actually used.

Based on the evidence gathered, I can now provide a final response:

The constant values are correctly aligned with PMML model expectations

The verification shows that:

The snake_case keys (chapter_count, paragraph_count, word_count) in AppConstants.java match exactly with the feature names in both PMML models

The LEVEL constant is used to construct reading level enum values (e.g., "LEVEL1", "LEVEL2") and is not a PMML feature name, so its uppercase format is appropriate

The usage in ReadingLevelUtil.java confirms these constants are properly mapped to the model's input features
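
To illustrate why exact key matching matters, the sketch below compares a list of feature keys against the field names a model declares. Both the `missingKeys` helper and the field names passed to it are hypothetical; the verification in this review was done with `rg` against the PMML files, not with code like this.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class FeatureKeyCheck {

    private FeatureKeyCheck() {}

    // Returns the keys that the model does not declare. The comparison is case-sensitive,
    // so "Chapter_Count" would not match "chapter_count".
    static List<String> missingKeys(List<String> featureKeys, Set<String> modelFieldNames) {
        return featureKeys.stream()
                .filter(key -> !modelFieldNames.contains(key))
                .collect(Collectors.toList());
    }
}
```

An empty result means every key has a matching field; any returned key points at a casing or spelling mismatch that would otherwise surface only at prediction time.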
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Check PMML file for feature names to ensure consistency

# Look for the constant values in the PMML file
echo "Searching for feature names in PMML file..."
rg -i "chapter_count|paragraph_count|word_count|LEVEL" src/main/resources/ai/elimu/util/ml/
Length of output: 267

Script:
#!/bin/bash
# First, let's find any PMML files in the repository
echo "Searching for PMML files..."
fd -e pmml

# Let's also search for these constants' usage in the codebase
# to understand the context better
echo -e "\nSearching for usage of these constants..."
rg -l "CHAPTER_COUNT_KEY|PARAGRAPH_COUNT_KEY|WORD_COUNT_KEY|LEVEL" --type java
Length of output: 722

Script:
#!/bin/bash
# Check the PMML files for the feature names
echo "Checking PMML files for feature names..."
rg -i "chapter_count|paragraph_count|word_count|LEVEL" src/main/resources/ai/elimu/web/content/storybook/step2_2_model.pmml src/test/resources/ai/elimu/util/reading_level/model1.pmml

# Check how these constants are used in ReadingLevelUtil
echo -e "\nChecking ReadingLevelUtil usage..."
rg -A 5 "CHAPTER_COUNT_KEY|PARAGRAPH_COUNT_KEY|WORD_COUNT_KEY|LEVEL" src/main/java/ai/elimu/util/ml/ReadingLevelUtil.java
Length of output: 6736
src/test/java/ai/elimu/util/ml/ReadingLevelUtilTest.java (1)
1-59: 🛠️ Refactor suggestion

Verify test resources and consider additional test scenarios.

Please ensure that the test PMML model file exists at the specified path and consider adding:

Parameterized tests for multiple input combinations

Edge case tests (e.g., zero values, very large values)

Boundary value tests between reading levels
@ParameterizedTest
@CsvSource({
    "0,0,0,LEVEL1",
    "1,1,1,LEVEL1",
    "11,21,225,LEVEL2",  // Boundary between LEVEL1 and LEVEL2
    "12,24,325,LEVEL3"   // Boundary between LEVEL2 and LEVEL3
})
void testReadingLevelBoundaries(int chapters, int paragraphs, int words, ReadingLevel expected) {
    assertEquals(expected, 
        ReadingLevelUtil.predictReadingLevel(chapters, paragraphs, words, TEST_MODEL_PATH));
}
✅ Verification successful

Let me gather more information about the test coverage and implementation to provide a comprehensive verification.

This will help us understand:

The actual implementation logic to verify if current test cases cover critical paths

If there are other test files we should consider

All possible enum values to ensure complete coverage

Let me check the actual implementation of ReadingLevelUtil to understand the logic better.

This will help us understand:

The actual implementation logic and any edge cases that need testing

The test resources structure and content to validate our test assumptions

Based on the implementation and test resources verification, I can now provide a final response:

Current test coverage is adequate with room for targeted improvements

The existing tests are well-structured and cover the essential scenarios, with proper verification confirmed:

Test PMML model file exists at the specified path

Tests cover the core functionality for all three reading levels

Error handling for invalid model file is included

While the review's suggestions for additional tests are valid, they should be prioritized based on the actual implementation:

The current tests already cover representative values for each reading level

Zero values are implicitly handled by the Map conversion to Double

The PMML model handles the actual classification logic, so boundary testing should focus on known model decision boundaries rather than arbitrary values

Note: The parameterized tests suggestion, while good practice, isn't critical here as the existing discrete test cases already provide good coverage of the different scenarios and the actual classification is delegated to the PMML model.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Verify the existence of test resources
echo "Checking for test PMML file..."
if [ -f "src/test/resources/ai/elimu/util/reading_level/model1.pmml" ]; then
    echo "Test PMML file found"
else
    echo "Warning: Test PMML file not found"
fi
Length of output: 193

Script:
#!/bin/bash
# Check the ReadingLevelUtil implementation to understand the logic and boundaries
echo "Checking ReadingLevelUtil implementation..."
ast-grep --pattern 'class ReadingLevelUtil {
  $$$
}'

# Check if there are any other test files for ReadingLevelUtil
echo -e "\nChecking for other test files..."
fd -e java ReadingLevelUtil

# Look at the ReadingLevel enum to understand possible values
echo -e "\nChecking ReadingLevel enum..."
ast-grep --pattern 'enum ReadingLevel {
  $$$
}'
Length of output: 499

Script:
#!/bin/bash
# Check the actual implementation of ReadingLevelUtil
echo "Checking ReadingLevelUtil implementation..."
cat src/main/java/ai/elimu/util/ml/ReadingLevelUtil.java

# Check for any existing test resources in the reading_level directory
echo -e "\nListing test resources..."
find src/test/resources/ai/elimu/util/reading_level/ -type f -ls
Length of output: 1596
src/test/resources/ai/elimu/util/reading_level/model1.pmml (5)

6-11: Confirm the accuracy of numberOfFields in DataDictionary.

The DataDictionary specifies numberOfFields="4", which matches the four DataField elements defined. Ensure this count remains accurate if additional fields are added or removed in the future to prevent parsing errors.

7-10: Data types and op types are appropriately defined.

All DataField elements have optype="continuous", and data types are correctly set to integer for counts and double for reading_level. This aligns with the requirements for a regression model.

12-18: Consistency between DataDictionary and MiningSchema fields.

The MiningSchema includes all fields from the DataDictionary, with reading_level correctly designated as the target variable (usageType="target"). Maintaining this consistency is crucial for accurate model evaluation.

19-39: Validate the logical flow of the decision tree nodes.

The decision tree structure appears logically sound, with appropriately nested Node elements and clear predicates based on chapter_count, paragraph_count, and word_count. This hierarchical evaluation should effectively predict the reading_level.

22-38: Verify threshold values in SimplePredicate elements.

Ensure that the threshold values used in predicates (e.g., chapter_count < 10, word_count >= 300) are based on empirical data or domain expertise. Adjusting these values might improve the model's predictive accuracy.
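
As a rough illustration of how nested `SimplePredicate` nodes resolve to a prediction, the sketch below hand-codes a small tree in plain Java. The thresholds are assumptions borrowed only from the examples quoted above (`chapter_count < 10`, `word_count >= 300`); they are not claimed to be the values in `model1.pmml`, and the class name is hypothetical.

```java
final class DecisionTreeSketch {

    private DecisionTreeSketch() {}

    // Each nested condition plays the role of a Node with a SimplePredicate.
    // Thresholds are illustrative assumptions, not the trained model's values.
    // paragraphCount is accepted for parity with the model's inputs but is unused
    // in this simplified tree.
    static double predictLevel(int chapterCount, int paragraphCount, int wordCount) {
        if (chapterCount < 10) {
            return 1.0;
        }
        if (wordCount >= 300) {
            return 3.0;
        }
        return 2.0;
    }
}
```

Probing both sides of a threshold (for example 9 versus 10 chapters, or 299 versus 300 words) is the kind of boundary test that targets the model's actual decision boundaries rather than arbitrary values.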

jo-elimu

Looks mostly good to me, although I would delete the model1.pmml file since it adds unnecessary complexity to the machine learning pipeline.

- Refactored `predictReadingLevel` method from core class into `PredictionUtils` to enhance reusability and separation of concerns.
- The utility class no longer includes logging statements.
- Add unit test cases for ReadingFileUtil class.
- Create a ReadingLevelConstants class which will be responsible for containing all constants related to Reading level.

Issue elimu-ai#1825

- Refactored the predictReadingLevel method from core class into PredictionUtils to enhance reusability and separation of concerns.
- The utility class no longer includes logging statements, adhering to best practices.
- Added a ReadingLevelConstants class responsible for containing all constants related to the Reading Level ML model.
- Added unit test cases for ReadingLevelUtil class.

Issue elimu-ai#1825

Aryant-Tripathi · 2024-10-24T19:39:15Z

@jo-elimu I updated the PR with the suggested changes: the sample model_1 file is removed, and the code now uses the same model file as you suggested.
Please have a look and review the changes.

coderabbitai

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (1)

src/main/java/ai/elimu/web/content/storybook/StoryBookCreateFromEPubController.java (1)

544-552: LGTM: Clean refactoring of reading level prediction.

The code has been successfully simplified by delegating the prediction logic to ReadingLevelUtil, improving maintainability and separation of concerns.

Consider adding error handling:

 private ReadingLevel predictReadingLevel(int chapterCount, int paragraphCount, int wordCount) {
     logger.info(
             "Predicting reading level for chapter: {}, paragraph: {}, word: {} ",
             chapterCount, paragraphCount, wordCount
     );

-    ReadingLevel readingLevel = ReadingLevelUtil.predictReadingLevel(chapterCount, paragraphCount, wordCount);
-    logger.info("Predicted reading level: {}", readingLevel);
+    try {
+        ReadingLevel readingLevel = ReadingLevelUtil.predictReadingLevel(chapterCount, paragraphCount, wordCount);
+        logger.info("Predicted reading level: {}", readingLevel);
+        return readingLevel;
+    } catch (Exception e) {
+        logger.error("Failed to predict reading level: {}", e.getMessage(), e);
+        return ReadingLevel.LEVEL1; // Consider an appropriate fallback level
+    }

-    return readingLevel;
}

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 1a03cae and 4db44a8.

📒 Files selected for processing (4)

src/main/java/ai/elimu/util/ReadingLevelConstants.java (1 hunks)
src/main/java/ai/elimu/util/ml/ReadingLevelUtil.java (1 hunks)
src/main/java/ai/elimu/web/content/storybook/StoryBookCreateFromEPubController.java (3 hunks)
src/test/java/ai/elimu/util/ml/ReadingLevelUtilTest.java (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/test/java/ai/elimu/util/ml/ReadingLevelUtilTest.java

🔇 Additional comments (8)

src/main/java/ai/elimu/util/ReadingLevelConstants.java (1)

7-25: LGTM! Well-documented constants.

The constants are appropriately named and documented with clear JavaDoc comments explaining their purpose.

src/main/java/ai/elimu/web/content/storybook/StoryBookCreateFromEPubController.java (1)

26-26: LGTM: Import aligns with code reorganization.

The addition of ReadingLevelUtil import supports the PR objective of moving machine learning code to a dedicated utility class.

src/main/java/ai/elimu/util/ml/ReadingLevelUtil.java (6)

43-43: Handle exceptions when loading the model to prevent potential crashes

50-52: Ensure all required features are present to avoid NullPointerException

54-54: Check for null or empty prediction results to prevent exceptions

56-58: Safely cast the prediction result to prevent ClassCastException

61-61: Handle potential IllegalArgumentException when converting to ReadingLevel enum

43-43: Consider caching the model to improve performance
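
Several of the points above (null or unexpected prediction results, unsafe casts, unknown enum names, and model caching) can be sketched together. This is a hedged illustration, not the code under review: `SafePrediction`, `CachedModel`, and the local `ReadingLevel` enum are hypothetical names, and the raw prediction is modelled as a plain `Object`.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Stand-in for ai.elimu.model.v2.enums.ReadingLevel (values assumed for this sketch).
enum ReadingLevel { LEVEL1, LEVEL2, LEVEL3, LEVEL4 }

final class SafePrediction {

    private SafePrediction() {}

    // Converts a raw model output to a ReadingLevel without throwing:
    // null results, non-numeric types, and unknown levels all yield Optional.empty().
    static Optional<ReadingLevel> toReadingLevel(Object rawPrediction) {
        if (!(rawPrediction instanceof Number)) {
            return Optional.empty(); // covers null and unexpected types, so no ClassCastException
        }
        long level = Math.round(((Number) rawPrediction).doubleValue());
        try {
            return Optional.of(ReadingLevel.valueOf("LEVEL" + level));
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // e.g. "LEVEL7" is not a constant of the enum
        }
    }
}

// Caches the result of an expensive loader so the model is loaded at most once.
final class CachedModel<T> {
    private final Supplier<T> loader;
    private T instance;

    CachedModel(Supplier<T> loader) {
        this.loader = loader;
    }

    synchronized T get() {
        if (instance == null) {
            instance = loader.get();
        }
        return instance;
    }
}
```

Returning an `Optional` leaves the choice of fallback level to the caller, which fits the review's note that a fallback should be chosen deliberately.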

jo-elimu

Great work, @Aryant-Tripathi. Thank you 🙂

-Move StoryBookCreateFromEPubController method to a new Util class (C…

1a03cae

…loses elimu-ai#1825)

Aryant-Tripathi requested a review from a team as a code owner October 22, 2024 18:06

Aryant-Tripathi requested review from vrudas, nya-elimu and SnehaHS65 October 22, 2024 18:06

coderabbitai bot requested changes Oct 22, 2024

jo-elimu requested review from Souvik-Cyclic, jo-elimu and alexander-kuruvilla October 23, 2024 02:45

jo-elimu requested changes Oct 23, 2024

Aryant-Tripathi requested a review from jo-elimu October 23, 2024 15:57

jo-elimu assigned Aryant-Tripathi Oct 24, 2024

jo-elimu requested changes Oct 24, 2024

Aryant-Tripathi added 2 commits October 25, 2024 00:58

Aryant-Tripathi requested a review from jo-elimu October 24, 2024 19:42

coderabbitai bot requested changes Oct 24, 2024

jo-elimu approved these changes Oct 26, 2024

coderabbitai bot approved these changes Oct 26, 2024

nya-elimu merged commit 55112e6 into elimu-ai:main Oct 29, 2024
14 checks passed

coderabbitai bot mentioned this pull request Apr 18, 2025

refactor: show predicted reading level from ml model #2150

Merged
