Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move StoryBookCreateFromEPubController method to a new Util class #1919

Conversation

Aryant-Tripathi
Copy link
Contributor

Issue Number

Purpose

  • Move the StoryBookCreateFromEPubController method to a new Util class

Technical Details

  • Added model .pmml class and added test cases also

@Aryant-Tripathi Aryant-Tripathi requested a review from a team as a code owner October 22, 2024 18:06
Copy link

codecov bot commented Oct 22, 2024

Codecov Report

Attention: Patch coverage is 66.66667% with 7 lines in your changes missing coverage. Please review.

Project coverage is 15.97%. Comparing base (e65a7d4) to head (4db44a8).
Report is 12 commits behind head on main.

Files with missing lines Patch % Lines
...t/storybook/StoryBookCreateFromEPubController.java 0.00% 4 Missing ⚠️
...main/java/ai/elimu/util/ReadingLevelConstants.java 0.00% 2 Missing ⚠️
...c/main/java/ai/elimu/util/ml/ReadingLevelUtil.java 93.33% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1919      +/-   ##
============================================
+ Coverage     15.05%   15.97%   +0.92%     
- Complexity      457      478      +21     
============================================
  Files           250      254       +4     
  Lines          7731     7798      +67     
  Branches        806      816      +10     
============================================
+ Hits           1164     1246      +82     
+ Misses         6517     6502      -15     
  Partials         50       50              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

coderabbitai bot commented Oct 22, 2024

Walkthrough

A new utility class, ReadingLevelUtil, has been introduced to handle reading level predictions based on chapter, paragraph, and word counts. The predictReadingLevel method utilizes a pre-trained machine learning model for predictions. The StoryBookCreateFromEPubController class has been refactored to call this new utility method, simplifying its logic. Additionally, a new class ReadingLevelConstants has been created to store constants related to reading levels, and unit tests for the ReadingLevelUtil class have been added to ensure functionality.

Changes

File Path Change Summary
src/main/java/ai/elimu/util/ml/ReadingLevelUtil.java Added class ReadingLevelUtil with method predictReadingLevel(int, int, int) for reading level prediction.
src/main/java/ai/elimu/web/content/storybook/StoryBookCreateFromEPubController.java Refactored predictReadingLevel method to call ReadingLevelUtil.predictReadingLevel. Removed old model loading and prediction logic.
src/test/java/ai/elimu/util/ml/ReadingLevelUtilTest.java Added unit tests for predictReadingLevel, including a test for reading level 4.
src/main/java/ai/elimu/util/ReadingLevelConstants.java Added class ReadingLevelConstants with inner class READING_LEVEL_CONSTANTS containing static constants for reading level attributes.

Assessment against linked issues

Objective Addressed Explanation
Move machine learning code into utils class (#1825)
Add unit tests for utils class (#1825)

Possibly related PRs

Suggested reviewers

  • vrudas
  • jo-elimu
  • Souvik-Cyclic
  • alexander-kuruvilla
  • nya-elimu

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

🧹 Outside diff range and nitpick comments (2)
src/test/java/ai/elimu/util/ml/ReadingLevelUtilTest.java (2)

1-11: Add class-level Javadoc documentation.

Consider adding class-level Javadoc to describe the purpose of these tests and their relationship to the reading level prediction functionality.

 package ai.elimu.util.ml;

 import ai.elimu.model.v2.enums.ReadingLevel;
 import org.junit.jupiter.api.Test;

 import java.io.IOException;

 import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertThrows;

+/**
+ * Test cases for {@link ReadingLevelUtil} class.
+ * Validates the reading level prediction functionality using PMML model
+ * for different combinations of chapter, paragraph, and word counts.
+ */
 public class ReadingLevelUtilTest {

26-50: Document test data thresholds.

The test cases show a progression in the input parameters, but it would be helpful to document why these specific values were chosen and what they represent in terms of reading level thresholds.

     @Test
+    @DisplayName("Should predict LEVEL2 for a story with moderate content")
     public void testPredictReadingLevel_Level2() {
+        // Threshold values for Level 2:
+        // - Chapters: > 10
+        // - Paragraphs: > 20
+        // - Words: > 200
         int chapterCount = 12;
         int paragraphCount = 22;
         int wordCount = 250;

     @Test
+    @DisplayName("Should predict LEVEL3 for a story with advanced content")
     public void testPredictReadingLevel_Level3() {
+        // Threshold values for Level 3:
+        // - Chapters: > 10
+        // - Paragraphs: > 24
+        // - Words: > 300
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between e65a7d4 and 1a03cae.

📒 Files selected for processing (5)
  • src/main/java/ai/elimu/util/AppConstants.java (1 hunks)
  • src/main/java/ai/elimu/util/ml/ReadingLevelUtil.java (1 hunks)
  • src/main/java/ai/elimu/web/content/storybook/StoryBookCreateFromEPubController.java (3 hunks)
  • src/test/java/ai/elimu/util/ml/ReadingLevelUtilTest.java (1 hunks)
  • src/test/resources/ai/elimu/util/reading_level/model1.pmml (1 hunks)
🧰 Additional context used
🔇 Additional comments (7)
src/main/java/ai/elimu/util/AppConstants.java (1)

6-9: Verify constant values match PMML model expectations.

The constant values use different casing styles (chapter_count vs LEVEL). We should verify that these keys exactly match what's expected by the PMML model to avoid runtime issues.

✅ Verification successful

Let me try to locate the PMML files first, as the initial path assumption was incorrect.


Let's examine the PMML files and the ReadingLevelUtil to understand how these constants are actually used.


Based on the evidence gathered, I can now provide a final response:

The constant values are correctly aligned with PMML model expectations

The verification shows that:

  • The snake_case keys (chapter_count, paragraph_count, word_count) in AppConstants.java match exactly with the feature names in both PMML models
  • The LEVEL constant is used to construct reading level enum values (e.g., "LEVEL1", "LEVEL2") and is not a PMML feature name, so its uppercase format is appropriate
  • The usage in ReadingLevelUtil.java confirms these constants are properly mapped to the model's input features
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check PMML file for feature names to ensure consistency

# Look for the constant values in the PMML file
echo "Searching for feature names in PMML file..."
rg -i "chapter_count|paragraph_count|word_count|LEVEL" src/main/resources/ai/elimu/util/ml/

Length of output: 267


Script:

#!/bin/bash
# First, let's find any PMML files in the repository
echo "Searching for PMML files..."
fd -e pmml

# Let's also search for these constants' usage in the codebase
# to understand the context better
echo -e "\nSearching for usage of these constants..."
rg -l "CHAPTER_COUNT_KEY|PARAGRAPH_COUNT_KEY|WORD_COUNT_KEY|LEVEL" --type java

Length of output: 722


Script:

#!/bin/bash
# Check the PMML files for the feature names
echo "Checking PMML files for feature names..."
rg -i "chapter_count|paragraph_count|word_count|LEVEL" src/main/resources/ai/elimu/web/content/storybook/step2_2_model.pmml src/test/resources/ai/elimu/util/reading_level/model1.pmml

# Check how these constants are used in ReadingLevelUtil
echo -e "\nChecking ReadingLevelUtil usage..."
rg -A 5 "CHAPTER_COUNT_KEY|PARAGRAPH_COUNT_KEY|WORD_COUNT_KEY|LEVEL" src/main/java/ai/elimu/util/ml/ReadingLevelUtil.java

Length of output: 6736

src/test/java/ai/elimu/util/ml/ReadingLevelUtilTest.java (1)

1-59: 🛠️ Refactor suggestion

Verify test resources and consider additional test scenarios.

Please ensure that the test PMML model file exists at the specified path and consider adding:

  1. Parameterized tests for multiple input combinations
  2. Edge case tests (e.g., zero values, very large values)
  3. Boundary value tests between reading levels
@ParameterizedTest
@CsvSource({
    "0,0,0,LEVEL1",
    "1,1,1,LEVEL1",
    "11,21,225,LEVEL2",  // Boundary between LEVEL1 and LEVEL2
    "12,24,325,LEVEL3"   // Boundary between LEVEL2 and LEVEL3
})
void testReadingLevelBoundaries(int chapters, int paragraphs, int words, ReadingLevel expected) {
    assertEquals(expected, 
        ReadingLevelUtil.predictReadingLevel(chapters, paragraphs, words, TEST_MODEL_PATH));
}
✅ Verification successful

Let me gather more information about the test coverage and implementation to provide a comprehensive verification.

This will help us understand:

  1. The actual implementation logic to verify if current test cases cover critical paths
  2. If there are other test files we should consider
  3. All possible enum values to ensure complete coverage

Let me check the actual implementation of ReadingLevelUtil to understand the logic better.

This will help us understand:

  1. The actual implementation logic and any edge cases that need testing
  2. The test resources structure and content to validate our test assumptions

Based on the implementation and test resources verification, I can now provide a final response:

Current test coverage is adequate with room for targeted improvements

The existing tests are well-structured and cover the essential scenarios, with proper verification confirmed:

  • Test PMML model file exists at the specified path
  • Tests cover the core functionality for all three reading levels
  • Error handling for invalid model file is included

While the review's suggestions for additional tests are valid, they should be prioritized based on the actual implementation:

  1. The current tests already cover representative values for each reading level
  2. Zero values are implicitly handled by the Map conversion to Double
  3. The PMML model handles the actual classification logic, so boundary testing should focus on known model decision boundaries rather than arbitrary values

Note: The parameterized tests suggestion, while good practice, isn't critical here as the existing discrete test cases already provide good coverage of the different scenarios and the actual classification is delegated to the PMML model.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify the existence of test resources
echo "Checking for test PMML file..."
if [ -f "src/test/resources/ai/elimu/util/reading_level/model1.pmml" ]; then
    echo "Test PMML file found"
else
    echo "Warning: Test PMML file not found"
fi

Length of output: 193


Script:

#!/bin/bash
# Check the ReadingLevelUtil implementation to understand the logic and boundaries
echo "Checking ReadingLevelUtil implementation..."
ast-grep --pattern 'class ReadingLevelUtil {
  $$$
}'

# Check if there are any other test files for ReadingLevelUtil
echo -e "\nChecking for other test files..."
fd -e java ReadingLevelUtil

# Look at the ReadingLevel enum to understand possible values
echo -e "\nChecking ReadingLevel enum..."
ast-grep --pattern 'enum ReadingLevel {
  $$$
}'

Length of output: 499


Script:

#!/bin/bash
# Check the actual implementation of ReadingLevelUtil
echo "Checking ReadingLevelUtil implementation..."
cat src/main/java/ai/elimu/util/ml/ReadingLevelUtil.java

# Check for any existing test resources in the reading_level directory
echo -e "\nListing test resources..."
find src/test/resources/ai/elimu/util/reading_level/ -type f -ls

Length of output: 1596

src/test/resources/ai/elimu/util/reading_level/model1.pmml (5)

6-11: Confirm the accuracy of numberOfFields in DataDictionary.

The DataDictionary specifies numberOfFields="4", which matches the four DataField elements defined. Ensure this count remains accurate if additional fields are added or removed in the future to prevent parsing errors.


7-10: Data types and op types are appropriately defined.

All DataField elements have optype="continuous", and data types are correctly set to integer for counts and double for reading_level. This aligns with the requirements for a regression model.


12-18: Consistency between DataDictionary and MiningSchema fields.

The MiningSchema includes all fields from the DataDictionary, with reading_level correctly designated as the target variable (usageType="target"). Maintaining this consistency is crucial for accurate model evaluation.


19-39: Validate the logical flow of the decision tree nodes.

The decision tree structure appears logically sound, with appropriately nested Node elements and clear predicates based on chapter_count, paragraph_count, and word_count. This hierarchical evaluation should effectively predict the reading_level.


22-38: Verify threshold values in SimplePredicate elements.

Ensure that the threshold values used in predicates (e.g., chapter_count < 10, word_count >= 300) are based on empirical data or domain expertise. Adjusting these values might improve the model's predictive accuracy.

Copy link
Member

@jo-elimu jo-elimu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good to me, although I would delete the model1.pmml file since it adds unnecessary complexity to the machine learning pipeline.

- Refactored `predictReadingLevel` method from core class into `PredictionUtils`
  to enhance reusability and separation of concerns.
- The utility class no longer includes logging statements.
- Add unit test cases for ReadingFileUtil class.
- Create a ReadingLevelConstants class which will be responsbile for containing all constants related to Reading level.

Issue elimu-ai#1825
- Refactored the predictReadingLevel method from core class into PredictionUtils to enhance reusability and separation of concerns.
- The utility class no longer includes logging statements, adhering to best practices.
- Added a ReadingLevelConstants class responsible for containing all constants related to the Reading Level ML model.
- Added unit test cases for ReadingLevelUtil class.

Issue elimu-ai#1825
@Aryant-Tripathi
Copy link
Contributor Author

@jo-elimu Updated the PR with suggested changes, Remove sample model_1 file and uses the same file as you suggested.
Please have a look and review the changes.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (1)
src/main/java/ai/elimu/web/content/storybook/StoryBookCreateFromEPubController.java (1)

544-552: LGTM: Clean refactoring of reading level prediction.

The code has been successfully simplified by delegating the prediction logic to ReadingLevelUtil, improving maintainability and separation of concerns.

Consider adding error handling:

 private ReadingLevel predictReadingLevel(int chapterCount, int paragraphCount, int wordCount) {
     logger.info(
             "Predicting reading level for chapter: {}, paragraph: {}, word: {} ",
             chapterCount, paragraphCount, wordCount
     );

-    ReadingLevel readingLevel = ReadingLevelUtil.predictReadingLevel(chapterCount, paragraphCount, wordCount);
-    logger.info("Predicted reading level: {}", readingLevel);
+    try {
+        ReadingLevel readingLevel = ReadingLevelUtil.predictReadingLevel(chapterCount, paragraphCount, wordCount);
+        logger.info("Predicted reading level: {}", readingLevel);
+        return readingLevel;
+    } catch (Exception e) {
+        logger.error("Failed to predict reading level: {}", e.getMessage(), e);
+        return ReadingLevel.LEVEL1; // Consider an appropriate fallback level
+    }

-    return readingLevel;
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 1a03cae and 4db44a8.

📒 Files selected for processing (4)
  • src/main/java/ai/elimu/util/ReadingLevelConstants.java (1 hunks)
  • src/main/java/ai/elimu/util/ml/ReadingLevelUtil.java (1 hunks)
  • src/main/java/ai/elimu/web/content/storybook/StoryBookCreateFromEPubController.java (3 hunks)
  • src/test/java/ai/elimu/util/ml/ReadingLevelUtilTest.java (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/test/java/ai/elimu/util/ml/ReadingLevelUtilTest.java
🔇 Additional comments (8)
src/main/java/ai/elimu/util/ReadingLevelConstants.java (1)

7-25: LGTM! Well-documented constants.

The constants are appropriately named and documented with clear JavaDoc comments explaining their purpose.

src/main/java/ai/elimu/web/content/storybook/StoryBookCreateFromEPubController.java (1)

26-26: LGTM: Import aligns with code reorganization.

The addition of ReadingLevelUtil import supports the PR objective of moving machine learning code to a dedicated utility class.

src/main/java/ai/elimu/util/ml/ReadingLevelUtil.java (6)

43-43: Handle exceptions when loading the model to prevent potential crashes


50-52: Ensure all required features are present to avoid NullPointerException


54-54: Check for null or empty prediction results to prevent exceptions


56-58: Safely cast the prediction result to prevent ClassCastException


61-61: Handle potential IllegalArgumentException when converting to ReadingLevel enum


43-43: Consider caching the model to improve performance

Copy link
Member

@jo-elimu jo-elimu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, @Aryant-Tripathi. Thank you 🙂

@nya-elimu nya-elimu merged commit 55112e6 into elimu-ai:main Oct 29, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move machine learning code into utils class
3 participants