Feature/week3 #2
base: main
Conversation
Walkthrough

This pull request introduces several enhancements across multiple scripts in a Databricks environment. Key changes include the addition of a feature engineering model for sleep efficiency, the creation of feature and model serving scripts, and the implementation of A/B testing for machine learning models. The modifications also encompass updates to configuration files to support new features and parameters, improving the overall functionality and clarity of the codebase.

Changes
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Note: Be mindful of the bot's finite context window. It is strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
Actionable comments posted: 9
🧹 Outside diff range and nitpick comments (22)
src/sleep_efficiency/config.py (2)
Line range hint 7-16: Add validation for A/B test parameters.

Consider adding specific validation rules for A/B test parameters to ensure required fields are present and values are within expected ranges.

Example implementation:

```python
from typing import Any, Dict, List

from pydantic import BaseModel, Field


class ABTestConfig(BaseModel):
    learning_rate_a: float = Field(gt=0, lt=1)
    learning_rate_b: float = Field(gt=0, lt=1)
    max_depth_a: int = Field(gt=0)
    max_depth_b: int = Field(gt=0)


class ProjectConfig(BaseModel):
    # ... existing fields ...
    ab_test: ABTestConfig
```
Line range hint 7-24: Consider adding configuration versioning and documentation.

As this configuration is critical for the A/B testing framework and used across multiple notebooks, consider:
- Adding a version field to track configuration schema changes
- Including a docstring describing the expected structure of the YAML configuration
- Adding validation for backward compatibility
Example implementation:
````python
class ProjectConfig(BaseModel):
    """Project configuration for sleep efficiency model.

    Expected YAML structure:
    ```yaml
    version: "1.0"
    ab_test:
      learning_rate_a: float  # Learning rate for model A (0-1)
      learning_rate_b: float  # Learning rate for model B (0-1)
      max_depth_a: int  # Max depth for model A (>0)
      max_depth_b: int  # Max depth for model B (>0)
    ```
    """

    version: str = "1.0"  # Configuration schema version
    # ... rest of the fields ...
````

notebooks/week3/README.md (6)
9-10: Fix grammatical issues in the overview section.

Add a comma after "Last week" and consider rephrasing for better clarity:

```diff
-Last week we demonstrated model training and registering for different use cases.
+Last week, we demonstrated model training and registering for different use cases.
```

🧰 Tools
🪛 LanguageTool
[uncategorized] ~9-~9: Possible missing comma found.
Context: ...red in this lecture. ## Overview Last week we demonstrated model training and regi...(AI_HYDRA_LEO_MISSING_COMMA)
15-17: Specify language for code blocks.

Add a Python language specification to the code block:

````diff
-```
+```python
 01.feature_serving.py
````

🧰 Tools
🪛 Markdownlint
15-15: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
27-28: Fix grammar in feature serving explanation.

Correct the subject-verb agreement:

```diff
-The subsequent code examples shows how to invoke this endpoint and get responses.
+The subsequent code examples show how to invoke this endpoint and get responses.
```

🧰 Tools
🪛 LanguageTool
[grammar] ~27-~27: The verb form ‘shows’ does not seem to match the subject ‘examples’.
Context: ...e lookups. The subsequent code examples shows how to invoke this endpoint and get res...(SUBJECT_VERB_AGREEMENT_PLURAL)
31-32: Fix formatting and grammar in model serving section.

- Add Python language specification to the code block
- Add missing articles:

```diff
-Model serving is a process of creating a model serving endpoint that can be used for inference. Endpoint creation process is similar to feature serving
+Model serving is a process of creating a model serving endpoint that can be used for inference. The endpoint creation process is similar to feature serving

-We also added an example piece of code for simple load test to get average latency.
+We also added an example piece of code for a simple load test to get average latency.
```

Also applies to: 33-33, 40-40
44-45: Fix formatting and spelling in feature lookup section.

- Add Python language specification to the code block
- Use consistent list style (asterisks instead of dashes)
- Fix spelling errors:

```diff
-- We start with creating an online table for existing offline feature table
+* We start with creating an online table for existing offline feature table

-Next is the same as in the previous notebook, we create an endpoint using the model we registred
+Next is the same as in the previous notebook, we create an endpoint using the model we registered
```

Also applies to: 50-54
1-70: Consider adding additional sections to enhance documentation.

The documentation provides good coverage of the implementation details. Consider adding:
- A section on error handling and logging improvements mentioned in PR objectives
- Compatibility notes regarding pyarrow and mlflow integration
- A troubleshooting guide for common issues
Would you like me to help draft these additional sections?
notebooks/week1&2/05.log_and_register_fe_model.py (3)
179-187: Document feature selection rationale.

The addition of "AverageTemperature" as a feature lacks documentation explaining its significance for sleep efficiency prediction. Additionally, there is commented code about `sleep_hours_duration` that should be cleaned up.

Consider:

- Adding a comment explaining why temperature affects sleep efficiency
- Removing the commented code about `sleep_hours_duration` since it is not being used
229-229: Consider adding version information to the model name.

While the model name change from house prices to sleep efficiencies is appropriate, consider incorporating version information in the model name for better tracking and A/B testing support.

Example:

```diff
-model_uri=f"runs:/{run_id}/lightgbm-pipeline-model-fe", name=f"{catalog_name}.{schema_name}.sleep_efficiencies_model_fe"
+model_uri=f"runs:/{run_id}/lightgbm-pipeline-model-fe", name=f"{catalog_name}.{schema_name}.sleep_efficiencies_model_fe_v1"
```
Line range hint 1-230: Consider splitting responsibilities and adding error handling.

The script currently handles multiple concerns (feature engineering, model training, and registration) that could be separated for better maintainability. Additionally, error handling should be added for feature lookups and model training.
Consider:
- Splitting into separate notebooks for feature engineering and model training
- Adding try-catch blocks around feature lookups and model training (see the sketch after this list)
- Adding logging for better observability
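For illustration, here is a minimal sketch of the suggested error handling and logging around the training step. The names `training_set`, `pipeline`, `num_features`, `cat_features`, and `target` are assumed to be the objects already defined in the notebook and may need adjusting:

```python
# Hedged sketch: wrap feature lookup and training in try/except with logging.
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("sleep_efficiency_fe")

try:
    logger.info("Loading training set with feature lookups")
    training_df = training_set.load_df().toPandas()  # assumed FE training set object
    logger.info("Training pipeline on %d rows", len(training_df))
    pipeline.fit(training_df[num_features + cat_features], training_df[target])
except Exception:
    logger.exception("Feature lookup or model training failed")
    raise
```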
notebooks/week3/03.model_serving_feature_lookup.py (5)
94-94: Inconsistent feature names in the comment

The comment mentions excluding `"OverallQual"`, `"GrLivArea"`, and `"GarageCars"`, which seem unrelated to sleep efficiency. These features appear to be from a different dataset, possibly housing data. Please update the comment to reflect the correct features relevant to sleep efficiency.
120-120: Useless expression `train_set.dtypes`

The expression `train_set.dtypes` is evaluated but not assigned or used elsewhere in the code. If this was intended for debugging, consider removing it or assigning the result to a variable for further use.

🧰 Tools
🪛 Ruff
120-120: Found useless expression. Either assign it to a variable or remove it.
(B018)
124-124: Redundant evaluation of `dataframe_records[0]`

The expression `dataframe_records[0]` is evaluated but not used or assigned. If this was meant for debugging purposes, consider removing it or utilizing it meaningfully within the code.
128-128: Module-level import not at top of file

The import statement `import pandas as pd` should be placed at the top of the file according to PEP 8 guidelines. This improves code readability and maintains consistency.

Apply this diff to move the import statement:

```diff
 17a18
+import pandas as pd
 128d128
-import pandas as pd
```

🧰 Tools
🪛 Ruff
128-128: Module level import not at top of file
(E402)
158-158: Useless expression `sleep_features.dtypes`

The expression `sleep_features.dtypes` is evaluated but not used or assigned to any variable. If this was intended for debugging, consider removing it or using the result appropriately.

🧰 Tools
🪛 Ruff
158-158: Found useless expression. Either assign it to a variable or remove it.
(B018)
notebooks/week3/02.model_serving.py (3)
133-133: Correct typo in print statement.

There's a typo in the print statement: "Reponse" should be "Response".

Apply this diff to fix the typo:

```diff
-print("Reponse text:", response.text)
+print("Response text:", response.text)
```
93-109: Remove irrelevant example payload in the docstring.

The multiline string contains an example payload unrelated to the current context, which may cause confusion.

Apply this diff to remove the unnecessary code:

```diff
-"""
-Each body should be list of json with columns
-
-[{'LotFrontage': 78.0,
-  'LotArea': 9317,
-  'OverallQual': 6,
-  'OverallCond': 5,
-  'YearBuilt': 2006,
-  'Exterior1st': 'VinylSd',
-  'Exterior2nd': 'VinylSd',
-  'MasVnrType': 'None',
-  'Foundation': 'PConc',
-  'Heating': 'GasA',
-  'CentralAir': 'Y',
-  'SaleType': 'WD',
-  'SaleCondition': 'Normal'}]
-"""
```
42-45: Parameterize the model version instead of hardcoding.

Hardcoding `entity_version=3` and the model name with version can lead to maintenance issues. Parameterizing the model version makes the code more adaptable to changes.

Introduce a variable for the model version:

```python
model_version = 3  # Update as necessary or retrieve dynamically
```

Update the code to use this variable:

```diff
 entity_name=f"{catalog_name}.{schema_name}.sleep_efficiency_model_basic",
 scale_to_zero_enabled=True,
 workload_size="Small",
-entity_version=3,
+entity_version=model_version,
 traffic_config=TrafficConfig(
-    routes=[Route(served_model_name="sleep_efficiency_model_basic-3", traffic_percentage=100)]
+    routes=[Route(served_model_name=f"sleep_efficiency_model_basic-{model_version}", traffic_percentage=100)]
 ),
```

Also applies to: 50-50
notebooks/week3/04.AB_test_model_serving.py (2)
269-269: Remove or utilize the unnecessary expression at line 269

The standalone expression `predictions` doesn't affect the program's execution. If the intention is to display the predictions, consider using a print statement or logging.

Apply this change to display the predictions:

```diff
-predictions
+print(predictions)
```

🧰 Tools
🪛 Ruff
269-269: Found useless expression. Either assign it to a variable or remove it.
(B018)
340-344: Add error handling for the HTTP request to the serving endpoint

Currently, the `requests.post` call lacks error handling, which may lead to unhandled exceptions in case of network failures or unexpected server responses. Implementing error handling will make the code more robust.

Here's a suggested refactor:

```diff
+try:
     response = requests.post(
         f"{model_serving_endpoint}",
         headers={"Authorization": f"Bearer {token}"},
         json={"dataframe_records": dataframe_records[0]},
     )
+    response.raise_for_status()
+except requests.exceptions.HTTPError as http_err:
+    print(f"HTTP error occurred: {http_err}")
+except Exception as err:
+    print(f"An error occurred: {err}")
```

notebooks/week3/01.feature_serving.py (1)
253-254: Verify the response status codes during load testing

In the load test, the status codes of the responses are retrieved but not checked. It's recommended to verify that all requests were successful and handle any failed requests to ensure accurate performance measurements.

Consider updating the code to check for successful status codes:

```python
for future in as_completed(futures):
    status_code, latency = future.result()
    if status_code == 200:
        latencies.append(latency)
    else:
        print(f"Request failed with status code: {status_code}")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (8)
- notebooks/week1&2/05.log_and_register_fe_model.py (3 hunks)
- notebooks/week3/01.feature_serving.py (1 hunks)
- notebooks/week3/02.model_serving.py (1 hunks)
- notebooks/week3/03.model_serving_feature_lookup.py (1 hunks)
- notebooks/week3/04.AB_test_model_serving.py (1 hunks)
- notebooks/week3/README.md (1 hunks)
- project_config.yml (1 hunks)
- src/sleep_efficiency/config.py (1 hunks)
🧰 Additional context used
🪛 Ruff
notebooks/week1&2/05.log_and_register_fe_model.py
3-3: SyntaxError: Expected a statement
3-3: SyntaxError: Simple statements must be separated by newlines or semicolons
3-3: SyntaxError: Expected an identifier
3-3: SyntaxError: Expected an identifier
3-3: SyntaxError: Simple statements must be separated by newlines or semicolons
notebooks/week3/01.feature_serving.py
174-174: Undefined name dbutils
(F821)
183-183: Undefined name display
(F821)
notebooks/week3/02.model_serving.py
62-62: Undefined name dbutils
(F821)
notebooks/week3/03.model_serving_feature_lookup.py
89-89: Undefined name dbutils
(F821)
120-120: Found useless expression. Either assign it to a variable or remove it.
(B018)
128-128: Module level import not at top of file
(E402)
158-158: Found useless expression. Either assign it to a variable or remove it.
(B018)
notebooks/week3/04.AB_test_model_serving.py
269-269: Found useless expression. Either assign it to a variable or remove it.
(B018)
301-301: Undefined name dbutils
(F821)
🪛 LanguageTool
notebooks/week3/README.md
[uncategorized] ~9-~9: Possible missing comma found.
Context: ...red in this lecture. ## Overview Last week we demonstrated model training and regi...
(AI_HYDRA_LEO_MISSING_COMMA)
[grammar] ~27-~27: The verb form ‘shows’ does not seem to match the subject ‘examples’.
Context: ...e lookups. The subsequent code examples shows how to invoke this endpoint and get res...
(SUBJECT_VERB_AGREEMENT_PLURAL)
[uncategorized] ~33-~33: You might be missing the article “the” here.
Context: ...ndpoint that can be used for inference. Endpoint creation process is similar to feature ...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
[uncategorized] ~37-~37: You might be missing the article “the” here.
Context: ... the model. It's important to note that entity name we pass is a registered model name...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
[uncategorized] ~40-~40: You might be missing the article “a” here.
Context: ...also added an example piece of code for simple load test to get average latency. ### ...
(AI_EN_LECTOR_MISSING_DETERMINER_A)
[uncategorized] ~47-~47: The preposition ‘to’ seems more likely in this position.
Context: ...rained model and create a feature table for look up. Then we create a model serving...
(AI_HYDRA_LEO_REPLACE_FOR_TO)
[typographical] ~50-~50: If you want to indicate numerical ranges or time ranges, consider using an en dash.
Context: ...he table we created last week on week 2 - 05.log_and_register_fe_model.py noteboo...
(DASH_RULE)
[uncategorized] ~51-~51: You might be missing the article “the” here.
Context: ...ed for our model to look up features at serving endpoint. - Next is the same as in the ...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
[typographical] ~52-~52: If you want to indicate numerical ranges or time ranges, consider using an en dash.
Context: ...e registred in the same notebook week 2 - 05.log_and_register_fe_model.p. This is...
(DASH_RULE)
[uncategorized] ~53-~53: You might be missing the article “a” here.
Context: ...lookup and feature func. - When we send request to the model endpoint, this time, we wo...
(AI_EN_LECTOR_MISSING_DETERMINER_A)
[typographical] ~64-~64: If you want to indicate numerical ranges or time ranges, consider using an en dash.
Context: ...e the same approach as we did in week 2 - 03.log_and_register_model.py. - We trai...
(DASH_RULE)
[uncategorized] ~70-~70: This verb may not be in the correct form. Consider using a different form for this context.
Context: ..., similar to our previous examples, and shows how to invoke it.
(AI_EN_LECTOR_REPLACEMENT_VERB_FORM)
🪛 Markdownlint
notebooks/week3/README.md
50-50: Expected: asterisk; Actual: dash
Unordered list style
(MD004, ul-style)
51-51: Expected: asterisk; Actual: dash
Unordered list style
(MD004, ul-style)
52-52: Expected: asterisk; Actual: dash
Unordered list style
(MD004, ul-style)
53-53: Expected: asterisk; Actual: dash
Unordered list style
(MD004, ul-style)
63-63: Expected: asterisk; Actual: dash
Unordered list style
(MD004, ul-style)
64-64: Expected: asterisk; Actual: dash
Unordered list style
(MD004, ul-style)
65-65: Expected: asterisk; Actual: dash
Unordered list style
(MD004, ul-style)
66-66: Expected: asterisk; Actual: dash
Unordered list style
(MD004, ul-style)
67-67: Expected: asterisk; Actual: dash
Unordered list style
(MD004, ul-style)
68-68: Expected: asterisk; Actual: dash
Unordered list style
(MD004, ul-style)
69-69: Expected: asterisk; Actual: dash
Unordered list style
(MD004, ul-style)
70-70: Expected: asterisk; Actual: dash
Unordered list style
(MD004, ul-style)
15-15: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
30-30: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
43-43: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
57-57: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (10)
project_config.yml (3)
10-11: Review A/B test parameter values
The learning rates for both variants are identical (0.02), which defeats the purpose of A/B testing for this parameter. Consider using different values to effectively test the impact of learning rate on model performance.
13-14: Verify max_depth parameter implications
The significant difference in max_depth values (6 vs 10) could lead to notably different model complexities:
- Variant A (depth=6): More regularized, less prone to overfitting
- Variant B (depth=10): More complex, potentially better at capturing intricate patterns but higher risk of overfitting
Ensure this difference aligns with your experimental objectives and consider monitoring validation metrics closely.
12-12: 🛠️ Refactor suggestion

Restructure A/B test parameters for consistency

The `n_estimators` parameter in the `ab_test` section should follow the same naming pattern as other parameters with variant suffixes (a/b). Additionally, consider removing the duplicate definition as it's already defined in the main parameters section.
Apply this change:
ab_test:
learning_rate_a: 0.02
learning_rate_b: 0.02
- n_estimators: 1000
+ n_estimators_a: 1000
+ n_estimators_b: 1000
max_depth_a: 6
max_depth_b: 10
Likely invalid or redundant comment.
src/sleep_efficiency/config.py (1)
15-15: LGTM! Verify A/B test parameters in config files.

The new `ab_test` attribute is well-defined and properly typed.
Let's verify the A/B test parameters are properly defined in the YAML config files:
✅ Verification successful
A/B test parameters are properly configured in YAML
The `ab_test` configuration in `project_config.yml` is well-structured with:
- Variant-specific parameters (A/B) for learning rate and max depth
- Common parameters shared across variants (n_estimators)
- Values that align with the base model parameters
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if A/B test parameters are defined in YAML config files
# Expected: Find YAML files containing ab_test configuration
# Search for YAML files containing ab_test configuration
rg -l "ab_test:" -g "*.yml" -g "*.yaml"
Length of output: 58
Script:
#!/bin/bash
# Let's check the content of project_config.yml to verify the A/B test parameters
cat project_config.yml
Length of output: 608
notebooks/week3/README.md (1)
58-59: Fix formatting and verify A/B testing implementation.
- Add Python language specification to the code block
- Use consistent list style (asterisks instead of dashes)
The A/B testing implementation using a hash function needs verification to ensure fair traffic distribution.
Also applies to: 63-70
✅ Verification successful
A/B testing is implemented with fair traffic distribution

The implementation in `notebooks/week3/04.AB_test_model_serving.py` shows a robust A/B testing setup (a minimal sketch of the routing rule follows the list below):
- Uses MD5 hash of user ID for consistent model assignment
- Modulo 2 operation ensures 50-50 split between models
- Wrapper class properly handles input validation and model selection
The original review comment's concerns about traffic distribution are addressed by the implementation.
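To make the routing rule concrete, here is a minimal, illustrative sketch of the hash-based assignment described above (the function and variable names are ours, not the notebook's):

```python
# Hedged sketch: MD5-hash the user id and take it modulo 2 to pick a variant.
# The same id always hashes to the same value, so assignment is deterministic.
import hashlib


def assign_variant(user_id: str) -> str:
    hashed_id = hashlib.md5(user_id.encode("UTF-8")).hexdigest()
    return "Model A" if int(hashed_id, 16) % 2 else "Model B"


assert assign_variant("42") == assign_variant("42")  # consistent per-user assignment
```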
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify A/B testing implementation in the codebase
# Search for hash function implementation in A/B testing
rg -A 10 "hash.*function" "04.AB_test_model_serving.py"
# Look for model wrapper class implementation
ast-grep --pattern 'class $_ {
$$$
predict($_, $_) {
$$$
}
}'
Length of output: 197
Script:
#!/bin/bash
# Search for the A/B testing related Python files in notebooks directory
fd -e py "AB.*test" notebooks/
# Check content of any A/B testing related notebooks
fd -e ipynb "AB.*test" notebooks/ -x cat {}
# Search for hash function implementation in Python files under notebooks
rg -A 10 "hash.*function" notebooks/
Length of output: 587
Script:
#!/bin/bash
# Check the content of the A/B testing implementation file
cat notebooks/week3/04.AB_test_model_serving.py
# Search for any other files that might reference A/B testing implementation
rg -l "hash|wrapper.*class|AB.*test" notebooks/week3/
Length of output: 11528
notebooks/week3/02.model_serving.py (1)
35-35: Verify dataset size before converting to Pandas DataFrame.

Converting a large Spark DataFrame to a Pandas DataFrame using `.toPandas()` can cause memory issues if the dataset is substantial.

Ensure that the `train_set` is small enough to fit into memory. If it's large, consider sampling or processing the data in Spark before conversion (a sketch follows below).
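As an illustration only, here is a minimal sketch of Spark-side sampling before conversion; the fraction and seed are arbitrary placeholder values, and `spark`, `catalog_name`, and `schema_name` are assumed to exist as in the notebook:

```python
# Hedged sketch: sample in Spark first, then collect only the subset to the driver.
sampled_train_set = (
    spark.table(f"{catalog_name}.{schema_name}.train_set")
    .sample(fraction=0.1, seed=42)  # keep roughly 10% of rows
    .toPandas()
)
```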
notebooks/week3/04.AB_test_model_serving.py (1)
301-301: Verify the availability of `dbutils` in the execution environment

The variable `dbutils` might be undefined if this script runs outside a Databricks notebook environment. Ensure that `dbutils` is available or import it appropriately.

If necessary, you can import `dbutils` using:

```python
from pyspark.dbutils import DBUtils

dbutils = DBUtils(spark)
```
🧰 Tools
🪛 Ruff
301-301: Undefined name dbutils
(F821)
notebooks/week3/01.feature_serving.py (3)
174-174: Undefined name `dbutils` outside of notebook context

The variable `dbutils` is used without being defined or imported. In Databricks notebooks, `dbutils` is provided implicitly. However, if this code is intended to run outside of a notebook environment, this will result in a `NameError`.

Ensure that `dbutils` is appropriately defined or imported if the code will be executed outside of a notebook. If running in a notebook, you can disregard the static analysis warning.
🧰 Tools
🪛 Ruff
174-174: Undefined name dbutils
(F821)
183-183: Undefined name `display` outside of notebook context

The function `display()` is used without being imported or defined. In Databricks notebooks, `display()` is available by default. If this script is intended to run outside of a notebook environment, consider replacing `display(id_list)` with `print(id_list)` or importing the necessary module.
Ensure compatibility with the intended execution environment.
🧰 Tools
🪛 Ruff
183-183: Undefined name display
(F821)
90-90: Confirm preprocessing of categorical features for prediction

The line `pipeline.predict(df[cat_features + num_features])` uses both categorical and numerical features directly for prediction. Ensure that the pipeline includes preprocessing steps to handle categorical variables appropriately (e.g., encoding); a sketch follows below. If the model expects preprocessed inputs, verify that the data matches the expected format.

Confirm that the `pipeline` can handle the provided categorical features without additional preprocessing.
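For reference, here is a hypothetical sketch of what such preprocessing inside the pipeline could look like; the column names below are borrowed from the project's feature lists only as examples, and the estimator choice is an assumption:

```python
# Hedged sketch: encode categorical columns inside the pipeline so that
# pipeline.predict(df[cat_features + num_features]) can accept raw categories.
from lightgbm import LGBMRegressor
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

cat_features = ["gender", "smoking_status"]  # assumed categorical columns
num_features = ["age", "sleep_duration"]     # assumed numerical columns

preprocessor = ColumnTransformer(
    transformers=[("cat", OneHotEncoder(handle_unknown="ignore"), cat_features)],
    remainder="passthrough",  # numerical columns pass through unchanged
)

pipeline = Pipeline(steps=[("preprocessor", preprocessor), ("regressor", LGBMRegressor())])
```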
```diff
@@ -1,10 +1,10 @@
 # Databricks notebook source
 # The 2 cells below is only when you are running from databricks UI, because of 'possible' not working locally in VS
 # %pip install mlops_with_databricks-0.0.1-py3-none-any.whl
 %pip install ../mlops_with_databricks-0.0.1-py3-none-any.whl
```
💡 Codebase verification
The relative path is incorrect and should be updated to `./mlops_with_databricks-0.0.1-py3-none-any.whl`

The wheel file is actually located in the same directory as the notebook (`notebooks/`), not in the parent directory. The current relative path `../mlops_with_databricks-0.0.1-py3-none-any.whl` is incorrect and would fail to find the package.
🔗 Analysis chain
Consider using an absolute path for package installation.
The relative path `../mlops_with_databricks-0.0.1-py3-none-any.whl` might break if the notebook is run from a different working directory.
Let's verify the package location:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Find the wheel file location
fd -e whl
Length of output: 66
🧰 Tools
🪛 Ruff
3-3: SyntaxError: Expected a statement
3-3: SyntaxError: Simple statements must be separated by newlines or semicolons
3-3: SyntaxError: Expected an identifier
3-3: SyntaxError: Expected an identifier
3-3: SyntaxError: Simple statements must be separated by newlines or semicolons
```python
# COMMAND ----------

token = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()
```
Undefined name `dbutils`

The variable `dbutils` is not defined in this script. This will result in a `NameError` when attempting to retrieve the API token. Ensure that `dbutils` is available in the execution environment or properly imported.
🧰 Tools
🪛 Ruff
89-89: Undefined name dbutils
(F821)
```python
dataframe_records[0] = [
    {k: (v.isoformat() if isinstance(v, pd.Timestamp) else v) for k, v in record.items()}
    for record in dataframe_records[0]
]
```
Processing only the first record in `dataframe_records`

Currently, you're modifying only the first element of `dataframe_records`, which means only the first record will have the `pd.Timestamp` objects formatted correctly. If your intent is to process all records before sending them to the endpoint, you should iterate over the entire list (see the sketch below).

Consider updating the code as follows to process all records:

Committable suggestion skipped: line range outside the PR's diff.
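Since the committable suggestion was skipped, here is one possible version of that change (illustrative only), formatting the `pd.Timestamp` values in every batch of `dataframe_records` rather than just the first one:

```python
# Hedged sketch: apply the timestamp conversion to all batches, not only index 0.
dataframe_records = [
    [
        {k: (v.isoformat() if isinstance(v, pd.Timestamp) else v) for k, v in record.items()}
        for record in batch
    ]
    for batch in dataframe_records
]
```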
```python
sampled_records = train_set[required_columns].sample(n=1000, replace=True).to_dict(orient="records")
dataframe_records = [[record] for record in sampled_records]
```
🛠️ Refactor suggestion
Refactor to eliminate code duplication when converting timestamps.
The logic for converting `pd.Timestamp` objects to ISO format appears multiple times. Refactoring this into a helper function improves maintainability and reduces repetition.

Define a helper function:

```python
def convert_timestamps(record):
    return {k: (v.isoformat() if isinstance(v, pd.Timestamp) else v) for k, v in record.items()}
```
Update the code to use this function:
-sampled_records = train_set[required_columns].sample(n=1000, replace=True).to_dict(orient="records")
-dataframe_records = [[record] for record in sampled_records]
+sampled_records = train_set[required_columns].sample(n=1000, replace=True).to_dict(orient="records")
+dataframe_records = [convert_timestamps(record) for record in sampled_records]
In the `send_request` function:
-def send_request():
- random_record = [
- {k: (v.isoformat() if isinstance(v, pd.Timestamp) else v) for k, v in record.items()}
- for record in random.choice(dataframe_records)
- ]
+def send_request():
+ random_record = random.choice(dataframe_records)
Ensure that when sending the request, `random_record` is properly formatted:
response = requests.post(
model_serving_endpoint,
headers=headers,
- json={"dataframe_records": random_record},
+ json={"dataframe_records": [random_record]},
)
Also applies to: 118-121, 153-155
notebooks/week3/02.model_serving.py (Outdated)

```python
# COMMAND ----------

token = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()
```
Fix undefined name `dbutils`.

The variable `dbutils` is not defined in this context, which will lead to a `NameError`. To access `dbutils` in a Python script, you need to import it appropriately.

Apply this diff to fix the issue:

```diff
+from pyspark.dbutils import DBUtils
 spark = SparkSession.builder.getOrCreate()
+dbutils = DBUtils(spark)
 token = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()
```
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff
62-62: Undefined name dbutils
(F821)
```python
random_record = [
    {k: (v.isoformat() if isinstance(v, pd.Timestamp) else v) for k, v in record.items()}
    for record in random.choice(dataframe_records)
]
start_time = time.time()
response = requests.post(
    model_serving_endpoint,
    headers=headers,
    json={"dataframe_records": random_record},
)
end_time = time.time()
latency = end_time - start_time
return response.status_code, latency
```
🛠️ Refactor suggestion
Add error handling in the `send_request` function.
Currently, if a request fails, the exception isn't handled, which might disrupt the load test.
Add try-except blocks to handle exceptions gracefully:
```diff
 def send_request():
     try:
         random_record = random.choice(dataframe_records)
         start_time = time.time()
         response = requests.post(
             model_serving_endpoint,
             headers=headers,
             json={"dataframe_records": [random_record]},
         )
         end_time = time.time()
         latency = end_time - start_time
+        if response.status_code != 200:
+            print(f"Request failed with status code {response.status_code}")
+        return response.status_code, latency
+    except Exception as e:
+        print(f"An error occurred: {e}")
+        return None, None
-
-    return response.status_code, latency
```
Committable suggestion skipped: line range outside the PR's diff.
```python
def predict(self, context, model_input):
    if isinstance(model_input, pd.DataFrame):
        sleep_person_id = str(model_input["id"].values[0])
        hashed_id = hashlib.md5(sleep_person_id.encode(encoding="UTF-8")).hexdigest()
        # convert a hexadecimal (base-16) string into an integer
        if int(hashed_id, 16) % 2:
            predictions = self.model_a.predict(model_input.drop(["id"], axis=1))
            return {"Prediction": predictions[0], "model": "Model A"}
        else:
            predictions = self.model_b.predict(model_input.drop(["id"], axis=1))
            return {"Prediction": predictions[0], "model": "Model B"}
    else:
        raise ValueError("Input must be a pandas DataFrame.")
```
Ensure proper handling of batch inputs in the `predict` method

The current implementation of the `predict` method assumes that `model_input` contains only a single record, accessing it via `values[0]`. This approach may lead to errors or incorrect predictions when multiple records are passed. To make the model robust and capable of handling batch predictions, consider iterating over all records.
Here's a suggested refactor:
```diff
 def predict(self, context, model_input):
     if isinstance(model_input, pd.DataFrame):
-        sleep_person_id = str(model_input["id"].values[0])
-        hashed_id = hashlib.md5(sleep_person_id.encode(encoding="UTF-8")).hexdigest()
-        if int(hashed_id, 16) % 2:
-            predictions = self.model_a.predict(model_input.drop(["id"], axis=1))
-            return {"Prediction": predictions[0], "model": "Model A"}
-        else:
-            predictions = self.model_b.predict(model_input.drop(["id"], axis=1))
-            return {"Prediction": predictions[0], "model": "Model B"}
+        predictions = []
+        models_used = []
+        for idx, row in model_input.iterrows():
+            sleep_person_id = str(row["id"])
+            hashed_id = hashlib.md5(sleep_person_id.encode("UTF-8")).hexdigest()
+            model_features = row.drop("id").to_frame().T
+            if int(hashed_id, 16) % 2:
+                prediction = self.model_a.predict(model_features)
+                models_used.append("Model A")
+            else:
+                prediction = self.model_b.predict(model_features)
+                models_used.append("Model B")
+            predictions.append(prediction[0])
+        return pd.DataFrame({"Prediction": predictions, "model": models_used})
     else:
         raise ValueError("Input must be a pandas DataFrame.")
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def predict(self, context, model_input):
    if isinstance(model_input, pd.DataFrame):
        predictions = []
        models_used = []
        for idx, row in model_input.iterrows():
            sleep_person_id = str(row["id"])
            hashed_id = hashlib.md5(sleep_person_id.encode("UTF-8")).hexdigest()
            model_features = row.drop("id").to_frame().T
            if int(hashed_id, 16) % 2:
                prediction = self.model_a.predict(model_features)
                models_used.append("Model A")
            else:
                prediction = self.model_b.predict(model_features)
                models_used.append("Model B")
            predictions.append(prediction[0])
        return pd.DataFrame({"Prediction": predictions, "model": models_used})
    else:
        raise ValueError("Input must be a pandas DataFrame.")
```
```python
FeatureLookup(
    table_name=feature_table_name,
    lookup_key="Id",
    feature_names=["id", "sleep_duration", "awakenings", "sleep_month", "Predicted_SleepEfficiency"],
```
Inconsistent use of 'Id' vs 'id' could lead to errors
There is inconsistency in the use of `'Id'` (uppercase `I`) and `'id'` (lowercase `i`) as key identifiers in your code. This can cause `KeyError` exceptions or unexpected behavior when accessing DataFrame columns or performing lookups. Ensure consistent use of `'id'` (lowercase) throughout the code to match the DataFrame column names.
Apply the following changes to resolve the inconsistency:
At line 132:
- lookup_key="Id",
+ lookup_key="id",
At line 179:
- id_list = preds_df["Id"]
+ id_list = preds_df["id"]
At line 197:
- json={"dataframe_records": [{"Id": "2"}]},
+ json={"dataframe_records": [{"id": "2"}]},
At line 215:
- json={"dataframe_split": {"columns": ["Id"], "data": [["2"]]}},
+ json={"dataframe_split": {"columns": ["id"], "data": [["2"]]}},
At line 237:
- json={"dataframe_records": [{"Id": random_id}]},
+ json={"dataframe_records": [{"id": random_id}]},
Also applies to: 179-179, 197-197, 215-215, 237-237
```python
response = requests.post(
    serving_endpoint,
    headers=headers,
    json={"dataframe_records": [{"Id": random_id}]},
)
end_time = time.time()
latency = end_time - start_time  # Calculate latency for this request
return response.status_code, latency
```
🛠️ Refactor suggestion
Add exception handling to API requests in the `send_request` function

The `send_request` function lacks error handling for potential exceptions during the API request. This could cause the program to crash during the load test if a network error occurs.
Consider adding exception handling to make the code more robust:
```python
def send_request():
    random_id = random.choice(id_list)
    start_time = time.time()
    try:
        response = requests.post(
            serving_endpoint,
            headers=headers,
            json={"dataframe_records": [{"id": random_id}]},
        )
        response.raise_for_status()
        status_code = response.status_code
    except requests.exceptions.RequestException as e:
        status_code = None
        print(f"Request failed: {e}")
    end_time = time.time()
    latency = end_time - start_time  # Calculate latency for this request
    return status_code, latency
```
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (9)
notebooks/week3/03.model_serving_feature_lookup.py (3)
1-2: Consider using a package registry for distribution

Installing from a local wheel file may cause versioning and distribution issues. Consider publishing the package to a private PyPI repository or using Databricks' artifact storage.
69-81: Consider workload size based on performance requirements

The endpoint is configured with a "Small" workload size and scale-to-zero enabled. While this is cost-effective, verify that:
- The small instance type can handle your expected request volume
- The cold start latency from scale-to-zero is acceptable for your use case
149-149: Remove or utilize unused data loading

The temperature features table is loaded but never used. Either remove this line if it's not needed or add the intended usage of this data.
notebooks/week3/01.feature_serving.py (5)
96-101: Add error handling for table creation

The table creation should handle cases where the table might already exist.

```diff
+try:
     fe.create_table(
         name=feature_table_name,
         primary_keys=["id"],
         df=preds_df,
         description="Sleep efficiencies predictions feature table",
     )
+except Exception as e:
+    print(f"Error creating table: {e}")
+    # Consider whether to proceed or raise the exception
```
151-162: Consider adding monitoring and alerting

While the endpoint configuration looks good, consider adding monitoring and alerting for production readiness. This could include metrics for latency, error rates, and resource utilization.
174-174: Add type hint for `dbutils`

The `dbutils` variable is flagged as undefined. Add a type hint to clarify it's a Databricks utility.

```diff
+from typing import TYPE_CHECKING
+if TYPE_CHECKING:
+    from databricks.sdk.runtime import dbutils
```

🧰 Tools
🪛 Ruff
174-174: Undefined name
dbutils
(F821)
189-202: Add response content validation

The endpoint test should validate the response content structure and values, not just the status code.

```diff
 response = requests.post(
     f"{serving_endpoint}",
     headers={"Authorization": f"Bearer {token}"},
     json={"dataframe_records": [{"Id": "2"}]},
 )
 end_time = time.time()
 execution_time = end_time - start_time
 print("Response status:", response.status_code)
+if response.status_code == 200:
+    response_data = response.json()
+    # Validate response structure
+    if "predictions" not in response_data:
+        print("Warning: Unexpected response structure")
+    # Validate prediction values
+    predictions = response_data.get("predictions", [])
+    if not predictions or not all(isinstance(p, (int, float)) for p in predictions):
+        print("Warning: Invalid prediction values")
 print("Reponse text:", response.text)
 print("Execution time:", execution_time, "seconds")
```

221-225: Consider parameterizing load test configuration

Load test parameters like `num_requests` and `max_workers` should be configurable to support different test scenarios.

```diff
+# Load test configuration
+load_test_config = {
+    "num_requests": 10,
+    "max_workers": 100,
+    "timeout_seconds": 30
+}
 serving_endpoint = f"https://{host}/serving-endpoints/sleep-efficiencies-feature-serving/invocations"
 id_list = preds_df.select("Id").rdd.flatMap(lambda x: x).collect()
 headers = {"Authorization": f"Bearer {token}"}
-num_requests = 10
+num_requests = load_test_config["num_requests"]
```

notebooks/week3/04.AB_test_model_serving.py (1)
103-103: Consider retrieving `git_sha` programmatically

Currently, `git_sha` is hardcoded as "week3". To enhance reproducibility and ensure accurate tracking of model versions, consider retrieving the actual Git commit hash programmatically.

Apply this diff to implement the change:

```diff
-    git_sha = "week3"
+    import subprocess
+    git_sha = subprocess.check_output(['git', 'rev-parse', 'HEAD']).decode('ascii').strip()
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
- notebooks/week1&2/05.log_and_register_fe_model.py (3 hunks)
- notebooks/week3/01.feature_serving.py (1 hunks)
- notebooks/week3/03.model_serving_feature_lookup.py (1 hunks)
- notebooks/week3/04.AB_test_model_serving.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- notebooks/week1&2/05.log_and_register_fe_model.py
🧰 Additional context used
🪛 Ruff
notebooks/week3/01.feature_serving.py
174-174: Undefined name dbutils
(F821)
notebooks/week3/03.model_serving_feature_lookup.py
90-90: Undefined name dbutils
(F821)
notebooks/week3/04.AB_test_model_serving.py
299-299: Undefined name dbutils
(F821)
🔇 Additional comments (4)
notebooks/week3/03.model_serving_feature_lookup.py (2)
90-91: Skip comment about undefined `dbutils` as it was already reported in past reviews
🧰 Tools
🪛 Ruff
90-90: Undefined name dbutils
(F821)
44-52: Verify primary key uniqueness for temperature features
Using only "Month" as the primary key assumes that temperature features are unique per month. This might lead to data conflicts if there are multiple temperature readings per month.
✅ Verification successful
Let me try a different approach to verify the data structure and primary key usage.
Let me try one more search without file type restriction to find the feature engineering code.
Based on the search results, I can now provide a final response since I found the table creation code that shows the complete schema and constraints.
Primary key on "Month" is correctly configured
The table schema in notebooks/week1&2/05.log_and_register_fe_model.py
shows that:
- The table has `Id`, `Month`, and `AverageTemperature` columns
- `Month` is explicitly defined as a PRIMARY KEY constraint
- Data is sourced from the `temperatures_netherlands_2021` table with a direct month-to-month mapping

The primary key choice is appropriate as the table stores aggregated temperature data per month, ensuring uniqueness.
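As a purely illustrative follow-up (not executed as part of this review), the one-row-per-month assumption could also be checked directly against the feature table; `spark`, `catalog_name`, and `schema_name` are assumed to come from the project config as in the notebooks:

```python
# Hedged sketch: count rows per Month and surface any duplicates.
from pyspark.sql import functions as F

duplicate_months = (
    spark.table(f"{catalog_name}.{schema_name}.temperature_features")
    .groupBy("Month")
    .count()
    .filter(F.col("count") > 1)
)
duplicate_months.show()  # an empty result confirms one row per Month
```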
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for potential duplicate months in the source table
echo "Checking for duplicate months in temperature_features table..."
ast-grep --pattern 'SELECT Month, COUNT(*) as count
FROM $catalog.$schema.temperature_features
GROUP BY Month
HAVING COUNT(*) > 1'
Length of output: 276
Script:
#!/bin/bash
# Search for table creation or feature engineering code
rg -A 10 "temperature_features" --type python
Length of output: 76
Script:
#!/bin/bash
# Search for temperature features table creation and transformation
rg -A 10 "temperature_features"
Length of output: 6625
notebooks/week3/01.feature_serving.py (1)
2-2: Verify the wheel file installation path
The wheel file is being installed from a relative path. This could break if the notebook is moved to a different location.
✅ Verification successful
The relative wheel file path is correct and follows project conventions
Based on the repository structure and README documentation, the relative wheel file path `../mlops_with_databricks-0.0.1-py3-none-any.whl` in `notebooks/week3/01.feature_serving.py` is correct because:
- The wheel file is consistently referenced across multiple notebooks in the same way
- The README explicitly documents the process of building and uploading the wheel file to Databricks volumes
- The notebook structure follows a weekly organization pattern where notebooks are placed in week-specific folders
The relative path ensures the wheel file can be accessed from any week's notebook directory, making it a reliable approach for this project's structure.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if the wheel file exists and get its details
fd -e whl . | while read -r wheel; do
echo "Found wheel: $wheel"
unzip -l "$wheel" | grep -i "METADATA\|RECORD"
done
Length of output: 358
Script:
#!/bin/bash
# Let's try to understand the notebook structure and wheel file references
rg -l "mlops_with_databricks.*\.whl" --type py
# Check if there are any setup or installation instructions in README
fd "README" --type f --exec cat {}
Length of output: 7357
notebooks/week3/04.AB_test_model_serving.py (1)
215-228: Ensure proper handling of batch inputs in the `predict` method
```python
config = ProjectConfig.from_yaml(config_path="../../project_config.yml")

catalog_name = config.catalog_name
schema_name = config.schema_name
```
🛠️ Refactor suggestion
Remove duplicate configuration loading
This configuration loading block is identical to the one at lines 38-40. Consider removing this duplicate block to improve maintainability.
-config = ProjectConfig.from_yaml(config_path="../../project_config.yml")
-
-catalog_name = config.catalog_name
-schema_name = config.schema_name
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
config = ProjectConfig.from_yaml(config_path="../../project_config.yml")

catalog_name = config.catalog_name
schema_name = config.schema_name
```
```python
train_set = spark.table(f"{catalog_name}.{schema_name}.train_set").toPandas()

sampled_records = train_set[required_columns].sample(n=1000, replace=True).to_dict(orient="records")
dataframe_records = [[record] for record in sampled_records]
```
🛠️ Refactor suggestion
Consider batching for large datasets
Loading the entire training set into memory with .toPandas()
could cause OOM issues with large datasets. Consider using:
- Sampling at Spark level before conversion to Pandas
- Processing in batches for production workloads
-train_set = spark.table(f"{catalog_name}.{schema_name}.train_set").toPandas()
-sampled_records = train_set[required_columns].sample(n=1000, replace=True)
+train_set = spark.table(f"{catalog_name}.{schema_name}.train_set").select(*required_columns)
+sampled_records = train_set.sample(n=1000, withReplacement=True).toPandas()
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
train_set = spark.table(f"{catalog_name}.{schema_name}.train_set").select(*required_columns)
sampled_records = train_set.sample(n=1000, withReplacement=True).toPandas()
dataframe_records = [[record] for record in sampled_records.to_dict(orient="records")]
```
```python
response = requests.post(
    f"{model_serving_endpoint}",
    headers={"Authorization": f"Bearer {token}"},
    json={"dataframe_records": dataframe_records[0]},
)
```
Add error handling for API requests
The endpoint call lacks error handling. Consider adding try-catch blocks and proper error handling for:
- Connection errors
- Timeout errors
- HTTP error responses
+try:
response = requests.post(
f"{model_serving_endpoint}",
headers={"Authorization": f"Bearer {token}"},
json={"dataframe_records": dataframe_records[0]},
+ timeout=30 # Add appropriate timeout
)
+ response.raise_for_status()
+except requests.exceptions.RequestException as e:
+ print(f"Error calling endpoint: {str(e)}")
+ raise
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
try:
    response = requests.post(
        f"{model_serving_endpoint}",
        headers={"Authorization": f"Bearer {token}"},
        json={"dataframe_records": dataframe_records[0]},
        timeout=30  # Add appropriate timeout
    )
    response.raise_for_status()
except requests.exceptions.RequestException as e:
    print(f"Error calling endpoint: {str(e)}")
    raise
```
```python
# COMMAND ----------

# Load the MLflow model for predictions
pipeline = mlflow.sklearn.load_model(f"models:/{catalog_name}.{schema_name}.sleep_efficiency_model_basic/3")
```
🛠️ Refactor suggestion
Consider parameterizing the model version
The model version is hardcoded to "3". Consider moving this to configuration to make it more maintainable and flexible.
-pipeline = mlflow.sklearn.load_model(f"models:/{catalog_name}.{schema_name}.sleep_efficiency_model_basic/3")
+model_version = config.model_version # Add this to your config
+pipeline = mlflow.sklearn.load_model(f"models:/{catalog_name}.{schema_name}.sleep_efficiency_model_basic/{model_version}")
Committable suggestion skipped: line range outside the PR's diff.
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (6)
notebooks/week3/02.model_serving.py (4)
72-86: Move required columns to configuration file.

The list of required columns should be maintained in the configuration for better maintainability and reusability.

```diff
+# Add to ProjectConfig class
+required_columns: List[str] = [
+    "age", "sleep_duration", "rem_sleep_percentage",
+    "deep_sleep_percentage", "light_sleep_percentage",
+    "awakenings", "caffeine_consumption",
+    "alcohol_consumption", "exercise_frequency",
+    "gender", "smoking_status", "bedtime",
+    "wakeup_time",
+]

-required_columns = [
-    "age",
-    "sleep_duration",
-    ...
-]
+required_columns = config.required_columns
```
93-109: Remove or update misleading documentation.

The example request body format shows house pricing features (LotFrontage, LotArea, etc.) which don't match the actual sleep efficiency model features being used.
132-134: Improve response logging security and format.

The current logging might expose sensitive data and lacks proper formatting:

- Avoid logging full response text
- Add log levels (INFO/ERROR)
- Format the output consistently

-print("Response status:", response.status_code)
-print("Reponse text:", response.text)
-print("Execution time:", execution_time, "seconds")
+print(f"Response status: {response.status_code}")
+if response.status_code != 200:
+    print(f"Error: {response.status_code}")
+    print(f"Error message: {response.json().get('message', 'Unknown error')}")
+print(f"Execution time: {execution_time:.2f} seconds")
147-148: Make load test parameters configurable.

The number of requests and concurrent workers should be configurable parameters.

+# Add to ProjectConfig class
+load_test_config: Dict[str, int] = {
+    "num_requests": 1000,
+    "max_workers": 100
+}

-num_requests = 1000
+num_requests = config.load_test_config["num_requests"]

notebooks/week3/04.AB_test_model_serving.py (2)
96-96: Consider adding cross-validation for hyperparameter tuning.

The current implementation uses fixed hyperparameters from config. Adding cross-validation would help optimize model performance.
Consider implementing k-fold cross-validation:
from sklearn.model_selection import GridSearchCV

param_grid = {
    'regressor__learning_rate': [parameters_a['learning_rate'], 0.01, 0.1],
    'regressor__max_depth': [parameters_a['max_depth'], 5, 7]
}
cv_pipeline = GridSearchCV(pipeline, param_grid, cv=5)
cv_pipeline.fit(X_train, y_train)
337-348: Improve response handling and logging.

The current implementation prints raw response text without proper error handling or structured logging.
Improve response handling:
-response = requests.post(
-    f"{model_serving_endpoint}",
-    headers={"Authorization": f"Bearer {token}"},
-    json={"dataframe_records": dataframe_records[0]},
-)
-
-end_time = time.time()
-execution_time = end_time - start_time
-
-print("Response status:", response.status_code)
-print("Reponse text:", response.text)
-print("Execution time:", execution_time, "seconds")
+try:
+    response = requests.post(
+        f"{model_serving_endpoint}",
+        headers={"Authorization": f"Bearer {token}"},
+        json={"dataframe_records": dataframe_records[0]},
+        timeout=30
+    )
+    response.raise_for_status()
+
+    end_time = time.time()
+    execution_time = end_time - start_time
+
+    result = response.json()
+    print(f"Success! Execution time: {execution_time:.2f} seconds")
+    print(f"Predictions: {json.dumps(result, indent=2)}")
+except requests.exceptions.RequestException as e:
+    print(f"Error calling endpoint: {e}")
+    if hasattr(e, 'response'):
+        print(f"Response status: {e.response.status_code}")
+        print(f"Response body: {e.response.text}")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
- notebooks/week3/01.feature_serving.py (1 hunks)
- notebooks/week3/02.model_serving.py (1 hunks)
- notebooks/week3/03.model_serving_feature_lookup.py (1 hunks)
- notebooks/week3/04.AB_test_model_serving.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- notebooks/week3/01.feature_serving.py
- notebooks/week3/03.model_serving_feature_lookup.py
🔇 Additional comments (2)
notebooks/week3/02.model_serving.py (1)
2-2: Verify the integrity of the local wheel file installation.
Installing from a local wheel file could pose security risks. Consider:
- Publishing the package to a private PyPI repository
Adding integrity checks (e.g., SHA-256 hash verification; a sketch follows this list)
- Documenting the wheel file's source and build process
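A minimal sketch of the SHA-256 check mentioned above might look like this; the expected hash and wheel path are placeholders, not values from this repository.

# Hypothetical integrity check before installing a local wheel (hash and path are placeholders)
import hashlib
from pathlib import Path

EXPECTED_SHA256 = "<expected-sha256-of-the-wheel>"  # placeholder
wheel_path = Path("dist/sleep_efficiency-0.0.1-py3-none-any.whl")  # placeholder path

digest = hashlib.sha256(wheel_path.read_bytes()).hexdigest()
if digest != EXPECTED_SHA256:
    raise ValueError(f"Wheel hash mismatch: {digest}")
# Install only after the hash matches, e.g.
# %pip install dist/sleep_efficiency-0.0.1-py3-none-any.whl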
notebooks/week3/04.AB_test_model_serving.py (1)
215-228: LGTM with existing review comment
The implementation of the A/B testing logic using hashed IDs is sound, but please address the batch prediction handling issue raised in the previous review.
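For context, the hashed-ID routing pattern this refers to can be sketched roughly as follows; the column name "id", the hashing scheme, and the model handles are assumptions for illustration, not code from the notebook.

# Rough sketch of deterministic hash-based A/B routing (identifiers and model handles are assumptions)
import hashlib
import pandas as pd

def route_model(record_id: str) -> str:
    """Assign a record to model A or B based on the parity of its hashed ID."""
    digest = hashlib.md5(record_id.encode("utf-8")).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

def predict_ab(record: dict, model_a, model_b):
    # model_a / model_b stand in for the two registered pipelines
    chosen = route_model(str(record["id"]))
    model = model_a if chosen == "A" else model_b
    features = pd.DataFrame([{k: v for k, v in record.items() if k != "id"}])
    return chosen, model.predict(features)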
name="sleep-efficiencies-model-serving", | ||
config=EndpointCoreConfigInput( | ||
served_entities=[ | ||
ServedEntityInput( | ||
entity_name=f"{catalog_name}.{schema_name}.sleep_efficiency_model_basic", | ||
scale_to_zero_enabled=True, | ||
workload_size="Small", | ||
entity_version=3, | ||
) |
🛠️ Refactor suggestion
Avoid hardcoding endpoint name and version number.
The endpoint name and version number should be configurable parameters:
- Move "sleep-efficiencies-model-serving" to configuration
- Make the entity version (3) configurable
+# Add to ProjectConfig class
+model_endpoint_name: str = "sleep-efficiencies-model-serving"
+model_version: int = 3

-    name="sleep-efficiencies-model-serving",
+    name=config.model_endpoint_name,
     config=EndpointCoreConfigInput(
         served_entities=[
             ServedEntityInput(
                 entity_name=f"{catalog_name}.{schema_name}.sleep_efficiency_model_basic",
                 scale_to_zero_enabled=True,
                 workload_size="Small",
-                entity_version=3,
+                entity_version=config.model_version,
Committable suggestion skipped: line range outside the PR's diff.
with ThreadPoolExecutor(max_workers=100) as executor:
    futures = [executor.submit(send_request) for _ in range(num_requests)]

    for future in as_completed(futures):
        status_code, latency = future.result()
        latencies.append(latency)
🛠️ Refactor suggestion
Improve load test results collection and error handling.
The current implementation doesn't track failed requests or provide detailed statistics.
+results = {"success": 0, "failed": 0, "latencies": [], "errors": {}}
 with ThreadPoolExecutor(max_workers=100) as executor:
     futures = [executor.submit(send_request) for _ in range(num_requests)]
     for future in as_completed(futures):
-        status_code, latency = future.result()
-        latencies.append(latency)
+        try:
+            status_code, latency = future.result()
+            if status_code == 200:
+                results["success"] += 1
+                results["latencies"].append(latency)
+            else:
+                results["failed"] += 1
+                results["errors"][status_code] = results["errors"].get(status_code, 0) + 1
+        except Exception as e:
+            results["failed"] += 1
+            error_type = type(e).__name__
+            results["errors"][error_type] = results["errors"].get(error_type, 0) + 1
+
+print(f"\nResults:")
+print(f"Success: {results['success']}/{num_requests} ({results['success']/num_requests*100:.2f}%)")
+print(f"Failed: {results['failed']}/{num_requests} ({results['failed']/num_requests*100:.2f}%)")
+if results["errors"]:
+    print("\nErrors:")
+    for error, count in results["errors"].items():
+        print(f"- {error}: {count}")
# Repeat the training and logging steps for Model B using parameters for B
pipeline = Pipeline(steps=[("preprocessor", preprocessor), ("regressor", LGBMRegressor(**parameters_b))])

# Start MLflow run for Model B
with mlflow.start_run(tags={"model_class": "B", "git_sha": git_sha}) as run:
    run_id = run.info.run_id

    pipeline.fit(X_train, y_train)
    y_pred = pipeline.predict(X_test)

    mse = mean_squared_error(y_test, y_pred)
    mae = mean_absolute_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)

    mlflow.log_param("model_type", "LightGBM with preprocessing")
    mlflow.log_params(parameters_b)
    mlflow.log_metric("mse", mse)
    mlflow.log_metric("mae", mae)
    mlflow.log_metric("r2_score", r2)
    signature = infer_signature(model_input=X_train, model_output=y_pred)

    dataset = mlflow.data.from_spark(train_set_spark, table_name=f"{catalog_name}.{schema_name}.train_set", version="0")
    mlflow.log_input(dataset, context="training")
    mlflow.sklearn.log_model(sk_model=pipeline, artifact_path="lightgbm-pipeline-model", signature=signature)

model_version = mlflow.register_model(
    model_uri=f"runs:/{run_id}/lightgbm-pipeline-model", name=model_name, tags={"git_sha": f"{git_sha}"}
)
🛠️ Refactor suggestion
Refactor duplicate model training code
The Model B training code is duplicated from Model A. Consider extracting the training logic into a reusable function.
Suggested refactor:
def train_and_log_model(X_train, y_train, parameters, model_class, git_sha):
    pipeline = Pipeline(steps=[
        ("preprocessor", preprocessor),
        ("regressor", LGBMRegressor(**parameters))
    ])

    with mlflow.start_run(tags={"model_class": model_class, "git_sha": git_sha}) as run:
        run_id = run.info.run_id
        pipeline.fit(X_train, y_train)
        y_pred = pipeline.predict(X_test)

        metrics = {
            "mse": mean_squared_error(y_test, y_pred),
            "mae": mean_absolute_error(y_test, y_pred),
            "r2": r2_score(y_test, y_pred)
        }

        mlflow.log_param("model_type", "LightGBM with preprocessing")
        mlflow.log_params(parameters)
        for metric_name, value in metrics.items():
            mlflow.log_metric(metric_name, value)

        signature = infer_signature(model_input=X_train, model_output=y_pred)
        dataset = mlflow.data.from_spark(
            train_set_spark,
            table_name=f"{catalog_name}.{schema_name}.train_set",
            version="0"
        )
        mlflow.log_input(dataset, context="training")
        mlflow.sklearn.log_model(
            sk_model=pipeline,
            artifact_path="lightgbm-pipeline-model",
            signature=signature
        )

    return run_id, pipeline

# Usage:
run_id_a, model_a = train_and_log_model(X_train, y_train, parameters_a, "A", git_sha)
run_id_b, model_b = train_and_log_model(X_train, y_train, parameters_b, "B", git_sha)
model_name = f"{catalog_name}.{schema_name}.sleep_efficiencies_model_ab" | ||
|
||
# Git commit hash for tracking model version | ||
git_sha = "week3" |
Replace hardcoded git SHA with actual commit hash
Using a hardcoded value "week3" as git SHA defeats the purpose of version tracking. This should be dynamically retrieved from the git repository.
Replace with actual git SHA using:
-git_sha = "week3"
+import subprocess
+git_sha = subprocess.check_output(['git', 'rev-parse', 'HEAD']).decode('ascii').strip()
model = mlflow.pyfunc.load_model(model_uri=f"models:/{model_name}/{model_version.version}")

# Run prediction
predictions = model.predict(X_test.iloc[0:1])
Add error handling for model loading and prediction
The current implementation lacks error handling for model loading and prediction operations.
Add error handling:
-model = mlflow.pyfunc.load_model(model_uri=f"models:/{model_name}/{model_version.version}")
+try:
+    model = mlflow.pyfunc.load_model(model_uri=f"models:/{model_name}/{model_version.version}")
+except Exception as e:
+    print(f"Error loading model: {e}")
+    raise

-# Run prediction
-predictions = model.predict(X_test.iloc[0:1])
+try:
+    predictions = model.predict(X_test.iloc[0:1])
+except Exception as e:
+    print(f"Error during prediction: {e}")
+    raise
workspace.serving_endpoints.create(
    name="sleep-efficiencies-model-serving-ab-test",
    config=EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                entity_name=f"{catalog_name}.{schema_name}.sleep_efficiencies_model_pyfunc_ab_test",
                scale_to_zero_enabled=True,
                workload_size="Small",
                entity_version=model_version.version,
            )
        ]
    ),
)
🛠️ Refactor suggestion
Add retry logic and validation for endpoint creation
The endpoint creation lacks retry logic and validation, which could lead to failures in unstable network conditions.
Add retry logic:
+from tenacity import retry, stop_after_attempt, wait_exponential
+
+@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
+def create_endpoint(workspace, name, config):
+    try:
+        endpoint = workspace.serving_endpoints.create(name=name, config=config)
+        # Wait for endpoint to be ready
+        while endpoint.state.config_update is not None:
+            time.sleep(10)
+            endpoint = workspace.serving_endpoints.get(name)
+        return endpoint
+    except Exception as e:
+        print(f"Error creating endpoint: {e}")
+        raise

-workspace.serving_endpoints.create(
+create_endpoint(
+    workspace,
     name="sleep-efficiencies-model-serving-ab-test",
     config=EndpointCoreConfigInput(...)
 )
Committable suggestion skipped: line range outside the PR's diff.
Looks nice!
Release Notes
New Features
Improvements
Bug Fixes
Resolved compatibility issues with pyarrow and mlflow.
Chores
Updated project dependencies and versioning in pyproject.toml.