@rhaegar325
Collaborator

Summary

Adds memory validation checks to the write() method to prevent out-of-memory errors when writing large datasets.

Problem

The write() method loads the entire dataset into memory via v[:] = vdat.values. For large datasets, this can cause:

  • Memory overflow and process termination
  • System instability due to excessive swapping
  • Conflicts with Dask worker memory limits

Solution

Added pre-write memory checks that:

  1. Estimate data size before loading, accounting for ~1.5x overhead
  2. Detect existing Dask client and retrieve worker/cluster memory limits
  3. Validate memory availability based on the following logic:
| Condition | Action |
| --- | --- |
| Data > total cluster memory | Raise MemoryError |
| Data > single worker memory but < total cluster memory | Close Dask client, use local memory |
| Data > available system memory (no client) | Raise MemoryError |
| Data < available memory | Proceed with write |
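
The size estimate and the decision table above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the signature of estimate_data_size() shown here (shape plus item size) is an assumption, and check_memory() and Action are hypothetical names introduced only for this example. The table does not say what happens when the data size exactly equals a limit; the sketch treats that case as "proceed".

```python
import math
from enum import Enum
from typing import Optional, Sequence

OVERHEAD_FACTOR = 1.5  # the ~1.5x overhead mentioned in the description


def estimate_data_size(shape: Sequence[int], itemsize: int,
                       overhead: float = OVERHEAD_FACTOR) -> int:
    """Estimate the bytes needed to materialise an array (hypothetical signature)."""
    return int(math.prod(shape) * itemsize * overhead)


class Action(Enum):
    """Hypothetical outcome labels mirroring the 'Action' column."""
    PROCEED = "proceed with write"
    CLOSE_CLIENT = "close Dask client, use local memory"


def check_memory(data_bytes: int,
                 available_system_bytes: int,
                 worker_bytes: Optional[int] = None,
                 cluster_bytes: Optional[int] = None) -> Action:
    """Apply the decision table; cluster_bytes=None means no Dask client."""
    if cluster_bytes is not None:
        # A Dask client exists: compare against cluster and worker limits.
        if data_bytes > cluster_bytes:
            raise MemoryError(
                f"Data needs ~{data_bytes} bytes but the cluster has {cluster_bytes}"
            )
        if worker_bytes is not None and data_bytes > worker_bytes:
            # Too big for one worker, but within the cluster total.
            return Action.CLOSE_CLIENT
        return Action.PROCEED
    # No client: compare against available system memory.
    if data_bytes > available_system_bytes:
        raise MemoryError(
            f"Data needs ~{data_bytes} bytes but only "
            f"{available_system_bytes} are available"
        )
    return Action.PROCEED
```

For instance, a float32 variable of shape (1200, 1080, 1440) holds 1,866,240,000 values, so the estimate is 1,866,240,000 × 4 × 1.5 bytes, about 11.2 GB. With 4 GiB workers and a 16 GiB cluster total, check_memory() would return Action.CLOSE_CLIENT for that size.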

Changes

  • Added estimate_data_size() helper function
  • Added Dask client detection and memory limit retrieval
  • Added memory validation logic with appropriate error messages
  • Auto-closes the Dask client when a single-file write requires more memory than an individual worker provides but less than the cluster total
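
One way the Dask client detection and memory limit retrieval could look is sketched below. It is an assumption about the approach, not the code added to base.py: memory_limits_from_info() and detect_dask_limits() are hypothetical names. The sketch relies on distributed.get_client() raising ValueError when no client exists and on Client.scheduler_info() returning a "workers" mapping whose entries carry a "memory_limit" in bytes. Depending on the distributed version, scheduler_info() may return only a subset of workers by default, so the totals should be checked against the installed release.

```python
from typing import Any, Mapping, Optional, Tuple


def memory_limits_from_info(info: Mapping[str, Any]) -> Optional[Tuple[int, int]]:
    """Return (smallest per-worker limit, cluster total) in bytes.

    Workers without a usable limit (missing, None, or 0) are skipped;
    None is returned when no worker reports a limit.
    """
    workers = info.get("workers", {})
    limits = [
        int(w["memory_limit"]) for w in workers.values() if w.get("memory_limit")
    ]
    if not limits:
        return None
    return min(limits), sum(limits)


def detect_dask_limits() -> Optional[Tuple[int, int]]:
    """Return worker/cluster limits for the current Dask client, if any."""
    try:
        from distributed import get_client  # optional dependency

        client = get_client()  # raises ValueError when no client exists
    except (ImportError, ValueError):
        return None
    return memory_limits_from_info(client.scheduler_info())
```

Keeping the dictionary parsing separate from the client lookup means the limit logic can be unit-tested with a plain dict, without starting a cluster, which may help with the patch coverage reported for this change.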

@rhaegar325 rhaegar325 requested a review from rbeucher December 3, 2025 04:03
@codecov
codecov bot commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 6.06061% with 31 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@207f119).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/access_moppy/base.py | 6.06% | 31 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #132   +/-   ##
=======================================
  Coverage        ?   54.21%           
=======================================
  Files           ?       18           
  Lines           ?     2134           
  Branches        ?        0           
=======================================
  Hits            ?     1157           
  Misses          ?      977           
  Partials        ?        0           


2 participants