Skip to content

Conversation

@trivialfis
Copy link
Member

@trivialfis trivialfis commented Jul 15, 2025

Related: #9472 .

@trivialfis trivialfis changed the title [WIP][EM] Remove text file input for external memory. [EM] Remove text file input for external memory. Jul 15, 2025
@trivialfis trivialfis marked this pull request as ready for review July 15, 2025 16:41
@trivialfis trivialfis requested a review from Copilot July 15, 2025 17:13

This comment was marked as outdated.

@trivialfis trivialfis requested a review from Copilot July 15, 2025 17:58
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR removes support for text-file-based external memory, standardizes in-memory and external-memory test utilities, and updates documentation and examples accordingly.

  • Deprecate and remove “cache” suffix parsing in DMatrix::Load; always load data in-memory or via new iterator-based external memory API
  • Introduce GetExternalMemoryDMatrixFromData helper and refactor tests to use RandomDataGenerator and HostDeviceVector for external-memory scenarios
  • Remove legacy ExternalMemory examples from JVM packages and update tutorials to reflect removal of text-file external memory

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/data/data.cc Disallow “#cache” URI syntax and simplify DMatrix::Load
tests/cpp/helpers.{h,cc} Add GetExternalMemoryDMatrixFromData for tests
tests/cpp/data/test_sparse_page_* Refactor to use iterator API and RandomDataGenerator
tests/cpp/common/test_hist_util.* Migrate to HostDeviceVector and remove deprecated helpers
jvm-packages/.../ExternalMemory.* Remove ExternalMemory example code and references
doc/tutorials/*.rst Update guidance to note removal of text-file external memory
Comments suppressed due to low confidence (4)

doc/tutorials/external_memory.rst:521

  • [nitpick] This standalone bullet may render incorrectly in RST. Consider using a proper directive (e.g., .. note:: or a paragraph) and ensuring a blank line before the item for correct formatting.
- The text file cache format has been removed in 3.1.0.

tests/cpp/data/test_sparse_page_dmatrix.cu:11

  • The test uses FileExists but no longer includes the header that defines it (previously provided by filesystem.h). Please reintroduce the appropriate include (e.g. "../filesystem.h") or add a forward declaration for FileExists.
#include "../helpers.h"

tests/cpp/data/test_sparse_page_dmatrix.cc:16

  • This file uses dmlc::TemporaryDirectory but the include for filesystem.h (which defines TemporaryDirectory) was removed. Add back the include for filesystem.h or <dmlc/filesystem.h>.
#include "../helpers.h"

src/data/data.cc:1007

  • [nitpick] The parameter name for the cache prefix was removed, making the signature less self-documenting. Consider restoring the parameter name (e.g., cache_prefix) for clarity.
template DMatrix* DMatrix::Create(

@trivialfis
Copy link
Member Author

cc @rongou .

@trivialfis trivialfis merged commit 3d9db0b into dmlc:master Jul 15, 2025
81 of 84 checks passed
@trivialfis trivialfis deleted the ext-drop-text branch July 15, 2025 20:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants