wbo4958 (Contributor) commented Jan 27, 2025

Support external memory. This work is based on #11181.

wbo4958 (Contributor, Author) commented Jan 27, 2025

Hi @trivialfis, please help review

trivialfis (Member) commented Feb 8, 2025

Note:

  • The external memory is only supported by GPU at the moment; we should make this clear.
  • Must enable RMM.
  • Need global configuration.

trivialfis (Member) commented

We need to let XGBoost use as many CPU threads as are available.

trivialfis (Member) commented

  • The default value of the nthread parameter in the Spark package's XGBoost estimator is 1.
  • Other factors are still limiting the number of OpenMP threads in my test run. Perhaps Spark is setting environment variables for executors?

trivialfis (Member) commented

For reference, setting --conf spark.task.cpus=2 affects the global OpenMP runtime.
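
The thread settings discussed here could be tied together roughly as follows. This is a minimal sketch in plain Scala, not the estimator's actual logic: `spark.task.cpus` and `nthread` are the names used in the thread, while `ThreadConfig` and `resolveNthread` are hypothetical helpers. It assumes the goal is to hand XGBoost every CPU that Spark assigns to a task instead of falling back to 1.

```scala
// Hypothetical helper: derive XGBoost's nthread from Spark's per-task CPU setting.
object ThreadConfig {
  // Spark's own default for spark.task.cpus is 1.
  val DefaultTaskCpus = 1

  /** Returns the nthread value to pass to the estimator.
    * An explicit user value wins; otherwise use all CPUs assigned to the task. */
  def resolveNthread(sparkConf: Map[String, String], userNthread: Option[Int]): Int = {
    val taskCpus = sparkConf.get("spark.task.cpus").map(_.trim.toInt).getOrElse(DefaultTaskCpus)
    val nthread = userNthread.getOrElse(taskCpus)
    require(nthread >= 1, s"nthread must be positive, got $nthread")
    nthread
  }
}
```

For example, `ThreadConfig.resolveNthread(Map("spark.task.cpus" -> "2"), None)` yields 2, whereas an empty configuration falls back to a single thread.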

trivialfis (Member) commented Feb 11, 2025

The two remaining items:

  • Make sure CPU code cannot run into this option accidentally.
  • Expose the global configuration for RMM (Java, Scala).
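
One way to approach the first item is a fail-fast check before training starts. The sketch below uses hypothetical names throughout (`TrainingOptions`, `useExternalMemory`, `rmmEnabled`, `ExternalMemoryGuard.validate`), and the device strings it accepts are an assumption. It is not the check implemented in this PR, only an illustration of rejecting the option on CPU or when RMM is off.

```scala
// Hypothetical option holder; field names are illustrative, not the package's API.
final case class TrainingOptions(device: String, useExternalMemory: Boolean, rmmEnabled: Boolean)

object ExternalMemoryGuard {
  // Assumption: GPU devices are spelled "cuda", "cuda:<n>", "gpu", or "gpu:<n>".
  private def isGpu(device: String): Boolean = {
    val d = device.trim.toLowerCase
    d.startsWith("cuda") || d.startsWith("gpu")
  }

  /** Returns Left(error) when external memory is requested in an unsupported setup. */
  def validate(opts: TrainingOptions): Either[String, TrainingOptions] =
    if (!opts.useExternalMemory) Right(opts)
    else if (!isGpu(opts.device)) Left("External memory is only supported on GPU.")
    else if (!opts.rmmEnabled) Left("External memory requires RMM to be enabled.")
    else Right(opts)
}
```

Returning an `Either` rather than throwing keeps the check easy to test; a caller could turn a `Left` into an exception at the API boundary.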

trivialfis mentioned this pull request Feb 12, 2025 (9 tasks)

wbo4958 (Contributor, Author) commented Feb 22, 2025

Hi @trivialfis, please help review it.

trivialfis (Member) commented

Will review. I have tested the PR on my local machine with 2 GPUs.

  • Overlapping is working.
  • Initialization is quite slow, probably due to disk writes. We will need better profiling annotations in the future.

inputNextIsCalled = true
withResource(new GpuColumnBatch(iter.next())) { batch =>
  if (iter.eq(input)) {
    externalMemory.cacheTable(batch.table)

trivialfis (Member) commented

It would be great if we could write data asynchronously in the future (after this PR). That way, XGBoost could process a batch while it is being written to disk.
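
The asynchronous write suggested here might look roughly like the sketch below. It is not code from this PR: `AsyncBatchCache`, `cacheAsync`, and the `batch-<index>.bin` file naming are hypothetical, and plain byte arrays stand in for GPU tables. It assumes a single caller thread schedules the writes.

```scala
import java.nio.file.{Files, Path}
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical stand-in for a cache that spills batches to disk in the background.
final class AsyncBatchCache(dir: Path) {
  // A single writer thread performs the writes in submission order.
  private val executor = Executors.newSingleThreadExecutor()
  private implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(executor)
  // Only touched by the caller thread (assumption stated above).
  private var pending = Vector.empty[Future[Path]]

  /** Schedules the write and returns immediately so the caller can keep using the batch. */
  def cacheAsync(index: Int, bytes: Array[Byte]): Future[Path] = {
    val write = Future(Files.write(dir.resolve(s"batch-$index.bin"), bytes))
    pending = pending :+ write
    write
  }

  /** Blocks until every scheduled write has finished, then stops the writer thread. */
  def close(): Seq[Path] = {
    val paths = Await.result(Future.sequence(pending), 1.minute)
    executor.shutdown()
    paths
  }
}
```

The caller would invoke `cacheAsync` for each batch, hand the batch to the consumer right away, and call `close()` once the iterator is exhausted so that all files are on disk before they are read back.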

val path = Paths.get(dirPath)
if (!Files.exists(path)) {
  Files.createDirectories(path)
} else {

trivialfis (Member) commented

The else clause is empty. Is this intended?

wbo4958 (Contributor, Author) commented

Good catch. Removed the empty else.
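
Without the empty branch, the directory check reduces to a single conditional. A minimal sketch follows; `CacheDir.ensureDir` is a hypothetical wrapper, and the surrounding code in the PR may differ. Since `Files.createDirectories` does not throw when the directory already exists, the existence check is optional and is kept here only to mirror the snippet under review.

```scala
import java.nio.file.{Files, Path, Paths}

object CacheDir {
  /** Creates dirPath (and any missing parent directories) if needed and returns it as a Path. */
  def ensureDir(dirPath: String): Path = {
    val path = Paths.get(dirPath)
    if (!Files.exists(path)) {
      Files.createDirectories(path)
    }
    path
  }
}
```

Calling `CacheDir.ensureDir` twice with the same path is harmless: the second call finds the directory and returns the same path.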

trivialfis (Member) left a comment

Looks good to me.

trivialfis merged commit 337ee78 into dmlc:master Feb 25, 2025 (60 checks passed)
wbo4958 deleted the java-ext branch February 26, 2025 00:12