[CI] IndexStatsIT testThrottleStats failing #126359

Open
elasticsearchmachine opened this issue Apr 5, 2025 · 6 comments · May be fixed by #128049
Assignees
Labels
  • :Distributed Indexing/Distributed: A catch all label for anything in the Distributed Indexing Area. Please avoid if you can.
  • low-risk: An open issue or test failure that is a low risk to future releases
  • Team:Distributed Indexing: Meta label for Distributed Indexing team
  • >test-failure: Triaged test failures from CI

Comments

@elasticsearchmachine
Collaborator

elasticsearchmachine commented Apr 5, 2025

Build Scans:

Reproduction Line:

./gradlew ":server:internalClusterTest" --tests "org.elasticsearch.indices.stats.IndexStatsIT.testThrottleStats" -Dtests.seed=AFB308EF37A80705 -Dtests.locale=en-MV -Dtests.timezone=Pacific/Ponape -Druntime.java=24

Applicable branches:
main

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

java.lang.AssertionError: null

Issue Reasons:

  • [main] 3 consecutive failures in test testThrottleStats
  • [main] 5 failures in test testThrottleStats (3.2% fail rate in 157 executions)
  • [main] 2 failures in pipeline elasticsearch-periodic-platform-support (50.0% fail rate in 4 executions)

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

@elasticsearchmachine elasticsearchmachine added the :Distributed Indexing/Distributed, >test-failure, needs:risk, and Team:Distributed Indexing labels Apr 5, 2025
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@albertzaharovits albertzaharovits self-assigned this Apr 6, 2025
@albertzaharovits albertzaharovits added the low-risk label and removed the needs:risk label Apr 9, 2025
arteam added a commit to arteam/elasticsearch that referenced this issue Apr 14, 2025
Backports elastic#126113 to 7.17

> Ensures proper cleanup in the testThrottleStats test

Resolve elastic#126359
@elasticsearchmachine
Collaborator Author

This has been muted on branch main

Mute Reasons:

  • [main] 3 failures in test testThrottleStats (1.9% fail rate in 155 executions)
  • [main] 2 failures in pipeline elasticsearch-periodic-platform-support (50.0% fail rate in 4 executions)

Build Scans:

@arteam
Contributor

arteam commented May 12, 2025

Seems to be caused by #127173

@arteam
Contributor

arteam commented May 12, 2025

The test is failing on Engine.PauseLock#throttle

        at __randomizedtesting.SeedInfo.seed([C5CC926EDF15D7F6]:0)
        at org.elasticsearch.index.engine.Engine$PauseLock.throttle(Engine.java:615)
        at org.elasticsearch.index.engine.Engine$IndexThrottle.activate(Engine.java:475)
        at org.elasticsearch.index.engine.InternalEngine.activateThrottling(InternalEngine.java:2829)
        at org.elasticsearch.index.engine.InternalEngine$EngineThreadPoolMergeScheduler.enableIndexingThrottling(InternalEngine.java:2923)
        at org.elasticsearch.index.engine.ThreadPoolMergeScheduler.checkMergeTaskThrottling(ThreadPoolMergeScheduler.java:206)
        at org.elasticsearch.index.engine.ThreadPoolMergeScheduler.submitNewMergeTask(ThreadPoolMergeScheduler.java:173)
        at org.elasticsearch.index.engine.ThreadPoolMergeScheduler.merge(ThreadPoolMergeScheduler.java:128)
        at org.elasticsearch.index.engine.ThreadPoolMergeScheduler$MergeTask.run(ThreadPoolMergeScheduler.java:415)
        at org.elasticsearch.index.engine.ThreadPoolMergeExecutorService.runMergeTask(ThreadPoolMergeExecutorService.java:209)
        at org.elasticsearch.index.engine.ThreadPoolMergeExecutorService.lambda$enqueueMergeTaskExecution$4(ThreadPoolMergeExecutorService.java:181)
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:977)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
        at java.lang.Thread.run(Thread.java:1447)

@arteam arteam assigned ankikuma and unassigned albertzaharovits May 12, 2025
@arteam
Contributor

arteam commented May 12, 2025

It does reproduce with:

./gradlew ":server:internalClusterTest" --tests "org.elasticsearch.indices.stats.IndexStatsIT.testThrottleStats" -Dtests.seed=780ADB4AEE1AEE71 -Dtests.locale=ckb -Dtests.timezone=Asia/Phnom_Penh -Druntime.java=24
    java.lang.AssertionError: throttling deactivated but not active
        at __randomizedtesting.SeedInfo.seed([780ADB4AEE1AEE71]:0)
        at org.elasticsearch.index.engine.Engine$IndexThrottle.deactivate(Engine.java:481)
        at org.elasticsearch.index.engine.InternalEngine.deactivateThrottling(InternalEngine.java:2838)
        at org.elasticsearch.index.engine.InternalEngine$EngineThreadPoolMergeScheduler.disableIndexingThrottling(InternalEngine.java:2934)
        at org.elasticsearch.index.engine.ThreadPoolMergeScheduler.checkMergeTaskThrottling(ThreadPoolMergeScheduler.java:213)
        at org.elasticsearch.index.engine.ThreadPoolMergeScheduler.mergeTaskDone(ThreadPoolMergeScheduler.java:248)
        at org.elasticsearch.index.engine.ThreadPoolMergeScheduler$MergeTask.run(ThreadPoolMergeScheduler.java:411)
        at org.elasticsearch.index.engine.ThreadPoolMergeExecutorService.runMergeTask(ThreadPoolMergeExecutorService.java:209)
        at org.elasticsearch.index.engine.ThreadPoolMergeExecutorService.lambda$enqueueMergeTaskExecution$4(ThreadPoolMergeExecutorService.java:181)
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:977)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)

@arteam
Contributor

arteam commented May 13, 2025

I believe the issue is the `assert semaphore.availablePermits() == Integer.MAX_VALUE;` assertion in `PauseLock#throttle`. I'm not sure we can guarantee that `throttle` is never called more than once without an unthrottle in between.
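
To make the suspected failure mode concrete, here is a minimal sketch. It assumes a semaphore-backed lock whose `throttle()` drains every permit; `PauseLockSketch` and all of its methods are hypothetical stand-ins, not the actual `Engine.PauseLock` code. The precondition is written as an explicit check rather than an `assert` so that it fires even when the JVM runs without `-ea`.

```java
import java.util.concurrent.Semaphore;

// Hypothetical stand-in for a semaphore-backed pause lock.
// This is NOT the real Engine.PauseLock; it only mirrors the quoted precondition.
class PauseLockSketch {
    private final Semaphore semaphore = new Semaphore(Integer.MAX_VALUE);

    // Drains every permit so that anything needing a permit would block.
    void throttle() {
        // Same condition as the quoted assertion, as an explicit check.
        if (semaphore.availablePermits() != Integer.MAX_VALUE) {
            throw new AssertionError("throttle() called while permits are already taken");
        }
        semaphore.acquireUninterruptibly(Integer.MAX_VALUE);
    }

    // Returns every permit, undoing a single throttle() call.
    void unthrottle() {
        semaphore.release(Integer.MAX_VALUE);
    }

    // True if a second throttle() without an unthrottle() trips the precondition.
    static boolean secondThrottleTrips() {
        PauseLockSketch lock = new PauseLockSketch();
        lock.throttle();
        try {
            lock.throttle(); // zero permits are available here, so the check fails
            return false;
        } catch (AssertionError expected) {
            return true;
        } finally {
            lock.unthrottle();
        }
    }

    // True if throttle() works again once unthrottle() has returned the permits.
    static boolean throttleAfterUnthrottleSucceeds() {
        PauseLockSketch lock = new PauseLockSketch();
        lock.throttle();
        lock.unthrottle();
        lock.throttle();
        lock.unthrottle();
        return true;
    }

    public static void main(String[] args) {
        System.out.println("second throttle trips: " + secondThrottleTrips());
        System.out.println("throttle after unthrottle: " + throttleAfterUnthrottleSucceeds());
    }
}
```

In this sketch, any caller that can reach `throttle()` twice in a row hits the precondition, which would be consistent with the `PauseLock.throttle` frame in the stack trace above if the real lock behaves similarly.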

arteam added a commit to arteam/elasticsearch that referenced this issue May 13, 2025
`Engine.PauseLock#throttle` can be called when the lock is being throttled,
so we can't guarantee that all permits are available before throttling.

Resolve elastic#126359
See elastic#127173
@arteam arteam linked a pull request May 13, 2025 that will close this issue
4 participants