* Improve dynamic flush scheduler for otel data
This change introduces a more flexible, adaptive dynamic flush scheduler to address a production issue where data was not flushed fast enough, causing memory growth and crashes. The existing scheduler handled only a single flush at a time, so it could not keep up with the influx of logs.
- Added configuration options for setting minimum and maximum concurrency levels, maximum batch size, and memory pressure threshold. These parameters ensure that flush operations adjust dynamically based on workload and pressure.
- Implemented `pLimit` to facilitate concurrent flush operations, with adjustments made according to batch queue length and memory pressure.
- Improved metrics reporting to monitor the dynamic behavior of the flush scheduler, which helps identify performance issues and tune the scheduler accordingly.
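As a rough illustration of how concurrency could follow queue length and memory pressure, here is a minimal sketch. The config field names, the linear scaling rule, and the `targetConcurrency` helper are assumptions made for this example and are not taken from the actual scheduler; the real code uses `pLimit` to enforce the resulting limit, which this sketch leaves out.

```typescript
// Hypothetical config shape; the real option names may differ.
interface FlushSchedulerConfig {
  minConcurrency: number;
  maxConcurrency: number;
  // Assumed meaning: number of queued items at which pressure reaches 1.0.
  memoryPressureThreshold: number;
}

// Pick a concurrency level between min and max.
// Assumption: scale linearly with pressure, and never run more
// concurrent flushes than there are batches waiting.
function targetConcurrency(
  config: FlushSchedulerConfig,
  totalQueuedItems: number,
  queuedBatches: number
): number {
  const pressure = Math.min(1, totalQueuedItems / config.memoryPressureThreshold);
  const range = config.maxConcurrency - config.minConcurrency;
  const scaled = config.minConcurrency + Math.round(pressure * range);
  const batchBound = Math.max(queuedBatches, config.minConcurrency);
  return Math.min(scaled, batchBound);
}
```

With `minConcurrency: 1`, `maxConcurrency: 10`, and `memoryPressureThreshold: 1000`, an empty queue stays at one flush at a time, while a queue at or above the threshold with enough batches waiting runs at the maximum.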
* Implement load shedding for TaskEvent records
This change introduces load shedding mechanisms to manage TaskEvent
records, particularly those of kind LOG, when the system experiences
high volumes and is unable to flush to the database in a timely
manner. The addition aims to prevent overwhelming the system and
ensure critical tasks are prioritized.
- Added configuration options for `loadSheddingThreshold` and
`loadSheddingEnabled` in multiple modules to activate load shedding.
- Introduced `isDroppableEvent` function to allow specific events to
be dropped when load shedding is enabled.
- Ensured metrics are updated to reflect dropped events and load
shedding status, providing visibility into system performance
during high load conditions.
- Updated loggers to inform about load shedding state changes,
ensuring timely awareness of load management activities.
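The load shedding behaviour described above might look roughly like the following sketch. Only `isDroppableEvent`, `loadSheddingThreshold`, `loadSheddingEnabled`, and the LOG kind come from this commit; the `TaskEvent` shape, the `SheddingBuffer` class, and the rule of comparing the threshold against the buffer length are assumptions for illustration.

```typescript
// Hypothetical event shape; only `kind` matters for this sketch.
interface TaskEvent {
  kind: string;
  message: string;
}

interface LoadSheddingOptions {
  loadSheddingEnabled: boolean;
  loadSheddingThreshold: number;
}

// Only LOG events are treated as droppable here.
function isDroppableEvent(event: TaskEvent): boolean {
  return event.kind === "LOG";
}

// Hypothetical in-memory buffer that sheds droppable events under load.
class SheddingBuffer {
  private readonly options: LoadSheddingOptions;
  private readonly items: TaskEvent[] = [];
  private shedding = false;
  droppedCount = 0;

  constructor(options: LoadSheddingOptions) {
    this.options = options;
  }

  // Returns true if the event was queued, false if it was dropped.
  add(event: TaskEvent): boolean {
    const overThreshold = this.items.length >= this.options.loadSheddingThreshold;
    const shouldShed = this.options.loadSheddingEnabled && overThreshold;
    if (shouldShed !== this.shedding) {
      // Log state changes so operators can see when shedding starts or stops.
      this.shedding = shouldShed;
      console.log(`load shedding ${shouldShed ? "activated" : "deactivated"}`);
    }
    if (this.shedding && isDroppableEvent(event)) {
      this.droppedCount += 1; // would feed a dropped-events metric
      return false;
    }
    this.items.push(event);
    return true;
  }

  get queuedCount(): number {
    return this.items.length;
  }

  get isShedding(): boolean {
    return this.shedding;
  }
}
```

Non-LOG events are always queued in this sketch, so shedding reduces volume without discarding events that may be needed to reconstruct a run.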
* Fix undefined 'queuePressure' variable in DynamicFlushScheduler
The 'queuePressure' variable was being used without being defined
in the DynamicFlushScheduler class, causing potential runtime
errors. This commit adds the missing definition and ensures that
the variable is correctly calculated based on the 'totalQueuedItems'
and 'memoryPressureThreshold'.
- Addressed code inconsistencies and improved formatting.
- Defined 'queuePressure' in the 'adjustConcurrency' method
to prevent potential undefined errors.
- Enhanced readability by maintaining consistent spacing and
format across the file, contributing to the stability and
maintainability of the code.
- Adjusted batch size logic based on the newly defined 'queuePressure'
variable.
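For reference, one plausible way to derive 'queuePressure' and use it for batch sizing is sketched below. The commit only states that the value is based on 'totalQueuedItems' and 'memoryPressureThreshold'; the ratio, the guard against a zero threshold, and the `adjustedBatchSize` rule are assumptions for this example.

```typescript
// Assumed definition: the fraction of the threshold currently in use.
function computeQueuePressure(
  totalQueuedItems: number,
  memoryPressureThreshold: number
): number {
  if (memoryPressureThreshold <= 0) {
    return 0; // avoid division by zero on a misconfigured threshold
  }
  return totalQueuedItems / memoryPressureThreshold;
}

// Hypothetical batch sizing rule: keep the base size until pressure
// exceeds 1, then grow proportionally, capped at maxBatchSize.
function adjustedBatchSize(
  baseBatchSize: number,
  maxBatchSize: number,
  queuePressure: number
): number {
  if (queuePressure <= 1) {
    return baseBatchSize;
  }
  return Math.min(maxBatchSize, Math.ceil(baseBatchSize * queuePressure));
}
```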
* Refactor concurrency adjustment logic in scheduler
The concurrency adjustment logic in the dynamic flush scheduler has been refactored to improve clarity and maintainability. This change moves the calculation of pressure metrics outside of the conditional blocks to ensure they are always determined prior to decision-making.
- The queue pressure and time since last flush calculations were moved up in the code to be independent of the 'backOff' condition.
- This refactor sets up the groundwork for more reliable concurrency scaling and better performance monitoring capabilities. The overall logic of adjusting concurrency based on system pressure metrics remains unchanged.
This adjustment addresses ongoing issues with the scheduler that were not resolved by previous changes.
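The shape of the refactor can be sketched as follows: the pressure metrics are computed once at the top, then every branch, including the 'backOff' branch, can read them. The state fields, the 0.8 and 0.2 pressure cutoffs, the 5000 ms staleness limit, and the halving and step sizes are all assumptions for illustration, not values from the actual scheduler.

```typescript
// Hypothetical snapshot of scheduler state.
interface SchedulerState {
  concurrency: number;
  minConcurrency: number;
  maxConcurrency: number;
  totalQueuedItems: number;
  memoryPressureThreshold: number;
  lastFlushAtMs: number;
}

interface Adjustment {
  concurrency: number;
  queuePressure: number;
  timeSinceLastFlushMs: number;
}

function adjustConcurrency(
  state: SchedulerState,
  backOff: boolean,
  nowMs: number
): Adjustment {
  // Computed up front so they are defined on every code path.
  const queuePressure = state.totalQueuedItems / state.memoryPressureThreshold;
  const timeSinceLastFlushMs = nowMs - state.lastFlushAtMs;

  let concurrency = state.concurrency;
  if (backOff) {
    // Assumed back-off rule: halve concurrency, never below the minimum.
    concurrency = Math.max(state.minConcurrency, Math.floor(concurrency / 2));
  } else if (queuePressure > 0.8 || timeSinceLastFlushMs > 5000) {
    concurrency = Math.min(state.maxConcurrency, concurrency + 1);
  } else if (queuePressure < 0.2) {
    concurrency = Math.max(state.minConcurrency, concurrency - 1);
  }

  // The metrics are returned as well, so they can be reported either way.
  return { concurrency, queuePressure, timeSinceLastFlushMs };
}
```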
* Some tweaks