HPCC-33298 JTrace support sampling configuration #19456

Merged

Conversation

Member

@rpastrana rpastrana commented Jan 24, 2025

  • Adds JTrace sampler configuration
  • Adds OTel sampler initialization logic
  • Updates JTrace configuration README
  • Provides samples
  • Jlog trace/span ids suppressed if not sampled (not sure if this is wanted)

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-33298

Jirabot Action Result:
Assigning user: [email protected]
Workflow Transition To: Merge Pending
Updated PR

@rpastrana rpastrana requested a review from jakesmith January 24, 2025 20:16
@rpastrana rpastrana force-pushed the HPCC-33298-JTrace-Sampling branch 3 times, most recently from 5a8ea52 to c56a09b Compare January 28, 2025 22:11
@rpastrana
Member Author

@jakesmith I made some wholesale changes to support the parentBased sampler annotation. Mentioning it in case you were in the middle of reviewing.

Member

@jakesmith jakesmith left a comment

@rpastrana - looks good, some very minor comments.

if (!isEmptyString(samplerType))
{
const char * samplerArgument = samplerTree->queryProp("@argument");
if (strcmp("AlwaysOff", samplerType)==0)
Member

trivial: could use jlib's streq

else if (strcmp("Ratio", samplerType)==0)
{
size_t pos;
double ratio = std::stod(samplerArgument, &pos);
Member

const char * samplerArgument = samplerTree->queryProp("@argument");

could move to here, as only use of argument, and do:

    double ratio = samplerTree->getPropReal("@argument");

Member Author

Agreed. It was meant to be a generic argument for future support, but in reality the ratio is likely to be its only use. I will use your suggestion.
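For context on why the typed read is attractive, here is a minimal standalone sketch of what the text-based route has to guard against. `parseRatioText` is a hypothetical helper written for illustration; it is not code from this PR and does not use jlib.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the PR): parse a ratio held as text.
// Returns false when the text is missing, malformed, or has trailing characters.
bool parseRatioText(const char * text, double & ratio)
{
    if (text == nullptr || *text == '\0')
        return false;                   // std::string must not be built from a null pointer
    try
    {
        const std::string value(text);
        std::size_t pos = 0;
        const double parsed = std::stod(value, &pos);
        if (pos != value.size())
            return false;               // reject input such as "0.5x"
        ratio = parsed;
        return true;
    }
    catch (const std::invalid_argument &)
    {
        return false;                   // no numeric prefix at all
    }
    catch (const std::out_of_range &)
    {
        return false;                   // value does not fit in a double
    }
}
```

A typed property read moves all of this handling out of the sampler initialization code, which is the simplification the suggestion is after.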


if (sampler && samplerTree->getPropBool("@parentBased", true))
{
return std::unique_ptr<opentelemetry::sdk::trace::ParentBasedSampler>(new opentelemetry::sdk::trace::ParentBasedSampler( std::move(sampler)));
Member

trivial/formatting: extra unintended space before std::move in 'ParentBasedSampler( std::move(sampler))'

if (!sampler)
{
sampler = std::unique_ptr<opentelemetry::sdk::trace::AlwaysOnSampler>
(new opentelemetry::sdk::trace::AlwaysOnSampler);
Member

to be consistent, as per line 1378:

sampler.reset(new opentelemetry::sdk::trace::AlwaysOnSampler());

@rpastrana rpastrana force-pushed the HPCC-33298-JTrace-Sampling branch from 34965d1 to 1e85d43 Compare February 1, 2025 01:14
@rpastrana rpastrana requested a review from jakesmith February 1, 2025 01:14
@@ -768,9 +768,6 @@ class CSpan : public CInterfaceOf<ISpan>
if (span == nullptr)
return false;

if (!span->IsRecording()) //if not sampled, we shouldn't consider this valid?
Member Author

This may be outside the scope of this issue, but I found that this check causes a hang under certain parentBased configurations. Further analysis is needed, but I am removing it for now.

Member Author

@jakesmith this amended commit includes the code review changes you requested and approved + this change

jakesmith previously approved these changes Feb 6, 2025
Member

@jakesmith jakesmith left a comment

@rpastrana - looks good. Please squash.

@rpastrana
Member Author

Do not merge yet. I'm not happy with the hang I've noticed (even if it is avoided by removing the call to span->IsRecording()).

@jakesmith jakesmith self-requested a review February 13, 2025 17:14
Member

@jakesmith jakesmith left a comment

> Do not merge yet. I'm not happy with the hang I've noticed (even if it is avoided by removing the call to span->IsRecording()).

Merely commenting to keep the status of the PR on track (it is not pending review at this stage).

@jakesmith jakesmith dismissed their stale review February 14, 2025 15:36

Do not merge yet. I'm not happy with the hang I've noticed (even if it is avoided by removing the call to span->IsRecording()).

@rpastrana
Member Author

@jakesmith I've reviewed, and this is safe to merge.

@rpastrana rpastrana requested a review from jakesmith February 20, 2025 15:51
Member

@jakesmith jakesmith left a comment

@rpastrana - looks good. Please squash.

@rpastrana rpastrana force-pushed the HPCC-33298-JTrace-Sampling branch from 1e85d43 to 3a50375 Compare February 21, 2025 14:02
@rpastrana
Member Author

rpastrana commented Feb 21, 2025

@ghalliday ready to be merged

Just noticed the conflicts (not sure how that happened), will resolve asap

Conflicts were resolved

@rpastrana rpastrana force-pushed the HPCC-33298-JTrace-Sampling branch from 3a50375 to 1653e52 Compare February 25, 2025 18:04
@rpastrana rpastrana requested a review from jakesmith February 25, 2025 18:05
@rpastrana
Member Author

@jakesmith in the process of rebasing due to conflicts, I noticed the sample yaml file was incorrect. Latest commit addresses that minor issue. Please review

Member

@jakesmith jakesmith left a comment

@rpastrana - looks good afaics. Please squash.

@rpastrana rpastrana force-pushed the HPCC-33298-JTrace-Sampling branch from 1653e52 to 38d3a05 Compare February 26, 2025 15:42
@rpastrana
Member Author

Squashed and ready to merge

Member

@ghalliday ghalliday left a comment

@rpastrana a few comments - mainly regarding ensuring the configuration makes sense from a user's point of view, rather than reflecting the underlying implementation.

if (!sampler)
{
sampler.reset(new opentelemetry::sdk::trace::AlwaysOnSampler());
WARNLOG("JTrace sampler set to 'Always ON' by default!");
Member

I am not sure we want this tracing.
This if block could be deleted and covered by the default case in the caller.

double ratio = samplerTree->getPropReal("@argument");
if (ratio < 0 || ratio > 1)
{
OERRLOG("JTrace invalid ratio sampler configuration. Ratio must be LE 1.0 or GE 0.0");
Member

picky: "Ratio must be LE 1.0 or GE 0.0"
should use "and", and symbols are probably clearer:
"Ratio must be >= 0.0 and <= 1.0" or "between 0.0 and 1.0"

Member

really picky: For an if() statement, it is better to have the normal/expected case first. It is easier to read.
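The two comments above (clearer wording, expected case first) can be sketched together as follows. This is a standalone illustration, not the merged code: `checkSamplingRatio` is a hypothetical helper, and the message text simply applies the wording proposed above.

```cpp
#include <string>

// Hypothetical helper (not from the PR): returns an empty string when the ratio
// is usable, otherwise the text of the error that would be logged.
std::string checkSamplingRatio(double ratio)
{
    // Normal/expected case first. Written as a positive range test, this also
    // rejects NaN, because every comparison involving NaN is false.
    if (ratio >= 0.0 && ratio <= 1.0)
        return std::string();
    return "JTrace invalid ratio sampler configuration. Ratio must be >= 0.0 and <= 1.0";
}
```

By contrast, a test written as `ratio < 0 || ratio > 1` treats NaN as valid, since both comparisons evaluate to false.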

}
else if (streq(samplerType,"Ratio"))
{
double ratio = samplerTree->getPropReal("@argument");
Member

add a default value of -1, so that failing to specify a ratio is caught as a problem - rather than silently removing all items.
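A minimal sketch of the sentinel idea, under stated assumptions: `FakeSamplingConfig` and `readReal` are hypothetical stand-ins for the property tree and its typed read, and the key name `ratio` is assumed for illustration. The point is only that an out-of-range default makes a missing value fail the same range check as a bad one.

```cpp
#include <map>
#include <string>

// Stand-in for a configuration node; the PR reads from a jlib property tree instead.
typedef std::map<std::string, double> FakeSamplingConfig;

// Mimics a typed read with a caller-supplied default (hypothetical helper).
double readReal(const FakeSamplingConfig & config, const std::string & name, double dft)
{
    FakeSamplingConfig::const_iterator it = config.find(name);
    return (it == config.end()) ? dft : it->second;
}

// Returns true and fills 'ratio' only when a usable ratio was configured.
bool readSamplingRatio(const FakeSamplingConfig & config, double & ratio)
{
    const double value = readReal(config, "ratio", -1.0);   // -1 marks "not specified"
    if (value >= 0.0 && value <= 1.0)
    {
        ratio = value;
        return true;
    }
    return false;   // missing or out of range: report it rather than sample nothing
}
```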

@@ -1468,6 +1532,10 @@ Expected Configuration format:
disabled: true #optional - disable OTel tracing
alwaysCreateGlobalIds : false #optional - should global ids always be created?
alwaysCreateTraceIds #optional - should trace ids always be created?
sampler: #optional - controls how traces are either suppressed or sampled
  type: #"AlwaysOff" | "AlwaysOn" | "Ratio"
  argument: #optional sampler type configuration value
Member

From a user's point of view argument is very generic. ratio would be better.

You could argue this interface has been defined based on the implementation, rather than what would make sense from the user's viewpoint. I am not sure if you want to go this way, but you could remove the type field, and just rely on a ratio (or percentage) field. If it was 0 you would sample none, if 100 you would include all, otherwise you would create a ratio sampler.
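To make the alternative concrete, here is a rough sketch of a single percentage setting driving the choice of sampler. It is an illustration of the suggestion only: the later schema hunks in this thread still show a type enum alongside ratio, and `SamplingMode`, `SamplingChoice` and `chooseSampling` are hypothetical names.

```cpp
// Hypothetical names for illustration; none of these appear in the PR.
enum class SamplingMode { None, All, Ratio };

struct SamplingChoice
{
    SamplingMode mode;
    double ratio;     // fraction of traces to keep, in [0.0, 1.0]
};

// Maps one percentage setting to a sampling choice.
// Returns false when the percentage is outside 0..100 (or is NaN).
bool chooseSampling(double percentage, SamplingChoice & choice)
{
    if (!(percentage >= 0.0 && percentage <= 100.0))
        return false;
    if (percentage == 0.0)
    {
        choice.mode = SamplingMode::None;      // sample nothing
        choice.ratio = 0.0;
    }
    else if (percentage == 100.0)
    {
        choice.mode = SamplingMode::All;       // sample everything
        choice.ratio = 1.0;
    }
    else
    {
        choice.mode = SamplingMode::Ratio;     // sample a fraction
        choice.ratio = percentage / 100.0;
    }
    return true;
}
```

The trade-off is that the user no longer names an implementation concept, at the cost of the configuration no longer mirroring the OTel sampler names one to one.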

sampler: #optional - controls how traces are either suppressed or sampled
  type: #"AlwaysOff" | "AlwaysOn" | "Ratio"
  argument: #optional sampler type configuration value
  parentBased: #optional sets the sampling decision based on the Span’s parent, or absence of parent, to know which secondary sampler to use.
Member

What does this mean? Does it mean a span is excluded if its parent is not included in the sample? What does it mean from the user's perspective?
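As background for the question, the following is a minimal sketch of the decision a parent-based sampler makes, assuming the default OpenTelemetry delegates (follow the parent when one exists, otherwise defer to the configured sampler). `ParentState` and `parentBasedShouldSample` are hypothetical names; the real logic lives in the OTel SDK's ParentBasedSampler.

```cpp
// Hypothetical model of the three situations a new span can be in.
enum class ParentState { NoParent, ParentSampled, ParentNotSampled };

// Sketch only: with a parent, inherit its sampling decision; without one,
// use whatever the configured (root) sampler decided.
bool parentBasedShouldSample(ParentState parent, bool rootSamplerDecision)
{
    switch (parent)
    {
    case ParentState::ParentSampled:
        return true;                // parent was sampled, so this span is too
    case ParentState::ParentNotSampled:
        return false;               // parent was dropped, so this span is too
    case ParentState::NoParent:
        break;
    }
    return rootSamplerDecision;     // root span: AlwaysOn / AlwaysOff / Ratio applies
}
```

From the user's perspective, under these assumptions, the configured type only governs spans that start a new trace; spans that continue an incoming trace keep the caller's decision.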

"description": "Name of the Head Sampling type AlwaysOff|AlwaysOn|Ratio"
},
"argument" : {
"type": "string",
Member

This would naturally be a number - see comments on "argument" v "ratio" below.

@@ -1171,6 +1171,25 @@
"type": "boolean",
"description": "If true, sets OTEL library logging to debug level (otherwise set to warning)"
},
"sampler": {
Member

question: Would "sampling" be more intuitive to the user?

Member Author

I think the usability improvement would be negligible.

@rpastrana rpastrana requested a review from ghalliday March 3, 2025 19:55
Member

@ghalliday ghalliday left a comment

A good set of changes, but I think using a generic "argument" is wrong.
I don't think there are any other examples like it in the helm chart. It isn't immediately obvious what it means, and it requires the user to specify the value using the wrong type.

@rpastrana rpastrana requested a review from ghalliday March 4, 2025 23:41
@rpastrana rpastrana force-pushed the HPCC-33298-JTrace-Sampling branch from 5f23509 to 3f9dd92 Compare March 4, 2025 23:47
Member

@ghalliday ghalliday left a comment

Looks good. Please squash; there are also a couple of things to tidy up.

"enum": ["AlwaysOff", "AlwaysOn", "Ratio"],
"description": "Name of the Head Sampling type AlwaysOff|AlwaysOn|Ratio"
},
"ratio" : {
Member

trivial: indented by 2 extra

endpoint: "localhost:4317" #exporter specific key/value pairs
sampling: #optional - controls how traces are either suppressed or sampled
  type: #"AlwaysOff" | "AlwaysOn" | "Ratio"
  argument: #optional - Generic argument dependent on sampling type used.
Member

needs updating to refer to ratio instead.

@rpastrana rpastrana force-pushed the HPCC-33298-JTrace-Sampling branch from 3f9dd92 to 4fceee8 Compare March 5, 2025 20:38
@rpastrana
Member Author

squashed

- Adds Jtrace sampler configuration
- Adds OTel sampler initialization logic
- Updates JTrace configuration README
- Provides samples
- Jlog trace/span ids suppressed if not sampled (not sure if this is wanted)
- Rename configuration element from sampler to sampling

Signed-off-by: Rodrigo Pastrana <[email protected]>
@rpastrana rpastrana force-pushed the HPCC-33298-JTrace-Sampling branch from 4fceee8 to 2bc7186 Compare March 6, 2025 15:02
@ghalliday ghalliday merged commit b3b6ad6 into hpcc-systems:candidate-9.8.x Mar 6, 2025
51 of 52 checks passed

github-actions bot commented Mar 6, 2025

Jirabot Action Result:
Added fix version: 9.8.66
Added fix version: 9.10.12
Workflow Transition: 'Resolve issue'

3 participants