[2/4] Offline pipeline evaluation and tests #8908

Open
wants to merge 1 commit into
base: wuandy/RealPpl_1

Conversation

@wu-hui wu-hui commented Apr 8, 2025

No description provided.

@wu-hui wu-hui requested review from a team as code owners April 8, 2025 22:08

changeset-bot bot commented Apr 8, 2025

⚠️ No Changeset found

Latest commit: 6f0e635

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

github-actions bot commented Apr 8, 2025

Vertex AI Mock Responses Check ⚠️

A newer major version of the mock responses for Vertex AI unit tests is available. update_vertexai_responses.sh should be updated to clone the latest version of the responses: v8.0

*/
public _userDataWriter: AbstractUserDataWriter,
readonly stages: Stage[],
readonly converter: unknown = {}

Not critical, but we've been removing converter from the pipeline constructors.

import { JsonProtoSerializer } from '../remote/serializer';

/**
* Base-class implementation

We should update this comment before the release

* @param stages
* @param converter
*/
newPipeline(

We can keep this pattern in RealtimePipeline, but it's not required. It was useful in Pipeline / LitePipeline. It could be replaced with a direct call to new RealtimePipeline() for simpler code.

new Sort(
this.readUserData(
'sort',
this.readUserData('sort', optionsOrOrderings.orderings)

This might be a redundant call to readUserData

AggregateFunction,
ListOfExprs,
isNan,
isError

I'm assuming these linter errors are false positives caused by the way the code was split. Either way, I'm sure we will have some formatting and linter fixes when merging to main.

} else if (functionExpr.name === 'timestamp_sub') {
return new CoreTimestampSub(functionExpr);
}
} else if (expr.exprType === 'AggregateFunction') {

Does this still work since AggregateFunction no longer extends Expr?

return EvaluateResult.newError();
}
}
}

This pattern seems to be repeated often. I wonder if we can refactor this into some shared code to reduce code size

const result = {};
if (evaluateParams(expr.params, [['BOOLEAN']], result)) {
  // not successful
  return result.errorResult; // either newError or newNull
}

// use evaluated params
result.evaluated[0];
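
A minimal sketch of what such a shared helper could look like. Everything here is hypothetical: the ValueType, Evaluated and ParamsOutcome types are stand-ins rather than the SDK's EvaluateResult, and this variant returns a tagged result instead of filling an out-parameter. It also assumes that a type mismatch yields an error immediately and that a null parameter yields a null result once all parameters have been checked; the real precedence rules may differ.

```typescript
// Hypothetical types for illustration only; not the SDK's EvaluateResult.
type ValueType = 'BOOLEAN' | 'STRING' | 'NULL' | 'ERROR';

interface Evaluated {
  type: ValueType;
  value?: unknown;
}

type ParamsOutcome =
  | { ok: true; evaluated: Evaluated[] }
  | { ok: false; errorResult: Evaluated };

/**
 * Evaluates each parameter and checks it against the types allowed at its
 * position. Assumed semantics: a disallowed type returns an error at once;
 * a NULL parameter makes the whole call return a null result.
 */
function evaluateParams(
  params: Array<() => Evaluated>,
  allowed: ValueType[][]
): ParamsOutcome {
  const evaluated: Evaluated[] = [];
  let sawNull = false;
  for (let i = 0; i < params.length; i++) {
    const result = params[i]();
    if (result.type === 'NULL') {
      sawNull = true;
    } else if (allowed[i].indexOf(result.type) === -1) {
      return { ok: false, errorResult: { type: 'ERROR' } };
    }
    evaluated.push(result);
  }
  if (sawNull) {
    return { ok: false, errorResult: { type: 'NULL' } };
  }
  return { ok: true, evaluated };
}
```

A caller would check outcome.ok first, return outcome.errorResult when it is false, and otherwise read outcome.evaluated[0].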

}

if (foundNullAtLeastOnce) {
return EvaluateResult.newNull();

It seems you will never hit this case; the loop would have already returned.

}
case 'NULL': {
foundNull = true;
foundNullAtLeastOnce = true;

These foundNull(.*) seem to be redundant

stringValue: evaluated.value?.stringValue
?.split('')
.reverse()
.join('')

We should flag this implementation for an efficiency review, if you have not already. On the surface, it looks like it would be slow.

switch (evaluated.type) {
case 'NULL':
return EvaluateResult.newNull();
case 'STRING': {

Reverse and some other functions support Bytes in addition to string. Are bytes encoded as string somewhere upstream of this call, or will we need to add support for bytes?
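
If bytes do need to be handled here, a rough sketch of a reverse that covers both cases might look like the following. The ReversibleValue shape is hypothetical (loosely modelled on a proto value with stringValue or bytesValue) and reverseValue is not an existing SDK function. Note that Array.from iterates a string by code point, whereas split('') splits by UTF-16 code unit.

```typescript
// Hypothetical value shape for illustration; not the SDK's proto Value type.
type ReversibleValue =
  | { stringValue: string }
  | { bytesValue: Uint8Array };

function reverseValue(value: ReversibleValue): ReversibleValue {
  if ('stringValue' in value) {
    // Array.from walks the string by code point, so surrogate pairs stay intact.
    return { stringValue: Array.from(value.stringValue).reverse().join('') };
  }
  // slice() copies first, because Uint8Array.prototype.reverse mutates in place.
  return { bytesValue: value.bytesValue.slice().reverse() };
}
```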

@wu-hui wu-hui force-pushed the wuandy/RealPpl_2 branch from 6710057 to 0c570ab Compare April 14, 2025 18:34
@@ -242,41 +288,322 @@ export function toPipeline(query: Query, db: Firestore): Pipeline {
}
}

- return pipeline;
+ return pipeline.stages;
}

function whereConditionsFromCursor(

It probably got lost in the merge, but the implementation in feat/pipelines is more performant and easier to read. Consider using that implementation instead.

hasOrder = true;
// add exists to force sparse semantics
// Is this really needed?
// newStages.push(new Where(new And(stage.orders.map(order => order.expr.exists()))));

Do sparse semantics need to be enforced? If not, should we remove this commented-out code?

'Empty document paths are not allowed in DocumentsSource'
);
}
if (stage.docPaths) {

stage.docPaths will always be truthy since we referenced stage.docPaths.length above

stage: DocumentsSource,
input: PipelineInputOutput[]
): PipelineInputOutput[] {
if (stage.docPaths.length === 0) {

if (stage.docPaths && stage.docPaths.length === 0) {

Consider this if there is ever a condition where docPaths is unset.
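
For comparison, a small sketch of a guard that tolerates an unset docPaths. DocumentsSourceLike and assertNonEmptyDocPaths are hypothetical names, and this variant makes a different choice from the condition suggested above: it rejects an unset docPaths as well as an empty one, whereas `stage.docPaths && stage.docPaths.length === 0` would let an unset value pass.

```typescript
// Hypothetical stand-in for the DocumentsSource stage; only docPaths matters here.
interface DocumentsSourceLike {
  docPaths?: string[];
}

function assertNonEmptyDocPaths(stage: DocumentsSourceLike): string[] {
  // Assumption: both "unset" and "empty" are treated as invalid input.
  if (!stage.docPaths || stage.docPaths.length === 0) {
    throw new Error('Empty document paths are not allowed in DocumentsSource');
  }
  return stage.docPaths;
}
```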

{ serializer: pipeline.serializer },
d2 as MutableDocument
)
);

It would be nice if we could memoize leftValue and rightValue. Maybe something for a future PR.
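
One possible shape for that memoization, sketched under the assumption that the sort key of a document does not change during the sort: evaluate each key once, sort the decorated entries, then unwrap. sortByMemoizedKeys, evaluateKey and compareKeys are hypothetical names, not part of this PR.

```typescript
// Hypothetical helper: evaluates each sort key once per item, then sorts.
function sortByMemoizedKeys<T, K>(
  items: readonly T[],
  evaluateKey: (item: T) => K,
  compareKeys: (left: K, right: K) => number
): T[] {
  // Decorate: one key evaluation per item instead of two per comparison.
  const decorated = items.map(item => ({ item, key: evaluateKey(item) }));
  decorated.sort((a, b) => compareKeys(a.key, b.key));
  // Undecorate: drop the cached keys and return the items in sorted order.
  return decorated.map(entry => entry.item);
}
```

With n documents this performs n key evaluations, where evaluating inside the comparator performs two per comparison.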

* @param value The value of the constant.
*/
- constructor(private value: unknown) {
+ constructor(

I think at a minimum we want @hideconstructor

export function currentContext(): FunctionExpr {
return new FunctionExpr('current_context', []);
}

We should delete this expression. It will not be exposed in the SDKs. This may have been added back in a merge.

* // Check if the document has a field named "phoneNumber"
* exists("phoneNumber");
* // Check if the result of a calculation is NaN
* isNaN("value");

It appears that the comments for this and the overload above were reverted in the merge. The comments should apply to exists, not isNaN.

*
* ```typescript
* // Check if the result of a calculation is NaN
* isNaN(field("value").divide(0));
* ```
*
* @param value The expression to check.
- * @return A new {@code Expr} representing the 'isNaN' check.
+ * @return A new {@code Expr} representing the 'isNull' check.

comment needs fix

* // Check if the result of a calculation is NaN
* isNaN("value");
* // Check if the result of a calculation is null.
* isNull("value");

comment needs fix

return new CorePipeline(pipeline.serializer, newStages);
}

export type QueryOrPipeline = Query | CorePipeline;

Because QueryOrPipeline is a union type that is used throughout the implementation, to support porting to other platforms, we should change this to a common interface between the two types.
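
As a rough illustration of the idea, a shared interface could replace the union so that callers never need casts such as `left as Query`. All names below (PipelineFlavored, QueryLike, PipelineLike, sameTarget, canonicalId) are hypothetical and greatly simplified compared with the SDK's Query and CorePipeline.

```typescript
// Hypothetical common interface; the real types carry far more state.
interface PipelineFlavored {
  readonly kind: 'query' | 'pipeline';
  canonicalId(): string;
}

class QueryLike implements PipelineFlavored {
  readonly kind = 'query';
  constructor(private readonly path: string) {}
  canonicalId(): string {
    return `query:${this.path}`;
  }
}

class PipelineLike implements PipelineFlavored {
  readonly kind = 'pipeline';
  constructor(private readonly stageNames: string[]) {}
  canonicalId(): string {
    return `pipeline:${this.stageNames.join('|')}`;
  }
}

// Equality works against the interface, with no casts to a concrete type.
function sameTarget(left: PipelineFlavored, right: PipelineFlavored): boolean {
  return left.kind === right.kind && left.canonicalId() === right.canonicalId();
}
```

An interface like this maps directly onto a protocol or abstract class on platforms that lack union types.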

return queryEquals(left as Query, right as Query);
}

export type TargetOrPipeline = Target | CorePipeline;

A question to cover with the team: what allows us to skip creating an equivalent "Pipeline Target" class?

stage.orders.some(
order =>
order.expr instanceof Field &&
order.expr.fieldName() === DOCUMENT_KEY_NAME

@MarkDuckworth MarkDuckworth Apr 14, 2025

Does NAME guarantee stable sort on a collection group?

Yeah, I guess it will. This is the full path not just the ID.
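
To make the stable-sort point concrete, here is a minimal sketch of appending a document-key tiebreak when no ordering already uses it. OrderingLike and withKeyTiebreak are hypothetical; the '__name__' value for DOCUMENT_KEY_NAME and the choice to reuse the last explicit direction are assumptions, not something this PR confirms.

```typescript
// Assumed value of the constant referenced in the snippet above.
const DOCUMENT_KEY_NAME = '__name__';

// Hypothetical ordering shape; the real Ordering wraps an expression.
interface OrderingLike {
  fieldName: string;
  direction: 'ascending' | 'descending';
}

function withKeyTiebreak(orders: readonly OrderingLike[]): OrderingLike[] {
  const alreadyOrdersByKey = orders.some(
    order => order.fieldName === DOCUMENT_KEY_NAME
  );
  if (alreadyOrdersByKey) {
    return [...orders];
  }
  // Assumption: reuse the last explicit direction, or ascending when none exists.
  const direction =
    orders.length > 0 ? orders[orders.length - 1].direction : 'ascending';
  // The full document path is unique, so ties on earlier fields are broken
  // deterministically, including across a collection group.
  return [...orders, { fieldName: DOCUMENT_KEY_NAME, direction }];
}
```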

@wu-hui wu-hui force-pushed the wuandy/RealPpl_2 branch from 0c570ab to 6f0e635 Compare April 24, 2025 13:59