-
Notifications
You must be signed in to change notification settings - Fork 15.8k
Do not require serialized dag for dags test #56660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Do not require serialized dag for dags test #56660
Conversation
DAG test should be able to run without serialized dag. This PR is to ensure serialized dag is not required when running dag test. I used the triggering_user_name to exclude the check for serdag but I wonder if we should have a specific name to exclude serdag in the dagrun creation other than triggering_user_name. Please let me know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ephraimbuddy Thanks a lot for working on this so quickly.
Would it be worth to update the PR description to mention it closes #56657?
# create the associated task instances | ||
# state is None at the moment of creation | ||
run.verify_integrity(session=session, dag_version_id=dag_version.id) | ||
run.verify_integrity(session=session, dag_version_id=dag_version.id if dag_version else None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to run verify integrity at all if dag version is none? (I dont have the context loaded in my head right now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We need to create the Task Instances
) | ||
dag_version = DagVersion.get_latest_version(dag.dag_id, session=session) | ||
if not dag_version: | ||
if not dag_version and triggering_user_name != "dag_test": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this works, but i don't line this method of detection.
First question: why do we need a orm dag object anyway? (I think this is since that is where the scheduling logic is?)
Second question: since we know we aren't going to store anything here in the db: do we need to call this function, or would a specialised one that just creates the orm objects in memory fir the purposes of dag.test be better suited?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First question: why do we need a orm dag object anyway? (I think this is since that is where the scheduling logic is?)
The ORM object here is used to ensure the dag version is linked to the created dag run and task instances. Since we don't need that in test, we can use the serdag in the test funtion itself:
airflow/task-sdk/src/airflow/sdk/definitions/dag.py
Lines 1200 to 1210 in 947f4b9
scheduler_dag = SerializedDAG.deserialize_dag(SerializedDAG.serialize_dag(self)) # type: ignore[arg-type] | |
# Preserve callback functions from original Dag since they're lost during serialization | |
# and yes it is a hack for now! It is a tradeoff for code simplicity. | |
# Without it, we need "Scheduler Dag" (Serialized dag) for the scheduler bits | |
# -- dep check, scheduling tis | |
# and need real dag to get and run callbacks without having to load the dag model | |
# Scheduler DAG shouldn't have these attributes, but assigning them | |
# here is an easy hack to get this test() thing working. | |
scheduler_dag.on_success_callback = self.on_success_callback # type: ignore[attr-defined] | |
scheduler_dag.on_failure_callback = self.on_failure_callback # type: ignore[attr-defined] |
Second question: since we know we aren't going to store anything here in the db: do we need to call this function, or would a specialised one that just creates the orm objects in memory fir the purposes of dag.test be better suited?
I think the serdag used in the dag.test method is enough. I can prevent calling the dag_version if triggering_user_name is dag_test. The dag_version is only used to ensure dagrun and created TI is linked to a version but tests can skip it
Yes. Done. Thanks |
I noticed the task is failing when executing:
|
DAG test should be able to run without serialized dag. This PR is to ensure serialized dag is not required when running dag test.
I used the triggering_user_name to exclude the check for serdag but I wonder if we should have a specific name to exclude serdag in the dagrun creation other than triggering_user_name. Please let me know
Closes: #56657