-
Notifications
You must be signed in to change notification settings - Fork 114
Description
Is this a new bug in metricflow?
- I believe this is a new bug in metricflow
- I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
MetricFlow fails to resolve queries that use the same dimension across PRIMARY and FOREIGN entity metrics. For example, businessunit__businessunit_name
(from a PRIMARY job entity metric) and job__businessunit__businessunit_name
(from a FOREIGN job entity metric) are treated as incompatible, even though they are the same dimension. Interestingly, MetricFlow has no problem resolving the same dimension when accessed by multiple different FOREIGN entity metrics - it only fails when one is PRIMARY. Therefore, a possible workaround is to never use PRIMARY entity measures/metrics, and to move all measures into a boilerplate "companion" entity with a foreign relationship. But this would cause us to double the number of entities in our semantic layer.
Why This Is A Bug
-
Logical Equivalence: The dimension paths are logically equivalent - they reference the same underlying data through the same relationships.
-
Inconsistent Behavior: MetricFlow can resolve the paths when both metrics use different FOREIGN relationships but fails when mixing PRIMARY and FOREIGN, despite the semantic equivalence.
-
Unnecessary Boilerplate: The current behavior forces users to create redundant "companion entities" for every entity just to enable basic row counting alongside other metrics.
-
Path Resolution Logic: If MetricFlow can successfully join and coalesce
job__businessunit__businessunit_name
from two different FOREIGN relationships (as shown in the working SQL), it should be able to recognize thatbusinessunit__businessunit_name
(from PRIMARY) andjob__businessunit__businessunit_name
(from FOREIGN) are equivalent.
Generated SQL from Workaround
When using the companion entity pattern, MetricFlow successfully generates SQL that shows it understands how to join these dimensions:
SELECT
COALESCE(subq_12.job__businessunit__businessunit_name, subq_23.job__businessunit__businessunit_name)
AS job__businessunit__businessunit_name,
MAX(subq_12.job_count) AS job_count,
MAX(subq_23.material_costs) AS material_costs
FROM (
-- Job count subquery using businessunit__businessunit_name
SELECT
subq_8.businessunit__businessunit_name AS job__businessunit__businessunit_name,
SUM(job_companion_src.companion_count) AS job_count
FROM SCHEMA.DATABASE.staging_job_companion_table job_companion_src
LEFT OUTER JOIN (
SELECT
job_cte.job AS job,
businessunit_cte.businessunit_name AS businessunit__businessunit_name
FROM job_cte job_cte
LEFT OUTER JOIN businessunit_cte businessunit_cte
ON job_cte.businessunit = businessunit_cte.businessunit
) subq_8
ON job_companion_src.job_ref = subq_8.job
GROUP BY subq_8.businessunit__businessunit_name
) subq_12
FULL OUTER JOIN (
-- Material costs subquery using same dimension
SELECT
subq_19.businessunit__businessunit_name AS job__businessunit__businessunit_name,
SUM(subq_14.material_total_cost) AS material_costs
FROM (
SELECT
CONCAT(_tenant_id, '|', job_id) AS job,
total_cost AS material_total_cost
FROM SCHEMA.DATABASE.staging_material_costs_table material_costs_src
) subq_14
LEFT OUTER JOIN (
SELECT
job_cte.job AS job,
businessunit_cte.businessunit_name AS businessunit__businessunit_name
FROM job_cte job_cte
LEFT OUTER JOIN businessunit_cte businessunit_cte
ON job_cte.businessunit = businessunit_cte.businessunit
) subq_19
ON subq_14.job = subq_19.job
GROUP BY subq_19.businessunit__businessunit_name
) subq_23
ON subq_12.job__businessunit__businessunit_name = subq_23.job__businessunit__businessunit_name
GROUP BY COALESCE(subq_12.job__businessunit__businessunit_name, subq_23.job__businessunit__businessunit_name)
Expected Behavior
MetricFlow should recognize that businessunit__businessunit_name
and job__businessunit__businessunit_name
refer to the same dimension and successfully resolve the query. The paths are logically equivalent:
- From PRIMARY job entity:
businessunit__businessunit_name
- From FOREIGN job entity:
job__businessunit__businessunit_name
Both ultimately reference the same businessunit_name
field through the job's relationship to businessunit.
Steps To Reproduce
Minimal Reproducible Example
Semantic Model Configuration
# models/semantic_models/jobs_v2.yml
semantic_models:
- name: jobs_v2
model: ref('staging_job_table')
entities:
- name: job
type: primary
expr: job_id
- name: businessunit
type: foreign
expr: businessunit_id
measures:
- name: job_count
agg: count_distinct
expr: job_id
# models/semantic_models/material_costs.yml
semantic_models:
- name: material_costs
model: ref('staging_material_costs_table')
entities:
- name: job
type: foreign
expr: job_id
measures:
- name: material_costs
agg: sum
expr: total_cost
Failing Query
mf query --metrics job_count,material_costs \
--group-by businessunit__businessunit_name \
--explain
Actual Behavior
The query fails with the following error:
ERROR: Got error(s) during query resolution.
Error #1:
Message:
Query(['job_count', 'material_costs']) is built from:
Metric('job_count'), Metric('material_costs').
However, the given input does not match to a common item that is available to those parents:
{
"Matching items for: Metric('job_count')": [
'businessunit__businessunit_name',
],
"Matching items for: Metric('material_costs')": [
'job__businessunit__businessunit_name',
],
}
For time dimension inputs, please specify a time grain as ambiguous resolution only allows resolution
when the parents have the same defined time gain.
Query Input:
businessunit__businessunit_name
Issue Location:
[Resolve Query(['job_count', 'material_costs'])]
Current Workaround
Creating a "companion entity" pattern where we add a trivial foreign entity relationship to job:
# models/semantic_models/job_companion.yml
semantic_models:
- name: job_companion
model: ref('staging_job_companion_table') # Simple SELECT *, 1 as companion_count FROM job
entities:
- name: job
type: foreign # Forces consistent path resolution
expr: job_id
measures:
- name: job_companion_count
agg: sum
expr: companion_count
# Updated metrics to use companion
metrics:
- name: job_count
type: simple
measure: job_companion_count # Now uses FOREIGN relationship
With this pattern, the following query works:
mf query --metrics job_count,material_costs \
--group-by job__businessunit__businessunit_name \
--explain
This succeeds because both metrics now traverse through FOREIGN relationships, giving them consistent paths.
Relevant log output
Environment
- MetricFlow version: 0.8.2
- dbt-core version: 1.9.4
- dbt-snowflake: 1.8.4
- Database: Snowflake
- Python version: 3.11
Which database are you using?
snowflake
Additional Context
Impact
We cannot use measures from primary entities in queries that combine metrics from different semantic models. This forces us to create artificial "companion" entities with foreign relationships for every entity, effectively doubling our semantic model count. What should be a simple COUNT(DISTINCT job_id)
now requires an entire separate model and staging table.
Suggested Solution
MetricFlow should enhance its dimension path resolution to recognize when paths are semantically equivalent, particularly when:
- A PRIMARY entity's direct dimension path (e.g.,
businessunit__businessunit_name
) - Is equivalent to a FOREIGN entity's path through the same entity (e.g.,
job__businessunit__businessunit_name
)
The resolution logic should treat these as the same dimension for query planning purposes, similar to how it already handles multiple FOREIGN relationships to the same dimension.
Additional Context
- Derived metrics that depend on measures from primary entities also require this workaround