Skip to content

[Bug] MetricFlow fails to resolve equivalent dimension paths across PRIMARY and FOREIGN entity relationships #1780

@dstein-st

Description

@dstein-st

Is this a new bug in metricflow?

  • I believe this is a new bug in metricflow
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

MetricFlow fails to resolve queries that use the same dimension across PRIMARY and FOREIGN entity metrics. For example, businessunit__businessunit_name (from a PRIMARY job entity metric) and job__businessunit__businessunit_name (from a FOREIGN job entity metric) are treated as incompatible, even though they are the same dimension. Interestingly, MetricFlow has no problem resolving the same dimension when accessed by multiple different FOREIGN entity metrics - it only fails when one is PRIMARY. Therefore, a possible workaround is to never use PRIMARY entity measures/metrics, and to move all measures into a boilerplate "companion" entity with a foreign relationship. But this would cause us to double the number of entities in our semantic layer.

Why This Is A Bug

  1. Logical Equivalence: The dimension paths are logically equivalent - they reference the same underlying data through the same relationships.

  2. Inconsistent Behavior: MetricFlow can resolve the paths when both metrics use different FOREIGN relationships but fails when mixing PRIMARY and FOREIGN, despite the semantic equivalence.

  3. Unnecessary Boilerplate: The current behavior forces users to create redundant "companion entities" for every entity just to enable basic row counting alongside other metrics.

  4. Path Resolution Logic: If MetricFlow can successfully join and coalesce job__businessunit__businessunit_name from two different FOREIGN relationships (as shown in the working SQL), it should be able to recognize that businessunit__businessunit_name (from PRIMARY) and job__businessunit__businessunit_name (from FOREIGN) are equivalent.

Generated SQL from Workaround

When using the companion entity pattern, MetricFlow successfully generates SQL that shows it understands how to join these dimensions:

SELECT
  COALESCE(subq_12.job__businessunit__businessunit_name, subq_23.job__businessunit__businessunit_name) 
    AS job__businessunit__businessunit_name,
  MAX(subq_12.job_count) AS job_count,
  MAX(subq_23.material_costs) AS material_costs
FROM (
  -- Job count subquery using businessunit__businessunit_name
  SELECT
    subq_8.businessunit__businessunit_name AS job__businessunit__businessunit_name,
    SUM(job_companion_src.companion_count) AS job_count
      FROM SCHEMA.DATABASE.staging_job_companion_table job_companion_src
  LEFT OUTER JOIN (
    SELECT
      job_cte.job AS job,
      businessunit_cte.businessunit_name AS businessunit__businessunit_name
    FROM job_cte job_cte
    LEFT OUTER JOIN businessunit_cte businessunit_cte
      ON job_cte.businessunit = businessunit_cte.businessunit
  ) subq_8
  ON job_companion_src.job_ref = subq_8.job
  GROUP BY subq_8.businessunit__businessunit_name
) subq_12
FULL OUTER JOIN (
  -- Material costs subquery using same dimension
  SELECT
    subq_19.businessunit__businessunit_name AS job__businessunit__businessunit_name,
    SUM(subq_14.material_total_cost) AS material_costs
  FROM (
    SELECT
      CONCAT(_tenant_id, '|', job_id) AS job,
      total_cost AS material_total_cost
    FROM SCHEMA.DATABASE.staging_material_costs_table material_costs_src
  ) subq_14
  LEFT OUTER JOIN (
    SELECT
      job_cte.job AS job,
      businessunit_cte.businessunit_name AS businessunit__businessunit_name
    FROM job_cte job_cte
    LEFT OUTER JOIN businessunit_cte businessunit_cte
      ON job_cte.businessunit = businessunit_cte.businessunit
  ) subq_19
  ON subq_14.job = subq_19.job
  GROUP BY subq_19.businessunit__businessunit_name
) subq_23
ON subq_12.job__businessunit__businessunit_name = subq_23.job__businessunit__businessunit_name
GROUP BY COALESCE(subq_12.job__businessunit__businessunit_name, subq_23.job__businessunit__businessunit_name)

Expected Behavior

MetricFlow should recognize that businessunit__businessunit_name and job__businessunit__businessunit_name refer to the same dimension and successfully resolve the query. The paths are logically equivalent:

  • From PRIMARY job entity: businessunit__businessunit_name
  • From FOREIGN job entity: job__businessunit__businessunit_name

Both ultimately reference the same businessunit_name field through the job's relationship to businessunit.

Steps To Reproduce

Minimal Reproducible Example

Semantic Model Configuration

# models/semantic_models/jobs_v2.yml
semantic_models:
  - name: jobs_v2
    model: ref('staging_job_table')
    entities:
      - name: job
        type: primary
        expr: job_id
      - name: businessunit
        type: foreign
        expr: businessunit_id
    measures:
      - name: job_count
        agg: count_distinct
        expr: job_id

# models/semantic_models/material_costs.yml
semantic_models:
  - name: material_costs
    model: ref('staging_material_costs_table')
    entities:
      - name: job
        type: foreign
        expr: job_id
    measures:
      - name: material_costs
        agg: sum
        expr: total_cost

Failing Query

mf query --metrics job_count,material_costs \
  --group-by businessunit__businessunit_name \
  --explain

Actual Behavior

The query fails with the following error:

ERROR: Got error(s) during query resolution.

Error #1:
  Message:

    Query(['job_count', 'material_costs']) is built from:

      Metric('job_count'), Metric('material_costs').
    However, the given input does not match to a common item that is available to those parents:

      {
        "Matching items for: Metric('job_count')": [
          'businessunit__businessunit_name',
        ],
        "Matching items for: Metric('material_costs')": [
          'job__businessunit__businessunit_name',
        ],
      }

    For time dimension inputs, please specify a time grain as ambiguous resolution only allows resolution 
    when the parents have the same defined time gain.

  Query Input:

    businessunit__businessunit_name

  Issue Location:

    [Resolve Query(['job_count', 'material_costs'])]

Current Workaround

Creating a "companion entity" pattern where we add a trivial foreign entity relationship to job:

# models/semantic_models/job_companion.yml
semantic_models:
  - name: job_companion
    model: ref('staging_job_companion_table')  # Simple SELECT *, 1 as companion_count FROM job
    entities:
      - name: job
        type: foreign  # Forces consistent path resolution
        expr: job_id
    measures:
      - name: job_companion_count
        agg: sum
        expr: companion_count

# Updated metrics to use companion
metrics:
  - name: job_count
    type: simple
    measure: job_companion_count  # Now uses FOREIGN relationship

With this pattern, the following query works:

mf query --metrics job_count,material_costs \
  --group-by job__businessunit__businessunit_name \
  --explain

This succeeds because both metrics now traverse through FOREIGN relationships, giving them consistent paths.

Relevant log output

Environment

- MetricFlow version: 0.8.2
- dbt-core version: 1.9.4
- dbt-snowflake: 1.8.4
- Database: Snowflake
- Python version: 3.11

Which database are you using?

snowflake

Additional Context

Impact

We cannot use measures from primary entities in queries that combine metrics from different semantic models. This forces us to create artificial "companion" entities with foreign relationships for every entity, effectively doubling our semantic model count. What should be a simple COUNT(DISTINCT job_id) now requires an entire separate model and staging table.

Suggested Solution

MetricFlow should enhance its dimension path resolution to recognize when paths are semantically equivalent, particularly when:

  1. A PRIMARY entity's direct dimension path (e.g., businessunit__businessunit_name)
  2. Is equivalent to a FOREIGN entity's path through the same entity (e.g., job__businessunit__businessunit_name)

The resolution logic should treat these as the same dimension for query planning purposes, similar to how it already handles multiple FOREIGN relationships to the same dimension.

Additional Context

  • Derived metrics that depend on measures from primary entities also require this workaround

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingtriageTasks that need to be triaged

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions