
Investigate fuzzy search query performance discrepancy #2164

Open · He1DAr opened this issue Nov 12, 2024 · 1 comment · May be fixed by #2102

He1DAr (Collaborator) commented Nov 12, 2024

There’s a significant difference in performance when running the same query locally vs. in the dev environment, even though the query uses the same execution strategy in both environments.

Query:

SELECT *
FROM stacks_blockchain_api.txs
WHERE canonical = true
  AND microblock_canonical = true
  AND block_height <= 157521
  AND burn_block_time <= 1730591999
  AND (
    CASE
      WHEN EXISTS (
        SELECT 1
        FROM pg_extension
        WHERE extname = 'pg_trgm'
      )
      THEN similarity(contract_call_function_name, 'swab') > 0.3
      ELSE contract_call_function_name ILIKE '%swab%'
    END
  )
ORDER BY burn_block_time DESC, block_height DESC, microblock_sequence DESC, tx_index DESC
LIMIT 30
OFFSET 0;
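For context: the CASE probes pg_extension once (the planner evaluates it as an InitPlan, visible in the plan below) to pick between trigram similarity() and a plain ILIKE fallback. Neither branch can use an index on contract_call_function_name as written, so every candidate row is filtered individually. The usual remedy would be a trigram GIN index, sketched here (the index name is hypothetical):

-- Sketch only: a GIN trigram index that ILIKE '%...%' and the % operator can use.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX IF NOT EXISTS txs_contract_call_function_name_trgm_idx
  ON stacks_blockchain_api.txs
  USING gin (contract_call_function_name gin_trgm_ops);

Note that similarity(col, 'swab') > 0.3 as written is not indexable even with this index in place; the index-friendly equivalent is the % operator with pg_trgm.similarity_threshold set to 0.3.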

EXPLAIN (ANALYZE, FORMAT JSON) output:

[
  {
    "Plan": {
      "Node Type": "Limit",
      "Parallel Aware": false,
      "Async Capable": false,
      "Startup Cost": 821.56,
      "Total Cost": 887.96,
      "Plan Rows": 30,
      "Plan Width": 2475,
      "Actual Startup Time": 730.829,
      "Actual Total Time": 730.837,
      "Actual Rows": 30,
      "Actual Loops": 1,
      "Plans": [
        {
          "Node Type": "Seq Scan",
          "Parent Relationship": "InitPlan",
          "Subplan Name": "InitPlan 1 (returns $0)",
          "Parallel Aware": false,
          "Async Capable": false,
          "Relation Name": "pg_extension",
          "Alias": "pg_extension",
          "Startup Cost": 0.00,
          "Total Cost": 1.01,
          "Plan Rows": 1,
          "Plan Width": 0,
          "Actual Startup Time": 0.019,
          "Actual Total Time": 0.020,
          "Actual Rows": 1,
          "Actual Loops": 1,
          "Filter": "(extname = 'pg_trgm'::name)",
          "Rows Removed by Filter": 1
        },
        {
          "Node Type": "Incremental Sort",
          "Parent Relationship": "Outer",
          "Parallel Aware": false,
          "Async Capable": false,
          "Startup Cost": 820.54,
          "Total Cost": 13030750.71,
          "Plan Rows": 5886325,
          "Plan Width": 2475,
          "Actual Startup Time": 730.827,
          "Actual Total Time": 730.831,
          "Actual Rows": 30,
          "Actual Loops": 1,
          "Sort Key": ["txs.burn_block_time DESC", "txs.block_height DESC", "txs.microblock_sequence DESC", "txs.tx_index DESC"],
          "Presorted Key": ["txs.burn_block_time"],
          "Full-sort Groups": {
            "Group Count": 1,
            "Sort Methods Used": ["quicksort"],
            "Sort Space Memory": {
              "Average Sort Space Used": 85,
              "Peak Sort Space Used": 85
            }
          },
          "Plans": [
            {
              "Node Type": "Index Scan",
              "Parent Relationship": "Outer",
              "Parallel Aware": false,
              "Async Capable": false,
              "Scan Direction": "Backward",
              "Index Name": "txs_burn_block_time_index",
              "Relation Name": "txs",
              "Alias": "txs",
              "Startup Cost": 0.43,
              "Total Cost": 12688723.86,
              "Plan Rows": 5886325,
              "Plan Width": 2475,
              "Actual Startup Time": 717.984,
              "Actual Total Time": 730.722,
              "Actual Rows": 31,
              "Actual Loops": 1,
              "Index Cond": "(burn_block_time <= 1730591999)",
              "Rows Removed by Index Recheck": 0,
              "Filter": "(canonical AND microblock_canonical AND (block_height <= 157521) AND CASE WHEN $0 THEN (similarity(contract_call_function_name, 'swab'::text) > '0.3'::double precision) ELSE (contract_call_function_name ~~* '%swab%'::text) END)",
              "Rows Removed by Filter": 981643
            }
          ]
        }
      ]
    },
    "Planning Time": 0.363,
    "Triggers": [
    ],
    "Execution Time": 730.896
  }
]
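Reading the plan: the backward scan of txs_burn_block_time_index read and discarded 981,643 rows ("Rows Removed by Filter") to collect just 30 matches, so runtime is dominated by how densely recent rows match the search term, which naturally differs between the local and dev data sets. Assuming the trigram index sketched above, an index-friendly form of the same search would look roughly like this:

-- Sketch: the same search using the indexable % operator; 0.3 mirrors
-- the original similarity threshold. Requires pg_trgm to be installed.
SET pg_trgm.similarity_threshold = 0.3;

SELECT *
FROM stacks_blockchain_api.txs
WHERE canonical = true
  AND microblock_canonical = true
  AND block_height <= 157521
  AND burn_block_time <= 1730591999
  AND contract_call_function_name % 'swab'
ORDER BY burn_block_time DESC, block_height DESC, microblock_sequence DESC, tx_index DESC
LIMIT 30;

Whether the planner actually prefers the trigram index over walking the burn_block_time index depends on how selective the term is, so this is a starting point rather than a guaranteed win.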
zone117x (Member) commented:

We've had several issues similar to this: a Postgres query is relatively fast on a local machine but slow in dev/staging. We've also seen instances where a query seems fast in prod or locally when those environments are not under load, i.e. a query may be fast on a highly provisioned, idle machine, but once that machine is under load the query ends up becoming a significant bottleneck.

It's generally a good rule of thumb that if a query is not performant in the dev/stg environment, then it will not be performant when used at scale in the prod environment.

With this particular query, I don't think we can squeeze out the roughly 1-2 orders of magnitude better performance we'd need without a significant change to the approach. I think we'd need to do something like: create a new table that is responsible for tracking search terms, and run the search query against that table. That approach may work, but keep in mind it's more involved than it might seem at face value (a rough sketch follows the list). It involves:

  • Create a new table in a SQL migration file
  • Optimize the indexes for very fast search queries
  • Update several areas in the data ingestion code so that we write to this new table
  • Write an optimized SQL read query against that new table
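Purely as an illustration, that table could look something like this (all names are hypothetical):

-- Hypothetical sketch: a dedicated search-terms table populated at
-- ingestion time and indexed for fast trigram lookups.
CREATE TABLE search_terms (
  term         text    NOT NULL,
  tx_id        bytea   NOT NULL,
  block_height integer NOT NULL,
  PRIMARY KEY (term, tx_id)
);

CREATE INDEX search_terms_term_trgm_idx
  ON search_terms USING gin (term gin_trgm_ops);

The read side would then join from search_terms back to txs on tx_id, keeping the trigram scan on a much narrower table than txs itself.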

zone117x linked a pull request on Mar 25, 2025 that will close this issue.
Status: 📋 Backlog