[Issue #1167] Add filtering to the search endpoint (#1468)

## Summary Fixes #1167 ### Time to review: __15 mins__ ## Changes proposed Added filtering to the search endpoint, includes all but the query box parameter which has its own follow-up ticket Added utilities to help generate the search filter schema Added indexes to improve the performance of search (see additional info below for details) Extensive additions to the tests Added the ability to choose examples on the OpenAPI docs (included an example with no filters, and one with many) Fixed a bug in the Paginator for handling counts (will follow-up and fix in the template repo) ## Context for reviewers This change has been extensively tested, manually, and through an enormous amount of new unit tests. As the change was already getting quite large, a few things will be dealt with in follow-up tickets: * Query filtering: #1455 * Fixing logging formatting: #1466 * Additional order_by fields: #1467 For the filters, they're all `one_of` filters which means that only one of the supplied values needs to match for it to pass the where clause (literally the where clauses generate as `where table.column in (1, 2, 3)`). You can see an example query below. The agency filter is a bit odd as I made it a `startswith` style filter instead to handle the way agency codes get nested. We may want to adjust this further in the future, but this will at least technically handle hierarchies of agencies right now. ## Additional information I extensively tested the performance of the queries we run. I locally loaded in ~11k records using our factories (ran the `seed-local-db` script 300 times). With the API functioning, I make SQLAlchemy output the queries it ran and did an `EXPLAIN ANALYZE ...` on the big ones. I then added several indexes which improved the performance. The primary query of the API looks like this: ```sql SELECT opportunity.opportunity_id, opportunity.opportunity_number, opportunity.opportunity_title, opportunity.agency, opportunity.opportunity_category_id, opportunity.category_explanation, opportunity.is_draft, opportunity.revision_number, opportunity.modified_comments, opportunity.publisher_user_id, opportunity.publisher_profile_id, opportunity.created_at, opportunity.updated_at FROM opportunity JOIN current_opportunity_summary ON opportunity.opportunity_id = current_opportunity_summary.opportunity_id JOIN opportunity_summary ON current_opportunity_summary.opportunity_summary_id = opportunity_summary.opportunity_summary_id JOIN link_opportunity_summary_funding_instrument ON opportunity_summary.opportunity_summary_id = link_opportunity_summary_funding_instrument.opportunity_summary_id JOIN link_opportunity_summary_funding_category ON opportunity_summary.opportunity_summary_id = link_opportunity_summary_funding_category.opportunity_summary_id JOIN link_opportunity_summary_applicant_type ON opportunity_summary.opportunity_summary_id = link_opportunity_summary_applicant_type.opportunity_summary_id WHERE opportunity.is_draft IS FALSE AND(EXISTS ( SELECT 1 FROM current_opportunity_summary WHERE opportunity.opportunity_id = current_opportunity_summary.opportunity_id)) AND current_opportunity_summary.opportunity_status_id IN(1,2) AND link_opportunity_summary_funding_instrument.funding_instrument_id IN(1,2) AND link_opportunity_summary_funding_category.funding_category_id IN(1,3,20) AND link_opportunity_summary_applicant_type.applicant_type_id IN(1, 2, 13) AND((opportunity.agency ILIKE 'US-ABC%') OR(opportunity.agency ILIKE 'HHS%')) ORDER BY opportunity.opportunity_id DESC LIMIT 25 OFFSET 25 ``` Without any of the new indexes, `EXPLAIN ANALYZE` gives this a cost of ~1100 (non-specific unit). With the new indexes it becomes ~800. The actual runtime of these queries is in the 5-10ms range with or without the indexes, so it's minor either way. Note that querying the API locally, this gives response times of 50-150ms (slower initially before caching likely takes hold). Also if we're just filtering by something like opportunity status, then the costs are around 10-15. See: https://www.postgresql.org/docs/current/using-explain.html#USING-EXPLAIN-ANALYZE --------- Co-authored-by: nava-platform-bot <[email protected]>
HHS · Mar 18, 2024 · 15f9d59 · 15f9d59
1 parent 5d50bc4
commit 15f9d59
Show file tree

Hide file tree

Showing 15 changed files with 1,280 additions and 59 deletions.
diff --git a/api/openapi.generated.yml b/api/openapi.generated.yml
@@ -314,6 +314,47 @@ paths:
           application/json:
             schema:
               $ref: '#/components/schemas/OpportunitySearchRequest'
+            examples:
+              example1:
+                summary: No filters
+                value:
+                  pagination:
+                    order_by: opportunity_id
+                    page_offset: 1
+                    page_size: 25
+                    sort_direction: ascending
+              example2:
+                summary: All filters
+                value:
+                  query: research
+                  filters:
+                    agency:
+                      one_of:
+                      - US-ABC
+                      - HHS
+                    applicant_type:
+                      one_of:
+                      - state_governments
+                      - county_governments
+                      - individuals
+                    funding_category:
+                      one_of:
+                      - recovery_act
+                      - arts
+                      - natural_resources
+                    funding_instrument:
+                      one_of:
+                      - cooperative_agreement
+                      - grant
+                    opportunity_status:
+                      one_of:
+                      - forecasted
+                      - posted
+                  pagination:
+                    order_by: opportunity_id
+                    page_offset: 1
+                    page_size: 25
+                    sort_direction: descending
       security:
       - ApiKeyAuth: []
   /v0/opportunities/{opportunity_id}:
@@ -727,6 +768,130 @@ components:
           type: string
           format: date-time
           readOnly: true
+    FundingInstrumentFilter:
+      type: object
+      properties:
+        one_of:
+          type: array
+          items:
+            enum:
+            - cooperative_agreement
+            - grant
+            - procurement_contract
+            - other
+            type:
+            - string
+    FundingCategoryFilter:
+      type: object
+      properties:
+        one_of:
+          type: array
+          items:
+            enum:
+            - recovery_act
+            - agriculture
+            - arts
+            - business_and_commerce
+            - community_development
+            - consumer_protection
+            - disaster_prevention_and_relief
+            - education
+            - employment_labor_and_training
+            - energy
+            - environment
+            - food_and_nutrition
+            - health
+            - housing
+            - humanities
+            - infrastructure_investment_and_jobs_act
+            - information_and_statistics
+            - income_security_and_social_services
+            - law_justice_and_legal_services
+            - natural_resources
+            - opportunity_zone_benefits
+            - regional_development
+            - science_technology_and_other_research_and_development
+            - transportation
+            - affordable_care_act
+            - other
+            type:
+            - string
+    ApplicantTypeFilter:
+      type: object
+      properties:
+        one_of:
+          type: array
+          items:
+            enum:
+            - state_governments
+            - county_governments
+            - city_or_township_governments
+            - special_district_governments
+            - independent_school_districts
+            - public_and_state_institutions_of_higher_education
+            - private_institutions_of_higher_education
+            - federally_recognized_native_american_tribal_governments
+            - other_native_american_tribal_organizations
+            - public_and_indian_housing_authorities
+            - nonprofits_non_higher_education_with_501c3
+            - nonprofits_non_higher_education_without_501c3
+            - individuals
+            - for_profit_organizations_other_than_small_businesses
+            - small_businesses
+            - other
+            - unrestricted
+            type:
+            - string
+    OpportunityStatusFilter:
+      type: object
+      properties:
+        one_of:
+          type: array
+          items:
+            enum:
+            - forecasted
+            - posted
+            - closed
+            - archived
+            type:
+            - string
+    AgencyFilter:
+      type: object
+      properties:
+        one_of:
+          type: array
+          items:
+            type: string
+            minLength: 2
+            example: US-ABC
+    OpportunitySearchFilter:
+      type: object
+      properties:
+        funding_instrument:
+          type:
+          - object
+          allOf:
+          - $ref: '#/components/schemas/FundingInstrumentFilter'
+        funding_category:
+          type:
+          - object
+          allOf:
+          - $ref: '#/components/schemas/FundingCategoryFilter'
+        applicant_type:
+          type:
+          - object
+          allOf:
+          - $ref: '#/components/schemas/ApplicantTypeFilter'
+        opportunity_status:
+          type:
+          - object
+          allOf:
+          - $ref: '#/components/schemas/OpportunityStatusFilter'
+        agency:
+          type:
+          - object
+          allOf:
+          - $ref: '#/components/schemas/AgencyFilter'
     OpportunityPagination:
       type: object
       properties:
@@ -761,6 +926,15 @@ components:
     OpportunitySearchRequest:
       type: object
       properties:
+        query:
+          type: string
+          description: Query string which searches against several text fields
+          example: research
+        filters:
+          type:
+          - object
+          allOf:
+          - $ref: '#/components/schemas/OpportunitySearchFilter'
         pagination:
           type:
           - object

diff --git a/api/src/api/opportunities_v0_1/opportunity_routes.py b/api/src/api/opportunities_v0_1/opportunity_routes.py
@@ -9,6 +9,7 @@
 from src.logging.flask_logger import add_extra_data_to_current_request_logs
 from src.services.opportunities_v0_1.get_opportunity import get_opportunity
 from src.services.opportunities_v0_1.search_opportunities import search_opportunities
+from src.util.dict_util import flatten_dict
 
 logger = logging.getLogger(__name__)
 
@@ -23,17 +24,55 @@
 See [Release Phases](https://github.com/github/roadmap?tab=readme-ov-file#release-phases) for further details.
 """
 
+examples = {
+    "example1": {
+        "summary": "No filters",
+        "value": {
+            "pagination": {
+                "order_by": "opportunity_id",
+                "page_offset": 1,
+                "page_size": 25,
+                "sort_direction": "ascending",
+            },
+        },
+    },
+    "example2": {
+        "summary": "All filters",
+        "value": {
+            "query": "research",
+            "filters": {
+                "agency": {"one_of": ["US-ABC", "HHS"]},
+                "applicant_type": {
+                    "one_of": ["state_governments", "county_governments", "individuals"]
+                },
+                "funding_category": {"one_of": ["recovery_act", "arts", "natural_resources"]},
+                "funding_instrument": {"one_of": ["cooperative_agreement", "grant"]},
+                "opportunity_status": {"one_of": ["forecasted", "posted"]},
+            },
+            "pagination": {
+                "order_by": "opportunity_id",
+                "page_offset": 1,
+                "page_size": 25,
+                "sort_direction": "descending",
+            },
+        },
+    },
+}
+
 
 @opportunity_blueprint.post("/opportunities/search")
 @opportunity_blueprint.input(
-    opportunity_schemas.OpportunitySearchRequestSchema, arg_name="search_params"
+    opportunity_schemas.OpportunitySearchRequestSchema, arg_name="search_params", examples=examples
 )
 # many=True allows us to return a list of opportunity objects
 @opportunity_blueprint.output(opportunity_schemas.OpportunitySchema(many=True))
 @opportunity_blueprint.auth_required(api_key_auth)
 @opportunity_blueprint.doc(description=SHARED_ALPHA_DESCRIPTION)
 @flask_db.with_db_session()
 def opportunity_search(db_session: db.Session, search_params: dict) -> response.ApiResponse:
+    add_extra_data_to_current_request_logs(flatten_dict(search_params, prefix="request.body"))
+    logger.info("POST /v0.1/opportunities/search")
+
     with db_session.begin():
         opportunities, pagination_info = search_opportunities(db_session, search_params)
 

diff --git a/api/src/api/opportunities_v0_1/opportunity_schemas.py b/api/src/api/opportunities_v0_1/opportunity_schemas.py
@@ -1,4 +1,5 @@
 from src.api.schemas.extension import Schema, fields
+from src.api.schemas.search_schema import StrSearchSchemaBuilder
 from src.constants.lookup_constants import (
     ApplicantType,
     FundingCategory,
@@ -242,9 +243,43 @@ class OpportunitySchema(Schema):
     updated_at = fields.DateTime(dump_only=True)
 
 
+class OpportunitySearchFilterSchema(Schema):
+    funding_instrument = fields.Nested(
+        StrSearchSchemaBuilder("FundingInstrumentFilterSchema")
+        .with_one_of(allowed_values=FundingInstrument)
+        .build()
+    )
+    funding_category = fields.Nested(
+        StrSearchSchemaBuilder("FundingCategoryFilterSchema")
+        .with_one_of(allowed_values=FundingCategory)
+        .build()
+    )
+    applicant_type = fields.Nested(
+        StrSearchSchemaBuilder("ApplicantTypeFilterSchema")
+        .with_one_of(allowed_values=ApplicantType)
+        .build()
+    )
+    opportunity_status = fields.Nested(
+        StrSearchSchemaBuilder("OpportunityStatusFilterSchema")
+        .with_one_of(allowed_values=OpportunityStatus)
+        .build()
+    )
+    agency = fields.Nested(
+        StrSearchSchemaBuilder("AgencyFilterSchema")
+        .with_one_of(example="US-ABC", minimum_length=2)
+        .build()
+    )
+
+
 class OpportunitySearchRequestSchema(Schema):
-    # A follow-up ticket will add filters and sorting
-    # for now just including the pagination parameters.
+    query = fields.String(
+        metadata={
+            "description": "Query string which searches against several text fields",
+            "example": "research",
+        }
+    )
+
+    filters = fields.Nested(OpportunitySearchFilterSchema())
 
     pagination = fields.Nested(
         generate_pagination_schema(

diff --git a/api/src/api/schemas/search_schema.py b/api/src/api/schemas/search_schema.py
@@ -0,0 +1,75 @@
+from enum import StrEnum
+from typing import Type
+
+from src.api.schemas.extension import Schema, fields, validators
+
+
+class StrSearchSchemaBuilder:
+    """
+    Builder for setting up a filter in a search endpoint schema.
+
+    Our schemas are setup to look like:
+
+        {
+            "filters": {
+                "field": {
+                    "one_of": ["x", "y", "z"]
+                }
+            }
+        }
+
+    This helps generate the filters for a given field. At the moment,
+    only a one_of filter is implemented.
+
+    Usage::
+
+        # In a search request schema, you would use it like so
+
+        class OpportunitySearchFilterSchema(Schema):
+            example_enum_field = fields.Nested(
+                StrSearchSchemaBuilder("ExampleEnumFieldSchema")
+                    .with_one_of(allowed_values=ExampleEnum)
+                    .build()
+            )
+
+            example_str_field = fields.Nested(
+                StrSearchSchemaBuilder("ExampleStrFieldSchema")
+                    .with_one_of(example="example_value", minimum_length=5)
+                    .build()
+            )
+    """
+
+    def __init__(self, schema_class_name: str):
+        # The schema class name is used on the endpoint
+        self.schema_fields: dict[str, fields.MixinField] = {}
+        self.schema_class_name = schema_class_name
+
+    def with_one_of(
+        self,
+        *,
+        allowed_values: Type[StrEnum] | None = None,
+        example: str | None = None,
+        minimum_length: int | None = None
+    ) -> "StrSearchSchemaBuilder":
+        metadata = {}
+        if example:
+            metadata["example"] = example
+
+        # We assume it's just a list of strings
+        if allowed_values is None:
+            params: dict = {"metadata": metadata}
+            if minimum_length is not None:
+                params["validate"] = [validators.Length(min=2)]
+
+            list_type: fields.MixinField = fields.String(**params)
+
+        # Otherwise it is an enum type which handles allowed values
+        else:
+            list_type = fields.Enum(allowed_values, metadata=metadata)
+
+        self.schema_fields["one_of"] = fields.List(list_type)
+
+        return self
+
+    def build(self) -> Schema:
+        return Schema.from_dict(self.schema_fields, name=self.schema_class_name)  # type: ignore