Skip to content

Commit

Permalink
[Issue #1167] Add filtering to the search endpoint (#1468)
Browse files Browse the repository at this point in the history
## Summary
Fixes #1167

### Time to review: __15 mins__

## Changes proposed
Added filtering to the search endpoint, includes all but the query box
parameter which has its own follow-up ticket

Added utilities to help generate the search filter schema

Added indexes to improve the performance of search (see additional info
below for details)

Extensive additions to the tests

Added the ability to choose examples on the OpenAPI docs (included an
example with no filters, and one with many)

Fixed a bug in the Paginator for handling counts (will follow-up and fix
in the template repo)

## Context for reviewers
This change has been extensively tested, manually, and through an
enormous amount of new unit tests. As the change was already getting
quite large, a few things will be dealt with in follow-up tickets:
* Query filtering: #1455
* Fixing logging formatting:
#1466
* Additional order_by fields:
#1467

For the filters, they're all `one_of` filters which means that only one
of the supplied values needs to match for it to pass the where clause
(literally the where clauses generate as `where table.column in (1, 2,
3)`). You can see an example query below.

The agency filter is a bit odd as I made it a `startswith` style filter
instead to handle the way agency codes get nested. We may want to adjust
this further in the future, but this will at least technically handle
hierarchies of agencies right now.

## Additional information
I extensively tested the performance of the queries we run. I locally
loaded in ~11k records using our factories (ran the `seed-local-db`
script 300 times). With the API functioning, I make SQLAlchemy output
the queries it ran and did an `EXPLAIN ANALYZE ...` on the big ones. I
then added several indexes which improved the performance.

The primary query of the API looks like this:
```sql
SELECT
	opportunity.opportunity_id,
	opportunity.opportunity_number,
	opportunity.opportunity_title,
	opportunity.agency,
	opportunity.opportunity_category_id,
	opportunity.category_explanation,
	opportunity.is_draft,
	opportunity.revision_number,
	opportunity.modified_comments,
	opportunity.publisher_user_id,
	opportunity.publisher_profile_id,
	opportunity.created_at,
	opportunity.updated_at
FROM
	opportunity
	JOIN current_opportunity_summary ON opportunity.opportunity_id = current_opportunity_summary.opportunity_id
	JOIN opportunity_summary ON current_opportunity_summary.opportunity_summary_id = opportunity_summary.opportunity_summary_id
	JOIN link_opportunity_summary_funding_instrument ON opportunity_summary.opportunity_summary_id = link_opportunity_summary_funding_instrument.opportunity_summary_id
	JOIN link_opportunity_summary_funding_category ON opportunity_summary.opportunity_summary_id = link_opportunity_summary_funding_category.opportunity_summary_id
	JOIN link_opportunity_summary_applicant_type ON opportunity_summary.opportunity_summary_id = link_opportunity_summary_applicant_type.opportunity_summary_id
WHERE
	opportunity.is_draft IS FALSE
	AND(EXISTS (
			SELECT
				1 FROM current_opportunity_summary
			WHERE
				opportunity.opportunity_id = current_opportunity_summary.opportunity_id))
	AND current_opportunity_summary.opportunity_status_id IN(1,2)
	AND link_opportunity_summary_funding_instrument.funding_instrument_id IN(1,2)
	AND link_opportunity_summary_funding_category.funding_category_id IN(1,3,20)
	AND link_opportunity_summary_applicant_type.applicant_type_id IN(1, 2, 13)
	AND((opportunity.agency ILIKE 'US-ABC%')
	OR(opportunity.agency ILIKE 'HHS%'))
ORDER BY
	opportunity.opportunity_id DESC
LIMIT 25 OFFSET 25
```

Without any of the new indexes, `EXPLAIN ANALYZE` gives this a cost of
~1100 (non-specific unit). With the new indexes it becomes ~800. The
actual runtime of these queries is in the 5-10ms range with or without
the indexes, so it's minor either way. Note that querying the API
locally, this gives response times of 50-150ms (slower initially before
caching likely takes hold). Also if we're just filtering by something
like opportunity status, then the costs are around 10-15.
See:
https://www.postgresql.org/docs/current/using-explain.html#USING-EXPLAIN-ANALYZE

---------

Co-authored-by: nava-platform-bot <[email protected]>
  • Loading branch information
chouinar and nava-platform-bot authored Mar 18, 2024
1 parent 5d50bc4 commit 15f9d59
Show file tree
Hide file tree
Showing 15 changed files with 1,280 additions and 59 deletions.
174 changes: 174 additions & 0 deletions api/openapi.generated.yml
Original file line number Diff line number Diff line change
Expand Up @@ -314,6 +314,47 @@ paths:
application/json:
schema:
$ref: '#/components/schemas/OpportunitySearchRequest'
examples:
example1:
summary: No filters
value:
pagination:
order_by: opportunity_id
page_offset: 1
page_size: 25
sort_direction: ascending
example2:
summary: All filters
value:
query: research
filters:
agency:
one_of:
- US-ABC
- HHS
applicant_type:
one_of:
- state_governments
- county_governments
- individuals
funding_category:
one_of:
- recovery_act
- arts
- natural_resources
funding_instrument:
one_of:
- cooperative_agreement
- grant
opportunity_status:
one_of:
- forecasted
- posted
pagination:
order_by: opportunity_id
page_offset: 1
page_size: 25
sort_direction: descending
security:
- ApiKeyAuth: []
/v0/opportunities/{opportunity_id}:
Expand Down Expand Up @@ -727,6 +768,130 @@ components:
type: string
format: date-time
readOnly: true
FundingInstrumentFilter:
type: object
properties:
one_of:
type: array
items:
enum:
- cooperative_agreement
- grant
- procurement_contract
- other
type:
- string
FundingCategoryFilter:
type: object
properties:
one_of:
type: array
items:
enum:
- recovery_act
- agriculture
- arts
- business_and_commerce
- community_development
- consumer_protection
- disaster_prevention_and_relief
- education
- employment_labor_and_training
- energy
- environment
- food_and_nutrition
- health
- housing
- humanities
- infrastructure_investment_and_jobs_act
- information_and_statistics
- income_security_and_social_services
- law_justice_and_legal_services
- natural_resources
- opportunity_zone_benefits
- regional_development
- science_technology_and_other_research_and_development
- transportation
- affordable_care_act
- other
type:
- string
ApplicantTypeFilter:
type: object
properties:
one_of:
type: array
items:
enum:
- state_governments
- county_governments
- city_or_township_governments
- special_district_governments
- independent_school_districts
- public_and_state_institutions_of_higher_education
- private_institutions_of_higher_education
- federally_recognized_native_american_tribal_governments
- other_native_american_tribal_organizations
- public_and_indian_housing_authorities
- nonprofits_non_higher_education_with_501c3
- nonprofits_non_higher_education_without_501c3
- individuals
- for_profit_organizations_other_than_small_businesses
- small_businesses
- other
- unrestricted
type:
- string
OpportunityStatusFilter:
type: object
properties:
one_of:
type: array
items:
enum:
- forecasted
- posted
- closed
- archived
type:
- string
AgencyFilter:
type: object
properties:
one_of:
type: array
items:
type: string
minLength: 2
example: US-ABC
OpportunitySearchFilter:
type: object
properties:
funding_instrument:
type:
- object
allOf:
- $ref: '#/components/schemas/FundingInstrumentFilter'
funding_category:
type:
- object
allOf:
- $ref: '#/components/schemas/FundingCategoryFilter'
applicant_type:
type:
- object
allOf:
- $ref: '#/components/schemas/ApplicantTypeFilter'
opportunity_status:
type:
- object
allOf:
- $ref: '#/components/schemas/OpportunityStatusFilter'
agency:
type:
- object
allOf:
- $ref: '#/components/schemas/AgencyFilter'
OpportunityPagination:
type: object
properties:
Expand Down Expand Up @@ -761,6 +926,15 @@ components:
OpportunitySearchRequest:
type: object
properties:
query:
type: string
description: Query string which searches against several text fields
example: research
filters:
type:
- object
allOf:
- $ref: '#/components/schemas/OpportunitySearchFilter'
pagination:
type:
- object
Expand Down
41 changes: 40 additions & 1 deletion api/src/api/opportunities_v0_1/opportunity_routes.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
from src.logging.flask_logger import add_extra_data_to_current_request_logs
from src.services.opportunities_v0_1.get_opportunity import get_opportunity
from src.services.opportunities_v0_1.search_opportunities import search_opportunities
from src.util.dict_util import flatten_dict

logger = logging.getLogger(__name__)

Expand All @@ -23,17 +24,55 @@
See [Release Phases](https://github.com/github/roadmap?tab=readme-ov-file#release-phases) for further details.
"""

examples = {
"example1": {
"summary": "No filters",
"value": {
"pagination": {
"order_by": "opportunity_id",
"page_offset": 1,
"page_size": 25,
"sort_direction": "ascending",
},
},
},
"example2": {
"summary": "All filters",
"value": {
"query": "research",
"filters": {
"agency": {"one_of": ["US-ABC", "HHS"]},
"applicant_type": {
"one_of": ["state_governments", "county_governments", "individuals"]
},
"funding_category": {"one_of": ["recovery_act", "arts", "natural_resources"]},
"funding_instrument": {"one_of": ["cooperative_agreement", "grant"]},
"opportunity_status": {"one_of": ["forecasted", "posted"]},
},
"pagination": {
"order_by": "opportunity_id",
"page_offset": 1,
"page_size": 25,
"sort_direction": "descending",
},
},
},
}


@opportunity_blueprint.post("/opportunities/search")
@opportunity_blueprint.input(
opportunity_schemas.OpportunitySearchRequestSchema, arg_name="search_params"
opportunity_schemas.OpportunitySearchRequestSchema, arg_name="search_params", examples=examples
)
# many=True allows us to return a list of opportunity objects
@opportunity_blueprint.output(opportunity_schemas.OpportunitySchema(many=True))
@opportunity_blueprint.auth_required(api_key_auth)
@opportunity_blueprint.doc(description=SHARED_ALPHA_DESCRIPTION)
@flask_db.with_db_session()
def opportunity_search(db_session: db.Session, search_params: dict) -> response.ApiResponse:
add_extra_data_to_current_request_logs(flatten_dict(search_params, prefix="request.body"))
logger.info("POST /v0.1/opportunities/search")

with db_session.begin():
opportunities, pagination_info = search_opportunities(db_session, search_params)

Expand Down
39 changes: 37 additions & 2 deletions api/src/api/opportunities_v0_1/opportunity_schemas.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
from src.api.schemas.extension import Schema, fields
from src.api.schemas.search_schema import StrSearchSchemaBuilder
from src.constants.lookup_constants import (
ApplicantType,
FundingCategory,
Expand Down Expand Up @@ -242,9 +243,43 @@ class OpportunitySchema(Schema):
updated_at = fields.DateTime(dump_only=True)


class OpportunitySearchFilterSchema(Schema):
funding_instrument = fields.Nested(
StrSearchSchemaBuilder("FundingInstrumentFilterSchema")
.with_one_of(allowed_values=FundingInstrument)
.build()
)
funding_category = fields.Nested(
StrSearchSchemaBuilder("FundingCategoryFilterSchema")
.with_one_of(allowed_values=FundingCategory)
.build()
)
applicant_type = fields.Nested(
StrSearchSchemaBuilder("ApplicantTypeFilterSchema")
.with_one_of(allowed_values=ApplicantType)
.build()
)
opportunity_status = fields.Nested(
StrSearchSchemaBuilder("OpportunityStatusFilterSchema")
.with_one_of(allowed_values=OpportunityStatus)
.build()
)
agency = fields.Nested(
StrSearchSchemaBuilder("AgencyFilterSchema")
.with_one_of(example="US-ABC", minimum_length=2)
.build()
)


class OpportunitySearchRequestSchema(Schema):
# A follow-up ticket will add filters and sorting
# for now just including the pagination parameters.
query = fields.String(
metadata={
"description": "Query string which searches against several text fields",
"example": "research",
}
)

filters = fields.Nested(OpportunitySearchFilterSchema())

pagination = fields.Nested(
generate_pagination_schema(
Expand Down
75 changes: 75 additions & 0 deletions api/src/api/schemas/search_schema.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
from enum import StrEnum
from typing import Type

from src.api.schemas.extension import Schema, fields, validators


class StrSearchSchemaBuilder:
"""
Builder for setting up a filter in a search endpoint schema.
Our schemas are setup to look like:
{
"filters": {
"field": {
"one_of": ["x", "y", "z"]
}
}
}
This helps generate the filters for a given field. At the moment,
only a one_of filter is implemented.
Usage::
# In a search request schema, you would use it like so
class OpportunitySearchFilterSchema(Schema):
example_enum_field = fields.Nested(
StrSearchSchemaBuilder("ExampleEnumFieldSchema")
.with_one_of(allowed_values=ExampleEnum)
.build()
)
example_str_field = fields.Nested(
StrSearchSchemaBuilder("ExampleStrFieldSchema")
.with_one_of(example="example_value", minimum_length=5)
.build()
)
"""

def __init__(self, schema_class_name: str):
# The schema class name is used on the endpoint
self.schema_fields: dict[str, fields.MixinField] = {}
self.schema_class_name = schema_class_name

def with_one_of(
self,
*,
allowed_values: Type[StrEnum] | None = None,
example: str | None = None,
minimum_length: int | None = None
) -> "StrSearchSchemaBuilder":
metadata = {}
if example:
metadata["example"] = example

# We assume it's just a list of strings
if allowed_values is None:
params: dict = {"metadata": metadata}
if minimum_length is not None:
params["validate"] = [validators.Length(min=2)]

list_type: fields.MixinField = fields.String(**params)

# Otherwise it is an enum type which handles allowed values
else:
list_type = fields.Enum(allowed_values, metadata=metadata)

self.schema_fields["one_of"] = fields.List(list_type)

return self

def build(self) -> Schema:
return Schema.from_dict(self.schema_fields, name=self.schema_class_name) # type: ignore
Loading

0 comments on commit 15f9d59

Please sign in to comment.