Load GitHub data into Postgres in Dev #2665

Closed
6 tasks done
widal001 opened this issue Oct 30, 2024 · 28 comments · Fixed by #2744, #2759, #2778, #2786 or #2796

@widal001
Collaborator

widal001 commented Oct 30, 2024

Summary

Load the GitHub issue data into Postgres in dev so that we can start analyzing this data in Metabase.

Note: Ideally we'd load this using the new data schema that @DavidDudas-Intuitial has been working on, but as a fallback we can use the GitHubIssue.to_sql() method to dump the flattened data into Postgres. We should consider switching to that fallback strategy if we can't get the new data schema working by 11/5.
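
For reference, a minimal sketch of what the "dump the flattened data" fallback amounts to. This is not the actual `GitHubIssue.to_sql()` implementation: the `load_issues` helper, the `github_issues` table, its columns, and the sample rows are all hypothetical, and an in-memory SQLite database stands in for Postgres so the snippet runs on its own.

```python
import sqlite3

# Made-up flattened rows; the real export has its own fields.
SAMPLE_ROWS = [
    {"issue_url": "https://example.com/issues/1", "title": "First", "status": "Done", "points": 1},
    {"issue_url": "https://example.com/issues/2", "title": "Second", "status": "Todo", "points": 2},
]


def load_issues(conn, rows, table="github_issues"):
    """Replace the table with the given flattened rows and return the row count."""
    # Drop and recreate so each daily load reflects the latest export.
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(
        f"CREATE TABLE {table} ("
        "issue_url TEXT PRIMARY KEY, title TEXT, status TEXT, points INTEGER)"
    )
    conn.executemany(
        f"INSERT INTO {table} (issue_url, title, status, points) "
        "VALUES (:issue_url, :title, :status, :points)",
        rows,
    )
    conn.commit()
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")  # stand-in for the dev Postgres DB
loaded = load_issues(conn, SAMPLE_ROWS)
```

Replacing the whole table on each run keeps the fallback simple, at the cost of losing history between loads.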

TODO:

  • Run daily step function to execute make gh-transform-and-load
  • Add new step function to execute make init-db
  • Add IAM support to /analytics db client
  • Troubleshoot failed db connection attempts from ECS container
  • Troubleshoot create table privilege problems

Acceptance criteria

  • On a daily basis, we load data exported from GitHub into the Postgres DB in dev
@widal001
Collaborator Author

This would most likely use AWS step functions to run the CLI command for loading the data. @coilysiren can support on how to build/run the step function.

@widal001
Collaborator Author

widal001 commented Nov 5, 2024

Beep boop: Automatically setting the point and sprint values for this issue in project HHS/13 because they were unset when the issue was closed.

@DavidDudas-Intuitial
Collaborator

DavidDudas-Intuitial commented Nov 6, 2024

PR #2759

@widal001
Collaborator Author

widal001 commented Nov 7, 2024

Beep boop: Automatically closing this issue because it was marked as 'Done' in https://github.com/orgs/HHS/projects/13. This action was performed by a bot.

@widal001 widal001 closed this as completed Nov 7, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Simpler Grants Product & Delivery Nov 7, 2024
@sarahknoppA6
Collaborator

@widal001 can you tell us why this keeps getting moved to done?

DavidDudas-Intuitial added a commit that referenced this issue Nov 7, 2024
## Summary
Fixes #2665 

### Time to review: __1 min__

## Changes proposed
> What was added, updated, or removed in this PR.

Added `gh-transform-and-load` command to existing `make gh-data-export`
command. I'm not sure if this is sufficient or correct, but I'm taking a
guess based on what I see in
#2546 and
#2506.

## Context for reviewers
> Testing instructions, background context, more in-depth details of the
implementation, and anything else you'd like to call out or ask
reviewers. Explain how the changes were verified.

In the analytics work stream, we have a new CLI command `make
gh-transform-and-load` for transforming and loading (some) GitHub data.
Per issue #2665, that command should be run daily, after the existing
`gh-data-export` command, which exports data from GitHub.

I see that `scheduled_jobs.tf` seems to be the mechanism by which `make
gh-data-export` runs daily. In this PR I'm taking an educated guess and
attempting to add `gh-transform-and-load` to the existing job, and
requesting feedback from @coilysiren as to whether this is the correct
approach.

## Additional information
> Screenshots, GIF demos, code examples or output to help show the
changes working as expected.

Co-authored-by: kai [they] <[email protected]>
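
The ordering requirement in the commit above (export first, then transform and load, once a day) can be sketched as a runner that stops at the first failing command. This is only an illustration in Python; the actual job is defined in `scheduled_jobs.tf`, and the `run_in_order` helper is hypothetical.

```python
import subprocess
import sys


def run_in_order(commands):
    """Run each command in turn; stop at the first non-zero exit code.

    Returns (completed_commands, failed_command_or_None).
    """
    completed = []
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # Later steps depend on earlier ones, so do not continue.
            return completed, cmd
        completed.append(cmd)
    return completed, None


# Stand-in commands so the sketch runs anywhere; in the analytics project
# these would be ["make", "gh-data-export"] and ["make", "gh-transform-and-load"].
ok = [sys.executable, "-c", "pass"]
bad = [sys.executable, "-c", "raise SystemExit(3)"]
done, failed = run_in_order([ok, bad, ok])
```

Stopping on the first failure means the load step never runs against a missing or partial export.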
@github-project-automation github-project-automation bot moved this from In Progress to Done in Simpler Grants Product & Delivery Nov 7, 2024
@DavidDudas-Intuitial
Collaborator

PR #2778

@DavidDudas-Intuitial
Collaborator

PR #2826

DavidDudas-Intuitial added a commit that referenced this issue Nov 13, 2024
## Summary
Partially Fixes #2665 

### Time to review: __1 min__

## Changes proposed
> What was added, updated, or removed in this PR.

Adds db name to Postgres connection url; removes logging

## Context for reviewers
> Testing instructions, background context, more in-depth details of the
implementation, and anything else you'd like to call out or ask
reviewers. Explain how the changes were verified.

## Additional information
> Screenshots, GIF demos, code examples or output to help show the
changes working as expected.
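
A minimal sketch of the "db name in the connection URL" change described in the commit above. The `build_postgres_url` helper and the example values are hypothetical, not the code from the PR. The point is that the database name goes in the URL path; when it is omitted, libpq-based clients default to a database named after the user, which may not exist.

```python
from urllib.parse import quote


def build_postgres_url(user, password, host, port, dbname):
    """Return a postgresql:// URL that names the target database explicitly."""
    # Percent-encode the parts that may contain reserved characters.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}"
    )


# Example values are made up for illustration.
url = build_postgres_url("app", "p@ss/word", "db.example.internal", 5432, "analytics")
```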
@widal001
Collaborator Author

Beep boop: Automatically closing this issue because it was marked as 'Done' in https://github.com/orgs/HHS/projects/13. This action was performed by a bot.

@DavidDudas-Intuitial
Collaborator

PR #2828

@widal001 widal001 moved this from Done to In Progress in Simpler.Grants.gov Product Backlog Nov 13, 2024
DavidDudas-Intuitial added a commit that referenced this issue Nov 13, 2024
## Summary
Maybe Fixes #2665 

### Time to review: __5 mins__

## Changes proposed
> What was added, updated, or removed in this PR.

Added connection pools to `/analytics`, replacing single instance db
connections. Implementation follows pattern in
`/api/src/adapters/db/clients/`.

## Context for reviewers
> Testing instructions, background context, more in-depth details of the
implementation, and anything else you'd like to call out or ask
reviewers. Explain how the changes were verified.

This is the latest step in a series of attempts to resolve failed db
connections from /analytics step functions. See ticket history and
comments for details.

## Additional information
> Screenshots, GIF demos, code examples or output to help show the
changes working as expected.
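
To illustrate the idea of a connection pool replacing a single long-lived connection, here is a minimal sketch using only the standard library. It is not the implementation from `/api/src/adapters/db/clients/` or from this PR: the `ConnectionPool` class is hypothetical, and SQLite stands in for Postgres so the snippet runs on its own.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool that hands out connections and takes them back."""

    def __init__(self, connect, size=2):
        # Open all connections up front and park them in a thread-safe queue.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always return the connection, even if the caller raised.
            self._idle.put(conn)


pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=1)
with pool.connection() as conn:
    first = conn
    row = conn.execute("SELECT 1").fetchone()
with pool.connection() as conn:
    reused = conn is first
```

Callers borrow a connection per unit of work instead of sharing one handle, so a connection that is busy in one task does not block or corrupt another.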
@github-project-automation github-project-automation bot moved this from In Progress to Done in Simpler Grants Product & Delivery Nov 13, 2024
@DavidDudas-Intuitial
Collaborator

PR #2836

@DavidDudas-Intuitial
Collaborator

Related: #2840

@sarahknoppA6 sarahknoppA6 changed the title Load GitHub data into Postgres Load GitHub data into Postgres in Dev Nov 14, 2024