Set up GitHub API response caching proxy-server #99

Open
hlibbabii opened this issue Jan 17, 2021 · 2 comments
Comments

@hlibbabii
Member

We often need to query the GitHub API to get data. However, there is a limit on how many requests per hour one token can make, and querying on the fly is also slow. The approach we took with our "bugginess" training set was to preload all the commits and save them to CSV files. I have now started looking into tools that detect refactorings in a given commit. Unfortunately, one of them, RefactoringMiner, cannot take a git diff as input out of the box; it asks for a GitHub URL or a path to a locally cloned repo. Since pre-loaded commits in a fixed format might not suit all use cases, an alternative is to go back to querying the API on the fly. We can make that cheaper by setting up a proxy that caches GitHub API responses, so that repeated queries do not use up the quota. Another benefit is speed: if we set up the proxy on the same machine we run the pipeline on (ironspeed), a cache hit should cost about the same as reading pre-loaded data from disk.
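
To make the idea concrete, here is a minimal sketch of the caching layer such a proxy could be built around. It is not an existing implementation: `ResponseCache`, `fetch_from_github`, and the on-disk layout (one JSON file per request path, named by its SHA-256 hash) are assumptions made for illustration, and it only covers unauthenticated or token-authenticated GET requests that return JSON. It never expires entries, which seems reasonable for commit data addressed by SHA but would need revisiting for endpoints whose content changes.

```python
import hashlib
import json
import urllib.request
from pathlib import Path

GITHUB_API = "https://api.github.com"


def fetch_from_github(path, token=None):
    """Fetch one API path from GitHub and return the decoded JSON body."""
    request = urllib.request.Request(GITHUB_API + path)
    request.add_header("Accept", "application/vnd.github+json")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


class ResponseCache:
    """Disk-backed cache: each API path is fetched at most once (hypothetical helper)."""

    def __init__(self, cache_dir, fetch=fetch_from_github):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch  # injectable so the cache can be tested offline
        self.hits = 0
        self.misses = 0

    def _file_for(self, path):
        # Hash the path so that query strings and slashes are safe as file names.
        digest = hashlib.sha256(path.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.json"

    def get(self, path):
        cached = self._file_for(path)
        if cached.exists():
            # Cache hit: served from disk, no request quota is spent.
            self.hits += 1
            return json.loads(cached.read_text(encoding="utf-8"))
        # Cache miss: spend one request, then persist the response.
        self.misses += 1
        body = self.fetch(path)
        cached.write_text(json.dumps(body), encoding="utf-8")
        return body
```

With this in place, something like `ResponseCache("/tmp/gh-cache").get("/repos/<owner>/<repo>/commits/<sha>")` would hit GitHub the first time and read from disk afterwards. An actual proxy would wrap `get` in an HTTP handler so that existing tools can point at it unchanged.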

@hlibbabii hlibbabii added the idea label Jan 17, 2021
@hlibbabii
Member Author

Use a virtual file system instead? See this related work: https://hal.archives-ouvertes.fr/hal-03139393/document

@hlibbabii
Member Author

Started working on CommitExplorer, which downloads GitHub repos and stores them on disk.

@hlibbabii hlibbabii added the tech label May 13, 2022