
Commit c4b6cb7

Added improvements to detect and handle non-text files effectively

1 parent: 7aa0fc6

File tree

6 files changed (+148, −75 lines)


Diff for: README.md (+16 −15)

@@ -1,34 +1,35 @@
 # CodeWhisperer
 
-CodeWhisperer is an innovative Node.js application that allows users to have conversations with their code repositories. By leveraging the power of OpenAI's AI models, it can summarize text files and entire code repositories to facilitate a chat-like interaction.
+CodeWhisperer is a web-based application designed to facilitate insightful interactions between users and their code repositories. By leveraging cutting-edge AI technologies, CodeWhisperer provides intelligent summaries and detailed explanations, engaging users in a conversational interface that demystifies complex codebases.
 
 ## Overview
 
-The application is built using Express for server management, MongoDB for data persistence, and Bootstrap for the front-end interface. It primarily interacts with OpenAI's API to create summaries of repositories and engage users in conversations about their code. The app processes GitHub repositories by cloning them, generating summaries through the gpt-3.5-turbo-16k and gpt-4-turbo-preview AI models, and then communicates the results via email using Sendgrid.
+The app integrates the Express framework within a Node.js environment, utilizing MongoDB for data persistence. For UI elegance and interactivity, Bootstrap is employed. The core functionality is built around OpenAI's powerful language models, enabling the app to generate concise text summaries and interactive Q&A sessions about a user's code repository.
 
 ## Features
 
-- Accepts GitHub repository URLs and user email addresses for processing
-- Clones repositories and summarizes text files and overall project contents
-- Communicates with OpenAI's API to generate summaries and answer user queries
-- Sends email notifications with links to interact with the repository summary
-- Provides a chat interface for dynamic interaction with the project summary
+- Submits GitHub repository URLs and email addresses for processing
+- Clones GitHub repositories and communicates with OpenAI API to generate summaries
+- Provides interactive explanations of code via a unique link, offering insights into repositories
+- Utilizes Sendgrid for email notifications upon the completion of repository analysis
+- Proffers a unique and engaging way to understand and engage with one's programming projects
 
 ## Getting started
 
 ### Requirements
 
-- Node.js environment
-- MongoDB instance
-- OpenAI API key
-- Sendgrid credentials for email notifications
+- Node.js
+- MongoDB
+- A Sendgrid account for email services
+- An OpenAI API key for natural language processing
 
 ### Quickstart
 
-1. Clone the repository to your machine.
-2. Install necessary Node.js packages with `npm install`.
-3. Configure the required environment variables within an `.env` file.
-4. Run the application using `npm start` or `node server.js`.
+1. Ensure that MongoDB is running on your system.
+2. Clone the repository to your local machine.
+3. Install node modules by running `npm install`.
+4. Set up the necessary environment variables in an `.env` file.
+5. Start the application with `npm start`, and navigate to `localhost:3001` on your web browser.
 
 ### License
Diff for: gitHandler.js (+34 −24)

@@ -7,40 +7,41 @@ const maxFileSize = 32 * 1024; // 32kb in bytes
 
 const cloneAndProcessRepo = async (repoUrl) => {
   const tempDir = tmp.dirSync({ unsafeCleanup: true });
-  console.log(`Temporary directory created at: ${tempDir.name}`); // gpt_pilot_debugging_log
+  console.log(`Temporary directory created at: ${tempDir.name}`);
   try {
     console.log(`Cloning the repository: ${repoUrl}`);
     const git = simpleGit();
     await git.clone(repoUrl, tempDir.name);
     console.log('Repository cloned.');
-    const files = await getAllFiles(tempDir.name);
-    const processedFiles = await filterAndCheckFiles(files);
-    console.log('Files have been checked.');
-    console.log(`Temporary directory will be kept at: ${tempDir.name} for file processing.`); // gpt_pilot_debugging_log
-    return { processedFiles: processedFiles, tempDirPath: tempDir.name };
+    const { allFiles, textFiles } = await getAllFiles(tempDir.name);
+    const processedFiles = await filterAndCheckFiles(textFiles);
+    console.log('Files have been checked and filtered.');
+    return { processedFiles: processedFiles, allFiles: allFiles, tempDirPath: tempDir.name };
   } catch (error) {
-    console.error('Error occurred in cloneAndProcessRepo:', error.message, error.stack); // gpt_pilot_debugging_log
-    tempDir.removeCallback(); // Ensure cleanup even in case of error
+    console.error('Error occurred in cloneAndProcessRepo:', error.message, error.stack);
+    tempDir.removeCallback();
     throw error;
   }
 };
 
-const getAllFiles = async (dirPath, arrayOfFiles = []) => {
+const getAllFiles = async (dirPath, allFiles = [], textFiles = []) => {
   try {
     const files = await fs.readdir(dirPath);
     for (const file of files) {
       const fullPath = path.join(dirPath, file);
-      console.log(`Full path of file: ${fullPath}`); // gpt_pilot_debugging_log
       const stat = await fs.stat(fullPath);
+      allFiles.push(fullPath);
       if (stat.isDirectory()) {
-        await getAllFiles(fullPath, arrayOfFiles);
+        await getAllFiles(fullPath, allFiles, textFiles);
       } else {
-        arrayOfFiles.push(fullPath);
+        if (await isText(fullPath)) {
+          textFiles.push(fullPath);
+        }
       }
     }
-    return arrayOfFiles;
+    return { allFiles, textFiles };
   } catch (error) {
-    console.error('Error occurred while getting all files:', error.message, error.stack); // gpt_pilot_debugging_log
+    console.error('Error occurred while getting all files:', error.message, error.stack);
     throw error;
   }
 };
@@ -50,25 +51,34 @@ const filterAndCheckFiles = async (files) => {
     const processedFiles = [];
     for (const file of files) {
       const stat = await fs.stat(file);
-      if (stat.size <= maxFileSize && isText(file)) {
-        processedFiles.push(file);
+      if (stat.size <= maxFileSize) {
+        const isTextFile = await isText(file);
+        console.log(`Checked if file is text: ${file}, Result: ${isTextFile}`);
+        if (isTextFile) {
+          processedFiles.push(file);
+        }
       }
     }
-    console.log(`Text files smaller than 32kb:`, processedFiles); // gpt_pilot_debugging_log
+    console.log(`Text files smaller than 32kb:`, processedFiles);
     return processedFiles;
   } catch (error) {
-    console.error('Error occurred while filtering and checking files:', error.message, error.stack); // gpt_pilot_debugging_log
+    console.error('Error occurred while filtering and checking files:', error.message, error.stack);
    throw error;
  }
 };
 
-const isText = (filename) => {
-  const textFileExtensions = /\.txt$/i;
-  const isText = textFileExtensions.test(path.extname(filename));
-  console.log(`Checking if $(unknown) is a text file: ${isText}`); // gpt_pilot_debugging_log
-  return isText;
+const isText = async (filename) => {
+  try {
+    await fs.readFile(filename, 'utf8');
+    console.log(`The file $(unknown) is a text file.`);
+    return true;
+  } catch (error) {
+    console.error('Error when checking if file is a text file:', error.message, error.stack);
+    return false;
+  }
 };
 
 module.exports = {
-  cloneAndProcessRepo
+  cloneAndProcessRepo,
+  isText
 };
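The rewritten `isText` treats any successful `fs.readFile(file, 'utf8')` as proof of a text file, but Node's `readFile` will decode arbitrary binary data as UTF-8 (substituting replacement characters) and rarely rejects, so many binaries can still be classified as text. A more reliable heuristic, sketched below as an alternative rather than what this commit ships, is to scan an initial chunk of the file for NUL bytes, which text encodings do not contain:

```javascript
// Hypothetical content-based check: binary formats almost always contain
// NUL bytes near the start of the file, while text encodings do not.
// `looksLikeText` is an illustrative name, not part of gitHandler.js.
const looksLikeText = (buffer) => {
  // Only inspect the first 8 KB; that is enough to catch binary headers.
  const sample = buffer.subarray(0, Math.min(buffer.length, 8 * 1024));
  for (const byte of sample) {
    if (byte === 0) return false; // NUL byte: treat as binary
  }
  return true;
};
```

An async wrapper around this could read just the first chunk with `fs.open`/`fs.read` instead of loading entire files into memory as the current implementation does.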

Diff for: package.json (+4 −1)

@@ -4,7 +4,7 @@
   "description": "",
   "main": "index.js",
   "scripts": {
-    "test": "echo \"Error: no test specified\" && exit 1"
+    "test": "jest"
   },
   "keywords": [],
   "author": "",
@@ -23,5 +23,8 @@
     "simple-git": "^3.22.0",
     "tmp": "^0.2.1",
     "uuid": "^9.0.1"
+  },
+  "devDependencies": {
+    "jest": "^29.7.0"
   }
 }

Diff for: processRepository.js (+23 −10)

@@ -11,21 +11,21 @@ const openai = new OpenAI(process.env.OPENAI_API_KEY);
 async function processRepository(githubUrl, email) {
   // Start processing asynchronously
   processRepoInBackground(githubUrl, email).catch(error => {
-    console.error('Asynchronous processing error:', error.message, error.stack); // gpt_pilot_debugging_log
+    console.error('Asynchronous processing error:', error.message, error.stack);
     // Save the error state to the database
     Repository.findOneAndUpdate({ githubUrl, email }, { isProcessed: true, processingError: error.message }, { new: true }).catch(err => {
-      console.error('Failed to update repository with error state:', err.message, err.stack); // gpt_pilot_debugging_log
+      console.error('Failed to update repository with error state:', err.message, err.stack);
     });
   });
 
   // Return immediately for the server to send the response
-  console.log(`Processing started for repository: ${githubUrl}`); // This log confirms the asynchronous start
+  console.log(`Processing started for repository: ${githubUrl}`);
 }
 
 async function processRepoInBackground(githubUrl, email) {
   let tempDirPath;
   try {
-    const { processedFiles, tempDirPath: dirPath } = await cloneAndProcessRepo(githubUrl);
+    const { processedFiles, allFiles, tempDirPath: dirPath } = await cloneAndProcessRepo(githubUrl);
     tempDirPath = dirPath;
     let fileSummariesObject = {};
 
@@ -35,21 +35,34 @@ async function processRepoInBackground(githubUrl, email) {
         const summary = await generateSummary(content);
         fileSummariesObject[file] = summary; // Store summary associated with file name
       } catch (fileReadError) {
-        console.error('Error reading file:', fileReadError.message, fileReadError.stack); // gpt_pilot_debugging_log
+        console.error('Error reading or summarizing file:', file, fileReadError.message, fileReadError.stack);
       }
     }
 
     const fileSummariesArray = Object.values(fileSummariesObject); // Convert summaries object to array
-    console.log('File summaries:', fileSummariesArray); // gpt_pilot_debugging_log
-
+    console.log('File summaries:', fileSummariesArray);
+
     const combinedSummaries = fileSummariesArray.join(' ');
+
+    // Convert allFiles array to string, where each file path is separated by a new line
+    const allFilesString = allFiles.join('\n');
+    console.log(`All files as string: ${allFilesString}`); // gpt_pilot_debugging_log
+
+    // Now, combine the individual file summaries and the allFilesString
+    const combinedSummariesWithPaths = `${combinedSummaries}\n\n${allFilesString}`;
+
+    // Update the projectSummaryResponse OpenAI call:
     const projectSummaryResponse = await openai.chat.completions.create({
       model: 'gpt-4-turbo-preview',
-      messages: [{ role: "system", content: "Summarize this project based on the individual file summaries." }, { role: "user", content: combinedSummaries }],
-      max_tokens: 1024,
+      messages: [{ role: "system", content: "Summarize this project based on the individual file summaries and the list of all file paths." }, { role: "user", content: combinedSummariesWithPaths }],
+      max_tokens: 2048,
       temperature: 0.5
+    }).catch(error => {
+      console.error('Error during OpenAI project summary call with all file paths:', error.message, error.stack); // gpt_pilot_debugging_log
+      throw error;
     });
-
+    console.log(`Project summary with all file paths has been generated.`); // gpt_pilot_debugging_log
+
     const projectSummary = projectSummaryResponse.choices[0].message.content.trim();
     const updatedRepository = await Repository.findOneAndUpdate(
       { githubUrl, email },
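The new OpenAI call chains `.catch()` onto an awaited promise; since the catch only logs and rethrows, it is equivalent to wrapping the `await` in try/catch. A minimal sketch of that equivalent shape, assuming the `openai` client configured at the top of processRepository.js (the helper name `summarizeProject` and the injected client parameter are illustrative, not part of the codebase):

```javascript
// Sketch of the project-summary call with try/catch instead of .catch();
// model name, message shape, and parameters are taken from the diff.
async function summarizeProject(openaiClient, combinedSummariesWithPaths) {
  try {
    const response = await openaiClient.chat.completions.create({
      model: 'gpt-4-turbo-preview',
      messages: [
        { role: 'system', content: 'Summarize this project based on the individual file summaries and the list of all file paths.' },
        { role: 'user', content: combinedSummariesWithPaths },
      ],
      max_tokens: 2048,
      temperature: 0.5,
    });
    // Same extraction as the diff: first choice, trimmed message content.
    return response.choices[0].message.content.trim();
  } catch (error) {
    console.error('Error during OpenAI project summary call:', error.message, error.stack);
    throw error;
  }
}
```

Injecting the client as a parameter also makes the call easy to exercise with a stub, in the spirit of the jest setup this commit introduces.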

Diff for: routes.js (+24 −25)

@@ -21,33 +21,32 @@ router.post('/submit', async (req, res) => {
   }
   try {
     const repoResponse = await axios.get(githubUrl.replace('https://github.com', 'https://api.github.com/repos'));
-    if (!repoResponse.data.private && repoResponse.data.size <= 500) {
-      const existingRepo = await Repository.findOne({ githubUrl });
-      if (existingRepo) {
-        if (existingRepo.isProcessed) {
-          return res.redirect(`/explain/${existingRepo.uuid}`);
-        } else {
-          return res.status(200).send('Repository is currently being processed.');
-        }
-      }
-      const newRepo = new Repository({ githubUrl, email });
-      await newRepo.save();
-      console.log(`New repository saved with URL: ${githubUrl}`);
-
-      try {
-        await processRepository(newRepo.githubUrl, newRepo.email);
-        console.log(`Processing started for repository: ${newRepo.githubUrl}`);
-      } catch (error) {
-        console.error(`Error during the processing of the repository: ${error.message}`, error.stack);
-        return res.status(500).send(`Error during the processing of the repository: ${error.message}`);
-      }
-
-      res.status(201).send('Repository processing started. You will receive an email when it is complete.');
+    if (repoResponse.data.private) {
+      console.error('Repository is private:', githubUrl);
+      return res.status(400).send('Repository is private or does not exist.');
+    } else if (repoResponse.data.size === 0) {
+      console.log(`Repository ${githubUrl} is empty and cannot be processed.`);
+      return res.status(400).send('The repository is empty and cannot be processed.');
     } else if (repoResponse.data.size > 500) {
-      res.status(400).send('Repository has more than 500 files and cannot be processed.');
-    } else {
-      res.status(400).send('Repository is private or does not exist.');
+      console.error('Repository has more than 500 files:', githubUrl);
+      return res.status(400).send('Repository has more than 500 files and cannot be processed.');
+    }
+    const existingRepo = await Repository.findOne({ githubUrl });
+    if (existingRepo) {
+      if (existingRepo.isProcessed) {
+        console.log(`Redirecting to existing processed repository: ${existingRepo.uuid}`);
+        return res.redirect(`/explain/${existingRepo.uuid}`);
+      } else {
+        console.log(`Repository is currently being processed: ${githubUrl}`);
+        return res.status(200).send('Repository is currently being processed.');
+      }
     }
+    const newRepo = new Repository({ githubUrl, email });
+    await newRepo.save();
+    console.log(`New repository saved with URL: ${githubUrl}`);
+    await processRepository(newRepo.githubUrl, newRepo.email);
+    console.log(`Processing started for repository: ${newRepo.githubUrl}`);
+    res.status(201).send('Repository processing started. You will receive an email when it is complete.');
   } catch (error) {
     console.error('Error handling POST /submit:', error.message, error.stack);
     res.status(400).send('Error checking repository. Make sure the URL is correct and the repository is public.');
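The route's validation logic reduces to two pure pieces: mapping a github.com URL to its REST API endpoint by string replacement, and gating on the `private` and `size` fields of the response. A sketch of those pieces extracted as standalone helpers (the names `toApiUrl` and `validateRepo` are illustrative; the URL mapping, thresholds, and messages come from the diff):

```javascript
// Maps https://github.com/owner/repo to the GitHub REST API repo endpoint,
// exactly as the route does via String.prototype.replace.
const toApiUrl = (githubUrl) =>
  githubUrl.replace('https://github.com', 'https://api.github.com/repos');

// `repoData` mirrors the fields the route reads from the API response.
// Returns an error message for rejected repositories, or null when
// processing may proceed.
const validateRepo = (repoData) => {
  if (repoData.private) return 'Repository is private or does not exist.';
  if (repoData.size === 0) return 'The repository is empty and cannot be processed.';
  if (repoData.size > 500) return 'Repository has more than 500 files and cannot be processed.';
  return null;
};
```

Worth noting: in the GitHub REST API, `size` is the repository's size in kilobytes, not a file count, so the "more than 500 files" message (and the emptiness check) actually gate on a 500 KB size threshold.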

Diff for: tests/gitHandler.test.js (+47, new file)

@@ -0,0 +1,47 @@
+const fs = require('fs-extra');
+const { isText } = require('../gitHandler'); // Adjust the import statement to destructure isText directly
+
+describe('isText function', () => {
+  afterEach(() => {
+    jest.restoreAllMocks();
+  });
+
+  it('should consider a file with text contents as a text file', async () => {
+    jest.spyOn(fs, 'readFile').mockResolvedValue('This is a text file');
+    const filename = 'test.txt';
+    // try-catch is unnecessary here because jest will catch rejections for us
+    const result = await isText(filename);
+    expect(result).toBe(true);
+  });
+
+  it('should not consider a binary file as a text file', async () => {
+    jest.spyOn(fs, 'readFile').mockRejectedValue(new Error('Binary file'));
+    const filename = 'binaryfile.bin';
+    // try-catch is unnecessary here because jest will catch rejections for us
+    const result = await isText(filename);
+    expect(result).toBe(false);
+  });
+
+  it('should properly read files regardless of extension', async () => {
+    jest.spyOn(fs, 'readFile')
+      .mockResolvedValueOnce('Normal text')
+      .mockRejectedValueOnce(new Error('Binary content'));
+    const textFilename = 'file.with.unknownext';
+    const binaryFilename = 'image.jpg';
+    // try-catch is unnecessary here because jest will catch rejections for us
+    const textResult = await isText(textFilename);
+    const binaryResult = await isText(binaryFilename);
+    expect(textResult).toBe(true);
+    expect(binaryResult).toBe(false);
+  });
+
+  it('should handle encoding-related read errors gracefully', async () => {
+    const error = new Error('EncodingError');
+    error.code = 'ENOENT';
+    jest.spyOn(fs, 'readFile').mockRejectedValue(error);
+    const filename = 'fileWithEncodingIssues.txt';
+    // try-catch is unnecessary here because jest will catch rejections for us
+    const result = await isText(filename);
+    expect(result).toBe(false);
+  });
+});
