# ChatGPT + Enterprise data with Azure OpenAI and Cognitive Search
2
-
3
-
[](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=599293758&machine=standardLinux32gb&devcontainer_path=.devcontainer%2Fdevcontainer.json&location=WestUs2)
4
-
[](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/azure-samples/azure-search-openai-demo)
5
-
6
-
This sample demonstrates a few approaches for creating ChatGPT-like experiences over your own data using the Retrieval Augmented Generation pattern. It uses Azure OpenAI Service to access the ChatGPT model (gpt-35-turbo), and Azure Cognitive Search for data indexing and retrieval.
1
+
# ChatGPT + Enterprise data with Azure OpenAI and Cognitive Search - Java Version
2
+
This repo is the Java conversion of the well-known [ChatGPT + Enterprise data code sample](https://github.com/Azure-Samples/azure-search-openai-demo), originally written in Python.
3
+
It demonstrates a few approaches for creating ChatGPT-like experiences over your own data using the Retrieval Augmented Generation pattern. It uses Azure OpenAI Service to access the ChatGPT model (gpt-35-turbo), and Azure Cognitive Search for data indexing and retrieval.
7
4
8
5
The repo includes sample data so it's ready to try end to end. In this sample application we use a fictitious company called Contoso Electronics, and the experience allows its employees to ask questions about the benefits, internal policies, as well as job descriptions and roles.
9
6
@@ -18,6 +15,18 @@ The repo includes sample data so it's ready to try end to end. In this sample ap
18
15
19
16

20
17
18
+
## Python Conversion Status
19
+
While this first version is an MVP showcasing the semantic search scenario with Java and Azure OpenAI, it is still under active development. Below you can find the status of the conversion and the planned features.
20
+
21
+
Python Approach | Java Open AI SDK | Java Semantic Kernel |
22
+
:------------ | :-------------| :-------------|
23
+
RetrieveThenRead | :white_check_mark: | :x:
24
+
ChatReadRetrieveRead| :white_check_mark: | :x:
25
+
ReadRetrieveRead | :x: | :soon:
26
+
ReadDecomposeAsk | :x: | :soon:
27
+
28
+
29
+
21
30
## Getting Started
22
31
23
32
> **IMPORTANT:** In order to deploy and run this example, you'll need an **Azure subscription with access enabled for the Azure OpenAI service**. You can request access [here](https://aka.ms/oaiapply). You can also visit [here](https://azure.microsoft.com/free/cognitive-search/) to get some free Azure credits to get you started.
@@ -26,8 +35,8 @@ The repo includes sample data so it's ready to try end to end. In this sample ap
  * **Important**: Python and the pip package manager must be in the path in Windows for the setup scripts to work.
@@ -37,25 +46,12 @@ The repo includes sample data so it's ready to try end to end. In this sample ap
37
46
* [PowerShell 7+ (pwsh)](https://github.com/powershell/powershell) - For Windows users only.
38
47
  * **Important**: Ensure you can run `pwsh.exe` from a PowerShell command. If this fails, you likely need to upgrade PowerShell.
39
48
40
-
>NOTE: Your Azure Account must have `Microsoft.Authorization/roleAssignments/write` permissions, such as [User Access Administrator](https://learn.microsoft.com/azure/role-based-access-control/built-in-roles#user-access-administrator) or [Owner](https://learn.microsoft.com/azure/role-based-access-control/built-in-roles#owner).
41
-
42
-
#### To Run in GitHub Codespaces or VS Code Remote Containers
43
-
44
-
You can run this repo virtually by using GitHub Codespaces or VS Code Remote Containers. Click on one of the buttons below to open this repo in one of those options.
45
49
46
-
[](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=599293758&machine=standardLinux32gb&devcontainer_path=.devcontainer%2Fdevcontainer.json&location=WestUs2)
47
-
[](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/azure-samples/azure-search-openai-demo)
50
+
>NOTE: The initial Cognitive Search document indexing process (triggered as a post-provision task by azd) still uses the original Python scripts. That's why Python is still required to run this Java example.
48
51
49
-
### Installation
50
-
51
-
#### Project Initialization
52
-
53
-
1. Create a new folder and switch to it in the terminal
54
-
1. Run `azd auth login`
55
-
1. Run `azd init -t azure-search-openai-demo`
56
-
* note that this command will initialize a git repository and you do not need to clone this repository
52
+
>NOTE: Your Azure Account must have `Microsoft.Authorization/roleAssignments/write` permissions, such as [User Access Administrator](https://learn.microsoft.com/azure/role-based-access-control/built-in-roles#user-access-administrator) or [Owner](https://learn.microsoft.com/azure/role-based-access-control/built-in-roles#owner).
57
53
58
-
####Starting from scratch
54
+
### Starting from scratch
59
55
60
56
Execute the following command, if you don't have any pre-existing Azure services and want to start from a fresh deployment.
61
57
@@ -67,7 +63,7 @@ It will look like the following:
67
63
68
64

69
65
70
-
> NOTE: It may take a minute for the application to be fully deployed. If you see a "Python Developer" welcome screen, then wait a minute and refresh the page.
66
+
> NOTE: It may take a minute for the application to be fully deployed.
71
67
72
68
#### Use existing resources
73
69
@@ -79,29 +75,17 @@ It will look like the following:
79
75
80
76
> NOTE: You can also use existing Search and Storage Accounts. See `./infra/main.parameters.json` for list of environment variables to pass to `azd env set` to configure those existing resources.
81
77
82
-
#### Deploying or re-deploying a local clone of the repo
83
-
84
-
* Simply run `azd up`
85
-
86
-
#### Running locally
78
+
### Running locally
87
79
88
80
1. Run `azd auth login`
89
81
2. Change dir to `app`
90
82
3. Run `./start.ps1` or `./start.sh` or run the "VS Code Task: Start App" to start the project locally.
83
+
4. Wait for the Spring Boot server to start, then open localhost:8080 in your browser.
91
84
92
-
#### Sharing Environments
93
-
94
-
Run the following if you want to give someone else access to completely deployed and existing environment.
95
-
96
-
1. Install the [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli)
97
-
1. Run `azd init -t azure-search-openai-demo`
98
-
1. Run `azd env refresh -e {environment name}` - Note that they will need the azd environment name, subscription Id, and location to run this command - you can find those values in your `./azure/{env name}/.env` file. This will populate their azd environment's .env file with all the settings needed to run the app locally.
99
-
1. Run `pwsh ./scripts/roles.ps1` - This will assign all of the necessary roles to the user so they can run the app locally. If they do not have the necessary permission to create roles in the subscription, then you may need to run this script for them. Just be sure to set the `AZURE_PRINCIPAL_ID` environment variable in the azd .env file or in the active shell to their Azure Id, which they can get with `az account show`.
100
-
101
-
### Quickstart
85
+
### UI Navigation
102
86
103
87
* In Azure: navigate to the Azure WebApp deployed by azd. The URL is printed out when azd completes (as "Endpoint"), or you can find it in the Azure portal.
104
-
* Running locally: navigate to 127.0.0.1:5000
88
+
* Running locally: navigate to localhost:8080
105
89
106
90
Once in the web app:
107
91
@@ -121,12 +105,31 @@ Once in the web app:
121
105
122
106
### FAQ
123
107
124
-
***Question***: Why do we need to break up the PDFs into chunks when Azure Cognitive Search supports searching large documents?
108
+
<details>
109
+
<summary>Why do we need to break up the PDFs into chunks when Azure Cognitive Search supports searching large documents?</summary>
110
+
111
+
Chunking allows us to limit the amount of information we send to OpenAI due to token limits. By breaking up the content, it allows us to easily find potential chunks of text that we can inject into OpenAI. The method of chunking we use leverages a sliding window of text such that sentences that end one chunk will start the next. This allows us to reduce the chance of losing the context of the text.
112
+
</details>
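The sliding-window idea described in the chunking answer above can be sketched in Java roughly as follows. This is a minimal illustration, not the code the sample actually uses: indexing in this repo is still done by the original Python scripts, and the class name `SlidingWindowChunker`, the `chunk` method, the sentence-count limits, and the simple punctuation-based sentence splitter are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of this repo: illustrates sentence-level
// chunking with overlap between consecutive chunks.
final class SlidingWindowChunker {

    /**
     * Splits text into chunks of at most maxSentences sentences.
     * Consecutive chunks share `overlap` sentences, so the sentences that
     * end one chunk also start the next one.
     */
    static List<String> chunk(String text, int maxSentences, int overlap) {
        if (maxSentences <= 0 || overlap < 0 || overlap >= maxSentences) {
            throw new IllegalArgumentException("require 0 <= overlap < maxSentences");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return chunks;
        }
        // Naive sentence split (assumption): break on whitespace that
        // follows '.', '!' or '?'.
        String[] sentences = trimmed.split("(?<=[.!?])\\s+");
        int step = maxSentences - overlap; // how far the window advances
        for (int start = 0; start < sentences.length; start += step) {
            int end = Math.min(start + maxSentences, sentences.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(sentences, start, end)));
            if (end == sentences.length) {
                break; // the window already reached the last sentence
            }
        }
        return chunks;
    }
}
```

For example, calling `SlidingWindowChunker.chunk(text, 3, 1)` on a five-sentence text yields two chunks, and the third sentence appears in both. A real pipeline would more likely bound chunks by character or token count rather than sentence count, since the limit that matters is the model's token budget.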
125
113
126
-
***Answer***: Chunking allows us to limit the amount of information we send to OpenAI due to token limits. By breaking up the content, it allows us to easily find potential chunks of text that we can inject into OpenAI. The method of chunking we use leverages a sliding window of text such that sentences that end one chunk will start the next. This allows us to reduce the chance of losing the context of the text.
114
+
<details>
115
+
<summary>How can we upload additional PDFs without redeploying everything?</summary>
116
+
117
+
To upload more PDFs, put them in the data/ folder and run `./scripts/prepdocs.sh` or `./scripts/prepdocs.ps1`. To avoid reuploading existing docs, move them out of the data folder. You could also implement checks to see what's been uploaded before; our code doesn't yet have such checks.
118
+
</details>
127
119
128
120
### Troubleshooting
129
121
130
-
If you see this error while running `azd deploy`: `read /tmp/azd1992237260/backend_env/lib64: is a directory`, then delete the `./app/backend/backend_env folder` and re-run the `azd deploy` command. This issue is being tracked here: <https://github.com/Azure/azure-dev/issues/1237>
122
+
Here are the most common failure scenarios and solutions:
123
+
124
+
1. The subscription (`AZURE_SUBSCRIPTION_ID`) doesn't have access to the Azure OpenAI service. Please ensure `AZURE_SUBSCRIPTION_ID` matches the ID specified in the [OpenAI access request process](https://aka.ms/oai/access).
125
+
126
+
1. You're attempting to create resources in regions not enabled for Azure OpenAI (e.g. East US 2 instead of East US), or where the model you're trying to use isn't enabled. See [this matrix of model availability](https://aka.ms/oai/models).
127
+
128
+
1. You've exceeded a quota, most often number of resources per region. See [this article on quotas and limits](https://aka.ms/oai/quotas).
129
+
130
+
1. You're getting "same resource name not allowed" conflicts. That's likely because you've run the sample multiple times and deleted the resources you've been creating each time, but are forgetting to purge them. Azure keeps resources for 48 hours unless you purge from soft delete. See [this article on purging resources](https://learn.microsoft.com/azure/cognitive-services/manage-resources?tabs=azure-portal#purge-a-deleted-resource).
131
+
132
+
1. You see `CERTIFICATE_VERIFY_FAILED` when the `prepdocs.py` script runs. That's typically due to incorrect SSL certificates setup on your machine. Try the suggestions in this [StackOverflow answer](https://stackoverflow.com/questions/35569042/ssl-certificate-verify-failed-with-python3/43855394#43855394).
131
133
132
-
If the web app fails to deploy and you receive a '404 Not Found' message in your browser, run `azd deploy`.
134
+
1. After running `azd up` and visiting the website, you see a '404 Not Found' in the browser. Wait 10 minutes and try again, as it might be still starting up. Then try running `azd deploy` and wait again. If you still encounter errors with the deployed app, consult these [tips for debugging Flask app deployments](http://blog.pamelafox.org/2023/06/tips-for-debugging-flask-deployments-to.html)
135
+
and file an issue if the error logs don't help you resolve the issue.
app/backend/src/test/java/com/microsoft/openai/samples/rag/AskAPITests.java (+1 -1)
@@ -34,7 +34,7 @@
34
34
/**
35
35
* This class tests the Ask API showcasing how you can mock azure services using mockito.
36
36
* CognitiveSearch and OpenAI models are immutable from the client usage perspective, so in order to create when/then condition with mockito
37
-
* we used a reflection hack to make some model constructor public. @see CognitiveSearchUnitTestUtils and @see OpenAIUnitTestUtils for more info.
37
+
* we used a reflection hack to make some model private constructor public. @see CognitiveSearchUnitTestUtils and @see OpenAIUnitTestUtils for more info.