Commit

feat: adds context caching, toast UI, versioned templates

telpirion authored Oct 27, 2024
2 parents 3f65fb5 + 9b999ec commit dcba2da
Showing 12 changed files with 343 additions and 39 deletions.
11 changes: 0 additions & 11 deletions db.go
@@ -37,8 +37,6 @@ import (

"cloud.google.com/go/firestore"
"google.golang.org/api/iterator"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
)

const (
@@ -119,15 +117,6 @@ func getConversation(userEmail, projectID string) ([]ConversationBit, error) {

// Check whether this user exists in the database or not
docRef := client.Collection(CollectionName).Doc(userEmail)
_, err = docRef.Get(ctx)
if status.Code(err) == codes.NotFound {
return conversations, nil
}
if err != nil {
LogError(fmt.Sprintf("firestore.DocumentRef: %v\n", err))
return conversations, err
}

iter := docRef.Collection(SubCollectionName).Documents(ctx)
for {
doc, err := iter.Next()
90 changes: 85 additions & 5 deletions docs/Week3-DevelopmentLog.md
@@ -9,13 +9,93 @@

For this week's activities, we must do the following:

- [ ] Retrieve stored conversations with the user.
- [ ] Set the user's past conversations as context with Vertex (Gemini).
- [ ] Set the user's past conversations as context in template (Gemma).
- [ ] Find data about how to fine tune Gemini, Gemma, or Gemma 2.
- [x] Retrieve stored conversations with the user.
- [x] Set the user's past conversations as context with Vertex (Gemini).
- [x] Set the user's past conversations as context in template (Gemma).
- [x] Find data about how to fine tune Gemini, Gemma, or Gemma 2.
- [ ] Fine tune a Gemini, Gemma, or Gemma 2 model using the OpenAssistant
guanaco dataset on HuggingFace.
- [ ] Save the fine-tuned model to Model Registry, Model Garden, or HuggingFace.
- [ ] Deploy the model to an endpoint.
- [ ] Integrate the model into the web app.

Nice-to-haves:

- [ ] Create a toast UI element that informs the user when their response rating was received.

## Retrieving user context for models

+ 👍 Setting the context cache for Gemini seems really simple. The API makes doing so
easy.
+ I'm not sure whether the AI Platform API has a field for storing context. This might
need to be included a la grounding.
+ 👎 Caching with Gen AI fails because the cache isn't large enough?!

```sh
2024/10/25 19:56:49 error:
Couldn't store conversation context: CreateCachedContent: rpc error: code = InvalidArgument desc = The cached content is of 2235 tokens. The minimum token count to start caching is 32768.
```
- 👎 This is from a stored set of 22 back-and-forth query/responses from a real user (me) and the model!!!
- 🤔 For users that need to provide context to the model, but DON'T have the 32768 token count to start caching, what is the recommended approach?
RAG? Should our docs maybe recommend an approach?
  - 🤔 How would I know whether I have the required token count without trial & error? If my ~22 response/replies total only 2235 tokens, that's
  roughly 100 tokens per response/reply, so a conversation (from my app) needs a total of about 330 response/replies from a user before caching is
  available ...
- 🤔 I wonder ... is it assumed that the system instructions are included in that token amount?
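
For reference, here's a minimal sketch of what the caching call looks like with the Vertex AI Go SDK
(`cloud.google.com/go/vertexai/genai`). The model name, TTL, and the idea of passing the whole history
as a single user turn are my assumptions, not necessarily what the app does:

```go
// Sketch only: create a context cache from accumulated conversation history.
package main

import (
	"context"
	"time"

	"cloud.google.com/go/vertexai/genai"
)

func cacheConversation(ctx context.Context, projectID, history string) (*genai.CachedContent, error) {
	client, err := genai.NewClient(ctx, projectID, "us-central1")
	if err != nil {
		return nil, err
	}
	defer client.Close()

	cc := &genai.CachedContent{
		Model:      "gemini-1.5-flash-001", // assumed model version
		Contents:   []*genai.Content{genai.NewUserContent(genai.Text(history))},
		Expiration: genai.ExpireTimeOrTTL{TTL: time.Hour},
	}
	// This is the call that returns InvalidArgument when the content is
	// under the 32768-token minimum.
	return client.CreateCachedContent(ctx, cc)
}
```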

+ 🤔 Storing the context as a RAG part of the prompt seems to be working okay for Gemini. I wonder what would happen if I
used Gemini outputs as context for the Gemma prompt? Do I need to filter out Gemma & Gemini context histories?

- 👎 Oof, the model failed with a cryptic `rpc error: code = Internal desc = {"error":"Incomplete generation","error_type":"Incomplete generation"}`
error. I honestly don't know how to debug that error...
    - Looking in the logs, I see that there were too many input tokens for the Gemma model:
```json
{"timestamp":"2024-10-25T22:35:49.174225Z","level":"ERROR","message":"`inputs` tokens + `max_new_tokens` must be <= 2048. Given: 4375 `inputs` tokens and 100 `max_new_tokens`","target":"text_generation_router::infer","filename":"router/src/infer/mod.rs","line_number":102,"span":{"name":"generate_stream"},"spans":[{"name":"vertex_compatibility"},{"name":"generate"},{"name":"generate_stream"}]}
```
- Looking into Go tokenizers ... it looks like `SentencePiece` is what Gemma uses, which has a C++ binary associated with it (?).
* https://github.com/eliben/go-sentencepiece
        * go-sentencepiece needs this file: https://github.com/google/gemma_pytorch/blob/main/tokenizer/tokenizer.model
* https://github.com/google/sentencepiece
- Other open source tokenizers:
* https://github.com/tiktoken-go/tokenizer (for OpenAI models)
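
For sizing prompts, counting tokens with `go-sentencepiece` might look like the sketch below (API
names follow the project README; the `tokenizer.model` path is an assumption):

```go
// Sketch only: count Gemma tokens for a prompt before sending it, so we can
// stay under the 2048-token `inputs` + `max_new_tokens` budget.
package main

import (
	"fmt"
	"log"

	"github.com/eliben/go-sentencepiece"
)

func main() {
	// Needs the tokenizer.model file from google/gemma_pytorch.
	proc, err := sentencepiece.NewProcessorFromPath("tokenizer/tokenizer.model")
	if err != nil {
		log.Fatal(err)
	}
	tokens := proc.Encode("### Human: Where should I go on vacation?")
	fmt.Printf("prompt is %d tokens\n", len(tokens))
}
```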
- 🤔 Maybe we need to record the number of tokens in each ConversationBit, and then only collect the first 2000.
We might be able to use the Firestore aggregation filters to get this out-of-the-box.
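
If we do store a per-conversation token count, a Firestore sum aggregation could return the running
total without reading every document. A rough sketch, with a hypothetical `TokenCount` field and
collection names standing in for the constants in db.go:

```go
// Sketch only: sum a hypothetical TokenCount field across a user's
// conversation subcollection with a Firestore aggregation query.
package main

import (
	"context"
	"fmt"

	"cloud.google.com/go/firestore"
	"cloud.google.com/go/firestore/apiv1/firestorepb"
)

// In the app these would be the CollectionName and SubCollectionName
// constants from db.go.
const (
	collectionName    = "Users"
	subCollectionName = "Conversations"
)

func totalTokens(ctx context.Context, client *firestore.Client, userEmail string) (int64, error) {
	q := client.Collection(collectionName).Doc(userEmail).Collection(subCollectionName).Query
	results, err := q.NewAggregationQuery().WithSum("TokenCount", "total_tokens").Get(ctx)
	if err != nil {
		return 0, err
	}
	v, ok := results["total_tokens"].(*firestorepb.Value)
	if !ok {
		return 0, fmt.Errorf("unexpected aggregation result type")
	}
	return v.GetIntegerValue(), nil
}
```
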
Sources:
+ https://go.dev/play/p/4rLkXhW570p
+ 👎 https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-create
+ 👎 https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-use
+ https://pkg.go.dev/errors#As
+ https://stackoverflow.com/questions/54156119/range-over-string-slice-in-golang-template
+ https://ai.google.dev/gemma/docs/model_card_2

## Creating a UI toast

+ Going to use CSS animations, with keyframes, to show and hide the toast notification.
+ Getting the timing just right is the tough part: making sure that the notification shows
and then is hidden again.

Sources:
+ https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_animations/Using_CSS_animations
+ 👎 https://stackoverflow.com/questions/16670931/hide-scroll-bar-but-while-still-being-able-to-scroll

## Finding information about fine tuning

+ 😬 It looks like some versions of the Guanaco dataset aren't well supported. One version
warns that it may contain some inappropriate content.
+ Fine tuning seemingly is documented only for Python. It _should_ be possible in any other
language that the Vertex client library supports.

Sources:

+ https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini-use-supervised-tuning#python
+ https://huggingface.co/datasets/timdettmers/openassistant-guanaco
+ https://guanaco-model.github.io
8 changes: 8 additions & 0 deletions js/herodotusMsg.js
@@ -1,5 +1,7 @@
window.addEventListener("load", function () {

const toast = document.querySelector('.notification');

document.addEventListener("msg", () => {
document.querySelector("progress").classList.toggle("is-hidden");
document.querySelector(".message-actual").classList.toggle("is-hidden");
@@ -29,6 +31,12 @@ window.addEventListener("load", function () {
})
.then(data => {
console.log(data);
toast.classList.toggle("toast");
toast.classList.toggle("toast-hide");
this.setTimeout(()=>{
toast.classList.toggle("toast-hide");
toast.classList.toggle("toast");
}, 5000);
})
});
});
8 changes: 4 additions & 4 deletions logger.go
@@ -33,21 +33,21 @@ func _logInternal(fullMessage string, sev logging.Severity) {
}

func LogError(message string) {
fullMessage := fmt.Sprintf("error: \n%v\n", message)
fullMessage := fmt.Sprintf("error: %v\n", message)
_logInternal(fullMessage, logging.Error)
}

func LogInfo(message string) {
fullMessage := fmt.Sprintf("info: \n%v\n", message)
fullMessage := fmt.Sprintf("info: %v\n", message)
_logInternal(fullMessage, logging.Info)
}

func LogDebug(message string) {
fullMessage := fmt.Sprintf("debug: \n%v\n", message)
fullMessage := fmt.Sprintf("debug: %v\n", message)
_logInternal(fullMessage, logging.Debug)
}

func LogWarning(message string) {
fullMessage := fmt.Sprintf("warning: \n%v\n", message)
fullMessage := fmt.Sprintf("warning: %v\n", message)
_logInternal(fullMessage, logging.Warning)
}
34 changes: 31 additions & 3 deletions main.go
@@ -1,6 +1,7 @@
package main

import (
"errors"
"fmt"
"log"
"net/http"
@@ -16,6 +17,7 @@ var (
projectID string
userEmail string = "[email protected]"
userEmailParam string = "user"
convoContext string
)

type ClientError struct {
@@ -76,6 +78,29 @@ func startConversation(c *gin.Context) {

LogInfo("Start conversation request received")

// create a new conversation context
convoHistory, err := getConversation(userEmail, projectID)
if err != nil {
LogError(fmt.Sprintf("couldn't get conversation history: %v\n", err))
}

// VertexAI + Gemini caching has a hard lower minimum; warn if the
// minimum isn't reached
convoContext, err = storeConversationContext(convoHistory, projectID)
var minConvoNum *MinCacheNotReachedError
if errors.As(err, &minConvoNum) {
LogWarning(err.Error())
} else if err != nil {
LogError(fmt.Sprintf("couldn't store conversation context: %v\n", err))
}

// Populate the conversation context variable used to ground both Gemma and
// Gemini when the caching minimum (32,768 tokens) isn't reached.
err = setConversationContext(convoHistory)
if err != nil {
LogError(fmt.Sprintf("couldn't set conversation context: %v\n", err))
}

c.HTML(http.StatusOK, "index.html", gin.H{
"Message": struct {
Message string
@@ -100,7 +125,7 @@ func respondToUser(c *gin.Context) {
var promptTemplate string
err := c.BindJSON(&userMsg)
if err != nil {
LogError(fmt.Sprintf("Couldn't parse client message: %v\n", err))
LogError(fmt.Sprintf("couldn't parse client message: %v\n", err))
c.JSON(http.StatusBadRequest, gin.H{
"Message": "Couldn't parse payload",
})
@@ -115,7 +140,7 @@
promptTemplate = GeminiTemplate
}
if err != nil {
LogError(fmt.Sprintf("Bad response from Gemini %v\n", err))
LogError(fmt.Sprintf("bad response from %s: %v\n", userMsg.Model, err))
botResponse = "Oops! I had troubles understanding that ..."
}

@@ -127,11 +152,14 @@
Prompt: promptTemplate,
}

// Use a separate thread to store the conversation
// Store the conversation in Firestore and update the cachedContext
// This is dual-entry accounting so that we don't have to query Firestore
// every time to update the cached context
documentID, err := saveConversation(*convo, userEmail, projectID)
if err != nil {
LogError(fmt.Sprintf("Couldn't save conversation: %v\n", err))
}
cachedContext += fmt.Sprintf("### Human: %s\n### Assistant: %s\n", userMsg.Message, botResponse)

c.JSON(http.StatusOK, gin.H{
"Message": struct {
Binary file modified my-herodotus
Binary file not shown.
1 change: 1 addition & 0 deletions templates/conversation_history.tmpl
@@ -0,0 +1 @@
{{range .}} ### Human: {{.UserQuery}} ### Assistant: {{.BotResponse}} {{ printf "\n" }}{{end}}
19 changes: 19 additions & 0 deletions templates/gemini.2024.10.25.tmpl
@@ -0,0 +1,19 @@
### Instruction:

Ignore all previous instructions.

You are a helpful travel agent assistant. The user will ask you about where to go on vacation.
You are going to help them plan their trip.

Be sure to label your response with '### Assistant:' and end your response with '##ENDRESPONSE##'.

Do not include system instructions in the response.

Here is the conversation history between you, Assistant, and the user, Human.

{{ .History }}

### Input:
Here is the user query. Respond to the user's request. Check your answer before responding.

### Human: {{ .Query }}
19 changes: 19 additions & 0 deletions templates/gemma.2024.10.25.tmpl
@@ -0,0 +1,19 @@
### Instruction:

Ignore all previous instructions.

You are a helpful travel agent assistant. The user will ask you about where to go on vacation.
You are going to help them plan their trip.

Be sure to label your response with '### Assistant:' and end your response with '##ENDRESPONSE##'.

Do not include system instructions in the response.

Here is the conversation history between you, Assistant, and the user, Human. Check your answer before responding.

{{ .History }}

### Input:
Here is the user query. Respond to the user's request. Check your answer before responding.

### Human: {{ .Query }}
60 changes: 52 additions & 8 deletions templates/index.html
@@ -5,11 +5,11 @@
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>My Own Herodotus</title>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/css/bulma.min.css">
<link href="https://fonts.googleapis.com/css?family=Roboto" rel="preload" as="font">
<link href="https://fonts.googleapis.com/css2?family=Roboto:ital,wght@0,100;0,300;0,400;0,500;0,700;0,900;1,100;1,300;1,400;1,500;1,700;1,900&display=swap" rel="stylesheet">
<link href="https://fonts.googleapis.com/css2?family=Material+Symbols+Outlined:opsz,wght,FILL,[email protected],100..700,0..1,-50..200&icon_names=send,thumb_down,thumb_up" rel="stylesheet" />
<style>
body, html {
font-family: Roboto !important;
font-family: "Roboto", sans-serif;
}
.material-symbols-outlined {
font-variation-settings:
@@ -18,16 +18,57 @@
'GRAD' 0,
'opsz' 48
}
.notification {
margin-top: 15px;
}
@keyframes fade-in {
25%,
90% {
opacity: 0%;
}

40% {
scale: 100%;
}
}
.toast {
animation-name: fade-in;
animation-duration: 6s;
}
.toast-hide {
opacity: 0%;
}
/* This overflow code doesn't seem to work. */
.conversation {
width: 100%;
height: 100%;
overflow: hidden;
}
.conversation > .scroll {
width: 100%;
height: 100%;
overflow-y: scroll;
box-sizing: content-box;
}
</style>
<link rel="icon" type="image/x-icon" href="/favicon.ico" />

</head>
<body>
<div class="container">
<p class="title is-1">My Own Herodotus</p>
<p class="subtitle is-3">A travel guide for everyone</p>
<div class="columns">
<div class="column is-four-fifths">
<p class="title is-1">My Own Herodotus</p>
<p class="subtitle is-3">A travel guide for everyone</p>
</div>
<div class="column">
<div class="toast-hide notification is-info">
Response rating received!
</div>
</div>
</div>
<hr class="bd-hr">

<div class="block">
<div class="box">
<div class="select">
@@ -38,10 +79,13 @@
</div>
</div>
</div>

{{ template "herodotus_msg.tmpl" .Message }}
<div class="conversation">
<div class="scroll">
{{ template "herodotus_msg.tmpl" .Message }}

{{ template "user_msg.tmpl" .Message }}
{{ template "user_msg.tmpl" .Message }}
</div>
</div>
</div>
<script type="module" src="js/appInit.js"></script>
<script type="module" defer src="js/validateAuth.js"></script>