Commit

feat: adds context caching, toast UI, versioned templates

telpirion authored Oct 27, 2024
2 parents 3f65fb5 + 9b999ec commit dcba2da
Showing 12 changed files with 343 additions and 39 deletions.
11 changes: 0 additions & 11 deletions db.go
@@ -37,8 +37,6 @@ import (

"cloud.google.com/go/firestore"
"google.golang.org/api/iterator"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
)

const (
@@ -119,15 +117,6 @@ func getConversation(userEmail, projectID string) ([]ConversationBit, error) {

// Check whether this user exists in the database or not
docRef := client.Collection(CollectionName).Doc(userEmail)
_, err = docRef.Get(ctx)
if status.Code(err) == codes.NotFound {
return conversations, nil
}
if err != nil {
LogError(fmt.Sprintf("firestore.DocumentRef: %v\n", err))
return conversations, err
}

iter := docRef.Collection(SubCollectionName).Documents(ctx)
for {
doc, err := iter.Next()
90 changes: 85 additions & 5 deletions docs/Week3-DevelopmentLog.md
@@ -9,13 +9,93 @@

For this week's activities, we must do the following:

- [ ] Retrieve stored conversations with the user.
- [ ] Set the user's past conversations as context with Vertex (Gemini).
- [ ] Set the user's past conversations as context in template (Gemma).
- [ ] Find data about how to fine tune Gemini, Gemma, or Gemma 2.
- [x] Retrieve stored conversations with the user.
- [x] Set the user's past conversations as context with Vertex (Gemini).
- [x] Set the user's past conversations as context in template (Gemma).
- [x] Find data about how to fine tune Gemini, Gemma, or Gemma 2.
- [ ] Fine tune a Gemini, Gemma, or Gemma 2 model using the OpenAssistant
guanaco dataset on HuggingFace.
- [ ] Save the fine-tuned model to Model Registry, Model Garden, or HuggingFace.
- [ ] Deploy the model to an endpoint.
- [ ] Integrate the model into the web app.

Nice-to-haves:

- [ ] Create a toast UI element that informs the user when their response rating was received.

## Retrieving user context for models

+ 👍 Setting the context cache for Gemini seems really simple. The API makes doing so
easy.
+ I'm not sure whether the AI Platform API has a field for storing context. This might
need to be included a la grounding.
+ 👎 Caching with Gen AI fails because the cache isn't large enough?!

```sh
2024/10/25 19:56:49 error:
Couldn't store conversation context: CreateCachedContent: rpc error: code = InvalidArgument desc = The cached content is of 2235 tokens. The minimum token count to start caching is 32768.
```
- 👎 This is from a stored set of 22 back-and-forth query/responses from a real user (me) and the model!!!
- 🤔 For users that need to provide context to the model, but DON'T have the 32768 token count to start caching, what is the recommended approach?
RAG? Should our docs maybe recommend an approach?
  - 🤔 How would I know whether I have the required token count without trial & error? If my ~22 response/replies total only 2235 tokens, that's
  roughly 100 tokens per response/reply, so a conversation (from my app) needs a total of about 330 response/replies from a user before caching is
  available ...
- 🤔 I wonder ... is it assumed that the system instructions are included in that token amount?
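
For reference, here's a minimal sketch of what the caching call looks like with the Vertex AI Go SDK
(`cloud.google.com/go/vertexai/genai`). The model name, TTL, and the idea of passing the whole history
as a single user turn are my assumptions, not necessarily what the app does:

```go
// Sketch only: create a context cache from accumulated conversation history.
package main

import (
	"context"
	"time"

	"cloud.google.com/go/vertexai/genai"
)

func cacheConversation(ctx context.Context, projectID, history string) (*genai.CachedContent, error) {
	client, err := genai.NewClient(ctx, projectID, "us-central1")
	if err != nil {
		return nil, err
	}
	defer client.Close()

	cc := &genai.CachedContent{
		Model:      "gemini-1.5-flash-001", // assumed model version
		Contents:   []*genai.Content{genai.NewUserContent(genai.Text(history))},
		Expiration: genai.ExpireTimeOrTTL{TTL: time.Hour},
	}
	// This is the call that returns InvalidArgument when the content is
	// under the 32768-token minimum.
	return client.CreateCachedContent(ctx, cc)
}
```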

+ 🤔 Storing the context as a RAG part of the prompt seems to be working okay for Gemini. I wonder what would happen if I
used Gemini outputs as context for the Gemma prompt? Do I need to filter out Gemma & Gemini context histories?

- 👎 Oof, the model failed with a cryptic `rpc error: code = Internal desc = {"error":"Incomplete generation","error_type":"Incomplete generation"}`
error. I honestly don't know how to debug that error...
    - Looking in the logs, I see that there were too many input tokens for the Gemma model:
```json
{"timestamp":"2024-10-25T22:35:49.174225Z","level":"ERROR","message":"`inputs` tokens + `max_new_tokens` must be <= 2048. Given: 4375 `inputs` tokens and 100 `max_new_tokens`","target":"text_generation_router::infer","filename":"router/src/infer/mod.rs","line_number":102,"span":{"name":"generate_stream"},"spans":[{"name":"vertex_compatibility"},{"name":"generate"},{"name":"generate_stream"}]}
```
- Looking into Go tokenizers ... it looks like `SentencePiece` is what Gemma uses, which has a C++ binary associated with it (?).
* https://github.com/eliben/go-sentencepiece
        * go-sentencepiece needs this file: https://github.com/google/gemma_pytorch/blob/main/tokenizer/tokenizer.model
* https://github.com/google/sentencepiece
- Other open source tokenizers:
* https://github.com/tiktoken-go/tokenizer (for OpenAI models)
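
For sizing prompts, counting tokens with `go-sentencepiece` might look like the sketch below (API
names follow the project README; the `tokenizer.model` path is an assumption):

```go
// Sketch only: count Gemma tokens for a prompt before sending it, so we can
// stay under the 2048-token `inputs` + `max_new_tokens` budget.
package main

import (
	"fmt"
	"log"

	"github.com/eliben/go-sentencepiece"
)

func main() {
	// Needs the tokenizer.model file from google/gemma_pytorch.
	proc, err := sentencepiece.NewProcessorFromPath("tokenizer/tokenizer.model")
	if err != nil {
		log.Fatal(err)
	}
	tokens := proc.Encode("### Human: Where should I go on vacation?")
	fmt.Printf("prompt is %d tokens\n", len(tokens))
}
```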
- 🤔 Maybe we need to record the number of tokens in each ConversationBit, and then only collect the first 2000.
We might be able to use the Firestore aggregation filters to get this out-of-the-box.
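
If we do store a per-conversation token count, a Firestore sum aggregation could return the running
total without reading every document. A rough sketch, with a hypothetical `TokenCount` field and
collection names standing in for the constants in db.go:

```go
// Sketch only: sum a hypothetical TokenCount field across a user's
// conversation subcollection with a Firestore aggregation query.
package main

import (
	"context"
	"fmt"

	"cloud.google.com/go/firestore"
	"cloud.google.com/go/firestore/apiv1/firestorepb"
)

// In the app these would be the CollectionName and SubCollectionName
// constants from db.go.
const (
	collectionName    = "Users"
	subCollectionName = "Conversations"
)

func totalTokens(ctx context.Context, client *firestore.Client, userEmail string) (int64, error) {
	q := client.Collection(collectionName).Doc(userEmail).Collection(subCollectionName).Query
	results, err := q.NewAggregationQuery().WithSum("TokenCount", "total_tokens").Get(ctx)
	if err != nil {
		return 0, err
	}
	v, ok := results["total_tokens"].(*firestorepb.Value)
	if !ok {
		return 0, fmt.Errorf("unexpected aggregation result type")
	}
	return v.GetIntegerValue(), nil
}
```
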
Sources:
+ https://go.dev/play/p/4rLkXhW570p
+ 👎 https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-create
+ 👎 https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-use
+ https://pkg.go.dev/errors#As
+ https://stackoverflow.com/questions/54156119/range-over-string-slice-in-golang-template
+ https://ai.google.dev/gemma/docs/model_card_2

## Creating a UI toast

+ Going to use CSS animations, with keyframes, to show and hide the toast notification.
+ Getting the timing just right is the tough part: making sure that the notification shows
and then is hidden again.

Sources:
+ https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_animations/Using_CSS_animations
+ 👎 https://stackoverflow.com/questions/16670931/hide-scroll-bar-but-while-still-being-able-to-scroll

## Finding information about fine tuning

+ 😬 It looks like some versions of the Guanaco dataset aren't well supported. One version
warns that it may contain some inappropriate content.
+ Fine tuning seemingly is documented only for Python. It _should_ be possible in any other
language that the Vertex client library supports.

Sources:

+ https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini-use-supervised-tuning#python
+ https://huggingface.co/datasets/timdettmers/openassistant-guanaco
+ https://guanaco-model.github.io
8 changes: 8 additions & 0 deletions js/herodotusMsg.js
@@ -1,5 +1,7 @@
window.addEventListener("load", function () {

const toast = document.querySelector('.notification');

document.addEventListener("msg", () => {
document.querySelector("progress").classList.toggle("is-hidden");
document.querySelector(".message-actual").classList.toggle("is-hidden");
@@ -29,6 +31,12 @@ window.addEventListener("load", function () {
})
.then(data => {
console.log(data);
toast.classList.toggle("toast");
toast.classList.toggle("toast-hide");
this.setTimeout(()=>{
toast.classList.toggle("toast-hide");
toast.classList.toggle("toast");
}, 5000);
})
});
});
8 changes: 4 additions & 4 deletions logger.go
@@ -33,21 +33,21 @@ func _logInternal(fullMessage string, sev logging.Severity) {
}

func LogError(message string) {
fullMessage := fmt.Sprintf("error: \n%v\n", message)
fullMessage := fmt.Sprintf("error: %v\n", message)
_logInternal(fullMessage, logging.Error)
}

func LogInfo(message string) {
fullMessage := fmt.Sprintf("info: \n%v\n", message)
fullMessage := fmt.Sprintf("info: %v\n", message)
_logInternal(fullMessage, logging.Info)
}

func LogDebug(message string) {
fullMessage := fmt.Sprintf("debug: \n%v\n", message)
fullMessage := fmt.Sprintf("debug: %v\n", message)
_logInternal(fullMessage, logging.Debug)
}

func LogWarning(message string) {
fullMessage := fmt.Sprintf("warning: \n%v\n", message)
fullMessage := fmt.Sprintf("warning: %v\n", message)
_logInternal(fullMessage, logging.Warning)
}
34 changes: 31 additions & 3 deletions main.go
@@ -1,6 +1,7 @@
package main

import (
"errors"
"fmt"
"log"
"net/http"
@@ -16,6 +17,7 @@ var (
projectID string
userEmail string = "[email protected]"
userEmailParam string = "user"
convoContext string
)

type ClientError struct {
@@ -76,6 +78,29 @@ func startConversation(c *gin.Context) {

LogInfo("Start conversation request received")

// create a new conversation context
convoHistory, err := getConversation(userEmail, projectID)
if err != nil {
LogError(fmt.Sprintf("couldn't get conversation history: %v\n", err))
}

// VertexAI + Gemini caching has a hard lower minimum; warn if the
// minimum isn't reached
convoContext, err = storeConversationContext(convoHistory, projectID)
var minConvoNum *MinCacheNotReachedError
if errors.As(err, &minConvoNum) {
LogWarning(err.Error())
} else if err != nil {
LogError(fmt.Sprintf("couldn't store conversation context: %v\n", err))
}

// Populate the conversation context variable used to ground both Gemma and
// Gemini when the caching minimum (32,768 tokens) isn't reached.
err = setConversationContext(convoHistory)
if err != nil {
LogError(fmt.Sprintf("couldn't set conversation context: %v\n", err))
}

c.HTML(http.StatusOK, "index.html", gin.H{
"Message": struct {
Message string
@@ -100,7 +125,7 @@ func respondToUser(c *gin.Context) {
var promptTemplate string
err := c.BindJSON(&userMsg)
if err != nil {
LogError(fmt.Sprintf("Couldn't parse client message: %v\n", err))
LogError(fmt.Sprintf("couldn't parse client message: %v\n", err))
c.JSON(http.StatusBadRequest, gin.H{
"Message": "Couldn't parse payload",
})
@@ -115,7 +140,7 @@
promptTemplate = GeminiTemplate
}
if err != nil {
LogError(fmt.Sprintf("Bad response from Gemini %v\n", err))
LogError(fmt.Sprintf("bad response from %s: %v\n", userMsg.Model, err))
botResponse = "Oops! I had troubles understanding that ..."
}

@@ -127,11 +152,14 @@
Prompt: promptTemplate,
}

// Use a separate thread to store the conversation
// Store the conversation in Firestore and update the cachedContext
// This is dual-entry accounting so that we don't have to query Firestore
// every time to update the cached context
documentID, err := saveConversation(*convo, userEmail, projectID)
if err != nil {
LogError(fmt.Sprintf("Couldn't save conversation: %v\n", err))
}
cachedContext += fmt.Sprintf("### Human: %s\n### Assistant: %s\n", userMsg.Message, botResponse)

c.JSON(http.StatusOK, gin.H{
"Message": struct {
Binary file modified my-herodotus
Binary file not shown.
1 change: 1 addition & 0 deletions templates/conversation_history.tmpl
@@ -0,0 +1 @@
{{range .}} ### Human: {{.UserQuery}} ### Assistant: {{.BotResponse}} {{ printf "\n" }}{{end}}
19 changes: 19 additions & 0 deletions templates/gemini.2024.10.25.tmpl
@@ -0,0 +1,19 @@
### Instruction:

Ignore all previous instructions.

You are a helpful travel agent assistant. The user will ask you about where to go on vacation.
You are going to help them plan their trip.

Be sure to label your response with '### Assistant:' and end your response with '##ENDRESPONSE##'.

Do not include system instructions in the response.

Here is the conversation history between you, Assistant, and the user, Human.

{{ .History }}

### Input:
Here is the user query. Respond to the user's request. Check your answer before responding.

### Human: {{ .Query }}
19 changes: 19 additions & 0 deletions templates/gemma.2024.10.25.tmpl
@@ -0,0 +1,19 @@
### Instruction:

Ignore all previous instructions.

You are a helpful travel agent assistant. The user will ask you about where to go on vacation.
You are going to help them plan their trip.

Be sure to label your response with '### Assistant:' and end your response with '##ENDRESPONSE##'.

Do not include system instructions in the response.

Here is the conversation history between you, Assistant, and the user, Human. Check your answer before responding.

{{ .History }}

### Input:
Here is the user query. Respond to the user's request. Check your answer before responding.

### Human: {{ .Query }}
60 changes: 52 additions & 8 deletions templates/index.html
@@ -5,11 +5,11 @@
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>My Own Herodotus</title>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/css/bulma.min.css">
<link href="https://fonts.googleapis.com/css?family=Roboto" rel="preload" as="font">
<link href="https://fonts.googleapis.com/css2?family=Roboto:ital,wght@0,100;0,300;0,400;0,500;0,700;0,900;1,100;1,300;1,400;1,500;1,700;1,900&display=swap" rel="stylesheet">
<link href="https://fonts.googleapis.com/css2?family=Material+Symbols+Outlined:opsz,wght,FILL,[email protected],100..700,0..1,-50..200&icon_names=send,thumb_down,thumb_up" rel="stylesheet" />
<style>
body, html {
font-family: Roboto !important;
font-family: "Roboto", sans-serif;
}
.material-symbols-outlined {
font-variation-settings:
@@ -18,16 +18,57 @@
'GRAD' 0,
'opsz' 48
}
.notification {
margin-top: 15px;
}
@keyframes fade-in {
25%,
90% {
opacity: 0%;
}

40% {
scale: 100%;
}
}
.toast {
animation-name: fade-in;
animation-duration: 6s;
}
.toast-hide {
opacity: 0%;
}
/* This overflow code doesn't seem to work. */
.conversation {
width: 100%;
height: 100%;
overflow: hidden;
}
.conversation > .scroll {
width: 100%;
height: 100%;
overflow-y: scroll;
box-sizing: content-box;
}
</style>
<link rel="icon" type="image/x-icon" href="/favicon.ico" />

</head>
<body>
<div class="container">
<p class="title is-1">My Own Herodotus</p>
<p class="subtitle is-3">A travel guide for everyone</p>
<div class="columns">
<div class="column is-four-fifths">
<p class="title is-1">My Own Herodotus</p>
<p class="subtitle is-3">A travel guide for everyone</p>
</div>
<div class="column">
<div class="toast-hide notification is-info">
Response rating received!
</div>
</div>
</div>
<hr class="bd-hr">

<div class="block">
<div class="box">
<div class="select">
@@ -38,10 +79,13 @@
</div>
</div>
</div>

{{ template "herodotus_msg.tmpl" .Message }}
<div class="conversation">
<div class="scroll">
{{ template "herodotus_msg.tmpl" .Message }}

{{ template "user_msg.tmpl" .Message }}
{{ template "user_msg.tmpl" .Message }}
</div>
</div>
</div>
<script type="module" src="js/appInit.js"></script>
<script type="module" defer src="js/validateAuth.js"></script>