---
layout: prediction_post
published: True
title: Applying massive language models in the real world with Cohere
---

A little less than a year ago, I joined the awesome <a href="https://cohere.ai">Cohere</a> team. The company trains massive language models (both GPT-like and BERT-like) and offers them as an API (which also supports finetuning). Its founders include Google Brain alumni, among them co-authors of the original Transformer paper. It's a fascinating role where I get to help companies and developers put these massive models to work solving real-world problems.

I love that I get to share some of the intuitions developers need to start problem-solving with these models. Even though I've worked closely with pretrained Transformers for the past several years (for this blog and in developing <a href="https://github.com/jalammar/ecco">Ecco</a>), I'm enjoying the convenience of problem-solving with managed language models, as it frees me from the overhead of model loading/deployment and memory/GPU management.

These are some of the articles I wrote and collaborated on with colleagues over the last few months:

### <a href="https://docs.cohere.ai/intro-to-llms/">Intro to Large Language Models with Cohere</a>
<div class="row two-column-text">
  <div class="col-md-6 col-xs-12">
    <a href="https://docs.cohere.ai/intro-to-llms/"><img src="https://docs.cohere.ai/img/intro-llms/text-to-text-or-embedding-language-model.png" class="small-image"/></a>
  </div>
  <div class="col-md-6 col-xs-12">
    <p>This is a high-level intro to large language models for people who are new to them. It establishes the difference between generative (GPT-like) and representation (BERT-like) models and gives example use cases for each.</p>
    <p>This is one of the first articles I got to write. It's extracted from a much larger document I wrote to explore some of the visual language for explaining how to apply these models.</p>
  </div>
</div>
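
To make the distinction concrete, here's a minimal sketch of the two interaction styles using the Cohere Python SDK. Treat the call details (client setup, parameter names, response fields) as my assumptions based on the SDK at the time of writing, not a definitive reference:

```python
import cohere

co = cohere.Client('YOUR_API_KEY')  # hypothetical placeholder key

# Generative (GPT-like): text in, text out
generation = co.generate(prompt='Write a tagline for an ice cream shop:', max_tokens=20)
print(generation.generations[0].text)

# Representation (BERT-like): text in, vector out
embedding = co.embed(texts=['ice cream is delicious'])
print(len(embedding.embeddings[0]))  # dimensionality of the sentence embedding
```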

### <a href="https://docs.cohere.ai/prompt-engineering-wiki/">A visual guide to prompt engineering</a>

<div class="row two-column-text">
  <div class="col-md-6 col-xs-12">
    <a href="https://docs.cohere.ai/prompt-engineering-wiki/"><img src="/images/cohere/language-model-input-prompt.png" class="small-image"/></a>
  </div>
  <div class="col-md-6 col-xs-12">
    <p>Massive GPT models open the door for a new way of programming. If you structure the input text in the right way, you can get useful (and often fascinating) results for a lot of tasks (e.g. text classification, copywriting, summarization, etc.).</p>
    <p>This article visually demonstrates four principles for creating effective prompts.</p>
  </div>
</div>
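
As a taste of the idea, here's a small sketch of a few-shot prompt that coaxes a generative model into doing classification. The SDK call is an assumption on my part; the point is the structure of the prompt itself:

```python
import cohere

co = cohere.Client('YOUR_API_KEY')  # hypothetical placeholder key

# A few-shot prompt: show the model the task format, then let it continue.
prompt = """Post: The delivery was two weeks late and the box was crushed.
Sentiment: negative

Post: Absolutely love this blender, I use it every morning!
Sentiment: positive

Post: The instructions were confusing and support never replied.
Sentiment:"""

response = co.generate(prompt=prompt, max_tokens=5, temperature=0)
print(response.generations[0].text.strip())  # a well-behaved model continues with "negative"
```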

### <a href="https://docs.cohere.ai/text-summarization-example/">Text Summarization</a>

<div class="row two-column-text">
  <div class="col-md-6 col-xs-12">
    <a href="https://docs.cohere.ai/text-summarization-example/"><img src="https://github.com/cohere-ai/notebooks/raw/main/notebooks/images/summarization.png" class="small-image"/></a>
  </div>
  <div class="col-md-6 col-xs-12">
    <p>This is a walkthrough of creating a simple summarization system. It links to a Jupyter notebook which includes the code to start experimenting with text generation and summarization.</p>
    <p>The end of this notebook shows an important idea I want to spend more time on in the future: how to rank/filter/select the best output from among multiple generations.</p>
  </div>
</div>
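
Here's a rough sketch of that generate-several-then-select pattern. The num_generations parameter and the length-based selection heuristic are my assumptions for illustration; the notebook explores better ways to rank candidates:

```python
import cohere

co = cohere.Client('YOUR_API_KEY')  # hypothetical placeholder key

prompt = """Passage: The quarterly report showed revenue growing 12% year over
year, driven mostly by the new subscription tier launched in March.

In summary:"""

# Ask for several candidate summaries instead of one.
response = co.generate(prompt=prompt, max_tokens=40,
                       temperature=0.8, num_generations=3)

# Placeholder selection heuristic: prefer the shortest candidate.
# Real ranking could score candidates for faithfulness or fluency instead.
candidates = [g.text.strip() for g in response.generations]
print(min(candidates, key=len))
```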

### <a href="https://docs.cohere.ai/semantic-search/">Semantic Search</a>

<div class="row two-column-text">
  <div class="col-md-6 col-xs-12">
    <a href="https://docs.cohere.ai/semantic-search/"><img src="https://github.com/cohere-ai/notebooks/raw/main/notebooks/images/basic-semantic-search-overview.png?3" class="small-image"/></a>
  </div>
  <div class="col-md-6 col-xs-12">
    <p>Semantic search has to be one of the most exciting applications of sentence embedding models. This tutorial implements a "similar questions" functionality using sentence embeddings and a vector search library.</p>
    <p>The vector search library used here is <a href="https://github.com/spotify/annoy">Annoy</a> from Spotify. There are a bunch of others out there. <a href="https://github.com/facebookresearch/faiss">Faiss</a> is used widely. I experiment with <a href="https://github.com/lmcinnes/pynndescent">PyNNDescent</a> as well.</p>
  </div>
</div>
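
The core loop is short enough to sketch here: embed the corpus, index the vectors, then embed the query and look up its nearest neighbors. The embedding call is an assumption based on the SDK; the Annoy usage follows that library's documented API:

```python
import cohere
from annoy import AnnoyIndex

co = cohere.Client('YOUR_API_KEY')  # hypothetical placeholder key

questions = [
    "How do I reset my password?",
    "What payment methods do you accept?",
    "How can I track my order?",
]

# Embed the corpus and index the vectors for fast nearest-neighbor lookup.
embeddings = co.embed(texts=questions).embeddings
index = AnnoyIndex(len(embeddings[0]), 'angular')  # angular ~ cosine distance
for i, vector in enumerate(embeddings):
    index.add_item(i, vector)
index.build(10)  # 10 trees; more trees = better recall, bigger index

# Embed a new query and retrieve the most similar stored questions.
query = co.embed(texts=["I forgot my login credentials"]).embeddings[0]
neighbor_ids = index.get_nns_by_vector(query, 2)
print([questions[i] for i in neighbor_ids])
```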

### <a href="https://docs.cohere.ai/finetuning-representation-models/">Finetuning Representation Models</a>

<div class="row two-column-text">
  <div class="col-md-6 col-xs-12">
    <a href="https://docs.cohere.ai/finetuning-representation-models/"><img src="https://docs.cohere.ai/img/finetuning-rep/semantic-embed-labeled.png" class="small-image"/></a>
  </div>
  <div class="col-md-6 col-xs-12">
    <p>Finetuning tends to lead to the best results language models can achieve. This article explains the intuitions around finetuning representation/sentence embedding models. I've added a couple more visuals to the <a href="https://twitter.com/JayAlammar/status/1490712428686024705">Twitter thread</a>.</p>
    <p>The research around this area is very interesting. I've greatly enjoyed papers like <a href="https://arxiv.org/abs/1908.10084">Sentence-BERT</a> and <a href="https://arxiv.org/abs/2007.00808">Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval</a>.</p>
  </div>
</div>
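
A recurring idea in that line of research is a contrastive objective: pull the embeddings of matching text pairs together and push non-matching pairs apart. Here's a tiny numpy sketch of an InfoNCE-style loss to make the intuition concrete; it's a toy illustration of the general technique, not Cohere's actual finetuning procedure:

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.05):
    """Contrastive loss: reward the anchor for being closer to its
    positive (matching text) than to any of the negatives."""
    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Similarity of the true pair first, then anchor vs. each negative.
    sims = np.array([cosine(anchor, positive)] +
                    [cosine(anchor, neg) for neg in negatives]) / temperature

    # Softmax cross-entropy with the positive as the "correct class":
    # the loss shrinks as the positive outscores all the negatives.
    return -sims[0] + np.log(np.sum(np.exp(sims)))

rng = np.random.default_rng(0)
anchor, positive = rng.normal(size=64), rng.normal(size=64)
negatives = rng.normal(size=(8, 64))
print(info_nce_loss(anchor, positive, negatives))
```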

### <a href="https://docs.cohere.ai/token-picking/">Controlling Generation with top-k & top-p</a>

<div class="row two-column-text">
  <div class="col-md-6 col-xs-12">
    <a href="https://docs.cohere.ai/token-picking/"><img src="https://docs.cohere.ai/img/token-picking/language-model-probability-distribution-output-tokens.png" class="small-image"/></a>
  </div>
  <div class="col-md-6 col-xs-12">
    <p>This one is a little bit more technical. It explains the parameters you tweak to adjust a GPT's <i>decoding strategy</i> -- the method with which the system picks output tokens.</p>
  </div>
</div>
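
To show what those parameters actually do, here's a minimal numpy sketch of a sampler that applies top-k and top-p filtering to a model's output distribution. It's illustrative of the technique, not the API's actual implementation:

```python
import numpy as np

def sample_token(probs, top_k=0, top_p=1.0, rng=np.random.default_rng()):
    """Sample a token id from a probability distribution over the vocabulary.

    top_k > 0 keeps only the k most likely tokens; top_p < 1.0 keeps the
    smallest set of top tokens whose probabilities sum to at least top_p.
    """
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]  # token ids, most likely first

    keep = len(probs)
    if top_k > 0:
        keep = min(keep, top_k)
    if top_p < 1.0:
        cumulative = np.cumsum(probs[order])
        keep = min(keep, int(np.searchsorted(cumulative, top_p) + 1))

    kept = order[:keep]
    return rng.choice(kept, p=probs[kept] / probs[kept].sum())

# Toy distribution over a five-token vocabulary:
probs = [0.5, 0.25, 0.15, 0.07, 0.03]
print(sample_token(probs, top_k=3))    # sample only from the 3 likeliest tokens
print(sample_token(probs, top_p=0.9))  # sample from the 90% probability nucleus
```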

### <a href="https://docs.cohere.ai/text-classification-embeddings/">Text Classification Using Embeddings</a>

<div class="row two-column-text">
  <div class="col-md-6 col-xs-12">
    <a href="https://docs.cohere.ai/text-classification-embeddings/"><img src="https://github.com/cohere-ai/notebooks/raw/main/notebooks/images/simple-classifier-embeddings.png" class="small-image"/></a>
  </div>
  <div class="col-md-6 col-xs-12">
    <p>This is a walkthrough of one of the most common use cases of embedding models -- text classification. It is similar to <a href="/a-visual-guide-to-using-bert-for-the-first-time/">A Visual Guide to Using BERT for the First Time</a>, but uses Cohere's API.</p>
  </div>
</div>
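
The recipe is pleasantly simple: embed the training texts, fit a lightweight classifier on the vectors, then classify new texts by embedding them too. A rough sketch, with the embedding call being my assumption about the SDK:

```python
import cohere
from sklearn.linear_model import LogisticRegression

co = cohere.Client('YOUR_API_KEY')  # hypothetical placeholder key

train_texts = ["I loved this movie", "What a waste of time",
               "Brilliant acting throughout", "Terribly boring plot"]
train_labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Turn each text into a sentence embedding, then fit a classifier on top.
train_embeddings = co.embed(texts=train_texts).embeddings
classifier = LogisticRegression().fit(train_embeddings, train_labels)

# New texts get embedded the same way before prediction.
test_embeddings = co.embed(texts=["An instant classic"]).embeddings
print(classifier.predict(test_embeddings))  # expected: [1]
```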

You can find these and upcoming articles in the <a href="https://docs.cohere.ai/">Cohere docs</a> and <a href="https://github.com/cohere-ai/notebooks">notebooks repo</a>. I have quite a number of experiments and interesting workflows I'd love to share in the coming weeks. So stay tuned!