Skip to content

Commit ee5a0f3

Browse files
authored
add batch speedup evaluation to fastdraft_deepseek notebook (#2734)
Added a section to fastdraft_deepseek notebook to evaluate the speedup of deepseek with speculative decoding over multiple examples
1 parent 422fa4a commit ee5a0f3

File tree

2 files changed

+175
-0
lines changed

2 files changed

+175
-0
lines changed

supplementary_materials/notebooks/fastdraft-deepseek/fastdraft_deepseek.ipynb

Lines changed: 131 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -309,6 +309,137 @@
309309
"print(f\"End to end speedup with FastDraft and speculative decoding is {ar_gen_time / sd_gen_time:.2f}x\")"
310310
]
311311
},
312+
{
313+
"cell_type": "markdown",
314+
"metadata": {},
315+
"source": [
316+
"## Evaluate Speculative Decoding Speedup On Multiple Examples\n",
317+
"\n",
318+
"In this section we compare auto-regressive generation and speculative-decoding generation with DeepSeek-R1-Distill-Llama-8B model on multiple examples. \n",
319+
"We use 40 example-prompts taken from [MT-Bench](https://huggingface.co/datasets/HuggingFaceH4/mt_bench_prompts) and from [databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) datasets.\n",
320+
"We loop over these examples and measure generation times, first without speculative-decoding and later with speculative-decoding. Eventually we compare generation times for both methods and compute the average speedup gain."
321+
]
322+
},
323+
{
324+
"cell_type": "markdown",
325+
"metadata": {},
326+
"source": [
327+
"### 1. Run target model without speculative decoding\n",
328+
"As in previous section, we will first run generation without speculative-decoding, but this time we will run it over 40 examples."
329+
]
330+
},
331+
{
332+
"cell_type": "code",
333+
"execution_count": null,
334+
"metadata": {},
335+
"outputs": [],
336+
"source": [
337+
"import openvino_genai as ov_genai\n",
338+
"import sys\n",
339+
"import time\n",
340+
"from tqdm import tqdm\n",
341+
"from llm_pipeline_with_hf_tokenizer import LLMPipelineWithHFTokenizer\n",
342+
"\n",
343+
"print(f\"Loading model from {model_dir}\")\n",
344+
"\n",
345+
"# Define scheduler\n",
346+
"scheduler_config = ov_genai.SchedulerConfig()\n",
347+
"scheduler_config.num_kv_blocks = 2048 // 16\n",
348+
"scheduler_config.dynamic_split_fuse = False\n",
349+
"scheduler_config.max_num_batched_tokens = 2048\n",
350+
"\n",
351+
"pipe = LLMPipelineWithHFTokenizer(model_dir, device, scheduler_config=scheduler_config)\n",
352+
"\n",
353+
"generation_config = ov_genai.GenerationConfig()\n",
354+
"generation_config.max_new_tokens = 1024\n",
355+
"\n",
356+
"print(\"Loading prompts...\")\n",
357+
"import json\n",
358+
"f= open('prompts.json')\n",
359+
"prompts = json.load(f)\n",
360+
"prompts = [[{\"role\": \"user\", \"content\": p }] for p in prompts]\n",
361+
"\n",
362+
"times_auto_regressive = []\n",
363+
"for prompt in tqdm(prompts):\n",
364+
" start_time = time.perf_counter()\n",
365+
" result = pipe.generate(prompt, generation_config, apply_chat_template=True)\n",
366+
" end_time = time.perf_counter()\n",
367+
" times_auto_regressive.append(end_time - start_time)\n",
368+
"print(\"Done\")\n",
369+
"\n",
370+
"import gc\n",
371+
"\n",
372+
"del pipe\n",
373+
"gc.collect()"
374+
]
375+
},
376+
{
377+
"cell_type": "markdown",
378+
"metadata": {},
379+
"source": [
380+
"### 2. Run target model with speculative decoding\n",
381+
"Now we will run generation with speculative-decoding over the same 40 examples."
382+
]
383+
},
384+
{
385+
"cell_type": "code",
386+
"execution_count": null,
387+
"metadata": {},
388+
"outputs": [],
389+
"source": [
390+
"print(f\"Loading draft from {draft_model_path}\")\n",
391+
"\n",
392+
"# Define scheduler for the draft\n",
393+
"\n",
394+
"draft_scheduler_config = ov_genai.SchedulerConfig()\n",
395+
"draft_scheduler_config.num_kv_blocks = 2048 // 16\n",
396+
"draft_scheduler_config.dynamic_split_fuse = False\n",
397+
"draft_scheduler_config.max_num_batched_tokens = 2048\n",
398+
"\n",
399+
"draft_model = ov_genai.draft_model(draft_model_path, device, scheduler_config=draft_scheduler_config)\n",
400+
"\n",
401+
"pipe = LLMPipelineWithHFTokenizer(model_dir, device, scheduler_config=scheduler_config, draft_model=draft_model)\n",
402+
"\n",
403+
"\n",
404+
"generation_config = ov_genai.GenerationConfig()\n",
405+
"generation_config.num_assistant_tokens = 3\n",
406+
"generation_config.max_new_tokens = 2048\n",
407+
"\n",
408+
"times_speculative_decoding = []\n",
409+
"\n",
410+
"print(\"Running Speculative Decoding generation...\")\n",
411+
"for prompt in tqdm(prompts):\n",
412+
" start_time = time.perf_counter()\n",
413+
" result = pipe.generate(prompt, generation_config, apply_chat_template=True)\n",
414+
" end_time = time.perf_counter()\n",
415+
" times_speculative_decoding.append((end_time - start_time))\n",
416+
"print(\"Done\")"
417+
]
418+
},
419+
{
420+
"cell_type": "markdown",
421+
"metadata": {},
422+
"source": [
423+
"### 3. Calculate speedup\n"
424+
]
425+
},
426+
{
427+
"cell_type": "code",
428+
"execution_count": null,
429+
"metadata": {},
430+
"outputs": [],
431+
"source": [
432+
"avg_speedup = sum([x / y for x, y in zip(times_auto_regressive, times_speculative_decoding)]) / len(prompts)\n",
433+
"print(f\"average speedup: {avg_speedup:.2f}\")"
434+
]
435+
},
436+
{
437+
"cell_type": "markdown",
438+
"metadata": {},
439+
"source": [
440+
"We see that by using speculative-decoding with FastDraft we can accelerate DeepSeek-R1-Distill-Llama-8B generation by ~1.5x on avarage."
441+
]
442+
},
312443
{
313444
"attachments": {},
314445
"cell_type": "markdown",
Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,44 @@
1+
[
2+
"which word does not belong to the other: Hammer, Screwdriver, Nail, Wood",
3+
"Each problem consists of three statements. Based on the first two statements, the third statement may be true, false, or uncertain.\n1. Oranges cost more than apples.\n2. Oranges cost less than bananas.\n3. Bananas cost more than apples and bananas cost more than orange.\nIf the first two statements are true, then the third statement is",
4+
"Parents have complained to the principal about bullying during recess. The principal wants to quickly resolve this, instructing recess aides to be vigilant. Which situation should the aides report to the principal?\na) An unengaged girl is sitting alone on a bench, engrossed in a book and showing no interaction with her peers.\nb) Two boys engaged in a one-on-one basketball game are involved in a heated argument regarding the last scored basket.\nc) A group of four girls has surrounded another girl and appears to have taken possession of her backpack.\nd) Three boys are huddled over a handheld video game, which is against the rules and not permitted on school grounds.",
5+
"Which number is bigger 9.11 or 9.9?",
6+
"A tech startup invests $8000 in software development in the first year, and then invests half of that amount in software development in the second year.\nWhat's the total amount the startup invested in software development over the two years?",
7+
"In a survey conducted at a local high school, preferences for a new school color were measured: 58% of students liked the color blue, 45% preferred green, and 22% liked both colors. If we randomly pick a student from the school, what's the probability that they would like neither blue nor green?",
8+
"Lily has a rubber ball that she drops from the top of a wall. the wall is 2 meters tall. how long will it take for the ball to reach the ground?",
9+
"Draft a professional email seeking your supervisor's feedback on the 'Quarterly Financial Report' you prepared. Ask specifically about the data analysis, presentation style, and the clarity of conclusions drawn. Keep the email short and to the point.",
10+
"Suppose you are a mathematician and poet. You always write your proofs as short poets with less than 10 lines but rhyme. Prove the square root of 2 is irrational number.",
11+
"Extract the following information from the presented texts: The name of the book, the author, the main character, the year of publication. Output in the format of \"main character, book, author, year of publication\", one book per line.\na) In the realm of wizarding literature, a true standout is the work of J.K. Rowling. One of her books that left an indelible mark is 'Harry Potter and the Philosopher's Stone'. This iconic tale, published in 1997, tells the story of Harry, a young orphan who discovers his magical abilities on his 11th birthday. Soon, he finds himself at the Hogwarts School of Witchcraft and Wizardry, a place teeming with magic and adventure, located somewhere in Scotland.\nb) The magic of Middle-earth has entranced readers worldwide, thanks to the brilliance of J.R.R. Tolkien. In one of his seminal works, 'The Lord of the Rings: The Fellowship of the Ring', published in 1954, we meet Frodo Baggins, a brave hobbit tasked with the perilous quest of destroying the One Ring. The epic journey takes him from the peaceful Shire to the tumultuous regions of Middle-earth.\nc) In a galaxy far, far away, the imagination of L.E. Starlighter gives us 'The Prism Galaxy Chronicles: The Awakening of the Starcaster'. Published in 2028, the story is about Zylo, a humble spaceship mechanic, who unexpectedly discovers he's a Starcaster - a rare individual with the power to manipulate stardust. Set against the backdrop of an interstellar empire in turmoil, Zylo's destiny unfolds on numerous alien worlds, each with its unique cosmic charm.",
12+
"Given the following data, identify the company with the highest profit in 2021 and provide its CEO's name:\na) Company X, with CEO Amy Williams, reported $30 billion in revenue and a $3 billion profit in 2021.\nb) Company Y, led by CEO Mark Thompson, posted a $60 billion revenue and a $6 billion profit in the same year.\nc) Company Z, under CEO Sarah Johnson, announced a $20 billion revenue and a $7 billion profit in 2021.\nd) Company W, managed by CEO James Smith, revealed a $300 billion revenue with a $21 billion profit in 2021.\ne) Company V, with CEO Lisa Brown, reported a $200 billion revenue and a $25 billion profit in 2021.\nf) Company U, under CEO John White, posted a $180 billion revenue and a $20 billion profit in the same year.",
13+
"Which is a species of fish? Tope or Rope",
14+
"Identify which instrument is string or percussion: Cantaro, Gudok",
15+
"Identify which instrument is string or woodwind: Panduri, Zurna",
16+
"Identify which instrument is string or percussion: Kpanlogo, Shamisen",
17+
"Which of these are rappers? Eminem, Michael Jackson, Rihanna, 50 Cent",
18+
"Classify each of the following pieces of equipment according to the sport they are used for, either basketball, football, or soccer: shin guards, shooting sleeve, penalty flag, corner flag, kicking tee, and goalie gloves.",
19+
"Identify which animal species is alive or extinct: Palaeophis, Giant Tortoise",
20+
"If we were playing a game where we had to identify things that can be found inside a house, which of these would we call out: car, chair, table, park, cloud, microwave.",
21+
"Classify the following as either dark-colored beers or light colored beers: porter, pilsner, stout, amber, lager",
22+
"Categorize each of the following instruments as either string or keyboard: Guitar, Violin, piano, harmonium, cello, accordion, banjo",
23+
"What are common ingredients of a full english breakfast?",
24+
"Classify each of the following as either a bird, animal, reptile or insect: tiger, heron, eagle, alligator, snake, spider, ant, dog, cat, rhinoceros, kingfisher, chameleon, hornet, butterfly",
25+
"Which is a species of fish? Sea dragon or Red bearded",
26+
"Categorize where each of these household items belong: bed, couch, desk",
27+
"Separate the following suburbs into those that border, and do not border, the Brisbane River: Indooroopilly, Bulimba, St Lucia, Newstead, Wilston, West End, Toowong, Bowen Hills and, Wooloongabba.",
28+
"Choose the word which is different from the rest: hangar, platform, dock, park, bus stand",
29+
"Identify which instrument is string or percussion: Den-den daiko, Luc huyen cam",
30+
"Pick if these would be useful or not useful for a high school student to put in their backpack. Notebooks, textbook, desk lamp, pencil pouch, beach ball, pillow, laptop.",
31+
"Classify the following numbers as 'prime' or 'composite' - 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16.",
32+
"Classify the below characters based on whether they are created by Marvel comics or D.C. comics\n Spider-Man, Iron Man, Captain America, Thor, Doctor Strange, Superman, Batman, Wonder Woman, Flash, Aquaman",
33+
"Which of the following are colors: red, black, yellow, orange, sun, sunflower, chips, book, white, pink, blue, keyboard.",
34+
"Which sports would be easiest to find success in if you’re not tall: baseball, soccer, basketball, bowling.",
35+
"What are the classifications of Academic Degrees?",
36+
"I'm creating a class to explain to kids the difference between items that are powered and can be plugged in and items that aren't. Please divide the following things into those you can plug in and those that you can't: table, hairdryer, television, chair, computer, fridge, comb, flowers.",
37+
"Identify which instrument is string or woodwind: Wheelharp, Clarinet",
38+
"Tell me whether these cities are in Texas: Austin, Houston, New York, Chicago, Miami, Dallas",
39+
"Classify the following types of cars as \"economy\" or \"luxury\": Ford, Chevrolet, Lamborghini, Ferrari, Mercedes, Honda, Lexus, Toyota, Nissan, Subaru",
40+
"Classify each of the following grades as being in elementary or high school: 10th grade, 3rd grade, 4th grade, 12th grade, 1st grade.",
41+
"Which of the following are ice cream toppings and which are salad dressings: thousand island, chocolate sauce, hot fudge, balsamic vinaigrette, whipped cream, and Caesar."
42+
]
43+
44+

0 commit comments

Comments
 (0)