diff --git a/blog/briefly-evaluating-qwq-preview/content.md b/blog/briefly-evaluating-qwq-preview/content.md
index 1bc48a8..8813a50 100644
--- a/blog/briefly-evaluating-qwq-preview/content.md
+++ b/blog/briefly-evaluating-qwq-preview/content.md
@@ -4,7 +4,7 @@

 Alibaba has somewhat stunned the AI community by releasing a relatively capable o1 competitor as an open model, mere months after the unveiling of o1. The model is called QwQ Preview, or QwQ-32B-Preview if you will. You can learn more about it on Google.

-Let's briefly evaluate its coding chops, using the Q4_K_M variant running via Ollama.
+Let's briefly evaluate its coding chops, using the Q4_K_M variant running via Ollama. Note that I won't be comparing it to o1 directly since my OAI API credit is at 0 and I don't want to put more in there right now.

 ## Add comments to C++ code

@@ -707,6 +707,74 @@ int main(void)

 A small ray tracer. Qwen 2.5 Coder gets further than QwQ; neither gets it right.

+## Spatial reasoning
+
+

+ This code draws the likeness of a 1990s ray traced scene of two spheres. It doesn't quite look physically correct though. +

+ +> Code +```js + + + + + Simple Scene + + + + + + +``` +
+
+> Original
+![{iframe}{inline-class:model-response}](./spat-o.html)
+
+> QwQ Preview 32B
+![{iframe}{inline-class:model-response}](./spat-qwq.html)
+
+> Claude 3.5 Sonnet (October 2024)
+![{iframe}{inline-class:model-response}](./spat-s35.html)
+
+> Mistral Large 123B
+![{iframe}{inline-class:model-response}](./spat-misla.html)
+
+> Grok 2
+![{iframe}{inline-class:model-response}](./spat-grok2.html)
+
+> Qwen 2.5 72B
+![{iframe}{inline-class:model-response}](./spat-q25.html)
+
+> Llama 3.2 Vision 90B
+![{iframe}{inline-class:model-response}](./spat-llama-vision-90.html)
+
+In the original image, the reflection of the red sphere on the blue sphere is incorrectly placed. QwQ was the only model that clearly understood this and fixed it. Grok 2 might also have had a sniff of this issue; the other models not so much.
+
 ## Use a GUI framework

@@ -781,5 +849,5 @@ Both QwQ and Qwen 2.5 Coder added a game-over condition, though I'd say QwQ's is

 ## Conclusions

-QwQ Preview does a decent job, often rising above Qwen 2.5 Coder and sometimes matching Claude 3.5 Sonnet; but on the flip side eating up quite a few more tokens. Interesting to see the full QwQ down the road.
+QwQ Preview does a decent job, often rising above Qwen 2.5 Coder and sometimes even Claude 3.5 Sonnet; but on the flip side eating up quite a few more tokens. Interesting to see the full QwQ down the road.
diff --git a/blog/briefly-evaluating-qwq-preview/index.html b/blog/briefly-evaluating-qwq-preview/index.html
index 3c4b421..67e9e07 100644
--- a/blog/briefly-evaluating-qwq-preview/index.html
+++ b/blog/briefly-evaluating-qwq-preview/index.html
@@ -69,7 +69,7 @@

Alibaba has somewhat stunned the AI community by releasing a relatively capable o1 competitor as an open model, mere months after the unveiling of o1. The model is called QwQ Preview, or QwQ-32B-Preview if you will. You can learn more about it on Google.

-

Let's briefly evaluate its coding chops, using the Q4_K_M variant running via Ollama.

+

Let's briefly evaluate its coding chops, using the Q4_K_M variant running via Ollama. Note that I won't be comparing it to o1 directly since my OAI API credit is at 0 and I don't want to put more in there right now.

@@ -778,6 +778,68 @@

A small ray tracer. Qwen 2.5 Coder gets further than QwQ; neither gets it right.

+
+ +

+ This code draws the likeness of a 1990s ray traced scene of two spheres. It doesn't quite look physically correct though. +

+ + + + + + +
+ + + + + + + + + + + + + + +

In the original image, the reflection of the red sphere on the blue sphere is incorrectly placed. QwQ was the only model that clearly understood this and fixed it. Grok 2 might also have had a sniff of this issue; the other models not so much.
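
To make the placement issue concrete, here is a minimal sketch of where a flat, 2D approximation of the reflection would be expected to sit: on the side of the blue sphere that faces the red one. The function name `reflectionSpot`, the `{ x, y, r }` sphere shape, the `inset` factor, and the example coordinates are all assumptions for illustration; none of them come from the original scene code or from any model's answer.

```js
// Hypothetical helper: estimate where sphere `source` should appear
// reflected on sphere `mirror` in a flat 2D canvas approximation.
// Spheres are plain objects { x, y, r } (an assumed shape, not the post's code).
function reflectionSpot(mirror, source, inset = 0.6) {
  const dx = source.x - mirror.x;
  const dy = source.y - mirror.y;
  const dist = Math.hypot(dx, dy);
  if (dist === 0) {
    // Concentric spheres: no preferred direction, fall back to the center.
    return { x: mirror.x, y: mirror.y };
  }
  // Unit vector pointing from the mirror sphere toward the source sphere.
  const ux = dx / dist;
  const uy = dy / dist;
  // Place the spot part of the way from the center to the rim,
  // on the side of the mirror sphere that faces the source sphere.
  return {
    x: mirror.x + ux * mirror.r * inset,
    y: mirror.y + uy * mirror.r * inset,
  };
}

// Made-up coordinates: the red sphere sits to the left of the blue one,
// so the reflection should land left of the blue sphere's center.
const blue = { x: 300, y: 200, r: 80 };
const red = { x: 120, y: 200, r: 60 };
const spot = reflectionSpot(blue, red);
console.log(spot);
```

This is only a sanity check on which side the reflection belongs; a physically correct placement would also depend on the camera position and the law of reflection at the sphere's surface.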

@@ -845,7 +907,7 @@

Both QwQ and Qwen 2.5 Coder added a game-over condition, though I'd say QwQ's is better. They both also implemented right-clicking to place flags, but since the UI framework only supports left-click, that can't be tested. Overall, I wouldn't call either version finished, but getting there.

-

QwQ Preview does a decent job, often rising above Qwen 2.5 Coder and sometimes matching Claude 3.5 Sonnet; but on the flip side eating up quite a few more tokens. Interesting to see the full QwQ down the road.

+

QwQ Preview does a decent job, often rising above Qwen 2.5 Coder and sometimes even Claude 3.5 Sonnet; but on the flip side eating up quite a few more tokens. Interesting to see the full QwQ down the road.

diff --git a/blog/briefly-evaluating-qwq-preview/spat-grok2.html b/blog/briefly-evaluating-qwq-preview/spat-grok2.html
new file mode 100644
index 0000000..15a17fb
--- /dev/null
+++ b/blog/briefly-evaluating-qwq-preview/spat-grok2.html
@@ -0,0 +1,66 @@
+ + + + + Improved Ray Traced Scene + + + + + + +
diff --git a/blog/briefly-evaluating-qwq-preview/spat-llama-vision-90.html b/blog/briefly-evaluating-qwq-preview/spat-llama-vision-90.html
new file mode 100644
index 0000000..7d2fe63
--- /dev/null
+++ b/blog/briefly-evaluating-qwq-preview/spat-llama-vision-90.html
@@ -0,0 +1,53 @@
+ + + + + Simple Scene + + + + + +
diff --git a/blog/briefly-evaluating-qwq-preview/spat-misla.html b/blog/briefly-evaluating-qwq-preview/spat-misla.html
new file mode 100644
index 0000000..9e47013
--- /dev/null
+++ b/blog/briefly-evaluating-qwq-preview/spat-misla.html
@@ -0,0 +1,46 @@
+ + + + + Simple Scene + + + + + + + +
diff --git a/blog/briefly-evaluating-qwq-preview/spat-o.html b/blog/briefly-evaluating-qwq-preview/spat-o.html
new file mode 100644
index 0000000..1b2bee1
--- /dev/null
+++ b/blog/briefly-evaluating-qwq-preview/spat-o.html
@@ -0,0 +1,33 @@
+ + + + + Simple Scene + + + + + +
diff --git a/blog/briefly-evaluating-qwq-preview/spat-q25.html b/blog/briefly-evaluating-qwq-preview/spat-q25.html
new file mode 100644
index 0000000..9140b55
--- /dev/null
+++ b/blog/briefly-evaluating-qwq-preview/spat-q25.html
@@ -0,0 +1,60 @@
+ + + + + Simple Scene + + + + + + +
diff --git a/blog/briefly-evaluating-qwq-preview/spat-qwq.html b/blog/briefly-evaluating-qwq-preview/spat-qwq.html
new file mode 100644
index 0000000..624cd31
--- /dev/null
+++ b/blog/briefly-evaluating-qwq-preview/spat-qwq.html
@@ -0,0 +1,66 @@
+ + + + + Simple Scene + + + + + + +
diff --git a/blog/briefly-evaluating-qwq-preview/spat-s35.html b/blog/briefly-evaluating-qwq-preview/spat-s35.html
new file mode 100644
index 0000000..e01458b
--- /dev/null
+++ b/blog/briefly-evaluating-qwq-preview/spat-s35.html
@@ -0,0 +1,67 @@
+
+ + + + Ray Traced Scene + + + + + +