| AI Model | Success | Score | FCSR | Status |
|---|---|---|---|---|
| z-ai/glm-5 | 4 | 100 | 72% | 🟢 |
| qwen/qwen3-coder-next | 2 | 98.21 | 51% | 🟡 |
| deepseek/deepseek-v3.1-terminus-exacto | 2 | 97.91 | 85% | 🟡 |
| openai/gpt-4.1-mini | 1 | 96.13 | 83% | 🟡 |
| qwen/qwen3-next-80b-a3b-instruct | 0 | 95.17 | 71% | 🟡 |
| qwen/qwen3-30b-a3b-thinking-2507 | 0 | 92.92 | 75% | 🟡 |
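- Success: number of projects (out of four) in which every phase finished 🟢
- Score: mean of the four project scores
- FCSR: Function Calling Success Rate
- Status:
  - 🟢: All projects completed successfully
  - 🟡: Some projects failed
  - ❌: All projects failed or not executed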
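The Success, Score and Status columns appear to be derived from the per-project tables that follow: Success matches the number of projects whose phases are all 🟢, and Score matches the mean of the four project scores rounded half-up to two decimals (for example, (100 + 100 + 97.61 + 95.22) / 4 = 98.2075, shown as 98.21). The Python sketch below reproduces the leaderboard under those assumptions. `PROJECT_SCORES` and `summarize` are illustrative names, not part of any published tooling, and scoring a failed or skipped project as 0 is a guess, since no such project appears in this report.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Optional

# Project scores (todo, bbs, reddit, shopping) transcribed from the tables in
# this report. Strings keep the decimals exact. None is a hypothetical marker
# for a project that failed outright or was not executed.
PROJECT_SCORES: dict[str, list[Optional[str]]] = {
    "z-ai/glm-5": ["100", "100", "100", "100"],
    "qwen/qwen3-coder-next": ["100", "100", "97.61", "95.22"],
    "deepseek/deepseek-v3.1-terminus-exacto": ["100", "92.31", "99.34", "100"],
    "openai/gpt-4.1-mini": ["100", "96.73", "92.3", "95.48"],
    "qwen/qwen3-next-80b-a3b-instruct": ["96.84", "96.17", "92.99", "94.68"],
    "qwen/qwen3-30b-a3b-thinking-2507": ["97.6", "94.07", "90", "90"],
}


def summarize(raw: list[Optional[str]]) -> tuple[int, Decimal, str]:
    """Return (Success, Score, Status) for one model's project scores."""
    scores = [Decimal(s) if s is not None else None for s in raw]
    # Assumption: a project is fully successful when it scored 100, which in
    # these tables coincides with every phase being 🟢.
    success = sum(1 for s in scores if s == 100)
    # Assumption: failed or skipped projects count as 0 toward the mean.
    total = sum((s if s is not None else Decimal(0) for s in scores), Decimal(0))
    score = (total / len(scores)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    if success == len(scores):
        status = "🟢"
    elif all(s is None for s in scores):
        status = "❌"
    else:
        status = "🟡"
    return success, score, status


if __name__ == "__main__":
    for model, raw in PROJECT_SCORES.items():
        success, score, status = summarize(raw)
        print(f"{model}: success={success} score={score} {status}")
```

`Decimal` is used instead of `float` because three of the means (98.2075, 96.1275, 92.9175) land exactly on a rounding tie, where binary floating point could round either way.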
| Project | Score | Analyze | Prisma | Interface | Test | Realize |
|---|---|---|---|---|---|---|
| todo | 100 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| bbs | 100 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| reddit | 100 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| shopping | 100 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
- Source Code: `z-ai/glm-5/todo`
- Score: 100
- Elapsed Time: 2h 51m 32s
- Token Usage: 25.86M
- Function Calling Success Rate: 73.81%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 1, documents: 11 | 653.8K | 13m 27s | 96% |
| 🟢 Database | namespaces: 2, models: 4 | 776.6K | 15m 0s | 96% |
| 🟢 Interface | operations: 18, schemas: 22 | 17.09M | 1h 25m 27s | 62% |
| 🟢 Test | functions: 56 | 5.75M | 34m 1s | 82% |
| 🟢 Realize | functions: 26 | 1.59M | 23m 35s | 86% |
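In the per-project breakdowns, the phase rows appear to add up to the project totals. For `z-ai/glm-5/todo`, the five Token Usage values sum to 25,860,400 tokens (reported as 25.86M), and the phase durations sum to 2h 51m 30s, two seconds short of the reported 2h 51m 32s, presumably from rounding or time spent outside the phases. The Python sketch below checks this by parsing the report's display formats; `parse_tokens`, `parse_duration` and `format_duration` are hypothetical helpers, not part of the benchmark's tooling.

```python
import re

# Multipliers for the suffixes used in the "Token Usage" columns.
_TOKEN_SUFFIX = {"K": 1_000, "M": 1_000_000}
# Seconds per unit for the "Elapsed Time" columns.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_tokens(text: str) -> int:
    """Convert a display value such as '653.8K' or '17.09M' to a token count."""
    text = text.strip()
    if text[-1] in _TOKEN_SUFFIX:
        # round() absorbs float noise such as 653.8 * 1000 = 653800.0000000001
        return round(float(text[:-1]) * _TOKEN_SUFFIX[text[-1]])
    return int(text)


def parse_duration(text: str) -> int:
    """Convert a display value such as '1h 25m 27s' to seconds."""
    parts = re.findall(r"(\d+)\s*([hms])", text)
    if not parts:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[unit] for n, unit in parts)


def format_duration(seconds: int) -> str:
    """Render seconds in the report's 'Xh Ym Zs' style."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s"


# (phase, token usage, elapsed time) for z-ai/glm-5/todo, copied from the table above.
PHASES = [
    ("Analyze", "653.8K", "13m 27s"),
    ("Database", "776.6K", "15m 0s"),
    ("Interface", "17.09M", "1h 25m 27s"),
    ("Test", "5.75M", "34m 1s"),
    ("Realize", "1.59M", "23m 35s"),
]

total_tokens = sum(parse_tokens(tokens) for _, tokens, _ in PHASES)
total_seconds = sum(parse_duration(elapsed) for _, _, elapsed in PHASES)

if __name__ == "__main__":
    print(f"tokens: {total_tokens:,} (reported 25.86M)")
    print(f"time:   {format_duration(total_seconds)} (reported 2h 51m 32s)")
```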
- Source Code: `z-ai/glm-5/bbs`
- Score: 100
- Elapsed Time: 17h 57m 0s
- Token Usage: 95.01M
- Function Calling Success Rate: 76.83%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 1, documents: 16 | 1.14M | 20m 9s | 93% |
| 🟢 Database | namespaces: 9, models: 28 | 3.58M | 20m 46s | 98% |
| 🟢 Interface | operations: 59, schemas: 79 | 65.37M | 11h 46m 37s | 65% |
| 🟢 Test | functions: 188 | 17.81M | 4h 41m 46s | 93% |
| 🟢 Realize | functions: 96 | 7.10M | 47m 41s | 81% |
- Source Code: `z-ai/glm-5/reddit`
- Score: 100
- Elapsed Time: 13h 10m 46s
- Token Usage: 112.01M
- Function Calling Success Rate: 72.37%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 1, documents: 11 | 718.5K | 27m 47s | 89% |
| 🟢 Database | namespaces: 7, models: 22 | 3.21M | 46m 19s | 97% |
| 🟢 Interface | operations: 73, schemas: 82 | 71.72M | 4h 39m 51s | 64% |
| 🟢 Test | functions: 188 | 24.51M | 5h 59m 46s | 80% |
| 🟢 Realize | functions: 113 | 11.85M | 1h 17m 1s | 77% |
- Source Code: `z-ai/glm-5/shopping`
- Score: 100
- Elapsed Time: 7h 51m 6s
- Token Usage: 252.94M
- Function Calling Success Rate: 71.21%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 3, documents: 17 | 1.24M | 35m 23s | 95% |
| 🟢 Database | namespaces: 8, models: 33 | 5.02M | 27m 35s | 96% |
| 🟢 Interface | operations: 119, schemas: 148 | 167.10M | 3h 45m 7s | 63% |
| 🟢 Test | functions: 325 | 56.62M | 1h 15m 45s | 83% |
| 🟢 Realize | functions: 178 | 22.96M | 1h 47m 14s | 70% |
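The aggregated FCSR figures are not plain means of their parts. For `z-ai/glm-5`, the four projects report 73.81%, 76.83%, 72.37% and 71.21%, which average 73.56%, while the summary table lists 72%. Likewise, the `todo` project's 73.81% sits well below the 84.4% mean of its phase rates, near the 62% of its Interface phase, which consumed most of that project's tokens. One plausible reading, not confirmed by this report, is that rates are pooled over raw call counts (a micro-average), so phases and projects with more calls weigh more. The Python sketch below contrasts the two averages; `CallStats` and the counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallStats:
    """Function-call outcomes for one phase or project (hypothetical counts)."""

    succeeded: int
    attempted: int

    @property
    def rate(self) -> float:
        return self.succeeded / self.attempted


def macro_rate(stats: list[CallStats]) -> float:
    """Unweighted mean of the individual success rates."""
    return sum(s.rate for s in stats) / len(stats)


def micro_rate(stats: list[CallStats]) -> float:
    """Pooled rate: total successes over total attempts, so larger units weigh more."""
    return sum(s.succeeded for s in stats) / sum(s.attempted for s in stats)


# Made-up counts: a small unit with a high rate and a large unit with a low one.
units = [CallStats(succeeded=90, attempted=100), CallStats(succeeded=700, attempted=1000)]

if __name__ == "__main__":
    print(f"macro: {macro_rate(units):.2%}")  # 80.00%
    print(f"micro: {micro_rate(units):.2%}")  # 71.82%
```

With pooling, the large low-rate unit pulls the aggregate toward its own rate, which is the same direction as the gap between 73.56% and 72% above.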
| Project | Score | Analyze | Prisma | Interface | Test | Realize |
|---|---|---|---|---|---|---|
| todo | 100 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| bbs | 100 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| reddit | 97.61 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 |
| shopping | 95.22 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 |
- Source Code: `qwen/qwen3-coder-next/todo`
- Score: 100
- Elapsed Time: 2h 15m 51s
- Token Usage: 38.78M
- Function Calling Success Rate: 69.69%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 1, documents: 11 | 577.1K | 10m 22s | 92% |
| 🟢 Database | namespaces: 2, models: 7 | 1.38M | 3m 13s | 94% |
| 🟢 Interface | operations: 20, schemas: 25 | 25.03M | 27m 22s | 55% |
| 🟢 Test | functions: 49 | 8.39M | 58m 57s | 82% |
| 🟢 Realize | functions: 30 | 3.40M | 35m 54s | 82% |
- Source Code: `qwen/qwen3-coder-next/bbs`
- Score: 100
- Elapsed Time: 3h 22m 36s
- Token Usage: 137.74M
- Function Calling Success Rate: 55.03%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 4, documents: 12 | 774.3K | 12m 35s | 96% |
| 🟢 Database | namespaces: 4, models: 21 | 3.18M | 5m 36s | 86% |
| 🟢 Interface | operations: 76, schemas: 77 | 92.65M | 42m 3s | 49% |
| 🟢 Test | functions: 177 | 25.45M | 1h 33m 51s | 52% |
| 🟢 Realize | functions: 107 | 15.68M | 48m 29s | 65% |
- Source Code: `qwen/qwen3-coder-next/reddit`
- Score: 97.61
- Elapsed Time: 5h 37m 56s
- Token Usage: 331.75M
- Function Calling Success Rate: 47.50%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 4, documents: 12 | 877.8K | 18m 41s | 90% |
| 🟢 Database | namespaces: 7, models: 51 | 7.73M | 21m 37s | 86% |
| 🟢 Interface | operations: 124, schemas: 117 | 206.26M | 2h 0m 34s | 30% |
| 🟢 Test | functions: 324 | 78.84M | 1h 24m 45s | 64% |
| 🟡 Realize | functions: 176, errors: 7 | 38.04M | 1h 32m 18s | 73% |
- Source Code: `qwen/qwen3-coder-next/shopping`
- Score: 95.22
- Elapsed Time: 8h 0m 16s
- Token Usage: 783.88M
- Function Calling Success Rate: 50.78%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 3, documents: 12 | 1.13M | 32m 58s | 87% |
| 🟢 Database | namespaces: 9, models: 72 | 11.44M | 11m 33s | 86% |
| 🟢 Interface | operations: 183, schemas: 217 | 495.03M | 2h 7m 58s | 42% |
| 🟢 Test | functions: 483 | 145.61M | 2h 9m 41s | 64% |
| 🟡 Realize | functions: 289, errors: 23 | 130.66M | 2h 58m 5s | 49% |
| Project | Score | Analyze | Prisma | Interface | Test | Realize |
|---|---|---|---|---|---|---|
| todo | 100 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| bbs | 92.31 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 |
| reddit | 99.34 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 |
| shopping | 100 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
- Source Code: `deepseek/deepseek-v3.1-terminus-exacto/todo`
- Score: 100
- Elapsed Time: 4h 12m 35s
- Token Usage: 68.88M
- Function Calling Success Rate: 87.29%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 1, documents: 11 | 761.3K | 3m 17s | 100% |
| 🟢 Database | namespaces: 5, models: 44 | 5.74M | 13m 37s | 89% |
| 🟢 Interface | operations: 43, schemas: 57 | 36.74M | 2h 9m 10s | 85% |
| 🟢 Test | functions: 119 | 15.38M | 52m 55s | 94% |
| 🟢 Realize | functions: 66 | 10.27M | 53m 34s | 78% |
- Source Code: `deepseek/deepseek-v3.1-terminus-exacto/bbs`
- Score: 92.31
- Elapsed Time: 23h 21m 41s
- Token Usage: 654.70M
- Function Calling Success Rate: 84.86%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 3, documents: 11 | 682.8K | 13m 45s | 100% |
| 🟢 Database | namespaces: 7, models: 76 | 10.12M | 28m 25s | 93% |
| 🟢 Interface | operations: 438, schemas: 323 | 255.69M | 13h 37m 39s | 85% |
| 🟢 Test | functions: 1195 | 232.70M | 3h 21m 43s | 92% |
| 🟡 Realize | functions: 585, errors: 75 | 155.49M | 5h 40m 7s | 73% |
- Source Code: `deepseek/deepseek-v3.1-terminus-exacto/reddit`
- Score: 99.34
- Elapsed Time: 12h 16m 21s
- Token Usage: 448.48M
- Function Calling Success Rate: 86.90%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 3, documents: 12 | 858.7K | 13m 25s | 100% |
| 🟢 Database | namespaces: 7, models: 87 | 9.87M | 30m 54s | 93% |
| 🟢 Interface | operations: 322, schemas: 290 | 249.02M | 5h 50m 23s | 83% |
| 🟢 Test | functions: 978 | 126.57M | 3h 26m 59s | 94% |
| 🟡 Realize | functions: 457, errors: 5 | 62.17M | 2h 14m 38s | 82% |
- Source Code: `deepseek/deepseek-v3.1-terminus-exacto/shopping`
- Score: 100
- Elapsed Time: 16h 56m 2s
- Token Usage: 504.08M
- Function Calling Success Rate: 85.34%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 4, documents: 11 | 893.3K | 19m 56s | 100% |
| 🟢 Database | namespaces: 7, models: 99 | 14.61M | 34m 56s | 93% |
| 🟢 Interface | operations: 351, schemas: 305 | 238.75M | 9h 44m 1s | 82% |
| 🟢 Test | functions: 939 | 171.90M | 3h 54m 10s | 91% |
| 🟢 Realize | functions: 490 | 77.92M | 2h 22m 57s | 78% |
| Project | Score | Analyze | Prisma | Interface | Test | Realize |
|---|---|---|---|---|---|---|
| todo | 100 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| bbs | 96.73 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 |
| reddit | 92.3 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 |
| shopping | 95.48 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 |
- Source Code: `openai/gpt-4.1-mini/todo`
- Score: 100
- Elapsed Time: 48m 34s
- Token Usage: 21.35M
- Function Calling Success Rate: 80.62%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 1, documents: 12 | 502.6K | 2m 14s | 100% |
| 🟢 Database | namespaces: 2, models: 6 | 2.16M | 7m 9s | 34% |
| 🟢 Interface | operations: 22, schemas: 31 | 12.82M | 18m 22s | 77% |
| 🟢 Test | functions: 52 | 3.94M | 7m 41s | 100% |
| 🟢 Realize | functions: 34 | 1.92M | 13m 6s | 93% |
- Source Code: `openai/gpt-4.1-mini/bbs`
- Score: 96.73
- Elapsed Time: 3h 48m 42s
- Token Usage: 205.88M
- Function Calling Success Rate: 84.53%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 4, documents: 12 | 492.0K | 3m 8s | 100% |
| 🟢 Database | namespaces: 7, models: 36 | 9.66M | 17m 19s | 34% |
| 🟢 Interface | operations: 231, schemas: 175 | 110.43M | 45m 18s | 75% |
| 🟢 Test | functions: 536 | 60.65M | 1h 58m 45s | 98% |
| 🟡 Realize | functions: 312, errors: 17 | 24.65M | 44m 10s | 94% |
- Source Code: `openai/gpt-4.1-mini/reddit`
- Score: 92.3
- Elapsed Time: 3h 46m 59s
- Token Usage: 234.20M
- Function Calling Success Rate: 81.86%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 4, documents: 12 | 522.3K | 4m 17s | 100% |
| 🟢 Database | namespaces: 7, models: 43 | 14.48M | 22m 51s | 27% |
| 🟢 Interface | operations: 220, schemas: 188 | 116.41M | 53m 9s | 75% |
| 🟢 Test | functions: 518 | 59.39M | 54m 42s | 98% |
| 🟡 Realize | functions: 304, errors: 39 | 43.40M | 1h 31m 58s | 90% |
- Source Code: `openai/gpt-4.1-mini/shopping`
- Score: 95.48
- Elapsed Time: 4h 16m 46s
- Token Usage: 394.57M
- Function Calling Success Rate: 83.78%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 3, documents: 14 | 665.7K | 3m 18s | 100% |
| 🟢 Database | namespaces: 10, models: 66 | 26.46M | 23m 5s | 25% |
| 🟢 Interface | operations: 330, schemas: 348 | 214.79M | 1h 2m 14s | 79% |
| 🟢 Test | functions: 807 | 93.42M | 1h 25m 3s | 98% |
| 🟡 Realize | functions: 491, errors: 37 | 59.24M | 1h 23m 3s | 95% |
| Project | Score | Analyze | Prisma | Interface | Test | Realize |
|---|---|---|---|---|---|---|
| todo | 96.84 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 |
| bbs | 96.17 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 |
| reddit | 92.99 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 |
| shopping | 94.68 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 |
- Source Code: `qwen/qwen3-next-80b-a3b-instruct/todo`
- Score: 96.84
- Elapsed Time: 2h 48m 37s
- Token Usage: 19.64M
- Function Calling Success Rate: 81.98%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 1, documents: 11 | 1.20M | 11m 19s | 72% |
| 🟢 Database | namespaces: 4, models: 7 | 974.3K | 5m 36s | 90% |
| 🟢 Interface | operations: 13, schemas: 16 | 10.95M | 1h 57m 53s | 76% |
| 🟢 Test | functions: 27 | 3.70M | 14m 35s | 91% |
| 🟡 Realize | functions: 19, errors: 1 | 2.81M | 19m 12s | 84% |
- Source Code: `qwen/qwen3-next-80b-a3b-instruct/bbs`
- Score: 96.17
- Elapsed Time: 4h 38m 10s
- Token Usage: 130.68M
- Function Calling Success Rate: 69.23%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 3, documents: 12 | 1.52M | 22m 21s | 70% |
| 🟢 Database | namespaces: 2, models: 23 | 6.45M | 44m 11s | 70% |
| 🟢 Interface | operations: 66, schemas: 72 | 74.19M | 1h 18m 12s | 55% |
| 🟢 Test | functions: 135 | 27.33M | 48m 30s | 82% |
| 🟡 Realize | functions: 94, errors: 6 | 21.20M | 1h 24m 54s | 80% |
- Source Code: `qwen/qwen3-next-80b-a3b-instruct/reddit`
- Score: 92.99
- Elapsed Time: 6h 44m 15s
- Token Usage: 188.24M
- Function Calling Success Rate: 70.90%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 5, documents: 13 | 1.07M | 8m 16s | 64% |
| 🟢 Database | namespaces: 5, models: 29 | 3.36M | 5m 33s | 80% |
| 🟢 Interface | operations: 117, schemas: 92 | 100.15M | 2h 11m 25s | 58% |
| 🟢 Test | functions: 231 | 46.45M | 55m 8s | 78% |
| 🟡 Realize | functions: 154, errors: 18 | 37.21M | 3h 23m 51s | 79% |
- Source Code: `qwen/qwen3-next-80b-a3b-instruct/shopping`
- Score: 94.68
- Elapsed Time: 4h 39m 52s
- Token Usage: 200.59M
- Function Calling Success Rate: 71.14%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 3, documents: 13 | 1.49M | 15m 57s | 68% |
| 🟢 Database | namespaces: 10, models: 35 | 6.51M | 10m 55s | 83% |
| 🟢 Interface | operations: 79, schemas: 106 | 113.56M | 1h 17m 43s | 56% |
| 🟢 Test | functions: 175 | 49.59M | 1h 3m 24s | 81% |
| 🟡 Realize | functions: 124, errors: 11 | 29.44M | 1h 51m 52s | 80% |
| Project | Score | Analyze | Prisma | Interface | Test | Realize |
|---|---|---|---|---|---|---|
| todo | 97.6 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 |
| bbs | 94.07 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 |
| reddit | 90 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 |
| shopping | 90 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 |
- Source Code: `qwen/qwen3-30b-a3b-thinking-2507/todo`
- Score: 97.6
- Elapsed Time: 5h 7m 11s
- Token Usage: 31.14M
- Function Calling Success Rate: 74.21%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 1, documents: 11 | 406.8K | 5m 11s | 100% |
| 🟢 Database | namespaces: 2, models: 6 | 882.7K | 10m 1s | 81% |
| 🟢 Interface | operations: 17, schemas: 21 | 18.07M | 1h 38m 44s | 54% |
| 🟢 Test | functions: 36 | 7.77M | 1h 42m 24s | 94% |
| 🟡 Realize | functions: 25, errors: 1 | 4.01M | 1h 30m 50s | 84% |
- Source Code: `qwen/qwen3-30b-a3b-thinking-2507/bbs`
- Score: 94.07
- Elapsed Time: 6h 1m 46s
- Token Usage: 99.95M
- Function Calling Success Rate: 72.82%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 3, documents: 11 | 546.4K | 7m 55s | 92% |
| 🟢 Database | namespaces: 6, models: 19 | 1.95M | 12m 37s | 86% |
| 🟢 Interface | operations: 50, schemas: 71 | 61.41M | 2h 20m 12s | 57% |
| 🟢 Test | functions: 88 | 22.35M | 58m 42s | 93% |
| 🟡 Realize | functions: 81, errors: 8 | 13.69M | 2h 22m 17s | 79% |
- Source Code: `qwen/qwen3-30b-a3b-thinking-2507/reddit`
- Score: 90
- Elapsed Time: 7h 46m 21s
- Token Usage: 140.42M
- Function Calling Success Rate: 75.51%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 2, documents: 14 | 717.4K | 5m 11s | 100% |
| 🟢 Database | namespaces: 8, models: 26 | 3.15M | 17m 36s | 82% |
| 🟢 Interface | operations: 65, schemas: 90 | 74.29M | 2h 57m 44s | 61% |
| 🟢 Test | functions: 133 | 30.67M | 1h 17m 42s | 92% |
| 🟡 Realize | functions: 104, errors: 30 | 31.58M | 3h 8m 7s | 78% |
- Source Code: `qwen/qwen3-30b-a3b-thinking-2507/shopping`
- Score: 90
- Elapsed Time: 6h 27m 24s
- Token Usage: 161.24M
- Function Calling Success Rate: 76.16%
| Phase | Generated | Token Usage | Elapsed Time | FCSR |
|---|---|---|---|---|
| 🟢 Analyze | actors: 3, documents: 12 | 804.8K | 7m 47s | 92% |
| 🟢 Database | namespaces: 6, models: 38 | 4.09M | 21m 16s | 85% |
| 🟢 Interface | operations: 80, schemas: 103 | 81.86M | 1h 58m 56s | 63% |
| 🟢 Test | functions: 147 | 39.54M | 1h 12m 31s | 89% |
| 🟡 Realize | functions: 126, errors: 27 | 34.95M | 2h 46m 51s | 78% |