Cli by ErvinXie · Pull Request #1765 · kvcache-ai/ktransformers

ErvinXie · 2025-12-29T03:51:44Z

Fix Cli

gemini-code-assist · 2025-12-29T03:52:08Z

Summary of Changes

Hello @ErvinXie, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the kt-kernel command-line interface by enhancing diagnostic capabilities and improving the flexibility of the model inference server (run) command. The doctor command now offers comprehensive insights into kt-kernel installations, while the run command's migration to click allows for seamless passthrough of advanced SGLang server options, making the CLI more powerful and adaptable to various deployment scenarios. Additionally, model discovery has been made more robust through recursive path searching.

Highlights

Enhanced doctor command: The doctor command now provides detailed diagnostics for kt-kernel installations, including version, CPU variant (AMX, AVX512, AVX2), and installation paths, offering specific hints and status indicators. It also removes the check for the ktransformers package.
Refactored run command with click: The run command has been migrated from typer to click to enable passing arbitrary, unknown command-line arguments directly to the underlying SGLang server, significantly increasing its flexibility and extensibility.
Improved Model Path Discovery: The _find_model_path and find_local_models utilities now support recursive searching for model directories up to a specified depth, making model discovery more robust and capable of finding models nested deeper within configured model paths.
Dynamic Argument Handling for run: The main CLI entry point now specially handles the run command, bypassing typer's default argument parsing and delegating to click for run's specific argument handling, including the ability to process and forward unrecognized options to the SGLang server.
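
The passthrough idea behind the run command can be illustrated in isolation. The sketch below is not the PR's implementation: it swaps in the standard library's argparse.parse_known_args instead of click, and the helper name split_run_args, the --model-path option, and the forwarded flags are all hypothetical, chosen only to show how recognized options can be separated from ones that are forwarded untouched.

```python
import argparse


def split_run_args(argv):
    """Separate options the wrapper understands from ones to forward.

    Hypothetical helper: the real `run` command uses click; argparse's
    parse_known_args is swapped in here only to show the idea.
    """
    parser = argparse.ArgumentParser(prog="kt run", add_help=False)
    # Hypothetical wrapper-level option; not taken from the PR.
    parser.add_argument("--model-path", default=None)
    known, unknown = parser.parse_known_args(argv)
    # `unknown` keeps the unrecognized tokens in their original order, ready
    # to be appended to the command line of the underlying server process.
    return known, unknown


known, forwarded = split_run_args(
    ["--model-path", "/models/demo", "--some-server-flag", "8", "--verbose"]
)
print(known.model_path)  # /models/demo
print(forwarded)         # ['--some-server-flag', '8', '--verbose']
```

One caveat of this stand-in is that argparse accepts unambiguous prefixes of known options by default, so a forwarded flag that happens to be a prefix of a wrapper option would be consumed by the wrapper rather than forwarded.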

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request significantly enhances the kt-kernel CLI by upgrading the run command to use click instead of typer, allowing it to accept and pass through arbitrary, unknown command-line arguments directly to the underlying SGLang server. This refactoring involved creating a new internal _run_impl function and special handling in main.py to manage click's argument parsing. Additionally, the doctor command is updated to provide detailed kt-kernel installation diagnostics, including CPU variant detection (AMX, AVX512, AVX2), version, and installation paths, and removes ktransformers from its package checks. The model path discovery logic in both run.py and model_registry.py was modified to support recursive searching up to a specified max_depth for local models. Review comments highlighted critical issues with the initial implementation of CPU variant parsing in doctor.py, noting its fragility and suggesting a more robust splitting method. The recursive model search logic in both _find_model_path and find_local_models was also identified as flawed, inefficiently using rglob and ignoring the max_depth parameter, with a more efficient glob-pattern-based approach suggested for correction.

gemini-code-assist · 2025-12-29T03:54:58Z

kt-kernel/python/cli/commands/doctor.py

+            if "_kt_kernel_ext_" in basename:
+                # Extract variant from _kt_kernel_ext_amx.cpython-311-x86_64-linux-gnu.so
+                parts = basename.split("_")
+                if len(parts) >= 4:
+                    variant = parts[3]  # "amx" from "_kt_kernel_ext_amx..."
+                    if variant.startswith("avx"):
+                        # Normalize avx variants
+                        if variant in ["avx512", "avx512_bf16", "avx512_vbmi", "avx512_vnni", "avx512_base"]:
+                            variants.add("avx512")
+                        else:
+                            variants.add(variant)
+                    else:
+                        variants.add(variant)


The logic for parsing the CPU variant from the .so filename is incorrect. basename.split("_") is fragile, and parts[3] will not correctly extract the variant. For a filename like _kt_kernel_ext_amx.cpython-311-x86_64-linux-gnu.so, parts[3] would be 'ext', not 'amx'. This will cause incorrect variant detection and lead to wrong recommendations in the doctor command. A more robust method, such as splitting on the known prefix _kt_kernel_ext_, should be used.

Suggested change

-            if "_kt_kernel_ext_" in basename:
-                # Extract variant from _kt_kernel_ext_amx.cpython-311-x86_64-linux-gnu.so
-                parts = basename.split("_")
-                if len(parts) >= 4:
-                    variant = parts[3]  # "amx" from "_kt_kernel_ext_amx..."
-                    if variant.startswith("avx"):
-                        # Normalize avx variants
-                        if variant in ["avx512", "avx512_bf16", "avx512_vbmi", "avx512_vnni", "avx512_base"]:
-                            variants.add("avx512")
-                        else:
-                            variants.add(variant)
-                    else:
-                        variants.add(variant)
+            if "_kt_kernel_ext_" in basename:
+                # Extract variant from _kt_kernel_ext_amx.cpython-311-x86_64-linux-gnu.so
+                try:
+                    variant_part = basename.split("_kt_kernel_ext_")[1]
+                    variant = variant_part.split(".")[0]
+                    if variant.startswith("avx") and variant in {"avx512", "avx512_bf16", "avx512_vbmi", "avx512_vnni", "avx512_base"}:
+                        variants.add("avx512")
+                    else:
+                        variants.add(variant)
+                except IndexError:
+                    # Could not parse variant, skip this file.
+                    pass
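
The parsing problem raised in this comment can be reproduced on its own. The following is a minimal sketch, assuming the filename shape quoted in the comment; parse_variant, VARIANT_PREFIX, and AVX512_FAMILY are hypothetical names, and the avx512_bf16 filename is an assumed example rather than a file known to ship with kt-kernel.

```python
VARIANT_PREFIX = "_kt_kernel_ext_"
# AVX-512 sub-variants named in the review comment, folded into one label.
AVX512_FAMILY = {"avx512", "avx512_bf16", "avx512_vbmi", "avx512_vnni", "avx512_base"}


def parse_variant(basename):
    """Return the CPU variant encoded in an extension filename, or None."""
    # partition() never raises; `found` is empty when the prefix is absent.
    _, found, rest = basename.partition(VARIANT_PREFIX)
    if not found:
        return None
    # Everything up to the first dot is the variant, underscores included.
    variant = rest.split(".", 1)[0]
    if not variant:
        return None
    return "avx512" if variant in AVX512_FAMILY else variant


name = "_kt_kernel_ext_amx.cpython-311-x86_64-linux-gnu.so"
print(name.split("_")[3])   # ext  (the index used in the original code)
print(parse_variant(name))  # amx
print(parse_variant("_kt_kernel_ext_avx512_bf16.cpython-311-x86_64-linux-gnu.so"))  # avx512
```

Splitting on the known prefix also keeps variants that contain underscores intact, which an index into split("_") cannot do.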

gemini-code-assist · 2025-12-29T03:54:58Z

kt-kernel/python/cli/commands/run.py

+        # Search recursively up to max_depth
+        for depth in range(max_depth):
+            for name in possible_names:
+                if depth == 0:
+                    # Direct children: models_dir / name
+                    search_paths = [models_dir / name]
+                else:
+                    # Nested: use rglob to find directories matching the name
+                    search_paths = list(models_dir.rglob(name))

-        for path in possible_paths:
-            if path.exists() and (path / "config.json").exists():
-                return path
+                for path in search_paths:
+                    if path.exists() and (path / "config.json").exists():
+                        return path


The recursive search logic in _find_model_path is flawed. The else block uses models_dir.rglob(name), which searches at all depths, ignoring the max_depth parameter from the docstring. Additionally, the for depth in range(max_depth) loop causes rglob to be called multiple times redundantly if a model is not found at depth=1. This is inefficient and does not respect the max_depth constraint. A correct implementation should build a glob pattern for each depth.

Suggested change

-        # Search recursively up to max_depth
-        for depth in range(max_depth):
-            for name in possible_names:
-                if depth == 0:
-                    # Direct children: models_dir / name
-                    search_paths = [models_dir / name]
-                else:
-                    # Nested: use rglob to find directories matching the name
-                    search_paths = list(models_dir.rglob(name))
-        for path in possible_paths:
-            if path.exists() and (path / "config.json").exists():
-                return path
-                for path in search_paths:
-                    if path.exists() and (path / "config.json").exists():
-                        return path
+        # Search recursively up to max_depth
+        for name in possible_names:
+            for depth in range(max_depth):
+                pattern = "*/" * depth + name
+                for path in models_dir.glob(pattern):
+                    if path.is_dir() and (path / "config.json").exists():
+                        return path
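
The difference between rglob and a per-depth glob pattern can be checked against a throwaway directory tree. This is a sketch under assumptions: find_model_dir is a hypothetical single-name variant of the suggested loop, and the directory layout and the name demo-model are invented for the demonstration.

```python
import tempfile
from pathlib import Path


def find_model_dir(models_dir, name, max_depth):
    """Return the shallowest directory called `name` that holds config.json.

    Depth 0 means a direct child of models_dir; only depths below
    max_depth are searched.
    """
    for depth in range(max_depth):
        # depth=0 -> "name", depth=1 -> "*/name", depth=2 -> "*/*/name", ...
        pattern = "*/" * depth + name
        for path in sorted(models_dir.glob(pattern)):
            if path.is_dir() and (path / "config.json").exists():
                return path
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: the model sits two directories below the root.
    deep = root / "a" / "b" / "demo-model"
    deep.mkdir(parents=True)
    (deep / "config.json").write_text("{}")

    # rglob has no depth limit, so it finds the directory regardless.
    print(len(list(root.rglob("demo-model"))))            # 1
    # Depths 0 and 1 only: the nested model is out of range.
    print(find_model_dir(root, "demo-model", 2))          # None
    # Depths 0, 1 and 2: now it is found.
    print(find_model_dir(root, "demo-model", 3) == deep)  # True
```

Each glob pattern matches exactly one depth, so no directory level is scanned twice and nothing deeper than max_depth - 1 levels is ever visited.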

gemini-code-assist · 2025-12-29T03:54:58Z

kt-kernel/python/cli/utils/model_registry.py

+                # Search recursively up to max_depth
+                for depth in range(max_depth):
+                    # Build glob pattern for current depth
+                    # depth=0: direct children, depth=1: grandchildren, etc.
+                    glob_pattern = "*" if depth > 0 else ""
+                    for _ in range(depth):
+                        glob_pattern = "*/" + glob_pattern if glob_pattern else "*"
+
+                    for name in possible_names:
+                        if depth == 0:
+                            # Direct children: models_dir / name
+                            search_paths = [models_dir / name]
+                        else:
+                            # Nested: use rglob to find directories matching the name
+                            search_paths = list(models_dir.rglob(name))
+
+                        for path in search_paths:
+                            if path.exists() and (path / "config.json").exists():
+                                results.append((model, path))
+                                found = True
+                                break
+
+                        if found:
+                            break
+
+                    if found:
                        break


This function has the same flawed recursive search logic as _find_model_path in run.py. The else block uses models_dir.rglob(name), which searches at all depths, ignoring the max_depth parameter. The for depth in range(max_depth) loop also leads to redundant rglob calls. This is inefficient and does not respect the max_depth constraint. The unused glob_pattern logic from lines 313-317 is also confusing and should be removed.

Suggested change

+                # Search recursively up to max_depth
+                for name in possible_names:
+                    for depth in range(max_depth):
+                        pattern = "*/" * depth + name
+                        for path in models_dir.glob(pattern):
+                            if path.is_dir() and (path / "config.json").exists():
+                                results.append((model, path))
+                                found = True
+                                break
+                        if found:
+                            break
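
The chain of found flags in this function can also be avoided by moving the search into a helper that returns on the first hit. The sketch below is one possible shape, not the repository's code: first_match, find_local_models_sketch, the registry dictionary (model identifier mapped to candidate directory names), and the default max_depth of 3 are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def first_match(models_dir, names, max_depth):
    """Return the first directory matching any name within max_depth levels."""
    for name in names:
        for depth in range(max_depth):
            for path in sorted(models_dir.glob("*/" * depth + name)):
                if path.is_dir() and (path / "config.json").exists():
                    return path  # returning replaces the nested break/flag logic
    return None


def find_local_models_sketch(models_dir, registry, max_depth=3):
    """Collect (model, path) pairs; `registry` maps a model to candidate names."""
    results = []
    for model, names in registry.items():
        path = first_match(models_dir, names, max_depth)
        if path is not None:
            results.append((model, path))
    return results


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    nested = root / "org" / "model-a"
    nested.mkdir(parents=True)
    (nested / "config.json").write_text("{}")
    # Hypothetical registry: model-b has no directory on disk.
    registry = {"model-a": ["model-a", "model_a"], "model-b": ["model-b"]}
    found = find_local_models_sketch(root, registry)
    print([model for model, _ in found])  # ['model-a']
```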

* fix pypi cuda install (#1763)
* Update release-pypi.yml (#1764)
* fix cuda wheel build (#1766)
* Cli (#1765)
* [feat]: add custom option for kt run
* [feat]: depth 3
* [docs]: add kt-cli doc and update corresponding website (#1768)
* Remove kt-kernel-cuda, kt-kernel uses the version with cuda (#1769)
* Update release-pypi.yml (#1770)
* bump to 0.5.0.post1 (#1771)
* [ci]: Patch ci (#1772)
* [docs]: add kt-cli doc and update corresponding website
* [feat]: update issue template
* [fix]: fix moe hpp bug. (#1780) fix moe hpp init bug.
* Fix moe bug. (#1783)
* [fix]: fix moe.hpp load from file bug.
* [fix]: fix all moe hpp init bug.
* [fix]: fix moe & awq-moe ug.
* [feat](kt-sft-refactor): load from huggingface safetensor file
* [fix]: fix bugs for activation, sft forward and backward
---------
Co-authored-by: Jianwei Dong <dongjw24@mails.tsinghua.edu.cn>
Co-authored-by: ErvinXie <ervinxie@qq.com>
Co-authored-by: ZiWei Yuan <yzwliam@126.com>
Co-authored-by: Oql <1692110604@qq.com>
Co-authored-by: mrhaoxx <mr.haoxx@gmail.com>

ErvinXie added 2 commits December 24, 2025 15:55

[feat]: add custom option for kt run

3e726f6

[feat]: depth 3

9305985

ErvinXie added the run-ci label Dec 29, 2025

gemini-code-assist bot reviewed Dec 29, 2025

ErvinXie merged commit 9539ab9 into main Dec 29, 2025
9 checks passed

ErvinXie deleted the cli branch December 29, 2025 07:19

ErvinXie merged 2 commits into main from cli