Skip to content

Cli#1765

Merged
ErvinXie merged 2 commits intomainfrom
cli
Dec 29, 2025
Merged

Cli#1765
ErvinXie merged 2 commits intomainfrom
cli

Conversation

@ErvinXie
Copy link
Collaborator

Fix Cli

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @ErvinXie, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the kt-kernel command-line interface by enhancing diagnostic capabilities and improving the flexibility of the model inference server (run) command. The doctor command now offers comprehensive insights into kt-kernel installations, while the run command's migration to click allows for seamless passthrough of advanced SGLang server options, making the CLI more powerful and adaptable to various deployment scenarios. Additionally, model discovery has been made more robust through recursive path searching.

Highlights

  • Enhanced doctor command: The doctor command now provides detailed diagnostics for kt-kernel installations, including version, CPU variant (AMX, AVX512, AVX2), and installation paths, offering specific hints and status indicators. It also removes the check for the ktransformers package.
  • Refactored run command with click: The run command has been migrated from typer to click to enable passing arbitrary, unknown command-line arguments directly to the underlying SGLang server, significantly increasing its flexibility and extensibility.
  • Improved Model Path Discovery: The _find_model_path and find_local_models utilities now support recursive searching for model directories up to a specified depth, making model discovery more robust and capable of finding models nested deeper within configured model paths.
  • Dynamic Argument Handling for run: The main CLI entry point now specially handles the run command, bypassing typer's default argument parsing and delegating to click for run's specific argument handling, including the ability to process and forward unrecognized options to the SGLang server.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly enhances the kt-kernel CLI by upgrading the run command to use click instead of typer, allowing it to accept and pass through arbitrary, unknown command-line arguments directly to the underlying SGLang server. This refactoring involved creating a new internal _run_impl function and special handling in main.py to manage click's argument parsing. Additionally, the doctor command is updated to provide detailed kt-kernel installation diagnostics, including CPU variant detection (AMX, AVX512, AVX2), version, and installation paths, and removes ktransformers from its package checks. The model path discovery logic in both run.py and model_registry.py was modified to support recursive searching up to a specified max_depth for local models. Review comments highlighted critical issues with the initial implementation of CPU variant parsing in doctor.py, noting its fragility and suggesting a more robust splitting method. The recursive model search logic in both _find_model_path and find_local_models was also identified as flawed, inefficiently using rglob and ignoring the max_depth parameter, with a more efficient glob-pattern-based approach suggested for correction.

Comment on lines +64 to +76
if "_kt_kernel_ext_" in basename:
# Extract variant from _kt_kernel_ext_amx.cpython-311-x86_64-linux-gnu.so
parts = basename.split("_")
if len(parts) >= 4:
variant = parts[3] # "amx" from "_kt_kernel_ext_amx..."
if variant.startswith("avx"):
# Normalize avx variants
if variant in ["avx512", "avx512_bf16", "avx512_vbmi", "avx512_vnni", "avx512_base"]:
variants.add("avx512")
else:
variants.add(variant)
else:
variants.add(variant)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The logic for parsing the CPU variant from the .so filename is incorrect. basename.split("_") is fragile, and parts[3] will not correctly extract the variant. For a filename like _kt_kernel_ext_amx.cpython-311-x86_64-linux-gnu.so, parts[3] would be 'ext', not 'amx'. This will cause incorrect variant detection and lead to wrong recommendations in the doctor command. A more robust method, such as splitting on the known prefix _kt_kernel_ext_, should be used.

Suggested change
if "_kt_kernel_ext_" in basename:
# Extract variant from _kt_kernel_ext_amx.cpython-311-x86_64-linux-gnu.so
parts = basename.split("_")
if len(parts) >= 4:
variant = parts[3] # "amx" from "_kt_kernel_ext_amx..."
if variant.startswith("avx"):
# Normalize avx variants
if variant in ["avx512", "avx512_bf16", "avx512_vbmi", "avx512_vnni", "avx512_base"]:
variants.add("avx512")
else:
variants.add(variant)
else:
variants.add(variant)
if "_kt_kernel_ext_" in basename:
# Extract variant from _kt_kernel_ext_amx.cpython-311-x86_64-linux-gnu.so
try:
variant_part = basename.split("_kt_kernel_ext_")[1]
variant = variant_part.split(".")[0]
if variant.startswith("avx") and variant in {"avx512", "avx512_bf16", "avx512_vbmi", "avx512_vnni", "avx512_base"}:
variants.add("avx512")
else:
variants.add(variant)
except IndexError:
# Could not parse variant, skip this file.
pass

Comment on lines +607 to +619
# Search recursively up to max_depth
for depth in range(max_depth):
for name in possible_names:
if depth == 0:
# Direct children: models_dir / name
search_paths = [models_dir / name]
else:
# Nested: use rglob to find directories matching the name
search_paths = list(models_dir.rglob(name))

for path in possible_paths:
if path.exists() and (path / "config.json").exists():
return path
for path in search_paths:
if path.exists() and (path / "config.json").exists():
return path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The recursive search logic in _find_model_path is flawed. The else block uses models_dir.rglob(name), which searches at all depths, ignoring the max_depth parameter from the docstring. Additionally, the for depth in range(max_depth) loop causes rglob to be called multiple times redundantly if a model is not found at depth=1. This is inefficient and does not respect the max_depth constraint. A correct implementation should build a glob pattern for each depth.

Suggested change
# Search recursively up to max_depth
for depth in range(max_depth):
for name in possible_names:
if depth == 0:
# Direct children: models_dir / name
search_paths = [models_dir / name]
else:
# Nested: use rglob to find directories matching the name
search_paths = list(models_dir.rglob(name))
for path in possible_paths:
if path.exists() and (path / "config.json").exists():
return path
for path in search_paths:
if path.exists() and (path / "config.json").exists():
return path
# Search recursively up to max_depth
for name in possible_names:
for depth in range(max_depth):
pattern = "*/" * depth + name
for path in models_dir.glob(pattern):
if path.is_dir() and (path / "config.json").exists():
return path

Comment on lines +311 to 337
# Search recursively up to max_depth
for depth in range(max_depth):
# Build glob pattern for current depth
# depth=0: direct children, depth=1: grandchildren, etc.
glob_pattern = "*" if depth > 0 else ""
for _ in range(depth):
glob_pattern = "*/" + glob_pattern if glob_pattern else "*"

for name in possible_names:
if depth == 0:
# Direct children: models_dir / name
search_paths = [models_dir / name]
else:
# Nested: use rglob to find directories matching the name
search_paths = list(models_dir.rglob(name))

for path in search_paths:
if path.exists() and (path / "config.json").exists():
results.append((model, path))
found = True
break

if found:
break

if found:
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This function has the same flawed recursive search logic as _find_model_path in run.py. The else block uses models_dir.rglob(name), which searches at all depths, ignoring the max_depth parameter. The for depth in range(max_depth) loop also leads to redundant rglob calls. This is inefficient and does not respect the max_depth constraint. The unused glob_pattern logic from lines 313-317 is also confusing and should be removed.

                # Search recursively up to max_depth
                for name in possible_names:
                    for depth in range(max_depth):
                        pattern = "*/" * depth + name
                        for path in models_dir.glob(pattern):
                            if path.is_dir() and (path / "config.json").exists():
                                results.append((model, path))
                                found = True
                                break
                    if found:
                        break

@ErvinXie ErvinXie merged commit 9539ab9 into main Dec 29, 2025
9 checks passed
@ErvinXie ErvinXie deleted the cli branch December 29, 2025 07:19
JimmyPeilinLi added a commit that referenced this pull request Jan 20, 2026
* fix pypi cuda install (#1763)

* Update release-pypi.yml (#1764)

* fix cuda wheel build (#1766)

* Cli (#1765)

* [feat]: add custom option for kt run

* [feat]: depth 3

* [docs]: add kt-cli doc and update corresponding website (#1768)

* Remove kt-kernel-cuda, kt-kernel uses the version with cuda (#1769)

* Update release-pypi.yml (#1770)

* bump to 0.5.0.post1 (#1771)

* [ci]: Patch ci (#1772)

* [docs]: add kt-cli doc and update corresponding website

* [feat]: update issue template

* [fix]: fix moe hpp bug. (#1780)

fix moe hpp init bug.

* Fix moe bug. (#1783)

* [fix]: fix moe.hpp load from file bug.

* [fix]: fix all moe hpp init bug.

* [fix]: fix moe & awq-moe ug.

* [feat](kt-sft-refactor): load from huggingface safetensor file

* [fix]: fix bugs for activation, sft forward and backward

---------

Co-authored-by: Jianwei Dong <dongjw24@mails.tsinghua.edu.cn>
Co-authored-by: ErvinXie <ervinxie@qq.com>
Co-authored-by: ZiWei Yuan <yzwliam@126.com>
Co-authored-by: Oql <1692110604@qq.com>
Co-authored-by: mrhaoxx <mr.haoxx@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant