System Info
- Library: @huggingface/transformers
- Version: 3.8.1
- TypeScript: 5.9.3
- Node.js/Browser: 24.12.0
Environment/Platform
- Website/web-app
- Browser extension
- Server-side (e.g., Node.js, Deno, Bun)
- Desktop app (e.g., Electron)
- Other (e.g., VSCode extension)
Description
Problem Description
The TypeScript type definitions for AutoTokenizer and AutoModelForCausalLM are incomplete, causing compilation errors when trying to use methods that are available at runtime but missing from the type definitions.
Specific Issues:
- AutoTokenizer missing methods: The apply_chat_template and decode methods are not available in the AutoTokenizer type definition, even though they exist in the underlying PreTrainedTokenizer class.
- AutoModelForCausalLM missing methods: The generate method is not available in the AutoModelForCausalLM type definition, even though it exists in the underlying PreTrainedModel class.
- Incorrect inheritance hierarchy: The auto classes don't properly expose the methods from their base classes in their TypeScript definitions.
Error Messages:
Property 'decode' does not exist on type 'AutoTokenizer'.
Property 'generate' does not exist on type 'AutoModelForCausalLM'.
Property 'apply_chat_template' does not exist on type 'AutoTokenizer'.
Current Workaround:
Users are forced to use type assertions like as any or manual casting to PreTrainedTokenizer/PreTrainedModel to access these methods, which defeats the purpose of type safety.
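For illustration, the cast-based workaround pattern looks roughly like the following. This is a self-contained sketch using stand-in classes (Tokenizer, AutoFactory), not the library's real declarations; it only models the shape of the mismatch:

```typescript
// Stand-in for the instance class whose methods exist at runtime.
class Tokenizer {
  decode(ids: number[]): string {
    return ids.join(',');
  }
}

// Suppose the factory's declared return type hides the instance methods,
// as the current .d.ts effectively does for AutoTokenizer:
class AutoFactory {
  static from_pretrained(_id: string): object {
    return new Tokenizer();
  }
}

const t = AutoFactory.from_pretrained('model-id');
// `t.decode(...)` does not compile against the declared type,
// so users reach for a manual cast:
const usable = t as Tokenizer;
console.log(usable.decode([1, 2, 3])); // "1,2,3"
```

The cast compiles, but it silently bypasses the compiler's knowledge of what `from_pretrained` actually returns, which is exactly the type-safety loss described above.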
Expected Behavior
The AutoTokenizer and AutoModelForCausalLM classes should properly expose all methods from their base classes (PreTrainedTokenizer and PreTrainedModel respectively) in their TypeScript type definitions, allowing direct access to methods like apply_chat_template, decode, and generate without requiring type assertions.
Actual Behavior
The type definitions only expose the factory methods (from_pretrained) but not the actual model/tokenizer methods that are available after instantiation.
Reproduction
Code Example
import { AutoModelForCausalLM, AutoTokenizer } from '@huggingface/transformers';
async function example() {
  const tokenizer = await AutoTokenizer.from_pretrained('Xenova/functiongemma-270m-game');
  const model = await AutoModelForCausalLM.from_pretrained('Xenova/functiongemma-270m-game');

  const messages = [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' }
  ];

  // This should work, but TypeScript compilation fails:
  const formattedInput = tokenizer.apply_chat_template(messages, {
    add_generation_prompt: true,
    tokenize: true,
    return_tensors: false,
    return_dict: true,
  });

  // This should work, but TypeScript compilation fails:
  const output = await model.generate([1, 2, 3], {
    max_new_tokens: 100,
    do_sample: false,
    temperature: 0.1,
  });

  // This should work, but TypeScript compilation fails:
  const decodedText = await tokenizer.decode(output[0], {
    skip_special_tokens: true
  });

  console.log(decodedText);
}
// The above code fails to compile, with errors like:
// Property 'apply_chat_template' does not exist on type 'AutoTokenizer'
// Property 'generate' does not exist on type 'AutoModelForCausalLM'
// Property 'decode' does not exist on type 'AutoTokenizer'
Additional Context
This issue affects users who want to use the full functionality of the tokenizer and model classes while maintaining type safety. The current type definitions force users to choose between type safety (using workarounds) and accessing the library's full API.
The auto classes (AutoTokenizer, AutoModelForCausalLM) are factory classes that return instances of the actual tokenizer/model classes, so their type definitions should reflect the full interface of those instances, not just the factory methods.
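One possible shape for the fix, sketched here with stand-in classes rather than the library's real generated .d.ts: type the static factory method as returning a promise of the instance class, so all inherited methods are visible without casts.

```typescript
// Stand-in for PreTrainedTokenizer; only the typing pattern matters here.
class PreTrainedTokenizer {
  decode(ids: number[]): string {
    return ids.map(String).join(' ');
  }
}

// Declaring the factory's return as Promise<PreTrainedTokenizer> makes
// every instance method visible to the compiler:
class AutoTokenizer {
  static async from_pretrained(_id: string): Promise<PreTrainedTokenizer> {
    return new PreTrainedTokenizer();
  }
}

async function demo(): Promise<string> {
  const tokenizer = await AutoTokenizer.from_pretrained('some-model');
  // No `as any` needed: decode is part of the declared return type.
  return tokenizer.decode([1, 2, 3]);
}

demo().then(console.log); // prints "1 2 3"
```

The same pattern would apply to AutoModelForCausalLM returning `Promise<PreTrainedModel>` (or a narrower subclass), since the factories already construct those instances at runtime.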