Incorrect Type Definitions for AutoTokenizer and AutoModelForCausalLM #1495

@VinayHajare

System Info

  • Library: @huggingface/transformers
  • Version: 3.8.1
  • TypeScript: 5.9.3
  • Node.js/Browser: 24.12.0

Environment/Platform

  • Website/web-app
  • Browser extension
  • Server-side (e.g., Node.js, Deno, Bun)
  • Desktop app (e.g., Electron)
  • Other (e.g., VSCode extension)

Description

Problem Description

The TypeScript type definitions for AutoTokenizer and AutoModelForCausalLM are incomplete, causing compilation errors when trying to use methods that are available at runtime but missing from the type definitions.

Specific Issues:

  1. AutoTokenizer missing methods: The apply_chat_template and decode methods are not available in the AutoTokenizer type definition, even though they exist in the underlying PreTrainedTokenizer class.

  2. AutoModelForCausalLM missing methods: The generate method is not available in the AutoModelForCausalLM type definition, even though it exists in the underlying PreTrainedModel class.

  3. Incorrect inheritance hierarchy: The auto classes don't properly expose the methods from their base classes in their TypeScript definitions.

Error Messages:

Property 'decode' does not exist on type 'AutoTokenizer'.
Property 'generate' does not exist on type 'AutoModelForCausalLM'.
Property 'apply_chat_template' does not exist on type 'AutoTokenizer'.

Current Workaround:

Users must resort to type assertions (as any) or manual casts to PreTrainedTokenizer/PreTrainedModel to reach these methods, which defeats the purpose of type safety.
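For anyone who needs a stopgap that is a little safer than as any, a runtime-checked narrowing helper is one option. The following is only a minimal sketch: TokenizerLike, isTokenizerLike, asTokenizerLike, and fakeTokenizer are hypothetical names made up for illustration, and the interface is a hand-written stand-in, not the library's real PreTrainedTokenizer type.

```typescript
// Hypothetical minimal shape; NOT the library's real PreTrainedTokenizer type.
interface TokenizerLike {
  decode(ids: number[], options?: { skip_special_tokens?: boolean }): string;
}

// Type guard: confirms at runtime that decode() exists before narrowing.
function isTokenizerLike(value: unknown): value is TokenizerLike {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { decode?: unknown }).decode === "function"
  );
}

// Narrowing helper that fails loudly instead of silently casting to `any`.
function asTokenizerLike(value: unknown): TokenizerLike {
  if (!isTokenizerLike(value)) {
    throw new TypeError("value does not provide a decode() method");
  }
  return value;
}

// Stand-in object used only to exercise the helper (an assumption, not a real tokenizer).
const fakeTokenizer: unknown = {
  decode: (ids: number[]) => ids.join(" "),
};

const decoded = asTokenizerLike(fakeTokenizer).decode([1, 2, 3]);
console.log(decoded); // prints "1 2 3"
```

Unlike a bare cast, the helper throws if the object does not have the expected method, so a wrong assumption surfaces at the call site rather than later. It is still a workaround and does not replace correct type definitions.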

Expected Behavior

The AutoTokenizer and AutoModelForCausalLM classes should properly expose all methods from their base classes (PreTrainedTokenizer and PreTrainedModel respectively) in their TypeScript type definitions, allowing direct access to methods like apply_chat_template, decode, and generate without requiring type assertions.

Actual Behavior

The type definitions only expose the factory methods (from_pretrained) but not the actual model/tokenizer methods that are available after instantiation.

Reproduction

Code Example

import { AutoModelForCausalLM, AutoTokenizer } from '@huggingface/transformers';

async function example() {
    const tokenizer = await AutoTokenizer.from_pretrained('Xenova/functiongemma-270m-game');
    const model = await AutoModelForCausalLM.from_pretrained('Xenova/functiongemma-270m-game');

    const messages = [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: 'Hello!' }
    ];

    // This should work but TypeScript compilation fails:
    // (return_dict: true yields an object with input_ids and attention_mask tensors)
    const formattedInput = tokenizer.apply_chat_template(messages, {
        add_generation_prompt: true,
        tokenize: true,
        return_dict: true,
    });

    // This should work but TypeScript compilation fails:
    // (generate() takes a single options object that includes the inputs)
    const output = await model.generate({
        ...formattedInput,
        max_new_tokens: 100,
        do_sample: false,
    });

    // This should work but TypeScript compilation fails:
    // (decode() is synchronous, so no await is needed)
    const decodedText = tokenizer.decode(output[0], {
        skip_special_tokens: true,
    });

    console.log(decodedText);
}

// The above code fails to compile with errors like:
// Property 'apply_chat_template' does not exist on type 'AutoTokenizer'
// Property 'generate' does not exist on type 'AutoModelForCausalLM'
// Property 'decode' does not exist on type 'AutoTokenizer'

Additional Context

This issue affects users who want to use the full functionality of the tokenizer and model classes while maintaining type safety. The current type definitions force users to choose between type safety and access to the library's full API, because the available workarounds give up type checking.

The auto classes (AutoTokenizer, AutoModelForCausalLM) are factory classes that return instances of the actual tokenizer/model classes, so their type definitions should reflect the full interface of those instances, not just the factory methods.
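To illustrate the factory point above: in TypeScript, the methods a caller can see on the result come from the declared return type of the static factory method, not from the factory class itself. The sketch below uses hypothetical stand-in classes (SketchTokenizer, SketchAutoTokenizer) and a made-up model id; it is not the library's code, only the typing pattern I would expect the definitions to follow.

```typescript
// Hypothetical stand-ins; names echo the issue but this is NOT the library's code.
class SketchTokenizer {
  private readonly vocab: string[];

  constructor(vocab: string[]) {
    this.vocab = vocab;
  }

  // Instance method that callers want to reach after loading.
  decode(ids: number[]): string {
    return ids.map((id) => this.vocab[id] ?? "<unk>").join(" ");
  }
}

class SketchAutoTokenizer {
  // The declared return type is what callers get to use. Because it is
  // Promise<SketchTokenizer>, decode() is visible without any cast.
  static async from_pretrained(_modelId: string): Promise<SketchTokenizer> {
    return new SketchTokenizer(["hello", "world"]);
  }
}

async function demo(): Promise<string> {
  const tokenizer = await SketchAutoTokenizer.from_pretrained("some/model-id");
  // Type-checks: `tokenizer` is a SketchTokenizer, not a SketchAutoTokenizer.
  return tokenizer.decode([0, 1, 5]);
}

demo().then((text) => console.log(text)); // prints "hello world <unk>"
```

If the factory's return type were instead declared as the factory class, the decode() call in demo() would fail with the same kind of "Property does not exist" error listed above.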
