fix(deps): update dependency transformers to v4.49.0 - autoclosed #242
This PR contains the following updates:
4.48.3 -> 4.49.0
Release Notes
huggingface/transformers (transformers)
v4.49.0: Helium, Qwen2.5-VL, SuperGlue, Granite Vision, Zamba2, GOT-OCR 2.0, DAB-DETR, Depth Pro, RT-DETRv2
New models
Helium
Helium-1 preview is a lightweight language model with 2B parameters, targeting edge and mobile devices. It supports the following languages: English, French, German, Italian, Portuguese, Spanish.
Qwen2.5-VL
The Qwen2.5-VL model is an update to Qwen2-VL from the Qwen team at Alibaba Group.
The abstract from this update is the following:
Qwen2.5-VL marks a major step forward from Qwen2-VL, built upon the latest Qwen2.5 LLM. We’ve accelerated training and testing through the strategic implementation of window attention within the ViT. The ViT architecture itself has been refined with SwiGLU and RMSNorm, aligning it more closely with the LLM’s structure. A key innovation is the expansion of native dynamic resolution to encompass the temporal dimension, in addition to spatial aspects. Furthermore, we’ve upgraded MRoPE, incorporating absolute time alignment on the time axis to allow the model to effectively capture temporal dynamics, regardless of frame rate, leading to superior video understanding.
SuperGlue
The SuperGlue model was proposed in SuperGlue: Learning Feature Matching with Graph Neural Networks by Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz and Andrew Rabinovich.
This model consists of matching two sets of interest points detected in an image. Paired with the SuperPoint model, it can be used to match two images and estimate the pose between them. This model is useful for tasks such as image matching, homography estimation, etc.
Granite Vision Support
The Granite Vision model is a variant of LLaVA-NeXT, leveraging a Granite language model alongside a SigLIP visual encoder. It utilizes multiple concatenated vision hidden states as its image features, similar to VipLlava. It also uses a larger set of image grid pinpoints than the original LLaVA-NeXT models to support additional aspect ratios.
Zamba2
Zamba2 is a large language model (LLM) trained by Zyphra, and made available under an Apache 2.0 license.
Zamba2-1.2B, Zamba2-2.7B and Zamba2-7B are hybrid models combining state-space models (specifically Mamba) and transformer layers, and were trained using next-token prediction. Zamba2 uses shared transformer layers after every 6 Mamba blocks. It uses the Mistral v0.1 tokenizer. We came to this architecture after a series of ablations at small scales. Zamba2-1.2B, Zamba2-2.7B and Zamba2-7B were pre-trained on 2T and 3T tokens, respectively.
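To make the "shared transformer layers after every 6 Mamba blocks" description concrete, here is a minimal sketch of the resulting block ordering. It is only an illustration of the interleaving pattern, not the actual Zamba2 implementation: the function name, the block labels, and the example block count are hypothetical, and the real model may differ in how the shared layers are applied.

```python
def hybrid_layout(num_mamba_blocks: int, period: int = 6) -> list[str]:
    """Return block labels with one shared transformer block after every `period` Mamba blocks.

    Hypothetical helper for illustration only; it models the ordering
    described in the release note, not the model's real layer classes.
    """
    layout = []
    for index in range(1, num_mamba_blocks + 1):
        layout.append("mamba")
        # Every `period`-th Mamba block is followed by the shared transformer block.
        if index % period == 0:
            layout.append("shared_transformer")
    return layout


if __name__ == "__main__":
    # 12 is an arbitrary example depth, not a real Zamba2 configuration.
    for position, name in enumerate(hybrid_layout(12)):
        print(position, name)
```

Because the transformer block is shared, every "shared_transformer" entry in the list would refer to the same set of weights rather than to a fresh layer.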
GOT-OCR 2.0
GOT-OCR2 works on a wide range of tasks, including plain document OCR, scene text OCR, formatted document OCR, and even OCR for tables, charts, mathematical formulas, geometric shapes, molecular formulas and sheet music. While this implementation of the model will only output plain text, the outputs can be further processed to render the desired format, with packages like pdftex, mathpix, matplotlib, tikz, verovio or pyecharts. The model can also be used for interactive OCR, where the user can specify the region to be recognized by providing the coordinates or the color of the region’s bounding box.
DAB-DETR
DAB-DETR is an enhanced variant of Conditional DETR. It utilizes dynamically updated anchor boxes to provide both a reference query point (x, y) and a reference anchor size (w, h), improving cross-attention computation. This new approach achieves 45.7% AP when trained for 50 epochs with a single ResNet-50 model as the backbone.
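As a rough sketch of what "dynamically updated anchor boxes" can look like, the snippet below refines a normalized (x, y, w, h) anchor by adding a predicted offset in logit space and squashing the result back into (0, 1). This update rule is an assumption based on the common iterative box-refinement scheme used in DETR-style detectors; the helper names and the example numbers are hypothetical and are not taken from the transformers implementation.

```python
import math


def inverse_sigmoid(p: float, eps: float = 1e-5) -> float:
    """Map a value in (0, 1) to logit space, clamping to avoid infinities."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def refine_anchor(anchor, delta):
    """Update an (x, y, w, h) anchor with per-coordinate offsets.

    Assumed rule: new = sigmoid(inverse_sigmoid(old) + delta), so the
    refined anchor always stays inside the normalized (0, 1) range.
    """
    return tuple(sigmoid(inverse_sigmoid(a) + d) for a, d in zip(anchor, delta))


if __name__ == "__main__":
    anchor = (0.5, 0.5, 0.2, 0.3)    # centre point and size, normalized
    delta = (0.4, -0.2, 0.1, 0.0)    # made-up offsets from one decoder layer
    print(refine_anchor(anchor, delta))
```

In this scheme each decoder layer would receive the anchor produced by the previous layer, so the reference point and size are adjusted step by step.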
Depth Pro
DepthPro is a foundation model for zero-shot metric monocular depth estimation, designed to generate high-resolution depth maps with remarkable sharpness and fine-grained details. It employs a multi-scale Vision Transformer (ViT)-based architecture, where images are downsampled, divided into patches, and processed using a shared Dinov2 encoder. The extracted patch-level features are merged, upsampled, and refined using a DPT-like fusion stage, enabling precise depth estimation.
RT-DETRv2
An improved Real-Time DEtection TRansformer (RT-DETR). RT-DETRv2 refines RT-DETR by introducing selective multi-scale feature extraction and a discrete sampling operator for broader deployment compatibility. These improvements yield a 0.3 to 1.4 increase in mAP metrics on the COCO dataset, all while maintaining the same parameter count and frames-per-second (FPS) performance.
Transformers-CLI
Transformers' CLI welcomes a new command: chat. This command starts a conversation with the model of your choosing directly in your terminal.
This feature exists in TRL and has been migrated to transformers for easier usage.
Processor Standardization
Ongoing work aims to standardize the image processors so that their APIs are equivalent. Additionally, the processors are given a fast variant so that they are never blockers in the image processing pipelines.
In this release, several processors have been standardized and have had fast versions contributed.
Breaking changes
DPT segmentation maps
DPT image processors did not support segmentation_maps, instead only requiring images. This has been fixed.
This adds an argument to the preprocess method, so users who pass arguments positionally to that method may see changed behavior. We recommend using keyword arguments for such methods so as not to be affected by the addition of new features.
segmentation maps support for DPT image processor by @simonreise in #34345
Image classification pipeline and single vs multi-label
The problem_type in the config.json file was read incorrectly by the pipeline, which mapped single-label to multi-label losses, and vice-versa. This has been fixed.
Fixing the LayerNorm beta/gamma renames
The description of the pull request is the easiest way to understand the problem, why it exists, and how it is solved; please read the description below:
VLM cleanup
The ignore_index property of the llava configuration has been removed as it was not serving a purpose.
Quantization
Quantization has received several improvements and fixes, including the contribution of FP8 quantization and the HIGGS quantization interface.
Generate
max_length by @gante in #36120
generate-related objects and methods scheduled for removal in v4.48 by @gante in #35677
GenerationConfig(cache_implementation="static") by @gante in #35679
SequenceBiasLogitsProcessor by @gante in #35699
torch.compile(model.forward) as a fast test by @gante in #34544
Pipelines
Pipelines have received several bug fixes and improvements which are detailed below.
Bugfixes and improvements
test_custom_4d_attention_mask by @ydshieh in #35606
EarlyStoppingCallback not require load_best_model_at_end by @muellerzr in #35101
test_beam_search_low_memory by @ydshieh in #35611
MobileNetV1ModelTest::test_batching_equivalence for now by @ydshieh in #35614
[Phi] bias should be True by @ArthurZucker in #35650
[Compile] Only test compiling model forward pass by @ArthurZucker in #35658
zero_shot_image_classification documentation guide link in SigLIP by @aretrace in #35671
Trainer cannot correctly call torch_jit_model_eval by @Wanguy in #35722
pt_to_tf by @gante in #35672
check_circleci_user job by @Sai-Suraj-27 in #32866
MimiModel with DeepSpeed ZeRO-3 by @anferico in #34735
PeftModel by @ambroser53 in #35680
MimiModel with DeepSpeed ZeRO-3" by @eustlb in #35755
self-comment-ci.yml by @ydshieh in #35548
timm import behaviour by @rwightman in #35800
test_batching_equivalence's flakiness by @ydshieh in #35729
TimmWrapper by @ariG23498 in #35744
timm tag to timm-wrapper models. by @pcuenca in #35794
get_cached_models by @Wauplin in #35809
docs/source/ar/tasks/masked_language_modeling.md into Arabic by @AhmedAlmaghz in #35198
benchmark code by @gante in #35730
self-comment-ci.yml by @ydshieh in #35816
working-directory in self-comment-ci.yml by @ydshieh in #35833
head_dim in config extracted from Gemma2 GGUF model by @Isotr0py in #35818
[tests] remove some flash attention class tests by @ArthurZucker in #35817
num_logits_to_keep as Tensor + add flag by @Cyrilvallez in #35757
test_pipelines_video_classification that was always failing by @CalOmnie in #35842
Rocketknight1 to self-comment-ci.yml by @ydshieh in #35881
_supports_static_cache = True for some model classes by @ydshieh in #34975
test_generated_length_assisted_generation by @keyboardAnt in #34935
unwrap_and_save_reload_schedule to use weights_only=False by @ydshieh in #35952
squad_convert_example_to_features to work with numpy v2 by @ydshieh in #35955
test_assisted_decoding_matches_greedy_search by @ydshieh in #35951
transformers-pytorch-deepspeed-latest-gpu by @ydshieh in #35940
Tester object has no attribute '_testMethodName' by @faaany in #35781
TimmBackboneModelTest::test_batching_equivalence by @ydshieh in #35971
benchmark.yml by @ydshieh in #35974
generation/quantization) by @ydshieh in #35341
self-comment-ci.yml by @ydshieh in #36030
Qwen2VLImageProcessorFast into Qwen2VLProcessor by @yeliudev in #35987
past_key_values by @yaswanth19 in #35890
test_flash_attn_2_can_dispatch_composite_models by @ydshieh in #36050
trainer.md by @faaany in #36066
perf_infer_gpu_one.md by @faaany in #36087
torch.export and fix some vision models by @qubvel in #35124
output_dir Optional in TrainingArguments #27866 by @sambhavnoobcoder in #35735
PretrainedConfig and PreTrainedModel by @hmellor in #36091
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Enabled.
♻ Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.