
Commit 27708f2

updated project with Transformer support
1 parent cb3c41f commit 27708f2

19 files changed: +1403 −23 lines

CITATION.cff

Lines changed: 1 addition & 1 deletion
@@ -6,4 +6,4 @@ authors:
  orcid: https://orcid.org/0000-0003-1786-7551
  title: "MMKit-Features: Multimodal Features Extraction Toolkit"
  version: 0.0.1
- date-released: 2022-06-04
+ date-released: 2023-05-12

README.md

Lines changed: 57 additions & 15 deletions
@@ -1,29 +1,39 @@
  # MMKit-Features: Multimodal Feature Extraction Toolkit

- A light-weight Python library to utilize multimodal features for deep learning.
+ Traditional knowledge graphs (KGs) are usually comprised of entities, relationships, and attributes. However, they are not designed to effectively store or represent multimodal data. This limitation prevents them from capturing and integrating information from different modes of data, such as text, images, and audio, in a meaningful and holistic way.
+ 
+ The `MMKit-Features` project proposes a multimodal architecture to build multimodal knowledge graphs with flexible multimodal feature extraction and dynamic multimodal concept generation.

  ## Project Goal
- - To extract, store, and fuse various features from multimodal datasets rapidly and efficiently;
- - To provide a common multimodal information processing framework for multimodal features;
- - To achieve generative adversarial network-based multimodal knowledge representation dynamically.
+ - To extract, store, and fuse various multimodal features from multimodal datasets efficiently;
+ - To achieve generative adversarial network (GAN)-based multimodal knowledge representation dynamically in multimodal knowledge graphs;
+ - To provide a common deep learning-based architecture to enhance multimodal knowledge reasoning in real life.

  ## Installation

+ You can install this toolkit using our [PyPI](https://pypi.org/project/mmkit-features/) package.
+ 
  ```
  pip install mmkit-features
  ```

- ## Framework
+ ## Design Science Framework
+ 
+ ![Multimodal Computational Sequence](doc/images/multimodal-computational-sequence.jpg)
+ 
+ Figure 1: Multimodal Computational Sequence

- ![Design science canvas](https://dhchenx.github.io/projects/mmk-features/images/design-science-canvas.jpg)
+ ![GAN-based Multimodal Concept Generation](doc/images/gan-based-cross-modal-generation.jpg)
+ 
+ Figure 2: GAN-based Multimodal Concept Generation

  ## Modalities

  1. Text/Language modality
  2. Image modality
  3. Video modality
- 4. Speech/sound modality
- 5. Cross-modality between above
+ 4. Audio modality
+ 5. Cross-modality among the above

  ## Usage
  A toy example showing how to build a multimodal feature (MMF) library is here:
@@ -34,19 +44,19 @@ from mmkfeatures.fusion.mm_features_node import MMFeaturesNode
  import numpy as np
  if __name__ == "__main__":
      # 1. create an empty multimodal features library with root and dataset names
-     feature_lib=MMFeaturesLib(root_name="test features",dataset_name="test_features")
+     feature_lib = MMFeaturesLib(root_name="test features", dataset_name="test_features")
      # 2. set short names for each dimension for convenience
      feature_lib.set_features_name(["feature1","feature2","feature3"])
      # 3. set a list of content IDs
-     content_ids=["content1","content2","content3"]
+     content_ids = ["content1","content2","content3"]
      # 4. according to IDs, assign a group of features with intervals to the corresponding content ID
-     features_dict={}
+     features_dict = {}
      for id in content_ids:
-         mmf_node=MMFeaturesNode(id)
+         mmf_node = MMFeaturesNode(id)
          mmf_node.set_item("name",str(id))
          mmf_node.set_item("features",np.array([[1,2,3]]))
          mmf_node.set_item("intervals",np.array([[0,1]]))
-         features_dict[id]=mmf_node
+         features_dict[id] = mmf_node
      # 5. set the library's data
      feature_lib.set_data(features_dict)
      # 6. save the features to disk for future use
@@ -55,10 +65,42 @@ if __name__ == "__main__":
      feature_lib.show_structure("test6_feature.csd")
      # 8. have a glance of features content within the dataset
      feature_lib.show_sample_data("test6_feature.csd")
+     # 9. Finally, we construct a simple multimodal knowledge base.
  ```

  Further instructions on the toolkit are available [here](https://github.com/dhchenx/mmkit-features/tree/main/doc).

+ 
+ ## Applications
+ 
+ Here are some examples of using our work in real life, with code and documents.
+ 
+ ### 1. Multimodal Feature Extractors
+ 
+ - [Text Features Extraction](doc/text_features_extraction.md)
+ - [Speech Features Extraction](doc/speech_features_extraction.md)
+ - [Image Features Extraction](doc/image_features_extraction.md)
+ - [Video Features Extraction](doc/video_features_extraction.md)
+ - [Transformer-based Features Extraction](src/mmkfeatures/transformer/README.md)
+ 
+ ### 2. Multimodal Feature Library (MMFLib)
+ 
+ - [Basic Computational Sequence](doc/simple_computational_seq_use.md)
+ - [Core Use of MMFLib](doc/multimodal_features_library.md)
+ 
+ ### 3. Multimodal Knowledge Bases
+ 
+ - [Multimodal Birds Feature Library](doc/example_bird_library.md)
+ - [Multimodal Disease Coding Feature Library](doc/example_icd11_library.md)
+ - [Multimodal ROCO Feature Library](examples/roco_lib/step1_create_lib_roco.py)
+ 
+ ### 4. Multimodal Indexing and Querying
+ 
+ - [Brute Force Indexing](examples/birds_features_lib/step3_use_index.py)
+ - [Inverted Indexing](examples/birds_features_lib/step3_use_index.py)
+ - [Positional Indexing](examples/birds_features_lib/step3_use_index.py)
+ - [Multimodal Indexing and Querying](examples/birds_features_lib/evaluate/)
+ 
  ## Credits

  The project includes some source code from various open-source contributors. Here is a list of their contributions.
@@ -71,11 +113,11 @@ The project includes some source codes from various open-source contributors. He

  ## License

- This project is provided by [Donghua Chen](https://github.com/dhchenx) with MIT license.
+ The `mmkit-features` project is provided by [Donghua Chen](https://github.com/dhchenx) under the MIT license.

  ## Citation

  Please cite our project if it is used in your research.

- Chen, D. (2022). MMKit-Features: Multimodal Features Extraction Toolkit (Version 0.0.1) [Computer software]
+ Chen, D. (2023). MMKit-Features: Multimodal Features Extraction Toolkit (Version 0.0.2) [Computer software]


doc/README.md

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
  # MMKit-Features Documents

- This section presents a summary of usage of the features used in the MMKit-Features Python library.
+ This section summarizes how to use the features provided by the `MMKit-Features` Python library.

- There are several modules used to implement different functions to cope with multimodal features extraction, namely text, image, speech, and video features. Moreover, the toolkit allows us to fuse and store the extracted multimodal features in a rapid and easy manner.
+ To handle the extraction of various multimodal features such as text, image, speech, and video, different modules are utilized. Furthermore, the toolkit enables quick and simple fusion and storage of the extracted features.

  ## Features Extraction


doc/example_icd11_library.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
  ## Establishing an ICD-11 disease coding library

- The example demonstrates steps to create a multimodal feature library using the datasets from International Classification of Diseases, Eleventh Revision (ICD-11). The ICD-11 datasets contains massive text description of disease entities and their complicated relationships. It is also sutable for use to show the use of the `mmkit-features` toolkit.
+ This example demonstrates the steps to create a multimodal feature library using datasets from the International Classification of Diseases, Eleventh Revision (ICD-11). The ICD-11 datasets contain extensive text descriptions of disease entities and their complicated relationships, which makes them well suited to demonstrating the `mmkit-features` toolkit.

  ### Steps



doc/simple_computational_seq_use.md

Lines changed: 4 additions & 2 deletions
@@ -2,7 +2,7 @@

  This base class is derived from the open-source CMU-Multimodal-SDK project, which allows us to store multimodal objects like audio and video files. The core feature of the ```computational sequence``` in the SDK is to provide a simple way to store each chunk's features in time order within video/audio files. For example, we can divide a 1-minute video into 60 1-second clips stored in time order; each clip is then represented by its extracted features. The computational sequence class considers all objects to have one basic property, which is time.

- In our project, we extend the concept of computational sequence in many ways, specially providing a more common way to store, fuse and retrieve extracted features from all sources. In this section, we firstly describe the basic usage of the computational sequence in our project.
+ In our project, we extended the concept of computational sequence in many ways, especially by providing a more general way to store, fuse, and retrieve extracted features from all sources. In this section, we first describe the basic usage of the computational sequence in our project.

  Here is a toy example to show the use of computational sequence.

@@ -120,4 +120,6 @@ if __name__=="__main__":
  mydataset.align("compseq_1")
  ```

- The above example is a simple toy one and not suitable for complicated multimodal features use. Therefore, based on the `computational sequence`, we developed a brand-new and complicated one named `computatoinal_sequencex` to facilitate a common frame of storing and manipulating multimodal features for high-level applications in many fields. We will discuss the new one in other section.
+ The above example is a simple toy one and is not suitable for complicated multimodal feature use. Therefore, based on the `computational sequence`, we developed a brand-new one named `computatoinal_sequencex` to facilitate a common framework for storing and manipulating multimodal features for high-level applications in many fields.
+ 
+ We will discuss the new one in another section.
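
To make the chunking idea above concrete, here is a minimal NumPy sketch (not the SDK's or the toolkit's API) of representing a 60-second video as per-clip feature vectors paired with `[start, end]` intervals, which is the layout a computational sequence stores for each file; the feature dimension is hypothetical.

```
import numpy as np

# A 60-second video split into 60 one-second chunks (hypothetical feature dimension).
num_chunks, feature_dim = 60, 128

features = np.random.rand(num_chunks, feature_dim)                 # one row per clip
intervals = np.array([[t, t + 1.0] for t in range(num_chunks)])    # start/end in seconds

# A computational-sequence-like record for a single video id
record = {"features": features, "intervals": intervals}
print(record["features"].shape, record["intervals"][:3])
```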

doc/text_features_extraction.md

Lines changed: 1 addition & 1 deletion
@@ -34,4 +34,4 @@ if __name__=="__main__":

  ```

- Most of the methods generate word vectors with fixed length to represent text for our analysis. We highly recommend to use GloVe embedding to generate word vectors.
+ Most of the methods generate fixed-length word vectors to represent text for our analysis. We highly recommend using GloVe embeddings to generate word vectors.
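
To illustrate the GloVe recommendation above, here is a generic NumPy sketch (not the toolkit's own extractor) that averages the vectors of in-vocabulary tokens into a sentence vector; the embedding file name is an assumption.

```
import numpy as np

def load_glove(path):
    """Load a plain-text GloVe file: one token followed by its vector per line."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def sentence_vector(text, vectors, dim=100):
    """Average GloVe vectors of in-vocabulary tokens; return zeros if none match."""
    tokens = text.lower().split()
    found = [vectors[t] for t in tokens if t in vectors]
    return np.mean(found, axis=0) if found else np.zeros(dim, dtype=np.float32)

glove = load_glove("glove.6B.100d.txt")   # assumed local copy of the pretrained embeddings
print(sentence_vector("a multimodal feature toolkit", glove).shape)
```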

doc/video_features_extraction.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
  ## Video Features Extraction

- Extracting video features from a video file like *.mp4 file is very complicated. There are many frames from the video which are considered as images. But at the same time, we have to consider the temporal information in the video.
+ Extracting video features from a video file such as an `*.mp4` file is complicated: the video contains many frames that can be treated as images, but at the same time we have to consider the temporal information in the video.

  A simple example of extracting video features using the `mmkit-features` toolkit is below:

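
Since the toolkit's own example is not shown in this diff, the following generic OpenCV sketch illustrates the idea described above: sampled frames are treated as images while their timestamps preserve the temporal information. This is not the `mmkit-features` API, and the input file name is hypothetical.

```
import cv2
import numpy as np

def sample_frames(video_path, every_n_seconds=1.0, size=(224, 224)):
    """Keep one resized frame per sampling interval, with its timestamp in seconds."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(int(round(fps * every_n_seconds)), 1)
    frames, timestamps = [], []
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(cv2.resize(frame, size))
            timestamps.append(index / fps)
        index += 1
    cap.release()
    return np.array(frames), timestamps

frames, ts = sample_frames("example.mp4")   # hypothetical input file
print(frames.shape, ts[:5])
```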

src/mmkfeatures/transformer/README.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+ ## Transformer-based Feature Extraction
+ 
+ We also integrate state-of-the-art Transformer-based methods to extract features based on a series of large-scale pretrained models.
+ 
+ ### Examples
+ 
+ 1. Text feature extraction using [Transformer-XL](https://huggingface.co/transfo-xl-wt103).
+ 
+ 2. Image feature extraction based on the [Swin Transformer](https://github.com/microsoft/Swin-Transformer).
+ 
+ More implementations of Transformer-based extractors are coming soon.
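
As a rough illustration of the Transformer-XL example above, the sketch below uses the Hugging Face `transformers` library directly (the toolkit's own wrapper API may differ) and mean-pools the hidden states of `transfo-xl-wt103` into a single text feature vector.

```
import torch
from transformers import TransfoXLTokenizer, TransfoXLModel

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLModel.from_pretrained("transfo-xl-wt103")
model.eval()

# Tokenize a sentence and keep only the input ids expected by the model
input_ids = tokenizer("multimodal knowledge graphs", return_tensors="pt")["input_ids"]
with torch.no_grad():
    outputs = model(input_ids)

# Mean-pool the last hidden states into one fixed-length text feature
text_feature = outputs.last_hidden_state.mean(dim=1)
print(text_feature.shape)   # (1, hidden_size)
```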
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+ # --------------------------------------------------------
+ # Swin Transformer
+ # Copyright (c) 2021 Microsoft
+ # Licensed under The MIT License [see LICENSE for details]
+ # Written by Ze Liu
+ # --------------------------------------------------------
+ 
+ from .swin_transformer import SwinTransformer
+ 
+ 
+ def build_model(config, encoder='swintransformer'):
+     if encoder == 'swintransformer':
+         model = SwinTransformer(img_size=config.DATA.IMG_SIZE,
+                                 patch_size=config.MODEL.SWIN.PATCH_SIZE,
+                                 in_chans=config.MODEL.SWIN.IN_CHANS,
+                                 num_classes=config.MODEL.NUM_CLASSES,
+                                 embed_dim=config.MODEL.SWIN.EMBED_DIM,
+                                 depths=config.MODEL.SWIN.DEPTHS,
+                                 num_heads=config.MODEL.SWIN.NUM_HEADS,
+                                 window_size=config.MODEL.SWIN.WINDOW_SIZE,
+                                 mlp_ratio=config.MODEL.SWIN.MLP_RATIO,
+                                 qkv_bias=config.MODEL.SWIN.QKV_BIAS,
+                                 qk_scale=config.MODEL.SWIN.QK_SCALE,
+                                 ape=config.MODEL.SWIN.APE,
+                                 patch_norm=config.MODEL.SWIN.PATCH_NORM)
+         # Freeze the network parameters so the backbone is used only for feature extraction
+         for para in model.parameters():
+             para.requires_grad = False
+     else:
+         raise NotImplementedError(f"Unknown model: {encoder}")
+ 
+     return model
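
A minimal usage sketch for `build_model` follows, assuming the default yacs config defined in the next file and the standard `SwinTransformer` from the official Swin-Transformer repository (whose `forward_features` returns a pooled feature vector); the import paths are hypothetical because the module file names are not shown in this diff.

```
import torch
from config import _C           # hypothetical module name for the config shown below
from build import build_model   # hypothetical module name for build_model above

config = _C.clone()
model = build_model(config, encoder="swintransformer")
model.eval()

# One dummy RGB image at the configured input size
dummy = torch.randn(1, config.MODEL.SWIN.IN_CHANS,
                    config.DATA.IMG_SIZE, config.DATA.IMG_SIZE)
with torch.no_grad():
    feats = model.forward_features(dummy)   # pooled backbone features
print(feats.shape)   # expected (1, 768) for the tiny configuration
```

Because `build_model` freezes all parameters, the backbone acts purely as a fixed feature extractor for downstream indexing or fusion.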
Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
+ # --------------------------------------------------------
+ # Swin Transformer
+ # Copyright (c) 2021 Microsoft
+ # Licensed under The MIT License [see LICENSE for details]
+ # Written by Ze Liu
+ # --------------------------------------------------------'
+ 
+ import os
+ import yaml
+ from yacs.config import CfgNode as CN
+ 
+ _C = CN()
+ 
+ # Base config files
+ _C.BASE = ['']
+ _C.TAG = "default"
+ 
+ # -----------------------------------------------------------------------------
+ # Data settings
+ # -----------------------------------------------------------------------------
+ _C.DATA = CN()
+ # Batch size for a single GPU, could be overwritten by command line argument
+ _C.DATA.BATCH_SIZE = 1
+ # Path to dataset, could be overwritten by command line argument
+ _C.DATA.DATA_PATH = r'D:\UIBE科研\国自科青年\多模态机器学习\projects\mmkit-features\examples\birds_features_lib\datasets\CUB_200_2011\images'
+ _C.DATA.DATABASE_PATH = './database/DB.npz'
+ # Path to index table
+ _C.DATA.INDEX_PATH = './database/index.txt'
+ # Input image size
+ _C.DATA.IMG_SIZE = 224
+ # Interpolation to resize image (random, bilinear, bicubic)
+ _C.DATA.INTERPOLATION = 'bicubic'
+ # Pin CPU memory in DataLoader for more efficient (sometimes) transfer to GPU.
+ _C.DATA.PIN_MEMORY = True
+ # Number of data loading threads
+ _C.DATA.NUM_WORKERS = 8
+ 
+ # -----------------------------------------------------------------------------
+ # Model settings
+ # -----------------------------------------------------------------------------
+ _C.MODEL = CN()
+ # Model type
+ _C.MODEL.TYPE = 'swin'
+ # Model name
+ _C.MODEL.NAME = 'swin_tiny_patch4_window7_224'
+ # num classes
+ _C.MODEL.NUM_CLASSES = 1000
+ 
+ # Swin Transformer parameters
+ _C.MODEL.SWIN = CN()
+ _C.MODEL.SWIN.PATCH_SIZE = 4
+ _C.MODEL.SWIN.IN_CHANS = 3
+ _C.MODEL.SWIN.EMBED_DIM = 96
+ _C.MODEL.SWIN.DEPTHS = [2, 2, 6, 2]
+ _C.MODEL.SWIN.NUM_HEADS = [3, 6, 12, 24]
+ _C.MODEL.SWIN.WINDOW_SIZE = 7
+ _C.MODEL.SWIN.MLP_RATIO = 4.
+ _C.MODEL.SWIN.QKV_BIAS = True
+ _C.MODEL.SWIN.QK_SCALE = None
+ _C.MODEL.SWIN.APE = False
+ _C.MODEL.SWIN.PATCH_NORM = True
+ 
+ 
+ def _update_config_from_file(config, cfg_file):
+     config.defrost()
+     with open(cfg_file, 'r') as f:
+         yaml_cfg = yaml.load(f, Loader=yaml.FullLoader)
+ 
+     for cfg in yaml_cfg.setdefault('BASE', ['']):
+         if cfg:
+             _update_config_from_file(
+                 config, os.path.join(os.path.dirname(cfg_file), cfg)
+             )
+     print('=> merge config from {}'.format(cfg_file))
+     config.merge_from_file(cfg_file)
+     config.freeze()
+ 
+ 
+ def update_config(config, args):
+     _update_config_from_file(config, args.cfg)
+ 
+     config.defrost()
+ 
+     # merge from specific arguments
+     if args.batch_size:
+         config.DATA.BATCH_SIZE = args.batch_size
+     if args.data_path:
+         config.DATA.DATA_PATH = args.data_path
+     if args.resume:
+         config.MODEL.RESUME = args.resume
+ 
+     # set local rank for distributed training
+     config.LOCAL_RANK = args.local_rank
+ 
+     config.freeze()
+ 
+ 
+ def get_config(args):
+     """Get a yacs CfgNode object with default values."""
+     # Return a clone so that the defaults will not be altered
+     # This is for the "local variable" use pattern
+     config = _C.clone()
+     update_config(config, args)
+ 
+     return config
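
For completeness, here is a small sketch of driving `get_config` with argparse-style arguments; the YAML override file written here and the `config` import path are assumptions made for illustration only.

```
import argparse
from config import get_config   # hypothetical import of the module above

# Write a tiny override file so _update_config_from_file() has something to merge.
with open("swin_tiny_demo.yaml", "w") as f:
    f.write("DATA:\n  IMG_SIZE: 224\nMODEL:\n  NUM_CLASSES: 200\n")

parser = argparse.ArgumentParser()
parser.add_argument("--cfg", default="swin_tiny_demo.yaml")
parser.add_argument("--batch-size", type=int, default=4)
parser.add_argument("--data-path", default=None)
parser.add_argument("--resume", default=None)
parser.add_argument("--local_rank", type=int, default=0)
args = parser.parse_args([])

config = get_config(args)
print(config.DATA.BATCH_SIZE, config.MODEL.NUM_CLASSES)   # 4 200
```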
