A lightweight Python library to utilize multimodal features for deep learning.

Traditional knowledge graphs (KGs) are typically composed of entities, relationships, and attributes. However, they are not designed to effectively store or represent multimodal data. This limitation prevents them from capturing and integrating information from different modalities, such as text, images, and audio, in a meaningful and holistic way.

The `MMKit-Features` project proposes a multimodal architecture to build multimodal knowledge graphs with flexible multimodal feature extraction and dynamic multimodal concept generation.
## Project Goal
- To extract, store, and fuse various features from multimodal datasets efficiently;
- To achieve generative adversarial network (GAN)-based multimodal knowledge representation dynamically in multimodal knowledge graphs;
- To provide a common deep learning-based architecture to enhance multimodal knowledge reasoning in real-world applications.
## Installation
You can install this toolkit using our [PyPI](https://pypi.org/project/mmkit-features/) package.

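For example, with a standard `pip` setup (the package name matches the PyPI URL above):

```bash
pip install mmkit-features
```
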
**doc/README.md**

# MMKit-Features Documents
This section presents a summary of how to use the features provided by the `MMKit-Features` Python library.

Separate modules handle the extraction of the different kinds of multimodal features, namely text, image, speech, and video. Moreover, the toolkit enables the quick and simple fusion and storage of the extracted features.

**doc/example_icd11_library.md**

## Establishing an ICD-11 Disease Coding Library
This example demonstrates the steps to create a multimodal feature library using the datasets from the International Classification of Diseases, Eleventh Revision (ICD-11). The ICD-11 datasets contain extensive text descriptions of disease entities and their complicated relationships, which makes them well suited to demonstrating the use of the `mmkit-features` toolkit.

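As a purely illustrative sketch (the record layout and helper names here are hypothetical, not the toolkit's actual API), ICD-11 entities can be viewed as coded records whose text descriptions are turned into feature vectors:

```python
# Hypothetical illustration only: not the mmkit-features API.
# ICD-11 entities viewed as (code, title, description) records.
icd11_entities = [
    {"code": "1A00", "title": "Cholera", "description": "An acute diarrhoeal infection ..."},
    # ... more entities parsed from the ICD-11 datasets ...
]

def text_features(description):
    # Stand-in for a real text-feature extractor (e.g., GloVe-based vectors).
    return [float(len(word)) for word in description.split()]

# A feature "library" keyed by disease code.
library = {e["code"]: text_features(e["description"]) for e in icd11_entities}
```
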
**doc/simple_computational_seq_use.md**

This base class is derived from the open-source CMU-Multimodal-SDK project, which allows us to store multimodal objects like audio and video files. The core feature of the `computational sequence` in the SDK is a simple way to store each chunk's features from video/audio files in temporal order. For example, we can divide a 1-minute video into 60 1-second clips, store them in time order, and represent each clip by its extracted features. The computational sequence class assumes that all objects share one basic property: time.

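A minimal sketch of this idea in plain NumPy (not the SDK's or the toolkit's own API): a 1-minute video becomes 60 interval/feature pairs keyed by a segment id.

```python
import numpy as np

num_clips, feature_dim = 60, 128  # 60 one-second clips, 128-d features (assumed sizes)

# intervals[i] = [start_time, end_time] of clip i, in seconds
intervals = np.stack([np.arange(num_clips), np.arange(1, num_clips + 1)], axis=1).astype(float)

# features[i] = the feature vector extracted from clip i (random stand-in here)
features = np.random.rand(num_clips, feature_dim)

# The (intervals, features) pairing per entry is essentially what a
# computational sequence stores for each video/audio object.
entry = {"video_0001": {"intervals": intervals, "features": features}}
print(entry["video_0001"]["features"].shape)  # (60, 128)
```
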
In our project, we extend the concept of computational sequence in many ways, especially by providing a more general way to store, fuse, and retrieve extracted features from all sources. In this section, we first describe the basic usage of the computational sequence in our project.

Here is a toy example to show the use of computational sequence.
```python
# ... (the earlier lines of the toy example are elided in this view) ...
mydataset.align("compseq_1")
```

The above example is a simple toy one and is not suitable for complicated multimodal feature use. Therefore, based on the `computational sequence`, we developed a brand-new, more capable one named `computatoinal_sequencex` to provide a common framework for storing and manipulating multimodal features in high-level applications across many fields.

**doc/text_features_extraction.md**

Most of these methods generate fixed-length word vectors to represent text for our analysis. We highly recommend using GloVe embeddings to generate word vectors.

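For instance, a common recipe (a generic sketch, not the toolkit's own API; it assumes a downloaded GloVe file such as `glove.6B.100d.txt`) averages pre-trained GloVe word vectors into one fixed-length text vector:

```python
import numpy as np

def load_glove(path):
    """Load GloVe vectors from a plain-text file: 'word v1 v2 ...' per line."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def text_vector(text, vectors, dim=100):
    """Average the vectors of in-vocabulary words; zeros if none match."""
    words = [w for w in text.lower().split() if w in vectors]
    if not words:
        return np.zeros(dim, dtype=np.float32)
    return np.mean([vectors[w] for w in words], axis=0)

# glove = load_glove("glove.6B.100d.txt")
# vec = text_vector("multimodal knowledge graphs", glove)  # shape: (100,)
```
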
**doc/video_features_extraction.md**

## Video Features Extraction
Extracting video features from a video file such as an `*.mp4` file is complicated. A video contains many frames, each of which can be treated as an image, but at the same time we have to consider the temporal information in the video.

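As a generic illustration of this frame/temporal trade-off (using OpenCV, not the toolkit's own API), one common approach is to sample frames at a fixed rate while keeping them in time order:

```python
import cv2  # OpenCV: pip install opencv-python
import numpy as np

def sample_frames(path, every_n_seconds=1.0):
    """Sample one frame every `every_n_seconds`, preserving temporal order."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS metadata is missing
    step = max(int(round(fps * every_n_seconds)), 1)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame)  # each frame is an H x W x 3 image array
        idx += 1
    cap.release()
    return np.array(frames)

# frames = sample_frames("example.mp4")
# Image features can then be extracted per frame, while the frame order
# preserves the video's temporal information.
```
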
A simple example of extracting video features using the `mmkit-features` toolkit is below: