# TensorFlow Implementation of BI-LSTM-CRF
## Downloading Dataset

```python
# Download the corpus with the Kaggle CLI (requires Kaggle API credentials);
# the zip lands in the current working directory.
!kaggle datasets download -d abhinavwalia95/entity-annotated-corpus

import zipfile

with zipfile.ZipFile("entity-annotated-corpus.zip", "r") as zip_ref:
    zip_ref.extractall("entity-annotated-corpus")
```
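Once extracted, the corpus can be grouped into per-sentence word and tag lists. This is a minimal sketch assuming the dataset's usual `ner_dataset.csv` layout (columns `Sentence #`, `Word`, `POS`, `Tag`); in this repository, `dataloader.py` handles the loading for `train.py`:

```python
import pandas as pd

# Token rows carry a sentence id only on the first token of each sentence,
# so forward-fill before grouping (file name and encoding are assumptions).
df = pd.read_csv("entity-annotated-corpus/ner_dataset.csv", encoding="latin1")
df["Sentence #"] = df["Sentence #"].ffill()
sentences = df.groupby("Sentence #")["Word"].apply(list)
tags = df.groupby("Sentence #")["Tag"].apply(list)
```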
## Usage

```bash
$ python3 train.py
```

>**_NOTE:_** On a notebook, use:

```python
!git clone link-to-repo
%run train.py
```
## Help Log

```
usage: train.py [-h] [--train_path TRAIN_PATH] [--output_dir OUTPUT_DIR]
                [--max_len MAX_LEN] [--batch_size BATCH_SIZE]
                [--hidden_num HIDDEN_NUM] [--embedding_size EMBEDDING_SIZE]
                [--embedding_file EMBEDDING_FILE] [--epoch EPOCH] [--lr LR]
                [--require_improvement REQUIRE_IMPROVEMENT]

train

optional arguments:
  -h, --help            show this help message and exit
  --train_path TRAIN_PATH
                        train file
  --output_dir OUTPUT_DIR
                        output_dir
  --max_len MAX_LEN     max_len
  --batch_size BATCH_SIZE
                        batch_size
  --hidden_num HIDDEN_NUM
                        hidden_num
  --embedding_size EMBEDDING_SIZE
                        embedding_size
  --embedding_file EMBEDDING_FILE
                        embedding_file
  --epoch EPOCH         epoch
  --lr LR               lr
  --require_improvement REQUIRE_IMPROVEMENT
                        require_improvement
```
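For example, to override the defaults listed under Training below (the paths here are placeholders):

```bash
$ python3 train.py --train_path path/to/train.csv --output_dir checkpoints \
                   --max_len 50 --batch_size 64 --epoch 15 --lr 0.001
```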
## Contributed by:
* [Akshay Gupta](https://github.com/akshay-gupta123)
## Reference
* **Title** : Bidirectional LSTM-CRF Models for Sequence Tagging
* **Link** : https://arxiv.org/abs/1508.01991
* **Author** : Zhiheng Huang, Wei Xu, Kai Yu
* **Tags** : Recurrent Neural Network, Named Entity Recognition
* **Published** : 9 Aug, 2015

# Summary

## Introduction

The paper deals with the sequence tagging task in NLP. It proposes a variety of Long Short-Term Memory (LSTM) based models for sequence tagging, including LSTM, BI-LSTM, LSTM-CRF and BI-LSTM-CRF, with a focus on the LSTM+CRF models. It compares the performance of the proposed models with others such as Conv-CRF, showing that BI-LSTM-CRF is more robust than the alternatives.

## Models

**1. LSTM Networks**

An RNN maintains a memory based on history information, which enables the model to predict the current output conditioned on long-distance features.
An input layer represents the features at time t and has the __same dimensionality__ as the __feature size__. An output layer represents a probability distribution over labels at time t and has the __same dimensionality__ as the number of __labels__. Compared to a feedforward network, an RNN introduces a connection between the previous hidden state and the current hidden state. This recurrent layer is designed to store history information. The values in the hidden and output layers are computed as follows:

**h(t) = f(Ux(t) + Wh(t − 1)), (1)**<br>
**y(t) = g(Vh(t)), (2)**

where U, W, and V are the connection weights to be learned during training, f(z) is the sigmoid function, and g(z) is the softmax function.
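As a minimal sketch of equations (1) and (2), with toy dimensions and randomly initialized weights (both are assumptions for illustration, not the model's values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

feature_size, hidden_size, num_labels = 4, 8, 5   # toy dimensions (assumptions)
rng = np.random.default_rng(0)
U = rng.normal(size=(hidden_size, feature_size))  # input -> hidden weights
W = rng.normal(size=(hidden_size, hidden_size))   # hidden -> hidden (recurrence)
V = rng.normal(size=(num_labels, hidden_size))    # hidden -> output weights

h = np.zeros(hidden_size)                         # initial hidden state h(0)
for x_t in rng.normal(size=(3, feature_size)):    # a toy 3-step input sequence
    h = sigmoid(U @ x_t + W @ h)                  # eq. (1): h(t) = f(Ux(t) + Wh(t-1))
    y = softmax(V @ h)                            # eq. (2): y(t) = g(Vh(t))
```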

<img src="https://github.com/akshay-gupta123/model-zoo/blob/master/NLP/BI-LSTM_CRF_Tensorflow/asset/bilstm_graph.png" width="425"/> <img src="https://github.com/akshay-gupta123/model-zoo/blob/master/NLP/BI-LSTM_CRF_Tensorflow/asset/bilstm_eq.png" width="425"/>

where σ is the logistic sigmoid function, and i, f, o and c are the input gate, forget gate, output gate and cell vectors, all of which have the same size as the hidden vector h.

**2. BI-LSTM Networks**

In sequence tagging tasks we have access to both past and future input features for a given time step, so we can use a bidirectional LSTM network.
In doing so, we can efficiently make use of past features (via forward states) and future features (via backward states) for a specific time frame. The forward and backward passes over the unfolded network over time are carried out in a similar way to regular network forward and backward passes, except that we unfold the hidden states for all time steps and need special treatment at the beginning and end of the data points. In the paper, the authors do the forward and backward passes over whole sentences, reset the hidden states to 0 at the beginning of each sentence, and use a batch implementation that lets multiple sentences be processed at the same time.

![A BI-LSTM Network](https://github.com/akshay-gupta123/model-zoo/blob/master/NLP/BI-LSTM_CRF_Tensorflow/asset/bilstm.png)
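In tf.keras, such a bidirectional layer over an embedded sentence can be sketched as below. The layer sizes mirror the default parameters listed under Training, while the vocabulary and tag-set sizes are assumptions, so this is an illustration rather than the repository's exact model:

```python
import tensorflow as tf

vocab_size, num_tags = 10000, 17       # assumptions; taken from the data in practice
embedding_size, hidden_num = 300, 512  # defaults listed under Training

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_size, mask_zero=True),
    # Forward and backward LSTMs; outputs are concatenated at each time step
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(hidden_num, return_sequences=True)),
    # Per-time-step scores over the tag set (consumed by the CRF layer)
    tf.keras.layers.Dense(num_tags),
])
```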

**3. CRF Networks**

There are two different ways to make use of neighboring tag information when predicting the current tag. The first is to predict a distribution of tags for each time step and then use beam-like decoding to find the optimal tag sequence. The second is to focus on the sentence level instead of individual positions, which leads to Conditional Random Field (CRF) models. It has been shown that CRFs can produce higher tagging accuracy in general.

![A CRF Model](https://github.com/akshay-gupta123/model-zoo/blob/master/NLP/BI-LSTM_CRF_Tensorflow/asset/crf.png)
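The sentence-level view is exactly what CRF decoding provides: instead of taking the best tag at each position independently, Viterbi decoding picks the jointly best tag sequence. A minimal sketch using `tensorflow_addons` with random toy scores (all shapes are assumptions):

```python
import tensorflow as tf
import tensorflow_addons as tfa

batch, max_len, num_tags = 2, 5, 4                      # toy shapes (assumptions)
scores = tf.random.normal((batch, max_len, num_tags))   # per-position tag scores
transitions = tf.random.normal((num_tags, num_tags))    # learned in practice
lengths = tf.constant([5, 3])                           # true sentence lengths

# Viterbi decoding: the jointly best tag sequence per sentence,
# not the best tag at each position independently.
tags, best_score = tfa.text.crf_decode(scores, transitions, lengths)
```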

**4. LSTM-CRF Networks**

Combining a CRF with an LSTM model lets us efficiently use past input features via the LSTM layer and sentence-level tag information via the CRF layer. A CRF layer has a state transition matrix as its parameters; with such a layer, we can efficiently use past and future tags to predict the current tag. The element __[f<sub>θ</sub>]<sub>i,t</sub>__ of the matrix of network scores is the score output by the network with parameters θ, for the sentence __[x]<sub>1</sub><sup>T</sup>__ and the i-th tag at the t-th word. We also introduce a transition score __[A]<sub>i,j</sub>__ to model the transition from the i-th state to the j-th state for a pair of consecutive time steps.

![Score Equation](https://github.com/akshay-gupta123/model-zoo/blob/master/NLP/BI-LSTM_CRF_Tensorflow/asset/eq.png)

Note that this transition matrix is *position independent*. We now denote the new parameters for our network as __θ̃ = θ ∪ {[A]<sub>i,j</sub> ∀ i, j}__. The score of a sentence __[x]<sub>1</sub><sup>T</sup>__ along with a path of tags __[i]<sub>1</sub><sup>T</sup>__ is then given by the sum of transition scores and network scores:

![LSTM-CRF](https://github.com/akshay-gupta123/model-zoo/blob/master/NLP/BI-LSTM_CRF_Tensorflow/asset/lstm-crf.png)
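The score pictured above is just a sum of per-position network scores and pairwise transition scores. A minimal NumPy sketch, where `f` is assumed to hold the network scores with shape (T, num_tags) and `A` is the (num_tags, num_tags) transition matrix:

```python
import numpy as np

def sentence_score(f, A, tags):
    """Path score of `tags` given network scores f (T, num_tags)
    and transition matrix A (num_tags, num_tags)."""
    # Sum of per-position network scores [f_θ]_{tags[t], t} ...
    emission = sum(f[t, tags[t]] for t in range(len(tags)))
    # ... plus transition scores [A]_{tags[t-1], tags[t]} between consecutive tags
    transition = sum(A[tags[t - 1], tags[t]] for t in range(1, len(tags)))
    return emission + transition

f = np.random.randn(6, 4)   # toy scores for a 6-word sentence, 4 tags
A = np.random.randn(4, 4)
print(sentence_score(f, A, [0, 2, 2, 1, 3, 0]))
```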

**5. BI-LSTM-CRF Networks**

Similar to an LSTM-CRF network, we combine a bidirectional LSTM network and a CRF network to form a BI-LSTM-CRF network. In addition to the past input features and sentence-level tag information used in an LSTM-CRF model, a BI-LSTM-CRF model can also use future input features. The extra features boost tagging accuracy, as the experiments show.

![BI-LSTM-CRF model](https://github.com/akshay-gupta123/model-zoo/blob/master/NLP/BI-LSTM_CRF_Tensorflow/asset/bilstm-crf.png)

## Training

All models used in this paper share a generic SGD forward and backward training procedure. The authors choose the most complicated model, BI-LSTM-CRF, to illustrate the training procedure, shown as Algorithm 1.

![Training Algorithm](https://github.com/akshay-gupta123/model-zoo/blob/master/NLP/BI-LSTM_CRF_Tensorflow/asset/algorithm.png)

In each epoch, the whole training data is divided into batches, and one batch is processed at a time. Each batch contains a list of sentences, the number of which is determined by the *batch size* parameter. For each batch, we first run the bidirectional LSTM-CRF model's forward pass, which includes the forward pass for both the forward and backward states of the LSTM. This yields the output scores __f<sub>θ</sub>([x]<sub>1</sub><sup>T</sup>)__ for all tags at all positions. We then run the CRF layer's forward and backward passes to compute gradients for the network outputs and the state transition edges. After that, we back-propagate the errors from the output to the input, which includes the backward pass for both the forward and backward states of the LSTM. Finally, we update the network parameters, which include the state transition matrix __[A]<sub>i,j</sub> ∀ i, j__ and the original bidirectional LSTM parameters.
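A single batch update of this kind can be sketched with `tf.GradientTape` and the CRF log-likelihood from `tensorflow_addons`. Here `model` is assumed to be a BI-LSTM emitting per-tag scores of shape (batch, max_len, num_tags) and `transitions` a trainable (num_tags, num_tags) variable, so this illustrates the update rather than reproducing the repository's `train.py`:

```python
import tensorflow as tf
import tensorflow_addons as tfa

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)  # default lr below

@tf.function
def train_step(model, transitions, words, tags, lengths):
    with tf.GradientTape() as tape:
        scores = model(words, training=True)       # network scores f_θ([x]_1^T)
        log_likelihood, _ = tfa.text.crf_log_likelihood(
            scores, tags, lengths, transition_params=transitions)
        loss = -tf.reduce_mean(log_likelihood)     # CRF negative log-likelihood
    variables = model.trainable_variables + [transitions]
    grads = tape.gradient(loss, variables)         # errors flow back to LSTM and A
    optimizer.apply_gradients(zip(grads, variables))
    return loss
```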

__Default Parameters__
* Learning Rate - 0.001
* Maximum length of Sentence - 50
* Epoch - 15
* Batch Size - 64
* Hidden Unit size - 512
* Embedding size - 300
* Optimizer - Adam

## Result

**Dataset** - Entity tags are encoded using a BIO annotation scheme, where each entity label is prefixed with either a B or an I. B- denotes the beginning of an entity and I- a token inside an entity; the prefixes make it possible to detect multi-word entities. All other words, which don't refer to entities of interest, are labeled with the O tag.

![Dataset](https://miro.medium.com/max/1134/1*dZKZk4gs5bp52H4FY34SZA.png)
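For example, a two-word person name and a two-word location would be tagged like this (the sentence is invented and the per/geo tag names are an assumption; the corpus uses similar abbreviated entity types):

```
Harry  Kane   visited  New    Delhi  yesterday
B-per  I-per  O        B-geo  I-geo  O
```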

* Loss vs Steps Graph

![Loss](https://github.com/akshay-gupta123/model-zoo/blob/master/NLP/BI-LSTM_CRF_Tensorflow/asset/loss.png)

* Accuracy vs Steps Graph

![Accuracy](https://github.com/akshay-gupta123/model-zoo/blob/master/NLP/BI-LSTM_CRF_Tensorflow/asset/accuracy.png)

## Robustness

To estimate the robustness of the models with respect to engineered features, the authors train LSTM, BI-LSTM, CRF, LSTM-CRF, and BI-LSTM-CRF models with word features only. The CRF models' performance is *significantly degraded* when spelling and context features are removed, which reveals that CRF models rely heavily on engineered features to obtain good performance. LSTM-based models, especially the BI-LSTM and BI-LSTM-CRF models, are *more robust and less affected by the removal of engineered features*. For all three tasks, the BI-LSTM-CRF models achieve the highest tagging accuracy.