Changelog.md


TODO:
  [ ] fine-tuning to find the best parameters
  [ ] tensorboardX
  [ ] prediction GUI

===================================================================== Jul 10, 2023

  • With recent versions of Pandas, read_csv should be called with the parameter keep_default_na=False to prevent 'None' from being read as NaN, since 'None' is an ordinary English word.
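A minimal sketch of the fix (the column names here are illustrative, not the repo's actual schema):

```python
import io

import pandas as pd

# 'None' appears as a real English token in the data
tsv = io.StringIO("token\tlabel\nNone\tother\nhola\tes\n")

# keep_default_na=False stops pandas from turning the string 'None' into NaN
df = pd.read_csv(tsv, sep="\t", keep_default_na=False)
assert df["token"].iloc[0] == "None"  # still the literal string
```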

===================================================================== Oct 12, 2022

  • Make the output reproducible by fixing the seeds for torch and numpy
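A sketch of the seeding, assuming a helper along these lines (the function name is illustrative):

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    # fix every RNG the training loop touches
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # seeds the CPU (and, on recent versions, CUDA) RNG
    torch.cuda.manual_seed_all(seed)  # explicit; a no-op without a GPU

set_seed(42)
first = torch.rand(3)
set_seed(42)
second = torch.rand(3)
assert torch.equal(first, second)  # identical draws across runs
```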

===================================================================== July 28, 2022

  • Adding labels for the confusion matrix axes
  • Revision of the confusion matrix's labels

===================================================================== July 26, 2022

  • ignore_index=0 for CrossEntropyLoss to ignore the padding index. This option specifies a target value that is ignored and does not contribute to the input gradient, which lowers computation and gives a more accurate loss and f1-score; without ignore_index, all the padding positions are also included in the criterion.
  • Revision of the flatten function to apply the sentences' lengths so the loss is computed more accurately.
  • These two modifications seem to solve the padding issue.
  • The prediction of the first token is now almost always correct.
  • target_size is now 3 instead of 4 (the padding tag is no longer counted as a class); note that the loss function then needs to be fed ignore_index=target_size.
  • Classification Report
  • Confusion Matrix
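The ignore_index change above can be sketched as follows (the shapes are illustrative):

```python
import torch
import torch.nn as nn

PAD_IDX = 0
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)

# flattened logits: (num_tokens, num_classes); targets: (num_tokens,)
logits = torch.randn(6, 4)
targets = torch.tensor([1, 2, 3, PAD_IDX, PAD_IDX, PAD_IDX])

# padded positions contribute neither to the loss nor to the gradient
loss = criterion(logits, targets)
```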

===================================================================== July 25, 2022

  • The list in list(zip(*batch)) in the collate_fn function was unnecessary and only increased the running time.
  • .to(device) inside collate_fn to get rid of device migration inside the training phase.
  • Some touch-ups in char2vec
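A sketch of such a collate_fn, assuming each dataset item is a (sentence_tensor, label_tensor) pair:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def collate_fn(batch):
    # zip(*batch) already yields two tuples; wrapping it in list() was redundant
    sentences, labels = zip(*batch)
    x = pad_sequence(sentences, batch_first=True, padding_value=0)
    y = pad_sequence(labels, batch_first=True, padding_value=0)
    # move once here so the training loop needs no .to(device) calls
    return x.to(device), y.to(device)

batch = [(torch.tensor([5, 6, 7]), torch.tensor([1, 1, 2])),
         (torch.tensor([8]), torch.tensor([3]))]
x, y = collate_fn(batch)
assert x.shape == (2, 3) and y.shape == (2, 3)
```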

===================================================================== July 20, 2022

  • Solved an issue with prediction: using a set to remove the redundant characters produced a new ordering on each run. Applying the sorted function guarantees a unique order. Another solution is to save the chr2id dictionary with the model and reload it during prediction.
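A minimal sketch of the deterministic vocabulary build (the helper name is illustrative):

```python
def build_chr2id(texts):
    # set() deduplicates, but its iteration order can differ between runs;
    # sorted() pins a unique, reproducible order
    chars = sorted(set("".join(texts)))
    return {c: i for i, c in enumerate(chars, start=1)}  # 0 stays free for padding

vocab = build_chr2id(["hola", "hello"])
assert vocab == build_chr2id(["hello", "hola"])  # input order doesn't matter
```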

===================================================================== July 19, 2022

  • padding_idx=0 for nn.Embedding layer.
  • GPU support; run and tested on Google Colab
  • Increasing dropout, out_ch1, and out_ch2 to 0.3, 37, and 35, respectively, doesn't help much (f1-score of 0.926 versus 0.924!), so I reverted them to the smaller sizes.
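The padding_idx change in one line, with the sizes taken from the printed architecture (298 characters, embedding dim 9):

```python
import torch
import torch.nn as nn

# padding_idx=0 pins index 0 to an all-zero vector that receives no gradient,
# so padded positions stay inert during training
embeds = nn.Embedding(num_embeddings=298, embedding_dim=9, padding_idx=0)
assert torch.all(embeds(torch.tensor([0])) == 0)
```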

===================================================================== July 18, 2022

  • predict.py for ordinary use:
    python predict.py --text [sample text] --model [pretrained model]
  • Since I've used F1_Score with micro average, I should mention that micro-F1 = micro-precision = micro-recall = accuracy, i.e. in all cases I've actually reported accuracy, not f1-score. From now on I use the macro average for the f1-score, so the results will be more realistic.
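The micro-vs-macro point can be checked directly with scikit-learn (the toy labels below are illustrative):

```python
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 0, 0]  # the classifier ignores the minority classes

# in single-label multi-class tasks, micro-F1 collapses to plain accuracy
assert f1_score(y_true, y_pred, average="micro") == accuracy_score(y_true, y_pred)

# macro-F1 averages per-class scores, so the ignored classes drag it down
assert f1_score(y_true, y_pred, average="macro", zero_division=0) < \
       f1_score(y_true, y_pred, average="micro")
```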

[plot]

  • Saving the best model for prediction
  • multiple-width filter bank in the second layer of the Char2Vec --> better result and less overfitting.
BiLSTMtagger(
  (word_embeddings): Char2Vec(
    (embeds): Embedding(298, 9)
    (conv1): Sequential(
      (0): Conv1d(9, 12, kernel_size=(3,), stride=(1,))
      (1): ReLU()
      (2): Dropout(p=0.25, inplace=False)
    )
    (convs2): ModuleList(
      (0): Sequential(
        (0): Conv1d(12, 5, kernel_size=(3,), stride=(1,))
        (1): ReLU()
      )
      (1): Sequential(
        (0): Conv1d(12, 5, kernel_size=(4,), stride=(1,))
        (1): ReLU()
      )
      (2): Sequential(
        (0): Conv1d(12, 5, kernel_size=(5,), stride=(1,))
        (1): ReLU()
      )
    )
    (linear): Sequential(
      (0): Linear(in_features=15, out_features=15, bias=True)
      (1): ReLU()
    )
  )
  (lstm): LSTM(15, 128, num_layers=2, batch_first=True, dropout=0.25, bidirectional=True)
  (hidden2tag): Linear(in_features=256, out_features=4, bias=True)
)

[plot]

===================================================================== July 17, 2022

  • F1 score with weighted average instead of micro.
  • Char2Vec class
  • collapsing character repetitions longer than 4 in a token and truncating any word to a length of at most 20 characters ==> a slightly better result
  • Char2Vec+BiLSTM finished, with f1=0.9549, val_f1=0.9443; another slight improvement in the model
BiLSTMtagger(
  (word_embeddings): Char2Vec(
    (embeds): Embedding(298, 9)
    (conv1): Sequential(
      (0): Conv1d(9, 12, kernel_size=(3,), stride=(1,))
      (1): ReLU()
      (2): Dropout(p=0.1, inplace=False)
    )
    (conv2): Sequential(
      (0): Conv1d(12, 15, kernel_size=(3,), stride=(1,))
      (1): ReLU()
    )
    (linear): Sequential(
      (0): Linear(in_features=15, out_features=15, bias=True)
      (1): ReLU()
    )
  )
  (lstm): LSTM(15, 128, num_layers=2, batch_first=True, dropout=0.25, bidirectional=True)
  (hidden2tag): Linear(in_features=256, out_features=4, bias=True)
)

[plot]
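The character-level preprocessing from this entry (collapsing runs longer than four and truncating tokens to 20 characters) could be implemented along these lines; the helper name and regex approach are my own sketch, not necessarily the repo's code:

```python
import re

def normalize_token(token: str, max_rep: int = 4, max_len: int = 20) -> str:
    # collapse any character repeated more than max_rep times in a row
    token = re.sub(r"(.)\1{%d,}" % max_rep, r"\1" * max_rep, token)
    return token[:max_len]  # cap very long tokens

assert normalize_token("coooooooool") == "cooool"
```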

===================================================================== July 16, 2022

  • Decipher the text/labels from the output of the network

  • Tokens should be considered in context, not as a collection of single tokens: in the following, audio is a Spanish token, not an English one.

    @andres_romero17 si , prometo hacer un audio :)

    other es other es es es es other

  • loss/f1_score plot

[plot: loss/f1_score]

  • data analysis around tweets and their tokens/chars
  • code sanitization

===================================================================== July 15, 2022

  • Printing the loss for train/val set on the screen
  • computation of f1_score for both training and validation set shows the network convergence
  • SGD, lr=0.1, hidden_dim=64
Epoch  1/40, loss=0.9072, val_loss=0.8901    ,train_f1=0.5998, val_f1=0.5462
Epoch  2/40, loss=0.6987, val_loss=0.7863    ,train_f1=0.7165, val_f1=0.6602
Epoch  3/40, loss=0.5788, val_loss=0.7573    ,train_f1=0.7714, val_f1=0.7342
Epoch  4/40, loss=0.4912, val_loss=0.7454    ,train_f1=0.8088, val_f1=0.7589
Epoch  5/40, loss=0.4221, val_loss=0.7322    ,train_f1=0.8367, val_f1=0.7747
Epoch 10/40, loss=0.2226, val_loss=0.6976    ,train_f1=0.9123, val_f1=0.7897
Epoch 15/40, loss=0.1427, val_loss=0.7406    ,train_f1=0.9431, val_f1=0.8072
Epoch 20/40, loss=0.1083, val_loss=0.6276    ,train_f1=0.9577, val_f1=0.8133
Epoch 25/40, loss=0.0925, val_loss=0.6425    ,train_f1=0.9648, val_f1=0.8163
Epoch 30/40, loss=0.0842, val_loss=0.6611    ,train_f1=0.9683, val_f1=0.8171
Epoch 35/40, loss=0.0792, val_loss=0.6735    ,train_f1=0.9701, val_f1=0.8178
Epoch 40/40, loss=0.0763, val_loss=0.6753    ,train_f1=0.9711, val_f1=0.8180
  • Adam+ReduceLROnPlateau, lr=1e-3, wd=1e-5, hidden_dim=128
Epoch 1/7, loss=0.5991, val_loss=0.5572    ,train_f1=0.7311, val_f1=0.7483
Epoch 2/7, loss=0.2947, val_loss=0.4787    ,train_f1=0.9005, val_f1=0.8266
Epoch 3/7, loss=0.1783, val_loss=0.4336    ,train_f1=0.9485, val_f1=0.8379
Epoch 4/7, loss=0.1256, val_loss=0.4124    ,train_f1=0.9653, val_f1=0.8494
Epoch 5/7, loss=0.1049, val_loss=0.3998    ,train_f1=0.9698, val_f1=0.8512
Epoch 6/7, loss=0.0977, val_loss=0.3884    ,train_f1=0.9714, val_f1=0.8512
Epoch 7/7, loss=0.0940, val_loss=0.3817    ,train_f1=0.9725, val_f1=0.8529
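The Adam+ReduceLROnPlateau setup for that run can be sketched as follows; the scheduler's factor and patience are illustrative assumptions, and the model is a stand-in:

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = torch.nn.Linear(15, 4)  # stand-in for the tagger
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=2)

for epoch in range(7):
    # ... one epoch of training and validation ...
    val_loss = 0.5  # placeholder for the real validation loss
    scheduler.step(val_loss)  # cuts the lr when val_loss stops improving
```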
  • Minibatches made a great leap: train_f1=0.97, val_f1=0.94
Epoch  1/40, loss=0.5998, val_loss=0.4027    ,train_f1=0.7502, val_f1=0.7768
Epoch  2/40, loss=0.3764, val_loss=0.3790    ,train_f1=0.8179, val_f1=0.7971
Epoch  3/40, loss=0.3242, val_loss=0.3561    ,train_f1=0.8501, val_f1=0.8307
Epoch  4/40, loss=0.2618, val_loss=0.2922    ,train_f1=0.8861, val_f1=0.8741
Epoch  5/40, loss=0.2209, val_loss=0.2553    ,train_f1=0.9065, val_f1=0.8931
Epoch 10/40, loss=0.1291, val_loss=0.1723    ,train_f1=0.9460, val_f1=0.9291
Epoch 15/40, loss=0.0892, val_loss=0.1429    ,train_f1=0.9616, val_f1=0.9419
Epoch 20/40, loss=0.0665, val_loss=0.1471    ,train_f1=0.9675, val_f1=0.9409
Epoch 25/40, loss=0.0510, val_loss=0.1481    ,train_f1=0.9715, val_f1=0.9397
Epoch 30/40, loss=0.0420, val_loss=0.1676    ,train_f1=0.9742, val_f1=0.9397
Epoch 35/40, loss=0.0359, val_loss=0.1756    ,train_f1=0.9755, val_f1=0.9386
Epoch 40/40, loss=0.0323, val_loss=0.1934    ,train_f1=0.9765, val_f1=0.9403
  • A BiLSTM with 2 layers and dropout helps prevent overfitting

===================================================================== July 14, 2022

  • Data class improvement
    • several dictionaries to convert tokens, labels, and chars to ids and vice versa
    • building the encoded sentences and their counterpart labels
  • LSTM class
  • CodeSwitchDataset as well as a customized DataLoader
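The bidirectional dictionaries could look like this; the Vocab class name is hypothetical, standing in for the Data class's separate token/label/char mappings:

```python
class Vocab:
    """Bidirectional symbol <-> id mapping, with id 0 reserved for padding."""

    def __init__(self, symbols):
        self.sym2id = {s: i for i, s in enumerate(sorted(set(symbols)), start=1)}
        self.id2sym = {i: s for s, i in self.sym2id.items()}

    def encode(self, seq):
        return [self.sym2id[s] for s in seq]

    def decode(self, ids):
        return [self.id2sym[i] for i in ids]

labels = Vocab(["other", "es", "en"])
assert labels.decode(labels.encode(["es", "other"])) == ["es", "other"]
```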

===================================================================== July 13, 2022

  • github repo initialization
  • reading the paper
  • starting the code with the Data class
    • an issue with quoting in reading tsv files
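A common fix for that kind of quoting issue, assuming the problem was stray quote characters inside tweet tokens (the data below is illustrative):

```python
import csv
import io

import pandas as pd

# a token containing an unmatched quote character, as happens in tweets
tsv = io.StringIO('token\tlabel\n"lol\tother\n')

# quoting=csv.QUOTE_NONE makes the parser treat quotes as ordinary characters
df = pd.read_csv(tsv, sep="\t", quoting=csv.QUOTE_NONE)
assert df["token"].iloc[0] == '"lol'
```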