Commit 2553464

Clarification about entity recognition helper files (#85)
1 parent 4722dce commit 2553464

File tree

1 file changed: +2 -0 lines changed

entity_recognition/entity_recognition_training.ipynb

Lines changed: 2 additions & 0 deletions
@@ -23,6 +23,8 @@
 "source": [
 "This notebook demonstrates how to train a NLP model for entity recognition and use it to produce out-of-sample predicted probabilities for each token. These are required inputs to find label issues in token classification datasets with cleanlab. The specific token classification task we consider here is Named Entity Recognition with the [CoNLL-2003 dataset](https://deepai.org/dataset/conll-2003-english), and we train a Transformer network from [HuggingFace's transformers library](https://github.com/huggingface/transformers). This notebook demonstrates how to produce the `pred_probs`, using them to find label issues is demonstrated in cleanlab's [Token Classification Tutorial](https://docs.cleanlab.ai/stable/tutorials/token_classification.html). \n",
 "\n",
+"Note: running this notebook requires the **.py** files from the **entity_recognition/** parent folder, if running in Colab or locally, make sure you've copied these helper **.py** files to your environment as well. \n",
+"\n",
 "**Overview of what we'll do in this notebook:** \n",
 "- Read and process text datasets with per-token labels in the CoNLL format. \n",
 "- Compute out-of-sample predicted probabilities by training a BERT Transformer network via cross-validation. \n",
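
Reviewer's illustration (not part of the commit): the first overview bullet mentions reading per-token labels in the CoNLL format. A minimal sketch of that step might look like the following, assuming whitespace-separated columns with the token first and the entity tag last, blank lines between sentences, and `-DOCSTART-` lines as document separators. The function name `read_conll` and the sample text are hypothetical and are not taken from the repository's helper **.py** files.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Parse CoNLL-style lines into sentences of (token, label) pairs.

    Assumes the token is the first column and the entity tag is the last.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        # Blank lines and -DOCSTART- markers end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        current.append((fields[0], fields[-1]))
    if current:  # flush the final sentence if the input lacks a trailing blank line
        sentences.append(current)
    return sentences


# Hypothetical sample in a four-column layout (token, POS, chunk, entity tag).
sample = """-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
""".splitlines()

sentences = read_conll(sample)
```

Each sentence is returned as a list of `(token, label)` pairs, which keeps tokens aligned with their labels before any tokenizer-specific processing.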
