-|1 |[datalab](datalab_image_classification/README.md)| Use Datalab to detect various types of data issues in (a subset of) the Caltech-256 image classification dataset. |
-|2 |[find_label_errors_iris](find_label_errors_iris/find_label_errors_iris.ipynb)| Find label errors introduced into the Iris classification dataset.|
-|3 |[classifier_comparison](classifier_comparison/classifier_comparison.ipynb)| Use CleanLearning to train 10 different classifiers on 4 dataset distributions with label errors.|
-|4 |[hyperparameter_optimization](hyperparameter_optimization/hyperparameter_optimization.ipynb)| Hyperparameter optimization to find the best settings of CleanLearning's optional parameters. |
-|5 |[simplifying_confident_learning](simplifying_confident_learning/simplifying_confident_learning.ipynb)| Straightforward implementation of Confident Learning algorithm with raw numpy code.|
-|6 |[visualizing_confident_learning](visualizing_confident_learning/visualizing_confident_learning.ipynb)| See how cleanlab estimates parameters of the label error distribution (noise matrix).|
-|7 |[find_tabular_errors](find_tabular_errors/find_tabular_errors.ipynb)| Handle mislabeled [tabular data](https://github.com/cleanlab/s/blob/master/student-grades-demo.csv) to improve a XGBoost classifier.|
-|8|[fine_tune_LLM](fine_tune_LLM/LLM_with_noisy_labels_cleanlab.ipynb)| Fine-tuning OpenAI language models with noisily labeled text data|
-|9 |[cnn_mnist](cnn_mnist/find_label_errors_cnn_mnist.ipynb)| Finding label errors in MNIST image data with a [Convolutional Neural Network](https://github.com/cleanlab/cleanlab/blob/master/cleanlab/experimental/mnist_pytorch.py).|
-|10 |[huggingface_keras_imdb](huggingface_keras_imdb/huggingface_keras_imdb.ipynb)| CleanLearning for text classification with Keras Model + pretrained BERT backbone and Tensorflow Dataset. |
-|11 |[fasttext_amazon_reviews](fasttext_amazon_reviews/fasttext_amazon_reviews.ipynb)| Finding label errors in Amazon Reviews text dataset using a cleanlab-compatible [FastText model](fasttext_amazon_reviews/fasttext_wrapper.py).|
-|12 |[multiannotator_cifar10](multiannotator_cifar10/multiannotator_cifar10.ipynb)| Iteratively improve consensus labels and trained classifier from data labeled by multiple annotators. |
-|13 |[llm_evals_w_crowdlab](llm_evals_w_crowdlab/llm_evals_w_crowdlab.ipynb)| Reliable LLM Evaluation with multiple human/AI reviewers of varying competency (via CROWDLAB and LLM-as-judge GPT token probabilities).|
-|14 |[active_learning_multiannotator](active_learning_multiannotator/active_learning.ipynb)| Improve a classifier model by iteratively collecting additional labels from data annotators. This active learning pipeline considers data labeled in batches by multiple (imperfect) annotators. |
-|15 |[active_learning_single_annotator](active_learning_single_annotator/active_learning_single_annotator.ipynb)| Improve a classifier model by iteratively labeling batches of currently-unlabeled data. This demonstrates a standard active learning pipeline with *at most one label* collected for each example (unlike our multi-annotator active learning notebook which allows re-labeling). |
-|16 |[active_learning_transformers](active_learning_transformers/active_learning.ipynb)| Improve a Transformer model for classifying politeness of text by iteratively labeling and re-labeling batches of data using multiple annotators. If you haven't done active learning with re-labeling, try the [active_learning_multiannotator](active_learning_multiannotator/active_learning.ipynb) notebook first. |
-|17 |[outlier_detection_cifar10](outlier_detection_cifar10/outlier_detection_cifar10.ipynb)| Train AutoML for image classification and use it to detect out-of-distribution images.|
-|18 |[multilabel_classification](multilabel_classification/image_tagging.ipynb)| Find label errors in an image tagging dataset ([CelebA](https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)) using a [Pytorch model](multilabel_classification/pytorch_network_training.ipynb) you can easily train for multi-label classification.|
-|19 |[entity_recognition](entity_recognition/)| Train Transformer model for Named Entity Recognition and produce out-of-sample `pred_probs` for **cleanlab.token_classification**.|
-|20 |[transformer_sklearn](transformer_sklearn/transformer_sklearn.ipynb)| How to use `KerasWrapperModel` to make any Keras model sklearn-compatible, demonstrated here for a BERT Transformer. |
-|21 |[cnn_coteaching_cifar10](cnn_coteaching_cifar10/README.md)| Train a [Convolutional Neural Network](https://github.com/cleanlab/cleanlab/blob/master/cleanlab/experimental/cifar_cnn.py) on noisily labeled Cifar10 image data using cleanlab with [coteaching](https://github.com/cleanlab/cleanlab/blob/master/cleanlab/experimental/coteaching.py). |
-|22 |[non_iid_detection](non_iid_detection/non_iid_detection.ipynb)| Use Datalab to detect non-IID sampling (e.g. drift) in datasets based on numeric features or embeddings. |
-|23 |[object_detection](object_detection/README.md)| Train Detectron2 object detection model for use with cleanlab. |
-|24 |[semantic segmentation](segmentation/training_ResNeXt50_for_Semantic_Segmentation_on_SYNTHIA.ipynb)| Train ResNeXt semantic segmentation model for use with cleanlab. |
-|24 |[spurious correlations](spurious_correlations_datalab/detecting_spurious_correlations.ipynb)| Train a CNN model on spurious and non-spurious versions of a subset of [Food-101](https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/) dataset. Use `Datalab` to detect issues in the spuriously correlated datasets. |
+|[datalab](datalab_image_classification/README.md)| Use Datalab to detect various types of data issues in (a subset of) the Caltech-256 image classification dataset. |
+|[llm_evals_w_crowdlab](llm_evals_w_crowdlab/llm_evals_w_crowdlab.ipynb)| Reliable LLM Evaluation with multiple human/AI reviewers of varying competency (via CROWDLAB and LLM-as-judge GPT token probabilities).|
+|[fine_tune_LLM](fine_tune_LLM/LLM_with_noisy_labels_cleanlab.ipynb)| Fine-tuning OpenAI language models with noisily labeled text data |
+|[entity_recognition](entity_recognition/)| Train Transformer model for Named Entity Recognition and produce out-of-sample `pred_probs` for **cleanlab.token_classification**.|
+|[multiannotator_cifar10](multiannotator_cifar10/multiannotator_cifar10.ipynb)| Iteratively improve consensus labels and trained classifier from data labeled by multiple annotators. |
+|[active_learning_multiannotator](active_learning_multiannotator/active_learning.ipynb)| Improve a classifier model by iteratively collecting additional labels from data annotators. This active learning pipeline considers data labeled in batches by multiple (imperfect) annotators. |
+|[active_learning_single_annotator](active_learning_single_annotator/active_learning_single_annotator.ipynb)| Improve a classifier model by iteratively labeling batches of currently-unlabeled data. This demonstrates a standard active learning pipeline with *at most one label* collected for each example (unlike our multi-annotator active learning notebook which allows re-labeling). |
+|[active_learning_transformers](active_learning_transformers/active_learning.ipynb)| Improve a Transformer model for classifying politeness of text by iteratively labeling and re-labeling batches of data using multiple annotators. If you haven't done active learning with re-labeling, try the [active_learning_multiannotator](active_learning_multiannotator/active_learning.ipynb) notebook first. |
+|[outlier_detection_cifar10](outlier_detection_cifar10/outlier_detection_cifar10.ipynb)| Train AutoML for image classification and use it to detect out-of-distribution images.|
+|[multilabel_classification](multilabel_classification/image_tagging.ipynb)| Find label errors in an image tagging dataset ([CelebA](https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)) using a [Pytorch model](multilabel_classification/pytorch_network_training.ipynb) you can easily train for multi-label classification.|
+|[find_label_errors_iris](find_label_errors_iris/find_label_errors_iris.ipynb)| Find label errors introduced into the Iris classification dataset.|
+|[classifier_comparison](classifier_comparison/classifier_comparison.ipynb)| Use CleanLearning to train 10 different classifiers on 4 dataset distributions with label errors.|
+|[hyperparameter_optimization](hyperparameter_optimization/hyperparameter_optimization.ipynb)| Hyperparameter optimization to find the best settings of CleanLearning's optional parameters.|
+|[simplifying_confident_learning](simplifying_confident_learning/simplifying_confident_learning.ipynb)| Straightforward implementation of Confident Learning algorithm with raw numpy code.|
+|[visualizing_confident_learning](visualizing_confident_learning/visualizing_confident_learning.ipynb)| See how cleanlab estimates parameters of the label error distribution (noise matrix).|
+|[find_tabular_errors](find_tabular_errors/find_tabular_errors.ipynb)| Handle mislabeled [tabular data](https://github.com/cleanlab/s/blob/master/student-grades-demo.csv) to improve a XGBoost classifier.|
+|[cnn_mnist](cnn_mnist/find_label_errors_cnn_mnist.ipynb)| Finding label errors in MNIST image data with a [Convolutional Neural Network](https://github.com/cleanlab/cleanlab/blob/master/cleanlab/experimental/mnist_pytorch.py).|
+|[huggingface_keras_imdb](huggingface_keras_imdb/huggingface_keras_imdb.ipynb)| CleanLearning for text classification with Keras Model + pretrained BERT backbone and Tensorflow Dataset. |
+|[fasttext_amazon_reviews](fasttext_amazon_reviews/fasttext_amazon_reviews.ipynb)| Finding label errors in Amazon Reviews text dataset using a cleanlab-compatible [FastText model](fasttext_amazon_reviews/fasttext_wrapper.py).|
+|[transformer_sklearn](transformer_sklearn/transformer_sklearn.ipynb)| How to use `KerasWrapperModel` to make any Keras model sklearn-compatible, demonstrated here for a BERT Transformer. |
+|[cnn_coteaching_cifar10](cnn_coteaching_cifar10/README.md)| Train a [Convolutional Neural Network](https://github.com/cleanlab/cleanlab/blob/master/cleanlab/experimental/cifar_cnn.py) on noisily labeled Cifar10 image data using cleanlab with [coteaching](https://github.com/cleanlab/cleanlab/blob/master/cleanlab/experimental/coteaching.py). |
+|[non_iid_detection](non_iid_detection/non_iid_detection.ipynb)| Use Datalab to detect non-IID sampling (e.g. drift) in datasets based on numeric features or embeddings. |
+|[object_detection](object_detection/README.md)| Train Detectron2 object detection model for use with cleanlab. |
+|[semantic segmentation](segmentation/training_ResNeXt50_for_Semantic_Segmentation_on_SYNTHIA.ipynb)| Train ResNeXt semantic segmentation model for use with cleanlab. |
+|[spurious correlations](spurious_correlations_datalab/detecting_spurious_correlations.ipynb)| Train a CNN model on spurious and non-spurious versions of a subset of [Food-101](https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/) dataset. Use `Datalab` to detect issues in the spuriously correlated datasets. |