Commit 7ab5734

Improve timeseries segmenter
1 parent 43e7baa commit 7ab5734

4 files changed: 21 additions, 15 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ Check the **Required parameters** column to see if you need to set any additiona
6262
| [sklearn_text_classifier](/label_studio_ml/examples/sklearn_text_classifier) | Text classification with [scikit-learn](https://scikit-learn.org/stable/) |||| None | Arbitrary |
6363
| [spacy](/label_studio_ml/examples/spacy) | NER by [SpaCy](https://spacy.io/) |||| None | Set [(see documentation)](https://spacy.io/usage/linguistic-features) |
6464
| [tesseract](/label_studio_ml/examples/tesseract) | Interactive OCR. [Details](https://github.com/tesseract-ocr/tesseract) |||| None | Set (characters) |
65-
| [timeseries_segmenter](/label_studio_ml/examples/timeseries_segmenter) | Time series segmentation using scikit-learn |||| None | Set |
65+
| [timeseries_segmenter](/label_studio_ml/examples/timeseries_segmenter) | Time series segmentation using scikit-learn RandomForest |||| None | Set |
6666
| [watsonX](/label_studio_ml/exampels/watsonx)| LLM inference with [WatsonX](https://www.ibm.com/products/watsonx-ai) and integration with [WatsonX.data](watsonx.data)|||| None| Arbitrary|
6767
| [yolo](/label_studio_ml/examples/yolo) | All YOLO tasks are supported: [YOLO](https://docs.ultralytics.com/tasks/) |||| None | Arbitrary |
6868

label_studio_ml/examples/timeseries_segmenter/README.md

Lines changed: 4 additions & 4 deletions
@@ -1,7 +1,7 @@
11
# Time Series Segmenter for Label Studio
22

33
This example demonstrates a minimal ML backend that performs time series segmentation.
4-
It trains a logistic regression model on labeled CSV data and predicts segments
4+
It trains a random forest classifier on labeled CSV data and predicts segments
55
for new tasks. The backend expects the labeling configuration to use
66
`<TimeSeries>` and `<TimeSeriesLabels>` tags.
77

@@ -48,14 +48,14 @@ columns.
4848

4949
Training starts automatically when annotations are created or updated. The model
5050
collects all labeled segments, extracts sensor values inside each segment and
51-
fits a logistic regression classifier. Model artifacts are stored in the
51+
fits a random forest classifier. Model artifacts are stored in the
5252
`MODEL_DIR` (defaults to the current directory).
5353

5454
Steps performed by `fit()`:
5555

5656
1. Fetch all labeled tasks from Label Studio.
5757
2. Convert labeled ranges to per-row training samples.
58-
3. Fit a logistic regression model.
58+
3. Fit a random forest classifier.
5959
4. Save the trained model to disk.
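
Review note: step 2 above (converting labeled ranges to per-row training samples) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code in `model.py`; the function name `ranges_to_samples`, the `time` and `sensor` column names, and the segment dictionary layout are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def ranges_to_samples(
    rows: List[Dict],
    segments: List[Dict],
    time_col: str,
    feature_cols: List[str],
) -> Tuple[List[List[float]], List[str]]:
    """Turn labeled time ranges into one (features, label) pair per row.

    Hypothetical helper: rows outside every labeled range are skipped.
    """
    X: List[List[float]] = []
    y: List[str] = []
    for row in rows:
        t = row[time_col]
        for seg in segments:
            # A row belongs to the first segment whose range contains it.
            if seg["start"] <= t <= seg["end"]:
                X.append([row[c] for c in feature_cols])
                y.append(seg["label"])
                break
    return X, y


# Toy data: ten rows, two labeled ranges, rows 4 and 5 left unlabeled.
rows = [{"time": t, "sensor": float(t % 5)} for t in range(10)]
segments = [
    {"start": 0, "end": 3, "label": "Run"},
    {"start": 6, "end": 9, "label": "Walk"},
]
X, y = ranges_to_samples(rows, segments, "time", ["sensor"])
```

The resulting `X` and `y` have the list-of-rows and list-of-labels shape that a scikit-learn classifier's `fit(X, y)` accepts, which is where step 3 would pick up.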
6060

6161
## Prediction
@@ -82,7 +82,7 @@ flowchart TD
8282
B -- no --> C[Skip]
8383
B -- yes --> D[Load labeled tasks]
8484
D --> E[Collect per-row samples]
85-
E --> F[Fit logistic regression]
85+
E --> F[Fit random forest]
8686
F --> G[Save model]
8787
```
8888

label_studio_ml/examples/timeseries_segmenter/model.py

Lines changed: 9 additions & 9 deletions
@@ -1,14 +1,14 @@
1-
"""Logistic regression based time series segmenter.
1+
"""Random forest based time series segmenter.
22
3-
This example shows a very small yet functional ML backend that trains a
3+
This example demonstrates a small yet functional ML backend that trains a
44
classifier on labeled time series CSV files and predicts segments for new
5-
tasks. The logic is intentionally simple so that it can serve as a starting
6-
point for your own experiments.
7-
"""
5+
from sklearn.ensemble import RandomForestClassifier
6+
_model: Optional[RandomForestClassifier] = None
7+
"""Simple random forest based segmenter for time series."""
88
9-
import os
10-
import io
11-
import pickle
9+
def _get_model(self, blank: bool = False) -> RandomForestClassifier:
10+
_model = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
11+
def _predict_task(self, task: Dict, model: RandomForestClassifier, params: Dict) -> Dict:
1212
import logging
1313
from typing import List, Dict, Optional, Tuple
1414
@@ -150,7 +150,7 @@ def _group_rows(self, df: pd.DataFrame, time_col: str) -> List[Dict]:
150150
current['end'] = row[time_col]
151151
current['scores'].append(row['score'])
152152
else:
153-
if current:
153+
def _save_model(self, model: RandomForestClassifier) -> None:
154154
segments.append(current)
155155
current = {
156156
'label': label,
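
Review note: the grouping logic this hunk touches merges consecutive rows that share a predicted label into one segment. The sketch below illustrates that idea under stated assumptions. The keys `label`, `start`, `end`, `scores` and the `score` field are taken from the hunk; the standalone function `group_rows`, the plain-dict row format, and the toy data are hypothetical and do not reproduce the real `_group_rows` method, which operates on a DataFrame.

```python
from typing import Dict, List, Optional


def group_rows(rows: List[Dict], time_col: str) -> List[Dict]:
    """Merge consecutive rows with the same label into segments."""
    segments: List[Dict] = []
    current: Optional[Dict] = None
    for row in rows:
        label = row["label"]
        if current and current["label"] == label:
            # Same label as the open segment: extend it.
            current["end"] = row[time_col]
            current["scores"].append(row["score"])
        else:
            # Label changed: close the open segment (if any), start a new one.
            if current:
                segments.append(current)
            current = {
                "label": label,
                "start": row[time_col],
                "end": row[time_col],
                "scores": [row["score"]],
            }
    # Do not forget the last open segment.
    if current:
        segments.append(current)
    return segments


# Toy per-row predictions: three "Run" rows followed by three "Walk" rows.
predicted = [
    {"time": t, "label": "Run" if t < 3 else "Walk", "score": 0.9}
    for t in range(6)
]
segments = group_rows(predicted, "time")
```

Note that without the `if current:` guard before `segments.append(current)`, the first iteration would append `None`, so that check matters for the first row.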

label_studio_ml/examples/timeseries_segmenter/tests/test_segmenter.py

Lines changed: 7 additions & 1 deletion
@@ -71,7 +71,13 @@ def make_task():
7171
'from_name': 'label',
7272
'to_name': 'ts',
7373
'type': 'timeserieslabels',
74-
'value': {
74+
segs = results[0]["result"]
75+
assert len(segs) == 2
76+
assert segs[0]["value"]["start"] == 0
77+
assert segs[0]["value"]["timeserieslabels"] == ["Run"]
78+
assert segs[1]["value"]["timeserieslabels"] == ["Walk"]
79+
assert 80 <= segs[1]["value"]["start"] <= 90
80+
assert segs[1]["value"]["end"] == 99
7581
'start': 85,
7682
'end': 99,
7783
'instant': False,
