-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Estimators for Timeseries SSA / IID ChangepointDetection and SpikeDetection transforms #1254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ganik
commented
Oct 13, 2018
- Created Estimators for SSAChangePointDetector, SSASpikeDetector, IidChangePointDetector, IidSpikeDetector
- Added unit tests
…& IID SpikeDetection transformes. Added unit tests
/// <summary> | ||
/// Counts the number of rows observed by the transform so far. | ||
/// </summary> | ||
protected int RowCounter { get; private set; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int [](start = 22, length = 3)
should be long, we support more than 2billion rows in dataview #Resolved
using Microsoft.ML.Runtime.RunTests; | ||
using System; | ||
using System.Collections.Generic; | ||
using Microsoft.ML.Runtime.TimeSeriesProcessing; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ctrl+r+g #Resolved
|
||
var pipe = new SsaChangePointEstimator(Env, | ||
Confidence, ChangeHistorySize, MaxTrainingSize, SeasonalitySize, "Value", "Change"); | ||
TestEstimatorCore(pipe, dataView); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestEstimatorCore [](start = 12, length = 17)
can we have at least one test with wrong schema? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good suggestion, found few issues when added the invalid input tests
In reply to: 225355511 [](ancestors = 225355511)
@@ -16,7 +16,7 @@ | |||
fAnomaly: Vec<R4, 4> | |||
Metadata 'SlotNames': Vec<Text, 4>: Length=4, Count=4 | |||
[0] 'Alert', [1] 'Raw Score', [2] 'P-Value Score', [3] 'Martingale Score' | |||
---- IidChangePointDetector ---- | |||
---- RowToRowMapperTransform ---- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can move this tests to Common/SavePipe folder, or you need to do same thing in SingleRelease/SavePipe #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string output) | ||
{ | ||
Contracts.CheckValue(env, nameof(env)); | ||
_host = env.Register("SsaSpikeEstimator"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"SsaSpikeEstimator" [](start = 33, length = 19)
nameof(SsaSpikeEstimator) #Closed
private readonly int _trainingWindowSize; | ||
private readonly int _seasonalityWindowSize; | ||
|
||
public SsaSpikeEstimator( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SsaSpikeEstimator [](start = 15, length = 17)
Public constructors need to have proper comments. #Closed
{ | ||
public PipelineColumn Input { get; } | ||
|
||
public OutColumn(Vector<float> input, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OutColumn [](start = 19, length = 9)
Did you put static method which calls this OutColumn in other class, or it just missing? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed pigsty extensions, will be part of separate PR
In reply to: 225356515 [](ancestors = 225356515)
} | ||
|
||
[Fact] | ||
void TestSsaSpikeEstimator() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestSsaSpikeEstimator [](start = 13, length = 21)
Can you add pigsty extension tests to StaticPipeTests.cs file? (see LpGcNormAndWhitening as example) #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed pigsty extensions, will be part of separate PR
In reply to: 225356643 [](ancestors = 225356643)
string output) | ||
{ | ||
Contracts.CheckValue(env, nameof(env)); | ||
_host = env.Register("SsaChangePointEstimator"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"SsaChangePointEstimator" [](start = 33, length = 25)
nameof(SsaChangePointEstimator) #Closed
_slotNames[0] = "Alert"; | ||
_slotNames[1] = "Raw Score"; | ||
_slotNames[2] = "P-Value Score"; | ||
_slotNames[3] = "Martingale Score"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would simplify and moved to _slotName definition. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tried but getting build error: "... a field initialiser outside the constructor"
In reply to: 225357735 [](ancestors = 225357735)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Ivan meant slotNames = new[] { "Alert", "Foo", "Bar", "Baz" }
In reply to: 226489440 [](ancestors = 226489440,225357735)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, tried this and got build error.
In reply to: 226785830 [](ancestors = 226785830,226489440,225357735)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh u mean inside the constructor. ok
In reply to: 226806798 [](ancestors = 226806798,226785830,226489440,225357735)
{ | ||
return new IidSpikeDetector(env, this, newSource); | ||
Contracts.CheckValue(env, nameof(env)); | ||
_host = env.Register("IidSpikeEstimator"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"IidSpikeEstimator" [](start = 33, length = 19)
nameof #Resolved
private readonly int _confidence; | ||
private readonly int _pvalueHistoryLength; | ||
|
||
public IidSpikeEstimator( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add Summary #Resolved
{ | ||
return new IidChangePointDetector(env, this, newSource); | ||
Contracts.CheckValue(env, nameof(env)); | ||
_host = env.Register("IidChangePointEstimator"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IidChangePointEstimator [](start = 34, length = 23)
nameof() #Resolved
private readonly int _confidence; | ||
private readonly int _changeHistoryLength; | ||
|
||
public IidChangePointEstimator( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add summary comment #Resolved
public IidChangePointDetector(IHostEnvironment env, Arguments args, IDataView input) | ||
: base(new BaseArguments(args), LoaderSignature, env, input) | ||
// Factory method for SignatureDataTransform. | ||
public static IDataTransform Create(IHostEnvironment env, Arguments args, IDataView input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public [](start = 8, length = 6)
make it private #Closed
You appear to have changed the format. Have you? If you have, please handle backwards compatibility correctly, thanks. #Closed Refers to: src/Microsoft.ML.TimeSeries/AdaptiveSingularSpectrumSequenceModeler.cs:226 in 48b0ddd. [](commit_id = 48b0ddd, deletion_comment = False) |
I guess you didn't write this test, but it takes a comparatively long time at 14 seconds (debug build), and my observation is that my machine is some multiple faster than the build machines. It appears to also be failing. While you're fixing it, maybe tone it down a bit so it hammers the build systems a bit less. #Closed Refers to: test/Microsoft.ML.TimeSeries.Tests/TimeSeriesDirectApi.cs:80 in 48b0ddd. [](commit_id = 48b0ddd, deletion_comment = False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @ganik ! I have two major concerns:
-
You broke backwards compatibility,
-
You have added two additional tests that at least on Windows debug each take about two minutes, which on the build server I'd expect to take even longer. Whether a fault of the test or the code, this is not something we want in our system.
} | ||
|
||
[Fact] | ||
void TestSsaChangePointEstimator() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
|
||
[Fact] | ||
void TestSsaSpikeEstimator() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// <param name="env">A reference to the environment variable.</param> | ||
/// <param name="outputColType"></param> | ||
protected SequentialTransformerBase(int windowSize, int initialWindowSize, string inputColumnName, string outputColumnName, | ||
string name, IHostEnvironment env, ColumnType outputColType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IHostEnvironment [](start = 25, length = 16)
This is always the first parameter in all our ctors. Also, the 'env plus name' pattern is uncommon: we tend to just take IHost
in those protected ctors #Resolved
verWrittenCur: 0x00010001, // Initial | ||
verReadableCur: 0x00010001, | ||
//verWrittenCur: 0x00010001, // Initial | ||
verWrittenCur: 0x00010002, // Added saving _state and _nextPrediction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
saving _state [](start = 52, length = 13)
Why do you need to do it now? Is this for the stateful row and spawning off new models? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the preexisting test TestEstimatorCore() will pass on new timeseries estimators. One of the test steps in TestEstimatorCore() is to save and re-load transformer from disk. Without this fix saving and reloading of timeseries SSA transformers changes its prediction due to lost _state and _nextPrediction
In reply to: 229477366 [](ancestors = 229477366)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ganik !