Skip to content

Estimators for Timeseries SSA / IID ChangepointDetection and SpikeDetection transforms #1254

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 45 commits into from
Oct 31, 2018

Conversation

ganik
Copy link
Member

@ganik ganik commented Oct 13, 2018

  • Created Estimators for SSAChangePointDetector, SSASpikeDetector, IidChangePointDetector, IidSpikeDetector
  • Added unit tests

/// <summary>
/// Counts the number of rows observed by the transform so far.
/// </summary>
protected int RowCounter { get; private set; }
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int [](start = 22, length = 3)

should be long, we support more than 2billion rows in dataview #Resolved

using Microsoft.ML.Runtime.RunTests;
using System;
using System.Collections.Generic;
using Microsoft.ML.Runtime.TimeSeriesProcessing;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctrl+r+g #Resolved


var pipe = new SsaChangePointEstimator(Env,
Confidence, ChangeHistorySize, MaxTrainingSize, SeasonalitySize, "Value", "Change");
TestEstimatorCore(pipe, dataView);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestEstimatorCore [](start = 12, length = 17)

can we have at least one test with wrong schema? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good suggestion, found few issues when added the invalid input tests


In reply to: 225355511 [](ancestors = 225355511)

@@ -16,7 +16,7 @@
fAnomaly: Vec<R4, 4>
Metadata 'SlotNames': Vec<Text, 4>: Length=4, Count=4
[0] 'Alert', [1] 'Raw Score', [2] 'P-Value Score', [3] 'Martingale Score'
---- IidChangePointDetector ----
---- RowToRowMapperTransform ----
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can move this tests to Common/SavePipe folder, or you need to do same thing in SingleRelease/SavePipe #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made changes in SingleRelease too


In reply to: 225355661 [](ancestors = 225355661)

string output)
{
Contracts.CheckValue(env, nameof(env));
_host = env.Register("SsaSpikeEstimator");
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"SsaSpikeEstimator" [](start = 33, length = 19)

nameof(SsaSpikeEstimator) #Closed

private readonly int _trainingWindowSize;
private readonly int _seasonalityWindowSize;

public SsaSpikeEstimator(
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SsaSpikeEstimator [](start = 15, length = 17)

Public constructors need to have proper comments. #Closed

{
public PipelineColumn Input { get; }

public OutColumn(Vector<float> input,
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OutColumn [](start = 19, length = 9)

Did you put static method which calls this OutColumn in other class, or it just missing? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed pigsty extensions, will be part of separate PR


In reply to: 225356515 [](ancestors = 225356515)

}

[Fact]
void TestSsaSpikeEstimator()
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestSsaSpikeEstimator [](start = 13, length = 21)

Can you add pigsty extension tests to StaticPipeTests.cs file? (see LpGcNormAndWhitening as example) #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed pigsty extensions, will be part of separate PR


In reply to: 225356643 [](ancestors = 225356643)

string output)
{
Contracts.CheckValue(env, nameof(env));
_host = env.Register("SsaChangePointEstimator");
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"SsaChangePointEstimator" [](start = 33, length = 25)

nameof(SsaChangePointEstimator) #Closed

_slotNames[0] = "Alert";
_slotNames[1] = "Raw Score";
_slotNames[2] = "P-Value Score";
_slotNames[3] = "Martingale Score";
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would simplify and moved to _slotName definition. #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tried but getting build error: "... a field initialiser outside the constructor"


In reply to: 225357735 [](ancestors = 225357735)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Ivan meant slotNames = new[] { "Alert", "Foo", "Bar", "Baz" }


In reply to: 226489440 [](ancestors = 226489440,225357735)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, tried this and got build error.


In reply to: 226785830 [](ancestors = 226785830,226489440,225357735)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh u mean inside the constructor. ok


In reply to: 226806798 [](ancestors = 226806798,226785830,226489440,225357735)

{
return new IidSpikeDetector(env, this, newSource);
Contracts.CheckValue(env, nameof(env));
_host = env.Register("IidSpikeEstimator");
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"IidSpikeEstimator" [](start = 33, length = 19)

nameof #Resolved

private readonly int _confidence;
private readonly int _pvalueHistoryLength;

public IidSpikeEstimator(
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add Summary #Resolved

{
return new IidChangePointDetector(env, this, newSource);
Contracts.CheckValue(env, nameof(env));
_host = env.Register("IidChangePointEstimator");
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IidChangePointEstimator [](start = 34, length = 23)

nameof() #Resolved

private readonly int _confidence;
private readonly int _changeHistoryLength;

public IidChangePointEstimator(
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add summary comment #Resolved

public IidChangePointDetector(IHostEnvironment env, Arguments args, IDataView input)
: base(new BaseArguments(args), LoaderSignature, env, input)
// Factory method for SignatureDataTransform.
public static IDataTransform Create(IHostEnvironment env, Arguments args, IDataView input)
Copy link
Contributor

@Zruty0 Zruty0 Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public [](start = 8, length = 6)

make it private #Closed

@TomFinley
Copy link
Contributor

TomFinley commented Oct 30, 2018

            verWeCanReadBack: 0x00010001,

You appear to have changed the format. Have you?

If you have, please handle backwards compatibility correctly, thanks. #Closed


Refers to: src/Microsoft.ML.TimeSeries/AdaptiveSingularSpectrumSequenceModeler.cs:226 in 48b0ddd. [](commit_id = 48b0ddd, deletion_comment = False)

@TomFinley
Copy link
Contributor

TomFinley commented Oct 30, 2018

        // bool: ShouldComputeForecastIntervals

I do not see that the load comments match the save comments. #Closed


Refers to: src/Microsoft.ML.TimeSeries/AdaptiveSingularSpectrumSequenceModeler.cs:372 in 48b0ddd. [](commit_id = 48b0ddd, deletion_comment = False)

@TomFinley
Copy link
Contributor

TomFinley commented Oct 30, 2018

    public void ChangePointDetectionWithSeasonality()

I guess you didn't write this test, but it takes a comparatively long time at 14 seconds (debug build), and my observation is that my machine is some multiple faster than the build machines. It appears to also be failing. While you're fixing it, maybe tone it down a bit so it hammers the build systems a bit less. #Closed


Refers to: test/Microsoft.ML.TimeSeries.Tests/TimeSeriesDirectApi.cs:80 in 48b0ddd. [](commit_id = 48b0ddd, deletion_comment = False)

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @ganik ! I have two major concerns:

  1. You broke backwards compatibility,

  2. You have added two additional tests that at least on Windows debug each take about two minutes, which on the build server I'd expect to take even longer. Whether a fault of the test or the code, this is not something we want in our system.

}

[Fact]
void TestSsaChangePointEstimator()
Copy link
Contributor

@TomFinley TomFinley Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So...

image

That's a long time for a test to run, and this isn't even on the slower build machines, but on my dev box here, which is comparatively fast. Could you tone it down a bit? #Closed

}

[Fact]
void TestSsaSpikeEstimator()
Copy link
Contributor

@TomFinley TomFinley Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Another two minute test. If this is expected given the large volume of data, please tone down the volume. Certainly we don't need a 150k sample for a test. 😛 If unexpected, please investigat.e #Closed

/// <param name="env">A reference to the environment variable.</param>
/// <param name="outputColType"></param>
protected SequentialTransformerBase(int windowSize, int initialWindowSize, string inputColumnName, string outputColumnName,
string name, IHostEnvironment env, ColumnType outputColType)
Copy link
Contributor

@Zruty0 Zruty0 Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IHostEnvironment [](start = 25, length = 16)

This is always the first parameter in all our ctors. Also, the 'env plus name' pattern is uncommon: we tend to just take IHost in those protected ctors #Resolved

verWrittenCur: 0x00010001, // Initial
verReadableCur: 0x00010001,
//verWrittenCur: 0x00010001, // Initial
verWrittenCur: 0x00010002, // Added saving _state and _nextPrediction
Copy link
Contributor

@Zruty0 Zruty0 Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

saving _state [](start = 52, length = 13)

Why do you need to do it now? Is this for the stateful row and spawning off new models? #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the preexisting test TestEstimatorCore() will pass on new timeseries estimators. One of the test steps in TestEstimatorCore() is to save and re-load transformer from disk. Without this fix saving and reloading of timeseries SSA transformers changes its prediction due to lost _state and _nextPrediction


In reply to: 229477366 [](ancestors = 229477366)

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ganik !

@shauheen shauheen merged commit 1391107 into dotnet:master Oct 31, 2018
@ganik ganik deleted the ganik/ssa branch November 2, 2018 00:03
@ghost ghost locked as resolved and limited conversation to collaborators Mar 27, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants