
Internalization of TensorFlowUtils.cs and refactored TensorFlowCatalog. #2672

Merged 14 commits on Mar 1, 2019
@@ -20,7 +20,7 @@ public static void Example()
var idv = mlContext.Data.ReadFromEnumerable(data);

// Create a ML pipeline.
- var pipeline = mlContext.Transforms.ScoreTensorFlowModel(
+ var pipeline = mlContext.Transforms.TensorFlow.ScoreTensorFlowModel(
modelLocation,
new[] { nameof(OutputScores.output) },
new[] { nameof(TensorData.input) });
@@ -45,7 +45,7 @@ public static void Example()
// Load the TensorFlow model once.
// - Use it for querying the schema for input and output in the model
// - Use it for prediction in the pipeline.
- var modelInfo = TensorFlowUtils.LoadTensorFlowModel(mlContext, modelLocation);
+ var modelInfo = mlContext.Transforms.TensorFlow.LoadTensorFlowModel(modelLocation);
var schema = modelInfo.GetModelSchema();
var featuresType = (VectorType)schema["Features"].Type;
@yaeldekel commented on Feb 22, 2019 (on "Features"):

Can we add a sample that uses modelInfo.GetInputSchema() to find out what the name of the input node is?

#Resolved

@zeahmed (Contributor Author) commented on Feb 23, 2019:

I see it's being used in a couple of places in the tests, e.g.

#Resolved

Console.WriteLine("Name: {0}, Type: {1}, Shape: (-1, {2})", "Features", featuresType.ItemType.RawType, featuresType.Dimensions[0]);
@@ -72,7 +72,7 @@ public static void Example()
var engine = mlContext.Transforms.Text.TokenizeWords("TokenizedWords", "Sentiment_Text")
.Append(mlContext.Transforms.Conversion.ValueMap(lookupMap, "Words", "Ids", new[] { ("VariableLenghtFeatures", "TokenizedWords") }))
.Append(mlContext.Transforms.CustomMapping(ResizeFeaturesAction, "Resize"))
- .Append(mlContext.Transforms.ScoreTensorFlowModel(modelInfo, new[] { "Prediction/Softmax" }, new[] { "Features" }))
+ .Append(mlContext.Transforms.TensorFlow.ScoreTensorFlowModel(modelInfo, new[] { "Prediction/Softmax" }, new[] { "Features" }))
.Append(mlContext.Transforms.CopyColumns(("Prediction", "Prediction/Softmax")))
.Fit(dataView)
.CreatePredictionEngine<IMDBSentiment, OutputScores>(mlContext);
16 changes: 16 additions & 0 deletions src/Microsoft.ML.Data/Transforms/TransformsCatalog.cs
@@ -37,6 +37,11 @@ public sealed class TransformsCatalog
/// </summary>
public FeatureSelectionTransforms FeatureSelection { get; }

/// <summary>
/// List of operations for using TensorFlow model.
/// </summary>
public TensorFlowTransforms TensorFlow { get; }
@TomFinley (Contributor) commented on Feb 21, 2019 (on "TensorFlowTransforms"):

Please do not do this. Otherwise we will have an empty property unless someone imports the nuget, which is confusing and undesirable.

Please follow instead the pattern that we see in image processing. You'll note that we do not have an empty image processing nuget. Rather they are added to this catalog. Similar with ONNX scoring. You'll note that these both have extensions on this TransformsCatalog catalog. What we don't have are empty properties "Images" and "ONNX" littering our central object that users interact with to instantiate components.

This is defensible since we can take someone directly importing a nuget as a strong signal that they want to actually use those transforms. #Resolved
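
The pattern recommended in this comment can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual ML.NET code: `DemoTransformsCatalog`, `DemoTensorFlowEstimator`, and `DemoTensorFlowCatalog` are hypothetical stand-ins for the central catalog, the estimator, and the extension class that would live in the optional package.

```csharp
using System;

// Hypothetical stand-in for the central catalog in the core package.
// It deliberately has no TensorFlow-specific property.
public sealed class DemoTransformsCatalog
{
    public string EnvironmentName { get; }

    public DemoTransformsCatalog(string environmentName)
        => EnvironmentName = environmentName;
}

// Hypothetical stand-in for an estimator shipped in the optional package.
public sealed class DemoTensorFlowEstimator
{
    public string ModelLocation { get; }

    public DemoTensorFlowEstimator(string modelLocation)
        => ModelLocation = modelLocation;
}

// Would live in the optional package: the extension method hangs directly
// off the catalog, so it only becomes visible once that package is referenced.
public static class DemoTensorFlowCatalog
{
    public static DemoTensorFlowEstimator ScoreTensorFlowModel(
        this DemoTransformsCatalog catalog, string modelLocation)
    {
        if (catalog == null) throw new ArgumentNullException(nameof(catalog));
        return new DemoTensorFlowEstimator(modelLocation);
    }
}

public static class Program
{
    public static void Main()
    {
        var transforms = new DemoTransformsCatalog("demo");

        // Called on the catalog itself; there is no intermediate sub-catalog property.
        var estimator = transforms.ScoreTensorFlowModel("model.pb");
        Console.WriteLine(estimator.ModelLocation);
    }
}
```

With this shape, a user who has not referenced the optional package never sees an empty property on the catalog, which appears to be the concern raised above.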


internal TransformsCatalog(IHostEnvironment env)
{
Contracts.AssertValue(env);
@@ -47,6 +52,7 @@ internal TransformsCatalog(IHostEnvironment env)
Text = new TextTransforms(this);
Projection = new ProjectionTransforms(this);
FeatureSelection = new FeatureSelectionTransforms(this);
TensorFlow = new TensorFlowTransforms(this);
}

public abstract class SubCatalogBase
@@ -109,5 +115,15 @@ internal FeatureSelectionTransforms(TransformsCatalog owner) : base(owner)
{
}
}

/// <summary>
/// The catalog of TensorFlow operations.
/// </summary>
public sealed class TensorFlowTransforms : SubCatalogBase
{
internal TensorFlowTransforms(TransformsCatalog owner) : base(owner)
{
}
}
}
}
@@ -17,7 +17,7 @@ public static void Main(string[] args)
return;
}

- foreach (var (name, opType, type, inputs) in TensorFlowUtils.GetModelNodes(new MLContext(), args[0]))
+ foreach (var (name, opType, type, inputs) in new MLContext().Transforms.TensorFlow.GetModelNodes(args[0]))
{
var inputsString = inputs.Length == 0 ? "" : $", input nodes: {string.Join(", ", inputs)}";
Console.WriteLine($"Graph node: '{name}', operation type: '{opType}', output type: '{type}'{inputsString}");
10 changes: 5 additions & 5 deletions src/Microsoft.ML.TensorFlow/TensorFlow/TensorflowUtils.cs
@@ -21,12 +21,12 @@ public static class TensorFlowUtils
/// Key to access operator's type (a string) in <see cref="DataViewSchema.Column.Metadata"/>.
/// Its value describes the Tensorflow operator that produces this <see cref="DataViewSchema.Column"/>.
/// </summary>
- public const string TensorflowOperatorTypeKind = "TensorflowOperatorType";
+ internal const string TensorflowOperatorTypeKind = "TensorflowOperatorType";
/// <summary>
/// Key to access upstream operators' names (a string array) in <see cref="DataViewSchema.Column.Metadata"/>.
/// Its value states operators that the associated <see cref="DataViewSchema.Column"/>'s generator depends on.
/// </summary>
- public const string TensorflowUpstreamOperatorsKind = "TensorflowUpstreamOperators";
+ internal const string TensorflowUpstreamOperatorsKind = "TensorflowUpstreamOperators";

internal static DataViewSchema GetModelSchema(IExceptionContext ectx, TFGraph graph, string opType = null)
{
@@ -94,7 +94,7 @@ internal static DataViewSchema GetModelSchema(IExceptionContext ectx, TFGraph gr
/// </summary>
/// <param name="env">The environment to use.</param>
/// <param name="modelPath">Model to load.</param>
- public static DataViewSchema GetModelSchema(IHostEnvironment env, string modelPath)
+ internal static DataViewSchema GetModelSchema(IHostEnvironment env, string modelPath)
{
var model = LoadTensorFlowModel(env, modelPath);
return GetModelSchema(env, model.Session.Graph);
@@ -109,7 +109,7 @@ public static DataViewSchema GetModelSchema(IHostEnvironment env, string modelPa
/// <param name="env">The environment to use.</param>
/// <param name="modelPath">Model to load.</param>
/// <returns></returns>
- public static IEnumerable<(string, string, DataViewType, string[])> GetModelNodes(IHostEnvironment env, string modelPath)
+ internal static IEnumerable<(string, string, DataViewType, string[])> GetModelNodes(IHostEnvironment env, string modelPath)
{
var schema = GetModelSchema(env, modelPath);

@@ -338,7 +338,7 @@ private static void CreateTempDirectoryWithAcl(string folder, string identity)
/// <param name="env">The environment to use.</param>
/// <param name="modelPath">The model to load.</param>
/// <returns></returns>
- public static TensorFlowModelInfo LoadTensorFlowModel(IHostEnvironment env, string modelPath)
+ internal static TensorFlowModelInfo LoadTensorFlowModel(IHostEnvironment env, string modelPath)
{
var session = GetSession(env, modelPath);
return new TensorFlowModelInfo(env, session, modelPath);
2 changes: 1 addition & 1 deletion src/Microsoft.ML.TensorFlow/TensorFlowModelInfo.cs
@@ -20,7 +20,7 @@ namespace Microsoft.ML.Transforms
/// </item>
/// </list>
/// </summary>
- public class TensorFlowModelInfo
+ public sealed class TensorFlowModelInfo
@yaeldekel commented on Feb 21, 2019 (on "TensorFlowModelInfo"):

Would it be possible to rename this to TensorFlowModel? #Resolved

{
internal TFSession Session { get; }
public string ModelPath { get; }
@yaeldekel commented on Feb 21, 2019 (on "public"):

I think this can also be internal. #Resolved

47 changes: 41 additions & 6 deletions src/Microsoft.ML.TensorFlow/TensorflowCatalog.cs
@@ -2,9 +2,11 @@
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System.Collections.Generic;
using Microsoft.Data.DataView;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;
using Microsoft.ML.Transforms.TensorFlow;

namespace Microsoft.ML
{
@@ -25,7 +27,7 @@ public static class TensorflowCatalog
/// ]]>
/// </format>
/// </example>
- public static TensorFlowEstimator ScoreTensorFlowModel(this TransformsCatalog catalog,
+ public static TensorFlowEstimator ScoreTensorFlowModel(this TransformsCatalog.TensorFlowTransforms catalog,
string modelLocation,
string outputColumnName,
string inputColumnName)
@@ -45,7 +47,7 @@ public static TensorFlowEstimator ScoreTensorFlowModel(this TransformsCatalog ca
/// ]]>
/// </format>
/// </example>
- public static TensorFlowEstimator ScoreTensorFlowModel(this TransformsCatalog catalog,
+ public static TensorFlowEstimator ScoreTensorFlowModel(this TransformsCatalog.TensorFlowTransforms catalog,
string modelLocation,
string[] outputColumnNames,
string[] inputColumnNames)
@@ -58,7 +60,7 @@ public static TensorFlowEstimator ScoreTensorFlowModel(this TransformsCatalog ca
/// <param name="tensorFlowModel">The pre-loaded TensorFlow model.</param>
/// <param name="inputColumnName"> The name of the model input.</param>
/// <param name="outputColumnName">The name of the requested model output.</param>
- public static TensorFlowEstimator ScoreTensorFlowModel(this TransformsCatalog catalog,
+ public static TensorFlowEstimator ScoreTensorFlowModel(this TransformsCatalog.TensorFlowTransforms catalog,
TensorFlowModelInfo tensorFlowModel,
string outputColumnName,
string inputColumnName)
@@ -78,7 +80,7 @@ public static TensorFlowEstimator ScoreTensorFlowModel(this TransformsCatalog ca
/// ]]>
/// </format>
/// </example>
- public static TensorFlowEstimator ScoreTensorFlowModel(this TransformsCatalog catalog,
+ public static TensorFlowEstimator ScoreTensorFlowModel(this TransformsCatalog.TensorFlowTransforms catalog,
TensorFlowModelInfo tensorFlowModel,
string[] outputColumnNames,
string[] inputColumnNames)
@@ -90,7 +92,7 @@ public static TensorFlowEstimator ScoreTensorFlowModel(this TransformsCatalog ca
/// </summary>
/// <param name="catalog">The transform's catalog.</param>
/// <param name="options">The <see cref="TensorFlowEstimator.Options"/> specifying the inputs and the settings of the <see cref="TensorFlowEstimator"/>.</param>
- public static TensorFlowEstimator TensorFlow(this TransformsCatalog catalog,
+ public static TensorFlowEstimator TensorFlow(this TransformsCatalog.TensorFlowTransforms catalog,
TensorFlowEstimator.Options options)
=> new TensorFlowEstimator(CatalogUtils.GetEnvironment(catalog), options);

@@ -100,9 +102,42 @@ public static TensorFlowEstimator TensorFlow(this TransformsCatalog catalog,
/// <param name="catalog">The transform's catalog.</param>
/// <param name="options">The <see cref="TensorFlowEstimator.Options"/> specifying the inputs and the settings of the <see cref="TensorFlowEstimator"/>.</param>
/// <param name="tensorFlowModel">The pre-loaded TensorFlow model.</param>
- public static TensorFlowEstimator TensorFlow(this TransformsCatalog catalog,
+ public static TensorFlowEstimator TensorFlow(this TransformsCatalog.TensorFlowTransforms catalog,
TensorFlowEstimator.Options options,
TensorFlowModelInfo tensorFlowModel)
=> new TensorFlowEstimator(CatalogUtils.GetEnvironment(catalog), options, tensorFlowModel);

/// <summary>
/// This method retrieves the information about the graph nodes of a TensorFlow model as a <see cref="DataViewSchema"/>.
/// For every node in the graph that has an output type that is compatible with the types supported by
/// <see cref="TensorFlowEstimator"/>, the output schema contains a column with the name of that node, and the
/// type of its output (including the item type and the shape, if it is known). Every column also contains metadata
/// of kind <see cref="TensorFlowUtils.TensorflowOperatorTypeKind"/>, indicating the operation type of the node, and if that node has inputs in the graph,
/// it contains metadata of kind <see cref="TensorFlowUtils.TensorflowUpstreamOperatorsKind"/>, indicating the names of the input nodes.
/// </summary>
/// <param name="catalog">The transform's catalog.</param>
/// <param name="modelLocation">Location of the TensorFlow model.</param>
public static DataViewSchema GetModelSchema(this TransformsCatalog.TensorFlowTransforms catalog, string modelLocation)
@yaeldekel commented on Feb 21, 2019 (on "GetModelSchema"):

I think GetModelSchema should take a TensorFlowModelInfo instead of a string
(so that users can use the already loaded model they got from LoadTensorFlowModel). #Resolved
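
The motivation for this suggestion can be illustrated with a small sketch. All types here (`DemoLoadedModel`, `DemoSchemaApi`) are hypothetical stand-ins, not the ML.NET API, and the "schema" is reduced to a list of node names; the point is only that a path-based overload must reload the model on each call, while a model-based overload reuses what the caller already loaded.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a loaded model; counts how often loading happens.
public sealed class DemoLoadedModel
{
    public static int LoadCount { get; private set; }

    public string ModelPath { get; }
    public IReadOnlyList<string> NodeNames { get; }

    private DemoLoadedModel(string modelPath, IReadOnlyList<string> nodeNames)
    {
        ModelPath = modelPath;
        NodeNames = nodeNames;
    }

    // Simulates the expensive step (for example, reading the graph from disk).
    public static DemoLoadedModel Load(string modelPath)
    {
        LoadCount++;
        return new DemoLoadedModel(modelPath, new[] { "Features", "Prediction/Softmax" });
    }
}

public static class DemoSchemaApi
{
    // Path-based overload: has to load the model on every call.
    public static IReadOnlyList<string> GetModelSchema(string modelPath)
        => DemoLoadedModel.Load(modelPath).NodeNames;

    // Model-based overload: reuses the model the caller already loaded.
    public static IReadOnlyList<string> GetModelSchema(DemoLoadedModel model)
        => model.NodeNames;
}

public static class Program
{
    public static void Main()
    {
        var model = DemoLoadedModel.Load("model.pb");

        // Querying the schema from the loaded model does not trigger another load.
        var schema = DemoSchemaApi.GetModelSchema(model);
        Console.WriteLine($"{schema.Count} nodes, {DemoLoadedModel.LoadCount} load(s)");
    }
}
```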

=> TensorFlowUtils.GetModelSchema(CatalogUtils.GetEnvironment(catalog), modelLocation);

/// <summary>
/// This is a convenience method for iterating over the nodes of a TensorFlow model graph. It
/// iterates over the columns of the <see cref="DataViewSchema"/> returned by <see cref="GetModelSchema(TransformsCatalog.TensorFlowTransforms, string)"/>,
/// and for each one it returns a tuple containing the name, operation type, column type and an array of input node names.
/// This method is convenient for filtering nodes based on certain criteria, for example, by the operation type.
/// </summary>
/// <param name="catalog">The transform's catalog.</param>
/// <param name="modelLocation">Location of the TensorFlow model.</param>
public static IEnumerable<(string, string, DataViewType, string[])> GetModelNodes(this TransformsCatalog.TensorFlowTransforms catalog, string modelLocation)
@yaeldekel commented on Feb 20, 2019 (on "GetModelNodes"):

This doesn't need to be a part of the public API. #Resolved

@zeahmed (Contributor Author) commented on Feb 21, 2019:

It is being used here:

foreach (var (name, opType, type, inputs) in TensorFlowUtils.GetModelNodes(new MLContext(), args[0]))

If I don't expose it like this then I will have to make the internal one public. What do you suggest?



=> TensorFlowUtils.GetModelNodes(CatalogUtils.GetEnvironment(catalog), modelLocation);

/// <summary>
/// Load TensorFlow model into memory. This is a convenience method that allows the model to be loaded once and then used for querying its schema and for creating a
/// <see cref="TensorFlowEstimator"/> using <see cref="TensorFlow(TransformsCatalog.TensorFlowTransforms, TensorFlowEstimator.Options, TensorFlowModelInfo)"/>.
/// </summary>
/// <param name="catalog">The transform's catalog.</param>
/// <param name="modelLocation">Location of the TensorFlow model.</param>
public static TensorFlowModelInfo LoadTensorFlowModel(this TransformsCatalog.TensorFlowTransforms catalog, string modelLocation)
=> TensorFlowUtils.LoadTensorFlowModel(CatalogUtils.GetEnvironment(catalog), modelLocation);
}
}
2 changes: 1 addition & 1 deletion test/Microsoft.ML.Tests/Scenarios/TensorflowTests.cs
@@ -36,7 +36,7 @@ public void TensorFlowTransforCifarEndToEndTest()
var pipeEstimator = new ImageLoadingEstimator(mlContext, imageFolder, ("ImageReal", "ImagePath"))
.Append(new ImageResizingEstimator(mlContext, "ImageCropped", imageHeight, imageWidth, "ImageReal"))
.Append(new ImagePixelExtractingEstimator(mlContext, "Input", "ImageCropped", interleave: true))
- .Append(mlContext.Transforms.ScoreTensorFlowModel(model_location, "Output", "Input"))
+ .Append(mlContext.Transforms.TensorFlow.ScoreTensorFlowModel(model_location, "Output", "Input"))
.Append(new ColumnConcatenatingEstimator(mlContext, "Features", "Output"))
.Append(new ValueToKeyMappingEstimator(mlContext, "Label"))
.AppendCacheCheckpoint(mlContext)