feat: Added dataset parameters to dataflow debug settings and added tests #228

ClementVaillantCodit
Contributor

  • Added dataset parameters to dataflow debug settings
    • For reference, debugSettings.DatasetParameters is of type BinaryData. The data is a JSON representation of a Dictionary<string, Dictionary<string, object>>, where the outer dictionary is keyed by dataset name and each inner dictionary holds that dataset's parameter keys and values
  • Added tests. @stijnmoreels, since the DataSetParameters property is internal, I'm not sure how I could assert that a dataset parameter added to a non-empty dictionary of parameters is added to that dictionary rather than overriding it. Any suggestions? Should I expose RunDataFlowOptions.DataSetParameters via a method?
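To make the shape of `debugSettings.DatasetParameters` concrete, here is a small Python sketch of the JSON payload described above. It only illustrates the data shape and is not the Arcus or Azure SDK C# API; the dataset names (`SourceDataSet`, `SinkDataSet`) and parameter keys are hypothetical. It also shows the behaviour the test question is about: adding a parameter to a dataset that already has parameters should extend its inner dictionary, not replace it.

```python
import json
from typing import Any, Dict

# Outer key: dataset name; inner dictionary: that dataset's parameter keys and values.
DataSetParameters = Dict[str, Dict[str, Any]]


def add_dataset_parameter(
    parameters: DataSetParameters, dataset_name: str, key: str, value: Any
) -> None:
    """Add one parameter to a dataset without dropping the parameters it already has."""
    # setdefault reuses an existing inner dictionary instead of overwriting it.
    parameters.setdefault(dataset_name, {})[key] = value


parameters: DataSetParameters = {}
add_dataset_parameter(parameters, "SourceDataSet", "FolderPath", "input")
add_dataset_parameter(parameters, "SourceDataSet", "FileName", "data.csv")
add_dataset_parameter(parameters, "SinkDataSet", "FolderPath", "output")

# The JSON document that would end up in the debug settings.
payload = json.dumps(parameters)
```

A naive `parameters[dataset_name] = {key: value}` would silently drop `FolderPath` when `FileName` is added, which is exactly the override the question wants to rule out.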

@ClementVaillantCodit ClementVaillantCodit requested a review from a team as a code owner November 21, 2024 16:22

netlify bot commented Nov 21, 2024

Deploy Preview for arcus-testing canceled.

🔨 Latest commit: 386cc4e
🔍 Latest deploy log: https://app.netlify.com/sites/arcus-testing/deploys/674fffa3c96320000803b830

@stijnmoreels stijnmoreels changed the title Added dataset parameters to dataflow debug settings and added tests feat: Added dataset parameters to dataflow debug settings and added tests Nov 21, 2024
Member

@stijnmoreels stijnmoreels left a comment

Great that you found the time to work on this a bit!
Besides these comments, and related to your question on 'how to test this': I would not make DataSetParameters public, as asserting on the existence of those parameters in that dictionary does not guarantee that the dataset parameter was 'correctly added'. That can only be determined if those parameters are actually used with a real dataset.

Would it be an idea to make the container name of the datasets in the TemporaryDataFactoryDataFlow test fixture configurable with dataset parameters? Or some other metadata (it can be anything). The same test fixture can then expose a method to retrieve that metadata, so that the test can assert that adding dataset parameters indeed results in a changed dataset.

So:

  • Add integration test with at least three dataset parameters to verify that multiple parameters for the same dataset and for different datasets can be added correctly.
  • Update docs/preview/02-Features/06-Integration/01-data-factory.mdx with the new options.AddDataSetParameter(...) method.
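A minimal sketch of the suggested test design, in Python for brevity: the test, not the fixture, provides the folder-path parameters (three in total, two for one dataset and one for another). The assertion then checks where the data would actually end up instead of inspecting the internal dictionary. `resolve_dataset_location` is a hypothetical stand-in for how a parameterized dataset might combine its parameters into a path. The real check would run against an actual Data Factory data flow.

```python
from typing import Dict


def resolve_dataset_location(container: str, dataset_parameters: Dict[str, str]) -> str:
    """Hypothetical: join a dataset's folder parameters, in key order, under its container."""
    folders = [dataset_parameters[key] for key in sorted(dataset_parameters)]
    return "/".join([container, *folders])


# Parameters generated and provided by the test itself.
parameters = {
    "SourceDataSet": {"Folder1": "2024", "Folder2": "november"},  # two for the same dataset
    "SinkDataSet": {"Folder1": "results"},  # one for a different dataset
}

source_location = resolve_dataset_location("source", parameters["SourceDataSet"])
sink_location = resolve_dataset_location("sink", parameters["SinkDataSet"])
```

Because the expected locations are derived from values the test chose, a wrong or overridden parameter shows up as a file in the wrong place rather than as a dictionary detail.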

ClementVaillantCodit and others added 3 commits November 22, 2024 08:43
- Reorganized using directives in TemporaryDataFlowDebugSession.cs.
- Modified RunDataFlowOptions to change DataSetParameters type, add null checks, and handle nested dictionaries.
- Enhanced TemporaryDataFactoryDataFlow with new properties, methods, and dataset parameter handling.
- Updated RunDataFlowTests with a new test for dataset parameters and removed obsolete tests.
@ClementVaillantCodit
Contributor Author

@stijnmoreels I'd appreciate a fresh review of the newly pushed changes, please. I added new methods specifically for running the data flow with dataset parameters, but I'm really unsure whether that's a good way to go.
Looking forward to reading your comments!

@arcus-automation
Collaborator

arcus-automation commented Nov 27, 2024

🧪 Code coverage summary

Line coverage: 🟢 91.6%
Branch coverage: 🟢 80.9%

Great job! 😎 Your code coverage is higher than my caffeine levels! ☕

Member

@stijnmoreels stijnmoreels left a comment

Already a big step forward on testing out whether we can add dataset parameters. Great job!
The remaining comments are all about the same thing: placing the right kind of information in the test body. Currently, the parameters are a bit hidden away in the TemporaryDataFactoryDataFlow, while they in fact play the leading role in the test. For a test reader, it should be clear that we can provide parameters and that this results in a changed interaction with the DataSet.

Folder paths are, I think, indeed a good way of doing this, as you provide many subfolders and, therefore, many parameters. Only, these subpaths should be generated and provided by the test instead of being the responsibility of the fixture.

Some comments here already give you inspiration, but contact me if something is not clear and you want to have a chat. This is mostly about limiting duplication and code, and placing the right focus in the test.

Thx!

ClementVaillantCodit and others added 7 commits November 28, 2024 08:49
- Introduced new classes to encapsulate DataSet options for source and sink in Azure DataFactory.
- Updated methods to utilize these new classes, enhancing flexibility and configurability.
- Refactored and removed redundant methods, consolidating functionality.
- Updated tests to reflect these changes and added a helper method for generating randomized DataSet parameter keys and values.
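The last commit above mentions a helper that generates randomized DataSet parameter keys and values. Here is a Python sketch of that idea, with a hypothetical name and format. Random keys avoid collisions between parameters. Random values give each test run its own folder path, which helps ensure a test does not pass on data left behind by an earlier run.

```python
import secrets
from typing import Dict


def generate_parameter_key_values(count: int, prefix: str = "folder") -> Dict[str, str]:
    """Hypothetical helper: `count` random parameter keys mapped to random folder names."""
    key_values: Dict[str, str] = {}
    while len(key_values) < count:  # loop again in the unlikely case of a key collision
        # token_hex(4) yields 8 lowercase hex characters, safe as a path segment.
        key = f"{prefix}Key{secrets.token_hex(4)}"
        key_values[key] = f"{prefix}-{secrets.token_hex(4)}"
    return key_values


source_parameters = generate_parameter_key_values(2)
sink_parameters = generate_parameter_key_values(1)
```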
Member

@stijnmoreels stijnmoreels left a comment

We're almost there, some final remarks on duplication and readability. Biggest chunk of work is behind us. Great job!

ClementVaillantCodit and others added 3 commits November 29, 2024 14:38
…eterKeyValues` to `AddFolderPathParameters`
…Values properties.

- Updated methods to use dataFlowOptions directly.
- Simplified UploadToSourceAsync method signature and filePath construction.
ClementVaillantCodit and others added 4 commits December 3, 2024 10:14
…xture/TemporaryDataFactoryDataFlow.cs

Co-authored-by: Stijn Moreels <[email protected]>
- Consolidated AddDataSetParameters into ApplyOptions with new parameter.
- Modified ApplyOptions signatures to remove ref and add new argument.
- Directly pass dataset parameter key-values to ApplyOptions.
- Removed `sourceDataSetParameterKeyValues` and `sinkDataSetParameterKeyValues` from `ApplyOptions` method calls and definitions.
- Updated `ApplyOptions` to iterate over `SourceDataSetParameterKeyValues` and `SinkDataSetParameterKeyValues` directly.
- Adjusted `RunDataFlowTests` to reflect the optional `tempDataFlowOptions` parameter.
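These commits settle on a final shape. `ApplyOptions` iterates over the source and sink key-value collections held by the options instead of receiving them as extra arguments, and the options themselves are optional. The following is a Python sketch of that shape. The class and field names are hypothetical, loosely modelled on the commit messages, and this is not the actual C# code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class TempDataFlowOptions:
    """Hypothetical options: parameter key-values for the source and sink datasets."""
    source_parameter_key_values: Dict[str, Any] = field(default_factory=dict)
    sink_parameter_key_values: Dict[str, Any] = field(default_factory=dict)


def apply_options(
    run_parameters: Dict[str, Dict[str, Any]],
    source_dataset: str,
    sink_dataset: str,
    options: Optional[TempDataFlowOptions] = None,
) -> None:
    """Copy the key-values held by the options into the run's per-dataset parameters."""
    options = options or TempDataFlowOptions()  # the options argument is optional
    for key, value in options.source_parameter_key_values.items():
        run_parameters.setdefault(source_dataset, {})[key] = value
    for key, value in options.sink_parameter_key_values.items():
        run_parameters.setdefault(sink_dataset, {})[key] = value


run_parameters: Dict[str, Dict[str, Any]] = {}
apply_options(
    run_parameters,
    "SourceDataSet",
    "SinkDataSet",
    TempDataFlowOptions(
        source_parameter_key_values={"Folder1": "in"},
        sink_parameter_key_values={"Folder1": "out"},
    ),
)
```

Keeping the key-values on the options object means callers pass one argument, and a call without options simply adds no dataset parameters.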
Member

@stijnmoreels stijnmoreels left a comment

I think we're there. Thx again!

@ClementVaillantCodit ClementVaillantCodit merged commit b826f6d into main Dec 4, 2024
15 checks passed
@ClementVaillantCodit ClementVaillantCodit deleted the 226-integrationdatafactory-add-support-to-set-dataset-parameters branch December 4, 2024 07:37
Development

Successfully merging this pull request may close these issues.

[Integration.DataFactory] Add support to set Dataset parameters
3 participants