Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EventHubs] Idempotent producer #16756

Conversation

yunhaoling
Copy link
Contributor

@yunhaoling yunhaoling commented Feb 16, 2021

@ghost ghost added the Event Hubs label Feb 16, 2021
@yunhaoling yunhaoling marked this pull request as ready for review February 24, 2021 15:40
@@ -392,6 +417,20 @@ def size_in_bytes(self):
"""
return self._size

@property
def starting_published_sequence_number(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still feel this needs a rename....
And I think my preference is to go with published_sequence_number have the documentation indicate that this refers to the first in the batch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC batches are transactional right? There's no partial failure here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, batches are transactional, either all events in the batch get sent or failed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong preference on the name -- personally I think the "starting_" prefix adds some clarity for a batch object while with just published_sequence_number would give us the benefit of consistency with EventData.

@jsquire , would you be able to provide some context on the "starting_" prefix being chosen instead of just starting_published_sequence_number?

cc: @chradek @conniey

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Starting" was chosen to denote that it was the sequence number applied to the first event in the batch. The batch itself is not assigned a sequence number, nor is it considered a single item by the service.

To determine the next sequence number that the service expects, one has to take the batch's starting sequence number and increment it by the number of events that were in the batch. This caused confusion for our beta users that expected the "LastPublishedSequenceNumber" returned as part of the partition publishing properties did not align with the "PublishedSequenceNumber" that we assigned to the batch. It was agreed that the prefix "Starting" helped as a mnemonic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have created a separate issue for further discussion on the namings: #16994

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we can keep as-is for preview and settle before GA

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the preview may stay for a while (or even be pulled out) so we should have time to discuss among languages :P

Copy link
Member

@swathipil swathipil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a bunch of comments, but mostly about doc cleanup - lmk if comments look wrong or don't make sense :)

@yunhaoling
Copy link
Contributor Author

/azp run python - eventhub - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@yunhaoling
Copy link
Contributor Author

/azp run python - eventhub - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@yunhaoling
Copy link
Contributor Author

/azp run python - eventhub - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@yunhaoling
Copy link
Contributor Author

/azp run python - eventhub - ci

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@yunhaoling
Copy link
Contributor Author

/azp run python - eventhub - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@yunhaoling
Copy link
Contributor Author

/azp run python - eventhub - ci

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@@ -135,7 +135,8 @@ async def on_event(partition_context, event):
ed = EventData("Event Number {}".format(i))
ed.properties = app_prop
event_list.append(ed)
senders[0].send(event_list)

senders[0].send(EventDataBatch._from_batch(event_list))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it funny that in our tests (both EH and SB), we use these private shortcuts to create a batch...
I wonder how many customers do the same :P

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea.. I remember there was once a discussion on enabling the Batch object to accept a list, however the pitfall here is when trying to adding a list into batch which exceeds the batch size, then users still have to loop over the list and add event one by one to avoid exceeding the size limit.

@yunhaoling yunhaoling changed the base branch from master to feature/eventhub/idempotent-producer March 4, 2021 17:36
@yunhaoling
Copy link
Contributor Author

/azp run python - eventhub - tests

@azure-pipelines
Copy link

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@yunhaoling
Copy link
Contributor Author

livetest passed which could be found in the python - eventhub - tests 20210304.3

@yunhaoling yunhaoling merged commit 7ad47d8 into Azure:feature/eventhub/idempotent-producer Mar 4, 2021
yunhaoling added a commit that referenced this pull request Mar 5, 2021
* sync idempotent producer constructor

* sync idempotent producer prototype

* imporve constants

* async impl

* remove duplicate code of validation on outgoing eventdata

* fix bug, add basic test

* fix mypy and pylint

* fix implementation bug

* review feedback

* validate partition configs

* add tests

* add changelog and samples

* update readme

* update shared-requirements

* update tests yml to test unreleased uamqp v1.2.15

* fix tests

* more test fix

* more test fix

* addressing comments

* add more docs

* fix tests

* fix tox warning

* fix pylint

* fix pylint

* remove the change in tests.yml

* revert non-existing links first

* update setup.py

* fix pylint
yunhaoling added a commit to yunhaoling/azure-sdk-for-python that referenced this pull request Mar 5, 2021
* sync idempotent producer constructor

* sync idempotent producer prototype

* imporve constants

* async impl

* remove duplicate code of validation on outgoing eventdata

* fix bug, add basic test

* fix mypy and pylint

* fix implementation bug

* review feedback

* validate partition configs

* add tests

* add changelog and samples

* update readme

* update shared-requirements

* update tests yml to test unreleased uamqp v1.2.15

* fix tests

* more test fix

* more test fix

* addressing comments

* add more docs

* fix tests

* fix tox warning

* fix pylint

* fix pylint

* remove the change in tests.yml

* revert non-existing links first

* update setup.py

* fix pylint
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants