-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add id_generator functionality to the document cleaner. #7683
Conversation
Pull Request Test Coverage Report for Build 9083377539Details
💛 - Coveralls |
@silvanocerza see Github issue for previous discussion on how we got here. I can also take back over PR review because I discussed this with @CarlosFerLo Perhaps have a look as well for general comment and thumbs up/down. |
@CarlosFerLo this is much better and almost there. A few minor things needed to push this one through the finish line:
And that should be it, let's integrate this one at that point. Thanks @CarlosFerLo |
@vblagoje got it, working on it right now. I thought highlights were just a summary of everything, needed in all reno. Should I include issue numbers now on or not? |
No problem @CarlosFerLo , yeah just remove the issue reference, leave the enhancement portion only 🙏 |
@vblagoje I think that everything has been solved. :) |
@CarlosFerLo one last nitpick and we can integrate it 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the wrong approach in my opinion.
The best and simpler approach from #7618 is the way to go. This adds too much unnecessary complexity.
@vblagoje you decide, although I believe this new implementation could have more applications, this extra complexity is somewhat too much for the little use it has. But it is your decision to make this merge or not. :) |
@silvanocerza @vblagoje should I reopen #7618 PR? |
Yeah, go ahead. |
Cannot reopen previous PR as it's based on the same branch as this one. Later I will create a new branch with the content of the other PR to be able to reopen the pull request. |
Hey @CarlosFerLo sure, apologies for back and forth, @silvanocerza is an expert in these details, let's go with his recommendations. |
@vblagoje do not worry, I like to learn how to implement different things, and at first I found your way more promissing. |
They are both good, it's nuances. My colleague comes with a lot more Python background and I trust his judgement in these types of decisions. |
See the new PR here: #7703 |
Superseded by #7703 closing this one |
Related Issues
Proposed Changes:
I added a new parameter to the
DocymentCleaner
calledid_generator
, it expects aCallable[[Document, Document], str]
and it is used to generate the new id for the cleaned documents. The first argument is the old document and the second is the new document.I have added two id generators
DEFFAULT_ID_GENERATOR
andKEEP_ID
where the first one keeps the id from the new document unchanged and the second one returns the id of the original document. The keep id function is to relate to the original issue.How did you test it?
Added a few unit tests.
Notes for the reviewer
I am not sure if the existence of the
DEFFAULT_ID_GENERATOR
andKEEP_ID
is what you have in mind, if you don't like it we may just delete them.Checklist
fix:
,feat:
,build:
,chore:
,ci:
,docs:
,style:
,refactor:
,perf:
,test:
. ✅