Skip to content

Conversation

@danovaro
Copy link
Member

@danovaro danovaro commented Aug 2, 2025

Description

The aim is to unify the WipeVisitor and expose some catalogue and store methods responsible for the backend-specific checks and removal actions.

Contributor Declaration

By opening this pull request, I affirm the following:

  • All authors agree to the Contributor License Agreement.
  • The code follows the project's coding standards.
  • I have performed self-review and added comments where needed.
  • I have added or updated tests to verify that my changes are effective and functional.
  • I have run all existing tests and confirmed they pass.

🌈🌦️📖🚧 Documentation 🚧📖🌦️🌈
https://sites.ecmwf.int/docs/dev-section/fdb/pull-requests/PR-158

Copy link
Contributor

@simondsmart simondsmart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added far too many comments to this PR. And I think that it could get lost in the weeds. My overall comments:

  • The breakdown of Catalogue/store functionality looks good.
  • Clearly we need to have a mutable Catalogue object through this process. This probably means creating a base EntryVisitor class with a non-const, private catalogue_entry, accessible to two derived classes MutableEntryVisitor and ImmutableEntryVisitor (or similar) which access a protected catalogue() which returns a reference differing in constness.
  • This wipe runs fully on the catalogue, dispatching to the stores. But this does not work in all cases. Consider a local disk-based store, or OpenFAM store - the remote catalogue has neither access, nor filesystem permissions to make modifications to the store. As such, control here needs to be routed through the client process. This may require the catalogue to sign change requests to the store (or at least, to have a placeholder for doing this now).
  • I'm not sure that WipeElements is the correct structure to be storing for state to be communicated around this process. That is for communicating back to clients through the API - and it feels a bit forced in places.
  • I can see why we had a second-level visitor pattern, as there is state which needs to be held. Pushing this all to be persistent state in the Catalogue object is extremely ugly, and makes our thread-nastiness even worse. I think that we need, fundamentally, some type of class-specific context object. Probably a friend object. Which stores that persistent state through the wipe process. Should be easy to extract.

#ifndef fdb5_ListIterator_H
#define fdb5_ListIterator_H

#include <unordered_map>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'm very surprised to see this being added to this header when the list functionality should be essentially untouched here...

}

size_t encodeSizeString(const std::string& s) {
return (1 + 4 + s.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the 1 and the 4 for?

size_t WipeElement::encodeSize() const {
size_t aux = 0;
for (const auto& l : uris_) {
aux += 256;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

256? wot?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely this should add the length of the URI? Or something dependent on that.


void WipeElement::print(std::ostream& out) const {

LOG_DEBUG_LIB(LibFdb5) << "Wipe(type=" << type_ << ",msg=" << msg_ << ",uris=[";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof. This is unpleasant inside ::print(), as this method is typically called in the middle of ostream output. Which results in info/error/debug output becoming interleaved on the terminal. A but unpleasant.


// If the catalogue cannot be wiped, then we don't proceed.
// This is a safeguard against wiping a catalogue that is not in a state to be wiped.
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entry visitors can recurse all the way down to elements. If you return false from visitDatabase it says to the visit mechanism not to visit the indexes or the entries explicitly. i.e. stop here.

void deselectIndex() override { NOTIMP; }

std::vector<eckit::PathName> metadataPaths() const override { NOTIMP; }
// std::vector<eckit::PathName> metadataPaths() const override { NOTIMP; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not needed, delete.

const metkit::mars::MarsRequest& request, std::ostream& out, bool doit, bool porcelain,
bool unsafeWipeAll) :
WipeVisitor(request, out, doit, porcelain, unsafeWipeAll), catalogue_(catalogue), store_(store), dbKvName_("") {}
DaosWipeVisitor::DaosWipeVisitor(const DaosCatalogue& catalogue, const metkit::mars::MarsRequest& request,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This worries me.

Most of the other WipeVisitors have been removed from the code. I think correctly.

But here, we seem to have just commented most of the functionality out. So I suspect that this is just broken.

return true;
}
bool DaosStore::doWipe(const std::vector<eckit::URI>& unknownURIs) const {
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doWipe is really a NOP?


eckit::URI locationURI{index.location().uri()};

if (!locationURI.path().dirName().sameAs(basePath())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

return forwardApiCall(StatsHelper(), request);
}

WipeIterator RemoteFDB::wipe(const FDBToolRequest& request, bool doit, bool porcelain, bool unsafeWipeAll) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of this is that it implies that wipe runs fully on the Catalogue (none of the communication is via the client).

What happens in the case that we have a remote Catalogue, but local stores (e.g. stores on Lustre). The Catalogue may not have the right filesystem permissions to wipe data which has been written by the user.

I'm not sure that this design is functionally correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants