Member

@blackwinter blackwinter commented Sep 5, 2025

This is a proof-of-concept implementation for extracting the Fix commands from the current enums into individual classes, thus paving the way for direct-linking them in the documentation as well as enabling other Metafacture modules and third-party libraries/applications to register Fix commands of their own (see e.g. Limetrans). Although it's already possible to call custom Java functions (see original issue), those always tend to stick out unfavourably. In addition, it would be nice if we could add "proper" Fix functions from other modules and/or move functions into their respective modules (but maybe extract metafix-api first).
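
To illustrate the difference between a closed enum and individually registered classes, here is a minimal sketch. All names in it (`Command`, `BuiltinCommand`, `CommandRegistry`, `Upcase`, `Reverse`) are hypothetical stand-ins chosen for illustration; they are not the types introduced by this pull request.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for the function abstraction (not the actual Metafix interface).
interface Command {
    String apply(String value);
}

// Closed set: adding a command means editing this enum.
enum BuiltinCommand implements Command {
    upcase {
        @Override
        public String apply(final String value) {
            return value.toUpperCase(Locale.ROOT);
        }
    }
}

// Open set: every command is a class of its own ...
class Upcase implements Command {
    @Override
    public String apply(final String value) {
        return value.toUpperCase(Locale.ROOT);
    }
}

// ... so another module can contribute one without touching the built-ins.
class Reverse implements Command {
    @Override
    public String apply(final String value) {
        return new StringBuilder(value).reverse().toString();
    }
}

// Hypothetical registry mapping command names to factories.
class CommandRegistry {
    private final Map<String, Supplier<Command>> factories = new HashMap<>();

    void register(final String name, final Supplier<Command> factory) {
        factories.put(name, factory);
    }

    Command lookup(final String name) {
        final Supplier<Command> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown command: " + name);
        }
        return factory.get();
    }
}

class RegistryDemo {
    public static void main(final String[] args) {
        final CommandRegistry registry = new CommandRegistry();
        registry.register("upcase", Upcase::new);   // built-in
        registry.register("reverse", Reverse::new); // contributed from elsewhere
        System.out.println(registry.lookup("reverse").apply("fix"));
        // The enum can only resolve what was compiled into it.
        System.out.println(BuiltinCommand.valueOf("upcase").apply("fix"));
    }
}
```

The registry resolves names at runtime, so the set of commands is no longer fixed at compile time; the price is a map lookup and an object creation where the enum had a constant.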

All this to say that we could gain a lot more flexibility by moving away from closed enums. I haven't run any benchmarks yet, though. It remains to be seen what kind of performance impact this refactoring may have. Also, Javadoc and tests are still missing. I'll attend to the missing pieces once we're in agreement that we indeed want to pursue this effort.

Please note that I've preserved the hitherto unofficial distinction between script-level, record-level and field-level methods, now codifying it in the package names. There's no need for it, though; we can also lump them all together in a single package. Finally, the class names are directly derived from the command names, leading to potential conflicts with other frequently used classes (e.g. org.metafacture.metafix.bind.List vs. java.util.List, or org.metafacture.metafix.method.record.Timestamp vs. org.metafacture.metamorph.functions.Timestamp). We might want to reconsider the naming scheme.
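
A small sketch of what such a conflict looks like in practice. The outer class `Bind` is a hypothetical stand-in for a package like org.metafacture.metafix.bind, used here only so that the example fits into a single file.

```java
// Hypothetical stand-in for a package such as org.metafacture.metafix.bind.
class Bind {
    // A command class named after the "list" bind.
    static final class List {
        String commandName() {
            return "list";
        }
    }

    // In this scope the simple name List already means Bind.List,
    // so the collection type has to be spelled out in full.
    static java.util.List<String> commandNames() {
        final List bind = new List();
        return java.util.List.of(bind.commandName());
    }

    public static void main(final String[] args) {
        System.out.println(commandNames());
    }
}
```

The code compiles and works, but every use of the JDK type needs its fully qualified name wherever the command class is in scope.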

And finally finally: This opens up the possibility to align Fix commands with Flux commands in terms of documentation (@Description annotation) and tooling (automatically generate fix-commands.md) as well as enable Flux command registration in the same vein (the duplication of the Flux command names in flux-commands.properties and @FluxCommand has always bugged me...).
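
As a rough sketch of the tooling idea: once each command is a class, a documentation page could be generated from annotations via reflection. The annotation `CommandDoc` and the two command classes below are hypothetical; this is not Metafacture's @Description annotation nor its actual generator.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.List;

// Hypothetical annotation for illustration only.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface CommandDoc {
    String name();
    String description();
}

@CommandDoc(name = "upcase", description = "Upcases the field value.")
class UpcaseCommand {
}

@CommandDoc(name = "trim", description = "Removes leading and trailing whitespace.")
class TrimCommand {
}

class DocGenerator {
    // Renders one Markdown section per annotated class, in the given order.
    static String toMarkdown(final List<Class<?>> commandClasses) {
        final StringBuilder sb = new StringBuilder();
        for (final Class<?> clazz : commandClasses) {
            final CommandDoc doc = clazz.getAnnotation(CommandDoc.class);
            if (doc == null) {
                continue; // undocumented classes are skipped
            }
            sb.append("### ").append(doc.name()).append("\n\n")
              .append(doc.description()).append("\n\n");
        }
        return sb.toString();
    }

    public static void main(final String[] args) {
        final List<Class<?>> classes = List.of(UpcaseCommand.class, TrimCommand.class);
        System.out.print(toMarkdown(classes));
    }
}
```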

It would be great if you could give the implementation a critical look (although the diff is huge, sorry!) and, most importantly, verify that all your use cases are still functional.

@fsteeg
Member

fsteeg commented Sep 10, 2025

Cool, that looks very good, thanks for starting that!

The extensibility with new fix commands is great, as well as getting rid of the large and growing enums, and the documentation possibilities.

Two thoughts on naming:

  • We might want to use this opportunity to rename the FixFunction interface to FixMethod (the old enum's name, now free) to avoid function/method confusion, like having all the FixFunction implementations in the method (sub)package(s).
  • As a replacement for the reserved class keyword, I'm a bit irritated by klass (probably because in my head, I 'pronounce' it in German, not like class). I'd prefer clazz, which I think is more common (and e.g. used in the JDK itself).

> [...] distinction between script-level, record-level and field-level methods [...] we can also lump them all together in a single package.

I like the additional structure.

> Finally, the class names are directly derived from the command names, leading to potential conflicts with other frequently used classes

Could it be a solution to make these package-private? They would not be offered or imported accidentally in most places. Couldn't we make the implementations of the Fix commands package-private in general? They are not, and are not intended to be called from other packages, right?

@fsteeg fsteeg removed their request for review September 10, 2025 16:13
@fsteeg fsteeg removed their assignment Sep 10, 2025
@dr0i dr0i removed their assignment Sep 11, 2025
@blackwinter
Member Author

> We might want to use this opportunity to rename the FixFunction interface to FixMethod (the old enum's name, now free) to avoid function/method confusion, like having all the FixFunction implementations in the method (sub)package(s).

I'll give my rationale later, but thanks for bringing it up.

> As a replacement for the reserved class keyword, I'm a bit irritated by klass (probably because in my head, I 'pronounce' it in German, not like class). I'd prefer clazz, which I think is more common (and e.g. used in the JDK itself).

Indeed, there's also precedent in Metafacture itself. I guess I was more in a Ruby mindset at the time, where klass is actually more common ;)

> I like the additional structure.

So do I. But, for me, it's never really clear which method belongs in which category. I'm not even sure whether the current layout is actually "correct".

> Could it be a solution to make these package-private? They would not be offered or imported accidentally in most places. Couldn't we make the implementations of the Fix commands package-private in general? They are not, and are not intended to be called from other packages, right?

No, that's not an option. I'll explain later.

TobiasNx added a commit to hbz/lobid-resources that referenced this pull request Sep 23, 2025
Tests: metafacture/metafacture-core#706

There seems to be some kind of error with regard to adding subjects.
@TobiasNx
Contributor

@blackwinter I tried lobid-resources with it after running ./gradlew publishToMavenLocal on this branch. There seems to be some kind of error in the transformation of the subject when using this branch, see:
hbz/lobid-resources#2215

@TobiasNx TobiasNx assigned blackwinter and unassigned TobiasNx Sep 23, 2025
@blackwinter
Member Author

Thanks for giving it a try! But the changes you're seeing aren't related to this pull request.

@blackwinter blackwinter assigned TobiasNx and unassigned blackwinter Sep 23, 2025
@blackwinter
Member Author

> We might want to use this opportunity to rename the FixFunction interface to FixMethod (the old enum's name, now free) to avoid function/method confusion, like having all the FixFunction implementations in the method (sub)package(s).

Theoretically, we could already have done it before. But the intention was and still is to separate the concrete Fix method/bind/conditional implementations from the abstract types they represent (function/context/predicate, resp.), while at the same time bridging the gap to Metamorph functions a little. Meaning, I can define my own Fix function (just like I can define a custom Metamorph function), but it's not a Fix method in the sense that it's not a part of the "official" language (as specified by this particular implementation). In other words: One is the API terminology, the other is the language terminology.

Having said that, I'm not opposed to changing the terminology if it's causing confusion. In the end, it doesn't really matter.

> Could it be a solution to make these package-private? They would not be offered or imported accidentally in most places. Couldn't we make the implementations of the Fix commands package-private in general? They are not, and are not intended to be called from other packages, right?

Sorry, but I wholeheartedly disagree here. First of all, it doesn't change the fact that the names would still collide. Second of all, those classes are definitely intended to be used elsewhere (notably in custom Fix commands, e.g. in Limetrans).

https://github.com/hbz/limetrans/blob/17256e363cea5359f932c7ec0f049137f41eca50/src/main/java/hbz/limetrans/function/MemberLocal.java#L24

Finally, it wouldn't even work with the current package layout; we'd have to move all commands into the top-level package org.metafacture.metafix.

java.lang.IllegalAccessException: class org.metafacture.metafix.api.FixRegistry cannot access a member of class org.metafacture.metafix.method.record.XXX with modifiers ""
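
The empty modifiers string in that message denotes package-private access. The following sketch shows the kind of check that decides whether a class can be instantiated reflectively from a different package without calling setAccessible. The class and method names are hypothetical; nested classes are used only to keep the example in one file, and this is not the registry's actual code.

```java
import java.lang.reflect.Modifier;

class AccessDemo {
    // Hypothetical command class that is visible everywhere.
    public static class PublicCommand {
        public PublicCommand() {
        }
    }

    // Hypothetical command class without an access modifier (package-private).
    static class PackagePrivateCommand {
        PackagePrivateCommand() {
        }
    }

    // True only if the class itself is public and declares a public no-arg constructor.
    // A class failing this check cannot be instantiated reflectively from another
    // package unless the caller overrides the access check.
    static boolean isPublicWithPublicNoArgConstructor(final Class<?> clazz) {
        if (!Modifier.isPublic(clazz.getModifiers())) {
            return false;
        }
        try {
            clazz.getConstructor(); // finds public constructors only
            return true;
        }
        catch (final NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(final String[] args) {
        System.out.println(isPublicWithPublicNoArgConstructor(PublicCommand.class));
        System.out.println(isPublicWithPublicNoArgConstructor(PackagePrivateCommand.class));
    }
}
```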

@blackwinter
Member Author

Hearing no objections, I'll move it along soon(-ish).

Finalizing the refactoring.
Allowing additional commands to be registered by other Metafacture modules as well as by third-party libraries/applications.

Also opens up the possibility to eventually enable Flux command registration in the same vein.

The command registry is scoped by Metafix instance in order to avoid global state.
The Fix definition is parsed as soon as the Metafix instance is created, thus any custom Fix commands that are going to be used must be registered in the Metafix instance's registry at construction time.
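
A minimal sketch of why the registration order matters when the definition is resolved in the constructor. `Registry` and `Transformer` are hypothetical stand-ins with invented signatures, and passing the registry as a constructor argument is an assumption made for this example; it does not show the actual Metafix API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical per-instance registry: no static state, each instance has its own map.
class Registry {
    private final Map<String, UnaryOperator<String>> commands = new HashMap<>();

    Registry register(final String name, final UnaryOperator<String> command) {
        commands.put(name, command);
        return this;
    }

    UnaryOperator<String> resolve(final String name) {
        final UnaryOperator<String> command = commands.get(name);
        if (command == null) {
            throw new IllegalArgumentException("Unknown command: " + name);
        }
        return command;
    }
}

// Hypothetical transformer whose "script" is a whitespace-separated list of command names.
class Transformer {
    private final UnaryOperator<String> pipeline;

    // The script is resolved against the registry right here, so every
    // command it mentions must already be registered at this point.
    Transformer(final String script, final Registry registry) {
        UnaryOperator<String> result = UnaryOperator.identity();
        for (final String name : script.trim().split("\\s+")) {
            final UnaryOperator<String> step = registry.resolve(name);
            final UnaryOperator<String> previous = result;
            result = value -> step.apply(previous.apply(value));
        }
        pipeline = result;
    }

    String process(final String value) {
        return pipeline.apply(value);
    }

    public static void main(final String[] args) {
        final Registry registry = new Registry()
            .register("trim", String::trim)
            .register("shout", v -> v.toUpperCase(Locale.ROOT) + "!");
        System.out.println(new Transformer("trim shout", registry).process("  hi "));
    }
}
```

Registering a command after the transformer has been constructed would be too late in this sketch: the constructor has already failed on the unknown name.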
To be removed in version 8.0.0.
blackwinter added a commit to hbz/limetrans that referenced this pull request Oct 9, 2025
@blackwinter blackwinter force-pushed the moveFixCommandEnumsToIndividualClasses branch from c4b5966 to 45a833d Compare October 10, 2025 07:48
blackwinter added a commit that referenced this pull request Oct 10, 2025
This change breaks backwards compatibility.
@blackwinter
Member Author

Thanks! I've deprecated the enums (45a833d) and prepared the removal in a different branch (dropFixCommandEnums).

@blackwinter
Member Author

blackwinter commented Oct 10, 2025

Benchmark results (for the previously reviewed code, see commit hashes):

| Benchmark | (fixDef) | (input) | Score @ 0a058ba | Score @ c4b5966 | Units | Boost |
|---|---|---|---|---|---|---|
| Baseline | N/A | N/A | 1516.759 ± 1.417 | 1515.548 ± 0.861 | ops/us | -0.08% |
| FixParse | nothing | N/A | 38.431 ± 1.645 | 38.536 ± 1.383 | ops/s | +0.27% |
| FixParse | alma | N/A | 27.068 ± 1.210 | 26.404 ± 1.072 | ops/s | -2.45% |
| Metafix | nothing | empty | 1700.376 ± 7.482 | 1699.012 ± 14.093 | ops/s | -0.08% |
| Metafix | nothing | alma-small | 24.202 ± 1.035 | 23.395 ± 0.463 | ops/s | -3.33% |
| Metafix | alma | empty | 525.383 ± 16.180 | 531.367 ± 6.625 | ops/s | +1.14% |
| Metafix | alma | alma-small | 3.079 ± 0.028 | 3.172 ± 0.037 | ops/s | +3.02% |
| SlowMetafix | nothing | alma-large | 14.451 ± 0.031 | 14.111 ± 0.415 | ops/min | -2.35% |
| SlowMetafix | alma | alma-large | 1.821 ± 0.109 | 1.821 ± 0.011 | ops/min | +0.00% |

Limetrans results for writing approx. 2m records to local file (N=3):

| Metafacture | Limetrans | Runtime | Boost |
|---|---|---|---|
| master (0a058ba) | master (hbz/limetrans@5aabbf8) | 3h29m33s | |
| moveFixCommandEnumsToIndividualClasses (c4b5966) | moveFixCommandEnumsToIndividualClasses (hbz/limetrans@450a3b0) | 3h59m29s | +14.29% |

The micro-benchmarks are looking overall unremarkable, but the real-world workload shows a slowdown of more than 10 %. Shouldn't be too alarming, but not exactly what we would like to see either. 😞
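
For reference, the "Boost" column is a plain relative change between the two measurements. The sketch below recomputes two of the figures from the rounded values shown above; the class and method names are made up for this example. Note the sign convention: for throughput scores a negative value is a slowdown, whereas for runtimes a positive value is a slowdown.

```java
import java.time.Duration;

class RelativeChange {
    // Relative change of `after` versus `before`, in percent.
    static double percent(final double before, final double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(final String[] args) {
        // Throughput in ops/s: higher is better.
        System.out.printf("FixParse alma: %+.2f%%%n", percent(27.068, 26.404));

        // Runtime: lower is better.
        final Duration master = Duration.parse("PT3H29M33S");
        final Duration branch = Duration.parse("PT3H59M29S");
        System.out.printf("Limetrans: %+.2f%%%n",
                percent(master.getSeconds(), branch.getSeconds()));
    }
}
```

With the rounded runtimes this yields a runtime increase of about 14.3 %, i.e. roughly half an hour on a three and a half hour run.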

@TobiasNx
Contributor

TobiasNx commented Oct 10, 2025

Lobid-Resources looks good at the level of our tests; the tests seem to run a little faster. With regard to OERSI, the integration of custom Java classes as Fix functions needs some adjustments; the build fails:

BUILD FAILED in 3s
1 actionable task: 1 executed
data/production/bcCampus_textbooks

> Task :compileJava FAILED
/home/tobias/git/oersi-etl/src/main/java/oersi/HtmlToText.java:19: error: cannot find symbol
import org.metafacture.metafix.FixMethod;
                              ^
  symbol:   class FixMethod
  location: package org.metafacture.metafix
1 error

FAILURE: Build failed with an exception.

Which class should I import now?

@blackwinter
Member Author

> With regard to OERSI, the integration of custom Java classes as Fix functions needs some adjustments; the build fails:

Which version have you tested? c4b5966 (breaking) or 45a833d (non-breaking)?

@blackwinter
Member Author

blackwinter commented Oct 10, 2025

See e.g. this test for the kind of change that is required after the enums have been dropped:

https://github.com/metafacture/metafacture-core/pull/706/files#diff-954daefbc22a10816f770c6c9cb0954eed93796a9fdc24438849c84d5c283cbf

@TobiasNx
Contributor

> Which version have you tested? c4b5966 (breaking) or 45a833d (non-breaking)?

I used the branch before you pushed the new commits. Now I've used the newer one, and it seems to work!

@blackwinter
Member Author

@dr0i: Do you intend to review as well?

@dr0i dr0i removed their request for review October 13, 2025 07:59
@dr0i dr0i assigned blackwinter and unassigned dr0i Oct 13, 2025
@dr0i
Member

dr0i commented Oct 13, 2025

I think this PR has been reviewed well enough. You can merge, @blackwinter.

@blackwinter blackwinter merged commit 725599e into master Oct 13, 2025
1 check passed
@blackwinter blackwinter deleted the moveFixCommandEnumsToIndividualClasses branch October 13, 2025 08:40
@github-project-automation github-project-automation bot moved this from Review to Done in Metafacture Oct 13, 2025
blackwinter added a commit to hbz/limetrans that referenced this pull request Oct 14, 2025