-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
executor -> scatter/gather aggregator? #27
Comments
I completely agree that executors should aggregate messages into a collection before the result handler is called. This is certainly feasible too. I've been thinking a lot about the executor, and truthfully, I'm not really too fond of it. For one thing, I think it would be ideal to be able to deploy a network and then perform a remote procedure call on the network from the context from which it was deployed. This is probably the ideal behavior in most situations. For instance, I might want to make a REST server that performs a distributed RPC on a Vertigo network. I would want to create the HTTP server and deploy the network from the same context and then be able to perform the RPC from that context. Currently, the only way to accomplish this is by creating an HTTP server within an executor component, but I'm not sure this is ideal. So, I'm considering whether Vertigo should just provide utilities for collecting and aggregating messages or perhaps provide aggregation as a separate component as you mentioned. The way I see it there are a few options:
The only reason I am hesitant to deprecate and remove the Since the I honestly think I would prefer maintain only two types of components - feeders and workers - and implement operations on top of those types (splitters, aggregators, filters, etc). A while back I actually experimented with some new component implementations for splitters, aggregators, et. al., but I decided to stop development of a higher level API until some more structural changes have been completely. Specifically, I would like to provide options for strong ordering and exactly-once processing. Once those features are complete I would like to see higher level APIs and operations. What are your thoughts? I'm not totally prepared to remove |
Well, there's something to be said for simplicity (and option 1 does happen to be a sufficient solution for my immediate need, cough). Other options seem more involved. Just a minor release with an Aggregator / aggregation-capable Executor might be a nice interim thing, with the cool stuff coming later. Indeed option 2 (collector/aggregator utility classes) might longer-term desirable. And strong ordering and exactly-once are both also attractive features, of course, and option 2 seems to be in the same space, sort of. (I was also thinking of some ordering stuff, actually, there's a particular partial ordering we might want preserved at ingress and exit but out-of-order execution in between would probably mean more performance. That has implementation overlap with correlation and aggregation - basically I was considering naively aggregating then ordering, aggregating then ordering, within each correlated result group, but maybe ordering could be amortised across aggregation if correlators/aggregators/etc. are pluggable entities in themselves that are delegated to, saves having to have a different component type for every variation possible, I was going to try the naive pattern first to see if it was worth something more complex though, TBH it probably won't be in our case). I'm personally vague on what such an api should/would look like though (well, storm+trident exists too, but its streams-of-batches model is a bit different and we kinda want to avoid trident-like batching), you probably already have a much stronger idea given the 0.7 roadmap that's now up. I'm not at all sure I'm "getting" Option 3 (separate API for "rpc") though - what if you want an executor/aggregator-type component (/functionality-of-some-sort) to work inside some larger network, then you still end up needing something from option 1 or 2 anyway (unless I've misunderstood (edit- actually you may not have meant them as mutually exclusive))? The thing outside the network will still need something to call for rpc, so the network "as a whole" still needs to have a spot for input to go and output to come back. |
Okay, that sounds good to me too. I've already implemented this in a new branch. Since the The I'm going to commit this tomorrow and try to push a minor release for you. Currently, there are a lot of changes in master but it's definitely not ready for a release, so I'm going to create a release from a branch off the previous release. The release will likely contain this feature as well as support for deployment of worker verticles. All the other newer features will be pushed to the next big release since a lot of complimenting features are still missing. It's likely that the Edit: Alternatively, I guess an additional method could be added to the Also, what I meant by providing an API outside of network components in option 3 was something like this: vertigo.deployLocalNetwork(network, new Handler<AsyncResult<NetworkContext>>() {
public void handle(AsyncResult<NetworkContext> result) {
RPCClient client = new RPCClient(result.result(), vertx.eventBus());
client.execute(new JsonObject().putString("foo", "bar"), new Handler<AsyncResult<Collection<JsonMessage>>>() {
...
}
}
}); That is, the ability to deploy a network and then send messages to it from the context in which it was deployed (rather than from within a component, e.g.
|
By the way, the RPCFeeder or something like that is actually more in line with the direction in which Vertigo is going. That is, the goal is to provide only two basic component types - a data source and a data processor (Feeder and Worker) - and build a higher level API on top of that. So ultimately I planned to replace the executor with something. Also, an external RPC client may go against plans for Vertigo anyways. One of the features being planned is runtime network configuration changes (so networks don't have to be stopped to be updated, but instead Vertigo will coordinate to update components without losing messages) and any sort of external object that understands the internal details of a Vertigo network could be difficult to manage through that process. Perhaps this is a good move. |
Thanks! If you're willing to do a minor release shortly with an aggregation facility (one way or another given your edit), well, that'd be cool. Having I'll split the rest of my reply into two, X/Y, as the second bit sure turned out more speculative/rambling. X. the execute one vs many api question.
But consider a request to just some long pipeline - a network one might make just for reliability (and perhaps easy composition of predefined components during design)? That could often result in 1:1 request:response. So one-reply might actually be quite a common use case, if maybe not for you or me in particular. So perhaps you might want to provide both an
So a bit like the |
Part 2. Bearing in mind I haven't actually tried anything below so take with big pinch of salt, just been trying to think about it. I drafted most of the below before seeing your latest comment regarding runtime hot reconfiguration concerns etc, but taking that into consideration, maybe half-formed thoughts below could still help spark some ideas to help with stopping external objects knowing any internals of the network, if "external rpcs to the network" remains a goal, so may as well still inflict it upon you. Y. Regarding option 3 / network-external RPCY.1 RPCFeeders as RPC targetsI'm slightly wary of the name In the option 3 case, my understanding is that you would still have some specific component(s) within a network that are the ultimate targets for the external RPCs from outside the network. - i.e. an When sending an external RPC "to the network", though, it may be best not to address it to a particular component within the network, just "to the network". That would allow information-hiding / reimplementation of a different network providing the same rpc api. (and maybe facilitate hot reconfiguration). But then either you allow only one
Though the feedback/circular-connection wiring needed to make Y.2. Identifying which feeders are external rpc targetsSo, However, if an It may help to rename Y.3 re external RPC abstraction/layering concerns.API-details wise, rather than what you just showed - passing in non- Sure, it'll presumably ultimately hit the real In contrast, "network internal" uses of an Y.4 implicit provision of an eventbus-based rpc api, no RPCClient?Hmm. Maybe a bit heavy but, could it be arranged that, when a Network is deployed, a special Yes, the (Not sure how that would tie in with named rpc targets also proposed above - just more eventbus addresses maybe). Or maybe that's not sane and back to Y.3 where, to eventbus-expose the rpc api, at least the deployer of the network would just have be responsible for sending to the network with the help of a But either way that does sort of relate to plans to being able to deploy a json-specified network from the command line directly mentioned in 0.7 roadmap - that may well be a generic network-deployer-verticle/module that you're planning, and maybe that could do with such a the raw eventbus <-> network rpc-type interaction being auto-provided by Y.3 or Y.4 methods or equally likely some other way I haven't considered. Phew, hope all that wasn't complete rubbish. |
I love the long convo :-) I'm actually glad this issue is being tackled right now because it's raising a lot of questions about proper design. You seem to be very much on the right page. I think you're very close to the model that will need to be used to provide external access to networks over the event bus. Essentially, provide a newer version of the executor which can be used as an entry point to the network. So, The reason I say I'm glad this issue is being tackled is because some of the factories in Vertigo need to be refactored in order for this to work. As you mentioned - and I agree - we don't want to keep adding methods like So, that means Vertigo needs to be able to set up the correct components without knowing the implementation details. This is fine in Java. It's easy to abstract the feeder or worker implementation because users can extend I spent a lot of time on component factories and providing the right component types to language modules. So you're right, there was a reason the
The last part is the important step. Originally, Vertigo required users to create their own components within component implementations. So, in Javascript, if you created a network like: network.addFeeder("foo", "foo.js"); Then you would still have to actually create the feeder instance within var vertigo = require('vertigo');
vertigo.createFeeder().start(function(feeder) {
feeder.emit({foo: 'bar'});
}); I didn't like this because it meant feeders could be deployed as workers and workers could be deployed as executors. I like type safety, and that was just not proper, nor was it as easy to use as it could have been. So, with the addition of component typing I was able to clean up that Javascript feeder code like so: var vertigo = require('vertigo');
vertigo.feeder.emit({foo: 'bar'}); Essentially, if So, I'm sure you can see how this is beneficial. But I'm also sure you can see why a separate Indeed, this is an issue that will need to be resolved at some point as there have long been plans for higher level APIs such as filters, aggregators, etc. So I'm sort of brainstorming on how to fix this deficiency right now. I've written the The other alternative that I've often considered is separating tools like these - tools other than the bare minimum var vertigo = require('vertigo');
var higherLevelAPI = require('higherLevelAPI');
var rpc = higherLevelAPI.createRPC(vertigo.feeder);
rpc.execute({foo: 'bar'}, function(error, result) {
...
}); By the way, I do probably favor the idea of exposing a raw event bus API for externally communicating with a network's feeders or executors. I'm just not sure it's wise to add another verticle to all networks, most of which may not even need/use it. Perhaps it could be a network option though. Awesome input! Like I said, I'm working on fixing the executor. Hopefully I'll have something useful tonight. Obviously providing the |
Alright, I think I came up with a way that we can provide custom feeder/worker implementations that are compatible with the current API for dynamic languages. Basically, the implementations will have to be dynamically constructed in dynamic languages, with the specific method for doing that obviously dependent upon the language. For instance, in Python, the current API in a feeder component works like so: from vertigo import feeder
feeder.emit({'foo': 'bar'}) Internally, the Python module constructs the feeder when the from vertigo.feeder import drpc_feeder as feeder
def result_handler(error, result):
print result.body
feeder.emit({'foo': 'bar'}, result_handler) So, considering this, I think it would be fine to commit the The current implementation that I referenced above actually doesn't change the feeder interface at all. It simply changes the As for an As for networks and supporting external calls to a network via the event bus: rather than adding a single static entry point to networks, perhaps a new abstraction could be used specifically for this purpose. I wonder if there are enough use cases to warrant it, but something like a network.addService(new RestService('http://localhost:8080')); Rather than public class RestFeeder extends FeederVerticle {
@Override
public void start(Feeder feeder) {
vertx.createHttpServer().start(new Handler<AsyncResult<HttpServer>>() {
...
});
}
@Override
protected void nextMessage(Feeder feeder) {
}
} or network.addService(new EventBusService("some.address")); rather than public class EventBusFeeder extends FeederVerticle {
@Override
public void start(final Feeder feeder) {
vertx.eventBus().registerHandler("some.address", new Handler<Message<JsonObject>>() {
public void handle(Message<JsonObject> message) {
if (!feeder.feedQueueFull()) {
feeder.emit(message.body());
}
}
});
}
} Basically, those services are just common feeder implementations, but they are also likely very common use cases, especially in the context of Vert.x. Workers could still subscribe to input from specific services, the difference being just that services don't require specific implementations by the user. |
I confess I hadn't really thought much about other language needs (we are actually python users locally, both cpython and jython, but not in conjunction with vertx, at least not to date) The module import solution certainly sounds smart and fits in with the style of the existing vertx/vertigo python api. That api seems geared towards short "script-y" python, unlike the groovy language module there doesn't seem to be support for "class-y" usage - i.e. I can't do a |
Regarding a separate "Service" API, I reckon you mightn't need it if such services are just Feeders too (and you don't want complexities of the information hiding and implicit wiring and verticles as previously discussed). Since you already allow for supply of config objs to |
Regarding the DRPCFeeder impl, you say:
[SNIP - Nonsense largely arising from dgolden forgetting about Java's irritating type erasure, apologies] |
I know 0.7.0 is now going to bring a lot of interesting changes, have been quietly following the developments and group discussions - but sounds like it's now going to be (quite reasonably) a couple more months coming as a result. And unfortunately the drpc-feeder branch now also pre-dates various 0.6.4-era improvements and bugfixes at this stage.
For our use I have just (admittedly less than elegantly) locally extended the 0.6.4 (With the 0.7.0 api changing so significantly as per the readme etc., not sure it's worth worrying too much about 0.6.x api's long-term design..) |
Vertigo 0.7 is in beta and quite different. Not in a bad way, but just so that this ticket is pretty irrelevant by now, probably should just be closed... |
Especially given Executor already aggregates multiple results internally since #17, how about a variant that batches all those results together and passes it to a handler instead of repeatedly calling the result handler? Otherwise it's hard to tell when you've collected the last result arising from a given source message (or is it? vertx and vertigo newbie here...) even though that is effectively already known.
It seems like it would be very handy for a simple scatter/gather aggregation use case. I'm not sure whether it should be a whole separate component type ("Aggregator" or whatever) given it's so very similar to the existing Executor, but otoh the type signature of the handler may have to be very slightly different to take a collection of messages (although I suppose java also allows overloaded methods...)
The text was updated successfully, but these errors were encountered: