Commit ab846e5

chore: Document some thoughts for nano-service approach (#691)
Signed-off-by: Joseph Sinclair <[email protected]>
1 parent 2528d68 commit ab846e5

2 files changed: +127 -0 lines changed

docs/design/Nano-Service-Approach.md

Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
# Block Node Nano-Service Approach

## Abstract

To date, the Block Node has been developed under pressure and with changing, incomplete, or
inaccurate requirements. As a result, the system is tightly interconnected, and it is difficult
to change one segment of the system without also impacting unrelated segments. To address this,
and to ensure the Block Node can be extended, enhanced, and deployed in the manner required for
all currently identified uses, this document details a modified structure for the system.

## Revised Diagram

![Module/Service Diagram](assets/Block-Node-Nano-Services.svg)

## Definitions

<dl>
<dt>Event</dt>
<dd>A Java object defined by the Messaging service that enables every other service
  to publish a service-defined object with content specific to that service. Note,
  the name of this object is not defined in this document. "Event" is a generic term.</dd>
<dt>Service</dt>
<dd>A Java module that is deployed with a particular installation of the Hiero
  Block Node. In most cases these modules are housed in independent jars to make
  adding and removing services (in a custom deployment) easier.</dd>
</dl>
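
As a rough illustration of these two definitions, the sketch below models an Event as a small generic `record` and a Service as a minimal lifecycle interface. The names `ServiceEvent`, `ArchiveCompleted`, and `BlockNodeService` are hypothetical; this document deliberately leaves the real event type unnamed, so treat this as one possible shape rather than the intended API.

```java
import java.time.Instant;

// Hypothetical envelope owned by the Messaging service. The payload type is
// defined by whichever service publishes the event.
record ServiceEvent<T>(String publisher, Instant publishedAt, T payload) {
    ServiceEvent {
        if (publisher == null || publisher.isBlank()) {
            throw new IllegalArgumentException("publisher is required");
        }
    }
}

// Hypothetical service-defined payload carried inside the envelope.
record ArchiveCompleted(long firstBlock, long lastBlock) {}

// Hypothetical minimal contract that each deployable service module might implement.
interface BlockNodeService {
    String name();

    void start();

    void stop();
}
```

Keeping the envelope generic means the Messaging service never needs to know about `ArchiveCompleted` or any other service-specific payload.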

## Core Concepts

1. Helidon and API objects are restricted to the API layer.
   * The less we flow these externally defined interfaces and classes through
     the system, the more easily we can make outward-facing API
     changes without reworking the internal design.
2. Most services are not required to be present in every deployment of the Block Node.
3. No service should depend on classes or interfaces from another service.
   * That service might not be deployed, and each service should be removable
     without breaking other services. The exception is the Messaging service.
   * To this end, services should be independent modules with clearly defined
     and carefully controlled interactions.
4. Two services are required for all deployments: Messaging and Status.
   * Only Messaging should offer any internal API at all.
   * Messaging is where data and events are published (presumably
     via LMAX Disruptor instances) to the other services.
   * The Status service is required only because it is specified as always
     present, so that a Block Node client can query it via the gRPC API.
   * The Messaging service should _not_ have any external (i.e. gRPC) API.
   * We must remain vigilant to avoid packing these two services with interfaces or extra classes.
   * These services should be as slim as possible, and interactions between
     services should be based on a very limited set of `Record` messages ("Events")
     that are passed between services (blindly) by the Messaging service
     rather than on interfaces or direct method calls.
5. This approach assumes that Messaging offers both "push"
   and "pull" options for receiving messages, and that each service may choose the
   most appropriate interaction for that specific service.
   * A persistence service, for instance, might use "push" for simplicity and
     because it does not benefit from holding items within Messaging, but
     a streaming client service might use "pull" to allow each of
     many remote clients to receive data at slightly different rates, and to
     more easily switch from live to historical data and back if a particular
     client falls behind and later "catches up".
6. Most services both publish and observe the service "event" messages.
   * By listening for events, any service can react to changes in any other
     service, but also behave reasonably when another service does not exist.
   * Publishing an event (rather than calling an API) makes it easy for each
     service to focus entirely on its own function and not try to work out the
     highly complex possible interactions with all other possible services.
   * Some services (e.g. Archive service) won't make sense if _nothing_ publishes
     a particular event, but even then the service need not be concerned with
     the how/what/why of an event, and need only react if and when an event is
     encountered with the relevant type and content.
7. Many services will _also_ listen to the main data messages (`List<BlockItem>`),
   which are the primary data flowing through the system.
   * Note that the Publisher service is also not required, so this flow of data might
     be empty, or might be produced from some other source.
   * There _might_ also be a stream of "historical" blocks used to serve client
     requests for those blocks. This is still to be determined.
8. Configuration for a service is entirely restricted to that service, and does
   not determine which "version" of a service is running, or whether a service is running at all.
   * Configuration _might_ help multiple loaded "versions" identify a conflict.
9. The JVM `ServiceLoader` is used to load every service that is present; this
   may include multiple services of the same "type" (e.g. multiple archive
   services, multiple persistence services, etc.).
   * It is up to the particular services to ensure that either multiple
     different versions cooperate properly or that an error is published on
     startup stating that multiple incompatible services are loaded. Generally it's
     cleanest if multiple services of the same type are able to work
     independently without issues. If that isn't possible, a
     service-specific configuration is a good alternative.
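
To make the "push" and "pull" options from item 5 more concrete, here is a minimal sketch of a Messaging facility that supports both. It is an assumption-heavy stand-in: the class name `SimpleMessaging` and its methods are hypothetical, and it uses plain `java.util.concurrent` collections rather than the LMAX Disruptor that the text presumes.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical stand-in for the Messaging service (not Disruptor-based).
final class SimpleMessaging<T> {
    private final List<Consumer<T>> pushHandlers = new CopyOnWriteArrayList<>();
    private final List<BlockingQueue<T>> pullQueues = new CopyOnWriteArrayList<>();

    // "Push": the handler is invoked on the publisher's thread for every message.
    void registerPush(Consumer<T> handler) {
        pushHandlers.add(handler);
    }

    // "Pull": the caller gets a private queue and drains it at its own pace.
    BlockingQueue<T> registerPull() {
        BlockingQueue<T> queue = new LinkedBlockingQueue<>();
        pullQueues.add(queue);
        return queue;
    }

    // Messages are forwarded blindly; with no subscribers this does nothing.
    void publish(T message) {
        for (Consumer<T> handler : pushHandlers) {
            handler.accept(message);
        }
        for (BlockingQueue<T> queue : pullQueues) {
            queue.offer(message);
        }
    }
}
```

A persistence-style service would call `registerPush`, while a streaming client service would call `registerPull` once per remote client. The unbounded queue is a simplification; a real implementation would need a policy for clients that fall too far behind.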
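
The startup behavior described in item 9 might look roughly like the following. The `LoadableService` contract, its `toleratesPeers` flag, and the `ServiceStartup` helper are all hypothetical; only `java.util.ServiceLoader` itself is a real JDK API. The conflict check takes any `Iterable` so it can be exercised without registering providers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;

// Hypothetical contract; "type" groups interchangeable services
// (e.g. "archive", "persistence").
interface LoadableService {
    String type();

    String name();

    // Whether this service can run alongside other services of the same type.
    boolean toleratesPeers();
}

// Hypothetical stub used only to demonstrate the check.
record StubService(String type, String name, boolean toleratesPeers) implements LoadableService {}

final class ServiceStartup {
    // Groups services by type, preserving discovery order.
    static Map<String, List<LoadableService>> groupByType(Iterable<LoadableService> services) {
        Map<String, List<LoadableService>> byType = new LinkedHashMap<>();
        for (LoadableService service : services) {
            byType.computeIfAbsent(service.type(), k -> new ArrayList<>()).add(service);
        }
        return byType;
    }

    // Reports each type that has several services, at least one of which cannot coexist.
    static List<String> findConflicts(Iterable<LoadableService> services) {
        List<String> conflicts = new ArrayList<>();
        groupByType(services).forEach((type, group) -> {
            boolean anyIntolerant = group.stream().anyMatch(s -> !s.toleratesPeers());
            if (group.size() > 1 && anyIntolerant) {
                conflicts.add("Multiple incompatible services loaded for type: " + type);
            }
        });
        return conflicts;
    }

    // ServiceLoader is Iterable, so discovered providers can be passed straight in.
    static List<String> checkInstalledServices() {
        return findConflicts(ServiceLoader.load(LoadableService.class));
    }
}
```

In this sketch the conflict messages are simply returned; under the approach described here they would presumably be published as error events on startup instead.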

## Expected Benefits

1. Services are decomposed into small units; often what is thought of as a single
   process is accomplished by multiple nano-services. This makes each such
   service both simple and focused. This also makes adding, removing, and
   modifying these services much easier and faster.
   * It's also much easier to test services with nothing more than a mock of the
     "Messaging" service, which further improves velocity.
2. Composing services may be easier to reason about than composing interfaces,
   and systems composed of independent services are easier to modify and revise
   than systems with many interacting method calls or complex module
   interactions.
3. It is much easier to reason about concurrency for a single focused service
   than it is for a larger and more interconnected set of components.
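
The testing benefit in item 1 can be illustrated with a hand-written fake in place of the Messaging service. Everything named here (`Messaging`, `BlockVerified`, `BlockPersisted`, `PersistenceService`, `RecordingMessaging`) is hypothetical and exists only to show the shape of such a test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical narrow view of the Messaging service, as seen by one service.
interface Messaging {
    void publish(Object event);

    void subscribe(Consumer<Object> handler);
}

// Hypothetical events, each a plain record.
record BlockVerified(long blockNumber) {}

record BlockPersisted(long blockNumber) {}

// Hypothetical nano-service: reacts to BlockVerified and announces BlockPersisted.
final class PersistenceService {
    private final Messaging messaging;

    PersistenceService(Messaging messaging) {
        this.messaging = messaging;
        messaging.subscribe(this::onEvent);
    }

    private void onEvent(Object event) {
        // Unknown event types are ignored rather than treated as errors.
        if (event instanceof BlockVerified verified) {
            messaging.publish(new BlockPersisted(verified.blockNumber()));
        }
    }
}

// Fake Messaging: records what was published and lets a test deliver events.
final class RecordingMessaging implements Messaging {
    final List<Object> published = new ArrayList<>();
    private final List<Consumer<Object>> handlers = new ArrayList<>();

    @Override
    public void publish(Object event) {
        published.add(event);
    }

    @Override
    public void subscribe(Consumer<Object> handler) {
        handlers.add(handler);
    }

    void deliver(Object event) {
        handlers.forEach(h -> h.accept(event));
    }
}
```

A test then needs only `new PersistenceService(fake)`, a call to `fake.deliver(...)`, and an assertion on `fake.published`; no other service has to exist.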

## Considerations and Possible Concerns

1. Sending messages between services is not as efficient as calling a method.
   * This is true, but publishing a message consumed by an unknown set of (potentially
     several) services is significantly more efficient than trying to manage an uncertain
     (and possibly large) number of direct method calls. We are electing to
     prioritize emergent behavior and capability over direct-call efficiency.
2. Some services may not make any sense without other services. For example,
   a Content Proof service might not be able to function without a State
   Management service and/or State Snapshot service.
   * If a particular service requires other services, it should document the
     expected events (e.g. publish "Need Snapshot For Instant{date/time}" and
     expect "Deliver Snapshot For Instant{date1/time1}") and also document
     its behavior if the response-type event is not published.
   * Every service should function, at least to the level of not throwing
     exceptions, regardless of which other services are, or are not, present.
   * While a service may require certain _messages_ to function correctly
     (e.g. a "Deliver Snapshot For Instant..." message in the example above),
     the service _must not_ concern itself with _what_ produces those messages
     or _how_. This ensures that all services function as intended even if
     other services are replaced with completely different, but _compatible_
     services.
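
The request/response pattern from item 2 could be sketched as follows, including the "no response was published" case. The record names mirror the example messages above, but the `EventBus` and `ContentProofService` types, the `byte[]` snapshot payload, and the timeout-based fallback are all assumptions made for illustration.

```java
import java.time.Instant;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

// Hypothetical event records named after the example messages in the text.
record NeedSnapshotForInstant(Instant instant) {}

record DeliverSnapshotForInstant(Instant instant, byte[] snapshot) {}

// Hypothetical in-process bus standing in for the Messaging service.
final class EventBus {
    private final List<Consumer<Object>> handlers = new CopyOnWriteArrayList<>();

    // Returns a handle that removes the subscription when run.
    Runnable subscribe(Consumer<Object> handler) {
        handlers.add(handler);
        return () -> handlers.remove(handler);
    }

    void publish(Object event) {
        handlers.forEach(h -> h.accept(event));
    }
}

final class ContentProofService {
    private final EventBus bus;

    ContentProofService(EventBus bus) {
        this.bus = bus;
    }

    // Publishes the request, then waits briefly for a matching response event.
    // Returns empty (rather than throwing) when nothing answers in time.
    Optional<byte[]> requestSnapshot(Instant instant, long timeoutMillis) {
        CompletableFuture<byte[]> reply = new CompletableFuture<>();
        Runnable unsubscribe = bus.subscribe(event -> {
            if (event instanceof DeliverSnapshotForInstant d && d.instant().equals(instant)) {
                reply.complete(d.snapshot());
            }
        });
        try {
            bus.publish(new NeedSnapshotForInstant(instant));
            return Optional.ofNullable(reply.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException | ExecutionException e) {
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } finally {
            unsubscribe.run();
        }
    }
}
```

The requesting service never learns which service, if any, produced the snapshot. It only documents the two events and what it does when the second one never arrives.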

docs/design/assets/Block-Node-Nano-Services.svg

Lines changed: 1 addition & 0 deletions