--
--
- ADTs: Options, Eithers
- effects: ZIO, cats IO, Monix Task
- composability
- typeclasses, newtypes, higher-kinded types
- ...
- final?
- constant improvement
- bugfixes
- limited amount of time
--
- may never be final
- always imperfect
- even assuming your software is perfect
- outer world is not
- distributed system
- backing services
- kafka, postgres, mongo, redis, ...
- information exchange
- app needs to run "somewhere"
- target platform
- vm?
- kubernetes?
                 .-~~~-.
         .- ~ ~-(       )_ _
        /                   ~ -.
       |                        \
        \                      .'
          ~- . _____________ . -~
--
Different rules:
- apps are ephemeral
- fast startup
- graceful shutdown
- frequent restarts
- stateless preferred
- easier to scale horizontally
- containers everywhere
- "self-contained" apps
- started?
- fully initialized?
- ready to handle incoming requests?
--
Started          Initialized
|----------------|------------------------------------------>
                                                          time
not ready yet    can handle incoming traffic
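"Started" and "Initialized" can be separated explicitly in code: flip a flag only once warm-up is done, and report readiness from that flag. A minimal sketch with ZIO (the `ready` flag, the warm-up step and the status strings are illustrative, not a specific library's API):

```scala
import zio.*

// Sketch: a Ref[Boolean] backs the readiness signal;
// it becomes true only after initialization finishes.
object ReadinessExample extends ZIOAppDefault:

  // what a readiness endpoint would report
  def status(ready: Ref[Boolean]): UIO[Unit] =
    ready.get.flatMap {
      case true  => ZIO.logInfo("200 - ready")
      case false => ZIO.logInfo("503 - not ready yet")
    }

  val run =
    for
      ready <- Ref.make(false)
      _     <- status(ready)                                 // started, not ready
      _     <- ZIO.logInfo("initializing...") *> ZIO.sleep(1.second) // warm-up
      _     <- ready.set(true)                               // now Initialized
      _     <- status(ready)                                 // can handle traffic
    yield ()
```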
--
- parallel init
- faster
- higher cpu usage at startup
- example: ZLayer
--
ZLayer.make[Requirements](
CommonConfig.layer,
prometheus.publisherLayer,
prometheus.prometheusLayer,
Bootstrap.sttpBackendLayer,
...,
ProductService.layer,
PostgresDatabase.transactorLive,
Tracing.live,
Baggage.live(),
TracingService.layer,
ContextStorage.fiberRef,
JaegerTracer.live
)
--
- usually by receiving SIGTERM
- refuse new requests
- finish current ones
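With an effect system, "finish current ones" naturally lives in a finalizer: release actions run when the app is interrupted after SIGTERM. A hedged sketch (the log messages stand in for real drain logic):

```scala
import zio.*

// Sketch: ZIO runs finalizers on interruption (e.g. after SIGTERM),
// so draining in-flight requests can live in the release action.
object GracefulShutdown extends ZIOAppDefault:

  val server =
    ZIO.acquireRelease(
      ZIO.logInfo("server started")          // acquire: bind port, accept traffic
    )(_ =>
      ZIO.logInfo("refusing new requests") *> // release: stop accepting,
        ZIO.logInfo("draining in-flight requests") // finish current ones
    )

  val run =
    // ZIO.never keeps the app alive; on shutdown the fiber is
    // interrupted and the release action above still runs
    ZIO.scoped(server *> ZIO.never)
```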
--
- cats Resource
- ZIO Scope (ZManaged)
- ZIO.acquireReleaseExitWith(acquire)(release)(use)
- effect systems work quite well here
- in terms of signal handling and interruptions
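A small sketch of the `acquireReleaseExitWith` shape mentioned above: the release action receives the `Exit` of `use`, so cleanup can see whether the work succeeded, failed, or was interrupted (the file-reading example itself is illustrative):

```scala
import zio.*
import scala.io.Source

// Sketch: acquire a resource, use it, and always release it -
// the Exit-aware variant lets release inspect how `use` ended.
def firstLine(path: String): Task[String] =
  ZIO.acquireReleaseExitWith(
    ZIO.attempt(Source.fromFile(path))                     // acquire
  )((src, exit: Exit[Throwable, String]) =>
    ZIO.logInfo(s"closing (${if exit.isSuccess then "ok" else "failed"})") *>
      ZIO.succeed(src.close())                             // release, never fails
  )(src => ZIO.attempt(src.getLines().next()))             // use
```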
--
Shutdown signal      Graceful shutdown?        Forced shutdown
|--------------------------------------------->
0s                                            30s
--
- not implemented?
- force quit
- data losses
- resources not properly released
- takes more time to close the app
- deployment rollout
--
- problem is visible
- not ideal but at least we know
- crashed app can be restarted
- when?
--
override val run =
program.catchAll { error =>
    ZIO.logError(s"Oops, an error occurred: ${error.show}")
}
--
override val run =
program.catchAll { error =>
    ZIO.logError(s"Oops, an error occurred: ${error.show}") *>
exit(ExitCode(2))
}
--
override val run =
program.tapError { error =>
    ZIO.logError(s"Oops, an error occurred: ${error.show}")
}
--
- returning 0 (success)
- may prevent automatic restart
- still works
- works fine?
--
"hidden" failure?
- detect it
- restart / give it time to recover
--
but hidden failures happen sometimes
- don't cause whole app to crash
- they make the app unhealthy
crashed > hidden failures?
- are you healthy?
- are you busy?
--
- executed periodically
- kubernetes example:
- liveness probe - restart
- readiness probe - mark not ready
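The two probes answer different questions, which is worth making explicit in code. A hedged sketch (the `Probes` type and wiring are illustrative, not any particular library's API):

```scala
import zio.*

// Illustrative sketch:
//   liveness  ~ "should the app be restarted?"
//   readiness ~ "should the app receive traffic right now?"
final case class Probes(alive: UIO[Boolean], ready: UIO[Boolean])

// A failing liveness probe makes kubernetes restart the pod;
// a failing readiness probe only removes it from load balancing.
def probeStatus(p: Probes): UIO[String] =
  for
    a <- p.alive
    r <- p.ready
  yield s"liveness=${if a then "ok" else "restart"} " +
    s"readiness=${if r then "ready" else "not ready"}"
```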
--
val postgres =
Healthcheck(
"postgres",
for
check <- fr"SELECT 1"
.query[Int]
.unique
.transact(transactor)
.timeout(postgresTimeout)
.orDie
yield check match
case Some(1) => Healthcheck.Status.Ok
case _ => Healthcheck.Status.Error
)
end postgres
--
- what can go wrong?
- too frequent restarts
- restart is (probably) not a permanent fix
- interrupts other parts of the program
- example: kafka consumers
--
- no restart (no problem) > crash / restart
- fine-grained control
- local retries
- restart part of your program
--
private val schedule =
Schedule.recurWhileZIO[Any, Error] {
case RateLimit => ZIO.logInfo("Rate limit exceeded").as(true)
case _ => ZIO.succeed(false)
} && {
(
Schedule.spaced(10.seconds).jittered && Schedule.recurs(7)
).orElse(
Schedule.spaced(10.minutes).jittered && Schedule.recurs(7)
)
}
--
...
someExternalCall
.retry(schedule)
...
--
- metrics / logs
- pull vs push based
- warning:
- logs can significantly slow down your app
--
val asyncJobsInProgress = Metric.gauge("jobs_running")
...
for
_ <- ZIO.logInfo("Starting job")
_ <- PrometheusMetrics.asyncJobsInProgress.increment
...
yield ...
--
ZIO.logAnnotate(
LogAnnotation("key", "val"),
LogAnnotation("key2", "val2")
) {
ZIO.logInfo("Starting ...") *> effect
}
--
- monitoring / log based
- why?
--
- dashboards
- alerting
--
--
- what else?
- stateless?
- exposes metrics / logs info
--
- cpu
- memory
- custom metric
- app needs to expose
- example: kafka lag
- benefits?
- you can even save money!
- auto scale down
- traces show the flow of the request
- what happens next?
- how much time it takes?
- metadata? no problem
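Autoscaling on a custom metric means the app has to keep that metric up to date itself. A sketch of exposing the kafka-lag example as a gauge (`consumerLag` is a hypothetical effect returning the current lag; the metric name is illustrative):

```scala
import zio.*
import zio.metrics.Metric

// Sketch: a custom gauge the autoscaler can act on
val kafkaLag = Metric.gauge("kafka_consumer_lag")

// Periodically sample the (hypothetical) lag and publish it;
// an autoscaler can then scale up on high lag - or down, saving money.
def reportLag(consumerLag: UIO[Double]): UIO[Unit] =
  consumerLag
    .flatMap(lag => kafkaLag.update(lag))
    .repeat(Schedule.spaced(15.seconds))
    .unit
```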
--
- standard
- still fresh and evolving
- not a specific implementation
- collection of APIs, SDKs and tools
--
- traces
- metrics
- logs
--
- it's the backend's responsibility to support the standard
- backends are easier to replace
- jaeger ...
- app needs to send traces to collector
- libraries for popular languages
- automatic instrumentation available for some of them
--
...
effect @@ tracing.aspects.extractSpan(
TraceContextPropagator.default,
carriers.input,
spanName,
SpanKind.SERVER
) @@ PrometheusMetrics.requestHandlerTimer
.tagged("endpoint", spanName)
.trackDuration
- internal impl ==> outer view
- developer needs to help outer tools
- to make use of their features
- example
- healthchecks
- monitoring / tracing / logs
- fast startup / graceful shutdown
- exit code