Releases: openzipkin/zipkin
Zipkin 1.3
Zipkin 1.3 includes highlighting of spans in error state and improvements to the Cassandra storage component.
Error annotations
Inspired by recent work in OpenTracing, we've added a new annotation, "error". As an annotation value, it indicates when a potentially transient error occurred. As a binary annotation key, its value is a human-readable message associated with an error that resulted in a failed span. See #1140 for details.
Thanks to @virtuald, the UI acts according to these rules, highlighting degraded spans yellow and failed ones red.
Instrumentation (like Brave, zipkin-tracer, etc.) needs to change to support this. Please help if you have time!
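As a rough illustration of the rules above, here is a minimal Python sketch that classifies a span given as a JSON-style dict. The function name span_state and the sample span are hypothetical, and letting "failed" take precedence over "degraded" is an assumption, not something these notes specify.

```python
def span_state(span):
    """Classify a span dict as 'failed', 'degraded', or 'ok'."""
    # A binary annotation keyed "error" marks a failed span;
    # its value carries the human-readable message.
    for binary in span.get("binaryAnnotations", []):
        if binary.get("key") == "error":
            return "failed"
    # An annotation whose value is "error" marks a potentially
    # transient error, which we treat as a degraded span.
    for annotation in span.get("annotations", []):
        if annotation.get("value") == "error":
            return "degraded"
    return "ok"


# Hypothetical span, for illustration only.
span = {
    "traceId": "463ac35c9f6413ad",
    "id": "463ac35c9f6413ad",
    "name": "get",
    "annotations": [{"timestamp": 1472470996199000, "value": "error"}],
    "binaryAnnotations": [],
}
print(span_state(span))  # degraded
```

A UI following the same rules would color the first result red and the second yellow.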
Span.timestamp, duration 0 coerce to null
We've noticed some instrumentation logs a timestamp or duration of 0 when it meant to log null. A timestamp or duration of 0 microseconds is invalid or doesn't explain latency, so we now coerce these 0s to null. Where a sub-microsecond span duration occurred, round up to 1. See #1155 and #1176.
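For instrumentation authors, the two rules could look roughly like the sketch below. Both helper names are hypothetical, and the nanosecond input is an assumption about how an instrumentation library might measure elapsed time.

```python
def normalize(span):
    """Return a copy of the span with 0 timestamp/duration coerced to None."""
    out = dict(span)
    for field in ("timestamp", "duration"):
        if out.get(field) == 0:
            out[field] = None
    return out


def duration_micros_from_nanos(elapsed_nanos):
    """Convert elapsed nanoseconds to microseconds, never reporting 0."""
    # Integer division truncates; a sub-microsecond span becomes 0,
    # so round it up to 1 rather than logging an invalid duration.
    return max(1, elapsed_nanos // 1000)


print(normalize({"name": "get", "timestamp": 0, "duration": 0}))
print(duration_micros_from_nanos(400))   # 1
print(duration_micros_from_nanos(2500))  # 2
```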
Elasticsearch daily bucket fix
We found and fixed a concurrency bug that could put spans into the wrong daily buckets. See #1175
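These notes don't describe the root cause, so the following is only a sketch of what a daily bucket means: the bucket is derived from the span's own timestamp in UTC, using a pure function with no shared mutable state. The function name and the "zipkin-yyyy-MM-dd" naming pattern are assumptions for illustration.

```python
from datetime import datetime, timezone


def daily_bucket(timestamp_micros, prefix="zipkin"):
    """Name the daily bucket for a span timestamp in epoch microseconds."""
    # Integer division avoids floating-point rounding near midnight.
    seconds = timestamp_micros // 1_000_000
    day = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return f"{prefix}-{day:%Y-%m-%d}"


print(daily_bucket(1472470996199000))  # zipkin-2016-08-29
print(daily_bucket(1472428799999999))  # zipkin-2016-08-28
```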
Cassandra
Schema bug fix
We found a bug where traces against the same service in the same millisecond weren't indexed. This affects indexes only (trace data itself wasn't lost). For example, you might find a trace that exists in Cassandra, but you can't query it using the API.
Specifically, the following indexes now have trace_id added to their PRIMARY KEY definitions.
- service_span_name_index
- service_name_index
- annotations_index
There's no automatic data migration available. The most straightforward way to address this in an existing cluster is to drop the indexes listed above and restart a zipkin server, which will recreate them as long as CASSANDRA_ENSURE_SCHEMA=true. You can also update the indexes manually based on the schema.
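To see why adding trace_id to the key matters, recall that a Cassandra write with an existing primary key overwrites the earlier row. The Python sketch below models an index as a dict keyed by the primary key columns. The column names and rows are hypothetical and do not reproduce the real schema.

```python
def build_index(rows, key_fields):
    """Model an index table: rows sharing a key overwrite each other."""
    index = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        index[key] = row  # same key replaces the earlier row, like an upsert
    return index


# Two traces against the same service in the same millisecond.
rows = [
    {"service_name": "frontend", "ts": 1472470996199, "trace_id": 1},
    {"service_name": "frontend", "ts": 1472470996199, "trace_id": 2},
]

old = build_index(rows, ("service_name", "ts"))
new = build_index(rows, ("service_name", "ts", "trace_id"))
print(len(old), len(new))  # 1 2
```

With the shorter key only one of the two traces stays queryable. Including trace_id keeps both.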
Tuning
We've done a lot of work tuning the amount of data written to indexes on a per-span basis. Those using Cassandra should see a significant drop in index size due to reasons documented in the tuning section of the README.
Query logging
Those supporting zipkin may need to debug query latency. We now use the QueryLogger which is enabled when the log category "com.datastax.driver.core.QueryLogger" is at debug or trace level. Trace level includes bound values. See #1156
Zipkin 1.2
Zipkin 1.2.1 includes Prometheus metrics and Elasticsearch bug fixes.
Prometheus metrics are enabled by default, under the /prometheus endpoint.
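As a quick way to inspect what the endpoint returns, here is a small Python sketch that parses Prometheus text output into a dict. The metric names in the sample are made up, and the parser assumes samples carry no trailing timestamp.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition lines into {sample name: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# Hypothetical output, not real Zipkin metric names.
sample = """\
# HELP example_requests_total Hypothetical counter.
# TYPE example_requests_total counter
example_requests_total{path="/api/v1/spans"} 3
example_heap_bytes 1.5e+07
"""
print(parse_metrics(sample))
```

Against a running server, the text could be fetched with urllib.request.urlopen, for example from http://localhost:9411/prometheus, assuming the server listens on the default port.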
Many thanks to Kristian from Iterate for developing this feature!
1.1.5
Zipkin 1.1.4
This is a patch release that fixes a bug where CASSANDRA_ENSURE_SCHEMA didn't work when the keyspace was absent. See #1128 for details.
Zipkin 1.1.0
Zipkin 1.1 most notably offers improvements in the UI and the Cassandra storage component.
Zipkin UI
Thanks to @virtuald, the Zipkin UI now includes a JSON button! This allows you to see the JSON behind a trace diagram, something quite useful in support. For example, when people report problems in Zipkin, we often ask for JSON and this feature makes that easier.
@virtuald also improved error handling dramatically. Before, Zipkin wouldn't show server errors, so you'd have to use the JavaScript console to troubleshoot problems. Now, errors will show in pink boxes.
Cassandra Storage
Thanks to @michaelsembwever, the Cassandra schema now includes default TTLs. These obviate the explicit TTLs we were programmatically adding when storing each span. Those using CASSANDRA_ENSURE_SCHEMA (default) will be updated to this automatically. Those manually controlling the schema should run cassandra-schema-cql3-upgrade-1.txt before Zipkin 2.0 is released (unplanned).
Throttled retries when storage is down
Zipkin can start when storage is unavailable. The health check will report as unavailable until storage can be reached. Before, any activation of storage would re-attempt to connect. This has been throttled to no more than once per second to avoid thrashing the network or process with retry attempts.
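The throttling idea can be sketched in Python as follows. This is not the server's actual implementation: the class name, the connect callable and the injectable clock are all hypothetical, chosen to show "at most one connection attempt per interval".

```python
import time


class ThrottledConnector:
    """Lazily connect to storage, retrying at most once per interval."""

    def __init__(self, connect, min_interval=1.0, clock=time.monotonic):
        self._connect = connect            # callable that returns a connection
        self._min_interval = min_interval  # seconds between attempts
        self._clock = clock
        self._last_attempt = None
        self._connection = None

    def get(self):
        """Return the connection, or None while storage is unavailable."""
        if self._connection is not None:
            return self._connection
        now = self._clock()
        recently_tried = (
            self._last_attempt is not None
            and now - self._last_attempt < self._min_interval
        )
        if recently_tried:
            return None  # too soon: skip the attempt instead of thrashing
        self._last_attempt = now
        try:
            self._connection = self._connect()
        except ConnectionError:
            self._connection = None
        return self._connection
```

A health check built on this would report unavailable whenever get() returns None, without triggering more than one connection attempt per second.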
Zipkin 1.0
Many of you noticed we've been working on a new dependency-light codebase for Zipkin.
Over the last 10 months we've gone beyond feature parity with the previous server, and reduced moving parts until zipkin could be deployed as a single process. The new codebase also includes new features like Elasticsearch and better metrics.
Those already using zipkin should know zipkin-java is schema, api, and environment variable compatible with the old servers. We took great care to ensure it is a drop-in.
Those worried about being first to the fire will be interested in the fact that this code has been in development for 10 months and is already used in production by companies like Buoyant, LINE.me and anyone using Spring Cloud Sleuth.