Client Improvements Plan
This is a list of improvements that could form the next major release of the InfluxDB Java client.
It is a best practice in the Java world to represent data as objects. The current library can map query results to Java beans:
@Measurement(name = "cpu")
public class Cpu {
    @Column(name = "time")
    private Instant time;
    @Column(name = "host", tag = true)
    private String hostname;
    @Column(name = "region", tag = true)
    private String region;
    @Column(name = "idle")
    private Double idle;
    @Column(name = "happydevop")
    private Boolean happydevop;
    @Column(name = "uptimesecs")
    private Long uptimeSecs;
    // getters (and setters if you need)
}
QueryResult queryResult = influxDB.query(new Query("SELECT * FROM cpu", dbName));
InfluxDBResultMapper resultMapper = new InfluxDBResultMapper(); // thread-safe - can be reused
List<Cpu> cpuList = resultMapper.toPOJO(queryResult, Cpu.class);
- The API is clumsy: a separate InfluxDBResultMapper step should not be needed.
- It is inefficient: the whole result is always instantiated as Java beans at once.
Iterator<Cpu> cpus=influxDB.query(new Query("SELECT * FROM cpu", dbName, Cpu.class));
This mechanism also works nicely with chunking. The returned iterator could also be replaced by a lazily initialized List.
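The lazy mapping idea can be sketched without the client library at all. The following is a minimal illustration, not the library's API: `LazyMappingIterator` is a hypothetical name, and raw rows are modelled as plain `Map<String, Object>` values, which is an assumption made only to keep the sketch self-contained.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical lazy iterator: a raw row is mapped to a bean only when
 * next() is called, so the full result is never materialized at once.
 */
final class LazyMappingIterator<T> implements Iterator<T> {
    // Raw rows; in a real client these could be decoded chunk by chunk.
    private final Iterator<Map<String, Object>> rows;
    // Converts one raw row into one bean.
    private final Function<Map<String, Object>, T> mapper;

    LazyMappingIterator(Iterator<Map<String, Object>> rows,
                        Function<Map<String, Object>, T> mapper) {
        this.rows = rows;
        this.mapper = mapper;
    }

    @Override
    public boolean hasNext() {
        return rows.hasNext();
    }

    @Override
    public T next() {
        // Mapping happens here, one row per call.
        return mapper.apply(rows.next());
    }
}
```

Because rows are mapped on demand, a caller that stops iterating early never pays for the beans it did not read.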
To write a data point you need to serialize it as follows:
Point point = Point.measurement("disk")
    .time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
    .addField("used", 80L)
    .addField("free", 1L)
    .build();
influxDB.write(dbName, rpName, point);
There is no way to use the previously defined and annotated Cpu class to write a data point.
@Measurement(name = "cpu")
public class Cpu {
    public Cpu(Instant time, String host, String region, Double idle) {
        this.time = time;
        this.hostname = host;
        this.region = region;
        this.idle = idle;
    }
    ....
}
....
// host and region values are illustrative
Cpu cpu = new Cpu(Instant.now(), "serverA", "us_west", 80.0);
influxDB.write(dbName, rpName, cpu);
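One way such a write could work internally is to read the annotations by reflection and emit a line-protocol record. The sketch below is only an illustration under several assumptions: `Measurement` and `Column` are local stand-ins for the library's annotations so that the example compiles on its own, `PojoLineWriter` is a hypothetical helper, the trimmed `Cpu` class is not the one from this page, and escaping of special characters is deliberately left out.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.time.Instant;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Local stand-ins for the library annotations (assumption: same attribute names).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Measurement { String name(); }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Column { String name(); boolean tag() default false; }

@Measurement(name = "cpu")
class Cpu {
    @Column(name = "time") Instant time;
    @Column(name = "host", tag = true) String hostname;
    @Column(name = "idle") Double idle;
    @Column(name = "uptimesecs") Long uptimeSecs;

    Cpu(Instant time, String hostname, Double idle, Long uptimeSecs) {
        this.time = time;
        this.hostname = hostname;
        this.idle = idle;
        this.uptimeSecs = uptimeSecs;
    }
}

/** Hypothetical helper that turns an annotated bean into one line-protocol record. */
final class PojoLineWriter {
    private PojoLineWriter() {}

    static String toLineProtocol(Object bean) {
        Measurement measurement = bean.getClass().getAnnotation(Measurement.class);
        if (measurement == null) {
            throw new IllegalArgumentException("missing @Measurement on " + bean.getClass());
        }
        // TreeMap keeps keys sorted, which makes the output deterministic.
        Map<String, String> tags = new TreeMap<>();
        Map<String, String> fields = new TreeMap<>();
        Instant time = null;
        for (Field field : bean.getClass().getDeclaredFields()) {
            Column column = field.getAnnotation(Column.class);
            if (column == null) continue;
            Object value = read(field, bean);
            if (value == null) continue;              // unset values are skipped
            if (value instanceof Instant) time = (Instant) value;
            else if (column.tag()) tags.put(column.name(), value.toString());
            else fields.put(column.name(), encodeField(value));
        }
        if (fields.isEmpty()) {
            throw new IllegalArgumentException("at least one field value is required");
        }
        StringBuilder line = new StringBuilder(measurement.name());
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            line.append(',').append(tag.getKey()).append('=').append(tag.getValue());
        }
        StringJoiner fieldSet = new StringJoiner(",");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            fieldSet.add(f.getKey() + "=" + f.getValue());
        }
        line.append(' ').append(fieldSet.toString());
        if (time != null) {
            // Timestamp in nanoseconds since the epoch.
            line.append(' ').append(time.getEpochSecond() * 1_000_000_000L + time.getNano());
        }
        return line.toString();
    }

    private static Object read(Field field, Object bean) {
        try {
            field.setAccessible(true);
            return field.get(bean);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("cannot read " + field.getName(), e);
        }
    }

    // Integers carry an 'i' suffix, strings are quoted, floats and booleans are written as-is.
    private static String encodeField(Object value) {
        if (value instanceof Long || value instanceof Integer) return value + "i";
        if (value instanceof String) return "\"" + value + "\"";
        return value.toString();
    }
}
```

Since the Java field type decides the encoding, a `Double` field can never be written as an integer by accident.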
Point point = Point.measurement("disk")
    .time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
    .addField("used", 80)
    .addField("free", 1.0)
    .build();
- The current approach to writing data points requires the user to be quite careful, because the first write into a measurement defines the data types of its fields. In the example above the 'used' field should have been a float, but if submitted as is it will be an integer. The measurement will have to be dropped and recreated to fix this.
- The tag structure is usually fixed for a measurement (some fields and tags are required, etc.). With this approach it is easy to forget a tag or field.
- It is easy to make a typo when naming a field, a tag, or even a measurement.
Allow the user to define a schema for the measurement, so that each write has to comply with the schema. The JavaScript client has solved this:
https://node-influx.github.io/class/src/index.js%7EInfluxDB.html
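A schema check of this kind could look roughly as follows. This is a sketch only: `MeasurementSchema` and its `validate` method are hypothetical names, and the rule that every declared tag is required is an assumption chosen for the example rather than something the plan specifies.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-measurement schema: expected field types and tag keys. */
final class MeasurementSchema {
    private final String measurement;
    private final Map<String, Class<?>> fieldTypes;
    private final Set<String> tagKeys;

    MeasurementSchema(String measurement, Map<String, Class<?>> fieldTypes, Set<String> tagKeys) {
        this.measurement = measurement;
        this.fieldTypes = new HashMap<>(fieldTypes);
        this.tagKeys = new HashSet<>(tagKeys);
    }

    /** Throws IllegalArgumentException if the point does not comply with the schema. */
    void validate(Map<String, String> tags, Map<String, Object> fields) {
        // Assumption for this sketch: every declared tag must be present.
        for (String key : tagKeys) {
            if (!tags.containsKey(key)) {
                throw new IllegalArgumentException(measurement + ": missing tag '" + key + "'");
            }
        }
        // Undeclared tags are most likely typos.
        for (String key : tags.keySet()) {
            if (!tagKeys.contains(key)) {
                throw new IllegalArgumentException(measurement + ": unknown tag '" + key + "'");
            }
        }
        if (fields.isEmpty()) {
            throw new IllegalArgumentException(measurement + ": at least one field is required");
        }
        for (Map.Entry<String, Object> field : fields.entrySet()) {
            Class<?> expected = fieldTypes.get(field.getKey());
            if (expected == null) {
                throw new IllegalArgumentException(
                        measurement + ": unknown field '" + field.getKey() + "'");
            }
            // Catches the integer-versus-float mistake before it reaches the server.
            if (!expected.isInstance(field.getValue())) {
                throw new IllegalArgumentException(measurement + ": field '" + field.getKey()
                        + "' must be " + expected.getSimpleName());
            }
        }
    }
}
```

With a schema declaring `used` as `Double`, a write of `used = 80` (an `Integer`) would be rejected on the client, before the wrong type is fixed in the measurement.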
There is a request to provide an async API for the client:
https://github.com/influxdata/influxdb-java/issues/386
This is also related to the error handling problem, since errors must be signalled correctly in the async scenario. We don't want to introduce a new API that would have to change once the problems below are fixed.
Currently, asynchronous processing is available only for certain use cases. For example, when writing data points you have to explicitly enable batching to get asynchronous behavior.
A couple of issues were reported where users were confused and did not recognize that the API can already be used asynchronously.
Provide asynchronous (callback-based) methods for the missing use cases.
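A callback-based variant can be layered over any blocking call with the JDK's `CompletableFuture`. The following is a sketch under assumptions: `AsyncCalls.callAsync` is a hypothetical helper, not part of the client, and the blocking call is represented by a plain `Supplier`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper that runs a blocking call elsewhere and reports via callbacks. */
final class AsyncCalls {
    private AsyncCalls() {}

    static <T> CompletableFuture<T> callAsync(Supplier<T> blockingCall,
                                              Executor executor,
                                              Consumer<T> onSuccess,
                                              Consumer<Throwable> onFailure) {
        return CompletableFuture.supplyAsync(blockingCall, executor)
                .whenComplete((result, error) -> {
                    if (error != null) {
                        // Exactly one of the two callbacks is invoked per call.
                        onFailure.accept(unwrap(error));
                    } else {
                        onSuccess.accept(result);
                    }
                });
    }

    // supplyAsync wraps failures in CompletionException; hand the original cause to the caller.
    private static Throwable unwrap(Throwable error) {
        if (error instanceof CompletionException && error.getCause() != null) {
            return error.getCause();
        }
        return error;
    }
}
```

Requiring both callbacks as parameters means a caller cannot forget the failure path, which ties in with the error handling concerns described next.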
The error handling of the library is very basic: it only detects errors based on a non-2xx HTTP status code returned by InfluxDB. The error information contained in the response is not analyzed any further.
There is already a request to distinguish between errors caused by the client (HTTP status 4xx) and server failures (HTTP status 5xx):
https://github.com/influxdata/influxdb-java/issues/375
As a solution, we would implement a hierarchy of exception classes (for example InfluxDBClientException and InfluxDBServerException, both inheriting from the existing InfluxDBException).
Also, for 4xx status codes we would parse the JSON body returned by the server and use the error message sent by InfluxDB as the InfluxDBException message.
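The proposed hierarchy could be sketched as below. Everything here is illustrative: the `InfluxDBException` class is a local stand-in for the library's existing exception, `InfluxDBErrors.fromResponse` is a hypothetical factory, and the regular expression is a naive substitute for a real JSON parser, assuming an error body of the form `{"error":"..."}`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Local stand-in for the existing library exception, so the sketch compiles alone.
class InfluxDBException extends RuntimeException {
    private final int statusCode;

    InfluxDBException(int statusCode, String message) {
        super(message);
        this.statusCode = statusCode;
    }

    int getStatusCode() {
        return statusCode;
    }
}

/** The request was wrong (HTTP 4xx); retrying it unchanged will not help. */
class InfluxDBClientException extends InfluxDBException {
    InfluxDBClientException(int statusCode, String message) {
        super(statusCode, message);
    }
}

/** The server failed (HTTP 5xx); the same request may succeed later. */
class InfluxDBServerException extends InfluxDBException {
    InfluxDBServerException(int statusCode, String message) {
        super(statusCode, message);
    }
}

/** Hypothetical factory mapping an HTTP response to the matching exception type. */
final class InfluxDBErrors {
    // Naive extraction of the "error" member; a real client would use its JSON library.
    private static final Pattern ERROR_FIELD =
            Pattern.compile("\"error\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    private InfluxDBErrors() {}

    static InfluxDBException fromResponse(int statusCode, String body) {
        String message = extractMessage(body);
        if (statusCode >= 400 && statusCode < 500) {
            return new InfluxDBClientException(statusCode, message);
        }
        if (statusCode >= 500 && statusCode < 600) {
            return new InfluxDBServerException(statusCode, message);
        }
        return new InfluxDBException(statusCode, message);
    }

    private static String extractMessage(String body) {
        if (body == null) {
            return "";
        }
        Matcher matcher = ERROR_FIELD.matcher(body);
        // Fall back to the raw body when no "error" member is found.
        return matcher.find() ? matcher.group(1).replace("\\\"", "\"") : body;
    }
}
```

Callers that only catch `InfluxDBException` keep working, while new code can catch the subclasses to decide whether a retry makes sense.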
The current client doesn't expect partial writes. The client should get error information only for the data points that failed to be written. This applies when batching mode is used.
As mentioned in the previous section, we would provide an additional method on the exception object that returns information about the data points that failed to be written.
The problem of setting the consistency level was resolved recently:
https://github.com/influxdata/influxdb-java/pull/385
However, there is still a problem with propagating detailed error information to the client.
There was a request to handle errors during batch writes (and it has been fulfilled); however, the solution based on the BiConsumer interface is not very nice and should be reworked so that it can carry the error information mentioned above.
Also, the user might want to be notified not only of errors but also when points have been successfully written into InfluxDB.
Related information:
https://github.com/influxdata/influxdb-java/pull/319
https://github.com/influxdata/influxdb-java/issues/381
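A dedicated listener could replace the BiConsumer and cover both success and partial failure. The sketch below is an assumption about what such an interface might look like: `BatchWriteListener` and `BatchNotifier` are hypothetical names, and points are represented by a generic type parameter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical replacement for the BiConsumer-based batch error handler. */
interface BatchWriteListener<P> {
    /** Called with the points that were written successfully. */
    void onSuccess(List<P> writtenPoints);

    /** Called only with the points that failed, plus the cause. */
    void onError(List<P> failedPoints, Throwable cause);
}

/** Splits one batch outcome into the success and error notifications. */
final class BatchNotifier {
    private BatchNotifier() {}

    static <P> void report(List<P> batch, List<P> rejected, Throwable cause,
                           BatchWriteListener<P> listener) {
        // Note: removeAll drops every element equal to a rejected one.
        List<P> written = new ArrayList<>(batch);
        written.removeAll(rejected);
        if (!written.isEmpty()) {
            listener.onSuccess(written);
        }
        if (!rejected.isEmpty()) {
            listener.onError(new ArrayList<>(rejected), cause);
        }
    }
}
```

With a partial write, the listener receives one success call for the accepted points and one error call for the rejected ones, so a retry can target only what failed.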
The current interface doesn't force the user to handle or catch errors that happen while chunked query responses are evaluated.
The documentation even shows an example where error handling is completely missing.
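One way to make error handling unavoidable is a handler interface in which the error method has no default implementation. This is a sketch with hypothetical names (`ChunkHandler`, `ChunkedDispatcher`); chunks are modelled as a plain iterator, which is an assumption made for illustration.

```java
import java.util.Iterator;
import java.util.Objects;

/** Hypothetical handler: onError has no default, so every caller must implement it. */
interface ChunkHandler<C> {
    void onChunk(C chunk);

    void onError(Throwable error);

    /** Optional hook, invoked after the last chunk when nothing failed. */
    default void onComplete() {}
}

/** Feeds chunks to a handler and routes any failure to onError. */
final class ChunkedDispatcher {
    private ChunkedDispatcher() {}

    static <C> void dispatch(Iterator<C> chunks, ChunkHandler<C> handler) {
        Objects.requireNonNull(handler, "handler");
        try {
            while (chunks.hasNext()) {
                handler.onChunk(chunks.next());
            }
        } catch (RuntimeException e) {
            // Failures while reading a chunk or inside onChunk both end up here.
            handler.onError(e);
            return;
        }
        handler.onComplete();
    }
}
```

Because the interface has two abstract methods, it cannot be supplied as a single lambda, which is the point: the caller has to write the error branch explicitly.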
All the improvements above will have an impact on the API, especially the write methods. Therefore we should avoid having too many of them, and we want them to behave predictably.
Still, we would keep the current API available for backward compatibility and perhaps deprecate existing methods that turn out to be no longer necessary.
We would also fix the following issue:
https://github.com/influxdata/influxdb-java/issues/378
Right now the user has to build and push data points to InfluxDB. In reality, the user is usually just monitoring some process, and instead of forcing the user to implement the logic that watches for events, we can do that for them. For example:
@Observable(measurement = "cpu")
public class Cpu {
    @Column(name = "time")
    private Instant time;
    @Column(name = "host", tag = true)
    private String hostname;
    @Column(name = "region", tag = true)
    private String region;
    @Column(name = "idle")
    private Double idle;
    @Column(name = "happydevop")
    private Boolean happydevop;
    @Column(name = "uptimesecs")
    private Long uptimeSecs;
    // getters (and setters if you need)
}
Cpu cpu=new Cpu();
influxDB.observe(cpu, 1000);
This would write the current state of the cpu object into InfluxDB every 1000 ms.
It would be nice to implement something similar that logs only changes to the cpu object. The call would then be even simpler:
influxDB.observe(cpu);
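The periodic variant of observe could be built on a `ScheduledExecutorService`. The sketch below shows only the scheduling part and rests on assumptions: `PeriodicObserver` is a hypothetical class, and the actual write is abstracted as a `Consumer<Object>` supplied by the caller. Change detection for the parameterless variant is not covered here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical periodic observer: hands the target to a writer at a fixed rate. */
final class PeriodicObserver implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "influx-observer");
                thread.setDaemon(true); // do not keep the JVM alive just for monitoring
                return thread;
            });
    // Stands in for "serialize the object and write it to InfluxDB".
    private final Consumer<Object> writer;

    PeriodicObserver(Consumer<Object> writer) {
        this.writer = writer;
    }

    /** Starts writing the target every periodMillis; cancel the returned future to stop. */
    ScheduledFuture<?> observe(Object target, long periodMillis) {
        return scheduler.scheduleAtFixedRate(() -> {
            try {
                writer.accept(target);
            } catch (RuntimeException e) {
                // Swallowed on purpose: an uncaught exception would cancel all later runs.
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

A real implementation would also need to decide how write failures are reported, for example through the same listener mechanism used for batch writes.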