Client Improvements Plan

dubsky edited this page Dec 8, 2017 · 12 revisions

This is a list of improvements that could form the next major release of the InfluxDB Java client.

Serialization of data points to/from Java objects

Nice deserialization API, efficiency

It is a best practice in the Java world to represent data as objects. The current library can already map query results to Java beans:

@Measurement(name = "cpu")
public class Cpu {
    @Column(name = "time")
    private Instant time;
    @Column(name = "host", tag = true)
    private String hostname;
    @Column(name = "region", tag = true)
    private String region;
    @Column(name = "idle")
    private Double idle;
    @Column(name = "happydevop")
    private Boolean happydevop;
    @Column(name = "uptimesecs")
    private Long uptimeSecs;
    // getters (and setters if you need)
}

QueryResult queryResult = influxDB.query(new Query("SELECT * FROM cpu", dbName));

InfluxDBResultMapper resultMapper = new InfluxDBResultMapper(); // thread-safe - can be reused
List<Cpu> cpuList = resultMapper.toPOJO(queryResult, Cpu.class);
  • The API is clumsy: a separate InfluxDBResultMapper should not be necessary.
  • It is not efficient: the whole result is always instantiated as Java beans at once.

Solution

   Iterator<Cpu> cpus = influxDB.query(new Query("SELECT * FROM cpu", dbName, Cpu.class));

This mechanism also works well with chunking, since objects can be instantiated one at a time as the chunks arrive.
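To illustrate the lazy behaviour the iterator-based proposal relies on, here is a minimal sketch. LazyMappingIterator is a hypothetical helper, not part of the current library; it only shows that each object is created when the caller asks for it, instead of materializing the whole result up front.

```java
import java.util.Iterator;
import java.util.function.Function;

// Hypothetical helper: converts raw result rows into objects one at a time.
final class LazyMappingIterator<R, T> implements Iterator<T> {
    private final Iterator<R> rows;       // raw rows, possibly arriving chunk by chunk
    private final Function<R, T> mapper;  // converts one row into one object

    LazyMappingIterator(Iterator<R> rows, Function<R, T> mapper) {
        this.rows = rows;
        this.mapper = mapper;
    }

    @Override
    public boolean hasNext() {
        return rows.hasNext();
    }

    @Override
    public T next() {
        // The mapper runs only when the caller requests the next element.
        return mapper.apply(rows.next());
    }
}
```

Because nothing is mapped until next() is called, memory use stays proportional to what the caller actually consumes.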

Missing serialization of Java objects to data points

To write a data point you currently have to build it by hand:

Point point = Point.measurement("disk")
					.time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
					.addField("used", 80L)
					.addField("free", 1L)
					.build();
influxDB.write(dbName, rpName, point);

There is no way to use the previously defined and annotated Cpu class to write a data point.

Solution

@Measurement(name = "cpu")
public class Cpu {

    public Cpu(Instant time, String host, String region, Double idle) {
        this.time=time;
        this.host=host;
        this.region=region;
        this.idle=idle;
    }

    ....
}
....

Cpu cpu = new Cpu(Instant.now(), "server01", "us-west", 80.0); // example values
influxDB.write(dbName, rpName, cpu);
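One way such a write method could read an annotated object is reflection over its fields. The sketch below is only an assumption about how this might work: the two annotations are local stand-ins for the library's @Measurement and @Column so that the example compiles on its own, and PointData, PointExtractor and CpuSample are hypothetical names. Handling of the time column is left out.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

// Stand-ins for the library's annotations so the sketch is self-contained.
@Retention(RetentionPolicy.RUNTIME)
@interface Measurement { String name(); }

@Retention(RetentionPolicy.RUNTIME)
@interface Column { String name(); boolean tag() default false; }

// Hypothetical holder for what was read from an annotated object.
final class PointData {
    String measurement;
    final Map<String, String> tags = new TreeMap<>();
    final Map<String, Object> fields = new TreeMap<>();
}

// Hypothetical extractor: splits annotated fields into tags and fields.
final class PointExtractor {
    static PointData extract(Object pojo) {
        Class<?> type = pojo.getClass();
        Measurement measurement = type.getAnnotation(Measurement.class);
        if (measurement == null) {
            throw new IllegalArgumentException(type.getName() + " lacks @Measurement");
        }
        PointData data = new PointData();
        data.measurement = measurement.name();
        for (Field field : type.getDeclaredFields()) {
            Column column = field.getAnnotation(Column.class);
            if (column == null) continue;          // not mapped to a column
            try {
                field.setAccessible(true);
                Object value = field.get(pojo);
                if (value == null) continue;       // skip unset values
                if (column.tag()) data.tags.put(column.name(), value.toString());
                else data.fields.put(column.name(), value);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read " + field.getName(), e);
            }
        }
        return data;
    }
}

// Example class used only to exercise the extractor.
@Measurement(name = "cpu")
final class CpuSample {
    @Column(name = "host", tag = true) private final String hostname;
    @Column(name = "idle") private final Double idle;

    CpuSample(String hostname, Double idle) {
        this.hostname = hostname;
        this.idle = idle;
    }
}
```

The extracted measurement name, tags and fields could then be fed into the existing Point builder, so the annotated class would serve both reading and writing.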

Schema validation

Point point = Point.measurement("disk")
					.time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
					.addField("used", 80)
					.addField("free", 1.0)
					.build();
  • The current approach to writing data points requires the user to be quite careful: the first write into a measurement defines the data types of its fields. In the example above the 'used' field should have been a float, but as submitted it will be stored as an integer. The measurement has to be dropped and recreated to fix this.

  • The tag structure is usually also fixed for a measurement (some fields or tags are required, and so on). With this approach it is easy to forget a tag or field.

  • It is easy to make a typo in the name of a field, a tag, or even the measurement.

Solution

Allow the user to define a schema for a measurement and require every write to comply with it. The JavaScript client has already solved this:

https://node-influx.github.io/class/src/index.js%7EInfluxDB.html
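As a rough illustration of what client-side schema validation could look like in Java, consider the sketch below. MeasurementSchema and its methods are hypothetical and not an existing API; the sketch only checks field names and Java types before a write would be sent.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical schema: declares the expected Java type of every field.
final class MeasurementSchema {
    private final String measurement;
    private final Map<String, Class<?>> fieldTypes = new HashMap<>();

    MeasurementSchema(String measurement) {
        this.measurement = measurement;
    }

    // Declares a required field and its type; returns this for chaining.
    MeasurementSchema field(String name, Class<?> type) {
        fieldTypes.put(name, type);
        return this;
    }

    // Throws IllegalArgumentException if the fields do not match the schema.
    void validate(Map<String, Object> fields) {
        for (Map.Entry<String, Object> entry : fields.entrySet()) {
            Class<?> expected = fieldTypes.get(entry.getKey());
            if (expected == null) {
                throw new IllegalArgumentException(
                        "Unknown field '" + entry.getKey() + "' for " + measurement);
            }
            if (!expected.isInstance(entry.getValue())) {
                throw new IllegalArgumentException(
                        "Field '" + entry.getKey() + "' must be " + expected.getSimpleName());
            }
        }
        for (String required : fieldTypes.keySet()) {
            if (!fields.containsKey(required)) {
                throw new IllegalArgumentException(
                        "Missing field '" + required + "' for " + measurement);
            }
        }
    }
}
```

With a schema declaring 'used' as Double, the integer value 80 from the example above would be rejected on the client before it could fix the wrong type on the server.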

Asynchronous API

There is a request to provide an asynchronous API for the client:

https://github.com/influxdata/influxdb-java/issues/386

This is related to the error handling problem described below, since errors must be signalled correctly in the asynchronous scenario. We don't want to introduce a new API that would have to change again once error handling is fixed.

Currently asynchronous processing is available only for certain use cases. For example, when writing data points you have to explicitly enable batching to get asynchronous behavior.

Solution

Provide asynchronous (callback-based) methods for the use cases that currently lack them.
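A callback-based method could take roughly the following shape. This is a sketch under assumptions: WriteCallback and AsyncWriter are hypothetical names, and the blocking write is represented by a plain Runnable rather than a real call into the client.

```java
import java.util.concurrent.Executor;

// Hypothetical callback that reports the outcome of one asynchronous write.
interface WriteCallback {
    void onSuccess();
    void onFailure(Throwable error);
}

// Hypothetical wrapper that runs a blocking write on an executor.
final class AsyncWriter {
    private final Executor executor;

    AsyncWriter(Executor executor) {
        this.executor = executor;
    }

    void writeAsync(Runnable blockingWrite, WriteCallback callback) {
        executor.execute(() -> {
            try {
                blockingWrite.run();
            } catch (RuntimeException e) {
                callback.onFailure(e);   // the error reaches the caller explicitly
                return;
            }
            callback.onSuccess();
        });
    }
}
```

Whatever the final shape, the callback needs a failure path from the start, which is why this topic depends on the error handling work.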

Handling of errors

The error handling of the library is very basic: it detects errors only from a non-2xx HTTP status code returned by InfluxDB. The error information contained in the response is not analysed any further.

Client vs. Server Error distinction

There is already a request to distinguish between errors caused by the client (HTTP status 4xx) and server failures (HTTP status 5xx):

https://github.com/influxdata/influxdb-java/issues/375

Solution

As a solution we would implement a hierarchy of exception classes (for example InfluxDBClientException and InfluxDBServerException, both inheriting from the existing InfluxDBException).

For 4xx status codes we would also parse the JSON body of the response and use the error message sent by InfluxDB as the message of the InfluxDBException.
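The proposed hierarchy could be sketched as follows. The base class here is a local stand-in for the existing InfluxDBException so that the example compiles on its own, and ErrorMapper is a hypothetical helper; parsing of the JSON body is omitted and the message is passed in directly.

```java
// Stand-in for the library's existing base exception.
class InfluxDBException extends RuntimeException {
    InfluxDBException(String message) {
        super(message);
    }
}

// The request was rejected because of something the client sent (HTTP 4xx).
class InfluxDBClientException extends InfluxDBException {
    InfluxDBClientException(String message) {
        super(message);
    }
}

// The server failed to process a valid request (HTTP 5xx).
class InfluxDBServerException extends InfluxDBException {
    InfluxDBServerException(String message) {
        super(message);
    }
}

// Hypothetical helper: picks the exception type from the HTTP status code.
final class ErrorMapper {
    static InfluxDBException fromResponse(int status, String message) {
        if (status >= 400 && status < 500) {
            return new InfluxDBClientException(message);
        }
        if (status >= 500 && status < 600) {
            return new InfluxDBServerException(message);
        }
        return new InfluxDBException(message);
    }
}
```

Existing code that catches InfluxDBException would keep working, while new code could catch the more specific types.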

Partial writes

The current client does not account for partial writes. The client should receive error information only for the data points that failed to be written. This applies when batching mode is used.

Solution

Building on the exception hierarchy from the previous section, we would provide an additional method on the exception object that returns information about the data points that failed to be written.
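A minimal sketch of such an exception is shown below. PartialWriteException and getFailedLines are hypothetical names, and the failed points are represented as line protocol strings purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical exception that carries only the points that were not written.
final class PartialWriteException extends RuntimeException {
    private final List<String> failedLines;

    PartialWriteException(String message, List<String> failedLines) {
        super(message);
        // Defensive copy so later changes by the caller do not affect the exception.
        this.failedLines = Collections.unmodifiableList(new ArrayList<>(failedLines));
    }

    // Returns the rejected points, for example so that they can be retried or logged.
    List<String> getFailedLines() {
        return failedLines;
    }
}
```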

Handling of partially completed writes into multi-node cluster

The problem of setting the consistency level has recently been resolved:

https://github.com/influxdata/influxdb-java/pull/385

However, there is still the problem of propagating the detailed error information to the client.

Batch Processing Issues

There was a request to handle errors during batch writes, and it has been fulfilled. However, the solution based on the BiConsumer interface is not very clean and should be reworked so that it can carry the error information mentioned above.

The user might also want to be notified not only of errors but also when the points have been successfully written into InfluxDB.

Related information:

https://github.com/influxdata/influxdb-java/pull/319
https://github.com/influxdata/influxdb-java/issues/381

Error signalling for chunked query responses

The current interface does not force the user to handle or catch errors that occur while chunked query responses are being processed.

The documentation even shows an example in which error handling is missing entirely.
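One possible way to make error handling unavoidable is to require an error consumer next to the chunk consumer. The sketch below is an assumption, not the current API: ChunkedQuery and its nested Source interface are hypothetical, and the chunk stream is simulated by the caller.

```java
import java.util.Objects;
import java.util.function.Consumer;

// Hypothetical entry point that refuses to run without an error handler.
final class ChunkedQuery {
    // Simulates a chunked response: pushes chunks to the sink and may fail midway.
    interface Source<T> {
        void stream(Consumer<T> sink);
    }

    static <T> void run(Source<T> source, Consumer<T> onChunk, Consumer<Throwable> onError) {
        Objects.requireNonNull(onChunk, "onChunk is required");
        Objects.requireNonNull(onError, "onError is required");
        try {
            source.stream(onChunk);
        } catch (RuntimeException e) {
            // A failure in the middle of the stream is reported, not silently dropped.
            onError.accept(e);
        }
    }
}
```

Chunks delivered before a failure still reach the caller, and the failure itself arrives through the mandatory handler.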

Cleanup of API write methods

All the improvements above will have an impact on the API, especially on the write methods. We should therefore avoid having too many of them, and we want them to behave predictably.

We would still keep the current API available for backward compatibility, and perhaps deprecate some of the existing methods if we see that they are no longer necessary.

We would also fix the following issue:

https://github.com/influxdata/influxdb-java/issues/378
