
Client Improvements Plan

dubsky edited this page May 30, 2018 · 12 revisions

This is a list of improvements that could form the next major release of the InfluxDB Java client.

1. Serialization of data points to/from java objects

1.1. Nicer deserialization API and efficiency

In the Java world it is common practice to represent data as objects. The current library can map (deserialize) query results to Java beans:

@Measurement(name = "cpu")
public class Cpu {
    @Column(name = "time")
    private Instant time;
    @Column(name = "host", tag = true)
    private String hostname;
    @Column(name = "region", tag = true)
    private String region;
    @Column(name = "idle")
    private Double idle;
    @Column(name = "happydevop")
    private Boolean happydevop;
    @Column(name = "uptimesecs")
    private Long uptimeSecs;
    // getters (and setters if you need)
}

QueryResult queryResult = influxDB.query(new Query("SELECT * FROM cpu", dbName));

InfluxDBResultMapper resultMapper = new InfluxDBResultMapper(); // thread-safe - can be reused
List<Cpu> cpuList = resultMapper.toPOJO(queryResult, Cpu.class);
  • The API is awkward: a separate InfluxDBResultMapper object should not be needed.
  • It is inefficient: the whole result is always instantiated as Java beans at once.

Solution

Iterator<Cpu> cpus = influxDB.query(new Query("SELECT * FROM cpu", dbName), Cpu.class);

This mechanism also works well with chunked responses. The returned iterator could alternatively be replaced by a lazily initialized List.
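A minimal sketch of the lazy mapping idea, assuming the raw result is available as an iterator of rows (each row a List<Object> of column values) and that some row-to-POJO mapper function exists. LazyRowIterator is a hypothetical name, not part of the library. Each row is converted only when next() is called, so if the underlying rows are themselves streamed (for example from chunked responses), the result is never instantiated as Java beans all at once.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Hypothetical lazy mapper: converts one result row to a POJO only when it is requested. */
public final class LazyRowIterator<T> implements Iterator<T> {
    private final Iterator<List<Object>> rows;       // raw rows, possibly streamed chunk by chunk
    private final Function<List<Object>, T> mapper;  // row -> POJO, e.g. annotation-driven

    public LazyRowIterator(Iterator<List<Object>> rows, Function<List<Object>, T> mapper) {
        this.rows = rows;
        this.mapper = mapper;
    }

    @Override
    public boolean hasNext() {
        return rows.hasNext();
    }

    @Override
    public T next() {
        if (!rows.hasNext()) {
            throw new NoSuchElementException();
        }
        return mapper.apply(rows.next()); // mapping happens here, one row at a time
    }
}
```

A lazily initialized List would follow the same principle, mapping each row on first access instead of up front.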

1.2. Missing serialization of Java objects to data points

To write a data point, you currently have to build it manually:

Point point = Point.measurement("disk")
        .time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
        .addField("used", 80L)
        .addField("free", 1L)
        .build();
influxDB.write(dbName, rpName, point);

There is no way to use the previously defined and annotated Cpu class to write a data point.

Solution

@Measurement(name = "cpu")
public class Cpu {

    public Cpu(Instant time, String hostname, String region, Double idle) {
        this.time = time;
        this.hostname = hostname;
        this.region = region;
        this.idle = idle;
    }

    // ... annotated fields as in section 1.1
}

// ...

Cpu cpu = new Cpu(Instant.now(), "server01", "us-west", 0.64);
influxDB.write(dbName, rpName, cpu);
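A minimal sketch of how such a write could serialize the object, assuming reflection over the @Measurement/@Column annotations and producing InfluxDB line protocol text rather than a Point. The annotations are redeclared locally so the example compiles on its own (they are not the library's classes), and PojoToLineProtocol is a hypothetical name. Escaping of spaces, commas and quotes in names and values is omitted.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

public final class PojoToLineProtocol {

    // Local stand-ins for the annotations used above (hypothetical, not the library's classes).
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    public @interface Measurement { String name(); }

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    public @interface Column { String name(); boolean tag() default false; }

    @Measurement(name = "cpu")
    public static class Cpu {
        @Column(name = "time") private Instant time;
        @Column(name = "host", tag = true) private String hostname;
        @Column(name = "idle") private Double idle;
        @Column(name = "uptimesecs") private Long uptimeSecs;

        public Cpu(Instant time, String hostname, Double idle, Long uptimeSecs) {
            this.time = time;
            this.hostname = hostname;
            this.idle = idle;
            this.uptimeSecs = uptimeSecs;
        }
    }

    /** Converts an annotated object into one line of InfluxDB line protocol (no escaping). */
    public static String toLineProtocol(Object pojo) {
        Class<?> type = pojo.getClass();
        Measurement measurement = type.getAnnotation(Measurement.class);
        if (measurement == null) {
            throw new IllegalArgumentException(type + " is not annotated with @Measurement");
        }
        // TreeMaps give a stable, sorted order; getDeclaredFields() order is unspecified.
        Map<String, String> tags = new TreeMap<>();
        Map<String, String> fields = new TreeMap<>();
        Long timestampNanos = null;
        for (Field field : type.getDeclaredFields()) {
            Column column = field.getAnnotation(Column.class);
            if (column == null) continue;
            field.setAccessible(true);
            Object value;
            try {
                value = field.get(pojo);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (value == null) continue; // null fields are simply omitted
            if (column.name().equals("time") && value instanceof Instant) {
                Instant t = (Instant) value;
                timestampNanos = t.getEpochSecond() * 1_000_000_000L + t.getNano();
            } else if (column.tag()) {
                tags.put(column.name(), value.toString());
            } else if (value instanceof Long || value instanceof Integer) {
                fields.put(column.name(), value + "i"); // integer field
            } else if (value instanceof String) {
                fields.put(column.name(), "\"" + value + "\""); // string field
            } else {
                fields.put(column.name(), value.toString()); // Double -> float, Boolean -> true/false
            }
        }
        if (fields.isEmpty()) {
            throw new IllegalArgumentException("a point needs at least one field");
        }
        StringBuilder line = new StringBuilder(measurement.name());
        tags.forEach((k, v) -> line.append(',').append(k).append('=').append(v));
        line.append(' ');
        String separator = "";
        for (Map.Entry<String, String> f : fields.entrySet()) {
            line.append(separator).append(f.getKey()).append('=').append(f.getValue());
            separator = ",";
        }
        if (timestampNanos != null) {
            line.append(' ').append(timestampNanos);
        }
        return line.toString();
    }
}
```

Because the field types come from the Java class, a Long field is always written as an integer and a Double field as a float, which also addresses part of the type problem described in the next section.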

1.3. Schema validation

Point point = Point.measurement("disk")
        .time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
        .addField("used", 80)
        .addField("free", 1.0)
        .build();
  • With the current approach of writing data points, the user has to be quite careful: the first write into a measurement defines the data types of its fields. In the example above the 'used' field should have been a float, but as submitted it will be stored as an integer. To fix this, the measurement has to be dropped and recreated.

  • A measurement also usually has a defined tag structure (some fields or tags are required, etc.). With this approach it is easy to forget a tag or field.

  • It is easy to make a typo in a field, tag, or even measurement name.

Solution

Allow the user to define a schema for a measurement; every write then has to comply with that schema. The JavaScript client (node-influx) already solves this:

https://node-influx.github.io/class/src/index.js%7EInfluxDB.html
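A minimal sketch of client-side schema validation in Java, loosely following the idea above. MeasurementSchema, its FieldType enum and validate() are hypothetical names, not node-influx's or influxdb-java's API. The sketch assumes that every declared tag is required and that unknown tags, unknown fields and wrongly typed field values are rejected before anything is sent.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical schema for one measurement; every write is checked against it before it is sent. */
public final class MeasurementSchema {
    public enum FieldType { FLOAT, INTEGER, STRING, BOOLEAN }

    private final String measurement;
    private final Set<String> tags;              // all declared tags are treated as required
    private final Map<String, FieldType> fields; // allowed fields and their types

    public MeasurementSchema(String measurement, Set<String> tags, Map<String, FieldType> fields) {
        this.measurement = measurement;
        this.tags = new HashSet<>(tags);
        this.fields = new HashMap<>(fields);
    }

    /** Throws IllegalArgumentException if the point does not comply with the schema. */
    public void validate(String measurement, Map<String, String> pointTags, Map<String, Object> pointFields) {
        if (!this.measurement.equals(measurement)) {
            throw new IllegalArgumentException("unknown measurement: " + measurement); // catches typos
        }
        for (String tag : pointTags.keySet()) {
            if (!tags.contains(tag)) throw new IllegalArgumentException("unknown tag: " + tag);
        }
        for (String tag : tags) {
            if (!pointTags.containsKey(tag)) throw new IllegalArgumentException("missing tag: " + tag);
        }
        for (Map.Entry<String, Object> e : pointFields.entrySet()) {
            FieldType expected = fields.get(e.getKey());
            if (expected == null) throw new IllegalArgumentException("unknown field: " + e.getKey());
            FieldType actual = typeOf(e.getValue());
            if (actual != expected) {
                throw new IllegalArgumentException(
                        "field '" + e.getKey() + "' must be " + expected + " but was " + actual);
            }
        }
    }

    private static FieldType typeOf(Object value) {
        if (value instanceof Double || value instanceof Float) return FieldType.FLOAT;
        if (value instanceof Long || value instanceof Integer
                || value instanceof Short || value instanceof Byte) return FieldType.INTEGER;
        if (value instanceof String) return FieldType.STRING;
        if (value instanceof Boolean) return FieldType.BOOLEAN;
        throw new IllegalArgumentException("unsupported field value: " + value);
    }
}
```

With a schema declaring 'used' as FLOAT, the integer value 80 from the example above is rejected on the client instead of silently fixing the field type on the server.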

2. Asynchronous API [Partially done]

There is a request to provide an asynchronous API for the client:

https://github.com/influxdata/influxdb-java/issues/386

This is also related to the error-handling problem described in section 3, since errors must be signaled correctly in the asynchronous scenario. We don't want to introduce a new API that would have to change once that problem is fixed.

Currently, asynchronous processing is available only for certain use cases. For example, when writing data points you have to explicitly enable batching to get asynchronous behavior.

A couple of issues have been reported where users were confused and did not realize that the API can already be used asynchronously.

Solution

Provide asynchronous (callback-based) methods for the use cases that still lack them.
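One possible shape for such a method, sketched with the JDK's CompletableFuture. AsyncWriteSketch and BlockingWrite are hypothetical names; BlockingWrite stands in for whatever synchronous write call the client already has. The method runs the write on a caller-supplied executor and invokes exactly one of two callbacks, passing the original cause to the failure callback.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

/** Hypothetical callback-based wrapper around a blocking write call. */
public final class AsyncWriteSketch {

    /** Stand-in for an existing synchronous write, e.g. one HTTP request per call. */
    @FunctionalInterface
    public interface BlockingWrite {
        void write(String lineProtocol) throws Exception;
    }

    private final BlockingWrite delegate;
    private final ExecutorService executor;

    public AsyncWriteSketch(BlockingWrite delegate, ExecutorService executor) {
        this.delegate = delegate;
        this.executor = executor;
    }

    /**
     * Runs the write on the executor and invokes exactly one of the callbacks.
     * The returned future completes after the callback has run.
     */
    public CompletableFuture<Void> write(String lineProtocol, Runnable onSuccess, Consumer<Throwable> onFailure) {
        return CompletableFuture
                .runAsync(() -> {
                    try {
                        delegate.write(lineProtocol);
                    } catch (Exception e) {
                        throw new CompletionException(e); // carry checked exceptions out of the lambda
                    }
                }, executor)
                .whenComplete((ignored, error) -> {
                    if (error == null) {
                        onSuccess.run();
                    } else {
                        // unwrap so the callback sees the original cause
                        onFailure.accept(error instanceof CompletionException && error.getCause() != null
                                ? error.getCause() : error);
                    }
                });
    }
}
```

Returning the future as well as taking callbacks lets callers choose either style; the error passed to onFailure is where the richer exception types from section 3 would surface.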

3. Handling of errors [Partially done]

The library's error handling is very basic: it only detects errors from a non-2xx HTTP status code returned by InfluxDB. The error information contained in the response is not analyzed further.

3.1. Client vs. Server Error distinction [Partially done]

There is already a request to distinguish between errors caused by the client (HTTP status 4xx) and server failures (HTTP status 5xx):

https://github.com/influxdata/influxdb-java/issues/375

Solution

As a solution, we would implement a hierarchy of exception classes (for example InfluxDBClientException and InfluxDBServerException, both inheriting from the existing InfluxDBException).

Also, for 4xx status codes we would parse the JSON error body and use the error message sent by InfluxDB as the message of the InfluxDBException.
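A minimal sketch of the proposed hierarchy. InfluxDBClientException and InfluxDBServerException are the names proposed above; the InfluxDBException declared here is only a local stand-in for the library's class, and InfluxDBErrors.fromResponse is a hypothetical helper. It assumes the server returns an error body of the form {"error":"..."} and, to stay dependency-free, extracts the message with a regular expression instead of a real JSON parser (escape sequences are not decoded).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Local stand-in for the library's base exception (sketch only). */
class InfluxDBException extends RuntimeException {
    InfluxDBException(String message) { super(message); }
}

/** 4xx: the request itself was wrong, so retrying it unchanged will not help. */
class InfluxDBClientException extends InfluxDBException {
    final int status;
    InfluxDBClientException(int status, String message) { super(message); this.status = status; }
}

/** 5xx: the server failed; the same request may succeed later. */
class InfluxDBServerException extends InfluxDBException {
    final int status;
    InfluxDBServerException(int status, String message) { super(message); this.status = status; }
}

final class InfluxDBErrors {
    // Matches "error":"..." and captures the string value, allowing escaped characters inside it.
    private static final Pattern ERROR_FIELD =
            Pattern.compile("\"error\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    private InfluxDBErrors() {}

    /** Builds the matching exception type for a non-2xx response. */
    static InfluxDBException fromResponse(int status, String body) {
        Matcher m = ERROR_FIELD.matcher(body == null ? "" : body);
        String message = m.find() ? m.group(1) : "HTTP " + status;
        if (status >= 400 && status < 500) return new InfluxDBClientException(status, message);
        if (status >= 500) return new InfluxDBServerException(status, message);
        return new InfluxDBException("unexpected status " + status + ": " + message);
    }
}
```

Callers can then catch InfluxDBServerException to decide on retries while treating InfluxDBClientException as a bug in the request.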

3.2. Partial writes

The current client does not account for partial writes. The client should receive error information only for the data points that failed to be written. This matters when batching mode is used.

Solution

Building on the exception hierarchy from the previous section, we would provide an additional method on the exception object that returns information about the data points that failed to be written.
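A minimal sketch of such an exception. PartialWriteException and getFailedPoints() are hypothetical names; in the proposal the class would extend InfluxDBException, but it extends RuntimeException here so the example stands alone. Failed points are kept as line-protocol strings and exposed as a read-only copy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical exception for partial writes; would extend InfluxDBException in the real hierarchy. */
public class PartialWriteException extends RuntimeException {
    private static final long serialVersionUID = 1L;

    private final List<String> failedPoints;

    public PartialWriteException(String message, List<String> failedPoints) {
        super(message);
        // defensive, read-only copy so handlers cannot mutate the reported failures
        this.failedPoints = Collections.unmodifiableList(new ArrayList<>(failedPoints));
    }

    /** Line-protocol strings of the rejected points; successfully written points are not included. */
    public List<String> getFailedPoints() {
        return failedPoints;
    }
}
```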

3.3. Handling of partially completed writes into a multi-node cluster

The problem of setting the consistency level was recently resolved:

https://github.com/influxdata/influxdb-java/pull/385

However, there is still the problem of propagating detailed error information to the client.

3.4. Batch Processing Issues

There was a request to handle errors during batch writes, and it has been fulfilled. However, the solution based on the BiConsumer interface is awkward and should be reworked so that it can carry the error information described above.

The user could also be notified not only of errors but also when points have been successfully written into InfluxDB.

Related information:

https://github.com/influxdata/influxdb-java/pull/319 https://github.com/influxdata/influxdb-java/issues/381
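A minimal sketch of a listener that could replace the BiConsumer handler. BatchListener, BatchWriter and flush are hypothetical names; the sketch assumes the underlying writer can report which points the server rejected. The listener is told about successfully written points and receives only the failed points in onError, matching the partial-write handling described in section 3.2.

```java
import java.util.ArrayList;
import java.util.List;

public final class BatchListenerSketch {
    private BatchListenerSketch() {}

    /** Hypothetical replacement for the BiConsumer-based batch error handler. */
    public interface BatchListener<P> {
        /** Called with the points that were written successfully (optional to implement). */
        default void onSuccess(List<P> written) {}

        /** Called with only the points that failed, plus the cause. */
        void onError(List<P> failed, Throwable cause);
    }

    /** Hypothetical writer: sends a batch and returns the points the server rejected. */
    @FunctionalInterface
    public interface BatchWriter<P> {
        List<P> write(List<P> batch) throws Exception;
    }

    /** Flushes one batch and reports successes and failures separately. */
    public static <P> void flush(List<P> batch, BatchWriter<P> writer, BatchListener<P> listener) {
        List<P> failed;
        try {
            failed = writer.write(batch);
        } catch (Exception e) {
            listener.onError(batch, e); // the whole request failed
            return;
        }
        if (failed.isEmpty()) {
            listener.onSuccess(batch);
            return;
        }
        List<P> written = new ArrayList<>(batch);
        written.removeAll(failed); // assumes points are distinct
        if (!written.isEmpty()) {
            listener.onSuccess(written);
        }
        listener.onError(failed, new IllegalStateException("partial write: " + failed.size() + " point(s) rejected"));
    }
}
```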

3.5. Error signaling for chunked query responses

The current interface does not force the user to handle errors that occur while processing chunked query responses.

The documentation even shows an example where error handling is completely missing.
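One way to make error handling hard to skip, sketched under assumptions: a hypothetical ChunkHandler interface in which onError is abstract (so it cannot be omitted, and the interface cannot be implemented by a single lambda), plus a hypothetical consume helper that routes any runtime failure while reading or handling a chunk to onError. Neither name is part of the library.

```java
import java.util.Iterator;

public final class ChunkedQuerySketch {
    private ChunkedQuerySketch() {}

    /** Hypothetical handler: all three methods are abstract, so errors cannot be silently ignored. */
    public interface ChunkHandler<T> {
        void onNext(T chunk);

        void onComplete();

        void onError(Throwable error);
    }

    /** Feeds chunks to the handler; a runtime failure while reading or handling a chunk goes to onError. */
    public static <T> void consume(Iterator<T> chunks, ChunkHandler<T> handler) {
        try {
            while (chunks.hasNext()) {
                handler.onNext(chunks.next());
            }
        } catch (RuntimeException e) {
            handler.onError(e);
            return; // onComplete is not called after a failure
        }
        handler.onComplete();
    }
}
```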

4. Cleanup of API write methods

All the improvements above will affect the API, especially the write methods. We should therefore avoid having too many write methods and make sure they behave predictably.

Still, we would keep the current API available for backward compatibility, possibly deprecating existing methods that turn out to be no longer necessary.
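A small sketch of the consolidation pattern: one canonical write method that legacy overloads delegate to, with the old overloads marked @Deprecated so existing code keeps compiling. WriteApiSketch and its signatures are hypothetical; it records what would be sent instead of talking to a server.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical writer illustrating one canonical write path plus deprecated legacy overloads. */
public class WriteApiSketch {
    private final List<String> sent = new ArrayList<>(); // stands in for the HTTP call

    /** Canonical write: every other overload ends up here, so behavior is defined in one place. */
    public void write(String database, String retentionPolicy, List<String> lines) {
        for (String line : lines) {
            sent.add(database + "." + retentionPolicy + " " + line);
        }
    }

    /**
     * Legacy single-point overload, kept for backward compatibility.
     * @deprecated use {@link #write(String, String, List)} instead
     */
    @Deprecated
    public void write(String database, String retentionPolicy, String line) {
        write(database, retentionPolicy, Collections.singletonList(line));
    }

    /** What would have been sent, for inspection in this sketch. */
    public List<String> sent() {
        return Collections.unmodifiableList(sent);
    }
}
```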

We would also fix the following issue:

https://github.com/influxdata/influxdb-java/issues/378

5. Observer publishing pattern

Right now the user has to build data points and push them to InfluxDB. In practice, the user is often just monitoring some process; instead of forcing them to implement the logic that watches for events, the client could do that for them. For example:

@Observable(measurement = "cpu")
public class Cpu {
    @Column(name = "time")
    private Instant time;
    @Column(name = "host", tag = true)
    private String hostname;
    @Column(name = "region", tag = true)
    private String region;
    @Column(name = "idle")
    private Double idle;
    @Column(name = "happydevop")
    private Boolean happydevop;
    @Column(name = "uptimesecs")
    private Long uptimeSecs;
    // getters (and setters if you need)
}

Cpu cpu = new Cpu();
influxDB.observe(cpu, 1000);

This would write the state of the cpu object into InfluxDB every 1000 ms.
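A minimal sketch of what could sit behind observe(cpu, 1000), using the JDK's ScheduledExecutorService. PeriodicObserverSketch is a hypothetical name; the Supplier takes a snapshot of the observed object and the Consumer stands in for building and writing the point (for example via annotation-based serialization as in section 1.2).

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper behind an observe(object, periodMillis) call. */
public final class PeriodicObserverSketch {
    private PeriodicObserverSketch() {}

    /**
     * Takes a snapshot every periodMillis and hands it to the sink (e.g. a point writer).
     * Cancel the returned future to stop observing.
     */
    public static <T> ScheduledFuture<?> observe(ScheduledExecutorService scheduler, Supplier<T> snapshot,
                                                 Consumer<T> sink, long periodMillis) {
        return scheduler.scheduleAtFixedRate(() -> {
            try {
                sink.accept(snapshot.get());
            } catch (RuntimeException e) {
                // scheduleAtFixedRate suppresses later runs after an uncaught exception,
                // so a single failed sample must not end the observation
                System.err.println("observation failed: " + e);
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }
}
```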


It would also be useful to support writing only the changes to the cpu object. The call would then be even simpler:

influxDB.observe(cpu);
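A minimal sketch of the change-detection part of such a call. ChangeObserverSketch is a hypothetical name, and how samples are triggered (polling, setter hooks, proxies) is left open. It assumes each sample is an immutable snapshot with a value-based equals(); comparing references to the same mutable cpu object would always report "unchanged".

```java
import java.util.Objects;
import java.util.function.Consumer;

/** Hypothetical filter that forwards a snapshot only when it differs from the previous one. */
public final class ChangeObserverSketch<T> {
    private final Consumer<T> sink; // e.g. builds and writes the data point
    private T last;                 // last snapshot that was forwarded
    private boolean hasLast;

    public ChangeObserverSketch(Consumer<T> sink) {
        this.sink = sink;
    }

    /** Returns true if the snapshot was forwarded, false if it equals the previous one. */
    public synchronized boolean sample(T snapshot) {
        if (hasLast && Objects.equals(last, snapshot)) {
            return false;
        }
        last = snapshot;
        hasLast = true;
        sink.accept(snapshot);
        return true;
    }
}
```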