Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Interval bounds field #217

Open
wants to merge 2 commits into
base: v2-DEV
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
44 changes: 37 additions & 7 deletions src/Intervals.jl
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ using Dates: AbstractDateTime, value, coarserperiod

import Base: ⊆, ⊇, ⊈, ⊉, union, union!, merge

# TODO: Drop these types in favour of symbols when we switch to extending IntervalSets.jl
abstract type Bound end
abstract type Bounded <: Bound end
struct Closed <: Bounded end
Expand All @@ -18,23 +19,52 @@ struct Unbounded <: Bound end

bound_type(x::Bool) = x ? Closed : Open

abstract type AbstractInterval{T, L <: Bound, R <: Bound} end
# Methods for convert between int and tuple representations for space efficiency
# (Open, Closed) is 16 bytes while the integer represenation is only 1 byte.
# TODO: Convert types to symbols when we switch to extending IntervalSets.jl
Copy link
Collaborator

@omus omus May 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been catching up with the motivation behind these changes. Is there rational why using symbols is a better approach than using types? Should IntervalSets.jl be updated to use types?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel like it matters either way. My preference is more focused on making things consistent between the two, so Intervals.jl can extend IntervalSets.jl. As it stands, there doesn't seem to be much advantage to having them be types. We don't leverage the type hierarchy much and we don't store any information in them, so it's just extra names/types to remember. The practical benefit of switching to symbols is that it means we may be able to have Intervals.jl extend IntervalSets.jl with only one breaking release in Intervals.jl. Going the other way around would require a breaking release to IntervalSets.jl and a lot more updates in the ecosystem.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the counter argument to using symbols is that by using types you can use the standard type inspection functions to determine what the names are used by this package. Dispatch on these types isn't used commonly outside of this package but I do think it's extremely useful for the internal logic in endpoint.jl.

bounds_int(l::Type{Open}, r::Type{Open}) = 0x00
bounds_int(l::Type{Closed}, r::Type{Open}) = 0x01
bounds_int(l::Type{Open}, r::Type{Closed}) = 0x02
bounds_int(l::Type{Closed}, r::Type{Closed}) = 0x03

bounds_types(x::Integer) = bounds_types(Val(UInt8(x)))
Comment on lines +25 to +30
Copy link
Collaborator

@omus omus May 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed opportunities to use bit masks:

Suggested change
bounds_int(l::Type{Open}, r::Type{Open}) = 0x00
bounds_int(l::Type{Closed}, r::Type{Open}) = 0x01
bounds_int(l::Type{Open}, r::Type{Closed}) = 0x02
bounds_int(l::Type{Closed}, r::Type{Closed}) = 0x03
bounds_types(x::Integer) = bounds_types(Val(UInt8(x)))
bounds_int(l::Type{Open}, r::Type{Open}) = 0x00 | 0x00
bounds_int(l::Type{Closed}, r::Type{Open}) = 0x10 | 0x00
bounds_int(l::Type{Open}, r::Type{Closed}) = 0x00 | 0x01
bounds_int(l::Type{Closed}, r::Type{Closed}) = 0x10 | 0x01
function bounds_types(x::UInt8)
L = if x & 0x20 == 0x20
Unbounded
elseif x & 0x10 == 0x10
Closed
else
Open
end
R = if x & 0x02 == 0x02
Unbounded
elseif x & 0x01 == 0x01
Closed
else
Open
end
return (L, R)
end

Can make this logic even cleaner with something like: https://github.com/JuliaTime/TimeZones.jl/blob/master/src/class.jl

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually my initial pass did this, but I concluded that the value dispatch was more declarative / readable and I didn't notice a performance difference either way... though maybe there's a specific case you're thinking of?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just from a reasoning standpoint it seems sensible to denote each of the 8-bits for a specific purpose. Your values from 0x00 - 0x03 follow that logic but the unbounded values do not. This makes it difficult to reason into what each bit does and use bitwise operators to determine information about bounds.

I'm fine with using Val dispatch if there isn't a performance issue.

bounds_types(::Val{0x00}) = (Open, Open)
bounds_types(::Val{0x01}) = (Closed, Open)
bounds_types(::Val{0x02}) = (Open, Closed)
bounds_types(::Val{0x03}) = (Closed, Closed)

# Extension for backwards support of Unbounded endpoints to avoid changing existing logic
# TODO: Drop these in favour of the approach in IntervalSets.jl
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like an approach has yet to be worked out: JuliaMath/IntervalSets.jl#123

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the question is whether it should be the responsibility of either IntervalSets.jl or Intervals.jl to handle this with yet another custom type. AFAICT, bounded-ness is just a question of whether an endpoint value isfinite which we already have an API for. My preference would be for that API to be handle independently of intervals like in Infinity.jl or Infinities.jl.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree that using isfinite on endpoints does seem like a better interface. From what I recall from the Infinity.jl work the parts where this breaks down is that:

  1. Defining Infinite{T} seems like a great idea for making any type into something that supports infinity but you lose the proper type hierarchy which makes this break down in multiple dispatch. Ideally we'd want to define Infinite{T} <: supertype(T) to make this work.
  2. With 1 being infeasible we're left with making specific types to properly define supertypes. This results in not all types having an alternative infinite type.

This is where it started to make more sense to have the intervals to support infinity themselves. For example Interval{Char,Closed,Unbounded}.

An alternative strategy could be something like:

struct Interval{T}
    first::Union{T,NegInf}
    last::Union{T,PosInf}
end

but I doubt that would be an improvement over the v1 setup.

bounds_int(l::Type{Open}, r::Type{Unbounded}) = 0x04
bounds_int(l::Type{Closed}, r::Type{Unbounded}) = 0x05
bounds_int(l::Type{Unbounded}, r::Type{Open}) = 0x06
bounds_int(l::Type{Unbounded}, r::Type{Closed}) = 0x07
bounds_int(l::Type{Unbounded}, r::Type{Unbounded}) = 0x08
bounds_types(::Val{0x04}) = (Open, Unbounded)
bounds_types(::Val{0x05}) = (Closed, Unbounded)
bounds_types(::Val{0x06}) = (Unbounded, Open)
bounds_types(::Val{0x07}) = (Unbounded, Closed)
bounds_types(::Val{0x08}) = (Unbounded, Unbounded)

abstract type AbstractInterval{T} end

Base.eltype(::AbstractInterval{T}) where {T} = T
Base.broadcastable(x::AbstractInterval) = Ref(x)
bounds_types(x::AbstractInterval{T,L,R}) where {T,L,R} = (L, R)
# Subtypes should implement:
# 1. first and last accessors
# 2. bounds_integer and bounds_types accessor

include("isfinite.jl")
include("endpoint.jl")
include("interval.jl")
include("interval_sets.jl")
# include("interval_sets.jl")
include("anchoredinterval.jl")
include("parse.jl")
include("description.jl")
include("plotting.jl")
include("docstrings.jl")
include("deprecated.jl")
include("compat.jl")
#include("plotting.jl")
#include("docstrings.jl")
#include("deprecated.jl")
#include("compat.jl")

export Bound,
Closed,
Expand Down
17 changes: 10 additions & 7 deletions src/anchoredinterval.jl
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@ AnchoredInterval{Minute(5), DateTime, Closed, Closed}(DateTime("2016-08-11T12:30

See also: [`Interval`](@ref), [`HE`](@ref), [`HB`](@ref)
"""
struct AnchoredInterval{P, T, L <: Bounded, R <: Bounded} <: AbstractInterval{T,L,R}
struct AnchoredInterval{P, T, L <: Bounded, R <: Bounded} <: AbstractInterval{T}
anchor::T

function AnchoredInterval{P,T,L,R}(anchor::T) where {P, T, L <: Bounded, R <: Bounded}
Expand Down Expand Up @@ -174,6 +174,9 @@ end
anchor(interval::AnchoredInterval) = interval.anchor
span(interval::AnchoredInterval{P}) where P = abs(P)

bounds_int(interval::AnchoredInterval{P,T,L,R}) where {P,T,L,R} = bounds_int(L, R)
bounds_types(interval::AnchoredInterval{P,T,L,R}) where {P,T,L,R} = (L, R)

##### CONVERSION #####

# Allows an interval to be converted to a scalar when the set contained by the interval only
Expand All @@ -197,11 +200,11 @@ function Base.convert(::Type{T}, interval::AnchoredInterval{P,T}) where {P,T}
end

function Base.convert(::Type{Interval}, interval::AnchoredInterval{P,T,L,R}) where {P,T,L,R}
return Interval{T,L,R}(first(interval), last(interval))
return Interval{T}(first(interval), last(interval), (L, R))
end

function Base.convert(::Type{Interval{T}}, interval::AnchoredInterval{P,T,L,R}) where {P,T,L,R}
return Interval{T,L,R}(first(interval), last(interval))
return Interval{T}(first(interval), last(interval), (L, R))
end

# Conversion methods which currently aren't needed but could prove useful. Commented out
Expand All @@ -221,18 +224,18 @@ function Base.convert(::Type{AnchoredInterval{P}}, interval::Interval{T}) where
end
=#

function Base.convert(::Type{AnchoredInterval{Ending}}, interval::Interval{T,L,R}) where {T,L,R}
function Base.convert(::Type{AnchoredInterval{Ending}}, interval::Interval{T}) where {T,L,R}
if !isbounded(interval)
throw(ArgumentError("Unable to represent a non-bounded interval using a `AnchoredInterval`"))
end
AnchoredInterval{-span(interval), T, L, R}(last(interval))
AnchoredInterval{-span(interval), T, bounds_types(interval)...}(last(interval))
end

function Base.convert(::Type{AnchoredInterval{Beginning}}, interval::Interval{T,L,R}) where {T,L,R}
function Base.convert(::Type{AnchoredInterval{Beginning}}, interval::Interval{T}) where {T,L,R}
if !isbounded(interval)
throw(ArgumentError("Unable to represent a non-bounded interval using a `AnchoredInterval`"))
end
AnchoredInterval{span(interval), T, L, R}(first(interval))
AnchoredInterval{span(interval), T, bounds_types(interval)...}(first(interval))
end

##### DISPLAY #####
Expand Down
15 changes: 13 additions & 2 deletions src/endpoint.jl
Original file line number Diff line number Diff line change
Expand Up @@ -33,8 +33,19 @@ const RightEndpoint{T,B} = Endpoint{T, Right, B} where {T,B <: Bound}
LeftEndpoint{B}(ep::T) where {T,B} = LeftEndpoint{T,B}(ep)
RightEndpoint{B}(ep::T) where {T,B} = RightEndpoint{T,B}(ep)

LeftEndpoint(i::AbstractInterval{T,L,R}) where {T,L,R} = LeftEndpoint{T,L}(L !== Unbounded ? first(i) : nothing)
RightEndpoint(i::AbstractInterval{T,L,R}) where {T,L,R} = RightEndpoint{T,R}(R !== Unbounded ? last(i) : nothing)
# Slightly awkward conversion from intervals to endpoints which can still support 'Unbounded'
# NOTE: We're slowly dropping the Unbounded type in favour of using `Infinity`
function LeftEndpoint(i::AbstractInterval{T}) where {T}
L, _ = bounds_types(i)
val = L !== Unbounded ? first(i) : nothing
LeftEndpoint{T, L}(val)
end

function RightEndpoint(i::AbstractInterval{T}) where {T}
_, R = bounds_types(i)
val = R !== Unbounded ? last(i) : nothing
return RightEndpoint{T, R}(val)
end

endpoint(x::Endpoint) = isbounded(x) ? x.endpoint : nothing
bound_type(x::Endpoint{T,D,B}) where {T,D,B} = B
Expand Down
Loading