Tokenization
template <typename input_t, typename delimiter_t, typename config_t = decltype(std::ignore)>
requires forward_range_concept<input_t> && predicate_concept<delimiter_t>
inline auto
split_by(input_t const & input,
delimiter_t && delimiter,
config_t && config) // optional parameter
{
/* implementation detail*/
return // optional<view<view<sequence_type>>>
}
This function operates on a forward_range and returns an optional view of views. The optional is empty if the sequence could not be split, for example because the input is empty. Otherwise the optional holds a view of views, so no sequence data is copied until the user explicitly assigns the return value to a container type that owns the data. This is also why input_range_concept is not sufficient: a single-pass input range gives no guarantee that data already seen during tokenization is still available while iteration through the input continues.
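The non-copying behaviour described above can be illustrated with the standard library alone. The following is only a minimal sketch under stated assumptions: `split_by_sketch` is a hypothetical name, `std::string_view` stands in for the inner views, and an eagerly filled `std::vector` stands in for the lazy outer view that the real interface would return. The optional config parameter is left out.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical stand-in for split_by (not the SeqAn3 implementation).
// Splits `input` at every character for which `is_delimiter` returns true.
// Each token is a std::string_view into `input`, so no sequence data is copied.
template <typename predicate_t>
std::optional<std::vector<std::string_view>>
split_by_sketch(std::string_view input, predicate_t && is_delimiter)
{
    if (input.empty())
        return std::nullopt; // nothing to split: the optional stays empty

    std::vector<std::string_view> tokens;
    std::size_t token_begin = 0;
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        if (is_delimiter(input[i]))
        {
            tokens.push_back(input.substr(token_begin, i - token_begin));
            token_begin = i + 1;
        }
    }
    tokens.push_back(input.substr(token_begin)); // trailing token, possibly empty
    return tokens;
}
```

Because the tokens only refer to the input, the caller has to keep the input alive for as long as the tokens are used. Copying happens only when a token is assigned to an owning type, e.g. `std::string owned{tokens[0]};`.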
namespace seqan3::action
{
constexpr ranges::action< crop_outer_fn > crop_outer { /* unspecified */ }
}
namespace seqan3::view
{
constexpr ranges::view< crop_outer_fn > crop_outer { /* unspecified */ }
}
Modeling these kinds of functions as either views or actions would be desirable. How exactly this has to be implemented remains to be seen.
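To make the view/action distinction concrete, here is a small sketch that does not use range-v3. It assumes that crop_outer trims leading and trailing elements satisfying a predicate, which this page does not spell out. The names `crop_outer_view` and `crop_outer_action` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// View-like flavour (assumed semantics): returns a non-owning sub-view with
// leading and trailing elements that satisfy `pred` removed. The input itself
// is not modified.
template <typename predicate_t>
std::string_view crop_outer_view(std::string_view input, predicate_t && pred)
{
    std::size_t first = 0;
    while (first < input.size() && pred(input[first]))
        ++first;

    std::size_t last = input.size();
    while (last > first && pred(input[last - 1]))
        --last;

    return input.substr(first, last - first);
}

// Action-like flavour (assumed semantics): performs the same crop, but
// modifies the owning container in place.
template <typename predicate_t>
void crop_outer_action(std::string & input, predicate_t && pred)
{
    std::string_view const cropped = crop_outer_view(input, pred);
    // Save offset and length first: `cropped` must not be used after erase().
    std::size_t const offset = static_cast<std::size_t>(cropped.data() - input.data());
    std::size_t const length = cropped.size();
    input.erase(offset + length); // drop the trailing part
    input.erase(0, offset);       // drop the leading part
}
```

The view flavour is cheap and composable but depends on the lifetime of the underlying data, while the action flavour changes the container eagerly and needs a mutable, owning range.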
namespace seqan3::action
{
constexpr ranges::action< crop_before_last_fn > crop_before_last { /* unspecified */ }
}
namespace seqan3::view
{
constexpr ranges::view< crop_before_last_fn > crop_before_last { /* unspecified */ }
}
Similar to crop_outer.
namespace seqan3::action
{
constexpr ranges::action< crop_before_first_fn > crop_before_first { /* unspecified */ }
}
namespace seqan3::view
{
constexpr ranges::view< crop_before_first_fn > crop_before_first { /* unspecified */ }
}
Similar to crop_outer.
namespace seqan3::action
{
constexpr ranges::action< crop_after_last_fn > crop_after_last { /* unspecified */ }
}
namespace seqan3::view
{
constexpr ranges::view< crop_after_last_fn > crop_after_last { /* unspecified */ }
}
namespace seqan3::action
{
constexpr ranges::action< crop_after_first_fn > crop_after_first { /* unspecified */ }
}
namespace seqan3::view
{
constexpr ranges::view< crop_after_first_fn > crop_after_first { /* unspecified */ }
}
template <typename input_t, typename predicate_t>
requires forward_range_concept<input_t> && predicate_concept<predicate_t>
inline auto
find_last(input_t const & input,
predicate_t && p)
{
/* unspecified */
return iterator_t<input_t>{begin(input)};
}
find_last is just an algorithm that can be optimised when working on buffered streams, as processing the data in chunks might be more efficient there. However, right now it is not used anywhere in SeqAn. For standard containers, it could simply be replaced with:
view::find_if(view::reverse(buffer), seqan3::equals_char<','>());
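With only the standard library, the same reverse search can be sketched as follows. `find_last_sketch` is a hypothetical name, and returning `end()` when nothing matches is an assumption, since the page leaves the return value unspecified. The sketch uses reverse iterators, so it requires a bidirectional container such as `std::string`.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Hypothetical helper: returns an iterator to the last element of `buffer`
// that satisfies `pred`, or buffer.end() if no element matches (assumed
// convention).
template <typename predicate_t>
std::string::const_iterator
find_last_sketch(std::string const & buffer, predicate_t && pred)
{
    auto rit = std::find_if(buffer.rbegin(), buffer.rend(), pred);
    if (rit == buffer.rend())
        return buffer.end();

    // base() points one past the element the reverse iterator refers to,
    // so step back once to get a forward iterator to the match itself.
    return std::prev(rit.base());
}
```

The conversion through `base()` is the step that is easy to get wrong: without `std::prev`, the returned iterator would point to the element after the match.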
template <typename input_t, typename predicate_t>
requires forward_range_concept<input_t> && predicate_concept<predicate_t>
inline auto
find_first(input_t const & input,
predicate_t && p)
{
/* unspecified */
return iterator_t<input_t>{begin(input)};
}
find_first is just an algorithm that can be optimised when working on buffered streams, as processing the data in chunks might be more efficient there. However, right now it is used in only one place in SeqAn, where it operates on a simple CharString buffer. For standard containers, it could simply be replaced with:
view::find_if(buffer, seqan3::equals_char<','>());
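For comparison, a standard-library-only version of this forward search might look like the sketch below. All names are hypothetical: `equals_char_sketch` merely imitates how a functor like `seqan3::equals_char<','>` could behave, and `first_comma_offset` is only a small usage helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical predicate imitating a compile-time character comparison.
template <char value>
struct equals_char_sketch
{
    constexpr bool operator()(char c) const noexcept { return c == value; }
};

// Hypothetical helper: returns an iterator to the first element of `input`
// that satisfies `pred`, or the end iterator if no element matches.
// A single forward pass is sufficient, so any forward range works.
template <typename input_t, typename predicate_t>
auto find_first_sketch(input_t const & input, predicate_t && pred)
{
    using std::begin;
    using std::end;
    return std::find_if(begin(input), end(input), pred);
}

// Usage helper: offset of the first comma, or buffer.size() if there is none.
inline std::size_t first_comma_offset(std::string const & buffer)
{
    auto it = find_first_sketch(buffer, equals_char_sketch<','>{});
    return static_cast<std::size_t>(std::distance(buffer.begin(), it));
}
```

For an in-memory container, this thin wrapper over `std::find_if` is all that is needed. A dedicated find_first would only pay off for the chunked, buffered-stream case mentioned above.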