Hi Kronos team,
I am experimenting with Kronos tokenizer embeddings for market regime detection. On continuously traded markets such as BTCUSDT on 15-minute bars, the latent regimes look relatively stable: clusters have decent persistence, transition matrices are structured, and the latent-velocity and persistent-instability measures show some out-of-sample relationship with future drawdown risk.
However, when I apply a similar pipeline to a discontinuously traded market such as CSI500 / Chinese A-share daily data, the regimes become much less stable. Cluster persistence drops, regime labels switch more frequently, and the latent manifold appears less coherent.
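For concreteness, persistence and transition structure can be measured from per-bar cluster labels roughly as follows. This is a minimal illustrative sketch: the function names are hypothetical and not part of Kronos, and it assumes one integer label in 0..n_states-1 per bar, in time order.

```python
def transition_matrix(labels, n_states):
    """Row-normalised empirical transition matrix: P[i][j] ~ P(label[t+1] = j | label[t] = i)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States that never occur (except possibly last) get an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def persistence(labels):
    """Fraction of consecutive bar pairs that keep the same regime label."""
    if len(labels) < 2:
        raise ValueError("need at least two labels")
    same = sum(1 for a, b in zip(labels[:-1], labels[1:]) if a == b)
    return same / (len(labels) - 1)


def mean_run_length(labels):
    """Average number of consecutive bars spent in a regime before switching."""
    runs, length = [], 1
    for a, b in zip(labels[:-1], labels[1:]):
        if a == b:
            length += 1
        else:
            runs.append(length)
            length = 1
    runs.append(length)
    return sum(runs) / len(runs)


labels = [0, 0, 0, 1, 1, 0, 0, 2, 2, 2]
print(persistence(labels), mean_run_length(labels))
print(transition_matrix(labels, 3))
```

Lower persistence and a weaker diagonal in the transition matrix are what "regime labels switch more frequently" looks like in these terms.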
My current hypothesis is that the discontinuous market structure is degrading the tokenizer/encoder representation, for example because:
- bars separated by overnight gaps are treated as ordinary adjacent bars;
- weekends and holidays create irregular time intervals;
- A-share sessions have hard open/close boundaries;
- limit-up/limit-down behavior may distort OHLCV patterns;
- daily bars may compress away too much session-specific (intraday) information;
- raw OHLCV scale and turnover/amount fields may behave differently than they do in continuously traded crypto markets.
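To make the first two points concrete, one possible session-aware preprocessing step splits each daily bar's close-to-close move into an overnight gap and an intraday return, records the calendar gap to the previous bar, and z-scores features within a window. This is a hypothetical sketch: the function and field names are illustrative and are not Kronos's input format. It assumes daily bars sorted by date with strictly positive prices, and it does not handle price limits or amount/turnover fields.

```python
import math
from datetime import date


def session_aware_features(bars):
    """bars: list of dicts with 'date' (datetime.date), 'open', 'high', 'low', 'close',
    sorted by date. Returns one feature dict per bar, skipping the first bar,
    which has no previous close."""
    feats = []
    for prev, cur in zip(bars[:-1], bars[1:]):
        feats.append({
            "date": cur["date"],
            # Close-to-open move across the non-trading interval, kept as its own feature.
            "overnight_gap": math.log(cur["open"] / prev["close"]),
            # Open-to-close move within the trading session.
            "intraday_ret": math.log(cur["close"] / cur["open"]),
            # Scale-free bar range.
            "log_range": math.log(cur["high"] / cur["low"]),
            # Calendar days since the previous bar: 1 between consecutive weekdays,
            # 3 over a weekend, more across holidays.
            "days_since_prev": (cur["date"] - prev["date"]).days,
        })
    return feats


def zscore_window(values):
    """Standardise one feature within a window; a constant window maps to zeros."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std if std > 0 else 0.0 for v in values]


bars = [
    {"date": date(2024, 1, 5), "open": 100.0, "high": 102.0, "low": 99.0, "close": 101.0},
    {"date": date(2024, 1, 8), "open": 103.0, "high": 104.0, "low": 100.0, "close": 102.0},
]
print(session_aware_features(bars))
```

Because `overnight_gap + intraday_ret` equals the close-to-close log return, the split loses no information; it only makes the non-trading interval explicit rather than folding it into an apparently ordinary bar-to-bar move.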
I wanted to ask:
- For non-24/7 markets, do you recommend explicitly encoding time gaps, sessions, holidays, or trading calendar information before passing data into Kronos?
- Should overnight gaps be treated as a separate feature rather than as an ordinary close-to-open move?
- For daily equity index data, would you recommend using raw OHLCV/amount, normalized returns/ranges, or the same input format as your examples?
- Is the Kronos tokenizer expected to be stable on discontinuously traded markets, or is it mainly optimized for continuous K-line sequences?
- Are there recommended window lengths or preprocessing choices for equity index daily or 60-minute data?
- Would using intraday bars with session-aware preprocessing be preferable to daily bars for regime detection?
- If regime labels become unstable, should we tune the representation extraction step, the clustering method, or the market-data preprocessing first?
Any guidance on how to adapt Kronos embeddings for discontinuous markets would be very helpful.