Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ATR introduces look-ahead bias #963

Open
EladioRocha opened this issue Apr 15, 2023 · 6 comments · May be fixed by #1230
Open

ATR introduces look-ahead bias #963

EladioRocha opened this issue Apr 15, 2023 · 6 comments · May be fixed by #1230
Labels
bug Something isn't working

Comments

@EladioRocha
Copy link

EladioRocha commented Apr 15, 2023

Expected Behavior

When we trade with trailing stop loss the ATR is calculated. Fo example suppose that we calculate the ATR for 100 periods the first 100 periods should be removed because they don't have values and if you try to open a trade in the period 10 there isn't an available ATR value therefore isn't possible to determine the stop loss. So in that case all the values that contains NaN values should be removed.

Actual Behavior

In the code when the ATR is calculated the values are backward filled with the future values and according to our previous example if a trade is opened in the period 10, then the ATR used is the ATR in the period 101

def set_atr_periods(self, periods: int = 100):
    """
    Set the lookback period for computing ATR. The default value
    of 100 ensures a _stable_ ATR.
    """
    h, l, c_prev = self.data.High, self.data.Low, pd.Series(self.data.Close).shift(1)
    tr = np.max([h - l, (c_prev - h).abs(), (c_prev - l).abs()], axis=0)
    atr = pd.Series(tr).rolling(periods).mean().bfill().values  # Is look-ahead bias introduced using bfill?
    
    self.__atr = atr

Additional info

image

It maintins the ATR calculation and changes until the period 101

My code is the following

class MyStrategy(TrailingStrategy):
    size = 0.3
    atr = 3
    atr_period = 100

    def init(self):
        super().init()
        self.set_trailing_sl(n_atr=self.atr)
        self.set_atr_periods(periods=self.atr_period)

    def next(self):
        super().next()
        
        if not self.position.is_long:
            self.buy(size=self.size)
@kernc
Copy link
Owner

kernc commented May 3, 2023

True, the .bfill() call here introduces a brief look-ahead bias.

atr = pd.Series(tr).rolling(periods).mean().bfill().values

The assumption was that the ATR for a trailing strategy would be fairly stable.

PR fix welcome!

@kernc kernc added the bug Something isn't working label May 3, 2023
@kernc kernc changed the title The atr introduce look ahead bias or is my interpretation incorrect? ATR introduces look-ahead bias Feb 19, 2025
@bravegag
Copy link

bravegag commented Feb 23, 2025

A simple fix is to do a ffill before the bfill like this:

atr = pd.Series(tr).rolling(periods).mean().ffill().bfill().values 

this ensures that only the leading (oldest prices chronologically) are backfilled for the purpose of calculating HPR.

Also, you may want to consider using the min_periods argument in the rolling, otherwise and if there is missing price data, it will generate a lot of NAs along the way. For the mean function I would recommend setting it to 1 (meaning a valid non-NA mean result will be provisioned when there is at least 1 non-NA value within each window of 100 prices) so that NAs are avoided e.g.

atr = pd.Series(tr).rolling(periods, min_periods=1).mean().ffill().bfill().values 

Last but not least, possibly the f-b fills need to be moved to before the rolling mean calculation, not sure what the data is but if it is prices then it would make more sense:

atr = pd.Series(tr).ffill().bfill().rolling(periods, min_periods=1).mean().values 

I need to get more familiarity with the backtesting code before submitting PRs :)

bravegag added a commit to bravegag/backtesting.py that referenced this issue Feb 26, 2025
@bravegag
Copy link

@EladioRocha thanks a lot for reporting this. Can you please double-check my PR #1230 to make sure that it works for you now?

@kernc
Copy link
Owner

kernc commented Feb 27, 2025

this ensures that only the leading (oldest prices chronologically) are backfilled for the purpose of calculating HPR.

@bravegag As far as I understand, ATR is computed from true range, which is computed from OHLC data, which is checked beforehand to contain no NaNs ...

hi, lo, c_prev = self.data.High, self.data.Low, pd.Series(self.data.Close).shift(1)
tr = np.max([hi - lo, (c_prev - hi).abs(), (c_prev - lo).abs()], axis=0)
atr = pd.Series(tr).rolling(periods).mean().bfill().values

if data[['Open', 'High', 'Low', 'Close']].isnull().values.any():
raise ValueError('Some OHLC values are missing (NaN). '
'Please strip those lines with `df.dropna()` or '
'fill them in with `df.interpolate()` or whatever.')

Therefore, there should be no NaNs in ATR other than the leading ones! Which are bfill()-ed already, hence this bug report. 🤔

@bravegag
Copy link

bravegag commented Feb 27, 2025

OK then if you are fine, close the issue and the PR, my fix just makes sure there won't be any LAB without making any assumptions. Also the bfill should come before the rolling. Otherwise any leading NAs will introduce exactly 99 (100-1) NA prices ahead due to the default min_periods argument to rolling; and the mean, due to the bfill, will be unnecessarily duplicated for all those steps.

@kernc
Copy link
Owner

kernc commented Feb 27, 2025

pd.Series(tr).rolling(periods) introduces periods - 1 NaNs in the front warm-up period. The issue is these (and only these NaNs from rolling) are bfilled instead of e.g. being left as NaN values. This bfilling introduces a look-ahead bias for the warm-up period, more noticeable with short ATR periods.

@kernc kernc linked a pull request Mar 11, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants