Estimating slow correlations from samples of limited duration
TL;DR: Many biological time-series have slow correlation timescales, but default estimators can greatly underestimate these correlations, even if you have enough data to resolve them. Normalizing by a triangle function removes the bias, but at the cost of potentially yielding an "invalid" correlation function estimate.

Science is replete with time-series data, and we'd like to analyze these data to extract meaningful insight about the structure and function of the systems that produce them.
If you already know what correlation functions are, skip directly to the estimation section.
Auto- and cross-correlation functions
Suppose we have a stationary, univariate random process \( X(t) \sim P[X(t)] \), and suppose we know the distribution \( P[X(t)] \). Its autocorrelation function \( R_x(\tau) \) (sometimes written \( R_{xx}(\tau) \)) is defined as
\[ R_x(\tau) \equiv \mathbb{E}[X(t)X(t+\tau)] \]
where \( \tau \) is the time lag between two measurements and \( \mathbb{E}[\dots] \) is the expectation over samples \( X(t) \). This function \( R_x(\tau) \) gives the expected product of the signal measured at two different timepoints separated by \( \tau \). Stationarity means that \( R_x(\tau) \) is only a function of \( \tau \) and not of \( t \).
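As a rough illustration of this definition, the sketch below simulates many independent realizations of a toy stationary process and averages the product \( X(t)X(t+\tau) \) across them. The AR(1) model, its parameter `phi`, and the helper `ensemble_autocorr` are assumptions chosen for illustration and do not come from the text. If the process is stationary, the estimates at an early and a late \( t \) should agree with each other, and with the analytic value for this toy model, up to sampling noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy model (not from the text): a stationary AR(1) process
# X[t+1] = phi * X[t] + noise, with unit-variance Gaussian noise.
phi = 0.9
n_trials, n_steps = 20000, 60
stationary_var = 1.0 / (1.0 - phi**2)

# Draw the initial value from the stationary distribution so that every
# realization is stationary from the first time step onward.
x = np.empty((n_trials, n_steps))
x[:, 0] = rng.normal(0.0, np.sqrt(stationary_var), size=n_trials)
for t in range(1, n_steps):
    x[:, t] = phi * x[:, t - 1] + rng.normal(size=n_trials)


def ensemble_autocorr(x, t, tau):
    """Estimate E[X(t) X(t + tau)] by averaging over independent realizations."""
    return np.mean(x[:, t] * x[:, t + tau])


tau = 5
r_early = ensemble_autocorr(x, t=3, tau=tau)
r_late = ensemble_autocorr(x, t=40, tau=tau)
r_exact = stationary_var * phi**tau  # analytic R_x(tau) for this toy model
print(r_early, r_late, r_exact)
```

Averaging over realizations is only possible here because the process is simulated. With a single recorded time-series, the expectation has to be approximated some other way.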
The autocovariance function \( C_x(\tau) \) is related to \( R_x(\tau) \) via
\[ C_x(\tau) \equiv \mathbb{E}[(X(t) - \mu_x)(X(t+\tau) - \mu_x)] = R_x(\tau) - \mu_x^2 \]
i.e. it is the autocorrelation function of the signal after its mean \( \mu_x \) has been subtracted off. This function \( C_x(\tau) \) specifies more explicitly how the values of the signal measured at two timepoints separated by \( \tau \) covary relative to the signal mean \( \mu_x \). For zero-mean signals, however, \( C_x(\tau) = R_x(\tau) \).
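To see where this identity comes from, expand the product and use linearity of the expectation together with stationarity, which gives \( \mathbb{E}[X(t)] = \mathbb{E}[X(t+\tau)] = \mu_x \):
\[ C_x(\tau) = \mathbb{E}[X(t)X(t+\tau)] - \mu_x\mathbb{E}[X(t)] - \mu_x\mathbb{E}[X(t+\tau)] + \mu_x^2 = R_x(\tau) - \mu_x^2 - \mu_x^2 + \mu_x^2 = R_x(\tau) - \mu_x^2 \]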
A key property of an autocorrelation function in general is that \( |R_x(\tau)| \leq R_x(0) \), i.e. the autocorrelation function is bounded by \( R_x(0) \). And for a zero-mean signal \( R_x(0) = C_x(0) \) is just the variance \( \text{Var}[X(t)] \), which is also independent of \( t \).
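One way to see the bound, assuming the process has finite second moments, is to apply the Cauchy-Schwarz inequality to the two random variables \( X(t) \) and \( X(t+\tau) \):
\[ |R_x(\tau)| = |\mathbb{E}[X(t)X(t+\tau)]| \leq \sqrt{\mathbb{E}[X(t)^2]\,\mathbb{E}[X(t+\tau)^2]} = \sqrt{R_x(0)\,R_x(0)} = R_x(0) \]
where the second-to-last step uses stationarity, \( \mathbb{E}[X(t)^2] = \mathbb{E}[X(t+\tau)^2] = R_x(0) \).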
(Note: the autocovariance \( C_x(\tau) \) equals the covariance between \( X(t) \) and \( X(t+\tau) \), but the autocorrelation \( R_x(\tau) \) does not equal the Pearson correlation \( R \) between \( X(t) \) and \( X(t+\tau) \). The Pearson correlation is the covariance scaled by the geometric mean of the variances, which restricts its value to lie in \( [-1, 1] \), whereas the values of \( R_x(\tau) \) have no such restriction.)
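The distinction in this note can be checked numerically. The sketch below is a minimal illustration under assumed conditions: a toy AR(1) process fluctuating around a nonzero mean, with `phi`, `mu`, and `tau` chosen arbitrarily and not taken from the text. It computes the autocorrelation, the autocovariance, and the Pearson correlation for the same pair of timepoints. For this toy model the Pearson correlation should come out near `phi**tau`, while the autocorrelation is dominated by the squared mean.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy model (not from the text): AR(1) fluctuations around a
# nonzero mean mu, started in the stationary distribution.
phi, mu = 0.8, 10.0
n_trials, tau = 50000, 3
stationary_sd = np.sqrt(1.0 / (1.0 - phi**2))

x_t = rng.normal(0.0, stationary_sd, size=n_trials)  # fluctuation at time t
x_lag = x_t.copy()
for _ in range(tau):  # advance each realization by tau steps
    x_lag = phi * x_lag + rng.normal(size=n_trials)
x_t, x_lag = x_t + mu, x_lag + mu  # add the mean back

# Autocorrelation R_x(tau): expected product, mean not removed.
autocorr = np.mean(x_t * x_lag)
# Autocovariance C_x(tau): expected product of deviations from the mean.
autocov = np.mean((x_t - x_t.mean()) * (x_lag - x_lag.mean()))
# Pearson correlation: covariance scaled by both standard deviations.
pearson = np.corrcoef(x_t, x_lag)[0, 1]
print(autocorr, autocov, pearson)
```

In this example the autocorrelation and autocovariance differ by roughly `mu**2`, and only the Pearson correlation is confined to \( [-1, 1] \).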
