Autoregressive integrated moving average

From Wikipedia, the free encyclopedia

In statistics and econometrics, and in particular in time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. Both of these models are fitted to time series data, either to better understand the data or to forecast future points in the series. ARIMA models are applied in some cases where data show evidence of non-stationarity in the sense of the expected value (but not of the variance/autocovariance), where an initial differencing step (corresponding to the "integrated" part of the model) can be applied one or more times to eliminate the non-stationarity of the mean function (i.e., the trend).[1] When seasonality shows in a time series, seasonal differencing[2] can be applied to eliminate the seasonal component. Since the ARMA model, according to Wold's decomposition theorem,[3][4][5] is theoretically sufficient to describe a regular (a.k.a. purely nondeterministic[5]) wide-sense stationary time series, we are motivated to make a non-stationary time series stationary, e.g., by using differencing, before we can use the ARMA model.[6] Note that if the time series contains a predictable sub-process (a.k.a. a pure sine or complex-valued exponential process[4]), the predictable component is treated as a non-zero-mean but periodic (i.e., seasonal) component in the ARIMA framework, so that it is eliminated by seasonal differencing.

The autoregressive (AR) part of ARIMA indicates that the evolving variable of interest is regressed on its own lagged (i.e., prior) values. The moving average (MA) part indicates that the regression error is actually a linear combination of error terms whose values occurred contemporaneously and at various times in the past.[7] The I (for "integrated") indicates that the data values have been replaced with the difference between their values and the previous values (and this differencing process may have been performed more than once). The purpose of each of these features is to make the model fit the data as well as possible.

Non-seasonal ARIMA models are generally denoted ARIMA(p, d, q), where the parameters p, d, and q are non-negative integers: p is the order (number of time lags) of the autoregressive model, d is the degree of differencing (the number of times the data have had past values subtracted), and q is the order of the moving-average model. Seasonal ARIMA models are usually denoted ARIMA(p, d, q)(P, D, Q)_m, where m refers to the number of periods in each season, and the uppercase P, D, Q refer to the autoregressive, differencing, and moving average terms for the seasonal part of the ARIMA model.[8][2]

When two out of the three terms are zero, the model may be referred to based on the non-zero parameter, dropping "AR", "I" or "MA" from the acronym describing the model. For example, ARIMA(1, 0, 0) is AR(1), ARIMA(0, 1, 0) is I(1), and ARIMA(0, 0, 1) is MA(1).

ARIMA models can be estimated following the Box–Jenkins approach.

Definition

Given time series data X_t, where t is an integer index and the X_t are real numbers, an ARMA(p′, q) model is given by

X_t − α_1 X_{t−1} − ⋯ − α_{p′} X_{t−p′} = ε_t + θ_1 ε_{t−1} + ⋯ + θ_q ε_{t−q}

or equivalently by

(1 − Σ_{i=1}^{p′} α_i L^i) X_t = (1 + Σ_{i=1}^{q} θ_i L^i) ε_t

where L is the lag operator, the α_i are the parameters of the autoregressive part of the model, the θ_i are the parameters of the moving average part and the ε_t are error terms. The error terms ε_t are generally assumed to be independent, identically distributed variables sampled from a normal distribution with zero mean.

Assume now that the polynomial (1 − Σ_{i=1}^{p′} α_i L^i) has a unit root (a factor (1 − L)) of multiplicity d. Then it can be rewritten as:

1 − Σ_{i=1}^{p′} α_i L^i = (1 − Σ_{i=1}^{p′−d} φ_i L^i)(1 − L)^d

An ARIMA(p, d, q) process expresses this polynomial factorisation property with p = p′ − d, and is given by:

(1 − Σ_{i=1}^{p} φ_i L^i)(1 − L)^d X_t = (1 + Σ_{i=1}^{q} θ_i L^i) ε_t

and thus can be thought of as a particular case of an ARMA(p + d, q) process having an autoregressive polynomial with d unit roots. (For this reason, no process that is accurately described by an ARIMA model with d > 0 is wide-sense stationary.)

The above can be generalized as follows:

(1 − Σ_{i=1}^{p} φ_i L^i)(1 − L)^d X_t = δ + (1 + Σ_{i=1}^{q} θ_i L^i) ε_t

This defines an ARIMA(p, d, q) process with drift δ / (1 − Σ φ_i).

Other special forms

[edit]

The explicit identification of the factorization of the autoregression polynomial into factors as above can be extended to other cases, firstly to apply to the moving average polynomial and secondly to include other special factors. For example, having a factor (1 − L^s) in a model is one way of including a non-stationary seasonality of period s into the model; this factor has the effect of re-expressing the data as changes from s periods ago. Another example is the factor (1 + L), which includes a (non-stationary) seasonality of period 2. The effect of the first type of factor is to allow each season's value to drift separately over time, whereas with the second type values for adjacent seasons move together.

Identification and specification of appropriate factors in an ARIMA model can be an important step in modeling, as it can allow a reduction in the overall number of parameters to be estimated while allowing the imposition on the model of types of behavior that logic and experience suggest should be there.

Differencing

A stationary time series's properties do not depend on the time at which the series is observed. Specifically, for a wide-sense stationary time series, the mean and the variance/autocovariance remain constant over time. Differencing in statistics is a transformation applied to a non-stationary time series in order to make it stationary in the mean sense (viz., to remove the non-constant trend), but it has nothing to do with the non-stationarity of the variance or autocovariance. Likewise, seasonal differencing is applied to a seasonal time series to remove the seasonal component. From the perspective of signal processing, especially Fourier spectral analysis theory, the trend is the low-frequency part of the spectrum of a non-stationary time series, while the season is the periodic-frequency part. Therefore, differencing works as a high-pass (i.e., low-stop) filter and seasonal differencing as a comb filter to suppress the low-frequency trend and the periodic-frequency season, respectively, in the spectrum domain (rather than directly in the time domain).[6]

To difference the data, the difference between consecutive observations is computed. Mathematically, this is shown as

y′_t = y_t − y_{t−1}

Differencing removes the changes in the level of a time series, eliminating trend and seasonality and consequently stabilizing the mean of the time series.[6]

Sometimes it may be necessary to difference the data a second time to obtain a stationary time series, which is referred to as second-order differencing:

y″_t = y′_t − y′_{t−1} = (y_t − y_{t−1}) − (y_{t−1} − y_{t−2}) = y_t − 2y_{t−1} + y_{t−2}

Another method of differencing data is seasonal differencing, which involves computing the difference between an observation and the corresponding observation in the previous season, e.g., a year. This is shown as:

y′_t = y_t − y_{t−m}

where m is the number of periods in a season.

The differenced data are then used for the estimation of anARMAmodel.
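
The three transformations above can be computed in a few lines; the following sketch uses NumPy, and the toy series is a made-up example, not data from the article:

```python
import numpy as np

# Hypothetical series with a roughly quadratic trend.
y = np.array([10.0, 12.0, 15.0, 19.0, 24.0, 30.0])

# First-order differencing: y'_t = y_t - y_{t-1}
d1 = np.diff(y)

# Second-order differencing: difference the differenced series again.
d2 = np.diff(y, n=2)

# Seasonal differencing with period m: y_t - y_{t-m}
m = 2
seasonal = y[m:] - y[:-m]
```

Here one difference yields [2, 3, 4, 5, 6], which still trends, while the second difference yields the constant series [1, 1, 1, 1], illustrating how repeated differencing stabilizes the mean.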

Examples

Some well-known special cases arise naturally or are mathematically equivalent to other popular forecasting models. For example:

  • An ARIMA(0, 1, 0) model (or I(1) model) is given by X_t = X_{t−1} + ε_t — which is simply a random walk.
  • An ARIMA(0, 1, 0) with a constant, given by X_t = c + X_{t−1} + ε_t — which is a random walk with drift.
  • An ARIMA(0, 0, 0) model is a white noise model.
  • An ARIMA(0, 1, 2) model is a damped Holt's model.
  • An ARIMA(0, 1, 1) model without constant is a basic exponential smoothing model.[9]
  • An ARIMA(0, 2, 2) model is given by X_t = 2X_{t−1} − X_{t−2} + (α + β − 2)ε_{t−1} + (1 − α)ε_{t−2} + ε_t — which is equivalent to Holt's linear method with additive errors, or double exponential smoothing.[9]
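
The random-walk-with-drift case is easy to simulate; this sketch (the drift value and series length are arbitrary choices) generates X_t = c + X_{t−1} + ε_t and checks that the first differences scatter around the drift:

```python
import random

random.seed(0)  # reproducible noise

c = 0.5          # assumed drift
x = [0.0]        # X_0
for _ in range(200):
    eps = random.gauss(0.0, 1.0)
    x.append(c + x[-1] + eps)   # X_t = c + X_{t-1} + eps_t

# One round of differencing recovers stationarity:
# the differences are white noise centred near c.
diffs = [b - a for a, b in zip(x, x[1:])]
mean_diff = sum(diffs) / len(diffs)
```

Since each difference equals c plus a zero-mean error, mean_diff estimates the drift c.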

Choosing the order

The orders p and q can be determined using the sample autocorrelation function (ACF), partial autocorrelation function (PACF), and/or the extended autocorrelation function (EACF) method.[10]

Other alternative methods include AIC, BIC, etc.[10] To determine the order of a non-seasonal ARIMA model, a useful criterion is the Akaike information criterion (AIC). It is written as

AIC = −2 log(L) + 2(p + q + k + 1)

where L is the likelihood of the data, p is the order of the autoregressive part and q is the order of the moving average part. The k represents the intercept of the ARIMA model. For AIC, if k = 1 then there is an intercept in the ARIMA model (c ≠ 0) and if k = 0 then there is no intercept in the ARIMA model (c = 0).

The corrected AIC for ARIMA models can be written as

AICc = AIC + (2(p + q + k + 1)(p + q + k + 2)) / (T − p − q − k − 2)

where T is the number of observations. The Bayesian information criterion (BIC) can be written as

BIC = AIC + (log(T) − 2)(p + q + k + 1)

The objective is to minimize the AIC, AICc or BIC values for a good model: the lower the value of one of these criteria across a range of candidate models, the better that model suits the data. The AIC and the BIC serve two different purposes: the AIC tries to select the model that best approximates the unknown data-generating process, while the BIC tries to identify the true model among the candidates. The BIC approach is sometimes criticized on the grounds that no candidate model is ever a perfect description of real-life complex data; nevertheless, it remains a useful selection method, as it penalizes additional parameters more heavily than the AIC does.

AICc can only be used to compare ARIMA models with the same orders of differencing. For ARIMAs with different orders of differencing, RMSE can be used for model comparison.
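
The three criteria can be computed directly from a fitted model's log-likelihood; a minimal sketch of the formulas above (the function names are my own, and T is the number of observations):

```python
import math

def arima_aic(loglik, p, q, k):
    """AIC = -2 log(L) + 2(p + q + k + 1)."""
    return -2.0 * loglik + 2.0 * (p + q + k + 1)

def arima_aicc(loglik, p, q, k, T):
    """Corrected AIC: AIC plus a small-sample penalty term."""
    n = p + q + k + 1  # number of estimated parameters
    return arima_aic(loglik, p, q, k) + (2.0 * n * (n + 1)) / (T - n - 1)

def arima_bic(loglik, p, q, k, T):
    """BIC = AIC + (log(T) - 2)(p + q + k + 1)."""
    return arima_aic(loglik, p, q, k) + (math.log(T) - 2.0) * (p + q + k + 1)
```

For a fixed order of differencing d, the candidate (p, q) pair with the smallest criterion value would be preferred.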

Estimation of coefficients

Forecasts using ARIMA models

The ARIMA model can be viewed as a "cascade" of two models. The first is non-stationary:

Y_t = (1 − L)^d X_t

while the second is wide-sense stationary:

(1 − Σ_{i=1}^{p} φ_i L^i) Y_t = (1 + Σ_{i=1}^{q} θ_i L^i) ε_t

Now forecasts can be made for the process Y_t, using a generalization of the method of autoregressive forecasting.
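
A hypothetical ARIMA(1, 1, 0) forecast illustrates the cascade: difference the series, forecast the stationary AR(1) part, then undo the differencing by cumulative summation. The series and coefficient here are made-up values, not estimates:

```python
phi = 0.5                          # assumed AR(1) coefficient of the differenced series
x = [10.0, 11.0, 11.5, 12.5, 13.0]

# Stage 1 (non-stationary): Y_t = (1 - L) X_t
y = [b - a for a, b in zip(x, x[1:])]

# Stage 2 (stationary): h-step AR(1) forecasts, E[Y_{T+h}] = phi**h * Y_T
# (zero-mean case for simplicity)
h = 3
y_fc = [phi ** k * y[-1] for k in range(1, h + 1)]

# Invert the differencing: X_{T+h} = X_T + sum of forecast differences
x_fc, level = [], x[-1]
for dy in y_fc:
    level += dy
    x_fc.append(level)
```

The forecast differences decay geometrically toward zero, so the level forecasts flatten out as the horizon grows.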

Forecast intervals

The forecast intervals (confidence intervals for forecasts) for ARIMA models are based on the assumptions that the residuals are uncorrelated and normally distributed. If either of these assumptions does not hold, the forecast intervals may be incorrect. For this reason, researchers plot the ACF and histogram of the residuals to check the assumptions before producing forecast intervals.

95% forecast interval: X̂_{T+h|T} ± 1.96 √(v_{T+h|T}), where v_{T+h|T} is the variance of X_{T+h} given X_1, …, X_T.

For h = 1, v_{T+1|T} = σ̂² for all ARIMA models regardless of parameters and orders.

For ARIMA(0, 0, q), X_t = ε_t + Σ_{i=1}^{q} θ_i ε_{t−i} and

v_{T+h|T} = σ̂² (1 + Σ_{i=1}^{h−1} θ_i²), for h > q.[citation needed]

In general, forecast intervals from ARIMA models will increase as the forecast horizon increases.
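
For the ARIMA(0, 0, q) case above, a 95% interval at horizon h can be computed directly from the stated variance formula. A sketch (the helper name and numbers are illustrative):

```python
import math

def ma_interval(point, sigma2, thetas, h):
    """95% forecast interval for ARIMA(0,0,q) at horizon h > q,
    using v = sigma^2 * (1 + theta_1^2 + ... + theta_{h-1}^2)."""
    v = sigma2 * (1.0 + sum(t * t for t in thetas[: h - 1]))
    half = 1.96 * math.sqrt(v)
    return point - half, point + half
```

For an MA(1) with θ₁ = 0.5 and σ² = 1, the h = 2 interval is point ± 1.96·√1.25, wider than the h = 1 interval of point ± 1.96, consistent with intervals growing with the horizon.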

Variations and extensions

A number of variations on the ARIMA model are commonly employed. If multiple time series are used, then the X_t can be thought of as vectors and a VARIMA model may be appropriate. Sometimes a seasonal effect is suspected in the model; in that case, it is generally considered better to use a SARIMA (seasonal ARIMA) model than to increase the order of the AR or MA parts of the model.[11] If the time series is suspected to exhibit long-range dependence, then the d parameter may be allowed to have non-integer values in an autoregressive fractionally integrated moving average model, which is also called a fractional ARIMA (FARIMA or ARFIMA) model.

Software implementations

Various packages that apply methodology like Box–Jenkins parameter optimization are available to find the right parameters for the ARIMA model.

  • EViews: has extensive ARIMA and SARIMA capabilities.
  • Julia: contains an ARIMA implementation in the TimeModels package[12]
  • Mathematica: includes the ARIMAProcess function.
  • MATLAB: the Econometrics Toolbox includes ARIMA models and regression with ARIMA errors.
  • NCSS: includes several procedures for ARIMA fitting and forecasting.[13][14][15]
  • Python: the "statsmodels" package includes models for time series analysis – univariate time series analysis: AR, ARIMA – vector autoregressive models, VAR and structural VAR – descriptive statistics and process models for time series analysis.
  • R: the standard R stats package includes an arima function, which is documented in "ARIMA Modelling of Time Series". Besides the ARIMA(p, d, q) part, the function also includes seasonal factors, an intercept term, and exogenous variables (xreg, called "external regressors"). The package astsa has scripts such as sarima to estimate seasonal or nonseasonal models and sarima.sim to simulate from these models. The CRAN task view on Time Series is the reference with many more links. The "forecast" package in R can automatically select an ARIMA model for a given time series with the auto.arima() function (which can often give questionable results) and can also simulate seasonal and non-seasonal ARIMA models with its simulate.Arima() function.[16]
  • Ruby: the "statsample-timeseries" gem is used for time series analysis, including ARIMA models and Kalman filtering.
  • JavaScript: the "arima" package includes models for time series analysis and forecasting (ARIMA, SARIMA, SARIMAX, AutoARIMA).
  • C: the "ctsa" package includes ARIMA, SARIMA, SARIMAX, AutoARIMA and multiple methods for time series analysis.
  • SAFE TOOLBOXES: includes ARIMA modelling and regression with ARIMA errors.
  • SAS: includes extensive ARIMA processing in its Econometric and Time Series Analysis system: SAS/ETS.
  • IBM SPSS: includes ARIMA modeling in the Professional and Premium editions of its Statistics package as well as its Modeler package. The default Expert Modeler feature evaluates a range of seasonal and non-seasonal autoregressive (p), integrated (d), and moving average (q) settings and seven exponential smoothing models. The Expert Modeler can also transform the target time-series data into its square root or natural log. The user also has the option to restrict the Expert Modeler to ARIMA models, or to manually enter ARIMA non-seasonal and seasonal p, d, and q settings without the Expert Modeler. Automatic outlier detection is available for seven types of outliers, and the detected outliers will be accommodated in the time-series model if this feature is selected.
  • SAP: the APO-FCS package[17] in SAP ERP from SAP allows creation and fitting of ARIMA models using the Box–Jenkins methodology.
  • SQL Server Analysis Services: from Microsoft, includes ARIMA as a Data Mining algorithm.
  • Stata: includes ARIMA modelling (using its arima command) as of Stata 9.
  • StatSim: includes ARIMA models in the Forecast web app.
  • Teradata: Vantage has the ARIMA function as part of its machine learning engine.
  • TOL (Time Oriented Language): designed to model ARIMA models (including SARIMA, ARIMAX and DSARIMAX variants).
  • Scala: the spark-timeseries library contains an ARIMA implementation for Scala, Java and Python. The implementation is designed to run on Apache Spark.
  • PostgreSQL/MADlib: Time Series Analysis/ARIMA.
  • X-12-ARIMA: from the US Bureau of the Census.

See also

References

  1. For further information on stationarity and differencing, see https://www.otexts.org/fpp/8/1
  2. Hyndman, Rob J.; Athanasopoulos, George. "8.9 Seasonal ARIMA models". oTexts. Retrieved 19 May 2015.
  3. Hamilton, James (1994). Time Series Analysis. Princeton University Press. ISBN 9780691042893.
  4. Papoulis, Athanasios (2002). Probability, Random Variables, and Stochastic Processes. Tata McGraw-Hill Education.
  5. Triacca, Umberto (19 Feb 2021). "The Wold Decomposition Theorem" (PDF). Archived (PDF) from the original on 2016-03-27.
  6. Wang, Shixiong; Li, Chongshou; Lim, Andrew (2019-12-18). "Why Are the ARIMA and SARIMA not Sufficient". arXiv:1904.07632 [stat.AP].
  7. Box, George E. P. (2015). Time Series Analysis: Forecasting and Control. Wiley. ISBN 978-1-118-67502-1.
  8. "Notation for ARIMA Models". Time Series Forecasting System. SAS Institute. Retrieved 19 May 2015.
  9. "Introduction to ARIMA models". people.duke.edu. Retrieved 2016-06-05.
  10. Missouri State University. "Model Specification, Time Series Analysis" (PDF).
  11. Swain, S.; et al. (2018). "Development of an ARIMA Model for Monthly Rainfall Forecasting over Khordha District, Odisha, India". Recent Findings in Intelligent Computing Techniques. Advances in Intelligent Systems and Computing. Vol. 708. pp. 325–331. doi:10.1007/978-981-10-8636-6_34. ISBN 978-981-10-8635-9.
  12. TimeModels.jl, www.github.com
  13. ARIMA in NCSS.
  14. Automatic ARMA in NCSS.
  15. Autocorrelations and Partial Autocorrelations in NCSS.
  16. "8.7 ARIMA modelling in R". OTexts. Retrieved 2016-05-12.
  17. "Box Jenkins model". SAP. Retrieved 8 March 2013.

Further reading
