Springer Series in Statistics
Advisors:
P. Bickel, P. Diggle, S. Fienberg, K. Krickeberg,
I. Olkin, N. Wermuth, S. Zeger
For other titles published in this series, go to
http://www.springer.com/series/692
Peter J. Brockwell
Richard A. Davis
Time Series:
Theory and Methods
Second Edition
Springer
Peter J. Brockwell
Department of Statistics
Colorado State University
Fort Collins, CO 80523
USA
Richard A. Davis
Department of Statistics
Columbia University
New York, NY 10027
USA
Mathematics Subject Classification: 62-01, 62M10
Library of Congress Cataloging-in-Publication Data
Brockwell, Peter J.
    Time series: theory and methods / Peter J. Brockwell, Richard A. Davis.
        p. cm. -- (Springer series in statistics)
    "Second edition" -- Pref.
    Includes bibliographical references and index.
    ISBN 0-387-97429-6 (USA). -- ISBN 3-540-97429-6 (EUR.)
    1. Time-series analysis.  I. Davis, Richard A.  II. Title.  III. Series.
    QA280.B76  1991
    519.5'5 -- dc20                                                  90-25821

ISBN 1-4419-0319-8          ISBN 978-1-4419-0319-8 (soft cover)
Printed on acid-free paper.
© 2006 Springer Science+Business Media, LLC
All rights reserved. This work may not be translated or copied in whole or in part without the
written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street,
New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly
analysis. Use in connection with any form of information storage and retrieval, electronic
adaptation, computer software, or by similar or dissimilar methodology now known or hereafter
developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if
they are not identified as such, is not to be taken as an expression of opinion as to whether or
not they are subject to proprietary rights.
Printed in the United States of America.
15 14 13
springer.com
To our families
Preface to the Second Edition
This edition contains a large number of additions and corrections scattered
throughout the text, including the incorporation of a new chapter on
state-space models. The companion diskette for the IBM PC has expanded
into the software package ITSM: An Interactive Time Series Modelling
Package for the PC, which includes a manual and can be ordered from
Springer-Verlag.*
We are indebted to many readers who have used the book and programs
and made suggestions for improvements. Unfortunately there is not enough
space to acknowledge all who have contributed in this way; however, special
mention must be made of our prize-winning fault-finders, Sid Resnick and
F. Pukelsheim. Special mention should also be made of Anthony Brockwell,
whose advice and support on computing matters was invaluable in the
preparation of the new diskettes. We have been fortunate to work on the
new edition in the excellent environments provided by the University of
Melbourne and Colorado State University. We thank Duane Boes
particularly for his support and encouragement throughout, and the
Australian Research Council and National Science Foundation for their
support of research related to the new material. We are also indebted to
Springer-Verlag for their constant support and assistance in preparing the
second edition.
Fort Collins, Colorado
November, 1990

P.J. Brockwell
R.A. Davis
* ITSM: An Interactive Time Series Modelling Package for the PC by P.J. Brockwell and R.A. Davis. ISBN: 0-387-97482-2; 1991.
Note added in the eighth printing: The computer programs referred to in the text
have now been superseded by the package ITSM2000, the student version of which
accompanies our other text, Introduction to Time Series and Forecasting, also
published by Springer-Verlag. Enquiries regarding purchase of the professional
version of this package should be sent to pjbrockwell@cs.com.
Preface to the First Edition
We have attempted in this book to give a systematic account of linear time
series models and their application to the modelling and prediction of data
collected sequentially in time. The aim is to provide specific techniques for
handling data and at the same time to provide a thorough understanding of
the mathematical basis for the techniques. Both time and frequency domain
methods are discussed but the book is written in such a way that either
approach could be emphasized. The book is intended to be a text for graduate
students in statistics, mathematics, engineering, and the natural or social
sciences. It has been used both at the M.S. level, emphasizing the more
practical aspects of modelling, and at the Ph.D. level, where the detailed
mathematical derivations of the deeper results can be included.
Distinctive features of the book are the extensive use of elementary Hilbert
space methods and recursive prediction techniques based on innovations, use
of the exact Gaussian likelihood and AIC for inference, a thorough treatment
of the asymptotic behavior of the maximum likelihood estimators of the
coefficients of univariate ARMA models, extensive illustrations of the tech­
niques by means of numerical examples, and a large number of problems for
the reader. The companion diskette contains programs written for the IBM
PC, which can be used to apply the methods described in the text. Data sets
can be found in the Appendix, and a more extensive collection (including most
of those used for the examples in Chapters 1, 9, 10, 11 and 12) is on the diskette.
Simulated ARMA series can easily be generated and filed using the program
PEST. Valuable sources of additional time-series data are the collections of
Makridakis et al. (1984) and Working Paper 109 (1984) of Scientific Computing
Associates, DeKalb, Illinois.
Most of the material in the book is by now well-established in the time
series literature and we have therefore not attempted to give credit for all the
results discussed. Our indebtedness to the authors of some of the well-known
existing books on time series, in particular Anderson, Box and Jenkins, Fuller,
Grenander and Rosenblatt, Hannan, Koopmans and Priestley will however
be apparent. We were also fortunate to have access to notes on time series by
W. Dunsmuir. To these and to the many other sources that have influenced
our presentation of the subject we express our thanks.
Recursive techniques based on the Kalman filter and state-space represen­
tations of ARMA processes have played an important role in many recent
developments in time series analysis. In particular the Gaussian likelihood of
a time series can be expressed very simply in terms of the one-step linear
predictors and their mean squared errors, both of which can be computed
recursively using a Kalman filter. Instead of using a state-space representation
for recursive prediction we utilize the innovations representation of an arbi­
trary Gaussian time series in order to compute best linear predictors and exact
Gaussian likelihoods. This approach, developed by Rissanen and Barbosa,
Kailath, Ansley and others, expresses the value of the series at time t in terms
of the one-step prediction errors up to that time. This representation provides
insight into the structure of the time series itself as well as leading to simple
algorithms for simulation, prediction and likelihood calculation.
These algorithms are used in the parameter estimation program (PEST)
found on the companion diskette. Given a data set of up to 2300 observations,
the program can be used to find preliminary, least squares and maximum
Gaussian likelihood estimators of the parameters of any prescribed ARIMA
model for the data, and to predict future values. It can also be used to simulate
values of an ARMA process and to compute and plot its theoretical auto­
covariance and spectral density functions. Data can be plotted, differenced,
deseasonalized and detrended. The program will also plot the sample auto­
correlation and partial autocorrelation functions of both the data itself and
the residuals after model-fitting. The other time-series programs are SPEC,
which computes spectral estimates for univariate or bivariate series based on
the periodogram, and TRANS, which can be used either to compute and plot
the sample cross-correlation function of two series, or to perform least squares
estimation of the coefficients in a transfer function model relating the second
series to the first (see Section 12.2). Also included on the diskette is a screen
editing program (WORD6), which can be used to create arbitrary data files,
and a collection of data files, some of which are analyzed in the book.
Instructions for the use of these programs are contained in the file HELP on
the diskette.
For a one-semester course on time-domain analysis and modelling at the
M.S. level, we have used the following sections of the book:
1.1-1.6; 2.1-2.7; 3.1-3.5; 5.1-5.5; 7.1, 7.2; 8.1-8.9; 9.1-9.6
(with brief reference to Sections 4.2 and 4.4). The prerequisite for this course
is a knowledge of probability and statistics at the level of the book Introduction
to the Theory of Statistics by Mood, Graybill and Boes.
For a second semester, emphasizing frequency-domain analysis and multi­
variate series, we have used
4.1-4.4, 4.6-4.10; 10.1-10.7; 11.1-11.7; selections from Chap. 12.
At the M.S. level it has not been possible (or desirable) to go into the mathe­
matical derivation of all the results used, particularly those in the starred
sections, which require a stronger background in mathematical analysis and
measure theory. Such a background is assumed in all of the starred sections
and problems.
For Ph.D. students the book has been used as the basis for a more
theoretical one-semester course covering the starred sections from Chapters
4 through 11 and parts of Chapter 12. The prerequisite for this course is a
knowledge of measure-theoretic probability.
We are greatly indebted to E.J. Hannan, R.H. Jones, S.I. Resnick, S. Tavaré
and D. Tjøstheim, whose comments on drafts of Chapters 1-8 led to
substantial improvements. The book arose out of courses taught in the statistics
department at Colorado State University and benefitted from the comments
of many students. The development of the computer programs would not have
been possible without the outstanding work of Joe Mandarino, the architect
of the computer program PEST, and Anthony Brockwell, who contributed
WORD6, graphics subroutines and general computing expertise. We are
indebted also to the National Science Foundation for support for the research
related to the book, and one of us (P.J.B.) to Kuwait University for providing
an excellent environment in which to work on the early chapters. For permis­
sion to use the optimization program UNC22MIN we thank R. Schnabel of
the University of Colorado computer science department. Finally we thank
Pam Brockwell, whose contributions to the manuscript went far beyond those
of typist, and the editors of Springer-Verlag, who showed great patience and
cooperation in the final production of the book.
Fort Collins, Colorado
October 1986

P.J. Brockwell
R.A. Davis
Contents

Preface to the Second Edition    vii
Preface to the First Edition    ix

CHAPTER 1
Stationary Time Series    1
§1.1  Examples of Time Series    1
§1.2  Stochastic Processes    8
§1.3  Stationarity and Strict Stationarity    11
§1.4  The Estimation and Elimination of Trend and Seasonal Components    14
§1.5  The Autocovariance Function of a Stationary Process    25
§1.6  The Multivariate Normal Distribution    32
§1.7* Applications of Kolmogorov's Theorem    37
      Problems    39

CHAPTER 2
Hilbert Spaces    42
§2.1  Inner-Product Spaces and Their Properties    42
§2.2  Hilbert Spaces    46
§2.3  The Projection Theorem    48
§2.4  Orthonormal Sets    54
§2.5  Projection in R^n    58
§2.6  Linear Regression and the General Linear Model    60
§2.7  Mean Square Convergence, Conditional Expectation and Best Linear Prediction in L^2(Ω, F, P)    62
§2.8  Fourier Series    65
§2.9  Hilbert Space Isomorphisms    67
§2.10* The Completeness of L^2(Ω, F, P)    68
§2.11* Complementary Results for Fourier Series    69
      Problems    73

CHAPTER 3
Stationary ARMA Processes    77
§3.1  Causal and Invertible ARMA Processes    77
§3.2  Moving Average Processes of Infinite Order    89
§3.3  Computing the Autocovariance Function of an ARMA(p, q) Process    91
§3.4  The Partial Autocorrelation Function    98
§3.5  The Autocovariance Generating Function    103
§3.6* Homogeneous Linear Difference Equations with Constant Coefficients    105
      Problems    110

CHAPTER 4
The Spectral Representation of a Stationary Process    114
§4.1  Complex-Valued Stationary Time Series    114
§4.2  The Spectral Distribution of a Linear Combination of Sinusoids    116
§4.3  Herglotz's Theorem    117
§4.4  Spectral Densities and ARMA Processes    122
§4.5*  Circulants and Their Eigenvalues    133
§4.6*  Orthogonal Increment Processes on [-π, π]    138
§4.7*  Integration with Respect to an Orthogonal Increment Process    140
§4.8*  The Spectral Representation    143
§4.9*  Inversion Formulae    150
§4.10* Time-Invariant Linear Filters    152
§4.11* Properties of the Fourier Approximation h_n to I_(ν,ω]    157
      Problems    159

CHAPTER 5
Prediction of Stationary Processes    166
§5.1  The Prediction Equations in the Time Domain    166
§5.2  Recursive Methods for Computing Best Linear Predictors    169
§5.3  Recursive Prediction of an ARMA(p, q) Process    175
§5.4  Prediction of a Stationary Gaussian Process; Prediction Bounds    182
§5.5  Prediction of a Causal Invertible ARMA Process in Terms of X_j, -∞ < j ≤ n    182
§5.6* Prediction in the Frequency Domain    185
§5.7* The Wold Decomposition    187
§5.8* Kolmogorov's Formula    191
      Problems    192

CHAPTER 6*
Asymptotic Theory    198
§6.1  Convergence in Probability    198
§6.2  Convergence in rth Mean, r > 0    202
§6.3  Convergence in Distribution    204
§6.4  Central Limit Theorems and Related Results    209
      Problems    215

CHAPTER 7
Estimation of the Mean and the Autocovariance Function    218
§7.1  Estimation of μ    218
§7.2  Estimation of γ(·) and ρ(·)    220
§7.3* Derivation of the Asymptotic Distributions    225
      Problems    236

CHAPTER 8
Estimation for ARMA Models    238
§8.1  The Yule-Walker Equations and Parameter Estimation for Autoregressive Processes    239
§8.2  Preliminary Estimation for Autoregressive Processes Using the Durbin-Levinson Algorithm    241
§8.3  Preliminary Estimation for Moving Average Processes Using the Innovations Algorithm    245
§8.4  Preliminary Estimation for ARMA(p, q) Processes    250
§8.5  Remarks on Asymptotic Efficiency    253
§8.6  Recursive Calculation of the Likelihood of an Arbitrary Zero-Mean Gaussian Process    254
§8.7  Maximum Likelihood and Least Squares Estimation for ARMA Processes    256
§8.8  Asymptotic Properties of the Maximum Likelihood Estimators    258
§8.9  Confidence Intervals for the Parameters of a Causal Invertible ARMA Process    260
§8.10* Asymptotic Behavior of the Yule-Walker Estimates    262
§8.11* Asymptotic Normality of Parameter Estimators    265
      Problems    269

CHAPTER 9
Model Building and Forecasting with ARIMA Processes    273
§9.1  ARIMA Models for Non-Stationary Time Series    274
§9.2  Identification Techniques    284
§9.3  Order Selection    301
§9.4  Diagnostic Checking    306
§9.5  Forecasting ARIMA Models    314
§9.6  Seasonal ARIMA Models    320
      Problems    326

CHAPTER 10
Inference for the Spectrum of a Stationary Process    330
§10.1  The Periodogram    331
§10.2  Testing for the Presence of Hidden Periodicities    334
§10.3  Asymptotic Properties of the Periodogram    342
§10.4  Smoothing the Periodogram    350
§10.5  Confidence Intervals for the Spectrum    362
§10.6  Autoregressive, Maximum Entropy, Moving Average and Maximum Likelihood ARMA Spectral Estimators    365
§10.7  The Fast Fourier Transform (FFT) Algorithm    373
§10.8* Derivation of the Asymptotic Behavior of the Maximum Likelihood and Least Squares Estimators of the Coefficients of an ARMA Process    375
      Problems    396

CHAPTER 11
Multivariate Time Series    401
§11.1  Second Order Properties of Multivariate Time Series    402
§11.2  Estimation of the Mean and Covariance Function    405
§11.3  Multivariate ARMA Processes    417
§11.4  Best Linear Predictors of Second Order Random Vectors    421
§11.5  Estimation for Multivariate ARMA Processes    430
§11.6  The Cross Spectrum    434
§11.7  Estimating the Cross Spectrum    443
§11.8* The Spectral Representation of a Multivariate Stationary Time Series    454
      Problems    459

CHAPTER 12
State-Space Models and the Kalman Recursions    463
§12.1  State-Space Models    463
§12.2  The Kalman Recursions    474
§12.3  State-Space Models with Missing Observations    482
§12.4  Controllability and Observability    489
§12.5  Recursive Bayesian State Estimation    498
      Problems    501

CHAPTER 13
Further Topics    506
§13.1  Transfer Function Modelling    506
§13.2  Long Memory Processes    520
§13.3  Linear Processes with Infinite Variance    535
§13.4  Threshold Models    545
      Problems    552

Appendix: Data Sets    555
Bibliography    561
Index    567
CHAPTER 1
Stationary Time Series
In this chapter we introduce some basic ideas of time series analysis and stochastic processes. Of particular importance are the concepts of stationarity and the autocovariance and sample autocovariance functions. Some standard techniques are described for the estimation and removal of trend and seasonality (of known period) from an observed series. These are illustrated with reference to the data sets in Section 1.1. Most of the topics covered in this chapter will be developed more fully in later sections of the book. The reader who is not already familiar with random vectors and multivariate analysis should first read Section 1.6 where a concise account of the required background is given. Notice our convention that an $n$-dimensional random vector is assumed (unless specified otherwise) to be a column vector $X = (X_1, X_2, \ldots, X_n)'$ of random variables. If $S$ is an arbitrary set then we shall use the notation $S^n$ to denote both the set of $n$-component column vectors with components in $S$ and the set of $n$-component row vectors with components in $S$.
§1.1 Examples of Time Series

A time series is a set of observations $x_t$, each one being recorded at a specified time $t$. A discrete-time series (the type to which this book is primarily devoted) is one in which the set $T_0$ of times at which observations are made is a discrete set, as is the case for example when observations are made at fixed time intervals. Continuous-time series are obtained when observations are recorded continuously over some time interval, e.g. when $T_0 = [0, 1]$. We shall use the notation $x(t)$ rather than $x_t$ if we wish to indicate specifically that observations are recorded continuously.
EXAMPLE 1.1.1 (Current Through a Resistor). If a sinusoidal voltage $v(t) = a\cos(\nu t + \theta)$ is applied to a resistor of resistance $r$ and the current recorded continuously we obtain a continuous time series
$$x(t) = r^{-1}a\cos(\nu t + \theta).$$
If observations are made only at times 1, 2, ..., the resulting time series will be discrete. Time series of this particularly simple type will play a fundamental role in our later study of stationary time series.
Figure 1.1. 100 observations of the series $x(t) = \cos(0.2t + \pi/3)$.
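The series in Figure 1.1 is easy to reproduce numerically. The following sketch uses Python with numpy (not the software referred to elsewhere in the text); the variable names are illustrative only.

    import numpy as np

    # Generate the 100 observations x(t) = cos(0.2 t + pi/3), t = 1, ..., 100,
    # plotted in Figure 1.1.
    t = np.arange(1, 101)
    x = np.cos(0.2 * t + np.pi / 3)
    print(x[:5])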
EXAMPLE 1.1.2 (Population $x_t$ of the U.S.A., 1790-1980).

      t         x_t            t         x_t
    1790      3,929,214      1890     62,979,766
    1800      5,308,483      1900     76,212,168
    1810      7,239,881      1910     92,228,496
    1820      9,638,453      1920    106,021,537
    1830     12,860,702      1930    123,202,624
    1840     17,063,353      1940    132,164,569
    1850     23,191,876      1950    151,325,798
    1860     31,443,321      1960    179,323,175
    1870     38,558,371      1970    203,302,031
    1880     50,189,209      1980    226,545,805
Figure 1.2. Population of the U.S.A. at ten-year intervals, 1790-1980 (U.S. Bureau of the Census).
EXAMPLE 1.1.3 (Strikes in the U.S.A., 1951-1980).

      t      x_t        t      x_t
    1951    4737      1966    4405
    1952    5117      1967    4595
    1953    5091      1968    5045
    1954    3468      1969    5700
    1955    4320      1970    5716
    1956    3825      1971    5138
    1957    3673      1972    5010
    1958    3694      1973    5353
    1959    3708      1974    6074
    1960    3333      1975    5031
    1961    3367      1976    5648
    1962    3614      1977    5506
    1963    3362      1978    4230
    1964    3655      1979    4827
    1965    3963      1980    3885
Figure 1.3. Strikes in the U.S.A., 1951-1980 (Bureau of Labor Statistics, U.S. Labor Department).
EXAMPLE 1.1.4 (All Star Baseball Games, 1933-1980).
$$x_t = \begin{cases} 1 & \text{if the National League won in year } t,\\ -1 & \text{if the American League won in year } t.\end{cases}$$
[Table of the results $x_t$ for $t - 1900 = 33, \ldots, 80$; a dagger marks a year with no game and an asterisk a year with two games scheduled.]
Figure 1.4. Results $x_t$, Example 1.1.4, of All-Star baseball games, 1933-1980.
EXAMPLE 1.1.5 (Wolfer Sunspot Numbers, 1770-1869).

    1770  101    1790   90    1810    0    1830   71    1850   66
    1771   82    1791   67    1811    1    1831   48    1851   64
    1772   66    1792   60    1812    5    1832   28    1852   54
    1773   35    1793   47    1813   12    1833    8    1853   39
    1774   31    1794   41    1814   14    1834   13    1854   21
    1775    7    1795   21    1815   35    1835   57    1855    7
    1776   20    1796   16    1816   46    1836  122    1856    4
    1777   92    1797    6    1817   41    1837  138    1857   23
    1778  154    1798    4    1818   30    1838  103    1858   55
    1779  125    1799    7    1819   24    1839   86    1859   94
    1780   85    1800   14    1820   16    1840   63    1860   96
    1781   68    1801   34    1821    7    1841   37    1861   77
    1782   38    1802   45    1822    4    1842   24    1862   59
    1783   23    1803   43    1823    2    1843   11    1863   44
    1784   10    1804   48    1824    8    1844   15    1864   47
    1785   24    1805   42    1825   17    1845   40    1865   30
    1786   83    1806   28    1826   36    1846   62    1866   16
    1787  132    1807   10    1827   50    1847   98    1867    7
    1788  131    1808    8    1828   62    1848  124    1868   37
    1789  118    1809    2    1829   67    1849   96    1869   74
Figure 1.5. The Wolfer sunspot numbers, 1770-1869.
EXAMPLE 1.1.6 (Monthly Accidental Deaths in the U.S.A., 1973-1978).

            1973     1974     1975     1976     1977     1978
    Jan.    9007     7750     8162     7717     7792     7836
    Feb.    8106     6981     7306     7461     6957     6892
    Mar.    8928     8038     8124     7776     7726     7791
    Apr.    9137     8422     7870     7925     8106     8129
    May    10017     8714     9387     8634     8890     9115
    Jun.   10826     9512     9556     8945     9299     9434
    Jul.   11317    10120    10093    10078    10625    10484
    Aug.   10744     9823     9620     9179     9302     9827
    Sep.    9713     8743     8285     8037     8314     9110
    Oct.    9938     9129     8433     8488     8850     9070
    Nov.    9161     8710     8160     7874     8265     8633
    Dec.    8927     8680     8034     8647     8796     9240
Figure 1.6. Monthly accidental deaths in the U.S.A., 1973-1978 (National Safety Council).
These examples are of course but a few of the multitude of time series to
be found in the fields of engineering, science, sociology and economics. Our
purpose in this book is to study the techniques which have been developed
for drawing inferences from such series. Before we can do this however, it is
necessary to set up a hypothetical mathematical model to represent the data.
Having chosen a model (or family of models) it then becomes possible to
estimate parameters, check for goodness of fit to the data and possibly to use
the fitted model to enhance our understanding of the mechanism generating
the series. Once a satisfactory model has been developed, it may be used
in a variety of ways depending on the particular field of application. The
applications include separation (filtering) of noise from signals, prediction of
future values of a series and the control of future values.
The six examples given show some rather striking differences which are
apparent if one examines the graphs in Figures 1.1-1.6. The first gives rise to
a smooth sinusoidal graph oscillating about a constant level, the second to a
roughly exponentially increasing graph, the third to a graph which fluctuates
erratically about a nearly constant or slowly rising level, and the fourth to an
erratic series of minus ones and ones. The fifth graph appears to have a strong
cyclic component with period about 11 years and the last has a pronounced
seasonal component with period 12.
In the next section we shall discuss the general problem of constructing
mathematical models for such data.
§1.2 Stochastic Processes
The first step in the analysis of a time series is the selection of a suitable
mathematical model (or class of models) for the data. To allow for the possibly
unpredictable nature of future observations it is natural to suppose that each
observation $x_t$ is a realized value of a certain random variable $X_t$. The time series $\{x_t, t \in T_0\}$ is then a realization of the family of random variables $\{X_t, t \in T_0\}$. These considerations suggest modelling the data as a realization (or part of a realization) of a stochastic process $\{X_t, t \in T\}$ where $T \supseteq T_0$. To
clarify these ideas we need to define precisely what is meant by a stochastic
process and its realizations. In later sections we shall restrict attention to
special classes of processes which are particularly useful for modelling many
of the time series which are encountered in practice.
Definition 1.2.1 (Stochastic Process). A stochastic process is a family of random variables $\{X_t, t \in T\}$ defined on a probability space $(\Omega, \mathscr{F}, P)$.
Remark 1. In time series analysis the index (or parameter) set $T$ is a set of time points, very often $\{0, \pm 1, \pm 2, \ldots\}$, $\{1, 2, 3, \ldots\}$, $[0, \infty)$ or $(-\infty, \infty)$. Stochastic processes in which $T$ is not a subset of $\mathbb{R}$ are also of importance. For example in geophysics stochastic processes with $T$ the surface of a sphere are used to represent variables indexed by their location on the earth's surface. In this book however the index set $T$ will always be a subset of $\mathbb{R}$.

Recalling the definition of a random variable we note that for each fixed $t \in T$, $X_t$ is in fact a function $X_t(\cdot)$ on the set $\Omega$. On the other hand, for each fixed $\omega \in \Omega$, $X_\cdot(\omega)$ is a function on $T$.

Definition 1.2.2 (Realizations of a Stochastic Process). The functions $\{X_\cdot(\omega), \omega \in \Omega\}$ on $T$ are known as the realizations or sample-paths of the process $\{X_t, t \in T\}$.

Remark 2. We shall frequently use the term time series to mean both the data and the process of which it is a realization.
The following examples illustrate the realizations of some specific stochastic processes. The first two could be considered as possible models for the time series of Examples 1.1.1 and 1.1.4 respectively.

EXAMPLE 1.2.1 (Sinusoid with Random Phase and Amplitude). Let $A$ and $\Theta$ be independent random variables with $A \geq 0$ and $\Theta$ distributed uniformly on $(0, 2\pi)$. A stochastic process $\{X(t), t \in \mathbb{R}\}$ can then be defined in terms of $A$ and $\Theta$ for any given $\nu \geq 0$ and $r > 0$ by
$$X_t = r^{-1}A\cos(\nu t + \Theta), \qquad (1.2.1)$$
or more explicitly,
$$X_t(\omega) = r^{-1}A(\omega)\cos(\nu t + \Theta(\omega)), \qquad (1.2.2)$$
where $\omega$ is an element of the probability space $\Omega$ on which $A$ and $\Theta$ are defined.
The realizations of the process defined by (1.2.2) are the functions of $t$ obtained by fixing $\omega$, i.e. functions of the form
$$x(t) = r^{-1}a\cos(\nu t + \theta).$$
The time series plotted in Figure 1.1 is one such realization.
EXAMPLE 1.2.2 (A Binary Process). Let $\{X_t, t = 1, 2, \ldots\}$ be a sequence of independent random variables for each of which
$$P(X_t = 1) = P(X_t = -1) = \tfrac{1}{2}. \qquad (1.2.3)$$
In this case it is not so obvious as in Example 1.2.1 that there exists a probability space $(\Omega, \mathscr{F}, P)$ with random variables $X_1, X_2, \ldots$ defined on $\Omega$ having the required joint distributions, i.e. such that
$$P[X_1 = i_1, \ldots, X_n = i_n] = 2^{-n} \qquad (1.2.4)$$
for every $n$-tuple $(i_1, \ldots, i_n)$ of 1's and $-1$'s. The existence of such a process is however guaranteed by Kolmogorov's theorem, which is stated below and discussed further in Section 1.7.
The time series obtained by tossing a penny repeatedly and scoring $+1$ for each head, $-1$ for each tail is usually modelled as a realization of the process defined by (1.2.4). Each realization of this process is a sequence of 1's and $-1$'s. A priori we might well consider this process as a model for the All Star baseball games, Example 1.1.4. However even a cursory inspection of the results from 1963 onwards casts serious doubt on the hypothesis $P(X_t = 1) = \tfrac{1}{2}$.
EXAMPLE 1.2.3 (Random Walk). The simple symmetric random walk $\{S_t, t = 0, 1, 2, \ldots\}$ is defined in terms of Example 1.2.2 by $S_0 = 0$ and
$$S_t = X_1 + X_2 + \cdots + X_t, \qquad t \geq 1. \qquad (1.2.5)$$
The general random walk is defined in the same way on replacing $X_1, X_2, \ldots$ by a sequence of independently and identically distributed random variables whose distribution is not constrained to satisfy (1.2.3). The existence of such an independent sequence is again guaranteed by Kolmogorov's theorem (see Problem 1.18).
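A realization of the binary process of Example 1.2.2 and of the random walk (1.2.5) built from it can be simulated directly. The sketch below uses Python with numpy; the seed and the sample size n = 200 are arbitrary choices made only for the illustration.

    import numpy as np

    # One realization of the binary process X_t of (1.2.3) and of the simple
    # symmetric random walk S_t = X_1 + ... + X_t with S_0 = 0, as in (1.2.5).
    rng = np.random.default_rng(0)
    n = 200
    x = rng.choice([-1, 1], size=n)            # X_1, ..., X_n
    s = np.concatenate(([0], np.cumsum(x)))    # S_0, S_1, ..., S_n
    print(s[:10])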
EXAMPLE 1.2.4 (Branching Processes). There is a large class of processes, known as branching processes, which in their most general form have been applied with considerable success to the modelling of population growth (see for example Jagers (1976)). The simplest such process is the Bienaymé-Galton-Watson process defined by the equations $X_0 = x$ (the population size in generation zero) and
$$X_{t+1} = \sum_{j=1}^{X_t} Z_{t,j}, \qquad t = 0, 1, 2, \ldots, \qquad (1.2.6)$$
where $Z_{t,j}$, $t = 0, 1, \ldots$, $j = 1, 2, \ldots$, are independently and identically distributed non-negative integer-valued random variables, $Z_{t,j}$ representing the number of offspring of the $j$th individual born in generation $t$.
In the first example we were able to define $X_t(\omega)$ quite explicitly for each $t$ and $\omega$. Very frequently however we may wish (or be forced) to specify instead the collection of all joint distributions of all finite-dimensional vectors $(X_{t_1}, X_{t_2}, \ldots, X_{t_n})$, $\mathbf{t} = (t_1, \ldots, t_n) \in T^n$, $n \in \{1, 2, \ldots\}$. In such a case we need to be sure that a stochastic process (see Definition 1.2.1) with the specified distributions really does exist. Kolmogorov's theorem, which we state here and discuss further in Section 1.7, guarantees that this is true under minimal conditions on the specified distribution functions. Our statement of Kolmogorov's theorem is simplified slightly by the assumption (Remark 1) that $T$ is a subset of $\mathbb{R}$ and hence a linearly ordered set. If $T$ were not so ordered an additional "permutation" condition would be required (a statement and proof of the theorem for arbitrary $T$ can be found in numerous books on probability theory, for example Lamperti, 1966).
Definition 1.2.3 (The Distribution Functions of a Stochastic Process $\{X_t, t \in T \subseteq \mathbb{R}\}$). Let $\mathscr{T}$ be the set of all vectors $\{\mathbf{t} = (t_1, \ldots, t_n)' \in T^n : t_1 < t_2 < \cdots < t_n,\ n = 1, 2, \ldots\}$. Then the (finite-dimensional) distribution functions of $\{X_t, t \in T\}$ are the functions $\{F_\mathbf{t}(\cdot), \mathbf{t} \in \mathscr{T}\}$ defined for $\mathbf{t} = (t_1, \ldots, t_n)'$ by
$$F_\mathbf{t}(\mathbf{x}) = P(X_{t_1} \leq x_1, \ldots, X_{t_n} \leq x_n), \qquad \mathbf{x} = (x_1, \ldots, x_n)' \in \mathbb{R}^n.$$

Theorem 1.2.1 (Kolmogorov's Theorem). The probability distribution functions $\{F_\mathbf{t}(\cdot), \mathbf{t} \in \mathscr{T}\}$ are the distribution functions of some stochastic process if and only if for any $n \in \{1, 2, \ldots\}$, $\mathbf{t} = (t_1, \ldots, t_n)' \in \mathscr{T}$ and $1 \leq i \leq n$,
$$\lim_{x_i \to \infty} F_\mathbf{t}(\mathbf{x}) = F_{\mathbf{t}(i)}(\mathbf{x}(i)), \qquad (1.2.8)$$
where $\mathbf{t}(i)$ and $\mathbf{x}(i)$ are the $(n-1)$-component vectors obtained by deleting the $i$th components of $\mathbf{t}$ and $\mathbf{x}$ respectively.

If $\phi_\mathbf{t}(\cdot)$ is the characteristic function corresponding to $F_\mathbf{t}(\cdot)$, i.e.
$$\phi_\mathbf{t}(\mathbf{u}) = \int_{\mathbb{R}^n} e^{i\mathbf{u}'\mathbf{x}}\, F_\mathbf{t}(dx_1, \ldots, dx_n), \qquad \mathbf{u} = (u_1, \ldots, u_n)' \in \mathbb{R}^n,$$
then (1.2.8) can be restated in the equivalent form,
$$\lim_{u_i \to 0} \phi_\mathbf{t}(\mathbf{u}) = \phi_{\mathbf{t}(i)}(\mathbf{u}(i)), \qquad (1.2.9)$$
where $\mathbf{u}(i)$ is the $(n-1)$-component vector obtained by deleting the $i$th component of $\mathbf{u}$.

Condition (1.2.8) is simply the "consistency" requirement that each function $F_\mathbf{t}(\cdot)$ should have marginal distributions which coincide with the specified lower dimensional distribution functions.
§1.3 Stationarity and Strict Stationarity
When dealing with a finite number of random variables, it is often useful to compute the covariance matrix (see Section 1.6) in order to gain insight into the dependence between them. For a time series $\{X_t, t \in T\}$ we need to extend the concept of covariance matrix to deal with infinite collections of random variables. The autocovariance function provides us with the required extension.

Definition 1.3.1 (The Autocovariance Function). If $\{X_t, t \in T\}$ is a process such that $\operatorname{Var}(X_t) < \infty$ for each $t \in T$, then the autocovariance function $\gamma_X(\cdot,\cdot)$ of $\{X_t\}$ is defined by
$$\gamma_X(r, s) = \operatorname{Cov}(X_r, X_s) = E[(X_r - EX_r)(X_s - EX_s)], \qquad r, s \in T. \qquad (1.3.1)$$
Definition 1.3.2 (Stationarity). The time series $\{X_t, t \in \mathbb{Z}\}$, with index set $\mathbb{Z} = \{0, \pm 1, \pm 2, \ldots\}$, is said to be stationary if
(i) $E|X_t|^2 < \infty$ for all $t \in \mathbb{Z}$,
(ii) $EX_t = m$ for all $t \in \mathbb{Z}$, and
(iii) $\gamma_X(r, s) = \gamma_X(r + t, s + t)$ for all $r, s, t \in \mathbb{Z}$.

Remark 1. Stationarity as just defined is frequently referred to in the literature as weak stationarity, covariance stationarity, stationarity in the wide sense or second-order stationarity. For us however the term stationarity, without further qualification, will always refer to the properties specified by Definition 1.3.2.
Remark 2. If $\{X_t, t \in \mathbb{Z}\}$ is stationary then $\gamma_X(r, s) = \gamma_X(r - s, 0)$ for all $r, s \in \mathbb{Z}$. It is therefore convenient to redefine the autocovariance function of a stationary process as the function of just one variable,
$$\gamma_X(h) \equiv \gamma_X(h, 0) = \operatorname{Cov}(X_{t+h}, X_t) \quad \text{for all } t, h \in \mathbb{Z}.$$
The function $\gamma_X(\cdot)$ will be referred to as the autocovariance function of $\{X_t\}$ and $\gamma_X(h)$ as its value at "lag" $h$. The autocorrelation function (acf) of $\{X_t\}$ is defined analogously as the function whose value at lag $h$ is
$$\rho_X(h) \equiv \gamma_X(h)/\gamma_X(0) = \operatorname{Corr}(X_{t+h}, X_t) \quad \text{for all } t, h \in \mathbb{Z}.$$
Remark 3. It will be noticed that we have defined stationarity only in the case when $T = \mathbb{Z}$. It is not difficult to define stationarity using a more general index set, but for our purposes this will not be necessary. If we wish to model a set of data $\{x_t, t \in T \subset \mathbb{Z}\}$ as a realization of a stationary process, we can always consider it to be part of a realization of a stationary process $\{X_t, t \in \mathbb{Z}\}$.

Another important and frequently used notion of stationarity is introduced in the following definition.
Definition 1.3.3 (Strict Stationarity). The time series $\{X_t, t \in \mathbb{Z}\}$ is said to be strictly stationary if the joint distributions of $(X_{t_1}, \ldots, X_{t_k})'$ and $(X_{t_1+h}, \ldots, X_{t_k+h})'$ are the same for all positive integers $k$ and for all $t_1, \ldots, t_k, h \in \mathbb{Z}$.

Strict stationarity means intuitively that the graphs over two equal-length time intervals of a realization of the time series should exhibit similar statistical characteristics. For example, the proportion of ordinates not exceeding a given level $x$ should be roughly the same for both intervals.

Remark 4. Definition 1.3.3 is equivalent to the statement that $(X_1, \ldots, X_k)'$ and $(X_{1+h}, \ldots, X_{k+h})'$ have the same joint distribution for all positive integers $k$ and integers $h$.
The Relation Between Stationarity and Strict Stationarity

If $\{X_t\}$ is strictly stationary it immediately follows, on taking $k = 1$ in Definition 1.3.3, that $X_t$ has the same distribution for each $t \in \mathbb{Z}$. If $E|X_t|^2 < \infty$ this implies in particular that $EX_t$ and $\operatorname{Var}(X_t)$ are both constant. Moreover, taking $k = 2$ in Definition 1.3.3, we find that $X_{t+h}$ and $X_t$ have the same joint distribution and hence the same covariance for all $h \in \mathbb{Z}$. Thus a strictly stationary process with finite second moments is stationary.

The converse of the previous statement is not true. For example if $\{X_t\}$ is a sequence of independent random variables such that $X_t$ is exponentially distributed with mean one when $t$ is odd and normally distributed with mean one and variance one when $t$ is even, then $\{X_t\}$ is stationary with $\gamma_X(0) = 1$ and $\gamma_X(h) = 0$ for $h \neq 0$. However since $X_1$ and $X_2$ have different distributions, $\{X_t\}$ cannot be strictly stationary.

There is one important case however in which stationarity does imply strict stationarity.
Definition 1.3.4 (Gaussian Time Series). The process $\{X_t\}$ is a Gaussian time series if and only if the distribution functions of $\{X_t\}$ are all multivariate normal.

If $\{X_t, t \in \mathbb{Z}\}$ is a stationary Gaussian process then $\{X_t\}$ is strictly stationary, since for all $n \in \{1, 2, \ldots\}$ and for all $h, t_1, t_2, \ldots \in \mathbb{Z}$, the random vectors $(X_{t_1}, \ldots, X_{t_n})'$ and $(X_{t_1+h}, \ldots, X_{t_n+h})'$ have the same mean and covariance matrix, and hence the same distribution.
EXAMPLE 1.3.1. Let $X_t = A\cos(\theta t) + B\sin(\theta t)$ where $A$ and $B$ are two uncorrelated random variables with zero means and unit variances, with $\theta \in [-\pi, \pi]$. This time series is stationary since
$$\begin{aligned}
\operatorname{Cov}(X_{t+h}, X_t) &= \operatorname{Cov}(A\cos(\theta(t+h)) + B\sin(\theta(t+h)),\ A\cos(\theta t) + B\sin(\theta t))\\
&= \cos(\theta t)\cos(\theta(t+h)) + \sin(\theta t)\sin(\theta(t+h))\\
&= \cos(\theta h),
\end{aligned}$$
which is independent of $t$.
EXAMPLE 1.3.2. Starting with an independent and identically distributed sequence of zero-mean random variables $Z_t$ with finite variance $\sigma_Z^2$, define $X_t = Z_t + \theta Z_{t-1}$. Then the autocovariance function of $X_t$ is given by
$$\operatorname{Cov}(X_{t+h}, X_t) = \operatorname{Cov}(Z_{t+h} + \theta Z_{t+h-1},\ Z_t + \theta Z_{t-1}) = \begin{cases} (1 + \theta^2)\sigma_Z^2 & \text{if } h = 0,\\ \theta\sigma_Z^2 & \text{if } h = \pm 1,\\ 0 & \text{if } |h| > 1,\end{cases}$$
and hence $\{X_t\}$ is stationary. In fact it can be shown that $\{X_t\}$ is strictly stationary (see Problem 1.1).
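The autocovariances of Example 1.3.2 can be checked against their sample analogues by simulation. The following sketch uses Python with numpy; the choices theta = 0.6, sigma = 1 and n = 100000, and the function name sample_acvf, are assumptions made only for this illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    theta, sigma, n = 0.6, 1.0, 100_000
    z = rng.normal(0.0, sigma, size=n + 1)     # Z_0, Z_1, ..., Z_n
    x = z[1:] + theta * z[:-1]                 # X_t = Z_t + theta * Z_{t-1}

    def sample_acvf(x, h):
        """Sample autocovariance n^(-1) * sum_t (x_{t+h} - xbar)(x_t - xbar)."""
        xbar = x.mean()
        return np.sum((x[h:] - xbar) * (x[:len(x) - h] - xbar)) / len(x)

    print([round(sample_acvf(x, h), 3) for h in (0, 1, 2)])
    # values close to (1 + theta**2)*sigma**2 = 1.36, theta*sigma**2 = 0.6, and 0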
EXAMPLE 1.3.3. Let
$$X_t = \begin{cases} Y_t & \text{if } t \text{ is even},\\ Y_t + 1 & \text{if } t \text{ is odd},\end{cases}$$
where $\{Y_t\}$ is a stationary time series. Although $\operatorname{Cov}(X_{t+h}, X_t) = \gamma_Y(h)$, $\{X_t\}$ is not stationary, for it does not have a constant mean.
EXAMPLE 1.3.4. Referring to Example 1.2.3, let $S_t$ be the random walk $S_t = X_1 + X_2 + \cdots + X_t$, where $X_1, X_2, \ldots$ are independent and identically distributed with mean zero and variance $\sigma^2$. For $h > 0$,
$$\operatorname{Cov}(S_{t+h}, S_t) = \operatorname{Cov}\Bigl(\sum_{i=1}^{t+h} X_i,\ \sum_{j=1}^{t} X_j\Bigr) = \sigma^2 t,$$
and thus $\{S_t\}$ is not stationary.
Stationary processes play a crucial role in the analysis of time series.
Of course many observed time series (see Section 1.1) are decidedly non-stationary in appearance. Frequently such data sets can be transformed by the techniques described in Section 1.4 into series which can reasonably be
modelled as realizations of some stationary process. The theory of stationary
processes (developed in later chapters) is then used for the analysis, fitting and
prediction of the resulting series. In all of this the autocovariance function is
a primary tool. Its properties will be discussed in Section 1.5.
§1.4 The Estimation and Elimination of Trend and Seasonal Components
The first step in the analysis of any time series is to plot the data. If there are
apparent discontinuities in the series, such as a sudden change of level, it may
be advisable to analyze the series by first breaking it into homogeneous
segments. If there are outlying observations, they should be studied carefully
to check whether there is any justification for discarding them (as for example
if an observation has been recorded of some other process by mistake).
Inspection of a graph may also suggest the possibility of representing the data
as a realization of the process (the "classical decomposition" model),
$$X_t = m_t + s_t + Y_t, \qquad (1.4.1)$$
where $m_t$ is a slowly changing function known as a "trend component", $s_t$ is a function with known period $d$ referred to as a "seasonal component", and $Y_t$ is a "random noise component" which is stationary in the sense of Definition 1.3.2. If the seasonal and noise fluctuations appear to increase with the level of the process then a preliminary transformation of the data is often used to make the transformed data compatible with the model (1.4.1). See for example the airline passenger data, Figure 9.7, and the transformed data, Figure 9.8, obtained by applying a logarithmic transformation. In this section we shall discuss some useful techniques for identifying the components in (1.4.1).
Our aim is to estimate and extract the deterministic components $m_t$ and $s_t$ in the hope that the residual or noise component $Y_t$ will turn out to be a stationary random process. We can then use the theory of such processes to find a satisfactory probabilistic model for the process $\{Y_t\}$, to analyze its properties, and to use it in conjunction with $m_t$ and $s_t$ for purposes of prediction and control of $\{X_t\}$.

An alternative approach, developed extensively by Box and Jenkins (1970), is to apply difference operators repeatedly to the data $\{x_t\}$ until the differenced observations resemble a realization of some stationary process $\{W_t\}$. We can then use the theory of stationary processes for the modelling, analysis and prediction of $\{W_t\}$ and hence of the original process. The various stages of this procedure will be discussed in detail in Chapters 8 and 9.

The two approaches to trend and seasonality removal, (a) by estimation of $m_t$ and $s_t$ in (1.4.1) and (b) by differencing the data $\{x_t\}$, will now be illustrated with reference to the data presented in Section 1.1.
Elimination of a Trend in the Absence of Seasonality

In the absence of a seasonal component the model (1.4.1) becomes
$$X_t = m_t + Y_t, \qquad t = 1, \ldots, n, \qquad (1.4.2)$$
where, without loss of generality, we can assume that $EY_t = 0$.

Method 1 (Least Squares Estimation of $m_t$). In this procedure we attempt to fit a parametric family of functions, e.g.
$$m_t = a_0 + a_1 t + a_2 t^2, \qquad (1.4.3)$$
to the data by choosing the parameters, in this illustration $a_0$, $a_1$ and $a_2$, to minimize $\sum_t (x_t - m_t)^2$.

Fitting a function of the form (1.4.3) to the population data of Figure 1.2, $1790 \leq t \leq 1980$, gives the estimated parameter values
$$\hat{a}_0 = 2.097911 \times 10^{10}, \qquad \hat{a}_1 = -2.334962 \times 10^{7}, \qquad \hat{a}_2 = 6.498591 \times 10^{3}.$$

Figure 1.7. Population of the U.S.A., 1790-1980, showing the parabola fitted by least squares.
A graph of the fitted function is shown with the original data in Figure 1.7. The estimated values of the noise process $Y_t$, $1790 \leq t \leq 1980$, are the residuals obtained by subtraction of $\hat{m}_t = \hat{a}_0 + \hat{a}_1 t + \hat{a}_2 t^2$ from $x_t$.

The trend component $\hat{m}_t$ furnishes us with a natural predictor of future values of $X_t$. For example if we estimate $Y_{1990}$ by its mean value (i.e. zero) we obtain the estimate
$$\hat{m}_{1990} = 2.484 \times 10^8$$
for the population of the U.S.A. in 1990. However if the residuals $\{Y_t\}$ are highly correlated we may be able to use their values to give a better estimate of $Y_{1990}$ and hence of $X_{1990}$.
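Method 1 amounts to ordinary least squares. A minimal sketch of the quadratic fit to the population data of Example 1.1.2 is given below; it uses Python with numpy rather than the program PEST referred to in the text, and the variable names are illustrative.

    import numpy as np

    years = np.arange(1790, 1990, 10)
    pop = np.array([3929214, 5308483, 7239881, 9638453, 12860702, 17063353,
                    23191876, 31443321, 38558371, 50189209, 62979766, 76212168,
                    92228496, 106021537, 123202624, 132164569, 151325798,
                    179323175, 203302031, 226545805], dtype=float)

    # Fit m_t = a0 + a1*t + a2*t^2 by minimizing sum_t (x_t - m_t)^2, as in (1.4.3).
    a2, a1, a0 = np.polyfit(years, pop, deg=2)
    residuals = pop - (a0 + a1 * years + a2 * years**2)   # estimated noise Y_t
    m_1990 = a0 + a1 * 1990 + a2 * 1990**2                # trend forecast for 1990
    print(a0, a1, a2, m_1990)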
Method 2 (Smoothing by Means of a Moving Average). Let $q$ be a non-negative integer and consider the two-sided moving average
$$W_t = (2q + 1)^{-1}\sum_{j=-q}^{q} X_{t+j} \qquad (1.4.4)$$
of the process $\{X_t\}$ defined by (1.4.2). Then for $q + 1 \leq t \leq n - q$,
$$W_t = (2q + 1)^{-1}\sum_{j=-q}^{q} m_{t+j} + (2q + 1)^{-1}\sum_{j=-q}^{q} Y_{t+j} \approx m_t, \qquad (1.4.5)$$
assuming that $m_t$ is approximately linear over the interval $[t - q, t + q]$ and that the average of the error terms over this interval is close to zero.
The moving average thus provides us with the estimates
$$\hat{m}_t = (2q + 1)^{-1}\sum_{j=-q}^{q} x_{t+j}, \qquad q + 1 \leq t \leq n - q. \qquad (1.4.6)$$
Since $X_t$ is not observed for $t \leq 0$ or $t > n$ we cannot use (1.4.6) for $t \leq q$ or $t > n - q$. The program SMOOTH deals with this problem by defining $X_t := X_1$ for $t < 1$ and $X_t := X_n$ for $t > n$. The results of applying this program to the strike data of Figure 1.3 are shown in Figure 1.8. The estimated noise terms, $\hat{Y}_t = x_t - \hat{m}_t$, are shown in Figure 1.9. As expected, they show no apparent trend.

For any fixed $a \in [0, 1]$, the one-sided moving averages $\hat{m}_t$, $t = 1, \ldots, n$, defined by the recursions
$$\hat{m}_t = aX_t + (1 - a)\hat{m}_{t-1}, \qquad t = 2, \ldots, n, \qquad (1.4.7)$$
and
$$\hat{m}_1 = X_1, \qquad (1.4.8)$$
can also be computed using the program SMOOTH. Application of (1.4.7) and (1.4.8) is often referred to as exponential smoothing, since it follows from these recursions that, for $t \geq 2$, $\hat{m}_t = \sum_{j=0}^{t-2} a(1 - a)^j X_{t-j} + (1 - a)^{t-1}X_1$, a weighted moving average of $X_t, X_{t-1}, \ldots$, with weights decreasing exponentially (except for the last one).
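The two smoothers of Method 2 are straightforward to implement. The sketch below uses Python with numpy; it stands in for the program SMOOTH mentioned in the text, and the function names are illustrative. two_sided_ma implements (1.4.6) with the end convention $X_t := X_1$ for $t < 1$ and $X_t := X_n$ for $t > n$; exp_smooth implements the recursions (1.4.7)-(1.4.8).

    import numpy as np

    def two_sided_ma(x, q):
        """Two-sided moving average (1.4.6), padding the ends with x[0] and x[-1]."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        padded = np.concatenate((np.full(q, x[0]), x, np.full(q, x[-1])))
        return np.array([padded[t:t + 2 * q + 1].mean() for t in range(n)])

    def exp_smooth(x, a):
        """Exponential smoothing: m_1 = X_1 (1.4.8), m_t = a*X_t + (1-a)*m_{t-1} (1.4.7)."""
        x = np.asarray(x, dtype=float)
        m = np.empty(len(x))
        m[0] = x[0]
        for t in range(1, len(x)):
            m[t] = a * x[t] + (1 - a) * m[t - 1]
        return m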
Figure 1.8. Simple 5-term moving average $\hat{m}_t$ of the strike data from Figure 1.3.
Figure 1.9. Residuals, $\hat{Y}_t = x_t - \hat{m}_t$, after subtracting the 5-term moving average from the strike data.
It is useful to think of $\{\hat{m}_t\}$ in (1.4.6) as a process obtained from $\{X_t\}$ by application of a linear operator or linear filter, $\hat{m}_t = \sum_{j=-\infty}^{\infty} a_j X_{t+j}$, with weights $a_j = (2q + 1)^{-1}$, $-q \leq j \leq q$, and $a_j = 0$, $|j| > q$. This particular filter is a "low-pass" filter since it takes the data $\{x_t\}$ and removes from it the rapidly fluctuating (or high frequency) component $\{\hat{Y}_t\}$, to leave the slowly varying estimated trend term $\{\hat{m}_t\}$ (see Figure 1.10).

Figure 1.10. Smoothing with a low-pass linear filter.
The particular filter (1.4.6) is only one of many which could be used for smoothing. For large $q$, provided $(2q + 1)^{-1}\sum_{j=-q}^{q} Y_{t+j} \approx 0$, it will not only attenuate noise but at the same time will allow linear trend functions $m_t = at + b$ to pass without distortion. However we must beware of choosing $q$ to be too large since if $m_t$ is not linear, the filtered process, although smooth, will not be a good estimate of $m_t$. By clever choice of the weights $\{a_j\}$ it is possible to design a filter which will not only be effective in attenuating noise from the data, but which will also allow a larger class of trend functions (for example all polynomials of degree less than or equal to 3) to pass undistorted through the filter. The Spencer 15-point moving average for example has weights
$$a_j = 0, \qquad |j| > 7,$$
with
$$a_j = a_{-j}, \qquad |j| \leq 7,$$
and
$$[a_0, a_1, \ldots, a_7] = \tfrac{1}{320}[74, 67, 46, 21, 3, -5, -6, -3].$$
Applied to the process (1.4.2) with $m_t = at^3 + bt^2 + ct + d$, it gives
$$\sum_{i=-7}^{7} a_i X_{t+i} = \sum_{i=-7}^{7} a_i m_{t+i} + \sum_{i=-7}^{7} a_i Y_{t+i} \approx \sum_{i=-7}^{7} a_i m_{t+i} = m_t, \qquad (1.4.9)$$
where the last step depends on the assumed form of $m_t$ (Problem 1.2). Further details regarding this and other smoothing filters can be found in Kendall and Stuart, Volume 3, Chapter 46.
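As an illustration, the Spencer 15-point moving average can be applied as in the Python sketch below (illustrative names; not the book's software). The final check confirms numerically the property used in (1.4.9): a cubic trend passes through the filter undistorted.

    import numpy as np

    # Spencer 15-point weights: (1/320)[-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3]
    spencer = np.array([-3, -6, -5, 3, 21, 46, 67, 74,
                        67, 46, 21, 3, -5, -6, -3]) / 320.0

    def spencer_smooth(x):
        """Return sum_j a_j x_{t+j} for the central values 8 <= t <= n-7 (1-based)."""
        return np.convolve(np.asarray(x, dtype=float), spencer, mode="valid")

    t = np.arange(1, 31, dtype=float)
    cubic = 2.0 * t**3 - t**2 + 5.0
    print(np.allclose(spencer_smooth(cubic), cubic[7:-7]))   # True: cubics pass unchanged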
Method 3 (Differencing to Generate Stationary Data). Instead of attempting to remove the noise by smoothing as in Method 2, we now attempt to eliminate the trend term by differencing. We define the first difference operator $\nabla$ by
$$\nabla X_t = X_t - X_{t-1} = (1 - B)X_t, \qquad (1.4.10)$$
where $B$ is the backward shift operator,
$$BX_t = X_{t-1}. \qquad (1.4.11)$$
Powers of the operators $B$ and $\nabla$ are defined in the obvious way, i.e. $B^j(X_t) = X_{t-j}$ and $\nabla^j(X_t) = \nabla(\nabla^{j-1}(X_t))$, $j \geq 1$, with $\nabla^0(X_t) = X_t$. Polynomials in $B$ and $\nabla$ are manipulated in precisely the same way as polynomial functions of real variables. For example
$$\nabla^2 X_t = \nabla(\nabla X_t) = (1 - B)(1 - B)X_t = (1 - 2B + B^2)X_t = X_t - 2X_{t-1} + X_{t-2}.$$
If the operator $\nabla$ is applied to a linear trend function $m_t = at + b$, then we obtain the constant function $\nabla m_t = a$. In the same way any polynomial trend of degree $k$ can be reduced to a constant by application of the operator $\nabla^k$ (Problem 1.4).

Starting therefore with the model $X_t = m_t + Y_t$, where $m_t = \sum_{j=0}^{k} a_j t^j$ and $Y_t$ is stationary with mean zero, we obtain
$$\nabla^k X_t = k!\,a_k + \nabla^k Y_t,$$
a stationary process with mean $k!\,a_k$. These considerations suggest the possibility, given any sequence $\{x_t\}$ of data, of applying the operator $\nabla$ repeatedly until we find a sequence $\{\nabla^k x_t\}$ which can plausibly be modelled as a realization of a stationary process.
Figure 1.11. The twice-differenced series derived from the population data of Figure 1.2.
It is often found in practice that the order of differencing required is quite small, frequently one or two. (This depends on the fact that many functions can be well approximated, on an interval of finite length, by a polynomial of reasonably low degree.)

Applying this technique to the twenty population values $\{x_n, n = 1, \ldots, 20\}$ of Figure 1.2 we find that two differencing operations are sufficient to produce a series with no apparent trend. The differenced data, $\nabla^2 x_n = x_n - 2x_{n-1} + x_{n-2}$, are plotted in Figure 1.11. Notice that the magnitude of the fluctuations in $\nabla^2 x_n$ increases with the value of $x_n$. This effect can be suppressed by first taking natural logarithms, $y_n = \ln x_n$, and then applying the operator $\nabla^2$ to the series $\{y_n\}$. (See also Section 9.2(a).)
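Differencing as in Method 3 requires only np.diff. The Python sketch below is illustrative; pop refers to the population array defined in the earlier least squares sketch, an assumption of this example.

    import numpy as np

    def difference(x, k=1):
        """Apply the operator nabla^k of (1.4.10) to the data x."""
        return np.diff(np.asarray(x, dtype=float), n=k)

    pop_diff2 = difference(pop, k=2)           # nabla^2 x_n, as plotted in Figure 1.11
    log_diff2 = difference(np.log(pop), k=2)   # nabla^2 applied to y_n = ln x_n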
Elimination of Both Trend and Seasonality

The methods described for the removal of trend can be adapted in a natural way to eliminate both trend and seasonality in the general model
$$X_t = m_t + s_t + Y_t, \qquad (1.4.12)$$
where $EY_t = 0$, $s_{t+d} = s_t$ and $\sum_{j=1}^{d} s_j = 0$. We illustrate these methods with reference to the accident data of Example 1.1.6 (Figure 1.6), for which the period $d$ of the seasonal component is clearly 12.

It will be convenient in Method S1 to index the data by year and month. Thus $x_{j,k}$, $j = 1, \ldots, 6$, $k = 1, \ldots, 12$, will denote the number of accidental deaths reported for the $k$th month of the $j$th year, $(1972 + j)$. In other words we define
$$x_{j,k} = x_{k+12(j-1)}, \qquad j = 1, \ldots, 6, \quad k = 1, \ldots, 12.$$
Method S1 (The Small Trend Method). If the trend is small (as in the accident data) it is not unreasonable to suppose that the trend term is constant, say $m_j$, for the $j$th year. Since $\sum_{k=1}^{12} s_k = 0$, we are led to the natural unbiased estimate
$$\hat{m}_j = \tfrac{1}{12}\sum_{k=1}^{12} x_{j,k}, \qquad (1.4.13)$$
while for $s_k$, $k = 1, \ldots, 12$, we have the estimates
$$\hat{s}_k = \tfrac{1}{6}\sum_{j=1}^{6} (x_{j,k} - \hat{m}_j), \qquad (1.4.14)$$
which automatically satisfy the requirement that $\sum_{k=1}^{12} \hat{s}_k = 0$. The estimated error term for month $k$ of the $j$th year is of course
$$\hat{Y}_{j,k} = x_{j,k} - \hat{m}_j - \hat{s}_k, \qquad j = 1, \ldots, 6, \quad k = 1, \ldots, 12. \qquad (1.4.15)$$
The generalization of (1.4.13)-(1.4.15) to data with seasonality having a period other than 12 should be apparent.
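Method S1 reduces to row and column averages when the data are arranged as a (number of years) x d array. A minimal Python sketch follows; the input layout and the function name are assumptions of the sketch, not part of the text.

    import numpy as np

    def small_trend_decompose(x):
        """Method S1 for data x of shape (number of years, d), e.g. (6, 12)."""
        x = np.asarray(x, dtype=float)
        m_hat = x.mean(axis=1)                          # yearly levels, (1.4.13)
        s_hat = (x - m_hat[:, None]).mean(axis=0)       # seasonal components, (1.4.14)
        resid = x - m_hat[:, None] - s_hat[None, :]     # estimated noise, (1.4.15)
        return m_hat, s_hat, resid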
In Figures 1.12, 1.13 and 1.14 we have plotted respectively the detrended observations $x_{j,k} - \hat{m}_j$, the estimated seasonal components $\hat{s}_k$, and the detrended, deseasonalized observations $\hat{Y}_{j,k} = x_{j,k} - \hat{m}_j - \hat{s}_k$. The latter have no apparent trend or seasonality.

Figure 1.12. Monthly accidental deaths from Figure 1.6 after subtracting the trend estimated by Method S1.

Figure 1.13. The seasonal component of the monthly accidental deaths, estimated by Method S1.

Figure 1.14. The detrended and deseasonalized monthly accidental deaths (Method S1).

Figure 1.15. Comparison of the moving average and piecewise constant estimates of trend for the monthly accidental deaths.
Method S2 (Moving Average Estimation). The following technique is preferable to Method S1 since it does not rely on the assumption that $m_t$ is nearly constant over each cycle. It is the basis for the "classical decomposition" option in the time series identification section of the program PEST.

Suppose we have observations $\{x_1, \ldots, x_n\}$. The trend is first estimated by applying a moving average filter specially chosen to eliminate the seasonal component and to dampen the noise. If the period $d$ is even, say $d = 2q$, then we use
$$\hat{m}_t = (0.5\,x_{t-q} + x_{t-q+1} + \cdots + x_{t+q-1} + 0.5\,x_{t+q})/d, \qquad q < t \leq n - q. \qquad (1.4.16)$$
If the period is odd, say $d = 2q + 1$, then we use the simple moving average (1.4.6).

In Figure 1.15 we show the trend estimate $\hat{m}_t$, $6 < t \leq 66$, for the accidental deaths data obtained from (1.4.16). Also shown is the piecewise constant estimate obtained from Method S1.

The second step is to estimate the seasonal component. For each $k = 1, \ldots, d$ we compute the average $w_k$ of the deviations $\{(x_{k+jd} - \hat{m}_{k+jd}) : q < k + jd \leq n - q\}$. Since these average deviations do not necessarily sum to zero, we estimate the seasonal component $\hat{s}_k$ as
$$\hat{s}_k = w_k - d^{-1}\sum_{i=1}^{d} w_i, \qquad k = 1, \ldots, d, \qquad (1.4.17)$$
and $\hat{s}_k = \hat{s}_{k-d}$, $k > d$.

The deseasonalized data is then defined to be the original series with the estimated seasonal component removed, i.e.
$$d_t = x_t - \hat{s}_t, \qquad t = 1, \ldots, n. \qquad (1.4.18)$$
Finally we reestimate the trend from $\{d_t\}$ either by applying a moving average filter as described earlier for non-seasonal data, or by fitting a polynomial to the series $\{d_t\}$. The program PEST allows the options of fitting a linear or quadratic trend $\hat{m}_t$. The estimated noise terms are then
$$\hat{Y}_t = x_t - \hat{m}_t - \hat{s}_t, \qquad t = 1, \ldots, n.$$
The results of applying Methods S1 and S2 to the accidental deaths data are quite similar, since in this case the piecewise constant and moving average estimates of $m_t$ are reasonably close (see Figure 1.15).

A comparison of the estimates of $\hat{s}_k$, $k = 1, \ldots, 12$, obtained by Methods S1 and S2 is made in Table 1.1.

Table 1.1. Estimated Seasonal Components for the Accidental Deaths Data

    k                    1      2     3     4    5    6     7    8     9    10    11   12
    ŝ_k (Method S1)   -744  -1504  -724  -523  338  808  1665  961   -87   197  -321  -67
    ŝ_k (Method S2)   -804  -1522  -737  -526  343  746  1680  987  -109   258  -259  -57
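The steps of Method S2 for an even period d = 2q can be collected into a short function. The sketch below is written in Python with numpy; it is not the PEST implementation, and the NaN handling of the end values and the function name are assumptions of the sketch. Reestimation of the trend from the deseasonalized series (the final step described in the text) can then be carried out with the moving average or polynomial fits sketched earlier.

    import numpy as np

    def classical_decompose(x, d):
        """Method S2 for even period d = 2q: trend (1.4.16), seasonals (1.4.17), (1.4.18)."""
        x = np.asarray(x, dtype=float)
        n, q = len(x), d // 2
        w = np.r_[0.5, np.ones(d - 1), 0.5] / d            # weights of (1.4.16)
        m_hat = np.full(n, np.nan)                         # trend defined for q < t <= n - q
        m_hat[q:n - q] = np.convolve(x, w, mode="valid")
        # average deviation for each season k = 1, ..., d, then centre the averages
        dev = [np.nanmean((x - m_hat)[k::d]) for k in range(d)]
        s_hat = np.array(dev) - np.mean(dev)               # (1.4.17)
        deseason = x - np.tile(s_hat, n // d + 1)[:n]      # (1.4.18)
        return m_hat, s_hat, deseason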
Method S3 (Differencing at Lag d). The technique of differencing which we applied earlier to non-seasonal data can be adapted to deal with seasonality of period $d$ by introducing the lag-$d$ difference operator $\nabla_d$ defined by
$$\nabla_d X_t = X_t - X_{t-d} = (1 - B^d)X_t. \qquad (1.4.19)$$
(This operator should not be confused with the operator $\nabla^d = (1 - B)^d$ defined earlier.)

Applying the operator $\nabla_d$ to the model
$$X_t = m_t + s_t + Y_t,$$
where $\{s_t\}$ has period $d$, we obtain
$$\nabla_d X_t = m_t - m_{t-d} + Y_t - Y_{t-d},$$
which gives a decomposition of the difference $\nabla_d X_t$ into a trend component $(m_t - m_{t-d})$ and a noise term $(Y_t - Y_{t-d})$. The trend, $m_t - m_{t-d}$, can then be eliminated using the methods already described, for example by application of some power of the operator $\nabla$.
Figure 1.16. The differenced series $\{\nabla_{12}x_t, t = 13, \ldots, 72\}$ derived from the monthly accidental deaths $\{x_t, t = 1, \ldots, 72\}$.
Figure 1.16 shows the result of applying the operator $\nabla_{12}$ to the accidental deaths data. The seasonal component evident in Figure 1.6 is absent from the graph of $\nabla_{12}x_t$, $13 \leq t \leq 72$. There still appears to be a non-decreasing trend however. If we now apply the operator $\nabla$ to $\nabla_{12}x_t$ and plot the resulting differences $\nabla\nabla_{12}x_t$, $t = 14, \ldots, 72$, we obtain the graph shown in Figure 1.17, which has no apparent trend or seasonal component. In Chapter 9 we shall show that the differenced series can in fact be well represented by a stationary time series model.
In this section we have discussed a variety of methods for estimating and/or
removing trend and seasonality. The particular method chosen for any given
data set will depend on a number of factors including whether or not estimates
of the components of the series are required and whether or not it appears
that the data contains a seasonal component which does not vary with time.
The program PEST allows two options, one which decomposes the series as
described in Method S2, and the other which proceeds by successive differencing
of the data as in Methods 3 and S3.
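Method S3 is equally short to express in code. In the Python sketch below (illustrative only; deaths would hold the 72 monthly values of Example 1.1.6 and is not defined here), lag_diff applies the operator of (1.4.19) and np.diff the further differencing used to obtain Figure 1.17.

    import numpy as np

    def lag_diff(x, d):
        """Apply the lag-d difference operator of (1.4.19): X_t - X_{t-d}."""
        x = np.asarray(x, dtype=float)
        return x[d:] - x[:-d]

    # y  = lag_diff(deaths, 12)   # removes the seasonal component (Figure 1.16)
    # yy = np.diff(y)             # removes the remaining trend (Figure 1.17)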
§1.5 The Autocovariance Function of a Stationary Process

In this section we study the properties of the autocovariance function introduced in Section 1.3.
Figure 1.17. The differenced series $\{\nabla\nabla_{12}x_t, t = 14, \ldots, 72\}$ derived from the monthly accidental deaths $\{x_t, t = 1, \ldots, 72\}$.
Proposition 1.5.1 (Elementary Properties). If $\gamma(\cdot)$ is the autocovariance function of a stationary process $\{X_t, t \in \mathbb{Z}\}$, then
$$\gamma(0) \geq 0, \qquad (1.5.1)$$
$$|\gamma(h)| \leq \gamma(0) \quad \text{for all } h \in \mathbb{Z}, \qquad (1.5.2)$$
and $\gamma(\cdot)$ is even, i.e.
$$\gamma(h) = \gamma(-h) \quad \text{for all } h \in \mathbb{Z}. \qquad (1.5.3)$$

PROOF. The first property is a statement of the obvious fact that $\operatorname{Var}(X_t) \geq 0$, the second is an immediate consequence of the Cauchy-Schwarz inequality, and the third is established by observing that
$$\gamma(-h) = \operatorname{Cov}(X_{t-h}, X_t) = \operatorname{Cov}(X_t, X_{t+h}) = \gamma(h). \qquad \square$$
Autocovariance functions also have the more subtle property of non-negative definiteness.
Definition 1.5.1 (Non-Negative Definiteness). A real-valued function on the integers, κ: ℤ → ℝ, is said to be non-negative definite if and only if
∑_{i,j=1}^n a_i κ(t_i − t_j) a_j ≥ 0   (1.5.4)
for all positive integers n, for all vectors a = (a_1, ..., a_n)′ ∈ ℝⁿ and t = (t_1, ..., t_n)′ ∈ ℤⁿ, or equivalently if and only if ∑_{i,j=1}^n a_i κ(i − j) a_j ≥ 0 for all such n and a.
Theorem 1 .5.1 (Characterization of Autocovariance Functions). A real-valued
function defined on the integers is the autocovariance function of a
stationary time series if and only if it is even and non-negative definite.
PROOF. To show that the autocovariance function γ(·) of any stationary time series {X_t} is non-negative definite, we simply observe that if a = (a_1, ..., a_n)′ ∈ ℝⁿ, t = (t_1, ..., t_n)′ ∈ ℤⁿ, and Z = (X_{t_1} − EX_{t_1}, ..., X_{t_n} − EX_{t_n})′, then
∑_{i,j=1}^n a_i γ(t_i − t_j) a_j = a′Γ_n a = Var(a′Z) ≥ 0,
where Γ_n = [γ(t_i − t_j)]_{i,j=1}^n is the covariance matrix of (X_{t_1}, ..., X_{t_n})′.
To establish the converse, let κ: ℤ → ℝ be an even non-negative definite function. We need to show that there exists a stationary process with κ(·) as its autocovariance function, and for this we shall use Kolmogorov's theorem. For each positive integer n and each t = (t_1, ..., t_n)′ ∈ ℤⁿ such that t_1 < t_2 < ··· < t_n, let F_t be the distribution function on ℝⁿ with characteristic function
φ_t(u) = exp(−u′Ku/2),
where u = (u_1, ..., u_n)′ ∈ ℝⁿ and K = [κ(t_i − t_j)]_{i,j=1}^n. Since κ is non-negative definite, the matrix K is also non-negative definite and consequently φ_t is the characteristic function of an n-variate normal distribution with mean zero and covariance matrix K (see Section 1.6). Clearly, in the notation of Theorem 1.2.1,
φ_{t(i)}(u(i)) = lim_{u_i→0} φ_t(u) for each t ∈ 𝒯,
i.e. the distribution functions F_t are consistent, and so by Kolmogorov's theorem there exists a time series {X_t} with distribution functions F_t and characteristic functions φ_t, t ∈ 𝒯. In particular the joint distribution of X_i and X_j is bivariate normal with mean 0 and covariance matrix
[ κ(0)       κ(i − j) ]
[ κ(i − j)   κ(0)     ],
which shows that Cov(X_i, X_j) = κ(i − j) as required.   □
Remark 1. As shown in the proof of Theorem 1.5.1, for every autocovariance function γ(·), there exists a stationary Gaussian time series with γ(·) as its autocovariance function.
Remark 2. To verify that a given function is non-negative definite it is sometimes simpler to specify a stationary process with the given autocovariance function than to check Definition 1.5.1. For example the function κ(h) = cos(θh), h ∈ ℤ, is the autocovariance function of the process in Example 1.3.1 and is therefore non-negative definite. Direct verification by means of Definition 1.5.1 however is more difficult. Another simple criterion for checking non-negative definiteness is Herglotz's theorem, which will be proved in Section 4.3.
Remark 3. An autocorrelation function ρ(·) has all the properties of an autocovariance function and satisfies the additional condition ρ(0) = 1.
EXAMPLE 1.5.1. Let us show that the real-valued function on ℤ,
κ(h) = 1 if h = 0,  ρ if h = ±1,  0 otherwise,
is an autocovariance function if and only if |ρ| ≤ 1/2.
If |ρ| ≤ 1/2 then κ(·) is the autocovariance function of the process defined in Example 1.3.2 with σ² = (1 + θ²)⁻¹ and θ = (2ρ)⁻¹(1 ± √(1 − 4ρ²)).
If ρ > 1/2, K = [κ(i − j)]_{i,j=1}^n and a is the n-component vector a = (1, −1, 1, −1, ...)′, then
a′Ka = n − 2(n − 1)ρ < 0 for n > 2ρ/(2ρ − 1),
which shows that κ(·) is not non-negative definite and therefore, by Theorem 1.5.1, is not an autocovariance function.
If ρ < −1/2, the same argument using the n-component vector a = (1, 1, 1, ...)′ again shows that κ(·) is not non-negative definite.
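The argument of Example 1.5.1 can also be checked numerically: for ρ > 1/2 the quadratic form a′Ka becomes negative once n > 2ρ/(2ρ − 1). The following short Python sketch (illustrative only; NumPy assumed, function name ours) verifies this for ρ = 0.6.

```python
import numpy as np

def kappa_matrix(rho, n):
    """K = [kappa(i - j)] with kappa(0) = 1, kappa(±1) = rho, kappa(h) = 0 otherwise."""
    K = np.eye(n)
    idx = np.arange(n - 1)
    K[idx, idx + 1] = rho
    K[idx + 1, idx] = rho
    return K

rho = 0.6                                   # rho > 1/2, so kappa should fail to be an ACVF
n = int(2 * rho / (2 * rho - 1)) + 1        # smallest integer exceeding 2*rho/(2*rho - 1), here 7
a = np.array([(-1.0) ** i for i in range(n)])   # a = (1, -1, 1, -1, ...)'
print(a @ kappa_matrix(rho, n) @ a)         # equals n - 2(n-1)*rho = 7 - 7.2 = -0.2 < 0
```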
The Sample Autocovariance Function of an Observed Series
From the observations {x_1, x_2, ..., x_n} of a stationary time series {X_t} we frequently wish to estimate the autocovariance function γ(·) of the underlying process {X_t} in order to gain information concerning its dependence structure. This is an important step towards constructing an appropriate mathematical model for the data. The estimate of γ(·) which we shall use is the sample autocovariance function.
Definition 1.5.2. The sample autocovariance function of {x_1, ..., x_n} is defined by
γ̂(h) := n⁻¹ ∑_{j=1}^{n−h} (x_{j+h} − x̄)(x_j − x̄),   0 ≤ h < n,
and γ̂(h) = γ̂(−h), −n < h ≤ 0, where x̄ is the sample mean x̄ = n⁻¹ ∑_{j=1}^n x_j.
Remark 4. The divisor n is used rather than (n − h) since this ensures that the matrix Γ̂_n := [γ̂(i − j)]_{i,j=1}^n is non-negative definite (see Section 7.2).
Remark 5. The sample autocorrelation function is defined in terms of the sample autocovariance function as
ρ̂(h) := γ̂(h)/γ̂(0),   |h| < n.
The corresponding matrix R̂_n := [ρ̂(i − j)]_{i,j=1}^n is then also non-negative definite.
Remark 6. The large-sample properties of the estimators γ̂(h) and ρ̂(h) are discussed in Chapter 7.
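For concreteness, γ̂(h) and ρ̂(h) can be computed directly from Definition 1.5.2. The minimal Python sketch below (not part of the text; the function names are ours and do not correspond to PEST or ITSM routines) does so for a data vector x.

```python
import numpy as np

def sample_acvf(x, h):
    """gamma_hat(h) of Definition 1.5.2, using the divisor n."""
    x = np.asarray(x, dtype=float)
    n, h = len(x), abs(h)
    xbar = x.mean()
    return np.sum((x[h:] - xbar) * (x[:n - h] - xbar)) / n

def sample_acf(x, h):
    """rho_hat(h) = gamma_hat(h) / gamma_hat(0)."""
    return sample_acvf(x, h) / sample_acvf(x, 0)

x = np.array([1.0, 3.0, 2.0, 4.0, 3.0, 5.0])
print(sample_acf(x, 1))
```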
EXAMPLE 1.5.2. Figure 1.18(a) shows 300 simulated observations of the series X_t = Z_t + θZ_{t−1} of Example 1.3.2 with θ = 0.95 and Z_t ∼ N(0, 1). Figure 1.18(b) shows the corresponding sample autocorrelation function at lags 0, ..., 40. Notice the similarity between ρ̂(·) and the function ρ(·) computed as described in Example 1.3.2 (ρ(h) = 1 for h = 0, .4993 for h = ±1, 0 otherwise).
EXAMPLE 1.5.3. Figures 1.19(a) and 1.19(b) show simulated observations and the corresponding sample autocorrelation function for the process X_t = Z_t + θZ_{t−1}, this time with θ = −0.95 and Z_t ∼ N(0, 1). The similarity between ρ̂(·) and ρ(·) is again apparent.
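The agreement between ρ̂(1) and ρ(1) noted in these two examples is easy to reproduce by simulation. The following Python sketch (illustrative only; a fresh synthetic realization stands in for the series plotted in Figure 1.18) generates 300 observations of X_t = Z_t + 0.95 Z_{t−1} and computes ρ̂(1).

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 0.95
z = rng.standard_normal(301)
x = z[1:] + theta * z[:-1]                   # X_t = Z_t + 0.95 Z_{t-1}, t = 1,...,300

xbar = x.mean()
gamma0 = np.mean((x - xbar) ** 2)                              # gamma_hat(0), divisor n
gamma1 = np.sum((x[1:] - xbar) * (x[:-1] - xbar)) / len(x)     # gamma_hat(1)
print(round(gamma1 / gamma0, 3))  # typically close to rho(1) = 0.95/(1 + 0.95**2) = 0.4993
```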
Remark 7. Notice that the realization of Example 1.5.2 is less rapidly fluctuating than that of Example 1.5.3. This is to be expected from the two autocorrelation functions. Positive autocorrelation at lag 1 reflects a tendency for successive observations to lie on the same side of the mean, while negative autocorrelation at lag 1 reflects a tendency for successive observations to lie on opposite sides of the mean. Other properties of the sample-paths are also reflected in the autocorrelation (and sample autocorrelation) functions. For example the sample autocorrelation function of the Wolfer sunspot series (Figure 1.20) reflects the roughly periodic behaviour of the data (Figure 1.5).
Remark 8. The sample autocovariance and autocorrelation functions can be computed for any data set {x_1, ..., x_n} and are not restricted to realizations of a stationary process. For data containing a trend, |ρ̂(h)| will exhibit slow decay as h increases, and for data with a substantial deterministic periodic component, ρ̂(h) will exhibit similar behaviour with the same periodicity. Thus ρ̂(·) can be useful as an indicator of non-stationarity (see also Section 9.1).
Figure 1.18. (a) 300 observations of the series X_t = Z_t + .95Z_{t−1}, Example 1.5.2. (b) The sample autocorrelation function ρ̂(h), 0 ≤ h ≤ 40.
Figure 1.19. (a) 300 observations of the series X_t = Z_t − .95Z_{t−1}, Example 1.5.3. (b) The sample autocorrelation function ρ̂(h), 0 ≤ h ≤ 40.
Figure 1.20. The sample autocorrelation function of the Wolfer sunspot numbers (see Figure 1.5).
§ 1 .6 The Multivariate Normal Distribution
An n-dimensional random vector is a column vector X = (X_1, ..., X_n)′, each of whose components is a random variable. If E|X_i| < ∞ for each i, then we define the mean or expected value of X to be the column vector
EX = (EX_1, ..., EX_n)′.   (1.6.1)
In the same way we define the expected value of any array whose elements are random variables (e.g. a matrix of random variables) to be the same array with each random variable replaced by its expected value (assuming each expectation exists).
If X = (X_1, ..., X_n)′ and Y = (Y_1, ..., Y_m)′ are random vectors such that E|X_i|² < ∞, i = 1, ..., n, and E|Y_i|² < ∞, i = 1, ..., m, we define the covariance matrix of X and Y to be the matrix
Σ_XY = Cov(X, Y) = E[(X − EX)(Y − EY)′] = E(XY′) − (EX)(EY)′.   (1.6.2)
The (i, j)-element of Σ_XY is the covariance Cov(X_i, Y_j) = E(X_iY_j) − E(X_i)E(Y_j). In the special case when Y = X, Cov(X, Y) reduces to the covariance matrix of X.
Proposition 1.6.1. If a is an m-component column vector, B is an m × n matrix and X = (X_1, ..., X_n)′ where E|X_i|² < ∞, i = 1, ..., n, then the random vector
Y = a + BX   (1.6.3)
has mean
EY = a + B EX,   (1.6.4)
and covariance matrix
Σ_YY = B Σ_XX B′.   (1.6.5)
PROOF. Problem 1.15.
Proposition 1.6.2. The covariance matrix Σ_XX is symmetric and non-negative definite, i.e. b′Σ_XX b ≥ 0 for all b = (b_1, ..., b_n)′ ∈ ℝⁿ.
PROOF. The symmetry of Σ_XX is apparent from the definition. To prove non-negative definiteness let b = (b_1, ..., b_n)′ be an arbitrary vector in ℝⁿ. Then by Proposition 1.6.1,
b′Σ_XX b = Var(b′X) ≥ 0.   (1.6.6)   □
Proposition 1.6.3. Any symmetric, non-negative definite n × n matrix Σ can be written in the form
Σ = PΛP′,   (1.6.7)
where P is an orthogonal matrix (i.e. P′ = P⁻¹) and Λ is a diagonal matrix Λ = diag(λ_1, ..., λ_n) in which λ_1, ..., λ_n are the eigenvalues (all non-negative) of Σ.
PROOF. This proposition is a standard result from matrix theory and for a proof we refer the reader to Graybill (1983). We observe here only that if p_i, i = 1, ..., n, is a set of orthonormal right eigenvectors of Σ corresponding to the eigenvalues λ_1, ..., λ_n respectively, then P may be chosen as the n × n matrix whose iᵗʰ column is p_i, i = 1, ..., n.   □
Remark 1. Using the factorization (1.6.7) and the fact that det P · det P′ = det(PP′) = 1, we immediately obtain the result
det Σ = λ_1 λ_2 ··· λ_n.
Definition 1.6.1 (The Multivariate Normal Distribution). The random vector Y = (Y_1, ..., Y_n)′ is said to be multivariate normal, or to have a multivariate normal distribution, if and only if there exist a column vector a, a matrix B and a random vector X = (X_1, ..., X_m)′ with independent standard normal
components, such that
Y = a + BX.   (1.6.8)
Remark 2. The components X_1, ..., X_m of X in (1.6.8) must have the joint density
f_X(x) = (2π)^{−m/2} exp(−∑_{j=1}^m x_j²/2),   x = (x_1, ..., x_m)′ ∈ ℝᵐ,   (1.6.9)
and corresponding characteristic function
φ_X(u) = E e^{iu′X} = exp(−∑_{j=1}^m u_j²/2),   u = (u_1, ..., u_m)′ ∈ ℝᵐ.   (1.6.10)
Remark 3. It is clear from the definition that if Y has a multivariate normal distribution and if D is any k × n matrix and c any k × 1 vector, then Z = c + DY is a k-component multivariate normal random vector.
Remark 4. If Y is multivariate normal with representation (1.6.8), then by Proposition 1.6.1, EY = a and Σ_YY = BB′.
Proposition 1.6.4. If Y = (Y_1, ..., Y_n)′ is a multivariate normal random vector such that EY = μ and Σ_YY = Σ, then the characteristic function of Y is
φ_Y(u) = exp(iu′μ − u′Σu/2).   (1.6.11)
If det Σ > 0 then Y has the density
f_Y(y) = (2π)^{−n/2}(det Σ)^{−1/2} exp[−(y − μ)′Σ⁻¹(y − μ)/2].   (1.6.12)
PROOF. If Y is multivariate normal with representation (1.6.8) then
φ_Y(u) = E exp[iu′(a + BX)] = exp(iu′a) E exp(iu′BX).
Using (1.6.10) with u (∈ ℝᵐ) replaced by B′u (u ∈ ℝⁿ) in order to evaluate the last term, we obtain
φ_Y(u) = exp(iu′a) exp(−u′BB′u/2),
which reduces to (1.6.11) by Remark 4.
If det Σ > 0, then by Proposition 1.6.3 we have the factorization
Σ = PΛP′,
where PP′ = I_n, the n × n identity matrix, Λ = diag(λ_1, ..., λ_n) and each λ_i > 0. If we define Λ^{−1/2} = diag(λ_1^{−1/2}, ..., λ_n^{−1/2}) and
Σ^{−1/2} = PΛ^{−1/2}P′,
then it is easy to check that Σ^{−1/2}ΣΣ^{−1/2} = I_n. From Proposition 1.6.1 and Remark 3 we conclude that the random vector
Z = Σ^{−1/2}(Y − μ)   (1.6.13)
is multivariate normal with EZ = 0 and Σ_ZZ = I_n. Application of the result (1.6.11) now shows that Z has the characteristic function φ_Z(u) = exp(−u′u/2), whence it follows that Z has the probability density (1.6.9) with m = n. In view of the relation (1.6.13), the density of Y is given by
f_Y(y) = |det Σ^{−1/2}| f_Z(Σ^{−1/2}(y − μ)) = (det Σ)^{−1/2}(2π)^{−n/2} exp[−(y − μ)′Σ⁻¹(y − μ)/2],
as required.   □
Remark 5. The transformation (1.6.13) which maps Y into a vector of independent standard normal random variables is clearly a generalization of the transformation Z = σ⁻¹(Y − μ) which standardizes a single normal random variable with mean μ and variance σ².
Remark 6. Given any vector μ ∈ ℝⁿ and any symmetric non-negative definite n × n matrix Σ, there exists a multivariate normal random vector with mean μ and covariance matrix Σ. To construct such a random vector from a vector X = (X_1, ..., X_n)′ with independent standard normal components we simply choose a = μ and B = Σ^{1/2} in (1.6.8), where Σ^{1/2}, in the terminology of Proposition 1.6.3, is the matrix PΛ^{1/2}P′ with Λ^{1/2} = diag(λ_1^{1/2}, ..., λ_n^{1/2}).
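The construction of Remark 6 translates directly into a few lines of code. The sketch below (illustrative Python with NumPy; the function name is ours) forms Σ^{1/2} = PΛ^{1/2}P′ from an eigendecomposition and uses it to generate vectors with a prescribed mean and covariance matrix.

```python
import numpy as np

def mvn_sample(mu, sigma, size, rng):
    """Draw 'size' vectors from N(mu, sigma) via B = P Lambda^{1/2} P' as in Remark 6."""
    lam, P = np.linalg.eigh(sigma)                           # sigma = P diag(lam) P'
    B = P @ np.diag(np.sqrt(np.clip(lam, 0, None))) @ P.T    # sigma^{1/2}
    X = rng.standard_normal((size, len(mu)))                 # independent standard normal components
    return mu + X @ B.T                                      # rows are Y = mu + B X

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
Y = mvn_sample(mu, sigma, 100_000, rng)
print(np.cov(Y, rowvar=False).round(2))   # close to sigma, since B B' = sigma
```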
Remark 7. Proposition 1.6.4 shows that a multivariate normal distribution is uniquely determined by its mean and covariance matrix. If Y is multivariate normal, EY = μ and Σ_YY = Σ, we shall therefore say that Y has the multivariate normal distribution with mean μ and covariance matrix Σ, or more succinctly,
Y ∼ N(μ, Σ).
EXAMPLE 1.6.1 (The Bivariate Normal Distribution). The random vector Y = (Y_1, Y_2)′ is bivariate normal with mean μ = (μ_1, μ_2)′ and covariance matrix
Σ = [ σ_1²      ρσ_1σ_2 ]
    [ ρσ_1σ_2   σ_2²    ]   (1.6.14)
if and only if Y has the characteristic function (from (1.6.11))
φ_Y(u) = exp[i(u_1μ_1 + u_2μ_2) − (u_1²σ_1² + 2u_1u_2ρσ_1σ_2 + u_2²σ_2²)/2].   (1.6.15)
The parameters σ_1, σ_2 and ρ are the standard deviations and correlation of the components Y_1 and Y_2. Since every symmetric non-negative definite 2 × 2 matrix can be written in the form (1.6.14), it follows that every bivariate normal random vector has a characteristic function of the form (1.6.15). If σ_1 ≠ 0, σ_2 ≠ 0 and −1 < ρ < 1 then Σ has an inverse,
Σ⁻¹ = (1 − ρ²)⁻¹ [ σ_1⁻²           −ρσ_1⁻¹σ_2⁻¹ ]
                 [ −ρσ_1⁻¹σ_2⁻¹    σ_2⁻²        ],   (1.6.16)
and so by (1.6.12), Y has the probability density
f_Y(y) = (2πσ_1σ_2√(1 − ρ²))⁻¹ exp{−[((y_1 − μ_1)/σ_1)² − 2ρ((y_1 − μ_1)/σ_1)((y_2 − μ_2)/σ_2) + ((y_2 − μ_2)/σ_2)²]/(2(1 − ρ²))}.   (1.6.17)
Proposition 1.6.5. The random vector Y = (Y_1, ..., Y_n)′ is multivariate normal with mean μ and covariance matrix Σ if and only if for each a = (a_1, ..., a_n)′ ∈ ℝⁿ, a′Y has a univariate normal distribution with mean a′μ and variance a′Σa.
PROOF. The necessity of the condition has already been established. To prove the sufficiency we shall show that Y has the appropriate characteristic function. For any a ∈ ℝⁿ we are assuming that a′Y ∼ N(a′μ, a′Σa), or equivalently that
E exp(ita′Y) = exp(ita′μ − t²a′Σa/2).   (1.6.18)
Setting t = 1 in (1.6.18) we obtain the required characteristic function of Y, viz.
E exp(ia′Y) = exp(ia′μ − a′Σa/2).   □
Another important property of multivariate normal distributions (one which we shall use heavily) is that all conditional distributions are again multivariate normal. In the following proposition we shall suppose that Y is partitioned into two subvectors,
Y = [ Y^(1) ]
    [ Y^(2) ].
Correspondingly we can write the mean and covariance matrix of Y as
μ = [ μ^(1) ]     and     Σ = [ Σ_11   Σ_12 ]
    [ μ^(2) ]                 [ Σ_21   Σ_22 ],
where μ^(i) = EY^(i) and Σ_ij = E(Y^(i) − μ^(i))(Y^(j) − μ^(j))′.
Proposition 1.6.6.
(i) Y^(1) and Y^(2) are independent if and only if Σ_12 = 0.
(ii) If det Σ_22 > 0 then the conditional distribution of Y^(1) given Y^(2) is
N(μ^(1) + Σ_12Σ_22⁻¹(Y^(2) − μ^(2)), Σ_11 − Σ_12Σ_22⁻¹Σ_21).
PROOF. (i) If Y^(1) and Y^(2) are independent, then
Σ_12 = E(Y^(1) − μ^(1)) E(Y^(2) − μ^(2))′ = 0.
Conversely if Σ_12 = 0 then the characteristic function φ_Y(u), as specified by
Proposition 1.6.4, factorizes into
φ_Y(u) = φ_{Y^(1)}(u^(1)) φ_{Y^(2)}(u^(2)),
establishing the independence of Y^(1) and Y^(2).
(ii) If we define
X = Y^(1) − μ^(1) − Σ_12Σ_22⁻¹(Y^(2) − μ^(2)),   (1.6.19)
then clearly
Cov(X, Y^(2)) = Σ_12 − Σ_12Σ_22⁻¹Σ_22 = 0,
so that X and Y^(2) are independent by (i). Using the relation (1.6.19) we can express the conditional characteristic function of Y^(1) given Y^(2) as
E(exp(iu′Y^(1)) | Y^(2)) = E(exp[iu′X + iu′(μ^(1) + Σ_12Σ_22⁻¹(Y^(2) − μ^(2)))] | Y^(2))
= exp[iu′(μ^(1) + Σ_12Σ_22⁻¹(Y^(2) − μ^(2)))] E(exp(iu′X) | Y^(2)),
where the last line is obtained by taking a factor dependent only on Y^(2) outside the conditional expectation. Now since X and Y^(2) are independent,
E(exp(iu′X) | Y^(2)) = E exp(iu′X) = exp[−u′(Σ_11 − Σ_12Σ_22⁻¹Σ_21)u/2],
so
E(exp(iu′Y^(1)) | Y^(2)) = exp[iu′(μ^(1) + Σ_12Σ_22⁻¹(Y^(2) − μ^(2))) − u′(Σ_11 − Σ_12Σ_22⁻¹Σ_21)u/2],
completing the proof.   □
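Proposition 1.6.6(ii) gives an explicit recipe for the conditional mean and covariance matrix. The following illustrative Python sketch (the function name and the numerical example are ours, not from the text) implements it and checks the bivariate case of Example 1.6.2.

```python
import numpy as np

def conditional_mvn(mu1, mu2, s11, s12, s22, y2):
    """Mean and covariance of Y^(1) | Y^(2) = y2 from Proposition 1.6.6(ii)."""
    s22_inv = np.linalg.inv(s22)
    cond_mean = mu1 + s12 @ s22_inv @ (y2 - mu2)
    cond_cov = s11 - s12 @ s22_inv @ s12.T
    return cond_mean, cond_cov

# Bivariate example: sigma1 = sigma2 = 1, rho = 0.5, mu = (0, 0), conditioning on Y2 = 2.
m, C = conditional_mvn(np.array([0.0]), np.array([0.0]),
                       np.array([[1.0]]), np.array([[0.5]]), np.array([[1.0]]),
                       np.array([2.0]))
print(m, C)   # mean 1.0 (= rho * y2), variance 0.75 (= 1 - rho**2)
```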
EXAMPLE 1.6.2. For the bivariate normal random vector Y discussed in Example 1.6.1 we immediately deduce from Proposition 1.6.6 that Y_1 and Y_2 are independent if and only if ρσ_1σ_2 = 0. If σ_1 > 0, σ_2 > 0 and ρ > 0 then conditional on Y_2, Y_1 is normal with mean
E(Y_1 | Y_2) = μ_1 + ρσ_1σ_2⁻¹(Y_2 − μ_2),
and variance
Var(Y_1 | Y_2) = σ_1²(1 − ρ²).
§ 1 .7* Applications of Kolmogorov's Theorem
In this section we illustrate the use of Theorem 1.2.1 to establish the existence of two important processes, Brownian motion and the Poisson process.
Definition 1.7.1 (Standard Brownian Motion). Standard Brownian motion starting at level zero is a process {B(t), t ≥ 0} satisfying the conditions
(a) B(0) = 0,
(b) B(t_2) − B(t_1), B(t_3) − B(t_2), ..., B(t_n) − B(t_{n−1}) are independent for every n ∈ {3, 4, ...} and every t = (t_1, ..., t_n)′ such that 0 ≤ t_1 < t_2 < ··· < t_n,
(c) B(t) − B(s) ∼ N(0, t − s) for t ≥ s.
To establish the existence of such a process we observe that conditions (a), (b) and (c) are satisfied if and only if, for every t = (t_1, ..., t_n)′ such that 0 ≤ t_1 < ··· < t_n, the characteristic function of (B(t_1), ..., B(t_n))′ is
φ_t(u) = E exp[iu_1B(t_1) + ··· + iu_nB(t_n)]
= E exp[iu_1Δ_1 + iu_2(Δ_1 + Δ_2) + ··· + iu_n(Δ_1 + ··· + Δ_n)]
(where Δ_j = B(t_j) − B(t_{j−1}), j ≥ 1, and t_0 = 0)
= E exp[iΔ_1(u_1 + ··· + u_n) + iΔ_2(u_2 + ··· + u_n) + ··· + iΔ_n u_n]
= exp[−(1/2) ∑_{j=1}^n (u_j + ··· + u_n)²(t_j − t_{j−1})].   (1.7.1)
It is trivial to check that the characteristic functions φ_t(·) satisfy the consistency condition (1.2.9) and so by Kolmogorov's theorem there exists a process with characteristic functions φ_t(·), or equivalently with the properties (a), (b) and (c).
Definition 1.7.2 (Brownian Motion with Drift). Brownian motion with drift μ, variance parameter σ² and initial level x is a process {Y(t), t ≥ 0} where
Y(t) = x + μt + σB(t),
and B(t) is standard Brownian motion.
The existence of Brownian motion with drift follows at once from that of
standard Brownian motion.
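Conditions (a)–(c) also suggest a direct way to simulate Brownian motion with drift on a finite grid, by summing independent normal increments. The following Python sketch (illustrative only; NumPy assumed, function name ours) does this.

```python
import numpy as np

def brownian_path(times, mu=0.0, sigma=1.0, x0=0.0, rng=None):
    """Simulate Y(t) = x0 + mu*t + sigma*B(t) at the given strictly increasing times."""
    if rng is None:
        rng = np.random.default_rng()
    times = np.asarray(times, dtype=float)
    dt = np.diff(np.concatenate(([0.0], times)))
    increments = rng.normal(0.0, np.sqrt(dt))   # independent N(0, t_j - t_{j-1}) increments
    return x0 + mu * times + sigma * np.cumsum(increments)

path = brownian_path(np.linspace(0.01, 1.0, 100), mu=0.5, sigma=2.0)
print(path[-1])   # one simulated value of Y(1)
```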
Definition 1.7.3 (Poisson Process). A Poisson process with mean rate λ (> 0) is a process {N(t), t ≥ 0} satisfying the conditions
(a) N(0) = 0,
(b) N(t_2) − N(t_1), N(t_3) − N(t_2), ..., N(t_n) − N(t_{n−1}) are independent for every n ∈ {3, 4, ...} and every t = (t_1, ..., t_n)′ such that 0 ≤ t_1 < t_2 < ··· < t_n,
(c) N(t) − N(s) has the Poisson distribution with mean λ(t − s) for t ≥ s.
The proof of the existence of a Poisson process follows precisely the same steps as the proof of the existence of standard Brownian motion. For the Poisson process however the characteristic function of the increment Δ_j = N(t_j) − N(t_{j−1}) is
E exp(iuΔ_j) = exp{−λ(t_j − t_{j−1})(1 − e^{iu})}.
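A Poisson process can be simulated on a grid in exactly the same way, replacing the normal increments by Poisson increments with means λ(t_j − t_{j−1}). A minimal Python sketch (illustrative only) follows.

```python
import numpy as np

def poisson_path(times, lam, rng=None):
    """Simulate N(t) at increasing times using independent Poisson(lam*(t_j - t_{j-1})) increments."""
    if rng is None:
        rng = np.random.default_rng()
    times = np.asarray(times, dtype=float)
    dt = np.diff(np.concatenate(([0.0], times)))
    return np.cumsum(rng.poisson(lam * dt))

print(poisson_path(np.arange(1, 11), lam=2.0))   # counts at t = 1,...,10; E N(10) = 20
```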
In fact the same proof establishes the existence of a process {Z(t), t ≥ 0} satisfying conditions (a) and (b) of Definition 1.7.1 provided the increments Δ_j = Z(t_j) − Z(t_{j−1}) have characteristic functions of the form
E exp(iuΔ_j) = exp{(t_j − t_{j−1}) η(u)}
for some function η(·).
Problems
1.1. Suppose that X_t = Z_t + θZ_{t−1}, t = 1, 2, ..., where Z_0, Z_1, Z_2, ... are independent random variables, each with moment generating function E exp(λZ_i) = m(λ).
(a) Express the joint moment generating function E exp(∑_{i=1}^n λ_iX_i) in terms of the function m(·).
(b) Deduce from (a) that {X_t} is strictly stationary.
1.2. (a) Show that a linear filter {a_j} passes an arbitrary polynomial of degree k without distortion, i.e.
m_t = ∑_j a_j m_{t+j}
for all kᵗʰ degree polynomials m_t = c_0 + c_1t + ··· + c_kt^k, if and only if
∑_j a_j = 1   and   ∑_j j^r a_j = 0 for r = 1, ..., k.
(b) Show that the Spencer 15-point moving average filter {a_j} does not distort a cubic trend.
1.3. Suppose that m_t = c_0 + c_1t + c_2t², t = 0, ±1, ....
(a) Show that
m_t = ∑_{i=−2}^{2} a_i m_{t+i} = ∑_{i=−3}^{3} b_i m_{t+i},   t = 0, ±1, ...,
where a_2 = a_{−2} = −3/35, a_1 = a_{−1} = 12/35, a_0 = 17/35, and b_3 = b_{−3} = −2/21, b_2 = b_{−2} = 3/21, b_1 = b_{−1} = 6/21, b_0 = 7/21.
(b) Suppose that X_t = m_t + Z_t where {Z_t, t = 0, ±1, ...} is an independent sequence of normal random variables, each with mean 0 and variance σ². Let U_t = ∑_{i=−2}^{2} a_i X_{t+i} and V_t = ∑_{i=−3}^{3} b_i X_{t+i}.
(i) Find the means and variances of U_t and V_t.
(ii) Find the correlations between U_t and U_{t+1} and between V_t and V_{t+1}.
(iii) Which of the two filtered series {U_t} and {V_t} would you expect to be smoother in appearance?
1.4. If m_t = ∑_{k=0}^{p} c_k t^k, t = 0, ±1, ..., show that ∇m_t is a polynomial of degree (p − 1) in t and hence that ∇^{p+1}m_t = 0.
1 .5. Design a symmetric moving average filter which eliminates seasonal components
with period 3 and which at the same time passes quadratic trend functions
without distortion.
1.6. (a) Use the programs WORD6 and PEST to plot the series with values {x_1, ..., x_30} given by
1–10:  486 474 434 441 435 401 414 414 386 405
11–20: 411 389 414 426 410 441 459 449 486 510
21–30: 506 549 579 581 630 666 674 729 771 785
This series is the sum of a quadratic trend and a period-three seasonal component.
(b) Apply the filter found in Problem 1.5 to the preceding series and plot the result. Comment on the result.
1.7. Let Z_t, t = 0, ±1, ..., be independent normal random variables each with mean 0 and variance σ², and let a, b and c be constants. Which, if any, of the following processes are stationary? For each stationary process specify the mean and autocovariance function.
(a) X_t = a + bZ_t + cZ_{t−1},
(b) X_t = a + bZ_0,
(c) X_t = Z_1 cos(ct) + Z_2 sin(ct),
(d) X_t = Z_0 cos(ct),
(e) X_t = Z_t cos(ct) + Z_{t−1} sin(ct),
(f) X_t = Z_t Z_{t−1}.
1.8. Let {Y_t} be a stationary process with mean zero and let a and b be constants.
(a) If X_t = a + bt + s_t + Y_t where s_t is a seasonal component with period 12, show that ∇∇_12 X_t = (1 − B)(1 − B¹²)X_t is stationary.
(b) If X_t = (a + bt)s_t + Y_t where s_t is again a seasonal component with period 12, show that ∇²_12 X_t = (1 − B¹²)(1 − B¹²)X_t is stationary.
1.9. Use the program PEST to analyze the accidental deaths data by "classical decomposition".
(a) Plot the data.
(b) Find estimates ŝ_t, t = 1, ..., 12, for the classical decomposition model X_t = m_t + s_t + Y_t, where s_t = s_{t+12}, ∑_{t=1}^{12} s_t = 0 and EY_t = 0.
(c) Plot the deseasonalized data, X_t − ŝ_t, t = 1, ..., 72.
(d) Fit a parabola by least squares to the deseasonalized data and use it as your estimate m̂_t of m_t.
(e) Plot the residuals Ŷ_t = X_t − m̂_t − ŝ_t, t = 1, ..., 72.
(f) Compute the sample autocorrelation function of the residuals, ρ̂(h), h = 0, ..., 20.
(g) Use your fitted model to predict X_t, t = 73, ..., 84 (using predicted noise values of zero).
1.10. Let X_t = a + bt + Y_t, where {Y_t, t = 0, ±1, ...} is an independent and identically distributed sequence of random variables with mean 0 and variance σ², and a and b are constants. Define
W_t = (2q + 1)⁻¹ ∑_{j=−q}^{q} X_{t+j}.
Compute the mean and autocovariance function of {W_t}. Notice that although {W_t} is not stationary, its autocovariance function γ(t + h, t) = Cov(W_{t+h}, W_t) does not depend on t. Plot the autocorrelation function ρ(h) = Corr(W_{t+h}, W_t). Discuss your results in relation to the smoothing of a time series.
1.11. If {X_t} and {Y_t} are uncorrelated stationary sequences, i.e. if X_s and Y_t are uncorrelated for every s and t, show that {X_t + Y_t} is stationary with autocovariance function equal to the sum of the autocovariance functions of {X_t} and {Y_t}.
1.12. Which, if any, of the following functions defined on the integers is the autocovariance function of a stationary time series?
(a) f(h) = 1 if h = 0, 1/h if h ≠ 0,
(b) f(h) = (−1)^{|h|},
(c) f(h) = 1 + cos(πh/2) + cos(πh/4),
(d) f(h) = 1 + cos(πh/2) − cos(πh/4),
(e) f(h) = 1 if h = 0, .4 if h = ±1, 0 otherwise,
(f) f(h) = 1 if h = 0, .6 if h = ±1, 0 otherwise.
1.13. Let {S_t, t = 0, 1, 2, ...} be the random walk with constant drift μ, defined by S_0 = 0 and
S_t = μ + S_{t−1} + X_t,   t = 1, 2, ...,
where X_1, X_2, ... are independent and identically distributed random variables with mean 0 and variance σ². Compute the mean of S_t and the autocovariance function of the process {S_t}. Show that {∇S_t} is stationary and compute its mean and autocovariance function.
1.14. If X_t = a + bt, t = 1, 2, ..., n, where a and b are constants, show that the sample autocorrelations have the property ρ̂(k) → 1 as n → ∞ for each fixed k.
1.15. Prove Proposition 1.6.1.
1.16. (a) If Z ∼ N(0, 1), show that Z² has moment generating function E e^{tZ²} = (1 − 2t)^{−1/2} for t < 1/2, thus showing that Z² has the chi-squared distribution with 1 degree of freedom.
(b) If Z_1, ..., Z_n are independent N(0, 1) random variables, prove that Z_1² + ··· + Z_n² has the chi-squared distribution with n degrees of freedom by showing that its moment generating function is equal to (1 − 2t)^{−n/2} for t < 1/2.
(c) Suppose that X = (X_1, ..., X_n)′ ∼ N(μ, Σ) with Σ non-singular. Using (1.6.13), show that (X − μ)′Σ⁻¹(X − μ) has the chi-squared distribution with n degrees of freedom.
1.17. If X = (X_1, ..., X_n)′ is a random vector with covariance matrix Σ, show that Σ is singular if and only if there exists a non-zero vector b = (b_1, ..., b_n)′ ∈ ℝⁿ such that Var(b′X) = 0.
1.18.* Let F be any distribution function, let T be the index set T = {1, 2, 3, ...} and let 𝒯 be as in Definition 1.2.3. Show that the functions F_t, t ∈ 𝒯, defined by
F_{t_1,...,t_n}(x_1, ..., x_n) := F(x_1) ··· F(x_n),   x_1, ..., x_n ∈ ℝ,
constitute a family of distribution functions, consistent in the sense of (1.2.8). By Kolmogorov's theorem this establishes that there exists a sequence of independent random variables {X_1, X_2, ...} defined on some probability space and such that P(X_i ≤ x) = F(x) for all i and for all x ∈ ℝ.
CHAPTER 2
Hilbert Spaces
Although it is possible to study time series analysis without explicit use of
Hilbert space terminology and techniques, there are great advantages to be
gained from a Hilbert space formulation. These are largely derived from our
familiarity with two- and three-dimensional Euclidean geometry and in par­
ticular with the concepts of orthogonality and orthogonal projections in these
spaces. These concepts, appropriately extended to infinite-dimensional Hilbert
spaces, play a central role in the study of random variables with finite second
moments and especially in the theory of prediction of stationary processes.
Intuition gained from Euclidean geometry can often be used to make
apparently complicated algebraic results in time series analysis geometrically
obvious. It frequently serves also as a valuable guide in the development and
construction of algorithms.
This chapter is therefore devoted to a study of those aspects of Hilbert
space theory which are needed for a geometric understanding of the later
chapters in this book. The results developed here will also provide an adequate
background for a geometric approach to many other areas of statistics, for
example the general linear model (see Section 2.6). For the reader who wishes
to go deeper into the theory of Hilbert space we recommend the book by
Simmons ( 1 963).
§2.1 Inner-Product Spaces and Their Properties
Definition 2.1.1 (Inner-Product Space). A complex vector space ℋ is said to be an inner-product space if for each pair of elements x and y in ℋ, there is a complex number ⟨x, y⟩, called the inner product of x and y, such that
(a) ⟨x, y⟩ is the complex conjugate of ⟨y, x⟩,
(b) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ for all x, y, z ∈ ℋ,
(c) ⟨αx, y⟩ = α⟨x, y⟩ for all x, y ∈ ℋ and α ∈ ℂ,
(d) ⟨x, x⟩ ≥ 0 for all x ∈ ℋ,
(e) ⟨x, x⟩ = 0 if and only if x = 0.
Remark 1. A real vector space ℋ is an inner-product space if for each x, y ∈ ℋ there exists a real number ⟨x, y⟩ satisfying conditions (a)–(e). Of course condition (a) reduces in this case to ⟨x, y⟩ = ⟨y, x⟩.
Remark 2. The inner product is a natural generalization of the inner or scalar product of two vectors in n-dimensional Euclidean space. Since many of the properties of Euclidean space carry over to inner-product spaces, it will be helpful to keep Euclidean space in mind in all that follows.
EXAMPLE 2.1.1 (Euclidean Space). The set of all column vectors x = (x_1, ..., x_k)′ ∈ ℝᵏ is a real inner-product space if we define
⟨x, y⟩ = ∑_{i=1}^k x_i y_i.   (2.1.1)
Equation (2.1.1) defines the usual scalar product of elements of ℝᵏ. It is a simple matter to check that the conditions (a)–(e) are all satisfied.
In the same way it is easy to see that the set of all complex k-dimensional column vectors z = (z_1, ..., z_k)′ ∈ ℂᵏ is a complex inner-product space if we define
⟨w, z⟩ = ∑_{i=1}^k w_i z̄_i.   (2.1.2)
Definition 2.1.2 (Norm). The norm of an element x of an inner-product space is defined to be
‖x‖ = ⟨x, x⟩^{1/2}.   (2.1.3)
In the Euclidean space ℝᵏ the norm of a vector x is simply its length, ‖x‖ = (∑_{i=1}^k x_i²)^{1/2}.
The Cauchy–Schwarz Inequality. If ℋ is an inner-product space, then
|⟨x, y⟩| ≤ ‖x‖ ‖y‖ for all x, y ∈ ℋ,   (2.1.4)
and
|⟨x, y⟩| = ‖x‖ ‖y‖ if and only if x = y⟨x, y⟩/⟨y, y⟩.   (2.1.5)
PROOF. The following proof for complex ℋ remains valid (although it could be slightly simplified) in the case when ℋ is real.
Let a = ‖y‖², b = |⟨x, y⟩| and c = ‖x‖². The polar representation of ⟨x, y⟩ is then
⟨x, y⟩ = b e^{iθ} for some θ ∈ (−π, π].
Now for all r ∈ ℝ,
⟨x − re^{iθ}y, x − re^{iθ}y⟩ = ⟨x, x⟩ − re^{iθ}⟨y, x⟩ − re^{−iθ}⟨x, y⟩ + r²⟨y, y⟩ = c − 2rb + r²a,   (2.1.6)
and using elementary calculus, we deduce from this that
0 ≤ min_{r∈ℝ} (c − 2rb + r²a) = c − b²/a,
thus establishing (2.1.4).
The minimum value, c − b²/a, of c − 2rb + r²a is achieved when r = b/a. If equality is achieved in (2.1.4) then c − b²/a = 0. Setting r = b/a in (2.1.6) we then obtain
⟨x − ye^{iθ}b/a, x − ye^{iθ}b/a⟩ = 0,
which, by property (e) of inner products, implies that
x = ye^{iθ}b/a = y⟨x, y⟩/⟨y, y⟩.
Conversely if x = y⟨x, y⟩/⟨y, y⟩ (or equivalently if x is any scalar multiple of y), it is obvious that there is equality in (2.1.4).   □
EXAMPLE 2.1.2 (The Angle between Elements of a Real Inner-Product Space). In the inner-product space ℝ³ of Example 2.1.1, the angle between two non-zero vectors x and y is the angle in [0, π] whose cosine is ∑_{i=1}^3 x_i y_i/(‖x‖ ‖y‖). Analogously we define the angle between non-zero elements x and y of any real inner-product space to be
θ = cos⁻¹[⟨x, y⟩/(‖x‖ ‖y‖)].   (2.1.7)
In particular x and y are said to be orthogonal if and only if ⟨x, y⟩ = 0. For non-zero vectors x and y this is equivalent to the statement that θ = π/2.
The Triangle Inequality. If ℋ is an inner-product space, then
‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ ℋ.   (2.1.8)
PROOF.
‖x + y‖² = ⟨x + y, x + y⟩ = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩ ≤ ‖x‖² + 2‖x‖ ‖y‖ + ‖y‖²
by the Cauchy–Schwarz inequality.   □
Proposition 2.1.1 (Properties of the Norm). If ℋ is a complex (respectively real) inner-product space and ‖x‖ is defined as in (2.1.3), then
(a) ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ ℋ,
(b) ‖αx‖ = |α| ‖x‖ for all x ∈ ℋ and all α ∈ ℂ (α ∈ ℝ),
(c) ‖x‖ ≥ 0 for all x ∈ ℋ,
(d) ‖x‖ = 0 if and only if x = 0.
(These properties justify the use of the terminology "norm" for ‖x‖.)
PROOF. The first property is a restatement of the triangle inequality and the others follow at once from Definition (2.1.3).   □
The Parallelogram Law. If ℋ is an inner-product space, then
‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖² for all x, y ∈ ℋ.   (2.1.9)
PROOF. Problem 2.1. Note that (2.1.9) is not a consequence of the properties (a), (b), (c) and (d) of the norm. It depends on the particular form (2.1.3) of the norm as defined for elements of an inner-product space.   □
Definition 2.1.3 (Convergence in Norm). A sequence {x_n, n = 1, 2, ...} of elements of an inner-product space ℋ is said to converge in norm to x ∈ ℋ if ‖x_n − x‖ → 0 as n → ∞.
Proposition 2.1.2 (Continuity of the Inner Product). If {x_n} and {y_n} are sequences of elements of the inner-product space ℋ such that ‖x_n − x‖ → 0 and ‖y_n − y‖ → 0 where x, y ∈ ℋ, then
(a) ‖x_n‖ → ‖x‖
and
(b) ⟨x_n, y_n⟩ → ⟨x, y⟩.
PROOF. From the triangle inequality it follows that ‖x‖ ≤ ‖x − y‖ + ‖y‖ and ‖y‖ ≤ ‖y − x‖ + ‖x‖. These statements imply that
‖x − y‖ ≥ | ‖x‖ − ‖y‖ |,   (2.1.10)
from which (a) follows immediately. Now
|⟨x_n, y_n⟩ − ⟨x, y⟩| = |⟨x_n, y_n − y⟩ + ⟨x_n − x, y⟩| ≤ |⟨x_n, y_n − y⟩| + |⟨x_n − x, y⟩| ≤ ‖x_n‖ ‖y_n − y‖ + ‖x_n − x‖ ‖y‖,
where the last line follows from the Cauchy–Schwarz inequality. Observing from (a) that ‖x_n‖ → ‖x‖, we conclude that ⟨x_n, y_n⟩ → ⟨x, y⟩.   □
§2.2 Hilbert Spaces
An inner-product space with the additional property of completeness is called
a Hilbert space. To define completeness we first need the concept of a Cauchy
sequence.
Definition 2.2.1 (Cauchy Sequence). A sequence {x_n, n = 1, 2, ...} of elements of an inner-product space is said to be a Cauchy sequence if
‖x_n − x_m‖ → 0 as m, n → ∞,
i.e. if for every ε > 0 there exists a positive integer N(ε) such that
‖x_n − x_m‖ < ε for all m, n > N(ε).
Definition 2.2.2 (Hilbert Space). A Hilbert space ℋ is an inner-product space which is complete, i.e. an inner-product space in which every Cauchy sequence {x_n} converges in norm to some element x ∈ ℋ.
EXAMPLE 2.2.1 (Euclidean Space). The completeness of the inner-product space ℝᵏ defined in Example 2.1.1 can be verified as follows. If x_n = (x_{n1}, x_{n2}, ..., x_{nk})′ ∈ ℝᵏ satisfies
‖x_n − x_m‖² = ∑_{i=1}^k |x_{ni} − x_{mi}|² → 0 as m, n → ∞,
then each of the components must satisfy
|x_{ni} − x_{mi}| → 0 as m, n → ∞.
By the completeness of ℝ, there exists x_i ∈ ℝ such that
|x_{ni} − x_i| → 0 as n → ∞,
and hence if x = (x_1, ..., x_k)′, then
‖x_n − x‖ → 0 as n → ∞.
Completeness of the complex inner-product space ℂᵏ can be checked in the same way. Thus ℝᵏ and ℂᵏ are both Hilbert spaces.
EXAMPLE 2.2.2 (The Space L²(Ω, ℱ, P)). Consider a probability space (Ω, ℱ, P) and the collection C of all random variables X defined on Ω and satisfying the condition
EX² = ∫_Ω X(ω)² P(dω) < ∞.
With the usual notion of multiplication by a real scalar and addition of random variables, it is clear that C is a vector space since
E(aX)² = a²EX² < ∞ for all a ∈ ℝ and X ∈ C,
and, from the inequality (X + Y)² ≤ 2X² + 2Y²,
E(X + Y)² ≤ 2EX² + 2EY² < ∞ for all X, Y ∈ C.
The other properties required of a vector space are easily checked. In particular C has a zero element, the random variable which is identically zero on Ω.
For any two elements X, Y ∈ C we now define
⟨X, Y⟩ = E(XY).   (2.2.1)
It is easy to check that ⟨X, Y⟩ satisfies all the properties of an inner product except for the last. If ⟨X, X⟩ = 0 then it does not follow that X(ω) = 0 for all ω, but only that P(X = 0) = 1. This difficulty is circumvented by saying that the random variables X and Y are equivalent if P(X = Y) = 1. This equivalence relation partitions C into classes of random variables such that any two random variables in the same class are equal with probability one. The space L² (or more specifically L²(Ω, ℱ, P)) is the collection of these equivalence classes with inner product defined by (2.2.1). Since each class is uniquely determined by specifying any one of the random variables in it, we shall continue to use the notation X, Y for elements of L² and to call them random variables (or functions) although it is sometimes important to remember that X stands for the collection of all random variables which are equivalent to X.
Norm convergence of a sequence {X_n} of elements of L² to the limit X means
‖X_n − X‖² = E|X_n − X|² → 0 as n → ∞.
Norm convergence of X_n to X in an L² space is called mean-square convergence and is written as X_n →_{m.s.} X.
To complete the proof that L² is a Hilbert space we need to establish completeness, i.e. that if ‖X_m − X_n‖² → 0 as m, n → ∞, then there exists X ∈ L² such that X_n →_{m.s.} X. This is indeed true but not so easy to prove as the completeness of ℝᵏ. We therefore defer the proof to Section 2.10.
EXAMPLE 2.2.3 (Complex L² Spaces). The space of complex-valued random variables X on (Ω, ℱ, P) satisfying E|X|² < ∞ is a complex Hilbert space if we define an inner product by
⟨X, Y⟩ = E(X Ȳ).   (2.2.2)
In fact if μ is any finite non-zero measure on the measurable space (Ω, ℱ), and if D is the class of complex-valued functions f on Ω such that
∫_Ω |f|² dμ < ∞   (2.2.3)
(with identification of functions f and g such that ∫_Ω |f − g|² dμ = 0), then D becomes a Hilbert space if we define the inner product to be
⟨f, g⟩ = ∫_Ω f ḡ dμ.   (2.2.4)
This space will be referred to as the complex Hilbert space L²(Ω, ℱ, μ). (The real Hilbert space L²(Ω, ℱ, μ) is obtained if D is replaced by the real-valued functions satisfying (2.2.3). The definition of ⟨f, g⟩ then reduces to ∫_Ω fg dμ.)
Remark 1. The terms L²(Ω, ℱ, P) and L²(Ω, ℱ, μ) will be reserved for the respective real Hilbert spaces unless we state specifically that reference is being made to the corresponding complex spaces.
Proposition 2.2.1 (Norm Convergence and the Cauchy Criterion). If {x_n} is a sequence of elements belonging to a Hilbert space ℋ, then {x_n} converges in norm if and only if ‖x_n − x_m‖ → 0 as m, n → ∞.
PROOF. The sufficiency of the Cauchy criterion is simply a restatement of the completeness of ℋ. The necessity is an elementary consequence of the triangle inequality. Thus if ‖x_n − x‖ → 0,
‖x_n − x_m‖ ≤ ‖x_n − x‖ + ‖x − x_m‖ → 0 as m, n → ∞.   □
EXAMPLE 2.2.4. The Cauchy criterion is used primarily in checking for the norm convergence of a sequence whose limit is not specified. Consider for example the sequence
S_n = ∑_{i=1}^n a_iX_i,   (2.2.5)
where {X_i} is a sequence of independent N(0, 1) random variables. It is easy to see that with the usual definition of the L²-norm,
‖S_m − S_n‖² = ∑_{i=n+1}^m a_i²,   m > n,
and so by the Cauchy criterion {S_n} has a mean-square limit if and only if for every ε > 0, there exists N(ε) > 0 such that ∑_{i=n+1}^m a_i² < ε for m > n > N(ε). Thus {S_n} converges in mean square if and only if ∑_{i=1}^∞ a_i² < ∞.
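The Cauchy criterion of Example 2.2.4 can be examined numerically by computing the tail sums ∑_{i=n+1}^m a_i². In the illustrative Python sketch below (not part of the text) these tail sums remain small when a_i = 1/i but not when a_i = 1/√i, in agreement with the convergence criterion ∑ a_i² < ∞.

```python
import numpy as np

def tail_sum_of_squares(a, n, m):
    """The Cauchy quantity ||S_m - S_n||^2 = sum_{i=n+1}^m a_i^2 of Example 2.2.4."""
    i = np.arange(n + 1, m + 1)
    return np.sum(a(i) ** 2)

print(tail_sum_of_squares(lambda i: 1.0 / i, 1000, 2000))           # about 0.0005: sum 1/i^2 converges
print(tail_sum_of_squares(lambda i: 1.0 / np.sqrt(i), 1000, 2000))  # about log 2: sum 1/i diverges
```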
§2.3 The Projection Theorem
We begin this section with two examples which illustrate the use of the
projection theorem in particular Hilbert spaces. The general result is then
established as Theorem 2.3. 1 .
EXAMPLE 2.3.1 (Linear Approximation in ℝ³). Suppose we are given three vectors in ℝ³,
Figure 2.1. The best linear approximation ŷ = α̂_1x_1 + α̂_2x_2 to y.
y = (1/2, 1/2, 1)′,   x_1 = (1, 0, 1/2)′,   x_2 = (0, 1, 1/2)′.
Our problem is to find the linear combination ŷ = α_1x_1 + α_2x_2 which is closest to y in the sense that S = ‖y − α_1x_1 − α_2x_2‖² is minimized.
One approach to this problem is to write S in the form S = (1/2 − α_1)² + (1/2 − α_2)² + (1 − α_1/2 − α_2/2)² and then to use calculus to minimize with respect to α_1 and α_2. In the alternative geometric approach to the problem we observe that the required vector ŷ = α_1x_1 + α_2x_2 is the vector in the plane determined by x_1 and x_2 such that y − α_1x_1 − α_2x_2 is orthogonal to the plane of x_1 and x_2 (see Figure 2.1). The orthogonality condition may be stated as
⟨y − α_1x_1 − α_2x_2, x_i⟩ = 0,   i = 1, 2,   (2.3.1)
or equivalently
α_1⟨x_1, x_1⟩ + α_2⟨x_2, x_1⟩ = ⟨y, x_1⟩,
α_1⟨x_1, x_2⟩ + α_2⟨x_2, x_2⟩ = ⟨y, x_2⟩.
For the particular vectors x_1, x_2 and y specified, these equations become
(5/4)α_1 + (1/4)α_2 = 1,
(1/4)α_1 + (5/4)α_2 = 1,
from which we deduce that α_1 = α_2 = 2/3, and ŷ = (2/3, 2/3, 2/3)′.
EXAMPLE 2.3.2 (Linear Approximation in L²(Ω, ℱ, P)). Now suppose that X_1, X_2 and Y are random variables in L²(Ω, ℱ, P). If only X_1 and X_2 are observed we may wish to estimate the value of Y by using the linear combination Ŷ = α_1X_1 + α_2X_2 which minimizes the mean squared error,
S = E|Y − α_1X_1 − α_2X_2|² = ‖Y − α_1X_1 − α_2X_2‖².
As in Example 2.3.1 there are at least two possible approaches to this problem. The first is to write
S = EY² + α_1²EX_1² + α_2²EX_2² − 2α_1E(YX_1) − 2α_2E(YX_2) + 2α_1α_2E(X_1X_2),
and then to minimize with respect to α_1 and α_2 by setting the appropriate derivatives equal to zero. However it is also possible to use the same geometric approach as in Example 2.3.1. Our aim is to find an element Ŷ in the set
ℳ = {X ∈ L²(Ω, ℱ, P): X = a_1X_1 + a_2X_2 for some a_1, a_2 ∈ ℝ},
whose "squared distance" from Y, ‖Y − Ŷ‖², is as small as possible. By analogy with Example 2.3.1 we might expect Ŷ to have the property that Y − Ŷ is orthogonal to all elements of ℳ. The validity of this analogy, and the extent to which it may be applied in more general situations, is established in Theorem 2.3.1 (the projection theorem). Applying it to our present problem, we can write
⟨Y − α_1X_1 − α_2X_2, X⟩ = 0 for all X ∈ ℳ,   (2.3.2)
or equivalently, by the linearity of the inner product,
⟨Y − α_1X_1 − α_2X_2, X_i⟩ = 0,   i = 1, 2.   (2.3.3)
These are the same equations for α_1 and α_2 as (2.3.1), although the inner product is of course defined differently in (2.3.3). In terms of expectations we can rewrite (2.3.3) in the form
α_1E(X_1²) + α_2E(X_2X_1) = E(YX_1),
α_1E(X_1X_2) + α_2E(X_2²) = E(YX_2),
from which α_1 and α_2 are easily found.
Before establishing the projection theorem for a general Hilbert space we need to introduce a certain amount of new terminology.
Definition 2.3.1 (Closed Subspace). A linear subspace ℳ of a Hilbert space ℋ is said to be a closed subspace of ℋ if ℳ contains all of its limit points (i.e. if x_n ∈ ℳ and ‖x_n − x‖ → 0 imply that x ∈ ℳ).
Definition 2.3.2 (Orthogonal Complement). The orthogonal complement of a subset ℳ of ℋ is defined to be the set ℳ⊥ of all elements of ℋ which are orthogonal to every element of ℳ. Thus
x ∈ ℳ⊥ if and only if ⟨x, y⟩ = 0 (written x ⊥ y) for all y ∈ ℳ.   (2.3.4)
Proposition 2.3.1. If ℳ is any subset of a Hilbert space ℋ then ℳ⊥ is a closed subspace of ℋ.
PROOF. It is easy to check from (2.3.4) that 0 ∈ ℳ⊥ and that if x_1, x_2 ∈ ℳ⊥ then all linear combinations of x_1 and x_2 belong to ℳ⊥. Hence ℳ⊥ is a subspace of ℋ. If x_n ∈ ℳ⊥ and ‖x_n − x‖ → 0, then by continuity of the inner product (Proposition 2.1.2), ⟨x, y⟩ = 0 for all y ∈ ℳ, so x ∈ ℳ⊥ and hence ℳ⊥ is closed.   □
Theorem 2.3.1 (The Projection Theorem). If ℳ is a closed subspace of the Hilbert space ℋ and x ∈ ℋ, then
(i) there is a unique element x̂ ∈ ℳ such that
‖x − x̂‖ = inf_{y∈ℳ} ‖x − y‖,   (2.3.5)
and
(ii) x̂ ∈ ℳ and ‖x − x̂‖ = inf_{y∈ℳ} ‖x − y‖ if and only if x̂ ∈ ℳ and (x − x̂) ∈ ℳ⊥.
[The element x̂ is called the (orthogonal) projection of x onto ℳ.]
PROOF. (i) If d = inf_{y∈ℳ} ‖x − y‖² then there is a sequence {y_n} of elements of ℳ such that ‖y_n − x‖² → d. Applying the parallelogram law (2.1.9), and using the fact that (y_m + y_n)/2 ∈ ℳ, we can write
0 ≤ ‖y_m − y_n‖² = −4‖(y_m + y_n)/2 − x‖² + 2(‖y_n − x‖² + ‖y_m − x‖²)
≤ −4d + 2(‖y_n − x‖² + ‖y_m − x‖²) → 0 as m, n → ∞.
Consequently, by the Cauchy criterion, there exists x̂ ∈ ℋ such that ‖y_n − x̂‖ → 0. Since ℳ is closed we know that x̂ ∈ ℳ, and by continuity of the inner product
‖x − x̂‖² = lim_{n→∞} ‖x − y_n‖² = d.
To establish uniqueness, suppose that ŷ ∈ ℳ and that ‖x − ŷ‖² = ‖x − x̂‖² = d. Then, applying the parallelogram law again,
0 ≤ ‖x̂ − ŷ‖² = −4‖(x̂ + ŷ)/2 − x‖² + 2(‖x − x̂‖² + ‖x − ŷ‖²) ≤ −4d + 4d = 0.
Hence ŷ = x̂.
(ii) If x̂ ∈ ℳ and (x − x̂) ∈ ℳ⊥ then x̂ is the unique element of ℳ defined in (i) since for any y ∈ ℳ,
‖x − y‖² = ⟨x − x̂ + x̂ − y, x − x̂ + x̂ − y⟩ = ‖x − x̂‖² + ‖x̂ − y‖² ≥ ‖x − x̂‖²,
with equality if and only if y = x̂.
Conversely if x̂ ∈ ℳ and (x − x̂) ∉ ℳ⊥ then x̂ is not the element of ℳ closest to x since
x̃ = x̂ + ay/‖y‖²
is closer, where y is any element of ℳ such that ⟨x − x̂, y⟩ ≠ 0 and
a = ⟨x − x̂, y⟩. To see this we write
‖x − x̃‖² = ⟨x − x̂ + x̂ − x̃, x − x̂ + x̂ − x̃⟩
= ‖x − x̂‖² + |a|²/‖y‖² + 2 Re⟨x − x̂, x̂ − x̃⟩
= ‖x − x̂‖² − |a|²/‖y‖²
< ‖x − x̂‖².   □
Corollary 2.3.1 (The Projection Mapping of ℋ onto ℳ). If ℳ is a closed subspace of the Hilbert space ℋ and I is the identity mapping on ℋ, then there is a unique mapping P_ℳ of ℋ onto ℳ such that I − P_ℳ maps ℋ onto ℳ⊥. P_ℳ is called the projection mapping of ℋ onto ℳ.
PROOF. By Theorem 2.3.1, for each x ∈ ℋ there is a unique x̂ ∈ ℳ such that x − x̂ ∈ ℳ⊥. The required mapping is therefore
P_ℳ x = x̂,   x ∈ ℋ.   (2.3.6)   □
Proposition 2.3.2 (Properties of Projection Mappings). Let ℋ be a Hilbert space and let P_ℳ denote the projection mapping onto a closed subspace ℳ. Then
(i) P_ℳ(αx + βy) = αP_ℳ x + βP_ℳ y,   x, y ∈ ℋ, α, β ∈ ℂ,
(ii) ‖x‖² = ‖P_ℳ x‖² + ‖(I − P_ℳ)x‖²,
(iii) each x ∈ ℋ has a unique representation as a sum of an element of ℳ and an element of ℳ⊥, i.e.
x = P_ℳ x + (I − P_ℳ)x,   (2.3.7)
(iv) P_ℳ x_n → P_ℳ x if ‖x_n − x‖ → 0,
(v) x ∈ ℳ if and only if P_ℳ x = x,
(vi) x ∈ ℳ⊥ if and only if P_ℳ x = 0,
and
(vii) ℳ_1 ⊆ ℳ_2 if and only if P_ℳ₁P_ℳ₂ x = P_ℳ₁ x for all x ∈ ℋ.
PROOF. (i) αP_ℳ x + βP_ℳ y ∈ ℳ since ℳ is a linear subspace of ℋ. Also
αx + βy − (αP_ℳ x + βP_ℳ y) = α(x − P_ℳ x) + β(y − P_ℳ y) ∈ ℳ⊥,
since ℳ⊥ is a linear subspace of ℋ by Proposition 2.3.1. These two properties identify αP_ℳ x + βP_ℳ y as the projection P_ℳ(αx + βy).
(ii) This is an immediate consequence of the orthogonality of P_ℳ x and (I − P_ℳ)x.
(iii) One such representation is clearly x = P_ℳ x + (I − P_ℳ)x. If x = y + z, y ∈ ℳ, z ∈ ℳ⊥, is another, then
y − P_ℳ x + z − (I − P_ℳ)x = 0.
Taking inner products of each side with y − P_ℳ x gives ‖y − P_ℳ x‖² = 0, since z − (I − P_ℳ)x ∈ ℳ⊥. Hence y = P_ℳ x and z = (I − P_ℳ)x.
(iv) By (ii), ‖P_ℳ(x_n − x)‖² ≤ ‖x_n − x‖² → 0 if ‖x_n − x‖ → 0.
(v) x ∈ ℳ if and only if the unique representation x = y + z, y ∈ ℳ, z ∈ ℳ⊥, is such that y = x and z = 0, i.e. if and only if P_ℳ x = x.
(vi) Repeat the argument in (v) with y = 0 and z = x.
(vii) x = P_ℳ₂ x + (I − P_ℳ₂)x. Projecting each side onto ℳ_1 we obtain
P_ℳ₁ x = P_ℳ₁P_ℳ₂ x + P_ℳ₁(I − P_ℳ₂)x.
Hence P_ℳ₁ x = P_ℳ₁P_ℳ₂ x for all x ∈ ℋ if and only if P_ℳ₁ y = 0 for all y ∈ ℳ_2⊥, i.e. if and only if ℳ_2⊥ ⊆ ℳ_1⊥, i.e. if and only if ℳ_1 ⊆ ℳ_2.   □
The Prediction Equations. Given a Hilbert space ℋ, a closed subspace ℳ, and an element x ∈ ℋ, Theorem 2.3.1 shows that the element of ℳ closest to x is the unique element x̂ ∈ ℳ such that
⟨x − x̂, y⟩ = 0 for all y ∈ ℳ.   (2.3.8)
The equations (2.3.1) and (2.3.2) which arose in Examples 2.3.1 and 2.3.2 are special cases of (2.3.8). In later chapters we shall constantly be making use of the equations (2.3.8), interpreting x̂ = P_ℳ x as the best predictor of x in the subspace ℳ.
Remark 1. It is helpful to visualize the projection theorem in terms of Figure 2.1, which depicts the special case in which ℋ = ℝ³, ℳ is the plane containing x_1 and x_2, and ŷ = P_ℳ y. The prediction equation (2.3.8) is simply the statement (obvious in this particular example) that y − ŷ must be orthogonal to ℳ. The projection theorem tells us that ŷ = P_ℳ y is uniquely determined by this condition for any Hilbert space ℋ and closed subspace ℳ. This justifies in particular our use of equations (2.3.2) in Example 2.3.2. As we shall see later (especially in Chapter 5), the projection theorem plays a fundamental role in all problems involving the approximation or prediction of random variables with finite variance.
EXAMPLE 2.3.3 (Minimum Mean Squared Error Linear Prediction of a Stationary Process). Let {X_t, t = 0, ±1, ...} be a stationary process on (Ω, ℱ, P) with mean zero and autocovariance function γ(·), and consider the problem of finding the linear combination X̂_{n+1} = ∑_{j=1}^n φ_{nj}X_{n+1−j} which best approximates X_{n+1} in the sense that E|X_{n+1} − ∑_{j=1}^n φ_{nj}X_{n+1−j}|² is minimum. This problem is easily solved with the aid of the projection theorem by taking ℋ = L²(Ω, ℱ, P) and ℳ = {∑_{j=1}^n α_jX_{n+1−j}: α_1, ..., α_n ∈ ℝ}. Since minimization of E|X_{n+1} − X̂_{n+1}|² is identical to minimization of the squared norm ‖X_{n+1} − X̂_{n+1}‖², we see at once that X̂_{n+1} = P_ℳ X_{n+1}. The prediction equations (2.3.8) are
⟨X_{n+1} − ∑_{j=1}^n φ_{nj}X_{n+1−j}, Y⟩ = 0 for all Y ∈ ℳ,
which, by the linearity of the inner product, are equivalent to the n equations
⟨X_{n+1} − ∑_{j=1}^n φ_{nj}X_{n+1−j}, X_k⟩ = 0,   k = n, n − 1, ..., 1.
Recalling the definition ⟨X, Y⟩ = E(XY) of the inner product in L²(Ω, ℱ, P), we see that the prediction equations can be written in the form
Γ_n φ_n = γ_n,   (2.3.9)
where φ_n = (φ_{n1}, ..., φ_{nn})′, γ_n = (γ(1), ..., γ(n))′ and Γ_n = [γ(i − j)]_{i,j=1}^n. The projection theorem guarantees that there is at least one solution φ_n of (2.3.9). If Γ_n is singular then (2.3.9) will have infinitely many solutions, but the projection theorem guarantees that every solution will give the same (uniquely defined) predictor X̂_{n+1}.
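Equation (2.3.9) is a finite linear system and can be solved numerically once γ(0), ..., γ(n) are known. The following Python sketch (illustrative only; not part of the text) computes φ_n for an autocovariance function of the form γ(h) = 0.7^{|h|}, for which the best one-step predictor is simply 0.7 X_n.

```python
import numpy as np

def prediction_coefficients(gamma, n):
    """Solve Gamma_n phi_n = gamma_n, equation (2.3.9), for the coefficients of the
    best linear one-step predictor based on X_n, ..., X_1."""
    Gamma_n = np.array([[gamma[abs(i - j)] for j in range(n)] for i in range(n)])
    gamma_n = np.array([gamma[h] for h in range(1, n + 1)])
    return np.linalg.solve(Gamma_n, gamma_n)

gamma = 0.7 ** np.arange(0, 6)                       # gamma(h) = 0.7**|h|, h = 0,...,5
print(prediction_coefficients(gamma, 5).round(6))    # approximately [0.7, 0, 0, 0, 0]
```

When Γ_n is singular, np.linalg.lstsq can be used in place of np.linalg.solve; by the projection theorem every solution yields the same predictor X̂_{n+1}.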
EXAMPLE 2.3.4. To illustrate the last assertion of Example 2.3.3, consider the stationary process
X_t = A cos(ωt) + B sin(ωt),
where ω ∈ (0, π) is constant and A, B are uncorrelated random variables with mean 0 and variance σ². We showed in Example 1.3.1 that for this process, γ(h) = σ² cos(ωh). It is easy to check from (2.3.9) (see Problem 2.6) that φ_1 = cos ω and φ_2 = (2 cos ω, −1)′. Thus
X̂_3 = (2 cos ω)X_2 − X_1.
The mean squared error of X̂_3 is
E(X_3 − (2 cos ω)X_2 + X_1)² = 0,
showing that for this process we have the identity
X_3 = (2 cos ω)X_2 − X_1.   (2.3.10)
The same argument and the stationarity of {X_t} show that
X̂_4 = (2 cos ω)X_3 − X_2,   (2.3.11)
again with mean squared error zero. Because of the relation (2.3.10) there are infinitely many ways to reexpress X_4 in terms of X_1, X_2 and X_3. This is reflected by the fact that Γ_3 is singular for this process and (2.3.9) has infinitely many solutions for φ_3.
§2.4 Orthonormal Sets
Definition 2.4.1 (Closed Span). The closed span sp{x_t, t ∈ T} of any subset {x_t, t ∈ T} of a Hilbert space ℋ is defined to be the smallest closed subspace of ℋ which contains each element x_t, t ∈ T.
Remark 1. The closed span of a finite set {x_1, ..., x_n} is the set of all linear combinations, y = a_1x_1 + ··· + a_nx_n, a_1, ..., a_n ∈ ℂ (or ℝ if ℋ is real). See Problem 2.7. If for example x_1, x_2 ∈ ℝ³ and x_1 is not a scalar multiple of x_2 then sp{x_1, x_2} is the plane containing x_1 and x_2.
Remark 2. If ℳ = sp{x_1, ..., x_n}, then for any given x ∈ ℋ, P_ℳ x is the unique element of the form
P_ℳ x = a_1x_1 + ··· + a_nx_n
such that
⟨x − P_ℳ x, y⟩ = 0 for all y ∈ ℳ,
or equivalently such that
⟨P_ℳ x, x_j⟩ = ⟨x, x_j⟩,   j = 1, ..., n.   (2.4.1)
The equations (2.4.1) can be rewritten as a set of linear equations for a_1, ..., a_n, viz.
∑_{i=1}^n a_i⟨x_i, x_j⟩ = ⟨x, x_j⟩,   j = 1, ..., n.   (2.4.2)
By the projection theorem the system (2.4.2) has at least one solution for a_1, ..., a_n. The uniqueness of P_ℳ x implies that all solutions of (2.4.2) must yield the same element a_1x_1 + ··· + a_nx_n.
Definition 2.4.2 (Orthonormal Set). A set {e_t, t ∈ T} of elements of an inner-product space is said to be orthonormal if for every s, t ∈ T,
⟨e_s, e_t⟩ = 1 if s = t, and ⟨e_s, e_t⟩ = 0 if s ≠ t.   (2.4.3)
EXAMPLE 2.4.1. The set of vectors {(1, 0, 0)′, (0, 1, 0)′, (0, 0, 1)′} is an orthonormal set in ℝ³.
EXAMPLE 2.4.2. Any sequence {Z_t, t ∈ ℤ} of independent standard normal random variables is an orthonormal set in L²(Ω, ℱ, P).
Theorem 2.4.1. If {e_1, ..., e_k} is an orthonormal subset of the Hilbert space ℋ and ℳ = sp{e_1, ..., e_k}, then for all x ∈ ℋ,
P_ℳ x = ∑_{i=1}^k ⟨x, e_i⟩e_i,   (2.4.4)
‖P_ℳ x‖² = ∑_{i=1}^k |⟨x, e_i⟩|²,   (2.4.5)
and
‖x − ∑_{i=1}^k ⟨x, e_i⟩e_i‖ ≤ ‖x − ∑_{i=1}^k c_ie_i‖   (2.4.6)
for all c_1, ..., c_k ∈ ℂ (or ℝ if ℋ is real). Equality holds in (2.4.6) if and only if c_i = ⟨x, e_i⟩, i = 1, ..., k.
The numbers ⟨x, e_i⟩ are sometimes called the Fourier coefficients of x relative to the set {e_1, ..., e_k}.
PROOF. To establish (2.4.4) it suffices by Remark 2 to check that P_ℳ x as defined by (2.4.4) satisfies the prediction equations (2.4.1), i.e. that
⟨∑_{i=1}^k ⟨x, e_i⟩e_i, e_j⟩ = ⟨x, e_j⟩,   j = 1, ..., k.
But this is an immediate consequence of the orthonormality condition (2.4.3).
The proof of (2.4.5) is a routine computation using properties of the inner product and the assumed orthonormality of {e_1, ..., e_k}.
By Theorem 2.3.1(ii), ‖x − P_ℳ x‖ ≤ ‖x − y‖ for all y ∈ ℳ, and this is precisely the inequality (2.4.6). By Theorem 2.3.1(ii) again, there is equality in (2.4.6) if and only if
∑_{i=1}^k c_ie_i = P_ℳ x = ∑_{i=1}^k ⟨x, e_i⟩e_i.   (2.4.7)
Taking inner products of each side with e_j and recalling the orthonormality assumption, we immediately find that (2.4.7) is equivalent to the condition c_j = ⟨x, e_j⟩, j = 1, ..., k.   □
Corollary 2.4.1 (Bessel's Inequality). If x is any element of a Hilbert space ℋ and {e_1, ..., e_k} is an orthonormal subset of ℋ then
∑_{i=1}^k |⟨x, e_i⟩|² ≤ ‖x‖².   (2.4.8)
PROOF. This follows at once from (2.4.5) and Proposition 2.3.2(ii).   □
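Formula (2.4.4) and Bessel's inequality (2.4.8) are easy to check numerically when the orthonormal elements are vectors in ℝⁿ. The following short Python sketch (illustrative only; the function name is ours) computes the Fourier coefficients ⟨x, e_i⟩ and the corresponding projection.

```python
import numpy as np

def project_onto_orthonormal(x, E):
    """P_M x = sum_i <x, e_i> e_i for the orthonormal columns e_i of E (Theorem 2.4.1)."""
    coeffs = E.T @ x          # Fourier coefficients <x, e_i>
    return E @ coeffs

E = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])                 # orthonormal e_1, e_2 in R^3
x = np.array([2.0, -1.0, 5.0])
print(project_onto_orthonormal(x, E))      # [ 2. -1.  0.]
print(np.sum((E.T @ x) ** 2) <= x @ x)     # Bessel's inequality (2.4.8): True
```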
Definition 2.4.3 (Complete Orthonormal Set). If {e_t, t ∈ T} is an orthonormal subset of the Hilbert space ℋ and if ℋ = sp{e_t, t ∈ T} then we say that {e_t, t ∈ T} is a complete orthonormal set or an orthonormal basis for ℋ.
Definition 2.4.4 (Separability). The Hilbert space ℋ is separable if ℋ = sp{e_t, t ∈ T} with {e_t, t ∈ T} a finite or countably infinite orthonormal set.
Theorem 2.4.2. If ℋ is the separable Hilbert space ℋ = sp{e_1, e_2, ...} where {e_i, i = 1, 2, ...} is an orthonormal set, then
(i) the set of all finite linear combinations of {e_1, e_2, ...} is dense in ℋ, i.e. for each x ∈ ℋ and ε > 0, there exists a positive integer k and constants c_1, ..., c_k such that
‖x − ∑_{i=1}^k c_ie_i‖ < ε,   (2.4.9)
(ii) x = ∑_{i=1}^∞ ⟨x, e_i⟩e_i for each x ∈ ℋ, i.e. ‖x − ∑_{i=1}^n ⟨x, e_i⟩e_i‖ → 0 as n → ∞,
(iii) ‖x‖² = ∑_{i=1}^∞ |⟨x, e_i⟩|² for each x ∈ ℋ,
(iv) ⟨x, y⟩ = ∑_{i=1}^∞ ⟨x, e_i⟩⟨e_i, y⟩ for each x, y ∈ ℋ, and
(v) x = 0 if and only if ⟨x, e_i⟩ = 0 for all i = 1, 2, ....
The result (iv) is known as Parseval's identity.
PROOF. (i) If S = ⋃_{j≥1} sp{e_1, ..., e_j}, the set of all finite linear combinations of {e_1, e_2, ...}, then the closure S̄ of S is a closed subspace of ℋ (Problem 2.17) containing {e_i, i = 1, 2, ...}. Since ℋ is by assumption the smallest such closed subspace, we conclude that S̄ = ℋ.
(ii) By Bessel's inequality (2.4.8), ∑_{i=1}^k |⟨x, e_i⟩|² ≤ ‖x‖² for all positive integers k. Hence ∑_{i=1}^∞ |⟨x, e_i⟩|² ≤ ‖x‖². From (2.4.6) and (2.4.9) we conclude that for each ε > 0 there exists a positive integer k such that
‖x − ∑_{i=1}^k ⟨x, e_i⟩e_i‖ < ε.
Now by Theorem 2.4.1, ∑_{i=1}^n ⟨x, e_i⟩e_i = P_ℳₙ x where ℳ_n = sp{e_1, ..., e_n}, and since for k ≤ n, ∑_{i=1}^k ⟨x, e_i⟩e_i ∈ ℳ_n, we also have
‖x − ∑_{i=1}^n ⟨x, e_i⟩e_i‖ < ε for all n ≥ k.   (2.4.10)
(iii) From (2.4.10) we can write, for n ≥ k,
‖x‖² = ‖x − ∑_{i=1}^n ⟨x, e_i⟩e_i‖² + ‖∑_{i=1}^n ⟨x, e_i⟩e_i‖² < ε² + ∑_{i=1}^∞ |⟨x, e_i⟩|².
Since ε > 0 was arbitrary, we deduce that
‖x‖² ≤ ∑_{i=1}^∞ |⟨x, e_i⟩|²,
which together with the reversed inequality proved in (ii), establishes (iii).
(iv) The result (2.4.10) established in (ii) states that ‖∑_{i=1}^n ⟨x, e_i⟩e_i − x‖ → 0 as n → ∞ for each x ∈ ℋ. By continuity of the inner product we therefore have, for each x, y ∈ ℋ,
⟨x, y⟩ = lim_{n→∞} ⟨∑_{i=1}^n ⟨x, e_i⟩e_i, ∑_{j=1}^n ⟨y, e_j⟩e_j⟩ = lim_{n→∞} ∑_{i=1}^n ⟨x, e_i⟩⟨e_i, y⟩ = ∑_{i=1}^∞ ⟨x, e_i⟩⟨e_i, y⟩.
(v) This result is an immediate consequence of (ii).   □
Remark 3. Separable Hilbert spaces are frequently encountered as the closed
spans of countable subsets of possibly non-separable Hilbert spaces.
§2.5 Projection in IR"
In Examples 2. 1 . 1 , 2. 1.2 and 2.2. 1 we showed that !Rn is a Hilbert space with
the inner product
< x, y )
n
= L X ; Y; ,
i= l
(2.5. 1 )
the corresponding squared norm
ll x ll 2
n
= L xf,
i=l
and the angle between x and y,
e
= cos -
1
(
< x, y )
ll x i i i i Y II
(2.5.2)
)
.
(2.5.3)
Every closed subspace A of the Hilbert space !Rn can be expressed by means
of Gram-Schmidt orthogonalization (see for example Simmons ( 1963)) as
A = sp {e 1 , . . . , em } where {e1 , . . . , em } is an orthonormal subset of A and m
( s n) is called the dimension of A (see also Problem 2. 1 4). lf m < n then there
is an orthonormal subset { em + l , . . . , en } of A j_ such that A j_ = sp{ em l , . . . , en } ·
By Proposition 2.3.2(iii) every x ∈ ℝⁿ can be expressed uniquely as a sum of two elements of M and M⊥ respectively, namely

    x = P_M x + (I − P_M)x,   (2.5.4)

where, by Theorem 2.4.1,

    P_M x = Σ_{i=1}^{m} ⟨x, e_i⟩ e_i   (2.5.5)

and

    (I − P_M)x = Σ_{i=m+1}^{n} ⟨x, e_i⟩ e_i.   (2.5.6)

The following theorem enables us to compute P_M x directly from any specified set of vectors {x_1, ..., x_m} spanning M.

Theorem 2.5.1. If x_i ∈ ℝⁿ, i = 1, ..., m, and M = sp{x_1, ..., x_m}, then

    P_M x = Xβ,   (2.5.7)

where X is the n × m matrix whose jth column is x_j and β is any solution of

    X'Xβ = X'x.   (2.5.8)
Equation (2.5.8) has at least one solution for β but Xβ is the same for all solutions. There is exactly one solution of (2.5.8) if and only if X'X is non-singular and in this case

    P_M x = X(X'X)⁻¹X'x.   (2.5.9)

PROOF. Since P_M x ∈ M, we can write

    P_M x = Σ_{i=1}^{m} β_i x_i = Xβ,   (2.5.10)

for some β = (β_1, ..., β_m)' ∈ ℝᵐ. The prediction equations (2.3.8) are equivalent in this case to

    ⟨Xβ, x_j⟩ = ⟨x, x_j⟩,   j = 1, ..., m,   (2.5.11)

and in matrix form these equations can be written

    X'Xβ = X'x.   (2.5.12)

The existence of at least one solution for β is guaranteed by the existence of the projection P_M x. The fact that Xβ is the same for all solutions is guaranteed by the uniqueness of P_M x. The last statement of the theorem follows at once from (2.5.7) and (2.5.8).   □

Remark 1. If {x_1, ..., x_m} is an orthonormal set then X'X is the identity matrix and so we find that

    P_M x = XX'x = Σ_{i=1}^{m} ⟨x, x_i⟩ x_i,

in accordance with (2.5.5).

Remark 2. If {x_1, ..., x_m} is a linearly independent set then there must be a unique vector β such that P_M x = Xβ. This means that (2.5.8) must have a unique solution, which in turn implies that X'X is non-singular and

    P_M x = X(X'X)⁻¹X'x   for all x ∈ ℝⁿ.

The matrix X(X'X)⁻¹X' must be the same for all linearly independent sets {x_1, ..., x_m} spanning M since P_M is a uniquely defined mapping on ℝⁿ.
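As a quick numerical illustration of Theorem 2.5.1 (a sketch in Python with NumPy; the vectors below are arbitrary illustrative values, not taken from the text), P_M x can be computed by solving the normal equations (2.5.8) and then checking the prediction equations:

    import numpy as np

    # Columns of X span the subspace M of R^4 (illustrative values only).
    X = np.array([[1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0],
                  [1.0, 4.0]])
    x = np.array([1.0, 0.0, 2.0, 5.0])

    # Solve X'X beta = X'x; lstsq also handles a singular X'X, in which case
    # beta is not unique but X beta (the projection) still is.
    beta, *_ = np.linalg.lstsq(X.T @ X, X.T @ x, rcond=None)
    proj = X @ beta                      # P_M x, cf. (2.5.7)

    # The residual x - P_M x is orthogonal to every column of X (prediction equations).
    print(proj)
    print(X.T @ (x - proj))              # numerically zero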
Remark 3. Given a real n × n matrix M, how can we tell whether or not there is a subspace ℳ of ℝⁿ such that Mx = P_ℳ x for all x ∈ ℝⁿ? If there is such a subspace we say that M is a projection matrix. Such matrices are characterized in the next theorem.

Theorem 2.5.2. The n × n matrix M is a projection matrix if and only if

(a) M' = M and
(b) M² = M.

PROOF. If M is the projection matrix corresponding to some subspace ℳ then by Remark 2 it can be written in the form X(X'X)⁻¹X' where X is any matrix having linearly independent columns which span ℳ. It is easily verified that (a) and (b) are then satisfied.

Suppose now that (a) and (b) are satisfied. We shall show that Mx = P_ℳ x for all x ∈ ℝⁿ where ℳ is the range of M defined by

    R(M) = {Mx : x ∈ ℝⁿ}.

First observe that Mx ∈ R(M) by definition. Secondly we know that for any y ∈ R(M) there exists w ∈ ℝⁿ such that y = Mw. Hence

    ⟨x − Mx, y⟩ = ⟨x − Mx, Mw⟩ = x'(I − M)'Mw = 0   for all y ∈ R(M),

showing that Mx is indeed the required projection.   □
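The characterization in Theorem 2.5.2 is also easy to check numerically; the following sketch (Python/NumPy, with an arbitrary randomly generated X) verifies that X(X'X)⁻¹X' is symmetric and idempotent:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 2))            # columns are (almost surely) linearly independent
    M = X @ np.linalg.inv(X.T @ X) @ X.T   # projection matrix onto the column space of X

    print(np.allclose(M, M.T))             # (a) M' = M
    print(np.allclose(M @ M, M))           # (b) M^2 = M
    print(np.linalg.matrix_rank(M))        # rank 2 = dimension of the subspace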
§2.6 Linear Regression and the General Linear Model

Consider the problem of finding the "best" straight line

    y = θ_1 x + θ_2,   (2.6.1)

or equivalently the best values θ̂_1, θ̂_2 of θ_1, θ_2 ∈ ℝ, to fit a given set of data points (x_i, y_i), i = 1, ..., n. In least squares regression the best estimates θ̂_1, θ̂_2 are defined to be values of θ_1, θ_2 which minimize the sum,

    S(θ_1, θ_2) = Σ_{i=1}^{n} (y_i − θ_1 x_i − θ_2)²,

of squared deviations of the observations y_i from the fitted values θ_1 x_i + θ_2. This problem reduces to that of computing a projection in ℝⁿ as is easily seen by writing S(θ_1, θ_2) in the equivalent form

    S(θ_1, θ_2) = ||y − θ_1 x − θ_2 1||²,   (2.6.2)

where x = (x_1, ..., x_n)', 1 = (1, ..., 1)' and y = (y_1, ..., y_n)'. By the projection theorem there is a unique vector of the form (θ̂_1 x + θ̂_2 1) which minimizes S(θ_1, θ_2), namely P_M y where M = sp{x, 1}.

Defining X to be the n × 2 matrix X = [x, 1] and θ̂ to be the column vector θ̂ = (θ̂_1, θ̂_2)', we deduce from Theorem 2.5.1 that

    P_M y = Xθ̂   where   X'Xθ̂ = X'y.   (2.6.3)

There is a unique solution θ̂ if and only if X'X is non-singular. In this case

    θ̂ = (X'X)⁻¹X'y.   (2.6.4)

If X'X is singular there are infinitely many solutions of (2.6.3); however, by the uniqueness of P_M y, Xθ̂ is the same for all of them.
The argument just given applies equally well to least squares estimation for the general linear model. The general problem is as follows. Given a set of data points

    (x_i^(1), ..., x_i^(m), y_i),   i = 1, ..., n;  m ≤ n,

we are required to find a value θ̂ = (θ̂_1, ..., θ̂_m)' of θ = (θ_1, ..., θ_m)' which minimizes

    S(θ) = Σ_{i=1}^{n} (y_i − θ_1 x_i^(1) − ··· − θ_m x_i^(m))²,

where y = (y_1, ..., y_n)' and x^(j) = (x_1^(j), ..., x_n^(j))', j = 1, ..., m. By the projection theorem there is a unique vector of the form (θ̂_1 x^(1) + ··· + θ̂_m x^(m)) which minimizes S(θ), namely P_M y where M = sp{x^(1), ..., x^(m)}.

Defining X to be the n × m matrix X = [x^(1), ..., x^(m)] and θ̂ to be the column vector θ̂ = (θ̂_1, ..., θ̂_m)', we deduce from Theorem 2.5.1 that

    P_M y = Xθ̂   where   X'Xθ̂ = X'y.   (2.6.5)

As in the special case of fitting a straight line, θ̂ is uniquely defined if and only if X'X is non-singular, in which case

    θ̂ = (X'X)⁻¹X'y.   (2.6.6)

If X'X is singular then there are infinitely many solutions of (2.6.5) but Xθ̂ is the same for all of them.
In spite of the assumed linearity in the parameters θ_1, ..., θ_m, the applications of the general linear model are very extensive. As a simple illustration, let us fit a quadratic function,

    y = θ_1 x² + θ_2 x + θ_3,

to the data

    x:  0  1  2  3  4
    y:  1  0  3  5  8

The matrix X for this problem is

    X = [ 0  0  1
          1  1  1
          4  2  1           giving (X'X)⁻¹ = (1/140) [  10  −40   20
          9  3  1                                      −40  174 −108
         16  4  1 ],                                    20 −108  124 ].

The least squares estimate θ̂ = (θ̂_1, θ̂_2, θ̂_3)' is therefore unique and is found from (2.6.6) to be

    θ̂ = (0.5, −0.1, 0.6)'.

The vector of fitted values Xθ̂ = P_M y is given by

    Xθ̂ = (0.6, 1, 2.4, 4.8, 8.2)',

as compared with the vector of observations,

    y = (1, 0, 3, 5, 8)'.
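The quadratic fit above is easily reproduced numerically; the following sketch (Python with NumPy) solves the normal equations for the data of the example and recovers θ̂ = (0.5, −0.1, 0.6)' and the fitted values:

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.0, 0.0, 3.0, 5.0, 8.0])

    # Design matrix for y = theta1*x^2 + theta2*x + theta3, as in the text.
    X = np.column_stack([x**2, x, np.ones_like(x)])

    theta = np.linalg.solve(X.T @ X, X.T @ y)   # solve X'X theta = X'y, cf. (2.6.5)-(2.6.6)
    print(theta)          # approximately [ 0.5 -0.1  0.6]
    print(X @ theta)      # fitted values, approximately [0.6 1.0 2.4 4.8 8.2]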
§2.7 Mean Square Convergence, Conditional Expectation and Best Linear Prediction in L²(Ω, ℱ, P)

All results in this section will be stated for the real Hilbert space L² = L²(Ω, ℱ, P) with inner product ⟨X, Y⟩ = E(XY). The reader should have no difficulty however in writing down analogous results for the complex space L²(Ω, ℱ, P) with inner product ⟨X, Y⟩ = E(XȲ). As indicated in Example 2.2.2, mean square convergence is just another name for norm convergence in L², i.e. if X_n, X ∈ L², then

    X_n →(m.s.) X if and only if ||X_n − X||² = E|X_n − X|² → 0 as n → ∞.   (2.7.1)

By simply restating properties already established for norm convergence we obtain the following proposition.

Proposition 2.7.1 (Properties of Mean Square Convergence).
(a) X_n converges in mean square if and only if E|X_m − X_n|² → 0 as m, n → ∞.
(b) If X_n →(m.s.) X and Y_n →(m.s.) Y then as n → ∞,
    (i) EX_n = ⟨X_n, 1⟩ → ⟨X, 1⟩ = EX,
    (ii) E|X_n|² = ⟨X_n, X_n⟩ → ⟨X, X⟩ = E|X|², and
    (iii) E(X_n Y_n) = ⟨X_n, Y_n⟩ → ⟨X, Y⟩ = E(XY).

Definition 2.7.1 (Best Mean Square Predictor of Y). If M is a closed subspace of L² and Y ∈ L², then the best mean square predictor of Y in M is the element Ŷ ∈ M such that

    ||Y − Ŷ||² = inf_{Z ∈ M} ||Y − Z||² = inf_{Z ∈ M} E|Y − Z|².   (2.7.2)
The projection theorem immediately identifies the unique best predictor of Y in M as P_M Y. By imposing a little more structure on the closed subspace M, we are led from Definition 2.7.1 to the notions of conditional expectation and best linear prediction.

Definition 2.7.2 (The Conditional Expectation, E_M X). If M is a closed subspace of L² containing the constant functions, and if X ∈ L², then we define the conditional expectation of X given M to be the projection,

    E_M X := P_M X.   (2.7.3)

Using the definition of the inner product in L² and the prediction equations (2.3.8) we can state equivalently that E_M X is the unique element of M such that

    E(W E_M X) = E(WX)   for all W ∈ M.   (2.7.4)

Obviously the operator E_M on L² has all the properties of a projection operator, in particular (see Proposition 2.3.2)

    E_M(aX + bY) = a E_M X + b E_M Y,   a, b ∈ ℝ,   (2.7.5)
    E_M X = X   if X ∈ M,   (2.7.6)

and

    E_{M1}(E_{M2} X) = E_{M1} X   if M1 ⊆ M2.   (2.7.7)

Notice also that

    E(E_M X) = EX,   (2.7.8)

and if M0 is the closed subspace of L² consisting of all the constant functions, then an application of the prediction equations (2.3.8) gives

    E_{M0} X = EX.   (2.7.9)
Definition 2.7.3 (The Conditional Expectation E(X | Z)). If Z is a random variable on (Ω, ℱ, P) and X ∈ L²(Ω, ℱ, P) then the conditional expectation of X given Z is defined to be

    E(X | Z) := E_{M(Z)} X,   (2.7.10)

where M(Z) is the closed subspace of L² consisting of all random variables in L² which can be written in the form φ(Z) for some Borel function φ : ℝ → ℝ. (For the proof that M(Z) is a closed subspace see Problem 2.25.)

The operator E_{M(Z)} has all the properties (2.7.5)-(2.7.8), and in addition

    (2.7.11)

Definition 2.7.3 can be extended in a fairly obvious way as follows: if Z_1, ..., Z_n are random variables on (Ω, ℱ, P) and X ∈ L², then we define

    E(X | Z_1, ..., Z_n) := E_{M(Z_1,...,Z_n)} X,   (2.7.12)

where M(Z_1, ..., Z_n) is the closed subspace of L² consisting of all random variables in L² of the form φ(Z_1, ..., Z_n) for some Borel function φ : ℝⁿ → ℝ. The properties of E_{M(Z)} listed above all carry over to E_{M(Z_1,...,Z_n)}.
Conditional Expectation and Best Linear Prediction. By the projection theorem, the conditional expectation E_{M(Z_1,...,Z_n)}(X) is the best mean square predictor of X in M(Z_1, ..., Z_n), i.e. it is the best function of Z_1, ..., Z_n (in the m.s. sense) for predicting X. However the determination of projections on M(Z_1, ..., Z_n) is usually very difficult because of the complex nature of the equations (2.7.4). On the other hand if Z_1, ..., Z_n ∈ L², it is relatively easy to compute instead the projection of X on sp{1, Z_1, ..., Z_n} ⊆ M(Z_1, ..., Z_n) since we can write

    P_{sp{1, Z_1, ..., Z_n}}(X) = Σ_{i=0}^{n} α_i Z_i,   Z_0 := 1,   (2.7.13)

where α_0, ..., α_n satisfy

    ⟨Σ_{i=0}^{n} α_i Z_i, Z_j⟩ = ⟨X, Z_j⟩,   j = 0, 1, ..., n,   (2.7.14)

or equivalently,

    Σ_{i=0}^{n} α_i E(Z_i Z_j) = E(X Z_j),   j = 0, 1, ..., n.   (2.7.15)

The projection theorem guarantees that a solution (α_0, ..., α_n) exists. Any solution, when substituted into (2.7.13), gives the required projection, known as the best linear predictor of X in terms of 1, Z_1, ..., Z_n. As a projection of X onto a subspace of M(Z_1, ..., Z_n) it can never have smaller mean squared error than E_{M(Z_1,...,Z_n)}X. Nevertheless it is of great importance for the following reasons:

(a) it is easier to calculate than E_{M(Z_1,...,Z_n)}(X),
(b) it depends only on the first and second order moments, EX, EZ_i, E(Z_i Z_j) and E(XZ_j), of the joint distribution of (X, Z_1, ..., Z_n),
(c) if (X, Z_1, ..., Z_n)' has a multivariate normal distribution then (see Problem 2.20),

    P_{sp{1, Z_1, ..., Z_n}}(X) = E_{M(Z_1,...,Z_n)}(X).
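Because of (b), the coefficients of the best linear predictor can be obtained directly by solving the linear system (2.7.15). The following sketch (Python with NumPy) does this for illustrative, assumed moment values of (X, Z_1, Z_2); the variable names are ours, not the text's:

    import numpy as np

    # Illustrative moments (assumed values) of (X, Z_1, Z_2), with Z_0 = 1.
    EZ = np.array([0.5, -0.2])                    # E Z_1, E Z_2
    EZZ = np.array([[1.0, 0.3], [0.3, 2.0]])      # E(Z_i Z_j), i,j = 1,2
    EX = 1.0                                      # E X
    EXZ = np.array([0.8, 0.1])                    # E(X Z_1), E(X Z_2)

    # Build the full (n+1) x (n+1) system (2.7.15), including Z_0 = 1.
    A = np.block([[np.array([[1.0]]), EZ[None, :]],
                  [EZ[:, None], EZZ]])
    b = np.concatenate([[EX], EXZ])
    alpha = np.linalg.solve(A, b)                 # (alpha_0, alpha_1, alpha_2)
    print(alpha)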
Best linear predictors are defined more generally as follows:

Definition 2.7.4 (Best Linear Predictor of X in Terms of {Z_λ, λ ∈ Λ}). If X ∈ L² and Z_λ ∈ L² for all λ ∈ Λ, then the best linear predictor of X in terms of {Z_λ, λ ∈ Λ} is defined to be the element of sp{Z_λ, λ ∈ Λ} with smallest mean square distance from X. By the projection theorem this is just P_{sp{Z_λ, λ ∈ Λ}} X.

EXAMPLE 2.7.1. Suppose Y = X² + Z where X and Z are independent standard normal random variables. The best predictor of Y in terms of X is E(Y | X) = X². (The reader should check that the defining properties of E(Y | X) = E_{M(X)} Y are satisfied by X², i.e. that X² ∈ M(X) and that (2.7.4) is satisfied with M = M(X).) On the other hand the best linear predictor of Y in terms of {1, X} is

    P_{sp{1, X}} Y = aX + b,

where, by the prediction equations (2.7.15),

    ⟨aX + b, X⟩ = ⟨Y, X⟩ = E(YX) = 0

and

    ⟨aX + b, 1⟩ = ⟨Y, 1⟩ = E(Y) = 1.

Hence a = 0 and b = 1 so that

    P_{sp{1, X}} Y = 1.

The mean squared errors of the two predictors are

    ||E(Y | X) − Y||² = E(Z²) = 1,

and

    ||P_{sp{1, X}} Y − Y||² = E(X² + Z − 1)² = 3,

showing the substantial superiority of the best predictor over the best linear predictor in this case.
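A short simulation (a sketch in Python with NumPy, not part of the text's development) makes the comparison concrete; the empirical mean squared errors should be close to 1 and 3 respectively:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000
    X = rng.standard_normal(n)
    Z = rng.standard_normal(n)
    Y = X**2 + Z

    mse_cond = np.mean((Y - X**2) ** 2)   # best predictor E(Y|X) = X^2; MSE = E(Z^2) = 1
    mse_lin = np.mean((Y - 1.0) ** 2)     # best linear predictor in terms of {1, X} is 1; MSE = 3
    print(mse_cond, mse_lin)              # approximately 1 and 3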
Remark 1. The conditional expectation operators E_{M(Z)} and E_{M(Z_1,...,Z_n)} are usually defined on the space L¹(Ω, ℱ, P) of random variables X such that E|X| < ∞ (see e.g. Breiman (1968), Chapter 4). The restrictions of these operators to L²(Ω, ℱ, P) coincide with E_{M(Z)} and E_{M(Z_1,...,Z_n)} as we have defined them.
§2.8 Fourier Series

Consider the complex Hilbert space L²[−π, π] = L²([−π, π], ℬ, U) where ℬ consists of the Borel subsets of [−π, π], U is the uniform probability measure U(dx) = (2π)⁻¹ dx, and the inner product of f, g ∈ L²[−π, π] is defined as usual by

    ⟨f, g⟩ = E(f ḡ) = (2π)⁻¹ ∫_{−π}^{π} f(x) ḡ(x) dx.   (2.8.1)

The functions {e_n, n ∈ ℤ} defined by

    e_n(x) = e^{inx},   x ∈ [−π, π],   (2.8.2)

are orthonormal in L²[−π, π] since

    ⟨e_m, e_n⟩ = (2π)⁻¹ ∫_{−π}^{π} e^{i(m−n)x} dx
               = (2π)⁻¹ ∫_{−π}^{π} [cos(m − n)x + i sin(m − n)x] dx
               = 1 if m = n,  0 if m ≠ n.
Definition 2.8.1 (Fourier Approximations and Coefficients). The nth order Fourier approximation to any function f ∈ L²[−π, π] is defined to be the projection of f onto sp{e_j, |j| ≤ n}, which by Theorem 2.4.1 is

    S_n f = Σ_{j=−n}^{n} ⟨f, e_j⟩ e_j.   (2.8.3)

The coefficients

    ⟨f, e_j⟩,   j = 0, ±1, ±2, ...,   (2.8.4)

are called the Fourier coefficients of the function f.

We can write (2.8.3) a little more explicitly in the form

    S_n f(x) = Σ_{j=−n}^{n} ⟨f, e_j⟩ e^{ijx},   x ∈ [−π, π],   (2.8.5)

and one is naturally led to investigate the senses (if any) in which the sequence of functions {S_n f} converges to f as n → ∞. In this section we shall restrict attention to mean square convergence, deferring questions of pointwise and uniform convergence to Section 2.11.
Theorem 2.8.1. (a) The sequence {S_n f} has a mean square limit as n → ∞ which we shall denote by Σ_{j=−∞}^{∞} ⟨f, e_j⟩ e_j or Sf.
(b) Sf = f.

PROOF. (a) From Bessel's inequality (2.4.8) we have Σ_{|j|≤n} |⟨f, e_j⟩|² ≤ ||f||² for all n, which implies that Σ_{j=−∞}^{∞} |⟨f, e_j⟩|² < ∞. Hence for n > m ≥ 1,

    ||S_n f − S_m f||² ≤ Σ_{|j|>m} |⟨f, e_j⟩|² → 0 as m → ∞,

showing that {S_n f} is a Cauchy sequence and therefore has a mean square limit.

(b) For |j| ≤ n, ⟨S_n f, e_j⟩ = ⟨f, e_j⟩, so by continuity of the inner product

    ⟨Sf, e_j⟩ = lim_{n→∞} ⟨S_n f, e_j⟩ = ⟨f, e_j⟩   for all j ∈ ℤ.

In Theorem 2.11.2 we shall show that ⟨g, e_j⟩ = 0 for all j ∈ ℤ implies that g = 0. Hence Sf − f = 0.   □

Corollary 2.8.1. L²[−π, π] = sp{e_j, j ∈ ℤ}.

PROOF. Any f ∈ L²[−π, π] can be expressed as the mean square limit of S_n f where S_n f ∈ sp{e_j, j ∈ ℤ}. Since sp{e_j, j ∈ ℤ} is by definition closed it must contain f. Hence sp{e_j, j ∈ ℤ} ⊇ L²[−π, π].   □

Corollary 2.8.2. (a) ||f||² = Σ_{j=−∞}^{∞} |⟨f, e_j⟩|².
(b) ⟨f, g⟩ = Σ_{j=−∞}^{∞} ⟨f, e_j⟩⟨e_j, g⟩.

PROOF. Corollary 2.8.1 implies that the conditions of Theorem 2.4.2 are satisfied.   □
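The Fourier coefficients and the mean square convergence of S_n f are easy to examine numerically. The following sketch (Python with NumPy) uses f(x) = x² purely as an illustrative choice and approximates the integrals in (2.8.1) by Riemann sums:

    import numpy as np

    # Fourier approximation S_n f for f(x) = x^2 on [-pi, pi] (illustrative choice).
    N = 4000
    x = np.linspace(-np.pi, np.pi, N, endpoint=False)
    dx = 2 * np.pi / N
    f = x**2

    def coeff(j):
        # <f, e_j> = (2*pi)^(-1) * integral of f(x) e^{-ijx} dx, cf. (2.8.1), (2.8.2)
        return np.sum(f * np.exp(-1j * j * x)) * dx / (2 * np.pi)

    def S_n(n):
        return sum(coeff(j) * np.exp(1j * j * x) for j in range(-n, n + 1))

    for n in (1, 5, 20):
        mse = np.sum(np.abs(f - S_n(n)) ** 2) * dx / (2 * np.pi)  # ||f - S_n f||^2
        print(n, mse)                                              # decreases towards 0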
§2.9 Hilbert Space Isomorphisms

Definition 2.9.1 (Isomorphism). An isomorphism of the Hilbert space H_1 onto the Hilbert space H_2 is a one to one mapping T of H_1 onto H_2 such that for all f_1, f_2 ∈ H_1,

(a) T(a f_1 + b f_2) = a Tf_1 + b Tf_2 for all scalars a and b, and
(b) ⟨Tf_1, Tf_2⟩ = ⟨f_1, f_2⟩.

We say that H_1 and H_2 are isomorphic if there is an isomorphism T of H_1 onto H_2. The inverse mapping T⁻¹ is then an isomorphism of H_2 onto H_1.

Remark 1. In this book we shall always use the term isomorphism to indicate that both (a) and (b) are satisfied. Elsewhere the term is frequently used to denote a mapping satisfying (a) only.

EXAMPLE 2.9.1 (The Space ℓ²). Let ℓ² denote the complex Hilbert space of sequences {z_n, n = 1, 2, ...}, z_n ∈ ℂ, Σ_{n=1}^{∞} |z_n|² < ∞, with inner product

    ⟨{y_n}, {z_n}⟩ = Σ_{i=1}^{∞} y_i z̄_i.

(For the proof that ℓ² is a separable Hilbert space see Problem 2.23.) If now H is any Hilbert space with an orthonormal basis {e_n, n = 1, 2, ...} then the mapping T : H → ℓ² defined by

    Th = {⟨h, e_n⟩, n = 1, 2, ...}   (2.9.1)

is an isomorphism of H onto ℓ² (see Problem 2.24). Thus every separable Hilbert space is isomorphic to ℓ².
Properties of Isomorphisms. Suppose T is an isomorphism of H_1 onto H_2. We then have the following properties, all of which follow at once from the definitions:

(i) If {e_n} is a complete orthonormal set in H_1 then {Te_n} is a complete orthonormal set in H_2.
(ii) ||Tx|| = ||x|| for all x ∈ H_1.
(iii) ||x_n − x|| → 0 if and only if ||Tx_n − Tx|| → 0.
(iv) {x_n} is a Cauchy sequence if and only if {Tx_n} is a Cauchy sequence.
(v) T P_{sp{x_λ, λ ∈ Λ}}(x) = P_{sp{Tx_λ, λ ∈ Λ}}(Tx).

The last property is the basis for the spectral theory of prediction of a stationary process {X_t, t ∈ ℤ} (Section 5.6), in which we use the fact that the mapping X_t ↦ e^{it·} defines an isomorphism of a certain Hilbert space of random variables onto a Hilbert space L²([−π, π], ℬ, μ) with μ a finite measure. The problem of computing projections in the former space can then be transformed by means of (v) into the problem of computing projections in the latter.
§2.10* The Completeness of L²(Ω, ℱ, P)

We need to show that if X_n ∈ L², n = 1, 2, ..., and ||X_n − X_m|| → 0 as m, n → ∞, then there exists X ∈ L² such that X_n →(m.s.) X. This will be shown by identifying X as the limit of a sufficiently rapidly converging subsequence of {X_n}. We first need a proposition.

Proposition 2.10.1. If X_n ∈ L² and ||X_{n+1} − X_n|| ≤ 2⁻ⁿ, n = 1, 2, ..., then there is a random variable X on (Ω, ℱ, P) such that X_n → X with probability one.

PROOF. Let X_0 = 0. Then X_n = Σ_{j=1}^{n} (X_j − X_{j−1}). Now Σ_{j=1}^{∞} |X_j − X_{j−1}| is finite with probability one since, by the monotone convergence theorem and the Cauchy-Schwarz inequality,

    E Σ_{j=1}^{∞} |X_j − X_{j−1}| = Σ_{j=1}^{∞} E|X_j − X_{j−1}| ≤ Σ_{j=1}^{∞} ||X_j − X_{j−1}|| ≤ ||X_1|| + Σ_{j=1}^{∞} 2⁻ʲ < ∞.

It follows that lim_{n→∞} Σ_{j=1}^{n} |X_j − X_{j−1}| (and hence lim_{n→∞} Σ_{j=1}^{n} (X_j − X_{j−1}) = lim_{n→∞} X_n) exists and is finite with probability one.   □
Theorem 2.10.1. L²(Ω, ℱ, P) is complete.
PROOF. If {X_n} is a Cauchy sequence in L² then we can find integers n_1, n_2, ..., such that n_1 < n_2 < ··· and

    ||X_{n_{k+1}} − X_{n_k}|| ≤ 2⁻ᵏ,   k = 1, 2, ....   (2.10.1)

(First choose n_1 to satisfy (2.10.1) with k = 1, then successively choose n_2, n_3, ..., to satisfy the appropriate conditions.)

By Proposition 2.10.1 there is a random variable X such that X_{n_k} → X with probability one as k → ∞. Now

    ||X_n − X||² = ∫ |X_n − X|² dP = ∫ lim inf_{k→∞} |X_n − X_{n_k}|² dP,

and so by Fatou's lemma,

    ||X_n − X||² ≤ lim inf_{k→∞} ||X_n − X_{n_k}||².   (2.10.2)

The right-hand side of (2.10.2) can be made arbitrarily small by choosing n large enough since {X_n} is a Cauchy sequence. Consequently ||X_n − X||² → 0. The fact that E|X|² < ∞ follows from the triangle inequality

    ||X|| ≤ ||X_n − X|| + ||X_n||,

the right-hand side of which is certainly finite for large enough n.   □
§2.11* Complementary Results for Fourier Series

The terminology and notation of Section 2.8 will be retained throughout this section. We begin with the classical result that trigonometric polynomials are uniformly dense in the space of continuous functions f which are defined on [−π, π] and which satisfy the condition f(π) = f(−π).

Theorem 2.11.1. Let f be a continuous function on [−π, π] such that f(π) = f(−π). Then

    n⁻¹(S_0 f(x) + ··· + S_{n−1} f(x)) → f(x)   (2.11.1)

uniformly on [−π, π] as n → ∞.

PROOF. By definition of the nth order Fourier approximation,

    S_n f(x) = Σ_{|j|≤n} ⟨f, e_j⟩ e^{ijx} = (2π)⁻¹ ∫_{−π}^{π} f(y) Σ_{|j|≤n} e^{ij(x−y)} dy,

which, by defining f(x) = f(x + 2π), x ∈ ℝ, can be rewritten as

    S_n f(x) = (2π)⁻¹ ∫_{−π}^{π} f(x − y) D_n(y) dy,   (2.11.2)

where D_n(y) is the Dirichlet kernel,
[Figure 2.2. The Dirichlet kernel D_5(x), −5 ≤ x ≤ 5 (D_n(·) has period 2π).]

    D_n(y) = Σ_{|j|≤n} e^{ijy} = (e^{i(n+1/2)y} − e^{−i(n+1/2)y}) / (e^{iy/2} − e^{−iy/2})
           = sin[(n + ½)y] / sin(½y)   if y ≠ 0,
           = 2n + 1                     if y = 0.   (2.11.3)
A graph of the function D" is shown in Figure 2.2. For the function f(x) = 1 ,
<f, e0 ) = 1 and <J, ei ) = O, j # 0. Hence S" 1 (x) = 1 , and substituting this i n
(2. 1 1 .2) we find that
(2n)-1 J:, Dn(Y) dy
Making use of (2. 1 1 .2) we can now write
n - 1 (S0f(x) + · · · + S" _J(x)) =
where Kn( Y) is the Fejer kernel,
K n ( Y)
=
�1� "�1
2 nn if...
=O
.
DJ ( Y)
=
=
1.
J:/(x - y)K"(y) dy,
L}:J sin [(j + 1-)y]
.
.
2 nn sm ( z1 Y)
Evaluating the sum with the aid of the identity,
2 sin{ty)sin[(j + 1-)y]
we find that
=
cos(jy) - cos [(j + 1)y],
(2. 1 1 .4)
(2. 1 1 .5)
[Figure 2.3. The Fejér kernel K_5(x), −5 ≤ x ≤ 5 (K_n(·) has period 2π).]

    K_n(y) = sin²(ny/2) / (2πn sin²(y/2))   if y ≠ 0,
           = n / (2π)                        if y = 0.   (2.11.6)
The Fejér kernel is shown in Figure 2.3. It has the properties:

(a) K_n(y) ≥ 0 (unlike D_n(y)),
(b) K_n(·) has period 2π,
(c) K_n(·) is an even function,
(d) ∫_{−π}^{π} K_n(y) dy = 1,
(e) for each δ > 0, ∫_{−δ}^{δ} K_n(y) dy → 1 as n → ∞.

The first three properties are evident from (2.11.6). Property (d) is obtained by setting f(x) ≡ 1 in (2.11.5). To establish (e), observe that

    K_n(y) ≤ 1 / (2πn sin²(δ/2))   for 0 < δ ≤ |y| ≤ π.

For each δ > 0 this inequality implies that

    ∫_{−π}^{−δ} K_n(y) dy + ∫_{δ}^{π} K_n(y) dy → 0 as n → ∞,

which, together with property (d), proves (e).
Now for any continuous function f with period 2π, we have from (2.11.5) and property (d) of K_n(·),

    η_n(x) := |n⁻¹(S_0 f(x) + ··· + S_{n−1} f(x)) − f(x)|
            = |∫_{−π}^{π} f(x − y) K_n(y) dy − f(x)|
            = |∫_{−π}^{π} [f(x − y) − f(x)] K_n(y) dy|.

Hence for each δ > 0,

    η_n(x) ≤ |∫_{−δ}^{δ} [f(x − y) − f(x)] K_n(y) dy| + |∫_{[−π,π]\(−δ,δ)} [f(x − y) − f(x)] K_n(y) dy|.   (2.11.7)

Since a continuous function with period 2π is uniformly continuous, we can choose for any ε > 0 a value of δ such that sup_{−π≤x≤π} |f(x − y) − f(x)| < ε whenever |y| < δ. The first term on the right of (2.11.7) is then bounded by ε ∫_{−π}^{π} K_n(y) dy and the second by 2M(1 − ∫_{−δ}^{δ} K_n(y) dy), where M = sup_{−π≤x≤π} |f(x)|. Hence

    sup_{−π≤x≤π} η_n(x) ≤ ε ∫_{−π}^{π} K_n(y) dy + 2M(1 − ∫_{−δ}^{δ} K_n(y) dy) → ε   as n → ∞.

But since ε was arbitrary and η_n(x) ≥ 0, we conclude that η_n(x) → 0 uniformly on [−π, π] as required.   □

Remark 1. Under additional smoothness conditions on f, S_n f may converge to f in a much stronger sense. For example if the derivative f′ exists and f′ ∈ L²[−π, π], then S_n f converges absolutely and uniformly to f (see Churchill (1969) and Problem 2.22).
Theorem 2.11.2. If f ∈ L²[−π, π] and ⟨f, e_j⟩ = 0 for all j ∈ ℤ, then f = 0 almost everywhere.

PROOF. It suffices to show that ∫_A f(x) dx = 0 for all Borel subsets A of [−π, π] or, equivalently, by a monotone class argument (see Billingsley (1986)),

    (2π)⁻¹ ∫_a^b f(x) dx = ⟨f, I_[a,b]⟩ = 0   (2.11.8)

for all subintervals [a, b] of [−π, π]. Here I_[a,b] denotes the indicator function of [a, b].

To establish (2.11.8) we first show that ⟨f, g⟩ = 0 for any continuous function g on [−π, π] with g(−π) = g(π). By Theorem 2.11.1 we know that
for g continuous, g_n := n⁻¹(S_0 g + ··· + S_{n−1} g) → g uniformly on [−π, π], implying in particular that g_n →(m.s.) g. By assumption ⟨f, g_n⟩ = 0, so by continuity of the inner product,

    ⟨f, g⟩ = lim_{n→∞} ⟨f, g_n⟩ = 0.

The next step is to find a sequence {h_n} of continuous functions such that h_n →(m.s.) I_[a,b]. One such sequence is defined by

    h_n(x) = 0            if −π ≤ x ≤ a,
             n(x − a)     if a ≤ x ≤ a + 1/n,
             1            if a + 1/n ≤ x ≤ b − 1/n,
             −n(x − b)    if b − 1/n ≤ x ≤ b,
             0            if b ≤ x ≤ π,

since ||I_[a,b] − h_n||² ≤ (1/(2π))(2/n) → 0 as n → ∞.

[Figure 2.4. The continuous function h_n approximating I_[a,b].]

(See Figure 2.4.) Using the continuity of the inner product again,

    ⟨f, I_[a,b]⟩ = lim_{n→∞} ⟨f, h_n⟩ = 0.   □
Problems
2.1. Prove the parallelogram law (2.1.9).
2.2. If {X_t, t = 0, ±1, ...} is a stationary process with mean zero and autocovariance function γ(·), show that Y_n = Σ_{k=0}^{n} a_k X_k converges in mean square as n → ∞ if Σ_{i=0}^{∞} Σ_{j=0}^{∞} a_i a_j γ(i − j) is finite.

2.3. Show that if {X_t, t = 0, ±1, ...} is stationary and |θ| < 1 then for each n, Σ_{j=1}^{m} θʲ X_{n+1−j} converges in mean square as m → ∞.

2.4. If M is a closed subspace of the Hilbert space H, show that (M⊥)⊥ = M.

2.5. If M is a closed subspace of the Hilbert space H and x ∈ H, prove that

    min_{y ∈ M} ||x − y|| = max{|⟨x, z⟩| : z ∈ M⊥, ||z|| = 1}.
2.6. Verify the calculations of ψ_1 and ψ_2 in Example 2.3.4. Also check that X_3 = (2 cos ω)X_2 − X_1.

2.7. If H is a complex Hilbert space and x_i ∈ H, i = 1, ..., n, show that sp{x_1, ..., x_n} = {Σ_{j=1}^{n} a_j x_j : a_j ∈ ℂ, j = 1, ..., n}.

2.8. Suppose that {X_t, t = 1, 2, ...} is a stationary process with mean zero. Show that P_{sp{1, X_1, ..., X_n}} X_{n+1} = P_{sp{X_1, ..., X_n}} X_{n+1}.

2.9. (a) Let H = L²([−1, 1], ℬ[−1, 1], μ) where dμ = dx is Lebesgue measure on [−1, 1]. Use the prediction equations to find constants a_0, a_1 and a_2 which minimize

    ∫_{−1}^{1} |eˣ − a_0 − a_1 x − a_2 x²|² dx.

(b) Find max{∫_{−1}^{1} eˣ g(x) dx : g ∈ M⊥, ||g|| = 1} where M = sp{1, x, x²}.

2.10. If X_t = Z_t − θZ_{t−1}, where |θ| < 1 and {Z_t, t = 0, ±1, ...} is a sequence of uncorrelated random variables, each with mean 0 and variance σ², show by checking the prediction equations that the best mean square predictor of X_{n+1} in sp{X_j, −∞ < j ≤ n} is

    X̂_{n+1} = −Σ_{j=1}^{∞} θʲ X_{n+1−j}.

What is the mean squared error of X̂_{n+1}?

2.11. If X_t is defined as in Problem 2.10 with θ = 1, find the best mean square predictor of X_{n+1} in sp{X_j, 1 ≤ j ≤ n} and the corresponding mean squared error.

2.12. If X_t = φ_1 X_{t−1} + φ_2 X_{t−2} + ··· + φ_p X_{t−p} + Z_t, t = 0, ±1, ..., where {Z_t} is a sequence of uncorrelated random variables, each with mean zero and variance σ², and such that Z_t is uncorrelated with {X_j, j < t} for each t, use the prediction equations to show that the best mean square predictor of X_{n+1} in sp{X_j, −∞ < j ≤ n} is

    X̂_{n+1} = φ_1 X_n + φ_2 X_{n−1} + ··· + φ_p X_{n+1−p}.

2.13. (Gram-Schmidt orthogonalization). Let x_1, x_2, ..., x_n be linearly independent elements of a Hilbert space H (i.e. elements for which ||a_1 x_1 + ··· + a_n x_n|| = 0 implies that a_1 = a_2 = ··· = a_n = 0). Define w_1 = x_1 and

    w_k = x_k − Σ_{j=1}^{k−1} (⟨x_k, w_j⟩ / ||w_j||²) w_j,   k = 2, ..., n.

Show that {e_k = w_k/||w_k||, k = 1, ..., n} is an orthonormal set and that sp{e_1, ..., e_k} = sp{x_1, ..., x_k} for 1 ≤ k ≤ n.

2.14. Show that every closed subspace M of ℝⁿ which contains a non-zero vector can be written as M = sp{e_1, ..., e_m} where {e_1, ..., e_m} is an orthonormal subset of M and m (≤ n) is the same for all such representations.
2.15. Let X_1, X_2 and X_3 be three random variables with mean zero and covariance matrix,

Use the Gram-Schmidt orthogonalization process of Problem 2.13 to find three uncorrelated random variables Z_1, Z_2 and Z_3 such that sp{X_1} = sp{Z_1}, sp{X_1, X_2} = sp{Z_1, Z_2} and sp{X_1, X_2, X_3} = sp{Z_1, Z_2, Z_3}.

2.16. (Hermite polynomials). Let H = L²(ℝ, ℬ, μ) where dμ = (2π)^{−1/2} e^{−x²/2} dx. Set f_0(x) = 1, f_1(x) = x, f_2(x) = x², f_3(x) = x³. Using the Gram-Schmidt orthogonalization process, find polynomials H_k(x) of degree k, k = 0, 1, 2, 3, which are orthogonal in H. (Do not however normalize H_k(x) to have unit length.) Verify that

    H_k(x) = (−1)ᵏ e^{x²/2} (dᵏ/dxᵏ) e^{−x²/2},   k = 0, 1, 2, 3.

2.17. Prove the first statement in the proof of Theorem 2.4.2.

2.18. (a) Let x be an element of the Hilbert space H = sp{x_1, x_2, ...}. Show that H is separable and that

    x = lim_{n→∞} P_{sp{x_1, ..., x_n}} x.

(b) If {X_t, t = 0, ±1, ...} is a stationary process show that

    P_{sp{X_j, −∞ < j ≤ n}} X_{n+1} = lim_{r→∞} P_{sp{X_j, n−r ≤ j ≤ n}} X_{n+1}.
2.19. (General linear model). Consider the general linear model

    Y = Xθ + Z,

where Y = (Y_1, ..., Y_n)' is the vector of observations, X is a known n × m matrix of rank m < n, θ = (θ_1, ..., θ_m)' is an m-vector of parameter values, and Z = (Z_1, ..., Z_n)' is the vector of noise variables. The least squares estimator of θ is given by equation (2.6.4), i.e.

    θ̂ = (X'X)⁻¹X'Y.

Assume that Z ~ N(0, σ²I_n) where I_n is the n-dimensional identity matrix.

(a) Show that Y ~ N(Xθ, σ²I_n).
(b) Show that θ̂ ~ N(θ, σ²(X'X)⁻¹).
(c) Show that the projection matrix P_M = X(X'X)⁻¹X' is non-negative definite and has m non-zero eigenvalues, all of which are equal to one. Similarly, I_n − P_M is also non-negative definite with (n − m) non-zero eigenvalues, all of which are equal to one.
(d) Show that the two vectors of random variables P_M(Y − Xθ) and (I_n − P_M)Y are independent, and that σ⁻²||P_M(Y − Xθ)||² and σ⁻²||(I_n − P_M)Y||² are independent chi-squared random variables with m and (n − m) degrees of freedom respectively. (||Y|| here denotes the Euclidean norm of Y, i.e. (Σ_{i=1}^{n} Y_i²)^{1/2}.)
(e) Conclude that

    (n − m) ||P_M(Y − Xθ)||² / (m ||Y − P_M Y||²)

has the F distribution with m and (n − m) degrees of freedom.

2.20. Suppose (X, Z_1, ..., Z_n)' has a multivariate normal distribution. Show that

    P_{sp{1, Z_1, ..., Z_n}}(X) = E_{M(Z_1,...,Z_n)}(X),

where the conditional expectation operator E_{M(Z_1,...,Z_n)} is defined as in Section 2.7.
2.21. Suppose {X_t, t = 0, ±1, ...} is a stationary process with mean zero and autocovariance function γ(·) which is absolutely summable (i.e. Σ_{h=−∞}^{∞} |γ(h)| < ∞). Define f to be the function,

    f(λ) = (2π)⁻¹ Σ_{h=−∞}^{∞} γ(h) e^{−iλh},   −π ≤ λ ≤ π,

and show that γ(h) = ∫_{−π}^{π} e^{iλh} f(λ) dλ.

2.22. (a) If f ∈ L²([−π, π]), prove the Riemann-Lebesgue lemma: ⟨f, e_h⟩ → 0 as h → ∞, where e_h was defined by (2.8.2).
(b) If f ∈ L²([−π, π]) has a continuous derivative f′(x) and f(π) = f(−π), show that ⟨f, e_h⟩ = (ih)⁻¹⟨f′, e_h⟩ and hence that h⟨f, e_h⟩ → 0 as h → ∞. Show also that Σ_{h=−∞}^{∞} |⟨f, e_h⟩| < ∞ and conclude that S_n f (see Section 2.8) converges uniformly to f.

2.23. Show that the space ℓ² (Example 2.9.1) is a separable Hilbert space.

2.24. If H is any Hilbert space with orthonormal basis {e_n, n = 1, 2, ...}, show that the mapping defined by Th = {⟨h, e_n⟩}, h ∈ H, is an isomorphism of H onto ℓ².

2.25.* Prove that M(Z) (see Definition 2.7.3) is closed.
CHAPTER 3
Stationary ARMA Processes
In this chapter we introduce an extremely important class of time series {X_t, t = 0, ±1, ±2, ...} defined in terms of linear difference equations with constant coefficients. The imposition of this additional structure defines a parametric family of stationary processes, the autoregressive moving average or ARMA processes. For any autocovariance function γ(·) such that lim_{h→∞} γ(h) = 0, and for any integer k > 0, it is possible to find an ARMA process with autocovariance function γ_X(·) such that γ_X(h) = γ(h), h = 0, 1, ..., k. For this (and other) reasons the family of ARMA processes plays a key role in the modelling of time-series data. The linear structure of ARMA processes leads also to a very simple theory of linear prediction which is discussed in detail in Chapter 5.
§3.1 Causal and Invertible ARMA Processes

In many respects the simplest kind of time series {X_t} is one in which the random variables X_t, t = 0, ±1, ±2, ..., are independently and identically distributed with zero mean and variance σ². From a second order point of view, i.e. ignoring all properties of the joint distributions of {X_t} except those which can be deduced from the moments E(X_t) and E(X_s X_t), such processes are identified with the class of all stationary processes having mean zero and autocovariance function

    γ(h) = σ²   if h = 0,
           0    if h ≠ 0.   (3.1.1)
Definition 3.1.1. The process {Z_t} is said to be white noise with mean 0 and variance σ², written

    {Z_t} ~ WN(0, σ²),   (3.1.2)

if and only if {Z_t} has zero mean and covariance function (3.1.1). If the random variables Z_t are independently and identically distributed with mean 0 and variance σ² then we shall write

    {Z_t} ~ IID(0, σ²).   (3.1.3)
A very wide class of stationary processes can be generated by using white noise as the forcing terms in a set of linear difference equations. This leads to the notion of an autoregressive-moving average (ARMA) process.

Definition 3.1.2 (The ARMA(p, q) Process). The process {X_t, t = 0, ±1, ±2, ...} is said to be an ARMA(p, q) process if {X_t} is stationary and if for every t,

    X_t − φ_1 X_{t−1} − ··· − φ_p X_{t−p} = Z_t + θ_1 Z_{t−1} + ··· + θ_q Z_{t−q},   (3.1.4)

where {Z_t} ~ WN(0, σ²). We say that {X_t} is an ARMA(p, q) process with mean μ if {X_t − μ} is an ARMA(p, q) process.

The equations (3.1.4) can be written symbolically in the more compact form

    φ(B)X_t = θ(B)Z_t,   t = 0, ±1, ±2, ...,   (3.1.5)

where φ and θ are the pth and qth degree polynomials

    φ(z) = 1 − φ_1 z − ··· − φ_p z^p   (3.1.6)

and

    θ(z) = 1 + θ_1 z + ··· + θ_q z^q,   (3.1.7)

and B is the backward shift operator defined by

    B^j X_t = X_{t−j},   j = 0, ±1, ±2, ....   (3.1.8)

The polynomials φ and θ will be referred to as the autoregressive and moving average polynomials respectively of the difference equations (3.1.5).
EXAMPLE 3.1.1 (The MA(q) Process). If φ(z) ≡ 1 then

    X_t = θ(B)Z_t   (3.1.9)

and the process is said to be a moving-average process of order q (or MA(q)). It is quite clear in this case that the difference equations have the unique solution (3.1.9). Moreover the solution {X_t} is a stationary process since (defining θ_0 = 1 and θ_j = 0 for j > q) we see that

    EX_t = Σ_{j=0}^{q} θ_j EZ_{t−j} = 0

and

    Cov(X_{t+h}, X_t) = σ² Σ_{j=0}^{q−|h|} θ_j θ_{j+|h|}   if |h| ≤ q,
                        0                                 if |h| > q.

A realization of {X_1, ..., X_100} with q = 1, θ_1 = −.8 and Z_t ~ N(0, 1) is shown in Figure 3.1(a). The autocorrelation function of the process is shown in Figure 3.1(b).
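A realization like the one in Figure 3.1 is easy to generate; the following sketch (Python with NumPy, not the software package accompanying the book) simulates the MA(1) of this example and compares the sample autocorrelations with the model values:

    import numpy as np

    # Simulate the MA(1) of Example 3.1.1: X_t = Z_t - 0.8 Z_{t-1}, Z_t ~ N(0,1).
    rng = np.random.default_rng(42)
    n = 100_000
    Z = rng.standard_normal(n + 1)
    X = Z[1:] - 0.8 * Z[:-1]

    def sample_acf(x, lags):
        x = x - x.mean()
        c0 = np.mean(x * x)
        return [np.mean(x[h:] * x[:len(x) - h]) / c0 if h else 1.0 for h in lags]

    print(sample_acf(X, range(4)))
    # Model ACF: rho(1) = theta_1/(1 + theta_1^2) = -0.8/1.64 ~ -0.488, rho(h) = 0 for h > 1.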
EXAMPLE 3.1.2 (The AR(p) Process). If θ(z) ≡ 1 then

    φ(B)X_t = Z_t   (3.1.10)

and the process is said to be an autoregressive process of order p (or AR(p)). In this case (as in the general case to be considered in Theorems 3.1.1-3.1.3) the existence and uniqueness of a stationary solution of (3.1.10) needs closer investigation. We illustrate by examining the case φ(z) = 1 − φ_1 z, i.e.

    X_t = φ_1 X_{t−1} + Z_t.   (3.1.11)

Iterating (3.1.11) we obtain

    X_t = Z_t + φ_1 Z_{t−1} + φ_1² X_{t−2}
        = Z_t + φ_1 Z_{t−1} + ··· + φ_1^k Z_{t−k} + φ_1^{k+1} X_{t−k−1}.

If |φ_1| < 1 and {X_t} is stationary then ||X_t||² = E(X_t²) is constant, so that

    ||X_t − Σ_{j=0}^{k} φ_1^j Z_{t−j}||² = φ_1^{2k+2} ||X_{t−k−1}||² → 0 as k → ∞.

Since Σ_{j=0}^{∞} φ_1^j Z_{t−j} is mean-square convergent (by the Cauchy criterion), we conclude that

    X_t = Σ_{j=0}^{∞} φ_1^j Z_{t−j}.   (3.1.12)

Equation (3.1.12) is valid not only in the mean square sense but also (by Proposition 3.1.1 below) with probability one, i.e.

    X_t(ω) = Σ_{j=0}^{∞} φ_1^j Z_{t−j}(ω)   for all ω ∉ E,

where E is a subset of the underlying probability space with probability zero. All the convergent series of random variables encountered in this chapter will (by Proposition 3.1.1) be both mean square convergent and absolutely convergent with probability one. Now {X_t} defined by (3.1.12) is stationary since

    EX_t = Σ_{j=0}^{∞} φ_1^j EZ_{t−j} = 0
and

    Cov(X_{t+h}, X_t) = σ² φ_1^{|h|} Σ_{j=0}^{∞} φ_1^{2j} = σ² φ_1^{|h|} / (1 − φ_1²).

[Figure 3.1. (a) 100 observations of the series X_t = Z_t − .8Z_{t−1}, Example 3.1.1. (b) The autocorrelation function of {X_t}.]

Moreover {X_t} as defined by (3.1.12) satisfies the difference equations (3.1.11) and is therefore the unique stationary solution. A realization of the process with φ_1 = .9 and Z_t ~ N(0, 1) is shown in Figure 3.2(a). The autocorrelation function of the same process is shown in Figure 3.2(b).
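A realization like the one in Figure 3.2 can be produced in the same way; the following sketch (Python with NumPy) simulates the causal AR(1) with φ_1 = .9 and checks that the sample autocorrelations are close to φ_1^h:

    import numpy as np

    # Simulate the AR(1) of Example 3.1.2: X_t = 0.9 X_{t-1} + Z_t, Z_t ~ N(0,1).
    rng = np.random.default_rng(0)
    n, phi = 100_000, 0.9
    Z = rng.standard_normal(n)
    X = np.empty(n)
    X[0] = Z[0] / np.sqrt(1 - phi**2)        # start in the stationary distribution
    for t in range(1, n):
        X[t] = phi * X[t - 1] + Z[t]

    # Sample autocorrelations should be close to phi^h, cf. gamma(h) = sigma^2 phi^|h|/(1 - phi^2).
    Xc = X - X.mean()
    c0 = np.mean(Xc * Xc)
    for h in range(4):
        print(h, np.mean(Xc[h:] * Xc[:n - h]) / c0, phi**h)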
In the case when |φ_1| > 1 the series (3.1.12) does not converge in L². However we can rewrite (3.1.11) in the form

    X_t = −φ_1^{−1} Z_{t+1} + φ_1^{−1} X_{t+1}.   (3.1.13)

Iterating (3.1.13) gives

    X_t = −φ_1^{−1} Z_{t+1} − φ_1^{−2} Z_{t+2} + φ_1^{−2} X_{t+2}
        = −φ_1^{−1} Z_{t+1} − ··· − φ_1^{−k−1} Z_{t+k+1} + φ_1^{−k−1} X_{t+k+1},

which shows, by the same arguments as in the preceding paragraph, that

    X_t = −Σ_{j=1}^{∞} φ_1^{−j} Z_{t+j}   (3.1.14)

is the unique stationary solution of (3.1.11). This solution should not be confused with the non-stationary solution {X_t, t = 0, ±1, ...} of (3.1.11) obtained when X_0 is any specified random variable which is uncorrelated with {Z_t}.

The stationary solution (3.1.14) is frequently regarded as unnatural since X_t as defined by (3.1.14) is correlated with {Z_s, s > t}, a property not shared by the solution (3.1.12) obtained when |φ_1| < 1. It is customary therefore when modelling stationary time series to restrict attention to AR(1) processes with |φ_1| < 1 for which X_t has the representation (3.1.12) in terms of {Z_s, s ≤ t}. Such processes are called causal or future-independent autoregressive processes. It should be noted that every AR(1) process with |φ_1| > 1 can be reexpressed as an AR(1) process with |φ_1| < 1 and a new white noise sequence (Problem 3.3). From a second-order point of view therefore, nothing is lost by eliminating AR(1) processes with |φ_1| > 1 from consideration.

If |φ_1| = 1 there is no stationary solution of (3.1.11) (Problem 3.4). Consequently there is no such thing as an AR(1) with |φ_1| = 1 according to our Definition 3.1.2.

The concept of causality will now be defined for a general ARMA(p, q) process.
[Figure 3.2. (a) 100 observations of the series X_t − .9X_{t−1} = Z_t, Example 3.1.2. (b) The autocorrelation function of {X_t}.]
Definition 3.1.3. An ARMA(p, q) process defined by the equations φ(B)X_t = θ(B)Z_t is said to be causal (or more specifically to be a causal function of {Z_t}) if there exists a sequence of constants {ψ_j} such that Σ_{j=0}^{∞} |ψ_j| < ∞ and

    X_t = Σ_{j=0}^{∞} ψ_j Z_{t−j},   t = 0, ±1, ....   (3.1.15)

It should be noted that causality is a property not of the process {X_t} alone but rather of the relationship between the two processes {X_t} and {Z_t} appearing in the defining ARMA equations. In the terminology of Section 4.10 we can say that {X_t} is causal if it is obtained from {Z_t} by application of a causal linear filter. The following proposition clarifies the meaning of the sum appearing in (3.1.15).
Proposition 3.1.1. If {X_t} is any sequence of random variables such that sup_t E|X_t| < ∞, and if Σ_{j=−∞}^{∞} |ψ_j| < ∞, then the series

    ψ(B)X_t = Σ_{j=−∞}^{∞} ψ_j B^j X_t = Σ_{j=−∞}^{∞} ψ_j X_{t−j}   (3.1.16)

converges absolutely with probability one. If in addition sup_t E|X_t|² < ∞ then the series converges in mean square to the same limit.

PROOF. The monotone convergence theorem and finiteness of sup_t E|X_t| give

    E(Σ_{j=−∞}^{∞} |ψ_j| |X_{t−j}|) = lim_{n→∞} E(Σ_{|j|≤n} |ψ_j| |X_{t−j}|) ≤ lim_{n→∞} (Σ_{|j|≤n} |ψ_j|) sup_t E|X_t| < ∞,

from which it follows that Σ_{j=−∞}^{∞} |ψ_j| |X_{t−j}| and ψ(B)X_t are both finite with probability one.

If sup_t E|X_t|² < ∞ and n > m > 0, then

    E|Σ_{m<|j|≤n} ψ_j X_{t−j}|² = Σ_{m<|j|≤n} Σ_{m<|k|≤n} ψ_j ψ_k E(X_{t−j} X_{t−k})
                               ≤ sup_t E|X_t|² (Σ_{m<|j|≤n} |ψ_j|)² → 0 as m, n → ∞,

and so by the Cauchy criterion the series (3.1.16) converges in mean square. If S denotes the mean square limit, then by Fatou's lemma,

    E|S − ψ(B)X_t|² = E lim inf_{n→∞} |S − Σ_{|j|≤n} ψ_j X_{t−j}|² ≤ lim inf_{n→∞} E|S − Σ_{|j|≤n} ψ_j X_{t−j}|² = 0,

showing that the limits S and ψ(B)X_t are equal with probability one.   □

Proposition 3.1.2. If {X_t} is a stationary process with autocovariance function γ(·) and if Σ_{j=−∞}^{∞} |ψ_j| < ∞, then for each t ∈ ℤ the series (3.1.16) converges absolutely with probability one and in mean square to the same limit. If

    Y_t = ψ(B)X_t,

then the process {Y_t} is stationary with autocovariance function

    γ_Y(h) = Σ_{j,k=−∞}^{∞} ψ_j ψ_k γ(h − j + k).

PROOF. The convergence assertions follow at once from Proposition 3.1.1 and the observation that if {X_t} is stationary then

    E|X_t| ≤ (E|X_t|²)^{1/2} = c,

where c is finite and independent of t.

To check the stationarity of {Y_t} we observe, using the mean square convergence of (3.1.16) and continuity of the inner product, that

    EY_t = lim_{n→∞} Σ_{|j|≤n} ψ_j EX_{t−j} = (Σ_{j=−∞}^{∞} ψ_j) EX_t,

and

    E(Y_{t+h} Y_t) = Σ_{j,k=−∞}^{∞} ψ_j ψ_k (γ(h − j + k) + (EX_t)²).

Thus EY_t and E(Y_{t+h} Y_t) are both finite and independent of t. The autocovariance function γ_Y(·) of {Y_t} is given by

    γ_Y(h) = E(Y_{t+h} Y_t) − EY_{t+h} · EY_t = Σ_{j,k=−∞}^{∞} ψ_j ψ_k γ(h − j + k).   □
It is an immediate corollary of Proposition 3.1.2 that operators such as ψ(B) = Σ_{j=−∞}^{∞} ψ_j B^j with Σ_{j=−∞}^{∞} |ψ_j| < ∞, when applied to stationary processes, are not only meaningful but also inherit the algebraic properties of power series. In particular if Σ_{j=−∞}^{∞} |α_j| < ∞, Σ_{j=−∞}^{∞} |β_j| < ∞, α(B) = Σ_{j=−∞}^{∞} α_j B^j, β(B) = Σ_{j=−∞}^{∞} β_j B^j, and ψ(B) = Σ_{j=−∞}^{∞} ψ_j B^j, where

    ψ_j = Σ_{k=−∞}^{∞} α_k β_{j−k} = Σ_{k=−∞}^{∞} β_k α_{j−k},

then α(B)β(B)X_t is well-defined and

    α(B)β(B)X_t = β(B)α(B)X_t = ψ(B)X_t.
The following theorem gives necessary and sufficient conditions for an ARMA process to be causal. It also gives an explicit representation of X_t in terms of {Z_s, s ≤ t}.

Theorem 3.1.1. Let {X_t} be an ARMA(p, q) process for which the polynomials φ(·) and θ(·) have no common zeroes. Then {X_t} is causal if and only if φ(z) ≠ 0 for all z ∈ ℂ such that |z| ≤ 1. The coefficients {ψ_j} in (3.1.15) are determined by the relation

    ψ(z) = Σ_{j=0}^{∞} ψ_j z^j = θ(z)/φ(z),   |z| ≤ 1.   (3.1.17)

(The numerical calculation of the coefficients ψ_j is discussed in Section 3.3.)

PROOF. First assume that φ(z) ≠ 0 if |z| ≤ 1. This implies that there exists ε > 0 such that 1/φ(z) has a power series expansion,

    1/φ(z) = Σ_{j=0}^{∞} ξ_j z^j = ξ(z),   |z| < 1 + ε.

Consequently ξ_j(1 + ε/2)^j → 0 as j → ∞, so that there exists K ∈ (0, ∞) for which

    |ξ_j| < K(1 + ε/2)^{−j}   for all j = 0, 1, 2, ....

In particular we have Σ_{j=0}^{∞} |ξ_j| < ∞ and ξ(z)φ(z) = 1 for |z| ≤ 1. By Proposition 3.1.2 we can therefore apply the operator ξ(B) to both sides of the equation φ(B)X_t = θ(B)Z_t to obtain

    X_t = ξ(B)θ(B)Z_t.

Thus we have the desired representation,

    X_t = Σ_{j=0}^{∞} ψ_j Z_{t−j},

where the sequence {ψ_j} is determined by (3.1.17).

Now assume that {X_t} is causal, i.e. X_t = Σ_{j=0}^{∞} ψ_j Z_{t−j} for some sequence {ψ_j} such that Σ_{j=0}^{∞} |ψ_j| < ∞. Then

    θ(B)Z_t = φ(B)X_t = φ(B)ψ(B)Z_t.

If we let η(z) = φ(z)ψ(z) = Σ_{j=0}^{∞} η_j z^j, |z| ≤ 1, we can rewrite this equation as

    Σ_{j=0}^{q} θ_j Z_{t−j} = Σ_{j=0}^{∞} η_j Z_{t−j},

and taking inner products of each side with Z_{t−k} (recalling that {Z_t} ~ WN(0, σ²)) we obtain η_k = θ_k, k = 0, ..., q, and η_k = 0, k > q. Hence

    θ(z) = η(z) = φ(z)ψ(z),   |z| ≤ 1.

Since θ(z) and φ(z) have no common zeroes and since |ψ(z)| < ∞ for |z| ≤ 1, we conclude that φ(z) cannot be zero for |z| ≤ 1.   □
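The causality criterion of Theorem 3.1.1 is easy to check numerically for given coefficients: compute the zeroes of φ(z) and verify that they all lie strictly outside the unit circle. The sketch below (Python with NumPy; the helper name is_causal and the example coefficients are ours, chosen for illustration) does exactly this:

    import numpy as np

    def is_causal(phi):
        """Check phi(z) = 1 - phi_1 z - ... - phi_p z^p != 0 for |z| <= 1 (Theorem 3.1.1)
        by verifying that all zeroes of phi lie strictly outside the unit circle."""
        # np.roots expects coefficients from highest degree to lowest.
        coeffs = np.r_[-np.array(phi)[::-1], 1.0]
        return np.all(np.abs(np.roots(coeffs)) > 1.0)

    print(is_causal([0.5]))         # 1 - 0.5 z: zero at z = 2, causal
    print(is_causal([1.5]))         # 1 - 1.5 z: zero at z = 2/3, not causal
    print(is_causal([1.0, -0.25]))  # 1 - z + 0.25 z^2: double zero at z = 2, causal

The same check applied to θ(z) gives the invertibility condition of Theorem 3.1.2 below.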
Remark 1. If {X_t} is an ARMA process for which the polynomials φ(·) and θ(·) have common zeroes, then there are two possibilities:

(a) none of the common zeroes lie on the unit circle, in which case (Problem 3.6) {X_t} is the unique stationary solution of the ARMA equations with no common zeroes, obtained by cancelling the common factors of φ(·) and θ(·),
(b) at least one of the common zeroes lies on the unit circle, in which case the ARMA equations may have more than one stationary solution (see Problem 3.24).

Consequently ARMA processes for which φ(·) and θ(·) have common zeroes are rarely considered.

Remark 2. The first part of the proof of Theorem 3.1.1 shows that if {X_t} is a stationary solution of the ARMA equations with φ(z) ≠ 0 for |z| ≤ 1, then we must have X_t = Σ_{j=0}^{∞} ψ_j Z_{t−j} where {ψ_j} is defined by (3.1.17). Conversely if X_t = Σ_{j=0}^{∞} ψ_j Z_{t−j} then φ(B)X_t = φ(B)ψ(B)Z_t = θ(B)Z_t. Thus the process {ψ(B)Z_t} is the unique stationary solution of the ARMA equations if φ(z) ≠ 0 for |z| ≤ 1.

Remark 3. We shall see later (Problem 4.28) that if φ(·) and θ(·) have no common zeroes and if φ(z) = 0 for some z ∈ ℂ with |z| = 1, then there is no stationary solution of φ(B)X_t = θ(B)Z_t.
We now introduce another concept which is closely related to that of causality.

Definition 3.1.4. An ARMA(p, q) process defined by the equations φ(B)X_t = θ(B)Z_t is said to be invertible if there exists a sequence of constants {π_j} such that Σ_{j=0}^{∞} |π_j| < ∞ and

    Z_t = Σ_{j=0}^{∞} π_j X_{t−j},   t = 0, ±1, ....   (3.1.18)

Like causality, the property of invertibility is not a property of the process {X_t} alone, but of the relationship between the two processes {X_t} and {Z_t} appearing in the defining ARMA equations. The following theorem gives necessary and sufficient conditions for invertibility and specifies the coefficients π_j in the representation (3.1.18).

Theorem 3.1.2. Let {X_t} be an ARMA(p, q) process for which the polynomials φ(·) and θ(·) have no common zeroes. Then {X_t} is invertible if and only if θ(z) ≠ 0 for all z ∈ ℂ such that |z| ≤ 1. The coefficients {π_j} in (3.1.18) are determined by the relation

    π(z) = Σ_{j=0}^{∞} π_j z^j = φ(z)/θ(z),   |z| ≤ 1.   (3.1.19)

(The coefficients {π_j} can be calculated from recursion relations analogous to those for {ψ_j} (see Problem 3.7).)

PROOF. First assume that θ(z) ≠ 0 if |z| ≤ 1. By the same argument as in the proof of Theorem 3.1.1, 1/θ(z) has a power series expansion

    1/θ(z) = Σ_{j=0}^{∞} η_j z^j = η(z),   |z| < 1 + ε,

for some ε > 0. Since Σ_{j=0}^{∞} |η_j| < ∞, Proposition 3.1.2 allows us to apply η(B) to both sides of the equation φ(B)X_t = θ(B)Z_t to obtain

    η(B)φ(B)X_t = η(B)θ(B)Z_t = Z_t.

Thus we have the desired representation

    Z_t = Σ_{j=0}^{∞} π_j X_{t−j},

where the sequence {π_j} is determined by (3.1.19).

Conversely if {X_t} is invertible then Z_t = Σ_{j=0}^{∞} π_j X_{t−j} for some sequence {π_j} such that Σ_{j=0}^{∞} |π_j| < ∞. Then

    φ(B)Z_t = π(B)φ(B)X_t = π(B)θ(B)Z_t.

Setting ξ(z) = π(z)θ(z) = Σ_{j=0}^{∞} ξ_j z^j, |z| ≤ 1, we can rewrite this equation as φ(B)Z_t = ξ(B)Z_t, and taking inner products of each side with Z_{t−k} we obtain ξ_k = φ_k, k = 0, ..., p, and ξ_k = 0, k > p. Hence

    φ(z) = ξ(z) = π(z)θ(z),   |z| ≤ 1.

Since φ(z) and θ(z) have no common zeroes and since |π(z)| < ∞ for |z| ≤ 1, we conclude that θ(z) cannot be zero for |z| ≤ 1.   □
Remark 4. If {X_t} is a stationary solution of the equations

    φ(B)X_t = θ(B)Z_t,   (3.1.20)

and if φ(z)θ(z) ≠ 0 for |z| ≤ 1, then

    X_t = Σ_{j=0}^{∞} ψ_j Z_{t−j}   and   Z_t = Σ_{j=0}^{∞} π_j X_{t−j}.

Remark 5. If {X_t} is any ARMA process, φ(B)X_t = θ(B)Z_t, with φ(z) non-zero for all z such that |z| = 1, then it is possible to find polynomials φ̃(·), θ̃(·) and a white noise process {Z_t*} such that φ̃(B)X_t = θ̃(B)Z_t* and such that {X_t} is a causal function of {Z_t*}. If in addition θ(z) is non-zero when |z| = 1, then θ̃(·) can be chosen in such a way that {X_t} is also an invertible function of {Z_t*}, i.e. such that θ̃(z) is non-zero for |z| ≤ 1 (see Proposition 3.5.1). If {Z_t} ~ IID(0, σ²) it is not true in general that {Z_t*} is an iid sequence (Breidt and Davis (1990)). It is true, however, if {Z_t} is Gaussian (see Problem 3.18).

Remark 6. Theorem 3.1.2 can be extended to include the case when the moving average polynomial has zeroes on the unit circle if we extend the definition of invertibility to require only that Z_t ∈ sp{X_s, −∞ < s ≤ t}. Under this definition, an ARMA process is invertible if and only if θ(z) ≠ 0 for all |z| < 1 (see Problem 3.8 and Propositions 4.4.1 and 4.4.3).
In view of Remarks 4 and 5 we shall focus attention on causal invertible ARMA processes except when the contrary is explicitly indicated. We conclude this section however with a discussion of the more general case when causality and invertibility are not assumed. Recall from Remark 3 that if φ(·) and θ(·) have no common zeroes and if φ(z) = 0 for some z ∈ ℂ with |z| = 1, then there is no stationary solution of φ(B)X_t = θ(B)Z_t. If on the other hand φ(z) ≠ 0 for all z ∈ ℂ such that |z| = 1, then a well-known result from complex analysis guarantees the existence of r > 1 such that

    θ(z)φ(z)⁻¹ = Σ_{j=−∞}^{∞} ψ_j z^j = ψ(z),   r⁻¹ < |z| < r,   (3.1.21)

the Laurent series being absolutely convergent in the specified annulus (see e.g. Ahlfors (1953)). The existence of this Laurent expansion plays a key role in the proof of the following theorem.

Theorem 3.1.3. If φ(z) ≠ 0 for all z ∈ ℂ such that |z| = 1, then the ARMA equations φ(B)X_t = θ(B)Z_t have the unique stationary solution,

    X_t = Σ_{j=−∞}^{∞} ψ_j Z_{t−j},   (3.1.22)

where the coefficients ψ_j are determined by (3.1.21).

PROOF. By Proposition 3.1.2, {X_t} as defined by (3.1.22) is a stationary process. Applying the operator φ(B) to each side of (3.1.22) and noting, again by Proposition 3.1.2, that φ(B)ψ(B)Z_t = θ(B)Z_t, we obtain

    φ(B)X_t = θ(B)Z_t.   (3.1.23)

Hence {X_t} is a stationary solution of the ARMA equations.

To prove the converse let {X_t} be any stationary solution of (3.1.23). Since φ(z) ≠ 0 for all z ∈ ℂ such that |z| = 1, there exists δ > 1 such that the series Σ_{j=−∞}^{∞} ξ_j z^j = φ(z)⁻¹ = ξ(z) is absolutely convergent for δ⁻¹ < |z| < δ. We can therefore apply the operator ξ(B) to each side of (3.1.23) to get

    ξ(B)φ(B)X_t = ξ(B)θ(B)Z_t,

or equivalently

    X_t = ψ(B)Z_t.   □
§3.2 Moving Average Processes of Infinite Order

In this section we extend the notion of MA(q) process introduced in Section 3.1 by allowing q to be infinite.

Definition 3.2.1. If {Z_t} ~ WN(0, σ²) then we say that {X_t} is a moving average (MA(∞)) of {Z_t} if there exists a sequence {ψ_j} with Σ_{j=0}^{∞} |ψ_j| < ∞ such that

    X_t = Σ_{j=0}^{∞} ψ_j Z_{t−j},   t = 0, ±1, ±2, ....   (3.2.1)

EXAMPLE 3.2.1. The MA(q) process defined by (3.1.9) is a moving average of {Z_t} with ψ_j = θ_j, j = 0, 1, ..., q, and ψ_j = 0, j > q.

EXAMPLE 3.2.2. The AR(1) process with |φ| < 1 is a moving average of {Z_t} with ψ_j = φ^j, j = 0, 1, 2, ....

EXAMPLE 3.2.3. By Theorem 3.1.1 the causal ARMA(p, q) process φ(B)X_t = θ(B)Z_t is a moving average of {Z_t} with Σ_{j=0}^{∞} ψ_j z^j = θ(z)/φ(z), |z| ≤ 1.

It should be emphasized that in the definition of MA(∞) of {Z_t} it is required that X_t should be expressible in terms of Z_s, s ≤ t, only. It is for this reason that we need the assumption of causality in Example 3.2.3. However, even for non-causal ARMA processes, it is possible to find a white noise sequence {Z_t*} such that X_t is a moving average of {Z_t*} (Proposition 3.5.1). Moreover, as we shall see in Section 5.7, a large class of stationary processes have MA(∞) representations. We consider a special case in the following proposition.
Proposition 3.2.1. If {X_t} is a zero-mean stationary process with autocovariance function γ(·) such that γ(h) = 0 for |h| > q and γ(q) ≠ 0, then {X_t} is an MA(q) process, i.e. there exists a white noise process {Z_t} such that

    X_t = Z_t + θ_1 Z_{t−1} + ··· + θ_q Z_{t−q}.   (3.2.2)

PROOF. For each t, define the subspace M_t = sp{X_s, −∞ < s ≤ t} of L² and set

    Z_t = X_t − P_{M_{t−1}} X_t.   (3.2.3)

Clearly Z_t ∈ M_t, and by definition of P_{M_{t−1}}, Z_t ⊥ M_{t−1}. Thus if s < t, Z_s ∈ M_s ⊆ M_{t−1} and hence EZ_s Z_t = 0. Moreover, by Problem 2.18,

    P_{sp{X_s, s=t−n,...,t−1}} X_t → P_{M_{t−1}} X_t   as n → ∞,

so that by stationarity and the continuity of the L² norm,

    ||Z_{t+1}|| = ||X_{t+1} − P_{M_t} X_{t+1}||
               = lim_{n→∞} ||X_{t+1} − P_{sp{X_s, s=t+1−n,...,t}} X_{t+1}||
               = lim_{n→∞} ||X_t − P_{sp{X_s, s=t−n,...,t−1}} X_t||
               = ||X_t − P_{M_{t−1}} X_t|| = ||Z_t||.

Defining σ² := ||Z_t||², we conclude that {Z_t} ~ WN(0, σ²).

Now by (3.2.3), it follows that

    M_{t−1} = sp{X_s, s < t − 1, Z_{t−1}}
            = sp{X_s, s < t − q, Z_{t−q}, ..., Z_{t−1}},

and consequently M_{t−1} can be decomposed into the two orthogonal subspaces M_{t−q−1} and sp{Z_{t−q}, ..., Z_{t−1}}. Since γ(h) = 0 for |h| > q, it follows that X_t ⊥ M_{t−q−1} and so by Proposition 2.3.2 and Theorem 2.4.1,

    P_{M_{t−1}} X_t = P_{M_{t−q−1}} X_t + P_{sp{Z_{t−q},...,Z_{t−1}}} X_t
                    = 0 + σ⁻² E(X_t Z_{t−1}) Z_{t−1} + ··· + σ⁻² E(X_t Z_{t−q}) Z_{t−q}
                    = θ_1 Z_{t−1} + ··· + θ_q Z_{t−q},

where θ_j := σ⁻² E(X_t Z_{t−j}), which by stationarity is independent of t for j = 1, ..., q. Substituting for P_{M_{t−1}} X_t in (3.2.3) gives (3.2.2).   □

Remark. If {X_t} has the same autocovariance function as that of an ARMA(p, q) process, then {X_t} is also an ARMA(p, q) process. In other words, there exists a white noise sequence {Z_t} and coefficients φ_1, ..., φ_p, θ_1, ..., θ_q such that

    X_t − φ_1 X_{t−1} − ··· − φ_p X_{t−p} = Z_t + θ_1 Z_{t−1} + ··· + θ_q Z_{t−q}

(see Problem 3.19).
91
§3.3. Computing the Autocovariance Function of an ARMA(p, q) Process
The following theorem is an immediate consequence of Proposition 3.1.2.

Theorem 3.2.1. The MA(∞) process defined by (3.2.1) is stationary with mean zero and autocovariance function

    γ(k) = σ² Σ_{j=0}^{∞} ψ_j ψ_{j+|k|}.   (3.2.4)

Notice that Theorem 3.2.1 together with Example 3.2.3 completely determines the autocovariance function γ of any causal ARMA(p, q) process. We shall discuss the calculation of γ in more detail in Section 3.3.

The notion of AR(p) process introduced in Section 3.1 can also be extended to allow p to be infinite. In particular we note from Theorem 3.1.2 that any invertible ARMA(p, q) process satisfies the equations

    X_t + Σ_{j=1}^{∞} π_j X_{t−j} = Z_t,   t = 0, ±1, ±2, ...,

which have the same form as the AR(p) equations (3.1.10) with p = ∞.
§3.3 Computing the Autocovariance Function of an ARMA(p, q) Process

We now give three methods for computing the autocovariance function of an ARMA process. In practice, the third method is the most convenient for obtaining numerical values and the second is the most convenient for obtaining a solution in closed form.

First Method. The autocovariance function γ of the causal ARMA(p, q) process φ(B)X_t = θ(B)Z_t was shown in Section 3.2 to satisfy

    γ(k) = σ² Σ_{j=0}^{∞} ψ_j ψ_{j+|k|},   (3.3.1)

where

    ψ(z) = Σ_{j=0}^{∞} ψ_j z^j = θ(z)/φ(z)   for |z| ≤ 1,   (3.3.2)

θ(z) = 1 + θ_1 z + ··· + θ_q z^q and φ(z) = 1 − φ_1 z − ··· − φ_p z^p. In order to determine the coefficients ψ_j we can rewrite (3.3.2) in the form ψ(z)φ(z) = θ(z) and equate coefficients of z^j to obtain (defining θ_0 = 1, θ_j = 0 for j > q and φ_j = 0 for j > p),

    ψ_j − Σ_{0<k≤j} φ_k ψ_{j−k} = θ_j,   0 ≤ j < max(p, q + 1),   (3.3.3)
and
= =
j � max(p, q + 1).
(3.3.4)
These equations can easily be solved successively for 1/10 , lj; 1 , lj; , . . . . Thus
2
1/10
lj; l
00
=
el
+
1,
1/Jo r/Jl
=
el
+ ¢1 ,
(3.3.5 )
Alternatively the general solution (3.3.4) can be written down, with the aid of
Section 3.6 as
r; - 1
k
n � max(p, q + 1 ) - p,
aii n i � i ",
(3.3.6)
I
I
=
l
=
O
j
i
where �;, i = 1 , . . . , k are the distinct zeroes of rfo(z) and r; is the multiplicity of
�; (so that in particular we must have I �= l r; p). The p constants aii and the
coefficients lj;i , 0 � j < max(p, q + 1) - p, are then determined uniquely by the
if;. =
=
max(p, q + 1) boundary conditions (3.3.3). This completes the determination
of the sequence { lj;i } and hence, by (3.3.1), of the autocovariance function y .
ExAMPLE
form
3.3. 1 . ( 1 - B + ±B 2 )X1 = ( 1 + B)Z1• The equations (3.3.3) take the
and (3.3.4) becomes
1/10 = 00 = 1 ,
1/11 = 81 + I/Jo r/J 1 = 81 + r/J 1 = 2,
lj;j - 1/Jj - 1
+ t i/Jj - 2 = 0,
The general solution of (3.3.4) is (see Section 3.6)
j � 2.
n � 0.
The constants a 1 0 and a 1 1 are found from the boundary conditions lj;0
and if; 1 = 2 to be
a 1 0 = 1 and a 1 1 = 3.
Hence
if;. = (1 + 3n)T ",
n = 0, 1 , 2, . . . .
Finally, substituting in (3.3 . 1 ), we obtain for k � 0
y( k) = a2 I ( 1 + 3j)( 1 + 3j + 3k)rzj - k
00
j= O
= a2 rk I [(3k + 1 )4-j + 3 (3k + 2)j4-j + 9/4 -j J
00
j=O
= a2 r k [j:(3k + 1) + V (3k + 2) + 1NJ
= a 2 rk [ 332 + 8k].
=
1
§3.3. Computing the Autocovariance Function of an ARMA(p, q) Process
93
Second Method. An alternative method for computing the autocovariance
function y( · ) of the causal ARMA(p, q)
l/J(B)X,
= 8(B) ,
(3.3.7)
Z,
is based on the difference equations for y(k), k = 0, 1, 2, . . . , which are obtained
by multiplying each side of (3.3.7) by X,_ k and taking expectations, namely
and
0 s k < max(p, q + 1 ),
y(k) - ¢11 y(k - 1 ) - . . . - l/Jp y(k - p) = 0,
=
k ri - 1
y(h) = I. I. {3
i =1
(3.3.8)
k :2: max(p, q + 1 ). (3.3.9)
(In evaluating the right-hand sides of these equations we have used the
representation X, I.� o tf;jZr -j · )
The general solution of (3.3.9) has the same form as (3.3.6), viz.
j= O
ij
hj � i \
:2:
h
max(p, q + 1 ) - p,
(3.3. 10)
where the p constants {3ii and the covariances y(j), 0 s j < max(p, q + 1 ) - p,
are uniquely determined from the boundary conditions (3.3.8) after first com­
puting t/;0, tj; 1 , . . . , t/;q from (3.3.5).
ExAMPLE
=
3.3.2. ( 1 - B + iB 2 ) X, ( 1 + B)Z,. The equations (3.3.9) become
y(k) - y(k - 1 ) + t y(k - 2) = 0,
k :2: 2,
with general solution
n :2: 0.
(3.3.1 1)
y(n) = ( /31 0 + {31 1 n) T n,
The boundary conditions (3.3.8) are
y(O) - y(l) + t y(2) a 2 (t/Jo + t/1 1 ),
y(1) - y(O) + ty{l) a 2 tj;0 ,
where from (3.3.5), t/;0 1 and t/; 1 = 81 + ¢1 1 = 2. Replacing y(O), y(l) and y(2)
in accordance with the general solution (3.3. 1 1) we obtain
3{31 0 - 2{3 1 1 1 6a 2 ,
- 3f31o + 5{31 1 = 8a z ,
=
=
=
whence {31 1
= 8a2
=
and {3 1 0 = 32a 2/3. Finally therefore we obtain the solution
y(k)
= a2 2-k [ 3l
+
8k],
as found in Example 3.3. 1 using the first method.
94
3. Stationary ARMA Processes
ExAMPLE 3.3.3 (The Autocovariance Function of an MA(q) Process). By
Theorem 3.2. 1 the autocovariance function of the process
has the extremely simple form
l k l � q,
(3.3. 1 2)
l k l > q.
where 80 is defined to be 1 and ()j, j > q, is defined to be zero.
ExAMPLE 3.3.4 (The Autocovariance Function of an AR(p) Process). From
(3.3. 1 0) we know that the causal AR(p) process
</J(B)Xr = Zr,
has an autocovariance function of the form
k
ri-1
y(h) = I :L f3ij hj � � \
i =l j=O
(3.3. 1 3)
where C i = 1, . . . , k, are the zeroes (possibly complex) of </J(z), and r; is the
multiplicity of � ;- The constants f3ij are found from (3.3.8).
By changing the autoregressive polynomial </J( · ) and allowing p to be
arbitrarily large it is possible to generate a remarkably large variety of
covariance functions y( · ). This is extremely important when we attempt to
find a process whose autocovariance function "matches" the sample auto­
covariances of a given data set. The general problem of finding a suitable
ARMA process to represent a given set of data is discussed in detail in
Chapters 8 and 9. In particular we shall prove in Section 8. 1 that if y( · ) is any
covariance function such that y(h) --> 0 as h --> oo, then for any k there is a
causal AR(k) process whose autocovariance function at lags 0, 1 , . . . , k,
coincides with y(j), j = 0, 1, . . . , k.
We note from (3.3. 1 3) that the rate of convergence of y(n) to zero as n --> oo
depends on the zeroes � ; which are closest to the unit circle. (The causality
condition guarantees that I � ; I > 1 , i = 1, . . . , k.) If </J( - ) has a zero close to the
unit circle then the corresponding term or terms of (3.3. 1 3) will decay in
absolute value very slowly. Notice also that simple real zeroes of </J( · ) contri­
bute terms to (3.3 . 1 3) which decrease geometrically with h. A pair of complex
conjugate zeroes together contribute a geometrically damped sinusoidal term.
We shall illustrate these possibilities numerically in Example 3.3.5 with refer­
ence to an AR(2) process.
§3.3. Computing the Autocovariance Function of an ARMA(p, q) Process
EXAMPLE
95
3.3.5 (An Autoregressive Process with p = 2). For the causal AR(2),
we easily find from (3.3. 1 3) and (3.3.8), using the relations
r/J 1 = G 1 + G 1 ,
and
that
Figure 3.3 illustrates some of the possible forms of y( · ) for different values
of �1 and � 2 . Notice that if �1 = re i6 and � 2 = re- i6, 0 < 8 < n, then we can
rewrite ( 3.3.14) in the more illuminating form,
0"2r4 , -h sin(h8 + 1/1 )
y(h) =
(3.3. 1 5)
(r2 - 1 ) (r4 - 2r2 cos 28 + 1 ) 112 sin 8 '
•
where
r2 + 1
tan 1/J = --- tan 8
r2 - 1
(3.3. 1 6)
and cos 1/J has the same sign as cos 8.
Third Method. The numerical determination of the autocovariance function
y( · ) from equations ( 3.3.8) and (3.3.9) can be carried out readily by first finding
y(O), . . . , y( p) from the equations with k = 0, 1, . . . , p, and then using the
subsequent equations to determine y(p + 1 ), y( p + 2), . . . recursively.
EXAMPLE 3.3.6. For the process considered in Examples 3.3. 1 and 3.3.2 the
equations (3.3.8) and (3.3.9) with k = 0, 1, 2 are
y(O) - y(l) + t y(2) = 30"2 ,
y(1) - y(O) + ty(1) = 0"2,
y(2) - y(l) + t y(O) = 0,
with solution y(O) = 320" 2 /3, y(1) = 280" 2/3, y(2) = 200"2/3. The higher lag
autocovariances can now easily be found recursively from the equations
y(k) = y(k - 1 ) - t y(k - 2),
k = 3, 4, . . . .
3. Stationary ARMA Processes
96
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
-0.1
-0 2
-0.3
-0.4
-0.5
-0.6
-0.7
-0.8
- 0. 9
-1
0
10
5
15
20
(a)
0.9
0. 8
0 7
0.6
0.5
0.4
0.3
0.2
0. 1
0
-0. 1
4------ ------ ------ -----�
-0.2 -
-0.3
-0 4
-0.5
-0.6
-0.7
-0.8
-0 9
-I
----,----,----,
------,��----.---.--�-----,--,--,-,--,
--,--,----,
-.----,
---1
0
5
(b)
10
15
20
Figure 3.3. Autocorrelation functions y(h)/y(O), h = 0, . . . , 20, of the AR(2) process
(1 - � ! 1 B) ( 1 - ¢2 1 B)X, Z, when (a) � 1 = 2 and � 2 = 5, (b) � 1 = � and �2 = 2,
(c) � 1 = - � and � 2 = 2, (d) � 1 , � 2 = 2( 1 ± ij3)/3 .
=
§3.3. Computing the Autocovariance Function of an AR MA(p, q) Process
97
·--------
0 9
0.8 -
0.7
0.6
0.5
0.4
0.3
0.2
0. 1
0
-0 I
-0.2
-0 3
-0 4
-0.5
-0.6
-0.7
-0.8
-0.9
-I
0
5
0
5
(c)
10
15
20
10
15
20
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0. 1
0
-0. 1
-0.2
-0.3
-0.4
-0 5
-0.6
-0.7
- 0.8
-0.9
-1
(d )
Figure 3.3 Continued
3. Stationary ARMA Processes
98
§3.4 The Partial Autocorrelation Function
The partial autocorrelation function, like the autocorrelation function, conveys
vital information regarding the dependence structure of a stationary process.
Like the autocorrelation function it also depends only on the second order
properties of the process. The partial autocorrelation a(k) at lag k may be
regarded as the correlation between X 1 and Xk + 1 , adjusted for the intervening
observations X2 , , Xk . The idea is made precise in the following definition.
. • •
Definition 3.4.1. The partial autocorrelation function (pacf) a( · ) of a stationary
time series is defined by
a( l ) = Corr(X2 , Xd = p(l),
and
k :?:. 2,
where the projections PSJ>p , xz, . . . , xk) Xk + 1 and PSJ>p , x2, , x,J X 1 can be found
from (2.7. 1 3) and (2.7. 1 4). The value a(k) is known as the partial autocorrelation
at lag k.
. . •
The partial autocorrelation a(k), k :?:. 2, is thus the correlation of the two
residuals obtained after regressing Xk+1 and X 1 on the intermediate observa­
tions X 2 , , Xk . Recall that if the stationary process has zero mean then
P;;p{ 1 , x2 , ••• , xk} ( " ) = P;;p{ x2 , , xk} ( " ) (see Problem 2.8).
. • .
•••
EXAMPLE
3.4. 1 . Let {X,} be the zero mean AR(l) process
X, = .9X, _ 1 + Z,.
For this example
a(l) = Corr(X2 , X1 )
= Corr(.9X1
+
Z2 , Xd
= .9
since Corr(Z2 , Xd = 0. Moreover P;;p{xz , . . . , xk} Xk + 1 = .9Xk by Problem 2. 1 2
and P;;p { x z . ... , xk} X1 = .9X2 since (X 1 , X2 , , Xk )' has the same covariance
matrix as (Xk + 1 , X k , . . . , X 2 )'. Hence for k :?:. 2,
a(k) = Corr(Xk+1 - .9Xb X 1 - .9X 2 )
• • •
= Corr(Zk + 1 , X 1 - .9Xz )
= 0.
A realization of 1 00 observations { X" t = 1 , . . . , 1 00} was displayed in Figure
3.2. Scatter diagrams of (X,_ 1 , X,) and (X, _ 2 , X,) are shown in Figures 3.4 and
3.5 respectively. The sample correlation p(l) = 2:: (�1 (X, - X)(X,+1 - X)/
c�=:�? (X, - X)2 ] for Figure 3.4 is .814 (as compared with the corresponding
99
§3.4. The Partial Autocorrelation Function
6
,-------,--n--�
5
4
3
2
0
-1
-2
-3
-4
0
0
0
0
0
0
0
0
0 0
0
0
0
0
Cbo
0
c9 0
0
0
a
0
0
0
0
0
0
DO
o e
�----,---,---0�---r---.--4
0
-2
-4
4
2
6
Figure 3.4. Scatter plot of the points (x,_ 1 , x,) for the data of Figure 3.2, showing the
line x, = 9 x , 1
.
_
.
6 ,-------,--u,---��-,
0
5
co
4
3
0
2
/
0
0
0
-1
-2
-3
-4
0
0
0
0
oo
o/ o
Do
0
o r:fl
OJ
0
0
0
0
0
0
DO
D li!J
0
0
0
c
0
0
0
0
oo
0
0
oD
0
0
/
0
0
4-----,------.-----ro
d___�-----,-----,-----,-----,-----,----�
6
4
2
0
-2
-4
Figure 3.5. Scatter plot of the points (x,_ 2 , x,) for the data of Figure 3.2, showing the
line x, = .8 1x,_2.
100
3. Stationary ARMA Processes
=
theoretical correlation p(l) .9). Likewise the sample correlation p(2) =
1
"Li��\ (X1 - X) (X1 +2 - X)/[L 1 21° (X1 - X) 2 ] for Figure 3.5 is .605 as compared
with the theoretical correlation p(2) = .8 1 . In Figure 3.6 we have plotted the
points (X1_ 2 - .9X1_ 1 , X1 - .9X1_ 1 ). It is apparent from the graph that the
sample correlation between these variables is very small as expected from the
fact that the theoretical partial autocorrelation at lag 2, i.e. a(2), is zero. One
could say that the correlation between X1_ 2 and X1 is entirely eliminated when
we remove the information in both variables explained by X1 _ 1 .
ExAMPLE
3 .4.2 (An MA(1 ) Process). For the moving average process,
1 (} 1 < 1 , { Z1 } "' WN(0, 0" 2 ),
XI = Zl + (JZI - 1 '
we have
a(1) = p(l) = (}j(l + (} 2 ).
A simple calculation yields PSil{ x2 ) X 3 = [ (J/( 1 + (} 2 ) ] X2 = PSil{x 2 JX1, whence
a(2) Corr(X 3 - (}(1 + (} 2 )- 1 X2 , X 1 - (}( 1 + (} 2 ) - 1 X2 )
- (]2/( 1 + (]2 + (} 4 ).
=
=
= -
More lengthy calculations (Problem 3.23) give
( (J)k ( l - (]2 )
a(k) - - (J2<k+ll
1
=
One hundred observations {X0 t = 1, . . . , 1 00} of the process with (} = - .8
and p(l) - .488 were displayed in Figure 3.1. The scatter diagram of the
points (X1_ 2 + .488X1_ 1 , X1 + .488X1_ d is plotted in Figure 3.7 and the
sample correlation of the two variables is found to be - .297, as compared
with the theoretical correlation a(2) - (.8) 2 /(1 + .8 2 + .8 4 ) = - .3 1 2.
ExAMPLE
=
=
=
3.4.3 (An AR(p) Process). For the causal AR process
{ Z1 } "' WN (0, 0" 2 ),
XI 1/J! Xl- 1 . . . - 1/Jp XI -p Zo
-
we have for k > p,
-
.
p
L I/Ji Xk+1-i •
j=l
since if Y E sp { X2 , . . . , Xk } then by causality Y E sp { Zi ,j :s; k} and
Psp{X2 , , x. ) Xk+ 1
I\ xk+! - jf=l ¢Jj xk+1 -j , Y
For k > p we conclude from (3.4. 1 ) that
a(k)
= (
=
Corr xk+J -
0.
)=
<Zk+ 1 , Y )
=
.f I/Ji Xk+J -i • X 1 - PSi>{ x2,
j =!
••
o.
. , x. ) X J
)
(3.4. 1 )
101
§3.4. The Partial Autocorrelation Function
3
0
0
0
2
0
0
0
0
0
-1
0
0
0
-2
-3
�
0
o on::t:J
00
0 0
0
0
0
0
0
oo
0
0
0
0
-3
0
ncr:P
0
0
0
oo
QJ
�
0
ow::
6
0
0
0
0
0
0 o c9 0
0
l:b_
u
0
0
cP
0
0
0
0
0
0
'2J
c\] o
0
0
0
0
o
o
0
0
0
0
3
-1
Figure 3.6. Scatter plot of the points (x,_ 2 - .9xt - l , x, - .9x,_ 1 ) for the data of
Figure 3.2.
3 ,-------.---,
2
0
0
-2
0 0
0
0
- 3 4-------.---4---�
3
-1
-3
Figure 3.7. Scatter plot of the points (x,_ 2
Figure 3. 1 , showing the line y = - .3 1 2x.
+
.488xr-� , x,
+
.488xr - � ) for the data of
3. Stationary ARMA Processes
1 02
For k � p the values of rx(k) can easily be computed from the equivalent
Definition 3.4.2 below, after first determining p( j ) = y( j )/y(O) as described in
Section 3.3.
In contrast with the partial autocorrelation function of an AR(p) process,
that of an MA(q) process does not vanish for large lags. It is however bounded
in absolute value by a geometrically decreasing function.
An Equivalent Definition of the Partial Autocorrelation
Function
=
=
Let {X, } be a zero-mean stationary process with autocovariance function y( · )
such that y(h) --+ 0 as h --+ oo , and suppose that tPki' j 1, . . . , k; k 1, 2, . . . ,
are the coefficients in the representation
k
. . . . x,1 X k + t = L tPki x k + t -j ·
j= 1
Then from the equations
P'P{X�o
[
j
we obtain
p(O)
p(1 )
p(1 )
p(O)
p(2)
p( l )
p(k � 1 ) p(k - 2) p(k - 3)
Definition 3.4.2.
TJ
=
[_ ]
k, . . . , 1 ,
.,
Ml
p(k - l
p(2)
p(k - 2) tP k z
...
... - ... '
p(k)
p(O)
tPkk
k
"?.
1.
(3.4.2)
The partial autocorrelation rx(k) of { X1} at lag k is
k
"?.
1,
where tPkk is uniquely determined by (3.4.2).
The equivalence of Definitions 3.4. 1 and 3.4.2 will be established in Chapter
5, Corollary 5.2. 1 . The sample partial autocorrelation function is defined
similarly.
The sample partial autocorrelation &(k) at lag k of
, x.} is defined, provided X; =I= xi for some i and j, by
Definition 3.4.3.
{ x1 ,
. • •
&(k) = (/Jkk >
1 � k < n,
where (/J kk is uniquely determined by (3.4.2) with each p(j) replaced by the
corresponding sample autocorrelation p(j).
1 03
§3.5. The Autocovariance Generating Function
§3.5 The Autocovariance Generating Function
If { X1 } is a stationary process with autocovariance function y ( ), then its
autocovariance generating function is defined by
·
G (z) =
y( )zk,
k=L-oo k
00
(3.5. 1)
provided the series converges for all z in some annulus r - 1 < l z l < r with r > 1.
Frequently the generating function is easy to calculate, in which case the
autocovariance at lag k may be determined by identifying the coefficient of
either zk or z - k . Clearly { X1} is white noise if and only if the autocovariance
generating function G(z) is constant for all z. If
and there exists r >
(3.5.2)
j= - oo
1 such that
1 1/!) z i < oo ,
j=Loo
00
(3.5.3)
< l z l < r,
the generating function G ( ) takes a very simple form. It is easy to see that
r
-1
·
y(k) = Cov(XI +k> XI ) = a 2
and hence that
G(z) = a 2
L 1/!i l/li + l k l •
j=:. oo
00
-
1/Jil/li + lk l z k
k=L- oo j=L- oo
00
00
Defining
1/!(z) =
L 1/!iz i,
j=-oo
ro
r
-1
<
lzl
< r,
we can write this result more neatly in the form
r
-1
<
lzl
< r.
(3. 5 .4)
EXAMPLE 3.5. 1 (The Autocovariance Generating Function of an ARMA(p, q)
Process). By Theorem 3. 1 .3 and (3. 1 .2 1), any ARMA process </J(B)X1 = 8(B)Z1
for which </J(z) # 0 when l z l = I can be written in the form (3.5.2) with
1/J (z) = 8(z)/</J (z),
r
-1
<
lzl
< r
3. Stationary ARMA Processes
104
for some r > 1 . Hence from (3.5.4)
(:J(z)(:J(z - 1 )
G (z) = (J 2 rp rp - 1 '
(z) (z )
r - 1 < l z l < r.
(3.5.5)
In particular for the MA(2) process
X, = Z, + 81 Z,_ 1 + 82 Z,_ 2 ,
we have
G (z) = (J 2 (1 + 81 z + 82 z 2 ) ( 1 + 81 z- 1 + 82 z - 2 )
= (J 2 [(1 + 8 i + 8�) + (8 1 + 8 1 8 2 )(z + z - 1 ) + 8 2 (z 2 + z - 2 )],
from which we immediately find that
y(O) = (J 2 ( 1 + ei + e� ),
y( ± 1 ) =
y( ± 2) =
(J2 8 1 ( 1 + 82 ),
(J 2 8 z
y(k)
0 for l k l > 2.
and
ExAMPLE 3.5.2.
=
Let {X,} be the non-invertible MA(1 ) process
{Z,} ,..., WN(O, (J2 ).
X, = Z, - 2Z, _ 1 ,
The process defined by
Z( := ( 1 - .5B) - 1 ( 1 - 2B)Z,
=
i
( 1 - .5B) - 1 X, = L (.S) x ,_ j ,
(1)
has autocovariance generating function,
j=O
( 1 - 2z)(l - 2z - 1 ) 2
(J
( 1 - .5z)(l - .5z- 1 )
4(1 - 2z)(1 - 2z - 1 ) 2
(J
=
( 1 - 2z)(1 - 2z - 1 )
= 4(J2 .
G(z) =
It follows that {Zi}
representation,
,...,
WN(O, 4(J 2 ) and hence that {X,} has the invertible
X, = Z( - .5Zi- 1 •
A corresponding result for ARMA processes is contained in the following
proposition.
§3.6.* Homogeneous Linear Difference Equations with Constant Coefficients
Proposition 3.5.1 . Let { X r }
¢(B)Xr
=
105
be the ARMA(p, q) process satisfying the equations
=
B(B)Zr ,
where ¢(z) ¥- 0 and B(z) ¥- 0 for all z E C such that l z l 1. Then there exist
polynomials, ;jy(z) and il(z), nonzero for l z l � 1, of degree p and q respectively,
and a white noise sequence {Zi} such that {Xr } satisfies the causal invertible
equations
PROOF. Define
-
¢(z)
-
;jy(B)X r
=
=
f1
¢(z)
il(B)Zi.
(1
r < ] '5, p (1
B(z) = B(z)
f1
-- aj z)
:- 1
-- a, z)
'
( 1 -- bA
:- 1 '
(1 -- b Z)
where a, + 1 , . . . , aP and bs + t : : . . , b q are t J:e zeroes of ¢(z) and B(z) which lie
inside the unit circle. Since ¢(z) ¥- 0 and B(z) ¥- 0 for all I z I � 1 , it suffices to
show that the process defined by
s < 1 ,;, q
1
;jy(B)
Z*r --Xr
il(B)
is white noise. Using the same calculation as in Example 3.5.2, we find that
the autocovariance generating function for { Zi} is given by
Since G(z) is constant, we conclude that {Zi} is white noise as asserted. D
§3.6* Homogeneous Linear Difference Equations
with Constant Coefficients
In this section we consider the solution { hr } of the k th order linear difference
equation
(3.6. 1)
t E T,
where tx 1 , . . . , txk are real constants with tx k ¥- 0 and T is a subinterval of the
integers which without loss of generality we can assume to be [k, oo ), ( -- oo , oo)
or [k, k + r], r > 0. Introducing the backward shift operator B defined by
1 06
3. Stationary ARMA Processes
equation (3. 1 .8), we can write (3.6. 1 ) in the more compact form
where a(B)
= +
1
a(B)h,
Definition 3.6.1 . A set of m
0,
(3.6.2)
t E T,
:::;;
k solutions, { W >, . . . , hlm>}, of(3.6.2) will be called
·.
=
linearly independent if from
it follOWS that C 1
=
a 1 B + · · · + rxk Bk .
= =.
Cz
Cm = 0.
We note that if { hi } and {hi } are any two solutions of (3.6.2) then
{c 1 h,1 + c 2 hi } is also a solution. Moreover for any specified values of
h0 , h 1 , . . . , hk - I , henceforth referred to as initial conditions, all the remaining
values h,, t ¢; [0, k - 1], are uniquely determined by one or other of the recur­
sion relations
t
=
k, k
+
(3.6.3)
1, . . . ,
and
t = - 1 , - 2, . . . . (3.6.4)
Thus if we can find k linearly independent solutions { hp >, . . . , hlk l } of (3.6.2)
then by linear independence there will be exactly one set of coefficients c 1 , ,
ck such that the solution
• • .
(3.6.5)
has prescribed initial values h0, h 1 , . . . , hk - I · Since these values uniquely
determine the entire sequence { h, } we conclude that (3.6.5) is the unique
solution of(3.6.2) satisfying the initial conditions. The remainder of this section
is therefore devoted to finding a set of k linearly independent solutions of (3.6.2).
=
h, (a0 + a 1 t + · · · + ait i )m' where a0,
, ai, m are
(possibly complex-valued) constants, then there are constants b0 , . . . , bj - !
such that
Theorem 3.6.1 . If
PROOF.
(1 - mB)h,
=
. . •
+
(a0 + a 1 t + · · · ak t i )m' - m(a0 + a (t - 1) + · · ·
1
i
+ ak (t - 1 ) )m' - I
= Lt
m'
o
a,(t' - (t - 1 )')
]
and 'f,! = o a,(t ' - (t - 1 )') is clearly a polynomial of degree j - 1 .
0
§3.6.* Homogeneous Linear Difference Equations with Constant Coefficients
1 07
The functions hli1 = t i C', j = 0, 1 , . . , k - 1 are k linearly
independent solutions of the difference equation
Corollary 3.6. 1.
.
(3.6.6)
PROOF. Repeated application of the operator ( 1 C 1 B) to hli1 in conjunction
with Theorem 3.6. 1 establishes that hlil satisfies (3.6.6). If
(c0 + c 1 t + · · · + ck _ 1 t k -1 ) C ' = 0 for t = 0, 1, . . . , k - 1,
then the polynomial L:J;;-6 ci t i, which is of degree less than k, has k zeroes. This
is only possible if c0 = c 1 = · · · = ck- t = 0.
0
-
Solution of the General Equation of Order k
For the general equation (3.6.2), the difference operator a (B) can be written as
j
1
a (B) = Il ( 1 (i B)'•
i�l
where ( i , i = 1, . . ,j are the distinct zeroes of a (z) and ri is the multiplicity of
( i . It follows from Corollary 3.6. 1 that t " (i ', n = 0, 1 , . . . , ri 1 ; i = 1 , . . , j,
are k solutions of the difference equation (3.6.2) since
1
a (B) t"(i' = Il ( 1 (; B)'s ( l - (i1 B)'' t "(i ' = 0.
s =F: i
It i s shown below i n Theorem 3.6.2 and Corollary 3.6.2 that these solutions are
indeed linearly independent and hence that the general solution of (3.6.2) is
-
.
-
.
-
j ri�l
n
(3.6.7)
h, = L L C in t ( i ' ·
n
i � l �o
In order for this general solution to be real, the coefficients corresponding to
a pair of complex conjugate roots must themselves be complex conjugates.
More specifically if ((i , �i ) is a pair of complex conjugate zeroes of a (z) and
(i = d exp(i8i ), then the corresponding terms in (3.6.7) are
which can be rewritten as
r·-1
I 2 [Re(cin ) cos (8; t) + Im (cin ) sin (8; t)] t n d ',
n ::::: Q
or equivalently as
-
ri - 1
n
L a n t d ' cos(8J + bin),
n �o i
with appropriately chosen constants ain and bin ·
-
1 08
3. Stationary ARMA Processes
ExAMPLE 3.6. 1 . Suppose h, satisfies the first order linear difference equation
(1 - � - 1 B)h, = 0. Then the general solution is given by h, = c� - r = h0 C ' .
Observe that if I � I > 1 , then h, decays at an exponential rate as t ---+ oo .
EXAMPLE 3.6.2. Consider the second order difference equation ( 1 + o: 1 B +
cx 2 B 2 )h, = 0. Since 1 + cx 1 B + cx 2 B 2 = ( 1 - G 1 B) ( 1 - G 1 B), the character of
the general solution will depend on � 1 and � 2 .
Case 1 � 1 and � 2 are real and distinct. In this case, h, = c 1 � 1' + c 2 �2.' where
c 1 and c 2 are determined by the two initial conditions c 1 + c 2 = h0
and c 1 �1 1 + c 2 G 1 = h 1 . These have a unique solution since � 1 of � 2 •
Case 2 � 1 = � 2 . Using (3.6.7) withj = 1 and r1 = 2 we have h, = (c0 + c 1 t)�1' ·
Case 3 � 1 = �2 = de i0, 0 < 8 < 2n. The solution can be written either as
c G ' + c�1' or as the sinusoid h, = ad - ' cos(8t + b).
Observe that if 1 � 1 1 > 1 and 1 � 2 1 > 1 , then in each of the three cases, h,
approaches zero at a geometric rate as t ---+ oo. In the third case, h, is a damped
sinusoid. More generally, if the roots of cx(z) lie outside the unit circle, then
the general solution is a sum of exponentially decaying functions and ex­
ponentially damped sinusoids.
We now return to the problem of establishing linear independence of the
solutions t " � i ', n = 0, 1 , . . . , r; - 1; i = 1 , . . . , j, of (3.6.2).
Theorem 3.6.2.
If
q p
i
I I cli t ml = 0 for t = 0, 1, 2, . . .
1=1 j =O
where m 1 , m 2 ,
j = 0, 1, . . . , p.
• . .
(3.6.8)
, mq are distinct numbers, then cli = 0 for l = 1 , 2, . . . , q;
PROOF. Without loss of generality we can assume that l m 1 1 ;;:::: 1 m 2 I ;;:::: · · · ;;::::
l mq l > 0. It will be sufficient to show that (3.6.8) implies that
c l i = 0,
j = 0, . . . , p
(3.6.9)
since if this is the case then equations (3.6.8) reduce to
t = 0, 1 , 2, . . . ,
which in turn imply that c 2 i = 0, j = 0, . . . , p. Repetition of this argument
shows then that cli = O, j = 0, . . . , p; l = 1, . . . , q.
To prove that (3.6.8) implies (3.6.9) we need to consider two separate cases.
Case 1 l m 1 1 > l m 2 1. Dividing each side of (3.6.8) by t P m� and letting t ---+ oo ,
we find that c 1 P = 0. Setting c 1 P = 0 in (3.6.8), dividing each side by
t p - 1 m� and Jetting t ---+ oo , we then obtain c 2 P = 0. Repeating the
§3.6. * Homogeneous Linear Difference Equations with Constant Coefficients
109
procedure with divisors t P - 2 mL t P - 3 mL . . . , m� (in that order) we find
that e l i = O, j = 0, 1, . . . , p as required.
Case 2 l m 1 1 = 1 m 2 I = · · · = l ms l > l ms+ 1 1 > 0, where s s q. In this case we can
write mi = re i8; where - rc < ()i s rc and 8 1 , . . . , ()s are all different.
Dividing each side of (3.6.8) by t P r' and letting t -> oo we find that
s
(3.6. 1 0)
L c1P e ;o,r -> 0 as t -> 00 .
1 �1
We shall now show that this is impossible u�less c 1 P = c 2P =
g, = Lf=1 c1P e ;o r and let A., n = 0, 1, 2, . . . , be the matrix
.
e i8 2 n
,
l,
· · ·
= csp = 0. Set
,. , J
e i8t (n + 1 )
e i82(n + 1 )
e i85(n +1 )
(3.6. 1 1)
:
:
e i81 (n...+ s - 1 ) e i82(n + s - 1 )
e iO.(n + s- 1 )
Observe that det A. = e ;<o, + ··· + O.J" (det A0). The matrix A0 is a Vandermonde
matrix (Birkhoff and Mac Lane ( 1 965)) and hence has a non-zero determinant.
Applying Cramer' s rule to the equation
An =
we have
det M
c1 P '
det A.
(3.6. 1 2)
where
M=
Since g. -> 0 as n -> oo, the numerator in (3.6. 1 2) approaches zero while the
denominator remains bounded away from zero because l det A. l = l det A 0 1 > 0.
Hence c 1 P must be zero. The same argument applies to the other coefficients
c 2P , . . . , csp showing that they are all necessarily zero as claimed.
We now divide (3.6.8) by t P - 1 r' and repeat the preceding argument, letting
t -> oo to deduce that
s
L c1 , p - 1 e ;o, , -> 0 as t -> oo ,
1 =1
and hence that c1. p _1 = 0, I = 1 , . . . , s. We then divide by t P - 2 r', . . . , r ' (in that
order), repeating the argument at each stage to deduce that
clj = O, j = O, 1, . . . , p and I = 1 , 2, . . . , s.
3. Stationary ARMA Processes
1 10
This shows that (3.6.8) implies (3.6.9) in this case, thereby completing the proof
of the theorem.
0
r
Corollary 3.6.2. The k solutions t " C , n 0, 1 , . . . , ri - 1 ; i = 1, . . . , j, of the
difference equation (3.6.2) are linearly independent.
PROOF. We must show that each c in is zero if "2J= 1 L �;,:-� c int " C r 0 for
t = 0, 1 , . . . , k - I . Setting hr equal to the double sum we have a(B)hr = 0
and h0 h 1 = · · · = h k _ 1 0. But by the recursions (3.6.3) and (3.6.4), this
necessarily implies that hr = 0 for all t. Direct application of Theorem 3.6.2
with p = max { r1 , , ri } completes the proof.
0
=
=
=
=
• . •
Problems
3 . 1 . Determine which of the following processes are causal and/or invertible:
(a) X, + .2X,_1 - .48X,_ 2 = Z,
(b) X, + 1 .9 X,_ 1 + .88X,_ 2 = Z, + .2Z, _ 1 + .7Z,_ 2 ,
(c) X, + .6X,_ 2 = Z, + 1 .2Z, _ 1 ,
(d) X, + 1 .8X,_ 1 + .8 1 X, _ 2 = Z,
(e) X, + 1 .6X,_1 = Z, - .4Z, _ 1 + .04Z,_ 2 .
3.2. Show that in order for an AR (2) process with autoregressive polynomial t/J(z) =
I - t/! 1 z - t/J2 z 2 to be causal, the parameters (t/!1 , t/!2 ) must lie in the triangular
region determined by the intersection of the three regions,
tPz + tPt < 1 ,
tPz - tP t < 1 ,
l t/Jz l < I .
3.3. Let { X, t = 0, ± 1 , . . . } be the stationary solution o f the non-causal A R ( l )
equations,
X, = t/JX, _ 1 + Z,,
lt/J I > ! .
Show that { X, } also satisfies the causal AR( 1 ) equations,
X, = rr X,_ 1 + 2,,
{ Z, }
�
WN(0, 0' 2 ),
for a suitably chosen white noise process {Z, } . Determine 0'2 .
3.4. Show that there is no stationary solution of the difference equations
X, = t/J X, _·1 + Z,,
if tP = ± I .
3.5. Let { Y,, t = 0, ± 1 , . . . } be a stationary time series. Show that there exists a
stationary solution { X, } of the difference equations,
x, - ¢ 1 x 1 - . . · - t/JpX,_p = Y, + 61 Y,_ 1 + . . · + oq Y, - q•
t
-
if t/J(z) = 1 - ¢ 1 z - . . · - t/JpzP =1- 0 for i z l
=
show that { X, } is a causal function of { Y, } .
I. Furthermore, if t/J(z) =1- 0 for i z l
:-::;;
1
111
Problems
3.6. Suppose that {X, } is the ARMA process defined by
1/i (B)X,
=
O(B)Z,,
{ Z, }
�
WN(O, a 2 ),
where 1/J( · ) and 0( " ) have no common zeroes and 1/J(z) =f. 0 for l z l = 1 . If �( · ) is
any polynomial such that �(z) =f. 0 for l z l 1, show that the difference equations,
=
�(B)I/I(B) Y,
=
�(B)O(B)Z,,
have the unique stationary solution, { Y, }
=
{ X, } .
3.7. Suppose {X, } i s a n invertible ARMA(p, q) process satisfying (3. 1 .4) with
=
Z,
00
L njXr -j ·
j=O
Show that the sequence { nj} is determined by the equations
nj +
min(q ,j)
L Ok nj k
=!
k
where we define <Po = - 1 and ok
-
=
=
j
- 1/ij ,
0 for k
>
=
0, 1, . . .
q and 1/ij
=
0 for j
>
p.
3.8. The process X, = Z, - Z,_ � > { Z, } WN(O, a 2 ), is not invertible according to
Definition 3 . 1 .4. Show however that Z, E sp { Xj, -oo < j :::; t} by considering the
mean square limit of the sequence L}= o (1 - j/n)X,_j as n -> oo .
�
3.9. Suppose {X, } i s the two-sided moving average
X,
00
=
.
L 1/Jj Zr-j•
where Lj l 1/Jj l < oo. Show that L ;;'= -oo I y(h)l < oo where y( · ) is the autocovariance
function of {X, } .
3.1 0 . Let { Y, } be a stationary zero-mean time series. Define
X, = ( 1 - .4B) Y,
and
w;
=
(1 - 2.58) Y,
=
=
Y, - .4 ¥,_ 1
Y, - 2.5 Y, _, .
(a) Express the autocovariance functions of {X, } and { W, } in terms of the
autocovariance function of { Y, } .
(b) Show that {X, } and { W, } have the same autocorrelation functions.
j
(c) Show that the process U, = L:� 1 (.4) Xr+j satisfies the difference equations
U, - 2.5U,_ 1 = X,.
-
3. 1 1 . Let {X, } be an ARMA process with 1/J(z) =f. 0, l z l = 1 , and autocovariance func­
tion y( · ). Show that there exist constants C > 0 and s E (O, 1 ) such that ly(h)l :::;
Cs lhl, h = 0, ± 1 , . . . and hence that L ;;'= - oo l y(h)l < oo .
3. 1 2. For those processes in Problem 3. 1 which are causal, compute and graph their
autocorrelation and partial autocorrelation functions using PEST.
3. 1 3. Find the coefficients 1/Jj , }
=
0, 1, 2, . . . , in the representation
00
X, = L 1/Jj Zt-j
j= O
1 12
3. Stationary ARMA Processes
of the ARMA(2, I ) process,
( I - .5B
+
.04B2 )X, = ( I + .25B)Z,,
3.14. Find the autocovariances y(j), j = 0, 1, 2, . . . , of the AR(3) process,
( 1 - .5B) ( 1 - .4B) ( I - . 1 B) X, = Z,,
{ Z, }
�
WN(O, 1 ).
Check your answers for j = 0, . . . , 4 with the aid of the program PEST.
3 . 1 5. Find the mean and autocovariance function of the ARMA(2, I) process,
X, = 2 + 1 .3X, _ 1 - .4X, _ 2 + Z, + Z, _ 1 ,
Is the process causal and invertible?
3 . 1 6. Let {X, } be the ARMA(I, 1 ) process,
X, - I/JX, _1
=
Z, + 8Z,_ 1 ,
where 1 1/J I < I and 1 8 1 < I . Determine the coefficients {1/!i } i n Theorem 3. 1 . 1
and show that the autocorrelation function of { X, } is given by p(l) =
1
( I + ¢8) (1/J + 8)/( 1 + 82 + 2¢8), p(h) = 1/J h - p ( 1 ) for h ;::::: I .
3 . 1 7. For a n MA(2) process find the largest possible values of l p(1)1 and l p(2) 1.
3. 1 8. Let {X,} be the moving average process
{ Z,}
�
IID(O, 1 ).
(a) If Z� := ( I - .5B) - 1 X,, show that
where .lf, _ 1 = sp{X., - oo < s < t}.
(b) Conclude from (a) that
Specify the values of 8 and a 2 .
(c) Find the linear filter which relates { Z,} to { zn , i.e. determine the coeffi­
IJ(jz, _ j·
cients {IJ(J in the representation z� = Ii�
(d) If EZ� = c , compute E((ZWZ!l. If c -=1- 0, are Z! and Z! independent? If
Z, N(O, 1 ), are Z! and Z! independent?
- ro
�
3 . 1 9. Suppose that {X,} and { Y;} are two zero-mean stationary processes with the
same autovariance function and that { Y;} is an ARMA(p, q) process. Show that
{X,} must also be an ARMA(p, q) process. (Hint: If ¢ 1 , . . . , </>P are the AR
coefficients for { Y;}, show that { W, := X, - </>,X, _ , - · · · - </>pX r - p } has an
autocovariance function which is zero for lags I hi > q. Then apply Proposition
3.2. 1 to { W,}.)
3.20. (a) Calculate the autocovariance function y( · ) of the stationary time series
(b) Use program PEST to compute the sample mean and sample autocovari­
ances y(h), O :0::: h :0::: 20, of {VV 1 X, } where {X,, t = 1, . . . , 72 } is the accidental
2
deaths series of Example 1 . 1 .6.
1 13
Problems
(c) By equating '9(1 ), y(l l) and '9(12) from part(b) to y ( l ), y(l l ) and y ( l 2) respec­
tively from part(a), find a model of the form defined in (a) to represent
{ VV 1 X, }.
2
3.2 1 . B y matching the autocovariances and sample autocovariances a t lags 0 and 1,
fit a model o f t h e form
X, - 11 = ¢(X,_1 - /1) + Z,,
to the strikes data of Example 1 . 1 .3. Use the fitted model to compute the best
linear predictor of the number of strikes in 1 98 1 . Estimate the mean squared
error of your predictor.
3.22. If X, = Z, - (}Z,_1 , { Z, }
WN(0, 0"2 ) and 1 (} 1 < 1 , show from the prediction
equations that the best linear predictor of Xn+l in sp { X� > . . . , X"} is
�
n
xn+l = I (jlj Xn + ! -j'
j� !
where ¢1 , . . . , ifln satisfy the difference equations,
- Oiflj -! + ( I + 02)¢j - (}(jlj + !
=
0,
2 s j s n - 1,
with boundary conditions,
and
3.23. Use Definition 3.4.2 and the results of Problem 3.22 to determine the partial
autocorrelation function of a moving average of order I .
3.24. Let { X, } be the stationary solution of ¢(B) X, = (}(B)Z,, where { Z,} WN(O, 0"2),
(jl(z) # 0 for all z E C such that l z l = I , and ¢( · ) and 0( · ) have no common zeroes.
If A is any zero-mean random variable in L 2 which is uncorrelated with { X, }
and if I z0 I = I , show that the process { X, + Az� } i s a complex-valued sta­
tionary process (see Definition 4. 1 . 1 ) and that {X, + Az� } and {X, } both satisfy
the equations ( I - z0B)¢(B)X, = ( I - z0 B)(}(B)Z,.
�
CHAPTER 4
The Spectral Representation of
a Stationary Process
The spectral representation of a stationary process { Xn t = 0, ± 1, . . . } essen­
tially decomposes { X1 } into a sum of sinusoidal components with uncorrelated
random coefficients. In conjunction with this decomposition there is a cor­
responding decomposition into sinusoids of the autocovariance function of
{ X1 }. The spectral decomposition is thus an analogue for stationary stochastic
processes of the more familiar Fourier representation of deterministic functions.
The analysis of stationary processes by means of their spectral representations
is often referred to as the "frequency domain" analysis of time series. It is
equivalent to "time domain" analysis, based on the autocovariance function,
but provides an alternative way of viewing the process which for some
applications may be more illuminating. For example in the design of a
structure subject to a randomly fluctuating load it is important to be aware
of the presence in the loading force of a large harmonic with a particular
frequency to ensure that the frequency in question is not a resonant frequency
of the structure. The spectral point of view is particularly advantageous in the
analysis of multivariate stationary processes (Chapter 1 1 ) and in the analysis
of very large data sets, for which numerical calculations can be performed
rapidly using the fast Fourier transform (Section 10.7).
§4. 1 Complex-Valued Stationary Time Series
It will often be convenient for us to make use of complex-valued stationary
processes. Although processes encountered in practice are nearly always
real-valued, it is mathematically simpler in spectral analysis to treat them as
special cases of complex-valued processes.
§4. 1 . Complex-Valued Stationary Time Series
1 15
Definition 4.1.1. The process {X1 } is a complex-valued stationary process
E I X1 I 2 < oo, EX1 is independent of t and E(Xt+h X1) is independent of t.
if
As already pointed out in Example 2.2.3, Remark 1 , the complex-valued
random variables X on
satisfying E I X I 2 < oo constitute a Hilbert
space with the inner product
(Q,ff,P)
<X, Y )
=
E(X Y) .
(4. 1 . 1)
Definition 4.1 .2. The autocovariance function y( · ) of a complex-valued
stationary process {XI } is
y(h)
= E(Xt+h X1) - EXt+h EX1 •
�
�
(4. 1 .2)
Notice that Definitions 4. 1 . 1 and 4. 1.2 reduce to the corresponding defini­
tions for real processes if { X1 } is restricted to be real-valued.
Properties of Complex-Valued Autocovariance Functions
The properties of real-valued autocovariance functions which were established
in Section 1.5, can be restated for complex-valued autocovariance functions
as follows:
y(O) � 0,
(4. 1 .3)
ly(h)l � y(O) for all integers h,
y( · ) is a Hermitian function (i.e. y(h)
= y( - h)).
(4. 1 .4)
(4. 1 .5)
We also have an analogue of Theorem 1 .5. 1 , namely
Theorem 4.1 . 1 . A function K( · ) defined on the integers is the autocovariance
function of a (possibly complex-valued) stationary time series if and only if
K( · ) is Hermitian and non-negative definite, i.e. ifand only if K(n) = K( - n) and
L a; K(i - j)iii � 0,
n
i, j = 1
(4. 1 .6)
for all positive integers n and all vectors a = (a!, . . . ' anY E e.
The proofs of these extensions (which reduce to the analogous results in
Section 1 .5. 1 in the real case) are left as exercises (see Problems 4. 1 and 4.30).
We shall see (Corollary 4.3.1) that "Hermitian" can be dropped from the
statement of Theorem 4. 1 . 1 since the Hermitian property follows from the
validity of (4. 1 .6) for all complex a.
1 16
4. The Spectral Representation of a Stationary Process
§4.2 The Spectral Distribution of a Linear
Combination of Sinusoids
In this section we illustrate the essential features of the spectral representation
of an arbitrary stationary process by considering the simple complex-valued
process,
n
X, = I A ( A.) e i' AJ
j= l
(4.2. 1)
in which - n < )� 1 < A. 2 < · · · < )," = n and A(A. d, . . . , A(A.n) are uncorrelated
complex-valued random coefficients (possibly zero) such that
j = 1, . . . , n,
E(A (A.)) = 0,
and
j = 1 , . . . , n.
E(A(A.)A(A.)) = aJ ,
For {X,} to be real-valued it is necessary that A(),") be real and that A.j = An - j
and A(A.) = A(A.n - ) for j = 1, . . . , n - 1 . (Note that A()�) and A (A_" _ ) are
uncorrelated in spite of the last relation.) In this case (see Problem 4.4),
n
X, = I (C(A_Jcos tA,j - D(A,j )sin tA.),
j= l
where A(A.) = C(A.) + iD (A.), j = 1 , . . . , n and D(A_") = 0.
(4.2.2)
It is easy to see that the process (4.2. 1 ), and in particular the real-valued
process (4.2.2), is stationary since
and
E(xr+ h X-r ) =
2
a e ih A',.
j =l j
n
"
L..
the latter being independent of t. Rewriting the last expression as a Riemann­
Stieltjes integral, we see that the process { X, } defined by (4.2. 1) is stationary
with autocovariance function,
y(h) = I
J(-1t,1t]
e ihv dF(v),
(4.2.3)
where F is the distribution function,
F(A.) = I a} .
(4.2.4)
j: AJ <:; A
Notice that the function F, which is known as the spectral distribution function
of { X, } , assigns all of its mass to the frequency interval ( - n, n]. The mass
assigned to each frequency in the interval is precisely the variance of the ran­
dom coefficient corresponding to that frequency in the representation (4.2. 1 ).
§4.3. Herglotz's Theorem
1 17
The equations (4.2. 1) and (4.2.3) are fundamental to the spectral analysis
of time series. Equation (4.2. 1) is the spectral representation of the process
{X, } itself and equation (4.2.3) is the corresponding spectral representation of
the covariance function. The spectral distribution function appearing in the
latter is related to the random coefficients in (4.2. 1 ) through the equation
F(A) L .<; ,; .< E I A (AJI 2 .
The remarkable feature of this example is that every zero-mean stationary
process has a representation which is a natural generalization of (4.2. 1 ),
namely
=
X, =
I
J ( - 1t , 1t]
e itv dZ (v).
(4.2.5)
The integral is a stochastic integral with respect to an orthogonal-increment
process, a precise definition of which will be given in Section 4.7. Corre­
spondingly the autocovariance function Yx ( · ) can be expressed as
Yx (h)
=I
e ihv dF( v),
I
cos vh dF(v).
J ( - 1t . 1t]
=
= =
(4.2.6)
where F is a distribution function with F( - n) 0 and F(n) y(O) E I X, I 2 •
The representation (4.2.6) is easier to establish than (4.2.5) since it does not
require the notion of stochastic integration. We shall therefore establish (4.2.6)
(Herglotz' s theorem) in Section 4.3, deferring the spectral representation of
{ X, } itself until after we have introduced the definition of the stochastic
integral in Section 4.7.
In the special case when { X, } is real there are alternative forms in which
we can write (4.2.5) and (4.2.6). In particular if Yx(h) is to be real it is necessary
that F( - ) be symmetric in the sense that F(A) = F(n - ) - F( - A - ), - n < A < n,
where F(A -) is the left limit of F at A (see Problem 4.25). Equation (4.2.6)
can then be expressed as
Yx ( h) =
J ( - 1t , 1t]
Equivalent forms of (4.2.5) when {X,} is real are given in Problem 4.25.
§4.3 Herglotz's Theorem
Theorem 4. 1 . 1 characterizes the complex-valued autocovariance functions on
the integers as those functions which are Hermitian and non-negative definite.
Herglotz's theorem, which we are about to prove, characterizes them as
the functions which can be written in the form (4.2.6) for some bounded
distribution function F with mass concentrated on ( - n, n].
(Herglotz). A complex-valued function y( · ) defined on the integers
is non-negative definite if and only if
Theorem 4.3.1
1 18
4. The Spectral Representation of a Stationary Process
y(h) = I
J ( - 1t , 1t]
e ih v dF(v) for all h
=
0, ± 1 , . . . ,
(4.3.1)
where F( " ) is a right-continuous, non-decreasing, bounded function on [ - n, n]
and F( - n) = 0. (The function F is called the spectral distribution function of
y and if F(2) J �, !(v) dv, - n ::;; 2 ::;; n, then f is called a spectral density of
=
y( . ).)
PROOF. If y( ·) has the representation (4.3. 1 ) then it is clear that y( · ) is Hermitian,
i.e. y( - h) = y(h). Moreover if a, E C, r = 1, . . . , n, then
r. � 1 a,y(r - s)iis = f " '· � 1 a, iis exp[iv(r - s)] dF(v)
=
fJJ1 a, exp[ivr] 1 2dF(v)
� 0,
so that y( · ) is also non-negative definite and therefore, by Theorem 4. 1 . 1, an
autocovariance function.
Conversely suppose y( · ) is a non-negative definite function on the integers.
Then, defining
fN(v)
.
1 � ".
;_, e - vy (r - s)e !Sv
2 N r, s =l
=n
--
=
.
1
" (N - lm l)e - •mvy(m),
;_,
2 n N lm i < N
-
we see from the non-negative definiteness of y( · ) that
fN(v) � 0 for all v E ( - n, n].
=
Let FN( · ) be the distribution function corresponding to the density
fN( " ) 1 < -" · "k ). Thus FN(2) = 0, 2 ::;; - n, FN(2) FN(n), 2 � n, and
FN(2)
Then for any integer h,
I
J ( - 1t , 1t]
J ( - 1t , 1t]
- 1!
::;; 2 ::;;
1!.
= _.!._ L ( - �) y(m) f" ei(h-m)v dv,
l h l ) y(h)
lhl
h
i
e v dFN(v) =
e ih v dFN(v)
i.e.
I
= J:/N(v) dv,
2n l mi< N
{(
1
1 -N
0'
N
_"
'
< N,
otherwise.
(4.3.2)
1 19
§4.3. Herglotz's Theorem
Since F N(n) = f( - , , ,1 dFN(v) = y(O) < oo for all N, we can apply Helly's theorem
(see e.g. Ash (1 972), p. 329) to deduce that there is a distribution function F
and a subsequence { FNJ of the sequence { FN} such that for any bounded
continuous function g with g(n) = g( - n),
J_ "·"l g(v) dFN"(v) J_"·"l g(v) dF(v)
�
as k � oo .
(4.3.3)
Replacing N by Nk in (4.3.2) and letting k � oo , we obtain
y(h)
=
I
J ( - 1t , 1t]
e ih v dF(v)
(4.3.4)
which is the required spectral representation of y( · ).
0
Corollary 4.3. 1 . A complex-valued function y( · ) defined on the integers is the
autocovariance function of a stationary process {X,, t 0, ± 1, . . . } if and only
if either
(i) y(h) f< -, , ,1 e ih v dF(v) for all h 0, ± 1 , . . . , where F is a right-continuous,
non-decreasing, bounded function on [ - n, n] with F( - n) 0, or
(ii) I ?.i=t ai y(i - j ) ai z 0 for all positive integers n and for all a =
(a 1 , , an )' E Cn.
The spectral distribution function F( · ) (and the corresponding spectral
density if there is one) will be referred to as the spectral distribution function
(and the spectral density) of both y( · ) and of {X,}.
=
=
=
=
. • •
PROOF. Herglotz's theorem asserts the equivalence of (i) and (ii). From (i)
it follows at once that y( · ) is Hermitian. Consequently the conditions of
0
Theorem 4. 1 . 1 are satisfied if and only if y( · ) satisfies either (i) or (ii).
It is important to note that the distribution function F( · ) (with F( - n) = 0)
is uniquely determined by y(n), n = 0, ± 1, . . . . For if F and G are two
distribution functions vanishing on ( - oo, - n], constant on [n, oo) and such
that
y(h)
=
I
J ( - 1t, 1t]
e ih v dF(v)
=
I
J ( - 1t , 1t]
e ihv dG(v),
h
=
0, ± 1, . . . ,
then it follows from Theorem 2. 1 1 . 1 that
I
J ( - 1t , 1t]
rjJ(v) dF(v) = I
=
J ( - 1t , 1t]
r/J (v) dG(v) if rjJ is continuous with r/J(n) = r/J( - n),
and hence that F(),) G(A.) for all ). E ( - oo, oo ).
The following theorem is useful for finding F from y in many important
cases (and in particular when y is the autocovariance function of an ARMA(p, q)
process).
4. The Spectral Representation of a Stationary Process
1 20
Theorem 4.3.2. If K ( · ) is any complex-valued function on the
00
[ K (n)[
n=I- oo
then
K(h) =
where
integers such that
(4.3.5)
< oo ,
f�" eihvf(v) dv, h = 0, 1,
12n L e-inAK(n).
f(A) = ±
...
00
n= - oo
(4.3.6)
(4.3.7)
PROOF.
1
f ""
= - L K(n) e i(h - n) v dv
2n n= - oo
= K(h),
since the only non-zero summand is the one for which n = h. The inter­
change of summation and integration is justified by Fubini' s theorem since
s�" ( l /2n) I::'= - oo [ e i(h - n) v K(n)[ dv < 00 by (4.3.5).
D
00
An absolutely summable complex-valued function y( · ) defined
on the integers is the autocovariance function of a stationary process if and only
if
Corollary 4.3.2.
1
00
f(A) := - I e - w A y(n) 2: 0 for all A E [ - n, n],
2n n= - oo
in which case f( · ) is the spectral density of y( · ) .
.
(4.3.8)
PROOF. First suppose that y( · ) is an autocovariance function. Since y ( · ) is
non-negative definite and absolutely summable,
0 � fN(A) =
1
�
L....
2nN r, s = l
�
= __!__
I
2n l m i < N
.
e - ". Ay( r - s)e!SA
'
'
(1 - �N )e - imAy(m)
� f(A)
as N � oo .
Consequently f(A) 2: 0, -n � A � n. Also from Theorem 4.3.2 we have
y(h) = f�" e ihvf(v) dv, h = 0, ± . . . . Hence f( - ) is the spectral density of y( - ).
On the other hand if we assume only that y ( · ) is absolutely summable,
1,
§4.3. Herglotz's Theorem
121
=
Theorem 4.3.2 allows us to write y(h) J�" e ih vf(v) dv. If j().) ::2:: 0 then this
integral is of the form (4.3. 1 ) with F(A) = J�" f(v) dv. This implies, by Corollary
4.3. 1, that y( · ) is an autocovariance function with spectral density f.
0
EXAMPLE
={
4.3. 1 . Let us prove that the real-valued function
K(h)
=
1 if h = 0,
p if h ± 1 ,
0 otherwise,
is an autocovariance function if and only if I P I � �.
Since K ( · ) is absolutely summable we can apply Corollary 4.3.2. Thus
j(A) = - L e- in J.K(n)
2n n= - oo
00
1
= [
1
1 + 2p cod
2n
J
is non-negative for all ). E [ - n, n] if and only if I P I � �. Consequently K ( - ) is
an autocovariance function if and only if I P I � � in which case K( ) has the
spectral density f computed above. In fact K( · ) is the autocovariance function
of an MA( 1 ) process (see Example 1 .5.1).
·
Notice that Corollary 4.3.2 provides us with a very powerful tool for
checking non-negative definiteness, which can be applied to any absolutely
summable function on the integers. It is much simpler and much more infor­
mati•fe than direct verification using the definition of non-negative definiteness
stated in Theorem 4. 1 . 1 .
Corollary 4.3.2 shows i n particular that every ARMA(p, q) process has
a spectral density (see Problem 3.1 1). This density is found explicitly in
Section 4.4. On the other hand the linear combination of sinusoids (4.2. 1 )
studied i n Section 4.2 has the purely discrete spectral distribution function
(4.2.4) and therefore there is no corresponding spectral density.
If { X,} is a real-valued stationary process then its autocovariance function
x(
Y · ) is real, implying (as pointed out in Section 4.2) that its spectral distribution
function is symmetric in the sense that
- n < A < n.
We can then write
Yx(h) =
1-rr,rrJ
cos(vh) dFx(v).
(4.3.9)
In particular if Yx( · ) has spectral density fx().), - n � ). � n, then fx(A)
fx( - ).), - n � A � n, and hence
Yx(h)
= J:
2
fx(v) cos(vh) dv.
=
(4.3.10)
122
4. The Spectral Representation of a Stationary Process
{X,}
The covariance structure of a real-valued stationary process
is thus
determined by F x(O - ) and F x(A), 0 ::S:: A ::S:: n (or by fx(A), 0 ::S:: A ::S:: n, if the
spectral density fx( · ) exists).
From the above discussion, it follows that a function f defined on
[ - n, n] is the spectral density of a real-valued stationary process if and
only if
Remark.
=
(i) f()�) f( - )�),
(ii) f(A) 2 0, and
(iii) s�n�u�) dA. < oo .
§4.4 Spectral Densities and ARMA Processes
Theorem 4.4.1 . If { t;} is any zero-mean, possibly complex-valued stationary
process with spectral distribution function Fr( · ), and {X,} is the process
then
co
X, = j=-oo
L 1/!i Yr-i
co
j= - co
where L 1 1/!i l <
(4.4. 1)
oo ,
{X,} is stationary with spectral distribution function
- n ::S::
PROOF. The argument of Proposition 3.1.2 shows that
mean zero and autocovariance function,
A
::S:: n.
(4.4.2)
{X,} is stationary with
co
E(Xt +h X,) = j. kI=-ao 1/Jiflk Yr (h - j + k), h = 0, ± 1 , . . . .
Using the spectral representation of { Yr ( · )} we can write
ei(h-j+k)v dFy (v)
Yx(h) = . t 1/Jif/ {
J, k - - ao
k
J(- 7<.7<]
= J_"· "l c=�co 1/Jie- iiv) (=�co lf/k eikv) eihv dFy(v)
{ eihv I . I I/Jj e - ijv l 2 dFy (v),
= J(-n.n]
J = - co
which immediately identifies Fx( · ) defined by (4.4.2) as the spectral distribu­
tion function of
D
{ t;}
{X,}.
If
has a spectral density fr( · ) and if
also has a spectral density fx( · ) given by
{X,} is defined by (4.4. 1 ), then {X,}
§4.4. Spectral Densities and ARMA Processes
1 23
(4.4.3)
where if;(e - iA ) = L)= - cc if;i e - iiA. The operator if;(B) = I � - oo if;i Bi applied to
{ r; } in (4.4. 1) is often called a time-invariant linear filter with weights { if;i }.
The function if;(e - i ·) is called the transfer function of the filter and the squared
modulus I if;(e -i·W is referred to as the power transfer function of the filter.
Time-invariant linear filters will be discussed in more detail in Section 4. 10.
As an application of Theorem 4.4. 1 we can now derive the spectral density
of an arbitrary ARMA(p, q) process.
(Spectral Density of an ARMA(p, q) Process). Let {X, } be an
ARMA(p, q) process (not necessarily causal or invertible) satisfying
(4.4.4)
{ Z, } WN(0, 1J"2),
¢J(B)X, = 8(B) Z,,
Where r/J(z) = 1 - 1J 1 - . . . - ¢Jpz P and 8(z) = 1 + e l + . . . + eq z q have no
common zeroes and r/J(z) has no zeroes on the unit circle. Then {X, } has spectral
density
Theorem 4.4.2
�
Z
Z
(4.4.5)
- n ::::; A ::::; n.
(Because the spectral density of an ARMA process is a ratio of trigonometric
polynomials it is often called a rational spectral density.)
PROOF. First recall from Section 3.1 that the stationary solution of (4.4.4)
can be written as X, = I � -oo if;i Z,_i where I � - oo 1 1/Ji l < oo . Since { Z,} has
spectral density IJ" 2j(2n) (Problem 4.6), Theorem 4.4. 1 implies that {X, } has a
spectral density. This also follows from Corollary 4.3.2 and Problem 3. 1 1 .
Setting U, = ¢J(B)X, = 8(B) Z, and applying Theorem 4.4. 1 , we obtain
Since r/J(e - iA )
(4.4.5).
(4.4.6)
=1
ExAMPLE 4.4. 1
0 for all A E [ - n, n] we can divide (4.4.6) by I rjJ(e - iA W to obtain
D
(Spectral Density of an MA(l) Process). If
x,
then
=
z, + ez,_ l ,
}.
- n ::::; ::::;
n.
The graph of fx(X), 0 ::::; A ::::; n, is displayed in Figure 4. 1 for each of the values
e = 9 and e = 9 Observe that for e = .9 the density is large for low
frequencies and small for high frequencies. This is not unexpected since when
e = .9 the process has a large lag one correlation which makes the series
smooth with only a small contribution from high frequency components. For
-
.
.
.
4. The Spectral Representation of a Stationary Process
1 24
24
22
20
18
1 6
14
12
10
8
6
4
2
0
0
0. 1
0.2
0.3
0.4
0.5
0.3
0.4
0.5
(a)
0
0. 1
0.2
(b)
Figure 4. 1 . The spectral densities fx(2rr.c), 0 <::; c -::; t of X, = Z, + OZt - 1 , { Z, }
WN(O, 6.25), (a) when f) = - .9 and (b) when fJ = .9.
�
§4.4. Spectral Densities and ARMA Processes
1 25
8 - .9 the lag one correlation is large and negative, the series fluctuates
rapidly about its mean value and, as expected, the spectral density is large for
high frequencies and small for low frequencies. (See also Figures 1 . 1 8 and 1 . 1 9.)
=
0
EXAMPLE
4.4.2 (The Spectral Density of an AR( 1 ) Process). If
then by Theorem 4.4.2, {X, } has spectral density
fx(X)
az
=
-11
2n
-
az
.
,Pe - ''r 2
=
- ( 1 - 2,P cod + ¢ 2 ) - 1 .
2n
=
This function is shown in Figure 4.2 for each of the values ,P = .7 and ,P - .7.
Interpretations of the graphs analogous to those in Example 4.4. 1 can again
be made.
Causality, Invertibility and the Spectral Density
=
Consider the AR MA(p, q) process {X, } satisfying ,P(B)X, = 8(B)Z,, where
,P(z)8(z) =/= 0 for all z E C such that l z l 1 . Factorizing the polynomials ,P( · )
and 8( · ) we can rewrite the defining equations in the form,
p
q
fl ( 1 - ai- 1 B)X, fl ( 1 - bi- 1 B)Z,,
j= 1
j= 1
where
=
and
l bj l > 1 , 1 s j s s, l hj l < l , s < j s q.
By Theorem 4.4.2, {X, } has spectral density
a z fl ]= 1 1 - bi- 1 e - i l l z
fX ( )c) = - " 1 - :-1 - l 2 •
2 n fl 1 = 1 1 1 a1 e i l
Now define
"
= "5:flj Sr ( I
O(B) =
s j ss
;}(B)
and
1
- at B) fl ( I - ZiiB)
r < j -::; p
bj- 1 B) fl ( 1 - bj B).
1
s <jS q
Then the ARMA(p, q) process { X, } defined by
fl (I
-
;}(B)X,
= O(B)Z,
(4.4.8)
1 26
4. The Spectral Representation of a Stationary Process
0
0. 1
0.2
0.3
0.4
0.5
0.3
0.4
0.5
(a)
0
0. 1
0.2
(b)
Figure 4.2. The spectral densities fx (2nc), 0 s c s 1, of X, - ¢>X,_ 1 = Z,, { Z, }
WN(O, 6.25), (a) when ¢> = .7 and (b) when ¢> = - .7.
�
§4.4. Spectral Densities and ARMA Processes
1 27
has spectral density
Since
I I - bi e - ; ;_1 = I I - bi e i !-1 = l bil l l - bi- l e -ii- I ,
we can rewrite fx (A.) as
fJ 2 Ti s<j ,; q l bil 1 8(e �:
Ti s<j ,; q l bjl fx(A.).
fx (A.) = 2n
=
fl r <j ,; p I ail l ifo (e ) I fl r <j $ p I ail
Thus the ARMA(p, q ) process { x,+ } defined by
:
:
?:
( ( n l a l ) ( TI l bj l )-2 ) ,
;j(B)X,+ = 8(B) Z" {Z,} - wN 0, fJ 2
2
r <J -:;, p
j
s<] -5:,_ q
is causal and invertible and has exactly the same spectral density (and hence
autocovariance function) as the ARMA process (4.4.7). In fact { X, } itself has
the causal invertible representation
�(B)X, = 8(B)Z,*
where { Zr*} is white noise with the same variance as { Z,}. This is easily checked
by using the latter equation as the definition of { Zi}. ( See Proposition 3.5. 1 .)
EXAMPLE 4.4.3. The ARMA process
{ Z, } - WN(O, fJ 2 ),
is neither causal nor invertible. Introducing �(z) = 1 - 0.5z and e (z) = 1 +
0. 25 z, we see that { X,} has the causal invertible representation
{ z: } - WN (0, .25 fJ 2 ).
x, - o.sx, _ , = z: + o.2sz:_ , ,
The case when the moving average polynomial 8(z) has zeroes on the unit
circle is dealt with in the following propositions.
Let { X,} be an ARMA(p, q) process satisfying
cp(B)X, = 8(B)Z, ,
{Z,} - WN(O, fJ 2 ),
where cp(z) and 8(z) have no common zeroes, cp(z) =f. 0 for I z I = 1 and 8(z) =f. 0
for l z l < 1 . Then Z, E sp { X5, - oo < s :::;; t } .
Proposition 4.4.1.
PROOF. Factorize 8(z) as et(z)e*(z), where
et (z) = TI
o - bj z),
8*(z) = f1
(1 -
1 �j�s
s < j s,q
_,
bi- 1 z),
1 28
4. The Spectral Representation of a Stationary Process
l bjl > 1 , 1 � j �
process,
s
and l bj l
and note, since </>(B)X,
=
=
1,
s
< j � q. Then consider the MA(q - s)
Yr = B*(B)Z,
f)f(B) Yr, that
sp{ Yk , - oo < k � t}
s::::
sp { X k , - oo < k � t}
for all t. Consequently it suffices to show that Z, E sp{ Yk ,
Proposition 3.2. 1,
{ Ur}
where
- oo
< k � t}. By
"' WN(O, (Jb),
U, =
Yr - psp{Y., - oo < k < t) Yr ·
Using the two moving average representations for { Yr}, we can write the
spectral density fy of { Yr } as
Jy(A)
=
(J2
__!!.
l o:(e -i..\W
2n
= - I B*(e -i..�w.
(J 2
2n
Since B*(z) has all of its zeroes on the unit circle, o:(z) and 8*(z) must have
the same zeroes. This in turn implies that
B(z)
= o:(z)
and
(J 2
=
(Jb .
It now follows that the two vectors, ( U, Yr , . . . , Yr - nY and (Z0 Yr , . . . , Yr - n)',
have the same covariance matrix and hence that
Taking mean square limits as n --> oo and using the fact that
U, E
sp{ Yk, - oo < k � t},
we find that
Hence, since
(J2
=
(Jb,
E(Z, - PS!i( Y., - oo < k s tl Zr) 2
This implies that
Z,
=
Psp{Y. , oo < L ,1 Z,,
_
=
EZ� - E U�
= 0.
or equivalently that
Z, E sp { Yk, - oo < k � t}
as was to be shown.
D
Remark 1. If we extend the definition (Section 3. 1 ) of invertibility of the
ARMA equations, <f>(B)X, = fJ(B)Z,, by requiring only that
Z, E
sp{Xb - oo < k � t},
1 29
§4.4. Spectral Densities and ARMA Processes
Proposition 4.4. 1 then states that invertibility is implied by the condition
8(z) =f. 0 for l z l < 1 . The converse is established below as Proposition 4.4.3.
Remark 2. If ¢(z) =f. 0 for all I z I s 1, then by projecting each side of the
equation, ¢(B)X, = 8(B)Zn in Proposition 4.4. 1 onto
sp{X., - oo < s s t - 1 },
we see at once that Z, is the innovation,
z, = x, - P, _ 1 X r o
where P, _ 1 denotes projection onto sp{ X., - oo < s s t - 1 }.
Proposition 4.4.2.
Let {X,} be an ARMA(p, q) process satisfying
=
where ¢(z) and 8(z) have no common zeroes and ¢(z) =f. 0 for all z E IC such
that l z l 1 . Then, if {bi, 1 s j s q} are the zeroes of 8(z), with l bil ;:::: 1 ,
1 s j s m , and l bi l < 1 , m < j s q, there exists a white noise sequence { U,}
such that {X,} is the unique stationary solution of the equations,
where (iJ(z) is the polynomial defined in (4.4.8) and e(z) is the polynomial,
e(z)
n ( 1 - bj- 1 z) n ( 1 - bj z).
=
1 ::;;, j ::;;, m
The variance of U, is given by
m < j ::;;, q
PROOF. By Problem 4.29, we know that there exists a white noise sequence
{ 2,} such that
The required white noise sequence { U,} is therefore the unique stationary
solution of the equations,
m < j -:;; q
m < j::;;, q
D
If {X,} is defined as in Proposition 4.4.2 and the polynom­
ial 8(z) has one or more zeros in the interior of the unit circle, then
Z, ¢ sp{Xs, - oo < s S t}.
Proposition 4.4.3.
4. The Spectral Representation of a Stationary Process
1 30
PROOF. By Proposition 4.4.2, we can express X, in the form,
00
x, = I �j u, _ j ,
where V, = X, - P, _ 1X,, L� o �izi =O(z)f¢ (z) for l z l
sp{ Xk,
- oo
< k s; t} =sp{ Ub
- oo
s;
<k
1 , and
s;
t}.
Suppose that the zeroes of O(z) in the interior of the unit circle are
{b/ m < j s; q} where m < q, and let
O(z) =O*(z) f1 ( 1
m
< j :So q
- bi- 1 z) = O*(z)O;(z).
From the equations ¢(B)X, =8(B)V, and </J(B)X, = O *(B)O;(B)Z, , it follows
that
¢(B)e*(B)Z, =
L �j v, _ j ,
00
j = - 00
j
where �j is the coefficient of z in the Laurent expansion of </J(z)O(z)/O;(z),
l z - 1 1 < £, which is valid for some £ > 0. Since </J(z)O(z) and O;(z) have no
zeroes in common and since O ;(z) has all of its zeroes in the interior of the
unit circle, it follows that �j # 0 for some j = -j0 < 0.
From Z, E sp{ X k, - oo < k s; t}, it would follow that
¢(B) e*(B)Z, E sp{ Uk , - oo < k s; t}.
But this is impossible since
( V, +jo' ¢(B)O*(B)Z, ) =� -jo Var( Ur + jo) # 0.
We conclude therefore that Z, f/; sp{Xb
- oo
<k
s;
t}, as required.
D
Rational Approximations for Spectral Densities
For any real-valued stationary process {X, } with continuous spectral density
f, it is possible to find both a causal AR(p) process and an invertible MA(q)
process whose spectral densities are arbitrarily close to f. This suggests that
{X,} can be approximated in some sense by either an AR(p) or an MA(q)
process. These results depend on Theorem 4.4.3 below. Recall that f is the
spectral density of a real-valued stationary process if and only iff is symmetric,
non-negative and integrable on [ - n, n].
If f is a symmetric continuous spectral density on [ - n, n],
then for every £ > 0 there exists a non-negative integer p and a polynomial
a(z) =f1J=1 (1 - 11t z) = 1 + a 1 z + · · · + aP z P with l '7j l > l , j 1 , . . . , p, and
Theorem 4.4.3.
=
§4.4. Spectral Densities and ARMA Processes
131
real-valued coefficients a0, . . . , aP , such that
I A ) a(e - i." W - f(.A) i < £ for all .A E [ - n, n]
where A = (1 + ai + · · · + a�r 1 (2n) - 1 J�,,J(v) dv.
(4.4.9)
PROOF. If f(.A) = 0 the result is clearly true with p = 0. Assume therefore that
M = SUP-n -s -< -s n f(),) > 0. Now for any £ > 0 let
and define
fJ(),) = max { f(.A), ()}.
Clearly fJ(),) i s also a symmetric continuous spectral density with fJ(.A) � ()
and
Now by Theorem 2. 1 1 . 1 there exists an integer r such that
l r- 1 j=riO ikLi <Si bk e-;u - JJ(A) I < ()
(4.4. 10)
for all A E [ - n, n],
(4.4. 1 1)
where bk = (2n) - 1 J�"fJ(v)e ivk dv. Interchanging the order of summation and
using the fact that fJ is a symmetric function, we have
r - 1 L L bk e -;u = L (1 - l k l/r)bk e - ;u_
j = 0 lki <Sj
lkl <r
This function is strictly positive for all A by (4.4. 1 0) and the definition of fJ(.A).
r-1
Let
C(z) = I ( 1 - lkl/r)bk zk ,
lkl< r
and observe that if C(m) = 0 then by symmetry C(m -1 ) = 0. Hence, letting
p = max { k : bk # 0}, we can write
p
z PC(z) = K 1 f1 ( 1 - '1i- 1 z)( l - IJi z)
j= 1
for some K 1 , 1J 1 , . . . , Y/ p such that 1 '1il > 1 , j = 1 , . . . , p . This equation can be
rewritten in the form
(4.4. 1 2)
where a(z) is the polynomial 1 + a 1 z + · · · + aP z P = f1f=1 (1 - '1i- 1 z), and
K 2 = ( - l)P '1 1 '1P K 1 . Equating the coefficients of z 0 on each side of (4.4. 1 2)
. • •
we find that
K 2 = b0(1 + a i + · · · + a�) - 1 = (2n) - 1 ( 1 + ai +
·:·
+ a�)- 1
fJJ(v) dv.
132
4. The Spectral Representation of a Stationary Process
Moreover from (4.4. 1 1 ) we have
] K 2 ] a(e - uW - fb()-) ] < b for all )_.
From (4. 4. 1 3) and (4.4. 1 0) we obtain the uniform bound
( I + af +
···
+ a� ) - 1 ] a (e iA W � (fb(A) + b)2n:
-
(4.4. 1 3)
(f/b(v) dvr 1
Now with A defined as in the statement of the theorem
I K z l a(e - i AW - A ] a(e - i A W I
(f" (fb(v) - f(v)) dv) 4n:M (f" f(v) dvr 1
1
4n:Mb (f/(v) dvy .
� (2n:) - �
�
(4.4. 1 4)
From the inequalities (4.4. 1 0), (4.4. 1 3) and (4.4. 1 4) we obtain
]A ] a(e - iA W - f(A)] < b + b + 4nMb
< e,
(f/(v) dvr1
by the definition of 3.
D
If f is a symmetric continuous spectral density and e > 0, then
there exists an invertible MA(q) process
Corollary 4.4.1 .
such that
where CJ2 = ( I
+
l fx (A.) - f(A)] < e for all A E [ - n , n:] ,
af + · · · + a; ) - 1 J"-rrf(v) dv.
PROOF. Problem 4. 1 4.
D
If f is a symmetric continuous spectral density and e > 0 then
there exists a causal AR(p) process
Corollary 4.4.2.
such that
l fx (A) - f(A) ] < e for all A E [ - n, n ] .
1 33
§4.5. * Circulants and Their Eigenvalues
PROOF. Let f'(A.) = max { f(A), e/2}. Then j'(A) � c/2 and
0 :-:;; j'(),) - f(A) :-:;; e/2 for all A E [ - n, n].
(4.4. 1 5)
Let M = max,d'()o) and b = min { (2M)� 2 e, (2M)� 1 }. Applying Theorem 4.4.3
to the function 1 /f"(),), we have
(4.4. 1 6)
I K l a(e�ilW - 1 /f"(A) I < b for all A E [ - n, n],
where the polynomial a(z) = 1 + a 1 z + · · · + a P z P is non-zero for l z l :-:;; 1 and
K is a positive constant. Moreover by our definition of b, the inequality (4.4. 1 6)
yields the bound
K � 1 l (a(e � il) l � 2 :-:;; f " (A)/( 1 - bf" (A)) :-:;; M/( 1 - Mb) :-:;; 2M.
Thus
1
1
I K � I a(e�ilW 2 - j'(A) I = I K l a(e � i lW - 1 /f " (A) I [K � I a(e � ilW2f'(A)]
(4.4. 1 7)
< 2M 2 b :-:;; e/2.
Combining the inequalities (4.4.1 5) and (4.4. 1 7) we get
1
(4.4. 1 8)
I K� l a(e � ilW2 - f(A) I < e for all A E [ - n, n].
Now by Theorem 4.4. 1 the causal AR(p) process
has spectral density K� 1 l a(e�ilW, which by (4.4. 1 8) furnishes the required
approximation to f()o).
D
§4.5 * Circulants and Their Eigenvalues
It is often desirable to be able to diagonalize a covariance matrix in a simple
manner. By first diagonalizing a circulant matrix it is possible to obtain
a relatively easy and useful asymptotic diagonalization of the covariance
matrix of the first n observations from a stationary time series. We say that
the n x n matrix M = [m jJ?.j =J is a circulant matrix if there exists a function
i
m( · ) with period n such that m ij = m(j - i). That is
m(O)
m( l )
m (n - 1 )
m(n - 1 )
m(n - 2)
m(O)
M = m( n - 2) m(n - 1 )
m(n - 3)
(4.5. 1 )
m(1)
m(2)
m(O)
The eigenvalues and eigenvectors of M are easy to compute. Let
2nj
Wj = - ,
n
4. The Spectral Representation of a Stationary Process
1 34
and
for j
=
0, 1 , . . . , n - 1 .
The circulant matrix M has eigenvalues
n- 1
Aj L m(h)rj- h , j 0, 1 , . . . , n - 1 ,
h =O
with corresponding orthonormal left eigenvectors,
j 0, 1 , . . . , n - 1 .
Proposition 4.5.1 .
=
=
=
PROOF. Straightforward calculations give
viM = n - 1 12 [m(O) + m(n - 1)ri + . . . + m(l) r;- 1 , m(1) + m(O)ri + . . .
+ m(2)r; - 1 , . . . , m(n - 1) + m(n - 2)ri + · · · + m(O)r;- 1 ]
= Jcin-112 [1, ri, rf, . . . , rj -1 ]
= Aivi,
=
showing that vi is a left eigenvector of M with corresponding eigenvalue Jci,
j 0, 1, . . . , n - 1. Moreover, if vt is the conjugate transpose of vk , then
vj vt
=
n - 1 (1 + rj rk- 1 + . . . + r; - 1 rk-n + 1 )
n - 1 [1 - (rjrk )"] [1 - rjrk r 1 0 ifj =f. k,
1
ifj k.
={
l
=
=
0
In order to diagonalize the matrix we now introduce the matrix
v0
V = v. 1
�n - 1
observing from Proposition 4.5. 1 that VM
VM V - 1
=
J=
'
A,
A V and hence that
(4.5.2)
Diagonalization of a Real Symmetric Circulant Matrix
=
If the circulant matrix M defined by (4.5. 1) is also real and symmetric (i.e. if
m(n - j) E IR,j = 0, 1, . . . , n - 1 ), then we can rewrite the eigenvalues ).i
of Proposition 4.5.1 in the form
m(j)
§4.5.* Circulants and Their Eigenvalues
135
if n is odd,
if n is even,
(4.5.3)
where [n/2 ] is the integer part of n/2.
We first consider the case when n is odd. Since m( · ) is an even function, we
can express the n eigenvalues of M as
L m(h)
A0 =
Ai =
l hl s [n/2 ]
L m(h)exp( - iwi h),
l hl s !n/2 1
and
j
=
1,
2, . . . , [n/2],
j = 1 , 2, . . . , [n/2].
Corresponding to the repeated eigenvalue Ai = An -i (1 ::::;, j ::5, [n/2]) of M
there are two orthonormal left eigenvectors vi and vn -i = vi as specified in
Proposition 4.5. 1. From these we can easily find a pair of real orthonormal
eigenvectors corresponding to Ai , viz.
cj = (vj + vn -JIJ2 = J2;;;[ 1 , cos wj , cos 2wj , . . . , cos(n - 1)wJ
and
si
Setting
=
i(vn -j
-
vi )/J2
=
jVn [O, sin wi sin 2wi . . . , sin(n - l ) wi].
c0 =
,
,
jVn [ 1 , 1 , 1 , . . . , 1 ]
and defining the real orthogonal matrix P by
(4.5.4)
we have PM = A(s)p and hence
(4.5.5)
where A <s) = diag {A0, A 1 , A 1 , . . . , A[n!2 J , A[n!2 J } .
For the case when n is even, both Ao and An;2 have multiplicity 1. If we
replace c [n/2 ] by r 112 cn!2 in the definition of p and drop the last rows of the
matrices P and A < s), then we again have PMP' = A<s).
Proposition 4.5.2.
Let y( · ) be an absolutely summable real autocovariance
4. The Spectral Representation of a Stationary Process
136
function, let f( · ) be its spectral density
f(w) = (2n) -t L y( h)e-ihw,
a)
h=-oo
and let Dn be the n n matrix,
if n is odd,
diag {f(O), f(w d, f(w d , . . . , f(w[n;2 1 ) ,f(w[n;2 1 ) }
Dn =
diag { f(O), f(w1 ), f(w 1 ), , f(w(n - 2112 ), f(w(n - 2 );2 ), f(wn;2 ) } if n is even.
If P is the matrix defined by (4.5.4) and rn = [y(i - j )] ?, i =t , then the components
x�J 1 of the matrix
x
{
• . .
Prn P' - 2nD" ,
converge to zero uniformly as n --+ oo ( i.e. sup 1 s: i ,js: n l x�'? l -+ 0).
PROOF. Let Pi = [Pi t ' Pi 2 ' . . . ' Pin] denote the ith row of the matrix p and let r�s)
denote the real symmetric circulant matrix
r�s)
=
y(O) y(l) y(2)
y(l ) y(O) y(l )
y(2) y( l ) y(O)
y(2) y( l )
y(3) y(2)
y(4) y(3)
y(l) y(2) y( 3 )
y(l) y(O)
We know from (4.5.5 ) that pnsJP' = Ns1. Moreover since the elements of the
matrix A(sJ - 2nD" are bounded in absolute value by L lh/ > [n;2 1 i y( h) l which
converges to zero as n --+ oo, it suffices to show that
l pJ"'�slpj - P; ln P} I -+ 0 uniformly in i and j.
But
I Pi ( r�s) - rn ) P} I
= [(n
4n - 1 ( t m l
t
m
m
n
m l (y(m) - y( - m)) k l ( Pik Pi, n - + k - Pi,n- + kPjk )
l
- 1 )/2]. Since I Pij l � (2/n) 1 1 2 this expression is bounded by
where c
2
=lt
(4.5.6)
m
y(m) l + 2
t
m
) t�
m i y(n - m) l � 8
m
l y(m) l + 8
n
m
�� c � l y(m) l .
The first term converges to zero as n --+ oo by the dominated convergence
theorem since the summand is dominated by l y(m) l and L;;; = 1 ly(m) l < oo . The
second term goes to zero since it is bounded by L ;;;= [n;2 1 ly(m) l . Since both
terms are independent of i and j, the proof of (4.5.6) is complete.
D
Now let { X,} be a real-valued zero-mean stationary time series with
autocovariance function y( · ) which is absolutely summable. Consider the
transformed vector of random variables
137
§4.5.* Circulants and Their Eigenvalues
(4.5.7)
=
with ri, vi , j 0, . . . , n - 1 defined as in Proposition 4.5. 1 . The components of
Z are approximately uncorrelated for large n by Proposition 4.5.2. Moreover
the matrix V, being orthogonal, is easily inverted to give
n -1
..
"' . zh exp ( - l] Wh ) .
Xi = n - 1/2 L...
h�O
Thus we have represented X0 , X 1 , . . . , X._1 as a sum of sinusoids with random
coefficients which are asymptotically uncorrelated. This is one (albeit rough)
interpretation of the spectral representation of the process {X, } .
Another easily verified consequence of Proposition 4.5.2 is that with Z
defined as in (4.5.7),
ro
sup I E I Zk l 2 - L y(h)exp( - ihwk ) l --+ 0 as n --+ oo .
h� -ro
O :<:; k :<:; n - 1
Let us consider now an arbitrary frequency w E ( - n , n] and define, by analogy
with (4.5.7),
n -1
Zw, n n - 1 12 L Xh exp(ihw).
h�O
Then
n-1
E 1 Zw,n l 2 n -1 L y(k - l ) exp [ iw (k - l)]
k, l� O
1
n- L (n - l hl) y (h)exp( iwh)
lh l < n
ro
--+
L y (h) exp ( iwh) 2nf(w) as n --+ oo .
h�-ro
This shows that (2n)- 1 1Zw,n l 2 is an asymptotically unbiased estimator of f(w) .
In Chapter 1 0 we shall show how this estimator can be modified in order to
construct a consistent estimator of f(w) .
We conclude this section by deriving upper and lower bounds for the
eigenvalues of the covariance matrix of X. (X 1 , . . . , X.)' when {X, } is a
stationary process with spectral density f(),), - n � ), � n.
=
=
=
=
=
Proposition 4.5.3.
that
Let {X, } be a stationary process with spectral density f such
m :=
inff(A.) > 0 and M
A
:=
sup f(A.) <
A
oo ,
4 . The Spectral Representation o f a Stationary Process
138
and denote by ). 1 , , An (A 1 :::;; ). 2 :::;; · • · :::;; ).n) the eigenvalues of the covariance
matrix rn of (X 1 , , X.)'. Then
2nm :::;; A 1 :::;; An :::;; 2nM.
. . .
• . .
PROOF. Let X = (x I ' . . . ' xn) ' be a non-zero right eigenvector of r n
corresponding to the eigenvalue An . Then
rJ� xje-ijff(v) dv
:::;; f "
=
- rr
:::;;
showing that An
L L Xixke - i(j - k ) vM dv
j k
= 2nM '[. xJ,
j
2nM. A similar argument shows that A 1
;:::::
2nm.
D
§4.6* Orthogonal Increment Processes on [ - rc, rc]
In order to give a precise meaning to the spectral representation (4.2.5)
mentioned earlier, it is necessary to introduce the concept of stochastic inte­
gration of a non-random function with respect to an orthogonal-increment
process {Z(A) }.
An orthogonal-increment process on [- n, n] is a complex­
valued stochastic process {Z(A), - n :::;; A :::;; n} such that
Definition 4.6.1 .
<Z(A), Z(A) )
< oo,
<Z(J.), I ) = 0,
and
(4.6. 1 )
-n
:::;; ). :::;; n,
- n :::;; :::;; n,
},
where the inner product is defined by <X, Y)
(4.6.2)
=
E(X Y).
The process {Z(A), - n :::;; A :::;; n} will be called right-continuous if for all
). E [ - n, n),
II Z(A + o ) - Z(A) II 2 = E IZ(A + o ) - Z(A)I 2 ----> 0 as o 1 0.
It will be understood from now on that the term orthogonal-increment process
will mean right-continuous orthogonal-increment process unless specifically
indicated otherwise.
§4.6 * Orthogonal Increment Processes on [ - n, n]
139
{Z(A), - n :o:; A :o:; n} is an orthogonal-increment process,
then there is a unique distribution function F (i.e. a unique non-decreasing,
right-continuous function) such that
F(J") 0,
F(J") F(n),
and
(4.6.4)
- n :o:; A :o:; f.1 :o:; n.
F( p) - F(A) IIZ(p) - Z(J") 11 2 ,
Proposition 4.6.1 . If
=
=
=
PROOF. For F to satisfy the prescribed conditions it is clear, on setting A =
that
F( p)
=
IIZ ( p) - Z( - n) ll 2 ,
- n :o:; f.1 :o:; n.
-
n,
(4.6.5)
To check that the function so defined is non-decreasing, we use the orthog­
onality of Z(p) - Z(A) and Z(A) - Z( - n), - n :o:; A :o:; f.1 :o:; n, to write
F( p)
= IIZ(p) - Z(A) + Z()") - Z(- n) ll 2
II Z(p) - Z(A) II 2 + IIZ(A) - Z( - n) ll 2
2 F(A).
The same calculation gives, for - n :o:; f.1 :o:; f.1 + 15 :o:; n,
F( p + 15) - F( p) IIZ( p + 15) - Z( p) l l 2
=
=
---> 0
as 15 1 0,
by the assumed right-continuity of { Z(A) }.
D
Remark. The distribution function F of Proposition 4.6. 1 , defined on [ - n, n]
by (4.6.5) will be referred to as the distribution function associated with the
orthogonal-increment process {Z(A), - n :o:; )" :o:; n}. It is common practice in
time series analysis to use the shorthand notation,
E(dZ(A) dZ(p))
for the equations (4.6.3) and (4.6.4).
=
15;.,,11 dF(A),
ExAMPLE 4.6. 1 . Brownian motion {B()"), - n :o:; )" :o:; n} with EB()") = 0 and
Var(B(A)) = <T 2 (A + n)/2n, - n :o:; )" :o:; n, is an orthogonal-increment process
on [ - n, n]. The associated distribution function satisfies F(A) = 0, )" :o:; - n,
F().) = <T 2 , )" 2 n, and
F(A) = <T 2 (A + n)/2n,
- n :o:; A :o:; n.
EXAMPLE 4.6.2. If {N(A), - n :o:; A :o:; n} is a Poisson process on [ - n, n] with
constant intensity c then the process Z()") = N(A) - EN(A), - n :o:; )" :o:; n, is an
orthogonal-increment process with associated distribution function F()") = 0,
140
4. The Spectral Representation of a Stationary Process
F(A) ={Z(A)} A
F(A) c(A
A
s; - n,
2nc, ?: n and
=
+ n), - n s; s; n. If c is chosen to
has exactly the same associated distribution function as
be () 2j2n then
in Example 4.6. 1 .
)
"
{ B(A)}
§4. 7* Integration with Respect to an Orthogonal
Increment Process
We now show how to define the stochastic integral
{ Z(A), - A n}
I(f)
= 1 -,, ,/(v)dZ(v),
where
n s; s; is an orthogonal-increment process defined on the
probability space (Q, :Ji', P) and f is any function on [ which is square
We
integrable with respect to the distribution function associated with
proceed step by step, first defining I (f) for any f of the form
n
i =O
f()") = L .IJ( A; , A;+d (A),
as
I(f) =
-n
= A0 A1
<
F n, n]
Z(A).
< . . · < An+ l = n,
n
Z(A;+d - Z(A;)],
I.M
i =O
(4.7. 1 )
(4.7.2)
and then extending the mapping I to an isomorphism (see Section 2.9) of
2
2
= L (F) onto a subspace of L (Q, :Ji', P).
L2 ( [
Let � be the class of all functions having the form (4.7. 1 ) for some
n E 1 , 2,
Then the definition (4.7.2) is consistent on � since for any given
f E � there is a unique representation of f,
-n,n],&B,F)
{0, ... }.
m
f(A) = iI=O rJ(v,, v,+d(A), - n v0 < v1 < · · · < vm+l n,
i n which r; ri + l , 0 i < All other representations of f having the form
(4.7. 1 ) are obtained by reexpressing one or more of the indicator functions
=
=f.
s;
=
m.
I< v, , v ,+ d as a sum of indicator functions of adjoining intervals. However this
makes no difference to the value of I (f), and hence the definition (4.7.2) is the
same for all representations (4.7. 1) of f. It is clear that (4.7.2) defines I as a
linear mapping on �. Moreover the mapping preserves inner products since
if f E � and g E � then there exist representations
and
n
j(A) = iL=O JJ(A;, Ai+ Ip)
g(A) = iL=On gJ(A;, A;+d(A)
§4.7.* Integration with Respect to an Orthogonal Increment Process
141
)on+ 1 Hence the inner­
P)
( /(f), /(g) ) = (ta.{;[Z(Jci + l ) - Z(.lc J ], it gJZ()o i + i ) - Z(.lcJ ])
o
n
= L !Jli ( F(Jci+i) - F(.lcJ ),
i=O
by the orthogonality of the increments of { Z(A.)} and Proposition 4.6. 1 . But
the last expression can be written as
1 -"· "/(v)g(v) dF(v) = <J, g) Ll<Fl'
in terms of a single partition - n = )00 < A 1 < · · · <
product of /(f) and /(g) in U(Q., .'F, is
=
n.
the inner product in L 2 (F) of f and g. Hence the mapping I on � preserves
inner products.
Now let � denote the closure in L 2 (F) of the set �. If f E f0 then there exists
a sequence Un } of elements of � such that llfn - J IIP(F) ---> 0. We therefore
define /(f) as the mean square limit
/(f) = m.s.lim I(fn ),
(4.7.3)
after first checking (a) that the limit exists and (b) that the limit is the same for
ali sequences { fn } such that II f, - f I L2 (F) ---> 0. To check (a) we simply observe
that for fm , f, E � '
I I I( f, ) - I(fm l ll = I I I(f" - fm l ll
= 11 /n - fm 1 1 L 2 (F) '
so if I I f, - f I I P < Fl ---> 0, the sequence { /( f,) } is a Cauchy sequence and therefore
convergent in L 2 (0., .'F, P). To check (b), suppose that llfn - J I I P (F) ---> 0 and
ll gn - J II L2 (F) ---> 0 where fn , gn E �. Then the sequence /1 , g 1 , /2 , g2 , , must
be norm convergent and therefore the sequence /(JJ l, / (g J l, / ( /2 ), /(g 2 ), . ,
must converge in L 2 (0., .'51',
However this is not possible unless the sub­
sequences { !Un l } and { /(g")} have the same mean square limit. This com­
plet�s the proof that the definition (4.7.3) is both meaningful and consistent for
jE �.
The mapping I on � is linear and preserves inner products since if p n E f0
and I I f,< n - p n I I P F ---> 0, f,Ul E � ' i = 1 , 2, then by the linearity of I on �,
< l
/(a J ( l l + a 2 j< 2 l ) = lim /(a 1 f,( l l + a 2 f,< 2 l )
. • .
P).
and by the continuity of the inner product,
..
142
4. The Spectral Representation of a Stationary Process
2
= (j(ll, j( ) ) L2(F) ·
It remains now only to show that qj = U (F). To do this we first observe
that the continuous functions on [ - n, n] are dense in L 2 (F) since F is a
bounded distribution function (see e.g. Ash ( 1 972), p. 88). Moreover � is a
dense subset (in the L 2 (F) sense) of the set of continuous functions on [ - n, n].
Hence qj U(F).
Equations (4.7.2) and (4.7.3) thus define I as a linear, inner-product preserving
mapping of qj = L 2 (F) into U (n, ff , P). The image J(qj) of qj is clearly a
closed linear subspace of L 2 (0., ff, P), and the mapping I is an isormorphism
(see Section 2.9) of qj onto J(qj). The mapping I provides us with the required
definition of the stochastic integral.
=
Definition 4.7.1 (The Stochastic Integral). If { Z(A.)} is an orthogonal-increment
process on [ - n, n] with associated distribution function F and if f E U(F),
then the stochastic integral J( - "·"1 f(A.) dZ(A.) is defined as the random variable
I (f) constructed above, i.e.
l f(v) dZ(v) := / (f).
J(-1t,1t]
Properties of the Stochastic Integral
For any functions f and g in L 2 (F) we have established the properties
(4.7.4)
and
E (l (f)I( g)) = 1 - "·"l f(v) g (v) dF(v).
(4.7.5)
E(l(fn)l(gn)) � E(l(f)I(g)) = J " · "/(v)g(v) dF(v).
_
(4.7.6)
2
Moreover if Un } and { gn } are sequences in L (F) such that ll fn - fll u(FJ � 0
and llgn - g ii L2(FJ � 0, then by continuity of the inner product,
From (4.7.2) it is clear that
E (l (f))
=
0
(4.7.7)
for 2(all f E �; if f E qj then there is a sequence Un }, fn E �. such that
L Fl
f and /(!,) � l (f), so E (l (f)) = limn �oo E (I(fn)) and (4.7.7) remains
fn
143
§4.8. * The Spectral Representation
valid. This argument is frequently useful for establishing properties of stochastic
integrals.
Finally we note from (4.7.5) and (4.7.7) that if {Z(),) } is any orthogonal
increment process on [ - n, n] with associated distribution function then
X, = l(e i' · ) =
{ e icv
J (-1t,1t]
F,
dZ(v),
(4.7.8)
is a stationary process with mean zero and autocovariance function
E(Xc +h X,) =
{ e i vh
J(-1t,1t]
dF(v).
(4.7.9)
In the following section we establish a converse of this result, namely that if
{X, } is any stationary process, then {X, } has the representation (4.7.8) for an
appropriately chosen orthogonal increment process { Z(A.) } whose associated
distribution function is the same as the spectral distribution function of { X, }.
§4.8 * The Spectral Representation
Let {X, } be a zero mean stationary process with spectral distribution function
Ffirst
the spectral representation (4.2.5) of the process { X, } we
. Toneedestablish
to identify an appropriate orthogonal increment process { Z(A.),
) E [ - n, n] } . The identification of { Z(A.) } and the proof of the representation
will be achieved by defining a certain isomorphism between the subspaces­
£ = sp{X" t E E} of
and X = sp{e i '·, t E E} of
This iso­
morphism will provide a link between random variables in the "time domain"
and functions on [ - n, n] in the "frequency domain".
Let Yf = sp {X,, t E E} and ff = sp { e i ' · , t E E} denote the (not necessarily
closed) subspaces ff c
and ff c
consisting of finite linear
combinations of X,, t E E, and e i' · , t E E, respectively. We first show that the
mappmg
,
L2 (Q,li',P)
L2 (0., li', P)
L2 (F).
L2 (F)
(4.8. 1 )
defines an isomorphism between Yf and % . To check that T is well-defined,
suppose that II L}=1 aj X,1 - I::'=1 bk X,J = 0. Then by definition of the
norm and Herglotz's theorem,
L2 (F)
showing that (4.8. 1 ) defines T consistently on
Yf.
The linearity of T follows
4. The Spectral Representation of a Stationary Process
144
easily from this fact. In addition,
showing that T does in fact define an isormorphism between .Yf and X.
We show next that the mapping T can be extended uniquely to an iso­
morphism from :if onto %. If Y E :if then there is a sequence Y, E .Yf such
that I I Y, - Yll ...... 0. This implies that { Y, } is a Cauchy sequence and hence,
since T is norm-preserving, the sequence { T Y,} is Cauchy in L 2 (F). The
sequence { TY, } therefore converges in norm to an element of %. If T is to be
norm-preserving on :if we must define
TY
=
m.s.iim TY,.
This is a consistent definition of T on :if since if II Y, - Yll ...... 0 then the
sequence TY1 , TY1 , TY2 , TY2 ,
is convergent, implying that the sub­
sequences { TY, } and { TY,} have the same limit, namely TY. Moreover using
the same argument as given in Section 4.7 it is easy to show that the mapping
T extended to :if is linear and preserves inner products.
Finally, by Theorem 2. 1 1.1, X is uniformly dense in the space of continuous
functions ¢ on [ - n, n] with ¢(n) = ¢( - n), which in turn is dense in L 2 (F)
(see Ash ( 1972), p. 88). Hence .i? L 2 (F). We have therefore established the
following theorem.
• . •
=
Theorem 4.8. 1 . IfF is the spectral distribution function of the stationary process
{ X" t E .Z}, then there is a unique isomorphism T of sp { X1, t E .Z} onto L 2 (F) such
that
Theorem 4.8. 1 is particularly useful in the theory of linear prediction (see
Section 5.6). It is also the key to the identification of the orthogonal increment
process {Z(A), - n :::::; A :::::; n} appearing in the spectral representation (4.2.5).
We introduce the process { Z(A)} in the following proposition.
Proposition 4.8.1 . If T is
{Z(A), - n :::::; A :::::; n} defined
Z(A)
=
defined as in Theorem 4.8. 1 then the process
by
- n :::::; A :::::; n,
T - 1 U(-1t,;.k )),
§4.8.* The Spectral Representation
145
is an orthogonal increment process (see Definition 4.6. 1 ). Moreover the distri­
bution function associated with { Z(A)} (see Proposition 4.6. 1 ) is exactly the
spectral distribution function F of { Xr } ·
PROOF. For each A E [ - n, n], Z(A) is a well-defined element of sp { X, t E Z} by
Theorem 4.8. 1 . Hence <Z(A), Z(A) ) < oo. Since Z(A) E sp { X, t E Z} there is a
sequence { Y, } of elements of sp { Xr , t E Z} such that II Y, - Z(A) II -+ 0 as n -+ oo.
By the continuity of the inner product we have
<Z(A), 1 ) = lim < Y,, 1 ) = 0
since each X, and hence each Y,, has zero mean. Finally if - n
A3 :s; A4 :s; n,
:s;
A1
:s;
A2
:s;
= <Io., , ;. . J ( · ), /(;. , , ;. 2 J ( · ) ) L2 (F)
=
l
J (-n, n]
= 0,
/(;., , ;..1(v)/<;. , , ;. 2 1(v) dF(v)
completing the proof that { Z(A)} has orthogonal increments. A calculation
which is almost identical to the previous one gives
<Z(p) - Z(A), Z(p) - Z(A)) = F(p) - F(A),
showing that {Z(A)} is right-continuous with associated distribution function
D
F as claimed.
It is now a simple matter to establish the spectral representation (4.2.5).
Theorem 4.8.2 (The Spectral Representation Theorem). If { Xr } is a stationary
sequence with mean zero and spectral distribution function F, then there exists
a right-continuous orthogonal-increment process { Z(A), - n :s; A :s; n } such that
and
(i) E I Z(A) - Z( -
n) l 2
= F(A),
- n :s; A :s; n,
(ii) Xr = J( - n , nJ e irv dZ(v) with probability one.
PROOF. Let { Z(A)} be the process defined in Proposition 4.8. 1 and let I be
the isomorphism,
/(f) =
J
/
(v) dZ(v),
_" "
·
146
2
f E = L (F)
4. The Spectral Representation of a Stationary Process
U(Q, .?, P),
I(f) = L .t;(Z(A; + l) - Z(),;))
=
This relationship remains valid for all f E = L 2 (F) since both and
we must have I =
(i.e.
= f for all
2(F)) and henceTherefore
from Theorem 4.8. 1 .
fareE Lisomorphisms.
X, = I(eit ·) = JI( - 1t, 1t]eitv dZ(v),
giving the required representation for {X,}. The first assertion of the theorem
is an immediate consequence of Proposition 4.8. 1 .
forthogonal
{X,} is a zero-mean
stationary
sequence
then
there
existhat
ts a
right
continuous
increment
process
{
Z(.A_
)
,
-n
:::;
;
A
:::;
;
n}
such
Z( -n) = 0 and
X, = JI(-1t,1t] e itv dZ(v) with probability one.
{ Y(.A_) } and {Z(.A_) } are two such processes then
P(Y(A) = Z(.A_)) = 1 for each AE[ - n,n].
PROOF. If we denote by { Z*(.A_) } the orthogonal-increment process defined by
Proposition 4.8. 1 , then the process
Z(.A_) = Z*(.A_) - Z*(-n), -n :::;; :::;;
not only satisfies Z(-n) = 0, but also has exactly the same increments as
{Z*(),) }. Hence
I e itv dZ*(v) = J(-1t,1t]
I e itv dZ(v).
X, = J(-1t,1t]
Suppose now that { Y(.A_) } is another orthogonal-increment process such
that Y( -n) = 0 and
e it v dY(v) = I
e i<v dZ(v) with probability one. (4.8.2)
X, = I
J(-1t,1t]
J(-1t,1t]
If we define for f E U(F),
Iy(f} = J_"· "/(v)dY(v)
from E0
onto I(E0) <:;
which was discussed in Section 4.7.
If 'lJ has the representation (4.7. 1 ) then
n
i=O
T-t (f).
E0
T- 1
I
T -1
TJ(f)
D
Corollary 4.8.1 . I
If
w
n,
147
§4.8 * The Spectral Representation
and
then we have from (4.8.2)
Iy(ei' · ) = lz(ei' ·) for all t E E..
(4.8.3)
Since Ir and lz are equal on sp {e ;'., t E Z} which is dense in L 2 (F) (see the
comment preceding Theorem 4.8. 1 ), it follows that ly(f) = lz(f) for all
f E L 2 (F). Choosing f(v) = /( - n , AJ ( v) we obtain (with probability one)
Y()�) =
1-,, ,/(v) dZ(v) = Z(A),
- n ::;; A ::;; n.
D
Remark 1. In the course of the proof of Theorem 4.8.2, the following result
was established: Y E sp { X,, t E E.} if and only if there exists a function f E L 2 (F)
such that Y = /(f) = J( - ,,,1 J( v) dZ(v). This means that I is an isomorphism
of L 2 (F) onto sp{ X,, t E E.} (with the property that l (e i' · ) = X,).
The argument supplied for Theorem 4.8.2 is an existence proof
which does not reveal in an explicit manner how { Z(A)} is constructed. In the
next section, we give a formula for obtaining Z(A) from {X, } .
Remark 2.
Remark 3. The corollary states that the orthogonal-increment process in the
spectral representation is unique if one uses the normalization Z(- n) = 0.
Two different stationary processes may have the same spectral distribution
function, for example the processes X, = J(-,,,1 e it A dB(A) and Y,
J<-,,,1 e i' A dN(A) with { B()�)} and { N(A)} defined as in Examples 4.6. 1 and 4.6.2.
In such cases the processes must of course have the same autocovariance
function.
=
ExAMPLE 4.8. 1 . Let Z(A) = B(A) be Brownian motion on [ - n, n] as defined
in Example 4.6. 1 with EZ(A) = 0 and Var(Z(A)) = 0" 2 (A + n)/2n, - n ::;; A ::;; n.
For t E E., set g,(v) =
cos(tv)/( ,, o 1 ( v) +
sin(t v)/( 0 ,,1 (v) and
X, =
J2
{ g,(v) dB(v) =
J( -n,n]
-
{
J2 ( J(-n,O]
J2
cos(tv) dB(v) +
{ sin(tv) dB(v)
J(O,n]
(cf. Problem 4.25). Then EX, = 0 by (4.7.7), and by (4.7.5),
0' 2
0' 2
E (X, +h Xh ) =
{
,
2
dv =
2rc
2n J(-n,nJ g +h(v)g,(v)
f"
0
cos(hv) dv.
Hence E (X, +h X,) = 0' 2 1\. o and consequently { X, } � WN(0, 0' 2 ).
)
,
(4.8.4)
(4.8.5)
148
4. The Spectral Representation of a Stationary Process
Since however B(A.) is Gaussian we can go further and show that the
= 0, ± 1 , . . . , are independent with
N(O,
To
random variables
sk be any k distinct integers and for each fixed j let
prove this let s
be a sequence of elements of EfJ, i.e. functions of the form (4.7. 1 ), such that
· ) in
Since the mapping /8 is an isomorphism of � = U(F)
onto /8(�), we conclude from (4.8.4) that
(4.8.6)
+ ... +
+ ... +
X,, t
1,
jj<nJ --+ 9si( L2 (F).
81/B(Jtl)
X,
. • . ,
�
CJ2{ ).Ji<nl }
8k /B(h(n)) � 81 Xs, 8k Xs. ·
The left -hand side, l8(L Y�1 8i jj<n l ), is clearly normally distributed with mean
zero and variance 1 I J� 1 8ifi<nJ I 2 . The characteristic function of 18(L Y�1 8i jj<n l )
is therefore (Section
1
1 .6)
L2 (F), as n --+ oo ,
k 8.J:<n) 1 2 Ik 8 e is 1 2 = (J2 Ik 82
I
j� 1 •
I j�1
I j� 1
From
we conclude therefore that I 1� 1 8j Xsj has the Gaussian char­
acteristic function,
f/J(u) = nlim-tct:J rftn(u) = exp [ - tu2 2 ± 8f].
Since this is true for all choices of 8 1 , . , 8b we deduce that Xs,, ... , Xs. are
it then follows that the random
jointly normal. From the covariances
variables X,, t = 0, ± 1 , . . , are iid N(O, CJ 2 ).
If A is a Borel subset of [-n, n], it will be convenient in the
following proposition (and elsewhere) to define
L f(v) dZ(v) = 1-"· "/(v)/A(v) dZ(v),
By the continuity of the inner product in
J J
L2(F)
--+
r
.
J
J
U(F)
(4.8.6)
0"
J=l
.
(4.8.5)
.
.
Remark 4.
(4.8.7)
where the right-hand side has already been defined in Section 4.7.
upposeofthediscontinuity
spectral distribution
function
F)"0of<then. stationary
Spoint
process
{X,}
has
a
at
A.0
where
-n
<
Then with
probability one,
X, = JI(-"·"l\ {.<o} eirv dZ(v) + (Z(A.o ) - Z(A.() ))eir.<o,
where the two terms on the right side are uncorrelated and
Proposition 4.8.2.
PROOF. The left limit Z(A.0) is defined to be
§4.8.* The Spectral Representation
Z(A0 )
=
1 49
m.s.lim Z(An ),
(4.8.8)
where An is any sequence such that An i A0 . To check that (4.8.8) makes sense
we note first that {Z(An ) } is a Cauchy sequence since IIZ(An ) - Z(Am ) ll 2 =
I F(An ) - F(Am ) l --+ 0 as m, n --+ oo . Hence the limit in (4.8.8) exists. Moreover
if vn i A 0 as n --+ oo then II Z(An ) - Z(vn ) ll 2 = I F(An ) - F(vn ) l --+ 0 as n --+ oo ,
and hence the limit in (4.8.8) is the same for all non-decreasing sequences with
limit A 0 .
For b > 0 define A± c) = Ao ± b. Now by the spectral representation, if
0 < b < n - I A0 I ,
X,
j
e irv dZ(v)
= J(-1t,1t]\(.L,,A,]
+
j
e irv dZ(v).
J(L,,A,]
(4.8.9)
Note that the two terms are uncorrelated since the regions of integration
are disjoint. Now as b --+ 0 the first term converges in mean square to
f<-"·"l \{ Ao ) e irv dZ(v) since
e ;' .J<-"·"l\(L,,A,1 --+ e i' · I<-"·"l\ { Ao) in L 2 (F).
To see how the last term of (4.8.9) behaves as c5 --+ 0 we use the inequality
I J(L,,
r A,] e itv dZ(v) - e itAo(Z(Ao) - Z(A(} )) I
r eitv dZ(v) - eitAo(Z(Ad) - Z(A_d)) I
I J(L,,A,]
:$;
(4.8. 1 0)
As c5 --+ 0 the second term on the right of (4.8. 1 0) goes to zero by the right
continuity of {Z(A)} and the definition of Z(A0). The first term on the right
side of (4.8. 1 0) can be written as
0 as c5 --+ 0,
by the continuity of the function e ir ·. Hence we deduce from (4.8. 1 0) that
--+
j
e ir v dZ(v) � e i< Ao(Z(A0) - Z(A0 )) as c5 --+ 0.
J(L,, A,]
The continuity of the inner product and the orthogonality of the two
integrals in (4.8.9) guarantee that their mean-square limits are also orthogonal.
4. The Spectral Representation of a Stationary Process
! 50
Moreover
Var(Z(A.0 ) - Z(A.0 )) = lim Var(Z(A.0) - Z(A.")) =
Ant Ao
F A. ) F(},() ).
( 0 -
0
If the spectral distribution function has k points of discontinuity at ..1. 1 , . . . ,
A.k then { X, } has the representation
X, =
I
J(-1t,1t]\{ A ,, ... , .<,}
e irv dZ(v) +
�j- (Z(A.j) - Z(A.n)e;,;.i,
(4.8. 1 1 )
where the (k + 1 ) terms on the right are uncorrelated (this should be compared
with the example in Section 4.2).
The importance of (4.8. 1 1) in time series analysis is immense. The process
Y, = (Z(A.0 ) - Z(A.0 ))eir -<o is said to be deterministic since Y, is determined for
all t if Y,0 is known for some t0 . The existence of a discontinuity in the spectral
distribution function at a given frequency ),0 therefore indicates the presence
in the time series of a deterministic sinusoidal component with frequency ..1. 0.
§4.9* Inversion Formulae
Using the isomorphism T of Theorem 4.8. 1 and a Fourier approximation
to /( v , wJ C ) E L
it is possible to express directly in terms of { X, } the
orthogonal-increment process { Z(),)} appearing in the spectral representation
(4.2.5).
Recall that for - n < v < w < n
2 (F),
and
Consequently if
T(Z(w) - Z(v)) = /( v, w] ( · )
,
TX, = e; .
ij·
"
L., IY.J. e
UI :S n
for all t E Z.
V(Fl I( v , w] ( ),
·
(4.9. 1 )
then by the isomorphism,
I r:xjXj � Z(w) - Z(v).
(4.9.2)
UI :S n
An appropriate trigonometric polynomial satisfying (4.9. 1 ) is given by the
n1h-order Fourier series approximation to /( v , wJ ( · ), viz.
j
where
hn(A) = I r:xj ei .<,
Ul :s: n
(4.9.3)
(4.9.4)
151
§4.9.* Inversion Formulae
I n Section 4. 1 1 we establish the following essential properties of the sequence
of approximants { hn( · ) } :
l hn(A) - I(v .wJ (A.)I --> 0
n -->
where E is any open set containing the points v and w, and
sup
). E [- n . n] \ £
as
00 ,
(4.9.5)
sup l hn(A.)I s M < oo for some constant M and all
n. (4.9.6)
is thew spectral
distribution
andIfif Fv and
are continuity
points function
ofF suchofthatthe -stationary
rc < v <
A E [ - n , n]
Proposition 4.9.1 .
{ X" }
sequence
w < n, then
PROOF. Problem 4.26.
D
{ Xn}
stationaryF,sequence
with autocovariance
function
If is a function
), spectral
distribution
and
spectral
representation
r). dZ(A.), and if v and w ( - n < v < w < rc) are continuity points ofF,
ei
then as n oo
__!_ I ( f "' e - ij). ) � Z(w) - Z(v)
(4.9.7)
2n
and
(4.9.8)
__!_
y(j) ( f w e - i dA.) --> F(w) - F(v).
2rc I
PROOF. The left side of (4.9.7) is just T- 1 h where T is the isomorphism of
Theorem 4.8. 1 and h is defined by (4.9.3). By Proposition 4.9. 1 we conclude
Theorem 4.9.1 .
y( ·
f<- ,,,1
-->
lil :<: n
Iii :<; n
that
X,
x
j
=
dA.
v
v
j).
n
T- 1 hn � T - 1 I(v, w] =
n
Z(w) - Z(v),
which establishes (4.9.7). To find Z(8) - Z(8-), - n < e
4.32.
s n, see Problem
To prove (4.9.8) we note, from the spectral representation of y( · ) that the
left-hand side of (4.9.8) can be written as
4. The Spectral Representation of a Stationary Process
! 52
By Proposition 4.9. 1 and the Cauchy-Schwarz inequality,
I
Hence
<h" - J(v. wJ• 1 l h" - J(v, w111 F112 (n) � o as n � oo .
>I
::;:;
l - ,, ,1hnCA.)dF(.A) � J _,, ,/(v,w](.A) dF(.Ic) F(w) - F(v),
=
as required.
D
Although in this book we are primarily concerned with time series in
discrete-time, there is a simple analogue of the above theorem for continuous­
time stationary processes which are mean square continuous. This is stated
below. The major differences are that sums are replaced by integrals and the
range of integration is IR instead of
(-n, n].
Let { X(t),y( t Ewhich
IR} beisacontinuous
zero meanatstationary
process with
0. Then there exists a
autocovariance
function
spectral
distribution function F(t) and an orthogonal-increment process Z(t) on
- oo < t < oo such that
y(s) = I: eis.< dF(A)
and
Theorem 4.9.2.
·)
More
T� oo,over if v and w (- oo < v < w < oo are continuity points of F, then as
_!_2n I T (f w e- iry dy)y(t)dt � F(w) - F(v)
v
and
__!_2n__ I-T (fvw e- iry dy) X(t) dt � Z(w) - Z(v).
)
-T
T
For a proof of this result see Hannan ( 1 970).
�4. 10* Time-Invariant Linear Filters
{ t=
... }
{ c,, k , t, = ... {X,
} t
t 0, ± 1 , . . . ;
c, , kxk,
The process Y, ,
0, ± 1,
is said to be obtained from
by application of the linear filter C =
k 0, ± 1 ,
if
00
r; = L:
k
= -oo
=
=
0, ± 1, . . .
}
,4. 1 0. 1 )
§4.1 0. * Time-Invariant Linear Filters
1 53
the coefficients c,, k are called the weights of the filter. The filter C is said to be
time-invariant if c, , k depends only on (t - k), i.e. if
(4. 1 0.2)
since then
Y;-s = L oo Cr -s,k Xk
k= 00
= L c,, s + k Xk
k= - oo
00
=
L c,, k xk - s•
k= oo
00
i.e. the time-shifted process { r;_., t = 0, ± 1 , . . . } is obtained from { x, _ s ,
t = 0, ± 1 , . . . } by application of the same linear filter C. For the time-invariant
linear filter H = { h ; , i = 0, ± 1 , . . . } we can rewrite (4. 1 0. 1 ) in the form
r; =
L hk Xt -k .
00
(4. 1 0.3)
k= - oo
The time-invariant linear filter (TLF) H is said to be causal if
hk = 0 for k < 0,
since then r; is expressible in terms only of X., s s t.
EXAMPLE 4. 1 0. 1 .
(4. 1 0.4)
The filter defined by
r; = aX_ ,,
t = 0, ± 1,
...
'
is linear but not time-invariant since the weights are c,, k = al5,, -k which do not
depend on (t - k) only.
EXAMPLE 4. 1 0.2.
The filter
t = 0, ± 1 , . . .
'
is a TLF with hi = ai,j = - 1 , 0, 1 and hi = 0 otherwise. It is not causal unless
a_ 1 = 0.
EXAMPLE
4. 1 0.3. The causal ARMA(p, q) process,
t = 0, ± 1 , . . .
c/J(B)X, 8(B)Z"
can be written (by Theorem 3. 1 . 1 ) in the form
=
'
where L i=o t/Ji z i = 8(z)/c/J(z), lzl s 1. Hence { Xr } is obtained from {Z, } by
application of the causal TLF { t/Ji ,j = 0, 1, 2, . . . } .
1 54
4. The Spectral Representation of a Stationary Process
An absolutely summable time-invariant linear filter H
0, ± 1 , . . . } is a TLF such that
Definition 4.10.1.
{
hj,j
=
00
=
lh l
I j < w.
j= � oo
If {X, } is a zero-mean stationary process and H is an absolutely summable
TLF, then applying H to {X, } gives
Y,
=
00
(4. 10.5)
L hj Xt -j•
j=
� oo
By Theorem 4.4. 1 we know that { Y; } is stationary with zero mean and spectral
distribution function
(4. 10.6)
where
h(e-iv)
=
00
L hj e- ijv.
j= - oo
In the following theorem, we show that (4. 10.5) and (4. 10.6) remain valid under
conditions weaker than absolute summability of the TLF. The theorem also
shows how the spectral representation of the process { r; } itself is related to
that of {X, } .
representation Let {X, } be a zero-mean stationary process with spectral
e itv dZx(v)
X, I
J(-1t.1t)
and
spectral distribution function Fx( ). Suppose H { hj,j 0, ± 1 , . . . } is a
TLF such that the series 2:;= -n hj e-ij · converges in L 2 (Fx) norm to
h(e- ;·) j=L hje-W
(4. 10.7)
as n --> oo. Then the process
Theorem 4.10.1.
=
·
=
=
=
00
- co
j= - oo
is stationary with zero mean, spectral distribution function
I l) lh(e- iv )l 2 dFx(v)
Fy(A) J(-1t.
and spectral representation
=
(4. 1 0.8)
§4. 1 0 * Time-Invariant Linear Filters
!55
(4. 1 0.9)
- iv) is non-zero for vi A where SA dFx(A.) = 0, the filter H can be
I
f
h(e
inverted in the sense that X, can be expressed as
I
(4. 10. 1 0)
X, =
g(e - iv)ei" dZy(v),
J(-n, 7t]
where g(e-i
v) 1/h(e - iv) and dZy(v) h(e - iv)dZx(v). From (4. 10. 10) and
Remar
k 1 of Section 4.8 it then follows=that X, E sp{ Y,, s } .
=
- oo <
i eirv " h -e- iv dZ (v)
< oo
PROOF. From the spectral representation of { X, } we have
n
" h-X . =
j =/_.- n 1 t-J
hie-ii"
n
( - n , n]
/_.
j= - n
X
1
(4. 10. 1 1)
h(e - i·) L2(Fx), it follows that the left
converges to
and since Li = - n
in
side of (4. 1 0. 1 1 ) converges in mean square and that
Equation (4. 1 0.8) then follows directly from (4.7.5).
Once (4. 10. 10) is established, it will follow immediately that
s oo }.
Since g(e - iv)h(e - iv) = 1 for v i A and SA dF y(A.) = 0, it follows that g EL 2 (F y)
so that the stochastic integral in (4. 1 0. 10) is well-defined. Also,
X, e sp{ Y,,
=
- oo <
I
J ( - n, n]
<
11
-
g(e - i'v)h(e - i'v)l 2 dFx(v)
= 0,
which establishes (4. 1 0. 10).
0
Remark 1. An absolutely summable TLF satisfies the assumptions of the first
part of the theorem since in this case, LJ = - n
converges uniformly to
hi e - ii·
h(e -i·).
Remark 2. Heuristically speaking the spectral representation of {X, } de­
composes X, into a sum of sinusoids,
-n <
v�
n.
4. The Spectral Representation of a Stationary Process
! 56
H is to produce corresponding components
<v
which combine to form the filtered process { Y,}. Consequently [ h (e - i v )[ is often
called the amplitude gain of the filter, arg h(e - iv ) the phase gain and h(e - iv ) itself
the transfer function. In view of (4. 10.8) with A. = the quantity [ h (e - iv W is
referred to as the variance (or power) gain or as the power transfer function at
frequency v.
The effect of the TLF
:-s; n,
-n
n,
Remark 3. The spectral viewpoint is particularly convenient for linear filtering
since techniques are available for producing physical devices with prescribed
transfer functions. The analysis of the behaviour of networks of such devices
is particularly simple in terms of spectral analysis. For example if
is
operated on sequentially by two absolutely summable TLF's
and
in
series, then the output process
will have spectral representation
{ X1}
H 1 H2
{ W,}
I
e irv h l (e-iv )hz (e- iv )dZx(v)
W, = J(-1t
,1t]
and spectral distribution function
I [ h 1 (e -iv)h2 (e - ivW dFx(v).
Fw(A.) = J(-1t,1t]
Let { Y,} be the MA( 1 ) process,
Y, = Z1 - Z, _1, {Z,} WN(O, cr2).
Then since h(e - i v) = - e - iv is non-zero for
0 and since F is continuous
at zero, it follows from Theorem 4. 1 0. 1 that
z, = JI( - 1t, 1t]eitv( l - e - iv) - ' dZy(v).
Although in this case Z1 E sp { Y., oo < s t} (see also Problem 3.8), Z1 does
not have a representation of the form L� o a Y, - . More generally, the filter
( 1 - B) can be inverted whenever the spectral distribution function of the
input process is continuous at zero. To illustrate the possible non-invertibility
of ( 1 - B), let us apply it to the stationary process,
z; = z, + A,
where A is uncorrelated with {Z,}, E(A) = 0 and Var(A) = cr� > 0. Since
Y, = ( 1 - B)Z, = ( 1 - B)Z;,
it is clear that we cannot hope to recover {z;} from {Y,}. In this example
the transfer function, h(e - iv) = 1 - e - iv, is zero at v = 0, a frequency to which
F assigns positive measure, cr�.
Remark 4.
�
v
1
-
z·
:-s;
i
=/=
z
i
§4. 1 1 .* Properties of the Fourier Approximation hn to /(v,wJ
1 57
§4. 1 1 * Properties of the Fourier Approximation
hn to /(v , w]
(4.9.5) and (4.9.6) of the trigonometric
(4.11.1)
hn(8) _I_2n lil s n e iO f-"" I (A)e i dA
which were used in deriving the inversion formulae of Section 4. 9 .
Let Dn C) be the Dirichlet kernel (see Section 2.1 1),
{
sin[(n �)A]
i
Dn(A) = I iLI ei J. = sin(A/2) if A 0'
2n 1
if A = 0.
If b E (0, 2n) and { f, } is the sequence of functions defined by
fn (x) J: Dn(A) dA,
then as n --+ oo, fn(x) --+ n uniformly on the set [b, 2n - b] and
sup l fn (x) l :S M < oo for all n 1.
PROOF. We have for x 0,
xf Dn(A)dA = 2 f x/2 r 1 sin((2n 1))o) dA
xf /2 g(A)sin((2n + 1)A)dA, (4.1 1.2)
+2
In this section we establish the properties
polynomials
"
L.,
=
-
(v, w]
+
9
J.
'
=I
+
Proposition 4.1 1.1.
=
X E
�
(0, 2 7t - O]
�
0
+
0
0
where g(A) = [sin - 1 (A) - A - 1 ] = (A - sin A)/(A sin A). Straightforward calcula­
tions show that g(A) = A/6 + o (A), g'(A) = i +
as A --+ 0 and that g(A) and
g'(A) are uniformly bounded on [0, n - b/2]. Integrating the second term on
the right side of
and using the fact that g(O) = 0, we obtain
o(1)
(4.11.2)
x/
2 ro 2 g(A)sin((2n + 1)A)dA = - 2(2n 1 ) - g(x/2)cos((2n l)x/2
J
2(2n + 1 ) 1 J: g'(A)cos((2n + l)A)dA.
1
+
+
-
+
12
Since g(A) and g'()o) are bounded for A E [0, n - b/2] it follows easily that this
expression converges to zero uniformly in on the set [0, 2n - b]. On the
other hand by a standard result in analysis (see Billingsley
p. 239),
x
(1986),
! 58
4. The Spectral Representation of a Stationary Process
2
f x/2
0
A - 1 sin ((2
n
+
1 ) ..1.)
d
..1.
=
2
f (2 n + l )x/2
0
....... n as
n
.......
r 1 sin )_ d)_
(4. 1 1 .3)
oo.
It is also evident that the convergence in (4. 1 1 .3) is uniform in x on the set
[<5, 2n - <5] and hence that fn (x) converges uniformly to n on the same set.
Moreover since the integral 2 Jll A -l sin ..1. is a continuous function on [0, oo)
with finite limits at
0 and = oo , the integral in (4. 1 1 .3) is uniformly
bounded in x ( :;::,: 0) and Combined with the uniform convergence to zero of
the second integral on the right of (4. 1 1 .2), this shows that fn(x) is uniformly
bounded for x E [0, 2n - <5] and :;::,: I .
D
y =n
y dA.
.
n
for - n < v < w < Ifn, { h" } is the sequence of functions defined in (4. 1 1 . 1 ), then
0 as n -->
8E(sup
-1t, 1t)\E
where E is any open subset of [ - n, n] containing both v and Also
8E(sup- 1t, 1t) h ( ) M < for all n I.
Proposition 4. 1 1 .2.
l hn(8) - /(v , wJ (8) 1 -->
PROOF.
l n 8 1 s
00 ,
w.
:;::,:
oo
h = 2n_!__ L eij8 e- ij'- dA.
n(8)
1
2n
fw
lk> n
v
fw
Dn(8 - A) dA
v
1 f e-v
Dn ()-) d
2n e-w
fe-w
1 ( f e-v
0
Dn(}-)
- 0 Dn(A.)
= -
==
A.
dA.
2n
)
d)_ .
(4. 1 1 .4)
Given the set E, there exists a 6 > 0 such that 6 < 1 8 - v i < 2n - 6 and
6 < 1 8 - wl < 2n - 6 for all 8 E [ - n, n]\E. Since D"(A.) is an even function it
follows from Proposition 4. 1 1 . 1 that
_!__
2n
f -v
o Dn(A.)
o
d..{ --> { �
-z
�f 8 - v > 0,
If 8 - V < 0,
and the convergence is uniform in 8 on [ - n, n]\E. The same result holds with
/(v , wJ (8) uniformly on [ - n, n]\E.
v replaced by w, and hence
The uniform boundedness of h"(8) follows on applying Proposition 4. 1 1 . 1
to (4. 1 1 .4) and noting that 1 0 - v i < 2n and 1 8 - w l < 2n for all 8 E [ - n, n].
h"(O) -->
0
1 59
Problem:;
Problems
4. 1 . If }'( · ) is the autocovariance function of a complex-valued stationary process,
show that y(O) � 0, ly(h)l � y(O), y(h) = y( - h), and that y( · ) is non-negative
defi nite (see (4. 1.6)).
4.2. Establish whether or not the following function is the autocovariance function
of a stationary process:
1 if h = 0,
- 5 if h = ± 2,
y(h) =
:
- 5 if h = ± 3,
otherwise.
4.3. If 0 < a < n, use equation (4.3. 1) to show that
h = ± 1 , ± 2, . . . ,
h - 1 sin ah,
y(h) =
h = 0,
a,
{�
{
is the autocovariance function of a stationary process { X,, t = 0, ± 1, . . . } . What
is the spectral density of {X, } ?
4.4. I f { X, } is the process defined by equation (4.2.1), show that {X, } i s real-valued if
and only if Aj = - An - j and A(A) = A(An - ), j = 1, . . . , n - 1 and A(A") is real.
Sh<�w that {X,} then satisfies equation (4.2.2).
4.5. Determine the autocovariance function of the process with spectral density
f(A) (n - IAI)/n 2 , - n � A � n.
=
4.6. Evaluate the spectral density of {Z, }, where {Z, }
�
WN(O, <J2 ).
4. 7. If { X, } and { Y, } are uncorrelated stationary processes with spectral distribution
functions Fx( · ) and Fr( · ), show that the process { Z, := X, + Y, } is stationary and
det,ermine its spectral distribution function.
4.8. Let {X, } and { Y, } be stationary zero-mean processes with spectral densities fx
and fr· If fx(A) � fr(A) for all A E [ - n, n] , show that
(a) rn . Y - rn . X is a non-negative definite matrix, Where rn. Y and rn . X are the
covariance matrices of Y = ( Y1 , . . , Y,)' and X = (X 1 , . . . , X")' respectively,
and
(b) Var(b'X) � Var(b'Y) for all b (b 1 , , bn )' E IR".
.
=
• • •
4.9. Let {X, } be the process
X, = A cos(nt/3) + B sin(nt/3) + Y,
where Y, Z, + 2.5Zt - J , { Z,} WN (0, 0'2 ), and A and B are uncorrelated (0, v 2 )
random variables which are also uncorrelated with {Z, } . Find the covariance
function and the spectral distribution function of { X, } .
�
=
{71:
4. 10. Construct a process {X, } which has spectral distribution function,
Fx(w) =
+ w,
3n: + w,
5n + w,
-n � w < - n/6,
- n/6 � w < n/6,
n/6 � w � n.
1 60
4. The Spectral Representation of a Stationary Process
For which values of d does the differenced process VdX, = X, - X,_d have a
spectral density? What is the significance of this result for deseasonalizing a time
series by differencing?
4. 1 1 . Let {X, } be the ARMA(p, q) process defined by
{Z, } - WN(O, a 2 ),
¢>(B)X, = O(B)Z,,
where ¢>(z) # 0 for all z E IC such that lzl = I . Recall from Example 3.5. 1 that for
some r > I ,
"'
1
1
I y(k)z k = a2 8(z)O(z - )/[¢>(z)¢>(z - ) ] ,
k=
r- 1 <
lzl
< r,
- oo
where the series converges absolutely in the region specified. Use this result in
conjunction with Corollary 4.3.2 to deduce that {X, } has a spectral density and
to express the density in terms of a 2 , 0( · ) and ¢>( · ).
4. 1 2. Let {X, } denote the Wolfer sunspot numbers (Example 1 . 1 .5) and Jet { Y,} denote
the mean-corrected series, Y, = X, - 46.93, t = I, . . . , 1 00. The following AR(2)
model for { Y,} is obtained by equating the theoretical and sample autocovariances
at lags 0, I and 2:
Y, - 1 . 3 1 7 Y,-1
+
.634 ¥,_2 = z,
{Z, } - WN(0,289.3).
(These estimated parameter values are called "Yule-Walker" estimates and can
be found using the program PEST, option 3.)
Determine the spectral density of the fitted model and find the frequency at
which it achieves its maximum value. What is the corresponding period? (The
spectral density of any ARMA process can be computed numerically using the
program PEST, option 5.)
4. 1 3. If {X, } and { Y, } are stationary processes satisfying
X, - cxX,_ 1 = J.t;,
and
Y, - cx Y, _ 1 = X, + Z,,
where l cx l < I and { I-t; } and {Z, } are uncorrelated, find the spectral density of
{ Y, }.
4. 1 4. Prove Corollary 4.4. 1 .
4. 1 5. Let {X, } be the MA( l ) process,
X, = Z, - 2 Z,_ 1 ,
Given s > 0, find a positive integer k(s) and constants a0 = I , a 1 , . . . , ak such that
the spectral density of the process
k
Y, = I aj x,_j
j�O
satisfies sup_ , ,; < ,; • lfr(A) - Var ( Y, )/2n l < s.
4. 1 6. Compute and sketch the spectral density f(2), 0 s 2
s
n,
of the stationary
Problems
161
process {X, } defined by
X, - .99X,_ 3 = Z,,
{ Z, } - WN(O, 1).
Does the spectral density suggest that the sample paths of { X, } will exhibit
oscillatory behaviour? If so, then what is the approximate period of the oscil­
lation? Compute the spectral density of the filtered process,
Y,
=
t(X, _ t + X, + X, + t l,
and compare the numerical values of the spectral densities of { X, } and { Y, } at
frequency w = 2n/3 radians per unit time. What effect would you expect the
filter to have on the oscillations of { X, } ?
4. 1 7. The spectral density of a real-valued process { X, } is defined on [0, n] by
{
1 00,
f(A.) =
0,
n/6 - .01 < A < n/6 + .01,
otherwise,
and on [ - n, OJ by f(A.) = f( - /c).
(a) Evaluate the covariance function of { X, } at lags 0 and 1 .
(b) Find the spectral density of the process { Y, } where
(c) What is the variance of Y,?
(d) Sketch the power transfer function of the filter V 1 2 and use the sketch to
explain the effect of the filter on sinusoids with frequencies (i) near zero and
(ii) near rr/6.
4. 1 8. Let { X, } be any stationary series with continuous spectral density f such that
0 :<:; f(A) :<:; K and f(n) # 0.
Let f.(A.) denote the spectral density of the differenced series { ( 1 - B)"X, } .
(a) Express f.(A.) i n terms o f f,_1 (A.) and hence evaluate f,(/c).
(b) Show that f,().)/f.(n) --> 0 as n --> oo for each A E [0, n).
(c) What does (b) suggest regarding the behaviour of the sample-paths of
{ (1 - B)"X, } for large values of n?
(d) Plot { ( 1 - B)" X, } for n = 1, 2, 3 and 4, where X,, t = 1, . . . , 100 are the
Wolfer sunspot numbers (Example 1 . 1 .5). Do the realizations exhibit the
behaviour expected from (c)? Notice the dependence of the sample variance
on the order of differencing. (The graphs and the sample variances can be
found using the program PEST.)
4. 19. Determine the power transfer function of the time-invariant linear filter with
2a, ljl2 1 and ljli = 0 for j oft 0, 1 or 2. If you wish
coefficients ljl0 1, tjl1
to use the filter to suppress sinusoidal oscillations with period 6, what value of
IX should you use? If the filter is applied to the process defined in Problem 4.9,
what is the spectral distribution function of the filtered process, Y, = X, 2aX,_1 + X,_z?
4.20.* Suppose { Z().), -n :<:; A :<:; rr} is an orthogonal-increment process which is not
necessarily right-continuous. Show that for all A E [ - n, n), Z(A. + c5) converges
in mean square as c5 !0. Call the limit Z(A. + ) and show that this new process is
a right-continuous orthogonal-increment process which is equal to Z(A.) with
probability one except possibly at a countable number of values of A in [ - n, n).
=
=
-
=
1 62
4. The Spectral Representation of a Stationary Process
4.21 . * (a) If { Z, } is an iid sequence of N(O, 1) random variables, show that the
associated orthogonal-increment process for { Z,} is given by
dZ().) = dB(A.) + dB( - A.) + i(dB( - ) ) - dB(A.))
.
where B(A.) is Brownian motion on [ - rc, rc] with a2 = 1/4 (see Example
4.6. 1 ) and where integration relative to dB( - A.) is defined by
f-
.. .
/(A.) dB( - A.) =
f- /(
.
.
- A.) dB(A.)
for all f E L2([ - rc, rc]).
(b) Let X, = A cos(wt) + B sin(wt) + z, where A, B and Z,, t = 0, ± 1, . . . , are
iid N(O, 1) random variables. Give a complete description of the
orthogonal-increment process associated with {X,}.
4.22.* If X, = f<- 1 e ''v dz(v) where {Z(v), -rc
:<;; v :<;; rc } is an orthogonal increment
process with associated distribution function F( · ), and if Y, - <P Y,_1 = X, where
<P E ( - 1, 1 ), find a function 1/J( · ) such that
•. •
Y, =
I
J(-n . n]
e ''vi/J(v) dZ(v).
Hence express E( Y,+ h X,) as an integral with respect to F. Evaluate the integral
in the special case when F(v) = a2(v + rc)/2rc, - rc :<;; v :<;; rc.
:<;; v :<;; rc} be an orthogonal increment process with associated
distribution function F( · ) and suppose that 1/J E L 2 (F).
(a) Show that
4.23.* Let {Z(v), - rc
W(v) =
L
..
vJ
1/l(A.) dZ(A.),
- rc
:<;;
v :<;; rc,
is an orthogonal increment process with associated distribution function,
G(v) =
L
.
.
vJ
1 1/J(A.W dF(A.).
(b) Show that if g E L2(G) then gi/J E L2(F) and
L
...
J
g(A.) dW(A.) =
L
. .
.
J
g(),)I/J(A.) dZ(),).
(c) Show that if 1 1/1 1 > 0 (except possibly on a set of F-measure zero), then
Z(v) - Z( - rc)
=I
J
v
- n, )
1
-- d W(A.),
1/J(A.)
- TC :<;; V :<;; TC.
4.24. * If {X, } is the stationary process with spectral representation,
t = 0, ± 1, . . . ,
Problems
1 63
where E l dZx(vW = I ¢(vW dF(v), F is a distribution function on [ - n, n], ¢
almost everywhere relative to dF, and ¢ E L 2(F), show that
Y, =
I
J(-1t,1t]
ei"r 1 (v) dZx(A),
t = 0,
# 0
± I, . . . ,
is a stationary process with spectral distribution function F.
4.25.* Suppose that { X, } is a real stationary process with mean zero, spectral
distribution function F and spectral representation X, = Jr - •. •l e;'.dZ(v). Let
Im { Z(A)} and assume that F is continuous at
U(A) Re { Z(A)} and V(J,)
0 and n.
(a) Show that dF(A) = dF( - A), i.e. Jr • . •1 g(A) dF(A) = Jr _ •.• 1 g(),) dF( - A) for
all g E L 2 (F), where integration with respect to F( - A) is defined by
=
=
-
_
I
J [ - • . •l
I g( - A) dF(A).
J[- • . •l
Jr- •.•1 g(A) dZ(A) Jr- 1 g(A) dZ( - X) for
g(A) dF( - ),) =
•. •
(b) Show that dZ(A) = dZ( - A), i.e.
all g E L 2 (F), where integration with respect to Z( - A) is defined by
=
l -•. •l g(A) dZ( - A) = l -•.•l g( - A) dZ(A).
Deduce that dU(A) = dU( - J,) and d V(X) = - d V( - A).
(c) Show that { U(A), 0 :<::; A :<::; n} and { V(A), 0 :<::; A :<::; n} are orthogonal
increment processes such that for all ), and J.t,
E(dU(X) dU(p))
=
2 - 1 b;_, � dF( A.),
E(dV(X) dV(p)) = 2 - 1 3;_, � dF(A),
and
E(dU(A) dV(p))
(d) Show that
X, =
and that
X, = 2
I
J[-n, n]
I
J [O,x]
=
0.
cos(vt) dU(v) +
I
J [ - x, x]
cos(vt) dU(v) + 2
I
J [O, x]
sin(vt) dV(v)
sin(vt) dV(v).
4.26.* Use the properties (4.9.5) and (4.9.6) to establish Proposition 4.9 . 1 .
4.27.* Let { Z(v), - n :<::; v :<::; n } be an orthogonal-increment process with E IZ(v2 ) ­
Z(v1 W = a(v2 - v1 ), v2 ;::o: v1 . Show that
Y, =
I
J(-x,x ]
(n - I v l/2r l e ivr dZ(v),
4. The Spectral Representation of a Stationary Process
1 64
is a stationary process and determine its spectral density and variance. If
X,
=
f
(-rr,n:]
eivt dZ(v),
find the coefficients of the time-invariant linear filter which, when applied to
{ Y; }, gives {X, } .
4.28. * Show that i f 1/1( · ) and 0 ( · ) are polynomials with n o common zeroes and if
¢(z) = 0 for some z E IC such that l z l = I, then the ARMA equations
1/i (B)X, = O(B)Z,,
have no stationary solution. (Assume the existence of a stationary solution and
then use the relation between the spectral distributions of {X, } and { Z, } to
derive a contradiction.)
4.29.* Let {X,} be the stationary solution of the ARMA(p, q) equations,
¢(B)X, = O(B)Z,
where
¢(z)
=
and
l ai l
( I - a; 1 z) · · · ( 1 - a,- 1 z)(1 - a,-+\z) · · · ( 1 - a; 1 z),
> I,
i = 1, . . . , r;
l ad < 1 , i = r +
I, . . . , p.
Define
�(z) = (I - a; 1 z) · · · ( I - a,- 1 z)(l - a, + 1 z) · · · ( 1 - apz),
and show (by computing the spectral density) that {tj(B)X,} has the same
autocovariance function as { I a, + 1 · · · aP I 2 0(B)Z,}. It follows (see Section 3.2)
that there exists a white noise process {Z,} in terms of which {X,} has the
causal representation,
�(B)X,
=
O(B)Z,.
What is the variance of Z,?
4.30.* Prove Theorem 4. 1 . 1 . (To establish the sufficiency of condition (4. 1 . 6), let K 1 and
K 2 be the real and imaginary parts respectively of the Hermitian function K and
let 0"1 be the 2n x 2n-matrix,
where K�"1 = [K,(i - j)J7.j= l , r = 1 , 2.
Show that U"1 is a real symmetric non-negative definite matrix and let ( Y1 , • . • , Y;,,
0 and covariance matrix
0"1• Define the family of distribution functions,
Z1 , . . . , Z")' be a Gaussian random vector with mean
Problems
1 65
n E { 1, 2, . . . }, t E Z, and use Kolmogorov's theorem to deduce the existence of a
bivariate Gaussian process { ( Y, , Z,), t E Z} with mean zero and covariances
E( Y,+h Y,) = E(Z,+hZ,) = t K , (h),
E( Y, +hZ,) = - E(Z, + h Y, ) = t K 2(h).
Conclude by showing that {X, := Y, - iZ, t E Z } is a complex-valued process
with autocovariance function K( · ).)
4.3 1 * Let { B(.l.), - n ::s; A ::s; n} be Brownian motion as described in Example 4.6. 1 . If
g E L2(d).) is a real-valued symmetric function, show that the process defined by
X, =
I
J(-rr.O]
J2 cos(tv)g(v) dB(v) + I
J(O,rr]
J2 sin(tv)g(v) dB(v),
t = O, ± 1 , . . . ,
is a stationary Gaussian process with spectral density function g2(.l.)a2/(2n).
Conversely, suppose {X,} is a real stationary Gaussian process with orthogo­
nal-increment process Z(.l.) and spectral density function f(.l.) which is strictly
positive. Show that the process,
B(.l.) :=
I
J(O, !.]
r l i 2 (v)[dZ(v) + dZ(v)] ,
;, E [0, n] ,
is Brownian motion.
4.32.* Show that for any spectral c.d.f. F on [ - n, n] , the function D.( - - O)j(2n + 1 )
(see Section 2. 1 1) converges i n U(F) t o the indicator function o f the set { 0}.
Use this result to deduce, in the notation of Theorem 4.9. 1 , that Z(O) - Z(O - ),
- n < 0 ::s; n, is the mean square limit as n ..... oo of
(2n + W ' I Xj exp( - ijO).
Ul $ n
CHAPTER 5
Prediction of Stationary Processes
In this chapter we investigate the problem of predicting the values { X1,
t ;::: n + 1 } of a stationary process in terms of {X b . . . , X" } . The idea is to utilize
observations taken at or before time n to forecast the subsequent behaviour
P), the best predictor in uH
of { X1 }. Given any closed subspace uH of
of xn+ h is defined to be the element of uH with minimum mean-square distance
from Xn+ h · This of course is not the only possible definition of "best", but for
processes with finite second moments it leads to a theory of prediction which
is simple, elegant and useful in practice. (In Chapter 1 3 we shall introduce
alternative criteria which are needed for the prediction of processes with in­
finite second-order moments.) In Section 2.7, we showed that the projections
P41<x1 x " > Xn+ h and P5P{ l . x 1 , , x " } Xn+ h are respectively the best function of
XI , . . . , xn and the best linear combination of 1 , XI , . . . , xn for predicting
Xn+ h · For the reasons given in Section 2.7 we shall concentrate almost ex­
clusively on predictors of the latter type (best linear predictors) instead of
attempting to work with conditional expectations.
U(Q, .?,
• • • • •
• • •
§5. 1 The Prediction Equations in the Time Domain
=
Let { X1 } be a stationary process with mean J1 and autocovariance function
y( · ). Then the process { 1; } { X1 - J1} is a zero-mean stationary process with
autocovariance function y ( - ) and it is not difficult to show (Problem 5. 1) that
(5. 1 . 1 )
Throughout this chapter we shall assume therefore, without loss of generality,
that J1 0. Under this assumption it is clear from (5. 1 . 1 ) that
(5. 1 .2)
=
§5. 1. The Prediction Equations in the Time Domain
1 67
Equations for the One-Step Predictors
denote the closed linear subspace sp {X 1 , . . . ,X.}, n
nLet Yf,.0, denote
the one-step predictors, defined by
if n = 0,
{0
Xn +! =
P:rc, Xn+! if n 1 .
Since xn+ ! E Yf,., n 1 , we can write
n 1,
Xn+ I = rPn! Xn + · · · + rPnn X ! ,
�
1, and let X. +1,
�
�
(5. 1 .3)
�
�
(5. 1 .4)
�
where r/J. 1 , . . . , rPnn satisfy the prediction equations (2.3.8), viz.
)
I\ t=l
.f rPni xn+! - i , xn +! -j = <Xn+! , xn+ ! -j ),
j = 1, . . .
, n,
with <X , Y ) = E (X Y). By the linearity of the inner product these equations
can be rewritten in the form,
n
i=l
L rPniY (i - j)
=
y(j),
j = 1, . . .
, n,
or equivalently
..
(5. 1 .5)
where r. = [y(i - j )];_j= !, · " ' 'Yn = (y(l ), . . . , y(n))' and cp. = (r/J. 1 , . . . , r!J•• ) . The
projection theorem (Theorem 2.3. 1 ) guarantees that equation (5.1 .5) has at
least one solution since xn+l must be expressible in the form (5. 1 .4) for some
cp. E IR " . Equations (5. 1 .4) and (5.1 .5) are known as the one-step prediction
equations. Although there may be many solutions of(5. 1 .5), every one of them,
when substituted in (5. 1 .4), must give the same predictor xn+! since we know
(also from Theorem 2.3. 1 ) that Xn + I is uniquely defined. There is exactly one
solution of (5. 1 . 5) if and only if r. is non-singular, in which case the solution is
'
(5. 1 .6)
The conditions specified in the following proposition are sufficient to ensure
that r. is non-singular for every
n.
Proposition 5.1.1.
h->ngularthenfortheevery
covarin. ance matrix
If y(O)of>(X0 and. . .y,(h).->' i0s asnon-si
r. = [y ( - j )]i,j=!, . . . ,n
i
oo
I ,
x)
PROOF. Suppose that r. is singular for some n . Then since EX, = 0 there exists
such that r, is non-singular and
an integer � 1 and real constants
r
a 1 , ... , a,
r
x, +! = I ajxj
j= !
(see Problem 1 . 1 7). By stationarity we then have
1 68
5. Prediction of Stationary Processes
r
xr + h = L1 aj xj + h - 1 , for all h � I,
j=
and consequently for all n � r + 1 there exist real constants ai">, . . . , a�•l, such
that
(5. 1 .7)
where X, = (X 1 ,
. • .
, X,)' and a<•l = (a\">, . . . , a� l)'. Now from (5. 1 .7)
y(O) = a<•lT,a<•l
"
= a<•l' PAP'a<•l,
where (see Proposition 1 .6.3) A is a diagonal matrix whose entries are the
strictly positive eigenvalues .1. 1 :s; .1. 2 :s; · · · :s; .1., of r, and P P' is the identity
matrix. Hence
r
= ,1. 1 L (aj"l) 2 ,
j= 1
which shows that for each fixed j, a)" l is a bounded function of n.
We can also write y(O) = Cov(X., L 'i= 1 aj"l XJ, from which it follows that
y(O)
:S::
r
L l aj"ll ly(n - j)l.
j= 1
In view of this inequality and the boundedness of aj "l, it is clearly not possible
to have y(O) > 0 and y(h) -+ 0 as h -+ oo if r. is singular for some n. This
completes the proof.
D
Corollary 5.1 . 1 . Under the conditions of Proposition 5. 1 . 1 , the best linear pre­
dictor x.+ 1 of x. +1 in terms of X 1 , . . . , X. is
n
n = 1, 2, . . . ,
Xn+1 = L
rPni Xn+ 1 - i •
i= l
where cJl. := (r/J. 1 , . . . , r/J••r = r.- 1 Ym y. := (y(1 ), . . . , y(n))' and r. = [y(i - j)L= 1 .....• .
The mean squared error is v. = y(O) - y� r; 1 y• .
PROOF. The result is an immediate consequence of (5. 1 .5) and Proposition
5. 1 . 1 .
D
Equations for the h-Step Predictors, h � 1
The best linear predictor of Xn+ h in terms of X 1 ,
found in exactly the same manner as x. +1 • Thus
• • •
, x. for any h � 1 can be
n, h � 1 ,
(5. 1 .8)
1 69
§5.2. Recursive Methods for Computing Best Linear Predictors
where q,�h )
=
(ifJ���, . . . , ifJ���)' is any solution (unique if r" is non-singular) of
where y�h )
=
(y (h), y(h + 1), ... , y(n + h - 1 ))'.
(5.1.9)
§5.2 Recursive Methods for Computing Best Linear
Predictors
In this section we establish two recursive algorithms for determining the
and show how they can be
one-step predictors Xn+ l , :2: defined by
used also to compute the h-step predictors P.Yc,Xn+ h ' :2: Recursive pre­
diction is of great practical importance since direct computation of P.Yc,Xn+ h
from
and
requires, for large the solution of a large system of
linear equations. Moreover, each time the number of observations is increased,
the whole procedure must be repeated. The algorithms to be described in this
section however allow us to compute best predictors without having to
perform any matrix inversions. Furthermore they utilize the predictors based
on observations to compute those based on + observations,
. We shall also see in Chapter how the second algorithm greatly facilitates
the computation of the exact likelihood of { X1 , . . . , Xn } when the process {X, }
i s Gaussian.
n 1,
(5.1.3), h 1.
n,
(5.1.8) (5.1.9)
... n
n = 1, 2,
n 1
8
Recursive Prediction Using the Durbin-Levinson Algorithm
n :2: 1, we can express Xn+l in the form,
n :2: 1.
(5.2.1)
The mean squared error of prediction will be denoted by v" . Thus
n :2: 1,
(5.2.2)
and clearly v0 = y (O).
The algorithm specified in the following proposition, known as the Durbin
or Levinson algorithm, is a recursive scheme for computing «Pn = ( n , , cPnnY
and vn for n = 1, 2, ... .
Proposition
(The Durbin-Levinson Algorithm). If {X, } is a zero mean
stationary
process
withtheautocovariance
function y( such that y (O) > 0 and
coefficients
cPnj and mean squared errors vn as defined
yby(h)(5.--->2.1)0 ashand--->(5.2.2then
) satisfy y(1)/y(O), v0 y(O),
(5.2.3)
Since Xn+l
=
P.Yc,Xn+ l E j'f,,
c/J 1
5.2.1
oo,
·)
c/J 1 1 =
=
• . .
1 70
5. Prediction of Stationary Processes
[l/Jnl: ] = [l/Jn-:1,1 ] - Y'nn [l/Jn-: l,n-1]
•
•
and
l/Jn, n - 1
l/Jn -l , n- 1
A.
(5.2.4)
•
l/Jn- 1 , 1
(5.2.5)
$"1
X2 X.
$"2 =
= sp {
sp { X 1 PROOF. By the definition of P:xt ,
. . . , } and
P:xt X 1 } are orthogonal subspaces of Yf,. = sp { X 1 , . . . , X. } . Moreover it is easy
to see that for any Y E L
ff,P), f¥t;. Y = P:xt Y + P.Jfi Y. Hence
2 (0,
,
(5.2.6)
where
(5.2.7)
Now by stationarity, (X . . . , X.)' has the same covariance matrix as both
(X.,X
._1 , ... ,X1 )' and (X2 ,X.+1)', 1so that
n(5.2.8)
P� X � = L l/Jn - l, i Xi+l•
jn = 1l
(5.2.9)
f� x. +l = I l/J. - l. j xn+l -j•
j=l
and
I X! - PXj Xl ll 2 = I Xn+ l - PXj Xn+l ll 2 = I X. - X.� ll 2 = vn-1 · (5.2. 1 0)
From equations (5.2.6), (5.2.8) and (5.2.9) we obtain
xn + l = aX! j=lnL-1 [l/Jn -l, j - al/Jn- l,n-jJ Xn+l -j> (5.2. 1 1)
1,
, • . •
+
where, from (5.2.7) and (5.2.8),
In view of(5. 1 .6) and Proposition 5. 1 . 1 , the assumption that y(h) ---+ 0 as h ---+ oo
guarantees that the representation
x.+l = jI=ln l/J.jxn + l -j
(5.2. 1 2)
l/Jnn = a
(5.2. 1 3)
is unique. Comparing coefficients in (5.2. 1 1 ) and (5.2. 1 2) we therefore deduce
that
§5.2. Recursive Methods for Computing Best Linear Predictors
and
j
= ... , n 1,
171
1,
(5.2. 14)
in accordance with (5.2.3) and (5.2.4).
It remains only to establish (5.2.5). The mean squared error of the predictor
Xn+l is
=
=
=
lJn = IIXn+l - Xn+ 1 ll 2
II Xn+l - P;r; Xn+l - f'Jf2 Xn+l ll 2
II Xn +l - P,x; Xn+l ll 2 + II Px;Xn+l ll 2 - 2 (Xn+l - P,x; Xn+I , Px;Xn+l )
vn -1 + a 2 vn - l - 2a(Xn+I , XI - P,x; XI ),
where we have used (5.2. 10), the orthogonality of X1 and X2 , and the fact
that Plfi Xn+l a(X1 - f� X1 ). Finally from (5.2.7) we obtain
v. = vn - l (1 - a2 )
as required.
=
D
In Section 3.4 we gave two definitions of the partial autocorrelation of { X, }
a t lag viz.
n,
and
a(n) = Corr(Xn+l
-
PSi>{Xz, . . . , x"} Xn+I , X1 - Psp(x2, . . . , x"} X d
ct(n) = r/Jnn·
In the following corollary we establish the equivalence of these two definitions
under the conditions of Proposition 5.2. 1 .
(The Partial Autocorrelation Function). Under the assumptions
y
ofCorollar
Proposition
5.2. 1
5.2.1
PROOF. Since P;r; X. + 1 l. (X1 - Px-, X1 ), equations (5.2. 1 3), (5.2.7) and (5.2. 10)
give
rflnn
=
=
=
(Xn+I , XI - Px; X1 )/IIX1 - P,x; X1 II 2
(Xn+l - P;r; Xn+I , XI - P;r; X1 )/IIX1 - P,x; X1 II 2
Corr(Xn+l - P;r; xn+l , X I - f;r; X )
I .
D
Recursive Prediction Using the Innovations Algorithm
The central idea in the proof of Proposition 5.2. 1 was the decomposition of
.Yf', into the two orthogonal subspaces X1 and X . The second recursion,
established below as Proposition 5.2.2, depends on 2the decomposition of J'f,
into orthogonal subspaces by means of the Gram-Schmidt procedure.
n
5. Prediction of Stationary Processes
1 72
Proposition 5.2.2 is more generally applicable than Proposition 5.2. 1 since
to be a possibly non-stationary process with mean zero and
we allow
autocovariance function,
{ X1}
K(i,j) = < X;, xj > = E(X;XJ.
As before, we define £, sp {X 1 , ... ,Xn }, Xn +I as
I Xn+ I - Xn+ 1ll 2. Clearly (defining X1 0),
=
:=
so that
m
(5. 1 .3), and
n
::;::.:
Vn
=
1,
gn+ ! jL�n I enj(Xn+l -j - xn +I -)·
=
{ eni•j 1 , ... , n ; vn },
n=
(The Innovations Algorithm). If { X1} has zero mean and
Proposition
E(X;
X
;)
K(i,
j
),
where
the matrix [K(i,j)J?. j�I is non-singular for each n 1 ,
2 , . . ' then the one-step predictors xn + l ' n 0, and their mean squared errors
Vn, n 1, are given by
We now establish the recursive scheme for computing
1 , 2, . . . .
=
5.2.2
=
.
=
:::::-:
::;::.:
(5.2. 1 5)
and
V0
=
K( J , 1),
k
= ... , n 0, 1 ,
1,
K(n + 1 , n + 1 ) - nj�OL-1 e?;,n -jvj.
(5.2. 1 6)
to
solve
(5.2. 1 6) recursively in the order v0 ; 8 11 , v 1; 822 ,
8(It2 1•isV2a; trivial
833 • 83matter
2 • 831 • v3 ; . · .)
PROOF. The set {X I - x l , x2 - x2, ... ,Xn - Xn } is orthogonal since
X;) E YtJ- 1 fo i < j and (Xj - Xj) YtJ- 1 by definition
(X;inner
- product
of xj . Taking the
on both sides of (5.2. 1 5) with Xk +I - Xk+ I , 0 k < n, we have
<X'n+l , Xk+l - Xk+l > en.n-k vk .
Since (Xn +I - Xn +I) (Xk +I - Xk+ I ), the coefficients en , n -k• k = 0, ... , n - 1
are given by
=
.
j_
r
:-:;;
=
_l_
n
(5.2. 1 7)
Making use of the representation (5.2. 1 5) with replaced by k, we obtain
§5.2. Recursive Methods for Computing Best Linear Predictors
Since by (5.2. 1 7), <X.+ t , Xi+ 1 - Xi+ 1 )
(5.2. 1 8) in the form
= vien, n -i'
1 73
0 -5;_ j < n, we can rewrite
as required. By the projection theorem and Proposition 2.3.2,
0
completing the derivation of (5.2. 1 6).
=
Remark 1. While the Durbin-Levinson recursion gives the coefficients of X1 ,
. . . , x. in the representation xn+ 1 I:;� 1 ¢J.jxn+ 1 -j, Proposition 5.2.2 gives
the coefficients of the "innovations", (Xi - Xi ),j = 1 , . . , n, in the orthogonal
expansion X.+ 1 L}� 1 e.i(X.+ 1 -i - X.+ 1 -J The latter expansion is extremely
simple to use and, in the case of ARMA(p, q) processes, can be simplified still
further as described in Section 5.3. Proposition 5.2.2 also yields an innovations
representation of X.+ 1 itself. Thus, defining 8. 0 1 , we can write
=
.
=
n
Xn+ 1 = L e.i(Xn+ 1 -i - Xn+ 1 -i),
j�O
n
=
0, 1 , 2, . . . .
ExAMPLE 5.2. 1 (Prediction of an MA(l ) Process Using the Innovations Al­
gorithm). If {X, } is the process,
x,
= z,
+
ez,_ 1 ,
then K(i,j) = 0 for I i - jl > 1 , K(i, i) = a2(1 + 82) and K(i, i + 1 )
this it is easy to see, using (5.2. 1 6), that
2 5;_ j -5;_ n,
= 8a2.
From
and
v. = [1 + 82 - v;;-� 1 82a2] a2.
=
=
If we define r. = v./a2, then we can write
Xn+ 1 8(X. - X.)/rn -1
where r0 I + 82 and r.+ 1 1 + 82 - 82/r Table 5.2. 1 illustrates the use
of these recursions in computing X from observations of X 1 , . . . , X with
e = - .9. Note that v. is non-increasing in n and, since II X. - x. - Z. ll � 0
as n -> oo, v. -> a2 (see Problem 5.5). The convergence of v. to a2 is quite rapid
=
6
in the example shown in Table 5.2. 1 .
•.
5
5. Prediction of Stationary Processes
1 74
X,
Table 5.2. 1 . Calculation of
and from Five Observations
of the MA(l) Process,
X, = Z, - .9Z , Z, N(O, 1 )
v,
,1
_
X, + I
- 2.58
1 .62
- 0.96
2.62
- 1 .36
0
2
3
4
5
x,+l
,.._,
0
1.28
- 0.22
0.55
- 1.63
- 0.22
v,
1.810
1 .362
1.215
1 . 1 44
1 . 1 02
1 .075
ExAMPLE 5.2.2 (Prediction of an MA(1) Process Using the Durbin-Levinson
Algorithm). If we apply the Durbin-Levinson algorithm to the problem
considered in Example 5.2. 1 we obtain
v0 == 1-. 8.4972
10
rPf/J11 = - .3285 f/J == 1-.362.6605 v 1 .2 1 5
221 = - .4892 rP3 2 - .7404 v3 = 1 . 144
rP3
rPrP432432 = -- ..2433
1
1 9 14 rP4 3 = - .3850 rP4 2 = - .5828 f/J4 1 = - .7870 v4 1 . 102
f/Jv55 5 = 1-.075,
. 1 563 f/J 5 4 = - .3 144 f/J5 3 = - .4761 rPs z = - .6430 f/J5 1 = - . 8 1 69
f/J5 1 X5 -0.22, in agreement with the much
giving X 6 = f/J5 5 X 1 +
simpler calculation based on Proposition 5.2.2 and shown in Table 5.2. 1 .
Note that the constants f/J""' n = 1 , 2, . . . , 5, are the partial autocorrelations
at lags 1, 2, . . . , 5 respectively.
v1
=
=
=
=
=
···
+
=
Recursive Calculation of the h-Step Predictors, h :2: 1
"
PnXn+h P
PnXn+h = PnPn+h- 1 Xn+h
= PnXn+h
P.Ye. .
Let us introduce the notation for the projection operator
Then the
can easily be found with the aid of Propo� ition 5.2.2.
h-step predictors
By Proposition 2.3.2, for h � 1 ,
Since
(Xn+h-i - Xn+h-i) for- j < h, it follows from Proposition 2.3.2 that
(5.2. 1 9)
PnXn+h n+hj=hL 1 (}n +h-1 .)Xn +h -j - Xn+h -j)
j_ Yf.
=
1 75
§5.3. Recursive Prediction of an ARMA(p, q) Process
where the coefficients 8ni are determined as before by (5.2. 1 6). Moreover the
mean squared error can be expressed as
E(Xn+h - Pn Xn+h ) 2 = IIXn+h ll 2 - IIPnXn+h l l 2
n+h - 1
= K(n + h , n + h) - L 8�+h- 1 , j vn + h -j- 1 • (5.2.20)
j= h
§5.3 Recursive Prediction of an ARMA(p, q) Process
Proposition 5.2.2 can of course be applied directly to the prediction of the
causal ARMA process,
(5.3. 1 )
l/J(B)X, = 8(B)Z,,
{Z, } WN(O, a 2 ),
where as usual, l/J(B) = 1 - l/J 1 B - · · · - l/Jp BP and 8(B) = 1 + 8 1 B + · · · + 8q Bq .
We shall see below however that a drastic simplification in the calculations
can be achieved if, instead of applying Proposition 5.2.2 directly to {X,}, we
apply it to the transformed process (cf. Ansley ( 1979)),
t = 1 , . . . , m,
= a - 1 X,,
(5.3.2)
1
t > m,
= a l/J(B)X,,
�
{W.
W,
where
m = max(p, q).
(5.3.3)
For notational convenience we define 80 = 1 and assume that p � 1 and q � 1 .
(There is no loss of generality i n these assumptions since i n the analysis which
follows we may take any of the coefficients l/J; and 8; to be zero.)
With the subspaces Yf,. as defined in Section 5. 1 , we can write
n
�
1.
(5.3.4)
For n � 1 , Xn+ t and W, + t will denote the projections on Yf,. of Xn+1 and l¥, +1
respectively. As in (5. 1 .3) we also define X 1 = W1 = o.
The autocovariance function Yx( · ) of {X, } can easily be computed using
any of the methods described in Section 3.3. The autocovariances K(i,j) =
E( W; ltj) are then found from
a -2 Yx(i -j),
1 � i, j � m,
[
a - 2 Yx(i -j) K(i,j) =
r�
¢lr Yx(r - l i -j i )
J
min(i,j) � m < max(i,j) � 2m,
min(i,j) > m,
0,
otherwise,
where we have adopted the convention 8i = 0 for j > q.
(5.3.5)
1 76
5. Prediction of Stationary Processes
{ W,} we obtain
w.. +1 j=1f enpv,,+1-j - w.. +1-j), 1 :::;; n <
(5.3.6)
f
w;,+1 j=1 enj( W,+1-j - w;, +1-j), n �
where the coefficients 8ni and mean squared errors
E( W,+ 1 - Wn + J2 are
found recursively from (5.2. 1 6) with K defined as in (5.3.5). The notable feature
of the predictors (5.3.6) is the vanishing of 8ni when both n � andj q. This
is a consequence of ( 5.2. 1 6) and the fact that K(n,j ) = 0 if n
and In - j I q .
To find X" from W, we observe, by projecting each side of (5.3.2) onto Jt;_ 1 ,
that
1,
Jf; 0' -1 x,
(5.3.7)
{ Jf; = 0' - 1 [XI - rP1 X1 -1 - . . . - r/JP XI _p] , t
Applying Proposition 5.2.2 to the process
{
m,
=
�
�
= L.
m,
r" =
m
>
> m
t
=
=
>
. . . , m,
> m,
(5.3.8)
X1 - X1 = O'[W, - Jf;] for ali t � 1 .
Replacing ( rtj - � ) by 0' - 1 (Xi - Xi ) in (5.3.6) and then substituting into (5.3.7)
we finally obtain,
n
1 :::;; n <
xn+1 = L1 8niXn+1-j - xn +1-j),
(5.3.9)
��
Xn+1 Xn + r/Jp Xn+t- p + j=1I 8niXn +1-i - Xn+1-j), n �
which, together with (5.3.2), shows that
{
m,
=
+ ···
m,
and
(5.3. 1 0)
8ni
where and r" are found from (5.2. 1 6) with K as in (5.3.5). Equations (5.3.9)
determine the one-step predictors
. . . , recursively.
r/J1 , , r/Jp, 81 , . . . , 8q
X2 , X3,
{ W,}
8ni
Remark 1. The covariances K(i,j) of the transformed process
depend only
on
and not on 0' 2 • The same is therefore true of and r" .
. • .
Xn + 1
Remark 2. The representation (5.3.9) for
is particularly convenient from
a practical point of view, not only because of the simple recursion relations
for the coefficients, but also because for n � m it requires the storage of at
most p past observations
and at most q past innovations
Direct application of
+ 1 J, j = 1 , . . . , q, in order to predict
Proposition 5.2.2 to
on the other hand leads to a representation of
in terms of all the n preceding innovations
= 1, . . . , n.
xn , . . . , xn+1 - p
Xn +1 .
xn+ 1
(Xi - Xi),j
Remark 3. It can be shown (see Problem 5.6) that if { X1} is invertible then as
n
...... 1 and enj ...... ej ,j = 1, . . . , q.
(Xn +1 -i - Xn ...... 00, rn
{XI}
§5.3. Recursive Prediction of an ARMA(p, q) Process
1 77
ExAMPLE 5.3. 1 (Prediction of an AR(p) Process). Applying (5.3.9) to the
ARMA(p, 1 ) process with = 0, we easily find that
81
n � p.
5.3.2 (Prediction of an MA(q) Process). Applying (5.3.9) to the
ARMA( 1 , q) process with f/J 1 = 0, we obtain
ExAMPLE
,L q) 8 X +1 - +1 ), n 1 ,
n
mi
n
(
1
gn+ j= 1 ni n -j gn -j
where the coefficients enj are found by applying the algorithm (5.2. 1 6) to the
defined in (5.3.5). Since in this case the processes { lt;} and
co variances
{ (J - 1 X,} are identical, these covariances are simply
q-�2...-ji e,e,+ i-j •
=
�
K(i,j)
K(i,j) = (J-2 Yx (i - j) =
ExAMPLE
1 1
r=O
5.3.3 (Prediction of an ARMA ( 1 , 1) Process). If
(5.3. 1 1 )
and I fiJ I < 1, then equations (5.3.9) reduce to the single equation
n
�
(5.3. 1 2)
1.
(J8" 1 8 82 )/(1
To compute
we first use equations (3.3.8) with k = 0 and k = 1 to find
- f/J2). Substituting in (5.3.5) then gives, for
that Yx (O) = 2 ( 1 + 2 ¢J +
i, j � 1 ,
i =j = 1,
i = j � 2,
l i - ji = 1 i
otherwise.
,
�
1,
With these values of K(i,j), the recursions (5.2. 1 6) reduce to
(5.3. 1 3)
which are quite trivial to solve (see Problem 5. 1 3).
for the process
In Table 5.3. 1 we show simulated values of
(5.3. 1 1 ) with Z, N (O, 1 ), f/J 1 = ¢J = 0.2 and = = 0.4. The table also shows
n = 1 , . . . , 1 0, computed from (5.3. 1 3) and the cor­
the values of and
responding predicted values
n = 1, . . . , 1 0, as specified by (5.3. 1 2). Since
(J2 = 1 in this case, the mean squared errors are
rn en 1 ' X +1,
n
�
81 8X1 , ... , X1 0
1 78
5. Prediction of Stationary Processes
Xn
Table 5.3. 1 . Calculation of for Data from
the ARMA(1, 1 ) Process of Example 5.3.3
n
xn+ l
rn
en !
xn+l
0
1
2
3
4
5
6
7
8
9
10
- 1 . 1 00
0.5 1 4
0. 1 1 6
- 0.845
0.872
- 0.467
- 0.977
- 1 .699
- 1 .228
- 1 .093
1 .3750
1 .0436
1 .0067
1 .001 1
1 .0002
1 .0000
1 .0000
1 .0000
1 .0000
1 .0000
1 .0000
0.2909
0.3833
0.3973
0.3996
0.3999
0.4000
0.4000
0.4000
0.4000
0.4000
0
- 0.5340
0.5068
- 0. 1 32 1
- 0.4539
0.7046
- 0.5620
- 0.36 14
- 0.8748
- 0.3869
- 0.5010
ExAMPLE 5.3.4 (Prediction of an ARMA(2, 3) Process). Simulated values
of X 1 , . . . , X 1 0 for the causal ARMA(2, 3) process
X, - X,_ 1 + 0.24X,_ 2 = Z, + 0.4Z,_ 1 + 0.2Z, _ 2 + 0. 1 Z,_3,
{Z, }
�
WN(O, 1 ),
are shown in Table 5.3.2.
In order to find the one-step predictors
n 2, . . . , 1 1 we first need the
covariances Yx(h), h = 0, 1, 2, which are easily found from equations (3.3.8)
with k = 0, 1, 2, to be
Xn,
=
Yx (O) = 7. 1 7 1 33, Yx0) = 6.44 1 39 and Yx(2) = 5.06027.
Substituting in (5.3.5), we find that the symmetric matrix K = [K(i,j)l,j = l , l , . . .
is given by
K=
7. 1 7 1 33
6.44 1 39
5.06027
0. 10
0
0
7. 1 7 1 33
6.44 1 39
0.34
0. 1 0
0
0
7. 1 7 1 33
0.8 1 6
0.34
0. 1 0
0
0
1 .2 1
0.50 1 .2 1
0.24 0.50 1 .2 1
0. 1 0 0.24 0.50 1 .2 1
0
(5.3.14)
The next step is to solve the recursions (5.2. 1 6) with K(i,j) as in (5.3. 14) for enj
and rn_ 1 , j = 1, . . . , n; n = 1 , . . . , 1 0. Then
§5.3. Recursive Prediction of an ARMA(p, q) Process
n
=
1 79
1 , 2,
gn+1 xn - 0.24Xn - 1 + j=L1 8n}Xn + 1 -j - .xn+1-j),
3
n = 3, 4, . . .
=
and
,
The results are shown in Table 5.3.2.
Table 5.3.2. Calculation of
Process of Example 5.3.4
xn+l
n
0
1
2
3
4
5
6
7
8
9
10
11
12
1 .704
0.527
1 .041
0.942
0.555
- 1 .002
- 0.585
0.010
- 0.638
0.525
rn
7. 1 7 1 3
1 .3856
1 .0057
1 .001 9
1 .00 1 6
1 .0005
1 .0000
1 .0000
1 .0000
1 .0000
1 .0000
1 .0000
1 .0000
.Xn + 1 for Data from the ARMA(2, 3)
on!
Onz
(}n3
0.8982
1 .3685
0.4008
0.3998
0.3992
0.4000
0.4000
0.4000
0.4000
0.4000
0.4000
0.4000
0.7056
0. 1 806
0.2020
0. 1995
0. 1 997
0.2000
0.2000
0.2000
0.2000
0.2000
0.2000
0.01 39
0.0722
0.0994
0.0998
0.0998
0.0999
0. 1 000
0. 1 000
0. 1 000
0. 1 000
xn+1
0
1 .5305
- 0. 1 71 0
1 .2428
0.7443
0.3 1 38
- 1 .7293
- 0. 1 688
0.3 193
- 0.873 1
1 .0638
h-Step Prediction of an ARMA(p, q) Process, h � 1
As in Section 5.2 we shall use the notation
Then from (5.2. 1 9) we have
Pn for the projection operator P
£, .
n +h -1
Pn w,+h = j=h
2: en+h- 1 )w,+h-j - W..+h-j)
{
Pn
Pn Xn +h
Using this result and applying the operator to each side of the equations
satisfy
(5.3.2), we conclude that the h-step predictors
Pn Xn+h =
n+ -1
± 8n+h-l,j(Xn+h -i - Xn +h-i), 1 :S: h :S: m - n,
j=h
if= 1 r/J; Pn Xn +h - i + h<:,Lj <:, q en+h-1 )Xn+ h-j - xn +h-j ),
h
>
m - n.
(5.3. 1 5)
180
5. Prediction of Stationary Processes
Once the predictors X 1 , . . . , x. have been computed from (5.3.9), it is a
straightforward calculation, with n fixed, to determine the predictors P. X. +1 ,
P. X. + 3 , . . . , recursively from (5.3. 1 5).
Assuming that n > m, as is invariably the case in practical prediction
problems, we have for h ;:::: 1 ,
P.X.+ 2 ,
(5.3. 1 6)
where the second term is zero if h > q. Expressing Xn+h as Xn+h + (Xn+h xn+h ), we can also write,
( 5.3. 1 7)
where 8. 0
I
for all n . Subtracting (5.3. 1 6) from (5.3. 1 7) gives
p
h- 1
L en+h-l ,j(Xn+h-j - xn+h -),
xn+h - P.Xn+h - L r/>; (Xn+h-i - P.Xn+h-i ) = j=O
i=1
:=
and hence,
(5.3 . 1 8)
where <D and e are the lower triangular matrices,
and
0=
[ 8n+i- J . i-j]�.j=1 (8. 0 := 1 , e.i := 0 if ) > q or j < 0).
From (5.3. 1 8) we immediately find that the covariance matrix of the vector
(X. +1 - P.X.+ 1 , , Xn+h - P. Xn + h )' of prediction errors is
. • •
(5.3. 1 9)
where V = diag(v., v.+ 1 , . . . , vn + h-d· It is not difficult to show (Problem 5.7)
that <D - 1 is the lower triangular matrix
(5.3.20)
<D- 1 = [ X; iJ � i = 1 (Xo := 1 , Xi := 0 if j < 0),
whose components Xi,) ;:::: I, are easily computed from the recursion relations,
-
.
(5.3.2 1 )
j = 1 , 2, . . . .
xj = I1 r/>kxj-k ,
k=
[By writing down the recursion relations for the coefficients in the power series
expansion of 1 /r/>(z) (cf. (3.3.3)), we see in fact that
min(p,j)
I xiz i = (1 - r/>1 z - · · · - r/>p zPr 1 ,
w
j=O
lzl
:'S:
1 .]
�5.3. Recursive Prediction of an ARMA(p, q) Process
181
The mean squared error of the h-step predictor PnXn+ h is then found from
(5.3. 1 9) to be
a; (h) : = E(Xn+h - Pn Xn+h ) 2 =
:� Cto XA+h -r- 1 , j-ry vn+h -j- 1 •
(5.3.22)
Assuming invertibility of the ARMA process, we can let n -+ oo in (5.3. 1 6)
and (5.3.22) to get the large-sample approximations,
p
q
Pn Xn+h � I </J;Pn Xn +h -i + L ej(Xn+h-j - xn+ h-)
j=h
i= 1
(5.3.23)
and
(5.3.24)
where
lzl ::S:: 1 .
EXAMPLE 5.3.5 (Two- and Three-Step Prediction of an ARMA(2, 3) Process).
We illustrate the use of equations (5.3. 1 6) and (5.3.22) by applying them to the
data of Example 5.3.4 (see Table 5.3.2). From (5.3. 1 6) we obtain
3
2
p1 0 x1 2 = I1 </J;P1 0 X l 2 - i + j=L el l . iX ! 2 -j - .1\ 2-)
i=
2
= </J1 X1 1 + </J2 X 1 0 + .2(X 1 0 - X10 ) + . 1 (X9 - X9 )
= 1.1217
and
3
2
P l o X 1 3 = I </J;P!o X ! 3 - i + L e! 2 )X 1 3 -j - x 1 3 -)
j= 3
i= 1
= </J1 P1o X 1 2 + </J2 X 1 1 + . l (X 1 0 - X1o l
= 1 .0062
For k > 1 3, P1 0 Xk is easily found recursively from
P1 o Xk = </J1 P1 0 Xk_ 1 + </J2 P1 o Xk- 2 ·
To find the mean squared error of PnXn+h we apply (5.3.22) with Xo = 1 ,
X1 = </J1 = 1 and X 2 = </J 1 X 1 + </J2 = .76. Using the values of enj and vj ( = rj ) in
Table 5.3.2, we obtain
a fo (2) = E(X 1 2 - P10 X 1 2 ) 2 = 2.960,
and
5. Prediction of Stationary Processes
1 82
If we use the large-sample approximation (5.3.23) and (5.3.24), the predicted
values
and mean squared errors
1 , are unchanged since
the coefficients e.i , j = 1 , 2, 3, and the one-step mean squared errors v. =
have attained their asymptotic values (to four decimal places) when n = 1 0.
P1 0X1 o+h
O"f0(h), h ;::::
r. 0"2
§5.4 Prediction of a Stationary Gaussian Process;
Prediction Bounds
Let { X, } be a zero-mean stationary Gaussian process (see Definition 1 .3.4)
with covariance function
such that
> 0 and
0 as
oo. By
equation (5. 1 .8) the best linear predictor of
in terms of X. =
y(O)X +
nh
y ( ·)
IS
P.Xn +h
y(h) --> (Xh 1-,->. . . , X.)'
h ;:::: 1 . (5.4. 1 )
(The calculation of
is most simply carried out recursively with the aid
of (5.2. 1 9) or, in the case of an ARMA(p, q) process, by using (5.3.1 5).) Since
,
has a multivariate normal distribution, it follows from Problem
2.20 that
(X1 , Xn+h)'
P.Xn +h = E.ff<x , . ... . x"J Xn +h = E(Xn + h i X1 , . . . , X.).
Gaussian process it is clear that the prediction error,
tJ..(h)For:=aXstationary
n+h - P.Xn+ h' is normally distributed with mean zero and variance
O"; (h) = EtJ..(W,
• • •
which can be calculated either from (5.2.20) in the general case, or from (5.3.22)
if {X, } is an ARMA(p, q) process.
Den oting by
the ( 1 - a/2)-quantile of the standard normal distribu­
tion function, we conclude from the observations of the preceding paragraph
±
with probability
that
lies between the bounds
(1 - a). These bounds are therefore called ( 1 - a)-prediction bounds for
xn +h
<l>1_a12
P.Xn +h <l> 1 -aj2 0"n (h)
Xn+h ·
§5.5 Prediction of a Causal Invertible ARMA Process
in Terms of Xi, - oo < j < n
P.Xn +h
{X,}
It is sometimes useful, primarily in order to approximate
for large n,
to determine the projection of
onto .A. = sp { Xi , - oo < j :::::; n }. In this
section we shall consider this problem in the case when
is a causal
invertible ARMA(p, q) process,
Xn+h
(5.5. 1 )
I n order to simplify notation we shall consider n to be a fixed positive integer
§5.5. Prediction of a Causal Invertible ARMA Process
1 83
(=
and define
n).
(5.5.2)
n
Theorem 5.5.1 . X, is the causal invertible ARMA process (5. 5 .1) and X, is
defined by (5.5.2) then
(5.5.3)
L nj Xn+h-j
xn+h
j=l
and
(5.5.4)
xn+h L 1/Jj Zn+h-j •
j=h
where Li=o njzj = ¢(z)/8(z) and L�o 1/Jjzj = 8(z)/¢(z), l z l 1. Moreover
(5.5.5)
PROOF. We know from Theorems 3.1.1 and 3.1. 2 that
(5.5.6)
zn+h = xn+h + Ll nj Xn+h-j
j=
X, := P«" X,
X, for t s
E(Xn+h - Xn+h ) 2 from the following theorem.
We can then determine Xn +h and
2
The quantity E(Xn+h - Xn+h ) is useful for large as an approximation to
E(Xn+h - Pn Xn+h ) z .
If
=
-
co
co
=
s
co
and
= j=O 1/JjZn+h-j·
(5.5.7)
Applying the operator P11" to each side of these equations and using the fact
that Zn+ k is orthogonal to A" for each k 1, we obtain equations (5. 5 . 3) and
(5.5.4). Then subtracting (5.5.4) from (5.5.7) we find that
h -1
(5.5.8)
L 1/Jj Zn+h-j•
xn+h - xn+h = j=O
from which the result (5. 5 . 5 ) follows at once.
D
Remark 1. Equation (5. 5 . 3 ) is the most convenient one for calculation of Xn+h ·
It can be solved recursively for h = 1, 2, 3, ... , using the conditions X, = X,,
t n. Thus
xn+h
co
L
�
s
xn+ l
xn+ 2
co
=
- L nj Xn+ l -j •
j=!
= - n ! Xn+! - j= 2 njxn+ 2 -j•
co
L
etc.
1 84
5. Prediction of Stationary Processes
X�,
For large n, a truncated solution
obtained by setting
in (5.5.3) and solving the resulting equations
L�n +h niXn +h-i = 0
xt = X, t = 1 , . . . , n,
is sometimes used as an approximation to P.Xn + h · This procedure gives
with
etc.
X
E(Xnn+h+h - P.Xn+h)2•
The mean squared error of
as specified by (5.5.5) is also sometimes used
The approximation (5.5.5) is in
as an approximation to
fact simply the large sample approximation (5.3.24) to the exact mean squared
error (5.3.22) of
P. Xn+h ·
Remark 2. For an AR(p) process, equation (5.5.3) leads to the expected result,
X.+ t = </J1 Xn · · · + f/JpXn + t-p•
with mean squared error
+
For an MA(1) process (5.5.3) gives
with mean squared error
E(Xn+t - x.+d2 = a2.
The "truncation" approximation to P.X.+1 for the MA(l ) process is
which may be a poor approximation if I fJ1 1 is near one.
Xn +h - Xn +h • =
h 1 , 2, . . . , are not
Remark 3. For fixed n, the prediction errors
uncorrelated. In fact it is clear from (5.5.8) that the covariance of the h-step
and k-step prediction errors is
for k
(Xn+h - P.Xn +h)
2
h. (5.5.9)
(Xn+k - P.X.+k)
and
The corresponding covariance of
rather more complicated, but it can be found from (5.3 . 1 9).
IS
§5.6.* Prediction in the Frequency Domain
185
§5.6* Prediction in the Frequency Domain
- ,
{X,,
If
t E Z} is a zero-mean stationary process with spectral distribution
function F and associated orthogonal increment process { Z(A.), n ::;; ) ::;; n },
then the mapping I defined by
(5.6. 1 )
I g = 1-n. n] g(A.) dZ(A.),
i s a n isomorphism o f
onto
= sp { X,, t E Z } (see Remark 1 o f Section
4.8), with the property that
( )
L 2 (F)
Yf
t E Z.
(5. 6 .2)
This isomorphism allows us to compute projections (i.e. predictors) in Yf by
computing projections in L 2 (F) and then applying the mapping I. For example
the best linear predictor
in
=
-oo < t ::;;
can
of
be expressed (see Section 2.9) as
P41"Xn+h Xn+h J!tn sp{X,,
P.41" Xn+h - I(Psp{exp( it · ). -co <t :Sn} ei(n+h) · )·
_
n}
(5.6.3)
The calculation of the projection on the right followed by application of the
mapping is illustrated in the following examples.
I
EXAMPLE 5. 6 .1. Suppose that {X,} has mean zero and spectral density f such
that
1 < I A. I ::;; n.
(5.6.4)
f(A.) 0,
It is simple to check that 11 - e - i-< 1 < 1 for A. E = [ - 1, 1 ] and hence that the
series
=
E
[1 - ( 1 - e - i-<)J - 1 = e i-<. Consequently
e i( n+ ! k�Of j�OI (�) ( - 1)iei-<(n-j),
(5.6.5)
converges uniformly on E to
).<
}
=
(5.6.6)
with the series on the right converging uniformly on E. By (5.6.4) the series
(5.6.6) converges also in L2 (F) to a limit which is clearly an element of
sp {exp(it · ), - oo < t ::;; n } . Projecting each side of (5.6.6) onto this subspace,
we therefore obtain
Psp(exp ( it · ). -co < t:Sn} e i(n+ !) · - e i(n +! ) · - kf�O j�O� (k.) ( - 1)ie i(n -j)·. (5.6. 7)
Applying the mapping I to this equation and using (5.6.2) and (5.6.3), we
conclude that
_
_
L.
L.
}
(5.6.8)
1 86
5. Prediction of Stationary Processes
Computation of this predictor using the time-domain methods of Section 5. 1
is a considerably more difficult task.
ExAMPLE 5.6.2 (Prediction of an ARMA Process). Consider the causal inver­
tible ARMA process,
qy (B)X, = 8(B)Z,,
with spectral density
(5.6.9)
where
00
a(A.) (2nr 112 <J k=OL th e- iu
(5.6. 1 0)
and L r'= o t/fk z k 8(z)/¢J(z), l z l 1 . Convergence of the series in (5.6. 1 0) is
uniform on [ - n, n] since, by the causality assumption, L k lt/lk l < oo .
PSiJ{exp( ir · ), <r s n) ei<n+h) · must satisfy
The function g(
f" (ei(n+h).\ - g (A.))e-im.la(A.)a(A.) dA. 0, n. (5.6. 1 1)
This equation implies that (ei <n + h)· - g( · ))a( · )a( · ) is an element of the
subspace A+ = sp {exp(im · ),
n} of U( [ - n, n], , dA.). Noting from
0}, we deduce that the function
1 /a( ) sp{exp( ),
(ei(5.6.<n+1h0))· -thatg(-))a(-)
is also an element of A+ · Let us now write
e i <n + h) .l a ( ) g(A.)a(A.) + (ei<n + h) .l g(A.))a(A.),
(5.6. 1 2)
observing that g(-)a(-) is orthogonal to A+ (in U(d.A.)). But from (5.6. 10),
( 5.6. 1 3)
ei(n+hl.la(A.) (2n)-1 2 <Jein.l k =L-h t/lk+h e- ikA,
and since the element ei<n + h) · a(-) of 2 (dA.) has a unique representation as a
sum of two components, one in A+ and one orthogonal to A+ , we can
=
:S:
=
·) =
- oo
m :S:
=
m >
" E
im ·
),
:?B
m ;:o:
=
_
00
!
=
L
immediately make the identification,
g(A.)a(A.)
=
(2n) - l i2
Using (5.6. 10) again we obtain
<Jein.l kL=O t/lk +h e- iu.
00
g(A.) ei".l[¢J(e- i.l)/8(e-i.l) ] kL=O t/lk +he-iu,
i.e.
g(A.) j=OL rxi ei<n-il-l,
(5.6. 1 4)
where L'f=orxi z i [¢J(z)/8(z)] L r'= o t/lk + h z\ i z l 1 . Applying the mapping I to
each side of (5.6. 14) and using (5.6.2) and (5.6.3), we conclude that
00
=
00
=
=
:S:
§5.7.* The Wold Decomposition
1 87
f$/" Xn+ h
a)
(5.6. 1 5)
L Xn
j=O IJ.j -j·
It is not difficult to check (Problem 5. 1 7) that this result is equivalent to (5.5.4).
=
§5.7* The Wold Decomposition
In Example 5.6. 1 , the values Xn+j,j ;:::. 1, of the process {X,, t E Z} were perfectly
predictable in terms of elements ofA" = sp {X,, -oo <
n}. Such processes
are called deterministic. Any zero-mean stationary process { X,} which is not
deterministic can be expressed as a sum X, = U, + v; of an MA( oo) process
{ U, } and a deterministic process { v;} which is uncorrelated with { U,}. In the
statement and proof of this decomposition (Theorem 5.7. 1 ) we shall use the
notation (J 2 for the one-step mean squared error,
(J2 = E I Xn+1 - PA" Xn+1 l 2 ,
t ::::;
and At - oo for the closed linear subspace,
n = � oo
of the Hilbert space At = sp { X,, E Z}. All subspaces and orthogonal com­
plements should be interpreted as relative to At. For orthogonal subspaces
9'1 and 9'2 we define 9'1 EB //2' := { x + y : E 9;. and y E 9'2 }.
t
x
Remark 1 . The process { X, } is said to be deterministic if and only if (J 2
or equivalently if and only if X, E At for each t (Problem 5. 1 8).
=
0,
-oo
Theorem 5.7.1 (The Wold Decomposition).
as
00
X, = L 1/Jj Zt-j + v;,
j=O
where
(i)
1/10
=
(ii) { Z,}
and j=OL 1/1/ < oo,
a)
1
�
WN(O, (J 2 ),
for each t E Z,
0 for all s, t E Z,
(iv) E(Z,
for each t E Z,
(v) v; E At
(vi) { v; } is deterministic.
(iii) Z, E At,
V,)
and
If (J2 > 0 then X, can be expressed
=
- oo
(5.7. 1 )
5. Prediction of Stationary Processes
1 88
((v) and (vi) are not the same since A is defined in terms of { X,}, not
{ V. } .) The sequences { 1/JJ , { Zj } and { rj } are uniquely determined by (5.7. 1 ) and
the conditions (i)-(vi).
- oo
PROOF. We first show that the sequences defined by
(5.7.2)
(5.7.3)
and
00
J!; = X, - I 1/Jj Zr -j ,
j=O
(5.7.4)
satisfy (5.7.1) and conditions (i)-(vi). The proof is then completed by establish­
ing the uniqueness of the three sequences.
Clearly Z, as defined by (5.7.2) is an element of A, and is orthogonal to
A,_ 1 by the definition of fA, _ , X, . Hence
Z, E A,':_ 1 c A/�2 c · · · ,
which shows that for s < t, E(Z5 Z,) = 0. By Problem 5. 1 9 this establishes (ii)
and (iii). Now by Theorem 2.4.2(ii) we can write
00
(5.7.5)
= I 1/Jj Zr-j ,
j =O
where 1/Jj is defined by (5.7.3) and I� o 1/1} < oo. The coefficients 1/Jj are inde­
pendent of t by stationarity and
psp{Zi, j s; r} Xr
1/10 = a-2 <X,, X, - PA, _ , X, ) = a - 2 I I X, - P.A, _ , X, II 2 = 1 .
Equations (5.7.4) and (5.7.5) and the definition of PS!5{ Zj , j s; r ) X, imply that
< Vr, Zs ) = 0 for s :0:: t.
On the other hand if s > t, Zs E A!--- 1 c A,l- , and since v; E .4t, we conclude that
.
< Vr, Zs ) = O for s > t,
establishing (iv.) To establish (v) and (vi) it will suffice (by Remark 1 ) to show
that
for every t.
(5.7.6)
sp { �,j :0:: t} = A
- oo
Since V. E A, = Ar - 1 EB sp { Z, } and since < v;, Z, ) = 0, we conclude that
J!; E Ar - 1 = A,_ 2 EB sp { Z,_d. But since < J!;, Zr 1 ) = 0 it then follows that
v; E A,_ 2. Continuing with this argument we see that v; E A,_j for each j � 0,
whence v; E n i= o A,_j = A Thus
-
- oo ·
sp { �,j :0:: t}
�
A - oo for every t.
(5. 7. 7)
Now by (5.7.4), A, = sp {Zj,j :O:: t} EB sp{ rj,j :O:: t}. If YEA_00 then Y E A5 _ 1
for every s, so that < Y, Z5 ) = 0 for every s, and consequently Y E sp{ T-j, j :0:: t } .
§5.7.* The Wold Decomposition
But this means that
At
- en
s;
189
sp { �,j :s;: t} for every t,
(5.7.8)
which completes the proof of (5.7.6) and hence of (v) and (vi).
To establish uniqueness we observe from (5. 7. 1) that if { Z, } and { V, }
are any sequences satisfying (5.7. 1 ) and having the properties (i)-(vi), then
Ar - J s; sp { Zj ,j :s;: t - 1 } EB sp { �,j :s;: t - 1 } from which it follows, using (ii)
and (iv), that Z, is orthogonal to Ar - 1 - Projecting each side of (5.7. 1 ) onto
A, _ 1 and subtracting the resulting equation from (5.7. 1 ), we then find that the
process { Z, } must satisfy (5. 7.2). By taking inner products of each side of (5. 7. 1)
with Z, _j we see that 1/Jj must also satisfy (5.7.3). Finally, if (5.7. 1) is to hold, it
is obviously necessary that V, must be defined as in (5.7.4).
D
In the course of the preceding proof we have established a number of results
which are worth collecting together as a corollary.
Corollary 5.7.1
(a) sp { �,j :s;: t} = At for every t.
(b) At, = sp {Zj ,j :s;: t} EB Af_ 00 •
(c)
= sp{Zj ,j E Z}.
(d) sp { Uj ,j :s;: t} sp {Zj ,j :s;: t}, where U, = Li= o t/lj Zr-j ·
-oo
u�tJ-co
=
PROOF.
(a)
(b)
(c)
(d)
This is a restatement of (5.7.6).
Use part (a) together with the relation, A, = sp {Zj ,j :s;: t} EB sp{ �,j :s;: t}.
Observe that At = sp{X, t E Z} = sp{Z,, t E Z} EB Af_ 00 •
This follows from the fact that At, = sp { tj,j :s;: t} EB At - w
D
In view of part (b) of the corollary it is now possible to interpret the
representation (5.7. 1 ) as the decomposition of the subspace A, into two
orthogonal subs paces sp { Z;,j :s;: t} and At- oo ·
A stationary process is said to be purely non-deterministic if and only
if Af_G(, = {0}. In this case the Wold decomposition has no determin­
istic component, and the process can be represented as an MA( oo ), X, =
'[.]= 0 t/lj Zr -j · Many of the time series dealt with in this book (e.g. ARMA
processes) are purely non-deterministic.
Observe that the h-step predictor for the process (5.7. 1) is
00
since Zj j_ ._4f,
error IS
�ff, xr+h = jI. t/lj Zr+h-j + v,+h•
=h
for all j < t, and V,+ h E A, . The corresponding mean squared
190
5. Prediction of Stationary Processes
which should be compared with the result (5.5.5). For a purely non-deterministic
process it is clear that the h-step prediction mean squared error converges as
h -+ oo to the variance of the process. In general we have from part (d) of
Corollary 5.7. 1 ,
psp{Ui ,j <; t } Ut + h = psp{Zi,j<; t } Ut + h = � 1/Jj Zt + h-j ,
]� h
which shows that the h-step prediction error for the { UJ sequence coincides
with that of the { X1 } process. This is not unexpected since the purely deter­
ministic component does not contribute to the prediction error.
00
EXAMPLE 5. 7 .I. Consider the stationary process xt = zt Y, where { zt }
WN (0, ri 2 ), { Z1 } is uncorrelated with the random variable Y and Y has mean
zero and variance ri 2 . Since
+
�
1 n-1
1 n-1
- I xt -j = I zt -j + Y � Y,
n j�o
n j�o
it follows t hat y E Jilt for every t. Also zt ..L J!!s for s < t so zt ..L J!!- oo · Hence
Y = ���- X1 is the deterministic component of the Wold decomposition and
"
zt = xt y is the purely non-deterministic component.
-
-
For a stationary process { X1 } satisfying the hypotheses of Theorem 5. 7. 1 ,
the spectral distribution function Fx i s the sum of two spectral distribution
functions Fu and Fv corresponding to the two components V1 = L � o 1/Ji Zt -i
and f'; appearing in (5.7. 1 ) (see Problem 4.7). From Chapter 4, Fu is absolutely
continuous with respect to Lebesgue measure and has the spectral density
00
where 1/J(e- ;;, ) = L 1/Ji e - ;i;._
(5.7.9)
j�O
On the other hand, the spectral distribution Fv has no absolutely continuous
component (see Doob ( 1953)). Consequently the Wold decomposition of a
stationary process is analogous to the Lebesgue decomposition of the spectral
measure into its absolutely continuous and singular parts. We state this as a
theorem.
Theorem 5.7.2. If ri 2
>
0, then
Fx = Fu + Fv
where Fu and Fv are respectively the absolutely continuous and singular com­
ponents in the Lebesgue decomposition of Fx. The density function associated
with Fu is defined by (5.7.9).
The requirement ri 2 > 0 is critical in the above theorem. In other words it
is possible for a deterministic process to have an absolutely continuous
spectral distribution function. This is illustrated by Example 5.6. 1 . In the next
section, a formula for ri 2 will be given in terms of the derivative of Fx which
§5.8. * Kolmogorov's Formula
191
CJ2
is valid even in the case
= 0. This immediately yields a necessary and
sufficient criterion for a stationary process to be deterministic.
§5.8 * Kolmogorov's Formula
Let {X, } be a real-valued zero-mean stationary process with spectral distri­
bution function Fx and let f denote the derivative of Fx (defined everywhere
on [ - n, n] except possibly on a set of Lebesgue measure zero). We shall
assume, to simplify the proof of the following theorem, that f is continuous
on [ - n, n] and is bounded away from zero. Since { X,} is real, we must have
f(.A.) = f( A) 0 ::;; ). ::;; n. For a general proof, see Hannan ( 1 970) or Ash and
Gardner ( 1 975).
-
,
Theorem 5.8.1 (Kolmogorov's Formula). The one-step mean square prediction
error of the stationary process { X, } is
{_l_f"
CJ2 =2n e x p
2n
- rc
}
ln f(.A.) d.A. .
(5.8. 1)
PROOF. Using a Taylor series expansion of ln(1 - z ) for l z l < 1 and the
identity J"- " e ik .l. d). =0, k # 0, we have for l a l < 1 ,
f
"
ln l l - ae - i.l. l 2 d). =
f}
n(l - ae - i.l.)(l - iiei.l.) d).
(5.8.2)
If {X,} is an AR(p) process satisfying cp(B)X, = Z, where {Z,} WN(O, CJ 2 )
and ¢(z) = 1 - ¢ 1 z - · · · - ¢vzP # 0 for l z l ::;; 1, then {X,} has spectral
density,
= 0.
�
where I a) < 1 , j = 1, . . . , p. Hence
f"
2
f " (J
p f"
(J
2
1n - d.A. - I
1n l 1 - ai e ;. l 2 d.A. =2n 1 n - ,
2n
2n
j= 1
.
establishing Kolmogorov' s formula for causal AR processes.
Under the assumptions made on f, it is clear that min _ rc s ;. 9 /(.A.) > 0.
Moreover, it is easily shown from Corollary 4.4.2 that for any c: E (0, min f(.A.)),
there exist causal AR processes with spectral densities g�l ) and g�2 l such that
- rc
1n g(.A.) d.A. =
- rc
- rc
-
'
(5.8.3)
1 92
5. Prediction of Stationary Processes
a2(f) = E[(Xt - P-sp{Xt - I • · · · • Xr_,l} X )2]
Now define
n
f
C ] , . . . , Cn
=
�i �
J� , ll - c 1 e-i.l - . c. e - i"" l 2f(A.) d).,
.· -
c , c,.
By (5.8.3) and the definition of
a�( · ),
a�(g� :-:::; a�(f) :-:::; a�(g�2l).
Since, by Problem 2. 1 8, a� (f) --> a2 (f) E[(X1 - PSf5{X" <s< t} X1 )2 ],
a2(g�1 l) :-:::; a2(f) :-:::; a2(g�2)).
(5.8.4)
1)
)
_
:=
oo
However we have already established that
a2(g�i)) 2n expL� f ,ln g�i)(A.)dA.} i 1 , 2.
If follows therefore from (5.8.4) that a 2 (f) must equal the common limit, as
--> 0, of a2(g�l l) and a2(g�2l), i.e.
D
a2(f) 2n exp{_l_2n I" In f(A.) dA.}.
Remark Notice that - oo J':.. , In f(A.) dA. < oo since In f(A.) :-:::; f(A.). If
J". , ln f(A.) d A. -oo, the theorem is still true with a 2 = 0. Thus
a2 0 if and only if r, Inf(A.) dA. -oo,
and in this case f(A.) 0 almost everywhere.
=
=
£
=
_,
:-:::;
1.
=
>
>
>
Remark 2. Equation (5.8. 1 ) was first derived by Szego in the absolutely
continous case and was later extended by Kolmogorov to the general case.
In the literature however it is usually referred to as Kolmogorov's formula.
Fu(dA.) a2 dA./2n
Fv(dA.) 0"2 b0(dA.) b0
2n exp {_1_2n J" In (2n0'2 ) dA.} a2.
EXAMPLE 5.8. 1 . For the process defined in Example 5.7. 1 ,
=
=
and
where is the unit mass at the origin. Not surprisingly,
the one-step mean square prediction error is therefore
=
_,
Problems
5. 1 . Let { X1 } be a stationary process with mean Jl. Show that
where { Y; }
=
{X1 - Jl}.
Problems
193
5.2. Suppose that { � ' n = I , 2, . . . } is a sequence of subspaces of a Hilbert space .Yt
with the property that � <;: �+l , n = I , 2, . . . . Let .Yl'00 be the smallest closed
subspace of .Yt containing U �, and let X be an element of .Yt. If PnX and P"' X
are the projections of X onto � and .Yt", respectively, show that
(a) P1 X, (P2 - P1 )X, (P3 - P2 )X, . . . , are orthogonal,
(b) I � l 11 ( � + 1 - � ) X II 2 < 00 ,
and
(c) Pn X --> PCN X.
5.3. Show that the converse of Proposition 5. 1 . 1 is not true by constructing a
stationary process {X, } such that r" is non-singular for all n and y(h) + 0 as
h --> 00 .
5.4. Suppose that { X, } is a stationary process with mean zero and spectral density
-n
::::::; Jc ::::::;
n.
Find the coefficients { 8,i,j = l , . . . , i; i = 1, . . . , 5} and the mean squared errors
{v, , i = 0, . . . , 5}.
5.5. Let {X, } be the MA( l ) process of Example 5.2. 1 . If i O I < I , show that as n --> oo ,
(a) II X" - X" - Zn ll --> 0,
(b) Vn -> IJ 2 ,
and
2
(c) (}n l --> 8. (Note that (} = E(Xn+! Zn)a- and (}n l = v;.\ E(Xn +l (Xn - Xn )).)
5.6. Let {X, } be the invertible M A(q) process
X,
=
Z, + 81 Z, _ 1 + · · ·
+
8qZr-q'
Show that as n --> oo ,
(a) II X" - X" - Zn ll --> 0,
(b) Vn -> IJ2 ,
and that
(c) there exist constants K > 0 and c E (0, I) such that I (}ni - (}i 1
5.7. Verify equations (5.3.20) and (5.3.21 ).
::::::;
Kc" for all n.
5.8. The values .644, - .442, - .9 1 9, - 1 .573, .852, - .907, .686, - .753, - .954, .576,
are simulated values of X1 , . . . , X 1 0 where { X, } is the ARMA(2, I) process,
X, - . I X,_1 - . 1 2X, _2
=
Z, - .7Z, _ 1 ,
{ Z, }
�
WN(O, 1 ).
(a) Compute the forecasts P10X1 1 , P1 0X1 2 and P1 0X1 3 and the corresponding
mean squared errors.
(b) Assuming that Z, N(O, 1 ), construct 95% prediction bounds for X1 1 , X1 2
and x l 3 .
(c) Using the method of Problem 5. 1 5, compute X[1 , X[2 and X[3 and compare
these values with those obtained in (a).
[The simulated values of X1 1 , X1 2 and X1 3 were in fact .074, 1 .097 and - . 1 87
respectively.]
�
5.9. Repeat parts (a)-( c) of Problem 5.8 for the simulated values - 1 .222, 1 .707, .049,
1 .903, - 3.341, 3.041, - 1 .0 1 2, - .779, 1 .837, - 3.693 of X 1 , . . . , X1 0 , where {X, }
is the MA(2) process
X, = Z, - l . I Z,_1 + .28Z,_ 2 ,
{ Z, }
�
WN(O, 1 ).
5. Prediction of Stationary Processes
194
[The simulated values of X1 1 , X1 2 and X1 3 in this case were 3.995, - 3.859
3.746.]
5.10. If {X I ' . . . ' xn } are observations of the AR(p) process,
{ Z, }
�
WN(0, 0" 2 ),
show that the mean squared error of the predictor PnXn+ h is
h- 1
for n � p, h � 1 ,
O"; (hJ = 0"2 2: 1/lf
j �O
where 1/J(z) = L � o lj11z1 1/1/J(z). This means that the asymptotic approximation
(5.3.24) is exact for an autoregressive process when n � p.
=
5. 1 1 . Use the model defined in Problem 4. 1 2 to find the best linear predictors of the
Wolfer sunspot numbers X 1 0 1 , . . . , X 1 05 (being careful to take into account the
non-zero mean of the series). Assuming that the series is Gaussian, find 95%
prediction bounds for each value. (The observed values of X 10 1 , . . . , X 1 05 are in
fact 1 39, I l l , 1 02, 66, 45.) How do the predicted values P1 00X1 oo+ h and their
mean squared errors behave for large h?
5. 1 2. Let { X, } be the ARMA(2, 1 ) process,
and let
X, - . 5 X, _ 1 + .25 X, _ 2 = Z, + 2Z,_ 1 ,
{
Y.
X,,
t ::s; 2,
'_
X, - .5X,_1 + .25X,_ 2 ,
{Z, }
�
WN(O, 1 ),
t > 2.
(a) Find the covariance matrix of ( Y1 , Y2 , Y3 )' and hence find the coefficients e 1 1
and e2 1 in the representations
;\\ = e l l (x l - x l ),
+
e2 1 (X z - X2 ).
(b) Use the mean squared errors of the predictors X 1 , X2 and X 3 to evaluate the
determinant of the covariance matrix of (X1 , X2 , X3 )'.
(c) Find the limits as n --> oo of the coefficients en! and of the one-step mean­
square prediction errors vn.
(d) Given that X199 = 6.2, X 2 00 = - 2.2 and X2 00 = .5, use the limiting values
found in (c) to compute the best predictor x2 0 1 and its mean squared error.
2
(e) What is the value of limh-oo E(Xn +h - PnXn+ h ) ?
5. 1 3. The coefficients enJ and one-step mean squared errors vn = rn0"2 can be deter­
mined for the general causal ARMA(1 , 1 ) process (5.3 . 1 1 ) by solving the equations
(5.3. 1 3 ) as follows:
(a) Show that if Yn := rn/(rn - !), then the last of the equations (5.3. 1 3), can be
rewritten in the form,
n � I.
Yn = o - 2Yn - t + I ,
n
l
2
(b) Deduce that Yn e - 2 Yo + L i� l e - (}- ) and hence determine rn
and on ! ' n = 1 , 2, . . . .
(c) Evaluate the limits as n --> oo ofrn and On1 in the two cases 1 e 1 < 1 and 1 e 1 � 1 .
x3 = .5Xz - .25X l
=
195
Problems
5. 14. Let {X, } be the MA( l ) process
x, = z, + oz, _ l ,
{Z, }
�
WN(0, 0"2)
with 1 0 1 < I .
(a) Show that vn := E I Xn +l - xn+l i 2 = 0'2( 1 - 82"+4)/(1 - 82" + 2).
(b) If X�+ I = - Li= I ( - wx n + 1 - j is the truncation approximation to PnXn + I '
show that E I X n + 1 - X�+ 1 1 2 = ( I + 82" + 2)0'2 and compare this value with
vn for 1 11 1 near one.
5.15. Let {X, } be a causal invertible ARMA(p, q) process
r/i (B)X, = &(B)Z,,
Given the sample {X1,
Z,* =
{0
. • .
, Xn }, we define
if t :-:::; 0 or t >
r/i (B)X, - 111 Z,*_1 - • • · - Bq Z,*_q if t = 1 , . . . , n,
n,
where we set X, = 0 for t <::; 0.
(a) Show that r/J(B)X, = B(B)Z,* for all t <::; n (with the understanding that
X, = 0 for t :-:::; 0) and hence that Z,* = n(B)X, where n(z) = r/J(z)/B(z).
(b) If x;;+ l = - LJ= I njXn+ l -j is the truncation approximation to PnXn + l (see
Remark I in Section 5.5), show that
(c) Generalize (b) to show that for all h 2 1
where Xl = xj ifj = 1 , . . . , n.
5.16.* Consider the process X, = A cos(Bt + U), t = 0, ± 1, . . . , where A, B and U are
random variables such that (A, B) and U are independent, and U is uniformly
distributed on (0, 2n).
(a) Show that {X,} is stationary and determine its mean and covariance function.
(b) Show that the joint distribution of A and B can be chosen in such a way that
{ X, } has the autocovariance function of the MA(1) process, Y, = Z, + &Z,_1 ,
{Z, } WN(0, 0"2), 101 :<:; 1.
(c) Suppose that A and B have the joint distribution found in (b) and let X,*+.h
and X, + h be the best and best linear predictors respectively of X,+ h in terms
of {Xi, - oo < j <::; t }. Find the mean squared errors of X,*+.h and X,+h ' h 2 2.
�
5. 1 7.* Check that equation (5.6. 1 5) is equivalent to (5.5.4).
5. 1 8. * If 0'2 is the one-step mean-square prediction error for a stationary process { X,}
show that 0'2 = 0 if and only if X, E At for every t.
- oo
5. 1 9. * Suppose that {X, } is a stationary process with mean zero. Define Jt, = sp {X.,
s :-:::; t } and z, = X, - PAt,_1X,.
(a) Show that 0'2 = E I X,+1 - PAt, X, + l l 2 does not depend on t.
(b) Show that t/Ji = E(X,Z,_i)/0'2 does not depend on t.
5. Prediction of Stationary Processes
196
5.20.* Let { Y, } be the MA( l ) process,
{Z, } WN(O, ri2 ),
Z, + 2.5Zt - 1 ,
and define X, A cos(wt) + B sin(wt) + Y, where A and B are uncorrelated (0,
rifj random variables which are uncorrelated with { Y, } .
(a) Show that { X, } i s non-deterministic.
(b) Determine the Wold decomposition of { X, } .
(c) What are the components o f the spectral distribution function o f { X , } cor­
responding to the deterministic and purely non-deterministic components
of the Wold decomposition?
Y,
=
�
=
5.21.* Let {X_n, n = 0, ±1, . . . } be a stationary Markov chain with states ±1 and transition probabilities P(X_{n+1} = j | X_n = i) = p if i = j, (1 − p) if i ≠ j. Find the white noise sequence {Z_n} and coefficients a_j such that
    Z_n ∈ sp{X_t, −∞ < t ≤ n}
and
    X_n = Σ_{j=0}^{∞} a_j Z_{n−j},    n = 0, ±1, . . . .
5.22. Suppose that
    X_t = A cos(πt/3) + B sin(πt/3) + Z_t + .5Z_{t−1},    t = 0, ±1, . . . ,
where {Z_t} ~ WN(0, 1), A and B are uncorrelated random variables with mean zero, variance 4 and E(AZ_t) = E(BZ_t) = 0, t = 0, ±1, . . . . Find the best linear predictor of X_{t+1} based on X_t and X_{t−1}. What is the mean squared error of the best linear predictor of X_{t+1} based on {X_j, −∞ < j ≤ t}?
5.23. * Let { X, } be the moving average
where �k
=
G)Ci: )
j=
- oo
k .
(a) Find the spectral density of {X_t}.
(b) Is the process purely non-deterministic, non-deterministic, or deterministic?
5.24.* If the zero-mean stationary process {X_n} has autocovariance function
    γ(h) = 1 if h = 0,    γ(h) = ρ if h ≠ 0,    where 0 < ρ < 1,
(a) show that the mean square limit as n → ∞ of n^{-1} Σ_{j=1}^{n} X_j exists,
(b) show that X_n can be represented as
    X_n = Z + Y_n,
where Z and {Y_j, j = 0, ±1, . . . } are zero-mean uncorrelated random variables with EZ² = ρ and EY_j² = 1 − ρ, j = 0, ±1, . . . ,
(c) find the spectral distribution function of {X_n},
(d) determine the components in the Wold decomposition of X_n, and
(e) find the mean squared error of the one-step predictor P_{sp{X_j, −∞ < j ≤ n}} X_{n+1}.
5.25. Suppose that {U_t} and {V_t} are two stationary processes having the same autocovariance functions. Without appealing to Kolmogorov's formula, show that the two processes have the same one-step mean-square prediction errors.
5.26.* Under the assumptions made in our proof of Kolmogorov's formula (Theorem 5.8.1), show that the mean squared error of the two-step linear predictor X̂_{t+2} := P_{sp{X_j, −∞ < j ≤ t}} X_{t+2} is σ²(1 + ψ_1²), with σ² as in (5.8.1).
5.27. Let {X_t} be the causal AR(1) process
    X_t − φX_{t−1} = Z_t,    {Z_t} ~ WN(0, σ²),
and let X̂_{n+1} be the best linear predictor of X_{n+1} based on X_1, . . . , X_n. Defining θ_{n0} = 1 and X̂_1 = 0, find θ_{n1}, . . . , θ_{nn} such that
    X_{n+1} = Σ_{j=0}^{n} θ_{nj} (X_{n+1−j} − X̂_{n+1−j}).
5.28.* Suppose that X_t = Σ_{j=0}^{∞} ψ_j Z_{t−j}, {Z_t} ~ WN(0, 1) and Σ_{j=0}^{∞} ψ_j² < ∞. Show that the h-step mean-square prediction error,
    σ²(h) := E(X_{t+h} − P_{sp{X_s, −∞ < s ≤ t}} X_{t+h})²,
satisfies
    σ²(h) ≥ ψ_0² + · · · + ψ_{h−1}².
Conclude that {X_t} is purely non-deterministic.
CHAPTER 6*
Asymptotic Theory
In order to carry out statistical inference for time series it is necessary to be
able to derive the distributions of various statistics used for the estimation of
parameters from the data. For finite n the exact distribution of such a statistic
f_n(X_1, . . . , X_n) is usually (even for Gaussian processes) prohibitively complicated. In such cases, we can still however base the inference on large-sample
approximations to the distribution of the statistic in question. The mathe­
matical tools for deriving such approximations are developed in this chapter.
A comprehensive treatment of asymptotic theory is given in the book of
Serfling ( 1 980). Chapter 5 of the book by Billingsley ( 1986) is also strongly
recommended.
§6. 1 Convergence in Probability
We first define convergence in probability and the related order concepts
which, as we shall see, are closely analogous to their deterministic counter­
parts. With these tools we can then develop convergence in probability ana­
logues of Taylor expansions which will be used later to derive the large-sample
asymptotic distributions of estimators of our time series parameters.
Let {a_n, n = 1, 2, . . . } be a sequence of strictly positive real numbers and let {X_n, n = 1, 2, . . . } be a sequence of random variables all defined on the same probability space.

Definition 6.1.1 (Convergence in Probability to Zero). We say that X_n converges in probability to zero, written X_n = o_p(1) or X_n →_P 0, if for every ε > 0,
    P(|X_n| > ε) → 0 as n → ∞.
Definition 6.1.2 (Boundedness in Probability). We say that the sequence {X_n} is bounded in probability (or tight), written X_n = O_p(1), if for every ε > 0 there exists δ(ε) ∈ (0, ∞) such that
    P(|X_n| > δ(ε)) < ε for all n.

The relation between these two concepts is clarified by the following equivalent characterization of convergence in probability to zero, viz. X_n = o_p(1) if and only if for every ε > 0 there exists a sequence δ_n(ε) ↓ 0 such that
    P(|X_n| > δ_n(ε)) < ε for all n
(see Problem 6.3). The definitions should also be compared with their non-random counterparts, viz. x_n = o(1) if x_n → 0 and x_n = O(1) if {x_n} is bounded.

Definition 6.1.3 (Convergence in Probability and Order in Probability).
(i) X_n converges in probability to the random variable X, written X_n →_P X, if and only if X_n − X = o_p(1).
(ii) X_n = o_p(a_n) if and only if a_n^{-1} X_n = o_p(1).
(iii) X_n = O_p(a_n) if and only if a_n^{-1} X_n = O_p(1).

Notice that if we drop the subscripts p in Definitions 6.1.3 (ii) and (iii) we recover the usual definitions of o(·) and O(·) for non-random sequences. In fact most of the rules governing the manipulation of o(·) and O(·) carry over to o_p(·) and O_p(·). In particular we have the following results.
Proposition 6.1.1. If X_n and Y_n, n = 1, 2, . . . , are random variables defined on the same probability space and a_n > 0, b_n > 0, n = 1, 2, . . . , then
(i) if X_n = o_p(a_n) and Y_n = o_p(b_n), we have
    X_n Y_n = o_p(a_n b_n),
    X_n + Y_n = o_p(max(a_n, b_n)),
and
    |X_n|^r = o_p(a_n^r)    for r > 0;
(ii) if X_n = o_p(a_n) and Y_n = O_p(b_n), we have
    X_n Y_n = o_p(a_n b_n).
Moreover
(iii) the statement (i) remains valid if o_p is everywhere replaced by O_p.

PROOF. (i) If |X_n Y_n|/(a_n b_n) > ε then either |Y_n|/b_n ≤ 1 and |X_n|/a_n > ε, or |Y_n|/b_n > 1 and |X_n Y_n|/(a_n b_n) > ε. Hence
    P(|X_n Y_n|/(a_n b_n) > ε) ≤ P(|X_n|/a_n > ε) + P(|Y_n|/b_n > 1) → 0 as n → ∞.
If |X_n + Y_n|/max(a_n, b_n) > ε then either |X_n|/a_n > ε/2 or |Y_n|/b_n > ε/2. Hence
    P(|X_n + Y_n|/max(a_n, b_n) > ε) ≤ P(|X_n|/a_n > ε/2) + P(|Y_n|/b_n > ε/2) → 0 as n → ∞.
For the last part of (i) we simply observe that
    P(|X_n|^r/a_n^r > ε) = P(|X_n|/a_n > ε^{1/r}) → 0 as n → ∞.
Parts (ii) and (iii) are left as exercises for the reader.  □
The Definitions 6.1.1-6.1.3 extend in a natural way to sequences of random vectors. Suppose now that {X_n, n = 1, 2, . . . } is a sequence of random vectors, all defined on the same probability space and such that X_n has k components X_{n1}, X_{n2}, . . . , X_{nk}, n = 1, 2, . . . .

Definition 6.1.4 (Order in Probability for Random Vectors).
(i) X_n = o_p(a_n) if and only if X_{nj} = o_p(a_n), j = 1, . . . , k.
(ii) X_n = O_p(a_n) if and only if X_{nj} = O_p(a_n), j = 1, . . . , k.
(iii) X_n converges in probability to the random vector X, written X_n →_P X, if and only if X_n − X = o_p(1).

Convergence in probability of X_n to X can also be conveniently characterized in terms of the Euclidean distance |X_n − X| = [Σ_{j=1}^{k} (X_{nj} − X_j)²]^{1/2}.
Proposition 6.1.2. X_n − X = o_p(1) if and only if |X_n − X| = o_p(1).

PROOF. If X_n − X = o_p(1) then for each ε > 0, lim_{n→∞} P(|X_{nj} − X_j|² > ε/k) = 0 for each j = 1, . . . , k. But
    P(Σ_{j=1}^{k} |X_{nj} − X_j|² > ε) ≤ Σ_{j=1}^{k} P(|X_{nj} − X_j|² > ε/k),    (6.1.1)
since Σ_{j=1}^{k} |X_{nj} − X_j|² > ε implies that at least one summand exceeds ε/k. Since the right side of (6.1.1) converges to zero so too does the left side, and hence |X_n − X|² = o_p(1). By Proposition 6.1.1 this implies that |X_n − X| = o_p(1).
Conversely if |X_n − X| = o_p(1) we have |X_{nj} − X_j|² ≤ |X_n − X|², whence
    P(|X_{nj} − X_j| > ε) ≤ P(|X_n − X|² > ε²) → 0.  □

Proposition 6.1.3. If X_n − Y_n →_P 0 and Y_n →_P Y then X_n →_P Y.

PROOF. |X_n − Y| ≤ |X_n − Y_n| + |Y_n − Y| = o_p(1), by Propositions 6.1.1 and 6.1.2.  □

Proposition 6.1.4. If {X_n} is a sequence of k-dimensional random vectors such that X_n →_P X and if g : ℝ^k → ℝ^m is a continuous mapping, then g(X_n) →_P g(X).
PROOF. Let K be a positive real number. Then given any ε > 0 we have
    P(|g(X_n) − g(X)| > ε) ≤ P(|g(X_n) − g(X)| > ε, |X| ≤ K, |X_n| ≤ K) + P({|X| > K} ∪ {|X_n| > K}).
Since g is uniformly continuous on {x : |x| ≤ K}, there exists γ(ε) > 0 such that for all n,
    {|g(X_n) − g(X)| > ε, |X| ≤ K, |X_n| ≤ K} ⊆ {|X_n − X| > γ(ε)}.
Hence
    P(|g(X_n) − g(X)| > ε) ≤ P(|X_n − X| > γ(ε)) + P(|X| > K) + P(|X_n| > K)
                          ≤ P(|X_n − X| > γ(ε)) + P(|X| > K) + P(|X| > K/2) + P(|X_n − X| > K/2).
Now given any δ > 0 we can choose K to make the second and third terms each less than δ/4. Then since |X_n − X| →_P 0, the first and fourth terms will each be less than δ/4 for all n sufficiently large. Consequently g(X_n) →_P g(X).  □
Taylor Expansions in Probability
If g is continuous at a and X_n = a + o_p(1), then the argument of Proposition 6.1.4 tells us that g(X_n) = g(a) + o_p(1). If we strengthen the assumptions on g to include the existence of derivatives, then it is possible to derive probabilistic analogues of the Taylor expansions of non-random functions about a given point a. Some of these analogues which will be useful in deriving asymptotic distributions are given below.

Proposition 6.1.5. Let {X_n} be a sequence of random variables such that X_n = a + O_p(r_n), where a ∈ ℝ and 0 < r_n → 0 as n → ∞. If g is a function with s derivatives at a, then
    g(X_n) = Σ_{j=0}^{s} [g^{(j)}(a)/j!] (X_n − a)^j + o_p(r_n^s),
where g^{(j)} is the j-th derivative of g and g^{(0)} = g.
PROOF. Let
    h(x) = [g(x) − Σ_{j=0}^{s} (g^{(j)}(a)/j!)(x − a)^j] / [(x − a)^s/s!],    x ≠ a,
and h(a) = 0. Then the function h is continuous at a so that h(X_n) = h(a) + o_p(1). This implies that h(X_n) = o_p(1) and so, by Proposition 6.1.1 (ii), h(X_n)(X_n − a)^s = o_p(r_n^s), which proves the result.  □
EXAMPLE 6.1.1. Suppose that {X_t} ~ IID(μ, σ²) with μ > 0. If X̄_n = n^{-1} Σ_{t=1}^{n} X_t, then by Chebychev's inequality (see Proposition 6.2.1 below),
    P(n^{1/2}|X̄_n − μ| > ε) ≤ σ² ε^{-2},
and hence
    X̄_n − μ = O_p(n^{-1/2}).
Since ln x has a derivative at μ, the conditions of Proposition 6.1.5 are satisfied and we therefore obtain the expansion
    ln X̄_n = ln μ + μ^{-1}(X̄_n − μ) + o_p(n^{-1/2}).
We conclude this section with a multivariate analogue of Proposition 6.1.5.

Proposition 6.1.6. Let {X_n} be a sequence of random k × 1 vectors such that
    X_n − a = O_p(r_n),
where a ∈ ℝ^k and r_n → 0 as n → ∞. If g is a function from ℝ^k into ℝ such that the derivatives ∂g/∂x_i are continuous in a neighborhood N(a) of a, then
    g(X_n) = g(a) + Σ_{i=1}^{k} (∂g/∂x_i)(a)(X_{ni} − a_i) + o_p(r_n).

PROOF. From the usual Taylor expansion for a function of several variables (see for example Seeley (1970), p. 160) we have, as x → a,
    g(x) = g(a) + Σ_{i=1}^{k} (∂g/∂x_i)(a)(x_i − a_i) + o(|x − a|).
Defining
    h(x) = [g(x) − g(a) − Σ_{i=1}^{k} (∂g/∂x_i)(a)(x_i − a_i)] / |x − a|,    x ≠ a,
and h(a) = 0, we deduce that h is continuous at a and hence that h(X_n) = o_p(1) as n → ∞. By Proposition 6.1.1 this implies that h(X_n)|X_n − a| = o_p(r_n), which proves the result.  □
§6.2 Convergence in r-th Mean, r > 0

Mean square convergence was introduced in Section 2.7 where we discussed the space L² of square integrable random variables on a probability space (Ω, ℱ, P). In this section we consider a generalization of this concept, convergence in r-th mean, and discuss some of its properties. It reduces to mean-square convergence when r = 2.

Definition 6.2.1 (Convergence in r-th Mean, r > 0). The sequence of random variables {X_n} is said to converge in r-th mean to X, written X_n →_r X, if
    E|X_n − X|^r → 0 as n → ∞.
Proposition 6.2.1 (Chebychev's Inequality). If E|X|^r < ∞, r ≥ 0 and ε > 0, then
    P(|X| ≥ ε) ≤ ε^{-r} E|X|^r.

PROOF.
    P(|X| ≥ ε) = P(|X|^r ε^{-r} ≥ 1)
              ≤ E[|X|^r ε^{-r} I_{[1,∞)}(|X|^r ε^{-r})]
              ≤ ε^{-r} E|X|^r.  □
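As a small numerical check of Chebychev's inequality (a sketch in Python assuming numpy; the t distribution used here is just an illustrative choice), the empirical tail probability never exceeds the bound ε^{-r}E|X|^r:

import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_t(df=5, size=10**6)          # any distribution with E|X|^r finite
for r in (1, 2):
    moment = np.mean(np.abs(x) ** r)          # sample estimate of E|X|^r
    for eps in (1.0, 2.0, 3.0):
        tail = np.mean(np.abs(x) >= eps)      # empirical P(|X| >= eps)
        print(f"r={r}, eps={eps}: P(|X|>=eps) = {tail:.4f} <= bound {moment / eps**r:.4f}")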
The following three propositions provide useful connections between the behaviour of moments and order in probability.

Proposition 6.2.2. If X_n →_r X then X_n →_P X.

PROOF. By Chebychev's inequality we have for any ε > 0,
    P(|X_n − X| > ε) ≤ ε^{-r} E|X_n − X|^r → 0 as n → ∞.  □

Proposition 6.2.3. If a_n > 0, n = 1, 2, . . . , and E(X_n²) = O(a_n²), then X_n = O_p(a_n).

PROOF. Applying Chebychev's inequality again, we have for any M > 0,
    P(a_n^{-1}|X_n| > M) ≤ a_n^{-2} E|X_n|²/M² ≤ C/M²,
where C = sup_n (a_n^{-2} E|X_n|²) < ∞. Defining δ(ε) = 2(C/ε)^{1/2} if C > 0 and any positive constant if C = 0, we see from Definition 6.1.2 that a_n^{-1}|X_n| = O_p(1).  □

Proposition 6.2.4. If EX_n → μ and Var(X_n) → 0, then X_n →_2 μ (and X_n →_P μ by Proposition 6.2.2).

PROOF.
    E|X_n − μ|² = Var(X_n) + (EX_n − μ)² → 0 as n → ∞.  □
§6.3 Convergence in Distribution

The statements X_n →_r X and X_n →_P X are meaningful only when the random variables X, X_1, X_2, . . . , are all defined on the same probability space. The notion of convergence in distribution however depends only on the distribution functions of X, X_1, X_2, . . . , and is meaningful even if X, X_1, X_2, . . . , are all defined on different probability spaces. We shall show in Proposition 6.3.2 that convergence in distribution of a sequence {X_n} is implied by convergence in probability. We begin with a definition.

Definition 6.3.1 (Convergence in Distribution). The sequence {X_n} of random k-vectors with distribution functions {F_{X_n}(·)} is said to converge in distribution if there exists a random k-vector X such that
    lim_{n→∞} F_{X_n}(x) = F_X(x) for all x ∈ C,    (6.3.1)
where C is the set of continuity points of the distribution function F_X(·) of X. If (6.3.1) holds we shall say that X_n converges in distribution to X. Such convergence will be denoted by X_n ⇒ X or F_{X_n} ⇒ F_X.

If X_n ⇒ X then the distribution of X_n can be well approximated for large n by the distribution of X. This observation is extremely useful since F_X is often easier to compute than F_{X_n}.
A proof of the equivalence of the following characterizations of convergence
in distribution can be found in Billingsley ( 1986), Chapter 5.
Theorem 6.3.1 (Characterizations of Convergence in Distribution). If F_0, F_1, F_2, . . . are distribution functions on ℝ^k with corresponding characteristic functions φ_n(t) = ∫_{ℝ^k} exp(it'x) dF_n(x), n = 0, 1, 2, . . . , then the following statements are equivalent:
(i) F_n ⇒ F_0,
(ii) ∫_{ℝ^k} g(x) dF_n(x) → ∫_{ℝ^k} g(x) dF_0(x) for every bounded continuous function g,
(iii) lim_{n→∞} φ_n(t) = φ_0(t) for every t = (t_1, . . . , t_k)' ∈ ℝ^k.

Proposition 6.3.1 (The Cramer-Wold Device). Let {X_n} be a sequence of random k-vectors. Then X_n ⇒ X if and only if λ'X_n ⇒ λ'X for all λ = (λ_1, . . . , λ_k)' ∈ ℝ^k.

PROOF. First assume that X_n ⇒ X. Then for any fixed λ ∈ ℝ^k, Theorem 6.3.1 (iii) gives
    φ_{λ'X_n}(t) = φ_{X_n}(tλ) → φ_X(tλ) = φ_{λ'X}(t),
showing that λ'X_n ⇒ λ'X.
Now suppose that λ'X_n ⇒ λ'X for each λ ∈ ℝ^k. Then using Theorem 6.3.1 again, we have for any λ ∈ ℝ^k,
    φ_{X_n}(λ) = E exp(iλ'X_n) = φ_{λ'X_n}(1) → φ_{λ'X}(1) = φ_X(λ),
which shows that X_n ⇒ X.  □
Remark 1. If X" => X then the Cramer-Wold device with Jci = 1 and Jci = 0,
j # i, shows at once that xni => xi where xni and xi are the i'h components of
X" and X respectively. If on the other hand Xni => Xi for each i, then it is not
necessarily true that X" => X (see Problem 6.8).
Proposition 6.3.2. If X_n →_P X then
(i) E|exp(it'X_n) − exp(it'X)| → 0 as n → ∞ for every t ∈ ℝ^k, and
(ii) X_n ⇒ X.

PROOF. Given t ∈ ℝ^k and ε > 0, choose δ(ε) > 0 such that
    |exp(it'x) − exp(it'y)| = |1 − exp(it'(y − x))| < ε if |x − y| < δ.    (6.3.2)
We then have
    E|exp(it'X_n) − exp(it'X)| = E|1 − exp(it'(X_n − X))|
        = E[|1 − exp(it'(X_n − X))| I_{{|X_n − X| < δ}}] + E[|1 − exp(it'(X_n − X))| I_{{|X_n − X| ≥ δ}}].
The first term is less than ε by (6.3.2) and the second term is bounded above by 2P(|X_n − X| ≥ δ), which goes to zero as n → ∞ since X_n →_P X. This proves (i).
To establish the result (ii) we first note that
    |E exp(it'X_n) − E exp(it'X)| ≤ E|exp(it'X_n) − exp(it'X)| → 0 as n → ∞,
and then use Theorem 6.3.1 (iii).  □
Proposition 6.3.3. If {X_n} and {Y_n} are two sequences of random k-vectors such that X_n − Y_n = o_p(1) and X_n ⇒ X, then Y_n ⇒ X.

PROOF. By Theorem 6.3.1 (iii), it suffices to show that
    |φ_{Y_n}(t) − φ_{X_n}(t)| → 0 as n → ∞ for each t ∈ ℝ^k,    (6.3.3)
since then
    |φ_{Y_n}(t) − φ_X(t)| ≤ |φ_{Y_n}(t) − φ_{X_n}(t)| + |φ_{X_n}(t) − φ_X(t)| → 0.
But
    |φ_{Y_n}(t) − φ_{X_n}(t)| = |E(exp(it'Y_n) − exp(it'X_n))| ≤ E|1 − exp(it'(X_n − Y_n))| → 0 as n → ∞,
by Proposition 6.3.2.  □
Proposition 6.3.4. If {X_n} is a sequence of random k-vectors such that X_n ⇒ X and if h : ℝ^k → ℝ^m is a continuous mapping, then h(X_n) ⇒ h(X).

PROOF. For a fixed t ∈ ℝ^m, e^{it'h(x)} is a bounded continuous function of x, so that by Theorem 6.3.1 (ii), φ_{h(X_n)}(t) → φ_{h(X)}(t). Theorem 6.3.1 (iii) then implies that h(X_n) ⇒ h(X).  □
In the special case when {X_n} converges in distribution to a constant random vector b, it is also true that {X_n} converges in probability to b, as shown in the following proposition. (Notice that convergence in probability to b is meaningful even when X_1, X_2, . . . , are all defined on different probability spaces.)

Proposition 6.3.5. If X_n ⇒ b where b is a constant k-vector, then X_n →_P b.

PROOF. We first prove the result for random variables (i.e. in the case k = 1). If X_n ⇒ b then F_{X_n}(x) → I_{[b,∞)}(x) for all x ≠ b. Hence for any ε > 0,
    P(|X_n − b| ≤ ε) = P(b − ε ≤ X_n ≤ b + ε) → I_{[b,∞)}(b + ε) − I_{[b,∞)}(b − ε) = 1,
showing that X_n →_P b.
To establish the result in the general case, k ≥ 1, we observe that if X_n ⇒ b then X_{nj} ⇒ b_j for each j = 1, . . . , k by Remark 1. From the result of the preceding paragraph we deduce that X_{nj} →_P b_j for each j = 1, . . . , k and hence by Definition 6.1.4 that X_n →_P b.  □
Proposition 6.3.6 (The Weak Law of Large Numbers). If {X_n} is an iid sequence of random variables with a finite mean μ, then
    X̄_n →_P μ,
where X̄_n := (X_1 + · · · + X_n)/n.

PROOF. Since X̄_n − μ = ((X_1 − μ) + · · · + (X_n − μ))/n, it suffices to prove the result for zero-mean sequences. Assuming that μ = 0, and using the independence of X_1, X_2, . . . , we have
    φ_{X̄_n}(t) = E e^{itX̄_n} = (φ_{X_1}(n^{-1}t))^n.
From the inequality |1 − y^n| ≤ n|1 − y|, |y| ≤ 1, and the assumption that EX_1 = 0 it follows that
    |1 − φ_{X̄_n}(t)| ≤ n|1 − φ_{X_1}(n^{-1}t)|
                   = n|E(1 + itn^{-1}X_1 − e^{itn^{-1}X_1})|
                   ≤ E|n(1 + itn^{-1}X_1 − e^{itn^{-1}X_1})|.
A Taylor series approximation to cos x and sin x then gives
    |1 + iy − e^{iy}| = |1 + iy − cos y − i sin y| ≤ |1 − cos y| + |y − sin y| ≤ min(2|y|, |y|²)
for all real y. Replacing y by tn^{-1}x in this bound we see that for every x,
    |n(1 + itn^{-1}x − e^{itn^{-1}x})| ≤ 2|t||x|,    n = 1, 2, . . . ,
and
    |n(1 + itn^{-1}x − e^{itn^{-1}x})| → 0 as n → ∞.
Since E|X_1| < ∞ by assumption, E|n(1 + itn^{-1}X_1 − e^{itn^{-1}X_1})| → 0 by the dominated convergence theorem. Hence φ_{X̄_n}(t) → 1 for every t, and since 1 is the characteristic function of the zero random variable we conclude from Theorem 6.3.1 (iii) and Proposition 6.3.5 that X̄_n →_P 0.  □
Proposition 6.3.7. If {X_n} and {Y_n} are sequences of random k- and m-vectors respectively and if X_n ⇒ X and Y_n ⇒ b where b is a constant vector, then
    [X_n', Y_n']' ⇒ [X', b']'.    (6.3.4)
(Note that (6.3.4) is not necessarily true if Y_n converges in distribution to a non-constant random vector.)

PROOF. If we define Z_n = [X_n', b']', then from Proposition 6.3.5 we have Z_n − [X_n', Y_n']' = o_p(1). It is clear that Z_n ⇒ [X', b']' and so (6.3.4) follows from Proposition 6.3.3.  □
Proposition 6.3.8. If {X_n} and {Y_n} are sequences of random k-vectors such that X_n ⇒ X and Y_n ⇒ b where b is constant, then
(i) X_n + Y_n ⇒ X + b, and
(ii) Y_n'X_n ⇒ b'X.
The next proposition will prove to be very useful in establishing asymptotic
normality of the sample mean and sample autocovariance function for a wide
class of time series models.
Proposition 6.3.9. Let X_n, n = 1, 2, . . . , and Y_{nj}, j = 1, 2, . . . ; n = 1, 2, . . . , be random k-vectors such that
(i) Y_{nj} ⇒ Y_j as n → ∞ for each j = 1, 2, . . . ,
(ii) Y_j ⇒ Y as j → ∞, and
(iii) lim_{j→∞} lim sup_{n→∞} P(|X_n − Y_{nj}| > ε) = 0 for every ε > 0.
Then
    X_n ⇒ Y as n → ∞.
PROOF. By Theorem 6.3.1, it suffices to show that for each t ∈ ℝ^k,
    |φ_{X_n}(t) − φ_Y(t)| → 0 as n → ∞.
The triangle inequality gives the bound
    |φ_{X_n}(t) − φ_Y(t)| ≤ |φ_{X_n}(t) − φ_{Y_{nj}}(t)| + |φ_{Y_{nj}}(t) − φ_{Y_j}(t)| + |φ_{Y_j}(t) − φ_Y(t)|.    (6.3.5)
From (iii) it follows, by an argument similar to the proof of Proposition 6.3.2 (i), that lim sup_{n→∞} |φ_{X_n}(t) − φ_{Y_{nj}}(t)| → 0 as j → ∞. Assumption (ii) guarantees that the last term in (6.3.5) also goes to zero as j → ∞. For any positive δ we can therefore choose j so that the upper limits as n → ∞ of the first and third terms on the right side of (6.3.5) are both less than δ/2. For this fixed value of j, lim_{n→∞} |φ_{Y_{nj}}(t) − φ_{Y_j}(t)| = 0 by assumption (i). Consequently lim sup_{n→∞} |φ_{X_n}(t) − φ_Y(t)| ≤ δ/2 + δ/2 = δ, and since δ was chosen arbitrarily, lim sup_{n→∞} |φ_{X_n}(t) − φ_Y(t)| = 0 as required.  □
Proposition 6.3.10 (The Weak Law of Large Numbers for Moving Averages). Let {X_t} be the two-sided moving average
    X_t = Σ_{j=−∞}^{∞} ψ_j Z_{t−j},
where {Z_t} is iid with mean μ and Σ_{j=−∞}^{∞} |ψ_j| < ∞. Then
    X̄_n →_P (Σ_{j=−∞}^{∞} ψ_j) μ.
(Note that the variance of Z_t may be infinite.)

PROOF. First note that the series Σ_{j=−∞}^{∞} ψ_j Z_{t−j} converges absolutely with probability one since Σ_j |ψ_j| E|Z_{t−j}| < ∞. Now for each j, we have from the weak law of large numbers,
    n^{-1} Σ_{t=1}^{n} Z_{t−j} →_P μ.
Hence, if Y_{nk} := Σ_{|j|≤k} ψ_j (n^{-1} Σ_{t=1}^{n} Z_{t−j}), it follows from Proposition 6.1.4 that
    Y_{nk} →_P Σ_{|j|≤k} ψ_j μ.
If we define Y_k = (Σ_{|j|≤k} ψ_j) μ then, since Y_k → Y := (Σ_{j=−∞}^{∞} ψ_j) μ, it suffices to show by Proposition 6.3.9 that
    lim_{k→∞} lim sup_{n→∞} P(|X̄_n − Y_{nk}| > ε) = 0 for every ε > 0.    (6.3.6)
Applying Proposition 6.2.1 with r = 1, we have
    P(|X̄_n − Y_{nk}| > ε) = P(|n^{-1} Σ_{t=1}^{n} Σ_{|j|>k} ψ_j Z_{t−j}| > ε)
                         ≤ E|Σ_{|j|>k} ψ_j Z_{1−j}| / ε
                         ≤ (Σ_{|j|>k} |ψ_j| E|Z_1|) / ε,
which implies (6.3.6).  □
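A minimal simulation sketch of Proposition 6.3.10 (in Python, assuming numpy; the geometric coefficients and exponential noise are arbitrary illustrative choices) shows X̄_n settling down to (Σ_j ψ_j)μ:

import numpy as np

rng = np.random.default_rng(2)
mu = 1.5
psi = 0.6 ** np.abs(np.arange(-10, 11))        # truncated two-sided coefficients psi_j = 0.6^{|j|}
target = psi.sum() * mu
for n in (10**3, 10**4, 10**5):
    z = rng.exponential(scale=mu, size=n + len(psi))   # iid noise with mean mu
    x = np.convolve(z, psi, mode="valid")[:n]          # X_t = sum_j psi_j Z_{t-j}
    print(n, abs(x.mean() - target))                   # shrinks towards 0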
§6.4 Central Limit Theorems and Related Results
Many of the estimators used in time series analysis turn out to be asymp­
totically normal as the number of observations goes to infinity. In this section
we develop some of the standard techniques to be used for establishing
asymptotic normality.
Definition 6.4.1. A sequence of random variables {X_n} is said to be asymptotically normal with "mean" μ_n and "standard deviation" σ_n if σ_n > 0 for n sufficiently large and
    σ_n^{-1}(X_n − μ_n) ⇒ Z, where Z ~ N(0, 1).
In the notation of Serfling (1980) we shall write this as
    X_n is AN(μ_n, σ_n²).

Remark 1. If X_n is AN(μ_n, σ_n²) it is not necessarily the case that μ_n = EX_n or that σ_n² = Var(X_n). See Example 6.4.1 below.

Remark 2. In order to prove that X_n is AN(μ_n, σ_n²) it is often simplest to establish the result in the equivalent form (see Theorem 6.3.1 (iii)),
    φ_{Z_n}(t) → exp(−t²/2),
where φ_{Z_n}(·) is the characteristic function of Z_n = σ_n^{-1}(X_n − μ_n). This approach
works especially well when X_n is a sum of independent random variables as in the following theorem.

Theorem 6.4.1 (The Central Limit Theorem). If {X_n} ~ IID(μ, σ²) and X̄_n = (X_1 + · · · + X_n)/n, then
    X̄_n is AN(μ, σ²/n).

PROOF. Define the iid sequence {Y_t} with mean zero and variance one by Y_t = (X_t − μ)/σ and set Ȳ_n = n^{-1} Σ_{i=1}^{n} Y_i. By Remark 2, it suffices to show that φ_{n^{1/2}Ȳ_n}(t) → e^{-t²/2}. By independence, we have
    φ_{n^{1/2}Ȳ_n}(t) = E exp(itn^{-1/2} Σ_{j=1}^{n} Y_j) = [φ_{Y_1}(tn^{-1/2})]^n.
First we need the inequality |x^n − y^n| ≤ n|x − y| for |x| ≤ 1 and |y| ≤ 1, which can be proved easily by induction on n. This implies that for n ≥ t²/4,
    |[φ_{Y_1}(tn^{-1/2})]^n − (1 − t²/(2n))^n| ≤ n|φ_{Y_1}(tn^{-1/2}) − (1 − t²/(2n))|
        = n|E(e^{itn^{-1/2}Y_1} − (1 + itn^{-1/2}Y_1 − t²Y_1²/(2n)))|.    (6.4.1)
Using a Taylor series expansion of e^{ix} in a neighborhood of x = 0 we have
    n|e^{itn^{-1/2}x} − (1 + itn^{-1/2}x − t²x²/(2n))| → 0 as n → ∞
and
    n|e^{itn^{-1/2}x} − (1 + itn^{-1/2}x − t²x²/(2n))| ≤ (tx)² for all n and x.
Thus, by the dominated convergence theorem, the right-hand side of (6.4.1) converges to zero as n → ∞, and since (1 − t²/(2n))^n → e^{-t²/2} we obtain φ_{n^{1/2}Ȳ_n}(t) → e^{-t²/2} as required.  □
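The conclusion of Theorem 6.4.1 is easy to check numerically; the following sketch (Python, assuming numpy; exponential data chosen only for illustration) compares the empirical distribution function of n^{1/2}(X̄_n − μ)/σ with the standard normal:

import math
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 50_000
x = rng.exponential(1.0, size=(reps, n))       # iid with mu = 1 and sigma = 1
zn = np.sqrt(n) * (x.mean(axis=1) - 1.0)       # standardized sample means

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in (-1.5, 0.0, 1.5):
    print(z, np.mean(zn <= z), std_normal_cdf(z))   # empirical vs N(0,1) cdf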
Remark 3. The assumption of identical distributions in Theorem 6.4.1 can be replaced by others such as the Lindeberg condition (see Billingsley, 1986), which is a restriction on the truncated variances of the random variables X_n. However the assumptions of Theorem 6.4.1 will suffice for our purposes.

Proposition 6.4.1. If X_n is AN(μ, σ_n²) where σ_n → 0 as n → ∞, and if g is a function which is differentiable at μ, then
    g(X_n) is AN(g(μ), g'(μ)² σ_n²).

PROOF. Since Z_n = σ_n^{-1}(X_n − μ) ⇒ Z where Z ~ N(0, 1), we may conclude from Problem 6.7 that Z_n = O_p(1) as n → ∞. Hence X_n = μ + O_p(σ_n). By Proposition 6.1.5 we therefore have
    σ_n^{-1}[g(X_n) − g(μ)] = σ_n^{-1} g'(μ)[X_n − μ] + o_p(1),
which with Proposition 6.3.3 proves the result.  □
EXAMPLE 6.4.1. Suppose that {X_n} ~ IID(μ, σ²) where μ ≠ 0 and 0 < σ < ∞. If X̄_n = n^{-1}(X_1 + · · · + X_n) then by Theorem 6.4.1,
    X̄_n is AN(μ, σ²/n),
and by Proposition 6.4.1,
    X̄_n^{-1} is AN(μ^{-1}, μ^{-4}σ²/n).
Depending on the distribution of X_n, it is possible that the mean of X̄_n^{-1} may not exist (see Problem 6.17).
We now extend the notion of asymptotic normality to random k-vectors, k ≥ 1. Recall from Proposition 1.5.5 that X is multivariate normal if and only if every linear combination λ'X is univariate normal. This fact, in conjunction with the Cramer-Wold device, motivates the following definition (see Serfling (1980)) of asymptotic multivariate normality.

Definition 6.4.2. The sequence {X_n} of random k-vectors is asymptotically normal with "mean vector" μ_n and "covariance matrix" Σ_n if
(i) Σ_n has no zero diagonal elements for all sufficiently large n, and
(ii) λ'X_n is AN(λ'μ_n, λ'Σ_nλ) for every λ ∈ ℝ^k such that λ'Σ_nλ > 0 for all sufficiently large n.

Proposition 6.4.2. If X_n is AN(μ_n, Σ_n) and B is any non-zero m × k matrix such that the matrices BΣ_nB', n = 1, 2, . . . , have no zero diagonal elements, then
    BX_n is AN(Bμ_n, BΣ_nB').

PROOF. Problem 6.21.  □
The following proposition is the multivariate analogue of Proposition 6.4.1.

Proposition 6.4.3. Suppose that X_n is AN(μ, c_n²Σ) where Σ is a symmetric non-negative definite matrix and c_n → 0 as n → ∞. If g(x) = (g_1(x), . . . , g_m(x))' is a mapping from ℝ^k into ℝ^m such that each g_i(·) is continuously differentiable in a neighborhood of μ, and if DΣD' has all of its diagonal elements non-zero, where D is the m × k matrix [(∂g_i/∂x_j)(μ)], then
    g(X_n) is AN(g(μ), c_n²DΣD').

PROOF. First we show that X_{nj} = μ_j + O_p(c_n). Applying Proposition 6.4.2 with B = (δ_{j1}, δ_{j2}, . . . , δ_{jk}), we find that X_{nj} = BX_n is AN(μ_j, c_n²σ_{jj}), where σ_{jj} is the j-th diagonal element of Σ and σ_{jj} > 0 by Definition 6.4.2. Since c_n^{-1}(X_{nj} − μ_j) converges in distribution we may conclude that it is bounded in probability (Problem 6.7) and hence that X_{nj} = μ_j + O_p(c_n).
Now applying Proposition 6.1.6 we can write, for i = 1, . . . , m,
    g_i(X_n) − g_i(μ) = Σ_{j=1}^{k} (∂g_i/∂x_j)(μ)(X_{nj} − μ_j) + o_p(c_n),
or equivalently,
    g(X_n) − g(μ) = D(X_n − μ) + o_p(c_n).
Dividing both sides by c_n we obtain
    c_n^{-1}[g(X_n) − g(μ)] = c_n^{-1} D(X_n − μ) + o_p(1),
and since c_n^{-1}D(X_n − μ) is AN(0, DΣD'), we conclude from Proposition 6.3.3 that the same is true of c_n^{-1}[g(X_n) − g(μ)].  □
EXAMPLE 6.4.2 (The Sample Coefficient of Variation). Suppose that {X_n} ~ IID(μ, σ²), σ > 0, EX_n⁴ = μ_4 < ∞, EX_n³ = μ_3, EX_n² = μ_2 = μ² + σ² and EX_n = μ_1 = μ ≠ 0. The sample coefficient of variation is defined as Y_n = s_n/X̄_n, where X̄_n = n^{-1}(X_1 + · · · + X_n) and s_n² = n^{-1} Σ_{i=1}^{n} (X_i − X̄_n)². It is easy to verify (Problem 6.22) that
    [X̄_n, n^{-1} Σ_{i=1}^{n} X_i²]' is AN([μ_1, μ_2]', n^{-1}Σ),    (6.4.2)
where Σ is the matrix with components
    σ_{ij} = μ_{i+j} − μ_i μ_j,    i, j = 1, 2.
Now Y_n = g(X̄_n, n^{-1} Σ_{i=1}^{n} X_i²) where g(x, y) = x^{-1}(y − x²)^{1/2}. Applying Proposition 6.4.3 with D = [(∂g/∂x)(μ_1, μ_2), (∂g/∂y)(μ_1, μ_2)], we find at once that
    Y_n is AN(σ/μ, n^{-1}DΣD').
We shall frequently have need for a central limit theorem which applies to sums of dependent random variables. It will be sufficient for our purposes to have a theorem which applies to m-dependent strictly stationary sequences, defined as follows.

Definition 6.4.3 (m-Dependence). A strictly stationary sequence of random variables {X_t} is said to be m-dependent (where m is a non-negative integer) if for each t the two sets of random variables {X_j, j ≤ t} and {X_j, j ≥ t + m + 1} are independent.

Remark 4. In checking for m-dependence of a strictly stationary sequence {X_t, t = 0, ±1, ±2, . . . } it is clearly sufficient to check the independence of the two sets {X_j, j ≤ 0} and {X_j, j ≥ m + 1}, since they have the same joint distributions as {X_j, j ≤ t} and {X_j, j ≥ t + m + 1} respectively.

Remark 5. The property of m-dependence generalizes that of independence in a natural way. Observations of an m-dependent process are independent provided they are separated in time by more than m time units. In the special case when m = 0, m-dependence reduces to independence. The MA(q) processes introduced in Section 3.1 are m-dependent with m = q.
The following result, due originally to Hoeffding and Robbins (1948), extends the classical central limit theorem (Theorem 6.4.1) to m-dependent sequences.

Theorem 6.4.2 (The Central Limit Theorem for Strictly Stationary m-Dependent Sequences). If {X_t} is a strictly stationary m-dependent sequence of random variables with mean zero and autocovariance function γ(·), and if v_m = γ(0) + 2 Σ_{j=1}^{m} γ(j) ≠ 0, then
(i) lim_{n→∞} n Var(X̄_n) = v_m, and
(ii) X̄_n is AN(0, v_m/n).

PROOF. (i) n Var(X̄_n) = n^{-1} Σ_{i=1}^{n} Σ_{j=1}^{n} γ(i − j)
                      = Σ_{|j|<n} (1 − n^{-1}|j|) γ(j)
                      = Σ_{|j|≤m} (1 − n^{-1}|j|) γ(j) for n > m
                      → v_m as n → ∞.
(ii) For each integer k such that k > 2m, let Y_{nk} = n^{-1/2}[(X_1 + · · · + X_{k−m}) + (X_{k+1} + · · · + X_{2k−m}) + · · · + (X_{(r−1)k+1} + · · · + X_{rk−m})], where r = [n/k], the integer part of n/k. Observe that n^{1/2}Y_{nk} is a sum of r iid random variables each having mean zero and variance
    R_{k−m} = Var(X_1 + · · · + X_{k−m}) = Σ_{|j|<k−m} (k − m − |j|) γ(j).
Applying the central limit theorem (Theorem 6.4.1), we have
    Y_{nk} ⇒ Y_k where Y_k ~ N(0, k^{-1}R_{k−m}).
Moreover, since k^{-1}R_{k−m} → v_m as k → ∞, we may conclude (Problem 6.16) that
    Y_k ⇒ Y where Y ~ N(0, v_m).
It remains only to show that
    lim_{k→∞} lim sup_{n→∞} P(|n^{1/2}X̄_n − Y_{nk}| > ε) = 0 for every ε > 0,    (6.4.3)
since the second conclusion of the theorem will then follow directly from Proposition 6.3.9.
In order to establish (6.4.3) we write (n^{1/2}X̄_n − Y_{nk}) as a sum of r = [n/k] independent terms, viz.
    n^{1/2}X̄_n − Y_{nk} = n^{-1/2} Σ_{j=1}^{r−1} (X_{jk−m+1} + X_{jk−m+2} + · · · + X_{jk}) + n^{-1/2}(X_{rk−m+1} + · · · + X_n).
Making use of this independence and the stationarity of {X_t}, we find that
    Var(n^{1/2}X̄_n − Y_{nk}) = n^{-1}[([n/k] − 1)R_m + R_{h(n)}],
where R_m = Var(X_1 + · · · + X_m), R_{h(n)} = Var(X_1 + · · · + X_{h(n)}) and h(n) = n − k[n/k] + m. Now R_m is independent of n and R_{h(n)} is a bounded function of n since 0 ≤ h(n) ≤ k + m. Hence lim sup_{n→∞} Var(n^{1/2}X̄_n − Y_{nk}) = k^{-1}R_m, and so by Chebychev's inequality condition (6.4.3) is satisfied.  □

Remark 6. Recalling Definition 6.4.1, we see that the condition v_m ≠ 0 is essential for conclusion (ii) of Theorem 6.4.2 to be meaningful. In cases where v_m = 0 it is not difficult to show that n^{1/2}X̄_n →_P 0 and n Var(X̄_n) → 0 as n → ∞ (see Problem 6.6). The next example illustrates this point.
EXAMPLE 6.4.3. The strictly stationary MA(1) process,
    X_t = Z_t − Z_{t−1},    {Z_t} ~ IID(0, σ²),
is m-dependent with m = 1, and
    v_m = γ(0) + 2γ(1) = 0.
For this example X̄_n = n^{-1}(Z_n − Z_0), which shows directly that nX̄_n ⇒ Z_1 − Z_0, n^{1/2}X̄_n →_P 0 and n Var(X̄_n) → 0 as n → ∞.

EXAMPLE 6.4.4 (Asymptotic Behaviour of X̄_n for the MA(q) Process with Σ_{j=0}^{q} θ_j ≠ 0). The MA(q) process,
    X_t = Σ_{j=0}^{q} θ_j Z_{t−j},    {Z_t} ~ IID(0, σ²),    θ_0 = 1,
is a q-dependent strictly stationary sequence with
    v_q = Σ_{|j|≤q} γ(j) = σ² (Σ_{j=0}^{q} θ_j)² = 2πf(0),
where f(·) is the spectral density of {X_t} (see Theorem 4.4.2). A direct application of Theorem 6.4.2 shows that
    X̄_n is AN(0, v_q/n).    (6.4.4)
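A short simulation sketch of Example 6.4.4 (Python, assuming numpy; the MA(2) coefficients are an arbitrary choice with Σθ_j ≠ 0) confirms that n Var(X̄_n) approaches v_q = σ²(Σθ_j)² = 2πf(0):

import numpy as np

rng = np.random.default_rng(4)
theta = np.array([1.0, 0.4, -0.3])      # theta_0, theta_1, theta_2
sigma2 = 1.0
vq = sigma2 * theta.sum() ** 2          # v_q = 2*pi*f(0)
n, reps, q = 500, 20_000, len(theta) - 1
z = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n + q))
x = np.zeros((reps, n))
for j, th in enumerate(theta):          # X_t = sum_j theta_j Z_{t-j}
    x += th * z[:, q - j : q - j + n]
print("n*Var(sample mean) =", n * x.mean(axis=1).var(), "   v_q =", vq)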
Problems
6.1. Show that a finite set of random variables {X_1, . . . , X_n} is bounded in probability.
6.2. Prove parts (ii) and (iii) of Proposition 6.1.1.
6.3. Show that X_n = o_p(1) if and only if for every ε > 0 there exists a sequence δ_n(ε) ↓ 0 such that P(|X_n| > δ_n(ε)) < ε for all n.
6.4. Let X_1, X_2, . . . , be iid random variables with distribution function F. If M_n := max(X_1, . . . , X_n) and m_n := min(X_1, . . . , X_n), show that M_n/n →_P 0 if x(1 − F(x)) → 0 as x → ∞, and m_n/n →_P 0 if xF(−x) → 0 as x → ∞.
6.5. If X_n = O_p(1), is it true that there exists a subsequence {X_{n_k}} and a constant K ∈ (0, ∞) such that P(|X_{n_k}| < K, k = 1, 2, . . . ) = 1?
6.6. Let {X_t} be a stationary process with mean zero and an absolutely summable autocovariance function γ(·) such that Σ_{h=−∞}^{∞} γ(h) = 0. Show that n Var(X̄_n) → 0 and hence that n^{1/2}X̄_n →_P 0.
6.7. If {X. } is a sequence of random variables such that X. = X, show that {X. } is
also bounded in probability.
6.8. Give an example of two sequences of random variables { X., n = 0, 1, . . . } and
{ Y,, n = 0, 1 , . . . } such that x. = X0 and Y, = Y0 while (X., Y,)' does not converge
in distribution.
6.9. Suppose that the random vectors X. and Y. are independent for each n and
that X. = X and Y. = Y. Show that [X�, Y�]' = [X', Y']' where X and Y are
independent.
6. 1 0. Show that if x. = X, Y, = Y and X. is independent of Y, for each n, then
x. + Y. = X + Y where X and Y are independent.
6. 1 1 . Let {X. } be a sequence of random variables such that EX. = m and Var(X.) =
a} > 0 for all n, where a; -> 0 as n -> oo . Define
z. a.- 1 (X. - m),
=
and let f be a function with non-zero derivative f'(m) at m.
(a) Show that z. = Op( 1 ) and x. = m + ov( 1).
(b) If Y, = [f(X.) - f(m)]/[aJ'(m)], show that Y. - z. = ov( l).
(c) Show that if z. converges in probability or in distribution then so does Y,.
(d) If s. is binomially distributed with parameters n and p, and f ' ( p ) i= 0, use the
preceding results to determine the asymptotic distribution of f(S./n).
6. 1 2. Suppose that x. is AN(11, a; ) where a; -> 0. Show that x. !. 11.
6. 1 3. Suppose that x. is AN(/l, a;) and Y, = a. + ov(a.). If a./a. -> c, where 0 < c <
show that (X. - �I)/ Y. is AN(O, c2).
oo ,
, x.m l' = N (O, I:) and 1:. !. I: where I: is non-singular, show that
z
X � I:; x. = X (m).
6. 14. If X. = (X. 1 ,
t
. • .
6. 1 5. If x. is AN(/l., u;), show that
(a) x. is AN(ji., a; ) if and only if a.;a. -> 1 and (fi. - 11. )/a. -> 0, and
(b) a" X" + b" is AN(11"' 0'; ) if and only if a" -> 1 and (11n (a" - 1) + bn)/O'n -> 0.
(c) If X" is AN(n, 2n), show that ( 1 - n-1 )X" is AN(n, 2n) but that ( 1 - n- 1 12 )X"
is not AN(n, 2n).
6. 1 6. Suppose that xn - N ( l1n> vn) where l1n -> 11, Vn -> v and 0 < v <
X" => X, where X - N ( 11, v).
00 .
Show that
6. 1 7. Suppose that { X, } - IID(11, 0' 2 ) where 0 < 0'2 < oo. If X" = n- 1 (X + . . . + X")
1
has a probability density function f(x) which is continuous and positive at x 0,
show that E I Xn- 1 1 = oo. What is the limit distribution of g"- t when 11 = 0?
=
6. 1 8. If X 1 , X2 , . . . , are iid normal random variables with mean 11 and variance 0'2 ,
find the asymptotic distributions of x; (n- 1 I 1= t Xj) 2
(a) when 11 # 0, and
(b) when 11 = 0.
=
6.19. Define
In + (x) =
{
ln(x) if x > 0,
0
X :S: 0.
If X" is AN(11, (J; ) where 11 > 0 and (J" -> 0, show that In + (X") is AN(ln(11), 11- 2 0'/ ).
6.20. Let f(x) 3x- 2 - 2x - 3 for x # 0. If Xn is AN( ! , 0'; ) find the limit distribution
of (f(X") - 1 )/0'� assuming that 0 < 0'" -> 0.
=
6.2 1 . Prove Proposition 6.4.2.
6.22. Verify (6.4.2) in Example 6.4.2. If 11
#
0, what is the limit distribution of n - 112 Y,?
6.23. Let X 1 , X 2 ,
, be iid positive stable random variables with support [0, oo ),
exponent :X E (O, 1 ) and scale parameter c 1 1• where c > 0. This means that
• • •
Ee -ex ,
=
exp( - cO"),
0 ;::: 0.
The parameters c and :x can be estimated by solving the two "moment" equations
n
n - 1 I e -e , x,
j= t
where 0 < 0 1 < 02 , for c and
estimators.
:x.
=
exp( - cOf),
i = 1, 2,
Find the asymptotic joint distribution of the
6.24. Suppose {Z_t} ~ IID(0, σ²).
(a) For h ≥ 1 and k ≥ 1, show that Z_tZ_{t+h} and Z_sZ_{s+k} are uncorrelated for all s ≠ t, s ≥ 1, t ≥ 1.
(b) For a fixed h ≥ 1, show that
    σ^{-2} n^{-1/2} Σ_{t=1}^{n} (Z_tZ_{t+1}, . . . , Z_tZ_{t+h})' ⇒ (N_1, . . . , N_h)',
where N_1, N_2, . . . , N_h are iid N(0, 1) random variables. (Note that the sequence {Z_tZ_{t+h}, t = 1, 2, . . . } is h-dependent and is also WN(0, σ⁴).)
(c) Show that for each h ≥ 1,
    n^{-1/2} (Σ_{t=1}^{n} Z_tZ_{t+h} − Σ_{t=1}^{n−h} (Z_t − Z̄_n)(Z_{t+h} − Z̄_n)) →_P 0,
where Z̄_n = n^{-1}(Z_1 + · · · + Z_n).
(d) Noting by the weak law of large numbers that n^{-1} Σ_{t=1}^{n} Z_t² →_P σ², conclude from (b) and (c) that
    n^{1/2}(ρ̂(1), . . . , ρ̂(h))' ⇒ (N_1, . . . , N_h)',
where ρ̂(·) is the sample autocorrelation function of Z_1, . . . , Z_n.
CHAPTER 7
Estimation of the Mean and the
Autocovariance Function
If {X_t} is a real-valued stationary process, then from a second-order point of view it is characterized by its mean μ and its autocovariance function γ(·). The estimation of μ, γ(·) and the autocorrelation function ρ(·) = γ(·)/γ(0) from observations of X_1, . . . , X_n therefore plays a crucial role in problems of inference and in particular in the problem of constructing an appropriate model for the data. In this chapter we consider several estimators which will be used and examine some of their properties.

§7.1 Estimation of μ

A natural unbiased estimator of the mean μ of the stationary process {X_t} is the sample mean
    X̄_n = n^{-1}(X_1 + · · · + X_n).    (7.1.1)
We first examine the behavior of the mean squared error E(X̄_n − μ)² for large n.
Theorem 7.1.1. If {X_t} is stationary with mean μ and autocovariance function γ(·), then as n → ∞,
    Var(X̄_n) = E(X̄_n − μ)² → 0 if γ(n) → 0,
and
    nE(X̄_n − μ)² → Σ_{h=−∞}^{∞} γ(h) if Σ_{h=−∞}^{∞} |γ(h)| < ∞.
PROOF.
    n Var(X̄_n) = n^{-1} Σ_{i,j=1}^{n} Cov(X_i, X_j)
               = Σ_{|h|<n} (1 − |h|/n) γ(h),
so that Var(X̄_n) ≤ n^{-1} Σ_{|h|<n} |γ(h)|. If γ(n) → 0 as n → ∞ then lim_{n→∞} n^{-1} Σ_{|h|<n} |γ(h)| = 2 lim_{n→∞} |γ(n)| = 0, whence Var(X̄_n) → 0. If Σ_{h=−∞}^{∞} |γ(h)| < ∞ then the dominated convergence theorem gives
    lim_{n→∞} n Var(X̄_n) = lim_{n→∞} Σ_{|h|<n} (1 − |h|/n) γ(h) = Σ_{h=−∞}^{∞} γ(h).  □

Remark 1. If Σ_{h=−∞}^{∞} |γ(h)| < ∞, then {X_t} has a spectral density f(·) and, by Corollary 4.3.2,
    n Var(X̄_n) → Σ_{h=−∞}^{∞} γ(h) = 2πf(0).
Remark 2. If X_t = μ + Σ_{j=−∞}^{∞} ψ_j Z_{t−j} with Σ_{j=−∞}^{∞} |ψ_j| < ∞, then Σ_{h=−∞}^{∞} |γ(h)| < ∞ (see Problem 3.9) and
    n Var(X̄_n) → Σ_{h=−∞}^{∞} γ(h) = 2πf(0) = σ² (Σ_{j=−∞}^{∞} ψ_j)².

Remark 3. Theorem 7.1.1 shows that if γ(n) → 0 as n → ∞, then X̄_n converges in mean square (and hence in probability) to the mean μ. Moreover under the stronger condition Σ_{h=−∞}^{∞} |γ(h)| < ∞ (which is satisfied by all ARMA(p, q) processes), Var(X̄_n) ≈ n^{-1} Σ_{h=−∞}^{∞} γ(h). This suggests that under suitable conditions it might be true that X̄_n is AN(μ, n^{-1} Σ_{h=−∞}^{∞} γ(h)). One set of assumptions which guarantees the asymptotic normality is given in the next theorem.
Theorem 7.1.2. If {X_t} is the stationary process,
    X_t = μ + Σ_{j=−∞}^{∞} ψ_j Z_{t−j},    {Z_t} ~ IID(0, σ²),
where Σ_{j=−∞}^{∞} |ψ_j| < ∞ and Σ_{j=−∞}^{∞} ψ_j ≠ 0, then
    X̄_n is AN(μ, n^{-1}v),
where v = Σ_{h=−∞}^{∞} γ(h) = σ² (Σ_{j=−∞}^{∞} ψ_j)², and γ(·) is the autocovariance function of {X_t}.

PROOF. See Section 7.3.  □
Theorem 7.1.2 is useful for finding approximate large-sample confidence intervals for μ. If the process {X_t} is not only stationary but also Gaussian, then from the second line of the proof of Theorem 7.1.1 we can go further and write down the exact distribution of X̄_n for finite n, viz.
    X̄_n ~ N(μ, n^{-1} Σ_{|h|<n} (1 − |h|/n) γ(h)),
a result which gives exact confidence bounds for μ if γ(·) is known, and approximate bounds if it is necessary to estimate γ(·) from the observations.
Although we have concentrated here on X̄_n as an estimator of μ, there are other possibilities. If for example we assume a particular model for the data such as φ(B)(X_t − μ) = θ(B)Z_t, then it is possible to compute the best linear unbiased estimator μ̂_n of μ in terms of X_1, . . . , X_n (see Problem 7.2). However even with this more elaborate procedure, there is little to be gained asymptotically as n → ∞, since it can be shown (see Grenander and Rosenblatt (1957), Section 7.3) that for processes {X_t} with piecewise continuous spectral densities (and in particular for ARMA processes)
    lim_{n→∞} n Var(μ̂_n) = lim_{n→∞} n Var(X̄_n).
We shall use the simple estimator X̄_n.
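As an illustration (a sketch in Python assuming numpy; the AR(1) model and its parameter values are hypothetical choices), Theorem 7.1.2 gives the approximate 95% confidence bounds X̄_n ± 1.96(v/n)^{1/2}, where for an AR(1) with coefficient φ one has v = Σ_h γ(h) = σ²/(1 − φ)²:

import numpy as np

rng = np.random.default_rng(5)
phi, sigma2, mu, n = 0.6, 1.0, 10.0, 400
v = sigma2 / (1.0 - phi) ** 2           # sum_h gamma(h) = 2*pi*f(0) for the AR(1)

z = rng.normal(0.0, np.sqrt(sigma2), size=n + 200)
y = np.zeros(n + 200)
for t in range(1, n + 200):             # simulate the AR(1), discarding a burn-in of 200
    y[t] = phi * y[t - 1] + z[t]
x = mu + y[200:]

half = 1.96 * np.sqrt(v / n)
print("approximate 95%% CI for mu: (%.3f, %.3f)" % (x.mean() - half, x.mean() + half))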
§7.2 Estimation of γ(·) and ρ(·)
The estimators which we shall use for γ(h) and ρ(h) are
    γ̂(h) = n^{-1} Σ_{t=1}^{n−h} (X_t − X̄_n)(X_{t+h} − X̄_n),    0 ≤ h ≤ n − 1,    (7.2.1)
and
    ρ̂(h) = γ̂(h)/γ̂(0),    (7.2.2)
respectively. The estimator (7.2.1) is biased, but its asymptotic distribution (as n → ∞) has mean γ(h) under the conditions of Proposition 7.3.4 below. The estimators γ̂(h), h = 0, . . . , n − 1, also have the desirable property that for each n ≥ 1 the matrix
    Γ̂_n = [ γ̂(0)     γ̂(1)     · · ·   γ̂(n−1) ]
          [ γ̂(1)     γ̂(0)     · · ·   γ̂(n−2) ]    (7.2.3)
          [   ·         ·                  ·    ]
          [ γ̂(n−1)   γ̂(n−2)   · · ·   γ̂(0)   ]
is non-negative definite. To see this we write
    Γ̂_n = n^{-1} T T',
where T is the n × 2n matrix
    T = [ 0  · · ·  0   0    Y_1  Y_2  · · ·  Y_n ]
        [ 0  · · ·  0   Y_1  Y_2  · · ·  Y_n   0  ]
        [            ·  ·  ·                      ]
        [ 0   Y_1  Y_2  · · ·  Y_n   0  · · ·  0  ]
and Y_i = X_i − X̄_n, i = 1, . . . , n. Then for any real n × 1 vector a we have
    a'Γ̂_n a = n^{-1}(a'T)(a'T)' ≥ 0,
and consequently the sample autocovariance matrix Γ̂_n and sample autocorrelation matrix,
    R̂_n = Γ̂_n / γ̂(0),    (7.2.4)
are both non-negative definite. The factor n^{-1} is sometimes replaced by (n − h)^{-1} in the definition of γ̂(h), but the matrices Γ̂_n and R̂_n may not then be non-negative definite. We shall therefore always use the definitions (7.2.1) and (7.2.2) of γ̂(h) and ρ̂(h). Note that det Γ̂_n > 0 if γ̂(0) > 0 (Problem 7.11).
From X_1, . . . , X_n it is of course impossible without further information to estimate γ(k) and ρ(k) for k ≥ n, and for k slightly smaller than n we should expect that any estimators will be unreliable since there are so few pairs (X_t, X_{t+k}) available (only one if k = n − 1). Box and Jenkins (1976), p. 33, suggest that useful estimates of the correlation ρ(k) can only be made if n is roughly 50 or more and k ≤ n/4.
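The estimators (7.2.1) and (7.2.2) are straightforward to compute; the sketch below (Python, assuming numpy; the function names are ours and do not come from the ITSM package) uses the divisor n, as recommended above:

import numpy as np

def sample_acvf(x, h):
    # gamma_hat(h) = n^{-1} sum_{t=1}^{n-h} (X_t - Xbar)(X_{t+h} - Xbar),  0 <= h <= n-1
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    return np.dot(d[: n - h], d[h:]) / n      # divisor n, not n - h

def sample_acf(x, h):
    return sample_acvf(x, h) / sample_acvf(x, 0)

x = np.random.default_rng(6).normal(size=200)     # e.g. a white noise sample
print([round(sample_acf(x, h), 3) for h in range(1, 6)])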
It will be important in selecting an appropriate ARMA model for a given set of observations to be able to recognize when sample autocorrelations are significantly different from zero. In order to do this we use the following theorem, which gives the asymptotic joint distribution for fixed h of ρ̂(1), . . . , ρ̂(h) as n → ∞.

Theorem 7.2.1. If {X_t} is the stationary process,
    X_t − μ = Σ_{j=−∞}^{∞} ψ_j Z_{t−j},    {Z_t} ~ IID(0, σ²),
where Σ_{j=−∞}^{∞} |ψ_j| < ∞ and EZ_t⁴ < ∞, then for each h ∈ {1, 2, . . . } we have
    ρ̂(h) is AN(ρ(h), n^{-1}W),
where
    ρ̂(h)' = [ρ̂(1), ρ̂(2), . . . , ρ̂(h)],    ρ(h)' = [ρ(1), ρ(2), . . . , ρ(h)],
and W is the covariance matrix whose (i, j)-element is given by Bartlett's formula,
    w_{ij} = Σ_{k=−∞}^{∞} { ρ(k + i)ρ(k + j) + ρ(k − i)ρ(k + j) + 2ρ(i)ρ(j)ρ²(k) − 2ρ(i)ρ(k)ρ(k + j) − 2ρ(j)ρ(k)ρ(k + i) }.
PROOF. See Section 7.3.  □

In the following theorem, the finite fourth moment assumption is relaxed at the expense of a slightly stronger assumption on the sequence {ψ_j}.

Theorem 7.2.2. If {X_t} is the stationary process
    X_t − μ = Σ_{j=−∞}^{∞} ψ_j Z_{t−j},    {Z_t} ~ IID(0, σ²),
where Σ_{j=−∞}^{∞} |ψ_j| < ∞ and Σ_{j=−∞}^{∞} ψ_j² |j| < ∞, then for each h ∈ {1, 2, . . . },
    ρ̂(h) is AN(ρ(h), n^{-1}W),
where ρ̂(h), ρ(h) and W are defined as in Theorem 7.2.1.

PROOF. See Section 7.3.
Remark 1. Simple algebra shows that
    w_{ij} = Σ_{k=1}^{∞} { ρ(k + i) + ρ(k − i) − 2ρ(i)ρ(k) }{ ρ(k + j) + ρ(k − j) − 2ρ(j)ρ(k) },    (7.2.5)
which is a more convenient form of w_{ij} for computational purposes. This formula also shows that the asymptotic distribution of n^{1/2}(ρ̂(h) − ρ(h)) is the same as that of the random vector (Y_1, . . . , Y_h)', where
    Y_i = Σ_{k=1}^{∞} (ρ(k + i) + ρ(k − i) − 2ρ(i)ρ(k)) N_k,    i = 1, . . . , h,    (7.2.6)
and N_1, N_2, . . . are iid N(0, 1) random variables. The proof of Theorem 7.2.2 shows in fact that the limit distribution of n^{1/2}(ρ̂(h) − ρ(h)) is completely determined by the limit distribution of the random variables σ^{-2} n^{-1/2} Σ_{t=1}^{n} Z_tZ_{t+i}, i = 1, 2, . . . , which are asymptotically iid N(0, 1) (see Problem 6.24).
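Formula (7.2.5) translates directly into a small routine; the following sketch (Python, assuming numpy; the truncation point kmax is ours) computes W from a given autocorrelation function and, for an MA(1), reproduces w_ii = 1 + 2ρ²(1) for i > 1 (cf. Example 7.2.2 below):

import numpy as np

def bartlett_w(rho, h, kmax=1000):
    # w_ij = sum_{k>=1} {rho(k+i)+rho(k-i)-2 rho(i) rho(k)} {rho(k+j)+rho(k-j)-2 rho(j) rho(k)}
    W = np.zeros((h, h))
    for i in range(1, h + 1):
        for j in range(1, h + 1):
            W[i - 1, j - 1] = sum(
                (rho(k + i) + rho(k - i) - 2 * rho(i) * rho(k))
                * (rho(k + j) + rho(k - j) - 2 * rho(j) * rho(k))
                for k in range(1, kmax))
    return W

# MA(1) with theta = -.8:  rho(1) = -.8/1.64, rho(k) = 0 for |k| > 1
rho1 = -0.8 / 1.64
rho = lambda k: 1.0 if k == 0 else (rho1 if abs(k) == 1 else 0.0)
print(bartlett_w(rho, 3).round(4))    # diagonal: 1 - 3*rho1**2 + 4*rho1**4, then 1 + 2*rho1**2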
Remark 2. Before considering some applications of Theorem 7.2.2 we note that its conditions are satisfied by every ARMA(p, q) process driven by an iid sequence {Z_t} with zero mean and finite variance. The assumption of identical distributions in Theorems 7.1.2 and 7.2.1 can also be replaced by the boundedness of E|Z_t|³ and E|Z_t|⁶ respectively (or by other conditions which permit the use in the proofs of a central limit theorem for non-identically distributed random variables). This should be kept in mind in applying the results.
EXAMPLE 7.2.1 (Independent White Noise). If {X_t} ~ IID(0, σ²), then ρ(l) = 0 for |l| > 0, so from (7.2.5) we obtain
    w_{ij} = 1 if i = j, and w_{ij} = 0 otherwise.
For large n, therefore, ρ̂(1), . . . , ρ̂(h) are approximately independent and identically distributed normal random variables with mean 0 and variance n^{-1}. If we plot the sample autocorrelation function ρ̂(k) as a function of k, approximately .95 of the sample autocorrelations should lie between the bounds ±1.96n^{-1/2}. This can be used as a check that the observations truly are from an IID process. In Figure 7.1 we have plotted the sample autocorrelations ρ̂(k), k = 1, . . . , 40, for a sample of 200 independent observations from the distribution N(0, 1). It can be seen that all but one of the autocorrelations lie between the bounds ±1.96n^{-1/2}. If we had been given the data with no prior information, inspection of the sample autocorrelation function would have given us no grounds on which to reject the simple hypothesis that the data is a realization of a white noise process.
Figure 7.1. The sample autocorrelation function of n = 200 observations of Gaussian white noise, showing the bounds ±1.96n^{-1/2}.
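A simulation along the lines of Example 7.2.1 (a Python sketch assuming numpy; the seed and sample size are arbitrary) checks how many of the first 40 sample autocorrelations of an iid sample fall inside ±1.96n^{-1/2}:

import numpy as np

rng = np.random.default_rng(7)
n, max_lag = 200, 40
x = rng.normal(size=n)
d = x - x.mean()
gamma0 = np.dot(d, d) / n
rho = np.array([np.dot(d[: n - h], d[h:]) / (n * gamma0) for h in range(1, max_lag + 1)])
inside = np.abs(rho) <= 1.96 / np.sqrt(n)
print("fraction of lags 1..40 inside the bounds:", inside.mean())   # roughly .95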
EXAMPLE 7.2.2 (Moving Average of Order q). If
    X_t = Z_t + θ_1 Z_{t−1} + · · · + θ_q Z_{t−q},
then from Bartlett's formula (7.2.5) we have
    w_{ii} = 1 + 2ρ²(1) + 2ρ²(2) + · · · + 2ρ²(q),    i > q,
as the variance of the asymptotic distribution of n^{1/2}ρ̂(i) as n → ∞.
Figure 7.2. The sample autocorrelation function of n = 200 observations of the Gaussian MA(1) process X_t = Z_t − .8Z_{t−1}, {Z_t} ~ WN(0, 1), showing the bounds ±1.96n^{-1/2}[1 + 2ρ²(1)]^{1/2}.
In Figure 7.2 we have plotted the sample autocorrelation function ρ̂(k), k = 0, 1, . . . , 40, for 200 observations from the Gaussian MA(1) process
    X_t = Z_t − .8Z_{t−1},    {Z_t} ~ IID(0, 1).    (7.2.6)
The lag-one sample autocorrelation is found to be ρ̂(1) = −.5354 = −7.57n^{-1/2}, which would cause us (in the absence of our prior knowledge of {X_t}) to reject the hypothesis that the data is a sample from a white noise process. The fact that |ρ̂(k)| < 1.96n^{-1/2} for k ∈ {2, . . . , 40} strongly suggests that the data is from a first-order moving average process. In Figure 7.2 we have plotted the bounds ±1.96n^{-1/2}[1 + 2ρ²(1)]^{1/2}, where ρ(1) = −.8/1.64 = −.4878. The sample autocorrelations ρ̂(2), . . . , ρ̂(40) all lie within these bounds, indicating the compatibility of the data with the model (7.2.6). Since however ρ(1) is not normally known in advance, the autocorrelations ρ̂(2), . . . , ρ̂(40) would in practice have been compared with the more stringent bounds ±1.96n^{-1/2}, or with the bounds ±1.96n^{-1/2}[1 + 2ρ̂²(1)]^{1/2}, in order to check the hypothesis that the data is generated by a moving average process of order 1.
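The comparison made in Example 7.2.2 can be reproduced as follows (a Python sketch assuming numpy; a freshly simulated sample will of course differ from the one plotted in Figure 7.2):

import numpy as np

rng = np.random.default_rng(8)
n, theta = 200, -0.8
z = rng.normal(size=n + 1)
x = z[1:] + theta * z[:-1]                 # X_t = Z_t - .8 Z_{t-1}

d = x - x.mean()
gamma0 = np.dot(d, d) / n
def rho_hat(h):
    return np.dot(d[: n - h], d[h:]) / (n * gamma0)

rho1 = theta / (1.0 + theta ** 2)          # theoretical rho(1) = -.8/1.64 = -.4878
wide = 1.96 / np.sqrt(n) * np.sqrt(1.0 + 2.0 * rho1 ** 2)
print("rho_hat(1) =", round(rho_hat(1), 4), "   plain bound:", round(1.96 / np.sqrt(n), 4))
print("rho_hat(2..5):", [round(rho_hat(h), 4) for h in range(2, 6)], "   wider bound:", round(wide, 4))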
EXAMPLE 7.2.3 (Autoregressive Process of Order 1). Applying Bartlett's formula to the causal AR(1) process,
    X_t − φX_{t−1} = Z_t,    {Z_t} ~ IID(0, σ²),
and using the result (see Section 3.1) that ρ(i) = φ^{|i|}, we find that the asymptotic variance of n^{1/2}(ρ̂(i) − ρ(i)) is
    w_{ii} = Σ_{k=1}^{i} φ^{2i}(φ^{-k} − φ^{k})² + Σ_{k=i+1}^{∞} φ^{2k}(φ^{-i} − φ^{i})²
          = (1 − φ^{2i})(1 + φ²)(1 − φ²)^{-1} − 2iφ^{2i},    i = 1, 2, . . . ,
          ≈ (1 + φ²)/(1 − φ²) for i large.
The result is not of the same importance in model identification as the corre­
sponding result for moving average processes, since autoregressive processes
are more readily identified from the vanishing of the partial autocorrelation
function at lags greater than the order of the autoregression. We shall return
to the general problem of identifying an appropriate model for a given time
series in Chapter 9.
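The closed form for w_ii in Example 7.2.3 can be checked against a truncated version of Bartlett's formula (7.2.5); the sketch below (Python; φ = 0.7 is an arbitrary illustrative value) prints both for a few lags:

phi = 0.7
def rho(k):                                # AR(1) autocorrelation function
    return phi ** abs(k)

def w_ii_bartlett(i, kmax=2000):           # truncated sum in (7.2.5) with j = i
    return sum((rho(k + i) + rho(k - i) - 2 * rho(i) * rho(k)) ** 2 for k in range(1, kmax))

def w_ii_closed(i):
    return (1 - phi ** (2 * i)) * (1 + phi ** 2) / (1 - phi ** 2) - 2 * i * phi ** (2 * i)

for i in (1, 2, 5, 10):
    print(i, round(w_ii_bartlett(i), 6), round(w_ii_closed(i), 6))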
§7.3* Derivation of the Asymptotic Distributions
This section is devoted to the proofs of Theorems 7. 1 .2, 7.2. 1 and 7.2.2. For the
statements of these we refer the reader to Sections 7. 1 and 7.2. The proof of
Theorem 7. 1 .2, being a rather straightforward application of the techniques
of Chapter 6, is given first. We then proceed in stages through Propositions
7.3. 1 -7.3.4 to the proof of Theorem 7.2. 1 and Propositions 7.3.5-7.3.8 to the
proof of Theorem 7.2.2.
PROOF OF THEOREM 7.1.2. We first define
    X_{tm} = μ + Σ_{j=−m}^{m} ψ_j Z_{t−j}
and
    Ȳ_{nm} = n^{-1} Σ_{t=1}^{n} X_{tm}.
By Example 6.4.4, as n → ∞,
    n^{1/2}(Ȳ_{nm} − μ) ⇒ Y_m where Y_m ~ N(0, σ² (Σ_{j=−m}^{m} ψ_j)²).    (7.3.1)
Now as m → ∞, σ²(Σ_{j=−m}^{m} ψ_j)² → σ²(Σ_{j=−∞}^{∞} ψ_j)², and so by Problem 6.16,
    Y_m ⇒ Y where Y ~ N(0, σ² (Σ_{j=−∞}^{∞} ψ_j)²).    (7.3.2)
By Remark 2 of Section 7.1,
    n Var(X̄_n − Ȳ_{nm}) → σ² (Σ_{|j|>m} ψ_j)² as n → ∞.
Hence
    lim_{m→∞} lim sup_{n→∞} Var(n^{1/2}(X̄_n − Ȳ_{nm})) = 0,
which, in conjunction with Chebychev's inequality, implies that condition (iii) of Proposition 6.3.9 is satisfied. In view of (7.3.1) and (7.3.2) we can therefore apply the Proposition to conclude that n^{1/2}(X̄_n − μ) ⇒ Y.  □
The asymptotic multivariate normality of the sample autocorrelations
(Theorem 7.2. 1 ) will be established by first examining the asymptotic be­
havior of the sample autocovariances y(h) defined by (7.2. 1 ). In order to do
this it is simplest to work in terms of the function
n
h = 0, 1 , 2, . .
y*(h) = n - 1 I x,x, +h ,
t=l
which, as we shall see in Proposition 7.3.4, has the same asymptotic properties
as the sample autocovariance function.
. '
Proposition 7.3.1. Let {X_t} be the two-sided moving average,
    X_t = Σ_{j=−∞}^{∞} ψ_j Z_{t−j},    {Z_t} ~ IID(0, σ²),
where EZ_t⁴ = ησ⁴ < ∞ and Σ_{j=−∞}^{∞} |ψ_j| < ∞. Then if p ≥ 0 and q ≥ 0,
    lim_{n→∞} n Cov(γ*(p), γ*(q)) = (η − 3)γ(p)γ(q) + Σ_{k=−∞}^{∞} [γ(k)γ(k − p + q) + γ(k + q)γ(k − p)],    (7.3.3)
where γ(·) is the autocovariance function of {X_t}.
PROOF. First observe that
if s = t = u = v,
if s = t # u = v,
if s # t, s # u and s # v.
Now
E(X,Xr+p Xt+h+p Xt+h+p + q )
= I I I I t/Ji t/Jj +pt/Jk + h+p t/l, +h+p+q E(Zr - ; Z, _jz, _kz,_ z>
i j k I
(7.3.4)
and the sum can be rewritten, using (7.3.4), in the form
(17 - 3) o.4 L 1/1;1/!;+pl/l;+h+pl/l;+h+p+ q + y(p)y(q)
i
+ y (h + p)y(h + q) + y (h + p + q)y(h).
It follows that
- Ctl I� x,xt+pxsxs+q)
t �l
s l�
Ey* (p)y*(q) = n 2 E
=
n-2
y (p)y(q) + y(s - t) y (s - t - p + q)
+ y(s - t + q)y(s - t - p)
+ (17 - 3) <T4 � 1/J; I/J;+pi/Ji +s-ti/Ji +s-t+ q .
]
Letting k s - t, interchanging the order of summation and subtracting
y (p)y(q), we find that
Cov(y* (p), y* (q)) = n - 1 L ( 1 - n - 1 l k l ) 7k ,
(7.3.5)
l kl < n
where
4
7k = y(k)y(k - p + q) + y (k + q)y(k - p) + (17 3)o L 1/1;1/J;+pl/l;+kl/li +k+ q·
i
The absolute summability of { 1/Jj } implies that { 7k} is also absolutely summable.
We can therefore apply the dominated convergence theorem in (7.3.5) to
deduce that
=
-
lim n Cov(y* (p), y*(q))
k=:L-oo 1k
00
=
= (I] - 3) y (p) y (q)
+
00
L [y(k)y(k - p + q) + y(k + q)y(k - p) ] .
k= - oo
D
Proposition 7.3.2. If {X, } is the moving average,
(7.3.6)
j= - m
where Ez: = 17 04 < oo , and if y( ' ) is the autocovariance function of { X,}, then
for any non-negative integer h,
[ J (r J l
y* (O)
_
.·
y* (h)
1s. AN
where V is the covariance matrix,
y O)
\.· n - 1 V
y(h)
'
'
V=
[(I'/
-
3)y(p)y(q) +
�
l
k = oo
+ y(k + q)y(k - p))
(y(k)y(k - p + q)
·
q=O , , h
...
PROOF. We define a sequence of random (h + I )-vectors {Y, } by
y; = (X, X, , x,x, + 1 , . . . , x,x,+ h ).
Then {Y, } is a strictly stationary (2m + h)-dependent sequence and
n
n - 1 L Y, = (y*(O), . . . , y*(h))'.
t=!
We therefore need to show that as n � oo,
n- 1
t
t
I
( [� ]
I..' Y, is AN l..'
y O)
y(h)
)
' n - 1 I..' VI.. ,
(7.3.7)
for all vectors 1.. E IRh+l such that 1..' VI.. > 0. For any such 1.., the sequence {I.. ' Y, }
is (2m + h)-dependent and since, by Proposition 7.3 . 1 ,
!�� n- 1 Var
(� )
I..' Y, = J.' VA > 0,
we conclude from Remark 6 of Section 6.4 that { I..' Y, } satisfies the hypotheses
of Theorem 6.4.2. Application of the theorem immediately gives the required
result (7.3.7).
0
The next step is to extend Proposition 7.3.2 to MA ( oo) processes.
Proposition 7.3.3. Proposition 7.3.2 remains true if we replace (7.3.6) by
{ Z, }
j= - co
"'
110(0, a 2 ),
(7.3.8)
PROOF. The idea of the proof is to apply Proposition 7.3.2 to the truncated
sequence
m
x,m = L 1/Jj Zt-j •
j= - m
and then to derive the result for { X, } by letting m � oo . For 0 � p � h we
define
n
Y!( p) = n - 1 L X,mX(r+ p)m·
t=l
Then by Proposition 7.3.2
n t lz
[
y!(O) - Ym (O)
:
]
=
Ym ,
Y! (h) - Ym (h)
where Ym C ) is the autocovariance function of {X1m }, Ym - N(O, Vm ) and
Vm =
[
(17 -
3)ym ( P) Ym (q) +
+ Ym (k + q)ym (k - p))
Now as m --+
�
k� oo
(Ym (k)ym (k - P + q)
l. q�O
. ... , h
·
oo,
where V is defined like Vm with Ym ( · ) replaced by y( · ). Hence
Ym = Y where Y - N(O, V).
The proof can now be completed by an application of Proposition 6.3.9
provided we can show that
lim lim sup P(n 1 12 l y! (p) - Ym ( P) - y*(p) + y(p) l > c) = 0, (7.3.9)
m.--. oo n.__... oo
for p = 0, 1 , . . . , h.
The probability in (7.3.9) is bounded by c - 2 n Var(y! (p) - y*(p)) =
2
c - [ n Var (y! (p)) + n Var(y*(p)) - 2n Cov(y!(p), y* (p))]. From the calcula­
tions of Proposition 7.3. 1 and the preceding paragraph,
lim lim n Var(y! (p)) = lim n Var(y*(p))
vpq
where is the (p, q)-element of V Moreover by a calculation similar to that
given in the proof of Proposition 7.3. 1 , it can be shown that
lim lim n Cov(y!(p), y*(p)) = vPP '
(7.3. 1 0)
lim lim sup c - 2 n Var(y!(p) - y* (p)) = 0.
(7.3. 1 1)
whence
This establishes (7.3.9).
D
Next we show that, under the conditions of Proposition 7.3.3, the vectors
[y*(O), . . . , y*(h)]' and [1!(0), . . . , y(h)]' have the same asymptotic distribution.
Proposition 7.3.4. If { X1 } is the moving average process,
{ Z, } � IID(O, a 2 ),
j= - oo
where I � _ 00 1 1/Jj l < oo and ez: = 11a4 < oo , and if y( · ) is the autocovariance
function of { X, }, then for any non-negative integer h,
[ ] ([ l
'
y(O)
is AN
.: , n - 1 v
' y(h)
Y (O)
:.
y(h)
where V is the covariance matrix,
V=
[
(1'/
- 3)y(p)y(q) +
�
l
k = oo
+ y(k + q)y(k - p))
(7.3. 1 3)
q = O, h
[ �� Xr+p
+ n - 1 !2
(7.3.12)
(y(k)y(k - P + q)
.
...,
PROOF. Simple algebra gives, for 0 :=::; p :=::; h,
n 1 12 (y* (p) - y(p)) = n 1 12 X. n - 1
)
+
n-1
n
�� X,
+ ( 1 - n- 1 p) X.
J
"
XI Xt+p ·
f..t=n-p+1
The last term is op ( 1 ) since n - 112E I L �=n - p + 1 X,X, +p l :=::; n- 1 12 py(O) and
n - 112 py(O) --> 0 as n --> oo. By Theorem 7. 1 .2 we also know that
( C � 1/JiY ) .
n 1 12 X. = Y wher e Y � N O, a 2
= oo
1
2
which implies that n 1 x. is OP (l). Moreover by the weak law of large numbers
(cf. Proposition 6.3.1 0),
n -p
n-p
n - 1 Xr +p + n - 1 X, + ( 1 - n - 1 p) X. � 0.
1�
[ 1�
J
From these observations we conclude that
n 112 (y* (p) - y(p)) = op ( l ) as n --> oo ,
and the conclusion of the proposition then follows from Propositions 6.3.3
and 7.3.3.
D
Remark 1. If { 1'; } is a stationary process with mean Jl, then Propositions 7.3. 1 7.3.4 apply to the process {X, } = { 1'; - J1 }, provided of course the specified
conditions are satisfied by { 1'; - J1}. In particular if
00
c:o
1'; = J1 + I 1/Jj Zt+j•
j= -
{ Z, }
�
IID(O, a 2 ),
§7.3.* Derivation of the Asymptotic Distributions
23 1
where I.'l= - co I t/1) < oo and EZ,4 = Yfa4 < oo and if y( · ) is the autocovariance
function of { Y, }, then for any non-negative integer h,
[ � ] (r � ] n-1 v),
y O)
y O)
is AN
,
y (h)
y(h)
where V is defined by (7.3. 1 3) and y(p) = n - 1 L,j:-� (lj - Yn ) ( lJ+h - Y,).
We are now in a position to prove the asymptotic joint normality of the
sample autocorrelations.
PROOF OF THEOREM 7.2. 1 . Let g( · ) be the function from [R1h + 1 into !Rh defined by
x0 # 0.
If y ( " ) is the autocovariance function of {X,}, then by Proposition 6.4.3 and
Remark 1 above,
p(h) = g ( [Y (O), . . . , y(h) ] ') is AN(g ( [y(O), . . . , y(h) ] '), n 1 D VD'),
�-
i.e. p(h) is AN(p(h), n - 1 D VD'), where V is defined by (7.3. 1 3) and D is the matrix
of partial derivatives,
D
=
y(0)
_1
J
-p(1) 1 0 · · · O
�(2) 0 1 . 0 .
.
- p(h) 0 0 . . .
1
Denoting by vij and W;j the (i,j)-elements of V and D VD' respectively, we find
that
wij = vij - p(i)v0j - p(j)v ;0 + p(i)p(j) Voo
=
�[
k=
co
p(k)p(k - i + j) + p(k - i) p (k + j) + 2p(i)p(j)p 2 (k)
- 2p(i)p(k)p(k + j) - 2p(j)p(k) p(k - i)
J.
Noting
that
L k p(k)p(k - i + j) = L k p(k + i)p(k + j)
and
that
L k p(j)p(k)p(k - i) = L k p(j)p(k + i)p(k), we see that wij is exactly as
specified in the statement of Theorem 7 . 2. 1 .
D
We next turn to the proof of Theorem 7.2.2 which is broken up into a series
of propositions.
Proposition 7.3.5. If {X, } is the moving average process
232
7. Estimation of the Mean and the Autocovariance Function
where L�
- oc
1 1/l i l
j=
< oo
- oo
and L� - oo 1/!J ijl
y*(h) !'.
< oo,
c=�oo 1/ljl/lj+h) (52
PROOF. We give the proof for h
=
then for h ;:o: 0,
=
y(h).
0. The general case is similar. Now
n
y*(O) = n - 1 L L l/1; 1/lj Zr -i Zr -j
<= 1 i, j
n
n - 1 L "L l/l f Z �- ; + Y.,
t= 1 i
1
where Y, = L L i 7'i 1/1;1/li n L �= 1 Z,_ ; Zr -j· By the weak law of large numbers
for moving averages (Proposition 6.3. 1 0), the first term converges in
probability to ( L; l/Jl) C5 2 . So it suffices to show that Y, !'. 0. For i 1= j,
{ Z,_ ; Z,_i , t = 0, ± 1, . . . } WN(O, C54) and hence
=
�
(
Var n 1
-
I Z,_ ; Z,_i)
<= 1
=
n - 1 C54 � 0.
Thus for each positive integer k
Y,k =
and
n
L L 1/1; 1/lj n- 1 L z,_ ; Zr -j !'. 0,
r= 1
lil s k . lil s k, i h
lim lim sup E I Y, - Y,k l s lim lim sup L L l l/l;l/li i E I Z1 Z2 I
k� oo n �oo
k� ro n �oo lil > k lil> k
0.
Now appealing to Proposition 6.3.9, we deduce that Y, !'. 0.
=
D
Proposition 7.3.6. Let { X, } be as defined in Proposition 7.3.5 and set
y*(h)
p *(h) = y*( ) Jor h = 1 , 2, . . .
O
r.
Then
(
.
L L aii Z, _ ; Zr - i +i
n 1 12 p*(h) - p(h) - (y*(0)) - 1 n - 1 12 r=I j,<O
i
1
where
PROOF. We have
)
!'. 0
(7.3.14)
i = 0, ± 1 ' . . . ; j = ± 1 ' ± 2, . . . .
§7.3.* Derivation of the Asymptotic Distributions
233
p *(h) - p (h) = (y*(0)) -1 (y*(h) - p (h)y*(O))
(�
)
� 1/1;1/!jZr -i Zr+ h-j - p (h)L L I/!;1/JjZr - ; Zr -j
= (y*(O)f 1 n - 1 f
t =l
n
= (y*(0)) - 1 n - 1 L I I 1/1; ( 1/1;-j+ h - p (h) !/1; j )Zr -i zr- i +i '
t= 1 i j
l
l
1
1
so that the left side of (7.3. 1 4) is
t )
= (y*(0))- 1 n - 1 12 � [ 1/1; ( 1/J; + h - p (h) I/J; ) (t Zr2 Un;) l
(
(y*(0))- 1 n- 1 12 � 1/1; ( 1/J; + h - p (h) I/J; ) r Zr2- ;
(7.3. 1 5)
+
where Uni = I7:{_ ; Z12 - L7= 1 Z? is a sum of at most 2 1 i l random variables.
Since L ; 1/1; ( 1/J; + h - p (h)l/f; ) = 0 and y*(O) .!.. ( L ; I/Jl)a 2 , the proof will be com­
plete by Proposition 6. 1 . 1 once we show that
L 1/1; (1/J;+ h - p (h) I/J; ) uni = Op(1).
i
But,
�
lim sup £ � 1/1; (1/J;+ h - p (h) I/J; ) Un i
n-+co
(
t
)
:-:;; � 1 1/1;1/J;+ h l + l l/ld 2 (2 l i l )a 2
(7.3. 1 6)
l
< OO
and this inequality implies (7.3.16) as required.
D
Proposition 7.3.7. Let Xr and aii be as defined in Proposition 7.3.6. Then for
each positive integer j,
and
n- 1 12
[� aij tt zt_izt -i+j - � aij t ztzt+j] .!.. 0
t
(7.3. 1 7)
(7.3. 1 8)
PROOF. The left side of (7.3. 1 7) is equal to n - 112 L ; a ij un i where uni =
I7:1- i Zt Zt+j - L7= 1 ztzt+j is a sum of at most 2 1 i l products ztzt+j ·
Moreover,
234
7. Estimation of the Mean and the Autocovariance Function
< 00 .
Consequently n - 1 12 I, ;a;j Un ; _:. 0. The proof of (7.3.18) is practically identical
and is therefore omitted.
D
Proposition 7.3.8. Let { X, } be the moving average process defined in Prop­
osition 7.3.5. Then for every positive integer h,
n 112 ( p*(h) - p (h))' = ( Y1 , . . . , Y;,)'
where p*(h) = (p*(1), . . . , p*(h)),
and N1 , N2 ,
• • •
00
I. (p(k + n + p(k - j) - 2p(j)p(k)) �,
j=l
are iid N (0, 1) random variables.
1k
=
PROOF. By Proposition 7.3.6,
n
n 1 12 (p*(h) - p(h)) = (y*(O)f 1 n - 1 12 rI, I I, a ijZr - i Zr - i+j + op (1). (7.3. 19)
= 1 j#O i
Also by Problem 6.24, we have for each fixed positive integer m,
(J -2 n - 1 /2
(rI= 1 Z,Zr+1 • · · · • rI=1 Z,Zr+m) = (N1 , .
.
. , Nm )'
where N1 , , Nm are iid N(O, 1 ) random variables. It then follows from
Propositions 7.3.7 and 6.3.4 that
• • •
(
)
rr - 2 n - 1 12 I, I I aijZr - i Zr - i +j = I I, (aij + a;, -) �· (7.3.20)
j= 1 i
O<ljl ,;; m r = 1 i
We next show that (7.3.20) remains valid with m replaced by oo . By Prop­
osition 6.3.9, m may be replaced by oo provided
I I, (a j a ) = I, I (aij + a ;. -) � as m -+ oo
j= 1 i i + ;, -j � j= 1 i
m
and
oo
(
lim lim sup Var n - 1 12 .L
m-oo
n -oo
I
IJ I>m t ;:::; l
(� aijZ,_ ;Zr-i+j))
l
=
0.
(7.3.21 )
(7.3.22)
§7.3.* Derivation of the Asymptotic Distributions
235
Now (7.3.2 1 ) is clear (in fact we have convergence in probability). To prove
(7.3.22), we write Var(n� 112 I lil>m I�=l I ; a ij Zc�; Zr-;+) as
n
n
n � 1 I I I I I I a ii akt E(Z, �; Zc�i+jzs�kzs�k+t)
s= l t= i i lil >m k l ll >m
n
n
= n � J I I I I a ij (as�t+i. j + as�t+i�j. �j ) (J4
s=l t=i i lil >m
This bound is independent of n. Using the definition of a;i it is easy to
show that the bound converges to zero as m ---> oo , thus verifying (7.3.22).
Consequently (7.3.20) is valid with m = oo. Since y*(O) ..!'.. ( I i t{I/ ) (J2, it follows
from (7.3. 1 9) and Proposition 6.3.8 that
n 112(p* (h) - p(h))
=> jI=l (I a i + a;, �i) � / (I t/1?)
l
i
l
= }/, .
Finally the proof of the joint convergence of the components of the vector
n 112 (p* (h) - p(h)) can be carried out by writing vector analogues of the
preceding equations.
D
PROOF OF THEOREM 7.2.2. As in the proof of Theorem 7.2. 1 , (see Remark 1 ) we
may assume without loss of generality that J1 = 0. By Proposition 6.3.3, it
suffices to show that
n 1 12 (p*(h) - p(h)) ..!'.. 0 for h = 1 , 2, . . . .
As in the proof of Proposition 7.3.4 (and assuming only that { Z, }
we have for h ;:::: 0,
By Proposition 7.3.5,
y*(h) ..!'.. y(h) for h
and hence
Y (h) ..!'.. y(h) for h
;::::
;::::
0
0.
-
IID(O, (J2))
7. Estimation of the Mean and the Autocovariance Function
236
Thus by Proposition 6. 1 . 1 ,
n 112 (p* (h) - p(h)) = n 1 12 (y*(h) - y (h))/y*(O)
+ n 1 12 (y(O) - y*(O)) y (h)/(y*(O) y (O))
=
op( l ).
0
Problems
7. 1 . If {X, } is a causal AR( 1 ) process with mean fl., show that Xn is AN (fl.,
0"2(1 - ¢)- 2 n - 1 ). In a sample of size 1 00 from an A R ( 1 ) process with 1ft = .6 and
0" 2 2, we obtain Xn = .27 1 . Construct an approximate 95% confidence interval
for the mean fl.· Does the data suggest that f1 = 0?
=
7.2. Let { X, } be a stationary process with mean fl. Show that the best linear
unbiased estimator /l of f1 is given by fln = (1Tn- 1 tr 1 tT;1 Xn where rn is the
covariance matrix of X" = (X 1 , . . . , Xnl' and 1 = ( 1 , . . . , 1 )'. Show also that
Var(Jinl = (tT; 1 t) - 1 . [Jin is said to be the best linear unbiased estimator
of f1 if E(fln - f1.) 2 = min E I Y - f1 1 2 where the minimum is taken over all
Y E sp { X 1 , . . . , X" } with E Y = fl ]
.
7.3. Show that for any series { x 1 , . . . , xn }, the sample autocovariances satisfy
L lhl < n Y(h) = 0.
7.4. Use formula (7.2.5) to compute the asymptotic covariance matrix of p ( l ), . . . , p(h)
for an M A ( 1 ) process. For which values of} and k in { 1 , 2, . . . } are p(j) and p(k)
asymptotically independent?
7.5. Use formula (7.2.5) to compute the asymptotic covariance of p ( 1 ) and p(2) for
an A R ( 1 ) process. What is the behaviour of the asymptotic correlation of p ( l )
and p(2) a s 1ft --+ ± 1 ?
7.6. F o r a n AR( 1 ) process the sample autocorrelation p ( l ) i s AN(¢, ( 1 - ¢2)n- 1 ).
Show that n 112 (p( 1) - ¢)/( 1 - p 2 (1)) 112 is AN(O, 1). If a sample of size 1 00 from
an A R ( 1 ) process gives p ( l ) = .638, construct a 95% confidence interval for ¢. Is
the data consistent with the hypothesis that 1ft = .7?
7�7. In Problem 7.6, suppose that we estimate 1ft by (p(3)) 113 . Show that (p(3)) 1 13 is
AN(¢, n- 1 v) and express v in terms of ¢. Compare the asymptotic variances of
the estimators p(l) and (p(3))1 13 as 1ft varies between - 1 and 1 .
7.8. Suppose that { X, } is the A R ( 1 ) process,
X, -
f1
= tft (X, - fl.) + Z,
where 1 ¢ 1 < 1 . Find constants an > 0 and bn such that exp(Xn) is AN(b", an).
7.9. Find the asymptotic distribution of p(2)/p ( l ) for the Gaussian MA( 1 ) process,
{ Z, } - IID(O, v),
where 0 < 1 0 1 < 1 .
Problems
237
7. 1 0. If {X, } is the M A ( 1 ) process in Problem 7.9, the moment estimators {J and () of
0 and v based on the observations {X1 , , X. } are obtained by equating the
sample and theoretical autocovariances at lags 0 and 1. Thus
v( 1 + B 2 ) = y(O),
• • •
and
0/( 1
+
02 ) = ,0 ( 1).
Use the asymptotic joint distribution of (Y{O), p(l ) )
(a) to estimate the probability that these equations have a solution when 0 = .6
and n = 200 (B must be real), and
(b) to determine the asymptotic joint distribution of (v, B)'.
7. 1 1 . If X 1 ,
. • .
, x. are n observations of a stationary time series, define
Show that the function Y( · ) is non-negative definite and hence, by Theorem 1 .5 . 1 ,
that Y( · ) i s the autocovariance function o f some stationary process { Y, } . From
Proposition 3.2. 1 it then follows at once that { Y,} is an MA(n - 1) process. (Show
that tn + h is non-negative definite for all h :2: 0 by setting ¥;, + 1 = ¥;,+ 2 = · · · =
Y, + h = 0 in the argument of Section 7.2.) Conclude from Proposition 5. 1 . 1 that
if y(O) > 0, then f. is non-singular for every n.
CHAPTER 8
Estimation for ARMA M odels
The determination of an appropriate ARMA(p, q) model to represent an
observed stationary time series involves a number of inter-related problems.
These include the choice of p and q (order selection), and estimation of the
remaining parameters, i.e. the mean, the coefficients { f/J; , ej : i = 1 , . . . , p;
j = 1 , . . . , q} and the white noise variance CJ 2 , for given values of p and
q. Goodness of fit of the model must also be checked and the estimation
procedure repeated with different values of p and q. Final selection of the most
appropriate model depends on a variety of goodness of fit tests, although it
can be systematized to a large degree by use of criteria such as the AICC
statistic discussed in Chapter 9.
This chapter is devoted to the most straightforward part of the modelling
procedure, namely the estimation, for fixed values of p and q, of the parameters
cjl = (f/J 1 , , f/Jp )', 0 = (81 , , 8q )' and CJ 2 . It will be assumed throughout that
the data has been adjusted by subtraction of the mean, so our problem
becomes that of fitting a zero-mean ARMA model to the adjusted data x 1 ,
. . . , xn - If the model fitted to the adjusted data is
• . .
• . .
x, - f/J1 X, _ 1 - · · · - f/Jpx, _p = z, + 81 Z,_ 1 + · · · + eqz,_q,
{ Z, }
�
WN(O, CJ 2 ),
then the corresponding model for the original stationary series { Y; } is found
by substituting 1j - y for Xj , j = t, . . . , t - p, where y = n - 1 I 'i; 1 yj is the
sample mean of the original data, treated as a fixed constant.
In the case q = 0 a good estimate of cjl can be obtained by the simple device
of equating the sample and theoretical autocovariances at lags 0, 1 , . . . , p.
This is the Yule-Walker estimator discussed in Sections 8 . 1 and 8.2. When
q > 0 the corresponding procedure, i.e. equating sample and theoretical
239
§8. 1 . The Yule�Walker Equations
autocovariances at lags 0, . . . , p + q, is neither simple nor efficient. In Sections
8.3 and 8.4 we discuss a simple method, based on the innovations algorithm
(Proposition 5.2.2), for obtaining more efficient preliminary estimators of the
coefficients when q > 0. These are still not as efficient as least squares or
maximum likelihood estimators, and serve primarily as initial values for the
non-linear optimization procedure required for computing these more effi­
cient estimators.
Calculation of the exact Gaussian likelihood of an arbitrary second order
process and in particular of an ARMA process is greatly simplified by use of
the innovations algorithm. We make use of this simplification in our discus­
sion of maximum likelihood and least squares estimation for ARMA processes
in Section 8.7. The asymptotic properties of the estimators and the determina­
tion of large-sample confidence intervals for the parameters are discussed in
Sections 8.8, 8.9, 8. 1 1 and 1 0.8.
§8. 1 The Yule-Walker Equations and Parameter
Estimation for Autoregressive Processes
Let {X, } be the zero-mean causal autoregressive process,
{Z,} � WN(0, � 2 ).
(8.1.1)
Our aim i s t o find estimators of the coefficient vector � = (ifJ 1 , . . . , l/Jp )' and the
white noise variance � 2 based on the observations X 1 , . . . , Xn The causality assumption allows us to write X, in the form
00
X, = L t/Jj Zr �j '
j=O
(8.1 .2)
where by Theorem 3. 1.1, t/J(z) = L i'= o t/Ji z i = 1 /ifJ (z), lzl � 1 . Multiplying each
side of (8. 1 . 1 ) by X, �i ' j = 0, . . . , p, taking expectations, and using (8. 1.2) to
evaluate the right-hand sides, we obtain the Yule� Walker equations,
and
(8. 1 .3)
(8. 1 .4)
where rP is the covariance matrix [y(i - j)JL=1 and yp = (y( l ), y(2), . . . , y(p))'.
These equations can be used to determine y(O), . . . , y(p) from � 2 and �·
On the other hand, if we replace the covariances y(j),j = 0, . . . , p, appearing
in (8.1 .3) and (8. 1.4) by the corresponding sample covariances y(j), we obtain
a set of equations for the so-called Yule-Walker estimators � and 6" 2 of � and
� 2 , namely
(8. 1.5)
240
8. Estimation for ARMA Models
and
(8. 1 .6)
rr 2 = y eo) - cf>' 1P '
where rp = [y(i - j)JL=I and Yp = (y(l ), y (2), . . . y (p)) .
If y(O) > 0, then by Problem 7. 1 1, fp is non-singular. Dividing each side of
(8. 1 .5) by y (O), we therefore obtain
'
'
(8. 1 . 7)
and
- A f R_ - J A ]
2
(8 . 1 .8)
(JA = Y' (0) [ 1 Pp P Pp ,
where pP = (p(l ), . . . , p(p))' = yP /Y (O).
With <f> as defined by (8. 1 .7), it can be shown that 1 - ¢ 1 z - · · · - ¢P zP #- 0
for \ z \ ::;; 1 (see Problem 8.3). Hence the fitted model,
is causal. The autocovariances yp(h), h = 0, . . . , p of the fitted model must
therefore satisfy the p + 1 linear equations (cf. (8. 1 .3) and (8.1 .4))
h = 1, . . . , p,
h = 0.
However, from (8. 1 .5) and (8. 1 .6) we see that the solution of these equations
is yp(h) = y(h), h = 0, . . . , p so that the autocovariances of the fitted model at
lags 0, . . . , p coincide with the corresponding sample autocovariances.
The argument of the preceding paragraph shows that for every non-singular
covariance matrix rp +t = [y(i - j)J f.}� 1 there is an AR(p) process whose
autocovariances at lags 0, . . . , p are y(O), . . . , y (p). (The required coefficients
and white noise variance are found from (8. 1 .7) and (8. 1 .8) on replacing p(j)
by y(j)/y(O),j = 0, . . . , p, and y(O) by y(O). ) There may not however be an MA(p)
process with this property. For example if y (O) = 1 and y (1) = y ( - 1) = [3, the
matrix r2 is a non-singular covariance matrix for all f3 E ( - 1 , 1 ). Consequently
there is an AR( 1 ) process with autocovariances 1 and f3 at lags 0 and 1 for all
f3 E ( - 1 , 1 ). However there is an MA(l) process with autocovariances 1 and f3
at lags 0 and 1 if and only if I /31 ::;; 1/2. (See Example 1 .5. 1 .)
It is often the case that moment estimators, i.e. estimators which (like cf>)
are obtained by equating theoretical and sample moments, are far less efficient
than estimators obtained by alternative methods such as least squares or
maximum likelihood. For example, estimation of the coefficient of an MA(1)
process by equating the theoretical and sample autocorrelations at lag 1 is
very inefficient (see Section 8.5). However for an AR(p) process, we shall see
that the Yule-Walker estimator, cf>, has the same asymptotic distribution as
n --> oo as the maximum likelihood estimator of cj) to be discussed in Sections
8.7 and 8.8.
Theorem 8.1.1. If {X, } is the causal AR(p) process (8. 1 . 1 ) with { Z, }
�
IID(O, (J 2 ),
§8.2. Preliminary Estimation, the Durbin-Levinson Algorithm
24 1
and � is the Yule - Walker estimator of cj}, then
n l 12 (� - cj)) => N(O, u 2 rp- l ),
where rP is the covariance matrix [y(i - j)JL= I · Moreover,
a- 2 .:. (T2.
PROOF. See Section 8. 1 0.
D
Theorem 8. 1 . 1 enables us in particular to specify large-sample confidence
regions for cj) and for each of its components. This is illustrated in Example
8.2. 1 .
I n fitting autoregressive models to data, the order p will usually be
unknown. If the true order is p and we attempt to fit a process of order m, we
should expect the estimated coefficient vector �m = (�ml , . . . , �mmY to have a
small value of �mm for each m > p. Although the exact distribution of �mm for
m > p is not known even in the Gaussian case, the following asymptotic result
is extremely useful in helping us to identify the appropriate order of the process
to be fitted.
Theorem
8.1.2. If {X, } is the causal AR(p) process (8. 1 . 1 ) with { Z,}
A
and if cj}m = (f/Jm l , . . . , ¢mmY = R ;;. I Pm• m > p, then
n l12 (�m - cj)m ) => N(O, u2 r,;; l ),
A
A
�
� IID(O, u 2 ),
where cj}m is the coefficient vector of the best linear predictor cj}�Xm of Xm+ l
based on X m = (Xm • . . . , X d' , i.e. cj}m = R ;;. 1 Pm · In particular for m > p,
PROOF. See Section 8. 1 0.
D
The application of Theorem 8. 1 .2 to order selection will be discussed in
Section 8.2 in connection with the recursive fitting of autoregressive models.
§8.2 Preliminary Estimation for Autoregressive
Processes Using the Durbin- Levinson Algorithm
Suppose we have observations x 1, . . . , xn of a zero-mean stationary time series.
Provided y(O) > 0 we can fit an autoregressive process of order m < n to the
data by means of the Yule-Walker equations. The fitted AR(m) process is
8. Estimation for ARMA Models
242
where from (8. 1 .7) and (8. 1 . 8),
(8.2.2)
and
(8.2.3)
Now if we compare (8.2.2) and (8.2.3) with the statement of Corollary 5. 1 . 1 ,
we see that �m and {jm are related to the sample autocovariances i n the same
way that �m and vm are related to the autocovariances of the underlying
process {Xr } · (As in Theorem 8. 1 .2, �m is defined as the coefficient vector of
the best linear predictor ��Xm of Xm +1 based on X m = (Xm , . . . , X 1 ) ; vm is the
corresponding mean squared error.)
Consequently (if y(O) > 0 so that R1 , R 2 , are non-singular) we can use
the Durbin-Levinson algorithm to fit autoregressive models of successively
increasing orders 1, 2, . . . , to the data. The estimated coefficient vectors �� >
� 2 , . . . , and white noise variances 0 1 , 0 2 , , are computed recursively from
the sample co variances just as we computed �1 , � 2 , . . . , and v 1 , v 2 , , from
the covariances in Chapter 5. Restated in terms of the estimates �m , vm , the
algorithm becomes:
'
. . •
• • .
• • •
Proposition 8.2.1 (The Durbin- Levinson Algorithm for Fitting Autoregressive
Models). If y(O) > 0 then the fitted autoregressive models (8.2. 1 ) for m = 1 , 2,
. . . , n - 1 , can be determined recursively from the relations, �1 1 = p(l), 01 =
y(O) [1 - ,0 2 ( 1 )],
(8.2.4)
(8.2.5)
and
(8.2.6)
Use of these recursions bypasses the matrix inversion required in the
direct computation of �m and vm from (8. 1 .7) and (8. 1 .8). It also provides
us with estimates �1 1 , �2 2 , . . , of the partial autocorrelation function at
lags 1, 2, . . . . These estimates are extremely valuable, first for deciding on
the appropriateness of an autoregressive model, and then for choosing an
appropriate order for the model to be fitted.
We already know from Section 3.4 that for an AR( p) process the partial
autocorrelations a(m) = rPmm , m > p, are zero. Moreover we know from
Theorem 8. 1 .2 that for an AR(p) process the estimator �mm , is, for large n and
each m > p, approximately normally distributed with mean 0 and variance
.
§8.2. Preliminary Estimation, the Durbin- Levinson Algorithm
243
1/n. If an autoregressive model is appropriate for the data there should
consequently be a finite lag beyond which the observed values �mm are
compatible with the distribution N(O, 1/n). In particular if the order of the
process is p then for m > p, �mm will fall between the bounds ± l .96n- 1 12 with
probability close to .95. This suggests using as a preliminary estimator of p
the smallest value ofr such that l �mm l < l .96n- 112 for m > r. (A more systematic
approach to order selection based on the AICC will be discussed in
Section 9.2.) Once a value for p has been selected, the fitted process is specified
by (8.2. 1 ), (8.2.2) and (8.2.3) with m = p.
Asymptotic confidence regions for the true coefficient vector lj)P and for its
individual components r/JPi can be found with the aid of Theorem 8. 1 . 1 . Thus,
if xi ( P) denotes the ( 1 - a) quantile of the chi-squared distribution with p
degrees of freedom, then for large sample size n, the region
-a
(8.2.7)
contains <I>P with probability close to (1 - a) . (See Problems 1 . 1 6 and 6. 1 4.)
Similarly, if <l> 1 _a denotes the (1 - a) quantile of the standard normal distri­
bution and vjj is the r diagonal element of vp rp- l ' then for large n the interval
{ r/J E iR : l r/J - �Pii � n- 1 12 <1> 1 -a/2 vJF }
(8.2.8)
contains r/Jpi with probability close to ( 1
-
a).
EXAMPLE 8.2. 1 . One thousand observations x 1 , . . . , x 1 000 of a zero-mean
stationary process gave sample autocovariances y(O) = 3.6840, ]1 ( 1 ) = 2.2948
and ]1(2) = 1 .849 1 .
Applying the Durbin-Levinson algorithm to fi t successively higher order
autoregressive processes to the data, we obtain
�1 1 = p(l) = .6229,
V I = ]1 (0) ( 1 - p 2 ( 1 )) = 2.2545,
�2 2 = []1 (2) - �1 1 y( l) ] /v l = . 1 861,
�2 1 = �1 1 - �2 2 �1 1 = .5070,
v 2 = V 1 ( 1 - �12 ) = 2. 1 764.
The computer program PEST can be used to apply the recursions
(8.2.4)-(8.2.6) for increasing values of n, and hence to determine the sample
partial autocorrelation function (foii• shown with the sample autocorrelation
function p(j) in Figure 8. 1 . The bounds plotted on both graphs are the values
± 1 .96n - 1 1 2 .
Inspection of the graph of �ii strongly suggests that the appropriate model
for this data is an AR(2) process. Using the Yule-Walker estimates �2 1 , �22
and v 2 computed above, we obtain the fitted process,
{ Z, } "' WN(0, 2. 1 764).
244
8. Estimation for ARMA Models
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0. 1
0
-0. 1
-0.2
0.3
-0.4
-0.5
-0.6
- 0. 7
-0.8
-0.9
-1
0
10
20
30
40
20
30
40
(a)
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0. 1
0
-0. 1
-0.2
-0.3
-0.4
-0.5
-0.6
-0.7
-0.8
-0.9
-1
0
10
(b)
Figure 8. 1 . The sample ACF (a) and PACF (b) for the data of Example 8.2. 1 , showing
the bounds ± 1 .96n�1f2 •
§8.3. Preliminary Estimation for Moving Average Processes
245
From Theorem 8. 1 . 1 , the error vector cf, cJ1 is approximately normally
distributed with mean 0 and covariance matrix,
-
n- 1 v 2 r� 2- 1
=
=
[
n- 1 1 -
[
2
p(j) ¢l2i
�
i
.000965
- .000601
A
J[
1
p(1)
]
- .00060 1
.000965 ·
From (8.2.8) we obtain the approximate .95 confidence bounds, Ji ±
1 .96(.000965) 1 12 for f/J; , i = 1 , 2. These are .5070 ± .0609 for f/J 1 and
. 1 86 1 ± .0609 for ¢l2 .
The data for this example came from a simulated AR(2) process with
coefficients f/J1 = .5, f/J2 = .2 and white noise variance 2.25. The true coeffi­
cients thus lie between the confidence bounds computed in the preceding
paragraph.
§8.3 Preliminary Estimation for Moving Average
Processes Using the Innovations Algorithm
Just as we can fit autoregressive models of orders 1 , 2, . . . , to the data
x1 ,
, x. by applying the Durbin-Levinson algorithm to the sample auto­
covariances, we can also fit moving average models,
• • •
{Z1 }
� WN(O, vm ),
(8.3. 1 )
of orders m = 1 , 2, . . . , by means of the innovations algorithm (Proposition
5.2.2). The estimated coefficient vectors am := (Om 1 , . . . , emm )', and white noise
variances vm , m = 1 2, . . . , are specified in the following definition. (The
justification for using estimators defined in this way is contained in Theorem
8.3. 1 .)
Definition 8.3.1 (Innovation Estimates of Moving Average Parameters). If
y(O) > 0, we define the innovation estimates am , vm appearing in (8.3.1) for
m = 1 , 2, . . . , n - 1 , by the recursion relations, v0 = y(O),
k = 0, .
. .
,m -
1 , (8.3.2)
and
m- 1
vm = y(O) - L e�.m-A ·
j=O
(8.3.3)
8. Estimation for A R M A M odels
246
Theorem 8.3.1 (The Asymptotic Behavior of Om ). Let { X, } be the causal
invertible ARMA process ifJ(B)X, = B(B) Z,, {Z, } "' 110(0, a2 ), EZ� < oo, and
let t/l (z) = L.i=o t/lj z j = () (z)/ifJ(z), l z l :-:;; 1 , (with t/10 = 1 and t/lj = 0 for j < 0).
Then for any sequence of positive integers { m(n), n = 1 , 2, . . . } such that m < n,
m -> oo and m = o(n 1 13 ) as n -> oo, we have for each k,
where A = [a;J �. j = l and
min(i. j)
aij = I t/1; - ,t/lj - r·
r=1
Moreover,
PROOF.
See Brockwell and Davis ( 1988b).
0
Remark. Although the recursive fitting of moving average models using
the innovations algorithm is closely analogous to the recursive fitting of
autoregressive models using the Durbin�Levinson algorithm, there is one
important distinction. For an AR(p) process the Yule� Walker estimator
�P = (�P 1 , , �PP )' is consistent for cj}P (i.e. �P � cj)P ) as the sample size n -> oo .
However for a n MA(q) process the estimator Oq = (Oq 1 ' . . . ' eqq)' i s not consistent
for the true parameter vector 9q as n -> oo. For consistency it is necessary to
use the estimators (Om 1 , , emqY of oq with { m(n)} satisfying the conditions of
Theorem 8.3. 1 . The choice of m for any fixed sample size can be made by
increasing m until the vector (Om 1 , , emqY stabilizes. 1t is found in practice that
there is a large range of values of m for which the fluctuations in Omj are
small compared with the estimated asymptotic standard deviation
n - 11 2 (IJ : � 8;,d 1 1 2 as given by Theorem 8.3.1.
. • •
• • •
. • .
We know from Section 3.3 that for an MA(q) process the autocorrelations
p (m), m > q, are zero. Moreover we know from Bartlett's formula (see Example
7.2.2) that the sample autocorrelation p(m), m > q, is approximately normally
distributed with mean p (m) = 0 and variance n - 1 [1 + 2p 2 (1) + · · · + 2 p 2 (q)].
This result enables us to use the graph of p(m), m = 1 , 2, . . . , both to decide
whether or not a given set of data can be plausibly modelled by a moving
average process and also to obtain a preliminary estimate of the order q. This
procedure was described in Example 7.2.2.
If, in addition to examining p(m), m = 1, 2, . . . , we examine the coefficient
vectors Om , m = 1, 2, . . . , we are able not only to assess the appropriateness
of a moving average model and estimate its order q, but also to obtain
preliminary estimates Om 1 , . . . , emq of the coefficients. We plot the values
em 1 , . . . , emm• 0, 0, . . . for m = 1 , 2, . . . , increasing m until the values stabilize
§8.3. Preliminary Estimation for Moving Average Processes
247
(until the fluctuations in each component are of order n - 1 12 , the asymptotic
standard deviation of 8m 1 ). Since from Theorem 8.3. 1 the asymptotic variance
of {jmj is (J/(81 ' . . . ' ej - 1 ) = n- 1 It:b ef, we also plot the bounds ± 1 .9Mj where
tri = (Ji{jm 1 , . . . , em , j - 1 ). A value of {jmi outside these bounds ,suggests that the
corresponding coefficient ei is non-zero. The estimate of ei is emi and the largest
lag for which {jmi lies outside the bounds ± 1 .96ai is the estimate of the order
q of the moving average process. (A more systematic approach to order
selection using the AICC will be discussed in Section 9.2.)
Asymptotic confidence regions for the coefficient vector Oq and for its
individual components can be found with the aid of Theorem 8.3. 1 . For
example an approximate .95 confidence interval for ei is given by
{
8 E IR . 1 8 - em) � 1 .96n - 1/2
,
A
( )}
j- 1 ' 1 /2
.
em2k
k�O
(8.3.4)
ExAMPLE 8.3. 1 . One thousand observations x 1 , . . . , x 1 000 of a zero-mean sta­
tionary process gave sample autocovariances y(O) = 7.554 1 , y (l) = - 5. 1 24 1
and y (2) = 1 .3805.
The sample autocorrelations and partial autocorrelations for lags up to 40
are shown in Figure 8.2. They strongly suggest a moving average model of
order 2 for the data. Although five sample autocorrelations at lags greater
than 2 are outside the bounds ± 1 .96n- 1 12 , none are outside the bounds
± 1.96n - 1 12 [ 1 + 2p 2 ( 1 ) + 2p 2 (2) ] 112 .
Applying the innovations algorithm to fit successively higher moving
average processes to the data, we obtain v0 = 7.5541 ,
{} , ,
p(l) = - .67832,
v 1 = y(O) - {}f, v0 = 4.0785,
{}22 = v()' ]1 (2) = . 1 8275,
{}2 1 = V� 1 [Y ( 1 ) - {j22 {jl l (j0 ] = - 1 .0268,
V2 = y(O) - 8i2 Do 8i 1 V 1 = 3.0020.
=
-
Option 3 of the program PEST can be used to appl}' the recursions (8.3.2) and
(8.3.3) for larger values of m. The estimated values emi , j = 1, . . . , 1 0 and vm are
shown in Table 8. 1 for m = 1 , , . . , 1 0, 20, 50 and 1 00. It is clear from the table
that the fluctuations in the coefficients from m = 7 up to 1 00 are of order
l 000 - 1 12 = .032. The values of 87i , j = 1 , . . . , 7, plotted in Figure 8.3 confirm
the MA(2) model suggested by the sample autocorrelation function.
The model fitted to the data on the basis of 07 is
X, = Z, - 1 .4 1 Z,_ 1
+
.60Z,_ 2 ,
{Z, }
�
WN(0, 2.24).
(8.3.5)
In fact from Table 8.1 we see that the estimated coefficients show very little
change as m varies between 7 and 1 00.
8. Estimation for ARMA Models
248
1 �-------,
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0. 1
0
- 0. 1
-0.2
- 0.3
-0.4
- 0.5
- 0.6
-07
-0.8
- 0.9
-1
{iJ
Ill
�
'.,(
"' )o( )<>{
"' v
��±/�
\ ��������������������
� �������
=
OH::r
n..Q
-to
19-t'
20
10
0
-=
Q.
30
� �
�
40
(a)
1
0.9
0.8
07
0.6
0.5
0.4
0 . .3
() 2
0.1
0
-0. 1
-0 2
-0.3
-0.4
-0.5
=� �
-0.8
-0.9
-1
i/
lf======��d��
�������������������
1
""=.R-Et
)a'
CJ
.=
'-""1'? n ..-Iii',;;/ ,_...c:tcr
�
0
10
20
30
40
(b)
Figure 8.2. The sample ACF (a) and PACF (b) for the data of Example 8.3. 1 , showing
the bounds ± ! .96n - 112 •
249
§8.3. Preliminary Estimation for Moving Average Processes
Table 8. 1 . Bmi ' j = 1 , . . . , 1 0, and vm for the Data of Example 8.3.1
(}mj
�
1
2
3
4
5
6
7
8
9
10
20
50
1 00
1
2
- 0.68
- 1 .03
- 1 .20
- 1.31
- 1 .38
- 1 .4 1
- 1 .41
- 1 .4 1
- 1 .4 1
- 1 .4 1
- 1 .43
- 1 .43
- 1 .43
.18
.37
.44
.5 1
.57
.60
.61
.61
.61
.63
.62
.62
4
3
.03
- .04
- .03
- .0 1
- .0 1
- .02
- .02
- .02
- .03
- .02
- .03
.07
- .04
- .02
- .02
- .03
- .03
- .02
- .02
- .02
- .0 1
5
()m
7
6
.06
.10
.10
.10
.10
.12
.11
.12
.11
- .02
- .05
- .07
- .08
- .07
- .08
- .08
- .08
8
- .0 1
- .02
- .02
.00
.00
.00
- .0 1
.00
.01
.05
.03
.03
.04
9
10
.01
.04
.02
.02
.01
4.08
3.00
2.65
2.40
2.27
2.24
2.24
2.24
2.24
2.22
2. 1 6
2. 1 0
2.00
.02
- .03
- .03
- .03
An alternative method for obtaining preliminary estimates of the coeffi­
cients (once q has been determined) is to equate the theoretical and sample
autocorrelations at lags 1, . . . , q and solve the resulting non-linear equations
for 81 , . . . , eq . Using the algorithm of Wilson ( 1 969) to determine the solution
0 8
0.6
0.4
0.2
-
0
-0.2
�
.'\.
-----...
------
�
-0.4
-0.6
-0.8
-1
- 1 .2
- 1 .4
- 1 .6
0
2
3
4
5
6
7
Figure 8.3. The estimates e7i, j = 1 , . . . , 7, for the data of Example 8.3 . 1 , showing the
bounds ± 1 .96(I{ : � e�k) 1 1 2 n - 112 .
250
8. Estimation for ARMA Models
for (8 1 , 82 ) such that 1 + 81 z + 82 z 2
X, = Z, - 1 .49Z,_ 1
+
X, = Z, - 1 40Z, 1
+
0 for l z l
.67Zi _ 2 ,
1 , we arrive at the model,
{ Z,} - WN(0, 2.06).
.60Z, _ 2 ,
{ Z,}
f=
<
The actual process used to generate the data in this example was the Gaussian
moving average,
.
_
�
WN(O, 2.25).
It is very well approximated by the preliminary model (8.3.5).
§8.4 Preliminary Estimation for ARMA(p, q)
Processes
Let {X,} be the zero-mean causal ARMA(p, q) process,
X, - r/J1 X,_ 1 - · · · - r/Jp Xr-p = Z, + 8 1 Z,_ 1
The causality assumption ensures that
+ ··· +
8qZr-q•
(8.4. 1 )
{ Z, } - WN(O, (J 2 ).
00
X, = L t/lj Zr-j •
j=O
where by (3.3.3) and (3.3.4), the coefficients t/Ji satisfy
{t/10
= 1,
t/lj = 8j +
min (j,p)
i�
r/J; t/lj - i •
j
=
(8.4.2)
1, 2, . . .
and by convention, 8i = 0 for j > q and r/Ji = 0 for j > p. To estimate
t/l t , . . . , t/lp+q • we can use the innovation estimates (jm l , . . . , em ,p+q • "';:hose
asymptotic behaviour is specified in Theorem 8.3. 1 . Replacing t/li by 8mi in
(8.4.2) and solving the resulting equations,
min(j,p)
emj = 8j + L ,pi em .j -i •
i= 1
j = 1 , 2,
" '
(8.4.3)
, p + q,
for ell and 0, we obtain initial parameter estimates � and 0. From equations
(8.4.3) with j = q + 1, . . . , q + p, we see that � should satisfy the equation,
em . q+ t
em , q+ 2
[ l[
�
em, +p
=
em , q
em ,q+l
�
em, + p - 1
Having solved (8.4.4) for
then easily found from
: : : �m,q+l - p
8m ,q+2 - p
.
.
.
em , q
r/J1
l [rPz]
..
.
r/Jp
.
(8.4.4)
cf, (which may not be causal), the estimate of 0 is
25 1
§8.4. Preliminary Estimation for ARMA(p, q) Processes
0.
�
0.8
�---
-
0.7
0.6
-------
\
0.5
0.4
0.3
0.2
0. 1
0
-0.1
-0.2
-0.3
-0.4
-0.5
-0.6
-0.7
-0.8
-0.9
-1
0
10
20
30
40
20
30
40
(a)
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
-0.1
-0.2
-0.3
-0.4
-0.5
-0.6
-0.7
-0.8
- 0 .9
- 1
0
10
(b)
Figure 8.4. The sample ACF (a) and PACF (b) for the data of Example 8.4. 1 , showing
the bounds ± l .96n - 112•
8. Estimation for ARMA Models
252
j = 1 , 2, . . . , q.
(8.4.5)
Finally the white noise variance CJ 2 is estimated by
In the case of a pure moving average process, p = 0 and the method reduces
to the one described in Section 8.3.
EXAMPLE 8.4. 1 . The sample autocorrelation function and partial autocorrela­
tion function of a zero-mean time series of length 200 are shown in Figure
8.4. Identification of an appropriate model is much less obvious than in
Examples 8.2. 1 and 8.3. 1 . However we can proceed as follows. First use
program PEST, Option 3, to fit a moving average model (8.3. 1 ), with m
chosen so as to give the smallest AICC value. (The AICC is a measure of
goodness of fit, defined and discussed later in Section 9.3.) For this example
the minimum occurs when m = 8 and the corresponding moving average
model has coefficients as follows :
Table 8.2. B8. i , j = 1 , . . . , 8, for the Data of Example 8.4. 1
1
1 .341
2
1 .0 1 9
3
.669
4
.423
5
.270
6
. 1 29
7
.0 1 1
8
-.115
The next step is to search for an ARMA(p, q) process, with p and q small,
such that the equations (8.4.3) are satisfied with m = 8. For any given p and
q (with p + q :<:::; 8), the equations (8.4.3) can be solved for q, and 9 using
Option 3 of PEST with m set equal to 8. At the same time the program
computes the AICC value for the fitted model. The procedure is repeated
for values of p and q such that p + q :<:::; 8 and models with small AICC value
are noted as potentially useful preliminary models. In this particular example
the AICC is minimized when p = q = 1 and the corresponding preliminary
model is
{Z,} � WN(O, 1 .097).
X, - .760X, _ 1 = Z, + .582Z, _ 1 ,
This has a close resemblance to the true model, X, - .8X, _ 1 = Z, + .6Z, _ 1 ,
with {Z,} � WN(O, 1 ), which was used to generate the data. In general the
resemblance will not be so close, so it is essential that preliminary estimation
be followed by application of a more efficient procedure (see Section 8.5).
For larger values of p and q, it is preferable to carry out the search procedure
using maximum likelihood estimation (Option 8 of PEST) without
preliminary estimation. Thus we can fit maximum likelihood models with
p + q = 1 , then p + q = 2, p + q = 3, . . . , using lower order models with
appended zero coefficients as initial models for the likelihood maximization.
(See Sections 8.7 and 9.2.)
§8.5. Remarks on Asymptotic Efficiency
253
§8.5 Remarks on Asymptotic Efficiency
The preliminary estimates (�, 9, 8 2 ) of the parameters in the ARMA(p, q)
model discussed in Section 8.4 are weakly consistent in the sense that
p
A p
p
q, ---> q,, 0 ---> 0 and 8 2 ---> a 2 as n ---> oo .
A
This i s because (with m(n) satisfying the conditions o f Theorem 8.3. 1 ) B i � 1/Ji
m
and O � a 2 • Hence (�, 0) must converge in probability to a solution of (8.4.2),
m
i.e. to (q,, O). In fact using Theorem 8.3. 1 , it may be shown (see Problem 8.22
and Brockwell and Davis ( 1 988a)) that
� q, + Op(n- 112 ) and 9 = 0 + Op(n- 112 ).
=
In the next section we discuss a more efficient estimation procedure (strictly
more efficient if q � 1) of (<j>, 9) based on maximization of the Gaussian
likelihood. We first introduce, through an example, the concept of relative
efficiency of two competing estimators. Consider the MA( 1 ) process
X, Z, + 8Z, _ 1 where I 8 1 < 1 and { Z,} "' IID(O, a 2 ). If (J�l J and 8�21 are two
estimators of 8 based on the observations X 1 , . . , Xn such that 0�1 is
AN(8, aN8)/n), i = 1 , 2, then the asymptotic efficiency of 0�1 l relative to 0�2 l
is defined to be
ai (8)
e(8 , (Jo l, (J< 2l )
.
ai {8)
=
.
=
(This notion of efficiency extends in an obvious way to more general
estimation problems.) If e(8, {J< l J, 0( 2 )) ::::; 1 for all 8 E ( - 1, 1 ) then we say that
0�2) is a more efficient estimator of 8 than e�l ) (strictly more efficient if in
addition e(8, 8< 1 1, 8( 2 )) < 1 for some 8 E ( - 1, 1 )). For the MA( 1 ) process let
{J�l l denote the moment estimator of 8 obtained by solving the equations
y(O) 8 2 ( 1 + 0 2 ) and y(1 ) 8 2 0 for 8 and e. If I !J( l ) l > !- there is no real
solution {J so we define {J sgn(p( 1 )). If I P{ 1 ) 1 ::::; !- then
p(l) = ()� 1 )/(1 + ( 8� 1 )) 2 ).
=
=
=
In general therefore we can write,
8� 1 )
where
{
-1
(1 - ( 1
1
=
g(p(1))
if x < 1
4x 2 ) 112 )/2x if l x l ::::; !-,
g(x)
if x > }.
From Theorem 7.2.2, p( 1 ) is AN(p(1 ), ( 1 - 3p 2( 1 ) + 4p 4( 1))/n), and so by
Proposition 6.4. 1 ,
B� 1 l i s AN( g (p(1 )), af(8)/n),
=
where
-
-
,
254
8. Estimation for ARMA Models
a f (8) = [g'(p( 1 ))] 2 [ 1 - 3 p 2 ( 1 ) + 4p4(1 )]
= ( 1 + ()2 + 484 + 86 + 88 )/( 1 - 82 )2 .
If we now define 0�2 ) = em ! ' the estimator obtained from the innovations
algorithm, then by Theorem 8.3. 1 ,
0�2 ) i s AN(8, n -1 ).
Thus e(0, 8< 1 ), 0(2 )) = aj 2 (8) :::;; 1 for all 181 < 1 , with strict inequality when
8 # 0. In particular
{
.82,
8 = .25,
8 = .5o,
e(8, 8< 1 ), 0(2 )) = .37,
8 = .75,
.06,
1
demonstrating the superiority of 0�2 ) over 0� ). We shall see in Example 8.8.2
that the maximum likelihood estimator 0�3 ) is AN(8, ( 1 - 8 2 )/n). Hence
{
8 = .25,
.94,
8 = .5o,
e(8, 0(2 ), 0(3 )) = .75,
8 = .75 .
.44,
While 0�3 ) is more efficient, 0�2 ) has reasonably good efficiency except when 1 8 1
i s close t o 1 . The superiority of maximum likelihood estimators from the point
of view of asymptotic efficiency holds for a very large class of time-series
models.
§8.6 Recursive Calculation of the Likelihood of an
Arbitrary Zero- Mean Gaussian Process
In this section { X1 } is assumed to be a Gaussian process with mean zero
and covariance function K(i,j) = EXiXj . Let xn = (X 1 ' . . . ' Xn Y and let xn =
(X 1 , , X" )' where X 1 = 0 and Xi = E(Xii X 1 , • • • , Xi - d = PS!i(x, x1_ . ) Xi,
j � 2. Let r" denote the covariance matrix, r" = E(X" X�), and assume that r"
is non-singular.
The likelihood of X" is
• ....
. . •
(8.6. 1 )
The direct calculation o f det r" and r"- 1 can be avoided b y expressing this in
terms of the one-step predictors Xi , and their mean squared errors vi _ 1 ,j = 1 ,
. . . , n, both of which are easily calculated recursively from the innovations
algorithm, Proposition 5.2.2.
Let 8ii, j = 1, . . , i; i = 1, 2, . . , denote the coefficients obtained when
Proposition 5.2.2 is applied to the covariance function K of { X1 }, and let
8i0 = 1 ' 8ij = 0 for j < 0, i = 0, 1 ' 2, . . . . Now define the n X n lower triangular
matrix,
.
.
255
§8.6. Recursive Likelihood Calculation
and the n
x
C = [ 8;.;-J?.}� 0,
n diagonal matrix,
(8.6.2)
D = diag(v0, v 1 , . . . , vn - d ·
(8.6.3)
The innovations representation (5.2. 1 5) of Xi ,j = 1, . . . , n, can then be written
in the form,
X" = (C - J) (X" - X"),
where I is the n
x
n
identity matrix. Hence
X" = X" - X n + X n = C(Xn - X") .
(8.6.4)
Since D is the covariance matrix of (X" - X"), it follows that
1" = CDC'
(8.6.5)
(from which the Cholesky factorization 1" = U U', with U lower triangular,
can easily be deduced).
From (8.6.4) and (8.6.5), we obtain
n
x� rn- 1 X n = (X" - xn yD- 1 (Xn - X") = L (Xj - xy;vj-1 •
j=l
(8.6.6)
and
det 1" = (det C)2 (det D) = v 0 v 1 · · · vn-! ·
The likelihood (8.6. 1 ) of the vector X" therefore reduces to
L ( 1") = (2 n) - "1 ( v0 · · · vn-! ) - 1 12 exp
2
{ -t� (Xi -
(8.6.7)
}
XY/vj - 1 .
(8.6.8)
Applying Proposition 5.2.2 to the covariance function K gives X 1 , X 2, . . . , v0 ,
v 1 , . . , and hence L(1").
If rn is expressible in terms of a finite number of unknown parameters /31 '
. . . , {3,, as for example when { X1 } is an ARMA(p, q) process and r = p + q + 1 ,
i t is usually necessary t o estimate the parameters from the data X"" A standard
statistical procedure in such situations (see e.g. Lehmann ( 1 983)) is to maxi­
mize the likelihood L(/31 , . . . , /3, ) with respect to /31 , . . . , {3,. In the case when
are independently and identically distributed, it is known that
X1 , X2 ,
under rather general conditions the maximum likelihood estimators are
consistent as n --> oo and asymptotically normal with variances as small or
smaller than those of any other asymptotically normal estimators. A natural
estimation procedure for Gaussian processes therefore is to maximize (8.6.8)
with respect to {3 1 , . . . , {3,. The dependence of the sequence {X" } must however
be kept in mind when studying the asymptotic behaviour of the estimators.
(See Sections 8.8, 8. 1 1 and 1 0.8 below.)
Even if {X,} is not Gaussian, it makes sense to regard (8.6.8) as a measure
of the goodness of fit of the covariance matrix rn (/31 ' . . . ' /3, ) to the data, and
.
• • •
256
8. Estimation for ARMA Models
still to choose the parameters {31 , . . • , {3, in such a way as to maximize (8.6.8).
We shall always refer to the estimators /31 , . . . , /3, so obtained as "maximum
likelihood" estimators, even when { Xr } is not Gaussian. Regardless of the joint
distribution of X 1 , . . . , x., we shall also refer to (8.6. 1 ) (and its algebraic
equivalent (8.6.8)) as the "Gaussian likelihood" of X 1 , . . . , X• .
§8.7 Maximum Likelihood and Least Squares
Estimation for ARMA Processes
Suppose now that { Xr } is the causal ARMA(p, q) process,
Xr = f/J 1 Xr - t + · · · + f/JpXr - p + eozr + · · · + eqzr - q•
{Zr } � WN(O, a2),
(8.7. 1)
where e0 = 1 . The causality assumption means that 1 - f/J 1 z - · - f/Jp z P -:/- 0
for l z l � 1 . To avoid ambiguity we shall assume also that the coefficients e;
and white noise variance a2 have been adj usted (without affecting the autoco­
variance function of { Xr } ) to ensure that e(z) = 1 + e1 z + + eq z q -:/- 0 for
l z l < 1 . Our first problem is to find maximum likelihood estimates of the
parameter vectors cj) = (f/J1 , . . . , ¢JP )', 6 = (e1 , . . . , eq )' and of the white noise
variance a2•
In Section 5.3 we showed that the one-step predictors X;+t and their mean
squared errors are given by,
·
·
· · ·
and
(8.7.3)
where eij and r; are obtained by applying Proposition 5.2.2 to the covariance
function (5.3.5). We recall also that eij and r; are independent of a2•
Substituting in the general expression (8.6.8), we find that the Gaussian
likelihood of the vector of observations x. = (X 1 , . . . , X.)' is
[
L(cj}, 6, a2) = (2na2) - "12(r0 . . · r. - 1 r 1 12 exp - t a - 2
j� (Xj - XYh-1 ] .
(8. 7.4)
Differentiating In L(cj), 6, a2) partially with respect to a2 and noting that Xj
and rj are independent of a2, we deduce (Problem 8. 1 1) that the maximum
likelihood estimators �, 9 and r'J2 satisfy
§8.7. Maximum Likelihood and Least Squares ARMA Estimation
257
(8.7.5)
where
n
" (Xi - X�) 2h 1 ,
S(<j>, 9) = L...
j=1
A
A
and cj,, 0 are the values of <j>, 9 which minimize
(8.7.6)
n
/(<j>, 9) = ln(n- 1 S(<j>, 9)) + n - 1 L ln ri_ 1 .
j=1
(8.7. 7)
We shall refer to /(<j>, 9) as the "reduced likelihood". The calculation of /(<j>, 9)
can easily be carried out using Proposition 5.2.2 which enables us to compute
8; - 1 , i , r; _ 1 and X; recursively for any prescribed pair of parameter vectors <j>,
9. A non-linear minimization program is used in the computer program PEST,
in conjunction with the innovations algorithm, to search for the values of 4>
and 9 which minimize /(<j>, 9). These are the maximum likelihood estimates of
4> and 9 respectively. The maximum likelihood estimator of a 2 is then found
from (8.7.5).
The search procedure may be greatly accelerated if we begin with parameter
values <j>0 , 90 which are close to the minimum of /. It is for this reason that
simple, reasonably good preliminary estimates of 4> and 9, such as those
described in Sections 8.2, 8.3 and 8.4, are important. It is essential to begin
the search with a causal parameter vector <j>0 since causality is assumed in the
computation of l(<j>, 9). Failure to do so will result in an error message from
the program. The estimate of 4> returned by the program is constrained to be
causal. The estimate of 9 is not constrained to be invertible, although if the
initial VeCtOr 9o satisfies the condition 1 + 8o 1 Z + . . . + 8o qZ q i= 0 for lzl < 1
and if (<j>0 , 90) is close to the minimum, then it is likely that the value of 0
returned by the program will also satisfy 1 + 81 z + · · · + eq z q i= o for 1 z 1 < 1 .
I f not, i t is a simple matter t o adjust the estimates of a 2 and 9 i n order to
satisfy the condition without altering the value of the likelihood function (see
Section 4.4). Since we specified in (8. 7. 1) that 8(z) i= 0 for l zl < 1, the estimates
0 and 6-2 are chosen as those which satisfy the condition e(z) i= 0 for lzl < 1 .
Note however that this constraint i s not always desirable (see Example 9.2.2).
An intuitively appealing alternative estimation procedure is to minimize
the weighted sum of squares
n
S(<j>, 9) = "
(8.7.8)
(Xi - X�)2 h - 1 ,
jL...
=1
with respect to 4> and 9. The estimators obtained in this way will be referred
to as the "least squares" estimators � and 9 of 4> and 9. In view of the close
relationship (8. 7.7) between l(<j>, 9) and S(<j>, 9), the least squares estimators can
easily be found (if required) using the same computer program PEST. For the
minimization of S(<j>, 9) however, it is necessary not only to restrict <!> to be
causal, but also to restrict 9 to be invertible. Without the latter constraint
258
8. Estimation for ARMA M odels
there will in general be no finite (cp, 9) at which S achieves its minimum value
(see Problem 8.1 3). If n - 1 L J= t In rj-t is asymptotically negligible compared
with In S(cp, 9), as is the case when 9 is constrained to be invertible (since r" -> 1 ),
then from (8.7.7), minimization of S will be equivalent to minimization of l
and the least squares and maximum likelihood estimators will have similar
asymptotic properties. The least squares estimator afs is found from
(8.7.9)
where the divisor (n - p - q) is used (as in standard linear regression theory)
since a - 2 S(cji, 0) is distributed approximately as chi-squared with (n - p - q)
degrees of freedom (see Section 8.9).
§8.8 Asymptotic Properties of the Maximum
Likelihood Estimators
If {X, } is the causal invertible process,
x, - r/J1 x, _ 1 - · · • - r/Jp Xr - p = z, + 81 z,_ 1 + · · · + 8qZr-q •
{ Z, }
�
110(0, a 2 ),
(8.8. 1 )
and i f r/J ( · ) and 8 ( · ) have n o common zeroes, then the maximum likelihood
estimator �� = (J1 , . . . , Jp , () 1 , . . . , eq ) = (cf,', f)') is defined tO be the CaUSal inver­
tible value of �' = W, 9') which minimizes the reduced likelihood /(cp, 9) defined
by (8.7.7). The program PEST can be used to determine cf,, 0 numerically. It
also gives the maximum likelihood estimate 8 2 of the white noise variance
determined by (8.7.5).
The least squares estimators cji, 9 are the causal invertible values of cp and
9 which minimize ln(n - 1 S(cp, 9)) = /(cp, 9) - n - 1 I, }= 1 ln rj _ 1 • Because of the
invertibility the term n- 1 L }= 1 ln rj - t is asymptotically negligible as n -> oo and
the estimators cji and 9 have the same asymptotic properties as cf, and 0. It
follows, (see Theorem 1 0.8.2), that if { Z, } � 110(0, a 2 ) and rp( - ) and 8( " ) are
causal and invertible with no common zeroes, then
(8.8.2)
where the asymptotic covariance matrix V(�) can be computed explicitly from
(8. 1 1 . 14) (see also ( 10.8.30)). Specifically for p � 1 and q � 1,
Eu,v; - 1
(8.8.3)
EVt U't EVt Vt' '
where U, = ( U,, . . . , U, + 1 - p )', V, = ( V, , . . . , V,+ 1 -q)' and { U, } , { V, } are the auto­
V ( �)
regressive processes,
= a
2
[Eu,u;
rp (B) U, = Z,,
J
(8.8.4)
§8.8. Asymptotic Properties of the Maximum Likelihood Estimators
259
and
(8.8.5)
(For p = 0, V(p) = 0" 2 [EV1V;rt, and for q = 0, V(p) = 0"2 [EU1U;r1 .)
We now compute the asymptotic distributions for several special cases of
interest.
EXAMPLE 8.8. 1 (AR(p)). From (8.8.3),
V(cp) = 0"2 [EUiu;r 1 ,
•
where r/J(B) U1 = Z1 Hence
V(cp) = (}2 r; 1 ,
where rP = E(U1 U; ) = [EX; Xi ] fi= 1 , and
<f, is AN(cp, n - 1 0" 2 rp- 1 ).
In the special cases p = 1 and p = 2 it is easy to express rP- 1 in terms of cp,
giving the results,
EXAMPLE 8.8.2 (MA(q)). From (8.8.3)
where 8 (B) J!; = Z1 • Hence
V(O) = 0" 2 [EVI v;r t ,
v(o) = 0" 2 cqrt.
where rq* is the covariance matrix [E V; J.j] i. i= 1 of the autoregressive process
J!; + e1 J! ; _ 1 + · · · + eq J!; _ q = zl .
Inspection of the results of Example 8.8. 1 yields, for the cases MA(l) and
MA(2),
EXAMPLE 8.8.3 (ARMA( l , 1 )). In this case
V(r/J, 8) = 0" 2
[ E U,l
E U1 J!;
•
J
E UI v.I -1
,
E J!;2
where U1 - r/J U1_ 1 = Z1 and J!; + 8 J!; _ 1 = Z1 A simple calculation gives,
260
8. Estimation for ARMA Models
whence
These asymptotic distributions provide us with a general technique for
computing asymptotic confidence regions for <!> and 0 from the maximum
likelihood or least squares estimates. This is discussed in more detail, together
with an alternative technique based on the likelihood surface, in Section 8.9.
§8.9 Confidence Intervals for the Parameters of a
Causal Invertible ARMA Process
Large-sample confidence regions for the coefficients <j) and 0 of a causal
invertible ARMA process can be derived from the asymptotic distribution of
the maximum likelihood estimators in exactly the same way as those derived
from the asymptotic distribution of the Yule-Walker estimator of<!> in Section
8.2. For the process (8.8. 1 ) let P' = (<j)', 0' ) and let p be the maximum likelihood
estimator of p. Then defining V(p) by (8.8.3) we obtain the approximate ( 1 a)
confidence region for p,
-
(8.9. 1 )
Writing vjj(p) for ther diagonal element o f V(p), we also have the approximate
(1 ct) confidence region for f3i , i.e.
-
{ f3 E IR . I f3
0
/3) s
A
-
n
- 1/2
<1> 1 -a/2 Vii1/2 (p) } ·
A
(8.9.2)
An alternative approach, based on the shape of the reduced likelihood
surface, l(p) = l(<j>, 0), near its minimum can also be used. We shall assume
for the remainder of this section that { X, } is Gaussian. For large n, the
invertibility assumption allows us to approximate n exp(/(p)) (since
-1
n
I7�6 ln r; -+ 0) by
(8.9.3)
where
(8.9.4)
and the maximum likelihood estimator p by the value of p which minimizes
S(p), i.e. by the least squares estimator.
The behavior of P can then be investigated by making a further approx­
imation which reduces the problem to a standard one in the theory of the
§8.9. Confidence Intervals for ARMA Parameters
26 1
general linear model. To do this we define T = CT - 1 CD 112 , where C and D were
defined in Section 8.6, and let
Wn(�) = T - 1 Xn Then by (8.6.5), TT' = Gn(�) and by (8.6.4), the i1h component l¥.t i of l¥.t(�) is
(Xi - XJ/rl�� . The problem of finding is thus equivalent to minimizing
= Wn(�)'Wn(�) with respect to �· Now make the approximation that for
each j, awn;a{3j is constant in a small neighborhood of the true parameter
vector �*, and let
p
S(�)
x
a l¥.t i
= _
[
a�
(f3*)]p+j q 1 .
i.
=
If �0 is some fixed vector in this neighborhood, we can then write
I.e.
(8.9.5)
Yn = X �* + Wn(�*),
where Y n is the transformed vector of observations, Wn(�0) + X �0 , and the
components of Wn(�*) are independent with the distribution N(O, CT2).
Equation (8.9.5) is a linear model of standard type, and our estimator P is
the value of � which minimizes the squared Euclidean distance,
(8.9.6)
The standard theory of least-squares estimation for the general linear model
(see Section 2.6 and Problem 2. 1 9) suggests the approximations,
"'
2
2
2
2
and
and
q),
with
q)
p
CT X (n
"' CT X (p +
approximately independent. These observations yield the ( 1 - a) confidence
regions:
(S p) S(�*) S-(pS) (p)
S(�*) - S(p)
{ � E p+q .. S(�) -, S(p) s p + q F1 _,(p + q, n - p q) ,
S(�) n - p - q
}
{
S(p) S V S 2 S(p) ,
CT 2 : V E IH :
1H
� * ..
-
X 21 - of2 (n - P - q)
Xo!2 (n - P - q)
}
(8.9.7)
(8.9.8)
where F, and x; denote the a-quantiles of the F and x2 distributions. Since
the function can be computed for any value of � using the program PEST,
these regions can be determined numerically. Marginal confidence intervals
for the components of �* can be found analogously. These are:
S
:
{3*
1
{/3· S(�)S(�)S(p)
1
E IH :
�
s
t i -o!2 (n - p - q)
for
n-p-q
some � E !Hp +q with j'h component �} .
(8.9.9)
8. Estimation for ARMA Models
262
§8. 1 0* Asymptotic Behavior of the Yule-Walker
Estimates
Throughout this section, it is assumed that {X, } is a causal AR(p) process
(8. 1 0. 1 )
where { Z, } "' IID(O, a 2 ). The Yule-Walker estimates o f cp and a 2 are given by
equations (8. 1 .3) and (8. 1 .4), or equivalently by
� = f; 1 yp
and
8 2 = y(O) - y��-
lt will be convenient to express (8. 1 0. 1 ) in the form
where Y = (X 1 ,
• • •
l
Y = Xcp + Z
, Xn)', Z = (Z1 , . . . , Zn)' and X is the n
Xo x _ 1 . . . x, �,
Xo .
x2 - p
x1
X=
. .
�
J
x
(8. 10.2)
p design matrix,
xn - p
x - 1 xn -2
Because of the similarity between (8. 10.2) and the general linear model (see
Section 2.6 and Problem 2.1 9), we introduce the "linear regression estimate"
cp * of cp defined by
(8. 1 0.3)
cp * = (X' X) - 1 X'Y.
The vector cp* is not an estimator in the usual sense since it depends on the
values X 1 _ P ' X2 _ P' . . . , X" and not only on the observed values X 1 , , X".
Nevertheless, as we shall see, cp* and � have similar properties. This is because
n -1 (X'X) is the matrix whose (i,j)-element is equal to n - 1 l:� =� -i Xk Xk+ l i -il
and n - 1 X'Y is the vector whose i'h component is n- 1 l:�=� - i XkXk +i · Conse­
quently (see Proposition 7.3.5),
• . •
(8. 1 0.4)
The proof of Theorem 8. 1 . 1 is divided into two parts. First we establish the
limit distribution for cp* and then show that cp* and � must have the same
limit law.
Proposition 8.10.1. With cp * defined as in (8. 1 0.3)
n 112 (cp* - cp) = N(O, a 2 rp- 1 ).
PROOF. From (8. 10. 1 ) and (8. 10.2) we have
§8. 10.* Asymptotic Behavior of the Yule-Walker Estimates
n 112 (cj)* - cj))
=
=
By setting U,
=
(X,_ 1 ,
• • .
263
n 112 ((X'X) -1 X'(Xcj} + Z) - cj))
n (X'Xr 1 (n -112 X' Z).
, X,_ P)'Z,, t
�
1 we have
"
n -1!2 X' Z = n -1/2 L U, .
t= 1
Observe that E U, = 0 and that
,
rr 2 rP ,
h = 0,
E U,U,+h =
h # 0,
'
Op x p
since Z, is independent of (X,_ 1 , , X,_P). Let X, = L}= o t{lj Zr -j be the causal
representation of X,. For a fixed positive integer m set x�ml = L i= o t{lj Zr -j
and U!ml = (X!�l , . . . , X� �� )' Z, and let A. be a fixed element of W. Then A.' U!ml
is a strictly. stationary (m + p)-dependent white noise sequence with variance
given by rr2 A.T�mlA. where qml is the covariance matrix of (X! �[ , . . . , x:�� ).
Hence by Theorem 6.4.2, we have

$$n^{-1/2}\sum_{t=1}^{n}\lambda'\mathbf{U}_t^{(m)} \Rightarrow \lambda'\mathbf{V}^{(m)}, \quad\text{where } \mathbf{V}^{(m)} \sim N(\mathbf{0}, \sigma^2\Gamma_p^{(m)}).$$

Since σ²Γ_p^{(m)} → σ²Γ_p as m → ∞, we have λ′V^{(m)} ⇒ λ′V where V ∼ N(0, σ²Γ_p). Also it is easy to check that

$$n^{-1}\operatorname{Var}\Big(\lambda'\sum_{t=1}^{n}(\mathbf{U}_t^{(m)} - \mathbf{U}_t)\Big) = \lambda' E\big[(\mathbf{U}_t - \mathbf{U}_t^{(m)})(\mathbf{U}_t - \mathbf{U}_t^{(m)})'\big]\lambda \to 0 \quad\text{as } m \to \infty.$$

Since X_t^{(m)} converges in mean square to X_t as m → ∞, application of Proposition 6.3.9 and the Cramér-Wold device gives

$$n^{-1/2}X'\mathbf{Z} \Rightarrow N(\mathbf{0}, \sigma^2\Gamma_p).$$

It then follows from (8.10.4) that n(X′X)^{−1} →p Γ_p^{−1}, from which we conclude by Propositions 6.3.8 and 6.4.2 that n^{1/2}(φ* − φ) ⇒ N(0, σ²Γ_p^{−1}). □
PROOF OF THEOREM 8.1.1. In view of the above proposition and Proposition 6.3.3, it suffices to show that

$$n^{1/2}(\hat{\boldsymbol\phi} - \boldsymbol\phi^*) = o_p(1).$$

We have φ̂ = Γ̂_p^{−1}γ̂_p and φ* = n(X′X)^{−1}·n^{−1}X′Y, i.e.

$$n^{1/2}(\hat{\boldsymbol\phi} - \boldsymbol\phi^*) = n^{1/2}\hat\Gamma_p^{-1}(\hat{\boldsymbol\gamma}_p - n^{-1}X'\mathbf{Y}) + n^{1/2}\big(\hat\Gamma_p^{-1} - n(X'X)^{-1}\big)\,n^{-1}X'\mathbf{Y}. \qquad (8.10.5)$$
The ith component of n^{1/2}(γ̂_p − n^{−1}X′Y) is

$$n^{-1/2}\Big(\sum_{k=1}^{n-i}(X_k - \bar X_n)(X_{k+i} - \bar X_n) - \sum_{k=1-i}^{n-i}X_kX_{k+i}\Big) = -n^{-1/2}\sum_{k=1-i}^{0}X_kX_{k+i} + n^{1/2}\bar X_n\Big((1 - n^{-1}i)\bar X_n - n^{-1}\sum_{k=1}^{n-i}(X_k + X_{k+i})\Big), \qquad (8.10.6)$$

which by Theorem 7.1.2 and Proposition 6.1.1 is o_p(1). Next we show that

$$n^{1/2}\|\hat\Gamma_p^{-1} - n(X'X)^{-1}\| = o_p(1), \qquad (8.10.7)$$

where ‖A‖ is defined for a p × p matrix A to be the Euclidean length of the p²-dimensional vector consisting of all the components of the matrix. A simple calculation gives

$$n^{1/2}\|\hat\Gamma_p^{-1} - n(X'X)^{-1}\| = n^{1/2}\|\hat\Gamma_p^{-1}\big(n^{-1}(X'X) - \hat\Gamma_p\big)\,n(X'X)^{-1}\| \le n^{1/2}\|\hat\Gamma_p^{-1}\|\,\|n^{-1}(X'X) - \hat\Gamma_p\|\,\|n(X'X)^{-1}\|.$$

Equation (8.10.6) implies that n^{1/2}‖n^{−1}(X′X) − Γ̂_p‖ = o_p(1), and since Γ̂_p^{−1} →p Γ_p^{−1} and n(X′X)^{−1} →p Γ_p^{−1}, (8.10.7) follows. Combining (8.10.7) with the fact that n^{−1}X′Y →p γ_p gives the desired conclusion that n^{1/2}(φ̂ − φ*) = o_p(1). Since γ̂_p →p γ_p and φ̂ →p φ, it follows that σ̂² →p σ². □
PROOF OF THEOREM 8.1.2. The same ideas used in the proof of Theorem 8.1.1 can be adapted to prove Theorem 8.1.2. Fix an integer m > p and note that the linear model in (8.10.2) can be written as

$$\begin{bmatrix}X_1\\X_2\\\vdots\\X_n\end{bmatrix} = \begin{bmatrix}X_0 & X_{-1} & \cdots & X_{1-m}\\ X_1 & X_0 & \cdots & X_{2-m}\\ \vdots & \vdots & & \vdots\\ X_{n-1} & X_{n-2} & \cdots & X_{n-m}\end{bmatrix}\begin{bmatrix}\phi_{m1}\\\phi_{m2}\\\vdots\\\phi_{mm}\end{bmatrix} + \begin{bmatrix}Z_1\\Z_2\\\vdots\\Z_n\end{bmatrix},$$

where, since {X_t} is an AR(p) process, φ_m′ = (φ_{m1}, …, φ_{mm}) := Γ_m^{−1}γ_m = (φ_1, …, φ_p, 0, …, 0)′. The linear regression estimate of φ_m in the model Y = Xφ_m + Z is then

$$\boldsymbol\phi_m^* = (X'X)^{-1}X'\mathbf{Y},$$

which differs by o_p(n^{−1/2}) from the Yule-Walker estimate

$$\hat{\boldsymbol\phi}_m = \hat\Gamma_m^{-1}\hat{\boldsymbol\gamma}_m.$$

It follows from the proof of Proposition 8.10.1 that φ_m* is AN(φ_m, n^{−1}σ²Γ_m^{−1}) and hence that φ̂_m is AN(φ_m, n^{−1}σ²Γ_m^{−1}). In particular,

φ̂_mm is AN(0, n^{−1}),

since the (m, m) component of Γ_m^{−1} is (see Problem 8.15)

$$(\det\Gamma_{m-1})/(\det\Gamma_m) = (\det\Gamma_{m-1})/(\sigma^2\det\Gamma_{m-1}) = \sigma^{-2}. \qquad\Box$$
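Theorem 8.1.2 underlies the order-identification test used in Chapter 9: for an AR(p) series, the sample partial autocorrelations φ̂_mm with m > p behave approximately like N(0, 1/n) variables. The following sketch (illustrative, not from the text) computes φ̂_mm by the Durbin-Levinson recursion for a simulated AR(2) series and compares the values at lags m > 2 with the bounds ±1.96/√n; the coefficients 1.3, −0.6 are an arbitrary choice.

```python
import numpy as np

def sample_pacf(x, max_lag):
    # Durbin-Levinson recursion applied to the sample autocovariances;
    # returns phi_hat_{mm} for m = 1, ..., max_lag.
    n = len(x)
    x = x - x.mean()
    gamma = np.array([x[:n - h] @ x[h:] / n for h in range(max_lag + 1)])
    phi = np.zeros((max_lag + 1, max_lag + 1))
    v, pacf = gamma[0], []
    for m in range(1, max_lag + 1):
        a = (gamma[m] - phi[m - 1, 1:m] @ gamma[1:m][::-1]) / v
        phi[m, m] = a
        phi[m, 1:m] = phi[m - 1, 1:m] - a * phi[m - 1, 1:m][::-1]
        v *= (1 - a * a)
        pacf.append(a)
    return np.array(pacf)

rng = np.random.default_rng(1)
n = 500
z = rng.standard_normal(n + 100)
x = np.zeros(n + 100)
for t in range(2, n + 100):
    x[t] = 1.3 * x[t - 1] - 0.6 * x[t - 2] + z[t]
x = x[100:]

# Values at lags 3, 4, ... should mostly lie inside +/- 1.96/sqrt(n).
print(sample_pacf(x, 10).round(3), round(1.96 / np.sqrt(n), 3))
```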
§8.11* Asymptotic Normality of Parameter Estimators
In this section we discuss, for a causal invertible ARMA(p, q) process, the asymptotic normality of an estimator of the coefficient vector which has the same asymptotic distribution as the least squares and maximum likelihood estimators. The asymptotic distribution of the maximum likelihood and least squares estimators will be derived in Section 10.8.

Recall that the least squares estimators minimize the sum of squares,

$$S(\boldsymbol\phi, \boldsymbol\theta) = \sum_{t=1}^{n}(X_t - \hat X_t)^2/r_{t-1}.$$

However we shall consider the following approximation to S(φ, θ). First we approximate the "standardized innovations" (X_t − X̂_t)/(r_{t−1})^{1/2} by Z_t(φ, θ), where

$$\begin{aligned}
Z_1(\boldsymbol\phi, \boldsymbol\theta) &= X_1,\\
Z_2(\boldsymbol\phi, \boldsymbol\theta) &= X_2 - \phi_1 X_1 - \theta_1 Z_1(\boldsymbol\phi, \boldsymbol\theta),\\
&\;\;\vdots\\
Z_n(\boldsymbol\phi, \boldsymbol\theta) &= X_n - \phi_1 X_{n-1} - \cdots - \phi_p X_{n-p} - \theta_1 Z_{n-1}(\boldsymbol\phi, \boldsymbol\theta) - \cdots - \theta_q Z_{n-q}(\boldsymbol\phi, \boldsymbol\theta).
\end{aligned} \qquad (8.11.1)$$
By the assumed invertibility we can write Z_t in the form

$$Z_t = X_t + \sum_{j=1}^{\infty}\pi_j X_{t-j},$$

and then (8.11.1) corresponds to setting (see Problem 5.15)

$$Z_t(\boldsymbol\phi, \boldsymbol\theta) = X_t + \sum_{j=1}^{t-1}\pi_j X_{t-j}.$$

Using the relations (see Problem 8.21)

$$\|Z_t(\boldsymbol\phi, \boldsymbol\theta) - Z_t\| \le \sum_{j=t}^{\infty}|\pi_j|\,\|X_1\|,$$

and

we can show that

(8.11.2)

for all t, where a, c_1, c_2 and k are constants with 0 < a < 1. It is useful to make one further approximation to (X_t − X̂_t)/(r_{t−1})^{1/2} by linearizing Z_t(φ, θ) about an initial estimate (φ_0, θ_0) of (φ, θ). Thus, if β′ = (φ_1, …, φ_p, θ_1, …, θ_q) and β_0′ = (φ_0′, θ_0′), we approximate Z_t(β) by
$$Z_t(\boldsymbol\beta_0) - \mathbf{D}_t'(\boldsymbol\beta - \boldsymbol\beta_0), \qquad (8.11.3)$$

where D_t′ = (D_{t,1}(β_0), …, D_{t,p+q}(β_0)), with

$$D_{t,i}(\boldsymbol\beta) = -\frac{\partial Z_t(\boldsymbol\beta)}{\partial\beta_i}, \qquad i = 1, \ldots, p + q.$$
Then by minimizing the sum of squares

$$\sum_{t=1}^{n}\big(Z_t(\boldsymbol\beta_0) - \mathbf{D}_t'(\boldsymbol\beta - \boldsymbol\beta_0)\big)^2$$

(which by (8.11.2) and (8.11.3) is a reasonably good approximation to S(φ, θ)), we obtain an estimator β† of β which has the same asymptotic properties as the least squares estimator β̂. The estimator β† is easy to compute from the methods of Section 2.6. Specifically, if we let Z(β_0) = (Z_1(β_0), …, Z_n(β_0))′ and write D for the n × (p + q) design matrix (D_1, …, D_n)′, then the linear regression estimate of Δβ = β − β_0 is

$$\widehat{\Delta\boldsymbol\beta} = (D'D)^{-1}D'Z(\boldsymbol\beta_0),$$

so that

$$\boldsymbol\beta^\dagger = \boldsymbol\beta_0 + \widehat{\Delta\boldsymbol\beta} = \boldsymbol\beta_0 + (D'D)^{-1}D'Z(\boldsymbol\beta_0).$$

The asymptotic normality of this estimator is established in the following theorem.
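A rough sketch (not from the text) of the linearized one-step estimator β† = β₀ + (D′D)⁻¹D′Z(β₀) for an ARMA(1,1) model is given below. Z_t(β) is computed from the recursion (8.11.1); the derivatives D_{t,i} are obtained here by numerical differentiation rather than from the analytic difference equations used later in the proof, and the preliminary value (0.4, 0.3) in the usage lines is arbitrary.

```python
import numpy as np

def Z_resid(x, phi, theta):
    # Recursion (8.11.1) for an ARMA(1,1) model.
    z = np.zeros(len(x))
    for t in range(len(x)):
        z[t] = x[t] - (phi * x[t - 1] if t >= 1 else 0.0) \
                    - (theta * z[t - 1] if t >= 1 else 0.0)
    return z

def one_step_update(x, beta0, eps=1e-6):
    # beta0 = (phi0, theta0); D is built by numerically differentiating Z_t(beta).
    z0 = Z_resid(x, *beta0)
    D = np.empty((len(x), 2))
    for i in range(2):
        b = np.array(beta0, dtype=float)
        b[i] += eps
        D[:, i] = -(Z_resid(x, *b) - z0) / eps      # D_{t,i} = -dZ_t/dbeta_i
    delta = np.linalg.solve(D.T @ D, D.T @ z0)      # (D'D)^{-1} D'Z(beta0)
    return np.array(beta0) + delta                  # beta_dagger

# Quick check on simulated ARMA(1,1) data with phi = 0.5, theta = 0.4:
rng = np.random.default_rng(3)
z = rng.standard_normal(300)
x = np.zeros(300)
for t in range(1, 300):
    x[t] = 0.5 * x[t - 1] + z[t] + 0.4 * z[t - 1]
print(one_step_update(x, (0.4, 0.3)))
```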
Theorem 8.11.1. Let {X_t} be the causal invertible ARMA(p, q) process

$$X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q},$$

where {Z_t} ∼ IID(0, σ²) and where φ(z) and θ(z) have no common zeroes. Suppose that β_0 = (β_{01}, …, β_{0,p+q})′ is a preliminary estimator of β = (φ_1, …, φ_p, θ_1, …, θ_q)′ such that β_0 − β = o_p(n^{−1/4}), and that β† is the estimator constructed from β_0 as described above. Then

(i) n^{−1}D′D →p σ²V^{−1}(β), where V(β) is a (p + q) × (p + q) nonsingular matrix, and
(ii) n^{1/2}(β† − β) ⇒ N(0, V(β)).

In addition, for the least squares estimator β̂, we have

(iii) n^{1/2}(β̂ − β) ⇒ N(0, V(β)).

(A formula for computing V(β) is given in (8.11.14).)

SKETCH OF PROOF. We shall give only an outline of the proofs of (i) and (ii). The result (iii) is discussed in Section 10.8. Expanding Z_t(β) in a Taylor series about β = β_0, we have

$$Z_t(\boldsymbol\beta) = Z_t(\boldsymbol\beta_0) - \mathbf{D}_t'(\boldsymbol\beta - \boldsymbol\beta_0) + H_t, \qquad (8.11.4)$$
where

$$H_t = \tfrac{1}{2}\sum_{i=1}^{p+q}\sum_{j=1}^{p+q}\frac{\partial^2 Z_t(\boldsymbol\beta_i^*)}{\partial\beta_i\,\partial\beta_j}\,(\beta_i - \beta_{0i})(\beta_j - \beta_{0j})$$

and β_i* is between β and β_0. Rearranging (8.11.4) and combining the equations for t = 1, …, n, we obtain the matrix equation

$$Z(\boldsymbol\beta_0) = D(\boldsymbol\beta - \boldsymbol\beta_0) - \mathbf{H} + Z(\boldsymbol\beta),$$

where Z(β) = (Z_1(β), …, Z_n(β))′, D = (D_1, …, D_n)′ and H = (H_1, …, H_n)′.
Hence

$$n^{1/2}(\boldsymbol\beta^\dagger - \boldsymbol\beta) = n^{1/2}(\boldsymbol\beta_0 + \widehat{\Delta\boldsymbol\beta} - \boldsymbol\beta) = n^{1/2}\big(\boldsymbol\beta_0 + (D'D)^{-1}D'Z(\boldsymbol\beta_0) - \boldsymbol\beta\big),$$

i.e.

$$n^{1/2}(\boldsymbol\beta^\dagger - \boldsymbol\beta) = n^{1/2}(D'D)^{-1}D'Z(\boldsymbol\beta) - n^{1/2}(D'D)^{-1}D'\mathbf{H}. \qquad (8.11.5)$$

The idea of the proof is to show that

$$n^{-1}D'D \xrightarrow{\,p\,} \sigma^2 V^{-1}(\boldsymbol\beta), \qquad (8.11.6)$$

$$n^{-1/2}D'Z(\boldsymbol\beta) \Rightarrow N(\mathbf{0}, \sigma^4 V^{-1}(\boldsymbol\beta)), \qquad (8.11.7)$$

and

$$n^{-1/2}D'\mathbf{H} = o_p(1). \qquad (8.11.8)$$

Once these results are established, the conclusion of the theorem follows from Propositions 6.1.1, 6.3.3 and 6.3.8.
From (8.11.1) we have for t > max(p, q),

$$D_{t,i}(\boldsymbol\beta) = -\frac{\partial Z_t}{\partial\phi_i}(\boldsymbol\beta) = X_{t-i} - \theta_1 D_{t-1,i}(\boldsymbol\beta) - \cdots - \theta_q D_{t-q,i}(\boldsymbol\beta), \qquad i = 1, \ldots, p,$$

and

$$D_{t,i+p}(\boldsymbol\beta) = -\frac{\partial Z_t}{\partial\theta_i}(\boldsymbol\beta) = Z_{t-i}(\boldsymbol\beta) - \theta_1 D_{t-1,i+p}(\boldsymbol\beta) - \cdots - \theta_q D_{t-q,i+p}(\boldsymbol\beta), \qquad i = 1, \ldots, q,$$

so that for t > max(p, q), D_{t,i}(β) satisfies the difference equations

$$\begin{cases}\theta(B)D_{t,i}(\boldsymbol\beta) = X_{t-i}, & i = 1, \ldots, p,\\ \theta(B)D_{t,i+p}(\boldsymbol\beta) = Z_{t-i}(\boldsymbol\beta), & i = 1, \ldots, q.\end{cases} \qquad (8.11.9)$$

If we define the two autoregressive processes

$$U_t = \theta^{-1}(B)X_t = \phi^{-1}(B)Z_t \quad\text{and}\quad V_t = \theta^{-1}(B)Z_t,$$
then it follows from (8.11.9) and (8.11.2) that D_{t,i}(β) can be approximated by

$$\begin{cases}B^iU_t = U_{t-i}, & i = 1, \ldots, p,\\ B^{i-p}V_t = V_{t-i+p}, & i = p+1, \ldots, p+q.\end{cases} \qquad (8.11.10)$$

Set U_t = (U_{t−1}, …, U_{t−p})′, V_t = (V_{t−1}, …, V_{t−q})′ and W_t′ = (U_t′, V_t′).

The limit in (8.11.6) is established by first showing that

$$n^{-1}\Big(\sum_t D_{t,i}(\boldsymbol\beta_0)D_{t,j}(\boldsymbol\beta_0) - \sum_t D_{t,i}(\boldsymbol\beta)D_{t,j}(\boldsymbol\beta)\Big) = o_p(1),$$

using the assumption (β_0 − β) = o_p(n^{−1/4}), and then expanding D_{t,i}(β_0)D_{t,j}(β_0) in a Taylor series about β. Next, from the approximation (8.11.10) we have

$$n^{-1}\Big(\sum_t D_{t,i}(\boldsymbol\beta)D_{t,j}(\boldsymbol\beta) - \sum_t W_{ti}W_{tj}\Big) = o_p(1).$$

If EZ_t^4 < ∞, then by applying Theorem 7.1.1 to the individual components of W_tW_t′, we obtain

$$n^{-1}\sum_{t=1}^{n}\mathbf{W}_t\mathbf{W}_t' \xrightarrow{\,p\,} E(\mathbf{W}_1\mathbf{W}_1'), \qquad (8.11.11)$$

from which we identify V(β) as

$$V(\boldsymbol\beta) = \sigma^2\big[E(\mathbf{W}_1\mathbf{W}_1')\big]^{-1}. \qquad (8.11.12)$$

However if we assume only that EZ_t^2 < ∞, then (8.11.11) also holds by the ergodic theorem (see Hannan, 1970).
The verification of (8.11.7) is completed in the following steps. By expanding D_{t,i}(β_0) in a Taylor series about β and using (8.11.2) to approximate Z_t(β) by Z_t, it can be shown that

$$n^{-1/2}\Big(\sum_{t=1}^{n}D_{t,i}(\boldsymbol\beta_0)Z_t(\boldsymbol\beta) - \sum_{t=1}^{n}W_{ti}Z_t\Big) = o_p(1). \qquad (8.11.13)$$

Since for each i = 1, …, p + q, W_{t,i}, t = …, −1, 0, 1, …, is a moving average of Z_{t−1}, Z_{t−2}, …, the sequence of random vectors W_tZ_t, t ≥ 1, is uncorrelated with mean zero and covariance matrix

$$E\big[(\mathbf{W}_tZ_t)(\mathbf{W}_tZ_t)'\big] = \sigma^2 E(\mathbf{W}_1\mathbf{W}_1') = \sigma^4 V^{-1}(\boldsymbol\beta).$$

Using the argument given in the proof of Proposition 8.10.1, it follows that

$$n^{-1/2}\sum_{t=1}^{n}\mathbf{W}_tZ_t \Rightarrow N(\mathbf{0}, \sigma^4 V^{-1}(\boldsymbol\beta)),$$

which, with (8.11.13), establishes (8.11.7).
Finally, to prove (8.11.8), it suffices to show that

$$n^{-1}\sum_{t=1}^{n}D_{t,k}(\boldsymbol\beta_0)\,\frac{\partial^2 Z_t(\boldsymbol\beta_i^*)}{\partial\beta_i\,\partial\beta_j} = O_p(1), \qquad i, j, k = 1, \ldots, p + q,$$

since (β_i − β_{0i})(β_j − β_{0j}) = o_p(n^{−1/2}). This term is handled by first showing that β_0 and β_i* may be replaced by β and then that the resulting expression has an expectation which is bounded in n. □
Note that the expression for V(β) simplifies to

$$V(\boldsymbol\beta) = \sigma^2\begin{bmatrix} E\mathbf{U}_1\mathbf{U}_1' & E\mathbf{U}_1\mathbf{V}_1'\\ E\mathbf{V}_1\mathbf{U}_1' & E\mathbf{V}_1\mathbf{V}_1' \end{bmatrix}^{-1}, \qquad (8.11.14)$$

where U_t and V_t were defined in the course of the proof. The application of (8.11.14) was illustrated for several low-order ARMA models in Section 8.8.
Problems
8.1. The Wolfer sunspot numbers {X_t, t = 1, …, 100} of Example 1.1.5 have sample autocovariances γ̂(0) = 1382.2, γ̂(1) = 1114.4, γ̂(2) = 591.72 and γ̂(3) = 96.215. Find the Yule-Walker estimates of φ_1, φ_2 and σ² in the model

Y_t = φ_1Y_{t−1} + φ_2Y_{t−2} + Z_t,

for the mean-corrected series Y_t = X_t − 46.93, t = 1, …, 100. Use Theorem 8.1.1 to find 95% confidence intervals for φ_1 and φ_2.
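A numerical sketch of the computation requested in Problem 8.1, using only the sample autocovariances quoted above; the printed intervals are illustrative output of this code, not values taken from the text.

```python
import numpy as np

gamma = np.array([1382.2, 1114.4, 591.72])     # gamma_hat(0), (1), (2)
n = 100

Gamma = np.array([[gamma[0], gamma[1]],
                  [gamma[1], gamma[0]]])
phi = np.linalg.solve(Gamma, gamma[1:])        # Yule-Walker estimate of (phi_1, phi_2)
sigma2 = gamma[0] - phi @ gamma[1:]            # Yule-Walker estimate of sigma^2

# Theorem 8.1.1: phi_hat is AN(phi, n^{-1} sigma^2 Gamma_2^{-1}).
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Gamma)) / n)
for j in range(2):
    print(f"phi_{j + 1}: {phi[j]:.3f} +/- {1.96 * se[j]:.3f}")
```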
8.2. Use the Durbin-Levinson algorithm to compute the sample partial autocorrelations φ̂_11, φ̂_22 and φ̂_33 of the Wolfer sunspot numbers. Is the value of φ̂_33 compatible with the hypothesis that the data is generated by an AR(2) process? (Use Theorem 8.1.2 and significance level .05.)
8.3. Let (X_1, …, X_{p+1})′ be a random vector with mean 0 and non-singular covariance matrix Γ_{p+1} = [γ(i − j)]_{i,j=1}^{p+1}. Note that P_{sp{X_1,…,X_p}}X_{p+1} = φ_1X_p + ⋯ + φ_pX_1 where φ = Γ_p^{−1}γ_p (see (8.1.3)). Show that φ(z) = 1 − φ_1z − ⋯ − φ_pz^p ≠ 0 for |z| ≤ 1. (If φ(z) = (1 − az)ζ(z) with |a| ≥ 1, set φ̃(z) = (1 − ρz)ζ(z) where ρ = Corr(Y_{p+1}, Y_p) and Y_j = ζ(B)X_j. Then E|φ̃(B)X_{p+1}|² = E|Y_{p+1} − ρY_p|² ≤ E|Y_{p+1} − aY_p|² = E|φ(B)X_{p+1}|², with equality holding if and only if ρ = a.)
8.4. Show that the zero-mean stationary Gaussian process {X_t} with spectral density

−π ≤ λ ≤ π,

has the autocovariance function

if h = 0,
if |h| = 1, 3, 5, …,
otherwise.

Hence find the coefficients θ_41, …, θ_44 in the innovation representation,

$$\hat X_5 = \sum_{j=1}^{4}\theta_{4j}(X_{5-j} - \hat X_{5-j}).$$

Find an explicit expression, in terms of X_i and X̂_i, i = 1, …, 5, for the maximum likelihood estimator of σ² based on X_1, …, X_5.
8.5. Use the program PEST to simulate and file 20 realizations of length 200 of the Gaussian ARMA(1, 1) process

X_t − φX_{t−1} = Z_t + θZ_{t−1},  {Z_t} ∼ WN(0, 1),

with φ = θ = .6. Use the program PEST as in Example 8.4.1 to find preliminary models for each series.
8.6. Use the program PEST to simulate and file 20 realizations of length 200 of the Gaussian MA(1) process

X_t = Z_t + θZ_{t−1},  {Z_t} ∼ WN(0, 1),

with θ = .6.
(a) For each series find the moment estimate θ̂_M of θ (see Section 8.5), recording the number of times the sample autocorrelation ρ̂(1) falls outside the interval [−½, ½].
(b) For each series use the program PEST to find the innovations estimate θ̂_I of θ (choosing m to minimize the preliminary AICC value).
(c) Use the program PEST to compute the least squares estimate θ̂_LS for each series.
(d) Use the program PEST to compute the maximum likelihood estimate θ̂_ML for each series.
Compare the performances of the four estimators with each other and with the behavior expected from their asymptotic distributions. Compare the number of series for which |ρ̂(1)| > ½ with the expected number based on the asymptotic probability computed in Problem 7.10.
8.7. Use equation (8.7.4) to show that if n > p, the likelihood of the observations {X_1, …, X_n} of the causal AR(p) process

X_t − φ_1X_{t−1} − ⋯ − φ_pX_{t−p} = Z_t,  {Z_t} ∼ WN(0, σ²),

is

$$L(\boldsymbol\phi, \sigma^2) = (2\pi\sigma^2)^{-n/2}(\det G_p)^{-1/2}\exp\Big\{-\tfrac{1}{2}\sigma^{-2}\Big[\mathbf{X}_p'G_p^{-1}\mathbf{X}_p + \sum_{t=p+1}^{n}(X_t - \phi_1X_{t-1} - \cdots - \phi_pX_{t-p})^2\Big]\Big\}.$$
8.8. Use the result of Problem 8.7 to derive a pair of linear equations for the least squares estimates of φ_1 and φ_2 for a causal AR(2) process. Compare your equations with those for the Yule-Walker estimates.

8.9. Given two observations x_1 and x_2 from the causal AR(1) process

X_t = φX_{t−1} + Z_t,  {Z_t} ∼ WN(0, σ²),

such that |x_1| ≠ |x_2|, find the maximum likelihood estimates of φ and σ².

8.10. Derive a cubic equation for the maximum likelihood estimate of φ for a causal AR(1) process.
8.11. Verify that the maximum likelihood estimators φ̂ and θ̂ are those values of φ and θ which minimize l(φ, θ) in equation (8.7.7). Also show that the maximum likelihood estimator of σ² is n^{−1}S(φ̂, θ̂).

8.12. For a causal ARMA process, determine the limit of (1/n)Σ_{j=1}^{n} ln r_{j−1}. When is the limit non-zero?

8.13. In Section 8.6, suppose that the covariance matrix Γ_n depends on the parameter β. Further assume that the n values v_0, …, v_{n−1} are unbounded functions of β. Show that the function S(β) = X_n′Γ_n^{−1}(β)X_n can be made arbitrarily close to zero for a suitable choice of β.

8.14. Specialize Problem 8.13 to the case when Γ_n is the covariance matrix of an MA(1) process with θ equal to any real number. Show that S(θ) = Σ_{j=1}^{n}(X_j − X̂_j)²/r_{j−1} can be made arbitrarily small by choosing θ sufficiently large.

8.15. For an AR(p) process, show that det Γ_m = (det Γ_p)σ^{2(m−p)} for all m > p. Conclude that the (m, m) component of Γ_m^{−1} is (det Γ_{m−1})/(det Γ_m) = σ^{−2}.
8.16. Simulation of a Gaussian process. Show that n consecutive observations {X_k, k = 1, …, n} of a zero-mean Gaussian process with autocovariance function κ(i, j) can be generated from n iid N(0, 1) random variables Z_1, …, Z_n by setting

$$X_k = v_{k-1}^{1/2}Z_k + \sum_{j=1}^{k-1}\theta_{k-1,j}\,v_{k-1-j}^{1/2}Z_{k-j}, \qquad k = 1, \ldots, n,$$

where θ_{kj}, j = 1, …, k, and v_{k−1} are computed from the innovations algorithm. (Use equation (8.6.4) to show that (X_1, …, X_n)′ has covariance matrix [κ(i, j)]_{i,j=1}^{n}.)
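One way to implement the construction of Problem 8.16 is sketched below (illustrative, not from the text): the θ_kj and v_k are computed with the innovations algorithm of Section 5.2 and the X_k are then built up from iid N(0, 1) variables. The autocovariance κ(i, j) = 0.5^{|i−j|} is just an arbitrary example.

```python
import numpy as np

def simulate_gaussian(kappa, n, rng):
    # Innovations-algorithm simulation of X_1, ..., X_n with autocovariance
    # kappa(i, j) (1-based arguments), driven by iid N(0, 1) variables.
    theta = np.zeros((n, n))
    v = np.zeros(n)
    v[0] = kappa(1, 1)
    for k in range(1, n):
        for j in range(k):
            s = sum(theta[j, j - i] * theta[k, k - i] * v[i] for i in range(j))
            theta[k, k - j] = (kappa(k + 1, j + 1) - s) / v[j]
        v[k] = kappa(k + 1, k + 1) - sum(theta[k, k - i] ** 2 * v[i] for i in range(k))
    z = rng.standard_normal(n)
    x = np.zeros(n)
    for k in range(n):
        x[k] = np.sqrt(v[k]) * z[k] + sum(theta[k, k - i] * np.sqrt(v[i]) * z[i]
                                          for i in range(k))
    return x

rng = np.random.default_rng(0)
x = simulate_gaussian(lambda i, j: 0.5 ** abs(i - j), 100, rng)
print(x[:5])
```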
8.17. Simulation of an ARMA(p, q) process. Show that a Gaussian ARMA(p, q) process {X_t, t = 1, 2, …} can be generated from iid N(0, 1) random variables Z_1, Z_2, … by first defining

k ≤ m = max(p, q),
k > m,

where θ_{kj}, j = 1, …, k, and v_{k−1} are found from the innovations algorithm with κ(i, j) as defined in (5.3.5). The simulated values of the ARMA process {X_t} are then found recursively from

k ≤ m,
k > m.
8.18. Verify the calculation of V(φ, θ) in Example 8.8.3.

8.19. Verify the calculation of V(φ_1, φ_2) for the AR(2) process in Example 8.8.1.

8.20. Using (8.9.1) and one of the series generated in Problem 8.5, plot the boundary of an approximate 95% confidence region for (φ, θ).

8.21.* Verify the relations (8.11.2).
8.22.* If φ̂ and θ̂ are the preliminary estimates of φ and θ obtained from equations (8.4.4) and (8.4.5), show that φ̂ = φ + O_p(n^{−1/2}) and θ̂ = θ + O_p(n^{−1/2}).

8.23.* Let l(G) be the function

l(G) = ln(n^{−1}X′G^{−1}X) + n^{−1} ln(det G),

where G is a positive definite matrix. Show that l(aG) = l(G) where a is any positive constant. Conclude that for an MA(1) process, the reduced likelihood l(θ) given in (8.7.7) satisfies l(θ) = l(θ^{−1}) and that l(·) has either a local maximum or minimum at θ = 1.
CHAPTER 9
Model Building and Forecasting with
ARIMA Processes
In this chapter we shall examine the problem of selecting an appropriate
model for a given set of observations {X_t, t = 1, …, n}. If the data (a) exhibits
no apparent deviations from stationarity and (b) has a rapidly decreasing
autocorrelation function, we shall seek a suitable ARMA process to represent
the mean-corrected data. If not, then we shall first look for a transformation
of the data which generates a new series with the properties (a) and (b). This
can frequently be achieved by differencing, leading us to consider the class
of ARIMA (autoregressive-integrated moving average) processes which is
introduced in Section 9. 1 . Once the data has been suitably transformed, the
problem becomes one of finding a satisfactory ARMA(p, q) model, and in
particular of choosing (or identifying) p and q. The sample autocorrelation
and partial autocorrelation functions and the preliminary estimators φ̂_m and θ̂_m of Sections 8.2 and 8.3 can provide useful guidance in this choice. However
our prime criterion for model selection will be the AICC, a modified version
of Akaike's AIC, which is discussed in Section 9.3. According to this criterion
we compute maximum likelihood estimators of φ, θ and σ² for a variety of
competing p and q values and choose the fitted model with smallest AICC
value. Other techniques, in particular those which use the R and S arrays of
Gray et al. ( 1978), are discussed in the recent survey of model identification
by de Gooijer et al. ( 1985). If the fitted model is satisfactory, the residuals
(see Section 9.4) should resemble white noise. A number of tests designed to
check this are described in Section 9.4, and these should be applied to the
minimum-AICC model to make sure that the residuals are consistent with
their expected behaviour under the model. If they are not, then competing
models (models with AICC-value close to the minimum) should be checked
until we find one which passes the goodness of fit tests. In some cases a small
difference in AICC-value (say less than 2) between two satisfactory models
may be ignored in the interest of model simplicity. In Section 9.5 we consider
the prediction of ARIMA processes, which can be treated as an extension
of the techniques developed for ARMA processes in Section 5.3. Finally we
examine the fitting and prediction of seasonal ARIMA (SARIMA) models,
whose analysis, except for certain aspects of model identification, is quite
analogous to that of ARIMA processes.
§9.1 ARIMA Models for Non-Stationary Time Series
We have already discussed the importance of the class of ARMA models
for representing stationary series. A generalization of this class, which incor­
porates a wide range of non-stationary series, is provided by the ARIMA
processes, i.e. processes which, after differencing finitely many times, reduce
to ARMA processes.
Definition 9.1.1 (The ARIMA(p, d, q) Process). If d is a non-negative integer, then {X_t} is said to be an ARIMA(p, d, q) process if Y_t := (1 − B)^d X_t is a causal ARMA(p, q) process.
This definition means that {X_t} satisfies a difference equation of the form

$$\phi^*(B)X_t \equiv \phi(B)(1-B)^d X_t = \theta(B)Z_t, \qquad \{Z_t\} \sim \mathrm{WN}(0, \sigma^2), \qquad (9.1.1)$$

where φ(z) and θ(z) are polynomials of degrees p and q respectively and φ(z) ≠ 0 for |z| ≤ 1. The polynomial φ*(z) has a zero of order d at z = 1. The process {X_t} is stationary if and only if d = 0, in which case it reduces to an ARMA(p, q) process.
Notice that if d ≥ 1 we can add an arbitrary polynomial trend of degree (d − 1) to {X_t} without violating the difference equation (9.1.1). ARIMA models are therefore useful for representing data with trend (see Sections 1.4 and 9.2). It should be noted however that ARIMA processes can also be appropriate for modelling series with no trend. Except when d = 0, the mean of {X_t} is not determined by equation (9.1.1) and it can in particular be zero. Since for d ≥ 1, equation (9.1.1) determines the second order properties of {(1 − B)^d X_t} but not those of {X_t} (Problem 9.1), estimation of φ, θ and σ² will be based on the observed differences (1 − B)^d X_t. Additional assumptions are needed for prediction (see Section 9.5).
EXAMPLE 9.1.1. {X_t} is an ARIMA(1, 1, 0) process if for some φ ∈ (−1, 1),

(1 − φB)(1 − B)X_t = Z_t,  {Z_t} ∼ WN(0, σ²).

We can then write
[Figure 9.1. (a) A realization of {X_1, …, X_200} for the ARIMA process of Example 9.1.1, (b) the sample ACF and (c) the sample PACF.]
$$X_t = X_0 + \sum_{j=1}^{t}Y_j, \qquad t \ge 1,$$

where

$$Y_t = (1 - B)X_t = \sum_{j=0}^{\infty}\phi^j Z_{t-j}.$$

A realization of {X_1, …, X_200} with X_0 = 0, φ = .8 and σ² = 1 is shown in Figure 9.1 together with the sample autocorrelation and partial autocorrelation functions.
A distinctive feature of the data which suggests the appropriateness of an ARIMA model is the slowly decaying positive sample autocorrelation function seen in Figure 9.1. If therefore we were given only the data and wished to find an appropriate model, it would be natural to apply the operator ∇ = 1 − B repeatedly in the hope that for some j, {∇^j X_t} will have a rapidly decaying sample autocorrelation function compatible with that of an ARMA process with no zeroes of the autoregressive polynomial near the unit circle. For the particular time series in this example, one application of the operator ∇ produces the realization shown in Figure 9.2, whose sample autocorrelation and partial autocorrelation functions suggest an AR(1) model for {∇X_t}. The maximum likelihood estimates of φ and σ² obtained from PEST (under the assumption that E(∇X_t) = 0) are .808 and .978 respectively, giving the model,
[Figure 9.2. (a) The differenced series Y_t = X_{t+1} − X_t, t = 1, …, 199, of Example 9.1.1, (b) the sample ACF of {Y_t} and (c) the sample PACF of {Y_t}.]
(1 − .808B)(1 − B)X_t = Z_t,  {Z_t} ∼ WN(0, .978),   (9.1.2)

which bears a close resemblance to the true underlying process,

(1 − .8B)(1 − B)X_t = Z_t,  {Z_t} ∼ WN(0, 1).   (9.1.3)
Instead of differencing the series in Figure 9.1 we could proceed more directly by attempting to fit an AR(2) process as suggested by the sample partial autocorrelation function. Maximum likelihood estimation, carried out using the program PEST and assuming that EX_t = 0, gives the model

(1 − 1.804B + .806B²)X_t = (1 − .815B)(1 − .989B)X_t = Z_t,  {Z_t} ∼ WN(0, .970),   (9.1.4)

which, although stationary, has coefficients which closely resemble those of the true non-stationary process (9.1.3).
From a sample of finite length it will be extremely difficult to distinguish between a non-stationary process such as (9.1.3), for which φ*(1) = 0, and a process such as (9.1.4), which has very similar coefficients but for which φ* has all of its zeroes outside the unit circle. In either case however, if it is possible by differencing to generate a series with rapidly decaying sample autocorrelation function, then the differenced data can be fitted by a low order ARMA process whose autoregressive polynomial φ* has zeroes which are comfortably outside the unit circle. This means that the fitted parameters will be well away from the boundary of the allowable parameter set. This is desirable for numerical computation of parameter estimates and can be quite critical for some methods of estimation. For example if we apply the Yule-Walker equations to fit an AR(2) model to the data in Figure 9.1, we obtain the model

(1 − 1.282B + .290B²)X_t = Z_t,  {Z_t} ∼ WN(0, 6.435),   (9.1.5)

which bears little resemblance to either the maximum likelihood model (9.1.4) or the true model (9.1.3). In this case the matrix R̂_2 appearing in (8.1.7) is nearly singular.
An obvious limitation in fitting an ARIMA(p, d, q) process {X_t} to data is that {X_t} is permitted to be non-stationary only in a very special way, i.e. by allowing the polynomial φ*(B) in the representation φ*(B)X_t = θ(B)Z_t to have a zero of positive multiplicity d at the point 1 on the unit circle. Such models are appropriate when the sample autocorrelation function of the data is a slowly decaying positive function as in Figure 9.1, since sample autocorrelation functions of this form are associated with models φ*(B)X_t = θ(B)Z_t in which φ* has a zero either at or close to 1.

Sample autocorrelations with slowly decaying oscillatory behavior as in Figures 9.3 and 9.4 are associated with models φ*(B)X_t = θ(B)Z_t in which φ* has a zero close to e^{iθ} for some θ ∈ (−π, π] other than θ = 0. Figure 9.3 was obtained from a sample of 200 simulated observations from the process

X_t + .99X_{t−1} = Z_t,  {Z_t} ∼ WN(0, 1),

for which φ* has a zero near e^{iπ}. Figure 9.4 shows the sample autocorrelation function of 200 observations from the process

X_t − X_{t−1} + .99X_{t−2} = Z_t,  {Z_t} ∼ WN(0, 1),

for which φ* has zeroes near e^{±iπ/3}. In such cases the sample autocorrelations can be made to decay more rapidly by applying the operator [1 − (2 cos θ)B + B²] = (1 − e^{iθ}B)(1 − e^{−iθ}B) to the data, instead of the operator (1 − B) as in the previous paragraph. If 2π/θ is close to some integer s then the sample autocorrelation function will be nearly periodic with period s and the operator ∇_s = (1 − B^s) (with zeroes near e^{±iθ}) can also be applied to produce a series with more rapidly decaying autocorrelation function (see also Section 9.6). The sample autocorrelation functions in Figures 9.3 and 9.4 are nearly periodic with periods 2 and 6 respectively. Applying the operators (1 − B²) to the first series and (1 − B⁶) to the second gives two new series with the much more rapidly decaying sample autocorrelation functions shown in Figures 9.5 and 9.6 respectively. For the new series it is then not difficult to fit an ARMA model φ(B)X_t = θ(B)Z_t for which the zeroes of φ are all well outside the unit circle. Techniques for identifying and determining such ARMA models will be discussed in subsequent sections.
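The effect described above is easy to reproduce; the following sketch (illustrative, not from the text) simulates the second process, applies the operators 1 − 2cos(π/3)B + B² and 1 − B⁶, and prints the sample autocorrelations of the original and transformed series.

```python
import numpy as np

def sample_acf(x, max_lag):
    x = x - x.mean()
    n = len(x)
    g0 = x @ x / n
    return np.array([x[:n - h] @ x[h:] / (n * g0) for h in range(max_lag + 1)])

rng = np.random.default_rng(2)
n = 200
z = rng.standard_normal(n + 200)
x = np.zeros(n + 200)
for t in range(2, n + 200):
    x[t] = x[t - 1] - 0.99 * x[t - 2] + z[t]   # phi* has zeroes near exp(+/- i pi/3)
x = x[200:]

# The raw series has a slowly decaying, oscillatory sample ACF (period about 6);
# applying either 1 - 2cos(pi/3)B + B^2 or 1 - B^6 gives a rapidly decaying ACF.
y1 = x[2:] - 2 * np.cos(np.pi / 3) * x[1:-1] + x[:-2]
y2 = x[6:] - x[:-6]
print(sample_acf(x, 12).round(2))
print(sample_acf(y1, 12).round(2))
print(sample_acf(y2, 12).round(2))
```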
[Figure 9.3. The sample ACF (a) and PACF (b) of a realization of length 200 of the process X_t + .99X_{t−1} = Z_t, {Z_t} ∼ WN(0, 1).]
[Figure 9.4. The sample ACF (a) and PACF (b) of a realization of length 200 of the process X_t − X_{t−1} + .99X_{t−2} = Z_t, {Z_t} ∼ WN(0, 1).]
[Figure 9.5. The sample ACF (a) and PACF (b) of {(1 − B²)X_t} where {X_t} is the series whose sample ACF and PACF are shown in Figure 9.3.]
[Figure 9.6. The sample ACF (a) and PACF (b) of {(1 − B⁶)X_t} where {X_t} is the series whose sample ACF and PACF are shown in Figure 9.4.]
§9.2 Identification Techniques
(a) Preliminary Transformations. The estimation methods described in
Chapter 8 enable us to find, for given values of p and q, an ARMA(p, q) model
to fit a given series of data. For this procedure to be meaningful it must be at
least plausible that the data is in fact a realization of an ARMA process and
in particular that it is a realization of a stationary process. If the data display
characteristics suggesting non-stationarity (e.g. trend and seasonality), then it
may be necessary to make a transformation so as to produce a new series
which is more compatible with the assumption of stationarity.
Deviations from stationarity may be suggested by the graph of the series
itself or by the sample autocorrelation function or both.
Inspection of the graph of the series will occasionally reveal a strong
dependence of variability on the level of the series, in which case the data
should first be transformed to reduce or eliminate this dependence. For
example, Figure 9.7 shows the International Airline Passenger Data {U_t, t = 1, …, 144} of Box and Jenkins (1976), p. 531. It is clear from the graph that the variability increases as U_t increases. On the other hand the transformed series V_t = ln U_t, shown in Figure 9.8, displays no increase in variability with V_t. The logarithmic transformation used here is in fact appropriate whenever {U_t} is a series whose standard deviation increases linearly with the mean. For a systematic account of a general class of variance-stabilizing transformations, we refer the reader to Box and Cox (1964). The defining equation for the general Box-Cox transformation f_λ is

$$f_\lambda(U_t) = \begin{cases}(U_t^\lambda - 1)/\lambda, & U_t \ge 0,\ \lambda > 0,\\ \ln U_t, & U_t > 0,\ \lambda = 0,\end{cases}$$

and the program PEST provides the option of applying f_λ (with 0 ≤ λ ≤ 1.5) prior to the elimination of trend and/or seasonality from the data. In practice, if a Box-Cox transformation is necessary, it is often the case that either f_0 or f_{1/2} is adequate.
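A minimal sketch of the Box-Cox transformation f_λ as defined above (not a description of the PEST implementation):

```python
import numpy as np

def box_cox(u, lam):
    # General Box-Cox transformation f_lambda: (u**lam - 1)/lam for lam > 0,
    # and log(u) for lam == 0 (u must be positive in that case).
    u = np.asarray(u, dtype=float)
    if lam == 0:
        return np.log(u)
    return (u ** lam - 1.0) / lam

# For the airline data, f_0 (the log transform) stabilizes the variance:
# v = box_cox(u, 0.0)
```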
Trend and seasonality are usually detected by inspecting the graph of the
(possibly transformed) series. However they are also characterized by sample
autocorrelation functions which are slowly decaying and nearly periodic
respectively. The elimination of trend and seasonality was discussed in
Section 1 .4 where we described two methods:
(i) "classical decomposition" of the series into a trend component, a seasonal
component, and a random residual component, and
(ii) differencing.
The program PEST (Option 1) offers a choice between these techniques. Both methods were applied to the transformed Airline Data V_t = ln U_t of the preceding paragraph. Figures 9.9 and 9.10 show respectively the two series found from PEST by (i) estimating and removing from {V_t} a linear trend component and a seasonal component of period 12, and (ii) applying the
[Figure 9.7. International airline passengers; monthly totals in thousands of passengers {U_t, t = 1, …, 144} from January 1949 to December 1960 (Box and Jenkins (1970)).]
[Figure 9.8. Natural logarithms, V_t = ln U_t, t = 1, …, 144, of the data in Figure 9.7.]
[Figure 9.9. Residuals after removing a linear trend and seasonal component from the data {V_t} of Figure 9.8.]
[Figure 9.10. The differenced series {∇∇₁₂V_{t+13}} where {V_t} is the data shown in Figure 9.8.]
difference operator (1 − B)(1 − B¹²) to {V_t}. Neither of the two resulting series displays any apparent deviations from stationarity, nor do their sample autocorrelation functions (the sample autocorrelation function of {∇∇₁₂V_t} is shown in Figure 9.11).

After the elimination of trend and seasonality it is still possible that the sample autocorrelation function may appear to be that of a non-stationary or nearly non-stationary process, in which case further differencing as described in Section 9.1 may be carried out.
(b) The Identification Problem. Let {X_t} denote the mean-corrected transformed series, found as described in (a). The problem now is to find the most satisfactory ARMA(p, q) model to represent {X_t}. If p and q were known in advance this would be a straightforward application of the estimation techniques developed in Chapter 8. However this is usually not the case, so that it becomes necessary also to identify appropriate values for p and q.

It might appear at first sight that the higher the values of p and q chosen, the better the fitted model will be. For example, if we fit a sequence of AR(p) processes, p = 1, 2, …, the maximum likelihood estimate, σ̂², of σ² generally decreases monotonically as p increases (see e.g. Table 9.2). However we must beware of the danger of overfitting, i.e. of tailoring the fit too closely to the particular numbers observed. An extreme case of overfitting (in a somewhat different context) occurs if we fit a polynomial of degree 99 to 100 observations generated from the model Y_t = a + bt + Z_t, where {Z_t} is an independent sequence of standard normal random variables. The fit will be perfect for the given data set, but use of the model to predict future values may result in gross errors.

Criteria have been developed, in particular Akaike's AIC criterion and Parzen's CAT criterion, which attempt to prevent overfitting by effectively assigning a cost to the introduction of each additional parameter. In Section 9.3 we discuss a bias-corrected form of the AIC, defined for an ARMA(p, q) model with coefficient vectors φ and θ by

$$\mathrm{AICC}(\boldsymbol\phi, \boldsymbol\theta) = -2\ln L\big(\boldsymbol\phi, \boldsymbol\theta, S(\boldsymbol\phi, \boldsymbol\theta)/n\big) + \frac{2(p+q+1)n}{n-p-q-2}, \qquad (9.2.1)$$

where L(φ, θ, σ²) is the likelihood of the data under the Gaussian ARMA model with parameters (φ, θ, σ²) and S(φ, θ) is the residual sum of squares defined in Section 8.7. On the basis of the analysis given in Section 9.3, the model selected is the one which minimizes the value of AICC. Intuitively one can think of 2(p + q + 1)n/(n − p − q − 2) in (9.2.1) as a penalty term to discourage over-parameterization. Once a model has been found which minimizes the AICC value, it must then be checked for goodness of fit (essentially by checking that the residuals are like white noise) as discussed in Section 9.4.
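A minimal sketch of how the criterion (9.2.1) might be applied when comparing fitted models; the argument neg2_loglik stands for the value −2 ln L(φ̂, θ̂, S(φ̂, θ̂)/n) produced by whatever likelihood routine is used (it is an assumed input, not computed here).

```python
def aicc(neg2_loglik, n, p, q):
    # AICC statistic (9.2.1): -2 ln L + 2(p+q+1) n / (n - p - q - 2).
    return neg2_loglik + 2.0 * (p + q + 1) * n / (n - p - q - 2)

# Candidate ARMA(p, q) models are compared by computing -2 ln L for each
# (e.g. from the likelihood of Section 8.7) and selecting the model with
# the smallest aicc(...) value.
```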
Introduction of the AICC (or analogous) statistic reduces model identification to a well-defined problem. However the search for a model which minimizes the AICC can be very lengthy without some idea of the class
[Figure 9.11. The sample ACF (a) and PACF (b) of the series {∇∇₁₂V_{t+13}} shown in Figure 9.10.]
of models to be explored. A variety of techniques can be used to accelerate the search by providing us with preliminary estimates of p and q, and possibly also preliminary estimates of the coefficients.

The primary tools used as indicators of p and q are the sample autocorrelation and partial autocorrelation functions and the preliminary estimators φ̂_m and θ̂_m, m = 1, 2, …, discussed in Sections 8.2 and 8.3 respectively. From these it is usually easy to judge whether a low order autoregressive or moving average model will prove satisfactory. If so then we can proceed by successively fitting models of orders 1, 2, 3, …, until we find a minimum value of the AICC. (Mixed models should also be considered before making a final selection.)
EXAMPLE 9.2.1. Figure 9.12 shows the sample autocorrelation and partial autocorrelation functions of a series of 200 observations from a zero-mean stationary process. They suggest an autoregressive model of order 2 (or perhaps 3) for the data. This suggestion is supported by the Yule-Walker estimators φ̂_m, m = 1, 2, …, of the coefficient vectors of autoregressive models of order m. The Yule-Walker estimates φ̂_mj, j = 1, …, m; m = 1, …, 5, are shown in Table 9.1 with the corresponding ratios,

$$r_{mj} = \hat\phi_{mj}/(1.96\,\hat\sigma_{mj}), \qquad (9.2.2)$$

where σ̂²_mj is the jth diagonal element of σ̂²_m Γ̂_m^{−1}/n, the estimated version of the asymptotic covariance matrix of φ̂_m appearing in Theorem 8.1.2. A value of r_mj with absolute value greater than 1 causes us to reject, at approximate level .05, the hypothesis that φ_mj is zero (assuming that the true underlying process is an AR(p) process with p ≤ m).

The next step is to fit autoregressive models of orders 1, 2, …, by maximum likelihood, using the Yule-Walker estimates as initial values for the maximization algorithm. The maximum likelihood estimates for the mean-corrected data are shown in Table 9.2 together with the corresponding AICC values.
Table 9.1. The Yule-Walker Estimates φ̂_mj, j = 1, …, m; m = 1, …, 5, and the Ratios r_mj (in Parentheses) for the Data of Example 9.2.1

  m \ j        1                2                 3                4               5
  1        .878 (13.255)
  2       1.410 (12.785)   -.606 (-5.490)
  3       1.301  (9.545)   -.352 (-1.595)   -.180 (-1.318)
  4       1.293  (9.339)   -.369 (-1.632)   -.119  (-.526)   -.047 (-.338)
  5       1.295  (9.361)   -.362 (-1.602)   -.099  (-.428)   -.117 (-.516)   .054 (.391)
[Figure 9.12. The sample ACF (a) and PACF (b) for the data of Example 9.2.1.]
Table 9.2. The Maximum Likelihood Estimates φ̂_mj, σ̂²_m, j = 1, …, m; m = 1, …, 5, and the Corresponding AICC, BIC and FPE Values for the Data of Example 9.2.1

  m    φ̂_m1    φ̂_m2    φ̂_m3    φ̂_m4    φ̂_m5    σ̂²_m    AICC     BIC      FPE
  1    .892                                       1.547   660.44   662.40   1.562
  2   1.471   -.656                                .885   551.94   558.29    .903
  3   1.387   -.471   -.127                        .871   550.86   561.49    .897
  4   1.383   -.486   -.081   -.033                .870   552.75   567.31    .905
  5   1.383   -.484   -.072   -.059    .019        .870   554.81   573.01    .914

The BIC and FPE statistics (which are analogous to the AICC but with different penalties for the introduction of additional parameters) are also shown. All three statistics are discussed in Section 9.3.
From Table 9.2 we see that the autoregressive model selected by the AICC criterion for the mean-corrected data {X_t} is

X_t − 1.387X_{t−1} + .471X_{t−2} + .127X_{t−3} = Z_t,  {Z_t} ∼ WN(0, .871).   (9.2.3)
Application of the goodness of fit tests to be described in Section 9.4 shows
that this model is indeed satisfactory. (If the residuals for the model (9.2.3) had
turned out to be incompatible with white noise, it would be necessary to
modify the model. The model modification technique described below in (d)
is frequently useful for this purpose.)
Approximate confidence intervals for the coefficients can be found from the asymptotic distribution of the maximum likelihood estimators given in Section 8.8. The program PEST approximates the covariance matrix V(β) of (8.8.3) by 2H^{−1}(β̂), where H(β) is the Hessian matrix of the reduced likelihood evaluated at β̂. From this we obtain the asymptotic .95 confidence bounds β̂_j ± 1.96[v_jj(β̂)/n]^{1/2} for β_j, where v_jj(β̂) is the jth diagonal element of V(β̂). This gives the following bounds for the coefficients φ_1, φ_2, φ_3:

φ_1: 1.387 ± .136,   φ_2: −.471 ± .226,   φ_3: −.127 ± .138.
The confidence bounds for φ_3 suggest that perhaps an AR(2) model should have been fitted to this data since 0 falls between the bounds for φ_3. In fact if we had minimized the BIC rather than the AICC (see Table 9.2) we would have chosen the AR(2) model. The BIC is a Bayesian modification of the AIC criterion which was introduced by Akaike to correct the tendency of the latter to overestimate the number of parameters. The true model in this example was

{Z_t} ∼ WN(0, 1).   (9.2.4)
EXAMPLE 9.2.2. Inspection of the sample autocorrelation and partial autocorrelation functions of the logged and differenced airline data {∇∇₁₂V_t} shown
in Figure 9.11 suggests the possibility of either a moving average model of order 12 (or perhaps 23) with a large number of zero coefficients, or alternatively of an autoregressive model of order 12.

To explore these possibilities further, the program PEST (Option 3) was used to compute the preliminary estimates θ̂_m and φ̂_m, m = 15, 25, 30, as described in Sections 8.2 and 8.3 respectively. These are shown, with the ratios r_mj of each estimated coefficient to 1.96 times its standard error, in Tables 9.3 and 9.4 respectively. For φ̂_m, r_mj was defined by equation (9.2.2). For θ̂_m,

r_mj = θ̂_mj/(1.96 σ̂_mj),

where by Theorem 8.3.1, σ̂²_mj = n^{−1}(1 + θ̂²_{m1} + ⋯ + θ̂²_{m,j−1}), j > 1, and σ̂²_{m1} = n^{−1}.

For the preliminary moving average model of order 30 we have plotted the ratios r_mj, j = 1, …, 30, with boundaries at the critical value 1, in Figure 9.13. The graph suggests that we consider models with non-zero coefficients at lags 1, 12, 23, possibly 3, and possibly also 5, 9 and 13. Of the models with non-zero coefficients at one or more of the lags 1, 3, 12 and 23, it is found that the one with smallest AICC value (−486.04) is (for X_t = ∇∇₁₂V_t − .00029)

X_t = Z_t − .355Z_{t−1} − .201Z_{t−3} − .524Z_{t−12} + .241Z_{t−23},   (9.2.5)

where {Z_t} ∼ WN(0, .00125). If we expand the class of models considered to include non-zero coefficients at one or more of the lags 5, 9 and 13 suggested
[Figure 9.13. Ratio of the estimated coefficient θ̂_{30,j} to 1.96 times its standard error, j = 1, …, 30 (from Table 9.3).]
Table 9.3. The Innovation Estimates θ̂_m, v̂_m, m = 15, 25, 30, for Example 9.2.2

m = 15
White Noise Variance
.0014261
M A Coefficients
- . 1 5392
.071 00
- .40660
.03474
- .04885
.04968
. 1 5 1 23
- .36395
- .07263
Ratio of Coefficients to ( 1 .96 * Standard Error)
- .83086
.38408
- 2.37442
. 1 8465
- .25991
.26458
.7571 2
- 1.9 1 782
- .38355
m =
.02 1 85
.09606
- .00745
.0885 1
- .07203
. 1 4956
. 1 1 679
.5103 1
- .03700
.47294
- .38 1 22
.74253
- .02483
. 1 2737
- .03 196
- .05076
- .05955
. 12247
- .09385
. 1 2591
- .06775
- . 10028
- . 1 3471
.68497
- . 1 5528
- .24370
- .27954
.66421
- .501 28
.61 149
- .32501
- .470 1 6
- .02979
. 1 3925
- .02092
- .04987
- .06067
- .0401 2
. 1401 8
- .09032
. 1 4 1 33
- .06822
- .08874
- .05 1 06
- . 1 6 1 79
.74792
- .1 0099
- .23740
- .28201
- . 1 8554
.761 08
- .48 1 16
.6821 3
- .32447
- .4 1 1 99
- .23605
25
White Noise Variance
.001 2638
MA Coefficients
.05981
- .36499
.05701
- .01 327
- .47123
.01909
- . 1 1 667
.061 24
.00908
- .04050
Ratio of Coefficients to ( 1 .96 * Standard
- 2. 1 3 1 40
.328 1 3
- .071 4 1
.30723
. 10 1 60
- 2.50732
- .56352
.29444
.04352
- . 1 9399
m =
1 5, 25, 30 for Ex. 9.2.2
- . 1481 2
- .03646
. 1 3557
- .0 1722
.24405
Error)
- .8 1 1 27
- . 19619
.66284
- .08271
1 . 168 1 8
30
White Noise Variance
.0012483
MA Coefficients
- .35719
- . 1 5764
.06006
.03632
.01689
- .063 1 3
.02333
- .47895
. 1 5424
- . 1 3 1 03
.0521 6
- .03701
.01 85 1
- .0351 3
.25435
.03687
- .02951
.04555
Ratio of Coefficients to ( 1 .96* Standard Error)
- .86556
.33033
- 2.08588
. 1 9553
.09089
- .33967
. 1 2387
- 2.54232
.75069
- . 17626
.24861
- .628 1 5
1 .20720
- . 1 6683
.0879 1
. 1 7077
- . 1 3662
.2 1083
Table 9.4. The Yule-Walker Estimates φ̂_m, v̂_m, m = 15, 25, 30, for Example 9.2.2

m = 15
White Noise Variance
.0014262
AR Coefficients
- .40660
- . 1 6261
- .09364
- .00421
.042 1 6
.09282
- .09957
- .38601
- .1 4 1 60
Ratio of Coefficients to ( 1 .96 * Standard Error)
- .88695
- .50827
- 2.37494
.535 1 7
.24526
- .02452
- .77237
- .575 1 7
- 2.22798
- .091 85
. 1 5873
- .08563
.06259
.01347
- .021 74
- .5301 6
.92255
- .4648 1
.361 5 5
.07765
- . 1 2701
- .1 32 1 5
. 1 5464
- .07800
- .046 1 1
- . 1 0408
.06052
.01640
- .04537
- .0938 1
- . 1 0264
- .72665
.84902
- .42788
- .25 1 90
- .57639
.33 1 37
.08970
- .24836
- .5 1 488
- .60263
- . 14058
. 1 5 146
- .07941
- .033 1 6
- . 1 1 330
.045 1 4
.07806
.02239
- .03957
- . 10 1 1 3
- . 1 0948
- .00489
- .76545
.82523
- .39393
- . 1 8026
- .61407
.24853
.421 48
. 1 2 1 20
- .1 9750
- .54978
- .59267
- .02860
m = 25
White Noise Variance
.001 2638
AR Coefficients
- .36498
- .07087
- . 1 5643
. 1 1 335
.03 1 22
- .00683
- .038 1 5
- .44895
- .1 9 1 80
- . 1 2301
.051 0 1
. 1 0 1 60
.0885 1
- .03933
. 1 0959
Ratio of Coefficients to ( 1 .96 * Standard Error)
- 2. 1 4268
-.86904
- .39253
.62205
. 1 7052
- .03747
- .20886
- 2.46259
- .98302
- .67279
.28007
.55726
.48469
- .2 1 632
.60882
m = 30
White Noise Variance
.0012483
AR Coefficients
- .35718
- .06759
- . 1 5995
.09844
.04452
- .0 1 653
- .03322
- .46045
- . 1 8454
- . 1 5279
.06951
.09498
.08865
- .03566
. 1 1481
.00673
- .07332
.01 324
Ratio of Coefficients to ( 1 .96 * Standard Error)
- 2.08586
- .3721 0
- .88070
.53284
.24 1 32
- .09005
- . 1 8061
- 2.503 1 0
- .925 14
- .76248
.34477
.4761 5
.47991
- . 1 9433
.6253 1
.072 1 1
.03638
- .40372
Table 9.5. Moving Average Models for Example 9.2.2

                 Model (9.2.5)   Model (9.2.6)   Model (9.2.9)
  θ̂_1               -.355           -.433           -.396
  θ̂_3               -.201           -.306              0
  θ̂_5                  0             .238              0
  θ̂_12              -.524           -.656           -.614
  θ̂_13                 0               0              .243
  θ̂_23               .241            .352              0
  AICC             -486.04         -489.95         -483.38
  σ̂²                .00125          .00103          .00134
  v_130             .00125          .00117          .00134
by Figure 9.13, we find that there is a model with even smaller AICC value than (9.2.5), namely

X_t = Z_t − .433Z_{t−1} − .306Z_{t−3} + .238Z_{t−5} − .656Z_{t−12} + .352Z_{t−23},   (9.2.6)

with {Z_t} ∼ WN(0, .00103) and AICC = −489.95. Since the process defined by (9.2.6) passes the goodness of fit tests in Section 9.4, we choose it as our moving average model for the data.
The substantial reduction in white noise variance achieved by (9.2.6) must be interpreted carefully since (9.2.5) is an invertible model and (9.2.6) is not. Thus for (9.2.6) the asymptotic one-step linear predictor variance (the white noise variance of the equivalent invertible version of the model) is not σ² but σ²/|b_1 ⋯ b_j|² (see Section 4.4), where b_1, …, b_j are the zeroes of the moving average polynomial θ(z) inside the unit circle. For the model (9.2.6), j = 4 and |b_1 ⋯ b_j| = .939, so the asymptotic one-step predictor variance is .00117, which is still noticeably smaller than the value .00125 for (9.2.5). The maximum likelihood program PEST also computes the estimated mean squared error of prediction, v_{n−1}, of the last observation based on the first (n − 1). This is simply r_{n−1} times the maximum likelihood estimator of σ² (see Section 8.7). It can be seen in Table 9.5 that v_{n−1} is quite close to σ̂² for each of the invertible models (9.2.5) and (9.2.9).
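The adjustment described above can be checked directly from the fitted coefficients of (9.2.6); the sketch below (illustrative, not from the text) finds the zeroes of θ(z) inside the unit circle with numpy and recovers values close to those quoted in the text (four zeroes, product of moduli about .939, adjusted variance about .00117).

```python
import numpy as np

# theta(z) = 1 + theta_1 z + ... + theta_23 z^23 for model (9.2.6);
# only lags 1, 3, 5, 12 and 23 have non-zero coefficients.
theta = np.zeros(24)
theta[0] = 1.0
theta[[1, 3, 5, 12, 23]] = [-.433, -.306, .238, -.656, .352]

roots = np.roots(theta[::-1])            # zeroes of theta(z); highest degree first
inside = roots[np.abs(roots) < 1]        # b_1, ..., b_j inside the unit circle
sigma2 = .00103
print(len(inside), np.prod(np.abs(inside)))        # about 4 and about .939
print(sigma2 / np.prod(np.abs(inside)) ** 2)       # about .00117
```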
The model (9.2.6) does of course have an invertible version with the same
likelihood (which can be found by using the program PEST), however it will
have small non-zero coefficients at lags other than 1 , 3, 5, 12 and 23. If we
constrain the model to be invertible and to have zero coefficients except at
lags 1, 3, 5, 12 and 23, the likelihood is maximized for parameter values
precisely on the boundary of the invertible region and the maximum is strictly
less than the likelihood of the model (9.2.6). Thus in the presence of lag
constraints, insistence on invertibility can make it impossible to achieve the
maximum value of the likelihood.
A similar analysis of the data, starting from Table 9.4 and fitting auto­
regressive rather than moving average models, leads first to the model,
(9.2.7)
with {Z_t} ∼ WN(0, .00146) and AICC = −472.53. Allowing non-zero coefficients also at lags 3, 4, 9 and 16, we obtain the improved model,

X_t + .365X_{t−1} + .467X_{t−12} + .179X_{t−13} + .129X_{t−16} = Z_t,   (9.2.8)

with {Z_t} ∼ WN(0, .00142) and AICC = −472.95. However neither (9.2.7) nor (9.2.8) comes close to the moving average model (9.2.6) from the point of view of the AICC value.

It is interesting to compare the model (9.2.6) with the multiplicative model for {∇∇₁₂V_t} fitted by Box and Jenkins (1976), i.e. with X_t* = ∇∇₁₂V_t,

X_t* = (1 − .396B)(1 − .614B¹²)Z_t,  {Z_t} ∼ WN(0, .00134).   (9.2.9)

The AICC value for this model is −483.38, making it preferable to (9.2.8) but inferior to both (9.2.5) and to our chosen model (9.2.6). Characteristics of the three moving average models can be compared by examining Table 9.5.
(c) Identification of Mixed Models. The identification of a pure autoregressive or moving average process is reasonably straightforward using the sample autocorrelation and partial autocorrelation functions, the preliminary estimators φ̂_m and θ̂_m and the AICC. On the other hand, for ARMA(p, q) processes with p and q both non-zero, the sample ACF and PACF are much more difficult to interpret. We therefore search directly for values of p and q such that the AICC defined by (9.2.1) is minimum. The search can be carried out in a variety of ways, e.g. by trying all (p, q) values such that p + q = 1, then p + q = 2, etc., or alternatively by using the following steps.
(i) Use maximum likelihood estimation (program PEST) to fit ARMA
processes of orders ( 1, 1), (2, 2), . . . , to the data, selecting the model which
gives the smallest value of the AICC. [Initial parameter estimates for
PEST can be found using Option 3 to fit ARMA(p, p) models as described
in Example 8.4. 1 , or by appending zero coefficients to fitted maximum
likelihood models of lower order.]
(ii) Starting from the minimum-AICC ARMA(p, p) model, eliminate one or
more coefficients (guided by the standard errors of the estimated
coefficients), maximize the likelihood for each reduced model and
compute the AICC value.
(iii) Select the model with smallest AICC value (subject to its passing the
goodness of fit tests in Section 9.4).
The procedure is illustrated in the following example.
EXAMPLE 9.2.3. The sample autocorrelation and partial autocorrelation func­
tions of 200 observations of a stationary series are shown in Figure 9. 1 4. They
suggest an AR(4) model for the data, or perhaps a mixed model with fewer
coefficients. We shall explore both possibilities, first fitting a mixed model in
accordance with the procedure outlined above.
[Figure 9.14. The sample ACF (a) and PACF (b) for the data of Example 9.2.3.]
Table 9.6. Parameter Estimates for ARMA(p, p) Models, Example 9.2.3
(a) Preliminary Estimates (from PEST) with m = 9
p
Jl
J2
i'il
i'iz
I
2
3
.803
1 . 142
- 2.524
- .592
3.576
.868
.528
4. 195
.025
1 .982
p
- 2. 1 56
AICC
. 1 09
(b) Maximum Likelihood Estimates (from PEST)
��
�2
�
.701
2 1 . 1 1 8 - .580
3 1 . 1 22 - .555 - .020
4 1 .0 1 6 - 1.475 1 .0 1 2
�
- .525
�
�
.892
.798 . 103
.792 .059
.889 1 .207
�
- .042
.897
�
.216
82
656.61
591 .43
Non-causal
AICC
1 .458 652.33
.982 578.27
.982 582.39
.930 579.98
BIC
657.36
591 .85
603.17
603.67
Table 9.6(a) shows the preliminary parameter estimates φ̂, θ̂ for ARMA(p, p) models with p = 1, 2 and 3 (p = 3 gives a non-causal model) and m = 9, obtained from PEST as described in Example 8.4.1. On the basis of the AICC values in Table 9.6(a), the ARMA(2, 2) model is the most promising. Since the preliminary ARMA(3, 3) model is not causal, it cannot be used to initialize the search for the maximum likelihood ARMA(3, 3) model. Instead, we use the maximum likelihood ARMA(2, 2) model with appended coefficients φ̂_3 = θ̂_3 = 0. The maximum likelihood results are shown in Table 9.6(b). The AICC values have a clearly defined minimum at p = 2. Comparing each coefficient of the maximum likelihood ARMA(2, 2) model with its standard error we obtain the results shown in Table 9.7, which suggest dropping the coefficient θ_2 and fitting an ARMA(2, 1) process. Maximum likelihood estimation then gives the model (for the mean-corrected data)

X_t − 1.185X_{t−1} + .624X_{t−2} = Z_t + .703Z_{t−1},  {Z_t} ∼ WN(0, .986),   (9.2.10)

with AICC value 576.88 and BIC value 586.48.
Table 9.7. Comparison of φ̂_1, φ̂_2, θ̂_1 and θ̂_2 with Their Standard Errors (Obtained from the Program PEST)

                                               φ̂_1      φ̂_2      θ̂_1      θ̂_2
  Estimated coefficient                       1.118    -.580     .798     .103
  Estimated coefficient / 1.96*(std. error)   5.811   -3.605    3.604     .450
If now we fit AR(p) models of order p = 2, . . . , 6 we obtain the results shown
in Table 9.8. The smallest AICC and BIC values are both achieved when
p = 5, but the values are substantially larger than the corresponding values
Table 9.8. Maximum Likelihood AR(p) Models for Example 9.2.3

  p    φ̂_1     φ̂_2      φ̂_3     φ̂_4     φ̂_5     φ̂_6     σ̂²     AICC     BIC      FPE
  2   1.379   -.773                                      1.380   640.83   646.50   1.408
  3   1.712  -1.364    .428                              1.121   602.03   611.77   1.155
  4   1.839  -1.760    .919    -.284                     1.029   587.36   600.98   1.071
  5   1.891  -1.932   1.248    -.627    .186              .992   582.35   599.66   1.043
  6   1.909  -1.991   1.365    -.807    .362   -.092      .984   582.77   603.56   1.044
for (9.2.10). We therefore select the ARMA(2, 1) model, subject to its passing the goodness of fit tests to be discussed in Section 9.4.

The data for this example were in fact generated by the Gaussian process,

{Z_t} ∼ WN(0, 1).   (9.2.11)
(d) Use of the Residuals for Model Modification. When an ARMA model φ(B)X_t = θ(B)Z_t is fitted to a given series, an essential part of the procedure is to examine the residuals, which should, if the model is satisfactory, have the appearance of a realization of white noise. If the autocorrelations and partial autocorrelations of the residuals suggest that they come from some other clearly identifiable process, then this more complicated model for the residuals can be used to suggest a more appropriate model for the original data.

If the residuals appear to come from an ARMA process with coefficient vectors φ_Z and θ_Z, this indicates that {Z_t} in our fitted model should satisfy φ_Z(B)Z_t = θ_Z(B)W_t where {W_t} is white noise. Applying the operator φ_Z(B) to each side of the equation defining {X_t}, we obtain

φ_Z(B)φ(B)X_t = φ_Z(B)θ(B)Z_t = θ_Z(B)θ(B)W_t,   (9.2.12)

where {W_t} is white noise. The modified model for {X_t} is thus an ARMA process with autoregressive and moving average operators φ_Z(B)φ(B) and θ_Z(B)θ(B) respectively.
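The operator multiplication in (9.2.12) is just polynomial multiplication of the coefficient sequences. In the sketch below (illustrative, not from the text) the fitted AR(2) operator of (9.2.13) is combined with a hypothetical MA(2) residual operator θ_Z(B) = 1 + 0.4B + 0.2B²; the values 0.4 and 0.2 are invented for illustration only.

```python
import numpy as np

# Coefficient arrays are stored constant term first, e.g. 1 - 1.379 B + 0.773 B^2.
phi     = np.array([1.0, -1.379, 0.773])   # fitted AR operator, as in (9.2.13)
theta   = np.array([1.0])                  # fitted MA operator (pure AR model)
phi_Z   = np.array([1.0])                  # residual AR operator (MA residual model)
theta_Z = np.array([1.0, 0.4, 0.2])        # hypothetical residual MA(2) operator

# Modified model (9.2.12): AR operator phi_Z(B)phi(B), MA operator theta_Z(B)theta(B).
# np.polymul convolves the coefficient sequences, which is polynomial multiplication.
new_ar = np.polymul(phi_Z, phi)
new_ma = np.polymul(theta_Z, theta)
print(new_ar)   # [ 1.    -1.379  0.773 ]
print(new_ma)   # [ 1.     0.4    0.2   ]
```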
EXAMPLE 9.2.4. Consider the AR(2) model in Table 9.8,

(1 − 1.379B + .773B²)X_t = Z_t,   (9.2.13)

which was fitted to the data of Example 9.2.3. This is an unsatisfactory model, both for its high AICC value and the non-whiteness of its residuals. The sample autocorrelation and partial autocorrelation functions of its residuals are shown in Figure 9.15. They suggest an MA(2) model for {Z_t}, i.e.

Z_t = (1 + θ_1B + θ_2B²)W_t,  {W_t} ∼ WN(0, σ²).   (9.2.14)

From (9.2.13) and (9.2.14) we obtain an ARMA(2, 2) process as the modified model for {X_t}. Fitting an ARMA(2, 2) process by maximum likelihood and allowing subsets of the coefficients to be zero leads us to the same model for the data as was found in Example 9.2.3.
[Figure 9.15. The sample ACF (a) and PACF (b) of the residuals when the data of Example 9.2.3 is fitted with the AR(2) model (9.2.13).]
It is fortunate in Example 9.2.4 that the fitted AR(2) model has an autoregressive polynomial similar to that of the best-fitting ARMA(2, 1) process (9.2.10). It frequently occurs, when an AR(p) model is fitted to an ARMA(p, q) process, that the autoregressive polynomials for the two processes are totally different. The residuals from the AR(p) model in such a case are not likely to have a simple form such as the moving average form encountered in Example 9.2.4.
§9.3 Order Selection
In Section 9.2 we referred to the problem of overfitting and the need to avoid it by imposing a cost for increasing the number of parameters in the fitted model. One way in which this can be done for pure autoregressive models is to minimize the final prediction error (FPE) of Akaike (1969). The FPE is an estimate of the one-step prediction mean squared error for a realization of the process independent of the one observed. If we fit autoregressive processes of steadily increasing order p to the observed data, the maximum likelihood estimate of the white noise variance will usually decrease with p; however the estimation errors in the expanding set of fitted parameters will eventually cause the FPE to increase. According to the FPE criterion we then choose the order of the fitted process to be the value of p for which the FPE is minimum. To apply this criterion it remains only to express the FPE in terms of the data x_1, ..., x_n.
Assume that {X_1, ..., X_n} is a realization of an AR(p) process with coefficients φ_1, ..., φ_p (p < n), and let {Y_1, ..., Y_n} be an independent realization of the same process. If φ̂_1, ..., φ̂_p are the maximum likelihood estimators of the coefficients based on {X_1, ..., X_n} and if we use these to compute the one-step predictor φ̂_1 Y_n + ··· + φ̂_p Y_{n+1-p} of Y_{n+1}, then the mean-square prediction error is

E(Y_{n+1} - φ̂_1 Y_n - ··· - φ̂_p Y_{n+1-p})²
  = E[Y_{n+1} - φ_1 Y_n - ··· - φ_p Y_{n+1-p} - (φ̂_1 - φ_1)Y_n - ··· - (φ̂_p - φ_p)Y_{n+1-p}]²
  = σ² + E[(φ̂_p - φ_p)′ [Y_{n+1-i} Y_{n+1-j}]_{i,j=1}^{p} (φ̂_p - φ_p)],

where φ_p′ = (φ_1, ..., φ_p), φ̂_p′ = (φ̂_1, ..., φ̂_p) and σ² is the white noise variance of the AR(p) model. Writing the last term in the preceding equation as the expectation of the conditional expectation given X_1, ..., X_n, and using the independence of {X_1, ..., X_n} and {Y_1, ..., Y_n}, we obtain

E(Y_{n+1} - φ̂_1 Y_n - ··· - φ̂_p Y_{n+1-p})² = σ² + E[(φ̂_p - φ_p)′ Γ_p (φ̂_p - φ_p)],

where Γ_p = E[Y_i Y_j]_{i,j=1}^{p}. We can approximate the last term by assuming that
n^{1/2}(φ̂_p - φ_p) has its asymptotic distribution N(0, σ²Γ_p^{-1}) from Theorem 8.1.1. This gives (see Problem 1.16)

E(Y_{n+1} - φ̂_1 Y_n - ··· - φ̂_p Y_{n+1-p})² ≈ σ²(1 + p/n).   (9.3.1)

If σ̂² is the maximum likelihood estimator of σ², then for large n, nσ̂²/σ² is distributed approximately as chi-squared with (n - p) degrees of freedom (see Section 8.9). We therefore replace σ² in (9.3.1) by the estimator nσ̂²/(n - p) to get the estimated mean square prediction error of Y_{n+1},

FPE = σ̂² (n + p)/(n - p).   (9.3.2)

Inspection of Table 9.2 shows how the FPE decreases to a minimum then increases as p is increased in Example 9.2.1. The same table shows the non-increasing behaviour of σ̂².
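The FPE rule (9.3.2) needs only an estimate of σ² for each candidate order. The sketch below tabulates σ̂² and FPE for p = 1, ..., pmax; it uses ordinary least squares in place of the maximum likelihood estimates purely to keep the code short, and the function name is illustrative only.

# A hedged sketch of the FPE order-selection rule (9.3.2).
import numpy as np

def fpe_table(x, pmax):
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    out = []
    for p in range(1, pmax + 1):
        # regress X_t on X_{t-1}, ..., X_{t-p}
        Y = x[p:]
        Z = np.column_stack([x[p - j:n - j] for j in range(1, p + 1)])
        phi, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        sigma2 = np.mean((Y - Z @ phi) ** 2)                   # estimate of sigma^2
        out.append((p, sigma2, sigma2 * (n + p) / (n - p)))    # (p, sigma^2, FPE)
    return out

# for p, s2, fpe in fpe_table(x, 6): print(p, round(s2, 3), round(fpe, 3))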
A more generally applicable criterion for model selection is the information criterion of Akaike (1973), known as the AIC. This was designed to be an approximately unbiased estimate of the Kullback-Leibler index of the fitted model relative to the true model (defined below). Here we use a bias-corrected version of the AIC, referred to as the AICC, suggested by Hurvich and Tsai (1989).
If X is an n-dimensional random vector whose probability density belongs to the family {f(·; ψ), ψ ∈ Ψ}, the Kullback-Leibler discrepancy between f(·; ψ) and f(·; θ) is defined as

d(ψ | θ) = Δ(ψ | θ) - Δ(θ | θ),

where

Δ(ψ | θ) = E_θ(-2 ln f(X; ψ)) = ∫_{ℝ^n} -2 ln(f(x; ψ)) f(x; θ) dx

is the Kullback-Leibler index of f(·; ψ) relative to f(·; θ). (In general Δ(ψ | θ) ≠ Δ(θ | ψ).) Applying Jensen's inequality, we see that

d(ψ | θ) = ∫_{ℝ^n} -2 ln( f(x; ψ)/f(x; θ) ) f(x; θ) dx
         ≥ -2 ln( ∫_{ℝ^n} ( f(x; ψ)/f(x; θ) ) f(x; θ) dx )
         = -2 ln( ∫_{ℝ^n} f(x; ψ) dx )
         = 0,

with equality holding if and only if f(x; ψ) = f(x; θ) a.e. [f(·; θ)].
Given observations X_1, ..., X_n of an ARMA process with unknown parameters θ = (β, σ²), the true model could be identified if it were possible to compute the Kullback-Leibler discrepancy between all candidate models and the true model. Since this is not possible we estimate the Kullback-Leibler discrepancies and choose the model whose estimated discrepancy (or index) is minimum. In order to do this, we assume that the true model and the alternatives are all Gaussian. (See the Remark below for further comments on this point.) Then for any given θ = (β, σ²), f(·; θ) is the probability density of (Y_1, ..., Y_n)′, where {Y_t} is a Gaussian ARMA(p, q) process with coefficient vector β and white noise variance σ². (The dependence of θ on p and q is through the dimension of the autoregressive and moving average coefficient vectors in β.)
Suppose therefore that our observations X_1, ..., X_n are from a Gaussian ARMA process with parameter vector θ = (β, σ²) and assume for the moment that the true order is (p, q). Let θ̂ = (β̂, σ̂²) be the maximum likelihood estimator of θ based on X_1, ..., X_n and let Y_1, ..., Y_n be an independent realization of the true process (with parameter θ). Then

-2 ln L_Y(β̂, σ̂²) = -2 ln L_X(β̂, σ̂²) + σ̂^{-2} S_Y(β̂) - n,

so that

E_{β,σ²}[Δ(θ̂ | θ)] = E_{β,σ²}[-2 ln L_X(β̂, σ̂²)] + E_{β,σ²}[σ̂^{-2} S_Y(β̂)] - n.   (9.3.3)
Making the local linearity approximation used in Section 8.11, we can write, for large n,

S_Y(β̂) ≈ S_Y(β) + (β̂ - β)′ (∂S_Y/∂β)(β) + ½ (β̂ - β)′ [∂²S_Y(β)/∂β_i ∂β_j]_{i,j=1}^{p+q} (β̂ - β)
       ≈ S_Y(β) + (β̂ - β)′ 2 Σ_{t=1}^{n} (∂Z_t/∂β)(β) Z_t(β) + (β̂ - β)′ D′D (β̂ - β).

From Section 8.11 we know that n^{-1}D′D converges in probability to σ²V^{-1}(β), that β̂ is AN(β, n^{-1}V(β)), and that (∂Z_t/∂β)(β)Z_t(β) has mean 0. Replacing D′D by nσ²V^{-1}(β) and assuming that n^{1/2}(β̂ - β) has covariance matrix V(β), we obtain

E_{β,σ²}[S_Y(β̂)] ≈ E_{β,σ²}[S_Y(β)] + nσ² E_{β,σ²}[(β̂ - β)′ V^{-1}(β)(β̂ - β)]
                ≈ σ²n + σ²(p + q),

since (∂Z_t/∂β)(β)Z_t(β) is independent of β̂ - β and E(U′Σ^{-1}U) = trace(ΣΣ^{-1}) = k for any zero-mean random k-vector U with nonsingular covariance matrix Σ. From the argument given in Section 8.9, nσ̂² = S_X(β̂) is distributed approximately as σ²χ²(n - p - q) for large n and is asymptotically independent of β̂. With the independence of {X_1, ..., X_n} and {Y_1, ..., Y_n}, this implies that σ̂² is asymptotically independent of S_Y(β̂).
Consequently,

E_{β,σ²}[S_Y(β̂)/σ̂²] - n ≈ σ²(n + p + q) E_{β,σ²}[σ̂^{-2}] - n = 2(p + q + 1)n/(n - p - q - 2).

Thus the quantity -2 ln L_X(β̂, σ̂²) + 2(p + q + 1)n/(n - p - q - 2) is an approximately unbiased estimate of the expected Kullback-Leibler index E_θ(Δ(θ̂ | θ)) in (9.3.3). Since the preceding calculations (and the maximum likelihood estimators β̂ and σ̂²) are based on the assumption that the true order is (p, q), we therefore select the values of p and q for our fitted model to be those which minimize AICC(β̂), where

AICC(β) := -2 ln L_X(β, S_X(β)/n) + 2(p + q + 1)n/(n - p - q - 2).   (9.3.4)

The AIC statistic, defined as

AIC(β) := -2 ln L_X(β, S_X(β)/n) + 2(p + q + 1),

can be used in the same way. Both AICC(β, σ²) and AIC(β, σ²) can be defined for arbitrary σ² by replacing S_X(β)/n in the preceding definitions by σ²; however we shall use AICC(β) and AIC(β) as defined above, since both AICC and AIC are minimized for any given β by setting σ² = S_X(β)/n.
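A hedged sketch of the AICC and AIC computations in (9.3.4) follows: the Gaussian log-likelihood of each candidate ARMA(p, q) model is taken from statsmodels (its .llf attribute), and the AICC penalty is added by hand. The function and variable names are illustrative assumptions.

# Compute AICC and AIC for an ARMA(p, q) model fitted to an array x.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def aicc_aic(x, p, q):
    n = len(x)
    fit = ARIMA(x, order=(p, 0, q), trend="n").fit()
    m2ll = -2.0 * fit.llf                         # -2 ln L_X(beta_hat, sigma_hat^2)
    aicc = m2ll + 2.0 * (p + q + 1) * n / (n - p - q - 2)
    aic = m2ll + 2.0 * (p + q + 1)
    return aicc, aic

# Example of a small order search by AICC minimization:
# orders = [(p, q) for p in range(4) for q in range(3)]
# best = min(orders, key=lambda pq: aicc_aic(x, *pq)[0])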
For fitting autoregressive models, Monte Carlo studies (Jones (1975), Shibata (1976)) suggest that the AIC has a tendency to overestimate p. The penalty factors 2(p + q + 1)n/(n - p - q - 2) and 2(p + q + 1) for the AICC and AIC statistics are asymptotically equivalent as n → ∞. The AICC statistic however has a more extreme penalty for large-order models, which counteracts the overfitting tendency of the AIC. The BIC is another criterion which attempts to correct the overfitting nature of the AIC. For a zero-mean causal invertible ARMA(p, q) process, it is defined (Akaike (1978)) to be

BIC = (n - p - q) ln[nσ̂²/(n - p - q)] + n(1 + ln √(2π))
      + (p + q) ln[(Σ_{t=1}^{n} X_t² - nσ̂²)/(p + q)],   (9.3.5)

where σ̂² is the maximum likelihood estimate of the white noise variance.
The BIC is a consistent order selection procedure in the sense that if the data {X_1, ..., X_n} are in fact observations of an ARMA(p, q) process, and if p̂ and q̂ are the estimated orders found by minimizing the BIC, then p̂ → p and q̂ → q with probability one as n → ∞ (Hannan (1980)). This property is not shared by the AICC or AIC. On the other hand, order selection by minimization of the AICC, AIC or FPE is asymptotically efficient for autoregressive models, while order selection by BIC minimization is not
(Shibata (1980), Hurvich and Tsai (1989)). Efficiency in this context is defined as follows. Suppose that {X_t} is a causal AR(∞) process satisfying

Σ_{j=0}^{∞} π_j X_{t-j} = Z_t

(where π_0 = 1) and let (φ̂_{p1}, ..., φ̂_{pp})′ be the Yule-Walker estimates of the coefficients of an AR(p) model fitted to the data {X_1, ..., X_n} (see (8.2.2)). The one-step mean-square prediction error for an independent realization {Y_t} of {X_t}, based on the AR(p) model fitted to {X_t}, is

E_X(Y_{n+1} - φ̂_{p1} Y_n - ··· - φ̂_{pp} Y_{n+1-p})²
  = E_X( Z*_{n+1} - (φ̂_{p1} + π_1)Y_n - ··· - (φ̂_{pp} + π_p)Y_{n+1-p} - Σ_{j=p+1}^{∞} π_j Y_{n+1-j} )²
  = σ² + (φ̂_{p,∞} + π_∞)′ Γ_∞ (φ̂_{p,∞} + π_∞)
  =: H(p),

where E_X denotes expectation conditional on X_1, ..., X_n. Here {Z*_t} is the white noise associated with the {Y_t} process, Γ_∞ is the infinite-dimensional covariance matrix of {Y_t}, and φ̂_{p,∞} and π_∞ are the infinite-dimensional vectors (φ̂_{p1}, ..., φ̂_{pp}, 0, 0, ...)′ and (π_1, π_2, ...)′. Now if p*_n is the value of p which minimizes H(p), 0 ≤ p ≤ k_n, and k_n is a sequence of constants converging to infinity at a suitable rate, then an order selection procedure is said to be efficient if the estimated order p̂_n satisfies

H(p*_n)/H(p̂_n) → 1 in probability

as n → ∞. In other words, an efficient order selection procedure chooses an AR model which achieves the optimal rate of convergence of the mean-square prediction error.
Of course in the modelling of real data there is rarely such a thing as the "true order". For the process X_t = Σ_{j=0}^{∞} ψ_j Z_{t-j} there may be many polynomials θ(z), φ(z) such that the coefficients of z^j in θ(z)/φ(z) closely approximate ψ_j for moderately small values of j. Correspondingly there may be many ARMA processes with properties similar to {X_t}. This problem of identifiability becomes much more serious for multivariate processes. The AICC criterion does however provide us with a rational criterion for choosing between competing models. It has been suggested (Duong (1981)) that models with AIC values within c of the minimum value should be considered competitive (with c = 2 as a typical value). Selection from amongst the competitive models can then be based on such factors as whiteness of the residuals (Section 9.4) and model simplicity.
Remark. In the course of the derivation of the AICC, it was assumed that the observations {X_1, ..., X_n} were from a Gaussian ARMA(p, q) process. However, even if (X_1, ..., X_n) has a non-Gaussian distribution, the argument
given above shows that the AICC is an approximately unbiased estimator of

E(-2 ln L_Y(β̂, σ̂²)),   (9.3.6)

where the expectation is now taken relative to the true (possibly non-Gaussian) distribution of (X_1, ..., X_n)′ and (Y_1, ..., Y_n)′, and L_Y is the Gaussian likelihood based on (Y_1, ..., Y_n)′. The quantity in (9.3.6) can be interpreted as the expected Kullback-Leibler index of the maximum likelihood Gaussian model relative to the true distribution of the process.
The AICC for Subset Models. We frequently have occasion, particularly in analyzing seasonal data, to fit ARMA(p, q) models in which all except m (≤ p + q) of the coefficients are constrained to be zero (see Example 9.2.2). In such cases the definition (9.3.4) is replaced by

AICC(β) = -2 ln L_X(β, S_X(β)/n) + 2(m + 1)n/(n - m - 2).   (9.3.7)
§9.4 Diagnostic Checking

Typically the goodness of fit of a statistical model to a set of data is judged by comparing the observed values with the corresponding predicted values obtained from the fitted model. If the fitted model is appropriate, then the residuals should behave in a manner which is consistent with the model.
When we fit an ARMA(p, q) model to a given series we first find the maximum likelihood estimators φ̂, θ̂ and σ̂² of the parameters φ, θ and σ². In the course of this procedure the predicted values X̂_t(φ̂, θ̂) of X_t based on X_1, ..., X_{t-1} are computed for the fitted model. The residuals are then defined, in the notation of Section 8.7, by

Ŵ_t := (X_t - X̂_t(φ̂, θ̂)) / (r_{t-1}(φ̂, θ̂))^{1/2},   t = 1, ..., n.   (9.4.1)

If we were to assume that the maximum likelihood ARMA(p, q) model is the true process generating {X_t}, then we could say that {Ŵ_t} ~ WN(0, σ̂²). However to check the appropriateness of an ARMA(p, q) model for the data, we should assume only that X_1, ..., X_n is generated by an ARMA(p, q) process with unknown parameters φ, θ and σ², whose maximum likelihood estimators are φ̂, θ̂ and σ̂² respectively. Then it is not true that {Ŵ_t} is white noise. Nonetheless Ŵ_t, t = 1, ..., n, should have properties which are similar to those of the white noise sequence

W_t(φ, θ) := (X_t - X̂_t(φ, θ)) / (r_{t-1}(φ, θ))^{1/2},   t = 1, ..., n.

Moreover, by (8.11.2), E(W_t(φ, θ) - Z_t)² is small for large t, so that properties of the residuals {Ŵ_t} should reflect those of the white noise sequence {Z_t} generating the underlying ARMA(p, q) process. In particular the sequence {Ŵ_t} should be approximately (i) uncorrelated if {Z_t} ~ WN(0, σ²), (ii) independent if {Z_t} ~ IID(0, σ²), and (iii) normally distributed if Z_t ~ N(0, σ²).
Remark. There are several other candidates for the title "residuals" of a fitted ARMA process. One choice for example is (see Problem 5.15(a))

Ẑ_t = θ̂^{-1}(B) φ̂(B) X_t,   t = 1, ..., n,

where φ̂(z) := 1 - φ̂_1 z - ··· - φ̂_p z^p, θ̂(z) := 1 + θ̂_1 z + ··· + θ̂_q z^q and X_t := 0, t ≤ 0. However we prefer to use the definition (9.4.1) because of its direct interpretation as a scaled difference between an observed and a predicted value, and because it is computed for each t in the course of determining the maximum likelihood estimates.
The Graph of {Ŵ_t, t = 1, ..., n}. If the fitted model is appropriate, then the graph of Ŵ_t, t = 1, ..., n, should resemble that of a white noise sequence. While it is difficult to identify the correlation structure of {Ŵ_t} (or any time series for that matter) from its graph, deviations of the mean from zero are sometimes clearly indicated by a trend or cyclic component, and non-constancy of the variance by fluctuations in Ŵ_t whose magnitude depends strongly on t.
The residuals obtained from fitting an AR(3) model to the data in Example 9.2.1 are displayed in Figure 9.16. The residuals Ŵ_t have been rescaled, i.e. divided by the estimated standard deviation σ̂, so that most of the values should lie between ±1.96. The graph gives no indication of a non-zero mean or non-constant variance, so on this basis there is no reason to doubt the compatibility of Ŵ_1, ..., Ŵ_n with white noise.
The next step is to check that the sample autocorrelation function of Ŵ_1, ..., Ŵ_n behaves as it should under the assumption that the fitted model is appropriate.
[Figure 9.16. Rescaled residuals from the AR(3) model for the data of Example 9.2.1.]
The Sample Autocorrelation Function of Ŵ_t. The sample autocorrelations of an iid sequence Z_1, ..., Z_n with E(Z_t²) < ∞ are for large n approximately iid with distribution N(0, 1/n) (see Example 7.2.1). Assuming therefore that we have fitted an appropriate ARMA model to our data and that the ARMA model is generated by an iid white noise sequence, the same approximation should be valid for the sample autocorrelation function of Ŵ_t, t = 1, ..., n, defined by

ρ̂_W(h) = Σ_{t=1}^{n-h} (Ŵ_t - W̄)(Ŵ_{t+h} - W̄) / Σ_{t=1}^{n} (Ŵ_t - W̄)²,   h = 1, 2, ...,

where W̄ = n^{-1} Σ_{t=1}^{n} Ŵ_t. However, because each Ŵ_t is a function of the maximum likelihood estimator (φ̂, θ̂), Ŵ_1, ..., Ŵ_n is not an iid sequence and the distribution of ρ̂_W(h) is not quite the same as in the iid case. In fact ρ̂_W(h) has an asymptotic variance which for small lags is less than 1/n and which for large lags is close to 1/n. The asymptotic distribution of ρ̂_W(h) is discussed below.
Let ρ̂_W = (ρ̂_W(1), ..., ρ̂_W(h))′, where h is a fixed positive integer. If {X_t} is the causal invertible ARMA process φ(B)X_t = θ(B)Z_t, define

φ̃(z) := φ(z)θ(z)   (9.4.2)

and

a(z) = (φ̃(z))^{-1} = Σ_{j=0}^{∞} a_j z^j.   (9.4.3)

It will be convenient also to define a_j = 0 for j < 0. Assuming h ≥ p + q, set

T̃ := [a_{i-j}], i = 1, ..., h, j = 1, ..., p + q,   (9.4.4)

and

Q := [q_{ij}]_{i,j=1}^{h} = T̃ Γ̃_{p+q}^{-1} T̃′.   (9.4.5)

Note that Γ̃_{p+q} is the covariance matrix of (Y_1, ..., Y_{p+q}) where {Y_t} is an AR(p + q) process with autoregressive polynomial given by φ̃(z) in (9.4.2) and with σ² = 1. Then, using the argument given in Box and Pierce (1970), it can be shown that

ρ̂_W is AN(0, n^{-1}(I_h - Q)),   (9.4.6)

where I_h is the h × h identity matrix. The asymptotic variance of ρ̂_W(i) is thus n^{-1}(1 - q_{ii}).
EXAMPLE 9.4.1 (AR(1)). In this case Γ̃_{p+q} = (1 - φ²)^{-1} and

q_{ii} = q_{ii}(φ) = φ^{2(i-1)}(1 - φ²),   i = 1, 2, ....
[Figure 9.17. The bounds ±1.96 n^{-1/2}(1 - q_{ii}(φ))^{1/2} of Example 9.4.1 with n = 100 and φ = 0 (outer), φ = .8 (inner).]
The bounds ±1.96(1 - q_{ii}(φ))^{1/2} n^{-1/2} are plotted in Figure 9.17 for two values of φ. In applications, since the true value of φ is unknown, the bounds ±1.96(1 - q_{ii}(φ̂))^{1/2} n^{-1/2} are plotted. A value of ρ̂_W(h) lying outside these bounds suggests possible inconsistency of the residuals, Ŵ_t, t = 1, ..., n, with the fitted model. However it is essential to bear in mind that approximately 5 percent of the values of ρ̂_W(h) can be expected to fall outside the bounds, even if the fitted model is correct.
EXAMPLE 9.4.2 (AR(2)). A straightforward calculation yields

q_{11} = 1 - φ_2²,   q_{12} = -φ_1 φ_2 (1 + φ_2),   q_{22} = 1 - φ_2² - φ_1²(1 + φ_2)².

Since the sequence {a_j} in (9.4.3) satisfies the recursion relations

a_j - φ_1 a_{j-1} - φ_2 a_{j-2} = 0,   j ≥ 2,

it follows from (9.4.5) that

q_{ij} = φ_1 q_{i-1,j} + φ_2 q_{i-2,j},   i ≥ 3,   (9.4.7)

and hence that

q_{ii} = φ_1 q_{i,i-1} + φ_2 q_{i,i-2}.

The asymptotic variance (1 - q_{ii}(φ))n^{-1} can thus easily be computed using the recursion (9.4.7) and the initial values q_{11}, q_{12} and q_{22}. The autocorrelations of the estimated residuals from the fitted AR(2) model in Example 9.2.1 and the bounds ±1.96(1 - q_{ii}(φ̂_1, φ̂_2))^{1/2} n^{-1/2} are plotted in Figure 9.18. With the exception of ρ̂_W(13), the correlations are well within the confidence bounds.

[Figure 9.18. The autocorrelations of the residuals {Ŵ_t} from the AR(2) model, X_t - 1.458X_{t-1} + .6X_{t-2} = Z_t, fitted to the data in Example 9.2.1. The bounds are computed as described in Example 9.4.2.]
The limit distribution of ρ̂_W for MA(1) and MA(2) processes is the same as in Examples 9.4.1 and 9.4.2 with φ replaced by -θ. Moreover the ARMA(1, 1) bounds can be found from the AR(2) bounds by setting φ_1 = (φ - θ) and φ_2 = -φθ, where φ and θ are the respective parameters in the ARMA(1, 1) model.
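The bounds of Examples 9.4.1 and 9.4.2 are easy to evaluate numerically. The following sketch computes q_{ii} in closed form for an AR(1), and via the recursion (9.4.7) for an AR(2) (or, by the substitution φ_1 = φ - θ, φ_2 = -φθ, for an ARMA(1, 1)); the function names are illustrative and h ≥ 2 is assumed.

# q_ii and the residual-ACF bounds of Examples 9.4.1 and 9.4.2.
import numpy as np

def qii_ar1(phi, h):
    i = np.arange(1, h + 1)
    return phi ** (2 * (i - 1)) * (1.0 - phi ** 2)

def qii_ar2(phi1, phi2, h):
    q = np.zeros((h + 1, h + 1))                    # 1-based indexing
    q[1, 1] = 1 - phi2 ** 2
    q[1, 2] = q[2, 1] = -phi1 * phi2 * (1 + phi2)
    q[2, 2] = 1 - phi2 ** 2 - phi1 ** 2 * (1 + phi2) ** 2
    for i in range(3, h + 1):                        # recursion (9.4.7)
        for j in range(1, i + 1):
            q[i, j] = phi1 * q[i - 1, j] + phi2 * q[i - 2, j]
            q[j, i] = q[i, j]                        # Q is symmetric
    return np.array([q[i, i] for i in range(1, h + 1)])

def bounds(qii, n):
    return 1.96 * np.sqrt((1.0 - qii) / n)           # +-1.96(1 - q_ii)^{1/2} n^{-1/2}

# e.g. bounds(qii_ar1(.8, 10), 100) reproduces the inner curve of Figure 9.17.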
The Portmanteau Test. Instead of checking to see if each ρ̂_W(i) falls within the confidence bounds ±1.96(1 - q_{ii})^{1/2} n^{-1/2}, it is possible to consider instead a single statistic which depends on ρ̂_W(i), 1 ≤ i ≤ h. Throughout this discussion h is assumed to depend on the sample size n in such a way that (i) h_n → ∞ as n → ∞, and (ii) the conditions of Box and Pierce (1970) are satisfied, namely

(a) ψ_j = O(n^{-1/2}) for j ≥ h_n, where ψ_j, j = 0, 1, ..., are the coefficients in the expansion X_t = Σ_{j=0}^{∞} ψ_j Z_{t-j}, and
(b) h_n = O(n^{1/2}).

Then since h_n → ∞, the matrix Γ̃_{p+q} may be approximated by T̃′T̃, and so the matrix Q in (9.4.5) and (9.4.6) may be approximated by the projection matrix (see Remark 2 of Section 2.5)

T̃(T̃′T̃)^{-1}T̃′,

which has rank p + q. Thus if the model is appropriate, the distribution of ρ̂_W = (ρ̂_W(1), ..., ρ̂_W(h))′ is approximately N(0, n^{-1}(I_h - T̃(T̃′T̃)^{-1}T̃′)). It then follows from Problem 2.19 that the distribution of

Q_W = n ρ̂_W′ ρ̂_W = n Σ_{j=1}^{h} ρ̂_W²(j)

is approximately chi-squared with h - (p + q) degrees of freedom. The adequacy of the model is therefore rejected at level α if

Q_W > χ²_{1-α}(h - p - q).
Applying this test to the residuals from the fitted AR(3) model in Example 9.2.1 with h = 25, we obtain n Σ_{j=1}^{25} ρ̂_W²(j) = 11.995, which is less than χ²_{.95}(22) = 33.9. Thus on the basis of this test, there is no reason to doubt the adequacy of the fitted model. For the airline data in Example 9.2.2, we have n Σ_{j=1}^{25} ρ̂_W²(j) = 12.104 for the fitted moving average model with non-zero coefficients at lags 1, 3, 5, 12 and 23. Comparing this value with χ²_{.95}(25 - 5) = 31.4, we see that the residuals pass the portmanteau test. Note that the number of coefficients fitted in the model is 5. For the residuals from the AR(2) model fitted to the data of Example 9.2.4, we obtain n Σ_{j=1}^{25} ρ̂_W²(j) = 56.615, which is larger than χ²_{.95}(23) = 35.2. Hence, as observed earlier, this model is not a good fit to the data.
Ljung and Box (1978) suggest replacing the statistic Q_W in the above test procedure with

Q̃_W = n(n + 2) Σ_{j=1}^{h} ρ̂_W²(j)/(n - j).

They argue that under the hypothesis of model adequacy, the cutoff value given by χ²_{1-α}(h - p - q) is closer to the true (1 - α)-quantile of the distribution of Q̃_W than to that of Q_W. However, as pointed out by Davies, Triggs and Newbold (1977), the variance of Q̃_W may exceed that of a χ² distribution with h - p - q degrees of freedom. The values of Q̃_W with h = 25 for Examples 9.2.1 and 9.2.2 are 12.907 and 13.768, respectively. Hence the residuals pass this test of model adequacy.
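A minimal sketch of the two portmanteau statistics, computed directly from the residual sequence w of a model with p + q fitted coefficients, is given below; the helper name is illustrative.

# Box-Pierce and Ljung-Box portmanteau statistics for a residual sequence w.
import numpy as np
from scipy.stats import chi2

def portmanteau(w, h, p, q, alpha=0.05):
    w = np.asarray(w, dtype=float)
    n = len(w)
    wc = w - w.mean()
    denom = np.sum(wc ** 2)
    rho = np.array([np.sum(wc[:n - j] * wc[j:]) / denom for j in range(1, h + 1)])
    Q_bp = n * np.sum(rho ** 2)                                        # Box-Pierce
    Q_lb = n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, h + 1)))  # Ljung-Box
    cutoff = chi2.ppf(1 - alpha, h - p - q)
    return Q_bp, Q_lb, cutoff      # reject adequacy if the statistic exceeds cutoff

# For the AR(3) residuals of Example 9.2.1 with h = 25, p = 3, q = 0, the
# Box-Pierce value should be near 11.995 and the cutoff near 33.9.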
Examination of the squared residuals may often suggest departures of the data from the fitted model which could not otherwise be detected from the residuals themselves. Granger and Anderson (1978) have found examples where the residuals were uncorrelated while the squared residuals were correlated. We can test the squared residuals for correlation in the same way that we test the residuals themselves. Let

ρ̂_WW(h) = Σ_{t=1}^{n-h} (Ŵ_t² - W̄²)(Ŵ_{t+h}² - W̄²) / Σ_{t=1}^{n} (Ŵ_t² - W̄²)²,   h ≥ 1,

be the sample autocorrelation function of the squared residuals, where W̄² =
n^{-1} Σ_{t=1}^{n} Ŵ_t². Then McLeod and Li (1983) show that

Q_WW = n(n + 2) Σ_{j=1}^{h} ρ̂_WW²(j)/(n - j)

has an approximate χ²(h) distribution under the assumption of model adequacy. Consequently, the adequacy of the model is rejected at level α if Q_WW > χ²_{1-α}(h). For Examples 9.2.1 and 9.2.2 with h = 25 we obtain the values Q_WW = 26.367 and Q_WW = 16.356, respectively. Since χ²_{.95}(25) = 37.7, the squared residuals for these two examples pass this portmanteau test.
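The McLeod-Li statistic is the same Ljung-Box computation applied to the squared residuals, compared with a χ²(h) quantile; a minimal sketch follows.

# McLeod-Li test on the squared residuals.
import numpy as np
from scipy.stats import chi2

def mcleod_li(w, h, alpha=0.05):
    w2 = np.asarray(w, dtype=float) ** 2
    n = len(w2)
    c = w2 - w2.mean()
    denom = np.sum(c ** 2)
    rho = np.array([np.sum(c[:n - j] * c[j:]) / denom for j in range(1, h + 1)])
    Q_ww = n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, h + 1)))
    return Q_ww, chi2.ppf(1 - alpha, h)     # reject adequacy if Q_ww > quantile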
An advantage of portmanteau tests is that they pool information from the correlations ρ̂_W(i), i = 1, ..., h, at different lags. A distinct disadvantage, however, is that they frequently fail to reject poorly fitting models. In practice portmanteau tests are more useful for disqualifying unsatisfactory models from consideration than for selecting the best-fitting model among closely competing candidates.
Tests of Randomness. In addition to the tests based on the sample autocorrelation function of {Ŵ_t} which we have already described, there are a number of other tests available for checking the hypothesis of "randomness" of {Ŵ_t}, i.e. the hypothesis that {Ŵ_t} is an iid sequence. Three of these tests are described below. For further details and for additional tests of randomness, see Kendall and Stuart (1976).

(a) A Test Based on Turning Points. If y_1, ..., y_n is a sequence of observations, then we say that the data has a turning point at time i, 1 < i < n, if y_{i-1} < y_i and y_i > y_{i+1}, or if y_{i-1} > y_i and y_i < y_{i+1}. Define T to be the number of turning points of the sequence y_1, ..., y_n. If y_1, ..., y_n are observations of a random (iid) sequence, then the probability of a turning point at time i is 2/3. The expected number of turning points is therefore

μ_T = ET = 2(n - 2)/3.

It can also be shown that the variance is

σ_T² = Var(T) = (16n - 29)/90.

A large value of T - μ_T indicates that the series is fluctuating more rapidly than expected for a random series. On the other hand a value of T - μ_T much smaller than zero indicates a positive correlation between neighboring observations. It can be shown that for an iid sequence

T is AN(μ_T, σ_T²),

so the assumption that y_1, ..., y_n are observations from a random sequence is rejected if |T - μ_T|/σ_T > Φ_{1-α/2}, where Φ_{1-α/2} is the 1 - α/2 percentage point of a standard normal distribution. The values of T for the residuals in
Examples 9.2.1-9.2.3 are displayed in Table 9.9. Inspecting the |T - μ_T|/σ_T column of the table we see that the three sets of residuals safely pass this test of randomness.
(b) The Difference-Sign Test. For this test we count the number of values of i such that y_i > y_{i-1}, i = 2, ..., n, or equivalently the number of times the differenced series y_i - y_{i-1} is positive. If we denote this number by S, it is clear that under the random sequence assumption,

μ_S = ES = ½(n - 1).

It can also be shown, under the same assumption, that

σ_S² = Var(S) = (n + 1)/12,

and that

S is AN(μ_S, σ_S²).

A large positive (or negative) value of S - μ_S indicates the presence of an increasing (or decreasing) trend in the data. We therefore reject the assumption of no trend in the data if |S - μ_S|/σ_S > Φ_{1-α/2}. Table 9.9 contains the results of this test applied to the residuals of Examples 9.2.1-9.2.3. In all three cases, the residuals easily pass this test of randomness.
The difference-sign test as a test of randomness must be used with caution. A set of observations exhibiting a strong cyclic component will pass the difference-sign test for randomness since roughly half of the observations will be points of increase.
(c) The Rank Test. The rank test is particularly useful for detecting a linear trend in the data. Define P to be the number of pairs (i, j) such that y_j > y_i and j > i, i = 1, ..., n - 1. There is a total of (n choose 2) = ½n(n - 1) pairs (i, j) such that j > i, and for each pair the event {y_j > y_i} has probability ½ if {y_j} is a random sequence. The mean of P is therefore μ_P = ¼n(n - 1). It can also be shown that the variance of P is σ_P² = n(n - 1)(2n + 5)/8 and that P is AN(μ_P, σ_P²) (see Kendall and Stuart, 1976). A large positive (negative) value of P - μ_P indicates the presence of an increasing (decreasing) trend in the data. The assumption of randomness of {y_j} is therefore rejected at level α if |P - μ_P|/σ_P > Φ_{1-α/2}. From Table 9.9 we see that the residuals from Examples 9.2.1-9.2.3 easily pass this test of randomness.
Table 9.9. Tests of Randomness Applied to Residuals in Examples 9.2.1-9.2.3

                  T    μ_T   |T-μ_T|/σ_T    S    μ_S   |S-μ_S|/σ_S     P     |P-μ_P|/σ_P
Example 9.2.1   132    132        0         99   99.5      .12       10465      .36
Example 9.2.2    87     86       .21        65   65         0         3929      .44
Example 9.2.3   131    132       .10       104   99.5     1.10       10086      .10
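The three randomness tests can be coded in a few lines. The sketch below follows the formulas of this section (including the rank-test variance as printed in the text); the function name is illustrative.

# Turning-point, difference-sign and rank tests for a residual sequence y.
import numpy as np
from scipy.stats import norm

def randomness_tests(y, alpha=0.05):
    y = np.asarray(y, dtype=float)
    n = len(y)
    z = norm.ppf(1 - alpha / 2)

    # (a) turning points: local maxima and minima of the sequence
    T = np.sum((y[1:-1] - y[:-2]) * (y[1:-1] - y[2:]) > 0)
    muT, sdT = 2 * (n - 2) / 3, np.sqrt((16 * n - 29) / 90)

    # (b) difference-sign: number of increases
    S = np.sum(np.diff(y) > 0)
    muS, sdS = (n - 1) / 2, np.sqrt((n + 1) / 12)

    # (c) rank test: pairs (i, j), j > i, with y_j > y_i
    P = sum(np.sum(y[j] > y[:j]) for j in range(1, n))
    muP, sdP = n * (n - 1) / 4, np.sqrt(n * (n - 1) * (2 * n + 5) / 8)

    return {name: (stat, mu, abs(stat - mu) / sd, abs(stat - mu) / sd > z)
            for name, stat, mu, sd in
            [("T", T, muT, sdT), ("S", S, muS, sdS), ("P", P, muP, sdP)]}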
Checking for Normality. If it can be assumed that the white noise process {Z_t} generating an ARMA(p, q) process is Gaussian, then stronger conclusions can be drawn from the fitted model. For example, not only is it then possible to specify an estimated mean squared error for predicted values, but asymptotic prediction confidence bounds can also be computed (see Section 5.4). We now consider a test of the hypothesis that {Z_t} is Gaussian.
Let Y_(1) < Y_(2) < ··· < Y_(n) be the order statistics of a random sample Y_1, ..., Y_n from the distribution N(μ, σ²). If X_(1) < X_(2) < ··· < X_(n) are the order statistics from a N(0, 1) sample of size n, then

E Y_(j) = μ + σ m_j,

where m_j = E X_(j), j = 1, ..., n. Thus a plot of the points (m_1, Y_(1)), ..., (m_n, Y_(n)) should be approximately linear. However if the sample values Y_i are not normally distributed, then the plot should be non-linear. Consequently, the squared correlation of the points (m_i, Y_(i)), i = 1, ..., n, should be near one if the normal assumption is correct. The assumption of normality is therefore rejected if the squared correlation R² is sufficiently small. If we approximate m_i by Φ^{-1}((i - .5)/n) (see Mage (1982) for some alternative approximations), then R² reduces to

R² = [Σ_{i=1}^{n} (Y_(i) - Ȳ) m_i]² / [Σ_{i=1}^{n} (Y_(i) - Ȳ)² Σ_{i=1}^{n} m_i²],

where Ȳ = n^{-1}(Y_1 + ··· + Y_n). Percentage points for the distribution of R², assuming normality of the sample values, are given by Shapiro and Francia (1972) for sample sizes n < 100. For n = 200, P(R² < .987) = .05 and P(R² < .989) = .10; for n = 131, the corresponding quantiles are .980 and .983.
In Figure 9.19 we have plotted (Φ^{-1}((i - .5)/n), Ŵ_(i)), i = 1, ..., n, for the three sets of residuals obtained in Examples 9.2.1-9.2.3. The respective R² values are .992, .984 and .990. Based on the graphs and the R² values, the hypothesis that the residuals {Ŵ_t}, and hence {Z_t}, are normally distributed is not rejected, even at level .10.
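A sketch of the normal-scores correlation check follows: R² is the squared correlation between the ordered residuals and m_i = Φ^{-1}((i - .5)/n), to be compared with the Shapiro-Francia quantiles quoted above.

# Squared correlation of the normal-scores (Q-Q) plot.
import numpy as np
from scipy.stats import norm

def qq_rsquared(w):
    w = np.sort(np.asarray(w, dtype=float))
    n = len(w)
    m = norm.ppf((np.arange(1, n + 1) - 0.5) / n)   # approximate E X_(i); mean 0 by symmetry
    wc = w - w.mean()
    r = np.sum(wc * m) / np.sqrt(np.sum(wc ** 2) * np.sum(m ** 2))
    return r ** 2

# For n = 200 residuals, reject normality (roughly at level .05) if R^2 < .987.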
§9.5 Forecasting ARIMA Models

In this section we demonstrate how the methods of Section 5.3 can be adapted to forecast the future values of an ARIMA(p, d, q) process {X_t}. (The required numerical calculations can be carried out using the program PEST.) If d ≥ 1, the first and second moments EX_t and E(X_{t+h}X_t) are not determined by the difference equations (9.1.1). We cannot expect therefore to determine best linear predictors for {X_t} without further assumptions.
[Figure 9.19. Scatter plots of the points (Φ^{-1}((i - .5)/n), Ŵ_(i)), i = 1, ..., n, for (a) Example 9.2.1, (b) Example 9.2.2 and (c) Example 9.2.3.]
For example, suppose that {Y_t} is a causal ARMA(p, q) process and that X_0 is any random variable. Define

X_t = X_0 + Σ_{j=1}^{t} Y_j,   t = 1, 2, ....

Then {X_t, t ≥ 0} is an ARIMA(p, 1, q) process with mean EX_t = EX_0 and autocovariances E(X_{t+h}X_t) - (EX_0)² depending on Var(X_0) and Cov(X_0, Y_j), j = 1, 2, .... The best linear predictor of X_{n+1} based on X_0, X_1, ..., X_n is the projection P_{S_n} X_{n+1}, where

S_n = sp{X_0, X_1, ..., X_n} = sp{X_0, Y_1, ..., Y_n}.

Thus

P_{S_n} X_{n+1} = P_{S_n}[X_0 + Y_1 + ··· + Y_n + Y_{n+1}] = X_n + P_{S_n} Y_{n+1}.

To evaluate this projection it is necessary in general to know E(X_0 Y_j), j = 1, ..., n + 1, and EX_0². However if we assume that X_0 is uncorrelated with Y_j, j = 1, 2, ..., then P_{S_n} Y_{n+1} is simply the projection of Y_{n+1} onto sp{Y_1, ..., Y_n}, which can be computed as described in Section 5.3. The assumption that X_0 is uncorrelated with Y_1, Y_2, ..., therefore suffices to determine the best linear predictor P_{S_n} X_{n+1} in this case.
Turning now to the general case, we shall assume that our observed process {X_t} satisfies the difference equations

(1 - B)^d X_t = Y_t,   t = 1, 2, ...,
where {Y_t} is a causal ARMA(p, q) process, and that the vector (X_{1-d}, ..., X_0) is uncorrelated with Y_t, t > 0. The difference equations can be rewritten in the form

X_t = Y_t - Σ_{j=1}^{d} (d choose j)(-1)^j X_{t-j},   t = 1, 2, ....   (9.5.1)

It is convenient, by relabelling the time axis if necessary, to assume that we observe X_{1-d}, X_{2-d}, ..., X_n. (The observed values of {Y_t} are then Y_1, ..., Y_n.) Our goal is to compute the best linear predictor of X_{n+h} based on X_{1-d}, ..., X_n, i.e.

P_{S_n} X_{n+h} := P_{sp{X_{1-d}, ..., X_n}} X_{n+h}.

In the notation of Section 5.2 we shall write

P_n Y_{n+h} := P_{sp{Y_1, ..., Y_n}} Y_{n+h}

and

Ŷ_{n+1} := P_n Y_{n+1}.

Since

S_n = sp{X_{1-d}, ..., X_0, Y_1, ..., Y_n},

and since by assumption

E(X_j Y_t) = 0,   1 - d ≤ j ≤ 0,   t ≥ 1,

we have

P_{S_n} Y_{n+h} = P_n Y_{n+h}.   (9.5.2)

Hence if we apply the operator P_{S_n} to both sides of (9.5.1) with t = n + h, we obtain

P_{S_n} X_{n+h} = P_n Y_{n+h} - Σ_{j=1}^{d} (d choose j)(-1)^j P_{S_n} X_{n+h-j}.   (9.5.3)

Since the predictors P_n Y_{n+1}, P_n Y_{n+2}, ..., can be found from (5.3.16), the predictors P_{S_n} X_{n+1}, P_{S_n} X_{n+2}, ..., are then easily computed recursively from (9.5.3).
In order to find the mean squared error of prediction it is convenient to express P_n Y_{n+h} in terms of {X_j}. For t ≥ 0 define

X*_{t+1} := P_{S_t} X_{t+1}.

Then from (9.5.1) and (9.5.3) with n = t we have

X_{t+1} - X*_{t+1} = Y_{t+1} - Ŷ_{t+1},   t ≥ 0,

and consequently for n > m = max(p, q) and h ≥ 1,

P_n Y_{n+h} = Σ_{i=1}^{p} φ_i P_n Y_{n+h-i} + Σ_{j=h}^{q} θ_{n+h-1, j}(X_{n+h-j} - X*_{n+h-j}).   (9.5.4)
Setting φ*(z) = (1 - z)^d φ(z) = 1 - φ*_1 z - ··· - φ*_{p+d} z^{p+d}, we find from (9.5.2), (9.5.3) and (9.5.4) that for n > m and h ≥ 1,

P_{S_n} X_{n+h} = Σ_{j=1}^{p+d} φ*_j P_{S_n} X_{n+h-j} + Σ_{j=h}^{q} θ_{n+h-1, j}(X_{n+h-j} - X*_{n+h-j}),   (9.5.5)

which is analogous to the h-step prediction formula (5.3.16) for an ARMA process. The same argument which led to (5.3.22) shows that the mean squared error of the h-step predictor is (Problem 9.9)

σ_n²(h) = E(X_{n+h} - P_{S_n} X_{n+h})² = Σ_{j=0}^{h-1} ( Σ_{r=0}^{j} χ_r θ_{n+h-r-1, j-r} )² v_{n+h-j-1},   (9.5.6)

where θ_{n0} = 1,

χ(z) = Σ_{r=0}^{∞} χ_r z^r = (1 - φ*_1 z - ··· - φ*_{p+d} z^{p+d})^{-1},   |z| < 1,

and

v_{n+h-j-1} = E(X_{n+h-j} - X*_{n+h-j})².

The coefficients χ_j can be found from the recursions (5.3.21) with φ*_j replacing φ_j. For large n we can approximate (9.5.6), provided θ(·) is invertible, by

σ_n²(h) = Σ_{j=0}^{h-1} ψ_j² σ²,   (9.5.7)

where

ψ(z) = Σ_{j=0}^{∞} ψ_j z^j = (φ*(z))^{-1} θ(z),   |z| < 1.
EXAMPLE 9.5.1. Consider the ARIMA(1, 2, 1) model,

(1 - φB)(1 - B)² X_t = (1 + θB)Z_t,   t = 1, 2, ...,

where (X_{-1}, X_0) is assumed to be uncorrelated with the ARMA(1, 1) process Y_t = (1 - B)² X_t, t = 1, 2, .... From (5.3.12) we have

P_n Y_{n+1} = φY_n + θ_{n1}(Y_n - Ŷ_n)

and

P_n Y_{n+h} = φ P_n Y_{n+h-1},   h > 1.

Since in this case φ*(z) = (1 - z)²(1 - φz) = 1 - (φ + 2)z + (2φ + 1)z² - φz³, we find from (9.5.5) that

P_{S_n} X_{n+1} = (φ + 2)X_n - (2φ + 1)X_{n-1} + φX_{n-2} + θ_{n1}(Y_n - Ŷ_n),
P_{S_n} X_{n+h} = (φ + 2)P_{S_n} X_{n+h-1} - (2φ + 1)P_{S_n} X_{n+h-2} + φP_{S_n} X_{n+h-3}   for h > 1.   (9.5.8)
If for the moment we regard n as fixed and define the sequence {g(h)} by

g(h) = P_{S_n} X_{n+h},

then {g(h)} satisfies the difference equations

φ*(B)g(h) = g(h) - (φ + 2)g(h - 1) + (2φ + 1)g(h - 2) - φg(h - 3) = 0,   h > 1,   (9.5.9)

with initial conditions

g(h) = P_{S_n} X_{n+h},   h = 1, 0, -1.   (9.5.10)
Using the results of Section 3.6, we can write the solution of the difference equation (9.5.9) in the form

g(h) = a_0 + a_1 h + a_2 φ^h,

where a_0, a_1 and a_2 are determined by the initial conditions (9.5.10).
Table 9.10 shows the results of predicting the values X_199, X_200 and X_201 of an ARIMA(1, 2, 1) process with φ = .9, θ = .8 and σ² = 1, based on 200 observations {X_{-1}, X_0, ..., X_198}. By running the program PEST to compute the likelihood of the observations Y_t = (1 - B)²X_t, t = 1, ..., 198, under the model

(1 - .9B)Y_t = (1 + .8B)Z_t,   {Z_t} ~ WN(0, 1),

we find that Y_198 - Ŷ_198 = -1.953, θ_{197,1} = .800 and v_197 = 1.000. Since θ_{197,1} = lim_{n→∞} θ_{n1} and v_197 = lim_{n→∞} v_n to three decimal places, we use the large-sample approximation (9.5.7) to compute σ²_198(h). Thus

σ²_198(h) = Σ_{j=0}^{h-1} ψ_j² σ² = Σ_{j=0}^{h-1} ψ_j²,

where

ψ(z) = θ(z)/φ*(z) = (1 + .8z)(1 - 2.9z + 2.8z² - .9z³)^{-1} = 1 + 3.7z + 7.93z² + 13.537z³ + ···,   |z| < 1.

Since X_196 = -22195.57, X_197 = -22335.07, X_198 = -22474.41 and

X_198 - X*_198 = Y_198 - Ŷ_198 = -1.95,

equation (9.5.8) gives

P_{S_198} X_199 = 2.9X_198 - 2.8X_197 + .9X_196 + .8(X_198 - X*_198) = -22615.17.
Table 9.10. Predicted Values Based on 200 Observations {X_{-1}, X_0, ..., X_198} of the ARIMA(1, 2, 1) Process in Example 9.5.1 (the Standard Deviation of the Prediction Error Is Also Shown)

h                       -1            0            1            2            3
P_{S_198} X_{198+h}  -22335.07   -22474.41   -22615.17   -22757.21   -22900.41
σ_198(h)                  0            0          1.00         3.83         8.81
These predicted values and their mean squared errors can be found from PEST. The coefficients a_0, a_1 and a_2 in the function

g(h) = P_{S_198} X_{198+h} = a_0 + a_1 h + a_2(.9)^h,   h ≥ -1,

can now be determined from the initial conditions (9.5.10) with n = 198. These give g(h) = -22346.61 - 153.54h - 127.8(.9)^h. Predicted values P_{S_198} X_{198+h} for any positive h can be computed directly from g(h).
More generally, for an arbitrary ARIMA(p, d, q) process, the function defined by

g(h) = P_{S_n} X_{n+h}

satisfies the (p + d)th-order difference equation

φ*(B)g(h) = 0   for h > q,

with initial conditions

g(h) = P_{S_n} X_{n+h},   h = q, q - 1, ..., q + 1 - p - d.

The solution g(h) can be expressed for d ≥ 1 as a polynomial of degree (d - 1) plus a linear combination of geometrically decreasing terms corresponding to the reciprocals of the roots of φ(z) = 0 (see Section 3.6). The presence of the polynomial term for d ≥ 1 distinguishes the forecasts of an ARIMA process from those of a stationary ARMA process.
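The recursion (9.5.3) is easy to implement once h-step forecasts of the differenced ARMA series are available. In the sketch below those forecasts are taken from statsmodels (which could equally well fit the ARIMA directly); the point of the code is to show how predictors of {X_t} are rebuilt from predictors of {Y_t} and the binomial expansion of (1 - B)^d. The function name is illustrative.

# h-step ARIMA(p, d, q) prediction via the recursion (9.5.3).
import numpy as np
from scipy.special import comb
from statsmodels.tsa.arima.model import ARIMA

def arima_forecast(x, p, d, q, h):
    x = np.asarray(x, dtype=float)
    y = np.diff(x, n=d)                               # Y_t = (1 - B)^d X_t
    y_fc = ARIMA(y, order=(p, 0, q), trend="n").fit().forecast(h)
    xx = list(x)
    for k in range(1, h + 1):
        # P_{S_n}X_{n+k} = P_n Y_{n+k} - sum_j C(d,j)(-1)^j P_{S_n}X_{n+k-j}
        pred = y_fc[k - 1] - sum(comb(d, j) * (-1) ** j * xx[-j]
                                 for j in range(1, d + 1))
        xx.append(pred)
    return np.array(xx[len(x):])                      # P_{S_n} X_{n+1}, ..., X_{n+h}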
§9.6 Seasonal ARIMA Models
Seasonal series are characterized by a strong serial correlation at the seasonal lag (and possibly multiples thereof). For example, the correlation function in Figure 9.4 strongly suggests a seasonal series with six seasons. In Section 1.4, we discussed the classical decomposition of the time series X_t = m_t + s_t + Y_t, where m_t is the trend component, s_t is the seasonal component, and Y_t is the random noise component. However in practice it may not be reasonable to assume that the seasonal component repeats itself precisely in the same way cycle after cycle. Seasonal ARIMA models allow for randomness in the seasonal pattern from one cycle to the next.
Suppose we have r years of monthly data which we tabulate as follows:

Year\Month        1                2               ...        12
  1              X_1              X_2              ...       X_12
  2              X_13             X_14             ...       X_24
  3              X_25             X_26             ...       X_36
  ⋮
  r              X_{1+12(r-1)}    X_{2+12(r-1)}    ...       X_{12+12(r-1)}

Each column in this table may itself be viewed as a realization of a time series. Suppose that each one of these twelve time series is generated by the same ARMA(P, Q) model, or more specifically that the series corresponding to the jth month, X_{j+12t}, t = 0, ..., r - 1, satisfies a difference equation of the form

X_{j+12t} = Φ_1 X_{j+12(t-1)} + ··· + Φ_P X_{j+12(t-P)} + U_{j+12t}
            + Θ_1 U_{j+12(t-1)} + ··· + Θ_Q U_{j+12(t-Q)},   (9.6.1)

where

{U_{j+12t}, t = ..., -1, 0, 1, ...} ~ WN(0, σ_U²).   (9.6.2)
Then since the same ARMA(P, Q) model is assumed to apply to each month, (9.6.1) can be rewritten for all t as

X_t = Φ_1 X_{t-12} + ··· + Φ_P X_{t-12P} + U_t + Θ_1 U_{t-12} + ··· + Θ_Q U_{t-12Q},

where (9.6.2) holds for each j = 1, ..., 12. (Notice however that E(U_t U_{t+h}) is not necessarily zero except when h is an integer multiple of 12.) We can thus write (9.6.1) in the compact form

Φ(B^{12}) X_t = Θ(B^{12}) U_t,   (9.6.3)

where Φ(z) = 1 - Φ_1 z - ··· - Φ_P z^P, Θ(z) = 1 + Θ_1 z + ··· + Θ_Q z^Q, and {U_{j+12t}, t = ..., -1, 0, 1, ...} ~ WN(0, σ_U²) for each j. We refer to the model (9.6.3) as the between-year model.
EXAMPLE 9.6.1. Suppose P = 0, Q = 1 and Θ_1 = -.4. Then the series of observations for any particular month is a moving average of order 1. If E(U_t U_{t+h}) = 0 for all h, i.e. if the white noise sequences for different months are uncorrelated with each other, then the columns themselves are uncorrelated. The correlation function for such a process is displayed in Figure 9.20.

EXAMPLE 9.6.2. Suppose P = 1, Q = 0 and Φ_1 = .7. In this case the 12 series (one for each month) are AR(1) processes which are uncorrelated if the white noise sequences for different months are uncorrelated. A graph of the correlation function of this process is shown in Figure 9.20.
[Figure 9.20. The autocorrelation functions of {X_t} when (a) X_t = U_t - .4U_{t-12} and (b) X_t - .7X_{t-12} = U_t (see Examples 9.6.1 and 9.6.2).]
It is unlikely that the 12 series corresponding to the different months are uncorrelated as in Examples 9.6.1 and 9.6.2. To incorporate dependence between these series we assume now that the {U_t} sequence in (9.6.3) follows an ARMA(p, q) model,

φ(B)U_t = θ(B)Z_t,   {Z_t} ~ WN(0, σ²).   (9.6.4)

This assumption not only implies possible non-zero correlation between consecutive values of U_t, but also within the twelve sequences {U_{j+12t}, t = ..., -1, 0, 1, ...}, each of which was previously assumed to be uncorrelated. In this case (9.6.2) may no longer hold; however the coefficients in (9.6.4) will frequently have values such that E(U_t U_{t+12j}) is small for j = ±1, ±2, .... Combining the two models (9.6.3) and (9.6.4), and allowing for differencing, leads us to the definition of the general seasonal multiplicative ARIMA process.
Definition 9.6.1 (The SARIMA(p, d, q) × (P, D, Q)_s Process). If d and D are non-negative integers, then {X_t} is said to be a seasonal ARIMA(p, d, q) × (P, D, Q)_s process with period s if the differenced process Y_t := (1 - B)^d(1 - B^s)^D X_t is a causal ARMA process,

φ(B)Φ(B^s)Y_t = θ(B)Θ(B^s)Z_t,

where φ(z) = 1 - φ_1 z - ··· - φ_p z^p, Φ(z) = 1 - Φ_1 z - ··· - Φ_P z^P, θ(z) = 1 + θ_1 z + ··· + θ_q z^q and Θ(z) = 1 + Θ_1 z + ··· + Θ_Q z^Q.

Note that the process {Y_t} is causal if and only if φ(z) ≠ 0 and Φ(z) ≠ 0 for |z| ≤ 1. In applications, D is rarely more than one and P and Q are typically less than three.
Because of the interaction between the two models describing the between-year and the between-season dependence structure, the covariance function for a SARIMA process can be quite complicated. Here we provide general guidelines for identifying SARIMA models from the sample correlation function of the data. First, we find d and D so as to make the differenced observations Y_t = (1 - B)^d(1 - B^s)^D X_t stationary in appearance (see Section 9.2). Next, we examine the sample autocorrelation and partial autocorrelation functions of {Y_t} at lags which are multiples of s in order to identify the orders P and Q in the model (9.6.3). If ρ̂(·) is the sample autocorrelation function of {Y_t}, then P and Q should be chosen so that ρ̂(ks), k = 1, 2, ..., is compatible with the autocorrelation function of an ARMA(P, Q) process. The orders p and q are then selected by attempting to match ρ̂(1), ..., ρ̂(s - 1) with the autocorrelation function of an ARMA(p, q) process. Ultimately the AICC criterion (Section 9.3) and the goodness of fit tests (Section 9.4) are used to identify the best SARIMA model among competing alternatives.
For given values of p, d, q, P, D and Q, the parameters φ, θ, Φ, Θ and σ² can be found using the maximum likelihood procedure of Section 8.7. The differences Y_t = (1 - B)^d(1 - B^s)^D X_t constitute an ARMA(p + sP, q + sQ) process in which some of the coefficients are zero and the rest are functions of the (p + P + q + Q)-dimensional vector β′ = (φ′, Φ′, θ′, Θ′). For any fixed β the reduced likelihood l(β) of the differences Y_{1+d+sD}, ..., Y_n is easily computed as described in Section 8.7. The maximum likelihood estimate of β is the value which minimizes l(β), and the maximum likelihood estimate of σ² is given by (8.7.5). The estimates can be found using the program PEST by specifying the required multiplicative relationships between the coefficients.
The forecasting methods described in Section 9.5 for ARIMA processes can also be applied to seasonal models. We first predict future values of the ARMA process {Y_t} using (5.3.16) and then expand the operator (1 - B)^d(1 - B^s)^D to derive the analogue of equation (9.5.3) which determines the best predictors recursively. The large sample approximation to the h-step mean squared error for prediction of {X_t} is σ_n²(h) = σ² Σ_{j=0}^{h-1} ψ_j², where σ² is the white noise variance and Σ_{j=0}^{∞} ψ_j z^j = θ(z)Θ(z^s)/[φ(z)Φ(z^s)(1 - z)^d(1 - z^s)^D], |z| < 1. Invertibility is required for the validity of this approximation.
The goodness of fit of a SARIMA model can be assessed by applying the same techniques and tests described in Section 9.4 to the residuals of the fitted model. In the following example we fit a SARIMA model to the series {X_t} of monthly accidental deaths in the U.S.A. (Example 1.1.6).
EXAMPLE 9.6.3. The accidental death series X_1, ..., X_72 is plotted in Figure 1.6. Application of the operator (1 - B)(1 - B^{12}) generates a new series {Y_t} with no apparent deviations from stationarity, as seen in Figure 1.17. The sample autocorrelation function ρ̂(·) of {Y_t} is displayed in Figure 9.21. The values ρ̂(12) = -.333, ρ̂(24) = -.099 and ρ̂(36) = .013 suggest a moving average of order 1 for the between-year model (i.e. P = 0, Q = 1). Moreover, inspection of ρ̂(1), ..., ρ̂(11) suggests that ρ̂(1) is the only short-term correlation different from zero, so we also choose a moving average of order 1 for the between-month model (i.e. p = 0, q = 1). Taking into account the sample mean (28.831) of the differences Y_t = (1 - B)(1 - B^{12})X_t, we therefore arrive at the model

Y_t = 28.831 + (1 + θ_1 B)(1 + Θ_1 B^{12})Z_t,   {Z_t} ~ WN(0, σ²),

for the series {Y_t}. The maximum likelihood estimates of the parameters are

θ̂_1 = -.479,   Θ̂_1 = -.591   and   σ̂² = 94240,

with AICC value 855.53. The fitted model for {X_t} is thus the
SARIMA(0, 1, 1) × (0, 1, 1)_12 process

(1 - B)(1 - B^{12})X_t = Y_t = 28.831 + (1 - .479B)(1 - .591B^{12})Z_t,   (9.6.5)

where {Z_t} ~ WN(0, 94240).

[Figure 9.21. The sample ACF of the differenced accidental deaths {∇∇_12 X_t}. ρ̂(1)-ρ̂(12): -.36, -.10, .10, -.11, .04, .11, -.20, -.01, .10, -.08, .20, -.33.]
Notice that the model (9.6.5) has a slightly more general form than in Definition 9.6.1 owing to the presence of the constant term, 28.831. Predicted values and their mean squared errors can however still be computed as described in Section 9.5 with minor modifications. Thus predicted values of {Y_t} are obtained by adding 28.831 to the corresponding predicted values of the ARMA process {Y_t - 28.831}. From (9.6.5) it is easy to write out the analogue of (9.5.1), from which the predicted values of {X_t} are then found recursively as in Section 9.5. The mean-squared errors are given as before by (9.5.6), i.e. by ignoring the constant term in (9.6.5). Thus for large n the mean-squared h-step prediction error is approximately (from (9.5.7))

σ_n²(h) = σ² Σ_{j=0}^{h-1} ψ_j²,   where σ² = 94240.
Table 9.11. Predicted Values of the Accidental Deaths Series for t = 73, ..., 78, the Standard Deviations σ_t of the Prediction Errors, and the Observed Values X_t

t                            73      74      75      76      77      78
Model (9.6.5) predictors    8441    7706    8550    8886    9844   10281
              σ_t            307     346     381     414     443     471
Model (9.6.6) predictors    8347    7620    8358    8743    9796   10180
              σ_t            287     322     358     394     432     475
Observed values X_t         7798    7406    8363    8460    9217    9316
Instead of fitting a SARIMA model to the series {X_t}, we could look for the best-fitting moving average model for {∇∇_12 X_t} as we did in Example 9.2.2. This procedure leads to the model

∇∇_12 X_t = Y_t = 28.831 + Z_t - .596Z_{t-1} - .405Z_{t-6} - .685Z_{t-12} + .458Z_{t-13},   (9.6.6)

where {Z_t} ~ WN(0, 71370). The residuals for the models (9.6.5) and (9.6.6) both pass the goodness of fit tests in Section 9.4. The AICC value for (9.6.6) is 855.61, virtually the same as for (9.6.5).
The program PEST can be used to compute the best h-step linear predictors and their mean squared errors for any ARIMA (or SARIMA) process. The asymptotic form (9.5.7) of the mean squared error (with σ² replaced by v_n) is used if the model is invertible. If not, then PEST computes the mean squared errors by converting the model to an invertible one. In Table 9.11 we show the predictors of the accidental deaths for the first six months of 1979 together with the standard deviations of the prediction errors and the observed numbers of accidental deaths for the same period. Both of the models (9.6.5) and (9.6.6) are illustrated in the table. The second of these is not invertible.
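A hedged sketch of fitting a SARIMA(0, 1, 1) × (0, 1, 1)_12 model to the accidental-deaths series with statsmodels follows, assuming the 72 observations are in an array read from a hypothetical file. The sketch omits the constant 28.831 that (9.6.5) adds to the differenced series, so its coefficient estimates will differ slightly from those quoted in the text.

# Fit a SARIMA(0,1,1)x(0,1,1)_12 model to the monthly accidental deaths.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

deaths = np.loadtxt("accidental_deaths.dat")    # hypothetical data file, X_1,...,X_72

fit = ARIMA(deaths, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12)).fit()
print(fit.params)        # nonseasonal MA(1), seasonal MA(1) coefficients, sigma^2
print(fit.forecast(6))   # predictors of X_73, ..., X_78, to compare with Table 9.11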
Problems
9.1. Suppose that {X_t} is an ARIMA(p, d, q) process satisfying the difference equations

φ(B)(1 - B)^d X_t = θ(B)Z_t.

Show that these difference equations are also satisfied by the process W_t = X_t + A_0 + A_1 t + ··· + A_{d-1} t^{d-1}, where A_0, ..., A_{d-1} are arbitrary random variables.
9.2. The model fitted to a data set x_1, ..., x_100 is

X_t + .4X_{t-1} = Z_t,   {Z_t} ~ WN(0, 1).

The sample acf and pacf of the residuals are shown in the accompanying table. Are these values compatible with whiteness of the residuals? If not, suggest a better model for {X_t}, giving estimates of the coefficients.

Lag     1     2     3     4     5     6     7     8     9    10    11    12
ACF   .799  .412  .025 -.228 -.316 -.287 -.198 -.111 -.056 -.009  .048  .133
PACF  .799 -.625 -.044  .038 -.020 -.077 -.007 -.061 -.042  .089  .052  .125
9.3. Suppose {X_t} is an MA(2) process, X_t = Z_t + θ_1 Z_{t-1} + θ_2 Z_{t-2}, {Z_t} ~ WN(0, σ²). If the AR(1) process (1 - φB)X_t = Y_t is mistakenly fitted to {X_t}, determine the autocovariance function of {Y_t}.
9.4. The following table shows the sample acf and pacf of the series Y_t = ∇X_t, t = 1, ..., 399, with Σ_{t=1}^{399} Y_t = 0 and γ̂(0) = 8.25.
(a) Specify a suitable ARMA model for {Y_t}, giving estimates of all the parameters. Explain your choice of model.
(b) Given that X_395 = 102.6, X_396 = 105.3, X_397 = 108.2, X_398 = 110.5 and X_399 = 113.9, use your model to find the best mean square estimates of X_400 and X_401 and estimate the mean squared errors of your predictors.

Lag     1     2     3     4     5     6     7     8     9    10
ACF   .808  .654  .538  .418  .298  .210  .115  .031  .007 -.010
PACF  .808  .006  .023 -.068 -.080  .003 -.083 -.045  .091  .003

Lag    11    12    13    14    15    16    17    18    19    20
ACF  -.031 -.069 -.096 -.111 -.126 -.115 -.116 -.116 -.105 -.083
PACF -.016 -.091 -.034  .001 -.034  .051 -.031 -.005  .008  .001
9.5. Consider the process Y_t = αt + β + Z_t, t = 0, 1, 2, ..., {Z_t} ~ IID(0, σ²), where α and β are known. Observed values of Y_0, ..., Y_n are available. Let W_t = Y_t - Y_{t-1} - α, t = 1, 2, ....
(a) Find the mean squared error of the best linear predictor Ŵ_{n+1} = P_{sp{W_1, ..., W_n}} W_{n+1} = θ_{n1}(W_n - Ŵ_n) (use Problem 5.13).
(b) Find the mean squared error of the predictor of Y_{n+h} given by Y_n + Ŵ_{n+1} + hα, h = 1, 2, ....
(c) Compare the mean squared error computed in (b) with that of the best predictor E(Y_{n+h} | Y_0, ..., Y_n).
(d)* Compute the mean squared error of the predictor in (c) when α and β are replaced by the least squares estimators α̂ and β̂ found from Y_0, ..., Y_n.
9.6. Series A (Appendix A) consists of the lake levels in feet (reduced by 570) of Lake Huron for July of each year from 1875 through 1972. In the class of ARIMA models, choose the model which you believe fits the data best. Your analysis should include:
(i) a logical explanation of the steps taken to find the chosen model,
(ii) approximate 95% confidence bounds for the components of φ and θ,
(iii) an examination of the residuals to check for whiteness as described in Section 9.4.
9.7. The following observations are the values X_0, ..., X_9 of an ARIMA(0, 1, 2) process, ∇X_t = Z_t - 1.1Z_{t-1} + .28Z_{t-2}, {Z_t} ~ WN(0, 1): 2.83, 2.16, .85, -1.04, .35, -.90, .10, -.89, -1.57, -.42.
(a) Find an explicit formula for the function g(h) = P_{S_9} X_{9+h}, h ≥ 0.
(b) Compute σ_9²(h) for h = 1, ..., 5.
9.8. Let {X_t} be the ARIMA(2, 1, 0) process

(1 - .8B + .25B²)∇X_t = Z_t,   {Z_t} ~ WN(0, 1).

Determine the function g(h) = P_{S_n} X_{n+h} for h > 0. Assuming that n is large, compute σ_n²(h) for h = 1, ..., 5.
9.9. Verify equation (9.5.6).
9.10. Let {X_t} be the seasonal process

(1 - .7B²)X_t = (1 + .3B²)Z_t,   {Z_t} ~ WN(0, 1).

(a) Find the coefficients {ψ_j} in the representation X_t = Σ_{j=0}^{∞} ψ_j Z_{t-j}.
(b) Find the coefficients {π_j} in the representation Z_t = Σ_{j=0}^{∞} π_j X_{t-j}.
(c) Graph the autocorrelation function of {X_t}.
(d) Find an expression for P_10 X_11 and P_10 X_12 in terms of X_1, ..., X_10 and the innovations X_t - X̂_t, t = 1, ..., 10.
(e) Find an explicit expression for g(h) = P_10 X_{10+h}, h ≥ 1, in terms of g(1) and g(2).
9.11. Let {X_t} be the seasonal process

X_t = (1 + .2B)(1 - .8B^{12})Z_t.

(a) Determine the coefficients {π_j} in the representation Z_t = Σ_{j=0}^{∞} π_j X_{t-j}.
(b) Graph the autocorrelation function of {X_t}.
9.12. Monthly observations {D_t, -11 ≤ t ≤ n} are deseasonalized by differencing at lag 12. The resulting differences X_t = D_t - D_{t-12}, t = 1, ..., n, are then found to be well fitted by the ARMA model

X_t - 1.3X_{t-1} + .5X_{t-2} = Z_t + .5Z_{t-1},   {Z_t} ~ WN(0, 3.85).

Assume in the following questions that n is large and {D_t, -11 ≤ t ≤ 0} is uncorrelated with {X_t, t ≥ 1}; P_n denotes projection onto sp{X_t, 1 ≤ t ≤ n} and P_{S_n} denotes projection onto sp{D_t, -11 ≤ t ≤ n}.
(a) Express P_n X_{n+1} and P_n X_{n+2} in terms of X_1, ..., X_n and the innovations (X_j - P_{j-1}X_j), j = 1, ..., n.
(b) Express P_{S_n} D_{n+1} and P_{S_n} D_{n+2} in terms of {D_t, -11 ≤ t ≤ n}, P_n X_{n+1} and P_n X_{n+2}.
(c) Find the mean squared errors of the predictors P_{S_n} D_{n+1} and P_{S_n} D_{n+2}.
9.13. For each of the time series B-F in Appendix A find an ARIMA (or ARMA) model to represent the series obtained by deleting the last six observations. Explain and justify your choice of model in each case, giving approximate confidence bounds for estimated coefficients. Use each fitted model to obtain predicted values of the six observations deleted and the mean squared errors of the predictors. Compare the predicted and observed values. (Use PEST to carry out maximum likelihood estimation for each model and to generate the approximate variances of the estimators.)
CHAPTER 10
Inference for the Spectrum of a Stationary Process
In this chapter we consider problems of statistical inference for time series based on frequency-domain properties of the series. The fundamental tool used is the periodogram, which is defined in Section 10.1 for any time series {x_1, ..., x_n}. Section 10.2 deals with statistical tests for the presence of "hidden periodicities" in the data. Several tests are discussed, corresponding to various different models and hypotheses which we may wish to test. Spectral analysis for stationary time series, and in particular the estimation of the spectral density, depends very heavily on the asymptotic distribution as n → ∞ of the periodogram ordinates of the series {X_1, ..., X_n}. The essential results are contained in Theorem 10.3.2. Under rather general conditions, the periodogram ordinates I_n(λ_i) at any set of frequencies λ_1, ..., λ_m, 0 < λ_1 < ··· < λ_m < π, are asymptotically independent exponential random variables with means 2πf(λ_i), where f is the spectral density of {X_t}. Consequently the periodogram I_n is not a consistent estimator of 2πf. Consistent estimators can however be constructed by applying linear smoothing filters to the periodogram. The asymptotic behaviour of the resulting discrete spectral average estimators can be derived from the asymptotic behaviour of the periodogram as shown in Section 10.4. Lag-window estimators of the form (2π)^{-1} Σ_{|h|≤r} w(h/r)γ̂(h)e^{-ihω}, where w(x), -1 ≤ x ≤ 1, is a suitably chosen weight function, are also discussed in Section 10.4 and compared with discrete spectral average estimators. Approximate confidence intervals for the spectral density are given in Section 10.5. An alternative approach to spectral density estimation, based on fitting an ARMA model to the data and computing the spectral density of the fitted process, is discussed in Section 10.6. An important role in the development of spectral analysis has been played by the fast Fourier transform algorithm, which makes possible the rapid calculation of the periodogram for very large
331
§ I 0. 1 . The Periodogram
data sets. An introduction to the algorithm and its application to the computa­
tion of autocovariances is given in Section 1 0.7. The chapter concludes with a
discussion of the asymptotic behaviour of the maximum likelihood estimators
of the coefficients of an ARMA(p, q) process.
§10.1 The Periodogram
Consider an arbitrary set of (possibly complex-valued) observations x_1, ..., x_n made at times 1, ..., n respectively. The vector x = (x_1, ..., x_n)' belongs to the n-dimensional complex space C^n. If u and v are two elements of C^n, we define the inner product of u and v as in (2.1.2), i.e.

    ⟨u, v⟩ = Σ_{i=1}^n u_i v̄_i.                                   (10.1.1)
By imagining the data x_1, ..., x_n to be the values at 1, ..., n of a function with period n, we might expect (as is shown in Proposition 10.1.1 below) that each x_t can be expressed as a linear combination of harmonics,

    x_t = n^{-1/2} Σ_{j∈F_n} a_j e^{itω_j},    t = 1, ..., n,     (10.1.2)

where the frequencies ω_j = 2πj/n are the integer multiples of the fundamental frequency 2π/n which fall in the interval (−π, π]. (Harmonics e^{itω_j} with frequencies 2πj/n outside this interval cannot be distinguished on the basis of observations at integer times only.) The frequencies ω_j = 2πj/n, −π < ω_j ≤ π, are called the Fourier frequencies of the series {x_1, ..., x_n}. The representation (10.1.2) can be rewritten in vector form as

    x = Σ_{j∈F_n} a_j e_j,                                        (10.1.3)

where

    e_j := n^{-1/2}(e^{iω_j}, e^{i2ω_j}, ..., e^{inω_j})',        (10.1.4)

    F_n := {j ∈ Z : −π < ω_j = 2πj/n ≤ π} = {−[(n − 1)/2], ..., [n/2]},     (10.1.5)

and [x] denotes the integer part of x. Notice that F_n contains exactly n integers. The validity and uniqueness of the representation (10.1.3) and the values of the coefficients a_j are simple consequences of the following proposition.
Proposition 10.1.1. The vectors e_j, j ∈ F_n, defined by (10.1.4) constitute an orthonormal basis for C^n.
PROOF. For j, k ∈ F_n,

    ⟨e_j, e_k⟩ = n^{-1} Σ_{t=1}^n e^{it(ω_j − ω_k)},

which equals 1 when j = k. When j ≠ k the terms form a geometric series with ratio e^{i(ω_j − ω_k)} ≠ 1, and since ω_j − ω_k is a non-zero integer multiple of 2π/n the n terms sum to zero. Hence the vectors e_j, j ∈ F_n, are orthonormal, and being n in number they form a basis for C^n.  □
Corollary 10.1.1. For any x ∈ C^n,

    x = Σ_{j∈F_n} a_j e_j,                                        (10.1.6)

where

    a_j = ⟨x, e_j⟩ = n^{-1/2} Σ_{t=1}^n x_t e^{-itω_j}.           (10.1.7)

PROOF. Take inner products of each side of (10.1.6) with e_j, j ∈ F_n.  □
Definition 10.1.1. The discrete Fourier transform of x ∈ C^n is the sequence {a_j, j ∈ F_n} defined by (10.1.7).

Definition 10.1.2 (The Periodogram of x ∈ C^n). The value I(ω_j) of the periodogram of x at frequency ω_j = 2πj/n, j ∈ F_n, is defined in terms of the discrete Fourier transform {a_j} of x by

    I(ω_j) := |a_j|^2 = |⟨x, e_j⟩|^2 = n^{-1} | Σ_{t=1}^n x_t e^{-itω_j} |^2.     (10.1.8)

Notice that the periodogram decomposes ||x||^2 into a sum of components |⟨x, e_j⟩|^2 associated with the Fourier frequencies ω_j, j ∈ F_n. Thus

    ||x||^2 = Σ_{j∈F_n} I(ω_j).                                   (10.1.9)
This decomposition can be neatly expressed as the "analysis of variance"
shown in Table 1 0. 1 . ( [y] denotes the integer part of y.)
Table 10.1. Decomposition of ||x||^2 into Components Corresponding to the Harmonic Decomposition (10.1.6) of x

    Source                        Degrees of freedom    Sum of squares
    Frequency ω_{-[(n-1)/2]}      1                     |a_{-[(n-1)/2]}|^2
      ...                         ...                   ...
    Frequency ω_0 (mean)          1                     |a_0|^2 = n^{-1}|Σ_{t=1}^n x_t|^2
      ...                         ...                   ...
    Frequency ω_{[n/2]}           1                     |a_{[n/2]}|^2
    Total                         n                     ||x||^2
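For numerical work the discrete Fourier transform (10.1.7) and the periodogram (10.1.8) are easily computed. The following is a minimal illustrative sketch (it assumes a Python environment with NumPy; it evaluates the transform directly from the definition rather than with a fast Fourier transform, and it is not part of the ITSM/PEST programs referred to elsewhere in the text). It also checks the decomposition (10.1.9).

```python
import numpy as np

def periodogram(x):
    """Periodogram I(w_j) at the Fourier frequencies w_j = 2*pi*j/n, j in F_n.

    Uses a_j = n^{-1/2} * sum_t x_t * exp(-i*t*w_j), so I(w_j) = |a_j|^2,
    matching (10.1.7) and (10.1.8)."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    j = np.arange(-((n - 1) // 2), n // 2 + 1)          # the index set F_n
    w = 2 * np.pi * j / n                               # Fourier frequencies
    t = np.arange(1, n + 1)
    a = np.exp(-1j * np.outer(w, t)) @ x / np.sqrt(n)   # discrete Fourier transform
    return w, np.abs(a) ** 2

# Check of the decomposition (10.1.9): sum_j I(w_j) = ||x||^2.
x = np.random.default_rng(0).normal(size=100)
w, I = periodogram(x)
print(np.allclose(I.sum(), np.sum(x ** 2)))             # True
```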
If x ∈ R^n and if ω_j (= 2πj/n) and −ω_j are both in (−π, π], it follows from (10.1.7) that a_{−j} = ā_j and I(ω_j) = I(−ω_j). We can therefore rewrite (10.1.6) in the form

    x = a_0 e_0 + Σ_{j=1}^{[(n−1)/2]} (a_j e_j + ā_j e_{−j}) + a_{n/2} e_{n/2},     (10.1.10)

where the last term is defined to be zero if n is odd. Writing a_j in its polar form, a_j = r_j exp(iθ_j), we can reexpress (10.1.10) as

    x = a_0 e_0 + Σ_{j=1}^{[(n−1)/2]} 2^{1/2} r_j (c_j cos θ_j − s_j sin θ_j) + a_{n/2} e_{n/2},     (10.1.11)

where

    c_j = (2/n)^{1/2} (cos ω_j, cos 2ω_j, ..., cos nω_j)'

and

    s_j = (2/n)^{1/2} (sin ω_j, sin 2ω_j, ..., sin nω_j)'.

Now {e_0, c_1, s_1, ..., c_{[(n−1)/2]}, s_{[(n−1)/2]}, e_{n/2}}, with the last vector excluded if n is odd, is an orthonormal basis for R^n. We can therefore decompose the sum of squares Σ_{t=1}^n x_t^2 into components corresponding to each vector in the set. For 1 ≤ j ≤ [(n − 1)/2], the components corresponding to c_j and s_j are usually lumped together to produce a "frequency ω_j" component as in Table 10.2. This is just the squared length of the projection of x onto the two-dimensional subspace sp{c_j, s_j} of R^n.

Notice that for x ∈ R^n the same decomposition is obtained by pooling the contributions from frequencies ω_j and −ω_j in Table 10.1.

We have seen how the periodogram generates a decomposition of ||x||^2 into components associated with the Fourier frequencies ω_j = 2πj/n ∈ (−π, π].
Table 10.2. Decomposition of ||x||^2, x ∈ R^n, into Components Corresponding to the Harmonic Decomposition (10.1.11)

    Source                     Degrees of freedom    Sum of squares
    Frequency ω_0 (mean)       1                     a_0^2 = n^{-1}(Σ_{t=1}^n x_t)^2 = I(0)
    Frequency ω_1              2                     2r_1^2 = 2|a_1|^2 = 2I(ω_1)
      ...                      ...                   ...
    Frequency ω_k              2                     2r_k^2 = 2|a_k|^2 = 2I(ω_k)
      ...                      ...                   ...
    Frequency ω_{n/2}          1                     a_{n/2}^2 = I(π)   (if n is even)
    Total                      n                     Σ_{t=1}^n x_t^2 = ||x||^2
It is also closely related to the sample autocovariance function γ̂(k), |k| < n, as demonstrated in the following proposition.

Proposition 10.1.2 (The Periodogram of x ∈ C^n in Terms of the Sample Autocovariance Function). If ω_j is any non-zero Fourier frequency, then

    I(ω_j) = Σ_{|k|<n} γ̂(k) e^{-ikω_j},                          (10.1.12)

where γ̂(k) := n^{-1} Σ_{t=1}^{n−k} (x_{t+k} − m)(x̄_t − m̄), k ≥ 0, with m := n^{-1} Σ_{t=1}^n x_t, and γ̂(k) := γ̂(−k) for k < 0. [If m = 0, or if we replace γ̂(k) in (10.1.12) by γ̃(k), where γ̃ is defined like γ̂ with m replaced by zero, the following proof shows that (10.1.12) is then valid for all Fourier frequencies ω_j ∈ (−π, π].]

PROOF. By Definition 10.1.2, we can write

    I(ω_j) = n^{-1} Σ_{s=1}^n x_s e^{-isω_j} Σ_{t=1}^n x̄_t e^{itω_j}.

Since Σ_{t=1}^n e^{itω_j} = 0 for every non-zero Fourier frequency ω_j, subtracting the sample mean m from each observation leaves the right-hand side unchanged. Hence

    I(ω_j) = n^{-1} Σ_{s=1}^n Σ_{t=1}^n (x_s − m)(x̄_t − m̄) e^{-i(s−t)ω_j}
           = Σ_{|k|<n} γ̂(k) e^{-ikω_j}.  □

Remark. The striking resemblance between (10.1.12) and the expression f(ω) = (2π)^{-1} Σ_{k=−∞}^∞ γ(k)e^{-ikω} for the spectral density of a stationary process with Σ|γ(k)| < ∞ suggests the potential value of the periodogram for spectral density estimation. This aspect of the periodogram will be taken up in Section 10.3.
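Proposition 10.1.2 can also be verified numerically. The sketch below (again Python/NumPy, illustrative only, with function names of our own choosing) compares the two sides of (10.1.12) at a non-zero Fourier frequency for a real series.

```python
import numpy as np

def sample_acvf(x, k):
    """Sample autocovariance gamma_hat(k), |k| < n, with mean correction."""
    x = np.asarray(x, float)
    n, m = len(x), np.mean(x)
    k = abs(k)
    return np.sum((x[k:] - m) * (x[:n - k] - m)) / n

def periodogram_ordinate(x, j_idx):
    """I(w_j) = n^{-1} |sum_t x_t exp(-i*t*w_j)|^2 at w_j = 2*pi*j_idx/n."""
    x = np.asarray(x, float)
    n = len(x)
    t = np.arange(1, n + 1)
    return np.abs(np.sum(x * np.exp(-2j * np.pi * j_idx * t / n))) ** 2 / n

x = np.random.default_rng(1).normal(size=64)
n, j_idx = len(x), 5                          # any non-zero Fourier frequency
lhs = periodogram_ordinate(x, j_idx)
rhs = sum(sample_acvf(x, k) * np.exp(-2j * np.pi * j_idx * k / n)
          for k in range(-n + 1, n)).real
print(np.isclose(lhs, rhs))                   # True, as asserted by (10.1.12)
```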
§10.2 Testing for the Presence of Hidden Periodicities

In this section we shall consider a variety of tests (based on the periodogram) which can be used to test the null hypothesis H_0 that the data {X_1, ..., X_n} is generated by a Gaussian white noise sequence, against the alternative hypothesis H_1 that the data is generated by a Gaussian white noise sequence with a superimposed deterministic periodic component. The form of the test will depend on the way in which the periodic component is specified. The data is assumed from now on to be real.

(a) Testing for the Presence of a Sinusoid with Specified Frequency. The model for the data is

    X_t = μ + A cos ωt + B sin ωt + Z_t,                          (10.2.1)

where {Z_t} is Gaussian white noise with variance σ^2, A and B are non-random
constants and ω is a specified frequency. The null and alternative hypotheses are

    H_0 : A = B = 0                                               (10.2.2)

and

    H_1 : A and B are not both zero.                              (10.2.3)
If ω is one of the Fourier frequencies ω_k = 2πk/n ∈ (0, π), then the analysis of variance (Table 10.2) provides us with an easy test. The model (10.2.1) can be written, in the notation of (10.1.11), as

    X = n^{1/2} μ e_0 + (n/2)^{1/2} A c_k + (n/2)^{1/2} B s_k + Z,    Z ~ N(0, σ^2 I_n).     (10.2.4)

We therefore reject H_0 in favour of H_1 if the frequency ω_k sum of squares in Table 10.2, i.e. 2I(ω_k), is sufficiently large. To determine how large, we observe that under H_0 (see Problem 2.19),

    2I(ω_k) = ||P_{sp{c_k,s_k}} X||^2 = ||P_{sp{c_k,s_k}} Z||^2 ~ σ^2 χ^2(2),

and that I(ω_k) is independent of

    ||X − P_{sp{e_0,c_k,s_k}} X||^2 = Σ_{i=1}^n X_i^2 − I(0) − 2I(ω_k) ~ σ^2 χ^2(n − 3).

We therefore reject H_0 in favour of H_1 at level α if

    (n − 3)I(ω_k) / [Σ_{i=1}^n X_i^2 − I(0) − 2I(ω_k)] > F_{1−α}(2, n − 3).
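The rejection rule is easy to apply in practice. The following is a minimal sketch (Python with NumPy and SciPy assumed; the function name sinusoid_test and the simulated example are ours, not from the text).

```python
import numpy as np
from scipy.stats import f as f_dist

def sinusoid_test(x, k, alpha=0.05):
    """F-test of H0: A = B = 0 in (10.2.1) at the Fourier frequency w_k = 2*pi*k/n,
    0 < w_k < pi, using the rejection rule following (10.2.4)."""
    x = np.asarray(x, float)
    n = len(x)
    t = np.arange(1, n + 1)
    I0 = np.abs(x.sum()) ** 2 / n                        # I(0) = n * xbar^2
    Ik = np.abs(np.sum(x * np.exp(-2j * np.pi * k * t / n))) ** 2 / n
    resid_ss = np.sum(x ** 2) - I0 - 2 * Ik              # ~ sigma^2 chi^2(n-3) under H0
    F = (n - 3) * Ik / resid_ss
    return F, bool(F > f_dist.ppf(1 - alpha, 2, n - 3))

# Example: a weak sinusoid at the Fourier frequency w_10 buried in noise.
rng = np.random.default_rng(2)
n = 100
t = np.arange(1, n + 1)
x = 0.8 * np.cos(2 * np.pi * 10 * t / n) + rng.normal(size=n)
print(sinusoid_test(x, k=10))
```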
An obvious modification of the above test can also be used if ω = π. However if ω is not a Fourier frequency, the analysis is a little more complicated since the vectors

    c = (2/n)^{1/2} (cos ω, cos 2ω, ..., cos nω)',
    s = (2/n)^{1/2} (sin ω, sin 2ω, ..., sin nω)',

and e_0 are not orthogonal. In principle however the test is quite analogous. The model now is

    X = n^{1/2} μ e_0 + (n/2)^{1/2} A c + (n/2)^{1/2} B s + Z,

and the two hypotheses H_0 and H_1 are again defined by (10.2.2) and (10.2.3). In this case we reject H_0 in favour of H_1 if

    2I*(ω) := ||P_{sp{e_0,c,s}} X − P_{sp{e_0}} X||^2

is large. Now under H_0,

    2I*(ω) ~ σ^2 χ^2(2),

and I*(ω) is independent of

    ||X − P_{sp{e_0,c,s}} X||^2 ~ σ^2 χ^2(n − 3).

We therefore reject H_0 in favour of H_1 at level α if

    (n − 3)I*(ω) / ||X − P_{sp{e_0,c,s}} X||^2 > F_{1−α}(2, n − 3).

To evaluate the test statistic we have

    P_{sp{e_0}} X = n^{-1/2} Σ_{i=1}^n X_i e_0,

and (see Section 2.6)

    P_{sp{e_0,c,s}} X = n^{1/2} μ̂ e_0 + (n/2)^{1/2} Â c + (n/2)^{1/2} B̂ s,

where μ̂, Â and B̂ are least squares estimators satisfying

    W'W(μ̂, Â, B̂)' = W'X,

and W is the (n × 3)-matrix [n^{1/2} e_0, (n/2)^{1/2} c, (n/2)^{1/2} s].
(b) Testing for the Presence of a Non-Sinusoidal Periodic Component with Specified Integer-Valued Period, p < n. If f is any function with values f_t, t ∈ Z, and with period p ∈ (1, n), then the same argument which led to (10.1.11) shows that f has the representation,

    f_t = μ + Σ_{k=1}^{[(p−1)/2]} [A_k cos(2πkt/p) + B_k sin(2πkt/p)] + A_{p/2}(−1)^t,     (10.2.5)

where A_{p/2} := 0 if p is odd. Our model for the data is therefore

    X_t = f_t + Z_t,    t = 1, ..., n,                            (10.2.6)

where {Z_t} is Gaussian white noise with variance σ^2 and f_t is defined by (10.2.5). The null hypothesis is

    H_0 : A_j = B_j = 0 for all j,                                (10.2.7)

and the alternative hypothesis is

    H_1 : H_0 is false.                                           (10.2.8)

Define the n-component column vectors,

    e_0 = (1/n)^{1/2} (1, 1, ..., 1)',
    γ_j = (2/n)^{1/2} (cos ψ_j, cos 2ψ_j, ..., cos nψ_j)',

and

    σ_j = (2/n)^{1/2} (sin ψ_j, sin 2ψ_j, ..., sin nψ_j)',

where ψ_j = 2πj/p, j = 1, 2, ..., [p/2]. Now let S(p) be the span of the p vectors e_0, γ_1, σ_1, γ_2, σ_2, ... (the last is γ_{p/2} if p is even, σ_{(p−1)/2} if p is odd) and let W be the (n × p)-matrix

    W = [e_0, γ_1, σ_1, γ_2, ...].

The projection of X = (X_1, ..., X_n)' onto S(p) is then (see Section 2.6)

    P_{S(p)}X = W(W'W)^{-1}W'X.                                   (10.2.9)

From (10.2.5) and (10.2.6),

    ||X − P_{S(p)}X||^2 = ||Z − P_{S(p)}Z||^2 ~ σ^2 χ^2(n − p),   (10.2.10)

since Z := (Z_1, ..., Z_n)' ~ N(0, σ^2 I_n). Moreover under H_0,

    ||P_{S(p)}X − P_{sp{e_0}}X||^2 = ||P_{S(p)}Z − P_{sp{e_0}}Z||^2 ~ σ^2 χ^2(p − 1),     (10.2.11)

and is independent of ||X − P_{S(p)}X||^2.
We reject H_0 in favour of H_1 if ||P_{S(p)}X − P_{sp{e_0}}X|| is sufficiently large. From (10.2.10) and (10.2.11), we obtain a size α test if we reject H_0 when

    [||P_{S(p)}X − X̄1||^2 / (p − 1)] / [||X − P_{S(p)}X||^2 / (n − p)] > F_{1−α}(p − 1, n − p),     (10.2.12)

where X̄ = Σ_{i=1}^n X_i / n, 1 := (1, ..., 1)' and P_{S(p)}X is found from (10.2.9).

In the special case when n is an integer multiple of the period p, say n = rp, the calculations simplify dramatically as a result of the orthogonality of the p vectors e_0, γ_1, σ_1, γ_2, σ_2, .... In fact, in the notation of (10.1.11),

    γ_j = c_{rj}  and  σ_j = s_{rj},    j = 0, ..., [p/2].

Hence, using Table 10.2,

    ||P_{S(p)}X||^2 = Σ_{j=0}^{[p/2]} [||P_{sp{c_{rj}}}X||^2 + ||P_{sp{s_{rj}}}X||^2]
                    = I(0) + 2 Σ_{1≤j<p/2} I(ω_{rj}) + δ_p I(π),

where δ_p = 1 if p is even, 0 if p is odd. The rejection criterion (10.2.12) therefore reduces to

    [2 Σ_{1≤j<p/2} I(ω_{rj}) + δ_p I(π)] / (p − 1)
    ------------------------------------------------  >  F_{1−α}(p − 1, n − p),
    [Σ_{t=1}^n X_t^2 − ||P_{S(p)}X||^2] / (n − p)

where, as usual, ω_{rj} = 2πrj/n.
(c) Testing for Hidden Periodicities of Unspecified Frequency: Fisher's Test. If {X_t} is Gaussian white noise with variance σ^2 and X = (X_1, ..., X_n)', then, since 2I(ω_k) = ||P_{sp{c_k,s_k}}X||^2, k = 1, ..., [(n − 1)/2], we conclude from Problem 2.19 that the random variables

    V_k := I(ω_k)/σ^2,    k = 1, ..., q,                          (10.2.14)

where q := [(n − 1)/2], are independent. Since from (10.2.14) the density function of V_k is e^{-x} I_{[0,∞)}(x), we deduce that the joint density function of V_1, ..., V_q is

    f_{V_1,...,V_q}(v_1, ..., v_q) = Π_{i=1}^q e^{-v_i} I_{[0,∞)}(v_i).     (10.2.15)
This is the key result used in the proof of the following proposition.

Proposition 10.2.1. If {X_t} is Gaussian white noise, then the random variables

    Y_i := Σ_{k=1}^i V_k / Σ_{k=1}^q V_k = Σ_{k=1}^i I(ω_k) / Σ_{k=1}^q I(ω_k),    i = 1, ..., q − 1,

are distributed as the order statistics of a sample of (q − 1) independent random variables, each one uniformly distributed on the interval [0, 1].
PROOF. Let S_i = Σ_{k=1}^i V_k, i = 1, ..., q. Then from (10.2.15), the joint density function of S_1, ..., S_q is (see e.g. Mood, Graybill and Boes (1974))

    f_{S_1,...,S_q}(s_1, ..., s_q) = exp[−s_1 − (s_2 − s_1) − ··· − (s_q − s_{q−1})]
                                   = exp(−s_q),    0 ≤ s_1 ≤ ··· ≤ s_q.     (10.2.16)

The marginal density function of S_q is the probability density function of the sum of q independent standard exponential random variables. Thus

    f_{S_q}(s_q) = s_q^{q−1} exp(−s_q)/(q − 1)!,    s_q ≥ 0.      (10.2.17)

From (10.2.16) and (10.2.17), the conditional density of (S_1, ..., S_{q−1}) given S_q is

    f_{S_1,...,S_{q−1}|S_q}(s_1, ..., s_{q−1}|s_q) = (q − 1)! s_q^{−q+1},    0 ≤ s_1 ≤ ··· ≤ s_{q−1} ≤ s_q.

Since by definition Y_i = S_i/S_q, i = 1, ..., q − 1, the conditional density of Y_1, ..., Y_{q−1} given S_q is (q − 1)! on the set 0 ≤ y_1 ≤ ··· ≤ y_{q−1} ≤ 1, and since this does not depend on s_q, we can write the unconditional joint density of Y_1, ..., Y_{q−1} as

    f_{Y_1,...,Y_{q−1}}(y_1, ..., y_{q−1}) = (q − 1)!,    0 ≤ y_1 ≤ ··· ≤ y_{q−1} ≤ 1.     (10.2.18)

This is precisely the joint density of the order statistics of a random sample of size (q − 1) from the uniform distribution on (0, 1).  □
Corollary 10.2.1. Under the conditions of Proposition 10.2.1, the cumulative distribution function with jumps of size (q − 1)^{-1} at Y_i, i = 1, ..., q − 1, is the empirical distribution function of a sample of size (q − 1) from the uniform distribution on (0, 1).

Corollary 10.2.2. If we define Y_0 := 0, Y_q := 1 and

    M_q := max_{1≤i≤q} (Y_i − Y_{i−1}),

then under the conditions of Proposition 10.2.1,

    P(M_q ≤ a) = Σ_{j=0}^q (−1)^j (q choose j) (1 − ja)_+^{q−1},  (10.2.19)

where x_+ = max(x, 0).

PROOF. It is clear from Proposition 10.2.1 that M_q is distributed as the length of the largest subinterval of (0, 1) obtained when the interval is randomly partitioned by (q − 1) points independently and uniformly distributed on (0, 1). The distribution function of this length is shown by Feller (1971), p. 29, to have the form (10.2.19).  □
Fisher's Test for Hidden Periodicities. Corollary 10.2.2 was used by Fisher to construct a test of the null hypothesis that {X_t} is Gaussian white noise against the alternative hypothesis that {X_t} contains an added deterministic periodic component of unspecified frequency. The idea is to reject the null hypothesis if the periodogram contains a value substantially larger than the average value, i.e. (recalling that q = [(n − 1)/2]) if

    ξ_q := [max_{1≤i≤q} I(ω_i)] / [q^{-1} Σ_{i=1}^q I(ω_i)] = qM_q     (10.2.20)

is sufficiently large. To apply the test, we compute the realized value x of ξ_q from the data X_1, ..., X_n and then use (10.2.19) to compute

    P(ξ_q ≥ x) = 1 − Σ_{j=0}^q (−1)^j (q choose j) (1 − jx/q)_+^{q−1}.     (10.2.21)

If this probability is less than α, we reject the null hypothesis at level α.
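The statistic (10.2.20) and the probability (10.2.21) are easy to compute. The following is an illustrative sketch (Python with NumPy assumed; the function name fisher_test and the simulated series are ours), applied to data of the kind used in Example 10.2.1 below.

```python
import numpy as np
from math import comb

def fisher_test(x):
    """Fisher's test for a hidden periodicity: returns (xi_q, p-value)."""
    x = np.asarray(x, float)
    n = len(x)
    q = (n - 1) // 2
    t = np.arange(1, n + 1)
    I = np.array([np.abs(np.sum(x * np.exp(-2j * np.pi * k * t / n))) ** 2 / n
                  for k in range(1, q + 1)])         # I(w_1), ..., I(w_q)
    xi = I.max() / I.mean()                          # the statistic (10.2.20)
    # P(xi_q >= xi) from (10.2.21); terms with 1 - j*xi/q <= 0 vanish.
    p = 1.0 - sum((-1) ** j * comb(q, j) * max(1 - j * xi / q, 0.0) ** (q - 1)
                  for j in range(q + 1))
    return xi, p

rng = np.random.default_rng(3)
t = np.arange(1, 101)
x = np.cos(np.pi * t / 3) + rng.normal(size=100)     # the model of Example 10.2.1
print(fisher_test(x))                                # a small p-value is expected
```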
EXAMPLE 10.2.1. Figure 10.1 shows a realization of {X_1, ..., X_100} together with the periodogram ordinates I(ω_j), j = 1, ..., 50. In this case q = [99/2] = 49 and the realized value of ξ_49 is x = 9.4028/1.1092 = 8.477. From (10.2.21),

    P(ξ_49 > 8.477) = .0054,

and consequently we reject the null hypothesis at level .01. [The data was in fact generated by the process

    X_t = cos(πt/3) + Z_t,    t = 1, ..., 100,

where {Z_t} is Gaussian white noise with variance 1. This explains the peak in the periodogram at ω_17 = .34π.]
The Kolmogorov-Smirnov Test. Corollary 1 0.2. 1 suggests another test of the
null hypothesis that {X, } is Gaussian white noise. We simply plot the empirical
distribution function defined in the corollary and check its compatibility with
the uniform distribution function F(x) = x, 0 :::;; x :::;; 1, using the Kolmogorov-
[Figure 10.1. (a) The series {X_1, ..., X_100} of Example 10.2.1 and (b) the corresponding periodogram ordinates I(2πj/100), j = 1, ..., 50.]
[Figure 10.2. The standardized cumulative periodogram C(x) for Example 10.2.1 showing the Kolmogorov-Smirnov bounds for α = .05 (inner) and α = .01 (outer).]
Smirnov test. For q > 30 (i.e. for sample size n > 62), a good approximation to the level-α Kolmogorov-Smirnov test is to reject the null hypothesis if the empirical distribution function exits from the bounds

    y = x ± k_α (q − 1)^{-1/2},    0 < x < 1,

where k_{.05} = 1.36 and k_{.01} = 1.63.

This procedure is precisely equivalent to plotting the standardized cumulative periodogram,

    C(x) = 0,      x < 1,
         = Y_i,    i ≤ x < i + 1,  i = 1, ..., q − 1,
         = 1,      x ≥ q,                                         (10.2.22)

and rejecting the null hypothesis at level α if for any x in [1, q], the function C exits from the boundaries,

    y = (x − 1)/(q − 1) ± k_α (q − 1)^{-1/2}.                     (10.2.23)
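A sketch of the standardized cumulative periodogram (10.2.22) and the boundary check (10.2.23) follows (Python/NumPy, illustrative only; the deviation from the central line is checked at the jump points of C, which is adequate for this approximate test).

```python
import numpy as np

def cumulative_periodogram_test(x, k_alpha=1.36):
    """Standardized cumulative periodogram (10.2.22) and the boundary check
    (10.2.23); k_alpha = 1.36 for level .05, 1.63 for level .01."""
    x = np.asarray(x, float)
    n = len(x)
    q = (n - 1) // 2
    t = np.arange(1, n + 1)
    I = np.array([np.abs(np.sum(x * np.exp(-2j * np.pi * k * t / n))) ** 2 / n
                  for k in range(1, q + 1)])
    Y = np.cumsum(I)[:q - 1] / I.sum()                  # Y_1, ..., Y_{q-1}
    i = np.arange(1, q)                                 # C(x) = Y_i on [i, i+1)
    dev = np.maximum(np.abs(Y - (i - 1) / (q - 1)),     # deviation at x = i
                     np.abs(Y - i / (q - 1)))           # and as x increases to i+1
    reject = bool((dev > k_alpha / np.sqrt(q - 1)).any())
    return Y, reject                                    # reject == True => reject H0

rng = np.random.default_rng(4)
print(cumulative_periodogram_test(rng.normal(size=100))[1])   # typically False
```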
EXAMPLE 10.2.2. Figure 10.2 shows the cumulative periodogram and Kolmogorov-Smirnov boundaries for the data of Example 10.2.1 with α = .05 and α = .01. We do not reject the null hypothesis even at level .05 using this test. The Fisher test however rejected the null hypothesis at level .01 since it is specifically designed to detect departures from the null hypothesis of the kind encountered in this example.

Generalization of the Fisher and Kolmogorov-Smirnov Tests. The null hypothesis assumed for both these tests was that {X_t} is Gaussian white noise. However when n is large the tests can also be used to test the null hypothesis that {X_t} has spectral density f by replacing I(ω_k) by I(ω_k)/f(ω_k) in the definitions of Y_i and ξ_q.
§10.3 Asymptotic Properties of the Periodogram

In this section we shall consider the asymptotic properties of the periodogram of X_1, ..., X_n when {X_t} is a stationary time series with mean μ and absolutely summable autocovariance function γ(·). Under these conditions {X_t} has a continuous spectral density (Corollary 4.3.2) given by

    f(ω) = (2π)^{-1} Σ_{k=−∞}^∞ γ(k)e^{-ikω},    ω ∈ [−π, π].     (10.3.1)

The periodogram of {X_1, ..., X_n} is defined at the Fourier frequencies ω_j = 2πj/n, ω_j ∈ [−π, π], by

    I_n(ω_j) = n^{-1} | Σ_{t=1}^n X_t e^{-itω_j} |^2.

By Proposition 10.1.2, this definition is equivalent to

    I_n(0) = n|X̄|^2,
    I_n(ω_j) = Σ_{|k|<n} γ̂(k)e^{-ikω_j}    if ω_j ≠ 0,            (10.3.2)

where γ̂(k) = n^{-1} Σ_{t=1}^{n−|k|} (X_t − X̄)(X_{t+|k|} − X̄) and X̄ = n^{-1} Σ_{t=1}^n X_t. In deriving the asymptotic properties of I_n it will be convenient to use the alternative representation,

    I_n(ω_j) = Σ_{|k|<n} γ̃(k)e^{-ikω_j},    ω_j ≠ 0,              (10.3.3)

where γ̃(k) := n^{-1} Σ_{t=1}^{n−|k|} (X_t − μ)(X_{t+|k|} − μ), which can be established by the same argument used in the proof of Proposition 10.1.2.

In view of (10.3.2) a natural estimate of f(ω_j) for ω_j ≠ 0 is I_n(ω_j)/(2π). We now extend the domain of I_n to the whole interval [−π, π] in order to estimate f(ω) for arbitrary non-zero frequencies in the interval [−π, π]. This can be done in various ways, e.g. by replacing ω_j in (10.3.2) by ω and allowing ω to take any value in [−π, π]. However we shall follow Fuller (1976) in defining the periodogram on [−π, π] as a piecewise constant function which coincides with (10.3.2) at the Fourier frequencies ω_j ∈ [−π, π].
Definition 10.3.1 (Extension of the Periodogram). For any ω ∈ [−π, π] the periodogram is defined as follows:

    I_n(ω) = I_n(ω_k)    if ω_k − π/n < ω ≤ ω_k + π/n and 0 ≤ ω ≤ π,
    I_n(ω) = I_n(−ω)     if ω ∈ [−π, 0).

Clearly this definition implies that I_n is an even function which coincides with (10.3.2) at all integer multiples of 2π/n. For ω ∈ [0, π], let g(n, ω) be the multiple of 2π/n closest to ω (the smaller one if there are two) and for ω ∈ [−π, 0) let g(n, ω) = g(n, −ω). Then Definition 10.3.1 can be rewritten as

    I_n(ω) = I_n(g(n, ω)).                                        (10.3.4)
The following proposition establishes the asymptotic unbiasedness of the periodogram estimate I_n(ω)/(2π) of f(ω) for ω ≠ 0.

Proposition 10.3.1. If {X_t} is stationary with mean μ and absolutely summable autocovariance function γ(·), then

(i) EI_n(0) − nμ^2 → 2πf(0)

and

(ii) EI_n(ω) → 2πf(ω)  if ω ≠ 0.

(If μ = 0 then EI_n(ω) converges uniformly to 2πf(ω) on [−π, π].)

PROOF. By Theorem 7.1.1,

    EI_n(0) − nμ^2 = n Var(X̄_n) → Σ_{k=−∞}^∞ γ(k) = 2πf(0).

Now if ω ∈ (0, π] then, for n sufficiently large, g(n, ω) ≠ 0. Hence, from (10.3.3) and (10.3.4),

    EI_n(ω) = Σ_{|k|<n} n^{-1} Σ_{t=1}^{n−|k|} E[(X_t − μ)(X_{t+|k|} − μ)] e^{-ikg(n,ω)}
            = Σ_{|k|<n} (1 − |k|/n) γ(k) e^{-ikg(n,ω)}.

However, since γ(·) is absolutely summable, Σ_{|k|<n} (1 − |k|/n)γ(k)e^{-ikλ} converges uniformly to 2πf(λ), and therefore (since g(n, ω) → ω) we have EI_n(ω) → 2πf(ω).

The uniform convergence of EI_n(ω) to 2πf(ω) when μ = 0 is easy to check using the uniform continuity of f on [−π, π] and the uniform convergence of g(n, ω) to ω on [0, π].  □
As indicated earlier, the vectors {c_j, s_j; j = 1, ..., q = [(n − 1)/2]} in equation (10.1.11) are orthonormal. Consequently if {X_t} is Gaussian white noise with variance σ^2, then the random variables,

    α(ω_j) := (2/n)^{1/2} Σ_{t=1}^n X_t cos(ω_j t),    j = 1, ..., q,
    β(ω_j) := (2/n)^{1/2} Σ_{t=1}^n X_t sin(ω_j t),    j = 1, ..., q,     (10.3.5)

are independent with distribution N(0, σ^2). Consequently, as observed in Section 10.2, the periodogram ordinates,

    I_n(ω_j) = (α^2(ω_j) + β^2(ω_j))/2,    j = 1, ..., q,

are independently and exponentially distributed with means σ^2 = 2πf_X(ω_j), where f_X(·) is the spectral density of {X_t}. An analogous asymptotic result (Theorem 10.3.2) can be established for linear processes. First however we shall consider the case when {X_t} ~ IID(0, σ^2).
Proposition 10.3.2. Suppose that {Z_t} ~ IID(0, σ^2) and let I_n(ω), −π ≤ ω ≤ π, denote the periodogram of {Z_1, ..., Z_n} as defined by (10.3.4).

(i) If 0 < λ_1 < ··· < λ_m < π, then the random vector (I_n(λ_1), ..., I_n(λ_m))' converges in distribution as n → ∞ to a vector of independent exponentially distributed random variables, each with mean σ^2.

(ii) If EZ_1^4 = ησ^4 < ∞ and ω_j = 2πj/n ∈ [0, π], then

    Var(I_n(ω_j)) = n^{-1}(η − 3)σ^4 + 2σ^4    if ω_j = 0 or π,
                  = n^{-1}(η − 3)σ^4 + σ^4     if 0 < ω_j < π,    (10.3.6)

and

    Cov(I_n(ω_j), I_n(ω_k)) = n^{-1}(η − 3)σ^4,    ω_j ≠ ω_k.     (10.3.7)

(If Z_1 is normally distributed, then η = 3, so that I_n(ω_j) and I_n(ω_k) are uncorrelated for j ≠ k, as pointed out in Section 10.2.)
PROOF. (i) For an arbitrary frequency λ ∈ (0, π) define

    α(λ) := α(g(n, λ))  and  β(λ) := β(g(n, λ)),

where α(ω_j) and β(ω_j) are given by (10.3.5) with Z_t replacing X_t. Since I_n(λ_i) = (α^2(λ_i) + β^2(λ_i))/2, it suffices to show that

    (α(λ_1), β(λ_1), ..., α(λ_m), β(λ_m))'  is AN(0, σ^2 I_{2m}),     (10.3.8)

where I_{2m} is the 2m × 2m identity matrix.

Now if λ is a fixed frequency in (0, π) then for all sufficiently large n, g(n, λ) ∈ (0, π) and hence by the independence of the sequence {Z_t},

    Var(α(λ)) = Var(α(g(n, λ))) = σ^2 (2/n) Σ_{t=1}^n cos^2(g(n, λ)t) → σ^2.

Moreover for any ε > 0,

    n^{-1} Σ_{t=1}^n E(cos^2(g(n, λ)t) Z_t^2 1{|cos(g(n,λ)t)Z_t| > εn^{1/2}σ})
        ≤ n^{-1} Σ_{t=1}^n E(Z_t^2 1{|Z_t| > εn^{1/2}σ}) → 0  as n → ∞,

implying that α(λ) is AN(0, σ^2) by the Lindeberg condition (see Billingsley (1986)). Finally, for all sufficiently large n, g(n, λ_i) ∈ (0, π), i = 1, ..., m, and since the covariance matrix of (α(λ_1), β(λ_1), ..., α(λ_m), β(λ_m))' is σ^2 I_{2m}, the joint convergence in (i) is easily established using the Cramér-Wold device.

(ii) By definition of I_n(ω_j), we have

    I_n(ω_j) = n^{-1} Σ_{s=1}^n Σ_{t=1}^n Z_s Z_t e^{iω_j(t−s)},

and hence,

    E I_n(ω_j) I_n(ω_k) = n^{-2} Σ_{s=1}^n Σ_{t=1}^n Σ_{u=1}^n Σ_{v=1}^n E(Z_s Z_t Z_u Z_v) e^{iω_j(t−s)} e^{iω_k(v−u)}.

By (7.3.4), this expression can be rewritten as

    n^{-1}(η − 3)σ^4 + σ^4 (1 + n^{-2} | Σ_{t=1}^n e^{i(ω_j+ω_k)t} |^2 + n^{-2} | Σ_{t=1}^n e^{i(ω_k−ω_j)t} |^2),

and since EI_n(ω_j) = n^{-1} Σ_{t=1}^n EZ_t^2 = σ^2, it follows that

    Cov(I_n(ω_j), I_n(ω_k)) = n^{-1}(η − 3)σ^4 + n^{-2}σ^4 | Σ_{t=1}^n e^{i(ω_j+ω_k)t} |^2 + n^{-2}σ^4 | Σ_{t=1}^n e^{i(ω_k−ω_j)t} |^2.

The relations (10.3.6) and (10.3.7) are immediate consequences of this equation.  □
We next extend Proposition 10.3.2 to the linear process

    X_t = Σ_{j=−∞}^∞ ψ_j Z_{t−j},    {Z_t} ~ IID(0, σ^2),         (10.3.9)

where Σ_{j=−∞}^∞ |ψ_j| < ∞. The spectral density of this process is related (see (4.4.3)) to the spectral density of the white noise sequence {Z_t} by

    f_X(λ) = |ψ(e^{-iλ})|^2 f_Z(λ),    −π ≤ λ ≤ π,

where ψ(e^{-iλ}) = Σ_{j=−∞}^∞ ψ_j e^{-ijλ} (and f_Z(λ) = σ^2/(2π)). Since I_n(λ)/(2π) can be thought of as a sample version of the spectral density function, we might expect a similar relationship to exist between the respective periodograms of {X_t} and {Z_t}. This is the content of the following theorem.
Theorem 10.3.1. Let {X_t} be the linear process defined by (10.3.9) and let I_{n,X}(λ) and I_{n,Z}(λ) denote the periodograms of {X_1, ..., X_n} and {Z_1, ..., Z_n} respectively. Then, if ω_k = 2πk/n ∈ [0, π], we can write

    I_{n,X}(ω_k) = |ψ(e^{-iω_k})|^2 I_{n,Z}(ω_k) + R_n(ω_k),      (10.3.10)

where max_{ω_k∈[0,π]} E|R_n(ω_k)| → 0 as n → ∞. If in addition, Σ_{j=−∞}^∞ |ψ_j| |j|^{1/2} < ∞ and E|Z_1|^4 < ∞, then max_{ω_k∈[0,π]} E|R_n(ω_k)|^2 = O(n^{-1}).

Remark 1. Observe that we can rewrite (10.3.10) as

    I_{n,X}(λ) = |ψ(e^{-ig(n,λ)})|^2 I_{n,Z}(λ) + R_n(g(n, λ)),   (10.3.11)

where sup_{λ∈[−π,π]} E|R_n(g(n, λ))| → 0. In particular, R_n(g(n, λ)) → 0 in probability for every λ ∈ [−π, π].

PROOF. Set λ = ω_k ∈ [0, π] and let J_X(λ) and J_Z(λ) denote the discrete Fourier transforms of {X_t} and {Z_t} respectively. Then

    J_X(λ) = n^{-1/2} Σ_{t=1}^n X_t e^{-itλ} = Σ_{j=−∞}^∞ ψ_j e^{-ijλ} (n^{-1/2} Σ_{t=1}^n Z_{t−j} e^{-i(t−j)λ}),

i.e.

    J_X(λ) = ψ(e^{-iλ}) J_Z(λ) + Y_n(λ),                          (10.3.12)

where

    Y_n(λ) = n^{-1/2} Σ_{j=−∞}^∞ ψ_j e^{-ijλ} U_{nj}   and   U_{nj} = Σ_{t=1−j}^{n−j} Z_t e^{-itλ} − Σ_{t=1}^n Z_t e^{-itλ}.

Note that if |j| < n, then U_{nj} is a sum of 2|j| independent random variables, whereas if |j| ≥ n, U_{nj} is a sum of 2n independent random variables. It follows that

    E|U_{nj}|^2 ≤ 2 min(|j|, n) σ^2,

and hence that

    (E|Y_n(λ)|^2)^{1/2} ≤ n^{-1/2} Σ_{j=−∞}^∞ |ψ_j| (E|U_{nj}|^2)^{1/2} ≤ 2^{1/2} σ n^{-1/2} Σ_{j=−∞}^∞ |ψ_j| min(|j|, n)^{1/2}.

Thus

    E|Y_n(λ)|^2 ≤ 2σ^2 (n^{-1/2} Σ_{j=−∞}^∞ |ψ_j| min(|j|, n)^{1/2})^2.     (10.3.13)
Now if m is a fixed positive integer we have for n > m,

    n^{-1/2} Σ_{j=−∞}^∞ |ψ_j| min(|j|, n)^{1/2} ≤ n^{-1/2} Σ_{|j|≤m} |ψ_j| |j|^{1/2} + Σ_{|j|>m} |ψ_j|,

whence

    lim sup_{n→∞} n^{-1/2} Σ_{j=−∞}^∞ |ψ_j| min(|j|, n)^{1/2} ≤ Σ_{|j|>m} |ψ_j|.

Since m is arbitrary it follows that the bound in (10.3.13) converges to zero as n → ∞. Recalling that I_{n,X}(ω_k) = J_X(ω_k)J_X(−ω_k), we deduce from (10.3.10) and (10.3.12) that

    R_n(λ) = ψ(e^{-iλ}) J_Z(λ) Y_n(−λ) + ψ(e^{iλ}) J_Z(−λ) Y_n(λ) + |Y_n(λ)|^2.

Now |ψ(e^{-iλ})| ≤ Σ_{j=−∞}^∞ |ψ_j| < ∞ and E|J_Z(λ)|^2 = E I_{n,Z}(λ) = σ^2. Moreover we have shown that the bound in (10.3.13) does not depend on λ. Application of the Cauchy-Schwarz inequality therefore gives

    max_{ω_k∈[0,π]} E|R_n(ω_k)| → 0  as n → ∞.

Finally if E|Z_1|^4 < ∞ and Σ_{j=−∞}^∞ |ψ_j| |j|^{1/2} < ∞, then (see Problem 10.14)

    E|U_{nj}|^4 ≤ 2|j| E|Z_1|^4 + 3(2|j|σ^2)^2,

so that

    E|Y_n(λ)|^4 ≤ n^{-2} (Σ_{j=−∞}^∞ |ψ_j| (2|j| E|Z_1|^4 + 12|j|^2 σ^4)^{1/4})^4 = O(n^{-2}).

Hence by applying the Cauchy-Schwarz inequality and Proposition 10.3.2 to each of the terms in R_n^2(λ), we obtain

    max_{ω_k∈[0,π]} E|R_n(ω_k)|^2 = O(n^{-1}),

as desired.  □
Theorem 10.3.2. Let {X_t} be the linear process,

    X_t = Σ_{j=−∞}^∞ ψ_j Z_{t−j},    {Z_t} ~ IID(0, σ^2),

where Σ_{j=−∞}^∞ |ψ_j| < ∞. Let I_n(λ) denote the periodogram of {X_1, ..., X_n} and let f(λ) be the spectral density of {X_t}.

(i) If f(λ) > 0 for all λ ∈ [−π, π] and if 0 < λ_1 < ··· < λ_m < π, then the random vector (I_n(λ_1), ..., I_n(λ_m))' converges in distribution to a vector of independent and exponentially distributed random variables, the i-th component of which has mean 2πf(λ_i), i = 1, ..., m.

(ii) If Σ_{j=−∞}^∞ |ψ_j| |j|^{1/2} < ∞, EZ_1^4 = ησ^4 < ∞, ω_j = 2πj/n ≥ 0 and ω_k = 2πk/n ≥ 0, then

    Cov(I_n(ω_j), I_n(ω_k)) = 2(2π)^2 f^2(ω_j) + O(n^{-1/2})    if ω_j = ω_k = 0 or π,
                            = (2π)^2 f^2(ω_j) + O(n^{-1/2})     if 0 < ω_j = ω_k < π,
                            = O(n^{-1})                          if ω_j ≠ ω_k,

where the terms O(n^{-1/2}) and O(n^{-1}) can be bounded uniformly in j and k by c_1 n^{-1/2} and c_2 n^{-1} respectively, for some positive constants c_1 and c_2.
PROOF. From Theorem 10.3.1, we have

    I_n(λ) = I_n(g(n, λ)) = 2πf(g(n, λ)) σ^{-2} I_{n,Z}(λ) + R_n(g(n, λ)).

Since f(g(n, λ_j)) → f(λ_j) and R_n(g(n, λ_j)) → 0 in probability, the result (i) follows immediately from Propositions 10.3.2 and 6.3.8.

Now if Σ_{j=−∞}^∞ |ψ_j| |j|^{1/2} < ∞ and EZ_1^4 < ∞ then from (10.3.11) we have

    Var(I_n(ω_k)) = (2πf(ω_k)/σ^2)^2 Var(I_{n,Z}(ω_k)) + Var(R_n(ω_k))
                    + 2(2πf(ω_k)/σ^2) Cov(I_{n,Z}(ω_k), R_n(ω_k)).

Since Var(R_n(ω_k)) ≤ E|R_n(ω_k)|^2 = O(n^{-1}) and since Var(I_{n,Z}(ω_k)) is bounded uniformly in ω_k, the Cauchy-Schwarz inequality implies that Cov(I_{n,Z}(ω_k), R_n(ω_k)) = O(n^{-1/2}). It therefore follows from (10.3.6) and Proposition 10.3.2 that

    Var(I_n(ω_k)) = 2(2π)^2 f^2(ω_k) + O(n^{-1/2})    if ω_k = 0 or π,
                  = (2π)^2 f^2(ω_k) + O(n^{-1/2})     if 0 < ω_k < π.

A similar argument also gives

    Cov(I_n(ω_j), I_n(ω_k)) = O(n^{-1/2})    if ω_j ≠ ω_k.

In order to improve the bound from O(n^{-1/2}) to O(n^{-1}) in this last relation we follow the argument of Fuller (1976).

Set ω = ω_j and λ = ω_k with λ ≠ ω. Then by the definition of the periodogram, we have

    Cov(I_n(ω), I_n(λ)) = n^{-2} Σ_{s=1}^n Σ_{t=1}^n Σ_{u=1}^n Σ_{v=1}^n [E(X_s X_t X_u X_v) − γ(t − s)γ(v − u)] e^{iω(t−s)} e^{iλ(v−u)}.

By the same steps taken in the proof of Proposition 7.3.1, the above expression may be written as the sum of the following three terms:

    n^{-2}(η − 3)σ^4 Σ_{s=1}^n Σ_{t=1}^n Σ_{u=1}^n Σ_{v=1}^n Σ_{j=−∞}^∞ ψ_j ψ_{t−s+j} ψ_{u−s+j} ψ_{v−s+j} e^{iω(t−s)} e^{iλ(v−u)},     (10.3.14)

    (n^{-1} Σ_{s=1}^n Σ_{u=1}^n γ(u − s) e^{-iωs} e^{-iλu}) (n^{-1} Σ_{t=1}^n Σ_{v=1}^n γ(v − t) e^{iωt} e^{iλv}),     (10.3.15)

and

    (n^{-1} Σ_{s=1}^n Σ_{v=1}^n γ(v − s) e^{-iωs} e^{iλv}) (n^{-1} Σ_{t=1}^n Σ_{u=1}^n γ(u − t) e^{iωt} e^{-iλu}).     (10.3.16)

By interchanging the order of summation, we see that the first term is bounded by

    n^{-2}(η − 3)σ^4 (Σ_{j=−∞}^∞ |ψ_j|)^4 = O(n^{-2}).            (10.3.17)

Now the first factor of (10.3.15) can be written as

    n^{-1} Σ_{u=1}^n Σ_{s=1}^n γ(u − s) e^{-iω(s−u)} e^{-i(λ+ω)u} = n^{-1} Σ_{u=1}^n Σ_{k=1−u}^{n−u} γ(k) e^{-iωk} e^{-i(ω+λ)u},

from which it follows that

    n^{-1} Σ_{s=1}^n Σ_{u=1}^n γ(u − s) e^{-iωs} e^{-iλu} = n^{-1} Σ_{|k|<n} γ(k) e^{-iωk} Σ_{u=max(1,1−k)}^{min(n,n−k)} e^{-i(ω+λ)u}.     (10.3.18)

However, since ω + λ = 2π(j + k)/n ≠ 0 or 2π, we have Σ_{u=1}^n e^{-i(ω+λ)u} = 0, and hence for 0 ≤ k ≤ n − 1,

    | Σ_{u=1}^{n−k} e^{-i(ω+λ)u} | = | Σ_{u=n−k+1}^n e^{-i(ω+λ)u} | ≤ k.

Similarly, | Σ_{u=1−k}^n e^{-i(ω+λ)u} | ≤ |k| for −n + 1 ≤ k ≤ −1. These inequalities show that the right side of (10.3.18) is bounded by

    n^{-1} Σ_{|k|<n} |k| |γ(k)| ≤ n^{-1} σ^2 Σ_{|k|<n} Σ_{j=−∞}^∞ |k| |ψ_j ψ_{j+k}|
                                ≤ n^{-1/2} σ^2 Σ_{|k|<n} Σ_{j=−∞}^∞ |k|^{1/2} |ψ_j ψ_{j+k}|
                                ≤ n^{-1/2} σ^2 (Σ_k Σ_j |k + j|^{1/2} |ψ_{k+j} ψ_j| + Σ_k Σ_j |j|^{1/2} |ψ_j ψ_{k+j}|)
                                ≤ 2 n^{-1/2} σ^2 (Σ_{s=−∞}^∞ |s|^{1/2} |ψ_s|)(Σ_{j=−∞}^∞ |ψ_j|)
                                = O(n^{-1/2}).

Hence

    | n^{-1} Σ_{s=1}^n Σ_{u=1}^n γ(u − s) e^{-iωs} e^{-iλu} | = O(n^{-1/2}).     (10.3.19)

The relation (10.3.19) remains valid if ω is replaced by −ω or if λ is replaced by −λ or both. The terms (10.3.15) and (10.3.16) are therefore of order O(n^{-1}). Taking into account (10.3.17) we deduce that Cov(I_n(ω), I_n(λ)) = O(n^{-1}), as desired.  □
A good estimator θ̂_n of any parameter θ should be at least consistent, i.e. should converge in probability to θ as n → ∞. However Theorem 10.3.2 shows that I_n(λ)/(2π) is not a consistent estimator of f(λ). Since for large n the periodogram ordinates are approximately uncorrelated with variances changing only slightly over small frequency intervals, we might hope to construct a consistent estimator of f(λ) by averaging the periodogram ordinates in a small neighborhood of λ (just as we obtain a consistent estimator of a population mean by averaging the observed values in a random sample of size n). The number of Fourier frequencies in a given interval increases approximately linearly with n. By averaging the periodogram ordinates over a suitably increasing number of frequencies in a neighborhood of λ, we can indeed construct consistent spectral density estimators as shown in the following section.
§10.4 Smoothing the Periodogram

Let {X_t} be the linear process

    X_t = Σ_{j=−∞}^∞ ψ_j Z_{t−j},    {Z_t} ~ IID(0, σ^2),

where Σ_{j=−∞}^∞ |ψ_j| |j|^{1/2} < ∞. If I_n(ω_j), j ∈ F_n, is the periodogram based on X_1, ..., X_n, then we may write

    (2π)^{-1} I_n(ω_j) = f(ω_j) + f(ω_j)V_j,    j = 1, ..., [(n − 1)/2],

where f is defined by (10.3.1) and the sequence {V_j} (by Theorem 10.3.2) is approximately WN(0, 1) for large n. In other words, we may think of (2π)^{-1}I_n(ω_j), j = 1, ..., [(n − 1)/2], as an uncorrelated time series with a trend f(ω) which we wish to estimate. The considerations of Section 1.4 suggest estimating f(ω_j) by smoothing the series {I_n(ω_j)}, with the aid, for example, of the simple moving average filter,

    f̂(ω_j) = (2π)^{-1}(2m + 1)^{-1} Σ_{|k|≤m} I_n(ω_{j+k}).

More generally we shall consider the class of estimators having the form

    f̂(ω_j) = (2π)^{-1} Σ_{|k|≤m_n} W_n(k) I_n(ω_{j+k}),            (10.4.1)
where {m_n} is a sequence of positive integers and {W_n(·)} is a sequence of weight functions. For notational simplicity we shall write m for m_n, the dependence on n being understood. In order for this estimate of the spectral density to be consistent (see Theorem 10.4.1 below) we impose the following conditions on m and {W_n(·)}:

    m → ∞ and m/n → 0 as n → ∞,                                   (10.4.2)
    W_n(k) = W_n(−k) ≥ 0  for all k,                              (10.4.3)
    Σ_{|k|≤m} W_n(k) = 1,                                         (10.4.4)

and

    Σ_{|k|≤m} W_n^2(k) → 0 as n → ∞.                              (10.4.5)

If ω_{j+k} ∉ [−π, π] the term I_n(ω_{j+k}) in (10.4.1) is evaluated by defining I_n to have period 2π. The same convention will be used to define f̂(ω), ω ∉ [−π, π]. We shall refer to the set of weights {W_n(k), |k| ≤ m} as a filter.
Definition 10.4.1 (Discrete Spectral Average Estimator). The estimator

    f̂(ω) := f̂(g(n, ω)),    ω ∈ [−π, π],

with f̂(ω_j) defined by (10.4.1) and m and {W_n(·)} satisfying (10.4.2)-(10.4.5), is called a discrete spectral average estimator of f(ω).
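A minimal computational sketch of the estimator (10.4.1) follows (Python with NumPy assumed; the function name is ours, and the uniform weights in the example are chosen only for illustration).

```python
import numpy as np

def spectral_average(x, weights):
    """Discrete spectral average estimate f_hat(w_j), j = 1, ..., [(n-1)/2], as in
    (10.4.1). `weights` = (W_n(-m), ..., W_n(m)); symmetric, non-negative, sum 1."""
    x = np.asarray(x, float)
    weights = np.asarray(weights, float)
    n = len(x)
    t = np.arange(1, n + 1)
    q = (n - 1) // 2
    # periodogram at the Fourier frequencies 2*pi*k/n, k = 0, ..., n-1 (period 2*pi)
    I = np.array([np.abs(np.sum(x * np.exp(-2j * np.pi * k * t / n))) ** 2 / n
                  for k in range(n)])
    m = (len(weights) - 1) // 2
    fhat = np.empty(q)
    for j in range(1, q + 1):
        idx = (j + np.arange(-m, m + 1)) % n        # periodic extension of I_n
        fhat[j - 1] = weights @ I[idx] / (2 * np.pi)
    # Note: I[0] involves the sample mean; Remark 2 below describes the usual
    # modification of (10.4.1) near frequency zero for mean-corrected data.
    return fhat

# Example: 160 MA(1) observations as in Example 10.4.2, uniform weights with m = 4.
rng = np.random.default_rng(5)
z = rng.normal(size=161)
x = z[1:] - 0.6 * z[:-1]
fhat = spectral_average(x, np.full(9, 1 / 9))
```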
The consistency of discrete spectral average estimators is established in the proof of the following theorem.

Theorem 10.4.1. Let {X_t} be the linear process,

    X_t = Σ_{j=−∞}^∞ ψ_j Z_{t−j},    {Z_t} ~ IID(0, σ^2),

with Σ_{j=−∞}^∞ |ψ_j| |j|^{1/2} < ∞ and EZ_1^4 < ∞. If f̂ is a discrete spectral average estimator of the spectral density f, then for λ, ω ∈ [0, π],

(a) lim_{n→∞} Ef̂(ω) = f(ω)

and

(b) lim_{n→∞} (Σ_{|j|≤m} W_n^2(j))^{-1} Cov(f̂(ω), f̂(λ)) = 2f^2(ω)    if ω = λ = 0 or π,
                                                         = f^2(ω)     if 0 < ω = λ < π,
                                                         = 0          if ω ≠ λ.

PROOF. (a) From (10.4.1) we have

    |Ef̂(ω) − f(ω)| = | Σ_{|k|≤m} W_n(k) [(2π)^{-1} EI_n(g(n, ω) + ω_k) − f(g(n, ω) + ω_k)
                       + f(g(n, ω) + ω_k) − f(ω)] |.              (10.4.6)
The restriction (10.4.2) on m implies that

    max_{|k|≤m} |g(n, ω) + ω_k − ω| → 0  as n → ∞.

For any given ε > 0, this implies by the continuity of f that

    max_{|k|≤m} |f(g(n, ω) + ω_k) − f(ω)| ≤ ε/2,

for n sufficiently large. Moreover, by Proposition 10.3.1,

    max_{|k|≤m} |(2π)^{-1} EI_n(g(n, ω) + ω_k) − f(g(n, ω) + ω_k)| < ε/2,

for n sufficiently large. Noting that Σ_{|k|≤m} W_n(k) = 1, we see from (10.4.6) that |Ef̂(ω) − f(ω)| ≤ ε for n sufficiently large. Since ε is arbitrary, this implies that Ef̂(ω) → f(ω).

(b) From the definition of f̂ we have

    Cov(f̂(ω), f̂(λ)) = (2π)^{-2} Σ_{|j|≤m} Σ_{|k|≤m} W_n(j) W_n(k) Cov(I_n(g(n, ω) + ω_j), I_n(g(n, λ) + ω_k)).

If ω ≠ λ and n is sufficiently large, then g(n, ω) + ω_j ≠ g(n, λ) + ω_k for all |j|, |k| ≤ m. Hence, with c_2 as defined in Theorem 10.3.2,

    |Cov(f̂(ω), f̂(λ))| = | Σ_{|j|≤m} Σ_{|k|≤m} W_n(j) W_n(k) O(n^{-1}) |
                       ≤ c_2 n^{-1} (Σ_{|j|≤m} W_n(j))^2
                       ≤ c_2 n^{-1} (Σ_{|j|≤m} W_n^2(j)) (2m + 1).

Since m/n → 0, this proves assertion (b) in the case ω ≠ λ.

Now suppose that 0 < ω = λ < π. Then by Theorem 10.3.2,

    Var(f̂(ω)) = (2π)^{-2} Σ_{|j|≤m} W_n^2(j) ((2π)^2 f^2(g(n, ω) + ω_j) + O(n^{-1/2}))
               + (2π)^{-2} Σ_{|j|≤m} Σ_{|k|≤m, k≠j} W_n(j) W_n(k) O(n^{-1}).

An argument similar to that used in the proof of (a) shows that the first term is equal to

    (Σ_{|j|≤m} W_n^2(j)) (f^2(ω) + o(1)).

The second term is bounded by

    c_2 n^{-1} (2π)^{-2} (Σ_{|j|≤m} W_n(j))^2 ≤ c_2 n^{-1} (2π)^{-2} (Σ_{|j|≤m} W_n^2(j)) (2m + 1).
Consequently

    (Σ_{|j|≤m} W_n^2(j))^{-1} Var(f̂(ω)) → f^2(ω).

The remaining cases ω = λ = 0 or π are handled in a similar fashion.  □

Remark 1. The assumption Σ_{|k|≤m} W_n^2(k) → 0 ensures that Var(f̂(ω)) → 0. Since Ef̂(ω) → f(ω), this implies that the estimator f̂(ω) is mean-square consistent for f(ω). A slight modification of the proof of Theorem 10.4.1 shows in fact that

    sup_{−π≤ω≤π} |Ef̂(ω) − f(ω)| → 0

and

    sup_{−π≤ω≤π} Var(f̂(ω)) → 0.

Hence f̂ converges in mean square to f uniformly on [−π, π], i.e.

    sup_{−π≤ω≤π} E|f̂(ω) − f(ω)|^2 = sup_{−π≤ω≤π} (Var(f̂(ω)) + |Ef̂(ω) − f(ω)|^2) → 0.
Remark 2. Theorem 10.4.1 refers to a zero-mean process {X_t}. In practice we deal with processes {Y_t} having unknown mean μ. The periodogram is then usually computed for the mean-corrected series {Y_t − Ȳ}, where Ȳ is the sample mean. The periodograms of {Y_t}, {Y_t − μ} and {Y_t − Ȳ} are all identical at the non-zero Fourier frequencies but not at frequency zero. In order to estimate f(0) we therefore ignore the value of the periodogram at frequency 0 and use a slightly modified form of (10.4.1), namely

    f̂(0) = (2π)^{-1} [W_n(0) I_n(ω_1) + 2 Σ_{k=1}^m W_n(k) I_n(ω_{k+1})].     (10.4.7)

Moreover, whenever I_n(0) appears in the moving averages (10.4.1) for f̂(ω_j), j = 1, ..., [n/2], we replace it by 2πf̂(0) as defined in (10.4.7).
EXAMPLE 10.4.1. For the simple moving average estimator,

    W_n(k) = (2m + 1)^{-1}    if |k| ≤ m,
           = 0                 otherwise,

we have Σ_{|k|≤m} W_n^2(k) = (2m + 1)^{-1}, so that

    (2m + 1) Var(f̂(ω)) → 2f^2(ω)    if ω = 0 or π,
                        → f^2(ω)     if 0 < ω < π.
In choosing a weight function it is necessary to compromise between bias
and variance of the spectral estimator. A weight function which assigns
roughly equal weights to a broad band of frequencies will produce an estimate
of f( · ) which, although smooth, may have a large bias, since the estimate of
f(w) depends on values of /" at frequencies distant from w. On the other hand
a weight function which assigns most of its weight to a narrow frequency band
centered at zero will give an estimator with relatively small bias, but with a
large variance. In practice it is advisable to experiment with a range of weight
functions and to select the one which appears to strike a satisfactory balance
between bias and variance.
EXAMPLE 10.4.2. The periodogram of 160 observations generated from the MA(1) process X_t = Z_t − .6Z_{t−1}, {Z_t} ~ WN(0, 1), is displayed in Figure 10.3. Figure 10.4 shows the result of using program SPEC to apply the filter {1/3, 1/3, 1/3} (W_n(k) = (2m + 1)^{-1}, |k| ≤ m = 1). As expected with such a small value of m, not much smoothing of the periodogram has occurred. Next we use a more dispersed set of weights, W_n(0) = W_n(1) = W_n(2) = 3/21, W_n(3) = 2/21, W_n(4) = 1/21, producing the smoother spectral estimate shown in Figure 10.5. This particular weight function is obtained by successive application of the filters {1/3, 1/3, 1/3} and {1/7, 1/7, 1/7, 1/7, 1/7, 1/7, 1/7} to the periodogram. Thus the estimates in Figure 10.5 (except for the end-values) are obtained by applying the filter {1/7, 1/7, 1/7, 1/7, 1/7, 1/7, 1/7} to the estimated spectral density in Figure 10.4. Applying a third filter {1/11, 1/11, ..., 1/11} to the estimate in Figure 10.5 we obtain the still smoother spectral density estimate shown in Figure 10.6. The weight function resulting from successive application of the three filters is shown in the inset of Figure 10.6. Its weights (multiplied by 231) are {1, 3, 6, 9, 12, 15, 18, 20, 21, 21, 21, 20, 18, 15, 12, 9, 6, 3, 1}. Except for the peak at frequency ω_75, the estimate in Figure 10.6 has the same general form as the true spectral density. We shall see in Section 10.5 that the errors are in fact not large compared with their approximate standard deviations.
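The combined weight function quoted above can be checked by convolving the three uniform filters, since successive smoothing by two filters is (apart from end effects) equivalent to smoothing once by their convolution. A short sketch (Python/NumPy, illustrative only):

```python
import numpy as np

# Successive application of the filters {1/3,...}, {1/7,...} and {1/11,...} to the
# periodogram corresponds to a single filter given by their convolution.
f3 = np.full(3, 1 / 3)
f7 = np.full(7, 1 / 7)
f11 = np.full(11, 1 / 11)
w = np.convolve(np.convolve(f3, f7), f11)      # combined weight function
print(np.round(w * 231).astype(int))
# -> [ 1  3  6  9 12 15 18 20 21 21 21 20 18 15 12  9  6  3  1]
print(np.isclose(w.sum(), 1.0))                # weights still sum to one, cf. (10.4.4)
```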
EXAMPLE 10.4.3 (The Wolfer Sunspot Numbers). The periodogram for the Wolfer sunspot numbers of Example 1.1.5 is shown in Figure 10.7. Inspecting this graph we notice one main peak at frequency ω_10 = 2π(.1) (corresponding to a ten-year cycle) and a possible secondary peak at ω = ω_12. In Figure 10.8, the periodogram has been smoothed using the weight function W_n(0) = W_n(1) = W_n(2) = 3/21, W_n(3) = 2/21 and W_n(4) = 1/21, which is obtained by successive application of the two filters {1/3, 1/3, 1/3} and {1/7, 1/7, 1/7, 1/7, 1/7, 1/7, 1/7} to the periodogram. In Section 10.6 we shall examine some alternative spectral density estimates for the Wolfer sunspot numbers.
Lag Window Estimators. The spectral density f is often estimated by a function of the form,

    f̂_L(ω) = (2π)^{-1} Σ_{|h|≤r} w(h/r) γ̂(h) e^{-ihω},           (10.4.8)

where γ̂(·) is the sample autocovariance function and w(x) is an even, piecewise
[Figure 10.3. The periodogram I_160(2πc), 0 < c ≤ 0.5, of the simulated MA(1) series of Example 10.4.2.]

[Figure 10.4. The spectral estimate f̂(2πc), 0 ≤ c ≤ 0.5, of Example 10.4.2, obtained with the weights {1/3, 1/3, 1/3}.]

[Figure 10.5. The spectral estimate f̂(2πc), 0 ≤ c ≤ 0.5, of Example 10.4.2, obtained with the inset weight function.]

[Figure 10.6. The spectral estimate f̂(2πc), 0 ≤ c ≤ 0.5, of Example 10.4.2, obtained with the inset weight function.]

[Figure 10.7. The periodogram I_100(2πc), 0 < c ≤ 0.5, of the Wolfer sunspot numbers.]

[Figure 10.8. The spectral estimate f̂(2πc), 0 < c ≤ 0.5, of the Wolfer sunspot numbers, obtained with the same weight function as Figure 10.5.]
continuous function of x satisfying the conditions,

    w(0) = 1,    |w(x)| ≤ 1 for all x,    and    w(x) = 0 for |x| > 1.
The function w(·) is called the lag window, and the corresponding estimator f̂_L is called the lag window spectral density estimator. By setting w(x) = 1, |x| ≤ 1, and r = n, we obtain 2πf̂_L(ω) = I_n(ω) for all Fourier frequencies ω = ω_j ≠ 0. However if we assume that r = r_n is a function of n such that r → ∞ and r/n → 0 as n → ∞, then f̂_L is a sum of (2r + 1) terms, each with a variance which is O(n^{-1}). If {r_n} satisfies these conditions and {X_t} satisfies the conditions of Theorem 10.4.1, then it can be shown that f̂_L(ω) is in fact a mean-square consistent estimator of f(ω).
Although the estimator f̂_L(ω) and the discrete spectral average estimator f̂(ω) defined by (10.4.1) appear to be quite different, it is possible to approximate a given lag window estimator by a corresponding average of periodogram ordinates. In order to do this define a spectral window,

    W(ω) = (2π)^{-1} Σ_{|h|≤r} w(h/r) e^{-ihω},                   (10.4.9)

and an extension of the periodogram,

    Ĩ_n(λ) = Σ_{|h|<n} γ̂(h) e^{-ihλ},    λ ∈ [−π, π].

Then Ĩ_n coincides with the periodogram I_n at the non-zero Fourier frequencies 2πj/n and moreover,

    γ̂(h) = (2π)^{-1} ∫_{−π}^π e^{ihλ} Ĩ_n(λ) dλ.

Substituting this expression into (10.4.8) we get

    f̂_L(ω) = (2π)^{-2} Σ_{|h|≤r} w(h/r) ∫_{−π}^π e^{-ih(ω−λ)} Ĩ_n(λ) dλ
            = (2π)^{-2} ∫_{−π}^π (Σ_{|h|≤r} w(h/r) e^{-ih(ω−λ)}) Ĩ_n(λ) dλ
            = (2π)^{-1} ∫_{−π}^π W(ω − λ) Ĩ_n(λ) dλ
            = (2π)^{-1} ∫_{−π}^π W(λ) Ĩ_n(ω + λ) dλ.

Partitioning the interval [−π, π] at the Fourier frequencies and replacing the last integral by the corresponding Riemann sum, we obtain
    f̂_L(ω) ≈ (2π)^{-1} Σ_{|j|≤[n/2]} W(ω_j) Ĩ_n(ω + ω_j) 2π/n
            ≈ (2π)^{-1} Σ_{|j|≤[n/2]} W(ω_j) I_n(g(n, ω) + ω_j) 2π/n.

Thus we have approximated f̂_L(ω) by a discrete spectral average with weights

    W_n(j) = 2πW(ω_j)/n,    |j| ≤ [n/2].                          (10.4.10)

(Notice that the approximating spectral average does not necessarily satisfy the constraints (10.4.2)-(10.4.5) imposed earlier.)
From (10.4.10) we have

    Σ_{|j|≤[n/2]} W_n^2(j) = (2π)^2 Σ_{|j|≤[n/2]} W^2(ω_j)/n^2
                           ≈ (2π/n) ∫_{−π}^π W^2(ω) dω
                           = n^{-1} Σ_{|h|≤r} w^2(h/r)        (by (10.4.9))
                           ≈ (r/n) ∫_{−1}^1 w^2(x) dx.
Although the approximating spectral average does not satisfy the conditions of Theorem 10.4.1, the conclusion of the theorem suggests that as n → ∞,

    (n/r) Var(f̂_L(ω)) → 2f^2(ω) ∫_{−1}^1 w^2(x) dx    if ω = 0 or π,
                       → f^2(ω) ∫_{−1}^1 w^2(x) dx     if 0 < ω < π.     (10.4.11)

If {X_t} satisfies the conditions of Theorem 10.4.1 and if {r_n} satisfies the conditions r_n → ∞ and r_n/n → 0 as n → ∞, then (10.4.11) is in fact true and Ef̂_L(ω) → f(ω) for 0 ≤ ω ≤ π. Proofs of these results and further discussion of f̂_L(ω) can be found in the books of Anderson (1971), Brillinger (1981) and Hannan (1970).
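A sketch of the lag window estimator (10.4.8) follows (Python with NumPy assumed; the Bartlett window of Example 2 below is used purely for illustration, and the function name is ours).

```python
import numpy as np

def lag_window_estimate(x, r, w):
    """Lag window estimate (10.4.8) of f on a grid of frequencies in [0, pi].
    `w` is the lag window w(x) (even, w(0) = 1, w(x) = 0 for |x| > 1)."""
    x = np.asarray(x, float)
    n, mean = len(x), np.mean(x)
    gamma = np.array([np.sum((x[h:] - mean) * (x[:n - h] - mean)) / n
                      for h in range(r + 1)])          # sample autocovariances
    freqs = np.linspace(0, np.pi, 200)
    h = np.arange(1, r + 1)
    weights = np.array([w(k / r) for k in h])
    # f_L(w) = (2*pi)^{-1} [gamma(0) + 2 * sum_{h=1}^{r} w(h/r) gamma(h) cos(h*w)]
    fL = (gamma[0] + 2 * (weights * gamma[1:]) @ np.cos(np.outer(h, freqs))) / (2 * np.pi)
    return freqs, fL

bartlett = lambda u: max(1.0 - abs(u), 0.0)            # triangular lag window
rng = np.random.default_rng(6)
z = rng.normal(size=401)
x = z[1:] - 0.6 * z[:-1]                               # MA(1) data, as in Example 10.4.2
freqs, fL = lag_window_estimate(x, r=20, w=bartlett)
```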
Examples. We conclude this section by listing some commonly used lag windows and the corresponding spectral windows W(·) as defined by (10.4.9).

EXAMPLE 1 (The Rectangular or Truncated Window). This window has the form

    w(x) = 1    if |x| ≤ 1,
         = 0    otherwise,

and the corresponding spectral window is given by the Dirichlet kernel (see Figure 2.2),

    W(ω) = (2π)^{-1} sin((r + 1/2)ω) / sin(ω/2).                  (10.4.12)

Observe that W(ω) is negative for certain values of ω. This may lead to negative estimates of the spectral density at certain frequencies. From (10.4.11) we have, as n → ∞,

    Var(f̂_L(ω)) ≈ (2r/n) f^2(ω)    for 0 < ω < π.
EXAMPLE 2 (The Bartlett or Triangular Window). In this case

    w(x) = 1 − |x|    if |x| ≤ 1,
         = 0          if |x| > 1,

and the corresponding spectral window is given by the Fejér kernel (see Figure 2.3),

    W(ω) = (2πr)^{-1} sin^2(rω/2) / sin^2(ω/2).

Since W(ω) ≥ 0, this window always gives non-negative spectral density estimates. Moreover, as n → ∞,

    Var(f̂_L(ω)) ≈ (2r/(3n)) f^2(ω),    0 < ω < π.

The asymptotic variance is thus smaller than that of the rectangular lag window estimator using the same sequence {r_n}.
EXAMPLE 3 (The Daniell Window). From (10.4.10) we see that the spectral window,

    W(ω) = r/(2π)    if |ω| ≤ π/r,
         = 0          otherwise,

corresponds to the discrete spectral average estimator with weights

    W_n(j) = (2m + 1)^{-1},    |j| ≤ m = [n/(2r)].

From (10.4.9) we find that the lag window corresponding to W(ω) is

    w(h/r) = ∫_{−π}^π W(ω) e^{ihω} dω = π^{-1}(r/h) sin(πh/r),

i.e.

    w(x) = sin(πx)/(πx),    −1 ≤ x ≤ 1.

The corresponding lag window estimator has asymptotic variance

    Var(f̂_L(ω)) ≈ (r/n) f^2(ω),    0 < ω < π.
EXAMPLE 4 (The Blackman-Tukey Window). This lag window has the general form

    w(x) = 1 − 2a + 2a cos(πx),    |x| ≤ 1,
         = 0,                       otherwise,

with corresponding spectral window,

    W(ω) = aD_r(ω − π/r) + (1 − 2a)D_r(ω) + aD_r(ω + π/r),

where D_r is the Dirichlet kernel (10.4.12). The asymptotic variance of the corresponding density estimator is

    Var(f̂_L(ω)) ≈ (2r/n)(1 − 4a + 6a^2) f^2(ω),    0 < ω < π.

The Blackman-Tukey windows with a = .23 and a = .25 are often referred to as the Tukey-Hamming and Tukey-Hanning windows respectively.

EXAMPLE 5 (The Parzen Window). This lag window is defined to be

    w(x) = 1 − 6|x|^2 + 6|x|^3,    |x| < 1/2,
         = 2(1 − |x|)^3,            1/2 ≤ |x| ≤ 1,
         = 0,                       otherwise,

with approximate spectral window,

    W(ω) = (6/(πr^3)) sin^4(rω/4) / sin^4(ω/2).

The asymptotic variance of the spectral density estimator is

    Var(f̂_L(ω)) ≈ .539 r f^2(ω)/n,    0 < ω < π.
Comparison of Lag-Window Estimators. Lag-window estimators may be compared by examining the spectral windows when the values of r for the different estimators are chosen in such a way that the estimators have the same asymptotic variance. Thus to compare the Bartlett and Daniell estimators we plot the spectral windows

    W_B(ω) = (2πr)^{-1} sin^2(rω/2)/sin^2(ω/2)   and   W_D(ω) = r'/(2π), |ω| ≤ π/r',     (10.4.13)

where r' = 2r/3. Inspection of the graphs (Problem 10.18) reveals that the mass of the window W_B is spread over a broader frequency interval and has secondary peaks or "side-lobes" at some distance from the centre. This means that the Bartlett estimator with the same asymptotic variance as the Daniell estimator is liable (depending on the spectral density being estimated) to exhibit greater bias. For other factors affecting the choice of an appropriate lag window, see Priestley (1981).

The width of the rectangular spectral window which leads to the same asymptotic variance as a given lag-window estimator is sometimes called the bandwidth of the given estimator. For example the Bartlett estimator with parameter r has bandwidth 2π/r' = 3π/r.
§10.5 Confidence Intervals for the Spectrum

In this section we provide two approximations to the distribution of the discrete spectral average estimator f̂(ω) from which confidence intervals for the spectral density f(ω) can be constructed. Assume that {X_t} satisfies the conditions of Theorem 10.4.1 (i.e. X_t = Σ_{j=−∞}^∞ ψ_j Z_{t−j}, Σ|ψ_j||j|^{1/2} < ∞, {Z_t} ~ IID(0, σ^2) and EZ_1^4 < ∞) and that f̂ is the discrete spectral average

    f̂(ω_j) = (2π)^{-1} Σ_{|k|≤m} W_n(k) I_n(ω_j + ω_k).           (10.5.1)
The χ^2 Approximation. By Theorem 10.3.2, the random variables I_n(ω_j + ω_k)/(πf(ω_j + ω_k)), −j < k < n/2 − j, are approximately independent and distributed as chi-squared with 2 degrees of freedom. This suggests approximating the distribution of f̂(ω_j) by the distribution of the corresponding linear combination of independent and identically distributed χ^2(2) random variables. However, as advocated by Tukey (1949), this distribution may in turn be approximated by the distribution of cY, where c is a constant, Y ~ χ^2(ν) and c and ν are found by the method of moments, i.e. by setting the mean and variance of cY equal to the asymptotic mean and variance of f̂(ω_j). This procedure gives the equations

    cν = f(ω_j),
    2c^2ν = Σ_{|k|≤m} W_n^2(k) f^2(ω_j),

from which we find that c = Σ_{|k|≤m} W_n^2(k) f(ω_j)/2 and ν = 2/(Σ_{|k|≤m} W_n^2(k)). The number ν is called the equivalent degrees of freedom of the estimator f̂. The distribution of νf̂(ω_j)/f(ω_j) is thus approximated by the chi-squared distribution with ν degrees of freedom, and the interval

    (νf̂(ω_j)/χ^2_{.975}(ν), νf̂(ω_j)/χ^2_{.025}(ν)),    0 < ω_j < π,     (10.5.2)

is an approximate 95% confidence interval for f(ω_j). By taking logarithms in (10.5.2) we obtain the 95% confidence interval

    (ln f̂(ω_j) + ln ν − ln χ^2_{.975}(ν), ln f̂(ω_j) + ln ν − ln χ^2_{.025}(ν)),     (10.5.3)

for ln f(ω_j). This interval, unlike (10.5.2), has the same width for each ω_j ∈ (0, π).
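The equivalent degrees of freedom and the interval (10.5.2) are easy to compute. The following is an illustrative sketch (Python with NumPy and SciPy assumed; the function name is ours), not a prescription from the text.

```python
import numpy as np
from scipy.stats import chi2

def chi2_interval(fhat, weights, level=0.95):
    """Approximate confidence interval (10.5.2) for f(w_j), given the estimate
    fhat = f_hat(w_j) and the weights (W_n(-m), ..., W_n(m)) of (10.5.1)."""
    nu = 2.0 / np.sum(np.asarray(weights, float) ** 2)   # equivalent degrees of freedom
    a = (1.0 - level) / 2.0
    return nu * fhat / chi2.ppf(1.0 - a, nu), nu * fhat / chi2.ppf(a, nu)

# For the weight function of Example 10.4.2, sum_k W_n(k)^2 = .07052, so that
nu = 2.0 / 0.07052                                       # about 28.4
print(np.log(nu) - np.log(chi2.ppf(0.975, nu)),          # half-widths of the log-interval
      np.log(nu) - np.log(chi2.ppf(0.025, nu)))          # (10.5.3); compare (10.5.4)
```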
[Figure 10.9. 95% confidence intervals for ln(2πf(2πc)) based on the spectral estimates of Figure 10.6 and a χ^2 approximation. The true function is also shown.]

In Figure 10.9 we have plotted the confidence intervals (10.5.3) for the data of Example 10.4.2 using the spectral estimate displayed in Figure 10.6. Using
the weights specified in Example 10.4.2 we find that Σ_{|k|≤m} W_n^2(k) = .07052 and ν = 28.36, so that (10.5.3) reduces to the interval

    C_{ω_j} = (ln f̂(ω_j) − .450, ln f̂(ω_j) + .617).               (10.5.4)

Notice that this is a confidence interval for ln f(ω_j) only, and the intervals {C_{ω_j}, 0 < ω_j < π} are not to be interpreted as simultaneous 95% confidence intervals for {ln f(ω_j), 0 < ω_j < π}. The probability that C_{ω_j} contains ln f(ω_j) for all ω_j ∈ (0, π) is less than .95. However we would expect the intervals C_{ω_j} to include ln f(ω_j) for approximately 95% of the frequencies ω_j ∈ (0, π). As can be seen in Figure 10.9, the true log spectral density lies well within the confidence interval (10.5.4) for all frequencies.
The Normal Approximation. There are two intuitive justifications for making a normal approximation to the distribution of f̂(ω_j). The first is that if the equivalent number of degrees of freedom ν is large (i.e. if Σ_{|k|≤m} W_n^2(k) is small) and if Y is distributed as χ^2(ν), then the distribution of cY can be well approximated by a normal distribution with mean cν = f(ω_j) and variance 2c^2ν = Σ_{|k|≤m} W_n^2(k) f^2(ω_j), 0 < ω_j < π. The second is that we may approximate f̂(ω_j) for n large by a sum of (2m + 1) independent random variables which, by the Lindeberg condition, is AN(f(ω_j), Σ_{|k|≤m} W_n^2(k) f^2(ω_j)). Both points of view lead to the approximation N(f(ω_j), Σ_{|k|≤m} W_n^2(k) f^2(ω_j)) for the distribution of f̂(ω_j).

[Figure 10.10. 95% confidence intervals for ln(2πf(2πc)) based on the spectral estimates of Figure 10.6 and a normal approximation. The true function is also shown.]

Using this approximation we obtain the approximate 95% confidence bounds,

    f̂(ω_j) ± 1.96 (Σ_{|k|≤m} W_n^2(k))^{1/2} f̂(ω_j),

for f(ω_j). Since the width of the confidence interval depends on f̂(ω_j), it is customary to construct a confidence interval for ln f(ω_j). The normal approximation to f̂(ω_j) implies that ln f̂(ω_j) is AN(ln f(ω_j), Σ_{|k|≤m} W_n^2(k)) by Proposition 6.4.1. Approximate 95% confidence bounds for ln f(ω_j) are therefore given by

    ln f̂(ω_j) ± 1.96 (Σ_{|k|≤m} W_n^2(k))^{1/2}.                  (10.5.5)

For the spectral estimate shown in Figure 10.6, we have Σ_{|k|≤m} W_n^2(k) = .07052, so that the bounds (10.5.5) become

    ln f̂(ω_j) ± .520.                                             (10.5.6)
These bounds are plotted in Figure 10.10. The width of the intervals (10.5.4) based on the χ^2 approximation is very close to the width of the intervals (10.5.6) based on the normal approximation. However the normal intervals are centered at ln f̂(ω_j) and are therefore located below the χ^2 intervals. This can be seen in Figure 10.10, where the spectral density barely touches the upper limit of the confidence interval. For values of ν ≥ 20, there is very little difference between the two approximations.
§10.6 Autoregressive, Maximum Entropy, Moving Average and Maximum Likelihood ARMA Spectral Estimators

The m-th order autoregressive estimator f̂_m(ω) of the spectral density of a stationary time series {X_t} is the spectral density of the autoregressive process {Y_t} defined by

    Y_t − φ̂_{m1}Y_{t−1} − ··· − φ̂_{mm}Y_{t−m} = Z_t,    {Z_t} ~ WN(0, v̂_m),     (10.6.1)

where φ̂_m = (φ̂_{m1}, ..., φ̂_{mm})' and v̂_m are the Yule-Walker estimators defined by (8.2.2) and (8.2.3). These estimators can easily be computed recursively using Proposition 8.2.1. Then γ̂_Y(h) = γ̂(h), h = 0, ±1, ..., ±m (see Section 8.1), and

    f̂_m(ω) = (v̂_m/2π) |1 − φ̂_{m1}e^{-iω} − ··· − φ̂_{mm}e^{-imω}|^{-2}.     (10.6.2)

The choice of m for which the approximating AR(m) process "best" represents the data can be made by minimizing AICC(φ̂_m) as defined by (9.3.4). Alternatively the CAT statistic of Parzen (1974) can be minimized. This quantity is defined for m = 1, 2, ..., by

    CAT(m) = n^{-1} Σ_{j=1}^m ṽ_j^{-1} − ṽ_m^{-1},

and for m = 0 by

    CAT(0) = −1 − n^{-1},

where ṽ_j = (n/(n − j)) v̂_j, j = 1, 2, .... We shall use AICC for choosing m.
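A compact sketch of the Yule-Walker autoregressive estimator (10.6.2) follows (Python with NumPy assumed; the Yule-Walker equations are solved directly rather than recursively via Proposition 8.2.1, and order selection by AICC is omitted for brevity).

```python
import numpy as np

def ar_spectral_estimate(x, m, freqs):
    """Yule-Walker AR(m) spectral density estimate (10.6.2) at the given frequencies."""
    x = np.asarray(x, float)
    n, mean = len(x), np.mean(x)
    gamma = np.array([np.sum((x[h:] - mean) * (x[:n - h] - mean)) / n
                      for h in range(m + 1)])          # sample autocovariances
    R = np.array([[gamma[abs(i - j)] for j in range(m)] for i in range(m)])
    phi = np.linalg.solve(R, gamma[1:])                # Yule-Walker equations
    v = gamma[0] - phi @ gamma[1:]                     # white noise variance v_m
    j = np.arange(1, m + 1)
    transfer = 1 - np.exp(-1j * np.outer(freqs, j)) @ phi
    return v / (2 * np.pi * np.abs(transfer) ** 2)

# Example: an AR(1) sample path with phi = .7 (purely illustrative).
rng = np.random.default_rng(7)
z = rng.normal(size=300)
x = np.empty(300)
x[0] = z[0]
for t in range(1, 300):
    x[t] = 0.7 * x[t - 1] + z[t]
freqs = np.linspace(0.01, np.pi, 200)
fhat = ar_spectral_estimate(x, m=3, freqs=freqs)
```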
The m-th order autoregressive estimator f̂_m(ω) defined by (10.6.2) is the same as the maximum entropy estimator, i.e. the spectral density f̂ which maximizes the entropy,

    E = ∫_{−π}^π ln g(λ) dλ,

over the class of all densities g which satisfy the constraints,

    ∫_{−π}^π e^{iλh} g(λ) dλ = γ̂(h),    h = 0, ±1, ..., ±m.       (10.6.3)
To show this, let {W_t} be any zero-mean stationary process with spectral density g satisfying (10.6.3), and let Ŵ_{t+1} = P_{sp̄{W_j, −∞<j≤t}} W_{t+1}. Then by Kolmogorov's formula (5.8.1),

    E(W_{t+1} − Ŵ_{t+1})^2 = 2π exp{ (2π)^{-1} ∫_{−π}^π ln g(λ) dλ }.

Now for any sequence a_1, ..., a_m ∈ R,

    E(W_{t+1} − Ŵ_{t+1})^2 ≤ E(W_{t+1} − a_1 W_t − ··· − a_m W_{t+1−m})^2
                           = E(Y_{t+1} − a_1 Y_t − ··· − a_m Y_{t+1−m})^2,

where {Y_t} is the AR(m) process (10.6.1), since {Y_t} and {W_t} both have autocovariances γ̂(j), 0 ≤ j ≤ m. Setting a_j = φ̂_{mj}, j = 1, ..., m, in the last expression and using Kolmogorov's formula for the process {Y_t}, we obtain the inequality

    2π exp{ (2π)^{-1} ∫_{−π}^π ln g(λ) dλ } ≤ 2π exp{ (2π)^{-1} ∫_{−π}^π ln f̂_m(λ) dλ },

as required. The idea of maximum entropy spectral estimation is due to Burg (1967). Burg's estimates φ̂_{m1}, ..., φ̂_{mm} in (10.6.1) are however a little different from the Yule-Walker estimates.
The periodogram and the non-parametric window estimators discussed
in Section 10.4 are usually less regular in appearance than autoregressive
estimators. The non-parametric estimates are valuable for detecting strict
periodicities in the data (Section 10.2) and for revealing features of the data
which may be smoothed out by autoregressive estimation. The autoregressive
procedure however has a much more clearly defined criterion for selecting m
than the corresponding criteria to be considered in the selection of a spectral
window. In estimating a spectral density it is wise to examine both types
of density estimator. Parzen (1978) has also suggested that the cumulative
periodogram should be compared with the autoregressive estimate of the
spectral distribution function as an aid to autoregressive model selection in
the time domain.
In the definition (10.6.2) it is natural to consider replacing the Yule–Walker
estimates φ̂_m and v̂_m by the corresponding maximum likelihood estimates, with
m again chosen to minimize the AICC value. In fact there is no need to restrict
attention to autoregressive models, although these are convenient since φ̂_m is
asymptotically efficient for an AR(m) process and can be computed very
rapidly using Proposition 8.2.1. However there are processes, e.g. a first order
moving average with θ_1 near 1, for which autoregressive spectral estimation
performs poorly (see Example 10.6.2). To deal with cases of this kind we can
use the estimate suggested by Akaike (1974), i.e.

f̂(ω) = (σ̂²/2π) |1 + θ̂_1 e^{−iω} + ··· + θ̂_q e^{−iqω}|² / |1 − φ̂_1 e^{−iω} − ··· − φ̂_p e^{−ipω}|²,   (10.6.4)

where φ̂ = (φ̂_1, ..., φ̂_p)′, θ̂ = (θ̂_1, ..., θ̂_q)′ and σ̂² are maximum likelihood estimates
of an ARMA(p, q) process fitted to the data, with p and q chosen using
the AICC. We shall refer to the function f̂ as the maximum likelihood ARMA
(or MLARMA) spectral density estimate.

A simpler but less efficient estimator than (10.6.4), which is particularly
useful for processes whose MA(∞) representation has rapidly decaying
coefficients, is the moving average estimator (Brockwell and Davis (1988a))
given by

ĝ_m(ω) = (v̂_m/2π) |1 + θ̂_{m1} e^{−iω} + ··· + θ̂_{mm} e^{−imω}|²,   (10.6.5)

where θ̂_m = (θ̂_{m1}, ..., θ̂_{mm})′ and v̂_m are the innovation estimates discussed in
Section 8.3. Like the autoregressive estimator (10.6.2), ĝ_m(ω) can be calculated
very rapidly. The choice of m can again be made by minimizing the AICC
value. As is the case for the autoregressive estimator, there are processes for
which the moving average estimator performs poorly (e.g. an AR(1) process
with φ_1 near 1). The advantage of both estimators over the MLARMA estimator
(10.6.4) is the substantial reduction in computation time. Moreover, under
specified conditions on the growth of m with n, the asymptotic distributions
of the m-th order autoregressive and moving average spectral density estimators
can be determined for a large class of linear processes (see Berk (1974) and
Brockwell and Davis (1988a)).
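A corresponding sketch of the moving average estimator (10.6.5), obtained by applying the innovations algorithm of Section 8.3 to the sample autocovariances, is given below; the function name and arguments are illustrative, and no claim is made that this matches the ITSM implementation.

import numpy as np

def ma_spectral_estimate(x, m, freqs):
    """Sketch of the moving average spectral estimator (10.6.5): the
    coefficients theta_m1, ..., theta_mm and v_m are computed from the
    sample autocovariances by the innovations algorithm."""
    x = np.asarray(x, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    n = len(x)
    x = x - x.mean()
    gamma = np.array([np.dot(x[:n - h], x[h:]) / n for h in range(m + 1)])
    theta = np.zeros((m + 1, m + 1))     # theta[k, j] stores theta_{k,j}
    v = np.zeros(m + 1)
    v[0] = gamma[0]
    for k in range(1, m + 1):
        for j in range(k):
            # theta_{k, k-j} from the innovations recursion
            s = sum(theta[j, j - i] * theta[k, k - i] * v[i] for i in range(j))
            theta[k, k - j] = (gamma[k - j] - s) / v[j]
        v[k] = gamma[0] - sum(theta[k, k - i] ** 2 * v[i] for i in range(k))
    # MA(m) spectral density: v_m/(2*pi) * |1 + sum_j theta_mj e^{-ij w}|^2
    j = np.arange(1, m + 1)
    transfer = 1 + np.exp(-1j * np.outer(freqs, j)) @ theta[m, 1:m + 1]
    return v[m] / (2 * np.pi) * np.abs(transfer) ** 2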
EXAMPLE 10.6.1 (The Wolfer Sunspot Numbers). For the Wolfer sunspot
numbers of Example 10.4.3, the minimum AICC model for the mean-corrected
data was found to be

X_t − 1.475X_{t−1} + .937X_{t−2} − .218X_{t−3} + .134X_{t−9} = Z_t,   (10.6.6)

with {Z_t} ~ WN(0, 197.06) and AICC = 826.25.

The rescaled periodogram (2π)^{−1} I_{100}(2πc_j), c_j = 1/100, 2/100, ..., 50/100,
and the MLARMA estimator f̂(2πc), 0 ≤ c ≤ .50, i.e. the spectral density of
the process (10.6.6), are shown in Figure 10.11.

Figure 10.12 shows the autoregressive estimators f̂_3(2πc) (with
AICC = 836.97) and f̂_8(2πc) (with AICC = 839.54). The estimator f̂_3(2πc)
has the smallest AICC value. The estimator f̂_8(2πc), which corresponds to
the second smallest local minimum and fourth smallest overall AICC value,
has a more sharply defined peak (like the periodogram) at frequency ω_10.
Figure 10.11. (a) The rescaled periodogram (2π)^{−1} I_{100}(2πc), 0 < c ≤ 0.5, and (b) the
maximum likelihood ARMA estimate f̂(2πc), for the Wolfer sunspot numbers,
Example 10.6.1.
Figure 10.12. The autoregressive spectral density estimates (a) f̂_3(2πc) and (b) f̂_8(2πc)
for the Wolfer sunspot numbers.
Figure 10.13. The moving average spectral density estimate ĝ_13(2πc) for the Wolfer
sunspot numbers.
Observe that there is a close resemblance between f̂_3(2πc) and the non-parametric
estimate of Figure 10.8.

The moving average estimator with smallest AICC value (848.99) is
ĝ_13(2πc), shown in Figure 10.13.
EXAMPLE 10.6.2 (MA(1)). A series of 400 observations was generated using the
model

X_t = Z_t + Z_{t−1},   {Z_t} ~ WN(0, 1).   (10.6.7)

The spectral density of the process,

f(2πc) = |1 + e^{−i2πc}|²/(2π),   0 ≤ c ≤ 0.50,

and the rescaled periodogram (2π)^{−1} I_{400}(2πc_j), c_j = 1/400, 2/400, ..., 200/400,
of the data are shown in Figure 10.14. The data were mean-corrected.

Figure 10.15 shows the autoregressive estimator f̂_9(2πc) (with AICC =
1162.66) and the moving average estimator ĝ_6(2πc) (with AICC = 1152.06).
Maximum likelihood estimation gives the minimum AICC ARMA model,

X_t = Z_t + Z_{t−1},   {Z_t} ~ WN(0, .980),   (10.6.8)

with AICC = 1137.72. The MLARMA estimator of the spectral density is
therefore f̂(2πc) = .980 f(2πc), showing that f̂ and f are almost indistinguishable
in this example.
Figure 10.14. (a) The spectral density f(2πc) and (b) the rescaled periodogram of
the realization {X_1, ..., X_400} of the process X_t = Z_t + Z_{t−1}, {Z_t} ~ WN(0, 1), of
Example 10.6.2.
Figure 10.15. (a) The autoregressive spectral estimate f̂_9(2πc) and (b) the moving
average estimate ĝ_6(2πc) for the data of Example 10.6.2.
§10.7 The Fast Fourier Transform (FFT) Algorithm
A major factor in the rapid development of spectral analysis in the past twenty
years has been the availability of a very fast technique for computing the
discrete Fourier transform (and hence the periodogram) of long series. The
algorithm which makes this possible, the FFT algorithm, was developed by
Cooley and Tukey (1965) and Gentleman and Sande (1966), although some
of the underlying ideas can be traced back to the beginning of this century
(Cooley et al., 1967).
We first illustrate the use of the algorithm by examining the computational
savings achieved when the number of observations n can be factorized as

n = rs.   (10.7.1)

(The computational speed is increased still more if either r or s can be
factorized.) Instead of computing the transform as defined by (10.1.7), i.e.

a_j = n^{−1/2} Σ_{t=1}^{n} X_t e^{−iω_j t},

we shall compute the closely related transform,

b_j = Σ_{t=0}^{n−1} X_{t+1} e^{−2πijt/n},   0 ≤ j ≤ n − 1.   (10.7.2)

Then, by straightforward algebra,

a_j = n^{−1/2} e^{−iω_j} b_j,   0 ≤ j ≤ [n/2],
a_j = n^{−1/2} e^{−iω_j} b_{n+j},   −[(n − 1)/2] ≤ j ≤ −1.   (10.7.3)
Under the assumption (10.7.1), each t ∈ [0, n − 1] has a unique representation,

t = ru + v,   u ∈ {0, ..., s − 1},  v ∈ {0, ..., r − 1}.

Hence (10.7.2) can be rewritten as

b_j = Σ_{v=0}^{r−1} Σ_{u=0}^{s−1} X_{ru+v+1} exp[−2πij(ru + v)/n]
    = Σ_{v=0}^{r−1} exp(−2πijv/n) Σ_{u=0}^{s−1} X_{ru+v+1} exp(−2πiju/s),

i.e.

b_j = Σ_{v=0}^{r−1} exp(−2πijv/n) b_{j,v},   (10.7.4)

where {b_{j,v}, 0 ≤ j ≤ s − 1} is the Fourier transform,

b_{j,v} = Σ_{u=0}^{s−1} X_{ru+v+1} exp(−2πiju/s).   (10.7.5)
If we now define an "operation" to consist of the three steps, the computation
of a term exp(- 2niju/k), a complex multiplication and a complex addition,
then for each v the calculation of { bi. " ' 0 ::;, j ::;, s - 1 } requires a total of
Ns = sz
operations. Since
l = 0, 1 , 2, . . . ,
it suffices to calculate { bi. "' 0 ::;, j ::;, s - 1 } in order to determine { bi. "' 0 :::; j ::;,
n - 1 } . A total of rNs = rs2 operations is therefore required to determine
{bi. v• O ::;, v ::;, r - 1 , 0 ::;, j ::;, n - 1 }. The calculation of {bi, O ::;, j ::;, n - 1 }
from (10.7.4) then requires another nr operations, giving a total of
rNs + nr = n(r + s)
(10.7.6)
operations altogether for the computation of { bi, 0 ::;, j ::;, n - 1 }. This repre­
sents a substantial savings over the n 2 operations required to compute {bi }
directly from ( 10.7.2).
If now s can be factorized as s = s_1 s_2, then the number of operations N_s
required to compute the Fourier transform (10.7.5) can be reduced by applying
the technique of the preceding paragraph, replacing n by s, r by s_1 and s by
s_2. From (10.7.6) we see that N_s is then reduced from s² to

N_s′ = s(s_1 + s_2).   (10.7.7)

Replacing N_s in (10.7.6) by its reduced value N_s′, we find that {b_j, 0 ≤ j ≤ n − 1}
can now be calculated with

rs(s_1 + s_2) + nr = n(r + s_1 + s_2)   (10.7.8)

operations.

The argument of the preceding paragraph is easily generalized to show
that if n has the prime factors p_1, ..., p_k, then the number of operations
can be reduced to n(p_1 + p_2 + ··· + p_k). In particular if n = 2^p, then the
number of operations can be reduced to 2n log_2 n. The saving in computer
time is particularly important for very large n. For the sake of improved
computational speed, a small number of dummy observations (equal to the
sample mean) is sometimes appended to the data in order to make n highly
composite. A small number of observations may also be deleted for the same
reason. If n is very large, the resulting periodogram will not be noticeably
different from that of the original data, although the Fourier frequencies
{ω_j, j ∈ F_n} will be slightly different. An excellent discussion of the FFT
algorithm can be found in the book of Bloomfield (1976).
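As a deliberately naive illustration of the factorization argument (not of any production FFT routine), the following Python sketch evaluates b_j through (10.7.4) and (10.7.5) when n = rs; the function name and the comparison with numpy's built-in FFT are assumptions added here.

import numpy as np

def dft_factorized(x, r, s):
    """Sketch of the factorized computation (10.7.4)-(10.7.5) of
    b_j = sum_{t=0}^{n-1} x_{t+1} e^{-2*pi*i*j*t/n} when n = r*s:
    roughly n(r+s) 'operations' instead of the n^2 of the direct sum."""
    x = np.asarray(x, dtype=complex)
    n = r * s
    assert len(x) == n
    u = np.arange(s)
    # inner transforms b_{j,v}, 0 <= j <= s-1, one for each v = 0,...,r-1
    b_inner = np.empty((s, r), dtype=complex)
    for v in range(r):
        xv = x[r * u + v]                      # X_{ru+v+1}, u = 0,...,s-1
        for jj in range(s):
            b_inner[jj, v] = np.sum(xv * np.exp(-2j * np.pi * jj * u / s))
    # outer combination (10.7.4), using the periodicity b_{j,v} = b_{j mod s, v}
    b = np.empty(n, dtype=complex)
    v = np.arange(r)
    for jj in range(n):
        b[jj] = np.sum(np.exp(-2j * np.pi * jj * v / n) * b_inner[jj % s, :])
    return b

# e.g. np.allclose(dft_factorized(x, 4, 5), np.fft.fft(x)) for len(x) == 20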
We conclude this section by showing how the sample autocovariance
function of a time series can be calculated by making two applications of the
FFT algorithm. Let γ̂(k), |k| < n, denote the sample autocovariance function
of {X_1, ..., X_n}. We first augment the series to Y = (Y_1, ..., Y_{2n−1})′, where

Y_t = X_t − X̄_n,   t ≤ n,   and   Y_t = 0,   t > n.

The discrete Fourier transform of Y is then

a_j = (2n − 1)^{−1/2} Σ_{t=1}^{2n−1} Y_t e^{−itλ_j},   λ_j = 2πj/(2n − 1),  j ∈ F_{2n−1},   (10.7.9)

where F_{2n−1} = {j ∈ ℤ : −π < λ_j ≤ π}, and the periodogram of Y is (by
Proposition 10.1.2 and the fact that Σ_{t=1}^{2n−1} Y_t = 0)

|a_j|² = (n/(2n − 1)) Σ_{|k|<n} γ̂(k) e^{−ikλ_j},   j ∈ F_{2n−1}.

Multiplying each side by e^{imλ_j} and summing over j ∈ F_{2n−1}, we get

γ̂(m) = n^{−1} Σ_{j∈F_{2n−1}} |a_j|² e^{imλ_j}.   (10.7.10)

The autocovariances γ̂(k) can thus be computed by taking the two Fourier
transforms (10.7.9) and (10.7.10), and using the FFT algorithm in each case.
For large n the number of operations required is substantially less than the
number of multiplications and additions (of order n²) required to compute
γ̂(k), |k| < n, from the definition. The fast Fourier transform technique is
particularly advantageous for long series, but significant savings can be
achieved even for series of length one or two hundred.
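A short Python sketch of this two-transform computation of γ̂(k) is given below; it follows the conventions (10.7.9)–(10.7.10) above, and the use of numpy's FFT routines is an implementation choice rather than part of the text.

import numpy as np

def sample_acvf_fft(x):
    """Sketch of the two-FFT computation (10.7.9)-(10.7.10) of the sample
    autocovariances gamma_hat(k), 0 <= k < n, of x_1, ..., x_n."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # augment the mean-corrected series with n - 1 zeros (length 2n - 1)
    y = np.concatenate([x - x.mean(), np.zeros(n - 1)])
    a = np.fft.fft(y) / np.sqrt(2 * n - 1)           # transform (10.7.9)
    # inverse transform of |a_j|^2, rescaled as in (10.7.10)
    gamma = np.fft.ifft(np.abs(a) ** 2).real * (2 * n - 1) / n
    return gamma[:n]

# agrees with the direct formula
# [np.dot(xc[:n-h], xc[h:]) / n for h in range(n)], where xc = x - x.mean()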
§10.8* Derivation of the Asymptotic Behavior of the Maximum Likelihood and Least Squares Estimators of the Coefficients of an ARMA Process
In order to derive the asymptotic properties of the maximum likelihood
estimator, it will be convenient to introduce the concept of almost sure
convergence.

Definition 10.8.1 (Almost Sure Convergence). A sequence of random variables
{X_n} is said to converge to the random variable X almost surely or with
probability one if

P(X_n converges to X) = 1.

It is implicit that X, X_1, X_2, ... are all defined on the same probability space.
Almost sure convergence of {X_n} to X will be written as X_n → X a.s.
Remark 1. If X_n → X a.s. then X_n →_p X. To see this, note that for any ε > 0,

1 = P(X_n converges to X)
  ≤ P(∪_{n≥1} ∩_{k≥n} {|X_k − X| ≤ ε})
  = lim_{n→∞} P(∩_{k≥n} {|X_k − X| ≤ ε})
  ≤ lim inf_{n→∞} P(|X_n − X| ≤ ε).

The converse is not true, although if X_n →_p X there exists a subsequence
{X_{n_j}} of {X_n} such that X_{n_j} → X a.s. (see Billingsley (1986)).
Remark 2. For the two-sided moving average process

X_t = Σ_{j=−∞}^{∞} ψ_j Z_{t−j},   {Z_t} ~ IID(0, σ²),

with Σ_j |ψ_j| < ∞, it can be shown that for h ∈ {0, ±1, ...},

γ̂(h) = n^{−1} Σ_{t=1}^{n−|h|} (X_t − X̄_n)(X_{t+|h|} − X̄_n) → γ(h)  a.s.

and

γ̃(h) := n^{−1} Σ_{t=1}^{n−|h|} X_t X_{t+|h|} → γ(h)  a.s.   (10.8.1)

The proofs of these results are similar to the corresponding proofs of convergence
in probability given in Section 7.3, with the strong law of large numbers
replacing the weak law.
Strong Consistency of the Estimators

Let {X_t} be the causal invertible ARMA(p, q) process,

X_t − φ_1 X_{t−1} − ··· − φ_p X_{t−p} = Z_t + θ_1 Z_{t−1} + ··· + θ_q Z_{t−q},   {Z_t} ~ IID(0, σ²),   (10.8.2)

where φ(z) and θ(z) have no common zeroes. Let β = (φ_1, ..., φ_p, θ_1, ..., θ_q)′ and
denote by C the parameter set, C = {β ∈ ℝ^{p+q} : φ(z)θ(z) ≠ 0 for |z| ≤ 1, φ_p ≠ 0,
θ_q ≠ 0, and φ(·), θ(·) have no common zeroes}.

Remark 3. Notice that β can be expressed as a continuous function β(a_1, ..., a_p,
b_1, ..., b_q) of the zeroes a_1, ..., a_p of φ(·) and b_1, ..., b_q of θ(·). The parameter
set C is therefore the image under β of the set {(a_1, ..., a_p, b_1, ..., b_q) : |a_i| > 1,
|b_j| > 1 and a_i ≠ b_j, i = 1, ..., p, j = 1, ..., q}.

The spectral density f(λ; β) of {X_t} can be written in the form,

f(λ; β) = (σ²/2π) g(λ; β),
where

g(λ; β) = |θ(e^{−iλ})|² / |φ(e^{−iλ})|².   (10.8.3)
Proposition 10.8.1. Let β_0 be a fixed vector in C. Then

(2π)^{−1} ∫_{−π}^{π} [g(λ; β_0)/g(λ; β)] dλ > 1

for all β ∈ C̄ such that β ≠ β_0 (C̄ denotes the closure of the set C).
PROOF. If {X_t} is an ARMA(p, q) process with coefficient vector β_0 and white
noise variance σ_0², then we can write

φ_0(B)X_t = θ_0(B)Z_t,

where φ_0(B) and θ_0(B) are the autoregressive and moving average polynomials
with coefficients determined by β_0. Now suppose that β = (φ′, θ′)′ ∈ C̄ and β ≠
β_0. If |φ(z)/θ(z)| is unbounded on |z| ≤ 1 then

(2π)^{−1} ∫_{−π}^{π} [g(λ; β_0)/g(λ; β)] dλ = ∞

and the result follows. So suppose |φ(z)/θ(z)| is bounded on |z| ≤ 1 and
consider the one-step predictor −Σ_{j=1}^{∞} π_j X_{t−j} of X_t, where π(z) =
1 + Σ_{j=1}^{∞} π_j z^j = φ(z)θ^{−1}(z). Since β ≠ β_0, the mean squared error of this
predictor is greater than that of the best linear one-step predictor, and hence

σ_0² < E(X_t + Σ_{j=1}^{∞} π_j X_{t−j})² = E(θ^{−1}(B)φ(B)X_t)².

But the spectral density of θ^{−1}(B)φ(B)X_t is (σ_0²/2π)[g(λ; β_0)/g(λ; β)] and hence

σ_0² < Var(θ^{−1}(B)φ(B)X_t) = (σ_0²/2π) ∫_{−π}^{π} [g(λ; β_0)/g(λ; β)] dλ,

which establishes the proposition.  □
The Gaussian likelihood of the vector of observations X_n = (X_1, ..., X_n)′ is
given by

L(β, σ²) = (2πσ²)^{−n/2} |G_n(β)|^{−1/2} exp{−X_n′ G_n^{−1}(β) X_n / (2σ²)},

where G_n(β) = σ^{−2} Γ_n(β) and Γ_n(β) is the covariance matrix of X_n. From Section
8.7, the maximum likelihood estimator β̂_n is the value of β in C̄ which minimizes

l(β) = ln(X_n′ G_n^{−1}(β) X_n / n) + n^{−1} ln det(G_n(β)).   (10.8.4)

The least squares estimator β̃_n is found by minimizing

σ̃_n²(β) = n^{−1} X_n′ G_n^{−1}(β) X_n   (10.8.5)

with respect to β ∈ C̄. A third estimator β̄_n is found by minimizing

σ̄_n²(β) = n^{−1} Σ_j I_n(ω_j)/g(ω_j; β)   (10.8.6)

with respect to β ∈ C̄, where I_n(·) is the periodogram of {X_1, ..., X_n} and the
sum is taken over all frequencies ω_j = 2πj/n ∈ (−π, π]. We shall show that the
three estimators β̂_n, β̃_n and β̄_n have the same limit distribution. The argument
follows Hannan (1973). See also Whittle (1962), Walker (1964) and Dunsmuir
and Hannan (1976).

In the following propositions, assume that {X_t} is the ARMA process
defined by (10.8.2) with parameter values β_0 ∈ C and σ_0² > 0.
Proposition 10.8.2. For every β ∈ C̄,

σ̄_n²(β) → (σ_0²/2π) ∫_{−π}^{π} [g(λ; β_0)/g(λ; β)] dλ   a.s.   (10.8.7)

Moreover for every δ > 0, defining g_δ(λ; β) = (|θ(e^{−iλ})|² + δ)/|φ(e^{−iλ})|²,

n^{−1} Σ_j I_n(ω_j)/g_δ(ω_j; β) → (σ_0²/2π) ∫_{−π}^{π} [g(λ; β_0)/g_δ(λ; β)] dλ   (10.8.8)

uniformly in β ∈ C̄ almost surely.
PROOF. We shall only prove (10.8.8) since the proof of (10.8.7) is similar.
Let qm (A; p) be the Cesaro mean of the first m Fourier approximations to
g0(A; P) - 1 , given by
m
k
qm (A ; P) = m - 1 L L bk e - i ,\
j�O lkl :$j
-1
= -1
1
where bk (2n) f':.. " eik A(g0(A; p)) - 1 dA. By the non-negative definiteness of
{ bd, qm(A; p) � 0. As a function of (A, p), (go().; p)) - is uniformly continuous
on the set [ - n, n] x C. It therefore follows easily from the proof of Theorem
2. 1 1 . 1 that qm (A; p) converges uniformly to (g0(A; p)f1 on [ - n, n] x C and in
particular that for any c: > 0, there exists an m such that
l qm(A ; P) - (g0(A; p)) - 1 1 < c
for all (A, p) E [ - n, n] x C. We can therefore write, for all p E C,
1 n _1 'J" (
=
) _1"'J
ln(w) - n
ga(wi ; p)
n- 1
I�
. I
I.(w)qm(wi , p)
I.(wi)((ga(wi; p)) - 1 - qm(wi; P))
:::::; w - 1 "L I.(wi )
j
= c:y(O)
where the last equality follows from (10. 1 .9).
I
(10.8.9)
Now for n > m,
n - 1 L ln(wj) qm (wj ; P)
j
( - wi
= L L y(h) ( 1 - �) bk n 1 L e - i (h - k))
nk
m
j
l hl< l l< m
= L ii(k) ( 1 -
lk l< m
-lmkl ) bk
m
k=
-
lk l
1
J(n
k)
(
) bk . ( 10.8. 10)
m
1
+2L
For k and t fixed, the process { X,+ n - k ' n = 1 , 2, . . . } is strictly stationary and
ergodic (see Hannan, 1970) and by a direct application of the ergodic theorem,
n - 1 Xr+ n - k -> 0 a.s. From this it follows that y(n - k) = n - 1 L �= 1 x,x,+n - k -> 0
a.s. for each fixed k . The second term in ( 10.8. 1 0) therefore converges to zero
a.s. as n -> oo . By Remark 2, the first term converges to L lk l < m y(k)( 1 - l k lfm)bk
and since bk is uniformly bounded in p and k, we have
b
n - 1 Lj ln(w)qm (wj; P) -> kL y(k) ( 1 - �)
m k
l l< m
uniformly in p E C a.s. Moreover
( 1 0.8. 1 1 )
� r"lqm(Jc; P) - (go(Jc ; p)) - 1 l f(2; Po) d2
�
6')'(0) .
->
Since y(O) y(O) a.s. we conclude from ( 1 0.8.9), ( 1 0.8.1 1 ) and the above
inequality that
uniformly in p E C a.s.
D
Proposition 10.8.3. There exists an event B with probability one such that for
any sequence {β_n}, β_n ∈ C̄ with β_n → β, we have the following two possibilities:

(a) If β ∈ C, then for all outcomes in B,

σ̃_n²(β_n) → (σ_0²/2π) ∫_{−π}^{π} [g(λ; β_0)/g(λ; β)] dλ.   (10.8.12)

(b) If β ∈ ∂C (where ∂C is the boundary of the set C), then for all outcomes
in B,

lim inf_{n→∞} σ̃_n²(β_n) ≥ (σ_0²/2π) ∫_{−π}^{π} [g(λ; β_0)/g(λ; β)] dλ.   (10.8.13)
PROOF. (a) Since � E C, inf;. g(A.; �) > 0 and sup;_ g(A.; �) < oo . Consequently for
each r. E (0, inf;. g(A.; �)) there exists an N such that
sup l g(A.; �") - g(A. ; �)I < r./2 for n :::::: N.
(10.8. 14)
;_
By Corollary 4.4.2, we can find a polynomial, a(z) = 1 + a 1 z + · · · + am z m,
and a positive constant K m such that a(z) i= 0 for l z l :o::; 1 , and
{:
sup l g(), ; �) - Am(A.) I
<
s/2,
i f Am(A) > inf g(A. ; �)/2 > 0,
;_
;_
where
(10.8. 1 5)
Note that Km --+ 1 as r. --+ 0. Let H" be the covariance matrix corresponding to
the spectral density (2n) - 1 Am(A.). Then if y E IR" and y' y = 1 ,
ly
'
G" ( �") y
=
y'
( 2nr'
:o::;
(2n) - 1
:o::;
s(2n) - '
=
H" y l
1 fJ itl yieii;f (g(A.; �") - Am(A.)) dA. I
fJ j� yjeij-\ 1 2 (l g(A.; g(A.; lg(A.;
"I I t yie ii-\ 1 2 dA.
-
s for n
�") -
- rr
::;:.: N.
J- l
�) I +
�) - Am()�) I ) dA.
(10.8. 1 6)
Now if { Y, } is an ARMA process with spectral density (2nr 1 g(A. ; �") (and white
noise variance 1 ) then by Proposition 4.5.3 the eigenvalues of G"(�") are
bounded below by inf;. g(A. ; �" ) ::;:.: K > 0 for some constant K and all n suffi­
ciently large. The same argument also shows that the eigenvalues of H" are
bounded below by inf;_ Am()�) > 0. Thus the eigenvalues of G,;- 1 ( �" ) and H,;- 1
are all less than a constant C 1 (independent of s) so that
l n - ' X� (G,;-' (�") - H;' )X"I = l n -1 X� H"- 1 (H" - G" (�")) G; ' (�")X"I
(10.8. 1 7)
:o::; r.Cf y(O) for n ::;:.: N.
We next consider the asymptotic behavior of n- 1 X�H;' X" .
Let { Y, } be the AR(m) process with spectral density (2n) - 1 K ml a(e - ; ;.W 2
(and white noise variance K m ). Then by the Gram-Schmidt orthogonalization
process, we can choose (5ik ' k = 1 , . . . , j, j = 1 , . . . , m, so that the random
variables
wl = 15 1 1 Yl ,
Wz
=
(52 1 Yt + l5zz Yz ,
are white noise with mean zero and variance K m . Then
where W"
matrix
=
( W1 ,
. . • ,
W,.)', Y"
= ( Y1 , . . . , Y" )'
and T is the lower trangular
(5mm
at
az at
m
m
T = (5 l (5 2
am am - !
am
(10.8. 1 8)
az at
am
It follows that TH" T' = K m l, where I is the n x n identity matrix, and hence
that H;; 1 = (T' T)/(K m ). Except for the m 2 components in the upper left corner,
and the m 2 components in the bottom right corner, the matrix H"- t = [h uJ7.i=t
is the same as the matrix R;; 1 = [ hu ] 7. i =t with
m - -j\
K ;;/
a,ar+ \ i-i\ if l i - ji :::;; m,
r- 0
( 1 0.8. 1 9)
hu 0 otherwise,
_
_
{
t
where a0 := 1 . It then follows that
m
= n - 1 L: (h;i - h;JX;Xi + n - 1
i,j=l
n
L
i, =n
j -m+ l
(h u - h;J X;Xi ( 10.8.20)
--> 0 a.s. ,
1
since n - X; --> 0 a.s. and n - 1 Xn - i --> 0 a.s. by an application of the ergodic
theorem. It is easy to show that for n > m
and
l
(w )
(
n - 1 L In j - n - 1 � In �)
1 g (wj , p)
1 A m (wj )
I
:S:
Ci n - 1 � In (wj ) I A m (wJ - g (wj; P) I
1
::::; C��>y(O),
( 10.8.22)
where C2 1 = (inf,c g (A- ; P)/2) > o. Combining equations (10.8. 1 7), ( 10.8.20),
(10.8.21 ) and (1 0.8.22), we have
( )
n - 1 X � G; 1 ( Pn)Xn - n - I L In �
j g(wj , p)
m
1
:S: (C i + C�)�>y(O) + 2 C 2 L i y( n - k ) l + l n - X � H; 1 X n - n - 1 X� H; 1X n l
k= l
for all n ?: N. Now let { Pd be a dense set in C and let Sk denote the probability
one event where
I
I
( )
n - 1 L In �j ___. 0"6
j g (Wj , Pk ) 2n
f"
( )
n - 1 L In �j ....... 0"6
j g (wj , p) 2n
f"
g (A-: Po ) d .
Ag (A, Pk )
The event S = nk'== 1 Sk also has probability one and a routine approximation
argument shows that for each outcome in S n { y(O) ....... y(O) },
-n
-n
g (A- ; Po ) dA.
g ( A. , P)
for each P E C. If B1 denotes the event
00
B1 = n ( {n - 1 Xk ....... 0} n {n- 1 Xn - k ....... 0}) n { Y(O) ....... y(O)}
k=!
then P (B 1 ) = 1 and for each outcome in B1 ,
I�.��S�p I n -I xn' Gn-I (nJln) Xn
f"
2n
1
n
S
g ( ), ; Po ) d 1
2
2
::::; (C 1 + C2 ) £)1 (0)•
g (A ; p) A
Since Cf and Ci do not depend on £ and £ is arbitrary, the assertion (10.8.12)
follows for each outcome in B1 .
(b) Set Pn = (cjl�, 9� ) ' and p = w, 9' ) ' . Since Pn ....... p, choose 9t such that
et (z) = 1 + etz + . . . + eJz q f= 0 for l z l ::::; 1 and
sup \ 1 8t(e -;"W - I B (e - ;" ) 1 2 \ < £.
(If 8(z)
f=
0 for l z l
::::;
-
(J6
-
n
;_
1 , take 9t = 9. ) With P! = (cjl�, 9t' )', we have
g (A-; Pn ) ::::;
1 8t (e - ;"W + £
for all sufficiently large n.
l if>n (e -;"W
By Corollary 4.4.2 there exists a polynomial b(z) = 1 + b 1 z + · · · + bk z k and
a positive constant K such that b(z) =I= 0 for l z l � 1 and
K < &t(e - ; 2
.') 1 + 2t:
l
l et(e -i '-) 1 2 + t: < b(e
- l -· ... w for all A. Setting A (A; cp) = K l 1 + a 1 e - ;;. + · · · + ame - im).l - 2
l ¢>(e - i'-W 2 we have
(10.8.23)
=
K l b(e - ;;.W 2
x
g()o; Pn) � A (A; <I-n) for all A and sufficiently large n.
Define the matrices T, Hn and H; 1 as in the proof of (a) with Am(),) replaced
by A (A; .Pn ). Since the coefficients in the matrix T are bounded in n, we have
from ( 1 0.8.20) and (1 0.8.21 )
n - 1 x'n H n- 1 xn
I (w )
" n . J.h· --> 0 a.s.
- n - 1 L...,
j A (w , 'f' )
(10.8.24)
j n
1
1
Since A - (.A.; cpn) = K l b(e ;;.W I ¢>(e ;;.W is uniformly bounded, we have, by
the argument given in the proof of Proposition 1 0.8.2 (see (10.8. 1 1 )), that
n
0"
ln(wj )
_ 1 '\'
L....
--> 6
j A (wj ; .Pn) 2n
-
f"
g()o ; Po) d 1
A
a.s.
_, A (A; cp)
(10.8.25)
Also, since g(A ; Pn) � A (A; <I-n) for all large n, the matrix Hn - Gn(Pn) is non­
negative definite (see Problem 4.8) and thus the matrix G; 1 (Pn ) - H; 1 is also
non-negative definite (see Rao ( 1973), p. 70). Thus, by ( 10.8.24), (10.8.25) and
( 10.8.23),
Letting 9t --. 9, the lower bound becomes
and finally, letting t: --> 0 and applying the monotone convergence theorem,
we obtain the desired result, namely
As in the proof of (a), we can also find a set B2 such that P(B2 ) = 1 and for
each outcome in B2 ,
for any sequence Pn --+ p with p E ac.
□

Proposition 10.8.4. If β ∈ C̄, then ln(det G_n(β)) > 0 and

n^{−1} ln(det G_n(β)) → 0   as n → ∞.
PROOF. Suppose { t; } is an ARMA process with spectral density (2n:) - 1 g(A.; p).
Then det Gn (P) = r0 · · · r" _ 1 , where r, = E( l-";+ 1 - Yr 1 ) 2 (see (8.6.7)). Rewriting
the difference equations for { t; } as 1'; + 2:: ;;, 1 ni Yr -i = Z,, we see from the
definition of Yr+1 that
+
1
=
(
Var(Z, + d = E l-"; + 1 +
)z
I= 1 n:i l-";+ 1 -i
J
c=�1 l nil y Yr(O).
Since r, 1 and r0 1 , det Gn (P) r0 rn - 1 1 , and hence ln(det Gn (P)) 0.
Moreover, since (L� l n)) 2 y (0) -+ 0 as t +
it follows that r1 -+ 1 , so
::;; 1 +
2
>
r+ 1
by Cesaro convergence,
=
Y
···
>
-
>
oo,
n- 1
0 ::;; n - 1 ln(det Gn (P)) = n - 1 L In rr - 1
t=O
D
Theorem 10.8.1. Let β̂_n, β̃_n and β̄_n be the estimators in C̄ which minimize
l(β) = ln(X_n′ G_n^{−1}(β) X_n / n) + n^{−1} ln(det G_n(β)), σ̃_n²(β) = n^{−1} X_n′ G_n^{−1}(β) X_n, and
σ̄_n²(β) = n^{−1} Σ_j (I_n(ω_j)/g(ω_j; β)), respectively, where {X_t} is an ARMA process
with true parameter values β_0 ∈ C and σ_0² > 0. Then as n → ∞,

(i) β̄_n → β_0 and σ̄_n²(β̄_n) → σ_0²  a.s.,
(ii) β̃_n → β_0 and σ̃_n²(β̃_n) → σ_0²  a.s.,
(iii) β̂_n → β_0 and σ̃_n²(β̂_n) → σ_0²  a.s.
PROOF. Let B be the event given in the statement of Proposition 1 0.8.3. Then
there exists an event B* c B with probability one such that for each outcome
in B*, (1 0.8.7) holds with p = Po and ( 1 0.8.8) is valid for all rational c5 > 0. We
shall therefore prove convergence in (iHiii) for each outcome in B*. So for
the remainder of the proof, consider a fixed outcome in B*.
(i) Suppose Pn does not converge to Po· Then by compactness there exists
a subsequence {�n. } such that �n. -> P where P E C and P #- Po · By Proposition
1 0.8.2, for any rational c5 > 0,
'\'
In,(w)
f.... gi Pn
j wj; )
g(k R 0)
--_
1'_ d)..
giJ. ; p)
IliDk- oom. f (J-2n,(lR'n,) ;:::-: IliDk- oomf nk
.
.
(] 2
= _ll.
2n
However by Proposition ( 1 0.8. 1 ),
aJ
2n
f"
_,
.
I"
_ ,
- 1
'
g(J.; Po) '
d A > a02 ,
g().; p)
so by taking c5 sufficiently small we have
lim inf rr;. (�n. ) > aJ .
k-oo
On the other hand, by definition of �n and ( 1 0.8.7),
( 10.8.26)
lim sup rr;(�n ) :$; lim sup rr; (po)
which contradicts (1 0.8.26). Thus we must have �n -> Po · It now follows quite
easily from Proposition 1 0.8.2 that rr; (�n) ...... aJ.
(ii) As in (i) suppose �n does not converge to Po · Then there exists a
subsequence {PnJ such that Pn, ...... P #- Po with P E C. By Propositions 1 0.8.3
and 10.8. 1
But, by Proposition 10.8.3(a) and the definition of �n,
lim sup a,;(�n) :$; lim sup a; (p0 ) = aJ
which contradicts the above inequality. Therefore we conclude that �n ...... p 0,
and hence, by Proposition 10.8.3(a), that a;(�n) -> aJ.
(iii) Suppose Pn• -> P #- Po for some subsequence {Pnk }. Then by Propositions 10.8.3 and 10.8.4 and the definition of Pn , we obtain the contradiction
A
ln(O"�) < lim infln(a,;. (�"• ))
k � ro
:s; lim inf l( �n ) :s; lim sup l(P0)
.
n-oo
k--+-oo
= ln(O"�).
Thus �n --> Po and a,;(�n ) --> (}� by Proposition 1 0.8.3(a).
D
Asymptotic Normality of the Estimators

Theorem 10.8.2. Under the assumptions of Theorem 10.8.1,

(i) β̂_n is AN(β_0, n^{−1} W^{−1}(β_0)),
(ii) β̃_n is AN(β_0, n^{−1} W^{−1}(β_0)), and
(iii) β̄_n is AN(β_0, n^{−1} W^{−1}(β_0)),

where W(β) is the (p + q) × (p + q) matrix whose (j, k)-component is given in (10.8.27) below.
Before proving these results we show the equivalence of the asymptotic
covariance matrix W^{−1}(β_0) and the matrix V(β_0) specified in (8.8.3). In order
to evaluate the (j, k)-component of W(β) for any given β ∈ C, i.e.

W_{jk} = (4π)^{−1} ∫_{−π}^{π} [∂ ln g(λ; β)/∂β_j][∂ ln g(λ; β)/∂β_k] dλ,   (10.8.27)
we observe that
ln g(λ; β) = ln θ(e^{−iλ}) + ln θ(e^{iλ}) − ln φ(e^{−iλ}) − ln φ(e^{iλ}),

where φ(z) = 1 − φ_1 z − ··· − φ_p z^p and θ(z) = 1 + θ_1 z + ··· + θ_q z^q. Hence

∂ ln g(λ; β)/∂φ_j = e^{−ijλ} φ^{−1}(e^{−iλ}) + e^{ijλ} φ^{−1}(e^{iλ})

and

∂ ln g(λ; β)/∂θ_j = e^{−ijλ} θ^{−1}(e^{−iλ}) + e^{ijλ} θ^{−1}(e^{iλ}).

Substituting in (10.8.27) and noting that for j, k ≥ 1,

∫_{−π}^{π} e^{i(j+k)λ} φ^{−2}(e^{iλ}) dλ = ∫_{−π}^{π} e^{−i(j+k)λ} φ^{−2}(e^{−iλ}) dλ = 0,

we find that for j, k ≤ p,

W_{jk} = (4π)^{−1} ∫_{−π}^{π} (e^{−i(j−k)λ} + e^{i(j−k)λ}) |φ(e^{−iλ})|^{−2} dλ = E[U_{t−j+1} U_{t−k+1}],

where {U_t} is the autoregressive process defined by

φ(B)U_t = N_t,   {N_t} ~ WN(0, 1).   (10.8.28)

The same argument shows that for j, k ≥ p + 1,

W_{jk} = (4π)^{−1} ∫_{−π}^{π} (e^{−i(j−k)λ} + e^{i(j−k)λ}) |θ(e^{−iλ})|^{−2} dλ = E[V_{t−j+1} V_{t−k+1}],

where {V_t} is the autoregressive process defined by

θ(B)V_t = N_t,   {N_t} ~ WN(0, 1).   (10.8.29)

For j ≤ p and k = p + m, 1 ≤ m ≤ q,

W_{j,p+m} = (4π)^{−1} ∫_{−π}^{π} [∂ ln g(λ; β)/∂φ_j][∂ ln g(λ; β)/∂θ_m] dλ
          = (4π)^{−1} ∫_{−π}^{π} [e^{i(m−j)λ} φ^{−1}(e^{−iλ}) θ^{−1}(e^{iλ}) + e^{−i(m−j)λ} φ^{−1}(e^{iλ}) θ^{−1}(e^{−iλ})] dλ.
If { Z(.A.), - n � A. � n} is the orthogonal increment process associated with the
process { N1 } in (10.8.28) and ( 10.8.29), we can rewrite ltf . p+ m as
ltf. p+m =
� [ \I:n
(fn
+
e iA<t -jl rp - 1 (e - iA ) dZ(.A.),
e iA(t - ml g- 1 (e - iA ) dZ(A. ),
= E [ Ut -j +l V. - m + l ] ,
I:"
r"
)
)J
e iA(t - m) e - 1 (e - iA ) dZ(A.)
e iAU -jlrp -1 (e- iA ) dZ(}�)
and by the symmetry of the matrix W(�).
1 � m � q, 1 � k � p.
The expressions for ltjk can be written more succinctly in matrix form as
W(�) = E [Y1 Y; J,
(10.8.30)
where Y1 = (U0 Ut - 1 , . . . , Ut - p+ 1 , v; , Yr - 1 , . • • , Yr -q + 1 )' and { U1 } and { Yr } are the
autoregressive processes defined by ( 1 0.8.28) and ( 1 0.8.29) respectively. The
expression ( 1 0.8.30) is equivalent to (8.8.3). We now return to the proof of
Theorem 1 0.8.2, which is broken up into a series of propositions.
10.8.5. Suppose In ( " ) is the periodogram of {X 1 , . • • , Xn } and In , z C )
is the periodogram of { Z 1 , . . . , Zn } · If IJ( · ) is any continuous even function on
[ - n, n] with absolutely summable Fourier coefficients {am , - oo < m < oo },
then
Proposition
(10.8.3 1 )
as n �
oo .
PROOF. From Theorem 10.3.1, I.(wj) - g (wj; �0) 1•. 2(wj) = R.(wj) where R.(),) =
;
;
1/J (e - 0')12 (..1.) Y, ( - A.) + !/J, (e 0')Jz ( - A.) Y,(A.) + I Y, (A.W , 1/J (e - ;. ) = I � o 1/Jk e - ;.kr,
1)
"n
"
ro
.
.
w
u
""
;;.
Y.
1
1
1
1
JZ(),' ) = n - 12 L.,t=l z,e - ' n (A = n - 12 L.,1= 0 '1' 1 e
- -1 z,e - u
n , 1 and Un,1 = L.,t=l
r
1
).
- I7= I Z,e - iAt = I �=! (Zr - 1 - zn -l+ r ) e - i }.( - The proof of Theorem 10.3. 1
gives max wk E [ O,nJ E I R . (wk ) l = O(n- 1 12 ), however this result together with the
bound
l
l
E n - 1 12 I R. (w) YJ(Wj ) :;:::; n 1 12 sup i YJ {A) I max E I R. (wk ) l ,
j
A
Wk E [0, 7t)
is not good enough to establish ( 10.8.3 1 ) . Therefore a more careful analysis is
required. Consider
n- 1 /2 I 1/J (e- i"'j ) Jz(w) Y, ( - wj) YJ(w)
j
oo
n
ro
oo
/
n - 312 I I I I I I !/Jk ljJ1 am Zr [Z,_1 - zn - 1 + r ] e - iwj(k + m - r + t) .
j k=O 1= 0 r=l m = -oc; r=l
Now for k, I, r and m fixed let s = r - m - k mod(n). Then
=
if t
if t
-=f.
s,
=
s,
which implies that
l
l
I !/J ljJ1 am Zr [Z,_1 - zn - l +r ] �J e - iwj(k+m-r +t) :;:::; 2 1 !/Jk !/J1 am i (J'5
r =l k
and hence that
E n -I
--+ 0
as n --+ oo . Since E l Y,(w) l 2 :;:::; 2 n - 1 0'6(Ik"= o 1 1/J k l l k l 1 12 ) 2 (see (10.3. 1 3)),
2
E n - 1 12 I I Y,(w) I 2YJ(w) :;:::; 2n- 1 120'6 I I I/Jk l l k l 1 12 sup i YJ(},) I
l
1
l
--+ 0,
(k = O
)
;.
whence
as desired.
D
Proposition 10.8.6. If YJ ( · ) and its Fourier coefficients {a m } satisfy the conditions
of Proposition 10.8.5 and if I:::= t l am I m 1 12 < oo and J':., YJ(A.) g (A.; �0) dA. = 0,
t hen
PROOF. In view of Proposition 1 0.8.5, it suffices to show that
(
n - 1 L ln . z(w)x(w) is AN o, n - 1
atf"
)
x 2 (.A.) d). ,
(10.8.32)
n
-"
J
where X(A) = IJ(A)g(),; �0). Let Xm(A) = L lk l s m bk e m, where bk
(2rrr 1 J':." e- ik i.X(A) d).. The assumptions on the Fourier coefficients of 17 (},)
=
together with the geometric decay of Yxlh) imply that l:: ;;"= 1 l bk l k 1 12 < oo, and
Xm(),) -> X(A) = L bk eiu as rn -> oo,
k
where the convergence is uniform in A. It also follows from our assumptions
that b0 = 0. We next show that for all t: > 0,
lim lim sup P n - 112 � /n,z(wi ) (x(wi ) - Xm(wJ) > e = 0. (10.8.33)
m --+oo n --+ oo
1
Observe that
L bk e ikwJ
n- 1 !2 L In,z(w) (x(wJ - Xm(wJ) = n-1!2 L L Yz(h)e - ihwJ
j lhl < n
lk l> m
j
( �
n 1 12 Yz(O)
k
t;_o bkn
)(
(
where iiz(h) = n -1 L:��jhl Z,Z,+ Ihl and Kh
term in (10.8.34), we have
I
I )
I
=
)
( 10.8.34)
{ k E Z : l k n + h i > rn}. For the h
� 2')iz (O)n 112
� bk
I
k n
=
0
I
( 1 0.8.35)
1
2
1
� 2jlz(O) L i bk l k -> 0 a.s.
k =n
by Remark 2 and the summability of { l bk l k 1 12 }. Moreover, since Ejlz (h) 0 for
h =1= 0 and Eyz(h)yz(k) = 0 for h =I= k (see Problem 6.24), we have
and for n > rn
[
ro
(
E 2n 1 12 I1 Yz(h) L bkn + h
k E Kh
h=
)] = 0
=
(10.8.36)
Now (10.8.33) follows at once from (10.8.35) and (10.8.36).
To complete the proof it is enough to show, by Proposition 6.3.9 and
Problem 6. 1 6, that
n- 1
and
( :ci f n
� I z(wi)Xm(wi)
•
f.
is AN 0, n - 1
.
f,
x ;, (Jc) d.lc -->
x2(Jc) d.lc
X;,(Jc) d.lc
)
(10.8.37)
(10.8.38)
as m --> oo .
But since n 112 yz(n - k) = n- 1 12 I�= 1 Z,Zr+n - k = op(1 ), it follows from Proposi­
tions 6.3.3 and 6.3.4 and Problem 6.24 that
m
m
1
1
n- 112 2 ) z (wJ xm (wJ = 2n 12 I Yz(k)bk + 2n 12 I Yz(n - k)bk
k= 1
j
k= 1
m
2n 1 12 L Yz (k)bk + op(1 )
k=1
•.
=
=> N
(0, � )
= f':.,
4ari
k 1
b[ .
By Parseval's identity, 4ari L �= 1 b[ arifn x;,(Jc) d.lc, which establishes
(10.8.37). Finally (10.8.38) follows from the uniform convergence of Xm (Jc) to
xm
D
PROOF OF THEOREM 10.8.2. (i) The Taylor-series expansion of 8 0'2 (P0)f8 P about
p = �. can be written as
n 1 12
80'2 (�. )
8 0'2 (Po )
n 112
ap
ap _
=
-
_
n 1 12
8 2 0'2 (P!) R
( Pn
apz
_
R )
PO
/ 8 2 2(P!) (R
R
n 1 2 0'
Pn - PO ) '
apz
for some P! E C satisfying II P! - �. II < II �. - Po II ( II · II = Euclidean norm).
Now
-1 . R t
- 2( Rtn )
a2 ()"
- "[
) 8 2 g (Wj , Pn )
P
n 1 Lr n (WJ·
ap2
ap2
=
and since P! --> Po a.s. by Theorem 1 0.8. 1, the proof given for Proposition 1 0.8.2
can be used to establish the result,
az 0'2 (P!) aJ
ap2 --> 21!
"
I-n
(k R
g ' PO )
az g - 1 (Jc ; Po ) .lc
d a.S.
apz
(10.8.39)
Since (2 n)- 1 g (Jc; p) is the spectral density of a causal invertible ARMA process
with white noise variance equal to one, it follows that J':., In g (Jc ; p) d.lc 0 for
=
ali � E C, and hence that
Since the last relation holds also with g replaced by g- 1 , it follows from
(10.8.39) that
Consequently it suffices to show that
au z (�o)
a�
·
IS
4
AN(O, n- 1 4<To W(�o)),
or equivalently, by the Cramer-Wold device, that
c'
for ali c E [RP + q. But
au2(�o) .
IS AN(O, n - 1 4aric' W(�o) c)
a�
a (�o)
c uz
= n -1 � I"(w)1J(w),
7'
a�
I
where 17(-A) = c' ag -1 (-A; � 0)/a�. Now 17'( " ) and 17"( " ) are also continuous func­
tions on [ - n, n], so that by Problem 2.22, the Fourier coefficients of 17( )
satisfy the assumptions of Proposition 1 0.8.6 and
·
I"
I"
I
= c'O = 0.
17(-A)g(.A; �0) d), = - c' :
ln g(.A; �) d.A
u � -rr
ll = llo
Hence, invoking Proposition 1 0.8.6, we have
- rr
(o, I"
)
a
n -1 I I.(wi }1J(w) is AN n -1 6
172(.A) g2(.A; � 0) d.A ,
n
_
j
and since (a6 /n) f�" 1'/2(.A) g2 (.A; �0) d.A = 4aric' W(�0) c', the proof of (i) is
complete.
(ii) Expanding 00'2(�0)/a� in a Taylor series about the vector � = P., we
have as in the proof of (i),
-rr
a (�o) - /Z a z az (�!) R
(
= n1
n 1/Z az
a�
a� z I'n
-
R
1' 0 )
for some �! E C with �! --+ � 0 a.s. By (i) and Proposition 6.3.3, it suffices to show
that
(1 0.8.40)
and
-
a 2 (Po ) p o '"
aa 2 (Po )
1 ' . . . ' p + q . ( 10.8.41)
- n 1/2 u
---+
tOf k
8f3k
8f3k
The proofs of ( l 0.8.40) and ( 1 0.8.41 ) follow closely the argument given for the
proof of Proposition 10.8.3. We shall only prove (10.8.41).
Since g()"; Po ) and 8g(A; P0)/8f3k have continuous derivatives of all orders
with respect to )" and since g(A; P o ) > 0 and I 8g(A ; P0)/8f3k I > 0, it follows easily
from Problem 2.22 that
n 1/2
( 10.8.42)
as h ---+ oo. Set
q m (A) = L b(h; Po )e- ih)..
lhi :O: m
Then
8b(h ; P o ) -ih).
"
.
L.... --- e
lhl ,; m apk
Equations (10.8.42) ensure that if m = [n 1 15 ] (the integer part of n 1 15 ),
then
where a(z) = 1 + a 1 z + · · · + am z m # O for lzl � 1 and K m ---+ 1 as m ---+ oo. Let
H. be the covariance matrix corresponding to the autoregressive spectral
density (2nqm (A))- 1 • We shall show that
n - 11 2
� (X� G; 1 (P0)X. - � I.(w)qm(w)) !. 0
a k
( 10.8.44)
as n ---+ oo, where m = [n 115 ]. Once this is accomplished, the result (10.8.41 )
follows immediately, since b y (10.8.43),
0
:-::;; n-112 L In (wJO(n - 315 ) = O(n- 111 )]7 (0) = op(l).
j
Throughout the remainder of this proof set m = [n115 ]. From the proof of
Proposition 10.8.3, the eigenvalues of Gn- 1 (�0) and H;;1 are uniformly bounded
in n. Moreover, the eigenvalues of the matrices aG"(�0)ja{3k and aH"(�0)ja{3k are
also uniformly bounded in n since ag().; �0 )japk and aq;;/ (A)japk are uniformly
bounded (see the proof of Proposition 4.5.3). It is easy to show from ( 1 0.8.43)
that there exists a positive constant K such that for all y E IR ", (cf. ( 10.8. 1 7))
II (Gn- 1 <Po ) - H"- 1 (� o )) YII :-::;; K n - 315 IIYII , and
I CG;�� ��� ) I
I
1�
o)
-
a
o)
y
:-::;;
K n - 315 II Y II .
It then follows from a routine calculation that
n- 112
a k
(X� Gn-! ( IJo )Xn - X�H;;1 Xn )
:-::;;
0
O(n -1 11 )]7(0)
=
op( l ). ( 1 0.8.45)
We next compare n - 112 a(X�Hn- 1 Xn )/a{3k with n-112 a(X�H;;1 Xn )/a{3k where
H;;1 is the covariance matrix of an MA(m) process with spectral density
(2nt 1 q m ()�) = (2nKm t 1 ! a(e- i-')! 2 • Now from the proof of Proposition 10.8.3
(see ( 1 0.8. 1 9) and ( 1 0.8.20)),
a
n - 112 - (Xn' Hn- 1 xn - X'n Hn- 1 xn )
ap�
_
=
.
� a
n -112 L. - (h I}.. - h!} )X!. X.J
i,j= 1 apk
It follows from ( 1 0.8.43) that a(h ;j - fiij)japk is uniformly bounded in i and j,
and since m = [n115] the above expression is op(1 ). By the same reasoning,
n - 1;z
a ' a
xn Hn-1 x n - n -1 !2
" In (wj ) q m (wj) + op (1)
apk 7
apk
(see ( 1 0.8.21 )), and with ( 1 0.8.45) this establishes ( 10.8.44) and hence ( 10.8.41 ).
(iii) From the Taylor series expansion of /(�0) about � = �n (cf. (i) and (ii)),
it suffices to show that as n -> oo ,
_ a 2 ln(det G"(�!)) P
( 10.8.46)
n 1
-> O,
2
a�
where �! -> �0 a.s., and
( 1 0.8.47)
We shall only prove (1 0.8.47) since the argument for ( 1 0.8.46) is similar, but
less technical due to the presence of the factor n - 1 .
As in the proof of Proposition 10.8.4, if { t; } is an ARMA process with
spectral density (2nr 1 g(Jc; p), then det Gn(P) = r0(P) · · · r" _ 1 (p), where r1(P) =
£ ( ¥;+1 - Y,+d 2 • Denote the autocovariance function of { t; } by 1J(h; p) and
write the difference equations for { t; } as
00
t; + L niP) Yr -i = Zo
j=l
{ Z1 }
�
(10.8.48)
110(0, 1).
We have from Corollary 5. 1 . 1,
rt (p) = IJ (O; p) - 'l; Gt- 1 (P) 'ln
where '11 = (1]( 1 ; p), . . . , 1J(t; p))'. For notational convenience, we shall often
suppress the argument p when the dependence on p is clear. From (1 0.8.48),
we have
where 'loo = ( 1] ( 1 ; p), 1](2; p), . . . )' , Goo = [1J( i - j; P) J�j=l and
n(2; p), . . . )'. It then follows that
1t 00
= (n(1 ; p),
and it is easy to show that G;;/ may be written as
where
G;;/ = TT',
T = [n;-iP) J�j=l ,
n0(P) = 1 and niP) = 0 for j < 0. We also have from ( 10.8.48) and the
independence of Z1 and { Yr - 1 , t;_ 2 , . . . }, that
1](0; p) = 1t:n Goo 1too + l .
Consequently, we may write
rt (P) = 1 + 11:n G� 1 'loo - 'l; Gt- 1 'ln
and hence
art (Po )
a :n
Goo
= 2 'l G - 1 'l oo + 'l ,oo Gco- 1 a G - 1 'loo
af3k
af3k
af3k
00
-2
C(J
a 'l; - 1
- aGl - 1
Gt 'lt + 'l ,t Gt 1
Gt 'lt
apk
a pk
where all of the terms on the right-hand side are evaluated at p = Po · Note
that if cp1 = (r/J1 1 , . . . , rPtt Y = G1- 1 '11 is the coefficient vector of the best linear
predictor of ¥; + 1 in terms of ( r;, . . . , Y1 )', then the above equation reduces to
( 1 0.8.49)
We next show that the vectors rr,
Observe that
= (n 1 ,
• • •
, n,)' and �� are not far apart.
and
Y,+ l + n 1 Y, + · · · + n, Yl = I nj Y,+l -j + z, +l
j> t
so that the variance of (n 1 + <A d Y, + · · · + (n, + </>,,) Y1 is equal to
( � ni Y,+ l-i Z,+l - ( Y,+l - Yr+d)
(�, ni Y,+l -i) (Z,+1 - ( Y,+ 1 - Yr+1 ))
(� ni y
( I ni )
(rr, + �,)'G,(rr, + �,) = Var
+
j >t
+ 2 Var
� 2 Var
�2
�4
1] (0, �0) + 2(r, - 1 )
,l l
j> t
l l
2
1] (0, �0),
where the last inequality comes from the calculation in the proof of Proposi­
tion 1 0.8.4. Since the eigenvalues of G, are bounded below by inf.\ g().; �0) >
L > 0,
t
L (ni + r/J,J 2
j=l
� L - 1 (rr, + �,)'G,(rr, + �,)
( 1 0.8.50)
for some K > 0 and 0
=
< s <
1. Therefore, from ( 10.8.49),
where K 1 ( 1 /2n) J':., l ag( A.; �0)/aPk l dA.. Since L� 1 nJ < oo and L J =l r/J,] is
bounded in t (see ( 1 0.8.50)), we have from the Cauchy-Schwarz inequality,
l ar,(�o) I -<
a/3k
where
r,(�0 )
K 2 t !f2 s � tj2
+ K3 "
.{....
j>t
l n·l
J
+
K
4
ts� tj2
K 2 , K 3 , K4
�
and K 5 are positive constants and 0
1, it then follows that
<
ln(det Gn (�o) <
a
n�l ort(�o)
I
apk I t=O I of3k I rt 1'0)
n a (� )
.::;; n� ! j2 i r� o I
r=o I apk
�;
n tz
�
n t;z
�[
L...
s1
<
1 . Since
(R
l
.::;; n� t;z K s ( l - s l )� t
-> 0
as n -> oo , which completes the proof of ( 10.8.47).
D
Problems
10.1. The discrete Fourier transform {a_j, j ∈ F_n} of {X_1, ..., X_n} can be expressed as
a_j = J(ω_j), ω_j = 2πj/n ∈ (−π, π], where

J(λ) = n^{−1/2} Σ_{t=1}^{n} X_t e^{−itλ},   −∞ < λ < ∞.

Show that X_t = (2π)^{−1} n^{1/2} ∫_{−π}^{π} J(λ) e^{itλ} dλ. [In this and the following questions
we shall refer to J(λ), −∞ < λ < ∞, as the Fourier transform of {X_1, ..., X_n}.
Note that the periodogram {I_n(ω_j), ω_j = 2πj/n ∈ (−π, π]} can be expressed as
I_n(ω_j) = |J(ω_j)|².]
10.2. Suppose that z_t = x_t y_t, t = 1, ..., n. If J_x, J_y and J_z are the Fourier transforms
(see Problem 10.1) of {x_t}, {y_t} and {z_t} respectively, and if ξ_j = J_x(2πj/n),
η_j = J_y(2πj/n) and ζ_j = J_z(2πj/n), j = 0, ±1, ..., show that

ζ_j = n^{−1/2} Σ_{k∈F_n} ξ_k η_{j−k}.
10.3. Suppose that {x_t, t = 0, ±1, ...} has period n and that

z_t = Σ_{k=−s}^{s} g_k x_{t−k},   t = 0, ±1, ....

If G(e^{−iλ}) = Σ_{j=−s}^{s} g_j e^{−ijλ}, −∞ < λ < ∞, and J_x, J_z are the Fourier
transforms of {x_1, ..., x_n} and {z_1, ..., z_n} respectively, show that

J_z(ω_j) = G(e^{−iω_j}) J_x(ω_j),   ω_j = 2πj/n ∈ (−π, π],

and

I_z(ω_j) = |G(e^{−iω_j})|² I_x(ω_j),   ω_j = 2πj/n ∈ (−π, π].
10.4.* Show that the sequence X_t = e^{iνt}, t = 1, ..., n, has Fourier transform

J(λ) = n^{−1/2} (sin[n(λ − ν)/2] / sin[(λ − ν)/2]) exp[−i(λ − ν)(n + 1)/2],   −∞ < λ < ∞.

Use this result to evaluate and plot the periodograms I_8 and I_10 in the case
when ν = π/4. [π/4 is a Fourier frequency for n = 8 but not for n = 10.] Verify
in each case that Σ_{j∈F_n} I_n(ω_j) = Σ_{t=1}^{n} |X_t|².
10.5.* If J(·) is the Fourier transform in Problem 10.4 and −π < ν < π, determine
the frequencies λ such that (a) |J(λ)|² is zero and (b) |J(·)|² has a local maximum
at λ. Let M = |J(ν)|² and M_1 = |J(λ_1)|², where λ_1 is the frequency closest to ν
(but not equal to ν) at which |J(·)|² has a local maximum. Show that
(a) M → ∞ as n → ∞,
(b) M_1/M → .047190 and λ_1 → ν as n → ∞, and
(c) for any fixed frequency ω ∈ [−π, π] such that ω ≠ ν, |J(ω)|² → 0 as n → ∞.
1 0.6. Verify (10.2. 1 1 ) and show that IIPs<P> X - P5P{ t )X II 2 is independent of
II X - Ps<p> X II 2 •
1 0.7. The following quarterly sales totals { X,, t = 1 , . . . , 12} were observed
over a period of three years: 27, 1 8, 1 0, 1 9, 24, 1 7, 5, 1 5, 22, 1 8, 2, 14.
Test at level .05 the hypothesis,
X, = c + Z,,
{ Z, }
�
IID(O, 172 ),
where c is constant, against the alternative,
X, = c + S, + Z,,
where S, is a deterministic sinusoid with period one year. Repeat the test
assuming only that S, has period one year but is not necessarily sinusoidal.
10.8. Use the computer program SPEC to compute and file the periodogram I (wj),
0 < wj = 2nj/n s; n of the Wolfer sunspot numbers. File also the standardized
cumulative periodogram C(j), 1 s; j s; [(n - 1 )/2], defined by ( 10.2.22). Plot
the two periodograms and use the latter to conduct a Kolmogorov-Smirnov
test at level .05 of the null hypothesis that { X, } is white noise.
10.9.* Consider the model
X, =
f1
+ A cos wt + B sin wt + Z,,
t = 1, 2, . . .
,
where {Z, } is iid N(0, 172), fl, A, B and 172 are unknown parameters and
w is known. If X" , A and li are the estimators, X" = n- t 'I,��� X,, A =
(2/n) t;z 'I,�� t (X, - X" )cos wt and li = (2/n) t 12 'I,��1 (X, X" )sin wt of fl, A and
B, show that
-
1 0. 1 0. Generate 1 00 observations, { X1 , , X100}, of Gaussian white noise. Use the
following three procedures to test (at level .05) the null hypothesis, {X, } is
Gaussian white noise, against the alternative hypothesis, {X, } is Gaussian
white noise with an added deterministic periodic component of unspecified
frequency.
(a) Fisher's test.
(b) The Kolmogorov-Smirnov test.
(c) Let wi be the frequency at which the periodogram is maximum and apply
the test described in Section 1 0.2(a) using the model X, f1 + A cos wit +
B sin wit + Z,. In other words reject the null hypothesis if
• . •
( 100 - 3)/(w)
ICt Xf - / (0) - 2/(wi))
=
>
F. 95 (2, 97).
Is this a reasonable test for hidden periodicities of unspecified frequency?
=
1 0. 1 1 . Compute the periodogram of the series {X, - Xr- 1 , t 2, . . . , 72} where X,,
t = 1 , . . . , 72, are the accidental deaths of Example 1 . 1 .6. Use the procedure
described in Section 1 0.2(a) to test for the presence of a deterministic periodic
component with frequency 1 2n/71 . (This is the Fourier frequency with period
closest to 1 2.) Apply Fisher's test to the periodogram of the residuals from the
fitted model (9.6.6) for { X, } .
1 0. 1 2. For the Lake Huron data of Problem 9.6, estimate the spectral density func­
tion using two different discrete spectral average estimators. Construct 95%
confidence intervals for the logarithm of the spectral density. Also compute the
M LARMA spectral density estimate and compare it with the discrete spectral
average estimators.
10.13.* Suppose that V_1, V_2, ..., is a sequence of iid exponential random variables with
mean one.
(a) Show that P(max_{1≤j≤q} V_j − ln q ≤ x) → e^{−e^{−x}} for all x as q → ∞.
(b) Show that P(max_{1≤j≤q} V_j/(q^{−1} Σ_{l=1}^{q} V_l) ≤ x + ln q) → e^{−e^{−x}} as q → ∞.
(c) If ξ_q is as defined in (10.2.20), conclude that for large q,
P(ξ_q − ln q ≥ x) ≈ 1 − exp{−e^{−x}}.
1 0. 1 4. If {Z, } - IID(O, cr 2 ) and EZi < oo , establish the inequality, E(LJ'= 1 Z)4 ::::;
mEZ{ + 3m2 cr4.
1 0. 1 5. Find approximate values for the mean and variance of the periodogram
ordinate /2 00(n/4) of the causal AR( 1 ) process
X,
-
.5X,_1
1
= Z,,
{ z,} - IID(O, cr 2 ).
Defining ](wi) = ( 1 0nr L.l= -z I2 00(wi + wk), wi = 2nj/200, use the asymp­
totic distribution of the periodogram ordinates to approximate
(a) the mean and variance of ](n/4),
(b) the covariance of ](n/4) and ](26n/100),
(c) P(](n/4) > l . lf(n/4)) where f is the spectral density of {X,},
(d) P(maxl -s; j .,; 9 g (/z o0(w)/f(wi)) > .06 L,]1\ (/2 00(wi)/f(w))).
1 0. 1 6. Show that successive application of two filters {a_,, . . . , a,} and { b_, . . . , b, } to
a time series {X, } is equivalent to application of the single filter { c_,_, . . . , c,+,}
where
00
00
ck = j L ak-A = j L bk -j aj ,
=
=
- oo
- co
and aj , bj are defined to be zero for l j l > r, s respectively. In Example 1 0.4.2
show that successive application of the three filters, r 1 { 1 , 1, 1 }, T 1 { 1, . . . , 1 }
1
and 1 1 - I { 1 , . . . , 1 } is equivalent to application of the filter (23 1 ) - { 1 , 3, 6, 9, 12,
1� 1 8, 20, 2 1 , 2 1 , 2 1 , 20, 1� 1 5, 1 � 9, � 3, 1 } .
1 0. 1 7. If L ?= 1 X, = 0, /"( · ) is the period-2n extension of the periodogram of {X1 ,
X" }, and f�(w), wj = 2nj/n, is the Daniell estimator,
m
'
1
show that
• • •
,
fD (wj ) =
L In(wj + wd,
2n (2m + 1) k = - m
where Ak = (2m + 1 ) - 1 sin[(2m + 1 )kn/n]/[sin(knjn)]. Compare this result
with the approximate lag window for the Daniell estimator derived in Section
1 0.4.
1 0. 1 8. Compare the Bartlett and Daniell spectral density estimators by plotting and
examining the spectral windows defined in ( 1 0.4. 1 3).
1 0. 1 9. Derive the equivalent degrees of freedom, asymptotic variance and bandwidth
of the Parzen lag-window estimator defined in Section 1 0.4.
1 0.20. Simulate 200 observations of the Gaussian AR(2) process,
X, - X,_ 1 + .85X,_ 2 = Z,,
Z,
�
WN(O, 1 ),
and compare the following four spectral density estimators:
(i) the periodogram,
(ii) a discrete spectral average estimator,
(iii) the maximum entropy estimator with m chosen so as to minimize the
AICC value,
(iv) the M LARMA spectral density estimator.
Using the discrete spectral average estimator, construct 95% confidence
intervals for ln f(A.), A E (0, n), where f is the spectral density of {X, } . Does
In f(A.) lie entirely within these bounds? Why does f( · ) have such a large peak
near n/3?
1 0.21.* (a) Let X I , . . . , xn be iid N(O, a 2 ) random variables and let Yl , . . . , Y. be the
corresponding periodogram ordinates, }j = I"(w), where q = [(n - 1 )/2].
Determine the joint density of Y1 , . . . , Yq and hence the maximum likelihood estimator of a 2 based on Y1 , . . . , Y. ·
(b) Derive a pair of equations for the maximum likelihood estimators rfo and
6 2 based on the large-sample distribution of the periodogram ordinates
/" (2 1 ), . • . , /"( 2m ), 0 < 2 1 < · · · < 2m < n, when {X1 , , X" } is a sample
from the causal AR(1) process, X, = I/JX,_ + Z,, {Z, } IID(O, a 2 ) .
1
• • .
�
10.22.* Show that the partial sum S_{2n+1}(x) of the Fourier series of I_{[0,π]}(x) (see (2.8.5))
satisfies

S_{2n+1}(x) = 1/2 + π^{−1} ∫_0^x (sin[2(n + 1)y] / sin y) dy,   x ≥ 0.

Let x_1 denote the smallest value of x in (0, π] at which S_{2n+1}(·) has a local
maximum, and let M_1 = S_{2n+1}(x_1). Show that
(a) lim_{n→∞} x_1 = 0 and
(b) lim_{n→∞} M_1 = 1.089367.
[This persistence as n → ∞ of an "overshoot" of the Fourier series beyond the
value of I_{[0,π]}(x) on [0, π] is called the Gibbs phenomenon.]
CHAPTER 1 1
Multivariate Time Series
Many time series arising in practice are best considered as components of
some vector-valued (multivariate) time series {X_t} whose specification includes
not only the serial dependence of each component series {X_{ti}} but also the
interdependence between different component series {X_{ti}} and {X_{tj}}. From a
second order point of view a stationary multivariate time series is determined
by its mean vector, μ = EX_t, and its covariance matrices Γ(h) = E(X_{t+h} X_t′) −
μμ′, h = 0, ±1, .... Most of the basic theory of univariate time series extends
in a natural way to multivariate series but new problems arise. In this chapter
we show how the techniques developed earlier for univariate series are
extended to the multivariate case. Estimation of the basic quantities μ and
Γ(·) is considered in Section 11.2. In Section 11.3 we introduce multivariate
ARMA processes and develop analogues of some of the univariate results in
Chapter 3. The prediction of stationary multivariate processes, and in particular
of ARMA processes, is treated in Section 11.4 by means of a multivariate
generalization of the innovations algorithm used in Chapter 5. This algorithm
is then applied in Section 11.5 to simplify the calculation of the Gaussian
likelihood of the observations {X_1, X_2, ..., X_n} of a multivariate ARMA
process. Estimation of parameters using maximum likelihood and (for
autoregressive models) the Yule–Walker equations is also considered. In
Section 11.6 we discuss the cross spectral density of a bivariate stationary
process {X_t} and its interpretation in terms of the spectral representation of
{X_t}. (The spectral representation is discussed in more detail in Section 11.8.)
The bivariate periodogram and its asymptotic properties are examined
in Section 11.7 and Theorem 11.7.1 gives the asymptotic joint distribution
for a linear process of the periodogram matrices at frequencies λ_1,
λ_2, ..., λ_m ∈ (0, π). Smoothing of the periodogram is used to estimate the
cross-spectrum and hence the cross-amplitude spectrum, phase spectrum and
squared coherency, for which approximate confidence intervals are given.
The chapter ends with an introduction to the spectral representation of an
m-variate stationary process and multivariate linear filtering.
§11.1 Second Order Properties of Multivariate Time Series
Consider m time series {X_{ti}, t = 0, ±1, ±2, ...}, i = 1, ..., m, with EX_{ti}² < ∞
for all t and all i. If all the finite dimensional joint distributions of the random
variables {X_{ti}} were multivariate normal, then the distributional properties of
{X_{ti}} would be completely determined by the means,

μ_{ti} := EX_{ti},   (11.1.1)

and covariances,

γ_{ij}(t + h, t) := E[(X_{t+h,i} − μ_{t+h,i})(X_{tj} − μ_{tj})].   (11.1.2)

Even when the observations {X_{ti}} do not have joint normal distributions,
the quantities μ_{ti} and γ_{ij}(t + h, t) specify the second-order properties, the
covariances providing us with a measure of the dependence, not only between
observations in the same series, but also between observations in different
series.
It is more convenient when dealing with m interrelated series to use vector
notation. Thus we define

X_t := (X_{t1}, ..., X_{tm})′,   t = 0, ±1, ±2, ....   (11.1.3)

The second-order properties of the multivariate time series {X_t} are then
specified by the mean vectors,

μ_t := EX_t = (μ_{t1}, ..., μ_{tm})′,   (11.1.4)

and covariance matrices,

Γ(t + h, t) := E[(X_{t+h} − μ_{t+h})(X_t − μ_t)′] = [γ_{ij}(t + h, t)]_{i,j=1}^{m}.   (11.1.5)

Remark. If {X_t} has complex-valued components, then Γ(t + h, t) is defined as

Γ(t + h, t) = E[(X_{t+h} − μ_{t+h})(X_t − μ_t)*],

where * denotes complex conjugate transpose. However we shall assume
except where explicitly stated otherwise that X_t is real.
As in the univariate case, a particularly important role is played by the
class of multivariate stationary time series, defined as follows.

Definition 11.1.1 (Stationary Multivariate Time Series). The series (11.1.3) with
means and covariances (11.1.4) and (11.1.5) is said to be stationary if μ_t and
Γ(t + h, t), h = 0, ±1, ..., are independent of t.
For a stationary series we shall use the notation,
11
and
( 1 1 . 1 .6)
:= E X , = (Jl i , · . . , Jlm )',
( 1 1 . 1 .7)
We shall refer to μ as the mean of the series and to Γ(h) as the covariance
matrix at lag h. Notice that if {X_t} is stationary with covariance matrix
function Γ(·), then for each i, {X_ti} is stationary with covariance function γ_ii(·).
The function γ_ij(·), i ≠ j, is called the cross-covariance function of the two
series {X_ti} and {X_tj}. It should be noted that γ_ij(·) is not in general the same
as γ_ji(·). The correlation matrix function R(·) is defined by

R(h) := [γ_ij(h)/(γ_ii(0)γ_jj(0))^{1/2}]_{i,j=1}^m = [ρ_ij(h)]_{i,j=1}^m.   (11.1.8)

The function R(·) is the covariance matrix function of the normalized series
obtained by subtracting μ from X_t and then dividing each component by its
standard deviation.
The covariance matrix function Γ(·) = [γ_ij(·)]_{i,j=1}^m of a stationary time
series {X_t} has the properties,
(i) Γ(h) = Γ'(−h),
(ii) |γ_ij(h)| ≤ [γ_ii(0)γ_jj(0)]^{1/2}, i, j = 1, ..., m,
(iii) γ_ii(·) is an autocovariance function, i = 1, ..., m,
(iv) ∑_{j,k=1}^{n} a_j'Γ(j − k)a_k ≥ 0 for all n ∈ {1, 2, ...} and a_1, ..., a_n ∈ ℝ^m.
The first property follows at once from the definition, the second from the
Cauchy-Schwarz inequality, and the third from the observation that γ_ii(·)
is the autocovariance function of the stationary series {X_ti, t = 0, ±1, ...}.
Property (iv) is a statement of the obvious fact that E(∑_{j=1}^{n} a_j'(X_j − μ))² ≥ 0.
Properties (i), (ii), (iii) and (iv) are shared by the correlation matrix function
R(·) = [ρ_ij(·)]_{i,j=1}^m, which has the additional property,
(v) ρ_ii(0) = 1.
(A complete characterization of covariance matrix functions of stationary
processes is given later in Theorem 11.8.1.)
The correlation ρ_ij(0) is the correlation between X_ti and X_tj, which is
generally not equal to 1 if i ≠ j (see Example 11.1.1). It is also possible that
|γ_ij(h)| > |γ_ij(0)| if i ≠ j (see Problem 11.1).
EXAMPLE 11.1.1. Consider the bivariate stationary process {X_t} defined by

X_t1 = Z_t,
X_t2 = Z_t + .75Z_{t−10},

where {Z_t} ~ WN(0, 1). Elementary calculations yield μ = 0,

Γ(−10) = [[0, .75], [0, .75]],   Γ(0) = [[1, 1], [1, 1.5625]],   Γ(10) = [[0, 0], [.75, .75]],

and Γ(j) = 0 otherwise. The correlation matrix function is given by

R(−10) = [[0, .6], [0, .48]],   R(0) = [[1, .8], [.8, 1]],   R(10) = [[0, 0], [.6, .48]],

and R(j) = 0 otherwise.
The simplest multivariate time series is multivariate white noise, defined
quite analogously to univariate white noise.

Definition 11.1.2 (Multivariate White Noise). The m-variate series {Z_t,
t = 0, ±1, ±2, ...} is said to be white noise with mean 0 and covariance matrix
Σ, written

{Z_t} ~ WN(0, Σ),   (11.1.9)

if and only if {Z_t} is stationary with mean vector 0 and covariance matrix
function,

Γ(h) = Σ if h = 0, and Γ(h) = 0 otherwise.   (11.1.10)

We shall also use the notation

{Z_t} ~ IID(0, Σ),   (11.1.11)

to indicate that the random vectors Z_t, t = 0, ±1, ..., are independently and
identically distributed with mean 0 and covariance matrix Σ.
Multivariate white noise {Z_t} is used as a building block from which can
be constructed an enormous variety of multivariate time series. The linear
processes are those of the form

X_t = ∑_{j=−∞}^{∞} C_j Z_{t−j},   {Z_t} ~ WN(0, Σ),   (11.1.12)

where {C_j} is a sequence of matrices whose components are absolutely
summable. The linear process {X_t} is stationary (Problem 11.2) with mean 0
and covariance matrix function,

Γ(h) = ∑_{j=−∞}^{∞} C_{j+h} Σ C_j',   h = 0, ±1, ....   (11.1.13)

We shall reserve the term MA(∞) for a process of the form (11.1.12) with
C_j = 0, j < 0. Thus {X_t} is an MA(∞) process if and only if, for some white
noise sequence {Z_t},
X_t = ∑_{j=0}^{∞} C_j Z_{t−j},

where the matrices C_j are again required to have absolutely summable
components. Multivariate ARMA processes will be discussed in Section 11.3,
where it will be shown in particular that any causal ARMA(p, q) process can
be expressed as an MA(∞) process, while any invertible ARMA(p, q) process
can be expressed as an AR(∞) process,

∑_{j=0}^{∞} A_j X_{t−j} = Z_t,

where the matrices A_j have absolutely summable components.
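As a quick numerical illustration of (11.1.13), the following minimal Python sketch (our own, not code from the text) evaluates Γ(h) for a linear process with only finitely many nonzero coefficient matrices. Applied to Example 11.1.1, where the bivariate series is driven by a single white noise sequence (so the C_j there are 2 × 1 rather than square), it reproduces the covariance matrices listed above.

    import numpy as np

    def linear_process_acvf(C, Sigma, h):
        """Gamma(h) = sum_j C_{j+h} Sigma C_j' for a linear process (11.1.13).
        C is a dict mapping lag j to a coefficient matrix; all other C_j are zero."""
        G = None
        for j, Cj in C.items():
            Cjh = C.get(j + h)                  # C_{j+h}, treated as zero if absent
            if Cjh is not None:
                term = Cjh @ Sigma @ Cj.T
                G = term if G is None else G + term
        return G

    # Example 11.1.1: X_t1 = Z_t, X_t2 = Z_t + .75 Z_{t-10}, {Z_t} ~ WN(0, 1).
    C = {0: np.array([[1.0], [1.0]]), 10: np.array([[0.0], [0.75]])}
    Sigma = np.array([[1.0]])
    print(linear_process_acvf(C, Sigma, 0))     # [[1, 1], [1, 1.5625]]
    print(linear_process_acvf(C, Sigma, -10))   # [[0, .75], [0, .75]]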
Provided the covariance matrix function Γ has the property
∑_{h=−∞}^{∞} |γ_ij(h)| < ∞, i, j = 1, ..., m, then Γ has a spectral density matrix
function,

f(λ) = (1/2π) ∑_{h=−∞}^{∞} e^{−iλh} Γ(h),   −π ≤ λ ≤ π,   (11.1.14)

and Γ can be expressed in terms of f as

Γ(h) = ∫_{−π}^{π} e^{iλh} f(λ) dλ.   (11.1.15)

The second order properties of the stationary process {X_t} can therefore be
described equivalently in terms of f(·) rather than Γ(·). Similarly X_t has a
spectral representation,

X_t = ∫_{(−π,π]} e^{iλt} dZ(λ),   (11.1.16)

where {Z(λ), −π ≤ λ ≤ π} is a process whose components are orthogonal
increment processes satisfying

E(dZ_j(λ) dZ̄_k(μ)) = f_jk(λ) dλ if λ = μ, and 0 if λ ≠ μ.   (11.1.17)

The spectral representations of Γ(·) and {X_t} are discussed in Sections 11.6
and 11.8. They remain valid without absolute summability of γ_ij(·) provided
f(λ) dλ is replaced in (11.1.15) and (11.1.17) by dF(λ) (see Section 11.8).
§11.2 Estimation of the Mean and Covariance Function
As in the univariate case, the estimation of the mean vector and cross-correlation
function of a stationary multivariate time series plays an important
part in describing and modelling the dependence structure between
the component time series. Let {X_t = (X_t1, ..., X_tm)', −∞ < t < ∞} be an
m-dimensional stationary time series with mean vector μ = EX_t and covariance
matrix function

Γ(h) = E[(X_{t+h} − μ)(X_t − μ)'] = [γ_ij(h)]_{i,j=1}^m,

where γ_ij(h) = Cov(X_{t+h,i}, X_{tj}). The cross-correlation function between the
processes {X_ti} and {X_tj} is given by

ρ_ij(h) = γ_ij(h)/(γ_ii(0)γ_jj(0))^{1/2},   h = 0, ±1, ....
Estimation of μ. Based on the observations X_1, ..., X_n, an unbiased estimate
of μ is given by the vector of sample means

X̄_n = (1/n) ∑_{t=1}^{n} X_t.

Observe that the mean μ_j of the j-th time series is estimated by (1/n) ∑_{t=1}^{n} X_tj.
The consistency of the estimator X̄_n under mild conditions on γ_ii(h) can be
established easily by applying Theorem 7.1.1 to the individual time series
{X_ti}, i = 1, ..., m. This gives the following result.
Proposition 11.2.1. If {X_t} is a stationary multivariate time series with mean μ
and covariance function Γ(·), then as n → ∞,

E(X̄_n − μ)'(X̄_n − μ) → 0   if γ_ii(n) → 0, i = 1, ..., m,

and

nE(X̄_n − μ)'(X̄_n − μ) → ∑_{i=1}^{m} ∑_{h=−∞}^{∞} γ_ii(h)   if ∑_{h=−∞}^{∞} |γ_ii(h)| < ∞, i = 1, ..., m.
The vector X̄_n is asymptotically normal under more restrictive assumptions
on the process. In particular, if {X_t} is a multivariate moving average process
then X̄_n is asymptotically normal. This result is given in the following
proposition.

Proposition 11.2.2. Let {X_t} be the stationary multivariate time series,

X_t = μ + ∑_{k=−∞}^{∞} C_k Z_{t−k},   {Z_t} ~ IID(0, Σ),

where {C_k = [C_k(i, j)]_{i,j=1}^m} is a sequence of m × m matrices such that
∑_{k=−∞}^{∞} |C_k(i, j)| < ∞, i, j = 1, ..., m. Then

X̄_n is AN(μ, n^{−1}(∑_{k=−∞}^{∞} C_k) Σ (∑_{k=−∞}^{∞} C_k)').

PROOF. See Problem 11.3.   □
This proposition can be used for constructing confidence regions for μ.
For example, if the covariance matrix Σ_X := n^{−1}(∑_{k=−∞}^{∞} C_k)Σ(∑_{k=−∞}^{∞} C_k)' is
nonsingular and known, then an asymptotic (1 − α) confidence region for μ is

{μ ∈ ℝ^m : (X̄_n − μ)'Σ_X^{−1}(X̄_n − μ) ≤ χ²_{1−α}(m)},   (11.2.1)

where χ²_{1−α}(m) denotes the (1 − α) quantile of the chi-squared distribution with
m degrees of freedom. This region is of little practical use since it is unlikely
that Σ_X will be known while μ is unknown. If we could find a consistent
estimate Σ̂_X of Σ_X and replace Σ_X by Σ̂_X in (11.2.1), we would still have an
asymptotic (1 − α) confidence region for μ. However, in general, Σ_X is a
difficult quantity to estimate. A simpler approach is to construct for each i
individual confidence intervals for μ_i based on X_{1i}, ..., X_{ni}, which are then
combined to form one confidence region for μ. If f_i(ω) is the spectral density
of the i-th process {X_ti}, then by the results of Section 10.4 (see (10.4.11)),

2πf̂_i(0) := ∑_{|h|≤r} (1 − |h|/r) γ̂_ii(h)

is a consistent estimator of 2πf_i(0) = ∑_{h=−∞}^{∞} γ_ii(h), provided r = r_n is a sequence
of numbers satisfying r_n/n → 0 and r_n → ∞. Thus if X̄_{ni} denotes the sample
mean of the i-th process, and Φ_α is the α-quantile of the standard normal
distribution, then by Theorem 7.1.2, the bounds

X̄_{ni} ± Φ_{1−α/2}(2πf̂_i(0)/n)^{1/2}

are asymptotic (1 − α) confidence bounds for μ_i. Hence

P(|μ_i − X̄_{ni}| ≤ Φ_{1−α/2}(2πf̂_i(0)/n)^{1/2}, i = 1, ..., m)
   ≥ 1 − ∑_{i=1}^{m} P(|μ_i − X̄_{ni}| > Φ_{1−α/2}(2πf̂_i(0)/n)^{1/2}),

where the right-hand side converges to 1 − mα as n → ∞. Consequently, as
n → ∞, the region

{μ ∈ ℝ^m : |μ_i − X̄_{ni}| ≤ Φ_{1−α/(2m)}(2πf̂_i(0)/n)^{1/2}, i = 1, ..., m}   (11.2.2)

has a confidence coefficient of at least 1 − α. For large values of m this
confidence region will be substantially larger than an exact (1 − α) region.
Nevertheless it is easy to construct, and in most applications is of reasonable
size provided m is not too large.
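The region (11.2.2) is straightforward to compute from the data. The Python sketch below (our own illustration, not code from the text) estimates 2πf_i(0) by the weighted sum of sample autocovariances displayed above and forms the componentwise bounds at level α/m; the choice r = √n is merely a convenient heuristic for the truncation sequence r_n, not a recommendation from the text.

    import numpy as np
    from scipy.stats import norm

    def bonferroni_mean_region(X, alpha=0.05, r=None):
        """Componentwise confidence bounds for the mean of a stationary m-variate
        series in the spirit of (11.2.2).  X is an (n, m) array of observations."""
        n, m = X.shape
        if r is None:
            r = int(np.sqrt(n))                 # heuristic choice of r_n
        Xbar = X.mean(axis=0)
        half = np.empty(m)
        q = norm.ppf(1 - alpha / (2 * m))       # Bonferroni-adjusted quantile
        for i in range(m):
            x = X[:, i] - Xbar[i]
            # sample autocovariances gamma_hat_ii(h), h = 0, ..., r-1
            gamma = np.array([np.sum(x[h:] * x[:n - h]) / n for h in range(r)])
            two_pi_f0 = gamma[0] + 2 * np.sum((1 - np.arange(1, r) / r) * gamma[1:])
            half[i] = q * np.sqrt(two_pi_f0 / n)
        return Xbar - half, Xbar + half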
Estimation of Γ(h). For simplicity we shall assume throughout the remainder
of this section that m = 2. As in the univariate case, a natural estimate of the
covariance matrix Γ(h) = E[(X_{t+h} − μ)(X_t − μ)'] is

Γ̂(h) = n^{−1} ∑_{t=1}^{n−h} (X_{t+h} − X̄_n)(X_t − X̄_n)'   for 0 ≤ h ≤ n − 1,

Γ̂(h) = n^{−1} ∑_{t=−h+1}^{n} (X_{t+h} − X̄_n)(X_t − X̄_n)'   for −n + 1 ≤ h < 0.
Writing γ̂_ij(h) for the (i, j)-component of Γ̂(h), i, j = 1, 2, we estimate the
cross-correlation function by

ρ̂_ij(h) = γ̂_ij(h)(γ̂_ii(0)γ̂_jj(0))^{−1/2}.

If i = j this reduces to the sample autocorrelation function of the i-th series. We
first show the weak consistency of the estimator γ̂_ij(h) (and hence of ρ̂_ij(h)) for
infinite-order moving averages. We then consider the asymptotic distribution
of γ̂_ij(h) and ρ̂_ij(h) in some special cases of importance.
Theorem 11.2.1. Let {X_t} be the bivariate time series

X_t = ∑_{k=−∞}^{∞} C_k Z_{t−k},   {Z_t} ~ IID(0, Σ),

where {C_k = [C_k(i, j)]_{i,j=1}^2} is a sequence of matrices with ∑_{k=−∞}^{∞} |C_k(i, j)| < ∞,
i, j = 1, 2. Then as n → ∞,

γ̂_ij(h) → γ_ij(h)  in probability,

and

ρ̂_ij(h) → ρ_ij(h)  in probability,

for each fixed h ≥ 0 and for i, j = 1, 2.
PROOF. We shall show that Γ̂(h) →_p Γ(h), where convergence in probability of
random matrices means convergence in probability of all of the components
of the matrix. From the definition of Γ̂(h) we have, for 0 ≤ h ≤ n − 1,

Γ̂(h) = n^{−1} ∑_{t=1}^{n−h} X_{t+h}X_t' − n^{−1} X̄_n ∑_{t=1}^{n−h} X_t' − n^{−1} ∑_{t=1}^{n−h} X_{t+h}X̄_n'
        + n^{−1}(n − h)X̄_nX̄_n'.   (11.2.3)

Since EX_t = 0, we find from Proposition 11.2.1 that X̄_n = o_p(1), n^{−1} ∑_{t=1}^{n−h} X_t =
o_p(1) and n^{−1} ∑_{t=1}^{n−h} X_{t+h} = o_p(1). Consequently we can write

Γ̂(h) = Γ*(h) + o_p(1),   (11.2.4)

where

Γ*(h) = n^{−1} ∑_{t=1}^{n} X_{t+h}X_t' = n^{−1} ∑_{t=1}^{n} ∑_{i=−∞}^{∞} ∑_{j=−∞}^{∞} C_{i+h} Z_{t−i}Z_{t−j}' C_j'.

Observe that for i ≠ j, the time series {Z_{t−i,1}Z_{t−j,2}, t = 0, ±1, ...} is white
noise, so that by Theorem 7.1.1, n^{−1} ∑_{t=1}^{n} Z_{t−i,1}Z_{t−j,2} →_p 0. Applying this
argument to the other three components of Z_{t−i}Z_{t−j}', we obtain

n^{−1} ∑_{t=1}^{n} Z_{t−i}Z_{t−j}' →_p 0_{2×2},   i ≠ j,

where 0_{2×2} denotes the 2 × 2 zero matrix. Hence for m fixed,

G_m*(h) := ∑_{|i|≤m} ∑_{|j|≤m, j≠i} C_{i+h}(n^{−1} ∑_{t=1}^{n} Z_{t−i}Z_{t−j}')C_j' →_p 0_{2×2}.

For any matrix A define |A| and EA to be the matrices of absolute values
and expected values, respectively, of the elements of A. Then, writing G*(h)
for the corresponding sum over all pairs i ≠ j,

E|G*(h) − G_m*(h)| = E|∑_{i≠j, |i|>m or |j|>m} C_{i+h}(n^{−1} ∑_{t=1}^{n} Z_{t−i}Z_{t−j}')C_j'|
                    ≤ ∑_{i≠j, |i|>m or |j|>m} |C_{i+h}|(n^{−1} ∑_{t=1}^{n} E|Z_{t−i}Z_{t−j}'|)|C_j'|.

The latter bound is independent of n and converges to 0 as m → ∞. Hence

lim_{m→∞} lim sup_{n→∞} E|G*(h) − G_m*(h)| = 0,

which, by Proposition 6.3.9, implies that

G*(h) →_p 0_{2×2}.

Now

Γ*(h) = G*(h) + ∑_i C_{i+h}(n^{−1} ∑_{t=1}^{n} Z_{t−i}Z_{t−i}')C_i'
       = G*(h) + ∑_i C_{i+h}(n^{−1} ∑_{t=1}^{n} Z_tZ_t')C_i' + ∑_i C_{i+h}(n^{−1}U_{ni})C_i',

where U_{ni} = ∑_{t=1−i}^{n−i} Z_tZ_t' − ∑_{t=1}^{n} Z_tZ_t' is a sum of 2|i| random matrices if
|i| < n and a sum of 2n random matrices if |i| ≥ n. Hence

E|∑_i C_{i+h}(n^{−1}U_{ni})C_i'| ≤ 2n^{−1} ∑_{|i|≤n} |i||C_{i+h}||Σ||C_i'| + 2 ∑_{|i|>n} |C_{i+h}||Σ||C_i'|,

and by the absolute summability of the components of the matrices {C_i}, this
bound goes to zero as n → ∞. It therefore follows that

Γ*(h) = ∑_i C_{i+h}(n^{−1} ∑_{t=1}^{n} Z_tZ_t')C_i' + o_p(1).

By applying the weak law of large numbers to the individual components of
Z_tZ_t', we find that

n^{−1} ∑_{t=1}^{n} Z_tZ_t' →_p Σ,

and hence

Γ*(h) →_p ∑_i C_{i+h} Σ C_i' = Γ(h).

Consequently, from (11.2.4),

Γ̂(h) →_p Γ(h).   (11.2.5)

The convergence of ρ̂_ij(h) to ρ_ij(h) follows at once from (11.2.5) and
Proposition 6.1.4.   □
In general, the derivation of the asymptotic distribution of the sample
cross-correlation function is quite complicated even for multivariate moving
averages. The methods of Section 7.3 are not immediately adaptable to the
multivariate case. An important special case arises when the two component
time series are independent moving averages. The asymptotic distribution of
ρ̂_12(h) for such a process is given in the following theorem.
Theorem 11.2.2. Suppose that

X_t1 = ∑_{j=−∞}^{∞} α_j Z_{t−j,1},   {Z_t1} ~ IID(0, σ_1²),

and

X_t2 = ∑_{j=−∞}^{∞} β_j Z_{t−j,2},   {Z_t2} ~ IID(0, σ_2²),

where the two sequences {Z_t1} and {Z_t2} are independent, ∑_j |α_j| < ∞ and
∑_j |β_j| < ∞. Then if h ≥ 0,

ρ̂_12(h) is AN(0, n^{−1} ∑_{j=−∞}^{∞} ρ_11(j)ρ_22(j)).

If h, k ≥ 0 and h ≠ k, then the vector (ρ̂_12(h), ρ̂_12(k))' is asymptotically normal
with mean 0, variances as above and covariance

n^{−1} ∑_{j=−∞}^{∞} ρ_11(j)ρ_22(j + k − h).
PROOF. It follows easily from (11.2.3) and Proposition 11.2.1 that

γ̂_12(h) = γ_12*(h) + o_p(n^{−1/2}),   (11.2.6)

where

γ_12*(h) = n^{−1} ∑_{t=1}^{n} X_{t+h,1}X_{t,2} = n^{−1} ∑_{t=1}^{n} ∑_i ∑_j α_{i+h}β_j Z_{t−i,1}Z_{t−j,2}.

Since Eγ_12*(h) = 0, we have

n Var(γ_12*(h)) = n^{−1} ∑_{s=1}^{n} ∑_{t=1}^{n} ∑_i ∑_j ∑_k ∑_l α_{i+h}β_jα_{k+h}β_l E[Z_{s−i,1}Z_{s−j,2}Z_{t−k,1}Z_{t−l,2}].

By the independence assumptions,

E[Z_{s−i,1}Z_{s−j,2}Z_{t−k,1}Z_{t−l,2}] = σ_1²σ_2²  if s − i = t − k and s − j = t − l,
and 0 otherwise,   (11.2.7)

so that

n Var(γ_12*(h)) = ∑_{|m|<n} (1 − |m|/n) γ_11(m)γ_22(m).

Applying the dominated convergence theorem to the last expression, we find that

n Var(γ_12*(h)) → ∑_{j=−∞}^{∞} γ_11(j)γ_22(j)  as n → ∞.   (11.2.8)
Next we show that γ_12*(h) is asymptotically normal. For m fixed,
we first consider the (2m + h)-dependent, strictly stationary time series
{∑_{|i|≤m} ∑_{|j|≤m} α_iβ_j Z_{t+h−i,1}Z_{t−j,2}, t = 0, ±1, ...}. By Theorem 6.4.2 and the
calculation leading up to (11.2.8),

n^{−1} ∑_{t=1}^{n} ∑_{|i|≤m} ∑_{|j|≤m} α_iβ_j Z_{t+h−i,1}Z_{t−j,2}  is AN(0, n^{−1}a_m),

where a_m is the limiting value of n times the variance of the truncated sum,
obtained exactly as in the calculation leading to (11.2.8). Now as m → ∞,
a_m → ∑_j γ_11(j)γ_22(j). Moreover, the above calculations can be used to show that

lim_{m→∞} lim sup_{n→∞} nE|γ_12*(h) − n^{−1} ∑_{t=1}^{n} ∑_{|i|≤m} ∑_{|j|≤m} α_iβ_j Z_{t+h−i,1}Z_{t−j,2}|² = 0.

This implies, with Proposition 6.3.9, that

γ_12*(h) is AN(0, n^{−1} ∑_{k=−∞}^{∞} γ_11(k)γ_22(k)).   (11.2.9)

Since γ̂_11(0) →_p γ_11(0) and γ̂_22(0) →_p γ_22(0), we find from (11.2.6), (11.2.9) and
Proposition 6.3.8 that

ρ̂_12(h) = γ̂_12(h)(γ̂_11(0)γ̂_22(0))^{−1/2}  is AN(0, n^{−1} ∑_{j=−∞}^{∞} ρ_11(j)ρ_22(j)).

Finally, after showing that

n Cov(γ_12*(h), γ_12*(k)) → ∑_{j=−∞}^{∞} γ_11(j)γ_22(j + k − h),

the same argument, together with the Cramér-Wold device, can be used to
establish the last statement of the theorem.   □
This theorem plays an important role in testing for correlation between
two processes. If one of the two processes is white noise, then ρ̂_12(h) is
AN(0, n^{−1}), in which case it is straightforward to test the hypothesis that
ρ_12(h) = 0. However, if neither process is white noise, then a value of |ρ̂_12(h)|
which is large relative to n^{−1/2} does not necessarily indicate that ρ_12(h) is
different from zero. For example, suppose that {X_t1} and {X_t2} are two
independent AR(1) processes with ρ_11(h) = ρ_22(h) = .8^{|h|}. Then the asymptotic
variance of ρ̂_12(h) is n^{−1}(1 + 2 ∑_{k=1}^{∞} (.64)^k) = 4.556n^{−1}. It would therefore not
be surprising to observe a value of ρ̂_12(h) as large as 3n^{−1/2} even though {X_t1}
and {X_t2} are independent. If on the other hand ρ_11(h) = .8^{|h|} and ρ_22(h) =
(−.8)^{|h|}, then the asymptotic variance of ρ̂_12(h) is .2195n^{−1} and an observed
value of 3n^{−1/2} for ρ̂_12(h) would be very unlikely.
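These two asymptotic variances are easy to verify numerically from Theorem 11.2.2; the short Python check below (our own illustration) simply sums ρ_11(j)ρ_22(j) over a long range of lags.

    import numpy as np

    def asymptotic_variance(rho1, rho2, max_lag=1000):
        """n times the asymptotic variance of rho_hat_12(h) for two independent
        series, i.e. sum_j rho_11(j) * rho_22(j) as in Theorem 11.2.2."""
        j = np.arange(-max_lag, max_lag + 1)
        return np.sum(rho1(j) * rho2(j))

    print(asymptotic_variance(lambda j: 0.8 ** np.abs(j),
                              lambda j: 0.8 ** np.abs(j)))      # approx 4.556
    print(asymptotic_variance(lambda j: 0.8 ** np.abs(j),
                              lambda j: (-0.8) ** np.abs(j)))   # approx 0.2195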
Testing for Independence of Two Stationary Time Series. Since by Theorem
11.2.2 the asymptotic distribution of ρ̂_12(h) depends on both ρ_11(·) and ρ_22(·),
any test for independence of the two component series cannot be based solely
on estimated values of ρ_12(h), h = 0, ±1, ..., without taking into account the
nature of the two component series.
This difficulty can be circumvented by "prewhitening" the two series before
computing the cross-correlations ρ̂_12(h), i.e. by transforming the two series to
white noise by application of suitable filters. If {X_t1} and {X_t2} are invertible
ARMA(p, q) processes this can be achieved by the transformations,

Z_ti = ∑_{j=0}^{∞} π_j^{(i)} X_{t−j,i},

where ∑_{j=0}^{∞} π_j^{(i)} z^j = φ^{(i)}(z)/θ^{(i)}(z), |z| ≤ 1, and φ^{(i)}, θ^{(i)} are the autoregressive and
moving average polynomials of the i-th series, i = 1, 2.
Since in practice the true model is nearly always unknown and since the
data X_tj, t ≤ 0, are not available, it is convenient to replace the sequences {Z_ti},
i = 1, 2, by the residuals {Ŵ_ti, t = 1, ..., n} (see (9.4.1)) which, if we assume that
the fitted ARMA(p, q) models are in fact the true models, are white noise
sequences for i = 1, 2.
To test the hypothesis H_0 that {X_t1} and {X_t2} are independent series, we
observe that under H_0 the corresponding two prewhitened series {Z_t1} and
{Z_t2} are also independent. Under H_0, Theorem 11.2.2 implies that the sample
cross-correlations ρ̂_12(h), ρ̂_12(k), h ≠ k, of {Z_t1} and {Z_t2} are asymptotically
independent normal with means 0 and variances n^{−1}. An approximate test for
independence can therefore be obtained by comparing the values of |ρ̂_12(h)|
with 1.96n^{−1/2}, exactly as in Example 7.2.1. If we prewhiten only one of
the two original series, say {X_t1}, then under H_0 Theorem 11.2.2 implies
that the sample cross-correlations ρ̂_12(h), ρ̂_12(k), h ≠ k, of {Z_t1} and {X_t2}
are asymptotically normal with means 0, variances n^{−1} and covariance
n^{−1}ρ_22(k − h), where ρ_22(·) is the autocorrelation function of {X_t2}. Hence for
any fixed h, ρ̂_12(h) also falls (under H_0) between the bounds ±1.96n^{−1/2} with
a probability of approximately .95.
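A minimal sketch of this procedure in Python (our own illustration, not code from the text): fit an AR model to each series by ordinary least squares, standing in for the maximum likelihood fits the text has in mind, take the residuals as the approximately whitened series, and compare their sample cross-correlations with ±1.96n^{−1/2}.

    import numpy as np

    def ar_residuals(x, p):
        """Least-squares AR(p) fit to a univariate series; the residuals play the
        role of the whitened series {W_t} in the text."""
        X = np.column_stack([x[p - j - 1:len(x) - j - 1] for j in range(p)])
        y = x[p:]
        phi, *_ = np.linalg.lstsq(X, y, rcond=None)
        return y - X @ phi

    def independence_check(x1, x2, p1=1, p2=1, max_lag=20):
        w1, w2 = ar_residuals(x1, p1), ar_residuals(x2, p2)
        n = min(len(w1), len(w2))
        w1, w2 = w1[-n:] - w1[-n:].mean(), w2[-n:] - w2[-n:].mean()
        s = np.sqrt(np.sum(w1 ** 2) * np.sum(w2 ** 2))
        rho = {h: np.sum(w1[h:] * w2[:n - h]) / s for h in range(max_lag + 1)}
        bound = 1.96 / np.sqrt(n)
        return {h: (r, abs(r) > bound) for h, r in rho.items()}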
EXAMPLE 11.2.1. The sample cross-correlation function ρ̂_12(·) of a bivariate
time series of length n = 200 is displayed in Figure 11.1. Without knowing the
correlation function of each process, it is impossible to decide if the two
processes are uncorrelated with one another. Note that several of the values
of ρ̂_12(h) lie outside the bounds ±1.96n^{−1/2} = ±.139. Based on the sample
autocorrelation function and partial autocorrelation function of the first
process, we modelled {X_t1} as an AR(1) process. The sample cross-correlation
function ρ̂_12(·) between the residuals {Ŵ_t1, t = 1, ..., 200} for this model and
{X_t2, t = 1, ..., 200} is given in Figure 11.2. All except one of the values ρ̂_12(h)
lie between the bounds ±.139, suggesting by Theorem 11.2.2 that the time
series {Ŵ_t1} (and hence {X_t1}) is uncorrelated with the series {X_t2}.
[Figure 11.1. The sample cross-correlation function ρ̂_12(h) between {X_t1} and {X_t2},
Example 11.2.1, showing the bounds ±1.96n^{−1/2}.]
[Figure 11.2. The sample cross-correlation function between the residuals {Ŵ_t1} and
{X_t2}, Example 11.2.1, showing the bounds ±1.96n^{−1/2}.]
The data for this example were in fact generated from two independent AR(1) processes
and the cross-correlations were computed using the program TRANS.
EXAMPLE 11.2.2 (Sales with a Leading Indicator). In this example we consider
the sales data {Y_t2, t = 1, ..., 150} with leading indicator {Y_t1, t = 1, ..., 150}
given by Box and Jenkins (1976), p. 537. The autocorrelation functions of {Y_t1}
and {Y_t2} suggest that both series are non-stationary. Application of the
operator (1 − B) yields the two differenced series {D_t1} and {D_t2} whose
properties are compatible with those of low order ARMA processes. Using
the program PEST, it is found that the models

D_t1 − .0228 = Z_t1 − .474Z_{t−1,1},   {Z_t1} ~ WN(0, .0779),   (11.2.10)

D_t2 − .838D_{t−1,2} − .0676 = Z_t2 − .610Z_{t−1,2},   {Z_t2} ~ WN(0, 1.754),   (11.2.11)

provide a good fit to the series {D_t1} and {D_t2}, yielding the "whitened"
series of residuals {Ŵ_t1} and {Ŵ_t2} with sample variances .0779 and 1.754
respectively.
The sample cross-correlation function of {D_t1} and {D_t2} is shown in Figure
11.3. Without taking into account the autocorrelation structures of {D_t1}
and {D_t2} it is not possible to draw any conclusions from this function.
[Figure 11.3. The sample cross-correlation function between {D_t1} and {D_t2}, Example
11.2.2.]
Examination of the sample cross-correlation function of the whitened series
{Ŵ_t1} and {Ŵ_t2} is however much more informative. From Figure 11.4 it is
apparent that there is one large sample cross-correlation (between Ŵ_t1 and
Ŵ_{t+3,2}) and that the others are all between ±1.96n^{−1/2}. Under the assumption
that {Ŵ_t1} and {Ŵ_t2} are jointly Gaussian, Bartlett's formula (see Corollary
11.2.1 below) indicates the compatibility of the cross-correlations with a model
for which

ρ_12(−3) ≠ 0   and   ρ_12(h) = 0,  h ≠ −3.

The value ρ̂_12(−3) = .969 suggests the model,

Ŵ_t2 = 4.74Ŵ_{t−3,1} + N_t,   (11.2.12)

where the stationary noise {N_t} has small variance compared with {Ŵ_t2} and
is uncorrelated with {Ŵ_t1}. The coefficient 4.74 is the square root of the ratio
of sample variances of {Ŵ_t2} and {Ŵ_t1}. A study of the sample values of
{Ŵ_t2 − 4.74Ŵ_{t−3,1}} suggests the model for {N_t},

(1 + .345B)N_t = U_t,   {U_t} ~ WN(0, .0782).   (11.2.13)

Finally, replacing {Z_t1} and {Z_t2} in (11.2.10) and (11.2.11) by {Ŵ_t1} and {Ŵ_t2}
and using (11.2.12) and (11.2.13), we obtain a model relating {D_t1}, {D_t2} and
{U_t}, namely,
[Figure 11.4. The sample cross-correlation function between the whitened series {Ŵ_t1}
and {Ŵ_t2}, Example 11.2.2.]
D_t2 + .0765 = (1 − .610B)(1 − .838B)^{−1}[4.74(1 − .474B)^{−1}D_{t−3,1} + (1 + .345B)^{−1}U_t].

This model should be compared with the one derived later (Section 13.1) by
the more systematic technique of transfer function modelling.
Theorem 11.2.3 (Bartlett's Formula). If {X_t} is a bivariate Gaussian process
(i.e. if all of the finite dimensional distributions of {(X_t1, X_t2)', t = 0, ±1, ...}
are multivariate normal) and if the autocovariances satisfy

∑_{h=−∞}^{∞} |γ_ij(h)| < ∞,   i, j = 1, 2,

then

lim_{n→∞} n Cov(ρ̂_12(h), ρ̂_12(k)) = ∑_{j=−∞}^{∞} [ρ_11(j)ρ_22(j + k − h) + ρ_12(j + k)ρ_21(j − h)
   − ρ_12(h){ρ_11(j)ρ_12(j + k) + ρ_22(j)ρ_21(j − k)}
   − ρ_12(k){ρ_11(j)ρ_12(j + h) + ρ_22(j)ρ_21(j − h)}
   + ρ_12(h)ρ_12(k){½ρ_11²(j) + ρ_12²(j) + ½ρ_22²(j)}].

[See Bartlett (1955).]
Corollary 11.2.1. If {X_t} satisfies the conditions of Theorem 11.2.3, if either
{X_t1} or {X_t2} is white noise, and if

ρ_12(h) = 0,   h ∉ [a, b],

then

lim_{n→∞} n Var(ρ̂_12(h)) = 1,   h ∉ [a, b].

PROOF. The limit is evaluated by direct application of Theorem 11.2.3.   □
§11.3 Multivariate ARMA Processes

As in the univariate case, we can define an extremely useful class of multivariate
stationary processes {X_t} by requiring that {X_t} should satisfy a set of linear
difference equations with constant coefficients.
Definition 11.3.1 (Multivariate ARMA(p, q) Process). {X_t, t = 0, ±1, ...} is an
m-variate ARMA(p, q) process if {X_t} is a stationary solution of the difference
equations,

X_t − Φ_1X_{t−1} − ··· − Φ_pX_{t−p} = Z_t + Θ_1Z_{t−1} + ··· + Θ_qZ_{t−q},   (11.3.1)

where Φ_1, ..., Φ_p, Θ_1, ..., Θ_q are real m × m matrices and {Z_t} ~ WN(0, Σ).

The equations (11.3.1) can be written in the more compact form

Φ(B)X_t = Θ(B)Z_t,   {Z_t} ~ WN(0, Σ),   (11.3.2)

where Φ(z) := I − Φ_1z − ··· − Φ_pz^p and Θ(z) := I + Θ_1z + ··· + Θ_qz^q are
matrix-valued polynomials, I is the m × m identity matrix and B as usual
denotes the backward shift operator. (Each component of the matrices Φ(z),
Θ(z) is a polynomial with real coefficients and degree less than or equal to p,
q respectively.)
EXAMPLE 11.3.1 (Multivariate AR(1) Process). This process satisfies

X_t = ΦX_{t−1} + Z_t,   {Z_t} ~ WN(0, Σ).   (11.3.3)

By exactly the same argument used in Example 3.2.1, we can express X_t as

X_t = ∑_{j=0}^{∞} Φ^j Z_{t−j},   (11.3.4)

provided all the eigenvalues of Φ are less than 1 in absolute value, i.e. provided

det(I − zΦ) ≠ 0  for all z ∈ ℂ such that |z| ≤ 1.   (11.3.5)

If this condition is satisfied then the series (11.3.4) converges (componentwise)
both in mean square and absolutely with probability 1. Moreover it is the
unique stationary solution of (11.3.3). The condition (11.3.5) is the multivariate
analogue of the condition |φ_1| < 1 required for the existence of the causal
representation (11.3.4) in the univariate case.
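Condition (11.3.5) is equivalent to requiring that every eigenvalue of Φ have modulus less than 1, which is how one would check it numerically; a small Python sketch (our own illustration):

    import numpy as np

    def is_causal_ar1(Phi):
        """Check condition (11.3.5): det(I - z*Phi) != 0 for |z| <= 1, i.e. all
        eigenvalues of Phi lie strictly inside the unit circle."""
        return np.max(np.abs(np.linalg.eigvals(Phi))) < 1

    Phi = np.array([[0.7, 0.0],
                    [0.0, 0.6]])     # the AR coefficient matrix used in (11.4.33)
    print(is_causal_ar1(Phi))        # True: eigenvalues are .7 and .6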
Causality and invertibility of a general ARMA(p, q) model are defined
precisely as in Definitions 3.1.3 and 3.1.4 respectively, the only difference being
that the coefficients ψ_j, π_j in the representations X_t = ∑_{j=0}^{∞} ψ_jZ_{t−j} and Z_t =
∑_{j=0}^{∞} π_jX_{t−j} are replaced by matrices Ψ_j and Π_j whose components are
required to be absolutely summable. The following two theorems provide us
with criteria for causality and invertibility analogous to those of Theorems
3.1.1 and 3.1.2.
Theorem 11.3.1 (Causality Criterion). If

det Φ(z) ≠ 0  for all z ∈ ℂ such that |z| ≤ 1,   (11.3.6)

then (11.3.2) has exactly one stationary solution,

X_t = ∑_{j=0}^{∞} Ψ_jZ_{t−j},   (11.3.7)

where the matrices Ψ_j are determined uniquely by

Ψ(z) := ∑_{j=0}^{∞} Ψ_jz^j = Φ^{−1}(z)Θ(z),   |z| ≤ 1.   (11.3.8)
PROOF. The condition (11.3.6) implies that there exists ε > 0 such that Φ^{−1}(z)
exists for |z| < 1 + ε. Since each of the m² elements of Φ^{−1}(z) is a rational
function of z with no singularities in {|z| < 1 + ε}, Φ^{−1}(z) has the power series
expansion,

Φ^{−1}(z) = ∑_{j=0}^{∞} A_jz^j = A(z),   |z| < 1 + ε.

Consequently A_j(1 + ε/2)^j → 0 (componentwise) as j → ∞, so there exists
K ∈ (0, ∞), independent of j, such that all the components of A_j are bounded
in absolute value by K(1 + ε/2)^{−j}, j = 0, 1, 2, .... In particular this implies
absolute summability of the components of the matrices A_j. Moreover we have

A(z)Φ(z) = I  for |z| ≤ 1,

where I is the (m × m) identity matrix. By Proposition 3.1.1, if {X_t} is a
stationary solution of (11.3.2) we can apply the operator A(B) to each side of
this equation to obtain

X_t = A(B)Θ(B)Z_t.

Thus we have the desired representation,

X_t = ∑_{j=0}^{∞} Ψ_jZ_{t−j},

where the sequence {Ψ_j} is determined by (11.3.8).
Conversely if X_t = ∑_{j=0}^{∞} Ψ_jZ_{t−j} with {Ψ_j} defined by (11.3.8), then

Φ(B)X_t = Φ(B)Ψ(B)Z_t = Θ(B)Z_t,

showing that {Ψ(B)Z_t} is a stationary solution of (11.3.2).
Combining the results of the two preceding paragraphs we conclude that
if det Φ(z) ≠ 0 for |z| ≤ 1, then the unique stationary solution of (11.3.2) is the
causal solution (11.3.7).   □
Since the analogous criterion for invertibility is established in the same way
(see also the proof of Theorem 3.1.2), we shall simply state the result and leave
the proof as an exercise.

Theorem 11.3.2 (Invertibility Criterion). If

det Θ(z) ≠ 0  for all z ∈ ℂ such that |z| ≤ 1,   (11.3.9)

and {X_t} is a stationary solution of (11.3.2), then

Z_t = ∑_{j=0}^{∞} Π_jX_{t−j},   (11.3.10)

where the matrices Π_j are determined uniquely by

Π(z) := ∑_{j=0}^{∞} Π_jz^j = Θ^{−1}(z)Φ(z),   |z| ≤ 1.   (11.3.11)

PROOF. Problem 11.4.   □
Remark. The matrices Ψ_j and Π_j of Theorems 11.3.1 and 11.3.2 can easily be
found recursively from the equations

Ψ_j = ∑_{i=1}^{j} Φ_iΨ_{j−i} + Θ_j,   j = 1, 2, ...,   (11.3.12)

and

Π_j = −∑_{i=1}^{j} Θ_iΠ_{j−i} − Φ_j,   j = 1, 2, ...,   (11.3.13)

where Θ_j = 0, j > q, and Φ_i = 0, i > p. These equations are established by
comparing coefficients of z^j in the power series identities (11.3.8) and (11.3.11)
after multiplying through by Φ(z) and Θ(z) respectively.
EXAMPLE 11.3.2. For the multivariate ARMA(1,1) process with

Φ_1 = [[.5, .5], [0, .5]]   and   Θ_1 = Φ_1',

an elementary calculation using (11.3.8) gives

Ψ(z) = (1 − .5z)^{−2} [[1, .5z(1 + .5z)], [.5z(1 − .5z), 1 − .25z²]],   |z| ≤ 1.   (11.3.14)

The coefficient matrices Ψ_j in the representation (11.3.7) are found, either by
expanding each component of (11.3.14) or by using the recursion relation
(11.3.12), to be Ψ_0 = I and

Ψ_j = 2^{−j} [[j + 1, 2j − 1], [1, 2]],   j = 1, 2, ....

It is a simple matter to carry out the same calculations for any multivariate
ARMA process satisfying (11.3.6), although the algebra becomes tedious for
larger values of m. The calculation of the matrices Π_j for an invertible
ARMA process is of course quite analogous. For numerical calculation of the
coefficient matrices, the simplest method is to use the recursions (11.3.12) and
(11.3.13).
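As a numerical check on (11.3.12), here is a short Python sketch (our own illustration) that generates the Ψ_j matrices for the ARMA(1,1) model of Example 11.3.2.

    import numpy as np

    def psi_matrices(Phi, Theta, n):
        """Psi_1, ..., Psi_n from the recursion (11.3.12):
        Psi_j = Theta_j + sum_{i=1}^{j} Phi_i Psi_{j-i}, with Psi_0 = I,
        Theta_j = 0 for j > q and Phi_i = 0 for i > p."""
        m = Phi[0].shape[0]
        Psi = [np.eye(m)]
        for j in range(1, n + 1):
            Pj = Theta[j - 1] if j <= len(Theta) else np.zeros((m, m))
            for i in range(1, min(j, len(Phi)) + 1):
                Pj = Pj + Phi[i - 1] @ Psi[j - i]
            Psi.append(Pj)
        return Psi

    Phi1 = np.array([[0.5, 0.5], [0.0, 0.5]])
    Psi = psi_matrices([Phi1], [Phi1.T], 3)
    print(Psi[2])     # 2**-2 * [[3, 3], [1, 2]] = [[.75, .75], [.25, .5]]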
The Covariance Matrix Function of a Causal ARMA Process

From the representation (11.3.7) we can express the covariance matrix Γ(h) =
E(X_{t+h}X_t') of the causal process (11.3.1) as

Γ(h) = ∑_{k=0}^{∞} Ψ_{h+k} Σ Ψ_k',   h = 0, ±1, ±2, ...,

where the matrices Ψ_j are found from (11.3.8) or (11.3.12) and Ψ_j := 0 for j < 0.
It is not difficult to show (Problem 11.5) that there exist ε ∈ (0, 1) and a
constant K such that the components γ_ij(h) of Γ(h) satisfy |γ_ij(h)| < Kε^{|h|} for all
i, j and h.
The covariance matrices Γ(h), h = 0, ±1, ±2, ..., can be determined by
solving the Yule-Walker equations,

Γ(j) − ∑_{r=1}^{p} Φ_rΓ(j − r) = ∑_{j≤r≤q} Θ_r Σ Ψ_{r−j}',   j = 0, 1, 2, ...,   (11.3.15)

obtained by post-multiplying (11.3.1) by X_{t−j}' and taking expectations. The
first (p + 1) of the equations (11.3.15) can be solved for the components of
Γ(0), ..., Γ(p) using the fact that Γ(−h) = Γ'(h). The remaining equations then
give Γ(p + 1), Γ(p + 2), ... recursively.
The covariance matrix generating function is defined (cf. (3.5.1)) as

G(z) = ∑_{h=−∞}^{∞} Γ(h)z^h,   (11.3.16)

which can be expressed (Problem 11.7) as

G(z) = Ψ(z)ΣΨ'(z^{−1}) = Φ^{−1}(z)Θ(z)ΣΘ'(z^{−1})Φ'^{−1}(z^{−1}).   (11.3.17)
§11.4 Best Linear Predictors of Second Order Random Vectors

Let {X_t = (X_t1, ..., X_tm)', t = 0, ±1, ±2, ...} be an m-variate time series with
mean EX_t = 0 and covariance function given by the m × m matrices,

K(i, j) = E(X_iX_j').

If Y = (Y_1, ..., Y_m)' is a random vector with finite second moments, we define

P(Y|X_1, ..., X_n) := (P_{S_n}Y_1, ..., P_{S_n}Y_m)',   (11.4.1)

where S_n = sp{X_tj, t = 1, ..., n; j = 1, ..., m}. If U = (U_1, ..., U_m)' is a
random vector, we shall say that U ∈ S_n if U_i ∈ S_n, i = 1, ..., m. It then follows
from the projection theorem that the vector P(Y|X_1, ..., X_n) is characterized
by the two properties:

P(Y|X_1, ..., X_n) ∈ S_n   (11.4.2)

and

(Y − P(Y|X_1, ..., X_n)) ⊥ X_i,   i = 1, ..., n,   (11.4.3)

where we say that two m-dimensional random vectors X and Y are orthogonal
(written X ⊥ Y) if E(XY') = 0_{m×m}.
The best linear predictor of X_{n+1} based on the observations X_1, ..., X_n
is obtained on replacing Y by X_{n+1} in (11.4.1), i.e.

X̂_{n+1} = 0  if n = 0,   and   X̂_{n+1} = P(X_{n+1}|X_1, ..., X_n)  if n ≥ 1.

Since X̂_{n+1} ∈ S_n, there exist m × m matrices Φ_{n1}, ..., Φ_{nn} such that

X̂_{n+1} = Φ_{n1}X_n + ··· + Φ_{nn}X_1,   n = 1, 2, ....   (11.4.4)

Moreover, from (11.4.3), we have X_{n+1} − X̂_{n+1} ⊥ X_{n+1−i}, i = 1, ..., n, or
equivalently,

E(X_{n+1}X_{n+1−i}') = E(X̂_{n+1}X_{n+1−i}'),   i = 1, ..., n.   (11.4.5)

When X̂_{n+1} is replaced by the expression in (11.4.4), these prediction
equations become

∑_{j=1}^{n} Φ_{nj}K(n + 1 − j, n + 1 − i) = K(n + 1, n + 1 − i),   i = 1, ..., n.

In the case when {X_t} is stationary with K(i, j) = Γ(i − j), the prediction
equations simplify to the m-dimensional analogues of (5.1.5), i.e.

∑_{j=1}^{n} Φ_{nj}Γ(i − j) = Γ(i),   i = 1, ..., n.   (11.4.6)
The coefficients {Φ_{nj}} may be computed recursively using the multivariate
Durbin-Levinson algorithm given by Whittle (1963). Unlike the univariate
algorithm, however, the multivariate version requires the simultaneous
solution of two sets of equations, one arising in the calculation of the forward
predictor, P(X_{n+1}|X_1, ..., X_n), and the other in the calculation of the
backward predictor, P(X_0|X_1, ..., X_n). Let Φ̃_{n1}, ..., Φ̃_{nn} be m × m coefficient
matrices satisfying

P(X_0|X_1, ..., X_n) = Φ̃_{n1}X_1 + ··· + Φ̃_{nn}X_n,   n = 1, 2, ....   (11.4.7)

Then from (11.4.3),

∑_{j=1}^{n} Φ̃_{nj}Γ(j − i) = Γ(−i),   i = 1, ..., n.   (11.4.8)

The two prediction error covariance matrices will be denoted by

V_n = E(X_{n+1} − X̂_{n+1})(X_{n+1} − X̂_{n+1})',
Ṽ_n = E(X_0 − P(X_0|X_1, ..., X_n))(X_0 − P(X_0|X_1, ..., X_n))'.

Observe from (11.4.5) that for n ≥ 1,

V_n = E[(X_{n+1} − X̂_{n+1})X_{n+1}'] = Γ(0) − Φ_{n1}Γ(−1) − ··· − Φ_{nn}Γ(−n),   (11.4.9)

and similarly that

Ṽ_n = Γ(0) − Φ̃_{n1}Γ(1) − ··· − Φ̃_{nn}Γ(n).   (11.4.10)

We also need to introduce the matrices

Δ_n = E[(X_{n+1} − X̂_{n+1})X_0'] = Γ(n + 1) − Φ_{n1}Γ(n) − ··· − Φ_{nn}Γ(1),   (11.4.11)

and

Δ̃_n = E[(X_0 − P(X_0|X_1, ..., X_n))X_{n+1}'] = Γ(−n − 1) − Φ̃_{n1}Γ(−n) − ··· − Φ̃_{nn}Γ(−1).   (11.4.12)
Proposition 11.4.1 (The Multivariate Durbin-Levinson Algorithm). Let {X_t}
be a stationary m-dimensional time series with EX_t = 0 and autocovariance
function Γ(h) = E(X_{t+h}X_t'). If the covariance matrix of the nm components of
X_1, ..., X_n is nonsingular for every n ≥ 1, then the coefficients {Φ_{nj}}, {Φ̃_{nj}} in
(11.4.4) and (11.4.7) satisfy, for n ≥ 1,

Φ_{nn} = Δ_{n−1}Ṽ_{n−1}^{−1},
Φ̃_{nn} = Δ̃_{n−1}V_{n−1}^{−1},
Φ_{nk} = Φ_{n−1,k} − Φ_{nn}Φ̃_{n−1,n−k},   k = 1, ..., n − 1,   (11.4.13)
Φ̃_{nk} = Φ̃_{n−1,k} − Φ̃_{nn}Φ_{n−1,n−k},   k = 1, ..., n − 1,

where V_n, Ṽ_n, Δ_n, Δ̃_n are given by (11.4.9)–(11.4.12) with V_0 = Ṽ_0 = Γ(0),
Δ_0 = Γ(1) and Δ̃_0 = Γ(−1).

PROOF. The proof of this result parallels the argument given in the univariate
case, Proposition 5.2.1. For n = 1, the result follows immediately from (11.4.6)
and (11.4.8), so we shall assume that n > 1. The multivariate version of (5.2.6) is
X̂_{n+1} = P(X_{n+1}|X_2, ..., X_n) + AU,   (11.4.14)

where U = X_1 − P(X_1|X_2, ..., X_n) and A is an m × m matrix chosen to satisfy
the orthogonality condition

X_{n+1} − AU ⊥ U,

i.e.,

E(X_{n+1}U') = AE(UU').   (11.4.15)

By stationarity,

P(X_{n+1}|X_2, ..., X_n) = Φ_{n−1,1}X_n + ··· + Φ_{n−1,n−1}X_2,   (11.4.16)
U = X_1 − Φ̃_{n−1,1}X_2 − ··· − Φ̃_{n−1,n−1}X_n,   (11.4.17)

and

E(UU') = Ṽ_{n−1}.   (11.4.18)

It now follows from (11.4.3), (11.4.11), (11.4.15) and (11.4.18) that

A = E(X_{n+1}U')Ṽ_{n−1}^{−1}
  = E[(X_{n+1} − P(X_{n+1}|X_2, ..., X_n))U']Ṽ_{n−1}^{−1}
  = E[(X_{n+1} − P(X_{n+1}|X_2, ..., X_n))X_1']Ṽ_{n−1}^{−1}
  = [Γ(n) − Φ_{n−1,1}Γ(n − 1) − ··· − Φ_{n−1,n−1}Γ(1)]Ṽ_{n−1}^{−1}
  = Δ_{n−1}Ṽ_{n−1}^{−1}.   (11.4.19)

Combining equations (11.4.14), (11.4.16) and (11.4.17), we have

X̂_{n+1} = AX_1 + ∑_{j=1}^{n−1} (Φ_{n−1,j} − AΦ̃_{n−1,n−j})X_{n+1−j},

which, together with (11.4.19), proves one half of the recursions (11.4.13). A
symmetric argument establishes the other half and completes the proof.   □
Remark 1. In the univariate case, γ(h) = γ(−h), so that the two equations
(11.4.6) and (11.4.8) are identical. This implies that Φ_{nj} = Φ̃_{nj} for all j and n.
The equations (11.4.13) then reduce to the univariate recursions (5.2.3) and
(5.2.4).
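For concreteness, here is a direct transcription of Proposition 11.4.1 into Python (our own sketch, not code from the text). It returns the order-p forward coefficients Φ_{p1}, ..., Φ_{pp} and the error covariance V_p, and it can equally be run with sample covariances Γ̂(h) in place of Γ(h), as in the autoregressive fitting procedure of Section 11.5.

    import numpy as np

    def durbin_levinson(Gamma, p):
        """Multivariate Durbin-Levinson recursions (11.4.13).  Gamma(h) returns the
        m x m autocovariance matrix at lag h (so Gamma(-h) = Gamma(h).T)."""
        Phi, Phit = [], []                      # forward and backward coefficients
        V = Vt = Gamma(0)                       # V_0 = Vt_0 = Gamma(0)
        for n in range(1, p + 1):
            # Delta_{n-1} and Delta-tilde_{n-1}, equations (11.4.11)-(11.4.12)
            Delta = Gamma(n) - sum(Phi[j] @ Gamma(n - 1 - j) for j in range(n - 1))
            Deltat = Gamma(-n) - sum(Phit[j] @ Gamma(-(n - 1 - j)) for j in range(n - 1))
            Phi_nn = Delta @ np.linalg.inv(Vt)
            Phit_nn = Deltat @ np.linalg.inv(V)
            Phi_new = [Phi[k] - Phi_nn @ Phit[n - 2 - k] for k in range(n - 1)] + [Phi_nn]
            Phit_new = [Phit[k] - Phit_nn @ Phi[n - 2 - k] for k in range(n - 1)] + [Phit_nn]
            Phi, Phit = Phi_new, Phit_new
            # V_n and Vt_n from (11.4.9)-(11.4.10)
            V = Gamma(0) - sum(Phi[j] @ Gamma(-(j + 1)) for j in range(n))
            Vt = Gamma(0) - sum(Phit[j] @ Gamma(j + 1) for j in range(n))
        return Phi, V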
Remark 2. If for a fixed p ≥ 1 the covariance matrix of (X_{p+1}', ..., X_1')' is
nonsingular, then the matrix polynomial Φ(z) = I − Φ_{p1}z − ··· − Φ_{pp}z^p is
causal in the sense that det Φ(z) ≠ 0 for all z ∈ ℂ such that |z| ≤ 1 (cf. Problem
8.3). To prove this, let {η_t} be the stationary mp-variate time series

η_t = (X_t', X_{t−1}', ..., X_{t−p+1}')'.
Applying Proposition 11.4.1 to this process with n = 1, we obtain

η_2 = η̂_2 + (η_2 − η̂_2),

where

η̂_2 = P(η_2|η_1) = Mη_1,   with M = E(η_2η_1')[E(η_1η_1')]^{−1},

and

(η_2 − η̂_2) ⊥ η_1.   (11.4.20)

It is easily seen, from the composition of the vectors η_2 and η_1 and
stationarity, that the matrix M has the form

M = [Φ_{p1}  Φ_{p2}  ···  Φ_{p,p−1}  Φ_{pp};
       I      0    ···     0        0;
       0      I    ···     0        0;
       ⋮      ⋮            ⋮        ⋮;
       0      0    ···     I        0],   (11.4.21)

and since det(zI − M) = z^{mp} det(Φ(z^{−1})) (see Problem 11.8), it suffices to show
that the eigenvalues of M all have modulus less than one. Let Γ = E(η_1η_1'),
which is positive definite by assumption, and observe from the
orthogonality relation (11.4.20) that

E(η_2 − η̂_2)(η_2 − η̂_2)' = Γ − MΓM'.

If λ is an eigenvalue of M with corresponding left eigenvector a, i.e.
a*M = λa*, where a* denotes the complex conjugate transpose of a, then

E|a*(η_2 − η̂_2)|² = a*Γa − a*MΓM'a = a*Γa − |λ|²a*Γa = a*Γa(1 − |λ|²).

Since Γ is positive definite, we must have |λ| ≤ 1. The case |λ| = 1 is precluded
since this would imply that

a*(η_2 − η̂_2) = 0,
which in turn implies that the covariance matrix of (X_{p+1}', ..., X_1')' is singular,
a contradiction. Thus we conclude that det Φ(z) ≠ 0 for all |z| ≤ 1.
We next extend the innovations algorithm for computing the best one-step
predictor to a general m-variate time series with mean zero. From the
definition of S_n, it is clear that

S_n = sp{X_tj − X̂_tj, j = 1, ..., m; t = 1, ..., n},

so that we may write

X̂_{n+1} = ∑_{j=1}^{n} Θ_{nj}(X_{n+1−j} − X̂_{n+1−j}),

where {Θ_{nj}, j = 1, ..., n} is a sequence of m × m matrices which can be found
recursively using the following algorithm. The recursions are identical to
those given in the univariate case (Proposition 5.2.2) and, in contrast to the
Durbin-Levinson recursions, involve only one set of predictor coefficients.

Proposition 11.4.2 (The Multivariate Innovations Algorithm). Let {X_t} be
an m-dimensional time series with mean EX_t = 0 for all t and with covariance
function K(i, j) = E(X_iX_j'). If the covariance matrix of the nm components of
X_1, ..., X_n is nonsingular for every n ≥ 1, then the one-step predictors X̂_{n+1},
n ≥ 0, and their prediction error covariance matrices V_n, n ≥ 1, are given by

X̂_{n+1} = 0  if n = 0,   and   X̂_{n+1} = ∑_{j=1}^{n} Θ_{nj}(X_{n+1−j} − X̂_{n+1−j})  if n ≥ 1,   (11.4.22)

and

V_0 = K(1, 1),
Θ_{n,n−k} = (K(n + 1, k + 1) − ∑_{j=0}^{k−1} Θ_{n,n−j}V_jΘ_{k,k−j}')V_k^{−1},   k = 0, ..., n − 1,
V_n = K(n + 1, n + 1) − ∑_{j=0}^{n−1} Θ_{n,n−j}V_jΘ_{n,n−j}'.   (11.4.23)

(The recursions are solved in the order V_0; Θ_11, V_1; Θ_22, Θ_21, V_2; Θ_33, Θ_32,
Θ_31, V_3; ....)
PROOF. For i < j, X_i − X̂_i ∈ S_{j−1}, and since each component of X_j − X̂_j is
orthogonal to S_{j−1} by the prediction equations, we have

(X_i − X̂_i) ⊥ (X_j − X̂_j)  if i ≠ j.   (11.4.24)

Post-multiplying both sides of (11.4.22) by (X_{k+1} − X̂_{k+1})', 0 ≤ k ≤ n − 1, and
taking expectations, we find from (11.4.24) that

EX̂_{n+1}(X_{k+1} − X̂_{k+1})' = Θ_{n,n−k}V_k.

Since (X_{n+1} − X̂_{n+1}) ⊥ (X_{k+1} − X̂_{k+1}) (see (11.4.3)), we have

EX_{n+1}(X_{k+1} − X̂_{k+1})' = EX̂_{n+1}(X_{k+1} − X̂_{k+1})' = Θ_{n,n−k}V_k.   (11.4.25)

Replacing X̂_{k+1} in (11.4.25) by its representation given in (11.4.22), we obtain

Θ_{n,n−k}V_k = K(n + 1, k + 1) − ∑_{j=0}^{k−1} EX_{n+1}(X_{j+1} − X̂_{j+1})'Θ_{k,k−j}',

which, by (11.4.25), implies that

Θ_{n,n−k}V_k = K(n + 1, k + 1) − ∑_{j=0}^{k−1} Θ_{n,n−j}V_jΘ_{k,k−j}'.

Since the covariance matrix of X_1, ..., X_n is nonsingular by assumption, V_k
is nonsingular and hence

Θ_{n,n−k} = (K(n + 1, k + 1) − ∑_{j=0}^{k−1} Θ_{n,n−j}V_jΘ_{k,k−j}')V_k^{−1}.

Finally we have

X_{n+1} = X_{n+1} − X̂_{n+1} + ∑_{j=0}^{n−1} Θ_{n,n−j}(X_{j+1} − X̂_{j+1}),

which, by the orthogonality of the set {X_j − X̂_j, j = 1, ..., n + 1}, implies that

K(n + 1, n + 1) = V_n + ∑_{j=0}^{n−1} Θ_{n,n−j}V_jΘ_{n,n−j}',

as desired.   □
Recursive Prediction of an ARMA(p, q) Process

Let {X_t} be an m-dimensional causal ARMA(p, q) process

Φ(B)X_t = Θ(B)Z_t,   {Z_t} ~ WN(0, Σ),

where Φ(B) = I − Φ_1B − ··· − Φ_pB^p, Θ(B) = I + Θ_1B + ··· + Θ_qB^q, det Σ ≠ 0
and I is the m × m identity matrix. As in Section 5.3, there is a substantial
saving in computation if the innovations algorithm is applied to the
transformed process

W_t = X_t,   t = 1, ..., max(p, q),
W_t = Φ(B)X_t,   t > max(p, q),   (11.4.26)

rather than to {X_t} itself. If the covariance function of the {X_t} process
is denoted by Γ(·), then the covariance function K(i, j) = E(W_iW_j') is found
to be

K(i, j) = Γ(i − j)   if 1 ≤ i ≤ j ≤ l,
K(i, j) = Γ(i − j) − ∑_{r=1}^{p} Γ(i + r − j)Φ_r'   if 1 ≤ i ≤ l < j ≤ 2l,
K(i, j) = ∑_{r=0}^{q} Θ_r Σ Θ_{r+j−i}'   if l < i ≤ j ≤ i + q,
K(i, j) = 0   if l < i and i + q < j,
K(i, j) = K'(j, i)   if j < i,   (11.4.27)

where l = max(p, q) and, by convention, Θ_0 = I and Θ_j = 0_{m×m} for j > q. The
advantage of working with this process is that the covariance matrix is zero
when |i − j| > q, i, j > l. The argument leading up to equations (5.3.9) carries
over practically verbatim in the multivariate setting to give

X̂_{n+1} = ∑_{j=1}^{n} Θ_{nj}(X_{n+1−j} − X̂_{n+1−j})   if 1 ≤ n < l,

X̂_{n+1} = Φ_1X_n + ··· + Φ_pX_{n+1−p} + ∑_{j=1}^{q} Θ_{nj}(X_{n+1−j} − X̂_{n+1−j})   if n ≥ l,   (11.4.28)

and

E(X_{n+1} − X̂_{n+1})(X_{n+1} − X̂_{n+1})' = V_n,

where Θ_{nj}, j = 1, ..., n, and V_n are found from (11.4.23) with K(i, j) as in
(11.4.27).
Remark 3. In the one-dimensional case, the coefficients θ_{nj}, j = 1, ..., q, do not
depend on the white noise variance σ² (see Remark 1 of Section 5.3). However,
in the multivariate case, the coefficients Θ_{nj} of X_{n+1−j} − X̂_{n+1−j} will typically
depend on Σ.

Remark 4. In the case when {X_t} is also invertible, X_{n+1} − X̂_{n+1} is an
approximation to Z_{n+1} for n large, in the sense that

E(X_{n+1} − X̂_{n+1} − Z_{n+1})(X_{n+1} − X̂_{n+1} − Z_{n+1})' → 0  as n → ∞.

It follows (see Problem 11.12) that as n → ∞,

Θ_{nj} → Θ_j,   j = 1, ..., q,

and

V_n → Σ.
EXAMPLE 11.4.1 (Prediction of an ARMA(1,1) Process). Let {X_t} be the
ARMA(1,1) process

X_t − ΦX_{t−1} = Z_t + ΘZ_{t−1},   {Z_t} ~ WN(0, Σ),   (11.4.29)

with det(I − Φz) ≠ 0 for |z| ≤ 1. From (11.4.28), we see that

X̂_{n+1} = ΦX_n + Θ_{n1}(X_n − X̂_n),   n ≥ 1.   (11.4.30)

The covariance function for the process {W_t} defined by (11.4.26) is given by

K(i, j) = Γ(0)   if i = j = 1,
K(i, j) = ΣΘ'   if 1 ≤ i, j = i + 1,
K(i, j) = Σ + ΘΣΘ'   if 1 < i = j,
K(i, j) = 0   if 1 ≤ i, j > i + 1,
K(i, j) = K'(j, i)   if j < i.

As in Example 5.3.3, the recursions in (11.4.23) simplify to

V_0 = Γ(0),
Θ_{n1} = ΘΣV_{n−1}^{−1},
V_n = Σ + ΘΣΘ' − Θ_{n1}V_{n−1}Θ_{n1}'.   (11.4.31)

In order to start this recursion it is necessary first to compute Γ(0). From
(11.3.15) we obtain the two matrix equations

Γ(0) − ΦΓ'(1) = Σ + ΘΣ(Φ' + Θ'),
Γ(1) − ΦΓ(0) = ΘΣ.

Substituting Γ(1) = ΦΓ(0) + ΘΣ into the first expression, we obtain the single
matrix equation,

Γ(0) − ΦΓ(0)Φ' = ΦΣΘ' + ΘΣΦ' + Σ + ΘΣΘ',   (11.4.32)

which is equivalent to a set of linear equations which can be solved for the
components of Γ(0).
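Equation (11.4.32) is linear in the entries of Γ(0) and can be solved directly with the vec/Kronecker identity vec(AXA') = (A ⊗ A)vec(X). The Python sketch below (our own illustration) does this; the Φ is the AR matrix of (11.4.33), while the Θ and Σ shown are placeholders for illustration only, not values taken from the text.

    import numpy as np

    def arma11_gamma0(Phi, Theta, Sigma):
        """Solve Gamma(0) - Phi Gamma(0) Phi' = Phi Sigma Theta' + Theta Sigma Phi'
        + Sigma + Theta Sigma Theta'  (equation (11.4.32))."""
        m = Phi.shape[0]
        rhs = (Phi @ Sigma @ Theta.T + Theta @ Sigma @ Phi.T
               + Sigma + Theta @ Sigma @ Theta.T)
        A = np.eye(m * m) - np.kron(Phi, Phi)
        g = np.linalg.solve(A, rhs.reshape(-1, order='F'))
        return g.reshape(m, m, order='F')

    Phi = np.array([[0.7, 0.0], [0.0, 0.6]])      # AR matrix of (11.4.33)
    Theta = np.array([[0.5, 0.6], [-0.8, 0.7]])   # hypothetical MA matrix
    Sigma = np.eye(2)                             # hypothetical noise covariance
    print(arma11_gamma0(Phi, Theta, Sigma))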
Ten observations X_1, ..., X_10 were generated from the two-dimensional
ARMA(1,1) process

X_t − ΦX_{t−1} = Z_t + ΘZ_{t−1},   with Φ = [[.7, 0], [0, .6]],   (11.4.33)

where {Z_t} is a sequence of iid bivariate normal random vectors with mean 0
and covariance matrix Σ. The values of X̂_{n+1}, V_n and Θ_{n1} for n = 0, 1, ..., 10,
computed from equations (11.4.30)–(11.4.32), are displayed in Table 11.1.
Notice that the matrices V_n and Θ_{n1} are converging rapidly to the matrices
Σ and Θ, respectively.
Once X̂_1, ..., X̂_n have been found from equations (11.4.28), it is a simple matter
to compute the h-step predictors of the process. As in Section 5.3 (see equations
(5.3.15)), the h-step predictors P_{S_n}X_{n+h}, h = 1, 2, ..., satisfy

P_{S_n}X_{n+h} = ∑_{j=h}^{n+h−1} Θ_{n+h−1,j}(X_{n+h−j} − X̂_{n+h−j}),   1 ≤ h ≤ l − n,

P_{S_n}X_{n+h} = ∑_{i=1}^{p} Φ_iP_{S_n}X_{n+h−i} + ∑_{j=h}^{q} Θ_{n+h−1,j}(X_{n+h−j} − X̂_{n+h−j}),   h > l − n,   (11.4.34)
[Table 11.1. Calculation of X̂_n for data from the ARMA(1,1) process of Example
11.4.1: the values of X̂_{n+1}, V_n and Θ_{n1} for n = 0, 1, ..., 10.]
where for fixed n, the predictors P_{S_n}X_{n+1}, P_{S_n}X_{n+2}, P_{S_n}X_{n+3}, ... are determined
recursively from (11.4.34). Of course in most applications n > l = max(p, q),
in which case the second of the two relations in (11.4.34) applies. For the
ARMA(1,1) process of Example 11.4.1 we have, for h ≥ 1,

P_{S_n}X_{n+h} = Φ^{h−1}X̂_{n+1} = ((.7)^{h−1}X̂_{n+1,1}, (.6)^{h−1}X̂_{n+1,2})'.

More generally, let us fix n and define g(h) := P_{S_n}X_{n+h}. Then g(h) satisfies the
multivariate homogeneous difference equation,

g(h) − Φ_1g(h − 1) − ··· − Φ_pg(h − p) = 0,   for h > q,   (11.4.35)

with initial conditions,

g(q − i) = P_{S_n}X_{n+q−i},   i = 0, ..., p − 1.

By appealing to the theory of multivariate homogeneous difference equations,
it is often possible to find a convenient representation for g(h) and hence
P_{S_n}X_{n+h} by solving (11.4.35).
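The difference equation (11.4.35) also gives a convenient way to compute the h-step predictors numerically once the first max(p, q) of them are known; a minimal Python sketch (our own illustration, with a made-up starting value):

    import numpy as np

    def h_step_predictors(Phi_list, initial, H):
        """Iterate g(h) = Phi_1 g(h-1) + ... + Phi_p g(h-p) (equation (11.4.35)).
        `initial` is the list [g(1), ..., g(k)] of the first k >= max(p, q) predictors."""
        g = list(initial)
        p = len(Phi_list)
        for h in range(len(g), H):
            g.append(sum(Phi_list[i] @ g[h - 1 - i] for i in range(p)))
        return g

    # ARMA(1,1) case of Example 11.4.1: g(h) = Phi^(h-1) g(1).
    Phi = np.array([[0.7, 0.0], [0.0, 0.6]])
    g1 = np.array([1.0, -0.5])                  # hypothetical value of X_hat_{n+1}
    print(h_step_predictors([Phi], [g1], 4))    # g(2) = Phi @ g(1), and so on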
§11.5 Estimation for Multivariate ARMA Processes

If {X_t} is a causal m-variate ARMA(p, q) process,

X_t − Φ_1X_{t−1} − ··· − Φ_pX_{t−p} = Z_t + Θ_1Z_{t−1} + ··· + Θ_qZ_{t−q},   (11.5.1)

where {Z_t} ~ WN(0, Σ), then the Gaussian likelihood of {X_1, ..., X_n} can be
determined with the aid of the multivariate innovations algorithm and the
technique used in Section 8.7 for the univariate case.
For an arbitrary m-variate Gaussian process {X_t} with mean 0 and
covariance matrices

K(i, j) = E(X_iX_j'),
we can determine the exact likelihood of {X_1, ..., X_n} as in Section 8.6. Let X
denote the nm-component column vector of observations, X := (X_1', ..., X_n')',
and let X̂ := (X̂_1', ..., X̂_n')', where X̂_1, ..., X̂_n are the one-step predictors defined
in Section 11.4. Assume that Γ_n := E(XX') is nonsingular for every n, and let
Θ_{jk} and V_j be the coefficient and covariance matrices defined in Proposition
11.4.2, with Θ_{i0} = I and Θ_{ij} = 0, j < 0, i = 0, 1, 2, .... Then, introducing the
(nm × nm) matrices,

C = [Θ_{i−1,i−j}]_{i,j=1}^{n}   (11.5.2)

and

D = diag{V_0, ..., V_{n−1}},   (11.5.3)

we find by precisely the same steps as in Section 8.6 that the likelihood of
{X_1, ..., X_n} is

L(Γ_n) = (2π)^{−nm/2} (∏_{j=1}^{n} det V_{j−1})^{−1/2} exp{−½ ∑_{j=1}^{n} (X_j − X̂_j)'V_{j−1}^{−1}(X_j − X̂_j)},   (11.5.4)

where the one-step predictors X̂_j and the corresponding error covariance
matrices V_{j−1}, j = 1, ..., n, are found from Proposition 11.4.2. Notice that the
calculation of L(Γ_n) involves operations on vectors and square matrices of
dimension m only.
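In code, once the one-step predictors and their error covariances have been produced by the innovations recursions (for the ARMA case, applied to the transformed process (11.4.26)), the likelihood (11.5.4) reduces to a few lines; the Python sketch below (our own illustration) evaluates −2 log L in the numerically convenient form.

    import numpy as np

    def neg2_log_likelihood(X, Xhat, V):
        """-2 log of the Gaussian likelihood (11.5.4).  X and Xhat are lists of the
        observations X_j and one-step predictors X_hat_j (j = 1, ..., n); V[j-1] is
        the prediction error covariance of X_j, as in Proposition 11.4.2."""
        n, m = len(X), X[0].shape[0]
        value = n * m * np.log(2 * np.pi)
        for j in range(n):
            e = X[j] - Xhat[j]
            value += np.log(np.linalg.det(V[j])) + e @ np.linalg.solve(V[j], e)
        return value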
To compute the Gaussian likelihood of {X_1, ..., X_n} for the ARMA process
(11.5.1) we proceed as in Section 8.7. First we introduce the process {W_t}
defined by (11.4.26), with covariance matrices K(i, j) = E(W_iW_j') given by
(11.4.27). Applying the multivariate innovations algorithm to the transformed
process {W_t} gives the coefficients Θ_{jk} and error covariance matrices V_j in
the representation (11.4.28) of X̂_j. Since X_j − X̂_j = W_j − Ŵ_j, j = 1,
2, ..., it follows from (11.5.4) that the Gaussian likelihood L(Φ, Θ, Σ) of
{X_1, ..., X_n} can be written as

L(Φ, Θ, Σ) = (2π)^{−nm/2} (∏_{j=1}^{n} det V_{j−1})^{−1/2} exp{−½ ∑_{j=1}^{n} (X_j − X̂_j)'V_{j−1}^{−1}(X_j − X̂_j)},   (11.5.5)

where X̂_j is found from (11.4.28) and Θ_{jk}, V_j are found by applying Proposition
11.4.2 to the covariance matrix (11.4.27).
In view of Remark 3 of Section 11.4, it is not possible to compute maximum
likelihood estimators of Φ and Θ independently of Σ as in the univariate
case. Maximization of the likelihood must be performed with respect to all
the parameters of Φ, Θ and Σ simultaneously. The potentially large number
of parameters involved makes the determination of maximum likelihood
estimators much more difficult from a numerical point of view than the
corresponding univariate problem. However the maximization can be
performed with the aid of efficient non-linear optimization algorithms.
A fundamental difficulty in the estimation of parameters for mixed ARMA
models arises from the question of identifiability. The spectral density matrix
of the process (11.5.1) is

f(ω) = (1/2π) Φ^{−1}(e^{−iω})Θ(e^{−iω})ΣΘ'(e^{iω})Φ'^{−1}(e^{iω}).

The covariance matrix function, or equivalently the spectral density matrix
function f(·), of a causal invertible ARMA process does not uniquely
determine Σ, Φ(·) and Θ(·) unless further conditions are imposed (see
Dunsmuir and Hannan (1976)). Non-identifiability of a model results in a
likelihood surface which does not have a unique maximum. The identifiability
problem arises only when p > 0 and q > 0. For a causal autoregressive or
invertible moving average process, the coefficient matrices and the white
noise covariance matrix Σ are uniquely determined by the second order
properties of the process.
It is particularly important in the maximum likelihood estimation of
multivariate ARMA parameters, to have good initial estimates of the
parameters since the likelihood function may have many local maxima which
are much smaller than the global maximum. Jones ( 1984) recommends initial
fitting of univariate models to each component of the series to give an initial
approximation with uncorrelated components.
Order selection for multivariate ARMA models can be made by minimizing
a multivariate analogue of (9.3.4), namely

AICC = −2 ln L(Φ_1, ..., Φ_p, Θ_1, ..., Θ_q, Σ) + 2(k + 1)nm/(nm − k − 2),

where k = (p + q)m².
Spectral methods of estimation for multivariate ARMA parameters are
also frequently used. A discussion of these (as well as some time domain
methods) is given in Anderson (1980).
Estimation for Autoregressive Processes Using the Durbin-Levinson Algorithm

There is a simple alternative estimation procedure, based on the multivariate
Durbin-Levinson algorithm, for fitting autoregressions of increasing order.
This is analogous to the preliminary estimation procedure for autoregressions
in the univariate case discussed in Section 8.2. Suppose we have observations
X_1, ..., X_n of a zero-mean stationary m-variate time series and let
Γ̂(0), ..., Γ̂(n − 1) be the sample covariance function estimates. Then the
fitted AR(p) process (p < n) is

X_t = Φ̂_{p1}X_{t−1} + ··· + Φ̂_{pp}X_{t−p} + Z_t,   {Z_t} ~ WN(0, V̂_p),

where the coefficients Φ̂_{p1}, ..., Φ̂_{pp} and V̂_p are computed recursively from
Proposition 11.4.1 with Γ(h) replaced by Γ̂(h), h = 0, ..., n − 1. The order p
of the autoregression may be chosen to minimize

AICC = −2 ln L(Φ̂_{p1}, ..., Φ̂_{pp}, V̂_p) + 2(pm² + 1)nm/(nm − pm² − 2).
EXAMPLE 11.5.1 (Sales with a Leading Indicator). In this example we fit an
autoregressive model to the bivariate time series of Example 11.2.2. Let

X_t1 = (1 − B)Y_t1 − .0228,   t = 1, ..., 149,

and

X_t2 = (1 − B)Y_t2 − .420,   t = 1, ..., 149,

where {Y_t1} and {Y_t2}, t = 0, ..., 149, are the leading indicator and sales data
respectively. The order of the minimum AICC autoregressive model for
(X_t1, X_t2)', computed using the program ARVEC, is p = 5, with estimated
coefficient matrices Φ̂_51, ..., Φ̂_55 and estimated white noise covariance matrix

V̂_5 = [[.076, −.003], [−.003, .095]],
and AICC = 114.94. Since the upper right component of each of the
coefficient estimates is near 0, we may model the {X_t1} process separately
from {X_t2}. The MA(1) model

X_t1 = (1 − .414B)U_t,   {U_t} ~ WN(0, .0779),   (11.5.6)

provides an adequate fit to the series {X_t1}. Inspecting the bottom rows of
the coefficient matrices Φ̂_51, ..., Φ̂_55, and deleting those elements which are
near 0, we arrive at the approximate relation between {X_t1} and {X_t2} given
by

X_t2 = .250X_{t−2,2} + .207X_{t−3,2} + 4.678X_{t−3,1} + 3.664X_{t−4,1} + 1.300X_{t−5,1} + W_t,

or, equivalently,

X_t2 = [4.678B³(1 + .783B + .278B²)/(1 − .250B² − .207B³)]X_t1
        + (1 − .250B² − .207B³)^{−1}W_t,   (11.5.7)

where {W_t} ~ WN(0, .095). Moreover, since the estimated noise covariance
matrix is essentially diagonal, it follows that the two sequences {X_t1} and
{W_t} are uncorrelated. This reduced model, (11.5.6) and (11.5.7), is an example
of a transfer function model which expresses the output series {X_t2} as the
output of a linear filter with input {X_t1} plus added noise. The model (11.5.6)
and (11.5.7) is similar to the model found later in Section 13.1 (see (13.1.23))
using transfer function techniques.
Assuming that the fitted AR(5) model is the true model for
{X_t := (X_t1, X_t2)'}, the one- and two-step ahead predictors of X_150 and X_151
are

X̂_150 = (.163, −.217)'   and   P_{S_149}X_151 = (−.027, .816)',

with error covariance matrices

[[.076, −.003], [−.003, .095]]   and   [[.0964, −.0024], [−.0024, .0953]]

respectively.
Forecasting future values of the original data Y_t = (Y_t1, Y_t2)' is analogous to
the forecasting of univariate ARIMA models discussed in Section 9.5. Let
P_149(·) denote the operator P(·|1, Y_0, ..., Y_149), where 1 = (1, 1)', and assume,
as in the univariate case, that Y_0 ⊥ X_1, ..., X_149. Then, defining S_n as in
(11.4.1), we find (see Problem 11.9) that

P_149Y_150 = (.0228, .420)' + (.163, −.217)' + (13.4, 262.7)' = (13.59, 262.90)'

and

P_149Y_151 = (.0228, .420)' + (−.027, .816)' + (13.59, 262.90)' = (13.59, 264.14)',

with error covariance matrices

E[(Y_150 − P_149Y_150)(Y_150 − P_149Y_150)'] = V̂_5 = [[.076, −.003], [−.003, .095]]

and

E[(Y_151 − P_149Y_151)(Y_151 − P_149Y_151)'] = (I + Φ̂_51)V̂_5(I + Φ̂_51)' + V̂_5
   = [[.094, −.003], [−.003, .181]].

These predicted values, computed using the program ARVEC, are in close
agreement with those obtained from the transfer function model of Section
13.1 (see (13.1.27) and (13.1.28)). Although the two models produce roughly
the same prediction mean squared errors for the leading indicator data, the
AR model gives substantially larger values for the sales data (see (13.1.29)
and (13.1.30)).
§11.6 The Cross Spectrum

Recall from Chapter 4 that if {X_t} is a stationary time series with absolutely
summable autocovariance function γ(·), then {X_t} has a spectral density
(Corollary 4.3.2) given by

f(λ) = (1/2π) ∑_{h=−∞}^{∞} e^{−ihλ}γ(h),   −π ≤ λ ≤ π,   (11.6.1)
and the autocovariance function can then be expressed as

γ(h) = ∫_{−π}^{π} e^{ihλ}f(λ) dλ.   (11.6.2)

By Theorem 4.8.2, the process {X_t} has a corresponding spectral
representation,

X_t = ∫_{(−π,π]} e^{itλ} dZ(λ),   (11.6.3)

where {Z(λ), −π ≤ λ ≤ π} is an orthogonal increment process satisfying

E|Z(λ_2) − Z(λ_1)|² = ∫_{λ_1}^{λ_2} f(λ) dλ,   −π ≤ λ_1 ≤ λ_2 ≤ π,

the latter expression representing the contribution to the variance of {X_t} from
harmonic components with frequencies in the interval (λ_1, λ_2].
In this section we shall consider analogous representations for a bivariate
stationary time series, X_t = (X_t1, X_t2)', with mean zero and covariances
γ_ij(h) = E(X_{t+h,i}X_{tj}) satisfying

∑_{h=−∞}^{∞} |γ_ij(h)| < ∞,   i, j = 1, 2.   (11.6.4)

Although we shall confine our discussion to bivariate time series, the ideas
can easily be extended to higher dimensions and to series whose covariances
are not absolutely summable (see Section 11.8).
Definition 11.6.1 (The Cross Spectrum). If {X_t} is a stationary bivariate time
series with mean 0 and covariance matrix function Γ(·) satisfying (11.6.4), then
the function

f_12(λ) = (1/2π) ∑_{h=−∞}^{∞} e^{−ihλ}γ_12(h),   λ ∈ [−π, π],

is called the cross spectrum or cross spectral density of {X_t1} and {X_t2}. The
matrix

f(λ) = (1/2π) ∑_{h=−∞}^{∞} e^{−ihλ}Γ(h) = [[f_11(λ), f_12(λ)], [f_21(λ), f_22(λ)]]

is called the spectral density matrix or spectrum of {X_t}.
The spectral representations of γ_ij(h) and Γ(h) follow at once from this
definition. Thus

γ_ij(h) = ∫_{−π}^{π} e^{ihλ}f_ij(λ) dλ,   i, j = 1, 2,

and

Γ(h) = ∫_{−π}^{π} e^{ihλ}f(λ) dλ.   (11.6.5)

The function f_ii(·) is the spectral density of the univariate series {X_ti} as
defined in Chapter 4, and is therefore real-valued and symmetric about zero.
However since γ_ij(·), i ≠ j, is not in general symmetric about zero, the cross
spectrum f_ij(·) is typically complex-valued.
If {Z_i(λ), −π ≤ λ ≤ π} is the orthogonal increment process in the spectral
representation of the univariate series {X_ti}, then we know from Chapter 4 that

X_ti = ∫_{(−π,π]} e^{itλ} dZ_i(λ)   (11.6.6)

and

f_ii(λ) dλ = E|dZ_i(λ)|²,   (11.6.7)

the latter being an abbreviation for ∫_{λ_1}^{λ_2} f_ii(λ) dλ = E|Z_i(λ_2) − Z_i(λ_1)|², −π ≤
λ_1 ≤ λ_2 ≤ π. The cross spectrum f_ij(λ) has a similar interpretation, namely

f_ij(λ) dλ = E(dZ_i(λ) dZ̄_j(λ)),   (11.6.8)

which is shorthand for ∫_{λ_1}^{λ_2} f_ij(λ) dλ = E[(Z_i(λ_2) − Z_i(λ_1))(Z_j(λ_2) − Z_j(λ_1))¯],
−π ≤ λ_1 ≤ λ_2 ≤ π. As shown in Section 11.8, the processes {Z_1(λ)} and
{Z_2(λ)} have the additional property,

E(dZ_i(λ) dZ̄_j(μ)) = 0  for λ ≠ μ and i, j = 1, 2.
The relation (4. 7.5) for univariate processes extends in the bivariate case to
i, j = 1, 2,
( 1 1 .6.9)
for all functions g and h which are square integrable with respect to J;; and jjj
respectively (see Remark 1 of Section 1 1 .8). From ( 1 1 .6.8) we see that
f2 1 (A) !1 z (A).
This implies that the matrices f(A.) are Hermitian, i.e. that j().) = f *(A ) where
2
* denotes complex conjugate transpose. Moreover if a = (a 1 , a 2 )' E C then
a*f(A.)a is the spectral density of {a*X1 }. Consequently a *f(A.) a ;::::: 0 for all
2
a E C , i.e. the matrix f(A.) is non-negative definite.
The correlation between dZ 1 (A) and dZ 2 (A) is called the coherency or
coherence, %u(A), at frequency A. From ( 1 1 .6.7) and ( 1 1 .6.8) we have
( 1 1 .6. 1 0)
%u(A.) = ju(A)/[ f1 1 (A)jzz (A.)] 1 1 2 .
By the Cauchy-Schwarz inequality, the squared coherency function I Xu{AW
satisfies the inequalities,
=
-n
::::; A ::::; n,
and a value near one indicates a strong linear relationship between dZ1 (),) and
dZ2 (A).
437
§ 1 1 .6. The Cross Spectrum
Since /1 2 (A.) is complex-valued, it can be expressed as
f1 2 (A.) = c dA.) - iq dA.),
where
and
q dA.) = - Im { f1 2 (A.) } .
The function c 1 2 (A.) is called the cospectrum of {Xt d and { Xt 2 }, and q 1 2 (A.) is
called the quadrature spectrum.
Alternatively f1 2 {A.) can be expressed in polar coordinates as
where
1
adA.) = (c f 2 (A.) + qi 2 (A.)) 12
is called the amplitude spectrum and
r/>dA.) = arg (c 1 2 (A.) - iq dA.)) E ( - n, n],
the phase spectrum of {X t d and { Xt 2 } . The coherency is related to the phase
and amplitude spectra by
X"1 2 (A.) = a d A.)[fl l (A.)fn(A.)r 1 12 e x p[i¢ dA.)] = f X'dA.) f ex p[icf> 1 2 (A.)].
EXAMPLE 1 1 .6. 1 . Let {X t } be the process defined in Example 1 1 . 1 . 1 , i.e.
where { Zt } � WN (0, 1 ). Then
f( A.) = __!__ [r( - 1 0) e l O iJc + r(O) + r(10)e - ! O i Jc J
2n
and
1
/1 2 (A.) = - [ 1 + .75 cos(10A.) + .75i sin(1m)]
2n
= a 1 2 (J�)exp [ir/> 1 2 (A.)],
where the amplitude spectrum a 1 2 (A.) is
a dA.) =
1
[ 1 .5625 + 1 .5 cos( l OA.)] 1 i2 ,
2n
and
tan r/>1 2 (A.) = .75 sin(10A.)/[1 + .75 cos(10A.) ].
1 1. Multivariate Time Series
438
Since f1 1 ()o) (2n) - 1 and f22 (A) = (2n) - 1 (1.5625 + 1 .5 cos(1 0A)), the squared
coherency is
=
- n :;:::; A :;:::; n .
The last result is a special case of the more general result that
l ./f'u(A) I 2 1, - n :;:::; A :;:::; n, whenever {X t t } and {Xd are related by a
time-invariant linear filter. Thus if
Remark 1 .
=
X, 2
L tf!i Xt -j, 1
j=
where Li I t/li l < oo , then by Theorem 4. 1 0. 1 ,
X, 2
GO
=
� oo
=
(4.
t/Ji e - ii'- ) e;,;_ dZ1
J(-1t,1t]
l
(A).
Hence dZ2 (A) = Li tf!i e - ii'- dZ1 (A), - n :;:::; A :;:::; n. Since dZ2 (A) and dZ 1 (A) are
linearly related for all ) the squared absolute correlation between dZ 1 (A) and
dZ2(A), i.e. l ffdA) I 2 , is 1 for all A. This result can also be obtained by
observing that
J
o,
E (X, + h , 2 X,d =
=
whence
l l;, tf!i e i<r+ h -i)'- dZ1 (A) l e i''- dZ1 (A) J
E [ J(-1t,1t]
J(-1t, 1t]
J:" (� t/Jie-ii'-) e ih'-J1 1 (A) dA,
J
L tf!i e - ii'-jl l (A).
j
Substituting in (1 1 .6. 1 0) and using the fact that f22 (A) I Li t/Ji e - ii'- 1 2 f1 1 (A), we
obtain the same result, i.e. I Xu{AW 1, - n :;:::; A :;:::; n.
f2 t (A)
=
=
=
If {Xt t } and {Xd have squared coherency l ffdA) I 2 and if linear
filters are applied to each process giving
Remark 2.
Y, l
L !Xj Xr -j, l
j=
GO
=
- oo
and
00
L f3j Xr -j, 2
j=
where Li l cxi l < oo and Li l f3il < oo, then { Y, d and { ¥, 2 } have the same
squared coherency l ffdAW. This can be seen by considering the
spectral representations x ,k f( - 1t , 1tit v dZk(v), Y,k L - 1t , 1t] e irv dZy,(v), and
observing, from Theorem 4. 1 0.1, that
dZy , (v) L !Xi e- iiv dZ1 (v)
j
¥, 2
=
- oo
=
=
=
439
§1 1 .6. The Cross Spectrum
and
dZy2(v) = L {3i e - ii v dZ2 (v).
j
From these linear relations it follows at once that the correlation between
dZy, (v) and dZy2(v) is the same as that between dZ1 (v) and dZ2 (v).
Remark 3. Let {X,} be a bivariate stationary series and consider the prob­
lem of finding a time-invariant linear filter 'P = { 1/Ji } which minimizes
E I X, 2 - L� l/li Xr -j. I I 2 . If 'I' is any time-invariant linear filter with transfer
function
1/J(e - iv) = L 1/Jj e - iiv,
j= - oo
then using ( 1 1 .6.6) and ( 1 1 .6.9) we can write
2
2
eit v dZ2 (v) 1/J(e - iv)eit v dZ! (v)
E x, 2 1/Jj Xr -j. l = E
j = oo
- oo
co
I
�
1 I f"
= J:,
f"
1
U22 (v) - I/J (e - iv)f1 2 (v) - 1/J (e iv)fd v)
+
=
1 1/J(e - iv)l 2/1 1 (v)] dv
J:,
E l dZ2 (v) - I/J (e - iv) dZ1 (vW.
It is easy to check (Problem 1 1. 1 3) that the integrand is minimized for each v
if
( 1 1 .6. 1 1)
and the spectral density of Li l/li Xr -j. l is then f2 1 1 (v) = l /2 1 (vW/f1 1 (v). The
density /2 1 1 is thus the spectral density of the linearly filtered version of { Xr 1 }
which is the best mean square approximation to { X, 2 } . We also observe that
f2 1(v)
( 1 1 .6. 1 2)
I :ff1 2 (Jc) l 2 = 1 ,
f22 (v)
so that l ffdJcW can be interpreted as the proportion of the variance of {Xa}
at frequency v which can be attributed to a linear relationship between { X, 2 }
and { Xn }.
Remark 4. If { X, J } and {X, 2 } are uncorrelated, then by Definition 1 1 .6. 1 ,
fdv) = 0, - n ::;; v ::;; n, from which it follows that l ffdJc) l 2 0,
- n ::;; v ::;; n.
=
ExAMPLE 1 1 .6.2. Consider the bivariate series defined by
I I. Multivariate Time Series
440
where ¢ > 0 and {Z1}
{X1 2 } is
�
WN(O, (J 2 1). The cross covariance between {X1d and
if h = - d,
otherwise,
and the cross spectrum is therefore
fd.!c) = (2n)- l ¢)(J z eid)..
The amplitude and phase spectra are clearly
a d),) = (2n)-l «fo(Jz
and
¢ d.!c) = (d.!c + n)mod(2n) - n.
(The constraint - n < «fo 1 2 (),) ::o;; n means that the graph of «fo 12 (.!c), - n < A ::o;; n,
instead of being a straight line through the origin with slope d, consists of
2r + 1 parallel lines, where r is the largest integer less than (d + 1 )/2. Each
line has slope d and one of them passes through the origin.) Since /1 1 (.!c) =
(J 2/(2n) and f22 (A) = (J 2 ( 1 + «fo 2 )/2n, the squared coherency is
- n ::o;; A ::o;; n.
5. In the preceding example the series { X1 2 } is a lagged multiple of
{ X1 1 } with added uncorrelated noise. The lag is precisely the slope of the phase
spectrum «fo 1 2 . In general of course the phase spectrum will not be piecewise
linear with constant slope, however «fo 1 2 (Jc) can still be regarded as a measure
of the phase lag of { xt2 } behind { xt I } at frequency A in the sense that
fd.!c) d.!c = a 1 2 (.!c)ei¢,2 <;.> dJc = E [ l dZ 1 (.!c) l l dZ2 (.!c) l ei<e.<;.> - e2 <;.>> ],
Remark
where E>JA.) = arg(dZ;(),)), i = 1 , 2. We say that X1 2 lags d time units behind
X1 1 at frequency A if ex p (it.!c) dZ2 (Jc) = exp (i(t - d)),) dZ 1 (.!c). We can then write
/1 2 (),) d.!c = Cov (dZ 1 (.!c), exp(- id.!c) dZ1 (.!c)) = exp (id.!c)/1 1 (.!c) d.!c.
Hence «fod.!c) = arg(fd.!c)) = (d.!c + n)mod(2n) and «fo� 2 (.!c) = d. In view of its in­
terpretation as a time lag, ¢� 2 (.!c) is known as the group delay at frequency A.
EXAMPLE 1 1 .6.3 (An Econometrics Model). The mean corrected price and
supply of a commodity at time t are sometimes represented by X1 1 and X1 2
respectively, where
{xt l = - «fo1 Xtz + uto
o < «P1 < 1,
( 1 1 .6. 1 3)
0
< ¢z < 1 ,
,
Xtz = «ftz Xt- 1 . 1 + V,
where { U1 } WN(O, (Jb), { V, } WN(O, (JB) and { U1 } , { V, } are uncorrelated.
We now replace each term in these equations by its spectral representation.
Noting that the resulting equations are valid for all t, we obtain the following
equations for the orthogonal increment processes Z 1 , Z2 , Zu and Zv in the
spectral representations of { xt l }, { xt2 }, { ut } and { v, }:
�
�
441
§1 1 .6. The Cross Spectrum
and
dZ2 (A) rjJ2 e - i;. dZ 1 ()o) + dZv(A).
Solving for dZ1 (A) and dZ2 (A), we obtain
dZ1 (A.) = (1 + r/J 1 r/J2 e - i;.r 1 [ - r/J 1 dZv(A) + dZu(A)]
=
and
dZ2 (A.) = ( 1 + rP 1 rP2 e - iJ.) - 1 [dZv(A.) + r/J2 e - iJ. dZu()o)] .
From (1 1 .6.8) and ( 1 1 .6.9) it follows that
/1 1 (A) = 1 1 + </J 1 </J 2 e - i;. I - 2 (CJ b + </J I CJ W(2n),
fn( A.) 1 1 + <P 1 </J 2 e - i;. I - 2 (CJ� + </J � CJ b)/(2n),
and
/1 2 ()o) = 1 1 + </Y 1 </J 2 e - i;. I - 2 ( </J 2 CJ� COs A - </J 1 CJ� + i</J 2 CJ b sin A.)/(2n).
=
The squared coherency is therefore, by ( 1 1 .6. 1 0),
r/JI t + r/J l CJt - 2rjJ1 rP2 CJb (JB co d
I X1 2 (A.W = CJ
'
r/JI (Jt + r/Jl (Jt + ( 1 + r/Jl r/JI )(Jb (JB
and
Notice that the squared coherency is largest at high frequencies. This suggests
that the linear relationship between price and supply is strongest at high
frequencies. Notice also that for A close to n,
(r/J CJb cos A. - r/J1 CJB ) r/J2 CJb co d
r/J,u(A.) � 2 "' 2
('1'2 CJu cos A1 - '1'A. 1 CJv2 ) 2
indicating that price leads supply at high frequencies as might be expected. In
the special case r/J 1 = 0, we recover the model of Example 1 1 .6. 2 with d = 1,
for which </J dA.) = (A. + n)mod(2n) - n and </J 'dA.) = 1 .
EXAMPLE 1 1 .6.4 (Linear Filtering with Added Uncorrelated Noise). Suppose
that \f { 1/Ji ,j = 0, ± 1 , . . . } is an absolutely summable time-invariant linear
filter and that { X, d is a zero-mean stationary process with spectral density
f1 1 (A). Let { N, } be a zero-mean stationary process uncorrelated with { X, d
and with spectral density fN()o). We then define the filtered process with added
noise,
( 1 1 .6. 1 4)
x, 2 = j=L: I/Jj X,_j. 1 + N,.
- oo
=
OCJ
I I . M ultivariate Time Series
442
Since { Xr 1 } and { N, } are uncorrelated, the spectral density of { X, 2 } is
where
j
t{! (e-;;') = L� - w t{!je-i A.
( 1 1 .6. 1 5)
Corresponding to ( 1 1 .6. 14) we can also write
( 1 1 .6. 1 6)
where Z2 , Z1 and ZN are the orthogonal increment processes in the spectral
representations of { X, 2 }, { Xn } and { N, }. From (1 1 .6. 1 6),
E(dZ2 (),) dZ1 (A) )
=
t{! (e-iA)/1 1 (A) dA
and hence
The amplitude spectrum is
and since /1 1 is real-valued, the phase spectrum coincides with the phase gain
of the filter, i.e.
qJ2 1 (A) = a rg(t{! (e-iA)).
In the case of a simple delay filter with lag d, i.e. t{!j = 0, j i= d, q)2 1 (A) =
n)mod(2n) - n, indicating that { Xn } leads { X, 2 } by d
as expected.
The transfer function t{! (e-i·) of the filter, and hence the weights { t/JJ , can
be found from the relation
arg(e - idA) = ( - dA +
( 1 1 .6. 1 7)
quite independently of the noise sequence { N, } . From ( 1 1 .6. 1 5) and ( 1 1 .6. 1 7)
we also have the relation,
2
fz z (),) = l fz t (A) I //l i (A) + fN (A)
= 1 Jf2 1 (AW/z z (A) + fN(A),
where
I Jf2 1 (AW
is the squared coherency between {X, 2 } and {X, J } . Hence
2
( 1 1 .6. 1 8)
fN(A) = ( 1 - 1 Jfz t (A) I )/zz (A),
and by integrating both sides, we obtain
a� : = Var(N ,) =
f�y
- ! Jf2 1(AW )fz z (A) dA.
In the next section we discuss the estimation of/1 1 (A), f2 2 (A) and f1 2 (A) from
n pairs of observations, (X, 1, Xd', t = 1 , . . . , n. For the model ( 1 1.6. 1 4), these
estimates can then be used in the equations ( 1 1 .6. 1 7) and ( 1 1 .6. 1 8) to estimate
the transfer function of the filter and the spectral density of the noise sequence
{N, } .
443
§1 1.7. Estimating the Cross Spectrum
§ 1 1.7 Estimating the Cross Spectrum
Let {X, } be a stationary bivariate time series with EX, = J1 and E(X,+ h X;) where the covariance matrices r(h) have absolutely summable
components. The spectral density matrix function of {X,} is defined by
'
JlJl = r(h),
[/11
]
1
(A) fdA.)
= ( 2nr l I r (h)e - ih .<,
- n � A � n.
(A)
fz i
fzz (A)
h= - oo
In this section we shall consider estimation of
by smoothing the multi­
variate periodogram of {X , . . . , X. }. First we derive bivariate analogues of
the asymptotic results of Sections 10.3 and 1 0.4. We then discuss inference for
the squared coherency, the amplitude spectrum and the phase spectrum which
were defined in Section 1 1 .6.
The discrete Fourier transform of {X 1 , , X.} is defined by
f()o) =
/(A)
• • •
n
J(wJ = n - 1/2 L X, e - itwJ,
t =l
where wi = 2nj/n, - [(n - 1 )/2] � j � [n/2], are the n Fourier frequencies
introduced in Section 10. 1 . The periodogram of { X I , . . . , x. } is defined at each
of these frequencies wi to be the 2 x 2 matrix,
J.(wi ) = J(w)J * (wi ),
where * denotes complex conjugate transpose. As in Section 10.3 the definition
is extended to all frequencies w E [ - n, n] by setting
I" (w) =
{I.(g(n,
I:(g(n,
w))
w) )
-
if w � 0,
if w < 0,
( 1 1 .7. 1 )
where g(n, w), 0 � w � n, is the multiple of 2n/n closest to w (the smaller one
if there are two). We shall suppress the subscript n and write Iii(w), i, j = 1 ,
. . . , n, for the components of I.(w). Observe that Iu(w) i s the periodogram of
the univariate observations {Xu , . . . , X.;}. The function I 1 2 (w) is called the
cross periodogram. At the Fourier frequency wk it has the value,
Asymptotic Properties of the Periodogram
Since the next two propositions are straightforward extensions of Propositions
1 0. 1 .2 and 1 0.3. 1, the proofs are left to the reader.
Proposition 1 1 .7.1 .
n- 1 L �= l X,, then
If wi is any non-zero Fourier frequency and X. =
I I . Multivariate Time Series
444
where f(k) = n-1 L��t (Xr+ k - Xn )(X, - Xn ) ', k 2: 0, and f(k) = f' ( - k), k < 0.
The periodogram at jrequency zero is
In (O) = nXn X�.
If {X,} is a stationary bivariate time series with mean 1.1 and
covariance matrices r(h) having absolutely summable components, then
Proposition 1 1 .7.2.
(i)
and
Ein (O) - niJIJ' ---> 2nf(O)
Ein (w) ---> 2nf(w),
ifw of. 0
where f( · ) is the spectral matrix function of {X,}.
(ii)
We now turn to the asymptotic distribution and asymptotic covariances
of the periodogram values of a linear process. In order to describe the asymp­
totic distribution it is convenient first to define the complex multivariate
normal distribution.
1 1 .7.1 (The Complex Multivariate Normal Distribution). If l: =
1:Definition
1 + il:2 isma complex-valued m m matrix such that l: = l:* and a*l:a 0
for all a E e , then we say that y = y 1 + iY 2 is a complex-valued multivariate
normal random vector with mean 1.1 = 1.11 + i1.12 and covariance matrix l: if
J
[
�:] � N ( [:: J � [�: - �: ) .
( 1 1 .7.2)
We then write Y � Nc(IJ, l:). If y<nl = Y\n l + iY�l, n = 1, 2, . . . , we say that y<nl
([IJ(n)J [l;(n) - l;(n)J )
[ y(n)J
is ANc(l.l(n l , l:(n l ) if y �nl is AN ll�n) , :2 l; �l 1:\n) , where each l: (n ) =
l:\nl + il:�l satisfies the conditions imposed on l:. These guarantee (Problem
2:
x
1
1 1 . 1 6) that the matrix in ( 1 1 .7.2) is a real covariance matrix.
Suppose that { Z,} � IID(O, l) where l is non-singular, and
let In (w), - n :0:: w :0:: n, denote the periodogram of {Z1, . . . , Zn } as defined by
Proposition 1 1 .7.3.
(1 1 .7. 1 ).
(i)
0<
If }" < · · · < Am < n then the matrices In (}" ), . . . , In (}"m) converge
jointly in distribution as n ---> oo to independent random matrices, each distributed
as Yk Yk* where Yk � Nc(O, l).
(ii) If EZ� < oo, i = I , 2, and wj, wk are Fourier frequencies in [0, n], then
0 < wj = wk < n,
wj = wk = 0 or
1
1
wj of. wk,
n,
where Ipq( · ) is the (p, q)-element of In ( · ), apq is the (p, q)-element of l, and Kpqrs
is the fourth cumulant between Z,P , Z,q, Z,r and Z,s. (See Hannan (1970), p. 23.)
445
§1 1.7. Estimating the Cross Spectrum
A.
n
J(A.) = n - 112 rL� 1 z, e- itg(n , A).
We first show that J(A.) is ANc(O, I) (see Definition 1 1 .7. 1 ). We can rewrite J(A.)
as
n
J(),) = n 112 rL� 1 [Z, cos(tg (n, A.)) - iZ, sin(tg(n, A.)) ] .
PROOF. (i) For an arbitrary frequency E (O, n) define
-
[zZ,, cos(tg (n, A.))A.))]
Now the four-dimensional random vector,
L
U" ·. = n _112 f
.
'
sm(tg(n,
r�1
is a sum of independent random vectors and for g(n, E (0, n) we can write
(see Problem 1 1 . 1 7)
A.)
( 1 1 .7.3)
Applying the Cramer-Wold device and the Lindeberg condition as in the
proof of Proposition 1 0.3.2, we find that
This is equivalent, by Definition 1 1 . 7. 1, to the statement that
J(A.) is ANc(O, l:).
(Note that a complex normal random vector with real covariance matrix L
has uncorrelated real and imaginary parts each with covariance matrix L/2.)
It then follows by Proposition 6.3.4 that /"(A.) => YY * where Y � N(O, l:).
For
a computation analogous to the one giving ( 1 1 .7.3) yields
w -# A.,
E[J(A.)J*(w)]
=
0
J(w)
A
),
In(A.m ) < 1 < < m <
for all n sufficiently large. Since J(A.) and
are asymptotically joint normal,
it follows that they are asymptotically independent. Extending this argument
to the distinct frequencies 0
n, we find that J (A. 1 ),
···
J (),m )
and hence
), ,
are asymptotically independent.
(ii) The proof is essentially the same as that of Proposition 10.3.2 and is
therefore omitted. (See also Hannan (1 970), p. 249.)
D
In(A.1
• • •
• • • ,
As in Section 1 0.3, a corresponding result (Theorem 1 1 .7. 1) holds also for
linear processes. Before stating it we shall relate the periodogram of a linear
process to the periodogram of the underlying white noise sequence.
Proposition 1 1 .7.4.
Let {X, } be the linear process,
1 1. M ultivariate Time Series
446
00
( 1 1 .7.4)
X, = I ck zr - k •
k=
where l: is non-singular and the components of the matrices Ck satisfy
2
I k'= I Ck (i, j)J I kJ 11 < oo , i, j = 1 , 2. Let I x ( · ) and I ( ) be the periodograms
of { X 1 , . . . , X.} and { Z 1 , . . . , Z.} respectively. If EZj < oo, i = 1 , 2, and
C(e - iw) := I I:'= Ck e- ikw, then for each Fourier frequency wk E [0, n],
-oo
•.
- oo
•.
z
-
_ 00
I x (wk ) = C (e - iw )I., (wd C (e iw ) + R.(wk ),
where the components of R.(wk ) satisfy
1
i, j = 1 , 2.
max E I R n, iiwk W = O(n - ),
,
"
•.
2
'
"
Wk E [0 7t]
PROOF. The argument follows that in the proof of Theorem 1 0.3. 1 . (See also
Hannan ( 1 970), p. 248.)
0
Theorem 1 1 .7.1 . Let {X, } be the linear process defined by ( 1 1 .7.4) with
periodogram I.(A) = [Jij().)JL = 1 , - n :5: A :5: n.
(i) If 0 < A1 < · · · < Am < n then the matrices I.(A1 ), . . . , I.(Am ) converge
jointly in distribution as n --> oo to independent random matrices, the k'h of
which is distributed as Wk Wk* where Wk Nc(O, 2nf(Ad) and f is the spectral
density matrix of {X,}.
(ii) If wi = 2nj/n E [0, n] and wk = 2nkjn E [0, n], then
(2n) 2 [ fp , (w).fsq (wi ) + fps (w)fq, (wi ) ] + O(n - 112 )
if wi = wk = 0 or n,
�
if 0 < wi = wk < n,
if (l)j i= (l)b
1
1
2
where the terms O(n - 1 ) and O(n - ) can be bounded un iformly in j and k by
c 1 n - 1 12 and c 2 n- 1 respectively for some positive constants c1 and c2•
PROOF. The proof is left to the reader. (See the proof of Theorem 1 0.3.2 and
Hannan (1 970), pp. 224 and 249.)
D
Smoothing the Periodogram
As in Section 1 0.4, a consistent estimator of the spectral matrix of the linear
process ( 1 1 .7.4) can be obtained by smoothing the periodogram. Let { m.} and
{ W,( · ) } be sequences of integers and (scalar) weight functions respectively,
satisfying conditions ( 10.4.2)-(1 0.4.5). We define the discrete spectral average
estimator j by
0 :5: w :5: n. ( 1 1 . 7.5)
j(w) := (2n) - 1 I W,(k)I.(g(n, w) + wk ),
lk l :5 mn
§ 1 1 .7. Estimating the Cross Spectrum
447
In order to evaluate j(w), 0 :s:; w :s:; n, we define In to have period 2n and replace
In(O) whenever it appears in ( 1 1 .7.5) by
{
t
}
j(O) : = (2n) 1 Re W, (O) In(w 1 ) + 2
W,(k)In(wk+d ·
k l
-
We have applied the same weight function to all four components of In(w)
in order to facilitate the statement and derivation of the properties of j(w). It
is frequently advantageous however to choose a different weight-function
sequence for each component of In( · ) since the components may have quite
diverse characteristics. For a discussion of choosing weight functions to match
the characteristics of In ( - ) see Chapter 9 of Priestley (1981).
The following theorem asserts the consistency of the estimator j(w). It is a
simple consequence of Theorem 1 1 .7. 1 .
If {X1} is the linear process defined by ( 1 1 .7.4) and j(w) =
[/;j{w)JL= t is the discrete spectral average estimator defined by ( 1 1 .7.5), then
for A., w E [0, n],
Theorem 1 1.7.2.
(a) lim Ej(w) = f(w)
and
if w = A. = 0 or
if 0 < w = A. < n,
if w =ld.
n,
(Recall that if X and Y are complex-valued, Cov(X, Y) = E(X Y) - (EX)(EY).)
The cospectrum c 1 2 (w) = [ f1 2 (w) + f2 1 (w)]/2 and the quadrature spectrum
q1 2 (w) = i [ f1 2 (w) - f2 1 (w)]/2 will be estimated by
c1 2 (w) [ J1 2 (w) + j2 1 (w)]/2
=
and
respectively. By Theorem 1 1.7.2(b) we find, under the conditions specified, that
the real-valued random vector (L ikl s m W,2 (k)t 1 (]1 1 (w), f22 (w), c1 2 (w), qdw))',
0 < w < n, has asymptotic covariance matrix,
f1 1 C t z
fz z C t z
tUt t !22 + c i 2 - qi 2 )
C1 2 q 1 2
ft t q 1 2
f2 2 q 1 2
'
Ctzqtz
tU1 1 f2 2 + q iz - c i 2 )
( 1 1 .7.6)
J
1 1 . M ultivariate Time Series
448
f
�
where the argument w has been suppressed. Moreover we can express
( ]1 1 (w), f22 (w), c u(w), q 1 2 (w))' as the sum of (2m + 1) random vectors,
]1 1 (w)
/ 1 1 (g(n, w) + wk )
f22 (w)
l
22 (g(n, w) + wd
=
L Wn (k)
Re {Iu(g(n, w) + wk ) }
� u(w) lkl <; m
q 1 2 (w)
- Im{Iu(g(n, w) + wk) }
J
J
'
where the summands, by Theorem 1 1 .7. 1 , are asymptotically independent.
This suggests that
(]1 1 (w), J2 2 (w), c 1 2 (w), q 1 2 (w) ) is AN ( U1 1 (w), !22 (w), c 1 2 (w), q 1 2 (w) )', a; V)
'
(1 1 .7.7)
where a; = L ikl <; m W,2 (k) and V is defined by ( 1 1 .7.6). We shall base our
statistical inference for the spectrum on the asymptotic distribution (1 1 .7.7).
For a proof of (1 1 .7.7) in the case when j(w) is a lag window spectral estimate,
see Hannan ( 1 9 70), p. 289.
Estimation of the Cross-Amplitude Spectrum
To estimate oc 1 2 (w) = l f1 2 (w)l = l c 1 2 (w) - iq 1 2 (w)l we shall use
&1 2 (w) : = (c i 2 (w) + 4 i 2 (w)) 112 = h(c u(w), q u(w)).
By (1 1 .7.7) and Proposition 6.4.3 applied to h(x, y) = (x 2 + y 2 ) 1 12 , we find that
if oc u(w) > 0, then
where
a;(w) =
(:�y + G�Y
V33
V44 + 2
(:�)(:�)
v34,
vii is the (i,j)-element of the matrix defined by ( 1 1 .7.6), and the derivatives of
h are evaluated at (c 1 2 (w), q 1 2 (w)). Calculating the derivatives and simplifying,
we find that if the squared coherency, l %dwW, is strictly positive then
a 1 2 (w) is AN(ocu(w), a; ociiw)( l %dw)l - 2 + 1 )/2).
( 1 1 .7.8)
Observe that for small values of I %1 2 (wW, the asymptotic variance of &1 2 (w)
is large
Estimation of the Phase Spectrum
The phase spectrum tP 1 2 (w) = arg f1 2 (w) will be estimated by
J1 2 (w) := arg(c u(w) - iq u(w)) E ( - n, n] .
§1 1 .7. Estimating the Cross Spectrum
If l %dwW
449
0, then by ( 1 1 .7.7) and Proposition 6.4.3,
¢ 1 2(w) is AN(¢ u(w), a;ai z(w)( l %u(w) l - 2 - 1 )/2).
>
( 1 1 .7.9)
The asymptotic variance of ¢ dw), like that of & u(w), is large if l %1 2(w) l 2
is small
In the case when %1 2 (w) = 0, both c 1 2 (w) and q 1 2 (w) are zero, so from
(1 1 .7.7) and (1 1 .7.6)
[�dw) J
([ ] [
0
f
°
, ta; 1 1 /2 2
0
0
/1 1 /22
As /;u(w) = arg(cu(w) - iq u(w)) = arg[(a; /1 1 /22 /2) - 1 12 (c u(w) - iq u(w))],
we conclude from Proposition 6.3.4 that
q dw)
is AN
] )·
/;1 2 (w) => arg(V1 + iV2 ),
where V1 and V2 are independent standard normal random variables. Since
V1 /V2 has a Cauchy distribution, it is a routine exercise in distribution theory
to show that arg(V1 + iV2 ) is uniformly distributed on ( - n, n). Hence if
n is large and %1 2 (w) = 0, J1 2 (w) is approximately uniformly distributed
on ( - n, n).
From ( 1 1 .7.9) we obtain the approximate 95% confidence bounds for
� 1 2 (w),
¢ dw) ± 1 .96an& dw)( I Xu(w) l - 2 - 1) 1 ' 2/2 1 ' 2 ,
where I XdwW is the estimated squared coherency,
I XdwW = &iz (w);[ Jl l (w)]dw)J,
and it is assumed that 1 Xu(w) l 2 > 0.
Hannan ( 1 970), p. 257, discusses an alternative method for constructing a
confidence region for � 1 2 (w) in the case when W,(k) = (2m + 1 ) - 1 for l kl :;:; m
and W(k) 0 for l k l > m. He shows that if the distribution of the periodogram
is replaced by the asymptotic distributions of Theorem 1 1 .7. 1, then the event
E has probability ( 1 - a), where
1 - I X1 2(wW 1 ' 2
•
E = j sm(¢
t 1 - a1 2 (4m)
1 2(w) - ¢ u(w)) l :;:;
4m l %u(w) l 2
=
{
A
[
�
]
}
and t 1 _ a1 2 (4m) is the ( 1 - ct/2)-quantile of the t-distribution with 4m degrees
of freedom. For given values of ¢ dw) and I Xu(w) j , the set of ¢ dw) values
satisfying the inequality which defines E is therefore a 1 00(1 - a)% confidence
region for ¢ dw). If the right-hand side of the inequality is greater than or
equal to 1 (as will be the case if I Xdw) l 2 is sufficiently small), then we obtain
the uninformative confidence interval ( - n, n] for � 1 2 (w). On the other hand
if the right-hand side is less than one, let us denote its arcsin (in [0, n/2)) by
�*. Our confidence region then consists of values �1 2 (w) such that
l sin(/; 1 2 (w) - � 1 2 (w))l :;:; sin �*,
1 1 . Multivariate Time Series
450
i.e. such that
( 1 1 .7. 10)
or
J1 2 (w) +
n -
r/J*
:<:;
r/Ju(w) :<:; J1 2 (w) + + r/J*.
n
The confidence region can thus be represented as a union of two subintervals
of the unit circle whose centers are diametrically opposed (at J1 2 (w) and
¢ 1 2(w) + n) and whose arc lengths are 2¢*. If l ffdwW is close to one, then we
normally choose the interval centered at J1 2 (w), since the other interval
corresponds to a sign change in both c 1 2 (w) and q 1 2 (w) which is unlikely if
I Jf'dwW is close to one.
Estimation of the Absolute Coherency
The squared coherency l %dwW is estimated by l ffdwW where
If I ffu(w)l
>
1 2'1 2 (w)l = [ci z (w) + 4i z (w) r12 /[Jl l (w)fzz (w)] 112
= h ( jl l (w)Jzz (W), c u(w), q u(w)).
0, then by ( 1 1 .7.7) and Proposition 6.4.3,
l ffu(w) l is AN( I %1 2 (w) l , a;( l - l %dwW) 2/2),
( 1 1.7. 1 1)
giving the approximate 95% confidence bounds,
for I Jf'dw) l .
l fu(w)l ± 1 .96an ( l - l f1 2 (w)l 2 )/j2,
Since d [tanh - 1 (x)]/dx
Proposition 6.4.3 that
=
[ C � :) }
d � ln
dx = (1 - x 2 ) - 1 , it follows from
(1 1 .7. 1 2)
From ( 1 1 .7. 1 2) we obtain the constant-width large-sample 100(1 - a)%
confidence interval,
(tanh - 1 (1X'u(w)l) - <I>I -a/2 an /J2, tanh - 1 ( 1 2'1 2 (w)l) + <I>I - af2 an/J2),
for tanh - 1 ( l %1 2 (w)l ). The corresponding 100(1 - a)% confidence region for
l %1 2 (w) l is the intersection with [0, 1 ] of the interval
(tanh [tanh - 1 (1X'u(w)l ) - <I>I -af2 an /J2 ] ,
tanh [tanh - 1 ( I X'1 2 (w) l ) + <I>I - a/2 an /J2 ] ),
assuming still that I Jf'u(w)l > 0.
(1 1 .7. 1 3)
If the weight function W,(k) in ( 1 1 .7.5) has the form W,(k) = (2m + 1) - I
for l k l :<:; m and W,(k) = 0, l k l > m, then the hypothesis l %1 2 (w) l = 0 can be
§ 1 1.7. Estimating the Cross Spectrum
45 1
tested against the alternative hypothesis l ffu(w) l > 0 using the statistic,
Y = 2m i %u(w) l 2 /[1 - l %u(wWJ .
Under the approximating asymptotic distribution of Theorem 1 1 .7. 1 , it can
be shown that I %1 2 (wW is distributed as the square of a multiple correlation
coefficient, so that Y F(2, 4m) under the hypothesis that l ff1 ;(w) l = 0. (See
Hannan ( 1970), p. 254.) We therefore reject the hypothesis l ffu(w) l = 0 if
�
Y > F1 _ a (2, 4m)
(1 1 .7. 14)
where F1 _ a (2, 4m) is the ( 1 - a)-quantile of the F distribution with 2 and 4m
degress of freedom. The power of this test has been tabulated for numerous
values of l ffu(w) l > 0 by Amos and Koopmans ( 1 963).
EXAMPLE 1 1 .7. 1 (Sales with a Leading Indicator). Estimates of the spectral
density for the two differenced series {D,J } and {D,2 } in Example 1 1 .2.2 are
shown in Figures 1 1 .5 and 1 1 .6. Both estimates were obtained by smoothing
the respective periodograms with the same weight function W.(k) /3 ,
l k l s 6. From the graphs, it is clear that the power is concentrated at high
frequencies for the leading indicator series and at low frequencies for the
sales series.
The estimated absolute coherency, i &"1 2 (w) l is shown in Figure 1 1 .7 with
=
0.05
0 . 045
0.04
0 . 035
0.03
0.025
0.02
0.0 1 5
0.01
0 .005
0
0
0. 1
0 2
0.4
0.3
Figure 1 1 .5. The spectral density estimate ]d2nc), 0 :s; c
leading indicator series of Example 1 1 .7. 1 .
:s;
0.5
0.5, for the differenced
452
1 1 . Multivariate Time Series
1 .6
1 .5
1 .4
1 .3
1 .2
1 .1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
Q. 1
0
0.1
0
0.2
0.4
0.3
0.5
Figure 1 1 .6. The spectral density estimate ]2(2nc), 0 :o; c :o; 0.5, for the differenced
sales data of Example 1 1.7. 1 .
corresponding 95% confidence intervals computed from 1 1 .7. 1 3. The confi­
dence intervals for l f1 2 (w)J are bounded away from zero for all w, suggesting
that the coherency is positive at all frequencies. To test the hypothesis H0 :
l fdw) J = 0 at level a = .05, we use the rejection region ( 1 1 .7.14). Since
m = 6, we reject H0 if
2
1 2 J ffdw) J
> F 95(2, 24)
3.40,
1 - l fdw) J 2
�
=
=
i.e. if lffdw)J > .470. Applying this test to lffdw)J, we find that the hypothesis
l fdw) J 0 is rejected for all w E (0, n:). In fact the same conclusions hold
even at level a = .005. We therefore conclude that the two series are correlated
at each frequency. The estimated phase spectrum rP 1 2(w) is shown with the
95% confidence intervals from ( 1 1 .7. 10) in Figure 1 1.8. The confidence
intervals for ¢1 2(w) are quite narrow at each w owing to the large values of
l ffdw) J . Observe that the graph of rP dw) is piecewise linear with
slope 4. 1 at low frequencies and slope 2.7 at the other frequencies. This is
evidence, supported by the earlier analysis of the cross correlation function
in Example 1 1 .2.2, that {D,1} leads {D, 2 } by approximately 3 time units. A
transfer function model for these two series which incorporates a delay of 3
time units is discussed in Example 1 3. 1 . 1. The results shown in Figures
1 1 .5-1 1 .8 were obtained using the program SPEC.
§1 1 .7. Estimating the Cross Spectrum
453
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0
0. 1
0.2
0 3
0.4
0.5
Figure 1 1.7. The estimated absolute coherency I K 1 2 (2nc)l for the differenced leading
indicator and sales series of Example 1 1 .7. 1 , showing 95% confidence limits.
0
-1
-2
-3
-4
0
0.1
0.2
0.3
0.4
0.5
Figure 1 1 .8. The estimated phase spectrum, �1 2 (2nc), for the differenced leading
indicator and sales series, showing 95% confidence limits.
I I . Multivariate Time Series
454
§ 1 1 . 8 * The Spectral Representation of a Multivariate
Stationary Time Series
In this section we state the multivariate versions of the spectral representation
Theorems 4.3.1 and 4.8.2. For detailed proofs see Gihman and Skorohod
( 1 974) or Hannan ( 1 970). All processes are assumed to be defined on the
probability space (Q, .'#', P).
Theorem 1 1 .8.1. f( · ) is the covariance matrix function of an m-variate stationary
process {X,, t = 0, ± 1 , . . . } if and only if
h = 0, ± 1 , . . . ,
e ihv dF(v),
f(h) = (
J(-n.n]
where F( · ) is an m x m matrix distribution function on [ - n, n]. ( We shall use
this term to mean that F( - n) = 0, F( · ) is right-continuous and (F(/1) - F(A)) is
non-negative definite for all A :::; Jl, i.e. oo > a *(F ( Jl) - F(A)) a ?: 0 for all a E e m,
where a* denotes the complex conjugate transpose of a.) F is called the spectral
distribution matrix of {X,} or of r( · ). Each component Fik( · ) of F( · ) is a
complex-valued distribution function and J( ,1 eihv dF(v) is the matrix whose
(j, k)-component is L - n] eihv dFjk(v).
_ ,_
"·
PROOF. See Gihman and Skorohod (1 974), p. 2 1 7.
D
In order to state the spectral representation of {X, }, we need the concept
of a (right-continuous) vector-valued orthogonal increment process {Z(A),
- n :::; A :::; n}. For this we use Definition 4.6. 1, replacing (X, Y) by EXY*
and I I X II 2 by EXX*. Specifically, we shall say that {Z(A), - n :::; A :::; n} is a
vector-valued orthogonal increment process if
the components of the matrix E(Z(A)Z*(A)) are finite, - :::; A :::; n,
EZ(A) = 0, - n :::; A :::; n,
E(Z(A4 ) - Z(A3))(Z(A 2 ) - Z(A d )* = 0 if (A 1 , A2] (A3, A4 ] = r/J, and
E(Z(A + b) - Z(A))(Z(A + b) - Z(A))* --> 0 as b !O.
Corresponding to any process {Z(A), - n :::; A :::; n} satisfying these four
properties, there is a unique matrix distribution function G on [- n, n] such
(i)
(ii)
(iii)
(iv)
n
n
that
G(/1) - G(A) = E [(Z(/1) - Z(A.))(Z(/1) - Z(),))*],
A :::; fl. ( 1 1 .8. 1 )
In shorthand notation the relation between the matrix distribution function
G and {Z(A.), - n :::; A :::; n} can be expressed as
E(dZ(A.) d Z* (/1)) = bA dG(A.) =
· �'
{dG(A.)
0
if J1 = ) ,
:
otherwise.
Standard Brownian motion {B(A.), - n :::; A :::; n} with values
m
IRm
and
§ 1 1 .8. * The Spectral Representation of a Multivariate Stationary Time Series
455
B( - n) = 0 is an orthogonal increment process with G(A) = ()_ + n)I where I
is the (m x m) identity matrix. The fact that G(A) is diagonal in this particular
case reflects the orthogonality of B;(A), Bj(A), i i= j, for m-dimensional Brownian
motion. It is not generally the case that G(A) is diagonal; in fact from ( 1 1 .8. 1)
the (i,j)-element of dG(A) is the covariance, E (dZ;(A) dZp)).
The stochastic integral I ( f) with respect to { Z(A)} is defined for functions
f which are square integrable with respect to the distribution function
G0 := L f= 1 Gii as follows. For functions of the form
f(A)
=
n
L fJu,, ;.,+ ,/A),
i =O
- n = A0 < A 1 < . . . < An+l
=
n, ( 1 1 .8.2)
we define
n
! ( f ) := L J; [Z(A;+d - Z(A;)].
( 1 1 .8.3)
i =O
This mapping is then extended to a Hilbert space isomorphism I of L2(G0)
into L 2 (Z), where L 2(Z) is the closure in L 2 (0., Ji', P) of the set of all linear
combinations of the form ( 1 1 .8.3) with arbitrary complex coefficients };. The
inner product in L 2 (Z) is defined by
(1 1 .8.4)
Definition 1 1 .8.1 . If { Z(A), - n .::;; A .::;; n} is an m-variate orthogonal increment
process with E (dZ(A) dZ * (Jl)) = b;..ll dG(A) and G0 = 2:� 1 G;;, then for any
f E L 2 (G0) we define the stochastic integral J( - , , ,J i(v) dZ( v) to be the random
vector I (f ) E (Z) with I defined as above.
U
The stochastic integral has properties analogous to (4.7.4)-(4.7.7), namely
E (l (f))
=
0,
l (a J + a 2 g) = a 1 I ( f) + a 2 l (g),
<J( f ), I(g))
=
1- , , ,/( v) g( v) dG0( v),
a 1 , a 2 E C,
and
<I(fn), l (g.) )
--> 1 - , , ,/( v) g( v) dG0(v) if
fn � f and
g.
L2(G0)
�
g,
with the additional property,
E (l (f)I( g)*)
=
1 - , , , /( v) g( v) dG(v).
Now suppose that {X,} is a zero-mean m-variate stationary process with
spectral distribution matrix F( ' ) as in Theorem 1 1 .8. 1 . Let Yt' be the set of
random vectors of the form
1 1 . M ultivariate Time Series
456
( 1 1 .8.5)
U(Q,
F0 U(F0).F
and let :if denote the closure in
ff, P) of Jlt. The inner product in :if is
defined by ( 1 1 .8.4). Define % to be the (not necessarily closed) subspace
sp e ;1 . t E Z of
where = L7'= t ii . lf .i1 denotes the closure of % in
then, as in Section 4.8, .i1 =
The mapping T defined by
L2{(F0), } U(F0)
r(j=lf. aiX11)
=
f.
aie i1r,
j=l
( 1 1 .8.6)
can be extended as in Section 4.8 to a Hilbert space isomorphism of :if onto
which by Theorem 1 1 .8.1 has the property that
L2 (F0),
E[(T� 1f )(T� 1 g * ] 1�,, ,1 f(v)g(v) dF(v),
Consequently the process {Z(A), - n ::; A ::; n} defined by
Z(A) = T � 1 I( �Jt.AJ ( - ), - n ::; ) ::; n,
is an orthogonal increment process, and the matrix distribution function
associated with { Z(A)} is precisely the spectral distribution matrix F of { X1 }
appearing in Theorem
The spectral representation of {X1} is then
)
=
,
( 1 1 .8.8)
1 1 .8. 1 .
established by first showing that
T� 1f 1 ,, ,/(v)dZ(v),
�
then setting f(v) e i1 v and using
=
( 1 1 .8.9)
( 1 1 .8.6).
=
If {X1}
{ Z(A), ::; ::; n}
F(A),
is a stationary
Theorem 1 1 .8.2 (The Spectral Representation Theorem).
sequence with mean zero and spectral distribution matrix F ( · ), then there exists
a right-continuous orthogonal increment process
A
such that
-n
(i)
and
E[(Z(A) - Z(
(ii) X1 =
Z( - n))* ]
=
dZ(v) with probability
1.
- n) ) (Z(Jc) -
j ei1 v
J(�1t,1t]
PROOF. The steps are the same as those in the proof of Theorem
process
being defined by (1 1 .8.8).
{ Z().) }
4.8.2,
the
D
The following corollary is established by the same argument which we used
to prove Corollary 4.8. 1 .
If
{X1 } is a zero-mean stationary sequence then there exists a
- n ::;
such that
right-continuous orthogonal increment process
Corollary 1 1 .8.1 .
{Z(A),
A ::; n}
§ 1 1 .8. * The Spectral Representation of a Multivariate Stationary Time Series
Z( - n) = 0 and
xt =
I eitA dZ(v)
J(-1t,1t]
457
with probability 1 .
If { Y()") } and {Z(A-) } are two such processes, then
P(Y(A_) = Z(A-) = 1 for each A E [ - n, n] .
gEL2 (F0), Equations ( 1 1 .8.7) and ( 1 1 .8.9) imply that for any functions f,
E [f_ "· "/()") dZi()") f_"· "l g(A.) dZj(A-) J = 1 - "· "/(A-) g(A-) dFi (A.).
It can be shown (Problem 1 1.22) that the same relation then holds for all
fEU(Fi
; ) and gEL2 (Fjj).
As in the univariate case we also have the important result that
Y E (see (1 1 .8.5)) if and only if there exists a function gEL 2(F0) such that
Y = 1 -"· "l g(v) dZ(v) with probability 1 .
I n many important cases of interest (in particular i f {X1 } i s an
ARMA process) the spectral distribution matrix F( ) has the form,
F(w) = J:J(v) dv, - n w n.
Then f( · ) is called the spectral density matrix of the process. In the case when
for al i, j E { 1 , ... , m} we have the simple relations ( 1 1 . 1 . 14)
L �= IYii(h) l
and ( 1 1 . 1 . 1 5) connecting r( and f( ) .
Remark
I.
Remark 2.
Yi'
Remark 3.
·
s
- oo
< oo
i
·)
s
·
Time Invariant Linear Filters
The spectral representation of a stationary m-variate time series is particularly
useful when dealing with time-invariant linear filters. These are defined for
m-variate series just as in Section 4. 10, the only difference being that the
coefficients of the filter
0, ± 1 ,
are now (/ x m) matrices
instead of scalars. In particular an absolutely summable TLF has the property
that the elements of the matrices are absolutely summable and a causal TLF
has the property that
0 for j < 0.
is obtained from
by application of the absolutely summable
If
(l x m) TLF
then
00
= L
1 1 .8. 1 0)
Hi
H = { Hi,j =
{ Y1 } H = {Hi},Hi = {X1 }
yt j= HjXt -j•
- co
... }
(
1 1 . Multivariate Time Series
458
The following theorem expresses the spectral representation of {Y1 } in terms
of that of {X1 }.
If H = {Hi} is an absolutely summable (l x m) TLF and {X1 }
is any zero-mean m-variate stationary process with spectral representation,
Theorem 1 1 .8.3.
e'tv dZ(v),
Xt = l
J(�rr,rr]
and spectral distribution matrix F, then the /-variate process ( 1 1 .8.1 0) is stationary
with spectral representation
Y1 = l
e 11vH(e � 'v) dZ(v),
J(�rr,rr]
and spectral distribution matrix Fy satisfying
dFy(v) = H(e � iv) dF(v)H' (e 'v),
where
H(e'v) = L Hj e iiv.
00
j= - oo
PROOF. The proof of the representation of Y1 is the same as that of Theorem
4. 1 0. 1 . Since Y1 is a stochastic integral with respect to the orthogonal increment
process {W(v)} with dW(v) = H(e�'v) dZ(v), it follows at once that EY1 = 0
and that Y1 is stationary with
dFy(v) = E(dW(v) dW*(v)) = H(e � iv) dF(v)H'(e'v)
E(Yt+ h Y1*) = l
e'hv dFy(v).
J(�rr,rr]
and
Remark
D
4. The spectral representation decomposes X1 into a sum of sinusoids
eit v dZ(v),
�v�
The effect of the TLF H is to produce corresponding components
� v�
eitv H(e �iv) dZ(v),
which combine to form the filtered process {Y1}. The function H(e � iv),
� v � is called the matrix transfer function of the filter H = { HJ.
-n
n.
-n
-n
n,
ExAMPLE 1 1 .8. 1 . The causal ARMA(p, q) process,
<l>(B)X1 = 8(B)Z0
can be written (by Theorem 1 1 .3.1) in the form
00
xt = j�IO 'l'j z t�j•
n,
459
Problems
where L � o 'f'i z i = <1> - 1 (z)0(z), l z l :::;; 1 . Hence {X,} i s obtained from {Z, }
by application of the causal TLF {'f'i,j = 0, 1 , 2, . . . }, with matrix transfer
function,
- n :::;; v :::;; n.
By Theorem
matrix,
1 1 .8.3
the spectral distribution matrix of X therefore has density
- n :::;; v :::;; n.
ExAMPLE 1 1 .8.2. The spectral representation of any linear combination of
components of X, is easily found from Theorem 1 1 .8.3. Thus if
r;
=
a* X, where a E Cm,
then
where
Zy(v) = a* Z(v),
and
dFy (v) = E(dZy(v) dZ:(v)) = a* dF(v)a.
The same argument is easily extended to the case when r; is a linear combi­
nation of components of X, X,_ 1 , . . . .
Problems
1 1.1.
Let { Y, } be a stationary process and define the bivariate process, X11 = Y, ,
= Y,-d where d =1 0. Show that { (X1 1 , X1 2 )' } is stationary and express its
cross-correlation function in terms of the autocorrelation function of { Y,} . If
Pr (h) ..... 0 as h ..... oo show that there exists a lag k such that p 1 2 (k) > pn(O).
X, 2
1 1 .2. Show that the linear process defined in ( 1 1 . 1 . 1 2) is stationary with mean 0 and
covariance matrix function given by ( 1 1 . 1 . 1 3).
1 1 .3. * Prove Proposition 1 1 .2.2.
1 1 .4.
1 1 .5.
1 1 .6.
Prove Theorem 1 1 .3.2.
If {X, } is a causal ARMA process, show that there exists e E (0, 1) and a constant
K such that IYii(h)l :::;; K e1 h l for all i, j and h.
Determine the covariance matrix function of the ARMA(1, 1) process defined
in ( 1 1 .4.33).
460
1 1 . Multivariate Time Series
1 1 .7. If G(z) = L h'� 1(h)z h is the covariance matrix generating function of an
ARMA process, show that G(z) = <l> - 1 (z) E>(z) tE>'(z - 1 )<l>' - 1 (z - 1 ).
P
1 1 .8. For the matrix M in ( 1 1 .4.21 ), show that det(z/ - M) = zm det(<l>(z - 1 )) where
P
<l>(z) I - <l>P 1 z - · · · - <l>PPz .
- oc
=
1 1 .9. (a) Let { X, } be a causal multivariate AR(p) process satisfying the recursions
{ Z, }
�
WN(O, t).
For n > p write down recursion relations for the predictors, Ps, Xn + h ,
h � 0, and find explicit expressions for the error covariance matrices in
terms of the AR coefficients and * when h 1, 2 and 3.
(b) Suppose now that {Y,} is the multivariate ARIMA(p, 1 , 0) process
satisfying V'Y, X,, where { X, } is the AR process in (a). Assuming that
Y 0 j_ X,, t � I , show that
h
P( Yn + h i Yo , Y J> · · · , Yn) = Yn + L Ps, Xn + j
j� I
=
=
and derive the error covariance matrices when h 1 , 2 and 3. Compare
these results with those obtained in Example 1 1 .5. 1 .
=
1 1 . 10. Use the program ARVEC t o analyze the bivariate time series, X, 1 , X, 2 ,
t
1 , . . . , 200 (Series J and K respectively in the Appendix). Use the minimum
AICC model to predict (Xt. l , Xr. 2 ), t = 201 , 202, 203 and estimate the error
covariance matrices of the predictors.
=
1 1 . 1 1 . Derive methods for simulating multivariate Gaussian processes and multi­
variate Gaussian ARMA processes analogous to the univariate methods speci­
fied in Problems 8. 1 6 and 8 . 1 7.
1 1 . 1 2. Let { X , } be the invertible MA(q) process
{ Z, }
�
W N ( O, t ) ,
where t is non-singular. Show that as n --> oo,
(a) E(X n+ t - Xn + t - Zn+ t ) (Xn+t - Xn + t - Zn+ t l' --> 0,
(b) vn --> t, and
(c) E>nj --> E>j, j = 1, . . . , q.
(For (c), note that E>j = E(X n+t Z�+ t -)� - t and E>nj = E (Xn + 1 (Xn + t - j ­
xn+t -ll v,.-::. �.)
1 1 . 1 3. If X and Y are complex-valued random variables, show that E l Y - aXI 2 is
minimum when a = E( YX)/E I X I 2 .
1 1 . 1 4. Show that the bivariate time series (Xn , Xd' defined in ( 1 1 .6.14) is stationary.
1 1 . 1 5. If A and its complex conjugate A are uncorrelated complex-valued random
variables such that EA = 0 and E I A I 2 = (J2 , find the mean and covariance
matrix of the real and imaginary parts of A. If X, = L}�1 ( Aj e i<;t + � e - i'i' ),
0 < il 1 < · · · < iln < rr, where {Aj , � ,j = 1 , . . . , n } are uncorrelated, E Aj = 0
and E I Aj l 2 = (Jl/2 , j = 1 , . . . , n, express X, as L}� 1 [Bj cos(iljt) + Cj sin(iljt)]
and find the mean and variance of Bj and Cj .
Problems
461
Y is a complex-valued random vector with covariance
E( Y - Jl) ( Y - Jl) * = :E 1 + i:E2, verify that the matrix
1 1 . 1 6. If
matrix
:E :=
is the covariance matrix of a real-valued random vector.
1 1. 1 7. Let
un
=
n
[
Z, cos ( tw)
_ 112 �
L..
'
.
r =l Z, sm
( tw)
J
where { Z, } is bivariate white noise with mean 0 and covariance matrix 1:, and
2nj/n E (0, n). Show that E V. V� = !(� n
w1
=
1 1 . 1 8. If U1 and U2 are independent standard normal random variables, show that
U2/U1 has a Cauchy distribution and that arg(U1 + iU2) is uniformly dis­
tributed on ( - n, n).
1 1 . 19. Verify the calculation of the asymptotic variances in equations ( 1 1 .7.8), ( 1 1 .7.9)
and ( 1 1 .7. 1 1).
1 1 .20. Let { X, 1 , t = I , . . . , 63} and {X,2, t = I , . . . , 63} denote the differenced series
{V In Yr 1 } , {V In Y, 2} where { Y, J } and { Y,2} are the annual mink and muskrat
trappings (Appendix A, series H and I respectively).
(a) Compute the sample cross correlation function of { Xr1 } and { X,2 } for lags
between - 30 and + 30 using the program TRANS
(b) Test for independence of the two series.
1 1.21 . With { Xr 1 } and { Xd as in Problem 1 1 .20, estimate the absolute coherency,
I K 1 2(il)l and phase spectrum ¢'1 2(il), 0 � il. � n, using S PEC. What do these
functions tell you about the relation between the two series? Compute approxi­
mate 95% confidence intervals for I K 1 2(il)l and ¢' 1 2(il).
1 1 .22. * Prove Remark 1 of Section 1 1.8.
1 1 .23. * Let {X, } be a bivariate stationary process with mean 0 and a continuous
spectral distribution matrix F. Use Problem 4.25 and Theorem 1 1.8.2 to show
that {X, } has the spectral representation
X,1
=
2
r
J (O.•J
cos(vt) d U) v) + 2
r
J(O.n]
sin ( vt) d lj(v),
j
=
I , 2,
where {U(il) = ( U1 (il.), UAil.))' } and {V(il.) = ( V1 (il.), V2(..i.))' } are bivariate ortho­
gonal increment processes on [0, n] with
1
},
R e {d
= 2
-
E(dV(il.) dV'(fl))
E(dV(A.) dV' (!1))
and
=
i5;..�
F(il.)
r 1 i5;..� Re { d F (A.) } ,
462
1 1 . Multivariate Time Series
If {X,} has spectral density matrix f(A), then
en{ A) =
and
r'
Cov(dU, (A), dU2 (A))
=
r'
Cov(dV, (A), dV2 (A))
where c1 2 (A) is the cospectrum and q 1 2 (A) is the quadrature spectrum. Thus
c1 2 (A) and q1 2 (A) can be interpreted as the covariance between the "in-phase"
and "out of phase" components of the two processes { X,d and {X,J at
frequency A.
CHAPTER 1 2
State- Space Models and the
Kalman Recursions
In recent years, state-space representations and the associated Kalman
recursions have had a profound impact on time series analysis and many
related areas. The techniques were originally developed in connection with
the control of linear systems (for accounts of this subject, see the books of
Davis and Vinter ( 1 985) and Hannan and Deistler ( 1988)). The general form
of the state-space model needed for the applications in this chapter is defined
in Section 1 2. 1 , where some illustrative examples are also given. The Kalman
recursions are developed in Section 1 2.2 and applied in Section 1 2.3 to the
analysis of ARMA and ARIMA processes with missing values. In Section
1 2.4 we examine the fundamental concepts of controllability and
observability and their relevance to the determination of the minimal
dimension of a state-space representation. Section 1 2.5 deals with recursive
Bayesian state estimation, which can be used (at least in principle) to compute
conditional expectations for a large class of not necessarily Gaussian
processes. Further applications of the Bayesian approach can be found in
the papers of Sorenson and Alspach ( 1 97 1), Kitagawa ( 1987) and Grunwald,
Raftery and Guttorp ( 1989).
§ 1 2. 1 State-Space Models
In this section we shall illustrate some of the many time-series models which
can be represented in linear state-space form. By this we mean that the series
{Y�> t = 1, 2, . . . } satisfies an equation of the form
t = 1 , 2, . . .
'
( 1 2. 1 . 1 )
1 2. State-Space Models and the Kalman Recursions
464
where
X, + I = F,X, + v,
t = 1 , 2, . . . .
( 1 2. 1 .2)
The equation ( 1 2. 1 .2) can be interpreted as describing the evolution of the
state X, of a system at time t (a v x 1 vector) in terms of a known sequence
of v x v matrices F 1 , F 2 ,
and the sequence of random vectors X 1 , V 1 ,
Equation ( 1 2. 1 . 1) then defines a sequence of observations, Y , which
V2 ,
are obtained by applying a linear transformation to X, and adding a random
noise vector, W,, t = 1 , 2, . . . . (The equation ( 12. 1 .2) is generalized in control
theory to include an additional term H, u, on the right, representing the
effect of applying a control u, at time t for the purpose of influencing X, + 1 .)
• . .
• • • .
Assumptions. Before proceeding further, we list the assumptions to be used
in the analysis of the state equation ( 1 2. 1 .2) and the observation equation
( 1 2. 1 . 1 ) :
(a) F 1 , F 2 , is a sequence of specified v x v matrices.
(b) G 1 , G 2 , . . . is a sequence of specified w x v matrices.
(c) {X 1 , (V; , w;)', t = 1 , 2, . . . } is an orthogonal sequence of random vectors
with finite second moments. (The random vectors X and Y are said to be
orthogonal, written X l. Y, if the matrix E(XY') is zero.)
(d) EV, = 0 and EW, = 0 for all t.
(e) E(V,V ;) = Q,, E(W,W;) = R,, E(V, W;) = S, , where {Q,}, {R,} and {S,} are
specified sequences of v x v, w x w and v x w matrices respectively.
. . •
Remark 1. In many important special cases (and in all the examples of this
section) the matrices F, G,, Q, R, and S, will be independent of t, in which
case we shall suppress the subscripts.
It follows from the observation equation ( 1 2. 1 . 1 ) and the state
equation ( 1 2. 1 .2) that X, and Y, have the functional forms, for t = 2, 3, . . . ,
Remark 2.
and
x, = !,( X I , v I • . . . , v, _ l )
( 1 2. 1 .3 )
Y, = g,(X 1 , V � > . . . , V, _ 1 , W,).
( 1 2. 1 .4)
Remark 3. From Remark 2 and Assumption (c) it is clear that we have the
orthogonality relations,
and
V, l. Y,
1 � s � t,
W, l. Y, 1 � s < t.
As already indicated, it is possible to formulate a great variety of
time-series (and other) models in state-space form. It is clear also from the
465
§ 1 2. 1 . State-Space Models
definition that neither {X r } nor {Y,} is necessarily stationary. The beauty
of a state-space representation, when one can be found, lies in the simple
structure of the state equation ( 1 2. 1 .2) which permits relatively simple analysis
of the process { Xr }. The behaviour of {Y,} is then easy to determine from
that of { X,} using the observation equation ( 1 2. 1 . 1 ). If the sequence
{ X 1 , V 1 , V 2 , . . . } is independent, then {X,} has the Markov property, i.e. the
distribution of X, + 1 given X" . . . , X 1 is the same as the distribution of X, + 1
given X,. This is a property possessed by many physical systems provided
we include sufficiently many components in the specification of the state X,
(for example, we may choose the state-vector in such a way that X, includes
components of X, _ 1 for each t).
To illustrate the versatility of state-space models, we now consider some
examples. More can be found in subsequent sections and in the books of
Aoki ( 1 987) and Hannan and Deistler ( 1 988). The paper of Harvey ( 1984)
shows how state-space models provide a unifying framework for a variety
of statistical forecasting techniques.
ExAMPLE 1 2. 1 . 1 (A Randomly Varying Trend With Added Noise). If {3 is
constant, { V,} WN(O, a 2 ) and Z 1 is a random variable uncorrelated
with { V, , t = 1 , 2, . . . }, then the process {Z,, t = 1 , 2, . . . } defined by
t = 1 , 2, . . . ' (12.1 .5)
Zr+ l = z, + {3 + v, = z l + {3t + VI + . . . + v,,
�
has approximately linear sample-paths if a is small (perfectly linear if a = 0).
The sequence { V,} introduces random variation into the slope of the
sample-paths. To construct a state-space representation for { Z,} we introduce
the vector
Then ( 1 2. 1 .5) can be written in the equivalent form,
t = 1 , 2, . . . '
( 1 2. 1 .6)
where V, = ( V, , 0)'. The process {Z,} is then determined by the observation
equation, Z, = [ 1 O]X, . A further random noise component can be added
to Z,, giving rise to the sequence
Y, = [ 1 O]X, + �'
(12.1 .7)
t = 1 , 2, . . . ,
where { � } WN(O, v 2 }. If {X 1 , V1 , W1 , V2 , W2 , } is an orthogonal
sequence, the equations ( 1 2. 1 .6) and ( 1 2 . 1 .7) constitute a state-space
representation of the process { Y,}, which is a model for data with randomly
varying trend and added noise. For this model we have
�
• • •
1 2. State-Space Models and the Kalman Recursions
466
EXAMPLE 1 2. 1 .2 (A Seasonal Series with Noise). The classical decomposition
( 1 .4. 1 2) considered earlier in Chapter 1 expressed the time series { X1} as a
sum of trend, seasonal and noise components. The seasonal component (with
period d) was a sequence {s1} with the properties sr + d = S1 and L�= 1 s1 = 0.
Such a sequence can be generated, for any values of s 1 , s0 , , s _ d + J • by
means of the recursions,
• • •
sr + 1 = - sr - · · · - sr - d + 2 '
t = 1 , 2, . . .
'
( 12. 1 .8)
A somewhat more general seasonal component { Y, }, allowing for random
deviations from strict periodicity, is obtained by adding a term V, to the right
side of ( 1 2. 1. 1 8), where { V,} is white noise with mean zero. This leads to the
recursion relations,
Y, + 1 = - Y, - . . . - Y. - d + 2 + v; ,
(12.1.9)
t = 1 , 2, . . . .
To find a state-space representation for { Y,} we introduce the
(d - I )-dimensional state vector,
The series { Y, } is then given by the observation equation,
Y, = [1 0 0
·· ·
O]Xl '
t = 1 , 2, . . .
'
where { X1} satisfies the state equation,
t = 1 , 2, . . .
'
with V1 = ( V, , 0, . . . , 0)' and
-1
1
F= 0
0
-1
0
0
-1
0
0
-1
0
0
0
ExAMPLE 1 2. 1 . 3 (A Randomly Varying Trend with Seasonal and Noise
Components). Such a series can be constructed by adding the two series in
Examples 1 2. 1 . 1 and 1 2. 1 .2. (Addition of series with state-space
representations is in fact always possible by means of the following
construction. See Problem 1 2.2.) We introduce the state-vector
where X11 and X� are the state vectors in Examples 1 2. 1 . 1 and 1 2. 1 .2
respectively. We then have the following representation for { Y,}, the sum of
the two series whose state-space representations were given in Examples
§ 1 2. 1 . State-Space Models
467
1 2. 1 . 1 and 1 2. 1 .2. The state equation is
( 12. 1 . 10)
where F 1 , F 2 are the coefficient matrices and {V,1 }, { vn are the noise vectors
in the state equations of Examples 1 2. 1 . 1 and 1 2. 1 .2 respectively. The
observation equation is
Y, = [ 1 0
0
···
( 1 2. 1 . 1 1 )
O]X, + W,,
where { W,} is the noise sequence i n ( 1 2. 1 .7). I f the sequence of random vectors,
{X 1 , v t, Vi, W1 , VL V�, W2 , . . . }, is orthogonal, the equations ( 1 2. 1 . 1 0)
and ( 1 2. 1 . 1 1) constitute a state-space representation for { Y,} satisfying
assumptions (aHe).
We shall be concerned particularly in this chapter with the use of
state-space representations and the Kalman recursions in the analysis of
ARMA processes. In order to deal with such processes we shall need to
consider state and observation equations which are defined for all
t E { 0, ± 1 , . . . } .
Stationary State-Space Models Defined for t E {0, ± 1 ,
.
.
.
}
Consider the observation and state equations,
Y, = GX, + W,,
t = 0, ± 1 , . .
( 1 2. 1 . 1 2)
. '
( 1 2. 1 . 1 3)
t 0, ± 1, . . .
X, + 1 = FX, + V, ,
where F and G are v x v and w x v matrices respectively, {VJ WN(O, Q),
{W,} WN(O, R), E(V, w;) = S for all t and Vs .1 W, for all s =I= t.
The state equation (12. 1 . 1 3) is said to be stable (or causal) if the matrix
F has all its eigenvalues in the interior of the unit circle, or equivalently if
det(J Fz) =1= 0 for all z E C such that I z I :s; 1 . The matrix F is then also said
to be stable.
In the stable case the equations ( 1 2. 1 . 1 3) have the unique stationary
solution (Problem 1 2.3) given by
00
(12. 1 . 14)
X, = L FiVt - j - 1 •
j=O
The corresponding sequence of observations,
=
'
�
�
-
00
Y, = w, + L GFiV, _ j - 1 •
j= O
is also stationary.
1 2. State-Space Models and the Kalman Recursions
468
ExAMPLE 1 2. 1 .4 (State-Space Representation of a Causal AR(p) Process).
Consider the AR(p) process defined by
t = 0, ± 1 , . . . ,
( 1 2. 1 . 1 5)
where {Zr } - WN(O, a 2 ), and ¢(z) := 1 - ¢ 1 z - · · · - r/J p zP is non-zero for
l z l ::::; 1 . To express { X r } in state-space form we simply introduce the state
vectors,
t = 0, ± 1 , . . . .
( 1 2. 1 . 1 6)
If at time t we observe r; = X" then from ( 1 2. 1 . 1 5) and ( 1 2. 1 . 1 6) we obtain
the observation equation,
r; = [0 0 0 . . . 1]X"
( 1 2. 1 . 1 7)
t = 0, ± 1 , . . . ,
and state equation,
xt + l =
0
0
0
0
0
0
0
cPp cPp - 1 cPp - 2
0
0
¢1
Xr +
0
0
0
zt + l •
t = 0, ± 1 , . . . .
( 1 2. 1 . 1 8)
In Example 1 2. 1 .4 the causality condition, ¢(z) i= 0 for l z l ::::; 1 , is
equivalent to the condition that the state equation ( 1 2. 1 . 1 8) is stable, since
the eigenvalues of the coefficient matrix F in ( 1 2. 1 . 1 8) are simply the
reciprocals of the zeroes of ¢(z) (Problem 1 2.4). The unique stationary
solution of ( 1 2. 1 . 1 8) determines a stationary solution of the AR(p)
equation ( 1 2. 1 . 1 5), which therefore coincides with the unique stationary
solution specified in Remark 2 of Section 3. 1 .
Remark 4.
If equations ( 1 2. 1 . 1 7) and ( 1 2. 1 . 1 8) are postulated to hold only for
t = 1, 2, . . . , and if X 1 is a random vector such that {X 1 , Z 1 , Z 2 , } is an
orthogonal sequence, then we have a state-space representation for { r;} of
the type defined earlier by ( 1 2. 1 . 1 ) and (12.1 .2). The resulting process { r;} is
well-defined, regardless of whether or not the state equation is stable, but it
will not in general be stationary. It will be stationary if the state equation
is stable and if X 1 is defined by ( 1 2. 1 . 1 6) with X r = L� o t/Jj Zr - j •
t = 1 , 0, . . . , 2 - p, and t/J(z) = 1/¢(z), i z l ::::; 1 .
Remark 5.
• • .
EXAMPLE 1 2. 1 .5 (State-Space Representation of a Causal ARMA(p, q)
Process). State-space representations are not unique. We shall give two
§12. 1 . State-Space Models
469
representations for an ARMA(p, q) process. The first follows easily from
Example 1 2. 1 .4 and the second (Example 1 2. 1 .6 below) has a state-space with
the smallest possible dimension (more will be said on this topic in Section
12.4). Consider the causal ARMA(p, q) process defined by
where { Z,}
�
¢(B) ¥; = 8(B)Z,
t = 0, ± 1, . . .
2
WN(O, a ) and ¢(z) =!= 0 for I z I ::::; 1. Let
( 1 2.1. 1 9)
'
r = max(p, q + 1),
¢j = 0 for j > p, ej = 0 for j > q and e o = 1 .
Then i t is clear from ( 1 2. 1 . 1 9) that we can write
¥; = [ 8, - 1 e, - 2 . . . Bo]X,
( 1 2. 1 .20)
where
( 1 2.1.2 1 )
and
t = 0, ± 1 , . . . .
¢(B)X, = Z,
(12. 1 .22)
But from Example 1 2. 1 .4 we can write
0
0
x, + 1
0
0
0
0
=
0
0
0
¢ , c/Jr - 1 c/Jr - 2
X, +
0
0
0
¢1
z, + 1 ,
t = 0, ± 1, . . . .
( 1 2. 1 .23)
Equations ( 1 2.1 .20) and (12.1.23) are the required observation and state
equations. The causality assumption implies that ( 1 2.1.23) has a unique
stationary solution which determines a stationary sequence { ¥;} through the
observation equation ( 1 2. 1 .20). It is easy to check that this sequence satisfies
the ARMA equations ( 1 2. 1 . 1 9) and therefore coincides with their unique
stationary solution.
ExAMPLE 1 2.1.6 (The Canonical Observable Representation of a Causal
ARMA(p, q) Process). Consider the ARMA(p, q) process { ¥;} defined by
( 1 2. 1 . 1 9). We shall now establish a lower dimensional state-space
representation than the one derived in Example 1 2.1.5. Let
Then
m =
max(p, q) and ¢i = 0 for j > p.
¥; = [1 0 0
· · ·
O]X,
+
Z,
t = 0, ± 1, . . .
'
( 1 2.1 .24)
1 2. State-Space Models and the Kalman Recursions
470
where { X1 } is the unique stationary solution of
0
0
xl + l =
1
0
0
0
0
0
0
0
</Jm </Jm - 1 </Jm - 2
XI +
</J I
1/1 1
1/1 2
1/Jm - 1
1/Jm
t = 0, ± 1 , . . . '
zn
(12.1 .25)
and 1jJ 1 , . . . , 1/Jm are the coefficients of z, z 2 , . . . , zm in the power series expansion
of (}(z)f<P(z), \ z \ :::;; 1. (If m = 1 the coefficients of X1 in ( 1 2. 1 .24) and ( 1 2. 1 .25)
are 1 and <P 1 respectively.)
PROOF. The result will be proved by showing that if { X1} is the unique
stationary solution of (12.1.25) and if { l;} is defined by ( 1 2. 1 .24), then { l;}
is the unique stationary solution of ( 1 2. 1 . 1 9). Let F and G denote the
matrix coefficients of X1 in ( 1 2. 1 .25) and ( 1 2. 1 .24) respectively, and let
H = ( 1/1 1 , 1/1 2 , . . . , 1/Jm)'. Then
( 1 2. 1 .26)
i = 1, . . . , m,
and, since det(z/ - F) = z m - </J 1 zm - l - · · · - <Pm (see Problem 1 2.4), the
Cayley-Hamilton Theorem implies that
( 1 2. 1 .27)
pm - </J ipm - 1 - . . . - </J m/ = 0.
From (12.1 .24) and ( 1 2. 1 .25) we have
GXI + zn
'Yr + 1 = GFX1 + GHZ1 + Z1 + 1 ,
Yr
=
These equations, together with (12.1.26) and ( 1 2 . 1.27), imply that
'Yr + m - </J 1 'Yr + m - 1 - · · · - </J m 'Yr
0
=[
- </Jm · · · - </J 1
1/1 1
1 ] 1/1 2
1/1 1
0
0
0 zl
0 zl + l
0 zl + 2
zl +m
1/Jm 1/Jm - 1 1/Jm - 2
Since 1/1 1 , . . . , 1/1 m are the coefficients of z, z 2 , . . . , zm in the power series
expansion of (}(z)/</J(z), i.e. ljli = </J 1 1/lj - l + ¢ 2 1/li - 2 + · · · + <Pi + 8i as in (3.3.3),
471
§12. 1 . State-Space Models
we conclude that { r;} satisfies the ARMA equations,
Yr + m - <!> 1 Yr + m - 1 - · · · - ¢m 1'; = [8m em - !
81
· · ·
z,
Zr + l
1 ] Z, + z
( 1 2. 1 .28)
Thus the stationary process { r;} defined by ( 1 2. 1 .24) and ( 1 2. 1 .25) satisfies
¢(B) r; = 8(B)Z, t = 0, ± 1, . . . , and therefore coincides with the unique
stationary solution of these equations.
ExAMPLE 1 2. 1 .7 (State-Space Representation of an ARIMA(p, d, q) Process).
If { l';} is an ARIMA(p, d, q) process with {Vd l';} satisfying ( 1 2. 1 . 1 9), then by
the preceding example {Vd l';} has the representation,
t = 0, ± 1, . . . '
( 12. 1 .29)
where { X,} is the unique stationary solution of the state equation,
X, + I = FX, + HZ,
and G, F and H are the coefficient matrices in (12. 1 .24) and ( 1 2. 1 .25). Let A
and B be the d x 1 and d x d matrices defined by A = B = 1 if d = 1 and
A=
0
0
0
,
B=
0
0
0
0
0
0
!
+
d
d
d
( - 1 ) (�) ( - 1 ) (d � d ( - 1 ) l (d � z )
if d > 1 . Then since
r; = vd r; the vector
0
� . ( - 1)1 1'; -j ,
d) .
(
j�
L.
I
0
0
1
d
(12. 1 .30)
}
satisfies the equation,
Y, = AVd l"; + BY, _ 1 = A GX, + BY, _ 1 + AZ,.
Defining a new state vector T, by stacking X, and Y, _ 1 , we therefore obtain
the state equation,
t = 1 , 2, . . . ' (12. 1 .31)
1 2. State-Space Models and the Kalman Recursions
472
and the observation equation, from ( 1 2. 1 .29) and ( 12. 1 .30),
(-1)d + 1 (�) (-1)d(d�d (-l)d - 1 (d �z)
l'; = [G
d]
. . .
t
=
[yt-Xt 1 J
+
Z� >
1 , 2, . . . , (12.1.32)
with initial condition,
and the orthogonality conditions,
Y0
_l
t = 0, ± 1, . . . ,
Z�>
( 1 2. 1 .33)
. . . , Y0)'. The conditions (12.1 .33), which are satisfied
where Y0 =
in particular if Y0 is considered to be non-random and equal to the vector
of observed values (y
y2 . . . , y0)', are imposed to ensure that the
j_ Y0 and
assumptions (a)-(e) are satisfied. They also imply that
Y0 j_ V l'; , t :2: 1, as required earlier in Section 9.5 for prediction of ARIMA
processes.
State-space models for more general ARIMA processes (e.g. { l';} such that
{VV l';} is an ARMA(p, q) process) can be constructed in the same way.
(See Problem 1 2.9.)
(Y1 _d, Y2 - d,
1 - d, - d,
X1
d
12
For the ARIMA( l , 1, 1) process defined by
(1 + BB)Z� >
{Z1} � WN(O, a 2),
the vectors X1 and Y1 _ 1 reduce to 1 = X1 and Y1 _ 1 = fr _ . The state-space
representation ( 1 2. 1 .31) and (12.1.32) becomes (Problem 1 2.7)
(1
- ¢B)(l - B) l';
=
Yr [ 1
[xt + 1 ] [¢ o][ xt [¢ + e]zp
Yr 1 1 Yr- 1 J 1
=
where
=
and
X
1] [ Yr-xt 1 J + zp
+
1
(12.1 .34)
t=
1, 2, . . . , ( 1 2. 1 .35)
( 1 2. 1 .36)
ExAMPLE 1 2. 1 .8 (An ARMA Process with Observational Error). The
is not always the most
canonical observable representation of Example
convenient representation to employ. For example, if instead of observing
the ARMA process { l';}, we observe
12.1.6
473
§12. 1 . State-Space Models
where { N,} is a white-noise sequence uncorrelated with { Y,}, then a
state-space representation for { U,} is immediately obtained by retaining the
state equation ( 1 2. 1 .23) of Example 1 2. 1 .5 and replacing the observation
equation (12.1.20) by
U, = [8, _ , 8, _ 2 · · · 80]X, + N, .
(The state-space model in Example 1 2. 1 .6 can also be adapted to allow for
added observational noise.)
State-space representations have many virtues, their generality, their ease
of analysis, and the availability of the Kalman recursions which make least
squares prediction and estimation a routine matter. Applications of the latter
will be discussed later in this chapter. We conclude this section with a simple
application of the state-space representation of Example 1 2. 1 .6 to the
determination of the autocovariance function of a causal ARMA process.
The Autocovariance Function of a Causal ARMA Process
If { Y,} is the causal ARMA process defined by ( 1 2. 1 . 19), then we know from
Example 1 2. 1 .6 and ( 12. 1 . 1 4) that
where
G = [ 1 0 0 · · · OJ
and
X, = HZ, _ , + FHZ, _ 2 + F2 HZ, _ 3 + · · · ,
with the square matrix F and the column vector H as in (12.1.25). It follows
at once from these equations that
00
Y, = z, + I GFj - 1 HZ, _ j ·
j= 1
Hence
(12.1.37)
{
l
2
if k = 0,
, r(k) = a [ 1 + If= 1 GFj - H H'F'j - 1 G']
Y
a 2 [GF ik i - ! H + If= , GFj - 1 HH'F' I k l + j - 1 G'] if k i= 0.
The coefficients lj;j in the representation Y,
( 1 2. 1 .37) as
{
=
If= 0 lj;j z, _ j can be read from
1
ifj = 0,
=
I
j
j
1/1
G F - H ifj � 1 '
which shows in particular that lj;j converges to zero geometrically as j -> oo .
474
12. State-Space Models and the Kalman Recursions
This argument, unlike those used in Chapter 3, does not reqmre any
knowledge of the general theory of difference equations.
§ 12.2
The Kalman Recursions
In this section we shall consider three fundamental problems associated with
the state-space model defined by (12. 1 . 1 ) and (12. 1 .2) under the assumptions
(a)-(e) of Section 1 2. 1 . These are all concerned with finding best (in the sense
of minimum mean-square error) linear estimates of the state-vector X, in
terms of the observations Y 1, Y 2 , . . . , and a random vector Y 0 satisfying the
conditions
(12.2. 1 )
The vector Y 0 will depend o n the type of estimates required. I n many (but
not all) applications, Y 0 will be the degenerate random vector
Y0 = 1 ( 1 , 1, . . . , 1)'. Estimation of X, in terms of
=
(a) Y0, . . . , Y, 1 defines the prediction problem,
(b) Y 0, . . . , Y, defines the filtering problem, and
(c) Y0, . . . , Y" defines the smoothing problem (in which it is assumed that
_
n > t).
Each of these problems can be solved recursively using an appropriate
set of Kalman recursions which will be established in this section. Before we
can do so however we need to clarify the meaning of best linear estimate in
this context.
12.2.1. The best one-step linear predictor, X10 of X, = (X, 1 , • • . , X r v)'
is the random vector whose ith component, i = 1, . . . , v, is the best linear
predictor of X ti in terms of all the components of the t vectors, Y0,
y I ' . . . , Y, - I • More generally the best estimator x t lk of X, is the random
vector whose ith component, i = 1, . . . , v, is the best linear estimator of X ti
in terms of all the components of Y 0, Y 1 , . . . , Yk . The latter notation covers
all three problems (a), (b) and (c) with k = t - 1 , t and n respectively. In
particular X, = X,1, _ 1 • The corresponding error covariance matrices are
defined to be
Definition
The Projection P(X I Y0 , . . . , Y1) of a Second-Order
Random Vector X
In order to find X, (and more generally X, 1 k) we introduce (cf. Section 1 1 .4)
the projections P(X I Y0 , . . . , Y,), where X, Y0, . . . , Y, are jointly distributed
§12.2. The Kalman Recursions
475
random vectors with finite second moments. If X is a v-component random
vector with finite second moments we shall say that X E L2 .
If X E L'2 , and Y0 , Y 1 , Y 2 , . . . have finite second moments,
then we define P(X I Y0 , . . . , Y r) to be the random v-vector whose ith
component is the projection P(X i i S) of the ith component of X onto the
span, S, of all of the components of Y0, . . . , Yr . We shall abbreviate the
notation by writing
Definition 12.2.2.
t = 0, 1 , 2, . . .
'
throughout this chapter. The operator Pr is defined on U:�
1
L2 .
Remark 1 . By the definition of P(Xi i S), PJX) is the unique random vector
with components in S such that
[X - Pr(X)]
(See ( 1 1 .4.2) and ( 1 1 .4.3).)
.1
Ys ,
s = 0, . . . ' t.
For any fixed v, Pr( · ) is a projection operator on the Hilbert space
inner product (X, Y) = Li� 1 E(Xi Y;) (see Problem 1 2. 1 0).
Orthogonality of X and Y with respect to this inner product however is not
equivalent to the definition E(XY') = 0. We shall continue to use the latter.
Remark 2.
L2 with
Remark 3. If all the components of X, Y 1 ,
distributed and Y 0 = 1, then
Pr(X) = E(X I Y �o · · · · Y r),
Remark 4. Pr is
then
• . .
, Y r are jointly normally
t 2 1.
linear in the sense that if A is any k x v matrix and X , V E L2
and
Remark 5.
If Y E L2 and X E L2 , then
P(X I Y) = MY,
where M is the v x w matrix, M = E(XY')[E(YY')r 1 and [E(YY')r 1 is any
generalized inverse of E(YY'). (A generalized inverse of a matrix S is a matrix
such that SS - 1 S = S. Every matrix has at least one. See Problem 1 2. 1 1 ).
s- 1
Proposition 12.2.1.
for t, s 2 1 ,
If { Xr } and { Yr } are defined as in ( 1 2. 1 . 1) and (12.1 .2), then
(1 2.2.2)
1 2. State-Space Models and the Kalman Recursions
476
and in particular,
X, = P, _ , (X,),
(1 2.2.3)
where X, and X,1 s are as in Definition 1 2.2. 1 and Y0 satisfies (12.2. 1).
PROOF. The result is an immediate consequence of Definitions 1 2.2. 1 and
1 2.2.2.
0
We turn next to the derivation of the Kalman recursions for the one-step
predictors of X, in the state-space model defined by ( 12. 1 . 1 ) and ( 1 2. 1 .2).
Proposition 1 2.2.2
(Kalman Prediction). Suppose that
X, + 1 = F, X, + V,,
t = 1 , 2, . . . ,
( 12.2.4)
and
where
Y, = G,X, + W, ,
E Ut = E
[ ]
V
' = O,
w,
t = 1, 2, . . .
E(U ' U')'
( 1 2.2.5)
,
[ Q , R, J,
S,
=
s;
X 1 , U 1, U 2 , . . . , are uncorrelated, and Y0 satisfies (12.2. 1). Then the one-step
predictors,
x, = P, _ , x , ,
and the error covariance matrices,
n, = E[(X, - x,)(X, - x,n
are uniquely determined by the initial conditions,
and the recursions, for t = 1, 2, . . . ,
=
=
=
=
G,n, G; + R, ,
F,n, G; + s, ,
F, n, F; + Q , ,
F, 'P, F; + e, � ,- ' e; ,
n, + , - 'P, + , ,
x, + , = F, x, + e, � ,- ' (Y, - G, X,),
where � ,- 1 is any generalized inverse of � , .
��
e,
n, + ,
'P , + ,
n, + ,
=
PROOF. We shall make use of the innovations,
1,,
(1 2.2.6)
( 1 2.2.7)
defined by 10 = Y0 and
t = 1, 2, . . . .
§ 1 2.2. The Kalman Recursions
477
The sequence {11 } is orthogonal by Remark 1. Using Remarks 4 and 5 and
the relation,
Pr( ' ) = Pr t ( · ) + P( ' l lr),
(see Problem 1 2. 1 2), we find that
Xr + t =
=
=
(1 2.2.8)
Pr - t X r + t + P(Xr + t l lr)
Pr - I (Fr xt + VI) + e� ��- � ��
Ft x t + e � ��- � � 1 >
( 1 2.2.9)
where
�� = E(I1I;) = G1il1 G; + Rn
et = E(X r + I I;) = E[(Fr Xr + vi)([Xr - XrJ'G; + w;)]
= FrQr G; + St .
To evaluate �1, 81 and il1 recursively, we observe that
n t + l = E(Xt + l x; + d - E(Xr + l x; + d = nt + l - 'Pr + l ,
where, from ( 1 2.2.4) and ( 12.2.9),
nt + 1 = Fr nt F; + Qt .
and
D
Remark 6. The initial state predictor X 1 is found using Remark 5. In the
important special case when Y0 = 1, it reduces to EX 1 .
h-Step Prediction of
{Y1}
Using the Kalman Recursions
The results of Proposition 1 2.2.2 lead to a very simple algorithm
for the recursive calculation of the best linear mean-square predictors, P1 Y1 + h'
h = 1 , 2, . . . . From ( 12.2.9), ( 1 2.2.4), ( 12.2.5), ( 12.2.7) and Remark 2 in Section
1 2. 1 , we find that
( 1 2.2. 1 0)
h = 2, 3, . . . ,
and
h
=
1, 2, . . . .
( 1 2.2. 1 1)
( 1 2.2. 1 2)
478
1 2. State-Space Models and the Kalman Recursions
From the relation,
h
= 2, 3, . . . ,
Q�h>: = E[(Xr+ h - P,X,+ h)(Xr+ h - P,X, +h)'] satisfies the recur­
h = 2, 3, . . . ' ( 1 2.2. 1 3)
with op> = 0, + 1 . Then from (1 2.2.5) and ( 12.2. 1 2) it follows that l:�h > :=
E[(Y, +h - P,Y, +h)(Yr+h - P,Y,+ h)'] is given by
h = 1 , 2, . . . .
( 1 2.2. 14)
(Kalman Filtering). Under the conditions of Proposition
1 2.2.2, and with the same notation, the estimates X,1 , = P,X, and the error
covariance matrices 0,1 , = E[(X, - Xq,)(X, - X,1 ,)'] are determined by the
relations,
( 1 2.2. 1 5)
P,X, = P, _ 1 X, + O, G; L1,- 1 (Y, - G, X,),
we find that
swns,
Proposition 1 2.2.3
and
( 12.2. 1 6)
PROOF. From ( 1 2.2.8) it follows that
where
M = E(X, I;)[E(I, I;)]- 1
E[X,(Gr(X,
Jt;)'].-1,- 1
1
( 1 2.2. 1 7)
= Q, G;.-1, .
To establish ( 1 2.2. 1 6) we write
X, - P, _ 1 X, = X, - P,X, + P,X, - P, _ 1 X, = X, - P,X, + MI,.
Using ( 1 2.2. 1 7) and the orthogonality of X, - P,X, and MI,, we find from
the last equation that
=
- X,)
+
as required.
D
Fixed Point Smoothing). Under the conditions of
and
Proposition 1 2.2.2, and with the same notation, the estimates
the error covariance matrices
are determined
for fixed t by the following recursions, which can be solved successively for
n t, t + 1 , . . . :
( 1 2.2. 1 8)
Proposition 1 2.2.4 (Kalman
=
0,1 n = E[(X, - X,1 n)(X, - X,1 ")'X,1] n = Pn X,,
n, n+1 = O, n[Fn - E>nL1n- 1 GnJ',
Q, l n = O, l n - 1 - O,, nG�.-1; 1 Gn o;, n ,
( 1 2.2. 1 9)
(1 2.2.20)
479
§ 1 2.2. The Kalman Recursions
with initial conditions, P, _ l x l = X, and n, , , = n, l r - 1 = n, (found from
Proposition 1 2.2.2).
PROOF. Using ( 1 2.2.8) we can write Pn X, = Pn _ 1 X, + Cin, where In =
Gn(Xn - Xn) + Wn . By Remark 5 above,
I
c = E[X,(Gn(Xn - Xn) + Wn)'][E(In i�)] - = n,, n G��n- l ' (1 2.2.2 1)
where n, , n := E [(X, - X,)(Xn - Xn)' ] . It follows now from ( 1 2.2.4), ( 1 2.2. 1 0),
the orthogonality ofVn and wn with X, - X" and the definition ofn,,n that
n r,n + I = E[(X, - X,)(Xn - XnY(Fn - en �n- 1 Gn)'] = n, , n[Fn - en�; 1 GnJ '
thus establishing ( 1 2.2. 1 9). To establish (1 2.2.20) we write
X, - Pn X, = X, - Pn _ 1 X, - Ci n.
Using ( 1 2.2.21) and the orthogonality of X, - PnX, and In, the last equation
then gives
n = t, t + 1, . . . ,
as required.
D
EXAMPLE 1 2.2. 1 (A Non-Stationary State-Space Model). Consider the
univariate non-stationary model defined by
t
and
where
=
1, 2, . . . '
t = 1 , 2, . . . '
We seek state estimates in terms of 1 , Y1 , ¥2 , . . . , and therefore choose Y0 = 1 .
I n the notation o f Proposition 1 2.2.2 we have n l = 'I' I = 1 , n l = 0, and the
recurswns,
� � = n, + 1 ,
e , = 2n, ,
nr + l = 40, + 1 = j-(4' + 1 - 1 ),
( 1 2.2.22)
4!1l
'1' , + 1 = 4'1', + e,2 ��- 1 = 4'1', + -- ,
1 + n,
n r + l = n r + l - 'I' , + I = j-(4' + 1 - 1) - 'I' , + I ·
Setting '1', + I = - n, + I + j-(4' + I - 1), we find from the fourth equation that
4!1'2
+ 1 - 1 ) = 4( - n, + 11
- n, + 1 + 11
3�4' - 1)) + --- .
3�4'
1 + n,
480
1 2. State-Space Models and the Kalman Recursions
This yields the recursion
Qt + 1 -
1 + 50,
1 + 0, '
---
from which it can be shown that
4 + 2j5 - (j5 - 1 )c2 - '
,
where c = !{7 + 3j5). ( 12.2.23)
2 + (J5 + 3)c 2 - r
We can now write the solution of ( 12.2.22) as
0, =
{11,
n,
+ 1,
e , = 20,
n, = �(4' - 1 ),
\{1, = �(4' - 1 ) - n, ,
=
( 12.2.24)
with n, as in ( 12.2.23).
The equations for the estimators and mean squared errors as derived in
the preceding propositions can be made quite explicit for this example. Thus
from Proposition 1 2.2.2 we find that the one-step predictor of X, + 1 satisfies
the recursions,
20,
1 + n,
X,+ 1 = 2X, + -- ( I; - X,),
�
�
�
with x l = 1 ,
and with mean squared error Q, + 1 given b y ( 12.2.23). Similarly, from (12.2. 1 2)
and (12.2.14), the one-step predictor of r; + 1 and its mean squared error are
given by,
and
L� l J = 0, + 1 + 1 .
The filtered state estimate for Xr + 1 and its mean squared error are found
from Proposition 1 2.2.3 to be
and
Qt + 1
Qt + 1 lt + 1 - _:__c:__
1 + Qt + 1
Finally the smoothed estimate of X, + 1 based on Y0 , Y1, . . . , r; + 2 is found,
using Proposition 1 2.2.4 and some simple algebra, to be
-
_
48 1
§1 2.2. The Kalman Recursions
with mean squared error,
n, + l
nt + l lt + 2 - --1 + nt + 1
It is clear from ( 12.2.23) that the mean squared error, nn of the one-step
predictor of the state, Xn converges as t --+ oo. In fact we have, as t --+ oo ,
-1
and ntlt + 1 --+ Js
.
4
These results demonstrate the improvement in estimation of the state X, as
we go from one-step prediction to filtering to smoothing based on the
observed data Y0 , . . . , Yr + 1 .
For more complex state-space models i t is not feasible to derive
explicit algebraic expressions for the coefficients and mean squared errors
as in Example 1 2.2. 1 . Numerical solution of the Kalman recursions is however
relatively straightforward.
Remark 7 .
ExAMPLE 1 2.2.2 (Prediction of an ARIMA(p, 1 , q) Process). In Example
1 2. 1.7, we derived the following state-space model for the ARIMA(p, 1 , q)
process { 1;} :
1;
where
=
[G 1 ]
[ X, 1 J
Yr -
+ Z ,,
[X, + 1 ] = [ OJ[ X,- 1 J [ ]
Yr
F
G 1
1;
X1
+
t
H
z ,,
1
=
1 , 2, . . . '
t = 1 , 2, . . . '
z: FiH z, _ i'
j=O
t = 0, ± 1 , . . . '
00
=
( 12.2.25)
( 12.2.26)
and the matrices, F, G and H are as specified in Example 1 2. 1 .7. Note that
Y0 (and the corresponding innovation I 0 = Y0) here refers to the first
observation of the ARIMA series and not to the constant value 1. The
operator P.( - ), as usual, denotes projection onto sp{ Y0 , . . . , 1;}. Letting T,
denote the state vector (X;, 1; _ 1 )' at time t, the initial conditions for the
recursions ( 1 2.2.6) and ( 12.2.7) are therefore
0
E(X� X'1 )
0 1 = E(T1T'1) =
,
T 1 = P0 T 1 =
Yo
[J
[
482
1 2. State-Space Models and the Kalman Recursions
The recursions ( 1 2.2.6) and ( 12.2.7) for the one-step predictors T, and the
error covariance matrices n, = E[(T, - T,)(T, - T,)'] can now be solved. The
h-step predictors and mean squared errors for the ARIMA process { Y,} are
then found from ( 1 2.2. 1 1)-( 12.2. 14).
It is worth noting in the preceding example, since X, = Y, - r; _ ,
is orthogonal to Y0 , t ;:::: 1 , that
Remark 8.
P,X, + 1 = P(X, + 1 I X 1 , . . . , X,) = X �+ 1 ,
where X�+ 1 is the best linear predictor of X, + 1 based on X 1 , , X,.
Consequently, the one-step predictors of the state-vectors T, = (X;, Y, _ 1 )' are
• . •
t = 1 , 2, . . . ,
with error covariance matrices,
n, =
[�* �J
where X� and � are computed by applying the recursions ( 1 2.2.6) and ( 12.2.7)
to the state-space model for the ARMA process {X,}. Applying ( 1 2.2. 1 1) and
( 1 2.2. 1 2) to the model ( 12.2.25), ( 12.2.26) we see in particular that
and
In view of the matrix manipulations associated with state-space
representations, the forecasting of ARIMA models by the method described
in Section 9.5 is simpler and more direct than the method described above.
However, if there are missing observations in the data set, the state-space
representation is much more convenient for prediction, parameter estimation
and the estimation of missing values. These problems are treated in the next
section.
Remark 9.
§ 12.3 State-Space Models with Missing Observations
State-space representations and the associated Kalman recursions are ideally
suited to the precise analysis of data with missing values, as was pointed out
by Jones (1980) in the context of maximum likelihood estimation for ARMA
processes. In this section we shall deal with two missing-value problems for
state-space models. The first is the evaluation of the (Gaussian) likelihood
based on {Yi,, . . . , YJ where i 1 , i 2 , , i, are positive integers such that
1 � i 1 < i 2 < · · · < i, � n. (This allows for observation of the process {Y,} at
• . .
483
§1 2.3. State-Space Models with Missing Observations
irregular intervals, or equivalently for the possibility that (n - r) observations
are missing from the sequence {Y 1 , , Yn} .) The solution of this problem will
enable us, in particular, to carry out maximum likelihood estimation for
ARMA and ARIMA processes with missing values. The second problem to
be considered is the minimum mean squared error estimation of the missing
values themselves.
• . .
The Gaussian Likelihood of
1 ::::;; i 1 < i2 < . . < i, ::::;; n
{Yi1,
•
•
•
, Yd,
·
Consider the state-space model defined by equations ( 1 2. 1 . 1 ) and (12. 1.2) and
suppose that the model is completely parameterized by the components of
the vector 9. If there are no missing observations, i.e. if r = n and ij = j,
j = 1, . . . , n, then the likelihood of the observations {Y 1 , . . . , Yn} is easily
found as in ( 1 1 .5.4) to be
L(9; Y l , . . . , Yn)
where Yj = Pj - l Yj and Lj = Ljl ), j � 1 , are the one-step predictors and error
covariance matrices found from ( 1 2.2. 1 2) and ( 1 2.2. 1 4) with Y0 == 1 .
To deal with the more general case o f possibly irregularly spaced
observations {Y ; ,, . . . , Y ;,}, we introduce a new series { Yr}, related to the
process {X,} by the modified observation equations, Y� = Y 0 and
t
where
G*I
=
{,
G
0
if t E { i 1 , . . . , q,
otherwise,
W*I
and { N ,} is iid with
N, � N(O, /
)
w x w ,
Ns _L X I , Ns _L
=
= 1 , 2, . . .
{w,
N,
[�J
'
( 1 2.3. 1 )
if t E { i 1 , . . . , i.},
(1 2.3.2)
otherwise,
S, t
=
0, ± 1 , . . . . (12.3.3)
Equations ( 1 2.3.1) and ( 1 2. 1 .2) constitute a state-space representation for the
new series {Yi}, which coincides with {Y,} at each t E { i 1 , i2 , . . . , ir }, and at
other times takes random values which are independent of {Y,} with a
distribution independent of 9.
Let L 1 (9 ; Y; , , . . . , Y;) be the Gaussian likelihood based on the observed
values Y;, , . . . , Y;, of Y ; , , . . . , Y ;, under the model defined by (12. 1 . 1 ) and
(12.1.2). Corresponding to these observed values, we define a new sequence,
484
1 2. State-Space Models and the Kalman Recursions
' = {0
y f , . . . ' y: , by
Yr
y*
ift E { i 1 , . . . , i,},
otherwise.
( 1 2.3.4)
Then it is clear from the preceding paragraph that
( 12.3.5)
L 1 (9 ; Y i , , . . . , yj,} = (2n)<n - r) w/ 2 L 2(9; y f , . . . , y:),
where L2 denotes the Gaussian likelihood under the model defined by ( 1 2.3. 1)
and (12.1.2).
In view of ( 1 2.3.5) we can now compute the required likelihood L 1 of the
realized values { y" t = i 1 , , i,} as follows :
.
•
.
(i) Define the sequence { y t, t = 1 , . . . , n} as in ( 1 2.3.4).
(ii) Find the one-step predictors Y7 of Y7, and their error covariance
matrices I:(, using Proposition 1 2.2.2 and the equations ( 1 2.2. 1 2) and (1 2.2. 1 4)
applied to the state-space representation, ( 1 2.3. 1 ) and ( 1 2. 1 .2) of {Yn. Denote
the realized values of the predictors, based on the observation sequence { y t},
by {yn.
(iii) The required Gaussian likelihood of the irregularly spaced
observations, { Yi , , . . . , y d, is then, by ( 12.3.5),
L 1 (9 ; Y i, , · · · , Yi)
= (2nl - rw/2
(n
n
J= 1
det I:J
)
- 1 12
{
1 n
exp - - _I (yJ - YJYI:J - 1 (yJ - YJl .
2 J= l
( 12.3.6)
}
ExAMPLE 1 2.3.1 (An AR(1) Series with One Missing Observation). Let { Y;}
be the causal AR(1) process defined by
To find the Gaussian likelihood of the observations y 1 , YJ , y4 and y 5 of Y1 ,
Y3 , Y4 and Y5 we follow the steps outlined above.
(i) Set yt = Yi , i = 1, 3, 4, 5 and Y! =
(ii) We start with the state-space model for { Y;} from Example 1 2. 1 .4, i.e.
r; = X , , X, + 1 = ¢ X , + Z, + 1. The corresponding model for { Yt } is then,
from ( 1 2.3. 1 ),
t = 1 , 2, . . .
X, + 1 = F, X, + v; ,
Y ( = G(X, + W(,
t = 1 , 2, . . .
0.
'
where
G*l
=
{
'
0
1 if t i= 2,
if t = 2,
R*l
= {0
if t
1 if t
W*' =
i=
2,
= 2,
s:
=
{o
0,
if t i= 2,
N, if t = 2,
485
§1 2.3. State-Space Models with Missing Observations
and X 1 = L� o �jz 1 _ j (see Remark 5 of Section 1 2. 1 ). Starting from the initial
conditions,
0 11-
and applying the recursions ( 1 2.2.6), we find (Problem 1 2. 1 6) that
t
and
t
1
=
{¢
0
if t = 1 ,
if t = 3,
if t = 2, 4, 5,
if t = 1 ' 3 , 4, 5,
if t = 2,
From ( 1 2.2. 1 2) and ( 1 2.2. 1 4) with h = 1 , we find that
with corresponding mean squared errors,
(iii) From the preceding calculations we can now write the likelihood of
the original data as
EXAMPLE 1 2.3.2 (An ARIMA( l , 1 , 1) Series with One Missing Observation).
Suppose we have observations y0 , y 1 , y 3 , y4 and y 5 of the ARIMA( l , 1 , 1 )
process defined i n Example 1 2. 1 .7. The Gaussian likelihood of the
observations y1, J3 , y4 and y5 conditional on Y0 = Yo can be computed by
a slight modification of the method described above. Instead of setting Y0 = 1
as in the calculation of unconditional likelihoods, we take as Y 0 the
one-component vector consisting of the first observation Y0 of the process.
The calculation of the conditional likelihood is then performed as follows:
(i) Set y[ = yi, i = 0, 1, 3, 4, 5 and Y i = 0.
(ii) The one-step predictors Y;", t = 1, 2, . . . , and their mean squared
errors are evaluated from Proposition 1 2.2.2, equations ( 1 2.2. 1 2) and ( 1 2.2. 14)
and the state-space model for { Yi} derived from Example 1 2. 1 .7, i.e.
X, + 1
Y;"
= F, X,
+
V� >
= G;"X, + W;",
t = 1 , 2, . . . '
t = 1 , 2, . . . ,
1 2. State-Space Models and the Kalman Recursions
486
G*1
S(
=
{a 2 [ ¢
+
1
e]
0
if t i= 2,
if t
=
= {[[01
1 ] if t i= 2,
OJ if t 2,
if t i= 2,
if t 2,
if t i= 2,
if t 2,
=
=
=
and
2,
and using the recursions ( 12.2.6), ( 12.2. 1 2) and ( 1 2.2. 1 4), we obtain the
. . . ' n, and mean squared errors,
is the
predicted values,
in terms of
...
evaluated at
best linear predictor of
d.
· · · ,
(iii) The required likelihood of the observations
conditional
is found by direct substitution into ( 1 2.3.6). We shall not attempt
on
to write it algebraically.
yf, YiY0 = y0,
Yf, Yi
Y�, Yf, , Y(_I.f,1 , ... ' I.! (.yi y0,
y 1 , YJ, y4, y5,
Remark 1. If we are given observations
of
an ARIMA(p, d, q) process at times 1 - d, 2 - d, . . . , 0,
where
1 :::; < < · · · < ir :::; n, we can use the representation ( 1 2. 1 .3 1) and ( 1 2. 1 .32)
with the same argument as in Example 1 2.3.2 to find the Gaussian likeliconditional on
hood of
(Missing values among the first d observations
can be
handled by treating them as unknown parameters for likelihood
maximization.) A similar analysis can be carried out using more general
differencing operators of the form ( 1 - B)d( 1 - Bs)n (see Problem 1 2.9). The
dimension of the state vector constructed in this way is max(p + d + sD, q).
Different approaches to maximum likelihood estimation for ARIMA
processes with missing values can be found in Ansley and Kohn ( 1985)
and Bell and Hillmer ( 1990).
i 1 i2
Y;,, . . . , Y;,
y1 _d, y2 -d, . . . , y0, Y;i ,, ,. Y;. .,,, i.". . , Y;,
1
Y1_d = Y 1 - d, yY21 - d,d =y2Yz-d,d,. ....,.Yo, Y0 = Yo·
- -
Observation vectors from which some but not all components
are missing can be handled using arguments similar to those used above.
For details see Brockwell, Davis and Salehi ( 1990).
Remark 2.
487
§ 1 2.3. State-Space Models with Missing Observations
Estimation of Missing Observations for State-Space Models
Given that we observe only Y;,, Y;,, . . . , Y;, , 1 ::s; i1 < i2 < · · · < i, ::s; n, where
{Y,} has the state-space representation ( 1 2. 1 . 1) and (12.1.2), we now consider
the problem of finding the minimum mean square error estimators
P(Y,IY0, Y;,, . . . , Y;,) of Y, , 1 ::s; t ::S; n where Y0 = 1. To handle this problem
we again use the modified process {Yi} defined by ( 1 2.3. 1) and ( 12. 1 .2) with
Y6 = 1 . Since Ys* = Ys for s E {i1, . . . , i,} and Y: .1 X" Y0 for 1 ::s; t ::s; n and
s ¢ { 0, i1, . . . , i,}, we immediately obtain the minimum mean squared error
state estimators,
1
::s;
t ::s; n. ( 1 2.3.7)
The right-hand side can be evaluated by direct application of the Kalman
fixed point smoothing algorithm (Proposition 1 2.2.4) to the state-space model
( 1 2.3. 1 ) and ( 1 2. 1 .2). For computational purposes the observed values of Y�,
t ¢ { 0, i1, . . . , i,} are quite immaterial. They may, for example, all be set equal
to zero, giving the sequence of observations of Yi defined in ( 1 2.3.4).
In order to evaluate P(Y, I Y0, Y;,, . . . , Y;,), 1 ::s; t ::s; n, we use ( 1 2.3.7) and
the relation,
( 1 2.3.8)
Y, = G,X, + W, .
Under the assumption that
E(V, W ;) = S,
we find from ( 1 2.3.8) that
= 0,
t = 1 , . . . , n,
( 1 2.3.9)
P(Y, I Y0, Y; , , . . . , Y;,) = G, P(X, I Y6, Y f, . . . , Y:). ( 1 2.3. 1 0)
It is essential, in estimating missing observations of Y1 with ( 1 2.3.10), to use
a state-space representation for {Y,} which satisfies ( 12.3.9). The ARMA
state-space representation in Example 1 2. 1 .5 satisfies this condition, but the
one in Example 1 2. 1 .6 does not.
ExAMPLE 1 2.3.3 (An AR(1) Series with One Missing Observation). Consider
the problem of estimating the missing value Y2 in Example 1 2.3. 1 in terms
of Y0 = 1 , Y1, Y3, Y4 and Y5. We start from the state-space model,
X,+ 1 = cf;X1 + Zr + l• 1'; = X" for { J-;}, which satisfies the required condition
(1 2.3.9). The corresponding model for { Yi} is the one used in Example 1 2.3. 1 .
Applying Proposition 1 2.2.4 to the latter model, we find that
Ps X z = P3 X z ,
n2. 3 = cf;a2 '
488
1 2. State-Space Models and the Kalman Recursions
and
n 2 1 2 - 0" 2 ,
2
n2 1 1 - (J ,
(J 2
,
Q2 l r = ( 1
+ c/J 2 )
t z 3,
where P1( · ) here denotes P( · I Y�, . . . , Yt) and nr. n • nr l n are defined
correspondingly. Since the condition ( 1 2.3.9) is satisfied, we deduce from
( 1 2.3. 1 0) that the minimum mean squared error estimator of the missing
value Y2 is
with mean squared error,
ExAMPLE 1 2.3.4 (Estimating Missing Observations of an ARIMA(p, d, q)
Process). Suppose we are given observations Y1 _ d , Y2 _ d , . . . , Y0 , ¥; 1 , , Y;,
( 1 � i 1 < i 2 · · · < ir � n) of an ARIMA(p, d, q) process. We wish to find the
best linear estimates of the missing values Y,, t ¢: { i 1 , . . . , ir} , in terms of Y, ,
t E { i 1 , . . . , ir} and the components of Y 0 := ( Y1 _ d, Y2 - d • . . . , Y0)'. This can be
done exactly as described above provided we start with a state-space
representation of the ARIMA series { Y,} which satisfies ( 12.3.9) and we apply
the Kalman recursions to the state-space model for { Yi } . Although the
representation in Example 1 2. 1.7 does not satisfy ( 12.3.9), it is quite easy to
contruct another which does by starting from the model in Example 1 2. 1 .5
d
for {V Y,} and following the same steps as in Example 1 2. 1 .7. This gives
(Problem 1 2 . 8),
• • •
( 1 2.3. 1 1)
where
[
Xr + I
Yr
OJ[ [ ]
[
J
J
=
F
AG B
Xr
Yr - 1
+
H
0
Zr + l •
( 1 2.3. 1 2)
1, 2, . . . . The matrices G, F and H are the coefficients in ( 1 2. 1 .20) and
( 1 2. 1 .23) and A and B are defined as in Example 1 2. 1 .7. We assume, as in
Example 1 2. 1 .7, that
for t =
t = 0, ± 1 , . . . .
( 1 2.3. 1 3)
This model clearly satisfies ( 12.3.9). Missing observations can therefore be
estimated by introducing the corresponding model for { Yi } and using ( 1 2.3.7)
and ( 1 2.3. 1 0) .
§ 12.4. Controllability and Observability
489
§ 12.4 Controllability and Observability
In this section we introduce the concepts of controllability and observability,
which provide a useful criterion (Proposition 1 2.4.6) for determining whether
or not the state vector in a given state-space model for {Y,} has the smallest
possible dimension. Consider the model (with X, stationary),
= FX1 + vl'
Y, = GX1 + W,,
x, + 1
where
{[�J}
�
( [;, !])
WN o,
t
=
0, ± 1, . . .
'
t = 0, ± 1 , . . . '
( 1 2.4. 1)
and F satisfies the stability condition
det(/ - Fz) #- 0 for i z l ::; 1 .
( 1 2.4.2)
From Section 1 2. 1 , X1 and Y1 have the representations
_ j = 1 Fi - 1 yt -j '
Xt -
00
'\'
i...J
00
Y, = I GFi- 1 yr -j + Wl '
j= 1
( 1 2.4.3)
and (X; , Y;)' is a stationary multivariate time series (by a simple generalization
of Problem 1 1 . 14).
To discuss controllability and observability, we introduce the subclass of
stationary state-space models for {Y1} defined by the equations
X1 + 1
yt
= FX1 + HZn
=
GXt + Zt,
t = 0, ± 1 , . . .
'
t = 0, ± 1 , . . . '
( 12.4.4)
where {Z1} WN(O, t) with dimension v, H is a v x w matrix and F is stable.
If the noise vector, Zl ' is the same as the innovation of Yl' i.e. if
�
( 1 2.4.5)
then we refer to the model ( 1 2.4.4) as an innovations representation. Obviously
if {Y,} has an innovations representation then it has a representation of the
form ( 1 2.4. 1 ) with stable F. The converse of this statement is established below
as Proposition 1 2.4. 1 .
Even with the restnchon that the white noise sequence {Z1}
satisfies ( 12.4.5), the matrices F, G and H in the innovations representation
( 1 2.4.4) are not uniquely determined (see Example 1 2.4.2). However, if t is
non-singular then the sequence of matrices { GFi - 1 H, j = 1, 2, . . . }, is
Remark 1 .
1 2. State-Space Models and the Kalman Recursions
490
necessarily the same for all innovations representation of {Y1} since from
00
L: GFi - 1 HZ1 _ j + zl '
j= 1
i
1
it follows that GF - H = E(Y1Z; _ )t - 1 with Z1 given by ( 12.4.5).
Yl
=
Under assumption ( 12.4.2), the state-space model (12.4. 1 )
has a n innovations representation, i.e. a representation of the form (12.4.4)
with noise vector Z1 defined by ( 12.4.5).
Proposition 1 2.4.1 .
PROOF. For the state vectors defined in ( 12.4. 1 ), set
X(t l s) = P(X1 1 Yi, - oo < j ::;; s).
Then, with Z1 defined by ( 1 2.4.5), Z1 l_ Yi, j < t, so that by Problem 12. 12,
P( · I Yi, j ::;; t) = P( · I Yi, j < t) + P( · I Z1).
Hence by Remark 5 of Section 1 2.2 and the orthogonality ofV1 and Yi,j < t,
where
X(t + l i t) = P(Xt + 1 1 Yj , j < t) + HZI
= P(FXI + Vt i Yj , j < t) + HZI
= FX(t i t - 1 ) + HZn
( 12.4.6)
H = E(X1 + 1 z;) [ E(Z1Z;)] - 1 ,
and [E(Z1 Z;)] - 1 is any generalized inverse of E(Z1Z;). Since (X;, Y;)' is
stationary, H can be chosen to be independent of t. Finally, since W1 l_ Yi,
j < t,
P(Y1 1 Yi,j < t) = P(GX1 + W1 1 Yi,j < t)
= GX(t i t - 1).
Together with ( 1 2.4.6), this gives the innovations representation,
as required.
X(t + l i t) = FX(t i t - 1 ) + HZn
Y1 = GX(t i t - 1 ) + Z1,
( 12.4.7)
D
EXAMPLE 1 2.4. 1 . The canonical representation (12.1 .24), ( 1 2. 1 .25) of the causal
ARMA process { I";} satisfying
t; = <P 1 Yr - 1 + · · · + <Pp Yr - p + zt + e 1 z1 - 1 + · · · + eqzt - q'
{Z1}
� WN(O, a 2 ),
( 12.4.8)
has the form ( 12.4.4). It is also an innovations representation if ( 1 2.4.8) is
invertible (Problem 1 2. 1 9). Assuming that zl E sp{ Y., - 00 < s ::;; t} and
defining
Y(t l s) = P( l-; 1 lj, - oo < j ::;; s),
§ 1 2.4. Controllability and Observability
49 1
we now show how the canonical representation arises naturally if we seek a
model of the form (1 2.4.7) with
X(t [ t - 1) = ( Y( t [ t - 1), . . . , Y(t - 1 + m [ t - 1))'
and m = max(p, q). Since we are also assuming the causality of ( 12.4.8), we
have sp{ }j ,j � s} = sp{ Zj j � s}, so that from (5.5.4)
,
Y(t + j [ t) = L t/Jk Zr + j- k
k =j
w
j = 0, . . . , m = max(p, q). ( 12.4.9)
Replacing t by t + m in (1 2.4.8) and projecting both sides onto sp{ lj, j � t},
we obtain from (1 2.4.9) and the identity, t/Jm = L:k'= 1 ¢k t/Jm -k + em (see (3.3.3)),
the relation
m
Y( t + m [ t) = I cp k Y(t + m - k [ t) + emzl
k=l
m
= kL= l cpk Y(t + m - k [ t - 1) + t/Jm Z1 •
( 1 2.4. 1 0)
Now the state-vector defined by
X( t [ t - 1) = ( Y (t [ t - 1), . . . , Y( t - 1 + m [ t - 1 ))'
satisfies the state equation
X(t + 1 [ t) = FX(t - 1 [ t) + HZ,
where F and H are the matrices in Example 1 2. 1.6 (use ( 12.4.9) for the first
r - 1 rows and ( 12.4. 10) for the last row). Together with the observation
equation,
Y, = Y (t [ t - 1) + Y, - Y(t [ t - 1)
= [1 0 . . . O]X(t [ t - 1) + Z,
this yields the canonical state-space model of Example 12.1.6.
Definition 1 2.4.1 (Controllability). The state-space model (1 2.4.4) is said to
be controllable if for any two vectors xa and xb, there exists an integer k and
noise inputs, Z l , . . . ' zk such that x k = xb when X o = Xa .
I n other words, the state-space model i s controllable, if by judicious choice
of the noise inputs, Z 1 , Z 2 , . . . , the state vector X , can be made to pass from
xa to xb . In such a case, we have
X o = Xa,
X 1 = Fxa + HZ 1 ,
492
1 2. State-Space Models and the Kalman Recursions
and hence
xb - Fkxa
where
=
[H
FH
= ek[z;.
···
···
Fk - l H] [Zi
z;. _ 1
···
Z'1 ] '
Z'1 ] ' ,
( 1 2.4. 1 2)
From these equations, we see that controllability is in fact a property of the
two matrices F and H. We therefore say that the pair (F, H) is controllable
if and only if the model ( 12.4.4) is controllable.
Proposition 1 2.4.2. The state-space model ( 1 2.4.4), or, equivalently, the pair
(F, H) is controllable ifand only ifev has rank v (where v is the dimension of X1).
PROOF. The matrix ev is called the controllability matrix. If ev has rank v
then the state can be made to pass from xa to xb in v time steps by choosing
[ Z�
1
Z'1 ] ' = e�( ev e�) - (xb - F "x.).
···
Recall from Remark 2 of Section 2.5 that ev e� is non-singular if ev has full
rank.
To establish the converse, suppose that (F, H) is controllable. If
A.(z) = det(F - zl) is the characteristic polynomial of F, then by the
Cayley-Hamilton Theorem, A.(F) = 0, so that there exist constants {3 1 , • . . , f3v
such that
( 1 2.4. 1 3)
More generally, for any k
on k) such that
�
1 , there exist constants {3 1 ,
. • •
, f3v (which depend
( 12.4. 1 4)
This is immediate for k :$; v. For k > v it follows from ( 1 2.4. 1 3) by induction.
Now if ev does not have full rank, there exists a non-zero v-vector y such that
y'ev
=
y'[H
F H · · · F" - 1 H]
=
0',
which, in conjunction with ( 1 2.4. 14), implies that
y' Fj H = O', for j = 0, 1 , . . . ,
0
Choosing Xa = and xb = y, we have from ( 1 2.4. 1 1) and the preceding
equation that
1
y'y = y'(Fkxa + HZk + FHZk - l + · · · + Fk - H Z 1) = 0,
which contradicts the fact that y # 0. Thus ev must have full rank.
Remark 2.
From the proof of this proposition, we also see that
rank( ek) :$; rank( ev) for k
rank(ek) = rank(ev) for k
:$;
>
v,
v.
D
493
§ 1 2.4. Controllability and Observability
For k ::;; v this is obvious and, for k > v, it follows from (12.4. 14) since the
columns of pv + iH, j ;;:::: 0, are in the linear span of the columns of C" .
ExAMPLE
1 2.4.2. Suppose v =
2
and w = 1 with
[ � .�]
and H =
F= "
Then
[�J.
has rank one so that (F, H) is not controllable. In this example,
. 5j - 2
for j ;;:::: 1 ,
Fi - 1 H =
0
[ J
so that by replacing V1 and Wr i n ( 12.4.3) by HZ1 and Z0 respectively, we have
f: G
[0]
.sj - z
zr -j + zt .
j= 1
Since the second component in X1 plays no role in these equations, we
can eliminate it from the state-vector through the transformation
X 1 = [ 1 OJXr = Xt t . Using these new state variables, the state-space system
is now controllable with state-space equations given by
Xr + 1 = .5 X r + 2Z0
Y, =
Y, = G
[�J
xr + zr .
This example is a special case of a more general result, Proposition 1 2.4.3,
which says that any non-controllable state-space model may be transformed
into a controllable system whose state-vector has dimension equal to
rank(CJ
ExAMPLE 1 2.4.3. Let F and H denote the coefficient matrices in the state
equation (12. 1 .25) of the canonical observable representation of an
ARMA(p, q) process. Here v = m = max(p, q) and since
j = 0, 1, . . . '
we have
t/Jv ]
t/J v + I
�
t/Jz - 1
'
494
1 2. State-Space Models and the Kalman Recursions
where tj;j are the coefficients in the power series
( 1 2.4. 1 5)
If Cv is singular, then there exists a non-zero vector, a = (av 1 >
that
_
. . •
, a0)' such
( 1 2.4. 1 6)
and hence
k = v, v + 1 , . . . , 2v - 1 .
( 1 2.4. 1 7)
Multiplying the left side of ( 1 2.4. 1 6) by the vector (¢v, . . . , ¢ 1 ) and using
(3.3.4) with j > v, we find that ( 1 2.4. 1 7) also holds with k = 2v. Repeating
this same argument with Cv replaced by the matrix [tj; i + j] i. j = 1 (which satisfies
equation ( 1 2.4. 1 6) by what we have just shown), we see that ( 1 2.4. 1 7) holds
with k = 2v + 1 . Continuing in this fashion, we conclude that ( 1 2.4. 1 7) is
valid for all k ;;::: v which implies that a(z)tf;(z) is a polynomial of degree at
most v - 1 , viz.
8(z)
v
a(z)l/l(z) = a(z) - = b0 + b 1 z + · · · + bv _ 1 z - t = b(z),
¢(z)
v
where a(z) = a0 + a 1 z + · · · + av _ 1 z - t . In particular, ¢(z) must divide a(z).
This implies that p ::; v - 1 and, since v max(p, q), that v = q > p. But since
¢(z) divides a(z), a(z)I/J(z) = b(z) is a polynomial of degree at least
q > v - 1 ;;::: deg(b(z)), a contradiction. Therefore Cv must have full rank.
=
If the state-space model ( 12.4.4) is not controllable and
k = rank( Cv), then there exists a stationary sequence of k-dimensional
state-vectors {X1} and matrices F, fi, and G such that F is stable, (F, fi) is
controllable and
Proposition 1 2.4.3.
xt + 1 = txt + Hz('
Yt = GX1 + Z1
•
( 1 2.4. 1 8)
PROOF. For any matrix M, let Yf(M) denote the range or column space of
M. By assumption rank(Cv) = k < v so that there exist v linearly independent
vectors, v 1 , . . . , vv, which can be indexed so that �(Cv) = sp{v 1 , . . . , vk }. Let
T denote the non-singular matrix
T = [V 1 V 2 .
V vJ .
• •
Observe that
§ 12.4. Controllability and Observability
495
where the second equality follows from Remark 2. Now set
F = T � 1 FT and H = T � 1 H,
so that
[
( 1 2.4. 1 9)
TF = FT and TH = H.
]
.
· · · F� as F� 1 1 F� 1 2 and cons1. d enng on y t he fi rst k co umns
By partltlomng
F2 1 F 22
of the equation in ( 1 2.4. 1 9), we obtain
[PI I ]
I
I
= F[v 1 · · · vk].
[v 1 · · · v vJ F�
z1
Since the columns of the product o n the right belong to sp{v 1 , , vd and
since v 1 , . . . , v" are linearly independent, it follows that F 2 1 = 0. Similarly,
by writing
R
=
[:J
• . •
with H 1 a k x w matrix and noting that []l(H) £:
sp{v 1 ,
vk}, we conclude that H 2 = 0.
The matrices appearing in the statement of the proposition are now
defined to be
- �
k k
,
F = F 1 1 , H = H 1 , and G = G T
O
• . . ,
[J ]
�
X
where h k is the k-dimensional identity matrix. To verify that F, G and
have the required properties, observe that
p· � 1
pj � l n = - H
0
x
[ -]
H
and
rank[R PH · · · pv � 1 R]
= rank[H fR · · · pv � 1 H]
= rank[T� 1 H (T� 1 FT)(T � 1 H) · · · (T � 1 F" � 1 T)(T � 1 H)]
= rank[H FH · · · F" � 1 H]
= rank(C v) = k.
By Remark 2 the pair (F, H) is therefore controllable. In addition, F satisfies
the stability condition ( 12.4.2) since its eigenvalues form a subset of the
eigenvalues of F which in turn are equal to the eigenvalues of F. Now let
X1 be the unique stationary solution of the state equation
x t + I = Px t + ilz1 •
Then Y1 satisfies the observation equation
Yt
=
GX1 +
zl '
1 2. State-Space Models and the Kalman Recursions
496
since we know from ( 1 2.4.4) that yt = zt + If= t GFj - t H zt j and since
1
k
t
j
j
G F - fi = G T � p t 1f
[ ]
'
-
= G(TFj - t T - 1 )(TB)
= GFj - 1 H,
j = 1 , 2, . . . .
D
Definition 1 2.4.2 (Observability). The state-space system ( 1 2.4.4) is observable
if the state X0 can be completely determined from the observations Y0,
Y 1 , when Z0 = Z 1 = · · · = 0.
For a system to be observable, X0 must be uniquely determined by the
sequence of values
GX0 , GFX0, GF 2 X0, . . . .
• . .
Thus observability is a property of the two matrices F and G and we shall
say that the pair (F, G) is observable if and only if the system ( 1 2.4.4) is
observable. If the v x kw matrix
O� := [G' F'G' · · · F'k - 1 G']
[ ]
has rank v for some k, then we can express X0 as
GX0
G X
X0 = (O� Ok) - 1 0�
o
GFk - 1X0
�
= (O�Ok) - 1 0�(0k X0),
showing that (F, G) is observable in this case.
Proposition 1 2.4.4. The pair of matrices (F, G) is observable if and only if Ov
has rank v. In particular, (F, G) is observable ifand only if(F', G') is controllable.
The matrix Ov is referred to as the observability matrix.
PROOF. The discussion leading up to the statement of the proposition shows
that the condition rank(Ov) = v is sufficient for observability. To establish
the necessity suppose that (F, G) is observable and Ov is not of full rank.
Then there exists a non-zero vector y such that Ovy = 0. This implies that
GFj - t y = 0
for j = 1 , . . . , v, and hence for all j ;::::: 1 (by ( 1 2.4. 1 4)). It is also true that
GFj - t O = 0
showing that the sequence GFj - 1 X0, j = 1 , 2, . . . , is the same for X0 = y as
for X0 = 0. This contradicts the assumed observability of (F, G), and hence
497
§1 2.4. Controllability and Observability
rank(O") must be v. The last statement of the proposition is an immediate
consequence of Proposition 1 2.4.2 and the observation that 0� = Cv where
Cv is the controllability matrix corresponding to (F', G').
0
ExAMPLE 1 2.4.3 (cont.). The canonical observable state-space model for an
ARMA process given in Example 1 2.4.6 is observable. In this case
v = m = max(p, q) and GFj - 1 is the row-vector,
j = 1 , . . . ' v.
from which it follows that the observability matrix Ov is the v-dimensional
identity matrix.
If (F, G) is not observable, then we can proceed as in Proposition 1 2.4.3
to construct two matrices F and G such that F has dimension k = rank(Ov)
and (F, G) is observable. We state this result without proof in the following
proposition.
Proposition 1 2.4.5. If the state-space model ( 12.4.4) is not observable and
k = rank(O"), then there exists a stationary sequence of k-dimensional state
vectors {X,} and matrices F, fl and G such that F is stable, (F, G) is observable
and
x., + !
= tx, + flz,,
Y, = GX, + z, .
( 1 2.4.20)
The state-space model defined by (12.4.4) and ( 12.4.5) is said to be minimal
or of minimum dimension if the coefficient matrix F has dimension less than
or equal to that of the corresponding matrix in any other state-space model
for {Y,}. A minimal state-space model is necessarily controllable and
observable; otherwise, by Propositions 1 2.4.3 and 1 2.4.4, the state equation
can be reduced in dimension. Conversely, controllable and observable
innovations models with non-singular innovations covariance are minimal,
as shown below in Proposition 1 2.4.6. This result provides a useful means
of checking for minimality, and a simple procedure (successive application
of Propositions 1 2.4.3 and 1 2.4.5) for constructing minimal state-space
models. It implies in particular that the canonical observable model for a
causal invertible ARMA process given in Example 1 2.4. 1 is minimal.
Proposition 1 2.4.6. The innovations model defined by equations ( 1 2.4.4) and
( 1 2.4.5), with t non-singular, is minimal if and only if it is controllable and
observable.
PROOF. The necessity of the conditions has already been established. To
show sufficiency, consider two controllable and observable state-space
models satisfying (1 2.4.4) and ( 1 2.4.5), with coefficient matrices (F, G, H) and
498
1 2. State-Space Models and the Kalman Recursions
(F, G, ii) and with state dimensions v and i5 respectively. It suffices to show
that v = i5. Suppose that i5 < v. From Remark 1 it follows that
j = 1, 2, . . . ,
GFj - 1 H = GFj - 1 fi,
[
]
and hence, multiplying the observability and controllability matrices for each
model, we obtain
o v cv =
GFH
GF2H
GF" - 1 H
GF"H
GF"._ 1 H GF"H
GF2� - 1 H
GH
GFH
.
.
=
o-v cv
-
( 1 2.4.21 )
Since 0" and Cv have rank v, �(Cv) = IR", �(O" C") = �(0) and hence
rank(O"C") = v. On the other hand by ( 12.4.21 ), �(O"C") � �(O v), and since
rank( Ov) = i5 (Remark 2), we obtain the contradiction i5 ;;::: rank(Ov Cv) = v.
Thus i5 = v as was to be shown.
0
§ 12.5 Recursive Bayesian State Estimation
As in Section 1 2. 1 , we consider a sequence of v-dimensional state-vectors
{ XP t ;;::: 1 } and a sequence of w-dimensional observation vectors { Y0 t ;;::: 1 } .
r
I t will be convenient t o write y< J for the wt-dimensional column vector
r
y < J = (Y'I , Y � , . . . , Y;)'.
In place of the observation and state equations ( 1 2. 1 . 1 ) and ( 12.1 .2), we
now assume the existence, for each t, of specified conditional probability
r
r
densities of Yr given (X 0 y< - I J) and of Xr + 1 given (X 0 y< l). We also assume
that these densities are independent of y<t - I J and y<tJ, respectively. Thus the
observation and state equations are replaced by the collection of conditional
densities,
t = 1 , 2, . . . ,
( 1 2.5. 1 )
and
t = 1 , 2, . . . ,
( 1 2.5.2)
with respect to measures v and Jl, respectively. We shall also assume that
the initial state X 1 has a probability density p 1 with respect to f.l. If all the
conditions of this paragraph are satisfied, we shall say that the densities
{p 1 , p�ol, p�5l, t = 1 , 2 , . . . } define a Bayesian state-space model for {Yr}.
In order to solve the filtering and prediction problems in this setting, we
shall determine the conditional densities p�fl(x r 1 y<r )) of X r given y<tJ , and
p�Pl(xr i Y<t - l l) of Xr given y<t - 1 ), respectively. The minimum mean squared
r
error estimates of Xr based on y< J and y<t - I J can then be computed as the
r
conditional expectations, E(Xr l y< l ) and E(Xr l yU - I l).
§ 1 2.5. Recursive Bayesian State Estimation
499
The required conditional densities p�fl and p�Pl can be determined from
t = 1, 2, . . . } and the following recursions, the first of which is
obtained by a direct application of Bayes' s Theorem using the assumption
that the distribution of Y, given ( X , , yu - J l ) does not depend on yu - J l :
{p 1 , p�0l, p�sl,
= p�ol(y, I x,)p�Pl(x, I y<' - 1 l)/c,(y<tl),
P� 1 1 (x, + J l y<'l) = p�fl(x, I yUl)p�sl(x, + 1 1 x,) d11(x,),
PVl(x, I yUl)
where
f
( 1 2.5.3)
(1 2.5.4)
cr(y (tl ) = Pv, rvu - l )(y, l y<r - l l).
The initial condition needed to solve these recursions is
(1 2.5.5)
The constant cr(y<' l) appearing in (12.5.3) is just a scale factor, determined by
the condition J p�fl(x, 1 y<tl) d11(x,) = 1 .
ExAMPLES 1 2.5. 1 . Consider the linear state-space model defined b y (12. 1 . 1 )
and (12.1 .2) and suppose, i n addition t o the assumptions made earlier, that
{ X 1 , W 1 , V 1 , W2 , V2 , . . } is a sequence of independent Gaussian random
vectors, each having a non-degenerate multivariate normal density.
Using the same notation as in Section 1 2. 1 , we can reformulate this system
as a Bayesian state-space model, characterized by the Gaussian probability
densities,
.
= n(x 1 ; EX � > f.! 1 ),
p�ol(y, / X, ) = n(y, ; G, X, , R,),
p�s l(x, + 1 1 X ,) = n(x, + 1 ; F, X" Q,),
p 1 (x 1 )
( 12.5.6)
( 1 2.5. 7)
( 1 2.5.8)
where n(x ; �' I:) denotes the multivariate normal density with mean � and
covariance matrix I:. Note that in this formulation, we assume that
S, E( V, W;) = 0 in order to satisfy the requirement that the distribution of
X, + 1 given (X" yul) does not depend on Y ( tl .
To solve the filtering and prediction problems in the Bayesian framework,
we first observe that the conditional densities, p�fl and p�Pl are both normal.
We shall write them (using notation analogous to that of Section 1 2.2) as
=
( 1 2.5.9)
and
( 1 2.5. 1 0)
From (1 2.5.4), ( 1 2.5.8), (1 2.5.9) and (12.5. 1 0), we find that
1 2. State-Space Models and the Kalman Recursions
500
and
0, + 1 = F,n, 1 , F; +
Q, .
Substituting the corresponding density n(x, ; X, , n,) for
( 1 2.5.3) and ( 1 2.5.7), we find that
rlii/
and
-1
G; R,- 1 G, + n,
1
= G;R,- G, + (F, _ 1 Qr - 1 l r - 1 F; _ 1 +
=
m
Q, _ d - 1
1
X,1, = X , + n, 1, G; R,- (Y, - G, X,).
�
�
From (1 2.5.3) with
conditions,
and
p�Pl (x, l Y ( ' - 1 l)
pi_Pl(x 1 l y( 0 l) = n( x 1 ; EX 1 , Q 1 )
we obtain the initial
n.- 1
1
n. - 1
��1 1 1 = G '1 R 1- G 1 + ·�1 .
Remark 1 . Under the assumptions made in Example 1 2.5. 1 , the recursions
of Propositions 1 2.2.2 and 1 2.2.3 give the same results for X, and X, 1 " since
for Gaussian systems best linear mean-square estimation is equivalent to
best mean-square estimation. Note that the recursions of Example 1 2.5. 1
require stronger assumptions on the covariance matrices (including existence
of R,- 1 ) than the recursions of Section 1 2.2, which require only the
assumptions (a)-(e) of Section 1 2. 1 .
12.5.2. Application of the results of Example 1 2.5. 1 to Example
1 2.2. 1 (with X 1 = 1 and the additional assumption that the sequence
{ W1 , V1 , W2 , V2 , . . . } is Gaussian) immediately furnishes the recursions,
1
n,l/ = 1 + (4n, - 1 1 ' - 1 + 1 ) - ,
ExAMPLE
X, I, = 2 xr - 1 1r - 1
+ n, l l ¥;
-
2X r - 1 l r - d ,
X , + 1 = 2Xt l t >
n, + 1 = 4n, l , +
1,
with initial conditions X 1 1 1 = 1 and Q 1 1 1 = 0. It is easy to check that these
recursions are equivalent to those derived earlier in Example 1 2.2. 1 .
ExAMPLE 1 2.5.3 (A Non-Gaussian Example). I n general the solution of the
recursions ( 1 2.5.3) and ( 12.5.4) presents substantial computational problems.
Numerical methods for dealing with non-Gaussian models are discussed by
Sorenson and Alspach ( 1 971) and Kitagawa ( 1987). Here we shall illustrate
the recursions (1 2.5.3) and ( 12.5.4) in a very simple special case. Consider the
state equation,
( 1 2.5. 1 1)
X, = aX, _ 1 ,
Problems
501
with observation density (relative to counting measure on the non-negative
integers),
p,(o) (y, l x,) =
(nx,)Y e - nx
'
1
y, .
,
,
y,
= 0,
1, . . .
( 1 2.5. 1 2)
'
and initial density (with respect to Lebesgue measure),
( 1 2 . 5 . 1 3)
X ;:::: 0.
(This is a simplified model for the evolution of the number X, of individuals
at time t infected with a rare disease in which X, is treated as a continuous
rather than an integer-valued random variable. The observation y; represents
the number of infected individuals observed in a random sample consisting
of a small fraction n of the population at time t.) Although there is no density
p�sl(x, l x, _ 1 ) with respect to Lebesgue measure corresponding to the state
equation ( 1 2.5 . 1 1 ), it is clear that the recursion ( 1 2.5.4) is replaced in, this
case by the relation,
( 1 2.5. 1 4)
while the recursion ( 1 2.5 . 3) is exactly as before. The filtering and prediction
densities p�JJ and p�Pl are both with respect to Lebesgue measure. Solving for
pf(l from ( 1 2.5. 1 3) and the initial condition ( 1 2.5.5) and then successively
substituting in the recursions ( 1 2.5. 14) and ( 1 2.5.3), we easily find that
t :::::
( 1 2.5. 1 5)
0,
and
t :::::
0,
where a, = a + y 1 + . · + y, and A, = Aa 1 - ' + n:(l a - ')/( 1 - a - 1 ). In
particular the minimum mean squared error estimate of x, based on y<n is
the conditional expectation a,/A, The minimum mean squared error is the
variance of the distribution defined by ( 1 2.5. 1 5), i.e. a,/A� .
.
-
.
Problems
1 2. 1 . For the state-space model ( 1 2. 1 . 1 ) and (12.1 .2), show that
F, F, _ 1 · · · F 1 X 1 + v, + F, V, _ 1 + · · · + F, F, _ 1 · · · F2 V 1
x, + 1
=
Y, =
G,(F, _ 1 F, _ 2 · · · F 1 X 1 ) + G, V, - 1
+ G, F, _ 1 F, _ 2 · · · F2 V 1 + W, .
and
+
G,F, _ Iv,_ 2 + · · ·
1 2. State-Space Models and the Kalman Recursions
502
These expressions define j, and g, in ( 1 2. 1 .3) and ( 1 2. 1 .4). Specialize to the case
when F, = F and G, = G for all t.
1 2.2. Consider the two state-space models
and
{
Xr + l . l
: F 1X r 1 + Vt l ,
{
Xt + l . z
: Fz X,z + V,z ,
Y,. 1 - G 1 X, 1 + Wn ,
Y, 2 - G 2 X, 2 + W, 2 ,
where {(V ; 1 , W; 1 , V; 2 , W; 2)'} i s white noise. Derive a state-space representation
for { (Y; 1 , Y ; 2)'}.
1 2.3. Show that the unique stationary solutions to equations ( 1 2. 1 . 1 3) and ( 1 2. 1 . 1 2)
are given by the infinite series
00
X, =
L
j= 0
piy r - j - 1
and
Y,
=
W,
00
+ L
j=O
GFiV,_ j - 1 ,
which converge i n mean square provided det(J - Fz) o1 0 for I z I ::::; 1 . Conclude
that {(X ; , v;)'} is a multivariate stationary process. (Hint : Show that there
exists an 8 > 0 such that (I - Fz) - 1 has the power series representation,
L:J= 0 Fizi, in the region l z l < 1 + 8.)
1 2.4. Let F be the coefficient of X, in the state equation ( 1 2. 1 . 1 8) for the causal
AR(p) process
Establish the stability of ( 1 2. 1 . 1 8) by showing that the eigenvalues of F are
equal to the reciprocals of the zeros of the autoregressive polynomial c/J(z). In
particular, show that
det(z/ - F) = zPcjJ(z - 1 ).
1 2.5. Let { X, } be the unique stationary solution of the state equation ( 1 2 . 1 .23) and
suppose that { Y,} is defined by ( 1 2 . 1 .20). Show that { Y,} must be the unique
stationary solution of the ARMA equations ( 1 2. 1 . 1 9).
1 2.6. Let { Y,} be the MA( l ) process
Y, =
z, + ez
,_ 1,
(a) Show that { Y,} has the state-space representation
Y, =
[ 1 O] X,
where { X, } is the unique stationary solution of
X, + 1 =
[� �]x, [�jz,n
+
503
Problems
In particular, show that the state-vector X, may be written as
(b) Display the state-space model for { Y;} obtained from Example 1 2 . 1 .6.
1 2.7. Verify equations ( l 2. 1 .34H 12 . 1 .36) for an ARIMA( 1 , 1 , 1 ) process.
1 2.8. Let { Y;} be an ARIMA(p, d, q) process. By using the state-space model
Example 1 2. 1 .5 show that { Y;} has the representation
m
Y; = GX,
with
X, + 1
=
FX, + HZ, + 1
for t = 1 , 2, . . . and suitably chosen matrices F, G and H. Write down the
explicit form of the observation and state equations for an ARIMA( 1 , 1 , 1 )
process and compare with equations ( 1 2 . 1 .34H 1 2 . 1 .36).
1 2.9. Following the technique of Example 1 2. 1 .7, write down a state-space model
for { Y;} where {V'V' 1 2 Y;} is an ARMA(p, q) process.
2
1 2 . 1 0. Show that the set L� of random v-vectors with all components in L (Q, ff, P)
is a Hilbert space if we define the inner product to be < X, Y) = L:r� 1 E(Xi Y;)
for all X, Y E L� . If X, Y0 , . . . , Y, E L� show that P(X I Y0, . . . , Y,) as in
Definition 1 2.2.2 is the projection of X (in this Hilbert space) onto S", the
closed linear subspace of L� consisting of all vectors of the form
C0 Y0 + · · · + C, Y" where C0 , . . . , C, are constant matrices.
1 2. 1 1 . Prove Remark 5 of Section 1 2.2. Note also that if the linear equation,
Sx
=
b,
has a solution, then x = s - 1 b is a solution for any generalized inverse
1
1
s- I of s. (If Sy = b for some vector y then S(S - b) = ss - Sy Sy = b.)
=
1 2. 1 2. Let A1 and A2 be two closed subspaces of a Hilbert space £' and suppose
that A1 .l A2 (i.e. x .l y for all x E A1 and y E A2). Show that
where A1 EB A2 is the closed subspace {x + y: x E A1 ,
( 1 2.2.8) follows immediately from this identity.
y
E A2}. Note that
1 2. 1 3. The mass of a body grows according to the rule
X, + 1 = aX, + V.,
where X 1 i s known to be 1 0 exactly and { V. }
a > 1,
�
Y; = X, + W, ,
WN(O, 1 ). At time t we observe
504
1 2. State-Space Models and the Kalman Recursions
where { W,} - WN(O, I ) and { W,} is uncorrelated with { v;}. If P, denotes
2
projection (in L (Q, :1'; P)) onto sp { l , Y1, . . . , Y,}, t ?. 1 , and P0 denotes
projection onto sp{ 1 } ,
(a) express a;+ 1 i n terms o f a; , where
t = 1 , 2, . . . '
(b) express P,X, + 1 in terms of a;, Y, and P, _ 1 X,,
(c) evaluate P2 X 3 and its mean squared error if Y2 = 1 2, and
(d) assuming that Iim,_ ro a; exists, determine its value.
a =
1 .5,
1 2. 1 4. Use the representation found in Problem 1 2.6(a) to derive a recursive scheme
for computing the best linear one-step predictors Y, based on Y1, , Y, _ 1 and
their mean squared errors.
• • •
1 2. 1 5. Consider the state-space model defined by ( 12.2.4) and ( 12.2.5) with F,
and G, G for all t and let k > h ?. I . Show that
=
=
F
and
E(Y, + k - PY, + k)(Yc + h - P,Y, + h)' = GFk - hQ:h)G' + GFk - hSc + h ·
-1
1 2. 1 6. Verify the calculation of 0 , �, and Q, in Example 1 2.3. 1 .
1 2. 1 7. Verify the calculation of P 5 X 2 and its mean squared error in Example 1 2.3.3.
1 2. 1 8. Let y1 = - .2 1 0, y2
the MA( l ) process
=
.968, y4 = .618 and y5
Y, = Z, + .5Z, _ �>
=
- .880 be observed values of
{Z,} - WN(O, 1 ).
Compute P( Y6 1 Y1, Y2 , Y4, Y5) and its mean squared error.
Compute P(Y7 1 Y1, Y2 , Y4, Y5) and its mean squared error.
Compute P( Y3 1 Y1, Y2 , Y4, Y5) and its mean squared error.
Substitute the value found in (c) for the missing observation y 3 and evaluate
P( Y6 1 Y1, Y2 , Y3 , Y4, Y5) using the enlarged data set.
(e) Explain in terms of projection operators why the results of (a) and (d) are
the same.
(a)
(b)
(c)
(d)
12.19. Show that the state-space representation ( 1 2. 1 .24), ( 1 2. 1 .25) of a causal
invertible ARMA(p, q) process is also an innovations representation.
1 2.20. Consider the non-invertible MA( l ) process,
Y, = Z, + 2Z, _ 1,
{Z,} - WN(O, 1).
Find an innovations representation of { Y,} (i.e. a state-space model of the form
( 1 2.4.4) which satisfies (1 2.4.5)).
1 2.2 1 . Let { v;} be a sequence of independent exponential random variables with
1
E v; = t - and suppose that {X, , t ?. 1 } and { Y,, t ?. 1 } are the state and
observation random variables, respectively, of the state-space system,
x i = vi,
x, = x, _ 1 + v;,
t
=
2, 3, . . . '
505
Problems
where the distribution of the observation 1;, conditional on the random
variables X 1, Y2 , I :s; s < t, is Poisson with mean X, .
(a) Determine the densities { p 1 , p�ol, p�sl, t 2 I }, in the Bayesian state-space
model for { 1;}.
(b) Show, using ( 1 2.5.3HI 2.5.5), that
and
!
P 2Pl ( x 2 I Y I )
(c) Show that
=
2 2 + y,
x l + y, e - zx, ,
r(2 + Y d 2
x2
> 0.
x,
> 0,
and
Xr + l
> 0,
where Ci1 = y1 + · · · + y, .
(d) Conclude from (c) that the minimum mean squared error estimates of X,
and X, + 1 based on Y1, . . . , 1;, are
X, l , =
t + Y1 + · · · + l';
t+ I
-----­
and
X, + t
�
respectively.
t + l + Y1 + · · · + 1;
t+I
= ------
CHAPTER 1 3
Further Topics
In this final chapter we touch on a variety of topics of special interest. In
Section 1 3. 1 we consider transfer function models, designed to exploit, for
predictive purposes, the relationship between two time series when one leads
the other. Section 1 3.2 deals with long-memory models, characterized by
very slow convergence to zero of the autocorrelations p(h) as h -+ oo. Such
models are suggested by numerous observed series in hydrology and
economics. In Section 1 3.3 we examine linear time-series models with infinite
variance and in Section 1 3.4 we briefly consider non-linear models and their
applications.
§ 1 3. 1 Transfer Function Modelling
In this section we consider the problem of estimating the transfer function
of a linear filter when the output includes added uncorrelated noise. Suppose
that {X, t } and {X, 2 } are, respectively, the input and output of the transfer
function model
x, 2
=
co
L rj x r -j. l + N, ,
j=O
( 1 3. 1 . 1 )
where T = { ti , j = 0, 1 , . . . } i s a causal time-invariant linear filter and { N,} is a
zero-mean stationary process, uncorrelated with the input process {X1 1 }.
Suppose also that { X, J } is a zero-mean stationary time series. Then the
bivariate process {(Xt l , X,2)'} is also stationary. From the analysis of
Example 1 1 .6.4, the transfer function T(e - i '-) = L� 0 ti e- ii '-, - n < A. :::;; n,
507
§ 1 3. 1 . Transfer Function Modelling
can be expressed in terms of the spectrum of
{(X,�> Xd'} (see 1 1 .6. 1 7)) as
( 1 3 . 1 .2)
{
The analogous time-domain equation which relates the weights tj } to the
cross covariances is
00
f2 1 (k) = jL:= O tjy l l (k - j).
( 1 3 . 1 .3)
X,
This equation is obtained by multiplying each side of ( 1 3. 1 . 1) by - k , l and
then taking expectations.
The equations ( 1 3. 1 .2) and ( 1 3 . 1 .3) simplify a great deal if the input process
� WN(O,
then we can
happens to be white noise. For example, if
immediately identify tk from ( 1 3. 1 .3) as
{X,J}
ai),
( 1 3 . 1 .4)
This observation suggests that "pre-whitening" of the input process might
simplify the identification of an appropriate transfer-function model and at
the same time provide simple preliminary estimates of the coefficients tk .
If
can be represented as an invertible ARMA(p, q) process,
{ X, J}
( 1 3. 1 .5)
c/J(B)Xt ! = 8(B)Z,, { Z,} � WN(O, a;J,
then application of the filter n(B) = ¢(B)8- 1 (B) to {X, t } will produce the
whitened series { Z,}. Now applying the operator n(B) to each side of ( 1 3. 1 . 1 )
and letting Y, = n(B)X, 2 , we obtain the relation,
Y, L tj Z, _ j + N;,
j= O
=
where
00
N; = n(B)N,
{ N;}
{
and
is a zero-mean stationary process, uncorrelated with Z, } . The same
arguments which gave ( 1 3 . 1 .2) and ( 1 3 . 1 .4) therefore yield, when applied to
(Z, , Y,)',
00
L
h=
T(e - ; .) = 2na; 2jyz(Jc) = ai, 2 Yrz(h)e - ih).
- w
and
= PrzU)ariaz ,
where Prz( is the cross-correlation function of { Y, } and { Z, } ,fyz( is the
cross spectrum, ai = Var(Z,) and a� = Var( Y,).
Given the observations {(X tl , X, 2)', t = 1 , . . . , n}, the results of the
previous paragraph suggest the following procedure for estimating { tJ and
·)
tj
·
)
1 3. Further
508
Topics
analyzing the noise { N1} in the model ( 1 3 . 1 . 1 ) :
<f. and 9 denote the maximum likelihood estimates of the autoregressive and
( 1) Fit an ARMA model to {X1 1 } and file the residuals {Z 1 ,
. • •
, 2"} . Let
moving average parameters and let «J� be the maximum likelihood estimate
of the variance of { Zr }.
(2) Apply the operator fi(B) = ify(B)fr 1 (B) to { X12}, giving the series
{ i\ , . . . , f.} . The values Y1 can be computed as the residuals obtained by
running the computer program PEST with initial coefficients <f., 9 and using
Option 8 with 0 iterations. Let 8� denote the sample variance of Y1 •
(3) Compute the sample cross-correlation function Prz(h) between { � }
and { .ZJ Comparison of pyz(h) with the bounds ± 1 .96rt - 1 1 2 gives a
preliminary indication of the lags h at which Prz(h) is significantly different
from zero. A more refined check can be carried out by using Bartlett 's formula
(Theorem 1 1 .2.3) for the asymptotic variance of Prz(h). Under the assumptions
that {Z1} � WN(O, 8�) and { (Y� ' Z1Y} is a stationary Gaussian process,
{
n Va r(p rz(h)) � 1 - p �z(h 1 .5
<Xl
-
k
��
00
(p �z(k) + p � y(k)/2)
J
+ L [pyz(h + k)p rz(h - k) - 2pyz(h)p rz( - k)p h(h + k)].
k� -
<Xl
In order to check the hypothesis H 0 that Prz(h) = 0, h ¢ [a, b], where a and
b are integers, we note from Corollary 1 1 .2. 1 that under H 0 ,
Var(p yz(h)) � n - 1 for h ¢ [a, b].
We can therefore check the hypothesis H 0 by comparing P rz , h ¢ [a, b] with
the bounds ± l .96n - 1 1 2 • Observe that Prz(h) should be zero for h < 0 if the
model ( 1 3 . 1 . 1) is valid.
(4) Preliminary estimates of th for lags h at which Prz( h) is found to be
significantly different from zero are
For other values of h the preliminary estimates are th = 0. Let m ?: 0 be the
largest value of j such that ti is non-zero and let b ?: 0 be the smallest such
value. Then b is known as the delay parameter of the filter { ij}. If m is very
large and if the coefficients {tJ are approximately related by difference
equations of the form
j ?: b + p,
then T(B) = L'J'� b tiB can be represented approximately, using fewer
parameters, as
i
509
§ 1 3 . 1 . Transfer Function Modelling
In particular, if ij = 0, j < b, and
0 = w0 v( b' j ;;::: b, then
( 1 3 . 1 .6)
T(B) = w0(1 - v 1 B) - 1 Bb .
Box and Jenkins (1 970) recommend choosing T(B) to be a ratio of two
polynomials, however the degrees of the polynomials are often difficult
to estimate from {ti} . The primary objective at this stage is to find a para­
metric function which provides an adequate approximation to T(B) with­
out introducing too large a number of parameters. If T(B) is represented as
T(B) = Bbw(B)v - 1 (B) = Bb(w0 + w 1 B + · · · + wq Bq)(1 - v 1 B - · · · - vP BP) - 1
with v(z) # 0 for l z l :::;; 1 , then we define m = max(q + b, p).
(5) The noise sequence {N�' t = m + 1 , . . . , n} is estimated by
iV,
= x, 2 - f(B)Xtl.
(We set N, = 0, t :::;; m , in order to compute N, , t > m = max(b + q, p).)
(6) Preliminary identification of a suitable model for the noise sequence
is carried out by fitting a causal invertible ARMA model
¢ < Nl(B)N, = e< Nl(B) W,,
{ W,} � WN(O, O"�),
to the estimated noise N m + I ' . . . , Nn .
(7) Selection o f the parameters b, p and q and the orders p 2 and q 2 of
cP(N)( · ) and e< Nl( · ) gives the preliminary model,
¢< Nl(B)v(B)X, 2 Bb ¢< Nl(B)w(B)Xt 1 + e< Nl(B)v(B) W,,
where T(B) Bb w(B)v - 1 (B) as in step (4). For this model we can compute
W;(w, v, <P<Nl, o<Nl), t > m * = max(p 2 + p, b + p 2 + q), by setting W, = 0 for
t :::;; m * . The parameters w, v, <P< NJ and o< Nl can then be estimated by
mlfllm!Zlflg
=
=
n
( 1 3 . 1 . 7)
L w;(w, v, <P(N), o< Nl)
r ::::: m* + 1
subject to the constraints that ¢ < Nl(z), e< Nl(z) and v(z) are all non-zero for
l z l :::;; 1 . The preliminary estimates from steps (4) and (6) can be used as initial
values in the minimization and the minimization may be carried out using
the program TRANS. Alternatively, the parameters can be estimated by
maximum likelihood, as discussed in Section 1 2.3, using a state-space
representation for the transfer function model (see ( 1 3. 1 . 19) and ( 1 3. 1 .20)).
(8) From the least squares estimators of the parameters of T(B), a new
estimated noise sequence can be computed as in step (5) and checked for
compatibility with the ARMA model for { N,} fitted by the least squares
procedure. If the new estimated noise sequence suggests different orders for
cP(N)( . ) and e< Nl( " ), the least squares prOCedure in Step (7) can be repeated
using the new orders.
(9) To test for goodness of fit, the residuals from the ARMA fitting in
steps (1) and (6) should both be checked as described in Section 9.4. The
sample cross-correlations of the two residual series { zl ' t > m * } and
510
1 3. Further Topics
{ W,, t > m*} should also be compared with the bounds ± 1 .96/Jn in order
to check the hypothesis that the sequences { N,} and { Z,} are uncorrelated.
ExAMPLE 1 3 . 1 . 1 (Sales with a Leading Indicator). In this example we fit a
transfer function model to the bivariate time series of Example 1 1 .2.2. Let
X11
=
( 1 - B)l-; 1 - .0228,
and
X, 2 = ( 1 - B) Y, 2 - .420,
t=
1,
. . . ' 149,
t = 1 , . . . ' 149,
where { Y, 1 } and { Y, 2 }, t = 0, . . . , 149, are the leading indicator and sales data
respectively. It was found in Example 1 1.2.2 that { X, 1 } can be
modelled as the zero mean ARMA process,
X" = ( 1 - .474B)Z,,
=
{Z,}
�
WN(O, .0779).
We can therefore whiten the series by application of the filter
n(B) ( 1 - .474B) - 1 . Applying n(B) to both { X, J } and {X, 2 } we obtain
cri = .0779,
and
cJ �
4.021 7.
=
These calculations and the filing of the series { Z,} and { Y,} were carried
out using the program PEST as described in step (2). The sample corre­
lation function pyz(h) of { Y,} and { Z,}, computed using the program
TRANS, is shown in Figure 1 3. 1 . Comparison of p yz(h) with the
bounds ± 1 .96(149) - 1 1 2 = ± . 16 1 suggests that p yz(h) = 0 for h < 3. Since
tj = PrzU)crr/crz is decreasing approximately geometrically for j � 3, we take
T(B) to have the form ( 1 3 . 1 .6), i.e.
T(B) = w0(1 - v1B) - 1 B3•
Preliminary estimates of w0 and v 1 are given by w0 = t3 4.86 and
v 1 = t4/t3 .698. The estimated noise sequence is obtained from the equation
=
=
t
=
4, 5, . . . ' 149.
Examination of this sequence using the program PEST leads to the MA( l)
model,
{ W1} "' WN (O, .0590) .
Substituting these preliminary noise and transfer function models into
equation ( 1 3 . 1 . 1 ) then gives
X12 = 4.86B3 ( 1 - .698B) - 1 Xt! + ( I - . 364B) Wt,
{ Wt} "' WN(O, .0590).
511
§ 1 3. 1 . Transfer Function Modelling
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
-0. 1
-0.2
-0.3
������---7���-�----��
-0 4
-0.5
-0.6
-0 7
-0.8
-0.9
- 1 4-------�--����--�-�
-10
-20
10
0
Figure 1 3. 1 . The sample cross-correlation function pyz(h), - 20 :S h
13.1.1.
:S
20
20, of Example
Now minimizing the sum of squares ( 1 3. 1 .7) with respect to the parameters
(w0, v 1 , 8iN>) using the program TRANS, we obtain the least squares model
X, 2 = 4.7 1 7B3(1 - .724B) - 1 X, 1 + ( 1 - .582B) lf; , { lt'; } WN(O, .0486),
( 1 3. 1 .8)
�
where
Xr 1
= (1
-
.474B)Z,
{ Z,}
�
WN(O, .0779).
Notice the reduced white noise variance of { lt';} in the least squares model
as compared with the preliminary model.
The sample autocorrelation and partial autocorrelation functions for the
senes
N, = X, 2 - 4.7 1 7B3(1 - .724B) - 1 X, 1
are shown in Figure 1 3.2. These graphs strongly indicate that the MA( l )
model i s appropriate for the noise process. Moreover the residuals �
obtained from the least squares model ( 1 3. 1 .8) pass the diagnostic tests for
white noise as described in Section 9.4, and the sample cross-correlations
between the residuals � and Z,, t = 4, . , 1 49, are found to lie between the
bounds ± 1 .96/Jl44 for all lags between ± 20.
. .
1 3. Further Topics
512
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0. 1
0
-0. 1
-0.2
-0.3
-0.4
-0.5
-0.6
-0.7
-0.8 .
-0.9
- 1
0
1 0
20
30
40
20
30
40
(a)
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
-0. 1
-0.2
-0.3
-0.4
-0.5
-0.6
-0.7
-0.8
-0.9
- 1
0
1 0
(b)
Figure 1 3.2. The sample ACF (a) and PACF (b) of the estimated noise sequence
N, = X, 2 - 4.71 78 3 ( 1 - .724B) - 1 X11 of Example 1 3. 1 . 1 .
513
§ 1 3 . 1 . Transfer Function Modelling
A State-Space Representation o f the Series
{(Xtt, X12)'}t
A major goal of transfer function modelling is to provide more accurate
prediction of X n +h. 2 than can be obtained by modelling {X1 2 } as a univariate
series and projecting X " + h . 2 onto sp{ X 1 2 , 1 ::::;; t ::::;; n}. Instead, we predict
xn + h , 2 using
( 1 3 . 1 .9)
To facilitate the computation of this predictor, we shall now derive a
state-space representation of the input-output series {(X1 1 , X1 2 )'} which is
equivalent to the transfer function model. We shall then apply the Kalman
recursions of Section 1 2.2. (The state-space representation can also be used
to compute the Gaussian likelihood of {(X1 1 , X1 2 )', t = 1, . . . , n} and hence
to find maximum likelihood estimates of the model parameters. Model
selection and missing values can also be handled with the aid of the
state-space representation; for details, see Brockwell, Davis and Salehi ( 1 990).)
The transfer function model described above in steps ( 1)-(8) can be written
as
( 1 3. 1 . 1 0)
where { X t l } and { N1} are the causal and invertible ARMA processes
( 1 3. 1 . 1 1 )
{ Zr } � WN(O, a�),
¢(B)X t 1 = B(B)Z� '
( 1 3 . 1 . 1 2)
</J(Nl(B)N I = ()(Nl(B)Ht;,
{ Ht;} WN(O, a �),
�
{Z1} is uncorrelated with { Ht;}, and v(z) =/= 0 for l z l ::::;; 1 . By Example 1 2. 1 .6,
{ X t l } and { N1} have the state-space representations
Xr + l , l = F 1Xr 1 + H I ZP
( 1 3. 1 . 1 3)
X t l = G 1 x1 1 + Zr ,
and
Dr + I = F(Nlnr + H(Nl w;,
( 1 3. 1 . 14)
Nr = G(Nlnr + w;,
where (F � > G 1 , H d and (F< NJ, G< NJ, H(NJ) are defined in terms of the
autoregressive and moving average polynomials for {X1} and {Nr},
respectively, as in Example 12.1 .6. In the same manner, define the triple
(F 2 , G 2 , H 2 ) in terms of the "autoregressive" and "moving average"
polynomials v(z) and zb w(z) in ( 1 3. 1 . 1 0). From ( 1 3. 1 . 1 0), it is easy to see that
{ X1 2 } has the representation (see Problem 1 3.2),
( 1 3. 1 . 1 5)
where b = w0 if b = 0 and 0 otherwise, and { x1 2 } is the unique stationary
solution of
( 1 3 . 1 . 1 6)
t Pa ges 5 1 3-5 1 7 m a y be omitted without loss of continuity.
514
1 3. Further Topics
Substituting from ( 1 3. 1 . 1 3) and ( 1 3 . 1 . 14) into ( 1 3. 1 . 1 5) and ( 1 3. 1 . 1 6), we obtain
x t 2 = Gz Xtz + b G l xt l + b Zt + G(N)nt + w,,
xt + I . Z = FzXtz + Hz G J xt l + HzZt .
( 1 3. 1 . 1 7)
( 1 3. 1 . 1 8)
By combining ( 1 3. 1 . 1 3), ( 1 3 . 1 . 14), ( 1 3. 1 . 1 7) and ( 1 3 . 1 . 1 8), the required
state-space representation for the process, {(X t 1 , Xt 2 )', t = 1 , 2, . . . }, can now
be written down as
( 1 3 . 1 . 1 9)
where { 'It
equation,
= (xt' l , Xtz , nt')'}
f
[
' the unique stationary solution of the state
IS
F,
'It + ! = H �G �
0
0
F(N)
0
Fz
0
] [
H,
+
t
Hz
'I
0
l/t,]l�J
( 1 3 . 1 .20)
ExAMPLE 1 3. 1 . 1 (c on t .). The state-space model for the differenced and mean­
corrected leading indicator-sales data (with b = 3, w(B) = w0, v(B) = 1 - v 1 B,
cp(B) = 1 , B(B) = 1 + B I B, cp(N)(B) = 1 and e<Nl(B) = 1 + BiN)B) is
l j [
xt l
where {'It
equation,
xt z
= (x; 1 , x; 2 , n;)' }
'It + I =
0
0
0
=
IS
0
0
0
Wo 0
0 0
1
o
o
o
1
o
o
o
j [j
o l
't +
1
zt
w,
.
( 1 3. 1 .2 1 )
the umque stationary solution of the state
0
0
0
0
0
0
0
0 VI 0
0 0 0
'It +
0
0
0
Wo 0
0 B<fl
el
0
0
l�J.
( 1 3 . 1 .22)
We can estimate the model parameters in ( 1 3. 1.21) and ( 1 3 . 1 .22) by
maximizing the Gaussian likelihood ( 1 1.5.4) of {(Xt 1 , X t 2)', t = 1 , . . . , 149},
using ( 12.2. 1 2) and ( 1 2.2. 14) to determine the one-step predictors and their
error covariance matrices. This leads to the fitted model,
{ Zr }
Nt =
( 1 - .621 B) W,,
{ W,}
"'
WN(O, .0768),
( 1 3 . 1 .23)
"'
WN(O, .0457),
which differs only slightly from the least squares model ( 1 3 . 1 .8).
It is possible to construct a state-space model for the original leading
indicator and sales data, { Yt : = ( l-; 1 , l-;2)', t = 0, 1, . . . , 149}, at the expense of
increasing the dimension of the state-vector by two. The analysis is similar
§ 1 3. 1 . Transfer Function Modelling
515
to that given i n Example 1 2. 1 .7 for ARIMA processes. Thus we rewrite the
model ( 1 3. 1 .21)-( 1 3 . 1 .22) as
( 1 3 . 1 .24)
[ ] [ ] [ J
[ J [ ]+[ ] [ ]
[ ]
+
[z,J +
[ ]
+
Observing that
·0228
Yr 1
=
Y, r
.420
1"; 2
v r; I
=
v r; 2
_
_
0228
r ·
.420
-
Y, _ 1 - (t - 1 )
= X,
= GTJ,
.0228
Yr - 1 , 1
- (t - 1 )
.420
r; _ 1 . 2
.0228
.420
.0228
.420
Y, _ 1 - (t - 1 )
VVr
.0228
,
.420
we introduce the state vectors, T, + 1 = (TJ; + 1 , Y; - tj.t')', where J.l' =
(.0228, .420). It then follows from the preceding equation and ( 1 3 . 1 .24) that
{Y, - tJ.1} has the state-space representation, for t = 1, 2, . . . ,
Y, - tJ.l = [G
with state equation,
( 1 3 . 1 .25)
T,+ [ F lzxz]T, + [�:],
I=
initial condition,
/2x z]T, + [�J
0
G
( 1 3. 1 .26)
W.'
and orthogonality conditions,
t = 0, ± 1 , . . . .
+
To forecast future sales, we apply the Kalman recursions, ( 1 2.2. 10)-­
( 1 2.2. 14) to this state-space model to evaluate P.(Yn + h - (n h)J.l), where
P.( · ) denotes projection onto
sp{ Y0 , Y 1 - J.l, . . . , Y. - nJ.l} = sp{Y0 , X 1 , . . . , X.}.
Then the required predictor P(Y" + h 1 1, Y0 , . . . , Y.) is given by
P(Yn +h l l, Y o , . . . , Y.) = (n + h)J.l
+
P.(Yn + h - (n + h)J.l).
1 3. Further Topics
516
As in the case of an ARIMA process (see Remark 8, Section 1 2.2), this
predictor can be computed more directly as follows. Since Y 0 is orthogonal
to X 1 , X 2 , and 11 � > flz , . . . , we can write
0
. • •
P o tt l = Ett J =
and
. , X, _ 1 ) = ft, for t 2 2.
Similarly, P, ttr + h = P(tt, + h i X 1 , . . . , X ,) and P, X , + h = GP,ttr + h for all t 2 1
and h 2 1 . Both ft, and its error covariance matrix,
P, _ 1 fl,
=
P(tt, I X 1 , . .
n�. ,
=
E(tt, - ft,)(tt, - ft,)',
can be computed recursively by applying Proposition 1 2.2.2 to the model
( 1 3 . 1 .24) with initial conditions
Tt ! =
'1'�. I = Q,
and
0,
n�.
1
= E(tt J tt'J l = I FjQ ! F'j
j=O
0.0 1 74
0.0000
1 .92 1 5
0.0000
0.0000
0.5872
- 0. 1 7 1 9
0.4263
co
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000 ,
0.0000
- 0. 1 7 19
0.4263
0.5872
1.92 1 5
0.5872
1.92 1 5
0.5872
0.0000
0.0000
0.01 76
where Q 1 is the covariance matrix of V1 (see Problem 1 3.4). Consequently, the
one-step predictors for the state-vectors, T, = (tt; , Y; _ 1 - (t - l)'Jl')', m
( 1 3. 1 .26) are
Q [Q�·' OJ
with error covariance matrices given by
t
=
0
0 ,
for t 2 1 . It follows from ( 1 2.2. 1 2), ( 1 3 . 1 .24) and ( 1 3 . 1 .25) that
p 1 49(Y J s o - 1 50J1) = [G
whence
P(Y 1 so l l, Yo, · · · , Y 1 49 )
[y
lt ! s o
1 49 - 149J1
J
[ ]+[ ]+[ ]=[ ]
= Jl
=
+
/ 2 x 2]
X 1 so
.0228
.420
+
Y 1 49
. 1 38
- .232
1 3.4
262.7
1 3.56
.
262.89
( 1 3 . 1 .27)
517
§ 1 3 . 1 . Transfer Function Modelling
[ ]
[ ] [
Similarly,
P(Y t s l l l , Y o , . . . , Y I49) =
=
.0228
.420
+ P 1 4 9 X 1 s 1 + P(Y i s o l l , Y o , . . . , Y I 49)
.0228
.420
+
[ ] [ ]
J
1 3.56
0
.92 1 + 262.89
=
1 3.58
.
264.23
( 1 3. 1 .28)
The corresponding one- and two-step error covariance matrices, computed
from (1 2.2. 1 3) and (1 2.2. 1 4), are found to be
[
Q�� i 49
ol
= [ G /2 x 2]
L l49
O
0
0
[
J
and
=
where
and
Q1
[
X l s o)(X I 50 - X 1 5 o)'
E(X I 5 0 .07 68
0
,
=
.0457
0
=
J
,
/2 x 2] +
[G
[
< l
Q 1249 =
F
J
G
0
I2
X
2
J
( 1 3 . 1.29)
.09 79
0
,
0
.0523
[
.0768 0
0 .0457
][
( 1 3. 1 .30)
Q�� ll49
0
O
0
][
is the covariance matrix of (V'1 , Z 1 ,
F'
0
W1 )'.
Prediction Based on the Infinite Past
For the transfer function model described by ( 1 3 . 1 . 10H 1 3 . 1 . 1 2), prediction
of Xn + h. 2 based on the infinite past {(X1 1 , X, 2 )', - oo < t :s; n}, is substantially
simpler than that based on {(X, 1 , Xd', 1 :s; t :s; n}. The infinite-past
predictor, moreover, gives a good approximation to the finite-past predictor
provided n is sufficiently large.
The transfer function model ( 1 3 . 1 . 10H 1 3. 1 . 1 2) can be rewritten as
where f3(B)
=
X, 2 = T(B)X, 1 + {3(B)W,,
1
X1 1 = 8(B)¢ - (B)Z, ,
(13.1.31)
( 1 3. 1 .32)
(}( N)(B)/¢(N)(B). Eliminating x, l gives
X, 2 =
00
00
L cxj Z, _ j + L f3j w, _ j •
j= O
j=O
( 1 3 . 1 .33)
1 3. Further Topics
518
where cx(B) = T(B)O(B)I¢(B). Our objective is to compute
Pn X n + h, 2 : = psp{X,,, x,,, - aJ < t :5 n }x n + h, 2 '
Since {X,d and {N,} are assumed to be causal invertible ARMA processes,
it follows that sp { (X, 1 , X, 2 )', - oo < t � n} = sp { (Z, Jt;)', - oo < t � n}.
Using the fact that { Z,} and { Jt; } are uncorrelated, we see at once from
( 1 3. 1 .33) that
Pn X n + h, 2 = L ct.j Zn +h - j + L J3j W, +h -j'
( 1 3 . 1 .34)
j=h
j=h
Setting t = n + h in ( 1 3 . 1 .33) and subtracting ( 1 3 . 1 .34) gives the mean squared
error,
h-!
h- !
( 1 3 . 1 .35)
E(Xn + h, 2 - Pn X n + h, 2) 2 = d I ct.J + O"fy L J3J.
j=O
j=O
To compute the predictors Pn X n + h, z we proceed as follows. Rewrite
( 1 3. 1 .3 1 ) as
( 1 3 . 1 .36)
aJ
aJ
where A, U and V are polynomials of the form,
A (B) = 1 - A 1 B - · · · - A a Ba ,
U (B) = Uo + U1 B + · · · + UuB u ,
and
1 + V1B + · · · + VvB" .
Applying the operator P" to equation ( 1 3 . 1 .36) with t = n + h, we obtain
V(B)
=
a
=
v
u
I A j Pn X n + h - j, 2 + L Uj Pn X n + h - b -j, ! + L J-} W, + h - j '
j= !
j=O
j=h
( 1 3. 1 .37)
where the last sum is zero if h > v.
Since { Xt l } is uncorrelated with { Jt;}, Pn X j 1 Psr;{x,, - ao < r ;<;n} X j 1 , and
the second term in ( 1 3. 1 .37) is therefore obtained by predicting the univariate
series { X r 1 } as described in Chapter 5 using the model ( 1 3. 1 .32). In keeping
with our assumption that n is large, we can replace Pn X j by the finite past
predictor obtained from the program PEST.
The values Hj, j :-s:; n, are replaced by their estimated values Wj from the
least squares estimation in step (7) of the modelling procedure.
Equations ( 1 3 . 1 .37) can now be solved recursively for the predictors
Pn Xn + ! , 2 , Pn X n + 2 , 2 ' Pn X n + 3, 2 ' ' ' ' '
Pn X n + h , 2
=
ExAMPLE 1 3. 1 .2 (Sales with a Leading Indicator). Applying the preceding
results to the series {(X, 1 , Xr z )', 1 � t � 1 49} of Example 1 3 . 1 . 1, we find from
( 1 3. 1 .23) and ( 1 3. 1 .37) that
P 1 49 X 1 so, 2 = .726X 1 49, 2 + 4.70 1 X !4 7 ,1 - 1 .347 W1 4 9 + . 4 5 1 W! 4 s ·
§ 1 3 . 1 . Transfer Function Modelling
519
Replacing X 1 4 9, 2 and X 1 4 7 , 1 by the observed values .08 and - .093 and
W1 49 and W1 4 8 by the sample estimates - .0706 and . 1449, we obtain the
predicted value
P 1 4 9 X 1 50 . 2 = - .219.
Similarly, setting X 1 4 8, 1
=
.237, we find that
P t4 9 X t 5 t , 2 = .726P t49 X t 5o, 2 + 4.70 1 X t4 s. t + .45 1 Wt 4 9
= .923.
In terms of the original sales data { � 2 } we have ¥1 4 9, 2 = 262.7 and
� 2 = � - 1 . 2 + x 1 2 + .420.
Hence the predictors of actual sales are
Pf4 9 ¥1 50, 2 = 262.7 - .2 1 9 + .420 = 262.90,
P f4 9 Y1 5 t . 2 = 262.90 + .923 + .420 = 264.24,
{
( 1 3 . 1 .38)
where Pf4 9 means projection onto sp{ 1 , ¥1 4 9, 2 , (X5, 1 , X s, 2 )', - oo < s <
149}. These values are in close agreement with ( 1 3 . 1 .27) and ( 1 3 . 1 .28) obtained
earlier by projecting onto sp{ 1 , ( � 1 , � 2 )', 0 � t � 1 49}. Since our model
for the sales data is
( 1 - B) � 2 = .420 + 4.701B\1 - .476B)( l - .726B) - 1 Z1 + ( 1 - .621B) VJ--; ,
an argument precisely analogous to the one giving ( 1 3. 1 .35) yields the mean
squared errors,
h-1
h- 1
E( Yt 4 9 + h, 2 - Pf4 9 Yt 4 9 + h, 2) 2 = ai L rxf 2 + a� L [3f 2 ,
j=O
j=O
where
1
3
rxf zj = 4.70 1 z ( 1 - .476z)( 1 - .726z) - 1 ( 1 - z) L
j=O
00
and
L f3J zi ( 1 - .62 1 z)(1 - z) - 1 .
00
=
j= O
For h = 1 and 2 w e obtain
E( Yt 5o, 2 - P f4 9 Yt 5o, 2 ) 2 = .0457,
( 1 3. 1 .39)
E( Yt 5 t , 2 - Pf4 9 Yt 5 t , 2 ) 2 .0523,
m agreement with the finite-past mean squared errors m ( 1 3 . 1 .29) and
( 1 3 . 1 .30).
It is interesting to examine the improvement obtained by using the transfer
function model rather than fitting a univariate model to the sales data alone.
{
=
520
1 3. Further Topics
If we adopt the latter course we find the model,
X, 2 - .249X, _ 1 _ 2 - . 1 99X, _ 2 , 2
=
U,,
where { U,} � WN(O, 1.794) and X, 2 = 1'; 2 r; 1 2 - .420. The correspond­
ing predictors of Y1 5 0 , 2 and Y1 5 1 , 2 are easily found from the program PEST
to be 263. 14 and 263.58 with mean squared errors 1.794 and 4.593
respectively. These mean squared errors are dramatically worse than those
obtained using the transfer function modeL
-
-
.
§ 13.2 Long Memory Processes
An ARMA process {X,} is often referred to as a short memory process since
the covariance (or dependence) between X1 and X 1 +k decreases rapidly as
k -> oo. In fact we know from Chapter 3 that the autocorrelation function is
geometrically bounded, i.e.
i p(k)i :::;; Cr\
k = 1, 2, . . .
,
where C > 0 and 0 < r < l . A long memory process is a stationary process
for which
p(k) � Ck 2d - l as k -> oo,
( 1 3.2. 1 )
where C # 0 and d < .5. [Some authors make a distinction between "inter­
mediate memory" processes for which d < 0 and hence L� _ 00 I p(k)l < oo, and
"long memory" processes for which 0 < d < .5 and L k"= -oo i p(k)l oo.]
There is evidence that long memory processes occur quite frequently in
fields as diverse as hydrology and economics (see Hurst ( 1 95 1 ), Lawrance and
Kottegoda (1 977), Hipel and McLeod ( 1 978), and Granger ( 1 980)). In this
section we extend the class of ARMA processes as in Hosking ( 1 98 1 ) and
Granger and Joyeux ( 1980) to include processes whose autocorrelation func­
tions have the asymptotic behaviour ( 1 3.2. 1). While a long memory process
can always be approximated by an ARMA(p, q) process (see Sections 4.4 and
8. 1), the orders p and q required to achieve a reasonably good approximation
may be so large as to make parameter estimation extremely difficult.
For any real number d > - 1 , we define the difference operator Vd =
( 1 - B)d by means of the binomial expansion,
=
vd = ( 1 - B)d =
where
n- =
1
L nj Bi,
00
j=O
ru - d)
k-1-d
= n
'
k
f (j + 1)r( - d) O<ksj
j = 0, 1 , 2, . . . ' ( 1 3.2.2)
521
§ 1 3.2. Long Memory Processes
and r( · ) is the gamma function,
r(x) : =
{Iro
t x- 1 e - 1 dt,
oo,
x-1 r(l + x),
X > 0,
X = 0,
X < 0.
Definition 1 3.2.1 (The ARIMA (O,d,O) Process). The process { X0 t = 0, ± 1 , . . . }
is said to be an ARIMA (0, d, 0) process with d E ( - .5, .5) if { X, } is a stationary
solution with zero mean of the difference equations,
( 1 3.2.3)
The process { X1 } is often called fractionally integrated noise.
Throughout this section convergence of sequences of random
variables means convergence in mean square.
Remark 1 .
Remark 2. Implicit in Definition 1 3.2. 1 is the requirement that the series
VdX, = 'If= 0 niX, _ i with { ni } as in ( 1 3.2.2), should be mean square con­
vergent. This implies, by Theorem 4. 10. 1, that if X1 has the spectral representation X, = f< - "· "/1" dZx(A.) then
VdX, =
f e il " ( 1
J(-1t,1t]
-
e - i). )d dZx(A.).
( 1 3.2.4)
In view of the representation ( 13.2.3) of { Z1} we say that { X1} is
invertible, even though the coefficients { ni } may not be absolutely summable
as in the corresponding representation of { Z, } for an invertible ARMA pro­
cess. We shall say that { X, } is causal if X1 can be expressed as
Remark 3.
X, =
L 1/Jj Zr -j
j=
ro
O
where '[,f= 0 1/J J < oo . The existence of a stationary causal solution of ( 13.2.3)
and the covariance properties of the solution are established in the following
theorem.
Theorem 1 3.2.1 . If d E ( - .5, .5) then there is a unique purely nondeterministic
stationary solution { Xr } of ( 1 3.2.3) given by
ro
xl = I 1/Jj Z,_j = v - dzo
j= O
( 1 3.2.5)
1 3. Further Topics
522
where
t/1·
1
=
ru + dl
TI k - 1
=
k
['(j + 1 ) 1 (d) O<k ,;j
+
d
j = 0, 1, 2, . . . .
'
( 1 3.2.6)
Denoting byf( · ), y ( · ), p( · ) and oc( · ) the spectral density, autocovariancefunction,
autocorrelation function and partial autocorrelation function respectively of
{X, } , we have
- n :s;: A
y(O)
p (hl
=
=
CJ 2 1(1 - 2d)/12 ( 1 - d),
qh + dl 1 (1 - dl
['(h - d + 1 ) ['(d)
=
TI k - 1 + d
O<k ,; h k - d
:S::
n,
( 1 3.2. 7)
( 1 3.2.8)
'
h
=
1, 2, . . . ,
( 1 3.2.9)
and
oc(h) = d/(h - d),
h
=
( 1 3.2. 1 0)
1, 2, . . . .
Remark 4. App1ying Stirling's formu1a, l(x)
to ( 1 3.2.2), (1 3.2.6) and ( 1 3.2.9), we obtain
�
J2� e - x + 1 (x - lY - 1;2 as x � oo,
as j � 00 ,
nj rd - 1 /1( - d)
tf;i l - 1 /l(d)
as j � oo,
�
(1 3.2. 1 1)
�
( 1 3.2. 1 2)
and
p(h)
�
h 2d - 1 1(1 - d)j['(d)
as h �
oo.
( 1 3.2. 1 3)
Fractionally integrated noise with d # 0 is thus a long memory process in
the sense of Definition 1 3.2. 1 .
Remark
5.
Since sin A
�
A as A � 0, we see from ( 1 3.2.7) that
(1 3.2. 1 4)
showing that f(O) is finite if and only if d :s;: 0. The asymptotic behaviour
( 1 3.2. 14) of f(A) as A � 0 suggests an alternative frequency-domain definition
of long memory process which could be used instead of ( 1 3.2. 1 ).
PROOF OF THEOREM 1 3.2. 1 . We shall give the proof only for 0 < d < .5 since
the proof for - .5 < d < 0 is quite similar and the case d = 0 is trivial.
From (1 3.2. 1 2) it follows that L� o tj;J < oo so that
L tf;i e - ii · � ( 1 - e - ' Td as n � oo ,
j=O
n
( 1 3.2. 1 5)
523
§1 3.2. Long Memory Processes
where convergence is in L 2(dA.) and dA. denotes Lebesgue measure. By Theorem
4. 1 0. 1 ,
(1 - B)-dZ, : = L 1/Jj Zr-j
j =O
is a well-defined stationary process and if { Z, } has the spectral representation
Z, f< - ,.,1 e i.l.r d W(A.), then
ew(l - e - 0Td d W(A.).
( 1 - B)-dz, {
00
=
=
J ( - 7t , 7t]
Since L� o I nil < oo (by ( 1 3.2. 1 1 )), we can apply the operator ( 1 - B)d =
L� o niBi to (1 - B) - dz, (see Remark 1 in Section 4. 1 0), giving
-
(1
B)d(1 - B) - dZ,
=
{
J ( - 7t, 7t]
ew d W (A.)
= Z,.
Hence {X,} as defined by ( 1 3.2.5) satisfies ( 1 3.2.3).
To establish uniqueness, let { Y,} be any purely nondeterministic stationary
solution of ( 1 3.2.3). If { Y, } has the spectral representation,
Y,
=
{
J ( - n , n]
e io. dZr(A.),
then by ( 1 3.2.4), the process {( 1 - B)d Y,} has spectral representation,
(1
-
B)d Y,
{
=
J ( - 7t , 7t]
e i'.< ( l - e -i.< )d dZr(A.)
and spectral density a2/(2n). By ( 1 3.2. 1 5), Theorem 4. 1 0. 1 and the continuity
of Fr at 0,
( 1 - srdz, = ( 1 - B) -d ( l - B)d Y,
i
= lim
1/Ji B ( 1 - B)d Y,
n-+oo
1=0
=
{
(.I )
J ( -1t, 7t]
( 1 - e - ; ;. ) - d(l - e -i.< )de ir .< dZy (A.)
= Y,.
( 1 - B) - d Z, = X,.
Hence Y, =
By (4.1 0.8) the spectral density of {X,} is
f(A.) = 1 1 - e - i.< l - 2da2/(2n) = 1 2 sin(A./2W 2da2/(2n).
The autocovariances are
2f
(J "
= -
n o
cos(hA.)(2 sin(A./2W2d dA.
( - 1 t r ( l - 2d)
2,
r(h - d + 1)r(1 - h - d(
h
= 0, 1 , 2, . . . .
524
13. Further Topics
I"
where the last expression is derived with the aid of the identity,
n: cos (hn/2) 1 ( v + 1)2 1 -v
. v-1
cos (hx)s m
(x) dx = v v
l(( + h + 1 )/2) 1(( v - h + 1 )/2)
o
(see Gradshteyn and Ryzhik ( 1 965), p. 372). The autocorrelations ( 1 3.2.9) can
be written down at once from the expression for y(h). To determine the partial
autocorrelation function we write the best linear predictor of xn+1 in terms of
xn , . . . , x1 as
.fn+1 = rPn 1 xn + . . . + rPnnx 1
and compute the coefficients rPni from the Durbin-Levinson algorithm (Prop­
osition 5.2. 1). An induction argument gives, for n = 1 , 2, 3, . . . ,
rPn) =
·
whence
(X
- n r(j - d)l(n - d - j + 1 )
( )
j
r( - d) l(n - d + 1 ) '
-
l(h - d)1 (1 - d)
j
=
1 , . . . , n,
d_
( h) - rPnh - - 1
-_ .
D
( - d) l(h - d + 1 ) h - d
Fractionally integrated noise processes themselves are of limited value in
modelling long memory data since the two parameters d and (J 2 allow only a
very limited class of possible autocovariance functions. However they can be
used as building blocks to generate a much more general class of long memory
processes whose covariances at small lags are capable of assuming a great
variety of different forms. These processes were introduced independently by
Granger and Joyeux ( 1 980) and Hosking ( 1 98 1).
(The ARIMA(p, d, q) Process with d E (- .5, .5)). { X1, t = 0,
± 1 , . . . } is said to be an ARIMA(p, d, q) process with d E ( - .5, .5) or a
fractionally integrated ARMA(p, q) process if { X1} is stationary and satisfies
the difference equations,
Definition 13.2.2
( 1 3.2. 1 6)
where { zt } is white noise and r/J, e are polynomials of degrees p, q respectively.
Clearly { X1 } is an ARIMA(p, d, q) process with d E ( - .5, .5) if and only if
Vd X1 is an ARMA(p, q) process. If 8 (z) =f. 0 for l z l ::::; 1 then the sequence
1; = rjJ(B)8 - 1 (B)X1 satisfies
and
r/J(B)X1 = 8(B) 1;,
so that { X1 } can be regarded as an ARMA(p, q) process driven by fractionally
integrated noise.
525
§ 1 3.2. Long Memory Processes
Theorem 1 3.2.2 (Existence and Uniqueness of a Stationary Solution of
( 1 3.2. 1 6)). Suppose that d E ( - 5, .5) and that ¢( - ) and 8( - ) have no common
zeroes.
(a) If r/J(z) =f 0 for l z l = 1 then there is a unique purely nondeterministic
stationary solution of (1 3.2. 1 6) given by
00
xt = L 1/!j v -d Zr -j
j=
- ro
where 1/J (z) = L� - oo 1/Jj zj = 8(z)/r/J(z).
(b) The solution {Xr } is causal if and only if r/J (z) =f 0 for l z l :::; 1 .
(c) The solution {Xr } is invertible if and only if 8(z) =f 0 for l z l :::; 1 .
(d) If the solution {X, } is causal and invertible then its autocorrelation
function p ( · ) and spectral density f( · ) satisfy, for d i= 0,
p(h) � Ch 2 d - ! as h --> oo ,
where C
=f
( 1 3.2. 1 7)
0, and
as ). --> 0.
PROOF. We omit the proofs of (a), (b) and (c) since they are similar to the
arguments given for Theorems 3. 1 . 1 -3. 1.3 with Theorem 4. 1 0. 1 replacing
Proposition 3. 1 .2.
If { X, } is causal then r/J (z) =f 0 for l z l :::; 1 and we can write
00
X, = 1/! (B) Y, = L 1/!j Y, -j •
j�O
where
is fractionally integrated noise. If Yr( · ) is the autocovariance function of { Y,},
then by Proposition 3. 1 . 1 (with 1/Jj := O,j < 0),
Cov ( Xr+h• X,) = L L 1/Jj l/lk yy (h - j + k)
j k
I.e.
Yx(h) = L y(k}yy(h - k),
k
( 1 3.2. 1 9)
where y(k) = Lj 1/Jj l/lj +k is the autocovariance function of an ARMA(p, q) pro­
cess with a 2 = 1. If follows that ly(k) l < Cr\ k = 0, 1 , 2, . . . , for some C > 0
526
1 3. Further Topics
and r E (0, 1) and hence that
h 1 - 2 d I lii(k) l -4 0 as h -4 oo.
\k \> �
From ( 1 3.2. 19) we have
h l- 2 d Yx(h)
=
h l - 2 d I y (k)yy (h - k)
\k\> �
+
( 1 3.2.20)
L y(k)h l - 2 d)ly (h - k) .
\k\ s::;h
( 1 3.2.2 1 )
The first term o n the right converges to zero as h -4 oo b y ( 1 3.2.20). B y ( 1 3.2. 1 3)
there is a constant C =1= 0 (since we are assuming that d =1= 0) such that
Yr(h - k) C(h - k) z d- 1 Ch z d - 1
jh. Hence
�
uniformly on the set l k l
::;:;
�
Now letting h -4 oo in (13.2.2 1) gives the result (1 3.2. 1 7).
Finally from (4.4.3) and ( 1 3.2.7) the spectral density of { X1} is
f(A.) 1 8 (e - i'-) l 2 lt;b(e - i'-W 2/r(2)
=
Remark 6. A formula involving only the gamma and hypergeometric
functions is given in Sowell ( 1990) for computing the autocovariance function
of an ARIMA(p, d, q) process when the autoregressive polynomial ¢(z) has
distinct zeroes.
Remark 7 (The ARIMA(p, d, q) Process with d ::;:; - .5). This is a stationary
process {X r} satisfying
( 1 3.2.22)
It is not difficult to show that ( 1 3.2.22) has a unique stationary solution.
Xt = r 1 (B)O(B)V - d Z1.
The solution however is not invertible. Notice that if { X1 } is an ARIMA(p, d, q)
process with d < .5 then { ( 1 B)X1 } is an ARIMA(p, d - 1 , q) process. In
particular if 0 < d < .5, the effect of applying the operator ( 1 B) is to
transform the long memory process into an intermediate memory process
(with zero spectral density at frequency zero).
-
-
527
§ 1 3.2. Long Memory Processes
Parameter Estimation for ARIMA(p, d, q) Processes
with d E ( - .5, .5)
Estimation of The Mean. Let {X, } be the causal invertible ARIMA(p, d, q)
process defined by
d E ( - .5, .5). ( 1 3.2.23)
A natural estimator of the mean EX, = J1 is the sample mean,
X" = n�1 (X 1 + · · · + X" ).
Since the autocorrelation function p ( · ) of {X, } satisfies p (h) � as h �
conclude from Theorem 7. 1 . 1 that
0
and that
nE(X" - Jt) 2 �
{0
oo
if - .5 < d <
if O < d < .5.
oo,
we
0,
Using ( 1 3.2. 1 3) we can derive (Problem 1 3.6) the more refined result,
n1 � 2d E(Xn - J1)2 � C for d E ( - . 5, . 5),
where C is a positive constant. For long memory processes the sample mean
may not be asymptotically normal (see Taqqu ( 1 975)).
Estimation of the Autocorrelation Function, p ( · ). The function p ( · ) is usually
estimated by means of the sample autocorrelation function p( · ). In the case
- .5 < d < { X, } has the moving average representation
0,
X,
=
J1 +
L 1/Jj Zr �j •
00
j�O
with L}�o 1 1/!i I < oo If in addition { Z, } IID(O, a 2 ) and EZt < oo then
n 1 12 (p(h) - p (h)) is asymptotically normal with mean zero and variance given
by Bartlett's formula, (7.2.5). If 0 < d < .5 the situation is much more com­
plicated; partial results for the case when {Z, } is Gaussian can be found in
Fox and Taqqu ( 1 986).
�
Estimation of d, <I> and 9
(a) Maximum likelihood. The Gaussian likelihood of X = (X 1 , . . . , X" )' for the
process ( 1 3.2.23) with J1 0 can be expressed (cf. (8.7.4)) as
=
L ( p, a 2 )
=
� { 1 )�t1
(2na 2 r"12 (r0, . . . , rn � 1 ) 1 12 ex p - 2 2
(j
}
(Xi - XYh � � ·
528
13. Further Topics
where � = (d, f/J 1 , . . . , f/JP , (J 1 , . . . , (Jq )', Xi , j = 1 , . . . , n, are the one-step predictors
and ri - t = CJ - 2 E (Xi - xy, j = 1, . . . , n. The maximum likelihood estimators
and 8 2 can be found by maximizing L( p, CJ 2 ) with respect to � and CJ 2 • By
the same arguments used in Section 8.7 we find that
a2 = n - 1 S( ),
p
p
where
p
and is the value of � which minimizes
ln(S(�)/n) + n - 1 L In ri -l ·
( 1 3.2.24)
j=l
For {Z, } Gaussian, it has been shown by Yajima ( 1 985) in the case p = q = 0,
d > 0, and argued by Li and McLeod ( 1 986) in the case d > 0, that
/(�)
n
=
( 1 3.2.25)
where W(�) is the (p + q + 1) x (p + q + 1) matrix whose (j, k) element is
1
W.k (R) - 4n
1 "
_
f"
_ "
a In g(A. ; �) a In g(A. ; �)
dA_
a{Jk
a {Ji
'
and CJ 2 g( ; �)/(2n) is the spectral density of the process. The asymptotic
behaviour of is unknown in the case d < 0. Direct calculation of /(�) from
( 1 3.2.24) is slow, especially for large n, partly on account of the difficulty
involved in computing the autocovariance function of the process ( 1 3.2.23),
and partly because the device used in Section 8.7 to express Xi in terms of
only q innovations and p observations cannot be applied when d i= 0.
It is therefore convenient to consider the approximation to /(�),
·
p
1
I (w )
Ia(�) = In - L -"�
1 ,
i
g(wi
n
; �)
where I.( · ) is the periodogram of the series {X 1 , . . . , X. } and the sum is over
all non-zero Fourier frequencies wi = 2nj/n E ( - n, n]. Hannan ( 1 973) and Fox
and Taqqu (1 986) show that the estimator p which minimizes lu(�) is consistent
and, if d > 0, that p has the same limit distribution as in ( 1 3.2.25). The white
noise variance is estimated by
I (wi
(j 2 = ! L . � .
n i g(w i ; �)
·
The approximation la(�) to /(�) does not account for the determinant term
n - 1 L'l = 1 ln ri - t = n - 1 ln det(CJ - 2 r.) where r. = E(XX'). Although
n - 1 L ln ri _ 1 --> 0
j= 1
n
as n --> oo ,
§ 1 3.2. Long Memory Processes
529
this expression may have a non-negligible effect on the minimization of I(�)
even for series of several hundred observations. A convenient approximation
to the determinant term can be found from Proposition 4.5.2, namely
n
-1
In
(ry
)
g(wi; �) =
n
-1
� In g(w ; �).
i
Adding this term to Ia(�), we arrive at a second approximation to I given by
( 1 3.2.26)
Estimation based on minimizing lb( · ) has been studied by Rice ( 1979) in a
more general setting. For ARIMA(p, d, q) processes with d E ( - .5, .5),
empirical studies show that the estimates which minimize lb tend to have less
bias than those which minimize Ia .
(b) A regressio n method. The second method is based on the form of the
spectral density
( 1 3.2.27)
where
( 1 3.2.28)
is the spectral density of the ARMA(p, q) process,
( 1 3.2.29)
Taking logarithms in ( 1 3.2.27) gives
;
ln f(A.) = lnfu(O) - d ln l 1 - e - ;. 1 2
+
ln[fu(A-)/fu (O)].
( 1 3.2.30)
Replacing A in ( 1 3.2.30) by the Fourier frequency wi = 2nj/n E (O, n) and adding
In In(wi ) to both sides, we obtain
In ln(w) = ln fu(O) - d ln l 1 - e -iw1 1 2 + ln(Jn(wi )/f(wj ))
( 1 3.2.3 1 )
+ ln(fu(w)!fu(O)).
Now if wi i s near zero, say wi ::;; wm where wm i s small, then the last term is
negligible compared with the others on the right-hand side, so we can write
( 1 3.2.3 1 ) as the simple linear regression equation,
j = 1 , . . . , m,
( 1 3.2.32)
where lj = In In(w), xi = ln l l - e -iw1 l 2 , ei = ln(In(w)/f(w)), a = ln fu(O) and
b = - d. This suggests estimating d by least-squares regression of Y1 , . . . , Ym
on x 1 , . . . , xm . When this regression is carried out, we find that the least-squares
estimator a of d is given by
530
I 3. Further Topics
a
m
lm
= - i� (X; - x) ( Y; - Y) i� (x; - x)2.
( 1 3.2.33)
Geweke and Porter-Hudak ( 1 983) argue that when - .5 < d < 0 there exists
a sequence m such that (In n)2/m --+ 0 as n --+ oo and
( / [ � x; ])
a is AN d, n2
6; (
- x)2
as n --+ oo .
( 1 3.2.34)
Notice that n2/6 is the variance of the asymptotic distribution of ln(J (.l.)/f(.l.) )
for any fixed A E (0, n).
Having estimated d, we must now estimate the ARMA parameters cp and
9. Since X, v - d V, where { V, } is an ARMA(p, q) process, we find from
( 10.3. 1 2) (replacing Z by U) that
( 1 3.2.35)
lx(l) ( 1 - e - uT d lu (A.) + Y,(.l.)
=
=
where lx( · ) and lu( · ) are the discrete Fourier transforms of {X1 , . . . , X. } and
{ U1 , . . . , v. } respectively. Ignoring the error term Y,(.l.) (which converges in
probability to zero as n --+ 00 ) and replacing d by a, we obtain the approximate
relation
( 1 3.2.36)
If now we apply the inverse Fourier transform to each side of (1 3.2.36) we
obtain the estimates of V,,
t
= 1, . . . , n,
( 1 3.2.37)
where the sum is taken over all Fourier frequencies wi E ( - n, n] (omitting the
zero-frequency term if d < 0). Estimates of p, q, cp and 9 are then obtained by
applying the techniques of Chapter 9 to the series { 0, } .
The virtue of the regression method i s that it permits estimation o f d
without knowledge of p and q. The values { 0,} then permit tentative identifi­
cation of p and q using the methods already developed for ARMA processes.
Final estimates of the parameters are obtained by application of the approxi­
mate likelihood method described in (a).
ExAMPLE 1 3.2. 1 . We now fit a fractionally integrated ARMA model to the
data {X, t = 1, . . . , 200} shown in Figure 1 3.3. The sample autocorrelation
function (Figure 1 3.4) suggests that the series is long-memory or perhaps even
non-stationary. Proceeding under the assumption that the series is stationary,
we shall fit an ARIMA(p, d, q) model with d E ( - .5, .5). The first step is to
estimate d using ( 1 3.2.33). Table 1 3 . 1 shows the values of the regression
estimate a for values of m up to 40. The simulations of Geweke and Porter­
Hudak ( 1983) suggest the choice m = n · 5 or 14 in this case. In fact from the
table we see that the variation in a is rather small over the range 1 3 :::;: m :::;: 3 5.
It appears however that the term ln(fu(w)!fu(O)) in ( 1 3.2.30) is not negligible
§1 3.2. Long Memory Processes
531
0
-
1
-2
-3
-4
0
60
40
20
80
1 00
1 20
1 40
1 60
1 80
200
Figure 1 3.3. The data {X,, t = I , . . . , 200} of Example 1 3. 1 .
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
-0. 1
4-------��&ff�--�--��-
-0.2
-0.3
-0.4
-0 5
-0 6
-0.7
-0 8
-0 9
- 1
50
40
30
20
1 0
0
Figure 1 3.4. The sample autocorrelation function of the data shown in Figure 1 3.3.
Table 1 3 . 1 . Values of the Regression Estimator d in Example 1 3.2. 1
m
a
13
14
.342 .371
15
16
17
18
19
20
.299
.356
.41 1
.42 1
.37o
.334
25
30
35
.37o .409 .424
40
45
.521
.562
532
1 3. Further Topics
for j 2 40. We take as our estimate the value of d when m is 1 4, i.e. d = .37 1 ,
with estimated variance n 2/[6(IJ.�·1 (x; - xlJ = .053 1 . A n approximate 95%
confidence interval for d is therefore given by ( - .08 1 , .500). (Although the
asymptotic distribution ( 1 3.2.34) is discussed by Geweke and Porter-Hudak
only for d < 0, their simulation results support the validity of this distribution
even in the case when 0 < d < .5.)
Estimated values of U, = Vd (X, + .0434) are next found from ( 1 3.2.37). The
sample autocorrelation function of the estimates ( 0,} (Figure ( 1 3.5) strongly
suggests an MA(1) process, and maximum likelihood estimation of the param­
eters gives the preliminary model,
V · 3 7 5 (X, + .0434) = Z, + .8 1 6Z, _ l ,
( 1 3.2.38)
{Z, } WN(0, .489).
�
Finally we reestimate the parameters of the ARIMA(O, d, 1) model by
minimizing the function lb(d, 8) defined in ( 1 3.2.26). The resulting model is
{Z, }
�
WN(O, . 5 1 4),
( 1 3.2.39)
which is very similar both to the preliminary model ( 1 3.2.38) and to the model
{Z, }
�
WN(0, .483),
( 1 3.2.40)
from which the series {X, } was generated.
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
-0.1
-0.2
-0.3
-0.4
-0.5
-0.6
-0.7
-0.8
-0.9
- 1
0
1 0
20
30
40
Figure 1 3.5. The sample autocorrelation function of the estimates 0 , of V· 3 7 1 (X, +
.0433), Example 1 3.2. 1 .
533
§1 3.2. Long Memory Processes
Prediction of an ARIMA(p, d, q) Process, d E ( - .5, .5)
Let {X, } be the causal invertible ARIMA(p, d, q) process,
d E ( - .5, .5)
r/J(B)Vd X, = B(B) Z,
( 1 3.2.4 1 )
The innovations algorithm can be applied to the covariance function of {X, }
to compute the best linear predictor of Xn+h i n terms o f X 1 , , X" . For large
n however it is much simpler to consider the approximation
• • .
Xn+h : = P-;;pfl x . -oo <1·< n} Xn+h ·
J'
-
Since we are assuming causality and invertibility we can write
00
X, = L 1/Jj Zt -j
j=O
and
00
Z, = L ni Xt -i •
j=O
where L J= O 1/Jj z j = 8(z)r 1 (z) (1 - zr d and L .i= o nj z j = r/J(z)8 - 1 (z) ( 1 - z)d,
i z l < 1 . Theorem 5.5. 1 can be extended to include the process ( 1 3.2.41 ), giving
00
00
xn+h = - j=L1 nj xn+h -j = j=Lh 1/Jj Zn +h-j
( 1 3.2.42)
and
2 h-1
( 1 3.2.43)
j=O
Predicted values of X 20 1 , . . . , X 2 3 0 were computed for the data of Example
( 1 3.2. 1) using the fitted model ( 1 3.2.39). The predictors X 2oo + h were found
using a truncated and mean-corrected version of ( 1 3.2.42), namely
1 99+h
x200 +h + .0433 = - L nj (X200 +h-j + .0433),
j= 1
2
a; (h) = E(Xn+h - xn+h ) = CJ L 1/1} .
from which X2 0 1 , X202, . . . , can be found recursively. The predictors are
shown with the corresponding observed values X 20 1 , . . . , X 2 3 0 in Figure 1 3 .6.
Also shown are the predictors based on the ARMA (3, 3) model,
3
( I - . 1 32B + .037 B 2 - .407 B ) (X, + .0433)
= Z, + 1 .061 ZH + .49 1 Z, _ 2 + .21 8Z, _ 3 ,
{ Z,}
�
WN (0, .440),
which was fitted to the data {X,, t = 1, . . . , 200} using the methods of Section
9.2. The predictors based on the latter model converge much more rapidly to
the mean value, - .0433, than those based on the long memory model.
1 3. Further Topics
534
210
200
230
220
Figure 1 3.6. The data X 2 0 1 , . . . , X 2 3 0 o f Example 1 3.2. 1 showing the predictors based
on the ARMA model (lower) and the long memory model (upper).
The average squared errors of the 30 predictors are 1 .43 for the long
memory model and 2.35 for the ARMA model. Although the ARMA model
appears to fit the data very well, with an estimated one-step mean square
prediction variance .440 as compared with .514 for the long memory model,
the value .440 is a low estimate of the true value (.483) for the process (1 3.2.40)
from which the series was generated. Both predictors will, as the lead time
h -+ oo, have bias - .0433 and asymptotic variance 2. 1 73, the variance of the
generating process. It is interesting to compare the rates of approach of
the two predictor variances to their asymptotic values. This is done in Table
1 3.2 which shows the ratio avo-;( 00 ) for both models as computed from
( 1 3.2.43). It is apparent that the ARMA predictors are not appreciably better
than the mean for lead times of 10 or more, while the value of the long memory
predictor persists for much greater lead times.
Table 1 3.2. a;(h)/O';(oo) for the Long Memory and ARMA Models for the Data of
Example 1 3.2. 1
h
2
3
4
5
10
20
30
40
50
Long memory model
.250
.606
.685
.728
.757
.830
.887
.9 14
.932
.945
ARMA model
.261
.632
.730
.844
.922
.993
1 .000
1 .000
1 .000
1 .000
§1 3.3. Linear Processes with Infinite Variance
535
§ 1 3.3 Linear Processes with Infinite Variance
There has recently been a great deal of interest in modelling time series using
ARMA processes with infinite variance. Examples where such models appear
to be appropriate have been found by Stuck and Kleiner (1974), who considered
telephone signals, and Fama ( 1 965), who modelled stock market prices. Any
time series which exhibits sharp spikes or occasional bursts of outlying obser­
vations suggests the possible use of an infinite variance model. In this section
we shall restrict attention to processes generated by application of a linear
filter to an iid sequence, { Z,, t = 0, ± I , . . . }, of random variables whose dis­
tribution F has Pareto-like tails, i.e.
x) � pC, as x � oo ,
xaF( - x) = xaP(Z, :::;; - x) � qC, a s X � oo ,
{xa(l - F(x)) = xa P(Z1
>
( 1 3.3. 1)
where 0 < rx. < 2, 0 :::;; p = 1 - q :::;; 1, and C is a finite positive constant which
we shall call the dispersion, disp(Z,), of the random variable Z, . From ( 1 3.3. 1 )
we can write
xa( l - F(x) + F( - x)) = xaP( I Z,I > x) � C as x � oo .
A straightforward calculation (Problem ( 1 3.7)) shows that
if b ;:::: ('1,
if b < ('1,,
( 1 3.3.2)
( 1 3.3.3)
Hence Var(Z,) = oo for 0 < rx. < 2 and E I Z, I < oo only if 1 < rx. < 2. An im­
portant class of distributions satisfying ( 1 3.3. 1 ) consists of the non-normal
stable distributions.
Definition 13.3.1 (Stable Distributions). A random variable Z is said to be stable, or to have a stable distribution, if for every positive integer n there exist constants a_n > 0 and b_n such that the sum Z_1 + ··· + Z_n has the same distribution as a_n Z + b_n for all iid random variables Z_1, ..., Z_n with the same distribution as Z.
Properties of a Stable Random Variable, Z

Some of the important properties of Z are listed below. For an extensive discussion of stable random variables see Feller (1971), pp. 568-583, but note the error in sign in equation (3.18).

1. The characteristic function, φ(u) = E exp(iuZ), is given by

   φ(u) = exp{iuβ − d|u|^α (1 − iθ sgn(u) tan(πα/2))}  if α ≠ 1,
   φ(u) = exp{iuβ − d|u| (1 + iθ(2/π) sgn(u) ln|u|)}    if α = 1,       (13.3.4)

   where sgn(u) is u/|u| if u ≠ 0, and zero otherwise. The parameters α ∈ (0, 2], β ∈ ℝ, d^{1/α} ∈ [0, ∞) and θ ∈ [−1, 1] are known as the exponent, location, scale and symmetry parameters respectively.
2. If α = 2 then Z ∼ N(β, 2d).
3. If θ = 0 then the distribution of Z is symmetric about β. The symmetric stable distributions (i.e. those which are symmetric about 0) have characteristic functions of the form

   φ(u) = exp{−d|u|^α}.       (13.3.5)

4. If α = 1 and θ = 0 then Z has the Cauchy distribution with probability density f(z) = (d/π)[d² + (z − β)²]^{−1}, z ∈ ℝ.
5. The symmetric stable distributions satisfy the property of Definition 13.3.1 with a_n = n^{1/α} and b_n = 0, since if Z, Z_1, ..., Z_n all have the characteristic function (13.3.5) and Z_1, ..., Z_n are independent, then

   E exp[iu(Z_1 + ··· + Z_n)] = e^{−nd|u|^α} = E exp[iuZn^{1/α}].

6. If F is the distribution function of Z and α ∈ (0, 2), then (13.3.1) is satisfied with p = (1 + θ)/2 and

   C = d/(Γ(1 − α)cos(πα/2))  if α ≠ 1,
   C = 2d/π                   if α = 1.       (13.3.6)
In the following proposition, we provide sufficient conditions under which the sum Σ_{j=−∞}^{∞} ψ_j Z_{t−j} exists when {Z_t} is an iid sequence satisfying (13.3.1).

Proposition 13.3.1. Let {Z_t} be an iid sequence of random variables satisfying (13.3.1). If {ψ_j} is a sequence of constants such that

Σ_{j=−∞}^{∞} |ψ_j|^δ < ∞  for some δ ∈ (0, α) ∩ [0, 1],       (13.3.7)

then the infinite series,

Σ_{j=−∞}^{∞} ψ_j Z_{t−j},

converges absolutely with probability one.
PROOF. First consider the case 1 < α < 2. Then by (13.3.3), E|Z_1| < ∞ and hence

E Σ_{j=−∞}^{∞} |ψ_j Z_{t−j}| = Σ_{j=−∞}^{∞} |ψ_j| E|Z_1| < ∞.

Thus Σ_{j=−∞}^{∞} |ψ_j Z_{t−j}| is finite with probability one.
Now suppose 0 < α ≤ 1. Since 0 < δ < 1, we can apply the triangle inequality |x + y|^δ ≤ |x|^δ + |y|^δ to the infinite sum Σ_{j=−∞}^{∞} |ψ_j Z_{t−j}|. Making use of (13.3.3) we then find that

E (Σ_{j=−∞}^{∞} |ψ_j Z_{t−j}|)^δ ≤ Σ_{j=−∞}^{∞} |ψ_j|^δ E|Z_1|^δ < ∞.

Hence Σ_{j=−∞}^{∞} |ψ_j Z_{t−j}| < ∞ with probability one.  □
Remark 1. The distribution of the infinite sum Σ_{j=−∞}^{∞} ψ_j Z_{t−j} satisfies (13.3.2). Specifically,

x^α P(|Σ_{j=−∞}^{∞} ψ_j Z_{t−j}| > x) → C Σ_{j=−∞}^{∞} |ψ_j|^α  as x → ∞

(see Cline, 1983).

Remark 2. If Z_1 has a symmetric stable distribution with characteristic function e^{−d|u|^α} (and dispersion C given by (13.3.6)), then Σ_{j=−∞}^{∞} ψ_j Z_{t−j} also has a symmetric stable distribution with dispersion C Σ_{j=−∞}^{∞} |ψ_j|^α.
Remark 3. The process defined by

X_t = Σ_{j=−∞}^{∞} ψ_j Z_{t−j},       (13.3.8)

where {ψ_j} and {Z_t} satisfy the assumptions of Proposition 13.3.1, exists with probability one and is strictly stationary, i.e. the joint distribution of (X_1, ..., X_k)′ is the same as that of (X_{1+h}, ..., X_{k+h})′ for all integers h and positive integers k (see Problem 13.8). In particular if the coefficients ψ_j are chosen so that ψ_j = 0 for j < 0 and

Σ_{j=0}^{∞} ψ_j z^j = θ(z)/φ(z),   |z| ≤ 1,       (13.3.9)

where θ(z) = 1 + θ_1 z + ··· + θ_q z^q and φ(z) = 1 − φ_1 z − ··· − φ_p z^p ≠ 0 for |z| ≤ 1, then it is easy to show that {X_t} as defined by (13.3.8) satisfies the ARMA equations φ(B)X_t = θ(B)Z_t. We record this result as a proposition.
Proposition 13.3.2. Let {Z_t} be an iid sequence of random variables with distribution function F satisfying (13.3.1). Then if θ(·) and φ(·) are polynomials such that φ(z) ≠ 0 for |z| ≤ 1, the difference equations

φ(B)X_t = θ(B)Z_t       (13.3.10)

have the unique strictly stationary solution,

X_t = Σ_{j=0}^{∞} ψ_j Z_{t−j},       (13.3.11)

where the coefficients {ψ_j} are determined by the relation (13.3.9). If in addition φ(z) and θ(z) have no common zeroes, then the process (13.3.11) is invertible if and only if θ(z) ≠ 0 for |z| ≤ 1.

PROOF. The series (13.3.11) converges absolutely with probability one by Proposition 13.3.1. The fact that it is the unique strictly stationary solution of (13.3.10) is established by an argument similar to that used in the proof of Theorem 3.1.1. Invertibility is established by arguments similar to those in the proof of Theorem 3.1.2. See Problem 13.9.  □
Although the process {X_t} defined by (13.3.8) is strictly stationary it is not second-order stationary since by Remark 1 and (13.3.3), E|X_t|² = ∞. Nevertheless we can still define, for such a process, an analogue of the autocorrelation function, namely

ρ(h) = Σ_{j=−∞}^{∞} ψ_j ψ_{j+h} / Σ_{j=−∞}^{∞} ψ_j²,   h = 1, 2, ....       (13.3.12)

We use the same notation as for the autocorrelation function of a second-order stationary process since if {Z_t} is replaced in (13.3.8) by a finite variance white noise sequence, then (13.3.12) coincides with the autocorrelation function of {X_t}. Our point of view in this section however is that ρ(h) is simply a function of the coefficients {ψ_j} in the representation (13.3.8), or a function of the coefficients {φ_j} and {θ_j} if {X_t} is an ARMA process defined as in (13.3.10). We can estimate ρ(h) using the sample autocorrelation function,

ρ̂(h) = Σ_{t=1}^{n−h} X_t X_{t+h} / Σ_{t=1}^{n} X_t²,   h = 1, 2, ...,

but it is by no means clear that ρ̂(h) is even a consistent estimator of ρ(h). However, from the following theorem of Davis and Resnick (1986), we find that ρ̂(h) is not only consistent but has other good properties as an estimator of ρ(h).
Theorem 13.3.1. Let {Z_t} be an iid symmetric sequence of random variables satisfying (13.3.1) and let {X_t} be the strictly stationary process,

X_t = Σ_{j=−∞}^{∞} ψ_j Z_{t−j},

where Σ_{j=−∞}^{∞} |j| |ψ_j|^δ < ∞ for some δ ∈ (0, α) ∩ [0, 1].
Then for any positive integer h,

(n/ln(n))^{1/α} (ρ̂(1) − ρ(1), ..., ρ̂(h) − ρ(h))′ ⇒ (Y_1, ..., Y_h)′,       (13.3.13)

where

Y_k = Σ_{j=1}^{∞} [ρ(k+j) + ρ(k−j) − 2ρ(j)ρ(k)] S_j / S_0,   k = 1, ..., h,

and S_0, S_1, ..., are independent stable random variables; S_0 is positive stable with characteristic function,

E exp(iuS_0) = exp{−C Γ(1 − α/2) cos(πα/4) |u|^{α/2} (1 − i sgn(u) tan(πα/4))},       (13.3.14)

and S_1, S_2, ..., are iid with characteristic function,

E exp(iuS_1) = exp{−C² Γ(1 − α) cos(πα/2) |u|^α}  if α ≠ 1,
E exp(iuS_1) = exp{−C² π |u| / 2}                  if α = 1.       (13.3.15)

If α > 1 then (13.3.13) is also true when ρ̂(h) is replaced by its mean-corrected version, Σ_{t=1}^{n−h} (X_t − X̄)(X_{t+h} − X̄) / Σ_{t=1}^{n} (X_t − X̄)², where X̄ = n^{−1}(X_1 + ··· + X_n).
It follows at once from this theorem that ρ̂(h) →_p ρ(h), and more specifically that

ρ̂(h) − ρ(h) = O_p([n/ln(n)]^{−1/α}) = o_p(n^{−1/β})

for all β > α. This rate of convergence to zero compares favourably with the slower rate, O_p(n^{−1/2}), for the difference ρ̂(h) − ρ(h) in the finite variance case.
The form of the asymptotic distribution of ρ̂(h) can be somewhat simplified. In order to do this, note that Y_h has the same distribution as

[Σ_{j=1}^{∞} |ρ(h+j) + ρ(h−j) − 2ρ(j)ρ(h)|^α]^{1/α} U/V,       (13.3.16)

where V (≥ 0) and U are independent random variables with characteristic functions given by (13.3.14) and (13.3.15) respectively with C = 1. Percentiles of the distribution of U/V can be found either by simulation of independent copies of U/V or by numerical integration of the joint density of (U, V) over an appropriate region. Except when α = 1, the joint density of U and V cannot be written down in closed form. In the case α = 1, U is a Cauchy random variable with probability density f_U(u) = ½[π²/4 + u²]^{−1} (see Property 4 of stable random variables), and V is a non-negative stable random variable with density (see Feller (1971)), f_V(v) = ½ v^{−3/2} e^{−π/(4v)}, v ≥ 0. The distribution function of U/V is therefore given by

P(U/V ≤ x) = ∫_0^∞ P(U ≤ xy) f_V(y) dy
           = ∫_0^∞ 2^{−1/2} (πw)^{−3/2} [arctan(xw) + (π/2)] exp(−1/(2w)) dw.       (13.3.17)

Notice also that U/V has the same distribution as the product of a standard Cauchy random variable (with probability density π^{−1}(1 + x²)^{−1}) and an independent random variable distributed as χ²(1).
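The representation just mentioned gives a simple way to approximate percentiles of U/V by simulation, as an alternative to numerical integration of (13.3.17). The sketch below is minimal; the sample size and seed are arbitrary choices of ours.

import numpy as np

rng = np.random.default_rng(0)

# For alpha = 1, U/V has the same distribution as (standard Cauchy) x (chi-squared(1)),
# as noted above, so its percentiles can be estimated by straightforward simulation.
n_rep = 1_000_000
ratio = rng.standard_cauchy(n_rep) * rng.chisquare(1, n_rep)

print(np.quantile(ratio, 0.975))   # estimate of the .975 quantile of U/V

The estimate can be compared with the value 12.4 obtained by numerical integration of (13.3.17) in Example 13.3.1 below.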
EXAMPLE 13.3.1 (An Infinite Variance Moving Average Process). Let {X_t} be the MA(q) process,

X_t = Z_t + θ_1 Z_{t−1} + ··· + θ_q Z_{t−q},

where the sequence {Z_t} satisfies the assumptions of Theorem 13.3.1. Since ρ(h) = 0 for |h| > q, the theorem implies in this case that

(n/ln(n))^{1/α} (ρ̂(h) − ρ(h)) ⇒ (1 + 2 Σ_{j=1}^{q} |ρ(j)|^α)^{1/α} U/V,   h > q,

where the right-hand side reduces to U/V if q = 0.
Two hundred simulated values of the MA(1) process

X_t = Z_t + .4Z_{t−1},       (13.3.18)

with {Z_t} an iid standard Cauchy sequence (i.e. E e^{iuZ_t} = e^{−|u|}), are shown in Figure 13.7. The corresponding function ρ̂(·) is shown in Figure 13.8. Except for the value at lag 7, the graph of ρ̂(h) does suggest that the data is a realization of an MA(1) process. Furthermore the moment estimator, θ̂, of θ is .394, agreeing well with the true value θ = .40. (θ̂ is the root in [−1, 1] of ρ̂(1) = θ/(1 + θ²). If there is no such root, we define θ̂ = sgn(ρ̂(1)) as in Section 8.5.)
Figure 13.7. Two hundred simulated values of the MA(1) process, X_t = Z_t + .4Z_{t−1}, where {Z_t} is an iid standard Cauchy sequence.
Figure 13.8. The function ρ̂(h) for the simulated Cauchy MA(1) series of Example 13.3.1.
The .975 quantile of U/V for the process (13.3.18) is found numerically from (13.3.17) to have the value 12.4. By Theorem 13.3.1, approximately 95% confidence bounds for ρ(1) are therefore given by

ρ̂(1) ± 12.4(|1 − 2ρ̂²(1)| + |ρ̂(1)|)(ln(n)/n) = .341 ± .364.

These are not particularly informative bounds when n = 200, but the difference between them decreases rapidly as n increases. In simulation studies it has been found moreover that ρ̂(h) gives good estimates of ρ(h) even when n = 200. Ten thousand samples of {X_1, ..., X_200} for the process (13.3.18) gave 10,000 values of ρ̂(1), from which the sample mean and variance were found to be .341 and .0024 respectively. For a finite-variance MA(1) process, Bartlett's formula gives the value, v = (1 − 3ρ²(1) + 4ρ⁴(1))/n, for the asymptotic variance of ρ̂(1). Setting n = 200 and ρ(1) = .4/(1 + .4²) = .345, we find that v = .00350. Thus the sample variance of ρ̂(1) for 200 observations of the Cauchy process (13.3.18) compares favourably with the asymptotic approximation to the variance of ρ̂(1) for 200 observations of the corresponding finite-variance process. Analogous remarks apply to the moment estimator, θ̂, of the coefficient of the MA(1) process. From our 10,000 realizations of {X_1, ..., X_200}, the sample mean and variance of θ̂ were found to be .401 and .00701 respectively. The variance of the moment estimator, θ̂, for a finite-variance MA(1) process is n^{−1}(1 + θ² + 4θ⁴ + θ⁶ + θ⁸)/(1 − θ²)² (see Section 8.5). When n = 200 and θ = .4 this has the value .00898, which is somewhat larger than the observed sample variance, .00701, of θ̂ for the Cauchy process.
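A simulation study of this kind is easy to repeat. The sketch below is a minimal illustration rather than the computation behind the figures quoted above: the seed is arbitrary, and we use the non-mean-corrected form of ρ̂(h) given earlier in this section, so the resulting means and variances will only approximate .341, .0024, .401 and .00701.

import numpy as np

rng = np.random.default_rng(1)

def sample_acf1(x):
    # rho_hat(1) = sum_{t=1}^{n-1} X_t X_{t+1} / sum_{t=1}^{n} X_t^2  (no mean correction)
    return np.sum(x[:-1] * x[1:]) / np.sum(x ** 2)

def moment_estimate(r1):
    # Root in [-1, 1] of r1 = theta/(1 + theta^2); sgn(r1) if no such root (Section 8.5)
    if abs(r1) >= 0.5:
        return np.sign(r1)
    if r1 == 0.0:
        return 0.0
    return (1 - np.sqrt(1 - 4 * r1 ** 2)) / (2 * r1)

theta, n, n_rep = 0.4, 200, 10_000
r1 = np.empty(n_rep)
th = np.empty(n_rep)
for i in range(n_rep):
    z = rng.standard_cauchy(n + 1)
    x = z[1:] + theta * z[:-1]            # the Cauchy MA(1) process (13.3.18)
    r1[i] = sample_acf1(x)
    th[i] = moment_estimate(r1[i])

print(r1.mean(), r1.var())                # compare with .341 and .0024 reported above
print(th.mean(), th.var())                # compare with .401 and .00701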
EXAMPLE 13.3.2 (An Infinite Variance AR(1) Process). Figure 13.9 shows 200 simulated values {X_1, ..., X_200} of the AR(1) process,

X_t = .7X_{t−1} + Z_t,

where {Z_t} is again an iid standard Cauchy sequence with E e^{iuZ_t} = e^{−|u|}. Each observed spike in the graph corresponds to a large value of Z_t. Starting from each spike, the absolute value of X_t decays geometrically and then fluctuates near zero until the next large value of Z_t gives rise to a new spike. The graph of ρ̂(h) resembles a geometrically decreasing function as would be expected from a finite-variance AR(1) process (Figure 13.10). The "Yule-Walker" estimate of φ is ρ̂(1) = .697, which is remarkably close to the true value, φ = .7. From 10,000 simulations of the sequence {X_1, ..., X_200}, the sample mean of ρ̂(1) was found to be .692 and the sample variance was .0025. For an AR(1) process with finite variance, the asymptotic variance of ρ̂(1) is (1 − φ²)/n (see Example 7.2.3). When n = 200 and φ = .7, this is equal to .00255, almost the same as the observed sample variance in the simulation experiment. The performance of the estimator ρ̂(1) of φ in this case is thus very close, from the point of view of sample variance, to that of the Yule-Walker estimator in the finite variance case.
Linear Prediction of ARMA Processes with Infinite Variance. Let {X_t} be the strictly stationary ARMA process defined by (13.3.10) with φ(z)θ(z) ≠ 0 for all z ∈ ℂ such that |z| ≤ 1. Suppose also that the iid sequence {Z_t} satisfies
Figure 13.9. Two hundred simulated values of the AR(1) process, X_t = .7X_{t−1} + Z_t, where {Z_t} is an iid standard Cauchy sequence.
Figure 13.10. The function ρ̂(h) for the simulated Cauchy AR(1) series of Example 13.3.2.
(13.3.1). In assessing the performance of the linear predictor,

X̂_{n+1} = Σ_{j=1}^{n} a_{nj} X_{n+1−j},       (13.3.19)

we cannot consider E(X_{n+1} − X̂_{n+1})² as we did for second order processes since this expectation is infinite. Other criteria for choosing a "best" predictor which have been suggested include minimization of the expected absolute error (when α > 1), and the use of a pseudo-spectral technique (Cambanis and Soltani (1982)). Here we shall consider just one criterion, namely minimization of the error dispersion (see (13.3.1)). Using (13.3.11) we can rewrite X̂_{n+1} in the form

X̂_{n+1} = Σ_{j=0}^{∞} (a_{n1}ψ_j + a_{n2}ψ_{j−1} + ··· + a_{nn}ψ_{j−n+1}) Z_{n−j},       (13.3.20)

and using (13.3.11) again we obtain

X_{n+1} − X̂_{n+1} = Z_{n+1} + Σ_{j=0}^{∞} (ψ_{j+1} − a_{n1}ψ_j − ··· − a_{nn}ψ_{j−n+1}) Z_{n−j}.       (13.3.21)

Since {Z_t} is assumed to have dispersion C, it follows from Remark 2 that

disp(X_{n+1} − X̂_{n+1}) = C (1 + Σ_{j=0}^{∞} |ψ_{j+1} − a_{n1}ψ_j − ··· − a_{nn}ψ_{j−n+1}|^α).       (13.3.22)
In the special case when Z_t has the symmetric stable distribution with exponent α ∈ (0, 2) and scale parameter d^{1/α} (i.e. E e^{iuZ_t} = exp(−d|u|^α)), the dispersion of Z_t (see Property 6) is C = d/[Γ(1 − α)cos(πα/2)], α ≠ 1, and C = 2d/π, α = 1. The prediction error is also symmetric stable with dispersion given by (13.3.22). Minimization of (13.3.22) is therefore equivalent to minimization of the scale parameter of the error distribution and hence to minimization of P(|X_{n+1} − X̂_{n+1}| > s) for every s > 0. The minimum dispersion criterion is useful also in regression problems (Blattberg and Sargent (1971)) and Kalman filtering problems (Stuck (1978)) associated with stable sequences. For general sequences {Z_t} satisfying (13.3.1) the minimum dispersion criterion minimizes the tail probabilities of the distribution of the prediction error.
The minimization of (13.3.22) for α ∈ (0, 2) is rather more complicated than in the case α = 2 and the best predictor is not in general unique. For a general discussion of this problem (and the related problem of finding h-step predictors) see Cline and Brockwell (1985). Here we shall simply state the results for an MA(1) process and, when Z_t has a Cauchy distribution, compare the minimum dispersion predictor X̂_{n+1} = Σ_{j=1}^{n} a_{nj} X_{n+1−j} with the predictor X*_{n+1} = Σ_{j=1}^{n} φ_{nj} X_{n+1−j} obtained by assuming that {Z_t} has finite variance.
Proposition 13.3.3. If X_t = Z_t + θZ_{t−1}, where {Z_t} is an iid sequence with distribution function satisfying (13.3.1), then the minimum dispersion linear predictor X̂_{n+1} of X_{n+1} based on X_1, ..., X_n is

X̂_{n+1} = −Σ_{j=1}^{n} (−θ)^j X_{n+1−j}   if α ≤ 1,

X̂_{n+1} = −Σ_{j=1}^{n} (−θ)^j [(1 − η^{n+1−j})/(1 − η^{n+1})] X_{n+1−j}   if α > 1,

where η = |θ|^{α/(α−1)}. The error dispersion of X̂_{n+1} is

C[1 + |θ|^{(n+1)α}]   if α ≤ 1,

C[1 + |θ|^{(n+1)α} ((1 − η)/(1 − η^{n+1}))^{α−1}]   if α > 1.

The minimum dispersion h-step predictor, h ≥ 2, is zero with error dispersion C[1 + |θ|^α].

PROOF. See Cline and Brockwell (1985).  □
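The following sketch simply evaluates the predictor coefficients of Proposition 13.3.3 for given θ, n and α; the function name and the numerical values in the example calls are our own illustrative choices.

import numpy as np

def min_dispersion_coeffs(theta, n, alpha):
    # Coefficients a_{n1}, ..., a_{nn} of X_hat_{n+1} = sum_j a_{nj} X_{n+1-j}
    # for the MA(1) model, following the statement of Proposition 13.3.3.
    j = np.arange(1, n + 1)
    a = -(-theta) ** j
    if alpha > 1:
        eta = abs(theta) ** (alpha / (alpha - 1))
        a = a * (1 - eta ** (n + 1 - j)) / (1 - eta ** (n + 1))
    return a

print(min_dispersion_coeffs(0.4, 5, 1.0))   # alpha <= 1: a_j = -(-theta)^j
print(min_dispersion_coeffs(0.4, 5, 1.5))   # alpha > 1: the same weights, shrunk

For α ≤ 1 these are the familiar truncated-inversion weights −(−θ)^j; for α > 1 the weights attached to the most distant observations are shrunk the most.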
EXAMPLE 13.3.3 (Linear Prediction of a Cauchy MA(1) Process). Suppose that

X_t = Z_t + θZ_{t−1},   |θ| < 1,       (13.3.23)

where {Z_t} is an iid standard Cauchy sequence, i.e. E e^{iuZ_t} = e^{−|u|}. Then condition (13.3.1) is satisfied with p = q = ½ and C = 2/π. By Proposition 13.3.3, the minimum dispersion one-step predictor is

X̂_{n+1} = −Σ_{j=1}^{n} (−θ)^j X_{n+1−j},       (13.3.24)

with corresponding error dispersion,

disp(X_{n+1} − X̂_{n+1}) = (2/π)(1 + |θ|^{n+1}).       (13.3.25)

If now we imagine {Z_t} in (13.3.23) to have finite variance and compute the best linear mean square predictor X*_{n+1}, we find from Problem 3.10 that

(1 − θ^{2n+2}) X*_{n+1} = −Σ_{j=1}^{n} [(−θ)^j − (−θ)^{2n+2−j}] X_{n+1−j},       (13.3.26)

and hence that

(1 − θ^{2n+2})(X̂_{n+1} − X*_{n+1}) = −Σ_{j=1}^{n} (1 − θ^{2j})(−θ)^{2n+2−j} X_{n+1−j}.       (13.3.27)

From (13.3.27) we can easily compute the error dispersion when the mean-square linear predictor X*_{n+1} is applied to the Cauchy process (13.3.23). We find that

disp(X_{n+1} − X*_{n+1}) = (2/π)(1 + |θ|^{n+1} (1 + |θ|)/(1 + |θ|^{n+1})),       (13.3.28)

which is clearly greater than the dispersion of (X_{n+1} − X̂_{n+1}) in (13.3.25).
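The two error dispersions (13.3.25) and (13.3.28) are easy to tabulate; the sketch below does this for θ = .4 (an arbitrary illustrative value) and a few sample sizes.

import numpy as np

def disp_min(theta, n):
    # (13.3.25): error dispersion of the minimum dispersion predictor, Cauchy noise
    return (2 / np.pi) * (1 + abs(theta) ** (n + 1))

def disp_ms(theta, n):
    # (13.3.28): error dispersion of the finite-variance mean square predictor
    t = abs(theta)
    return (2 / np.pi) * (1 + t ** (n + 1) * (1 + t) / (1 + t ** (n + 1)))

for n in (1, 2, 5, 10, 20):
    print(n, disp_min(0.4, n), disp_ms(0.4, n))

Both dispersions decrease to 2/π as n grows, in line with the remark below that the predictor based on the infinite past has one-step error dispersion equal to that of {Z_t}.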
The minimum dispersion linear predictor of X_{n+1} based on {X_j, −∞ < j ≤ n} turns out to be the same (for a causal invertible ARMA process) as the best linear mean square predictor computed on the assumption that {Z_t} has finite variance. The dispersion of the one-step prediction error is just the dispersion of {Z_t} (2/π in Example 13.3.3).
Although we have only considered linear prediction in this section, we should not forget the potential for improved prediction of infinite variance (and finite variance) processes using predictors which are non-linear in the observations. In the next section we give a brief introduction to non-linear time-series models, with particular reference to one of the families of non-linear models ("threshold models") which have been found useful in practice.
§13.4 Threshold Models

Linear processes of the form

X_t = Σ_{j=0}^{∞} ψ_j Z_{t−j},       (13.4.1)
where Z_t ∈ M_t = s̄p{X_s, −∞ < s ≤ t}, play an important role in time series analysis since for such processes the best mean square predictor, E(X_{t+h} | X_s, −∞ < s ≤ t), and the best linear predictor, P_{M_t} X_{t+h}, are identical. (In fact for the linear process (13.4.1) with {Z_t} ∼ WN(0, σ²), the two predictors are identical if and only if {Z_t} is a martingale difference sequence relative to {X_t}, i.e. if and only if E(Z_{t+1} | X_s, −∞ < s ≤ t) = 0 for all t (see Problem 13.11).) The Wold decomposition (Section 5.7) ensures that any purely non-deterministic stationary process can be expressed in the form (13.4.1) with {Z_t} ∼ WN(0, σ²), but the process {Z_t} is generally not an iid sequence and the best mean square predictor of X_{t+h} may be quite different from the best linear predictor. However, in the case when {X_t} is a purely non-deterministic Gaussian stationary process, the sequence {Z_t} in the Wold decomposition is Gaussian and therefore iid. Every stationary purely non-deterministic Gaussian process can therefore be generated by applying a causal linear filter to an iid Gaussian sequence. We shall therefore refer to such processes as Gaussian linear processes. They have the desirable property (like the more general linear process (13.4.1)) that P_{M_t} X_{t+h} = E(X_{t+h} | X_s, −∞ < s ≤ t).
Many of the time series encountered in practice exhibit characteristics not shown by linear Gaussian processes and so in order to obtain good models and predictors for such series it is necessary to relax either the Gaussian or the linear assumption. In the previous section we examined a class of non-Gaussian (infinite variance) linear processes. In this section we shall provide a glimpse of the rapidly expanding area of non-linear time series modelling and illustrate this with a threshold model proposed by Tong (1983) for the lynx data (Series G, Appendix A).
Properties of Gaussian linear processes which are sometimes found to be violated by observed time series are the following. A Gaussian linear process {X_t} is reversible in the sense that (X_{t_1}, ..., X_{t_n})′ has the same distribution as (X_{t_n}, ..., X_{t_1})′. (Except in a few special cases, linear, and hence ARMA processes, are reversible if and only if they are Gaussian (Weiss (1975), Breidt and Davis (1990)).) Deviations from this property are suggested by sample-paths which rise to their maxima and fall away at different rates (see, for example, the Wolfer sunspot numbers, Figure 1.5, and the logarithms to base 10 of the lynx data, Figure 13.11). Gaussian linear processes do not exhibit sudden bursts of outlying values as are sometimes observed in practice. Such behaviour can however be shown by non-linear processes (and by processes with infinite variance). Other characteristics suggesting deviation from a Gaussian linear model are discussed by Tong (1983).
If we restrict attention to second order properties of a time series, it will clearly not be possible to decide on the appropriateness or otherwise of a Gaussian linear model. To resolve this question we consider moments of order greater than two.
Let {X_t} be a process which, for some k ≥ 3, satisfies sup_t E|X_t|^k < ∞ and

E(X_{t_0} X_{t_1} ··· X_{t_j}) = E(X_{t_0+h} X_{t_1+h} ··· X_{t_j+h})
Figure 13.11. The logarithms to base 10 of the Canadian lynx series (1821-1934), showing 50 predicted values based on the observations up to 1920 and the autoregressive model (13.4.7).
for all t_0, t_1, ..., t_j, h ∈ {0, ±1, ...} and for all j ∈ {1, ..., k − 1}. The kth order cumulant C_k(r_1, ..., r_{k−1}) of {X_t} is then defined as the joint cumulant of the random variables X_t, X_{t+r_1}, ..., X_{t+r_{k−1}}, i.e. as the coefficient of i^k z_1 z_2 ··· z_k in the Taylor expansion about (0, ..., 0) of

χ(z_1, ..., z_k) := ln E[exp(iz_1 X_t + iz_2 X_{t+r_1} + ··· + iz_k X_{t+r_{k−1}})].

In particular, the third order cumulant function C_3 of {X_t} coincides with the third order central moment function, i.e.

C_3(r, s) = E[(X_t − μ)(X_{t+r} − μ)(X_{t+s} − μ)],   r, s ∈ {0, ±1, ...},       (13.4.2)

where μ = EX_t. If Σ_r Σ_s |C_3(r, s)| < ∞, we define the third order polyspectral density (or bispectral density) of {X_t} to be the Fourier transform,

f_3(ω_1, ω_2) = (2π)^{−2} Σ_{r=−∞}^{∞} Σ_{s=−∞}^{∞} C_3(r, s) e^{−irω_1 − isω_2},

in which case

C_3(r, s) = ∫_{−π}^{π} ∫_{−π}^{π} e^{irω_1 + isω_2} f_3(ω_1, ω_2) dω_1 dω_2.

[More generally, if the kth order cumulants C_k(r_1, ..., r_{k−1}) of {X_t} are absolutely summable, we define the kth order polyspectral density as the Fourier transform of C_k. For details see Rosenblatt (1985) and Priestley (1988).]
If {X_t} is a Gaussian linear process, it follows from Problem 13.12 that the cumulant function C_3 of {X_t} is identically zero. (The same is also true of all the cumulant functions C_k with k > 3.) Consequently f_3(ω_1, ω_2) = 0 for all ω_1, ω_2 ∈ [−π, π]. Appropriateness of a Gaussian linear model for a given data set can therefore be checked by using the data to test the null hypothesis, f_3 = 0. For details of such a test, see Subba-Rao and Gabr (1984).
If {X_t} is a linear process of the form (13.4.1) with E|Z_t|³ < ∞, EZ_t³ = η and Σ_{j=0}^{∞} |ψ_j| < ∞, it can be shown from (13.4.2) (see Problem 13.12) that the third order cumulant function of {X_t} is given by

C_3(r, s) = η Σ_{i=−∞}^{∞} ψ_i ψ_{i+r} ψ_{i+s}       (13.4.3)

(with ψ_j = 0 for j < 0) and hence that {X_t} has bispectral density,

f_3(ω_1, ω_2) = (η/4π²) ψ(e^{i(ω_1+ω_2)}) ψ(e^{−iω_1}) ψ(e^{−iω_2}),       (13.4.4)

where ψ(z) := Σ_{j=0}^{∞} ψ_j z^j. By Theorem 4.4.1, the spectral density of {X_t} is

f(ω) = (σ²/2π) |ψ(e^{−iω})|².

Hence

φ(ω_1, ω_2) := |f_3(ω_1, ω_2)|² / [f(ω_1) f(ω_2) f(ω_1 + ω_2)] = η²/(2πσ⁶).

Appropriateness of the linear process (13.4.1) for modelling a given data set can therefore be checked by using the data to test for constancy of φ(ω_1, ω_2) (see Subba-Rao and Gabr (1984)).
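For a linear process with only finitely many nonzero coefficients, (13.4.3) and (13.4.4) can be evaluated directly. The sketch below does this for an MA(1) with ψ = (1, .4); the third moment η = 2 of the noise, the function names and the evaluation points are hypothetical choices for illustration only.

import numpy as np

def third_cumulant(psi, eta, r, s):
    # C_3(r, s) = eta * sum_i psi_i psi_{i+r} psi_{i+s}, formula (13.4.3),
    # for a causal linear process with psi_j = 0 outside 0 <= j <= q.
    q = len(psi) - 1
    total = 0.0
    for i in range(q + 1):
        a, b = i + r, i + s
        if 0 <= a <= q and 0 <= b <= q:
            total += psi[i] * psi[a] * psi[b]
    return eta * total

def bispectral_density(psi, eta, w1, w2):
    # f_3(w1, w2) = (eta / 4 pi^2) psi(e^{i(w1+w2)}) psi(e^{-i w1}) psi(e^{-i w2}),
    # formula (13.4.4), with psi(z) = sum_j psi_j z^j.
    def psi_z(z):
        return sum(p * z ** j for j, p in enumerate(psi))
    return (eta / (4 * np.pi ** 2)) * psi_z(np.exp(1j * (w1 + w2))) \
           * psi_z(np.exp(-1j * w1)) * psi_z(np.exp(-1j * w2))

psi, eta = [1.0, 0.4], 2.0          # MA(1) weights; eta is a hypothetical EZ^3
print(third_cumulant(psi, eta, 1, 1))
print(bispectral_density(psi, eta, 0.5, 1.0))

Testing constancy of φ(ω_1, ω_2) in practice of course requires estimates of f_3 and f computed from the data; see Subba-Rao and Gabr (1984).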
If it is decided that a linear Gaussian model is not appropriate, there is a choice of several families of non-linear processes which have been found useful for modelling purposes. These include bilinear models, autoregressive models with random coefficients and threshold models. Excellent accounts of these are available in the books of Subba-Rao and Gabr (1984), Nicholls and Quinn (1982) and Tong (1990) respectively.
Threshold models can be regarded as piecewise linear models in which the linear relationship varies with the values of the process. For example if R^{(i)}, i = 1, ..., k, is a partition of ℝ^p, and {Z_t} ∼ IID(0, 1), then the k difference equations,

X_t = σ^{(i)} Z_t + Σ_{j=1}^{p} φ_j^{(i)} X_{t−j},   (X_{t−1}, ..., X_{t−p})′ ∈ R^{(i)},   i = 1, ..., k,       (13.4.5)
define a threshold AR(p) model. Model identification and parameter estimation for threshold models can be carried out in a manner similar to that for linear models using maximum likelihood and the AIC criterion. Details can be found in the book of Tong (1990). It is sometimes useful to express threshold models in state-space form (cf. Section 12.1). For example, the model (13.4.5) can be re-expressed as

X_t = [0 0 ··· 0 1] S_t,

where S_t := (X_{t−p+1}, X_{t−p+2}, ..., X_t)′ satisfies the state equation,

S_{t+1} = F_t S_t + H_t Z_{t+1}.

This state representation differs from those in Chapter 12 in that the matrices F_t and H_t now depend on S_t. Thus

F_t = [ 0   1   0   ···   0
        0   0   1   ···   0
        ⋮                 ⋮
        0   0   0   ···   1
        φ_p^{(i)}  φ_{p−1}^{(i)}  φ_{p−2}^{(i)}  ···  φ_1^{(i)} ],       H_t = (0, 0, ..., 0, σ^{(i)})′,   if S_t ∈ R^{(i)}.
As an illustration of the use of threshold models, Tong identifies the following model for the logarithm to base 10 of the lynx data (1821-1920):

X_t = .802 + 1.068X_{t−1} − .207X_{t−2} + .171X_{t−3} − .453X_{t−4} + .224X_{t−5} − .033X_{t−6} + .174Z_t,   if X_{t−2} ≤ 3.05,
X_t = 2.296 + 1.425X_{t−1} − 1.080X_{t−2} − .091X_{t−3} + .237Z_t,   if X_{t−2} > 3.05,       (13.4.6)

where {Z_t} ∼ IID(0, 1). The model (13.4.6) is said to have a delay parameter b = 2 since the form of the equation specifying X_t is dependent on the value of X_{t−2}. It is easy to compute the best mean square predictor E(X_{n+h} | X_t, t ≤ n) for the model (13.4.6) if h = 1 or h = 2 but not if h > 2 (see Problem 13.13). More generally, if the delay parameter is b, computation of the best predictor is easy if h ≤ b but is extremely complicated otherwise. A natural approximation procedure for computing the best predictors given X_1, ..., X_n is to set Z_t = 0, t ≥ n + 1, in the recursions defining the process and then to solve recursively for X_{n+1}, X_{n+2}, .... Tong (1983), p. 187, refers to the predictors obtained in this way as the values of the eventual forecast function. For the logarithms of the lynx data and the model (13.4.6) the eventual forecast function exhibits a stable limit cycle of period 9 years with values (2.61, 2.67, 2.82, 3.02, 3.25, 3.41, 3.37, 3.13, 2.80). An alternative technique suggested by Tong for computing h-step predictors when h > b and the data is nearly cyclic with period T is to fit a new model with delay parameter kT + b where k is a positive integer. Under the new model, prediction can then be carried out for values of h up to kT + b. The most satisfactory general procedure for forecasting threshold models is to simulate future values of
the process using the fitted model and the observed data {X_1, ..., X_n}. From N simulated values of X_{n+h} we can construct a histogram which estimates the conditional density of X_{n+h} given the data. This procedure is implemented in the software package STAR of H. Tong (see Tong (1990)).
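A bare-bones version of this simulation procedure for the model (13.4.6) is sketched below. The driving noise is taken to be standard normal purely for illustration (the text above specifies only IID(0, 1)), and obs, the vector of logged lynx counts, is assumed to be supplied by the reader; neither choice comes from the book.

import numpy as np

rng = np.random.default_rng(2)

def setar_step(x, z):
    # One step of the threshold model (13.4.6); x[-1] = X_{t-1}, x[-2] = X_{t-2}, ...
    if x[-2] <= 3.05:
        return (0.802 + 1.068 * x[-1] - 0.207 * x[-2] + 0.171 * x[-3]
                - 0.453 * x[-4] + 0.224 * x[-5] - 0.033 * x[-6] + 0.174 * z)
    return 2.296 + 1.425 * x[-1] - 1.080 * x[-2] - 0.091 * x[-3] + 0.237 * z

def simulate_forecasts(data, h, n_paths=1000):
    # Monte Carlo draws from the h-step conditional distribution given the data.
    draws = np.empty(n_paths)
    for p in range(n_paths):
        x = list(data[-6:])                  # the model needs the last six values
        for _ in range(h):
            x.append(setar_step(x, rng.standard_normal()))   # assumed N(0,1) noise
        draws[p] = x[-1]
    return draws

# Hypothetical usage (obs not defined here):
# draws = simulate_forecasts(obs, h=3)
# np.histogram(draws) then estimates the conditional density of X_{n+3} given the data.

Replacing the random draws by zeros reproduces the eventual forecast function described above.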
It is interesting to compare the non-linear model (13.4.6) for the logarithms of the lynx data with the minimum AICC autoregressive model found for the same series using the program PEST, namely

X_t = 1.123 + 1.084X_{t−1} − .477X_{t−2} + .265X_{t−3} − .218X_{t−4} + .180X_{t−9} − .224X_{t−12} + Z_t,   {Z_t} ∼ WN(0, .0396).       (13.4.7)

The best linear mean-square h-step predictors, h = 1, 2, ..., 50, for the years 1921-1970 were found from (13.4.7). They are shown with the observed values of the series (1821-1934) in Figure 13.11. As can be seen from the graph, the h-step predictors execute slowly damped oscillations about the mean (2.880) of the first 100 observations. As h → ∞ the predictors converge to 2.880.
Figures 13.12 and 13.13 show respectively 150 simulated values of the processes (13.4.7) and (13.4.6). Both simulated series exhibit the approximate 9 year cycles of the data itself.
In Table 13.3 we show the last 14 observed values of X_t with the corresponding one-step predictors X̂_t based on (13.4.7) and the predictors X̃_t = E(X_t | X_s, s < t) based on (13.4.6).
The relative performance of the predictors can be assessed by computing s = (S/14)^{1/2} where S is the sum of squares of the prediction errors for
Figure 13.12. One hundred and fifty simulated values of the autoregressive model (13.4.7) for the transformed lynx series.
Figure 13.13. One hundred and fifty simulated values of the threshold model (13.4.6) for the transformed lynx series.
Table 13.3. The Transformed Lynx Data {X_t, t = 101, ..., 114} with the One-Step Predictors X̂_t Based on the Autoregressive Model (13.4.7), and X̃_t Based on the Threshold Model (13.4.6)

  t     X_t     X̂_t     X̃_t
 101   2.360   2.349   2.311
 102   2.601   2.793   2.877
 103   3.054   2.865   2.911
 104   3.386   3.231   3.370
 105   3.553   3.354   3.588
 106   3.468   3.329   3.426
 107   3.187   2.984   3.094
 108   2.724   2.668   2.771
 109   2.686   2.432   2.422
 110   2.821   2.822   2.764
 111   3.000   2.969   2.940
 112   3.201   3.242   3.246
 113   3.424   3.406   3.370
 114   3.531   3.545   3.447
t = 101, ..., 114. From the table we find that the value of s for the autoregressive predictor X̂_t is .138, while for the threshold predictor X̃_t the value of s is reduced to .120.
A bilinear model for the log lynx data can be found on p. 204 of Subba-Rao and Gabr (1984) and an AR(2) model with random coefficients on p. 143 of Nicholls and Quinn (1982). The values of s for predictors based on these models are .115 and .116 respectively. These values indicate the improvements attainable in this example by consideration of non-linear models.
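The two values of s quoted above can be checked directly from the entries of Table 13.3:

import numpy as np

# Observed values X_t, AR predictors (13.4.7) and threshold predictors (13.4.6)
# for t = 101, ..., 114, transcribed from Table 13.3.
x = np.array([2.360, 2.601, 3.054, 3.386, 3.553, 3.468, 3.187,
              2.724, 2.686, 2.821, 3.000, 3.201, 3.424, 3.531])
x_ar = np.array([2.349, 2.793, 2.865, 3.231, 3.354, 3.329, 2.984,
                 2.668, 2.432, 2.822, 2.969, 3.242, 3.406, 3.545])
x_thr = np.array([2.311, 2.877, 2.911, 3.370, 3.588, 3.426, 3.094,
                  2.771, 2.422, 2.764, 2.940, 3.246, 3.370, 3.447])

def s(pred):
    # s = sqrt(S/14), where S is the sum of squared one-step prediction errors
    return np.sqrt(np.mean((x - pred) ** 2))

print(round(s(x_ar), 3), round(s(x_thr), 3))   # rounds to .138 and .120 as quoted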
Problems
13.1. Suppose that {X_{t1}} and {X_{t2}} are related by the transfer function model,

X_{t2} = [2.5B/(1 − .7B)] X_{t1} + W_t,     (1 − .8B)X_{t1} = Z_t,

where {Z_t} ∼ WN(0, 1), {W_t} ∼ WN(0, 2), and {Z_t} and {W_t} are uncorrelated.
(a) Write down a state-space model for {(X_{t1}, X_{t2})′}.
(b) If W_100 = 1.3, X_{100,2} = 2.4, X_{100,1} = 3.15, find the best linear predictors of X_{101,2} and X_{102,2} based on {X_{t1}, X_{t2}, t ≤ 100}.
(c) Find the mean squared errors of the predictors in (b).

13.2. Show that the output, {X_{t2}}, of a transfer function model satisfies the equation (13.1.15).

13.3. Consider the transfer function model

X_{t2} = [B²/(1 − .5B)] X_{t1} + W_t,     (1 + .5B)X_{t1} = Z_t,

where {Z_t} and {W_t} are uncorrelated WN(0, 1) sequences. Let Q_t = P_{s̄p{X_{s1}, −∞ < s ≤ t}}.
(a) Show that P_n X_{n+1,1} = Q_n X_{n+1,1} and hence evaluate P_n X_{n+1,1}.
(b) Express P_n X_{n+1,2} and P_n X_{n+2,2} in terms of X_{j1}, X_{j2}, j ≤ n and W_n.
(c) Evaluate E(X_{n+1,2} − P_n X_{n+1,2})², E(X_{n+2,2} − P_n X_{n+2,2})².
(d) Show that the univariate process {X_{t2}} has the autocovariance function of an ARMA process and specify the process. (Hint: consider (1 − .25B²)X_{t2}.)
(e) Use (d) to compute E(X_{n+1,2} − R_n X_{n+1,2})².
1 3.4. Verify the calculations of I1�. 1 in Example 1 3. 1 . 1 (cont.).
13.5. Find a transfer function model relating the input and output series X_{t1}, X_{t2}, t = 1, ..., 200 (Series J and K respectively in the Appendix). Use the model to predict X_{201,2}, X_{202,2} and X_{203,2}. Compare the predictors and their mean squared errors with the corresponding predictors and mean squared errors obtained by modelling {X_{t2}} as a univariate ARMA process and with the results of Problem 11.10.

13.6. If {X_t} is the causal invertible ARIMA(p, d, q) process defined by (13.2.23) and X̄_n = n^{−1}(X_1 + ··· + X_n), show that

n^{1−2d} E(X̄_n − μ)² → C,

where C is a positive constant.

13.7. Verify the properties (13.3.3) for a random variable Z_1 whose distribution satisfies (13.3.1).

13.8. Show that the linear process (13.3.8) is strictly stationary if {ψ_j} and {Z_t} satisfy the conditions of Proposition 13.3.1.

13.9. Prove Proposition 13.3.2. (Note that Proposition 13.3.1 remains valid for strictly stationary sequences {Z_t} satisfying (13.3.1), and use arguments similar to those used in the proofs of Theorems 3.1.1 and 3.1.2.)
13.10. Modify Example 13.3.3 by supposing that {Z_t} is iid with E e^{iuZ_t} = e^{−|u|^α}, u ∈ ℝ, 0 < α < 2. Use Proposition 13.3.3 to show that for each fixed sample size n and coefficient θ ∈ (−1, 1), the ratio of the error dispersion of X*_{n+1} (see (13.3.28)) to that of X̂_{n+1} converges as α → 0 to 1 + n/2.

13.11. Let {X_t} be the process,

X_t = Σ_{j=0}^{∞} ψ_j Z_{t−j},   {Z_t} ∼ WN(0, σ²),

where ψ_0 ≠ 0, Σ_{j=0}^{∞} ψ_j² < ∞ and Z_t ∈ M_t = s̄p{X_s, −∞ < s ≤ t}. Show that

E(X_{t+1} | X_s, −∞ < s ≤ t) = P_{M_t} X_{t+1}

if and only if {Z_t} is a martingale difference sequence, i.e. if and only if E(Z_{t+1} | X_s, −∞ < s ≤ t) = 0 for all t.

13.12. If {X_t} is the linear process (13.4.1) with {Z_t} ∼ IID(0, σ²) and η = EZ_t³, show that the third order cumulant function of {X_t} is given by

C_3(r, s) = η Σ_{i=−∞}^{∞} ψ_i ψ_{i+r} ψ_{i+s}.

Use this result to establish equation (13.4.4). Conclude that if {X_t} is a Gaussian linear process, then C_3(r, s) = 0 and f_3(ω_1, ω_2) = 0.

13.13. Evaluate the best mean square predictors E(X_{t+h} | X_s, −∞ < s ≤ t), h = 1, 2, for the threshold model (13.4.6).
APPENDIX
Data Sets
All of the following data sets are listed by columns.
Series A. Level of Lake Huron in Feet (Reduced by 570), 1875-1972
10.38
1 1 .86
10.97
10.80
9.79
10.39
10.42
10.82
1 1.40
1 1 .32
1 1 .44
1 1.68
1 1.17
1 0.53
1 0.01
9.9 1
9.14
9. 1 6
9.55
9.67
8.44
8.24
9. 1 0
9.09
9.35
8.82
9.32
9.01
9.00
9.90
9.83
9.72
9.89
10.01
9.37
8.69
8.19
8.67
9.55
8.92
8.09
9.37
1 0. 1 3
1 0. 14
9.5 1
9.24
8.66
8.86
8.05
7.79
6.75
6.75
7.82
8.64
10.58
9.48
7.38
6.90
6.94
6.24
6.84
6.85
6.90
7.79
8. 1 8
7.51
7.23
8.42
9.61
9.05
9.26
9.22
9.38
9.10
7.95
8. 1 2
9.75
1 0.85
10.41
9.96
9.61
8.76
8. 1 8
7.21
7. 1 3
9.10
8.25
7.91
6.89
5.96
6.80
7.68
8.38
8.52
9.74
9.3 1
9.89
9.96
Series B. Dow Jones Utilities Index, Aug. 28-Dec. 18, 1972
1 10.94
1 1 0.69
1 1 0.43
1 1 0.56
1 10.75
1 1 0.84
1 10.46
1 1 0.56
1 1 0.46
1 10.05
1 09.60
1 09.31
1 09.3 1
1 09.25
1 09.02
1 08.54
1 08.77
1 09.02
1 09.44
1 09.38
1 09.53
1 09.89
1 10.56
1 10.56
1 1 0.72
1 1 1 .23
1 1 1 .48
1 1 1 .58
1 1 1 .90
1 1 2. 1 9
1 1 2.06
1 1 1 .96
1 1 1 .68
1 1 1 .36
1 1 1 .42
1 1 2.00
1 1 2.22
1 1 2.70
1 1 3. 1 5
1 14.36
1 14.65
1 1 5.06
1 1 5.86
1 1 6.40
1 1 6.44
1 1 6.88
1 1 8.07
1 1 8.51
1 19.28
1 19.79
1 19.70
1 19.28
1 19.66
1 20. 14
120.97
121.13
1 2 1 .55
1 2 1 .96
122.26
123.79
1 24. 1 1
1 24.1 4
123.37
1 23.02
1 22.86
1 23.02
1 23. 1 1
1 23.05
1 23.05
1 22.83
123. 1 8
1 22.67
1 22.73
1 22.86
1 22.67
1 22.09
1 22.00
1 2 1 .23
Series C. Private Housing Units Started, U.S.A. (Monthly). [Makridakis 922]
1 361
1 278
1443
1 524
1 483
1404
1450
1517
1 324
1 533
1 622
1 564
1 244
1 456
1 534
1 689
1641
1 588
1614
1639
1 763
1 779
1 622
1491
1603
1 820
1517
1448
1 467
1 550
1 562
1 569
1455
1 524
1486
1 484
1361
1433
1423
1438
1478
1 488
1 529
1 432
1 482
1 452
1 460
1 656
1 370
1 378
1 394
1 352
1 265
1 194
1 086
1 1 19
1 046
843
961
990
1 067
1 123
1 056
1 09 1
1 304
1 248
1 364
1 407
142 1
149 1
1 538
1 308
1 380
1 520
1 466
1 554
1 408
1405
1 5 12
1 495
1 556
1 569
1 630
1 548
1 769
1 705
1 56 1
1 524
1 583
1 528
1 368
1 358
1 507
1381
1 229
1 327
1 085
1 305
1319
1 264
1 290
1 385
1517
1 399
1 534
1 580
1 647
1 893
1 828
1 741
1910
1 986
2049
2026
2083
2 1 58
2041
2 1 28
2 1 82
2295
2494
2390
2334
2249
2221
2254
2252
2382
248 1
2485
2421
2366
248 1
2289
2365
2084
2266
2067
2 1 23
205 1
1 874
1 677
1 724
1 526
Series D. Industrial Production, Austria (Quarterly). [Makridakis 337]
54. 1
59.5
56.5
63.9
57.8
62.0
58.5
65.0
59.6
63.6
60.4
66.3
60.6
66.8
63.2
7 1 .0
66.5
72.0
67.8
75.6
69.2
74. 1
70.7
77.8
72.3
78. 1
72.4
82.6
72.9
79.5
72.6
82.8
76.0
85. 1
80.5
89. 1
84.8
94.2
89.5
99.3
93.1
103.5
96.4
107.2
101.7
1 09.5
101.3
1 12.6
1 05.5
1 1 5.4
1 08.0
1 29.9
1 12.4
1 23.6
1 14.9
1 3 1 .0
1 22.6
1 3 1 .9
1 20.5
1 30.7
1 1 5.7
1 19.7
1 09.7
1 25.1
Series E. Industrial Production, Spain (Monthly). [Makridakis 868]
128
1 34
133
141
1 34
142
143
1 36
108
142
146
149
141
1 56
151
1 60
1 56
1 60
161
149
1 18
147
1 58
146
1 32
1 39
1 39
137
144
146
149
142
101
141
1 22
1 45
1 48
137
1 37
1 55
152
1 53
1 52
1 53
1 13
151
1 59
1 65
161
1 60
1 67
1 78
1 67
1 76
1 73
1 64
1 23
1 75
1 75
1 76
1 74
Series F. General Index of Industrial Production (Monthly). [Makridakis 904]
96
97
99
100
1 02
1 06
1 02
80
1 04
1 04
1 07
102
1 00
1 07
1 10
111
1 14
113
1 09
91
1 16
118
123
121
1 19
125
131
132
1 35
138
132
1 06
1 34
132
1 36
1 32
1 33
140
142
146
148
149
1 47
1 15
1 50
1 52
1 58
1 54
1 55
1 59
1 60
1 63
1 67
1 60
1 62
1 26
1 60
161
1 67
1 67
1 64
1 65
1 73
1 79
181
1 82
1 75
131
183
181
1 75
1 82
1 82
1 85
191
191
188
1 39
1 89
1 90
1 99
193
1 95
200
205
208
216
216
210
1 69
217
213
220
217
3495
587
1 05
1 53
387
758
1 307
3465
6991
63 1 3
3794
1 836
345
382
808
1 388
271 3
3800
309 1
2985
3790
674
81
80
1 08
229
399
1 1 32
2432
3574
2935
1 537
529
485
662
1 000
1 590
2657
3396
83023
40748
35396
29479
42264
58 1 7 1
508 1 5
5 1 285
70229
76365
70407
41 839
45978
478 1 3
57620
66549
54673
55996
60053
39169
2 1 534
1 7857
2 1 788
33008
1 84
1 79
181
1 79
1 87
1 85
183
1 77
1 76
1 30
1 76
1 75
181
1 76
Series G. Annual Canadian Lynx Trappings, 1821-1934
269
321
585
871
1475
282 1
3928
5943
4950
2577
523
98
1 84
279
409
2285
2685
3409
1 824
409
151
45
68
213
546
1 033
2 1 29
2536
957
361
377
225
360
731
1 638
2725
2871
2 1 19
684
299
236
245
552
1 623
33 1 1
6721
4254
687
255
473
358
784
1 594
1 676
225 1
1426
756
299
201
229
469
736
2042
28 1 1
443 1
25 1 1
389
73
39
49
59
1 88
377
1 292
403 1
Series H. Annual Mink Trappings, 1848-1911
37123
347 1 2
296 1 9
21151
24859
25152
42375
50839
6 1 58 1
6 1 95 1
7623 1
63264
44730
3 1 094
49452
4396 1
61 727
60334
5 1 404
58451
73575
74343
27708
3 1985
39266
44740
60429
72273
792 14
79060
84244
62590
35072
36160
45600
47508
52290
1 10824
76503
64303
Series I. Annual Muskrat Trappings, 1848-1911
224347
1 79075
1 75472
194682
292530
493952
5 1 2291
345626
258806
302267
3 1 3502
254246
1 77291
206020
335385
357060
509769
41 8370
320824
4 1 2 1 64
6 1 808 1
4 1 4 1 73
232251
443999
703789
767896
671 982
523802
5833 19
437 1 2 1
486030
499727
478078
829034
1029296
1 069 1 83
1 083067
8 1 7003
347050
380 1 32
344878
2236 14
322 1 60
574742
8061 03
934646
648687
6748 1 1
8 1 3 1 59
551716
568934
701487
767741
928 1 99
1650214
1 488287
924439
1 056253
695070
407472
1 724 1 8
302195
7491 42
963597
5.06
4.93
6.00
5.59
6.24
3.79
4.5 1
6.00
4.76
5.55
4.74
5.61
6.52
5.95
4.38
5.95
5.80
4.87
6.47
4.01
5.57
6.30
3.34
6.38
4.73
4.70
6.29
1 .98
6.71
5.39
5.73
5.55
5.88
4.27
4.34
4.26
4.98
3.38
6. 14
5.45
5.92
2.62
7.68
3.37
3.90
7.01
1.24
9.03
1 .08
7.86
0.87
9.26
1 .29
7.59
3.90
4.37
6.00
3.56
5.92
4.38
6.24
6.57
3.05
6.43
4.03
5.86
4.82
2.90
5.65
5.14
3. 1 7
6.73
4.43
3.99
4.46
Series J. Simulated Input Series
6.94
3.48
5.92
5. 1 1
7.00
2.23
5.78
5.26
5. 19
4.83
4.48
5.98
4.45
5.73
6.38
2.47
7.43
2.73
5.90
5. 19
5.08
4.28
3.64
6.01
3.08
5.98
5.07
6.43
3.64
6.84
1 .49
7.39
2.36
5.24
4.63
6.87
2.8 1
6.27
4.98
3.45
4.90
4. 1 1
5.40
6.38
3.94
4.96
6.35
2.75
6.03
4.76
3.31
5.56
5.06
4.28
3.89
7.47
3.60
8.88
1 .26
7.78
2.76
9.23
2.95
6.38
4.35
5.33
5.00
4.87
6.74
3.37
6.56
3.45
4.54
4.95
4.34
3.69
7.40
3. 1 8
6.92
3.40
7.65
3.42
5.91
3.91
6.99
2. 1 5
7. 1 7
3.94
3.87
7.61
3.73
5.08
6.61
1 .94
6. 1 7
5.73
4. 14
5.76
3.56
5.44
4.8 1
5.46
5.01
3.05
6.48
6.35
3.86
5.01
6.49
3.43
7.70
2. 1 3
8.47
2.41
5.93
4.69
5.29
6.44
5.14
5.93
3.95
5.87
4.64
7.25
5.00
Series K. Simulated Output Series
1 5.21
21.13
14. 1 4
18.22
8.74
1 7.33
1 7. 1 0
22.20
4.44
1 1 .62
1 7.50
1 8.59
1 4.77
14.63
17.16
1 3.44
1 5.63
20.99
1 1 .98
20. 1 3
1 1 .90
1 3.94
14.06
19.71
1 5.73
1 2.79
14. 1 2
7.77
1 1 .03
1 3.36
1 7.36
1 2.82
22. 1 6
8.83
22.66
10.65
10.72
6.77
1 5.73
7.26
1 2.82
1 2.38
7.26
7.90
1 2. 1 1
14.56
1 7. 1 1
1 6. 1 5
1 7. 1 2
1 9.97
1 2.36
1 5.92
1 4.00
7.05
10. 1 9
10.78
9.21
7.36
1 7.01
1 2. 1 4
24.86
1 2.34
23.49
1 2.76
26.97
14.95
22.94
1 5.88
20.30
1 9. 1 3
1 7.26
1 9.99
1 5.76
19.82
1 1 .09
10.70
1 1 .71
1 0.65
7.61
1 7.73
9.71
1 8.58
1 5.04
24.44
14.33
1 7. 1 8
1 2.21
2 1 .90
1 2.28
20.73
1 1 .76
8.82
1 6.89
1 2.58
1 3.63
1 9.38
9.80
1 7.85
1 9.01
1 8.9 1
1 8.88
14.20
1 6.58
1 7.32
1 6.85
1 8.32
1 6. 1 1
1 9. 1 7
21.18
1 6. 1 7
1 2.02
21 .46
1 7.35
22.76
1 4.03
25.30
1 3.46
17.17
1 2.98
20.27
22.75
20.75
23.2 1
1 7.67
1 7.24
1 3.78
1 9.22
1 4.90
1 3.06
1 0.99
1 7.73
21.1 1
22.43
1 6.87
14.56
1 7.60
1 4.45
1 4.89
1 8.00
24.21
22.95
1 9.59
1 4.67
2 1 .08
24.71
20. 1 8
22. 1 8
1 8.08
1 7.29
1 6.30
8.69
1 6.46
14.26
1 1.23
1 2.90
5.82
1 5. 1 3
1 8.50
1 7.66
1 7.37
1 7.78
1 3.96
1 3.89
1 1.98
1 3 .02
1 0.64
1 8.09
1 9.92
24.02
1 3.62
24.69
1 3.59
10.97
1 6.25
4.93
21 .28
1 0.00
1 8.47
4.75
23.36
6.40
1 7.65
1 3.67
14. 1 5
14. 1 6
1 0.63
1 7.78
1 1.99
1 3.65
1 7.64
1 1.00
1 7.07
1 5.07
1 6.24
1 4.92
1 1.62
1 6.75
1 6.54
1 0.82
1 8.45
Bibliography
Ahlfors, L.V. (1953), Complex Analysis, McGraw-Hill, New York.
Akaike, H. ( 1969), Fitting autoregressive models for prediction, A nnals of the
Institute of Statistical Mathematics, Tokyo, 21, 243-247.
Akaike, H. ( 1 973), Information theory and an extension of the maximum likelihood
principle, 2nd International Symposium on Information Theory, B.N. Petrov and F.
Csaki (eds.), Akademiai Kiado, Budapest, 267-28 1 .
Akaike, H . ( 1 978), Time series analysis and control through parametric models,
Applied Time Series Analysis, D.F. Findley (ed.), Academic Press, New York.
Amos, D.E. and Koopmans, L.H. ( 1 963), Tables of the distribution of coherences for
stationary bivariate Gaussian processes, Sandia Corporation Monograph SCR-483.
Anderson, O.D. ( 1 976), Time Series Analysis and Forecasting. The Box-Jenkins Ap­
proach, Butterworths, London.
Anderson, T.W. ( 1 97 1 ), The Statistical Analysis of Time Series, John Wiley, New
York.
Anderson, T.W. ( 1 980), Maximum likelihood estimation for vector autoregressive
moving average models, Directions in Time Series, D.R. Brillinger and G.C. Tiao
(eds.), Institute of M athematical Statistics, 80- 1 1 1 .
Ansley, C. F . ( 1979), A n algorithm for the exact likelihood of a mixed autoregressive­
moving average process, Biometrika, 66, 59-65.
Ansley, C. F. and Kohn, R. ( 1 985), On the estimation of ARIMA models with missing
values, Time Series A nalysis of Irregularly Observed Data, E. Parzen (ed.), Springer
Lecture Notes in Statistics, 25, 9-37.
Aoki, M. ( 1987), State Space Modelling of Time Series, Springer-Verlag, Berlin.
Ash, R.B. ( 1 972), Real Analysis and Probability, Academic Press, New York.
Ash, R.B. and Gardner, M . F. ( 1 975), Topics in Stochastic Processes, Academic Press,
New York.
Bartlett, M .S. (1 955), An Introduction to Stochastic Processes, Cambridge University
Press.
Bell, W. and Hillmer, S. (1 990), Initializing the Kalman filter for non-stationary
time series models, Research Report, U.S. Bureau of the Census.
Berk, K .N. ( 1974), Consistent autoregressive spectral estimates, Ann. Statist., 2,
489-502.
Billingsley, P. ( 1986), Probability and Measure, 2nd ed., Wiley-lnterscience, New
York.
Birkhoff, G. and Mac Lane, S. (I 965), A Survey of Modern Algebra, MacMillan, New
York.
Blattberg, R . and Sargent, T. ( 1 97 1 ), Regression with non-Gaussian stable distur­
bances: Some sampling results, Econometrica, 39, 501 -5 10.
Bloomfield, P. ( 1 976), Fourier Analysis of Time Series: An Introduction, John Wiley,
New York.
Box, G.E.P. and Jenkins, G.M. ( 1 970), Time Series Analysis: Forecasting and Control,
Holden-Day, San Francisco.
Box, G.E.P. and Pierce, D.A. ( 1 970), Distribution of residual autocorrelations in
autoregressive-integrated moving average time series models. J. Amer. Statist.
Assoc. 65, 1 509-1 526.
Breidt, F.J. and Davis, R.A. (1991). Time-reversibility, identifiability and independence of innovations for stationary time series. J. Time Series Analysis, 13, 377-390.
Breiman, L. (1968), Probability, Addison-Wesley, Reading, Massachusetts.
Brillinger, D. R. ( 1 965), An introduction to polyspectra, Annals of Math. Statist., 36,
1 3 5 1 - 1 374.
Brillinger, D. R . ( 198 1 ), Time Series Analysis: Data Analysis and Theory, Holt, Rine­
h a rt & Winston, New York.
Brillinger, D.R. and Rosenblatt, M . ( 1 967), Asymptotic theory of estimates of kth
order spectra, Spectral Analysis of Time Series, B. Harris (ed.), John Wiley, New
York, 1 53 - 1 88.
Brillinger, D. R. and Rosenblatt, M. ( 1967), Computation and interpretation of kth
order spectra, Spectral Analysis of Time Series, B. Harris (ed.), John Wiley, New
York, 1 89-232.
Brockwell, P.J. and Davis, R.A. ( 1988a), Applications of innovation representations
in time series analysis, Probability and Statistics, Essays in Honor of Franklin A.
Graybill, J.N. Srivastava (ed.), Elsevier, Amsterdam, 6 1 -84.
Brockwell, P.J. and Davis, R.A. ( 1988b), Simple consistent estimation of the
coefficients of a linear filter, Stoch. Processes and Their Applications, 22, 47-59.
Brockwell, P.J., Davis, R.A. and Salehi, H. ( 1990), A state-space approach to transfer
function modelling, in Inference from Stochastic Processes, I.V. Basawa and N.U.
Prabhu (eds).
Burg, J.P. ( 1967), Maximum entropy spectral analysis, 37th Annual International
S.E.G. Meeting, Oklahoma City, Oklahoma.
Cambanis, S. and Soltani, A . R . ( 1 982), Prediction of stable processes: Spectral and
moving average representations, Z. Wahrscheinlichkeitstheorie verw. Geb., 66, 5936 1 2.
Chatfield, C. ( 1984), The Analysis of Time Series: An Introduction, 3rd ed., Chapman
and Hall, London.
Churchill, R.V. ( 1 969), Fourier Series and Boundary Value Problems, McGraw-Hill,
New York.
Cline, D.B.H. ( I 983), Estimation and linear prediction for regression, autoregression
and A R M A with infinite variance data, Ph.D. Dissertation, Statistics Department,
Colorado State University.
Cline, D.B.H. and Brockwell, P.J. ( 1 985), Linear prediction of A RM A processes with
infinite variance, Stoch. Processes and Their Applications, 1 9, 28 I -296.
Cooley, J.W. and Tukey, J.W. ( 1 965), An algorithm for the machine calculation of
complex Fourier series, Math. Comp., 19, 297-30 1 .
Cooley, J.W., Lewis, P.A.W. and Welch, P.O. ( 1 967), Historical notes o n the fast
Fourier transform, IEEE Trans. Electroacoustics, A U-15, 76-79.
Davies, N., Triggs, C.M . and Newbold, P. (1 977), Significance levels of the Box-Pierce
portmanteau statistic in finite samples, Biometrika, 64, 5 1 7-522.
Davis, H.F. (1 963), Fourier Series and Orthogonal Functions, Allyn & Bacon, Boston.
Davis, M.H.A. and Vinter, R.B. (1985), Stochastic Modelling and Control, Chapman
and Hall, London.
Davis, R.A. and Resnick, S.I. ( 1 986), Limit theory for the sample covariance and
correlation functions of moving averages, Ann. Statist., 14, 533-558.
Doob, J.L. (1 953), Stochastic Processes, John Wiley, New York.
Dunsmuir, W. and Hannan, E.J. ( 1 976), Vector linear time series models, Adv. Appl.
Prob., 8, 339-364.
Duong, Q.P. (1 984), On the choice of the order of autoregressive models: a ranking
and selection approach, J. Time Series Anal., 5, 145- 1 57.
Fama, E. ( 1965), Behavior of stock market prices, J. Bus. U. Chicago, 38, 34- 105.
Feller, W. (1971), A n Introduction to Probability Theory and Its A pplications, Vol. 2,
2nd ed., John Wiley, New York.
Fox, R. and Taqqu, M.S. (1 986), Large sample properties of parameter estimates
for strongly dependent stationary Gaussian time series, A nn. Statist., 14, 5 1 7532.
Fuller, W.A. ( 1 976), Introduction to Statistical Time Series, John Wiley, New York.
Gardner, G., Harvey, A.C. and Phillips, G.D.A. ( 1 980), An algorithm for exact
maximum likelihood estimation of autoregressive-moving average models by means
of Kalman filtering, Applied Statistics, 29, 3 1 1 -322.
Gentleman, W.M. and Sande, G. ( 1 966), Fast Fourier transforms for fun and profit,
AFIPS, Proc. 1 966 Fall Joint Computer Conference, Spartan, Washington, 28,
563-578.
Geweke, J. and Porter-Hudak, S. ( 1 983), The estimation and application of long­
memory time series models, J. Time Series Analysis, 4, 221 -238.
Gihman, I.I. and Skorohod, A.V. (1974), The Theory of Stochastic Processes I, translated by S. Kotz, Springer-Verlag, Berlin.
de Gooijer, J.G., Abraham, B., Gould, A. and Robinson, L. ( 1985), Methods of
determining the order of an autoregressive-moving average process: A survey, Int.
Statist, Review, 53, 301-329.
Gradshteyn, I.S. and Ryzhik, I . M . (1 965), Tables of Integrals, Series and Products,
Academic Press, New York.
Granger, C.W. ( 1 980), Long memory relationships and the aggregation of dynamic
models, Econometrics, 14, 227-238.
Granger, C. W .1. and Andersen, A.P. ( 1 978), Non-linear time series modelling, Applied
Time Series Analysis, D.F. Findley (ed.), Academic Press, New York.
Granger, C.W. and Joyeux, R. ( 1 980), An introduction to long-memory time series
models and fractional differencing, J. Time Series Analysis, 1, 1 5-29.
Gray, H .L., Kelley, G.D. and Mcintire, D.O. (1 978), A new approach to ARMA
modelling, Comm. Statist., B 7 , 1 -77.
Graybill, F.A. ( 1 983), Matrices with Applications in Statistics, Wadsworth, Belmont,
California.
Grenander, U. and Rosenblatt, M . ( 1 957), Statistical Analysis of Stationary Time
Series, John Wiley, New York.
Grunwald, G.K., Raftery, A.E. and Guttorp, P. (1989), Time series of continuous
proportions, Statistics Department Report, 6, The University of Melbourne.
Hannan, E.J. (1 970), Multiple Time Series, John Wiley, New York.
Hannan, E.J. ( 1973), The asymptotic theory of linear time series models, J. Appl. Prob.,
10, 1 30- 145.
Hannan, E.J. ( 1980). The estimation of the order of an ARMA process. Ann.
Statist. 8, 1 07 1- 1 08 1 .
Hannan, E.J. and Deistler, M. ( 1988), The Statistical Theory of Linear Systems, John
Wiley, New York.
Harvey, A.C. ( 1984), A unified view of statistical forecasting procedures, J. Forecasting,
3, 245-275.
Hosking, J. R. M . ( 1981 ), Fractional differencing, Biometrika, 68, 1 65- 1 76.
Hurst, H. (1951), Long-term storage capacity of reservoirs, Trans. Amer. Soc. Civil Engrs., 116, 778-808.
Hurvich, C.M . and Tsai, C.L. ( 1989). Regression and time series model selection
in small samples. Biometrika 76, 297-307.
Jagers, P. ( 1 975), Branching Processes with Biological Applications, Wiley-lnterscience,
London.
Jones, R.H. ( 1 975), Fitting autoregressions, J. Amer. Statist. Assoc., 70, 590-592.
Jones, R.H. ( 1 980), Maximum likelihood fitting of ARMA models to time series with
missing observations, Technometrics, 22, 389-395.
Jones, R.H. ( 1 985), Fitting multivariate models to unequally spaced data, Time Series
Analysis of Irregularly Observed Data, E. Parzen (ed.), Springer Lecture Notes in
Statistics, 25, 1 58 - 1 88.
Kailath, T. ( 1 968), An innovations approach to least squares estimation-Part 1 :
Linear filtering i n additive white noise, IEEE Transactions on Automatic Control,
A C- 13, 646-654.
Kailath, T. ( 1 970), The innovations approach to detection and estimation theory,
Proceedings IEEE, 58, 680-695.
Kalman, R.E. ( 1 960), A new approach to linear filtering and prediction problems,
Trans. A SME, J. Basic Eng., 83D, 35-45.
Kendall, M .G. and Stuart, A. ( 1 976), The Advanced Theory of Statistics, Vol. 3,
Griffin, London.
Kitagawa, G. ( 1987), Non-Gaussian state-space modelling of non-stationary time
series, J . A . S. A , 82 (with discussion), 1032-1063.
Koopmans, L.H. ( 1974), The Spectral Analysis of Time Series, Academic Press, New
York.
Lamperti, J. ( 1 966), Probability, Benjamin, New York.
Lawrance, A .J. and Kottegoda, N .T. ( 1 977), Stochastic modelling of riverflow time
series, J. Roy. Statist. Soc. Ser. A, 140, 1 -47.
Lehmann, E.L. ( 1 983), Theory of Point Estimation, John Wiley, New York.
Li, W.K. and McLeod, A.I. ( 1 986), Fractional time series modelling, Biometrika, 73,
2 1 7-22 1 .
Lii, K.S. and Rosenblatt, M . ( 1982), Deconvolution and estimation of transfer function
phase and coefficients for non-Gaussian linear processes, Ann. Statist., 10,
1 195-1 208.
Ljung, G.M. and Box, G.E.P. ( 1 978), On a measure of lack of fit in time series models,
Biometrika, 65, 297-303.
McLeod, A. I. and Hipel, K. W. ( 1 978), Preservation of the rescaled adjusted range, I.
A reassessment of the Hurst phenomenon, Water Resources Res., 14, 49 1 -508.
McLeod, A.I. and Li, W.K. ( 1 983), Diagnostic checking A RM A time series models
using squared-residual autocorrelations, J. Time Series A nalysis, 4, 269-273.
Mage, D.T. ( 1 982), An objective graphical method for testing normal distributional
assumptions using probability plots, A merican Statistician, 36, 1 1 6- 1 20.
Makridakis, S., Andersen, A . , Carbone, R., Fildes, R., Hibon, M . , Lewandowski, R.,
Newton, J., Parzen, E. and Winkler, R. ( 1 984), The Forecasting A ccuracy of Major
Time Series Methods, John Wiley, New York.
Melard, G. ( 1 984), A fast algorithm for the exact likelihood of moving average models,
Applied Statistics, 33, 104- 1 14.
Mood, A . M . , Graybill, F.A. and Boes, D.C. ( 1 974), Introduction to the Theory of
Statistics, McGraw-Hill, New York.
Nicholls, D.F. and Quinn, B.G. (1 982), Random Coefficient A utoregressive Models:
An Introduction, Springer Lecture Notes in Statistics, 1 1 .
Parzen, E. ( 1 974), Some recent advances in time series modelling, IEEE Transactions
on Automatic Control, A C-I9, 723-730.
Parzen, E. ( 1 978), Time series modeling, spectral analysis and forecasting, Directions
in Time Series, D.R . Brillinger and G.C. Tiao (eds.), Institute of Mathematical
Statistics, 80- 1 1 1 .
Priestley, M.B. ( 1 98 1 ), Spectral A nalysis and Time Series, Vols. I and 2, Academic
Press, New York.
Priestley, M.B. ( 1 988), Non-linear and Non-stationary Time Series Analysis, Academic
Press, London.
Rao, C.R . ( 1 973), Linear Statistical Inference and Its Applications, 2nd ed., John Wiley,
New York.
Rice, J. ( 1979). On the estimation of the parameters of a power spectrum, J.
Multivariate A nalysis, 9, 378-392.
Rissanen, J. (1973), A fast algorithm for optimum linear predictors, IEEE Transactions on Automatic Control, AC-18, 555.
Rissanen, J. and Barbosa, L. ( 1 969), Properties of infinite covariance matrices and
stability of optimum predictors, Information Sci. , I, 221 -236.
Rosenblatt, M. ( 1 985), Stationary Sequences and Random Fields, Birkhauser, Boston.
Schweppe, F.C. ( 1965), Evaluation of likelihood functions for Gaussian signals, IEEE
Transactions on Information Theory, IT- I I , 61 -70.
Seeley, R.T. ( 1 970), Calculus of Several Variables, Scott Foresman, Glenview, Illinois.
Serfling, R.J. ( 1980), Approximation Theorems of Mathematical Statistics, John Wiley,
New York.
Shapiro, S.S. and Francia, R.S. ( 1 972), An approximate analysis of variance test for
normality, J. Amer. Statist. Assoc., 67, 2 1 5-216.
Shibata, R . ( 1 976), Selection of the order of an autoregressive model by A kaike's
information criterion, Biometrika, 63, 1 1 7- 1 26.
Shibata, R. ( 1980). Asymptotically efficient selection of the order of the model for
estimating parameters of a linear process. Ann. Statist. 8, 1 47-1 64.
Simmons, G.F. ( 1 963), Introduction to Topology and Modern Analysis, McGraw-H ill,
New York.
Sorenson, H.W. and Alspach, D.L. ( 1 97 1 ), Recursive Bayesian estimation using
Gaussian sums, Automatica, 7, 465-479.
Sowell, F.B. ( 1990). Maximum likelihood estimation of stationary univariate
fractionally integrated time series models, J. of Econometrics, 53, 1 65 - 1 88.
Stuck, B.W. ( 1978), M inimum error dispersion linear filtering of scalar sym­
metric stable processes, IEEE Transactions on Automatic Control, AC-23, 507509.
Stuck, B.W. and Kleiner, B. (1 974), A statistical analysis of telephone noise, The Bell
System Technical Journal, 53, 1 263- 1 320.
Subba Rao, T. and Gabr, M . M . ( 1984), An Introduction to Bispectral Analysis and
Bilinear Time Series Models, Springer Lecture Notes in Statistics, 24.
Taqqu, M . S. ( 1 975), Weak convergence to fractional Brownian motion and to the
Rosenblatt process, Z. Wahrscheinlichkeitstheorie verw. Geb., JI, 287-302.
Tong, H. ( 1 983), Threshold Models in Non-linear Time Series Analysis, Springer
Lecture Notes in Statistics, 2 1 .
Tong, H . ( 1 990), Non-linear Time Series: A Dynamical Systems Approach, Oxford
University Press, Oxford.
Tukey, J. (1949), The sampling theory of power spectrum estimates, Proc. Symp. on Applications of Autocorrelation Analysis to Physical Problems, NAVEXOS-P-735, Office of Naval Research, Washington, 47-67.
Walker, A.M. (1964), Asymptotic properties of least squares estimates of the parameters of the spectrum of a stationary non-deterministic time series, J. Aust. Math. Soc., 4, 363-384.
Wampler, S. (1988), Missing values in time series analysis, Statistics Department, Colorado State University.
Weiss, G. (1975), Time-reversibility of linear stochastic processes, J. Appl. Prob., 12, 831-836.
Whittle, P. (1962), Gaussian estimation in stationary time series, Bull. Int. Statist. Inst., 39, 105-129.
Whittle, P. (1963), On the fitting of multivariate autoregressions and the approximate canonical factorization of a spectral density matrix, Biometrika, 50, 129-134.
Whittle, P. (1983), Prediction and Regulation by Linear Least-Square Methods, 2nd ed., University of Minnesota, Minneapolis.
Wilson, G.T. (1969), Factorization of the generating function of a pure moving average process, SIAM J. Num. Analysis, 6, 1-7.
Yajima, Y. (1985), On estimation of long-memory time series models, The Australian J. Statistics, 27, 303-320.
Index
Accidental deaths in the USA, 1973-1978 (monthly) 7, 21-25, 113, 324-326
ACF (see Autocorrelation function)
AIC 273, 284-287, 291-296, 302
AICC 243, 247, 252, 273, 287, 289, 302, 365, 431, 432
Airline passenger data 15, 284-287, 291-296
All Star baseball games (1933-1980) 5, 10
Amplitude spectrum (see Cross-amplitude spectrum)
AN (see Asymptotic normality)
Analysis of variance 332, 335
AR( oo) processes (see Autoregressive
processes of infinite order)
AR(p) processes (see Autoregressive
processes)
ARIMA processes 274-283
definition 274
prediction 314-320
seasonal (see Seasonal ARIMA
models)
state-space representation of 471
ARIMA(0, d, 0) processes with -.5 < d < .5 (see Fractionally integrated noise)
ARIMA(p, d, q) processes with -.5 < d < .5 (see Fractionally integrated ARMA processes)
ARMA(p, q) processes 78
autocovariance function of 91-97
calculating coefficients in AR representation 87, 111
in MA representation 85, 91-92, 473
causal 83, 85
equations for 78
estimation
least squares 257
maximum likelihood 257
preliminary 250-252
identification 252, 296-299
invertible 86, 128
multivariate (see Multivariate ARMA processes)
prediction 175-182
seasonal (see Seasonal ARIMA
models)
spectral density of 123
state-space representations 468-471
canonical observable representation 469-471
with infinite variance 537
with mean μ 78
with observational error 472-473
ARVEC 432, 434, 460
Asymptotic normality 209
for random vectors 211
Asymptotic relative efficiency 253-254
Autocorrelation function (ACF) 12
analogue for linear processes with
infinite variance 538
sample 29, 220, 527, 538
asymptotic covariance matrix
of 221, 222
asymptotic distribution of 221,
222, 538
Autocovariance function
characterization of 27
definition 11
difference equations for 93
elementary properties of 26
Herglotz's theorem 117-118
of ARMA processes 91-97, 473-474
of AR(p) processes 94
of AR(2) processes 95
of complex-valued time series (see
Complex-valued
autocovariance function)
of MA(q) processes 94
of MA(1) processes 91
sample 28, 220
asymptotic covariance matrix
of 230
asymptotic distribution
of 229-230
computing using FFT 374-375
consistency of 232
spectral representation of 117-118
Autocovariance generating function 103-105
of ARMA processes 103
Autoregressive integrated moving
average (see ARIMA processes)
Autoregressive models with random
coefficients 548
Autoregressive moving average (see
ARMA processes)
Autoregressive polynomial 78
Autoregressive (AR(p)) processes 79
AR(1) process 79, 81
with infinite variance 542-552
estimation
Burg 366
maximum likelihood 259
preliminary 241-245
Yule-Walker 160, 239, 241, 262, 279, 365, 542
asymptotic distribution of 241,
262-264
FPE 301-302, 307
identification of the order 242-243
partial autocorrelation function
of 100
prediction 177
Yule-Walker equations 239, 279
Autoregressive processes of infinite order (AR(∞)) 91, 405
Autoregressive spectral density
estimator 365
Backward shift operator 19, 78, 417
Bandwidth 362, 399
Bartlett's formula 221, 222, 527
AR(1) 224-225
independent white noise 222
MA(q) 223-224
multivariate 415, 416, 508
Bayesian state-space model 498, 505
Bessel's inequality 56
Best estimator 474
Best linear predictor 64, 166, 168, 317, 546
of second order random
vectors 421
Best linear unbiased estimator of μ 220, 236
Best mean square predictor 62, 546
Best one-step predictor 474
BIC criterion 289, 291, 306
Bienayme-Galton-Watson process 10
Bilinear models 548, 552
Binary process 9
Bispectral density 547
Bivariate normal distribution 35-36, 37
Bounded in probability 199 (see also
Order in probability)
Branching process 10
Brownian motion 37-38
with drift 38
on [-π, π] 139, 147, 162, 164
multivariate 454-455
CAT criterion 287, 365
Cauchy criterion 48
Cauchy distribution 449, 461
Cauchy-Schwarz inequality 43
Cauchy sequence 46
Causal 83, 105, 125-130
ARMA processes 85, 88
fractionally integrated ARMA
processes 521, 525
multivariate ARMA processes 418, 424
seasonal ARIMA processes 323
state equation 467
time-invariant linear filter 153, 459
Cayley-Hamilton theorem 470, 492
Central limit theorem 210
for infinite order moving averages 219
for MA(q) processes 214
for strictly stationary m-dependent sequences 213
Lindeberg's condition 210, 345, 363, 445
Characteristic function 11
of multivariate normal random
vector 34
of a random vector 11
of a stable random variable 535
Chebyshev's inequality 203
Cholesky factorization 255
Circulant matrix 133
approximation to the covariance matrix of a stationary process by 135-136
diagonalization of 134-135
eigenvalues of 134
eigenvectors of 134
Classical decomposition 14-15, 40, 284
Closed span 54
Closed subspace 50
Coherency 436 (see also Squared
coherency)
Complete orthonormal set 56
Completeness of L2(Ω, ℱ, P) 68-69
Complex multivariate normal
distribution 444
Complex-valued autocovariance functions 115, 119, 120
Complex-valued stationary processes 114-115
autocovariance function of 115
existence of 164
linear combination of sinusoids 116-117
spectral representation of 145
Conditional expectation 63-65, 76
Confidence regions
for parameters of ARMA models 260-261, 291
for parameters of AR(p) models 243
for parameters of MA(q) models 247
for the absolute coherency 450, 453
for the mean vector of a stationary
multivariate time series 407
for the phase spectrum 449-450, 453
for the spectrum 362-365 (see also
Spectral density estimation)
Conjugate transpose 402
Consistency condition
for characteristic functions 11
for distribution functions 11
Controllability 491
Controllability matrix 492
Convergence
almost surely 375
in distribution 204-209
characterizations of 204
in mean square 47
in norm 45, 48
in probability 198, 199, 375
in rth mean 203
Correlation function (see
Autocorrelation function)
Correlation matrix function 403
Cospectrum 437, 462
estimation of 447
Covariance function (see
Autocovariance function)
Covariance matrix 32
Covariance matrix function 403
characterization of 454
estimation of 407
properties of 403
spectral representation of 435, 454
Covariance matrix generating
function 420, 460
Cramer-Wold device 204
Cross-amplitude spectrum 437
estimation of 448
Cross-correlation function 406
sample 408
asymptotic distribution of 410, 416
Bartlett's formula 415, 416
weak consistency of 408
Cross-covariance function 403
sample 408
weak consistency of 408
spectral representation of 435
Cross-periodogram 443
Cross-spectrum 435
Cumulant 444, 547
Cumulant function 547
of linear processes 548
Current through a resistor 2
Delay parameter 508, 549
Deterministic 150, 187
Diagnostic checking 306-314 (see also Residuals)
Difference equations (see also
Homogeneous linear difference
equations)
for ARIMA processes 274, 316
for h-step predictor 319-320
multivariate 429-430
for multivariate ARMA processes 417
Difference operator
first order 19
with positive lag d 24
with real lag d > -1 520
Differencing to generate stationary data 19, 274-284
at lag d 24
Dimension of a subspace 58
Dirichlet kernel 69-70, 157, 359, 361
Discrete Fourier transform 332, 373
multivariate 443
Discrete spectral average (see Spectral
density function)
Dispersion 535, 537
Distribution function associated with an orthogonal-increment process 139, 454
Distribution functions of a stochastic process 11, 41, 164
Dow Jones utilities index 555
Durbin-Levinson algorithm 169-170, 269, 422
for fitting autoregressive
models 242, 432
Econometrics model 440-441
Empirical distribution function 338,
341
Equivalent degrees of freedom 362, 399
Ergodic stationary sequence 379
Ergodic theorem 268, 379, 381
Error dispersion 543, 544
Estimation of missing values in an
ARMA process 488
Estimation of the white noise variance
least squares 258, 377
maximum likelihood 257, 377
preliminary 251
using the Durbin-Levinson
algorithm 240, 242
using the innovations
algorithm 245-246
Euclidean space 43, 46
completeness of 46
Eventual forecast function 549
Fast Fourier transform
(FFT) 373-375
Fejer kernel 70-71, 360
FFT (see Fast Fourier transform)
Filter 350 (see also Time-invariant
linear filter and smoothing)
low pass 18
successive application of filters 354,
398
Fisher's test (see Testing for hidden
periodicities)
Forecasting (see Prediction of
stationary processes)
Forecasting ARIMA processes 314-320
an ARIMA(1, 2, 1) example 318-320
h-step predictor 318
mean square error of 318
Fourier coefficients 56, 66
Fourier frequencies 331
Fourier series 65-67
nth order Fourier approximation 66
to an indicator function 157-158
uniform convergence of Cesaro
sums 69
Fourier transform 396
FPE criterion 289, 301-302, 306
Fractionally integrated ARMA
processes 524
autocorrelation function of 525
causal 525
estimation 526-532
maximum likelihood 527-528
regression method 529-530
invertible 525
prediction 533-534
spectral density of 525
with d < -.5 526
Fractionally integrated noise 521
autocorrelation function of 522
autocovariance function of 522
partial autocorrelation function
of 522
spectral density function of 522
Frequency domain 114, 143, 330
prediction in 185-187
Gain (see Time-invariant linear
filter)
Gamma function 521
Gaussian likelihood 254, 255 (see also
Recursive calculation of the
Gaussian likelihood)
of a multivariate ARMA process 430-431
of an AR(p) process 270
of an ARMA process 256
with missing observations 486
of general second order
process 254-256
Gaussian linear process 546
Gaussian time series 13
bivariate 164, 416
multivariate 430
prediction 182
General linear model 60-62, 75
Generalized inverse 475, 503
Gibbs phenomenon 400
Gram-Schmidt orthogonalization 58, 74, 75, 171, 381
Group delay 440
Helly's selection theorem 119
Herglotz's theorem 28, 117-122
Hermite polynomials 75
Hermitian function 115
Hermitian matrix 435
Hessian matrix 291
Hilbert space, definition 46
closed span 54
closed subspace 50
complex L2 spaces 47
Euclidean space 46
isomorphisms 67-68
ℓ2 67, 76
L2 46, 47
L2[-π, π] 65
orthogonal complement 50
separable 56
Homogeneous linear difference equations 105-110
first order 108
general solution of 107
initial conditions 106
linearly independent solutions of 106
second order 108
Identification techniques 284-301
of ARMA processes 296-299
of AR(p) processes 289-291
of fractionally integrated ARMA
processes 530
of MA(q) processes 291-294
of seasonal ARIMA processes 323
IID 78, 404
Independent white noise 222 (see IID)
Index set 8
Inner product 42-43
continuity of 45
Inner-product space 42-43
complete 46
orthonormal set 55
Innovations 129, 173, 476
standardized 265
Innovations algorithm 172
applied to ARMA processes 175
for preliminary estimation of MA(q)
models 245-249, 292-294,
367
multivariate 423
Integration with respect to an
orthogonal-increment process
(see Stochastic integral)
Intermediate memory processes 520
Inversion formulae (see Spectral
distribution function and
orthogonal-increment
processes)
Invertible 86, 105, 125-130
ARMA processes 86, 87, 128
fractionally integrated ARMA
processes 521, 525
infinite variance ARMA
processes 538
multivariate ARMA processes 419
time-invariant linear filter 155
Isomorphism 67, 68
between time and frequency
domains 143-144
properties of 68
Kalman recursions 474-482
filtering 478
fixed-point smoother 478
prediction 476
ARIMA(p, 1, q) 481
h-step 477-478
Kolmogorov-Smirnov test (see Testing
for hidden periodicities)
Kolmogorov's formula 191, 197, 366
Kolmogorov's theorem 9, 10, 27, 38, 41, 164
statement 11
Kullback-Leibler discrepancy 302
Kullback-Leibler index 302
Lag window 358
Bartlett or triangular 360, 361, 399
Blackman-Tukey 361
Daniell 360, 361, 399
Parzen 361, 399
rectangular or truncated 359
Tukey-Hamming 361
Tukey-Hanning 361
Lag-window spectral estimator (see
Spectral density estimation)
Lake Huron (1875-1972) 328, 398, 555
Laurent series 88, 130
Least squares estimation for ARMA
processes 257-258, 377
asymptotic properties 258-260, 384,
386
derivation of 265-269, 376-396
of variance 258
Least squares estimation for transfer
function models 509
Least squares estimation of trend 15
Lebesgue decomposition of the spectral measure 190
Lebesgue measure 190
Likelihood function (see Gaussian
likelihood)
Lindeberg's condition (see Central limit
theorem)
Linear approximation in ℝ³ 48
in L2 49
Linear filter 17, 152, 441 (see also Time-invariant linear filter)
Linear multivariate processes 404
Linear processes with infinite
variance 535-545
analogue of the autocorrelation
function 538
Linear regression 60-62
Long memory processes 520-534
Lynx trappings (1821-1934) 546, 549-552, 557
m-dependence 212-213, 263
Matrix distribution function 454
MA(q) (see Moving average processes)
MA(∞) (see Moving average processes of infinite order)
Markov chain 196
Markov property 465
Martingale difference sequence 546, 553
Maximum entropy spectral density
estimator (see Spectral density
estimation)
Maximum likelihood estimation 256
Maximum likelihood estimation for
ARMA processes 256-258
asymptotic properties 258-260, 384,
386
derivation of 265-269, 376-396
Maximum likelihood estimation for
fractionally integrated ARMA
processes 527-528
Maximum likelihood estimation for transfer function models 514
Maximum likelihood spectral density
estimator (see Spectral density
estimation)
Mean
best linear unbiased estimator
of 220, 236
of a multivariate stationary time
series 403
of a random vector 32
sample 29, 218, 406
asymptotic normality of 219, 406
derivation of 225
mean squared error of 218-219
Mean square convergence 47, 62
properties of 62
Minimum dispersion h-step
predictor 544
Minimum dispersion linear predictor 544
Minimum mean squared error of
prediction of a stationary
process 53-54
Mink trappings (1848-1911) 461, 557
Missing values in ARMA processes 482-488
estimation of 487-488
likelihood calculation with 483-486
MLARMA (see Spectral density
estimation)
Moment estimator 240, 249, 253, 270,
362
Moment generating function 39, 41
Moving average polynomial 78
Moving average (MA(q)) processes 78,
89-90
autocorrelation function of 80
autocovariance function of 79
estimation
innovations 245, 270
maximum likelihood 259, 270
method of moments 249, 253,
270, 540
preliminary 245-250, 291-294
invertible and non-invertible
versions 295, 326
order identification of 246-247, 291-294
prediction 177
with infinite variance 540-541
Moving average processes of infinite order (MA(∞)) 89-91
autocovariance function of 91
multivariate 405
with infinite variance 536
Multiple correlation coefficient 451
Multivariate ARMA processes 417
AR(∞) representation 417
causal 420
covariance matrix function of 420
covariance matrix generating
function of 420
estimation 430-434
maximum likelihood 431
Yule-Walker 432
identification 431
invertible 421
MA(∞) representation 418
prediction 426-430 (see also
Recursive prediction)
Yule-Walker equations 420
Multivariate innovations
algorithm 425
applied to an ARMA(1, 1) process 427-428
one-step predictors 425
prediction error covariance
matrix 425
Multivariate normal
distribution 33-37
an equivalent characterization of 36
characteristic function of 34
conditional distribution 36
conditional expectation 36
density function 34
Multivariate time series 402
covariance matrices of 402
mean vectors of 402
prediction 421-430
stationary 402-403
Multivariate white noise 404
Muskrat trappings (1848-1911) 461, 557
Noise 15 (see also White noise)
Non-deterministic, purely 189, 197, 546
Non-negative definite function 26-27, 115, 117
Non-negative definite matrix 33
Non-stationary time series 29, 274
Norm 43
convergence in 45, 48
properties of 45
Observability 496
Observability matrix 496
Observational equation 464
One-step mean squared error based on infinite past 187, 295
Kolmogorov's formula 191
One-step predictors (see Prediction of
stationary processes)
Order in probability 199
for random vectors 200
Order selection 301-306
Orthogonal
complement of a subspace 50
elements in a Hilbert space 44
matrix 33
eigenvalues of 33
eigenvectors of 33
projection 51
random vectors 421, 464
Orthogonal-increment process 138-140, 145, 152
integration with respect to 140-143
inversion formula for 151, 152
right continuous 135, 454
vector-valued 454
Orthonormal basis 56
for ℂⁿ 331
for ℝⁿ 333
Orthonormal set 55
complete 56
of random variables 55
PACF (see Partial autocorrelation
function)
Parallelogram law 45
Parameters of a stable random
variable 535
Pareto-like tails 535
Parseval's identity 57
Partial autocorrelation function (PACF) 98-102
an equivalent definition of 102, 171
estimation of 102, 243
of an AR(1) process 98
of an AR(p) process 100
of an MA(1) process 100, 101, 113
sample 102, 243
Periodogram 332
asymptotic covariance of ordinates
for iid sequences 344
for two-sided moving
averages 347-348
asymptotic distribution
for iid sequences 344
for two-sided moving
averages 347-348
asymptotic unbiasedness 343
cumulative periodogram 341
extension to [-π, π] 343
multivariate 443
asymptotic properties of 443-446
smoothing of 350-362, 446
PEST 23, 24, 25, 40, 111, 160, 161, 243, 252, 257, 261, 270, 276, 277, 284, 292, 295, 414, 508, 510, 550
Phase spectrum 437
confidence interval for 449-450, 453
estimation of 448-450, 452, 453
Poisson process 38
on [-π, π] 139
Polyspectral density 547
Population of USA (1790-1980) 3, 15-16, 20
Portmanteau test for residuals 310-312
Power transfer function 123 (see also Time-invariant linear filter)
Prediction bounds 182
Prediction equations 53
for h-step predictors 168
for multivariate time series 421
for one-step predictors 167
in the time domain 167
Prediction of stationary processes (see
also Recursive prediction)
AR(p) processes 177
ARMA processes 175-182
based on infinite past 182-184
covariance matrix of prediction errors 184
h-step prediction 179-182
truncation approximation 184
best linear predictor of a stationary process 166, 168
fractionally integrated ARMA
processes 533-534
Gaussian processes 182
prediction bounds 182
h-step prediction 168, 174-175, 179-182
mean squared error of 175
in frequency domain 185-187
ARMA processes 186
infinite variance processes 542-545
MA(1) processes 544
MA(1) processes 173
MA(q) processes 177
multivariate ARMA
processes 426-430
one-step predictors 167, 172, 425, 474, 476
mean squared error of 169, 172, 425, 474, 476
Preliminary transformations 284
Prewhitening 412, 413, 414, 415, 507
Projection
in ℝⁿ 58-60
mapping 52
properties of 52
matrix 59
of multivariate random vectors 421,
475
theorem 51
Quadrature spectrum 437, 462
estimation of 447
R and S arrays 273
Random noise component 15
Random variable 9
Random vector 32
Random walk 10, 14
Rational spectral density (see Spectral
density function)
Realizations of a stochastic process 9
Recursive calculation of the Gaussian
likelihood function 254-256
for an ARMA process with missing
observations 483-485
Recursive prediction 169-182
Durbin-Levinson algorithm 169-170
h-step predictors 174-175
for ARMA processes 179-182
mean squared error of 181
innovations algorithm 172
Kalman prediction (see Kalman
recursions)
multivariate processes 422-430
Durbin-Levinson algorithm 422-423
innovations algorithm 425
of a multivariate ARMA
process 426-427
of an AR(p) process 177
of an ARMA process 175-182
of an MA(q) process 177
Reduced likelihood 257, 272
Residuals 307
application to model
modification 299-301
diagnostic checking 306-314
check of normality 314
graph of 307
sample autocorrelation function of 308-310
tests of randomness for 312-313
Reversible time series 546
Riemann-Lebesgue lemma 76
Riemann-Stieltjes integral 116
Sales with a leading indicator 414, 432-434, 510-512, 514-520
Sample
autocorrelation function 29, 220,
527, 538
of residuals 307
autocovariance function 28, 220
coefficient of variation 212
covariance matrix 220
non-negative definite 221
mean 29, 218, 527
multivariate 406
SARIMA (see Seasonal ARIMA models)
Seasonal ARIMA models 320-326
Seasonal component 15, 284
differencing 24
estimation of 20-25
method S1 (small trend) 21, 24
method S2 (moving average) 23,
24
Separable Hilbert space 56
Series A (see Lake Huron (1875-1972))
Series B (Dow Jones Utilities
Index) 329, 555
Series C (Private Housing Units
Started, USA) 327, 554
Series D (Industrial Production,
Austria) 329, 556
Series E (Industrial Production,
Spain) 329, 556
Series F (General Index of Industrial
Production) 329, 557
Series G (see Lynx Trappings (1821-1934))
Series H (see Mink Trappings (1848-1911))
Series I (see Muskrat Trappings (1848-1911))
Series J (Simulated Input
Series) 460, 553, 558
Series K (Simulated Output
Series) 460, 553, 559
Shift operator (see Backward shift
operator)
Simulation of an ARMA process 271
multivariate 460
Simulation of a Gaussian process 271
multivariate 460
SMOOTH 17
Smoothing
exponential 17
by means of a moving average 16-19
the periodogram (see Spectral density
estimation)
using a simple moving average 350,
353
SPEC 354, 397, 452, 461
Spectral density estimation
autoregressive 365, 367, 369, 370, 372
discrete spectral average 351
asymptotic properties of 351, 353
confidence intervals using χ2 approximation 362-363
confidence intervals using a normal
approximation 363-364
simple moving average 350, 353
lag window 354, 358, 372
asymptotic variance of 359
maximum entropy 365
maximum likelihood ARMA
(MLARMA) 367, 368, 370
moving average 367, 370
rational 365, 372
smoothing the periodogram 350-362
Spectral density function 118, 119, 120, 122
an asymptotically unbiased estimator of 137
in the real-valued case 122
of ARMA processes 123
causality and invertibility 125-130
of an AR(1) 125
of an MA(1) 123
rational 123
rational function approximations to 130-133
Spectral density matrix 435, 443, 457
estimation of 446-447
discrete spectral average 446-447
smoothing the
periodogram 446-447
Spectral distribution function 118, 119, 145
discontinuity in 148, 150
in the real-valued case 121
inversion formula for 151, 152
Lebesgue decomposition of 190
of a linear combination of sinusoids 116
Spectral distribution matrix
function 454
Spectral matrix function (see Spectral
density matrix)
Spectral representation
of an autocovariance function 118
of a continuous-time stationary process 152
of a covariance matrix function 405,
454
of a stationary multivariate time
series 405, 456
of a stationary time series 145
Spectral window 358
Spectrum (see Spectral density function
and cross-spectrum)
Spencer's 15 point moving average 18-19, 39
Squared coherency function 436-439
confidence interval for 450, 453
estimation of 450, 453
test of zero coherency 451, 452
Stable matrix 467
Stable random variables 535-536
parameters of 535
positive 216
properties of 535
symmetric 535
State equation 464
State-space models 463-474
Bayesian 498, 505
for threshold models 548
non-stationary 479-481
stationary 467
causal 467
controllable 491
innovations representation 489,
490
minimum dimension 497
observable 496
stable 467
with missing observations 482-488
Stationarity 12
covariance 12
in the wide sense 12
second order 12
strict 12
weak 12
Stirling's formula 522
Stochastic integral 142, 455
properties of 142, 455
with respect to a vector-valued
orthogonal increment
process 455
Stochastic process 8
distribution functions of 11, 41
realizations of 9
Strict stationarity 12
Strikes in the USA (1951-1980) 4, 17, 18, 113
Strong consistency of estimators for
ARMA parameters 376-388
Strong law of large numbers 376
Taylor expansions in
probability 201-202
Testing for hidden periodicities
334-342
Fisher's test 337-339, 342
Kolmogorov-Smirnov test applied to
the cumulative
periodogram 339-342
of a specified frequency 334-337
of an unspecified frequency 337-342
Testing for the independence of two stationary time series 412-413
Tests of randomness 312-313
based on the periodogram 339-342
based on turning points 312
difference-sign test 313
rank test 313
Threshold models 545-552
Time domain 114, 145
prediction equations 53
Time-invariant linear filter (TLF) 123, 153, 438, 439, 457-458, 506
absolutely summable 154
amplitude gain 156
causal 153
for multivariate time series 457-459
invertible 155
matrix transfer function 458
phase gain 156
power gain 156
power transfer function 123, 156
simple delay 442
transfer function 123, 156, 442
Time series 1
discrete-time 1
continuous-time 1
Gaussian 13
TLF (see Time-invariant linear filter)
TRANS 414, 461, 509, 510
Transfer function (see Time-invariant
linear filter)
Transfer function models 432, 506
estimation for 507-510, 514
prediction for 515, 517-520
state-space representation of 513-517
Transformations 1 5 (see also
Identification techniques)
variance-stabilizing 284
Trend component 15, 284
elimination of 15-24
in absence of seasonality 15
differencing 19
estimation of 15, 19
by least squares 15
by smoothing with a moving average 16
randomly varying with noise 465,
466
Triangle inequality 44
Trigonometric polynomials 69, 150, 157
Vandermonde matrix 109
Weak law of large numbers 206
for infinite order moving
averages 208
Weakly consistent 253
Weighted sum of squares 257
White noise 78
multivariate 404
Window (see Lag window)
Wold decomposition 187, 546
Wolfer sunspot numbers (1770-1869) 6, 29, 32, 160, 161, 269, 354, 397
WORD6 40, 194
WN (see White noise)
Yule-Walker equations (see
Autoregressive processes and
multivariate ARMA processes)