Springer Series in Statistics Advisors: P. Bickel, P. Diggle, S. Fienberg, K. Krickeberg, I. Olkin, N. Wermuth, S. Zeger For other titles published in this series, go to http://www.springer.com/series/692 Peter J. Brockwell Richard A. Davis Time Series: Theory and Methods Second Edition �Springer Peter J. Brockwell Department of Statistics Colorado State University Fort Collins, CO 80523 USA Richard A. Davis Department of Statistics Columbia University New York, NY 10027 USA Mathematical Subject Classification: 62-01, 62M10 Library of Congress Cataloging-in-Publication Data Brockwell, Peter J. Time series: theory and methods I Peter J. Brockwell, Richard A. Davis. p. em. -(Springer series in statistics) "Second edition"-Pref. Includes bibliographical references and index. ISBN 0-387-97429-6 (USA).-ISBN 3-540-97429-6 (EUR.) I. Time-series analysis. I. Davis, Richard A. QA280.B76 1991 II. Title. III. Series. 90-25821 519.5'5-dc20 ISBN 1-4419-0319-8 ISBN 978-1-4419-0319-8 Printed on a.cid-free paper. (soft cover) © 2006 Springer Science +Business Media, LLC All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as a n expres.sion of opinion as to whether or not they are subject to proprietary rights. Printed in the United States of America. 15 14 13 springer.com To our families Preface to the Second Edition This edition contains a large number of additions and corrections scattered throughout the text, including the incorporation of a new chapter on state-space models. The companion diskette for the IBM PC has expanded into the software package I TSM: An Interactive Time Series Modelling Package for the PC, which includes a manual and can be ordered from Springer-Verlag. * We are indebted to many readers who have used the book and programs and made suggestions for improvements. Unfortunately there is not enough space to acknowledge all who have contributed in this way; however, sp�cial mention must be made of our prize-winning fault-finders, Sid Resnick and F. Pukelsheim. Special mention should also be made of Anthony Brockwell, whose advice and support on computing matters was invaluable in the preparation of the new diskettes. We have been fortunate to work on the new edition in the excellent environments provided by the University of Melbourne and Colorado State University. We thank Duane Boes particularly for his support and encouragement throughout, and the Australian Research Council and National Science Foundation for their support of research related to the new material. We are also indebted to Springer-Verlag for their constant support and assistance in preparing the second edition. Fort Collins, Colorado November, 1 990 P.J. BROCKWELL R.A. DAVIS * ITSM: An Interactive Time Series Modelling Package for the PC by P.J. Brockwell a nd R.A. Da vis. ISBN: 0-387-97482-2; 1991. 
viii Preface to the Second Edition Note added in the eighth printing: The computer programs referred to in the text have now been superseded by the package ITSM2000, the student version of which accompanies our other text, Introduction to Time Series and Forecasting, also published by Springer-Verlag. Enquiries regarding purchase of the professional version of this package should be sent to pjbrockwell @cs.com. Preface to the First Edition We have attempted in this book to give a systematic account of linear time series models and their application to the modelling and prediction of data collected sequentially in time. The aim is to provide specific techniques for handling data and at the same time to provide a thorough understanding of the mathematical basis for the techniques. Both time and frequency domain methods are discussed but the book is written in such a way that either approach could be emphasized. The book is intended to be a text for graduate students in statistics, mathematics, engineering, and the natural or social sciences. It has been used both at the M.S. level, emphasizing the more practical aspects of modelling, and at the Ph.D. level, where the detailed mathematical derivations of the deeper results can be included. Distinctive features of the book are the extensive use of elementary Hilbert space methods and recursive prediction techniques based on innovations, use of the exact Gaussian likelihood and AIC for inference, a thorough treatment of the asymptotic behavior of the maximum likelihood estimators of the coefficients of univariate ARMA models, extensive illustrations of the tech­ niques by means of numerical examples, and a large number of problems for the reader. The companion diskette contains programs written for the IBM PC, which can be used to apply the methods described in the text. Data sets can be found in the Appendix, and a more extensive collection (including most of those used for the examples in Chapters 1 , 9, 10, 1 1 and 1 2) is on the diskette. Simulated ARMA series can easily be generated and filed using the program PEST. Valuable sources of additional time-series data are the collections of Makridakis et al. (1984) and Working Paper 109 ( 1984) of Scientific Computing Associates, DeKalb, Illinois. Most of the material in the book is by now well-established in the time series literature and we have therefore not attempted to give credit for all the X Preface to the First Edition results discussed. Our indebtedness to the authors of some of the well-known existing books on time series, in particular Anderson, Box and Jenkins, Fuller, Grenander and Rosenblatt,, Hannan, Koopmans and Priestley will however be apparent. We were also fortunate to have access to notes on time series by W. Dunsmuir. To these and to the many other sources that have influenced our presentation of the subject we express our thanks. Recursive techniques based on the Kalman filter and state-space represen­ tations of ARMA processes have played an important role in many recent developments in time series analysis. In particular the Gaussian likelihood of a time series can be expressed very simply in terms of the one-step linear predictors and their mean squared errors, both of which can be computed recursively using a Kalman filter. Instead of using a state-space representation for recursive prediction we utilize the innovations representation of an arbi­ trary Gaussian time series in order to compute best linear predictors and exact Gaussian likelihoods. 
This approach, developed by Rissanen and Barbosa, Kailath, Ansley and others, expresses the value of the series at time t in terms of the one-step prediction errors up to that time. This representation provides insight into the structure of the time series itself as well as leading to simple algorithms for simulation, prediction and likelihood calculation. These algorithms are used in the parameter estimation program (PEST) found on the companion diskette. Given a data set of up to 2300 observations, the program can be used to find preliminary, least squares and maximum Gaussian likelihood estimators of the parameters of any prescribed ARIMA model for the data, and to predict future values. It can also be used to simulate values of an ARMA process and to compute and plot its theoretical auto­ covariance and spectral density functions. Data can be plotted, differenced, deseasonalized and detrended. The program will also plot the sample auto­ correlation and partial autocorrelation functions of both the data itself and the residuals after model-fitting. The other time-series programs are SPEC, which computes spectral estimates for univariate or bivariate series based on the periodogram, and TRANS, which can be used either to compute and plot the sample cross-correlation function of two series, or to perform least squares estimation of the coefficients in a transfer function model relating the second series to the first (see Section 1 2.2). Also included on the diskette is a screen editing program (WORD6), which can be used to create arbitrary data files, and a collection of data files, some of which are analyzed in the book. Instructions for the use of these programs are contained in the file HELP on the diskette. For a one-semester course on time-domain analysis and modelling at the M.S. level, we have used the following sections of the book : 1 . 1 - 1 .6; 2. 1 -2.7; 3.1 -3.5; 5. 1-5.5; 7. 1 , 7.2; 8.1 -8.9; 9. 1 -9.6 (with brief reference to Sections 4.2 and 4.4). The prerequisite for this course is a knowledge of probability and statistics at the level ofthe book Introducti on to the Theory of Stati sti cs by Mood, Graybill and Boes. Preface to the First Edition XI For a second semester, emphasizing frequency-domain analysis and multi­ variate series, we have used 4. 1 -4.4, 4.6-4. 10; 10. 1 - 10.7; 1 1 . 1 - 1 1 .7; selections from Chap. 1 2. At the M.S. level it has not been possible (or desirable) to go into the mathe­ matical derivation of all the results used, particularly those in the starred sections, which require a stronger background in mathematical analysis and measure theory. Such a background is assumed in all of the starred sections and problems. For Ph.D. students the book has been used as the basis for a more theoretical one-semester course covering the starred sections from Chapters 4 through 1 1 and parts of Chapter 1 2. The prerequisite for this course is a knowledge of measure-theoretic probability. We are greatly indebted to E.J. Hannan, R.H. Jones, S.l. Resnick, S.Tavare and D. Tj0stheim, whose comments on drafts of Chapters 1 -8 led to sub­ stantial improvements. The book arose out of courses taught in the statistics department at Colorado State University and benefitted from the comments of many students. The development of the computer programs would not have been possible without the outstanding work of Joe Mandarino, the architect of the computer program PEST, and Anthony Brockwell, who contributed WORD6, graphics subroutines and general computing expertise. 
We are indebted also to the National Science Foundation for support for the research related to the book, and one of us (P.J.B.) to Kuwait University for providing an excellent environment in which to work on the early chapters. For permis­ sion to use the optimization program UNC22MIN we thank R. Schnabel of the University of Colorado computer science department. Finally we thank Pam Brockwell, whose contributions to the manuscript went far beyond those of typist, and the editors of Springer-Verlag, who showed great patience and cooperation in the final production of the book. Fort Collins, Colorado October 1 986 P.J. BROCKWELL R.A. DAVIS Contents Preface t o the Second Edition Preface to the First Edition Vll IX CHAPTER I Stationary Time Series §1.1 § 1 .2 §1.3 § 1 .4 §1.5 § 1 .6 §1 .7* Examples o f Time Series Stochastic Processes Stationarity and Strict Stationarity The Estimation and Elimination of Trend and Seasonal Components The Autocovariance Function of a Stationary Process The Multivariate Normal Distribution Applications of Kolmogorov's Theorem Problems CHAPTER 2 Hilbert Spaces Inner-Product Spaces and Their Properties Hilbert Spaces The Projection Theorem Orthonormal Sets Projection in IR" Linear Regression and the General Linear Model Mean Square Convergence, Conditional Expectation and Best Linear Prediction in L 2(!1, :F, P) §2.8 Fourier Series §2.9 Hilbert Space Isomorphisms §2. 10* The Completeness of L 2 (Q, .?, P) §2. 1 1 * Complementary Results for Fourier Series Problems §2. 1 §2.2 §2.3 §2.4 §2.5 §2.6 §2.7 1 8 11 14 25 32 37 39 42 42 46 48 54 58 60 62 65 67 68 69 73 XIV Contents CHAPTER 3 Stationary ARMA Processes §3.1 §3.2 §3.3 §3.4 §3.5 §3.6* Causal and Invertible ARMA Processes Moving Average Processes of I nfinite Order Computing the Autocovariance Function of an ARMA(p, q) Process The Partial AutOCfimelation Function The Autocovariance Generating Function Homogeneous Linear Difference Equations with Constant Coefficients Problems 77 77 89 91 98 1 03 1 05 1 10 CHAPTER 4 The Spectral Representation of a Stationary Process §4. 1 §4.2 §4.3 §4.4 §4.5* §4.6* §4.7* §4.8 * §4.9* §4. 1 0* §4. 1 1 * Complex-Valued Stationary Time Series The Spectral Distribution of a Linear Combination of Sinusoids Herglotz's Theorem Spectral Densities and ARMA Processes Circulants and Their Eigenvalues Orthogonal Increment Processes on [ -n, n] Integration with Respect to an Orthogonal Increment Process The Spectral Representation Inversion Formulae Time-Invariant Linear Filters Properties of the Fourier Approximation h" to J(v.wJ Problems 1 14 1 14 1 16 1 17 1 22 1 33 1 38 1 40 1 43 1 50 1 52 1 57 1 59 CHAPTER 5 Prediction of Stationary Processes §5. 1 §5.2 §5.3 §5.4 §5.5 The Prediction Equations in the Time Domain Recursive Methods for Computing Best Linear Predictors Recursive Prediction of an ARMA(p, q) Process Prediction of a Stationary Gaussian Process; Prediction Bounds Prediction of a Causal Invertible ARMA Process in Terms of Xi, oo <} :s; n §5.6* Prediction in the Frequency Domain §5.7* The Wold Decomposition §5.8* Kolmogorov's Formula Problems - 1 66 1 66 1 69 1 75 1 82 1 82 1 85 1 87 191 1 92 CHAPTER 6* Asymptotic Theory §6. 1 §6.2 §6.3 §6.4 Convergence in Probability Convergence in r'h Mean, r > 0 Convergence in Distribution Central Limit Theorems and Related Results Problems 1 98 1 98 202 204 209 215 Contents XV CHAPTER 7 Estimation of the Mean and the Autocovariance Function §7. 
1 Estimation of J1 §7.2 Estimation of y( ·) and p( · ) §7.3* Derivation of the Asymptotic Distributions Problems 218 218 220 225 236 CHAPTER 8 Estimation for ARMA Models The Yule-Walker Equations and Parameter Estimation for Autoregressive Processes §8.2 Preliminary Estimation for Autoregressive Processes Using the Durbin-Levinson Algorithm §8.3 Preliminary Estimation for Moving Average Processes Using the Innovations Algorithm §8.4 Preliminary Estimation for ARMA(p, q) Processes §8.5 Remarks on Asymptotic Efficiency §8.6 Recursive Calculation of the Likelihood of an Arbitrary Zero-Mean Gaussian Process §8.7 Maximum Likelihood and Least Squares Estimation for ARMA Processes §8.8 Asymptotic Properties of the Maximum Likelihood Estimators §8.9 Confidence Intervals for the Parameters of a Causal Invertible ARMA Process §8. 1 0* Asymptotic Behavior of the Yule-Walker Estimates §8. 1 1 * Asymptotic Normality of Parameter Estimators Problems 238 §8. 1 239 241 245 250 253 254 256 258 260 262 265 269 CHAPTER 9 Model Building and Forecasting with ARIMA Processes §9. 1 §9.2 §9.3 §9.4 §9.5 §9.6 ARIMA Models for Non-Stationary Time Series Identification Techniques Order Selection Diagnostic Checking Forecasting ARIMA Models Seasonal ARIMA Models Problems 273 274 284 301 306 314 320 326 CHAPTER 10 Inference for the Spectrum of a Stationary Process §10.1 §10.2 § 1 0.3 § 10.4 § 1 0.5 § 1 0.6 The Periodogram Testing for the Presence of Hidden Periodicities Asymptotic Properties of the Periodogram Smoothing the Periodogram Confidence Intervals for the Spectrum Autoregressive, Maximum Entropy, Moving Average and Maximum Likelihood ARMA Spectral Estimators § 1 0.7 The Fast Fourier Transform (FFT) Algorithm 330 331 334 342 350 362 365 373 XVI Contents §10.8 * Derivation of the Asymptotic Behavior of the Maximum Likelihood and Least Squares Estimators of the Coefficients of an ARMA Process Problems CHAPTER II Multivariate Time Series §11.1 §1 1 .2 §1 1 .3 § 1 1 .4 §1 1 . 5 §1 1 .6 §1 1 .7 §1 1 .8 * Second Order Properties of Multivariate Time Series Estimation of the Mean and Covariance Function Multivariate ARMA Processes Best Linear Predictors of Second Order Random Vectors Estimation for Multivariate ARMA Processes The Cross Spectrum Estimating the Cross Spectrum The Spectral Representation of a Multivariate Stationary Time Series Problems CHAPTER 12 State-Space Models and the Kalman Recursions § 1 2. 1 § 1 2.2 § 1 2.3 §12.4 § 1 2.5 State-Space M odels The Kalman Recursions State-Space Models with Missing Observations Controllability and Observability Recursive Bayesian State Estimation Problems CHAPTER 13 Further Topics §13. 1 § 13.2 § 1 3.3 §13.4 Transfer Function Modelling Long Memory Processes Linear Processes with Infinite Variance Threshold Models Problems Appendix: Data Sets Bibliography Index 375 396 401 402 405 417 421 430 434 443 454 459 463 463 474 482 489 498 501 506 506 520 535 545 552 555 561 567 CHAPTER 1 Stationary Time Series In this chapter we introduce some basic ideas of time series analysis and stochastic processes. Of particular importance are the concepts of stationarity and the autocovariance and sample autocovariance functions. Some standard techniques are described for the estimation and removal of trend and season­ ality (of known period) from an observed series. These are illustrated with reference to the data sets in Section 1 . 1 . Most of the topics covered in this chapter will be developed more fully in later sections of the book. 
The reader who is not already familiar with random vectors and multivariate analysis should first read Section 1.6 where a concise account of the required background is given. Notice our convention that an n-dimensional random vector is assumed (unless specified otherwise) to be a column vector X (X 1, X2, . . , XnY of random variables. If S is an arbitrary set then we shall use the notation sn to denote both the set of n-component column vectors with components in S and the set of n-component row vectors with components in S. = . § 1 . 1 Examples of Time Series A time series is a set of observations x,, each one being recorded at a specified time t. A discrete-time series (the type to which this book is primarily devoted) is one in which the set T0 of times at which observations are made is a discrete set, as is the case for example when observations are made at fixed time intervals. Continuous-time series are obtained when observations are recorded continuously over some time interval, e.g. when T0 [0, 1]. We shall use the notation x(t) rather than x, if we wish to indicate specifically that observations are recorded continuously. = 1 . Stationary Time Series 2 EXAMPLE l.l.l (Current Through a Resistor). If a sinusoidal voltage v(t) = a cos( vt + 8) is applied to a resistor of resistance r and the current recorded continuously we obtain a continuous time series x(t) r - 1acos(vt + 8). = If observations are made only at times 1 , 2, . . . , the resulting time series will be discrete. Time series of this particularly simple type will play a fundamental role in our later study of stationary time series. 0.5 0 -0 5 -1 -1 5 - 2 0 10 20 30 40 50 60 70 80 Figure 1 . 1 . 1 00 observations of the series x(t) = cos(.2t + n/3). 90 1 00 § 1 . 1 . Examples of Time Series EXAMPLE 3 1 . 1 .2 (Population x, of the U.S.A., 1 790- 1 980). x, x, 1 790 1 800 1 8 10 1 820 1830 1 840 1 850 1 860 1 870 1 880 3,929,21 4 5,308,483 7,239,88 1 9,638,453 1 2,860,702 1 7,063,353 23,1 9 1 ,876 3 1 ,443,321 38,558,371 50,1 89,209 1 890 1 900 1910 1 920 1 930 1940 1 950 1960 1 970 1980 62,979,766 76,21 2, 1 68 92,228,496 1 06,021 ,537 1 23,202,624 1 32, 1 64,569 1 5 1 ,325,798 1 79,323,1 75 203,302,03 1 226,545,805 260 240 220 200 1 80 � til c � ::> 160 1 40 1 20 1 00 80 60 40 40 0 1 78 0 1 830 1 8 80 1 930 1 9 80 Figure 1 .2. Population of the U.S.A. at ten-year intervals, 1 790- 1980 (U.S. Bureau of the Census). I. Stationary Time Series 4 EXAMPLE 1 . 1 .3 (Strikes in the U.S.A., 1 95 1 - 1 980). x, x, 1951 1952 1953 1954 1955 1956 1957 1958 1 959 1960 1961 1962 1963 1 964 1 965 4737 5117 5091 3468 4320 3825 3673 3694 3708 3333 3367 36 14 3362 3655 3963 1 966 1 967 1 968 1 969 1 970 1 97 1 1 972 1 973 1 974 1 975 1 976 1 977 1 978 1979 1980 4405 4595 5045 5700 571 6 5 1 38 501 0 5353 6074 503 1 5648 5506 4230 4827 3885 6 � Ill 1J c 0 Ill � 0 J: f.-- 5 4 3 2 +-���-,���-.-,����-,�� 1950 1955 1 9 60 1 965 1 9 70 1 975 1980 Figure 1 .3. Strikes in the U.S.A., 1 95 1 - 1 980 (Bureau of Labor Statistics, U.S. Labor Department). §I. I. Examples of Time Series EXAMPLE 1 . 1 .4 (All Star Baseball Games, 1 933 - 1 980). Xt = t- 1900 x, 5 33 34 35 { 1 if the National League won in year t, - 1 if the American League won in year t. 37 36 -I -I -I x, 49 50 -I I t- 1900 65 66 t- 1900 x, 51 I 67 t =no ga me. * =two ga mes scheduled. 
68 54 55 I 69 40 41 42 43 44 45 46 47 48 56 57 58 59 60 61 62 63 64 I I 79 80 I - I - I -I -I I -I 53 52 39 38 -I 70 71 -I I -I -I * 74 75 72 73 * 76 t -I -I -I * 77 * 78 I 3 2 rp �9*\ 0 GB-!1 -1 � G-EH3-!t rk.u -2 -3 1 930 1 935 1 9 40 1945 1 950 1 955 1 960 1 965 1 970 1 975 1 980 Figure 1 .4. Results x,, Example 1 . 1 .4, of All-star baseball games, 1933 - 1 980. 6 I. Stationary Time Series EXAMPLE 1 770 1 77 1 1 772 1 773 1 774 1 775 1 776 1 777 1 778 1 779 1 780 1781 1 782 1 783 1 784 1 785 1 786 1 787 1 788 1 789 1 . 1 .5 (Wolfer Sunspot Numbers, 1 770- 1 869). 1 790 1 79 1 1 792 1 793 1 794 1 795 1 796 1 797 1 798 1 799 1 800 1 80 1 1 802 1 803 1 804 1 805 1 806 1 807 1 808 1 809 101 82 66 35 31 7 20 92 1 54 1 25 85 68 38 23 10 24 83 1 32 131 118 90 67 60 47 41 21 16 6 4 7 14 34 45 43 48 42 28 10 8 2 1810 181 1 1812 1813 1814 1815 1816 1817 1818 1 81 9 1 820 1 82 1 1 822 1 823 1 824 1 825 1 826 1 827 1 828 1 829 0 5 12 14 35 46 41 30 24 16 7 4 2 8 17 36 50 62 67 1830 1831 1 832 1 833 1 834 1 835 1 836 1 837 1 838 1 839 1 840 1 84 1 1 842 1 843 1 844 1 845 1 846 1 847 1 848 1 849 71 48 28 8 13 57 1 22 1 38 1 03 86 63 37 24 11 15 40 62 98 1 24 96 1 850 1851 1 852 1 853 1 854 1 855 1 856 1 857 1 858 1 859 1 860 1 86 1 1 862 1 863 1 864 1 865 1 866 1 867 1 868 1 869 66 64 54 39 21 7 4 23 55 94 96 77 59 44 47 30 16 7 37 74 1 6 0 ,-----, 1 50 1 40 1 30 1 20 1 10 1 00 90 80 70 60 50 40 30 20 10 0 ������� 1 770 1 780 1 790 1 800 1810 1 8 20 1830 1 840 1 85 0 Figure 1 .5. The Wolfer sunspot numbers, 1 770- 1 869. 1 860 1870 § 1 . 1 . Examples of Time Series EXAMPLE 7 1 . 1 .6 (Monthly Accidental Deaths in the U.S.A., 1 973-1 978). Jan. Feb. Mar. Apr. May Jun. Jul. Aug. Sep. Oct. Nov. Dec. 1 973 1 974 1 975 1 976 1 977 1 978 9007 8 1 06 8928 9 1 37 1 00 1 7 1 0826 1 13 1 7 1 0744 97 1 3 9938 9161 8927 7750 698 1 8038 8422 8714 95 1 2 1 0 1 20 9823 8743 9 1 29 8710 8680 8 1 62 7306 8 1 24 7870 9387 9556 1 0093 9620 8285 8433 8 1 60 8034 77 1 7 746 1 7776 7925 8634 8945 1 0078 9 1 79 8037 8488 7874 8647 7792 6957 7726 8 1 06 8890 9299 1 0625 9302 83 1 4 8850 8265 8796 7836 6892 779 1 8 1 29 9115 9434 1 0484 9827 91 10 9070 8633 9240 11 10 "() c UJ � :J 0 .r: f-- 9 8 7 0 12 24 36 48 60 72 Figure 1.6. Monthly accidental deaths in the U.S.A., 1 973 - 1 978 (National Safety Council). 8 I. Stationary Time Series These examples are of course but a few of the multitude of time series to be found in the fields of engineering, science, sociology and economics. Our purpose in this book is to study the techniques which have been developed for drawing inferences from such series. Before we can do this however, it is necessary to set up a hypothetical mathematical model to represent the data. Having chosen a model (or family of models) it then becomes possible to estimate parameters, check for goodness of fit to the data and possibly to use the fitted model to enhance our understanding of the mechanism generating the series. Once a satisfactory model has been developed, it may be used in a variety of ways depending on the particular field of application. The applications include separation (filtering) of noise from signals, prediction of future values of a series and the control of future values. The six examples given show some rather striking differences which are apparent if one examines the graphs in Figures 1 . 1 - 1 .6. 
The first gives rise to a smooth sinusoidal graph oscillating about a constant level, the second to a roughly exponentially increasing graph, the third to a graph which fluctuates erratically about a nearly constant or slowly rising level, and the fourth to an erratic series of minus ones and ones. The fifth graph appears to have a strong cyclic component with period about 1 1 years and the last has a pronounced seasonal component with period 12. In the next section we shall discuss the general problem of constructing mathematical models for such data. § 1.2 Stochastic Processes The first step in the analysis of a time series is the selection of a suitable mathematical model (or class of models) for the data. To allow for the possibly unpredictable nature of future observations it is natural to suppose that each observation x, is a realized value of a certain random variable X,. The time series { x" t E T0 } is then a realization of the family of random variables { X,, t E T0 }. These considerations suggest modelling the data as a realization (or part of a realization) of a stochastic process { X,, t E T} where T 2 T0 . To clarify these ideas we need to define precisely what is meant by a stochastic process and its realizations. In later sections we shall restrict attention to special classes of processes which are particularly useful for modelling many of the time series which are encountered in practice. Definition 1.2.1 (Stochastic Process). A stochastic process is a family of random variables {X,, t E T} defined on a probability space (Q, ff, P). Remark 1. In time series analysis the index (or parameter) set Tis a set of time points, very often {0, ± 1 , ± 2, . . . }, { 1 , 2, 3, . . . }, [0, oo ) or ( - oo, oo ). Stochastic processes in which Tis not a subset of IR are also of importance. For example in geophysics stochastic processes with T the surface of a sphere are used to § 1 .2. Stochastic Processes 9 represent variables indexed by their location on the earth's surface. In this book however the index set T will always be a subset of IR. Recalling the definition of a random variable we note that for each fixed t E T, X, is in fact a function X,( . ) on the set n. On the other hand, for each fixed wEn, X.(w) is a function on T. (Realizations of a Stochastic Process). The functions {X.(w), w E!l} on T are known as the realizations or sample-paths of the process {X,, t E T}. Definition 1.2.2 Remark 2. We shall frequently use the term time series to mean both the data and the process of which it is a realization. The following examples illustrate the realizations of some specific stochastic processes. The first two could be considered as possible models for the time series of Examples 1 . 1 . 1 and 1 . 1 .4 respectively. 1 .2. 1 (Sinusoid with Random Phase and Amplitude). Let A and 0 be independent random variables with A :;:::: 0 and 0 distributed uniformly on (0, 2n). A stochastic process { X (t), t E IR} can then be defined in terms of A and 0 for any given v :;:::: 0 and r > 0 by ( 1 .2. 1 ) X, = r - 1 A cos(vt + 0), ExAMPLE o r more explicitly, X,(w) = r- 1 A(w)cos(vt + 0(w)), ( 1 .2.2) where w is an element of the probability space n on which A and 0 are defined. The realizations of the process defined by 1 .2.2 are the functions of t obtained by fixing w, i.e. functions of the form x (t) = r- 1 a cos(vt + (}). The time series plotted in Figure 1 . 1 is one such realization. EXAMPLE 1 .2.2 (A Binary Process). Let {X,, t = 1, 2, . . . 
} be a sequence of independent random variables for each of which = ( 1 .2.3) P (X, = 1 ) = P (X, = - 1) l In this case it is not so obvious as in Example 1 .2. 1 that there exists a probability space (Q, ff, P) with random variables X 1 , X2 , defined on n having the required joint distributions, i.e. such that . • • ( 1 .2.4) for every n-tuple (i 1 , . . . , in) of 1 's and - 1 's. The existence of such a process is however guaranteed by Kolmogorov's theorem which is stated below and discussed further in Section 1 .7. 1 . Stationary Time Series 10 The time series obtained by tossing a penny repeatedly and scoring + 1 for each head, - I for each tail is usually modelled as a realization of the process defined by ( 1 .2.4). Each realization of this process is a sequence of 1 's and 1 's. A priori we might well consider this process as a model for the All Star baseball games, Example 1 . 1 .4. However even a cursory inspection of the results from 1 963 onwards casts serious doubt on the hypothesis P(X, 1) = t· - = ExAMPLE 1 .2.3 (Random Walk). The simple symmetric random walk {S, t = 0, I, 2, . . . } is defined in terms of Example 1 .2.2 by S0 = 0 and t � 1. ( 1 .2.5) The general random walk is defined in the same way on replacing X 1 , X2 , by a sequence of independently and identically distributed random variables whose distribution is not constrained to satisfy ( 1 .2.3). The existence of such an independent sequence is again guaranteed by Kolmogorov's theorem (see Problem 1 . 1 8). . • • 1 .2.4 (Branching Processes). There is a large class of processes, known as branching processes, which in their most general form have been applied with considerable success to the modelling of population growth (see for example lagers (1 976)). The simplest such process is the Bienayme­ Galton-Watson process defined by the equations X0 = x (the population size in generation zero) and ExAMPLE t = 0, 1, 2, 0 0 0 ' ( 1 .2.6) ,j are independently and identically where Z,,j, t = 0, I , . . . = 1 , 2, distributed non-negative integer-valued random variables, Z,,j, representing the number of offspring of the ph individual born in generation t. In the first example we were able to define X,(w) quite explicitly for each t and w. Very frequently however we may wish (or be forced) to specify instead the collection of all joint distributions of all finite-dimensional vectors (X, , , X,2, . . . , X,J, t = (t1, . . . , t" ) E T", n E {I, 2, . . . }. In such a case we need to be sure that a stochastic process (see Definition 1 .2. 1 ) with the specified distributions really does exist. Kolmogorov's theorem, which we state here and discuss further in Section 1.7 , guarantees that this is true under minimal conditions on the specified distribution functions. Our statement of Kolmo­ gorov' s theorem is simplified slightly by the assumption (Remark 1) that T is a subset of IR and hence a linearly ordered set. If T were not so ordered an additional "permutation" condition would be required (a statement and proof of the theorem for arbitrary T can be found in numerous books on probability theory, for example Lamperti, 1 966). § 1 .3. Stationarity and Strict Stationarity 11 Definition 1.2.3 (The Distribution Functions of a Stochastic Process {X� ' t E Tc !R}). Let 5be the set of all vectors { t = (t 1 , . . . , tn)' E Tn: t 1 < t 2 < · · · < tn , n = 1 , 2, . . . }. 
Then the (finite-dimensional) distribution functions of { X� ' t E T} are the functions { F1 ( ), t E 5} defined for t = (t 1 , , tn)' by • • • • Theorem 1.2.1 (Kolmogorov's Theorem). The probabi li tydi stri buti on functi ons { F1( ), t E 5} are the di stri buti on functi ons of some stochasti c process if and only if for any n E { 1 , 2, . . . }, t = (t 1, . . . , tn)' E 5 and 1 :-:::; i :-:::; n, • lim F1(x) = F1< ;>(x(i )) ( 1 .2.8) wheret (i ) and x(i ) are the (n - I )- component vectors obtai ned by d eleti ng the i'h components oft and x respecti vely. If (M · ) is the characteristic function corresponding to F1( ), i.e. tP1(u) = l e ;u·xF. (d x 1 , . ,. ,d xn ), J �n • U = (u 1 , . . . , u n )' E !Rn, then (1 .2.8) can be restated in the equivalent form, lim tP1 (u) = tPt(i) (u(i )), ui-+0 (1 .2.9) where u(i) is the (n - I )-component vector obtained by deleting the i 1h component of u. Condition ( 1 .2.8) is simply the "consistency" requirement that each function F1( · ) should have marginal distributions which coincide with the specified lower dimensional distribution functions. § 1 .3 Stationarity and Strict Stationarity When dealing with a finite number of random variables, it is often useful to compute the covariance matrix (see Section 1 .6) in order to gain insight into the dependence between them. For a time series {X1 , t E T} we need to extend the concept of covariance matrix to deal with infinite collections of random variables. The autocovariance function provides us with the required extension. Definition 1.3.1 (The Autocovariance Function). If { X,, t E T} is a process such that Var(X1) < oo for each t E T, then the autocovariance function Yx( · , · ) of { X1 } is defined by Yx (r, s) = Cov(X, X. ) = E [(X, - EX, ) (Xs - EX5)], r, s E T. ( 1 .3.1) 12 I . Stationary Time Series Definition 1.3.2 (Stationarity). The time series { X0 t E Z }, with index set Z = {0, ± 1 , ± 2, . . . }, is said to be stationary if (i) E I X11 2 < oo for all t E Z, (ii) EX1 = m (iii) Yx(r, s) = and for all t E £', Yx(r + t, s + t) for all r, s, t E £'. Remark I . Stationarity as just defined is frequently referred to in the literature as weak stationarity, covariance stationarity, stationarity in the wide sense or second-order stationarity. For us however the term stationarity, without further qualification, will always refer to the properties specified by Definition 1 .3.2. - Remark 2. If { X1, t E Z } is stationary then Yx(r, s) = Yx(r s, 0) for all r, s E £'. It is therefore convenient to redefine the autocovariance function of a stationary process as the function of just one variable, Yx(h) = Yx(h, 0) = Cov(Xr + h > X1) for all t, h E £'. The function YxC ) will be referred to as the autocovariance function of { X1} and Yx(h) as its value at "lag" h. The autocorrelation function (acf) of { X1} is defined analogously as the function whose value at lag h is Px(h) = Yx(h)!Yx(O) = Corr(Xr+h> X1) for all t, h E 7L. It will be noticed that we have defined stationarity only in the case when T = Z. It is not difficult to define stationarity using a more general index set, but for our purposes this will not be necessary. If we wish to model a set of data { X1, t E T c Z } as a realization of a stationary process, we can always consider it to be part of a realization of a stationary process { X1, t E Z }. Remark 3. Another important and frequently used notion of stationarity is introduced in the following definition. Definition 1.3.3 (Strict Stationarity). 
The time series { X0 t E Z } is said to be strictly stationary if the joint distributions of(X1, , , X1J and (X1, +h , . . . , Xr.+h)' are the same for all positive integers k and for all t 1, . . . , tk, h E £'. Strict stationarity means intuitively that the graphs over two equal-length time intervals of a realization of the time series should exhibit similar statistical characteristics. For example, the proportion of ordinates not exceeding a given level x should be roughly the same for both intervals. • • • 1 .3.3 is equivalent to the statement that (X 1, , Xk)' and (X l +h ' . . . , Xk+h)' have the same joint distribution for all positive integers k and integers h. Remark 4. Definition • • • § 1 .3. Stationarity and Strict Stationarity 13 The Relation Between Stationarity and Strict Stationarity If { X1 } is strictly stationary it immediately follows, on taking k = 1 in Definition 1.3.3, that X1 has the same distribution for each t E 7!.. . If E I X1I 2 < oo this implies in particular that EX1 and Var(X1) are both constant. Moreover, taking k = 2 in Definition 1 .3.3, we find that Xt+ h and X1 have the same joint distribution and hence the same covariance for all h E 7!.. . Thus a strictly stationary process with finite second moments is stationary. The converse of the previous statement is not true. For example if { X1 } is a sequence of independent random variables such that X1 is exponentially distributed with mean one when t is odd and normally distributed with mean one and variance one when t is even, then { X1} is stationary with Yx(O) = 1 and Yx(h) = 0 for h =F 0. However since X 1 and X2 have different distributions, { X1 } cannot be strictly stationary. There is one important case however in which stationarity does imply strict stationarity. Definition 1 .3.4 (Gaussian Time Series). The process { X1 } is a Gaussian time series if and only if the distribution functions of { X1} are all multivariate normal. If { Xn t E 7!.. } is a stationary Gaussian process then { X1 } is strictly stationary, since for all n E { 1 , 2, . . . } and for all h, t 1 , t 2 , E Z, the random vectors (X1, , , X1} and (X1, +h• . . . , X1" +h)' have the same mean and covariance matrix, and hence the same distribution. • • • . . • 1 .3. 1 . Let X1 = A cos(8t) + B sin(8t) where A and B are two uncor­ related random variables with zero means and unit variances with 8 E [ -n, n]. This time series is stationary since ExAMPLE Cov(Xr+h• X1) = Cov(A cos(8(t + h)) + B sin(8(t + h)), A cos(8t) + B sin(8t)) = cos(8t)cos(8(t + h)) + sin(8t)sin(8(t + h)) = cos(8h), which is independent of t. EXAMPLE 1 .3.2. Starting with an independent and identically distributed sequence of zero-mean random variables Z1 with finite variance ai , define XI = zl + ezt-1· Then the autocovariance function of XI is given by { Cov(Xt +h• XI) = Cov(Zt +h + ezt+h- 1 > zl + ezt- 1 ) (1 + 8 2 )al if h = 0, = if h = ± 1 , 8al if I hi > 1 , 0 I. Stationary Time Series 14 and hence { X1 } is stationary. In fact it can be shown that { X1 } is strictly stationary (see Problem 1 . 1 ). EXAMPLE 1 .3.3. Let {Y, if t is even, x�¥,+ 1 if t is odd. where { Y, } is a stationary time series. Although Cov(Xr+h• X1) not stationary for it does not have a constant mean. = = yy(h), { X1 } is 1 .3.4. Referring to Example 1 .2.3, let st be the random walk X 1 + X2 + · · · + X, where X 1, X2 , . . . , are independent and identically S1 distributed with mean zero and variance (J 2 . 
For h > 0, t t +h Cov(Sr+h • S1) Cov � X; , � Xj ; j EXAMPLE and thus = ( st is not stationary. = ) (J2 t Stationary processes play a crucial role in the analysis of time series. Of course many observed time series (see Section 1 . 1) are decidedly non­ stationary in appearance. Frequently such data sets can be transformed by the techniques described in Section 1 .4 into series which can reasonably be modelled as realizations of some stationary process. The theory of stationary processes (developed in later chapters) is then used for the analysis, fitting and prediction of the resulting series. In all of this the autocovariance function is a primary tool. Its properties will be discussed in Section 1.5. § 1 .4 The Estimation and Elimination of Trend and Seasonal Components The first step in the analysis of any time series is to plot the data. If there are apparent discontinuities in the series, such as a sudden change of level, it may be advisable to analyze the series by first breaking it into homogeneous segments. If there are outlying observations, they should be studied carefully to check whether there is any justification for discarding them (as for example if an observation has been recorded of some other process by mistake). Inspection of a graph may also suggest the possibility of representing the data as a realization of the process (the "classical decomposition" model), §1.4. The Estimation and Elimination of Trend and Seasonal Components X, = m, + s, + r;, 15 ( 1 .4. 1) where m , is a slowly changing function known as a "trend component", s, is a function with known period d referred to as a "seasonal component", and r; is a "random noise component" which is stationary in the sense of Definition 1 .3.2. If the seasonal and noise fluctuations appear to increase with the level of the process then a preliminary transformation of the data is often used to make the transformed data compatible with the model ( 1 .4. 1). See for example the airline passenger data, Figure 9.7, and the transformed data, Figure 9.8, obtained by applying a logarithmic transformation. In this section we shall discuss some useful techniques for identifying the components in ( 1 .4. 1). Our aim is to estimate and extract the deterministic components m , and s, in the hope that the residual or noise component r; will turn out to be a stationary random process. We can then use the theory of such processes to find a satisfactory probabilistic model for the process {I; }, to analyze its properties, and to use it in conjunction with m, and s, for purposes of prediction and control of {X,}. An alternative approach, developed extensively by Box and Jenkins ( 1970), is to apply difference operators repeatedly to the data { x,} until the differenced observations resemble a realization of some stationary process {Wr }. We can then use the theory of stationary processes for the modelling, analysis and prediction of {Wr } and hence of the original process. The various stages of this procedure will be discussed in detail in Chapters 8 and 9. The two approaches to trend and seasonality removal, (a) by estimation of m, and s, in ( 1 .4. 1 ) and (b) by differencing the data { x, }, will now be illustrated with reference to the data presented in Section 1 . 1 . Elimination of a Trend i n the Absence of Seasonality In the absence of a seasonal component the model ( 1 .4. 1 ) becomes t = 1, . . . , n where, without loss of generality, we can assume that EI; = 0. ( 1 .4.2) (Least Squares Estimation of m, ). 
In this procedure we attempt to fit a parametric family of functions, e.g. Method 1 ( 1 .4.3) to the data by choosing the parameters, in this illustration a0, a 1 and a 2 , to minimize ,L, (x, - m, f . Fitting a function of the form ( 1 .4.3) to the population data of Figure 1 .2, 1 790 :::::; t :::::; 1 980 gives the estimated parameter values, llo = 2.0979 1 1 X 1 0 1 0 , a1 = - 2.334962 x 107, 1 . Stationary Time Series 16 260 240 220 200 180 � Ul c 0 2- 1 60 1 40 0 1 20 0 1 00 80 60 40 20 0 1 78 0 1 98 0 1 930 188 0 1830 Figure 1 .7. Population of the U.S.A., 1 790- 1 980, showing the parabola fitted by least squares. and a2 = 6.49859 1 x 1 03. A graph of the fitted function is shown with the original data in Figure 1 .7. The estimated values of the noise process 1;, 1 790 $; t $; 1 980, are the residuals obtained by subtraction of m t = ao + a! t + llzt2 from xt. The trend component m1 furnishes us with a natural predictor of future values of X1 • For example if we estimate ¥1 990 by its mean value (i.e. zero) we obtain the estimate, m1 990 2.484 x 1 08 , = for the population of the U.S.A. in 1 990. However if the residuals { Yr} are highly correlated we may be able to use their values to give a better estimate of ¥1 990 and hence of X 1 990 . Method 2 (Smoothing by Means of a Moving Average). Let q be a non­ negative integer and consider the two-sided moving average, q w, = (2q + 1 )- 1 ( 1 .4.4) x+t j• j=-q of the process { X1 } defined by ( 1 .4.2). Then for q + 1 $; t $; n q, q q w, = (2q + 1 ) l 2: m+ t (2q + 1) - l 2: Yr+j j=-q j + j=-q ( 1 .4.5) L - - 17 § 1.4. The Estimation and Elimination of Trend and Seasonal Components assuming that m, is approximately linear over the interval [t - q, t + q] and that the average of the error terms over this interval is close to zero. The moving average thus provides us with the estimates m, = (2q + W1 j=L-q x,+ j, q q + 1 ::; t ::; - q. n ( 1 .4.6) Since is not observed for t ::; 0 or t > n we cannot use ( 1 .4.6) for t ::; q or t > n- q. The program SMOOTH deals with this problem by defining for t < 1 and n for t > n. The results of applying this program to the strike data of Figure 1.3 are shown in Figure 1 .8. The are shown in Figure 1 .9. As expected, estimated noise terms, Y, they show no apparent trend. For any fixed E [0, 1], the one-sided moving averages t = 1 , . . . , n, defined by the recursions, ( 1.4.7) t = 2, . . . , n, + (1 and ( 1 .4.8) can also be computed using the program SMOOTH. Application of ( 1 .4.7) and ( 1 .4.8) is often referred to as exponential smoothing, since it follows from a i + (1 these recursions that, for t :;:o: 2, , with weights decreasing expo­ weighted moving average of nentially (except for the last one). in ( 1 .4.6) as a process obtained from It is useful to think of by application of a linear operator or linear filter, with X, X,:= X 1 X,:= X = X, - m" a m, = aX, - a)m,_ 1, m,, jX,_ a)'- 1 X 1 , m, = i�� a(l a' L X,, X,_ 1, {m,} m, = L� - co ajx,+ j {X,} • • . 6 '";) 5 t:, 4 "1J <: 0 "' � 0 .r:: 3 2 +-������ 1975 1980 1955 1960 1965 1970 1950 Figure 1 .8. Simple 5-term moving average m, of the strike data from Figure 1 .3. I. Stationary Time Series 18 1 ,-------, 0.9 0.8 0.7 0.6 0.5 0.4 0.3 '-;;' 0.2 0 � :J 0 .r: 0.1 -g t:., 0 4-4---+-4-���+-��--���--r--�_,-+--+-� -0.1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1 1950 1955 Figure 1 .9. Residuals, Y, the strike data. 
= 1960 = x, - 1965 1970 1975 1980 m,, after subtracting the 5-term moving average from weights aj (2q + 1) - 1 , - q s j s q, and aj = 0, Ul > q. This particular filter is a "low-pass" filter since it takes the data { x,} and removes from it the rapidly fluctuating (or high frequency) component { Y,}, to leave the slowly varying estimated trend term { m,} (see Figure 1 . 1 0). {x,} Linea r filter Figure 1 . 1 0. Smoothing with a low-pass linear filter. The particular filter ( 1 .4.6) is only one of many which could be used for smoothing. For large q, provided (2q + 1 ) - 1 2J=-q Y,+i � 0, it will not only attenuate noise but at the same time will allow linear trend functions m, = at + b, to pass without distortion. However we must beware of choosing q to be too large since if m, is not linear, the filtered process, although smooth, will not be a good estimate of m,. By clever choice of the weights { aj} it is possible to design a filter which will not only be effective in attenuating noise from the data, but which will also allow a larger class of trend functions (for example all polynomials of degree less than or equal to 3) to pass undistorted through the filter. The Spencer 1 5-point moving average for example has weights ai = 0, I ii > 7, § 1 .4. The Estimation and Elimination of Trend and Seasonal Components with 19 I i i :,;; 7, and [a0, a1 , ... , a7 ] = 3 i0 [74, 67, 46, 2 1, 3, - 5, - 6, 3] Applied to the process ( 1 .4.3) with m, =a t 3 + bt2 + ct + d , it gives 7 7 7 a;Xt+i = a;mt+i + a; Yr+i i=-7 i= - 7 i= 7 7 � aimt+i' i=-7 - L L L . ( 1.4.9) L - =mo where the last step depends on the assumed form of m, (Problem 1 .2). Further details regarding this and other smoothing filters can be found in Kendall and Stuart, Volume 3, Chapter 46. Method 3 (Differencing to Generate Stationary Data). Instead of attempting to remove the noise by smoothing as in Method 2 , we now attempt to eliminate the trend term by differencing. We define the first difference operator V by ( 1 .4. 1 0) VX, =X,- Xt-1 =( 1 - B)X0 where B is the backward shift operator, ( 1 .4. 1 1) BX, =X,-1· Powers of the operators B and V are defined in the obvious way, i.e. Bj (X,) =X,_j and Vj(X,) =V(Vj-1(X,)),j � 1 with V0(X,) =X,. Polynomials in B and V are manipulated in precisely the same way as polynomial functions of real variables. For example =X,- 2X,_1 + X,_z. If the operator V is applied to a linear trend function m1 =at + b, then we obtain the constant function Vm, =a. In the same way any polynomial trend of degree can be reduced to a constant by application of the operator Vk (Problem 1 .4). Starting therefore with the model X, =m, + Yr where m, = J=o alj and Yr is stationary with mean zero, we obtain VkX, = ak + VkYr, k L k! a stationary process with mean k!ak. These considerations suggest the possibility, given any sequence {x,} of data, of applying the operator V repeatedly until we find a sequence {Vkx,} which can plausibly be modelled as a realization of a stationary process. It is often found in practice that the I. Stationary Time Series 20 20 15 10 � 1/l c 0 i � 5 0 -5 -10 -15 - 20 1 78 0 1 830 1 880 1 930 1 980 Figure 1 . 1 1 . The twice-differenced series derived from the population data of Figure 1 .2. order of differencing required is quite small, frequently one or two. (This depends on the fact that many functions can be well approximated, on an interval of finite length, by a polynomial of reasonably low degree.) Applying this technique to the twenty population values { xn, n = 1 , . . . 
, 20} of Figure 1 .2 we find that two differencing operations are sufficient to produce a series with no apparent trend. The differenced data, V2 xn = xn - 2xn - t xn- z , are plotted in Figure 1 . 1 1 . Notice that the magnitude of the fluctuations in V2 xn increase with the value of xn - This effect can be suppressed by first taking natural logarithms, Yn = In xn, and then applying the operator V 2 to the series { Yn } · (See also Section 9.2(a).) k + Elimination of both Trend and Seasonality The methods described for the removal of trend can be adapted in a natural way to eliminate both trend and seasonality in the general model ( 1 .4. 1 2) where E l; = 0, st + d = S1 and I1= t si = 0. We illustrate these methods, with reference to the accident data of Example 1 . 1 .6 (Figure 1 .6) for which the period d of the seasonal component is clearly 1 2. It will be convenient in Method 1 to index the data by year and month. = 1, . . . , 1 2 will denote the number ofaccidental deaths Thus xi. k> j = 1, . . . ,6,k § 1 .4. The Estimation and Elimination of Trend and Seasonal Components reported for the kth month of the/h year, ( 1 972 j = 1' . . . ' 6, 21 + j). I n other words we define k = 1 ' . . . ' 1 2. Method Sl (The Small Trend Method). If the trend is small (as in the accident data) it is not unreasonable to suppose that the trend term is constant, say mi , for the /h year. Since :Lf,:1 sk = 0, we are led to the natural unbiased estimate while for sk > k 1 12 mj = 1 2 I xj. k> k=l = ( 1 .4. 1 3) 1 , . . . , 1 2 we have the estimates, 1 6 ( 1 .4. 14) .sk = Il (xj.k - mJ , j= which automatically satisfy the requirement that :LI,:1 sk = 0. The estimated error term for month k of the /h year is of course -6 Y) , k = x.j, k - mJ - .sk ' j = 1, . . . , 6, k = 1 , . . . , 1 2. ( 1 .4. 1 5) The generalization of( 1 .4.1 3)-( 1 .4. 1 5) to data with seasonality having a period other than 1 2 should be apparent. In Figures 1 . 1 2, 1 . 1 3 and 1 . 1 4 we have plotted respectively the detrended observations xj, k - mi, the estimated seasonal components sk> and the de- 2 � Vl u c 8 :J 0 ..c: f- 0 1-����--����-+--+-----r--H�- -1 -2 0 12 24 36 48 60 72 Figure 1 . 1 2. Monthly accidental deaths from Figure 1 .6 after subtracting the trend estimated by Method S l . 1 . Stationary Time Series 22 2 " D c � �� 0 L ,t_ 0 4---�--¥+�--�--4---¥+�--�-,M+� - 1 -2 0 12 24 36 48 60 72 Figure 1 . 1 3. The seasonal component o f the monthly accidental deaths, estimated by Method S l . 2 Vl D c � :J 0 L ,t_ 0 �����--��rF��� -1 -2 0 12 24 36 48 60 72 Figure 1 . 1 4. The detrended and deseasonalized monthly accidental deaths (Method S l). § 1 .4. The Estimation and Elimination of Trend and Seasonal Components 12 23 .-------� 1 1 � (/) u c 0 (/) :J 0 .J: t:. 10 9 8 ��3EE·r··�·:·�9'*9I'IJCl'crTI:D 7 0 12 24 36 60 48 72 Figure 1 . 1 5. Comparison of the moving average and piecewise constant estimates of trend for the monthly accidental deaths. trended, deseasonalized observations �. k = xi. k - mi - sk . The latter have no apparent trend or seasonality. Method S2 (Moving Average Estimation). The following technique is preferable to Method S 1 since it does not rely on the assumption that mr is nearly constant over each cycle. It is the basis for the "classical decomposition" option in the time series identification section of the program PEST. Suppose we have observations {x 1 , . . . , x.}. 
The trend is first estimated by applying a moving average filter specially chosen to eliminate the seasonal component and to dampen the noise. If the period d is even, say d = 2q, then we use mt = (0.5Xr - q + Xr - q + l + ' ' ' + Xr+ q - 1 + 0.5Xr+ q )/d, q<ts n - q. ( 1 .4. 1 6) If the period is odd, say d = 2q + 1, then we use the simple moving average ( 1 .4.6). In Figure 1 . 1 5 we show the trend estimate mn 6 < t s 66, for the accidental deaths data obtained from ( 1 .4. 1 6). Also shown is the piecewise constant estimate obtained from Method S l . The second step is to estimate the seasonal component. For each k = 1 , . . . , d we compute the average wk of the deviations { (xk + id - mk +id) : q < k + jd s n - q}. Since these average deviations do not necessarily sum to zero, we 1 . Stationary Time Series 24 Table 1 . 1 . Estimated Seasonal Components for the Accidental Deaths Data k .�, ( Method S 1 ) .�, ( Method S2) - 744 - 804 2 3 4 5 6 7 8 9 10 11 12 - 1 504 - 1 522 - 724 - 737 - 523 - 526 338 343 808 746 1 665 1 680 96 1 987 - 87 - 1 09 197 258 - 32 1 - 259 - 67 - 57 k 1 , . . . , d, estimate the seasonal component sk as = ( 1 .4. 1 7) and sk = sk -d• > d. The deseasonalized data is then defined to be the original series with the estimated seasonal component removed, i.e. k t d, = x, - s, = ( 1 .4. 1 8) 1 , . . . , n. Finally we reestimate the trend from { d, } either by applying a moving average filter as described earlier for non-seasonal data, or by fitting a polynomial to the series { d, }. The program PEST allows the options of fitting a linear or quadratic trend m,. The estimated noise terms are then 5; = - m, x, - t s, , = 1, . . . , n. The results of applying Methods S l and S2 to the accidental deaths data are quite similar, since in this case the piecewise constant and moving average estimates of m, are reasonably close (see Figure 1 . 1 5). A comparison of the estimates of sk > = 1 , . . . , 1 2, obtained by Methods S 1 and S2 is made in Table 1 . 1 . k Method S3 (Differencing a t Lag d). The technique of differencing which we applied earlier to non-seasonal data can be adapted to deal with seasonality of period d by introducing the lag-d difference operator vd defined by (This operator should not be confused with the operator V earlier.) Applying the operator Vd to the model, X, = m, where { } has period d, we obtain d = d (1 .4. 1 9) ( 1 - B) defined + + Y,, s, s, which gives a decomposition of the difference vdxt into a trend component - m,_d ) and a noise term ( Y, - Y, - d). The trend, m, - m, _d, can then be eliminated using the methods already described, for example by application of some power of the operator V. Figure 1 . 1 6 shows the result of applying the operator V1 2 to the accidental (m, § 1 .5. The Autocovariance Function of a Stationary Process 25 2 � Vl 1J c 0 1-------+-��--���� �:J 0 .r: s - 1 -2 0 12 24 Figure 1. 16. The differenced series {V 1 2 x,, t accidental deaths {x,, t = ! , . . . , 72}. 36 = 48 60 72 1 3, . . . , 72} derived from the monthly deaths data. The seasonal component evident in Figure 1 .6 is absent from the graph of V 1 2 x, 1 3 :s:; t :s:; 72. There still appears to be a non-decreasing trend however. If we now apply the operator V to V 1 2 x, and plot the resulting differences VV 1 2 x,, t = 14, . . . , 72, we obtain the graph shown in Figure 1 . 1 7, which has no apparent trend or seasonal component. In Chapter 9 we shall show that the differenced series can in fact be well represented by a stationary time series model. 
In this section we have discussed a variety of methods for estimating and/or removing trend and seasonality. The particular method chosen for any given data set will depend on a number of factors including whether or not estimates of the components of the series are required and whether or not it appears that the data contains a seasonal component which does not vary with time. The program PEST allows two options, one which decomposes the series as described in Method S2, and the other which proceeds by successive differencing of the data as in Methods 3 and S3. § 1.5 The Autocovariance Function of a Stationary Process In this section we study the properties of the autocovariance function intro­ duced in Section 1 .3. 1 . Stationary Time Series 26 2 � Vl u c �� 0 .c f- 0 �----����+-� - 1 -2 24 12 0 36 48 60 72 Figure 1 . 1 7. The differenced series {VV 1 2 x,, t = 14, . . . , 7 2 } derived from the monthly accidental deaths { x, , t = 1, . . , 72}. . Proposition 1 .5.1 (Elementary Properties). If y( · ) is the autocovariance function of a stationary process { X, t E Z}, then y(O) :;::.: 0, ( 1.5. 1 ) l y(h) l :::;; y(O) for all h E Z, ( 1 .5.2) y(h) = y( - h) for all h E Z. (1.5.3) and y( · ) is even, i.e. PROOF. The first property is a statement of the obvious fact that Var(X,) :;::>: 0, the second is an immediate consequence of the Cauchy-Schwarz inequality, and the third is established by observing that y( - h) = Cov(X, _h , X,) = Cov(X, X,+ h ) = y(h). D Autocovariance functions also have the more subtle property of non­ negative definiteness. (Non-Negative Definiteness). A real-valued function on the integers, K : Z --> IR, is said to be non-negative definite if and only if Definition 1 .5.1 §1 .5. The Autocovariance Function of a Stationary Process 27 ( 1 .5.4) Li,jn=l a;K(t; - ti)ai � 0 for all positive integers n and for all vectors a (a 1 , . . . , a n Y E !Rn and (t 1, ... , tnY E zn or if and only if Li. i = 1 a; K(i - j)ai � 0 for all such n and a. t= = Theorem 1 .5.1 (Characterization of Autocovariance Functions). A real-valued function defined on the integers is the autocovariance function of a stationary time series if and only if it is even and non-negative definite. PROOF. To show that the autocovariance function y( · ) of any stationary time E series {X, } is non-negative definite, we simply observe that if = (a 1 , , !Rn , t = , n E zn , and Z1 = (X,, - EX,, , . . . , X,., - EX,J', then a (t 1, ... t )' = = rn [y(t; - ti)]i. i=l • • • anY a'rn a n L i,j=l a;y(t; - ti)ai, where = is the covariance matrix of (X, , , . . . , X,). To establish the converse, let K : Z --> IR be an even non-negative definite function. We need to show that there exists a stationary process with K( · ) as its autocovariance function, and for this we shall use Kolmogorov's theorem. For each positive integer n and each t = 1' . . E z n such that n < < · · · < let F1 be the distribution function on !R with characteristic function t 1 t2 (t ' tnY tn , . tP1(u) = exp( - u' Ku/2), n where u = Since K is non-negative . . . , un Y E !R and K = definite, the matrix K is also non-negative definite and consequently tPt is the characteristic function of an n-variate normal distribution with mean zero and covariance matrix K (see Section 1.6). Clearly, in the notation of Theorem 1 .2. 1 , [K(t;- ti)]i.i=I · (u 1 , tPt< ;>(u(i)) = lim tP1(u) for each t E Y, uc-·"" 0 i.e. 
the distribution functions F_t are consistent, and so by Kolmogorov's theorem there exists a time series {X_t} with distribution functions F_t and characteristic functions φ_t, t ∈ 𝒯. In particular the joint distribution of X_i and X_j is bivariate normal with mean 0 and covariance matrix

    [ κ(0)      κ(i − j) ]
    [ κ(i − j)  κ(0)     ],

which shows that Cov(X_i, X_j) = κ(i − j), as required.   □

Remark 1. As shown in the proof of Theorem 1.5.1, for every autocovariance function γ(·) there exists a stationary Gaussian time series with γ(·) as its autocovariance function.

Remark 2. To verify that a given function is non-negative definite it is sometimes simpler to specify a stationary process with the given autocovariance function than to check Definition 1.5.1. For example the function κ(h) = cos(θh), h ∈ ℤ, is the autocovariance function of the process in Example 1.3.1 and is therefore non-negative definite. Direct verification by means of Definition 1.5.1 however is more difficult. Another simple criterion for checking non-negative definiteness is Herglotz's theorem, which will be proved in Section 4.3.

Remark 3. An autocorrelation function ρ(·) has all the properties of an autocovariance function and satisfies the additional condition ρ(0) = 1.

EXAMPLE 1.5.1. Let us show that the real-valued function on ℤ,

    κ(h) = 1 if h = 0,
           ρ if h = ±1,
           0 otherwise,

is an autocovariance function if and only if |ρ| ≤ 1/2. If |ρ| ≤ 1/2 then κ(·) is the autocovariance function of the process defined in Example 1.3.2 with σ² = (1 + θ²)^{−1} and θ = (2ρ)^{−1}(1 ± √(1 − 4ρ²)). If ρ > 1/2, K = [κ(i − j)]_{i,j=1}^{n} and a is the n-component vector a = (1, −1, 1, −1, ...)', then

    a'Ka = n − 2(n − 1)ρ < 0 for n > 2ρ/(2ρ − 1),

which shows that κ(·) is not non-negative definite and therefore, by Theorem 1.5.1, is not an autocovariance function. If ρ < −1/2, the same argument using the n-component vector a = (1, 1, 1, ...)' again shows that κ(·) is not non-negative definite.

The Sample Autocovariance Function of an Observed Series

From the observations {x_1, x_2, ..., x_n} of a stationary time series {X_t} we frequently wish to estimate the autocovariance function γ(·) of the underlying process in order to gain information concerning its dependence structure. This is an important step towards constructing an appropriate mathematical model for the data. The estimate of γ(·) which we shall use is the sample autocovariance function.

Definition 1.5.2. The sample autocovariance function of {x_1, ..., x_n} is defined by

    γ̂(h) := n^{−1} Σ_{j=1}^{n−h} (x_{j+h} − x̄)(x_j − x̄),   0 ≤ h < n,

and γ̂(h) = γ̂(−h), −n < h ≤ 0, where x̄ is the sample mean x̄ = n^{−1} Σ_{i=1}^{n} x_i.

Remark 4. The divisor n is used rather than (n − h) since this ensures that the matrix Γ̂_n := [γ̂(i − j)]_{i,j=1}^{n} is non-negative definite (see Section 7.2).

Remark 5. The sample autocorrelation function is defined in terms of the sample autocovariance function as

    ρ̂(h) := γ̂(h)/γ̂(0),   |h| < n.

The corresponding matrix R̂_n := [ρ̂(i − j)]_{i,j=1}^{n} is then also non-negative definite.

Remark 6. The large-sample properties of the estimators γ̂(h) and ρ̂(h) are discussed in Chapter 7.

EXAMPLE 1.5.2. Figure 1.18(a) shows 300 simulated observations of the series X_t = Z_t + θZ_{t−1} of Example 1.3.2 with θ = 0.95 and Z_t ~ N(0, 1). Figure 1.18(b) shows the corresponding sample autocorrelation function at lags 0, ..., 40.
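Before examining Figures 1.18-1.20, it may help to see how the quantities of Definition 1.5.2 might be computed in practice. The following sketch evaluates γ̂(h) and ρ̂(h) for a simulated realization of the MA(1) series of Example 1.5.2; the use of Python with NumPy and the function names are assumptions of this note, not the book's ITSM/PEST software.

    import numpy as np

    def sample_acvf(x, h):
        # Sample autocovariance (Definition 1.5.2): note the divisor n, not n - h.
        x = np.asarray(x, dtype=float)
        n = len(x)
        h = abs(h)
        xbar = x.mean()
        return np.sum((x[h:] - xbar) * (x[:n - h] - xbar)) / n

    def sample_acf(x, max_lag):
        # Sample autocorrelation rho_hat(h) = gamma_hat(h)/gamma_hat(0), h = 0,...,max_lag.
        g0 = sample_acvf(x, 0)
        return np.array([sample_acvf(x, h) / g0 for h in range(max_lag + 1)])

    # Simulate the series of Example 1.5.2, X_t = Z_t + 0.95 Z_{t-1}, Z_t iid N(0,1):
    rng = np.random.default_rng(1)
    z = rng.standard_normal(301)
    x = z[1:] + 0.95 * z[:-1]              # 300 observations
    print(sample_acf(x, 5))                # rho_hat(1) should be roughly 0.4993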
Notice the similarity between ρ̂(·) and the function ρ(·) computed as described in Example 1.3.2 (ρ(h) = 1 for h = 0, 0.4993 for h = ±1, 0 otherwise).

EXAMPLE 1.5.3. Figures 1.19(a) and 1.19(b) show simulated observations and the corresponding sample autocorrelation function for the process X_t = Z_t + θZ_{t−1}, this time with θ = −0.95 and Z_t ~ N(0, 1). The similarity between ρ̂(·) and ρ(·) is again apparent.

Remark 7. Notice that the realization of Example 1.5.2 is less rapidly fluctuating than that of Example 1.5.3. This is to be expected from the two autocorrelation functions. Positive autocorrelation at lag 1 reflects a tendency for successive observations to lie on the same side of the mean, while negative autocorrelation at lag 1 reflects a tendency for successive observations to lie on opposite sides of the mean. Other properties of the sample-paths are also reflected in the autocorrelation (and sample autocorrelation) functions. For example the sample autocorrelation function of the Wolfer sunspot series (Figure 1.20) reflects the roughly periodic behaviour of the data (Figure 1.5).

Remark 8. The sample autocovariance and autocorrelation functions can be computed for any data set {x_1, ..., x_n} and are not restricted to realizations of a stationary process. For data containing a trend, |ρ̂(h)| will exhibit slow decay as h increases, and for data with a substantial deterministic periodic component, ρ̂(h) will exhibit similar behaviour with the same periodicity. Thus ρ̂(·) can be useful as an indicator of non-stationarity (see also Section 9.1).

Figure 1.18. (a) 300 observations of the series X_t = Z_t + 0.95Z_{t−1}, Example 1.5.2. (b) The sample autocorrelation function ρ̂(h), 0 ≤ h ≤ 40.

Figure 1.19. (a) 300 observations of the series X_t = Z_t − 0.95Z_{t−1}, Example 1.5.3. (b) The sample autocorrelation function ρ̂(h), 0 ≤ h ≤ 40.

Figure 1.20. The sample autocorrelation function of the Wolfer sunspot numbers (see Figure 1.5).

§1.6 The Multivariate Normal Distribution

An n-dimensional random vector is a column vector, X = (X_1, ..., X_n)', each of whose components is a random variable. If E|X_i| < ∞ for each i, then we define the mean or expected value of X to be the column vector

    EX = (EX_1, ..., EX_n)'.   (1.6.1)

In the same way we define the expected value of any array whose elements are random variables (e.g. a matrix of random variables) to be the same array with each random variable replaced by its expected value (assuming each expectation exists). If X = (X_1, ..., X_n)' and Y = (Y_1, ..., Y_m)' are random vectors such that E|X_i|² < ∞, i = 1, ..., n, and E|Y_i|² < ∞, i = 1, ..., m, we define the covariance matrix of X and Y to be the matrix

    Σ_XY = Cov(X, Y) = E[(X − EX)(Y − EY)'] = E(XY') − (EX)(EY)'.   (1.6.2)

The (i, j)-element of Σ_XY is the covariance Cov(X_i, Y_j) = E(X_i Y_j) − E(X_i)E(Y_j).
In the special case when Y = X, Cov(X, Y) reduces to the covariance matrix of X. §1 .6. The Multivariate Normal Distribution 33 Proposition 1 .6.1 . If a is an m-component column vector, B is an m x n matrix and X = (X1 , , Xn )' where E I X; I 2 < oo, i = 1, . . . , n, then the random vector, • • . Y = a + BX , ( 1 .6.3) EY = a + BEX, (1 .6.4) Lyy = BLxx B'. ( 1 .6.5) has mean and covariance matrix, PROOF. Problem 1 . 1 5. Proposition 1 .6.2. The covariance matrix Lxx is symmetric and non-negative definite, i.e. b' Lxx b 2': 0 for all b = (b1 , . . . , bn Y E �n . PROOF. The symmetry of Lxx is apparent from the definition. To prove non­ negative definiteness let b = (b 1 , , bn )' be an arbitrary vector in �n . Then by Proposition 1 .6. 1 • • . b'L xx b = Var(b'X) ;:o: 0. ( 1 .6.6) 0 Proposition 1 .6.3. Any symmetric, non-negative definite n x n matrix L can be written in the form ( 1 .6.7) L = PAP', 1 where P is an orthogonal matrix (i.e. P' = p- ) and A is a diagonal matrix A = diag()" 1 , . . . , ).n ) in which A 1 , , An are the eigenvalues (all non-negative) of L. • . . PROOF. This proposition is a standard result from matrix theory and for a proof we refer the reader to Graybill (1 983). We observe here only that if P;, i = 1, . . . , n, is a set of orthonormal right eigenvectors of L corresponding to the eigenvalues )" 1 , , An respectively, then P may be chosen as the n x n matrix whose i'h column is p;, i = 1 , . . . , n. D • • • Remark 1. Using the factorization ( 1 .6. 7) and the fact that det P = det P' = 1 , we immediately obtain the result, det L = Jc1 )" 2 . . . A..Definition 1 .6.1 (The Multivariate Normal Distribution). The random vector Y = ( Y1 , . . . , Y,)' is said to be multivariate normal, or to have a multivariate normal distribution, if and only if there exist a column vector a, a matrix B and a random vector X = (X 1 , , Xm )' with independent standard normal • • • 34 1 . Stationary Time Series components, such that Y = a + BX. Remark density 2. The components X 1 , • • • ( 1 .6.8) , Xm of X in ( 1 .6.8) must have the joint X = (x 1 , . . . , Xm )' E !Rm, ( 1.6.9) and corresponding characteristic function, ) ( </lx (u) = Eeiu'X = exp - t u]/2 , j= 1 Remark 3. It is clear from the definition that if Y has a multivariate normal distribution and if D is any k x n matrix and c any x 1 vector, then Z = c + DY is a k-component multivariate normal random vector. k Remark 4. If Y is multivariate normal with representation ( 1 .6.8), then by Proposition 1 .6. 1 , EY = a and �YY = BB'. Proposition 1 .6.4. If Y = ( Y1 , . . . , Y,)' is a multivariate normal random vector such that EY = J1 and �YY = �. then the characteristic function of Y is <jly (u) = exp(iu' Jl - !u'�u), If det � > 0 then Y has the density, n fy(y) = (2n) - i2 (det �) - 1i2 exp [ -!(Y - J1)'� - 1 (y - J1} ]. (1.6. 1 1) ( 1 .6. 1 2) PROOF. If Y is multivariate normal with representation ( 1 .6.8) then <jly(u) = E exp [ iu (a + BX)] = exp(iu'a) E exp(iu'BX). Using ( 1 .6. 1 0) with u (E !Rm) replaced by B'u (u E !Rn) in order to evaluate the last term, we obtain ' <f>v(u) = exp(iu'a)exp( -!u'BB'u), which reduces to ( 1 .6. 1 1 ) by Remark 4. If det � > 0, then by Proposition 1 .6.3 we have the factorization, � = PAP', where PP' = In, the n x n identity matrix, A = diag(A. 1 , . . . , A.n) and each ).i > 0. If we define A - 1 12 = diag(A.( 1 12 , . . . , A.;;- 112 ) and � - 1;2 = PA - 112 P', then it is easy to check that �- 112 ��-112 = ln. From Proposition 1 .6. 
1 and Remark 3 we conclude that the random vector §1 .6. The Multivariate Normal Distribution 35 ( 1 .6. 1 3) is multivariate normal with EZ = 0 and I:zz = l . Application of the result n ( 1 .6. 1 1 ) now shows that Z has the characteristic function lftz(u) = exp( - u'u/2), whence it follows that Z has the probability density ( 1 .6.9) with m = n. In view of the relation ( 1 .6. 1 3), the density of Y is given by fv (Y) = j det I: - l/2 1fz(I: - If2 (y - f.l)) = (det I:) -112 (2n)-nf2 exp [ - !(Y - f.l)'I: - 1 (y - f.l) ] as required. D Remark 5. The transformation ( 1 .6. 1 3) which maps Y into a vector of inde­ pendent standard normal random variables is clearly a generalization of the transformation Z = u- 1 ( ¥ - /1) which standardizes a single normal random variable with mean 11 and variance u 2 . Remark 6. Given any vector f.l E IRn and any symmetric non-negative definite n x n matrix I:, there exists a multivariate normal random vector with mean f.l and covariance matrix I:. To construct such a random vector from a vector X = (X 1 , . . . , XnY with independent standard normal components we simply choose a = f.l and B = 1: 112 in ( 1 .6.8), where 1: 112 , in the terminology of Proposition 1 .6.3, is the matrix PA 112 P' with A 112 = diag().il2 , . . . , 2�12 ). Remark 7. Proposition 1 .6.4 shows that a multivariate normal distribution is uniquely determined by its mean and covariance matrix. If Y is multivariate normal, EY = f.l and I:yy = I:, we shall therefore say that Y has the multi­ variate normal distribution with mean f.l and covariance matrix I:, or more succinctly, Y � N(f.l, I:). 1 .6. 1 (The Bivariate Normal Distribution). The random vector Y = ( ¥1 , Y2 )' is bivariate normal with mean f.l = (f1 1 , f1 2 )' and covariance matrix ExAMPLE (1 .6. 14) if and only if Y has the characteristic function (from ( 1 .6. 1 1)) lftv (u) = exp [i(u 1 f1 1 + U2 f12 ) - !Cui uf + 2u 1 u2 pU1 (J2 + u� ui)]. (1.6. 1 5) The parameters (J 1 , u2 and p are the standard deviations and correlation of the components Y1 and ¥2 . Since every symmetric non-negative definite 2 x 2 matrix can be written in the form ( 1 .6. 14), it follows that every bivariate normal random vector has a characteristic function of the form ( 1 .6. 1 5). If u1 i= 0, u2 i= 0 and - 1 < p < 1 then I: has an inverse, 36 I . Stationary Time Series (1 .6. 1 6) and so by ( 1 .6. 1 2), Y has the probability density, ( 1 .6. 1 7) Proposition 1.6.5. The random vector Y = ( Y1 , , Yn )' is multivariate normal with mean 11 and covariance matrix L if and only if for each a = (a 1 , , an )' E IRn, a'Y has a univariate normal distribution with mean a'11 and variance a' La. . • . . . • PROOF. The necessity of the condition has already been established. To prove the sufficiency we shall show that Y has the appropriate characteristic function. For any a E IRn we are assuming that a' Y � N (a'Jl, a'La), or equivalently that ( 1 .6. 1 8) E exp(ita'Y) = exp(ita' 11 - 1t 2 a'La). Setting t = 1 in ( 1 .6. 1 8) we obtain the required characteristic function ofY, viz. E exp(ia'Y) = exp(ia' Jl - 1a'La). 0 Another important property of multivariate normal distributions (one which we shall use heavily) is that all conditional distributions are again multivariate normal. In the following proposition we shall suppose that Y is partitioned into two subvectors, y( t ) y . 
y< 2 ) =[ J ] Correspondingly we can write the mean and covariance matrix of Y as 11 = where 11(i ) = [��::: ] 11 and L = [ Lt t L1 2 L2 1 L2 2 £yU> and L ii = E(Y< i > - ,.u> ) (y u> - Jl (j) )'. Proposition 1.6.6. (i) y( l J and Y( 2 ) are independent if and only if L 1 2 = 0. (ii) If det L 2 2 > 0 then the conditional distribution of Y( l l given y< 2 > is N(J1< 1 > + L 1 2 L zi (Y<2 > - 11<2 > ), L t t - L 1 2 L zi L 2 t l· PROOF. (i) If y( l> and Y< 2 > are independent, then L 1 2 E(Y< t > - 11( 1 l )E(Y< 2 > - 11( 2 > )' = 0. = Conversely if L 1 2 = 0 then the characteristic function lfov(u), as specified by § I . 7. * Applications of Kolmogorov's Theorem 37 Proposition 1 .6.4, factorizes into tPy(U) = tPyi 1>(U( l ))tPy<2>(U(Z)), establishing the independence of y< I > and Y( 2 ). (ii) If we define ( 1 .6. 1 9) then clearly so that X and y< z > are independent by (i). Using the relation ( 1 .6. 1 9) we can express the conditional characteristic function of y< l > given Y( 2 ) as E(exp(iu' Y< 1 > )j Y< 2 >) = E(exp [iu'X + iu'(J1< 1 > + L 1 2 L2� (Y< 2 > - Jl< 2 > ))] j Y< 2 >) 1 = exp [iu'(J1< > + L 1 2 L2� (Y< 2 > - J1< 2 > ))] E exp(iu' X j Y< 2 > ), where the last line is obtained by taking a factor dependent only on y< z > outside the conditional expectation. Now since X and Y( 2 ) are independent, E(exp(iu' X) j Y< 2 > ) = E exp(iu' X) = exp[ - �u'(L 1 1 - L 1 2 L z� L 2 du], so E(exp(iu' Y< 1 >)j Y( 2 )) 1 2 2 = exp [iu'(Jl< > + L 1 2 L2� (Y< > - Jl< > )) - �u'(L l l - L l z Lz� L z l )u], completing the proof. D ExAMPLE 1 .6.2. For the bivariate normal random vector Y discussed in Example 1 .6. 1 we immediately deduce from Proposition 1 .6.6 that Y1 and Y2 are independent if and only if p(J 1 (J2 = 0. If (J 1 > 0, (J2 > 0 and p > 0 then conditional on Y2 , Y1 is normal with mean 1 E(Y1 l Yz ) = !11 + P (J1 (Jz - ( Yz - Jl z ), and variance § 1 .7* Applications of Kolmogorov's Theorem In this section we illustrate the use of Theorem 1 .2. 1 to establish the existence of two important processes, Brownian motion and the Poisson process. Definition 1 .7.1 (Standard Brownian Motion). Standard Brownian motion starting at level zero is a process { B(t), t ?: 0} satisfying the conditions l. Stationary Time Series 38 (a) B(O) = 0, (b) B(t 2 ) - B(t 1 ), B(t 3 ) - B(t 2 ), . . . , B(tn) - B(tn_ 1 ), are independent for every n E { 3, 4, . . . } and every t = (t 1 , . . . , tn)' such that 0 � t 1 < t 2 < · · · < tn , (c) B(t) - B(s) N(O, t - s) for t ;;::: s. � To establish the existence of such a process we observe that conditions (a), (b) and (c) are satisfied if and only if, for every t (t 1 , . . . , tn)' such that 0 � t 1 < · · · < tn , the characteristic function of (B(t 1 ), . . . , B(tn)) is 1Pt{u) = E exp [iu 1 B(t d + + iunB(tn)J = · · · + iu2 (i1 1 + l12 ) + · · · + iun(L1 1 + · · · + L1n)] ( 1 .7.1) (where L1i = B(ti ) - B(ti_1 ), j ;;::: 1, and t0 = 0) = E exp [ii1 1 (u 1 + · · · + Un) + il1 2 (u 2 + · · · + un) + · · · + il1nunJ = exp [ - � I (ui + · · · + un)2 (ti - ti_ 1 ) J . 2 j=1 = E exp [iu 1 i1 1 It is trivial to check that the characteristic functions (M · ) satisfy the consistency condition (1 .2.9) and so by Kolmogorov' s theorem there exists a process with characteristic functions r/J1( · ), or equivalently with the properties (a), (b) and (c). Definition 1 .7.2 (Brownian Motion with Drift). 
Brownian motion with drift Jl, variance parameter rJ 2 and initial level x is process { Y(t), t ;;::: 0} where Y(t) = X + J.Lt + rJB(t), and B(t) is standard Brownian motion. The existence of Brownian motion with drift follows at once from that of standard Brownian motion. Definition 1 .7.3 (Poisson Process). A Poisson process with mean rate A. ( > 0) is a process { N(t), t ;;::: 0} satisfying the conditions (a) N(O) = 0, (b) N(t 2 ) - N(td, N(t 3 ) - N(t 2 ), , N(tn ) - N(tn_ 1 ), are independent for every n E { 3,4, . . . } and every t = (t 1 , . . . , tn)' such that O � t 1 < t 2 < · · · < tn , (c) N(t) - N(s) has the Poisson distribution with mean A.(t - s) for t ;;::: s. • • . The proof of the existence of a Poisson process follows precisely the same steps as the proof of the existence of standard Brownian motion. For the Poisson process however the characteristic function of the increment L1i = N(ti ) - N(ti - d is E exp(iu l1i ) = exp { - A.(ti - ti _ d ( l - e ;" )}. In fact the same proof establishes the existence of a process {Z(t), t ;;::: 0} Problems 39 satisfying conditions (a) and (b) of Definition 1 .7. 1 provided the increments L1i = Z(ti ) - Z(tj -t) have characteristic function of the form Problems 1. 1. Suppose that X, = Z, + IJZ, _ 1 , t 1, 2, . . . , where Z0, Z 1 , Z2 , . . . , are independent random variables, each with moment generating function E exp().Z;) m(A). (a) Express the joint moment generating function E exp(L7�1 A ; XJ in terms of the function m( · ). (b) Deduce from (a) that {X, } is strictly stationary. = = 1 .2. (a) Show that a linear filter { aJ passes an arbitrary polynomial of degree k without distortion, i.e. for all k'h degree polynomials m, = k c0 + c 1 t + · · · + ck t , if and only if (b) Show that the Spencer 1 5-point moving average filter { aJ does not distort a cubic trend. 1 .3. Suppose that m, (a) Show that = c0 + 2 c 1 t + c2 t , t 2 m, = I, a i mt+ i i= - 2 = = 0, ± 1, . . . . 3 I bi mr+ i' i= - 3 t = 0, ± 1, . . . , where a 2 a 2 f.; a 1 a_ 1 H, a0 H, and b3 b_ 3 -fr, b2 b_ 2 -fr, b 1 = b_! = fr, bo = k (b) Suppose that X, = m, + Z, where { Z,, t = 0, ± 1 , . . . } is an independent se­ quence of normal random variables, each with mean 0 and variance u 2 • Let U, = If� - 2 a ; Xr+ i and V, = If� - 3 b; Xr+ i · (i) Find the means and variances of U, and V,. (ii) Find the correlations between U, and U,+ 1 and between V, and V,+t · (iii) Which of the two filtered series { U, } and { V,} would you expect to be smoother in appearance? = = _ = - , = = = = = = 1 .4. If m, = I f� o ckt\ t = 0, ± 1 , . . . , show that Vm, is a polynomial of degree (p i n t and hence that VP + l m , 0. - 1) = 1 .5. Design a symmetric moving average filter which eliminates seasonal components with period 3 and which at the same time passes quadratic trend functions without distortion. 1 . Stationary Time Series 40 1 .6. (a) Use the programs WORD6 and PEST to plot the series with values 1 { x 1 , . . . , x30 } given by 1-10 486 474 434 441 435 401 414 414 386 405 1 1-20 4 1 1 389 414 426 4 1 0 441 459 449 486 5 1 0 21-30 506 549 579 5 8 1 630 666 674 729 7 7 1 785 This series is the sum of a quadratic trend and a period-three seasonal component. (b) Apply the filter found in Problem 1.5 to the preceding series and plot the result. Comment on the result. 1 .7. Let Z,, t = 0, ± 1, . . . , be independent normal random variables each with mean 0 and variance a 2 and let a, b and c be constants. Which, if any, of the following processes are stationary? 
For each stationary process specify the mean and autocovariance function. (a) X, = a + bZ, + cZ,_ 1 , (b) X, = a + bZ0 , (d) X, = Z0 cos(ct), (c) X, = Z 1 cos(ct) + Z2 sin(ct), (e) X, = Z,cos(ct) + Z,_ , sin(ct), (f) X, = Z,Z,_ 1 . 1 .8. Let { Y, } be a stationary process with mean zero and let a and b be constants. (a) If X, = a + bt + s, + Y, where s, is a seasonal component with period 1 2, show that VV 1 2 X, = ( 1 - 8) (1 - B 1 2 ) X, is stationary. (b) If X, = (a + bt) s, + Y, where s, is again a seasonal component with period 1 2, show that V�2 X, = ( 1 - 8 1 2 ) ( 1 - B 1 2 ) X, is stationary. 1 .9. Use the program PEST to analyze the accidental deaths data by "classical de­ composition". (a) Plot the data. (b) Find estimates s, t = 1, . . . , 1 2, for the classical decomposition model, X, = m, + s, + Y,, where s, = s, + 1 2 , z)�1 s, 0 and E Y, = 0. (c) Plot the deseasonalized data, X, - s, t = 1, . . . , 72. (d) Fit a parabola by least squares to the deseasonalized data and use it as your estimate m, of m, . (e) Plot the residuals Y, = X, - m, - .�,, t = 1, . . . , 72. (f) Compute the sample autocorrelation function of the residuals p(h), h = = 0, . . . , 20. (g) Use your fitted model to predict X,, t = 73, . . . , 84 (using predicted noise values of zero). 1 . 1 0. Let X, = a + bt + Y,, where { Y,, t = 0, ± 1, . . . } is an independent and identically distributed sequence of random variables with mean 0 and variance a 2 , and a and b are constants. Define W, = (2q + q W' l: X,+j · j= q - Compute the mean and autocovariance function of { W, } . Notice that although { W, } is not stationary, its autocovariance function y(t + h, t) = Cov( W,+h, W, ) does n o t depend on t. Plot the autocorrelation function p(h) = Corr(W,+ h , W,). Discuss your results in relation to the smoothing of a time series. Problems 41 1.1 1. If {X,} and { Y,} are uncorrelated stationary sequences, i.e. if X, and Y, are uncorrelated for every s and t, show that {X, + Y,} is stationary with auto­ covariance function equal to the sum of the autocovariance functions of {X,} and { Y,}. 1 . 1 2. Which, if any, of the following functions defined on the integers is the autocovariance function of a stationary time series? { 1 if h = 0, 1 /h if h =I 0. nh nh (c) f(h) = 1 + cos T + cos 4 (a) f(h) = (e) f(h) = 1 . 1 3. { (b) f(h) = ( � 1 )ihl { nh nh (d) f(h) = 1 + cos - � cos 4 2 1 if h = 0, (f) .f(h) = .6 if h = ± 1 , 0 otherwise. 1 if h 0, .4 if h = ± 1 , 0 otherwise. = Let {S, t = 0, 1 , 2, . . . } be the random walk with constant drift S0 = 0 and s, = 11 + s,_ 1 + x, f1, defined by t = 1 , 2, . . . ' where X 1 , X 2 , . . . are independent and identically distributed random variables with mean 0 and variance (J2 . Compute the mean of S, and the autocovariance function of the process { S, }. Show that {VS, } is stationary and compute its mean and autocovariance function. 1 . 1 4. 1 . 1 5. 1 . 1 6. If X, = a + bt, t = 1 , 2, . . . , n, where a and b are constants, show that the sample autocorrelations have the property p(k) --> 1 as n --> oo for each fixed k. Prove Proposition 1 . 6. 1 . (a) If Z N (O, 1 ) show that Z 2 has moment generating function Ee'z' = ( 1 � 2tf 1 12 for t < !, thus showing that Z2 has the chi-squared distribution with 1 degree of freedom. (b) If Z 1 , . . . , Z" are independent N(O, 1) random variables, prove that Zl + · · · + z; has the chi-squared distribution with n degrees of freedom by showing that its moment generating function is equal to (1 � 2tf"12 for t < !. 
(c) Suppose that X = (X 1 , . . . , X")' N(Jl, L) with L non-singular. Using ( 1 .6. 1 3), show that (X � Jl)'L - 1 (X � Jl) has the chi-squared distribution with n degrees of freedom. � � 1 . 1 7. If X = (X 1 , . . . , X")' is a random vector with covariance matrix L, show that L is singular if and only if there exists a non-zero vector b = (b 1 , , b")' E IR" such that Var(b'X) = 0. • . . 1 . 1 8.* Let F be any distribution function, let T be the index set T { 1 , 2, 3, . . . } and let Y be as in Definition 1 .2.3. Show that the functions F1, t E Y, defined by = F,, ... 1Jx 1 , • • • , xn) : = F(x J l · · · F(xn), X 1 , . . . , Xn E !R, constitute a family of distribution functions, consistent in the sense of ( 1 .2.8). By Kolmogorov's theorem this establishes that there exists a sequence of inde­ pendent random variables {X 1 , X 2, } defined on some probability space and such that P(X; :o:; x) = F(x) for all i and for all x E IR. . • . CHAPTER 2 Hilbert Spaces Although it is possible to study time series analysis without explicit use of Hilbert space terminology and techniques, there are great advantages to be gained from a Hilbert space formulation. These are largely derived from our familiarity with two- and three-dimensional Euclidean geometry and in par­ ticular with the concepts of orthogonality and orthogonal projections in these spaces. These concepts, appropriately extended to infinite-dimensional Hilbert spaces, play a central role in the study of random variables with finite second moments and especially in the theory of prediction of stationary processes. Intuition gained from Euclidean geometry can often be used to make apparently complicated algebraic results in time series analysis geometrically obvious. It frequently serves also as a valuable guide in the development and construction of algorithms. This chapter is therefore devoted to a study of those aspects of Hilbert space theory which are needed for a geometric understanding of the later chapters in this book. The results developed here will also provide an adequate background for a geometric approach to many other areas of statistics, for example the general linear model (see Section 2.6). For the reader who wishes to go deeper into the theory of Hilbert space we recommend the book by Simmons ( 1 963). §2. 1 Inner-Product Spaces and Their Properties Definition 2.1.1 (Inner-Product Space). A complex vector space Yf is said to be an inner-product space if for each pair of elements x and y in Yf, there is a complex number <x, y), called the inner product of x and y, such that §2. 1. Inner-Product Spaces and Their Properties (a) (b) (c) (d) (e) 43 <x, y) = <y, x ) , the bar denoting complex conjugation, <x + y, z) = <x, z) + <y, z) for all x, y, z E Yf, < o:x, y) = o: <x, y) for all x , y E Yf and o: E IC, <x, x) ;:::: 0 for all x E Yf, <x , x) = 0 if and only if x = 0. Remark I. A real vector space Yf is an inner-product space if for each x, y E Yf there exists a real number <x, y) satisfying conditions (a)-(e). Of course condition (a) reduces in this case to <x, y) = <y, x ). Remark 2. The inner product is a natural generalization of the inner or scalar product of two vectors in n-dimensional Euclidean space. Since many of the properties of Euclidean space carry over to inner-product spaces, it will be helpful to keep Euclidean space in mind in all that follows. ExAMPLE 2. 1 . 1 (Euclidean Space). The set of all column vectors is a real inner-product space if we define k <x, y ) = L X;Yii=t (2. 1 . 1) Equation (2. 1 . 
1 ) defines the usual scalar product of elements of IRk . It is a simple matter to check that the conditions (a)-(e) are all satisfied. In the same way it is easy to see that the set of all complex k-dimensional column vectors z = (z 1 , • • • , zk )' E IC k is a complex inner-product space if we define k < w, z) = L W;Z; . i= l Definition 2.1 .2 (Norm). is defined to be (2. 1 .2) The norm of an element x of an inner-product space l l x ii = FxA. (2. 1 .3) In the Euclidean space [Rk the norm of the vector is simply its length, l l x ll = ( L�= l x f ) 112· The Cauchy-Schwarz Inequality. If Yf is an inner-product space, then l <x, y ) l and :o;; ll x ii i i Y II for all x, y E Yf, l <x, y) l = ll x ii i i Y II if and only if x = y<x, y)f<y, y). (2. 1 .4) (2. 1 .5) 2. Hilbert Spaces 44 PROOF. The following proof for complex :tf remains valid (although it could be slightly simplified) in the case when :tf is real. Let a = IIYII 2 , b = I <x, y) I and c = ll x f. The polar representation of <x, y) is then <x, y) = be ;o for some O E ( - n, n] . Now for all r E IR, <x - rei8 y, x - re i 8 y) <x, x) - rei0 <y, x) - re - w <x, y) + r 2 < y, y) (2. 1 .6) = c - 2rb r 2 a, = + and using elementary calculus, we deduce from this that 0 � min (c - 2rb + r 2 a) = c - b 2 1a, r E IR thus establishing (2. 1 .4). The minimum value, c - b 2la, of c - 2rb + r2 a is achieved when r = bla. If equality is achieved in (2. 1 .4) then c - b 2 Ia = 0. Setting r = bIa in (2. 1 .6) we then obtain <x - yei0 bla, x - ye i8 bla) = 0, which, by property (e) of inner products, implies that x = yei8 bla = y<x, y>l< y, y). Conversely if x = y <x, y)l< y, y) (or equivalently if x is any scalar multiple of y), it is obvious that there is equality in (2. 1 .4). D EXAMPLE 2. 1 .2 (The Angle between Elements of a Real Inner-Product Space). In the inner-product space IR3 of Example 2. 1 . 1 , the angle between two non-zero vectors x and y is the angle in [0, n] whose cosine is L f= I X; Y; / ( l l x l l ll y ll ). Analogously we define the angle between non-zero elements x and y of any real inner-product space to be (2. 1 .7) e = cos - 1 [<x, y)l( ll x ii iiYII )]. In particular x and y are said to be orthogonal if and only if <x, y) = 0. For non-zero vectors x and y this is equivalent to the statement that 0 = nl2. The Triangle Inequality. If :tf is an inner-product space, then ll x + Yll � l l x ll PROOF. ll x + Y l l 2 = + II Y II for all x, y E :tf. (2. 1 .8) <x + y, x + y) <x, x) + <x, y) + <y, x) + < y, y) � ll x l l 2 + 2 11x ii iiYII + ll y f = by the Cauchy-Schwarz inequality. D §2. 1 . Inner-Product Spaces and Their Properties 45 Proposition 2.1 . 1 (Properties of the Norm). If Yf is a complex (respectively real) inner-product space and II x I I is defined as in (2. 1 .3), then (a) (b) (c) (d) + l l x Y l l ::;; l l x l l + IIYII for all x, y E Yf, for all x E Yf and all a E C (a E IR), ll ax ll l a l ll x ll for all x E Yf, ll x ll � 0 if and only if x = 0. ll xll = 0 = (These properties justify the use of the terminology "norm" for llxll .) PROOF. The first property is a restatement of the triangle inequality and the others follow at once from Definition (2. 1 .3). 0 The Parallelogram Law. If Yf is an inner-product space, then llx Y ll 2 llx - Y ll 2 = 2 11xll 2 + 2 11Y II 2 for all x, y E Yf. + + (2. 1 .9) PROOF. Problem 2. 1 . Note that (2. 1 .9) is not a consequence of the properties (a), (b), (c) and (d) of the norm. It depends on the particular form (2. 
1 .3) of the norm as defined for elements of an inner-product space. D Definition 2.1 .3 (Convergence in Norm). A sequence { x., n = 1 , 2, . . . } of ele­ ments of an inner-product space Yf is said to converge in norm to x E Yf if ll x. - x ll -+ 0 as n -+ oo. Proposition 2. 1 .2 (Continuity of the Inner Product). If { x. } and { y. } are sequences of elements of the inner-product space Yf such that llx. - x ll -+ 0 and llYn - Y ll -+ 0 where x, y E Yf, then (a) llx. ll -+ llxll and PROOF. From the triangle inequality it follows that l l x l l ::;; ll x - Yll + IIYII and I IYII ::;; I I Y - x ll ll x ll . These statements imply that (2. 1 . 10) ll x - Y ll � \ ll x ll - II Y II \ , + from which (a) follows immediately. Now l <x. , y. ) - <x, y) l l <x., y. - y) + <x. - x, y) l ::;; l <x., y. - y) l + l <x. - x, y) l = ::;; llx. II I I Yn - Y l l + llx. - x ii iiYII , where the last line follows from the Cauchy-Schwarz inequality. Observing from (a) that llx. l l -+ ll x ll , we conclude that D 2. Hilbert Spaces 46 §2.2 Hilbert Spaces An inner-product space with the additional property of completeness is called a Hilbert space. To define completeness we first need the concept of a Cauchy sequence. Definition 2.2.1 (Cauchy Sequence). A sequence { xn , n = 1, 2, . . . } of elements of an inner-product space is said to be a Cauchy sequence if ll xn - xm ll � 0 as m, n � oo , i.e. if for every 6 > 0 there exists a positive integer N(6) such that l l xn - xm l l < 6 for all m, n > N(6). Definition 2.2.2 (Hilbert Space). A Hilbert space Yf is an inner-product space which is complete, i.e. an inner-product space in which every Cauchy sequence { xn } converges in norm to some element x E Yf. 2.2. 1 (Euclidean Space). The completeness of the inner-product space IRk defined in Example 2. 1. 1 can be verified as follows. If xn = (xnl , Xn 2 , . . . , Xnk )' E IRk satisfies k l l xn - xm l l 2 = L l xni - xm Y � 0 as m, n � oo , i =l then each of the components must satisfy EXAMPLE By the completeness of IR, there exists x i E IR such that l xn i - xd � 0 as n � oo , and hence if x = (x 1 , . . . , xd, then ll xn - x ll � 0 as n � oo . Completeness of the complex inner-product space Ck can be checked in the same way. Thus IRk and C k are both Hilbert spaces. (The Space L 2 (0., .'F, P)). Consider a probability space (0., .'F, P) and the collection C of all random variables X defined on 0. and satisfying the condition, ExAMPLE 2.2.2 EX 2 = fn X (w)2 P(dw) < oo. With the usual notion of multiplication by a real scalar and addition of random variables, it is clear that C is a vector space since E(aX) 2 = a 2 EX 2 < oo for all a E IR and X E C, §2.2. Hilbert Spaces 47 and, from the inequality (X + Y) 2 ::;; 2X 2 + 2 Y 2 , E(X + Y)2 ::;; 2EX 2 + 2E Y 2 < oo for all X, YE C. The other properties required of a vector space are easily checked. In particular C has a zero element, the random variable which is identically zero on Q. For any two elements X, Y E C we now define < X, Y) = E(X Y). (2.2. 1 ) I t i s easy t o check that <X, Y ) satisfies all the properties of a n inner product except for the last. If <X, X) = 0 then it does not follow that X(w) = 0 for all co, but only that P(X = 0) = 1. This difficulty is circumvented by saying that the random variables X and Y are equivalent if P(X = Y) = 1 . This equivalence relation partitions C into classes of random variables such that any two random variables in the same class are equal with probability one. 
The space U (or more specifically L2(Q, ff, P)) is the collection of these equivalence classes with inner product defined by (2.2. 1 ). Since each class is uniquely determined by specifying any one of the random variables in it, we shall continue to use the notation X, Y for elements of L2 and to call them random variables (or functions) although it is sometimes important to remember that X stands for the collection of all random variables which are equivalent to X. Norm convergence of a sequence {Xn } of elements of L2 to the limit X means Norm convergence of X" to X in an L 2 space is called mean-square con­ vergence and is written as X" � X. To complete the proof that U is a Hilbert space we need to establish completeness, i.e. that if II xm - xn 11 2 --+ 0 as m, n --+ oo , then there exists X E L 2 such that X" � X. This is indeed true but not so easy to prove as the completeness of IRk . We therefore defer the proof to Section 2. 10. ExAMPLE 2.2.3 (Complex L 2 Spaces). The space of complex-valued ran­ dom variables X on (Q, ff, P) satisfying E I X I 2 < oo is a complex Hilbert space if we define an inner product by <X, Y) = E(X Y). (2.2.2) In fact if J1 is any finite non-zero measure on the measurable space (Q, ff), and if D is the class of complex-valued functions on Q such that In 1/1 2 dj1 < 00 (2.2.3) (with identification of functions f and g such that Jn I f - g l 2 dJ1 = 0), then D becomes a Hilbert space if we define the inner product to be 48 2. Hilbert Spaces ( f, g) = L/g d,u. (2.2.4) This space will be referred to as the complex Hilbert space L 2 (D., %, ,u). (The real Hilbert space L 2 (D., %, ,u) is obtained if D is replaced by the real-valued functions satisfying (2.2.3). The definition of (f, g) then reduces to Sn fg d,u.) Remark 1 . The terms L 2 (D., %, P) and U(D., %, ,u) will be reserved for the respective real Hilbert spaces unless we state specifically that reference is being made to the corresponding complex spaces. (Norm Convergence and the Cauchy Criterion). If {x n } is a sequence of elements belonging to a Hilbert space Yf, then {x n } converges in norm if and only if l l xn - xm l l -> 0 as m, n -> oo . Proposition 2.2.1 PROOF. The sufficiency of the Cauchy criterion is simply a restatement of the completeness of Yf. The necessity is an elementary consequence of the triangle inequality. Thus if ll xn - x l l -> 0, llxn - xmll ::;:; ll xn - x ll + ll x - xm ll -> 0 as m, n -> 0. 0 EXAMPLE 2.2.4. The Cauchy criterion is used primarily in checking for the norm convergence of a sequence whose limit is not specified. Consider for example the sequence n sn = L a;X; (2.2.5) i=l where {X; } is a sequence of independent N (0, 1 ) random variables. It is easy to see that with the usual definition of the L 2 -norm, m m > n, II Sm - Sn ll 2 = L a?, i=n+l and so by the Cauchy criterion { S" } has a mean-square limit if and only if for every e > 0, there exists N(e) > 0 such that L7'= n+ l a? < e for m > n > N(e). Thus {Sn } converges in mean square if and only if :L � 1 a? < oo . §2.3 The Projection Theorem We begin this section with two examples which illustrate the use of the projection theorem in particular Hilbert spaces. The general result is then established as Theorem 2.3. 1 . EXAMPLE 2.3. 1 (Linear Approximation in IR3). Suppose we are given three vectors in IR3, 49 §2.3. The Projection Theorem z Figure 2. 1 . The best linear approximation y = rx 1 x 1 + rx 2 x 2 , to y. 
(i , t, 1)' , y = XI = ( 1 , 0, ty, X2 = (0, 1 , i)', Our problem is to find the linear combination y = rx 1 x 1 + rx 2 x 2 which is closest to y in the sense that S = II Y - rx 1 x 1 - rx 2 x 2 ll 2 is minimized. One approach to this problem is to write S in the form S = (i rx 1 )2 + (i - rx 2 )2 + (1 - irx 1 - irx 2 ) 2 and then to use calculus to minimize with respect to rx 1 and rx 2 . In the alternative geometric approach to the problem we observe that the required vector y = rx 1 x 1 + rx 2 x 2 is the vector in the plane determined by x 1 and x 2 such that y - rx 1 x 1 - rx 2 x 2 is orthogonal to the plane of x 1 and x 2 (see Figure 2. 1 ). The orthogonality condition may be stated as (2.3.1) i = 1 , 2, or equivalently IX 1 ( x 1 , X 1 ) + IX z ( x 2 , X 1 ) = ( y, X 1 ), rx 1 ( x 1 , x 2 ) + rx 2 ( x 2 , x 2 ) = ( y, x 2 ). - For the particular vectors x 1 , x 2 and y specified, these equations become ��rx 1 + /6 rx z /6 rx l + ��rx z from which we deduce that rx 1 = rx 2 = = = �, �, �' and y = (�, t �)' . 2.3.2 (Linear Approximation in L 2 (0., ff, P)). Now suppose that X 1 , X2 and Y are random variables in L 2 (0., ff, P). If only X 1 and X2 are observed we may wish to estimate the value of Y by using the linear combina­ tion Y = rx 1 X 1 + rx 2 X2 which minimizes the mean squared error, S = E l Y - rx 1 X 1 - rx 2 X2 1 2 = II Y - rx 1 X 1 - rx 2 X2 I I 2 . ExAMPLE As in Example 2.3.1 there are at least two possible approaches to this problem. The first is to write 50 2. Hilbert Spaces S = E Y 2 + af EXf + rx� EXi - 2rx 1 E(YX1 ) - 2rx 2 E(YX2 ) + rx 1 rx 2 E(X 1 X2 ), and then to minimize with respect to rx 1 and rx 2 by setting the appropriate derivatives equal to zero. However it is also possible to use the same geometric approach as in Example 2.3. 1 . Our aim is to find an element Y in the set {X E L 2 (Q, g;-, P) : X a 1 X 1 + a 2 X2 for some a 1 , a 2 E 1R}, whose "squared distance" from Y, I I Y - Yl l 2 , is as small as possible. By analogy with Example 2.3.1 we might expect Y to have the property that Y - Y is orthogonal to all elements of A. The validity of this analogy, and the extent to which it may be applied in more general situations, is established in Theorem 2.3 . 1 (the projection theorem). Applying it to our present problem, we can write A = = ( Y - rx 1 X1 - rx 2 X2 , X) = 0 for all X E A, or equivalently, by the linearity of the inner product, i = 1, 2. (2.3.2) (2.3.3) These are the same equations for rx 1 and rx 2 as (2.3. 1 ), although the inner product is of course defined differently in (2.3.3). In terms of expectations we can rewrite (2.3.3) in the form rx 1 E(Xl) + rx 2 E(X2 X d = E ( YXd, rx 1 E(X 1 X2 ) + rx 2 E(XI) = E( YX2 ), from which rx1 and rx 2 are easily found. Before establishing the projection theorem for a general Hilbert space we need to introduce a certain amount of new terminology. Definition 2.3.1 (Closed Subspace). A linear subspace A of a Hilbert space :ff is said to be a closed subspace of :ff if A contains all of its limit points (i.e. if x" E .4i and ll xn - x ll --> 0 imply that x E A). Definition 2.3.2 (Orthogonal Complement). The orthogonal complement of a subset A of :ff is defined to be the set Al l_ of all elements of :ff which are orthogonal to every element of A. Thus x E Al l_ if and only if (x, y) Proposition 2.3.1. subspace of£. = 0 (written x .l y) for all y E A. (2.3.4) If A is any subset of a Hilbert space :ff then Al l_ is a closed PROOF. 
It is easy to check from (2.3.4) that D E Al_ and that if x 1 , x 2 E A/ j_ then all linear combinations of x 1 and x 2 belong to Al l_ . Hence Al l_ is a subspace of £. If x" E Al l_ and l l xn - x ll --> 0, then by continuity of the inner product ( Proposition 2. 1 .2) (x, y) = 0 for all y E A, so x E Aj_ and hence Al l_ is closed. D , §2.3. The Projection Theorem 51 (The Projection Theorem). If A is a closed subspace of the Hilbert space Yf' and x E £, then Theorem 2.3.1 (i) there is a unique element .X E A such that l l x - .X II = inf llx - Y ll , and (ii) x E A and llx - x ll = infy e .1t ll x - y l l (x - x) E A_i. (2.3.5) if and only if x E A and [ The element x is called the (orthogonal) projection of x onto A.] PROOF. (i) If d = infye A ll x - yll 2 then there is a sequence { Yn } of elements of A such that llYn - x 11 2 -> d. Apply the parallelogram law (2. 1 .9), and using the fact that ( Ym + Yn )/2 E A, we can write 0 ::;; IIYm - Yn ll 2 = - 4 11 ( Ym + Yn )/2 - x ll 2 + 2( 11Yn - x l l 2 + IIYm - x l l 2 ) ::;; - 4d + 2( 11Yn - x ll 2 + IIYm - x l l 2 ) -> 0 a s m, n -> oo . Consequently, by the Cauchy criterion, there exists .X E Yf' such that II Yn - .X II -> 0. Since A is closed we know that .X E A, and by continuity of the inner product ll x - x l l 2 = lim llx - Yn ll 2 = d. To establish uniqueness, suppose that y E A and that llx - Pll 2 = ll x - x ll 2 = d. Then, applying the parallelogram law again, o ::;; l l x - P ll 2 = - 4 11 (x + P )/2 - x ll 2 + 2( 11 x - x ll 2 + li P - x ll 2 ) ::;; - 4d + 4d = 0. Hence y = x. (ii) If x E A and (x - x) E A _l then .X is the unique element of A defined in (i) since for any y E A, llx - Yll 2 = ( x - x + x - y, x - .X + x - y) = llx - x ll 2 + ll x - Y ll 2 ;:o: ll x - x ll 2 , with equality if and only if y = x. Conversely if x E .� and (x - x) ¢ A_l then x is not the element of A closest to x since x = x + ay/ I IYI I 2 IS closer, where y IS any element of A such that ( x - .X, y) =I= 0 and 2. Hilbert Spaces 52 a = <x - x, y). To see this we write l l x - x ll 2 = <x - x + x - x, x - x + x - .X ) = ll x - x ll 2 + l a i 2/ IIYII 2 + 2 Re <x - X, X - .X ) = ll x - x ll 2 - l a i 2 / IIY I I 2 < ll x - x ll 2 . D Corollary 2.3.1 (The Projection Mapping of Yf onto At). If At is a closed subspace of the Hilbert space Yf and I is the identity mapping on Yf, then there is a unique mapping PH of Yf onto At such that I - P.41 maps Yf onto At j_ . Pn is called the projection mapping of Yf onto At. PROOF. By Theorem 2.3. 1 , for each x E Yf there is a unique x E AI such that x - x E At j_. The required mapping is therefore P.41x = x, X E Yf. (2.3.6) D Proposition 2.3.2 (Properties of Projection Mappings). Let Yf be a Hilbert space and let PA denote the projection mapping onto a closed subspace At. Then (i) fH(rxx + {3y) = rxP.41 X + f3fHy, X, y E Yf, rx, {3 E C, (ii) ll x ll 2 = II P.Hx ll 2 + 11 ( 1 - P.H)x ll 2 , (iii) each x E Yf has a unique representation as a sum of an element of At and an element of Atj_, i.e. x = P.41 x + (I - P.41 ) x, (iv) (v) (vi) and (vii) (2.3.7) P.H xn --> P.41x if ll xn - x ll --> 0, x E At if and only if PAx = x, x E At j_ if and only if PAx = 0, Al 1 <;; At2 if and only if fH, P.412X = fA, x for all x E Yf. PROOF. (i) rxPAx + f3PA Y E At since A is a linear subspace of Yf. Also rxx + f3y - (rxP.41x + f3P.41 y) = rx(x - PAx) + f3(y - PAy) E A\ since Aj_ is a linear subspace of Yf by Proposition 2.3. 1 . These two properties identify rxPAx + f3PA Y as the projection P.41 (rxx + f3y). 
(ii) This is an immediate consequence of the orthogonality of P_ℳ x and (I − P_ℳ)x.

(iii) One such representation is clearly x = P_ℳ x + (I − P_ℳ)x. If x = y + z, y ∈ ℳ, z ∈ ℳ⊥, is another, then

    y − P_ℳ x + z − (I − P_ℳ)x = 0.

Taking inner products of each side with y − P_ℳ x gives ‖y − P_ℳ x‖² = 0, since z − (I − P_ℳ)x ∈ ℳ⊥. Hence y = P_ℳ x and z = (I − P_ℳ)x.

(iv) By (ii), ‖P_ℳ(x_n − x)‖² ≤ ‖x_n − x‖² → 0 if ‖x_n − x‖ → 0.

(v) x ∈ ℳ if and only if the unique representation x = y + z, y ∈ ℳ, z ∈ ℳ⊥, is such that y = x and z = 0, i.e. if and only if P_ℳ x = x.

(vi) Repeat the argument in (v) with y = 0 and z = x.

(vii) x = P_{ℳ₂} x + (I − P_{ℳ₂})x. Projecting each side onto ℳ₁ we obtain

    P_{ℳ₁} x = P_{ℳ₁} P_{ℳ₂} x + P_{ℳ₁}(I − P_{ℳ₂})x.

Hence P_{ℳ₁} x = P_{ℳ₁} P_{ℳ₂} x for all x ∈ ℋ if and only if P_{ℳ₁} y = 0 for all y ∈ ℳ₂⊥, i.e. if and only if ℳ₂⊥ ⊆ ℳ₁⊥, i.e. if and only if ℳ₁ ⊆ ℳ₂.   □

The Prediction Equations. Given a Hilbert space ℋ, a closed subspace ℳ, and an element x ∈ ℋ, Theorem 2.3.1 shows that the element of ℳ closest to x is the unique element x̂ ∈ ℳ such that

    ⟨x − x̂, y⟩ = 0 for all y ∈ ℳ.   (2.3.8)

The equations (2.3.1) and (2.3.2) which arose in Examples 2.3.1 and 2.3.2 are special cases of (2.3.8). In later chapters we shall constantly be making use of the equations (2.3.8), interpreting x̂ = P_ℳ x as the best predictor of x in the subspace ℳ.

Remark 1. It is helpful to visualize the projection theorem in terms of Figure 2.1, which depicts the special case in which ℋ = ℝ³, ℳ is the plane containing x_1 and x_2, and ŷ = P_ℳ y. The prediction equation (2.3.8) is simply the statement (obvious in this particular example) that y − ŷ must be orthogonal to ℳ. The projection theorem tells us that ŷ = P_ℳ y is uniquely determined by this condition for any Hilbert space ℋ and closed subspace ℳ. This justifies in particular our use of equations (2.3.2) in Example 2.3.2. As we shall see later (especially in Chapter 5), the projection theorem plays a fundamental role in all problems involving the approximation or prediction of random variables with finite variance.

EXAMPLE 2.3.3 (Minimum Mean Squared Error Linear Prediction of a Stationary Process). Let {X_t, t = 0, ±1, ...} be a stationary process on (Ω, ℱ, P) with mean zero and autocovariance function γ(·), and consider the problem of finding the linear combination X̂_{n+1} = Σ_{j=1}^{n} φ_{nj} X_{n+1−j} which best approximates X_{n+1} in the sense that E|X_{n+1} − Σ_{j=1}^{n} φ_{nj} X_{n+1−j}|² is minimum. This problem is easily solved with the aid of the projection theorem by taking ℋ = L²(Ω, ℱ, P) and ℳ = {Σ_{j=1}^{n} α_j X_{n+1−j} : α_1, ..., α_n ∈ ℝ}. Since minimization of E|X_{n+1} − X̂_{n+1}|² is identical to minimization of the squared norm ‖X_{n+1} − X̂_{n+1}‖², we see at once that X̂_{n+1} = P_ℳ X_{n+1}. The prediction equations (2.3.8) are

    ⟨X_{n+1} − Σ_{j=1}^{n} φ_{nj} X_{n+1−j}, Y⟩ = 0 for all Y ∈ ℳ,

which, by the linearity of the inner product, are equivalent to the n equations

    ⟨X_{n+1} − Σ_{j=1}^{n} φ_{nj} X_{n+1−j}, X_k⟩ = 0,   k = n, n − 1, ..., 1.

Recalling the definition ⟨X, Y⟩ = E(XY) of the inner product in L²(Ω, ℱ, P), we see that the prediction equations can be written in the form

    Γ_n φ_n = γ_n,   (2.3.9)

where φ_n = (φ_{n1}, ..., φ_{nn})', γ_n = (γ(1), ..., γ(n))' and Γ_n = [γ(i − j)]_{i,j=1}^{n}. The projection theorem guarantees that there is at least one solution φ_n of (2.3.9).
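Before turning to the case of a singular Γ_n, here is a small numerical sketch of the prediction equations (2.3.9). It is written in Python with NumPy purely for illustration; the language, the function names and the choice of an MA(1) autocovariance are assumptions of this note and not part of the text.

    import numpy as np

    def ma1_acvf(h, theta, sigma2=1.0):
        # Autocovariance of X_t = Z_t + theta Z_{t-1} (Example 1.3.2).
        h = abs(h)
        if h == 0:
            return sigma2 * (1 + theta ** 2)
        if h == 1:
            return sigma2 * theta
        return 0.0

    def one_step_predictor_coeffs(gamma, n):
        # Solve the prediction equations (2.3.9): Gamma_n phi_n = gamma_n.
        Gamma_n = np.array([[gamma(i - j) for j in range(n)] for i in range(n)])
        gamma_n = np.array([gamma(h) for h in range(1, n + 1)])
        return np.linalg.solve(Gamma_n, gamma_n)   # assumes Gamma_n is non-singular

    # Coefficients phi_{n1},...,phi_{nn} of the best linear predictor of X_{n+1}
    # in terms of X_n,...,X_1 for the MA(1) process with theta = 0.95:
    phi = one_step_predictor_coeffs(lambda h: ma1_acvf(h, 0.95), n=5)
    print(phi)    # the predictor is sum_j phi_{nj} X_{n+1-j}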
If rn is singular then (2.3.9) will have infinitely many solutions, but the projection theorem guarantees that every solution will give the same (uniquely defined) predictor xn +l · ExAMPLE 2.3.4. To illustrate the last assertion of Example 2.3.3, consider the stationary process X, = A cos(wt) + B sin(wt), where w E (0, n) is constant and A, B are uncorrelated random variables with mean 0 and variance a 2 . We showed in Example 1 .3.1 that for this process, y (h) = a 2 cos(wh). It is easy to check from (2.3.9) (see Problem 2.6) that r/J 1 = cos w and t/>2 = (2 cos w, - 1 )'. Thus X3 = (2 cos w)X2 - X 1 • The mean squared error of X3 is E(X3 - (2 cos w)X2 + X 1 f = 0, showing that for this process we have the identity, X3 = (2 cos w)X2 - X 1 • (2.3. 10) The same argument and the stationarity of { X, } show that X4 = (2 cos w)X3 - X2 , (2.3. 1 1) again with mean squared error zero. Because of the relation (2.3. 10) there are infinitely many ways to reexpress X4 in terms of X 1 , X2 and X3 • This is reflected by the fact that r3 is singular for this process and (2.3.9) has infinitely many solutions for r/J3 . §2.4 Orthonormal Sets (Closed Span). The closed span sp { x,, t E T} of any subset { x,, t E T} of a Hilbert space Yf is defined to be the smallest closed subspace of Yf which contains each element x,, t E T. Definition 2.4.1 §2.4. Orthonormal Sets 55 Remark 1. The closed span of a finite set { x 1 , • • • , x. } is the set of all linear combinations, y = a 1 x 1 + · · · + a.x., a 1 , . . . , a. E IC (or IR if :If is real). See Problem 2.7. If for example x 1 , x 2 E IR3 and x 1 is not a scalar multiple of x 2 then sp {x 1 , x 2 } is the plane containing x 1 and x 2 • Remark 2. If Jlt = sp { x 1 , . . . , x. }, then for any given x E .1t', ?_41 x is the unique element of the form such that <x - P.ff x, y) = 0, or equivalently such that <P.Afx, xj ) = j = 1, . . . , n. <x, xj ), (2.4. 1 ) The equations (2 4. 1 ) can be rewritten as a set of linear equations for a 1 , . . . , a., . VIZ. n (2.4.2) L ai <xi, xj ) = <x, xj), j = 1 , . . . , n. i� l By the projection theorem the system (2.4.2) has at least one solution for a1 , . . . , a. . The uniqueness of PAx implies that all solutions of (2.4.2) must yield the same element a 1 x 1 + · · · + a.x• . Definition 2.4.2 (Orthonormal Set). A set { e, t E T} of elements of an inner­ product space is said to be orthonormal if for every s, t E T, <e. , e1 ) = {1 if s = t, if s f= t. 0 (2.4.3) 1 EXAMPLE 2.4. 1 . The set of vectors { ( 1 , 0, 0)', (0, 1, 0)', (0, 0, )' } is an ortho­ normal set in IR3. EXAMPLE 2.4.2. Any sequence { Z, t E Z} of independent standard normal random variables is an orthonormal set in L 2 (D., ;F, P). If { e 1 , . . . , ed is an orthonormal subset of the Hilbert space .1t' and Jlt = sp{e 1 , . . . , ek }, then k P41 x = L <x, e i ) ei for all x E .Yt', (2.4.4) i=! Theorem 2.4.1 . 11 Pu x ll 2 I it x- = <x, e i ) ei k L l <x, ei ) l 2 for all x E .Yt', i= l I �I t I x- i ci ei for all x E .Yt', (2.4.5) (2.4.6) 2. Hilbert Spaces 56 and for all c 1 , . . . , ck E C (or IR if Yf is real). Equality holds in (2.4.6) if and only ifc; = (x, e; ), i = 1 , . . . , k. The numbers (x, e; > are sometimes called the Fourier coefficients ofx relative to the set { e 1 , . . . , ed. PROOF. To establish (2.4.4) it suffices by Remark 2 to check that P.ff x as defined by (2.4.4) satisfies the prediction equations (2.4. 1 ), i.e. that (t (x, e; ) e;, ej ) = (x, ei), j = 1, . . . , k. 
But this is an immediate consequence of the orthonormality condition (2.4.3). The proof of (2.4.5) is a routine computation using properties of the inner product and the assumed orthonormality of { e 1 , • • • , ed. By Theorem 2.3. 1 (ii), I I x - P.ff x I I � I I x - y I I for all y E A, and this is precisely the inequality (2.4.6). By Theorem 2.3. 1 (ii) again, there is equality in (2.4.6) if and only if k k (2.4.7) I C;e; = �41x = L (x, e; ) e;. i == l i= l Taking inner products of each side with ei and recalling the orthonormality assumption, we immediately find that (2.4.7) is equivalent to the condition ci ( x, ei >, j = 1, . . . , k. D = (Bessel's Inequality). If x is any element of a Hilbert space Yf and { e 1 , . . . , ed is an orthonormal subset of Yf then k I l (x, e;) l 2 � ll x ll 2 • (2.4.8) i� 1 PROOF. This follows at once from (2.4.5) and Proposition 2.3.2 (ii). D Corollary 2.4.1 Orthonormal Set). If { e,, t E T} is an orthonormal subset of the Hilbert space Yf and if Yf = sp { e,, t E T} then we say that { e,, t E T} is a complete orthonormal set or an orthonormal basis for Yf. Definition 2.4.3. (Complete Definition 2.4.4 (Separability). The Hilbert space Yf is Yf = sp{ e,, t E T} with { e,, t E T} a finite or countably infinite set. If Yf is the separable Hilbert space Yf { e;, i = 1, 2, . . . } is an orthonormal set, then Theorem 2.4.2. = separable if orthonormal sp { e1 , e 2 , . • . } where (i) the set of all finite linear combinations of {e 1 , e 2 , • • . } is dense in Yf, i.e. for each x E Yf and £ > 0, there exists a positive integer k and constants c 1 , . . . , ck such that I - ;t I < x c;e; £, (2.4.9) 57 §2.4. Orthonormal Sets (ii) (iii) (iv) (v) == = x L� 1 <x, e; ) eJor each x E £, i.e. ll x - L7= 1 <x, e; ) eJ -+ 0 as n -+ oo , ll x ll 2 L�1 l <x, e; ) l 2 for each x E £, <x, y) = L: � 1 <x, e; ) <e;, y) for each x, y E £, and x 0 if and only if <x, e; ) = 0 for all i 1 , 2, = ... . The result (iv) is known as Parseval's identity. PROOF. (i) If S = U ;;, 1 sp { e 1 , • . . , ei }, the set of all finite linear combinations of { e 1 , e 2 , . . . } , then the closure S of S is a closed subspace of £ (.Problem 2. 1 7) containing { e;, i = 1 , 2, . . . }. Since £ is by assumption the smallest such closed subspace, we conclude that S = £. (ii) By Bessel's inequality (2.4.8), L�=1 l <x, e; ) l 2 s ll x ll 2 for all positive integers k. Hence L:� 1 l <x, e; ) l 2 s ll x ll 2. From (2.4.6) and (2.4.9) we conclude that for each t: > 0 there exists a positive integer k such that I i� x- <x, e) e; I < e. . Now by Theorem 2.4. 1, L7= 1 <x, e) e; = PAx where A = sp {e 1 , . . , e. }, and since for k s n, I �= 1 <x, e; ) e; E A, we also have I i� I < e � = I X - i� r + I i� < i= l (iii) From (2.4. 10) we can write, for n l l x ll 2 for all n � k. <x, e)e; x- k, <x, e) e; £2 + <x, e; ) e; 00 (2.4. 1 0) r L l <x, e; ) l 2 • Since t: > 0 was arbitrary, we deduce that 00 l <x, e; ) l 2 , ll x ll 2 :S I i =1 which together with the reversed inequality proved in (ii), establishes (iii). (iv) The result (2.4. 10) established in (iii) states that II L7=1 <x, e)e; - x ll -+ 0 as n -+ oo for each x E £. By continuity of the inner product we therefore have, for each x, y E £, <x, y) (� n n_,.oo i = l = i::;;: l = !�� <x, e; ) e;, i� <y, ei ) ei) = lim I <x, e; ) <e;, y) 00 L <x, e; ) <e;, y). (v) This result is an immediate consequence of (ii). 0 2. Hilbert Spaces 58 Remark 3. 
Separable Hilbert spaces are frequently encountered as the closed spans of countable subsets of possibly non-separable Hilbert spaces. §2.5 Projection in IR" In Examples 2. 1 . 1 , 2. 1.2 and 2.2. 1 we showed that !Rn is a Hilbert space with the inner product < x, y ) n = L X ; Y; , i= l (2.5. 1 ) the corresponding squared norm ll x ll 2 n = L xf, i=l and the angle between x and y, e = cos - 1 ( < x, y ) ll x i i i i Y II (2.5.2) ) . (2.5.3) Every closed subspace A of the Hilbert space !Rn can be expressed by means of Gram-Schmidt orthogonalization (see for example Simmons ( 1963)) as A = sp {e 1 , . . . , em } where {e1 , . . . , em } is an orthonormal subset of A and m ( s n) is called the dimension of A (see also Problem 2. 1 4). lf m < n then there is an orthonormal subset { em + l , . . . , en } of A j_ such that A j_ = sp{ em l , . . . , en } · By Proposition 2.3.2 (iii) every x E !R n can be expressed uniquely as a sum of two elements of A and A j_ respectively, namely + (2.5.4) where, by Theorem 2.4. 1 , PJtx m = L < x, e; ) e; i= l and n (/ - �41)x = L i= m + l < x, e; ) e;. (2.5.5) (2.5.6) The following theorem enables us to compute PJtx directly from any specified set of vectors { x 1 , . . . , xm } spanning A. Theorem 2.5.1 . If X ; E !Rn, i = 1 , . . . , m, and A = sp { x 1 , . . . ,xm } then where X is the n x (2.5.7) = m matrix whose /h column is xi and X'XP X'x. (2.5.8) §2.5. Projection in IR" 59 Equation (2.5.8) has at least one solution for p but XP is the same for all solutions. There is exactly one solution of(2.5.8) if and only if X' X is non-singular and in this case P.41x = X(X' X) -1 X'x. (2.5.9) PROOF. Since P.41x E At, we can write m [J;x ; = xp, fA x = I =1 for some P = ([31 , • • • , f3m)' E !Rm. i The prediction equations (2.3.8) are equivalent in this case to <XP, xi ) = <x, xi), j = 1, . . . , m, (2.5.1 0) (2.5. 1 1) and in matrix form these equations can be written X'XP = X'x. (2.5. 1 2) The existence of at least one solution for p is guaranteed by the existence of the projection P.41x. The fact that Xp is the same for all solutions is guaranteed by the uniqueness of P.41 x. The last statement of the theorem follows at once from (2.5.7) and (2.5.8). D Remark l. lf { x 1 , , xm} is an orthonormal set then X' X is the identity matrix and so we find that • • • m P.41x = XX'x = L <x, x; )x;, i=1 in accordance with (2.5.5) Remark 2. If { x 1 , , xm} is a linearly independent set then there must be a unique vector p such that P.41x = xp. This means that (2.5.8) must have a unique solution, which in turn implies that X' X is non-singular and • • • P.41x = X(X' X) -1 X'x for all x E IR". The matrix X(X' X) - 1 X' must be the same for all linearly independent sets {x 1 , . . . , xm} spanning At since P.41 is a uniquely defined mapping on IR". Remark 3. Given a real n x n matrix M, how can we tell whether or not there is a subspace At of IR" such that M x = P.41x for all x E IR"? If there is such a subspace we say that M is a projection matrix. Such matrices are characterized in the next theorem. Theorem 2.5.2. The n x n matrix M is a projection matrix if and only if (a) M' = (b) M 2 = M and M. 2. Hilbert Spaces 60 PROOF. If M is the projection matrix corresponding to some subspace A then by Remark 2 it can be written in the form X(X' xr 1 X' where X is any matrix having linearly independent columns which span A. It is easily verified that (a) and (b) are then satisfied. Suppose now that (a) and (b) are satisfied. 
We shall show that Mx = PA!x for all x E IR" where A is the range of M defined by R(M) = {Mx : x E IR"} . First observe that M x E R(M) b y definition. Secondly we know that for any y E R(M) there exists w E IR" such that y = Mw. Hence ( x - Mx, y) = ( x - Mx, Mw) = x'(J - M)' Mw = 0 for all y E R(M), showing that Mx is indeed the required projection. 0 §2.6 Linear Regression and the General Linear Model Consider the problem of finding the "best" straight line (2.6. 1 ) 81 x 82 , or equivalently the best values e1 , e2 of 81 , 82 E IR, t o fit a given set o f data In least squares regression the best estimates e 1 , e2 points y;), i = 1 , . . are defined to be values of 81 , 82 which minimize the sum, S(81 , 82 ) iL= l (y; - 81 X; - 82 f, of squared deviations of the observations Y; from the fitted values 81 82 • This problem reduces to that of computing a projection in IR" as is easily seen by writing S (81 , 82 ) in the equivalent form (2.6.2) ... where x = 1 = (1, . . . ,1)' and y = (y 1 , . . . , y.)'. By the projection theorem there is a unique VeCtOr of the form ({)1 X {)2 1) Which minimizes PA!y where A = sp {x, 1 } . S(8Defining 1 , 82 ), namely X to be the 2 matrix X = [x, 1 ] and 6 to be the column vector 6 = ({)1 , e2 )', we deduce from Theorem 2.5. 1 that y= (x;, ., + n. n = X; + (x 1 , ' , x. ) , + n x PA!y = x6 where (2.6.3) X'X6 = X'y. There is a unique solution 6 if and only if X' X is non-singular. In this case 6 = (xxr 1 X'y. (2.6.4) If X' X is singular there are infinitely many solutions of (2.6.4), however by the uniqueness of �"'y, x6 is the same for all of them. §2.6. Linear Regression and the General Linear Model 61 The argument just given applies equally well to least squares estimation for the general linear model. The general problem is as follows. Given a set of data points i = 1, . . . , n; m � n, we are required to find a value o = (B1 , . . . , emy of a = (81 , . . . , emy which minimizes m S(fJ ) = L (yi - 81 x[Il - · · · - 8mx! l) 2 i= l n where y = (y 1 , . . . ,y. )' and x Ul = (xpl, . . . , x�jl)', j = 1 , . . . , m. By the projection m theorem there is a unique vector of the form (B 1 x 0 l + · · · + emx< l) which m minimizes S( fJ ), namely PAY where .$1 = sp { x < l l, . . . , x< l } . Defining X t o be the n x m matrix X = [x1 1 l, . . . , x <ml] and 0 to be the column vector 0 = ( 81 , . . . , (Jm)', we deduce from Theorem 2.5 . 1 that PAY = XO where X'XO = X'y (2.6.5) 0 = (X' X)- 1 X'y. (2.6.6) As in the special case of fitting a straight line, 0 is uniquely defined if and only if X' X is non-singular, in which case If X' X is singular then there are infinitely many solutions of (2.6.5) but XO is the same for all of them. In spite of the assumed linearity in the parameters 81 , . . . , (Jm, the applica­ tions of the general linear model are very extensive. As a simple illustration, let us fit a quadratic function, y = 81 x 2 + 82 x + 83, to the data 0 2 3 4 3 5 8 The matrix X for this problem is 0 0 20 10 - 40 1 1 1 74 - 108 . giving (X' Xr 1 = - 40 X= 4 2 1 0 1 24 20 - 108 9 3 16 4 The least squares estimate 0 = (B1 , B2 , B3)' is therefore unique and is found � [ ] 2. Hilbert Spaces 62 from (2.6.6) to be 9 = (0.5, - 0. 1 , 0.6)'. The vector of fitted values XO = PAY is given by xa = (0.6, 1, 2.4, 4.8, 8.2)', as compared with the vector of observations, y = ( 1 , 0, 3, 5, 8)'. 
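The least squares calculation above is easy to verify numerically. The following Python/NumPy sketch (not part of the original text; the variable names are ours) forms the matrix X with columns x², x, 1 for the data of the quadratic example, solves the normal equations X'Xθ = X'y, and reproduces the estimate θ̂ = (0.5, −0.1, 0.6)' and the fitted values Xθ̂ = (0.6, 1, 2.4, 4.8, 8.2)':

    import numpy as np

    # Data from the quadratic example above: fit y = theta1*x^2 + theta2*x + theta3.
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.0, 0.0, 3.0, 5.0, 8.0])

    # Design matrix X with columns x^2, x and 1.
    X = np.column_stack([x**2, x, np.ones_like(x)])

    # Solve the normal equations X'X theta = X'y (X'X is non-singular here).
    theta = np.linalg.solve(X.T @ X, X.T @ y)
    fitted = X @ theta            # this is the projection P_M y = X theta

    print(theta)    # approximately [ 0.5  -0.1   0.6 ]
    print(fitted)   # approximately [ 0.6   1.0   2.4   4.8   8.2 ]

    # The same projection obtained from a generic least-squares routine.
    theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
    assert np.allclose(theta, theta_ls)

When X'X is singular, the explicit solve above fails, but any least-squares solution still gives the same fitted vector Xθ̂, in accordance with Theorem 2.5.1.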
§2.7 Mean Square Convergence, Conditional Expectation and Best Linear Prediction in L2 (Q, ff, P) All results in this section will be stated for the real Hilbert space L 2 = L 2 (0., �, P) with inner product (X, Y) = E(X Y). The reader should have no difficulty however in writing down analogous results for the complex space L 2 (0., �' P) with inner product (X, Y) = E(X Y). As indicated in Example 2.2.2, mean square convergence is just another name for norm convergence in L 2 , i.e. if xn , X E L 2 , then Xn � X if and only if I I Xn - X ll 2 = E I X. - Xl2 --+ 0 as n --+ oo . (2.7. 1 ) By simply restating properties already established for norm convergence we obtain the following proposition. Proposition 2.7.1 (Properties of Mean Square Convergence). (a) X" converges in mean square if and only if E I Xm - X. l2 --+ 0 as m, n --+ oo . (b) If X. � X and Yn � Y then a s n --+ oo , (i) EX. = (X., 1 ) --+ (X, 1 ) = EX, (ii) E I Xn l 2 = (Xn , Xn ) --+ (X, X) = E I X I 2, and (iii) E(X. Yn ) = (X., Yn ) --+ (X, Y ) = E(X Y). Definition 2.7.1 (Best Mean Square Predictor of Y). If A is a closed subspace of L 2 and Y E L 2 , then the best mean square predictor of Y in A is the element Y E A such that (2.7.2) I I Y - Y II 2 = inf II Y- Z II 2 = inf E I Y - ZI 2 . Z E J! Z E J! The projection theorem immediately identifies the unique best predictor of Y in A as PA Y. By imposing a little more structure on the closed subspace §2.7. Mean Square Convergence, Conditional Expectation 63 A, we are led from Definition 2. 7. 1 to the notions of conditional expectation and best linear predictor. (The Conditional Expectation, E.41 X). If A is a closed sub­ space of L 2 containing the constant functions, and if X E L 2, then we define the conditional expectation of X given A to be the projection, Definition 2.7.2 (2.7.3) Using the definition of the inner product in L 2 and the prediction equations (2.3.8) we can state equivalently that E H X is the unique element of A such that E( WE.41X) = E ( WX) for all W E A. (2.7.4) Obviously the operator EH on L 2 has all the properties of a projection operator, in particular (see Proposition 2.3.2) a, b E IR, (2.7.5) (2.7.6) and (2.7.7) Notice also that (2.7.8) and if A0 is the closed subspace of L 2 consisting of all the constant functions, then an application of the prediction equations (2.3.8) gives (2.7.9) (The Conditional Expectation E(X 1 Z)). If Z is a random variable on (0, ff, P) and X E L 2 (0, ff, P) then the conditional expectation of X given Z is defined to be Definition 2.7.3 (2.7. 10) where A(Z) is the closed subspace of L 2 consisting of all random variables in L 2 which can be written in the form r/J (Z) for some Borel function r/J : IR --+ IR. (For the proof that A(Z) is a closed subspace see Problem 2.25.) The operator E H<Zl has all the properties (2.7.5)-(2.7.8), and in addition (2.7. 1 1) Definition 2. 7.3 can be extended in a fairly obvious way as follows: if Z1 , . . . , Z" are random variables on (0, ff, P) and X E L 2 , then we define (2.7. 1 2) where A(Z 1 , . . . , Z") is the closed subspace of L 2 consisting of all random 2. Hilbert Spaces 64 variables in L 2 of the form ¢ (Z 1 , , Zn ) for some Borel function ¢ : !R n -+ R The properties of E41121 listed above all carry over to E41 1z 1 z" J · • • • • • • • • Conditional Expectation and Best Linear Prediction. By the projection theorem, the conditional expectation EA1z1 , . . . . zjX) is the best mean square predictor of X in A(Z1 , , Zn ), i.e. 
it is the best function of Z1 , , Zn (in the m.s. sense) for predicting X. However the determination of projections on A(Z 1 , . . . , Zn ) is usually very difficult because of the complex nature of the equations (2.7.4). On the other hand if Z1 , . . . , Zn E L2, it is relatively easy to compute instead the projection of X on sp f l , Z 1 , . . . , Zn } <;; A(Z 1 , . . . , Zn ) since we can write . . • . . • Ps�{ p 1 , z 1 , . . . , z " }(X) = where r:J. 0 , . . . , r:J.n satisfy "' � " i=O r:J. . Z. t l' Z0 = 1 , (2.7. 1 3) (2.7. 1 4) or equivalently, n L r:J.;E(Z;ZJ = E(XZi ), j = 0, 1 , . . . , n. i=O (2.7. 1 5) The projection theorem guarantees that a solution (r:J.0 , , r:J.n ) exists. Any solu­ tion, when substituted into (2.7. 1 3) gives the required projection, known as the best linear predictor of X in terms of 1, Z 1 , , Zn . As a projection of X onto a subspace of A(Z1 , , Zn) it can never have smaller mean squared error than EA1z 1 . . . . . z"1 X. Nevertheless it is of great importance for the following reasons: • . . . • • • . . (a) it is easier to calculate than EA<z 1 , . . . ,zjX), (b) it depends only on the first and second order moments, EX, EZ;, E(Z; Zi ) and E(XZi ) of the joint distribution of (X, Z 1 , . . . , Zn ), (c) if(X, Z 1 , . . . , Zn )' has a multivariate normal distribution then (see Problem 2.20), P;;p( ! , Z1 , . . . . z" } (X) = EA<Z 1 . . . . . zjX) . Best linear predictors are defined more generally as follows: Linear Predictor of X in Terms of { ZA, A E A} ). If X E U and ZA E L 2 for all A E A, then the best linear predictor of X in terms of {ZA, }_ E A} is defined to be the element of sp {ZA, A E A} with smallest mean square distance from X. By the projection theorem this is just P;;p(z,. A . A } X. Definition 2.7.4 (Best 2.7. 1 . Suppose Y = X 2 + Z where X and Z are independent standard normal random variables. The best predictor of Y in terms of X ExAMPLE 65 §2.8. Fourier Series is E( Y I X) = X 2. (The reader should check that the defining properties of E(Y I X) = E11<x> Y are satisfied by X 2 , i.e. that X 2 E .A(X) and that (2.7.4) is satisfied with .A = .A(X).) On the other hand the best linear predictor of Y in terms of { l, X} is psp{ l . x } Y = aX + b, where, by the prediction equations (2.7.1 5), <aX + b, X) = < Y, X) = E ( YX) = 0 and <aX + b, l) = < Y, l) = E ( Y) = l. Hence a = 0 and b = l so that psp{ l , X } y = l. The mean squared errors of the two predictors are II E ( Y I X) - Y ll 2 = E(Z2 ) = l, and showing the substantial superiority of the best predictor over the best linear predictor in this case. Remark 1. The conditional expectation operators EJI!<Z> and £_41(z, , . . . . z"> are usually defined on the space L 1 (0., !?, P) of random variables X such that E I X I < oo (see e.g. Breiman ( 1 968), Chapter 4). The restrictions of these operators to L2(0., !?, P) coincide with £.4/(Z) and EA(Z, , . . . . z"> as we have defined them. §2.8 Fourier Series Consider the complex Hilbert space L 2 [ - n, n] = L 2 ( [ - n, n], !!J, U ) where !!J consists of the Borel subsets of [ - n, n ], U is the uniform probability measure U(dx) = (2n)- 1 dx, and the inner product of f g E L2 [ - n, n] is defined as usual by , l <J, g) = Ejg = 2n The functions {en , n E Z} defined by I" f(x)g(x) dx. (2.8. 1 ) -n (2.8.2) 2. Hilbert Spaces 66 are orthonormal in L 2 [ - n, n ] since I" . 1 I" 1 < em • en ) = 2n _, =2n _, = e•<m n)x dx [cos(m - n)x + i sin(m - n)x] dx if m = n 0 if m f:. n. {1 (Fourier Approximations and Coefficients). 
The n1h order Fourier approximation to any function f E L 2 [ - n, n] is defined to be the projection of f onto sp {ej , l j l ::;; n}, which by Theorem 2.4. 1 is Definition 2.8.1 n The coefficients SJ = L < f, ej ) ej . j= -n (2.8.3) (2.8.4) are called the Fourier coefficients of the function f. We can write (2.8.3) a little more explicitly in the form n (2.8.5) x E [ - n, n] , SJ(x) = L <J, ej ) e ijx, j= - n and one is naturally led to investigate the senses (if any) in which the sequence of functions { Sn f} converges to f as n --+ oo. In this section we shall restrict attention to mean square convergence, deferring questions of pointwise and uniform convergence to Section 2. 1 1 . The sequence {SJ } has a mean square limit as n --+ oo which we shall denote by L� -oo < f, ej > ej or Sf (b) Sf = f. Theorem 2.8.1 . (a) PROOF. (a) From Bessel' s inequality (2.4.8) we have L lk; ;n I <f, ej > 1 2 s 11/11 2 for all n which implies that L� -oo I < J, ej ) l 2 < oo . Hence for n > m 2 1 , IISJ - Sm /11 2 s L l <f, ej ) l 2 --+ 0 as m --+ oo , ljl > m showing that {SJ } is a Cauchy sequence and therefore has a mean square limit. (b) For l j l ::;; n, <SJ, e) = <J, ej ), so by continuity of the inner product <Sf, e) = limoo <SJ, ej > = <f, ej > for all j E J;. n� §2.9. Hilbert Space Isomorphisms 67 In Theorem 2. 1 1 .2 we shall show that ( g , ei ) Hence Sf- f 0. = Corollary 2.8.1 . L 2 [ - n, n] = = 0 for allj E Z implies that g = 0. 0 sp { ei,j E Z}. PROOF. Any f E e [- n, n] can be expressed as the mean square limit of Snf where SJE sp{ei,j E Z } . Since sp{ei,j E Z} is by definition closed it must con­ tainf. Hence sp{ei,j E Z} 2 U [ - n, n]. 0 Corollary 2.8.2. (b) ( f, g ) = L � -ro l ( f, e) l 2 . L � -00 (f, e) (g, ei ) . (a) 1 1 !1 1 2 = PROOF. Corollary 2.8. 1 implies that the conditions of Theorem 2.4.2 are satisfied. 0 §2.9 Hilbert Space Isomorphisms Definition 2.9.1 (Isomorphism). An isomorphism of the Hilbert space ..no1 onto the Hilbert space ..no2 is a one to one mapping T of ..no1 onto ..no2 such that for all !1 . !2 E ..nol , and (a) T(af1 + bf2 ) = aTf1 + bTf2 for all scalars a and b We say that ..no1 and ..no2 are isomorphic if there is an isomorphism T of ..no1 onto ..no2 . The inverse mapping T - 1 is then an isomorphism of ..no2 onto ..no1 . Remark 1 . In this book we shall always use the term isomorphism to indicate that both (a) and (b) are satisfied. Elsewhere the term is frequently used to denote a mapping satisfying (a) only. EXAMPLE 2.9. 1 (The Space / 2 ). Let F denote the complex Hilbert space of sequences { zn, n = 1, 2, . . . }, zn E IC, L ��1 l z� I < oo , with inner product ( { Yn }, {zn} ) = 00 L Y;Z;. i=l (For the proof that / 2 is a separable Hilbert space see Problem 2.23.) If now any Hilbert space with an orthonormal basis {en, n = 1 , 2, . . . } then the mapping T : ..no --+ 12 defined by ..no is (2.9. 1 ) i s an isomorphism of ..no onto F (see Problem 2.24). Thus every separable Hilbert space is isomorphic to / 2 • 68 2. Hilbert Spaces Properties of Isomorphisms. Suppose T is an isomorphism of £1 onto £2 . We then have the following properties, all of which follow at once from the definitions: (i) If {en } is a complete orthonormal set in £1 then { Ten} is a complete orthonormal set in £2 • (ii) I Tx I = I x II for all x E £1 . (iii) llxn - x ll --> 0 if and only if I Txn - Tx ll --> 0. (iv) {xn } is a Cauchy sequence if and only if { Txn } is a Cauchy sequence. (v) TPS!i{x, .l. e A} (x) = PS!i{ Tx, .l. e A} (Tx). 
The last property is the basis for the spectral theory of prediction of a stationary process { X,, t E Z} (Section 5.6), in which we use the fact that the mapping . . defines an isomorphism of a certain Hilbert space of random variables onto a Hilbert space L 2 ( [ - n, n] , 86', .u) with .u a finite measure. The problem of computing projections in the former space can then be tranformed by means of (v) into the problem of computing projections in the latter. §2. 10* The Completeness of L2 (Q, ff, P) We need to show that if xn E L 2 ' n = 1 , 2, . . . ' and I I Xn - Xm il --> 0 as m, n --> oo , then there exists X E L 2 such that xn � X. This will be shown by identifying X as the limit of a sufficiently rapidly converging subsequence of { Xn}· We first need a proposition. and 1 1 Xn+ 1 - Xn ll ::s; r n, n = 1, 2, . . . , then there is a random variable X on (Q, :F, P) such that Xn --> X with probability one. Proposition 2.1 0.1. 1/ Xn E L 2 PROOF. Let X o = 0. Then xn = LJ=1 ( Xj - xj - 1 ). Now I.;1 1 Xj - xj- 1 1 is finite with probability one since, by the monotone convergence theorem and the Cauchy-Schwarz inequality, X - x 1 ll X1 + I rj < oo . X -x l EI E Xj - xj - 1 l :::;; I j =1 I j j-1 = jI j =1 II j j - :::;; II II j = 1 =1 I It follows that limn� LJ=1 I Xj - xj - 1 1 (and hence limn� L J= 1 ( Xj - xj - d = limn�oo Xn ) exists and is finite with probability one. D 00 00 00 co Theorem 2. 1 0. 1 . L 2 (Q, :F, P) is complete. C1J co PROOF. If { Xn } is a Cauchy sequence in L 2 then we can find integers n 1 , n 2 , . . . , such that n 1 < n 2 < · · · and §2. 1 1 . * Complementary Results for Fourier Series 69 (2. 1 0. 1 ) (First choose n 1 t o satisfy (2. 1 0. 1 ) with k = 1 , then successively choose n 2 , n 3 , . . . , to satisfy the appropriate conditions.) By Proposition 2. 1 0. 1 there is a random variable X such that X"" ---> X with probability one as k ---> oo. Now II Xn - X ll 2 = f iXn - X l 2 d P = lim inf iXn - Xnl d P, J and so by Fatou's lemma, k�oo IIXn - X ll 2 :s; lim inf ii Xn - XnJ 2 • k�oo (2. 1 0.2) The right-hand side of (2. 10.2) can be made arbitrarily small by choosing n large enough since {Xn } is a Cauchy sequence. Consequently II Xn - X ll 2 ---> 0. The fact that E I X I 2 < oo follows from the triangle inequality II X II :s; II X" - X I I + IIXn ll , the right-hand side of which is certainly finite for large enough n. 0 §2. 1 1 * Complementary Results for Fourier Series The terminology and notation of Section 2.8 will be retained throughout this section. We begin with the classical result that trigonometric polynomials are uniformly dense in the space of continuous functions f which are defined on [- n, n] and which satisfy the condition f(n) = f( - n). Theorem 2.1 1.1. .f( - n). Then Let .f be a continuous function on [ - n, n] such that .f(n) = (2.1 1 . 1 ) uniform/y on [ - n, n] as n ---> oo. PROOF. By definition of the n1h order Fourier approximation, SJ(x) = I < f, ei ) ei lil s n which by defining .f(x) = f(x + f" f(y) I e ii(x � y) dy, lil s n �, 2n), x E IR, can be rewritten as = (2n)�l SJ(x) = (2n) � 1 where Dn(Y) is the Dirichlet kernel, f, .f(x - y)Dn (Y) dy, (2. 1 1 .2) 70 2 . Hilbert Spaces 12 1 1 10 9 8 7 6 5 4 3 2 0 -1 -2 -3 -4 -5 -3 -4 Q -1 -2 2 3 4 5 Figure 2.2. The Dirichlet kernel D5 (x), - 5 :s; x :s; 5 (D"( · ) has period 2n:). Dn (y) = L lks n .. e 'JY = e i(n+1/2)y _ e-i(n+ 1/2)y . - e lyj2 . e•y/2 = { sin [(n + 1-)y] . If y # 0, . sm b·1 y) If y = 0. 2n + 1 (2. 1 1 .3) • A graph of the function D" is shown in Figure 2.2. 
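As a quick numerical check of the closed form (2.11.3), the short Python sketch below (not from the text; the truncation level n = 5 and the range −5 ≤ y ≤ 5 are chosen only to match Figure 2.2) compares the finite sum Σ_{|j|≤n} e^{ijy} with sin((n + ½)y)/sin(y/2):

    import numpy as np

    def dirichlet_sum(n, y):
        # D_n(y) as the sum of complex exponentials e^{ijy}, |j| <= n.
        j = np.arange(-n, n + 1)
        return np.real(np.exp(1j * np.outer(y, j)).sum(axis=1))

    def dirichlet_closed(n, y):
        # Closed form sin((n + 1/2)y) / sin(y/2), with value 2n + 1 at y = 0.
        y = np.asarray(y, dtype=float)
        out = np.full_like(y, 2 * n + 1)
        nz = np.abs(np.sin(y / 2)) > 1e-12
        out[nz] = np.sin((n + 0.5) * y[nz]) / np.sin(y[nz] / 2)
        return out

    n = 5
    y = np.linspace(-5.0, 5.0, 1001)
    assert np.allclose(dirichlet_sum(n, y), dirichlet_closed(n, y), atol=1e-8)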
For the function f(x) = 1 , <f, e0 ) = 1 and <J, ei ) = O, j # 0. Hence S" 1 (x) = 1 , and substituting this i n (2. 1 1 .2) we find that (2n)-1 J:, Dn(Y) dy Making use of (2. 1 1 .2) we can now write n - 1 (S0f(x) + · · · + S" _J(x)) = where Kn( Y) is the Fejer kernel, K n ( Y) = �1� "�1 2 nn if... =O . DJ ( Y) = = 1. J:/(x - y)K"(y) dy, L}:J sin [(j + 1-)y] . . 2 nn sm ( z1 Y) Evaluating the sum with the aid of the identity, 2 sin{ty)sin[(j + 1-)y] we find that = cos(jy) - cos [(j + 1)y], (2. 1 1 .4) (2. 1 1 .5) §2. 1 1 . * Complementary Results for Fourier Series 71 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -5 -4 -3 -2 ! -1 0 3 2 4 5 Figure 2.3. The Fejer kernel K 5(x), - 5 :s; x :s; 5 (K "( · ) has period 2n). K"(y) = 1 sin 2 (ny/2) . If y =f. O, 2nn sin 2 (y/2) n 2n (2. 1 1 .6) . If y = 0. The Fejer kernel is shown in Figure 2.3. It has the properties, (a) (b) (c) (d) (e) K n(Y) � 0 (unlike D"(y)), K"( · ) has period 2n, K"( - ) is an even function s� " K "(y) dy = 1 , for each b > 0, J�a Kn( Y) dy --> 1 as n --> oo . The first three properties are evident from (2. 1 1 .6). Property (d) is obtained by setting f(x) = 1 in (2. 1 1 .5). To establish (e), observe that Kn( Y) :s;; 1 for 0 < 2nn sin 2 (c5/2) c'5 < IYI :s;; n. For each c'5 > 0 this inequality implies that t�: Kn( Y) dy + 1" Kn( Y) dy --> 0 as n --> 00, which, together with property (d), proves (e). Now for any continuous function f with period 2n, we have from (2. 1 1 .5) 2. Hilbert Spaces 72 and property (d) of K " ( . ), 11 " (x) = l n - 1 (S0f(x) + · · · + Sn-d(x)) - f(x) l = = Hence for each .:5 > 0, 11n (x ) :$; l f/ If (x - y)K n (y) dy - f(x) , [ f(x - y) - f(x)] Kn (y) dy lf +Il o l [f(x - y) - f(x)] Kn (y) dy J [ - n , n]\(- o , o ) I I · [f(x - y) - f(x)] K n ( Y) dy I· (2. 1 1 .7) Since a continuous function with period 2n is uniformly continuous, we can choose for any 6 > 0, a value of .:5 such that sup _, ,; x ,; n l f(x - y) - f(x) l < 6 whenever I Y I < .:5. The first term on the right of (2. 1 1 .7) is then bounded by 6 J':., K n ( Y) dy and the second by 2M(l - J� o Kn ( Y) dy) where M = sup _, ,;x,; n l f(x) l . Hence _.. 6 as n _.. oo . But since 6 was arbitrary and 11n(x) 2 0, we conclude that 11" (x) _.. 0 uniformly on [ - n, n] as required. D Remark 1. Under additional smoothness conditions on f, Sn f may converge to f in a much stronger sense. For example if the derivative f' exists and f' E L2 [ - n, n] , then Sn f converges absolutely and uniformly to f (see Chur­ chill ( 1 969) and Problem 2.22). Theorem 2.1 1 .2. IffE L 2 [ - n, n] everywhere. and (f, ei > = 0 for all j E 71., then f = 0 almost PROOF. It sufficies to show that JAf(x) dx = 0 for all Borel subsets A of [ - n, n] or, equivalently, by a monotone class argument (see Billingsley ( 1986)), (2n) -1 f f(x) dx = (f, I1a , bJ ) = 0 (2. 1 1.8) for all subintervals [a, b] of [ - n, n]. Here Ira.bJ denotes the indicator function of [a, b]. To establish (2. 1 1 .8) we first show that (f, g) = 0 for any continuous function g on [ - n, n] with g( - n) g(n). By Theorem 2. 1 1 . 1 we know that = 73 Problems -n a a + !jn b - !jn b n Figure 2.4. The continuous function h. approximating J[a. bJ · for g continuous, g. = n -! (Sog plying in particular that + . . . + s.- g) --+ g uniformly on [ - n, TC], im­ 1 m.s . g. � g. By assumption ( f, g. ) = 0, so by continuity of the inner product, (f, g) = lim (f, g. ) = 0. The next step is to find a sequence {h.} of continuous functions such that h. � I[a.bJ · One such sequence is defined by h. 
(x) = 0 n (x - a ) 1 - n (x - b) 0 if - n :::;; x :::;; a, if a :::;; x :::;; a + 1/n, if a + 1/n :::;; x :::;; b - 1/n, if b - 1/n :::;; x :::;; b, if b :::;; x :::;; n, since III[a.bJ - h.ll 2 :::;; ( 1 /2n) (2/n) --+ 0 as n --+ continuity of the inner product again, oo. (f, /[a.bJ ) = lim ( J, h. ) (See Figure 2.4.) Using the = 0. D Problems 2. 1 . Prove the parallelogram law (2. 1 .9). 2.2. If {X,, t = 0, ± 1 , . . . } is a stationary process with mean zero and auto­ covariance function y( · ), show that Y, = I �� � ak Xk converges in mean square if I�o I;,o aiai y(i - j) is finite. 2.3. Show that if {X, t = 0, ± 1, . . . } is stationary and I ll I < 1 then for each n, L} 1 ()iX" + 1 i con verges in mean square as m --+ oo . � _ 2.4. I f .H is a closed subspace o f the Hilbert space £, show that (.H� )� = .H. 2.5. If .H is a closed subspace of the Hilbert space :Yt' and x E £, prove that min llx - Yll y e . It = max { l (x, z ) l : z E .H \ llzll = 1 }. 2. Hilbert Spaces 74 2.6. Verify the calculations of t/1 1 and t/12 in Example 2.3.4. Also check that X3 = (2 cos w)X2 - X 1 . 2.7. If£' is a complex Hilbert space and X;E £', i = 1 , . . . , n, show that sp{x 1 , . . . , xn} = {2:: }� 1 aj xj : aj E IC,j = 1 , . . . , n}. 2.8. Suppose that {X,, t = 1 , 2, . . . } is a stationary process with mean zero. Show that P;;pp .x 1 Xn ) Xn + ! = P;;p{X1, Xn ) Xn+ 1 · • • • • • • • • , 2.9. (a) Let £' = U ( [ - 1 , 1], :?1[ - 1 , 1], J1) where dJ1 = dx is Lebesgue measure on [ - 1 , 1]. Use the prediction equations to find constants a0, a 1 and a 2 which mmimize (b) Find max{gEA '. I I Y II � 1 1 J � 1 exg(x) dx where At = sp{ 1 , x, x 2 }. 2. 10. I f X, = Z, - OZ,_ 1 , where 1 0 1 < 1 and {Z" t = 0 , ± 1 , . . . } i s a sequence o f un­ correlated random variables, each with mean 0 and variance a 2 , show by check­ ing the prediction equations that the best mean square predictor of Xn+l m sp {Xj, - oo < j ::s; n} is j xn+1 = - I o xn + l -j · j� 1 00 What is the mean squared error of Xn+l? 2. 1 1 . I f X, is defined a s in Problem 2. 1 0 with () = 1 , find the best mean square predictor of Xn+1 in sp {Xj , I ::s; j ::s; n} and the corresponding mean squared error. 2. 1 2. If X, = ¢ 1 X,_ 1 + ¢2 X,_ 2 + . . . + ¢v Xr - p + Z" t = 0, ± 1, . . . where {Z, } is a se­ quence of uncorrelated random variables, each with mean zero and variance a2 and such that Z, is uncorrelated with { Xj ,j < t} for each t, use the prediction equations to show that the best mean square predictor of Xn+l in sp { Xj, -oo < j ::s; n} is Xn + 1 = !/J1 Xn + !/J2 Xn - 1 + . . . + t/JpXn+ 1 - p · 2. 1 3. (Gram-Schmidt orthogonalization). Let x 1 , x 2 , . . . , xn be linearly independent elements of a Hilbert space £' (i.e. elements for which lla 1 x 1 + · · · + anxn ll = 0 implies that a 1 = a 2 = · · · = an = 0). Define and Show that {ek = wk /llwk ll, k = l, . . . ,n} is an orthonormal set and that sp { e 1 o . . . , ek } = sp { x 1 , . . . , xk } for 1 ::s; k ::s; n. 2. 1 4. Show that every closed subspace At of IR" which contains a non-zero vector can be written as At = sp{e 1 , . . . ,em } where {e 1 , . . . ,em } is an orthonormal subset of At and m ( ::s; n) is the same for all such representations. Problems 75 2. 1 5. Let X 1 , X2 and X3 be three random variables with mean zero and covariance matrix, Use the Gram-Schmidt orthogonalization process of Problem 2. 
1 3 to find three uncorrelated random variables Z 1 , Z 2 and Z 3 such that sp {X 1 } = sp { Z 1 } , sp {X 1 , X 2 } = sp { Z 1 , Z 2 } and sp { X 1 , X 2 , X 3 } = sp {Z 1 , Z 2 , Z 3 } . 2. 1 6. (Hermite polynomials). Let £' = L 2 (1R, .@, J1) where dJ1 = (2nr 1 12 e - x'12 dx. Set f0 (x) = I , /1 (x) = x, f2 (x) = x 2 , f3 (x) = x 3 . Using the Gram-Schmidt ortho­ gonalization process, find polynomials Hk(x) of degree k, k = 0, I, 2, 3 which are orthogonal in £'. (Do not however normalize Hk(x) to have unit length.) Verify dk k that Hk(x) = ( - l ) ex'12 - e - x'l2 k = O I 2 3 dxk ' ' ' ' · 2. 1 7. Prove the first statement in the proof of Theorem 2.4.2. 2. 1 8. (a) Let x be an element of the Hilbert space £' = sp {x 1 , x 2 , . . . }. Show that £' is separable and that (b) If { X, t = 0, ± 1, . . . } is a stationary process show that P;;p{x1• -oo <J->n) Xn+1 = lim P;;p{x1.n-r<J-<;n) Xn+1 · r-oo 2. 1 9. (General linear model). Consider the general linear model Y = X9 + Z, where Y = ( Y1 , , Y,)' is the vector of observations, X is a known n x m matrix of rank m < n, 9 = (8 1 , . . . , em )' is an m-vector of parameter values, and Z = (Z 1 , . . . , z.)' is the vector of noise variables. The least squares estimator of 9 is given by equation (2.6.4), i.e. 9 = (X' xr 1 X'Y. . • . Assume that Z - N(O, a 2 I.) where I. is the n-dimensional identity matrix. (a) Show that Y - N(X9, a 2 I.). (b) Show that 9 - N(9, a2 (X' Xr 1 ). (c) Show that the projection matrix Pu = X(X' xr 1 X' is non-negative definite and has m non-zero eigenvalues all of which are equal to one. Similarly, I. - PH is also non-negative definite with (n - m) non-zero eigenvalues all of which are equal to one. (d) Show that the two vectors of random variables, Pu(Y - X9) and (/. - Pu)Y are independent and that a- 2 11 PH(Y - X9) 11 2 and a- 2 11 ( 1. - PH) Y II 2 are inde­ pendent chi-squared random variables with m and (n - m) degrees of freedom respectively. ( I I Y II here denotes the Euclidean norm of Y, i.e. (I,�� � l'? ) 1 12 .) (e) Conclude that 76 2. Hilbert Spaces has the 2.20. F (n - m) I I P.,u ( Y - X9) 11 2 m ii Y - P..�t YII 2 distribution with Suppose (X, Z 1 , • . • m and (n - m) degrees of freedom. , Z")' has a multivariate normal distribution. Show that , Z } ( X) = E..K(Z1, . . . ,zjX ), where the conditional expectation operator E..K(z1, z") is defined as in Section psp{ l .Z1 • • • • n . • . • 2.7. 2.2 1 . Suppose {X,, t = 0, ± 1 , . . . } i s a stationary process with mean zero and auto­ [y(h)l < oo ). covariance function y( · ) which is absolutely summable (i.e. Li:'= Define f to be the function, - oo h and show that y(h) = f': n e' 'f(A) dA. -n � A � n, 2.22. (a) If / E L 2 ( [ - n, n] ), prove the Riemann-Lebesgue lemma: (f, eh ) -> 0 as h -> oo, where e" was defined by (2.8.2). (b) If jE L 2 ( [ - n, n] ) has a continuous derivative f'(x) and f(n) = f( - n), show that (f, eh) = (ihr1 (f', eh) and hence that h(f, eh ) -> 0 as h -> oo. Show also that I I:'= I (f, eh ) I < oo and conclude that Snf (see Section 2.8) converges uniformly to f. - oo 2.23. 2.24. Show that the space F (Example 2.9. 1 ) is a separable Hilbert space. If Yf is any Hilbert space with orthonormal basis { e", n = 1 , 2, . . . }, show that the mapping defined by Th = { (h, e" ) }, hE Yf, is an isomorphism of Yf onto 12. 2.25.* Prove that .H(Z) (see Definition 2.7.3) is closed. CHAPTER 3 Stationary ARMA Processes In this chapter we introduce an extremely important class of time series {X,, t = 0, ± 1 , ± 2, . . . 
} defined in terms of linear difference equations with constant coefficients. The imposition of this additional structure defines a parametric family of stationary processes, the autoregressive moving average or ARMA processes. For any autocovariance function γ(·) such that lim_{h→∞} γ(h) = 0, and for any integer k > 0, it is possible to find an ARMA process with autocovariance function γ_X(·) such that γ_X(h) = γ(h), h = 0, 1, ..., k. For this (and other) reasons the family of ARMA processes plays a key role in the modelling of time-series data. The linear structure of ARMA processes leads also to a very simple theory of linear prediction which is discussed in detail in Chapter 5.

§3.1 Causal and Invertible ARMA Processes

In many respects the simplest kind of time series {X_t} is one in which the random variables X_t, t = 0, ±1, ±2, ..., are independently and identically distributed with zero mean and variance σ². From a second-order point of view, i.e. ignoring all properties of the joint distributions of {X_t} except those which can be deduced from the moments E(X_t) and E(X_s X_t), such processes are identified with the class of all stationary processes having mean zero and autocovariance function

    γ(h) = σ²  if h = 0,   and   γ(h) = 0  if h ≠ 0.     (3.1.1)

Definition 3.1.1. The process {Z_t} is said to be white noise with mean 0 and variance σ², written

    {Z_t} ~ WN(0, σ²),     (3.1.2)

if and only if {Z_t} has zero mean and covariance function (3.1.1). If the random variables Z_t are independently and identically distributed with mean 0 and variance σ² then we shall write

    {Z_t} ~ IID(0, σ²).     (3.1.3)

A very wide class of stationary processes can be generated by using white noise as the forcing terms in a set of linear difference equations. This leads to the notion of an autoregressive moving average (ARMA) process.

Definition 3.1.2 (The ARMA(p, q) Process). The process {X_t, t = 0, ±1, ±2, ...} is said to be an ARMA(p, q) process if {X_t} is stationary and if for every t,

    X_t − φ_1 X_{t−1} − ··· − φ_p X_{t−p} = Z_t + θ_1 Z_{t−1} + ··· + θ_q Z_{t−q},     (3.1.4)

where {Z_t} ~ WN(0, σ²). We say that {X_t} is an ARMA(p, q) process with mean μ if {X_t − μ} is an ARMA(p, q) process.

The equations (3.1.4) can be written symbolically in the more compact form

    φ(B)X_t = θ(B)Z_t,   t = 0, ±1, ±2, ...,     (3.1.5)

where φ and θ are the p-th and q-th degree polynomials

    φ(z) = 1 − φ_1 z − ··· − φ_p z^p     (3.1.6)

and

    θ(z) = 1 + θ_1 z + ··· + θ_q z^q,     (3.1.7)

and B is the backward shift operator defined by

    B^j X_t = X_{t−j},   j = 0, ±1, ±2, ....     (3.1.8)

The polynomials φ and θ will be referred to as the autoregressive and moving average polynomials respectively of the difference equations (3.1.5).

Example 3.1.1 (The MA(q) Process). If φ(z) ≡ 1 then

    X_t = θ(B)Z_t     (3.1.9)

and the process is said to be a moving-average process of order q (or MA(q)). It is quite clear in this case that the difference equations have the unique solution (3.1.9). Moreover the solution {X_t} is a stationary process since (defining θ_0 = 1 and θ_j = 0 for j > q) we see that

    EX_t = Σ_{j=0}^q θ_j EZ_{t−j} = 0

and

    Cov(X_{t+h}, X_t) = σ² Σ_{j=0}^{q−|h|} θ_j θ_{j+|h|}  if |h| ≤ q,   and 0 if |h| > q.

A realization of {X_1, ..., X_100} with q = 1, θ_1 = −.8 and Z_t ~ N(0, 1) is shown in Figure 3.1(a). The autocorrelation function of the process is shown in Figure 3.1(b).

Example 3.1.2 (The AR(p) Process). If θ(z) ≡ 1 then

    φ(B)X_t = Z_t     (3.1.10)

and the process is said to be an autoregressive process of order p (or AR(p)). In this case (as in the general case to be considered in Theorems 3.1.1–3.1.3) the existence and uniqueness of a stationary solution of (3.1.10) needs closer investigation. We illustrate by examining the case φ(z) = 1 − φ_1 z, i.e.

    X_t = φ_1 X_{t−1} + Z_t.     (3.1.11)

Iterating (3.1.11) we obtain

    X_t = Z_t + φ_1 Z_{t−1} + φ_1² X_{t−2}
        = ···
        = Z_t + φ_1 Z_{t−1} + ··· + φ_1^k Z_{t−k} + φ_1^{k+1} X_{t−k−1}.

If |φ_1| < 1 and {X_t} is stationary then ‖X_t‖² = E(X_t²) is constant, so that

    ‖X_t − Σ_{j=0}^k φ_1^j Z_{t−j}‖² = φ_1^{2k+2} ‖X_{t−k−1}‖² → 0   as k → ∞.

Since Σ_{j=0}^k φ_1^j Z_{t−j} is mean-square convergent (by the Cauchy criterion), we conclude that

    X_t = Σ_{j=0}^∞ φ_1^j Z_{t−j}.     (3.1.12)

Equation (3.1.12) is valid not only in the mean square sense but also (by Proposition 3.1.1 below) with probability one, i.e.

    X_t(ω) = Σ_{j=0}^∞ φ_1^j Z_{t−j}(ω)   for all ω ∉ E,

where E is a subset of the underlying probability space with probability zero. All the convergent series of random variables encountered in this chapter will (by Proposition 3.1.1) be both mean square convergent and absolutely convergent with probability one.

Now {X_t} defined by (3.1.12) is stationary since

    EX_t = Σ_{j=0}^∞ φ_1^j EZ_{t−j} = 0

Figure 3.1. (a) 100 observations of the series X_t = Z_t − .8Z_{t−1}, Example 3.1.1. (b) The autocorrelation function of {X_t}.

and

    Cov(X_{t+h}, X_t) = σ² φ_1^{|h|} Σ_{j=0}^∞ φ_1^{2j} = σ² φ_1^{|h|} / (1 − φ_1²).

Moreover {X_t} as defined by (3.1.12) satisfies the difference equations (3.1.11) and is therefore the unique stationary solution. A realization of the process with φ_1 = .9 and Z_t ~ N(0, 1) is shown in Figure 3.2(a). The autocorrelation function of the same process is shown in Figure 3.2(b).

In the case when |φ_1| > 1 the series (3.1.12) does not converge in L². However we can rewrite (3.1.11) in the form

    X_t = −φ_1^{−1} Z_{t+1} + φ_1^{−1} X_{t+1}.     (3.1.13)

Iterating (3.1.13) gives

    X_t = −φ_1^{−1} Z_{t+1} − φ_1^{−2} Z_{t+2} + φ_1^{−2} X_{t+2}
        = ···
        = −φ_1^{−1} Z_{t+1} − ··· − φ_1^{−k−1} Z_{t+k+1} + φ_1^{−k−1} X_{t+k+1},

which shows, by the same arguments as in the preceding paragraph, that

    X_t = −Σ_{j=1}^∞ φ_1^{−j} Z_{t+j}     (3.1.14)

is the unique stationary solution of (3.1.11). This solution should not be confused with the non-stationary solution {X_t, t = 0, ±1, ...} of (3.1.11) obtained when X_0 is any specified random variable which is uncorrelated with {Z_t}.

The stationary solution (3.1.14) is frequently regarded as unnatural since X_t as defined by (3.1.14) is correlated with {Z_s, s > t}, a property not shared by the solution (3.1.12) obtained when |φ_1| < 1. It is customary therefore when modelling stationary time series to restrict attention to AR(1) processes with |φ_1| < 1, for which X_t has the representation (3.1.12) in terms of {Z_s, s ≤ t}. Such processes are called causal or future-independent autoregressive processes. It should be noted that every AR(1) process with |φ_1| > 1 can be reexpressed as an AR(1) process with |φ_1| < 1 and a new white noise sequence (Problem 3.3). From a second-order point of view therefore, nothing is lost by eliminating AR(1) processes with |φ_1| > 1 from consideration. If |φ_1| = 1 there is no stationary solution of (3.1.
1 1 ) (Problem 3.4). Con­ sequently there is no such thing as an AR(1) with l ¢> 1 1 = 1 according to our Definition 3. 1 .2. The concept of causality will now be defined for a general ARMA(p, q) process. 00 = 3. Stationary ARMA Processes 82 8 ,-------� 7 6 5 4 3 2 0 ��-----=����---+----�--��,-��7---� -1 -2 -3 -4 -5 -6 -7 - 8 ����� 40 60 70 1 00 30 50 90 10 20 80 (a) 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -0. 1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1 0 5 10 15 20 (b) Figure 3.2. (a) 100 observations of the series X, - .9X,_ 1 autocorrelation function of {X, } . = Z,, Example 3. 1 .2. (b) The 83 §3. 1 . Causal and Invertible ARMA Processes Definition 3.1 .3. An ARMA(p, q) process defined by the equations ¢;(B) X, = 8(B)Z, is said to be causal (or more specifically to be a causal function of { Z, } ) if there exists a sequence of constants { t/Ji } such that I i=o I t/1) < oo and 00 X, = I t/Jj Zt -j , j=O (3. 1 . 1 5) t = 0, ± I , . . . . It should be noted that causality is a property not of the process { X, } alone but rather of the relationship between the two processes { X, } and { Z,} appearing in the defining ARMA equations. In the terminology of Section 4. 10 we can say that { X, } is causal if it is obtained from {Z, } by application of a causal linear filter. The following proposition clarifies the meaning of the sum appearing in (3. 1 . 1 5). If { X, } is any sequence of random variables such that and if I i= -oo I t/1) < oo, then the series Proposition 3.1.1. sup, E I X, l < oo, t/J(B)X, = 00 I j= -oo t/Jj BiX, -oo 00 I t/Jj Xt _j , j= = (3. 1 . 1 6) converges absolutely with probability one. If in addition sup, E I X, I 2 the series converges in mean square to the same limit. < oo then PROOF. The monotone convergence theorem and finiteness of sup, E I X, I give EC=�oo ) ) (t_ :S: !�� Ct ) l t/1) 1 x, -j 1 = !� E " l t/Jj i i X, _jl n l t/Jj l s �p E I X, I < 00 , from which i t follows that I i= I t/li l l X,_i l and t/I (B)X, both finite with probability one. lf sup, E I X, I 2 < oo and n > m > 0, then - oo E I I t/fj x, _j m <lil S, n l 2 = I I m <lil S, n m < lkl S, n :;:; s � p E I X, I 2 I i= - oo t/Ji Xt -j are t/Ji/lk E(x, _jx, _d c�s," ) --> 0 as m, n --> = l t/Ji l 2 oo , and so by the Cauchy criterion the series (3. 1 . 1 6) converges in mean square. If S denotes the mean square limit, then by Fatou's lemma, i t_ E I S - t/J(B)X, I 2 = E li �nf S � i i X, _i n t/l l 2 3. Stationary ARMA Processes 84 = 0, showing that the limits S and t/J (B)X, are equal with probability one. Proposition 3.1 .2. If {X,} y( · ) and if LJ:= - oo I t/Ji l < 0 is a stationary process with autocovariance function then for each t E 71. the series (3. 1 . 1 6) converges absolutely with probability one and in mean square to the same limit. If oo, Y, = t/J(B)X, then the process { Y, } is stationary with autocovariance function }'y(h) = t/J t/J y(h - j + k). j, k = - oo j k <Xl I PROOF. The convergence assertions follow at once from Proposition 3. 1 . 1 and the observation that if {X, } is stationary then 1 E I X, I ::;; (E I X,I 2 ) 12 = c, where c is finite and independent of t. To check the stationarity of { Y, } we observe, using the mean square convergence of (3. 1 . 1 6) and continuity of the inner product, that E Y, = !�� t. t/J X, _ = i n iE i and c=�oo t/Ji) EX,, <Xl L t/Jj t/J (y(h - j + k) + (EX,) 2 ) . j, k = - oo k Thus E Y, and E ( Y, +h Y, ) are both finite and independent of t. The auto­ covariance function Yr( · ) of { Y,} is given by }'y(h) = E ( Y, + h Y, ) - E Y, + h . 
E Y, = L t/Jj t/J y(h - j + k). j,k = - oo k <Xl = 0 It is an immediate corollary of Proposition 3. 1 .2 that operators such as t/J(B) LJ:= - oo t/Ji Bi with LJ:= - oo I t/Ji l < oo, when applied to stationary processes, are not only meaningful but also inherit the algebraic properties of power series. In particular if LJ:= - oo l ai l < oo , LJ:= - oo i Pi l < oo, a (B) = i LJ:= - oo ai Bi, fJ (B) = LJ:= - oo fJiB , and t/J(B) = LJ:= - oo t/Ji Bi, where <Xl <Xl t/Ji = L ak fJi -k = L Pk ai -k• k = - oo k = - oo 85 §3. 1 . Causal and Invertible ARMA Processes then a (B) {J (B)X, is well-defined and a (B) {J (B)X, = {J (B) a (B)X, = 1/J(B)X, . The following theorem gives necessary and sufficient conditions for an ARMA process to be causal. It also gives an explicit representation of X, in terms of { Z5, s ::::; t}. Let {X,} be an ARMA(p, q) process for which the polynomials r/J( · ) and 8( · ) have no common zeroes. Then {X, } is causal if and only if rjJ(z) #- 0 for all z d � such that I z I ::::; 1 . The coefficients { 1/JJ in (3. 1 . 1 5) are determined by the relation 00 1/J(z) = L 1/Ji zi = 8(z)/r/J(z), (3. 1 . 1 7) l z l ::::; 1 . j=O Theorem 3.1 . 1 . (The numerical calculation of the coefficients 1/Ji is discussed in Section 3.3.) PROOF. First assume that r/J(z) #- 0 if I z I ::::; 1. This implies that there exists 1:: > 0 such that 1 /r/J(z) has a power series expansion, 1 /r/J(z) = L �iz i = �(z), l z l < 1 + 1::. j=O Consequently �i ( 1 + t:/2) i � 0 asj � oo so that there exists K E (O, oo) for which l �i l < K(1 + t:/2)-i for all j = 0, 1, 2, . . . . 00 In particular we have L i= o l �i l < oo and �(z)r/J(z) = 1 for l z l ::::; 1 . By Pro­ position 3. 1 .2 we can therefore apply the operator �(B) to both sides of the equation r/J(B)X, = 8(B)Z, to obtain X, = �(B)8(B)Z,. Thus we have the desired representation, X, = 00 I 1/Jj Zt-j j= O where the sequence { 1/Ji } is determined by (3. 1 . 1 7). Now assume that {X, } is causal, i.e. X, = L i'=o 1/Ji Z, _ i for some sequence { 1/JJ such that L i'= o 1 1/1) < oo . Then 8(B)Z, = r/J(B)X, = rjJ(B) IjJ (B)Z, . I f we let 11(z) = r/J(z)I/J (z) = L i=o 11i z i, lzl ::::; 1, we can rewrite this equation as and taking inner products of each side with Z, _ k (recalling that { Z, } WN(O, o- 2 )) we obtain 11k = ek, k = 0, . . . , q and 11k = 0, k > q. Hence 8(z) = 11(z) = rjJ (z)ljJ(z), l z l ::::; 1 . '"'"' 3. Stationary ARMA Processes 86 Since 8(z) and r/J(z) have no common zeroes and since 1 1/J(z) l < oo for l z l :::::; 1 , we conclude that r/J(z) cannot be zero for l z l :::::; 1 . D Remark 1 . If {X, } is an ARMA process for which the polynomials r/J( " ) and 8( · ) have common zeroes, then there are two possibilities: (a) none of the common zeroes lie on the unit circle, in which case (Problem 3.6) {X, } is the unique stationary solution of the ARMA equations with no common zeroes, obtained by cancelling the common factors of r/J( · ) and 8( . ), (b) at least one of the common zeroes lies on the unit circle, in which case the ARMA equations may have more than one stationary solution (see Problem 3.24). Consequently ARMA processes for which r/J( · ) and 8( · ) have common zeroes are rarely considered. Remark 2. The first part of the proof of Theorem 3. 1 . 1 shows that if {X, } i s a stationary solution of the ARMA equations with r/J(z) # 0 for l z l :::::; 1, then w e must have X, = L i= o 1/Ji Z,_i where { 1/JJ is defined by (3. 1 . 1 7). Conversely if X, = L i=o 1/Ji Zt -i then r/J(B)X, = r/J(B)Ij!(B)Z, = 8(B)Z,. 
Thus the process { 1/J(B)Z, } is the unique stationary solution of the ARMA equations if r/J(z) # 0 for l z l :::::; 1 . Remark 3 . We shall see later (Problem 4.28) that if r/J( " ) and 8( ' ) have no common zeroes and if r/J(z) = 0 for some z E C with l z l = 1 , then there is no stationary solution of r/J(B)X, = 8(B)Z,. We now introduce another concept which is closely related to that of causality. An ARMA(p, q) process defined by the equations r/J(B)X, = 8(B)Z, is said to be invertible if there exists a sequence of constants { ni } such that L i=o I ni l < oo and Definition 3.1 .4. 00 z, = I nj x, _j , j=O t = 0, ± 1, . . . . (3. 1 . 1 8) Like causality, the property of invertibility is not a property of the process {X,} alone, but of the relationship between the two processes {X, } and { Z, } appearing in the defining ARMA equations. The following theorem gives necessary and sufficient conditions for invertibility and specifies the coeffi­ cients ni in the representation (3. 1 . 1 8). Let {X,} be an ARMA(p, q) process for which the polynomials r/J( · ) and 8( · ) have no common zeroes. Then {X, } is invertible if and only if Theorem 3.1 .2. §3. 1 . Causal and Invertible ARMA Processes 87 B(z) # 0 for all z E C such that l z l � 1 . The coefficients {n:J in (3. 1 . 1 8) are determined by the relation n:(z) = L n:i zi = r/J(z)/B(z), j= O 00 lzl � 1. (3. 1 . 1 9) (The coefficients { n:J can be calculated from recursion relations analogous to those for { t/JJ (see Problem 3.7).) PROOF. First assume that B(z) # 0 if l z l � 1. By the same argument as in the proof of Theorem 3. 1 . 1 , 1/B(z) has a power series expansion 1 /B(z) = L '1i z i = 17(z), l z l < 1 + 8, j =O for some 8 > 0. Since L � o 1 '11 1 < oo , Proposition 3. 1 .2 allows us to apply 17(B) to both sides of the equation r/J(B)X, = B(B)Z, to obtain 00 17(B)rjJ(B)X, = 17(B)8(B)Z, = Z, . Thus we have the desired representation 00 z, = I n:j x,_j, j =O where the sequence { n:J is determined by (3. 1 . 1 9). Conversely if {X, } is invertible then Z, = L� o n:i X,_i for some sequence { n:i } such that L � o I n:i I < oo . Then r/J(B)Z, = n:(B)r/J(B) X, = n:(B)B(B)Z, . Setting �(z) = n:(z)B(z) = L � o �i z i, l z l � 1 , we can rewrite this equation as and taking inner products of each side with Z,_ k we obtain �k = r/Jk , k = 0, . . . , p and � k = 0, k > p. Hence r/J(z) = �(z) = n:(z)B(z), lzl � 1. Since r/J(z) and B(z) have no common zeroes and since l n:(z) l < oo for l z l � 1 , we conclude that B(z) cannot be zero for l z l � 1 . 0 Remark 4. If {X,} is a stationary solution of the equations r/J(B)X, = B(B)Z" and if r/J(z)B(z) # 0 for l z l � 1 , then (3. 1 .20) 00 X, = L 1/JjZr -j j =O and 3. Stationary ARMA Processes 88 Z, = 00 L njXr -j • j=O Remark 5. If {X, } is any ARMA process, f/J(B)X, = 8 (B) Z, , with </J(z) non-zero for all z such that I z I = 1 , then it is possible to find polynomials �( · ), 8( · ) and a white noise process { Z:} such that ;f;(B)X, = 8(B)Zi and such that { X,} is a causal function of {Zi}. If in addition 8(z) is non-zero when l z l = 1 then 8( · ) can be chosen in such a way that { X,} is also an invertible function of {Zi }, i.e. such that B(z) is non-zero for l z l � 1 (see Proposition 3.5. 1 ). If {Z,} � IID(O, a2) it is not true in general that {Z:} is independent (Breidt and Davis ( 1 990)). It is true, however, if { Z,} is Gaussian (see Problem 3. 1 8) . Remark 6. Theorem 3. 
1 .2 can be extended to include the case when the moving average polynomial has zeroes on the unit circle if we extend the definition of invertibility to require only that Z, E sp{X., - oo < s � t } . Under this definition, a n ARMA process i s invertible i f and only i f 8(z) =/= 0 for all l z l < 1 (see Problem 3.8 and Propositions 4.4. 1 and 4.4.3). In view of Remarks 4 and 5 we shall focus attention on causal invertible ARMA processes except when the contrary is explicitly indicated. We con­ clude this section however with a discussion of the more general case when causality and invertibility are not assumed. Recall from Remark 3 that if </J( · ) and 8( · ) have no common zeroes and if </J(z) = 0 for some z E C with l z l = 1 , then there i s n o stationary solution o f </J (B)X, = 8(B)Z,. I f o n the other hand qy(z) =/= 0 for all z E C such that l z l = 1 , then a well-known result from complex analysis guarantees the existence of r > 1 such that 8(z)f/J(z)- 1 = 00 i L t/ti z = t/t (z), j= r- 1 < l z l < r, (3. 1 .2 1 ) - ro the Laurent series being absolutely convergent i n the specified annulus (see e.g. Ahlfors ( 1 953)). The existence of this Laurent expansion plays a key role in the proof of the following theorem. If f/J(z) =/= 0 for all z E C such that l z l = 1, then the ARMA equations f/J(B)X, = 8(B)Z, have the unique stationary solution, Theorem 3.1 .3. j= -oo (3. 1 .22) where the coefficients t/ti are determined by (3. 1 .21 ). PROOF. By Proposition 3.1 .2, {X, } as defined by (3.1 .22) is a stationary process. Applying the operator f/J(B) to each side of (3.1 .22) and noting, again by 89 §3.2. Moving Average Processes of Infinite Order Proposition 3. 1 .2, that rp(B)Ij;(B)Z1 = rp(B)X1 8(B)Z1, we obtain 8(B)Z1• = (3.1 .23) Hence { X1 } is a stationary solution of the ARMA equations. To prove the converse let { X1 } be any stationary solution of (3. 1 .23). Since r/J(z) =f- 0 for all z E IC such that l z l = 1 , there exists b > 1 such that the series 1 �(z) is absolutely convergent for b- 1 < l z l < b. We can j L � -� �j z = r/J(z)therefore apply the operator �(B) to each side of (3. 1 .23) to get = �(B)rp(B)X1 or equivalently = �(B)e(B)Zu Xt = lj;(B)Zt. D §3.2 Moving Average Processes of Infinite Order In this section we extend the notion of MA(q) process introduced in Section 3. 1 by allowing q to be infinite. Definition 3.2.1 . If { Z1 } � WN(O, CJ2 ) then we say that { X1 } is a moving average (MA(oo)) of {Z1 } if there exists a sequence {t/lj } with L � o l t/lj l < oo such that xt = = 00 t/lj Zt-j, jI =O t = 0, ± 1, ± 2, . . . . (3.2. 1) = 3.2. 1 . The MA(q) process defined by (3. 1 .9) is a moving average of {Z1 } with t/Jj = ej, j 0, 1, . . . , q and t/Jj O, j > q. ExAMPLE EXAMPLE with t/Jj = 3.2.2. The AR(1 ) process with l r/J I < 1 is a moving average of {Z1 } rp j, j = 0, 1 , 2, . . . . = EXAMPLE 3.2.3. By Theorem 3. 1 . 1 the causal ARMA(p, q) process rp(B)X1 = 8(B)Z1 is a moving average of {Z1 } with L J=O t/Jj z j 8(z)/r/J(z), l zl � 1 . It should be emphasized that i n the definition of M A ( oo ) of { Z1 } i t is required that X1 should be expressible in terms of Z5, s � t, only. It is for this reason that we need the assumption of causality in Example 3.2.3. However, even for non-causal ARMA processes, it is possible to find a white noise sequence {zn such that X1 is a moving average of {zn (Proposition 3.5. 1 ). Moreover, as we shall see in Section 5.7, a large class of stationary processes have MA( oo) representations. 
We consider a special case in the following proposition. Proposition 3.2.1 . If { X1 } is a zero-mean stationary process with autocovariance function y( · ) such that y(h) = 0 for I h i > q and y(q) =/= 0, then { X1 } is an MA(q) 90 3. Stationary ARMA Processes process, i.e. there exists a white noise process { Z, } such that X, = Z, + 8 1 Z,� 1 + · · · + 8q Zr �q · (3.2.2) PROOF. For each t, define the subspace A, = sp { X, -oo < s :-s; t } of U and set Z, = X, - PA, _ , X, . (3.2.3) Clearly Z, E A" and by definition of PAtt - I , Z, E A/� 1 . Thus if s < t, Zs E As c ,4/, �1 and hence EZ5Z, = 0. Moreover, by Problem 2. 1 8 P5P{Xs . s=r�n , ... ,r � l } Xr � PAtt - I X, as n --> oo, so that by stationarity and the continuity of the L 2 norm, II Zr+ 1 ll = 11Xr+ 1 - PA, Xr + 1 l l = nlim 11 Xr + 1 - Psp{Xs , s=r+ 1 �n, ... ,r} Xr + 1 11 �oo = nlim I IX, - Psp{Xs , s=r�n , .. . , r � l } Xr l l �oo = I I X, - PH,_ , X, I I = II Z, II . Defining (J 2 = I I Z, I I 2 , we conclude that {Zr } Now by (3.2.3), it follows that � WN(O, (J 2 ). A,� 1 = sp { X, s < t - 1 , Z, � d = sp { X, s < t - q, Z, �q , . . . , Z,�d and consequently A, �1 can be decomposed into the two orthogonal sub­ spaces, A, �q�1 and sp{ Zr� q , . . . , Z, � 1 } . Since y(h) = 0 for I h i > q, it follows that X, j_ A, �q�1 and so by Proposition 2.3.2 and Theorem 2.4. 1 , �H,_ , X, = PAt,_._ , X, + Psp{ z,_. , ... ,z,_, } X, = 0 + (J � 2 E (X,Z, �1 )Z,�1 + · · · + (J � 2 E (X,Z, �q )Z, �q = 8 1 z,� 1 + . . . + eq zr�q where ()i := (J � 2 E (X,Z, �j ), which by stationarity is independent of t for j = 1 , . . . , q. Substituting for PAt, _ , X, in (3.2.3) gives (3.2.2). D If { X ,} has the same autocovariance function as that of an ARMA(p, q) process, then { X ,} is also an ARMA(p, q) process. In other words, there exists a white noise sequence {Z,} and coefficients ¢ 1 , . . . , ¢ v , 8 1, . . . ' eq such that X, - ¢1 1 Xr � t - . . . - f/JvXr � v = z, + 8 1 Zr � t + . . . + eq zr � q Remark. (see Problem 3.19). 91 §3.3. Computing the Autocovariance Function of an ARMA(p, q) Process The following theorem i s a n immediate consequence of Proposition 3. 1 .2. Theorem 3.2.1 . The MA(oo) process defined by (3.2. 1 ) is stationary with mean zero and autocovariance function y (k) = a2 OC! (3.2.4) I j t/Jj+ lk l · j =O t/J Notice that Theorem 3.2. 1 together with Example 3.2.3 completely deter­ mines the autocovariance function y of any causal ARMA(p, q) process. We shall discuss the calculation of y in more detail in Section 3.3. The notion of AR(p) process introduced in Section 3.1 can also be extended to allow p to be infinite. In particular we note from Theorem 3. 1 .2 that any invertible ARMA(p, q) process satisfies the equations = OC! = t 0, ± 1 , ± 2, . . . X, + L nj xt -j Z,, j=l which have the same form as the AR(p) equations (3. 1 . 10) with p = oo. §3.3 Computing the Autocovariance Function of an ARMA(p, q) Process We now give three methods for computing the autocovariance function of an ARMA process. In practice, the third method is the most convenient for obtaining numerical values and the second is the most convenient for obtain­ ing a solution in closed form. = First Method. The autocovariance function y of the causal ARMA(p, q) process rjJ(B)X, 8(B)Z, was shown in Section 3.2 to satisfy y (k) where = a2 = 00 ro (3.3. 1 ) I j j k• j= O t/J i/J +l l = for l z l :o;; 1 , (3.3.2) L lj;i z i B(z)/r/J(z) j= O B(z) 1 + e l z + . . . + eq z q and r/J(z) 1 rf i Z - . . . - r/Jp z P. 
In order to determine the coefficients if;i we can rewrite (3.3.2) in the form lj;(z)rjJ(z) B(z) and equate coefficients of z i to obtain (defining 80 = 1 , ei 0 for j > q and r/Ji 0 for j > p), = lj;(z) = = = - 0 :o;; j < max(p q + 1) , = (3.3.3) 92 3. Stationary ARMA Processes and = = j � max(p, q + 1). (3.3.4) These equations can easily be solved successively for 1/10 , lj; 1 , lj; , . . . . Thus 2 1/10 lj; l 00 = el + 1, 1/Jo r/Jl = el + ¢1 , (3.3.5 ) Alternatively the general solution (3.3.4) can be written down, with the aid of Section 3.6 as r; - 1 k n � max(p, q + 1 ) - p, aii n i � i ", (3.3.6) I I = l = O j i where �;, i = 1 , . . . , k are the distinct zeroes of rfo(z) and r; is the multiplicity of �; (so that in particular we must have I �= l r; p). The p constants aii and the coefficients lj;i , 0 � j < max(p, q + 1) - p, are then determined uniquely by the if;. = = max(p, q + 1) boundary conditions (3.3.3). This completes the determination of the sequence { lj;i } and hence, by (3.3.1), of the autocovariance function y . ExAMPLE form 3.3. 1 . ( 1 - B + ±B 2 )X1 = ( 1 + B)Z1• The equations (3.3.3) take the and (3.3.4) becomes 1/10 = 00 = 1 , 1/11 = 81 + I/Jo r/J 1 = 81 + r/J 1 = 2, lj;j - 1/Jj - 1 + t i/Jj - 2 = 0, The general solution of (3.3.4) is (see Section 3.6) j � 2. n � 0. The constants a 1 0 and a 1 1 are found from the boundary conditions lj;0 and if; 1 = 2 to be a 1 0 = 1 and a 1 1 = 3. Hence if;. = (1 + 3n)T ", n = 0, 1 , 2, . . . . Finally, substituting in (3.3 . 1 ), we obtain for k � 0 y( k) = a2 I ( 1 + 3j)( 1 + 3j + 3k)rzj - k 00 j= O = a2 rk I [(3k + 1 )4-j + 3 (3k + 2)j4-j + 9/4 -j J 00 j=O = a2 r k [j:(3k + 1) + V (3k + 2) + 1NJ = a 2 rk [ 332 + 8k]. = 1 §3.3. Computing the Autocovariance Function of an ARMA(p, q) Process 93 Second Method. An alternative method for computing the autocovariance function y( · ) of the causal ARMA(p, q) l/J(B)X, = 8(B) , (3.3.7) Z, is based on the difference equations for y(k), k = 0, 1, 2, . . . , which are obtained by multiplying each side of (3.3.7) by X,_ k and taking expectations, namely and 0 s k < max(p, q + 1 ), y(k) - ¢11 y(k - 1 ) - . . . - l/Jp y(k - p) = 0, = k ri - 1 y(h) = I. I. {3 i =1 (3.3.8) k :2: max(p, q + 1 ). (3.3.9) (In evaluating the right-hand sides of these equations we have used the representation X, I.� o tf;jZr -j · ) The general solution of (3.3.9) has the same form as (3.3.6), viz. j= O ij hj � i \ :2: h max(p, q + 1 ) - p, (3.3. 10) where the p constants {3ii and the covariances y(j), 0 s j < max(p, q + 1 ) - p, are uniquely determined from the boundary conditions (3.3.8) after first com­ puting t/;0, tj; 1 , . . . , t/;q from (3.3.5). ExAMPLE = 3.3.2. ( 1 - B + iB 2 ) X, ( 1 + B)Z,. The equations (3.3.9) become y(k) - y(k - 1 ) + t y(k - 2) = 0, k :2: 2, with general solution n :2: 0. (3.3.1 1) y(n) = ( /31 0 + {31 1 n) T n, The boundary conditions (3.3.8) are y(O) - y(l) + t y(2) a 2 (t/Jo + t/1 1 ), y(1) - y(O) + ty{l) a 2 tj;0 , where from (3.3.5), t/;0 1 and t/; 1 = 81 + ¢1 1 = 2. Replacing y(O), y(l) and y(2) in accordance with the general solution (3.3. 1 1) we obtain 3{31 0 - 2{3 1 1 1 6a 2 , - 3f31o + 5{31 1 = 8a z , = = = whence {31 1 = 8a2 = and {3 1 0 = 32a 2/3. Finally therefore we obtain the solution y(k) = a2 2-k [ 3l + 8k], as found in Example 3.3. 1 using the first method. 94 3. Stationary ARMA Processes ExAMPLE 3.3.3 (The Autocovariance Function of an MA(q) Process). By Theorem 3.2. 1 the autocovariance function of the process has the extremely simple form l k l � q, (3.3. 
1 2) l k l > q. where 80 is defined to be 1 and ()j, j > q, is defined to be zero. ExAMPLE 3.3.4 (The Autocovariance Function of an AR(p) Process). From (3.3. 1 0) we know that the causal AR(p) process </J(B)Xr = Zr, has an autocovariance function of the form k ri-1 y(h) = I :L f3ij hj � � \ i =l j=O (3.3. 1 3) where C i = 1, . . . , k, are the zeroes (possibly complex) of </J(z), and r; is the multiplicity of � ;- The constants f3ij are found from (3.3.8). By changing the autoregressive polynomial </J( · ) and allowing p to be arbitrarily large it is possible to generate a remarkably large variety of covariance functions y( · ). This is extremely important when we attempt to find a process whose autocovariance function "matches" the sample auto­ covariances of a given data set. The general problem of finding a suitable ARMA process to represent a given set of data is discussed in detail in Chapters 8 and 9. In particular we shall prove in Section 8. 1 that if y( · ) is any covariance function such that y(h) --> 0 as h --> oo, then for any k there is a causal AR(k) process whose autocovariance function at lags 0, 1 , . . . , k, coincides with y(j), j = 0, 1, . . . , k. We note from (3.3. 1 3) that the rate of convergence of y(n) to zero as n --> oo depends on the zeroes � ; which are closest to the unit circle. (The causality condition guarantees that I � ; I > 1 , i = 1, . . . , k.) If </J( - ) has a zero close to the unit circle then the corresponding term or terms of (3.3. 1 3) will decay in absolute value very slowly. Notice also that simple real zeroes of </J( · ) contri­ bute terms to (3.3 . 1 3) which decrease geometrically with h. A pair of complex conjugate zeroes together contribute a geometrically damped sinusoidal term. We shall illustrate these possibilities numerically in Example 3.3.5 with refer­ ence to an AR(2) process. §3.3. Computing the Autocovariance Function of an ARMA(p, q) Process EXAMPLE 95 3.3.5 (An Autoregressive Process with p = 2). For the causal AR(2), we easily find from (3.3. 1 3) and (3.3.8), using the relations r/J 1 = G 1 + G 1 , and that Figure 3.3 illustrates some of the possible forms of y( · ) for different values of �1 and � 2 . Notice that if �1 = re i6 and � 2 = re- i6, 0 < 8 < n, then we can rewrite ( 3.3.14) in the more illuminating form, 0"2r4 , -h sin(h8 + 1/1 ) y(h) = (3.3. 1 5) (r2 - 1 ) (r4 - 2r2 cos 28 + 1 ) 112 sin 8 ' • where r2 + 1 tan 1/J = --- tan 8 r2 - 1 (3.3. 1 6) and cos 1/J has the same sign as cos 8. Third Method. The numerical determination of the autocovariance function y( · ) from equations ( 3.3.8) and (3.3.9) can be carried out readily by first finding y(O), . . . , y( p) from the equations with k = 0, 1, . . . , p, and then using the subsequent equations to determine y(p + 1 ), y( p + 2), . . . recursively. EXAMPLE 3.3.6. For the process considered in Examples 3.3. 1 and 3.3.2 the equations (3.3.8) and (3.3.9) with k = 0, 1, 2 are y(O) - y(l) + t y(2) = 30"2 , y(1) - y(O) + ty(1) = 0"2, y(2) - y(l) + t y(O) = 0, with solution y(O) = 320" 2 /3, y(1) = 280" 2/3, y(2) = 200"2/3. The higher lag autocovariances can now easily be found recursively from the equations y(k) = y(k - 1 ) - t y(k - 2), k = 3, 4, . . . . 3. Stationary ARMA Processes 96 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -0.1 -0 2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 - 0. 9 -1 0 10 5 15 20 (a) 0.9 0. 8 0 7 0.6 0.5 0.4 0.3 0.2 0. 1 0 -0. 
[Figure 3.3, panels (a)–(d), appears here.] Figure 3.3. Autocorrelation functions γ(h)/γ(0), h = 0, . . . , 20, of the AR(2) process (1 − ξ_1^{−1}B)(1 − ξ_2^{−1}B)X_t = Z_t when (a) ξ_1 = 2 and ξ_2 = 5, (b) ξ_1 = 10/9 and ξ_2 = 2, (c) ξ_1 = −10/9 and ξ_2 = 2, (d) ξ_1, ξ_2 = 2(1 ± i√3)/3.

§3.4 The Partial Autocorrelation Function

The partial autocorrelation function, like the autocorrelation function, conveys vital information regarding the dependence structure of a stationary process. Like the autocorrelation function it also depends only on the second order properties of the process. The partial autocorrelation α(k) at lag k may be regarded as the correlation between X_1 and X_{k+1}, adjusted for the intervening observations X_2, . . . , X_k. The idea is made precise in the following definition.

Definition 3.4.1. The partial autocorrelation function (pacf) α(·) of a stationary time series is defined by

   α(1) = Corr(X_2, X_1) = ρ(1),

and

   α(k) = Corr(X_{k+1} − P_{sp{X_2,...,X_k}}X_{k+1}, X_1 − P_{sp{X_2,...,X_k}}X_1),   k ≥ 2,

where the projections P_{sp{X_2,...,X_k}}X_{k+1} and P_{sp{X_2,...,X_k}}X_1 can be found from (2.7.13) and (2.7.14). The value α(k) is known as the partial autocorrelation at lag k.

The partial autocorrelation α(k), k ≥ 2, is thus the correlation of the two residuals obtained after regressing X_{k+1} and X_1 on the intermediate observations X_2, . . . , X_k. Recall that if the stationary process has zero mean then P_{sp{1,X_2,...,X_k}}(·) = P_{sp{X_2,...,X_k}}(·) (see Problem 2.8).

EXAMPLE 3.4.1. Let {X_t} be the zero mean AR(1) process

   X_t = .9X_{t−1} + Z_t,   {Z_t} ~ WN(0, σ²).

For this example

   α(1) = Corr(X_2, X_1) = Corr(.9X_1 + Z_2, X_1) = .9

since Corr(Z_2, X_1) = 0. Moreover P_{sp{X_2,...,X_k}}X_{k+1} = .9X_k by Problem 2.12 and P_{sp{X_2,...,X_k}}X_1 = .9X_2 since (X_1, X_2, . . . , X_k)′ has the same covariance matrix as (X_{k+1}, X_k, . . . , X_2)′. Hence for k ≥ 2,

   α(k) = Corr(X_{k+1} − .9X_k, X_1 − .9X_2) = Corr(Z_{k+1}, X_1 − .9X_2) = 0.

A realization of 100 observations {X_t, t = 1, . . . , 100} was displayed in Figure 3.2. Scatter diagrams of (X_{t−1}, X_t) and (X_{t−2}, X_t) are shown in Figures 3.4 and 3.5 respectively. The sample correlation ρ̂(1) = Σ_{t=1}^{99}(X_t − X̄)(X_{t+1} − X̄)/[Σ_{t=1}^{100}(X_t − X̄)²] for Figure 3.4 is .814 (as compared with the corresponding theoretical correlation ρ(1) = .9). Likewise the sample correlation ρ̂(2) = Σ_{t=1}^{98}(X_t − X̄)(X_{t+2} − X̄)/[Σ_{t=1}^{100}(X_t − X̄)²] for Figure 3.5 is .605 as compared with the theoretical correlation ρ(2) = .81.

[Figure 3.4 appears here.] Figure 3.4. Scatter plot of the points (x_{t−1}, x_t) for the data of Figure 3.2, showing the line x_t = .9x_{t−1}.

[Figure 3.5 appears here.] Figure 3.5. Scatter plot of the points (x_{t−2}, x_t) for the data of Figure 3.2, showing the line x_t = .81x_{t−2}.

In Figure 3.6 we have plotted the points (X_{t−2} − .9X_{t−1}, X_t − .9X_{t−1}). It is apparent from the graph that the sample correlation between these variables is very small, as expected from the fact that the theoretical partial autocorrelation at lag 2, i.e. α(2), is zero. One could say that the correlation between X_{t−2} and X_t is entirely eliminated when we remove the information in both variables explained by X_{t−1}.
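A quick simulation makes the point concrete. The sketch below (Python; the seed, burn-in length and helper name are ours, not part of the text) generates an AR(1) series with φ = .9, computes the lag-one and lag-two sample autocorrelations, and then the sample correlation of the residual pairs (X_{t−2} − .9X_{t−1}, X_t − .9X_{t−1}), which should be close to α(2) = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
n, phi = 100, 0.9
z = rng.normal(size=n + 50)
x = np.zeros(n + 50)
for t in range(1, n + 50):
    x[t] = phi * x[t - 1] + z[t]               # X_t = .9 X_{t-1} + Z_t
x = x[50:]                                      # drop a burn-in so start-up effects are negligible

def sample_rho(x, h):
    """Sample autocorrelation at lag h, as used in Example 3.4.1."""
    xbar = x.mean()
    return float(np.sum((x[:-h] - xbar) * (x[h:] - xbar)) / np.sum((x - xbar) ** 2))

print(sample_rho(x, 1), sample_rho(x, 2))       # roughly .9 and .81 for a long series
u = x[2:] - phi * x[1:-1]                       # X_t     - .9 X_{t-1}
v = x[:-2] - phi * x[1:-1]                      # X_{t-2} - .9 X_{t-1}
print(np.corrcoef(u, v)[0, 1])                  # near alpha(2) = 0
```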
EXAMPLE 3.4.2 (An MA(1) Process). For the moving average process

   X_t = Z_t + θZ_{t−1},   {Z_t} ~ WN(0, σ²),   |θ| < 1,

we have α(1) = ρ(1) = θ/(1 + θ²). A simple calculation yields

   P_{sp{X_2}}X_3 = [θ/(1 + θ²)]X_2 = P_{sp{X_2}}X_1,

whence

   α(2) = Corr(X_3 − θ(1 + θ²)^{−1}X_2, X_1 − θ(1 + θ²)^{−1}X_2) = −θ²/(1 + θ² + θ⁴).

More lengthy calculations (Problem 3.23) give

   α(k) = −(−θ)^k (1 − θ²)/(1 − θ^{2(k+1)}),   k ≥ 1.

One hundred observations {X_t, t = 1, . . . , 100} of the process with θ = −.8 and ρ(1) = −.488 were displayed in Figure 3.1. The scatter diagram of the points (X_{t−2} + .488X_{t−1}, X_t + .488X_{t−1}) is plotted in Figure 3.7 and the sample correlation of the two variables is found to be −.297, as compared with the theoretical correlation α(2) = −(.8)²/(1 + .8² + .8⁴) = −.312.

EXAMPLE 3.4.3 (An AR(p) Process). For the causal AR process

   X_t − φ_1X_{t−1} − · · · − φ_pX_{t−p} = Z_t,   {Z_t} ~ WN(0, σ²),

we have for k > p,

   P_{sp{X_2,...,X_k}}X_{k+1} = Σ_{j=1}^{p} φ_jX_{k+1−j},                                     (3.4.1)

since if Y ∈ sp{X_2, . . . , X_k} then by causality Y ∈ sp{Z_j, j ≤ k} and

   ⟨X_{k+1} − Σ_{j=1}^{p} φ_jX_{k+1−j}, Y⟩ = ⟨Z_{k+1}, Y⟩ = 0.

For k > p we conclude from (3.4.1) that

   α(k) = Corr(X_{k+1} − Σ_{j=1}^{p} φ_jX_{k+1−j}, X_1 − P_{sp{X_2,...,X_k}}X_1) = 0.

[Figure 3.6 appears here.] Figure 3.6. Scatter plot of the points (x_{t−2} − .9x_{t−1}, x_t − .9x_{t−1}) for the data of Figure 3.2.

[Figure 3.7 appears here.] Figure 3.7. Scatter plot of the points (x_{t−2} + .488x_{t−1}, x_t + .488x_{t−1}) for the data of Figure 3.1, showing the line y = −.312x.

For k ≤ p the values of α(k) can easily be computed from the equivalent Definition 3.4.2 below, after first determining ρ(j) = γ(j)/γ(0) as described in Section 3.3. In contrast with the partial autocorrelation function of an AR(p) process, that of an MA(q) process does not vanish for large lags. It is however bounded in absolute value by a geometrically decreasing function.

An Equivalent Definition of the Partial Autocorrelation Function

Let {X_t} be a zero-mean stationary process with autocovariance function γ(·) such that γ(h) → 0 as h → ∞, and suppose that φ_{kj}, j = 1, . . . , k; k = 1, 2, . . . , are the coefficients in the representation

   P_{sp{X_1,...,X_k}}X_{k+1} = Σ_{j=1}^{k} φ_{kj}X_{k+1−j}.

Then from the equations

   ⟨X_{k+1} − P_{sp{X_1,...,X_k}}X_{k+1}, X_j⟩ = 0,   j = k, . . . , 1,

we obtain the system

   [ρ(i − j)]_{i,j=1}^{k} (φ_{k1}, . . . , φ_{kk})′ = (ρ(1), . . . , ρ(k))′,   k ≥ 1,          (3.4.2)

i.e. the k × k matrix with (i, j)-element ρ(i − j) applied to the vector of coefficients gives the vector of autocorrelations at lags 1, . . . , k.

Definition 3.4.2. The partial autocorrelation α(k) of {X_t} at lag k is

   α(k) = φ_{kk},   k ≥ 1,

where φ_{kk} is uniquely determined by (3.4.2).
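Definition 3.4.2 translates directly into a computation: for each k, solve the system (3.4.2) and keep the last coefficient. The sketch below (Python; the function name is ours, not part of the text) does this with the theoretical autocorrelations of the MA(1) process of Example 3.4.2 and compares the result with the closed form for α(k).

```python
import numpy as np

def pacf_from_acf(rho):
    """alpha(k) = phi_kk for k = 1,...,len(rho)-1, obtained by solving (3.4.2);
    rho[h] is the autocorrelation at lag h, with rho[0] = 1."""
    alphas = []
    for k in range(1, len(rho)):
        R = np.array([[rho[abs(i - j)] for j in range(k)] for i in range(k)])
        phi = np.linalg.solve(R, rho[1:k + 1])
        alphas.append(phi[-1])                   # the last coefficient is phi_kk
    return np.array(alphas)

theta = -0.8                                     # MA(1) of Example 3.4.2
rho = np.zeros(6)
rho[0], rho[1] = 1.0, theta / (1 + theta ** 2)   # rho(h) = 0 for h > 1
print(pacf_from_acf(rho))
print([-(-theta) ** k * (1 - theta ** 2) / (1 - theta ** (2 * (k + 1)))
       for k in range(1, 6)])                    # closed form from Example 3.4.2
```

Both lines print α(1) ≈ −.488, α(2) ≈ −.312, . . . , in agreement with the values quoted above.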
The equivalence of Definitions 3.4. 1 and 3.4.2 will be established in Chapter 5, Corollary 5.2. 1 . The sample partial autocorrelation function is defined similarly. The sample partial autocorrelation &(k) at lag k of , x.} is defined, provided X; =I= xi for some i and j, by Definition 3.4.3. { x1 , . • • &(k) = (/Jkk > 1 � k < n, where (/J kk is uniquely determined by (3.4.2) with each p(j) replaced by the corresponding sample autocorrelation p(j). 1 03 §3.5. The Autocovariance Generating Function §3.5 The Autocovariance Generating Function If { X1 } is a stationary process with autocovariance function y ( ), then its autocovariance generating function is defined by · G (z) = y( )zk, k=L-oo k 00 (3.5. 1) provided the series converges for all z in some annulus r - 1 < l z l < r with r > 1. Frequently the generating function is easy to calculate, in which case the autocovariance at lag k may be determined by identifying the coefficient of either zk or z - k . Clearly { X1} is white noise if and only if the autocovariance generating function G(z) is constant for all z. If and there exists r > (3.5.2) j= - oo 1 such that 1 1/!) z i < oo , j=Loo 00 (3.5.3) < l z l < r, the generating function G ( ) takes a very simple form. It is easy to see that r -1 · y(k) = Cov(XI +k> XI ) = a 2 and hence that G(z) = a 2 L 1/!i l/li + l k l • j=:. oo 00 - 1/Jil/li + lk l z k k=L- oo j=L- oo 00 00 Defining 1/!(z) = L 1/!iz i, j=-oo ro r -1 < lzl < r, we can write this result more neatly in the form r -1 < lzl < r. (3. 5 .4) EXAMPLE 3.5. 1 (The Autocovariance Generating Function of an ARMA(p, q) Process). By Theorem 3. 1 .3 and (3. 1 .2 1), any ARMA process </J(B)X1 = 8(B)Z1 for which </J(z) # 0 when l z l = I can be written in the form (3.5.2) with 1/J (z) = 8(z)/</J (z), r -1 < lzl < r 3. Stationary ARMA Processes 104 for some r > 1 . Hence from (3.5.4) (:J(z)(:J(z - 1 ) G (z) = (J 2 rp rp - 1 ' (z) (z ) r - 1 < l z l < r. (3.5.5) In particular for the MA(2) process X, = Z, + 81 Z,_ 1 + 82 Z,_ 2 , we have G (z) = (J 2 (1 + 81 z + 82 z 2 ) ( 1 + 81 z- 1 + 82 z - 2 ) = (J 2 [(1 + 8 i + 8�) + (8 1 + 8 1 8 2 )(z + z - 1 ) + 8 2 (z 2 + z - 2 )], from which we immediately find that y(O) = (J 2 ( 1 + ei + e� ), y( ± 1 ) = y( ± 2) = (J2 8 1 ( 1 + 82 ), (J 2 8 z y(k) 0 for l k l > 2. and ExAMPLE 3.5.2. = Let {X,} be the non-invertible MA(1 ) process {Z,} ,..., WN(O, (J2 ). X, = Z, - 2Z, _ 1 , The process defined by Z( := ( 1 - .5B) - 1 ( 1 - 2B)Z, = i ( 1 - .5B) - 1 X, = L (.S) x ,_ j , (1) has autocovariance generating function, j=O ( 1 - 2z)(l - 2z - 1 ) 2 (J ( 1 - .5z)(l - .5z- 1 ) 4(1 - 2z)(1 - 2z - 1 ) 2 (J = ( 1 - 2z)(1 - 2z - 1 ) = 4(J2 . G(z) = It follows that {Zi} representation, ,..., WN(O, 4(J 2 ) and hence that {X,} has the invertible X, = Z( - .5Zi- 1 • A corresponding result for ARMA processes is contained in the following proposition. §3.6.* Homogeneous Linear Difference Equations with Constant Coefficients Proposition 3.5.1 . Let { X r } ¢(B)Xr = 105 be the ARMA(p, q) process satisfying the equations = B(B)Zr , where ¢(z) ¥- 0 and B(z) ¥- 0 for all z E C such that l z l 1. Then there exist polynomials, ;jy(z) and il(z), nonzero for l z l � 1, of degree p and q respectively, and a white noise sequence {Zi} such that {Xr } satisfies the causal invertible equations PROOF. Define - ¢(z) - ;jy(B)X r = = f1 ¢(z) il(B)Zi. (1 r < ] '5, p (1 B(z) = B(z) f1 -- aj z) :- 1 -- a, z) ' ( 1 -- bA :- 1 ' (1 -- b Z) where a, + 1 , . . . , aP and bs + t : : . . 
, b q are t J:e zeroes of ¢(z) and B(z) which lie inside the unit circle. Since ¢(z) ¥- 0 and B(z) ¥- 0 for all I z I � 1 , it suffices to show that the process defined by s < 1 ,;, q 1 ;jy(B) Z*r --Xr il(B) is white noise. Using the same calculation as in Example 3.5.2, we find that the autocovariance generating function for { Zi} is given by Since G(z) is constant, we conclude that {Zi} is white noise as asserted. D §3.6* Homogeneous Linear Difference Equations with Constant Coefficients In this section we consider the solution { hr } of the k th order linear difference equation (3.6. 1) t E T, where tx 1 , . . . , txk are real constants with tx k ¥- 0 and T is a subinterval of the integers which without loss of generality we can assume to be [k, oo ), ( -- oo , oo) or [k, k + r], r > 0. Introducing the backward shift operator B defined by 1 06 3. Stationary ARMA Processes equation (3. 1 .8), we can write (3.6. 1 ) in the more compact form where a(B) = + 1 a(B)h, Definition 3.6.1 . A set of m 0, (3.6.2) t E T, :::;; k solutions, { W >, . . . , hlm>}, of(3.6.2) will be called ·. = linearly independent if from it follOWS that C 1 = a 1 B + · · · + rxk Bk . = =. Cz Cm = 0. We note that if { hi } and {hi } are any two solutions of (3.6.2) then {c 1 h,1 + c 2 hi } is also a solution. Moreover for any specified values of h0 , h 1 , . . . , hk - I , henceforth referred to as initial conditions, all the remaining values h,, t ¢; [0, k - 1], are uniquely determined by one or other of the recur­ sion relations t = k, k + (3.6.3) 1, . . . , and t = - 1 , - 2, . . . . (3.6.4) Thus if we can find k linearly independent solutions { hp >, . . . , hlk l } of (3.6.2) then by linear independence there will be exactly one set of coefficients c 1 , , ck such that the solution • • . (3.6.5) has prescribed initial values h0, h 1 , . . . , hk - I · Since these values uniquely determine the entire sequence { h, } we conclude that (3.6.5) is the unique solution of(3.6.2) satisfying the initial conditions. The remainder of this section is therefore devoted to finding a set of k linearly independent solutions of (3.6.2). = h, (a0 + a 1 t + · · · + ait i )m' where a0, , ai, m are (possibly complex-valued) constants, then there are constants b0 , . . . , bj - ! such that Theorem 3.6.1 . If PROOF. (1 - mB)h, = . . • + (a0 + a 1 t + · · · ak t i )m' - m(a0 + a (t - 1) + · · · 1 i + ak (t - 1 ) )m' - I = Lt m' o a,(t' - (t - 1 )') ] and 'f,! = o a,(t ' - (t - 1 )') is clearly a polynomial of degree j - 1 . 0 §3.6.* Homogeneous Linear Difference Equations with Constant Coefficients 1 07 The functions hli1 = t i C', j = 0, 1 , . . , k - 1 are k linearly independent solutions of the difference equation Corollary 3.6. 1. . (3.6.6) PROOF. Repeated application of the operator ( 1 C 1 B) to hli1 in conjunction with Theorem 3.6. 1 establishes that hlil satisfies (3.6.6). If (c0 + c 1 t + · · · + ck _ 1 t k -1 ) C ' = 0 for t = 0, 1, . . . , k - 1, then the polynomial L:J;;-6 ci t i, which is of degree less than k, has k zeroes. This is only possible if c0 = c 1 = · · · = ck- t = 0. 0 - Solution of the General Equation of Order k For the general equation (3.6.2), the difference operator a (B) can be written as j 1 a (B) = Il ( 1 (i B)'• i�l where ( i , i = 1, . . ,j are the distinct zeroes of a (z) and ri is the multiplicity of ( i . It follows from Corollary 3.6. 1 that t " (i ', n = 0, 1 , . . . , ri 1 ; i = 1 , . . 
, j, are k solutions of the difference equation (3.6.2) since 1 a (B) t"(i' = Il ( 1 (; B)'s ( l - (i1 B)'' t "(i ' = 0. s =F: i It i s shown below i n Theorem 3.6.2 and Corollary 3.6.2 that these solutions are indeed linearly independent and hence that the general solution of (3.6.2) is - . - . - j ri�l n (3.6.7) h, = L L C in t ( i ' · n i � l �o In order for this general solution to be real, the coefficients corresponding to a pair of complex conjugate roots must themselves be complex conjugates. More specifically if ((i , �i ) is a pair of complex conjugate zeroes of a (z) and (i = d exp(i8i ), then the corresponding terms in (3.6.7) are which can be rewritten as r·-1 I 2 [Re(cin ) cos (8; t) + Im (cin ) sin (8; t)] t n d ', n ::::: Q or equivalently as - ri - 1 n L a n t d ' cos(8J + bin), n �o i with appropriately chosen constants ain and bin · - 1 08 3. Stationary ARMA Processes ExAMPLE 3.6. 1 . Suppose h, satisfies the first order linear difference equation (1 - � - 1 B)h, = 0. Then the general solution is given by h, = c� - r = h0 C ' . Observe that if I � I > 1 , then h, decays at an exponential rate as t ---+ oo . EXAMPLE 3.6.2. Consider the second order difference equation ( 1 + o: 1 B + cx 2 B 2 )h, = 0. Since 1 + cx 1 B + cx 2 B 2 = ( 1 - G 1 B) ( 1 - G 1 B), the character of the general solution will depend on � 1 and � 2 . Case 1 � 1 and � 2 are real and distinct. In this case, h, = c 1 � 1' + c 2 �2.' where c 1 and c 2 are determined by the two initial conditions c 1 + c 2 = h0 and c 1 �1 1 + c 2 G 1 = h 1 . These have a unique solution since � 1 of � 2 • Case 2 � 1 = � 2 . Using (3.6.7) withj = 1 and r1 = 2 we have h, = (c0 + c 1 t)�1' · Case 3 � 1 = �2 = de i0, 0 < 8 < 2n. The solution can be written either as c G ' + c�1' or as the sinusoid h, = ad - ' cos(8t + b). Observe that if 1 � 1 1 > 1 and 1 � 2 1 > 1 , then in each of the three cases, h, approaches zero at a geometric rate as t ---+ oo. In the third case, h, is a damped sinusoid. More generally, if the roots of cx(z) lie outside the unit circle, then the general solution is a sum of exponentially decaying functions and ex­ ponentially damped sinusoids. We now return to the problem of establishing linear independence of the solutions t " � i ', n = 0, 1 , . . . , r; - 1; i = 1 , . . . , j, of (3.6.2). Theorem 3.6.2. If q p i I I cli t ml = 0 for t = 0, 1, 2, . . . 1=1 j =O where m 1 , m 2 , j = 0, 1, . . . , p. • . . (3.6.8) , mq are distinct numbers, then cli = 0 for l = 1 , 2, . . . , q; PROOF. Without loss of generality we can assume that l m 1 1 ;;:::: 1 m 2 I ;;:::: · · · ;;:::: l mq l > 0. It will be sufficient to show that (3.6.8) implies that c l i = 0, j = 0, . . . , p (3.6.9) since if this is the case then equations (3.6.8) reduce to t = 0, 1 , 2, . . . , which in turn imply that c 2 i = 0, j = 0, . . . , p. Repetition of this argument shows then that cli = O, j = 0, . . . , p; l = 1, . . . , q. To prove that (3.6.8) implies (3.6.9) we need to consider two separate cases. Case 1 l m 1 1 > l m 2 1. Dividing each side of (3.6.8) by t P m� and letting t ---+ oo , we find that c 1 P = 0. Setting c 1 P = 0 in (3.6.8), dividing each side by t p - 1 m� and Jetting t ---+ oo , we then obtain c 2 P = 0. Repeating the §3.6. * Homogeneous Linear Difference Equations with Constant Coefficients 109 procedure with divisors t P - 2 mL t P - 3 mL . . . , m� (in that order) we find that e l i = O, j = 0, 1, . . . , p as required. Case 2 l m 1 1 = 1 m 2 I = · · · = l ms l > l ms+ 1 1 > 0, where s s q. 
In this case we can write mi = re i8; where - rc < ()i s rc and 8 1 , . . . , ()s are all different. Dividing each side of (3.6.8) by t P r' and letting t -> oo we find that s (3.6. 1 0) L c1P e ;o,r -> 0 as t -> 00 . 1 �1 We shall now show that this is impossible u�less c 1 P = c 2P = g, = Lf=1 c1P e ;o r and let A., n = 0, 1, 2, . . . , be the matrix . e i8 2 n , l, · · · = csp = 0. Set ,. , J e i8t (n + 1 ) e i82(n + 1 ) e i85(n +1 ) (3.6. 1 1) : : e i81 (n...+ s - 1 ) e i82(n + s - 1 ) e iO.(n + s- 1 ) Observe that det A. = e ;<o, + ··· + O.J" (det A0). The matrix A0 is a Vandermonde matrix (Birkhoff and Mac Lane ( 1 965)) and hence has a non-zero determinant. Applying Cramer' s rule to the equation An = we have det M c1 P ' det A. (3.6. 1 2) where M= Since g. -> 0 as n -> oo, the numerator in (3.6. 1 2) approaches zero while the denominator remains bounded away from zero because l det A. l = l det A 0 1 > 0. Hence c 1 P must be zero. The same argument applies to the other coefficients c 2P , . . . , csp showing that they are all necessarily zero as claimed. We now divide (3.6.8) by t P - 1 r' and repeat the preceding argument, letting t -> oo to deduce that s L c1 , p - 1 e ;o, , -> 0 as t -> oo , 1 =1 and hence that c1. p _1 = 0, I = 1 , . . . , s. We then divide by t P - 2 r', . . . , r ' (in that order), repeating the argument at each stage to deduce that clj = O, j = O, 1, . . . , p and I = 1 , 2, . . . , s. 3. Stationary ARMA Processes 1 10 This shows that (3.6.8) implies (3.6.9) in this case, thereby completing the proof of the theorem. 0 r Corollary 3.6.2. The k solutions t " C , n 0, 1 , . . . , ri - 1 ; i = 1, . . . , j, of the difference equation (3.6.2) are linearly independent. PROOF. We must show that each c in is zero if "2J= 1 L �;,:-� c int " C r 0 for t = 0, 1 , . . . , k - I . Setting hr equal to the double sum we have a(B)hr = 0 and h0 h 1 = · · · = h k _ 1 0. But by the recursions (3.6.3) and (3.6.4), this necessarily implies that hr = 0 for all t. Direct application of Theorem 3.6.2 with p = max { r1 , , ri } completes the proof. 0 = = = = • . • Problems 3 . 1 . Determine which of the following processes are causal and/or invertible: (a) X, + .2X,_1 - .48X,_ 2 = Z, (b) X, + 1 .9 X,_ 1 + .88X,_ 2 = Z, + .2Z, _ 1 + .7Z,_ 2 , (c) X, + .6X,_ 2 = Z, + 1 .2Z, _ 1 , (d) X, + 1 .8X,_ 1 + .8 1 X, _ 2 = Z, (e) X, + 1 .6X,_1 = Z, - .4Z, _ 1 + .04Z,_ 2 . 3.2. Show that in order for an AR (2) process with autoregressive polynomial t/J(z) = I - t/! 1 z - t/J2 z 2 to be causal, the parameters (t/!1 , t/!2 ) must lie in the triangular region determined by the intersection of the three regions, tPz + tPt < 1 , tPz - tP t < 1 , l t/Jz l < I . 3.3. Let { X, t = 0, ± 1 , . . . } be the stationary solution o f the non-causal A R ( l ) equations, X, = t/JX, _ 1 + Z,, lt/J I > ! . Show that { X, } also satisfies the causal AR( 1 ) equations, X, = rr X,_ 1 + 2,, { Z, } � WN(0, 0' 2 ), for a suitably chosen white noise process {Z, } . Determine 0'2 . 3.4. Show that there is no stationary solution of the difference equations X, = t/J X, _·1 + Z,, if tP = ± I . 3.5. Let { Y,, t = 0, ± 1 , . . . } be a stationary time series. Show that there exists a stationary solution { X, } of the difference equations, x, - ¢ 1 x 1 - . . · - t/JpX,_p = Y, + 61 Y,_ 1 + . . · + oq Y, - q• t - if t/J(z) = 1 - ¢ 1 z - . . · - t/JpzP =1- 0 for i z l = show that { X, } is a causal function of { Y, } . I. Furthermore, if t/J(z) =1- 0 for i z l :-::;; 1 111 Problems 3.6. 
Suppose that {X, } is the ARMA process defined by 1/i (B)X, = O(B)Z,, { Z, } � WN(O, a 2 ), where 1/J( · ) and 0( " ) have no common zeroes and 1/J(z) =f. 0 for l z l = 1 . If �( · ) is any polynomial such that �(z) =f. 0 for l z l 1, show that the difference equations, = �(B)I/I(B) Y, = �(B)O(B)Z,, have the unique stationary solution, { Y, } = { X, } . 3.7. Suppose {X, } i s a n invertible ARMA(p, q) process satisfying (3. 1 .4) with = Z, 00 L njXr -j · j=O Show that the sequence { nj} is determined by the equations nj + min(q ,j) L Ok nj k =! k where we define <Po = - 1 and ok - = = j - 1/ij , 0 for k > = 0, 1, . . . q and 1/ij = 0 for j > p. 3.8. The process X, = Z, - Z,_ � > { Z, } WN(O, a 2 ), is not invertible according to Definition 3 . 1 .4. Show however that Z, E sp { Xj, -oo < j :::; t} by considering the mean square limit of the sequence L}= o (1 - j/n)X,_j as n -> oo . � 3.9. Suppose {X, } i s the two-sided moving average X, 00 = . L 1/Jj Zr-j• where Lj l 1/Jj l < oo. Show that L ;;'= -oo I y(h)l < oo where y( · ) is the autocovariance function of {X, } . 3.1 0 . Let { Y, } be a stationary zero-mean time series. Define X, = ( 1 - .4B) Y, and w; = (1 - 2.58) Y, = = Y, - .4 ¥,_ 1 Y, - 2.5 Y, _, . (a) Express the autocovariance functions of {X, } and { W, } in terms of the autocovariance function of { Y, } . (b) Show that {X, } and { W, } have the same autocorrelation functions. j (c) Show that the process U, = L:� 1 (.4) Xr+j satisfies the difference equations U, - 2.5U,_ 1 = X,. - 3. 1 1 . Let {X, } be an ARMA process with 1/J(z) =f. 0, l z l = 1 , and autocovariance func­ tion y( · ). Show that there exist constants C > 0 and s E (O, 1 ) such that ly(h)l :::; Cs lhl, h = 0, ± 1 , . . . and hence that L ;;'= - oo l y(h)l < oo . 3. 1 2. For those processes in Problem 3. 1 which are causal, compute and graph their autocorrelation and partial autocorrelation functions using PEST. 3. 1 3. Find the coefficients 1/Jj , } = 0, 1, 2, . . . , in the representation 00 X, = L 1/Jj Zt-j j= O 1 12 3. Stationary ARMA Processes of the ARMA(2, I ) process, ( I - .5B + .04B2 )X, = ( I + .25B)Z,, 3.14. Find the autocovariances y(j), j = 0, 1, 2, . . . , of the AR(3) process, ( 1 - .5B) ( 1 - .4B) ( I - . 1 B) X, = Z,, { Z, } � WN(O, 1 ). Check your answers for j = 0, . . . , 4 with the aid of the program PEST. 3 . 1 5. Find the mean and autocovariance function of the ARMA(2, I) process, X, = 2 + 1 .3X, _ 1 - .4X, _ 2 + Z, + Z, _ 1 , Is the process causal and invertible? 3 . 1 6. Let {X, } be the ARMA(I, 1 ) process, X, - I/JX, _1 = Z, + 8Z,_ 1 , where 1 1/J I < I and 1 8 1 < I . Determine the coefficients {1/!i } i n Theorem 3. 1 . 1 and show that the autocorrelation function of { X, } is given by p(l) = 1 ( I + ¢8) (1/J + 8)/( 1 + 82 + 2¢8), p(h) = 1/J h - p ( 1 ) for h ;::::: I . 3 . 1 7. For a n MA(2) process find the largest possible values of l p(1)1 and l p(2) 1. 3. 1 8. Let {X,} be the moving average process { Z,} � IID(O, 1 ). (a) If Z� := ( I - .5B) - 1 X,, show that where .lf, _ 1 = sp{X., - oo < s < t}. (b) Conclude from (a) that Specify the values of 8 and a 2 . (c) Find the linear filter which relates { Z,} to { zn , i.e. determine the coeffi­ IJ(jz, _ j· cients {IJ(J in the representation z� = Ii� (d) If EZ� = c , compute E((ZWZ!l. If c -=1- 0, are Z! and Z! independent? If Z, N(O, 1 ), are Z! and Z! independent? - ro � 3 . 1 9. Suppose that {X,} and { Y;} are two zero-mean stationary processes with the same autovariance function and that { Y;} is an ARMA(p, q) process. 
Show that {X,} must also be an ARMA(p, q) process. (Hint: If ¢ 1 , . . . , </>P are the AR coefficients for { Y;}, show that { W, := X, - </>,X, _ , - · · · - </>pX r - p } has an autocovariance function which is zero for lags I hi > q. Then apply Proposition 3.2. 1 to { W,}.) 3.20. (a) Calculate the autocovariance function y( · ) of the stationary time series (b) Use program PEST to compute the sample mean and sample autocovari­ ances y(h), O :0::: h :0::: 20, of {VV 1 X, } where {X,, t = 1, . . . , 72 } is the accidental 2 deaths series of Example 1 . 1 .6. 1 13 Problems (c) By equating '9(1 ), y(l l) and '9(12) from part(b) to y ( l ), y(l l ) and y ( l 2) respec­ tively from part(a), find a model of the form defined in (a) to represent { VV 1 X, }. 2 3.2 1 . B y matching the autocovariances and sample autocovariances a t lags 0 and 1, fit a model o f t h e form X, - 11 = ¢(X,_1 - /1) + Z,, to the strikes data of Example 1 . 1 .3. Use the fitted model to compute the best linear predictor of the number of strikes in 1 98 1 . Estimate the mean squared error of your predictor. 3.22. If X, = Z, - (}Z,_1 , { Z, } WN(0, 0"2 ) and 1 (} 1 < 1 , show from the prediction equations that the best linear predictor of Xn+l in sp { X� > . . . , X"} is � n xn+l = I (jlj Xn + ! -j' j� ! where ¢1 , . . . , ifln satisfy the difference equations, - Oiflj -! + ( I + 02)¢j - (}(jlj + ! = 0, 2 s j s n - 1, with boundary conditions, and 3.23. Use Definition 3.4.2 and the results of Problem 3.22 to determine the partial autocorrelation function of a moving average of order I . 3.24. Let { X, } be the stationary solution of ¢(B) X, = (}(B)Z,, where { Z,} WN(O, 0"2), (jl(z) # 0 for all z E C such that l z l = I , and ¢( · ) and 0( · ) have no common zeroes. If A is any zero-mean random variable in L 2 which is uncorrelated with { X, } and if I z0 I = I , show that the process { X, + Az� } i s a complex-valued sta­ tionary process (see Definition 4. 1 . 1 ) and that {X, + Az� } and {X, } both satisfy the equations ( I - z0B)¢(B)X, = ( I - z0 B)(}(B)Z,. � CHAPTER 4 The Spectral Representation of a Stationary Process The spectral representation of a stationary process { Xn t = 0, ± 1, . . . } essen­ tially decomposes { X1 } into a sum of sinusoidal components with uncorrelated random coefficients. In conjunction with this decomposition there is a cor­ responding decomposition into sinusoids of the autocovariance function of { X1 }. The spectral decomposition is thus an analogue for stationary stochastic processes of the more familiar Fourier representation of deterministic functions. The analysis of stationary processes by means of their spectral representations is often referred to as the "frequency domain" analysis of time series. It is equivalent to "time domain" analysis, based on the autocovariance function, but provides an alternative way of viewing the process which for some applications may be more illuminating. For example in the design of a structure subject to a randomly fluctuating load it is important to be aware of the presence in the loading force of a large harmonic with a particular frequency to ensure that the frequency in question is not a resonant frequency of the structure. The spectral point of view is particularly advantageous in the analysis of multivariate stationary processes (Chapter 1 1 ) and in the analysis of very large data sets, for which numerical calculations can be performed rapidly using the fast Fourier transform (Section 10.7). §4. 
1 Complex-Valued Stationary Time Series It will often be convenient for us to make use of complex-valued stationary processes. Although processes encountered in practice are nearly always real-valued, it is mathematically simpler in spectral analysis to treat them as special cases of complex-valued processes. §4. 1 . Complex-Valued Stationary Time Series 1 15 Definition 4.1.1. The process {X1 } is a complex-valued stationary process E I X1 I 2 < oo, EX1 is independent of t and E(Xt+h X1) is independent of t. if As already pointed out in Example 2.2.3, Remark 1 , the complex-valued random variables X on satisfying E I X I 2 < oo constitute a Hilbert space with the inner product (Q,ff,P) <X, Y ) = E(X Y) . (4. 1 . 1) Definition 4.1 .2. The autocovariance function y( · ) of a complex-valued stationary process {XI } is y(h) = E(Xt+h X1) - EXt+h EX1 • � � (4. 1 .2) Notice that Definitions 4. 1 . 1 and 4. 1.2 reduce to the corresponding defini­ tions for real processes if { X1 } is restricted to be real-valued. Properties of Complex-Valued Autocovariance Functions The properties of real-valued autocovariance functions which were established in Section 1.5, can be restated for complex-valued autocovariance functions as follows: y(O) � 0, (4. 1 .3) ly(h)l � y(O) for all integers h, y( · ) is a Hermitian function (i.e. y(h) = y( - h)). (4. 1 .4) (4. 1 .5) We also have an analogue of Theorem 1 .5. 1 , namely Theorem 4.1 . 1 . A function K( · ) defined on the integers is the autocovariance function of a (possibly complex-valued) stationary time series if and only if K( · ) is Hermitian and non-negative definite, i.e. ifand only if K(n) = K( - n) and L a; K(i - j)iii � 0, n i, j = 1 (4. 1 .6) for all positive integers n and all vectors a = (a!, . . . ' anY E e. The proofs of these extensions (which reduce to the analogous results in Section 1 .5. 1 in the real case) are left as exercises (see Problems 4. 1 and 4.30). We shall see (Corollary 4.3.1) that "Hermitian" can be dropped from the statement of Theorem 4. 1 . 1 since the Hermitian property follows from the validity of (4. 1 .6) for all complex a. 1 16 4. The Spectral Representation of a Stationary Process §4.2 The Spectral Distribution of a Linear Combination of Sinusoids In this section we illustrate the essential features of the spectral representation of an arbitrary stationary process by considering the simple complex-valued process, n X, = I A ( A.) e i' AJ j= l (4.2. 1) in which - n < )� 1 < A. 2 < · · · < )," = n and A(A. d, . . . , A(A.n) are uncorrelated complex-valued random coefficients (possibly zero) such that j = 1, . . . , n, E(A (A.)) = 0, and j = 1 , . . . , n. E(A(A.)A(A.)) = aJ , For {X,} to be real-valued it is necessary that A(),") be real and that A.j = An - j and A(A.) = A(A.n - ) for j = 1, . . . , n - 1 . (Note that A()�) and A (A_" _ ) are uncorrelated in spite of the last relation.) In this case (see Problem 4.4), n X, = I (C(A_Jcos tA,j - D(A,j )sin tA.), j= l where A(A.) = C(A.) + iD (A.), j = 1 , . . . , n and D(A_") = 0. (4.2.2) It is easy to see that the process (4.2. 1 ), and in particular the real-valued process (4.2.2), is stationary since and E(xr+ h X-r ) = 2 a e ih A',. j =l j n " L.. the latter being independent of t. Rewriting the last expression as a Riemann­ Stieltjes integral, we see that the process { X, } defined by (4.2. 1) is stationary with autocovariance function, y(h) = I J(-1t,1t] e ihv dF(v), (4.2.3) where F is the distribution function, F(A.) = I a} . 
(4.2.4) j: AJ <:; A Notice that the function F, which is known as the spectral distribution function of { X, } , assigns all of its mass to the frequency interval ( - n, n]. The mass assigned to each frequency in the interval is precisely the variance of the ran­ dom coefficient corresponding to that frequency in the representation (4.2. 1 ). §4.3. Herglotz's Theorem 1 17 The equations (4.2. 1) and (4.2.3) are fundamental to the spectral analysis of time series. Equation (4.2. 1) is the spectral representation of the process {X, } itself and equation (4.2.3) is the corresponding spectral representation of the covariance function. The spectral distribution function appearing in the latter is related to the random coefficients in (4.2. 1 ) through the equation F(A) L .<; ,; .< E I A (AJI 2 . The remarkable feature of this example is that every zero-mean stationary process has a representation which is a natural generalization of (4.2. 1 ), namely = X, = I J ( - 1t , 1t] e itv dZ (v). (4.2.5) The integral is a stochastic integral with respect to an orthogonal-increment process, a precise definition of which will be given in Section 4.7. Corre­ spondingly the autocovariance function Yx ( · ) can be expressed as Yx (h) =I e ihv dF( v), I cos vh dF(v). J ( - 1t . 1t] = = = (4.2.6) where F is a distribution function with F( - n) 0 and F(n) y(O) E I X, I 2 • The representation (4.2.6) is easier to establish than (4.2.5) since it does not require the notion of stochastic integration. We shall therefore establish (4.2.6) (Herglotz' s theorem) in Section 4.3, deferring the spectral representation of { X, } itself until after we have introduced the definition of the stochastic integral in Section 4.7. In the special case when { X, } is real there are alternative forms in which we can write (4.2.5) and (4.2.6). In particular if Yx(h) is to be real it is necessary that F( - ) be symmetric in the sense that F(A) = F(n - ) - F( - A - ), - n < A < n, where F(A -) is the left limit of F at A (see Problem 4.25). Equation (4.2.6) can then be expressed as Yx ( h) = J ( - 1t , 1t] Equivalent forms of (4.2.5) when {X,} is real are given in Problem 4.25. §4.3 Herglotz's Theorem Theorem 4. 1 . 1 characterizes the complex-valued autocovariance functions on the integers as those functions which are Hermitian and non-negative definite. Herglotz's theorem, which we are about to prove, characterizes them as the functions which can be written in the form (4.2.6) for some bounded distribution function F with mass concentrated on ( - n, n]. (Herglotz). A complex-valued function y( · ) defined on the integers is non-negative definite if and only if Theorem 4.3.1 1 18 4. The Spectral Representation of a Stationary Process y(h) = I J ( - 1t , 1t] e ih v dF(v) for all h = 0, ± 1 , . . . , (4.3.1) where F( " ) is a right-continuous, non-decreasing, bounded function on [ - n, n] and F( - n) = 0. (The function F is called the spectral distribution function of y and if F(2) J �, !(v) dv, - n ::;; 2 ::;; n, then f is called a spectral density of = y( . ).) PROOF. If y( ·) has the representation (4.3. 1 ) then it is clear that y( · ) is Hermitian, i.e. y( - h) = y(h). Moreover if a, E C, r = 1, . . . , n, then r. � 1 a,y(r - s)iis = f " '· � 1 a, iis exp[iv(r - s)] dF(v) = fJJ1 a, exp[ivr] 1 2dF(v) � 0, so that y( · ) is also non-negative definite and therefore, by Theorem 4. 1 . 1, an autocovariance function. Conversely suppose y( · ) is a non-negative definite function on the integers. Then, defining fN(v) . 1 � ". 
;_, e - vy (r - s)e !Sv 2 N r, s =l =n -- = . 1 " (N - lm l)e - •mvy(m), ;_, 2 n N lm i < N - we see from the non-negative definiteness of y( · ) that fN(v) � 0 for all v E ( - n, n]. = Let FN( · ) be the distribution function corresponding to the density fN( " ) 1 < -" · "k ). Thus FN(2) = 0, 2 ::;; - n, FN(2) FN(n), 2 � n, and FN(2) Then for any integer h, I J ( - 1t , 1t] J ( - 1t , 1t] - 1! ::;; 2 ::;; 1!. = _.!._ L ( - �) y(m) f" ei(h-m)v dv, l h l ) y(h) lhl h i e v dFN(v) = e ih v dFN(v) i.e. I = J:/N(v) dv, 2n l mi< N {( 1 1 -N 0' N _" ' < N, otherwise. (4.3.2) 1 19 §4.3. Herglotz's Theorem Since F N(n) = f( - , , ,1 dFN(v) = y(O) < oo for all N, we can apply Helly's theorem (see e.g. Ash (1 972), p. 329) to deduce that there is a distribution function F and a subsequence { FNJ of the sequence { FN} such that for any bounded continuous function g with g(n) = g( - n), J_ "·"l g(v) dFN"(v) J_"·"l g(v) dF(v) � as k � oo . (4.3.3) Replacing N by Nk in (4.3.2) and letting k � oo , we obtain y(h) = I J ( - 1t , 1t] e ih v dF(v) (4.3.4) which is the required spectral representation of y( · ). 0 Corollary 4.3. 1 . A complex-valued function y( · ) defined on the integers is the autocovariance function of a stationary process {X,, t 0, ± 1, . . . } if and only if either (i) y(h) f< -, , ,1 e ih v dF(v) for all h 0, ± 1 , . . . , where F is a right-continuous, non-decreasing, bounded function on [ - n, n] with F( - n) 0, or (ii) I ?.i=t ai y(i - j ) ai z 0 for all positive integers n and for all a = (a 1 , , an )' E Cn. The spectral distribution function F( · ) (and the corresponding spectral density if there is one) will be referred to as the spectral distribution function (and the spectral density) of both y( · ) and of {X,}. = = = = . • • PROOF. Herglotz's theorem asserts the equivalence of (i) and (ii). From (i) it follows at once that y( · ) is Hermitian. Consequently the conditions of 0 Theorem 4. 1 . 1 are satisfied if and only if y( · ) satisfies either (i) or (ii). It is important to note that the distribution function F( · ) (with F( - n) = 0) is uniquely determined by y(n), n = 0, ± 1, . . . . For if F and G are two distribution functions vanishing on ( - oo, - n], constant on [n, oo) and such that y(h) = I J ( - 1t, 1t] e ih v dF(v) = I J ( - 1t , 1t] e ihv dG(v), h = 0, ± 1, . . . , then it follows from Theorem 2. 1 1 . 1 that I J ( - 1t , 1t] rjJ(v) dF(v) = I = J ( - 1t , 1t] r/J (v) dG(v) if rjJ is continuous with r/J(n) = r/J( - n), and hence that F(),) G(A.) for all ). E ( - oo, oo ). The following theorem is useful for finding F from y in many important cases (and in particular when y is the autocovariance function of an ARMA(p, q) process). 4. The Spectral Representation of a Stationary Process 1 20 Theorem 4.3.2. If K ( · ) is any complex-valued function on the 00 [ K (n)[ n=I- oo then K(h) = where integers such that (4.3.5) < oo , f�" eihvf(v) dv, h = 0, 1, 12n L e-inAK(n). f(A) = ± ... 00 n= - oo (4.3.6) (4.3.7) PROOF. 1 f "" = - L K(n) e i(h - n) v dv 2n n= - oo = K(h), since the only non-zero summand is the one for which n = h. The inter­ change of summation and integration is justified by Fubini' s theorem since s�" ( l /2n) I::'= - oo [ e i(h - n) v K(n)[ dv < 00 by (4.3.5). D 00 An absolutely summable complex-valued function y( · ) defined on the integers is the autocovariance function of a stationary process if and only if Corollary 4.3.2. 1 00 f(A) := - I e - w A y(n) 2: 0 for all A E [ - n, n], 2n n= - oo in which case f( · ) is the spectral density of y( · ) . . 
(4.3.8) PROOF. First suppose that y( · ) is an autocovariance function. Since y ( · ) is non-negative definite and absolutely summable, 0 � fN(A) = 1 � L.... 2nN r, s = l � = __!__ I 2n l m i < N . e - ". Ay( r - s)e!SA ' ' (1 - �N )e - imAy(m) � f(A) as N � oo . Consequently f(A) 2: 0, -n � A � n. Also from Theorem 4.3.2 we have y(h) = f�" e ihvf(v) dv, h = 0, ± . . . . Hence f( - ) is the spectral density of y( - ). On the other hand if we assume only that y ( · ) is absolutely summable, 1, §4.3. Herglotz's Theorem 121 = Theorem 4.3.2 allows us to write y(h) J�" e ih vf(v) dv. If j().) ::2:: 0 then this integral is of the form (4.3. 1 ) with F(A) = J�" f(v) dv. This implies, by Corollary 4.3. 1, that y( · ) is an autocovariance function with spectral density f. 0 EXAMPLE ={ 4.3. 1 . Let us prove that the real-valued function K(h) = 1 if h = 0, p if h ± 1 , 0 otherwise, is an autocovariance function if and only if I P I � �. Since K ( · ) is absolutely summable we can apply Corollary 4.3.2. Thus j(A) = - L e- in J.K(n) 2n n= - oo 00 1 = [ 1 1 + 2p cod 2n J is non-negative for all ). E [ - n, n] if and only if I P I � �. Consequently K ( - ) is an autocovariance function if and only if I P I � � in which case K( ) has the spectral density f computed above. In fact K( · ) is the autocovariance function of an MA( 1 ) process (see Example 1 .5.1). · Notice that Corollary 4.3.2 provides us with a very powerful tool for checking non-negative definiteness, which can be applied to any absolutely summable function on the integers. It is much simpler and much more infor­ mati•fe than direct verification using the definition of non-negative definiteness stated in Theorem 4. 1 . 1 . Corollary 4.3.2 shows i n particular that every ARMA(p, q) process has a spectral density (see Problem 3.1 1). This density is found explicitly in Section 4.4. On the other hand the linear combination of sinusoids (4.2. 1 ) studied i n Section 4.2 has the purely discrete spectral distribution function (4.2.4) and therefore there is no corresponding spectral density. If { X,} is a real-valued stationary process then its autocovariance function x( Y · ) is real, implying (as pointed out in Section 4.2) that its spectral distribution function is symmetric in the sense that - n < A < n. We can then write Yx(h) = 1-rr,rrJ cos(vh) dFx(v). (4.3.9) In particular if Yx( · ) has spectral density fx().), - n � ). � n, then fx(A) fx( - ).), - n � A � n, and hence Yx(h) = J: 2 fx(v) cos(vh) dv. = (4.3.10) 122 4. The Spectral Representation of a Stationary Process {X,} The covariance structure of a real-valued stationary process is thus determined by F x(O - ) and F x(A), 0 ::S:: A ::S:: n (or by fx(A), 0 ::S:: A ::S:: n, if the spectral density fx( · ) exists). From the above discussion, it follows that a function f defined on [ - n, n] is the spectral density of a real-valued stationary process if and only if Remark. = (i) f()�) f( - )�), (ii) f(A) 2 0, and (iii) s�n�u�) dA. < oo . §4.4 Spectral Densities and ARMA Processes Theorem 4.4.1 . If { t;} is any zero-mean, possibly complex-valued stationary process with spectral distribution function Fr( · ), and {X,} is the process then co X, = j=-oo L 1/!i Yr-i co j= - co where L 1 1/!i l < (4.4. 1) oo , {X,} is stationary with spectral distribution function - n ::S:: PROOF. The argument of Proposition 3.1.2 shows that mean zero and autocovariance function, A ::S:: n. (4.4.2) {X,} is stationary with co E(Xt +h X,) = j. kI=-ao 1/Jiflk Yr (h - j + k), h = 0, ± 1 , . . . . 
Using the spectral representation of { Yr ( · )} we can write ei(h-j+k)v dFy (v) Yx(h) = . t 1/Jif/ { J, k - - ao k J(- 7<.7<] = J_"· "l c=�co 1/Jie- iiv) (=�co lf/k eikv) eihv dFy(v) { eihv I . I I/Jj e - ijv l 2 dFy (v), = J(-n.n] J = - co which immediately identifies Fx( · ) defined by (4.4.2) as the spectral distribu­ tion function of D { t;} {X,}. If has a spectral density fr( · ) and if also has a spectral density fx( · ) given by {X,} is defined by (4.4. 1 ), then {X,} §4.4. Spectral Densities and ARMA Processes 1 23 (4.4.3) where if;(e - iA ) = L)= - cc if;i e - iiA. The operator if;(B) = I � - oo if;i Bi applied to { r; } in (4.4. 1) is often called a time-invariant linear filter with weights { if;i }. The function if;(e - i ·) is called the transfer function of the filter and the squared modulus I if;(e -i·W is referred to as the power transfer function of the filter. Time-invariant linear filters will be discussed in more detail in Section 4. 10. As an application of Theorem 4.4. 1 we can now derive the spectral density of an arbitrary ARMA(p, q) process. (Spectral Density of an ARMA(p, q) Process). Let {X, } be an ARMA(p, q) process (not necessarily causal or invertible) satisfying (4.4.4) { Z, } WN(0, 1J"2), ¢J(B)X, = 8(B) Z,, Where r/J(z) = 1 - 1J 1 - . . . - ¢Jpz P and 8(z) = 1 + e l + . . . + eq z q have no common zeroes and r/J(z) has no zeroes on the unit circle. Then {X, } has spectral density Theorem 4.4.2 � Z Z (4.4.5) - n ::::; A ::::; n. (Because the spectral density of an ARMA process is a ratio of trigonometric polynomials it is often called a rational spectral density.) PROOF. First recall from Section 3.1 that the stationary solution of (4.4.4) can be written as X, = I � -oo if;i Z,_i where I � - oo 1 1/Ji l < oo . Since { Z,} has spectral density IJ" 2j(2n) (Problem 4.6), Theorem 4.4. 1 implies that {X, } has a spectral density. This also follows from Corollary 4.3.2 and Problem 3. 1 1 . Setting U, = ¢J(B)X, = 8(B) Z, and applying Theorem 4.4. 1 , we obtain Since r/J(e - iA ) (4.4.5). (4.4.6) =1 ExAMPLE 4.4. 1 0 for all A E [ - n, n] we can divide (4.4.6) by I rjJ(e - iA W to obtain D (Spectral Density of an MA(l) Process). If x, then = z, + ez,_ l , }. - n ::::; ::::; n. The graph of fx(X), 0 ::::; A ::::; n, is displayed in Figure 4. 1 for each of the values e = 9 and e = 9 Observe that for e = .9 the density is large for low frequencies and small for high frequencies. This is not unexpected since when e = .9 the process has a large lag one correlation which makes the series smooth with only a small contribution from high frequency components. For - . . . 4. The Spectral Representation of a Stationary Process 1 24 24 22 20 18 1 6 14 12 10 8 6 4 2 0 0 0. 1 0.2 0.3 0.4 0.5 0.3 0.4 0.5 (a) 0 0. 1 0.2 (b) Figure 4. 1 . The spectral densities fx(2rr.c), 0 <::; c -::; t of X, = Z, + OZt - 1 , { Z, } WN(O, 6.25), (a) when f) = - .9 and (b) when fJ = .9. � §4.4. Spectral Densities and ARMA Processes 1 25 8 - .9 the lag one correlation is large and negative, the series fluctuates rapidly about its mean value and, as expected, the spectral density is large for high frequencies and small for low frequencies. (See also Figures 1 . 1 8 and 1 . 1 9.) = 0 EXAMPLE 4.4.2 (The Spectral Density of an AR( 1 ) Process). If then by Theorem 4.4.2, {X, } has spectral density fx(X) az = -11 2n - az . ,Pe - ''r 2 = - ( 1 - 2,P cod + ¢ 2 ) - 1 . 2n = This function is shown in Figure 4.2 for each of the values ,P = .7 and ,P - .7. 
Interpretations of the graphs analogous to those in Example 4.4. 1 can again be made. Causality, Invertibility and the Spectral Density = Consider the AR MA(p, q) process {X, } satisfying ,P(B)X, = 8(B)Z,, where ,P(z)8(z) =/= 0 for all z E C such that l z l 1 . Factorizing the polynomials ,P( · ) and 8( · ) we can rewrite the defining equations in the form, p q fl ( 1 - ai- 1 B)X, fl ( 1 - bi- 1 B)Z,, j= 1 j= 1 where = and l bj l > 1 , 1 s j s s, l hj l < l , s < j s q. By Theorem 4.4.2, {X, } has spectral density a z fl ]= 1 1 - bi- 1 e - i l l z fX ( )c) = - " 1 - :-1 - l 2 • 2 n fl 1 = 1 1 1 a1 e i l Now define " = "5:flj Sr ( I O(B) = s j ss ;}(B) and 1 - at B) fl ( I - ZiiB) r < j -::; p bj- 1 B) fl ( 1 - bj B). 1 s <jS q Then the ARMA(p, q) process { X, } defined by fl (I - ;}(B)X, = O(B)Z, (4.4.8) 1 26 4. The Spectral Representation of a Stationary Process 0 0. 1 0.2 0.3 0.4 0.5 0.3 0.4 0.5 (a) 0 0. 1 0.2 (b) Figure 4.2. The spectral densities fx (2nc), 0 s c s 1, of X, - ¢>X,_ 1 = Z,, { Z, } WN(O, 6.25), (a) when ¢> = .7 and (b) when ¢> = - .7. � §4.4. Spectral Densities and ARMA Processes 1 27 has spectral density Since I I - bi e - ; ;_1 = I I - bi e i !-1 = l bil l l - bi- l e -ii- I , we can rewrite fx (A.) as fJ 2 Ti s<j ,; q l bil 1 8(e �: Ti s<j ,; q l bjl fx(A.). fx (A.) = 2n = fl r <j ,; p I ail l ifo (e ) I fl r <j $ p I ail Thus the ARMA(p, q ) process { x,+ } defined by : : ?: ( ( n l a l ) ( TI l bj l )-2 ) , ;j(B)X,+ = 8(B) Z" {Z,} - wN 0, fJ 2 2 r <J -:;, p j s<] -5:,_ q is causal and invertible and has exactly the same spectral density (and hence autocovariance function) as the ARMA process (4.4.7). In fact { X, } itself has the causal invertible representation �(B)X, = 8(B)Z,* where { Zr*} is white noise with the same variance as { Z,}. This is easily checked by using the latter equation as the definition of { Zi}. ( See Proposition 3.5. 1 .) EXAMPLE 4.4.3. The ARMA process { Z, } - WN(O, fJ 2 ), is neither causal nor invertible. Introducing �(z) = 1 - 0.5z and e (z) = 1 + 0. 25 z, we see that { X,} has the causal invertible representation { z: } - WN (0, .25 fJ 2 ). x, - o.sx, _ , = z: + o.2sz:_ , , The case when the moving average polynomial 8(z) has zeroes on the unit circle is dealt with in the following propositions. Let { X,} be an ARMA(p, q) process satisfying cp(B)X, = 8(B)Z, , {Z,} - WN(O, fJ 2 ), where cp(z) and 8(z) have no common zeroes, cp(z) =f. 0 for I z I = 1 and 8(z) =f. 0 for l z l < 1 . Then Z, E sp { X5, - oo < s :::;; t } . Proposition 4.4.1. PROOF. Factorize 8(z) as et(z)e*(z), where et (z) = TI o - bj z), 8*(z) = f1 (1 - 1 �j�s s < j s,q _, bi- 1 z), 1 28 4. The Spectral Representation of a Stationary Process l bjl > 1 , 1 � j � process, s and l bj l and note, since </>(B)X, = = 1, s < j � q. Then consider the MA(q - s) Yr = B*(B)Z, f)f(B) Yr, that sp{ Yk , - oo < k � t} s:::: sp { X k , - oo < k � t} for all t. Consequently it suffices to show that Z, E sp{ Yk , Proposition 3.2. 1, { Ur} where - oo < k � t}. By "' WN(O, (Jb), U, = Yr - psp{Y., - oo < k < t) Yr · Using the two moving average representations for { Yr}, we can write the spectral density fy of { Yr } as Jy(A) = (J2 __!!. l o:(e -i..\W 2n = - I B*(e -i..�w. (J 2 2n Since B*(z) has all of its zeroes on the unit circle, o:(z) and 8*(z) must have the same zeroes. This in turn implies that B(z) = o:(z) and (J 2 = (Jb . It now follows that the two vectors, ( U, Yr , . . . , Yr - nY and (Z0 Yr , . . . 
, Yr - n)', have the same covariance matrix and hence that Taking mean square limits as n --> oo and using the fact that U, E sp{ Yk, - oo < k � t}, we find that Hence, since (J2 = (Jb, E(Z, - PS!i( Y., - oo < k s tl Zr) 2 This implies that Z, = Psp{Y. , oo < L ,1 Z,, _ = EZ� - E U� = 0. or equivalently that Z, E sp { Yk, - oo < k � t} as was to be shown. D Remark 1. If we extend the definition (Section 3. 1 ) of invertibility of the ARMA equations, <f>(B)X, = fJ(B)Z,, by requiring only that Z, E sp{Xb - oo < k � t}, 1 29 §4.4. Spectral Densities and ARMA Processes Proposition 4.4. 1 then states that invertibility is implied by the condition 8(z) =f. 0 for l z l < 1 . The converse is established below as Proposition 4.4.3. Remark 2. If ¢(z) =f. 0 for all I z I s 1, then by projecting each side of the equation, ¢(B)X, = 8(B)Zn in Proposition 4.4. 1 onto sp{X., - oo < s s t - 1 }, we see at once that Z, is the innovation, z, = x, - P, _ 1 X r o where P, _ 1 denotes projection onto sp{ X., - oo < s s t - 1 }. Proposition 4.4.2. Let {X,} be an ARMA(p, q) process satisfying = where ¢(z) and 8(z) have no common zeroes and ¢(z) =f. 0 for all z E IC such that l z l 1 . Then, if {bi, 1 s j s q} are the zeroes of 8(z), with l bil ;:::: 1 , 1 s j s m , and l bi l < 1 , m < j s q, there exists a white noise sequence { U,} such that {X,} is the unique stationary solution of the equations, where (iJ(z) is the polynomial defined in (4.4.8) and e(z) is the polynomial, e(z) n ( 1 - bj- 1 z) n ( 1 - bj z). = 1 ::;;, j ::;;, m The variance of U, is given by m < j ::;;, q PROOF. By Problem 4.29, we know that there exists a white noise sequence { 2,} such that The required white noise sequence { U,} is therefore the unique stationary solution of the equations, m < j -:;; q m < j::;;, q D If {X,} is defined as in Proposition 4.4.2 and the polynom­ ial 8(z) has one or more zeros in the interior of the unit circle, then Z, ¢ sp{Xs, - oo < s S t}. Proposition 4.4.3. 4. The Spectral Representation of a Stationary Process 1 30 PROOF. By Proposition 4.4.2, we can express X, in the form, 00 x, = I �j u, _ j , where V, = X, - P, _ 1X,, L� o �izi =O(z)f¢ (z) for l z l sp{ Xk, - oo < k s; t} =sp{ Ub - oo s; <k 1 , and s; t}. Suppose that the zeroes of O(z) in the interior of the unit circle are {b/ m < j s; q} where m < q, and let O(z) =O*(z) f1 ( 1 m < j :So q - bi- 1 z) = O*(z)O;(z). From the equations ¢(B)X, =8(B)V, and </J(B)X, = O *(B)O;(B)Z, , it follows that ¢(B)e*(B)Z, = L �j v, _ j , 00 j = - 00 j where �j is the coefficient of z in the Laurent expansion of </J(z)O(z)/O;(z), l z - 1 1 < £, which is valid for some £ > 0. Since </J(z)O(z) and O;(z) have no zeroes in common and since O ;(z) has all of its zeroes in the interior of the unit circle, it follows that �j # 0 for some j = -j0 < 0. From Z, E sp{ X k, - oo < k s; t}, it would follow that ¢(B) e*(B)Z, E sp{ Uk , - oo < k s; t}. But this is impossible since ( V, +jo' ¢(B)O*(B)Z, ) =� -jo Var( Ur + jo) # 0. We conclude therefore that Z, f/; sp{Xb - oo <k s; t}, as required. D Rational Approximations for Spectral Densities For any real-valued stationary process {X, } with continuous spectral density f, it is possible to find both a causal AR(p) process and an invertible MA(q) process whose spectral densities are arbitrarily close to f. This suggests that {X,} can be approximated in some sense by either an AR(p) or an MA(q) process. These results depend on Theorem 4.4.3 below. 
Recall that f is the spectral density of a real-valued stationary process if and only iff is symmetric, non-negative and integrable on [ - n, n]. If f is a symmetric continuous spectral density on [ - n, n], then for every £ > 0 there exists a non-negative integer p and a polynomial a(z) =f1J=1 (1 - 11t z) = 1 + a 1 z + · · · + aP z P with l '7j l > l , j 1 , . . . , p, and Theorem 4.4.3. = §4.4. Spectral Densities and ARMA Processes 131 real-valued coefficients a0, . . . , aP , such that I A ) a(e - i." W - f(.A) i < £ for all .A E [ - n, n] where A = (1 + ai + · · · + a�r 1 (2n) - 1 J�,,J(v) dv. (4.4.9) PROOF. If f(.A) = 0 the result is clearly true with p = 0. Assume therefore that M = SUP-n -s -< -s n f(),) > 0. Now for any £ > 0 let and define fJ(),) = max { f(.A), ()}. Clearly fJ(),) i s also a symmetric continuous spectral density with fJ(.A) � () and Now by Theorem 2. 1 1 . 1 there exists an integer r such that l r- 1 j=riO ikLi <Si bk e-;u - JJ(A) I < () (4.4. 10) for all A E [ - n, n], (4.4. 1 1) where bk = (2n) - 1 J�"fJ(v)e ivk dv. Interchanging the order of summation and using the fact that fJ is a symmetric function, we have r - 1 L L bk e -;u = L (1 - l k l/r)bk e - ;u_ j = 0 lki <Sj lkl <r This function is strictly positive for all A by (4.4. 1 0) and the definition of fJ(.A). r-1 Let C(z) = I ( 1 - lkl/r)bk zk , lkl< r and observe that if C(m) = 0 then by symmetry C(m -1 ) = 0. Hence, letting p = max { k : bk # 0}, we can write p z PC(z) = K 1 f1 ( 1 - '1i- 1 z)( l - IJi z) j= 1 for some K 1 , 1J 1 , . . . , Y/ p such that 1 '1il > 1 , j = 1 , . . . , p . This equation can be rewritten in the form (4.4. 1 2) where a(z) is the polynomial 1 + a 1 z + · · · + aP z P = f1f=1 (1 - '1i- 1 z), and K 2 = ( - l)P '1 1 '1P K 1 . Equating the coefficients of z 0 on each side of (4.4. 1 2) . • • we find that K 2 = b0(1 + a i + · · · + a�) - 1 = (2n) - 1 ( 1 + ai + ·:· + a�)- 1 fJJ(v) dv. 132 4. The Spectral Representation of a Stationary Process Moreover from (4.4. 1 1 ) we have ] K 2 ] a(e - uW - fb()-) ] < b for all )_. From (4. 4. 1 3) and (4.4. 1 0) we obtain the uniform bound ( I + af + ··· + a� ) - 1 ] a (e iA W � (fb(A) + b)2n: - (4.4. 1 3) (f/b(v) dvr 1 Now with A defined as in the statement of the theorem I K z l a(e - i AW - A ] a(e - i A W I (f" (fb(v) - f(v)) dv) 4n:M (f" f(v) dvr 1 1 4n:Mb (f/(v) dvy . � (2n:) - � � (4.4. 1 4) From the inequalities (4.4. 1 0), (4.4. 1 3) and (4.4. 1 4) we obtain ]A ] a(e - iA W - f(A)] < b + b + 4nMb < e, (f/(v) dvr1 by the definition of 3. D If f is a symmetric continuous spectral density and e > 0, then there exists an invertible MA(q) process Corollary 4.4.1 . such that where CJ2 = ( I + l fx (A.) - f(A)] < e for all A E [ - n , n:] , af + · · · + a; ) - 1 J"-rrf(v) dv. PROOF. Problem 4. 1 4. D If f is a symmetric continuous spectral density and e > 0 then there exists a causal AR(p) process Corollary 4.4.2. such that l fx (A) - f(A) ] < e for all A E [ - n, n ] . 1 33 §4.5. * Circulants and Their Eigenvalues PROOF. Let f'(A.) = max { f(A), e/2}. Then j'(A) � c/2 and 0 :-:;; j'(),) - f(A) :-:;; e/2 for all A E [ - n, n]. (4.4. 1 5) Let M = max,d'()o) and b = min { (2M)� 2 e, (2M)� 1 }. Applying Theorem 4.4.3 to the function 1 /f"(),), we have (4.4. 1 6) I K l a(e�ilW - 1 /f"(A) I < b for all A E [ - n, n], where the polynomial a(z) = 1 + a 1 z + · · · + a P z P is non-zero for l z l :-:;; 1 and K is a positive constant. Moreover by our definition of b, the inequality (4.4. 
1 6) yields the bound K � 1 l (a(e � il) l � 2 :-:;; f " (A)/( 1 - bf" (A)) :-:;; M/( 1 - Mb) :-:;; 2M. Thus 1 1 I K � I a(e�ilW 2 - j'(A) I = I K l a(e � i lW - 1 /f " (A) I [K � I a(e � ilW2f'(A)] (4.4. 1 7) < 2M 2 b :-:;; e/2. Combining the inequalities (4.4.1 5) and (4.4. 1 7) we get 1 (4.4. 1 8) I K� l a(e � ilW2 - f(A) I < e for all A E [ - n, n]. Now by Theorem 4.4. 1 the causal AR(p) process has spectral density K� 1 l a(e�ilW, which by (4.4. 1 8) furnishes the required approximation to f()o). D §4.5 * Circulants and Their Eigenvalues It is often desirable to be able to diagonalize a covariance matrix in a simple manner. By first diagonalizing a circulant matrix it is possible to obtain a relatively easy and useful asymptotic diagonalization of the covariance matrix of the first n observations from a stationary time series. We say that the n x n matrix M = [m jJ?.j =J is a circulant matrix if there exists a function i m( · ) with period n such that m ij = m(j - i). That is m(O) m( l ) m (n - 1 ) m(n - 1 ) m(n - 2) m(O) M = m( n - 2) m(n - 1 ) m(n - 3) (4.5. 1 ) m(1) m(2) m(O) The eigenvalues and eigenvectors of M are easy to compute. Let 2nj Wj = - , n 4. The Spectral Representation of a Stationary Process 1 34 and for j = 0, 1 , . . . , n - 1 . The circulant matrix M has eigenvalues n- 1 Aj L m(h)rj- h , j 0, 1 , . . . , n - 1 , h =O with corresponding orthonormal left eigenvectors, j 0, 1 , . . . , n - 1 . Proposition 4.5.1 . = = = PROOF. Straightforward calculations give viM = n - 1 12 [m(O) + m(n - 1)ri + . . . + m(l) r;- 1 , m(1) + m(O)ri + . . . + m(2)r; - 1 , . . . , m(n - 1) + m(n - 2)ri + · · · + m(O)r;- 1 ] = Jcin-112 [1, ri, rf, . . . , rj -1 ] = Aivi, = showing that vi is a left eigenvector of M with corresponding eigenvalue Jci, j 0, 1, . . . , n - 1. Moreover, if vt is the conjugate transpose of vk , then vj vt = n - 1 (1 + rj rk- 1 + . . . + r; - 1 rk-n + 1 ) n - 1 [1 - (rjrk )"] [1 - rjrk r 1 0 ifj =f. k, 1 ifj k. ={ l = = 0 In order to diagonalize the matrix we now introduce the matrix v0 V = v. 1 �n - 1 observing from Proposition 4.5. 1 that VM VM V - 1 = J= ' A, A V and hence that (4.5.2) Diagonalization of a Real Symmetric Circulant Matrix = If the circulant matrix M defined by (4.5. 1) is also real and symmetric (i.e. if m(n - j) E IR,j = 0, 1, . . . , n - 1 ), then we can rewrite the eigenvalues ).i of Proposition 4.5.1 in the form m(j) §4.5.* Circulants and Their Eigenvalues 135 if n is odd, if n is even, (4.5.3) where [n/2 ] is the integer part of n/2. We first consider the case when n is odd. Since m( · ) is an even function, we can express the n eigenvalues of M as L m(h) A0 = Ai = l hl s [n/2 ] L m(h)exp( - iwi h), l hl s !n/2 1 and j = 1, 2, . . . , [n/2], j = 1 , 2, . . . , [n/2]. Corresponding to the repeated eigenvalue Ai = An -i (1 ::::;, j ::5, [n/2]) of M there are two orthonormal left eigenvectors vi and vn -i = vi as specified in Proposition 4.5. 1. From these we can easily find a pair of real orthonormal eigenvectors corresponding to Ai , viz. cj = (vj + vn -JIJ2 = J2;;;[ 1 , cos wj , cos 2wj , . . . , cos(n - 1)wJ and si Setting = i(vn -j - vi )/J2 = jVn [O, sin wi sin 2wi . . . , sin(n - l ) wi]. c0 = , , jVn [ 1 , 1 , 1 , . . . , 1 ] and defining the real orthogonal matrix P by (4.5.4) we have PM = A(s)p and hence (4.5.5) where A <s) = diag {A0, A 1 , A 1 , . . . , A[n!2 J , A[n!2 J } . For the case when n is even, both Ao and An;2 have multiplicity 1. 
If we replace c [n/2 ] by r 112 cn!2 in the definition of p and drop the last rows of the matrices P and A < s), then we again have PMP' = A<s). Proposition 4.5.2. Let y( · ) be an absolutely summable real autocovariance 4. The Spectral Representation of a Stationary Process 136 function, let f( · ) be its spectral density f(w) = (2n) -t L y( h)e-ihw, a) h=-oo and let Dn be the n n matrix, if n is odd, diag {f(O), f(w d, f(w d , . . . , f(w[n;2 1 ) ,f(w[n;2 1 ) } Dn = diag { f(O), f(w1 ), f(w 1 ), , f(w(n - 2112 ), f(w(n - 2 );2 ), f(wn;2 ) } if n is even. If P is the matrix defined by (4.5.4) and rn = [y(i - j )] ?, i =t , then the components x�J 1 of the matrix x { • . . Prn P' - 2nD" , converge to zero uniformly as n --+ oo ( i.e. sup 1 s: i ,js: n l x�'? l -+ 0). PROOF. Let Pi = [Pi t ' Pi 2 ' . . . ' Pin] denote the ith row of the matrix p and let r�s) denote the real symmetric circulant matrix r�s) = y(O) y(l) y(2) y(l ) y(O) y(l ) y(2) y( l ) y(O) y(2) y( l ) y(3) y(2) y(4) y(3) y(l) y(2) y( 3 ) y(l) y(O) We know from (4.5.5 ) that pnsJP' = Ns1. Moreover since the elements of the matrix A(sJ - 2nD" are bounded in absolute value by L lh/ > [n;2 1 i y( h) l which converges to zero as n --+ oo, it suffices to show that l pJ"'�slpj - P; ln P} I -+ 0 uniformly in i and j. But I Pi ( r�s) - rn ) P} I = [(n 4n - 1 ( t m l t m m n m l (y(m) - y( - m)) k l ( Pik Pi, n - + k - Pi,n- + kPjk ) l - 1 )/2]. Since I Pij l � (2/n) 1 1 2 this expression is bounded by where c 2 =lt (4.5.6) m y(m) l + 2 t m ) t� m i y(n - m) l � 8 m l y(m) l + 8 n m �� c � l y(m) l . The first term converges to zero as n --+ oo by the dominated convergence theorem since the summand is dominated by l y(m) l and L;;; = 1 ly(m) l < oo . The second term goes to zero since it is bounded by L ;;;= [n;2 1 ly(m) l . Since both terms are independent of i and j, the proof of (4.5.6) is complete. D Now let { X,} be a real-valued zero-mean stationary time series with autocovariance function y( · ) which is absolutely summable. Consider the transformed vector of random variables 137 §4.5.* Circulants and Their Eigenvalues (4.5.7) = with ri, vi , j 0, . . . , n - 1 defined as in Proposition 4.5. 1 . The components of Z are approximately uncorrelated for large n by Proposition 4.5.2. Moreover the matrix V, being orthogonal, is easily inverted to give n -1 .. "' . zh exp ( - l] Wh ) . Xi = n - 1/2 L... h�O Thus we have represented X0 , X 1 , . . . , X._1 as a sum of sinusoids with random coefficients which are asymptotically uncorrelated. This is one (albeit rough) interpretation of the spectral representation of the process {X, } . Another easily verified consequence of Proposition 4.5.2 is that with Z defined as in (4.5.7), ro sup I E I Zk l 2 - L y(h)exp( - ihwk ) l --+ 0 as n --+ oo . h� -ro O :<:; k :<:; n - 1 Let us consider now an arbitrary frequency w E ( - n , n] and define, by analogy with (4.5.7), n -1 Zw, n n - 1 12 L Xh exp(ihw). h�O Then n-1 E 1 Zw,n l 2 n -1 L y(k - l ) exp [ iw (k - l)] k, l� O 1 n- L (n - l hl) y (h)exp( iwh) lh l < n ro --+ L y (h) exp ( iwh) 2nf(w) as n --+ oo . h�-ro This shows that (2n)- 1 1Zw,n l 2 is an asymptotically unbiased estimator of f(w) . In Chapter 1 0 we shall show how this estimator can be modified in order to construct a consistent estimator of f(w) . We conclude this section by deriving upper and lower bounds for the eigenvalues of the covariance matrix of X. (X 1 , . . . , X.)' when {X, } is a stationary process with spectral density f(),), - n � ), � n. 
= = = = = Proposition 4.5.3. that Let {X, } be a stationary process with spectral density f such m := inff(A.) > 0 and M A := sup f(A.) < A oo , 4 . The Spectral Representation o f a Stationary Process 138 and denote by ). 1 , , An (A 1 :::;; ). 2 :::;; · • · :::;; ).n) the eigenvalues of the covariance matrix rn of (X 1 , , X.)'. Then 2nm :::;; A 1 :::;; An :::;; 2nM. . . . • . . PROOF. Let X = (x I ' . . . ' xn) ' be a non-zero right eigenvector of r n corresponding to the eigenvalue An . Then rJ� xje-ijff(v) dv :::;; f " = - rr :::;; showing that An L L Xixke - i(j - k ) vM dv j k = 2nM '[. xJ, j 2nM. A similar argument shows that A 1 ;::::: 2nm. D §4.6* Orthogonal Increment Processes on [ - rc, rc] In order to give a precise meaning to the spectral representation (4.2.5) mentioned earlier, it is necessary to introduce the concept of stochastic inte­ gration of a non-random function with respect to an orthogonal-increment process {Z(A) }. An orthogonal-increment process on [- n, n] is a complex­ valued stochastic process {Z(A), - n :::;; A :::;; n} such that Definition 4.6.1 . <Z(A), Z(A) ) < oo, <Z(J.), I ) = 0, and (4.6. 1 ) -n :::;; ). :::;; n, - n :::;; :::;; n, }, where the inner product is defined by <X, Y) (4.6.2) = E(X Y). The process {Z(A), - n :::;; A :::;; n} will be called right-continuous if for all ). E [ - n, n), II Z(A + o ) - Z(A) II 2 = E IZ(A + o ) - Z(A)I 2 ----> 0 as o 1 0. It will be understood from now on that the term orthogonal-increment process will mean right-continuous orthogonal-increment process unless specifically indicated otherwise. §4.6 * Orthogonal Increment Processes on [ - n, n] 139 {Z(A), - n :o:; A :o:; n} is an orthogonal-increment process, then there is a unique distribution function F (i.e. a unique non-decreasing, right-continuous function) such that F(J") 0, F(J") F(n), and (4.6.4) - n :o:; A :o:; f.1 :o:; n. F( p) - F(A) IIZ(p) - Z(J") 11 2 , Proposition 4.6.1 . If = = = PROOF. For F to satisfy the prescribed conditions it is clear, on setting A = that F( p) = IIZ ( p) - Z( - n) ll 2 , - n :o:; f.1 :o:; n. - n, (4.6.5) To check that the function so defined is non-decreasing, we use the orthog­ onality of Z(p) - Z(A) and Z(A) - Z( - n), - n :o:; A :o:; f.1 :o:; n, to write F( p) = IIZ(p) - Z(A) + Z()") - Z(- n) ll 2 II Z(p) - Z(A) II 2 + IIZ(A) - Z( - n) ll 2 2 F(A). The same calculation gives, for - n :o:; f.1 :o:; f.1 + 15 :o:; n, F( p + 15) - F( p) IIZ( p + 15) - Z( p) l l 2 = = ---> 0 as 15 1 0, by the assumed right-continuity of { Z(A) }. D Remark. The distribution function F of Proposition 4.6. 1 , defined on [ - n, n] by (4.6.5) will be referred to as the distribution function associated with the orthogonal-increment process {Z(A), - n :o:; )" :o:; n}. It is common practice in time series analysis to use the shorthand notation, E(dZ(A) dZ(p)) for the equations (4.6.3) and (4.6.4). = 15;.,,11 dF(A), ExAMPLE 4.6. 1 . Brownian motion {B()"), - n :o:; )" :o:; n} with EB()") = 0 and Var(B(A)) = <T 2 (A + n)/2n, - n :o:; )" :o:; n, is an orthogonal-increment process on [ - n, n]. The associated distribution function satisfies F(A) = 0, )" :o:; - n, F().) = <T 2 , )" 2 n, and F(A) = <T 2 (A + n)/2n, - n :o:; A :o:; n. EXAMPLE 4.6.2. If {N(A), - n :o:; A :o:; n} is a Poisson process on [ - n, n] with constant intensity c then the process Z()") = N(A) - EN(A), - n :o:; )" :o:; n, is an orthogonal-increment process with associated distribution function F()") = 0, 140 4. 
The Spectral Representation of a Stationary Process F(A) ={Z(A)} A F(A) c(A A s; - n, 2nc, ?: n and = + n), - n s; s; n. If c is chosen to has exactly the same associated distribution function as be () 2j2n then in Example 4.6. 1 . ) " { B(A)} §4. 7* Integration with Respect to an Orthogonal Increment Process We now show how to define the stochastic integral { Z(A), - A n} I(f) = 1 -,, ,/(v)dZ(v), where n s; s; is an orthogonal-increment process defined on the probability space (Q, :Ji', P) and f is any function on [ which is square We integrable with respect to the distribution function associated with proceed step by step, first defining I (f) for any f of the form n i =O f()") = L .IJ( A; , A;+d (A), as I(f) = -n = A0 A1 < F n, n] Z(A). < . . · < An+ l = n, n Z(A;+d - Z(A;)], I.M i =O (4.7. 1 ) (4.7.2) and then extending the mapping I to an isomorphism (see Section 2.9) of 2 2 = L (F) onto a subspace of L (Q, :Ji', P). L2 ( [ Let � be the class of all functions having the form (4.7. 1 ) for some n E 1 , 2, Then the definition (4.7.2) is consistent on � since for any given f E � there is a unique representation of f, -n,n],&B,F) {0, ... }. m f(A) = iI=O rJ(v,, v,+d(A), - n v0 < v1 < · · · < vm+l n, i n which r; ri + l , 0 i < All other representations of f having the form (4.7. 1 ) are obtained by reexpressing one or more of the indicator functions = =f. s; = m. I< v, , v ,+ d as a sum of indicator functions of adjoining intervals. However this makes no difference to the value of I (f), and hence the definition (4.7.2) is the same for all representations (4.7. 1) of f. It is clear that (4.7.2) defines I as a linear mapping on �. Moreover the mapping preserves inner products since if f E � and g E � then there exist representations and n j(A) = iL=O JJ(A;, Ai+ Ip) g(A) = iL=On gJ(A;, A;+d(A) §4.7.* Integration with Respect to an Orthogonal Increment Process 141 )on+ 1 Hence the inner­ P) ( /(f), /(g) ) = (ta.{;[Z(Jci + l ) - Z(.lc J ], it gJZ()o i + i ) - Z(.lcJ ]) o n = L !Jli ( F(Jci+i) - F(.lcJ ), i=O by the orthogonality of the increments of { Z(A.)} and Proposition 4.6. 1 . But the last expression can be written as 1 -"· "/(v)g(v) dF(v) = <J, g) Ll<Fl' in terms of a single partition - n = )00 < A 1 < · · · < product of /(f) and /(g) in U(Q., .'F, is = n. the inner product in L 2 (F) of f and g. Hence the mapping I on � preserves inner products. Now let � denote the closure in L 2 (F) of the set �. If f E f0 then there exists a sequence Un } of elements of � such that llfn - J IIP(F) ---> 0. We therefore define /(f) as the mean square limit /(f) = m.s.lim I(fn ), (4.7.3) after first checking (a) that the limit exists and (b) that the limit is the same for ali sequences { fn } such that II f, - f I L2 (F) ---> 0. To check (a) we simply observe that for fm , f, E � ' I I I( f, ) - I(fm l ll = I I I(f" - fm l ll = 11 /n - fm 1 1 L 2 (F) ' so if I I f, - f I I P < Fl ---> 0, the sequence { /( f,) } is a Cauchy sequence and therefore convergent in L 2 (0., .'F, P). To check (b), suppose that llfn - J I I P (F) ---> 0 and ll gn - J II L2 (F) ---> 0 where fn , gn E �. Then the sequence /1 , g 1 , /2 , g2 , , must be norm convergent and therefore the sequence /(JJ l, / (g J l, / ( /2 ), /(g 2 ), . , must converge in L 2 (0., .'51', However this is not possible unless the sub­ sequences { !Un l } and { /(g")} have the same mean square limit. This com­ plet�s the proof that the definition (4.7.3) is both meaningful and consistent for jE �. 
The mapping I on � is linear and preserves inner products since if p n E f0 and I I f,< n - p n I I P F ---> 0, f,Ul E � ' i = 1 , 2, then by the linearity of I on �, < l /(a J ( l l + a 2 j< 2 l ) = lim /(a 1 f,( l l + a 2 f,< 2 l ) . • . P). and by the continuity of the inner product, .. 142 4. The Spectral Representation of a Stationary Process 2 = (j(ll, j( ) ) L2(F) · It remains now only to show that qj = U (F). To do this we first observe that the continuous functions on [ - n, n] are dense in L 2 (F) since F is a bounded distribution function (see e.g. Ash ( 1 972), p. 88). Moreover � is a dense subset (in the L 2 (F) sense) of the set of continuous functions on [ - n, n]. Hence qj U(F). Equations (4.7.2) and (4.7.3) thus define I as a linear, inner-product preserving mapping of qj = L 2 (F) into U (n, ff , P). The image J(qj) of qj is clearly a closed linear subspace of L 2 (0., ff, P), and the mapping I is an isormorphism (see Section 2.9) of qj onto J(qj). The mapping I provides us with the required definition of the stochastic integral. = Definition 4.7.1 (The Stochastic Integral). If { Z(A.)} is an orthogonal-increment process on [ - n, n] with associated distribution function F and if f E U(F), then the stochastic integral J( - "·"1 f(A.) dZ(A.) is defined as the random variable I (f) constructed above, i.e. l f(v) dZ(v) := / (f). J(-1t,1t] Properties of the Stochastic Integral For any functions f and g in L 2 (F) we have established the properties (4.7.4) and E (l (f)I( g)) = 1 - "·"l f(v) g (v) dF(v). (4.7.5) E(l(fn)l(gn)) � E(l(f)I(g)) = J " · "/(v)g(v) dF(v). _ (4.7.6) 2 Moreover if Un } and { gn } are sequences in L (F) such that ll fn - fll u(FJ � 0 and llgn - g ii L2(FJ � 0, then by continuity of the inner product, From (4.7.2) it is clear that E (l (f)) = 0 (4.7.7) for 2(all f E �; if f E qj then there is a sequence Un }, fn E �. such that L Fl f and /(!,) � l (f), so E (l (f)) = limn �oo E (I(fn)) and (4.7.7) remains fn 143 §4.8. * The Spectral Representation valid. This argument is frequently useful for establishing properties of stochastic integrals. Finally we note from (4.7.5) and (4.7.7) that if {Z(),) } is any orthogonal increment process on [ - n, n] with associated distribution function then X, = l(e i' · ) = { e icv J (-1t,1t] F, dZ(v), (4.7.8) is a stationary process with mean zero and autocovariance function E(Xc +h X,) = { e i vh J(-1t,1t] dF(v). (4.7.9) In the following section we establish a converse of this result, namely that if {X, } is any stationary process, then {X, } has the representation (4.7.8) for an appropriately chosen orthogonal increment process { Z(A.) } whose associated distribution function is the same as the spectral distribution function of { X, }. §4.8 * The Spectral Representation Let {X, } be a zero mean stationary process with spectral distribution function Ffirst the spectral representation (4.2.5) of the process { X, } we . Toneedestablish to identify an appropriate orthogonal increment process { Z(A.), ) E [ - n, n] } . The identification of { Z(A.) } and the proof of the representation will be achieved by defining a certain isomorphism between the subspaces­ £ = sp{X" t E E} of and X = sp{e i '·, t E E} of This iso­ morphism will provide a link between random variables in the "time domain" and functions on [ - n, n] in the "frequency domain". 
Let Yf = sp {X,, t E E} and ff = sp { e i ' · , t E E} denote the (not necessarily closed) subspaces ff c and ff c consisting of finite linear combinations of X,, t E E, and e i' · , t E E, respectively. We first show that the mappmg , L2 (Q,li',P) L2 (0., li', P) L2 (F). L2 (F) (4.8. 1 ) defines an isomorphism between Yf and % . To check that T is well-defined, suppose that II L}=1 aj X,1 - I::'=1 bk X,J = 0. Then by definition of the norm and Herglotz's theorem, L2 (F) showing that (4.8. 1 ) defines T consistently on Yf. The linearity of T follows 4. The Spectral Representation of a Stationary Process 144 easily from this fact. In addition, showing that T does in fact define an isormorphism between .Yf and X. We show next that the mapping T can be extended uniquely to an iso­ morphism from :if onto %. If Y E :if then there is a sequence Y, E .Yf such that I I Y, - Yll ...... 0. This implies that { Y, } is a Cauchy sequence and hence, since T is norm-preserving, the sequence { T Y,} is Cauchy in L 2 (F). The sequence { TY, } therefore converges in norm to an element of %. If T is to be norm-preserving on :if we must define TY = m.s.iim TY,. This is a consistent definition of T on :if since if II Y, - Yll ...... 0 then the sequence TY1 , TY1 , TY2 , TY2 , is convergent, implying that the sub­ sequences { TY, } and { TY,} have the same limit, namely TY. Moreover using the same argument as given in Section 4.7 it is easy to show that the mapping T extended to :if is linear and preserves inner products. Finally, by Theorem 2. 1 1.1, X is uniformly dense in the space of continuous functions ¢ on [ - n, n] with ¢(n) = ¢( - n), which in turn is dense in L 2 (F) (see Ash ( 1972), p. 88). Hence .i? L 2 (F). We have therefore established the following theorem. • . • = Theorem 4.8. 1 . IfF is the spectral distribution function of the stationary process { X" t E .Z}, then there is a unique isomorphism T of sp { X1, t E .Z} onto L 2 (F) such that Theorem 4.8. 1 is particularly useful in the theory of linear prediction (see Section 5.6). It is also the key to the identification of the orthogonal increment process {Z(A), - n :::::; A :::::; n} appearing in the spectral representation (4.2.5). We introduce the process { Z(A)} in the following proposition. Proposition 4.8.1 . If T is {Z(A), - n :::::; A :::::; n} defined Z(A) = defined as in Theorem 4.8. 1 then the process by - n :::::; A :::::; n, T - 1 U(-1t,;.k )), §4.8.* The Spectral Representation 145 is an orthogonal increment process (see Definition 4.6. 1 ). Moreover the distri­ bution function associated with { Z(A)} (see Proposition 4.6. 1 ) is exactly the spectral distribution function F of { Xr } · PROOF. For each A E [ - n, n], Z(A) is a well-defined element of sp { X, t E Z} by Theorem 4.8. 1 . Hence <Z(A), Z(A) ) < oo. Since Z(A) E sp { X, t E Z} there is a sequence { Y, } of elements of sp { Xr , t E Z} such that II Y, - Z(A) II -+ 0 as n -+ oo. By the continuity of the inner product we have <Z(A), 1 ) = lim < Y,, 1 ) = 0 since each X, and hence each Y,, has zero mean. Finally if - n A3 :s; A4 :s; n, :s; A1 :s; A2 :s; = <Io., , ;. . J ( · ), /(;. , , ;. 2 J ( · ) ) L2 (F) = l J (-n, n] = 0, /(;., , ;..1(v)/<;. , , ;. 2 1(v) dF(v) completing the proof that { Z(A)} has orthogonal increments. A calculation which is almost identical to the previous one gives <Z(p) - Z(A), Z(p) - Z(A)) = F(p) - F(A), showing that {Z(A)} is right-continuous with associated distribution function D F as claimed. 
It is now a simple matter to establish the spectral representation (4.2.5). Theorem 4.8.2 (The Spectral Representation Theorem). If { Xr } is a stationary sequence with mean zero and spectral distribution function F, then there exists a right-continuous orthogonal-increment process { Z(A), - n :s; A :s; n } such that and (i) E I Z(A) - Z( - n) l 2 = F(A), - n :s; A :s; n, (ii) Xr = J( - n , nJ e irv dZ(v) with probability one. PROOF. Let { Z(A)} be the process defined in Proposition 4.8. 1 and let I be the isomorphism, /(f) = J / (v) dZ(v), _" " · 146 2 f E = L (F) 4. The Spectral Representation of a Stationary Process U(Q, .?, P), I(f) = L .t;(Z(A; + l) - Z(),;)) = This relationship remains valid for all f E = L 2 (F) since both and we must have I = (i.e. = f for all 2(F)) and henceTherefore from Theorem 4.8. 1 . fareE Lisomorphisms. X, = I(eit ·) = JI( - 1t, 1t]eitv dZ(v), giving the required representation for {X,}. The first assertion of the theorem is an immediate consequence of Proposition 4.8. 1 . forthogonal {X,} is a zero-mean stationary sequence then there existhat ts a right continuous increment process { Z(.A_ ) , -n :::; ; A :::; ; n} such Z( -n) = 0 and X, = JI(-1t,1t] e itv dZ(v) with probability one. { Y(.A_) } and {Z(.A_) } are two such processes then P(Y(A) = Z(.A_)) = 1 for each AE[ - n,n]. PROOF. If we denote by { Z*(.A_) } the orthogonal-increment process defined by Proposition 4.8. 1 , then the process Z(.A_) = Z*(.A_) - Z*(-n), -n :::;; :::;; not only satisfies Z(-n) = 0, but also has exactly the same increments as {Z*(),) }. Hence I e itv dZ*(v) = J(-1t,1t] I e itv dZ(v). X, = J(-1t,1t] Suppose now that { Y(.A_) } is another orthogonal-increment process such that Y( -n) = 0 and e it v dY(v) = I e i<v dZ(v) with probability one. (4.8.2) X, = I J(-1t,1t] J(-1t,1t] If we define for f E U(F), Iy(f} = J_"· "/(v)dY(v) from E0 onto I(E0) <:; which was discussed in Section 4.7. If 'lJ has the representation (4.7. 1 ) then n i=O T-t (f). E0 T- 1 I T -1 TJ(f) D Corollary 4.8.1 . I If w n, 147 §4.8 * The Spectral Representation and then we have from (4.8.2) Iy(ei' · ) = lz(ei' ·) for all t E E.. (4.8.3) Since Ir and lz are equal on sp {e ;'., t E Z} which is dense in L 2 (F) (see the comment preceding Theorem 4.8. 1 ), it follows that ly(f) = lz(f) for all f E L 2 (F). Choosing f(v) = /( - n , AJ ( v) we obtain (with probability one) Y()�) = 1-,, ,/(v) dZ(v) = Z(A), - n ::;; A ::;; n. D Remark 1. In the course of the proof of Theorem 4.8.2, the following result was established: Y E sp { X,, t E E.} if and only if there exists a function f E L 2 (F) such that Y = /(f) = J( - ,,,1 J( v) dZ(v). This means that I is an isomorphism of L 2 (F) onto sp{ X,, t E E.} (with the property that l (e i' · ) = X,). The argument supplied for Theorem 4.8.2 is an existence proof which does not reveal in an explicit manner how { Z(A)} is constructed. In the next section, we give a formula for obtaining Z(A) from {X, } . Remark 2. Remark 3. The corollary states that the orthogonal-increment process in the spectral representation is unique if one uses the normalization Z(- n) = 0. Two different stationary processes may have the same spectral distribution function, for example the processes X, = J(-,,,1 e it A dB(A) and Y, J<-,,,1 e i' A dN(A) with { B()�)} and { N(A)} defined as in Examples 4.6. 1 and 4.6.2. In such cases the processes must of course have the same autocovariance function. = ExAMPLE 4.8. 1 . Let Z(A) = B(A) be Brownian motion on [ - n, n] as defined in Example 4.6. 
1 with EZ(A) = 0 and Var(Z(A)) = 0" 2 (A + n)/2n, - n ::;; A ::;; n. For t E E., set g,(v) = cos(tv)/( ,, o 1 ( v) + sin(t v)/( 0 ,,1 (v) and X, = J2 { g,(v) dB(v) = J( -n,n] - { J2 ( J(-n,O] J2 cos(tv) dB(v) + { sin(tv) dB(v) J(O,n] (cf. Problem 4.25). Then EX, = 0 by (4.7.7), and by (4.7.5), 0' 2 0' 2 E (X, +h Xh ) = { , 2 dv = 2rc 2n J(-n,nJ g +h(v)g,(v) f" 0 cos(hv) dv. Hence E (X, +h X,) = 0' 2 1\. o and consequently { X, } � WN(0, 0' 2 ). ) , (4.8.4) (4.8.5) 148 4. The Spectral Representation of a Stationary Process Since however B(A.) is Gaussian we can go further and show that the = 0, ± 1 , . . . , are independent with N(O, To random variables sk be any k distinct integers and for each fixed j let prove this let s be a sequence of elements of EfJ, i.e. functions of the form (4.7. 1 ), such that · ) in Since the mapping /8 is an isomorphism of � = U(F) onto /8(�), we conclude from (4.8.4) that (4.8.6) + ... + + ... + X,, t 1, jj<nJ --+ 9si( L2 (F). 81/B(Jtl) X, . • . , � CJ2{ ).Ji<nl } 8k /B(h(n)) � 81 Xs, 8k Xs. · The left -hand side, l8(L Y�1 8i jj<n l ), is clearly normally distributed with mean zero and variance 1 I J� 1 8ifi<nJ I 2 . The characteristic function of 18(L Y�1 8i jj<n l ) is therefore (Section 1 1 .6) L2 (F), as n --+ oo , k 8.J:<n) 1 2 Ik 8 e is 1 2 = (J2 Ik 82 I j� 1 • I j�1 I j� 1 From we conclude therefore that I 1� 1 8j Xsj has the Gaussian char­ acteristic function, f/J(u) = nlim-tct:J rftn(u) = exp [ - tu2 2 ± 8f]. Since this is true for all choices of 8 1 , . , 8b we deduce that Xs,, ... , Xs. are it then follows that the random jointly normal. From the covariances variables X,, t = 0, ± 1 , . . , are iid N(O, CJ 2 ). If A is a Borel subset of [-n, n], it will be convenient in the following proposition (and elsewhere) to define L f(v) dZ(v) = 1-"· "/(v)/A(v) dZ(v), By the continuity of the inner product in J J L2(F) --+ r . J J U(F) (4.8.6) 0" J=l . (4.8.5) . . Remark 4. (4.8.7) where the right-hand side has already been defined in Section 4.7. upposeofthediscontinuity spectral distribution function F)"0of<then. stationary Spoint process {X,} has a at A.0 where -n < Then with probability one, X, = JI(-"·"l\ {.<o} eirv dZ(v) + (Z(A.o ) - Z(A.() ))eir.<o, where the two terms on the right side are uncorrelated and Proposition 4.8.2. PROOF. The left limit Z(A.0) is defined to be §4.8.* The Spectral Representation Z(A0 ) = 1 49 m.s.lim Z(An ), (4.8.8) where An is any sequence such that An i A0 . To check that (4.8.8) makes sense we note first that {Z(An ) } is a Cauchy sequence since IIZ(An ) - Z(Am ) ll 2 = I F(An ) - F(Am ) l --+ 0 as m, n --+ oo . Hence the limit in (4.8.8) exists. Moreover if vn i A 0 as n --+ oo then II Z(An ) - Z(vn ) ll 2 = I F(An ) - F(vn ) l --+ 0 as n --+ oo , and hence the limit in (4.8.8) is the same for all non-decreasing sequences with limit A 0 . For b > 0 define A± c) = Ao ± b. Now by the spectral representation, if 0 < b < n - I A0 I , X, j e irv dZ(v) = J(-1t,1t]\(.L,,A,] + j e irv dZ(v). J(L,,A,] (4.8.9) Note that the two terms are uncorrelated since the regions of integration are disjoint. Now as b --+ 0 the first term converges in mean square to f<-"·"l \{ Ao ) e irv dZ(v) since e ;' .J<-"·"l\(L,,A,1 --+ e i' · I<-"·"l\ { Ao) in L 2 (F). To see how the last term of (4.8.9) behaves as c5 --+ 0 we use the inequality I J(L,, r A,] e itv dZ(v) - e itAo(Z(Ao) - Z(A(} )) I r eitv dZ(v) - eitAo(Z(Ad) - Z(A_d)) I I J(L,,A,] :$; (4.8. 1 0) As c5 --+ 0 the second term on the right of (4.8. 
1 0) goes to zero by the right continuity of {Z(A)} and the definition of Z(A0). The first term on the right side of (4.8. 1 0) can be written as 0 as c5 --+ 0, by the continuity of the function e ir ·. Hence we deduce from (4.8. 1 0) that --+ j e ir v dZ(v) � e i< Ao(Z(A0) - Z(A0 )) as c5 --+ 0. J(L,, A,] The continuity of the inner product and the orthogonality of the two integrals in (4.8.9) guarantee that their mean-square limits are also orthogonal. 4. The Spectral Representation of a Stationary Process ! 50 Moreover Var(Z(A.0 ) - Z(A.0 )) = lim Var(Z(A.0) - Z(A.")) = Ant Ao F A. ) F(},() ). ( 0 - 0 If the spectral distribution function has k points of discontinuity at ..1. 1 , . . . , A.k then { X, } has the representation X, = I J(-1t,1t]\{ A ,, ... , .<,} e irv dZ(v) + �j- (Z(A.j) - Z(A.n)e;,;.i, (4.8. 1 1 ) where the (k + 1 ) terms on the right are uncorrelated (this should be compared with the example in Section 4.2). The importance of (4.8. 1 1) in time series analysis is immense. The process Y, = (Z(A.0 ) - Z(A.0 ))eir -<o is said to be deterministic since Y, is determined for all t if Y,0 is known for some t0 . The existence of a discontinuity in the spectral distribution function at a given frequency ),0 therefore indicates the presence in the time series of a deterministic sinusoidal component with frequency ..1. 0. §4.9* Inversion Formulae Using the isomorphism T of Theorem 4.8. 1 and a Fourier approximation to /( v , wJ C ) E L it is possible to express directly in terms of { X, } the orthogonal-increment process { Z(),)} appearing in the spectral representation (4.2.5). Recall that for - n < v < w < n 2 (F), and Consequently if T(Z(w) - Z(v)) = /( v, w] ( · ) , TX, = e; . ij· " L., IY.J. e UI :S n for all t E Z. V(Fl I( v , w] ( ), · (4.9. 1 ) then by the isomorphism, I r:xjXj � Z(w) - Z(v). (4.9.2) UI :S n An appropriate trigonometric polynomial satisfying (4.9. 1 ) is given by the n1h-order Fourier series approximation to /( v , wJ ( · ), viz. j where hn(A) = I r:xj ei .<, Ul :s: n (4.9.3) (4.9.4) 151 §4.9.* Inversion Formulae I n Section 4. 1 1 we establish the following essential properties of the sequence of approximants { hn( · ) } : l hn(A) - I(v .wJ (A.)I --> 0 n --> where E is any open set containing the points v and w, and sup ). E [- n . n] \ £ as 00 , (4.9.5) sup l hn(A.)I s M < oo for some constant M and all n. (4.9.6) is thew spectral distribution andIfif Fv and are continuity points function ofF suchofthatthe -stationary rc < v < A E [ - n , n] Proposition 4.9.1 . { X" } sequence w < n, then PROOF. Problem 4.26. D { Xn} stationaryF,sequence with autocovariance function If is a function ), spectral distribution and spectral representation r). dZ(A.), and if v and w ( - n < v < w < rc) are continuity points ofF, ei then as n oo __!_ I ( f "' e - ij). ) � Z(w) - Z(v) (4.9.7) 2n and (4.9.8) __!_ y(j) ( f w e - i dA.) --> F(w) - F(v). 2rc I PROOF. The left side of (4.9.7) is just T- 1 h where T is the isomorphism of Theorem 4.8. 1 and h is defined by (4.9.3). By Proposition 4.9. 1 we conclude Theorem 4.9.1 . y( · f<- ,,,1 --> lil :<: n Iii :<; n that X, x j = dA. v v j). n T- 1 hn � T - 1 I(v, w] = n Z(w) - Z(v), which establishes (4.9.7). To find Z(8) - Z(8-), - n < e 4.32. s n, see Problem To prove (4.9.8) we note, from the spectral representation of y( · ) that the left-hand side of (4.9.8) can be written as 4. The Spectral Representation of a Stationary Process ! 52 By Proposition 4.9. 1 and the Cauchy-Schwarz inequality, I Hence <h" - J(v. 
wJ• 1 l h" - J(v, w111 F112 (n) � o as n � oo . >I ::;:; l - ,, ,1hnCA.)dF(.A) � J _,, ,/(v,w](.A) dF(.Ic) F(w) - F(v), = as required. D Although in this book we are primarily concerned with time series in discrete-time, there is a simple analogue of the above theorem for continuous­ time stationary processes which are mean square continuous. This is stated below. The major differences are that sums are replaced by integrals and the range of integration is IR instead of (-n, n]. Let { X(t),y( t Ewhich IR} beisacontinuous zero meanatstationary process with 0. Then there exists a autocovariance function spectral distribution function F(t) and an orthogonal-increment process Z(t) on - oo < t < oo such that y(s) = I: eis.< dF(A) and Theorem 4.9.2. ·) More T� oo,over if v and w (- oo < v < w < oo are continuity points of F, then as _!_2n I T (f w e- iry dy)y(t)dt � F(w) - F(v) v and __!_2n__ I-T (fvw e- iry dy) X(t) dt � Z(w) - Z(v). ) -T T For a proof of this result see Hannan ( 1 970). �4. 10* Time-Invariant Linear Filters { t= ... } { c,, k , t, = ... {X, } t t 0, ± 1 , . . . ; c, , kxk, The process Y, , 0, ± 1, is said to be obtained from by application of the linear filter C = k 0, ± 1 , if 00 r; = L: k = -oo = = 0, ± 1, . . . } ,4. 1 0. 1 ) §4.1 0. * Time-Invariant Linear Filters 1 53 the coefficients c,, k are called the weights of the filter. The filter C is said to be time-invariant if c, , k depends only on (t - k), i.e. if (4. 1 0.2) since then Y;-s = L oo Cr -s,k Xk k= 00 = L c,, s + k Xk k= - oo 00 = L c,, k xk - s• k= oo 00 i.e. the time-shifted process { r;_., t = 0, ± 1 , . . . } is obtained from { x, _ s , t = 0, ± 1 , . . . } by application of the same linear filter C. For the time-invariant linear filter H = { h ; , i = 0, ± 1 , . . . } we can rewrite (4. 1 0. 1 ) in the form r; = L hk Xt -k . 00 (4. 1 0.3) k= - oo The time-invariant linear filter (TLF) H is said to be causal if hk = 0 for k < 0, since then r; is expressible in terms only of X., s s t. EXAMPLE 4. 1 0. 1 . (4. 1 0.4) The filter defined by r; = aX_ ,, t = 0, ± 1, ... ' is linear but not time-invariant since the weights are c,, k = al5,, -k which do not depend on (t - k) only. EXAMPLE 4. 1 0.2. The filter t = 0, ± 1 , . . . ' is a TLF with hi = ai,j = - 1 , 0, 1 and hi = 0 otherwise. It is not causal unless a_ 1 = 0. EXAMPLE 4. 1 0.3. The causal ARMA(p, q) process, t = 0, ± 1 , . . . c/J(B)X, 8(B)Z" can be written (by Theorem 3. 1 . 1 ) in the form = ' where L i=o t/Ji z i = 8(z)/c/J(z), lzl s 1. Hence { Xr } is obtained from {Z, } by application of the causal TLF { t/Ji ,j = 0, 1, 2, . . . } . 1 54 4. The Spectral Representation of a Stationary Process An absolutely summable time-invariant linear filter H 0, ± 1 , . . . } is a TLF such that Definition 4.10.1. { hj,j = 00 = lh l I j < w. j= � oo If {X, } is a zero-mean stationary process and H is an absolutely summable TLF, then applying H to {X, } gives Y, = 00 (4. 10.5) L hj Xt -j• j= � oo By Theorem 4.4. 1 we know that { Y; } is stationary with zero mean and spectral distribution function (4. 10.6) where h(e-iv) = 00 L hj e- ijv. j= - oo In the following theorem, we show that (4. 10.5) and (4. 10.6) remain valid under conditions weaker than absolute summability of the TLF. The theorem also shows how the spectral representation of the process { r; } itself is related to that of {X, } . representation Let {X, } be a zero-mean stationary process with spectral e itv dZx(v) X, I J(-1t.1t) and spectral distribution function Fx( ). Suppose H { hj,j 0, ± 1 , . . . 
} is a TLF such that the series 2:;= -n hj e-ij · converges in L 2 (Fx) norm to h(e- ;·) j=L hje-W (4. 10.7) as n --> oo. Then the process Theorem 4.10.1. = · = = = 00 - co j= - oo is stationary with zero mean, spectral distribution function I l) lh(e- iv )l 2 dFx(v) Fy(A) J(-1t. and spectral representation = (4. 1 0.8) §4. 1 0 * Time-Invariant Linear Filters !55 (4. 1 0.9) - iv) is non-zero for vi A where SA dFx(A.) = 0, the filter H can be I f h(e inverted in the sense that X, can be expressed as I (4. 10. 1 0) X, = g(e - iv)ei" dZy(v), J(-n, 7t] where g(e-i v) 1/h(e - iv) and dZy(v) h(e - iv)dZx(v). From (4. 10. 10) and Remar k 1 of Section 4.8 it then follows=that X, E sp{ Y,, s } . = - oo < i eirv " h -e- iv dZ (v) < oo PROOF. From the spectral representation of { X, } we have n " h-X . = j =/_.- n 1 t-J hie-ii" n ( - n , n] /_. j= - n X 1 (4. 10. 1 1) h(e - i·) L2(Fx), it follows that the left converges to and since Li = - n in side of (4. 1 0. 1 1 ) converges in mean square and that Equation (4. 1 0.8) then follows directly from (4.7.5). Once (4. 10. 10) is established, it will follow immediately that s oo }. Since g(e - iv)h(e - iv) = 1 for v i A and SA dF y(A.) = 0, it follows that g EL 2 (F y) so that the stochastic integral in (4. 1 0. 10) is well-defined. Also, X, e sp{ Y,, = - oo < I J ( - n, n] < 11 - g(e - i'v)h(e - i'v)l 2 dFx(v) = 0, which establishes (4. 1 0. 10). 0 Remark 1. An absolutely summable TLF satisfies the assumptions of the first part of the theorem since in this case, LJ = - n converges uniformly to hi e - ii· h(e -i·). Remark 2. Heuristically speaking the spectral representation of {X, } de­ composes X, into a sum of sinusoids, -n < v� n. 4. The Spectral Representation of a Stationary Process ! 56 H is to produce corresponding components <v which combine to form the filtered process { Y,}. Consequently [ h (e - i v )[ is often called the amplitude gain of the filter, arg h(e - iv ) the phase gain and h(e - iv ) itself the transfer function. In view of (4. 10.8) with A. = the quantity [ h (e - iv W is referred to as the variance (or power) gain or as the power transfer function at frequency v. The effect of the TLF :-s; n, -n n, Remark 3. The spectral viewpoint is particularly convenient for linear filtering since techniques are available for producing physical devices with prescribed transfer functions. The analysis of the behaviour of networks of such devices is particularly simple in terms of spectral analysis. For example if is operated on sequentially by two absolutely summable TLF's and in series, then the output process will have spectral representation { X1} H 1 H2 { W,} I e irv h l (e-iv )hz (e- iv )dZx(v) W, = J(-1t ,1t] and spectral distribution function I [ h 1 (e -iv)h2 (e - ivW dFx(v). Fw(A.) = J(-1t,1t] Let { Y,} be the MA( 1 ) process, Y, = Z1 - Z, _1, {Z,} WN(O, cr2). Then since h(e - i v) = - e - iv is non-zero for 0 and since F is continuous at zero, it follows from Theorem 4. 1 0. 1 that z, = JI( - 1t, 1t]eitv( l - e - iv) - ' dZy(v). Although in this case Z1 E sp { Y., oo < s t} (see also Problem 3.8), Z1 does not have a representation of the form L� o a Y, - . More generally, the filter ( 1 - B) can be inverted whenever the spectral distribution function of the input process is continuous at zero. To illustrate the possible non-invertibility of ( 1 - B), let us apply it to the stationary process, z; = z, + A, where A is uncorrelated with {Z,}, E(A) = 0 and Var(A) = cr� > 0. 
Since Y, = ( 1 - B)Z, = ( 1 - B)Z;, it is clear that we cannot hope to recover {z;} from {Y,}. In this example the transfer function, h(e - iv) = 1 - e - iv, is zero at v = 0, a frequency to which F assigns positive measure, cr�. Remark 4. � v 1 - z· :-s; i =/= z i §4. 1 1 .* Properties of the Fourier Approximation hn to /(v,wJ 1 57 §4. 1 1 * Properties of the Fourier Approximation hn to /(v , w] (4.9.5) and (4.9.6) of the trigonometric (4.11.1) hn(8) _I_2n lil s n e iO f-"" I (A)e i dA which were used in deriving the inversion formulae of Section 4. 9 . Let Dn C) be the Dirichlet kernel (see Section 2.1 1), { sin[(n �)A] i Dn(A) = I iLI ei J. = sin(A/2) if A 0' 2n 1 if A = 0. If b E (0, 2n) and { f, } is the sequence of functions defined by fn (x) J: Dn(A) dA, then as n --+ oo, fn(x) --+ n uniformly on the set [b, 2n - b] and sup l fn (x) l :S M < oo for all n 1. PROOF. We have for x 0, xf Dn(A)dA = 2 f x/2 r 1 sin((2n 1))o) dA xf /2 g(A)sin((2n + 1)A)dA, (4.1 1.2) +2 In this section we establish the properties polynomials " L., = - (v, w] + 9 J. ' =I + Proposition 4.1 1.1. = X E � (0, 2 7t - O] � 0 + 0 0 where g(A) = [sin - 1 (A) - A - 1 ] = (A - sin A)/(A sin A). Straightforward calcula­ tions show that g(A) = A/6 + o (A), g'(A) = i + as A --+ 0 and that g(A) and g'(A) are uniformly bounded on [0, n - b/2]. Integrating the second term on the right side of and using the fact that g(O) = 0, we obtain o(1) (4.11.2) x/ 2 ro 2 g(A)sin((2n + 1)A)dA = - 2(2n 1 ) - g(x/2)cos((2n l)x/2 J 2(2n + 1 ) 1 J: g'(A)cos((2n + l)A)dA. 1 + + - + 12 Since g(A) and g'()o) are bounded for A E [0, n - b/2] it follows easily that this expression converges to zero uniformly in on the set [0, 2n - b]. On the other hand by a standard result in analysis (see Billingsley p. 239), x (1986), ! 58 4. The Spectral Representation of a Stationary Process 2 f x/2 0 A - 1 sin ((2 n + 1 ) ..1.) d ..1. = 2 f (2 n + l )x/2 0 ....... n as n ....... r 1 sin )_ d)_ (4. 1 1 .3) oo. It is also evident that the convergence in (4. 1 1 .3) is uniform in x on the set [<5, 2n - <5] and hence that fn (x) converges uniformly to n on the same set. Moreover since the integral 2 Jll A -l sin ..1. is a continuous function on [0, oo) with finite limits at 0 and = oo , the integral in (4. 1 1 .3) is uniformly bounded in x ( :;::,: 0) and Combined with the uniform convergence to zero of the second integral on the right of (4. 1 1 .2), this shows that fn(x) is uniformly bounded for x E [0, 2n - <5] and :;::,: I . D y =n y dA. . n for - n < v < w < Ifn, { h" } is the sequence of functions defined in (4. 1 1 . 1 ), then 0 as n --> 8E(sup -1t, 1t)\E where E is any open subset of [ - n, n] containing both v and Also 8E(sup- 1t, 1t) h ( ) M < for all n I. Proposition 4. 1 1 .2. l hn(8) - /(v , wJ (8) 1 --> PROOF. l n 8 1 s 00 , w. :;::,: oo h = 2n_!__ L eij8 e- ij'- dA. n(8) 1 2n fw lk> n v fw Dn(8 - A) dA v 1 f e-v Dn ()-) d 2n e-w fe-w 1 ( f e-v 0 Dn(}-) - 0 Dn(A.) = - == A. dA. 2n ) d)_ . (4. 1 1 .4) Given the set E, there exists a 6 > 0 such that 6 < 1 8 - v i < 2n - 6 and 6 < 1 8 - wl < 2n - 6 for all 8 E [ - n, n]\E. Since D"(A.) is an even function it follows from Proposition 4. 1 1 . 1 that _!__ 2n f -v o Dn(A.) o d..{ --> { � -z �f 8 - v > 0, If 8 - V < 0, and the convergence is uniform in 8 on [ - n, n]\E. The same result holds with /(v , wJ (8) uniformly on [ - n, n]\E. v replaced by w, and hence The uniform boundedness of h"(8) follows on applying Proposition 4. 1 1 . 1 to (4. 
1 1 .4) and noting that 1 0 - v i < 2n and 1 8 - w l < 2n for all 8 E [ - n, n]. h"(O) --> 0 1 59 Problem:; Problems 4. 1 . If }'( · ) is the autocovariance function of a complex-valued stationary process, show that y(O) � 0, ly(h)l � y(O), y(h) = y( - h), and that y( · ) is non-negative defi nite (see (4. 1.6)). 4.2. Establish whether or not the following function is the autocovariance function of a stationary process: 1 if h = 0, - 5 if h = ± 2, y(h) = : - 5 if h = ± 3, otherwise. 4.3. If 0 < a < n, use equation (4.3. 1) to show that h = ± 1 , ± 2, . . . , h - 1 sin ah, y(h) = h = 0, a, {� { is the autocovariance function of a stationary process { X,, t = 0, ± 1, . . . } . What is the spectral density of {X, } ? 4.4. I f { X, } is the process defined by equation (4.2.1), show that {X, } i s real-valued if and only if Aj = - An - j and A(A) = A(An - ), j = 1, . . . , n - 1 and A(A") is real. Sh<�w that {X,} then satisfies equation (4.2.2). 4.5. Determine the autocovariance function of the process with spectral density f(A) (n - IAI)/n 2 , - n � A � n. = 4.6. Evaluate the spectral density of {Z, }, where {Z, } � WN(O, <J2 ). 4. 7. If { X, } and { Y, } are uncorrelated stationary processes with spectral distribution functions Fx( · ) and Fr( · ), show that the process { Z, := X, + Y, } is stationary and det,ermine its spectral distribution function. 4.8. Let {X, } and { Y, } be stationary zero-mean processes with spectral densities fx and fr· If fx(A) � fr(A) for all A E [ - n, n] , show that (a) rn . Y - rn . X is a non-negative definite matrix, Where rn. Y and rn . X are the covariance matrices of Y = ( Y1 , . . , Y,)' and X = (X 1 , . . . , X")' respectively, and (b) Var(b'X) � Var(b'Y) for all b (b 1 , , bn )' E IR". . = • • • 4.9. Let {X, } be the process X, = A cos(nt/3) + B sin(nt/3) + Y, where Y, Z, + 2.5Zt - J , { Z,} WN (0, 0'2 ), and A and B are uncorrelated (0, v 2 ) random variables which are also uncorrelated with {Z, } . Find the covariance function and the spectral distribution function of { X, } . � = {71: 4. 10. Construct a process {X, } which has spectral distribution function, Fx(w) = + w, 3n: + w, 5n + w, -n � w < - n/6, - n/6 � w < n/6, n/6 � w � n. 1 60 4. The Spectral Representation of a Stationary Process For which values of d does the differenced process VdX, = X, - X,_d have a spectral density? What is the significance of this result for deseasonalizing a time series by differencing? 4. 1 1 . Let {X, } be the ARMA(p, q) process defined by {Z, } - WN(O, a 2 ), ¢>(B)X, = O(B)Z,, where ¢>(z) # 0 for all z E IC such that lzl = I . Recall from Example 3.5. 1 that for some r > I , "' 1 1 I y(k)z k = a2 8(z)O(z - )/[¢>(z)¢>(z - ) ] , k= r- 1 < lzl < r, - oo where the series converges absolutely in the region specified. Use this result in conjunction with Corollary 4.3.2 to deduce that {X, } has a spectral density and to express the density in terms of a 2 , 0( · ) and ¢>( · ). 4. 1 2. Let {X, } denote the Wolfer sunspot numbers (Example 1 . 1 .5) and Jet { Y,} denote the mean-corrected series, Y, = X, - 46.93, t = I, . . . , 1 00. The following AR(2) model for { Y,} is obtained by equating the theoretical and sample autocovariances at lags 0, I and 2: Y, - 1 . 3 1 7 Y,-1 + .634 ¥,_2 = z, {Z, } - WN(0,289.3). (These estimated parameter values are called "Yule-Walker" estimates and can be found using the program PEST, option 3.) Determine the spectral density of the fitted model and find the frequency at which it achieves its maximum value. 
What is the corresponding period? (The spectral density of any ARMA process can be computed numerically using the program PEST, option 5.) 4. 1 3. If {X, } and { Y, } are stationary processes satisfying X, - cxX,_ 1 = J.t;, and Y, - cx Y, _ 1 = X, + Z,, where l cx l < I and { I-t; } and {Z, } are uncorrelated, find the spectral density of { Y, }. 4. 1 4. Prove Corollary 4.4. 1 . 4. 1 5. Let {X, } be the MA( l ) process, X, = Z, - 2 Z,_ 1 , Given s > 0, find a positive integer k(s) and constants a0 = I , a 1 , . . . , ak such that the spectral density of the process k Y, = I aj x,_j j�O satisfies sup_ , ,; < ,; • lfr(A) - Var ( Y, )/2n l < s. 4. 1 6. Compute and sketch the spectral density f(2), 0 s 2 s n, of the stationary Problems 161 process {X, } defined by X, - .99X,_ 3 = Z,, { Z, } - WN(O, 1). Does the spectral density suggest that the sample paths of { X, } will exhibit oscillatory behaviour? If so, then what is the approximate period of the oscil­ lation? Compute the spectral density of the filtered process, Y, = t(X, _ t + X, + X, + t l, and compare the numerical values of the spectral densities of { X, } and { Y, } at frequency w = 2n/3 radians per unit time. What effect would you expect the filter to have on the oscillations of { X, } ? 4. 1 7. The spectral density of a real-valued process { X, } is defined on [0, n] by { 1 00, f(A.) = 0, n/6 - .01 < A < n/6 + .01, otherwise, and on [ - n, OJ by f(A.) = f( - /c). (a) Evaluate the covariance function of { X, } at lags 0 and 1 . (b) Find the spectral density of the process { Y, } where (c) What is the variance of Y,? (d) Sketch the power transfer function of the filter V 1 2 and use the sketch to explain the effect of the filter on sinusoids with frequencies (i) near zero and (ii) near rr/6. 4. 1 8. Let { X, } be any stationary series with continuous spectral density f such that 0 :<:; f(A) :<:; K and f(n) # 0. Let f.(A.) denote the spectral density of the differenced series { ( 1 - B)"X, } . (a) Express f.(A.) i n terms o f f,_1 (A.) and hence evaluate f,(/c). (b) Show that f,().)/f.(n) --> 0 as n --> oo for each A E [0, n). (c) What does (b) suggest regarding the behaviour of the sample-paths of { (1 - B)"X, } for large values of n? (d) Plot { ( 1 - B)" X, } for n = 1, 2, 3 and 4, where X,, t = 1, . . . , 100 are the Wolfer sunspot numbers (Example 1 . 1 .5). Do the realizations exhibit the behaviour expected from (c)? Notice the dependence of the sample variance on the order of differencing. (The graphs and the sample variances can be found using the program PEST.) 4. 19. Determine the power transfer function of the time-invariant linear filter with 2a, ljl2 1 and ljli = 0 for j oft 0, 1 or 2. If you wish coefficients ljl0 1, tjl1 to use the filter to suppress sinusoidal oscillations with period 6, what value of IX should you use? If the filter is applied to the process defined in Problem 4.9, what is the spectral distribution function of the filtered process, Y, = X, 2aX,_1 + X,_z? 4.20.* Suppose { Z().), -n :<:; A :<:; rr} is an orthogonal-increment process which is not necessarily right-continuous. Show that for all A E [ - n, n), Z(A. + c5) converges in mean square as c5 !0. Call the limit Z(A. + ) and show that this new process is a right-continuous orthogonal-increment process which is equal to Z(A.) with probability one except possibly at a countable number of values of A in [ - n, n). = = - = 1 62 4. The Spectral Representation of a Stationary Process 4.21 . 
* (a) If { Z, } is an iid sequence of N(O, 1) random variables, show that the associated orthogonal-increment process for { Z,} is given by dZ().) = dB(A.) + dB( - A.) + i(dB( - ) ) - dB(A.)) . where B(A.) is Brownian motion on [ - rc, rc] with a2 = 1/4 (see Example 4.6. 1 ) and where integration relative to dB( - A.) is defined by f- .. . /(A.) dB( - A.) = f- /( . . - A.) dB(A.) for all f E L2([ - rc, rc]). (b) Let X, = A cos(wt) + B sin(wt) + z, where A, B and Z,, t = 0, ± 1, . . . , are iid N(O, 1) random variables. Give a complete description of the orthogonal-increment process associated with {X,}. 4.22.* If X, = f<- 1 e ''v dz(v) where {Z(v), -rc :<;; v :<;; rc } is an orthogonal increment process with associated distribution function F( · ), and if Y, - <P Y,_1 = X, where <P E ( - 1, 1 ), find a function 1/J( · ) such that •. • Y, = I J(-n . n] e ''vi/J(v) dZ(v). Hence express E( Y,+ h X,) as an integral with respect to F. Evaluate the integral in the special case when F(v) = a2(v + rc)/2rc, - rc :<;; v :<;; rc. :<;; v :<;; rc} be an orthogonal increment process with associated distribution function F( · ) and suppose that 1/J E L 2 (F). (a) Show that 4.23.* Let {Z(v), - rc W(v) = L .. vJ 1/l(A.) dZ(A.), - rc :<;; v :<;; rc, is an orthogonal increment process with associated distribution function, G(v) = L . . vJ 1 1/J(A.W dF(A.). (b) Show that if g E L2(G) then gi/J E L2(F) and L ... J g(A.) dW(A.) = L . . . J g(),)I/J(A.) dZ(),). (c) Show that if 1 1/1 1 > 0 (except possibly on a set of F-measure zero), then Z(v) - Z( - rc) =I J v - n, ) 1 -- d W(A.), 1/J(A.) - TC :<;; V :<;; TC. 4.24. * If {X, } is the stationary process with spectral representation, t = 0, ± 1, . . . , Problems 1 63 where E l dZx(vW = I ¢(vW dF(v), F is a distribution function on [ - n, n], ¢ almost everywhere relative to dF, and ¢ E L 2(F), show that Y, = I J(-1t,1t] ei"r 1 (v) dZx(A), t = 0, # 0 ± I, . . . , is a stationary process with spectral distribution function F. 4.25.* Suppose that { X, } is a real stationary process with mean zero, spectral distribution function F and spectral representation X, = Jr - •. •l e;'.dZ(v). Let Im { Z(A)} and assume that F is continuous at U(A) Re { Z(A)} and V(J,) 0 and n. (a) Show that dF(A) = dF( - A), i.e. Jr • . •1 g(A) dF(A) = Jr _ •.• 1 g(),) dF( - A) for all g E L 2 (F), where integration with respect to F( - A) is defined by = = - _ I J [ - • . •l I g( - A) dF(A). J[- • . •l Jr- •.•1 g(A) dZ(A) Jr- 1 g(A) dZ( - X) for g(A) dF( - ),) = •. • (b) Show that dZ(A) = dZ( - A), i.e. all g E L 2 (F), where integration with respect to Z( - A) is defined by = l -•. •l g(A) dZ( - A) = l -•.•l g( - A) dZ(A). Deduce that dU(A) = dU( - J,) and d V(X) = - d V( - A). (c) Show that { U(A), 0 :<::; A :<::; n} and { V(A), 0 :<::; A :<::; n} are orthogonal increment processes such that for all ), and J.t, E(dU(X) dU(p)) = 2 - 1 b;_, � dF( A.), E(dV(X) dV(p)) = 2 - 1 3;_, � dF(A), and E(dU(A) dV(p)) (d) Show that X, = and that X, = 2 I J[-n, n] I J [O,x] = 0. cos(vt) dU(v) + I J [ - x, x] cos(vt) dU(v) + 2 I J [O, x] sin(vt) dV(v) sin(vt) dV(v). 4.26.* Use the properties (4.9.5) and (4.9.6) to establish Proposition 4.9 . 1 . 4.27.* Let { Z(v), - n :<::; v :<::; n } be an orthogonal-increment process with E IZ(v2 ) ­ Z(v1 W = a(v2 - v1 ), v2 ;::o: v1 . Show that Y, = I J(-x,x ] (n - I v l/2r l e ivr dZ(v), 4. The Spectral Representation of a Stationary Process 1 64 is a stationary process and determine its spectral density and variance. 
If X, = f (-rr,n:] eivt dZ(v), find the coefficients of the time-invariant linear filter which, when applied to { Y; }, gives {X, } . 4.28. * Show that i f 1/1( · ) and 0 ( · ) are polynomials with n o common zeroes and if ¢(z) = 0 for some z E IC such that l z l = I, then the ARMA equations 1/i (B)X, = O(B)Z,, have no stationary solution. (Assume the existence of a stationary solution and then use the relation between the spectral distributions of {X, } and { Z, } to derive a contradiction.) 4.29.* Let {X,} be the stationary solution of the ARMA(p, q) equations, ¢(B)X, = O(B)Z, where ¢(z) = and l ai l ( I - a; 1 z) · · · ( 1 - a,- 1 z)(1 - a,-+\z) · · · ( 1 - a; 1 z), > I, i = 1, . . . , r; l ad < 1 , i = r + I, . . . , p. Define �(z) = (I - a; 1 z) · · · ( I - a,- 1 z)(l - a, + 1 z) · · · ( 1 - apz), and show (by computing the spectral density) that {tj(B)X,} has the same autocovariance function as { I a, + 1 · · · aP I 2 0(B)Z,}. It follows (see Section 3.2) that there exists a white noise process {Z,} in terms of which {X,} has the causal representation, �(B)X, = O(B)Z,. What is the variance of Z,? 4.30.* Prove Theorem 4. 1 . 1 . (To establish the sufficiency of condition (4. 1 . 6), let K 1 and K 2 be the real and imaginary parts respectively of the Hermitian function K and let 0"1 be the 2n x 2n-matrix, where K�"1 = [K,(i - j)J7.j= l , r = 1 , 2. Show that U"1 is a real symmetric non-negative definite matrix and let ( Y1 , • . • , Y;,, 0 and covariance matrix 0"1• Define the family of distribution functions, Z1 , . . . , Z")' be a Gaussian random vector with mean Problems 1 65 n E { 1, 2, . . . }, t E Z, and use Kolmogorov's theorem to deduce the existence of a bivariate Gaussian process { ( Y, , Z,), t E Z} with mean zero and covariances E( Y,+h Y,) = E(Z,+hZ,) = t K , (h), E( Y, +hZ,) = - E(Z, + h Y, ) = t K 2(h). Conclude by showing that {X, := Y, - iZ, t E Z } is a complex-valued process with autocovariance function K( · ).) 4.3 1 * Let { B(.l.), - n ::s; A ::s; n} be Brownian motion as described in Example 4.6. 1 . If g E L2(d).) is a real-valued symmetric function, show that the process defined by X, = I J(-rr.O] J2 cos(tv)g(v) dB(v) + I J(O,rr] J2 sin(tv)g(v) dB(v), t = O, ± 1 , . . . , is a stationary Gaussian process with spectral density function g2(.l.)a2/(2n). Conversely, suppose {X,} is a real stationary Gaussian process with orthogo­ nal-increment process Z(.l.) and spectral density function f(.l.) which is strictly positive. Show that the process, B(.l.) := I J(O, !.] r l i 2 (v)[dZ(v) + dZ(v)] , ;, E [0, n] , is Brownian motion. 4.32.* Show that for any spectral c.d.f. F on [ - n, n] , the function D.( - - O)j(2n + 1 ) (see Section 2. 1 1) converges i n U(F) t o the indicator function o f the set { 0}. Use this result to deduce, in the notation of Theorem 4.9. 1 , that Z(O) - Z(O - ), - n < 0 ::s; n, is the mean square limit as n ..... oo of (2n + W ' I Xj exp( - ijO). Ul $ n CHAPTER 5 Prediction of Stationary Processes In this chapter we investigate the problem of predicting the values { X1, t ;::: n + 1 } of a stationary process in terms of {X b . . . , X" } . The idea is to utilize observations taken at or before time n to forecast the subsequent behaviour P), the best predictor in uH of { X1 }. 
Given any closed subspace uH of of xn+ h is defined to be the element of uH with minimum mean-square distance from Xn+ h · This of course is not the only possible definition of "best", but for processes with finite second moments it leads to a theory of prediction which is simple, elegant and useful in practice. (In Chapter 1 3 we shall introduce alternative criteria which are needed for the prediction of processes with in­ finite second-order moments.) In Section 2.7, we showed that the projections P41<x1 x " > Xn+ h and P5P{ l . x 1 , , x " } Xn+ h are respectively the best function of XI , . . . , xn and the best linear combination of 1 , XI , . . . , xn for predicting Xn+ h · For the reasons given in Section 2.7 we shall concentrate almost ex­ clusively on predictors of the latter type (best linear predictors) instead of attempting to work with conditional expectations. U(Q, .?, • • • • • • • • §5. 1 The Prediction Equations in the Time Domain = Let { X1 } be a stationary process with mean J1 and autocovariance function y( · ). Then the process { 1; } { X1 - J1} is a zero-mean stationary process with autocovariance function y ( - ) and it is not difficult to show (Problem 5. 1) that (5. 1 . 1 ) Throughout this chapter we shall assume therefore, without loss of generality, that J1 0. Under this assumption it is clear from (5. 1 . 1 ) that (5. 1 .2) = §5. 1. The Prediction Equations in the Time Domain 1 67 Equations for the One-Step Predictors denote the closed linear subspace sp {X 1 , . . . ,X.}, n nLet Yf,.0, denote the one-step predictors, defined by if n = 0, {0 Xn +! = P:rc, Xn+! if n 1 . Since xn+ ! E Yf,., n 1 , we can write n 1, Xn+ I = rPn! Xn + · · · + rPnn X ! , � 1, and let X. +1, � � (5. 1 .3) � � (5. 1 .4) � where r/J. 1 , . . . , rPnn satisfy the prediction equations (2.3.8), viz. ) I\ t=l .f rPni xn+! - i , xn +! -j = <Xn+! , xn+ ! -j ), j = 1, . . . , n, with <X , Y ) = E (X Y). By the linearity of the inner product these equations can be rewritten in the form, n i=l L rPniY (i - j) = y(j), j = 1, . . . , n, or equivalently .. (5. 1 .5) where r. = [y(i - j )];_j= !, · " ' 'Yn = (y(l ), . . . , y(n))' and cp. = (r/J. 1 , . . . , r!J•• ) . The projection theorem (Theorem 2.3. 1 ) guarantees that equation (5.1 .5) has at least one solution since xn+l must be expressible in the form (5. 1 .4) for some cp. E IR " . Equations (5. 1 .4) and (5.1 .5) are known as the one-step prediction equations. Although there may be many solutions of(5. 1 .5), every one of them, when substituted in (5. 1 .4), must give the same predictor xn+! since we know (also from Theorem 2.3. 1 ) that Xn + I is uniquely defined. There is exactly one solution of (5. 1 . 5) if and only if r. is non-singular, in which case the solution is ' (5. 1 .6) The conditions specified in the following proposition are sufficient to ensure that r. is non-singular for every n. Proposition 5.1.1. h->ngularthenfortheevery covarin. ance matrix If y(O)of>(X0 and. . .y,(h).->' i0s asnon-si r. = [y ( - j )]i,j=!, . . . ,n i oo I , x) PROOF. Suppose that r. is singular for some n . Then since EX, = 0 there exists such that r, is non-singular and an integer � 1 and real constants r a 1 , ... , a, r x, +! = I ajxj j= ! (see Problem 1 . 1 7). By stationarity we then have 1 68 5. Prediction of Stationary Processes r xr + h = L1 aj xj + h - 1 , for all h � I, j= and consequently for all n � r + 1 there exist real constants ai">, . . . , a�•l, such that (5. 1 .7) where X, = (X 1 , . • . , X,)' and a<•l = (a\">, . . . , a� l)'. 
Now from (5. 1 .7) y(O) = a<•lT,a<•l " = a<•l' PAP'a<•l, where (see Proposition 1 .6.3) A is a diagonal matrix whose entries are the strictly positive eigenvalues .1. 1 :s; .1. 2 :s; · · · :s; .1., of r, and P P' is the identity matrix. Hence r = ,1. 1 L (aj"l) 2 , j= 1 which shows that for each fixed j, a)" l is a bounded function of n. We can also write y(O) = Cov(X., L 'i= 1 aj"l XJ, from which it follows that y(O) :S:: r L l aj"ll ly(n - j)l. j= 1 In view of this inequality and the boundedness of aj "l, it is clearly not possible to have y(O) > 0 and y(h) -+ 0 as h -+ oo if r. is singular for some n. This completes the proof. D Corollary 5.1 . 1 . Under the conditions of Proposition 5. 1 . 1 , the best linear pre­ dictor x.+ 1 of x. +1 in terms of X 1 , . . . , X. is n n = 1, 2, . . . , Xn+1 = L rPni Xn+ 1 - i • i= l where cJl. := (r/J. 1 , . . . , r/J••r = r.- 1 Ym y. := (y(1 ), . . . , y(n))' and r. = [y(i - j)L= 1 .....• . The mean squared error is v. = y(O) - y� r; 1 y• . PROOF. The result is an immediate consequence of (5. 1 .5) and Proposition 5. 1 . 1 . D Equations for the h-Step Predictors, h � 1 The best linear predictor of Xn+ h in terms of X 1 , found in exactly the same manner as x. +1 • Thus • • • , x. for any h � 1 can be n, h � 1 , (5. 1 .8) 1 69 §5.2. Recursive Methods for Computing Best Linear Predictors where q,�h ) = (ifJ���, . . . , ifJ���)' is any solution (unique if r" is non-singular) of where y�h ) = (y (h), y(h + 1), ... , y(n + h - 1 ))'. (5.1.9) §5.2 Recursive Methods for Computing Best Linear Predictors In this section we establish two recursive algorithms for determining the and show how they can be one-step predictors Xn+ l , :2: defined by used also to compute the h-step predictors P.Yc,Xn+ h ' :2: Recursive pre­ diction is of great practical importance since direct computation of P.Yc,Xn+ h from and requires, for large the solution of a large system of linear equations. Moreover, each time the number of observations is increased, the whole procedure must be repeated. The algorithms to be described in this section however allow us to compute best predictors without having to perform any matrix inversions. Furthermore they utilize the predictors based on observations to compute those based on + observations, . We shall also see in Chapter how the second algorithm greatly facilitates the computation of the exact likelihood of { X1 , . . . , Xn } when the process {X, } i s Gaussian. n 1, (5.1.3), h 1. n, (5.1.8) (5.1.9) ... n n = 1, 2, n 1 8 Recursive Prediction Using the Durbin-Levinson Algorithm n :2: 1, we can express Xn+l in the form, n :2: 1. (5.2.1) The mean squared error of prediction will be denoted by v" . Thus n :2: 1, (5.2.2) and clearly v0 = y (O). The algorithm specified in the following proposition, known as the Durbin or Levinson algorithm, is a recursive scheme for computing «Pn = ( n , , cPnnY and vn for n = 1, 2, ... . Proposition (The Durbin-Levinson Algorithm). If {X, } is a zero mean stationary process withtheautocovariance function y( such that y (O) > 0 and coefficients cPnj and mean squared errors vn as defined yby(h)(5.--->2.1)0 ashand--->(5.2.2then ) satisfy y(1)/y(O), v0 y(O), (5.2.3) Since Xn+l = P.Yc,Xn+ l E j'f,, c/J 1 5.2.1 oo, ·) c/J 1 1 = = • . . 1 70 5. Prediction of Stationary Processes [l/Jnl: ] = [l/Jn-:1,1 ] - Y'nn [l/Jn-: l,n-1] • • and l/Jn, n - 1 l/Jn -l , n- 1 A. (5.2.4) • l/Jn- 1 , 1 (5.2.5) $"1 X2 X. $"2 = = sp { sp { X 1 PROOF. By the definition of P:xt , . . . , } and P:xt X 1 } are orthogonal subspaces of Yf,. 
PROOF. Define $\mathcal{K}_1 := \overline{sp}\{X_2,\dots,X_n\}$ and $\mathcal{K}_2 := \overline{sp}\{X_1 - P_{\mathcal{K}_1}X_1\}$. These are orthogonal subspaces of $\mathcal{H}_n = \overline{sp}\{X_1,\dots,X_n\}$. Moreover it is easy to see that for any $Y \in L^2(\Omega,\mathcal{F},P)$, $P_{\mathcal{H}_n}Y = P_{\mathcal{K}_1}Y + P_{\mathcal{K}_2}Y$. Hence

$\hat{X}_{n+1} = P_{\mathcal{K}_1}X_{n+1} + a(X_1 - P_{\mathcal{K}_1}X_1)$,   (5.2.6)

where

$a = \langle X_{n+1}, X_1 - P_{\mathcal{K}_1}X_1\rangle / \|X_1 - P_{\mathcal{K}_1}X_1\|^2$.   (5.2.7)

Now by stationarity, $(X_1,\dots,X_n)'$ has the same covariance matrix as both $(X_n,X_{n-1},\dots,X_1)'$ and $(X_2,\dots,X_{n+1})'$, so that

$P_{\mathcal{K}_1}X_1 = \sum_{j=1}^{n-1}\phi_{n-1,j}X_{j+1}$,   (5.2.8)

$P_{\mathcal{K}_1}X_{n+1} = \sum_{j=1}^{n-1}\phi_{n-1,j}X_{n+1-j}$,   (5.2.9)

and

$\|X_1 - P_{\mathcal{K}_1}X_1\|^2 = \|X_{n+1} - P_{\mathcal{K}_1}X_{n+1}\|^2 = \|X_n - \hat{X}_n\|^2 = v_{n-1}$.   (5.2.10)

From equations (5.2.6), (5.2.8) and (5.2.9) we obtain

$\hat{X}_{n+1} = aX_1 + \sum_{j=1}^{n-1}\big[\phi_{n-1,j} - a\phi_{n-1,n-j}\big]X_{n+1-j}$,   (5.2.11)

where, from (5.2.7) and (5.2.8),

$a = \Big[\gamma(n) - \sum_{j=1}^{n-1}\phi_{n-1,j}\gamma(n-j)\Big]v_{n-1}^{-1}$.

In view of (5.1.6) and Proposition 5.1.1, the assumption that $\gamma(h) \to 0$ as $h \to \infty$ guarantees that the representation

$\hat{X}_{n+1} = \sum_{j=1}^n \phi_{nj}X_{n+1-j}$   (5.2.12)

is unique. Comparing coefficients in (5.2.11) and (5.2.12) we therefore deduce that

$\phi_{nn} = a$   (5.2.13)

and

$\phi_{nj} = \phi_{n-1,j} - a\phi_{n-1,n-j}$, $j = 1,\dots,n-1$,   (5.2.14)

in accordance with (5.2.3) and (5.2.4). It remains only to establish (5.2.5). The mean squared error of the predictor $\hat{X}_{n+1}$ is

$v_n = \|X_{n+1} - \hat{X}_{n+1}\|^2 = \|X_{n+1} - P_{\mathcal{K}_1}X_{n+1} - P_{\mathcal{K}_2}X_{n+1}\|^2 = \|X_{n+1} - P_{\mathcal{K}_1}X_{n+1}\|^2 + \|P_{\mathcal{K}_2}X_{n+1}\|^2 - 2\langle X_{n+1} - P_{\mathcal{K}_1}X_{n+1},\, P_{\mathcal{K}_2}X_{n+1}\rangle = v_{n-1} + a^2v_{n-1} - 2a\langle X_{n+1}, X_1 - P_{\mathcal{K}_1}X_1\rangle$,

where we have used (5.2.10), the orthogonality of $\mathcal{K}_1$ and $\mathcal{K}_2$, and the fact that $P_{\mathcal{K}_2}X_{n+1} = a(X_1 - P_{\mathcal{K}_1}X_1)$. Finally from (5.2.7) we obtain $v_n = v_{n-1}(1 - a^2)$ as required. □

In Section 3.4 we gave two definitions of the partial autocorrelation of $\{X_t\}$ at lag $n$, viz.

$\alpha(n) = \mathrm{Corr}\big(X_{n+1} - P_{\overline{sp}\{X_2,\dots,X_n\}}X_{n+1},\; X_1 - P_{\overline{sp}\{X_2,\dots,X_n\}}X_1\big)$

and

$\alpha(n) = \phi_{nn}$.

In the following corollary we establish the equivalence of these two definitions under the conditions of Proposition 5.2.1.

Corollary 5.2.1 (The Partial Autocorrelation Function). Under the assumptions of Proposition 5.2.1,

$\phi_{nn} = \mathrm{Corr}\big(X_{n+1} - P_{\mathcal{K}_1}X_{n+1},\; X_1 - P_{\mathcal{K}_1}X_1\big)$.

PROOF. Since $P_{\mathcal{K}_1}X_{n+1} \perp (X_1 - P_{\mathcal{K}_1}X_1)$, equations (5.2.13), (5.2.7) and (5.2.10) give

$\phi_{nn} = \langle X_{n+1}, X_1 - P_{\mathcal{K}_1}X_1\rangle/\|X_1 - P_{\mathcal{K}_1}X_1\|^2 = \langle X_{n+1} - P_{\mathcal{K}_1}X_{n+1}, X_1 - P_{\mathcal{K}_1}X_1\rangle/\|X_1 - P_{\mathcal{K}_1}X_1\|^2 = \mathrm{Corr}\big(X_{n+1} - P_{\mathcal{K}_1}X_{n+1}, X_1 - P_{\mathcal{K}_1}X_1\big)$. □

Recursive Prediction Using the Innovations Algorithm

The central idea in the proof of Proposition 5.2.1 was the decomposition of $\mathcal{H}_n$ into the two orthogonal subspaces $\mathcal{K}_1$ and $\mathcal{K}_2$. The second recursion, established below as Proposition 5.2.2, depends on the decomposition of $\mathcal{H}_n$ into $n$ orthogonal subspaces by means of the Gram-Schmidt procedure. Proposition 5.2.2 is more generally applicable than Proposition 5.2.1, since we allow $\{X_t\}$ to be a possibly non-stationary process with mean zero and autocovariance function

$K(i,j) = \langle X_i, X_j\rangle = E(X_iX_j)$.

As before, we define $\mathcal{H}_n = \overline{sp}\{X_1,\dots,X_n\}$, $\hat{X}_{n+1}$ as in (5.1.3), and $v_n = \|X_{n+1} - \hat{X}_{n+1}\|^2$. Clearly (defining $\hat{X}_1 = 0$), $\hat{X}_{n+1} \in \overline{sp}\{X_1 - \hat{X}_1,\dots,X_n - \hat{X}_n\}$, so that

$\hat{X}_{n+1} = \sum_{j=1}^n \theta_{nj}(X_{n+1-j} - \hat{X}_{n+1-j})$, $n \ge 1$.   (5.2.15)

We now establish the recursive scheme for computing $\{\theta_{nj}, j = 1,\dots,n;\, v_n\}$, $n = 1, 2, \dots$.

Proposition 5.2.2 (The Innovations Algorithm). If $\{X_t\}$ has zero mean and $E(X_iX_j) = K(i,j)$, where the matrix $[K(i,j)]_{i,j=1}^n$ is non-singular for each $n = 1, 2, \dots$, then the one-step predictors $\hat{X}_{n+1}$, $n \ge 0$, and their mean squared errors $v_n$, $n \ge 1$, are given by (5.2.15) and by $v_0 = K(1,1)$,

$\theta_{n,n-k} = v_k^{-1}\Big[K(n+1,k+1) - \sum_{j=0}^{k-1}\theta_{k,k-j}\theta_{n,n-j}v_j\Big]$, $k = 0, 1,\dots,n-1$,

$v_n = K(n+1,n+1) - \sum_{j=0}^{n-1}\theta_{n,n-j}^2v_j$.   (5.2.16)

(It is a trivial matter to solve (5.2.16) recursively in the order $v_0$; $\theta_{11}, v_1$; $\theta_{22}, \theta_{21}, v_2$; $\theta_{33}, \theta_{32}, \theta_{31}, v_3$; ….)
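As with the Durbin-Levinson recursions, (5.2.15) and (5.2.16) are straightforward to implement. The sketch below (ours, not part of the original text) takes the covariance kernel $K(i,j)$ as a function and returns the coefficients $\theta_{nj}$, the mean squared errors $v_n$, and the one-step predictors $\hat{X}_{n+1}$ for an observed series; for a stationary process one simply supplies $K(i,j) = \gamma(i-j)$.

```python
def innovations(K, x):
    """Innovations algorithm (5.2.15)-(5.2.16).

    K : covariance kernel, K(i, j) = E(X_i X_j), with 1-based arguments as in the text.
    x : observed values x_1, ..., x_N of a zero-mean series.
    Returns (theta, v, xhat) with theta[n][j-1] = theta_nj, v[n] = v_n,
    and xhat[n] the one-step predictor of X_{n+1} based on X_1, ..., X_n.
    """
    N = len(x)
    v = [K(1, 1)]                        # v_0 = K(1, 1)
    theta = [[]]                         # theta[0] empty (X_1-hat = 0)
    xhat = [0.0]
    for n in range(1, N + 1):
        th = [0.0] * n                   # theta_n1, ..., theta_nn
        for k in range(n):               # compute theta_{n,n-k}, k = 0, ..., n-1
            s = K(n + 1, k + 1) - sum(theta[k][k - 1 - j] * th[n - 1 - j] * v[j]
                                      for j in range(k))
            th[n - 1 - k] = s / v[k]
        v.append(K(n + 1, n + 1) - sum(th[n - 1 - j] ** 2 * v[j] for j in range(n)))
        theta.append(th)
        if n < N:                        # one-step predictor (5.2.15)
            xhat.append(sum(th[j] * (x[n - 1 - j] - xhat[n - 1 - j]) for j in range(n)))
    return theta, v, xhat
```

Example 5.2.1 below specializes these recursions to an MA(1) process, for which only $\theta_{n1}$ is non-zero.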
PROOF. The set $\{X_1 - \hat{X}_1, X_2 - \hat{X}_2,\dots,X_n - \hat{X}_n\}$ is orthogonal, since $(X_i - \hat{X}_i) \in \mathcal{H}_{j-1}$ for $i < j$ and $(X_j - \hat{X}_j) \perp \mathcal{H}_{j-1}$ by the definition of $\hat{X}_j$. Taking the inner product on both sides of (5.2.15) with $X_{k+1} - \hat{X}_{k+1}$, $0 \le k < n$, we have

$\langle \hat{X}_{n+1}, X_{k+1} - \hat{X}_{k+1}\rangle = \theta_{n,n-k}v_k$.

Since $(X_{n+1} - \hat{X}_{n+1}) \perp (X_{k+1} - \hat{X}_{k+1})$, the coefficients $\theta_{n,n-k}$, $k = 0,\dots,n-1$, are given by

$\theta_{n,n-k} = v_k^{-1}\langle X_{n+1}, X_{k+1} - \hat{X}_{k+1}\rangle$.   (5.2.17)

Making use of the representation (5.2.15) with $n$ replaced by $k$, we obtain

$\langle X_{n+1}, X_{k+1} - \hat{X}_{k+1}\rangle = K(n+1,k+1) - \sum_{j=0}^{k-1}\theta_{k,k-j}\langle X_{n+1}, X_{j+1} - \hat{X}_{j+1}\rangle$.   (5.2.18)

Since, by (5.2.17), $\langle X_{n+1}, X_{j+1} - \hat{X}_{j+1}\rangle = v_j\theta_{n,n-j}$, $0 \le j < n$, we can rewrite (5.2.18) in the form

$\theta_{n,n-k} = v_k^{-1}\Big[K(n+1,k+1) - \sum_{j=0}^{k-1}\theta_{k,k-j}\theta_{n,n-j}v_j\Big]$,

as required. By the projection theorem and Proposition 2.3.2,

$v_n = \|X_{n+1} - \hat{X}_{n+1}\|^2 = \|X_{n+1}\|^2 - \|\hat{X}_{n+1}\|^2 = K(n+1,n+1) - \sum_{j=0}^{n-1}\theta_{n,n-j}^2v_j$,

completing the derivation of (5.2.16). □

Remark 1. While the Durbin-Levinson recursion gives the coefficients of $X_1,\dots,X_n$ in the representation $\hat{X}_{n+1} = \sum_{j=1}^n \phi_{nj}X_{n+1-j}$, Proposition 5.2.2 gives the coefficients of the "innovations" $(X_j - \hat{X}_j)$, $j = 1,\dots,n$, in the orthogonal expansion $\hat{X}_{n+1} = \sum_{j=1}^n \theta_{nj}(X_{n+1-j} - \hat{X}_{n+1-j})$. The latter expansion is extremely simple to use and, in the case of ARMA(p,q) processes, can be simplified still further as described in Section 5.3. Proposition 5.2.2 also yields an innovations representation of $X_{n+1}$ itself. Thus, defining $\theta_{n0} = 1$, we can write

$X_{n+1} = \sum_{j=0}^n \theta_{nj}(X_{n+1-j} - \hat{X}_{n+1-j})$, $n = 0, 1, 2, \dots$.

EXAMPLE 5.2.1 (Prediction of an MA(1) Process Using the Innovations Algorithm). If $\{X_t\}$ is the process

$X_t = Z_t + \theta Z_{t-1}$, $\{Z_t\} \sim \mathrm{WN}(0,\sigma^2)$,

then $K(i,j) = 0$ for $|i-j| > 1$, $K(i,i) = \sigma^2(1+\theta^2)$ and $K(i,i+1) = \theta\sigma^2$. From this it is easy to see, using (5.2.16), that $\theta_{nj} = 0$ for $2 \le j \le n$,

$\theta_{n1} = v_{n-1}^{-1}\theta\sigma^2$,

and

$v_n = \big[1 + \theta^2 - v_{n-1}^{-1}\theta^2\sigma^2\big]\sigma^2$.

If we define $r_n = v_n/\sigma^2$, then we can write $\hat{X}_{n+1} = \theta(X_n - \hat{X}_n)/r_{n-1}$, where $r_0 = 1 + \theta^2$ and $r_{n+1} = 1 + \theta^2 - \theta^2/r_n$. Table 5.2.1 illustrates the use of these recursions in computing $\hat{X}_6$ from observations of $X_1,\dots,X_5$ with $\theta = -.9$. Note that $v_n$ is non-increasing in $n$ and, since $\|X_n - \hat{X}_n - Z_n\| \to 0$ as $n \to \infty$, $v_n \to \sigma^2$ (see Problem 5.5). The convergence of $v_n$ to $\sigma^2$ is quite rapid in the example shown in Table 5.2.1.

Table 5.2.1. Calculation of $\hat{X}_{n+1}$ and $v_n$ from Five Observations of the MA(1) Process $X_t = Z_t - .9Z_{t-1}$, $Z_t \sim \mathrm{N}(0,1)$

  n    X_{n+1}   X̂_{n+1}   v_n
  0    -2.58      0         1.810
  1     1.62      1.28      1.362
  2    -0.96     -0.22      1.215
  3     2.62      0.55      1.144
  4    -1.36     -1.63      1.102
  5               -0.22     1.075

EXAMPLE 5.2.2 (Prediction of an MA(1) Process Using the Durbin-Levinson Algorithm). If we apply the Durbin-Levinson algorithm to the problem considered in Example 5.2.1 we obtain

v_0 = 1.810,  φ_11 = -.4972
v_1 = 1.362,  φ_22 = -.3285,  φ_21 = -.6605
v_2 = 1.215,  φ_33 = -.2433,  φ_32 = -.4892,  φ_31 = -.7404
v_3 = 1.144,  φ_44 = -.1914,  φ_43 = -.3850,  φ_42 = -.5828,  φ_41 = -.7870
v_4 = 1.102,  φ_55 = -.1563,  φ_54 = -.3144,  φ_53 = -.4761,  φ_52 = -.6430,  φ_51 = -.8169
v_5 = 1.075,

giving $\hat{X}_6 = \phi_{51}X_5 + \cdots + \phi_{55}X_1 = -0.22$, in agreement with the much simpler calculation based on Proposition 5.2.2 and shown in Table 5.2.1. Note that the constants $\phi_{nn}$, $n = 1, 2,\dots,5$, are the partial autocorrelations at lags 1, 2, ..., 5 respectively.
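The MA(1) recursions of Example 5.2.1 are compact enough to check in a few lines. The sketch below (ours, not part of the original text) regenerates the $\hat{X}_{n+1}$ and $v_n$ columns of Table 5.2.1 for $\theta = -0.9$, $\sigma^2 = 1$.

```python
# Innovations recursions specialized to the MA(1) model of Example 5.2.1:
# r_0 = 1 + theta^2, r_n = 1 + theta^2 - theta^2 / r_{n-1},
# Xhat_{n+1} = theta * (X_n - Xhat_n) / r_{n-1},  v_n = sigma^2 * r_n.
theta, sigma2 = -0.9, 1.0
x = [-2.58, 1.62, -0.96, 2.62, -1.36]            # the five observations of Table 5.2.1

r = [1 + theta ** 2]
xhat = [0.0]
for n in range(1, len(x) + 1):
    xhat.append(theta * (x[n - 1] - xhat[n - 1]) / r[n - 1])
    r.append(1 + theta ** 2 - theta ** 2 / r[n - 1])

print([round(v, 3) for v in xhat])               # [0.0, 1.283, -0.223, 0.546, -1.632, -0.222]
print([round(sigma2 * rn, 3) for rn in r])       # [1.81, 1.362, 1.215, 1.144, 1.102, 1.075]
```

The printed values agree with Table 5.2.1 to the two decimal places shown there.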
Recursive Calculation of the h-Step Predictors, h ≥ 1

Let us introduce the notation $P_n$ for the projection operator $P_{\mathcal{H}_n}$. Then the h-step predictors $P_nX_{n+h}$ can easily be found with the aid of Proposition 5.2.2. By Proposition 2.3.2, for $h \ge 1$,

$P_nX_{n+h} = P_nP_{n+h-1}X_{n+h} = P_n\hat{X}_{n+h}$.

Since $(X_{n+h-j} - \hat{X}_{n+h-j}) \perp \mathcal{H}_n$ for $j < h$, it follows from Proposition 2.3.2 that

$P_nX_{n+h} = \sum_{j=h}^{n+h-1}\theta_{n+h-1,j}(X_{n+h-j} - \hat{X}_{n+h-j})$,   (5.2.19)

where the coefficients $\theta_{nj}$ are determined as before by (5.2.16). Moreover the mean squared error can be expressed as

$E(X_{n+h} - P_nX_{n+h})^2 = \|X_{n+h}\|^2 - \|P_nX_{n+h}\|^2 = K(n+h,n+h) - \sum_{j=h}^{n+h-1}\theta_{n+h-1,j}^2v_{n+h-j-1}$.   (5.2.20)

§5.3 Recursive Prediction of an ARMA(p, q) Process

Proposition 5.2.2 can of course be applied directly to the prediction of the causal ARMA process

$\phi(B)X_t = \theta(B)Z_t$, $\{Z_t\} \sim \mathrm{WN}(0,\sigma^2)$,   (5.3.1)

where as usual $\phi(B) = 1 - \phi_1B - \cdots - \phi_pB^p$ and $\theta(B) = 1 + \theta_1B + \cdots + \theta_qB^q$. We shall see below however that a drastic simplification in the calculations can be achieved if, instead of applying Proposition 5.2.2 directly to $\{X_t\}$, we apply it to the transformed process (cf. Ansley (1979))

$W_t = \sigma^{-1}X_t$, $t = 1,\dots,m$,
$W_t = \sigma^{-1}\phi(B)X_t$, $t > m$,   (5.3.2)

where

$m = \max(p, q)$.   (5.3.3)

For notational convenience we define $\theta_0 = 1$ and assume that $p \ge 1$ and $q \ge 1$. (There is no loss of generality in these assumptions since in the analysis which follows we may take any of the coefficients $\phi_i$ and $\theta_i$ to be zero.) With the subspaces $\mathcal{H}_n$ as defined in Section 5.1, we can write

$\mathcal{H}_n = \overline{sp}\{X_1,\dots,X_n\} = \overline{sp}\{W_1,\dots,W_n\}$, $n \ge 1$.   (5.3.4)

For $n \ge 1$, $\hat{X}_{n+1}$ and $\hat{W}_{n+1}$ will denote the projections on $\mathcal{H}_n$ of $X_{n+1}$ and $W_{n+1}$ respectively. As in (5.1.3) we also define $\hat{X}_1 = \hat{W}_1 = 0$. The autocovariance function $\gamma_X(\cdot)$ of $\{X_t\}$ can easily be computed using any of the methods described in Section 3.3. The autocovariances $K(i,j) = E(W_iW_j)$ are then found from

$K(i,j) = \sigma^{-2}\gamma_X(i-j)$, $1 \le i,j \le m$,
$K(i,j) = \sigma^{-2}\big[\gamma_X(i-j) - \sum_{r=1}^p \phi_r\gamma_X(r - |i-j|)\big]$, $\min(i,j) \le m < \max(i,j) \le 2m$,
$K(i,j) = \sum_{r=0}^q \theta_r\theta_{r+|i-j|}$, $\min(i,j) > m$,
$K(i,j) = 0$, otherwise,   (5.3.5)

where we have adopted the convention $\theta_j = 0$ for $j > q$.

Applying Proposition 5.2.2 to the process $\{W_t\}$ we obtain

$\hat{W}_{n+1} = \sum_{j=1}^n \theta_{nj}(W_{n+1-j} - \hat{W}_{n+1-j})$, $1 \le n < m$,
$\hat{W}_{n+1} = \sum_{j=1}^q \theta_{nj}(W_{n+1-j} - \hat{W}_{n+1-j})$, $n \ge m$,   (5.3.6)

where the coefficients $\theta_{nj}$ and mean squared errors $r_n = E(W_{n+1} - \hat{W}_{n+1})^2$ are found recursively from (5.2.16) with $K$ defined as in (5.3.5). The notable feature of the predictors (5.3.6) is the vanishing of $\theta_{nj}$ when both $n \ge m$ and $j > q$. This is a consequence of (5.2.16) and the fact that $K(n,j) = 0$ if $n > m$ and $|n-j| > q$.

To find $\hat{X}_n$ from $\hat{W}_n$ we observe, by projecting each side of (5.3.2) onto $\mathcal{H}_{t-1}$, that

$\hat{W}_t = \sigma^{-1}\hat{X}_t$, $t = 1,\dots,m$,
$\hat{W}_t = \sigma^{-1}\big[\hat{X}_t - \phi_1X_{t-1} - \cdots - \phi_pX_{t-p}\big]$, $t > m$,   (5.3.7)

which, together with (5.3.2), shows that

$X_t - \hat{X}_t = \sigma[W_t - \hat{W}_t]$ for all $t \ge 1$.   (5.3.8)

Replacing $(W_j - \hat{W}_j)$ by $\sigma^{-1}(X_j - \hat{X}_j)$ in (5.3.6) and then substituting into (5.3.7) we finally obtain

$\hat{X}_{n+1} = \sum_{j=1}^n \theta_{nj}(X_{n+1-j} - \hat{X}_{n+1-j})$, $1 \le n < m$,
$\hat{X}_{n+1} = \phi_1X_n + \cdots + \phi_pX_{n+1-p} + \sum_{j=1}^q \theta_{nj}(X_{n+1-j} - \hat{X}_{n+1-j})$, $n \ge m$,   (5.3.9)

and

$E(X_{n+1} - \hat{X}_{n+1})^2 = \sigma^2E(W_{n+1} - \hat{W}_{n+1})^2 = \sigma^2r_n$,   (5.3.10)

where $\theta_{nj}$ and $r_n$ are found from (5.2.16) with $K$ as in (5.3.5). Equations (5.3.9) determine the one-step predictors $\hat{X}_2, \hat{X}_3, \dots$ recursively.

Remark 1. The covariances $K(i,j)$ of the transformed process $\{W_t\}$ depend only on $\phi_1,\dots,\phi_p$, $\theta_1,\dots,\theta_q$ and not on $\sigma^2$. The same is therefore true of $\theta_{nj}$ and $r_n$.
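Equations (5.3.5), (5.3.9) and (5.3.10) can be combined into a single routine. The following sketch (ours, not part of the original text) assumes the autocovariances $\gamma_X(h)$ have already been computed by one of the methods of Section 3.3 and are supplied as a function; it builds $K(i,j)$ from (5.3.5), runs the innovations recursions, and returns the one-step predictors (5.3.9) together with $r_n$.

```python
def arma_one_step_predictors(phi, theta, sigma2, gamma_x, x):
    """One-step predictors (5.3.9) for a causal ARMA(p,q) process.

    phi, theta : AR and MA coefficients (phi_1..phi_p, theta_1..theta_q);
    sigma2     : white noise variance;
    gamma_x    : function giving the ARMA autocovariances gamma_X(h), h >= 0;
    x          : observations x_1, ..., x_N.
    Returns (xhat, r) with xhat[n] the predictor of X_{n+1} and r[n] = v_n / sigma2.
    """
    p, q, N = len(phi), len(theta), len(x)
    m = max(p, q)
    g = lambda h: gamma_x(abs(h))
    th_full = [1.0] + list(theta) + [0.0] * (2 * q + 2)   # theta_0 = 1, theta_j = 0 for j > q

    def K(i, j):                                          # covariances (5.3.5) of {W_t}
        i, j = min(i, j), max(i, j)
        if j <= m:
            return g(i - j) / sigma2
        if i <= m < j <= 2 * m:
            return (g(i - j) - sum(phi[r - 1] * g(r - abs(i - j))
                                   for r in range(1, p + 1))) / sigma2
        if i > m and abs(i - j) <= q:
            return sum(th_full[r] * th_full[r + abs(i - j)] for r in range(q + 1))
        return 0.0

    # innovations recursions (5.2.16) applied to K
    r, Theta = [K(1, 1)], [[]]
    for n in range(1, N + 1):
        th = [0.0] * n
        for k in range(n):
            s = K(n + 1, k + 1) - sum(Theta[k][k - 1 - j] * th[n - 1 - j] * r[j]
                                      for j in range(k))
            th[n - 1 - k] = s / r[k]
        r.append(K(n + 1, n + 1) - sum(th[n - 1 - j] ** 2 * r[j] for j in range(n)))
        Theta.append(th)

    # predictors (5.3.9)
    xhat = [0.0]
    for n in range(1, N):
        jmax = n if n < m else q
        innov = sum(Theta[n][j - 1] * (x[n - j] - xhat[n - j]) for j in range(1, jmax + 1))
        ar = sum(phi[i - 1] * x[n - i] for i in range(1, p + 1)) if n >= m else 0.0
        xhat.append(ar + innov)
    return xhat, r
```

For the ARMA(1,1) model of Example 5.3.3 below ($\phi = 0.2$, $\theta = 0.4$, $\sigma^2 = 1$, $\gamma_X(0) = 1.375$, $\gamma_X(1) = 0.675$, $\gamma_X(h) = 0.2\gamma_X(h-1)$ for $h \ge 2$), this routine reproduces the $r_n$, $\theta_{n1}$ and $\hat{X}_{n+1}$ columns of Table 5.3.1.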
Remark 2. The representation (5.3.9) for $\hat{X}_{n+1}$ is particularly convenient from a practical point of view, not only because of the simple recursion relations for the coefficients, but also because for $n \ge m$ it requires the storage of at most $p$ past observations $X_n,\dots,X_{n+1-p}$ and at most $q$ past innovations $(X_{n+1-j} - \hat{X}_{n+1-j})$, $j = 1,\dots,q$, in order to predict $X_{n+1}$. Direct application of Proposition 5.2.2 to $\{X_t\}$, on the other hand, leads to a representation of $\hat{X}_{n+1}$ in terms of all the $n$ preceding innovations $(X_j - \hat{X}_j)$, $j = 1,\dots,n$.

Remark 3. It can be shown (see Problem 5.6) that if $\{X_t\}$ is invertible then, as $n \to \infty$, $r_n \to 1$ and $\theta_{nj} \to \theta_j$, $j = 1,\dots,q$.

EXAMPLE 5.3.1 (Prediction of an AR(p) Process). Applying (5.3.9) to the ARMA(p,1) process with $\theta_1 = 0$, we easily find that

$\hat{X}_{n+1} = \phi_1X_n + \cdots + \phi_pX_{n+1-p}$, $n \ge p$.

EXAMPLE 5.3.2 (Prediction of an MA(q) Process). Applying (5.3.9) to the ARMA(1,q) process with $\phi_1 = 0$, we obtain

$\hat{X}_{n+1} = \sum_{j=1}^{\min(n,q)}\theta_{nj}(X_{n+1-j} - \hat{X}_{n+1-j})$, $n \ge 1$,

where the coefficients $\theta_{nj}$ are found by applying the algorithm (5.2.16) to the covariances $K(i,j)$ defined in (5.3.5). Since in this case the processes $\{W_t\}$ and $\{\sigma^{-1}X_t\}$ are identical, these covariances are simply

$K(i,j) = \sigma^{-2}\gamma_X(i-j) = \sum_{r=0}^{q-|i-j|}\theta_r\theta_{r+|i-j|}$.

EXAMPLE 5.3.3 (Prediction of an ARMA(1,1) Process). If

$X_t - \phi X_{t-1} = Z_t + \theta Z_{t-1}$, $\{Z_t\} \sim \mathrm{WN}(0,\sigma^2)$,   (5.3.11)

and $|\phi| < 1$, then equations (5.3.9) reduce to the single equation

$\hat{X}_{n+1} = \phi X_n + \theta_{n1}(X_n - \hat{X}_n)$, $n \ge 1$.   (5.3.12)

To compute $\theta_{n1}$ we first use equations (3.3.8) with $k = 0$ and $k = 1$ to find that $\gamma_X(0) = \sigma^2(1 + 2\theta\phi + \theta^2)/(1 - \phi^2)$. Substituting in (5.3.5) then gives, for $i, j \ge 1$,

$K(i,j) = (1 + 2\theta\phi + \theta^2)/(1 - \phi^2)$ if $i = j = 1$,
$K(i,j) = 1 + \theta^2$ if $i = j \ge 2$,
$K(i,j) = \theta$ if $|i - j| = 1$, $i \ge 1$,
$K(i,j) = 0$ otherwise.

With these values of $K(i,j)$, the recursions (5.2.16) reduce to

$r_0 = (1 + 2\theta\phi + \theta^2)/(1 - \phi^2)$, $\theta_{n1} = \theta/r_{n-1}$, $r_n = 1 + \theta^2 - \theta^2/r_{n-1}$,   (5.3.13)

which are quite trivial to solve (see Problem 5.13).

In Table 5.3.1 we show simulated values of $X_1,\dots,X_{10}$ for the process (5.3.11) with $Z_t \sim \mathrm{N}(0,1)$, $\phi_1 = \phi = 0.2$ and $\theta_1 = \theta = 0.4$. The table also shows the values of $r_n$ and $\theta_{n1}$, $n = 1,\dots,10$, computed from (5.3.13), and the corresponding predicted values $\hat{X}_{n+1}$, $n = 1,\dots,10$, as specified by (5.3.12). Since $\sigma^2 = 1$ in this case, the mean squared errors are $v_n = r_n$.

Table 5.3.1. Calculation of $\hat{X}_{n+1}$ for Data from the ARMA(1,1) Process of Example 5.3.3

  n    X_{n+1}   r_n      θ_{n1}   X̂_{n+1}
  0    -1.100    1.3750             0
  1     0.514    1.0436   0.2909   -0.5400
  2     0.116    1.0067   0.3833    0.5068
  3    -0.845    1.0011   0.3973   -0.1321
  4     0.872    1.0002   0.3996   -0.4539
  5    -0.467    1.0000   0.3999    0.7046
  6    -0.977    1.0000   0.4000   -0.5620
  7    -1.699    1.0000   0.4000   -0.3614
  8    -1.228    1.0000   0.4000   -0.8748
  9    -1.093    1.0000   0.4000   -0.3869
 10               1.0000   0.4000   -0.5010
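Because (5.3.12) and (5.3.13) involve only scalar recursions, Table 5.3.1 can be regenerated in a few lines. The sketch below (ours, not part of the original text) does so for $\phi = 0.2$, $\theta = 0.4$, $\sigma^2 = 1$.

```python
# ARMA(1,1) recursions (5.3.12)-(5.3.13) with phi = 0.2, theta = 0.4, sigma^2 = 1.
phi, theta = 0.2, 0.4
x = [-1.100, 0.514, 0.116, -0.845, 0.872, -0.467, -0.977, -1.699, -1.228, -1.093]

r = [(1 + 2 * theta * phi + theta ** 2) / (1 - phi ** 2)]   # r_0 = 1.375
xhat, th = [0.0], [None]                                    # Xhat_1 = 0; th[n] = theta_n1
for n in range(1, len(x) + 1):
    th.append(theta / r[n - 1])                             # theta_n1 = theta / r_{n-1}
    xhat.append(phi * x[n - 1] + th[n] * (x[n - 1] - xhat[n - 1]))   # (5.3.12)
    r.append(1 + theta ** 2 - theta ** 2 / r[n - 1])        # (5.3.13)

print([round(v, 4) for v in r[:6]])      # [1.375, 1.0436, 1.0067, 1.0011, 1.0002, 1.0]
print([round(v, 4) for v in xhat[1:5]])  # [-0.54, 0.5068, -0.1321, -0.4539]  (cf. Table 5.3.1)
```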
EXAMPLE 5.3.4 (Prediction of an ARMA(2,3) Process). Simulated values of $X_1,\dots,X_{10}$ for the causal ARMA(2,3) process

$X_t - X_{t-1} + 0.24X_{t-2} = Z_t + 0.4Z_{t-1} + 0.2Z_{t-2} + 0.1Z_{t-3}$, $\{Z_t\} \sim \mathrm{WN}(0,1)$,

are shown in Table 5.3.2. In order to find the one-step predictors $\hat{X}_n$, $n = 2,\dots,11$, we first need the covariances $\gamma_X(h)$, $h = 0, 1, 2$, which are easily found from equations (3.3.8) with $k = 0, 1, 2$ to be

$\gamma_X(0) = 7.17133$, $\gamma_X(1) = 6.44139$ and $\gamma_X(2) = 5.06027$.

Substituting in (5.3.5), we find that the symmetric matrix $K = [K(i,j)]_{i,j=1,2,\dots}$ is given (lower triangle of the first eight rows shown) by

  K =  7.17133
       6.44139  7.17133
       5.06027  6.44139  7.17133
       0.10     0.34     0.816    1.21
       0        0.10     0.34     0.50   1.21
       0        0        0.10     0.24   0.50   1.21
       0        0        0        0.10   0.24   0.50   1.21
       0        0        0        0      0.10   0.24   0.50   1.21
       ...                                                            (5.3.14)

The next step is to solve the recursions (5.2.16) with $K(i,j)$ as in (5.3.14) for $\theta_{nj}$ and $r_{n-1}$, $j = 1,\dots,n$; $n = 1,\dots,10$. Then

$\hat{X}_{n+1} = \sum_{j=1}^n \theta_{nj}(X_{n+1-j} - \hat{X}_{n+1-j})$, $n = 1, 2$,

and

$\hat{X}_{n+1} = X_n - 0.24X_{n-1} + \sum_{j=1}^3 \theta_{nj}(X_{n+1-j} - \hat{X}_{n+1-j})$, $n = 3, 4, \dots$.

The results are shown in Table 5.3.2.

Table 5.3.2. Calculation of $\hat{X}_{n+1}$ for Data from the ARMA(2,3) Process of Example 5.3.4

  n    X_{n+1}   r_n      θ_{n1}   θ_{n2}   θ_{n3}   X̂_{n+1}
  0     1.704    7.1713                               0
  1     0.527    1.3856   0.8982                      1.5305
  2     1.041    1.0057   1.3685   0.7056            -0.1710
  3     0.942    1.0019   0.4008   0.1806   0.0139    1.2428
  4     0.555    1.0016   0.3998   0.2020   0.0722    0.7443
  5    -1.002    1.0005   0.3992   0.1995   0.0994    0.3138
  6    -0.585    1.0000   0.4000   0.1997   0.0998   -1.7293
  7     0.010    1.0000   0.4000   0.2000   0.0998   -0.1688
  8    -0.638    1.0000   0.4000   0.2000   0.0999    0.3193
  9     0.525    1.0000   0.4000   0.2000   0.1000   -0.8731
 10               1.0000   0.4000   0.2000   0.1000    1.0638
 11               1.0000   0.4000   0.2000   0.1000
 12               1.0000   0.4000   0.2000   0.1000

h-Step Prediction of an ARMA(p,q) Process, h ≥ 1

As in Section 5.2 we shall use the notation $P_n$ for the projection operator $P_{\mathcal{H}_n}$. Then from (5.2.19) we have

$P_nW_{n+h} = \sum_{j=h}^{n+h-1}\theta_{n+h-1,j}(W_{n+h-j} - \hat{W}_{n+h-j})$.

Using this result and applying the operator $P_n$ to each side of the equations (5.3.2), we conclude that the h-step predictors $P_nX_{n+h}$ satisfy

$P_nX_{n+h} = \sum_{j=h}^{n+h-1}\theta_{n+h-1,j}(X_{n+h-j} - \hat{X}_{n+h-j})$, $1 \le h \le m - n$,
$P_nX_{n+h} = \sum_{i=1}^p \phi_iP_nX_{n+h-i} + \sum_{h \le j \le q}\theta_{n+h-1,j}(X_{n+h-j} - \hat{X}_{n+h-j})$, $h > m - n$.   (5.3.15)

Once the predictors $\hat{X}_1,\dots,\hat{X}_n$ have been computed from (5.3.9), it is a straightforward calculation, with $n$ fixed, to determine the predictors $P_nX_{n+1}, P_nX_{n+2}, P_nX_{n+3},\dots$ recursively from (5.3.15). Assuming that $n > m$, as is invariably the case in practical prediction problems, we have for $h \ge 1$

$P_nX_{n+h} = \sum_{i=1}^p \phi_iP_nX_{n+h-i} + \sum_{h \le j \le q}\theta_{n+h-1,j}(X_{n+h-j} - \hat{X}_{n+h-j})$,   (5.3.16)

where the second term is zero if $h > q$. Expressing $X_{n+h}$ as $\hat{X}_{n+h} + (X_{n+h} - \hat{X}_{n+h})$, we can also write

$X_{n+h} = \sum_{i=1}^p \phi_iX_{n+h-i} + \sum_{j=0}^q \theta_{n+h-1,j}(X_{n+h-j} - \hat{X}_{n+h-j})$,   (5.3.17)

where $\theta_{n0} := 1$ for all $n$. Subtracting (5.3.16) from (5.3.17) gives

$X_{n+h} - P_nX_{n+h} - \sum_{i=1}^p \phi_i(X_{n+h-i} - P_nX_{n+h-i}) = \sum_{j=0}^{h-1}\theta_{n+h-1,j}(X_{n+h-j} - \hat{X}_{n+h-j})$,

and hence

$(X_{n+1} - P_nX_{n+1},\dots,X_{n+h} - P_nX_{n+h})' = \Phi^{-1}\Theta\,(X_{n+1} - \hat{X}_{n+1},\dots,X_{n+h} - \hat{X}_{n+h})'$,   (5.3.18)

where $\Phi$ and $\Theta$ are the lower triangular matrices

$\Phi = [-\phi_{i-j}]_{i,j=1}^h$ ($\phi_0 := -1$, $\phi_j := 0$ if $j > p$ or $j < 0$)

and

$\Theta = [\theta_{n+i-1,i-j}]_{i,j=1}^h$ ($\theta_{n0} := 1$, $\theta_{nj} := 0$ if $j > q$ or $j < 0$).

From (5.3.18) we immediately find that the covariance matrix of the vector $(X_{n+1} - P_nX_{n+1},\dots,X_{n+h} - P_nX_{n+h})'$ of prediction errors is

$\Phi^{-1}\Theta V\Theta'(\Phi^{-1})'$,   (5.3.19)

where $V = \mathrm{diag}(v_n, v_{n+1},\dots,v_{n+h-1})$. It is not difficult to show (Problem 5.7) that $\Phi^{-1}$ is the lower triangular matrix

$\Phi^{-1} = [\chi_{i-j}]_{i,j=1}^h$ ($\chi_0 := 1$, $\chi_j := 0$ if $j < 0$),   (5.3.20)

whose components $\chi_j$, $j \ge 1$, are easily computed from the recursion relations

$\chi_j = \sum_{k=1}^{\min(p,j)}\phi_k\chi_{j-k}$, $j = 1, 2, \dots$.   (5.3.21)

[By writing down the recursion relations for the coefficients in the power series expansion of $1/\phi(z)$ (cf. (3.3.3)), we see in fact that $\sum_{j=0}^\infty \chi_jz^j = (1 - \phi_1z - \cdots - \phi_pz^p)^{-1}$, $|z| \le 1$.]

The mean squared error of the h-step predictor $P_nX_{n+h}$ is then found from (5.3.19) to be

$\sigma_n^2(h) := E(X_{n+h} - P_nX_{n+h})^2 = \sum_{j=0}^{h-1}\Big(\sum_{r=0}^j \chi_r\theta_{n+h-r-1,j-r}\Big)^2 v_{n+h-j-1}$.   (5.3.22)

Assuming invertibility of the ARMA process, we can let $n \to \infty$ in (5.3.16) and (5.3.22) to get the large-sample approximations

$P_nX_{n+h} \approx \sum_{i=1}^p \phi_iP_nX_{n+h-i} + \sum_{j=h}^q \theta_j(X_{n+h-j} - \hat{X}_{n+h-j})$   (5.3.23)

and

$\sigma_n^2(h) \approx \sigma^2\sum_{j=0}^{h-1}\psi_j^2$,   (5.3.24)

where $\sum_{j=0}^\infty \psi_jz^j = \theta(z)/\phi(z)$, $|z| \le 1$.
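The sketch below (ours, not part of the original text) implements the $\psi$-weight recursion and the mean squared error approximation (5.3.24); for the ARMA(2,3) model of Example 5.3.4 it reproduces the two-step mean squared error 2.960 obtained in Example 5.3.5 below. The h-step predictor itself can be iterated from (5.3.23) in the same way.

```python
def psi_weights(phi, theta, H):
    """psi-weights in theta(z)/phi(z) = sum_j psi_j z^j  (cf. (5.3.24))."""
    psi = [1.0]
    for j in range(1, H):
        th = theta[j - 1] if j <= len(theta) else 0.0
        psi.append(th + sum(phi[k - 1] * psi[j - k] for k in range(1, min(len(phi), j) + 1)))
    return psi

def h_step_mse(phi, theta, sigma2, h):
    """Large-sample h-step mean squared error sigma^2 * sum_{j<h} psi_j^2  (5.3.24)."""
    psi = psi_weights(phi, theta, h)
    return sigma2 * sum(p ** 2 for p in psi)

# ARMA(2,3) of Example 5.3.4: phi = (1, -0.24), theta = (0.4, 0.2, 0.1), sigma^2 = 1.
print(h_step_mse([1.0, -0.24], [0.4, 0.2, 0.1], 1.0, 1))   # 1.0     (one step)
print(h_step_mse([1.0, -0.24], [0.4, 0.2, 0.1], 1.0, 2))   # approx. 2.96
print(h_step_mse([1.0, -0.24], [0.4, 0.2, 0.1], 1.0, 3))   # approx. 4.81
```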
EXAMPLE 5.3.5 (Two- and Three-Step Prediction of an ARMA(2,3) Process). We illustrate the use of equations (5.3.16) and (5.3.22) by applying them to the data of Example 5.3.4 (see Table 5.3.2). From (5.3.16) we obtain

$P_{10}X_{12} = \sum_{i=1}^2 \phi_iP_{10}X_{12-i} + \sum_{j=2}^3 \theta_{11,j}(X_{12-j} - \hat{X}_{12-j}) = \phi_1\hat{X}_{11} + \phi_2X_{10} + .2(X_{10} - \hat{X}_{10}) + .1(X_9 - \hat{X}_9) = 1.1217$

and

$P_{10}X_{13} = \sum_{i=1}^2 \phi_iP_{10}X_{13-i} + \theta_{12,3}(X_{10} - \hat{X}_{10}) = \phi_1P_{10}X_{12} + \phi_2\hat{X}_{11} + .1(X_{10} - \hat{X}_{10}) = 1.0062$.

For $k > 13$, $P_{10}X_k$ is easily found recursively from

$P_{10}X_k = \phi_1P_{10}X_{k-1} + \phi_2P_{10}X_{k-2}$.

To find the mean squared error of $P_nX_{n+h}$ we apply (5.3.22) with $\chi_0 = 1$, $\chi_1 = \phi_1 = 1$ and $\chi_2 = \phi_1\chi_1 + \phi_2 = .76$. Using the values of $\theta_{nj}$ and $v_j$ ($= r_j$) in Table 5.3.2, we obtain

$\sigma_{10}^2(2) = E(X_{12} - P_{10}X_{12})^2 = 2.960$

and

$\sigma_{10}^2(3) = E(X_{13} - P_{10}X_{13})^2 = 4.810$.

If we use the large-sample approximations (5.3.23) and (5.3.24), the predicted values $P_{10}X_{10+h}$ and mean squared errors $\sigma_{10}^2(h)$, $h \ge 1$, are unchanged, since the coefficients $\theta_{nj}$, $j = 1, 2, 3$, and the one-step mean squared errors $v_n = r_n\sigma^2$ have attained their asymptotic values (to four decimal places) when $n = 10$.

§5.4 Prediction of a Stationary Gaussian Process; Prediction Bounds

Let $\{X_t\}$ be a zero-mean stationary Gaussian process (see Definition 1.3.4) with covariance function $\gamma(\cdot)$ such that $\gamma(0) > 0$ and $\gamma(h) \to 0$ as $h \to \infty$. By equation (5.1.8) the best linear predictor of $X_{n+h}$ in terms of $\mathbf{X}_n = (X_1,\dots,X_n)'$ is

$P_nX_{n+h} = \boldsymbol{\phi}_n^{(h)\prime}\mathbf{X}_n$, $h \ge 1$.   (5.4.1)

(The calculation of $P_nX_{n+h}$ is most simply carried out recursively with the aid of (5.2.19) or, in the case of an ARMA(p,q) process, by using (5.3.15).) Since $(X_1,\dots,X_n,X_{n+h})'$ has a multivariate normal distribution, it follows from Problem 2.20 that

$P_nX_{n+h} = E_{\mathcal{M}(X_1,\dots,X_n)}X_{n+h} = E(X_{n+h}\mid X_1,\dots,X_n)$.

For a stationary Gaussian process it is clear that the prediction error $\Delta_n(h) := X_{n+h} - P_nX_{n+h}$ is normally distributed with mean zero and variance $\sigma_n^2(h) = E\Delta_n(h)^2$, which can be calculated either from (5.2.20) in the general case, or from (5.3.22) if $\{X_t\}$ is an ARMA(p,q) process.

Denoting by $\Phi_{1-\alpha/2}$ the $(1-\alpha/2)$-quantile of the standard normal distribution function, we conclude from the observations of the preceding paragraph that $X_{n+h}$ lies between the bounds $P_nX_{n+h} \pm \Phi_{1-\alpha/2}\sigma_n(h)$ with probability $(1 - \alpha)$. These bounds are therefore called $(1-\alpha)$-prediction bounds for $X_{n+h}$.

§5.5 Prediction of a Causal Invertible ARMA Process in Terms of $X_j$, $-\infty < j \le n$

It is sometimes useful, primarily in order to approximate $P_nX_{n+h}$ for large $n$, to determine the projection of $X_{n+h}$ onto $\mathcal{M}_n = \overline{sp}\{X_j, -\infty < j \le n\}$. In this section we shall consider this problem in the case when $\{X_t\}$ is a causal invertible ARMA(p,q) process,

$\phi(B)X_t = \theta(B)Z_t$, $\{Z_t\} \sim \mathrm{WN}(0,\sigma^2)$.   (5.5.1)

In order to simplify notation we shall consider $n$ to be a fixed positive integer and define

$\tilde{X}_t := P_{\mathcal{M}_n}X_t$.   (5.5.2)

We can then determine $\tilde{X}_{n+h}$ and $E(X_{n+h} - \tilde{X}_{n+h})^2$ from the following theorem. The quantity $E(X_{n+h} - \tilde{X}_{n+h})^2$ is useful for large $n$ as an approximation to $E(X_{n+h} - P_nX_{n+h})^2$.

Theorem 5.5.1. If $\{X_t\}$ is the causal invertible ARMA process (5.5.1) and $\tilde{X}_t$ is defined by (5.5.2), then

$\tilde{X}_{n+h} = -\sum_{j=1}^\infty \pi_j\tilde{X}_{n+h-j}$   (5.5.3)

and

$\tilde{X}_{n+h} = \sum_{j=h}^\infty \psi_jZ_{n+h-j}$,   (5.5.4)

where $\sum_{j=0}^\infty \pi_jz^j = \phi(z)/\theta(z)$ and $\sum_{j=0}^\infty \psi_jz^j = \theta(z)/\phi(z)$, $|z| \le 1$. Moreover

$E(X_{n+h} - \tilde{X}_{n+h})^2 = \sigma^2\sum_{j=0}^{h-1}\psi_j^2$.   (5.5.5)
PROOF. We know from Theorems 3.1.1 and 3.1.2 that

$Z_{n+h} = X_{n+h} + \sum_{j=1}^\infty \pi_jX_{n+h-j}$   (5.5.6)

and

$X_{n+h} = \sum_{j=0}^\infty \psi_jZ_{n+h-j}$.   (5.5.7)

Applying the operator $P_{\mathcal{M}_n}$ to each side of these equations and using the fact that $Z_{n+k}$ is orthogonal to $\mathcal{M}_n$ for each $k \ge 1$, we obtain equations (5.5.3) and (5.5.4). Then subtracting (5.5.4) from (5.5.7) we find that

$X_{n+h} - \tilde{X}_{n+h} = \sum_{j=0}^{h-1}\psi_jZ_{n+h-j}$,   (5.5.8)

from which the result (5.5.5) follows at once. □

Remark 1. Equation (5.5.3) is the most convenient one for calculation of $\tilde{X}_{n+h}$. It can be solved recursively for $h = 1, 2, 3, \dots$, using the conditions $\tilde{X}_t = X_t$, $t \le n$. Thus

$\tilde{X}_{n+1} = -\sum_{j=1}^\infty \pi_jX_{n+1-j}$,
$\tilde{X}_{n+2} = -\pi_1\tilde{X}_{n+1} - \sum_{j=2}^\infty \pi_jX_{n+2-j}$, etc.

For large $n$, a truncated solution $X_{n+h}^*$, obtained by setting $\sum_{j=n+h}^\infty \pi_jX_{n+h-j} = 0$ in (5.5.3) and solving the resulting equations with $X_t^* = X_t$, $t = 1,\dots,n$, is sometimes used as an approximation to $P_nX_{n+h}$. This procedure gives

$X_{n+1}^* = -\sum_{j=1}^n \pi_jX_{n+1-j}$,
$X_{n+2}^* = -\pi_1X_{n+1}^* - \sum_{j=2}^{n+1} \pi_jX_{n+2-j}$, etc.

The mean squared error of $\tilde{X}_{n+h}$ as specified by (5.5.5) is also sometimes used as an approximation to $E(X_{n+h} - P_nX_{n+h})^2$. The approximation (5.5.5) is in fact simply the large-sample approximation (5.3.24) to the exact mean squared error (5.3.22) of $P_nX_{n+h}$.

Remark 2. For an AR(p) process, equation (5.5.3) leads to the expected result

$\tilde{X}_{n+1} = \phi_1X_n + \cdots + \phi_pX_{n+1-p}$,

with mean squared error $\sigma^2$. For an MA(1) process, (5.5.3) gives

$\tilde{X}_{n+1} = -\sum_{j=1}^\infty (-\theta_1)^jX_{n+1-j}$,

with mean squared error $E(X_{n+1} - \tilde{X}_{n+1})^2 = \sigma^2$. The "truncation" approximation to $P_nX_{n+1}$ for the MA(1) process is

$X_{n+1}^* = -\sum_{j=1}^n (-\theta_1)^jX_{n+1-j}$,

which may be a poor approximation if $|\theta_1|$ is near one.

Remark 3. For fixed $n$, the prediction errors $X_{n+h} - \tilde{X}_{n+h}$, $h = 1, 2, \dots$, are not uncorrelated. In fact it is clear from (5.5.8) that the covariance of the h-step and k-step prediction errors is

$E\big[(X_{n+k} - \tilde{X}_{n+k})(X_{n+h} - \tilde{X}_{n+h})\big] = \sigma^2\sum_{j=0}^{h-1}\psi_j\psi_{j+k-h}$ for $k \ge h$.   (5.5.9)

The corresponding covariance of $(X_{n+k} - P_nX_{n+k})$ and $(X_{n+h} - P_nX_{n+h})$ is rather more complicated, but it can be found from (5.3.19).
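The truncated predictor of Remark 1 is easy to compute once the $\pi$-weights are available. The following sketch (ours, not part of the original text; the MA(1) check values are our illustration) computes $\pi_j$ from $\pi(z) = \phi(z)/\theta(z)$ and evaluates $X_{n+1}^*$; in the MA(1) case of Remark 2 one has $\pi_j = (-\theta_1)^j$, which gives a convenient check.

```python
def pi_weights(phi, theta, N):
    """pi-weights in phi(z)/theta(z) = sum_j pi_j z^j  (Theorem 5.5.1)."""
    pi = [1.0]
    for j in range(1, N + 1):
        ph = -phi[j - 1] if j <= len(phi) else 0.0          # coefficient of z^j in phi(z)
        pi.append(ph - sum(theta[k - 1] * pi[j - k]
                           for k in range(1, min(len(theta), j) + 1)))
    return pi

def truncated_one_step(phi, theta, x):
    """Truncated predictor X*_{n+1} = -sum_{j=1}^{n} pi_j x_{n+1-j}  (Remark 1)."""
    n = len(x)
    pi = pi_weights(phi, theta, n)
    return -sum(pi[j] * x[n - j] for j in range(1, n + 1))

# MA(1) check: phi(z) = 1, theta(z) = 1 + 0.5 z, so pi_j = (-0.5)^j.
print(pi_weights([], [0.5], 4))                  # [1.0, -0.5, 0.25, -0.125, 0.0625]
print(truncated_one_step([], [0.5], [1.0, 2.0, -1.0]))   # -0.875
```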
§5.6* Prediction in the Frequency Domain

If $\{X_t, t \in \mathbb{Z}\}$ is a zero-mean stationary process with spectral distribution function $F$ and associated orthogonal-increment process $\{Z(\lambda), -\pi \le \lambda \le \pi\}$, then the mapping $I$ defined by

$I(g) = \int_{(-\pi,\pi]} g(\lambda)\,dZ(\lambda)$, $g \in L^2(F)$,   (5.6.1)

is an isomorphism of $L^2(F)$ onto $\mathcal{H} = \overline{sp}\{X_t, t \in \mathbb{Z}\}$ (see Remark 1 of Section 4.8), with the property that

$I(e^{it\cdot}) = X_t$, $t \in \mathbb{Z}$.   (5.6.2)

This isomorphism allows us to compute projections (i.e. predictors) in $\mathcal{H}$ by computing projections in $L^2(F)$ and then applying the mapping $I$. For example the best linear predictor $P_{\mathcal{M}_n}X_{n+h}$ of $X_{n+h}$ in $\mathcal{M}_n = \overline{sp}\{X_t, -\infty < t \le n\}$ can be expressed (see Section 2.9) as

$P_{\mathcal{M}_n}X_{n+h} = I\big(P_{\overline{sp}\{\exp(it\cdot),\,-\infty<t\le n\}}e^{i(n+h)\cdot}\big)$.   (5.6.3)

The calculation of the projection on the right, followed by application of the mapping $I$, is illustrated in the following examples.

EXAMPLE 5.6.1. Suppose that $\{X_t\}$ has mean zero and spectral density $f$ such that

$f(\lambda) = 0$, $1 < |\lambda| \le \pi$.   (5.6.4)

It is simple to check that $|1 - e^{-i\lambda}| < 1$ for $\lambda \in E := [-1,1]$ and hence that the series

$\sum_{k=0}^\infty (1 - e^{-i\lambda})^k$   (5.6.5)

converges uniformly on $E$ to $[1 - (1 - e^{-i\lambda})]^{-1} = e^{i\lambda}$. Consequently

$e^{i\lambda(n+1)} = \sum_{k=0}^\infty\sum_{j=0}^k \binom{k}{j}(-1)^je^{i\lambda(n-j)}$,   (5.6.6)

with the series on the right converging uniformly on $E$. By (5.6.4) the series (5.6.6) converges also in $L^2(F)$ to a limit which is clearly an element of $\overline{sp}\{\exp(it\cdot), -\infty < t \le n\}$. Projecting each side of (5.6.6) onto this subspace, we therefore obtain

$P_{\overline{sp}\{\exp(it\cdot),\,-\infty<t\le n\}}e^{i(n+1)\cdot} = e^{i(n+1)\cdot} = \sum_{k=0}^\infty\sum_{j=0}^k \binom{k}{j}(-1)^je^{i(n-j)\cdot}$.   (5.6.7)

Applying the mapping $I$ to this equation and using (5.6.2) and (5.6.3), we conclude that

$P_{\mathcal{M}_n}X_{n+1} = \sum_{k=0}^\infty\sum_{j=0}^k \binom{k}{j}(-1)^jX_{n-j}$.   (5.6.8)

Computation of this predictor using the time-domain methods of Section 5.1 is a considerably more difficult task.

EXAMPLE 5.6.2 (Prediction of an ARMA Process). Consider the causal invertible ARMA process

$\phi(B)X_t = \theta(B)Z_t$, $\{Z_t\} \sim \mathrm{WN}(0,\sigma^2)$,

with spectral density

$f(\lambda) = a(\lambda)\overline{a(\lambda)}$,   (5.6.9)

where

$a(\lambda) = (2\pi)^{-1/2}\sigma\sum_{k=0}^\infty \psi_ke^{-ik\lambda}$   (5.6.10)

and $\sum_{k=0}^\infty \psi_kz^k = \theta(z)/\phi(z)$, $|z| \le 1$. Convergence of the series in (5.6.10) is uniform on $[-\pi,\pi]$ since, by the causality assumption, $\sum_k |\psi_k| < \infty$. The function $g(\cdot) = P_{\overline{sp}\{\exp(it\cdot),\,t\le n\}}e^{i(n+h)\cdot}$ must satisfy

$\int_{-\pi}^\pi \big(e^{i(n+h)\lambda} - g(\lambda)\big)e^{-im\lambda}a(\lambda)\overline{a(\lambda)}\,d\lambda = 0$, $m \le n$.   (5.6.11)

This equation implies that $(e^{i(n+h)\cdot} - g(\cdot))a(\cdot)\overline{a(\cdot)}$ is an element of the subspace $\mathcal{M}_+ = \overline{sp}\{\exp(im\cdot), m > n\}$ of $L^2([-\pi,\pi],\mathcal{B},d\lambda)$. Noting from (5.6.10) that $1/a(\cdot) \in \overline{sp}\{\exp(-im\cdot), m \ge 0\}$, we deduce that the function $(e^{i(n+h)\cdot} - g(\cdot))a(\cdot)$ is also an element of $\mathcal{M}_+$. Let us now write

$e^{i(n+h)\lambda}a(\lambda) = g(\lambda)a(\lambda) + \big(e^{i(n+h)\lambda} - g(\lambda)\big)a(\lambda)$,   (5.6.12)

observing that $g(\cdot)a(\cdot)$ is orthogonal to $\mathcal{M}_+$ (in $L^2(d\lambda)$). But from (5.6.10),

$e^{i(n+h)\lambda}a(\lambda) = (2\pi)^{-1/2}\sigma e^{in\lambda}\sum_{k=-h}^\infty \psi_{k+h}e^{-ik\lambda}$,   (5.6.13)

and since the element $e^{i(n+h)\cdot}a(\cdot)$ of $L^2(d\lambda)$ has a unique representation as a sum of two components, one in $\mathcal{M}_+$ and one orthogonal to $\mathcal{M}_+$, we can immediately make the identification

$g(\lambda)a(\lambda) = (2\pi)^{-1/2}\sigma e^{in\lambda}\sum_{k=0}^\infty \psi_{k+h}e^{-ik\lambda}$.

Using (5.6.10) again we obtain

$g(\lambda) = e^{in\lambda}\big[\phi(e^{-i\lambda})/\theta(e^{-i\lambda})\big]\sum_{k=0}^\infty \psi_{k+h}e^{-ik\lambda}$,

i.e.

$g(\lambda) = \sum_{j=0}^\infty \alpha_je^{i(n-j)\lambda}$,   (5.6.14)

where $\sum_{j=0}^\infty \alpha_jz^j = [\phi(z)/\theta(z)]\sum_{k=0}^\infty \psi_{k+h}z^k$, $|z| \le 1$. Applying the mapping $I$ to each side of (5.6.14) and using (5.6.2) and (5.6.3), we conclude that

$P_{\mathcal{M}_n}X_{n+h} = \sum_{j=0}^\infty \alpha_jX_{n-j}$.   (5.6.15)

It is not difficult to check (Problem 5.17) that this result is equivalent to (5.5.4).

§5.7* The Wold Decomposition

In Example 5.6.1, the values $X_{n+j}$, $j \ge 1$, of the process $\{X_t, t \in \mathbb{Z}\}$ were perfectly predictable in terms of elements of $\mathcal{M}_n = \overline{sp}\{X_t, -\infty < t \le n\}$. Such processes are called deterministic. Any zero-mean stationary process $\{X_t\}$ which is not deterministic can be expressed as a sum $X_t = U_t + V_t$ of an MA($\infty$) process $\{U_t\}$ and a deterministic process $\{V_t\}$ which is uncorrelated with $\{U_t\}$. In the statement and proof of this decomposition (Theorem 5.7.1) we shall use the notation $\sigma^2$ for the one-step mean squared error,

$\sigma^2 = E|X_{n+1} - P_{\mathcal{M}_n}X_{n+1}|^2$,

and $\mathcal{M}_{-\infty}$ for the closed linear subspace

$\mathcal{M}_{-\infty} = \bigcap_{n=-\infty}^\infty \mathcal{M}_n$

of the Hilbert space $\mathcal{M} = \overline{sp}\{X_t, t \in \mathbb{Z}\}$. All subspaces and orthogonal complements should be interpreted as relative to $\mathcal{M}$. For orthogonal subspaces $\mathcal{S}_1$ and $\mathcal{S}_2$ we define $\mathcal{S}_1 \oplus \mathcal{S}_2 := \{x + y : x \in \mathcal{S}_1 \text{ and } y \in \mathcal{S}_2\}$.

Remark 1. The process $\{X_t\}$ is said to be deterministic if and only if $\sigma^2 = 0$, or equivalently if and only if $X_t \in \mathcal{M}_{-\infty}$ for each $t$ (Problem 5.18).

Theorem 5.7.1 (The Wold Decomposition). If $\sigma^2 > 0$ then $X_t$ can be expressed as

$X_t = \sum_{j=0}^\infty \psi_jZ_{t-j} + V_t$,   (5.7.1)

where

(i) $\psi_0 = 1$ and $\sum_{j=0}^\infty \psi_j^2 < \infty$,
(ii) $\{Z_t\} \sim \mathrm{WN}(0,\sigma^2)$,
(iii) $Z_t \in \mathcal{M}_t$ for each $t \in \mathbb{Z}$,
(iv) $E(Z_tV_s) = 0$ for all $s, t \in \mathbb{Z}$,
(v) $V_t \in \mathcal{M}_{-\infty}$ for each $t \in \mathbb{Z}$, and
(vi) $\{V_t\}$ is deterministic.
The sequences { 1/JJ , { Zj } and { rj } are uniquely determined by (5.7. 1 ) and the conditions (i)-(vi). - oo PROOF. We first show that the sequences defined by (5.7.2) (5.7.3) and 00 J!; = X, - I 1/Jj Zr -j , j=O (5.7.4) satisfy (5.7.1) and conditions (i)-(vi). The proof is then completed by establish­ ing the uniqueness of the three sequences. Clearly Z, as defined by (5.7.2) is an element of A, and is orthogonal to A,_ 1 by the definition of fA, _ , X, . Hence Z, E A,':_ 1 c A/�2 c · · · , which shows that for s < t, E(Z5 Z,) = 0. By Problem 5. 1 9 this establishes (ii) and (iii). Now by Theorem 2.4.2(ii) we can write 00 (5.7.5) = I 1/Jj Zr-j , j =O where 1/Jj is defined by (5.7.3) and I� o 1/1} < oo. The coefficients 1/Jj are inde­ pendent of t by stationarity and psp{Zi, j s; r} Xr 1/10 = a-2 <X,, X, - PA, _ , X, ) = a - 2 I I X, - P.A, _ , X, II 2 = 1 . Equations (5.7.4) and (5.7.5) and the definition of PS!5{ Zj , j s; r ) X, imply that < Vr, Zs ) = 0 for s :0:: t. On the other hand if s > t, Zs E A!--- 1 c A,l- , and since v; E .4t, we conclude that . < Vr, Zs ) = O for s > t, establishing (iv.) To establish (v) and (vi) it will suffice (by Remark 1 ) to show that for every t. (5.7.6) sp { �,j :0:: t} = A - oo Since V. E A, = Ar - 1 EB sp { Z, } and since < v;, Z, ) = 0, we conclude that J!; E Ar - 1 = A,_ 2 EB sp { Z,_d. But since < J!;, Zr 1 ) = 0 it then follows that v; E A,_ 2. Continuing with this argument we see that v; E A,_j for each j � 0, whence v; E n i= o A,_j = A Thus - - oo · sp { �,j :0:: t} � A - oo for every t. (5. 7. 7) Now by (5.7.4), A, = sp {Zj,j :O:: t} EB sp{ rj,j :O:: t}. If YEA_00 then Y E A5 _ 1 for every s, so that < Y, Z5 ) = 0 for every s, and consequently Y E sp{ T-j, j :0:: t } . §5.7.* The Wold Decomposition But this means that At - en s; 189 sp { �,j :s;: t} for every t, (5.7.8) which completes the proof of (5.7.6) and hence of (v) and (vi). To establish uniqueness we observe from (5. 7. 1) that if { Z, } and { V, } are any sequences satisfying (5.7. 1 ) and having the properties (i)-(vi), then Ar - J s; sp { Zj ,j :s;: t - 1 } EB sp { �,j :s;: t - 1 } from which it follows, using (ii) and (iv), that Z, is orthogonal to Ar - 1 - Projecting each side of (5.7. 1 ) onto A, _ 1 and subtracting the resulting equation from (5.7. 1 ), we then find that the process { Z, } must satisfy (5. 7.2). By taking inner products of each side of (5. 7. 1) with Z, _j we see that 1/Jj must also satisfy (5.7.3). Finally, if (5.7. 1) is to hold, it is obviously necessary that V, must be defined as in (5.7.4). D In the course of the preceding proof we have established a number of results which are worth collecting together as a corollary. Corollary 5.7.1 (a) sp { �,j :s;: t} = At for every t. (b) At, = sp {Zj ,j :s;: t} EB Af_ 00 • (c) = sp{Zj ,j E Z}. (d) sp { Uj ,j :s;: t} sp {Zj ,j :s;: t}, where U, = Li= o t/lj Zr-j · -oo u�tJ-co = PROOF. (a) (b) (c) (d) This is a restatement of (5.7.6). Use part (a) together with the relation, A, = sp {Zj ,j :s;: t} EB sp{ �,j :s;: t}. Observe that At = sp{X, t E Z} = sp{Z,, t E Z} EB Af_ 00 • This follows from the fact that At, = sp { tj,j :s;: t} EB At - w D In view of part (b) of the corollary it is now possible to interpret the representation (5.7. 1 ) as the decomposition of the subspace A, into two orthogonal subs paces sp { Z;,j :s;: t} and At- oo · A stationary process is said to be purely non-deterministic if and only if Af_G(, = {0}. 
In this case the Wold decomposition has no determin­ istic component, and the process can be represented as an MA( oo ), X, = '[.]= 0 t/lj Zr -j · Many of the time series dealt with in this book (e.g. ARMA processes) are purely non-deterministic. Observe that the h-step predictor for the process (5.7. 1) is 00 since Zj j_ ._4f, error IS �ff, xr+h = jI. t/lj Zr+h-j + v,+h• =h for all j < t, and V,+ h E A, . The corresponding mean squared 190 5. Prediction of Stationary Processes which should be compared with the result (5.5.5). For a purely non-deterministic process it is clear that the h-step prediction mean squared error converges as h -+ oo to the variance of the process. In general we have from part (d) of Corollary 5.7. 1 , psp{Ui ,j <; t } Ut + h = psp{Zi,j<; t } Ut + h = � 1/Jj Zt + h-j , ]� h which shows that the h-step prediction error for the { UJ sequence coincides with that of the { X1 } process. This is not unexpected since the purely deter­ ministic component does not contribute to the prediction error. 00 EXAMPLE 5. 7 .I. Consider the stationary process xt = zt Y, where { zt } WN (0, ri 2 ), { Z1 } is uncorrelated with the random variable Y and Y has mean zero and variance ri 2 . Since + � 1 n-1 1 n-1 - I xt -j = I zt -j + Y � Y, n j�o n j�o it follows t hat y E Jilt for every t. Also zt ..L J!!s for s < t so zt ..L J!!- oo · Hence Y = ���- X1 is the deterministic component of the Wold decomposition and " zt = xt y is the purely non-deterministic component. - - For a stationary process { X1 } satisfying the hypotheses of Theorem 5. 7. 1 , the spectral distribution function Fx i s the sum of two spectral distribution functions Fu and Fv corresponding to the two components V1 = L � o 1/Ji Zt -i and f'; appearing in (5.7. 1 ) (see Problem 4.7). From Chapter 4, Fu is absolutely continuous with respect to Lebesgue measure and has the spectral density 00 where 1/J(e- ;;, ) = L 1/Ji e - ;i;._ (5.7.9) j�O On the other hand, the spectral distribution Fv has no absolutely continuous component (see Doob ( 1953)). Consequently the Wold decomposition of a stationary process is analogous to the Lebesgue decomposition of the spectral measure into its absolutely continuous and singular parts. We state this as a theorem. Theorem 5.7.2. If ri 2 > 0, then Fx = Fu + Fv where Fu and Fv are respectively the absolutely continuous and singular com­ ponents in the Lebesgue decomposition of Fx. The density function associated with Fu is defined by (5.7.9). The requirement ri 2 > 0 is critical in the above theorem. In other words it is possible for a deterministic process to have an absolutely continuous spectral distribution function. This is illustrated by Example 5.6. 1 . In the next section, a formula for ri 2 will be given in terms of the derivative of Fx which §5.8. * Kolmogorov's Formula 191 CJ2 is valid even in the case = 0. This immediately yields a necessary and sufficient criterion for a stationary process to be deterministic. §5.8 * Kolmogorov's Formula Let {X, } be a real-valued zero-mean stationary process with spectral distri­ bution function Fx and let f denote the derivative of Fx (defined everywhere on [ - n, n] except possibly on a set of Lebesgue measure zero). We shall assume, to simplify the proof of the following theorem, that f is continuous on [ - n, n] and is bounded away from zero. Since { X,} is real, we must have f(.A.) = f( A) 0 ::;; ). ::;; n. For a general proof, see Hannan ( 1 970) or Ash and Gardner ( 1 975). - , Theorem 5.8.1 (Kolmogorov's Formula). 
The one-step mean square prediction error of the stationary process { X, } is {_l_f" CJ2 =2n e x p 2n - rc } ln f(.A.) d.A. . (5.8. 1) PROOF. Using a Taylor series expansion of ln(1 - z ) for l z l < 1 and the identity J"- " e ik .l. d). =0, k # 0, we have for l a l < 1 , f " ln l l - ae - i.l. l 2 d). = f} n(l - ae - i.l.)(l - iiei.l.) d). (5.8.2) If {X,} is an AR(p) process satisfying cp(B)X, = Z, where {Z,} WN(O, CJ 2 ) and ¢(z) = 1 - ¢ 1 z - · · · - ¢vzP # 0 for l z l ::;; 1, then {X,} has spectral density, = 0. � where I a) < 1 , j = 1, . . . , p. Hence f" 2 f " (J p f" (J 2 1n - d.A. - I 1n l 1 - ai e ;. l 2 d.A. =2n 1 n - , 2n 2n j= 1 . establishing Kolmogorov' s formula for causal AR processes. Under the assumptions made on f, it is clear that min _ rc s ;. 9 /(.A.) > 0. Moreover, it is easily shown from Corollary 4.4.2 that for any c: E (0, min f(.A.)), there exist causal AR processes with spectral densities g�l ) and g�2 l such that - rc 1n g(.A.) d.A. = - rc - rc - ' (5.8.3) 1 92 5. Prediction of Stationary Processes a2(f) = E[(Xt - P-sp{Xt - I • · · · • Xr_,l} X )2] Now define n f C ] , . . . , Cn = �i � J� , ll - c 1 e-i.l - . c. e - i"" l 2f(A.) d)., .· - c , c,. By (5.8.3) and the definition of a�( · ), a�(g� :-:::; a�(f) :-:::; a�(g�2l). Since, by Problem 2. 1 8, a� (f) --> a2 (f) E[(X1 - PSf5{X" <s< t} X1 )2 ], a2(g�1 l) :-:::; a2(f) :-:::; a2(g�2)). (5.8.4) 1) ) _ := oo However we have already established that a2(g�i)) 2n expL� f ,ln g�i)(A.)dA.} i 1 , 2. If follows therefore from (5.8.4) that a 2 (f) must equal the common limit, as --> 0, of a2(g�l l) and a2(g�2l), i.e. D a2(f) 2n exp{_l_2n I" In f(A.) dA.}. Remark Notice that - oo J':.. , In f(A.) dA. < oo since In f(A.) :-:::; f(A.). If J". , ln f(A.) d A. -oo, the theorem is still true with a 2 = 0. Thus a2 0 if and only if r, Inf(A.) dA. -oo, and in this case f(A.) 0 almost everywhere. = = £ = _, :-:::; 1. = > > > Remark 2. Equation (5.8. 1 ) was first derived by Szego in the absolutely continous case and was later extended by Kolmogorov to the general case. In the literature however it is usually referred to as Kolmogorov's formula. Fu(dA.) a2 dA./2n Fv(dA.) 0"2 b0(dA.) b0 2n exp {_1_2n J" In (2n0'2 ) dA.} a2. EXAMPLE 5.8. 1 . For the process defined in Example 5.7. 1 , = = and where is the unit mass at the origin. Not surprisingly, the one-step mean square prediction error is therefore = _, Problems 5. 1 . Let { X1 } be a stationary process with mean Jl. Show that where { Y; } = {X1 - Jl}. Problems 193 5.2. Suppose that { � ' n = I , 2, . . . } is a sequence of subspaces of a Hilbert space .Yt with the property that � <;: �+l , n = I , 2, . . . . Let .Yl'00 be the smallest closed subspace of .Yt containing U �, and let X be an element of .Yt. If PnX and P"' X are the projections of X onto � and .Yt", respectively, show that (a) P1 X, (P2 - P1 )X, (P3 - P2 )X, . . . , are orthogonal, (b) I � l 11 ( � + 1 - � ) X II 2 < 00 , and (c) Pn X --> PCN X. 5.3. Show that the converse of Proposition 5. 1 . 1 is not true by constructing a stationary process {X, } such that r" is non-singular for all n and y(h) + 0 as h --> 00 . 5.4. Suppose that { X, } is a stationary process with mean zero and spectral density -n ::::::; Jc ::::::; n. Find the coefficients { 8,i,j = l , . . . , i; i = 1, . . . , 5} and the mean squared errors {v, , i = 0, . . . , 5}. 5.5. Let {X, } be the MA( l ) process of Example 5.2. 1 . If i O I < I , show that as n --> oo , (a) II X" - X" - Zn ll --> 0, (b) Vn -> IJ 2 , and 2 (c) (}n l --> 8. 
(Note that (} = E(Xn+! Zn)a- and (}n l = v;.\ E(Xn +l (Xn - Xn )).) 5.6. Let {X, } be the invertible M A(q) process X, = Z, + 81 Z, _ 1 + · · · + 8qZr-q' Show that as n --> oo , (a) II X" - X" - Zn ll --> 0, (b) Vn -> IJ2 , and that (c) there exist constants K > 0 and c E (0, I) such that I (}ni - (}i 1 5.7. Verify equations (5.3.20) and (5.3.21 ). ::::::; Kc" for all n. 5.8. The values .644, - .442, - .9 1 9, - 1 .573, .852, - .907, .686, - .753, - .954, .576, are simulated values of X1 , . . . , X 1 0 where { X, } is the ARMA(2, I) process, X, - . I X,_1 - . 1 2X, _2 = Z, - .7Z, _ 1 , { Z, } � WN(O, 1 ). (a) Compute the forecasts P10X1 1 , P1 0X1 2 and P1 0X1 3 and the corresponding mean squared errors. (b) Assuming that Z, N(O, 1 ), construct 95% prediction bounds for X1 1 , X1 2 and x l 3 . (c) Using the method of Problem 5. 1 5, compute X[1 , X[2 and X[3 and compare these values with those obtained in (a). [The simulated values of X1 1 , X1 2 and X1 3 were in fact .074, 1 .097 and - . 1 87 respectively.] � 5.9. Repeat parts (a)-( c) of Problem 5.8 for the simulated values - 1 .222, 1 .707, .049, 1 .903, - 3.341, 3.041, - 1 .0 1 2, - .779, 1 .837, - 3.693 of X 1 , . . . , X1 0 , where {X, } is the MA(2) process X, = Z, - l . I Z,_1 + .28Z,_ 2 , { Z, } � WN(O, 1 ). 5. Prediction of Stationary Processes 194 [The simulated values of X1 1 , X1 2 and X1 3 in this case were 3.995, - 3.859 3.746.] 5.10. If {X I ' . . . ' xn } are observations of the AR(p) process, { Z, } � WN(0, 0" 2 ), show that the mean squared error of the predictor PnXn+ h is h- 1 for n � p, h � 1 , O"; (hJ = 0"2 2: 1/lf j �O where 1/J(z) = L � o lj11z1 1/1/J(z). This means that the asymptotic approximation (5.3.24) is exact for an autoregressive process when n � p. = 5. 1 1 . Use the model defined in Problem 4. 1 2 to find the best linear predictors of the Wolfer sunspot numbers X 1 0 1 , . . . , X 1 05 (being careful to take into account the non-zero mean of the series). Assuming that the series is Gaussian, find 95% prediction bounds for each value. (The observed values of X 10 1 , . . . , X 1 05 are in fact 1 39, I l l , 1 02, 66, 45.) How do the predicted values P1 00X1 oo+ h and their mean squared errors behave for large h? 5. 1 2. Let { X, } be the ARMA(2, 1 ) process, and let X, - . 5 X, _ 1 + .25 X, _ 2 = Z, + 2Z,_ 1 , { Y. X,, t ::s; 2, '_ X, - .5X,_1 + .25X,_ 2 , {Z, } � WN(O, 1 ), t > 2. (a) Find the covariance matrix of ( Y1 , Y2 , Y3 )' and hence find the coefficients e 1 1 and e2 1 in the representations ;\\ = e l l (x l - x l ), + e2 1 (X z - X2 ). (b) Use the mean squared errors of the predictors X 1 , X2 and X 3 to evaluate the determinant of the covariance matrix of (X1 , X2 , X3 )'. (c) Find the limits as n --> oo of the coefficients en! and of the one-step mean­ square prediction errors vn. (d) Given that X199 = 6.2, X 2 00 = - 2.2 and X2 00 = .5, use the limiting values found in (c) to compute the best predictor x2 0 1 and its mean squared error. 2 (e) What is the value of limh-oo E(Xn +h - PnXn+ h ) ? 5. 1 3. The coefficients enJ and one-step mean squared errors vn = rn0"2 can be deter­ mined for the general causal ARMA(1 , 1 ) process (5.3 . 1 1 ) by solving the equations (5.3. 1 3 ) as follows: (a) Show that if Yn := rn/(rn - !), then the last of the equations (5.3. 1 3), can be rewritten in the form, n � I. Yn = o - 2Yn - t + I , n l 2 (b) Deduce that Yn e - 2 Yo + L i� l e - (}- ) and hence determine rn and on ! ' n = 1 , 2, . . . . 
(c) Evaluate the limits as n --> oo ofrn and On1 in the two cases 1 e 1 < 1 and 1 e 1 � 1 . x3 = .5Xz - .25X l = 195 Problems 5. 14. Let {X, } be the MA( l ) process x, = z, + oz, _ l , {Z, } � WN(0, 0"2) with 1 0 1 < I . (a) Show that vn := E I Xn +l - xn+l i 2 = 0'2( 1 - 82"+4)/(1 - 82" + 2). (b) If X�+ I = - Li= I ( - wx n + 1 - j is the truncation approximation to PnXn + I ' show that E I X n + 1 - X�+ 1 1 2 = ( I + 82" + 2)0'2 and compare this value with vn for 1 11 1 near one. 5.15. Let {X, } be a causal invertible ARMA(p, q) process r/i (B)X, = &(B)Z,, Given the sample {X1, Z,* = {0 . • . , Xn }, we define if t :-:::; 0 or t > r/i (B)X, - 111 Z,*_1 - • • · - Bq Z,*_q if t = 1 , . . . , n, n, where we set X, = 0 for t <::; 0. (a) Show that r/J(B)X, = B(B)Z,* for all t <::; n (with the understanding that X, = 0 for t :-:::; 0) and hence that Z,* = n(B)X, where n(z) = r/J(z)/B(z). (b) If x;;+ l = - LJ= I njXn+ l -j is the truncation approximation to PnXn + l (see Remark I in Section 5.5), show that (c) Generalize (b) to show that for all h 2 1 where Xl = xj ifj = 1 , . . . , n. 5.16.* Consider the process X, = A cos(Bt + U), t = 0, ± 1, . . . , where A, B and U are random variables such that (A, B) and U are independent, and U is uniformly distributed on (0, 2n). (a) Show that {X,} is stationary and determine its mean and covariance function. (b) Show that the joint distribution of A and B can be chosen in such a way that { X, } has the autocovariance function of the MA(1) process, Y, = Z, + &Z,_1 , {Z, } WN(0, 0"2), 101 :<:; 1. (c) Suppose that A and B have the joint distribution found in (b) and let X,*+.h and X, + h be the best and best linear predictors respectively of X,+ h in terms of {Xi, - oo < j <::; t }. Find the mean squared errors of X,*+.h and X,+h ' h 2 2. � 5. 1 7.* Check that equation (5.6. 1 5) is equivalent to (5.5.4). 5. 1 8. * If 0'2 is the one-step mean-square prediction error for a stationary process { X,} show that 0'2 = 0 if and only if X, E At for every t. - oo 5. 1 9. * Suppose that {X, } is a stationary process with mean zero. Define Jt, = sp {X., s :-:::; t } and z, = X, - PAt,_1X,. (a) Show that 0'2 = E I X,+1 - PAt, X, + l l 2 does not depend on t. (b) Show that t/Ji = E(X,Z,_i)/0'2 does not depend on t. 5. Prediction of Stationary Processes 196 5.20.* Let { Y, } be the MA( l ) process, {Z, } WN(O, ri2 ), Z, + 2.5Zt - 1 , and define X, A cos(wt) + B sin(wt) + Y, where A and B are uncorrelated (0, rifj random variables which are uncorrelated with { Y, } . (a) Show that { X, } i s non-deterministic. (b) Determine the Wold decomposition of { X, } . (c) What are the components o f the spectral distribution function o f { X , } cor­ responding to the deterministic and purely non-deterministic components of the Wold decomposition? Y, = � = 5.2 1 . * Let { x., n = 0, ± 1, . . . } be a stationary Markov chain with states ± 1 and transition probabilities P(Xn +l j i X. i) p if i j, ( 1 - p) if i =I= j. Find the white noise sequence { z. } and coefficients aj such that = = Z. E sp {X,, - = oo = < t s n} and n 00 x. = I ajZn -j, j=O = 0, ± 1 , . . . . 5.22. Suppose that = A cos(nt/3) + B sin(nt/3) + z, + .5Z, _, , t 0, ± 1 , . . , where {Z,} WN(O, 1 ), A and B are uncorrelated random variables with mean zero, variance 4 and E(AZ,) = E(BZ,) = 0, t = 0, ± 1, . . . . Find the best linear predictor of X,+, based on X, and X,_ , . What is the mean squared error of the best linear predictor of X,+, based on { Xj, -oo < j ::::; t}? X, = . � 5.23. 
* Let { X, } be the moving average where �k = G)Ci: ) j= - oo k . (a) Find the spectral density of { X, } . (b) Is the process purely non-deterministic, non-deterministic, o r deterministic? 5.24. * If the zero-mean stationary process { X. } has autocovariance function, y(h) = {1 p if h if h 0, =I= 0, = where 0 < p < 1, (a) show that the mean square limit as n -> oo of n _ , L} = 1 X _ j exists, (b) show that x. can be represented as x. = Z + Y., where { Z, }j,j 0, ± 1 , . . . } are zero-mean uncorrelated random variables with EZ2 = p and E }j 2 = 1 - p, j 0, ± 1, . . . (c) find the spectral distribution function of { Xn } , = = , 1 97 Problems (d) determine the components in the Wold decomposition of X" , and (e) find the mean squared error of the one-step predictor Psp{x1• _ "' < i , nl X " + 1 . 5.25. Suppose that { V,} and { V,} are two stationary processes having the same auto­ covariance functions. Without appealing to Kolmogorov's formula, show that the two processes have the same one-step mean-square prediction errors. 5.26.* Under the assumptions made in our proof of Kolmogorov's formula (Theorem 5.8. 1 ), show that the mean squared error of the two-step linear predictor X t + 2 ·- Psp{XJ, � x_ < 1 :s; r : X t + 2 is with (J 2 as in (5.8.1). 5.27. Let { X, } be the causal AR( l ) process, X, - rjJX,_1 = Z,, {Z, } � WN(O, (J2 ), and let xn+ ! be the best linear predictor of xn+ ! based on X I ' . . . ' Xn . Defining enO 1 and X 1 0, find en! ' . . . ' enn such that n Xn+! L enj (Xn+! -j - xn+ t -jl· j� O = = = 5.28.* Suppose that X, = L.0 o 1/JiZ, _ i, {Z,} WN(O, I ) and L.0 o 1/JJ < that the h-step mean-square prediction error, � (J2(h) := E(X, + h - Psp{X" - ro < s :>t) X t + h)2 , satisfies (Jz(h) � 1/1� + . . . + ljlj; _ t · Conclude that {X,} is purely non-deterministic. oo . Show CHAPTER 6* Asymptotic Theory In order to carry out statistical inference for time series it is necessary to be able to derive the distributions of various statistics used for the estimation of parameters from the data. For finite n the exact distribution of such a statistic f.(X 1 , . , X.) is usually (even for Gaussian processes) prohibitively compli­ cated. In such cases, we can still however base the inference on large-sample approximations to the distribution of the statistic in question. The mathe­ matical tools for deriving such approximations are developed in this chapter. A comprehensive treatment of asymptotic theory is given in the book of Serfling ( 1 980). Chapter 5 of the book by Billingsley ( 1986) is also strongly recommended. . . §6. 1 Convergence in Probability We first define convergence in probability and the related order concepts which, as we shall see, are closely analogous to their deterministic counter­ parts. With these tools we can then develop convergence in probability ana­ logues of Taylor expansions which will be used later to derive the large-sample asymptotic distributions of estimators of our time series parameters. Let {a., n = 1, 2, . . } be a sequence of strictly positive real numbers and let { x., n = 1 , 2, . . . } be a sequence of random variables all defined on the same probability space. . Definition 6.1.1 (Convergence in Probability to Zero). We say that X. con­ verges in probability to zero, written x. = op ( 1 ) or x• .!... 0, if for every e > 0, P( I X. I > e) ---> 0 as n ---> oo . §6. 1. Convergence in Probability 199 Definition 6. 1.2 (Boundedness in Probability). 
We say that the sequence {Xn } is bounded in probability (or tight), written Xn = Op ( 1 ), if for every e > 0 there exists b(e) E (0, oo) such that P( I Xnl > b(e)) < e for all n. The relation between these two concepts is clarified by the following equivalent characterization of convergence in probability to zero, viz. Xn = op( 1 ) if and only if for every e > 0 there exists a sequence bn(e) ! 0 such that P( I Xn l > bn(e)) < e for all n, (see Problem 6.3). The definitions should also be compared with their non­ random counterparts, viz. xn = o( 1) if xn --> 0 and xn = 0( 1) if { xn } is bounded. Definition 6.1.3 (Convergence in Probability and Order in Probability). (i) Xn converges in probability to the random variable X, written Xn .!.. X, if and only if Xn - X = op ( 1 ). (ii) xn = op (an) if and only if a;; 1 xn = op ( l ). (iii) Xn = Op (an) if and only if a;; 1 Xn = Op ( 1 ). Notice that if we drop the subscripts p in Definitions 6. 1 .3 (ii) and (iii) we recover the usual definitions of o( · ) and 0( · ) for non-random sequences. In fact most of the rules governing the manipulation of o( ·) and 0( · ) carry over to op ( · ) and OP ( · ). In particular we have the following results. Proposition 6.1 . 1 . If Xn and Y,., n = 1, 2, . . . , are random variables defined on the same probability space and an > 0, bn > 0, n = 1 , 2, . . . , then (i) if Xn = op (an) and Y, = op (bn), we have xn Y, = op (anbn), xn + Y, = op (max(an, bn)), and for r > 0; I Xnl' = op (a�), (ii) if Xn = op (an) and Y, = Op (bn), we have xn Y, = op (anbn). Moreover (iii) the statement (i) remains valid if oP is everywhere replaced by OP . PROOF. (i) If I Xn Y, l/(anbn) > e then either I Y,.l/bn :::;; 1 and I Xnl/an > e or I Y,.l/bn > 1 and I Xn Ynl/(anbn) > e. Hence P(IXn Y, I/(a"b") > e) :::;; P( I Xn llan > e) + P(I Y, I/bn > 1) --> 0 as n --> oo . If I Xn + Y,. l/max(an, bn) > e then either I Xnl/an > e/2 or I Y,. l/bn > e/2. Hence 6. Asymptotic Theory 200 P(IXn + -> Y.t l/max(an, bn) > c) S P(I Xn l/an > c/2) + P(I Y.t l/bn > c/2) 0 as n -> oo . For the last part of (i) we simply observe that P( I Xnl'/a� > c) = P(I Xnl/an > c 11') -> 0 as n -> 00 . 0 Parts (ii) and (iii) are left as exercises for the reader. The Definitions 6. 1 . 1 -6. 1 .3 extend in a natural way to sequences of random vectors. Suppose now that {Xn, n = 1, 2, . . . } is a sequence of random vectors, all defined on the same probability space and such that X" has k components Xn t ' Xn 2 , . . . , Xnb n = 1, 2, . . . . Definition 6.1 .4 (Order in Probability for Random Vectors). (i) xn = op (an) if and only if xnj = op(an), j = 1 , . . . ' k. (ii) X" = Op (a") if and only if Xni = Op (a"), j = 1 , . . . , k. (iii) X" converges in probability to the random vector X, written Xn .!::. X, if and only if X" - X = op (l). Convergence in probability of X" to X can also be conveniently characterized in terms of the Euclidean distance I Xn - X I = [L�= l (Xni - Xi f ] 1 12 . Proposition 6.1.2. Xn - X = op (l) if and only if I Xn - X I = op( l ). PROOF. If Xn - X = op( l ) then for each c > O, limn� G() P( I Xni - Xi l 2 > c/k) = 0 for each j = 1 , . . . , k. But P et ) it I Xni - Xi l 2 > c s P( I Xni - Xi l 2 > c/k) (6. 1 . 1 ) since L�=l I Xni - Xi l 2 > c implies that at least one summand exceeds c/k. Since the right side of (6. 1 . 1 ) converges to zero so too does the left side and hence I Xn - X l 2 = op ( 1 ). By Proposition 6. 1 . 1 this implies that I Xn - X I = op( l ). 
Conversely if I Xn - X I = op( l ) we have I Xn i - X;l 2 s I Xn - X l 2 whence P( I Xn i - Xd > c) s P( I Xn - X l 2 > c 2 ) ...... 0. 0 Proposition 6.1.3. If X" - Y" .!::. 0 and Yn .!::. Y then X" .!::. Y. PROOF. I Xn - Y l s I Xn - Ynl + I Yn - Y l = op ( 1 ), by Propositions 6. 1 . 1 and �1.2. 0 . Proposition 6.1 .4. If {Xn} is a sequence of k-dimensional random vectors such that X" .!::. X and if g : IRk -> !Rm is a continuous mapping, then g(Xn)!. g(X) PROOF. Let K be a positive real number. Then given any c > 0 we have §6. 1 . Convergence in Probability 201 P( l g( X.) - g(X)I > e) � P( l g (X. ) - g(X)I > e, l X I � K , I X. I � K) + P( { I X I > K} u { I X.I > K } ). Since g is uniformly continuous on {x : l x l � K}, there exists y(e) > 0 such that for all n, { l g( X.) - g(X) I > e, IXI � K , I X. I � K} Hence s;; { I X. - X I > y(e)}. P(l g(X.) - g(X) I > e) � P( I X. - X I > y(e)) + P( I X I > K) + P( I X. I > K) � P( I X. - X I > y(e)) + P( I X I > K) + P( I X I > K/2) + P( I X. - X I > K/2). Now given any i5 > 0 we can choose K to make the second and third terms each less than <5/4. Then since I X. - X I � 0, the first and fourth terms will each be less than <5/4 for all n sufficiently large. Consequently g(X. ) � g(X ). D Taylor Expansions in Probability If g is continuous at a and x. = a + oP (l) then the argument of Proposition 6. 1 .4 tells us that g(X. ) = g(a) + op ( l ). If we strengthen the assumptions on g to include the existence of derivatives, then it is possible to derive probabilistic analogues of the Taylor expansions of non-random functions about a given point a. Some of these analogues which will be useful in deriving asymptotic distributions are given below. Proposition 6.1.5. Let {X. } be a sequence ofrandom variables such that Xn = a + Op (r.) where a E IR and 0 < r. -> 0 as n -> oo. If g is a function with s derivatives at a then s g<il(a) . g(X.) = L . , - (X. - a)l + op(r�), j= O } . < where g il is the ph derivative of g and g < OJ = g. PROOF. Let [ h(x) = g(x) - _L s g <il(a) · . ,- (x - a)l . ) J=O ]j[ (x - a)5 S 1. ] , x i= a, and h(a) = 0. Then the function h is continuous at a so that h(X.) = h(a) + op(l). This implies that h(X.) = op( l ) and so by Proposition 6. 1 . 1 (ii), 202 6. Asymptotic Theory which proves the result. D EXAMPLE 6. 1 . 1 . Suppose that { X,} "' IID(p, a 2 ) with J1 > 0. If Xn = n - 1 L�= J X,, then by Chebychev' s inequality (see Proposition 6.2. 1 below), P(n 1 1 2 1 Xn - p i > c) ::;; a 2 £ - 2 , and hence Xn - p = Op(n - ! 1 2 ). Since In x has a derivative at p, the conditions of Proposition 6. 1.5 are satisfied and we therefore obtain the expansion, In X" = In 11 + p - 1 (X" - p) + op(n - 1 1 2 ). We conclude this section with a multivariate analogue of Proposition 6. 1 .5. Proposition 6.1 .6. Let {Xn} be a sequence of random k X" - a = x 1 vectors such that Op (r"), where a E [Rk and rn � 0 as n � oo. If g is a function from [Rk into IR such that the derivatives ogjox; are continuous in a neighborhood N (a) of a, then PROOF. From the usual Taylor expansion for a function of several variables (see for example Seeley ( 1 970), p. 1 60) we have, as x � a, k og g(x) = g(a) + L - (a ) (x; - a;) + o( l x - a l ). OX; Defining i=l [ h(x) = g(x) - g(a) - it ::i (a) (x; - a;)J / I x - a l, x # a, and h(a ) = 0, we deduce that h is continuous at a and hence that h(X") = op (1) as n � oo . By Proposition 6. 1 . 1 this implies that h(Xn) I Xn - a l = op (rn), which proves the result. 
D §6.2 Convergence in rt h Mean, r > 0 Mean square convergence was introduced in Section 2.7 where we discussed the space L 2 of square integrable random variables on a probability space (Q, .?, P). In this section we consider a generalization of this concept, conver- §6.2. Convergence in r'h Mean, r > 0 203 gence in r'h mean, and discuss some of its properties. It reduces to mean-square convergence when r = 2. Definition 6.2.1 (Convergence in r'h Mean, r > 0). The sequence of random variables {Xn } is said to converge in r'h mean to X, written X" .!:.. X, if E I Xn - X I' -> 0 as n -> 00. Proposition 6.2.1 (Chebychev's Inequality). If E I X I' < then P( I X I :::0: s) s s - ' E I X I'. oo, r :::0: 0 and s > 0, PROOF. P( I X I :::0: s) = P( I X I 's - ' :::0: 1 ) S E [ I X I's - r I[ l .ro ) ( I X I's- ')] s s - ' E I X I'. D The following three propositions provide useful connections between the behaviour of moments and order in probability. Proposition 6.2.2. If X" .!:.. X then X" .!.. X. PROOF. By Chebychev' s inequality we have for any s > 0, P( I Xn - X I > s) s s - r E I Xn - X I' -> 0 as n -> CIJ . Proposition 6.2.3. If a" > 0, n = D 1 , 2, . . . , and E(X;) = O(a�), then X" = Op (a"). PROOF. Applying Chebychev's inequality again, we have for any M > 0, P(a;; ' I Xn l > M) s a;; 2 E I Xni 2/M 2 s C/M 2 where C = sup (a;; 2 E I Xnl 2 ) < oo . Defining c5(s) 2(Cje) 112 if C > 0 and any positive constant if C = 0, we see from Definition 6. 1 .2 that a;;' I Xnl = OP ( l ). D = Proposition 6.2.4. If EX" __. f1 and Var(X") __. 0 then X" � f1 (and X" .!.. Proposition 6.2.2). 11 by PROOF. __. 0 as n __. ctJ . D 204 6. Asymptotic Theory §6.3 Convergence in Distribution The statements X" � X and Xn !'. X are meaningful only when the random variables X, X 1 , X2 , . . . , are all defined on the same probability space. The notion of convergence in distribution however depends only on the distribution functions of X, X 1 , X2 , . . . , and is meaningful even if X, X 1 , X2 , . . . , are all defined on different probability spaces. We shall show in Proposition 6.3.2 that convergence in distribution of a sequence { Xn } is implied by con­ vergence in probability. We begin with a definition. Definition 6.3.1 (Convergence in Distribution). The sequence {Xn} of random k-vectors with distribution functions { Fx J · ) } is said to converge in distribu­ tion if there exists a random k-vector X such that lim FxJx) = Fx(x) for all x E C, (6.3. 1 ) where C is the set of continuity points of the distribution function Fx( · ) of X. If (6.3. 1 ) holds we shall say that Xn converges in distribution to X. Such convergence will be denoted by X" => X or Fx " => Fx . If X" = X then the distribution of X" can be well approximated for large n by the distribution of X. This observation is extremely useful since Fx is often easier to compute than Fx "· A proof of the equivalence of the following characterizations of convergence in distribution can be found in Billingsley ( 1986), Chapter 5. Theorem 6.3.1 (Characterizations of Convergence in Distribution). If Fa, F1 , F2 , are distribution functions on IRk with corresponding characteristic func­ tions ij?"(t) = JIR" exp(it'x) dF"(x), n = 0, 1, 2, . . . , then the following statements are equivalent: . . • (i) Fn => Fa , (ii) JIR" g(x) dFn(x) --> J IR"g(x) dFa(x) for every bounded continuous function g, (iii) limn � C(J ij?n(t) = ij?a(t) for every t = (t 1 , . . . , tk )' E IRk . Proposition 6.3.1 (The Cramer-Wold Device). Let {Xn} be a sequence of random k-vectors. 
Then xn = X if and only if A.'Xn = A.'X for all A. = (A. 1 ' . . . ' A.d' E IRk . PROOF. First assume that xn = X. Then for any fixed A. E IR\ Theorem 6.3.1 (iii) gives showing that A.'Xn => A.'X. Now suppose that A.'Xn => A.'X for each A. E IRk . Then using Theorem 6.3.1 again, we have for any A. E IRk, 205 §6.3. Convergence in Distribution f/Jx JA.) = E exp(iA.'Xn ) = which shows that X" => X. ¢Jl.·xJ l) -+ ¢Yl.· x 0 ) = f/Jx (A.) D Remark 1. If X" => X then the Cramer-Wold device with Jci = 1 and Jci = 0, j # i, shows at once that xni => xi where xni and xi are the i'h components of X" and X respectively. If on the other hand Xni => Xi for each i, then it is not necessarily true that X" => X (see Problem 6.8). Proposition 6.3.2. If X" !.. X then (i) E l exp(it'Xn ) - exp(it'X) I -+ 0 as n -+ oo for every t E IRk and (ii) X" => X. PROOF. Given t E IRk and E > 0, choose b(s) > 0 such that l exp(it'x ) - exp(it'y) l = 11 - exp(it'(y - x)) I < E if l x - Yl < b. (6.3.2) We then have E l exp(it'Xn ) - exp(it'X) I = = E l l - exp(it'(X" - X ) ) l E [ l l - exp(it'(X " - X)) I J{Ix"-XI < b } J + E [ l l - exp(it'(X" - X)) I I {Ix"-m' : b}] . The first term is less than E by (6.3.2) and the second term is bounded above by 2P( 1 Xn - X I ::2: b) which goes to zero as n -+ oo since X" !.. X. This proves (i). To establish the result (ii) we first note that I E exp(it'Xn ) - E exp(it'X)I :::;; E l exp(it'Xn) - exp(it'X) I -+ 0 as n -+ oo, and then use Theorem 6.3. 1 (iii). D Proposition 6.3.3. If {Xn } and {Yn} are two sequences of random k-vectors such that X" - Y" = op ( l ) and X" => X, then Y" => X. PROOF. By Theorem 6.3.1 (iii), it suffices to show that l f/JvJt) - f/Jx Jt) l -+ 0 as n -+ oo for each t E IRk, (6.3.3) since then l f/JyJ t) - f/Jx (t) l :::;; l f/JvJt) - f/JxJt) l + l f/Jx Jt) - f/Jx (t) l -+ 0. But l f/JyJt) - f/Jx Jt) l = I E(exp(it'Yn ) - exp(it'Xn )) l :::;; E l l - exp(it'(X" - Y" ))l -+ 0 as n -+ oo, by Proposition 6.3.2. D 206 6. Asymptotic Theory Proposition 6.3.4. If {Xn } is a sequence of random k-vectors such that Xn => X and if h : IRk -4 IR"' is a continuous mapping, then h(Xn) => h(X). PROOF. For a fixed t E IR "', eit' h< X > is a bounded continuous function of X so that by Theorem 6.3. 1 (ii), �h<xjt) -4 �h< X l (t). Theorem 6.3.1 (iii) then implies that h(Xn) => h(X). D In the special case when { Xn } converges in distribution to a constant random vector b, it is also true that {Xn } converges in probability to b, as shown in the following proposition. (Notice that convergence in probability to b is meaningful even when X 1 , X 2 , . . . , are all defined on different proba­ bility spaces.) Proposition 6.3.5. If Xn => b where b is a constant k-vector, then Xn � b. PROOF. We first prove the result for random variables (i.e. in the case k = 1). If xn => b then Fx Jx) -4 I[b, oo ) (x) for all X =I= b. Hence for any c > 0, P( I Xn - b l :::;; c) = P(b - c :::;; Xn :::;; b + c) -4 I[b,oo ) (b + c) - I[b,oo) (b - c) = 1, showing that xn � b. To establish the result in the general case, k 2: 1, we observe that if Xn => b then Xni => bi for each j = 1 , . . . , k by Remark 1 . From the result of the preceding paragraph we deduce that xnj � bj for each j 1, . . . , k and hence by Definition 6. 1 .4 that Xn � b. D = Proposition 6.3.6 (The Weak Law of Large Numbers). If { Xn } is an iid sequence of random variables with a finite mean Jl, then where Xn := (X 1 + ··· + Xn)/n. - p -4 J1 xn PROOF. 
Since Xn - J1 = ((X 1 - Jl) + · · · + (Xn - Jl))jn, it suffices to prove the result for zero-mean sequences. Assuming that J1 = 0, and using the in­ dependence of X 1 , X 2 , . . , we have �xn (t) = Ee ir x" = (�x , (n- 1 t)t . From the inequality 1 1 - y n l :::;; ni l - yl, I Y I :::;; 1, and the assumption that EX 1 = 0 it follows that 1 1 - �xn (t) l :::;; n i l - �x Jn - 1 t) l = n i E( l + itn - 1 X 1 - e irn -' x ' ) l ' :::;; E l n ( 1 + itn- 1 X I - e irn - x ' ) 1 . . §6.3. Convergence in Distribution 207 A Taylor series approximation to cos x and sin x then gives 1 1 + iy - e iY I = 1 1 + iy - cos y - i sin y l ::::;; 1 1 - cos y l + I Y - sin y l ::::;; min (2 1 y l, I Y I 2 ) for all real y. Replacing y by tn �1 x in this bound we see that for every x n l n( l + itn � 1 x - e it -'x)l and ::::;; 2 1 t l l x l, n = 1 , 2, . . . , n itn� J X - e ir - 'x)l -4 0 as n -4 00 . n Since E I X 1 I < oo by assumption, E l n( l + itn� 1 X 1 - e;r - ' x ' ) l -4 0 by the dominated convergence theorem. Hence iflxJt) -4 1 for every t and since 1 is the characteristic function of the zero random variable we conclude from Propositions 6.3.1 (iii) and 6.3.5 that X" � 0. 0 l n( l + Proposition 6.3.7. If {Xn} and {Yn } are sequences of random k - and m-vectors respectively and if X" => X and Y" => b where b is a constant vector, then (6.3.4) (Note that (6.3.4) is not necessarily true if Y" converges in distribution to a non-constant random vector.) PROOF. If we define zn = [X� , b']', then from Proposition 6.3.5 we have Z" - [X�, Y�]' = op ( 1 ). It is clear that Z" => [X', b']' and so (6.3.4) follows from Proposition 6.3.3. D The following proposition is stated without proof since it follows at once from Propositions 6.3.4 and 6.3.7. Proposition 6.3.8. If {Xn } and {Yn} are sequences ofrandom k-vectors such that X" => X and Y" => b where b is constant, then (i) X" + Y" => X + b and (ii) Y�X" => b'X. The next proposition will prove to be very useful in establishing asymptotic normality of the sample mean and sample autocovariance function for a wide class of time series models. Proposition 6.3.9. Let Xn, n random k-vectors such that = 1 , 2, . . . , and Ynj• j = 1 , 2, . . . ; n = 1 , 2, . . . , be 208 6. Asymptotic Theory (i) Ynj = Yj as n -+ oo for each j = 1 , 2, . . . , (ii) Yj => Y as j -+ oo, and (iii) limh,, lim SUPn� co P( I X" - Yn) > s) = 0 for every e > 0. Then X" => Y as n -+ oo. PROOF. By Theorem 6.3. 1 , it suffices to show that for each t E IR k l tf>xJt) - tf>v (t)l -+ 0 as n -+ oo. The triangle inequality gives the bound l tf>xJt) - tf>v (t) l ::::; l tf>xJt) - tPv"/t) l + l tf>v"i (t) - tf>vi (t) l + l tf>v/t) - tf>v (t) l . (6.3.5) From (iii) it follows, by an argument similar to the proof of Proposition 6.3.2 (i), that lim sup" � co I tf>x" (t) - tf>v (t) l -+ 0 as j -+ oo. Assumption (ii) guarantees that the last term in (6.3.5) al� o goes to zero as j -+ oo. For any positive () we can therefore choose j so that the upper limits as n -+ oo of the first and third terms on the right side of (6.3.5) are both less than b/2. For this fixed value of j, limn � l tf>v"J (t) - tf>vJ (t) l = 0 by assumption (i). Consequently lim sup" � co I tf>xJt) - tf>v (t) l < 1b + 1b = b, and since b was chosen arbitrarily, lim supn� oo l tf>xJt) - tf>v (t)l = 0 as required. 0 co Proposition 6.3.10 (The Weak Law of Large Numbers for Moving Averages). Let { X1 } be the two-sided moving average j= - oo where {Z1 } is iid with mean J1 and L � - co l t/ljl < oo . 
Then (Note that the variance of Z1 may be infinite.) PROOF. First note that the series L� - oo t/lj Zt -j converges absolutely with probability one since Now for each j, we have from the weak law of large numbers, " p n - 1 "\' L. zt -j -+ J.l. t�l 209 §6.4. Central Limit Theorems and Related Results Proposition 6. 1 .4 that ( ) Y.k � 2: 1/Jj Jl· lil sk If we define Yk = (Lui s k 1/Ji )Jl then since Yk --+ Y := (L � � co 1/JJ Jl, it suffices to show by Proposition 6.3.9 that , lim lim sup P ( I x. - Y,k I > e) = 0 for every e > 0. k --too n--t oo Applying Proposition 6.2. 1 with r = 1 , we have P(I X. - Y,k l > e) = P (I l n� 1 I ) I L 1/li Zt �j > e t = l li l > k l/ -::;:, E iL 1/il Z l �j e l l>k -::;:, which implies (6.3.6). ( (6.3.6) ) j L 1 1/Ji l E I Z � I e, li l > k 0 §6.4 Central Limit Theorems and Related Results Many of the estimators used in time series analysis turn out to be asymp­ totically normal as the number of observations goes to infinity. In this section we develop some of the standard techniques to be used for establishing asymptotic normality. Definition 6.4.1. A sequence of random variables {X. } is said to be asymp­ totically normal with "mean" Jln and "standard deviation" (Jn if (Jn > 0 for n sufficiently large and where Z � N(O, 1 ). In the notation of Serfling ( 1 980) we shall write this as X. is AN(Jl., (J;). 0 Remark 1 . If x. is AN(Jl., (J;) it is not necessarily the case that Jln = EX. or that (J; = Var(X.). See Example 6.4. 1 below. Remark 2. In order to prove that x. is AN(Jl., (J; ) it is often simplest to establish the result in the equivalent form (see Theorem 6.3.1 (iii)), 1/Jzjt) --+ exp(- t 2/2), where 1/JzJ · ) is the characteristic function of z. = (J.� 1 (X. - Jl.). This approach 6 . Asymptotic Theory 210 works especially well when X. is a sum of independent random variables as in the following theorem. Theorem 6.4.1 (The Central Limit Theorem). If { x. } (X I + . . . + x.)/n, then � IID(fl, 0" 2 ) and x. = PROOF. Define the iid sequence { Y, } with mean zero and variance one by Y, = (X1 - /1)/0" and set Y. = n -1 L7= 1 Y; . By Remark 2, it suffices to show that rPn•t2 yJt) -+ e - 1 212 . By independence, we have [ rPn•t2 yJt) = E exp itn - 1 12 = [r/Jr , (tn - 1 12 ) ] " . t j=l lf] First we need the inequality, l x " - y " l .:::;; n i x - y l for l x l .:::;; 1 and I Y I .:::;; 1 , which can be proved easily by induction on n. This implies that for n ;:o: t 2/4, l [r/Jr 1 (tn - 1 12 )] " - ( 1 - t 2 /(2n))" l .:::;; n l r/Jr 1 (tn - 1 12 ) - ( 1 - t2 /(2n)) l (6.4. 1 ) n i E(eirn - ' 12 Y 1 - ( 1 + itn - 1 1 2 Y1 - t 2 Y i/(2n))) l . Using a Taylor series expansion of e ix in a neighborhood of x = 0 we have nl e itn - 1 1 2x - (1 + itn - 1 12 x - t 2 x 2 /(2n)) l -+ 0 as n -+ oo = and 2 n l e i t"- '1 x - (1 + itn- 1 12 x - t2 x2 /(2n)) l .:::;; (tx)2 for all n and x. Thus, by the dominated convergence theorem, the right-hand side of (6.4. 1 ) converges to zero as n -+ oo and since ( 1 - t 2 /(2n))" -+ e - 1 2 12 we obtain 2 rPn•tzrJt) -+ e - 1 12 as required. D Remark 3. The assumption of identical distributions in Theorem 6.4. 1 can be replaced by others such as the Lindeberg condition (see Billingsley, 1 986) which is a restriction on the truncated variances of the random variables x• . However the assumptions of Theorem 6.4. 1 will suffice for our purposes. Proposition 6.4.1. If x. is AN( fl, 0",7) where O"" -+ 0 as n -+ oo , and if g is a function which is differentiable at fl, then g(X.) is AN(g(fl), g'(/1) 2 0",7 ). PROOF. Since z. 
= 0".- 1 (X. - /1) => Z where Z � N(O, 1 ), we may conclude from Problem 6.7 that z. = OP( 1 ) as n -+ oo . Hence x. = 11 + OP (O".). By Proposition 6. 1 .5 we therefore have 0",;- 1 [g(X.) - g(/1) ] = 0",;- 1 g'( /l) [X. - 11] + op(1 ), which with Proposition 6.3.3 proves the result. D §6.4. Central Limit Theorems and Related Results EXAMPLE 6.4. 1 . Suppose that { Xn } "' IID(,u, (} 2 ) where .U # 0 and 0 < (} < If Xn = n- 1 (X1 · · · + Xn) then by Theorem 6.4. 1 Xn is AN(,u, (J 2/n), + 21 1 CIJ . and by Proposition 6.4. 1 , X,; 1 i s AN(,u-1, ,u-4(} 2jn). Depending on the distribution of Xn, it is possible that the mean of X,; 1 may not exist (see Problem 6. 1 7). We now extend the notion of asymptotic normality to random k-vectors, k � 1 . Recall from Proposition 1 .5.5 that X is multivariate normal if and only if every linear combination A.' X is univariate normal. This fact, in conjunction with the Cramer-Wold device, motivates the following definition (see Serfling (1 980)) of asymptotic multivariate normality. Definition 6.4.2. The sequence {Xn } of random k-vectors is asymptotically normal with "mean vector" Jln and "covariance matrix" Ln if (i) Ln has no zero diagonal elements for all sufficiently large n, and (ii) A.' Xn is AN(A.'Jln, A.'LnA) for every A E IRk such that A.' LnA > 0 for all sufficient large n. Proposition 6.4.2. If xn is AN(Jtn, Ln) and B is any non-zero m X k matrix such that the matrices BLnB', n = 1 , 2, . . . , have no zero diagonal elements then PROOF. Problem 6.21. D The following proposition is the multivariate analogue of Proposition 6.4. 1 . Proposition 6.4.3. Suppose that x n is AN(Jl, C� L) where L is a symmetric non­ negative definite matrix and en --+ 0 as n --+ oo. If g (X) = (g 1 (X), . . . , gm(X))' is a m mapping from IRk into !R such that each g i ( ) is continuously differentiable in a neighborhood of Jl, and if DLD' has all of its diagonal elements non-zero, where D is the m x k matrix [(8gj8xi ) (Jt) ] , then g (Xn) is AN(g (Jl), c�DLD'). · PROOF. First we show that xnj = .Uj + Op (cn). Applying Proposition 6.4.2 with B = (bi1 , bi2 , , bid we find that Xni = BX is AN(,ui, c� (}ii) where (}ii is the /h diagonal element of L and (}ii > 0 by Definition 6.4.2. Since c,; 1 (Xni - .Ui) converges in distribution we may conclude that it is bounded in probability (Problem 6.7) and hence that Xni = .Ui + Op (cn). Now applying Proposition 6. 1 .6 we can write, for i = 1 , . . . , m, . • • 212 6 . Asymptotic Theory or equivalently, g(Xn) - g(Jt) = D(Xn - Jt) + op (cn). Dividing both sides by en we obtain 1 c; 1 [g(Xn) - g(Jt) ] = c; D (Xn - Jl) + op ( 1 ), and since c; 1 D(Xn Jl) is AN(O, DI.D'), we conclude from Proposition 6.3.3 that the same is true of c; 1 [g(Xn) - g(Jl)] . D - EXAMPLE 6.4.2 (The Sample Coefficient of Variation). Suppose that { Xn } "' IID(,u, a 2), a > 0, EX� = ,u4 < oo, E X � = ,u 3 , EX?; = ,u2 = ,u 2 + a 2 and E X n = ,u 1 = ,u i= 0. The sample coefficient of variation is defined as Y, = sn/Xn where xn = n - 1 (X 1 + . . . + Xn) and s?; = n- 1 2:: 7= 1 (X; - Xn) 2 • It is easy to verify (Problem 6.22) that (6.4.2) where I. is the matrix with components i, j = 1 , 2. 1 Now Y, = g(Xn, n - 2:: 7= 1 X?) where g(x, y) = x - 1 (y - x 2 ) 1 12. Applying Prop­ osition 6.4.3 with we find at once that We shall frequently have need for a central limit theorem which applies to sums of dependent random variables. 
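Before stating the m-dependent central limit theorem, a small simulation may help motivate it (the moving-average model, parameter values, sample size, seed and replication count below are our own illustrative choices, not taken from the text): for the 1-dependent sequence X_t = Z_t + θZ_{t−1}, the variance of n^{1/2}X̄_n settles near γ(0) + 2γ(1) = σ²(1 + θ)² rather than near Var(X_1) = σ²(1 + θ²), which is what a naive application of the iid central limit theorem would suggest.

```python
import numpy as np

# Illustrative simulation (our own choices of theta, sigma, n, reps): for the
# 1-dependent sequence X_t = Z_t + theta*Z_{t-1}, {Z_t} ~ IID N(0, sigma^2),
# n*Var(Xbar_n) approaches gamma(0) + 2*gamma(1) = sigma^2*(1+theta)^2,
# not Var(X_1) = sigma^2*(1+theta^2) as it would for an IID sequence.
rng = np.random.default_rng(1)
theta, sigma, n, reps = 0.8, 1.0, 2000, 5000

means = np.empty(reps)
for r in range(reps):
    z = sigma * rng.standard_normal(n + 1)
    x = z[1:] + theta * z[:-1]          # MA(1) sample of length n
    means[r] = x.mean()

print("n * Var(Xbar_n)      ~", n * means.var())
print("gamma(0) + 2*gamma(1) =", sigma**2 * (1 + theta)**2)   # 3.24
print("gamma(0)              =", sigma**2 * (1 + theta**2))   # 1.64
```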
It will be sufficient for our purposes to have a theorem which applies to m-dependent strictly stationary sequences, defined as follows. Definition 6.4.3 (m-Dependence). A strictly stationary sequence of random variables { Xr } is said to be m-dependent (where m is a non-negative integer) if for each t the two sets of random variables {Xi , j :s; t} and {Xi, j z t + m + 1 } are independent. Remark 4. In checking for m-dependence of a strictly stationary sequence { Xr , t = 0, ± 1, ± 2, . . . } it is clearly sufficient to check the independence of §6.4. Central Limit Theorems and Related Results 213 the two sets {Xi , j ::;; 0} and {Xi , j � m + 1 } since they have the same joint distributions as {Xi , j ::;; t} and {Xi , j � t + m + 1 } respectively. Remark 5. The property of m-dependence generalizes that of independence in a natural way. Observations of an m-dependent process are independent provided they are separated in time by more than m time units. In the special case when m = 0, m-dependence reduces to independence. The MA(q) processes introduced in Section 3.1 are m-dependent with m = q. The following result, due originally to Hoeffding and Robbins ( 1 948), extends the classical central limit theorem (Theorem 6.4. 1 ) to m-dependent sequences. Theorem 6.4.2 (The Central Limit Theorem for Strictly Stationary m-Dependent Sequences). If {X, } is a strictly stationary m-dependent sequence of random variables with mean zero and autocovariance function y( · ), and if vm = y(O) + 2 L }= 1 y (j) -1= 0, then (i) limn� oo n Var(Xn) = vm and (ii) Xn is AN(O, vm/n). n n PROOF. (i) n Var(Xn) = n - 1 L L y (i - j) i = 1 j= 1 = L ( 1 - n - 1 lj l )y(j) li l < n = L ( 1 - n - 1 1 j l )y(j) for n > m lil :o; m (ii) For each integer k such that k > 2m, let Y,k = n - 1 12 [ (X 1 + · · · + Xk - m) + (Xk+ 1 + · · · + X2 k - m) + · · · + (X<r -1 Jk + 1 + · · · + X,k - m) ] where r = [n/k] , the integer part of n/k. Observe that n 112 Y,k is a sum of r iid random variables each having mean zero and variance, Rk - m = Var(X 1 + . . . + xk - m) = L (k - m - i jl)y(j). lil <k- m Applying the central limit theorem (Theorem 6.4. 1 ), we have Y,k = Y,. where Y,. N(O, k - 1 R k - m). Moreover, since k- 1 Rk - m ..... vm as k ..... oo, we may conclude (Problem 6. 1 6) that Yk = Y where Y � N(O, vm) · It remains only to show that � lim sup P(l n 112 Xn - Y,k l > �:) = 0 for every e > 0, klim -+oo n -+oo (6.4.3) 6 . Asymptotic Theory 214 since the second conclusion of the theorem will then follow directly from Proposition 6.3.9. In order to establish (6.4.3) we write (n 112 X" - Y,k ) as a sum of r = [ n/k] independent terms, viz. r-1 ... n 112 xn - Y,k = n -1 12 jLt (Xjk- m + l + xjk- m+ 2 + + Xjd � + n - 112 (Xrk -m+t + . . . + X"). Making use of this independence and the stationarity of {X, }, we find that Var(n 1 12 X" - Ynd = n -1 [([n/k] - l)Rm + Rh<na, where Rm = Var(X 1 + · · · + Xm), Rh<n> = Var(X 1 + · · · + Xh<n>) and h(n) = n - k [n/k] + m. Now Rm is independent of n and Rh<n> is a bounded function of n since 0 :::; h(n) :::; k + m. Hence lim SUPn� oo Var(n 112 xn - Y,k ) = k- 1 Rm, and so by Chebychev's inequality condition (6.4.3) is satisfied. D Remark 6. Recalling Definition 6.4. 1 , we see that the condition vm I= 0 is essential for conclusion (ii) of Theorem 6.4.2 to be meaningful. In cases where vm 0 it is not difficult to show that n 112 xn .!'... 0 and n Var(Xn ) -> 0 as n -> CIJ (see Problem 6.6). The next example illustrates this point. = EXAMPLE 6.4.3. 
The strictly stationary MA( l ) process, is m-dependent with m = 1, and Vm = y (O) + 2y (1) = 0. For this example X" = n- 1 (Z" - Z0), which shows directly that nX" => Z 1 Z0, n 112 X" .!'... 0 and n Var(X") -> 0 as n -> oo . EXAMPLE 6.4.4 (Asymptotic Behaviour of xn for the MA(q) Process with "L J� o ()i I= 0). The MA(q) process, is a q-dependent strictly stationary sequence with Vq = }: /U) = 0"2 (Jo ejy = 2nf(O), • . where f( · ) is the spectral density of {X, } (see Theorem 4.4.2). A direct appli­ cation of Theorem 6.4.2 shows that (6.4.4) Problems 215 Problems 6. 1 . Show that a finite set of random variables {X 1 , 6.2. Prove parts (ii) and (iii) of Proposition 6. 1 . 1 . 6.3. Show that x. = ov( l ) i f and only if fo r every e such that P ( I X.I > b.(e)) < e for all n. 6.4. Let X 1 , X2 , • • • > , X.} is bounded in probability. 0, there exists a sequence b.(e) !O , be iid random variables with distribution function F. If , X.) and m. := min(X 1 , , X.), show that M. /n !. 0 if x(1 - F(x)) -> 0 as x -> oo and m./n !. 0 if xF( - x) -> 0 as x -> oo . • • . M. : = max(X 1 , . • . • • • 6.5. If X. = Ov( l ), is it true that there exists a subsequence { X . } and a constant K E (0, oo) such that P(I X I < K, k = 1 , 2, . . . ) = 1? • •• 6.6. Let {X, } be a stationary process with mean zero and an absolutely summable autocovariance function y( · ) such that L�= y(h) = 0. Show that n Var(X.) -> 0 and hence that n 1 '2 X. !. 0. - oo 6.7. If {X. } is a sequence of random variables such that X. = X, show that {X. } is also bounded in probability. 6.8. Give an example of two sequences of random variables { X., n = 0, 1, . . . } and { Y,, n = 0, 1 , . . . } such that x. = X0 and Y, = Y0 while (X., Y,)' does not converge in distribution. 6.9. Suppose that the random vectors X. and Y. are independent for each n and that X. = X and Y. = Y. Show that [X�, Y�]' = [X', Y']' where X and Y are independent. 6. 1 0. Show that if x. = X, Y, = Y and X. is independent of Y, for each n, then x. + Y. = X + Y where X and Y are independent. 6. 1 1 . Let {X. } be a sequence of random variables such that EX. = m and Var(X.) = a} > 0 for all n, where a; -> 0 as n -> oo . Define z. a.- 1 (X. - m), = and let f be a function with non-zero derivative f'(m) at m. (a) Show that z. = Op( 1 ) and x. = m + ov( 1). (b) If Y, = [f(X.) - f(m)]/[aJ'(m)], show that Y. - z. = ov( l). (c) Show that if z. converges in probability or in distribution then so does Y,. (d) If s. is binomially distributed with parameters n and p, and f ' ( p ) i= 0, use the preceding results to determine the asymptotic distribution of f(S./n). 6. 1 2. Suppose that x. is AN(11, a; ) where a; -> 0. Show that x. !. 11. 6. 1 3. Suppose that x. is AN(/l, a;) and Y, = a. + ov(a.). If a./a. -> c, where 0 < c < show that (X. - �I)/ Y. is AN(O, c2). oo , , x.m l' = N (O, I:) and 1:. !. I: where I: is non-singular, show that z X � I:; x. = X (m). 6. 14. If X. = (X. 1 , t . • . 6. 1 5. If x. is AN(/l., u;), show that (a) x. is AN(ji., a; ) if and only if a.;a. -> 1 and (fi. - 11. )/a. -> 0, and 6 . Asymptotic Theory 216 (b) a" X" + b" is AN(11"' 0'; ) if and only if a" -> 1 and (11n (a" - 1) + bn)/O'n -> 0. (c) If X" is AN(n, 2n), show that ( 1 - n-1 )X" is AN(n, 2n) but that ( 1 - n- 1 12 )X" is not AN(n, 2n). 6. 1 6. Suppose that xn - N ( l1n> vn) where l1n -> 11, Vn -> v and 0 < v < X" => X, where X - N ( 11, v). 00 . Show that 6. 1 7. Suppose that { X, } - IID(11, 0' 2 ) where 0 < 0'2 < oo. If X" = n- 1 (X + . . . 
+ X") 1 has a probability density function f(x) which is continuous and positive at x 0, show that E I Xn- 1 1 = oo. What is the limit distribution of g"- t when 11 = 0? = 6. 1 8. If X 1 , X2 , . . . , are iid normal random variables with mean 11 and variance 0'2 , find the asymptotic distributions of x; (n- 1 I 1= t Xj) 2 (a) when 11 # 0, and (b) when 11 = 0. = 6.19. Define In + (x) = { ln(x) if x > 0, 0 X :S: 0. If X" is AN(11, (J; ) where 11 > 0 and (J" -> 0, show that In + (X") is AN(ln(11), 11- 2 0'/ ). 6.20. Let f(x) 3x- 2 - 2x - 3 for x # 0. If Xn is AN( ! , 0'; ) find the limit distribution of (f(X") - 1 )/0'� assuming that 0 < 0'" -> 0. = 6.2 1 . Prove Proposition 6.4.2. 6.22. Verify (6.4.2) in Example 6.4.2. If 11 # 0, what is the limit distribution of n - 112 Y,? 6.23. Let X 1 , X 2 , , be iid positive stable random variables with support [0, oo ), exponent :X E (O, 1 ) and scale parameter c 1 1• where c > 0. This means that • • • Ee -ex , = exp( - cO"), 0 ;::: 0. The parameters c and :x can be estimated by solving the two "moment" equations n n - 1 I e -e , x, j= t where 0 < 0 1 < 02 , for c and estimators. :x. = exp( - cOf), i = 1, 2, Find the asymptotic joint distribution of the 6.24. Suppose { Z, } - IID(O, 0' 2 ). (a) For h ;::: I and k ;::: I , show that Z,Z,+h and ZsZs+ k are uncorrelated for all s # t, s ;::: I , t ;::: I . (b) For a fixed h ;::: I , show that n 0' - z n- 112 I (Z,Z, + 1 , , Z,Zr+h)' => ( Nt , . . . , Nh)' t= l • • • , Nh are iid N(O, I ) random variables. (Note that the sequence I , 2, . . . } is h-dependent and is also WN (0, 0'4).) (c) Show that for each h ;::: I , where N1 , N2, { Z,Zr+h' t • • . = n - 112 (� Z,Z,+h - �t� (Z, - Z") (Z,+h - Z" )) .!. 0 217 Problems where 1 z. = n- (21 + · · · + Z.). 1 (d) Noting by the weak law of large numbers that n - L�� � Z� !. a2 , conclude from (b) and (c) that where CHAPTER 7 Estimation of the Mean and the Autocovariance Function I f { Xr } i s a real-valued stationary process, then from a second-order point of view it is characterized by its mean 11 and its autocovariance function y( · ). The estimation of fl, y( · ) and the autocorrelation function p( · ) = y( · )/y (O) from observations of X 1 , . . . , Xn, therefore plays a crucial role in problems of inference and in particular in the problem of constructing an appropriate model for the data. In this chapter we consider several estimators which will be used and examine some of their properties. §7. 1 Estimation of f1 A natural unbiased estimator of the mean 11 of the stationary process { Xr } is the sample mean (7. 1 . 1 ) We first examine the behavior of the mean squared error E(Xn - 11f for large n. Theorem 7.1.1. If { Xr } is stationary with mean 11 and autocovariance function y( · ), then as n --> oo, Var(Xn ) = E(Xn - !1) 2 --> 0 if y(n) --> 0, and 00 00 nE(Xn - /1)2 --> I y(h) if I l y(h) l < 00 . h= - oo h= - oo �7. 1 . Estimation of J1 219 1 n n Var(X") = - L Cov(X; , XJ n i, j= 1 PROOF. l hl< n = I ( ) lhl 1 - - y(h) n � I / y(h) / . n lhl< Ify(n) ----> 0 as n ----> oo then limn�<XJ n - 1 Ii hl < n ly(h)l = 2 limn �<XJ ly(n) l = 0, whence Var(X" ) ----> 0. If I h' / y(h) l < oo then the dominated convergence theorem gives w lhl lim n Var(X") = lim I 1 y(h) = L y(h). 0 n h = -oo n�c:o n--+ oo lh l<n = _ 00 ( - Remark 1. If Lh'= - oo / y(h) / < Corollary 4.3.2, oo , - ) - then {X, } has a spectral density f( " ) and, by ro n Var(X") ----> I y(h) = 2nf(O). h= -w Remark 2. 
If X, = J1 + I� - <XJ 1/JjZr -j with I� -w 1 1/Jj l oo (see Problem 3.9) and ro n Var(Xn) ----> )�co y(h) = 2nf(O) = rJ 2 < oo, then I h'= - co ly(h)l < (� ) j= oo <D 2 1/Jj . Remark 3. Theorem 7. 1 . 1 shows that if y(n) ----> 0 as n ----> oo, then X" converges in mean square (and hence in probability) to the mean Jl. Moreover under the stronger condition Ih'= ly(h) l < oo (which is satisfied by all ARMA(p, q) processes) Var(X" ) n - 1 I h'= y(h). This suggests that under suitable condi­ tions it might be true that xn is AN(J1, n - 1 I h' -w y(h)). One set of assumptions which guarantees the asymptotic normality is given in the next theorem. - <X) � - co = Theorem 7.1.2. If {X, } is the stationary process, w X, = J1 + L 1/JjZt -j• j= - oo 1/Jj of 0, then X" is AN(J1, n - 1 v), where v = L h= _ 00 y(h) = rJ 2 ( I� - oo 1/JY, and y( · ) is the autocovariance function of {X, } . where L � _"' I t/1) < oo PROOF. See Section 7.3. and L� -co D 220 7. Estimation of the Mean and the Autocovariance Function Theorem 7. 1.2 is useful for finding approximate large-sample confidence intervals for Jl. If the process {X, } is not only stationary but also Gaussian, then from the second line of the proof of Theorem 7. 1 . 1 we can go further and write down the exact distribution of X" for finite n, viz. a result which gives exact confidence bounds for J1 if y( · ) is known, and approximate bounds if it is necessary to estimate y( · ) from the observations. Although we have concentrated here on X" as an estimator of Jl, there are other possibilities. If for example we assume a particular model for the data such as tf!(B) (X, - Jl) = 8(B)Z,, then it is possible to compute the best linear unbiased estimator fln of J1 in terms of X 1 , . . . , X" (see Problem 7.2). However even with this more elaborate procedure, there is little to be gained asymptotically as n -> oo since it can be shown (see Grenander and Rosenblatt ( 1 957), Section 7.3) that for processes {X, } with piecewise continuous spectral densities (and in particular for ARMA processes) lim n Var(fl.") = lim n Var(X"). We shall use the simple estimator X". §7.2 Estimation of y( · ) and p( ) · The estimators which we shall use for y(h) and p(h) are n -h Y (h) = n- 1 L (X, - X")(X,+h - X"), r=1 and 0�h�n- l, (7.2. 1 ) (7.2.2) p (h) = y(h)/Y (O), respectively. The estimator (7.2. 1) is biassed but its asymptotic distribution (as n -> oo) has mean y(h) under the conditions of Proposition 7.3.4 below. The estimators y(h), h = 0, . . . , n - 1 , also have the desirable property that for each n 2 l the matrix � r = yl (O) y(l) y(O) y( l ) " y(n � l) y(n - 2) is non-negative definite. To see this we write f" = n- 1 TT', y(n -l)J y(O) y(n - 2) (7.2.3) §7.2. Estimation of y ( · ) and where T is the n x p( · ) 2n matrix, ... ... T� and lf = X; - Xn , i = �r 221 0 0 y1 y1 y2 y2 Y, !J Y, 0 y1 y2 1 , . . . , n. Then for any real n x 1 vector a we have -1 a' f a = n ( a' T) (a' T)' 2: 0, n and consequently the sample autocovariance matrix rn and sample auto­ correlation matrix, (7.2.4) Rn = fn /1(0), 1 are both non-negative definite. The factor n - is sometimes replaced by (n - h) - 1 in the definition of y(h), but the matrices t" and R." may not then be non-negative definite. We shall therefore always use the definitions (7.2. 1 ) and (7.2.2) of y(h) and p(h). Note that det f" > 0 if y(O) > 0 (Problem 7. 1 1). From X 1 , . . . 
, X" it is of course impossible without further information to estimate y(k) and p(k) for k 2: n, and for k slightly smaller than n we should expect that any estimators will be unreliable since there are so few pairs (X1, X1 + k) available (only one if k = n - 1). Box and Jenkins ( 1976), p. 33, suggest that useful estimates of correlation p(k) can only be made if n is roughly 50 or more and k ::;; n/4. It will be important in selecting an appropriate ARMA model for a given set of observations to be able to recognize when sample autocorrelations are significantly different from zero. In order to do this we use the following theorem which gives the asymptotic joint distribution for fixed h of p(1), . . . , p(h) as n � oo . Theorem 7.2.1. If { X1 } is the stationary process, 00 xt - 11 = I t/lj Zt-j• j= where L� - oo I thl < oo and EZ14 < oo , then for each h E { 1, 2, . . . } we have p(h) is AN(p(h), n - 1 W), where j) (h)' = [ /J (l ), p(2), . . . ' p(h) ] , p(h)' = [p(1), p(2), . . . ' p(h)], and W is the covariance matrix whose (i,j)-element is given by Bartlett'sformula, - co 00 wii = L { p(k + i)p(k + j) + p(k - i) p(k + j) + 2p(i)p(j)p 2 (k) k= -oo - 2p(i)p(k)p(k + j) - 2p(j)p(k)p(k + i) } . 222 7. Estimation of the Mean and the Autocovariance Function PROOF. See Section 7.3. 0 In the following theorem, the finite fourth moment assumption is relaxed at the expense of a slightly stronger assumption on the sequence {l/lj } · Theorem 7.2.2. If {X, } is the stationary process ro X, - f.1 = L 1/Jj Zt -j , j= 1 1/1) < oo and L � -ro 1/1} [ j [ < oo, then for each h E { 1 , 2, . . . } - oo where L � -ro p(h) is AN(p(h), n - 1 W), where p(h), p(h) and W are defined as in Theorem 1.2. 1 . PROOF. See Section 7.3. Remark 1. Simple algebra shows that ro wu = L { p(k + i) + p(k - i) - 2p (i) p (k)} k�l X { p(k + j) + p(k - j) - 2p(j)p(k)}, (7.2.5) which is a more convenient form of wu for computational purposes. This formula also shows that the asymptotic distribution of n 112 (p(h) p(h)) is the same as that of the random vector ( Y1 , , Yh)', where - • . . ro i = 1 , . . . , h, (7.2.6) Y; = L (p(k + i) + p(k - i) - 2p(i)p(k))Nk > k� l and N1 , N2 , are iid N(O, 1 ) random variables. The proof o f Theorem 7.2.2 shows in fact that the limit distribution of n 112 (p(h) - p(h)) is completely deter­ mined by the limit distribution of the random variables a 2 n - 1 12 L�� 1 Z,Z, + i • i = 1 , 2, . . . which are asymptotically iid N(O, 1 ) (see Problem 6.24). • • • - Remark 2. Before considering some applications of Theorem 7.2.2 we note that its conditions are satisfied by every ARMA(p, q) process driven by an iid sequence {Z, } with zero mean and finite variance. The assumption of identical distributions in Theorems 7. 1 .2 and 7.2. 1 can also be replaced by the boundedness of E [ Z, [ 3 and E [ Z,[6 respectively (or by other conditions which permit the use in the proofs of a central limit theorem for non-identically distributed random variables). This should be kept in mind in applying the results. ExAMPLE 7.2. 1 (Independent White Noise). If {X, } � IID(O, a 2 ), then p(l) = 0 if [ / [ > 0, so from (7.2.5) we obtain 223 §7.2. Estimation of y( · ) and p ( · ) wii {01 = if i = j, otherwise. For large n therefore p(l), . . . , p(h) are approximately independent and identically distributed normal random variables with mean · o and variance n- 1 . 
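This can be verified numerically. The short sketch below (our own illustration; the sample size, number of lags and seed are arbitrary choices) simulates an IID series, computes ρ̂(1), ..., ρ̂(40) as in (7.2.1) and (7.2.2), and reports the proportion of them lying inside the bounds ±1.96 n^{-1/2}, which should be close to .95.

```python
import numpy as np

# Sketch of the whiteness check of Example 7.2.1 (sample size, lags and seed are ours).
def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(1),...,rho_hat(max_lag), using the
    divisor n as in (7.2.1)-(7.2.2)."""
    n = len(x)
    xc = x - x.mean()
    gamma0 = np.sum(xc * xc) / n
    gammas = np.array([np.sum(xc[:n - h] * xc[h:]) / n for h in range(1, max_lag + 1)])
    return gammas / gamma0

rng = np.random.default_rng(2)
n, h = 200, 40
x = rng.standard_normal(n)                  # IID N(0,1) "data"
rho = sample_acf(x, h)
bound = 1.96 / np.sqrt(n)
print("proportion inside +-1.96/sqrt(n):", np.mean(np.abs(rho) <= bound))  # close to .95
```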
If we plot the sample autocorrelation function p(k) as a function of k, approximately .95 of the sample autocorrelations should lie between the bounds ± 1 .96n - 112 . This can be used as a check that the observations truly are from an liD process. In Figure 7. 1 we have plotted the sample auto­ correlation p(k), k = 1, . . . , 40 for a sample of 200 independent observations from the distribution N(O, 1 ). It can be seen that all but one of the auto­ correlations lie between the bounds ± 1 .96n - 1 12 . If we had been given the data with no prior information, inspection of the sample autocorrelation function would have given us no grounds on which to reject the simple hypothesis that the data is a realization of a white noise process. 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -0. 1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 0.9 -1 0 10 30 20 Figure 7.1. The sample autocorrelation function of n white noise, showing the bounds ± 1 .96n-112. = 40 200 observations of Gaussian ExAMPLE 7.2.2 (Moving Average of Order q). If X1 = Z1 + 81 Z1_ 1 + · · · + eqzt - q• then from Bartlett 's formula (7.2.5) we have i > q, W;; = ( 1 + 2p 2 ( 1 ) + 2p 2 (2) + · · · + 2p 2 (q)J, as the variance of the asymptotic distribution of n 1 12 p(i) as n --+ oo. In Figure 224 7. Estimation of the Mean and the Autocovariance Function 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -0. 1 -0.2 -0 3 -0.4 -0 5 -0 6 -0.7 - 0.8 -0.9 -1 0 10 20 30 40 Figure 7.2. The sample autocorrelation function of n = 200 observations o f the Gaus­ sian M A ( l ) process, X, = Z, - .8Z,_ 1 , { Z, } WN(O, 1 ), showing the bounds ± t .96n - 1 12 [ t + 2p 2 ( t ) r12 . � 7.2 we have plotted the sample autocorrelation function p(k), k for 200 observations from the Gaussian MA( l ) process {Z, } � IID(O, 1 ). = 0, 1 , . . . , 40, = (7.2.6) The lag-one sample autocorrelation is found to be p(1) = - .5354 - 7.57n - 1 12 , which would cause us (in the absence of our prior knowledge of {X, } ) to reject the hypothesis that the data is a sample from a white noise process. The fact that I P{k)l < 1 .96n- 112 for k E { 2, . . . , 40} strongly suggests that the data is from a first-order moving average process. In Figure 7.2 we have plotted the bounds ± 1 .96n - 1i2 [1 + 2p 2 ( 1 )r;2 where p(1) = - .8/1 .64 = - .4878. The sample autocorrelations p(2), . . . , p(40) all lie within these bounds, indicating the compatibility of the data with the model (7.2.6). Since however p ( 1 ) is not normally known in advance, the autocor­ relations p(2), . . . , p(40) would in practice have been compared with the more stringent bounds ± 1 .96n -1 12 or with the bounds ± 1.96n - 1 12 [ 1 + 2p 2 ( 1)] 1 12 in order to check the hypothesis that the data is generated by a moving average process of order 1 . ExAMPLE 7.2.3 (Autoregressive Process of Order 1 ). Applying Bartlett's formula to the causal AR(1) process, §7.3. * Derivation of the Asymptotic Distributions and using the result (see Section 3.1) that p(i) variance of n 1 12(p(i) - p(i)) is = 225 ,plil , we find that the asymptotic i i -k k + k -i i = L W;; k =l ,p2 (,p - r/J )2 k =iI+ l ,p 2 (,p - r/J )2 = ( 1 - ,p 2i)(1 + r/12)(1 - r/12) - 1 - 2 i r/J 2 i, i = 1 , 2, . . . , ( 1 + r/12 )/( 1 - r/12) for i large. 00 ::::e The result is not of the same importance in model identification as the corre­ sponding result for moving average processes, since autoregressive processes are more readily identified from the vanishing of the partial autocorrelation function at lags greater than the order of the autoregression. 
We shall return to the general problem of identifying an appropriate model for a given time series in Chapter 9. § 7. 3 * Derivation of the Asymptotic Distributions This section is devoted to the proofs of Theorems 7. 1 .2, 7.2. 1 and 7.2.2. For the statements of these we refer the reader to Sections 7. 1 and 7.2. The proof of Theorem 7. 1 .2, being a rather straightforward application of the techniques of Chapter 6, is given first. We then proceed in stages through Propositions 7.3. 1 -7.3.4 to the proof of Theorem 7.2. 1 and Propositions 7.3.5-7.3.8 to the proof of Theorem 7.2.2. PROOF OF THEOREM 7. 1 .2. We first define m X ,m = f.l + L 1/Jj Zt - j j= - m and By Example 6.4.4, as n ---> oo, n 1 12( Y,m - Jl) => Ym where Ym � ( C � Y) . N O, rr 2 = m t/li (7.3. 1 ) Now as m ---> oo , rr 2( L j= - m t/IY ---> rr 2(L.i= -oo t/JY , and so by Problem 6. 1 6, Ym ==> Y where Y By Remark 2 of Section 7. 1 , � ( c=�oo t/liy). N 0, rr 2 (7.3.2) 226 7. Estimation of the Mean and the Autocovariance Function Hence lim lim sup Var(n 1 12 (X. - Y..m )) = 0, m-+ro n..--Jo oo which, in conjunction with Chebychev's inequality, implies that condition (iii) of Proposition 6.3.9 is satisfied. In view of (7.3. 1 ) and (7.3.2) we can therefore apply the Proposition to conclude that n 112 (X. - /1) = Y. D The asymptotic multivariate normality of the sample autocorrelations (Theorem 7.2. 1 ) will be established by first examining the asymptotic be­ havior of the sample autocovariances y(h) defined by (7.2. 1 ). In order to do this it is simplest to work in terms of the function n h = 0, 1 , 2, . . y*(h) = n - 1 I x,x, +h , t=l which, as we shall see in Proposition 7.3.4, has the same asymptotic properties as the sample autocovariance function. . ' Proposition 7.3.1. Let {X,} be the two-sided moving average, 00 X, = L t/lj Zt -j ' j= - ro where EZ,4 = 1]a4 < oo and L� - ro I t/Ii i < 00 . Then if p z 0 and q z 0, lim n Cov(y*(p), y*(q)) - 00 3)y(p)y(q) + I [y(k)y(k - P + q) + y(k + q)y(k - p)], k= -ro where y( · ) is the autocovariance function of {X,}. = (17 (7.3.3) PROOF. First observe that if s = t = u = v, if s = t # u = v, if s # t, s # u and s # v. Now E(X,Xr+p Xt+h+p Xt+h+p + q ) = I I I I t/Ji t/Jj +pt/Jk + h+p t/l, +h+p+q E(Zr - ; Z, _jz, _kz,_ z> i j k I (7.3.4) 227 §7.3. * Derivation of the Asymptotic Distributions and the sum can be rewritten, using (7.3.4), in the form (17 - 3) o.4 L 1/1;1/!;+pl/l;+h+pl/l;+h+p+ q + y(p)y(q) i + y (h + p)y(h + q) + y (h + p + q)y(h). It follows that - Ctl I� x,xt+pxsxs+q) t �l s l� Ey* (p)y*(q) = n 2 E = n-2 y (p)y(q) + y(s - t) y (s - t - p + q) + y(s - t + q)y(s - t - p) + (17 - 3) <T4 � 1/J; I/J;+pi/Ji +s-ti/Ji +s-t+ q . ] Letting k s - t, interchanging the order of summation and subtracting y (p)y(q), we find that Cov(y* (p), y* (q)) = n - 1 L ( 1 - n - 1 l k l ) 7k , (7.3.5) l kl < n where 4 7k = y(k)y(k - p + q) + y (k + q)y(k - p) + (17 3)o L 1/1;1/J;+pl/l;+kl/li +k+ q· i The absolute summability of { 1/Jj } implies that { 7k} is also absolutely summable. We can therefore apply the dominated convergence theorem in (7.3.5) to deduce that = - lim n Cov(y* (p), y*(q)) k=:L-oo 1k 00 = = (I] - 3) y (p) y (q) + 00 L [y(k)y(k - p + q) + y(k + q)y(k - p) ] . k= - oo D Proposition 7.3.2. If {X, } is the moving average, (7.3.6) j= - m where Ez: = 17 04 < oo , and if y( ' ) is the autocovariance function of { X,}, then for any non-negative integer h, [ J (r J l y* (O) _ .· y* (h) 1s. AN where V is the covariance matrix, y O) \.· n - 1 V y(h) ' ' 7. 
Estimation of the Mean and the Autocovariance Function 228 V= [(I'/ - 3)y(p)y(q) + � l k = oo + y(k + q)y(k - p)) (y(k)y(k - p + q) · q=O , , h ... PROOF. We define a sequence of random (h + I )-vectors {Y, } by y; = (X, X, , x,x, + 1 , . . . , x,x,+ h ). Then {Y, } is a strictly stationary (2m + h)-dependent sequence and n n - 1 L Y, = (y*(O), . . . , y*(h))'. t=! We therefore need to show that as n � oo, n- 1 t t I ( [� ] I..' Y, is AN l..' y O) y(h) ) ' n - 1 I..' VI.. , (7.3.7) for all vectors 1.. E IRh+l such that 1..' VI.. > 0. For any such 1.., the sequence {I.. ' Y, } is (2m + h)-dependent and since, by Proposition 7.3 . 1 , !�� n- 1 Var (� ) I..' Y, = J.' VA > 0, we conclude from Remark 6 of Section 6.4 that { I..' Y, } satisfies the hypotheses of Theorem 6.4.2. Application of the theorem immediately gives the required result (7.3.7). 0 The next step is to extend Proposition 7.3.2 to MA ( oo) processes. Proposition 7.3.3. Proposition 7.3.2 remains true if we replace (7.3.6) by { Z, } j= - co "' 110(0, a 2 ), (7.3.8) PROOF. The idea of the proof is to apply Proposition 7.3.2 to the truncated sequence m x,m = L 1/Jj Zt-j • j= - m and then to derive the result for { X, } by letting m � oo . For 0 � p � h we define n Y!( p) = n - 1 L X,mX(r+ p)m· t=l §7.3.* Derivation of the Asymptotic Distributions Then by Proposition 7.3.2 n t lz [ y!(O) - Ym (O) : ] 229 = Ym , Y! (h) - Ym (h) where Ym C ) is the autocovariance function of {X1m }, Ym - N(O, Vm ) and Vm = [ (17 - 3)ym ( P) Ym (q) + + Ym (k + q)ym (k - p)) Now as m --+ � k� oo (Ym (k)ym (k - P + q) l. q�O . ... , h · oo, where V is defined like Vm with Ym ( · ) replaced by y( · ). Hence Ym = Y where Y - N(O, V). The proof can now be completed by an application of Proposition 6.3.9 provided we can show that lim lim sup P(n 1 12 l y! (p) - Ym ( P) - y*(p) + y(p) l > c) = 0, (7.3.9) m.--. oo n.__... oo for p = 0, 1 , . . . , h. The probability in (7.3.9) is bounded by c - 2 n Var(y! (p) - y*(p)) = 2 c - [ n Var (y! (p)) + n Var(y*(p)) - 2n Cov(y!(p), y* (p))]. From the calcula­ tions of Proposition 7.3. 1 and the preceding paragraph, lim lim n Var(y! (p)) = lim n Var(y*(p)) vpq where is the (p, q)-element of V Moreover by a calculation similar to that given in the proof of Proposition 7.3. 1 , it can be shown that lim lim n Cov(y!(p), y*(p)) = vPP ' (7.3. 1 0) lim lim sup c - 2 n Var(y!(p) - y* (p)) = 0. (7.3. 1 1) whence This establishes (7.3.9). D Next we show that, under the conditions of Proposition 7.3.3, the vectors [y*(O), . . . , y*(h)]' and [1!(0), . . . , y(h)]' have the same asymptotic distribution. Proposition 7.3.4. If { X1 } is the moving average process, 7. Estimation of the Mean and the Autocovariance Function 230 { Z, } � IID(O, a 2 ), j= - oo where I � _ 00 1 1/Jj l < oo and ez: = 11a4 < oo , and if y( · ) is the autocovariance function of { X, }, then for any non-negative integer h, [ ] ([ l ' y(O) is AN .: , n - 1 v ' y(h) Y (O) :. y(h) where V is the covariance matrix, V= [ (1'/ - 3)y(p)y(q) + � l k = oo + y(k + q)y(k - p)) (7.3. 1 3) q = O, h [ �� Xr+p + n - 1 !2 (7.3.12) (y(k)y(k - P + q) . ..., PROOF. Simple algebra gives, for 0 :=::; p :=::; h, n 1 12 (y* (p) - y(p)) = n 1 12 X. n - 1 ) + n-1 n �� X, + ( 1 - n- 1 p) X. J " XI Xt+p · f..t=n-p+1 The last term is op ( 1 ) since n - 112E I L �=n - p + 1 X,X, +p l :=::; n- 1 12 py(O) and n - 112 py(O) --> 0 as n --> oo. By Theorem 7. 1 .2 we also know that ( C � 1/JiY ) . n 1 12 X. = Y wher e Y � N O, a 2 = oo 1 2 which implies that n 1 x. is OP (l). 
Moreover by the weak law of large numbers (cf. Proposition 6.3.1 0), n -p n-p n - 1 Xr +p + n - 1 X, + ( 1 - n - 1 p) X. � 0. 1� [ 1� J From these observations we conclude that n 112 (y* (p) - y(p)) = op ( l ) as n --> oo , and the conclusion of the proposition then follows from Propositions 6.3.3 and 7.3.3. D Remark 1. If { 1'; } is a stationary process with mean Jl, then Propositions 7.3. 1 7.3.4 apply to the process {X, } = { 1'; - J1 }, provided of course the specified conditions are satisfied by { 1'; - J1}. In particular if 00 c:o 1'; = J1 + I 1/Jj Zt+j• j= - { Z, } � IID(O, a 2 ), §7.3.* Derivation of the Asymptotic Distributions 23 1 where I.'l= - co I t/1) < oo and EZ,4 = Yfa4 < oo and if y( · ) is the autocovariance function of { Y, }, then for any non-negative integer h, [ � ] (r � ] n-1 v), y O) y O) is AN , y (h) y(h) where V is defined by (7.3. 1 3) and y(p) = n - 1 L,j:-� (lj - Yn ) ( lJ+h - Y,). We are now in a position to prove the asymptotic joint normality of the sample autocorrelations. PROOF OF THEOREM 7.2. 1 . Let g( · ) be the function from [R1h + 1 into !Rh defined by x0 # 0. If y ( " ) is the autocovariance function of {X,}, then by Proposition 6.4.3 and Remark 1 above, p(h) = g ( [Y (O), . . . , y(h) ] ') is AN(g ( [y(O), . . . , y(h) ] '), n 1 D VD'), �- i.e. p(h) is AN(p(h), n - 1 D VD'), where V is defined by (7.3. 1 3) and D is the matrix of partial derivatives, D = y(0) _1 J -p(1) 1 0 · · · O �(2) 0 1 . 0 . . - p(h) 0 0 . . . 1 Denoting by vij and W;j the (i,j)-elements of V and D VD' respectively, we find that wij = vij - p(i)v0j - p(j)v ;0 + p(i)p(j) Voo = �[ k= co p(k)p(k - i + j) + p(k - i) p (k + j) + 2p(i)p(j)p 2 (k) - 2p(i)p(k)p(k + j) - 2p(j)p(k) p(k - i) J. Noting that L k p(k)p(k - i + j) = L k p(k + i)p(k + j) and that L k p(j)p(k)p(k - i) = L k p(j)p(k + i)p(k), we see that wij is exactly as specified in the statement of Theorem 7 . 2. 1 . D We next turn to the proof of Theorem 7.2.2 which is broken up into a series of propositions. Proposition 7.3.5. If {X, } is the moving average process 232 7. Estimation of the Mean and the Autocovariance Function where L� - oc 1 1/l i l j= < oo - oo and L� - oo 1/!J ijl y*(h) !'. < oo, c=�oo 1/ljl/lj+h) (52 PROOF. We give the proof for h = then for h ;:o: 0, = y(h). 0. The general case is similar. Now n y*(O) = n - 1 L L l/1; 1/lj Zr -i Zr -j <= 1 i, j n n - 1 L "L l/l f Z �- ; + Y., t= 1 i 1 where Y, = L L i 7'i 1/1;1/li n L �= 1 Z,_ ; Zr -j· By the weak law of large numbers for moving averages (Proposition 6.3. 1 0), the first term converges in probability to ( L; l/Jl) C5 2 . So it suffices to show that Y, !'. 0. For i 1= j, { Z,_ ; Z,_i , t = 0, ± 1, . . . } WN(O, C54) and hence = � ( Var n 1 - I Z,_ ; Z,_i) <= 1 = n - 1 C54 � 0. Thus for each positive integer k Y,k = and n L L 1/1; 1/lj n- 1 L z,_ ; Zr -j !'. 0, r= 1 lil s k . lil s k, i h lim lim sup E I Y, - Y,k l s lim lim sup L L l l/l;l/li i E I Z1 Z2 I k� oo n �oo k� ro n �oo lil > k lil> k 0. Now appealing to Proposition 6.3.9, we deduce that Y, !'. 0. = D Proposition 7.3.6. Let { X, } be as defined in Proposition 7.3.5 and set y*(h) p *(h) = y*( ) Jor h = 1 , 2, . . . O r. Then ( . L L aii Z, _ ; Zr - i +i n 1 12 p*(h) - p(h) - (y*(0)) - 1 n - 1 12 r=I j,<O i 1 where PROOF. We have ) !'. 0 (7.3.14) i = 0, ± 1 ' . . . ; j = ± 1 ' ± 2, . . . . 
§7.3.* Derivation of the Asymptotic Distributions 233 p *(h) - p (h) = (y*(0)) -1 (y*(h) - p (h)y*(O)) (� ) � 1/1;1/!jZr -i Zr+ h-j - p (h)L L I/!;1/JjZr - ; Zr -j = (y*(O)f 1 n - 1 f t =l n = (y*(0)) - 1 n - 1 L I I 1/1; ( 1/1;-j+ h - p (h) !/1; j )Zr -i zr- i +i ' t= 1 i j l l 1 1 so that the left side of (7.3. 1 4) is t ) = (y*(0))- 1 n - 1 12 � [ 1/1; ( 1/J; + h - p (h) I/J; ) (t Zr2 Un;) l ( (y*(0))- 1 n- 1 12 � 1/1; ( 1/J; + h - p (h) I/J; ) r Zr2- ; (7.3. 1 5) + where Uni = I7:{_ ; Z12 - L7= 1 Z? is a sum of at most 2 1 i l random variables. Since L ; 1/1; ( 1/J; + h - p (h)l/f; ) = 0 and y*(O) .!.. ( L ; I/Jl)a 2 , the proof will be com­ plete by Proposition 6. 1 . 1 once we show that L 1/1; (1/J;+ h - p (h) I/J; ) uni = Op(1). i But, � lim sup £ � 1/1; (1/J;+ h - p (h) I/J; ) Un i n-+co ( t ) :-:;; � 1 1/1;1/J;+ h l + l l/ld 2 (2 l i l )a 2 (7.3. 1 6) l < OO and this inequality implies (7.3.16) as required. D Proposition 7.3.7. Let Xr and aii be as defined in Proposition 7.3.6. Then for each positive integer j, and n- 1 12 [� aij tt zt_izt -i+j - � aij t ztzt+j] .!.. 0 t (7.3. 1 7) (7.3. 1 8) PROOF. The left side of (7.3. 1 7) is equal to n - 112 L ; a ij un i where uni = I7:1- i Zt Zt+j - L7= 1 ztzt+j is a sum of at most 2 1 i l products ztzt+j · Moreover, 234 7. Estimation of the Mean and the Autocovariance Function < 00 . Consequently n - 1 12 I, ;a;j Un ; _:. 0. The proof of (7.3.18) is practically identical and is therefore omitted. D Proposition 7.3.8. Let { X, } be the moving average process defined in Prop­ osition 7.3.5. Then for every positive integer h, n 112 ( p*(h) - p (h))' = ( Y1 , . . . , Y;,)' where p*(h) = (p*(1), . . . , p*(h)), and N1 , N2 , • • • 00 I. (p(k + n + p(k - j) - 2p(j)p(k)) �, j=l are iid N (0, 1) random variables. 1k = PROOF. By Proposition 7.3.6, n n 1 12 (p*(h) - p(h)) = (y*(O)f 1 n - 1 12 rI, I I, a ijZr - i Zr - i+j + op (1). (7.3. 19) = 1 j#O i Also by Problem 6.24, we have for each fixed positive integer m, (J -2 n - 1 /2 (rI= 1 Z,Zr+1 • · · · • rI=1 Z,Zr+m) = (N1 , . . . , Nm )' where N1 , , Nm are iid N(O, 1 ) random variables. It then follows from Propositions 7.3.7 and 6.3.4 that • • • ( ) rr - 2 n - 1 12 I, I I aijZr - i Zr - i +j = I I, (aij + a;, -) �· (7.3.20) j= 1 i O<ljl ,;; m r = 1 i We next show that (7.3.20) remains valid with m replaced by oo . By Prop­ osition 6.3.9, m may be replaced by oo provided I I, (a j a ) = I, I (aij + a ;. -) � as m -+ oo j= 1 i i + ;, -j � j= 1 i m and oo ( lim lim sup Var n - 1 12 .L m-oo n -oo I IJ I>m t ;:::; l (� aijZ,_ ;Zr-i+j)) l = 0. (7.3.21 ) (7.3.22) §7.3.* Derivation of the Asymptotic Distributions 235 Now (7.3.2 1 ) is clear (in fact we have convergence in probability). To prove (7.3.22), we write Var(n� 112 I lil>m I�=l I ; a ij Zc�; Zr-;+) as n n n � 1 I I I I I I a ii akt E(Z, �; Zc�i+jzs�kzs�k+t) s= l t= i i lil >m k l ll >m n n = n � J I I I I a ij (as�t+i. j + as�t+i�j. �j ) (J4 s=l t=i i lil >m This bound is independent of n. Using the definition of a;i it is easy to show that the bound converges to zero as m ---> oo , thus verifying (7.3.22). Consequently (7.3.20) is valid with m = oo. Since y*(O) ..!'.. ( I i t{I/ ) (J2, it follows from (7.3. 1 9) and Proposition 6.3.8 that n 112(p* (h) - p(h)) => jI=l (I a i + a;, �i) � / (I t/1?) l i l = }/, . Finally the proof of the joint convergence of the components of the vector n 112 (p* (h) - p(h)) can be carried out by writing vector analogues of the preceding equations. D PROOF OF THEOREM 7.2.2. As in the proof of Theorem 7.2. 
1 , (see Remark 1 ) we may assume without loss of generality that J1 = 0. By Proposition 6.3.3, it suffices to show that n 1 12 (p*(h) - p(h)) ..!'.. 0 for h = 1 , 2, . . . . As in the proof of Proposition 7.3.4 (and assuming only that { Z, } we have for h ;:::: 0, By Proposition 7.3.5, y*(h) ..!'.. y(h) for h and hence Y (h) ..!'.. y(h) for h ;:::: ;:::: 0 0. - IID(O, (J2)) 7. Estimation of the Mean and the Autocovariance Function 236 Thus by Proposition 6. 1 . 1 , n 112 (p* (h) - p(h)) = n 1 12 (y*(h) - y (h))/y*(O) + n 1 12 (y(O) - y*(O)) y (h)/(y*(O) y (O)) = op( l ). 0 Problems 7. 1 . If {X, } is a causal AR( 1 ) process with mean fl., show that Xn is AN (fl., 0"2(1 - ¢)- 2 n - 1 ). In a sample of size 1 00 from an A R ( 1 ) process with 1ft = .6 and 0" 2 2, we obtain Xn = .27 1 . Construct an approximate 95% confidence interval for the mean fl.· Does the data suggest that f1 = 0? = 7.2. Let { X, } be a stationary process with mean fl. Show that the best linear unbiased estimator /l of f1 is given by fln = (1Tn- 1 tr 1 tT;1 Xn where rn is the covariance matrix of X" = (X 1 , . . . , Xnl' and 1 = ( 1 , . . . , 1 )'. Show also that Var(Jinl = (tT; 1 t) - 1 . [Jin is said to be the best linear unbiased estimator of f1 if E(fln - f1.) 2 = min E I Y - f1 1 2 where the minimum is taken over all Y E sp { X 1 , . . . , X" } with E Y = fl ] . 7.3. Show that for any series { x 1 , . . . , xn }, the sample autocovariances satisfy L lhl < n Y(h) = 0. 7.4. Use formula (7.2.5) to compute the asymptotic covariance matrix of p ( l ), . . . , p(h) for an M A ( 1 ) process. For which values of} and k in { 1 , 2, . . . } are p(j) and p(k) asymptotically independent? 7.5. Use formula (7.2.5) to compute the asymptotic covariance of p ( 1 ) and p(2) for an A R ( 1 ) process. What is the behaviour of the asymptotic correlation of p ( l ) and p(2) a s 1ft --+ ± 1 ? 7.6. F o r a n AR( 1 ) process the sample autocorrelation p ( l ) i s AN(¢, ( 1 - ¢2)n- 1 ). Show that n 112 (p( 1) - ¢)/( 1 - p 2 (1)) 112 is AN(O, 1). If a sample of size 1 00 from an A R ( 1 ) process gives p ( l ) = .638, construct a 95% confidence interval for ¢. Is the data consistent with the hypothesis that 1ft = .7? 7�7. In Problem 7.6, suppose that we estimate 1ft by (p(3)) 113 . Show that (p(3)) 1 13 is AN(¢, n- 1 v) and express v in terms of ¢. Compare the asymptotic variances of the estimators p(l) and (p(3))1 13 as 1ft varies between - 1 and 1 . 7.8. Suppose that { X, } is the A R ( 1 ) process, X, - f1 = tft (X, - fl.) + Z, where 1 ¢ 1 < 1 . Find constants an > 0 and bn such that exp(Xn) is AN(b", an). 7.9. Find the asymptotic distribution of p(2)/p ( l ) for the Gaussian MA( 1 ) process, { Z, } - IID(O, v), where 0 < 1 0 1 < 1 . Problems 237 7. 1 0. If {X, } is the M A ( 1 ) process in Problem 7.9, the moment estimators {J and () of 0 and v based on the observations {X1 , , X. } are obtained by equating the sample and theoretical autocovariances at lags 0 and 1. Thus v( 1 + B 2 ) = y(O), • • • and 0/( 1 + 02 ) = ,0 ( 1). Use the asymptotic joint distribution of (Y{O), p(l ) ) (a) to estimate the probability that these equations have a solution when 0 = .6 and n = 200 (B must be real), and (b) to determine the asymptotic joint distribution of (v, B)'. 7. 1 1 . If X 1 , . • . , x. are n observations of a stationary time series, define Show that the function Y( · ) is non-negative definite and hence, by Theorem 1 .5 . 1 , that Y( · ) i s the autocovariance function o f some stationary process { Y, } . From Proposition 3.2. 
1 it then follows at once that { Y,} is an MA(n - 1) process. (Show that tn + h is non-negative definite for all h :2: 0 by setting ¥;, + 1 = ¥;,+ 2 = · · · = Y, + h = 0 in the argument of Section 7.2.) Conclude from Proposition 5. 1 . 1 that if y(O) > 0, then f. is non-singular for every n. CHAPTER 8 Estimation for ARMA M odels The determination of an appropriate ARMA(p, q) model to represent an observed stationary time series involves a number of inter-related problems. These include the choice of p and q (order selection), and estimation of the remaining parameters, i.e. the mean, the coefficients { f/J; , ej : i = 1 , . . . , p; j = 1 , . . . , q} and the white noise variance CJ 2 , for given values of p and q. Goodness of fit of the model must also be checked and the estimation procedure repeated with different values of p and q. Final selection of the most appropriate model depends on a variety of goodness of fit tests, although it can be systematized to a large degree by use of criteria such as the AICC statistic discussed in Chapter 9. This chapter is devoted to the most straightforward part of the modelling procedure, namely the estimation, for fixed values of p and q, of the parameters cjl = (f/J 1 , , f/Jp )', 0 = (81 , , 8q )' and CJ 2 . It will be assumed throughout that the data has been adjusted by subtraction of the mean, so our problem becomes that of fitting a zero-mean ARMA model to the adjusted data x 1 , . . . , xn - If the model fitted to the adjusted data is • . . • . . x, - f/J1 X, _ 1 - · · · - f/Jpx, _p = z, + 81 Z,_ 1 + · · · + eqz,_q, { Z, } � WN(O, CJ 2 ), then the corresponding model for the original stationary series { Y; } is found by substituting 1j - y for Xj , j = t, . . . , t - p, where y = n - 1 I 'i; 1 yj is the sample mean of the original data, treated as a fixed constant. In the case q = 0 a good estimate of cjl can be obtained by the simple device of equating the sample and theoretical autocovariances at lags 0, 1 , . . . , p. This is the Yule-Walker estimator discussed in Sections 8 . 1 and 8.2. When q > 0 the corresponding procedure, i.e. equating sample and theoretical 239 §8. 1 . The Yule�Walker Equations autocovariances at lags 0, . . . , p + q, is neither simple nor efficient. In Sections 8.3 and 8.4 we discuss a simple method, based on the innovations algorithm (Proposition 5.2.2), for obtaining more efficient preliminary estimators of the coefficients when q > 0. These are still not as efficient as least squares or maximum likelihood estimators, and serve primarily as initial values for the non-linear optimization procedure required for computing these more effi­ cient estimators. Calculation of the exact Gaussian likelihood of an arbitrary second order process and in particular of an ARMA process is greatly simplified by use of the innovations algorithm. We make use of this simplification in our discus­ sion of maximum likelihood and least squares estimation for ARMA processes in Section 8.7. The asymptotic properties of the estimators and the determina­ tion of large-sample confidence intervals for the parameters are discussed in Sections 8.8, 8.9, 8. 1 1 and 1 0.8. §8. 1 The Yule-Walker Equations and Parameter Estimation for Autoregressive Processes Let {X, } be the zero-mean causal autoregressive process, {Z,} � WN(0, � 2 ). (8.1.1) Our aim i s t o find estimators of the coefficient vector � = (ifJ 1 , . . . , l/Jp )' and the white noise variance � 2 based on the observations X 1 , . . . 
, Xn The causality assumption allows us to write X, in the form 00 X, = L t/Jj Zr �j ' j=O (8.1 .2) where by Theorem 3. 1.1, t/J(z) = L i'= o t/Ji z i = 1 /ifJ (z), lzl � 1 . Multiplying each side of (8. 1 . 1 ) by X, �i ' j = 0, . . . , p, taking expectations, and using (8. 1.2) to evaluate the right-hand sides, we obtain the Yule� Walker equations, and (8. 1 .3) (8. 1 .4) where rP is the covariance matrix [y(i - j)JL=1 and yp = (y( l ), y(2), . . . , y(p))'. These equations can be used to determine y(O), . . . , y(p) from � 2 and �· On the other hand, if we replace the covariances y(j),j = 0, . . . , p, appearing in (8.1 .3) and (8. 1.4) by the corresponding sample covariances y(j), we obtain a set of equations for the so-called Yule-Walker estimators � and 6" 2 of � and � 2 , namely (8. 1.5) 240 8. Estimation for ARMA Models and (8. 1 .6) rr 2 = y eo) - cf>' 1P ' where rp = [y(i - j)JL=I and Yp = (y(l ), y (2), . . . y (p)) . If y(O) > 0, then by Problem 7. 1 1, fp is non-singular. Dividing each side of (8. 1 .5) by y (O), we therefore obtain ' ' (8. 1 . 7) and - A f R_ - J A ] 2 (8 . 1 .8) (JA = Y' (0) [ 1 Pp P Pp , where pP = (p(l ), . . . , p(p))' = yP /Y (O). With <f> as defined by (8. 1 .7), it can be shown that 1 - ¢ 1 z - · · · - ¢P zP #- 0 for \ z \ ::;; 1 (see Problem 8.3). Hence the fitted model, is causal. The autocovariances yp(h), h = 0, . . . , p of the fitted model must therefore satisfy the p + 1 linear equations (cf. (8. 1 .3) and (8.1 .4)) h = 1, . . . , p, h = 0. However, from (8. 1 .5) and (8. 1 .6) we see that the solution of these equations is yp(h) = y(h), h = 0, . . . , p so that the autocovariances of the fitted model at lags 0, . . . , p coincide with the corresponding sample autocovariances. The argument of the preceding paragraph shows that for every non-singular covariance matrix rp +t = [y(i - j)J f.}� 1 there is an AR(p) process whose autocovariances at lags 0, . . . , p are y(O), . . . , y (p). (The required coefficients and white noise variance are found from (8. 1 .7) and (8. 1 .8) on replacing p(j) by y(j)/y(O),j = 0, . . . , p, and y(O) by y(O). ) There may not however be an MA(p) process with this property. For example if y (O) = 1 and y (1) = y ( - 1) = [3, the matrix r2 is a non-singular covariance matrix for all f3 E ( - 1 , 1 ). Consequently there is an AR( 1 ) process with autocovariances 1 and f3 at lags 0 and 1 for all f3 E ( - 1 , 1 ). However there is an MA(l) process with autocovariances 1 and f3 at lags 0 and 1 if and only if I /31 ::;; 1/2. (See Example 1 .5. 1 .) It is often the case that moment estimators, i.e. estimators which (like cf>) are obtained by equating theoretical and sample moments, are far less efficient than estimators obtained by alternative methods such as least squares or maximum likelihood. For example, estimation of the coefficient of an MA(1) process by equating the theoretical and sample autocorrelations at lag 1 is very inefficient (see Section 8.5). However for an AR(p) process, we shall see that the Yule-Walker estimator, cf>, has the same asymptotic distribution as n --> oo as the maximum likelihood estimator of cj) to be discussed in Sections 8.7 and 8.8. Theorem 8.1.1. If {X, } is the causal AR(p) process (8. 1 . 1 ) with { Z, } � IID(O, (J 2 ), §8.2. Preliminary Estimation, the Durbin-Levinson Algorithm 24 1 and � is the Yule - Walker estimator of cj}, then n l 12 (� - cj)) => N(O, u 2 rp- l ), where rP is the covariance matrix [y(i - j)JL= I · Moreover, a- 2 .:. (T2. PROOF. See Section 8. 1 0. D Theorem 8. 
1 . 1 enables us in particular to specify large-sample confidence regions for cj) and for each of its components. This is illustrated in Example 8.2. 1 . I n fitting autoregressive models to data, the order p will usually be unknown. If the true order is p and we attempt to fit a process of order m, we should expect the estimated coefficient vector �m = (�ml , . . . , �mmY to have a small value of �mm for each m > p. Although the exact distribution of �mm for m > p is not known even in the Gaussian case, the following asymptotic result is extremely useful in helping us to identify the appropriate order of the process to be fitted. Theorem 8.1.2. If {X, } is the causal AR(p) process (8. 1 . 1 ) with { Z,} A and if cj}m = (f/Jm l , . . . , ¢mmY = R ;;. I Pm• m > p, then n l12 (�m - cj)m ) => N(O, u2 r,;; l ), A A � � IID(O, u 2 ), where cj}m is the coefficient vector of the best linear predictor cj}�Xm of Xm+ l based on X m = (Xm • . . . , X d' , i.e. cj}m = R ;;. 1 Pm · In particular for m > p, PROOF. See Section 8. 1 0. D The application of Theorem 8. 1 .2 to order selection will be discussed in Section 8.2 in connection with the recursive fitting of autoregressive models. §8.2 Preliminary Estimation for Autoregressive Processes Using the Durbin- Levinson Algorithm Suppose we have observations x 1, . . . , xn of a zero-mean stationary time series. Provided y(O) > 0 we can fit an autoregressive process of order m < n to the data by means of the Yule-Walker equations. The fitted AR(m) process is 8. Estimation for ARMA Models 242 where from (8. 1 .7) and (8. 1 . 8), (8.2.2) and (8.2.3) Now if we compare (8.2.2) and (8.2.3) with the statement of Corollary 5. 1 . 1 , we see that �m and {jm are related to the sample autocovariances i n the same way that �m and vm are related to the autocovariances of the underlying process {Xr } · (As in Theorem 8. 1 .2, �m is defined as the coefficient vector of the best linear predictor ��Xm of Xm +1 based on X m = (Xm , . . . , X 1 ) ; vm is the corresponding mean squared error.) Consequently (if y(O) > 0 so that R1 , R 2 , are non-singular) we can use the Durbin-Levinson algorithm to fit autoregressive models of successively increasing orders 1, 2, . . . , to the data. The estimated coefficient vectors �� > � 2 , . . . , and white noise variances 0 1 , 0 2 , , are computed recursively from the sample co variances just as we computed �1 , � 2 , . . . , and v 1 , v 2 , , from the covariances in Chapter 5. Restated in terms of the estimates �m , vm , the algorithm becomes: ' . . • • • . • • • Proposition 8.2.1 (The Durbin- Levinson Algorithm for Fitting Autoregressive Models). If y(O) > 0 then the fitted autoregressive models (8.2. 1 ) for m = 1 , 2, . . . , n - 1 , can be determined recursively from the relations, �1 1 = p(l), 01 = y(O) [1 - ,0 2 ( 1 )], (8.2.4) (8.2.5) and (8.2.6) Use of these recursions bypasses the matrix inversion required in the direct computation of �m and vm from (8. 1 .7) and (8. 1 .8). It also provides us with estimates �1 1 , �2 2 , . . , of the partial autocorrelation function at lags 1, 2, . . . . These estimates are extremely valuable, first for deciding on the appropriateness of an autoregressive model, and then for choosing an appropriate order for the model to be fitted. We already know from Section 3.4 that for an AR( p) process the partial autocorrelations a(m) = rPmm , m > p, are zero. Moreover we know from Theorem 8. 
1.2 that for an AR(p) process the estimator φ̂_mm is, for large n and each m > p, approximately normally distributed with mean 0 and variance 1/n. If an autoregressive model is appropriate for the data there should consequently be a finite lag beyond which the observed values φ̂_mm are compatible with the distribution N(0, 1/n). In particular, if the order of the process is p, then for m > p, φ̂_mm will fall between the bounds ±1.96 n^{−1/2} with probability close to .95. This suggests using as a preliminary estimator of p the smallest value of r such that |φ̂_mm| < 1.96 n^{−1/2} for m > r. (A more systematic approach to order selection based on the AICC will be discussed in Section 9.2.)

Once a value for p has been selected, the fitted process is specified by (8.2.1), (8.2.2) and (8.2.3) with m = p. Asymptotic confidence regions for the true coefficient vector φ_p and for its individual components φ_pj can be found with the aid of Theorem 8.1.1. Thus, if χ²_{1−α}(p) denotes the (1 − α) quantile of the chi-squared distribution with p degrees of freedom, then for large sample size n, the region

{φ ∈ ℝ^p : (φ̂_p − φ)′ Γ̂_p (φ̂_p − φ) ≤ n^{−1} v̂_p χ²_{1−α}(p)}   (8.2.7)

contains φ_p with probability close to (1 − α). (See Problems 1.16 and 6.14.) Similarly, if Φ_{1−α} denotes the (1 − α) quantile of the standard normal distribution and v̂_jj is the j-th diagonal element of v̂_p Γ̂_p^{−1}, then for large n the interval

{φ ∈ ℝ : |φ − φ̂_pj| ≤ n^{−1/2} Φ_{1−α/2} v̂_jj^{1/2}}   (8.2.8)

contains φ_pj with probability close to (1 − α).

EXAMPLE 8.2.1. One thousand observations x_1, ..., x_1000 of a zero-mean stationary process gave sample autocovariances γ̂(0) = 3.6840, γ̂(1) = 2.2948 and γ̂(2) = 1.8491. Applying the Durbin-Levinson algorithm to fit successively higher order autoregressive processes to the data, we obtain

φ̂_11 = ρ̂(1) = .6229,   v̂_1 = γ̂(0)(1 − ρ̂²(1)) = 2.2545,
φ̂_22 = [γ̂(2) − φ̂_11 γ̂(1)]/v̂_1 = .1861,
φ̂_21 = φ̂_11 − φ̂_22 φ̂_11 = .5070,
v̂_2 = v̂_1(1 − φ̂²_22) = 2.1764.

The computer program PEST can be used to apply the recursions (8.2.4)–(8.2.6) for increasing values of m, and hence to determine the sample partial autocorrelation function φ̂_jj, shown with the sample autocorrelation function ρ̂(j) in Figure 8.1. The bounds plotted on both graphs are the values ±1.96 n^{−1/2}. Inspection of the graph of φ̂_jj strongly suggests that the appropriate model for this data is an AR(2) process. Using the Yule-Walker estimates φ̂_21, φ̂_22 and v̂_2 computed above, we obtain the fitted process,

X_t − .5070 X_{t−1} − .1861 X_{t−2} = Z_t,   {Z_t} ~ WN(0, 2.1764).

[Figure 8.1. The sample ACF (a) and PACF (b) for the data of Example 8.2.1, showing the bounds ±1.96 n^{−1/2}.]

From Theorem 8.1.1, the error vector φ̂ − φ is approximately normally distributed with mean 0 and covariance matrix,

n^{−1} v̂_2 Γ̂_2^{−1} = n^{−1}[1 − Σ_{j=1}^{2} ρ̂(j) φ̂_{2j}] [1, ρ̂(1); ρ̂(1), 1]^{−1} = [.000965, −.000601; −.000601, .000965].

From (8.2.8) we obtain the approximate .95 confidence bounds, φ̂_i ± 1.96(.000965)^{1/2} for φ_i, i = 1, 2. These are .5070 ± .0609 for φ_1 and .1861 ± .0609 for φ_2. The data for this example came from a simulated AR(2) process with coefficients φ_1 = .5, φ_2 = .2 and white noise variance 2.25.
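Before comparing these estimates with the true coefficients, it is worth noting that the recursions (8.2.4)–(8.2.6) amount to only a few lines of code. The following sketch (an illustration of ours in Python, not part of PEST; the function name durbin_levinson is our own) reproduces the calculation of Example 8.2.1 directly from its three sample autocovariances.

```python
import numpy as np

def durbin_levinson(gamma):
    """Fit AR(m) models for m = 1, ..., len(gamma)-1 by the recursions
    (8.2.4)-(8.2.6), given sample autocovariances gamma = [g(0), ..., g(k)]."""
    g = np.asarray(gamma, dtype=float)
    phi = {1: np.array([g[1] / g[0]])}                    # phi_11 = rho(1)
    v = {0: g[0], 1: g[0] * (1 - (g[1] / g[0]) ** 2)}     # v_0, v_1
    for m in range(2, len(g)):
        # phi_mm = [g(m) - sum_{j<m} phi_{m-1,j} g(m-j)] / v_{m-1}
        phi_mm = (g[m] - phi[m - 1] @ g[m - 1:0:-1]) / v[m - 1]
        # phi_mj = phi_{m-1,j} - phi_mm * phi_{m-1,m-j},  j = 1, ..., m-1
        head = phi[m - 1] - phi_mm * phi[m - 1][::-1]
        phi[m] = np.append(head, phi_mm)
        v[m] = v[m - 1] * (1 - phi_mm ** 2)
    return phi, v

# Sample autocovariances from Example 8.2.1 (n = 1000):
phi, v = durbin_levinson([3.6840, 2.2948, 1.8491])
print(phi[1], v[1])   # approx [0.6229],          2.2545  (phi_11, v_1)
print(phi[2], v[2])   # approx [0.5070, 0.1861],  2.1764  (phi_21, phi_22, v_2)
```

The same recursion, continued to higher values of m, yields the sample partial autocorrelations φ̂_mm plotted in Figure 8.1.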
The true coeffi­ cients thus lie between the confidence bounds computed in the preceding paragraph. §8.3 Preliminary Estimation for Moving Average Processes Using the Innovations Algorithm Just as we can fit autoregressive models of orders 1 , 2, . . . , to the data x1 , , x. by applying the Durbin-Levinson algorithm to the sample auto­ covariances, we can also fit moving average models, • • • {Z1 } � WN(O, vm ), (8.3. 1 ) of orders m = 1 , 2, . . . , by means of the innovations algorithm (Proposition 5.2.2). The estimated coefficient vectors am := (Om 1 , . . . , emm )', and white noise variances vm , m = 1 2, . . . , are specified in the following definition. (The justification for using estimators defined in this way is contained in Theorem 8.3. 1 .) Definition 8.3.1 (Innovation Estimates of Moving Average Parameters). If y(O) > 0, we define the innovation estimates am , vm appearing in (8.3.1) for m = 1 , 2, . . . , n - 1 , by the recursion relations, v0 = y(O), k = 0, . . . ,m - 1 , (8.3.2) and m- 1 vm = y(O) - L e�.m-A · j=O (8.3.3) 8. Estimation for A R M A M odels 246 Theorem 8.3.1 (The Asymptotic Behavior of Om ). Let { X, } be the causal invertible ARMA process ifJ(B)X, = B(B) Z,, {Z, } "' 110(0, a2 ), EZ� < oo, and let t/l (z) = L.i=o t/lj z j = () (z)/ifJ(z), l z l :-:;; 1 , (with t/10 = 1 and t/lj = 0 for j < 0). Then for any sequence of positive integers { m(n), n = 1 , 2, . . . } such that m < n, m -> oo and m = o(n 1 13 ) as n -> oo, we have for each k, where A = [a;J �. j = l and min(i. j) aij = I t/1; - ,t/lj - r· r=1 Moreover, PROOF. See Brockwell and Davis ( 1988b). 0 Remark. Although the recursive fitting of moving average models using the innovations algorithm is closely analogous to the recursive fitting of autoregressive models using the Durbin�Levinson algorithm, there is one important distinction. For an AR(p) process the Yule� Walker estimator �P = (�P 1 , , �PP )' is consistent for cj}P (i.e. �P � cj)P ) as the sample size n -> oo . However for a n MA(q) process the estimator Oq = (Oq 1 ' . . . ' eqq)' i s not consistent for the true parameter vector 9q as n -> oo. For consistency it is necessary to use the estimators (Om 1 , , emqY of oq with { m(n)} satisfying the conditions of Theorem 8.3. 1 . The choice of m for any fixed sample size can be made by increasing m until the vector (Om 1 , , emqY stabilizes. 1t is found in practice that there is a large range of values of m for which the fluctuations in Omj are small compared with the estimated asymptotic standard deviation n - 11 2 (IJ : � 8;,d 1 1 2 as given by Theorem 8.3.1. . • • • • • . • . We know from Section 3.3 that for an MA(q) process the autocorrelations p (m), m > q, are zero. Moreover we know from Bartlett's formula (see Example 7.2.2) that the sample autocorrelation p(m), m > q, is approximately normally distributed with mean p (m) = 0 and variance n - 1 [1 + 2p 2 (1) + · · · + 2 p 2 (q)]. This result enables us to use the graph of p(m), m = 1 , 2, . . . , both to decide whether or not a given set of data can be plausibly modelled by a moving average process and also to obtain a preliminary estimate of the order q. This procedure was described in Example 7.2.2. If, in addition to examining p(m), m = 1, 2, . . . , we examine the coefficient vectors Om , m = 1, 2, . . . , we are able not only to assess the appropriateness of a moving average model and estimate its order q, but also to obtain preliminary estimates Om 1 , . . . , emq of the coefficients. We plot the values em 1 , . . . , emm• 0, 0, . 
. . for m = 1 , 2, . . . , increasing m until the values stabilize §8.3. Preliminary Estimation for Moving Average Processes 247 (until the fluctuations in each component are of order n - 1 12 , the asymptotic standard deviation of 8m 1 ). Since from Theorem 8.3. 1 the asymptotic variance of {jmj is (J/(81 ' . . . ' ej - 1 ) = n- 1 It:b ef, we also plot the bounds ± 1 .9Mj where tri = (Ji{jm 1 , . . . , em , j - 1 ). A value of {jmi outside these bounds ,suggests that the corresponding coefficient ei is non-zero. The estimate of ei is emi and the largest lag for which {jmi lies outside the bounds ± 1 .96ai is the estimate of the order q of the moving average process. (A more systematic approach to order selection using the AICC will be discussed in Section 9.2.) Asymptotic confidence regions for the coefficient vector Oq and for its individual components can be found with the aid of Theorem 8.3. 1 . For example an approximate .95 confidence interval for ei is given by { 8 E IR . 1 8 - em) � 1 .96n - 1/2 , A ( )} j- 1 ' 1 /2 . em2k k�O (8.3.4) ExAMPLE 8.3. 1 . One thousand observations x 1 , . . . , x 1 000 of a zero-mean sta­ tionary process gave sample autocovariances y(O) = 7.554 1 , y (l) = - 5. 1 24 1 and y (2) = 1 .3805. The sample autocorrelations and partial autocorrelations for lags up to 40 are shown in Figure 8.2. They strongly suggest a moving average model of order 2 for the data. Although five sample autocorrelations at lags greater than 2 are outside the bounds ± 1 .96n- 1 12 , none are outside the bounds ± 1.96n - 1 12 [ 1 + 2p 2 ( 1 ) + 2p 2 (2) ] 112 . Applying the innovations algorithm to fit successively higher moving average processes to the data, we obtain v0 = 7.5541 , {} , , p(l) = - .67832, v 1 = y(O) - {}f, v0 = 4.0785, {}22 = v()' ]1 (2) = . 1 8275, {}2 1 = V� 1 [Y ( 1 ) - {j22 {jl l (j0 ] = - 1 .0268, V2 = y(O) - 8i2 Do 8i 1 V 1 = 3.0020. = - Option 3 of the program PEST can be used to appl}' the recursions (8.3.2) and (8.3.3) for larger values of m. The estimated values emi , j = 1, . . . , 1 0 and vm are shown in Table 8. 1 for m = 1 , , . . , 1 0, 20, 50 and 1 00. It is clear from the table that the fluctuations in the coefficients from m = 7 up to 1 00 are of order l 000 - 1 12 = .032. The values of 87i , j = 1 , . . . , 7, plotted in Figure 8.3 confirm the MA(2) model suggested by the sample autocorrelation function. The model fitted to the data on the basis of 07 is X, = Z, - 1 .4 1 Z,_ 1 + .60Z,_ 2 , {Z, } � WN(0, 2.24). (8.3.5) In fact from Table 8.1 we see that the estimated coefficients show very little change as m varies between 7 and 1 00. 8. Estimation for ARMA Models 248 1 �-------, 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0. 1 0 - 0. 1 -0.2 - 0.3 -0.4 - 0.5 - 0.6 -07 -0.8 - 0.9 -1 {iJ Ill � '.,( "' )o( )<>{ "' v ��±/� \ �������������������� � ������� = OH::r n..Q -to 19-t' 20 10 0 -= Q. 30 � � � 40 (a) 1 0.9 0.8 07 0.6 0.5 0.4 0 . .3 () 2 0.1 0 -0. 1 -0 2 -0.3 -0.4 -0.5 =� � -0.8 -0.9 -1 i/ lf======��d�� ������������������� 1 ""=.R-Et )a' CJ .= '-""1'? n ..-Iii',;;/ ,_...c:tcr � 0 10 20 30 40 (b) Figure 8.2. The sample ACF (a) and PACF (b) for the data of Example 8.3. 1 , showing the bounds ± ! .96n - 112 • 249 §8.3. Preliminary Estimation for Moving Average Processes Table 8. 1 . Bmi ' j = 1 , . . . 
, 1 0, and vm for the Data of Example 8.3.1 (}mj � 1 2 3 4 5 6 7 8 9 10 20 50 1 00 1 2 - 0.68 - 1 .03 - 1 .20 - 1.31 - 1 .38 - 1 .4 1 - 1 .41 - 1 .4 1 - 1 .4 1 - 1 .4 1 - 1 .43 - 1 .43 - 1 .43 .18 .37 .44 .5 1 .57 .60 .61 .61 .61 .63 .62 .62 4 3 .03 - .04 - .03 - .0 1 - .0 1 - .02 - .02 - .02 - .03 - .02 - .03 .07 - .04 - .02 - .02 - .03 - .03 - .02 - .02 - .02 - .0 1 5 ()m 7 6 .06 .10 .10 .10 .10 .12 .11 .12 .11 - .02 - .05 - .07 - .08 - .07 - .08 - .08 - .08 8 - .0 1 - .02 - .02 .00 .00 .00 - .0 1 .00 .01 .05 .03 .03 .04 9 10 .01 .04 .02 .02 .01 4.08 3.00 2.65 2.40 2.27 2.24 2.24 2.24 2.24 2.22 2. 1 6 2. 1 0 2.00 .02 - .03 - .03 - .03 An alternative method for obtaining preliminary estimates of the coeffi­ cients (once q has been determined) is to equate the theoretical and sample autocorrelations at lags 1, . . . , q and solve the resulting non-linear equations for 81 , . . . , eq . Using the algorithm of Wilson ( 1 969) to determine the solution 0 8 0.6 0.4 0.2 - 0 -0.2 � .'\. -----... ------ � -0.4 -0.6 -0.8 -1 - 1 .2 - 1 .4 - 1 .6 0 2 3 4 5 6 7 Figure 8.3. The estimates e7i, j = 1 , . . . , 7, for the data of Example 8.3 . 1 , showing the bounds ± 1 .96(I{ : � e�k) 1 1 2 n - 112 . 250 8. Estimation for ARMA Models for (8 1 , 82 ) such that 1 + 81 z + 82 z 2 X, = Z, - 1 .49Z,_ 1 + X, = Z, - 1 40Z, 1 + 0 for l z l .67Zi _ 2 , 1 , we arrive at the model, { Z,} - WN(0, 2.06). .60Z, _ 2 , { Z,} f= < The actual process used to generate the data in this example was the Gaussian moving average, . _ � WN(O, 2.25). It is very well approximated by the preliminary model (8.3.5). §8.4 Preliminary Estimation for ARMA(p, q) Processes Let {X,} be the zero-mean causal ARMA(p, q) process, X, - r/J1 X,_ 1 - · · · - r/Jp Xr-p = Z, + 8 1 Z,_ 1 The causality assumption ensures that + ··· + 8qZr-q• (8.4. 1 ) { Z, } - WN(O, (J 2 ). 00 X, = L t/lj Zr-j • j=O where by (3.3.3) and (3.3.4), the coefficients t/Ji satisfy {t/10 = 1, t/lj = 8j + min (j,p) i� r/J; t/lj - i • j = (8.4.2) 1, 2, . . . and by convention, 8i = 0 for j > q and r/Ji = 0 for j > p. To estimate t/l t , . . . , t/lp+q • we can use the innovation estimates (jm l , . . . , em ,p+q • "';:hose asymptotic behaviour is specified in Theorem 8.3. 1 . Replacing t/li by 8mi in (8.4.2) and solving the resulting equations, min(j,p) emj = 8j + L ,pi em .j -i • i= 1 j = 1 , 2, " ' (8.4.3) , p + q, for ell and 0, we obtain initial parameter estimates � and 0. From equations (8.4.3) with j = q + 1, . . . , q + p, we see that � should satisfy the equation, em . q+ t em , q+ 2 [ l[ � em, +p = em , q em ,q+l � em, + p - 1 Having solved (8.4.4) for then easily found from : : : �m,q+l - p 8m ,q+2 - p . . . em , q r/J1 l [rPz] .. . r/Jp . (8.4.4) cf, (which may not be causal), the estimate of 0 is 25 1 §8.4. Preliminary Estimation for ARMA(p, q) Processes 0. � 0.8 �--- - 0.7 0.6 ------- \ 0.5 0.4 0.3 0.2 0. 1 0 -0.1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1 0 10 20 30 40 20 30 40 (a) 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -0.1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 - 0 .9 - 1 0 10 (b) Figure 8.4. The sample ACF (a) and PACF (b) for the data of Example 8.4. 1 , showing the bounds ± l .96n - 112• 8. Estimation for ARMA Models 252 j = 1 , 2, . . . , q. (8.4.5) Finally the white noise variance CJ 2 is estimated by In the case of a pure moving average process, p = 0 and the method reduces to the one described in Section 8.3. EXAMPLE 8.4. 1 . 
The sample autocorrelation function and partial autocorrela­ tion function of a zero-mean time series of length 200 are shown in Figure 8.4. Identification of an appropriate model is much less obvious than in Examples 8.2. 1 and 8.3. 1 . However we can proceed as follows. First use program PEST, Option 3, to fit a moving average model (8.3. 1 ), with m chosen so as to give the smallest AICC value. (The AICC is a measure of goodness of fit, defined and discussed later in Section 9.3.) For this example the minimum occurs when m = 8 and the corresponding moving average model has coefficients as follows : Table 8.2. B8. i , j = 1 , . . . , 8, for the Data of Example 8.4. 1 1 1 .341 2 1 .0 1 9 3 .669 4 .423 5 .270 6 . 1 29 7 .0 1 1 8 -.115 The next step is to search for an ARMA(p, q) process, with p and q small, such that the equations (8.4.3) are satisfied with m = 8. For any given p and q (with p + q :<:::; 8), the equations (8.4.3) can be solved for q, and 9 using Option 3 of PEST with m set equal to 8. At the same time the program computes the AICC value for the fitted model. The procedure is repeated for values of p and q such that p + q :<:::; 8 and models with small AICC value are noted as potentially useful preliminary models. In this particular example the AICC is minimized when p = q = 1 and the corresponding preliminary model is {Z,} � WN(O, 1 .097). X, - .760X, _ 1 = Z, + .582Z, _ 1 , This has a close resemblance to the true model, X, - .8X, _ 1 = Z, + .6Z, _ 1 , with {Z,} � WN(O, 1 ), which was used to generate the data. In general the resemblance will not be so close, so it is essential that preliminary estimation be followed by application of a more efficient procedure (see Section 8.5). For larger values of p and q, it is preferable to carry out the search procedure using maximum likelihood estimation (Option 8 of PEST) without preliminary estimation. Thus we can fit maximum likelihood models with p + q = 1 , then p + q = 2, p + q = 3, . . . , using lower order models with appended zero coefficients as initial models for the likelihood maximization. (See Sections 8.7 and 9.2.) §8.5. Remarks on Asymptotic Efficiency 253 §8.5 Remarks on Asymptotic Efficiency The preliminary estimates (�, 9, 8 2 ) of the parameters in the ARMA(p, q) model discussed in Section 8.4 are weakly consistent in the sense that p A p p q, ---> q,, 0 ---> 0 and 8 2 ---> a 2 as n ---> oo . A This i s because (with m(n) satisfying the conditions o f Theorem 8.3. 1 ) B i � 1/Ji m and O � a 2 • Hence (�, 0) must converge in probability to a solution of (8.4.2), m i.e. to (q,, O). In fact using Theorem 8.3. 1 , it may be shown (see Problem 8.22 and Brockwell and Davis ( 1 988a)) that � q, + Op(n- 112 ) and 9 = 0 + Op(n- 112 ). = In the next section we discuss a more efficient estimation procedure (strictly more efficient if q � 1) of (<j>, 9) based on maximization of the Gaussian likelihood. We first introduce, through an example, the concept of relative efficiency of two competing estimators. Consider the MA( 1 ) process X, Z, + 8Z, _ 1 where I 8 1 < 1 and { Z,} "' IID(O, a 2 ). If (J�l J and 8�21 are two estimators of 8 based on the observations X 1 , . . , Xn such that 0�1 is AN(8, aN8)/n), i = 1 , 2, then the asymptotic efficiency of 0�1 l relative to 0�2 l is defined to be ai (8) e(8 , (Jo l, (J< 2l ) . ai {8) = . = (This notion of efficiency extends in an obvious way to more general estimation problems.) 
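As a quick numerical check on the comparisons worked out below, the asymptotic variances derived in the remainder of this section — σ₁²(θ) = (1 + θ² + 4θ⁴ + θ⁶ + θ⁸)/(1 − θ²)² for the moment estimator, 1 for the innovations estimator and 1 − θ² for the maximum likelihood estimator — can be converted into relative efficiencies in a few lines. This is only an illustrative Python sketch (not part of PEST) and presupposes those variance formulas.

```python
# Relative efficiencies of the moment (1), innovations (2) and maximum
# likelihood (3) estimators of theta in an MA(1), using the asymptotic
# variances sigma_1^2(theta), 1 and 1 - theta^2 quoted in this section.
def var_moment(theta):
    return (1 + theta**2 + 4*theta**4 + theta**6 + theta**8) / (1 - theta**2)**2

for theta in (0.25, 0.50, 0.75):
    e12 = 1.0 / var_moment(theta)        # e(theta, estimator 1, estimator 2)
    e23 = 1.0 - theta**2                 # e(theta, estimator 2, estimator 3)
    print(f"theta={theta:4.2f}  e(1,2)={e12:.2f}  e(2,3)={e23:.2f}")
# Output matches the tables in this section:
#   e(1,2) approx .82, .37, .06  and  e(2,3) approx .94, .75, .44
```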
If e(8, {J< l J, 0( 2 )) ::::; 1 for all 8 E ( - 1, 1 ) then we say that 0�2) is a more efficient estimator of 8 than e�l ) (strictly more efficient if in addition e(8, 8< 1 1, 8( 2 )) < 1 for some 8 E ( - 1, 1 )). For the MA( 1 ) process let {J�l l denote the moment estimator of 8 obtained by solving the equations y(O) 8 2 ( 1 + 0 2 ) and y(1 ) 8 2 0 for 8 and e. If I !J( l ) l > !- there is no real solution {J so we define {J sgn(p( 1 )). If I P{ 1 ) 1 ::::; !- then p(l) = ()� 1 )/(1 + ( 8� 1 )) 2 ). = = = In general therefore we can write, 8� 1 ) where { -1 (1 - ( 1 1 = g(p(1)) if x < 1 4x 2 ) 112 )/2x if l x l ::::; !-, g(x) if x > }. From Theorem 7.2.2, p( 1 ) is AN(p(1 ), ( 1 - 3p 2( 1 ) + 4p 4( 1))/n), and so by Proposition 6.4. 1 , B� 1 l i s AN( g (p(1 )), af(8)/n), = where - - , 254 8. Estimation for ARMA Models a f (8) = [g'(p( 1 ))] 2 [ 1 - 3 p 2 ( 1 ) + 4p4(1 )] = ( 1 + ()2 + 484 + 86 + 88 )/( 1 - 82 )2 . If we now define 0�2 ) = em ! ' the estimator obtained from the innovations algorithm, then by Theorem 8.3. 1 , 0�2 ) i s AN(8, n -1 ). Thus e(0, 8< 1 ), 0(2 )) = aj 2 (8) :::;; 1 for all 181 < 1 , with strict inequality when 8 # 0. In particular { .82, 8 = .25, 8 = .5o, e(8, 8< 1 ), 0(2 )) = .37, 8 = .75, .06, 1 demonstrating the superiority of 0�2 ) over 0� ). We shall see in Example 8.8.2 that the maximum likelihood estimator 0�3 ) is AN(8, ( 1 - 8 2 )/n). Hence { 8 = .25, .94, 8 = .5o, e(8, 0(2 ), 0(3 )) = .75, 8 = .75 . .44, While 0�3 ) is more efficient, 0�2 ) has reasonably good efficiency except when 1 8 1 i s close t o 1 . The superiority of maximum likelihood estimators from the point of view of asymptotic efficiency holds for a very large class of time-series models. §8.6 Recursive Calculation of the Likelihood of an Arbitrary Zero- Mean Gaussian Process In this section { X1 } is assumed to be a Gaussian process with mean zero and covariance function K(i,j) = EXiXj . Let xn = (X 1 ' . . . ' Xn Y and let xn = (X 1 , , X" )' where X 1 = 0 and Xi = E(Xii X 1 , • • • , Xi - d = PS!i(x, x1_ . ) Xi, j � 2. Let r" denote the covariance matrix, r" = E(X" X�), and assume that r" is non-singular. The likelihood of X" is • .... . . • (8.6. 1 ) The direct calculation o f det r" and r"- 1 can be avoided b y expressing this in terms of the one-step predictors Xi , and their mean squared errors vi _ 1 ,j = 1 , . . . , n, both of which are easily calculated recursively from the innovations algorithm, Proposition 5.2.2. Let 8ii, j = 1, . . , i; i = 1, 2, . . , denote the coefficients obtained when Proposition 5.2.2 is applied to the covariance function K of { X1 }, and let 8i0 = 1 ' 8ij = 0 for j < 0, i = 0, 1 ' 2, . . . . Now define the n X n lower triangular matrix, . . 255 §8.6. Recursive Likelihood Calculation and the n x C = [ 8;.;-J?.}� 0, n diagonal matrix, (8.6.2) D = diag(v0, v 1 , . . . , vn - d · (8.6.3) The innovations representation (5.2. 1 5) of Xi ,j = 1, . . . , n, can then be written in the form, X" = (C - J) (X" - X"), where I is the n x n identity matrix. Hence X" = X" - X n + X n = C(Xn - X") . (8.6.4) Since D is the covariance matrix of (X" - X"), it follows that 1" = CDC' (8.6.5) (from which the Cholesky factorization 1" = U U', with U lower triangular, can easily be deduced). From (8.6.4) and (8.6.5), we obtain n x� rn- 1 X n = (X" - xn yD- 1 (Xn - X") = L (Xj - xy;vj-1 • j=l (8.6.6) and det 1" = (det C)2 (det D) = v 0 v 1 · · · vn-! · The likelihood (8.6. 1 ) of the vector X" therefore reduces to L ( 1") = (2 n) - "1 ( v0 · · · vn-! 
) - 1 12 exp 2 { -t� (Xi - (8.6.7) } XY/vj - 1 . (8.6.8) Applying Proposition 5.2.2 to the covariance function K gives X 1 , X 2, . . . , v0 , v 1 , . . , and hence L(1"). If rn is expressible in terms of a finite number of unknown parameters /31 ' . . . , {3,, as for example when { X1 } is an ARMA(p, q) process and r = p + q + 1 , i t is usually necessary t o estimate the parameters from the data X"" A standard statistical procedure in such situations (see e.g. Lehmann ( 1 983)) is to maxi­ mize the likelihood L(/31 , . . . , /3, ) with respect to /31 , . . . , {3,. In the case when are independently and identically distributed, it is known that X1 , X2 , under rather general conditions the maximum likelihood estimators are consistent as n --> oo and asymptotically normal with variances as small or smaller than those of any other asymptotically normal estimators. A natural estimation procedure for Gaussian processes therefore is to maximize (8.6.8) with respect to {3 1 , . . . , {3,. The dependence of the sequence {X" } must however be kept in mind when studying the asymptotic behaviour of the estimators. (See Sections 8.8, 8. 1 1 and 1 0.8 below.) Even if {X,} is not Gaussian, it makes sense to regard (8.6.8) as a measure of the goodness of fit of the covariance matrix rn (/31 ' . . . ' /3, ) to the data, and . • • • 256 8. Estimation for ARMA Models still to choose the parameters {31 , . . • , {3, in such a way as to maximize (8.6.8). We shall always refer to the estimators /31 , . . . , /3, so obtained as "maximum likelihood" estimators, even when { Xr } is not Gaussian. Regardless of the joint distribution of X 1 , . . . , x., we shall also refer to (8.6. 1 ) (and its algebraic equivalent (8.6.8)) as the "Gaussian likelihood" of X 1 , . . . , X• . §8.7 Maximum Likelihood and Least Squares Estimation for ARMA Processes Suppose now that { Xr } is the causal ARMA(p, q) process, Xr = f/J 1 Xr - t + · · · + f/JpXr - p + eozr + · · · + eqzr - q• {Zr } � WN(O, a2), (8.7. 1) where e0 = 1 . The causality assumption means that 1 - f/J 1 z - · - f/Jp z P -:/- 0 for l z l � 1 . To avoid ambiguity we shall assume also that the coefficients e; and white noise variance a2 have been adj usted (without affecting the autoco­ variance function of { Xr } ) to ensure that e(z) = 1 + e1 z + + eq z q -:/- 0 for l z l < 1 . Our first problem is to find maximum likelihood estimates of the parameter vectors cj) = (f/J1 , . . . , ¢JP )', 6 = (e1 , . . . , eq )' and of the white noise variance a2• In Section 5.3 we showed that the one-step predictors X;+t and their mean squared errors are given by, · · · · · and (8.7.3) where eij and r; are obtained by applying Proposition 5.2.2 to the covariance function (5.3.5). We recall also that eij and r; are independent of a2• Substituting in the general expression (8.6.8), we find that the Gaussian likelihood of the vector of observations x. = (X 1 , . . . , X.)' is [ L(cj}, 6, a2) = (2na2) - "12(r0 . . · r. - 1 r 1 12 exp - t a - 2 j� (Xj - XYh-1 ] . (8. 7.4) Differentiating In L(cj), 6, a2) partially with respect to a2 and noting that Xj and rj are independent of a2, we deduce (Problem 8. 1 1) that the maximum likelihood estimators �, 9 and r'J2 satisfy §8.7. Maximum Likelihood and Least Squares ARMA Estimation 257 (8.7.5) where n " (Xi - X�) 2h 1 , S(<j>, 9) = L... j=1 A A and cj,, 0 are the values of <j>, 9 which minimize (8.7.6) n /(<j>, 9) = ln(n- 1 S(<j>, 9)) + n - 1 L ln ri_ 1 . j=1 (8.7. 7) We shall refer to /(<j>, 9) as the "reduced likelihood". 
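To make the role of the innovations algorithm concrete, here is a minimal sketch (in Python; our own illustration rather than the PEST implementation, and the function names innovations and reduced_likelihood_ma1 are ours) of how ℓ(φ, θ) in (8.7.7) could be evaluated for a pure moving average model, for which the scaled covariances κ(i, j) = γ(i − j)/σ² are available in closed form.

```python
import numpy as np

def innovations(kappa, n):
    """Innovations algorithm (Proposition 5.2.2) for a covariance function
    kappa(i, j), 1-indexed.  Returns theta[m] = (theta_m1, ..., theta_mm)
    and the mean squared errors v[0], ..., v[n-1].  Written for clarity,
    not speed."""
    v = np.zeros(n)
    theta = [None] * n                     # theta[m][j-1] = theta_{m,j}
    v[0] = kappa(1, 1)
    for m in range(1, n):
        theta[m] = np.zeros(m)
        for k in range(m):                 # computes theta_{m, m-k}
            s = sum(theta[k][k - 1 - j] * theta[m][m - 1 - j] * v[j]
                    for j in range(k))
            theta[m][m - 1 - k] = (kappa(m + 1, k + 1) - s) / v[k]
        v[m] = kappa(m + 1, m + 1) - sum(theta[m][m - 1 - j] ** 2 * v[j]
                                         for j in range(m))
    return theta, v

def reduced_likelihood_ma1(theta1, x):
    """l(theta) of (8.7.7) for an MA(1); here kappa(i,j) = gamma(i-j)/sigma^2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    def kappa(i, j):
        h = abs(i - j)
        return 1 + theta1 ** 2 if h == 0 else (theta1 if h == 1 else 0.0)
    th, r = innovations(kappa, n)          # r[j-1] plays the role of r_{j-1}
    xhat = np.zeros(n)                     # one-step predictors, xhat[0] = 0
    for m in range(1, n):
        xhat[m] = sum(th[m][j] * (x[m - 1 - j] - xhat[m - 1 - j])
                      for j in range(m))
    S = np.sum((x - xhat) ** 2 / r)
    return np.log(S / n) + np.mean(np.log(r))
```

Handing the last function to a one-dimensional optimizer over |θ| < 1 and then setting σ̂² = n^{−1} S(θ̂) as in (8.7.5) gives the maximum likelihood estimates; for general ARMA(p, q) models PEST carries out the analogous multi-parameter search, using the covariance function (5.3.5) in place of the MA covariances used here.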
The calculation of /(<j>, 9) can easily be carried out using Proposition 5.2.2 which enables us to compute 8; - 1 , i , r; _ 1 and X; recursively for any prescribed pair of parameter vectors <j>, 9. A non-linear minimization program is used in the computer program PEST, in conjunction with the innovations algorithm, to search for the values of 4> and 9 which minimize /(<j>, 9). These are the maximum likelihood estimates of 4> and 9 respectively. The maximum likelihood estimator of a 2 is then found from (8.7.5). The search procedure may be greatly accelerated if we begin with parameter values <j>0 , 90 which are close to the minimum of /. It is for this reason that simple, reasonably good preliminary estimates of 4> and 9, such as those described in Sections 8.2, 8.3 and 8.4, are important. It is essential to begin the search with a causal parameter vector <j>0 since causality is assumed in the computation of l(<j>, 9). Failure to do so will result in an error message from the program. The estimate of 4> returned by the program is constrained to be causal. The estimate of 9 is not constrained to be invertible, although if the initial VeCtOr 9o satisfies the condition 1 + 8o 1 Z + . . . + 8o qZ q i= 0 for lzl < 1 and if (<j>0 , 90) is close to the minimum, then it is likely that the value of 0 returned by the program will also satisfy 1 + 81 z + · · · + eq z q i= o for 1 z 1 < 1 . I f not, i t is a simple matter t o adjust the estimates of a 2 and 9 i n order to satisfy the condition without altering the value of the likelihood function (see Section 4.4). Since we specified in (8. 7. 1) that 8(z) i= 0 for l zl < 1, the estimates 0 and 6-2 are chosen as those which satisfy the condition e(z) i= 0 for lzl < 1 . Note however that this constraint i s not always desirable (see Example 9.2.2). An intuitively appealing alternative estimation procedure is to minimize the weighted sum of squares n S(<j>, 9) = " (8.7.8) (Xi - X�)2 h - 1 , jL... =1 with respect to 4> and 9. The estimators obtained in this way will be referred to as the "least squares" estimators � and 9 of 4> and 9. In view of the close relationship (8. 7.7) between l(<j>, 9) and S(<j>, 9), the least squares estimators can easily be found (if required) using the same computer program PEST. For the minimization of S(<j>, 9) however, it is necessary not only to restrict <!> to be causal, but also to restrict 9 to be invertible. Without the latter constraint 258 8. Estimation for ARMA M odels there will in general be no finite (cp, 9) at which S achieves its minimum value (see Problem 8.1 3). If n - 1 L J= t In rj-t is asymptotically negligible compared with In S(cp, 9), as is the case when 9 is constrained to be invertible (since r" -> 1 ), then from (8.7.7), minimization of S will be equivalent to minimization of l and the least squares and maximum likelihood estimators will have similar asymptotic properties. The least squares estimator afs is found from (8.7.9) where the divisor (n - p - q) is used (as in standard linear regression theory) since a - 2 S(cji, 0) is distributed approximately as chi-squared with (n - p - q) degrees of freedom (see Section 8.9). §8.8 Asymptotic Properties of the Maximum Likelihood Estimators If {X, } is the causal invertible process, x, - r/J1 x, _ 1 - · · • - r/Jp Xr - p = z, + 81 z,_ 1 + · · · + 8qZr-q • { Z, } � 110(0, a 2 ), (8.8. 1 ) and i f r/J ( · ) and 8 ( · ) have n o common zeroes, then the maximum likelihood estimator �� = (J1 , . . . , Jp , () 1 , . . . 
, eq ) = (cf,', f)') is defined tO be the CaUSal inver­ tible value of �' = W, 9') which minimizes the reduced likelihood /(cp, 9) defined by (8.7.7). The program PEST can be used to determine cf,, 0 numerically. It also gives the maximum likelihood estimate 8 2 of the white noise variance determined by (8.7.5). The least squares estimators cji, 9 are the causal invertible values of cp and 9 which minimize ln(n - 1 S(cp, 9)) = /(cp, 9) - n - 1 I, }= 1 ln rj _ 1 • Because of the invertibility the term n- 1 L }= 1 ln rj - t is asymptotically negligible as n -> oo and the estimators cji and 9 have the same asymptotic properties as cf, and 0. It follows, (see Theorem 1 0.8.2), that if { Z, } � 110(0, a 2 ) and rp( - ) and 8( " ) are causal and invertible with no common zeroes, then (8.8.2) where the asymptotic covariance matrix V(�) can be computed explicitly from (8. 1 1 . 14) (see also ( 10.8.30)). Specifically for p � 1 and q � 1, Eu,v; - 1 (8.8.3) EVt U't EVt Vt' ' where U, = ( U,, . . . , U, + 1 - p )', V, = ( V, , . . . , V,+ 1 -q)' and { U, } , { V, } are the auto­ V ( �) regressive processes, = a 2 [Eu,u; rp (B) U, = Z,, J (8.8.4) §8.8. Asymptotic Properties of the Maximum Likelihood Estimators 259 and (8.8.5) (For p = 0, V(p) = 0" 2 [EV1V;rt, and for q = 0, V(p) = 0"2 [EU1U;r1 .) We now compute the asymptotic distributions for several special cases of interest. EXAMPLE 8.8. 1 (AR(p)). From (8.8.3), V(cp) = 0"2 [EUiu;r 1 , • where r/J(B) U1 = Z1 Hence V(cp) = (}2 r; 1 , where rP = E(U1 U; ) = [EX; Xi ] fi= 1 , and <f, is AN(cp, n - 1 0" 2 rp- 1 ). In the special cases p = 1 and p = 2 it is easy to express rP- 1 in terms of cp, giving the results, EXAMPLE 8.8.2 (MA(q)). From (8.8.3) where 8 (B) J!; = Z1 • Hence V(O) = 0" 2 [EVI v;r t , v(o) = 0" 2 cqrt. where rq* is the covariance matrix [E V; J.j] i. i= 1 of the autoregressive process J!; + e1 J! ; _ 1 + · · · + eq J!; _ q = zl . Inspection of the results of Example 8.8. 1 yields, for the cases MA(l) and MA(2), EXAMPLE 8.8.3 (ARMA( l , 1 )). In this case V(r/J, 8) = 0" 2 [ E U,l E U1 J!; • J E UI v.I -1 , E J!;2 where U1 - r/J U1_ 1 = Z1 and J!; + 8 J!; _ 1 = Z1 A simple calculation gives, 260 8. Estimation for ARMA Models whence These asymptotic distributions provide us with a general technique for computing asymptotic confidence regions for <!> and 0 from the maximum likelihood or least squares estimates. This is discussed in more detail, together with an alternative technique based on the likelihood surface, in Section 8.9. §8.9 Confidence Intervals for the Parameters of a Causal Invertible ARMA Process Large-sample confidence regions for the coefficients <j) and 0 of a causal invertible ARMA process can be derived from the asymptotic distribution of the maximum likelihood estimators in exactly the same way as those derived from the asymptotic distribution of the Yule-Walker estimator of<!> in Section 8.2. For the process (8.8. 1 ) let P' = (<j)', 0' ) and let p be the maximum likelihood estimator of p. Then defining V(p) by (8.8.3) we obtain the approximate ( 1 a) confidence region for p, - (8.9. 1 ) Writing vjj(p) for ther diagonal element o f V(p), we also have the approximate (1 ct) confidence region for f3i , i.e. - { f3 E IR . I f3 0 /3) s A - n - 1/2 <1> 1 -a/2 Vii1/2 (p) } · A (8.9.2) An alternative approach, based on the shape of the reduced likelihood surface, l(p) = l(<j>, 0), near its minimum can also be used. We shall assume for the remainder of this section that { X, } is Gaussian. 
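Before turning to that alternative, note that the intervals (8.9.2) are straightforward to evaluate once the maximum likelihood estimates are available. The sketch below (Python, our own; not part of PEST) does this for an ARMA(1,1), under the assumption that the simple calculation referred to in Example 8.8.3 gives EU_t² = σ²/(1 − φ²), EV_t² = σ²/(1 − θ²) and EU_tV_t = σ²/(1 + φθ), so that σ² cancels when V(φ, θ) = σ²[E(W_tW_t′)]^{−1} is formed. The numerical values used at the end are simply the estimates of Example 8.4.1 and are for illustration only.

```python
import numpy as np

def arma11_confidence_intervals(phi_hat, theta_hat, n, z=1.96):
    """Approximate 95% intervals (8.9.2) for phi and theta in a causal
    invertible ARMA(1,1), using V(phi, theta) as in Example 8.8.3."""
    # Second-moment matrix of (U_t, V_t) with the factor sigma^2 removed:
    m = np.array([[1 / (1 - phi_hat ** 2),        1 / (1 + phi_hat * theta_hat)],
                  [1 / (1 + phi_hat * theta_hat), 1 / (1 - theta_hat ** 2)]])
    V = np.linalg.inv(m)                    # sigma^2 cancels: V = m^{-1}
    half = z * np.sqrt(np.diag(V) / n)
    return ((phi_hat - half[0], phi_hat + half[0]),
            (theta_hat - half[1], theta_hat + half[1]))

# Using the estimates of Example 8.4.1 (phi = .76, theta = .58, n = 200):
print(arma11_confidence_intervals(0.76, 0.58, 200))
# roughly .76 +/- .10 for phi and .58 +/- .12 for theta
```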
For large n, the invertibility assumption allows us to approximate n exp(/(p)) (since -1 n I7�6 ln r; -+ 0) by (8.9.3) where (8.9.4) and the maximum likelihood estimator p by the value of p which minimizes S(p), i.e. by the least squares estimator. The behavior of P can then be investigated by making a further approx­ imation which reduces the problem to a standard one in the theory of the §8.9. Confidence Intervals for ARMA Parameters 26 1 general linear model. To do this we define T = CT - 1 CD 112 , where C and D were defined in Section 8.6, and let Wn(�) = T - 1 Xn Then by (8.6.5), TT' = Gn(�) and by (8.6.4), the i1h component l¥.t i of l¥.t(�) is (Xi - XJ/rl�� . The problem of finding is thus equivalent to minimizing = Wn(�)'Wn(�) with respect to �· Now make the approximation that for each j, awn;a{3j is constant in a small neighborhood of the true parameter vector �*, and let p S(�) x a l¥.t i = _ [ a� (f3*)]p+j q 1 . i. = If �0 is some fixed vector in this neighborhood, we can then write I.e. (8.9.5) Yn = X �* + Wn(�*), where Y n is the transformed vector of observations, Wn(�0) + X �0 , and the components of Wn(�*) are independent with the distribution N(O, CT2). Equation (8.9.5) is a linear model of standard type, and our estimator P is the value of � which minimizes the squared Euclidean distance, (8.9.6) The standard theory of least-squares estimation for the general linear model (see Section 2.6 and Problem 2. 1 9) suggests the approximations, "' 2 2 2 2 and and q), with q) p CT X (n "' CT X (p + approximately independent. These observations yield the ( 1 - a) confidence regions: (S p) S(�*) S-(pS) (p) S(�*) - S(p) { � E p+q .. S(�) -, S(p) s p + q F1 _,(p + q, n - p q) , S(�) n - p - q } { S(p) S V S 2 S(p) , CT 2 : V E IH : 1H � * .. - X 21 - of2 (n - P - q) Xo!2 (n - P - q) } (8.9.7) (8.9.8) where F, and x; denote the a-quantiles of the F and x2 distributions. Since the function can be computed for any value of � using the program PEST, these regions can be determined numerically. Marginal confidence intervals for the components of �* can be found analogously. These are: S : {3* 1 {/3· S(�)S(�)S(p) 1 E IH : � s t i -o!2 (n - p - q) for n-p-q some � E !Hp +q with j'h component �} . (8.9.9) 8. Estimation for ARMA Models 262 §8. 1 0* Asymptotic Behavior of the Yule-Walker Estimates Throughout this section, it is assumed that {X, } is a causal AR(p) process (8. 1 0. 1 ) where { Z, } "' IID(O, a 2 ). The Yule-Walker estimates o f cp and a 2 are given by equations (8. 1 .3) and (8. 1 .4), or equivalently by � = f; 1 yp and 8 2 = y(O) - y��- lt will be convenient to express (8. 1 0. 1 ) in the form where Y = (X 1 , • • • l Y = Xcp + Z , Xn)', Z = (Z1 , . . . , Zn)' and X is the n Xo x _ 1 . . . x, �, Xo . x2 - p x1 X= . . � J x (8. 10.2) p design matrix, xn - p x - 1 xn -2 Because of the similarity between (8. 10.2) and the general linear model (see Section 2.6 and Problem 2.1 9), we introduce the "linear regression estimate" cp * of cp defined by (8. 1 0.3) cp * = (X' X) - 1 X'Y. The vector cp* is not an estimator in the usual sense since it depends on the values X 1 _ P ' X2 _ P' . . . , X" and not only on the observed values X 1 , , X". Nevertheless, as we shall see, cp* and � have similar properties. This is because n -1 (X'X) is the matrix whose (i,j)-element is equal to n - 1 l:� =� -i Xk Xk+ l i -il and n - 1 X'Y is the vector whose i'h component is n- 1 l:�=� - i XkXk +i · Conse­ quently (see Proposition 7.3.5), • . • (8. 1 0.4) The proof of Theorem 8. 1 . 
1 is divided into two parts. First we establish the limit distribution for cp* and then show that cp* and � must have the same limit law. Proposition 8.10.1. With cp * defined as in (8. 1 0.3) n 112 (cp* - cp) = N(O, a 2 rp- 1 ). PROOF. From (8. 10. 1 ) and (8. 10.2) we have §8. 10.* Asymptotic Behavior of the Yule-Walker Estimates n 112 (cj)* - cj)) = = By setting U, = (X,_ 1 , • • . 263 n 112 ((X'X) -1 X'(Xcj} + Z) - cj)) n (X'Xr 1 (n -112 X' Z). , X,_ P)'Z,, t � 1 we have " n -1!2 X' Z = n -1/2 L U, . t= 1 Observe that E U, = 0 and that , rr 2 rP , h = 0, E U,U,+h = h # 0, ' Op x p since Z, is independent of (X,_ 1 , , X,_P). Let X, = L}= o t{lj Zr -j be the causal representation of X,. For a fixed positive integer m set x�ml = L i= o t{lj Zr -j and U!ml = (X!�l , . . . , X� �� )' Z, and let A. be a fixed element of W. Then A.' U!ml is a strictly. stationary (m + p)-dependent white noise sequence with variance given by rr2 A.T�mlA. where qml is the covariance matrix of (X! �[ , . . . , x:�� ). Hence by Theorem 6.4.2, we have { • • • " n - 1 12 L A.' u:m) = A.'V(m) where v<m) � N(O, rr 2 qm1 ) . t= 1 Since rr 2 r�ml --> rr2 rP as m --> co, we have A.'V<ml = V where V � N (0, rr 2 rp ). Also it is easy to check that (� n -1 Var A.' � (U!m1 - U, ) ) = A.' E [(U, - U!ml) (U, - U!ml)' ] A. --> 0 as m --> co . Since x�mJ � X, as m --> oo , application of Proposition 6.3.9 and the Cramer­ Wold device gives n- 1;2 X'Z = N(O, rr2 rp ). It then follows from (8. 1 0.4) that n (X'X)- 1 � rP-1 , from which we conclude by Propositions 6.3.8 and 6.4.2 that D PRoOF OF THEOREM 8. 1 . 1 . In view of the above proposition and Proposition 6.3.3, it suffices to show that n 1 12 (� - cj)* ) = op(1). We have I.e. n 112 (� - cj)* ) = n 1/2 f; 1 (yp - n - 1 X'Y) + n 112 (f'; 1 - n(X'X) -1 ) n- 1 X'Y. (8. 1 0.5) 264 8. Estimation for ARMA Models The i'h component of n 1 12 (y P - n - 1 X' Y) is n - 1;2 (�� (Xk - XJ(Xk+ i - Xn ) 0 :� Xk Xk+i) k -; (8. 1 0.6) n-i 1 1 1 = n -112 I Xk Xk+i + n 12 Xn ((1 - n- i)Xn - n - I (Xk + Xk+; )), k =1 k = 1 -i which by Theorem 7. 1 .2 and Proposition 6. 1 . 1 is op ( l ). Next we show that n 112 l l f'P- 1 - n(X'X ) -1 11 = op ( 1 ), (8. 1 0.7) where I I A II is defined for a p x p matrix A to be the Euclidean length of the p 2 dimensional vector consisting of all the components of the matrix. A simple calculation gives n 1 12 l l tp- 1 - n (X'X) - 1 11 = n 112 l l tp- 1 (n- 1 (X'X) - tp ) n(X'Xr 1 ll � n 112 llf'P- 1 II II n - 1 (X'X) - f'P II II n (X'X) - 1 1 1 . Equation (8. 1 0.6) implies that n 1 12 ll n - 1 (X'X) - f'P I I = op ( l ), and since tP- 1 .!.,. rP- 1 and n(X'X) -1 .!.,. rP- 1 , (8. 1 0.7) follows. Combining (8. 1 0.7) with the fact that n - 1 X'Y .!.,. yP' gives the desired conclusion that n 1 12 (cf. - cjl *) op (1 ). Since yP .!.,. yP and cf. .!.,. cjl, it follows that 82 .!.,. (5 2 . 0 = j� � l PROOF OF THEOREM 8. 1 .2. The same ideas used in the proof of Theorem 8. 1 . 1 can be adapted to prove Theorem 8. 1 .2. Fix an integer m > p and note that the linear model in (8. 1 0.2) can be written as ::: X1 -m Z1 rPm 1 Xz - m Zz rPm z . + . , .. .. .. . zn xn - m xn - 2 rPmm where, smce { X, } is an AR (p) process, cjl� = ( r/Jm 1 , . . . , r/Jm m ) := r,;;- 1 ym (r/J 1 , . . . , r/JP, 0, . . . , 0)'. The linear regression estimate of cilm in the model Y = X cilm + Z is then cil! = (X'X)- 1 X'Y, which differs by oP ( n -112 ) from the Yule-Walker estimate cilm = r,;;- 1 Ym· A J = � It follows from the proof of Proposition 8. 1 0. 1 that cil! 
is AN (cjlm , (J 2 r,;;- 1 ) and hence that In particular, Jmm is AN(O, n - 1 ), since the (m, m) component of r,;;- 1 is (see Problem 8. 1 5) (det rm - d/(det rm) = (det rm - 1 )/((52 det rm - d = (J - 2 . 0 265 §8. 1 1 . * Asymptotic Normality of Parameter Estimators §8. 1 1 * Asymptotic Normality of Parameter Estimators In this section we discuss, for a causal invertible ARMA(p, q) process, the asymptotic normality of an estimator of the coefficient vector which has the same asymptotic distribution as the least squares and maximum likelihood estimators. The asymptotic distribution of the maximum likelihood and least squares estimators will be derived in Section 10.8. Recall that the least squares estimators minimize the sum of squares, 1 However we shall consider the following approximation to S(cp, 9). First we approximate the "standardized innovations" (X, - X,)/(r, _ d 112 by Z,(cp, 9) where Z 1 (cp, 9) = X 1 , �2 (cp, 9) = X2 - r/J1 X 1 - 81 Z t (cp, 9), Z"(cp, 9) = X" - r/J 1 Xn - t - (8_ 1 1 . 1) r/Jp Xn - p - 81 Zn _ 1 (cp, 9) - · · · - 8q Zn - q (cp, 9). By the assumed invertibility we can write Z, in the form, ··· - ro Z, = X, + L nj Xr-j ' ! j= and then (8. 1 1 . 1) corresponds to setting (see Problem 5. 1 5) r- 1 Z,(cp, 9) = X, + L niXt-j· j=! Using the relations (see Problem 8.2 1 ) ro I I Z,(cp, 9) - Z, ll s L l ni i ii X t l l , j=t and we can show that (8. 1 1 .2) for all t where a, c 1 , c 2 and k are constants with 0 < a < 1 . It is useful to make one further approximation to (X, - X,)/(r,_ d 1 12 by linearizing Z,(cp, 9) about an initial estimate (cp0, 90) of (cp, 9). Thus, if W = (r/J 1 , , r/JP , 81 , , 8q ) and �� = (cp� , 9� ), we approximate Z,( �) by • • • • • • 266 8. Estimation for ARMA Models Zr (Po ) - o; ( p - P o ), (8. 1 1 .3) where o; = (Dr . 1 (Po ), . . . ' Dt ,p+ q (P o )), with i = 1 , . . . , p + q. Then by minimizing the sum of squares n L (Zr (Po ) o; ( p - Po )) 2 t= 1 (which by (8. 1 1 .2) and (8. 1 1 .3) is a reasonably good approximation to S(cjl, 9)), we obtain an estimator pt of p which has the same asymptotic properties as the least squares estimator p. The estimator pt is easy to compute from the methods of Section 2.6. Specifically, if we let Z(P0) = (Z1 (P0), . . . , Zn (P0))' and write D for the n x (p + q) design matrix (D 1 , . . . , D")', then the linear regression estimate of AP = P - P o is - so that The asymptotic normality of this estimator is established in the following theorem. Theorem 8.1 1.1. Let { Xr } be the causal invertible ARMA(p, q) process Xr - r/J1 Xr - 1 - · · · r/Jp Xr- p = Zr + B1 Zr - 1 + · · · + BqZr-q• - where { Zr } IID(O, a 2 ) and where r/J (z) and B(z) have no common zeros. Suppose that Po = (/30 1 , . . . , flo . p+q )' is a preliminary estimator of p = (r/J1 , . . . , r/JP , 81 , . . . , Bq)' such that Po - P = o p (n- 1 14), and that pt is the estimator constructed from P o as described above. Then (i) n - 1 D'D � a2 v- 1 ( p) � where V( p) is a (p + q) x (p + q) nonsingular matrix and (ii) n 112 ( pt p) = N(O, V( p)). In addition for the least squares estimator p, we have (iii) n 1 12 ( p - p) = N(O, V(p)). - (A formula for computing V ( p) is given in (8. 1 1 . 1 4).) SKETCH OF PROOF. We shall give only an outline of the proofs of (i) and (ii). The result (iii) is discussed in Section 1 0.8. Expanding Zr (P) in a Taylor series about P = P0, we have (8. 1 1 .4) §8. 
1 1 .* Asymptotic Normality of Parameter Estimators where 267 H, = 21 pi�+q Jp+1q 8{382; azpj (P;") ( /3; - /3o;)(/3j - f3o) and Pi is between p and P o · Rearranging (8. 1 1 .4) and combining the equations for t = 1, . . . , n, we obtain the matrix equation, Z(Po ) = D( p - Po ) - H + Z (p), where Z( p) Hence = (Z 1 (p), . . . , Zn (P))', D n 1 12 ( pt _ = (D 1 , . . . , Dn )' and H ( 1 , . . . , Hn Y · H = p) = n 1 12 (Po + ,ftJ p) = n 1 12 (Po + (D'D) - 1 D' Z(Po ) - p), _ I.e. 1 n 1 1z ( pt p) = n 1 ;2 (D'D) - 1 D'Z( p) n ;z (D'Dr 1 D'H. The idea of the proof is to show that n - 1 D'D .!... a 2 v - 1 ( p), 1 n - 1 12 D'Z( p) => N(O, a4 v - ( p)), _ _ (8. 1 1 .5) (8. 1 1 .6) (8. 1 1 . 7) and (8. 1 1 .8) Once these results are established, the conclusion of the theorem follows from Propositions 6. 1 . 1 , 6.3.3 and 6.3.8. From (8. 1 1 . 1 ) we have for t > max(p, q), D,,;(p) = - az, (p) = X, _; - 01 D,_ l . i (p) - · · · - Oq Dr - q ,; (p), a ¢J; i = 1, . . . , p, and i = 1, . . . , q, so that for t > max (p, q), D,, ;(P) satisfies the difference equations, {O(B)D,, ;(P) = X,_;, O( B)D,, i+p ( p) = Z,_;( p), i = 1 , . . . , p, i = 1 , . . . , q. If we define the two autoregressive processes, U, = e -1 (B)X, = r 1 (B)Z,, and v; = e -1 (B)Z,, (8. 1 1 .9) 268 8. Estimation for ARMA Models then it follows from (8. 1 1 .9) and (8. 1 1 .2) that D1, ; (�) can be approximated by B ; U1 = U1 _ for i = 1, . . . , p, ; B - v, = V, - i+p for i = p + I , . . . , p + q. Set U1 = (U1_ 1 , . . . , U1 _ P )' , V1 = ( V,_ 1 , . . . , v,_ q )' and w ; = (U;, V; ). The limit in (8. 1 1 .6) is established by first showing that {ip (8. 1 1 . 1 0) n - 1 ( � D1 , ;(�0) D1 ,) �0) - � D1 , ; (�) D;j�)) = op ( 1 ), , 1 using the assumption (�0 - �) = o( n - 114), and then expanding D1 , ; (�0) D1 j�0) in a Taylor series about � · Next, from the approximation (8. 1 1 . 10) we have ) n - 1 ( � D1 , ; (�) D1 j�) - � Wr; Wri , 1 = op ( 1 ). If EZ14 < oo , then by applying Theorem 7. 1 . 1 to the individual components of WI w;, we obtain n-1 I W1w; .!... E(W1 w; ), (8. 1 1 . 1 1) 1=1 from which we identify V(�) as V(�) = 0" 2 [ E (W 1 W '1 ) ] - 1 . (8. 1 1 . 1 2) However if we assume only that EZ12 < oo, then (8. 1 1 . 1 1) also holds by the ergodic theorem (see Hannan, 1 970). The verification of (8. 1 1 . 7) is completed in the following steps. By expanding D1, 1 (�0) in a Taylor series about � and using (8. 1 1 .2) to approximate Z1( �) by Z1, it can be shown that l (,� , n - 1 12 , 1 D1, ; (�0)Z1(�) - �� Wr ; Z1) = op ( 1 ). (8. 1 1 . 1 3) Since for each i = I , . . . , p + q, Wr , ; , t = . . . , - 1 , 0, 1 , . . . , is a moving average of Z1 _ 1 , Z1_ 2 , the sequence of random vectors W1Z1, t ?: 1 , is uncorrelated with mean zero and covariance matrix, . • • E [(W1Z1)(W1Z1)' ] = 0' 2 E (W1W;) = 64 v - 1 (�). Using the argument given in the proof of Proposition 8.10. 1 , it follows that n - 112 L W1Z1 => N (O, 0"4V - 1 (�)), 1 n =1 which, with (8. 1 1 . 1 3), establishes (8. 1 1 .7). Finally to prove (8. 1 1 .8), it suffices to show that . i, j, k = 1 , . . , p + q, 269 Problems since (/3; - f30J(f3i - f30J = oP (n - 112 ). This term is handled by first showing that Po and P� may be replaced by P and then that the resulting expression has an expectation which is bounded in n. D Note that the expression for V ( p) simplifies to EU 1 V'1 - 1 EV1 V; ] (8. 1 1 . 1 4) where U, and V, were defined in the course of the proof. The application of (8. 1 1 . 
1 4) was illustrated for several low-order ARMA models in Section 8.8. Problems 8. 1 . The Wolfer sunspot numbers {X,, t = 1, . . . , 1 00} of Example 1 . 1 .5 have sample autocovariances y(O) = 1 3 82.2, Y(1) = 1 1 1 4.4, Y(2) = 591 .72 and y(3) 96.2 1 5. Find the Yule-Walker estimates of I/J1 , I/J2 and CJ2 in the model = Y, = I/J1 Y, - 1 1/Jz Y,- 2 + Z, , for the mean-corrected series Y, = X, - 46.93, t = 1 , . . . , 1 00. Use Theorem 8. 1 . 1 t o find 95% confidence intervals for I/J 1 and I/J2 . + 8.2. Use the Durbin-Levinson algorithm to compute the sample partial autocorrela­ tions ¢1 1 , ¢22 and ¢33 of the Wolfer sunspot numbers. Is the value of ¢33 compatible with the hypothesis that the data is generated by an A R(2) process? (Use Theorem 8.1.2 and significance level .05.) 8.3. Let (X 1 , . . . , Xp+l )' be a random vector with mean 0 and non-singular covariance matrix rp +1 = [y(i - j)]r,;;,l . Note that psp{X,, , Xp } xp +1 = I/J 1 Xp + . . . + 1 1/JpX I where II> = rp- Yp (see (8. 1.3)). Show that 1/J(z) = 1 - 1/J I z - . . - 1/Jpz P =f. 0 for l z l :::::; I . (If rjJ(z) = ( 1 - az)((z), with l a l ;:>: 1 , set �(z) = ( 1 - pz)((z) where p = Corr( Yp + l • YP) and lj = ((B)Xi. Then E l r/J(B)Xp + l l 2 = E l Yp + l - p YP I 2 :::::; E l Yp + 1 - a YP I 2 = E l r/J(B)X p+ 1 12 with equality holding if and only if p = a.) ... . � 8.4. Show that the zero-mean stationary Gaussian process {X, } with spectral density has the autocovariance function - n :::::; A. :::::; n, if h = 0, if lhl = 1, 3, 5, . . . , otherwise. Hence find the coefficients 84 1 , . . . , 844 in the innovation representation, 4 X s = L B4)Xs -j - X s -)· j� 1 Find an explicit expression, in terms of X; and X; , i = 1 , . . . , 5, for the maximum likelihood estimator of CJ2 based on X 1 , . . . , X 5 . 8. Estimation for ARMA Models 270 8.5. Use the program PEST to simulate and file 20 realizations of length 200 of the Gaussian ARMA( 1 , 1) process, X, - r/J X,_ 1 = Z, + OZ,_1 , {Z, } � WN(O, 1 ), with r/J = 0 = 6 Use the program PEST as in Example 8.4.1 to find preliminary models for each series. . . 8.6. Use the program PEST to simulate and file 20 realizations of length 200 of the Gaussian MA( I ) process x, = z, + oz,_ l , {Z, } � WN(O, 1 ), with 0 = .6. (a) For each series find the moment estimate e of 0 (see Section 8.5), recording M the number of times the sample autocorrelation p( l) falls outside the interval [ -1, 1]. (b) For each series use the program PEST t o find the innovations estimate B1 of 0 (choosing m to minimize the preliminary AICC value). (c) Use the program PEST to compute the least squares estimate BLs for each senes. (d) Use the program PEST to compute the maximum likelihood estimate {J L M for each series. Compare the performances of the four estimators with each other and with the behavior expected from their asymptotic distributions. Compare the number of series for which lp(l) l > ! with the expected number based on the asymptotic probability computed in Problem 7. 10. 8.7. Use equation (8.7.4) to show that if n > {X1 , . . . , X" } of the causal AR(p) process, p, the likelihood of the observations {Z, } � WN(O, a2), is L(cjl, a2) x = (2na2) - "12(det Gp) - 112 { [ exp -�a-2 x�G; 1 X p + I (X, - r/J1 x,_1 - · · · - r/JpX,_p)2 2 t� p + I j} , 8.8. Use the result of Problem 8.7 to derive a pair of linear equations for the least squares estimates of r/J1 and r/J for a causal AR(2) process. Compare your 2 equations with those for the Yule-Walker estimates. 8.9. 
Given two observations x1 and x from the causal A R ( I ) process 2 such that lx 1 1 # lx 2 l , find the maximum likelihood estimates of r/J and a2 • 8.1 0. Derive a cubic equation for the maximum likelihood estimate of r/J1 for a causal AR( I ) process. 271 Problems 8. I 1 . Verify that the maximum likelihood estimators .j, and 0 are those values of cJ> and 9 which minimize l(cj>, 9) in equation (8.7.7). Also show that the maximum likelihood estimator of 0" 2 is n - 1 S(«f,, 0). 8. 1 2. For a causal ARMA process, determine the limit of ( 1/n) LJ= I In rj_ 1 . When is the limit non-zero? 8. 1 3. In Section 8.6, suppose that the covariance matrix r. depends on the parameter p. Further assume that the n values v 0 , , v._ 1 are unbounded functions of p. . • . Show that the function S(Pl for a suitable choice of p. = 1 X� r.- (P)X. can be made arbitrarily close to zero 8. 14. Specialize Problem 8. 1 3 to the case when r. is the covariance matrix of an MA(l) process with () equal to any real number. Show that S(()) = LJ= I (Xj - xy;rj-1 can be made arbitrarily small by choosing 8 sufficiently large. m 8. 1 5. For an AR(p) process, show that det rm = (det rP)0" 2 ( - pJ for all m > p. 1 Conclude that the (m, m) component of r,;; is (det rm _ 1 )/(det rml = ()"- 2 . 8. 1 6. Simulation of a Gaussian process. Show that n consecutive observations { Xk, k = 1 , . . . , n} of a zero-mean Gaussian process with autocovariance function K(i,j) can be generated from n iid N (O, 1) random variables Z� > . . . , z., by setting k = 1, . . . , n, where ()kj,j = 1 , . . . , k and vk- l are computed from the innovations algorithm. (Use equation (8.6.4) to show that (X1 , . . . , X.Y has covariance matrix [K(i,j)J7. j = ! ·) 8. 1 7. Simulation of an ARMA(p, q) process. Show that a Gaussian ARMA(p, q) process { X, t = 1, 2, . . . } can be generated from iid N (0, 1) random variables Z 1 , Z2 , . . . by first defining k � m = max(p, q), k > m, where ()kj , j 1, . . . , k and vk - I are found from the innovations algorithm with K(i,j) as defined in (5.3.5). The simulated values of the ARMA process { X, } are then found recursively from = k � m, k > m. 8. 1 8. Verify the calculation of V(t,b, 8) in Example 8.8.3. 8.19. Verify the calculation of V(¢ 1 , 1,62 ) for the AR(2) process in Example 8.8. 1 . 8.20. Using (8.9. 1 ) and one of the series generated in Problem 8.5, plot the boundary of an approximate 95% confidence region for (1,6, 8). 8.2 1 .* Verify the relations (8. 1 1 .2). 8. Estimation for ARMA Models 272 8.22.* If � and 9 are the preliminary estimates of ljl and 9 obtained from equations (8.4.4) and (8.4.5), show that cf> = «J> + 0 p(n - 1 12 ) and 9 = 9 + 0 P(n - l f2). 8.23.* Let l(G) be the function [(G) = ln(n - 1 X'G - 1 X) + n - 1 ln(det G) where G is a positive definite matrix. Show that l(aG) = l(G) where a is any positive constant. Conclude that for an M A( l ) process, the reduced likelihood 1(8) given in (8.7.7) satisfies /(8) = 1(8 - 1 ) and that I(· ) has either a local maximum or minimum at 8 = 1 . CHAPTER 9 Model Building and Forecasting with ARIMA Processes In this chapter we shall examine the problem of selecting an appropriate model for a given set of observations {X" t = 1, . . . , n }. If the data (a) exhibits no apparent deviations from stationarity and (b) has a rapidly decreasing autocorrelation function, we shall seek a suitable ARMA process to represent the mean-corrected data. 
If not, then we shall first look for a transformation of the data which generates a new series with the properties (a) and (b). This can frequently be achieved by differencing, leading us to consider the class of ARIMA (autoregressive-integrated moving average) processes which is introduced in Section 9. 1 . Once the data has been suitably transformed, the problem becomes one of finding a satisfactory ARMA(p, q) model, and in particular of choosing (or identifying) p and q. The sample autocorrelation and partial autocorrelation functions and the preliminary estimators el-m and Om of Sections 8.2 and 8.3 can provide useful guidance in this choice. However our prime criterion for model selection will be the AICC, a modified version of Akaike's AIC, which is discussed in Section 9.3. According to this criterion we compute maximum likelihood estimators of <j>, 9 and a2 for a variety of competing p and q values and choose the fitted model with smallest AICC value. Other techniques, in particular those which use the R and S arrays of Gray et al. ( 1978), are discussed in the recent survey of model identification by de Gooijer et al. ( 1985). If the fitted model is satisfactory, the residuals (see Section 9.4) should resemble white noise. A number of tests designed to check this are described in Section 9.4, and these should be applied to the minimum-AICC model to make sure that the residuals are consistent with their expected behaviour under the model. If they are not, then competing models (models with AICC-value close to the minimum) should be checked until we find one which passes the goodness of fit tests. In some cases a small difference in AICC-value (say less than 2) between two satisfactory models 9. Model Building and Forecasting with ARIMA Processes 274 may be ignored in the interest of model simplicity. In Section 9.5 we consider the prediction of ARIMA processes, which can be treated as an extension of the techniques developed for ARMA processes in Section 5.3. Finally we examine the fitting and prediction of seasonal ARIMA (SARIMA) models, whose analysis, except for certain aspects of model identification, is quite analogous to that of ARIMA processes. §9. 1 ARIMA Models for Non-Stationary Time Series We have already discussed the importance of the class of ARMA models for representing stationary series. A generalization of this class, which incor­ porates a wide range of non-stationary series, is provided by the ARIMA processes, i.e. processes which, after differencing finitely many times, reduce to ARMA processes. Definition 9.1.1 (The ARIMA(p, d, q) Process). If d is a non-negative integer, then {X,} is said to be an ARIMA(p, d, q) process if Y, := (1 - B)dX, is a causal ARMA(p, q) process. This definition means that {X, } satisfies a difference equation of the form ifJ* (B)X, = ¢J(B)( 1 - B)dX, = 8(B)Z0 {Z, } � WN(O, cr 2 ), (9. 1 . 1) where ifJ(z) and 8(z) are polynomials of degrees p and q respectively and ifJ(z) # 0 for l z l � 1 . The polynomial ifJ*(z) has a zero of order d at z = 1 . The process {X,} is stationary if and only if d = 0, in which case it reduces to an ARMA(p, q) process. Notice that if d :2:: 1 we can add an arbitrary polynomial trend of degree (d - 1) to {X,} without violating the difference equation (9. 1 . 1 ). ARIMA models are therefore useful for representing data with trend (see Sections 1 .4 and 9.2). It should be noted however that ARIMA processes can also be appropriate for modelling series with no trend. 
Except when d = 0, the mean of {X_t} is not determined by equation (9.1.1), and it can in particular be zero. Since for d ≥ 1 equation (9.1.1) determines the second order properties of {(1 − B)^d X_t} but not those of {X_t} (Problem 9.1), estimation of φ, θ and σ² will be based on the observed differences (1 − B)^d X_t. Additional assumptions are needed for prediction (see Section 9.5).

EXAMPLE 9.1.1. {X_t} is an ARIMA(1, 1, 0) process if for some φ ∈ (−1, 1),

(1 − φB)(1 − B)X_t = Z_t,  {Z_t} ~ WN(0, σ²).

We can then write

X_t = X_0 + Σ_{j=1}^t Y_j,  t ≥ 1,

where

Y_t = (1 − B)X_t = Σ_{j=0}^∞ φ^j Z_{t−j}.

A realization of {X_1, ..., X_200} with X_0 = 0, φ = .8 and σ = 1 is shown in Figure 9.1, together with the sample autocorrelation and partial autocorrelation functions.

Figure 9.1. (a) A realization of {X_1, ..., X_200} for the ARIMA process of Example 9.1.1, (b) the sample ACF and (c) the sample PACF.

A distinctive feature of the data which suggests the appropriateness of an ARIMA model is the slowly decaying positive sample autocorrelation function seen in Figure 9.1. If therefore we were given only the data and wished to find an appropriate model, it would be natural to apply the operator ∇ = 1 − B repeatedly in the hope that for some j, {∇^j X_t} will have a rapidly decaying sample autocorrelation function, compatible with that of an ARMA process with no zeroes of the autoregressive polynomial near the unit circle. For the particular time series in this example, one application of the operator ∇ produces the realization shown in Figure 9.2, whose sample autocorrelation and partial autocorrelation functions suggest an AR(1) model for {∇X_t}.

Figure 9.2. (a) The differenced series Y_t = X_{t+1} − X_t, t = 1, ..., 199, of Example 9.1.1, (b) the sample ACF of {Y_t} and (c) the sample PACF of {Y_t}.

The maximum likelihood estimates of φ and σ² obtained from PEST (under the assumption that E(∇X_t) = 0) are .808 and .978 respectively, giving the model

(1 − .808B)(1 − B)X_t = Z_t,  {Z_t} ~ WN(0, .978),  (9.1.2)

which bears a close resemblance to the true underlying process,

(1 − .8B)(1 − B)X_t = Z_t,  {Z_t} ~ WN(0, 1).  (9.1.3)

Instead of differencing the series in Figure 9.1 we could proceed more directly by attempting to fit an AR(2) process as suggested by the sample partial autocorrelation function. Maximum likelihood estimation, carried out using the program PEST and assuming that EX_t = 0, gives the model

(1 − 1.804B + .806B²)X_t = (1 − .815B)(1 − .989B)X_t = Z_t,  {Z_t} ~ WN(0, .970),  (9.1.4)

which, although stationary, has coefficients which closely resemble those of the true non-stationary process (9.1.3).

From a sample of finite length it will be extremely difficult to distinguish between a non-stationary process such as (9.1.3), for which φ*(1) = 0, and a process such as (9.1.4), which has very similar coefficients but for which φ* has all of its zeroes outside the unit circle. In either case, however, if it is possible by differencing to generate a series with rapidly decaying sample autocorrelation function, then the differenced data can be fitted by a low order ARMA process whose autoregressive polynomial φ* has zeroes which are comfortably outside the unit circle. This means that the fitted parameters will be well away from the boundary of the allowable parameter set. This is desirable for numerical computation of parameter estimates and can be quite critical for some methods of estimation. For example, if we apply the Yule-Walker equations to fit an AR(2) model to the data in Figure 9.1, we obtain the model

(1 − 1.282B + .290B²)X_t = Z_t,  {Z_t} ~ WN(0, 6.435),  (9.1.5)

which bears little resemblance to either the maximum likelihood model (9.1.4) or the true model (9.1.3). In this case the matrix R̂_2 appearing in (8.1.7) is nearly singular.

An obvious limitation in fitting an ARIMA(p, d, q) process {X_t} to data is that {X_t} is permitted to be non-stationary only in a very special way, i.e. by allowing the polynomial φ*(B) in the representation φ*(B)X_t = θ(B)Z_t to have a zero of positive multiplicity d at the point 1 on the unit circle. Such models are appropriate when the sample autocorrelation function of the data is a slowly decaying positive function as in Figure 9.1, since sample autocorrelation functions of this form are associated with models φ*(B)X_t = θ(B)Z_t in which φ* has a zero either at or close to 1.

Sample autocorrelations with slowly decaying oscillatory behavior as in Figures 9.3 and 9.4 are associated with models φ*(B)X_t = θ(B)Z_t in which φ* has a zero close to e^{iθ} for some θ ∈ (−π, π] other than θ = 0. Figure 9.3 was obtained from a sample of 200 simulated observations from the process

X_t + .99X_{t−1} = Z_t,  {Z_t} ~ WN(0, 1),

for which φ* has a zero near e^{iπ}. Figure 9.4 shows the sample autocorrelation function of 200 observations from the process

X_t − X_{t−1} + .99X_{t−2} = Z_t,  {Z_t} ~ WN(0, 1),

for which φ* has zeroes near e^{±iπ/3}. In such cases the sample autocorrelations can be made to decay more rapidly by applying the operator

[1 − (2 cos θ)B + B²] = (1 − e^{iθ}B)(1 − e^{−iθ}B)

to the data, instead of the operator (1 − B) as in the previous paragraph. If 2π/θ is close to some integer s, then the sample autocorrelation function will be nearly periodic with period s, and the operator ∇_s = (1 − B^s) (with zeroes near e^{±iθ}) can also be applied to produce a series with more rapidly decaying autocorrelation function (see also Section 9.6). The sample autocorrelation functions in Figures 9.3 and 9.4 are nearly periodic with periods 2 and 6 respectively. Applying the operator (1 − B²) to the first series and (1 − B⁶) to the second gives two new series with the much more rapidly decaying sample autocorrelation functions shown in Figures 9.5 and 9.6 respectively. For the new series it is then not difficult to fit an ARMA model φ(B)X_t = θ(B)Z_t for which the zeroes of φ are all well outside the unit circle.
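The effect of differencing on the sample autocorrelation function is easy to reproduce numerically. The Python sketch below simulates a realization under settings matching Example 9.1.1 (φ = .8, σ = 1, X_0 = 0, n = 200), applies ∇ = 1 − B, and compares the first few sample autocorrelations of the original and differenced series; the random seed, the helper function and the use of numpy are assumptions of the sketch rather than anything prescribed by the text.

import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(1), ..., rho_hat(max_lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    return np.array([np.sum(xc[:n - h] * xc[h:]) / denom for h in range(1, max_lag + 1)])

rng = np.random.default_rng(0)
n, phi, sigma = 200, 0.8, 1.0

# Y_t = (1 - B) X_t is the causal AR(1) process Y_t = phi Y_{t-1} + Z_t.
z = rng.normal(0.0, sigma, size=n)
y = np.zeros(n)
y[0] = z[0] / np.sqrt(1.0 - phi ** 2)    # start the AR(1) in its stationary distribution
for t in range(1, n):
    y[t] = phi * y[t - 1] + z[t]

# X_t = X_0 + Y_1 + ... + Y_t with X_0 = 0, so {X_t} is ARIMA(1, 1, 0).
x = np.cumsum(y)

print("sample ACF of X,       lags 1-5:", np.round(sample_acf(x, 5), 2))
print("sample ACF of (1-B)X,  lags 1-5:", np.round(sample_acf(np.diff(x), 5), 2))

The autocorrelations of {X_t} stay large and positive out to high lags, while those of the differenced series decay roughly geometrically, mirroring the behaviour seen in Figures 9.1 and 9.2.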
Techniques for identifying and determining such ARMA models will be discussed in subsequent sections.

Figure 9.3. The sample ACF (a) and PACF (b) of a realization of length 200 of the process X_t + .99X_{t−1} = Z_t, {Z_t} ~ WN(0, 1).

Figure 9.4. The sample ACF (a) and PACF (b) of a realization of length 200 of the process X_t − X_{t−1} + .99X_{t−2} = Z_t, {Z_t} ~ WN(0, 1).

Figure 9.5. The sample ACF (a) and PACF (b) of {(1 − B²)X_t}, where {X_t} is the series whose sample ACF and PACF are shown in Figure 9.3.

Figure 9.6. The sample ACF (a) and PACF (b) of {(1 − B⁶)X_t}, where {X_t} is the series whose sample ACF and PACF are shown in Figure 9.4.

§9.2 Identification Techniques

(a) Preliminary Transformations. The estimation methods described in Chapter 8 enable us to find, for given values of p and q, an ARMA(p, q) model to fit a given series of data. For this procedure to be meaningful it must be at least plausible that the data is in fact a realization of an ARMA process and in particular that it is a realization of a stationary process. If the data display characteristics suggesting non-stationarity (e.g. trend and seasonality), then it may be necessary to make a transformation so as to produce a new series which is more compatible with the assumption of stationarity.

Deviations from stationarity may be suggested by the graph of the series itself or by the sample autocorrelation function or both. Inspection of the graph of the series will occasionally reveal a strong dependence of variability on the level of the series, in which case the data should first be transformed to reduce or eliminate this dependence. For example, Figure 9.7 shows the International Airline Passenger Data {U_t, t = 1, ..., 144} of Box and Jenkins (1976), p. 531. It is clear from the graph that the variability increases as U_t increases. On the other hand, the transformed series V_t = ln U_t, shown in Figure 9.8, displays no increase in variability with V_t. The logarithmic transformation used here is in fact appropriate whenever {U_t} is a series whose standard deviation increases linearly with the mean. For a systematic account of a general class of variance-stabilizing transformations, we refer the reader to Box and Cox (1964).
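A simple numerical check of whether such a variance-stabilizing transformation is needed is to split the series into blocks and see whether the block standard deviations grow with the block means (in which case the logarithmic transformation, the λ = 0 member of the Box-Cox family defined next, is suggested). The Python sketch below carries out this check on a synthetic monthly series with multiplicative structure; the simulated series, the block length of 12 and the use of numpy are assumptions of the sketch, and the airline data themselves are not reproduced here.

import numpy as np

rng = np.random.default_rng(1)
t = np.arange(144)

# Synthetic monthly series whose standard deviation grows with its level,
# qualitatively like the airline passenger data.
level = 100.0 + 3.0 * t
u = level * (1.0 + 0.15 * np.sin(2 * np.pi * t / 12)) * np.exp(0.05 * rng.normal(size=t.size))

def block_stats(x, block=12):
    """Means and standard deviations over consecutive blocks of length `block`."""
    x = np.asarray(x, dtype=float)
    m = (len(x) // block) * block
    blocks = x[:m].reshape(-1, block)
    return blocks.mean(axis=1), blocks.std(axis=1, ddof=1)

for name, series in [("U", u), ("ln U", np.log(u))]:
    means, sds = block_stats(series)
    slope = np.polyfit(means, sds, 1)[0]
    print(f"{name:5s}: slope of block SD on block mean = {slope: .4f}")

For the untransformed series the block standard deviations increase roughly linearly with the block means, while for the logged series the relationship is essentially flat, which is the qualitative behaviour of Figures 9.7 and 9.8.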
The defining equation for the general Box-Cox transformation /;. is = ut 2 o, .A > o, ut > o, .A = o, and the program PEST provides the option of applying /;. (with (0 s .A s 1 . 5) prior to the elimination of trend and/or seasonality from the data. In practice, if a Box-Cox transformation is necessary, it is often the case that either fo or !1 12 is adequate. Trend and seasonality are usually detected by inspecting the graph of the (possibly transformed) series. However they are also characterized by sample autocorrelation functions which are slowly decaying and nearly periodic respectively. The elimination of trend and seasonality was discussed in Section 1 .4 where we described two methods: (i) "classical decomposition" of the series into a trend component, a seasonal component, and a random residual component, and (ii) differencing. The program PEST(Option 1 ) offers a choice between these techniques. Both methods were applied to the transformed Airline Data V, = In Ut of the preceding paragraph. Figures 9.9 and 9. 1 0 show respectively the two series found from PEST by (i) estimating and removing from { V,} a linear trend component and a seasonal component of period 1 2, and (ii) applying the 285 §9.2. Identification Techniques 600 500 � 1/) u c 0 1/) :J 0 .c t=. 400 300 200 1 00 0 0 12 24 36 48 60 84 72 96 1 08 1 20 1 32 1 44 Figure 9.7. International airline passengers; monthly totals in tlrousands of passengers { U, t I , . , 144} from January 1 949 to December 1960 (Box and Jenkins ( 1970)). = . . 6.5 6.4 6.3 6.2 6. 1 6 5.9 5.8 5.7 5.6 5.5 5.4 5.3 5.2 5. 1 5 4.9 4.8 4.7 4.6 0 12 24 36 48 60 72 Figure 9.8. Natural logarithms, V, = In U,, t 84 = 96 1 08 1 20 1 32 1 44 I . . , 1 44, of the data in Figure 9.7. , . 286 9. Model Building and Forecasting with ARIMA Processes 0 12 24 36 48 72 60 84 96 1 08 1 20 1 32 1 44 Figure 9.9. Residuals after removing a linear trend and seasonal component from the data { V,} of Figure 9.8. 0 1 2 24 36 48 60 72 84 96 1 08 1 20 1 32 Figure 9.1 0. The differenced series {VV 1 2 V, + 1 3 } where { V, } is the data shown in Figure 9.8. §9.2. Identification Techniques 287 difference operator (1 - B) ( 1 - B 1 2 ) to { l--; } . Neither of the two resulting series display any apparent deviations from stationarity, nor do their sample autocorrelation functions (the sample autocorrelation function of {VV 1 l--; } is 2 shown in Figure 9. 1 1 ). After the elimination of trend and seasonality it is still possible that the sample autocorrelation function may appear to be that of a non-stationary or nearly non-stationary process, in which case further differencing as described in Section 9. 1 may be carried out. (b) The Identification Problem. Let { Xr } denote the mean-corrected trans­ formed series, found as described in (a). The problem now is to find the most satisfactory ARMA(p, q) model to represent { Xr }. If p and q were known in advance this would be a straightforward application of the estimation techniques developed in Chapter 8. However this is usually not the case, so that it becomes necessary also to identify appropriate values for p and q. It might appear at first sight that the higher the values of p and q chosen, the better the fitted model will be. For example, if we fit a sequence of AR(p) processes, p = 1, 2, . . . , the maximum likelihood estimate, 82 , of (J 2 generally decreases monotonically as p increases (see e.g. Table 9.2). However we must beware of the danger of overfitting, i.e. 
of tailoring the fit too closely to the particular numbers observed. An extreme case of overfitting (in a somewhat different context) occurs if we fit a polynomial of degree 99 to 1 00 observations generated from the model Y, = a + bt + Z0 where {Zr } is an independent sequence of standard normal random variables. The fit will be perfect for the given data set, but use of the model to predict future values may result in gross errors. Criteria have been developed, in particular Akaike's AIC criterion and Parzen's CAT criterion, which attempt to prevent overfitting by effectively assigning a cost to the introduction of each additional parameter. In Section 9.3 we discuss a bias-corrected form of the AIC, defined for an ARMA(p, q) model with coefficient vectors <!> and 9, by AICC(<j>, 9) = - 2 ln L(<j>, 9, S(<j>, 9)/n) + 2(p + q + l )nj(n - p - q - 2), (9.2. 1 ) where L(<j>, 9, (J 2 ) i s the likelihood o f the data under the Gaussian ARMA model with parameters (<j>, 9, (J 2 ) and S(<j>, 9) is the residual sum of squares defined in Section 8.7. On the basis of the analysis given in Section 9.3, the model selected is the one which minimizes the value of AICC. Intuitively one can think of 2(p + q + l)n/(n - p - q - 2) in (9.2. 1 ) as a penalty term to discourage over-parameterization. Once a model has been found which minimizes the AICC value, it must then be checked for goodness of fit (essentially by checking that the residuals are like white noise) as discussed in Section 9.4. Introduction of the AICC (or analogous) statistic reduces model identi­ fication to a well-defined problem. However the search for a model which minimizes the AICC can be very lengthy without some idea of the class 288 9. Model Building and Forecasting with ARIMA Processes 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -0. 1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1 0 10 20 30 40 20 30 40 (a) 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -0. 1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 - 0.8 -0.9 -1 0 10 (b) Figure 9.1 1 . The sample ACF (a) and PACF (b) of the series {VV1 2 V, +1 3 } shown in Figure 9. 10. §9.2. Identification Techniques 289 of models to be explored. A variety of techniques can be used to accel­ erate the search by providing us with preliminary estimates of p and q, and possibly also preliminary estimates of the coefficients. The primary tools used as indicators of p and q are the sample autocor­ relation and partial autocorrelation functions and the preliminary estimators �m and Om , m I , 2, . . . , discussed in Sections 8.2 and 8.3 respectively. From these it is usually easy to judge whether a low order autoregressive or moving average model will prove satisfactory. If so then we can proceed by successively fitting models of orders I , 2, 3, . . . , until we find a minimum value of the AICC. (Mixed models should also be considered before making a final selection.) = ExAMPLE 9.2. 1 . Figure 9. 1 2 shows the sample autocorrelation and partial autocorrelation functions of a series of 200 observations from a zero-mean stationary process. They suggest an autoregressive model of order 2 (or perhaps 3) for the data. This suggestion is supported by the Yule-Walker estimators �m ' m I , 2, . . . , of the coefficient vectors of autoregressive models of order m. The Yule-Walker estimates �mj ' j 1 , . . . , m; m = I , . . . , 5 are shown in Table 9. 
1 with the corresponding ratios, = = (9.2.2) 1 where a-�j is the r diagonal element of 6- 2 r;;,- (�m)/n, the estimated version of the asymptotic covariance matrix of �m appearing in Theorem 8. 1 .2. A value of rmj with absolute value greater than 1 causes us to reject, at approximate level .05, the hypothesis that iflmj is zero (assuming that the true underlying process is an AR(p) process with p � m). The next step is to fit autoregressive models of orders 1 , 2, . . . , by maximum likelihood, using the Yule-Walker estimates as initial values for the maximi­ zation algorithm. The maximum likelihood estimates for the mean-corrected data are shown in Table 9.2 together with the corresponding AICC values. = Table 9. 1 . The Yule-Walker Estimates �mj ,j = 1, . . . , m; m I, . . . , 5, and the Ratios rmj (in Parentheses) for the Data of Example 9.2. 1 m 2 3 4 5 j .878 (1 3.255) 1 .410 (1 2.785) 1 .301 (9.545) 1 .293 (9.339) 1 .295 (9.361) 2 3 4 5 - .606 ( - 5.490) - .352 ( - 1 .595) - .369 ( - 1.632) - .362 ( - 1 .602) - . 1 80 ( - 1. 3 1 8) -.119 ( - .526) - .099 ( - .428) - .047 ( - .338) -.117 ( - .5 1 6) .054 (.391 ) 9. Model Building and Forecasting with ARIMA Processes 290 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -0. 1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1 0 10 20 30 40 20 30 40 (a) 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -0.1 -0.2 -0.3 - 0. 4 -0 5 -0 6 -0 7 - 0.8 - 0.9 -1 0 10 (b) Figure 9. 1 2. The sample ACF (a) and PACF (b) for the data of Example 9.2. 1 . §9.2. Identification Techniques 29 1 Table 9.2 The Maximum Likelihood Estimates �mi' &;,,j = 1 , . . . , m; m = 1 , . . . , 5, and the Corresponding AICC, BIC and FPE Values for the Data of Example 9.2. 1 m 2 3 4 5 j 2 .892 1 .47 1 1 .387 1 .383 1 .383 - .656 - .47 1 - .486 - .484 3 - . 127 - .08 1 - .072 4 - .033 - .059 5 .01 9 A2 (Jm 1 .547 .885 .871 .870 .870 AICC BIC FPE 660.44 5 5 1.94 550.86 552.75 554.81 662.40 558.29 561.49 567. 3 1 573.01 1 .562 .903 .897 .905 .914 The BIC and FPE statistics (which are analogous to the AICC but with different penalties for the introduction of additional parameters) are also shown. All three statistics are discussed in Section 9.3. From Table 9.2 we see that the autoregressive model selected by the AICC criterion for the mean-corrected data {X,} is X, - 1 .387X, _1 + .47 1 X, _ 2 + . 1 27X, _3 = Z" {Z, } ....., WN(0, .87 1 ). (9.2.3) Application of the goodness of fit tests to be described in Section 9.4 shows that this model is indeed satisfactory. (If the residuals for the model (9.2.3) had turned out to be incompatible with white noise, it would be necessary to modify the model. The model modification technique described below in (d) is frequently useful for this purpose.) Approximate confidence intervals for the coefficients can be found from the asymptotic distribution of the maximum likelihood estimators given in Section 8.8. The program PEST approximates the covariance matrix V(p) of (8.8.3) by 2 H - ' ( p), where H(P) is the Hessian matrix of the reduced likelihood evaluated at p. From this we obtain the asymptotic .95 confidence bounds sj ± 1 .96 [vjj (P)/nJ 1 12 for f3j , where vjj(p) is the r diagonal element of V(p). This gives the following bounds for the coefficients tP 1 , tP2 , tP3 . ¢ , : 1 .387 ± . 1 36, ¢ 2 : - .47 1 ± .226, ¢ 3 : - . 127 ± . 1 38. The confidence bounds for tP3 suggest that perhaps an AR(2) model should have been fitted to this data since 0 falls between the bounds for tP3 . 
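The ratios defined in (9.2.2) can be computed directly from the sample autocovariances: the Yule-Walker equations give φ̂_m and v̂_m, and v̂_m Γ̂_m⁻¹/n plays the role of the estimated asymptotic covariance matrix of Theorem 8.1.2 (with the Yule-Walker variance estimate standing in for σ̂²). The Python sketch below illustrates the computation on a simulated AR(2) series; the simulated coefficients 1.4 and −.7, the random seed and the use of numpy are assumptions of the sketch and are not the data of Example 9.2.1.

import numpy as np

def sample_acvf(x, max_lag):
    """Sample autocovariances gamma_hat(0), ..., gamma_hat(max_lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([np.sum(xc[:n - h] * xc[h:]) / n for h in range(max_lag + 1)])

def yule_walker_ratios(x, max_order):
    """Yule-Walker AR(m) fits for m = 1, ..., max_order and the ratios
    phi_hat_mj / (1.96 * estimated standard error) of equation (9.2.2)."""
    n = len(x)
    gamma = sample_acvf(x, max_order)
    results = []
    for m in range(1, max_order + 1):
        Gamma = np.array([[gamma[abs(i - j)] for j in range(m)] for i in range(m)])
        g = gamma[1:m + 1]
        phi = np.linalg.solve(Gamma, g)            # Yule-Walker coefficient estimates
        v = gamma[0] - phi @ g                     # Yule-Walker white noise variance
        cov = v * np.linalg.inv(Gamma) / n         # estimated asymptotic covariance matrix
        ratios = phi / (1.96 * np.sqrt(np.diag(cov)))
        results.append((phi, v, ratios))
    return results

# Illustrative use on a simulated causal AR(2) series.
rng = np.random.default_rng(2)
n = 200
x = np.zeros(n)
z = rng.normal(size=n)
for t in range(2, n):
    x[t] = 1.4 * x[t - 1] - 0.7 * x[t - 2] + z[t]

for m, (phi, v, ratios) in enumerate(yule_walker_ratios(x, 5), start=1):
    print(f"m={m}: phi_hat={np.round(phi, 3)}, v_hat={v:.3f}, ratios={np.round(ratios, 2)}")

Ratios larger than 1 in absolute value play the same role as in Table 9.1: they indicate coefficients that differ significantly from zero at approximate level .05.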
In fact if we had minimized the BIC rather than the AICC (see Table 9.2) we would have chosen the AR(2) model. The BIC is a Bayesian modification of the AIC criterion which was introduced by Akaike to correct the tendency of the latter to overestimate the number of parameters. The true model in this example was {Z, } ....., WN(O, 1 ). (9.2.4) ExAMPLE 9.2.2. Inspection of the sample autocorrelation and partial autocor­ J elation functions of the logged and differenced airline data {VV 1 2 v; } shown 292 9. Model Building and Forecasting with ARIMA Processes in Figure 9. 1 1 suggests the possibility of either a moving average model of order 1 2 (or perhaps 23) with a large number of zero coefficients, or alterna­ tively of an autoregressive model of order 1 2. To explore these possibilities further, the program PEST (Option 3) was used to compute the preliminary estimates om and �m ' m = 1 5, 25, 30, as described in Sections 8.2 and 8.3 respectively. These are shown, with the ratios rmj of each estimated coefficient to 1 .96 times its standard error, in Tables 9.3 and 9.4 respectively. For �m• rmj was defined by equation (9.2.2). For Om , rmj = 8m)(1.966'mj ), where by Theorem 8.3. 1 , a-;,j = n - 1 ( 1 + 8;, 1 + · · · + e;,, j- d, j > n-1 . 1 , and a;, 1 = For the preliminary moving average model of order 30 we have plotted the ratios rmj , j = 1 , . . . , 30, with boundaries at the critical value 1, in Figure 9. 1 3. The graph suggests that we consider models with non-zero coefficients at lags 1, 1 2, 23, possibly 3, and possibly also 5, 9 and 1 3. Of the models with non-zero coefficients at one or more of the lags 1, 3, 1 2 and 23, it is found that the one with smallest AICC value ( - 486.04) is (for X1 = VV 1 2 v; - .00029) Zt - .355Zt - 1 - .201 Zt _ 3 - .524Zt - 1 2 + .24 1 Zt_ 2 3 , (9.2.5) where {Z1 } � WN(O, .001 25). If we expand the class of models considered to include non-zero coefficients at one or more of the lags 5, 9 and 13 suggested Xt 0 = 10 20 30 Figure 9 . 1 3 . Ratio o f the estimated coefficient B30,j to 1 .96 times its standard error, j = I, . . . , 30 (from Table 9.3). §9.2. Identification Techniques 293 Table 9.3. The Innovation Estimates Om , vm , m = m = 15 White Noise Variance .0014261 M A Coefficients - . 1 5392 .071 00 - .40660 .03474 - .04885 .04968 . 1 5 1 23 - .36395 - .07263 Ratio of Coefficients to ( 1 .96 * Standard Error) - .83086 .38408 - 2.37442 . 1 8465 - .25991 .26458 .7571 2 - 1.9 1 782 - .38355 m = .02 1 85 .09606 - .00745 .0885 1 - .07203 . 1 4956 . 1 1 679 .5103 1 - .03700 .47294 - .38 1 22 .74253 - .02483 . 1 2737 - .03 196 - .05076 - .05955 . 12247 - .09385 . 1 2591 - .06775 - . 10028 - . 1 3471 .68497 - . 1 5528 - .24370 - .27954 .66421 - .501 28 .61 149 - .32501 - .470 1 6 - .02979 . 1 3925 - .02092 - .04987 - .06067 - .0401 2 . 1401 8 - .09032 . 1 4 1 33 - .06822 - .08874 - .05 1 06 - . 1 6 1 79 .74792 - .1 0099 - .23740 - .28201 - . 1 8554 .761 08 - .48 1 16 .6821 3 - .32447 - .4 1 1 99 - .23605 25 White Noise Variance .001 2638 MA Coefficients .05981 - .36499 .05701 - .01 327 - .47123 .01909 - . 1 1 667 .061 24 .00908 - .04050 Ratio of Coefficients to ( 1 .96 * Standard - 2. 1 3 1 40 .328 1 3 - .071 4 1 .30723 . 10 1 60 - 2.50732 - .56352 .29444 .04352 - . 1 9399 m = 1 5, 25, 30 for Ex. 9.2.2 - . 1481 2 - .03646 . 1 3557 - .0 1722 .24405 Error) - .8 1 1 27 - . 19619 .66284 - .08271 1 . 168 1 8 30 White Noise Variance .0012483 MA Coefficients - .35719 - . 1 5764 .06006 .03632 .01689 - .063 1 3 .02333 - .47895 . 1 5424 - . 
1 3 1 03 .0521 6 - .03701 .01 85 1 - .0351 3 .25435 .03687 - .02951 .04555 Ratio of Coefficients to ( 1 .96* Standard Error) - .86556 .33033 - 2.08588 . 1 9553 .09089 - .33967 . 1 2387 - 2.54232 .75069 - . 17626 .24861 - .628 1 5 1 .20720 - . 1 6683 .0879 1 . 1 7077 - . 1 3662 .2 1083 294 9. Model Building and Forecasting with ARIMA Processes Table 9.4. The Yule� Walker Estimates �m ' vm, Example 9.2.2 m = 1 5, 25, 30 for m = 15 White Noise Variance .0014262 AR Coefficients - .40660 - . 1 6261 - .09364 - .00421 .042 1 6 .09282 - .09957 - .38601 - .1 4 1 60 Ratio of Coefficients to ( 1 .96 * Standard Error) - .88695 - .50827 - 2.37494 .535 1 7 .24526 - .02452 - .77237 - .575 1 7 - 2.22798 - .091 85 . 1 5873 - .08563 .06259 .01347 - .021 74 - .5301 6 .92255 - .4648 1 .361 5 5 .07765 - . 1 2701 - .1 32 1 5 . 1 5464 - .07800 - .046 1 1 - . 1 0408 .06052 .01640 - .04537 - .0938 1 - . 1 0264 - .72665 .84902 - .42788 - .25 1 90 - .57639 .33 1 37 .08970 - .24836 - .5 1 488 - .60263 - . 14058 . 1 5 146 - .07941 - .033 1 6 - . 1 1 330 .045 1 4 .07806 .02239 - .03957 - . 10 1 1 3 - . 1 0948 - .00489 - .76545 .82523 - .39393 - . 1 8026 - .61407 .24853 .421 48 . 1 2 1 20 - .1 9750 - .54978 - .59267 - .02860 m = 25 White Noise Variance .001 2638 AR Coefficients - .36498 - .07087 - . 1 5643 . 1 1 335 .03 1 22 - .00683 - .038 1 5 - .44895 - .1 9 1 80 - . 1 2301 .051 0 1 . 1 0 1 60 .0885 1 - .03933 . 1 0959 Ratio of Coefficients to ( 1 .96 * Standard Error) - 2. 1 4268 -.86904 - .39253 .62205 . 1 7052 - .03747 - .20886 - 2.46259 - .98302 - .67279 .28007 .55726 .48469 - .2 1 632 .60882 m = 30 White Noise Variance .0012483 AR Coefficients - .35718 - .06759 - . 1 5995 .09844 .04452 - .0 1 653 - .03322 - .46045 - . 1 8454 - . 1 5279 .06951 .09498 .08865 - .03566 . 1 1481 .00673 - .07332 .01 324 Ratio of Coefficients to ( 1 .96 * Standard Error) - 2.08586 - .3721 0 - .88070 .53284 .24 1 32 - .09005 - . 1 8061 - 2.503 1 0 - .925 14 - .76248 .34477 .4761 5 .47991 - . 1 9433 .6253 1 .072 1 1 .03638 - .40372 295 §9.2. Identification Techniques Table 9.5. Moving Average Models for Example 9.2.2 j 3 5 12 13 23 AICC 62 0 1 30 Model (9.2.5) {Jj - .355 - .201 0 - .524 0 .241 - 486.04 .00 1 25 .00125 {Jj - .433 - .306 .238 - .656 0 .352 - 489.95 .001 03 .00 1 1 7 {)j - .396 0 0 - .6 1 4 .243 0 - 483.38 .001 34 .00 1 34 Model (9.2.6) Model (9.2.9) by Figure 9. 1 3, we find that there is a model with even smaller AICC value than (9.2.5), namely X1 = Z1 - .433Z1 _ 1 - .306Z1 _ 3 + .238Z1 _ 5 (9.2.6) - .65621_ 1 2 + .35221 - 2 3 , with {Z1} � WN(O, .001 03) and AICC = - 489.95. Since the process defined by (9.2.6) passes the goodness of fit tests in Section 9.4, we choose it as our moving average model for the data. The substantial reduction in white noise variance achieved by (9.2.6) must be interpreted carefully since (9.2.5) is an invertible model and (9.2.6) is not. Thus for (9.2.6) the asymptotic one-step linear predictor variance (the white noise variance of the equivalent invertible version of the model) is not a 2 but a 2/ 1 b1 · · · bj l 2 (see Section 4.4), where b 1 , . . . , bj are the zeroes of the moving average polynomial 8(z) inside the unit circle. For the model (9.2.6), j = 4 and I b1 · · · bj l = .939, so the asymptotic one-step predictor variance is .001 1 7, which is still noticeably smaller than the value .00 1 25 for (9.2.5). The maximum likelihood program PEST also computes the estimated mean squared error of prediction, v"_ 1 , for the last observation based on the first (n - 1 ). 
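The adjustment described above for the non-invertible model (9.2.6) can be carried out mechanically: locate the zeroes of the moving average polynomial θ(z) lying inside the unit circle and divide the fitted white noise variance by the squared product of their moduli. The Python sketch below does this for the coefficients of (9.2.6); the use of numpy's root finder is an assumption of the sketch, and its output should be checked against the values |b_1···b_4| ≈ .939 and .00117 quoted in the text.

import numpy as np

# theta(z) = 1 - .433 z - .306 z^3 + .238 z^5 - .656 z^12 + .352 z^23,
# stored in increasing powers of z (coefficients taken from model (9.2.6)).
theta = np.zeros(24)
theta[0] = 1.0
theta[1] = -0.433
theta[3] = -0.306
theta[5] = 0.238
theta[12] = -0.656
theta[23] = 0.352
sigma2 = 0.00103                      # fitted white noise variance of (9.2.6)

# np.roots expects coefficients in decreasing powers of z.
zeros = np.roots(theta[::-1])
inside = zeros[np.abs(zeros) < 1.0]   # zeroes of theta(z) inside the unit circle

prod_moduli = np.prod(np.abs(inside))
print("number of zeroes inside the unit circle:", len(inside))
print("|b1 ... bj| =", round(float(prod_moduli), 3))
print("asymptotic one-step predictor variance =", round(sigma2 / prod_moduli**2, 5))

This adjusted variance is distinct from the estimated prediction mean squared error v_{n−1} for the last observation mentioned just above.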
This is simply r" _ 1 times the maximum likelihood estimator of a 2 (see Section 8.7). It can be seen in Table 9.5 that v" _ 1 is quite close to IJ2 for each of the invertible models (9.2.5) and (9.2.9). The model (9.2.6) does of course have an invertible version with the same likelihood (which can be found by using the program PEST), however it will have small non-zero coefficients at lags other than 1 , 3, 5, 12 and 23. If we constrain the model to be invertible and to have zero coefficients except at lags 1, 3, 5, 12 and 23, the likelihood is maximized for parameter values precisely on the boundary of the invertible region and the maximum is strictly less than the likelihood of the model (9.2.6). Thus in the presence of lag constraints, insistence on invertibility can make it impossible to achieve the maximum value of the likelihood. A similar analysis of the data, starting from Table 9.4 and fitting auto­ regressive rather than moving average models, leads first to the model, (9.2.7) 296 9. Model Building and Forecasting with ARIMA Processes with { Z,} � WN(O, .001 46) and AICC = - 472.53. Allowing non-zero coef­ ficients also at lags 3, 4, 9 and 1 6, we obtain the improved model, X, + .365X,_1 + .467X,_ 1 2 + . 1 79X,_ 1 3 + . 1 29Xr - t 6 = Z,, (9.2.8) with { Z,} � WN(0, .001 42) and AICC = - 472.95. However neither (9.2.7) nor (9.2.8) comes close to the moving average model (9.2.6) from the point of view of the AICC value. It is interesting to compare the model (9.2.6) with the multiplicative model for {VV 1 2 V, } fitted by Box and Jenkins (1976), i.e. with Xr* = VV 1 2 V, , {Z, } � WN(0, .001 34). (9.2.9) X,* = ( 1 - .396B)( 1 - . 6 1 4B 1 2 )Z" The AICC value for this model is - 483.38, making it preferable to (9.2.8) but inferior to both (9.2.5) and to our chosen model (9.2.6). Characteristics of the three moving average models can be compared by examining Table 9.5. (c) Identification of Mixed Models. The identification of a pure auto­ regressive or moving average process is reasonably straightforward using the sample autocorrelation and partial autocorrelation functions, the pre­ liminary estimators ci»m and Om and the AICC. On the other hand, for ARMA(p, q) processes with p and q both non-zero, the sample ACF and PACF are much more difficult to interpret. We therefore search directly for values of p and q such that the AICC defined by (9.2. 1 ) is minimum. The search can be carried out in a variety of ways, e.g. by trying all (p, q) values such that p + q = 1 , then p + q = 2, etc., or alternatively by using the following steps. (i) Use maximum likelihood estimation (program PEST) to fit ARMA processes of orders ( 1, 1), (2, 2), . . . , to the data, selecting the model which gives the smallest value of the AICC. [Initial parameter estimates for PEST can be found using Option 3 to fit ARMA(p, p) models as described in Example 8.4. 1 , or by appending zero coefficients to fitted maximum likelihood models of lower order.] (ii) Starting from the minimum-AICC ARMA(p, p) model, eliminate one or more coefficients (guided by the standard errors of the estimated coefficients), maximize the likelihood for each reduced model and compute the AICC value. (iii) Select the model with smallest AICC value (subject to its passing the goodness of fit tests in Section 9.4). The procedure is illustrated in the following example. EXAMPLE 9.2.3. The sample autocorrelation and partial autocorrelation func­ tions of 200 observations of a stationary series are shown in Figure 9. 1 4. 
They suggest an AR(4) model for the data, or perhaps a mixed model with fewer coefficients. We shall explore both possibilities, first fitting a mixed model in accordance with the procedure outlined above. §9.2. Identification Techniques 297 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -0.1 -0.2 -0.3 -0.4 - 0.5 -0.6 -0.7 - 0.8 -0.9 - 1 0 10 20 30 40 20 30 40 (a) 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -0. 1 -0 2 -0.3 -0.4 -0.5 -0.6 -07 -0.8 -0.9 -1 0 10 (b) Figure 9. 14. The sample ACF (a) and PACF (b) for the data of Example 9.2.3. 298 9. Model Building and Forecasting with ARIMA Processes Table 9.6. Parameter Estimates for ARMA(p, p) Models, Example 9.2.3 (a) Preliminary Estimates (from PEST) with m = 9 p Jl J2 i'il i'iz I 2 3 .803 1 . 142 - 2.524 - .592 3.576 .868 .528 4. 195 .025 1 .982 p - 2. 1 56 AICC . 1 09 (b) Maximum Likelihood Estimates (from PEST) �� �2 � .701 2 1 . 1 1 8 - .580 3 1 . 1 22 - .555 - .020 4 1 .0 1 6 - 1.475 1 .0 1 2 � - .525 � � .892 .798 . 103 .792 .059 .889 1 .207 � - .042 .897 � .216 82 656.61 591 .43 Non-causal AICC 1 .458 652.33 .982 578.27 .982 582.39 .930 579.98 BIC 657.36 591 .85 603.17 603.67 Table 9.6(a) shows the preliminary parameter estimates �, 0 for ARMA(p, p) models with p = 1 , 2 and 3 (p = 3 gives a non-causal model) and m = 9, obtained from PEST as described in Example 8.4. 1 . On the basis of the AICC values in Table 9.6(a), the ARMA(2, 2) model is the most promising. Since the preliminary ARMA(3, 3) model is not causal, it cannot be used to initialize the search for the maximum likelihood ARMA(3, 3) model. Instead, we use the maximum likelihood ARMA(2, 2) model with appended coefficients r/J3 = 83 = 0. The maximum likelihood results are shown in Table 9.6(b). The AICC values have a clearly defined minimum at p = 2. Comparing each coefficient of the maximum likelihood ARMA(2, 2) model with its stan­ dard error we obtain the results shown in Table 9.7, which suggest dropping the coefficient 82 and fitting an ARMA(2, 1 ) process. Maximum likelihood estimation then gives the model (for the mean-corrected data), X, - 1 . 1 85X,_ 1 + .624X,_ 2 = Z, + .703Z,_ 1 , {Z, } ""' WN (0, .986), (9.2. 1 0) with AICC value 576.88 and BIC value 586.48. Table 9.7. Comparison of J1 , J2 , 01 and 02 with Their Standard Errors (Obtained from the Program PEST) Estimated coefficient Estimated coefficient 1 .96 * (Standard error) �1 �2 {)1 {)2 1.118 - .580 .798 . 103 5.8 1 1 - 3.605 3.604 .450 If now we fit AR(p) models of order p = 2, . . . , 6 we obtain the results shown in Table 9.8. The smallest AICC and BIC values are both achieved when p = 5, but the values are substantially larger than the corresponding values §9.2. Identification Techniques 299 Table 9.8. Maximum Likelihood AR(p) Models for Example 9.2.3 6'2 AICC p �6 �5 �4 �3 �2 �I 2 3 4 5 6 1 .379 1.712 1 .839 1 .89 1 1 .909 - .773 - 1 .364 - 1 .760 - 1 .932 - 1 .991 .428 .91 9 1 .248 1 .365 - .284 - .627 - .807 . 1 86 .362 - .092 1 .380 1.121 1 .029 .992 .984 640.83 602.03 587.36 582.35 582.77 BIC FPE 646.50 61 1 .77 600.98 599.66 603.56 1 .408 1.155 1 .071 1 .043 1 .044 for (9.2. 1 0). We therefore select the ARMA(2, 1 ) model, subj ect to its passing the goodness of fit tests to be discussed in Section 9.4. The data for this example were in fact generated by the Gaussian process, { Z, } � WN(O, 1 ). (9.2. 1 1) (d) Use of the R esiduals for Model Modification. 
When an ARMA model </!(B) X, = 8(B)Z1 is fitted to a given series, an essential part of the procedure is to examine the residuals, which should, if the model is satisfactory, have the appearance of a realization of white noise. If the autocorrelations and partial autocorrelations of the residuals suggest that they come from some other clearly identifiable process, then this more complicated model for the residuals can be used to suggest a more appropriate model for the original data. If the residuals appear to come from an ARMA process with coefficient vectors <1-z and 9z, this indicates that { Z, } in our fitted model should satisfy <Pz(B)Z1 = 8z(B) W, where { W, } is white noise. Applying the operator <Pz(B) to each side of the equation defining {X, } , we obtain, </lz(B) </J (B)X, = </!z(B)8(B)Z1 = 8z(B)8(B) W, , (9.2. 1 2) where { W, } is white noise. The modified model for { X, } is thus an ARMA process with autoregressive and moving average operators <Pz(B)</!(B) and 8z(B)8(B) respectively. EXAMPLE 9.2.4. Consider the AR(2) model in Table 9.8, (1 - 1 .379B + .773B 2 )X, = Z,, (9.2. 1 3) which was fitted to the data of Example 9.2.3. This is an unsatisfactory model, both for its high AICC value and the non-whiteness of its residuals. The sample autocorrelation and partial autocorrelation functions of its residuals are shown in Figure 9. 1 5. They suggest an MA(2) model for {Z,}, i.e. Z, = (1 + 8 1 B + 82 B 2 ) W,, { W, } � WN(0, 0" 2 ). (9.2. 14) From (9.2. 1 3) and (9.2. 14) we obtain an ARMA(2, 2) process as the modified model for { X, } . Fitting an ARMA(2, 2) process by maximum likelihood and allowing subsets of the coefficients to be zero leads us to the same model for the data as was found in Example 9.2.3. 300 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -0.1 -0.2 -0.3 -0.4 -0 5 -0.6 -0.7 -0.8. -0.9 -1 9. Model Building and Forecasting with ARIMA Processes 0 10 20 30 40 20 30 40 (a) 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0. 1 0 -0. 1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1 0 10 (b) Figure 9. 1 5. The sample ACF (a) and PACF (b) of the residuals when the data of Example 9.2.3 is fitted with the AR(2) model (9.2. 1 3). §9.3. Order Selection 301 It is fortunate in Example 9.2.4 that the fitted AR (2) model has an auto­ regressive polynomial similar to that of the best-fitting ARMA(2, 1) process (9.2. 10). It frequently occurs, when an AR(p) model is fitted to an ARMA(p, q) process, that the autoregressive polynomials for the two processes are totally different. The residuals from the AR ( p) model in such a case are not likely to have a simple form such as the moving average form encountered in Example 9.2.4. §9.3 Order Selection In Section 9.2 we referred to the problem of overfitting and the need to avoid it by imposing a cost for increasing the number of parameters in the fitted model. One way in which this can be done for pure autoregressive models is to minimize the final prediction error (FPE) of Akaike ( 1969). The FPE is an estimate of the one-step prediction mean squared error for a realization of the process independent of the one observed. If we fit autoregressive processes of steadily increasing order p to the observed data, the maximum likelihood estimate of the white noise variance will usually decrease with p, however the estimation errors in the expanding set of fitted parameters will eventually cause the FPE to increase. According to the FPE criterion we then choose the order of the fitted process to be the value of p for which the FPE is minimum. 
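Anticipating the expression derived in the next few paragraphs, FPE = σ̂²(n + p)/(n − p) in (9.3.2), the criterion can be applied by fitting autoregressions of increasing order and locating the minimum. In the Python sketch below the white noise variance is estimated by Yule-Walker rather than by maximum likelihood, purely for brevity; that substitution, the simulated AR(2) data and the use of numpy are assumptions of the sketch.

import numpy as np

def sample_acvf(x, max_lag):
    """Sample autocovariances gamma_hat(0), ..., gamma_hat(max_lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([np.sum(xc[:n - h] * xc[h:]) / n for h in range(max_lag + 1)])

def ar_fpe(x, max_order):
    """FPE(p) = sigma2_hat * (n + p)/(n - p) for AR(p) fits, p = 1, ..., max_order.
    Here sigma2_hat is the Yule-Walker variance estimate (an assumption of this
    sketch; the text uses the maximum likelihood estimate)."""
    n = len(x)
    gamma = sample_acvf(x, max_order)
    fpe = {}
    for p in range(1, max_order + 1):
        Gamma = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
        phi = np.linalg.solve(Gamma, gamma[1:p + 1])
        sigma2 = gamma[0] - phi @ gamma[1:p + 1]
        fpe[p] = sigma2 * (n + p) / (n - p)
    return fpe

# Illustrative use on a simulated causal AR(2) series.
rng = np.random.default_rng(3)
n = 300
x = np.zeros(n)
z = rng.normal(size=n)
for t in range(2, n):
    x[t] = 1.1 * x[t - 1] - 0.5 * x[t - 2] + z[t]

fpe = ar_fpe(x, 8)
print({p: round(v, 3) for p, v in fpe.items()})
print("order selected by the FPE criterion:", min(fpe, key=fpe.get))

As in Table 9.2, the estimated white noise variance itself is non-increasing in p; it is the penalty factor (n + p)/(n − p) that eventually forces the FPE back up.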
To apply this criterion it remains only to express the FPE in terms of the data x 1 , . . . ' x• . Assume that {X 1 , , x. } is a realization of an AR ( p) process with coeffi­ cients r/J 1 , . . . , r/JP , (p < n), and let { Y1 , . . . , Y.} be an independent realization of the same process. If J 1 , . . . , JP are the maximum likelihood estimators of the coefficients based on { X 1 , . . . , X. } and if we use these to compute the one-step predictor J 1 Y. + · · · + JP ¥, + 1 - p of ¥,+ 1 , then the mean-square prediction error 1s . . • E ( Yn +1 - J 1 Y. - · · · - JP Yn + 1 - p )2 = E [ Yn+ 1 - rP1 Y, - · · · - r/Jp ¥,+ 1 -p - (J, - r/J 1 ) Y. - . . . - (Jp - r/Jp ) ¥,+1 - p] z = (J 2 + E [(cjlp - cjlp )' [ Y,+l - i ¥,+1 -J L �1 (cjlp - cjlp ) ] , A A where cjl� = (¢;1 , . . . , r/Jp ), �� = (J1 , . . . , JP) and CJ 2 is the white noise variance of the AR ( p) model. Writing the last term in the preceding equation as the expectation of the conditional expectation given X 1 , . . . , x., and using the independence of {X 1 , . . . , X. } and { Y1 , , Y. }, we obtain • • • E ( ¥,+1 - J1 Y, - · · · - Jp ¥,+1 -p )2 = (J z + E [(�p - cjlp )' fp (�p - cjlp )], where rP = E [ Y; l}J L � 1 . We can approximate the last term by assuming that 302 9. Model Building and Forecasting with ARIMA Processes n 1 12 (�p - cf>p) has its asymptotic distribution N (O, 0" 2 rP- 1 ) from Theorem 8. 1 . 1 . This gives (see Problem 1 . 1 6) E( Yn+l - J1 Yn - · · · - Jp Y,+ l -p) 2 � ( �). (9.3.1) 0"2 1 + I f 172 i s the maximum likelihood estimator o f 0" 2 then for large n, n<1 2 /0" 2 is distributed approximately as chi-squared with (n - p) degrees of freedom (see Section 8.9). We therefore replace 0" 2 in (9.3.1) by the estimator na 2/(n - p) to get the estimated mean square prediction error of Yn +l , FPE = a2 n+p . n-p (9.3.2) Inspection of Table 9.2 shows how the FPE decreases to a minimum then increases as p is increased in Example 9.2. 1 . The same table shows the non­ increasing behaviour of 17 2 . A more generally applicable criterion for model-selection is the information criterion of Akaike ( 1973), known as the AIC. This was designed to be an approximately unbiased estimate of the Kullback-Leibler index of the fitted model relative to the true model (defined below). Here we use a bias-corrected version of the AIC, referred to as the AICC, suggested by Hurvich and Tsai ( 1989). If X is an n-dimensional random vector whose probability density belongs to the family {!( · ; 1/J), 1/J E 'P}, the Kullback-Leibler discrepancy between f( · ; 1/1) and f( · ; 8) is defined as where d( 1/1 I 8) = il( 1/1 I 8) - il(8 I 8), .!l(l/1 1 8) = = E8( - 2 ln f(X ; 1/1)) r - 2 ln(f(x; 1/J))f(x; 8) dx, J n;Jn is the Kullback-Leibler index of f(· ; 1/J) relative to f( - ; 8). (In general .!l(l/1 1 8) -:f .!l(8 1 1/J).) Applying Jensen's inequality, we see that d(l/1 I 8) = ?: I J n;J n - - 2 ln 2 ln = - 2 ln = 0 ( ) f(x; 1/J) f(x; 8) dx f(x ; 8) (Jrn;Jn (t/ f(x, 1/1) f(x ; 8) dx f(x ; 8) (x ; I/J) dx with equality holding if and only if f(x; 1/J) = ) ) f(x; 8) a.e. [f( · 8)]. , §9.3. Order Selection 303 , X n of an ARMA process with unknown Given observations X 1 , parameters 8 = (p, a2), the true model could be identified if it were possible to compute the Kullback-Leibler discrepancy between all candidate models and the true model. Since this is not possible we estimate the Kullback-Leibler discrepancies and choose the model whose estimated discrepancy (or index) is minimum. 
In order to do this, we assume that the true model and the alternatives are all Gaussian. (See the Remark below for further comments on this point.) Then for any given 8 (p, a2), f( · ; 8) is the probability density of ( Y1 , , Y,)', where { t;} is a Gaussian ARMA(p, q) process with coefficient vector p and white noise variance rr2. (The dependence of 8 on p and q is through the dimension of the autoregressive and moving average coefficient vectors in p.) Suppose therefore that our observations X 1 , . . . , X n are from a Gaussian ARMA process with parameter vector 8 = (p, rr2) and assume for the moment that the true order is (p, q). Let fJ (p, 8-2) be the maximum likelihood estimator of 8 based on X I• ' xn and let Yl, . . . ' Y, be an independent realization of the true process (with parameter 8). Then . . • = • . . • . • - 2 ln L r(P, 8-2) so that = = - 2 ln L x(P, 8-2) + & - 2 Sr(P) - n, (9.3.3) Making the local linearity approximation used in Section 8. 1 1, we can write, for large n, [ asy 1 a2 Sr(P) n Sr(P) � Sr(P) + (p - p) 8i (p) + 2 (p - py (p - p) a {3; a{3j i , j = I n a zt � S r( P) + ( p - P)2 L (p)Zt(P) + (p - PY D'D (p - p). t = l ap A A A A J A A A � p From Section 8. 1 1, we know that n - 1 D'D--+a2 V - 1 ( p), p is AN(p, n - 1 V( p)), and that (aZ1jap)(P)Z1(P) has mean 0. Replacing D'D by nrr2 V - 1 (P) and assuming that n 1 12(P - p) has covariance matrix V(p), we obtain A EII,,,[Sr(P)] � � A A EII,,,[Sr(P)] + rr 2 E11.A(P - p)' V - 1 (p)(p - p)] rr2n + rr 2 (p + q), A I since (azt�ap)(p)Zt(P) is independent of p - p and E(U'I: - U) = trace(I:I: - 1 ) = k for any zero-mean random k-vector U with nonsingular covariance matrix I:. From the argument given in Section 8.9, nG-2 Sx( P) is distributed approximately as rr2x2(n - p - q) for large n and is asymptotically independent of p. With the independence of {X 1 , . . . , X n } and { Y1 , , Y,}, this implies that 8-2 is asymptotically independent of Sr( P). = • • • 9. Model Building and Forecasting with ARIMA Processes 304 (sr(P)) Consequently, Efl,a' fj2 - n � a2 (n + p + q)(Efl,<>' a 2 ) - n A - 2(p + q + 1 )n n-p-q-2 Thus the quantity, - 2 In Lx( P, ff 2 ) + 2(p + q + 1)n/(n - p - q - 2), is an approximately unbiased estimate of the expected Kullback-Leibler index £8(�(8 1 8)) in (9.3.3). Since the preceding calculations (and the maximum likelihood estimators p and ff 2 ) are based on the assumption that the true order is (p, q), we therefore select the values of p and q for our fitted model to be those which minimize AICC(p), where A ICC(�) := - 2 I n Lx(�, Sx(�)/n) + 2(p + q + 1 )n/(n The AIC statistic, defined as p - q - 2). (9.3.4) AIC(�) := - 2 ln Lx(�, S x(�)/n) + 2(p + q + 1 ), can be used in the same way. Both AICC(�, a2) and AIC(�, a 2 ) can be defined for arbitrary a 2 by replacing Sx(�)/n in the preceding definitions by a 2 , however we shall use AICC(�) and AIC(�) as defined above since both AICC and AIC are minimized for any given � by setting a 2 = Sx(�)/n. For fitting autoregressive models, Monte Carlo studies (Jones ( 1 975), Shibata ( 1976)) suggest that the AIC has a tendency to overestimate p. The penalty factors, 2(p + q + 1 )n/(n - p - q - 2) and 2(p + q + 1 ), for the AICC and AIC statistics are asymptotically equivalent as n -> oo . The AICC statistic however has a more extreme penalty for large-order models which counter­ acts the overfitting tendency of the AI C. The BIC is another criterion which attempts to correct the overfitting nature of the AI C. 
For a zero-mean causal invertible ARMA(p, q) process, it is defined (Akaike ( 1978)) to be, BIC = (n - p - q) In[nff 2/(n - p - q)] + n(1 + In Fn) + (p + q) I n [(t, X� - )/(p J nff 2 + q) (9.3.5) where ff 2 is the maximum likelihood estimate of the white noise variance. The BIC is a consistent order selection procedure in the sense that if the data {X 1, , Xn } are in fact observations of an ARMA(p, q) process, and if p and q are the estimated orders found by minimizing the BIC, then p -> p and q -> q with probability one as n -> oo (Hannan ( 1980)). This property is not shared by the AICC or AIC. On the other hand, order selection by minimization of the AICC, AIC or FPE is asymptotically efficient for autoregressive models, while order selection by BIC minimization is not • • • §9.3 Order Selection 305 (Shibata ( 1980), Hurvich and Tsai (1989)). Efficiency in this context is defined as follows. Suppose that {X,} is a causal AR(oo) process satisfying 00 L nj X, _ j = Z,, j=O (where n 0 = 1 ) and let (¢ P 1 , . . . , ¢ PP)' be the Yule-Walker estimates of the coefficients of an AR(p) model fitted to the data {X 1, , x.} (see (8.2.2)). • • • The one-step mean-square prediction error for an independent realization { Y, } of {X,}, based on the AR(p) model fitted to {X,}, is 00 = Ex( z:+ 1 - (¢ p1 + n d Y, - · · · - (¢PP + np) ¥;, + 1 - p - L nj ¥;, + 1 -) 2 j= p + 1 = (}"2 + (<l>p, oo + 1too ) roo («<>p, oo + 1too ) A f A =: H(p) where E x denotes expectation conditional on X 1 , , x •. Here {zn is the white noise associated with the { Y,} process, r oo is the infinite-dimensional covarianc� matrix of { Y,}, and <J>p. oo and 1t 00 are the infinite-dimensional , vectors, ( ¢ P 1 , . . . , rf; PP' 0, 0, . . .)' and (n 1 , n 2 , . . . )' . Now if v: is the value of p which minimizes H(p), 0 ::::; p ::::; k., and k. is a sequence of constants converging to infinity at a suitable rate, then an order selection procedure is said to be efficient if the estimated order Pn satisfies . • • H(p:) !.. 1 H{P.) as n -+ oo . In other words, an efficient order selection procedure chooses an AR model which achieves the optimal rate of convergence of the mean-square prediction error. Of course in the modelling of real data there is rarely such a thing as the " true order". For the process X, = L� o l/Jj Zr - j there may be many polynomials 8(z), ¢(z) such that the coefficients of zj in 8(z)j¢(z) closely approximate l/Jj for moderately small values ofj. Correspondingly there may be many ARMA processes with properties similar to {X,}. This problem of identifiability becomes much more serious for multivariate processes. The AICC criterion does however provide us with a rational criterion for choosing between competing models. It has been suggested (Duong ( 1 98 1)) that models with AIC values within c of the minimum value should be considered competitive (with c = 2 as a typical value). Selection from amongst the competitive models can then be based on such factors as whiteness of the residuals (Section 9.4) and model simplicity. Remark. In the course of the derivation of the AICC, it was assumed that the observations {X 1 , , X.} were from a Gaussian ARMA(p, q) process. However, even if (X 1 , . . . , X.) has a non-Gaussian distribution, the argument • • . 9. 
Model Building and Forecasting with ARIMA Processes 306 given above shows that the AICC is an approximately unbiased estimator of E( - 2 In Ly(p, 8 2)), (9.3.6) where the expectation is now taken relative to the true (possibly non-Gaussian) distribution of (X 1 , • • . , X nY and ( Y1 , . . . , Y,.)' and Ly is the Gaussian likelihood based on ( Y1 , . . . , Y,.)'. The quantity in (9.3.6) can be interpreted as the expected Kullback-Leibler index of the maximum likelihood Gaussian model relative to the true distribution of the process. The AICC for Subset Models. We frequently have occasion, particularly in analyzing seasonal data, to fit ARMA(p, q) models in which all except m ( :s;; p + q) of the coefficients are constrained to be zero (see Example 9.2.2). In such cases the definition (9.3.4) is replaced by, AICC(�) = - 2 ln Lx(�, Sx(�)/n) + 2(m + l )n/(n - m - 2). (9.3.7) §9.4 Diagnostic Checking Typically the goodness of fit of a statistical model to a set of data is judged by comparing the observed values with the corresponding predicted values obtained from the fitted model. If the fitted model is appropriate, then the residuals should behave in a manner which is consistent with the model. When we fit an ARMA(p, q) model to a given series we first find the maximum likelihood estimators �, 9 and 6- 2 of the parameters �' 9 and (J 2 • In the course of this procedure the predicted values X,(�, 0) of X, based on X 1 , , X,_ 1 are computed for the fitted model. The residuals are then defined, in the notation of Section 8.7, by • . • t = 1 , . . . , n. (9.4. 1) If w e were t o assume that the maximum likelihood ARMA(p, q) model i s the true process generating {X, }, then we could say that { Jt; } "' WN (0, 6- 2 ). However to check the appropriateness of an ARMA(p, q) model for the data, we should assume only that X 1 , • • • , X" is generated by an ARMA(p, q) process with unknown parameters �' 9 and (J 2 , whose maximum likelihood estimators are �, 9 and 6- 2 respectively. Then it is not true that { Jt; } is white noise. Nonetheless Jt;, t = 1, . . . , n should have properties which are similar to those of the white noise sequence, t = 1 , . . . , n. Moreover by (8. 1 1 .2), E( W,(�, 9) - Z,>2 is small for large t, so that properties of the residuals { Jt; } should reflect those of the white noise sequence { Z, } generating the underlying ARMA(p, q) process. I n particular the sequence { lt; } should be approximately (i) uncorrelated if {Z, } "' WN(O, (J 2 ), (ii) inde­ pendent if { Z, } "' IID(O, (J 2 ), and (iii) normally distributed if Z, "' N (0, (J 2 ). §9.4. Diagnostic Checking 307 Remark. There are several other candidates for the title "residuals" of a fitted ARMA process. One choice for example is (see Problem 5 . 1 5(a)) Z, = {J - l (B) J(B)X" t = 1, . . . , n, where J(z) : = 1 J1 z · · · JP z P, B(z) : = 1 + 0 1 z + · · · + Bq z q and X, := 0, t :<:;; 0. However we prefer to use the definition (9.4. 1 ) because of its direct interpretation as a scaled difference between an observed and a predicted value, and because it is computed for each t in the course of determining the maximum likelihood estimates. - - - The Graph of {l¥,, t = 1 , . . . , n } . If the fitted model is appropriate, then the graph of W,, t = 1 , . . . , n, should resemble that of a white noise sequence. 
While it is difficult to identify the correlation structure of { W, } (or any time series for that matter) from its graph, deviations of the mean from zero are sometimes clearly indicated by a trend or cyclic component, and non-constancy of the variance by fluctuations in W, whose magnitude depends strongly on t. The residuals obtained from fitting an AR(3) model to the data in Example 9.2. 1 are displayed in Figure 9. 1 6. The residuals W, have been rescaled; i.e. divided by the estimated standard deviation cr, so that most of the values should lie between ± 1 .96. The graph gives no indication of a non-zero mean or non-constant variance, so on this basis there is no reason to doubt the compatibility of wl ' . . . ' If;. with white noise. The next step is to check that the sample autocorrelation function of wl ' . . . ' If;. behaves as it should under the assumption that the fitted model is appropriate. 3 2 I 0 II IV -1 � � I� I� lA � � v -2 -3 0 20 40 60 80 1 00 1 20 1 40 1 60 1 80 200 Figure 9. 16. Rescaled residuals from the AR(3) model for the data of Example 9.2. 1 . 9. Model Building and Forecasting with ARIMA Processes 308 The Sample Autocorrelation Function of Jt; . The sample autocorrelations of an iid sequence Z 1 , , z. with E(Z,l} < oo are for large n approximately iid with distribution N (O, 1/n) (see Example 7.2. 1 ). Assuming therefore that we have fitted an appropriate ARMA model to our data and that the ARMA model is generated by an iid white noise sequence, the same approximation should be valid for the sample autocorrelation function of Jt;, t = 1 , . . . , n, defined by • • . h = 1 , 2, . . . where W = n- 1 L�= 1 Jt;. However, because each Jt; is a function of the maxi­ mum likelihood estimator (�, 0), W1 , . . . , J¥, is not an iid sequence and the distribution of fiw(h) is not quite the same as in the iid case. In fact fiw(h) has an asymptotic variance which for small lags is less than 1 /n and which for large lags is close to 1/n. The asymptotic distribution of fiw (h) is discussed below. Let Pw = (fiw( 1 ), . . . , Pw(h))' where h is a fixed positive integer. If { X, } is the causal invertible ARMA process </J(B)X, = O(B)Z,, define (9.4.2) and a(z) = (�(zW 1 It will be convenient also to define ai Assuming h 2: p + q, set = = 00 aizi. L j=O (9.4.3) 0 for j < 0. (9.4.4) and (9.4.5) Note that fp+q is the covariance matrix of ( Y1 , . . . , Yp+ q) where { Y, } is an AR(p + q) process with autoregressive polynomial given by �(z) in (9.4.2) and with rJ2 = 1 . Then using the argument given in Box and Pierce ( 1 970), it can be shown that (9.4.6) where Ih is the h x n- 1 ( 1 - qu). h identity matrix. The asymptotic variance of fiw (i) is thus EXAMPLE 9.4. 1 (AR( 1 )). In this case fp + q = (1 - </J2)- 1 and qii = qii (</J) = </J2 ( i -1)( 1 - </J2), i = 1 , 2, . . . . 309 §9.4. Diagnostic Checking 0.2 0. 1 5 0. 1 0 . 05 0 -0.05 -0. 1 -0. 1 5 -0.2 0 5 10 Figure 9. 1 7. The bounds ± 1 .96n- 112 ( 1 - qii(r/J))1 12 of Example 9.4.1 with n = 1 00 and rjJ = 0 (outer), rjJ = .8 (inner). The bounds ± 1 .96( 1 - qii (rp)) 112 n - 112 are plotted in Figure 9. 1 7 for two values of rp . In applications, since the true value of rp is unknown, the bounds ± 1 .96( 1 - q;; (�)) 1i2 n - 1 i2 are plotted. A value of Pw(h) lying outside these bounds suggests possible inconsistency of the residuals, a;, t = 1, . . . , n, with the fitted model. 
However it is essential to bear in mind that approximately 5 percent of the values of Pw (h) can be expected to fall outside the bounds, even if the fitted model is correct. ExAMPLE 9.4.2 (AR(2)). A straightforward calculation yields q 1 2 = - ¢ 1 r/J2 ( 1 + r/J2 ), q 2 2 = 1 - r/Ji - ¢ f ( 1 + r/J2 ) 2 . Since the sequence { ai } in (9.4.3) satisfies the recursion relations, q l l = 1 - r/Ji, ai - ¢ 1 ai - 1 - rp2 ai - 2 = 0, j ?. 2, it follows from (9.4.5) that (9.4.7) and hence that q ii = ¢ 1 qi,i - 1 + ¢2 qi,i - 2 · The asymptotic variance (1 q;; (cjl))n - 1 can thus easily be computed using the recursion (9.4.7) and the initial values q 1 1 , q 1 2 and q2 2 . The auto­ - correlations of the estimated residuals from the fitted AR(2) model in Example 3 10 9. Model Building and Forecasting with ARIMA Processes 0.2 0. 1 5 0. 1 0 . 05 0 -0.05 -0. 1 -0. 1 5 -0.2 10 0 20 30 40 Figure 9. 1 8. The autocorrelations of the residuals { W, } from the AR(2) model, X, - 1 .458Xr - � + 6X, 2 = Z,, fitted to the data in Example 9.2. 1 . The bounds are computed as described in Example 9.4.2. . _ 9.2. 1 and the bounds ± 1 .96(1 - q;; (J1 , J2 )) 1 12 n - 112 are plotted in Figure 9. 1 8. With the exception of Pw ( 1 3), the correlations are well within the confidence bounds. The limit distribution of Pw for M A ( 1 ) and MA(2) processes is the same as in Examples 9.4. 1 and 9.4.2 with cjl replaced by - 0. Moreover the ARMA ( 1 , 1 ) bounds can be found from the AR(2) bounds b y setting r/>1 = (¢> - 8) and ¢>2 - ¢>8 where ¢> and 8 are the respective parameters in the ARMA ( l , 1 ) model. = The Portmanteau Test. Instead of checking to see if each Pw(i) falls within the confidence bounds ± 1 .96(1 - q ;; ) 112 n - 1 12 , it is possible to consider instead a single statistic which depends on Pw(i), 1 s i s h. Throughout this discussion h is assumed to depend on the sample size n in such a way that (i) hn -> oo as n -> oo, and (ii) the conditions of Box and Pierce ( 1 970) are satisfied, namely = l{!i = O(n - 112 ) for j ;:::: h" where l{!i , j 0, 1, . . . are the coefficients in the expansion X, I.;o l{!i Z,_i, and (b) h. = O(n 1 12 ). Then since hn -> oo, the matrix fp+ q may be approximated by T;I;. and so the (a) = matrix Q in (9.4.5) and (9.4.6) may be approximated by the projection matrix (see Remark 2 of Section 2.5), 311 §9.4. Diagnostic Checking I;, ( n i;, )- 1 n , which has rank p + q. Thus if the model is appropriate, the distribution of Pw = (p w ( l ), . . . ' P w (h))' is approximately N(O, n -1 (Ik - I;, ( T� 7;, )- 1 rn). It then follows from Problem 2. 1 9 that the distribution of h A 2 ( ") Q w = n PwPw = n " L... Pw J A' A j= 1 is approximately chi-squared with h - (p + q) degrees of freedom. The ade­ quacy of the model is therefore rejected at level ex if Qw > x i - a (h - p - q). Applying this test to the residuals from the fitted AR(3) model in Example 9.2. 1 with h = 25, we obtain n I J;1 p?v ( j) = 1 1 .995, which is less than x 295(22) = 33.9. Thus on the basis of this test, there is no reason to doubt the adequacy of the fitted model. For the airline data in Example 9.2.2, we have n I J;1 p?v ( j) = 1 2. 1 04 for the fitted moving average model with non-zero coefficients at lags 1 , 3, 5, 1 2 and 23. Comparing this value with x 29 5 (25 - 5) = 3 1 .4, we see that the residuals pass the portmanteau test. Note that the number of coefficients fitted in the model is 5. 
For the residuals from the AR(2) model fitted to the data of Example 9.2.4, we obtain $n\sum_{j=1}^{25}\hat\rho_W^2(j) = 56.615$, which is larger than $\chi^2_{.95}(23) = 35.2$. Hence, as observed earlier, this model is not a good fit to the data.

Ljung and Box (1978) suggest replacing the statistic $Q_W$ in the above test procedure with
$$\tilde Q_W = n(n+2)\sum_{j=1}^{h} \hat\rho_W^2(j)/(n-j).$$
They argue that under the hypothesis of model adequacy, the cutoff value given by $\chi^2_{1-\alpha}(h-p-q)$ is closer to the true $(1-\alpha)$-quantile of the distribution of $\tilde Q_W$ than to that of $Q_W$. However, as pointed out by Davies, Triggs and Newbold (1977), the variance of $\tilde Q_W$ may exceed that of a $\chi^2$ distribution with $h-p-q$ degrees of freedom. The values of $\tilde Q_W$ with $h = 25$ for Examples 9.2.1 and 9.2.2 are 12.907 and 13.768, respectively. Hence the residuals pass this test of model adequacy.

Examination of the squared residuals may often suggest departures of the data from the fitted model which could not otherwise be detected from the residuals themselves. Granger and Anderson (1978) have found examples in which the residuals were uncorrelated while the squared residuals were correlated. We can test the squared residuals for correlation in the same way that we test the residuals themselves. Let
$$\hat\rho_{WW}(h) = \frac{\sum_{t=1}^{n-h}\bigl(W_t^2 - \overline{W^2}\bigr)\bigl(W_{t+h}^2 - \overline{W^2}\bigr)}{\sum_{t=1}^{n}\bigl(W_t^2 - \overline{W^2}\bigr)^2}, \qquad h \ge 1,$$
be the sample autocorrelation function of the squared residuals, where $\overline{W^2} = n^{-1}\sum_{t=1}^{n} W_t^2$. Then McLeod and Li (1983) show that
$$Q_{WW} = n(n+2)\sum_{j=1}^{h}\hat\rho_{WW}^2(j)/(n-j)$$
has an approximate $\chi^2(h)$ distribution under the assumption of model adequacy. Consequently, the adequacy of the model is rejected at level $\alpha$ if $Q_{WW} > \chi^2_{1-\alpha}(h)$. For Examples 9.2.1 and 9.2.2 with $h = 25$ we obtain the values $Q_{WW} = 26.367$ and $Q_{WW} = 16.356$, respectively. Since $\chi^2_{.95}(25) = 37.7$, the squared residuals for these two examples pass this portmanteau test.

An advantage of portmanteau tests is that they pool information from the correlations $\hat\rho_W(i)$, $i = 1, \ldots, h$, at different lags. A distinct disadvantage, however, is that they frequently fail to reject poorly fitting models. In practice portmanteau tests are more useful for disqualifying unsatisfactory models from consideration than for selecting the best-fitting model among closely competing candidates.
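The Ljung-Box and McLeod-Li statistics can be sketched in the same style. Again this is an illustration only (it reuses the `sample_acf` helper from the previous sketch, and the appropriate $\chi^2$ quantile must be supplied by the caller).

```python
import numpy as np

def ljung_box(residuals, h, chi2_quantile):
    """Q~_W = n(n+2) * sum_{j=1}^{h} rho_W(j)^2 / (n - j); compare with the
    chi-squared quantile with h - p - q degrees of freedom."""
    n = len(residuals)
    rho = sample_acf(residuals, h)          # helper defined in the previous sketch
    j = np.arange(1, h + 1)
    Q = n * (n + 2) * np.sum(rho ** 2 / (n - j))
    return Q, Q > chi2_quantile

def mcleod_li(residuals, h, chi2_quantile):
    """The same weighted statistic computed from the squared residuals; here the
    reference distribution is chi-squared with h degrees of freedom."""
    w2 = np.asarray(residuals, dtype=float) ** 2
    return ljung_box(w2, h, chi2_quantile)
```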
Tests of Randomness. In addition to the tests based on the sample autocorrelation function of $\{W_t\}$ which we have already described, there are a number of other tests available for checking the hypothesis of "randomness" of $\{W_t\}$, i.e. the hypothesis that $\{W_t\}$ is an iid sequence. Three of these tests are described below. For further details and for additional tests of randomness, see Kendall and Stuart (1976).

(a) A Test Based on Turning Points. If $y_1, \ldots, y_n$ is a sequence of observations, then we say that the data has a turning point at time $i$, $1 < i < n$, if $y_{i-1} < y_i$ and $y_i > y_{i+1}$, or if $y_{i-1} > y_i$ and $y_i < y_{i+1}$. Define $T$ to be the number of turning points of the sequence $y_1, \ldots, y_n$. If $y_1, \ldots, y_n$ are observations of a random (iid) sequence, then the probability of a turning point at time $i$ is $\tfrac{2}{3}$. The expected number of turning points is therefore
$$\mu_T = ET = 2(n-2)/3.$$
It can also be shown that the variance is
$$\sigma_T^2 = \operatorname{Var}(T) = (16n - 29)/90.$$
A large value of $T - \mu_T$ indicates that the series is fluctuating more rapidly than expected for a random series. On the other hand, a value of $T - \mu_T$ much smaller than zero indicates a positive correlation between neighbouring observations. It can be shown that for an iid sequence $T$ is $\operatorname{AN}(\mu_T, \sigma_T^2)$, so the assumption that $y_1, \ldots, y_n$ are observations from a random sequence is rejected if $|T - \mu_T|/\sigma_T > \Phi_{1-\alpha/2}$, where $\Phi_{1-\alpha/2}$ is the $1 - \alpha/2$ percentage point of the standard normal distribution. The values of $T$ for the residuals in Examples 9.2.1-9.2.3 are displayed in Table 9.9. Inspecting the $|T - \mu_T|/\sigma_T$ column of the table, we see that the three sets of residuals safely pass this test of randomness.

(b) The Difference-Sign Test. For this test we count the number of values of $i$ such that $y_i > y_{i-1}$, $i = 2, \ldots, n$, or equivalently the number of times the differenced series $y_i - y_{i-1}$ is positive. If we denote this number by $S$, it is clear that under the random sequence assumption,
$$\mu_S = ES = \tfrac{1}{2}(n-1).$$
It can also be shown, under the same assumption, that $\sigma_S^2 = \operatorname{Var}(S) = (n+1)/12$ and that $S$ is $\operatorname{AN}(\mu_S, \sigma_S^2)$. A large positive (or negative) value of $S - \mu_S$ indicates the presence of an increasing (or decreasing) trend in the data. We therefore reject the assumption of no trend in the data if $|S - \mu_S|/\sigma_S > \Phi_{1-\alpha/2}$. Table 9.9 contains the results of this test applied to the residuals of Examples 9.2.1-9.2.3. In all three cases the residuals easily pass this test of randomness. The difference-sign test as a test of randomness must be used with caution. A set of observations exhibiting a strong cyclic component will pass the difference-sign test for randomness, since roughly half of the observations will be points of increase.

(c) The Rank Test. The rank test is particularly useful for detecting a linear trend in the data. Define $P$ to be the number of pairs $(i, j)$ such that $y_j > y_i$ and $j > i$, $i = 1, \ldots, n-1$. There is a total of $\binom{n}{2} = \tfrac{1}{2}n(n-1)$ pairs $(i, j)$ with $j > i$, and for each pair the event $\{y_j > y_i\}$ has probability $\tfrac{1}{2}$ if $\{y_j\}$ is a random sequence. The mean of $P$ is therefore $\mu_P = \tfrac{1}{4}n(n-1)$. It can also be shown that the variance of $P$ is $\sigma_P^2 = n(n-1)(2n+5)/8$ and that $P$ is $\operatorname{AN}(\mu_P, \sigma_P^2)$ (see Kendall and Stuart, 1976). A large positive (negative) value of $P - \mu_P$ indicates the presence of an increasing (decreasing) trend in the data. The assumption of randomness of $\{y_j\}$ is therefore rejected at level $\alpha$ if $|P - \mu_P|/\sigma_P > \Phi_{1-\alpha/2}$. From Table 9.9 we see that the residuals from Examples 9.2.1-9.2.3 easily pass this test of randomness.

Table 9.9. Tests of Randomness Applied to Residuals in Examples 9.2.1-9.2.3

                   T     μ_T   |T-μ_T|/σ_T     S     μ_S   |S-μ_S|/σ_S      P      |P-μ_P|/σ_P
Example 9.2.1    132    132        0           99    99.5      .12        10465        .36
Example 9.2.2     87     86       .21          65    65         0          3929        .44
Example 9.2.3    131    132       .10         104    99.5     1.10        10086        .10

Checking for Normality. If it can be assumed that the white noise process $\{Z_t\}$ generating an ARMA($p, q$) process is Gaussian, then stronger conclusions can be drawn from the fitted model. For example, not only is it then possible to specify an estimated mean squared error for predicted values, but asymptotic prediction confidence bounds can also be computed (see Section 5.4). We now consider a test of the hypothesis that $\{Z_t\}$ is Gaussian. Let $Y_{(1)} < Y_{(2)} < \cdots < Y_{(n)}$ be the order statistics of a random sample $Y_1, \ldots, Y_n$ from the distribution $N(\mu, \sigma^2)$.
If X< 1 > < X< > < · · · < X<n> are the 2 order statistics from a N(O, 1) sample of size n, then • . . E l(jl = .u + O"mj , where mj = EX(j) , j = 1 , . . . , n. Thus a plot of the points (m 1 , l( 1 > ), . . . , (m", l(nl) should be approximately linear. However if the sample values Y; are not normally distributed, then the plot should be non-linear. Consequently, the squared correlation of the points (m; , l( il), i = 1 , . . . , n should be near one if the normal assumption is correct. The assumption of normality is therefore rejected if the squared correlation R 2 is sufficiently small. If we approximate m ; by <l>-1 ((i - .5)/n) (see Mage ( 1 982) for some alternative approximations), then R2 reduces to where Y = n - 1 ( Y1 + · · · + Y"). Percentage points for the distribution of R2 , assuming normality of the sample values, are given by Shapiro and Francia ( 1 972) for sample sizes n < 1 00. For n = 200, P (R2 < .987) = .05 and P (R2 < .989) = . 1 0; for n = 1 3 1 , the corresponding quantiles are .980 and .983. In Figure 9. 1 9, we have plotted (<l>- 1 ((i - .5)/n), lf( il), i = 1, . . . , n for the three sets of residuals obtained in Examples 9.2. 1 -9.2.3. The respective R2 values are .992, .984 and .990. Based on the graphs and the R2 values, the hypothesis that the residuals { W,}, and hence { Z, }, are normally distributed is not rejected, even at level . 1 0. §9.5 Forecasting ARIMA Models In this section we demonstrate how the methods of Section 5.3 can be adapted to forecast the future values of an ARIMA(p, d, q) process {X, } . (The required numerical calculations can be carried out using the program PEST.) If d 2 1 the first and second moments EX, and E(Xr+ h X,) are not determined by the difference equations (9. 1 . 1 ). We cannot expect therefore to determine best linear predictors for {X, } without further assumptions. §9.5. Forecasting ARIMA Models 315 3 ,-------,---� 0 0 0 - 3 �----,--,---�---� -3 3 -1 (a) 4 ,-------,---� 3 0 2 -2 -3 0 0 0 0 0 - 4 4------,---�---,---� 3 -3 - 1 (b) 1 Figure 9.19. Scatter plots of the points (<l>- ((i 5)/n) Wri!), i = I , . . (a) Example 9.2. 1 , (b) Example 9.2.2 and (c) Example 9.2.3. - . , . , n, for 316 9 . Model Building and Forecasting with ARIMA Processes 3 .-------,---. 0 2 oo o o - 3 �W------,---,_---,---� -3 3 -1 (c) Figure 9. 19. (continued) For example, suppose that { Y, } is a causal ARMA(p, q) process and that X0 is any random variable. Define X, = Xo + l lj , jL =! t = 1 , 2, . . . . Then { X" t :2:: 0} is an ARIMA(p, 1, q) process with mean EX, = EX0 and autocovariances E(X, + h X,) - (EX 0) 2 depending on Var(X 0) and Cov(X 0 , }j), j = 1 ' 2, . . . . The best linear predictor of xn+ l based on X0 ' X I ' . . . ' xn is the projection Ps" Xn+ l where Sn = sp{X0 , X 1 , . . . , Xn } = sp{X0, Y1 , . . . , Y, }. Thus Ps"Xn + l = PsJX o + Y1 + ··· + Y, + d = X n + Ps, Yn + ! · To evaluate this projection it is necessary in general to know E(X0 lj), j = 1 , . . . , n + 1 , and EX1;. However if we assume that X0 is uncorrelated with lj, j = 1 , 2, . . . , then Ps" Y, + 1 is simply the projection of Y, + 1 onto sp { Y1 , . . . , Y, } which can be computed as described in Section 5.3. The assumption that therefore suffices to determine the best X0 is uncorrelated with Y1 , Y2 , linear predictor Ps" Xn+ l in this case. Turning now to the general case, we shall assume that our observed process {X, } satisfies the difference equations, . . • , t = 1, 2, . . . ' §9.5. 
Forecasting ARIMA Models 317 where { Y; } is a causal ARMA(p, q) process, and that the vector (X 1 _d, . . . , X0) is uncorrelated with Y;, t > 0. The difference equations can be rewritten in the form X, = Y; - I ( �) ( j� 1 } t = 1 , 2, . . . . - 1 )i X,_i, (9.5 . 1 ) It i s convenient, b y relabelling the time axis if necessary, t o assume that we observe X 1 - d' X 2 - d, . . . , Xn . (The observed values of { Y;} are then Y1, . . . , Y;. .) Our goal is to compute the best linear predictor of Xn+ h based on x 1 - d' . . . , xn , i.e. Ps" Xn+ h : = psp{X 1 -d · · · · · Xn } Xn +h · In the notation of Section 5.2 we shall write and Since Sn = sp {X1 _d, . . . , X0, ¥1 , . . . , Y;. }, and since by assumption, we have (9.5.2) Hence if we apply the operator P8" to both sides of (9.5. 1 ) with t obtain Ps"Xn+h = Pn Y;. + h - t j� 1 ( �) ( } = n - 1 ) j Psn Xn+ h -j · + h, we (9.5.3) Since the predictors Pn Yn + 1 , P. Y;. + 2 , . . . , can be found from ( 5.3. 1 6), the pre­ dictors P8"Xn+ 1 , P8"X.+ 2 , , are then easily computed recursively from (9.5.3). In order to find the mean squared error of prediction it is convenient to express P. Yn+ h in terms of {XJ. For t :::0: 0 define . • . Then from (9.5. 1 ) and (9.5.3) with n = t we have t :::0: 0, and consequently for n p > m = max(p, q) and h q :::0: 1, P. Y;. + h = I cp i Pn Y;. + h - i + I en +h - 1 , /X n + h - j - x :+ h - J j�h i� 1 (9.5.4) 9 . Model Building and Forecasting with ARIMA Processes 318 Setting q) *(z) = ( 1 - z)d q)(z) = 1 - q)f z - · - lft;+d z p + d, we find from (9.5.2), (9.5.3) and (9.5.4) that for n > m and h � 1 , · · q p+d Psn Xn+h = L q)f Psn Xn + h -j + L en +h - l , j(Xn+ h-j - x:+h -J, j=l j=h (9.5.5) which is analogous to the h-step prediction formula (5.3. 1 6) for an ARMA process. The same argument which led to (5.3.22) shows that the mean squared error of the h-step predictor is (Problem 9.9) (9.5.6) Where 8n 0 = 1 , ro x (z) = I x, z ' = ( 1 - iftiz r=O · · · - lft;+ d z p + d ) - 1 , lzl < 1, and The coefficients Xi can be found from the recursions (5.3.2 1 ) with 1ft/ replacing q)i. For large n we can approximate (9.5.6), provided 8( · ) is invertible, by h-1 a;(h) = L l/JJa 2 , (9.5.7) j=O where l/J(z) = L l/Ji z i = (q)*(z)t 1 8(z), j=O ro lzl < 1 . EXAMPLE 9.5. 1 . Consider the ARIMA( 1 , 2, 1) model, (1 - q)B) ( 1 - B) 2 X, = ( 1 + 8B)Z,, t = 1 , 2, . . . ' where (X _ 1 , X 0) is assumed to be uncorrelated with the ARMA(1, 1 ) process, r; = ( 1 - Bf X,, t = 1 , 2, . . . . From (5.3 . 1 2) we have and pn yn+ ! = q) Y,. + en ! ( Y,. - Y,.) Since in this case q)*(z) = (1 - z) 2 ( 1 - q)z) we find from (9.5.5) that { = 1 - (I)) + 2)z + (21)) + 1 )z 2 - q)z 3 , Ps" Xn+! = (I)) + 2)Xn - (21)) + 1 )Xn - 1 + iftXn- 2 + 8" 1 ( Y,. - f,.), Ps" Xn+h = (q) + 2) Ps" Xn+h - ! - (21)) + 1 ) Ps" Xn+ h- 2 + lftPs" Xn+h - 3 for h > 1 . (9.5.8) §9.5. Forecasting ARIMA Models If for the moment we regard 319 n a s fixed and define the sequence { g(h)} by g(h) = Ps" Xn +h• { g(h)} satisfies the difference equations 1/> *(B)g(h) g(h) - (¢> + 2)g(h - 1) + (2¢> + 1 )g(h - 2) - if>g(h - 3) = 0, (9.5.9) h > 1, with initial conditions, then = (9.5.1 0) Using the results of Section 3.6, we can write the solution of the difference equation (9.5.9) in the form g(h) = a0 + a1 h + a 2 ¢> \ where a0, a1 and a 2 are determined by the initial conditions (9.5. 10). Table 9. 
10 shows the results of predicting the values X199, X2 00 and X2 0 1 o f an ARIMA(1, 2, 1) process with ¢> = .9, 8 = .8 and a 2 = 1, based o n 200 observations {X _1 , X0, . . . , X1 9 8 } . By running the program PEST to compute the likelihood of the observations Y, = ( 1 - B) 2 X,, t = 1, . . . , 198, under the model, {Z, } � WN(O, 1), we find that Y1 98 - Y1 98 = - 1 .953, 81 97 , 1 = .800 and v 197 = 1 .000. Since 81 97, 1 = limn�co en , 1 and v ! 97 = limn�ro vn to three decimal places, we use the large-sample approximation (9.5.7) to compute a?98(h). Thus ( 1 - .9B) Y, = (1 + .8B)Z,, h- 1 h-1 a?98(h) = L t/lf a 2 = L t/Jf , j� O where j� O t/J (z) = 8(z)/¢> *(z) = (1 + .8z)(1 - 2.9z + 2.8z 2 - .9z3r 1 = 1 + 3.7z + 7.93z 2 + 1 3.537z3 + . . · Since X19 6 = lzl < 1. - 221 95.57, X 1 9 7 = - 22335.07, X198 = - 22474.41 and , X19 s - x r98 = Y1 9 s - Y1 9 s equation (9.5.8) gives, = - 2261 5. 1 7. = - 1.95, Ps , 9s X199 = 2.9X 1 9 s - 2.8 X197 + .9X 1 9 6 + .8(X!9 s - X (98) 9. Model Building and Forecasting with ARIMA Processes 320 Table 9. 10. Predicted Values Based on 200 Observations {X _ 1 , X0, . . . , X198 } of the ARI M A ( 1 , 2, 1 ) Process in Example 9.5. 1 (the Standard Deviation of the Prediction Error Is Also Shown) h -1 0 Ps , x t 9s+h - 22335.07 0 - 22474.41 0 •• (J [ gg (h ) - 226 1 5. 1 7 2 3 - 22757. 2 1 3.83 - 22900.41 8.81 These predicted values and their mean squared errors can be found from PEST. The coefficients a0, a1 and a 2 in the function, g(h) = P5,98 X1 9 s +h = a0 + a 1 h + a 2 (.9)h, h 2: - 1, can now be determined from the initial conditions (9.5. 1 0) with n = 1 98. These give g(h) = - 22346.6 1 - 1 53.54h - 1 27.8(.9)h . Predicted values P5198 X 1 9 s + h for any positive h can be computed directly from g(h). More generally, for an arbitrary ARIMA(p, d, q) process, the function defined by g(h) = P5"Xn+ h satisfies the (p + d)1h-order difference equation, r/J*(B)g(h) = 0 for h > q, with initial conditions h = q, q - 1 , . . . ' q + 1 - p - d. The solution g(h) can be expressed for d 2: 1 as a polynomial of degree (d - 1 ) plus a linear combination of geometrically decreasing terms corresponding to the reciprocals of the roots of r/J(z) = 0 (see Section 3.6). The presence of the polynomial term for d 2: l distinguishes the forecasts of an ARIMA process from those of a stationary ARMA process. §9.6 Seasonal ARIMA Models Seasonal series are characterized by a strong serial correlation at the seasonal lag (and possibly multiples thereof). For example, the correlation function in Figure 9.4 strongly suggests a seasonal series with six seasons. In Section 1 .4, we discussed the classical decomposition of the time series X, = m, + s , + Y, where m, is the trend component, s, is the seasonal component, and Y, is the random noise component. However in practice it may not be reasonable to assume that the seasonality component repeats itself precisely in the same way cycle after cycle. Seasonal ARIMA models allow for randomness in the seasonal pattern from one cycle to the next. §9.6. Seasonal ARIMA Models 32 1 Suppose we have r years of monthly data which we tabulate as follows: Month 2 Year 2 3 r 12 X! x1 3 Xz s Xz x14 Xz6 xl 2 Xz 4 x 36 X I + ! 2(r-1) x 2 +1 2(r-1) x 1 2+1 2(r-1) Each column in this table may itself be viewed as a realization of a time series. Suppose that each one of these twelve time series is generated by the same ARMA(P, Q) model, or more specifically that the series corresponding to the r mOnth, Xj+ 1 2P f 0, . . . 
, r - 1 , SatisfieS a difference equatiOn Of the form, = Xj+ l 2t = <ll 1 Xj+ l 2(t-l) + . . . + <llp Xj+ l 2 (t- P) + � + 1 2t + e l uj+ l 2(t - l ) + · · · + e Q uj + l 2(t - Q> • (9.6. 1 ) where { � +12n t = . . . , - 1 , 0, 1 , . . . } - WN(O, rrb). (9.6.2) Then since the same ARMA(P, Q) model is assumed to apply to each month, (9.6. 1 ) can be rewritten for all t as x t = <Di x t - 1 2 + · · · + <Dp X t - 1 2 P + ut + e 1 u1 - 1 2 + · · · + e Q ut - 1 2 Q , where (9.6.2) holds for each j = 1 , . . . , 1 2. (Notice however that E( U1 Ut + h ) is not necessarily zero except when h is an integer multiple of 1 2.) We can thus write (9.6. 1 ) in the compact form, (9.6.3) where <ll (z) = 1 - <ll 1 z - · · · - <llp z P, E>(z) = 1 + E> 1 z + · · · + E> Q z Q, and { � + 1 2n t = . . . , - 1 , 0, 1 , . . . } - WN(O, rrb ) for each j. We refer to the model (9.6.3) as the between-year model. EXAMPLE 9.6. 1 . Suppose p = 0, Q = 1 and e I = - .4. Then the series of observations for any particular month is a moving average of order 1 . If E(U1 Ut + h ) = 0 for all h, i.e. if the white noise sequences for different months are uncorrelated with each other, then the columns themselves are uncorre­ lated. The correlation function for such a process is displayed in Figure 9.20. EXAMPLE 9.6.2. Suppose P = 1, Q = 0 and <11 1 = .7. In this case the 12 series (one for each month) are AR( l ) processes which are uncorrelated if the white noise sequences for different months are uncorrelated. A graph of the cor­ relation function of this process is shown in Figure 9.20. 9. Model Building and Forecasting with ARIMA Processes 322 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0. 1 0 -0. 1 -0.2 -0.3 -0.4 -0 5 -0.6 -0.7 -0.8 -0.9 -1 v 0 12 24 36 60 48 (a) 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -0. 1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 - 0 �8 -0.9 - 1 � 0 12 A \ 24 (b) 36 48 Figure 9.20. The autocorrelation functions of {X, } when (a) X, = U, 7X, 1 2 = U, (see Examples 9.6. 1 and 9.6.2). (b) X, - . _ J 60 - . 4U, 1 2 and _ §9.6. Seasonal ARIMA Models 323 It is unlikely that the 12 series corresponding to the different months are uncorrelated as in Examples 9.6. 1 and 9.6.2. To incorporate dependence between these series we assume now that the { U, } sequence in (9.6.3) follows an ARMA{p, q) model, l/J(B) U, = 8(B)Z1 , { Z, } � WN(O, CT 2 ). (9.6.4) This assumption not only implies possible non-zero correlation between consecutive values of U,, but also within the twelve sequences { LJ +1 2, t = . . . , - 1 , 0, 1, . . }, each of which was previously assumed to be uncor­ related. In this case (9.6.2) may no longer hold, however the coefficients in (9.6.4) will frequently have values such that E( U, U,+ 1 2J is small for j = ± 1 , ± 2, . . . . Combining the two models (9.6.3) and (9.6.4), and allowing for differencing leads us to the definition of the general seasonal multiplicative ARIMA process. . Definition 9.6.1 (The SARIMA(p, d, q) x (P, D , Q)s Process). If d and D are non-negative integers, then {X, } is said to be a seasonal ARIMA(p, d, q) x (P, D, Q)s process with period s if the differenced process t; : = ( 1 - B)d ( 1 - Bs)vX, is a causal ARMA process, l/J(B) <I>(Bs ) t; = 8(B)0 (Bs )z,, where ¢{z) = 1 - ¢ 1 z - · · · - c/JpzP, <l>(z) = 1 - <1> 1 z el z + . . . + ezz q and 0(z) = 1 + e l z + . . . e Q z Q. · · · - <l>p zP, 8(z) = 1 + Note that the process { t; } is causal if and only if l/J(z) =1= 0 and <l>{z) =I= 0 for ::::; 1 . In applications, D is rarely more than one and P and Q are typically less than three. 
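To see how the multiplicative polynomials in Definition 9.6.1 translate into an ordinary ARMA model for the differenced series, the following sketch expands $\theta(z)\Theta(z^{12})$ for a model with $p = P = 0$ and $q = Q = 1$. This is an illustration only; the numerical coefficient values are hypothetical and chosen purely for the example.

```python
import numpy as np

def seasonal_poly(nonseasonal, seasonal, s):
    """Coefficients (in increasing powers of z) of a(z) * A(z^s), where
    `nonseasonal` and `seasonal` hold the coefficients of a(z) and A(z)."""
    A = np.zeros(s * (len(seasonal) - 1) + 1)
    A[::s] = seasonal            # A(z^s): only powers that are multiples of s appear
    return np.convolve(nonseasonal, A)

# theta(z) = 1 + theta1*z and Theta(z) = 1 + Theta1*z; the values are hypothetical.
theta1, Theta1 = -0.4, -0.6
ma = seasonal_poly([1.0, theta1], [1.0, Theta1], s=12)
print(np.nonzero(ma)[0])     # the only non-zero MA lags are 0, 1, 12 and 13
print(ma[[1, 12, 13]])       # theta1, Theta1 and the product theta1*Theta1
```

The constrained ARMA(0, 13) coefficients produced here are exactly the "multiplicative relationships between the coefficients" that must be imposed when such a model is fitted.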
Because of the interaction between the two models describing the between-year and the between-season dependence structure, the covariance function for a SARIMA process can be quite complicated. Here we provide general guidelines for identifying SARIMA models from the sample correlation function of the data. First, we find $d$ and $D$ so as to make the differenced observations $\{Y_t\}$ stationary in appearance (see Section 9.2). Next, we examine the sample autocorrelation and partial autocorrelation functions of $\{Y_t\}$ at lags which are multiples of $s$ in order to identify the orders $P$ and $Q$ in the model (9.6.3). If $\hat\rho(\cdot)$ is the sample autocorrelation function of $\{Y_t\}$, then $P$ and $Q$ should be chosen so that $\hat\rho(ks)$, $k = 1, 2, \ldots$, is compatible with the autocorrelation function of an ARMA($P, Q$) process. The orders $p$ and $q$ are then selected by attempting to match $\hat\rho(1), \ldots, \hat\rho(s-1)$ with the autocorrelation function of an ARMA($p, q$) process. Ultimately the AICC criterion (Section 9.3) and the goodness of fit tests (Section 9.4) are used to identify the best SARIMA model among competing alternatives.

For given values of $p$, $d$, $q$, $P$, $D$ and $Q$, the parameters $\phi$, $\theta$, $\Phi$, $\Theta$ and $\sigma^2$ can be found using the maximum likelihood procedure of Section 8.7. The differences $Y_t = (1-B)^d(1-B^s)^D X_t$ constitute an ARMA($p + sP$, $q + sQ$) process in which some of the coefficients are zero and the rest are functions of the $(p + P + q + Q)$-dimensional vector $\beta' = (\phi', \Phi', \theta', \Theta')$. For any fixed $\beta$ the reduced likelihood $l(\beta)$ of the differences $Y_{1+d+sD}, \ldots, Y_n$ is easily computed as described in Section 8.7. The maximum likelihood estimate of $\beta$ is the value which minimizes $l(\beta)$, and the maximum likelihood estimate of $\sigma^2$ is given by (8.7.5). The estimates can be found using the program PEST by specifying the required multiplicative relationships between the coefficients.

The forecasting methods described in Section 9.5 for ARIMA processes can also be applied to seasonal models. We first predict future values of the ARMA process $\{Y_t\}$ using (5.3.16) and then expand the operator $(1-B)^d(1-B^s)^D$ to derive the analogue of equation (9.5.3), which determines the best predictors recursively. The large sample approximation to the $h$-step mean squared error for prediction of $\{X_t\}$ is $\sigma_n^2(h) = \sigma^2 \sum_{j=0}^{h-1} \psi_j^2$, where $\sigma^2$ is the white noise variance and
$$\sum_{j=0}^{\infty} \psi_j z^j = \frac{\theta(z)\Theta(z^s)}{\phi(z)\Phi(z^s)(1-z)^d(1-z^s)^D}, \qquad |z| < 1.$$
Invertibility is required for the validity of this approximation.

The goodness of fit of a SARIMA model can be assessed by applying the same techniques and tests described in Section 9.4 to the residuals of the fitted model. In the following example we fit a SARIMA model to the series $\{X_t\}$ of monthly accidental deaths in the U.S.A. (Example 1.1.6).

EXAMPLE 9.6.3. The accidental death series $X_1, \ldots, X_{72}$ is plotted in Figure 1.6. Application of the operator $(1-B)(1-B^{12})$ generates a new series $\{Y_t\}$ with no apparent deviations from stationarity, as seen in Figure 1.17. The sample autocorrelation function $\hat\rho(\cdot)$ of $\{Y_t\}$ is displayed in Figure 9.21. The values $\hat\rho(12) = -.333$, $\hat\rho(24) = -.099$ and $\hat\rho(36) = .013$ suggest a moving average of order 1 for the between-year model (i.e. $P = 0$, $Q = 1$). Moreover, inspection of $\hat\rho(1), \ldots, \hat\rho(11)$ suggests that $\hat\rho(1)$ is the only short-term correlation different from zero, so we also choose a moving average of order 1 for the between-month model (i.e.
p = 0, q = 1 ). Taking into account the sample mean (28.83 1 ) of the differences Y, = (1 - B) ( 1 - B 1 2 )X,, we therefore arrive at the model, • . . { Z, } � WN (0, u 2 ), for the series { Yr } . The maximum likelihood estimates of the parameters are, and {J1 = - .479, 01 = - .59 1 , 8- 2 = 94240, with AICC value 855.53. The fitted model for {X,} IS thus the 325 §9.6. Seasonal ARIMA Models 1 0.9 0 8 0.7 0.6 0.5 0.4 0.3 0.2 0. 1 0 -0.1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1 36 24 12 0 Figure 9.2 1 . The sample ACF of the differenced accidental deaths {VV1 2 X, }. p(l) - p(12): - .36 - . 1 0 .10 - . 1 1 .04 . 1 1 - .20 - .0 1 . 10 - .08 .20 - .33 p(13) - p(24): .09 p(25) - p(36): - .03 SARIMA(O, 1, 1) x . 1 2 - .04 - .06 . 1 8 - . 19 .09 - . 1 6 .02 .11 .02 .03 - .04 .05 - . 1 2 .04 .03 .00 - .09 .04 . 1 6 - . 10 .0 I (0, 1 , 1 ) 1 2 process ( 1 - B) (1 - B 1 2 )X, = Y; = 28.83 1 + ( 1 - .479B)( 1 - .59 1 B 1 2 )Z�> (9.6.5) where { Z,} � WN(O, 94240). Notice that the model (9.6.5) has a slightly more general form than in Definition 9.6. 1 owing to the presence of the constant term, 28.83 1 . Predicted values and their mean squared errors can however still be computed as described in Section 9.5 with minor modifications. Thus predicted values of { Y; } are obtained by adding 28.831 to the corresponding predicted values of the ARMA process { Y; - 28.83 1 }. From (9.6.5) it is easy to write out the analogue of (9.5. 1), from which the predicted values of {X,} are then found recursively as in Section 9.5. The mean-squared errors are given as before by (9.5.6), i.e. by ignoring the constant term in (9.6.5). Thus for large n the mean-squared h-step prediction error is approximately (from (9.5.7)), a� (h) = a2 h- 1 I t/JJ , where j� O a2 = 94240, 326 9. Model Building and Forecasting with ARIMA Processes = Table 9. 1 1 . Predicted Values of the Accidental Deaths Series for t 73, . . . , 78, the Standard Deviations CJ1 of the Prediction Errors, and the Observed Values X1 73 74 75 76 77 78 8441 307 7706 346 8550 38 1 8886 414 9844 443 1 028 1 471 8347 287 7620 322 8358 358 8743 394 9796 432 1 0 1 80 475 7798 7406 8363 8460 9217 93 1 6 Model (9.6.5) Predictors (JI Model (9.6.6) Predictors a, Observed values X, Instead o f fitting a S AR I M A model t o the series { X1 }, w e could look for the best-fitting moving average model for {VV1 2 X1} as we did in Example 9.2.2. This procedure leads to the model VV1 2 X1 = r; = 28.831 + Z1 - .596Z1 - l - .68521 _ 1 2 + .45821 -1 3 , - .405ZI_ 6 (9.6.6) where {Z1} � WN(O, 7 1 370). The residuals for the models (9.6.5) and (9.6.6) both pass the goodness of fit tests in Section 9.4. The AICC value for (9.6.6) is 855.61 , virtually the same as for (9.6.5). The program PEST can be used to compute the best h-step linear predictors and their mean squared errors for any ARIMA (or SARIMA) process. The asymptotic form (9.5.7) of the mean squared error (with CJ 2 replaced by vn) is used if the model is invertible. If not then PEST computes the mean squared errors by converting the model to an invertible one. In Table 9. 1 1 we show the predictors of the accidental deaths for the first six months of 1 979 together with the standard deviations of the prediction errors and the observed numbers of accidental deaths for the same period. Both of the models (9.6.5) and (9.6.6) are illustrated in the table. The second of these is not invertible. Problems 9. 1 . 
Suppose that {X, } is an ARIMA(p, d, q) process, satisfying the difference equations, ¢(B) ( 1 - B)d X, = O(B)Z,, Show that these difference equations are also satisfied by the process W, = X, + A0 + A 1 t + · · · + A d - t t d - t , where A 0 , . . . , A d - ! are arbitrary random variables. 327 Problems 9.2. The model fitted to a data set x 1, + X, .4X,_1 . . . , x 100 = Z,, is { Z, } - WN(O, 1 ). The sample acf and pacf of the residuals are shown in the accompanying table. Are these values compatible with whiteness of the residuals? If not, suggest a better model for {X, } , giving estimates of the coefficients. 2 3 4 5 6 7 8 9 10 II 12 .4 1 2 - .625 .025 - .044 - .228 .038 -.316 - .020 - .287 - .077 - . 1 98 - .007 -.I l l - .061 - .056 - .042 - .009 .089 .048 .052 . 1 33 . 1 25 La g ACF PACF .799 .799 9.3. Suppose { X, } is an MA(2) process, X, = Z, + 81 Zr-t + 82 Z,_ 2 , { Z, } ­ WN(O, 0"2). If the AR( 1 ) process, ( I - ¢B)X, = Y, , is mistakenly fitted to {X,}, determine the autocovariance function of { Y,}. 9.4. The following table shows the sample acf and pacf of the series, Y, = VX, t = 1, . . . , 399, with I,�!i Y, = 0 and Yr(O) = 8.25. (a) Specify a suitable ARMA model for { Y, } , giving estimates of all the parameters. Explain your choice of model. (b) Given that X395 = 1 02.6, X396 1 05.3, X397 = 1 08.2, X398 = 1 10.5 and X 399 1 1 3.9, use your model to find the best mean square estimates of X4 00 and X40 1 and estimate the mean squared errors of your predictors. = = Lag 2 3 4 5 .4 1 8 - .068 .298 - .080 6 8 .115 - .083 .03 1 - .045 10 9 ACF PACF .808 .808 .654 .006 .538 .023 Lag II 12 13 14 15 16 17 18 19 20 - .03 1 - .0 1 6 - .069 - .09 1 - .096 - .034 -.1 1 1 .001 - . 1 26 - .034 -.115 .05 1 -.116 - .03 1 -.1 16 - .005 - . 1 05 .008 - .083 .001 ACF PACF .210 .003 7 .007 .09 1 - .010 .003 9.5. Consider the process Y, rxt + f3 + Z, , t 0, 1 , 2, . . , { Z, } - IID(0, 0"2 ), where rx and f3 are known. Observed values of Y0, . . . , Yn are available. Let W, = Y, - Y, _ 1 rx, t I , 2, . . . . (a) Find the mean squared error of the best linear predictor W, + 1 = en d Wn - W, ) (use Problem 5.1 3). Ps,;{ W I · · · · · Wn } w,+l (b) Find the mean squared error of the predictor of Yn + h given by Y, + wn + [ + hrx, h = 1, 2, . . . . (c) Compare the mean squared error computed in (b) with that of the best predictor E( Yn + h I Yo , . . . , Y,). (d)* Compute the mean squared error of the predictor in (c) when rx and f3 are replaced by the least squares estimators & and /3 found from Y0, . . . , Y, . = - = = = . 9. Model Building and Forecasting with ARIMA Processes 328 9.6. Series A (Appendix A) consists of the lake levels in feet (reduced by 570) of Lake Huron for July of each year from 1 875 through 1972. In the class of ARIMA models, choose the model which you believe fits the data best. Your analysis should include: (i) a logical explanation of the steps taken to find the chosen model, (ii) approximate 95% confidence bounds for the components of cjl and 0, (iii) an examination of the residuals to check for whiteness as described in Section 9.4. 9.7. The following observations are the values X0, , X9 of an ARIMA(O, 1 , 2) process, VX, = Z, - 1 . 1 Z,_ 1 + .28Z,_ 2 , {Z, } WN(O, 1): 2.83, 2. 1 6, .85, - 1.04, .35, - .90, . 1 0, - .89, - 1 .57, - .42. (a) Find an explicit formula for the function g(h) = P59X9 + h' h ;::: 0. (b) Compute a�(h) for h = 1, . . . , 5. • . • � 9.8. Let {X, } be the ARIMA(2, 1 , 0) process, (1 - .8B + .25B 2 )VX, = Z,, {Z, } � WN(O, 1 ). 
Determine the function g(h) = Ps Xn+h for h > 0. Assuming that n is large, compute a;(h) for h = 1, . . . ' 5. n 9.9. Verify equation (9.5.6). 9.10. Let { X, } be the seasonal process (1 - .7B 2 ) X, = (1 + .3B 2 ) Z, {Z, } � WN(O, 1). Find the coefficients { t/lj } in the representation X, = L �o tjljZr -j · Find the coefficients { nj} in the representation Z, = L�o njXt -j · Graph the autocorrelation function of {X, } . Find a n expression for P10X1 1 and P1 0X1 2 i n terms of X1, . . . , X10 and the innovations X, - X, t = 1, . . . , 1 0. (e) Find an explicit expression for g(h) = P1 0X1 o+h , h ;::: 1, in terms of g(1) and (a) (b) (c) (d) g(2). 9. 1 1. Let { X, } be the seasonal process, 1 X, = (1 + .2B)(1 - .8B 2 )Z,, (a) Determine the coefficients { nJ in the representation Z, = L}�o njXr -j · (b) Graph the autocorrelation function of {X, } . 9. 1 2. Monthly observations { D,, - 1 1 :<::: t :<::: n} are deseasonalized b y differencing at lag 1 2. The resulting differences X, = D, - D,_1 2 , t = 1, . . . , n, are then found to be well fitted by the ARMA model, X, - 1.3X,_1 + .5X,_ 2 = Z, + .5Z,_1 , {Z, } � WN(0, 3.85). Assume in the following questions that n is large and {D, - 1 1 :<::: t :<::: 0} is uncorrelated with {X,, t ;::: 1 }; P" denotes projection onto sp {X,, 1 :<::: t :<::: n} and Ps" denotes projection onto sp { D,, - 1 1 :<::: t :<::: n}. (a) Express P"X" + ' and PnXn+l in terms of X1 , . . . , X" and the innovations (Xj - �- 1 X), j = 1, . . . , n. Problems 329 (b) Express P5"Dn + l and P5"Dn+l in terms of {D,, - 1 1 � t � n}, PnXn+ l and PnXn+ 2 · (c) Find the mean squared errors of the predictors P5"Dn + l and P5"Dn + l · 9. 1 3. For each of the time series B-F in Appendix A find an ARIMA (or ARMA) model to represent the series obtained by deleting the last six observations. Explain and justify your choice of model in each case, giving approximate confidence bounds for estimated coefficients. Use each fitted model to obtain predicted values of the six observations deleted and the mean squared errors of the predictors. Compare the predicted and observed values. (Use PEST to carry out maximum likelihood estimation for each model and to generate the approxi­ mate variances of the estimators.) CHAPTER 1 0 Inference for the Spectrum of a Stationary Process In this chapter we consider problems of statistical inference for time series based on frequency-domain properties of the series. The fundamental tool used is the periodogram, which is defined in Section 1 0. 1 for any time series { x 1 , . . . , x"}. Section 1 0.2 deals with statistical tests for the presence of "hidden periodicities" in the data. Several tests are discussed, corresponding to various different models and hypotheses which we may wish to test. Spectral analysis for stationary time series, and in particular the estimation of the spectral density, depends very heavily on the asymptotic distribution as n � oo of the periodogram ordinates of the series {X 1 , , X" }. The essential results are contained in Theorem 1 0.3.2. Under rather general conditions, the periodo­ gram ordinates J"(Jc;) at any set of frequencies A 1 , . . . , Am, 0 < A 1 < · . . < ),m < n, are asymptotically independent exponential random variables with means 2nf(Jc;), were f is the spectral density of {X, } . Consequently the periodogram In is not a consistent estimator of 2nf. Consistent estimators can however be constructed by applying linear smoothing filters to the periodogram. 
The asymptotic behaviour of the resulting discrete spectral average estimators can be derived from the asymptotic behaviour of the periodogram as shown in Section 10.4. Lag-window estimators of the form (2n) - 1 L lh l o:: r w(h/r)y(h)e - ihw, where w(x), 1 :o:; x :o:; 1 , is a suitably chosen weight function, are also dis­ cussed in Section 10.4 and compared with discrete spectral average estimators. Approximate confidence intervals for the spectral density are given in Section 10.5. An alternative approach to spectral density estimation, based on fitting an ARMA model to the data and computing the spectral density of the fitted process, is discussed in Section 1 0.6. An important role in the development of spectral analysis has been played by the fast Fourier transform algorithm, which makes possible the rapid calculation of the periodogram for very large . • . - 331 § I 0. 1 . The Periodogram data sets. An introduction to the algorithm and its application to the computa­ tion of autocovariances is given in Section 1 0.7. The chapter concludes with a discussion of the asymptotic behaviour of the maximum likelihood estimators of the coefficients of an ARMA(p, q) process. § 10. 1 The Periodogram Consider an arbitrary set of (possibly complex-valued) observations x 1 , . . . , x" made at times 1 , . . . , n respectively. The vector belongs to the n-dimensional complex space C". If u and v are two elements of C", we define the inner product of u and v as in (2. 1 .2), i.e. < u, v ) = n L U;V; · i=l (10. 1. 1 ) By imagining the data x 1 , , x" to be the values at 1 , . . . , n of a function with period n, we might expect (as is shown in Proposition 10. 1 . 1 below) that each xt can be expressed as a linear combination of harmonics, • • • t = 1 , . . . , n, ( 1 0. 1 . 2 ) where the frequencies wj = 2nj/n are the integer multiples of the fundamental frequency 2n/n which fall in the interval ( - n, n] . (Harmonics e itwJ with fre­ quencies 2nj/n outside this interval cannot be distinguished on the basis of observations at integer times only.) The frequencies wj = 2nj/n, - n < wj s n, are called the Fourier frequencies of the series { x 1 , . . . , x" } . The representation ( 1 0. 1 .2) can be rewritten in vector form as x = where L ajej, j E Fn (10. 1 .3) (10. 1 .4) }"', = {jEZ : -n < wj = 2nj/n s n} = { - [ (n - 1 )/2] , . . . , [n/2] }, ( 1 0. 1 .5) and [x] denotes the integer part of x. Notice that F" contains n integers. The validity and uniqueness of the representation (10. 1 .3) and the values of the coefficients aj are simple consequences of the following proposition. Proposition 10.1.1. The vectors ej, j E F", defined by ( 1 0. 1 .4) constitute an ortho­ normal basis for C". 332 10. Inference for the Spectrum of a Stationary Process PROOF. < ej, ek> = n - 1 I e ir<wrwk> r=l n D Corollary 1 0.1.1. For any X E en , ( 1 0. 1 .6) where ai = <x, ej ) n = n -1!2 L x,e -it wJ. t� 1 ( 1 0. 1 .7) PROOF. Take inner products of each side of ( 1 0. 1 .6) with ei , j E Fn . D Definition 10.1.1. The discrete Fourier transform of X E e n is the sequence {ai ,j E Fn } defined by ( 1 0. 1 .7). Definition 10.1.2 (The Periodogram of x E e). The value J (wi ) of the periodo­ gram of x at frequency wi = 2 nj/n, j E Fn, is defined in terms of the discrete Fourier transform { ai } of x by, l (wJ := l ai l 2 = l < x , ei ) l 2 = n-1 1 f x,e- itwJ J 2 . ( 1 0. 
1 .8) t�1 Notice that the periodogram decomposes ll x ll 2 into a sum of components [ ( x ei ) l 2 associated with the Fourier frequencies, wi , j E Fn- Thus , ll x ll 2 = I l (wJ. j E Fn ( 1 0. 1 .9) This decomposition can be neatly expressed as the "analysis of variance" shown in Table 1 0. 1 . ( [y] denotes the integer part of y.) Table 1 0. 1 . Decomposition of l l x l l 2 into Components Corresponding to the Harmonic Decomposition ( 1 0. 1 .6) of x Source Degrees of freedom Sum of squares Frequency w_[<n - 1 )/2 1 I a-r<n -1 )/2] 1 2 Frequency w0 (mean) l ao l 2 = n - 1 1 L x, l 2 Freq uency Wrn1 1 2 l a[n/2 ] 1 2 Total n t=l n ll x ll 2 333 §I 0. 1 . The Periodogram If x E IR:" and if wj ( = 2nj/n) and - wj are both in ( - n, n], it follows from aj = a_j and l (wj) = I( - wj). We can therefore rewrite ( 1 0. 1 .6) in the form ( 1 0. 1 .7) that X= a oeo + [ (n-1 )/2] L (aj ej + aj e -j ) + an;2 en/2 ' j= 1 ( 1 0. 1 . 1 0) where the last term is defined to be zero if n is odd. Writing aj in its polar form, aj = rj exp(i8J, we can reexpress ( 1 0. 1 . 1 0) as x = a0 e0 + [ (n -1)/2] 1 L 2 12 rj(cj cos 8j - sj sin 8) + a n12 en12 , j ( 1 0. 1 . 1 1 ) =1 where and sj = (2/n) 1i2 (sin wj, sin 2wj, . . . , sin nwX Now { e 0, c 1 , s 1 , . . . , c [ (n -1 )121, s[(n- 1 )12 1, en12 }, with the last vector excluded if n is odd, is an orthonormal basis for IR:". We can therefore decompose the sum of squares L7= 1 xl into components corresponding to each vector in the set. For 1 � j � [(n - 1 )/2], the components corresponding to cj and sj are usually lumped together to produc� a "frequency w/' component as in Table 1 0.2. This is just the squared length of the projection of x onto the two-dimensional subspace sp {cj, sj } of IR:". Notice that for x E IR:" the same decomposition is obtained by pooling the contributions from frequencies wj and - wj in Table 1 0. 1 . We have seen how the periodogram generates a decomposition of l l x l l 2 into components associated with the Fourier frequencies wj = 2njjn E ( - n, n]. Table 1 0.2. Decomposition of II x 1 1 2 , x E IR: " , into Components Corresponding to the Harmonic Decomposition (1 0. 1 . 1 1 ) Source Frequency w0 Sum of squares Degrees of freedom a6 = n- 1 (mean) Frequency w 1 2 2rf n L x( = Frequency wk Frequency wn;z Total = n (if n is even) n t=l (� } 2 l a 1 12 x = = J(O) 2 / (w t l 1 0. Inference for the Spectrum of a Stationary Process 334 It is also closely related to the sample autocovariance function as demonstrated in the following proposition. Y (k), l kl < n, Proposition 10.1.2 (The Periodogram of x E IC" in Terms of the Sample Auto­ covariance Function). If wi is any non-zero Fourier frequency, I(w) = I y(k) e-iko\ l kl < n then (10. 1 . 1 2) where y(k) := n -1 L,�,;;-t (x, +k - m) (x, - m), k � 0, m := n - 1 I�= 1 x, and y(k) = y( - k), k < 0. [If m = 0, or if we replace y(k) in (10. 1 . 1 2) by y(k), where y is defined like y with m replaced by zero, the following proof shows that (10. 1 . 1 2) is then valid for all Fourier frequencies, wi E ( - n, n].] PROOF. By Definition 10. 1 .2, we can write n n s=l r=l I(wj) = n -1 L xs e- iswj I x, eit wJ. I(wj) n n n -1 I L (xs - m)(x, - m ) e-i(s - t)wj s=l r= 1 = L y(k) e- ikwJ. lk l < n = 0 Remark. The striking resemblance between (10.1.12) and the expression f(w) = y(k)e- ikw for the spectral density of a stationary process with I l y (k) l < oo , suggests the potential value of the periodogram for spectral (2n) -1 L::"= _00 density estimation. 
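Proposition 10.1.2 is easy to check numerically. The sketch below is an illustration only (the function names are invented for this example); it computes the periodogram of a simulated series both from the discrete Fourier transform and from the sample autocovariances, and the two agree at any non-zero Fourier frequency.

```python
import numpy as np

def periodogram_dft(x):
    """I(w_j) = n^{-1} |sum_t x_t e^{-i t w_j}|^2 at the Fourier frequencies w_j = 2*pi*j/n."""
    x = np.asarray(x, dtype=float)
    return np.abs(np.fft.fft(x)) ** 2 / len(x)

def periodogram_from_acvf(x, j):
    """Right-hand side of (10.1.12): the sum over |k| < n of gamma_hat(k) e^{-i k w_j}."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma = np.array([np.sum(xc[:n - k] * xc[k:]) / n for k in range(n)])
    w = 2 * np.pi * j / n
    k = np.arange(1, n)
    return gamma[0] + 2 * np.sum(gamma[k] * np.cos(k * w))

rng = np.random.default_rng(0)
x = rng.normal(size=128)
print(periodogram_dft(x)[3], periodogram_from_acvf(x, 3))   # identical (up to rounding) at w_3
```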
This aspect of the periodogram will be taken up in Section 1 0.3. § 10.2 Testing for the Presence of Hidden Periodicities In this section we shall consider a variety of tests (based on the periodogram) which can be used to test the null hypothesis H0 that the data { X 1 , . . . , Xn } is generated by a Gaussian white noise sequence, against the alternative hypothesis H 1 that the data is generated by a Gaussian white noise sequence with a superimposed deterministic periodic component. The form of the test will depend on the way in which the periodic component is specified. The data is assumed from now on to be real. (a) Testing for the model for the data is Presence of a Sinusoid with Specified Frequency. 11 The (10.2.1) + A cos wt + B sin wt + Z,, where {Z, } is Gaussian white noise with variance a 2 , A and B are non-random X, = 335 §10.2. Testing for the Presence of Hidden Periodicities constants and w is a specified frequency. The null and alternative hypotheses are and (10.2.2) H0 : A = B H1 and B are not both zero. : A = 0, (10.2.3) If w is one of the Fourier frequencies w = 2 nk/n E (0, n), then the analysis of variance (Table 1 0.2) provides us with an easy test. The model ( 1 0.2. 1 ) can be written, in the notation of (10. 1 . 1 1 ), as Z � N(O, cr 2 /"). ( 10.2.4) We therefore reject H0 in favour of H1 if the frequency wk sum of squares in Table 10.2, i.e. 2/(wk ), is sufficiently large. To determine how large, we observe that under H0 (see Problem 2. 1 9), 2/(wk ) = II P5P{c".s.J X II 2 = and that I(wd is independent of II X - P5P{e0.c",s. ) X II 2 = II P'P{c".s"J Z II 2 � cr 2 x2 (2), n L X? - 1(0) - 2/(wk ) i=l � cr 2 x2 (n - We therefore reject H0 in favor of H 1 at level a if (n - 3)/(wk ) /[� X? - /(0) - 2/(wd] > 3). F1 _a(2, n - 3). An obvious modification of the above test can also be used if w = n . However if w is not a Fourier frequency, the analysis is a little more com­ plicated since the vectors 1 c = (2/n) 12 (cos w, cos 2w, . . . , cos nw)' , s = (2/n) 1 12 (sin w, sin 2 w, . . . , sin nw)', and e0 are not orthogonal. In principle however the test is quite analogous. The model now is 1 1 1 X = n 12 f.1e0 + (n/2) 12 Ac + (n/2) 12 Bs + Z, and the two hypotheses H0 and H 1 are again defined by (10.2.2) and In this case we reject H0 in favor of H 1 if is large. Now and 2/* (w) : = 1 1 Psp{e0,c,s} X - Psp{e0) X II 2 under H0, 2/*(w) � cr2x2(2), I*(w) is independent of II X - P5P{e0,c,s} X II 2 � CT 2 X 2 (n - 3). (10.2.3). 1 0. Inference for the Spectrum of a Stationary Process 336 We therefore reject H0 in favour of H 1 at level r:x if (n - 3)J*(w)/I I X - Psp{e0, c , s} X II 2 > F1 -a (2, n - 3). To evaluate the test statistic we have _ n - 1 ;2 Psp { eo } X and (see Section 2.6) n " L., i=1 X; eo, Psp{ e0,c, s} X = n 1 12 t1 e0 + (n/2) 1i2 A + (n/2) 1i2 Bs' c r where {i., A and B are least squares estimators satisfying W' W({i, A , B)' = W'X, and W is the (n x 3)-matrix [ n 112 e0 , (n/2) 1i2 c, ( n/2) 1i2 s] . (b) Testing for the Presence of a Non-Sinusoidal Periodic Component with Specified Integer- Valued Period, p < n. Iff is any function with values fr, t E 7l, and with period p E (1, n), then the same argument which led to (10. 1 . 1 1 ) shows that f has the representation, [(p- 1 )/2 ] 1 (10.2.5) A 1 ( - 1) , [Ak cos (2nkt/p) Bk sin (2nk t/p)] = fr L fl. + + + k= 1 P2 0 if p is odd. Our model for the data is therefore t = 1, . . . , n where { Z, } is Gaussian white noise with variance a 2 and fr ( 10.2.5). 
The null hypothesis is H0 : Aj = Bj = 0 for all j, where A P12 := (10.2.6) is defined by (10.2.7) and the alternative hypothesis is H1 : H0 is false. (10.2.8) Define the n-component column vectors, e0 = (1/n) 112 (1, 1, . . . , 1 )', yj = (2/n) 1 12 (cos 1/Jj , cos 21/fj , . . . , cos ni/IJ' and Gj = (2/n) 1i2 (sin 1/Jj , sin 2 1/fj , . . . , sin n i/Jj )' where 1/Jj 2 j/p, j 1 , 2, . . . , [p/2]. Now let S(p) be the span of the p vectors e0, y 1 , CJ 1 , y 2 , CJ 2 , . . . , (the last is eP1 2 if p is even, CJ(p - 1 )1 2 if p is odd) and let W be the (n x p)-matrix W = [eo , y, , CJ1 , y 2 , . . ]. The projection of X = ( X 1 , , X.)' onto S(p) is then (see Section 2.6) = n = . • . . § 1 0.2. Testing for the Presence of Hidden Periodicities Ps<P> x = From (10.2.5) and (10.2.6), 337 ( 10.2.9) W(W' w) - 1 wx. (10.2. 10) I I X - Ps<P> X II 2 = liZ - Ps<p> Z f � a 2 x2 (n - p), since Z : = (Z1 , • • . , Z")' � N (O, a 2 /"). Moreover under H0, I I Ps(p)X - Psp { e0) X II 2 = I I Ps(p)z - P5P (e0 ) Z II 2 � D" 2 X2 (p - (10.2. 1 1) 1 ), and is independent of II X - Ps<P> X II 2 . We reject H0 in favour of H 1 if I I Ps(p)X - P5P{•o} X II is sufficiently large. From ( 1 0.2. 10) and (10.2. 1 1 ), we obtain a size rx test if we reject H0 when II Ps<p> x - X l ll 2/(p - 1 ) > I I X - Ps<P> X II 2 /(n - p) Fl - a ( P - (10.2. 1 2) 1 ' n - p), where X = L:i'= 1 Xjn, 1 := (1, . . . , 1 )' and Ps<P> X is found from (10.2.9). In the special case when n is an integer multiple of the period p, say n = rp, the calculations simplify dramatically as a result of the orthogonality of the p vectors e0, y 1 , cr 1 , y , cr , . . . . In fact, in the notation of (10. 1 . 1 1 ), 2 Hence, using Table 2 j = 0, . . . ' [ p/2] . 1 0.2, 1 1 Ps(p) X II 2 = = where bP = reduces to [p/2 ] L j= O [ I I Psp{c.1) X II 2 + 11 Psp{s.1) X I I 2 ] /(0) + 2 L I(wr) + bp l(n), p 1 sj< /2 1 if p is even, 0 if p is odd. The rejection criterion (10.2. 12) therefore where, as usual, wri = 2nrj/n. (c) Testing for Hidden Periodicities of Unspecified Frequency: Fisher's Test. If { X, } is Gaussian white noise with variance a2 and X = (X 1 , , X")', then, since 2 /(wd = l l f'.--p { c• . s. J X II 2 , k = 1 , . . . , [(n - 1 )/2] , we conclude from Prob­ lem 2.19 that • • • where k = 1, . . . ' q, (1 0.2. 14) q := [(n - 1 )/2] , and that V1 , . . . , Vq are independent. Since from ( 1 0.2. 1 4) the density function of � is e - x I[ o, oo/x), we deduce that the joint density function of V1 , � is • . • , 338 10. Inference for the Spectrum of a Stationary Process fv , . . . v. (VI , . . . , vq) = q fl e - "• I[O, oo ) (vJ i =l ( 1 0.2. 1 5) This is the key result used in the proof of the following proposition. Proposition 10.2.1 . If { X1 } is Gaussian white noise, then the random variables, L � =I � L � = I I(wk ) = i = 1, . . . ' q - 1, ' Lk=l � Lk=l I(wd are distributed as the order statistics of a sample of (q - 1 ) independent random variables, each one uniformly distributed on the interval [0, 1 ] . y ·. = ' PROOF. Let si = L �=l J.j, i = 1, . . . , q. Then from ( 1 0.2. 1 5), the joint density function of S1 , , Sq is (see e.g. Mood, Graybill and Boes ( 1974)) . • . fs , . . . s. (s 1 , . . . , sq) = exp [ - s 1 - (s 2 - s d - · · · - (sq - sq_ 1 ) ] 0 s s 1 s · · · s sq. = exp( - sq), ( 1 0.2. 1 6) The marginal density function of Sq is the probability density function of the sum of q independent standard exponential random variables. Thus s: - 1 fs. (sq) = _ ! exp( - sq ), (q 1 ) ( 1 0.2. 1 7) From ( 1 0.2. 1 6) and ( 1 0.2. 
1 7), the conditional density of (SI ' . . . ' sq - 1 ) given sq is fs, . . . s. _ , Js. (S 1 , . . . , sq - I I sq) = (q - 1 ) ! s; q + I , 0 s s 1 s · · · s sq - I s sq . Since by definition Y; = SJSq, i = 1 , . . . , q - 1 , the conditional density of yl ' . . . ' Yq-1 given sq is and since this does not depend on sq, we can write the unconditional joint density of Y1 , , Yq -1 as, • • • 0 s Y 1 s · · · s Yq s 1. ( 1 0.2. 1 8) -1 This is precisely the joint density of the order statistics of a random sample of size (q - 1) from the uniform distribution on (0, 1 ). 0 Corollary 10.2.1. Under the conditions of Proposition 1 0.2. 1 , the cumulative distribution function with jumps of size (q - 1 ) - 1 at Y; , i = 1 , . . . , q - 1 , is the empirical distribution function of a sample of size ( q - 1) from the uniform distribution on (0, 1 ). Corollary 10.2.2. If we define Y0 := 0, Yq := 1 and §I 0.2. Testing for the Presence of Hidden Periodicities 339 then under the conditions of Proposition 10.2. 1, P(Mq :::;; a) = where x+ = I ( - 1 )j ( :) ( 1 - ja)'t- 1 , j =O max (x, 0). (10.2.1 9) 1 PROOF. It is clear from Proposition 1 0.2. 1 that Mq is distributed as the length of the largest subinterval of (0, 1 ) obtained when the interval is randomly partitioned by (q - 1) points independently and uniformly distributed on (0, 1). The distribution function of this length is shown by Feller (1971), p. 29, to have the form (1 0.2. 19). D Fisher' s Test for Hidden Periodicities. Corollary 10.2.2 was used by Fisher to construct a test of the null hypothesis that {X, } is Gaussian white noise against the alternative hypothesis that {X, } contains an added deterministic periodic component of unspecified frequency. The idea is to reject the null hypothesis if the periodogram contains a value substantially larger than the average value, i.e. (recalling that q = [(n - 1 )/2]) if �q : = [ max J(w;)J /[ q-1 .I J (w;)J t=1 OS: t OS: q I = ( 10.2.20) qMq, is sufficiently large. To apply the test, we compute the realized value from the data X 1 , . . . , X" and then use (1 0.2. 19) to compute P(�q 2 x) = 1 - I ( - 1)j ( :) ( 1 - jx/q)'t- 1 • j= O x of �q (10.2.21 ) 1 I f this probability i s less than rx, w e reject the null hypothesis a t level rx. = = ExAMPLE 1 0.2. 1 . Figure 10. 1 shows a realization of {X 1 , . . . , X 1 00 } together with the periodogram ordinates I(wj ),j = 1 , . . , 50. In this case q [99/2] . 49 and the realized value of � 4 9 is x = 9.4028/1 . 1 092 = 8.477. From (10.2.21), P(� 4 9 > 8.477) = .0054, and consequently we reject the null hypothesis at level .01 . [The data was in fact generated by the process t = 1, . . . ' 100, where { Z, } is Gaussian white noise with variance 1 . This explains the peak in the periodogram at w 1 7 = .34n.] X, = cos (nt/3) + Z,, The Kolmogorov-Smirnov Test. Corollary 1 0.2. 1 suggests another test of the null hypothesis that {X, } is Gaussian white noise. We simply plot the empirical distribution function defined in the corollary and check its compatibility with the uniform distribution function F(x) = x, 0 :::;; x :::;; 1, using the Kolmogorov- 10. Inference for the Spectrum of a Stationary Process 340 3 2 � (\ � LN 0 -1 N W\ 1\ � -2 � \ \I v \ u � -3 0 10 20 30 40 50 60 70 80 90 1 00 (a) 0 10 20 (b) 30 40 50 Figure 1 0. 1 . (a) The series {X 1 , . . . , X 1 00} o f Example 1 0.2. 1 and (b) the corresponding periodogram ordinates l(2nj/1 00), j = 1, . . . , 50. § 1 0.2. 
Testing for the Presence of Hidden Periodicities 341 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0. 1 0 0 20 10 40 30 50 Figure 1 0.2. The standardized cumulative periodogram C(x) for Example 1 0.2. 1 showing the Kolmogorov-Smirnov bounds for rx = .05 (inner) and rx = .01 (outer). Smirnov test. For q > 30 (i.e. for sample size n > 62), a good approximation to the level-a Kolmogorov-Smirnov test is to reject the null hypothesis if the empirical distribution function exits from the bounds 0 < X < 1, where k 0 5 = 1.36 and k 0 1 = 1 .63. This procedure is precisely equivalent to plotting the standardized cumula­ tive periodogram, C(x) = { 0, Y;, 1, X < 1, i ::::;; X < i + 1 , i = 1 , . . . , q X :::0: q, - 1, ( 10.2.22) and rejecting the null hypothesis at level a if for any x in [ 1 , q], the function C exits from the boundaries, Y = x-1 q- 1 -- 1 ± ka (q - 1 ) - /2 . ( 1 0.2.23) EXAMPLE 1 0.2.2. Figure 1 0.2 shows the cumulative periodogram and Kolmogorov-Smirnov boundaries for the data of Example 1 0.2. 1 with a = .05 and a = .01 . We do not reject the null hypothesis even at level .05 using this test. The Fisher test however rejected the null hypothesis at level 10. Inference for the Spectrum of a Stationary Process 342 .01 since it is specifically designed to detect departures from the null hypothesis of the kind encountered in this example. Generalization of the Fisher and Kolmogorov-Smirnov Tests. The null hypoth­ esis assumed for both these tests was that {X, } is Gaussian white noise. However when n is large the tests can also be used to test the null hypothesis that {X, } has spectral density f by replacing I(wk ) by I(wk )/f(wk ) in the definitions of Y; and �q · § 10.3 Asymptotic Properties of the Periodogram In this section we shall consider the asymptotic properties of the periodogram of X 1 , , X" when {X,} is a stationary time series with mean f.1 and absolutely summable autocovariance function y( " ). Under these conditions {X, } has a continuous spectral density (Corollary 4.3.2) given by . • • (10.3. 1) w E [ - n, n]. f(w) = (2n) - 1 L y(k)e- ikw, k = -oo The periodogram of {X 1 , . . . , X. } is defined at the Fourier frequencies wi = 2nj/n, wi E [ - n, n], by I.(wJ = n - 1 I X,e - ir wi 00 { I 1= 1 l z· 10. 1.2, this definition is equivalent to I.(O) = n J X J 2, (10.3.2) I.(wi) = Ln y (k)e - ikwi if wi -:f. 0, kl l< where y(k) = n - 1 L ��ik l (X, - X)(X, + Ik l - X) and X = n - 1 L �=1 X,. In deriving the asymptotic properties of I. it will be convenient to use the alternative By Proposition representation, (10.3.3) which can be established by the same argument used in the proof of Pro­ position 1 0. 1 .2. In view of (10.3.2) a natural estimate of f(wJ for wi -:f. 0 is I.(wi)/(2n). We now extend the domain of I. to the whole interval [ - n, n] in order to estimate f(w) for arbitrary non-zero frequencies in the interval [ - n, n]. This can be done in various ways, e.g. by replacing wi in (10.3.2) by w and allowing w to take any value in [ - n, n]. However we shall follow Fuller ( 1976) in defining the periodogram on [ - n, n] as a piecewise constant function which coincides with ( 1 0.3.2) at the Fourier frequencies wi E [ - n, n]. §I 0.3. Asymptotic Properties of the Periodogram 343 Definition 10.3.1 (Extension of the Periodogram). For any w E [ - n, n] the periodogram is defined as follows: if wk - n/n < w :::;; wk + n/n and 0 :::;; w :::;; n, In( - w) If. w E [ - n, O). 
Clearly this definition implies that In is an even function which coincides with (10.3.2) at all integer multiples of 2n/n. For w E [0, n], let g(n, w) be the multiple of 2n/n closest to w (the smaller one if there are two) and for w E [ - n, 0) let g(n, w) = g(n, - w). Then Definition 10.3.1 can be rewritten as (10.3.4) In(w) = In(g(n, w)). In(w) _ {In(wk ) The following proposition establishes the asymptotic unbiasedness of the periodogram estimate In(w)/(2n) of f(w) for w # 0. Proposition 1 0.3.1. If {X, } is stationary with mean J1 and absolutely summable autocovariance function y ( · ), then (i) Ein(O) - nj1 2 2nf(O) and (ii) Ein(w) -+ 2nf(w) if w # 0. (If J1 = 0 then Ein(w) converges uniformly to 2nf(w) on [ - n, n].) -+ PROOF. By Theorem 7. 1 . 1 , 00 Ein(O) - nJ1 2 = n Var (Xn) -+ L y (n) = 2nf(O). n= -oo Now if w E (0, n] then, for n sufficiently large, g(n, w) # and ( 10.3.4) 0. Hence, from (10.3.3) n - lkl Ein(w) = Ln n - 1 L E[(X, - j1)(Xr+ lk l - Jl) ] e - ikg(n ,ro) t=! lk l< = ( 1 - l k l/n)y(k)e - ikg(n, w). kl l< n However, since y ( · ) is absolutely summable, L lk l < n (1 - I k l/n)y(k)e - ik.l. con­ verges uniformly to 2nf(2) and therefore (since g(n, w) -+ w) we have EUw) -+ 2nf(w). The uniform convergence of Ein(w) to 2nf(w) when J1 = 0 is easy to check using the uniform continuity of f on [ - n, n] and the uniform convergence of g(n, w) to w on [0, n]. D L As indicated earlier, the vectors {ci, si;j = 1 , . . . , q = [(n - 1 )/2] } in equa­ tion (10. 1 . 1 1) are orthonormal. Consequently if {X,} is Gaussian white noise with variance a2, then the random variables, 1 0. Inference for the Spectrum of a Stationary Process 344 j= j= 1, ... 1, . . . ' q, (10.3.5) , q, are independent with distribution N(O, CJ 2 ). Consequently, as observed in Section the periodogram ordinates, 10.2, j= 1, . . . , q, are independently and exponentially distributed with means CJ 2 = 2nfx (wi), where is the spectral density of { Xr}. An analogous asymptotic result (Theorem can be established for linear processes. First however we shall consider the case when {X, } � IID(O, CJ 2 ). fxC )10.3.2) � IID(O, CJ 2 ) Suppose that {Z, } and let In(w), -n w n, denote the periodogram of {Z1 , , Zn} as defined by (10. 3.4). (i) If 0 2 1 then thevector random vector (In(2 1and ) con­ 2 verges in distribution as n to a of independent exponentially distributed random variables,andeachw with2njjnmean[0,CJ2n],• then (ii) If EZ{ f/CJ4 i 3)CJ4 + 2 4 if wi 0 or n, {n-1(ry Var(Jn (wi )) n -1 (., - 3)CJ4 + (J4 if 0 wi n, (10. 3.6) and (10. 3.7) 3 0 10.so 2that (If Z 1 is normally distributed, then (c).) In(wJ and In(wk) are uncorrelated for =I k, as pointed out in Section Proposition 10.3.2. � � . • • < < ··· < m < n , . . . , In()om ))' --> oo = = < oo E CJ = ., 11 j = z - < < = PROOF. (i) For an arbitrary frequency 2 E (O, n) define 2)), a (2) := a ( ( 2)) and f3 (2) : = where a (wi) and fJ(wJ are given by with Z, replacing X,. Since In(2i) = it suffices to show that (a 2 (2J + (a(2 1 ), f3 (2 d, . . . , a(2m ), f3 (2m))' is AN(O, CJ 2 I2 m ), f3(g(n, g n, (10.3.5) f32().J )/2, (10.3.8) where I2 m is the 2m 2m identity matrix. if 2 is a fixed frequency in (0, n) then for all sufficiently large n, g(n,Now 2) (0, n) and hence by the independence of the sequence { Z,} Var(a(2)) Var (a (g(n, 2)) n CJ 2 (2jn) L cos 2 ( g(n, 2)t) x E = = t= l § 10.3. Asymptotic Properties of the Periodogram Moreover for any s > 345 0, n n -1 L E(cos 2(g (n, A.)t)ZI2 /[Ieos (g(n . 
.!)I)Z,I>tn'i 2a] ) 1= 1 n � n -1 L E(ZI2 J[IZ,I >enl/2a] ) 1= 1 -+ 0 as n -+ oo , implying that a ( A.) is AN(O, tT2) by the Lindeberg condition (see Billingsley ( 1 986)). Finally, for all sufficiently large n, g(n, A.J E (0, n), i = 1, . . . , m, and since the covariance matrix of (e<(A.d, f3(A.d, . . . , e<(A.m ), f3(A.m ))' is 0"2 12 m , the joint convergence in (i) is easily established using the Cramer-Wold device. (ii). By definition of J"(wJ, we have n n In (w·) J = n -1 '\' � '\' � Zs Zl e iwJ(I - s)' s= 1 1 = 1 and hence, n n n n E ln (wJ in (wk ) = n - 2 L L L L E(ZS ZI Zu Zv ) eiwj(l - s)eiwk( v - u). s= 1 1 = 1 u= 1 v = 1 By (7.3.4), this expression can be rewritten as ( n -1 ( '1 - 3)0"4 + 0"4 1 + n - 2 and since EJ"(wj) = n -1 L�= 1 EZ12 1� = 1 + n-2 1� 1 l) e i(wk - wj)l 2 , (I �� e i(wj+wk)l 12 + I �� ei(wk-wj)l 1 2} 0"2, it follows that 4 Cov (Jn (wj), Jn (wk)) = n -1 ('1 - 3)0" + n-20"4 . l ei(wj+wk)l 2 The relations (10.3.6) and ( 10 3 .7) are immediate consequences of this equation. 0 We next extend Proposition 10.3.2 to the linear process { Z1 } � IID(O, 0"2 ), (10.3.9) j= - oo where L� - ro I I/I) < oo . The spectral density of this process is related (see (4.4.3)) to the spectral density of the white noise sequence {Z1 } by - n � A. � n, j where 1/J(e -u) = LJ= -ro 1/Jj e-i .< (and fz(A) = CT2j2n). Since J" (A.)/2n can be thought of as a sample version of the spectral density function, we might expect a similar relationship to exist between the respective periodograms of { X1 } and { Z1 }. This is the content of the following theorem. I 0. Inference for the Spectrum of a Stationary Process 346 Theorem 1 0.3.1. Let {X, } be the linear process defined by ( 1 0.3.9) and let In , x(A) and In, z(A) denote the periodograms of {X 1 , . . . , Xn } and {Z1 , . . . , Zn } respec­ tively. Then, if wk = 2nk/n E [0, n], we can write (10.3.10) where max w, E [0, 1t) E I Rn(wk ) I --+ O as n --> oo. /f in addition, L� - ro l l/li 1 1N12 < oo and E I Z1 14 < oo, then maxw, E [0,1t] E 1 Rn (wd l 2 = O (n - 1 ). Remark 1. Observe that we can rewrite (10.3. 10) as n, ln, x ( )o) = 1 1/J (e - ig( ).)W In, z(A) + R n (g(n, A)), (10.3. 1 1) where suph [ -" · "l E I R n (g(n, A)) l --> 0. In particular, Rn (g(n, A)) .!.. 0 for every A E [ - n, n]. PROOF. Set A = wk E [0, n] and let lx(A) and Jz(A) denote the discrete Fourier transforms of {X,} and { Z,} respectively. Then n lx(A) = n - 112 L X, e - w r= 1 I.e. ( 1 0.3. 1 2) , n oo , 1, ).i Unj • "" 1 z1 e - ;;, and Y.n ( A, ) - n - 112 " " -j1 -j z,e -;;,, - L.,t= Where Unj - L.,t= _ 00 'f'j e -i L.,j= Note that if Ul < n, then Unj is a sum of 21jl independent random variables, whereas if ljl :::::: n, Uni is a sum of 2n independent random variables. It follows that _ _ and hence that Thus (10.3. 1 3) § 10.3. Asymptotic Properties of the Periodogram 347 Now if m is a fixed positive integer we have for n > m, co n - 1/2 I l l/il [ min(l j[ , n) 1 /2 ::;; n - 1/2 I l l/il l lj[ 1 12 + I 1 1/!i [ , j� - oo I i i S: m l i l> m whence co lim n - 1 12 I l l/li [ min([ j [, n) 112 ::;; I 1 1/!J j� - co n�co l i l> m Since m is arbitrary it follows that the bound in ( 1 0.3. 1 3) converges to zero as n ---+ oo. Recalling that In , x(wk ) = lx (wk )lx( - wk ), we deduce from ( 1 0.3. 1 0) and ( 1 0.3. 1 2) that R " (A) = 1/J (e - iA ) lz (A) Y, ( - A) + ljl (eiA ) Jz ( - A) Y, (A) + I Yn (AW . Now [ ljl (e- i A ) l ::;; Ii� - oo l l/jl l < oo and E [ Jz (AW = Ein , z (A) = (J2• Moreover we have shown that the bound in ( 1 0.3. 
1 3) does not depend on A. Application of the Cauchy-Schwarz inequality therefore gives max E l Rn (wd l ---+ 0 as n ---+ oo. Finally if E [ Z 1 [4 < oo and I� - co [ thl lj[ 1 12 E l Un) 4 ::;; 2 [ j [ E [ Z 1 [4 oo, then (see Problem < 1 0. 1 4) + 3(2 [ j [ (J 2)2, so that El Yn ( A) [ 4 ::;; = n-2 C��oo l l/lj [ (2 [j[ E [ Z 1 [4 + 1 1 2 1 N (J4 ) 14 r O(n - 2). Hence by applying the Cauchy-Schwarz inequality and Proposition 1 0.3.2 to each of the terms in R� (A), we obtain max E l R n (wk W = O(n- 1 ) as desired. Theorem 1 0.3.2. Let D { X, } be the linear process, j = � co { Z, } � IID(O, (J2 ), where I� - oo 1 1/!j l < oo . Let In (A) denote the periodogram of { X � > . . . , Xn } and let f(A) be the spectral density of { X, } . (i) If f(A) > 0 for all A E [ - n, n] and if 0 < A 1 < . . . < Am < n, then the random vector Un ()o 1 ), , In ( Am))' converges in distribution to a vector of inde­ pendent and exponentially distributed random variables, the i1h component of which has mean 2nf(A;), i = 1 , . . . , m. (ii ) If I� - co l l/li l lj[ 1 12 < oo, EZt = rw4 < oo, wi = 2nj/n 2': 0 and wk = 2 nk/n 2': 0, then • • . 348 Cov(In(wJ, In(wd) = { 10. I nference for the Spectrum of a Stationary Process 2(2n) 2f 2 (wj ) + O(n - 1 12 ) if wj = wk = 0 or n, (2n) 2j l (wJ + O(n - 1 12 ) if O < wj = wk < n, O(n - 1 ) if wi =I= wk > where the terms O(n - 112 ) and O(n - 1 ) can be bounded un iformly in j and k by c n - 112 and c 2 n- 1 respectively, for some positive constants c and c 2 • 1 1 PROOF. From Theorem 10.3. 1 , we have ln (g(n, }")) = 2 nf(g(n, Aj))a - 2 ln , z (Aj ) + Rn (g(n, I"J). Since f(g(n, AJ) --+ j(Aj ) and Rn(g(n, Aj )) !. 0, the result (i) follows immedi­ ately from Propositions 10.3.2 and 6.3.8. Now if L]= - oo l t/lj l ljl 112 < oo and EZ1 < oo then from (10.3. 1 1 ) we have Var(In(wd) = (2nf(wd/a2 ) 2 Var(/n ,z (wd) + Var(Rn(wk)) In().) = + 2(2nf(wk))/a2 ) Cov(Jn, z (Wk), Rn(wk)). Since Var(Rn(wk)) :5: E I R n (wk) l 2 = O(n - 1 ) and since Var(/n , z(wd) is bounded uniformly in wk, the Cauchy-Schwarz inequality implies that Cov(Jn,z (wk), 1 Rn(wk)) = O(n - 1 2 ) . It therefore follows from (10.3.6) and Proposition 1 0.3.2 that if wk = 0 or n, if O < wk < n. A similar argument also gives Cov(Jn(wJ, In(wk)) = O(n - 112 ) if wj =I= wk. In order to improve the bound from O(n - 112 ) to O(n- 1 ) in this last relation we follow the argument of Fuller ( 1 976). Set w = wj and A = wk with A =I= w. Then by the definition of the periodo­ gram, we have By the same steps taken in the proof of Proposition 7.3. 1 , the above expression may be written as the sum of the following three terms: n n n n oo iw(t - s) e i.<( v - u)' " L. " L. " " ,/, ,/, n - 2 ('1 - 3) (j4 L. L. " L. ,/, 'l'j ,/, 'l't-s+j'l'u-s+j'l'v - s+je s = l t = l u==l v=l j= - oo (10.3. 14) ( 1 0.3. 1 5) and § 1 0.3. Asymptotic Properties of the Periodogram 349 (n -1 st1 vt1 y (v - s)e-iwse iAv) (n -1 �� .t1 y (u - t) e i"''e-iAu) . (10.3. 1 6) By interchanging the order of summation, we see that the first term is bounded by n- 2 (1] - 3)a4 C=�oo 1 1/Jj lr = O (n - 2 ). (10.3. 1 7) Now the first factor of (1 0.3. 1 5) can be written as n n n n-u n - 1 L L y (u - s) e- iw(s - u) e- i(H w)u n -1 L L y (s) e- iws e- i(w + A )u, u=l s = l - u s= l u = l from which it follows that n n n -1 L L y (u - s) e-iws e- iAu s= 1 u = l (10.3. 1 8) = However, since w + A. 
2n( j + k)/n i= 0 or 2n, we have for 0 :<S: s :<S: n n -s n n f- e i( w + A) U = f- e i(w + A)u � e i ( w + A) U u 1 u 1 u =n s+ 1 = I I I lo = Similarly, :'S: _ _ s. f u=n - s+l e i<w + A) u l I I It e- i(w + A )u :'S: l s i , - n + 1 :<S: s :<S: - 1 . u= s These inequalities show that the right side of (10.3. 1 8) is bounded by n- 1 L l s l l y (s) l jsj < n :'S: :'S: :<S: = 00 n - l L L l s i i i/Jji/jJ +s l jsj < n j = -oo n - 112 L L l s i 112 I I/iJ i/Ji+sl jsj < n j= - oo OCJ c=�oo j=�oo I s + ji 1/2 1 1/Js+ji/Jj l + s =�oo j =�oo J jl l/2 1 1/Jji/Js+j l ) 2n -112 (1:00 l s i 112 1 1/Js1 ) c � 1 1/Jj l ) = oo n -1;2 = O (n - 112 ). - 1, 1 0. Inference for the Spectrum of a Stationary Process 350 Hence I n- 1 s� ut1 y(u i - s) e- wse- ilu I = O(n - 112 ). (10.3. 1 9) The relation ( 1 0.3. 1 9) remains valid if w is replaced by - w or if A is replaced by - A or both. The terms ( 1 0.3. 1 5) and ( 1 0.3. 1 6) are therefore of order ). Taking into account (10.3. 1 7) we deduce that Cov(J.(w), /.(A)) = as desired. 0 O(n- - 1 O(n 1) A good estimator e. of any parameter () should be at least consistent, i.e. oo. However Theorem 10.3.2 shows should converge in probability to () as is not a consistent estimator of j(),). Since for large the periodo­ that gram ordinates are approximately uncorrelated with variances changing only slightly over small frequency intervals, we might hope to construct a consistent estimator of j(A) by averaging the periodogram ordinates in a small neigh­ borhood of A. (just as we obtain a consistent estimator of a population mean by averaging the observed values in a random sample of size The number of Fourier frequencies in a given interval increases approximately linearly with By averaging the periodogram ordinates over a suitably increasing number of frequencies in a neighborhood of A, we can indeed construct consistent spectral density estimators as shown in the following section. n -+ I.(A)/2n n n). n. § 1 0.4 Smoothing the Periodogram Let {X,} be the linear process LJ= -oo lthl l j l112 where < oo. If . . . , x., then we may write I.(w),jE F., is the periodogram based on X 1 , j = 1, . . . , [(n - 1 )/2] , where f is defined by ( 1 0.3. 1) and the sequence { �} (by Theorem 10.3.2) is approximately WN(O, 1) for large n. In other words, we may think of = 1, 1)/2], as an uncorrelated time series with a trend f(w) which we wish to estimate. The considerations of Section 1 .4 suggest with the aid, for example, estimating f(wj ) by smoothing the series of the simple moving average filter, + 1 - I.(wj +k). I, (2n)- 1 J(w),j . . . , [(n - (2n)- 1 l k l ,o; m {I.(w) }, (2m ) 1 More generally we shall consider the class of estimators having the form j(wJ = (2nr1 I. W.(k)I.(wj +d, l k l ,o; mn ( 1 0.4. 1) 351 §I 0.4. Smoothing the Periodogram where { mn } is a sequence of positive integers and { W,(")} is a sequence of weight functions. For notational simplicity we shall write m for m", the dependence on n being understood. In order for this estimate of the spectral density to be consistent (see Theorem 10.4. 1 below) we impose the following conditions on m and { W,( · ) }: m -> oo and m/n -> 0 as n ...... oo , W,(k) = W,( - k), I w,(k) = 1 , l k l ,; m W,(k) ;:::: 0, ( 10.4.2) for all k, (10.4.3) ( 10.4.4) and (10.4.5) I W,2(k) -> 0 as n -> oo . l k l ,; m If wi+k ¢ [ the term l (wi+k ) in ( 1 0.4. 1) is evaluated by defining I to have period The same convention will be used to define f(w), w ¢ [ We shall refer to the set of weights { W,(k), l k l $; m} as a filter. 
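Read as an algorithm, (10.4.1) is simply a weighted moving average of periodogram ordinates, with I_n extended periodically as just described. The following minimal Python sketch (our own code and naming) computes it at all Fourier frequencies for the simple moving-average filter W_n(k) = (2m+1)^{-1}; the data are mean-corrected, which sidesteps the special treatment of frequency zero discussed in Remark 2 after Theorem 10.4.1 below.

```python
import numpy as np

def discrete_spectral_average(x, m):
    """Discrete spectral average (10.4.1) at the Fourier frequencies w_j = 2*pi*j/n,
    j = 0,...,n-1, with the simple filter W_n(k) = 1/(2m+1), |k| <= m, which
    satisfies (10.4.2)-(10.4.5) provided m -> infinity and m/n -> 0."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # periodogram at all Fourier frequencies; mean-correction makes I[0] = 0
    I = np.abs(np.fft.fft(x - x.mean())) ** 2 / n
    W = np.full(2 * m + 1, 1.0 / (2 * m + 1))
    fhat = np.empty(n)
    for j in range(n):
        idx = (j + np.arange(-m, m + 1)) % n      # periodic extension of I_n (period 2*pi)
        fhat[j] = W @ I[idx] / (2 * np.pi)
    return fhat                                   # fhat[j] estimates f(w_j)

# usage: f_hat = discrete_spectral_average(x, m=8)
```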
2n.n, n], n, n]. Definition 10.4.1 (Discrete Spectral Average Estimator). The estimator ](w) = ](g(n, w)), with ](wJ defined by ( 1 0.4. 1 ) and m and { W,( · ) } satisfying ( 10.4.2)-(10.4.5), is called a discrete spectral average estimator of f(w). The consistency of discrete spectral average estimators is established in the proof of the following theorem. Theorem 10.4.1. Let { X1 } be the linear process, j :::=: - oo with I� I t/li l lj l 112 < oo and EZ{ < oo . If j is a discrete spectral average estimator of the spectral density f, then for A, w E [0, (a) lim E](w) = f(w) -oo and (b) lim n� oo ( I W,2( j ) Ii i ,; m n], ) -1 { 2j 2 (w) if w = A = 0 or Cov( ](w), f(A)) = JZ (w) if O < w = A < O if w =I= A. n, n, PROOF. (a) From (10.4. 1) we have I E](w) - f(w)l = I lkIl [(2n)-1 Eln(g(n, w) + wk ) - f(g(n, w) + wd (1 0.4.6) + f(g(n, w) + wk ) - f(w)J I · ,; m W,(k) 352 10. Inference for the Spectrum of a Stationary Process 2) m implies that lmax kl ,; m l g(n,w) + wk - wl -> 0 as n -> 0, this implies b y the continuity of f, that lmax kl,;m l f( g(n,w) + wd - f(w) l :::;; e/2, The restriction (1 0.4. on oo . For any given s > n sufficiently large. Moreover, by Proposition 1 0.3. 1 , lmax kl,;m 1 (2n)-1 EI"(g(n,w) + wk) - f(g (n,w) + wdl < s/2, for n sufficiently large. Noting that L l k l ,; m W,(k) = 1, we see from (10.4.6) that A w - f w l :::;; s for n sufficiently large. Since is arbitrary, this implies IEf () () that E/(w) -> f(w) . (b) From the definition of f we have Cov(/(w),/(A)) (2nr2 UlL,; m lkLl ,; m W,( j) W,(k) Cov(l"(g (n, w) + wJ, I"(g(n, A) + wd). I f w i= A and n i s sufficiently large, then g (n, w) + wj i= g (n, A) + wk for all l j l l k l :::;; m. Hence, with c2 as defined in Theorem 10.3.2, I Cov(/(w),/(A)) I I ljlL,; m lkLl ,; m W,(j) W,(k)O(n - 1 ) I :::;; Cz n- 1 (ljlL,; m W,( j))z :::;; c2 n-1 (ljlL,; m W,2(j)) (2m + 1 ). Since m/n -> 0, this proves assertion (b) i n the case w i= Now suppose that 0 < w = A < n. Then by Theorem 10.3.2, Var(/(w)) = (2nr2 UlL,; m W,2 (j)((2nff2(g(n,w) + wj) + O(n-112 )) + (2n)- 2 UlL,; m lkLl ,; m W,(j) W,(k)O(n-1 ). k #j An argument similar to that used in the proof of (a) shows that the first term for s = , = X is equal to The second term is bounded by c2 n -1(2nr2 (ljl,;m L W, ( j))2 :::;; c2 n-1(2n)-2 Ul,;m L W,2 (j)(2m + 1). 353 § 1 0.4. Smoothing the Periodogram Consequently (liIi S m W/(j))- 1 Var( /(w)) -+ j l (w). n Remark 1. The assumption L l k l s m W,2(k)-+ 0 ensures that Var( /(w)) -+ 0. Since E/(w) -+ f(w), this implies that the estimator /(w) is mean-square The remaining cases w = A. = 0 o r are handled i n a similar fashion. D consistent for f(w). A slight modification of the proof of Theorem 1 0.4. 1 shows in fact that sup IE/(w) - f(w)l -+ 0 -n S w :S tt and sup Var( /(w)) -+ 0. -n, n], Hence J converges in mean square to f uniformly on [ i.e. 2 sup E l /(w) - f(w)l = sup (Var( /(w)) + I E/(w) - f(w) l 2 ) Remark 2. Theorem 1 0.4. 1 refers to a zero-mean process { X1 }. In practice we deal with processes { 1'; } having unknown mean Jl. The periodogram is then usually computed for the mean-corrected series { 1'; - Y} where Y is the sample mean. The periodograms of { 1';}, { 1'; - Jl} and { 1'; - Y} are all identical at the non-zero Fourier frequencies but not at frequency zero. In order to estimate f(O) we therefore ignore the value of the periodogram at frequency 0 and use a slightly modified form of ( 1 0.4. 1 ), namely (2nr1 [ W,(O)/n(wd + 2 kt1 W,(k)In(wk+dJ. 
(10.4.7)

Moreover, whenever I_n(0) appears in the moving averages (10.4.1) for \hat f(\omega_j), j = 1, ..., [n/2], we replace it by 2\pi \hat f(0) as defined in (10.4.7).

EXAMPLE 10.4.1. For the simple moving average estimator,

W_n(k) = \begin{cases} (2m+1)^{-1} & \text{if } |k| \le m, \\ 0 & \text{otherwise}, \end{cases}

we have \sum_{|k| \le m} W_n^2(k) = (2m+1)^{-1}, so that

(2m+1)\,\mathrm{Var}(\hat f(\omega)) \to \begin{cases} 2 f^2(\omega) & \text{if } \omega = 0 \text{ or } \pi, \\ f^2(\omega) & \text{if } 0 < \omega < \pi. \end{cases}

In choosing a weight function it is necessary to compromise between bias and variance of the spectral estimator. A weight function which assigns roughly equal weights to a broad band of frequencies will produce an estimate of f(·) which, although smooth, may have a large bias, since the estimate of f(ω) depends on values of I_n at frequencies distant from ω. On the other hand, a weight function which assigns most of its weight to a narrow frequency band centered at zero will give an estimator with relatively small bias, but with a large variance. In practice it is advisable to experiment with a range of weight functions and to select the one which appears to strike a satisfactory balance between bias and variance.

EXAMPLE 10.4.2. The periodogram of 160 observations generated from the MA(1) process X_t = Z_t - .6 Z_{t-1}, {Z_t} ~ WN(0, 1), is displayed in Figure 10.3. Figure 10.4 shows the result of using program SPEC to apply the filter {1/3, 1/3, 1/3} (W_n(k) = (2m+1)^{-1}, |k| ≤ m = 1). As expected with such a small value of m, not much smoothing of the periodogram has occurred. Next we use a more dispersed set of weights, W_n(0) = W_n(1) = W_n(2) = 3/21, W_n(3) = 2/21, W_n(4) = 1/21, producing the smoother spectral estimate shown in Figure 10.5. This particular weight function is obtained by successive application of the filters {1/3, 1/3, 1/3} and {1/7, 1/7, 1/7, 1/7, 1/7, 1/7, 1/7} to the periodogram. Thus the estimates in Figure 10.5 (except for the end-values) are obtained by applying the filter {1/7, 1/7, 1/7, 1/7, 1/7, 1/7, 1/7} to the estimated spectral density in Figure 10.4. Applying a third filter {1/11, ..., 1/11} (eleven equal weights) to the estimate in Figure 10.5, we obtain the still smoother spectral density estimate shown in Figure 10.6. The weight function resulting from successive application of the three filters is shown in the inset of Figure 10.6. Its weights (multiplied by 231) are {1, 3, 6, 9, 12, 15, 18, 20, 21, 21, 21, 20, 18, 15, 12, 9, 6, 3, 1}. Except for the peak at frequency ω_{75}, the estimate in Figure 10.6 has the same general form as the true spectral density. We shall see in Section 10.5 that the errors are in fact not large compared with their approximate standard deviations.

[Figure 10.3. The periodogram I_160(2πc), 0 < c ≤ 0.5, of the simulated MA(1) series of Example 10.4.2.]
[Figure 10.4. The spectral estimate \hat f(2πc), 0 ≤ c ≤ 0.5, of Example 10.4.2, obtained with the weights {1/3, 1/3, 1/3}.]
[Figure 10.5. The spectral estimate \hat f(2πc), 0 ≤ c ≤ 0.5, of Example 10.4.2, obtained with the inset weight function.]
[Figure 10.6. The spectral estimate \hat f(2πc), 0 ≤ c ≤ 0.5, of Example 10.4.2, obtained with the inset weight function.]

EXAMPLE 10.4.3 (The Wolfer Sunspot Numbers). The periodogram for the Wolfer sunspot numbers of Example 1.1.5 is shown in Figure 10.7. Inspecting this graph we notice one main peak at frequency ω_{10} = 2π(.1) (corresponding to a ten-year cycle) and a possible secondary peak at ω = ω_{12}. In Figure 10.8 the periodogram has been smoothed using the weight function W_n(0) = W_n(1) = W_n(2) = 3/21, W_n(3) = 2/21 and W_n(4) = 1/21, which is obtained by successive application of the two filters {1/3, 1/3, 1/3} and {1/7, 1/7, 1/7, 1/7, 1/7, 1/7, 1/7} to the periodogram. In Section 10.6 we shall examine some alternative spectral density estimates for the Wolfer sunspot numbers.

[Figure 10.7. The periodogram I_100(2πc), 0 < c ≤ 0.5, of the Wolfer sunspot numbers.]
[Figure 10.8. The spectral estimate \hat f(2πc), 0 < c ≤ 0.5, of the Wolfer sunspot numbers, obtained with the same weight function as Figure 10.5.]
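Since successive application of two filters to the periodogram amounts to convolving the filters, the weight functions used in Examples 10.4.2 and 10.4.3 are easy to reproduce. The short check below is our own Python code; it recovers the weights quoted above.

```python
import numpy as np

u = lambda k: np.ones(k) / k          # simple moving-average filter of length k

w2 = np.convolve(u(3), u(7))          # weights of Example 10.4.3 (and Figure 10.5):
                                      # W(0)=W(1)=W(2)=3/21, W(3)=2/21, W(4)=1/21
w3 = np.convolve(w2, u(11))           # weights of Figure 10.6 in Example 10.4.2

print(np.round(w3 * 231).astype(int))    # [1 3 6 9 12 15 18 20 21 21 21 20 18 15 12 9 6 3 1]
print(round(float(w3.sum()), 10))        # 1.0, as required by (10.4.4)
```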
Lag Window Estimators. The spectral density f is often estimated by a function of the form

\hat f_L(\omega) = (2\pi)^{-1} \sum_{|h| \le r} w(h/r)\, \hat\gamma(h)\, e^{-ih\omega},    (10.4.8)

where \hat\gamma(·) is the sample autocovariance function and w(x) is an even, piecewise continuous function of x satisfying the conditions

w(0) = 1,
|w(x)| \le 1, \text{ for all } x,
w(x) = 0, \text{ for } |x| > 1.

The function w(·) is called the lag window, and the corresponding estimator \hat f_L is called the lag window spectral density estimator. By setting w(x) = 1, |x| ≤ 1, and r = n, we obtain 2\pi \hat f_L(\omega) = I_n(\omega) for all Fourier frequencies ω = ω_j ≠ 0. However, if we assume that r = r_n is a function of n such that r → ∞ and r/n → 0 as n → ∞, then \hat f_L is a sum of (2r+1) terms, each with a variance which is O(n^{-1}). If {r_n} satisfies these conditions and {X_t} satisfies the conditions of Theorem 10.4.1, then it can be shown that \hat f_L(\omega) is in fact a mean-square consistent estimator of f(ω).

Although the estimator \hat f_L(\omega) and the discrete spectral average estimator \hat f(\omega) defined by (10.4.1) appear to be quite different, it is possible to approximate a given lag window estimator by a corresponding average of periodogram ordinates. In order to do this, define a spectral window,

W(\omega) = (2\pi)^{-1} \sum_{|h| \le r} w(h/r) e^{-ih\omega},    (10.4.9)

and an extension of the periodogram,

\tilde I_n(\omega) = \sum_{|h| < n} \hat\gamma(h) e^{-ih\omega}.

Then \tilde I_n coincides with the periodogram I_n at the non-zero Fourier frequencies 2πj/n and, moreover,

\hat\gamma(h) = (2\pi)^{-1} \int_{-\pi}^{\pi} e^{ih\lambda}\, \tilde I_n(\lambda)\, d\lambda.

Substituting this expression into (10.4.8) we get

\hat f_L(\omega) = (2\pi)^{-2} \sum_{|h| \le r} w(h/r) \int_{-\pi}^{\pi} e^{-ih(\omega-\lambda)}\, \tilde I_n(\lambda)\, d\lambda
= (2\pi)^{-2} \int_{-\pi}^{\pi} \Big( \sum_{|h| \le r} w(h/r) e^{-ih(\omega-\lambda)} \Big) \tilde I_n(\lambda)\, d\lambda
= (2\pi)^{-1} \int_{-\pi}^{\pi} W(\omega - \lambda)\, \tilde I_n(\lambda)\, d\lambda
= (2\pi)^{-1} \int_{-\pi}^{\pi} W(\lambda)\, \tilde I_n(\omega + \lambda)\, d\lambda.

Partitioning the interval [-π, π] at the Fourier frequencies and replacing the last integral by the corresponding Riemann sum, we obtain

\hat f_L(\omega) \approx (2\pi)^{-1} \sum_{|j| \le [n/2]} W(\omega_j)\, \tilde I_n(\omega + \omega_j)\, 2\pi/n
\approx (2\pi)^{-1} \sum_{|j| \le [n/2]} W(\omega_j)\, I_n(g(n, \omega) + \omega_j)\, 2\pi/n.

Thus we have approximated \hat f_L(\omega) by a discrete spectral average with weights

W_n(j) = 2\pi W(\omega_j)/n,  \quad |j| \le [n/2].    (10.4.10)

(Notice that the approximating spectral average does not necessarily satisfy the constraints (10.4.2)-(10.4.5) imposed earlier.)
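The following sketch is our own Python code (the function names are ours): it computes the lag window estimate (10.4.8) directly from the sample autocovariances, with the triangular window w(x) = 1 - |x| as default, and also forms the approximating discrete-average weights (10.4.10).

```python
import numpy as np

def lag_window_estimate(x, r, w=lambda u: 1.0 - np.abs(u)):
    """Lag window estimate (10.4.8) at the nonnegative Fourier frequencies; the
    default w is the triangular (Bartlett) window, but any even lag window with
    w(0) = 1, |w| <= 1 and w = 0 outside [-1, 1] may be supplied."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma = np.array([xc[:n - h] @ xc[h:] / n for h in range(r + 1)])  # sample ACVF
    h = np.arange(1, r + 1)
    freqs = 2 * np.pi * np.arange(n // 2 + 1) / n
    fL = (gamma[0] + 2 * np.cos(np.outer(freqs, h)) @ (w(h / r) * gamma[1:])) / (2 * np.pi)
    return freqs, fL

def equivalent_weights(n, r, w=lambda u: 1.0 - np.abs(u)):
    """Weights W_n(j) = 2*pi*W(w_j)/n of (10.4.10), where W is the spectral
    window (10.4.9) of the same lag window."""
    h = np.arange(1, r + 1)
    j = np.arange(-(n // 2), n // 2 + 1)
    Wspec = (1.0 + 2 * np.cos(2 * np.pi * np.outer(j, h) / n) @ w(h / r)) / (2 * np.pi)
    return j, 2 * np.pi * Wspec / n
```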
From ( 1 0.4. 10) we have I W.2 ( j ) = (2nf I W 2 (wi)/n 2 I ii <; [n/2 ) Ii i <; [n/2 ) ::::: 2n I " n = ::::: { W 2 (w) dw -n 1 L w 2 (h/r) n lhl <: r - r n I-l1 (by ( 10.4.9)) w 2 (x) dx . Although the approximating spectral average does not satisfy the conditions of Theorem 10.4. 1 , the conclusion of the theorem suggests that as n -> oo , n - Var(fL(w)) -> r A 2j 1 (w) JZ(w) J 1 -1 w 2 (x) dx if w = 0 or n, f1 w2 (x) dx if O < w < (10.4. 1 1) n. If {X, } satisfies the conditions of Theorem 1 0.4. 1 and if { r. } satisfies the conditions r. -> oo and r. /n -. 0 as n -. oo, then ( 10.4. 1 1 ) is in fact true and E]L(w) -> f(w) for 0 � w � n. Proofs of these results and further discussion of JL(w) can be found in the books of Anderson (1971), Brillinger ( 198 1 ) and Hannan ( 1970). Examples. We conclude this section by listing some commonly used lag windows and the corresponding spectral windows W( · ) as defined by ( 10.4.9). ExAMPLE 1 (The Rectangular or Truncated Window). This window has the form 1 if J x l � 1 , w(x) = 0 otherwise, { and the corresponding spectral window is given by the Dirichlet kernel (see Figure 2.2), 360 10. Inference for the Spectrum of a Stationary Process W(w) = (2n) _ 1 sin\(r + i )w) . sm(w/2) ( 1 0.4. 1 2) Observe that W(w) is negative for certain values of w. This may lead to negative estimates of the spectral density at certain frequencies. From (10.4. 1 1 ) we have, a s n -> oo , 2r Var(fL(w)) � -j l (w) for 0 < w < n. n A ExAMPLE 2 (The Bartlett or Triangular Window). In this case w (x) _ { 1 - l xl 0 if l x l � 1 , if lx l > 1 , and the corresponding spectral window i s given b y the Fejer kernel (see Figure 2.3), W(w) = si nz(rw/2) . (2nr) - 1 . 2 sm (w/2) Since W(w) � 0, this window always gives non-negative spectral density esti­ mates. Moreover, as n -> oo, 0 < w < n. The asymptotic variance is thus smaller than that of the rectangular lag window estimator using the same sequence {rn } · EXAMPLE 3 (The Daniell Window). From (10.4. 1 0) we see that the spectral window, W(w) = {rj2n, 0, lwl � njr, otherwise, corresponds to the discrete spectral average estimator with weights W,(j) = (2m + 1 ) - 1 , l j l � m = [nj2r]. From ( 10.4.9) we find that the lag window corresponding to W(w) is w(h/r) I.e. = f" W(w)eihw dw = n - 1 (r/h)sin(nhjr), w(x) = sin(nx)/(nx), - 1 � x � l. The corresponding lag window estimator has asymptotic variance 0<w < n. 36 1 §1 0.4. Smoothing the Periodogram EXAMPLE 4 (The Blackman-Tukey Window). This lag window has the general form 1 - 2a + 2a cos x, l x l ::;; 1 , w(x) = 0, otherwise, { with corresponding spectral window, W(w) = aD, (w - n/r) + ( 1 - 2a)D, (w) + aD, (w + n/r), where D, is the Dirichlet kernel, (10.4. 1 2). The asymptotic variance of the corresponding density estimator is 0 < w < n. The Blackman-Tukey windows with a = .23 and a = .25 are often referred to as the Tukey-Hamming and Tukey-Hanning windows respectively. {1- EXAMPLE 5 (The Parzen Window). This lag window is defined to be 6 1 x l 2 + 6 1 x l 3, l x l < !, w(x) = 2(1 - l x l ) 3 , t ::;; l x l ::;; 1 , 0, otherwise, with approximate spectral window, 6 sin4 (rw/4) . W(w) = -3 . nr sm (w/2) 4 The asymptotic variance of the spectral density estimator is Var ( ]L (w)) � .539rjl(w)/n, 0 < w < n. Comparison of Lag-Window Estimators. Lag-window estimators may be com­ pared by examining the spectral windows when the values of r for the different estimators are chosen in such a way that the estimators have the same asymp­ totic variance. 
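By (10.4.11), the constants appearing in the asymptotic variances of Examples 1-5 are the integrals of w²(x) over [-1, 1]. The short numerical check below is our own Python code; it recovers, for instance, 2 for the rectangular window, 2/3 for the Bartlett window and approximately .539 for the Parzen window. We read the Blackman-Tukey lag window as 1 - 2a + 2a cos(πx) on [-1, 1], which is the reading consistent with the spectral window given in Example 4; a = .25 gives the Tukey-Hanning case.

```python
import numpy as np

# Lag windows of Examples 1-5 on [-1, 1].
windows = {
    "rectangular":           lambda x: np.ones_like(x),
    "Bartlett":              lambda x: 1.0 - np.abs(x),
    "Daniell":               lambda x: np.sinc(x),          # sin(pi x)/(pi x)
    "Blackman-Tukey(a=.25)": lambda x: 0.5 + 0.5 * np.cos(np.pi * x),
    "Parzen":                lambda x: np.where(np.abs(x) < 0.5,
                                                1 - 6 * x**2 + 6 * np.abs(x)**3,
                                                2 * (1 - np.abs(x))**3),
}

x = np.linspace(-1.0, 1.0, 200001)
for name, w in windows.items():
    integral = 2.0 * np.mean(w(x) ** 2)     # int_{-1}^{1} w^2(x) dx, cf. (10.4.11)
    print(f"{name:22s} integral of w^2 = {integral:.3f}")
```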
Thus to compare the Bartlett and Daniell estimators we plot the spectral windows

W_B(\omega) = (2\pi r)^{-1} \sin^2(r\omega/2)/\sin^2(\omega/2)

and

W'_D(\omega) = r'/(2\pi), \quad |\omega| \le \pi/r',    (10.4.13)

where r' = 2r/3. Inspection of the graphs (Problem 10.18) reveals that the mass of the window W_B is spread over a broader frequency interval and has secondary peaks or "side-lobes" at some distance from the centre. This means that the Bartlett estimator with the same asymptotic variance as the Daniell estimator is liable (depending on the spectral density being estimated) to exhibit greater bias. For other factors affecting the choice of an appropriate lag window, see Priestley (1981).

The width of the rectangular spectral window which leads to the same asymptotic variance as a given lag-window estimator is sometimes called the bandwidth of the given estimator. For example, the Bartlett estimator with parameter r has bandwidth 2π/r' = 3π/r.

§10.5 Confidence Intervals for the Spectrum

In this section we provide two approximations to the distribution of the discrete spectral average estimator \hat f(\omega), from which confidence intervals for the spectral density f(ω) can be constructed. Assume that {X_t} satisfies the conditions of Theorem 10.4.1 (i.e. X_t = \sum_{j=-\infty}^{\infty} \psi_j Z_{t-j}, \sum_j |\psi_j|\,|j|^{1/2} < \infty, {Z_t} ~ IID(0, σ²) and E Z_t^4 < \infty) and that \hat f is the discrete spectral average

\hat f(\omega_j) = (2\pi)^{-1} \sum_{|k| \le m} W_n(k)\, I_n(\omega_j + \omega_k).    (10.5.1)

The χ² Approximation. By Theorem 10.3.2, the random variables I_n(\omega_j + \omega_k)/(\pi f(\omega_j + \omega_k)), -j < k < n/2 - j, are approximately independent and distributed as chi-squared with 2 degrees of freedom. This suggests approximating the distribution of \hat f(\omega_j) by the distribution of the corresponding linear combination of independent and identically distributed χ²(2) random variables. However, as advocated by Tukey (1949), this distribution may in turn be approximated by the distribution of cY, where c is a constant, Y ~ χ²(ν), and c and ν are found by the method of moments, i.e. by setting the mean and variance of cY equal to the asymptotic mean and variance of \hat f(\omega_j). This procedure gives the equations

c\nu = f(\omega_j), \qquad 2c^2\nu = \sum_{|k| \le m} W_n^2(k)\, f^2(\omega_j),

from which we find that c = \sum_{|k| \le m} W_n^2(k)\, f(\omega_j)/2 and \nu = 2/\big(\sum_{|k| \le m} W_n^2(k)\big). The number ν is called the equivalent degrees of freedom of the estimator \hat f. The distribution of \nu \hat f(\omega_j)/f(\omega_j) is thus approximated by the chi-squared distribution with ν degrees of freedom, and the interval

\big( \nu \hat f(\omega_j)/\chi^2_{.975}(\nu),\ \nu \hat f(\omega_j)/\chi^2_{.025}(\nu) \big), \quad 0 < \omega_j < \pi,    (10.5.2)

is an approximate 95% confidence interval for f(ω_j). By taking logarithms in (10.5.2) we obtain the 95% confidence interval

\big( \ln \hat f(\omega_j) + \ln\nu - \ln\chi^2_{.975}(\nu),\ \ln \hat f(\omega_j) + \ln\nu - \ln\chi^2_{.025}(\nu) \big)    (10.5.3)

for \ln f(\omega_j). This interval, unlike (10.5.2), has the same width for each ω_j ∈ (0, π).

In Figure 10.9 we have plotted the confidence intervals (10.5.3) for the data of Example 10.4.2, using the spectral estimate displayed in Figure 10.6.

[Figure 10.9. 95% confidence intervals for ln(2πf(2πc)) based on the spectral estimates of Figure 10.6 and a χ² approximation. The true function is also shown.]

Using the weights specified in Example 10.4.2 we find that \sum_{|k| \le m} W_n^2(k) = .07052 and ν = 28.36, so that (10.5.3) reduces to the interval

C_{\omega_j} = \big( \ln \hat f(\omega_j) - .450,\ \ln \hat f(\omega_j) + .617 \big).    (10.5.4)
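The quantities in (10.5.2)-(10.5.4) can be checked numerically for the weight function of Example 10.4.2. The sketch below is our own Python code (scipy is used for the χ² quantiles); it recovers Σ W_n²(k) = .07052 and ν = 28.36, and shows that the offsets -.450 and +.617 of (10.5.4) correspond to χ² quantiles at 28 degrees of freedom (as in standard tables), while exact quantiles at ν = 28.36 give slightly different values.

```python
import numpy as np
from scipy.stats import chi2

# Weight function of Example 10.4.2 (Figure 10.6): the convolution of the
# three simple filters of lengths 3, 7 and 11.
W = np.convolve(np.convolve(np.ones(3) / 3, np.ones(7) / 7), np.ones(11) / 11)

sum_W2 = np.sum(W ** 2)            # 0.07052, as quoted above
nu = 2.0 / sum_W2                  # equivalent degrees of freedom, 28.36
print(round(sum_W2, 5), round(nu, 2))

# offsets ln(nu) - ln(chi2 quantile) of the interval (10.5.3) for ln f(w_j):
print(np.log(nu) - np.log(chi2.ppf([0.975, 0.025], nu)))   # about [-0.460, 0.599]
print(np.log(nu) - np.log(chi2.ppf([0.975, 0.025], 28)))   # about [-0.450, 0.617], cf. (10.5.4)
```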
Notice that this is a confidence interval for ln f(wJ only, and the intervals { Cwj ' 0 < wi < n} are not to be interpreted as simultaneous 95% confidence intervals for {ln f(wJ, O < wi < n}. The probability that C"'j contains ln f(wi) for all wi E (0, n) is less than .95. However we would expect the intervals C"'j to include ln f(wj ) for approximately 95% of the frequencies wi E (O, n). As can be seen in Figure 10.9, the true log spectral density lies well within the confidence interval (10.5.4) for all frequencies. The Normal Approximation. There are two intuitive justifications for making a normal approximation to the distribution of j(w). The first is that if the equivalent number of degrees of freedom v is large (i.e. ifLik l :o; m W,2 (k) is small) and if Y is distributed as x 2 (v), then the distribution of c Y can be well approximated by a normal distribution with mean cv = f(wi) and variance 2c 2 v = Ll k l :o; m W,.2 (k)f 2 (w), 0 < wi < n . The second is that w e may approxi­ mate ](wi) for n large by a sum of (2m + 1) independent random variables, which by the Lindeberg condition, is AN(f(wi), Likl :o;m W,.2 (k)j l (wi)). Both points of view lead to the approximation N(f(w), Llk l :o; m W,2 (k)j 2 (wJ) for the 364 I 0. Inference for the Spectrum of a Stationary Process 0 0. 1 0.2 0.4 0.3 0.5 Figure 1 0. 1 0. 95% confidence intervals for ln(2nf(2nc)) based o n the spectral estimates of Figure 1 0.6 and a normal approximation. The true function is also shown. distribution of j(wJ Using this approximation we obtain the approximate 95% confidence bounds, 1 12 f(wi ) ± 1 .96 L W,,l (k ) f(wi ), lkl oS m for f(wi ). Since the width of the confidence interval depends on j(wi ), it is customary to construct a confidence interval for ln f(wi ). The normal approximation to j(wi ) implies that ln j(wJ is AN(lnf(wJ, Ll kl oS m W,2 (k)) by Proposition 6.4. 1 . Approximate 95% confidence bounds for ln f(wJ are therefore given by 112 ln j(wJ ± 1 .96 L W,2 ( k) . (10.5.5) lkl oS m For the spectral estimate shown in Figure 1 0.6, we have L lkl oS m W,2 (k) = .07052, so that the bounds ( 1 0.5.5) become ln j(wJ ± .520. (10.5.6) ) ( ( ) These bounds are plotted in Figure 10. 1 0. The width of the intervals ( 10.5.4) based on the x 2 approximation is very close to the width of the intervals (10.5.6) based on the normal approximation. However the normal intervals are centered at ln j(wi ) and are therefore located below the x 2 intervals. This 365 § 1 0.6. Rational Spectral Density Estimators can be seen in Figure 10. 1 0 where the spectral density barely touches the upper limit of the confidence interval. For values of v 2: 20, there is very little difference between the two approximations. § 10.6 Autoregressive, Maximum Entropy, Moving Average and M aximum Likelihood ARMA Spectral Estimators The m1h order autoregressive estimator fm(w) of the spectral density of a stationary time series {X, } is the spectral density of the autoregressive process { Y, } defined by Y, - �m 1 Yr- 1 - ··· - �mm Yr-m = Z,, ( 1 0.6. 1 ) where �m = (�m 1 , . . . , �mmY and vm are the Yule-Walker estimators defined by (8.2.2) and (8.2.3). These estimators can easily be computed recursively using Proposition 8.2. 1 . Then yy(h) = y(h), h = 0, ± 1 , . . . , ± m, (see Section 8. 1 ) and ( 10.6.2) The choice of m for which the approximating AR(m) process "best" represents the data can be made by minimizing AICC(<f>m) as defined by (9.3.4). Alternatively the CAT statistic of Parzen ( 1 974) can be minimized. 
This quantity is defined for m = 1 , 2, . . . , by m CAT(m) = n - 1 L vj- 1 - v;;,l , j=1 and for m = 0 by CAT(O) = - 1 - n- 1 , where j = 1 , 2, . . . . We shall use AICC for choosing m. The m1h order autoregressive estimator fm(w) defined by ( 10.6.2) is the same as the maximum entropy estimator, i.e. the spectral density ]which maximizes the entropy, E = f, In g(A.) dA. over the class of all densities g which satisfy the constraints, 10. Inference for the Spectrum of a Stationary Process 366 r" e i Ah g()_) d). = y( h), (10.6.3) h = 0, ± 1 , . . . , ± m. To show this, let { W, } be any zero-mean stationary process with spectral density g satisfying (1 0.6.3), and let a;+ l = Psr;{ wJ, - '"' <j � t } W,+l · Then by Kolmogorov's formula (5.8.1 ), - 1 )2 E( W,+ 1 - W,+ Now for any sequence a 1 , . . • = { f" 1 2n exp 2n , a m E IR, } ln g(..1.) d..1. . -n where { r; } is the AR(m) process (10.6. 1 ), since { 1';} and { W, } both have autocovariances y(j), 0 :::; j :::; m. Setting aj = �mj• j = 1 , . . . , m, in the last expression and using Kolmogorov's formula for the process { r; }, we obtain the inequality " " ln /m (..1.) d..1. , ln g(..1.) d..1. :-:::;; 2n exp � 2n exp � 2n 2n { f -n } { f -n } as required. The idea of maximum entropy spectral estimation is due to Burg ( 1 967). Burg' s estimates �m l > . . . , �mm in (1 0.6. 1 ) are however a little different from the Yule-Walker estimates. The periodogram and the non-parametric window estimators discussed in Section 1 0.4 are usually less regular in appearance than autoregressive estimators. The non-parametric estimates are valuable for detecting strict periodicities in the data (Section 1 0.2) and for revealing features of the data which may be smoothed out by autoregressive estimation. The autoregressive procedure however has a much more clearly defined criterion for selecting m than the corresponding criteria to be considered in the selection of a spectral window. In estimating a spectral density it is wise to examine both types of density estimator. Parzen ( 1 978) has also suggested that the cumulative periodogram should be compared with the autoregressive estimate of the spectral distribution function as an aid to autoregressive model selection in the time domain. In the definition ( 1 0.6.2) it is natural to consider replacing the Yule-Walker estimates �m and Om by the corresponding maximum likelihood estimates, with m again chosen to minimize the AICC value. In fact there is no need to restrict attention to autoregressive models, although these are convenient since �m is asymptotically efficient for an AR(m) process and can be computed very rapidly using Proposition 8.2. 1 . However there are processes, e.g. a first order § 1 0.6. Rational Spectral Density Estimators 367 moving average with (} 1 � 1, for which autoregressive spectral estimation performs poorly (see Example 1 0.6.2). To deal with cases of this kind we can use the estimate suggested by Akaike ( 1 974), i.e. 6 2 1 1 + {) 1 e -iw + • + eq e -iqw l 2 ( 10.6.4) ' f(w) = 2 n 1 1 - ¢> 1 e -•w - . . . - rPAv e - 'P"' I 2 • • A A • • where � = (J1 , . . . , Jv y, 9 = ( {)1 , , {)qy and 6 2 are maximum likelihood esti­ mates of an ARMA(p, q) process fitted to the data, with p and q chosen using the A ICC. We shall refer to the function j as the maximum likelihood ARMA (or MLARMA) spectral density estimate. 
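Returning to the autoregressive (maximum entropy) estimator (10.6.2), the following compact sketch is our own Python code: the Yule-Walker estimates \hat\phi_m and \hat v_m are obtained by the Durbin-Levinson recursion of Proposition 8.2.1 and substituted into (10.6.2). Order selection by AICC, as discussed above, is not shown.

```python
import numpy as np

def yule_walker(x, m):
    """Yule-Walker estimates (phi_m1,...,phi_mm, v_m) via the Durbin-Levinson
    recursion, computed from the sample autocovariances of the mean-corrected data."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    gamma = np.array([x[:n - h] @ x[h:] / n for h in range(m + 1)])
    phi = np.zeros(m)
    v = gamma[0]
    for k in range(1, m + 1):
        a = (gamma[k] - phi[:k - 1] @ gamma[1:k][::-1]) / v   # partial autocorrelation
        phi[:k] = np.r_[phi[:k - 1] - a * phi[:k - 1][::-1], a]
        v *= 1.0 - a ** 2
    return phi, v

def ar_spectral_density(phi, v, freqs):
    """AR(m) spectral density (10.6.2): v_m / (2*pi*|1 - sum_j phi_j e^{-ij w}|^2)."""
    e = np.exp(-1j * np.outer(freqs, np.arange(1, len(phi) + 1)))
    return v / (2 * np.pi * np.abs(1 - e @ phi) ** 2)

# usage: phi, v = yule_walker(x, m); f = ar_spectral_density(phi, v, np.linspace(0, np.pi, 500))
```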
A simpler but Jess efficient estimator than ( 10.6.4) which is particularly useful for processes whose MA( oo) representation has rapidly decaying coefficients is the moving average estimator (Brockwell and Davis 1 988(a)) given by (jm (10.6.5) gA m (w) = 1 1 + {)m l e -iw + • • • + {)mm e -imw l 2 , . • • 2n where om = ( {)m I ' . . . ' {)mm Y and (jm are the innovation estimates discussed in Section 8.3. Like the autoregressive estimator ( 10.6.2), gm(w) can be calculated very rapidly. The choice of m can again be made by minimizing the AICC value. As is the case for the autoregressive estimator, there are processes for which the moving average estimator performs poorly (e.g. an AR(l ) process with r/J 1 � 1 ). The advantage of both estimators over the MLARMA estimator (1 0.6.4) is the substantial reduction in computation time. Moreover, under specified conditions on the growth of m with n, the asymptotic distributions of the m'h order autoregressive and moving average spectral density estimators can be determined for a large class of linear processes (see Berk ( 1974) and Brockwell and Davis ( 1 988(a)). ExAMPLE 10.6. 1 (The Wolfer Sunspot Numbers). For the Wolfer sunspot numbers of Example 1 0.4.3, the minimum AICC model for the mean­ corrected data was found to be, X, - 1 .475X, _ 1 + .937X, _ 2 - .21 8X, _ 3 + . 1 34X, _ 9 = Z, , ( 10.6.6) with { Z,} � WN(O, 1 97.06) and AICC = 826.25. The rescaled periodogram (2n)- 1 I 1 00(2ncj), ci = 1/100, 2/100, . . . , 50/100, and the MLARMA estimator, /(2nc), 0 :::; c :::; .50, i.e. the spectral density of the process ( 1 0.6.6), are shown in Figure 1 0. 1 1 . Figure 10. 1 2 shows the autoregressive estimators j3 (2nc) (with AICC = 836.97) and /8(2nc) (with AICC = 839.54). The estimator ]3(2nc) has the smallest AICC value. The estimator /8(2nc) which corresponds to the second smallest local minimum and fourth smallest overall AICC value, has a more sharply defined peak (like the periodogram) at frequency w 1 0 • 1 0. I nference for the Spectrum of a Stationary Process 368 3.0 2.8 2.6 2.4 2.2 rn , " 0 rn :J 0 .r: � t:, 2.0 1 .8 1 .6 1.4 1.2 1 .0 0.8 0 .6 0.4 0.2 0.0 0 0.1 0.2 0.1 0.2 (a) 0.3 0.4 0 .5 0.3 0.4 0.5 3.0 2.8 2.6 2.4 2.2 rn , " 0 Ill :J 0 .r: � t:, 2.0 1 .8 1 .6 1.4 1 .2 1.0 0.8 0.6 0.4 0.2 0.0 '-.../ 0 (b) Figure 1 0. 1 1 . (a) The rescaled periodogram (2nr 1 /100(2nc), 0 < c � 0.5, and (b) the maximum likelihood ARMA estimate j(2nc), for the Wolfer sunspot numbers, Example I 0.6. 1 . 369 § 1 0.6. Rational Spectral Density Estimators 3.0 2.8 2.6 2.4 2.2 2.0 ., , c: 0 ., ::J 0 .s: � t;, 1.8 1.6 1.4 1.2 1.0 0.8 0.6 0.4 0.2 0.0 0 0. 1 0.3 0.2 0.4 0.5 0.4 0.5 (a) 3.0 2.8 2.6 2.4 2.2 2.0 " , c: 0 " ::J 0 .s: � t;, 1.8 1.6 1.4 1.2 1.0 0.8 0.6 0.4 0.2 0.0 0 0.1 0.3 0.2 (b) Figure 1 0. 1 2. The autoregressive spectral density estimates (a) ]3(2nc) and (b) ]8(2nc) for the Wolfer sunspot numbers. 10. Inference for the Spectrum of a Stationary Process 370 3.0 2.8 2 .6 2.4 2.2 2.0 � ., 1.8 "tl c 0 1.6 t:, , .2 " :J 0 .r:. , .4 1.0 0.8 0.6 0.4 0.2 0.0 0 0.1 0.2 0.3 0.4 0.5 Figure 1 0. 1 3. The moving average spectral density estimate g 1 3(2n:c) for the Wolfer sunspot numbers. Observe that there is a close resemblance between /3(2nc) and the non­ parametric estimate of Figure 10.8. The moving average estimator with smallest AICC value (848.99) is g 1 3(2nc) shown in Figure 10.1 3. EXAMPLE 10.6.2 (MA( l )). 
A series of 400 observations was generated using the model (10.6.7) The spectral density of the process, f(2nc) = 1 1 + e - i 2 7tc l 2/(2n), 0 :::::: c :::::: 0.50, and the rescaled periodogram (2n) - 1 l400(2nci), ci = 1/400, 2/400, . . . , 200/400, of the data are shown in Figure 10.14. The data were mean-corrected. Figure 1 0. 1 5 shows the autoregressive estimator /9(2nc) with (AICC = 1 162.66) and the moving average estimator g6(2nc) (with AICC = 1 1 52.06). Maximum likelihood estimation gives the minimum AICC ARMA model, { Z, } � WN(O, .980), (10.6.8) with AICC = 1 1 37.72. The MLARMA estimator of the spectral density is therefore j(2nc) = .980 f(2nc), showing that j and f are almost indistin­ guishable in this example. § 1 0.6. Rational Spectral Density Estimators 0 0.1 0.2 37 1 0.3 0.4 0.5 (a) 2.4 2.2 .-------� 2 1 .8 1 .6 1 .4 1 .2 0.8 0.6 0 0. 1 0.2 (b) 0.3 0.4 0.5 Figure 1 0. 1 4. (a) The spectral density f(2nc) and (b) the rescaled periodogram of the realization {X 1 , . . . , X400 } of the process X, = Z, + Z, _ 1 , {Z,} WN(O, 1 ), of Example 1 0.6.2. � 10. Inference for the Spectrum of a Stationary Process 372 2.4 2.2 2 1 .8 1 .6 1 .4 1 .2 0.8 0.6 0.4 0.2 0 0 0.1 0.2 0 0. 1 0.2 (a) 0.3 0.4 0.5 0.3 0.4 0.5 2.4 2.2 2 1 .8 1 .6 1 .4 1 .2 0.8 0.6 0.4 0.2 0 (b) Figure 1 0. 1 5. (a) The autoregressive spectral estimate ]9(2nc) and (b) the moving average estimate g6(2nc) for the data of Example 10.6.2. § 1 0.7. The Fast Fourier Transform (FFT) Algorithm 373 § 1 0.7 The Fast Fourier Transform (FFT) Algorithm A major factor in the rapid development of spectral analysis in the past twenty years has been the availability of a very fast technique for computing the discrete Fourier transform (and hence the periodogram) of long series. The algorithm which makes this possible, the FFT algorithm, was developed by Cooley and Tukey ( 1 965) and Gentleman and Sande (1966), although some of the underlying ideas can be traced back to the beginning of this century (Cooley et al., 1967). We first illustrate the use of the algorithm by examining the computational savings achieved when the number of observations n can be factorized as (10.7.1) n = rs. (The computational speed is increased still more if either r or s can be factorized.) Instead of computing the transform as defined by (10. 1.7), i.e. n aj = n - 1!2 L xt e- i"'Jt, t=1 we shall compute the closely related transform, n -1 e- 2 Ttijtjn' (10.7.2) b1 = '\' O :::;; j :::;; n - 1 . L Xt +l . t =O Then, by straightforward algebra, 0 :::;; j :::;; [n/2], (10.7.3) - [(n - 1 )/2] ::;; j :::;; - 1. Under the assumption (10.7.1), each t E [0, n - 1 ] has a unique representation, t = ru + v, u E {O, . . . , s - 1 }, v E {O, . . . 1 }. Hence ( 10.7.2) can be rewritten as, r- 1 s-1 bj = L L Xru+v+ 1 exp [ - 2nij(ru + v)/n] v =O u=O s- 1 r -1 = L exp(- 2nijv/n) L Xru +v+ 1 exp ( - 2nijujs), v =O u=O , r - i.e. r- 1 =O where { bj, 0 :::;; j :::;; s - 1 } is the Fourier transform, bj = L exp( - 2nij v/n)bj, v • v ( 10.7.4) "' bj , v = s -1 L Xru+v+ 1 exp( - 2nijujs). u=O (10.7.5) 1 0. Inference for the Spectrum of a Stationary Process 374 If we now define an "operation" to consist of the three steps, the computation of a term exp(- 2niju/k), a complex multiplication and a complex addition, then for each v the calculation of { bi. " ' 0 ::;, j ::;, s - 1 } requires a total of Ns = sz operations. Since l = 0, 1 , 2, . . . , it suffices to calculate { bi. "' 0 ::;, j ::;, s - 1 } in order to determine { bi. "' 0 :::; j ::;, n - 1 } . 
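The computational content of (10.7.4)-(10.7.5) is easy to verify numerically. The following sketch is our own Python code (numpy's FFT is used only as an independent check): {b_j} is computed by the two-stage scheme for n = rs and compared with the directly computed transform.

```python
import numpy as np

def dft_two_stage(x, r, s):
    """b_j = sum_{t=0}^{n-1} x_{t+1} exp(-2*pi*i*j*t/n), n = r*s, computed via the
    decomposition t = r*u + v of (10.7.4)-(10.7.5)."""
    n = r * s
    assert len(x) == n
    # inner transforms (10.7.5): b_{j,v} = sum_u x_{ru+v+1} exp(-2*pi*i*j*u/s),
    # which depend on j only through j mod s
    b_inner = np.array([[np.sum(x[v::r] * np.exp(-2j * np.pi * jj * np.arange(s) / s))
                         for v in range(r)] for jj in range(s)])
    # outer combination (10.7.4): b_j = sum_v exp(-2*pi*i*j*v/n) * b_{j mod s, v}
    j_idx = np.arange(n)
    twiddle = np.exp(-2j * np.pi * np.outer(j_idx, np.arange(r)) / n)
    return np.sum(twiddle * b_inner[j_idx % s], axis=1)

rng = np.random.default_rng(1)
x = rng.normal(size=12)                      # n = 12 = r*s with r = 3, s = 4
b = dft_two_stage(x, r=3, s=4)
print(np.allclose(b, np.fft.fft(x)))         # True: agrees with the direct transform
```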
A total of rNs = rs2 operations is therefore required to determine {bi. v• O ::;, v ::;, r - 1 , 0 ::;, j ::;, n - 1 }. The calculation of {bi, O ::;, j ::;, n - 1 } from (10.7.4) then requires another nr operations, giving a total of rNs + nr = n(r + s) (10.7.6) operations altogether for the computation of { bi, 0 ::;, j ::;, n - 1 }. This repre­ sents a substantial savings over the n 2 operations required to compute {bi } directly from ( 10.7.2). If now s can be factorized as s = s 1 s 2 , then the number of operations Ns required to compute the Fourier transform (10.7.5) can be reduced by applying the technique of the preceding paragraph, replacing n by s, r by s 1 and s by s 2 • From (1 0.7.6) we see that Ns is then reduced from s 2 to (10.7.7) Replacing Ns in (10. 7.6) by its reduced value N;, we find that { bi, 0 ::;, j ::;, n - 1 } can now be calculated with rs(s 1 + s 2 ) + nr = n(r + s 1 + s 2 ) (10.7.8) operations. The argument of the preceding paragraph is easily generalized to show that if n has the prime factors p 1 , . . . , Pk • then the number of operations can be reduced to n(p 1 + p 2 + · · · + Pk ). In particular if n = 2P, then the number of operations can be reduced to 2n log2 n. The savings in computer time is particularly important for very large n. For the sake of improved computational speed, a small number of dummy observations (equal to the sample mean) is sometimes appended to the data in order to make n highly composite. A small number of observations may also be deleted for the same reason. If n is very large, the resulting periodogram will not be noticeably different from that of the original data, although the Fourier frequencies { wi , j E F. } will be slightly different. An excellent discussion of the FFT algorithm can be found in the book of Bloomfield ( 1976). We conclude this section by showing how the sample autocovariance function of a time series can be calculated by making two applications of the FFT algorithm. Let y(k), l k l < n, denote the sample autocovariance function of {X 1 , , X.}. We first augment the series to Y = ( Y1 , . . . , Y2 ._ 1 )', where t ::;, n, . . • §10.8.* Asymptotic Behavior of the Maximum Likelihood Estimators and Y; = 0, t> 375 n. The discrete Fourier transform of Y is then n ai = (2n - 1 ) - 1;2 2tL=1- 1 Y; e- i<AJ , Ai 2nj/(2n - 1 ), jE Fzn -1, ( 10.7.9) where F2n_ 1 {jEZ : - n Ai n}, and the periodogram of Y is (by Prop­ osition 1 0. 1.2 and the fact that I �:� 1 Y; = 0) = = < :S:: L n 2n and summing over 1 1kl<n y(k) e- ;uJ. j E F2 n - 1 , we get Multiplying each side by e im A1 1 2 (10.7. 1 0) y(m) = n - L lail e im AJ . The autocovariances y(k) can thus be computed by taking the two Fourier = �- j E F2n- l transforms (10.7.9) and (10.7. 1 0), and using the FFT algorithm in each case. For large n the number of operations required is substantially less than the number of multiplications and additions (of order n 2 ) required to compute < n, from the definition. The fast Fourier transform technique is particularly advantageous for long series, but significant savings can be achieved even for series of length one or two hundred. y(k), lkl § 10.8 * Derivation of the Asymptotic Behavior of the Maximum Likelihood and Least Squares Estimators of the Coefficients of an ARMA Process In order to derive the asymptotic properties of the maximum likelihood estimator, it will be convenient to introduce the concept of almost sure convergence. Definition 1 0.8.1 (Almost Sure Convergence). 
A sequence of random variables { Xn} is said to converge to the random variable X almost surely or with probability one if P(Xn converges to X) l . It is implicit that X, X 1 , X 2 , . . . are all defined on the same probability space. Almost sure convergence of { Xn } to X will be written as Xn � X a.s. Remark 1. If Xn � X a.s. then Xn !... X. To see this, note that for any s > 0, = 10. Inference for the Spectrum of a Stationary Process 376 1 = P(X. converges to X) (Q JJ. !�� Co sP = P . { I Xk - X I s �:} { I Xk - X I s e } s lim inf P( I X. - X I s �:). ) ) The converse is not true, although if X" !... X there exists a subsequence {X.J of {X.} such that X.1 -+ X a.s. (see Billingsley ( 1986)). Remark 2. For the two-sided moving average process with Lj l l/lj l < oo, it can be shown that for h E { O, ± 1, . . . }, y(h) = n - 1 L (XI - x. ) (XI+Ihl - x.) -+ y(h) a.s. n-lh l and 1= 1 y(h) : = n - 1 L X1Xr+ l h l -+ y(h) a.s. 1= 1 n - lh l ( 1 0.8. 1 ) The proofs of these results are similar t o the corresponding proofs of conver­ gence in probability given in Section 7.3 with the strong law of large numbers replacing the weak law. Strong Consistency of the Estimators Let { X1 } be the causal invertible ARMA(p, q) process, xl - c/J 1 xl- 1 - . . . - c/Jp XI - p = zl + 81 zl - 1 + . . . + 8q zl - q • { Z1 } - IID(O, a2 ), ( 1 0.8. 2) where c/J(z) and 8(z) have no common zeroes. Let � = (c/J 1 , . . . , ¢JP , 81 , . . . , 8q )' and denote by C the parameter set, C = {� E IW + q : cp(z)(}(z) # 0 for l z l S 1, cp p # 0, 8q =F 0, and ¢( · ), 8( · ) have no common zeroes}. Remark 3. Notice that � can be expressed as a continuous function � (a 1 , , aP , b 1 , . . . , bq ) of the zeroes a l > . . . , aP of c/J( · ) and b1 , . . . , bq of 8(- ). The parameter set C is therefore the image under � of the set { (a 1 , . . . , aP , b 1 , . . . , bq ) : I a ! > 1 , l bj l > 1 and a; =F bj , i = 1, . . . , p, j = 1 , . . . , q}. • • . ; The spectral density f(2; �) of { X1 } can be written in the form, az f(2; �) = 2 g( 2; �), n § 1 0.8.* Asymptotic Behavior of the Maximum Likelihood Estimators 377 where ( 1 0.8.3) Proposition 1 0.8.1. Let P o be a fixed vector in C. Then (2n)� 1 for all p E C such that P I= I" g ( A. ; P o) dA. (Jc ; p) g n � Po ( C denotes > 1 the closure of the set C). PROOF. If { X, } is an ARMA(p, q) process with coefficient vector P o and white noise variance 0"6 , then we can write where t/J0(B) and 80(B) are the autoregressive and moving average polynomials with coefficients determined by P o · Now suppose that p = (cp', 9')' E C, and p I= P o · I f l cf>(z)/8(z) l is unbounded on i z l .$; 1 then (2n) � 1 I�" [g(Jc; P o)/g(Jc; P)J dJc = oo and the result follows. So suppose l cf>(z)/8(z) i is bounded on i z l .$; 1 and consider the one-step predictor L� 1 niXr � i of X, where n(z) = 1 + L� 1 nizi = cf>(z)8� 1 (z). Since P I= P0 , the mean squared error of this predictor is greater than that of the best linear one-step predictor, and hence - ( j� Y ( 0"6 < E X, + nj Xt j � = E e� 1 (B)t/J(B)X, y . But the spectral density of e� 1 (B)t/J(B)X, is (0"6/2n) [ g (A.; P o)/g (A. ; p)] and hence 0"6 < Var (B� 1 (B)t/J(B)X, ) which establishes the proposition. = 0"5 2n I" g ( ),; Po) Jc d ( ) , � n g A.; P 0 The Gaussian likelihood of the vector of observations X. = (X1 , . . . , X. )' is given by { : L (p, 0"2 ) = (2n0" 2 )�nJ2 1 G. (PWif2 exp - } x� c.�I (p) x. , 2 2 where G.(P) = 0" � 2 r.( P) and r. ( P) is the covariance matrix of X •. 
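As a concrete illustration of the Gaussian likelihood above, here is a sketch in our own Python code (it is not the recursive innovations or state-space evaluation that is used in practice): for a causal AR(1), Γ_n(β) has entries σ²φ^{|i-j|}/(1-φ²), so G_n(β) can be formed explicitly and -2 ln L evaluated directly. The brute-force evaluation is O(n³) and is intended only to make the formula concrete.

```python
import numpy as np

def neg2_loglik_ar1(x, phi, sigma2):
    """-2 ln L(beta, sigma^2) for a causal AR(1), using the exact Gaussian likelihood
    L = (2*pi*sigma^2)^(-n/2) |G_n|^(-1/2) exp(-X' G_n^{-1} X / (2*sigma^2)),
    where G_n = sigma^{-2} Gamma_n has entries phi^{|i-j|} / (1 - phi^2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    i = np.arange(n)
    G = phi ** np.abs(i[:, None] - i[None, :]) / (1.0 - phi ** 2)
    _, logdetG = np.linalg.slogdet(G)           # sign is positive for a covariance matrix
    quad = x @ np.linalg.solve(G, x)
    return n * np.log(2 * np.pi * sigma2) + logdetG + quad / sigma2

# simulate an AR(1) and compare the objective at the true and at a wrong parameter
rng = np.random.default_rng(2)
n, phi_true = 200, 0.7
z = rng.normal(size=n + 100)
x = np.zeros(n + 100)
for t in range(1, n + 100):
    x[t] = phi_true * x[t - 1] + z[t]
x = x[100:]                                     # drop burn-in
print(neg2_loglik_ar1(x, 0.7, 1.0) < neg2_loglik_ar1(x, 0.0, 1.0))   # typically True
```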
From Section 8.7, the maximum likelihood estimator p is the value of p in C which minimizes 1 (1 0.8.4) l ( P) = ln (X � G.� ( P) X./n) + n � 1 ln det ( G. (p)). The least squares estimator p is found by minimizing a:;(p) = 1 n � X� G.� 1 (P)X. with respect to p E C . A third estimator � , is found by minimizing (1 0.8.5) 1 0. Inference for the Spectrum of a Stationary Process 378 a; ( {3 ) n - 1 I I.(wi)/g(wi ; p) (10.8.6) j with respect to p E C, where I.( · ) is the periodogram of {X , . . . , X.} and the 1 sum is taken over all frequencies wi = 2nj/n E ( - n, n]. We shall show that the three estimators, �' p and p have the same limit distribution. The argument follows Hannan (1 973). See also Whittle (1 962), Walker (1 964) and Dunsmuir and Hannan ( 1 976). In the following propositions, assume that {X, } is the ARMA process defined by (10.8.2) with parameter values P o E C and > 0. Proposition = O"g O"g I " 10.8.2. For every p E C, g(A ; P o ) d). a.s. (10.8.7) 2n g(A; p) 0, defining g0(A; p) = ( l e(e - i,\W + b)/ l ¢(e - i,\) l 2 , a; ( p) -+ Moreover for every b > n - 1 L ln(w) i g0(wi; p) uniformly in P E C almost surely. _" -+ O"� I " 2n _ " g(A; Po ) d). g0(A ; p) (10.8.8) PROOF. We shall only prove (10.8.8) since the proof of (10.8.7) is similar. Let qm (A; p) be the Cesaro mean of the first m Fourier approximations to g0(A; P) - 1 , given by m k qm (A ; P) = m - 1 L L bk e - i ,\ j�O lkl :$j -1 = -1 1 where bk (2n) f':.. " eik A(g0(A; p)) - 1 dA. By the non-negative definiteness of { bd, qm(A; p) � 0. As a function of (A, p), (go().; p)) - is uniformly continuous on the set [ - n, n] x C. It therefore follows easily from the proof of Theorem 2. 1 1 . 1 that qm (A; p) converges uniformly to (g0(A; p)f1 on [ - n, n] x C and in particular that for any c: > 0, there exists an m such that l qm(A ; P) - (g0(A; p)) - 1 1 < c for all (A, p) E [ - n, n] x C. We can therefore write, for all p E C, 1 n _1 'J" ( = ) _1"'J ln(w) - n ga(wi ; p) n- 1 I� . I I.(w)qm(wi , p) I.(wi)((ga(wi; p)) - 1 - qm(wi; P)) :::::; w - 1 "L I.(wi ) j = c:y(O) where the last equality follows from (10. 1 .9). I (10.8.9) § 1 0.8.* Asymptotic Behavior of the Maximum Likelihood Estimators Now for n > 379 m, n - 1 L ln(wj) qm (wj ; P) j ( - wi = L L y(h) ( 1 - �) bk n 1 L e - i (h - k)) nk m j l hl< l l< m = L ii(k) ( 1 - lk l< m -lmkl ) bk m k= - lk l 1 J(n k) ( ) bk . ( 10.8. 10) m 1 +2L For k and t fixed, the process { X,+ n - k ' n = 1 , 2, . . . } is strictly stationary and ergodic (see Hannan, 1970) and by a direct application of the ergodic theorem, n - 1 Xr+ n - k -> 0 a.s. From this it follows that y(n - k) = n - 1 L �= 1 x,x,+n - k -> 0 a.s. for each fixed k . The second term in ( 10.8. 1 0) therefore converges to zero a.s. as n -> oo . By Remark 2, the first term converges to L lk l < m y(k)( 1 - l k lfm)bk and since bk is uniformly bounded in p and k, we have b n - 1 Lj ln(w)qm (wj; P) -> kL y(k) ( 1 - �) m k l l< m uniformly in p E C a.s. Moreover ( 1 0.8. 1 1 ) � r"lqm(Jc; P) - (go(Jc ; p)) - 1 l f(2; Po) d2 � 6')'(0) . -> Since y(O) y(O) a.s. we conclude from ( 1 0.8.9), ( 1 0.8.1 1 ) and the above inequality that uniformly in p E C a.s. D Proposition 10.8.3. There exists an event B with probability one such that for any sequence { Pn }, Pn E C with Pn -> p, we have the following two possibilities: (a) If P E C, then for all outcomes in B, g(Jc ; Po ) d) . a;(p" ) ( 10.8. 12) 2n g(Jc; p) _. a5 f" _" . 10. 
Inference for the Spectrum of a Stationary Process 380 (b) If � E in B, ac (where ac is the boundary of the set C), then for all outcomes I.Ill Inf O"-n2 (Rl'n ) . n � oo > O"� _ 2n I " g(A.;g(),;�o ) d ' �) - rr A. (1 0.8. 1 3) PROOF. (a) Since � E C, inf;. g(A.; �) > 0 and sup;_ g(A.; �) < oo . Consequently for each r. E (0, inf;. g(A.; �)) there exists an N such that sup l g(A.; �") - g(A. ; �)I < r./2 for n :::::: N. (10.8. 14) ;_ By Corollary 4.4.2, we can find a polynomial, a(z) = 1 + a 1 z + · · · + am z m, and a positive constant K m such that a(z) i= 0 for l z l :o::; 1 , and {: sup l g(), ; �) - Am(A.) I < s/2, i f Am(A) > inf g(A. ; �)/2 > 0, ;_ ;_ where (10.8. 1 5) Note that Km --+ 1 as r. --+ 0. Let H" be the covariance matrix corresponding to the spectral density (2n) - 1 Am(A.). Then if y E IR" and y' y = 1 , ly ' G" ( �") y = y' ( 2nr' :o::; (2n) - 1 :o::; s(2n) - ' = H" y l 1 fJ itl yieii;f (g(A.; �") - Am(A.)) dA. I fJ j� yjeij-\ 1 2 (l g(A.; g(A.; lg(A.; "I I t yie ii-\ 1 2 dA. - s for n �") - - rr ::;:.: N. J- l �) I + �) - Am()�) I ) dA. (10.8. 1 6) Now if { Y, } is an ARMA process with spectral density (2nr 1 g(A. ; �") (and white noise variance 1 ) then by Proposition 4.5.3 the eigenvalues of G"(�") are bounded below by inf;. g(A. ; �" ) ::;:.: K > 0 for some constant K and all n suffi­ ciently large. The same argument also shows that the eigenvalues of H" are bounded below by inf;_ Am()�) > 0. Thus the eigenvalues of G,;- 1 ( �" ) and H,;- 1 are all less than a constant C 1 (independent of s) so that l n - ' X� (G,;-' (�") - H;' )X"I = l n -1 X� H"- 1 (H" - G" (�")) G; ' (�")X"I (10.8. 1 7) :o::; r.Cf y(O) for n ::;:.: N. We next consider the asymptotic behavior of n- 1 X�H;' X" . Let { Y, } be the AR(m) process with spectral density (2n) - 1 K ml a(e - ; ;.W 2 §1 0.8. * Asymptotic Behavior of the Maximum Likelihood Estimators 381 (and white noise variance K m ). Then by the Gram-Schmidt orthogonalization process, we can choose (5ik ' k = 1 , . . . , j, j = 1 , . . . , m, so that the random variables wl = 15 1 1 Yl , Wz = (52 1 Yt + l5zz Yz , are white noise with mean zero and variance K m . Then where W" matrix = ( W1 , . . • , W,.)', Y" = ( Y1 , . . . , Y" )' and T is the lower trangular (5mm at az at m m T = (5 l (5 2 am am - ! am (10.8. 1 8) az at am It follows that TH" T' = K m l, where I is the n x n identity matrix, and hence that H;; 1 = (T' T)/(K m ). Except for the m 2 components in the upper left corner, and the m 2 components in the bottom right corner, the matrix H"- t = [h uJ7.i=t is the same as the matrix R;; 1 = [ hu ] 7. i =t with m - -j\ K ;;/ a,ar+ \ i-i\ if l i - ji :::;; m, r- 0 ( 1 0.8. 1 9) hu 0 otherwise, _ _ { t where a0 := 1 . It then follows that m = n - 1 L: (h;i - h;JX;Xi + n - 1 i,j=l n L i, =n j -m+ l (h u - h;J X;Xi ( 10.8.20) --> 0 a.s. , 1 since n - X; --> 0 a.s. and n - 1 Xn - i --> 0 a.s. by an application of the ergodic theorem. It is easy to show that for n > m I 0. Inference for the Spectrum of a Stationary Process 382 and l (w ) ( n - 1 L In j - n - 1 � In �) 1 g (wj , p) 1 A m (wj ) I :S: Ci n - 1 � In (wj ) I A m (wJ - g (wj; P) I 1 ::::; C��>y(O), ( 10.8.22) where C2 1 = (inf,c g (A- ; P)/2) > o. Combining equations (10.8. 1 7), ( 10.8.20), (10.8.21 ) and (1 0.8.22), we have ( ) n - 1 X � G; 1 ( Pn)Xn - n - I L In � j g(wj , p) m 1 :S: (C i + C�)�>y(O) + 2 C 2 L i y( n - k ) l + l n - X � H; 1 X n - n - 1 X� H; 1X n l k= l for all n ?: N. 
Now let { Pd be a dense set in C and let Sk denote the probability one event where I I ( ) n - 1 L In �j ___. 0"6 j g (Wj , Pk ) 2n f" ( ) n - 1 L In �j ....... 0"6 j g (wj , p) 2n f" g (A-: Po ) d . Ag (A, Pk ) The event S = nk'== 1 Sk also has probability one and a routine approximation argument shows that for each outcome in S n { y(O) ....... y(O) }, -n -n g (A- ; Po ) dA. g ( A. , P) for each P E C. If B1 denotes the event 00 B1 = n ( {n - 1 Xk ....... 0} n {n- 1 Xn - k ....... 0}) n { Y(O) ....... y(O)} k=! then P (B 1 ) = 1 and for each outcome in B1 , I�.��S�p I n -I xn' Gn-I (nJln) Xn f" 2n 1 n S g ( ), ; Po ) d 1 2 2 ::::; (C 1 + C2 ) £)1 (0)• g (A ; p) A Since Cf and Ci do not depend on £ and £ is arbitrary, the assertion (10.8.12) follows for each outcome in B1 . (b) Set Pn = (cjl�, 9� ) ' and p = w, 9' ) ' . Since Pn ....... p, choose 9t such that et (z) = 1 + etz + . . . + eJz q f= 0 for l z l ::::; 1 and sup \ 1 8t(e -;"W - I B (e - ;" ) 1 2 \ < £. (If 8(z) f= 0 for l z l ::::; - (J6 - n ;_ 1 , take 9t = 9. ) With P! = (cjl�, 9t' )', we have g (A-; Pn ) ::::; 1 8t (e - ;"W + £ for all sufficiently large n. l if>n (e -;"W §1 0.8.* Asymptotic Behavior of the Maximum Likelihood Estimators 383 By Corollary 4.4.2 there exists a polynomial b(z) = 1 + b 1 z + · · · + bk z k and a positive constant K such that b(z) =I= 0 for l z l � 1 and K < &t(e - ; 2 .') 1 + 2t: l l et(e -i '-) 1 2 + t: < b(e - l -· ... w for all A. Setting A (A; cp) = K l 1 + a 1 e - ;;. + · · · + ame - im).l - 2 l ¢>(e - i'-W 2 we have (10.8.23) = K l b(e - ;;.W 2 x g()o; Pn) � A (A; <I-n) for all A and sufficiently large n. Define the matrices T, Hn and H; 1 as in the proof of (a) with Am(),) replaced by A (A; .Pn ). Since the coefficients in the matrix T are bounded in n, we have from ( 1 0.8.20) and (1 0.8.21 ) n - 1 x'n H n- 1 xn I (w ) " n . J.h· --> 0 a.s. - n - 1 L..., j A (w , 'f' ) (10.8.24) j n 1 1 Since A - (.A.; cpn) = K l b(e ;;.W I ¢>(e ;;.W is uniformly bounded, we have, by the argument given in the proof of Proposition 1 0.8.2 (see (10.8. 1 1 )), that n 0" ln(wj ) _ 1 '\' L.... --> 6 j A (wj ; .Pn) 2n - f" g()o ; Po) d 1 A a.s. _, A (A; cp) (10.8.25) Also, since g(A ; Pn) � A (A; <I-n) for all large n, the matrix Hn - Gn(Pn) is non­ negative definite (see Problem 4.8) and thus the matrix G; 1 (Pn ) - H; 1 is also non-negative definite (see Rao ( 1973), p. 70). Thus, by ( 10.8.24), (10.8.25) and ( 10.8.23), Letting 9t --. 9, the lower bound becomes and finally, letting t: --> 0 and applying the monotone convergence theorem, we obtain the desired result, namely As in the proof of (a), we can also find a set B2 such that P(B2 ) = 1 and for each outcome in B2 , 384 1 0. Inference for the Spectrum of a Stationary Process for any sequence Pn --+ p with p E ac. Proposition D 1 0.8.4. If P E C, then ln(det Gn (P)) > 0 and n 1 in (det Gn (P)) --+ 0 - as n --+ oo . PROOF. Suppose { t; } is an ARMA process with spectral density (2n:) - 1 g(A.; p). Then det Gn (P) = r0 · · · r" _ 1 , where r, = E( l-";+ 1 - Yr 1 ) 2 (see (8.6.7)). Rewriting the difference equations for { t; } as 1'; + 2:: ;;, 1 ni Yr -i = Z,, we see from the definition of Yr+1 that + 1 = ( Var(Z, + d = E l-"; + 1 + )z I= 1 n:i l-";+ 1 -i J c=�1 l nil y Yr(O). Since r, 1 and r0 1 , det Gn (P) r0 rn - 1 1 , and hence ln(det Gn (P)) 0. Moreover, since (L� l n)) 2 y (0) -+ 0 as t + it follows that r1 -+ 1 , so ::;; 1 + 2 > r+ 1 by Cesaro convergence, = Y ··· > - > oo, n- 1 0 ::;; n - 1 ln(det Gn (P)) = n - 1 L In rr - 1 t=O D 10.8.1. 
Let �"' �" and Pn be the estimators in C which minimize l( p) = ln(X� G; 1 (P)Xn /n) + n - 1 ln(det Gn (P)), a:; (p) = n - 1 X � G; 1 (P)X n , and a;(p) = n - 1 Li (In (wj)/g(wi; p)), respectively, where {X, } is an ARMA process with true parameter values Po E C and rr5 > 0. Then Theorem (i) (ii) (iii) PROOF. Let B be the event given in the statement of Proposition 1 0.8.3. Then § 1 0.8.* Asymptotic Behavior of the Maximum Likelihood Estimators 385 there exists an event B* c B with probability one such that for each outcome in B*, (1 0.8.7) holds with p = Po and ( 1 0.8.8) is valid for all rational c5 > 0. We shall therefore prove convergence in (iHiii) for each outcome in B*. So for the remainder of the proof, consider a fixed outcome in B*. (i) Suppose Pn does not converge to Po· Then by compactness there exists a subsequence {�n. } such that �n. -> P where P E C and P #- Po · By Proposition 1 0.8.2, for any rational c5 > 0, '\' In,(w) f.... gi Pn j wj; ) g(k R 0) --_ 1'_ d).. giJ. ; p) IliDk- oom. f (J-2n,(lR'n,) ;:::-: IliDk- oomf nk . . (] 2 = _ll. 2n However by Proposition ( 1 0.8. 1 ), aJ 2n f" _, . I" _ , - 1 ' g(J.; Po) ' d A > a02 , g().; p) so by taking c5 sufficiently small we have lim inf rr;. (�n. ) > aJ . k-oo On the other hand, by definition of �n and ( 1 0.8.7), ( 10.8.26) lim sup rr;(�n ) :$; lim sup rr; (po) which contradicts (1 0.8.26). Thus we must have �n -> Po · It now follows quite easily from Proposition 1 0.8.2 that rr; (�n) ...... aJ. (ii) As in (i) suppose �n does not converge to Po · Then there exists a subsequence {PnJ such that Pn, ...... P #- Po with P E C. By Propositions 1 0.8.3 and 10.8. 1 But, by Proposition 10.8.3(a) and the definition of �n, lim sup a,;(�n) :$; lim sup a; (p0 ) = aJ which contradicts the above inequality. Therefore we conclude that �n ...... p 0, and hence, by Proposition 10.8.3(a), that a;(�n) -> aJ. (iii) Suppose Pn• -> P #- Po for some subsequence {Pnk }. Then by Propositions 10.8.3 and 10.8.4 and the definition of Pn , we obtain the contradiction A 386 1 0. Inference for the Spectrum of a Stationary Process ln(O"�) < lim infln(a,;. (�"• )) k � ro :s; lim inf l( �n ) :s; lim sup l(P0) . n-oo k--+-oo = ln(O"�). Thus �n --> Po and a,;(�n ) --> (}� by Proposition 1 0.8.3(a). D Asymptotic Normality of the Estimators 1 0.8.2. Under the assumptions of Theorem 1 0.8.1, (i) �" is AN(p0 , n - 1 w - 1 (P0 )), (ii) �" is AN(P0 , n- 1 w -1 (P0)), and (iii) �" is AN(p0 , n -1 w - t (P0)), where Theorem Before proving these results we show the equivalence of the asymptotic covariance matrix w -1 (P0) and the matrix V(P0) specified in (8.8.3). In order to evaluate the (j, k)-component of W(p) for any given p E C, i.e. _!_ I " W1k 4n _ _" a !n g(.l.; P) a !n g ( .l. ; p) 1 dA' of3j o f3k (10.8.27) we observe that In g(),; p) = In 8(e - iA ) + In 8(e iA ) - In <P (e - iA ) - In <P (e iA ), where <P (z) = 1 - <P t z - . . . - <Pp z P and 8(z) = 1 + e l z + . . . + eq z q. Hence a In g(A; P)/o<Pj = e - ijA r l (e - iA ) + e ijA r l (e iA ) and (J !n g(.l_ ; p)j(J8j = e- ijA e - l (e - iA ) + e ijA e - l (e iA ). Substituting in (10.8.27) and noting that for j, k :2: 1 , I�" j, I e iU + kl A <P- 2 (e iA ) d.l. = I� " e - iU+k)A <P - 2 (e - iA ) d.l. 
= 0, we find that for k :,::; p, " WJk = _!_ (e - i(j - k) A + e iU - kl A ) I <P (e -iA ) I - 2 dA = E [ Ur -1.+ 1 Ur - k +l ] , 4 n _" 387 § 1 0.8.* Asymptotic Behavior of the Maximum Likelihood Estimators where { ut } is the autoregressive process defined by, = _!_ I-nn { N1 } The same argument shows that for j, k W Jk 4n � � WN(O, 1 ). (10.8.28) p + 1, (e - i(j - k)A + e i(j - k)A ) I B(e - iAW 2 d) = E [ V.t -J. +l V.t - k + 1 ] • � where { v; } is the autoregressive process defined by, B(B) Yr = No p + 1, __!_ I n a d.. a e a a n = __!_ I-n For j � p and k = p + m w, p+ m = j 4n - 7t � { N1 } � WN(O, 1 ). ( 10.8.29) In g(.A.; �) in g(.A. ; �) d.A. m 'f'J m [e i<m -j)A rp - 1 (e -iA ) B - 1 (e iA ) + e - i< -j)A rp - 1 (e iA )B- 1 (e - iA )J d.A.. 4n If { Z(.A.), - n � A. � n} is the orthogonal increment process associated with the process { N1 } in (10.8.28) and ( 10.8.29), we can rewrite ltf . p+ m as ltf. p+m = � [ \I:n (fn + e iA<t -jl rp - 1 (e - iA ) dZ(.A.), e iA(t - ml g- 1 (e - iA ) dZ(A. ), = E [ Ut -j +l V. - m + l ] , I:" r" ) )J e iA(t - m) e - 1 (e - iA ) dZ(A.) e iAU -jlrp -1 (e- iA ) dZ(}�) and by the symmetry of the matrix W(�). 1 � m � q, 1 � k � p. The expressions for ltjk can be written more succinctly in matrix form as W(�) = E [Y1 Y; J, (10.8.30) where Y1 = (U0 Ut - 1 , . . . , Ut - p+ 1 , v; , Yr - 1 , . • • , Yr -q + 1 )' and { U1 } and { Yr } are the autoregressive processes defined by ( 1 0.8.28) and ( 1 0.8.29) respectively. The expression ( 1 0.8.30) is equivalent to (8.8.3). We now return to the proof of Theorem 1 0.8.2, which is broken up into a series of propositions. 10.8.5. Suppose In ( " ) is the periodogram of {X 1 , . • • , Xn } and In , z C ) is the periodogram of { Z 1 , . . . , Zn } · If IJ( · ) is any continuous even function on [ - n, n] with absolutely summable Fourier coefficients {am , - oo < m < oo }, then Proposition (10.8.3 1 ) as n � oo . 1 0. Inference for the Spectrum of a Stationary Process 388 PROOF. From Theorem 10.3.1, I.(wj) - g (wj; �0) 1•. 2(wj) = R.(wj) where R.(),) = ; ; 1/J (e - 0')12 (..1.) Y, ( - A.) + !/J, (e 0')Jz ( - A.) Y,(A.) + I Y, (A.W , 1/J (e - ;. ) = I � o 1/Jk e - ;.kr, 1) "n " ro . . w u "" ;;. Y. 1 1 1 1 JZ(),' ) = n - 12 L.,t=l z,e - ' n (A = n - 12 L.,1= 0 '1' 1 e - -1 z,e - u n , 1 and Un,1 = L.,t=l r 1 ). - I7= I Z,e - iAt = I �=! (Zr - 1 - zn -l+ r ) e - i }.( - The proof of Theorem 10.3. 1 gives max wk E [ O,nJ E I R . (wk ) l = O(n- 1 12 ), however this result together with the bound l l E n - 1 12 I R. (w) YJ(Wj ) :;:::; n 1 12 sup i YJ {A) I max E I R. (wk ) l , j A Wk E [0, 7t) is not good enough to establish ( 10.8.3 1 ) . Therefore a more careful analysis is required. Consider n- 1 /2 I 1/J (e- i"'j ) Jz(w) Y, ( - wj) YJ(w) j oo n ro oo / n - 312 I I I I I I !/Jk ljJ1 am Zr [Z,_1 - zn - 1 + r ] e - iwj(k + m - r + t) . j k=O 1= 0 r=l m = -oc; r=l Now for k, I, r and m fixed let s = r - m - k mod(n). Then = if t if t -=f. s, = s, which implies that l l I !/J ljJ1 am Zr [Z,_1 - zn - l +r ] �J e - iwj(k+m-r +t) :;:::; 2 1 !/Jk !/J1 am i (J'5 r =l k and hence that E n -I --+ 0 as n --+ oo . Since E l Y,(w) l 2 :;:::; 2 n - 1 0'6(Ik"= o 1 1/J k l l k l 1 12 ) 2 (see (10.3. 1 3)), 2 E n - 1 12 I I Y,(w) I 2YJ(w) :;:::; 2n- 1 120'6 I I I/Jk l l k l 1 12 sup i YJ(},) I l 1 l --+ 0, (k = O ) ;. whence as desired. D Proposition 10.8.6. If YJ ( · ) and its Fourier coefficients {a m } satisfy the conditions of Proposition 10.8.5 and if I:::= t l am I m 1 12 < oo and J':., YJ(A.) g (A.; �0) dA. 
= 0, 389 § 1 0.8.* Asymptotic Behavior of the M aximum Likelihood Estimators t hen PROOF. In view of Proposition 1 0.8.5, it suffices to show that ( n - 1 L ln . z(w)x(w) is AN o, n - 1 atf" ) x 2 (.A.) d). , (10.8.32) n -" J where X(A) = IJ(A)g(),; �0). Let Xm(A) = L lk l s m bk e m, where bk (2rrr 1 J':." e- ik i.X(A) d).. The assumptions on the Fourier coefficients of 17 (},) = together with the geometric decay of Yxlh) imply that l:: ;;"= 1 l bk l k 1 12 < oo, and Xm(),) -> X(A) = L bk eiu as rn -> oo, k where the convergence is uniform in A. It also follows from our assumptions that b0 = 0. We next show that for all t: > 0, lim lim sup P n - 112 � /n,z(wi ) (x(wi ) - Xm(wJ) > e = 0. (10.8.33) m --+oo n --+ oo 1 Observe that L bk e ikwJ n- 1 !2 L In,z(w) (x(wJ - Xm(wJ) = n-1!2 L L Yz(h)e - ihwJ j lhl < n lk l> m j ( � n 1 12 Yz(O) k t;_o bkn )( ( where iiz(h) = n -1 L:��jhl Z,Z,+ Ihl and Kh term in (10.8.34), we have I I ) I = ) ( 10.8.34) { k E Z : l k n + h i > rn}. For the h � 2')iz (O)n 112 � bk I k n = 0 I ( 1 0.8.35) 1 2 1 � 2jlz(O) L i bk l k -> 0 a.s. k =n by Remark 2 and the summability of { l bk l k 1 12 }. Moreover, since Ejlz (h) 0 for h =1= 0 and Eyz(h)yz(k) = 0 for h =I= k (see Problem 6.24), we have and for n > rn [ ro ( E 2n 1 12 I1 Yz(h) L bkn + h k E Kh h= )] = 0 = (10.8.36) Now (10.8.33) follows at once from (10.8.35) and (10.8.36). 10. Inference for the Spectrum of a Stationary Process 390 To complete the proof it is enough to show, by Proposition 6.3.9 and Problem 6. 1 6, that n- 1 and ( :ci f n � I z(wi)Xm(wi) • f. is AN 0, n - 1 . f, x ;, (Jc) d.lc --> x2(Jc) d.lc X;,(Jc) d.lc ) (10.8.37) (10.8.38) as m --> oo . But since n 112 yz(n - k) = n- 1 12 I�= 1 Z,Zr+n - k = op(1 ), it follows from Proposi­ tions 6.3.3 and 6.3.4 and Problem 6.24 that m m 1 1 n- 112 2 ) z (wJ xm (wJ = 2n 12 I Yz(k)bk + 2n 12 I Yz(n - k)bk k= 1 j k= 1 m 2n 1 12 L Yz (k)bk + op(1 ) k=1 •. = => N (0, � ) = f':., 4ari k 1 b[ . By Parseval's identity, 4ari L �= 1 b[ arifn x;,(Jc) d.lc, which establishes (10.8.37). Finally (10.8.38) follows from the uniform convergence of Xm (Jc) to xm D PROOF OF THEOREM 10.8.2. (i) The Taylor-series expansion of 8 0'2 (P0)f8 P about p = �. can be written as n 1 12 80'2 (�. ) 8 0'2 (Po ) n 112 ap ap _ = - _ n 1 12 8 2 0'2 (P!) R ( Pn apz _ R ) PO / 8 2 2(P!) (R R n 1 2 0' Pn - PO ) ' apz for some P! E C satisfying II P! - �. II < II �. - Po II ( II · II = Euclidean norm). Now -1 . R t - 2( Rtn ) a2 ()" - "[ ) 8 2 g (Wj , Pn ) P n 1 Lr n (WJ· ap2 ap2 = and since P! --> Po a.s. by Theorem 1 0.8. 1, the proof given for Proposition 1 0.8.2 can be used to establish the result, az 0'2 (P!) aJ ap2 --> 21! " I-n (k R g ' PO ) az g - 1 (Jc ; Po ) .lc d a.S. apz (10.8.39) Since (2 n)- 1 g (Jc; p) is the spectral density of a causal invertible ARMA process with white noise variance equal to one, it follows that J':., In g (Jc ; p) d.lc 0 for = 391 § 1 0.8 * Asymptotic Behavior of the Maximum Likelihood Estimators ali � E C, and hence that Since the last relation holds also with g replaced by g- 1 , it follows from (10.8.39) that Consequently it suffices to show that au z (�o) a� · IS 4 AN(O, n- 1 4<To W(�o)), or equivalently, by the Cramer-Wold device, that c' for ali c E [RP + q. But au2(�o) . IS AN(O, n - 1 4aric' W(�o) c) a� a (�o) c uz = n -1 � I"(w)1J(w), 7' a� I where 17(-A) = c' ag -1 (-A; � 0)/a�. 
Now 17'( " ) and 17"( " ) are also continuous func­ tions on [ - n, n], so that by Problem 2.22, the Fourier coefficients of 17( ) satisfy the assumptions of Proposition 1 0.8.6 and · I" I" I = c'O = 0. 17(-A)g(.A; �0) d), = - c' : ln g(.A; �) d.A u � -rr ll = llo Hence, invoking Proposition 1 0.8.6, we have - rr (o, I" ) a n -1 I I.(wi }1J(w) is AN n -1 6 172(.A) g2(.A; � 0) d.A , n _ j and since (a6 /n) f�" 1'/2(.A) g2 (.A; �0) d.A = 4aric' W(�0) c', the proof of (i) is complete. (ii) Expanding 00'2(�0)/a� in a Taylor series about the vector � = P., we have as in the proof of (i), -rr a (�o) - /Z a z az (�!) R ( = n1 n 1/Z az a� a� z I'n - R 1' 0 ) for some �! E C with �! --+ � 0 a.s. By (i) and Proposition 6.3.3, it suffices to show that 1 0. Inference for the Spectrum of a Stationary Process 392 (1 0.8.40) and - a 2 (Po ) p o '" aa 2 (Po ) 1 ' . . . ' p + q . ( 10.8.41) - n 1/2 u ---+ tOf k 8f3k 8f3k The proofs of ( l 0.8.40) and ( 1 0.8.41 ) follow closely the argument given for the proof of Proposition 10.8.3. We shall only prove (10.8.41). Since g()"; Po ) and 8g(A; P0)/8f3k have continuous derivatives of all orders with respect to )" and since g(A; P o ) > 0 and I 8g(A ; P0)/8f3k I > 0, it follows easily from Problem 2.22 that n 1/2 ( 10.8.42) as h ---+ oo. Set q m (A) = L b(h; Po )e- ih).. lhi :O: m Then 8b(h ; P o ) -ih). " . L.... --- e lhl ,; m apk Equations (10.8.42) ensure that if m = [n 1 15 ] (the integer part of n 1 15 ), then where a(z) = 1 + a 1 z + · · · + am z m # O for lzl � 1 and K m ---+ 1 as m ---+ oo. Let H. be the covariance matrix corresponding to the autoregressive spectral density (2nqm (A))- 1 • We shall show that n - 11 2 � (X� G; 1 (P0)X. - � I.(w)qm(w)) !. 0 a k ( 10.8.44) as n ---+ oo, where m = [n 115 ]. Once this is accomplished, the result (10.8.41 ) follows immediately, since b y (10.8.43), § 1 0.8.* Asymptotic Behavior of the Maximum Likelihood Estimators 393 0 :-::;; n-112 L In (wJO(n - 315 ) = O(n- 111 )]7 (0) = op(l). j Throughout the remainder of this proof set m = [n115 ]. From the proof of Proposition 10.8.3, the eigenvalues of Gn- 1 (�0) and H;;1 are uniformly bounded in n. Moreover, the eigenvalues of the matrices aG"(�0)ja{3k and aH"(�0)ja{3k are also uniformly bounded in n since ag().; �0 )japk and aq;;/ (A)japk are uniformly bounded (see the proof of Proposition 4.5.3). It is easy to show from ( 1 0.8.43) that there exists a positive constant K such that for all y E IR ", (cf. ( 10.8. 1 7)) II (Gn- 1 <Po ) - H"- 1 (� o )) YII :-::;; K n - 315 IIYII , and I CG;�� ��� ) I I 1� o) - a o) y :-::;; K n - 315 II Y II . It then follows from a routine calculation that n- 112 a k (X� Gn-! ( IJo )Xn - X�H;;1 Xn ) :-::;; 0 O(n -1 11 )]7(0) = op( l ). ( 1 0.8.45) We next compare n - 112 a(X�Hn- 1 Xn )/a{3k with n-112 a(X�H;;1 Xn )/a{3k where H;;1 is the covariance matrix of an MA(m) process with spectral density (2nt 1 q m ()�) = (2nKm t 1 ! a(e- i-')! 2 • Now from the proof of Proposition 10.8.3 (see ( 1 0.8. 1 9) and ( 1 0.8.20)), a n - 112 - (Xn' Hn- 1 xn - X'n Hn- 1 xn ) ap� _ = . � a n -112 L. - (h I}.. - h!} )X!. X.J i,j= 1 apk It follows from ( 1 0.8.43) that a(h ;j - fiij)japk is uniformly bounded in i and j, and since m = [n115] the above expression is op(1 ). By the same reasoning, n - 1;z a ' a xn Hn-1 x n - n -1 !2 " In (wj ) q m (wj) + op (1) apk 7 apk (see ( 1 0.8.21 )), and with ( 1 0.8.45) this establishes ( 10.8.44) and hence ( 10.8.41 ). (iii) From the Taylor series expansion of /(�0) about � = �n (cf. 
(i) and (ii)), it suffices to show that as n -> oo , _ a 2 ln(det G"(�!)) P ( 10.8.46) n 1 -> O, 2 a� where �! -> �0 a.s., and ( 1 0.8.47) 1 0. Inference for the Spectrum of a Stationary Process 394 We shall only prove (1 0.8.47) since the argument for ( 1 0.8.46) is similar, but less technical due to the presence of the factor n - 1 . As in the proof of Proposition 10.8.4, if { t; } is an ARMA process with spectral density (2nr 1 g(Jc; p), then det Gn(P) = r0(P) · · · r" _ 1 (p), where r1(P) = £ ( ¥;+1 - Y,+d 2 • Denote the autocovariance function of { t; } by 1J(h; p) and write the difference equations for { t; } as 00 t; + L niP) Yr -i = Zo j=l { Z1 } � (10.8.48) 110(0, 1). We have from Corollary 5. 1 . 1, rt (p) = IJ (O; p) - 'l; Gt- 1 (P) 'ln where '11 = (1]( 1 ; p), . . . , 1J(t; p))'. For notational convenience, we shall often suppress the argument p when the dependence on p is clear. From (1 0.8.48), we have where 'loo = ( 1] ( 1 ; p), 1](2; p), . . . )' , Goo = [1J( i - j; P) J�j=l and n(2; p), . . . )'. It then follows that 1t 00 = (n(1 ; p), and it is easy to show that G;;/ may be written as where G;;/ = TT', T = [n;-iP) J�j=l , n0(P) = 1 and niP) = 0 for j < 0. We also have from ( 10.8.48) and the independence of Z1 and { Yr - 1 , t;_ 2 , . . . }, that 1](0; p) = 1t:n Goo 1too + l . Consequently, we may write rt (P) = 1 + 11:n G� 1 'loo - 'l; Gt- 1 'ln and hence art (Po ) a :n Goo = 2 'l G - 1 'l oo + 'l ,oo Gco- 1 a G - 1 'loo af3k af3k af3k 00 -2 C(J a 'l; - 1 - aGl - 1 Gt 'lt + 'l ,t Gt 1 Gt 'lt apk a pk where all of the terms on the right-hand side are evaluated at p = Po · Note that if cp1 = (r/J1 1 , . . . , rPtt Y = G1- 1 '11 is the coefficient vector of the best linear predictor of ¥; + 1 in terms of ( r;, . . . , Y1 )', then the above equation reduces to §10.8.* Asymptotic Behavior of the Maximum Likelihood Estimators 395 ( 1 0.8.49) We next show that the vectors rr, Observe that = (n 1 , • • • , n,)' and �� are not far apart. and Y,+ l + n 1 Y, + · · · + n, Yl = I nj Y,+l -j + z, +l j> t so that the variance of (n 1 + <A d Y, + · · · + (n, + </>,,) Y1 is equal to ( � ni Y,+ l-i Z,+l - ( Y,+l - Yr+d) (�, ni Y,+l -i) (Z,+1 - ( Y,+ 1 - Yr+1 )) (� ni y ( I ni ) (rr, + �,)'G,(rr, + �,) = Var + j >t + 2 Var � 2 Var �2 �4 1] (0, �0) + 2(r, - 1 ) ,l l j> t l l 2 1] (0, �0), where the last inequality comes from the calculation in the proof of Proposi­ tion 1 0.8.4. Since the eigenvalues of G, are bounded below by inf.\ g().; �0) > L > 0, t L (ni + r/J,J 2 j=l � L - 1 (rr, + �,)'G,(rr, + �,) ( 1 0.8.50) for some K > 0 and 0 = < s < 1. Therefore, from ( 10.8.49), where K 1 ( 1 /2n) J':., l ag( A.; �0)/aPk l dA.. Since L� 1 nJ < oo and L J =l r/J,] is bounded in t (see ( 1 0.8.50)), we have from the Cauchy-Schwarz inequality, 1 0. Inference for the Spectrum of a Stationary Process 396 l ar,(�o) I -< a/3k where r,(�0 ) K 2 t !f2 s � tj2 + K3 " .{.... j>t l n·l J + K 4 ts� tj2 K 2 , K 3 , K4 � and K 5 are positive constants and 0 1, it then follows that < ln(det Gn (�o) < a n�l ort(�o) I apk I t=O I of3k I rt 1'0) n a (� ) .::;; n� ! j2 i r� o I r=o I apk �; n tz � n t;z �[ L... s1 < 1 . Since (R l .::;; n� t;z K s ( l - s l )� t -> 0 as n -> oo , which completes the proof of ( 10.8.47). D Problems 1 0. 1 . The discrete Fourier transform { ai,j E F" } of {X 1 , . . . , X" } can be expressed as wi = 2njjn E ( - n, n], where n J(},) = n- 1!2 L X,e - i<J., t= l - oo < .?t < oo. 1 1 Show that X, = (2n) - n 12 J':n J(A.)e;';. d.?t. 
[In this and the following questions we shall refer to J(A.), - oo < .?t < oo, as the Fourier transform of {X1 , . . . , Xn } · Note that the periodogram { In(wi), wi = 2njjn E ( - n, n] } can be expressed as 1 0.2. Suppose that z, = x,y,, t = 1 , . . . , n. If lx, Jy and lz are the Fourier transforms (see Problem 1 0. 1 ) of {x,}, { y, } and {z, } respectively, and if �i = lx(2njjn), IJi = ly(2njjn) and (i = lz(2njjn), j = 0, ± 1 , . . . , show that (i = n - 1 !2 L �k iJj -k · k E F" 1 0.3. Suppose that { x,, t = 0, ± I, . . . } has period n and that z s = L gk x, _k , t = 0, ± 1 , . . . . k = -s i If G(e - ;;.) = L:i= - s gie- i \ - oo < .?t < oo, and lx, lz are the Fourier transforms of {x1, . . . , xn} and {z1, , zn } respectively, show that , • • . wi = 2njjn E ( n, n], - 397 Problems and wj = 2nj/n E ( - n, n]. 1 0.4.* Show that the sequence X, = e i vr , t = I , . . . , n, has Fourier transform, sin [n(A - v)/2] ] exp[ - i(A - v)(n + 1 )/2], J(),) = n- t;z . ) sm [ ( - v)/2 - OO < A < OO. , Use this result to evaluate and plot the periodograms /8 and It o in the case when v = n/4. [n/4 is a Fourier frequency for n = 8 but not for n = 1 0.] Verify in each case that 'I,j E FJn(wj) = 'I,� t i X, I2. � 10.5.* If J( · ) is the Fourier transform in Problem 10.4 and - n < v < n, determine the frequencies ), such that (a) IJ().W is zero and (b) jJ( W has a local maximum at A. Let M = jJ (vW and M t = I J(). t ll2 where At is the frequency closest to v (but not equal to v) at which I J( W has a local maximum. Show that (a) M ---> oo as n ---> oo, (b) Mt/M ---> .0471 90 and A t ---> v as n ---> oo, and (c) for any fixed frequency w E [ - n, n] such that w -=1- v, j J (wW ---> 0 as n ---> oo . · · 1 0.6. Verify (10.2. 1 1 ) and show that IIPs<P> X - P5P{ t )X II 2 is independent of II X - Ps<p> X II 2 • 1 0.7. The following quarterly sales totals { X,, t = 1 , . . . , 12} were observed over a period of three years: 27, 1 8, 1 0, 1 9, 24, 1 7, 5, 1 5, 22, 1 8, 2, 14. Test at level .05 the hypothesis, X, = c + Z,, { Z, } � IID(O, 172 ), where c is constant, against the alternative, X, = c + S, + Z,, where S, is a deterministic sinusoid with period one year. Repeat the test assuming only that S, has period one year but is not necessarily sinusoidal. 10.8. Use the computer program SPEC to compute and file the periodogram I (wj), 0 < wj = 2nj/n s; n of the Wolfer sunspot numbers. File also the standardized cumulative periodogram C(j), 1 s; j s; [(n - 1 )/2], defined by ( 10.2.22). Plot the two periodograms and use the latter to conduct a Kolmogorov-Smirnov test at level .05 of the null hypothesis that { X, } is white noise. 10.9.* Consider the model X, = f1 + A cos wt + B sin wt + Z,, t = 1, 2, . . . , where {Z, } is iid N(0, 172), fl, A, B and 172 are unknown parameters and w is known. If X" , A and li are the estimators, X" = n- t 'I,��� X,, A = (2/n) t;z 'I,�� t (X, - X" )cos wt and li = (2/n) t 12 'I,��1 (X, X" )sin wt of fl, A and B, show that - 398 1 0. Inference for the Spectrum of a Stationary Process 1 0. 1 0. Generate 1 00 observations, { X1 , , X100}, of Gaussian white noise. Use the following three procedures to test (at level .05) the null hypothesis, {X, } is Gaussian white noise, against the alternative hypothesis, {X, } is Gaussian white noise with an added deterministic periodic component of unspecified frequency. (a) Fisher's test. (b) The Kolmogorov-Smirnov test. 
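The limit law in part (a) of Problem 10.13 can be checked by simulation. The short sketch below (Python with NumPy; it is not part of the problem set, and the sample sizes are arbitrary) compares the empirical distribution function of max_{1≤j≤q} V_j - ln q, for iid exponential V_j with mean one, with the limiting value e^{-e^{-x}}.

import numpy as np

q, reps = 500, 20000
rng = np.random.default_rng(0)
V = rng.exponential(scale=1.0, size=(reps, q))    # iid exponential random variables, mean one
M = V.max(axis=1) - np.log(q)                     # max_{1 <= j <= q} V_j - ln q

for x in (-1.0, 0.0, 1.0, 2.0):
    empirical = (M <= x).mean()
    limit = float(np.exp(-np.exp(-x)))            # Gumbel limit exp(-e^{-x})
    print(x, round(empirical, 3), round(limit, 3))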
(c) Let wi be the frequency at which the periodogram is maximum and apply the test described in Section 1 0.2(a) using the model X, f1 + A cos wit + B sin wit + Z,. In other words reject the null hypothesis if • . • ( 100 - 3)/(w) ICt Xf - / (0) - 2/(wi)) = > F. 95 (2, 97). Is this a reasonable test for hidden periodicities of unspecified frequency? = 1 0. 1 1 . Compute the periodogram of the series {X, - Xr- 1 , t 2, . . . , 72} where X,, t = 1 , . . . , 72, are the accidental deaths of Example 1 . 1 .6. Use the procedure described in Section 1 0.2(a) to test for the presence of a deterministic periodic component with frequency 1 2n/71 . (This is the Fourier frequency with period closest to 1 2.) Apply Fisher's test to the periodogram of the residuals from the fitted model (9.6.6) for { X, } . 1 0. 1 2. For the Lake Huron data of Problem 9.6, estimate the spectral density func­ tion using two different discrete spectral average estimators. Construct 95% confidence intervals for the logarithm of the spectral density. Also compute the M LARMA spectral density estimate and compare it with the discrete spectral average estimators. 1 0. 1 3. * Suppose that V1 , V2 , . . . , is a sequence of iid exponential random variables with mean one. (a) Show that P(max 1 -s;hs,q J.j - In q ::::; x) e - e-x for all x as q oo . 1 (b) Show that P(max 1 -s;hq J.j/(q - Ll=! J.j) ::::; x + In q) e - · -x a s q oo . (c) I f C:q is a s defined i n (10.2.20) conclude that for large q, P((;q - In q � x) � 1 - exp{ - - } . ex ...... ...... ...... ...... 1 0. 1 4. If {Z, } - IID(O, cr 2 ) and EZi < oo , establish the inequality, E(LJ'= 1 Z)4 ::::; mEZ{ + 3m2 cr4. 1 0. 1 5. Find approximate values for the mean and variance of the periodogram ordinate /2 00(n/4) of the causal AR( 1 ) process X, - .5X,_1 1 = Z,, { z,} - IID(O, cr 2 ). Defining ](wi) = ( 1 0nr L.l= -z I2 00(wi + wk), wi = 2nj/200, use the asymp­ totic distribution of the periodogram ordinates to approximate (a) the mean and variance of ](n/4), (b) the covariance of ](n/4) and ](26n/100), (c) P(](n/4) > l . lf(n/4)) where f is the spectral density of {X,}, (d) P(maxl -s; j .,; 9 g (/z o0(w)/f(wi)) > .06 L,]1\ (/2 00(wi)/f(w))). 1 0. 1 6. Show that successive application of two filters {a_,, . . . , a,} and { b_, . . . , b, } to a time series {X, } is equivalent to application of the single filter { c_,_, . . . , c,+,} where 399 Problems 00 00 ck = j L ak-A = j L bk -j aj , = = - oo - co and aj , bj are defined to be zero for l j l > r, s respectively. In Example 1 0.4.2 show that successive application of the three filters, r 1 { 1 , 1, 1 }, T 1 { 1, . . . , 1 } 1 and 1 1 - I { 1 , . . . , 1 } is equivalent to application of the filter (23 1 ) - { 1 , 3, 6, 9, 12, 1� 1 8, 20, 2 1 , 2 1 , 2 1 , 20, 1� 1 5, 1 � 9, � 3, 1 } . 1 0. 1 7. If L ?= 1 X, = 0, /"( · ) is the period-2n extension of the periodogram of {X1 , X" }, and f�(w), wj = 2nj/n, is the Daniell estimator, m ' 1 show that • • • , fD (wj ) = L In(wj + wd, 2n (2m + 1) k = - m where Ak = (2m + 1 ) - 1 sin[(2m + 1 )kn/n]/[sin(knjn)]. Compare this result with the approximate lag window for the Daniell estimator derived in Section 1 0.4. 1 0. 1 8. Compare the Bartlett and Daniell spectral density estimators by plotting and examining the spectral windows defined in ( 1 0.4. 1 3). 1 0. 1 9. Derive the equivalent degrees of freedom, asymptotic variance and bandwidth of the Parzen lag-window estimator defined in Section 1 0.4. 1 0.20. 
Simulate 200 observations of the Gaussian AR(2) process, X, - X,_ 1 + .85X,_ 2 = Z,, Z, � WN(O, 1 ), and compare the following four spectral density estimators: (i) the periodogram, (ii) a discrete spectral average estimator, (iii) the maximum entropy estimator with m chosen so as to minimize the AICC value, (iv) the M LARMA spectral density estimator. Using the discrete spectral average estimator, construct 95% confidence intervals for ln f(A.), A E (0, n), where f is the spectral density of {X, } . Does In f(A.) lie entirely within these bounds? Why does f( · ) have such a large peak near n/3? 1 0.21.* (a) Let X I , . . . , xn be iid N(O, a 2 ) random variables and let Yl , . . . , Y. be the corresponding periodogram ordinates, }j = I"(w), where q = [(n - 1 )/2]. Determine the joint density of Y1 , . . . , Yq and hence the maximum likelihood estimator of a 2 based on Y1 , . . . , Y. · (b) Derive a pair of equations for the maximum likelihood estimators rfo and 6 2 based on the large-sample distribution of the periodogram ordinates /" (2 1 ), . • . , /"( 2m ), 0 < 2 1 < · · · < 2m < n, when {X1 , , X" } is a sample from the causal AR(1) process, X, = I/JX,_ + Z,, {Z, } IID(O, a 2 ) . 1 • • . � 1 0.22.* Show that the partial sum S2 n +l (x) of the Fourier series of I10.n1(x) (see (2.8.5)) 10. Inference for the Spectrum of a Stationary Process 400 satisfies 1 1 Szn +dx) - - + _ 2 fx n 0 sin [2(n + l)y] dy, . sm y X 2 0. Let x 1 denote the smallest value of x in (0, n] at which Szn+l ( · ) has a local maximum, and let M1 = Szn+ 1 (x d. Show that (a) limn-oo x1 = 0 and (b) limn-oo M1 = 1.089367. [This persistence as n --> oo of an "overshoot" of the Fourier series beyond the value of /10.n1 (x) on [0, n] is called the Gibbs phenomenon.] CHAPTER 1 1 Multivariate Time Series Many time series arising in practice are best considered as components of some vector-valued (multivariate) time series { X,} whose specification includes not only the serial dependence of each component series {Xr; } but also the interdependence between different component series { Xr; } and {Xti}. From a second order point of view a stationary multivariate time series is determined by its mean vector, J1 = EX, and its covariance matrices r(h) = E(Xr +h X;) ' J1J1 , h = 0, ± 1, . . . . Most of the basic theory of univariate time series extends in a natural way to multivariate series but new problems arise. In this chapter we show how the techniques developed earlier for univariate series are extended to the multivariate case. Estimation of the basic quantities J1 and r( · ) is considered in Section 1 1 .2. In Section 1 1 .3 we introduce multivariate ARMA processes and develop analogues of some of the univariate results in Chapter 3. The prediction of stationary multivariate processes, and in partic­ ular of ARMA processes, is treated in Section 1 1 .4 by means of a multivariate generalization of the innovations algorithm used in Chapter 5. This algorithm is then applied in Section 1 1.5 to simplify the calculation of the Gaussian likelihood of the observations { X 1 , X 2 , . . . , X"} of a multivariate ARMA process. Estimation of parameters using maximum likelihood and (for autoregressive models) the Yule-Walker equations is also considered. In Section 1 1 .6 we discuss the cross spectral density of a bivariate stationary process {X,} and its interpretation in terms of the spectral representation of {X.}. (The spectral representation is discussed in more detail in Section 1 1 .8.) 
The bivariate periodogram and its asymptotic properties are examined in Section 1 1 .7 and Theorem 1 1 .7. 1 gives the asymptotic joint distribution for a linear process of the periodogram matrices at frequencies A 1 , A 2 , , Am E (0, n). Smoothing of the periodogram is used to estimate the cross-spectrum and hence the cross-amplitude spectrum, phase spectrum and . . • 402 1 1 . Multivariate Time Series squared coherency for which approximate confidence intervals are given. The chapter ends with an introduction to the spectral representation of an m-variate stationary process and multivariate linear filtering. § 1 1. 1 Second Order Properties of Multivariate Time Series Consider m time series { X1 ; , t = 0, ± 1, ± 2, . . . }, i = 1, . . . , m, with EXr7 < oo for all t and all i. If all the finite dimensional joint distributions of the random variables { Xr d were multivariate normal, then the distributional properties of { X1 ; } would be completely determined by the means, (1 1.1.1) and covariances, ( 1 1 . 1 .2) yij (t + h, t) := E [(Xr +h, i - flr +h, ; ) (Xti - flti )] . Even when the observations { Xti } do not have joint normal distributions, the quantities flr; and yij (t + h, t) specify the second-order properties, the co variances providing us with a measure of the dependence, not only between observations in the same series, but also between observations in different series. It is more convenient when dealing with m interrelated series to use vector notation. Thus we define ( 1 1 . 1 .3) t = 0, ± 1 , ± 2, . . . . The second-order properties of the multivariate time series { Xr } are then specified by the mean vectors, Jlr : = EXt = (flr t • . . . , fltm )', and covariance matrices, r{t + h, t) := E [(X r+h - Jlr+h ) (Xr - Jlr )' ] ( 1 1 . 1 .4) = [yij(t + h, t) ]�j = t · ( 1 1 . 1 .5) Remark. If { Xr } has complex-valued components, then r(t + h, t) is defined as r{t + h, t) = E [ (Xr+h - Jlr+h ) (Xr - Jlr )* ] , where * denotes complex conjugate transpose. However we shall assume except where explicitly stated otherwise that X1 is real. As in the univariate case, a particularly important role is played by the class of multivariate stationary time series, defined as follows. Definition 1 1.1.1 (Stationary Multivariate Time Series). The series ( 1 1 . 1 .3) with � I I . I . Second Order Properties of Multivariate Time Series 403 = means and co variances (1 1 . 1 .4) and ( 1 1 . 1 .5) is said to be stationary if 11, and r(t + h, t), h 0, ± I , . . . , are independent of t. For a stationary series we shall use the notation, 11 and ( 1 1 . 1 .6) := E X , = (Jl i , · . . , Jlm )', ( 1 1 . 1 .7) We shall refer to 11 as the mean of the series and to r(h) as the covariance matrix at lag h. Notice that if {X,} is stationary with covariance matrix function r( · ), then for each i, { X,i } is stationary with covariance function Yii ( · ). The function Yii( ), i i= j, is called the cross-covariance function of the two series { X,i } and { X,i }. It should be noted that Yi) · ) is not in general the same as Yii ( · ). The correlation matrix function R( · ) is defined by · (1 1 . 1.8) The function R( · ) is the covariance matrix function of the normalized series obtained by subtracting 11 from X, and then dividing each component by its standard deviation. The covariance matrix function r( · ) [yii( " )]�i=l , of a stationary time series { X, } has the properties, = (i) (ii) (iii) (iv) r(h) = r'( - h), iyii(h) i ::;; [yii (O)yii (0)] 1 12 , i, j = I , . . . 
, m, y u( · ) is an autocovariance function, i = 1 , . . . , m, Li.k = l aj r(j - k)ak � 0 for all n E { 1 , 2, . . . } and a 1 , . . . , a. E !Rm . The first property follows at once from the definition, the second from the Cauchy-Schwarz inequality, and the third from the observation that "'Iii ( · ) is the autocovariance function of the stationary series {X, i , t = 0, ± 1 , . . . } . Property (iv) is a statement of the obvious fact that E(L'J= 1 a�{Xi - 11)) 2 � 0. Properties (i), (ii), (iii) and (iv) are shared by the correlation matrix function R( " ) [pii( · ) ] �j=I , which has the additional property, (v) Pii (O) = 1 . = (A complete characterization of covariance matrix functions of stationary processes is given later in Theorem 1 1 .8. 1 .) The correlation pii(O) is the correlation between Xt i and X,i , which is generally not equal to I if i # j (see Example 1 1 . 1 . 1 ). It is also possible that i Yii(h) i > I Yii(O) I if i i= j (see Problem 1 1 . 1 ). ExAMPLE 1 1 . 1 . 1 . Consider the bivariate stationary process {X, } defined by, xt l X, 2 = = z,, Z, + . 7 5Z, _ 1 o , 1 1. Multivariate Time Series 404 where { Z, } � [ ] [0 WN (0, 1 ). Elementary calculations yield J1 = 0, 1( - 1 0) = 1 0 .75 , l(O) = 1 0 .75 [ J 1 , 1(10) = .75 1 .5625 and r ( j) = 0 otherwise. The correlation matrix function is given by R ( - 10) = [� ::sl R(O) = and R(j) = 0 otherwise. [ l 1 .8 .8 1 R (10) = [�6 �8l The simplest multivariate time series is multivariate white noise, defined quite analogously to univariate white noise. 1 1.1.2 (Multivariate White Noise). The m-variate series { Z" t = 0, ± 1 , ± 2, . . . } is said to be white noise with mean 0 and covariance matrix !:, written Definition ( 1 1 . 1 .9) if and only if { Z, } is stationary with mean vector 0 and covariance matrix function, l(h) = {t ( 1 1 . 1 . 1 0) IID(O, !:), (1 1.1.1 1) if h = 0. 0, otherwise. We shall also use the notation { Z, } � to indicate that the random vectors Z,, t = 0, ± 1 , . . . , are independently and identically distributed with mean 0 and covariance matrix !:. Multivariate white noise {Z,} is used as a building block from which can be constructed an enormous variety of multivariate time series. The linear processes are those of the form 00 x , = I cj z,_j , j= - oo {Z,} � WN(O, l:), (1 1 . 1 . 1 2) where { CJ is a sequence of matrices whose components are absolutely summable. The linear process {X, } is stationary (Problem 1 1 .2) with mean 0 and covariance matrix function, 00 (1 1 . 1 . 1 3) l(h) = I cj +h tc;, h = 0, ± 1, . . . . j= - oo We shall reserve the term MA( oo) for a process of the form ( 1 1 . 1 . 1 2) with Ci = 0, j < 0. Thus {X,} is an MA( oo) process if and only if for some white noise sequence {Z,}, 405 § 1 1 .2. Estimation of the Mean and Covariance Function 00 X1 = I cj zl-j' j�O where the matrices Ci are again required to have absolutely summable com­ ponents. Multivariate ARMA processes will be discussed in Section 1 1.3, where it will be shown in particular that any causal ARMA(p, q) process can be expressed as an MA( oo) process, while any invertible ARMA(p, q) process can be expressed as an AR( oo) process, 00 I Ajxl-j = zl' j�O where the matrices Ai have absolutely summable components. Provided the covariance matrix function r has the property ,I ;:'� -oo I Yii (h) l < oo, i, j = 1 , . . . , m, then r has a spectral density matrix function, 1 00 ( 1 1 . 1 . 1 4) f().) = - L e - m r(h), - n ::::;; ). ::::;; n , 2n h�-oo and r can be expressed in terms of f as r(h) = J:, e ;;."J().) d). 
( 1 1 . 1 . 1 5) The second order properties of the stationary process {X1} can therefore be described equivalently in terms of f( · ) rather than r( ). Similarly X1 has a spectral representation, · XI = I e iJ.r d Z ().), J(-1t,1l] ( 1 1 . 1 . 1 6) where { Z().), - n ::::;; )0 ::::;; n } is a process whose components are orthogonal increment processes satisfying E(dZ.().)dZ ( )) = } k jJ. {jjk().) d). 0 if ). = fJ. if ). #- j.J.. ( 1 1 . 1. 1 7) The spectral representations of r( · ) and {X1} are discussed in Sections 1 1 .6 and 1 1 .8. They remain valid without absolute summability of yii( ) provided f(Jo) d)o is replaced in ( 1 1 . 1 . 1 5) and ( 1 1 . 1 . 1 7) by dF().) (see Section 1 1 .8). · § 1 1 .2 Estimation of the Mean and Covariance Function As in the univariate case, the estimation of the mean vector and cross­ correlation function of a stationary multivariate time series plays an im­ portant part in describing and modelling the dependence structure between 1 1 . M ultivariate Time Series 406 the component time series. Let {X, = (X, 1 , , X, m )' , -oo m-dimensional stationary time series with mean vector • • • < t < oo } be an and covariance matrix function [yii (h) J ri= t where Y;)h) = Cov(X,+h .i • X,). The cross-correlation function between the processes { xti } and { x,j} is given by h = 0, ± 1 , . . . . pii (h) = yii (h)/(Y;; (O) yi)0)) 112 , r(h) = E [ (X, + h - !J) (X, - IJ)' J = Estimation of IJ . Based on the observations X 1 , . . . , X., an unbiased estimate of 11 is given by the vector of sample means - 1 n x . = - L X, . n r=l Observe that the mean of the r time series /1j is estimated by ( 1/n) L �= l xtj. The consistency of the estimator X. under mild conditions on ')!;; (h) can be established easily by applying Theorem 7. 1 . 1 to the individual time series { Xr; }, i = 1 , . . . , m. This gives the following result. 1 1.2.1. If {X, } is a stationary multivariate time series with mean IJ and covariance function r( · ), then as n � oo Proposition and E(X. - IJ)'(X. - IJ) � 0 if ')!; ; (n) � 0, i = 1 , . . . , m m ro ro nE(X. - !J)'(X. - IJ) � L L ')!; ; (h) if L IY;; (h) l < oo, i = 1 , . . . , m. h = - co i = l h = - oo The vector x. is asymptotically normal under more restrictive assumptions on the process. In particular, if {X, } is a multivariate moving average process then x. is asymptotically normal. This result is given in the following proposition. 1 1.2.2. Let {X,} be the stationary multivariate time series, ro x , = 11 + L ck z, _ k o { Z, } � IID(O, l:), k = - oo where { Ck = [ Ck (i, j ) J ri = d is a sequence of m x m matrices such that Lr'= - oo I Ck (i, j ) l < oo, i, j = 1, . . . , m. Then Proposition PROOF. See Problem 1 1 .3. D 407 § 1 1 .2. Estimation of the Mean and Covariance Function This proposition can be used for constructing confidence regions for J.l. For example if the covariance matrix l:x := n - 1 (L;;'� -oo Cdl:(L ;;'� - oo C�) is nonsingular and known, then an asymptotic (1 - ()() confidence region for J.1 is ( 1 1.2. 1 ) This region is o f little practical use since i t is unlikely that l: x will be known while J.1 is unknown. If we could find a consistent estimate fx ofl:x and replace l:x by fx in (1 1 .2. 1 ), we would still have an asymptotic 1 - ()( confidence region for J.l. However, in general, l:x is a difficult quantity to estimate. A simpler approach is to construct for each i, individual confidence intervals for J..l. ; based on X 1 ;, , X.; which are then combined to form one confidence region for J.l. 
If J;(w) is the spectral density of the ith process, {X, ; }, then by the results of Section 10.4 (see ( 1 0.4. 1 1)), • . . ( �) y;;(h) 2n/;(O) := L 1 r l hl 9 is a consistent estimator of 2nf(O) = L;;'� Y;; (k) provided r = r. is a sequence of numbers satisfying r./n � 0 and r. � oo. Thus if X. ; denotes the sample mean of the ith process, and <l>� is the a-quantile of the standard normal distribution, then by Theorem 7. 1 .2, the bounds X. ; ± <l> 1 _a12 (2n/;(O)/n) 1/2 are asymptotic (1 - ()() confidence bounds for J..l.; · Hence A P([ J..I. ; - X. ;[ :0:: <l> 1 _ a12 (2n}; (O)/n) 1 /2 , l - 1 , . . . , m) - oo - A • _ where the right-hand side converges to 1 - m()( as n � oo . Consequently as n � oo, the region ( 1 1 .2.2) has a confidence coefficient of at least 1 - ()(. For large values of m this confidence region will be substantially larger than an exact ( 1 - ()() region. Nevertheless it is easy to construct, and in most applications is of reasonable size provided m is not too large. Estimation of r(h). For simplicity we shall assume throughout the remainder of this section that m = 2. As in the univariate case, a natural estimate of the covariance matrix r(h) = E [(Xr +h - JJHXr J.l)'] is { n-1 f(h) = n-1 - f (Xr + h - X.)(Xr - X.)' " t- 1 for 0 s; h s; L (Xr+ h - X.) (Xr - X.)' for - n + 1 t� -h+ ! " n - 1, :0:: h < 0. 1 1 . Multivariate Time Series 408 Writing yii(h) for the (i,j)-component of f'(h), i = 1, 2, we estimate the cross­ correlation function by Pii(h) = Yii(h) (Yii (O)yi)O) r 1!2 _ If i = j this reduces to the sample autocorrelation function of the i'h series. We first show the weak consistency of the estimator yii(h) (and hence of pii(h)) for infinite-order moving averages. We then consider the asymptotic distribution of yii(h) and P;i (h) in some special cases of importance. Theorem 1 1 .2.1 . Let {X,} be the bivariate time series 00 x, = L ck z,_ b = k - oo where { Ck = [ Ck (i,j)JL = 1 } is a sequence of matrices with L ;;'= i, j = 1 , 2. Then as n --> oo , - oo I Ck (i,j)l < oo , and pij(h) � pij(h) for each fixed h :::0: 0 and for i, j = 1 , 2. PROOF. We shall show that f'(h) � r(h) where convergence in probability of random matrices means convergence in probability of all of the components of the matrix. From the definition of f'(h) we have, for 0 :::;; h :::;; n - 1 , n -h n -h n- h t(h) = n - 1 L x t+h x; - n - 1 X " L x; - n- 1 L xt+hx� t= 1 t= 1 t= 1 ( 1 1 .2.3) 1 (n h)n + - - X " X�. Since EX = 0, we find from Proposition 1 1 .2. 1 that X" = op ( 1 ), n - 1 L�:t X, = op ( 1 ) and n -1 L�:1h Xt+h = oP (l). Consequently we can write f'(h) = r*(h) + op ( 1 ), where ( 1 1 .2.4) " r*(h) = n- 1 I x , + h x; t= 1 " 00 = n -1 L L t=l i = -oo 00 L ci+h zt -i z;_i c; j= -oo Observe that for i # j, the time series {Z,_ ; 1 Z, i 2 , t = 0, ± 1 , . . . } is white noise so that by Theorem 7. 1 . 1 , n -1 L�= 1 zt - i, 1 zt -j, 2 � 0. Applying this _ _ _ 409 § 1 1.2. Estimation of the Mean and Covariance Function argument to the other three components of Z, _ ;Z, _j , we obtain n i # j, n -1 L z, _ ; Z r -j � 0 2 X 2 , i =l where 0 2 x 2 denotes the 2 x 2 zero matrix. Hence for m fixed, For any matrix A define I A I and EA to be the matrices of absolute values and expected values, respectively, of the elements of A. Then E I G!(h) - G! (h ) l l ii Iljl>m ci+h n - 1 t=lf z,_;z;_j c; l i #j .:::; L I C; + h l (n -1 f E I Zr - ; Z j l ) I C) I r= 1 ii #j ljl> m = E I I or ; i _ or The latter bound is independent of n and converges to 0 as m --+ oo . 
Hence lim lim sup E I G!(h) - G! (h) l m-+oo n-+oo which, by Proposition 6.3.9, implies that = 0, G!(h) � 02 X 2 · Now ( �� z, _; z; _ ;) c; G!(h) + � C; + h ( n -1 f Z,z;) c; + 4: Ci + h(n - 1 Un ;) C; t=l f*(h) = G!(h) + � ci + h n -1 = l l where Un ; = .L7�{ - ; z,z; - I7= 1 z,z; is a sum of 2 l i l random matrices if I il < n and a sum of 2n random matrices if I i l 2 n. Hence I I E ,L ci + h(n -1 Un ; ) C; ::;; 2n - 1 L l i i i C; + hl l l: I I C; I i Iii ,; n + 2n -1 L I Ci+ h l l l: I I C; I lil> n and by the absolute summability of the components of the matrices { C; }, this 410 1 1 . M ultivariate Time Series bound goes to zero as n ---> oo. It therefore follows that ( r*(h) = � ci + h n - 1 �� zrz;) c; + op( 1 ). By applying the weak law of large numbers to the individual components of zt z;, we find that n - 1 f. zt z; � t, t= 1 and hence r*(h) � I ci +htc; i = r(h). Consequently, from ( 1 1 .2.4), f(h) � r(h). ( 1 1 .2.5) The convergence of pij(h) to pij(h) follows at once from ( 1 1 .2.5) and Proposition 6. 1 .4. D In general, the derivation of the asymptotic distribution of the sample cross-correlation function is quite complicated even for multivariate moving averages. The methods of Section 7.3 are not immediately adaptable to the multivariate case. An important special case arises when the two component time series are independent moving averages. The asymptotic distribution of p 1 2 (h) for such a process is given in the following theorem. Theorem 1 1 .2.2. Suppose that 00 and xtl = I cxjzt -j. 1 , j = - oo { Zt t } 00 � IID(O, a-?}, xt 2 = I [Jjzt -j, 2 • { Zd IID(O, o'i}, j � - oo where the two sequences { Zt t } and {Z1 2 } are independent, Ij l cxj l < Lj l fJjl < oo . Then if h � 0, � ( }:, fJdh) is AN 0, n - 1 j ) oo and P (j)p2 2 (j) . oo 1 1 If h, k � 0 and h =I= k, then the vector (p d h), p1 2 (k))' is asymptotically normal with mean 0, variances as above and covariance 00 n - 1 L P1 1 (j)p22 (j + k - h). j= - oo PROOF. It follows easily from ( 1 1 .2.3) and Proposition 1 1 .2. 1 that 411 §1 1 .2. Estimation of the Mean and Covariance Function ( 1 1 .2.6) where n n Y tz (h) = n - 1 L: X,+h. 1 X, z = n -1 L: I I rxi+hf3j ZH. 1 z, _j. z · t� 1 i j t� 1 Since Eyf2 (h) = 0, we have n Var(y f2 (h)) n n n - 1 L L L L L L: rxi+ hf3pk+ hf3t E [Zs - i , 1 zs -j, 2 Z, _u Zr -l, 2 ] . s� 1 r � 1 i j k l By the independence assumptions, = - - if s i = t otherwise, ( 1 1 .2.7 ) k and s - j = t - l, so that Applying the dominated covergence theorem to the last expression, we find that ro ( 1 1 .2.8) n Var (y f2 (h)) ---> L y 1 1 ( j )y2 2 (j) as n ---> w. j= - oo Next we show that y f2 (h) is asymptotically normal. For m fixed, we first consider the (2m + h)-dependent, strictly stationary time series, { I Iil s: m L iii S: m rxif3j Zr +h -i , 1 Zr -j, 2 , t = 0, ± 1, . . . }. By Theorem 6.4.2 and the calculation leading up to ( 1 1 .2.8), n n - 1 I L L rxi f3j Zr + h - i , ! Zr -j, Z is AN(O, n - 1 am ), r � 1 l ii S: m lii S: m where Now as m ---> w, am ---> Liy1 1 ( j)y2 2 ( j). Moreover, the above calculations can be used to show that 2 lim lim sup nE y f2 (h) - n -1 rI L L rxif3j Zr+h - i, 1 Zr -j, 2 = 0. m�oo n�oo �J l ii S: m li i S: m 1 I This implies, with Proposition 6.3.9, that y f2 (h) is ( AN 0, n - 1 � k � oo ) Y 1 1 (k) Yzz (k) . ( 1 1 .2.9) I I . Multivariate Time Series 412 Since y 1 1 (0) !.. y 1 1 (0) and y22 (0) !.. 
y22(0), we find from ( 1 1 .2.6), ( 1 1 .2.9) and Proposition 6.3.8 that ( � 2 t\ 2(h) = "Ydh)(y1 1 (O)y2 z (O)f 11 is AN 0, n - 1 j= Finally, after showing that a ) /" (j) pzz(j) . - 00 n Cov (y t2(h), Yt2(k)) --+ L y 1 1 (j)y22(j + k h), j= - oo the same argument, together with the Cramer-Wold device, can be used to establish the last statement of the theorem. D This theorem plays an important role in testing for correlation between two processes. If one of the two processes is white noise then p 1 2 (h) is AN(O, n - 1 ) in which case it is straightforward to test the hypothesis that p 1 2(h) = 0. However, if neither2process is white noise, then a value of I /J1 2 (h)l which is large relative to n - 1 1 does not necessarily indicate that p 1 2 (h) is different from zero. For example, suppose that { Xr 1 } and { X, 2 } are two independent AR( 1 ) processes with p 1 1 (h) = p22(h) = .81hl. Then the asymptotic variance of fJ d h) is n - 1 ( 1 + 2 L k"= 1 (.64)k ) = 4.556n- 1 . It would therefore not be surprising to observe a value of p 1 2(h) as large as 3n - 112 even though { X, I } and { X, 2 } are independent. If on the other hand p 1 1 (h) = .81hl and p22 (h) = ( - .8)1 hl, then the asymptotic variance of p 1 2(h) is .21 95n- 1 and an observed value of 3n - 112 for p12(h) would be very unlikely. Testing for Independence of Two Stationary Time Series. Since by Theorem 1 1 .2.2 the asymptotic distribution of p 1 2 (h) depends on both p 1 1 ( · ) and p22( • ), any test for independence of the two component series cannot be based solely on estimated values of p 1 2(h), h = 0, ± 1 , . . . , without taking into account the nature of the two component series. This difficulty can be circumvented by "prewhitening" the two series before computing the cross-correlations p 1 2(h), i.e. by transforming the two series to white noise by application of suitable filters. If { X, I } and { X, 2 } are invertible ARMA(p, q) processes this can be achieved by the transformations, - 00 n (i) Xr j. Z, ; - " j=L.O i - i where L� o n)0zi = (/bu>(z)jOU>(z), lz l s 1 , and (/6< 0, ou> are the autoregressive and moving average polynomials of the i1h series, i = 1 , 2. Since in practice the true model is nearly always unknown and since the data X,i , t s 0, are not available, it is convenient to replace the sequences {Z,; }, i = 1 , 2, by the residuals { If;;. t = 1 , . . . , n} (see (9.4. 1)) which, if we assume that the fitted ARMA(p, q) models are in fact the true models, are white noise sequences for i = 1 , 2. To test the hypothesis H0 that {X, I } and {X, 2 } are independent series, we § 1 1 .2. Estimation of the Mean and Covariance Function 413 observe that under H0, the corresponding two prewhitened series {Z,I } and { Z,2 } are also independent. Under H0, Theorem 1 1 .2.2 implies that the sample autocorrelations 1\ 2 (h), p 1 2 ( k), h =I k, of { Zr } and {Z,2 } are asymptotically 1 independent normal with means 0 and variances n - 1 • An approximate test for independence can therefore be obtained by comparing the values of l p 1 2 (h)l with 1 .96n-112 , exactly as in Example 7.2. 1 . If we prewhiten only one of the two original series, say { Xr }, then under H0 Theorem 1 1 .2.2 implies 1 that the sample autocorrelations p 1 2 (h), p 1 2 (k), h =I k, of { Z, I } and { X,2 } are asymptotically normal with means 0, variances n - 1 and covariance n-1 p22 (k - h), where p22 ( · ) is the autocorrelation function of {X;2 }. 
Hence for any fixed h, p 1 2 (h) also falls (under H0) between the bounds ± 1 .96n - 112 with a probability of approximately .95. EXAMPLE 1 1 .2. 1 . The sample cross-correlation function p 1 2 ( · ) of a bivariate time series of length n = 200 is displayed in Figure 1 1 . 1 . Without knowing the correlation function of each process, it is impossible to decide if the two processes are uncorrelated with one another. Note that several of the values 2 of p 1 2 (h) lie outside the bounds ± 1 .96n - 11 = ± . 1 39. Based on the sample autocorrelation function and partial autocorrelation function of the first process, we modelled { X, } as an AR(1) process. The sample cross-correlation 1 function p 1 2 ( · ) between the residuals (J.f; 1 , t = 1, . . . , 200} for this model and { X,2 , t 1, . . . , 200} is given in Figure 1 1 .2. All except one of the values p u (h) lie between the bounds ± . 1 39, suggesting by Theorem 1 1 .2.2, that the time = 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0. 1 0 -0. 1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1 -20 -10 0 10 20 Figure 1 1. 1 . The sample cross-correlation function p1 2 (h) between { Xn } and { Xrz }, 1 Example 1 1.2. 1 , showing the bounds ± 1 .96n- 12 • I I . Multivariate Time Series 414 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0. 1 0 -0.1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1 - 20 -10 0 10 20 Figure 1 1 .2. The sample cross-correlation function between the residuals { W, 1 } and { X, 2 }, Example 1 1 .2. 1 , showing the bounds ± 1 .96n - 1 12 • series { Jt; , } (and hence { Xr 1 }) is uncorrelated with the series { X1 2 }. The data for this example were in fact generated from two independent AR(l) processes and the cross-correlations were computed using the program TRANS. = = ExAMPLE 1 1 .2.2 (Sales with a Leading Indicator). In this example we consider the sales data { 1'; 2 , t 1 , . . . , 1 50 } with leading indicator { Y, 1 , t 1 , . . . , 1 50 } given by Box and Jenkins ( 1 976), p. 537. The autocorrelation functions of { 1'; 1 } and { 1';2 } suggest that both series are non-stationary. Application of the operator ( 1 - B) yields the two differenced series {Dr 1 } and {D12 } whose properties are compatible with those of low order ARMA processes. Using the program PEST, it is found that the models and D1 2 - .838D1 _ � , 2 - .0676 = { Z1J } � WN(0, .0779), Z1 2 - .61 021 _ � , 2 , ( 1 1 .2. 1 0) ( 1 1 .2. 1 1 ) {Z1 2 } WN(O, 1 .754), provide a good fit to the series {D, J } and { D, 2 } , yielding the "whitened" series of residuals { Jt; J } and { lt; 2 } with sample variances .0779 and 1 .754 � respectively. The sample cross-correlation function of { D, 1 } and { D,2 } is shown in Figure 1 1 .3. Without taking into account the autocorrelation structures of { D, 1 } and { D,2 } it is not possible to draw any conclusions from this function. § 1 1 .2. Estimation of the Mean and Covariance Function 415 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0. 1 0 -0. 1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 -1 -20 -10 0 10 20 Figure 1 1 .3. The sample cross-correlation function between { D,1 } and { D, 2 }, Example 1 1 .2.2. Examination of the sample cross-correlation function of the whitened series { ft; d and { ft;2 } is however much more informative. From Figure 1 1.4 it is apparent that there is one large sample cross-correlation (between ft; 1 and ft; +3. 2 ) and that the others are all between ± 1 .96n- 1 12 • Under the assumption that { ft; t } and { ft; 2 } are jointly Gaussian, Bartlett's formula (see Corollary 1 1 .2. 
1 below) indicates the compatibility of the cross-correlations with a model for which pd - 3) # 0 and P1 2(h) = 0, h # - 3. The value p 1 2 ( - 3) = .969 suggests the model, ft; 2 = 4.74ft; _ 3• 1 + N,, ( 1 1 .2.1 2) where the stationary noise { N, } has small variance compared with { ft; 2 } and is uncorrelated with { ft;t }. The coefficient 4.74 is the square root of the ratio of sample variances of { ft; 2 } and { ft; 1 }. A study of the sample values of { ft;2 - 4.74 ft; _3. d suggests the model for { N,}, { U,} "' WN(O, .0782). ( 1 + .345B)N, = U,, ( 1 1 .2. 1 3) Finally, replacing { Z,t } and { Z,2 } in ( 1 1 .2. 1 0) and ( 1 1 .2. 1 1 ) by { ft; t } and { ft;2 } and using ( 1 1 .2. 1 2) and ( 1 1 .2. 1 3), we obtain a model relating { D1 1 } , {D,2 } and { U, }, namely, I I . M ultivariate Time Series 416 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -0.1 -0.2 -0.3 -0.4 - 0. 5 -0.6 - 0. 7 -0.8 -0.9 -1 -20 -10 0 10 20 Figure 1 1 .4. The sample cross-correlation function between the whitened series { W, 1 } and { W, 2 }, Example 1 1 .2.2. D, 2 + .0765 = (1 - .6 10B)(1 - .838B) - 1 [4.74(1 - .474B) - 1 D, _ 3, 1 + ( 1 + .345B) - 1 U,]. This model should be compared with the one derived later (Section 1 3. 1) by the more systematic technique of transfer function modelling. (Bartlett's Formula). If {X,} is a bivariate Gaussian process (i.e. if all of the finite dimensional distributions of {(X, 1 , X, 2 )', t = 0, ± 1, . . . } are multivariate normal) and if the autocovariances satisfy Theorem 1 1 .2.3 00 L I Yij (h)l < 00 , h= -oo i, j = 1 , 2, then 00 lim n Cov (p u(h), p u (k)) = L [p 1 1 (j) p22 (j + k - h) + p ,z ( j + k) pz , ( j - h) [See Bartlett (1 955).] - P1 2 (h) { p u (j) P1 2 U + k) + Pzz (j) pz, ( j - k)} - P1 2 (k) { p u (j) P1 2 U + h) + Pzz (j) pz, (j - h)} + p u(h) p1 2 (k) { !p �, (j) + P�z (j) + !P�z (j)} J. §1 1 .3. Multivariate ARMA Processes 417 If {Xr} satisfies the conditions of Theorem 1 1 .2.3, if either { Xt l } or { XtZ } is white noise, and if Corollary 1 1.2.1 . P 1 z (h) = 0, then h ¢ [a, b], lim n Var ( p u(h ) ) = 1 , h ¢ [a, b]. PROOF. The limit is evaluated by direct application of Theorem 1 1 .2.3. D § 1 1 . 3 Multivariate ARMA Processes As in the univariate case, we can define an extremely useful class of multivariate stationary processes {Xr }, by requiring that {Xr } should satisfy a set of linear difference equations with constant coefficients. Definition 1 1 .3.1 (Multivariate ARMA(p, q) Process). {Xn t = 0, ± 1, . . . } is an m-variate ARMA(p, q) process if {Xr } is a stationary solution of the difference equations, Xr - <1> 1 Xr _ 1 - · · · - <l>pXr - p = Zr + 8 1 Zr _ 1 + where <1> 1 , . . . , <I>P , 8 1 , . . . , eq are real m x ··· + eqzt - q' ( 1 1 .3 . 1 ) m matrices and {Zr} � WN(O, t). The equations ( 1 1 .3. 1 ) can be written in the more compact form ( 1 1 .3.2) {Zr} WN(O, t), <I>(B)Xr = 8(B)Zn where <l>(z) := I - <1> 1 z - . . . - <l>p z P and 8(z) := I + 8 1 z + . . . + eq z q are matrix-valued polynomials, I is the m x m identity matrix and B as usual denotes the backward shift operator. (Each component of the matrices <l>(z), 8(z) is a polynomial with real coefficients and degree less than or equal to p, q respectively.) � EXAMPLE 1 1 .3. 1 (Multivariate AR(l) Process). This process satisfies Xr = <I>Xr -1 + Zr, { Zr } � WN(O, t). ( 1 1 .3.3) By exactly the same argument used in Example 3.2. 1 , we can express Xr as 00 i ( 1 1 .3.4) xt = L <I> zt-j , j=O provided all the eigenvalues of <I> are less than 1 in absolute value, i.e. 
provided det(/ - z<l>) # 0 for all z E C such that l z l s 1. ( 1 1 .3.5) 418 1 1 . Multivariate Time Series If this condition is satisfied then the series ( 1 1 .3.4) converges (componentwise) both in mean square and absolutely with probability 1 . Moreover it is the unique stationary solution of(1 1 .3.3). The condition (1 1 .3.5) is the multivariate analogue of the condition 1 ¢' 1 < 1 , required for the existence of the causal representation (1 1 .3.4) in the univariate case. Causality and invertibility of a general ARMA(p, q) model are defined precisely as in Definitions 3.1.3 and 3. 1.4 respectively, the only difference being that the coefficients 1/Ji , ni in the representations X, = L � o t/Ji Z< -i and Z, = L� o niX< -i • are replaced by matrices 'Pi and IIi whose components are required to be absolutely summable. The following two theorems provide us with criteria for causality and invertibility analogous to those of Theorems 3. 1 . 1 and 3.1 .2. Theorem 1 1.3.1 (Causality Criterion). If det <D(z) # 0 for all z E C such that l z l then ( 1 1 .3.2) has exactly one stationary solution, :::;; 1, ro X, = L 'l'j Z r -j • ( 1 1 .3.6) ( 1 1 .3.7) j= O where the matrices 'Pi are determined uniquely by ro i 'l'(z) := L 'l'iz = <D- 1 (z)8(z), j=O lzl :::;; 1. ( 1 1 .3.8) PROOF. The condition (1 1 .3.6) implies that there exists e > 0 such that <D -1 (z) exists for l z l < I + e . Since each of the m 2 elements of <D- 1 (z) is a rational function of z with no singularities in { l z l < 1 + e }, <D - 1 (z) has the power series expansion, i l z l < I + e. <D- 1 (z) = L Aiz = A (z), j= O Consequently A) 1 + e/2)i ---+ 0 (componentwise) as j ---+ oo, so there exists K E (0, oo) independent of j, such that all the components of Ai are bounded in absolute value by K ( 1 + e/2) -i, j = 0, 1, 2, . . . . In particular this implies absolute summability of the components of the matrices Ai. Moreover we have ro A (z)<D(z) = I for l z l :::;; 1 , where I is the (m x m ) identity matrix. By Proposition 3. 1 . 1 , if {X, } is a stationary solution of (1 1 .3.2) we can apply the operator A (B) to each side of this equation to obtain X, = A (B)8(B)Z,. Thus we have the desired representation, §1 1.3. Multivariate ARMA Processes 419 ro xt = L 'f'j zt-j , j=O where the sequence { 'f'i } is determined by ( 1 1 .3.8). Conversely if X1 = L � o 'f'iZt -j with {'f'i } defined by ( 1 1 .3.8) then <D(B)X1 = <D(B)'f'(B)Z1 = 8 (B) Z n showing that {'f'(B)Z1 } is a stationary solution of ( 1 1 .3.6). Combining the results of the two preceding paragraphs we conclude that if det <D(z) i= 0 for I z I s 1, then the unique stationary solution of ( 1 1.3.2) is the causal solution (1 1.3.7). 0 Since the analogous criterion for invertibility is established in the same way (see also the proof of Theorem 3.1.2), we shall simply state the result and leave the proof as an exercise. Theorem 1 1 .3.2 (Invertibility Criterion). If det 8(z) i= 0 for all z E IC such that l z l s 1 , ( 1 1 .3.9) and { X1 } is a stationary solution of ( 1 1 .3.2) then ro ( 1 1 .3.10) zt = I njxt -j , j=O where the matrices nj are determined uniquely by n(z) := I nj z j = e - 1 (z)<D(z), j=O ro lzl s 1. ( 1 1.3. 1 1) PROOF. Problem 1 1 .4. D Remark. The matrices 'f'i and ni of Theorems 1 1 .3. 1 and 1 1 .3.2 can easily be found recursively from the equations j 'f'j = I <D; 'f'j i + ej , i=l - j= 1 , 2, (1 1 .3. 1 2) ..., and j j = 1, 2, . . . , (1 1.3. 1 3) nj = - I e i nj-i - <Dj, i=l where ei = 0, j > q, and <D; = 0, i > p. 
These equations are established by comparing coefficients of z i in the power series identities ( 1 1 .3.8) and (1 1 .3.1 1) after multiplying through by <D(z) and 8 (z) respectively. ExAMPLE 1 1.3.2. For the multivariate ARMA(1 , 1 ) process with <D 1 = U :n 420 and 0 1 = I I . M ultivariate Time Series [ <1>'1 , an elementary calculation using (1 1 .3.8) gives _2 1 .5z(1 + .5z) 'P(z) (1 - .5z) lzl .5z(1 - .5z) 1 - .25z2 = J ' ::o; 1. ( 1 1 .3. 14) The coefficient matrices 'l'j in the representation ( 1 1 .3.7) are found, either by expanding each component of ( 1 1 .3.14) or by using the recursion relation (1 1.3.12), to be '1'0 = I and 'J'.J j =r [. J + 1 2j - 1 ' 2 1 ] j = 1, 2, . . . . It is a simple matter to carry out the same calculations for any multivariate ARMA process satisfying (1 1.3.6), although the algebra becomes tedious for larger values of m. The calculation of the matrices ITj for an invertible ARMA process is of course quite analogous. For numerical calculation of the coefficient matrices, the simplest method is to use the recursions ( 1 1 .3.12) and (1 1.3.13). The Covariance M atrix Function of a Causal ARMA Process From the representation ( 1 1 .3.7) we can express the covariance matrix r(h) = of the causal process (1 1 .3.1) as E(Xt + h X;) = = 00 h 0, ± 1' ± 2, . . . ' I +kt 'l'� , k=O '�'h where the matrices 'Pj are found from (1 1.3.8) or (1 1.3. 1 2) and 'l'j : = 0 for j < 0. It is not difficult to show (Problem 1 1 .5) that there exists B E (O, 1) and a constant K such that the components yij(h) of r(h) satisfy l yij(h)l < Kslhl for all i, j and h. The covariance matrices r(h), h = 0, ± 1 , ± 2, . . . , can be determined by solving the Yule-Walker equations, qh) p j = 0, 1 , 2, . . . , (1 1 .3 . 1 5) r(j) - I <I>,r(j - r) = I: e, t'P;_j, j �rS: q r=l obtained by post-multiplying (1 1.3. 1) by x;_j and taking expectations. The first (p + 1) of the equations (1 1.3. 1 5) can be solved for the components of r(O), . . . , r(p) using the fact that r( - h) r'(h). The remaining equations then give r(p + 1 ), f(p + 2), . . recursively. The covariance matrix generating function is defined (cf. (3.5. 1)) as = . = oo = 00 L f(h)z h, h= which can be expressed (Problem 1 1 .7) as G(z) 'P(z)l:'P'(z- 1 ) <l> - 1 (z)E>(z)l:E> ' (z - 1 ) <l>' - 1 (z - 1 ). G(z) = ( 1 1 .3. 1 6) (1 1.3. 1 7) 421 § 1 1 .4. Best Linear Predictors of Second Order Random Vectors § 1 1.4 Best Linear Predictors of Second Order Random Vectors Let {X, = (Xr 1 , . . . , X,m)', t = 0, ± 1, ± 2, . . . } be an m-variate time series with mean EX, = 0 and covariance function given by the m x m matrix, K(i, j) = E(X i Xj). IfY = ( Y1 , . . . , Ym)' is a random vector with finite second moment, we define (1 1 .4. 1) where Sn = sp{X,i, t = 1, . . . , n; j = 1, . . . , m}. If V = (U 1 , , Um)' is a random vector, we shall say that u E sn if ui E Sn , i = 1, . . . ' m. It then follows from the projection theorem that the vector P(Y I X 1 , . . . , Xn) is characterized by the two properties : . . • ( 1 1 .4.2) and i = 1, . . . , n, ( 1 1 .4.3) where we say that two m-dimensional random vectors X and Y are orthogonal (written X _l_ Y) if E(XY') = Om x m · The best linear predictor of xn + I based on the observations X I , . . . ' xn is obtained On replacing Y by Xn + 1 in ( 1 1 .4. 1), i.e. ifn = 0, Xn + 1 = P(Xn + 1 I X 1 , . . . , Xn), if n ;;:: 1 . � {0, Since x n + I E Sn, there exist m X m matrices <l>n l ' . . . ' <I>nn such that Moreover, from (1 1 .4.3), we have xn + I equivalently, n = 1, 2, . . . . 
- Xn + 1 _j_ ( 1 1 .4.4) x n + 1 - i , i = 1, . . . ' n, or i = 1, . . . , n. (1 1.4.5) When X n + l is replaced by the expression in ( 1 1.4.4), these prediction equations become n i = 1 , . . . , n. L <l>niK(n + 1 j, n + 1 i) = K(n + 1 , n + 1 i), j= I - - - In the case when {X,} is stationary with K(i, j) = r(i - j), the prediction equations simplify to the m-dimensional analogues of (5. 1 .5), i.e. n (1 1 .4.6) i = 1, . . . , n. L: <I>nir(i - j) = r(i), j= I 422 1 1 . Multivariate Time Series The coefficients { <l>nJ may be computed recursively using the multivariate Durbin-Levinson algorithm given by Whittle ( 1 963). Unlike the univariate algorithm, however, the multivariate version requires the simultaneous solution of two sets of equations, one arising in the calculation of the forward predictor, P(X n + 1 I X 1, . . . , Xn), and the other in the calculation of the backward predictor, P(X0 I X 1 , . . . , Xn ). Let <i>n 1 , . . . , <i>nn be m x m coefficient matrices satisfying n = 1, 2, . . . . ( 1 1 .4.7) P(Xo i Xl , · · · X n) = <l>n! X l + · · · + <l>nn X n , , Then from ( 1 1 .4.3), n i = 1 , . . . , n. I <l>nj ru - i) = r( - i), j= l The two prediction error covariance matrices will be denoted by V, = E(Xn + ! - Xn + l )(Xn + l - X n + l )', Vn = E(X0 - P(X0 I X 1 , . . . , Xn))(X0 - P(X0 I X1, . . . , X n))'. Observe from ( 1 1 .4.5) that for n 2 1 , Vn = E[(Xn + 1 - X n + 1 )X � + 1] = r(O) - <l>nl r( - 1) - · · · - <l>nn r( - n) ( 1 1 .4.8) ( 1 1 .4.9) and similarly that ( 1 1 .4. 1 0) We also need to introduce the matrices L\n = E[(Xn + 1 - Xn + 1 )X �] = r(n + 1) - <l>n l r(n) - ··· - <l>nn r(l), ( 1 1 .4. 1 1) and Lin = E[(X0 - P(X0 / X 1 , . . . , X n))X� + 1 ] = r( - n - 1 ) - <l>n 1 r( - n) - · · · - <l>nn r( - 1). ( 1 1 .4. 1 2) Proposition 1 1.4.1 . (The Multivariate Durbin-Levinson Algorithm). Let { X1} be a stationary m-dimensional time series with EX1 = 0 and autocovariance function r(h) = E(Xt + h X;). If the covariance matrix of the nm components of X1, . . . , X n is nonsingular for every n 2 1 , then the coefficients { <l>nJ, { <l>nJ in ( 1 1 .4.4) and ( 1 1 .4.7) satisfy, for n 2 1 , 1 <l>nn = L:\n - 1 vn-- 1 > <linn = Lin- ! v;_l l , <l>nk = <l>n - ! ,k - <l>nn <l>n - ! , n - k • <l>nk = <l>n - l . k - <l>nn<l>n - l . n - k• ( 1 1 .4. 1 3) k = 1, . . . , n - 1, k = 1, . . . , n - 1, § 1 1 .4. Best Linear Predictors of Second Order Random Vectors 423 where V,, Vn, dn, Lin are given by ( 1 1 .4.9H l 1 .4. 1 2) with V0 = V0 = r(O) and PROOF. The proof of this result parallels the argument given in the univariate case, Proposition 5.2. 1 . For n = 1, the result follows immediately from ( 1 1 .4.6) and ( 1 1 .4.8) so we shall assume that n > 1. The multivariate version of(5.2.6) is (1 1 .4. 14) where U = X 1 - P(X 1 1 X2 , . . . , Xn) and A is an m the orthogonality condition x m matrix chosen to satisfy Xn + I - AU .1 U I.e., E(Xn + 1 U') = AE(UU'). ( 1 1.4. 1 5) By stationarity, P(Xn + I I Xz , . . . ' Xn) = <l>n - l . l Xn + . . . + <l>n - J . n - I X2, ( 1 1 .4. 1 6) ( 1 1.4. 1 7) U = X I - <f>n - J . IX 2 - · · · - <f>n - J , n - ! Xn, and E(UU') = Vn - 1· (1 1.4. 1 8) It now follows from ( 1 1 .4.3), ( 1 1 .4.1 1), (1 1 .4. 1 5) and ( 1 1 .4. 1 8) that A = E(Xn + ! U')V,;-_11 = E[(Xn + I - P(Xn + I I Xz , . . . , Xn))U'] vn--1 1 = E[(Xn + l - P(Xn + l i Xz , . . . , Xn))X'l ] V,;-_1 1 = [r(n) - <l>n - l , l r(n - 1) - . . . - <l>n - l , n - l r(1)] V;_l l ( 1 1 .4. 1 9) = dn - 1 v ;_\ . Combining equations (1 1 .4. 14), ( 1 1 .4. 1 6) and ( 1 1 .4. 
1 7), w e have n- I xn + l = A X ! + I (<l>n - l , j - A <f>n - J . n -)Xn + l - j j� I which, together with (1 1 .4. 19), proves one half of the recursions ( 1 1 .4. 1 3). A symmetric argument establishes the other half and completes the proof. D Remark 1 . In the univariate case, r(h) = r( - h), so that the two equations ( 1 1 .4.6) and ( 1 1.4.8) are identical. This implies that <l>nj = <f>nj for all j and n. The equations ( 1 1.4. 1 3) then reduce to the univariate recursions (5.2.3) and (5.2.4). 1 1 . Multivariate Time Series 424 If for a fixed p 2 1, the covariance matrix of (X� + 1 , . . . , X'd' is nonsingular, then the matrix polynomial <l>(z) = I - <l>P 1 z - · · · - <l>PP zP is causal in the sense that det <l>(z) -:f. 0 for all z E IC such that I z I :s;; 1 (cf. Problem 8.3). To prove this, let { TJ,} be the stationary mp-variate time series Remark 2. [ ] x' TJ , = ?- 1 Xt 1 X, . Applying Proposition 1 1.4. 1 to this process with n = 1, we obtain lJz = lJz - lJ z + lJ z where lJ z with M = E(l) 2 TJ� )[E(l) 1 l)'1 )] - = 1 P(TJ z l lJ 1 ) = MTJ 1 and lJz - lJ z ..l (1 1.4.20) lJ1 · It is easily seen, from the composition of the vectors stationarity, that the matrix M has the form M= <l>p1 <l>p 2 I 0 <l>p. p - 1 0 <l>pp 0 0 I 0 0 l) 2 and l) 1 and ( 1 1 .4.21) 0 0 0 0 1 and since det(zi - M) = zmP det(<l>(z - )) (see Problem 1 1.8), it suffices to show that the eigenvalues of M all have modulus less than one. Let r = E(TJ 1l)'1 ), which is positive definite by assumption, and observe that from the orthogonality relation (1 1 .4.20), E(TJ z - fJ z )(TJz - f] 2 )' = r - MrM' . If A is an eigenvalue of M with corresponding left eigenvector a, i.e. = A.a* where a* denotes the complex-conjugate transpose of a, then a* M E l a* (TJ 2 - f] 2 ) 1 2 = a*ra - a*MrM'a = a*ra - l -1 1 2 a*ra = a*ra( 1 - 1 -1 1 2 ). Since r is positive definite, we must have 1 -1 1 since this would imply that a* (TJ2 - f] 2 ) :s;; 1 . The case I -1 1 = 1 is precluded = 0, § 1 1.4. Best Linear Predictors of Second Order Random Vectors 425 which in turn implies that the covariance matrix of (X� + 1 , . . . , X'1 )' is singular, a contradiction. Thus we conclude that det <l>(z) =P 0 for all I z I s 1 . We next extend the innovations algorithm for computing the best one-step predictor to a general m-variate time series with mean zero. From the definition of Sn , it is clear that Sn = t sp { X i - X1i , j = 1 , . . . , m; t = 1 , . . . , n}, so that we may write n x n + l = L 0nj(Xn + l �j - xn + l �), j= I where { eni,j = 1 , . . . , n} is a sequence of m x m matrices which can be found recursively using the following algorithm. The recursions are identical to those given in the univariate case (Proposition 5.2.2) and, in contrast to the Durbin-Levinson recursions, involve only one set of predictor coefficients. 0 Proposition 1 1 .4.2 (The Multivariate Innovations Algorithm). Let { X1} be an m-dimensional time series with mean EX1 = for all t and with covariance function K (i, j) = E(X ; Xj). If the covariance matrix of the nm components of X I , . . . , xn is nonsingular for every n 2: 1 , then the one-step predictors xn + I , n 2: 0, and their prediction error covariance matrices V,, n 2: 1 , are given by if n = 0, if n ?: 1 , ( 1 1 .4.22) and V0 = K(l, 1) k�l en,n �k = K(n + 1, k + 1) - .L en,n�j v; e �.k�j v,.� J , J=O ( ) k = 0, n�J vn = K (n + 1 , n + 1) - L en,n�j v; e �. n�j · j=O . . . , n - 1, ( 1 1 .4.23) (The recursions are solved in the order V0; 0 1 1 , V1 ; 022, 02 1 , V2; 033, 832, 8 3 1 , V3; . . 
.) · PROOF. For i < j, X; - X; E Sj � J and since each component of xj - xi is orthogonal to sj�l by the prediction equations, we have (X; - X;) .l (Xj - X) if i =P j. ' ( 1 1 .4.24) Post multiplying both sides of ( 1 1 .4.22) by (Xk + 1 - X k + 1 ) , 0 s k s n, and I I. M ultivariate Time Series 426 taking expectations, we find from ( 1 1 .4.24) that E �n+t (Xk+t - �k+t )' E> - k V, . Since (Xn+t - �n+ t ) .1 (Xk+t - �k+t ) (see ( 1 1 .4.3)), we have EXn+t (Xk+t - �k+t )' = E�n+t (Xk+t - �k+t ) = E> n-k V, . ( 1 1 .4.25) Replacing Xk+t in ( 1 1 .4.25) by its representation given in ( 1 1 .4.22), we obtain k -1 e... - k v, = K (n + 1 , k + 1) - L E Xn+l (Xj+l - �j+l )'E>�.k-j• j=O which, by ( 1 1 .4.25), implies that k -1 e... -k v, = K(n + 1, k + 1) - I e... -j J.jE>�.k-j· j=O Since the covariance matrix of X 1 , . . . , x. is nonsingular by assumption, V, is nonsingular and hence k -1 1 e... - k = K (n + 1 , k + 1 ) - .Io e . . . -j J.jE>�.k-j v,- . ,= = •. • ' ( •. ) Finally we have n -1 x n+l - �n+l + L e ... -j (Xj+l - �j+l ), j=O which, by the orthogonality of the set {Xi - �i,j = 1, . . . , n + 1 }, implies that n -1 L en , n -j J.jE> �. n -j K(n + 1, n + 1) = v.. + j=O as desired. D x n+l = Recursive Prediction of an ARMA(p, q) Process Let {X1 } be an m-dimensional causal ARMA(p, q) process { Z1 } WN(O, !), <l>(B)X1 = E> (B)Z1, where <I>( B) = I - <1> 1 B - . . · - <l>P B P, E> (B) = I + 0 1 B + . . · + E>q Bq, det l: # 0 and I is the m x m identity matrix. As in Section 5.3, there is a sub­ stantial savings in computation if the innovations algorithm is applied to the transformed process � { wt = x, WI = <l>(B)Xt, t = t > 1 , . . . , max(p, q), max(p, q), ( 1 1 .4.26) rather than to {X1 } itself. If the covariance function of the {X1} process is denoted by f( ' ), then the covariance function K ( i, j ) = E (WiW)) is found to be 427 § 1 1 .4. Best Linear Predictors of Second Order Random Vectors if 1 ::;; i ::;; j ::;; f (i - j) p f (i - j) - L <D, f (i + r - j) if 1 ::;; i ::;; r=1 K(i,j) = < j ::;; 21, ( 1 1 .4.27) if I < i ::;; j ::;; i + q, if I < i and i + q 0 = I I, ifj K'(i,j) < < j, i, where I max(p, q) and by convention 0j = om X m for j > q. The advantage of working with this process is that the covariance matrix is zero when I i - jl > q, i, j > I. The argument leading up to equations (5.3.9) carries over practically verbatim in the multivariate setting to give � xn+1 = { ;: if 1 ::;; 0 ni(Xn+1 -i - X n+1-) n ::;; l, <1> 1 X" + · · · + <DpXn+1 - p + if_1 0 niX n+1-i - Xn+1 -i) if n > l, q � ( 1 1 .4.28) and E (Xn+1 - Xn+1 ) (Xn+1 - Xn+1 ) = V, , 1 , . . . , n and V, are found from ( 1 1 .4.23) with K(i,j) as in ' where 0"i' j = ( 1 1 .4.27). Remark 3. In the one dimensional case, the coefficients ()"i j = 1 , . . . , q do not depend on the white noise variance a2 (see Remark 1 of Section 5.3). However, in the multivariate case, the coefficients 0ni of X n+1 -i - Xn+1 -i will typically depend on 1. ' In the case when {Xr} is also invertible, xn+ 1 - xn + 1 approximation to zn+1 for n large in the sense that Remark 4. IS an E (X n+1 - Xn+1 - Zn+d (Xn+ 1 - X n+ 1 - Zn+ 1 )' � 0 as n � oo . It follows (see Problem 1 1. 1 2) that as n� j = oo , 1, . . . , q, and EXAMPLE 1 1 .4. 1 (Prediction of an ARMA(l, 1 )). Let X1 be the ARMA(1 , 1 ) process ( 1 1 .4.29) { Z1 } � WN(O, 1) with det(/ - <Dz) # 0 for I z I ::;; 1 . From ( 1 1 .4.28), we see that n � 1. ( 1 1 .4.30) 1 1 . 
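The recursions of Proposition 11.4.2 and the predictor formula (11.4.22) translate directly into code. The sketch below is a minimal illustration, not the ITSM/PEST software used in the text; the covariance kernel K at the end is a hypothetical bivariate MA(1) chosen only to exercise the algorithm, and the function names are ours.

```python
import numpy as np

def innovations(K, N):
    """Multivariate innovations algorithm (Proposition 11.4.2).
    K(i, j): m x m covariance matrix E[X_i X_j'], with 1-based indices.
    Returns Theta[n][j] (j = 1..n) and V[n] (n = 0..N-1)."""
    V = [np.array(K(1, 1), dtype=float)]
    Theta = [dict()]
    for n in range(1, N):
        Theta.append(dict())
        for k in range(n):                     # solved in the order Theta_{n,n}, ..., Theta_{n,1}
            A = np.array(K(n + 1, k + 1), dtype=float)
            for j in range(k):
                A -= Theta[n][n - j] @ V[j] @ Theta[k][k - j].T
            Theta[n][n - k] = A @ np.linalg.inv(V[k])
        Vn = np.array(K(n + 1, n + 1), dtype=float)
        for j in range(n):
            Vn -= Theta[n][n - j] @ V[j] @ Theta[n][n - j].T
        V.append(Vn)
    return Theta, V

def one_step_predictors(X, K):
    """One-step predictors X-hat_{n+1}, n = 0..N-1, from (11.4.22); X is an N x m array."""
    N, m = X.shape
    Theta, V = innovations(K, N)
    Xhat = [np.zeros(m)]
    for n in range(1, N):
        Xhat.append(sum(Theta[n][j] @ (X[n - j] - Xhat[n - j]) for j in range(1, n + 1)))
    return np.array(Xhat), V

# Hypothetical bivariate MA(1): X_t = Z_t + Theta1 Z_{t-1}, {Z_t} ~ WN(0, Sigma).
Theta1 = np.array([[0.4, 0.2], [0.0, 0.3]])
Sigma = np.eye(2)
def K(i, j):
    if i == j:
        return Sigma + Theta1 @ Sigma @ Theta1.T
    if i == j + 1:
        return Theta1 @ Sigma            # E[X_{j+1} X_j'] = Theta1 Sigma
    if j == i + 1:
        return Sigma @ Theta1.T
    return np.zeros((2, 2))

Theta_mats, V = innovations(K, 20)
# For this invertible MA(1), V[n] should approach Sigma and Theta_mats[n][1]
# should approach Theta1 as n grows (compare the behaviour seen in Table 11.1).
```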
M ultivariate Time Series 428 The covariance function for the process {W1} defined by ( 1 1.4.26) is given by K(i,j) = r(O), te', t + ete', 0, K'(i,j), i, j = 1, 1 :::;; i, j = i + 1, 1 < i = j, 1 :::;; i, j > i + 1, j < i. { As in Example 5.3.3, the recursions in ( 1 1.4.23) simplify to V0 = r(O), ( 1 1 .4.3 1) en! = et v,-::.\ , ' + = t ete - en! V.. - 1 e� l · v, In order to start this recursion, it is necessary first to compute r(O). From (1 1 .3.15) we obtain the two matrix equations r(O) - <I>r'(l) = t + et(<I>' + e'), r( 1 ) - <I>r(O) = e t. Substituting r( l ) = <l>r(O) + et into the first expression, we obtain the single matrix equation, (1 1.4.32) r(O) - <I>r(O)<I>' = <I>te' + e t<I>' + t + ete', which is equivalent to a set of linear equations which can be solved for the components of r(O). Ten observations X 1 , . . . , X 1 0 were generated from the two-dimensional ARMA(1, 1) process + J [xt l ] [·7 o][xr- 1 . 1 ] = [ztl ] [ _ .5 .6 [zzt- l , l ] 8 _ x1 2 o .6 xr - 1 . 2 Z1 2 .7 . r-1 . 2 ( 1 1 .4.33) where { ZI } i s a sequence of iid N([8J , u l -� l ] ) random vectors. The values of xn + j, v, and en ! for n = 0, 1 , . . . ' 10, computed from equations ( 1 1 .4.30}-­ ( 1 1 .4.32), are displayed in Table 1 1. 1 . Notice that the matrices V, and en 1 are converging rapidly to the matrices t and e , respectively. Once X l , . . . ' xn are found from equations ( 1 1.4.28), it is a simple matter to compute the h-step predictors of the process. As in Section 5.3 (see equations (5.3.1 5)), the h-step predictors Ps"Xn+h ' h 1, 2, . . . , satisfy n+h -1 Xn h j - xn + h -J, en h jL =h + -l.A + = h > l-n § 1 1 .4. Best Linear Predictors of Second Order Random Vectors 429 Table 1 1 .1. Calculation of X. for Data from the ARMA(1, 1) Process of Example 1 1 .4. 1 n 0 2 3 4 5 6 7 8 9 xn +l [ - 1 .875 ] [ - 2.1 .693 5 1 8] - .030 [ - 3.002] 1 .057 [ -- 2.454 ] - 1 .038 [ - 1. 1 19] [ --1.086 .720] - .455 [ - 2.738] .962 [ - 2.565 ] 1 .992 [ - 4.603] 2.434 [ - 2.689 ] 2. 1 1 8 10 [7.240 3.701 [2.035 [ 11 .060 .436 .777 [1.215 . 14 1 [ 1 .740 .750 [1 . 1 1 3 [ 1 .744 .085 .728 [ 1 .059 [ 1 .721 .045 .722 [ 1 .038 [ 1 .721 .030 v, ] ] ] ] ] ] ] ] ] ] ] 3.701 6.7 1 6 1 .060 2.688 .777 2.323 .740 2.238 .750 2. 177 .744 2.1 19 .728 2.084 .721 2.069 .722 2.057 .721 2.042 .717 .7 1 7 2.032 e. , [ .01 3 [ - .. 142 193 - .3 5 1 [ .345 .426 [ - .424 [ - .5.4421 2 .580 [ - .446 .610 [ - .461 .623 [ - .475 .639 [ - .480 [ - .48.6571 �n+l ] ] ] ] ] ] ] ] ] ] .224 .243 .502 .549 .554 .61 7 .555 .662 .562 .707 .577 .735 .585 .747 .586 .756 .587 .767 .59 1 - .666 .775 [�] [ - .958] 1 .693 ] [ - 2.930 - .4 1 7 [ - 2.48 1 ] [ -- 11.000 .728 ] - .662 [ - .073 ] [ - 1.304 .001 ] .331 [ - 2.809 ] 2.754 [ - 2. 1 26] .463 [ - 3.254 ] 4.598 [ - 3.077] - 1.029 where for fixed n, the predictors P8" Xn+l , P8"Xn+2, P8" Xn +J • . . . are determined recursively from ( 1 1 .4.34). Of course in most applications n > l = max(p, q), in which case the second of the two relations in (1 1 .4.34) applies. For the ARMA(1, 1) process of Example 1 1 .4. 1 we have for h ;;::: 1, = [(.7) h - 1 x�n +l . l . (.6)h - l xn +l . 2 J More generally, let us fix n and define g(h) := P8" Xn+ h · Then g(h) satisfies the multivariate homogeneous difference equation, g(h) - <l> l g(h - 1) - . . . - <l>p g(h - p) = 0, for h > q, (1 1.4.35) 430 1 1. M ultivariate Time Series with initial conditions, i = 0, . . . , p - 1 . 
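A short sketch of how (11.4.35) is iterated in practice: given the first few predictors, the remaining h-step predictors follow from the matrix recursion. The diagonal coefficient matrix below is taken from the model (11.4.33); the starting value used for the one-step predictor is a hypothetical placeholder, and the function name is ours.

```python
import numpy as np

def h_step_predictors(Phi, initial, h_max):
    """Iterate the homogeneous difference equation (11.4.35),
    g(h) = Phi_1 g(h-1) + ... + Phi_p g(h-p)  for h > q,
    given the first max(p, q) predictors g(1), g(2), ... in `initial`."""
    p = len(Phi)
    g = [np.asarray(v, dtype=float) for v in initial]     # g[h-1] stores P_{S_n} X_{n+h}
    for h in range(len(g) + 1, h_max + 1):
        g.append(sum(Phi[i] @ g[h - 2 - i] for i in range(p)))
    return g

# For the ARMA(1,1) process (11.4.33), Phi = diag(.7, .6) and q = 1, so the
# recursion reproduces P_{S_n} X_{n+h} = Phi^(h-1) X-hat_{n+1} for h >= 1.
Phi1 = np.diag([0.7, 0.6])
g = h_step_predictors([Phi1], [np.array([1.0, -0.5])], h_max=4)
# g[3] equals [0.7**3 * 1.0, 0.6**3 * (-0.5)]
```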
By appealing to the theory of multivariate homogeneous difference equations, it is often possible to find a convenient representation for g(h) and hence Ps, Xn + h by solving ( 1 1 .4.35). § 1 1.5 Estimation for Multivariate ARMA Processes If { X, } is a causal m-variate ARMA(p, q) process, X , - <11 1 X,_ 1 - · · · - <DpXr-p = Z, + 0 1 Z,_ 1 + ··· + eqzr-q , ( 1 1 .5. 1 ) where {Z1 } � WN(O, t), then the Gaussian likelihood of {X 1 , . . . , X. } can be determined with the aid of the multivariate innovations algorithm and the technique used in Section 8.7 for the univariate case. For an arbitrary m-variate Gaussian process {X1 } with mean and covariance matrices 0 K (i,j) = E (X;Xj), we can determine the exact likelihood of {X 1 , , Xn } as in Section 8.6. Let X denote the nm-component column vector of observations, X := (X '1 , , X�)' and let X := (X '1 , , X� )' where X 1 , , x . are the one-step predictors defined in Section 1 1 .4. Assume that r. := E (XX') is non-singular for every n and let eik and J!j be the coefficient and covariance matrices defined in Proposition 1 1.4.2, with 0;o = I and 0ii = 0, j < 0, i = 0, 1 , 2, . Then, introducing the (nm x nm) matrices, • . . • • • • . . . • • ... (1 1 .5.2) and D = diag { V0, . . . , V, _ J }, (1 1 .5.3) we find by precisely the same steps as in Section 8.6 that the likelihood of { X 1 , . . . , Xn } is n - 1 12 1 n exp - - L (Xj - XJ' J!f=Uxj - XJ ' L(r.) (2n) - nmf2 n det J!f- 1 2 j=l J=l ( 1 1 .5.4) = ( ) { } where the one-step predictors xj and the corresponding error covariance matrices Tj _ 1 ,j = 1 , . . . , n, are found from Proposition 1 1.4.2. Notice that the calculation of L(r.) involves operations on vectors and square matrices of dimension m only. To compute the Gaussian likelihood of {X 1 , . . . , X. } for the ARMA process ( 1 1 .5. 1 ) we proceed as in Section 8.7. First we introduce the process {W, } §1 1 .5. Estimation for Multivariate ARMA Processes 431 defined by ( 1 1 .4.26) with covariance matrices K(i,j) = E(W; Wj) given by ( 1 1 .4.27). Applying the multivariate innovations algorithm to the transformed process {W,} gives the coefficients ejk and error covariance matrices V; in the representation of ( 1 1 .4.28) of Xj . Since xj - xj = wj - Wj, j 1 , 2, . . . , i t follows from ( 1 1 .5.4) that the Gaussian likelihood L (<D, 0, t) of { X r. . . . , Xn } can be written as n nm det L(<D, 0, t) = (2n) - fZ ( 1 1 .5.5) +1 (})1 liJ- 1 )-1;2 = where Xj is found from ( 1 1.4.28) and ejb V; are found by applying Proposition 1 1 .4.2 to the covariance matrix ( 1 1 .4.27). In view of Remark 3 of Section 1 1 .4, it is not possible to compute maximum likelihood estimators of <D and 0 independently of t as in the univariate case. Maximization of the likelihood must be performed with respect to all the parameters of <D, 0 and t simultaneously. The potentially large number of parameters involved makes the determination of maximum likelihood estimators much more difficult from a numerical point of view than the corresponding univariate problem. However the maximization can be per­ formed with the aid of efficient non-linear optimization algorithms. A fundamental difficulty in the estimation of parameters for mixed ARMA models arises from the question of identifiability. The spectral density matrix of the process ( 1 1 .5.1) is f(w) = _!_ c!> - 1 (e - iw)e(e - iw)t e'(e iw)c!> ' - 1 (e iw). 
2n The covariance matrix function, or equivalently the spectral density matrix function f( · ), of a causal invertible ARMA process does not uniquely determine t, ci>( · ) and 8( · ) unless further conditions are imposed (see Dunsmuir and Hannan ( 1976)). Non-identifiability of a model results in a likelihood surface which does not have a unique maximum. The identifiability problem arises only when p > 0 and q > 0. For a causal autoregressive or invertible moving average process, the coefficient matrices and the white noise covariance matrix t are uniquely determined by the second order properties of the process. It is particularly important in the maximum likelihood estimation of multivariate ARMA parameters, to have good initial estimates of the parameters since the likelihood function may have many local maxima which are much smaller than the global maximum. Jones ( 1984) recommends initial fitting of univariate models to each component of the series to give an initial approximation with uncorrelated components. Order selection for multivariate ARMA models can be made by minimizing I I . Multivariate Time Series 432 a multivariate analogue of (9.3.4), namely 2 ln L(<l> 1 , . . . , <l>p, 0 1 , . . , eq , I) + 2(k + 1 )nmj(nm - k 2), where k = (p + q)m 2 . Spectral methods of estimation for multivariate ARMA parameters are also frequently used. A discussion of these (as well as some time domain methods) is given in Anderson ( 1 980). AICC = - . - Estimation for Autoregressive Processes Using the Durbin-Levinson Algorithm There is a simple alternative estimation procedure, based on the multivariate Durbin-Levison algorithm, for fitting autoregressions of increasing order. This is analogous to the preliminary estimation procedure for autoregressions in the univariate case discussed in Section 8.2. Suppose we have observations x l > . . . , x. of a zero-mean stationary m-variate time series and let f(O), . . . , f'(n - 1 ) be the sample covariance function estimates. Then the fitted AR(p) process (p < n) is = Where the COefficientS <Dp l' . . , <DPP and Vp are COmputed recursively from Proposition 1 1 .4. 1 with r(h) replaced by f'(h), h 0, . . . , n - 1 . The order p of the autoregression may be chosen to minimize AICC - 2 In L(<I>P 1 , . . . , <I>PP ' VP ) + 2(pm2 + 1)nmj(nm - pm2 - 2). . = ExAMPLE 1 1 .5. 1 (Sales with a Leading Indicator). In this example we fit an autoregressive model to the bivariate time series of Example 1 1 .2.2. Let xtl and xr 2 = = ( 1 - B) ¥; 1 - .0228, t = 1, . . . , 149, ( 1 - B) ¥; 2 - .420 t = 1 , . . . , 1 49, where { Y;d and { ¥, 2 }, t = 0, . . . , 149, are the leading indicator and sales data respectively. The order of the minimum AICC autoregressive model for (X1 1 , X1 2 )', computed using the program ARVEC, is p = 5 with parameter estimates given by - . 1 92 - .0 1 8 - .5 1 7 .024 - .073 .010 <!> 1 , <1> 2 = <1> = , 3 5 .047 .250 5 5 .01 9 .05 1 4.678 .207 - <!> 54 - - [ [ - .032 3.664 ] - .009 ] , .004 <D 55 = [ [ ] - , .022 .01 1 1.300 .029 , ] [ [ .076 v = 5 - .003 ] - .003 ] , .095 433 §1 1 .5. Estimation for Multivariate ARMA Processes and AICC = 1 14.94. Since the upper right component of each of the coefficient estimates is near 0, we may model the { X, 1 } process separately from { X, 2 }. The MA(1) model { U,} WN(O, .0779) ( 1 1 .5.6) X1 1 = (1 - .414B)U,, "' provides an adequate fit to the series { X, d. Inspecting the bottom row of the coefficient matrices, <1> 5 1 , . . . 
, <!> 5 5 , and deleting those elements which are near 0, we arrive at the approximate relation between {X,d and {X, 2 } given by X, 2 = .250X, _ 2 , 2 + .207X, _ 3, 2 + 4.678X, _ 3, 1 + 3.664X, _ 4 , 1 + 1.300X, _ 5, 1 + W, or, equivalently, x, 2 = 4.678B 3 (1 + .783B + .278B2 ) X1 1 + ( 1 - .250B 2 - .207B 3) _ 1 w, ( 1 - .250B 2 - .201B 3 ) ( 1 1.5. 7) where { W,} WN(O, .095). Moreover, since the estimated noise covariance matrix is essentially diagonal, it follows that the two sequences {X,d and { W,} are uncorrelated. This reduced model ( 1 1.5.6) and ( 1 1 .5.7) is an example of a transfer function model which expresses the output series { X, 2 } as the output of a linear filter with input {X,d plus added noise. The model ( 1 1.5.6) and ( 1 1 .5.7) is similar to the model found later in Section 13.1 (see (13.1.23)) using transfer function techniques. Assuming that the fitted AR(5) model is the true model for {X, := (Xt l , Xd'}, the one- and two-step ahead predictors of x l SO and x l 5 1 are "' = and = [ ] . 163 - .2 1 7 [ ] - .027 .816 ' with error covariance matrices - .003 .095 ' ] = [ .0964 - .0024 - .0024 . . 0953 ] 1 1 . Multivariate Time Series 434 Forecasting future values of the original data Y1 = ( }; 1 , ¥; 2)' is analogous to the forecasting of univariate ARIMA models discussed in Section 9.5. Let P 1 49( · ) denote the operator P( · 1 1, Y 0 , . . . , Y 1 49) where 1 = (1, 1)' and assume, as in the univariate case, that Yo j_ X I , . . . ' x 1 49 • Then, defining sn as in (1 1.4. 1), we find (see Problem 1 1.9) that = [ ] [ ] [ ] [ ] = [ ] [ ] [ ] [ ] .0228 . 1 63 1 3.4 1 3.59 = + + .420 262.7 262.90 - .217 and .0228 - .027 + .420 .8 16 with error covariance matrices + 13.59 1 3.59 262.90 = 264. 14 ' - .003 .095 ] and ] E[(Y1 s 1 - P1 49Y l s l)(Y l s l - P 1 49 Y 1 s 1YJ = (I + <l>sdVs (I + <l>s1Y + Vs .094 - .003 = . - .003 .181 [ These predicted values, computed using the program ARVEC, are i n close agreement with those obtained from the transfer function model of Section 1 3 . 1 (see (13.1.27) and ( 1 3. 1.28). Although the two models produce roughly the same prediction mean squared errors for the leading indicator data, the AR model gives substantially larger values for the sales data (see (13.1 .29) and (13.1.30)). § 1 1.6 The Cross Spectrum Recall from Chapter 4 that if { X1 } is a stationary time series with absolutely summable autocovariance function y( · ), then { X1 } has a spectral density (Corollary 4.3.2) given by 1 00 f(A_) = - L e- ihA y (h), 2n h = - oo -n ::::; A ::::; n, (1 1.6. 1) 435 § 1 1 .6. The Cross Spectrum and the autocovariance function can then be expressed as y (h) = J:, . e ih Aj(A) dA. ( 1 1 .6 2) By Theorem 4.8.2, the process { X, } has a corresponding spectral representation, X, where {Z(A), -n :s; = J(-n,n] ( A :s; e i'A dZ(A), (1 1 .6.3) n} is an orthogonal increment process satisfying the latter expression representing the contribution to the variance of {X, } from harmonic components with frequencies in the interval (A 1 , }, 2 ]. In this section we shall consider analogous representations for a bivariate stationary time series, X, = (Xr 1 , X, 2 )', with mean zero and covariances yij(h) = E(Xr+ h , i Xtj) satisfying cY) (1 1 .6.4) i, j = 1, 2. I I Yij(h) l < 00 , h= Although we shall confine our discussion to bivariate time series, the ideas can easily be extended to higher dimensions and to series whose covariances are not absolutely summable (see Section 1 1 .8). -ro 0 Definition 1 1.6.1 (The Cross Spectrum). 
If {X, } is a stationary bivariate time series with mean and covariance matrix function r( · ) satisfying ( 1 1 .6.4), then the function 1 � - ihA e Y1 2 (h), ), E [ - n, n], fdA) = 2n h=� oo is called the cross spectrum or cross spectral density of {X, 1 } and { X, 2 }. The matrix /1 1 (h) ft 2 (h) j(A) = __1_ I e -ih Ar(h) = 2n h= - oo /2 1 (h) f22 (h) is called the spectral density matrix or spectrum of {X,}. [ J The spectral representations of yij(h) and r(h) follow at once from this definition. Thus i, j = 1, 2, and ( 1 1 .6.5) 1 1 . Multivariate Time Series 436 The function fi i ( · ) is the spectral density of the univariate series { X1; } as defined in Chapter 4, and is therefore real-valued and symmetric about zero. However since yii( · ), i =F j, is not in general symmetric about zero, the cross spectrum J;j ( · ) is typically complex-valued. If { Z;(A), - n ::::; A ::::; n } is the orthogonal increment process in the spectral representation of the univariate series { Xti }, then we know from Chapter 4 that X1 ; = I J(-1t,1t] e itl dZ;(A) (1 1 .6.6) and ( 1 1 . 6. 7 ) the latter being an abbreviation for f1� J;;(A) dA = E I Z;(A 2 ) - Z;(Adl 2 , - n ::::; A 1 ::::; A- 2 ::::; n. The cross spectrum J;j(A) has a similar interpretation, namely k(A.) dA. = E(dZ;(A) dZp)), ( 1 1 .6.8) which is shorthand for J1� J;)A.) dA = E [(Z;(A 2 ) - Z;(A 1 )) (Z)A2 ) - Zp1 ))], - n ::::; A I ::::; A z ::::; n. As shown in Section 1 1 .8, the processes { z l (A.)} and { Z2 ().) } have the additional property, E(dZ;(A) dZiJ-t) ) = 0 for A =F J1 and i, j = 1 , 2. The relation (4. 7.5) for univariate processes extends in the bivariate case to i, j = 1, 2, ( 1 1 .6.9) for all functions g and h which are square integrable with respect to J;; and jjj respectively (see Remark 1 of Section 1 1 .8). From ( 1 1 .6.8) we see that f2 1 (A) !1 z (A). This implies that the matrices f(A.) are Hermitian, i.e. that j().) = f *(A ) where 2 * denotes complex conjugate transpose. Moreover if a = (a 1 , a 2 )' E C then a*f(A.)a is the spectral density of {a*X1 }. Consequently a *f(A.) a ;::::: 0 for all 2 a E C , i.e. the matrix f(A.) is non-negative definite. The correlation between dZ 1 (A) and dZ 2 (A) is called the coherency or coherence, %u(A), at frequency A. From ( 1 1 .6.7) and ( 1 1 .6.8) we have ( 1 1 .6. 1 0) %u(A.) = ju(A)/[ f1 1 (A)jzz (A.)] 1 1 2 . By the Cauchy-Schwarz inequality, the squared coherency function I Xu{AW satisfies the inequalities, = -n ::::; A ::::; n, and a value near one indicates a strong linear relationship between dZ1 (),) and dZ2 (A). 437 § 1 1 .6. The Cross Spectrum Since /1 2 (A.) is complex-valued, it can be expressed as f1 2 (A.) = c dA.) - iq dA.), where and q dA.) = - Im { f1 2 (A.) } . The function c 1 2 (A.) is called the cospectrum of {Xt d and { Xt 2 }, and q 1 2 (A.) is called the quadrature spectrum. Alternatively f1 2 {A.) can be expressed in polar coordinates as where 1 adA.) = (c f 2 (A.) + qi 2 (A.)) 12 is called the amplitude spectrum and r/>dA.) = arg (c 1 2 (A.) - iq dA.)) E ( - n, n], the phase spectrum of {X t d and { Xt 2 } . The coherency is related to the phase and amplitude spectra by X"1 2 (A.) = a d A.)[fl l (A.)fn(A.)r 1 12 e x p[i¢ dA.)] = f X'dA.) f ex p[icf> 1 2 (A.)]. EXAMPLE 1 1 .6. 1 . Let {X t } be the process defined in Example 1 1 . 1 . 1 , i.e. where { Zt } � WN (0, 1 ). Then f( A.) = __!__ [r( - 1 0) e l O iJc + r(O) + r(10)e - ! O i Jc J 2n and 1 /1 2 (A.) = - [ 1 + .75 cos(10A.) 
+ .75i sin(1m)] 2n = a 1 2 (J�)exp [ir/> 1 2 (A.)], where the amplitude spectrum a 1 2 (A.) is a dA.) = 1 [ 1 .5625 + 1 .5 cos( l OA.)] 1 i2 , 2n and tan r/>1 2 (A.) = .75 sin(10A.)/[1 + .75 cos(10A.) ]. 1 1. Multivariate Time Series 438 Since f1 1 ()o) (2n) - 1 and f22 (A) = (2n) - 1 (1.5625 + 1 .5 cos(1 0A)), the squared coherency is = - n :;:::; A :;:::; n . The last result is a special case of the more general result that l ./f'u(A) I 2 1, - n :;:::; A :;:::; n, whenever {X t t } and {Xd are related by a time-invariant linear filter. Thus if Remark 1 . = X, 2 L tf!i Xt -j, 1 j= where Li I t/li l < oo , then by Theorem 4. 1 0. 1 , X, 2 GO = � oo = (4. t/Ji e - ii'- ) e;,;_ dZ1 J(-1t,1t] l (A). Hence dZ2 (A) = Li tf!i e - ii'- dZ1 (A), - n :;:::; A :;:::; n. Since dZ2 (A) and dZ 1 (A) are linearly related for all ) the squared absolute correlation between dZ 1 (A) and dZ2(A), i.e. l ffdA) I 2 , is 1 for all A. This result can also be obtained by observing that J o, E (X, + h , 2 X,d = = whence l l;, tf!i e i<r+ h -i)'- dZ1 (A) l e i''- dZ1 (A) J E [ J(-1t,1t] J(-1t, 1t] J:" (� t/Jie-ii'-) e ih'-J1 1 (A) dA, J L tf!i e - ii'-jl l (A). j Substituting in (1 1 .6. 1 0) and using the fact that f22 (A) I Li t/Ji e - ii'- 1 2 f1 1 (A), we obtain the same result, i.e. I Xu{AW 1, - n :;:::; A :;:::; n. f2 t (A) = = = If {Xt t } and {Xd have squared coherency l ffdA) I 2 and if linear filters are applied to each process giving Remark 2. Y, l L !Xj Xr -j, l j= GO = - oo and 00 L f3j Xr -j, 2 j= where Li l cxi l < oo and Li l f3il < oo, then { Y, d and { ¥, 2 } have the same squared coherency l ffdAW. This can be seen by considering the spectral representations x ,k f( - 1t , 1tit v dZk(v), Y,k L - 1t , 1t] e irv dZy,(v), and observing, from Theorem 4. 1 0.1, that dZy , (v) L !Xi e- iiv dZ1 (v) j ¥, 2 = - oo = = = 439 §1 1 .6. The Cross Spectrum and dZy2(v) = L {3i e - ii v dZ2 (v). j From these linear relations it follows at once that the correlation between dZy, (v) and dZy2(v) is the same as that between dZ1 (v) and dZ2 (v). Remark 3. Let {X,} be a bivariate stationary series and consider the prob­ lem of finding a time-invariant linear filter 'P = { 1/Ji } which minimizes E I X, 2 - L� l/li Xr -j. I I 2 . If 'I' is any time-invariant linear filter with transfer function 1/J(e - iv) = L 1/Jj e - iiv, j= - oo then using ( 1 1 .6.6) and ( 1 1 .6.9) we can write 2 2 eit v dZ2 (v) 1/J(e - iv)eit v dZ! (v) E x, 2 1/Jj Xr -j. l = E j = oo - oo co I � 1 I f" = J:, f" 1 U22 (v) - I/J (e - iv)f1 2 (v) - 1/J (e iv)fd v) + = 1 1/J(e - iv)l 2/1 1 (v)] dv J:, E l dZ2 (v) - I/J (e - iv) dZ1 (vW. It is easy to check (Problem 1 1. 1 3) that the integrand is minimized for each v if ( 1 1 .6. 1 1) and the spectral density of Li l/li Xr -j. l is then f2 1 1 (v) = l /2 1 (vW/f1 1 (v). The density /2 1 1 is thus the spectral density of the linearly filtered version of { Xr 1 } which is the best mean square approximation to { X, 2 } . We also observe that f2 1(v) ( 1 1 .6. 1 2) I :ff1 2 (Jc) l 2 = 1 , f22 (v) so that l ffdJcW can be interpreted as the proportion of the variance of {Xa} at frequency v which can be attributed to a linear relationship between { X, 2 } and { Xn }. Remark 4. If { X, J } and {X, 2 } are uncorrelated, then by Definition 1 1 .6. 1 , fdv) = 0, - n ::;; v ::;; n, from which it follows that l ffdJc) l 2 0, - n ::;; v ::;; n. = ExAMPLE 1 1 .6.2. Consider the bivariate series defined by I I. Multivariate Time Series 440 where ¢ > 0 and {Z1} {X1 2 } is � WN(O, (J 2 1). 
The cross covariance between {X1d and if h = - d, otherwise, and the cross spectrum is therefore fd.!c) = (2n)- l ¢)(J z eid).. The amplitude and phase spectra are clearly a d),) = (2n)-l «fo(Jz and ¢ d.!c) = (d.!c + n)mod(2n) - n. (The constraint - n < «fo 1 2 (),) ::o;; n means that the graph of «fo 12 (.!c), - n < A ::o;; n, instead of being a straight line through the origin with slope d, consists of 2r + 1 parallel lines, where r is the largest integer less than (d + 1 )/2. Each line has slope d and one of them passes through the origin.) Since /1 1 (.!c) = (J 2/(2n) and f22 (A) = (J 2 ( 1 + «fo 2 )/2n, the squared coherency is - n ::o;; A ::o;; n. 5. In the preceding example the series { X1 2 } is a lagged multiple of { X1 1 } with added uncorrelated noise. The lag is precisely the slope of the phase spectrum «fo 1 2 . In general of course the phase spectrum will not be piecewise linear with constant slope, however «fo 1 2 (Jc) can still be regarded as a measure of the phase lag of { xt2 } behind { xt I } at frequency A in the sense that fd.!c) d.!c = a 1 2 (.!c)ei¢,2 <;.> dJc = E [ l dZ 1 (.!c) l l dZ2 (.!c) l ei<e.<;.> - e2 <;.>> ], Remark where E>JA.) = arg(dZ;(),)), i = 1 , 2. We say that X1 2 lags d time units behind X1 1 at frequency A if ex p (it.!c) dZ2 (Jc) = exp (i(t - d)),) dZ 1 (.!c). We can then write /1 2 (),) d.!c = Cov (dZ 1 (.!c), exp(- id.!c) dZ1 (.!c)) = exp (id.!c)/1 1 (.!c) d.!c. Hence «fod.!c) = arg(fd.!c)) = (d.!c + n)mod(2n) and «fo� 2 (.!c) = d. In view of its in­ terpretation as a time lag, ¢� 2 (.!c) is known as the group delay at frequency A. EXAMPLE 1 1 .6.3 (An Econometrics Model). The mean corrected price and supply of a commodity at time t are sometimes represented by X1 1 and X1 2 respectively, where {xt l = - «fo1 Xtz + uto o < «P1 < 1, ( 1 1 .6. 1 3) 0 < ¢z < 1 , , Xtz = «ftz Xt- 1 . 1 + V, where { U1 } WN(O, (Jb), { V, } WN(O, (JB) and { U1 } , { V, } are uncorrelated. We now replace each term in these equations by its spectral representation. Noting that the resulting equations are valid for all t, we obtain the following equations for the orthogonal increment processes Z 1 , Z2 , Zu and Zv in the spectral representations of { xt l }, { xt2 }, { ut } and { v, }: � � 441 §1 1 .6. The Cross Spectrum and dZ2 (A) rjJ2 e - i;. dZ 1 ()o) + dZv(A). Solving for dZ1 (A) and dZ2 (A), we obtain dZ1 (A.) = (1 + r/J 1 r/J2 e - i;.r 1 [ - r/J 1 dZv(A) + dZu(A)] = and dZ2 (A.) = ( 1 + rP 1 rP2 e - iJ.) - 1 [dZv(A.) + r/J2 e - iJ. dZu()o)] . From (1 1 .6.8) and ( 1 1 .6.9) it follows that /1 1 (A) = 1 1 + </J 1 </J 2 e - i;. I - 2 (CJ b + </J I CJ W(2n), fn( A.) 1 1 + <P 1 </J 2 e - i;. I - 2 (CJ� + </J � CJ b)/(2n), and /1 2 ()o) = 1 1 + </Y 1 </J 2 e - i;. I - 2 ( </J 2 CJ� COs A - </J 1 CJ� + i</J 2 CJ b sin A.)/(2n). = The squared coherency is therefore, by ( 1 1 .6. 1 0), r/JI t + r/J l CJt - 2rjJ1 rP2 CJb (JB co d I X1 2 (A.W = CJ ' r/JI (Jt + r/Jl (Jt + ( 1 + r/Jl r/JI )(Jb (JB and Notice that the squared coherency is largest at high frequencies. This suggests that the linear relationship between price and supply is strongest at high frequencies. Notice also that for A close to n, (r/J CJb cos A. - r/J1 CJB ) r/J2 CJb co d r/J,u(A.) � 2 "' 2 ('1'2 CJu cos A1 - '1'A. 1 CJv2 ) 2 indicating that price leads supply at high frequencies as might be expected. In the special case r/J 1 = 0, we recover the model of Example 1 1 .6. 2 with d = 1, for which </J dA.) = (A. + n)mod(2n) - n and </J 'dA.) = 1 . 
EXAMPLE 1 1 .6.4 (Linear Filtering with Added Uncorrelated Noise). Suppose that \f { 1/Ji ,j = 0, ± 1 , . . . } is an absolutely summable time-invariant linear filter and that { X, d is a zero-mean stationary process with spectral density f1 1 (A). Let { N, } be a zero-mean stationary process uncorrelated with { X, d and with spectral density fN()o). We then define the filtered process with added noise, ( 1 1 .6. 1 4) x, 2 = j=L: I/Jj X,_j. 1 + N,. - oo = OCJ I I . M ultivariate Time Series 442 Since { Xr 1 } and { N, } are uncorrelated, the spectral density of { X, 2 } is where j t{! (e-;;') = L� - w t{!je-i A. ( 1 1 .6. 1 5) Corresponding to ( 1 1 .6. 14) we can also write ( 1 1 .6. 1 6) where Z2 , Z1 and ZN are the orthogonal increment processes in the spectral representations of { X, 2 }, { Xn } and { N, }. From (1 1 .6. 1 6), E(dZ2 (),) dZ1 (A) ) = t{! (e-iA)/1 1 (A) dA and hence The amplitude spectrum is and since /1 1 is real-valued, the phase spectrum coincides with the phase gain of the filter, i.e. qJ2 1 (A) = a rg(t{! (e-iA)). In the case of a simple delay filter with lag d, i.e. t{!j = 0, j i= d, q)2 1 (A) = n)mod(2n) - n, indicating that { Xn } leads { X, 2 } by d as expected. The transfer function t{! (e-i·) of the filter, and hence the weights { t/JJ , can be found from the relation arg(e - idA) = ( - dA + ( 1 1 .6. 1 7) quite independently of the noise sequence { N, } . From ( 1 1 .6. 1 5) and ( 1 1 .6. 1 7) we also have the relation, 2 fz z (),) = l fz t (A) I //l i (A) + fN (A) = 1 Jf2 1 (AW/z z (A) + fN(A), where I Jf2 1 (AW is the squared coherency between {X, 2 } and {X, J } . Hence 2 ( 1 1 .6. 1 8) fN(A) = ( 1 - 1 Jfz t (A) I )/zz (A), and by integrating both sides, we obtain a� : = Var(N ,) = f�y - ! Jf2 1(AW )fz z (A) dA. In the next section we discuss the estimation of/1 1 (A), f2 2 (A) and f1 2 (A) from n pairs of observations, (X, 1, Xd', t = 1 , . . . , n. For the model ( 1 1.6. 1 4), these estimates can then be used in the equations ( 1 1 .6. 1 7) and ( 1 1 .6. 1 8) to estimate the transfer function of the filter and the spectral density of the noise sequence {N, } . 443 §1 1.7. Estimating the Cross Spectrum § 1 1.7 Estimating the Cross Spectrum Let {X, } be a stationary bivariate time series with EX, = J1 and E(X,+ h X;) where the covariance matrices r(h) have absolutely summable components. The spectral density matrix function of {X,} is defined by ' JlJl = r(h), [/11 ] 1 (A) fdA.) = ( 2nr l I r (h)e - ih .<, - n � A � n. (A) fz i fzz (A) h= - oo In this section we shall consider estimation of by smoothing the multi­ variate periodogram of {X , . . . , X. }. First we derive bivariate analogues of the asymptotic results of Sections 10.3 and 1 0.4. We then discuss inference for the squared coherency, the amplitude spectrum and the phase spectrum which were defined in Section 1 1 .6. The discrete Fourier transform of {X 1 , , X.} is defined by f()o) = /(A) • • • n J(wJ = n - 1/2 L X, e - itwJ, t =l where wi = 2nj/n, - [(n - 1 )/2] � j � [n/2], are the n Fourier frequencies introduced in Section 10. 1 . The periodogram of { X I , . . . , x. } is defined at each of these frequencies wi to be the 2 x 2 matrix, J.(wi ) = J(w)J * (wi ), where * denotes complex conjugate transpose. As in Section 10.3 the definition is extended to all frequencies w E [ - n, n] by setting I" (w) = {I.(g(n, I:(g(n, w)) w) ) - if w � 0, if w < 0, ( 1 1 .7. 1 ) where g(n, w), 0 � w � n, is the multiple of 2n/n closest to w (the smaller one if there are two). 
We shall suppress the subscript n and write Iii(w), i, j = 1 , . . . , n, for the components of I.(w). Observe that Iu(w) i s the periodogram of the univariate observations {Xu , . . . , X.;}. The function I 1 2 (w) is called the cross periodogram. At the Fourier frequency wk it has the value, Asymptotic Properties of the Periodogram Since the next two propositions are straightforward extensions of Propositions 1 0. 1 .2 and 1 0.3. 1, the proofs are left to the reader. Proposition 1 1 .7.1 . n- 1 L �= l X,, then If wi is any non-zero Fourier frequency and X. = I I . Multivariate Time Series 444 where f(k) = n-1 L��t (Xr+ k - Xn )(X, - Xn ) ', k 2: 0, and f(k) = f' ( - k), k < 0. The periodogram at jrequency zero is In (O) = nXn X�. If {X,} is a stationary bivariate time series with mean 1.1 and covariance matrices r(h) having absolutely summable components, then Proposition 1 1 .7.2. (i) and Ein (O) - niJIJ' ---> 2nf(O) Ein (w) ---> 2nf(w), ifw of. 0 where f( · ) is the spectral matrix function of {X,}. (ii) We now turn to the asymptotic distribution and asymptotic covariances of the periodogram values of a linear process. In order to describe the asymp­ totic distribution it is convenient first to define the complex multivariate normal distribution. 1 1 .7.1 (The Complex Multivariate Normal Distribution). If l: = 1:Definition 1 + il:2 isma complex-valued m m matrix such that l: = l:* and a*l:a 0 for all a E e , then we say that y = y 1 + iY 2 is a complex-valued multivariate normal random vector with mean 1.1 = 1.11 + i1.12 and covariance matrix l: if J [ �:] � N ( [:: J � [�: - �: ) . ( 1 1 .7.2) We then write Y � Nc(IJ, l:). If y<nl = Y\n l + iY�l, n = 1, 2, . . . , we say that y<nl ([IJ(n)J [l;(n) - l;(n)J ) [ y(n)J is ANc(l.l(n l , l:(n l ) if y �nl is AN ll�n) , :2 l; �l 1:\n) , where each l: (n ) = l:\nl + il:�l satisfies the conditions imposed on l:. These guarantee (Problem 2: x 1 1 1 . 1 6) that the matrix in ( 1 1 .7.2) is a real covariance matrix. Suppose that { Z,} � IID(O, l) where l is non-singular, and let In (w), - n :0:: w :0:: n, denote the periodogram of {Z1, . . . , Zn } as defined by Proposition 1 1 .7.3. (1 1 .7. 1 ). (i) 0< If }" < · · · < Am < n then the matrices In (}" ), . . . , In (}"m) converge jointly in distribution as n ---> oo to independent random matrices, each distributed as Yk Yk* where Yk � Nc(O, l). (ii) If EZ� < oo, i = I , 2, and wj, wk are Fourier frequencies in [0, n], then 0 < wj = wk < n, wj = wk = 0 or 1 1 wj of. wk, n, where Ipq( · ) is the (p, q)-element of In ( · ), apq is the (p, q)-element of l, and Kpqrs is the fourth cumulant between Z,P , Z,q, Z,r and Z,s. (See Hannan (1970), p. 23.) 445 §1 1.7. Estimating the Cross Spectrum A. n J(A.) = n - 112 rL� 1 z, e- itg(n , A). We first show that J(A.) is ANc(O, I) (see Definition 1 1 .7. 1 ). We can rewrite J(A.) as n J(),) = n 112 rL� 1 [Z, cos(tg (n, A.)) - iZ, sin(tg(n, A.)) ] . PROOF. (i) For an arbitrary frequency E (O, n) define - [zZ,, cos(tg (n, A.))A.))] Now the four-dimensional random vector, L U" ·. = n _112 f . ' sm(tg(n, r�1 is a sum of independent random vectors and for g(n, E (0, n) we can write (see Problem 1 1 . 1 7) A.) ( 1 1 .7.3) Applying the Cramer-Wold device and the Lindeberg condition as in the proof of Proposition 1 0.3.2, we find that This is equivalent, by Definition 1 1 . 7. 1, to the statement that J(A.) is ANc(O, l:). (Note that a complex normal random vector with real covariance matrix L has uncorrelated real and imaginary parts each with covariance matrix L/2.) 
It then follows by Proposition 6.3.4 that /"(A.) => YY * where Y � N(O, l:). For a computation analogous to the one giving ( 1 1 .7.3) yields w -# A., E[J(A.)J*(w)] = 0 J(w) A ), In(A.m ) < 1 < < m < for all n sufficiently large. Since J(A.) and are asymptotically joint normal, it follows that they are asymptotically independent. Extending this argument to the distinct frequencies 0 n, we find that J (A. 1 ), ··· J (),m ) and hence ), , are asymptotically independent. (ii) The proof is essentially the same as that of Proposition 10.3.2 and is therefore omitted. (See also Hannan (1 970), p. 249.) D In(A.1 • • • • • • , As in Section 1 0.3, a corresponding result (Theorem 1 1 .7. 1) holds also for linear processes. Before stating it we shall relate the periodogram of a linear process to the periodogram of the underlying white noise sequence. Proposition 1 1 .7.4. Let {X, } be the linear process, 1 1. M ultivariate Time Series 446 00 ( 1 1 .7.4) X, = I ck zr - k • k= where l: is non-singular and the components of the matrices Ck satisfy 2 I k'= I Ck (i, j)J I kJ 11 < oo , i, j = 1 , 2. Let I x ( · ) and I ( ) be the periodograms of { X 1 , . . . , X.} and { Z 1 , . . . , Z.} respectively. If EZj < oo, i = 1 , 2, and C(e - iw) := I I:'= Ck e- ikw, then for each Fourier frequency wk E [0, n], -oo •. - oo •. z - _ 00 I x (wk ) = C (e - iw )I., (wd C (e iw ) + R.(wk ), where the components of R.(wk ) satisfy 1 i, j = 1 , 2. max E I R n, iiwk W = O(n - ), , " •. 2 ' " Wk E [0 7t] PROOF. The argument follows that in the proof of Theorem 1 0.3. 1 . (See also Hannan ( 1 970), p. 248.) 0 Theorem 1 1 .7.1 . Let {X, } be the linear process defined by ( 1 1 .7.4) with periodogram I.(A) = [Jij().)JL = 1 , - n :5: A :5: n. (i) If 0 < A1 < · · · < Am < n then the matrices I.(A1 ), . . . , I.(Am ) converge jointly in distribution as n --> oo to independent random matrices, the k'h of which is distributed as Wk Wk* where Wk Nc(O, 2nf(Ad) and f is the spectral density matrix of {X,}. (ii) If wi = 2nj/n E [0, n] and wk = 2nkjn E [0, n], then (2n) 2 [ fp , (w).fsq (wi ) + fps (w)fq, (wi ) ] + O(n - 112 ) if wi = wk = 0 or n, � if 0 < wi = wk < n, if (l)j i= (l)b 1 1 2 where the terms O(n - 1 ) and O(n - ) can be bounded un iformly in j and k by c 1 n - 1 12 and c 2 n- 1 respectively for some positive constants c1 and c2• PROOF. The proof is left to the reader. (See the proof of Theorem 1 0.3.2 and Hannan (1 970), pp. 224 and 249.) D Smoothing the Periodogram As in Section 1 0.4, a consistent estimator of the spectral matrix of the linear process ( 1 1 .7.4) can be obtained by smoothing the periodogram. Let { m.} and { W,( · ) } be sequences of integers and (scalar) weight functions respectively, satisfying conditions ( 10.4.2)-(1 0.4.5). We define the discrete spectral average estimator j by 0 :5: w :5: n. ( 1 1 . 7.5) j(w) := (2n) - 1 I W,(k)I.(g(n, w) + wk ), lk l :5 mn § 1 1 .7. Estimating the Cross Spectrum 447 In order to evaluate j(w), 0 :s:; w :s:; n, we define In to have period 2n and replace In(O) whenever it appears in ( 1 1 .7.5) by { t } j(O) : = (2n) 1 Re W, (O) In(w 1 ) + 2 W,(k)In(wk+d · k l - We have applied the same weight function to all four components of In(w) in order to facilitate the statement and derivation of the properties of j(w). It is frequently advantageous however to choose a different weight-function sequence for each component of In( · ) since the components may have quite diverse characteristics. 
For a discussion of choosing weight functions to match the characteristics of In ( - ) see Chapter 9 of Priestley (1981). The following theorem asserts the consistency of the estimator j(w). It is a simple consequence of Theorem 1 1 .7. 1 . If {X1} is the linear process defined by ( 1 1 .7.4) and j(w) = [/;j{w)JL= t is the discrete spectral average estimator defined by ( 1 1 .7.5), then for A., w E [0, n], Theorem 1 1.7.2. (a) lim Ej(w) = f(w) and if w = A. = 0 or if 0 < w = A. < n, if w =ld. n, (Recall that if X and Y are complex-valued, Cov(X, Y) = E(X Y) - (EX)(EY).) The cospectrum c 1 2 (w) = [ f1 2 (w) + f2 1 (w)]/2 and the quadrature spectrum q1 2 (w) = i [ f1 2 (w) - f2 1 (w)]/2 will be estimated by c1 2 (w) [ J1 2 (w) + j2 1 (w)]/2 = and respectively. By Theorem 1 1.7.2(b) we find, under the conditions specified, that the real-valued random vector (L ikl s m W,2 (k)t 1 (]1 1 (w), f22 (w), c1 2 (w), qdw))', 0 < w < n, has asymptotic covariance matrix, f1 1 C t z fz z C t z tUt t !22 + c i 2 - qi 2 ) C1 2 q 1 2 ft t q 1 2 f2 2 q 1 2 ' Ctzqtz tU1 1 f2 2 + q iz - c i 2 ) ( 1 1 .7.6) J 1 1 . M ultivariate Time Series 448 f � where the argument w has been suppressed. Moreover we can express ( ]1 1 (w), f22 (w), c u(w), q 1 2 (w))' as the sum of (2m + 1) random vectors, ]1 1 (w) / 1 1 (g(n, w) + wk ) f22 (w) l 22 (g(n, w) + wd = L Wn (k) Re {Iu(g(n, w) + wk ) } � u(w) lkl <; m q 1 2 (w) - Im{Iu(g(n, w) + wk) } J J ' where the summands, by Theorem 1 1 .7. 1 , are asymptotically independent. This suggests that (]1 1 (w), J2 2 (w), c 1 2 (w), q 1 2 (w) ) is AN ( U1 1 (w), !22 (w), c 1 2 (w), q 1 2 (w) )', a; V) ' (1 1 .7.7) where a; = L ikl <; m W,2 (k) and V is defined by ( 1 1 .7.6). We shall base our statistical inference for the spectrum on the asymptotic distribution (1 1 .7.7). For a proof of (1 1 .7.7) in the case when j(w) is a lag window spectral estimate, see Hannan ( 1 9 70), p. 289. Estimation of the Cross-Amplitude Spectrum To estimate oc 1 2 (w) = l f1 2 (w)l = l c 1 2 (w) - iq 1 2 (w)l we shall use &1 2 (w) : = (c i 2 (w) + 4 i 2 (w)) 112 = h(c u(w), q u(w)). By (1 1 .7.7) and Proposition 6.4.3 applied to h(x, y) = (x 2 + y 2 ) 1 12 , we find that if oc u(w) > 0, then where a;(w) = (:�y + G�Y V33 V44 + 2 (:�)(:�) v34, vii is the (i,j)-element of the matrix defined by ( 1 1 .7.6), and the derivatives of h are evaluated at (c 1 2 (w), q 1 2 (w)). Calculating the derivatives and simplifying, we find that if the squared coherency, l %dwW, is strictly positive then a 1 2 (w) is AN(ocu(w), a; ociiw)( l %dw)l - 2 + 1 )/2). ( 1 1 .7.8) Observe that for small values of I %1 2 (wW, the asymptotic variance of &1 2 (w) is large Estimation of the Phase Spectrum The phase spectrum tP 1 2 (w) = arg f1 2 (w) will be estimated by J1 2 (w) := arg(c u(w) - iq u(w)) E ( - n, n] . §1 1 .7. Estimating the Cross Spectrum If l %dwW 449 0, then by ( 1 1 .7.7) and Proposition 6.4.3, ¢ 1 2(w) is AN(¢ u(w), a;ai z(w)( l %u(w) l - 2 - 1 )/2). > ( 1 1 .7.9) The asymptotic variance of ¢ dw), like that of & u(w), is large if l %1 2(w) l 2 is small In the case when %1 2 (w) = 0, both c 1 2 (w) and q 1 2 (w) are zero, so from (1 1 .7.7) and (1 1 .7.6) [�dw) J ([ ] [ 0 f ° , ta; 1 1 /2 2 0 0 /1 1 /22 As /;u(w) = arg(cu(w) - iq u(w)) = arg[(a; /1 1 /22 /2) - 1 12 (c u(w) - iq u(w))], we conclude from Proposition 6.3.4 that q dw) is AN ] )· /;1 2 (w) => arg(V1 + iV2 ), where V1 and V2 are independent standard normal random variables. 
Since V1 /V2 has a Cauchy distribution, it is a routine exercise in distribution theory to show that arg(V1 + iV2 ) is uniformly distributed on ( - n, n). Hence if n is large and %1 2 (w) = 0, J1 2 (w) is approximately uniformly distributed on ( - n, n). From ( 1 1 .7.9) we obtain the approximate 95% confidence bounds for � 1 2 (w), ¢ dw) ± 1 .96an& dw)( I Xu(w) l - 2 - 1) 1 ' 2/2 1 ' 2 , where I XdwW is the estimated squared coherency, I XdwW = &iz (w);[ Jl l (w)]dw)J, and it is assumed that 1 Xu(w) l 2 > 0. Hannan ( 1 970), p. 257, discusses an alternative method for constructing a confidence region for � 1 2 (w) in the case when W,(k) = (2m + 1 ) - 1 for l kl :;:; m and W(k) 0 for l k l > m. He shows that if the distribution of the periodogram is replaced by the asymptotic distributions of Theorem 1 1 .7. 1, then the event E has probability ( 1 - a), where 1 - I X1 2(wW 1 ' 2 • E = j sm(¢ t 1 - a1 2 (4m) 1 2(w) - ¢ u(w)) l :;:; 4m l %u(w) l 2 = { A [ � ] } and t 1 _ a1 2 (4m) is the ( 1 - ct/2)-quantile of the t-distribution with 4m degrees of freedom. For given values of ¢ dw) and I Xu(w) j , the set of ¢ dw) values satisfying the inequality which defines E is therefore a 1 00(1 - a)% confidence region for ¢ dw). If the right-hand side of the inequality is greater than or equal to 1 (as will be the case if I Xdw) l 2 is sufficiently small), then we obtain the uninformative confidence interval ( - n, n] for � 1 2 (w). On the other hand if the right-hand side is less than one, let us denote its arcsin (in [0, n/2)) by �*. Our confidence region then consists of values �1 2 (w) such that l sin(/; 1 2 (w) - � 1 2 (w))l :;:; sin �*, 1 1 . Multivariate Time Series 450 i.e. such that ( 1 1 .7. 10) or J1 2 (w) + n - r/J* :<:; r/Ju(w) :<:; J1 2 (w) + + r/J*. n The confidence region can thus be represented as a union of two subintervals of the unit circle whose centers are diametrically opposed (at J1 2 (w) and ¢ 1 2(w) + n) and whose arc lengths are 2¢*. If l ffdwW is close to one, then we normally choose the interval centered at J1 2 (w), since the other interval corresponds to a sign change in both c 1 2 (w) and q 1 2 (w) which is unlikely if I Jf'dwW is close to one. Estimation of the Absolute Coherency The squared coherency l %dwW is estimated by l ffdwW where If I ffu(w)l > 1 2'1 2 (w)l = [ci z (w) + 4i z (w) r12 /[Jl l (w)fzz (w)] 112 = h ( jl l (w)Jzz (W), c u(w), q u(w)). 0, then by ( 1 1 .7.7) and Proposition 6.4.3, l ffu(w) l is AN( I %1 2 (w) l , a;( l - l %dwW) 2/2), ( 1 1.7. 1 1) giving the approximate 95% confidence bounds, for I Jf'dw) l . l fu(w)l ± 1 .96an ( l - l f1 2 (w)l 2 )/j2, Since d [tanh - 1 (x)]/dx Proposition 6.4.3 that = [ C � :) } d � ln dx = (1 - x 2 ) - 1 , it follows from (1 1 .7. 1 2) From ( 1 1 .7. 1 2) we obtain the constant-width large-sample 100(1 - a)% confidence interval, (tanh - 1 (1X'u(w)l) - <I>I -a/2 an /J2, tanh - 1 ( 1 2'1 2 (w)l) + <I>I - af2 an/J2), for tanh - 1 ( l %1 2 (w)l ). The corresponding 100(1 - a)% confidence region for l %1 2 (w) l is the intersection with [0, 1 ] of the interval (tanh [tanh - 1 (1X'u(w)l ) - <I>I -af2 an /J2 ] , tanh [tanh - 1 ( I X'1 2 (w) l ) + <I>I - a/2 an /J2 ] ), assuming still that I Jf'u(w)l > 0. (1 1 .7. 1 3) If the weight function W,(k) in ( 1 1 .7.5) has the form W,(k) = (2m + 1) - I for l k l :<:; m and W,(k) = 0, l k l > m, then the hypothesis l %1 2 (w) l = 0 can be § 1 1.7. 
tested against the alternative hypothesis $|\mathcal{K}_{12}(\omega)| > 0$ using the statistic
\[
Y = 2m|\hat{\mathcal{K}}_{12}(\omega)|^2 / [1 - |\hat{\mathcal{K}}_{12}(\omega)|^2].
\]
Under the approximating asymptotic distribution of Theorem 11.7.1, it can be shown that $|\hat{\mathcal{K}}_{12}(\omega)|^2$ is distributed as the square of a multiple correlation coefficient, so that $Y \sim F(2, 4m)$ under the hypothesis that $|\mathcal{K}_{12}(\omega)| = 0$. (See Hannan (1970), p. 254.) We therefore reject the hypothesis $|\mathcal{K}_{12}(\omega)| = 0$ if
\[
Y > F_{1-\alpha}(2, 4m), \tag{11.7.14}
\]
where $F_{1-\alpha}(2, 4m)$ is the $(1-\alpha)$-quantile of the $F$ distribution with 2 and $4m$ degrees of freedom. The power of this test has been tabulated for numerous values of $|\mathcal{K}_{12}(\omega)| > 0$ by Amos and Koopmans (1963).

EXAMPLE 11.7.1 (Sales with a Leading Indicator). Estimates of the spectral density for the two differenced series $\{D_{t1}\}$ and $\{D_{t2}\}$ in Example 11.2.2 are shown in Figures 11.5 and 11.6. Both estimates were obtained by smoothing the respective periodograms with the same weight function $W_n(k) = 1/13$, $|k| \le 6$. From the graphs, it is clear that the power is concentrated at high frequencies for the leading indicator series and at low frequencies for the sales series.

Figure 11.5. The spectral density estimate $\hat f_1(2\pi c)$, $0 \le c \le 0.5$, for the differenced leading indicator series of Example 11.7.1.

Figure 11.6. The spectral density estimate $\hat f_2(2\pi c)$, $0 \le c \le 0.5$, for the differenced sales data of Example 11.7.1.

The estimated absolute coherency $|\hat{\mathcal{K}}_{12}(\omega)|$ is shown in Figure 11.7 with corresponding 95% confidence intervals computed from (11.7.13). The confidence intervals for $|\mathcal{K}_{12}(\omega)|$ are bounded away from zero for all $\omega$, suggesting that the coherency is positive at all frequencies. To test the hypothesis $H_0: |\mathcal{K}_{12}(\omega)| = 0$ at level $\alpha = .05$, we use the rejection region (11.7.14). Since $m = 6$, we reject $H_0$ if
\[
\frac{2m|\hat{\mathcal{K}}_{12}(\omega)|^2}{1 - |\hat{\mathcal{K}}_{12}(\omega)|^2} > F_{.95}(2, 24) = 3.40,
\]
i.e. if $|\hat{\mathcal{K}}_{12}(\omega)| > .470$. Applying this test to $|\hat{\mathcal{K}}_{12}(\omega)|$, we find that the hypothesis $|\mathcal{K}_{12}(\omega)| = 0$ is rejected for all $\omega \in (0, \pi)$. In fact the same conclusions hold even at level $\alpha = .005$. We therefore conclude that the two series are correlated at each frequency.

The estimated phase spectrum $\hat\phi_{12}(\omega)$ is shown with the 95% confidence intervals from (11.7.10) in Figure 11.8. The confidence intervals for $\phi_{12}(\omega)$ are quite narrow at each $\omega$ owing to the large values of $|\hat{\mathcal{K}}_{12}(\omega)|$. Observe that the graph of $\hat\phi_{12}(\omega)$ is piecewise linear with slope 4.1 at low frequencies and slope 2.7 at the other frequencies. This is evidence, supported by the earlier analysis of the cross-correlation function in Example 11.2.2, that $\{D_{t1}\}$ leads $\{D_{t2}\}$ by approximately 3 time units. A transfer function model for these two series which incorporates a delay of 3 time units is discussed in Example 13.1.1. The results shown in Figures 11.5-11.8 were obtained using the program SPEC.

Figure 11.7. The estimated absolute coherency $|\hat{\mathcal{K}}_{12}(2\pi c)|$ for the differenced leading indicator and sales series of Example 11.7.1, showing 95% confidence limits.

Figure 11.8.
The estimated phase spectrum, �1 2 (2nc), for the differenced leading indicator and sales series, showing 95% confidence limits. I I . Multivariate Time Series 454 § 1 1 . 8 * The Spectral Representation of a Multivariate Stationary Time Series In this section we state the multivariate versions of the spectral representation Theorems 4.3.1 and 4.8.2. For detailed proofs see Gihman and Skorohod ( 1 974) or Hannan ( 1 970). All processes are assumed to be defined on the probability space (Q, .'#', P). Theorem 1 1 .8.1. f( · ) is the covariance matrix function of an m-variate stationary process {X,, t = 0, ± 1 , . . . } if and only if h = 0, ± 1 , . . . , e ihv dF(v), f(h) = ( J(-n.n] where F( · ) is an m x m matrix distribution function on [ - n, n]. ( We shall use this term to mean that F( - n) = 0, F( · ) is right-continuous and (F(/1) - F(A)) is non-negative definite for all A :::; Jl, i.e. oo > a *(F ( Jl) - F(A)) a ?: 0 for all a E e m, where a* denotes the complex conjugate transpose of a.) F is called the spectral distribution matrix of {X,} or of r( · ). Each component Fik( · ) of F( · ) is a complex-valued distribution function and J( ,1 eihv dF(v) is the matrix whose (j, k)-component is L - n] eihv dFjk(v). _ ,_ "· PROOF. See Gihman and Skorohod (1 974), p. 2 1 7. D In order to state the spectral representation of {X, }, we need the concept of a (right-continuous) vector-valued orthogonal increment process {Z(A), - n :::; A :::; n}. For this we use Definition 4.6. 1, replacing (X, Y) by EXY* and I I X II 2 by EXX*. Specifically, we shall say that {Z(A), - n :::; A :::; n} is a vector-valued orthogonal increment process if the components of the matrix E(Z(A)Z*(A)) are finite, - :::; A :::; n, EZ(A) = 0, - n :::; A :::; n, E(Z(A4 ) - Z(A3))(Z(A 2 ) - Z(A d )* = 0 if (A 1 , A2] (A3, A4 ] = r/J, and E(Z(A + b) - Z(A))(Z(A + b) - Z(A))* --> 0 as b !O. Corresponding to any process {Z(A), - n :::; A :::; n} satisfying these four properties, there is a unique matrix distribution function G on [- n, n] such (i) (ii) (iii) (iv) n n that G(/1) - G(A) = E [(Z(/1) - Z(A.))(Z(/1) - Z(),))*], A :::; fl. ( 1 1 .8. 1 ) In shorthand notation the relation between the matrix distribution function G and {Z(A.), - n :::; A :::; n} can be expressed as E(dZ(A.) d Z* (/1)) = bA dG(A.) = · �' {dG(A.) 0 if J1 = ) , : otherwise. Standard Brownian motion {B(A.), - n :::; A :::; n} with values m IRm and § 1 1 .8. * The Spectral Representation of a Multivariate Stationary Time Series 455 B( - n) = 0 is an orthogonal increment process with G(A) = ()_ + n)I where I is the (m x m) identity matrix. The fact that G(A) is diagonal in this particular case reflects the orthogonality of B;(A), Bj(A), i i= j, for m-dimensional Brownian motion. It is not generally the case that G(A) is diagonal; in fact from ( 1 1 .8. 1) the (i,j)-element of dG(A) is the covariance, E (dZ;(A) dZp)). The stochastic integral I ( f) with respect to { Z(A)} is defined for functions f which are square integrable with respect to the distribution function G0 := L f= 1 Gii as follows. For functions of the form f(A) = n L fJu,, ;.,+ ,/A), i =O - n = A0 < A 1 < . . . < An+l = n, ( 1 1 .8.2) we define n ! ( f ) := L J; [Z(A;+d - Z(A;)]. ( 1 1 .8.3) i =O This mapping is then extended to a Hilbert space isomorphism I of L2(G0) into L 2 (Z), where L 2(Z) is the closure in L 2 (0., Ji', P) of the set of all linear combinations of the form ( 1 1 .8.3) with arbitrary complex coefficients };. 
The inner product in L 2 (Z) is defined by (1 1 .8.4) Definition 1 1 .8.1 . If { Z(A), - n .::;; A .::;; n} is an m-variate orthogonal increment process with E (dZ(A) dZ * (Jl)) = b;..ll dG(A) and G0 = 2:� 1 G;;, then for any f E L 2 (G0) we define the stochastic integral J( - , , ,J i(v) dZ( v) to be the random vector I (f ) E (Z) with I defined as above. U The stochastic integral has properties analogous to (4.7.4)-(4.7.7), namely E (l (f)) = 0, l (a J + a 2 g) = a 1 I ( f) + a 2 l (g), <J( f ), I(g)) = 1- , , ,/( v) g( v) dG0( v), a 1 , a 2 E C, and <I(fn), l (g.) ) --> 1 - , , ,/( v) g( v) dG0(v) if fn � f and g. L2(G0) � g, with the additional property, E (l (f)I( g)*) = 1 - , , , /( v) g( v) dG(v). Now suppose that {X,} is a zero-mean m-variate stationary process with spectral distribution matrix F( ' ) as in Theorem 1 1 .8. 1 . Let Yt' be the set of random vectors of the form 1 1 . M ultivariate Time Series 456 ( 1 1 .8.5) U(Q, F0 U(F0).F and let :if denote the closure in ff, P) of Jlt. The inner product in :if is defined by ( 1 1 .8.4). Define % to be the (not necessarily closed) subspace sp e ;1 . t E Z of where = L7'= t ii . lf .i1 denotes the closure of % in then, as in Section 4.8, .i1 = The mapping T defined by L2{(F0), } U(F0) r(j=lf. aiX11) = f. aie i1r, j=l ( 1 1 .8.6) can be extended as in Section 4.8 to a Hilbert space isomorphism of :if onto which by Theorem 1 1 .8.1 has the property that L2 (F0), E[(T� 1f )(T� 1 g * ] 1�,, ,1 f(v)g(v) dF(v), Consequently the process {Z(A), - n ::; A ::; n} defined by Z(A) = T � 1 I( �Jt.AJ ( - ), - n ::; ) ::; n, is an orthogonal increment process, and the matrix distribution function associated with { Z(A)} is precisely the spectral distribution matrix F of { X1 } appearing in Theorem The spectral representation of {X1} is then ) = , ( 1 1 .8.8) 1 1 .8. 1 . established by first showing that T� 1f 1 ,, ,/(v)dZ(v), � then setting f(v) e i1 v and using = ( 1 1 .8.9) ( 1 1 .8.6). = If {X1} { Z(A), ::; ::; n} F(A), is a stationary Theorem 1 1 .8.2 (The Spectral Representation Theorem). sequence with mean zero and spectral distribution matrix F ( · ), then there exists a right-continuous orthogonal increment process A such that -n (i) and E[(Z(A) - Z( (ii) X1 = Z( - n))* ] = dZ(v) with probability 1. - n) ) (Z(Jc) - j ei1 v J(�1t,1t] PROOF. The steps are the same as those in the proof of Theorem process being defined by (1 1 .8.8). { Z().) } 4.8.2, the D The following corollary is established by the same argument which we used to prove Corollary 4.8. 1 . If {X1 } is a zero-mean stationary sequence then there exists a - n ::; such that right-continuous orthogonal increment process Corollary 1 1 .8.1 . {Z(A), A ::; n} § 1 1 .8. * The Spectral Representation of a Multivariate Stationary Time Series Z( - n) = 0 and xt = I eitA dZ(v) J(-1t,1t] 457 with probability 1 . If { Y()") } and {Z(A-) } are two such processes, then P(Y(A_) = Z(A-) = 1 for each A E [ - n, n] . gEL2 (F0), Equations ( 1 1 .8.7) and ( 1 1 .8.9) imply that for any functions f, E [f_ "· "/()") dZi()") f_"· "l g(A.) dZj(A-) J = 1 - "· "/(A-) g(A-) dFi (A.). It can be shown (Problem 1 1.22) that the same relation then holds for all fEU(Fi ; ) and gEL2 (Fjj). As in the univariate case we also have the important result that Y E (see (1 1 .8.5)) if and only if there exists a function gEL 2(F0) such that Y = 1 -"· "l g(v) dZ(v) with probability 1 . 
I n many important cases of interest (in particular i f {X1 } i s an ARMA process) the spectral distribution matrix F( ) has the form, F(w) = J:J(v) dv, - n w n. Then f( · ) is called the spectral density matrix of the process. In the case when for al i, j E { 1 , ... , m} we have the simple relations ( 1 1 . 1 . 14) L �= IYii(h) l and ( 1 1 . 1 . 1 5) connecting r( and f( ) . Remark I. Remark 2. Yi' Remark 3. · s - oo < oo i ·) s · Time Invariant Linear Filters The spectral representation of a stationary m-variate time series is particularly useful when dealing with time-invariant linear filters. These are defined for m-variate series just as in Section 4. 10, the only difference being that the coefficients of the filter 0, ± 1 , are now (/ x m) matrices instead of scalars. In particular an absolutely summable TLF has the property that the elements of the matrices are absolutely summable and a causal TLF has the property that 0 for j < 0. is obtained from by application of the absolutely summable If (l x m) TLF then 00 = L 1 1 .8. 1 0) Hi H = { Hi,j = { Y1 } H = {Hi},Hi = {X1 } yt j= HjXt -j• - co ... } ( 1 1 . Multivariate Time Series 458 The following theorem expresses the spectral representation of {Y1 } in terms of that of {X1 }. If H = {Hi} is an absolutely summable (l x m) TLF and {X1 } is any zero-mean m-variate stationary process with spectral representation, Theorem 1 1 .8.3. e'tv dZ(v), Xt = l J(�rr,rr] and spectral distribution matrix F, then the /-variate process ( 1 1 .8.1 0) is stationary with spectral representation Y1 = l e 11vH(e � 'v) dZ(v), J(�rr,rr] and spectral distribution matrix Fy satisfying dFy(v) = H(e � iv) dF(v)H' (e 'v), where H(e'v) = L Hj e iiv. 00 j= - oo PROOF. The proof of the representation of Y1 is the same as that of Theorem 4. 1 0. 1 . Since Y1 is a stochastic integral with respect to the orthogonal increment process {W(v)} with dW(v) = H(e�'v) dZ(v), it follows at once that EY1 = 0 and that Y1 is stationary with dFy(v) = E(dW(v) dW*(v)) = H(e � iv) dF(v)H'(e'v) E(Yt+ h Y1*) = l e'hv dFy(v). J(�rr,rr] and Remark D 4. The spectral representation decomposes X1 into a sum of sinusoids eit v dZ(v), �v� The effect of the TLF H is to produce corresponding components � v� eitv H(e �iv) dZ(v), which combine to form the filtered process {Y1}. The function H(e � iv), � v � is called the matrix transfer function of the filter H = { HJ. -n n. -n -n n, ExAMPLE 1 1 .8. 1 . The causal ARMA(p, q) process, <l>(B)X1 = 8(B)Z0 can be written (by Theorem 1 1 .3.1) in the form 00 xt = j�IO 'l'j z t�j• n, 459 Problems where L � o 'f'i z i = <1> - 1 (z)0(z), l z l :::;; 1 . Hence {X,} i s obtained from {Z, } by application of the causal TLF {'f'i,j = 0, 1 , 2, . . . }, with matrix transfer function, - n :::;; v :::;; n. By Theorem matrix, 1 1 .8.3 the spectral distribution matrix of X therefore has density - n :::;; v :::;; n. ExAMPLE 1 1 .8.2. The spectral representation of any linear combination of components of X, is easily found from Theorem 1 1 .8.3. Thus if r; = a* X, where a E Cm, then where Zy(v) = a* Z(v), and dFy (v) = E(dZy(v) dZ:(v)) = a* dF(v)a. The same argument is easily extended to the case when r; is a linear combi­ nation of components of X, X,_ 1 , . . . . Problems 1 1.1. Let { Y, } be a stationary process and define the bivariate process, X11 = Y, , = Y,-d where d =1 0. Show that { (X1 1 , X1 2 )' } is stationary and express its cross-correlation function in terms of the autocorrelation function of { Y,} . If Pr (h) ..... 0 as h ..... 
oo show that there exists a lag k such that p 1 2 (k) > pn(O). X, 2 1 1 .2. Show that the linear process defined in ( 1 1 . 1 . 1 2) is stationary with mean 0 and covariance matrix function given by ( 1 1 . 1 . 1 3). 1 1 .3. * Prove Proposition 1 1 .2.2. 1 1 .4. 1 1 .5. 1 1 .6. Prove Theorem 1 1 .3.2. If {X, } is a causal ARMA process, show that there exists e E (0, 1) and a constant K such that IYii(h)l :::;; K e1 h l for all i, j and h. Determine the covariance matrix function of the ARMA(1, 1) process defined in ( 1 1 .4.33). 460 1 1 . Multivariate Time Series 1 1 .7. If G(z) = L h'� 1(h)z h is the covariance matrix generating function of an ARMA process, show that G(z) = <l> - 1 (z) E>(z) tE>'(z - 1 )<l>' - 1 (z - 1 ). P 1 1 .8. For the matrix M in ( 1 1 .4.21 ), show that det(z/ - M) = zm det(<l>(z - 1 )) where P <l>(z) I - <l>P 1 z - · · · - <l>PPz . - oc = 1 1 .9. (a) Let { X, } be a causal multivariate AR(p) process satisfying the recursions { Z, } � WN(O, t). For n > p write down recursion relations for the predictors, Ps, Xn + h , h � 0, and find explicit expressions for the error covariance matrices in terms of the AR coefficients and * when h 1, 2 and 3. (b) Suppose now that {Y,} is the multivariate ARIMA(p, 1 , 0) process satisfying V'Y, X,, where { X, } is the AR process in (a). Assuming that Y 0 j_ X,, t � I , show that h P( Yn + h i Yo , Y J> · · · , Yn) = Yn + L Ps, Xn + j j� I = = and derive the error covariance matrices when h 1 , 2 and 3. Compare these results with those obtained in Example 1 1 .5. 1 . = 1 1 . 10. Use the program ARVEC t o analyze the bivariate time series, X, 1 , X, 2 , t 1 , . . . , 200 (Series J and K respectively in the Appendix). Use the minimum AICC model to predict (Xt. l , Xr. 2 ), t = 201 , 202, 203 and estimate the error covariance matrices of the predictors. = 1 1 . 1 1 . Derive methods for simulating multivariate Gaussian processes and multi­ variate Gaussian ARMA processes analogous to the univariate methods speci­ fied in Problems 8. 1 6 and 8 . 1 7. 1 1 . 1 2. Let { X , } be the invertible MA(q) process { Z, } � W N ( O, t ) , where t is non-singular. Show that as n --> oo, (a) E(X n+ t - Xn + t - Zn+ t ) (Xn+t - Xn + t - Zn+ t l' --> 0, (b) vn --> t, and (c) E>nj --> E>j, j = 1, . . . , q. (For (c), note that E>j = E(X n+t Z�+ t -)� - t and E>nj = E (Xn + 1 (Xn + t - j ­ xn+t -ll v,.-::. �.) 1 1 . 1 3. If X and Y are complex-valued random variables, show that E l Y - aXI 2 is minimum when a = E( YX)/E I X I 2 . 1 1 . 1 4. Show that the bivariate time series (Xn , Xd' defined in ( 1 1 .6.14) is stationary. 1 1 . 1 5. If A and its complex conjugate A are uncorrelated complex-valued random variables such that EA = 0 and E I A I 2 = (J2 , find the mean and covariance matrix of the real and imaginary parts of A. If X, = L}�1 ( Aj e i<;t + � e - i'i' ), 0 < il 1 < · · · < iln < rr, where {Aj , � ,j = 1 , . . . , n } are uncorrelated, E Aj = 0 and E I Aj l 2 = (Jl/2 , j = 1 , . . . , n, express X, as L}� 1 [Bj cos(iljt) + Cj sin(iljt)] and find the mean and variance of Bj and Cj . Problems 461 Y is a complex-valued random vector with covariance E( Y - Jl) ( Y - Jl) * = :E 1 + i:E2, verify that the matrix 1 1 . 1 6. If matrix :E := is the covariance matrix of a real-valued random vector. 1 1. 1 7. Let un = n [ Z, cos ( tw) _ 112 � L.. ' . r =l Z, sm ( tw) J where { Z, } is bivariate white noise with mean 0 and covariance matrix 1:, and 2nj/n E (0, n). Show that E V. V� = !(� n w1 = 1 1 . 1 8. 
If U1 and U2 are independent standard normal random variables, show that U2/U1 has a Cauchy distribution and that arg(U1 + iU2) is uniformly dis­ tributed on ( - n, n). 1 1 . 19. Verify the calculation of the asymptotic variances in equations ( 1 1 .7.8), ( 1 1 .7.9) and ( 1 1 .7. 1 1). 1 1 .20. Let { X, 1 , t = I , . . . , 63} and {X,2, t = I , . . . , 63} denote the differenced series {V In Yr 1 } , {V In Y, 2} where { Y, J } and { Y,2} are the annual mink and muskrat trappings (Appendix A, series H and I respectively). (a) Compute the sample cross correlation function of { Xr1 } and { X,2 } for lags between - 30 and + 30 using the program TRANS (b) Test for independence of the two series. 1 1.21 . With { Xr 1 } and { Xd as in Problem 1 1 .20, estimate the absolute coherency, I K 1 2(il)l and phase spectrum ¢'1 2(il), 0 � il. � n, using S PEC. What do these functions tell you about the relation between the two series? Compute approxi­ mate 95% confidence intervals for I K 1 2(il)l and ¢' 1 2(il). 1 1 .22. * Prove Remark 1 of Section 1 1.8. 1 1 .23. * Let {X, } be a bivariate stationary process with mean 0 and a continuous spectral distribution matrix F. Use Problem 4.25 and Theorem 1 1.8.2 to show that {X, } has the spectral representation X,1 = 2 r J (O.•J cos(vt) d U) v) + 2 r J(O.n] sin ( vt) d lj(v), j = I , 2, where {U(il) = ( U1 (il.), UAil.))' } and {V(il.) = ( V1 (il.), V2(..i.))' } are bivariate ortho­ gonal increment processes on [0, n] with 1 }, R e {d = 2 - E(dV(il.) dV'(fl)) E(dV(A.) dV' (!1)) and = i5;..� F(il.) r 1 i5;..� Re { d F (A.) } , 462 1 1 . Multivariate Time Series If {X,} has spectral density matrix f(A), then en{ A) = and r' Cov(dU, (A), dU2 (A)) = r' Cov(dV, (A), dV2 (A)) where c1 2 (A) is the cospectrum and q 1 2 (A) is the quadrature spectrum. Thus c1 2 (A) and q1 2 (A) can be interpreted as the covariance between the "in-phase" and "out of phase" components of the two processes { X,d and {X,J at frequency A. CHAPTER 1 2 State- Space Models and the Kalman Recursions In recent years, state-space representations and the associated Kalman recursions have had a profound impact on time series analysis and many related areas. The techniques were originally developed in connection with the control of linear systems (for accounts of this subject, see the books of Davis and Vinter ( 1 985) and Hannan and Deistler ( 1988)). The general form of the state-space model needed for the applications in this chapter is defined in Section 1 2. 1 , where some illustrative examples are also given. The Kalman recursions are developed in Section 1 2.2 and applied in Section 1 2.3 to the analysis of ARMA and ARIMA processes with missing values. In Section 1 2.4 we examine the fundamental concepts of controllability and observability and their relevance to the determination of the minimal dimension of a state-space representation. Section 1 2.5 deals with recursive Bayesian state estimation, which can be used (at least in principle) to compute conditional expectations for a large class of not necessarily Gaussian processes. Further applications of the Bayesian approach can be found in the papers of Sorenson and Alspach ( 1 97 1), Kitagawa ( 1987) and Grunwald, Raftery and Guttorp ( 1989). § 1 2. 1 State-Space Models In this section we shall illustrate some of the many time-series models which can be represented in linear state-space form. By this we mean that the series {Y�> t = 1, 2, . . . } satisfies an equation of the form t = 1 , 2, . . . ' ( 1 2. 1 . 1 ) 1 2. 
State-Space Models and the Kalman Recursions 464 where X, + I = F,X, + v, t = 1 , 2, . . . . ( 1 2. 1 .2) The equation ( 1 2. 1 .2) can be interpreted as describing the evolution of the state X, of a system at time t (a v x 1 vector) in terms of a known sequence of v x v matrices F 1 , F 2 , and the sequence of random vectors X 1 , V 1 , Equation ( 1 2. 1 . 1) then defines a sequence of observations, Y , which V2 , are obtained by applying a linear transformation to X, and adding a random noise vector, W,, t = 1 , 2, . . . . (The equation ( 12. 1 .2) is generalized in control theory to include an additional term H, u, on the right, representing the effect of applying a control u, at time t for the purpose of influencing X, + 1 .) • . . • • • . Assumptions. Before proceeding further, we list the assumptions to be used in the analysis of the state equation ( 1 2. 1 .2) and the observation equation ( 1 2. 1 . 1 ) : (a) F 1 , F 2 , is a sequence of specified v x v matrices. (b) G 1 , G 2 , . . . is a sequence of specified w x v matrices. (c) {X 1 , (V; , w;)', t = 1 , 2, . . . } is an orthogonal sequence of random vectors with finite second moments. (The random vectors X and Y are said to be orthogonal, written X l. Y, if the matrix E(XY') is zero.) (d) EV, = 0 and EW, = 0 for all t. (e) E(V,V ;) = Q,, E(W,W;) = R,, E(V, W;) = S, , where {Q,}, {R,} and {S,} are specified sequences of v x v, w x w and v x w matrices respectively. . . • Remark 1. In many important special cases (and in all the examples of this section) the matrices F, G,, Q, R, and S, will be independent of t, in which case we shall suppress the subscripts. It follows from the observation equation ( 1 2. 1 . 1 ) and the state equation ( 1 2. 1 .2) that X, and Y, have the functional forms, for t = 2, 3, . . . , Remark 2. and x, = !,( X I , v I • . . . , v, _ l ) ( 1 2. 1 .3 ) Y, = g,(X 1 , V � > . . . , V, _ 1 , W,). ( 1 2. 1 .4) Remark 3. From Remark 2 and Assumption (c) it is clear that we have the orthogonality relations, and V, l. Y, 1 � s � t, W, l. Y, 1 � s < t. As already indicated, it is possible to formulate a great variety of time-series (and other) models in state-space form. It is clear also from the 465 § 1 2. 1 . State-Space Models definition that neither {X r } nor {Y,} is necessarily stationary. The beauty of a state-space representation, when one can be found, lies in the simple structure of the state equation ( 1 2. 1 .2) which permits relatively simple analysis of the process { Xr }. The behaviour of {Y,} is then easy to determine from that of { X,} using the observation equation ( 1 2. 1 . 1 ). If the sequence { X 1 , V 1 , V 2 , . . . } is independent, then {X,} has the Markov property, i.e. the distribution of X, + 1 given X" . . . , X 1 is the same as the distribution of X, + 1 given X,. This is a property possessed by many physical systems provided we include sufficiently many components in the specification of the state X, (for example, we may choose the state-vector in such a way that X, includes components of X, _ 1 for each t). To illustrate the versatility of state-space models, we now consider some examples. More can be found in subsequent sections and in the books of Aoki ( 1 987) and Hannan and Deistler ( 1 988). The paper of Harvey ( 1984) shows how state-space models provide a unifying framework for a variety of statistical forecasting techniques. ExAMPLE 1 2. 1 . 1 (A Randomly Varying Trend With Added Noise). 
If {3 is constant, { V,} WN(O, a 2 ) and Z 1 is a random variable uncorrelated with { V, , t = 1 , 2, . . . }, then the process {Z,, t = 1 , 2, . . . } defined by t = 1 , 2, . . . ' (12.1 .5) Zr+ l = z, + {3 + v, = z l + {3t + VI + . . . + v,, � has approximately linear sample-paths if a is small (perfectly linear if a = 0). The sequence { V,} introduces random variation into the slope of the sample-paths. To construct a state-space representation for { Z,} we introduce the vector Then ( 1 2. 1 .5) can be written in the equivalent form, t = 1 , 2, . . . ' ( 1 2. 1 .6) where V, = ( V, , 0)'. The process {Z,} is then determined by the observation equation, Z, = [ 1 O]X, . A further random noise component can be added to Z,, giving rise to the sequence Y, = [ 1 O]X, + �' (12.1 .7) t = 1 , 2, . . . , where { � } WN(O, v 2 }. If {X 1 , V1 , W1 , V2 , W2 , } is an orthogonal sequence, the equations ( 1 2. 1 .6) and ( 1 2 . 1 .7) constitute a state-space representation of the process { Y,}, which is a model for data with randomly varying trend and added noise. For this model we have � • • • 1 2. State-Space Models and the Kalman Recursions 466 EXAMPLE 1 2. 1 .2 (A Seasonal Series with Noise). The classical decomposition ( 1 .4. 1 2) considered earlier in Chapter 1 expressed the time series { X1} as a sum of trend, seasonal and noise components. The seasonal component (with period d) was a sequence {s1} with the properties sr + d = S1 and L�= 1 s1 = 0. Such a sequence can be generated, for any values of s 1 , s0 , , s _ d + J • by means of the recursions, • • • sr + 1 = - sr - · · · - sr - d + 2 ' t = 1 , 2, . . . ' ( 12. 1 .8) A somewhat more general seasonal component { Y, }, allowing for random deviations from strict periodicity, is obtained by adding a term V, to the right side of ( 1 2. 1. 1 8), where { V,} is white noise with mean zero. This leads to the recursion relations, Y, + 1 = - Y, - . . . - Y. - d + 2 + v; , (12.1.9) t = 1 , 2, . . . . To find a state-space representation for { Y,} we introduce the (d - I )-dimensional state vector, The series { Y, } is then given by the observation equation, Y, = [1 0 0 ·· · O]Xl ' t = 1 , 2, . . . ' where { X1} satisfies the state equation, t = 1 , 2, . . . ' with V1 = ( V, , 0, . . . , 0)' and -1 1 F= 0 0 -1 0 0 -1 0 0 -1 0 0 0 ExAMPLE 1 2. 1 . 3 (A Randomly Varying Trend with Seasonal and Noise Components). Such a series can be constructed by adding the two series in Examples 1 2. 1 . 1 and 1 2. 1 .2. (Addition of series with state-space representations is in fact always possible by means of the following construction. See Problem 1 2.2.) We introduce the state-vector where X11 and X� are the state vectors in Examples 1 2. 1 . 1 and 1 2. 1 .2 respectively. We then have the following representation for { Y,}, the sum of the two series whose state-space representations were given in Examples § 1 2. 1 . State-Space Models 467 1 2. 1 . 1 and 1 2. 1 .2. The state equation is ( 12. 1 . 10) where F 1 , F 2 are the coefficient matrices and {V,1 }, { vn are the noise vectors in the state equations of Examples 1 2. 1 . 1 and 1 2. 1 .2 respectively. The observation equation is Y, = [ 1 0 0 ··· ( 1 2. 1 . 1 1 ) O]X, + W,, where { W,} is the noise sequence i n ( 1 2. 1 .7). I f the sequence of random vectors, {X 1 , v t, Vi, W1 , VL V�, W2 , . . . }, is orthogonal, the equations ( 1 2. 1 . 1 0) and ( 1 2. 1 . 1 1) constitute a state-space representation for { Y,} satisfying assumptions (aHe). 
We shall be concerned particularly in this chapter with the use of state-space representations and the Kalman recursions in the analysis of ARMA processes. In order to deal with such processes we shall need to consider state and observation equations which are defined for all t E { 0, ± 1 , . . . } . Stationary State-Space Models Defined for t E {0, ± 1 , . . . } Consider the observation and state equations, Y, = GX, + W,, t = 0, ± 1 , . . ( 1 2. 1 . 1 2) . ' ( 1 2. 1 . 1 3) t 0, ± 1, . . . X, + 1 = FX, + V, , where F and G are v x v and w x v matrices respectively, {VJ WN(O, Q), {W,} WN(O, R), E(V, w;) = S for all t and Vs .1 W, for all s =I= t. The state equation (12. 1 . 1 3) is said to be stable (or causal) if the matrix F has all its eigenvalues in the interior of the unit circle, or equivalently if det(J Fz) =1= 0 for all z E C such that I z I :s; 1 . The matrix F is then also said to be stable. In the stable case the equations ( 1 2. 1 . 1 3) have the unique stationary solution (Problem 1 2.3) given by 00 (12. 1 . 14) X, = L FiVt - j - 1 • j=O The corresponding sequence of observations, = ' � � - 00 Y, = w, + L GFiV, _ j - 1 • j= O is also stationary. 1 2. State-Space Models and the Kalman Recursions 468 ExAMPLE 1 2. 1 .4 (State-Space Representation of a Causal AR(p) Process). Consider the AR(p) process defined by t = 0, ± 1 , . . . , ( 1 2. 1 . 1 5) where {Zr } - WN(O, a 2 ), and ¢(z) := 1 - ¢ 1 z - · · · - r/J p zP is non-zero for l z l ::::; 1 . To express { X r } in state-space form we simply introduce the state vectors, t = 0, ± 1 , . . . . ( 1 2. 1 . 1 6) If at time t we observe r; = X" then from ( 1 2. 1 . 1 5) and ( 1 2. 1 . 1 6) we obtain the observation equation, r; = [0 0 0 . . . 1]X" ( 1 2. 1 . 1 7) t = 0, ± 1 , . . . , and state equation, xt + l = 0 0 0 0 0 0 0 cPp cPp - 1 cPp - 2 0 0 ¢1 Xr + 0 0 0 zt + l • t = 0, ± 1 , . . . . ( 1 2. 1 . 1 8) In Example 1 2. 1 .4 the causality condition, ¢(z) i= 0 for l z l ::::; 1 , is equivalent to the condition that the state equation ( 1 2. 1 . 1 8) is stable, since the eigenvalues of the coefficient matrix F in ( 1 2. 1 . 1 8) are simply the reciprocals of the zeroes of ¢(z) (Problem 1 2.4). The unique stationary solution of ( 1 2. 1 . 1 8) determines a stationary solution of the AR(p) equation ( 1 2. 1 . 1 5), which therefore coincides with the unique stationary solution specified in Remark 2 of Section 3. 1 . Remark 4. If equations ( 1 2. 1 . 1 7) and ( 1 2. 1 . 1 8) are postulated to hold only for t = 1, 2, . . . , and if X 1 is a random vector such that {X 1 , Z 1 , Z 2 , } is an orthogonal sequence, then we have a state-space representation for { r;} of the type defined earlier by ( 1 2. 1 . 1 ) and (12.1 .2). The resulting process { r;} is well-defined, regardless of whether or not the state equation is stable, but it will not in general be stationary. It will be stationary if the state equation is stable and if X 1 is defined by ( 1 2. 1 . 1 6) with X r = L� o t/Jj Zr - j • t = 1 , 0, . . . , 2 - p, and t/J(z) = 1/¢(z), i z l ::::; 1 . Remark 5. • • . EXAMPLE 1 2. 1 .5 (State-Space Representation of a Causal ARMA(p, q) Process). State-space representations are not unique. We shall give two §12. 1 . State-Space Models 469 representations for an ARMA(p, q) process. The first follows easily from Example 1 2. 1 .4 and the second (Example 1 2. 1 .6 below) has a state-space with the smallest possible dimension (more will be said on this topic in Section 12.4). 
Consider the causal ARMA(p, q) process defined by where { Z,} � ¢(B) ¥; = 8(B)Z, t = 0, ± 1, . . . 2 WN(O, a ) and ¢(z) =!= 0 for I z I ::::; 1. Let ( 1 2.1. 1 9) ' r = max(p, q + 1), ¢j = 0 for j > p, ej = 0 for j > q and e o = 1 . Then i t is clear from ( 1 2. 1 . 1 9) that we can write ¥; = [ 8, - 1 e, - 2 . . . Bo]X, ( 1 2. 1 .20) where ( 1 2.1.2 1 ) and t = 0, ± 1 , . . . . ¢(B)X, = Z, (12. 1 .22) But from Example 1 2. 1 .4 we can write 0 0 x, + 1 0 0 0 0 = 0 0 0 ¢ , c/Jr - 1 c/Jr - 2 X, + 0 0 0 ¢1 z, + 1 , t = 0, ± 1, . . . . ( 1 2. 1 .23) Equations ( 1 2.1 .20) and (12.1.23) are the required observation and state equations. The causality assumption implies that ( 1 2.1.23) has a unique stationary solution which determines a stationary sequence { ¥;} through the observation equation ( 1 2. 1 .20). It is easy to check that this sequence satisfies the ARMA equations ( 1 2. 1 . 1 9) and therefore coincides with their unique stationary solution. ExAMPLE 1 2.1.6 (The Canonical Observable Representation of a Causal ARMA(p, q) Process). Consider the ARMA(p, q) process { ¥;} defined by ( 1 2. 1 . 1 9). We shall now establish a lower dimensional state-space representation than the one derived in Example 1 2.1.5. Let Then m = max(p, q) and ¢i = 0 for j > p. ¥; = [1 0 0 · · · O]X, + Z, t = 0, ± 1, . . . ' ( 1 2.1 .24) 1 2. State-Space Models and the Kalman Recursions 470 where { X1 } is the unique stationary solution of 0 0 xl + l = 1 0 0 0 0 0 0 0 </Jm </Jm - 1 </Jm - 2 XI + </J I 1/1 1 1/1 2 1/Jm - 1 1/Jm t = 0, ± 1 , . . . ' zn (12.1 .25) and 1jJ 1 , . . . , 1/Jm are the coefficients of z, z 2 , . . . , zm in the power series expansion of (}(z)f<P(z), \ z \ :::;; 1. (If m = 1 the coefficients of X1 in ( 1 2. 1 .24) and ( 1 2. 1 .25) are 1 and <P 1 respectively.) PROOF. The result will be proved by showing that if { X1} is the unique stationary solution of (12.1.25) and if { l;} is defined by ( 1 2. 1 .24), then { l;} is the unique stationary solution of ( 1 2. 1 . 1 9). Let F and G denote the matrix coefficients of X1 in ( 1 2. 1 .25) and ( 1 2. 1 .24) respectively, and let H = ( 1/1 1 , 1/1 2 , . . . , 1/Jm)'. Then ( 1 2. 1 .26) i = 1, . . . , m, and, since det(z/ - F) = z m - </J 1 zm - l - · · · - <Pm (see Problem 1 2.4), the Cayley-Hamilton Theorem implies that ( 1 2. 1 .27) pm - </J ipm - 1 - . . . - </J m/ = 0. From (12.1 .24) and ( 1 2. 1 .25) we have GXI + zn 'Yr + 1 = GFX1 + GHZ1 + Z1 + 1 , Yr = These equations, together with (12.1.26) and ( 1 2 . 1.27), imply that 'Yr + m - </J 1 'Yr + m - 1 - · · · - </J m 'Yr 0 =[ - </Jm · · · - </J 1 1/1 1 1 ] 1/1 2 1/1 1 0 0 0 zl 0 zl + l 0 zl + 2 zl +m 1/Jm 1/Jm - 1 1/Jm - 2 Since 1/1 1 , . . . , 1/1 m are the coefficients of z, z 2 , . . . , zm in the power series expansion of (}(z)/</J(z), i.e. ljli = </J 1 1/lj - l + ¢ 2 1/li - 2 + · · · + <Pi + 8i as in (3.3.3), 471 §12. 1 . State-Space Models we conclude that { r;} satisfies the ARMA equations, Yr + m - <!> 1 Yr + m - 1 - · · · - ¢m 1'; = [8m em - ! 81 · · · z, Zr + l 1 ] Z, + z ( 1 2. 1 .28) Thus the stationary process { r;} defined by ( 1 2. 1 .24) and ( 1 2. 1 .25) satisfies ¢(B) r; = 8(B)Z, t = 0, ± 1, . . . , and therefore coincides with the unique stationary solution of these equations. ExAMPLE 1 2. 1 .7 (State-Space Representation of an ARIMA(p, d, q) Process). If { l';} is an ARIMA(p, d, q) process with {Vd l';} satisfying ( 1 2. 1 . 1 9), then by the preceding example {Vd l';} has the representation, t = 0, ± 1, . . . ' ( 12. 
1 .29) where { X,} is the unique stationary solution of the state equation, X, + I = FX, + HZ, and G, F and H are the coefficient matrices in (12. 1 .24) and ( 1 2. 1 .25). Let A and B be the d x 1 and d x d matrices defined by A = B = 1 if d = 1 and A= 0 0 0 , B= 0 0 0 0 0 0 ! + d d d ( - 1 ) (�) ( - 1 ) (d � d ( - 1 ) l (d � z ) if d > 1 . Then since r; = vd r; the vector 0 � . ( - 1)1 1'; -j , d) . ( j� L. I 0 0 1 d (12. 1 .30) } satisfies the equation, Y, = AVd l"; + BY, _ 1 = A GX, + BY, _ 1 + AZ,. Defining a new state vector T, by stacking X, and Y, _ 1 , we therefore obtain the state equation, t = 1 , 2, . . . ' (12. 1 .31) 1 2. State-Space Models and the Kalman Recursions 472 and the observation equation, from ( 1 2. 1 .29) and ( 12. 1 .30), (-1)d + 1 (�) (-1)d(d�d (-l)d - 1 (d �z) l'; = [G d] . . . t = [yt-Xt 1 J + Z� > 1 , 2, . . . , (12.1.32) with initial condition, and the orthogonality conditions, Y0 _l t = 0, ± 1, . . . , Z�> ( 1 2. 1 .33) . . . , Y0)'. The conditions (12.1 .33), which are satisfied where Y0 = in particular if Y0 is considered to be non-random and equal to the vector of observed values (y y2 . . . , y0)', are imposed to ensure that the j_ Y0 and assumptions (a)-(e) are satisfied. They also imply that Y0 j_ V l'; , t :2: 1, as required earlier in Section 9.5 for prediction of ARIMA processes. State-space models for more general ARIMA processes (e.g. { l';} such that {VV l';} is an ARMA(p, q) process) can be constructed in the same way. (See Problem 1 2.9.) (Y1 _d, Y2 - d, 1 - d, - d, X1 d 12 For the ARIMA( l , 1, 1) process defined by (1 + BB)Z� > {Z1} � WN(O, a 2), the vectors X1 and Y1 _ 1 reduce to 1 = X1 and Y1 _ 1 = fr _ . The state-space representation ( 1 2. 1 .31) and (12.1.32) becomes (Problem 1 2.7) (1 - ¢B)(l - B) l'; = Yr [ 1 [xt + 1 ] [¢ o][ xt [¢ + e]zp Yr 1 1 Yr- 1 J 1 = where = and X 1] [ Yr-xt 1 J + zp + 1 (12.1 .34) t= 1, 2, . . . , ( 1 2. 1 .35) ( 1 2. 1 .36) ExAMPLE 1 2. 1 .8 (An ARMA Process with Observational Error). The is not always the most canonical observable representation of Example convenient representation to employ. For example, if instead of observing the ARMA process { l';}, we observe 12.1.6 473 §12. 1 . State-Space Models where { N,} is a white-noise sequence uncorrelated with { Y,}, then a state-space representation for { U,} is immediately obtained by retaining the state equation ( 1 2. 1 .23) of Example 1 2. 1 .5 and replacing the observation equation (12.1.20) by U, = [8, _ , 8, _ 2 · · · 80]X, + N, . (The state-space model in Example 1 2. 1 .6 can also be adapted to allow for added observational noise.) State-space representations have many virtues, their generality, their ease of analysis, and the availability of the Kalman recursions which make least squares prediction and estimation a routine matter. Applications of the latter will be discussed later in this chapter. We conclude this section with a simple application of the state-space representation of Example 1 2. 1 .6 to the determination of the autocovariance function of a causal ARMA process. The Autocovariance Function of a Causal ARMA Process If { Y,} is the causal ARMA process defined by ( 1 2. 1 . 19), then we know from Example 1 2. 1 .6 and ( 12. 1 . 1 4) that where G = [ 1 0 0 · · · OJ and X, = HZ, _ , + FHZ, _ 2 + F2 HZ, _ 3 + · · · , with the square matrix F and the column vector H as in (12.1.25). 
It follows at once from these equations that 00 Y, = z, + I GFj - 1 HZ, _ j · j= 1 Hence (12.1.37) { l 2 if k = 0, , r(k) = a [ 1 + If= 1 GFj - H H'F'j - 1 G'] Y a 2 [GF ik i - ! H + If= , GFj - 1 HH'F' I k l + j - 1 G'] if k i= 0. The coefficients lj;j in the representation Y, ( 1 2. 1 .37) as { = If= 0 lj;j z, _ j can be read from 1 ifj = 0, = I j j 1/1 G F - H ifj � 1 ' which shows in particular that lj;j converges to zero geometrically as j -> oo . 474 12. State-Space Models and the Kalman Recursions This argument, unlike those used in Chapter 3, does not reqmre any knowledge of the general theory of difference equations. § 12.2 The Kalman Recursions In this section we shall consider three fundamental problems associated with the state-space model defined by (12. 1 . 1 ) and (12. 1 .2) under the assumptions (a)-(e) of Section 1 2. 1 . These are all concerned with finding best (in the sense of minimum mean-square error) linear estimates of the state-vector X, in terms of the observations Y 1, Y 2 , . . . , and a random vector Y 0 satisfying the conditions (12.2. 1 ) The vector Y 0 will depend o n the type of estimates required. I n many (but not all) applications, Y 0 will be the degenerate random vector Y0 = 1 ( 1 , 1, . . . , 1)'. Estimation of X, in terms of = (a) Y0, . . . , Y, 1 defines the prediction problem, (b) Y 0, . . . , Y, defines the filtering problem, and (c) Y0, . . . , Y" defines the smoothing problem (in which it is assumed that _ n > t). Each of these problems can be solved recursively using an appropriate set of Kalman recursions which will be established in this section. Before we can do so however we need to clarify the meaning of best linear estimate in this context. 12.2.1. The best one-step linear predictor, X10 of X, = (X, 1 , • • . , X r v)' is the random vector whose ith component, i = 1, . . . , v, is the best linear predictor of X ti in terms of all the components of the t vectors, Y0, y I ' . . . , Y, - I • More generally the best estimator x t lk of X, is the random vector whose ith component, i = 1, . . . , v, is the best linear estimator of X ti in terms of all the components of Y 0, Y 1 , . . . , Yk . The latter notation covers all three problems (a), (b) and (c) with k = t - 1 , t and n respectively. In particular X, = X,1, _ 1 • The corresponding error covariance matrices are defined to be Definition The Projection P(X I Y0 , . . . , Y1) of a Second-Order Random Vector X In order to find X, (and more generally X, 1 k) we introduce (cf. Section 1 1 .4) the projections P(X I Y0 , . . . , Y,), where X, Y0, . . . , Y, are jointly distributed §12.2. The Kalman Recursions 475 random vectors with finite second moments. If X is a v-component random vector with finite second moments we shall say that X E L2 . If X E L'2 , and Y0 , Y 1 , Y 2 , . . . have finite second moments, then we define P(X I Y0 , . . . , Y r) to be the random v-vector whose ith component is the projection P(X i i S) of the ith component of X onto the span, S, of all of the components of Y0, . . . , Yr . We shall abbreviate the notation by writing Definition 12.2.2. t = 0, 1 , 2, . . . ' throughout this chapter. The operator Pr is defined on U:� 1 L2 . Remark 1 . By the definition of P(Xi i S), PJX) is the unique random vector with components in S such that [X - Pr(X)] (See ( 1 1 .4.2) and ( 1 1 .4.3).) .1 Ys , s = 0, . . . ' t. For any fixed v, Pr( · ) is a projection operator on the Hilbert space inner product (X, Y) = Li� 1 E(Xi Y;) (see Problem 1 2. 1 0). 
Orthogonality of X and Y with respect to this inner product however is not equivalent to the definition E(XY') = 0. We shall continue to use the latter. Remark 2. L2 with Remark 3. If all the components of X, Y 1 , distributed and Y 0 = 1, then Pr(X) = E(X I Y �o · · · · Y r), Remark 4. Pr is then • . . , Y r are jointly normally t 2 1. linear in the sense that if A is any k x v matrix and X , V E L2 and Remark 5. If Y E L2 and X E L2 , then P(X I Y) = MY, where M is the v x w matrix, M = E(XY')[E(YY')r 1 and [E(YY')r 1 is any generalized inverse of E(YY'). (A generalized inverse of a matrix S is a matrix such that SS - 1 S = S. Every matrix has at least one. See Problem 1 2. 1 1 ). s- 1 Proposition 12.2.1. for t, s 2 1 , If { Xr } and { Yr } are defined as in ( 1 2. 1 . 1) and (12.1 .2), then (1 2.2.2) 1 2. State-Space Models and the Kalman Recursions 476 and in particular, X, = P, _ , (X,), (1 2.2.3) where X, and X,1 s are as in Definition 1 2.2. 1 and Y0 satisfies (12.2. 1). PROOF. The result is an immediate consequence of Definitions 1 2.2. 1 and 1 2.2.2. 0 We turn next to the derivation of the Kalman recursions for the one-step predictors of X, in the state-space model defined by ( 12. 1 . 1 ) and ( 1 2. 1 .2). Proposition 1 2.2.2 (Kalman Prediction). Suppose that X, + 1 = F, X, + V,, t = 1 , 2, . . . , ( 12.2.4) and where Y, = G,X, + W, , E Ut = E [ ] V ' = O, w, t = 1, 2, . . . E(U ' U')' ( 1 2.2.5) , [ Q , R, J, S, = s; X 1 , U 1, U 2 , . . . , are uncorrelated, and Y0 satisfies (12.2. 1). Then the one-step predictors, x, = P, _ , x , , and the error covariance matrices, n, = E[(X, - x,)(X, - x,n are uniquely determined by the initial conditions, and the recursions, for t = 1, 2, . . . , = = = = G,n, G; + R, , F,n, G; + s, , F, n, F; + Q , , F, 'P, F; + e, � ,- ' e; , n, + , - 'P, + , , x, + , = F, x, + e, � ,- ' (Y, - G, X,), where � ,- 1 is any generalized inverse of � , . �� e, n, + , 'P , + , n, + , = PROOF. We shall make use of the innovations, 1,, (1 2.2.6) ( 1 2.2.7) defined by 10 = Y0 and t = 1, 2, . . . . § 1 2.2. The Kalman Recursions 477 The sequence {11 } is orthogonal by Remark 1. Using Remarks 4 and 5 and the relation, Pr( ' ) = Pr t ( · ) + P( ' l lr), (see Problem 1 2. 1 2), we find that Xr + t = = = (1 2.2.8) Pr - t X r + t + P(Xr + t l lr) Pr - I (Fr xt + VI) + e� ��- � �� Ft x t + e � ��- � � 1 > ( 1 2.2.9) where �� = E(I1I;) = G1il1 G; + Rn et = E(X r + I I;) = E[(Fr Xr + vi)([Xr - XrJ'G; + w;)] = FrQr G; + St . To evaluate �1, 81 and il1 recursively, we observe that n t + l = E(Xt + l x; + d - E(Xr + l x; + d = nt + l - 'Pr + l , where, from ( 1 2.2.4) and ( 12.2.9), nt + 1 = Fr nt F; + Qt . and D Remark 6. The initial state predictor X 1 is found using Remark 5. In the important special case when Y0 = 1, it reduces to EX 1 . h-Step Prediction of {Y1} Using the Kalman Recursions The results of Proposition 1 2.2.2 lead to a very simple algorithm for the recursive calculation of the best linear mean-square predictors, P1 Y1 + h' h = 1 , 2, . . . . From ( 12.2.9), ( 1 2.2.4), ( 12.2.5), ( 12.2.7) and Remark 2 in Section 1 2. 1 , we find that ( 1 2.2. 1 0) h = 2, 3, . . . , and h = 1, 2, . . . . ( 1 2.2. 1 1) ( 1 2.2. 1 2) 478 1 2. State-Space Models and the Kalman Recursions From the relation, h = 2, 3, . . . , Q�h>: = E[(Xr+ h - P,X,+ h)(Xr+ h - P,X, +h)'] satisfies the recur­ h = 2, 3, . . . ' ( 1 2.2. 1 3) with op> = 0, + 1 . Then from (1 2.2.5) and ( 12.2. 1 2) it follows that l:�h > := E[(Y, +h - P,Y, +h)(Yr+h - P,Y,+ h)'] is given by h = 1 , 2, . . . . 
( 1 2.2. 14) (Kalman Filtering). Under the conditions of Proposition 1 2.2.2, and with the same notation, the estimates X,1 , = P,X, and the error covariance matrices 0,1 , = E[(X, - Xq,)(X, - X,1 ,)'] are determined by the relations, ( 1 2.2. 1 5) P,X, = P, _ 1 X, + O, G; L1,- 1 (Y, - G, X,), we find that swns, Proposition 1 2.2.3 and ( 12.2. 1 6) PROOF. From ( 1 2.2.8) it follows that where M = E(X, I;)[E(I, I;)]- 1 E[X,(Gr(X, Jt;)'].-1,- 1 1 ( 1 2.2. 1 7) = Q, G;.-1, . To establish ( 1 2.2. 1 6) we write X, - P, _ 1 X, = X, - P,X, + P,X, - P, _ 1 X, = X, - P,X, + MI,. Using ( 1 2.2. 1 7) and the orthogonality of X, - P,X, and MI,, we find from the last equation that = - X,) + as required. D Fixed Point Smoothing). Under the conditions of and Proposition 1 2.2.2, and with the same notation, the estimates the error covariance matrices are determined for fixed t by the following recursions, which can be solved successively for n t, t + 1 , . . . : ( 1 2.2. 1 8) Proposition 1 2.2.4 (Kalman = 0,1 n = E[(X, - X,1 n)(X, - X,1 ")'X,1] n = Pn X,, n, n+1 = O, n[Fn - E>nL1n- 1 GnJ', Q, l n = O, l n - 1 - O,, nG�.-1; 1 Gn o;, n , ( 1 2.2. 1 9) (1 2.2.20) 479 § 1 2.2. The Kalman Recursions with initial conditions, P, _ l x l = X, and n, , , = n, l r - 1 = n, (found from Proposition 1 2.2.2). PROOF. Using ( 1 2.2.8) we can write Pn X, = Pn _ 1 X, + Cin, where In = Gn(Xn - Xn) + Wn . By Remark 5 above, I c = E[X,(Gn(Xn - Xn) + Wn)'][E(In i�)] - = n,, n G��n- l ' (1 2.2.2 1) where n, , n := E [(X, - X,)(Xn - Xn)' ] . It follows now from ( 1 2.2.4), ( 1 2.2. 1 0), the orthogonality ofVn and wn with X, - X" and the definition ofn,,n that n r,n + I = E[(X, - X,)(Xn - XnY(Fn - en �n- 1 Gn)'] = n, , n[Fn - en�; 1 GnJ ' thus establishing ( 1 2.2. 1 9). To establish (1 2.2.20) we write X, - Pn X, = X, - Pn _ 1 X, - Ci n. Using ( 1 2.2.21) and the orthogonality of X, - PnX, and In, the last equation then gives n = t, t + 1, . . . , as required. D EXAMPLE 1 2.2. 1 (A Non-Stationary State-Space Model). Consider the univariate non-stationary model defined by t and where = 1, 2, . . . ' t = 1 , 2, . . . ' We seek state estimates in terms of 1 , Y1 , ¥2 , . . . , and therefore choose Y0 = 1 . I n the notation o f Proposition 1 2.2.2 we have n l = 'I' I = 1 , n l = 0, and the recurswns, � � = n, + 1 , e , = 2n, , nr + l = 40, + 1 = j-(4' + 1 - 1 ), ( 1 2.2.22) 4!1l '1' , + 1 = 4'1', + e,2 ��- 1 = 4'1', + -- , 1 + n, n r + l = n r + l - 'I' , + I = j-(4' + 1 - 1) - 'I' , + I · Setting '1', + I = - n, + I + j-(4' + I - 1), we find from the fourth equation that 4!1'2 + 1 - 1 ) = 4( - n, + 11 - n, + 1 + 11 3�4' - 1)) + --- . 3�4' 1 + n, 480 1 2. State-Space Models and the Kalman Recursions This yields the recursion Qt + 1 - 1 + 50, 1 + 0, ' --- from which it can be shown that 4 + 2j5 - (j5 - 1 )c2 - ' , where c = !{7 + 3j5). ( 12.2.23) 2 + (J5 + 3)c 2 - r We can now write the solution of ( 12.2.22) as 0, = {11, n, + 1, e , = 20, n, = �(4' - 1 ), \{1, = �(4' - 1 ) - n, , = ( 12.2.24) with n, as in ( 12.2.23). The equations for the estimators and mean squared errors as derived in the preceding propositions can be made quite explicit for this example. Thus from Proposition 1 2.2.2 we find that the one-step predictor of X, + 1 satisfies the recursions, 20, 1 + n, X,+ 1 = 2X, + -- ( I; - X,), � � � with x l = 1 , and with mean squared error Q, + 1 given b y ( 12.2.23). Similarly, from (12.2. 1 2) and (12.2.14), the one-step predictor of r; + 1 and its mean squared error are given by, and L� l J = 0, + 1 + 1 . 
The filtered state estimate for Xr + 1 and its mean squared error are found from Proposition 1 2.2.3 to be and Qt + 1 Qt + 1 lt + 1 - _:__c:__ 1 + Qt + 1 Finally the smoothed estimate of X, + 1 based on Y0 , Y1, . . . , r; + 2 is found, using Proposition 1 2.2.4 and some simple algebra, to be - _ 48 1 §1 2.2. The Kalman Recursions with mean squared error, n, + l nt + l lt + 2 - --1 + nt + 1 It is clear from ( 12.2.23) that the mean squared error, nn of the one-step predictor of the state, Xn converges as t --+ oo. In fact we have, as t --+ oo , -1 and ntlt + 1 --+ Js . 4 These results demonstrate the improvement in estimation of the state X, as we go from one-step prediction to filtering to smoothing based on the observed data Y0 , . . . , Yr + 1 . For more complex state-space models i t is not feasible to derive explicit algebraic expressions for the coefficients and mean squared errors as in Example 1 2.2. 1 . Numerical solution of the Kalman recursions is however relatively straightforward. Remark 7 . ExAMPLE 1 2.2.2 (Prediction of an ARIMA(p, 1 , q) Process). In Example 1 2. 1.7, we derived the following state-space model for the ARIMA(p, 1 , q) process { 1;} : 1; where = [G 1 ] [ X, 1 J Yr - + Z ,, [X, + 1 ] = [ OJ[ X,- 1 J [ ] Yr F G 1 1; X1 + t H z ,, 1 = 1 , 2, . . . ' t = 1 , 2, . . . ' z: FiH z, _ i' j=O t = 0, ± 1 , . . . ' 00 = ( 12.2.25) ( 12.2.26) and the matrices, F, G and H are as specified in Example 1 2. 1 .7. Note that Y0 (and the corresponding innovation I 0 = Y0) here refers to the first observation of the ARIMA series and not to the constant value 1. The operator P.( - ), as usual, denotes projection onto sp{ Y0 , . . . , 1;}. Letting T, denote the state vector (X;, 1; _ 1 )' at time t, the initial conditions for the recursions ( 1 2.2.6) and ( 12.2.7) are therefore 0 E(X� X'1 ) 0 1 = E(T1T'1) = , T 1 = P0 T 1 = Yo [J [ 482 1 2. State-Space Models and the Kalman Recursions The recursions ( 1 2.2.6) and ( 12.2.7) for the one-step predictors T, and the error covariance matrices n, = E[(T, - T,)(T, - T,)'] can now be solved. The h-step predictors and mean squared errors for the ARIMA process { Y,} are then found from ( 1 2.2. 1 1)-( 12.2. 14). It is worth noting in the preceding example, since X, = Y, - r; _ , is orthogonal to Y0 , t ;:::: 1 , that Remark 8. P,X, + 1 = P(X, + 1 I X 1 , . . . , X,) = X �+ 1 , where X�+ 1 is the best linear predictor of X, + 1 based on X 1 , , X,. Consequently, the one-step predictors of the state-vectors T, = (X;, Y, _ 1 )' are • . • t = 1 , 2, . . . , with error covariance matrices, n, = [�* �J where X� and � are computed by applying the recursions ( 1 2.2.6) and ( 12.2.7) to the state-space model for the ARMA process {X,}. Applying ( 1 2.2. 1 1) and ( 1 2.2. 1 2) to the model ( 12.2.25), ( 12.2.26) we see in particular that and In view of the matrix manipulations associated with state-space representations, the forecasting of ARIMA models by the method described in Section 9.5 is simpler and more direct than the method described above. However, if there are missing observations in the data set, the state-space representation is much more convenient for prediction, parameter estimation and the estimation of missing values. These problems are treated in the next section. Remark 9. 
§ 12.3 State-Space Models with Missing Observations State-space representations and the associated Kalman recursions are ideally suited to the precise analysis of data with missing values, as was pointed out by Jones (1980) in the context of maximum likelihood estimation for ARMA processes. In this section we shall deal with two missing-value problems for state-space models. The first is the evaluation of the (Gaussian) likelihood based on {Yi,, . . . , YJ where i 1 , i 2 , , i, are positive integers such that 1 � i 1 < i 2 < · · · < i, � n. (This allows for observation of the process {Y,} at • . . 483 §1 2.3. State-Space Models with Missing Observations irregular intervals, or equivalently for the possibility that (n - r) observations are missing from the sequence {Y 1 , , Yn} .) The solution of this problem will enable us, in particular, to carry out maximum likelihood estimation for ARMA and ARIMA processes with missing values. The second problem to be considered is the minimum mean squared error estimation of the missing values themselves. • . . The Gaussian Likelihood of 1 ::::;; i 1 < i2 < . . < i, ::::;; n {Yi1, • • • , Yd, · Consider the state-space model defined by equations ( 1 2. 1 . 1 ) and (12. 1.2) and suppose that the model is completely parameterized by the components of the vector 9. If there are no missing observations, i.e. if r = n and ij = j, j = 1, . . . , n, then the likelihood of the observations {Y 1 , . . . , Yn} is easily found as in ( 1 1 .5.4) to be L(9; Y l , . . . , Yn) where Yj = Pj - l Yj and Lj = Ljl ), j � 1 , are the one-step predictors and error covariance matrices found from ( 1 2.2. 1 2) and ( 1 2.2. 1 4) with Y0 == 1 . To deal with the more general case o f possibly irregularly spaced observations {Y ; ,, . . . , Y ;,}, we introduce a new series { Yr}, related to the process {X,} by the modified observation equations, Y� = Y 0 and t where G*I = {, G 0 if t E { i 1 , . . . , q, otherwise, W*I and { N ,} is iid with N, � N(O, / ) w x w , Ns _L X I , Ns _L = = 1 , 2, . . . {w, N, [�J ' ( 1 2.3. 1 ) if t E { i 1 , . . . , i.}, (1 2.3.2) otherwise, S, t = 0, ± 1 , . . . . (12.3.3) Equations ( 1 2.3.1) and ( 1 2. 1 .2) constitute a state-space representation for the new series {Yi}, which coincides with {Y,} at each t E { i 1 , i2 , . . . , ir }, and at other times takes random values which are independent of {Y,} with a distribution independent of 9. Let L 1 (9 ; Y; , , . . . , Y;) be the Gaussian likelihood based on the observed values Y;, , . . . , Y;, of Y ; , , . . . , Y ;, under the model defined by (12. 1 . 1 ) and (12.1.2). Corresponding to these observed values, we define a new sequence, 484 1 2. State-Space Models and the Kalman Recursions ' = {0 y f , . . . ' y: , by Yr y* ift E { i 1 , . . . , i,}, otherwise. ( 1 2.3.4) Then it is clear from the preceding paragraph that ( 12.3.5) L 1 (9 ; Y i , , . . . , yj,} = (2n)<n - r) w/ 2 L 2(9; y f , . . . , y:), where L2 denotes the Gaussian likelihood under the model defined by ( 1 2.3. 1) and (12.1.2). In view of ( 1 2.3.5) we can now compute the required likelihood L 1 of the realized values { y" t = i 1 , , i,} as follows : . • . (i) Define the sequence { y t, t = 1 , . . . , n} as in ( 1 2.3.4). (ii) Find the one-step predictors Y7 of Y7, and their error covariance matrices I:(, using Proposition 1 2.2.2 and the equations ( 1 2.2. 1 2) and (1 2.2. 1 4) applied to the state-space representation, ( 1 2.3. 1 ) and ( 1 2. 1 .2) of {Yn. 
Denote the realized values of the predictors, based on the observation sequence { y t}, by {yn. (iii) The required Gaussian likelihood of the irregularly spaced observations, { Yi , , . . . , y d, is then, by ( 12.3.5), L 1 (9 ; Y i, , · · · , Yi) = (2nl - rw/2 (n n J= 1 det I:J ) - 1 12 { 1 n exp - - _I (yJ - YJYI:J - 1 (yJ - YJl . 2 J= l ( 12.3.6) } ExAMPLE 1 2.3.1 (An AR(1) Series with One Missing Observation). Let { Y;} be the causal AR(1) process defined by To find the Gaussian likelihood of the observations y 1 , YJ , y4 and y 5 of Y1 , Y3 , Y4 and Y5 we follow the steps outlined above. (i) Set yt = Yi , i = 1, 3, 4, 5 and Y! = (ii) We start with the state-space model for { Y;} from Example 1 2. 1 .4, i.e. r; = X , , X, + 1 = ¢ X , + Z, + 1. The corresponding model for { Yt } is then, from ( 1 2.3. 1 ), t = 1 , 2, . . . X, + 1 = F, X, + v; , Y ( = G(X, + W(, t = 1 , 2, . . . 0. ' where G*l = { ' 0 1 if t i= 2, if t = 2, R*l = {0 if t 1 if t W*' = i= 2, = 2, s: = {o 0, if t i= 2, N, if t = 2, 485 §1 2.3. State-Space Models with Missing Observations and X 1 = L� o �jz 1 _ j (see Remark 5 of Section 1 2. 1 ). Starting from the initial conditions, 0 11- and applying the recursions ( 1 2.2.6), we find (Problem 1 2. 1 6) that t and t 1 = {¢ 0 if t = 1 , if t = 3, if t = 2, 4, 5, if t = 1 ' 3 , 4, 5, if t = 2, From ( 1 2.2. 1 2) and ( 1 2.2. 1 4) with h = 1 , we find that with corresponding mean squared errors, (iii) From the preceding calculations we can now write the likelihood of the original data as EXAMPLE 1 2.3.2 (An ARIMA( l , 1 , 1) Series with One Missing Observation). Suppose we have observations y0 , y 1 , y 3 , y4 and y 5 of the ARIMA( l , 1 , 1 ) process defined i n Example 1 2. 1 .7. The Gaussian likelihood of the observations y1, J3 , y4 and y5 conditional on Y0 = Yo can be computed by a slight modification of the method described above. Instead of setting Y0 = 1 as in the calculation of unconditional likelihoods, we take as Y 0 the one-component vector consisting of the first observation Y0 of the process. The calculation of the conditional likelihood is then performed as follows: (i) Set y[ = yi, i = 0, 1, 3, 4, 5 and Y i = 0. (ii) The one-step predictors Y;", t = 1, 2, . . . , and their mean squared errors are evaluated from Proposition 1 2.2.2, equations ( 1 2.2. 1 2) and ( 1 2.2. 14) and the state-space model for { Yi} derived from Example 1 2. 1 .7, i.e. X, + 1 Y;" = F, X, + V� > = G;"X, + W;", t = 1 , 2, . . . ' t = 1 , 2, . . . , 1 2. State-Space Models and the Kalman Recursions 486 G*1 S( = {a 2 [ ¢ + 1 e] 0 if t i= 2, if t = = {[[01 1 ] if t i= 2, OJ if t 2, if t i= 2, if t 2, if t i= 2, if t 2, = = = and 2, and using the recursions ( 12.2.6), ( 12.2. 1 2) and ( 1 2.2. 1 4), we obtain the . . . ' n, and mean squared errors, is the predicted values, in terms of ... evaluated at best linear predictor of d. · · · , (iii) The required likelihood of the observations conditional is found by direct substitution into ( 1 2.3.6). We shall not attempt on to write it algebraically. yf, YiY0 = y0, Yf, Yi Y�, Yf, , Y(_I.f,1 , ... ' I.! (.yi y0, y 1 , YJ, y4, y5, Remark 1. If we are given observations of an ARIMA(p, d, q) process at times 1 - d, 2 - d, . . . , 0, where 1 :::; < < · · · < ir :::; n, we can use the representation ( 1 2. 1 .3 1) and ( 1 2. 
1 .32) with the same argument as in Example 1 2.3.2 to find the Gaussian likeliconditional on hood of (Missing values among the first d observations can be handled by treating them as unknown parameters for likelihood maximization.) A similar analysis can be carried out using more general differencing operators of the form ( 1 - B)d( 1 - Bs)n (see Problem 1 2.9). The dimension of the state vector constructed in this way is max(p + d + sD, q). Different approaches to maximum likelihood estimation for ARIMA processes with missing values can be found in Ansley and Kohn ( 1985) and Bell and Hillmer ( 1990). i 1 i2 Y;,, . . . , Y;, y1 _d, y2 -d, . . . , y0, Y;i ,, ,. Y;. .,,, i.". . , Y;, 1 Y1_d = Y 1 - d, yY21 - d,d =y2Yz-d,d,. ....,.Yo, Y0 = Yo· - - Observation vectors from which some but not all components are missing can be handled using arguments similar to those used above. For details see Brockwell, Davis and Salehi ( 1990). Remark 2. 487 § 1 2.3. State-Space Models with Missing Observations Estimation of Missing Observations for State-Space Models Given that we observe only Y;,, Y;,, . . . , Y;, , 1 ::s; i1 < i2 < · · · < i, ::s; n, where {Y,} has the state-space representation ( 1 2. 1 . 1) and (12.1.2), we now consider the problem of finding the minimum mean square error estimators P(Y,IY0, Y;,, . . . , Y;,) of Y, , 1 ::s; t ::S; n where Y0 = 1. To handle this problem we again use the modified process {Yi} defined by ( 1 2.3. 1) and ( 12. 1 .2) with Y6 = 1 . Since Ys* = Ys for s E {i1, . . . , i,} and Y: .1 X" Y0 for 1 ::s; t ::s; n and s ¢ { 0, i1, . . . , i,}, we immediately obtain the minimum mean squared error state estimators, 1 ::s; t ::s; n. ( 1 2.3.7) The right-hand side can be evaluated by direct application of the Kalman fixed point smoothing algorithm (Proposition 1 2.2.4) to the state-space model ( 1 2.3. 1 ) and ( 1 2. 1 .2). For computational purposes the observed values of Y�, t ¢ { 0, i1, . . . , i,} are quite immaterial. They may, for example, all be set equal to zero, giving the sequence of observations of Yi defined in ( 1 2.3.4). In order to evaluate P(Y, I Y0, Y;,, . . . , Y;,), 1 ::s; t ::s; n, we use ( 1 2.3.7) and the relation, ( 1 2.3.8) Y, = G,X, + W, . Under the assumption that E(V, W ;) = S, we find from ( 1 2.3.8) that = 0, t = 1 , . . . , n, ( 1 2.3.9) P(Y, I Y0, Y; , , . . . , Y;,) = G, P(X, I Y6, Y f, . . . , Y:). ( 1 2.3. 1 0) It is essential, in estimating missing observations of Y1 with ( 1 2.3.10), to use a state-space representation for {Y,} which satisfies ( 12.3.9). The ARMA state-space representation in Example 1 2. 1 .5 satisfies this condition, but the one in Example 1 2. 1 .6 does not. ExAMPLE 1 2.3.3 (An AR(1) Series with One Missing Observation). Consider the problem of estimating the missing value Y2 in Example 1 2.3. 1 in terms of Y0 = 1 , Y1, Y3, Y4 and Y5. We start from the state-space model, X,+ 1 = cf;X1 + Zr + l• 1'; = X" for { J-;}, which satisfies the required condition (1 2.3.9). The corresponding model for { Yi} is the one used in Example 1 2.3. 1 . Applying Proposition 1 2.2.4 to the latter model, we find that Ps X z = P3 X z , n2. 3 = cf;a2 ' 488 1 2. State-Space Models and the Kalman Recursions and n 2 1 2 - 0" 2 , 2 n2 1 1 - (J , (J 2 , Q2 l r = ( 1 + c/J 2 ) t z 3, where P1( · ) here denotes P( · I Y�, . . . , Yt) and nr. n • nr l n are defined correspondingly. Since the condition ( 1 2.3.9) is satisfied, we deduce from ( 1 2.3. 
1 0) that the minimum mean squared error estimator of the missing value Y2 is with mean squared error, ExAMPLE 1 2.3.4 (Estimating Missing Observations of an ARIMA(p, d, q) Process). Suppose we are given observations Y1 _ d , Y2 _ d , . . . , Y0 , ¥; 1 , , Y;, ( 1 � i 1 < i 2 · · · < ir � n) of an ARIMA(p, d, q) process. We wish to find the best linear estimates of the missing values Y,, t ¢: { i 1 , . . . , ir} , in terms of Y, , t E { i 1 , . . . , ir} and the components of Y 0 := ( Y1 _ d, Y2 - d • . . . , Y0)'. This can be done exactly as described above provided we start with a state-space representation of the ARIMA series { Y,} which satisfies ( 12.3.9) and we apply the Kalman recursions to the state-space model for { Yi } . Although the representation in Example 1 2. 1.7 does not satisfy ( 12.3.9), it is quite easy to contruct another which does by starting from the model in Example 1 2. 1 .5 d for {V Y,} and following the same steps as in Example 1 2. 1 .7. This gives (Problem 1 2 . 8), • • • ( 1 2.3. 1 1) where [ Xr + I Yr OJ[ [ ] [ J J = F AG B Xr Yr - 1 + H 0 Zr + l • ( 1 2.3. 1 2) 1, 2, . . . . The matrices G, F and H are the coefficients in ( 1 2. 1 .20) and ( 1 2. 1 .23) and A and B are defined as in Example 1 2. 1 .7. We assume, as in Example 1 2. 1 .7, that for t = t = 0, ± 1 , . . . . ( 1 2.3. 1 3) This model clearly satisfies ( 12.3.9). Missing observations can therefore be estimated by introducing the corresponding model for { Yi } and using ( 1 2.3.7) and ( 1 2.3. 1 0) . § 12.4. Controllability and Observability 489 § 12.4 Controllability and Observability In this section we introduce the concepts of controllability and observability, which provide a useful criterion (Proposition 1 2.4.6) for determining whether or not the state vector in a given state-space model for {Y,} has the smallest possible dimension. Consider the model (with X, stationary), = FX1 + vl' Y, = GX1 + W,, x, + 1 where {[�J} � ( [;, !]) WN o, t = 0, ± 1, . . . ' t = 0, ± 1 , . . . ' ( 1 2.4. 1) and F satisfies the stability condition det(/ - Fz) #- 0 for i z l ::; 1 . ( 1 2.4.2) From Section 1 2. 1 , X1 and Y1 have the representations _ j = 1 Fi - 1 yt -j ' Xt - 00 '\' i...J 00 Y, = I GFi- 1 yr -j + Wl ' j= 1 ( 1 2.4.3) and (X; , Y;)' is a stationary multivariate time series (by a simple generalization of Problem 1 1 . 14). To discuss controllability and observability, we introduce the subclass of stationary state-space models for {Y1} defined by the equations X1 + 1 yt = FX1 + HZn = GXt + Zt, t = 0, ± 1 , . . . ' t = 0, ± 1 , . . . ' ( 12.4.4) where {Z1} WN(O, t) with dimension v, H is a v x w matrix and F is stable. If the noise vector, Zl ' is the same as the innovation of Yl' i.e. if � ( 1 2.4.5) then we refer to the model ( 1 2.4.4) as an innovations representation. Obviously if {Y,} has an innovations representation then it has a representation of the form ( 1 2.4. 1 ) with stable F. The converse of this statement is established below as Proposition 1 2.4. 1 . Even with the restnchon that the white noise sequence {Z1} satisfies ( 12.4.5), the matrices F, G and H in the innovations representation ( 1 2.4.4) are not uniquely determined (see Example 1 2.4.2). However, if t is non-singular then the sequence of matrices { GFi - 1 H, j = 1, 2, . . . }, is Remark 1 . 1 2. State-Space Models and the Kalman Recursions 490 necessarily the same for all innovations representation of {Y1} since from 00 L: GFi - 1 HZ1 _ j + zl ' j= 1 i 1 it follows that GF - H = E(Y1Z; _ )t - 1 with Z1 given by ( 12.4.5). 
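The invariance noted in Remark 1 can be checked numerically: if the state coordinates are changed by any invertible matrix T, the triple (F, G, H) becomes (T⁻¹FT, GT, T⁻¹H), but the products GF^{j-1}H are unchanged. The following sketch (a Python illustration, not part of the text; the matrices F, G, H and the transformation T are arbitrary hypothetical values) computes these products for two such algebraically equivalent representations and verifies that they agree.

```python
import numpy as np

def markov_parameters(F, G, H, k):
    """Return the matrices G F^{j-1} H for j = 1, ..., k."""
    out, P = [], np.eye(F.shape[0])
    for _ in range(k):
        out.append(G @ P @ H)
        P = P @ F
    return out

# Hypothetical stable example with v = 2 state variables, w = 1 observation.
F = np.array([[0.5, 0.2],
              [0.0, 0.3]])
G = np.array([[1.0, 0.0]])
H = np.array([[1.0],
              [0.4]])

# An arbitrary invertible change of state coordinates.
T = np.array([[1.0, 1.0],
              [0.0, 2.0]])
F2, G2, H2 = np.linalg.inv(T) @ F @ T, G @ T, np.linalg.inv(T) @ H

for M1, M2 in zip(markov_parameters(F, G, H, 5),
                  markov_parameters(F2, G2, H2, 5)):
    assert np.allclose(M1, M2)   # G F^{j-1} H is the same for both triples
```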
Yl = Under assumption ( 12.4.2), the state-space model (12.4. 1 ) has a n innovations representation, i.e. a representation of the form (12.4.4) with noise vector Z1 defined by ( 12.4.5). Proposition 1 2.4.1 . PROOF. For the state vectors defined in ( 12.4. 1 ), set X(t l s) = P(X1 1 Yi, - oo < j ::;; s). Then, with Z1 defined by ( 1 2.4.5), Z1 l_ Yi, j < t, so that by Problem 12. 12, P( · I Yi, j ::;; t) = P( · I Yi, j < t) + P( · I Z1). Hence by Remark 5 of Section 1 2.2 and the orthogonality ofV1 and Yi,j < t, where X(t + l i t) = P(Xt + 1 1 Yj , j < t) + HZI = P(FXI + Vt i Yj , j < t) + HZI = FX(t i t - 1 ) + HZn ( 12.4.6) H = E(X1 + 1 z;) [ E(Z1Z;)] - 1 , and [E(Z1 Z;)] - 1 is any generalized inverse of E(Z1Z;). Since (X;, Y;)' is stationary, H can be chosen to be independent of t. Finally, since W1 l_ Yi, j < t, P(Y1 1 Yi,j < t) = P(GX1 + W1 1 Yi,j < t) = GX(t i t - 1). Together with ( 1 2.4.6), this gives the innovations representation, as required. X(t + l i t) = FX(t i t - 1 ) + HZn Y1 = GX(t i t - 1 ) + Z1, ( 12.4.7) D EXAMPLE 1 2.4. 1 . The canonical representation (12.1 .24), ( 1 2. 1 .25) of the causal ARMA process { I";} satisfying t; = <P 1 Yr - 1 + · · · + <Pp Yr - p + zt + e 1 z1 - 1 + · · · + eqzt - q' {Z1} � WN(O, a 2 ), ( 12.4.8) has the form ( 12.4.4). It is also an innovations representation if ( 1 2.4.8) is invertible (Problem 1 2. 1 9). Assuming that zl E sp{ Y., - 00 < s ::;; t} and defining Y(t l s) = P( l-; 1 lj, - oo < j ::;; s), § 1 2.4. Controllability and Observability 49 1 we now show how the canonical representation arises naturally if we seek a model of the form (1 2.4.7) with X(t [ t - 1) = ( Y( t [ t - 1), . . . , Y(t - 1 + m [ t - 1))' and m = max(p, q). Since we are also assuming the causality of ( 12.4.8), we have sp{ }j ,j � s} = sp{ Zj j � s}, so that from (5.5.4) , Y(t + j [ t) = L t/Jk Zr + j- k k =j w j = 0, . . . , m = max(p, q). ( 12.4.9) Replacing t by t + m in (1 2.4.8) and projecting both sides onto sp{ lj, j � t}, we obtain from (1 2.4.9) and the identity, t/Jm = L:k'= 1 ¢k t/Jm -k + em (see (3.3.3)), the relation m Y( t + m [ t) = I cp k Y(t + m - k [ t) + emzl k=l m = kL= l cpk Y(t + m - k [ t - 1) + t/Jm Z1 • ( 1 2.4. 1 0) Now the state-vector defined by X( t [ t - 1) = ( Y (t [ t - 1), . . . , Y( t - 1 + m [ t - 1 ))' satisfies the state equation X(t + 1 [ t) = FX(t - 1 [ t) + HZ, where F and H are the matrices in Example 1 2. 1.6 (use ( 12.4.9) for the first r - 1 rows and ( 12.4. 10) for the last row). Together with the observation equation, Y, = Y (t [ t - 1) + Y, - Y(t [ t - 1) = [1 0 . . . O]X(t [ t - 1) + Z, this yields the canonical state-space model of Example 12.1.6. Definition 1 2.4.1 (Controllability). The state-space model (1 2.4.4) is said to be controllable if for any two vectors xa and xb, there exists an integer k and noise inputs, Z l , . . . ' zk such that x k = xb when X o = Xa . I n other words, the state-space model i s controllable, if by judicious choice of the noise inputs, Z 1 , Z 2 , . . . , the state vector X , can be made to pass from xa to xb . In such a case, we have X o = Xa, X 1 = Fxa + HZ 1 , 492 1 2. State-Space Models and the Kalman Recursions and hence xb - Fkxa where = [H FH = ek[z;. ··· ··· Fk - l H] [Zi z;. _ 1 ··· Z'1 ] ' Z'1 ] ' , ( 1 2.4. 1 2) From these equations, we see that controllability is in fact a property of the two matrices F and H. We therefore say that the pair (F, H) is controllable if and only if the model ( 12.4.4) is controllable. Proposition 1 2.4.2. 
The state-space model ( 1 2.4.4), or, equivalently, the pair (F, H) is controllable ifand only ifev has rank v (where v is the dimension of X1). PROOF. The matrix ev is called the controllability matrix. If ev has rank v then the state can be made to pass from xa to xb in v time steps by choosing [ Z� 1 Z'1 ] ' = e�( ev e�) - (xb - F "x.). ··· Recall from Remark 2 of Section 2.5 that ev e� is non-singular if ev has full rank. To establish the converse, suppose that (F, H) is controllable. If A.(z) = det(F - zl) is the characteristic polynomial of F, then by the Cayley-Hamilton Theorem, A.(F) = 0, so that there exist constants {3 1 , • . . , f3v such that ( 1 2.4. 1 3) More generally, for any k on k) such that � 1 , there exist constants {3 1 , . • • , f3v (which depend ( 12.4. 1 4) This is immediate for k :$; v. For k > v it follows from ( 1 2.4. 1 3) by induction. Now if ev does not have full rank, there exists a non-zero v-vector y such that y'ev = y'[H F H · · · F" - 1 H] = 0', which, in conjunction with ( 1 2.4. 14), implies that y' Fj H = O', for j = 0, 1 , . . . , 0 Choosing Xa = and xb = y, we have from ( 1 2.4. 1 1) and the preceding equation that 1 y'y = y'(Fkxa + HZk + FHZk - l + · · · + Fk - H Z 1) = 0, which contradicts the fact that y # 0. Thus ev must have full rank. Remark 2. From the proof of this proposition, we also see that rank( ek) :$; rank( ev) for k rank(ek) = rank(ev) for k :$; > v, v. D 493 § 1 2.4. Controllability and Observability For k ::;; v this is obvious and, for k > v, it follows from (12.4. 14) since the columns of pv + iH, j ;;:::: 0, are in the linear span of the columns of C" . ExAMPLE 1 2.4.2. Suppose v = 2 and w = 1 with [ � .�] and H = F= " Then [�J. has rank one so that (F, H) is not controllable. In this example, . 5j - 2 for j ;;:::: 1 , Fi - 1 H = 0 [ J so that by replacing V1 and Wr i n ( 12.4.3) by HZ1 and Z0 respectively, we have f: G [0] .sj - z zr -j + zt . j= 1 Since the second component in X1 plays no role in these equations, we can eliminate it from the state-vector through the transformation X 1 = [ 1 OJXr = Xt t . Using these new state variables, the state-space system is now controllable with state-space equations given by Xr + 1 = .5 X r + 2Z0 Y, = Y, = G [�J xr + zr . This example is a special case of a more general result, Proposition 1 2.4.3, which says that any non-controllable state-space model may be transformed into a controllable system whose state-vector has dimension equal to rank(CJ ExAMPLE 1 2.4.3. Let F and H denote the coefficient matrices in the state equation (12. 1 .25) of the canonical observable representation of an ARMA(p, q) process. Here v = m = max(p, q) and since j = 0, 1, . . . ' we have t/Jv ] t/J v + I � t/Jz - 1 ' 494 1 2. State-Space Models and the Kalman Recursions where tj;j are the coefficients in the power series ( 1 2.4. 1 5) If Cv is singular, then there exists a non-zero vector, a = (av 1 > that _ . . • , a0)' such ( 1 2.4. 1 6) and hence k = v, v + 1 , . . . , 2v - 1 . ( 1 2.4. 1 7) Multiplying the left side of ( 1 2.4. 1 6) by the vector (¢v, . . . , ¢ 1 ) and using (3.3.4) with j > v, we find that ( 1 2.4. 1 7) also holds with k = 2v. Repeating this same argument with Cv replaced by the matrix [tj; i + j] i. j = 1 (which satisfies equation ( 1 2.4. 1 6) by what we have just shown), we see that ( 1 2.4. 1 7) holds with k = 2v + 1 . Continuing in this fashion, we conclude that ( 1 2.4. 1 7) is valid for all k ;;::: v which implies that a(z)tf;(z) is a polynomial of degree at most v - 1 , viz. 
8(z) v a(z)l/l(z) = a(z) - = b0 + b 1 z + · · · + bv _ 1 z - t = b(z), ¢(z) v where a(z) = a0 + a 1 z + · · · + av _ 1 z - t . In particular, ¢(z) must divide a(z). This implies that p ::; v - 1 and, since v max(p, q), that v = q > p. But since ¢(z) divides a(z), a(z)I/J(z) = b(z) is a polynomial of degree at least q > v - 1 ;;::: deg(b(z)), a contradiction. Therefore Cv must have full rank. = If the state-space model ( 12.4.4) is not controllable and k = rank( Cv), then there exists a stationary sequence of k-dimensional state-vectors {X1} and matrices F, fi, and G such that F is stable, (F, fi) is controllable and Proposition 1 2.4.3. xt + 1 = txt + Hz(' Yt = GX1 + Z1 • ( 1 2.4. 1 8) PROOF. For any matrix M, let Yf(M) denote the range or column space of M. By assumption rank(Cv) = k < v so that there exist v linearly independent vectors, v 1 , . . . , vv, which can be indexed so that �(Cv) = sp{v 1 , . . . , vk }. Let T denote the non-singular matrix T = [V 1 V 2 . V vJ . • • Observe that § 12.4. Controllability and Observability 495 where the second equality follows from Remark 2. Now set F = T � 1 FT and H = T � 1 H, so that [ ( 1 2.4. 1 9) TF = FT and TH = H. ] . · · · F� as F� 1 1 F� 1 2 and cons1. d enng on y t he fi rst k co umns By partltlomng F2 1 F 22 of the equation in ( 1 2.4. 1 9), we obtain [PI I ] I I = F[v 1 · · · vk]. [v 1 · · · v vJ F� z1 Since the columns of the product o n the right belong to sp{v 1 , , vd and since v 1 , . . . , v" are linearly independent, it follows that F 2 1 = 0. Similarly, by writing R = [:J • . • with H 1 a k x w matrix and noting that []l(H) £: sp{v 1 , vk}, we conclude that H 2 = 0. The matrices appearing in the statement of the proposition are now defined to be - � k k , F = F 1 1 , H = H 1 , and G = G T O • . . , [J ] � X where h k is the k-dimensional identity matrix. To verify that F, G and have the required properties, observe that p· � 1 pj � l n = - H 0 x [ -] H and rank[R PH · · · pv � 1 R] = rank[H fR · · · pv � 1 H] = rank[T� 1 H (T� 1 FT)(T � 1 H) · · · (T � 1 F" � 1 T)(T � 1 H)] = rank[H FH · · · F" � 1 H] = rank(C v) = k. By Remark 2 the pair (F, H) is therefore controllable. In addition, F satisfies the stability condition ( 12.4.2) since its eigenvalues form a subset of the eigenvalues of F which in turn are equal to the eigenvalues of F. Now let X1 be the unique stationary solution of the state equation x t + I = Px t + ilz1 • Then Y1 satisfies the observation equation Yt = GX1 + zl ' 1 2. State-Space Models and the Kalman Recursions 496 since we know from ( 1 2.4.4) that yt = zt + If= t GFj - t H zt j and since 1 k t j j G F - fi = G T � p t 1f [ ] ' - = G(TFj - t T - 1 )(TB) = GFj - 1 H, j = 1 , 2, . . . . D Definition 1 2.4.2 (Observability). The state-space system ( 1 2.4.4) is observable if the state X0 can be completely determined from the observations Y0, Y 1 , when Z0 = Z 1 = · · · = 0. For a system to be observable, X0 must be uniquely determined by the sequence of values GX0 , GFX0, GF 2 X0, . . . . • . . Thus observability is a property of the two matrices F and G and we shall say that the pair (F, G) is observable if and only if the system ( 1 2.4.4) is observable. If the v x kw matrix O� := [G' F'G' · · · F'k - 1 G'] [ ] has rank v for some k, then we can express X0 as GX0 G X X0 = (O� Ok) - 1 0� o GFk - 1X0 � = (O�Ok) - 1 0�(0k X0), showing that (F, G) is observable in this case. Proposition 1 2.4.4. The pair of matrices (F, G) is observable if and only if Ov has rank v. 
In particular, (F, G) is observable ifand only if(F', G') is controllable. The matrix Ov is referred to as the observability matrix. PROOF. The discussion leading up to the statement of the proposition shows that the condition rank(Ov) = v is sufficient for observability. To establish the necessity suppose that (F, G) is observable and Ov is not of full rank. Then there exists a non-zero vector y such that Ovy = 0. This implies that GFj - t y = 0 for j = 1 , . . . , v, and hence for all j ;::::: 1 (by ( 1 2.4. 1 4)). It is also true that GFj - t O = 0 showing that the sequence GFj - 1 X0, j = 1 , 2, . . . , is the same for X0 = y as for X0 = 0. This contradicts the assumed observability of (F, G), and hence 497 §1 2.4. Controllability and Observability rank(O") must be v. The last statement of the proposition is an immediate consequence of Proposition 1 2.4.2 and the observation that 0� = Cv where Cv is the controllability matrix corresponding to (F', G'). 0 ExAMPLE 1 2.4.3 (cont.). The canonical observable state-space model for an ARMA process given in Example 1 2.4.6 is observable. In this case v = m = max(p, q) and GFj - 1 is the row-vector, j = 1 , . . . ' v. from which it follows that the observability matrix Ov is the v-dimensional identity matrix. If (F, G) is not observable, then we can proceed as in Proposition 1 2.4.3 to construct two matrices F and G such that F has dimension k = rank(Ov) and (F, G) is observable. We state this result without proof in the following proposition. Proposition 1 2.4.5. If the state-space model ( 12.4.4) is not observable and k = rank(O"), then there exists a stationary sequence of k-dimensional state vectors {X,} and matrices F, fl and G such that F is stable, (F, G) is observable and x., + ! = tx, + flz,, Y, = GX, + z, . ( 1 2.4.20) The state-space model defined by (12.4.4) and ( 12.4.5) is said to be minimal or of minimum dimension if the coefficient matrix F has dimension less than or equal to that of the corresponding matrix in any other state-space model for {Y,}. A minimal state-space model is necessarily controllable and observable; otherwise, by Propositions 1 2.4.3 and 1 2.4.4, the state equation can be reduced in dimension. Conversely, controllable and observable innovations models with non-singular innovations covariance are minimal, as shown below in Proposition 1 2.4.6. This result provides a useful means of checking for minimality, and a simple procedure (successive application of Propositions 1 2.4.3 and 1 2.4.5) for constructing minimal state-space models. It implies in particular that the canonical observable model for a causal invertible ARMA process given in Example 1 2.4. 1 is minimal. Proposition 1 2.4.6. The innovations model defined by equations ( 1 2.4.4) and ( 1 2.4.5), with t non-singular, is minimal if and only if it is controllable and observable. PROOF. The necessity of the conditions has already been established. To show sufficiency, consider two controllable and observable state-space models satisfying (1 2.4.4) and ( 1 2.4.5), with coefficient matrices (F, G, H) and 498 1 2. State-Space Models and the Kalman Recursions (F, G, ii) and with state dimensions v and i5 respectively. It suffices to show that v = i5. Suppose that i5 < v. From Remark 1 it follows that j = 1, 2, . . . , GFj - 1 H = GFj - 1 fi, [ ] and hence, multiplying the observability and controllability matrices for each model, we obtain o v cv = GFH GF2H GF" - 1 H GF"H GF"._ 1 H GF"H GF2� - 1 H GH GFH . . 
= o-v cv - ( 1 2.4.21 ) Since 0" and Cv have rank v, �(Cv) = IR", �(O" C") = �(0) and hence rank(O"C") = v. On the other hand by ( 12.4.21 ), �(O"C") � �(O v), and since rank( Ov) = i5 (Remark 2), we obtain the contradiction i5 ;;::: rank(Ov Cv) = v. Thus i5 = v as was to be shown. 0 § 12.5 Recursive Bayesian State Estimation As in Section 1 2. 1 , we consider a sequence of v-dimensional state-vectors { XP t ;;::: 1 } and a sequence of w-dimensional observation vectors { Y0 t ;;::: 1 } . r I t will be convenient t o write y< J for the wt-dimensional column vector r y < J = (Y'I , Y � , . . . , Y;)'. In place of the observation and state equations ( 1 2. 1 . 1 ) and ( 12.1 .2), we now assume the existence, for each t, of specified conditional probability r r densities of Yr given (X 0 y< - I J) and of Xr + 1 given (X 0 y< l). We also assume that these densities are independent of y<t - I J and y<tJ, respectively. Thus the observation and state equations are replaced by the collection of conditional densities, t = 1 , 2, . . . , ( 1 2.5. 1 ) and t = 1 , 2, . . . , ( 1 2.5.2) with respect to measures v and Jl, respectively. We shall also assume that the initial state X 1 has a probability density p 1 with respect to f.l. If all the conditions of this paragraph are satisfied, we shall say that the densities {p 1 , p�ol, p�5l, t = 1 , 2 , . . . } define a Bayesian state-space model for {Yr}. In order to solve the filtering and prediction problems in this setting, we shall determine the conditional densities p�fl(x r 1 y<r )) of X r given y<tJ , and p�Pl(xr i Y<t - l l) of Xr given y<t - 1 ), respectively. The minimum mean squared r error estimates of Xr based on y< J and y<t - I J can then be computed as the r conditional expectations, E(Xr l y< l ) and E(Xr l yU - I l). § 1 2.5. Recursive Bayesian State Estimation 499 The required conditional densities p�fl and p�Pl can be determined from t = 1, 2, . . . } and the following recursions, the first of which is obtained by a direct application of Bayes' s Theorem using the assumption that the distribution of Y, given ( X , , yu - J l ) does not depend on yu - J l : {p 1 , p�0l, p�sl, = p�ol(y, I x,)p�Pl(x, I y<' - 1 l)/c,(y<tl), P� 1 1 (x, + J l y<'l) = p�fl(x, I yUl)p�sl(x, + 1 1 x,) d11(x,), PVl(x, I yUl) where f ( 1 2.5.3) (1 2.5.4) cr(y (tl ) = Pv, rvu - l )(y, l y<r - l l). The initial condition needed to solve these recursions is (1 2.5.5) The constant cr(y<' l) appearing in (12.5.3) is just a scale factor, determined by the condition J p�fl(x, 1 y<tl) d11(x,) = 1 . ExAMPLES 1 2.5. 1 . Consider the linear state-space model defined b y (12. 1 . 1 ) and (12.1 .2) and suppose, i n addition t o the assumptions made earlier, that { X 1 , W 1 , V 1 , W2 , V2 , . . } is a sequence of independent Gaussian random vectors, each having a non-degenerate multivariate normal density. Using the same notation as in Section 1 2. 1 , we can reformulate this system as a Bayesian state-space model, characterized by the Gaussian probability densities, . = n(x 1 ; EX � > f.! 1 ), p�ol(y, / X, ) = n(y, ; G, X, , R,), p�s l(x, + 1 1 X ,) = n(x, + 1 ; F, X" Q,), p 1 (x 1 ) ( 12.5.6) ( 1 2.5. 7) ( 1 2.5.8) where n(x ; �' I:) denotes the multivariate normal density with mean � and covariance matrix I:. Note that in this formulation, we assume that S, E( V, W;) = 0 in order to satisfy the requirement that the distribution of X, + 1 given (X" yul) does not depend on Y ( tl . 
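The recursions (12.5.3) and (12.5.4) stated in the next paragraph lend themselves to direct numerical approximation when the state is one-dimensional. The following sketch (Python, not part of the text; the callables p_obs and p_state are placeholders for the model's observation and state densities p_t^(o) and p_t^(s)) alternates the Bayes update with the propagation integral on a grid of state values.

```python
import numpy as np

def bayes_filter(y, x_grid, p1, p_obs, p_state):
    """Grid approximation to the filtering densities of a Bayesian state-space model.

    y       : observations y_1, ..., y_n
    x_grid  : equally spaced grid of state values
    p1      : initial state density evaluated on x_grid
    p_obs   : p_obs(y, x)   = observation density of Y_t given X_t = x
    p_state : p_state(u, x) = transition density of X_{t+1} = u given X_t = x
    """
    dx = x_grid[1] - x_grid[0]
    pred = np.asarray(p1, dtype=float)          # prediction density for X_1
    filtered = []
    for yt in y:
        # Bayes update: multiply by the observation density and renormalize.
        filt = pred * np.array([p_obs(yt, x) for x in x_grid])
        filt /= filt.sum() * dx
        filtered.append(filt)
        # Propagation: integrate the filtering density against the transition density.
        K = np.array([[p_state(u, x) for x in x_grid] for u in x_grid])
        pred = K @ filt * dx
    return filtered
```

The filtered mean, approximating E(X_t | Y^(t)), is then obtained as sum(x_grid * filt) * dx for each returned density.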
To solve the filtering and prediction problems in the Bayesian framework, we first observe that the conditional densities, p�fl and p�Pl are both normal. We shall write them (using notation analogous to that of Section 1 2.2) as = ( 1 2.5.9) and ( 1 2.5. 1 0) From (1 2.5.4), ( 1 2.5.8), (1 2.5.9) and (12.5. 1 0), we find that 1 2. State-Space Models and the Kalman Recursions 500 and 0, + 1 = F,n, 1 , F; + Q, . Substituting the corresponding density n(x, ; X, , n,) for ( 1 2.5.3) and ( 1 2.5.7), we find that rlii/ and -1 G; R,- 1 G, + n, 1 = G;R,- G, + (F, _ 1 Qr - 1 l r - 1 F; _ 1 + = m Q, _ d - 1 1 X,1, = X , + n, 1, G; R,- (Y, - G, X,). � � From (1 2.5.3) with conditions, and p�Pl (x, l Y ( ' - 1 l) pi_Pl(x 1 l y( 0 l) = n( x 1 ; EX 1 , Q 1 ) we obtain the initial n.- 1 1 n. - 1 ��1 1 1 = G '1 R 1- G 1 + ·�1 . Remark 1 . Under the assumptions made in Example 1 2.5. 1 , the recursions of Propositions 1 2.2.2 and 1 2.2.3 give the same results for X, and X, 1 " since for Gaussian systems best linear mean-square estimation is equivalent to best mean-square estimation. Note that the recursions of Example 1 2.5. 1 require stronger assumptions on the covariance matrices (including existence of R,- 1 ) than the recursions of Section 1 2.2, which require only the assumptions (a)-(e) of Section 1 2. 1 . 12.5.2. Application of the results of Example 1 2.5. 1 to Example 1 2.2. 1 (with X 1 = 1 and the additional assumption that the sequence { W1 , V1 , W2 , V2 , . . . } is Gaussian) immediately furnishes the recursions, 1 n,l/ = 1 + (4n, - 1 1 ' - 1 + 1 ) - , ExAMPLE X, I, = 2 xr - 1 1r - 1 + n, l l ¥; - 2X r - 1 l r - d , X , + 1 = 2Xt l t > n, + 1 = 4n, l , + 1, with initial conditions X 1 1 1 = 1 and Q 1 1 1 = 0. It is easy to check that these recursions are equivalent to those derived earlier in Example 1 2.2. 1 . ExAMPLE 1 2.5.3 (A Non-Gaussian Example). I n general the solution of the recursions ( 1 2.5.3) and ( 12.5.4) presents substantial computational problems. Numerical methods for dealing with non-Gaussian models are discussed by Sorenson and Alspach ( 1 971) and Kitagawa ( 1987). Here we shall illustrate the recursions (1 2.5.3) and ( 12.5.4) in a very simple special case. Consider the state equation, ( 1 2.5. 1 1) X, = aX, _ 1 , Problems 501 with observation density (relative to counting measure on the non-negative integers), p,(o) (y, l x,) = (nx,)Y e - nx ' 1 y, . , , y, = 0, 1, . . . ( 1 2.5. 1 2) ' and initial density (with respect to Lebesgue measure), ( 1 2 . 5 . 1 3) X ;:::: 0. (This is a simplified model for the evolution of the number X, of individuals at time t infected with a rare disease in which X, is treated as a continuous rather than an integer-valued random variable. The observation y; represents the number of infected individuals observed in a random sample consisting of a small fraction n of the population at time t.) Although there is no density p�sl(x, l x, _ 1 ) with respect to Lebesgue measure corresponding to the state equation ( 1 2.5 . 1 1 ), it is clear that the recursion ( 1 2.5.4) is replaced in, this case by the relation, ( 1 2.5. 1 4) while the recursion ( 1 2.5 . 3) is exactly as before. The filtering and prediction densities p�JJ and p�Pl are both with respect to Lebesgue measure. Solving for pf(l from ( 1 2.5. 1 3) and the initial condition ( 1 2.5.5) and then successively substituting in the recursions ( 1 2.5. 14) and ( 1 2.5.3), we easily find that t ::::: ( 1 2.5. 1 5) 0, and t ::::: 0, where a, = a + y 1 + . 
· + y, and A, = Aa 1 - ' + n:(l a - ')/( 1 - a - 1 ). In particular the minimum mean squared error estimate of x, based on y<n is the conditional expectation a,/A, The minimum mean squared error is the variance of the distribution defined by ( 1 2.5. 1 5), i.e. a,/A� . . - . Problems 1 2. 1 . For the state-space model ( 1 2. 1 . 1 ) and (12.1 .2), show that F, F, _ 1 · · · F 1 X 1 + v, + F, V, _ 1 + · · · + F, F, _ 1 · · · F2 V 1 x, + 1 = Y, = G,(F, _ 1 F, _ 2 · · · F 1 X 1 ) + G, V, - 1 + G, F, _ 1 F, _ 2 · · · F2 V 1 + W, . and + G,F, _ Iv,_ 2 + · · · 1 2. State-Space Models and the Kalman Recursions 502 These expressions define j, and g, in ( 1 2. 1 .3) and ( 1 2. 1 .4). Specialize to the case when F, = F and G, = G for all t. 1 2.2. Consider the two state-space models and { Xr + l . l : F 1X r 1 + Vt l , { Xt + l . z : Fz X,z + V,z , Y,. 1 - G 1 X, 1 + Wn , Y, 2 - G 2 X, 2 + W, 2 , where {(V ; 1 , W; 1 , V; 2 , W; 2)'} i s white noise. Derive a state-space representation for { (Y; 1 , Y ; 2)'}. 1 2.3. Show that the unique stationary solutions to equations ( 1 2. 1 . 1 3) and ( 1 2. 1 . 1 2) are given by the infinite series 00 X, = L j= 0 piy r - j - 1 and Y, = W, 00 + L j=O GFiV,_ j - 1 , which converge i n mean square provided det(J - Fz) o1 0 for I z I ::::; 1 . Conclude that {(X ; , v;)'} is a multivariate stationary process. (Hint : Show that there exists an 8 > 0 such that (I - Fz) - 1 has the power series representation, L:J= 0 Fizi, in the region l z l < 1 + 8.) 1 2.4. Let F be the coefficient of X, in the state equation ( 1 2. 1 . 1 8) for the causal AR(p) process Establish the stability of ( 1 2. 1 . 1 8) by showing that the eigenvalues of F are equal to the reciprocals of the zeros of the autoregressive polynomial c/J(z). In particular, show that det(z/ - F) = zPcjJ(z - 1 ). 1 2.5. Let { X, } be the unique stationary solution of the state equation ( 1 2 . 1 .23) and suppose that { Y,} is defined by ( 1 2 . 1 .20). Show that { Y,} must be the unique stationary solution of the ARMA equations ( 1 2. 1 . 1 9). 1 2.6. Let { Y,} be the MA( l ) process Y, = z, + ez ,_ 1, (a) Show that { Y,} has the state-space representation Y, = [ 1 O] X, where { X, } is the unique stationary solution of X, + 1 = [� �]x, [�jz,n + 503 Problems In particular, show that the state-vector X, may be written as (b) Display the state-space model for { Y;} obtained from Example 1 2 . 1 .6. 1 2.7. Verify equations ( l 2. 1 .34H 12 . 1 .36) for an ARIMA( 1 , 1 , 1 ) process. 1 2.8. Let { Y;} be an ARIMA(p, d, q) process. By using the state-space model Example 1 2. 1 .5 show that { Y;} has the representation m Y; = GX, with X, + 1 = FX, + HZ, + 1 for t = 1 , 2, . . . and suitably chosen matrices F, G and H. Write down the explicit form of the observation and state equations for an ARIMA( 1 , 1 , 1 ) process and compare with equations ( 1 2 . 1 .34H 1 2 . 1 .36). 1 2.9. Following the technique of Example 1 2. 1 .7, write down a state-space model for { Y;} where {V'V' 1 2 Y;} is an ARMA(p, q) process. 2 1 2 . 1 0. Show that the set L� of random v-vectors with all components in L (Q, ff, P) is a Hilbert space if we define the inner product to be < X, Y) = L:r� 1 E(Xi Y;) for all X, Y E L� . If X, Y0 , . . . , Y, E L� show that P(X I Y0, . . . , Y,) as in Definition 1 2.2.2 is the projection of X (in this Hilbert space) onto S", the closed linear subspace of L� consisting of all vectors of the form C0 Y0 + · · · + C, Y" where C0 , . . . , C, are constant matrices. 1 2. 1 1 . Prove Remark 5 of Section 1 2.2. 
Note also that if the linear equation, Sx = b, has a solution, then x = s - 1 b is a solution for any generalized inverse 1 1 s- I of s. (If Sy = b for some vector y then S(S - b) = ss - Sy Sy = b.) = 1 2. 1 2. Let A1 and A2 be two closed subspaces of a Hilbert space £' and suppose that A1 .l A2 (i.e. x .l y for all x E A1 and y E A2). Show that where A1 EB A2 is the closed subspace {x + y: x E A1 , ( 1 2.2.8) follows immediately from this identity. y E A2}. Note that 1 2. 1 3. The mass of a body grows according to the rule X, + 1 = aX, + V., where X 1 i s known to be 1 0 exactly and { V. } a > 1, � Y; = X, + W, , WN(O, 1 ). At time t we observe 504 1 2. State-Space Models and the Kalman Recursions where { W,} - WN(O, I ) and { W,} is uncorrelated with { v;}. If P, denotes 2 projection (in L (Q, :1'; P)) onto sp { l , Y1, . . . , Y,}, t ?. 1 , and P0 denotes projection onto sp{ 1 } , (a) express a;+ 1 i n terms o f a; , where t = 1 , 2, . . . ' (b) express P,X, + 1 in terms of a;, Y, and P, _ 1 X,, (c) evaluate P2 X 3 and its mean squared error if Y2 = 1 2, and (d) assuming that Iim,_ ro a; exists, determine its value. a = 1 .5, 1 2. 1 4. Use the representation found in Problem 1 2.6(a) to derive a recursive scheme for computing the best linear one-step predictors Y, based on Y1, , Y, _ 1 and their mean squared errors. • • • 1 2. 1 5. Consider the state-space model defined by ( 12.2.4) and ( 12.2.5) with F, and G, G for all t and let k > h ?. I . Show that = = F and E(Y, + k - PY, + k)(Yc + h - P,Y, + h)' = GFk - hQ:h)G' + GFk - hSc + h · -1 1 2. 1 6. Verify the calculation of 0 , �, and Q, in Example 1 2.3. 1 . 1 2. 1 7. Verify the calculation of P 5 X 2 and its mean squared error in Example 1 2.3.3. 1 2. 1 8. Let y1 = - .2 1 0, y2 the MA( l ) process = .968, y4 = .618 and y5 Y, = Z, + .5Z, _ �> = - .880 be observed values of {Z,} - WN(O, 1 ). Compute P( Y6 1 Y1, Y2 , Y4, Y5) and its mean squared error. Compute P(Y7 1 Y1, Y2 , Y4, Y5) and its mean squared error. Compute P( Y3 1 Y1, Y2 , Y4, Y5) and its mean squared error. Substitute the value found in (c) for the missing observation y 3 and evaluate P( Y6 1 Y1, Y2 , Y3 , Y4, Y5) using the enlarged data set. (e) Explain in terms of projection operators why the results of (a) and (d) are the same. (a) (b) (c) (d) 12.19. Show that the state-space representation ( 1 2. 1 .24), ( 1 2. 1 .25) of a causal invertible ARMA(p, q) process is also an innovations representation. 1 2.20. Consider the non-invertible MA( l ) process, Y, = Z, + 2Z, _ 1, {Z,} - WN(O, 1). Find an innovations representation of { Y,} (i.e. a state-space model of the form ( 1 2.4.4) which satisfies (1 2.4.5)). 1 2.2 1 . Let { v;} be a sequence of independent exponential random variables with 1 E v; = t - and suppose that {X, , t ?. 1 } and { Y,, t ?. 1 } are the state and observation random variables, respectively, of the state-space system, x i = vi, x, = x, _ 1 + v;, t = 2, 3, . . . ' 505 Problems where the distribution of the observation 1;, conditional on the random variables X 1, Y2 , I :s; s < t, is Poisson with mean X, . (a) Determine the densities { p 1 , p�ol, p�sl, t 2 I }, in the Bayesian state-space model for { 1;}. (b) Show, using ( 1 2.5.3HI 2.5.5), that and ! P 2Pl ( x 2 I Y I ) (c) Show that = 2 2 + y, x l + y, e - zx, , r(2 + Y d 2 x2 > 0. x, > 0, and Xr + l > 0, where Ci1 = y1 + · · · + y, . (d) Conclude from (c) that the minimum mean squared error estimates of X, and X, + 1 based on Y1, . . . 
, 1;, are X, l , = t + Y1 + · · · + l'; t+ I -----­ and X, + t � respectively. t + l + Y1 + · · · + 1; t+I = ------ CHAPTER 1 3 Further Topics In this final chapter we touch on a variety of topics of special interest. In Section 1 3. 1 we consider transfer function models, designed to exploit, for predictive purposes, the relationship between two time series when one leads the other. Section 1 3.2 deals with long-memory models, characterized by very slow convergence to zero of the autocorrelations p(h) as h -+ oo. Such models are suggested by numerous observed series in hydrology and economics. In Section 1 3.3 we examine linear time-series models with infinite variance and in Section 1 3.4 we briefly consider non-linear models and their applications. § 1 3. 1 Transfer Function Modelling In this section we consider the problem of estimating the transfer function of a linear filter when the output includes added uncorrelated noise. Suppose that {X, t } and {X, 2 } are, respectively, the input and output of the transfer function model x, 2 = co L rj x r -j. l + N, , j=O ( 1 3. 1 . 1 ) where T = { ti , j = 0, 1 , . . . } i s a causal time-invariant linear filter and { N,} is a zero-mean stationary process, uncorrelated with the input process {X1 1 }. Suppose also that { X, J } is a zero-mean stationary time series. Then the bivariate process {(Xt l , X,2)'} is also stationary. From the analysis of Example 1 1 .6.4, the transfer function T(e - i '-) = L� 0 ti e- ii '-, - n < A. :::;; n, 507 § 1 3. 1 . Transfer Function Modelling can be expressed in terms of the spectrum of {(X,�> Xd'} (see 1 1 .6. 1 7)) as ( 1 3 . 1 .2) { The analogous time-domain equation which relates the weights tj } to the cross covariances is 00 f2 1 (k) = jL:= O tjy l l (k - j). ( 1 3 . 1 .3) X, This equation is obtained by multiplying each side of ( 1 3. 1 . 1) by - k , l and then taking expectations. The equations ( 1 3. 1 .2) and ( 1 3 . 1 .3) simplify a great deal if the input process � WN(O, then we can happens to be white noise. For example, if immediately identify tk from ( 1 3. 1 .3) as {X,J} ai), ( 1 3 . 1 .4) This observation suggests that "pre-whitening" of the input process might simplify the identification of an appropriate transfer-function model and at the same time provide simple preliminary estimates of the coefficients tk . If can be represented as an invertible ARMA(p, q) process, { X, J} ( 1 3. 1 .5) c/J(B)Xt ! = 8(B)Z,, { Z,} � WN(O, a;J, then application of the filter n(B) = ¢(B)8- 1 (B) to {X, t } will produce the whitened series { Z,}. Now applying the operator n(B) to each side of ( 1 3. 1 . 1 ) and letting Y, = n(B)X, 2 , we obtain the relation, Y, L tj Z, _ j + N;, j= O = where 00 N; = n(B)N, { N;} { and is a zero-mean stationary process, uncorrelated with Z, } . The same arguments which gave ( 1 3 . 1 .2) and ( 1 3 . 1 .4) therefore yield, when applied to (Z, , Y,)', 00 L h= T(e - ; .) = 2na; 2jyz(Jc) = ai, 2 Yrz(h)e - ih). - w and = PrzU)ariaz , where Prz( is the cross-correlation function of { Y, } and { Z, } ,fyz( is the cross spectrum, ai = Var(Z,) and a� = Var( Y,). Given the observations {(X tl , X, 2)', t = 1 , . . . , n}, the results of the previous paragraph suggest the following procedure for estimating { tJ and ·) tj · ) 1 3. Further 508 Topics analyzing the noise { N1} in the model ( 1 3 . 1 . 1 ) : <f. and 9 denote the maximum likelihood estimates of the autoregressive and ( 1) Fit an ARMA model to {X1 1 } and file the residuals {Z 1 , . • • , 2"} . 
Let moving average parameters and let «J� be the maximum likelihood estimate of the variance of { Zr }. (2) Apply the operator fi(B) = ify(B)fr 1 (B) to { X12}, giving the series { i\ , . . . , f.} . The values Y1 can be computed as the residuals obtained by running the computer program PEST with initial coefficients <f., 9 and using Option 8 with 0 iterations. Let 8� denote the sample variance of Y1 • (3) Compute the sample cross-correlation function Prz(h) between { � } and { .ZJ Comparison of pyz(h) with the bounds ± 1 .96rt - 1 1 2 gives a preliminary indication of the lags h at which Prz(h) is significantly different from zero. A more refined check can be carried out by using Bartlett 's formula (Theorem 1 1 .2.3) for the asymptotic variance of Prz(h). Under the assumptions that {Z1} � WN(O, 8�) and { (Y� ' Z1Y} is a stationary Gaussian process, { n Va r(p rz(h)) � 1 - p �z(h 1 .5 <Xl - k �� 00 (p �z(k) + p � y(k)/2) J + L [pyz(h + k)p rz(h - k) - 2pyz(h)p rz( - k)p h(h + k)]. k� - <Xl In order to check the hypothesis H 0 that Prz(h) = 0, h ¢ [a, b], where a and b are integers, we note from Corollary 1 1 .2. 1 that under H 0 , Var(p yz(h)) � n - 1 for h ¢ [a, b]. We can therefore check the hypothesis H 0 by comparing P rz , h ¢ [a, b] with the bounds ± l .96n - 1 1 2 • Observe that Prz(h) should be zero for h < 0 if the model ( 1 3 . 1 . 1) is valid. (4) Preliminary estimates of th for lags h at which Prz( h) is found to be significantly different from zero are For other values of h the preliminary estimates are th = 0. Let m ?: 0 be the largest value of j such that ti is non-zero and let b ?: 0 be the smallest such value. Then b is known as the delay parameter of the filter { ij}. If m is very large and if the coefficients {tJ are approximately related by difference equations of the form j ?: b + p, then T(B) = L'J'� b tiB can be represented approximately, using fewer parameters, as i 509 § 1 3 . 1 . Transfer Function Modelling In particular, if ij = 0, j < b, and 0 = w0 v( b' j ;;::: b, then ( 1 3 . 1 .6) T(B) = w0(1 - v 1 B) - 1 Bb . Box and Jenkins (1 970) recommend choosing T(B) to be a ratio of two polynomials, however the degrees of the polynomials are often difficult to estimate from {ti} . The primary objective at this stage is to find a para­ metric function which provides an adequate approximation to T(B) with­ out introducing too large a number of parameters. If T(B) is represented as T(B) = Bbw(B)v - 1 (B) = Bb(w0 + w 1 B + · · · + wq Bq)(1 - v 1 B - · · · - vP BP) - 1 with v(z) # 0 for l z l :::;; 1 , then we define m = max(q + b, p). (5) The noise sequence {N�' t = m + 1 , . . . , n} is estimated by iV, = x, 2 - f(B)Xtl. (We set N, = 0, t :::;; m , in order to compute N, , t > m = max(b + q, p).) (6) Preliminary identification of a suitable model for the noise sequence is carried out by fitting a causal invertible ARMA model ¢ < Nl(B)N, = e< Nl(B) W,, { W,} � WN(O, O"�), to the estimated noise N m + I ' . . . , Nn . (7) Selection o f the parameters b, p and q and the orders p 2 and q 2 of cP(N)( · ) and e< Nl( · ) gives the preliminary model, ¢< Nl(B)v(B)X, 2 Bb ¢< Nl(B)w(B)Xt 1 + e< Nl(B)v(B) W,, where T(B) Bb w(B)v - 1 (B) as in step (4). For this model we can compute W;(w, v, <P<Nl, o<Nl), t > m * = max(p 2 + p, b + p 2 + q), by setting W, = 0 for t :::;; m * . The parameters w, v, <P< NJ and o< Nl can then be estimated by mlfllm!Zlflg = = n ( 1 3 . 1 . 
7)

    Σ_{t=m*+1}^{n} W_t²(ω, v, φ^(N), θ^(N)),

subject to the constraints that φ^(N)(z), θ^(N)(z) and v(z) are all non-zero for |z| ≤ 1. The preliminary estimates from steps (4) and (6) can be used as initial values in the minimization, and the minimization may be carried out using the program TRANS. Alternatively, the parameters can be estimated by maximum likelihood, as discussed in Section 12.3, using a state-space representation for the transfer function model (see (13.1.19) and (13.1.20)).

(8) From the least squares estimators of the parameters of T(B), a new estimated noise sequence can be computed as in step (5) and checked for compatibility with the ARMA model for {N_t} fitted by the least squares procedure. If the new estimated noise sequence suggests different orders for φ^(N)(·) and θ^(N)(·), the least squares procedure in step (7) can be repeated using the new orders.

(9) To test for goodness of fit, the residuals from the ARMA fitting in steps (1) and (6) should both be checked as described in Section 9.4. The sample cross-correlations of the two residual series {Ẑ_t, t > m*} and {Ŵ_t, t > m*} should also be compared with the bounds ±1.96/√n in order to check the hypothesis that the sequences {N_t} and {Z_t} are uncorrelated.

EXAMPLE 13.1.1 (Sales with a Leading Indicator). In this example we fit a transfer function model to the bivariate time series of Example 11.2.2. Let

    X_{t1} = (1 − B)Y_{t1} − .0228,    t = 1, ..., 149,
and
    X_{t2} = (1 − B)Y_{t2} − .420,    t = 1, ..., 149,

where {Y_{t1}} and {Y_{t2}}, t = 0, ..., 149, are the leading indicator and sales data respectively. It was found in Example 11.2.2 that {X_{t1}} can be modelled as the zero-mean ARMA process,

    X_{t1} = (1 − .474B)Z_t,    {Z_t} ~ WN(0, .0779).

We can therefore whiten the series by application of the filter π(B) = (1 − .474B)^{-1}. Applying π(B) to both {X_{t1}} and {X_{t2}} we obtain

    σ̂_Z² = .0779    and    σ̂_Y² = 4.0217.

These calculations and the filing of the series {Ẑ_t} and {Ŷ_t} were carried out using the program PEST as described in step (2). The sample cross-correlation function ρ̂_YZ(h) of {Ŷ_t} and {Ẑ_t}, computed using the program TRANS, is shown in Figure 13.1.

[Figure 13.1. The sample cross-correlation function ρ̂_YZ(h), −20 ≤ h ≤ 20, of Example 13.1.1.]

Comparison of ρ̂_YZ(h) with the bounds ±1.96(149)^{-1/2} = ±.161 suggests that ρ_YZ(h) = 0 for h < 3. Since t̂_j = ρ̂_YZ(j)σ̂_Y/σ̂_Z is decreasing approximately geometrically for j ≥ 3, we take T(B) to have the form (13.1.6), i.e.

    T(B) = w_0(1 − v_1B)^{-1}B³.

Preliminary estimates of w_0 and v_1 are given by ŵ_0 = t̂_3 = 4.86 and v̂_1 = t̂_4/t̂_3 = .698. The estimated noise sequence is obtained from the equation

    N̂_t = X_{t2} − 4.86B³(1 − .698B)^{-1}X_{t1},    t = 4, 5, ..., 149.

Examination of this sequence using the program PEST leads to the MA(1) model,

    N̂_t = (1 − .364B)W_t,    {W_t} ~ WN(0, .0590).

Substituting these preliminary noise and transfer function models into equation (13.1.1) then gives

    X_{t2} = 4.86B³(1 − .698B)^{-1}X_{t1} + (1 − .364B)W_t,    {W_t} ~ WN(0, .0590).

Now minimizing the sum of squares (13.1.7) with respect to the parameters (w_0, v_1, θ_1^(N)) using the program TRANS, we obtain the least squares model

    X_{t2} = 4.717B³(1 − .724B)^{-1}X_{t1} + (1 − .582B)W_t,    {W_t} ~ WN(0, .0486),    (13.1.8)

where X_{t1} = (1 − .474B)Z_t, {Z_t} ~ WN(0, .0779).
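Steps (2)–(4) for this example reduce to a few lines of computation once the differenced, mean-corrected series are available. The sketch below (Python; an illustration rather than the programs PEST and TRANS used in the text, with arrays x1 and x2 assumed to hold {X_{t1}} and {X_{t2}}) applies the whitening filter (1 − .474B)^{-1} to both series and forms the preliminary estimates t̂_h = ρ̂_YZ(h)σ̂_Y/σ̂_Z.

```python
import numpy as np

def apply_inverse_ma1(x, theta):
    """Apply (1 - theta*B)^{-1} to the series x, i.e. out_t = x_t + theta*out_{t-1}."""
    out, prev = np.empty(len(x)), 0.0
    for i, xi in enumerate(x):
        prev = xi + theta * prev
        out[i] = prev
    return out

def cross_corr(y, z, h):
    """Sample cross-correlation rho_hat_YZ(h) for h >= 0."""
    y, z = y - y.mean(), z - z.mean()
    n = len(y)
    gamma = np.sum(y[h:] * z[:n - h]) / n
    return gamma / np.sqrt(np.mean(y ** 2) * np.mean(z ** 2))

# Assuming x1, x2 are numpy arrays holding the differenced, mean-corrected
# leading-indicator and sales series:
#   z = apply_inverse_ma1(x1, 0.474)          # whitened input  {Z_t}
#   y = apply_inverse_ma1(x2, 0.474)          # filtered output {Y_t}
#   t3 = cross_corr(y, z, 3) * y.std() / z.std()   # preliminary w0 (the text obtains 4.86)
#   t4 = cross_corr(y, z, 4) * y.std() / z.std()   # t4/t3 gives v1 (the text obtains .698)
```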
Notice the reduced white noise variance of { lt';} in the least squares model as compared with the preliminary model. The sample autocorrelation and partial autocorrelation functions for the senes N, = X, 2 - 4.7 1 7B3(1 - .724B) - 1 X, 1 are shown in Figure 1 3.2. These graphs strongly indicate that the MA( l ) model i s appropriate for the noise process. Moreover the residuals � obtained from the least squares model ( 1 3. 1 .8) pass the diagnostic tests for white noise as described in Section 9.4, and the sample cross-correlations between the residuals � and Z,, t = 4, . , 1 49, are found to lie between the bounds ± 1 .96/Jl44 for all lags between ± 20. . . 1 3. Further Topics 512 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0. 1 0 -0. 1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 . -0.9 - 1 0 1 0 20 30 40 20 30 40 (a) 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -0. 1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 - 1 0 1 0 (b) Figure 1 3.2. The sample ACF (a) and PACF (b) of the estimated noise sequence N, = X, 2 - 4.71 78 3 ( 1 - .724B) - 1 X11 of Example 1 3. 1 . 1 . 513 § 1 3 . 1 . Transfer Function Modelling A State-Space Representation o f the Series {(Xtt, X12)'}t A major goal of transfer function modelling is to provide more accurate prediction of X n +h. 2 than can be obtained by modelling {X1 2 } as a univariate series and projecting X " + h . 2 onto sp{ X 1 2 , 1 ::::;; t ::::;; n}. Instead, we predict xn + h , 2 using ( 1 3 . 1 .9) To facilitate the computation of this predictor, we shall now derive a state-space representation of the input-output series {(X1 1 , X1 2 )'} which is equivalent to the transfer function model. We shall then apply the Kalman recursions of Section 1 2.2. (The state-space representation can also be used to compute the Gaussian likelihood of {(X1 1 , X1 2 )', t = 1, . . . , n} and hence to find maximum likelihood estimates of the model parameters. Model selection and missing values can also be handled with the aid of the state-space representation; for details, see Brockwell, Davis and Salehi ( 1 990).) The transfer function model described above in steps ( 1)-(8) can be written as ( 1 3. 1 . 1 0) where { X t l } and { N1} are the causal and invertible ARMA processes ( 1 3. 1 . 1 1 ) { Zr } � WN(O, a�), ¢(B)X t 1 = B(B)Z� ' ( 1 3 . 1 . 1 2) </J(Nl(B)N I = ()(Nl(B)Ht;, { Ht;} WN(O, a �), � {Z1} is uncorrelated with { Ht;}, and v(z) =/= 0 for l z l ::::;; 1 . By Example 1 2. 1 .6, { X t l } and { N1} have the state-space representations Xr + l , l = F 1Xr 1 + H I ZP ( 1 3. 1 . 1 3) X t l = G 1 x1 1 + Zr , and Dr + I = F(Nlnr + H(Nl w;, ( 1 3. 1 . 14) Nr = G(Nlnr + w;, where (F � > G 1 , H d and (F< NJ, G< NJ, H(NJ) are defined in terms of the autoregressive and moving average polynomials for {X1} and {Nr}, respectively, as in Example 12.1 .6. In the same manner, define the triple (F 2 , G 2 , H 2 ) in terms of the "autoregressive" and "moving average" polynomials v(z) and zb w(z) in ( 1 3. 1 . 1 0). From ( 1 3. 1 . 1 0), it is easy to see that { X1 2 } has the representation (see Problem 1 3.2), ( 1 3. 1 . 1 5) where b = w0 if b = 0 and 0 otherwise, and { x1 2 } is the unique stationary solution of ( 1 3 . 1 . 1 6) t Pa ges 5 1 3-5 1 7 m a y be omitted without loss of continuity. 514 1 3. Further Topics Substituting from ( 1 3. 1 . 1 3) and ( 1 3 . 1 . 14) into ( 1 3. 1 . 1 5) and ( 1 3. 1 . 1 6), we obtain x t 2 = Gz Xtz + b G l xt l + b Zt + G(N)nt + w,, xt + I . Z = FzXtz + Hz G J xt l + HzZt . ( 1 3. 1 . 1 7) ( 1 3. 1 . 1 8) By combining ( 1 3. 1 . 1 3), ( 1 3 . 1 . 14), ( 1 3. 
1 . 1 7) and ( 1 3 . 1 . 1 8), the required state-space representation for the process, {(X t 1 , Xt 2 )', t = 1 , 2, . . . }, can now be written down as ( 1 3 . 1 . 1 9) where { 'It equation, = (xt' l , Xtz , nt')'} f [ ' the unique stationary solution of the state IS F, 'It + ! = H �G � 0 0 F(N) 0 Fz 0 ] [ H, + t Hz 'I 0 l/t,]l�J ( 1 3 . 1 .20) ExAMPLE 1 3. 1 . 1 (c on t .). The state-space model for the differenced and mean­ corrected leading indicator-sales data (with b = 3, w(B) = w0, v(B) = 1 - v 1 B, cp(B) = 1 , B(B) = 1 + B I B, cp(N)(B) = 1 and e<Nl(B) = 1 + BiN)B) is l j [ xt l where {'It equation, xt z = (x; 1 , x; 2 , n;)' } 'It + I = 0 0 0 = IS 0 0 0 Wo 0 0 0 1 o o o 1 o o o j [j o l 't + 1 zt w, . ( 1 3. 1 .2 1 ) the umque stationary solution of the state 0 0 0 0 0 0 0 0 VI 0 0 0 0 'It + 0 0 0 Wo 0 0 B<fl el 0 0 l�J. ( 1 3 . 1 .22) We can estimate the model parameters in ( 1 3. 1.21) and ( 1 3 . 1 .22) by maximizing the Gaussian likelihood ( 1 1.5.4) of {(Xt 1 , X t 2)', t = 1 , . . . , 149}, using ( 12.2. 1 2) and ( 1 2.2. 14) to determine the one-step predictors and their error covariance matrices. This leads to the fitted model, { Zr } Nt = ( 1 - .621 B) W,, { W,} "' WN(O, .0768), ( 1 3 . 1 .23) "' WN(O, .0457), which differs only slightly from the least squares model ( 1 3 . 1 .8). It is possible to construct a state-space model for the original leading indicator and sales data, { Yt : = ( l-; 1 , l-;2)', t = 0, 1, . . . , 149}, at the expense of increasing the dimension of the state-vector by two. The analysis is similar § 1 3. 1 . Transfer Function Modelling 515 to that given i n Example 1 2. 1 .7 for ARIMA processes. Thus we rewrite the model ( 1 3. 1 .21)-( 1 3 . 1 .22) as ( 1 3 . 1 .24) [ ] [ ] [ J [ J [ ]+[ ] [ ] [ ] + [z,J + [ ] + Observing that ·0228 Yr 1 = Y, r .420 1"; 2 v r; I = v r; 2 _ _ 0228 r · .420 - Y, _ 1 - (t - 1 ) = X, = GTJ, .0228 Yr - 1 , 1 - (t - 1 ) .420 r; _ 1 . 2 .0228 .420 .0228 .420 Y, _ 1 - (t - 1 ) VVr .0228 , .420 we introduce the state vectors, T, + 1 = (TJ; + 1 , Y; - tj.t')', where J.l' = (.0228, .420). It then follows from the preceding equation and ( 1 3 . 1 .24) that {Y, - tJ.1} has the state-space representation, for t = 1, 2, . . . , Y, - tJ.l = [G with state equation, ( 1 3 . 1 .25) T,+ [ F lzxz]T, + [�:], I= initial condition, /2x z]T, + [�J 0 G ( 1 3. 1 .26) W.' and orthogonality conditions, t = 0, ± 1 , . . . . + To forecast future sales, we apply the Kalman recursions, ( 1 2.2. 10)-­ ( 1 2.2. 14) to this state-space model to evaluate P.(Yn + h - (n h)J.l), where P.( · ) denotes projection onto sp{ Y0 , Y 1 - J.l, . . . , Y. - nJ.l} = sp{Y0 , X 1 , . . . , X.}. Then the required predictor P(Y" + h 1 1, Y0 , . . . , Y.) is given by P(Yn +h l l, Y o , . . . , Y.) = (n + h)J.l + P.(Yn + h - (n + h)J.l). 1 3. Further Topics 516 As in the case of an ARIMA process (see Remark 8, Section 1 2.2), this predictor can be computed more directly as follows. Since Y 0 is orthogonal to X 1 , X 2 , and 11 � > flz , . . . , we can write 0 . • • P o tt l = Ett J = and . , X, _ 1 ) = ft, for t 2 2. Similarly, P, ttr + h = P(tt, + h i X 1 , . . . , X ,) and P, X , + h = GP,ttr + h for all t 2 1 and h 2 1 . Both ft, and its error covariance matrix, P, _ 1 fl, = P(tt, I X 1 , . . n�. , = E(tt, - ft,)(tt, - ft,)', can be computed recursively by applying Proposition 1 2.2.2 to the model ( 1 3 . 1 .24) with initial conditions Tt ! = '1'�. I = Q, and 0, n�. 1 = E(tt J tt'J l = I FjQ ! F'j j=O 0.0 1 74 0.0000 1 .92 1 5 0.0000 0.0000 0.5872 - 0. 
1 7 1 9 0.4263 co 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 , 0.0000 - 0. 1 7 19 0.4263 0.5872 1.92 1 5 0.5872 1.92 1 5 0.5872 0.0000 0.0000 0.01 76 where Q 1 is the covariance matrix of V1 (see Problem 1 3.4). Consequently, the one-step predictors for the state-vectors, T, = (tt; , Y; _ 1 - (t - l)'Jl')', m ( 1 3. 1 .26) are Q [Q�·' OJ with error covariance matrices given by t = 0 0 , for t 2 1 . It follows from ( 1 2.2. 1 2), ( 1 3 . 1 .24) and ( 1 3 . 1 .25) that p 1 49(Y J s o - 1 50J1) = [G whence P(Y 1 so l l, Yo, · · · , Y 1 49 ) [y lt ! s o 1 49 - 149J1 J [ ]+[ ]+[ ]=[ ] = Jl = + / 2 x 2] X 1 so .0228 .420 + Y 1 49 . 1 38 - .232 1 3.4 262.7 1 3.56 . 262.89 ( 1 3 . 1 .27) 517 § 1 3 . 1 . Transfer Function Modelling [ ] [ ] [ Similarly, P(Y t s l l l , Y o , . . . , Y I49) = = .0228 .420 + P 1 4 9 X 1 s 1 + P(Y i s o l l , Y o , . . . , Y I 49) .0228 .420 + [ ] [ ] J 1 3.56 0 .92 1 + 262.89 = 1 3.58 . 264.23 ( 1 3. 1 .28) The corresponding one- and two-step error covariance matrices, computed from (1 2.2. 1 3) and (1 2.2. 1 4), are found to be [ Q�� i 49 ol = [ G /2 x 2] L l49 O 0 0 [ J and = where and Q1 [ X l s o)(X I 50 - X 1 5 o)' E(X I 5 0 .07 68 0 , = .0457 0 = J , /2 x 2] + [G [ < l Q 1249 = F J G 0 I2 X 2 J ( 1 3 . 1.29) .09 79 0 , 0 .0523 [ .0768 0 0 .0457 ][ ( 1 3. 1 .30) Q�� ll49 0 O 0 ][ is the covariance matrix of (V'1 , Z 1 , F' 0 W1 )'. Prediction Based on the Infinite Past For the transfer function model described by ( 1 3 . 1 . 10H 1 3 . 1 . 1 2), prediction of Xn + h. 2 based on the infinite past {(X1 1 , X, 2 )', - oo < t :s; n}, is substantially simpler than that based on {(X, 1 , Xd', 1 :s; t :s; n}. The infinite-past predictor, moreover, gives a good approximation to the finite-past predictor provided n is sufficiently large. The transfer function model ( 1 3 . 1 . 10H 1 3. 1 . 1 2) can be rewritten as where f3(B) = X, 2 = T(B)X, 1 + {3(B)W,, 1 X1 1 = 8(B)¢ - (B)Z, , (13.1.31) ( 1 3. 1 .32) (}( N)(B)/¢(N)(B). Eliminating x, l gives X, 2 = 00 00 L cxj Z, _ j + L f3j w, _ j • j= O j=O ( 1 3 . 1 .33) 1 3. Further Topics 518 where cx(B) = T(B)O(B)I¢(B). Our objective is to compute Pn X n + h, 2 : = psp{X,,, x,,, - aJ < t :5 n }x n + h, 2 ' Since {X,d and {N,} are assumed to be causal invertible ARMA processes, it follows that sp { (X, 1 , X, 2 )', - oo < t � n} = sp { (Z, Jt;)', - oo < t � n}. Using the fact that { Z,} and { Jt; } are uncorrelated, we see at once from ( 1 3. 1 .33) that Pn X n + h, 2 = L ct.j Zn +h - j + L J3j W, +h -j' ( 1 3 . 1 .34) j=h j=h Setting t = n + h in ( 1 3 . 1 .33) and subtracting ( 1 3 . 1 .34) gives the mean squared error, h-! h- ! ( 1 3 . 1 .35) E(Xn + h, 2 - Pn X n + h, 2) 2 = d I ct.J + O"fy L J3J. j=O j=O To compute the predictors Pn X n + h, z we proceed as follows. Rewrite ( 1 3. 1 .3 1 ) as ( 1 3 . 1 .36) aJ aJ where A, U and V are polynomials of the form, A (B) = 1 - A 1 B - · · · - A a Ba , U (B) = Uo + U1 B + · · · + UuB u , and 1 + V1B + · · · + VvB" . Applying the operator P" to equation ( 1 3 . 1 .36) with t = n + h, we obtain V(B) = a = v u I A j Pn X n + h - j, 2 + L Uj Pn X n + h - b -j, ! + L J-} W, + h - j ' j= ! j=O j=h ( 1 3. 1 .37) where the last sum is zero if h > v. Since { Xt l } is uncorrelated with { Jt;}, Pn X j 1 Psr;{x,, - ao < r ;<;n} X j 1 , and the second term in ( 1 3. 1 .37) is therefore obtained by predicting the univariate series { X r 1 } as described in Chapter 5 using the model ( 1 3. 1 .32). 
In keeping with our assumption that n is large, we can replace Pn X j by the finite past predictor obtained from the program PEST. The values Hj, j :-s:; n, are replaced by their estimated values Wj from the least squares estimation in step (7) of the modelling procedure. Equations ( 1 3 . 1 .37) can now be solved recursively for the predictors Pn Xn + ! , 2 , Pn X n + 2 , 2 ' Pn X n + 3, 2 ' ' ' ' ' Pn X n + h , 2 = ExAMPLE 1 3. 1 .2 (Sales with a Leading Indicator). Applying the preceding results to the series {(X, 1 , Xr z )', 1 � t � 1 49} of Example 1 3 . 1 . 1, we find from ( 1 3. 1 .23) and ( 1 3. 1 .37) that P 1 49 X 1 so, 2 = .726X 1 49, 2 + 4.70 1 X !4 7 ,1 - 1 .347 W1 4 9 + . 4 5 1 W! 4 s · § 1 3 . 1 . Transfer Function Modelling 519 Replacing X 1 4 9, 2 and X 1 4 7 , 1 by the observed values .08 and - .093 and W1 49 and W1 4 8 by the sample estimates - .0706 and . 1449, we obtain the predicted value P 1 4 9 X 1 50 . 2 = - .219. Similarly, setting X 1 4 8, 1 = .237, we find that P t4 9 X t 5 t , 2 = .726P t49 X t 5o, 2 + 4.70 1 X t4 s. t + .45 1 Wt 4 9 = .923. In terms of the original sales data { � 2 } we have ¥1 4 9, 2 = 262.7 and � 2 = � - 1 . 2 + x 1 2 + .420. Hence the predictors of actual sales are Pf4 9 ¥1 50, 2 = 262.7 - .2 1 9 + .420 = 262.90, P f4 9 Y1 5 t . 2 = 262.90 + .923 + .420 = 264.24, { ( 1 3 . 1 .38) where Pf4 9 means projection onto sp{ 1 , ¥1 4 9, 2 , (X5, 1 , X s, 2 )', - oo < s < 149}. These values are in close agreement with ( 1 3 . 1 .27) and ( 1 3 . 1 .28) obtained earlier by projecting onto sp{ 1 , ( � 1 , � 2 )', 0 � t � 1 49}. Since our model for the sales data is ( 1 - B) � 2 = .420 + 4.701B\1 - .476B)( l - .726B) - 1 Z1 + ( 1 - .621B) VJ--; , an argument precisely analogous to the one giving ( 1 3. 1 .35) yields the mean squared errors, h-1 h- 1 E( Yt 4 9 + h, 2 - Pf4 9 Yt 4 9 + h, 2) 2 = ai L rxf 2 + a� L [3f 2 , j=O j=O where 1 3 rxf zj = 4.70 1 z ( 1 - .476z)( 1 - .726z) - 1 ( 1 - z) L j=O 00 and L f3J zi ( 1 - .62 1 z)(1 - z) - 1 . 00 = j= O For h = 1 and 2 w e obtain E( Yt 5o, 2 - P f4 9 Yt 5o, 2 ) 2 = .0457, ( 1 3. 1 .39) E( Yt 5 t , 2 - Pf4 9 Yt 5 t , 2 ) 2 .0523, m agreement with the finite-past mean squared errors m ( 1 3 . 1 .29) and ( 1 3 . 1 .30). It is interesting to examine the improvement obtained by using the transfer function model rather than fitting a univariate model to the sales data alone. { = 520 1 3. Further Topics If we adopt the latter course we find the model, X, 2 - .249X, _ 1 _ 2 - . 1 99X, _ 2 , 2 = U,, where { U,} � WN(O, 1.794) and X, 2 = 1'; 2 r; 1 2 - .420. The correspond­ ing predictors of Y1 5 0 , 2 and Y1 5 1 , 2 are easily found from the program PEST to be 263. 14 and 263.58 with mean squared errors 1.794 and 4.593 respectively. These mean squared errors are dramatically worse than those obtained using the transfer function modeL - - . § 13.2 Long Memory Processes An ARMA process {X,} is often referred to as a short memory process since the covariance (or dependence) between X1 and X 1 +k decreases rapidly as k -> oo. In fact we know from Chapter 3 that the autocorrelation function is geometrically bounded, i.e. i p(k)i :::;; Cr\ k = 1, 2, . . . , where C > 0 and 0 < r < l . A long memory process is a stationary process for which p(k) � Ck 2d - l as k -> oo, ( 1 3.2. 1 ) where C # 0 and d < .5. [Some authors make a distinction between "inter­ mediate memory" processes for which d < 0 and hence L� _ 00 I p(k)l < oo, and "long memory" processes for which 0 < d < .5 and L k"= -oo i p(k)l oo.] 
There is evidence that long memory processes occur quite frequently in fields as diverse as hydrology and economics (see Hurst ( 1 95 1 ), Lawrance and Kottegoda (1 977), Hipel and McLeod ( 1 978), and Granger ( 1 980)). In this section we extend the class of ARMA processes as in Hosking ( 1 98 1 ) and Granger and Joyeux ( 1980) to include processes whose autocorrelation func­ tions have the asymptotic behaviour ( 1 3.2. 1). While a long memory process can always be approximated by an ARMA(p, q) process (see Sections 4.4 and 8. 1), the orders p and q required to achieve a reasonably good approximation may be so large as to make parameter estimation extremely difficult. For any real number d > - 1 , we define the difference operator Vd = ( 1 - B)d by means of the binomial expansion, = vd = ( 1 - B)d = where n- = 1 L nj Bi, 00 j=O ru - d) k-1-d = n ' k f (j + 1)r( - d) O<ksj j = 0, 1 , 2, . . . ' ( 1 3.2.2) 521 § 1 3.2. Long Memory Processes and r( · ) is the gamma function, r(x) : = {Iro t x- 1 e - 1 dt, oo, x-1 r(l + x), X > 0, X = 0, X < 0. Definition 1 3.2.1 (The ARIMA (O,d,O) Process). The process { X0 t = 0, ± 1 , . . . } is said to be an ARIMA (0, d, 0) process with d E ( - .5, .5) if { X, } is a stationary solution with zero mean of the difference equations, ( 1 3.2.3) The process { X1 } is often called fractionally integrated noise. Throughout this section convergence of sequences of random variables means convergence in mean square. Remark 1 . Remark 2. Implicit in Definition 1 3.2. 1 is the requirement that the series VdX, = 'If= 0 niX, _ i with { ni } as in ( 1 3.2.2), should be mean square con­ vergent. This implies, by Theorem 4. 10. 1, that if X1 has the spectral representation X, = f< - "· "/1" dZx(A.) then VdX, = f e il " ( 1 J(-1t,1t] - e - i). )d dZx(A.). ( 1 3.2.4) In view of the representation ( 13.2.3) of { Z1} we say that { X1} is invertible, even though the coefficients { ni } may not be absolutely summable as in the corresponding representation of { Z, } for an invertible ARMA pro­ cess. We shall say that { X, } is causal if X1 can be expressed as Remark 3. X, = L 1/Jj Zr -j j= ro O where '[,f= 0 1/J J < oo . The existence of a stationary causal solution of ( 13.2.3) and the covariance properties of the solution are established in the following theorem. Theorem 1 3.2.1 . If d E ( - .5, .5) then there is a unique purely nondeterministic stationary solution { Xr } of ( 1 3.2.3) given by ro xl = I 1/Jj Z,_j = v - dzo j= O ( 1 3.2.5) 1 3. Further Topics 522 where t/1· 1 = ru + dl TI k - 1 = k ['(j + 1 ) 1 (d) O<k ,;j + d j = 0, 1, 2, . . . . ' ( 1 3.2.6) Denoting byf( · ), y ( · ), p( · ) and oc( · ) the spectral density, autocovariancefunction, autocorrelation function and partial autocorrelation function respectively of {X, } , we have - n :s;: A y(O) p (hl = = CJ 2 1(1 - 2d)/12 ( 1 - d), qh + dl 1 (1 - dl ['(h - d + 1 ) ['(d) = TI k - 1 + d O<k ,; h k - d :S:: n, ( 1 3.2. 7) ( 1 3.2.8) ' h = 1, 2, . . . , ( 1 3.2.9) and oc(h) = d/(h - d), h = ( 1 3.2. 1 0) 1, 2, . . . . Remark 4. App1ying Stirling's formu1a, l(x) to ( 1 3.2.2), (1 3.2.6) and ( 1 3.2.9), we obtain � J2� e - x + 1 (x - lY - 1;2 as x � oo, as j � 00 , nj rd - 1 /1( - d) tf;i l - 1 /l(d) as j � oo, � (1 3.2. 1 1) � ( 1 3.2. 1 2) and p(h) � h 2d - 1 1(1 - d)j['(d) as h � oo. ( 1 3.2. 1 3) Fractionally integrated noise with d # 0 is thus a long memory process in the sense of Definition 1 3.2. 1 . Remark 5. Since sin A � A as A � 0, we see from ( 1 3.2.7) that (1 3.2. 
1 4) showing that f(O) is finite if and only if d :s;: 0. The asymptotic behaviour ( 1 3.2. 14) of f(A) as A � 0 suggests an alternative frequency-domain definition of long memory process which could be used instead of ( 1 3.2. 1 ). PROOF OF THEOREM 1 3.2. 1 . We shall give the proof only for 0 < d < .5 since the proof for - .5 < d < 0 is quite similar and the case d = 0 is trivial. From (1 3.2. 1 2) it follows that L� o tj;J < oo so that L tf;i e - ii · � ( 1 - e - ' Td as n � oo , j=O n ( 1 3.2. 1 5) 523 §1 3.2. Long Memory Processes where convergence is in L 2(dA.) and dA. denotes Lebesgue measure. By Theorem 4. 1 0. 1 , (1 - B)-dZ, : = L 1/Jj Zr-j j =O is a well-defined stationary process and if { Z, } has the spectral representation Z, f< - ,.,1 e i.l.r d W(A.), then ew(l - e - 0Td d W(A.). ( 1 - B)-dz, { 00 = = J ( - 7t , 7t] Since L� o I nil < oo (by ( 1 3.2. 1 1 )), we can apply the operator ( 1 - B)d = L� o niBi to (1 - B) - dz, (see Remark 1 in Section 4. 1 0), giving - (1 B)d(1 - B) - dZ, = { J ( - 7t, 7t] ew d W (A.) = Z,. Hence {X,} as defined by ( 1 3.2.5) satisfies ( 1 3.2.3). To establish uniqueness, let { Y,} be any purely nondeterministic stationary solution of ( 1 3.2.3). If { Y, } has the spectral representation, Y, = { J ( - n , n] e io. dZr(A.), then by ( 1 3.2.4), the process {( 1 - B)d Y,} has spectral representation, (1 - B)d Y, { = J ( - 7t , 7t] e i'.< ( l - e -i.< )d dZr(A.) and spectral density a2/(2n). By ( 1 3.2. 1 5), Theorem 4. 1 0. 1 and the continuity of Fr at 0, ( 1 - srdz, = ( 1 - B) -d ( l - B)d Y, i = lim 1/Ji B ( 1 - B)d Y, n-+oo 1=0 = { (.I ) J ( -1t, 7t] ( 1 - e - ; ;. ) - d(l - e -i.< )de ir .< dZy (A.) = Y,. ( 1 - B) - d Z, = X,. Hence Y, = By (4.1 0.8) the spectral density of {X,} is f(A.) = 1 1 - e - i.< l - 2da2/(2n) = 1 2 sin(A./2W 2da2/(2n). The autocovariances are 2f (J " = - n o cos(hA.)(2 sin(A./2W2d dA. ( - 1 t r ( l - 2d) 2, r(h - d + 1)r(1 - h - d( h = 0, 1 , 2, . . . . 524 13. Further Topics I" where the last expression is derived with the aid of the identity, n: cos (hn/2) 1 ( v + 1)2 1 -v . v-1 cos (hx)s m (x) dx = v v l(( + h + 1 )/2) 1(( v - h + 1 )/2) o (see Gradshteyn and Ryzhik ( 1 965), p. 372). The autocorrelations ( 1 3.2.9) can be written down at once from the expression for y(h). To determine the partial autocorrelation function we write the best linear predictor of xn+1 in terms of xn , . . . , x1 as .fn+1 = rPn 1 xn + . . . + rPnnx 1 and compute the coefficients rPni from the Durbin-Levinson algorithm (Prop­ osition 5.2. 1). An induction argument gives, for n = 1 , 2, 3, . . . , rPn) = · whence (X - n r(j - d)l(n - d - j + 1 ) ( ) j r( - d) l(n - d + 1 ) ' - l(h - d)1 (1 - d) j = 1 , . . . , n, d_ ( h) - rPnh - - 1 -_ . D ( - d) l(h - d + 1 ) h - d Fractionally integrated noise processes themselves are of limited value in modelling long memory data since the two parameters d and (J 2 allow only a very limited class of possible autocovariance functions. However they can be used as building blocks to generate a much more general class of long memory processes whose covariances at small lags are capable of assuming a great variety of different forms. These processes were introduced independently by Granger and Joyeux ( 1 980) and Hosking ( 1 98 1). (The ARIMA(p, d, q) Process with d E (- .5, .5)). { X1, t = 0, ± 1 , . . . } is said to be an ARIMA(p, d, q) process with d E ( - .5, .5) or a fractionally integrated ARMA(p, q) process if { X1} is stationary and satisfies the difference equations, Definition 13.2.2 ( 1 3.2. 
1 6) where { zt } is white noise and r/J, e are polynomials of degrees p, q respectively. Clearly { X1 } is an ARIMA(p, d, q) process with d E ( - .5, .5) if and only if Vd X1 is an ARMA(p, q) process. If 8 (z) =f. 0 for l z l ::::; 1 then the sequence 1; = rjJ(B)8 - 1 (B)X1 satisfies and r/J(B)X1 = 8(B) 1;, so that { X1 } can be regarded as an ARMA(p, q) process driven by fractionally integrated noise. 525 § 1 3.2. Long Memory Processes Theorem 1 3.2.2 (Existence and Uniqueness of a Stationary Solution of ( 1 3.2. 1 6)). Suppose that d E ( - 5, .5) and that ¢( - ) and 8( - ) have no common zeroes. (a) If r/J(z) =f 0 for l z l = 1 then there is a unique purely nondeterministic stationary solution of (1 3.2. 1 6) given by 00 xt = L 1/!j v -d Zr -j j= - ro where 1/J (z) = L� - oo 1/Jj zj = 8(z)/r/J(z). (b) The solution {Xr } is causal if and only if r/J (z) =f 0 for l z l :::; 1 . (c) The solution {Xr } is invertible if and only if 8(z) =f 0 for l z l :::; 1 . (d) If the solution {X, } is causal and invertible then its autocorrelation function p ( · ) and spectral density f( · ) satisfy, for d i= 0, p(h) � Ch 2 d - ! as h --> oo , where C =f ( 1 3.2. 1 7) 0, and as ). --> 0. PROOF. We omit the proofs of (a), (b) and (c) since they are similar to the arguments given for Theorems 3. 1 . 1 -3. 1.3 with Theorem 4. 1 0. 1 replacing Proposition 3. 1 .2. If { X, } is causal then r/J (z) =f 0 for l z l :::; 1 and we can write 00 X, = 1/! (B) Y, = L 1/!j Y, -j • j�O where is fractionally integrated noise. If Yr( · ) is the autocovariance function of { Y,}, then by Proposition 3. 1 . 1 (with 1/Jj := O,j < 0), Cov ( Xr+h• X,) = L L 1/Jj l/lk yy (h - j + k) j k I.e. Yx(h) = L y(k}yy(h - k), k ( 1 3.2. 1 9) where y(k) = Lj 1/Jj l/lj +k is the autocovariance function of an ARMA(p, q) pro­ cess with a 2 = 1. If follows that ly(k) l < Cr\ k = 0, 1 , 2, . . . , for some C > 0 526 1 3. Further Topics and r E (0, 1) and hence that h 1 - 2 d I lii(k) l -4 0 as h -4 oo. \k \> � From ( 1 3.2. 19) we have h l- 2 d Yx(h) = h l - 2 d I y (k)yy (h - k) \k\> � + ( 1 3.2.20) L y(k)h l - 2 d)ly (h - k) . \k\ s::;h ( 1 3.2.2 1 ) The first term o n the right converges to zero as h -4 oo b y ( 1 3.2.20). B y ( 1 3.2. 1 3) there is a constant C =1= 0 (since we are assuming that d =1= 0) such that Yr(h - k) C(h - k) z d- 1 Ch z d - 1 jh. Hence � uniformly on the set l k l ::;:; � Now letting h -4 oo in (13.2.2 1) gives the result (1 3.2. 1 7). Finally from (4.4.3) and ( 1 3.2.7) the spectral density of { X1} is f(A.) 1 8 (e - i'-) l 2 lt;b(e - i'-W 2/r(2) = Remark 6. A formula involving only the gamma and hypergeometric functions is given in Sowell ( 1990) for computing the autocovariance function of an ARIMA(p, d, q) process when the autoregressive polynomial ¢(z) has distinct zeroes. Remark 7 (The ARIMA(p, d, q) Process with d ::;:; - .5). This is a stationary process {X r} satisfying ( 1 3.2.22) It is not difficult to show that ( 1 3.2.22) has a unique stationary solution. Xt = r 1 (B)O(B)V - d Z1. The solution however is not invertible. Notice that if { X1 } is an ARIMA(p, d, q) process with d < .5 then { ( 1 B)X1 } is an ARIMA(p, d - 1 , q) process. In particular if 0 < d < .5, the effect of applying the operator ( 1 B) is to transform the long memory process into an intermediate memory process (with zero spectral density at frequency zero). - - 527 § 1 3.2. Long Memory Processes Parameter Estimation for ARIMA(p, d, q) Processes with d E ( - .5, .5) Estimation of The Mean. 
Let {X, } be the causal invertible ARIMA(p, d, q) process defined by d E ( - .5, .5). ( 1 3.2.23) A natural estimator of the mean EX, = J1 is the sample mean, X" = n�1 (X 1 + · · · + X" ). Since the autocorrelation function p ( · ) of {X, } satisfies p (h) � as h � conclude from Theorem 7. 1 . 1 that 0 and that nE(X" - Jt) 2 � {0 oo if - .5 < d < if O < d < .5. oo, we 0, Using ( 1 3.2. 1 3) we can derive (Problem 1 3.6) the more refined result, n1 � 2d E(Xn - J1)2 � C for d E ( - . 5, . 5), where C is a positive constant. For long memory processes the sample mean may not be asymptotically normal (see Taqqu ( 1 975)). Estimation of the Autocorrelation Function, p ( · ). The function p ( · ) is usually estimated by means of the sample autocorrelation function p( · ). In the case - .5 < d < { X, } has the moving average representation 0, X, = J1 + L 1/Jj Zr �j • 00 j�O with L}�o 1 1/!i I < oo If in addition { Z, } IID(O, a 2 ) and EZt < oo then n 1 12 (p(h) - p (h)) is asymptotically normal with mean zero and variance given by Bartlett's formula, (7.2.5). If 0 < d < .5 the situation is much more com­ plicated; partial results for the case when {Z, } is Gaussian can be found in Fox and Taqqu ( 1 986). � Estimation of d, <I> and 9 (a) Maximum likelihood. The Gaussian likelihood of X = (X 1 , . . . , X" )' for the process ( 1 3.2.23) with J1 0 can be expressed (cf. (8.7.4)) as = L ( p, a 2 ) = � { 1 )�t1 (2na 2 r"12 (r0, . . . , rn � 1 ) 1 12 ex p - 2 2 (j } (Xi - XYh � � · 528 13. Further Topics where � = (d, f/J 1 , . . . , f/JP , (J 1 , . . . , (Jq )', Xi , j = 1 , . . . , n, are the one-step predictors and ri - t = CJ - 2 E (Xi - xy, j = 1, . . . , n. The maximum likelihood estimators and 8 2 can be found by maximizing L( p, CJ 2 ) with respect to � and CJ 2 • By the same arguments used in Section 8.7 we find that a2 = n - 1 S( ), p p where p and is the value of � which minimizes ln(S(�)/n) + n - 1 L In ri -l · ( 1 3.2.24) j=l For {Z, } Gaussian, it has been shown by Yajima ( 1 985) in the case p = q = 0, d > 0, and argued by Li and McLeod ( 1 986) in the case d > 0, that /(�) n = ( 1 3.2.25) where W(�) is the (p + q + 1) x (p + q + 1) matrix whose (j, k) element is 1 W.k (R) - 4n 1 " _ f" _ " a In g(A. ; �) a In g(A. ; �) dA_ a{Jk a {Ji ' and CJ 2 g( ; �)/(2n) is the spectral density of the process. The asymptotic behaviour of is unknown in the case d < 0. Direct calculation of /(�) from ( 1 3.2.24) is slow, especially for large n, partly on account of the difficulty involved in computing the autocovariance function of the process ( 1 3.2.23), and partly because the device used in Section 8.7 to express Xi in terms of only q innovations and p observations cannot be applied when d i= 0. It is therefore convenient to consider the approximation to /(�), · p 1 I (w ) Ia(�) = In - L -"� 1 , i g(wi n ; �) where I.( · ) is the periodogram of the series {X 1 , . . . , X. } and the sum is over all non-zero Fourier frequencies wi = 2nj/n E ( - n, n]. Hannan ( 1 973) and Fox and Taqqu (1 986) show that the estimator p which minimizes lu(�) is consistent and, if d > 0, that p has the same limit distribution as in ( 1 3.2.25). The white noise variance is estimated by I (wi (j 2 = ! L . � . n i g(w i ; �) · The approximation la(�) to /(�) does not account for the determinant term n - 1 L'l = 1 ln ri - t = n - 1 ln det(CJ - 2 r.) where r. = E(XX'). Although n - 1 L ln ri _ 1 --> 0 j= 1 n as n --> oo , § 1 3.2. 
Long Memory Processes 529 this expression may have a non-negligible effect on the minimization of I(�) even for series of several hundred observations. A convenient approximation to the determinant term can be found from Proposition 4.5.2, namely n -1 In (ry ) g(wi; �) = n -1 � In g(w ; �). i Adding this term to Ia(�), we arrive at a second approximation to I given by ( 1 3.2.26) Estimation based on minimizing lb( · ) has been studied by Rice ( 1979) in a more general setting. For ARIMA(p, d, q) processes with d E ( - .5, .5), empirical studies show that the estimates which minimize lb tend to have less bias than those which minimize Ia . (b) A regressio n method. The second method is based on the form of the spectral density ( 1 3.2.27) where ( 1 3.2.28) is the spectral density of the ARMA(p, q) process, ( 1 3.2.29) Taking logarithms in ( 1 3.2.27) gives ; ln f(A.) = lnfu(O) - d ln l 1 - e - ;. 1 2 + ln[fu(A-)/fu (O)]. ( 1 3.2.30) Replacing A in ( 1 3.2.30) by the Fourier frequency wi = 2nj/n E (O, n) and adding In In(wi ) to both sides, we obtain In ln(w) = ln fu(O) - d ln l 1 - e -iw1 1 2 + ln(Jn(wi )/f(wj )) ( 1 3.2.3 1 ) + ln(fu(w)!fu(O)). Now if wi i s near zero, say wi ::;; wm where wm i s small, then the last term is negligible compared with the others on the right-hand side, so we can write ( 1 3.2.3 1 ) as the simple linear regression equation, j = 1 , . . . , m, ( 1 3.2.32) where lj = In In(w), xi = ln l l - e -iw1 l 2 , ei = ln(In(w)/f(w)), a = ln fu(O) and b = - d. This suggests estimating d by least-squares regression of Y1 , . . . , Ym on x 1 , . . . , xm . When this regression is carried out, we find that the least-squares estimator a of d is given by 530 I 3. Further Topics a m lm = - i� (X; - x) ( Y; - Y) i� (x; - x)2. ( 1 3.2.33) Geweke and Porter-Hudak ( 1 983) argue that when - .5 < d < 0 there exists a sequence m such that (In n)2/m --+ 0 as n --+ oo and ( / [ � x; ]) a is AN d, n2 6; ( - x)2 as n --+ oo . ( 1 3.2.34) Notice that n2/6 is the variance of the asymptotic distribution of ln(J (.l.)/f(.l.) ) for any fixed A E (0, n). Having estimated d, we must now estimate the ARMA parameters cp and 9. Since X, v - d V, where { V, } is an ARMA(p, q) process, we find from ( 10.3. 1 2) (replacing Z by U) that ( 1 3.2.35) lx(l) ( 1 - e - uT d lu (A.) + Y,(.l.) = = where lx( · ) and lu( · ) are the discrete Fourier transforms of {X1 , . . . , X. } and { U1 , . . . , v. } respectively. Ignoring the error term Y,(.l.) (which converges in probability to zero as n --+ 00 ) and replacing d by a, we obtain the approximate relation ( 1 3.2.36) If now we apply the inverse Fourier transform to each side of (1 3.2.36) we obtain the estimates of V,, t = 1, . . . , n, ( 1 3.2.37) where the sum is taken over all Fourier frequencies wi E ( - n, n] (omitting the zero-frequency term if d < 0). Estimates of p, q, cp and 9 are then obtained by applying the techniques of Chapter 9 to the series { 0, } . The virtue of the regression method i s that it permits estimation o f d without knowledge of p and q. The values { 0,} then permit tentative identifi­ cation of p and q using the methods already developed for ARMA processes. Final estimates of the parameters are obtained by application of the approxi­ mate likelihood method described in (a). ExAMPLE 1 3.2. 1 . We now fit a fractionally integrated ARMA model to the data {X, t = 1, . . . , 200} shown in Figure 1 3.3. The sample autocorrelation function (Figure 1 3.4) suggests that the series is long-memory or perhaps even non-stationary. 
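For readers who wish to experiment with the regression method described above, the following sketch carries out the least-squares regression (13.2.32)-(13.2.33) of Y_j = ln I_n(ω_j) on x_j = ln|1 − e^{−iω_j}|² over the first m Fourier frequencies and returns d̂ = −slope together with the approximate variance π²/[6∑(x_i − x̄)²] from (13.2.34). This is an illustration only, not the program PEST; the function name is ours, and the scaling chosen for the periodogram only shifts the regression intercept, so it does not affect d̂.

```python
import numpy as np

# A sketch (not the program PEST) of the regression estimator of d in
# (13.2.31)-(13.2.34): regress Y_j = ln I_n(w_j) on x_j = ln|1 - e^{-i w_j}|^2
# over the first m Fourier frequencies and take d_hat = -slope.
def gph_estimate(series, m):
    x = np.asarray(series, dtype=float)
    n = len(x)
    I = np.abs(np.fft.fft(x)) ** 2 / n                  # periodogram at w_j = 2*pi*j/n
    w = 2 * np.pi * np.arange(1, m + 1) / n             # frequencies in (0, pi)
    Y = np.log(I[1:m + 1])
    X = np.log(np.abs(1 - np.exp(-1j * w)) ** 2)
    slope, intercept = np.polyfit(X, Y, 1)              # least-squares fit of (13.2.32)
    d_hat = -slope                                      # cf. (13.2.33)
    var_d = (np.pi ** 2 / 6) / np.sum((X - X.mean()) ** 2)   # cf. (13.2.34)
    return d_hat, var_d

# Example usage with m = n^0.5, the choice suggested by Geweke and Porter-Hudak:
# d_hat, v = gph_estimate(series, m=int(len(series) ** 0.5))
```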
Proceeding under the assumption that the series is stationary, we shall fit an ARIMA(p, d, q) model with d ∈ (−.5, .5). The first step is to estimate d using (13.2.33). Table 13.1 shows the values of the regression estimate d̂ for values of m up to 45. The simulations of Geweke and Porter-Hudak (1983) suggest the choice m = n^{.5}, or 14 in this case. In fact from the table we see that the variation in d̂ is rather small over the range 13 ≤ m ≤ 35. It appears however that the term ln(f_U(ω_j)/f_U(0)) in (13.2.30) is not negligible for j ≥ 40.

Figure 13.3. The data {X_t, t = 1, ..., 200} of Example 13.2.1.

Figure 13.4. The sample autocorrelation function of the data shown in Figure 13.3.

Table 13.1. Values of the Regression Estimator d̂ in Example 13.2.1

   m     13    14    15    16    17    18    19    20    25    30    35    40    45
   d̂    .342  .371  .299  .356  .411  .421  .370  .334  .370  .409  .424  .521  .562

We take as our estimate the value of d̂ when m = 14, i.e. d̂ = .371, with estimated variance π^2/[6 ∑_{i=1}^{14}(x_i − x̄)^2] = .0531. An approximate 95% confidence interval for d is therefore given by (−.081, .500). (Although the asymptotic distribution (13.2.34) is discussed by Geweke and Porter-Hudak only for d < 0, their simulation results support the validity of this distribution even in the case when 0 < d < .5.)

Estimated values of U_t = ∇^{d̂}(X_t + .0434) are next found from (13.2.37). The sample autocorrelation function of the estimates {Û_t} (Figure 13.5) strongly suggests an MA(1) process, and maximum likelihood estimation of the parameters gives the preliminary model,

   ∇^{.371}(X_t + .0434) = Z_t + .816 Z_{t−1},   {Z_t} ~ WN(0, .489).        (13.2.38)

Finally we reestimate the parameters of the ARIMA(0, d, 1) model by minimizing the function l_b(d, θ) defined in (13.2.26). The resulting model (13.2.39), with {Z_t} ~ WN(0, .514), is very similar both to the preliminary model (13.2.38) and to the model (13.2.40), with {Z_t} ~ WN(0, .483), from which the series {X_t} was generated.

Figure 13.5. The sample autocorrelation function of the estimates Û_t of ∇^{.371}(X_t + .0433), Example 13.2.1.

Prediction of an ARIMA(p, d, q) Process, d ∈ (−.5, .5)

Let {X_t} be the causal invertible ARIMA(p, d, q) process,

   φ(B)∇^d X_t = θ(B)Z_t,   d ∈ (−.5, .5).        (13.2.41)

The innovations algorithm can be applied to the covariance function of {X_t} to compute the best linear predictor of X_{n+h} in terms of X_1, ..., X_n. For large n, however, it is much simpler to consider the approximation

   X̃_{n+h} := P_{sp̄{X_j, −∞ < j ≤ n}} X_{n+h}.

Since we are assuming causality and invertibility, we can write

   X_t = ∑_{j=0}^∞ ψ_j Z_{t−j}   and   Z_t = ∑_{j=0}^∞ π_j X_{t−j},

where ∑_{j=0}^∞ ψ_j z^j = θ(z)φ^{−1}(z)(1 − z)^{−d} and ∑_{j=0}^∞ π_j z^j = φ(z)θ^{−1}(z)(1 − z)^d, |z| < 1. Theorem 5.5.1 can be extended to include the process (13.2.41), giving

   X̃_{n+h} = −∑_{j=1}^∞ π_j X̃_{n+h−j} = ∑_{j=h}^∞ ψ_j Z_{n+h−j}        (13.2.42)

(with X̃_t := X_t for t ≤ n) and

   σ_n^2(h) = E(X_{n+h} − X̃_{n+h})^2 = σ^2 ∑_{j=0}^{h−1} ψ_j^2.        (13.2.43)

Predicted values of X_201, ..., X_230 were computed for the data of Example 13.2.1 using the fitted model (13.2.39).
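A minimal sketch of the truncated predictor (13.2.42) and the mean squared error formula (13.2.43) is given below for the special case of fractionally integrated noise (p = q = 0), where the weights π_j and ψ_j are available directly from (13.2.2) and (13.2.6). The function names are ours, and d and σ² are assumed known. For an ARIMA(p, d, q) model such as (13.2.39), the same recursion applies once π_j and ψ_j are replaced by the power-series coefficients of φ(z)θ^{−1}(z)(1 − z)^d and θ(z)φ^{−1}(z)(1 − z)^{−d}.

```python
import numpy as np

# A sketch of the truncated predictor (13.2.42) and the mean squared error
# (13.2.43) for fractionally integrated noise (p = q = 0), so that pi_j and
# psi_j are given directly by (13.2.2) and (13.2.6).
def fi_weights(d, n):
    """pi_j = prod_{0<k<=j}(k-1-d)/k  and  psi_j = prod_{0<k<=j}(k-1+d)/k."""
    j = np.arange(1, n + 1)
    pi = np.concatenate(([1.0], np.cumprod((j - 1 - d) / j)))
    psi = np.concatenate(([1.0], np.cumprod((j - 1 + d) / j)))
    return pi, psi

def predict(x, d, h, sigma2=1.0):
    """Truncated h-step predictors and their approximate MSEs (13.2.43)."""
    x = list(np.asarray(x, dtype=float))        # observed values X_1, ..., X_n
    n = len(x)
    pi, psi = fi_weights(d, n + h)
    for k in range(1, h + 1):                   # X~_{n+k} = -sum_{j>=1} pi_j X~_{n+k-j}
        x.append(-sum(pi[j] * x[n + k - 1 - j] for j in range(1, n + k)))
    mse = [sigma2 * np.sum(psi[:k] ** 2) for k in range(1, h + 1)]
    return x[n:], mse

# Example usage (illustrative values):
# preds, mses = predict(series, d=0.371, h=30, sigma2=0.514)
```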
The predictors X̃_{200+h} were found using a truncated and mean-corrected version of (13.2.42), namely

   X̃_{200+h} + .0433 = −∑_{j=1}^{199+h} π_j (X̃_{200+h−j} + .0433),

from which X̃_201, X̃_202, ..., can be found recursively (with X̃_t = X_t for t ≤ 200). The predictors are shown with the corresponding observed values X_201, ..., X_230 in Figure 13.6. Also shown are the predictors based on the ARMA(3, 3) model,

   (1 − .132B + .037B^2 − .407B^3)(X_t + .0433) = Z_t + 1.061 Z_{t−1} + .491 Z_{t−2} + .218 Z_{t−3},   {Z_t} ~ WN(0, .440),

which was fitted to the data {X_t, t = 1, ..., 200} using the methods of Section 9.2. The predictors based on the latter model converge much more rapidly to the mean value, −.0433, than those based on the long memory model.

Figure 13.6. The data X_201, ..., X_230 of Example 13.2.1, showing the predictors based on the ARMA model (lower) and the long memory model (upper).

The average squared errors of the 30 predictors are 1.43 for the long memory model and 2.35 for the ARMA model. Although the ARMA model appears to fit the data very well, with an estimated one-step mean square prediction variance of .440 as compared with .514 for the long memory model, the value .440 is a low estimate of the true value (.483) for the process (13.2.40) from which the series was generated. Both predictors converge, as the lead time h → ∞, to the mean −.0433, with mean squared errors approaching 2.173, the variance of the generating process. It is interesting to compare the rates of approach of the two prediction error variances to their asymptotic values. This is done in Table 13.2, which shows the ratio σ_n^2(h)/σ_n^2(∞) for both models, as computed from (13.2.43). It is apparent that the ARMA predictors are not appreciably better than the mean for lead times of 10 or more, while the advantage of the long memory predictor persists for much greater lead times.

Table 13.2. σ_n^2(h)/σ_n^2(∞) for the Long Memory and ARMA Models for the Data of Example 13.2.1

   h                    1     2     3     4     5     10    20    30    40    50
   Long memory model   .250  .606  .685  .728  .757  .830  .887  .914  .932  .945
   ARMA model          .261  .632  .730  .844  .922  .993  1.000 1.000 1.000 1.000

§13.3 Linear Processes with Infinite Variance

There has recently been a great deal of interest in modelling time series using ARMA processes with infinite variance. Examples where such models appear to be appropriate have been found by Stuck and Kleiner (1974), who considered telephone signals, and Fama (1965), who modelled stock market prices. Any time series which exhibits sharp spikes or occasional bursts of outlying observations suggests the possible use of an infinite variance model. In this section we shall restrict attention to processes generated by application of a linear filter to an iid sequence, {Z_t, t = 0, ±1, ...}, of random variables whose distribution F has Pareto-like tails, i.e.

   x^α (1 − F(x)) = x^α P(Z_t > x) → pC   as x → ∞,
   x^α F(−x) = x^α P(Z_t ≤ −x) → qC   as x → ∞,        (13.3.1)

where 0 < α < 2, 0 ≤ p = 1 − q ≤ 1, and C is a finite positive constant which we shall call the dispersion, disp(Z_t), of the random variable Z_t. From (13.3.1) we can write

   x^α (1 − F(x) + F(−x)) = x^α P(|Z_t| > x) → C   as x → ∞.        (13.3.2)

A straightforward calculation (Problem 13.7) shows that

   E|Z_t|^δ = ∞  if δ ≥ α,   and   E|Z_t|^δ < ∞  if δ < α.        (13.3.3)

Hence Var(Z_t) = ∞ for 0 < α < 2, and E|Z_t| < ∞ only if 1 < α < 2. An im-
An im­ portant class of distributions satisfying ( 1 3.3. 1 ) consists of the non-normal stable distributions. 13.3.1 (Stable Distributions). A random variable Z is said to be stable, or to have a stable distribution, if for every positive integer n there exist constants, an > 0 and bn, such that the sum zl + . . . + zn has the same distribution as an Z + bn for all iid random variables Z1 , . . . , Zn, with the same distribution as Z. Definition Properties of a Stable Random Variable, Z Some of the important properties of Z are listed below. For an extensive discussion of stable random variables see Feller (1971), pp. 568-583, but note the error in sign in equation (3. 1 8). I . The characteristic function, l/J(u) = E exp(iuZ), is given by exp {iuf3 - d l u l a ( l - i8 sgn(u)tan(nrx./2)) } if rx. =1= I , "' 'f' (u) = exp { iu/3 - d l u l ( l + i8(2/n)sgn(u)ln l u i ) } if rx. = 1 , { ( 1 3.3.4) 1 3. Further Topics 536 where sgn(u) is u/ 1 u I if u =/= 0, and zero otherwise. The parameters a E (0, 2], 1 f3 E IR, d 1a E [0, 00 ) and e E [ - 1 , 1 ] are known as the exponent, location, scale and symmetry parameters respectively. 2. If a = 2 then Z N(/3, 2d). 3. If 8 = 0 then the distribution of Z is symmetric about /3. The symmetric stable distributions (i.e. those which are symmetric about 0) have charac­ teristic functions of the form � ( 1 3.3.5) 4. If a = 1 and 8 = 0 then Z has the Cauchy distribution with probability density f(z) = (d/n) [d2 + (z - /3) 2 r l ' z E R 5. The symmetric stable distributions satisfy the property of Definition 1 3. 3 . 1 with an = n 1 1a and b" = 0, since if Z, Z1, . . . , Z" all have the characteristic function ( 1 3.3.5) and z l • · · · · zn are independent, then n E exp [iu (Z1 + · · · + Zn ) J = e - d lu l" = E exp [iuZn1fa]. 6. If F is the distribution function of Z and a E (0, 2), then ( 1 3.3. 1 ) is satisfied with p = ( 1 + 8)/2 and C= {d/(r( l - a)cos(na/2)) 2d/n if a =1= if a = 1, ( 1 3.3.6) 1. In the following proposition, we provide sufficient conditions under which the sum LJ= 1/JjZ, _ j exists when { Z,} is an iid sequence satisfying ( 1 3.3. 1 ). _ w Proposition 1 3.3.1. Let { Z,} be an iid sequence of random ( 1 3.3. 1 ). If { 1/1J is a sequence of constants such that variables satisfying GO < oo for some <5 E (0, a) n [0, 1 ] , L j = I 1/!l then the infinite series, ( 1 3.3.7) - ro w L 1/JjZr - j • j= - rfJ converges absolutely with probability one. PROOF. First consider the case hence 1 < a < 2. Then by ( 1 3.3.3), E I Z 1 1 < oo and 00 = L 1 1/Jj i E I Zt l < oo. j= - oo Thus L � - oo 1 1/JjZr-jl is finite with probability one. Now suppose 0 < a < 1 . Since 0 < <5 < 1, we can apply the triangle in­ equality lx + y l b � l x l b + I Y ib to the infinite sum L � - oo 1/!jZr-j· Making use 537 §1 3.3. Linear Processes with Infinite Variance of (1 3.3.3) we then find that 00 = Hence I� -oo I t/JiZt -) < oo I 1 t/Jl E I Z1 1b j= - oo < 00 . with probability one. Remark 1. The distribution of the infinite sum I� _ 00 Specifically, 0 t/JiZt _ i satisfies (1 3.3.2). (see Cline, 1983). Remark 2. If Z 1 has a symmetric stable distribution with characteristic func­ tion e - d W (and dispersion C given by ( 1 3.3.6)), then I� _ w t/liZr - i also has a symmetric stable distribution with dispersion c = c i � -w l t/X J . Remark 3. The process defined by w xt = I t/Jj Zr -j , j= ( 1 3.3.8) - ro where { t/Ji} and { Zr } satisfy the assumptions of Proposition 1 3.3. 1, exists with probability one and is strictly stationary, i.e. 
the joint distribution of (X 1 , . . . , Xk )' is the same as that of (X 1 +h' . . . , Xk+h )' for all integers h and positive integers k (see Problem 1 3.8). In particular if the coefficients t/Ji are chosen so that t/iJ = 0 for j < 0 and 00 I t/Ji z i = 8(z)/r/J(z), j�O lz l :S 1 , ( 1 3.3.9) Where 8(z) = 1 + 8 1 z + . . . + 8q z q and rp(z) = 1 - tP 1 Z - . . . - rpp z P -f= 0 for l z l :S 1, then it is easy to show that {Xc} as defined by ( 1 3.3.8) satisfies the ARMA equations rp(B)Xr = 8(B)Zr · We record this result as a proposition. Proposition 1 3.3.2. Let { Zr } be an iid sequence of random variables with distribution function F satisfying ( 1 3.3. 1). Then if 8( · ) and ¢( · ) are polynomials such that rp(z) i= 0 for l z l :S 1 , the difference equations ( 1 3.3. 1 0) 1 3. Further Topics 538 have the unique strictly stationary solution, ( 1 3.3. 1 1) L t/Jj Zt -j' j=O where the coefficients { t/1i} are determined by the relation ( 13.3.9). If in addition ¢(z) and 8(z) have no common zeroes, then the process ( 1 3.3. 1 1) is invertible if and only if 8(z) of. O for i z l :S; 1. X, = 00 PROOF. The series ( 1 3.3. 1 1) converges absolutely with probability one by Proposition 1 3.3. 1 . The fact that it is the unique strictly stationary solution of ( 1 3.3. 1 0) is established by an argument similar to that used in the proof of Theorem 3. 1 . 1 . Invertibility is established by arguments similar to those in the proof of Theorem 3. 1 .2. See Problem 1 3.9. D Although the process { X,} defined by (1 3.3.8) is strictly stationary it is not second-order stationary since by Remark 1 and ( 1 3.3.3), E I X, l 2 = oo. Never­ theless we can still define, for such a process, an analogue of the autocorrela­ tion function, namely h = 1 , 2, . . . . ( 1 3.3. 1 2) We use the same notation as for the autocorrelation function of a second-order stationary process since if {Z,} is replaced in ( 1 3.3.8) by a finite variance white noise sequence, then ( 1 3.3. 12) coincides with the autocorrelation function of {X,}. Our point of view in this section however is that p(h) is simply a function of the coefficients { t/li } in the representation ( 1 3.3.8), or as a function of the coefficients { ¢J and { BJ if { X ,} is an ARMA process defined as in ( 1 3.3. 1 0). We can estimate p (h) using the sample autocorrelation function, h = 1 , 2, . . . , but it is by no means clear that p(h) is even a consistent estimator of p (h). However, from the following theorem of Davis and Resnick ( 1 986), we find that p(h) is not only consistent but has other good properties as an estimator of p (h). Theorem 1 3.3. 1 . Let { Z,} be an iid symmetric sequence of random variables satisfying ( 1 3.3. 1 ) and let {X,} be the strictly stationary process, X, = where j=L 00 - oo U l l t/ll < oo t/Jj Zt-j' j=L 00 -oo for some <5 E (O, a) n [0, 1 ] . §1 3.3. Linear Processes with Infinite Variance 539 Then for any positive integer h, (n/1n(n)) 1 1"( ,0 ( 1 ) - p ( 1 ), . . . , p(h) - p (h))' => ( Y1 , . . . , }/, )', where Yk = L ( p (k + j ) + p( k - j ) - 2p ( j ) p (k))S)S0, 00 j�l k ( 1 3.3. 1 3) = 1, . . . , h, and S0, S 1 , . . . , are independent stable random variables; S0 is positive stable with characteristic function, E exp(iuS0) ( 1 3.3.14) = exp { - C f(l - 1X/2)cos( mx/4) 1 u l "12 ( 1 - i sgn(u)tan(n:IX/4)) } and S 1 , S 2 , . . . , are iid with characteristic function, exp { - C 2 f ( 1 - 1X)cos(n1X/2) 1 u n if IX =/= 1 , . E exp(!US1 ) = exp { - C 2 n l u l /2 } if rx = 1 . { ( 1 3.3. 1 5) If IX > 1 then ( 1 3.3. 
1 3) is also true when p(fJ) is replaced by its mean­ corrected version, p(h) = L_�,:-1h (X1 - X)(X, +h - X)/L,�� � (X, - X) 2, where X = n - 1 (X 1 + · · · + Xn ). It follows at once from this theorem that p(h) .!.. p (h), and more specifically that p(h) - p (h) = Op ( [n/ln(n) J -11") = op (n - 1 1P ) for all f3 > IX. This rate of convergence to zero compares favourably with the slower rate, OP (n - 1 12), for the difference p(h) - p(h) in the finite variance case. The form of the asymptotic distribution of p(h) can be somewhat simplified. In order to do this, note that }/, has the same distribution as · ( 1 3.3. 1 6) l p (h + j ) + p ( h - j ) - 2p ( j ) p ( hW u;v, C�� r where V ( � 0) and U are independent random variables with characteristic functions given by (1 3.3.14) and ( 1 3.3. 1 5) respectively with C = 1 . Percentiles of the distribution of U/V can be found either by simulation of independent copies of U /V or by numerical integration of the joint density of ( U, V) over an appropriate region. Except when IX = 1, the joint density of U and V cannot be written down in closed form. In the case IX 1, U is a Cauchy random variable with probability density fu(u) = t [n 2/4 + u 2 r 1 (see Property 4 of stable random variables), and V is a non-negative stable random variable with density (see Feller (1971)), fv (v) = �v - 312 e - "1 <4v >, v � 0. The distribution function of U/ V is therefore given by = P(U/ V � x) = I"' P(U � xy)fv (y) dy = f oo 0 r ( 1 3.3. 1 7) 112 (n: w) - 312 [arctan(xw) + (n:/2)] exp( - 1/(2w)) dw. 540 1 3. Further Topics Notice also that U /V has the same distribution as the product of a standard Cauchy random variable (with probability density n - 1 (1 + x 2 ) - 1 ) and an independent random variable distributed as x 2 (1). ExAM PLE 1 3.3. 1 (An Infinite Variance Moving Average Process). Let {X,} be the MA(q) process, x, = z, + 01 z, _ 1 + · · · + eqz,_q , where the sequence {Z,} satisfies the assumptions of Theorem 1 3.3. 1 . Since p(h) = 0 for I h i > q, the theorem implies in this case that ( jt PUllara (n/ln(n)) 1 1" ( f5 (h) - p(h)) = 1 + 2 I h u; v, > q, where the right-hand side reduces to U/V if q = 0. Two hundred simulated values of the MA( l ) process X, = Z, + .4Zn ( 1 3.3. 1 8) with {Z, } an iid standard Cauchy sequence (i.e. Ee iuz , = e -lul), are shown in Figure 1 3.7. The corresponding function p(O) is shown in Figure 1 3.8. Except for the value at lag 7, the graph of p(h) does suggest that the data is a realization of an MA( l ) process. Furthermore the moment estimator, iJ, of 8 is .394, agreeing well with the true value (} = .40. (B is the root in [ - 1 , 1 ] of j5(1) = 8/( 1 + fJ2 ). If there is no such root, we define iJ sgn(j5(1)) as in Section 8.5.) = 240 220 200 1 80 1 60 1 40 1 20 1 00 80 60 40 20 0 - 20 -40 0 20 40 60 80 1 00 1 20 1 40 1 60 1 80 200 Figure 1 3.7. Two hundred simulated values of the MA( l ) process, X, = Z, + .4Z, _ 1, where { Z,} is an iid standard Cauchy sequence. 541 §1 3.3. Linear Processes with Infinite Variance 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -0. 1 -0.2 -0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9 - 1 0 1 0 40 30 20 Figure 1 3.8. The function p(h) for the simulated Cauchy MA( t ) series of Example 1 3.3. 1 . The .975 quantile of U/V for the process ( 1 3.3. 1 8) is found numerically from ( 1 3.3. 1 7) to have the value 1 2.4. By Theorem 1 3.3. 1 , approximately 95% confidence bounds for p ( 1 ) are therefore given by p(1) ± 1 2.4(1 1 - 2p 2 ( 1 ) 1 + 1Ji ( 1 ) 1 ) (ln(n)/n) .341 ± .364. 
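The percentiles of U/V referred to above can also be estimated very simply by simulation. The following sketch is an illustration (it does not carry out the numerical integration of (13.3.17)); it uses the representation just stated, namely that for α = 1, U/V has the same distribution as the product of a standard Cauchy random variable and an independent χ²(1) random variable.

```python
import numpy as np

# A sketch (for the Cauchy case alpha = 1) of estimating quantiles of U/V by
# simulation, using the representation stated above: U/V is distributed as the
# product of a standard Cauchy variable and an independent chi-squared(1) variable.
rng = np.random.default_rng(0)
N = 1_000_000
ratio = rng.standard_cauchy(N) * rng.chisquare(1, N)
print(np.quantile(ratio, 0.975))   # should be close to the value 12.4 quoted above
```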
These are not particularly informative bounds when n = 200, but the difference between them decreases rapidly as n increases. In simulation studies it has been found moreover that ρ̂(h) gives good estimates of ρ(h) even when n = 200. Ten thousand samples of {X_1, ..., X_200} from the process (13.3.18) gave 10,000 values of ρ̂(1), from which the sample mean and variance were found to be .341 and .0024 respectively. For a finite-variance MA(1) process, Bartlett's formula gives the value v = (1 − 3ρ^2(1) + 4ρ^4(1))/n for the asymptotic variance of ρ̂(1). Setting n = 200 and ρ(1) = .4/(1 + .4^2) = .345, we find that v = .00350. Thus the sample variance of ρ̂(1) for 200 observations of the Cauchy process (13.3.18) compares favourably with the asymptotic approximation to the variance of ρ̂(1) for 200 observations of the corresponding finite-variance process. Analogous remarks apply to the moment estimator, θ̂, of the coefficient of the MA(1) process. From our 10,000 realizations of {X_1, ..., X_200}, the sample mean and variance of θ̂ were found to be .401 and .00701 respectively. The variance of the moment estimator θ̂ for a finite-variance MA(1) process is n^{−1}(1 + θ^2 + 4θ^4 + θ^6 + θ^8)/(1 − θ^2)^2 (see Section 8.5). When n = 200 and θ = .4 this has the value .00898, which is somewhat larger than the observed sample variance, .00701, of θ̂ for the Cauchy process.
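The simulation study just described is easy to reproduce in outline. The sketch below is an illustration only (1000 rather than 10,000 replicates, an arbitrary random seed, and helper names of our own choosing). It simulates samples of length n = 200 from the Cauchy MA(1) process (13.3.18), computes ρ̂(1) for each sample, and then the moment estimator θ̂, defined above as the root in [−1, 1] of ρ̂(1) = θ/(1 + θ²), or sgn(ρ̂(1)) if no such root exists.

```python
import numpy as np

# A sketch of the simulation study described above (1000 rather than 10,000
# replicates): samples of length n = 200 from the Cauchy MA(1) process (13.3.18),
# with rho_hat(1) and the moment estimator theta_hat computed for each sample.
rng = np.random.default_rng(0)

def rho_hat1(x):
    return np.sum(x[:-1] * x[1:]) / np.sum(x * x)   # sample autocorrelation at lag 1

def moment_est(r1):
    # root in [-1,1] of r1 = theta/(1+theta^2); sgn(r1) if there is no such root
    if r1 == 0.0:
        return 0.0
    if abs(r1) >= 0.5:
        return np.sign(r1)
    return (1 - np.sqrt(1 - 4 * r1 ** 2)) / (2 * r1)

n, nrep, theta = 200, 1000, 0.4
r = np.empty(nrep)
th = np.empty(nrep)
for i in range(nrep):
    z = rng.standard_cauchy(n + 1)
    x = z[1:] + theta * z[:-1]                      # X_t = Z_t + .4 Z_{t-1}
    r[i] = rho_hat1(x)
    th[i] = moment_est(r[i])
print(r.mean(), r.var(), th.mean(), th.var())       # compare with .341, .0024, .401, .00701
```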
In assessing the performance of the linear predictor, (1 3.3. 1 9) we cannot consider E(Xn + l - Xn+l ) 2 as we did for second order processes since this expectation is infinite. Other criteria for choosing a "best" predictor which have been suggested include minimization of the expected absolute error (when a > 1 ), and the use of a pseudo-spectral technique (Cambanis and Soltani (1 982)). Here we shall consider just one criterion, namely minimization of the error dispersion (see ( 1 3.3. 1 )). Using ( 1 3.3. 1 1 ) we can rewrite X n + 1 in the form 00 xn+l = I (an ! t/lj + an 2 t/lj - 1 + . . . j=O + annt/lj - n+l )Zn -j' ( 1 3.3.20) and using ( 1 3.3. 1 1) again we obtain Since { Z1 } is assumed to have dispersion C, it follows from Remark 2 that ( + .I l t/lj+l - an l t/lj - disp(Xn+l - Xn+l ) = c 1 ;=0 ··· - ) annt/lj - n+l l " . ( 1 3.3.22) 1 3. Further Topics 544 In the special case when Z, has the symmetric stable distribution with exponent 1X E (0, 2) and scale parameter d 1 1a (i.e. E ewz, = exp( - d i iW)), the dispersion of Z, (see Property 6) is C = dj[r(l 1X)cos(n1X/2)], IX i= 1, and C = 2djn, IX = 1 . The prediction error is also symmetric stable with dis­ persion given by ( 1 3.3.22). Minimization of ( 1 3.3.22) is therefore equivalent to minimization of the scale parameter of the error distribution and hence to minimization of P ( I Xn + 1 - Xn + 1 1 > s) for every s > 0. The minimum dispersion criterion is useful also in regression problems (Blattberg and Sargent ( 1 97 1 )) and Kalman filtering problems (Stuck ( 1 978)) associated with stable sequences. For general sequences {Z,} satisfying ( 1 3.3. 1 ) the minimum dispersion criterion minimizes the tail probabilities of the distribution of the prediction error. The minimization of ( 1 3.3.22) for IX E (0, 2) is rather more complicated than in the case IX = 2 and the best predictor is not in general unique. For a general discussion of this problem (and the related problem of finding h-step predictors) see Cline and Brockwell ( 1 985). Here we shall simply state the results for an MA( 1 ) process and, when Z, has a Cauchy distribution, compare the minimum dispersion predictor .x" + 1 I.'J=1 anj xn + 1 -j with the predictor X;:' = L 'J= 1 l/Jnj Xn + 1 -j obtained by assuming that { Z, } has finite variance. - = If X, = Z, + 8Z, 1 where { Z,} is an iid sequence with distribution function satisfying ( 1 3.3. 1 ), then the minimum dispersion linear predictor xn + 1 of xn + 1 based on x1 , . . . , xn is n xn + 1 = - L ( - 8)j Xn +1 -j if IX :::; 1 , j= 1 n 1 IJ " + 1 - j 1 ( = X n + 1 - L - 8) X n + 1 -j if IX > 1 , 1 IJn + 1 j= 1 where IJ = I e ta/(a - 1 ) . The error dispersion of xn + 1 is C [ 1 + i 8 1 < " + 1 la] if IX :::; 1 , Proposition 13.3.3. _ � . _ - [ c 1 + I IJ I (n + 1 )a c � y- 1 J 1 IJ " IJ The minimum dispersion h-step predictor, h c [1 + t en. PROOF. See Cline and Brockwell ( 1 985). ExAM PLE 2: if IX > 1 . 1, is zero with error dispersion D 1 3.3.3 (Linear Prediction of a Cauchy MA(1) Process). Suppose that (1 3.3.23) 1 81 < 1, where {Z, } is an iid standard Cauchy sequence, i.e. E e ;uz, = e - l ul . Then condition ( 1 3.3. 1 ) is satisfied with p = q = 1 and C = 2/n. By Proposition 545 § 1 3.4. Threshold Models 1 3.3.3, the minimum dispersion one-step predictor is n Xn +l = I ( - BYXn+ l -j• j=l with corresponding error dispersion, 2 d1sp(Xn+ 1 - Xn+ d = - (1 + I B I n+l ). 
n - · ( 1 3.3.24) - ( 1 3.3.25) If now we imagine { Z1} in ( 1 3.3.23) to have finite variance and compute the best linear mean square predictor x :+ 1, we find from Problem 3. 10 that n n+ j (1 B2 " + 2 )X:i+l = - I [ ( - e)j - ( - ef l - J Xn+l -j • (1 3.3.26) j=l and hence that (1 e z n+ 2 ) (Xn +l - X;i+ l ) - From ( 1 3.3.27) we can easily compute the error dispersion when the mean­ square linear predictor x:+ 1 is applied to the Cauchy process ( 1 3.3.23). We find that -( ) 2 1 + IBI . * d !Sp (xn+l - xn+ l ) - - 1 + I e I n+l 1 + I B I " +! ' n which is clearly greater than the dispersion of (Xn + I - gn + d ( 1 3.3.28) in ( 13.3.25). The minimum dispersion linear predictor of Xn + l based on {Xj , -oo < j :::;; n } turns out to be the same (for a causal invertible ARMA process) as the best linear mean square predictor computed on the assumption that { Z1 } has finite variance. The dispersion of the one-step prediction error is just the dispersion of {Z1} (2/n in Example 1 3.3.3). Although we have only considered linear prediction in this section, we should not forget the potential for improved prediction of infinite variance (and finite variance) processes using predictors which are non-linear in the observations. In the next section we give a brief introduction to non-linear time-series models, with particular reference to one of the families of non­ linear models ("threshold models") which have been found useful in practice. § 13.4 Threshold Models Linear processes of the form 00 X I = I t/Jj Zt -j • j=O (1 3.4. 1) 1 3. Further Topics 546 where Z1 E At, = sp{X., - oo < s :::;; t }, play an important role in time series analysis since for such processes the best mean square predictor, E(X1 + h I X 5, - oo < s :::;; t) and the best linear predictor, P.tt,X 1 + h• are identical. (In fact for the linear process ( 1 3.4. 1 ) with { ZJ "' WN(O, cr 2 ), the two predictors are identical if and only if { Z1} is a martingale difference sequence relative to {X1}, i.e. if and only if E(Z1 + 1 1 Xs, - oo < s :::;; t) = 0 for all t (see Problem 1 3. 1 1 ).) The Wold decomposition (Section 5.7) ensures that any purely non-deterministic stationary process can be expressed in the form ( 1 3.4. 1 ) with {Z1} "' WN(O, cr 2 ), but the process {Z1} is generally not an iid sequence and the best mean square predictor of X t + h may be quite different from the best linear predictor. However, in the case when {X1} is a purely non-deterministic Gaussian stationary process, the sequence {Z1} in the Wold decomposition is Gaussian and therefore iid. Every stationary purely non-deterministic Gaussian process can therefore be generated by apply­ ing a causal linear filter to an iid Gaussian sequence. We shall therefore refer to such processes as Gaussian linear processes. They have the desirable property (like the more general linear process ( 1 3.4. 1)) that P.tt, X t +h = E(X 1 + h 1 Xs , - oo < s :::;; t). Many of the time series encountered in practice exhibit characteristics not shown by linear Gaussian processes and so in order to obtain good models and predictors for such series it is necessary to relax either the Gaussian or the linear assumption. In the previous section we examined a class of non-Gaussian (infinite variance) linear processes. In this section we shall provide a glimpse of the rapidly expanding area of non-linear time series modelling and illustrate this with a threshold model proposed by Tong ( 1 983) for the lynx data (Series G, Appendix A). 
Properties of Gaussian linear processes which are sometimes found to be violated by observed time series are the following. A Gaussian linear process {XJ is reversible in the sense that (X11, , X1J has the same distribution as (X 1 , , X1)'. (Except in a few special cases, linear, and hence ARMA processes, are reversible if and only if they are Gaussian (Weiss ( 1975), Breidt and Davis ( 1990)).) Deviations from this property are suggested by sample-paths which rise to their maxima and fall away at different rates (see, for example, the Wolfer sunspot numbers, Figure 1 .5, and the logarithms to base 10 of the lynx data, Figure 1 3. 1 1 ). Gaussian linear processes do not exhibit sudden bursts of outlying values as are sometimes observed in practice. Such behaviour can however be shown by non-linear processes (and by processes with infinite variance). Other characteristics suggesting deviation from a Gaussian linear model are discussed by Tong ( 1 983). If we restrict attention to second order properties of a time series, it will clearly not be possible to decide on the appropriateness or otherwise of a Gaussian linear model. To resolve this question we consider moments of order greater than two. Let { X 1} be a process which, for some k ?: 3, satisfies sup1 E 1 X1 l k < oo and E(XtoXt J . . . X t) = E(Xto + h X tJ +h . . . X t, + h), • • • " • • • 547 §1 3.4. Threshold Models 4.5 4 3.5 3 2.5 2 1 .5 1 ����==��m==m����=m��m=�� 1 8 2 0 30 40 50 60 70 80 90 1 900 1 0 20 30 40 50 60 70 Figure 1 3. 1 1 . The logarithms t o base 1 0 of the Canadian lynx series ( 1 82 1-1 934), showing 50 predicted values based on the observations up to 1 920 and the autoregressive model ( 1 3.4.3). . . . }{X,} {0, 1}. Ck(r 1 , {0,, rkX-� > X,+, ,, . . . , X(0r +, r...k_ ,,,0) ikz 1 z 2 zk x(z1, , zd:= In E[exp(iz1X, iz2 X,+,, + + izk Xr+ r,_)]. In particular, the third order cumulant function C3 o f {X,} coincides with the third order central moment function, i.e. r, s E {0, ± 1, . . .} , ( 1 3.4.2) where = EX,. lf L;, Ls I C 3 (r, s)l we define the third order polyspectral density (or bispectral density) of {X,} to be the Fourier transform, 1 , f3 (w1 , w 2 ) = --2 " " C 3 (r, s) - Irro - 1sro2 (2n) r = - oo s= - oo in which case C3(r, s) J�,J�/rro, + isro'f3(w l , Wz) dw l dwz. [More generally, if the order cumulants Ck(r1, rk _1 ), of {X,} are absolutely summable, we define the order polyspectral density as the for all t 0 , t 1 , . . . , ti , h E ± 1, and for all j E 1, . . . , k The kth order cumulant d of is then defined as the joint cumulant i.e. as the coefficient of of the random variables, · · · in the Taylor expansion about of • • • + • • . p ··· < oo , 00 00 L.. L.. e • . , = kth • • • , kth 1 3. Further Topics 548 Fourier transform of Ck . For details see Rosenblatt ( 1985) and Priestley ( 1 988).] If {X1} is a Gaussian linear process, it follows from Problem 1 3. 1 2 that the cumulant function C 3 of { X1} i s identically zero. (The same is also true of all the cumulant functions Ck with k > 3.) Consequently f3(w 1, w2) = 0 for all w 1 , w2 E [ - n, n]. Appropriateness of a Gaussian linear model for a given data set can therefore be checked by using the data to test the null hypothesis, f3 = 0. For details of such a test, see Subba-Rao and Gabr ( 1984). If {X1} is a linear process of the form ( 1 3.4. 1 ) with E I Z1 I 3 < oo , Ez ; = Yf and L� o 1 1/!j l < oo , it can be shown from ( 1 3.4.2) (see Problem 1 3. 
1 2) that the third order cumulant function of { X1} is given by C 3 (r, s) = (with 1/Jj = 0 for j < 1/J(z) := L 00 i = - 00 ( 1 3.4.3) 1/J i i/J i + r i/J i +s, 0) and hence that { X1} has bispectral density, f3(wl , w2) where Yf :2 1/J(ei(wl + w,>)ljl(e - i"'l)ljf(e - iw,), = 4 ( 1 3.4.4) L� o 1/Jj zj. By Theorem 4.4. 1 , the spectral density of { X1} is (J2 f(w) = - 1 1/J(e - i"'W. 2n Hence "- ) '+' (w l , w2 ·· - l f3(wl , w 2 W f(wdf(w2)f(w1 + w2) 11 2 2nrJ6 • Appropriateness of the linear process ( 1 3.4. 1 ) for modelling a given data set can therefore be checked by using the data to test for constancy of ¢(w1, w2) (see Subba-Rao and Gabr ( 1 984)). If it is decided that a linear Gaussian model is not appropriate, there is a choice of several families of non-linear processes which have been found useful for modelling purposes. These include bilinear models, autoregressive models with random coefficients and threshold models. Excellent accounts of these are available in the books of Subba-Rao and Gabr ( 1 984), Nicholls and Quinn ( 1982) and Tong ( 1 990) respectively. Threshold models can be regarded as piecewise linear models in which the linear relationship varies with the values of the process. For example if R< il , i = 1, . . . , k , is a partition of IRP, and {Z1} IID(O, 1 ), then the k difference equations, � p x l = (J<ilzt + " "-f.i)xt - ). ' � o/j j= 1 i = 1 , . . . ' k, ( 1 3.4.5) 549 § 1 3.4. Threshold Models defined a threshold AR(p) model. Model identification and parameter estimation for threshold models can be carried out in a manner similar to that for linear models using maximum likelihood and the AIC criterion. Details can be found in the book of Tong ( 1990). It is sometimes useful to express threshold models in state-space form (cf. Section 1 2. 1 ). For example, the model ( 1 3.4.5) can be re-expressed as xt = [O o o ·· · 1JSP where Sr = = (Xr - p + 1 , Xr - p + 2 , , Xr)' satisfies the state equation, Sr + 1 = FrSr + Hr Zr + 1 · • • . This state representation differs from those in Chapter 1 2 in that the matrices Fr and Hr now depend on Sr . Thus 0 0 Ft - 0 0 0i ¢ ) ¢( ) � 0 p- 1 0 ¢(pi)- 2 0 0 0 0 Ht r/JV) 0 if S; E R(i). (J(i) As an illustration of the use of threshold models, Tong identifies the following model for the logarithm to base 10 of the lynx data ( 1 82 1-1920) : { + 1 .068Xr - 1 - .207Xr - 2 + . 17 1 Xr - 3 - .453Xr - 4 Xr - 2 s 3.05, + .224Xr - s - .033Xr _ 6 + . 1 74Zp 2.296 + 1 .425Xr - J - 1 .080Xr_ 2 - .09 1 Xr _ 3 + .237Zp X r _ 2 > 3.05, ( 13.4.6) where { Zr} � 110(0, 1 ). The model ( 1 3.4.6) is said to have a delay parameter b = 2 since the form of the equation specifying Xr is dependent on the value of Xr _ 2 . 1t is easy to compute the best mean square predictor E(Xn + h i Xn t s n) for the model ( 1 3.4.6) if h = 1 or h = 2 but not if h > 2 (see Problem 1 3 . 1 3). More generally, if the delay parameter is b, computation of the best predictor is easy if h < b but is extremely complicated otherwise. A natural approximation procedure for computing the best predictors given X 1, , X" is to set Zr = 0, t � n + 1, in the recursions defining the process and then Tong ( 1983), p. 1 87, refers to the to solve recursively for Xn + 1, Xn + 2 , predictors obtained in this way as the values of the eventual forecast function. For the logarithms of the lynx data and the model (1 3.4.6) the eventual forecast function exhibits a stable limit cycle of period 9 years with values (2.6 1 , 2.67, 2.82, 3.02, 3.25, 3.4 1, 3.37, 3. 1 3, 2.80). 
An alternative technique suggested by Tong for computing h-step predictors when h > b and the data is nearly cyclic with period T is to fit a new model with delay parameter k T + b where k is a positive integer. Under the new model, prediction can then be carried out for values of h up to kT + b. The most satisfactory general procedure for forecasting threshold models is to simulate future values of Xr = .802 • • . • . . • 1 3. Further Topics 550 the process using the fitted model and the observed data { X 1, . . . , X n } · From N simulated values of Xn + h we can construct a histogram which estimates the conditional density of X n + h given the data. This procedure is implemented in the software package STAR of H. Tong (see Tong ( 1 990)). It is interesting to compare the non-linear model ( 1 3.4.6) for the logarithms of the lynx data with the minimum AICC autoregressive model found for the same series using the program PEST, namely X, = 1 . 1 23 + 1 .084X, _ 1 - .477X, _ 2 + .265X, _ 3 - .21 8X, _ 4 ( 1 3.4.7) {Z,} � WN(O, .0396) + . 1 80X, _ 9 - .224X, _ 1 2 + Z, , The best linear mean-square h-step predictors, h = 1 , 2, . . . , 50, for the years 1921-1970 were found from ( 1 3.4.7). They are shown with the observed values of the series ( 1 82 1-1934) in Figure 1 3. 1 1. As can be seen from the graph, the h-step predictors execute slowly damped oscillations about the mean (2.880) of the first 1 00 observations. As h � oo the predictors converge to 2.880. Figures 1 3 . 1 2 and 1 3 . 1 3 show respectively 1 50 simulated values of the processes ( 1 3.4.7) and (13.4.6). Both simulated series exhibit the approximate 9 year cycles of the data itself. In Table 1 3.3 we show the last 1 4 observed values of X, with the corresponding one-step predictors X, based on (1 3.4.7) and the predictors X, = E(X, I Xs , s < t) based on ( 1 3.4.6). The relative performance of the predictors can be assessed by computing 1 s = (S/14) 1 2 where S is the sum of squares of the prediction errors for 5 4.5 4 3.5 3 2.5 2 1 .5 0 10 20 30 40 50 60 70 80 90 1 00 1 1 0 1 20 1 30 1 40 1 50 Figure 1 3. 1 2. One hundred and fifty simulated values of the autoregressive model ( 1 3.4.7) for the transformed lynx series. § 1 3.4. Threshold Models 0 1 0 20 30 55 1 40 50 60 70 80 90 1 00 1 1 0 1 20 1 30 1 40 1 50 Figure 1 3. 1 3. One hundred and fifty simulated values of the threshold model ( 1 3.4.6) for the transformed lynx series. Table 1 3.3. The Transformed Lynx Data {X,, t = 1 0 1 , . . . , 1 14} with the One-Step Predictors, X ,, Based on the Autoregressive Model ( 1 3.4.7), and X, Based on the Threshold Model (1 3.4.6) 101 1 02 1 03 1 04 1 05 1 06 1 07 1 08 1 09 1 10 Ill 1 12 1 13 1 14 X, x, .X, 2.360 2.601 3.054 3.386 3.553 3.468 3 . 1 87 2.724 2.686 2.821 3.000 3.201 3.424 3.531 2.349 2.793 2.865 3.23 1 3.354 3.329 2.984 2.668 2.432 2.822 2.969 3.242 3.406 3.545 2.3 1 1 2.877 2.9 1 1 3.370 3.588 3.426 3.094 2.77 1 2.422 2.764 2.940 3.246 3.370 3.447 1 3. Further Topics 552 t = 101, . . . , 1 14. From the table we find that the value of s for the autoregressive predictor X , is . 1 38, while for the threshold predictor X,, the values of s is reduced to . 1 20. A bilinear model for the log lynx data can be found on p. 204 of Subba-Rao and Gabr (1 984) and an AR(2) model with random coefficients on p. 143 of Nicholls and Quinn ( 1 982). The values of s for predictors based on these models are . 1 1 5 and . 1 1 6 respectively. 
These values indicate the improvements attainable in this example by consideration of non-linear models. Problems 1 3. 1 . Suppose that { Xn } and { X, 2 } are related by the transfer function model, 2.5B X, I + W, , X, z = 1 - .7B --- (1 - .8B)X,1 = Z,, where { Z,} WN(O, 1 ), { W,} WN(O, 2), and { Z,} and { W,} are uncorrelated. (a) Write down a state-space model for { (Xn, X, z )'}. (b) If WI OO = 1 .3, x 1 00 . 2 = 2.4, x ! OO. l = 3. 1 5, find the best linear predictors of X 1 0 1 . 2 and X 1 0 2• 2 based on {X,1, X, 2 , t ::;; 1 00}. (c) Find the mean squared errors of the predictors in (b). � � 1 3.2. Show that the output, { X, 2 }, of a transfer function model satisfies the equation ( 1 3. 1 . 1 5). 1 3.3. Consider the transfer function model Bz X, 2 = --- Xn + W, , 1 - .5B (1 + .5B)X,1 = Z,, where {Z,} and { W,} are uncorrelated WN(O, 1 ) sequences. Let Q, = psp{X,1 . - oo < s .o; t} • Show that Pn X n + l . l = Qn X n + l . l and hence evaluate Pn Xn + J , J· Express PnXn + 1 . 2 and PnXn + z . z in terms of Xj 1, Xj2 , j ::;; n and W". Evaluate E(Xn + I . Z - Pn Xn + l . 2) 2 , E(Xn + l . l - Pn X n + 2 . 2 ) 2 . Show that the univariate process { X, 2 } has the autocovariance function of an ARMA process and specify the process. (Hint : consider ( 1 - .25B 2)X, 2 .) 2 (e) Use (d) to compute E(Xn + l . z - R n Xn + 1 , 2 ) . (a) (b) (c) (d) 1 3.4. Verify the calculations of I1�. 1 in Example 1 3. 1 . 1 (cont.). 553 Problems 1 3.5. Find a transfer function model relating the input and output series X,1, X,2, t = I , . . . , 200 (Series J and K respectively in the Appendix). Use the model to predict X 201 • 2 , X 202 • 2 and X 203• 2 . Compare the predictors and their mean squared errors with the corresponding predictors and mean squared errors obtained by modelling { Xrz} as a univariate A R MA process and with the results of Problem 1 1 . 10. 1 3.6. If {X,} is the causal invertible ARIMA(p, d, q) process defined by ( 1 3.2.23) and x. = n - 1 (X 1 + · · · + X.), show that 2 n 1 - dE(X. - /1) 2 --> C, where C is a positive constant. 1 3.7. Verify the properties ( 1 3.3.3) for a random variable Z1 whose distribution satisfies ( 1 3.3 . 1 ). 1 3.8. Show that the linear process ( 1 3.3.8) is strictly stationary if { tjJj } and { Z,} satisfy the conditions of Proposition 1 3.3. 1 . 1 3.9. Prove Proposition 1 3.3.2. (Note that Proposition 1 3.3.1 remains valid for strictly stationary sequences {Z,} satisfying ( 1 3.3.1 ), and use arguments similar to those used in the proofs of Theorems 3 . 1 . 1 and 3 . 1.2.) 1 3 . 10. Modify Example 1 3.3.3 by supposing that {Z,} is iid with E eiuz, = e - 1 • 1 ', u E IR, 0 < rx < 2. Use Proposition 1 3.3.3 to show that for each fixed sample size n and coefficient B E ( - 1 1 ), the ratio of the error dispersion of X*. + 1 (see ( 1 3.3.28)) tO that Of X n+ I • COnverge aS rJ. --> 0 tO 1 + n/2. , 1 3. 1 1 . Let {X,} be the process, 00 where t/10 # X, = I 1/Jj Zr - j• {Z,} j�O � 2 WN(O, a ), 0, Ii�o 1/Jl < oo and Z, E A, = sp {X , - oo < s � t } . Show that . E(X, + 1 1 Xs, - oo < s � t) = PJt, X, + 1 if and only if { z,} is a martingale difference sequence, i.e. if and only if E(Z, + 1 I X - oo < s � t) = 0 for all t. 5, 1 3. 1 2. If {X,} is the linear process ( 1 3.4. 1) with {Z,} 110(0, a 2 ) and 11 that the third order cumulant function of {X,} is given by � C k, s) = 11 = Ez; , show 00 I 1/J;t/J; +,t/Ji + s · i= - OC; Use this result to establish equation ( 1 3.4.4). 
Conclude that if {X,} is a Gaussian linear process, then C3(r, s) = 0 and f3(w 1 , w2) = 0. 1 3. 1 3. Evaluate the best mean square predictors E(X, + h i Xs, - oo < s � t), h = 1, 2 for the threshold model ( 1 3.4.6). APPENDIX Data Sets All of the following data sets are listed by columns. Series A. Level of Lake Huron in Feet (Reduced by 570), 1 875- 1 972 10.38 1 1 .86 10.97 10.80 9.79 10.39 10.42 10.82 1 1.40 1 1 .32 1 1 .44 1 1.68 1 1.17 1 0.53 1 0.01 9.9 1 9.14 9. 1 6 9.55 9.67 8.44 8.24 9. 1 0 9.09 9.35 8.82 9.32 9.01 9.00 9.90 9.83 9.72 9.89 10.01 9.37 8.69 8.19 8.67 9.55 8.92 8.09 9.37 1 0. 1 3 1 0. 14 9.5 1 9.24 8.66 8.86 8.05 7.79 6.75 6.75 7.82 8.64 10.58 9.48 7.38 6.90 6.94 6.24 6.84 6.85 6.90 7.79 8. 1 8 7.51 7.23 8.42 9.61 9.05 9.26 9.22 9.38 9.10 7.95 8. 1 2 9.75 1 0.85 10.41 9.96 9.61 8.76 8. 1 8 7.21 7. 1 3 9.10 8.25 7.91 6.89 5.96 6.80 7.68 8.38 8.52 9.74 9.3 1 9.89 9.96 Series B. Dow Jones Utilities Index, Aug. 28-Dec. 1 8, 1 972 1 10.94 1 1 0.69 1 1 0.43 1 1 0.56 1 10.75 1 1 0.84 1 10.46 1 1 0.56 1 1 0.46 1 10.05 1 09.60 1 09.31 1 09.3 1 1 09.25 1 09.02 1 08.54 1 08.77 1 09.02 1 09.44 1 09.38 1 09.53 1 09.89 1 10.56 1 10.56 1 1 0.72 1 1 1 .23 1 1 1 .48 1 1 1 .58 1 1 1 .90 1 1 2. 1 9 1 1 2.06 1 1 1 .96 1 1 1 .68 1 1 1 .36 1 1 1 .42 1 1 2.00 1 1 2.22 1 1 2.70 1 1 3. 1 5 1 14.36 1 14.65 1 1 5.06 1 1 5.86 1 1 6.40 1 1 6.44 1 1 6.88 1 1 8.07 1 1 8.51 1 19.28 1 19.79 1 19.70 1 19.28 1 19.66 1 20. 14 120.97 121.13 1 2 1 .55 1 2 1 .96 122.26 123.79 1 24. 1 1 1 24.1 4 123.37 1 23.02 1 22.86 1 23.02 1 23. 1 1 1 23.05 1 23.05 1 22.83 123. 1 8 1 22.67 1 22.73 1 22.86 1 22.67 1 22.09 1 22.00 1 2 1 .23 Appendix. Data Sets 556 Series C. Private Housing Units Started, U.S.A. (Monthly). [Makridakis 922] 1 361 1 278 1443 1 524 1 483 1404 1450 1517 1 324 1 533 1 622 1 564 1 244 1 456 1 534 1 689 1641 1 588 1614 1639 1 763 1 779 1 622 1491 1603 1 820 1517 1448 1 467 1 550 1 562 1 569 1455 1 524 1486 1 484 1361 1433 1423 1438 1478 1 488 1 529 1 432 1 482 1 452 1 460 1 656 1 370 1 378 1 394 1 352 1 265 1 194 1 086 1 1 19 1 046 843 961 990 1 067 1 123 1 056 1 09 1 1 304 1 248 1 364 1 407 142 1 149 1 1 538 1 308 1 380 1 520 1 466 1 554 1 408 1405 1 5 12 1 495 1 556 1 569 1 630 1 548 1 769 1 705 1 56 1 1 524 1 583 1 528 1 368 1 358 1 507 1381 1 229 1 327 1 085 1 305 1319 1 264 1 290 1 385 1517 1 399 1 534 1 580 1 647 1 893 1 828 1 741 1910 1 986 2049 2026 2083 2 1 58 2041 2 1 28 2 1 82 2295 2494 2390 2334 2249 2221 2254 2252 2382 248 1 2485 2421 2366 248 1 2289 2365 2084 2266 2067 2 1 23 205 1 1 874 1 677 1 724 1 526 Series D. Industrial Production, Austria (Quarterly). [Makridakis 337] 54. 1 59.5 56.5 63.9 57.8 62.0 58.5 65.0 59.6 63.6 60.4 66.3 60.6 66.8 63.2 7 1 .0 66.5 72.0 67.8 75.6 69.2 74. 1 70.7 77.8 72.3 78. 1 72.4 82.6 72.9 79.5 72.6 82.8 76.0 85. 1 80.5 89. 1 84.8 94.2 89.5 99.3 93.1 103.5 96.4 107.2 101.7 1 09.5 101.3 1 12.6 1 05.5 1 1 5.4 1 08.0 1 29.9 1 12.4 1 23.6 1 14.9 1 3 1 .0 1 22.6 1 3 1 .9 1 20.5 1 30.7 1 1 5.7 1 19.7 1 09.7 1 25.1 Series E. Industrial Production, Spain (Monthly). [Makridakis 868] 128 1 34 133 141 1 34 142 143 1 36 108 142 146 149 141 1 56 151 1 60 1 56 1 60 161 149 1 18 147 1 58 146 1 32 1 39 1 39 137 144 146 149 142 101 141 1 22 1 45 1 48 137 1 37 1 55 152 1 53 1 52 1 53 1 13 151 1 59 1 65 161 1 60 1 67 1 78 1 67 1 76 1 73 1 64 1 23 1 75 1 75 1 76 1 74 557 Appendix. Data Sets Series F. General Index of Industrial Production (Monthly). 
APPENDIX

Data Sets

All of the following data sets are listed by columns in the printed text; only the series titles are reproduced below.

Series A. Level of Lake Huron in Feet (Reduced by 570), 1875-1972
Series B. Dow Jones Utilities Index, Aug. 28-Dec. 18, 1972
Series C. Private Housing Units Started, U.S.A. (Monthly). [Makridakis 922]
Series D. Industrial Production, Austria (Quarterly). [Makridakis 337]
Series E. Industrial Production, Spain (Monthly). [Makridakis 868]
Series F. General Index of Industrial Production (Monthly). [Makridakis 904]
Series G. Annual Canadian Lynx Trappings, 1821-1934
Series H. Annual Mink Trappings, 1848-1911
Series I. Annual Muskrat Trappings, 1848-1911
Series J. Simulated Input Series
Series K. Simulated Output Series
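Problem 13.5 asks for a transfer function model relating Series J (input) and Series K (output). The sketch below, in Python, illustrates the usual prewhitening step for identifying such a model: fit an AR model to the input by solving the Yule-Walker equations, apply the same whitening filter to both series, and estimate the impulse-response weights from the cross-covariances of the filtered series. It is a minimal sketch of the general idea rather than the ITSM/TRANS procedure referred to in the text; the file names seriesJ.txt and seriesK.txt, the AR order p = 2 and the maximum lag 10 are placeholder choices.

```python
import numpy as np

def sample_acvf(x, max_lag):
    """Sample autocovariances gamma_hat(0), ..., gamma_hat(max_lag) (divisor n)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([np.dot(xc[:n - h], xc[h:]) / n for h in range(max_lag + 1)])

def yule_walker_ar(x, p):
    """Fit AR(p) to x by solving the Yule-Walker equations; return (phi, sigma2)."""
    g = sample_acvf(x, p)
    Gamma = np.array([[g[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(Gamma, g[1:])
    return phi, g[0] - phi @ g[1:]

def ar_residuals(x, phi):
    """Apply the whitening filter (1 - phi_1 B - ... - phi_p B^p) to the demeaned x."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    p = len(phi)
    return np.array([x[t] - phi @ x[t - p:t][::-1] for t in range(p, len(x))])

# Placeholder file names for Series J (input) and Series K (output).
x_in = np.loadtxt("seriesJ.txt")
x_out = np.loadtxt("seriesK.txt")

p = 2                                # illustrative AR order for the input
phi, sigma2_z = yule_walker_ar(x_in, p)
z = ar_residuals(x_in, phi)          # prewhitened input (approximately white noise)
y = ar_residuals(x_out, phi)         # output passed through the same filter
m = min(len(z), len(y))
z, y = z[:m], y[:m]

# Estimated impulse-response weights tau_j ~ cov(y_t, z_{t-j}) / var(z_t).
max_lag = 10
tau_hat = np.array([np.dot(y[j:], z[:m - j]) / m
                    for j in range(max_lag + 1)]) / np.var(z)
print(np.round(tau_hat, 3))
```

The lag at which the estimated weights first become appreciable suggests the delay parameter of the transfer function, and the pattern of the subsequent weights suggests the orders of its rational form.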
Bibliography

Ahlfors, L.V. (1953), Complex Analysis, McGraw-Hill, New York.
Akaike, H. (1969), Fitting autoregressive models for prediction, Annals of the Institute of Statistical Mathematics, Tokyo, 21, 243-247.
Akaike, H. (1973), Information theory and an extension of the maximum likelihood principle, 2nd International Symposium on Information Theory, B.N. Petrov and F. Csaki (eds.), Akademiai Kiado, Budapest, 267-281.
Akaike, H. (1978), Time series analysis and control through parametric models, Applied Time Series Analysis, D.F. Findley (ed.), Academic Press, New York.
Amos, D.E. and Koopmans, L.H. (1963), Tables of the distribution of coherences for stationary bivariate Gaussian processes, Sandia Corporation Monograph SCR-483.
Anderson, O.D. (1976), Time Series Analysis and Forecasting. The Box-Jenkins Approach, Butterworths, London.
Anderson, T.W. (1971), The Statistical Analysis of Time Series, John Wiley, New York.
Anderson, T.W. (1980), Maximum likelihood estimation for vector autoregressive moving average models, Directions in Time Series, D.R. Brillinger and G.C. Tiao (eds.), Institute of Mathematical Statistics, 80-111.
Ansley, C.F. (1979), An algorithm for the exact likelihood of a mixed autoregressive-moving average process, Biometrika, 66, 59-65.
Ansley, C.F. and Kohn, R. (1985), On the estimation of ARIMA models with missing values, Time Series Analysis of Irregularly Observed Data, E. Parzen (ed.), Springer Lecture Notes in Statistics, 25, 9-37.
Aoki, M. (1987), State Space Modelling of Time Series, Springer-Verlag, Berlin.
Ash, R.B. (1972), Real Analysis and Probability, Academic Press, New York.
Ash, R.B. and Gardner, M.F. (1975), Topics in Stochastic Processes, Academic Press, New York.
Bartlett, M.S. (1955), An Introduction to Stochastic Processes, Cambridge University Press.
Bell, W. and Hillmer, S. (1990), Initializing the Kalman filter for non-stationary time series models, Research Report, U.S. Bureau of the Census.
Berk, K.N. (1974), Consistent autoregressive spectral estimates, Ann. Statist., 2, 489-502.
Billingsley, P. (1986), Probability and Measure, 2nd ed., Wiley-Interscience, New York.
Birkhoff, G. and Mac Lane, S. (1965), A Survey of Modern Algebra, MacMillan, New York.
Blattberg, R. and Sargent, T.
(1971), Regression with non-Gaussian stable disturbances: Some sampling results, Econometrica, 39, 501-510.
Bloomfield, P. (1976), Fourier Analysis of Time Series: An Introduction, John Wiley, New York.
Box, G.E.P. and Jenkins, G.M. (1970), Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco.
Box, G.E.P. and Pierce, D.A. (1970), Distribution of residual autocorrelations in autoregressive-integrated moving average time series models, J. Amer. Statist. Assoc., 65, 1509-1526.
Breidt, F.J. and Davis, R.A. (1991), Time-reversibility, identifiability and independence of innovations for stationary time series, J. Time Series Analysis, 13, 377-390.
Breiman, L. (1968), Probability, Addison-Wesley, Reading, Massachusetts.
Brillinger, D.R. (1965), An introduction to polyspectra, Annals of Math. Statist., 36, 1351-1374.
Brillinger, D.R. (1981), Time Series Analysis: Data Analysis and Theory, Holt, Rinehart & Winston, New York.
Brillinger, D.R. and Rosenblatt, M. (1967), Asymptotic theory of estimates of kth order spectra, Spectral Analysis of Time Series, B. Harris (ed.), John Wiley, New York, 153-188.
Brillinger, D.R. and Rosenblatt, M. (1967), Computation and interpretation of kth order spectra, Spectral Analysis of Time Series, B. Harris (ed.), John Wiley, New York, 189-232.
Brockwell, P.J. and Davis, R.A. (1988a), Applications of innovation representations in time series analysis, Probability and Statistics, Essays in Honor of Franklin A. Graybill, J.N. Srivastava (ed.), Elsevier, Amsterdam, 61-84.
Brockwell, P.J. and Davis, R.A. (1988b), Simple consistent estimation of the coefficients of a linear filter, Stoch. Processes and Their Applications, 22, 47-59.
Brockwell, P.J., Davis, R.A. and Salehi, H. (1990), A state-space approach to transfer function modelling, in Inference from Stochastic Processes, I.V. Basawa and N.U. Prabhu (eds.).
Burg, J.P. (1967), Maximum entropy spectral analysis, 37th Annual International S.E.G. Meeting, Oklahoma City, Oklahoma.
Cambanis, S. and Soltani, A.R. (1982), Prediction of stable processes: Spectral and moving average representations, Z. Wahrscheinlichkeitstheorie verw. Geb., 66, 593-612.
Chatfield, C. (1984), The Analysis of Time Series: An Introduction, 3rd ed., Chapman and Hall, London.
Churchill, R.V. (1969), Fourier Series and Boundary Value Problems, McGraw-Hill, New York.
Cline, D.B.H. (1983), Estimation and linear prediction for regression, autoregression and ARMA with infinite variance data, Ph.D. Dissertation, Statistics Department, Colorado State University.
Cline, D.B.H. and Brockwell, P.J. (1985), Linear prediction of ARMA processes with infinite variance, Stoch. Processes and Their Applications, 19, 281-296.
Cooley, J.W. and Tukey, J.W. (1965), An algorithm for the machine calculation of complex Fourier series, Math. Comp., 19, 297-301.
Cooley, J.W., Lewis, P.A.W. and Welch, P.D. (1967), Historical notes on the fast Fourier transform, IEEE Trans. Electroacoustics, AU-15, 76-79.
Davies, N., Triggs, C.M. and Newbold, P. (1977), Significance levels of the Box-Pierce portmanteau statistic in finite samples, Biometrika, 64, 517-522.
Davis, H.F. (1963), Fourier Series and Orthogonal Functions, Allyn & Bacon, Boston.
Davis, M.H.A. and Vinter, R.B. (1985), Stochastic Modelling and Control, Chapman and Hall, London.
Davis, R.A. and Resnick, S.I.
(1986), Limit theory for the sample covariance and correlation functions of moving averages, Ann. Statist., 14, 533-558.
Doob, J.L. (1953), Stochastic Processes, John Wiley, New York.
Dunsmuir, W. and Hannan, E.J. (1976), Vector linear time series models, Adv. Appl. Prob., 8, 339-364.
Duong, Q.P. (1984), On the choice of the order of autoregressive models: a ranking and selection approach, J. Time Series Anal., 5, 145-157.
Fama, E. (1965), Behavior of stock market prices, J. Bus. U. Chicago, 38, 34-105.
Feller, W. (1971), An Introduction to Probability Theory and Its Applications, Vol. 2, 2nd ed., John Wiley, New York.
Fox, R. and Taqqu, M.S. (1986), Large sample properties of parameter estimates for strongly dependent stationary Gaussian time series, Ann. Statist., 14, 517-532.
Fuller, W.A. (1976), Introduction to Statistical Time Series, John Wiley, New York.
Gardner, G., Harvey, A.C. and Phillips, G.D.A. (1980), An algorithm for exact maximum likelihood estimation of autoregressive-moving average models by means of Kalman filtering, Applied Statistics, 29, 311-322.
Gentleman, W.M. and Sande, G. (1966), Fast Fourier transforms for fun and profit, AFIPS, Proc. 1966 Fall Joint Computer Conference, Spartan, Washington, 28, 563-578.
Geweke, J. and Porter-Hudak, S. (1983), The estimation and application of long-memory time series models, J. Time Series Analysis, 4, 221-238.
Gihman, I.I. and Skorohod, A.V. (1974), The Theory of Stochastic Processes I, translated by S. Kotz, Springer-Verlag, Berlin.
de Gooijer, J.G., Abraham, B., Gould, A. and Robinson, L. (1985), Methods of determining the order of an autoregressive-moving average process: A survey, Int. Statist. Review, 53, 301-329.
Gradshteyn, I.S. and Ryzhik, I.M. (1965), Tables of Integrals, Series and Products, Academic Press, New York.
Granger, C.W. (1980), Long memory relationships and the aggregation of dynamic models, J. Econometrics, 14, 227-238.
Granger, C.W.J. and Andersen, A.P. (1978), Non-linear time series modelling, Applied Time Series Analysis, D.F. Findley (ed.), Academic Press, New York.
Granger, C.W. and Joyeux, R. (1980), An introduction to long-memory time series models and fractional differencing, J. Time Series Analysis, 1, 15-29.
Gray, H.L., Kelley, G.D. and McIntire, D.D. (1978), A new approach to ARMA modelling, Comm. Statist., B7, 1-77.
Graybill, F.A. (1983), Matrices with Applications in Statistics, Wadsworth, Belmont, California.
Grenander, U. and Rosenblatt, M. (1957), Statistical Analysis of Stationary Time Series, John Wiley, New York.
Grunwald, G.K., Raftery, A.E. and Guttorp, P. (1989), Time series of continuous proportions, Statistics Department Report, 6, The University of Melbourne.
Hannan, E.J. (1970), Multiple Time Series, John Wiley, New York.
Hannan, E.J. (1973), The asymptotic theory of linear time series models, J. Appl. Prob., 10, 130-145.
Hannan, E.J. (1980), The estimation of the order of an ARMA process, Ann. Statist., 8, 1071-1081.
Hannan, E.J. and Deistler, M. (1988), The Statistical Theory of Linear Systems, John Wiley, New York.
Harvey, A.C. (1984), A unified view of statistical forecasting procedures, J. Forecasting, 3, 245-275.
Hosking, J.R.M. (1981), Fractional differencing, Biometrika, 68, 165-176.
Hurst, H. (1951), Long-term storage capacity of reservoirs, Trans. Amer. Soc. Civil Engrs., 116, 778-808.
Hurvich, C.M. and Tsai, C.L. (1989),
Regression and time series model selection in small samples, Biometrika, 76, 297-307.
Jagers, P. (1975), Branching Processes with Biological Applications, Wiley-Interscience, London.
Jones, R.H. (1975), Fitting autoregressions, J. Amer. Statist. Assoc., 70, 590-592.
Jones, R.H. (1980), Maximum likelihood fitting of ARMA models to time series with missing observations, Technometrics, 22, 389-395.
Jones, R.H. (1985), Fitting multivariate models to unequally spaced data, Time Series Analysis of Irregularly Observed Data, E. Parzen (ed.), Springer Lecture Notes in Statistics, 25, 158-188.
Kailath, T. (1968), An innovations approach to least squares estimation, Part I: Linear filtering in additive white noise, IEEE Transactions on Automatic Control, AC-13, 646-654.
Kailath, T. (1970), The innovations approach to detection and estimation theory, Proceedings IEEE, 58, 680-695.
Kalman, R.E. (1960), A new approach to linear filtering and prediction problems, Trans. ASME, J. Basic Eng., 83D, 35-45.
Kendall, M.G. and Stuart, A. (1976), The Advanced Theory of Statistics, Vol. 3, Griffin, London.
Kitagawa, G. (1987), Non-Gaussian state-space modelling of non-stationary time series, J.A.S.A., 82 (with discussion), 1032-1063.
Koopmans, L.H. (1974), The Spectral Analysis of Time Series, Academic Press, New York.
Lamperti, J. (1966), Probability, Benjamin, New York.
Lawrance, A.J. and Kottegoda, N.T. (1977), Stochastic modelling of riverflow time series, J. Roy. Statist. Soc. Ser. A, 140, 1-47.
Lehmann, E.L. (1983), Theory of Point Estimation, John Wiley, New York.
Li, W.K. and McLeod, A.I. (1986), Fractional time series modelling, Biometrika, 73, 217-221.
Lii, K.S. and Rosenblatt, M. (1982), Deconvolution and estimation of transfer function phase and coefficients for non-Gaussian linear processes, Ann. Statist., 10, 1195-1208.
Ljung, G.M. and Box, G.E.P. (1978), On a measure of lack of fit in time series models, Biometrika, 65, 297-303.
McLeod, A.I. and Hipel, K.W. (1978), Preservation of the rescaled adjusted range, I. A reassessment of the Hurst phenomenon, Water Resources Res., 14, 491-508.
McLeod, A.I. and Li, W.K. (1983), Diagnostic checking ARMA time series models using squared-residual autocorrelations, J. Time Series Analysis, 4, 269-273.
Mage, D.T. (1982), An objective graphical method for testing normal distributional assumptions using probability plots, American Statistician, 36, 116-120.
Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., Newton, J., Parzen, E. and Winkler, R. (1984), The Forecasting Accuracy of Major Time Series Methods, John Wiley, New York.
Melard, G. (1984), A fast algorithm for the exact likelihood of moving average models, Applied Statistics, 33, 104-114.
Mood, A.M., Graybill, F.A. and Boes, D.C. (1974), Introduction to the Theory of Statistics, McGraw-Hill, New York.
Nicholls, D.F. and Quinn, B.G. (1982), Random Coefficient Autoregressive Models: An Introduction, Springer Lecture Notes in Statistics, 11.
Parzen, E. (1974), Some recent advances in time series modelling, IEEE Transactions on Automatic Control, AC-19, 723-730.
Parzen, E. (1978), Time series modeling, spectral analysis and forecasting, Directions in Time Series, D.R. Brillinger and G.C. Tiao (eds.), Institute of Mathematical Statistics, 80-111.
Priestley, M.B. (1981), Spectral Analysis and Time Series, Vols. 1 and 2, Academic Press, New York.
Priestley, M.B. (1988), Non-linear and Non-stationary Time Series Analysis, Academic Press, London.
Rao, C.R. (1973), Linear Statistical Inference and Its Applications, 2nd ed., John Wiley, New York.
Rice, J. (1979), On the estimation of the parameters of a power spectrum, J. Multivariate Analysis, 9, 378-392.
Rissanen, J. (1973), A fast algorithm for optimum linear predictors, IEEE Transactions on Automatic Control, AC-18, 555.
Rissanen, J. and Barbosa, L. (1969), Properties of infinite covariance matrices and stability of optimum predictors, Information Sci., 1, 221-236.
Rosenblatt, M. (1985), Stationary Sequences and Random Fields, Birkhauser, Boston.
Schweppe, F.C. (1965), Evaluation of likelihood functions for Gaussian signals, IEEE Transactions on Information Theory, IT-11, 61-70.
Seeley, R.T. (1970), Calculus of Several Variables, Scott Foresman, Glenview, Illinois.
Serfling, R.J. (1980), Approximation Theorems of Mathematical Statistics, John Wiley, New York.
Shapiro, S.S. and Francia, R.S. (1972), An approximate analysis of variance test for normality, J. Amer. Statist. Assoc., 67, 215-216.
Shibata, R. (1976), Selection of the order of an autoregressive model by Akaike's information criterion, Biometrika, 63, 117-126.
Shibata, R. (1980), Asymptotically efficient selection of the order of the model for estimating parameters of a linear process, Ann. Statist., 8, 147-164.
Simmons, G.F. (1963), Introduction to Topology and Modern Analysis, McGraw-Hill, New York.
Sorenson, H.W. and Alspach, D.L. (1971), Recursive Bayesian estimation using Gaussian sums, Automatica, 7, 465-479.
Sowell, F.B. (1990), Maximum likelihood estimation of stationary univariate fractionally integrated time series models, J. of Econometrics, 53, 165-188.
Stuck, B.W. (1978), Minimum error dispersion linear filtering of scalar symmetric stable processes, IEEE Transactions on Automatic Control, AC-23, 507-509.
Stuck, B.W. and Kleiner, B. (1974), A statistical analysis of telephone noise, The Bell System Technical Journal, 53, 1263-1320.
Subba Rao, T. and Gabr, M.M. (1984), An Introduction to Bispectral Analysis and Bilinear Time Series Models, Springer Lecture Notes in Statistics, 24.
Taqqu, M.S. (1975), Weak convergence to fractional Brownian motion and to the Rosenblatt process, Z. Wahrscheinlichkeitstheorie verw. Geb., 31, 287-302.
Tong, H. (1983), Threshold Models in Non-linear Time Series Analysis, Springer Lecture Notes in Statistics, 21.
Tong, H. (1990), Non-linear Time Series: A Dynamical Systems Approach, Oxford University Press, Oxford.
Tukey, J. (1949), The sampling theory of power spectrum estimates, Proc. Symp. on Applications of Autocorrelation Analysis to Physical Problems, NAVEXOS-P-735, Office of Naval Research, Washington, 47-67.
Walker, A.M. (1964), Asymptotic properties of least squares estimates of the parameters of the spectrum of a stationary non-deterministic time series, J. Aust. Math. Soc., 4, 363-384.
Wampler, S. (1988), Missing values in time series analysis, Statistics Department, Colorado State University.
Weiss, G. (1975), Time-reversibility of linear stochastic processes, J. Appl. Prob., 12, 831-836.
Whittle, P. (1962), Gaussian estimation in stationary time series, Bull. Int. Statist. Inst., 39, 105-129.
Whittle, P. (1963), On the fitting of multivariate autoregressions and the approximate canonical factorization of a spectral density matrix, Biometrika, 40, 129-134.
Whittle, P. ( 1 983), Prediction and Regulation by Linear Least-Square Methods, 2nd ed., University of Minnesota, Minneapolis. Wilson, G.T. ( 1969), Factorization of the generating function of a pure moving average process, SIA M J. Num. Analysis, 6, 1 -7. Yajima, Y. ( 1 985), On estimation of long-memory time series models, The A ustralian J. Statistics, 27, 303-320. Index Accidental deaths in the USA, 1 973-1978 (monthly) 7, 2 1 -25, 1 1 3, 324--3 26 ACF (see Autocorrelation function) AIC 273, 284--2 87, 291 -296, 302 AICC 243, 247, 252, 273, 287, 289, 302, 365, 43 1 , 432 Airline passenger data 1 5, 284-287, 29 1 -296 All Star baseball games (1 933-1980) 5, 10 Amplitude spectrum (see Crossamplitude spectrum) AN (see Asymptotic normality) Analysis of variance 332, 335 AR( oo) processes (see Autoregressive processes of infinite order) AR(p) processes (see Autoregressive processes) ARIMA processes 274--2 83 definition 274 prediction 3 14--320 seasonal (see Seasonal ARIMA models) state-space representation of 471 ARIMA(O, d, 0) processes with .5 < d < .5 (see Fractionally integrated noise) ARIMA(p, d, q) processes with .5 < d < .5 (see Fractionally integrated ARMA processes) ARMA(p, q) processes 78 � � autocovariance function of 91-97 calculating coefficients in AR representation 87, I l l in MA representation 85, 91-92, 473 causal 83, 85 equations for 78 estimation least squares 257 maximum likelihood 257 preliminary 250-252 identification 252, 296--299 invertible 86, 1 28 multivariate (see M ultivariate A R MA processes) prediction 1 7 5-1 82 seasonal (see Seasonal ARIMA models) spectral density of 1 23 state-space representations 468-471 canonical observable representation 469-47 1 with infinite variance 537 with mean 11 78 with observational error 472-473 ARVEC 432, 434, 460 Asymptotic normality 209 for random vectors 2 1 1 Asymptotic relative efficiency 253-254 Autocorrelation function (ACF) 1 2 analogue for linear processes with infinite variance 538 568 Autocorrelation function (ACF) (cont.) 
sample 29, 220, 527, 538 asymptotic covariance matrix of 221, 222 asymptotic distribution of 221, 222, 538 Autocovariance function characterization of 27 definition I I difference equations for 93 elementary properties of 26 Herglotz's theorem 1 1 7-1 1 8 of ARMA processes 91-97, 473-474 of AR(p) processes 94 of AR(2) processes 95 of complex-valued time series (see Complex-valued autocovariance function) of MA(q) processes 94 of MA( I ) processes 9 1 sample 28, 220 asymptotic covariance matrix of 230 asymptotic distribution of 229-230 computing using FFT 374-375 consistency of 232 spectral representation of 1 1 7-1 1 8 Autocovarianee generating function 103-1 05 of ARMA processes 103 Autoregressive integrated moving average (see ARIMA processes) Autoregressive models with random coefficients 548 Autoregressive moving average (see ARMA processes) Autoregressive polynomial 78 Autoregressive (AR(p)) processes 79 AR( I ) process 79, 81 with infinite variance 542-552 estimation Burg 366 maximum likelihood 259 preliminary 241-245 Yule-Walker 1 60, 239, 241 , 262, 279, 365, 542 asymptotic distribution of 241, 262-264 FPE 301-302, 307 identification of the order 242-243 partial autocorrelation function of 100 prediction 1 77 I ndex Yule-Walker equations 239, 279 Autoregressive processes of infinite order (AR(cx:: )) 91, 405 Autoregressive spectral density estimator 365 Backward shift operator 1 9, 78, 4 1 7 Bandwidth 362, 399 Bartlett's formula 22 1 , 222, 527 AR( I ) 224-225 independent white noise 222 MA(q) 223-224 multivariate 4 1 5, 4 1 6, 508 Bayesian state-space model 498, 505 Bessel's inequality 56 Best estimator 474 Best linear predictor 64, 1 66, 1 68, 3 1 7, 546 of second order random vectors 421 Best linear unbiased estimator of J1 220, 236 Best mean square predictor 62, 546 Best one-step predictor 474 BIC criterion 289, 29 1, 306 Bienayme-Galton-Watson process 10 Bilinear models 548, 552 Binary process 9 Bispectral density 547 Bivariate normal distribution 35-36, 37 Bounded in probability 199 (see also Order in probability) Branching process 10 Brownian motion 37-38 with drift 38 on [ - n, n] 1 39, 147, 1 62, 1 64 multivariate 454-455 CAT criterion 287, 365 Cauchy criterion 48 Cauchy distribution 449, 461 Cauchy-Schwarz inequality 43 Cauchy sequence 46 Causal 83, 1 05, 1 25-1 30 ARMA processes 85, 88 fractionally integrated ARMA processes 521, 525 multivariate ARMA processes 4 1 8, 424 seasonal ARIMA processes 323 state equation 467 time-invariant linear filter 1 53, 459 Index Cayley-Hamilton theorem 470, 492 Central limit theorem 2 1 0 for infinite order moving averages 2 1 9 for MA(q) processes 2 1 4 for strictly stationary m-dependent sequences 2 1 3 Lindeberg's condition 2 1 0, 345, 363, 445 Characteristic function 1 1 of multivariate normal random vector 34 of a random vector 1 1 of a stable random variable 535 Chebyshev's inequality 203 Cholesky factorization 255 Circulant matrix 133 approximation to the covariance matrix of a stationary process by 1 35-1 36 diagonalization of 1 34-- 1 35 eigenvalues of 1 34 eigenvectors of 1 34 Classical decomposition 14-- 1 5, 40, 284 Closed span 54 Closed subspace 50 Coherency 436 (see also Squared coherency) Complete orthonormal set 56 Completeness of L 2(0, :F, P) 68-69 Complex multivariate normal distribution 444 Complex-valued autocovariance functions 1 1 5, 1 1 9, 1 20 Complex-valued stationary processes 1 14-1 1 5 autocovariance function o f 1 1 5 existence of 1 64 linear combination of sinusoids 1 1 6-- 1 1 7 spectral representation of 
145 Conditional expectation 63-65, 76 Confidence regions for parameters of ARMA models 260--261, 291 for parameters of AR(p) models 243 for parameters of MA(q) models 247 for the absolute coherency 450, 453 for the mean vector of a stationary multivariate time series 407 for the phase spectrum 449-450, 453 for the spectrum 362-365 (see also Spectral density estimation) 569 Conjugate transpose 402 Consistency condition for characteristic functions 1 1 for distribution functions 1 1 Controllability 49 1 Controllability matrix 492 Convergence almost surely 375 in distribution 204-209 characterizations of 204 in mean square 47 in norm 45, 48 in probability 1 98, 199, 375 in rth mean 203 Correlation function (see Autocorrelation function) Correlation matrix function 403 Cospectrum 437, 462 estimation of 447 Covariance function (see Autocovariance function) Covariance matrix 32 Covariance matrix function 403 characterization of 454 estimation of 407 properties of 403 spectral representation of 435, 454 Covariance matrix generating function 420, 460 Cramer-Wold device 204 Cross-amplitude spectrum 437 estimation of 448 Cross-correlation function 406 sample 408 asymptotic distribution of 4 1 0, 416 Bartlett's formula 4 1 5, 4 1 6 weak consistency o f 408 Cross-covariance function 403 sample 408 weak consistency of 408 spectral representation of 435 Cross-periodogram 443 Cross-spectrum 435 Cumulant 444, 547 Cumulant function 547 of linear processes 548 Current through a resistor 2 Delay parameter 508, 549 Deterministic 1 50, 1 87 Diagnostic checking 306--3 1 4 (see also Residuals) Index 570 Difference equations (see also Homogeneous linear difference equations) for ARIMA processes 274, 3 1 6 for h-step predictor 3 1 9-320 multivariate 429-430 for multivariate ARMA processes, 4 1 7 Difference operator first order 19 with positive lag d 24 with real lag d > I 520 Differencing to generate stationary data 1 9, 274-284 at lag d 24 Dimension of a subspace 58 Dirichlet kernel 69-70, ! 
57, 359, 361 Discrete Fourier transform 332, 373 multivariate 443 Discrete spectral average (see Spectral density function) Dispersion 535, 537 Distribution function associated with an orthogonal-increment process 1 39, 454 Distribution functions of a stochastic process 1 1 , 4 1 , 1 64 Dow Jones utilities index 555 Durbin-Levinson algorithm 1 69-1 70, 269, 422 for fitting autoregressive models 242, 432 - Econometrics model 440-44 1 Empirical distribution function 338, 341 Equivalent degrees of freedom 362, 399 Ergodic stationary sequence 379 Ergodic theorem 268, 379, 381 Error dispersion 543, 544 Estimation of missing values in an ARMA process 488 Estimation of the white noise variance least squares 258, 377 maximum likelihood 257, 377 preliminary 251 using the Durbin-Levinson algorithm 240, 242 using the innovations algorithm 245-246 Euclidean space 43, 46 completeness of 46 Eventual forecast function 549 Fast Fourier transform (FFT) 373-375 Fejer kernel 70-71 , 360 FFT (see Fast Fourier transform) Filter 350 (see also Time-invariant linear filter and smoothing) low pass 1 8 successive application of filters 354, 398 Fisher's test (see Testing for hidden periodicities) Forecasting (see Prediction of stationary processes) Forecasting ARIMA processes 3 1 4-320 an ARIMA(1, 2, 1) example 3 1 8-320 h-step predictor 3 1 8 mean square error of 3 1 8 Fourier coefficients 56, 66 Fourier frequencies 3 3 1 Fourier series 65-67 nth order Fourier approximation 66 to an indicator function 1 57- 1 58 uniform convergence of Cesaro sums 69 Fourier transform 396 FPE criterion 289, 301-302, 306 Fractionally integrated ARMA processes 524 autocorrelation function of 525 causal 525 estimation 526--5 32 maximum likelihood 527-528 regression method 529-530 invertible 525 prediction 533-534 spectral density of 525 with d < - .5 526 Fractionally integrated noise 521 autocorrelation function of 522 autocovariance function of 522 partial autocorrelation function of 522 spectral density function of 522 Frequency domain 1 14, 1 43, 330 prediction in 1 85-1 87 Gain (see Time-invariant linear filter) Gamma function 521 Gaussian likelihood 254, 255 (see also Recursive calculation of the Gaussian likelihood) 571 Index of a multivariate ARMA process 430-43 1 of an AR(p) process 270 of an ARMA process 256 with missing observations 486 of general second order process 254-256 Gaussian linear process 546 Gaussian time series 1 3 bivariate 1 64, 4 1 6 multivariate 430 prediction 1 82 General linear model 60-62, 75 Generalized inverse 475, 503 Gibbs phenomenon 400 Gram-Schmidt orthogonalization 58, 74, 75, 1 7 1 , 381 Group delay 440 Helly's selection theorem 1 1 9 Herglotz's theorem 28, 1 1 7-122 Hermite polynomials 75 Hermitian function 1 1 5 Hermitian matrix 435 Hessian matrix 29 1 Hilbert space, definition 46 closed span 54 closed subspace 50 complex L2 spaces 47 Euclidean space 46 isomorphisms, 67-68 [2 67, 76 L2 46, 47 U[ - n, n] 65 orthogonal complement 50 separable 56 Homogeneous linear difference equations 105-1 1 0 first order I 08 general solution of 107 initial conditions 106 linear independent solutions of 106 second order 108 Identification techniques 284-301 of ARMA processes 296-299 of AR(p) processes 289-291 of fractionally integrated ARMA processes 530 of MA(q) processes 291-294 of seasonal ARIMA processes 323 78, 404 Independent white noise 222 (see l iD) Index set 8 Inner product 42-43 continuity of 45 Inner-product space 42-43 complete 46 orthonormal set 55 Innovations 1 29, 1 73, 476 standardized 265 
Innovations algorithm 1 72 applied to ARMA processes 1 75 for preliminary estimation of MA(q) models 245-249, 292-294, 367 multivariate 423 Integration with respect to an orthogonal-increment process (see Stochastic integral) Intermediate memory processes 520 Inversion formulae (see Spectral distribution function and orthogonal-increment processes) Invertible 86, I 05, 1 25-1 30 ARMA processes 86, 87, 1 28 fractionally integrated ARMA processes 521, 525 infinite variance ARMA processes 538 multivariate ARMA processes 419 time-invariant linear filter 1 5 5 Isomorphism 67, 68 between time and frequency domains 143-144 properties of 68 liD Kalman recursions 474-482 filtering 478 fixed-point smoother 478 prediction 476 ARIMA(p, I , q) 48 1 h-step 477-478 Kolmogorov-Smirnov test (see Testing for hidden periodicities) Kolmogorov's formula 191, 197, 366 Kolmogorov's theorem 9, 1 0, 27, 38, 41, 1 64 statement I I Kullback-Leibler discrepancy 302 Kullback-Leibler index 302 572 Lag window 358 Bartlett or triangular 360, 361, 399 Blackman-Tukey 361 Daniell 360, 361, 399 Parzen 361, 399 rectangular or truncated 359 Tukey-Hamming 361 Tukey-Hanning 361 Lag-window spectral estimator (see Spectral density estimation) Lake Huron (1 875-1 972) 328, 398, 555 Laurent series 88, 1 30 Least squares estimation for ARMA processes 257-258, 377 asymptotic properties 258-260, 384, 386 derivation of 265-269, 376-396 of variance 258 Least squares estimation for transfer function models 509 Least squares estimation of trend 1 5 Lebesgue decomposition of the spectral measure 1 90 Lebesgue measure 190 Likelihood function (see Gaussian likelihood) Lindeberg's condition (see Central limit theorem) Linear approximation in IR 3 48 in L 2 49 Linear filter 1 7, 1 52, 441 (see also Time-invariant linear filter) Linear multivariate processes 404 Linear processes with infinite variance 535-545 analogue of the autocorrelation function 538 Linear regression 60--6 2 Long memory processes 520--5 34 Lynx trappings ( 1 82 1-1934) 546, 549-552, 557 m-dependence 2 1 2-21 3, 263 Matrix distribution function 454 MA(q) (see Moving average processes) MA(oo) (see Moving average processes of infinite order) Markov chain 196 Markov property 465 Martingale difference sequence 546, 553 Maximum entropy spectral density estimator (see Spectral density estimation) Index Maximum likelihood estimation 256 Maximum likelihood estimation for ARMA processes 256-258 asymptotic properties 258-260, 384, 386 derivation of 265-269, 376-396 Maximum likelihood estimation for fractionally integrated ARMA processes 527-528 M aximum likelihood estimation for transfer function models 5 1 4 Maximum likelihood spectral density estimator (see Spectral density estimation) Mean best linear unbiased estimator of 220, 236 of a multivariate stationary time series 403 of a random vector 32 sample 29, 2 1 8, 406 asymptotic normality of 2 1 9, 406 derivation of 225 mean squared error of 2 1 8-21 9 Mean square convergence 47, 62 properties of 62 Minimum dispersion h-step predictor 544 M inimum dispersion linear predictor 544 Minimum mean squared error of prediction of a stationary process 53-54 Mink trappings ( 1848-1 9 1 1 ) 46 1, 557 Missing values in ARMA processes 482--488 estimation of 487-488 likelihood calculation with 483--486 MLARMA (see Spectral density estimation) Moment estimator 240, 249, 253, 270, 362 Moment generating function 39, 41 Moving average polynomial 78 Moving average (MA(q)) processes 78, 89-90 autocorrelation function of 80 autocovariance function of 
79 estimation innovations 245, 270 maximum likelihood 259, 270 method of moments 249, 253, 270, 540 preliminary 245-250, 291-294 Index invertible and non-invertible versions 295, 326 order identification of 246--247, 291-294 prediction 1 77 with infinite variance 540-541 Moving average processes of infinite order (MA( oo )) 89-9 1 autocovariance function of 9 1 multivariate 405 with infinite variance 536 Multiple correlation coefficient 45 1 Multivariate ARMA processes 4 1 7 AR( oo ) representation 4 1 7 causal 420 covariance matrix function of 420 covariance matrix generating function of 420 estimation 430-434 maximum likelihood 43 1 Yule-Walker 432 identification 431 invertible 42 1 MA( oo ) representation 4 1 8 prediction 426-430 (see also Recursive prediction) Yule-Walker equations 420 Multivariate innovations algorithm 425 applied to an ARMA( 1 , 1) process 427-428 one-step predictors 425 prediction error covariance matrix 425 Multivariate normal distribution 33-37 an equivalent characterization of 36 characteristic function of 34 conditional distribution 36 conditional expectation 36 density function 34 Multivariate time series 402 covariance matrices of 402 mean vectors of 402 prediction 421-430 stationary 402-403 Multivariate white noise 404 Muskrat trappings ( 1 848-19 1 1) 461, 557 Noise 1 5 (see also White noise) Non-deterministic, purely 1 89, 1 97, 546 573 Non-negative definite function 26-27, 1 1 5, 1 1 7 Non-negative definite matrix 33 Non-stationary time series 29, 274 Norm 43 convergence in 45, 48 properties of 45 Observability 496 Observability matrix 496 Observational equation 464 One-step mean squared error based on infinite past 1 87, 295 Kolmogorov's formula 1 9 1 One-step predictors (see Prediction of stationary processes) Order in probability 1 99 for random vectors 200 Order selection 301-306 Orthogonal complement of a subspace 50 elements in a Hilbert space 44 matrix 33 eigenvalues of 33 eigenvectors of 33 projection 5 1 random vectors 421, 464 Orthogonal-increment process 1 38-1 40, 145, 1 52 integration with respect to 1 40-143 inversion formula for 1 5 1 , 152 right continuous 1 35, 454 vector-valued 454 Orthonormal basis 56 for C" 3 3 1 for IR " 333 Orthonormal set 55 complete 56 of random variables 55 PACF (see Partial autocorrelation function) Parallelogram law 45 Parameters of a stable random variable 535 Pareto-like tails 535 Parseval's identity 57 Partial autocorrelation function (PACF) 98- 1 02 an equivalent definition of 1 02, 1 7 1 estimation o f 102, 243 of an AR( 1 ) process 98 Index 574 Partial autocorrelation function (cont.) 
of an AR(p) process 100 of an MA( l ) process 100, 101, 1 1 3 sample 1 02, 243 Periodogram 332 asymptotic covariance of ordinates for iid sequences 344 for two-side moving averages 347-348 asymptotic distribution for iid sequences 344 for two-sided moving averages 347-348 asymptotic unbiasedness 343 cumulative periodogram 341 extension to [ - n, n] 343 multivariate 443 asymptotic properties of 443-446 smoothing of 350-362, 446 PEST 23, 24, 25, 40, 1 1 1 , 1 60, 1 6 1 , 243, 252, 257, 26 1 , 270, 276, 277, 284, 292, 295, 414, 508, 5 1 0, 550 Phase spectrum 437 confidence interval for 449-450, 453 estimation of 448-450, 452, 453 Poisson process 38 on [ - n, n] 1 39 Polyspectral density 547 Population of USA ( 1 790-1980) 3, 1 5-16, 20 Portmanteau test for residuals 3 1 0-3 1 2 Power transfer function 1 23 (see also Time-invariant linear filter) Prediction bounds 1 82 Prediction equations 53 for h-step predictors 1 68 for multivariate time series 42 1 for one-step predictors 1 67 in the time domain 1 67 Prediction of stationary processes (see also Recursive prediction) AR(p) processes 1 77 ARMA processes 1 75-182 based on infinite past 1 82-1 84 covariance matrix of prediction errors 1 84 h-step prediction 1 79-1 82 truncation approximation 1 84 best linear predictor of a stationary process 1 66, 1 68 fractionally integrated ARMA processes 533-534 Gaussian processes 1 82 prediction bounds 182 h-step prediction 1 68, 1 74-1 7 5, 1 79-1 82 mean squared error of 1 75 in frequency domain 1 85-1 87 ARMA processes 186 infinite variance processes 542-545 M A( l ) processes 544 MA( l ) processes 1 73 MA(q) processes 1 77 multivariate ARMA processes 426-430 one-step predictors 167, 1 72, 425, 474, 476 mean squared error of 1 69, 1 72, 425, 474, 476 Preliminary transformations 284 Prewhitening 412, 4 1 3, 414, 4 1 5, 507 Projection in IR" 58-60 mapping 52 properties of 52 matrix 59 of multivariate random vectors 421, 475 theorem 5 1 Quadrature spectrum 437, 462 estimation of 447 and S arrays 273 Random noise component 1 5 Random variable 9 Random vector 32 Random walk 1 0, 14 Rational spectral density (see Spectral density function) Realizations of a stochastic process 9 Recursive calculation of the Gaussian likelihood function 254-256 for an ARMA process with missing observations 483-485 Recursive prediction 1 69- 1 82 Durbin-Levinson algorithm 1 69-1 70 h-step predictors 1 74-1 75 for ARMA processes 1 79-1 82 mean squared error of 1 8 1 innovations algorithm 1 72 Kalman prediction (see Kalman recursions) multivariate processes 422-430 R Index Durbin-Levinson algorithm 422--423 innovations algorithm 425 of a multivariate ARMA process 426-427 of an AR(p) process 1 77 of an ARMA process 1 75-182 of an MA(q) process 1 77 Reduced likelihood 257, 272 Residuals 307 application to model modification 299-301 diagnostic checking 306-3 1 4 check o f normality 3 1 4 graph of 307 sample autocorrelation function of 308-3 1 0 tests o f randomness for 3 1 2-3 1 3 Reversible time series 546 Riemann-Lebesgue lemma 76 Riemann-Stieltjes integral 1 1 6 Sales with a leading indicator 4 1 4, 432-434, 5 10--5 1 2, 5 1 4-520 Sample autocorrelation function 29, 220, 527, 538 of residuals 307 autocovariance function 28, 220 coefficient of variation 2 1 2 covariance matrix 220 non-negative definite 22 1 mean 29, 2 1 8, 527 multivariate 406 SARIMA (see Seasonal A RIMA models) Seasonal ARIMA models 320--3 26 Seasonal component 1 5, 284 differencing 24 estimation of 20--2 5 method S1 (small trend) 2 1 , 24 method S2 (moving 
average) 23, 24 Separable Hilbert space 56 Series A (see Lake Huron ( 1 875-1972)) Series B (Dow Jones Utilities Index) 329, 555 Series C (Private Housing Units Started, USA) 327, 554 Series D (Industrial Production, Austria) 329, 556 Series E (Industrial Production, Spain) 329, 556 575 Series F (General Index of Industrial Production) 329, 557 Series G (see Lynx Trappings ( 1 821-1934)) Series H (see M ink Trappings ( 1 848-191 1)) Series I (see Muskrat Trappings ( 1 848-191 1)) Series J (Simulated Input Series) 460, 553, 558 Series K (Simulated Output Series) 460, 553, 559 Shift operator (see Backward shift operator) Simulation of an ARMA process 27 1 multivariate 460 Simulation of a Gaussian process 27 1 multivariate 460 SMOOTH 1 7 Smoothing exponential 1 7 by means of a moving average 1 6- 1 9 the periodogram (see Spectral density estimation) using a simple moving average 350, 353 S PEC 354, 397, 452, 46 1 Spectral density estimation autoregressive 365, 367, 369, 370, 372 discrete spectral average 3 5 1 asymptotic properties of 3 5 1 , 353 confidence intervals using x 2 approximation 362-363 confidence intervals using a normal approximation 363-364 simple moving average 350, 353 lag window 354, 358, 372 asymptotic variance of 359 maximum entropy 365 maximum likelihood ARMA (MLARMA) 367, 368, 370 moving average 367, 370 rational 365, 372 smoothing the periodogram 350--362 Spectral density function 1 1 8, 1 19, 1 20, 1 22 an asymptotically unbiased estimator of 1 3 7 i n the real-valued case 1 22 of ARMA processes 1 23 causality and invertibility 1 25-1 30 of an AR( l ) 1 25 of an MA( l ) 123 rational 1 23 576 Spectral density function (cont.) rational function approximations to 1 30-133 Spectral density matrix 435, 443, 457 estimation of 446-447 discrete spectral average 446-447 smoothing the periodogram 446-447 Spectral distribution function 1 1 8, 1 1 9, 145 discontinuity in 1 48, 1 50 in the real-valued case 1 2 1 inversion formula for 1 5 1 , 1 52 Lebesgue decomposition of 1 90 of a linear combination of sinusoids 1 1 6 Spectral distribution matrix function 454 Spectral matrix function (see Spectral density matrix) Spectral representation of an autocovariance function 1 1 8 of a continuous-time stationary process 1 52 of a covariance matrix function 405, 454 of a stationary multivariate time series 405, 456 of a stationary time series 145 Spectral window 358 Spectrum (see Spectral density function and cross-spectrum) Spencer's 1 5 point moving average 1 8-19, 39 Squared coherency function 436-439 confidence interval for 450, 453 estimation of 450, 453 test of zero coherency 451, 452 Stable matrix 467 Stable random variables 535-536 parameters of 535 positive 2 1 6 properties of 535 symmetric 535 State equation 464 State-space models 463--474 Bayesian 498, 505 for threshold models 548 non-stationary 479--48 1 stationary 467 causal 467 controllable 49 1 innovations representation 489, 490 Index minimum dimension 497 observable 496 stable 467 with missing observations 482-488 Stationarity 1 2 covariance 1 2 i n the wide sense 1 2 second order 1 2 strict 12 weak 1 2 Stirling's formula 522 Stochastic integral 142, 455 properties of 1 42, 455 with respect to a vector-valued orthogonal increment process 455 Stochastic process 8 distribution functions of 1 1 , 4 1 realizations o f 9 Strict stationarity 1 2 Strikes i n the USA ( 1 951-1980) 4, 1 7, 1 8, 1 1 3 Strong consistency of estimators for ARMA parameters 376-388 Strong law of large numbers 376 Taylor expansions in probability 201-202 
Testing for hidden periodicities 334-342 Fisher's test 337-339, 342 Kolmogorov-Smirnov test applied to the cumulative periodogram 339-342 of a specified frequency 334-337 of an unspecified frequency 337-342 Testing for the independence of two stationary time series 4 1 2-4 1 3 Tests o f randomness 3 1 2-3 1 3 based on the periodogram 339-342 based on turning points 3 1 2 difference-sign test 3 1 3 rank test 3 1 3 Threshold models 545-552 Time domain 1 14, 145 prediction equations 53 Time-invariant linear filter (TLF) 123, 1 53, 438, 439, 457--458, 506 absolutely summable 1 54 amplitude gain 1 56 causal 1 5 3 for multivariate time series 457-459 invertible 1 55 matrix transfer function 458 577 Index phase gain 1 56 power gain 1 56 power transfer function 123, 1 56 simple delay 442 transfer function 1 23, 1 56, 442 Time series 1 discrete-time continuous-time 1 Gaussian 1 3 TLF (see Time-invariant linear filter) TRANS 414, 461 , 509, 5 1 0 Transfer function (see Time-invariant linear filter) Transfer function models 432, 506 estimation for 507-5 10, 5 1 4 prediction for 5 1 5, 5 1 7-520 state-space representation of 5 1 3-5 1 7 Transformations 1 5 (see also Identification techniques) variance-stabilizing 284 Trend component 1 5, 284 elimination of 1 5-24 in absence of seasonality 1 5 differencing 1 9 estimation of 1 5, 1 9 b y least squares 1 5 by smoothing with a moving average 1 6 randomly varying with noise 465, 466 Triangle inequality 44 Trigonometric polynomials 69, 1 50, 1 57 Vandermonde matrix I 09 Weak law of large numbers 206 for infinite order moving averages 208 Weakly consistent 253 Weighted sum of squares 257 White noise 78 multivariate 404 Window (see Lag window) Wold decomposition 1 87, 546 Wolfer sunspot numbers ( 1770-1 869) 6, 29, 32, 1 60, 161, 269, 354, 397 WORD6 40, 1 94 WN (see White noise) Yule-Walker equations (see Autoregressive processes and multivariate ARMA processes)