Conclusions


In this data exploration, I sought to examine the relationship between various indicators of economic well-being in America. Specifically, I was interested in the relationship between formal measures (like unemployment rates from the Bureau of Labor Statistics) and potential informal signals, namely Google Trends data on economic topics. How closely did these measures track each other? Were some of these variables useful leading indicators of others?

Getting to Know Our Data

Every time series analysis begins with visualizing the data. By plotting my time series, I was quickly able to spot some key patterns that would shape how I needed to treat the data (if I were to use all of it in every analysis).

Relationships Among The Variables

Because I was interested in the relationships among variables, I added an extra section to my website on multivariate relationships. Please look there for the full analysis, which I summarize here.

I was able to fit a VAR(1) model showing that unemployment was significantly related to the lag-1 values of unemployment, Google searches for 'unemployment', consumer sentiment, and Google searches for 'food stamps'. This is important because it suggests that Google Trends could give a one-month warning of unemployment statistics.
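To make the structure of that result concrete, here is a sketch of what the unemployment equation of a VAR(1) looks like, showing only the four lag-1 terms reported as significant above. The symbols are my own shorthand for this sketch, not notation from the multivariate page: u is the unemployment rate, g^unemp and g^fs are the Google search indices for 'unemployment' and 'food stamps', s is consumer sentiment, and the coefficients are placeholders rather than fitted values.

```latex
u_t = c
    + \phi_1 \, u_{t-1}
    + \phi_2 \, g^{\text{unemp}}_{t-1}
    + \phi_3 \, s_{t-1}
    + \phi_4 \, g^{\text{fs}}_{t-1}
    + \varepsilon_t
```

Every predictor on the right-hand side is dated one month earlier than the unemployment rate on the left, which is why significant coefficients on the search terms would amount to a one-month warning.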

Areas for Future Analysis

This time series analysis suggests several areas for future research.

Forecasting Does Poorly in the COVID-19 Era

Here I compare the chosen SARIMA(4,1,1)(2,0,2)[12] model of US unemployment with two benchmark methods (the series mean and the naive forecast), and with the actual rate for 2020 (January onward). This comparison uses only data from January 2004 onward.

As we can see, all of the forecasts performed terribly for 2020 because none of the methods could have anticipated the COVID-19 crisis. The 'mean' method (which simply predicts that the future will equal the mean of all past observations) did the best, but it still performed poorly.
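For readers unfamiliar with the two benchmarks, the following is a minimal sketch of how they work and how they can be scored against what actually happened. It is not the code behind this page: the function names are hypothetical, and the numbers are made up for illustration rather than taken from the BLS series.

```python
import math


def mean_forecast(history, horizon):
    """Predict the mean of all past observations for every future step."""
    avg = sum(history) / len(history)
    return [avg] * horizon


def naive_forecast(history, horizon):
    """Predict the last observed value for every future step."""
    return [history[-1]] * horizon


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up illustrative numbers, NOT the BLS series used in this analysis.
history = [5.0, 4.8, 4.6, 4.4, 4.2, 4.0]
actual = [4.1, 9.0, 12.0]  # a sudden shock that the history does not contain

for name, method in [("mean", mean_forecast), ("naive", naive_forecast)]:
    forecast = method(history, len(actual))
    error = rmse(actual, forecast)
    print(f"{name}: forecast={forecast[0]:.2f}, RMSE={error:.2f}")
```

Both benchmarks produce a flat line into the future, so neither can follow a sudden jump. Which one lands closer is largely a matter of where the historical mean happens to sit relative to the shock.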

Please see the ARIMA page for discussion of the error rates of these methods.

In this way, COVID-19 is a reality check on the limits of time series forecasting, because these basic methods assume that the patterns and relationships of recent time periods will continue to hold in the future.

It is possible that, going forward, the huge spikes in economic indicators in 2020 will require us to increase the uncertainty in future models, i.e., in anticipation of future pandemics. Having seen the high impact of a public health emergency on the US economy, perhaps forecasters could incorporate information about the current risk of health threats, or simply project a wider prediction interval to capture this possibility.
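As a rough illustration of the second idea, here is a minimal sketch of widening the prediction interval around a naive forecast. It assumes the textbook interval for the naive method (last value plus or minus z times sigma times the square root of the horizon, with roughly normal errors). The `inflation` multiplier is a hypothetical knob of my own, not an established method, and choosing its value would need its own justification.

```python
import math


def naive_interval(history, horizon, z=1.96, inflation=1.0):
    """Prediction intervals for a naive forecast, optionally widened.

    sigma is the root mean square of the one-step naive errors, which are
    simply the first differences of the series. `inflation` is a hypothetical
    multiplier (>= 1) that stretches the interval to allow for rare shocks.
    """
    if len(history) < 2:
        raise ValueError("need at least two observations")
    diffs = [b - a for a, b in zip(history, history[1:])]
    sigma = math.sqrt(sum(d ** 2 for d in diffs) / len(diffs))
    last = history[-1]
    intervals = []
    for h in range(1, horizon + 1):
        # Uncertainty grows with the square root of the forecast horizon.
        half_width = z * sigma * inflation * math.sqrt(h)
        intervals.append((last - half_width, last + half_width))
    return intervals


# Made-up illustrative numbers, NOT the BLS series used in this analysis.
history = [4.4, 4.3, 4.1, 4.0, 3.8, 3.6]
standard = naive_interval(history, horizon=3)
widened = naive_interval(history, horizon=3, inflation=2.0)
for h, (s, w) in enumerate(zip(standard, widened), start=1):
    print(f"h={h}: standard=({s[0]:.2f}, {s[1]:.2f})  widened=({w[0]:.2f}, {w[1]:.2f})")
```

The point forecast is unchanged. Only the stated uncertainty grows, which is a blunt but honest way to admit that calm recent history understates the risk of a shock.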

Images Source: New York Public Library at https://unsplash.com/photos/Q0_u7YwXqh0