S&P 500 Stock Price Prediction Using Technical, Fundamental and Text Data

Shan Zhong; David Hitchcock

doi:10.19139/soic-2310-5070-1362

Shan Zhong University of South Carolina https://orcid.org/0000-0003-1915-7350
David Hitchcock University of South Carolina


Keywords: Stacking, LSTM, Random Forest, Text Sentiment

Abstract

We summarized both common and novel predictive models for stock price prediction and combined them with technical indicators, fundamental characteristics and text-based sentiment data to predict the prices of S&P 500 stocks. By combining machine learning models such as Random Forest and LSTM into stacked ensemble models, we achieved 66.18% accuracy in directional prediction of the S&P 500 index and 62.09% accuracy in directional prediction of individual stocks. The data contain weekly historical prices, financial reports, and text from news items associated with 518 common stocks issued by current and former S&P 500 large-cap companies, from January 1, 2000 to December 31, 2019. The study's innovations include using deep language models to classify financial news items and infer their sentiment; fusing models built on different combinations of variables and stocks to make joint predictions; and overcoming the shortage of time-series data for machine learning models by pooling data across different stocks.
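The ensemble and directional-accuracy ideas in the abstract can be illustrated with a minimal stacking sketch. It assumes that two base models (standing in for the Random Forest and the LSTM) have already produced out-of-sample probabilities that a stock closes higher the following week. The probabilities in the usage block are made-up numbers. The logistic-regression meta-learner fitted by gradient descent is an assumption chosen for illustration, not necessarily the combiner used in the paper. Directional accuracy is computed as the share of weeks whose predicted up/down direction matches the realized one.

```python
import math
from typing import List, Sequence, Tuple


def direction_labels(prices: Sequence[float]) -> List[int]:
    """Label each week 1 if the next week's close is higher, else 0."""
    return [1 if nxt > cur else 0 for cur, nxt in zip(prices, prices[1:])]


def directional_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of periods whose predicted direction matches the realized one."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_meta_learner(
    base_probs: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit a logistic-regression meta-learner (an assumed choice) by batch
    gradient descent.

    Each row of base_probs holds the base models' out-of-sample
    probabilities of an up move; y holds the realized directions.
    """
    n, k = len(y), len(base_probs[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, target in zip(base_probs, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - target
            for j in range(k):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_meta(
    w: Sequence[float],
    b: float,
    base_probs: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> List[int]:
    """Combine base-model probabilities into a single up (1) / down (0) call."""
    return [
        1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= threshold else 0
        for x in base_probs
    ]


if __name__ == "__main__":
    # Hypothetical held-out probabilities:
    # column 0 stands in for "Random Forest", column 1 for "LSTM".
    probs = [[0.9, 0.5], [0.8, 0.4], [0.2, 0.6], [0.1, 0.5], [0.7, 0.5], [0.3, 0.5]]
    realized = [1, 1, 0, 0, 1, 0]
    w, b = fit_meta_learner(probs, realized)
    # Evaluated in-sample here only for brevity.
    preds = predict_meta(w, b, probs)
    print("weights:", w, "bias:", b)
    print("directional accuracy:", directional_accuracy(realized, preds))
```

In a real stacking setup, the meta-learner should only see base-model predictions made on data the base models were not trained on. For weekly time series, that means a later validation window. Otherwise the meta-learner learns to trust base models that have overfitted.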
