The prediction of stock's price using Linear Regression for Machine Learning (1)

28 Nov 2021

Outline

The purpose of this post is to predict stock prices using ML with financial statements.

I have already collected the financial statements from Yahoo Finance. That method is explained at this link; check it if you want to know how to get the data. The full idea has also already been explained on Kaggle.

The overall method is carried out in the following steps:

Data Preprocessing
Correlation for features
Modeling
Conclusion

Data Preprocessing

It is important to analyze the data before applying ML. I use only the most recent values and only numeric data. For example, a suffix such as B (billion) is converted to 10^9, and so on (Link). Then I load the preprocessed data.
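The suffix conversion mentioned above can be sketched roughly as follows. This is only an illustration, not the code from the linked post: the helper name `parse_abbreviated_number` and the set of suffixes (K, M, B, T) are assumptions.

```python
# Hypothetical helper: turn strings such as "2.5B" or "320M" into plain floats.
# The suffix table is an assumption; extend it if the data uses other suffixes.
SUFFIX_MULTIPLIERS = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}


def parse_abbreviated_number(text):
    """Return the numeric value of an abbreviated string, or None if it cannot be parsed."""
    cleaned = text.strip().replace(",", "")
    if not cleaned:
        return None
    multiplier = 1.0
    suffix = cleaned[-1].upper()
    if suffix in SUFFIX_MULTIPLIERS:
        multiplier = SUFFIX_MULTIPLIERS[suffix]
        cleaned = cleaned[:-1]
    try:
        return float(cleaned) * multiplier
    except ValueError:
        # Non-numeric entries such as "N/A" are reported as missing.
        return None


values = [parse_abbreviated_number(s) for s in ["2.5B", "320M", "1,200", "N/A"]]
```

Entries that cannot be parsed come back as None, so they can be treated as missing values later.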

Previous Data Preprocessing
Additional Data Preprocessing

Previous Data Preprocessing

Load the preprocessed financial statements of stocks

import pandas as pd

# url and index_name must be defined beforehand (base location of the data and the index to load)
df_stats = pd.read_json(url+'/data_preprocessing/{0}_stats_element.json'.format(index_name))
df_addstats = pd.read_json(url+'/data_preprocessing/{0}_addstats_element.json'.format(index_name))
df_balsheets = pd.read_json(url+'/data_preprocessing/{0}_balsheets_element.json'.format(index_name))
df_income = pd.read_json(url+'/data_preprocessing/{0}_income_element.json'.format(index_name))
df_flow = pd.read_json(url+'/data_preprocessing/{0}_flow_element.json'.format(index_name))

Merge dataframe

df = pd.concat([df_stats, df_addstats, df_balsheets, df_income, df_flow], axis=1)
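To see what `pd.concat(..., axis=1)` does, here is a small sketch with made-up data. It assumes each dataframe is indexed by ticker; the tickers and the `totalRevenue` column are hypothetical. Rows are aligned on the index, and a ticker missing from one frame gets NaN in that frame's columns.

```python
import pandas as pd

# Two toy frames indexed by (hypothetical) ticker symbols.
stats = pd.DataFrame({"marketCap": [2.5e12, 1.8e12]}, index=["AAA", "BBB"])
income = pd.DataFrame({"totalRevenue": [3.6e11, 1.6e11]}, index=["BBB", "CCC"])

# axis=1 places the columns side by side and aligns rows on the index.
# The default join is "outer", so the result covers every ticker in either frame.
merged = pd.concat([stats, income], axis=1)

# Tickers present in only one frame have NaN in the other frame's columns.
print(merged)
```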

Check numeric datasets

from pandas.api.types import is_numeric_dtype
num_cols = [is_numeric_dtype(dtype) for dtype in df.dtypes]
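The list `num_cols` is a boolean mask with one entry per column. A minimal sketch of how such a mask could be used to keep only the numeric columns is shown below; the toy dataframe and the `sector` and `totalRevenue` columns are assumptions for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Toy frame with one non-numeric column (column names other than marketCap are made up).
df_example = pd.DataFrame({
    "marketCap": [2.5e12, 1.8e12],
    "sector": ["Tech", "Energy"],
    "totalRevenue": [3.6e11, 1.6e11],
})

# Same check as above: True for numeric columns, False otherwise.
num_cols = [is_numeric_dtype(dtype) for dtype in df_example.dtypes]

# Use the boolean mask to select the numeric columns.
numeric_df = df_example.loc[:, num_cols]

# pandas also offers select_dtypes, which gives the same result here.
numeric_df_alt = df_example.select_dtypes(include="number")
```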

Split data into train and test sets for correlation

from sklearn.model_selection import train_test_split
train_df_corr, test_df_corr = train_test_split(df, test_size=0.2)
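Without a seed, `train_test_split` shuffles differently on every run, so the correlation values can change slightly between runs. As a sketch, passing `random_state` makes the split reproducible; the toy dataframe and the seed value 42 are arbitrary choices for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame with 10 rows standing in for the merged financial data.
df_example = pd.DataFrame({"marketCap": range(10), "totalRevenue": range(10, 20)})

# random_state fixes the shuffle, so repeated calls return the same split.
train_a, test_a = train_test_split(df_example, test_size=0.2, random_state=42)
train_b, test_b = train_test_split(df_example, test_size=0.2, random_state=42)

# test_size=0.2 sends 2 of the 10 rows to the test set.
print(len(train_a), len(test_a))
```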

Correlation for features and Heatmap

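import matplotlib.pyplot as plt
import seaborn as sns

# Correlation matrix of the training data
corrmat = train_df_corr.corr()
# Keep every feature whose absolute correlation with marketCap is above 0
top_corr_features = corrmat.index[abs(corrmat['marketCap'])>0]
plt.figure(figsize=(13,10))
plt_corr = sns.heatmap(train_df_corr[top_corr_features].corr(), annot=True)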

We can choose which features to consider by changing the threshold on the degree of correlation.

Here the threshold is 0, so no minimum degree of correlation is taken into account.
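As a rough sketch of how the threshold changes the selection, the example below uses a small made-up dataframe. The `totalRevenue` and `noise` columns and the threshold value 0.5 are assumptions, not values from this analysis.

```python
import pandas as pd

# Toy training data: totalRevenue is nearly linear in marketCap, noise is only weakly related.
train_example = pd.DataFrame({
    "marketCap":    [1.0, 2.0, 3.0, 4.0, 5.0],
    "totalRevenue": [2.0, 4.1, 5.9, 8.2, 10.0],
    "noise":        [3.0, 1.0, 4.0, 1.0, 5.0],
})

corrmat = train_example.corr()

# Threshold 0 keeps every feature with a non-zero correlation to marketCap.
all_features = corrmat.index[corrmat["marketCap"].abs() > 0]

# A higher (hypothetical) threshold keeps only the strongly correlated features.
threshold = 0.5
selected = corrmat.index[corrmat["marketCap"].abs() > threshold]

print(list(all_features))
print(list(selected))
```

Raising the threshold shrinks the heatmap to the features that are more strongly related to `marketCap`.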

Heat map

Viewed naively, many of these values carry little meaning, because some tickers contain insufficient information. Therefore, additional preprocessing is needed to handle the insufficient information.
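This post does not specify the additional preprocessing at this point. One common option, shown here only as a sketch under assumptions, is to drop tickers that have too few filled-in values. The tickers, column names, and the `min_filled` cutoff below are all hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data indexed by hypothetical tickers; NaN marks missing information.
example = pd.DataFrame(
    {
        "marketCap": [2.5e12, 1.8e12, np.nan, 4.0e11],
        "totalRevenue": [3.6e11, np.nan, np.nan, 8.0e10],
        "netIncome": [9.0e10, 6.0e10, np.nan, np.nan],
    },
    index=["AAA", "BBB", "CCC", "DDD"],
)

# Share of missing values per ticker, useful for inspecting the data first.
missing_share = example.isna().mean(axis=1)

# Keep only tickers with at least min_filled non-missing values (hypothetical cutoff).
min_filled = 2
filtered = example.dropna(thresh=min_filled)

print(missing_share)
print(filtered)
```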

HAN's BLOG


Related posts

The prediction of stock's price using Linear Regression for Machine Learning (2) 04 Dec 2021

Preprocessing with financial statements 10 Jul 2021

How to get stock's financial statements? 04 Jul 2021
