Preprocessing with financial statements
10 Jul 2021
In a previous post, I already collected the financial statements.
For a person reading them, the raw statements are enough. However, I want to analyze them with machine learning, specifically regression.
So I need to extract the values in the statements and convert them to numeric data.
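To give a rough idea of what converting to numeric data can involve: values in statements often arrive as text such as `2.41T`, `31.5%`, or `1,234`. The following is only a minimal sketch and not the class used later in this post. The helper name `to_number` and the exact input formats are assumptions for illustration.

```python
import math

# Suffix multipliers (assumed input formats: K, M, B, T)
_SUFFIX = {'K': 1e3, 'M': 1e6, 'B': 1e9, 'T': 1e12}


def to_number(text):
    """Convert a statement string to float; return NaN if it cannot be parsed."""
    if text is None:
        return math.nan
    s = str(text).strip().replace(',', '')
    if s in ('', 'N/A', '-'):
        return math.nan
    try:
        if s.endswith('%'):
            # Percentages become fractions, e.g. '50%' -> 0.5
            return float(s[:-1]) / 100
        if s[-1].upper() in _SUFFIX:
            # Scale by the magnitude suffix, e.g. '2.5B' -> 2.5e9
            return float(s[:-1]) * _SUFFIX[s[-1].upper()]
        return float(s)
    except ValueError:
        return math.nan
```

Returning NaN instead of raising keeps missing values visible, so they can be dropped or filled later before fitting a regression.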
If you want to understand what the items in financial statements mean, you should look up their definitions on Google. Finance is not my major, and others have already explained these terms more clearly than I can.
I use a config file for root_url, which I set myself, and get the ticker list of an index through the yahoo_fin API.
I also use a class that I wrote myself. It handles the preprocessing and several strategies, and I am currently updating it to improve performance.
- yahoo_fin API
pip install yahoo_fin
You have to make directory(
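The script below reads `root_dir` from `config/config.json` and writes its results into `data_preprocessing/` under that root (it also defines a `data_origin/` path). A minimal sketch for preparing that layout could look like this. The helper name `prepare_dirs` is hypothetical, and the actual root directory is whatever you put in your own config.

```python
import json
from pathlib import Path


def prepare_dirs(config_path='config/config.json'):
    """Read root_dir from the config and create the data folders under it."""
    with open(config_path, 'r') as f:
        root = Path(json.load(f)['root_dir'])
    created = []
    for name in ('data_origin', 'data_preprocessing'):
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(folder)
    return created
```

With a config file such as `{"root_dir": "some/path/"}`, calling `prepare_dirs()` once before the main script should be enough.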
Code to preprocess a stock’s financial statements
import json

import pandas as pd
import yahoo_fin.stock_info as yfs

from class_Strategy import LongTermStrategy  # my own strategy class


def save_and_show(df, path, label, SHOW, SAVE):
    # Write df to <path>.json and <path>.csv and/or print it
    if SAVE:
        print('\n----------------------------- SAVE DATA {} ---------------------------------\n'.format(label))
        df.to_json(path + '.json')
        df.to_csv(path + '.csv')
    if SHOW:
        print(df)


def main(url='', index_list=None, index_name='dow', EACH=True, ALL=True, SHOW=True, SAVE=True):
    if index_list is None:
        index_list = ['AAPL']

    ## Configuration directory url
    url_data = url + 'data_origin/'  # raw data (not used in this function)
    url_pre = url + 'data_preprocessing/'

    # Select Long term strategy (index_name replaces the former global `filename`)
    strategy = LongTermStrategy(url, index_name)

    ###################
    ## Preprocessing ##
    ###################
    if EACH:
        print('Preprocessing each Financial Statements')

        # Factor name -> getter, grouped by the statement it comes from
        groups = {
            'Basic stats': {
                'PER': strategy.get_PER, 'PSR': strategy.get_PSR,
                'PBR': strategy.get_PBR, 'PEG': strategy.get_PEG,
                'forPER': strategy.get_FORPER, 'Cap': strategy.get_CAP,
            },
            'Additional stats': {
                'Beta': strategy.get_Beta, 'DivRate': strategy.get_DivRate,
                'ROE': strategy.get_ROE, 'ROA': strategy.get_ROA,
                'PM': strategy.get_PM, 'Cash': strategy.get_Cash,
                'Debt': strategy.get_Debt,
            },
            'Balance sheets': {'TA': strategy.get_TA},
            'Income sheets': {'TR': strategy.get_TR},
            'Cash flow': {'DIV': strategy.get_DIV, 'ISS': strategy.get_ISS},
        }

        results = {}  # group name -> {factor name -> DataFrame}
        for group, getters in groups.items():
            print('\nEach factor preprocessing in {} ({})'.format(group, ' '.join(getters)))
            results[group] = {}
            for fac, getter in getters.items():
                df = getter()
                save_and_show(df, url_pre + '{0}_{1}'.format(index_name, fac), fac, SHOW, SAVE)
                results[group][fac] = df

    if ALL:
        # Statement name -> function returning every element of that statement
        elements = {
            'stats': lambda: strategy.get_stats(preprocessing=True),
            'addstats': lambda: strategy.get_addstats(preprocessing=True),
            'balsheets': lambda: strategy.get_balsheets_element(index_list),
            'income': lambda: strategy.get_income_element(index_list),
            'flow': lambda: strategy.get_flow_element(index_list),
        }
        print('\nPreprocessing element in ({})'.format(' '.join(elements)))
        for stat, getter in elements.items():
            df = getter()
            save_and_show(df, url_pre + '{0}_{1}_element'.format(index_name, stat), stat, SHOW, SAVE)


if __name__ == '__main__':
    with open('config/config.json', 'r') as f:
        config = json.load(f)
    root_url = config['root_dir']

    # Defaults used when the input does not match any choice below
    dow_list = yfs.tickers_dow()
    filename = ''

    s = input("Choice of stock's list (dow, sp500, nasdaq, other, all, selected): ")
    if s == 'dow':
        filename = 'dow'
        dow_list = yfs.tickers_dow()
    elif s == 'sp500':
        filename = 'sp500'
        dow_list = yfs.tickers_sp500()
    elif s == 'nasdaq':
        filename = 'nasdaq'
        dow_list = yfs.tickers_nasdaq()
    elif s == 'other':
        filename = 'other'
        dow_list = yfs.tickers_other()
    elif s == 'all':
        filename = 'all'
        dow_list = yfs.tickers_nasdaq() + yfs.tickers_other()
    elif s == 'selected':
        filename = 'selected'
        # Tickers chosen beforehand and stored in a JSON file
        temp_pd = pd.read_json('data_ForTrading/selected_ticker.json')
        dow_list = temp_pd['Ticker'].values.tolist()

    main(url=root_url, index_list=dow_list, index_name=filename,
         EACH=True, ALL=True, SAVE=True, SHOW=False)
You need to decide which factors to compute, such as PER, PBR, and so on.
You can also choose the options you want through the parameters of the main function (EACH, ALL, SHOW, SAVE).
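For readers who have not met factors like PER and PBR before, here is a small worked example using their textbook definitions. It is only a sketch: the function names and all numbers are made up for illustration, and the class used above may obtain these values in a different way.

```python
def per(price, eps):
    """Price-to-earnings ratio: share price / earnings per share."""
    return price / eps


def pbr(price, book_value_per_share):
    """Price-to-book ratio: share price / book value per share."""
    return price / book_value_per_share


def psr(market_cap, revenue):
    """Price-to-sales ratio: market capitalization / total revenue."""
    return market_cap / revenue


# Made-up numbers, for illustration only
price = 100.0
print(per(price, eps=5.0))                     # 20.0
print(pbr(price, book_value_per_share=25.0))   # 4.0
print(psr(market_cap=2000.0, revenue=500.0))   # 4.0
```

A lower PER or PBR means the price is lower relative to earnings or book value, which is why such ratios are common inputs for long-term strategies.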
Result
This is an example for the Dow index. The dataframes are not printed because the SHOW parameter is set to False.
Choice of stock's list (dow, sp500, nasdaq, other, all, selected): dow
Preprocessing each Financial Statements

Each factor preprocessing in Basic stats (PER PSR PBR PEG forPER Cap)
----------------------------- SAVE DATA PER ---------------------------------
----------------------------- SAVE DATA PSR ---------------------------------
----------------------------- SAVE DATA PBR ---------------------------------
----------------------------- SAVE DATA PEG ---------------------------------
----------------------------- SAVE DATA forPER ---------------------------------
----------------------------- SAVE DATA Cap ---------------------------------

Each factor preprocessing in Additional stats (Beta DivRate ROE ROA PM Cash Debt)
----------------------------- SAVE DATA Beta ---------------------------------
----------------------------- SAVE DATA DivRate ---------------------------------
----------------------------- SAVE DATA ROE ---------------------------------
----------------------------- SAVE DATA ROA ---------------------------------
----------------------------- SAVE DATA PM ---------------------------------
----------------------------- SAVE DATA Cash ---------------------------------
----------------------------- SAVE DATA Debt ---------------------------------

Each factor preprocessing in Balance sheets (TA)
----------------------------- SAVE DATA TA ---------------------------------

Each factor preprocessing in Income sheets (TR)
----------------------------- SAVE DATA TR ---------------------------------

Each factor preprocessing in Cash flow (DIV ISS)
----------------------------- SAVE DATA DIV ---------------------------------
----------------------------- SAVE DATA ISS ---------------------------------

Preprocessing element in (stats addstats balsheets income flow)
----------------------------- SAVE DATA stats ---------------------------------
----------------------------- SAVE DATA addstats ---------------------------------
For balance sheets
100%|████████████████████████████████████████████████████████████████████| 30/30 [00:01<00:00, 28.21it/s]
----------------------------- SAVE DATA balsheets ---------------------------------
For income statements
100%|████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 45.14it/s]
----------------------------- SAVE DATA income ---------------------------------
For cash flow
100%|████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 47.32it/s]
----------------------------- SAVE DATA flow ---------------------------------
Conclusion
Now we know how to preprocess the financial statements of the stocks in an index.
If you want to use this code, I’m sorry, but you will need to change the code a little and create the data directories yourself.
I am grateful to the many blogs I found on Google and referred to. Thanks a lot.
If you liked this post, please check out my GitHub and give it a star :)