Getting historical stock quotes and dividend Info using python

The previous post describes getting stock information using python and Yahoo Finance API. This post continues to add more information using the YF API. The additional information focus on historical price trend and dividend information. The dividend information (payout consistency, date etc) are particular useful as they are not easily available for scraping.

The same concept applies here in getting the hist price and dividend information as in the previous post. First is the construction of the respective urls, then use python PATTERN module to download the .csv and finally use Pandas to combine all the information.

The url for the hist price and dividend information are very similar. For the url formation of the hist price, it is as follows:

http://ichart.yahoo.com/table.csv?s=S58.SI&c=2009&a=9&b=23&f=2014&d=9&e=22&g=d&ignore=.csv

The blue part is the stock symbol (only one symbol can be run at a time), the pink and green portion represent the start and end date respectively. The brown portion is the interval in d,m, y. By changing the interval g = v, the dividend information as in the dividend payout at the particular date is given. The url str is as below.

http://ichart.yahoo.com/table.csv?s=S58.SI&c=2009&a=9&b=23&f=2014&d=9&e=22&g=v&ignore=.csv

For the script, the interval is easily set by using the following part of the code. The formation of url will straight away append the hist price url and dividend url in a single function.

    def set_interval_to_retrieve(self, days):
        """ Set the interval (num of days) to retrieve.
            Args:
                days (int): Number of days from current date to retrieve.
        """
        self.date_interval = days
    def calculate_start_and_end_date(self):
        """ Return the start and end (default today) based on the interval range in tuple.
            Returns:
                start_date_tuple : tuple in yyyy mm dd of the past date
                end_date_tuple : tupe in yyyy mm dd of current date today
        """
        ## today date or end date
        end_date_tuple = datetime.date.today().timetuple()[0:3] ## yyyy, mm, dd
        start_date_tuple = (datetime.date.today() - datetime.timedelta(self.date_interval)).timetuple()[0:3]
        return start_date_tuple, end_date_tuple

    def form_hist_quotes_date_interval_portion_url(self):
        """ Form the date interval portion of the url
            Set to self.hist_quotes_date_interval_portion_url
            Note: add the number of the month minus 1.
        """
        start_date_tuple, end_date_tuple = self.calculate_start_and_end_date()

        from_date_url_str = '&c=%s&a=%s&b=%s' %(start_date_tuple[0],start_date_tuple[1]-1, start_date_tuple[2])
        end_date_url_str = '&f=%s&d=%s&e=%s' %(end_date_tuple[0],end_date_tuple[1]-1, end_date_tuple[2])
        interval_str = '&g=d'
        dividend_str = '&g=v'

        self.hist_quotes_date_interval_portion_url = from_date_url_str + end_date_url_str + interval_str
        self.hist_quotes_date_dividend_portion_url = from_date_url_str + end_date_url_str + dividend_str

For the hist stock data part, the current script only retrieve the past 3 days behaviour of a particular stock. It will show whether a stock is continuously rising or falling for the past 3 days. It simply compares the 3 day prices to see if the prices get lower or higher with each coming day. This script is limited in the aspect that it cater for only 3 days running. There is room to improve upon this aspect.

For the dividend part, it is more interesting. It will retrieve information on whether the stock have been continuing giving out dividends every year for the past four years. It will also display the number of times each year the dividends are given out. In addition, it also provides the quarter (calender year) in which the dividends are given out based on past year.

Below are the parts of the code that capture the dividends information. It make uses of the pandas Data frame. First, several columns are added for easy processing. The dates are split to year and month columns for easier date processing. In addition, the dividend months are identified for each payout and classified to specific quarters.

    def insert_yr_mth_col_to_div_df(self):
        """ Insert the year and month of dividend to div df.
            Based on the self.all_stock_div_hist_df["Date"] to get the year and mth str.
            Set back to self.all_stock_div_hist_df
        """
        self.all_stock_div_hist_df['Div_year'] = self.all_stock_div_hist_df['Date'].map(lambda x: int(x[:4]))
        self.all_stock_div_hist_df['Div_mth'] = self.all_stock_div_hist_df['Date'].map(lambda x: int(x[6:7]))

    def insert_dividend_quarter(self):
        """ Insert the dividend quarter. Based on Calender year.
        """

        #combined all the div mth??
        self.all_stock_div_hist_df['Div_1stQuarter'] = self.all_stock_div_hist_df['Div_mth'].isin([1,2,3,])
        self.all_stock_div_hist_df['Div_2ntQuarter'] = self.all_stock_div_hist_df['Div_mth'].isin([4,5,6])
        self.all_stock_div_hist_df['Div_3rdQuarter'] = self.all_stock_div_hist_df['Div_mth'].isin([7,8,9])
        self.all_stock_div_hist_df['Div_4thQuarter'] = self.all_stock_div_hist_df['Div_mth'].isin([10,11,12])

The next part focus on deciding whether the stock has been consistently giving out dividend for the past four years (Need to adjust the date if wish to set to longer periods.). The script will first filter the information so that the data only contain information for past 4 years. Using Pandas Groupby function, it will group the raw data by stock by year. It will count the number of year exist for the stock. If the stock has been giving out dividends yearly, it will count 4 which is one for each year. Using the aggregation “mean”, it will also calculate on average number of times the payout per year.

    def get_num_div_payout_per_year(self):
        """ Get the number of div payout per year, group by symbol and year.
            Exclude the curr year information.
        """
        curr_yr, curr_mth = self.get_cur_year_mth()

        ## exclude the current year as dividend might not have pay out yet and keep within 4 years period
        target_div_hist_df = self.all_stock_div_hist_df[~(self.all_stock_div_hist_df['Div_year']== curr_yr)]
        target_div_hist_df = target_div_hist_df[target_div_hist_df['Div_year']>= curr_yr-4]

        ## get the div payout each year in terms of count
        div_cnt_df =  target_div_hist_df.groupby(['SYMBOL', 'Div_year']).agg("count").reset_index()
        div_payout_df = div_cnt_df.groupby('SYMBOL').agg('mean').reset_index()[['SYMBOL','Dividends']].rename(columns = {'Dividends':'NumDividendperYear'})

        ## get the number of years div pay for 4 year period --4 means every year.
        div_cnt_yr_basis_df = div_cnt_df.groupby('SYMBOL').agg('count').reset_index()[['SYMBOL','Div_year']].rename(columns = {'Div_year':'NumYearPayin4Yr'})

        ## join the data frame
        self.all_stock_consolidated_div_df = pandas.merge(div_payout_df,div_cnt_yr_basis_df, on = 'SYMBOL')

The last part focus on the quarter in which dividend payout resides. It will first filter out data by last year only. Then, it will group the data by Symbol and iterate over the rows to see the four “Div_XQuarter” rows return true. If yes, it will return true for the Div Quarter Column.

    def get_dividend_payout_quarter_df(self):
        """ Get the dividend payout quarter for each stock.
            Based on curr year -1 as guage.
            Append to the self.all_stock_consolidated_div_df
        """
        curr_yr, curr_mth = self.get_cur_year_mth()
        target_div_hist_df = self.all_stock_div_hist_df[(self.all_stock_div_hist_df['Div_year']== curr_yr-1)]
        def check_availiable1(s):
            for n in s.values:
                if n == True:
                    return True
            return False
        target_div_hist_df = target_div_hist_df.groupby('SYMBOL').agg(check_availiable1).reset_index()[['SYMBOL','Div_1stQuarter','Div_2ntQuarter','Div_3rdQuarter','Div_4thQuarter' ]]
        self.all_stock_consolidated_div_df = pandas.merge(self.all_stock_consolidated_div_df,target_div_hist_df, on = 'SYMBOL', how = 'left')

A sample of the output is as below. Some basic information is as followed. For the Stock OV8, it only pays out 2 years in last 4 years and the payout is twice (2nd and 3rd Quarter). The price is on the rise for the past 3 days. S58 is consistently paying out every year (NumYearPayin4yr =4) with payout twice every year. Price is pretty consistent over the last 3 days.

SYMBOL NumDividendperYear NumYearPayin4Yr Div_1stQuarter Div_2ntQuarter \
0 OV8.SI 2 2 False True
1 S58.SI 2 4 True False

Div_3rdQuarter Div_4thQuarter
0 True False
1 True False

SYMBOL Trend_3_days_drop Trend_3_days_rise
0 OV8.SI False True
1 S58.SI False False

The full script can be found at GitHub.