Importing | Utilities | Display | Styles | Time Series |
---|---|---|---|---|
pd_np_mpl_import | css_styling | pretty | get_stylers | timeseries_overview |
url_import | ser | header_text | df_style_string | featurize_dt_index |
yf_import | dfme | head_tail_vert | style_df | add_change_column |
import_all | sp, p, & d | head_tail_horz | fancy_style_df | add_lags |
get_complementary | see & multi | apply_style | boxplot_correlation | |
time conversion | missing_values | get_accuracy | ||
get_daily_error | ||||
Evan Marie online: | EvanMarie.com| EvanMarie@Proton.me | Linked In | GitHub | Hugging Face | Mastadon | Jovian.ai | TikTok | CodeWars | Discord ⇨ ✨ EvanMarie ✨#6114 | |
%%html
<style>
a:link {color: #1E3A4C !important; font-weight: 600 !important;}
a:visited {color: #1E3A4C !important; font-weight: 600 !important;}
a:hover {color: purple !important; font-weight: 600 !important;}
</style>
from helpers import *
import_all()
plt.style.use('dark_blue_greens.mplstyle')
# DEFINE PROJECT COLORS --------------------------------------#
very_dark = "#142733"
medium_dark = "#1E3A4C"
light_main = '#DBE5EB'
light_complementary = '#E1C7C2'
dark_font_color = 'black'
light_font_color = 'white'
# Assigning colors to project variables -----------------------#
pretty_label_text = dark_font_color
pretty_label_background = light_complementary
pretty_background = light_main
pretty_text = dark_font_color
bgcolor = light_main
text_color = dark_font_color
innerbackcolor = medium_dark
outerbackcolor = very_dark
fontcolor = light_font_color
pd_np_mpl_import() | [Return to Top] | |
---|---|---|
Imports Pandas, NumPy, Matplotlib.pylot, Reload, and Seaborn |
def pd_np_mpl_import():
global pd
global np
global plt
global reload
global sns
pd = __import__('pandas', globals(), locals())
np = __import__('numpy', globals(), locals())
matplotlib = __import__('matplotlib', globals(), locals())
plt = matplotlib.pyplot
sns = __import__('seaborn', globals(), locals())
importlib = __import__('importlib', globals(), locals())
reload = importlib.reload
url_import() | [Return to Top] | |
---|---|---|
Imports urlretrieve() |
def url_import():
global urlretrieve
urllib = __import__('urllib', globals(), locals())
urlretrieve = urllib.request.urlretrieve
yf_import() | [Return to Top] | |
---|---|---|
Imports the Yahoo Finance, yfinance, module |
def yf_import():
global yf
yf = __import__('yfinance', globals(), locals())
import_all() | [Return to Top] | |
---|---|---|
Imports whichever of the previously mentioned imports are contained within it so that most of my importing is simply importing the helpers file and running this function |
def import_all():
pd_np_mpl_import()
url_import()
css_styling() | [Return to Top] | |
---|---|---|
Imports the custom_styles.css file to be used with the notebook |
def css_styling():
from IPython.core.display import HTML
styles = open("custom_style.css", "r").read()
return HTML(styles)
ser() | [Return to Top] | |
---|---|---|
Converts the data passed to pd.Series(), just a shortcut version for pd.Series() |
def ser(x):
return pd.Series(x)
my_list = [1, 2, 3, 4, 5]
my_series = ser(my_list)
my_series
0 1 1 2 2 3 3 4 4 5 dtype: int64
dfme() | [Return to Top] | |
---|---|---|
Converts the data passed to pd.DataFrame(), just a shortcut version for pd.DataFrame() |
def dfme(x):
return pd.DataFrame(x)
my_df = dfme(my_series)
my_df
0 | |
---|---|
0 | 1 |
1 | 2 |
2 | 3 |
3 | 4 |
4 | 5 |
sp(), p(), & d() | [Return to Top] | |
---|---|---|
Three more shortcut functions: - sp() - makes an empty row of space - p() - shortcut for print() - d() - shortcut for IPython display() |
def sp(): print('');
def p(x): print(x); sp()
def d(x): display(x); sp()
sp()
p('I print things.')
d(my_df)
I print things.
0 | |
---|---|
0 | 1 |
1 | 2 |
2 | 3 |
3 | 4 |
4 | 5 |
get_complementary() | [Return to Top] | |
---|---|---|
Pass a hexcode for a color as a string and get the hexcode string for the complementary color. |
def get_complementary(color):
color = color[1:]
color = int(color, 16)
comp_color = 0xFFFFFF ^ color
comp_color = "#%06X" % comp_color
return comp_color
get_complementary('#FF0000') # hexcode for red
'#00FFFF'
header_text('Whaddup, World?!',
bgcolor = '#FF0000',
text_color = '#00FFFF')
Whaddup, World?! |
Timestamp Conversion Functions | [Return to Top] | |
---|---|---|
These functions are all utilized within the display functions so that when working with time series data whose timestamps include intraday values, a user can specify whether or not to show just the day, intraday = False, which is the default, or to show the entire timestamp data, intraday = True |
def time_stamp_converter(df):
def find_datetime(df):
datetime_cols = []
for col in df.columns:
if df[col].dtype.name.startswith('datetime64'):
datetime_cols.append(col)
return datetime_cols
def find_timestamp(df):
timestamp_cols = []
for col in df.columns:
if isinstance(df[col][0], pd._libs.tslibs.timestamps.Timestamp):
timestamp_cols.append(col)
return timestamp_cols
datetime_cols = find_datetime(df)
for col in datetime_cols:
df[col] = df[col].dt.strftime('%Y-%m-%d')
return df
timestamp_cols = find_timestamp(df)
for col in timestamp_cols:
df[col] = df[col].dt.strftime('%Y-%m-%d')
return df
# ............................................................................#
def date_only(data, intraday=False):
if intraday == False:
if data.index.dtype == 'datetime64[ns]':
data.index = data.index.strftime('%Y-%m-%d')
return data
elif isinstance(data.index[0], pd._libs.tslibs.timestamps.Timestamp):
data.index = data.index.strftime('%Y-%m-%d')
return data
else:
return data
else:
return data
# ............................................................................#
def force_df(data, intraday=False):
if isinstance(data, pd.core.series.Series):
return date_only(data, intraday=intraday).to_frame()
elif isinstance(data, pd.core.frame.DataFrame):
return date_only(data, intraday=intraday)
else:
try:
return pd.Series(data).to_frame()
except:
return header_text("The data cannot be displayed.", fontsize='18px')
pretty() | [Return to Top] | |
---|---|---|
Displays non-series and non-df data in a clean and attractive way. Pass, for example, array data followed by a label, and it will print the data with a header label as passed. |
def pretty(data, label=None, fontsize='15px',
bgcolor=pretty_background,
textcolor=pretty_text, width=None
):
from IPython.display import HTML
import numpy as np
if isinstance(data, np.ndarray):
data = list(data)
if label:
output_df = pd.DataFrame([label, data])
else:
output_df = pd.DataFrame([[data]])
if label:
df_styler = (
[{'selector': '.row0',
'props': [('background-color', pretty_label_background),
('color', pretty_label_text),
('font-size', fontsize),
('font-weight', 550),
('text-align', 'left'),
('padding', '3px 10px 3px 10px')]},
{'selector': '.row1',
'props': [('background-color', pretty_background),
('color', pretty_text),
('font-size', fontsize),
('font-weight', 'bold'),
('text-align', 'left'),
('padding', '3px 10px 3px 10px')]},
{'selector': 'tbody',
'props': [('border', '1px solid'),
('border-color', 'black')]},
{'selector': 'tr',
'props': [('border', '0.8px solid'),
('border-color', 'black')]}])
else:
df_styler = (
[{'selector': '.row0',
'props': [('background-color', pretty_background),
('color', pretty_text),
('font-size', '15px'),
('font-weight', 'bold'),
('text-align', 'left'),
('padding', '3px 2px 5px 5px')]},
{'selector': 'tbody',
'props': [('border', '1px solid'),
('border-color', 'black')]},
{'selector': 'tr',
'props': [('border', '0.8px solid'),
('border-color', 'black')]}])
display(output_df.style.hide(axis='index') \
.hide(axis='columns') \
.set_table_styles(df_styler))
sp()
pretty(my_list, 'This is my list:')
This is my list: |
[1, 2, 3, 4, 5] |
an_array = np.arange(4)
pretty(an_array, 'I am an array:')
I am an array: |
[0, 1, 2, 3] |
header_text() | [Return to Top] | |
---|---|---|
This function can be used to output any string data as a header for other output data to help organize a notebook's outputs. It is used within many other display functions to create the headers/titles for the data they are displaying. |
def header_text(text, width=None, bgcolor=bgcolor, text_color=text_color,
fontsize='15px'
):
from IPython.display import HTML
if not width:
font_height = int(fontsize[:-2])
width = font_height * 25
width = str(width) + "px"
else:
if type(width) != str:
width = str(width)
if width[-1] == "x":
width = width
else:
width = str(width) + "px"
df_styler = (
[{'selector': 'tr',
'props': [('background-color', bgcolor),
('color', text_color),
('font-size', fontsize),
('font-weight', '550'),
('text-align', 'left'),
('padding', '3px 30px 3px 50px')]},
{'selector': 'td',
'props': [('padding', '5px 10px 5px 10px'),
('width', width),
('text-align', 'center')]}])
out_df = pd.DataFrame([text]).style.hide(axis='index') \
.hide(axis='columns') \
.set_table_styles(df_styler)
return display(HTML('<center>' + out_df.to_html()))
header_text('I am a beautiful example of a header!',
fontsize = '20px',
bgcolor = 'deeppink',
text_color = 'white')
I am a beautiful example of a header! |
describe_em() | [Return to Top] | |
---|---|---|
Outputs the describe() data individually for all column names pass in the col_list. |
def describe_em(df, col_list, title = None, fontsize = '15px'):
df_list = []
if title:
header_text(title, fontsize = fontsize)
for column in col_list:
df_tuple = (df[column].describe(), 'df.' + column)
df_list.append(df_tuple)
multi(df_list)
poke_df = pd.read_csv('https://raw.githubusercontent.com/KeithGalli/pandas/master/pokemon_data.csv').head(100)
describe_em(poke_df, ['Attack', 'Defense', 'Speed'], 'Stats for Pokemon')
Stats for Pokemon |
Attack | |
---|---|
count | 100.00 |
mean | 71.69 |
std | 24.98 |
min | 20.00 |
25% | 52.00 |
50% | 70.00 |
75% | 86.25 |
max | 150.00 |
Defense | |
---|---|
count | 100.00 |
mean | 64.86 |
std | 28.98 |
min | 15.00 |
25% | 45.00 |
50% | 60.00 |
75% | 78.00 |
max | 180.00 |
Speed | |
---|---|
count | 100.00 |
mean | 68.63 |
std | 28.19 |
min | 15.00 |
25% | 45.00 |
50% | 70.00 |
75% | 90.00 |
max | 150.00 |
head_tail_vert() | [Return to Top] | |
---|---|---|
Displays the head and tail portions of a passed df according to the num passed and includes a header/title description for the data being displayed for easy notebook navigation and data comprehension. This function will display the head and tail vertically on top of one another. |
def head_tail_vert(df, num, title, bgcolor=bgcolor,
text_color=text_color, fontsize='18px',
intraday=False):
from IPython.core.display import HTML
if type(df) != pd.core.frame.DataFrame:
df = df.copy().to_frame()
if not intraday:
df = time_stamp_converter(df.copy())
if df.index.dtype.name.startswith('datetime64'):
df.index = df.index.strftime('%Y-%m-%d')
elif isinstance(df.index[0], pd._libs.tslibs.timestamps.Timestamp):
df.index = df.index.strftime('%Y-%m-%d')
head_data = "<center>" + df.head(num).to_html()
tail_data = "<center>" + df.tail(num).to_html()
print("")
header_text(f'{title}: head({num})', fontsize=fontsize,
bgcolor=bgcolor, text_color=text_color)
display(HTML(head_data))
print("")
header_text(f'{title}: tail({num})', fontsize=fontsize,
bgcolor=bgcolor, text_color=text_color)
display(HTML(tail_data))
print("")
head_tail_vert(poke_df, 5, 'Pokemon DF with head_tail_vert()')
Pokemon DF with head_tail_vert(): head(5) |
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
4 | 4 | Charmander | Fire | NaN | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
Pokemon DF with head_tail_vert(): tail(5) |
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
95 | 88 | Grimer | Poison | NaN | 80 | 80 | 50 | 40 | 50 | 25 | 1 | False |
96 | 89 | Muk | Poison | NaN | 105 | 105 | 75 | 65 | 100 | 50 | 1 | False |
97 | 90 | Shellder | Water | NaN | 30 | 65 | 100 | 45 | 25 | 40 | 1 | False |
98 | 91 | Cloyster | Water | Ice | 50 | 95 | 180 | 85 | 45 | 70 | 1 | False |
99 | 92 | Gastly | Ghost | Poison | 30 | 35 | 30 | 100 | 35 | 80 | 1 | False |
head_tail_horz() | [Return to Top] | |
---|---|---|
Displays the head and tail portions of a passed df according to the num passed and includes a header/title description for the data being displayed for easy notebook navigation and data comprehension. This function will display the head and tail horizontally beside one another. If the current window size is too small, it will stack them on top of one another and display identically to head_tail_vert(). |
def head_tail_horz(df, num, title, bgcolor=bgcolor,
text_color=text_color, precision=2,
intraday=False, title_fontsize='18px',
table_fontsize="12px"):
if type(df) != pd.core.frame.DataFrame:
df = pd.DataFrame(df.copy())
if not intraday:
df = time_stamp_converter(df.copy())
if df.index.dtype.name.startswith('datetime64'):
df.index = df.index.strftime('%Y-%m-%d')
elif isinstance(df.index[0], pd._libs.tslibs.timestamps.Timestamp):
df.index = df.index.strftime('%Y-%m-%d')
header_text(f'{title}', fontsize=title_fontsize,
bgcolor=bgcolor, text_color=text_color)
multi([(df.head(num),f"head({num})"),
(df.tail(num),f"tail({num})")],
fontsize=table_fontsize, precision=precision,
intraday=intraday)
head_tail_horz(poke_df[['Name', 'Type 1', 'Attack', 'Defense']],
5, 'Pokemon DF Selected Columns with head_tail_horz')
Pokemon DF Selected Columns with head_tail_horz |
Name | Type 1 | Attack | Defense | |
---|---|---|---|---|
0 | Bulbasaur | Grass | 49 | 49 |
1 | Ivysaur | Grass | 62 | 63 |
2 | Venusaur | Grass | 82 | 83 |
3 | VenusaurMega Venusaur | Grass | 100 | 123 |
4 | Charmander | Fire | 52 | 43 |
Name | Type 1 | Attack | Defense | |
---|---|---|---|---|
95 | Grimer | Poison | 80 | 50 |
96 | Muk | Poison | 105 | 75 |
97 | Shellder | Water | 65 | 100 |
98 | Cloyster | Water | 95 | 180 |
99 | Gastly | Ghost | 35 | 30 |
see() | [Return to Top] | |
---|---|---|
Used for displaying any dataframe or series data along with a descriptive header / title. |
def see(data, title=None, width="auto", fontsize='18px',
bgcolor=bgcolor, text_color=text_color, intraday=False):
if title != None:
header_text(f"{title}", fontsize=fontsize, width=width,
bgcolor=bgcolor, text_color=text_color)
if isinstance(data, pd.core.frame.DataFrame):
if not intraday:
data = time_stamp_converter(data.copy())
if data.index.dtype.name.startswith('datetime64'):
data.index = data.index.strftime('%Y-%m-%d')
elif isinstance(data.index[0], pd._libs.tslibs.timestamps.Timestamp):
data.index = data.index.strftime('%Y-%m-%d')
display(HTML("<center>" + data.to_html()));
sp()
elif isinstance(data, pd.core.series.Series):
if data.index.dtype.name.startswith('datetime64'):
data.index = data.index.strftime('%Y-%m-%d')
elif isinstance(data.index[0], pd._libs.tslibs.timestamps.Timestamp):
data.index = data.index.strftime('%Y-%m-%d')
display(HTML("<center>" + data.to_frame().to_html()));
sp()
else:
try:
display(HTML("<center>" + data.to_frame().to_html()));
sp()
except:
pretty(data, title);
sp()
see(poke_df.sample(5), '5 Random Samples from Pokemon DF')
5 Random Samples from Pokemon DF |
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
4 | 4 | Charmander | Fire | NaN | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
86 | 80 | Slowbro | Water | Psychic | 95 | 75 | 110 | 100 | 80 | 30 | 1 | False |
41 | 36 | Clefable | Fairy | NaN | 95 | 70 | 73 | 95 | 90 | 60 | 1 | False |
15 | 12 | Butterfree | Bug | Flying | 60 | 45 | 50 | 90 | 80 | 70 | 1 | False |
35 | 30 | Nidorina | Poison | NaN | 70 | 62 | 67 | 55 | 55 | 56 | 1 | False |
multi() | [Return to Top] | |
---|---|---|
This function takes a list of tuples of dataframes or series and their corresponding headers/titles, and displays them side by side, horizontally for easier data comparison. |
def multi(data_list, fontsize='15px', precision=2, intraday=False):
from IPython.display import display_html
caption_style = [{
'selector': 'caption',
'props': [
('background', bgcolor),
('border-radius', '3px'),
('padding', '5px'),
('color', text_color),
('font-size', fontsize),
('font-weight', 'bold')]}]
thousands = ",";
spaces = " "
table_styling = caption_style
stylers = []
for idx, pair in enumerate(data_list):
if len(pair) == 2:
table_attribute_string = "style='display:inline-block'"
elif pair[2] == 'center':
table_attribute_string = "style='display:inline-grid'"
styler = force_df(data_list[idx][0], intraday=intraday).style \
.set_caption(data_list[idx][1]) \
.set_table_attributes(table_attribute_string) \
.set_table_styles(table_styling).format(precision=precision,
thousands=thousands)
stylers.append(styler)
if len(stylers) == 1:
display_html('<center>' + stylers[0]._repr_html_(), raw=True); sp();
elif len(stylers) == 2:
display_html('<center>' + stylers[0]._repr_html_() + spaces + stylers[1]._repr_html_() + spaces, raw=True); sp();
elif len(stylers) == 3:
display_html('<center>' + stylers[0]._repr_html_() + spaces + stylers[1]._repr_html_() + spaces + stylers[
2]._repr_html_() + spaces, raw=True); sp();
elif len(stylers) == 4:
display_html('<center>' + stylers[0]._repr_html_() + spaces + stylers[1]._repr_html_() + spaces + stylers[
2]._repr_html_() + spaces + stylers[3]._repr_html_() + spaces, raw=True); sp();
multi([(poke_df.head(5)[['Name', 'Defense']], 'Pokemon DF 1'),
(my_df, 'My DF'),
(poke_df.head(5)[['Name', 'Attack']], 'Pokemon DF 2')])
Name | Defense | |
---|---|---|
0 | Bulbasaur | 49 |
1 | Ivysaur | 63 |
2 | Venusaur | 83 |
3 | VenusaurMega Venusaur | 123 |
4 | Charmander | 43 |
0 | |
---|---|
0 | 1 |
1 | 2 |
2 | 3 |
3 | 4 |
4 | 5 |
Name | Attack | |
---|---|---|
0 | Bulbasaur | 49 |
1 | Ivysaur | 62 |
2 | Venusaur | 82 |
3 | VenusaurMega Venusaur | 100 |
4 | Charmander | 52 |
missing_values() | [Return to Top] | |
---|---|---|
Calculates all the missing values in a dataframe and displays the column names and how many missing values each has. |
def missing_values(df, bgcolor=bgcolor, text_color=text_color, fontsize='18px'):
from IPython.display import HTML
pd.options.display.float_format = '{:,.0f}'.format
missing_log = []
for column in df.columns:
missing_values = df[column].isna().sum()
missing_log.append([column, missing_values])
missing = pd.DataFrame(missing_log, columns=['column name', 'missing'])
header_text(f'Columns and Missing Values', fontsize=fontsize,
bgcolor=bgcolor, text_color=text_color)
missing = "<center>" + missing.to_html()
display(HTML(missing))
missing_values(poke_df)
Columns and Missing Values |
column name | missing | |
---|---|---|
0 | # | 0 |
1 | Name | 0 |
2 | Type 1 | 0 |
3 | Type 2 | 49 |
4 | HP | 0 |
5 | Attack | 0 |
6 | Defense | 0 |
7 | Sp. Atk | 0 |
8 | Sp. Def | 0 |
9 | Speed | 0 |
10 | Generation | 0 |
11 | Legendary | 0 |
get_stylers() | [Return to Top] | |
---|---|---|
This function is used with style_df() to convert a style dictionary into a styler that will be used to display a dataframe. |
def get_stylers(styles_dictionary):
df_stylers = []
for style, styling in styles_dictionary.items():
styler = {'selector': '', 'props': []}
styler['selector'] = style
styler['props'] = styling
df_stylers.append(styler)
return df_stylers
style_input = {'tr': 'background-color: white; color: blue;',
'th': 'background-color: blue; color: white;'}
poke_df[['Name', 'HP', 'Attack', 'Defense']].sample(5)\
.style.set_table_styles(get_stylers(style_input))
Name | HP | Attack | Defense | |
---|---|---|---|---|
87 | SlowbroMega Slowbro | 95 | 75 | 180 |
9 | Squirtle | 44 | 48 | 65 |
37 | Nidoran (Male) | 46 | 57 | 40 |
6 | Charizard | 78 | 84 | 78 |
90 | Farfetch'd | 52 | 65 | 55 |
df_style_string() | [Return to Top] | |
---|---|---|
Pass a dataframe name as a string, and this returns the style string needed for styling a dataframe, to which the df_stylers can be passed. It eliminates the index and column axes for clean display. The output string can then be used in other code to style a dataframe. Note: the hiding of the index and columns can be configured after cutting and pasting the output string. |
def df_style_string(df_name):
return f"{df_name}.style.hide(axis = 'index').hide(axis = 'columns').set_table_styles(df_stylers)"
tiny_poke_df = poke_df[['Name', 'HP', 'Attack', 'Defense']].sample(5)
df_style_string('tiny_poke_df')
"tiny_poke_df.style.hide(axis = 'index').hide(axis = 'columns').set_table_styles(df_stylers)"
tiny_poke_df.style.hide(axis = 'index')\
.set_table_styles(get_stylers(style_input))
Name | HP | Attack | Defense |
---|---|---|---|
Magnemite | 25 | 35 | 70 |
Spearow | 40 | 60 | 30 |
Mankey | 40 | 80 | 35 |
Tentacruel | 80 | 70 | 65 |
Pidgeotto | 63 | 60 | 55 |
style_df() | [Return to Top] | |
---|---|---|
Takes a dataframe and a styles dictionary and returns the dataframe with the styles passed. - style.Styler.set_table_styles() documentation - Pandas table visualization documentation |
def style_df(df,
styles_dictionary,
hide_index=True,
hide_columns=True,
precision=2,
thousands=','
):
df_stylers = get_stylers(styles_dictionary)
formatting = {precision:precision,
thousands:thousands}
if hide_index == False & hide_columns == False:
return df.set_table_styles(df_stylers).format(precision = precision,
thousands = thousands)
elif hide_index & hide_columns:
return df.style.hide(axis='index') \
.hide(axis='columns') \
.set_table_styles(df_stylers).format(precision = precision,
thousands = thousands)
elif hide_index:
return df.style.hide(axis='index') \
.set_table_styles(df_stylers).format(precision = precision,
thousands = thousands)
else:
return df.style.hide(axis='columns') \
.set_table_styles(df_stylers).format(precision = precision,
thousands = thousands)
styles_dictionary = {'tr': 'color: white; background-color: red;',
'td': 'width: 125px;',
'tbody': 'border: 2px solid white'}
style_df(tiny_poke_df, styles_dictionary, hide_columns = False)
Name | HP | Attack | Defense |
---|---|---|---|
Magnemite | 25 | 35 | 70 |
Spearow | 40 | 60 | 30 |
Mankey | 40 | 80 | 35 |
Tentacruel | 80 | 70 | 65 |
Pidgeotto | 63 | 60 | 55 |
fancy_style_df() | [Return to Top] | |
---|---|---|
Utilizes the various Pandas df stylers and combines them into one function to produce the following options: (see the documentation for each below for more details on usage) - background_gradient - bars (a bar graph for each value) - highlight_max - highlight_min - highlight_null - highlight_between - highlight_quantile |
def fancy_style_df(df, background_gradient = False,
bars = False,
highlight_max = False,
highlight_min = False,
highlight_null = False,
highlight_between = False,
highlight_quantile = False, **kwargs):
if background_gradient:
return df.style.background_gradient(**kwargs)
elif bars:
return df.style.bar(**kwargs)
elif highlight_max:
return df.style.highlight_max(**kwargs)
elif highlight_min:
return df.style.highlight_min(**kwargs)
elif highlight_null:
return df.style.highlight_null(**kwargs)
elif highlight_between:
return df.style.highlight_between(**kwargs)
elif highlight_quantile:
return df.style.highlight_quantile(**kwargs)
else:
return df.style
# DF with no styling |
---|
pd.options.display.float_format = '{:,.3f}'.format
random_df = dfme(np.random.randn(5, 5))
random_df.style.format(precision = 3)
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
0 | -1.208 | -1.640 | 2.787 | 1.106 | 0.618 |
1 | -0.364 | -3.042 | 0.042 | -1.275 | -0.511 |
2 | 0.905 | -1.294 | -0.354 | -0.416 | 0.482 |
3 | -0.197 | 1.350 | -0.195 | -0.424 | -0.187 |
4 | -1.178 | 1.090 | 1.431 | 0.248 | -1.660 |
# DF with background_gradient |
---|
fancy_style_df(random_df,
background_gradient = True,
cmap = 'RdBu').format(precision = 3)
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
0 | -1.208 | -1.640 | 2.787 | 1.106 | 0.618 |
1 | -0.364 | -3.042 | 0.042 | -1.275 | -0.511 |
2 | 0.905 | -1.294 | -0.354 | -0.416 | 0.482 |
3 | -0.197 | 1.350 | -0.195 | -0.424 | -0.187 |
4 | -1.178 | 1.090 | 1.431 | 0.248 | -1.660 |
# DF with bars |
---|
fancy_style_df(random_df,
bars = True,
align=0,
vmin=-2,
vmax=2,
cmap="RdYlGn",
height=50,
width=60,
props="width:120px; border-right: 1px solid black;")\
.format(precision = 3)
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
0 | -1.208 | -1.640 | 2.787 | 1.106 | 0.618 |
1 | -0.364 | -3.042 | 0.042 | -1.275 | -0.511 |
2 | 0.905 | -1.294 | -0.354 | -0.416 | 0.482 |
3 | -0.197 | 1.350 | -0.195 | -0.424 | -0.187 |
4 | -1.178 | 1.090 | 1.431 | 0.248 | -1.660 |
# DF with highlight_max, axis = 1 |
---|
fancy_style_df(random_df,
highlight_max = True,
axis = 1,
props = 'font-weight: 600;\
color: deeppink;\
background-color: cyan').format(precision = 3)
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
0 | -1.208 | -1.640 | 2.787 | 1.106 | 0.618 |
1 | -0.364 | -3.042 | 0.042 | -1.275 | -0.511 |
2 | 0.905 | -1.294 | -0.354 | -0.416 | 0.482 |
3 | -0.197 | 1.350 | -0.195 | -0.424 | -0.187 |
4 | -1.178 | 1.090 | 1.431 | 0.248 | -1.660 |
# DF with highlight_max, axis = 0 |
---|
fancy_style_df(random_df,
highlight_max = True,
axis = 0,
props = 'font-weight: 600;\
color: deeppink;\
background-color: cyan').format(precision = 3)
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
0 | -1.208 | -1.640 | 2.787 | 1.106 | 0.618 |
1 | -0.364 | -3.042 | 0.042 | -1.275 | -0.511 |
2 | 0.905 | -1.294 | -0.354 | -0.416 | 0.482 |
3 | -0.197 | 1.350 | -0.195 | -0.424 | -0.187 |
4 | -1.178 | 1.090 | 1.431 | 0.248 | -1.660 |
# DF with highlight_min, axis = 1 |
---|
fancy_style_df(random_df,
highlight_min = True,
axis = 1,
props = 'font-weight: 600;\
color: red;\
background-color: yellow').format(precision = 3)
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
0 | -1.208 | -1.640 | 2.787 | 1.106 | 0.618 |
1 | -0.364 | -3.042 | 0.042 | -1.275 | -0.511 |
2 | 0.905 | -1.294 | -0.354 | -0.416 | 0.482 |
3 | -0.197 | 1.350 | -0.195 | -0.424 | -0.187 |
4 | -1.178 | 1.090 | 1.431 | 0.248 | -1.660 |
# DF with highlight_min, axis = 0 |
---|
fancy_style_df(random_df,
highlight_min = True,
axis = 0,
props = 'font-weight: 600;\
color: red;\
background-color: yellow').format(precision = 3)
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
0 | -1.208 | -1.640 | 2.787 | 1.106 | 0.618 |
1 | -0.364 | -3.042 | 0.042 | -1.275 | -0.511 |
2 | 0.905 | -1.294 | -0.354 | -0.416 | 0.482 |
3 | -0.197 | 1.350 | -0.195 | -0.424 | -0.187 |
4 | -1.178 | 1.090 | 1.431 | 0.248 | -1.660 |
# DF with highlight_null |
---|
fancy_style_df(poke_df[['Name', 'Type 2', 'HP', 'Attack']].sample(6),
highlight_null = True,
props = 'font-weight: 600;\
color: yellow;\
background-color: blue')
Name | Type 2 | HP | Attack | |
---|---|---|---|---|
53 | Venonat | Poison | 60 | 55 |
52 | Parasect | Grass | 60 | 95 |
1 | Ivysaur | Poison | 60 | 62 |
64 | Arcanine | nan | 90 | 110 |
10 | Wartortle | nan | 59 | 63 |
89 | Magneton | Steel | 50 | 60 |
# DF with highlight_ between, range 0.2 - 1, subset of columns |
---|
fancy_style_df(random_df,
highlight_between = True,
subset = [0, 2, 4],
left = -0.2,
right = 1,
props = 'font-weight: 600;\
color: yellow;\
background-color: purple').format(precision = 3)
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
0 | -1.208 | -1.640 | 2.787 | 1.106 | 0.618 |
1 | -0.364 | -3.042 | 0.042 | -1.275 | -0.511 |
2 | 0.905 | -1.294 | -0.354 | -0.416 | 0.482 |
3 | -0.197 | 1.350 | -0.195 | -0.424 | -0.187 |
4 | -1.178 | 1.090 | 1.431 | 0.248 | -1.660 |
# DF with highlight_quantile, range 0.0002 - 0.5 |
---|
fancy_style_df(random_df,
highlight_quantile = True,
q_left = 0.0002,
q_right = 0.5,
props = 'font-weight: 600;\
color: cyan;\
background-color: darkblue').format(precision = 3)
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
0 | -1.208 | -1.640 | 2.787 | 1.106 | 0.618 |
1 | -0.364 | -3.042 | 0.042 | -1.275 | -0.511 |
2 | 0.905 | -1.294 | -0.354 | -0.416 | 0.482 |
3 | -0.197 | 1.350 | -0.195 | -0.424 | -0.187 |
4 | -1.178 | 1.090 | 1.431 | 0.248 | -1.660 |
apply_style() | [Return to Top] | |
---|---|---|
Applies a style function to a passed df and returns the styled df. |
def apply_style(df, style_function, **kwargs):
return df.style.applymap(style_function, **kwargs)
# apply_style() with a defined function |
---|
def style_negative(value, props=''):
return props if (value < 0) else None
apply_style(random_df, style_negative, props = 'color: blue;\
font-weight: 600;\
background-color: orange')
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
0 | -1.207984 | -1.640030 | 2.787219 | 1.105597 | 0.617784 |
1 | -0.364321 | -3.041952 | 0.041501 | -1.275126 | -0.510988 |
2 | 0.905161 | -1.294120 | -0.354169 | -0.415611 | 0.481940 |
3 | -0.197128 | 1.349801 | -0.194505 | -0.424056 | -0.187124 |
4 | -1.178297 | 1.089540 | 1.430637 | 0.248058 | -1.659592 |
apply_style() with a lambda function |
---|
apply_style(random_df,
lambda value: 'color: red; font-weight: 600;\
background-color: yellow'\
if (value < 1) and (value > 0.2) else None)
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
0 | -1.207984 | -1.640030 | 2.787219 | 1.105597 | 0.617784 |
1 | -0.364321 | -3.041952 | 0.041501 | -1.275126 | -0.510988 |
2 | 0.905161 | -1.294120 | -0.354169 | -0.415611 | 0.481940 |
3 | -0.197128 | 1.349801 | -0.194505 | -0.424056 | -0.187124 |
4 | -1.178297 | 1.089540 | 1.430637 | 0.248058 | -1.659592 |
timeseries_overview() | [Return to Top] | |
---|---|---|
Produces an overview of timeseries data and important information about the key metric column passed. |
def timeseries_overview(df, metric_col):
index_col = ['total records', 'start date', 'end date', 'total columns',
'column labels', 'total missing values', f'{metric_col} average',
f'{metric_col} std', f'{metric_col} min', f'{metric_col} 25%',
f'{metric_col} 50%', f'{metric_col} 75%', f'{metric_col} max', ]
num_records, num_cols = df.shape
missing_values = df.isna().sum().sum()
columns = ', '.join(list(df.columns))
start_date = min(df.index).strftime('%m/%d/%Y')
end_date = max(df.index).strftime('%m/%d/%Y')
[metric_average, metric_std, metric_min, metric_25th,
metric_50th, metric_75th, metric_max] = df.describe().iloc[1:][metric_col]
values = [num_records, start_date, end_date, num_cols, columns, missing_values,
metric_average, metric_std, metric_min, metric_25th, metric_50th,
metric_75th, metric_max]
overview = pd.concat([pd.Series(index_col), pd.Series(values)], axis=1)
overview.columns = [' ', 'measurement'];
overview.set_index(' ')
styling = {'measurement': [{'selector': '',
'props': [('font-size', '15px'),
('text-align', 'left'),
('padding-right', '15px'),
('padding-left', '55px')]}],
' ': [{'selector': '',
'props': [('font-weight', 'bold'),
('text-align', 'left'),
('font-size', '15px'),
('padding-right', '15px'),
('padding-left', '15px')]}]}
pretty(f'DataFrame Overview | Primary Metric: {metric_col}', fontsize='18px')
return overview.style \
.hide(axis='index') \
.set_table_styles(styling) \
.format(precision=0, thousands=",")
# data for this section will be energy consumption in megawatts in the UK from 2009 through 2022 |
---|
data = pd.read_csv('https://mydatabucky.s3.amazonaws.com/uk_power_consumption.csv',
parse_dates = ['settlement_date'])
data = data[['settlement_date', 'tsd']]
data.columns = ['datetime', 'consumption']
data = data.set_index('datetime', drop=True)
timeseries_overview(data, 'consumption')
DataFrame Overview | Primary Metric: consumption |
measurement | |
---|---|
total records | 244,676 |
start date | 01/01/2009 |
end date | 12/31/2022 |
total columns | 1 |
column labels | consumption |
total missing values | 0 |
consumption average | 33,342 |
consumption std | 7,596 |
consumption min | 16,629 |
consumption 25% | 27,313 |
consumption 50% | 32,548 |
consumption 75% | 38,800 |
consumption max | 60,147 |
add_change_column() | [Return to Top] | |
---|---|---|
Adds a column to time series data for the change from one timestamp to another for the given column. |
def add_change_column(df, column_changing, new_col_name):
df['previous'] = df[column_changing].shift()
df = df.drop(df.index[0])
df[new_col_name] = df[column_changing] - df.previous
df = df.drop(columns = ['previous'])
return df
data_change = add_change_column(data.copy(), 'consumption', 'change')
pretty('change column added, consumption decrease shown in green')
apply_style(data_change.head(5),
style_negative,
props = 'background-color: lime;\
font-weight: 600')
change column added, consumption decrease shown in green |
consumption | change | |
---|---|---|
datetime | ||
2009-01-01 00:30:00 | 38964 | 260.000000 |
2009-01-01 01:00:00 | 38651 | -313.000000 |
2009-01-01 01:30:00 | 37775 | -876.000000 |
2009-01-01 02:00:00 | 37298 | -477.000000 |
2009-01-01 02:30:00 | 37135 | -163.000000 |
featurize_dt_index() | [Return to Top] | |
---|---|---|
Creates time series features from a datetime index. |
def featurize_dt_index(df, daytime=True):
''' Create time series features based on a datetime index '''
df = df.copy()
df['hour'] = df.index.hour
df['weekday'] = df.index.dayofweek
df['weekday_name'] = df.index.strftime('%A')
df['month'] = df.index.month
df['month_name'] = df.index.strftime('%B')
df['quarter'] = df.index.quarter
df['year'] = df.index.year
df['week_of_year'] = df.index.isocalendar().week
df['day_of_year'] = df.index.dayofyear
if daytime:
# Add column with category for time of day:
# midnight, early_morning, late_morning, afternoon, evening, night
def time_of_day(hour):
if hour >= 0 and hour < 6:
return 'midnight'
elif hour >= 6 and hour < 9:
return 'early_morning'
elif hour >= 9 and hour < 12:
return 'late_morning'
elif hour >= 12 and hour < 15:
return 'afternoon'
elif hour >= 15 and hour < 18:
return 'evening'
else:
return 'night'
df['time_of_day'] = (df['hour'].apply(time_of_day)).astype('category')
df['weekday_name'] = df['weekday_name'].astype('category')
df['month_name'] = df['month_name'].astype('category')
df['week_of_year'] = df.week_of_year.astype(float)
return df
# featurize_dt_index, daytime = False |
---|
data_featurized = featurize_dt_index(data_change, daytime = False)
see(data_featurized.head(5),
'Time Series Data with Features Added, No Time of Day')
Time Series Data with Features Added, No Time of Day |
consumption | change | hour | weekday | weekday_name | month | month_name | quarter | year | week_of_year | day_of_year | |
---|---|---|---|---|---|---|---|---|---|---|---|
datetime | |||||||||||
2009-01-01 | 38964 | 260.000 | 0 | 3 | Thursday | 1 | January | 1 | 2009 | 1 | 1 |
2009-01-01 | 38651 | -313.000 | 1 | 3 | Thursday | 1 | January | 1 | 2009 | 1 | 1 |
2009-01-01 | 37775 | -876.000 | 1 | 3 | Thursday | 1 | January | 1 | 2009 | 1 | 1 |
2009-01-01 | 37298 | -477.000 | 2 | 3 | Thursday | 1 | January | 1 | 2009 | 1 | 1 |
2009-01-01 | 37135 | -163.000 | 2 | 3 | Thursday | 1 | January | 1 | 2009 | 1 | 1 |
# featurize_dt_index() with daytime = True |
---|
data_featurized = featurize_dt_index(data_change, daytime = True)
see(data_featurized.sample(5),
'Time Series Data with Features Added (including Time of Day)')
Time Series Data with Features Added (including Time of Day) |
consumption | change | hour | weekday | weekday_name | month | month_name | quarter | year | week_of_year | day_of_year | time_of_day | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
datetime | ||||||||||||
2016-06-10 | 26026 | -1,589.000 | 23 | 4 | Friday | 6 | June | 2 | 2016 | 23.000 | 162 | night |
2016-09-24 | 25735 | -290.000 | 11 | 5 | Saturday | 9 | September | 3 | 2016 | 38.000 | 268 | late_morning |
2017-09-23 | 21744 | 94.000 | 4 | 5 | Saturday | 9 | September | 3 | 2017 | 38.000 | 266 | midnight |
2019-07-14 | 25758 | -416.000 | 13 | 6 | Sunday | 7 | July | 3 | 2019 | 28.000 | 195 | afternoon |
2018-07-22 | 20846 | 57.000 | 2 | 6 | Sunday | 7 | July | 3 | 2018 | 29.000 | 203 | midnight |
see(data_featurized.time_of_day.value_counts(), 'Time of Day Value Counts')
Time of Day Value Counts |
time_of_day | |
---|---|
midnight | 61175 |
night | 61148 |
afternoon | 30588 |
early_morning | 30588 |
evening | 30588 |
late_morning | 30588 |
see(data_featurized.weekday_name.value_counts(), 'Weekday Name Value Counts')
Weekday Name Value Counts |
weekday_name | |
---|---|
Thursday | 35039 |
Friday | 34992 |
Monday | 34944 |
Saturday | 34944 |
Tuesday | 34944 |
Wednesday | 34944 |
Sunday | 34868 |
see(data_featurized.month_name.value_counts(), 'Month Name Value Counts')
Month Name Value Counts |
month_name | |
---|---|
December | 20832 |
January | 20831 |
March | 20804 |
October | 20784 |
August | 20736 |
May | 20688 |
July | 20640 |
April | 20160 |
November | 20160 |
September | 20160 |
June | 19920 |
February | 18960 |
add_lags() | [Return to Top] | |
---|---|---|
Takes a dataframe and a list of lags and their column labels and creates lag features for the given lags and the key metric passed. |
def add_lags(df, target_column, lag_label_list):
target_map = df[target_column].to_dict()
inputs = lag_label_list
for tup in inputs:
df[tup[1]] = (df.index - pd.Timedelta(tup[0])).map(target_map)
return df
lag_list = [('364 days', 'one_year'),
('728 days', 'two_year'),
('1092 days', 'three_year')]
data_lags = add_lags(data_change.copy(), 'consumption', lag_list)
header_text('1, 2, and 3 year lags have been added')
data_lags.tail(5).style.format(precision = 3, thousands = ',')
1, 2, and 3 year lags have been added |
consumption | change | one_year | two_year | three_year | |
---|---|---|---|---|---|
datetime | |||||
2022-12-31 21:30:00 | 25,634 | -887.000 | 25,379.000 | 33,008.000 | 31,401.000 |
2022-12-31 22:00:00 | 24,788 | -846.000 | 24,955.000 | 31,426.000 | 29,807.000 |
2022-12-31 22:30:00 | 24,365 | -423.000 | 24,501.000 | 30,019.000 | 28,423.000 |
2022-12-31 23:00:00 | 24,766 | 401.000 | 24,648.000 | 28,432.000 | 26,996.000 |
2022-12-31 23:30:00 | 24,843 | 77.000 | 24,351.000 | 27,535.000 | 26,017.000 |
missing_values(data_lags)
Columns and Missing Values |
column name | missing | |
---|---|---|
0 | consumption | 0 |
1 | change | 0 |
2 | one_year | 18098 |
3 | two_year | 35380 |
4 | three_year | 52854 |
boxplot_correlation() | [Return to Top] | |
---|---|---|
Creates a boxplot for viewing the correlation between a variable and the key metric in time series data. |
def boxplot_correlation(df, feature_x, feature_y, order=None, palette=None):
import seaborn as sns
fig, ax = plt.subplots(figsize=(13, 7), facecolor=outerbackcolor)
ax.set_facecolor(innerbackcolor)
sns.boxplot(data=df,
x=feature_x,
y=feature_y,
order=order,
palette=palette)
x_name = str(df[feature_x].name)
y_name = str(df[feature_y].name)
ax.grid()
plt.xlabel(x_name, color='white', fontsize=15)
plt.ylabel(y_name, color='white', fontsize=15)
plt.xticks(color='white');
plt.yticks(color='white');
plt.title(f'Feature Correlation: {x_name.capitalize()} - {y_name.capitalize()}',
fontsize=20, pad=20, color='white');
# boxplot_correlation: week of year vs energy consumption |
---|
boxplot_correlation(data_featurized, 'week_of_year', 'consumption')
# boxplot_correlation: month of year vs energy consumption |
---|
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
'August', 'September', 'October', 'November', 'December']
boxplot_correlation(data_featurized, 'month_name', 'consumption', order = months)
# boxplot_correlation: time of day vs energy consumption |
---|
daytimes = ['midnight', 'early_morning', 'late_morning', 'afternoon',
'evening', 'night']
boxplot_correlation(data_featurized, 'time_of_day', 'consumption', order = daytimes)
get_accuracy() | [Return to Top] | |
---|---|---|
Calculates the RMSE score, the absolute accuracy, the relative accuracy, the sharpe ratio, and the overall accuracy for predictions and target data. |
def get_accuracy(df, pred_col, actual_col):
from sklearn.metrics import mean_squared_error
df = df.copy()
df['abs_acc'] = (1 - (abs(df[actual_col] -
df[pred_col]) / df[actual_col])) * 100
range_diff = np.max(df[actual_col]) - np.min(df[actual_col])
df['rel_acc'] = (1 - (abs(df[actual_col] -
df[pred_col]) / range_diff)) * 100
range_std = np.std(df[actual_col])
df['sharpe'] = (abs(df[actual_col] -
df[pred_col]) / range_std)
rmse = np.sqrt(mean_squared_error(df[actual_col],
df[pred_col]))
header_text(f"Average RMSE: {rmse:,.2f} | Average sharpe ratio: {df['sharpe'].mean():.2f} ", fontsize='16px', width = '600px')
header_text(f"Average absolute accuracy: {df['abs_acc'].mean():.2f}%",
fontsize='16px', width = '600px')
header_text(f"Average relative accuracy: {df['rel_acc'].mean():.2f}%",
fontsize='16px', width = '600px')
return df
# for the next examples, predictions from a trained model are needed # preparing the data for a regressor model |
---|
df = data_featurized.copy()
df = pd.get_dummies(df, columns = ['time_of_day'])
split_point = round(len(df) * .75)
train_data = df[: split_point]
test_data = df[split_point :]
features = ['change', 'hour', 'weekday', 'month', 'quarter', 'year',
'week_of_year', 'day_of_year', 'time_of_day_afternoon',
'time_of_day_early_morning', 'time_of_day_evening',
'time_of_day_late_morning', 'time_of_day_midnight',
'time_of_day_night']
target = 'consumption'
train_in, train_out = train_data[features], train_data[target]
test_in, test_out = test_data[features], test_data[target]
# defining the model, training, and getting predictions |
---|
from xgboost import XGBRegressor as xgb
model = xgb(n_estimators = 1200,
learning_rate = 0.012,
early_stopping_rounds = 23)
model.fit(train_in, train_out,
eval_set = [(train_in, train_out),
(test_in, test_out)], verbose = 125)
predictions = ser(model.predict(test_in))
[0] validation_0-rmse:35068.38825 validation_1-rmse:29587.22375 [125] validation_0-rmse:8127.22325 validation_1-rmse:5360.71831 [250] validation_0-rmse:2592.35417 validation_1-rmse:2412.92408 [257] validation_0-rmse:2486.24336 validation_1-rmse:2422.52324
actual = test_out.copy()
ser(predictions).index = list(actual.index)
actual_vs_predicted = pd.concat([actual, predictions], axis = 1)
actual_vs_predicted.columns = ['actual', 'predicted']
results = get_accuracy(actual_vs_predicted, 'predicted', 'actual')
Average RMSE: 2,404.93 | Average sharpe ratio: 0.31 |
Average absolute accuracy: 93.43% |
Average relative accuracy: 93.94% |
get_daily_error() | [Return to Top] | |
---|---|---|
Calculates the daily error for a key metric from the prediction results dataframe. |
def get_daily_error(df, pred_col, actual_col, num_examples,
ascending=False):
temp = df[[actual_col, pred_col]].copy()
temp['date'] = temp.index.strftime('%A, %b %d, %Y')
temp['error'] = np.abs(df[actual_col] - df[pred_col])
results = temp.sort_values("error", ascending=ascending)
error_style = {'error': [{'selector': '',
'props': [('color', 'red'),
('font-weight', 'bold'),
('padding-right', '15px'),
('padding-left', '15px')]}],
'date': [{'selector': 'td',
'props': [('color', 'blue'),
('font-weight', 'bold'),
('padding-right', '15px'),
('padding-left', '15px')]}],
'prediction': [{'selector': 'td',
'props': [('padding-right', '25px'),
('padding-left', '15px')]}]}
if ascending == True:
pretty(f'Daily error for the {num_examples} days with the lowest error:',
fontsize='18px')
else:
pretty(f'Daily error for the {num_examples} days with the highest error:',
fontsize='18px')
return results[['date',
'error',
pred_col,
actual_col]].head(num_examples).style.hide(axis='index') \
.set_table_styles(error_style) \
.format(precision=3, thousands=",")
get_daily_error(results, 'predicted', 'actual', 10)
Daily error for the 10 days with the highest error: |
date | error | predicted | actual |
---|---|---|---|
Monday, Jan 03, 2022 | 11,562.941 | 37,972.941 | 26,410 |
Monday, May 25, 2020 | 11,459.938 | 29,435.938 | 17,976 |
Sunday, Dec 25, 2022 | 11,418.531 | 34,780.531 | 23,362 |
Monday, May 25, 2020 | 11,370.938 | 29,435.938 | 18,065 |
Sunday, Dec 25, 2022 | 11,246.348 | 35,300.348 | 24,054 |
Sunday, Dec 25, 2022 | 11,122.531 | 34,780.531 | 23,658 |
Monday, May 25, 2020 | 10,925.391 | 29,644.391 | 18,719 |
Sunday, Dec 25, 2022 | 10,682.348 | 35,300.348 | 24,618 |
Monday, May 25, 2020 | 10,681.938 | 29,435.938 | 18,754 |
Monday, Apr 13, 2020 | 10,672.688 | 30,354.688 | 19,682 |
get_daily_error(results, 'predicted', 'actual', 10, ascending = True)
Daily error for the 10 days with the lowest error: |
date | error | predicted | actual |
---|---|---|---|
Saturday, Feb 12, 2022 | 0.049 | 26,323.951 | 26,324 |
Saturday, Oct 16, 2021 | 0.154 | 30,670.154 | 30,670 |
Wednesday, Jun 17, 2020 | 0.160 | 21,535.160 | 21,535 |
Friday, Nov 06, 2020 | 0.170 | 24,826.170 | 24,826 |
Thursday, Oct 15, 2020 | 0.180 | 34,020.820 | 34,021 |
Tuesday, Sep 08, 2020 | 0.191 | 31,237.809 | 31,238 |
Friday, Mar 20, 2020 | 0.289 | 34,961.289 | 34,961 |
Saturday, Sep 17, 2022 | 0.320 | 21,116.320 | 21,116 |
Thursday, Nov 07, 2019 | 0.355 | 27,966.645 | 27,967 |
Sunday, Dec 22, 2019 | 0.479 | 26,528.479 | 26,528 |