| Top | Initial Data Visualization | Visualizing Features | Prophet Model | Incorporating Holidays | Future Predictions |

Forecasting with Prophet
Resources consulted: YouTube Tutorial & Kaggle Notebook
Evan Marie online: | EvanMarie.com | EvanMarie@Proton.me | LinkedIn | GitHub | Hugging Face | Mastodon | Jovian.ai | TikTok | CodeWars | Discord ⇨ ✨ EvanMarie ✨#6114 |


Initial Data Visualization
Featurized Data
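A minimal sketch of how calendar features might be derived from a datetime index. The create_features helper, the value column, and the synthetic daily data are all assumptions for illustration; the notebook's actual dataset and feature set are not shown here.

```python
import pandas as pd


def create_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with calendar features taken from its DatetimeIndex."""
    out = df.copy()
    out["dayofweek"] = out.index.dayofweek  # Monday=0 ... Sunday=6
    out["month"] = out.index.month
    out["quarter"] = out.index.quarter
    out["year"] = out.index.year
    out["dayofyear"] = out.index.dayofyear
    return out


# Synthetic daily series (hypothetical data, not the notebook's dataset)
idx = pd.date_range("2021-01-01", periods=14, freq="D")
frame = pd.DataFrame({"value": range(14)}, index=idx)

features = create_features(frame)
print(features.head())
```

Features like these make it easy to group or color plots by weekday, month, or year when exploring seasonality.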


Feature Visualization
Train-Test Split
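For time series, the split is usually made at a cutoff date rather than at random, so the model never sees data from the period it is tested on. The sketch below assumes a datetime-indexed frame; the split_date value and the synthetic data are hypothetical.

```python
import pandas as pd

# Synthetic daily series (hypothetical data)
idx = pd.date_range("2020-01-01", periods=100, freq="D")
data = pd.DataFrame({"value": range(100)}, index=idx)

# Hypothetical cutoff: everything before it trains, everything from it on tests
split_date = pd.Timestamp("2020-03-01")
train = data.loc[data.index < split_date]
test = data.loc[data.index >= split_date]

print(len(train), len(test))
```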


Prophet Model
- Prophet requires the datetime values to be in a column named ds (not in the index) and the metric or target column to be named y.
Documentation on Prophet
Prepping the testing and training data
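The ds and y naming requirement can be met by moving the datetime index into a column and renaming. This is a sketch under assumed names: the index name Datetime and the column name value are hypothetical stand-ins for whatever the real data uses.

```python
import pandas as pd

# Hypothetical training frame with a named datetime index
idx = pd.date_range("2020-01-01", periods=5, freq="D", name="Datetime")
train = pd.DataFrame({"value": [10.0, 12.0, 11.0, 13.0, 12.5]}, index=idx)

# Move the index into a column, then rename to the names Prophet expects
train_prophet = (
    train.reset_index()
         .rename(columns={"Datetime": "ds", "value": "y"})
)

print(train_prophet.head())
```

The same transformation would be applied to the test frame so that both share the ds and y layout.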
Fitting the model
Forecasting
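A minimal sketch of the fit and predict steps, assuming the prophet package is installed and that both frames already use the ds and y layout. The fit_and_forecast helper is hypothetical, and the model uses default settings, which may differ from the configuration used in this notebook.

```python
import pandas as pd


def fit_and_forecast(train_prophet: pd.DataFrame, test_prophet: pd.DataFrame) -> pd.DataFrame:
    """Fit a default Prophet model on the training frame and predict the test dates."""
    # Imported here so the sketch can be defined without prophet installed
    from prophet import Prophet

    model = Prophet()
    model.fit(train_prophet)  # expects columns ds and y

    # predict only needs the ds column of the dates to forecast
    forecast = model.predict(test_prophet[["ds"]])

    # yhat is the point forecast; yhat_lower / yhat_upper bound the uncertainty interval
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]
```

Calling fit_and_forecast(train_prophet, test_prophet) would return one row per test date, ready to be plotted against the actual values.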
Customized Prophet Plot Function (adapted from Prophet documentation)
The confidence interval becomes wider the farther into the future a prediction is made
(Source)
Confidence intervals are ranges, calculated from sample data, that are likely to contain the population parameter of interest (for example, the mean or standard deviation). For example, if our population consists of only the values 2 and 6, a confidence interval for the mean says that the population mean is between 2 and 6. And how confidently can we say this? With 100% confidence, because we know every value in the population and can calculate the mean directly.

But in real-life problems, this is not the case. It is not always feasible or possible to study the whole population, so we take sample data instead. Can we rely on one sample? No, because different samples from the same population will produce different means. So we take numerous random samples from the same population and calculate a confidence interval for each one; a certain percentage of these intervals will contain the true population parameter. This percentage is called the confidence level. A 95% confidence level means that out of 100 random samples taken, we expect about 95 of the resulting confidence intervals to contain the true population parameter.
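The repeated-sampling interpretation above can be checked with a small simulation. This sketch assumes a normally distributed population with a known mean and uses the normal approximation (z = 1.96) for each interval; all parameter values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

true_mean, true_sd = 10.0, 2.0        # hypothetical population parameters
n_samples, sample_size = 2000, 50     # number of repeated samples and their size
z = 1.96                              # normal critical value for a 95% level

# Each row is one random sample drawn from the same population
samples = rng.normal(true_mean, true_sd, size=(n_samples, sample_size))

means = samples.mean(axis=1)
sems = samples.std(axis=1, ddof=1) / np.sqrt(sample_size)  # standard error of each mean

lower = means - z * sems
upper = means + z * sems

# Fraction of intervals that actually contain the true mean
coverage = np.mean((lower <= true_mean) & (true_mean <= upper))
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should land close to 0.95, slightly below it in practice because the normal critical value is a little too narrow for samples of this size.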
Visualizing Components
Model Evaluation
Zoom in to date range within predictions and actual values
Monthly Comparison: Earliest Predictions
Weekly Comparison: Earliest Predictions
Monthly Comparison: Latest Predictions
Weekly Comparison: Latest Predictions
Error Metric Evaluation
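A sketch of three common error metrics for comparing forecasts with actual values: mean absolute error, root mean squared error, and mean absolute percentage error. Which metrics this notebook reports is not stated here, so treat the selection and the sample numbers as assumptions.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred) -> float:
    """Average absolute difference between actual and predicted values."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def root_mean_squared_error(y_true, y_pred) -> float:
    """Square root of the average squared error; penalizes large misses more."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mean_absolute_percentage_error(y_true, y_pred) -> float:
    """Average absolute error as a percentage of the actual value (y_true must be nonzero)."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)


# Made-up values for illustration
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]

print(mean_absolute_error(actual, predicted))
print(root_mean_squared_error(actual, predicted))
print(mean_absolute_percentage_error(actual, predicted))
```

MAE and RMSE are in the units of the target, while MAPE is unitless, which makes it easier to compare across series of different scale.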


Incorporating Holidays
Predictions: Accounting for holidays
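Prophet accepts holidays as a DataFrame with a holiday column (the name) and a ds column (the date), passed through the holidays argument of the Prophet constructor. The sketch below builds such a frame with pandas; using the US federal holiday calendar and the 2020 date range are assumptions, since the calendar used in this notebook is not shown.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Assumption: US federal holidays over a hypothetical one-year range
calendar = USFederalHolidayCalendar()
holidays = calendar.holidays(start="2020-01-01", end="2020-12-31", return_name=True)

# holidays is a Series of names indexed by date; reshape to Prophet's layout
holiday_df = (
    holidays.rename("holiday")   # Series name becomes the holiday column
            .rename_axis("ds")   # index name becomes the ds column
            .reset_index()
)

print(holiday_df.head())

# With prophet installed, the frame would be passed as:
#   model = Prophet(holidays=holiday_df)
```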
Visualizing Components: Incorporating Holidays
Monthly Comparison (incorporating holidays): Earliest Predictions
Weekly Comparison (incorporating holidays): Earliest Predictions
Monthly Comparison (incorporating holidays): Latest Predictions
Weekly Comparison (incorporating holidays): Latest Predictions


Future Predictions
# retraining model on entirety of the data
# Rob didn't do this, but I do not see a reason not to
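Predicting beyond the data requires a frame of future ds values that starts right after the last observed date. Prophet's make_future_dataframe method builds this from a fitted model; the pandas-only sketch below shows the same idea without needing the package. The history frame, the 7-day horizon, and the daily frequency are hypothetical.

```python
import pandas as pd

# Hypothetical full history in Prophet's ds / y layout
history = pd.DataFrame({
    "ds": pd.date_range("2020-01-01", periods=10, freq="D"),
    "y": range(10),
})

horizon = 7  # number of future periods to predict (assumed)
last_date = history["ds"].max()

# Future dates begin one step after the last observation
future = pd.DataFrame({
    "ds": pd.date_range(last_date + pd.Timedelta(days=1), periods=horizon, freq="D")
})

print(future)

# With a Prophet model fitted on history, the forecast would then be:
#   forecast = model.predict(future)
```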

