
Commit

Final
rmarconzini committed Jan 5, 2023
1 parent cb0ce19 commit 2a17f70
Showing 19 changed files with 898 additions and 41 deletions.
Binary file added Altro/Paper Pictures/boxCox_PreCovid.jpg
Binary file added Altro/Paper Pictures/boxplot_Days_Precovid.jpg
Binary file added Altro/Paper Pictures/decompose_PostCovid.png
Binary file added Altro/Paper Pictures/distribuzioneNA.png
Binary file added Altro/Paper Pictures/previsioniARIMA.png
Binary file added Altro/Paper Pictures/ss_dec_PreCovid.jpg
Binary file added Altro/Paper Pictures/trend_PreCovid.jpg
Binary file added Altro/Paper Pictures/tsdisplay_PreCovid.jpg
Binary file added Altro/Paper Pictures/vendite_Festivo_PreCovid.jpg
Binary file added Altro/Paper Pictures/vendite_PostCovid.png
Binary file added Altro/Paper Pictures/vendite_PreCovid.jpg
Binary file added Analisi predittive/0501_Covid.RData
Binary file added Analisi predittive/0501_PreCovid.RData
Binary file removed Analisi predittive/2912Covid.RData
111 changes: 78 additions & 33 deletions Analisi predittive/RIstorante1_Arima_Covid.Rmd
@@ -40,7 +40,7 @@ course: Progetto Data Science Lab

```{r}
# Clean Workspace
rm(list=ls())
# rm(list=ls())
```

```{r setup, include=FALSE}
@@ -175,15 +175,11 @@ vendite_r1[vendite_r1 == 0] <- NaN
```{r}
# Convert to a time series object
vendite_r1 <- msts(vendite_r1, seasonal.periods=c(7,365.25), ts.frequency = 7)
plot(vendite_r1)
autoplot(vendite_r1, ylab = "Vendite")
```
## Missing value imputation
```{r}
```
```{r}
library(imputeTS)
ggplot_na_distribution(vendite_r1)
@@ -195,6 +191,10 @@ statsNA(vendite_r1)
vendite_r1_nona <- na_kalman(vendite_r1)
ggplot_na_imputations(vendite_r1, vendite_r1_nona)
```
```{r}
library(xts)
trend <- xts(vendite_r1_dec[,2], order.by = df$data, frequency = 7)
write.zoo(trend, file="C:\\Users\\marco\\OneDrive\\UNIMIB_DataScience\\99-PROJECTS\\DataScienceLab2022\\Analisi predittive\\trend.csv", sep=",")
```
```{r}
vendite_r1 <- na_kalman(vendite_r1)
@@ -360,7 +360,7 @@ head(dum_day_predict)
```{r}
# Training-test split of the time series
split_vendite <- ts_split(vendite_r1, sample.out =round(length(vendite_r1)*0.2))
split_vendite <- ts_split(vendite_r1, sample.out =60)
train <- split_vendite$train
test <- split_vendite$test
@@ -372,7 +372,7 @@ length(test)
```{r}
# Training-test split of the daily dummy matrix
train_date <- nrow(dum_day) *0.8
train_date <- nrow(dum_day)-60
train_dumday<- dum_day[1:train_date,]
test_dumday <- dum_day[-c(1:train_date),]
@@ -384,7 +384,7 @@ length(test_dumday[,1])
```{r}
# Training-test split of the annual dummies
train_date <- nrow(dum_ann) *0.8
train_date <- nrow(dum_ann)-60
train_dumann<- dum_ann[1:train_date,]
test_dumann <- dum_ann[-c(1:train_date),]
@@ -396,7 +396,7 @@ length(test_dumann[,1])
```{r}
# Training-test split of the Fourier terms
train_date <- nrow(four) *0.8
train_date <- nrow(four)-60
train_four<- four[1:train_date,]
test_four <- four[-c(1:train_date),]
@@ -408,7 +408,7 @@ length(test_four[,1])
```{r}
# Training-test split of the xreg dataframe
train_date <- nrow(xreg) *0.8
train_date <- nrow(xreg)-60
train_xreg<- xreg[1:train_date,]
test_xreg <- xreg[-c(1:train_date),]
@@ -674,7 +674,7 @@ pars_test(mod3$coef, mod3$var.coef)
```
```{r}
mod3_auto <- auto.arima(train, xreg = train_four, lambda = "auto", stepwise = FALSE)
# mod3_auto <- auto.arima(train, xreg = train_four, lambda = "auto", stepwise = FALSE)
```
```{r}
@@ -697,13 +697,6 @@ checkresiduals(mod3_auto, test = "LB", lag = 25, plot = FALSE)
pars_test(mod3_auto$coef, mod3_auto$var.coef)
```
```{r}
# Fit over the first 10 weeks
plot(train[1:70],
col = "blue", lwd=0.5,
main = "Fitted Values", type = "l")
lines(mod3$fitted[1:70], col="red", lwd=3)
```
I choose model *mod3*, since it shows the best behaviour in terms of forecast errors and residuals, even though it is less parsimonious.
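
As a quick sanity check on that choice, the sketch below (not run) puts the candidates side by side on the held-out window; it assumes mod3 and mod3_auto were both fitted on `train` with `train_four` as external regressors, as the auto.arima call above suggests.

```{r, eval=FALSE}
# Hedged sketch: held-out accuracy and residual diagnostics for the candidate models.
# Assumes both models were fitted on `train` with `train_four` as xreg.
fc_mod3      <- forecast(mod3,      xreg = test_four)
fc_mod3_auto <- forecast(mod3_auto, xreg = test_four)
accuracy(fc_mod3, test)          # RMSE / MAE / MAPE on the training and test windows
accuracy(fc_mod3_auto, test)
checkresiduals(mod3, test = "LB", lag = 25, plot = FALSE)   # Ljung-Box on the residuals
```

A low Ljung-Box p-value would indicate autocorrelation left in the residuals, which is what justifies keeping the less parsimonious specification.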
@@ -1185,6 +1178,39 @@ for (i in 1:length(matrices)) {

```

```{r}
# Change the global parameters for the third and fourth model

h = 60 # 2 months
initial = 365 # 180*3; more complex models require a larger initial training set
window = NULL

start.time <- Sys.time()
print(start.time)


xreg3 <- four
e_3_1 <- tsCV_ARIMA(y = vendite_r1, forecastfunction = f_mod3, h=h, xreg=xreg3, initial = initial)
#e3 <- tail(e3, n = -initial)
e3_1 <- e_3_1$e
e3_percentage_1 <- e_3_1$e_percentage
e3_estimate_1 <- e_3_1$y_estimate
e3_groundtruth_1 <- e_3_1$y_groundtruth


xreg4 <- rbind(xreg_mod4_train, xreg_mod4_test)
e_4_1 <- tsCV_ARIMA(y = vendite_r1, forecastfunction = f_mod4, h=h, xreg=xreg4, initial = initial)
#e4 <- tail(e4, n = -initial)
e4_1 <- e_4_1$e
e4_percentage_1 <- e_4_1$e_percentage
e4_estimate_1 <- e_4_1$y_estimate
e4_groundtruth_1 <- e_4_1$y_groundtruth

end.time <- Sys.time()
print(end.time)
time.taken <- end.time - start.time
print(time.taken)
```
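
For readers without the project's `tsCV_ARIMA` helper, the chunk below is a minimal, hypothetical sketch of the rolling-origin scheme these parameters control: an expanding training window of at least `initial` observations, forecasts up to `h` steps ahead from each origin, and a matrix of errors indexed by origin and horizon. The ARIMA order is illustrative only; the real helper refits the models wrapped in `f_mod3`/`f_mod4` and also returns percentage errors, estimates and ground truth.

```{r, eval=FALSE}
# Hedged sketch of rolling-origin cross-validation (errors only, illustrative ARIMA spec).
h <- 60
initial <- 365
n <- length(vendite_r1)
cv_err <- matrix(NA_real_, nrow = n, ncol = h)

for (i in seq(initial, n - 1)) {
  hh  <- min(h, n - i)                       # shorter horizon near the end of the series
  fit <- Arima(subset(vendite_r1, end = i),
               order = c(2, 1, 3),
               xreg = four[1:i, , drop = FALSE],
               lambda = "auto")
  fc  <- forecast(fit, xreg = four[(i + 1):(i + hh), , drop = FALSE])
  cv_err[i, 1:hh] <- as.numeric(subset(vendite_r1, start = i + 1, end = i + hh)) - as.numeric(fc$mean)
}
```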


## Comparison between the models
@@ -1211,18 +1237,19 @@ legend("bottomright",legend=c("1_ARIMA1","2_ARIMA2","3_ARIMA3","4_ARIMA4"),col=1:4,lty=1)
### RMdSE
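
For each horizon $h$, the RMdSE plotted below is the square root of the median (over cross-validation origins $o$) of the squared forecast errors, which is what the `sqrt(colMedians(...))` lines compute; the median makes the summary robust to occasional extreme errors:

$$\mathrm{RMdSE}(h) = \sqrt{\operatorname{median}_{o}\, e_{o,h}^{2}}$$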

```{r}
RMSE_mod1 <- sqrt(colMedians(e1^2, na.rm = TRUE, hasNA = TRUE))
RMSE_mod2 <- sqrt(colMedians(e2^2, na.rm = TRUE, hasNA = TRUE))
RMSE_mod3 <- sqrt(colMedians(e3^2, na.rm = TRUE, hasNA = TRUE))
RMSE_mod4 <- sqrt(colMedians(e4^2, na.rm = TRUE, hasNA = TRUE))
library(robustbase)
RdMSE_mod1 <- sqrt(colMedians(e1^2, na.rm = TRUE, hasNA = TRUE))
RdMSE_mod2 <- sqrt(colMedians(e2^2, na.rm = TRUE, hasNA = TRUE))
RdMSE_mod3 <- sqrt(colMedians(e3^2, na.rm = TRUE, hasNA = TRUE))
RdMSE_mod4 <- sqrt(colMedians(e4^2, na.rm = TRUE, hasNA = TRUE))
#RMSE_mod6 <- sqrt(colMeans(e6^2, na.rm = TRUE))


# Zoom in
plot(1:60, RMSE_mod1, type="l", col=1, xlab="horizon", ylab="RMSE", ylim = c(1000,4500))
lines(1:60, RMSE_mod2, type="l",col=2)
lines(1:60, RMSE_mod3, type="l",col=3)
lines(1:60, RMSE_mod4, type="l",col=4)
plot(1:60, RdMSE_mod1, type="l", col=1, xlab="horizon", ylab="RdMSE", ylim = c(1000,4500))
lines(1:60, RdMSE_mod2, type="l",col=2)
lines(1:60, RdMSE_mod3, type="l",col=3)
lines(1:60, RdMSE_mod4, type="l",col=4)
legend("bottomright",legend=c("1_ARIMA1","2_ARIMA2","3_ARIMA3","4_ARIMA4"),col=1:4,lty=1)

```
@@ -1244,12 +1271,12 @@ MAE_mod2 <- colMeans(abs(e2), na.rm = TRUE)
MAE_mod3 <- colMeans(abs(e3), na.rm = TRUE)
MAE_mod4 <- colMeans(abs(e4), na.rm = TRUE)

plot(1:42, MAE_mod1, type="l", col=1, xlab="horizon", ylab="MAE", ylim = c(0,6000))
lines(1:42, MAE_mod2, type="l",col=2)
lines(1:42, MAE_mod3, type="l",col=3)
lines(1:42, MAE_mod4, type="l",col=4)
plot(1:60, MAE_mod1, type="l", col=1, xlab="horizon", ylab="MAE", ylim = c(0,6000))
lines(1:60, MAE_mod2, type="l",col=2)
lines(1:60, MAE_mod3, type="l",col=3)
lines(1:60, MAE_mod4, type="l",col=4)

legend("topleft",legend=c("1_ARIMA1","2_ARIMA2","3_ARIMA3","4_ARIMA4"),col=1:4,lty=1)
legend("bottomright",legend=c("1_ARIMA1","2_ARIMA2","3_ARIMA3","4_ARIMA4"),col=1:4,lty=1)
```

### MAPE
@@ -1267,4 +1294,22 @@ lines(1:42, MAPE_mod3, type="l",col=3)
lines(1:42, MAPE_mod4, type="l",col=4)

legend("topleft",legend=c("1_ARIMA1","2_ARIMA2","3_ARIMA3","4_ARIMA4"),col=1:4,lty=1)
```
```


```{r}
RMSE_mod1 <- sqrt(colMeans(e1^2, na.rm = TRUE))
RMSE_mod2<- sqrt(colMeans(e2^2, na.rm = TRUE))
RMSE_mod3_1 <- sqrt(colMeans(e3_1^2, na.rm = TRUE))
RMSE_mod4_1 <- sqrt(colMeans(e4_1^2, na.rm = TRUE))
#RMSE_mod6 <- sqrt(colMeans(e6^2, na.rm = TRUE))


# Zoom in
plot(1:60, RMSE_mod1, type="l", col=1, xlab="horizon", ylab="RMSE", ylim = c(2500,10000))
lines(1:60, RMSE_mod2, type="l",col=2)
lines(1:60, RMSE_mod3_1, type="l",col=3)
lines(1:60, RMSE_mod4_1, type="l",col=4)
legend("topleft",legend=c("1_ARIMA1","2_ARIMA2","3_ARIMA3","4_ARIMA4"),col=1:4,lty=1)
```

104 changes: 96 additions & 8 deletions Analisi predittive/Ristorante1_Arima_PreCovid.Rmd
@@ -134,7 +134,7 @@ head(vendite_r1)
# Data exploration

```{r}
plot(vendite_r1)
autoplot(vendite_r1, ylab="Vendite", xlab="Tempo")
```

The available data are daily takings for Ristorante1. With daily data, several types of seasonality can be present. In our case we most likely have:
@@ -157,14 +157,22 @@ weekend <- tapply(weekend$lordototale, weekend$Giorno, mean, simplify = FALSE)

data.frame(c(feriali,weekend))
```
```{r}
boxplot(data.frame(c(feriali,weekend)), col="blue", ylab = "Vendite")
```

Indeed, the daily means are roughly similar from Monday to Thursday, while they increase markedly on Friday, Saturday and Sunday.

To assess the presence of annual seasonality, we use the "Festivo" variable, which is set to True when the day is a public holiday that does not fall on a weekend.

```{r}
festivo_noweekend <- df[df$Weekend == 'False',c("data", "lordototale", "Giorno", "Festivo")]
tapply(festivo_noweekend$lordototale, festivo_noweekend$Festivo, mean)
```

```{r}
festivo_mean <- data.frame(t(tapply(festivo_noweekend$lordototale, festivo_noweekend$Festivo, mean)))
colnames(festivo_mean) <- c("Non Festivo", "Festivo")
boxplot(festivo_mean, ylab ="Vendite")
```

The means on public holidays are higher than those on regular weekdays. This leads us to conclude that annual seasonality is present, although it is hard to estimate given the small number of observations.
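
A common way to capture the long annual cycle without estimating a full 365-day seasonal component is to approximate it with a handful of Fourier terms used as external ARIMA regressors. The chunk below is an illustrative sketch only (the object names are hypothetical), using the same K values adopted for the Fourier terms later in this document.

```{r, eval=FALSE}
# Hedged sketch: a few sine/cosine pairs approximate the weekly and annual cycles,
# leaving only the short-run dynamics to the ARIMA part. Object names are hypothetical.
library(forecast)
four_sketch <- fourier(vendite_r1, K = c(3, 15))   # 3 pairs for period 7, 15 pairs for period 365.25
fit_sketch  <- auto.arima(vendite_r1, xreg = four_sketch, seasonal = FALSE, lambda = "auto")
```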
@@ -184,7 +192,7 @@ Come precedentemente accennato la stagionalità a 365 giorni non viene stimata p

```{r}
plot(vendite_r1, main="Trend Vendite giornaliere", xlab ="Time", ylab="Fatturato")
lines(vendite_r1_dec[,2], col="blue", lwd=2)
lines(vendite_r1_dec[,2], col="red", lwd=2)
```

Note how the irregularities in the trend are indeed due to all the holidays that are not modelled by the weekly seasonality.
@@ -197,6 +205,10 @@ La serie storica è ovviamente non stazionaria, per i seguenti motivi:
- Presence of a trend (observations systematically above or below the mean)
- Possible non-stationarity in variance (looking at the decomposition plot, the seasonal component shrinks at some point); a quick check is sketched right after this list
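
A minimal sketch (not run) of how these two points can be checked directly, using a Box-Cox lambda estimate for the variance and a KPSS unit-root test for the level; `ndiffs()` and `nsdiffs()` in the chunks below automate the same decisions.

```{r, eval=FALSE}
# Hedged sketch: variance and level stationarity checks.
library(forecast)
library(tseries)
BoxCox.lambda(vendite_r1)              # a lambda well below 1 suggests stabilising the variance
kpss.test(vendite_r1, null = "Level")  # a small p-value points to level non-stationarity (differencing needed)
```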

```{r}
tsdisplay(vendite_r1)
```

```{r}
ndiffs(vendite_r1)
nsdiffs(vendite_r1)
@@ -246,7 +258,7 @@ Il test ci indica la necessità di una differenziazione, sia stagionale che non.


```{r}
tsdisplay(vendite_r1_diff7)
tsdisplay(diff(vendite_r1_log, 7), main = "")
```

The ACF does not decay to zero slowly and shows no seasonal patterns.
@@ -362,7 +374,7 @@ four <- as.matrix((four))
```{r}
split_vendite <- ts_split(vendite_r1, sample.out =round(length(vendite_r1)*0.2))
split_vendite <- ts_split(vendite_r1, sample.out =73)
train <- split_vendite$train
test <- split_vendite$test
length(train)
@@ -372,7 +384,7 @@ length(test)
```{r}
# Divisione training-test matrice dummy giornaliere
train_date <- nrow(dum_day) *0.8
train_date <- nrow(dum_day)-73
train_dumday<- dum_day[1:train_date,]
test_dumday <- dum_day[-c(1:train_date),]
@@ -384,7 +396,7 @@ length(test_dumday[,1])
```{r}
# Divisione training-test dummy annuali
train_date <- nrow(dum_ann) *0.8
train_date <- nrow(dum_ann)-73
train_dumann<- dum_ann[1:train_date,]
test_dumann <- dum_ann[-c(1:train_date),]
@@ -396,7 +408,7 @@ length(test_dumday[,1])
```{r}
# Divisione training-test termini di Fourier
train_date <- nrow(four) *0.8
train_date <- nrow(four)-73
train_four<- four[1:train_date,]
test_four <- four[-c(1:train_date),]
@@ -1360,3 +1372,79 @@ lines(1:42, MAPE_mod4, type="l",col=4)

legend("topleft",legend=c("1_ARIMA1","2_ARIMA2","3_ARIMA3","4_ARIMA4"),col=1:4,lty=1)
```


## Forecasts

We therefore choose model 3 to forecast revenue during the first lockdown.

```{r}
# Fourier terms for the forecasts
four_pred <- fourier(vendite_r1, K = c(3,15), h=73)
```

```{r}
lockdown <- read.csv("C:\\Users\\marco\\OneDrive\\UNIMIB_DataScience\\99-PROJECTS\\DataScienceLab2022\\Dati ristoranti\\periodo-covid_r1.csv")
lockdown$data <- parse_date(lockdown$data, "%Y-%m-%d", locale = locale("it"))
```

```{r}
predict_lockdown <- subset(lockdown, lockdown$data < "2020-05-07", select = "data")
```


```{r}
previsioni <- forecast(Arima(vendite_r1,
order = c(2, 1, 3),
seasonal = list(order = c(1, 0, 1)),
xreg = four,
include.constant = TRUE,
lambda = "auto"),
h = 73,
xreg=four_pred)
```

```{r}
autoplot(previsioni)
```
```{r}
# Build a dataframe with the forecasts and the actual values
predict_lockdown$previsioni <- previsioni$mean
predict_lockdown$lordototale <- lockdown[lockdown$data < "2020-05-07", 9]
head(predict_lockdown)
```


```{r}
predict_lockdown <- melt(predict_lockdown, id.vars = "data")
```


Run the chunk below.



```{r}
p <- ggplot(predict_lockdown,
aes(x = data, y = value, col = variable)) + geom_line()
p + labs(x = "Data", y='Vendite') + ggtitle("Previsioni ARIMA") +
theme(legend.title = element_blank(),
legend.position = c(0.9, 0.18),
legend.background = element_rect(fill = "white", color = "black"))
```

### Comparison with the previous year's revenue

```{r}
# Total forecast revenue (the forecasts are stored in previsioni$mean; the raw lockdown dataframe has no previsioni column)
sum_lockdown <- sum(previsioni$mean)
sum_lockdown
```
```{r}
# Total revenue for 2021, equal to 6864238.728
percent_fatt <- (sum_lockdown*100)/6864238.728
percent_fatt
```



Binary file added Analisi predittive/cars_plot.png
