  • Accurate forecasts of the U.S. renewable energy consumption mix are essential for planning transmission upgrades, sizing storage, and setting balancing market rules. We introduce a Bayesian Dirichlet ARMA model (BDARMA) tailored to monthly shares of hydro, geothermal, solar, wind, wood, municipal waste, and biofuels from January 2010 through January 2025. The mean vector is modeled with a parsimonious VAR(2) in additive log-ratio space, while the Dirichlet concentration parameter follows an intercept plus five Fourier harmonics, allowing for seasonal widening and narrowing of predictive dispersion. Forecast performance is assessed with a 61-split rolling-origin experiment that issues twelve-month density forecasts from January 2019 to January 2024. Compared with three alternatives (a Gaussian VAR(2) fitted in transform space, a seasonal naive approach that repeats last year’s proportions, and a drift-free ALR random walk), BDARMA lowers the mean continuous ranked probability score by 15 to 60 percent, achieves componentwise 90 percent interval coverage near nominal, and maintains point accuracy (Aitchison RMSE) on par with the Gaussian VAR through eight months and within 0.02 units afterward. These results highlight BDARMA’s ability to deliver sharp and well-calibrated probabilistic forecasts for multivariate renewable energy shares without sacrificing point precision.

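
    The additive log-ratio transform that carries the VAR(2) mean model between the simplex and unconstrained space can be sketched in a few lines. This Python snippet is illustrative only (the paper's own implementation is not reproduced here); the seven-part mix values are made up:

```python
import numpy as np

def alr(p):
    """Additive log-ratio transform: D-part simplex -> R^(D-1), last part as reference."""
    p = np.asarray(p, dtype=float)
    return np.log(p[:-1] / p[-1])

def alr_inv(y):
    """Inverse ALR: map R^(D-1) back to the simplex in closed form."""
    e = np.exp(np.append(y, 0.0))
    return e / e.sum()

# A hypothetical seven-part mix (hydro, geothermal, solar, wind, wood, waste, biofuels)
mix = np.array([0.30, 0.02, 0.12, 0.25, 0.18, 0.05, 0.08])
y = alr(mix)        # 6 unconstrained coordinates for the VAR dynamics
back = alr_inv(y)   # round-trips to the original composition
```
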
  • with Peter Coles and Erica Savage

    Short-term changes in booking behaviors can significantly undermine naive forecasting methods in the travel and hospitality industry, especially during periods of global upheaval. Traditional metrics like average or median lead times capture only broad trends, often missing subtle yet impactful distributional shifts. In this study, we introduce a normalized L1 (Manhattan) distance to measure the full distributional divergence in Airbnb booking lead times from 2018 to 2022, with particular emphasis on the COVID-19 pandemic. Using data from four major U.S. cities, we find a two-phase pattern of disruption: a sharp initial change at the pandemic’s onset, followed by partial recovery but persistent divergences from pre-2018 norms. Our approach reveals shifts in travelers' planning horizons that remain undetected by conventional summary statistics. These findings highlight the importance of examining the entire lead-time distribution when forecasting demand and setting pricing strategies. By capturing nuanced changes in booking behaviors, the normalized L1 metric enhances both demand forecasting and the broader strategic toolkit for tourism stakeholders, from revenue management and marketing to operational planning, amid continued market volatility.

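
    The divergence measure is simple to compute; this sketch assumes one common normalization (dividing the L1 distance between two probability vectors by 2 so the result lies in [0, 1]), and the toy histograms are invented for illustration:

```python
import numpy as np

def normalized_l1(p, q):
    """Normalized L1 (Manhattan) distance between two discrete distributions.
    Inputs are renormalized to sum to 1; dividing by 2 bounds the result in [0, 1]
    (one common normalization, assumed here for illustration)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return 0.5 * np.abs(p - q).sum()

# Toy lead-time histograms: bookings counted by days-ahead bucket
pre  = np.array([10, 30, 40, 15, 5])   # pre-pandemic shape
post = np.array([35, 35, 20, 7, 3])    # shifted toward short lead times
d = normalized_l1(pre, post)
```
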
  • with Rob Weiss and Liz Medina

    We examine how prior selection affects the Bayesian Dirichlet Auto-Regressive Moving Average (B-DARMA) model for compositional time series. Through three simulation scenarios—correct specification, overfitting, and underfitting—we compare five priors: informative, horseshoe, Laplace, spike-and-slab, and hierarchical. Under correct specification, all priors perform similarly, though the horseshoe and hierarchical priors yield slightly lower bias. Overfitting highlights the advantage of strong shrinkage (particularly the horseshoe), while no prior can rectify model misspecification when essential AR/MA terms are omitted.

    We also apply B-DARMA to daily S&P 500 sector trading data, using a large-lag model to demonstrate overparameterization risks. Shrinkage priors effectively mitigate spurious complexity, whereas the weakly informative prior inflates errors in volatile sectors. These results underscore the importance of carefully chosen priors and model complexity in compositional time-series analysis, especially in high-dimensional settings.

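
    To build intuition for why the horseshoe shrinks harder than a Laplace prior, one can compare prior draws directly. This Monte Carlo sketch is illustrative only (unit scales assumed, not the paper's simulation design):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# Horseshoe: beta | lambda ~ N(0, lambda^2), lambda ~ half-Cauchy(0, 1)
lam = np.abs(rng.standard_cauchy(n))
horseshoe = rng.normal(0.0, lam)

# Laplace (Bayesian-lasso analogue) with unit scale, for comparison
laplace = rng.laplace(0.0, 1.0, n)

# The horseshoe places more mass both very near zero (aggressive shrinkage
# of noise) and far in the tails (signals escape shrinkage)
frac_near_zero_hs  = np.mean(np.abs(horseshoe) < 0.1)
frac_near_zero_lap = np.mean(np.abs(laplace) < 0.1)
frac_tail_hs  = np.mean(np.abs(horseshoe) > 10)
frac_tail_lap = np.mean(np.abs(laplace) > 10)
```
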
  • with Kai Brusch and Rob Weiss

    In the hospitality industry, lead time data are a form of compositional data that are crucial for business planning, resource allocation, and staffing. Hospitality businesses accrue fees daily, but recognition of these fees is often deferred. This paper presents a novel class of Bayesian time series models, the Bayesian Dirichlet auto-regressive moving average (B-DARMA) model, designed specifically for compositional time series. The model is motivated by the analysis of five years of daily fees data from Airbnb, with the aim of forecasting the proportion of future fees that will be recognized in 12 consecutive monthly intervals. Each day’s compositional data are modeled as Dirichlet distributed, given the mean and a scale parameter. The mean is modeled using a vector auto-regressive moving average process, which depends on previous compositional data, previous compositional parameters, and daily covariates. The B-DARMA model provides a robust solution for analyzing large compositional vectors and time series of varying lengths. It offers efficiency gains through the choice of priors, yields interpretable parameters for inference, and produces reasonable forecasts. The paper also explores the use of normal and horseshoe priors for the vector auto-regressive and vector moving average coefficients, and for regression coefficients. The efficacy of the B-DARMA model is demonstrated through simulation studies and an analysis of Airbnb data.

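
    The observation step described above, a Dirichlet draw given a compositional mean and a scale parameter with the mean living in ALR space, can be sketched as follows; the parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def alr_inv(eta):
    """Map ALR coordinates back to a compositional mean on the simplex."""
    e = np.exp(np.append(eta, 0.0))
    return e / e.sum()

# ALR-space mean (e.g., output of a VARMA recursion) and a scale parameter;
# a large scale concentrates the Dirichlet draws near the mean
eta = np.array([0.4, -0.2, 0.1])   # 3 coordinates -> 4-part composition
phi = 500.0

mu = alr_inv(eta)              # compositional mean on the simplex
y = rng.dirichlet(phi * mu)    # one day's observed composition
```
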
  • with Erica Savage

    The rise of remote work has sparked widespread claims that stays in short-term rentals are getting longer, but rigorous, nation-wide evidence remains limited. This study interrogates that claim by assembling a booking-weighted panel of U.S. Airbnb reservations spanning 2019–2024 and by applying a comprehensive suite of statistical tools. We fit Gamma and log-normal distributions via weighted maximum likelihood to characterise the full shape of the nights-per-booking (NPB) distribution; we quantify pandemic-phase shifts with weighted negative-binomial regression and a two-part hurdle model that isolates long-stay behaviour; and we capture month-to-month dynamics with a seasonal ARIMA(0,1,1)[12] specification. Together, these approaches allow us to (i) benchmark competing density forms, (ii) separate changes in the frequency versus the duration of long stays, and (iii) assess the forecasting value of pandemic-era regime indicators. The paper offers the first large-scale, post-pandemic portrait of U.S. Airbnb length-of-stay patterns and sets the stage for discussing how remote-work–driven travel may reconfigure pricing strategy, zoning policy, and lodging-tax design.

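
    Weighted maximum likelihood for the Gamma density can be sketched with a generic optimizer. This Python version on synthetic data is illustrative only (the paper's data and estimation code are not reproduced here):

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(1)

# Toy nights-per-booking sample with hypothetical booking weights
nights = rng.gamma(shape=2.0, scale=1.8, size=5000)
weights = rng.uniform(0.5, 2.0, size=5000)

def weighted_gamma_nll(params, x, w):
    """Weighted negative log-likelihood for a Gamma(shape, scale) model."""
    shape, scale = params
    if shape <= 0 or scale <= 0:
        return np.inf
    return -np.sum(w * stats.gamma.logpdf(x, a=shape, scale=scale))

res = optimize.minimize(weighted_gamma_nll, x0=[1.0, 1.0],
                        args=(nights, weights), method="Nelder-Mead")
shape_hat, scale_hat = res.x
```
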
  • with Erica Savage and Kai Brusch

    Many sectors (such as hospitality) face the challenge of forecasting metrics that span multiple time axes – where the timing of an event's occurrence is distinct from the timing of its recording or initiation. In this paper, we present a novel two-part forecasting methodology that addresses this challenge by treating the forecasting process as a time-shift operator. The methodology combines univariate time series forecasting to predict total bookings on booking dates with the Bayesian Dirichlet Auto-Regressive Moving Average (B-DARMA) model. The aim is to forecast the allocation of future bookings across different trip dates based on the time between booking and trip (lead time). This approach provides a sensible solution for forecasting demand across different time axes, offering interpretable results, flexibility, and the potential for improved accuracy. The efficacy of the two-part methodology is illustrated through an analysis of Airbnb booking data.

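
    The time-shift idea, totals forecast on booking dates combined with lead-time shares to produce trip-date demand, can be sketched with toy numbers (all values hypothetical):

```python
import numpy as np

# Step 1 (illustrative): forecast total bookings made on each booking date
booking_totals = np.array([100.0, 120.0, 90.0])   # three booking dates

# Step 2 (illustrative): forecast lead-time shares; rows = booking dates,
# columns = lead times 0..3 days, each row summing to 1
lead_shares = np.array([
    [0.4, 0.3, 0.2, 0.1],
    [0.5, 0.2, 0.2, 0.1],
    [0.3, 0.4, 0.2, 0.1],
])

# Time-shift operator: booking date b with lead l contributes to trip date b + l
horizon = len(booking_totals) + lead_shares.shape[1] - 1
trip_demand = np.zeros(horizon)
for b, total in enumerate(booking_totals):
    for l, share in enumerate(lead_shares[b]):
        trip_demand[b + l] += total * share
```

    Because each share row sums to one, total bookings are conserved when re-indexed onto trip dates.
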
  • with Rob Weiss

    High-dimensional vector autoregressive (VAR) models offer a versatile framework for multivariate time series analysis, yet they face critical challenges from over-parameterization and uncertain lag order. In this paper, we systematically compare three Bayesian shrinkage priors (horseshoe, lasso, and normal) with two frequentist regularization approaches (ridge and nonparametric shrinkage) under three carefully crafted simulation scenarios. These scenarios encompass (i) overfitting in a low-dimensional setting, (ii) sparse high-dimensional processes, and (iii) a combined scenario where both large dimension and overfitting complicate inference.

    We evaluate each method in terms of parameter estimation (via root mean squared error, coverage, and interval length) and out-of-sample forecasting (via one-step-ahead forecast RMSE). Our findings show that local-global Bayesian methods, particularly the horseshoe, dominate in maintaining accurate coverage and minimizing parameter error, even when the model is heavily over-parameterized. Frequentist ridge often yields competitive forecasts but can underestimate uncertainty, leading to sub-nominal coverage. A real-data application using macroeconomic variables from Canada illustrates how these methods perform in practice, reinforcing the advantages of local-global priors in stabilizing inference when dimension or lag order is inflated.

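
    The frequentist ridge baseline has a convenient closed form. This sketch fits a deliberately over-long lag order to synthetic VAR(1) data; the toy setup (dimensions, lag order, penalty) is not the paper's simulation design:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate a bivariate VAR(1), then deliberately fit an over-long lag order
A_true = np.array([[0.5, 0.1],
                   [0.0, 0.4]])
T, k = 400, 2
x = np.zeros((T, k))
for t in range(1, T):
    x[t] = A_true @ x[t - 1] + rng.normal(0.0, 1.0, k)

# Ridge-regularized least squares: B_hat = (Z'Z + lam I)^{-1} Z'Y
p, lam = 5, 1.0
Y = x[p:]
Z = np.hstack([x[p - j:T - j] for j in range(1, p + 1)])  # lags 1..p, newest first
B_hat = np.linalg.solve(Z.T @ Z + lam * np.eye(k * p), Z.T @ Y)

A1_hat = B_hat[:k].T   # estimated lag-1 coefficient matrix
extra = B_hat[k:]      # coefficients on the superfluous lags 2..5 (truth: zero)
```
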
Research

Publications

Pre-Prints

  • with Rob Weiss

    We introduce a new class of Bayesian Dirichlet Auto-Regressive Moving Average with Dirichlet Auto-Regressive Conditional Heteroskedasticity (B-DARMA-DARCH) models for analyzing and forecasting compositional time series data. This model extends the standard B-DARMA framework by incorporating a DARCH component to capture time-varying volatility, effectively modeling both the mean structure and heteroskedasticity inherent in compositional data.

    Applying the B-DARMA-DARCH model to Airbnb's currency fee proportions across different regions, we demonstrate its ability to capture temporal dynamics and volatility patterns in real-world data. The model outperforms traditional B-DARMA and Bayesian transformed VARMA models in terms of forecast accuracy and residual diagnostics. Notably, it effectively captures significant disruptions such as those caused by the COVID-19 pandemic, highlighting its robustness in the face of structural breaks and extreme events. The B-DARMA-DARCH model offers a flexible and powerful framework for modeling dynamic compositional data with time-varying proportions and heteroskedasticity, making it a valuable tool for various applications in finance and other fields.

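
    A GARCH-style recursion for the concentration can be sketched as below. The exact B-DARMA-DARCH parameterization is defined in the paper, so treat this recursion on the log concentration, and all parameter values, as a hypothetical illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

def alr(p):
    return np.log(p[:-1] / p[-1])

def alr_inv(eta):
    e = np.exp(np.append(eta, 0.0))
    return e / e.sum()

# Hypothetical DARCH-style recursion (illustration only):
#   log(phi_t) = omega + a * ||e_{t-1}||^2 + b * log(phi_{t-1}),
# where e_{t-1} is the previous ALR-scale residual.
omega, a, b = 1.0, -0.5, 0.8           # a < 0: large residuals lower precision
mu = alr_inv(np.array([0.2, -0.1]))    # fixed 3-part compositional mean
T = 200

log_phi = np.empty(T)
log_phi[0] = 4.0
y = np.empty((T, 3))
for t in range(T):
    y[t] = rng.dirichlet(np.exp(log_phi[t]) * mu)
    resid = alr(y[t]) - alr(mu)
    if t + 1 < T:
        log_phi[t + 1] = omega + a * np.sum(resid ** 2) + b * log_phi[t]
```
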
  • Observation-driven Dirichlet models for compositional time series often use the additive log-ratio (ALR) link and include a moving-average (MA) term built from ALR residuals. In the standard B-DARMA recursion, the usual MA regressor has a nonzero conditional mean under the Dirichlet likelihood, which biases the mean path and blurs the interpretation of MA coefficients. We propose a minimal change: replace the raw regressor with a centered innovation, computed in closed form using digamma functions. Centering restores mean-zero innovations for the MA block without altering either the likelihood or the ALR link. We provide simple identities for the conditional mean and the forecast recursion, show first-order equivalence to a digamma-link DARMA while retaining a closed-form inverse to the compositional mean, and supply ready-to-use code. A weekly application to the Federal Reserve H.8 bank-asset composition compares the original (raw-MA) and centered specifications under both fixed-holdout and rolling one-step designs. The centered formulation improves log predictive scores with essentially identical point accuracy and markedly cleaner Hamiltonian Monte Carlo diagnostics.
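
    The centering identity is easy to verify by simulation: under y ~ Dirichlet(alpha), E[log(y_j / y_D)] = psi(alpha_j) - psi(alpha_D), so the digamma-centered innovation is mean zero while the raw ALR residual is not. A sketch (toy parameter values):

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(11)

phi = 30.0
mu = np.array([0.5, 0.3, 0.2])
alpha = phi * mu

draws = rng.dirichlet(alpha, size=200_000)
alr_draws = np.log(draws[:, :-1] / draws[:, -1:])

# Raw MA regressor: ALR residual around the ALR of the mean (biased: mean != 0)
raw = alr_draws - np.log(mu[:-1] / mu[-1])
# Centered innovation: subtract the closed-form digamma mean instead
centered = alr_draws - (digamma(alpha[:-1]) - digamma(alpha[-1]))

raw_bias = raw.mean(axis=0)
centered_bias = centered.mean(axis=0)
```
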

  • This commentary translates the central ideas in “Lead times in flux” into a practice-ready handbook in R. The original article measures change in the full distribution of booking lead times with a normalized L1 distance and tracks that divergence across months, both year over year and against a fixed 2018 reference. It also provides a bound that links divergence and remaining horizon to the relative error of pickup forecasts. We implement these ideas end to end in R, using a minimal data schema and providing runnable scripts, simulated examples, and a prespecified evaluation plan. All results use synthetic data, so the exposition is fully reproducible.

Working Papers

  • with Sean Wilson, Liz Medina, and Jess Needleman

    Compositional time series are vectors of positive shares that sum to one and arise in analyses of mixes, allocations, and market shares. Classical log-ratio VAR models are convenient but can produce incoherent forecasts unless they are reclosed and often ignore time-varying dispersion. The Bayesian Dirichlet autoregressive moving-average (BDARMA) framework addresses these issues by combining a Dirichlet likelihood with ARMA dynamics on log-ratio coordinates for the simplex mean and, optionally, for concentration. We introduce darma, an R package implementing BDARMA with additive, centered, and isometric log-ratio coordinates, covariates in both mean and concentration, forecasting via generated quantities, posterior predictive checks, and PSIS-LOO model comparison. The package includes a centered moving-average parameterization that restores mean-zero innovations for the MA block without altering the Dirichlet likelihood and provides simulation-based multistep forecasting that is simplex-coherent by design.
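
    The isometric log-ratio coordinates mentioned above can be sketched with one common orthonormal basis choice; the basis used inside darma may differ, so this is illustrative:

```python
import numpy as np

def _helmert_basis(D):
    """One orthonormal basis of the zero-sum (clr) hyperplane in R^D."""
    V = np.zeros((D, D - 1))
    for j in range(1, D):
        V[:j, j - 1] = 1.0 / np.sqrt(j * (j + 1))
        V[j, j - 1] = -j / np.sqrt(j * (j + 1))
    return V

def ilr(p):
    """Isometric log-ratio coordinates: D-part simplex -> R^(D-1)."""
    p = np.asarray(p, dtype=float)
    logp = np.log(p)
    clr = logp - logp.mean()          # centered log-ratio, sums to zero
    return clr @ _helmert_basis(p.size)

def ilr_inv(z):
    """Inverse ILR: coordinates back to the simplex."""
    z = np.asarray(z, dtype=float)
    clr = _helmert_basis(z.size + 1) @ z
    e = np.exp(clr)
    return e / e.sum()

comp = np.array([0.4, 0.3, 0.2, 0.1])
z = ilr(comp)
back = ilr_inv(z)
```
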

  • with Liz Medina and Jess Needleman

    This paper investigates the distributional properties of daily lead times for two distinct demand metrics on the Airbnb platform, specifically Nights Booked (a volume-based measure) and Gross Booking Value (a revenue-based measure). Drawing on data from a large North American region over the period 2019-01-01 to 2024-12-01, we treat daily lead-time allocations as compositional vectors and apply a multi-faceted approach integrating compositional data transformations, tail modeling, distributional fitting, and Wasserstein-based divergence measures. Our analysis shows that revenue-based demand systematically diverges from volume-based demand in a mid-range horizon (roughly 30–90 days) rather than in the extreme tail. We also identify structural breaks in the daily divergence time series, suggesting that macro disruptions can permanently alter how volume vs. revenue allocations evolve. Comparisons of lognormal, Weibull, Gamma, and nonparametric generalized additive models (GAMs) reveal that a parsimonious Gamma distribution consistently delivers strong day-level fits for both metrics, even outperforming a spline-based approach in Kullback–Leibler divergence. These findings highlight the benefits of a distribution-level methodology for business and economic applications where volume and revenue behaviors can decouple, especially in the wake of major external shocks.
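
    A Wasserstein-based divergence between two empirical lead-time distributions reduces to one library call; synthetic Gamma samples stand in for the proprietary data in this sketch:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(5)

# Toy daily lead-time samples (days between booking and trip), one per metric;
# the Gamma parameters are invented for illustration
nights_leads = rng.gamma(shape=1.5, scale=20.0, size=4000)  # volume-weighted
gbv_leads = rng.gamma(shape=1.5, scale=30.0, size=4000)     # revenue-weighted

# 1-Wasserstein distance between the two empirical distributions
w = wasserstein_distance(nights_leads, gbv_leads)
same = wasserstein_distance(nights_leads, nights_leads)
```
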

  • This paper introduces the Cradle prior, a new global–local shrinkage prior for high-dimensional regression and related problems. By blending a half-Laplace local scale with a half-Cauchy global scale, the Cradle prior is designed to “cradle” small coefficients near zero while remaining flexible enough to accommodate moderately large coefficients. The construction yields a sharper spike at zero than standard Laplace (Lasso) priors, yet avoids the extremely fat tails of the horseshoe. We investigate theoretical properties, including tail behavior and posterior concentration, and present extensive simulation results comparing Cradle with existing global–local methods (e.g., horseshoe, Bayesian Lasso). Empirical findings suggest that Cradle often outperforms competing methods, especially in scenarios where the underlying signal is sparse yet features moderately large effect sizes. We illustrate how the Cradle prior can be applied to real genomic datasets, where large numbers of predictors but relatively few moderately sized signals are common. Code and examples are provided to encourage adoption.
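
    The hierarchy described above can be sampled directly to see the sharper spike at zero. This Monte Carlo sketch uses unit hyperparameters and is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200_000

# Illustrative draws from a Cradle-style hierarchy:
#   beta_j | lambda_j, tau ~ N(0, tau^2 * lambda_j^2),
#   lambda_j ~ half-Laplace(1)  (local),   tau ~ half-Cauchy(1)  (global)
tau = float(np.abs(rng.standard_cauchy()))
lam = np.abs(rng.laplace(0.0, 1.0, n))
cradle = rng.normal(0.0, tau * lam)

# Bayesian-lasso analogue with the same global scale, for comparison
lasso = rng.laplace(0.0, tau, n)

# The local scale mixture concentrates more mass near zero ("cradling"
# small coefficients) than a plain Laplace prior of the same global scale
frac_small_cradle = np.mean(np.abs(cradle) < 0.05 * tau)
frac_small_lasso = np.mean(np.abs(lasso) < 0.05 * tau)
```
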

  • Spreadsheets and black-box machine-learning models dominate forecasting conversations, yet most practitioners operate in a quieter middle lane. This Note frames forecasting as statistical inference carried forward, showing that the decisive step is not deeper algorithms but a clear statement of the data-generating process (DGP) behind each business question. Three Airbnb use cases illustrate how the DGP shapes modelling choices: (1) quarterly booking-volume budgeting for resource planning, (2) real-time demand balancing across listings, and (3) multi-year regulatory scenario modelling.

  • We develop a principled empirical Bayes workflow for hierarchical models with many groups and sparse data, and we apply it to city-level heterogeneity in Airbnb cancellations. The workflow fits each city once under a weak baseline prior, then estimates population hyperparameters by either a deconvolution moment estimator or a posterior recycling maximum likelihood estimator that reuses the city posteriors through importance ratios stabilized by Pareto smoothing. The same separate fits can be reweighted to approximate the hierarchical posteriors without refitting. We provide diagnostics for the presence of random effects and for hyperprior misfit, basic asymptotic guarantees, and extensions to multivariate random effects with correlated intercepts and slopes. The approach serves as a fast, calibrated front end to a full joint hierarchical analysis.
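
    The deconvolution moment estimator can be sketched on synthetic city-level estimates: the variance of the observed estimates equals the random-effect variance plus the average squared standard error, which gives a method-of-moments recovery (all names and values here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate per-city estimates: true effects theta_i ~ N(mu, tau^2), observed
# theta_hat_i = theta_i + noise with known standard error se_i
mu_true, tau_true = 0.10, 0.05
n_cities = 500
theta = rng.normal(mu_true, tau_true, n_cities)
se = rng.uniform(0.02, 0.08, n_cities)
theta_hat = theta + rng.normal(0.0, se)

# Deconvolution moment estimator (method-of-moments sketch):
#   Var(theta_hat) = tau^2 + E[se^2]  =>  tau^2 ~ Var(theta_hat) - mean(se^2)
mu_hat = theta_hat.mean()
tau2_hat = max(theta_hat.var(ddof=1) - np.mean(se ** 2), 0.0)
tau_hat = np.sqrt(tau2_hat)
```
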

  • We propose a parsimonious intervention model for compositional time series (common in tourism applications such as market shares of destinations, transport modes, booking channels, or accommodation types) in which a product launch induces a low-rank, smoothly timed shift in the mean trajectory on the Aitchison simplex. The method combines (i) a directional factor that captures how mass re-allocates across categories after launch, (ii) an innovation-form DARMA(p,q) recursion for logratio coordinates with component-specific covariates, and (iii) a Dirichlet observation model with time-varying concentration. The intervention is governed by a launch-anchored logistic gate that learns both timing and speed. We discuss identifiability, priors, computation, and forecasting, prove basis invariance under orthonormal isometries of logratio space, and provide a practical workflow for tourism demand analytics. A simulation design and an empirical blueprint illustrate estimation, interpretation, and forecasting with exogenous tourism indicators.
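
    The launch-anchored logistic gate and the rank-one shift it scales can be sketched with toy values (the gate's parameterization here, midpoint shift delta and speed kappa, is an assumption for illustration):

```python
import numpy as np

def logistic_gate(t, t_launch, delta, kappa):
    """Smooth 0-to-1 gate anchored at the launch, with learned midpoint
    shift (delta) and speed (kappa) -- hypothetical parameterization."""
    return 1.0 / (1.0 + np.exp(-(t - (t_launch + delta)) / kappa))

t = np.arange(100)
t_launch, delta, kappa = 50, 5.0, 3.0
g = logistic_gate(t, t_launch, delta, kappa)

# Low-rank (rank-1) shift: mass moves toward category 1, away from 2 and 3
d = np.array([0.8, -0.5, -0.3])          # direction in 3-d logratio coordinates
baseline = np.array([0.1, -0.2, 0.05])   # pre-launch logratio mean
eta = baseline + np.outer(g, d)          # time-varying mean path, shape (100, 3)
```
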