---
title: "Sparse Sufficient Dimension Reduction via Penalized Principal Machines"
author:
  - Jungmin Shin (The Ohio State University)
  - Seung Jun Shin (Korea University)
date: "`r Sys.Date()`"
output:
  rmarkdown::pdf_document:
    toc: true
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Sparse Sufficient Dimension Reduction via Penalized Principal Machines}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(ppmSDR)
```

# Introduction

Sufficient dimension reduction (SDR) reduces the dimensionality of predictors $\mathbf{X}$ while preserving their relationship with a response $Y$. SDR estimates a basis $\mathbf{B}$ of the *central subspace* $\mathcal{S}_{Y\mid\mathbf{X}}$ defined by
$$
Y \perp \mathbf{X} \mid \mathbf{B}^\top \mathbf{X}.
$$
In high dimensions, **sparse SDR** improves interpretability and accuracy by driving many rows of $\mathbf{B}$ to zero, which is achieved by adding a sparsity-inducing penalty to the SDR optimization problem.

The **ppmSDR** package implements a unified framework for sparse SDR based on the penalized principal machine ($\mathrm{P}^2\mathrm{M}$). A single front-end, `ppm()`, dispatches to ten loss-specific estimators, all fitted by one group coordinate descent (GCD) engine, and `ppm_tune()` selects the sparsity parameter by cross-validation.

# Penalized Principal Machines

## Principal machine

Given data $(y_i, \mathbf{x}_i) \in \mathbb{R} \times \mathbb{R}^p$ with centered predictors, and a sequence of cutoffs $r_1 < \cdots < r_h$, the sample principal machine solves
$$
(\hat{\beta}_{0k}, \hat{\boldsymbol{\beta}}_k) = \arg\min_{\beta_0, \boldsymbol{\beta}} \boldsymbol{\beta}^\top \hat{\Sigma} \boldsymbol{\beta} + \frac{c}{n} \sum_{i=1}^n L_k\!\left(\tilde{y}_{ik}, \beta_0 + \boldsymbol{\beta}^\top \mathbf{x}_i\right), \quad k = 1, \ldots, h,
$$
where $\hat{\Sigma} = \sum_i \mathbf{x}_i \mathbf{x}_i^\top / n$.
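
To see what the first term of the objective measures, it may help to expand the quadratic form. The short derivation below uses only the definition of $\hat{\Sigma}$ given above and the stated centering of the predictors; it introduces no additional assumptions:
$$
\boldsymbol{\beta}^\top \hat{\Sigma} \boldsymbol{\beta}
= \boldsymbol{\beta}^\top \left( \frac{1}{n} \sum_{i=1}^n \mathbf{x}_i \mathbf{x}_i^\top \right) \boldsymbol{\beta}
= \frac{1}{n} \sum_{i=1}^n \left( \boldsymbol{\beta}^\top \mathbf{x}_i \right)^2 .
$$
Because the predictors are centered, the right-hand side is the sample variance of the linear index $\boldsymbol{\beta}^\top \mathbf{x}_i$. The first term therefore controls the spread of the fitted index, and the constant $c$ sets how heavily the loss term is weighted against it.
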
The basis $\hat{\mathbf{B}}$ is estimated by the leading $d$ eigenvectors of $\sum_{k=1}^h \hat{\boldsymbol{\beta}}_k \hat{\boldsymbol{\beta}}_k^\top$.

- For the **response-based PM (RPM)**, $\tilde{y}_{ik} = \mathbb{I}\{y_i \ge r_k\} - \mathbb{I}\{y_i < r_k\}$ and the loss $L_k$ is fixed across cutoffs.
- For the **loss-based PM (LPM)**, the loss $L_k$ varies with $r_k$ while $\tilde{y}_{ik}$ is fixed at $y_i$.

## Penalized estimation

Under the sparsity assumption the slopes $\boldsymbol{\beta}_k$ share a common support across $k$, leading to the row-group penalized objective
$$
\sum_{k=1}^{h} \left[ \boldsymbol{\beta}_k^\top \hat{\Sigma} \boldsymbol{\beta}_k + \frac{c}{n} \sum_{i=1}^n L_k(\tilde{y}_{ik}, \beta_{0k} + \boldsymbol{\beta}_k^\top \mathbf{x}_i) \right] + \sum_{j=1}^{p} p_{\lambda}\!\left(\|\boldsymbol{\beta}_{(j)}\|_2\right),
$$
where $\boldsymbol{\beta}_{(j)} = (\beta_{1j}, \ldots, \beta_{hj})^\top$ and $p_\lambda(\cdot)$ is the group LASSO, SCAD or MCP penalty. The penalty acts group-wise on all coefficients of predictor $j$, so variable selection corresponds to identifying the predictors that form a sparse basis.

## Supported losses and algorithms

| Machine | Response | Type | Loss $L_k(\tilde y_k, f)$ | Algorithm |
|:--|:--|:--|:--|:--|
| $\mathrm{P}^2\mathrm{LSM}$ | Continuous | RPM | $(1 - \tilde y_k f)^2$ | GCD |
| $\mathrm{P}^2\mathrm{WLSM}$ | Binary | LPM | $w_k(1 - y f)^2$ | GCD |
| $\mathrm{P}^2\mathrm{LR}$ | Continuous | RPM | $\log(1 + e^{-\tilde y_k f})$ | Iterative GCD |
| $\mathrm{P}^2\mathrm{WLR}$ | Binary | LPM | $w_k \log(1 + e^{-y f})$ | Iterative GCD |
| $\mathrm{P}^2\mathrm{AR}$ | Both | LPM | $(y-f)^2(\rho_k\mathbb{I}\{y \ge f\} + (1-\rho_k)\mathbb{I}\{y