| Title: | Penalized Principal Machine for Sufficient Dimension Reduction |
|---|---|
| Description: | A unified, computation-friendly framework for penalized principal machines (P2M), a class of sparse sufficient dimension reduction (SDR) estimators for regression and binary classification. Principal machines (PM) estimate the central subspace by solving a family of convex-loss problems over several cutoffs; their penalized counterparts (P2M) add a row-group sparsity penalty so that dimension reduction and variable selection are performed simultaneously. All estimators are fitted by a single group coordinate descent (GCD) algorithm that accommodates least squares, logistic, asymmetric least squares, L2-hinge, hinge (support vector machine, SVM) and quantile losses, together with the least absolute shrinkage and selection operator (LASSO), the smoothly clipped absolute deviation (SCAD) penalty and the minimax concave penalty (MCP). Methods are described in Li, Artemiou and Li (2011) <doi:10.1214/11-AOS932>, Shin and Artemiou (2017) <doi:10.1016/j.csda.2016.12.003>, Artemiou, Dong and Shin (2021) <doi:10.1016/j.patcog.2020.107768> and Breheny and Huang (2015) <doi:10.1007/s11222-013-9424-2>. |
| Authors: | Jungmin Shin [aut, cre] (Department of Biomedical Informatics, The Ohio State University), Seung Jun Shin [aut] (Department of Statistics, Korea University) |
| Maintainer: | Jungmin Shin <[email protected]> |
| License: | GPL-3 |
| Version: | 2.0.0 |
| Built: | 2026-06-20 09:46:21 UTC |
| Source: | https://github.com/c16267/ppmsdr |
The Boston housing data of Harrison and Rubinfeld (1978), as distributed in
MASS. Following the real-data analysis in the package's accompanying
article, the binary Charles-River dummy variable (chas) is removed,
leaving twelve continuous predictors and the response medv. Values are on
their natural scale; standardize them before fitting (see Examples).
bostonboston
A data frame with 506 rows and 13 variables:
per-capita crime rate by town.
proportion of residential land zoned for lots over 25,000 sq ft.
proportion of non-retail business acres per town.
nitrogen-oxides concentration (parts per 10 million).
average number of rooms per dwelling.
proportion of owner-occupied units built prior to 1940.
weighted mean of distances to five Boston employment centres.
index of accessibility to radial highways.
full-value property-tax rate per $10,000.
pupil-teacher ratio by town.
, where is the proportion of the
Black population by town. See the note below.
lower-status percentage of the population.
median value of owner-occupied homes in $1000s (the response).
The variable b embeds a racist premise from the original 1978 study
and is retained only to reproduce the published analysis. It should not be
used uncritically; see the discussion in the documentation of
MASS::Boston.
Harrison, D. and Rubinfeld, D. L. (1978) Hedonic prices and the demand for clean air. Journal of Environmental Economics and Management, 5, 81–102. Distributed in the MASS package.
data(boston) x <- scale(as.matrix(boston[, setdiff(names(boston), "medv")])) y <- as.numeric(scale(boston$medv)) fit <- ppm(x, y, loss = "lssvm", penalty = "grSCAD", lambda = 8e-3) summary(fit, d = 2)data(boston) x <- scale(as.matrix(boston[, setdiff(names(boston), "medv")])) y <- as.numeric(scale(boston$medv)) fit <- ppm(x, y, loss = "lssvm", penalty = "grSCAD", lambda = 8e-3) summary(fit, d = 2)
Fits a penalized principal machine (P2M), a sparse sufficient dimension reduction (SDR) estimator, through a single group coordinate descent (GCD) engine. A principal machine (PM) estimates the basis of the central subspace by solving a family of convex-loss problems over several cutoffs (slices); the penalized version adds a row-group sparsity penalty so that dimension reduction and variable selection are performed simultaneously.
ppm( x, y, loss = "lssvm", H = 10, C = 1, lambda = 0.01, gamma = 3.7, penalty = c("grSCAD", "grLasso", "grMCP"), max.iter = 100, ... )ppm( x, y, loss = "lssvm", H = 10, C = 1, lambda = 0.01, gamma = 3.7, penalty = c("grSCAD", "grLasso", "grMCP"), max.iter = 100, ... )
x |
A numeric matrix or data frame of predictors, of dimension
|
y |
A response vector of length |
loss |
Character string selecting the loss function. One of
|
H |
Number of cutoffs (slices); a single integer |
C |
Positive cost parameter that balances the loss against the
covariance term. Default |
lambda |
Positive regularization parameter controlling sparsity.
Default |
gamma |
Concavity parameter of the SCAD/MCP penalty; must exceed |
penalty |
Penalty type: |
max.iter |
Maximum number of GCD iterations. Default |
... |
Additional arguments passed to the underlying solver. The most
useful is |
ppm() is a single front-end that dispatches to the loss-specific solver
selected by loss. Two families are supported, following the taxonomy of
Shin and Shin (2024):
Response-based PM (RPM) for a continuous response, where the loss is
fixed and the pseudo-response varies across slices:
"lssvm" (least squares, P2LSM), "l2svm" (L2-hinge, P2L2M),
"svm" (hinge, P2SVM), "logit" (logistic, P2LR),
"asls" (asymmetric least squares, P2AR) and "qr" (quantile, P2QR).
Loss-based PM (LPM) for a binary response coded internally as
, where the loss varies across slices:
"wlssvm" (P2WLSM), "wl2svm" (P2WL2M), "wsvm" (P2WSVM) and
"wlogit" (P2WLR).
Acronyms used above: SDR (sufficient dimension reduction), PM (principal
machine), P2M (penalized principal machine), GCD (group coordinate descent),
SVM (support vector machine). The penalty penalty is one of the group
LASSO (least absolute shrinkage and selection operator), the group SCAD
(smoothly clipped absolute deviation) or the group MCP (minimax concave
penalty), passed as "grLasso", "grSCAD" or "grMCP".
The basis of the central subspace is estimated by the leading eigenvectors
of the working matrix , where
is the slope estimated at the -th cutoff.
An object of S3 class "ppm", a list containing:
the estimated working matrix (a p by p symmetric matrix).
the eigenvalues and eigenvectors of M; the
leading d eigenvectors estimate the basis of the central subspace.
the (validated) input data.
the fitting settings.
"continuous" or "binary".
the sample size and the number of predictors.
the matched call.
Li, B., Artemiou, A. and Li, L. (2011) Principal support vector machines for linear and nonlinear sufficient dimension reduction. The Annals of Statistics, 39(6), 3182–3210. doi:10.1214/11-AOS932
Shin, S. J. and Artemiou, A. (2017) Penalized principal logistic regression for sparse sufficient dimension reduction. Computational Statistics & Data Analysis, 111, 48–58. doi:10.1016/j.csda.2016.12.003
Breheny, P. and Huang, J. (2015) Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Statistics and Computing, 25, 173–187. doi:10.1007/s11222-013-9424-2
set.seed(1) n <- 1000; p <- 10 B <- matrix(0, p, 2); B[1, 1] <- B[2, 2] <- 1 x <- matrix(rnorm(n * p), n, p) y <- (x %*% B[, 1]) / (0.5 + (x %*% B[, 2] + 1)^2) + 0.2 * rnorm(n) ## penalized principal least-squares SVM (P2LSM) with the group SCAD penalty fit <- ppm(x, y, loss = "lssvm", penalty = "grSCAD", lambda = 0.01) round(fit$evectors[, 1:2], 3) print(fit) summary(fit) ## binary response: penalized principal asymmetric least squares (P2AR) yb <- sign(y) fitw <- ppm(x, yb, loss = "asls", penalty = "grSCAD", lambda = 0.04) round(fitw$evectors[, 1:2], 3) print(fitw) summary(fitw) data(boston) xb <- scale(as.matrix(boston[, setdiff(names(boston), "medv")])) yb <- as.numeric(scale(boston$medv)) fit_b <- ppm(xb, yb, loss = "lssvm", penalty = "grSCAD", lambda = 8e-3) summary(fit_b, d = 2)set.seed(1) n <- 1000; p <- 10 B <- matrix(0, p, 2); B[1, 1] <- B[2, 2] <- 1 x <- matrix(rnorm(n * p), n, p) y <- (x %*% B[, 1]) / (0.5 + (x %*% B[, 2] + 1)^2) + 0.2 * rnorm(n) ## penalized principal least-squares SVM (P2LSM) with the group SCAD penalty fit <- ppm(x, y, loss = "lssvm", penalty = "grSCAD", lambda = 0.01) round(fit$evectors[, 1:2], 3) print(fit) summary(fit) ## binary response: penalized principal asymmetric least squares (P2AR) yb <- sign(y) fitw <- ppm(x, yb, loss = "asls", penalty = "grSCAD", lambda = 0.04) round(fitw$evectors[, 1:2], 3) print(fitw) summary(fitw) data(boston) xb <- scale(as.matrix(boston[, setdiff(names(boston), "medv")])) yb <- as.numeric(scale(boston$medv)) fit_b <- ppm(xb, yb, loss = "lssvm", penalty = "grSCAD", lambda = 8e-3) summary(fit_b, d = 2)
Selects the sparsity parameter lambda of a penalized principal machine by
-fold cross-validation. For each candidate lambda, the model is
fitted on the training folds and the held-out distance correlation (dCor;
Szekely, Rizzo and Bakirov, 2007) between the response and the predictors
projected onto the estimated d-dimensional central subspace is recorded.
The lambda maximizing the average held-out dCor is returned, following the
criterion of Shin, Wu, Zhang and Liu (2017).
ppm_tune( x, y, loss = "lssvm", d = 2, H = 10, C = 1, gamma = 3.7, penalty = c("grSCAD", "grLasso", "grMCP"), max.iter = 100, n.fold = 5, lambda = NULL, nlambda = 20, lambda.max = 0.1, lambda.min.ratio = 0.001, verbose = FALSE, ... )ppm_tune( x, y, loss = "lssvm", d = 2, H = 10, C = 1, gamma = 3.7, penalty = c("grSCAD", "grLasso", "grMCP"), max.iter = 100, n.fold = 5, lambda = NULL, nlambda = 20, lambda.max = 0.1, lambda.min.ratio = 0.001, verbose = FALSE, ... )
x |
A numeric matrix or data frame of predictors, of dimension
|
y |
A response vector of length |
loss |
Character string selecting the loss function. One of
|
d |
Working structural dimension used to form the held-out sufficient
predictors. Default |
H |
Number of cutoffs (slices); a single integer |
C |
Positive cost parameter that balances the loss against the
covariance term. Default |
gamma |
Concavity parameter of the SCAD/MCP penalty; must exceed |
penalty |
Penalty type: |
max.iter |
Maximum number of GCD iterations. Default |
n.fold |
Number of cross-validation folds. Default |
lambda |
Optional numeric vector of candidate values. If |
nlambda |
Length of the automatically generated grid. Default |
lambda.max |
Largest candidate value in the generated grid. Default |
lambda.min.ratio |
Ratio of the smallest to the largest candidate in
the generated grid. Default |
verbose |
Logical; if |
... |
Additional arguments passed to the underlying solver. The most
useful is |
An object of S3 class "ppm_tune", a list with elements
the selected value of lambda.
the candidate grid (decreasing).
the average held-out distance correlation at each candidate.
a ppm() object refitted on all data at opt.lambda.
the settings and the matched call.
Shin, S. J., Wu, Y., Zhang, H. H. and Liu, Y. (2017) Principal weighted support vector machines for sufficient dimension reduction in binary classification. Biometrika, 104(1), 67–81. doi:10.1093/biomet/asw057
Szekely, G. J., Rizzo, M. L. and Bakirov, N. K. (2007) Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6), 2769–2794. doi:10.1214/009053607000000505
set.seed(1) n <- 1000; p <- 10 x <- matrix(rnorm(n * p), n, p) y <- x[, 1] / (0.5 + (x[, 2] + 1)^2) + 0.2 * rnorm(n) cv <- ppm_tune(x, y, loss = "lssvm", d = 2, n.fold = 5) cv$opt.lambda summary(cv$fit, d = 2)set.seed(1) n <- 1000; p <- 10 x <- matrix(rnorm(n * p), n, p) y <- x[, 1] / (0.5 + (x[, 2] + 1)^2) + 0.2 * rnorm(n) cv <- ppm_tune(x, y, loss = "lssvm", d = 2, n.fold = 5) cv$opt.lambda summary(cv$fit, d = 2)
Print a fitted penalized principal machine
## S3 method for class 'ppm' print(x, ...)## S3 method for class 'ppm' print(x, ...)
x |
An object of class |
... |
Currently ignored. |
The input object x, invisibly.
set.seed(1) x <- matrix(rnorm(200 * 6), 200, 6) y <- x[, 1] / (0.5 + (x[, 2] + 1)^2) + 0.2 * rnorm(200) fit <- ppm(x, y, loss = "lssvm", lambda = 0.01) print(fit)set.seed(1) x <- matrix(rnorm(200 * 6), 200, 6) y <- x[, 1] / (0.5 + (x[, 2] + 1)^2) + 0.2 * rnorm(200) fit <- ppm(x, y, loss = "lssvm", lambda = 0.01) print(fit)
Print a penalized principal machine cross-validation result
## S3 method for class 'ppm_tune' print(x, ...)## S3 method for class 'ppm_tune' print(x, ...)
x |
An object of class |
... |
Currently ignored. |
The input object x, invisibly.
Reports the estimated basis of the central subspace for a given working
dimension d, together with the predictors selected by the row-group
penalty (those whose loadings are non-zero across the leading d
directions).
## S3 method for class 'ppm' summary(object, d = 2, tol = 1e-06, ...)## S3 method for class 'ppm' summary(object, d = 2, tol = 1e-06, ...)
object |
An object of class |
d |
Working structural dimension; the number of leading eigenvectors
to report. Default |
tol |
Tolerance below which a row L2-norm is treated as zero when
determining the selected variables. Default |
... |
Currently ignored. |
Invisibly, a list with elements d, basis (the p by d
estimated basis), selected (indices of the selected predictors) and
n.selected.
set.seed(1) x <- matrix(rnorm(200 * 6), 200, 6) y <- x[, 1] / (0.5 + (x[, 2] + 1)^2) + 0.2 * rnorm(200) fit <- ppm(x, y, loss = "lssvm", lambda = 0.01) summary(fit, d = 2)set.seed(1) x <- matrix(rnorm(200 * 6), 200, 6) y <- x[, 1] / (0.5 + (x[, 2] + 1)^2) + 0.2 * rnorm(200) fit <- ppm(x, y, loss = "lssvm", lambda = 0.01) summary(fit, d = 2)
Diagnostic measurements for 569 breast masses from the Wisconsin Diagnostic
Breast Cancer study (Street, Wolberg and Mangasarian; 1993). Each record has
thirty real-valued features computed from a digitized image of a fine-needle
aspirate, namely the mean, standard error (_se) and worst (largest) value
of ten cell-nucleus characteristics, together with a benign/malignant
diagnosis. Features are on their natural scale; standardize them before
fitting (see Examples).
wdbcwdbc
A data frame with 569 rows and 31 variables: a factor diagnosis
with levels "B" (benign, 357 cases) and "M" (malignant, 212 cases),
followed by thirty numeric features. The features are the _mean, _se
and _worst summaries of: radius, texture, perimeter, area,
smoothness, compactness, concavity, concave_points, symmetry
and fractal_dimension (for example radius_mean, radius_se,
radius_worst).
For the weighted-loss principal machines ("wsvm", "wlssvm",
"wlogit", "wl2svm") the response is coded as malignant = +1 and
benign = -1, as shown in the Examples.
Street, W. N., Wolberg, W. H. and Mangasarian, O. L. (1993) Nuclear feature extraction for breast tumor diagnosis. Biomedical Image Processing and Biomedical Visualization, 1905, 861–870. UCI Machine Learning Repository, Breast Cancer Wisconsin (Diagnostic) Data Set.
data(wdbc) x <- scale(as.matrix(wdbc[, -1])) y <- ifelse(wdbc$diagnosis == "M", 1, -1) fit <- ppm(x, y, loss = "wl2svm", penalty = "grSCAD", lambda = 0.3) summary(fit, d = 2) B <- fit$evectors[, 1:2] scores <- x %*% B plot(scores[, 1], scores[, 2], col = ifelse(y == 1, "red", "blue"), pch = ifelse(y == 1, 17, 1), xlab = "1st SDR direction", ylab = "2nd SDR direction") legend("topright", legend = c("malignant (+1)", "benign (-1)"), col = c("red", "blue"), pch = c(17, 1))data(wdbc) x <- scale(as.matrix(wdbc[, -1])) y <- ifelse(wdbc$diagnosis == "M", 1, -1) fit <- ppm(x, y, loss = "wl2svm", penalty = "grSCAD", lambda = 0.3) summary(fit, d = 2) B <- fit$evectors[, 1:2] scores <- x %*% B plot(scores[, 1], scores[, 2], col = ifelse(y == 1, "red", "blue"), pch = ifelse(y == 1, 17, 1), xlab = "1st SDR direction", ylab = "2nd SDR direction") legend("topright", legend = c("malignant (+1)", "benign (-1)"), col = c("red", "blue"), pch = c(17, 1))