A detailed look at each analytical method we use to help institutions understand their instructional costs, identify efficiency opportunities, and make data-driven strategic decisions.
The foundational question: what predicts instructional cost per FTE student after controlling for institutional characteristics? This model identifies which factors your institution can influence and which are structural constraints.
Is your instructional cost high because of factors you control (faculty mix, program composition) or factors you don't (state cost of living, Carnegie classification)? The regression residual tells you exactly how much your actual spending deviates from what the model predicts for an institution with your profile.
| Deliverable | Description |
|---|---|
| Coefficient table | Which factors significantly predict cost, with direction and magnitude. For example, a one-percentage-point increase in % T/TT faculty predicts a $X increase in cost/FTE. |
| Partial R² decomposition | How much variance each predictor explains, using Shapley values (dominance analysis). Answers: "Faculty mix explains 28% of cost variation; research intensity explains 19%." |
| Residual analysis | Your residual = actual cost minus predicted cost. Positive = spending more than expected. Negative = running lean. This is your efficiency signal. |
| Predicted vs. actual plot | Visual showing where your institution falls relative to the regression line among all peers. |
| Scenario analysis | "If you shifted faculty mix by X%, your predicted cost would change by $Y." Actionable what-if modeling grounded in the regression coefficients. |
Key insight: The original Delaware Cost Study reported only descriptive benchmarks (means, quartiles). Our regression model goes further by answering why costs differ — separating controllable factors from structural ones.
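For readers who want to see the mechanics, here is a minimal sketch of the residual and scenario calculations using Python's statsmodels. The file name, UNITID, and column names (cost_per_fte, pct_ttt_faculty, and so on) are illustrative placeholders, not the production specification.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical IPEDS-derived peer file; all column names are illustrative.
peers = pd.read_csv("peer_cost_metrics.csv")

# OLS cost model mixing controllable and structural predictors.
model = smf.ols(
    "cost_per_fte ~ pct_ttt_faculty + research_per_faculty"
    " + pct_graduate + C(carnegie) + state_cost_index",
    data=peers,
).fit()

# Efficiency signal: actual minus predicted cost for one institution.
you = peers[peers["unitid"] == 123456]            # placeholder UNITID
residual = you["cost_per_fte"].iloc[0] - model.predict(you).iloc[0]

# Scenario: shift faculty mix by 5 percentage points and re-predict.
scenario = you.assign(pct_ttt_faculty=you["pct_ttt_faculty"] + 5)
delta = model.predict(scenario).iloc[0] - model.predict(you).iloc[0]
print(f"residual: ${residual:,.0f}   scenario change: ${delta:,.0f}")
```

A positive residual means spending above what the model predicts for your profile; the scenario delta is the regression-grounded what-if from the table above.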
Institutions exist within states that have fundamentally different funding models, cost of living, and regulatory environments. A standard regression treats every institution as independent — but a public university in Mississippi and one in California operate in very different cost contexts. Multilevel modeling accounts for this nesting structure.
How much of your cost is driven by state-level factors (things you can't control) versus institution-level decisions (things you can)? And does the relationship between, say, faculty mix and cost differ depending on what state you're in?
| Deliverable | Description |
|---|---|
| Intraclass Correlation (ICC) | The percentage of cost variation that is between states vs. within states. If ICC = 0.25, then 25% of cost differences are explained by which state you're in. |
| State random effects | A ranked list of all 50 states showing their cost premium or discount after controlling for institutional characteristics. "Being in California adds $1,200 to predicted cost/FTE." |
| Cross-level interactions | Does the effect of faculty mix on cost differ by state funding level? In high-appropriation states, adding T/TT faculty might cost less because state subsidies offset it. |
| Contextual effects | Separating within-state effects from between-state effects of the same predictor. The meaning of "high % T/TT" may differ depending on whether you're comparing within your state or across states. |
| Caterpillar plot | Visual ranking of all state random intercepts with 95% confidence intervals — instantly shows which states are significantly above or below average. |
Why this matters: If your cost is high but your state random effect is also high, the "problem" may be geographic, not institutional. This reframes the conversation from "we're inefficient" to "we're operating in an expensive state, and here's how we compare to peers in the same context."
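The nesting logic reduces to a random-intercept model and a variance ratio. A minimal sketch with statsmodels MixedLM, again using illustrative column names:

```python
import pandas as pd
import statsmodels.formula.api as smf

peers = pd.read_csv("peer_cost_metrics.csv")      # hypothetical input file

# Random-intercept model: institutions nested within states.
mlm = smf.mixedlm(
    "cost_per_fte ~ pct_ttt_faculty + research_per_faculty + pct_graduate",
    data=peers,
    groups=peers["state"],
).fit()

# ICC = between-state variance / (between-state + within-state variance).
between = mlm.cov_re.iloc[0, 0]    # variance of the state random intercepts
within = mlm.scale                 # residual (within-state) variance
print(f"ICC = {between / (between + within):.2f}")

# Each state's estimated cost premium or discount, after controlling for covariates.
state_effects = {state: eff.iloc[0] for state, eff in mlm.random_effects.items()}
```

The state_effects dictionary is the raw material for the ranked state list and the caterpillar plot.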
Data Envelopment Analysis (DEA) is a non-parametric method that constructs an empirical efficiency frontier from the data. The original Delaware Cost Study cited DEA in multiple conference presentations (AIR 2018, ACE 2019) as the next-generation analytical method — but never implemented it. We build what they envisioned.
Given your inputs (expenditures, faculty), are you producing the maximum possible outputs (students served, degrees awarded)? If not, how far are you from the frontier, and which specific efficient institutions should you benchmark against?
| Input/Output | Variables |
|---|---|
| Inputs | Instructional expenditures, FTE instructional staff, % T/TT faculty |
| Outputs | FTE students served, total degrees awarded, research expenditures (for R1s) |
| Variant | Purpose |
|---|---|
| CRS (Constant Returns to Scale) | Overall technical efficiency — are you on the frontier regardless of size? |
| VRS (Variable Returns to Scale) | Scale-adjusted efficiency — are you efficient for your size? Small colleges aren't penalized for not operating at R1 scale. |
| Scale Efficiency | CRS score / VRS score — are you operating at optimal scale? If not, should you grow or shrink? |
| Super-efficiency | Ranks institutions on the frontier against each other. Useful when multiple institutions score 1.0. |
| Malmquist Productivity Index | Tracks efficiency change over time, decomposed into: (a) efficiency change (catching up to the frontier) and (b) technical change (the frontier itself shifting). |
Why DEA over regression: Regression estimates an average relationship. DEA identifies the best practice frontier. An institution can be average by regression standards but far from the efficiency frontier. DEA also handles multiple inputs and outputs simultaneously without assuming a linear relationship.
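Each institution's score comes from a small linear program solved once per institution. A stripped-down, input-oriented CRS sketch using scipy, with illustrative input and output matrices:

```python
import numpy as np
from scipy.optimize import linprog

def dea_crs_scores(X, Y):
    """Input-oriented CRS (CCR) efficiency scores.

    X: (n, m) inputs per institution, e.g. expenditures, FTE instructional staff.
    Y: (n, s) outputs per institution, e.g. FTE students, degrees awarded.
    """
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        # Decision variables: [theta, lambda_1 ... lambda_n]; minimize theta.
        c = np.r_[1.0, np.zeros(n)]
        # Inputs:  sum_j lambda_j * x_ij - theta * x_io <= 0
        A_in = np.hstack([-X[o].reshape(m, 1), X.T])
        # Outputs: -sum_j lambda_j * y_rj <= -y_ro
        A_out = np.hstack([np.zeros((s, 1)), -Y.T])
        res = linprog(
            c,
            A_ub=np.vstack([A_in, A_out]),
            b_ub=np.r_[np.zeros(m), -Y[o]],
            bounds=[(0, None)] * (n + 1),
        )
        scores[o] = res.x[0]   # 1.0 = on the frontier; below 1.0 = inefficient
    return scores
```

Adding the convexity constraint sum(lambda) = 1 as an equality constraint turns this into the VRS variant; super-efficiency drops institution o from its own reference set.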
Carnegie classification is the default peer grouping in higher education — but it's a blunt instrument. Two "Doctoral: Very High Research" institutions can have wildly different cost structures, faculty compositions, and enrollment profiles. Research from the original Cost Study's own presentations (NEAIR 2021) demonstrated that data-driven peer groups outperform Carnegie-based ones for benchmarking.
Who are your real peers based on your actual instructional cost and productivity profile? And how do you perform within that true peer group?
| Deliverable | Description |
|---|---|
| Cluster assignment | Your institution is placed in a named, characterized peer group (e.g., "Research-Active, Teaching-Heavy" or "Low-Cost, High-Throughput"). |
| Cluster profiles | Radar charts showing the metric signature of each cluster. Instantly see what makes your group distinctive. |
| Carnegie cross-tabulation | How do data-driven groups differ from Carnegie classification? Some Carnegie classes may split into multiple clusters; others may merge. |
| Within-cluster benchmarks | Your rank on each metric within your data-driven peer group — more meaningful than Carnegie-wide comparisons. |
| 10 nearest neighbors | The institutions most statistically similar to yours across all dimensions (Mahalanobis distance). Your true comparators. |
Example finding: "Oklahoma State is classified as 'Doctoral: Very High Research' by Carnegie, but our cluster analysis places it in a group of 47 institutions we call 'Research-Active, Teaching-Heavy' — high research expenditure combined with high student-to-faculty ratios. This is a distinct cost profile from the typical R1 pattern."
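A compact sketch of the clustering step and the Mahalanobis nearest-neighbor lookup, using scikit-learn and scipy with illustrative metric columns (in practice the number of clusters is chosen with fit statistics rather than fixed at 6):

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

peers = pd.read_csv("peer_cost_metrics.csv")       # hypothetical input file
metrics = ["cost_per_fte", "pct_ttt_faculty", "research_per_faculty",
           "students_per_faculty", "degrees_per_100_fte"]   # illustrative set
Z = StandardScaler().fit_transform(peers[metrics])

# Data-driven peer groups.
peers["cluster"] = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(Z)

# 10 nearest neighbors by Mahalanobis distance (position 0 is the institution itself).
you = (peers["unitid"] == 123456).to_numpy()       # placeholder UNITID
cov_inv = np.linalg.inv(np.cov(Z, rowvar=False))
d = cdist(Z[you], Z, metric="mahalanobis", VI=cov_inv).ravel()
neighbors = peers.iloc[np.argsort(d)[1:11]]["institution_name"]
```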
Standard regression tells you what predicts average cost. But the factors driving cost for the median institution may be entirely different from those driving cost at the 90th percentile. Quantile regression estimates separate models at different points in the cost distribution.
Do the same factors matter equally across the cost spectrum? Is research intensity irrelevant for low-cost institutions but a dominant factor for high-cost ones?
| Deliverable | Description |
|---|---|
| Quantile coefficient plots | Line plots showing how each predictor's coefficient changes across τ = 0.10 to 0.90. Where the line is flat, the effect is constant. Where it rises or falls, the effect depends on where you sit in the cost distribution. |
| Conditional distribution | Given your institution's characteristics, the full predicted distribution of cost — not just a point estimate. "Institutions with your profile range from $6,800 (10th percentile) to $12,400 (90th percentile)." |
| Tail analysis | What drives the most expensive institutions? If the 90th percentile coefficient for research intensity is 3x the median coefficient, research buyouts are disproportionately driving costs at the top. |
Real-world value: A provost asking "why is our cost high?" gets a fundamentally different answer from quantile regression than from OLS. If the OLS model says faculty mix explains 28% of cost variation, the quantile model might show that faculty mix explains 40% of variation at the 90th percentile but only 15% at the 25th percentile.
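A minimal sketch of the quantile fits with statsmodels, using the same illustrative columns as the OLS sketch above:

```python
import pandas as pd
import statsmodels.formula.api as smf

peers = pd.read_csv("peer_cost_metrics.csv")       # hypothetical input file
spec = "cost_per_fte ~ pct_ttt_faculty + research_per_faculty + pct_graduate"

# Fit the same specification at several points in the cost distribution.
fits = {tau: smf.quantreg(spec, data=peers).fit(q=tau)
        for tau in (0.10, 0.25, 0.50, 0.75, 0.90)}

# Coefficient path for one predictor; a rising path means the effect
# grows as you move toward the expensive tail.
path = {tau: fit.params["research_per_faculty"] for tau, fit in fits.items()}
print(path)
```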
Before acting on any benchmarking result, you need to know whether your institution is an outlier — and if so, why. We apply four complementary methods because each catches different types of anomalies.
| Method | What It Catches | How It Works |
|---|---|---|
| Mahalanobis distance | Multivariate outliers | Measures how far an institution is from the center of the data across all metrics simultaneously, accounting for correlations. An institution can look normal on every single metric but be unusual in the combination. |
| Cook's distance | Influential observations | From the regression model — identifies institutions that disproportionately affect the regression results. Removing this institution would change the coefficients significantly. |
| DBSCAN | Density-based outliers | Finds institutions that don't belong to any natural cluster. Unlike Mahalanobis (which assumes a single center), DBSCAN works with arbitrary shapes and identifies "noise points." |
| IQR fencing | Per-metric outliers | Simple, transparent flagging for the scorecard. Any metric more than 1.5 × IQR beyond the first or third quartile gets flagged. Easy for stakeholders to understand. |
Why four methods: No single outlier detection method is complete. Mahalanobis misses non-elliptical clusters. Cook's distance only applies within the regression model. DBSCAN requires choosing epsilon. IQR fencing is univariate only. Together, they provide a comprehensive picture.
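A sketch of how the four flags can be assembled into a single screening table, with illustrative columns and tuning values (the DBSCAN eps and min_samples below are placeholders that get calibrated to the data):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.spatial.distance import cdist
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

peers = pd.read_csv("peer_cost_metrics.csv")       # hypothetical input file
metrics = ["cost_per_fte", "pct_ttt_faculty", "research_per_faculty",
           "students_per_faculty"]                 # illustrative set
Z = StandardScaler().fit_transform(peers[metrics])

# 1. Mahalanobis distance from the multivariate center.
cov_inv = np.linalg.inv(np.cov(Z, rowvar=False))
peers["mahalanobis"] = cdist(Z, Z.mean(axis=0, keepdims=True),
                             metric="mahalanobis", VI=cov_inv).ravel()

# 2. Cook's distance from the cost regression.
ols = smf.ols("cost_per_fte ~ pct_ttt_faculty + research_per_faculty",
              data=peers).fit()
peers["cooks_d"] = ols.get_influence().cooks_distance[0]

# 3. DBSCAN noise points (label -1 means the institution joins no dense cluster).
peers["dbscan_noise"] = DBSCAN(eps=1.5, min_samples=10).fit_predict(Z) == -1

# 4. IQR fence on a single metric (Tukey-style, 1.5 x IQR beyond the quartiles).
q1, q3 = peers["cost_per_fte"].quantile([0.25, 0.75])
fence = 1.5 * (q3 - q1)
peers["iqr_flag"] = ~peers["cost_per_fte"].between(q1 - fence, q3 + fence)
```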
This is the most sophisticated model in our toolkit — and it's led by Dr. Jam Khojasteh, Associate Editor of Structural Equation Modeling: A Multidisciplinary Journal, with 40+ publications in SEM and related methods. SEM doesn't just identify what predicts cost — it maps the full web of causal pathways showing how factors relate to each other in producing cost.
How do research intensity, faculty investment, and enrollment profile interact to produce instructional cost? What are the direct effects (research → cost) versus the indirect effects (research → faculty mix → workload → cost)? Does the model work the same way for public and private institutions?
| Latent Variable | Measured IPEDS Indicators |
|---|---|
| Research Intensity | Research expenditures per T/TT faculty, % T/TT faculty, Carnegie research classification |
| Faculty Investment | % T/TT, % full-time, average faculty salary |
| Enrollment Profile | FTE total, % graduate, degrees per 100 FTE |
| Instructional Cost | Cost per FTE, personnel %, instruction % of E&G |
| Productivity | Students per faculty, degrees per faculty |
| Deliverable | Description |
|---|---|
| Path diagram | Publication-quality diagram with standardized path coefficients on every arrow. The visual tells the full story. |
| Direct effects | Research intensity → cost: β = 0.34. "A one standard deviation increase in research intensity directly increases cost by 0.34 SD." |
| Indirect effects | Research → faculty mix → cost: β = 0.18. "Research intensity also increases cost indirectly by changing faculty composition." |
| Total effects | Direct + indirect = 0.52. "The total impact of research intensity on cost is larger than either path alone." |
| Model fit indices | CFI ≥ 0.95, RMSEA ≤ 0.06, SRMR ≤ 0.08 — publication-standard reporting. |
| Measurement invariance | Does the model work the same way for public vs. private institutions? We test configural, metric, scalar, and strict invariance. Dr. Khojasteh literally wrote the book on this (Khojasteh & Lo, 2015). |
| Latent scores | Each institution gets estimated scores on each latent construct — your "research intensity score," "productivity score," etc. |
Why SEM over regression: Regression treats all predictors as independent causes. SEM models the relationships among predictors. Research intensity doesn't just predict cost — it changes faculty mix, which changes workload, which changes cost. SEM captures this full causal chain. It also handles measurement error through latent variables, producing less biased estimates.
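A sketch of how the measurement and structural pieces can be specified in Python's semopy package, which uses lavaan-style syntax; the indicator names stand in for the IPEDS fields in the table above, and the structural paths shown are a simplified subset of the full model:

```python
import pandas as pd
from semopy import Model, calc_stats

peers = pd.read_csv("peer_cost_metrics.csv")       # hypothetical input file

desc = """
# Measurement model: latent constructs and their observed indicators
ResearchIntensity =~ research_per_ttt + pct_ttt_faculty + carnegie_r
FacultyInvestment =~ pct_ttt_faculty + pct_full_time + avg_salary
InstructionalCost =~ cost_per_fte + personnel_pct + instruction_pct_eg

# Structural model: direct path plus the indirect path through faculty mix
FacultyInvestment ~ ResearchIntensity
InstructionalCost ~ ResearchIntensity + FacultyInvestment
"""

sem = Model(desc)
sem.fit(peers)
print(sem.inspect())      # path coefficients and loadings
print(calc_stats(sem))    # CFI, RMSEA, SRMR and other fit indices
```

The indirect effect of research intensity through faculty investment is the product of the two corresponding path coefficients; the public vs. private invariance tests are run as multi-group versions of the same model.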
A trend line shows where cost has been. A growth model tells you what trajectory class your institution belongs to and where it's headed. Based on the longitudinal SEM methods published by Dr. Khojasteh (Marcoulides & Khojasteh, 2018; Whittaker & Khojasteh, 2017).
Is your cost trajectory rising, stable, or declining? Are there distinct subpopulations of institutions following different paths? What predicts which path you're on? And does your enrollment trajectory co-evolve with your cost trajectory?
| Model | What It Does |
|---|---|
| Latent Growth Curve (LGC) | Estimates the average trajectory across all institutions (intercept = starting point, slope = rate of change) and individual variation around it. "The average institution's cost grew $312/year, but the standard deviation of slopes is $180 — there's huge variation." |
| Growth Mixture Model (GMM) | Identifies distinct subpopulations following different trajectories. "We identified 3 trajectory classes: 'Rising' (38% of institutions, avg +$520/year), 'Stable' (45%, avg +$80/year), and 'Declining' (17%, avg -$200/year). Your institution belongs to the Rising class." |
| Parallel Process Model | Models cost and enrollment trajectories simultaneously. "Institutions whose enrollment declined by > 5% show cost increases of $800/year — the fixed cost structure doesn't shrink with enrollment." |
Beyond trend lines: A simple trend line treats every institution's trajectory as the same shape (linear). Growth mixture models discover that institutions follow fundamentally different patterns — and knowing which pattern you're on changes the strategic response entirely.
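Full growth mixture estimation is done in dedicated SEM software; as a rough illustration of the idea, here is a two-stage sketch that fits each institution's cost slope and then clusters institutions into trajectory classes (file and column names are hypothetical):

```python
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture

# Hypothetical long-format panel: one row per institution-year.
panel = pd.read_csv("cost_panel.csv")   # columns: unitid, year, cost_per_fte

# Stage 1: per-institution intercept and slope of cost over time.
def growth_params(g):
    slope, intercept = np.polyfit(g["year"] - g["year"].min(), g["cost_per_fte"], 1)
    return pd.Series({"intercept": intercept, "slope": slope})

growth = panel.groupby("unitid").apply(growth_params)

# Stage 2: a mixture model over (intercept, slope) as a crude stand-in
# for a latent growth mixture model with three trajectory classes.
gmm = GaussianMixture(n_components=3, random_state=0).fit(growth)
growth["trajectory_class"] = gmm.predict(growth[["intercept", "slope"]])
print(growth.groupby("trajectory_class")["slope"].describe())
```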
This is the ultimate consulting deliverable — a data-driven recommendation engine for program investment and disinvestment. It combines efficiency scores, enrollment demand, and strategic value into a single decision framework.
| Axis | Components | Data Source |
|---|---|---|
| Cost Efficiency (X) | DEA score + regression residual + cost-per-FTE percentile within CIP | Models 1 & 3 |
| Demand Signal (Y) | 5-year enrollment growth rate + completion rate trend + BLS occupation projections | IPEDS + Bureau of Labor Statistics |
| Strategic Value (Z) | Mission alignment + accreditation requirements + cross-subsidy role + institutional distinctiveness | Client input (scored rubric) |
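As a sketch of how the three axes combine into one program-level table, here is an illustrative composite; the weights, thresholds, and column names are placeholders, and the strategic-value rubric is client-scored:

```python
import pandas as pd

programs = pd.read_csv("program_metrics.csv")   # hypothetical CIP-level file

def pct(s):
    """Percentile-rank a series onto a 0-1 scale."""
    return s.rank(pct=True)

# X: efficiency (higher DEA score, lower residual, lower cost = better).
programs["cost_efficiency"] = (pct(programs["dea_score"])
                               + pct(-programs["regression_residual"])
                               + pct(-programs["cost_per_fte"])) / 3
# Y: demand (enrollment growth, completion trend, BLS projection).
programs["demand_signal"] = (pct(programs["enroll_growth_5yr"])
                             + pct(programs["completion_trend"])
                             + pct(programs["bls_projection"])) / 3
# Z: client-scored strategic value rubric, rescaled to 0-1.
programs["strategic_value"] = programs["rubric_score"] / programs["rubric_score"].max()

# Simple three-way flag for the investment conversation.
total = programs[["cost_efficiency", "demand_signal", "strategic_value"]].sum(axis=1)
programs["flag"] = pd.cut(total, bins=[0, 1, 2, 3],
                          labels=["review", "monitor", "invest"], include_lowest=True)
```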
Faculty compensation is the largest component of instructional cost (typically 80-90%). Every retirement, every new hire, every adjunct-to-lecturer conversion changes your cost structure. This simulation engine lets you model those changes before making them.
"What if we replace 5 retiring tenured faculty with 3 lecturers and 4 adjuncts? What happens to our cost per FTE, our student-faculty ratio, our percentile rank among peers, and our trajectory over 5 years?"
| Parameter | Source |
|---|---|
| Faculty retirements by type and year | Client-provided or actuarial estimate |
| New hires by type (T/TT, lecturer, adjunct) | Client scenario input |
| Salary by faculty type | IPEDS HR data or client-provided |
| Benefit rates by type | Client-provided or national average |
| Teaching capacity by type (SCH/FTE) | Estimated from current SCH/FTE ratios |
| Enrollment projection | Client-provided or growth model forecast |
The tradeoff made visible: Every faculty composition decision involves a cost-quality tradeoff. This simulation doesn't tell you what to do — it shows you the quantified consequences of each option so you can make an informed decision.
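A bare-bones sketch of the simulation logic; every number below is a placeholder, and in practice the parameters come from the client inputs and IPEDS HR data listed in the table above:

```python
from dataclasses import dataclass

@dataclass
class FacultyType:
    headcount: float       # FTE of this faculty type
    salary: float          # average salary
    benefit_rate: float    # benefits as a fraction of salary
    sch_per_fte: float     # annual student credit hours taught per FTE

def cost_per_fte(faculty, enrollment_fte, sch_per_student=30):
    """Instructional cost per FTE student implied by a given faculty mix."""
    compensation = sum(f.headcount * f.salary * (1 + f.benefit_rate) for f in faculty)
    capacity = sum(f.headcount * f.sch_per_fte for f in faculty)
    assert capacity >= enrollment_fte * sch_per_student, "mix cannot cover teaching demand"
    return compensation / enrollment_fte

# Baseline vs. scenario: replace 5 retiring T/TT with 3 lecturers and 4 adjuncts.
baseline = [FacultyType(200, 95_000, 0.35, 540),   # T/TT
            FacultyType(80, 60_000, 0.30, 720),    # lecturers
            FacultyType(150, 25_000, 0.08, 360)]   # adjuncts (converted to FTE)
scenario = [FacultyType(195, 95_000, 0.35, 540),
            FacultyType(83, 60_000, 0.30, 720),
            FacultyType(154, 25_000, 0.08, 360)]

print(cost_per_fte(baseline, 7_000), cost_per_fte(scenario, 7_000))
```

Projecting the scenario year by year, with retirements and enrollment forecasts layered in, produces the 5-year trajectory and the peer-percentile shift described above.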