
Mean robust optimization

  • Full Length Paper
  • Series A
Mathematical Programming

Abstract

Robust optimization is a tractable and expressive technique for decision-making under uncertainty, but it can lead to overly conservative decisions when pessimistic assumptions are made on the uncertain parameters. Wasserstein distributionally robust optimization can reduce conservatism by being data-driven, but it often leads to very large problems with prohibitive solution times. We introduce mean robust optimization, a general framework that combines the best of both worlds by providing a trade-off between computational effort and conservatism. We propose uncertainty sets constructed from clustered data rather than directly from the observed data points, thereby significantly reducing problem size. By varying the number of clusters, our method bridges between robust and Wasserstein distributionally robust optimization. We show finite-sample performance guarantees and explicitly control the potential additional pessimism introduced by any clustering procedure. In addition, we prove conditions under which, when the uncertainty enters linearly in the constraints, clustering does not affect the optimal solution. We illustrate the efficiency and performance preservation of our method on several numerical examples, obtaining multiple orders of magnitude speedups in solution time with little-to-no effect on the solution quality.
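
To make the construction described above concrete, the following is a minimal sketch (not the authors' code; the k-means choice, the synthetic data, and the variable names are illustrative assumptions) of forming the cluster centroids \(\bar{d}_k\), the weights \(w_k = |C_k|/N\), and the clustering term \(D(K)\) that appears in the bounds of the appendices below.

```python
# A minimal sketch (not the authors' code) of the clustered-uncertainty ingredients:
# cluster the data into K centroids dbar_k with weights w_k = |C_k|/N, and compute
# D(K) = (1/N) * sum_i ||d_i - dbar_{k(i)}||^2, the mean within-cluster squared
# deviation that controls the extra conservatism introduced by clustering.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 4))          # N = 500 samples of an m = 4 dimensional uncertainty

for K in (1, 5, 20, 100, 500):
    km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(data)
    dbar = km.cluster_centers_            # centroids \bar{d}_k, k = 1, ..., K
    w = np.bincount(km.labels_, minlength=K) / len(data)     # weights w_k = |C_k| / N
    D_K = np.mean(np.sum((data - dbar[km.labels_]) ** 2, axis=1))
    print(K, w.sum(), D_K)                # w sums to 1; D(K) shrinks to 0 as K -> N
```

Varying K trades problem size against the clustering term D(K), which vanishes when K = N.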

References

  1. Bandi, C., Bertsimas, D.: Tractable stochastic analysis in high dimensions via robust optimization. Math. Program. 134(1), 23–70 (2012)

  2. Beck, A.: First-Order Methods in Optimization. SIAM-Society for Industrial and Applied Mathematics, Philadelphia (2017)

  3. Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton University Press, Princeton (2009)

  4. Ben-Tal, A., den Hertog, D., Vial, J.P.: Deriving robust counterparts of nonlinear uncertain inequalities. Math. Program. 149(1–2), 265–299 (2015)

  5. Ben-Tal, A., Nemirovski, A.: Robust solutions of Linear Programming problems contaminated with uncertain data. Math. Program. 88(3), 411–424 (2000)

  6. Ben-Tal, A., Nemirovski, A.: Selected topics in robust convex optimization. Math. Program. 112, 125–158 (2008)

  7. Beraldi, P., Bruni, M.E.: A clustering approach for scenario tree reduction: an application to a stochastic programming portfolio optimization problem. TOP 22(3), 934–949 (2014)

  8. Bertsimas, D., Brown, D.B., Caramanis, C.: Theory and applications of robust optimization. SIAM Rev. 53(3), 464–501 (2011)

  9. Bertsimas, D., den Hertog, D., Pauphilet, J.: Probabilistic guarantees in robust optimization. SIAM J. Optim. 31(4), 2893–2920 (2021)

  10. Bertsimas, D., Gupta, V., Kallus, N.: Data-driven robust optimization. Math. Program. 167(2), 235–292 (2018)

  11. Bertsimas, D., den Hertog, D.: Robust and Adaptive Optimization. Dynamic Ideas, Belmont (2022)

  12. Bertsimas, D., Mundru, N.: Optimization-based scenario reduction for data-driven two-stage stochastic optimization. Oper. Res. (2022)

  13. Bertsimas, D., Sim, M.: The price of robustness. Oper. Res. 52(1), 35–53 (2004)

  14. Bertsimas, D., Sturt, B., Shtern, S.: A data-driven approach to multistage stochastic linear optimization. Manag. Sci. (2022)

  15. Bradley, B.O., Taqqu, M.S.: Financial Risk and Heavy Tails, Handbooks in Finance, vol. 1. North-Holland, Amsterdam (2003)

  16. Carmona, R.A.: Rsafd: Statistical Analysis of Financial Data in R (2020). R package version 1.2

  17. Chen, L.: Clustering of sample average approximation for stochastic program (2015)

  18. Chen, R., Paschalidis, I.: Distributionally robust learning. Found. Trends Optim. 4(1–2), 1–243 (2020)

  19. Chen, Z., Sim, M., Xiong, P.: Robust stochastic optimization made easy with RSOME. Manag. Sci. 66(8), 3329–3339 (2020)

  20. Clement, P., Desch, W.: An elementary proof of the triangle inequality for the Wasserstein metric. Proc. Am. Math. Soc. 136(1), 333–339 (2008)

  21. Delage, E., Ye, Y.: Distributionally robust optimization under moment uncertainty with application to data-driven problems. Oper. Res. 58(3), 595–612 (2010)

  22. Dupačová, J., Gröwe-Kuska, N., Römisch, W.: Scenario reduction in stochastic programming. Math. Program. 95(3), 493–511 (2003)

  23. Emelogu, A., Chowdhury, S., Marufuzzaman, M., Bian, L., Eksioglu, B.: An enhanced sample average approximation method for stochastic optimization. Int. J. Prod. Econ. 182, 230–252 (2016)

  24. Esfahani, P.M., Kuhn, D.: Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations. Math. Program. 171, 115–166 (2018)

  25. Esteban, P.A., Morales, J.M.: Partition-based distributionally robust optimization via optimal transport with order cone constraints. 4OR 20(3), 465–497 (2022)

  26. Fabiani, F., Goulart, P.: The optimal transport paradigm enables data compression in data-driven robust control. In: 2021 American Control Conference (ACC), pp. 2412–2417 (2021)

  27. Fournier, N., Guillin, A.: On the rate of convergence in Wasserstein distance of the empirical measure. Probab. Theory Relat. Fields 162(3), 707–738 (2015)

  28. Gao, R.: Finite-sample guarantees for Wasserstein distributionally robust optimization: Breaking the curse of dimensionality. CoRR (2020)

  29. Gao, R., Kleywegt, A.: Distributionally robust stochastic optimization with Wasserstein distance. Math. Oper. Res. 48, 603–655 (2023)

  30. Givens, C.R., Shortt, R.M.: A class of Wasserstein metrics for probability distributions. Mich. Math. J. 31(2), 231–240 (1984)

  31. Goh, J., Sim, M.: Distributionally robust optimization and its tractable approximations. Oper. Res. 58(4–part–1), 902–917 (2010)

  32. Hartigan, J.A., Wong, M.A.: Algorithm AS 136: A k-means clustering algorithm. J. R. Stat. Soc. Ser. C (Appl. Stat.) 28(1), 100–108 (1979)

  33. Holmberg, K., Rönnqvist, M., Yuan, D.: An exact algorithm for the capacitated facility location problems with single sourcing. Eur. J. Oper. Res. 113(3), 544–559 (1999)

  34. Jacobson, D., Hassan, M., Dong, Z.S.: Exploring the effect of clustering algorithms on sample average approximation. In: 2021 Institute of Industrial and Systems Engineers (IISE) Annual Conference & Expo (2021)

  35. Kuhn, D., Esfahani, P.M., Nguyen, V., Shafieezadeh-Abadeh, S.: Wasserstein Distributionally Robust Optimization: Theory and Applications in Machine Learning, pp. 130–166 (2019)

  36. Liu, Y., Yuan, X., Zhang, J.: Discrete approximation scheme in distributionally robust optimization. Numer. Math. Theory Methods Appl. 14(2), 285–320 (2021)

  37. MOSEK ApS: The MOSEK optimization toolbox. Version 9.3. (2022)

  38. Neumann, J.V.: Zur Theorie der Gesellschaftsspiele. Math. Ann. 100, 295–320 (1928)

  39. Perakis, G., Sim, M., Tang, Q., Xiong, P.: Robust pricing and production with information partitioning and adaptation. Manag. Sci. (2023)

  40. Rockafellar, R.T., Uryasev, S.: Conditional value-at-risk for general loss distributions. J. Bank. Finance 26(7), 1443–1471 (2002)

  41. Rockafellar, R.T., Wets, R.J.: Variational Analysis. Grundlehren der mathematischen Wissenschaften. Springer, Berlin (1998)

  42. Roos, E., den Hertog, D.: Reducing conservatism in robust optimization. INFORMS J. Comput. 32(4), 1109–1127 (2020)

  43. Rujeerapaiboon, N., Schindler, K., Kuhn, D., Wiesemann, W.: Scenario reduction revisited: fundamental limits and guarantees. Math. Program. 191(1), 207–242 (2022)

  44. Thorndike, R.: Who belongs in the family? Psychometrika 18(4), 267–276 (1953)

  45. Trillos, N., Slepčev, D.: On the rate of convergence of empirical measures in \(\infty \)-transportation distance. Can. J. Math. 67, 1358–1383 (2014)

  46. Uryasev, S., Rockafellar, R.T.: Conditional Value-at-Risk: Optimization Approach. Springer, New York (2001)

  47. Wang, I., Becker, C., Van Parys, B., Stellato, B.: Mean robust optimization. arXiv (2023)

  48. Wang, Z., Wang, P., Ye, Y.: Likelihood robust optimization for data-driven problems. CMS 13(2), 241–261 (2016)

  49. Wiesemann, W., Kuhn, D., Sim, M.: Distributionally robust convex optimization. Oper. Res. 62(6), 1358–1376 (2014)

  50. Xu, H., Caramanis, C., Mannor, S.: A distributional interpretation of robust optimization. Math. Oper. Res. 37(1), 95–110 (2012)

  51. Zhen, J., Kuhn, D., Wiesemann, W.: A unified theory of robust and distributionally robust optimization via the primal-worst-equals-dual-best principle (2021). https://doi.org/10.1287/opre.2021.0268

  52. Zymler, S., Kuhn, D., Rustem, B.: Distributionally robust joint chance constraints with second-order moment information. Math. Program. 137(1), 167–198 (2013)

Acknowledgements

The simulations presented in this article were performed on computational resources managed and supported by Princeton Research Computing, a consortium of groups including the Princeton Institute for Computational Science and Engineering (PICSciE) and the Office of Information Technology’s High Performance Computing Center and Visualization Laboratory at Princeton University. We would like to thank Daniel Kuhn for the useful feedback and for pointing us to the literature on scenario reduction techniques.

Author information

Corresponding author

Correspondence to Bartolomeo Stellato.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Irina Wang and Bartolomeo Stellato are supported by the NSF CAREER Award ECCS 2239771.

Appendices

Proof of the constraint reformulation in (15)

We give a proof for the general case of maximum-of-concave functions. When \(J=1\), we take \(\alpha _{1k} = w_k\) for all k. For a simpler proof covering the case of a single concave function, refer to [47, Appendix A].

To simplify notation, we define \(c_k(v_{jk}) =\Vert v_{jk} - \bar{d}_k \Vert ^p - \epsilon ^p\). Then, starting from the inner optimization problem of (14):

$$\begin{aligned} \begin{aligned}&{\left\{ \begin{array}{ll} \underset{v_{11}, \dots , v_{JK} \in S, \alpha \in \Gamma }{\sup } & \sum _{k=1}^K \sum _{j=1}^J \alpha _{jk} g_j(v_{jk}, x) \\ \text{ subject } \text{ to } & \sum _{k=1} ^K \sum _{j=1}^J \alpha _{jk} c_k(v_{jk}) \le 0 \end{array}\right. }\\&={\left\{ \begin{array}{ll} \underset{v_{11}, \dots , v_{JK} \in S, \alpha \in \Gamma }{\sup } & \underset{\lambda \ge 0}{\inf }~\sum _{k=1}^K \sum _{j=1}^J \alpha _{jk} g_j(v_{jk}, x) - \lambda \sum _{k=1} ^K \sum _{j=1}^J \alpha _{jk} c_k(v_{jk})\\ \end{array}\right. }\\&={\left\{ \begin{array}{ll} \underset{\alpha \in \Gamma }{\sup }~\underset{\lambda \ge 0}{\inf }~\underset{v_{11}, \dots , v_{JK} \in S}{\sup } & \sum _{k=1}^K \sum _{j=1}^J \alpha _{jk} g_j(v_{jk}, x) - \lambda \sum _{k=1} ^K \sum _{j=1}^J \alpha _{jk} c_k(v_{jk}),\\ \end{array}\right. }\\ \end{aligned} \end{aligned}$$

In the first equality we formed the Lagrangian. Then, since the summands \(g_j(v_{jk},x)\) are upper-semicontinuous and concave in \(v_{jk}\), we applied the Von Neumann–Fan minimax theorem [38] to interchange the inf and the sup. Next, we rewrite the formulation using an epigraph trick and make a change of variables.

$$\begin{aligned} \begin{aligned}&= {\left\{ \begin{array}{ll} \underset{\alpha \in \Gamma }{\sup }~\underset{\lambda \ge 0, s}{\inf }\ & \sum _{k=1}^K s_k \\ \text{ subject } \text{ to }~\underset{v_{11}, \dots , v_{JK} \in S}{\sup }& \sum _{j=1}^J \alpha _{jk}(g_j(v_{jk}, x) - \lambda c_k(v_{jk})) \le s_k \quad k = 1,\dots ,K, \end{array}\right. }\\&= {\left\{ \begin{array}{ll} \underset{\alpha \in \Gamma }{\sup }~\underset{\lambda \ge 0,s}{\inf }\ & \sum _{k=1}^K s_k \\ \text{ subject } \text{ to }~{\underset{\alpha _{11}v_{11}\in \alpha _{11}S, \dots , \alpha _{JK}v_{JK} \in \alpha _{JK}S}{\sup }}& \sum _{j=1}^J \alpha _{jk}(g_j((\alpha _{jk}v_{jk})/\alpha _{jk}, x) \\ & - \lambda c_k((\alpha _{jk}v_{jk})/\alpha _{jk})) \le s_k \quad k = 1,\dots ,K. \end{array}\right. }\\ \end{aligned} \end{aligned}$$

In the last step, we rewrote \(v_{jk} = (\alpha _{jk}v_{jk})/\alpha _{jk}\) and maximized over \({\alpha _{jk}v_{jk} \in \alpha _{jk}S}\). In the case \(\alpha _{jk} > 0\), the terms in the summation are unchanged, and maximizing over \(v_{jk}\) is equivalent to maximizing over \(\alpha _{jk}v_{jk}\). In the case \(\alpha _{jk}= 0\), we adopt in the transformed formulation the convention \((\alpha _{jk}v_{jk})/\alpha _{jk} = 0/0 = 0\), so that \(\alpha _{jk}(g_j(0, x) - \lambda c_k(0)) = 0\,(g_j(0, x) - \lambda c_k(0)) = 0\). In the original formulation, the term \(\alpha _{jk}(g_j(v_{jk}, x) - \lambda c_k(v_{jk}))\) is also equal to 0 regardless of the value of \(v_{jk}\), so the choice of \(v_{jk}\) is immaterial. Therefore, the optimal value remains unchanged. Next, we substitute \({h_{jk}} = \alpha _{jk}v_{jk}\) and define the functions \(g_j'(h_{jk},x) = \alpha _{jk}g_j(h_{jk}/\alpha _{jk},x)\) and \(c_k'(h_{jk}) = \alpha _{jk}{c_k}(h_{jk}/\alpha _{jk})\).

$$\begin{aligned} \begin{aligned}&= {\left\{ \begin{array}{ll} \underset{\alpha \in \Gamma }{\sup }~\underset{\lambda \ge 0,s}{\inf }\ & \sum _{k=1}^K s_k \\ \text{ subject } \text{ to }~{\underset{h_{11}\in \alpha _{11}S, \dots , h_{JK} \in \alpha _{JK}S}{\sup }}& \sum _{j=1}^J g_j'(h_{jk}, x)- \lambda c_k'(h_{jk}) \le s_k \quad k = 1,\dots ,K, \end{array}\right. }\\&= {\left\{ \begin{array}{ll} \underset{\alpha \in \Gamma }{\sup }~\underset{\lambda \ge 0,s}{\inf }\ & \sum _{k=1}^K s_k \\ \text{ subject } \text{ to }& \sum _{j=1}^J [-g_j' +{\chi _{\alpha _{jk}S}} + \lambda c_k']^*(0) \le s_k \quad k = 1,\dots ,K. \end{array}\right. }\\ \end{aligned} \end{aligned}$$

In the last equality, we applied the definition of the conjugate function to the newly defined functions, where \(\chi _{S}\) denotes the characteristic function of S, with \(\chi _{S}(v) = 0\) if \(v \in S\) and \(\chi _{S}(v) = \infty \) otherwise. Now, using the conjugate identity \(f^*(y) = \alpha g^*(y)\) for the right-scalar-multiplied function \(f(x) = \alpha g(x/\alpha )\), and noting that \(\chi _{\alpha _{jk}S}(h)\) takes the same value as \(\alpha _{jk} \chi _{S}(h/{\alpha _{jk}})\), we can rewrite the constraint as

$$\begin{aligned} \begin{aligned} \sum _{j=1}^J \alpha _{jk}[-g_j +\chi _{S} + \lambda c_k]^*(0) \le s_k \quad k = 1,\dots ,K. \end{aligned} \end{aligned}$$
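
For completeness, the conjugate identity invoked here can be verified directly for \(\alpha > 0\):

$$\begin{aligned} f^*(y) = \underset{h}{\sup }\, \{ y^Th - \alpha g(h/\alpha )\} = \underset{u}{\sup }\, \{ y^T(\alpha u) - \alpha g(u)\} = \alpha \, \underset{u}{\sup }\, \{ y^Tu - g(u)\} = \alpha g^*(y), \end{aligned}$$

where we substituted \(h = \alpha u\); the case \(\alpha _{jk} = 0\) is covered by the zero convention adopted above.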

Next, borrowing results from Esfahani et al. [24, Theorem 4.2], Rockafellar and Wets [41, Theorem 11.23(a), p. 493], and Zhen et al. [51, Lemma B.8] on the conjugates of infimal convolutions and p-norm balls, we note that:

$$\begin{aligned} \begin{aligned} \alpha _{jk}[(-g_j +\chi _{S} + \lambda c_k)]^* (0)&= \alpha _{jk}\underset{y_{jk},z_{jk}}{\inf } ([-g_j]^*(z_{jk} - y_{jk},x) \\&\quad + \sigma _{S}(y_{jk}) + [\lambda c_k]^* (-z_{jk})), \end{aligned} \end{aligned}$$
$$\begin{aligned} {[}\lambda c_k]^* (-z_{jk})= & \underset{v_{jk}}{\sup }(-z_{jk}^T v_{jk} - \lambda \Vert v_{jk} - \bar{d}_k\Vert ^p + \lambda \epsilon ^p)\\= & -z_{jk}^T\bar{d}_k + \phi (q)\lambda {\Vert }z_{jk}/\lambda {\Vert }^q_* + \lambda \epsilon ^p. \end{aligned}$$

Substituting this in, we have

$$\begin{aligned} \begin{aligned} {\left\{ \begin{array}{ll} \underset{\alpha \in \Gamma }{\sup }~\underset{\lambda \ge 0, {s,z,y}}{\inf } & \sum _k^K s_k\\ \text{ subject } \text{ to } & \sum _{j=1}^J \alpha _{jk}([-g_j]^*(z_{jk} - y_{jk},x)\\ & + \sigma _{S}(y_{jk}) - z_{jk}^T\bar{d}_k + \phi (q)\lambda \left\| z_{jk}/\lambda \right\| ^q_* + \lambda \epsilon ^p ) \le s_k \\ & \hspace{2cm} \quad k = 1,\dots ,K. \end{array}\right. } \end{aligned} \end{aligned}$$

Taking the supremum over \(\alpha \), noting that \(\sum _{j=1}^J \alpha _{jk} = w_k\) for all k, we arrive at

$$\begin{aligned} \begin{aligned} {\left\{ \begin{array}{ll} ~\underset{\lambda \ge 0, {s,z,y}}{\inf } & \sum _k^K s_k\\ \text{ subject } \text{ to } & w_k ([-g_j]^*(z_{jk} - y_{jk},x) + \sigma _{S}(y_{jk}) - z_{jk}^T\bar{d}_k \\ & ~~ + \phi (q)\lambda \left\| z_{jk}/\lambda \right\| ^q_* + \lambda \epsilon ^p ) \le s_k \quad k = 1,\dots ,K, \quad j = 1,\dots ,J, \end{array}\right. } \end{aligned} \end{aligned}$$

which is equivalent to (15).
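
As an illustrative numerical check of the conjugate \([\lambda c_k]^*(-z_{jk})\) computed above, the following minimal sketch (not part of the paper; the values of p, \(\epsilon \), \(\lambda \) and the choice of the Euclidean norm are assumptions) compares a direct maximization with the closed form.

```python
# Check that sup_v { -z^T v - lam*||v - dbar||^p + lam*eps^p } equals
# -z^T dbar + phi(q)*lam*||z/lam||^q + lam*eps^p for the (self-dual) Euclidean norm.
import numpy as np

rng = np.random.default_rng(1)
m, p, lam, eps = 4, 3.0, 0.7, 0.4
q = p / (p - 1)
phi = (q - 1) ** (q - 1) / q ** q
dbar, z = rng.normal(size=m), rng.normal(size=m)

# The supremum is attained along v = dbar - t*z/||z||, t >= 0, so a 1-D search suffices.
t = np.linspace(0.0, 50.0, 500_001)
lhs = np.max(t * np.linalg.norm(z) - lam * t ** p) - z @ dbar + lam * eps ** p
rhs = -z @ dbar + phi * lam * np.linalg.norm(z / lam) ** q + lam * eps ** p
print(lhs, rhs)  # the two values should agree to several decimal places
```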

Reformulation of the maximum-of-concave case for \({p = \infty }\)

We again give the general proof for \(J \ge 1\). For the case \(J=1\), refer to the simpler proof in [47, Appendix B]. When \({p = \infty }\), we have

$$\begin{aligned} \begin{array}{ll} \text {minimize} & f(x)\\ \text {subject to} \quad & \underset{v_{11}, \dots , v_{JK} \in S,\, \alpha \in \Gamma }{\sup } \left\{ \sum _{k=1}^K \sum _{j=1}^J \alpha _{jk} g_j(v_{jk}, x) ~\Big |~ \sum _{j=1}^J (\alpha _{jk}/w_k) \Vert v_{jk} - \bar{d}_k\Vert \le \epsilon ,\ k = 1,\dots ,K \right\} \le 0, \end{array} \end{aligned}$$
(17)

which has a reformulation in which the worst-case constraint above is dualized,

$$\begin{aligned} \begin{array}{ll} \text {minimize} & f(x)\\ \text {subject to} \quad & \sum _{k=1}^{K} w_k s_k \le 0\\ & [-g_j]^*(z_{jk} - y_{jk},x) + \sigma _{{S}}(y_{jk}) - z_{jk}^T\bar{d}_k + \lambda _k \epsilon \le s_k\\ & \hspace{3cm} \quad k = 1,\dots , K, \quad j = 1,\dots , J\\ & \left\| z_{jk} \right\| _* \le \lambda _k \quad k = 1,\dots , K, \quad j = 1,\dots , J, \end{array} \end{aligned}$$
(18)

with new variables \(s_k \in {\textbf{R}}\), \(\lambda _k \in {\textbf{R}}\), \(z_{jk} \in {\textbf{R}}^m\), and \(y_{jk} \in {\textbf{R}}^m\). We prove this by starting from the inner optimization problem of (17):

$$\begin{aligned} \begin{aligned}&{\left\{ \begin{array}{ll} \underset{v_{11}, \dots , v_{JK} \in S, \alpha \in \Gamma }{\sup } & \sum _{k=1}^K \sum _{j=1}^J \alpha _{jk} g_j(v_{jk}, x) \\ \text{ subject } \text{ to } & \sum _{j=1}^J (\alpha _{jk}/w_k) \Vert v_{jk} - \bar{d}_k\Vert \le \epsilon , \quad k = 1,\dots ,K \end{array}\right. }\\&={\left\{ \begin{array}{ll} \underset{v_{11}, \dots , v_{JK} \in S, \alpha \in \Gamma }{\sup } & \underset{\lambda \ge 0}{\inf }~\sum _{k=1}^K (\sum _{j=1}^J \alpha _{jk} g_j(v_{jk}, x) \\ & \qquad + \lambda _k (\epsilon - \sum _{j=1}^J (\alpha _{jk}/w_k) \Vert v_{jk} - \bar{d}_k\Vert ))\\ \end{array}\right. }\\&={\left\{ \begin{array}{ll} \underset{\alpha \in \Gamma }{\sup }~\underset{\lambda \ge 0}{\inf }~\underset{v_{11}, \dots , v_{JK} \in S}{\sup }& \sum _{k=1}^K (\sum _{j=1}^J \alpha _{jk} g_j(v_{jk}, x) \\ & \qquad + \lambda _k (\epsilon - \sum _{j=1}^J (\alpha _{jk}/w_k) \Vert v_{jk} - \bar{d}_k\Vert )) \end{array}\right. } \end{aligned} \end{aligned}$$

We have again formed the Lagrangian and applied the minimax theorem to the sum of concave functions in \(v_{jk}\). Now, we can rewrite this in epigraph form,

$$\begin{aligned} \begin{aligned}&= {\left\{ \begin{array}{ll} \underset{\alpha \in \Gamma }{\sup }~\underset{\lambda \ge 0,s}{\inf }\ & \sum _{k=1}^K s_k \\ \text{ subject } \text{ to }& \underset{v_{11}, \dots , v_{JK} \in S}{\sup }~ \lambda _k \epsilon + \sum _{j=1}^J \alpha _{jk}g_j(v_{jk}, x)\\ & \qquad - \lambda _k(\alpha _{jk}/w_k) \Vert v_{jk} - \bar{d}_k\Vert \le s_k \qquad \qquad \quad k = 1,\dots ,K, \end{array}\right. }\\ \end{aligned} \end{aligned}$$

then use the definition of the dual norm to rewrite the constraints,

$$\begin{aligned} \begin{aligned}&{\left\{ \begin{array}{ll} \underset{v_{11}, \dots , v_{JK} \in S}{\sup }~ \lambda _k \epsilon + \sum _{j=1}^J \underset{\Vert z_{jk}\Vert _* \le \lambda _k/w_k}{\min } \alpha _{jk}g_j(v_{jk}, x) \\ \qquad \quad \qquad - \alpha _{jk}z_{jk}^T(v_{jk} - \bar{d}_k)\le s_k \quad k = 1,\dots ,K, \end{array}\right. }\\&= {\left\{ \begin{array}{ll} & ~{\underset{\alpha _{11}v_{11}\in \alpha _{11}S, \dots , \alpha _{JK}v_{JK} \in \alpha _{JK}S}{\sup }}~ \lambda _k \epsilon + \sum _{j=1}^J \underset{\Vert z_{jk}\Vert _* \le \lambda _k/w_k}{\min } \alpha _{jk}g_j\\ & \qquad \qquad \qquad ((\alpha _{jk}v_{jk})/\alpha _{jk}, x) - \alpha _{jk}z_{jk}^T(v_{jk} - \bar{d}_k)\le s_k \quad \qquad k = 1,\dots ,K. \end{array}\right. }\\ \end{aligned} \end{aligned}$$

In the last step, we again rewrote \(v_{jk} = (\alpha _{jk}v_{jk})/\alpha _{jk}\) inside functions \(g_j\). Using similar logic as in Appendix A, we note that this does not change the optimal value for both \(\alpha _{jk} > 0\) and \(\alpha _{jk} = 0\), as taking the supremum over the transformed variables is equivalent to taking the supremum over the original variables. For conciseness, we omit the steps of creating the right-scalar-multiplied function and transformations. For details, please refer to Appendix A. We apply the definition of conjugate functions and arrive at the optimization problem

$$\begin{aligned} \begin{aligned}&{\left\{ \begin{array}{ll} \underset{\alpha \in \Gamma }{\sup }~\underset{\lambda \ge 0,{s,z}}{\inf } & \sum _{k=1} ^K s_k \\ \text{ subject } \text{ to }& \lambda _k \epsilon + \sum _{j=1}^J \alpha _{jk}[-g_j +\chi _{S}]^*(-z_{jk},x) +\alpha _{jk}z_{jk}^T\bar{d}_k \le s_k \\ & \quad k = 1,\dots ,K\\ & \Vert z_{jk}\Vert _* \le \lambda _k/w_k \quad k = 1,\dots ,K, \quad j = 1,\dots ,J. \end{array}\right. }\\ \end{aligned} \end{aligned}$$

Now, replacing \(\lambda _k\) with \(\lambda _k w_k\) and \(z_{jk}\) with \(-z_{jk}\), and substituting in the conjugate functions derived in Appendix A, we have

$$\begin{aligned} \begin{aligned}&= {\left\{ \begin{array}{ll} \underset{\alpha \in \Gamma }{\sup }~\underset{\lambda \ge 0,{s,z}}{\inf } & \sum _{k=1} ^K s_k \\ \text{ subject } \text{ to } & \lambda _k w_k\epsilon + \sum _{j=1}^J \alpha _{jk} ([-g_j]^*(z_{jk} - y_{jk},x) + \sigma _{S}(y_{jk}) \\ & \hspace{2cm} - z_{jk}^T\bar{d}_k) \le s_k \quad k = 1,\dots ,K\\ & \Vert z_{jk}\Vert _*\le \lambda _k,\quad k = 1,\dots ,K, \quad j = 1,\dots ,J. \end{array}\right. }\\ \end{aligned} \end{aligned}$$

Note that rescaling \(\lambda _k\) does not affect the value of the problem, as minimizing over \(\lambda _k\) is equivalent to minimizing over \(\lambda _k w_k\) when \(w_k >0\). Lastly, taking the supremum over \(\alpha \), we arrive at

$$\begin{aligned} \begin{aligned}&= {\left\{ \begin{array}{ll} \underset{\lambda \ge 0,{s,z}}{\inf } & \sum _{k=1} ^K s_k \\ \text{ subject } \text{ to } & w_k(\lambda _k\epsilon + [-g_j]^*(z_{jk} - y_{jk},x) + \sigma _{S}(y_{jk}) - z_{jk}^T\bar{d}_k) \le s_k\\ & \quad k = 1,\dots ,K, \quad j = 1,\dots ,J\\ & \Vert z_{jk}\Vert _*\le \lambda _k,\quad k = 1,\dots ,K, \quad j = 1,\dots ,J. \end{array}\right. }\\ \end{aligned} \end{aligned}$$

Proof of the dual problem reformulation as \(\mathbf {p \rightarrow \infty }\)

We prove the dual equivalence. For a proof of primal equivalence, see the appendix of [47].

Theorem 6

Let S be a bounded set, and define

$$\begin{aligned} \bar{g}^K(x;\infty )=\left\{ \begin{array}{l@{~~~~}l} \text{ minimize } & \sum _{k=1}^K w_k s_k\\ \text{ subject } \text{ to } & \lambda \ge 0,~ z_{k}\in {\textbf{R}}^m, ~y_{k}\in {\textbf{R}}^m, ~s_k\in {\textbf{R}} \quad k=1,\dots , K,\\ & [-g]^*(z_{k} - y_{k}) + \sigma _{S}(y_{k}) - z_{k}^T{d}_k + \epsilon \left\| z_{k} \right\| _* \le s_k\\ & \hspace{5.5cm} \quad k=1,\dots , K. \end{array} \right. \end{aligned}$$
(19)

Then, \( \lim _{p\rightarrow \infty } \bar{g}^K(x;p)= \bar{g}^K(x;\infty ) \) for any \(x\in X\).

Proof

First, from Equation (5) we have for any \(p> 1\) that

$$\begin{aligned}&\bar{g}^K(x;p)\\ =&\left\{ \begin{array}{l@{~~~~}l} \text{ minimize } & \sum _{k=1}^K w_k s_k\\ \text{ subject } \text{ to } & \lambda \ge 0,~ z_{k}\in {\textbf{R}}^m, ~y_{k}\in {\textbf{R}}^m, ~s_k\in {\textbf{R}}^m \quad k = 1,\dots ,K,\\ & [-g]^*(z_{k} - y_{k}) + \sigma _{S}(y_{k}) - z_{k}^T{d}_k\\ & \qquad \qquad \quad + \phi (q)\lambda \left\| z_{k}/\lambda \right\| ^q_* + \lambda \epsilon ^p\le s_k \qquad k=1,\dots , K \end{array} \right. \\ \ge&\left\{ \begin{array}{l@{~~~~}l} \text{ minimize } & \sum _{k=1}^K w_k s_k\\ \text{ subject } \text{ to } & \lambda _k \ge 0,~ z_{k}\in {\textbf{R}}^m, ~y_{k}\in {\textbf{R}}^m, ~s_k\in {\textbf{R}}^m \quad k = 1,\dots ,K,\\ & [-g]^*(z_{k} - y_{k}) + \sigma _{S}(y_{k}) - z_{k}^T{d}_k \\ & \qquad \qquad \quad + \phi (q)\lambda _k \left\| z_{k}/\lambda _k \right\| ^q_* + \lambda _k \epsilon ^p\le s_k \qquad k=1,\dots , K \end{array} \right. \\ =&\left\{ \begin{array}{l@{~~~~}l} \text{ minimize } & \sum _{k=1}^K w_k s_k\\ \text{ subject } \text{ to } & z_{k}\in {\textbf{R}}^m, ~y_{k}\in {\textbf{R}}^m, ~s_k\in {\textbf{R}}^m \quad k = 1,\dots , K,\\ & [-g]^*(z_{k} - y_{k}) + \sigma _{S}(y_{k}) - z_{k}^T{d}_k \\ & \qquad \qquad \quad + \epsilon \left\| z_{k} \right\| _*\le s_k \qquad k =1,\dots , K \end{array} \right. \end{aligned}$$

where the first equality is established in Appendix A and the second equality follows from Lemma 1. Remark that the inequality in the second step simply follows because we introduce \(\lambda _k\) and do not impose that \(\lambda _k=\lambda \) for all \(k=1,\dots , K\). Hence, letting p tend to infinity gives \( {\liminf _{p\rightarrow \infty }} \bar{g}^K(x;p)\ge \bar{g}^K(x;\infty ). \) It remains to prove the reverse inequality \( {\limsup _{p\rightarrow \infty }} \bar{g}^K(x;p) \le \bar{g}^K(x;\infty ). \)

Second, we have for any \(p>1\) with \(1/p+1/q=1\) that

$$\begin{aligned}&\bar{g}^K(x;p)\\ \le&\left\{ \begin{array}{l@{~~~~}l} \text{ minimize } & \sum _{k=1}^K w_k s_k\\ \text{ subject } \text{ to } & z_{k}\in {\textbf{R}}^m, ~y_{k}\in {\textbf{R}}^m, ~s_k\in {\textbf{R}}^m \quad k=1,\dots ,K,\\ & [-g]^*(z_{k} - y_{k}) + \sigma _{S}(y_{k}) - z_{k}^T{d}_k + \phi (q)\\ & \qquad \qquad \left[ \frac{q-1}{q} \epsilon ^{\frac{1}{1-q}} \max _{k'=1}^K\Vert z_{k'} \Vert _*\right] ^{1-q} \left\| z_{k} \right\| ^q_* \\ & \qquad \qquad + \left[ \frac{q-1}{q} \epsilon ^{\frac{1}{1-q}} \max _{k'=1}^K\Vert z_{k'} \Vert _*\right] \epsilon ^p\le s_k \quad k=1,\dots , K\\ & (q-1)^{1/4} \le \Vert z_k \Vert _* \le (q-1)^{-1/4}\quad k =1,\dots , K, \end{array} \right. \end{aligned}$$

which follows from the choice \(\lambda _k = \left[ \frac{q-1}{q} \epsilon ^{\frac{1}{1-q}} \max _{k'=1}^K\Vert z_{k'} \Vert _*\right] \) and by imposing the restrictions \((q-1)^{1/4} \le \Vert z_k \Vert _* \le (q-1)^{-1/4}\) for all \(k =1,\dots , K\). Next, using the identities \(\phi (q)\left[ \frac{q-1}{q}\right] ^{1-q} = (q-1)^{(q-1)}/q^q \left[ \frac{q-1}{q}\right] ^{1-q} = 1/q\) and \(p+\frac{1}{1-q} = \frac{1}{\frac{1}{p}} + \frac{1}{1-q} = \frac{1}{1-\frac{1}{q}}+\frac{1}{1-q} = \frac{-q}{-q+1}+\frac{1}{1-q}=\frac{1-q}{1-q}=1\), as well as pulling \(\epsilon \left\| z_{k} \right\| _*>0\) out of the last two terms, we can rewrite the first constraints as

$$\begin{aligned}&\left\{ \begin{array}{l@{~~~~}l} & [-g]^*(z_{k} - y_{k}) + \sigma _{S}(y_{k}) - z_{k}^T{d}_k \\ & \qquad +\epsilon \left\| z_{k} \right\| _* \left[ \frac{1}{q} \left[ \max _{k'=1}^K\frac{\Vert z_{k'} \Vert _*}{\left\| z_{k} \right\| _*}\right] ^{1-q} +\frac{q-1}{q} \max _{k'=1}^K\frac{\Vert z_{k'} \Vert _*}{\left\| z_{k} \right\| _*} \right] \le s_k \quad k =1,\dots , K.\\ \end{array} \right. \end{aligned}$$

Then, we note that \(\max _{k'=1}^K \Vert z_{k'} \Vert _*/\left\| z_{k} \right\| _* \ge \Vert z_{k} \Vert _*/\left\| z_{k} \right\| _* = 1\) and \(\max _{k'=1}^K \Vert z_{k'} \Vert _*/{\left\| z_{k} \right\| _*} \le (q-1)^{-1/2}\), and hence we can apply Lemma 2 to obtain

$$\begin{aligned}&\bar{g}^K(x;p) \le \left\{ \begin{array}{l@{~~~~}l} \text{ minimize } & \sum _{k=1}^K w_k s_k\\ \text{ subject } \text{ to } & z_{k}\in {\textbf{R}}^m, ~y_{k}\in {\textbf{R}}^m, ~s_k\in {\textbf{R}}^m \quad k=1,\dots , K,\\ & [-g]^*(z_{k} - y_{k}) + \sigma _{S}(y_{k}) - z_{k}^T{d}_k \\ & \qquad \qquad +\epsilon \left\| z_{k} \right\| _* D(q) \le s_k \quad k =1,\dots , K,\\ & (q-1)^{1/4} \le \Vert z_k \Vert _* \le (q-1)^{-1/4}\quad k=1,\dots , K. \end{array} \right. \end{aligned}$$

Let

$$\begin{aligned} \bar{g}_u^K(x;p)=\left\{ \begin{array}{l@{~}l} \text{ minimize } & \sum _{k=1}^K w_k s_k\\ \text{ subject } \text{ to } & z_{k}\in {\textbf{R}}^m, ~y_{k}\in {\textbf{R}}^m, ~s_k\in {\textbf{R}}^m \quad k =1,\dots , K,\\ & [-g]^*(z_{k} - y_{k}) + \sigma _{S}(y_{k}) - z_{k}^T{d}_k \\ & \qquad \qquad +\epsilon \left\| z_{k} \right\| _* D\left( \frac{p}{p-1}\right) \le s_k \quad k =1,\dots , K\\ & (p-1)^{-1/4} \le \Vert z_k \Vert _* \le (p-1)^{1/4}\quad k =1,\dots , K. \end{array} \right. \end{aligned}$$
(20)

Hence, as \(q=p/(p-1)\) and \(q-1=1/(p-1)\) we have \( \bar{g}^K(x;p) \le \bar{g}_u^K(x;p) \) for all \(p>1\). Hence, taking the limit \(p\rightarrow \infty \) we have \(\bar{g}^K(x;\infty )\le \limsup _{p\rightarrow \infty } \bar{g}^K(x;p)\). In fact, as the function \(D\left( \frac{p}{p-1}\right) \) defined in Lemma 2 is nonincreasing for all p sufficiently large this implies that \(\bar{g}^K_u(x;p)\) is nonincreasing for p sufficiently large and hence we have \({\bar{g}^K(x;\infty )\le \limsup _{p\rightarrow \infty } \bar{g}^K(x;p) }\le \liminf _{p\rightarrow \infty } \bar{g}_u^K(x;p) = \lim _{p\rightarrow \infty } \bar{g}_u^K(x;p)\). We now prove here that \(\lim _{p\rightarrow \infty } \bar{g}_u^K(x;p) = \liminf _{p\rightarrow \infty } \bar{g}_u^K(x;p)\le \bar{g}^K(x;\infty )\). Consider any feasible sequence \(\{(z^t_{k}, \,y^t_{k}, \,s^t_k=[-g]^*(z^t_{k} - y^t_{k}) + \sigma _{S}(y^t_{k}) - (z^t_{k})^T{d}_k +\epsilon \left\| z^t_{k} \right\| _* )\}_{t\ge 1}\) in the optimization problem characterizing \(\bar{g}^K(x;\infty )\) in Equation (19) so that \(\lim _{t\rightarrow \infty }\sum _{k=1}^K w_k s^t_k=\bar{g}^K(x;\infty )\). Let \(\tilde{z}^t_{k}\in \arg \max \{ \Vert z \Vert _* \mid z\in {\textbf{R}}^m, \, \Vert z-z_k^t \Vert _*\le 1/t \}\) for all \(t\ge 1\) and \(k=1,\dots , K\) and observe that \(\Vert \tilde{z}^t_k \Vert _*= 1/t+\Vert z_k^t \Vert _*\ge 1/t\). Consider now an increasing sequence \(\{p_t\}_{t\ge 1}\) so that \((p_t-1)^{1/4}\ge \max ^K_{k=1} \Vert \tilde{z}^t_k \Vert _*\) and \((p_t-1)^{-1/4}\le 1/t\). Finally observe that the auxiliary sequence \(\{(\tilde{z}^t_{k}, \,\tilde{y}^t_{k}=y^t_k + (\tilde{z}^t_{k}-z^t_{k}), \,\tilde{s}^t_k= [-g]^*(\tilde{z}^t_{k} - \tilde{y}^t_{k}) + \sigma _{S}(\tilde{y}^t_{k}) - (\tilde{z}^t_{k})^T{d}_k +\epsilon \left\| \tilde{z}^t_{k} \right\| _* D\left( p_t/(p_t-1)\right) )\}_{t\ge 1}\) is by construction feasible in the minimization problem characterizing the function \(\bar{g}_u^K(x;p_t)\) in Equation (20). Hence, finally, we have

$$\begin{aligned} \lim _{p\rightarrow \infty }&g_u^K(x;p) = \lim _{t\rightarrow \infty } g_u^K(x;p_t) = \lim _{t\rightarrow \infty } \sum _{k=1}^K w_k \tilde{s}_k^t\\ {=}&\lim _{t\rightarrow \infty } \sum _{k=1}^Kw_k\left( [-g]^*(\tilde{z}^t_{k} - \tilde{y}^t_{k}) + \sigma _{S}(\tilde{y}^t_{k}) - (\tilde{z}^t_{k})^T{d}_k +\epsilon \left\| \tilde{z}^t_{k} \right\| _* D\left( p_t/(p_t-1)\right) \right) \\ \le&\lim _{t\rightarrow \infty } \sum _{k=1}^Kw_k\left( [-g]^*(z^t_{k} - y^t_{k}) + \sigma _{S}(y^t_{k}) - (z^t_{k})^T{d}_k +\epsilon \left\| z^t_{k} \right\| _* D\left( p_t/(p_t-1)\right) \right) \\&\qquad + \sum _{k=1}^Kw_k\left( \max _{s\in S} \Vert s \Vert +\Vert d_k \Vert +\epsilon D\left( p_t/(p_t-1)\right) \right) /t\\ \le&\lim _{t\rightarrow \infty } \sum _{k=1}^Kw_k\left( [-g]^*(z^t_{k} - y^t_{k}) + \sigma _{S}(y^t_{k}) - (z^t_{k})^T{d}_k +\epsilon \left\| z^t_{k} \right\| _* D\left( p_t/(p_t-1)\right) \right) \\ {=}&\lim _{t\rightarrow \infty } \sum _{k=1}^Kw_k([-g]^*(z^t_{k} - y^t_{k}) + \sigma _{S}(y^t_{k}) - (z^t_{k})^T{d}_k \\&\quad +\epsilon \left\| z^t_{k} \right\| _* + \epsilon \left\| z^t_{k} \right\| _* \left( D\left( p_t/(p_t-1)\right) -1\right) )\\ \le&\lim _{t\rightarrow \infty } \sum _{k=1}^Kw_k([-g]^*(z^t_{k} - y^t_{k}) + \sigma _{S}(y^t_{k}) - (z^t_{k})^T{d}_k \\&\quad +\epsilon \left\| z^t_{k} \right\| _* + \epsilon (p_t-1)^{1/4} \left( D\left( p_t/(p_t-1)\right) -1\right) )\\ \le&\lim _{t\rightarrow \infty } \sum _{k=1}^Kw_ks_k = \bar{g}^K(x;\infty ). \end{aligned}$$

To establish the third inequality observe first that \(-(\tilde{z}^t_{k})^T{d}_k = -(z^t_{k})^T{d}_k - (\tilde{z}^t_k-z^t_k)^Td_k\le -(z^t_{k})^T{d}_k + \Vert \tilde{z}^t_k-z^t_k \Vert _* \Vert d_k \Vert \le -(z^t_{k})^T{d}_k + \Vert d_k \Vert /t\). Second, remark that we have

$$\begin{aligned} \sigma _{S}(\tilde{y}^t_{k}) = \sigma _{S}(y^t_{k}+(\tilde{z}^t_k-z_k^t)) \le&\max _{s\in S} s^T(y^t_{k}+(\tilde{z}^t_k-z_k^t)) \\ \le&\max _{s\in S} s^Ty^t_{k}+\max _{s\in S} s^T(\tilde{z}^t_k-z_k^t)\\ \le&{\max _{s\in S} s^Ty^t_{k}+ \Vert s \Vert \Vert \tilde{z}^t_k-z_k^t \Vert _* \le \max _{s\in S}~s^Ty^t_{k}+ 1/t \max _{s\in S} \Vert s \Vert }, \end{aligned}$$

as \(\Vert \tilde{z}_t-z_t \Vert \le 1/t\). Lemma 2 guarantees that \(\lim _{t\rightarrow \infty } D\left( p_t/(p_t-1)\right) =1\). Finally, \(\left\| z^t_{k} \right\| _*\le \left\| \tilde{z}^t_{k} \right\| _*\le (p_t-1)^{1/4}\) and

$$\begin{aligned} \lim _{t\rightarrow \infty } (p_t-1)^{1/4} \left( D\left( p_t/(p_t-1)\right) -1\right) =&\lim _{p\rightarrow \infty } (p-1)^{1/4} \left( D\left( p/(p-1)\right) -1\right) \\ =&\lim _{q\rightarrow 1} (q-1)^{-1/4} \left( D(q)-1 \right) =0 \end{aligned}$$

with \(1/p+1/q=1\) using again Lemma 2. \(\square \)

Lemma 1

We have

$$ \min _{\lambda \ge 0} ~\phi (q)\lambda \left\| z/\lambda \right\| ^q_* + \lambda \epsilon ^p = \Vert z \Vert _*\epsilon $$

for any \(p>1\) and \(q>1\) for which \(1/p+1/q=1\), \(\phi (q)=(q-1)^{q-1}/q^q\) and \(\epsilon >0\).

Proof

Remark that the objective function \(\lambda \mapsto \phi (q)\lambda \left\| z/\lambda \right\| ^q_* + \lambda \epsilon ^p\) is continuous and, since \(\epsilon >0\), satisfies \(\lim _{\lambda \rightarrow 0} \phi (q)\lambda \left\| z/\lambda \right\| ^q_* + \lambda \epsilon ^p = \lim _{\lambda \rightarrow \infty } \phi (q)\lambda \left\| z/\lambda \right\| ^q_* + \lambda \epsilon ^p=\infty \) whenever \(z \ne 0\) (the case \(z = 0\) is immediate); hence there must exist a minimizer \(\lambda _\star \in \arg \min _{\lambda \ge 0}\, \phi (q)\lambda \left\| z/\lambda \right\| ^q_* + \lambda \epsilon ^p\) with \(\lambda _\star >0\). The necessary and sufficient first-order convex optimality conditions of the minimization problem guarantee

$$\begin{aligned}&\lambda _\star \in \arg \min _{\lambda \ge 0} ~\phi (q)\lambda \left\| z/\lambda \right\| ^q_* + \lambda \epsilon ^p \iff (1-q)\phi (q) \lambda _\star ^{-q} \Vert z \Vert _\star ^q + \epsilon ^p = 0\\ \iff&\epsilon ^p = (q-1) \phi (q) \lambda _\star ^{-q} \Vert z \Vert _\star ^q \iff \lambda _\star = \left[ (q-1) \phi (q)\right] ^{1/q} \Vert z \Vert _\star \epsilon ^{-p/q} \\&\iff \lambda _\star = \frac{q-1}{q} \epsilon ^{\frac{1}{1-q}} \Vert z \Vert _\star \end{aligned}$$

where we exploit that \(1/p+1/q=1\) and \(\phi (q)=(q-1)^{q-1}/q^q\). Indeed, we have

$$\begin{aligned} \left[ (q-1) \phi (q)\right] ^{1/q}= & \left[ (q-1)^{q}/q^q\right] ^{1/q}= (q-1)/q, \quad \\ -\frac{p}{q}= & -\frac{1}{\frac{1}{p} q} = -\frac{1}{(1-1/q) q}=-\frac{1}{q-1}=\frac{1}{1-q}. \end{aligned}$$

Hence, we have

$$\begin{aligned}&\min _{\lambda \ge 0} ~\phi (q) \lambda ^{1-q} \Vert z \Vert _\star ^q+\lambda \epsilon ^p= ~\phi (q) \lambda _\star ^{1-q} \Vert z \Vert _\star ^q+\lambda _\star \epsilon ^p\\&\qquad = ~\phi (q) \left[ \frac{(q-1)^{1-q}}{q^{1-q}} \epsilon \Vert z \Vert ^{1-q}_\star \right] \Vert z \Vert _\star ^q +\left[ \frac{q-1}{q} \epsilon ^{\frac{1}{1-q}} \Vert z \Vert _\star \right] \epsilon ^p\\&\qquad = ~\phi (q) \frac{(q-1)^{1-q}}{q^{1-q}} \epsilon \Vert z \Vert _\star +\frac{q-1}{q} \epsilon ^{p+\frac{1}{1-q}} \Vert z \Vert _\star \\&\qquad = \phi (q) \frac{(q-1)^{1-q}}{q^{1-q}} \epsilon \Vert z \Vert _\star +\frac{q-1}{q} \epsilon \Vert z \Vert _\star \\&\qquad = ~\frac{(q-1)^{q-1}}{q^q} \frac{(q-1)^{1-q}}{q^{1-q}} \epsilon \Vert z \Vert _\star +\frac{q-1}{q} \epsilon \Vert z \Vert _\star = \frac{1}{q} \epsilon \Vert z \Vert _\star +\frac{q-1}{q} \epsilon \Vert z \Vert _\star \\&\qquad = \left[ \frac{1}{q} +\frac{q-1}{q} \right] \epsilon \Vert z \Vert _\star = \epsilon \Vert z \Vert _\star \end{aligned}$$

where we exploit that \(1/p+1/q=1\) and \(\phi (q)=(q-1)^{q-1}/q^q\). Indeed, we have

$$ p+\frac{1}{1-q} = \frac{1}{\frac{1}{p}} + \frac{1}{1-q} = \frac{1}{1-\frac{1}{q}}+\frac{1}{1-q} = \frac{-q}{-q+1}+\frac{1}{1-q}=\frac{1-q}{1-q}=1 $$

establishing the claim. \(\square \)
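
The identity in Lemma 1 is also easy to check numerically; the following minimal sketch (not part of the paper; the parameter values and the Euclidean norm are assumptions) compares a grid minimization over \(\lambda \) with the closed form.

```python
# Numerical check of Lemma 1: min_{lambda>0} phi(q)*lambda*||z/lambda||^q + lambda*eps^p
# equals ||z||*eps, with minimizer lambda_star = (q-1)/q * eps^(1/(1-q)) * ||z||.
import numpy as np

p, eps = 3.0, 0.4
q = p / (p - 1)
phi = (q - 1) ** (q - 1) / q ** q
znorm = np.linalg.norm(np.array([1.2, -0.7, 0.3]))

lam = np.linspace(1e-6, 20.0, 2_000_001)
vals = phi * lam * (znorm / lam) ** q + lam * eps ** p
print(vals.min(), znorm * eps)       # the minimum value matches ||z|| * eps

lam_star = (q - 1) / q * eps ** (1 / (1 - q)) * znorm
print(lam[vals.argmin()], lam_star)  # the grid minimizer matches the closed form
```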

Lemma 2

Let \(q>1\). Then

$$ \max _{t\in [1, 1/\sqrt{q-1}]} ~\frac{1}{q} t^{1-q} +\frac{q-1}{q} t = D(q)=\max \left( 1, \frac{1}{q} \frac{1}{(q-1)^{(1-q)/2}} +\frac{\sqrt{q-1}}{q}\right) $$

with \(\lim _{q\rightarrow 1} D(q)=1\) and \(\lim _{q\rightarrow 1} (q-1)^{1/4} \left( D(q)-1\right) =0\).

Proof

The objective function is convex in t, and a convex function attains its maximum at an extreme point of its domain, here the endpoints \(t=1\) and \(t=1/\sqrt{q-1}\) of the interval, with objective values 1 and \(\frac{1}{q}(q-1)^{(q-1)/2} +\frac{\sqrt{q-1}}{q}\), respectively; taking the larger of the two gives D(q). The limits can be verified using standard manipulations. \(\square \)
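
For concreteness, a minimal numerical sketch (not part of the paper) of the endpoint maximization and of the two limits claimed in Lemma 2:

```python
# Check Lemma 2: the convex objective (1/q)*t^(1-q) + ((q-1)/q)*t on [1, 1/sqrt(q-1)]
# attains its maximum at an endpoint, equal to D(q); moreover D(q) -> 1 and
# (q-1)^(1/4) * (D(q) - 1) -> 0 as q -> 1.
import numpy as np

def D(q):
    return max(1.0, (q - 1) ** ((q - 1) / 2) / q + np.sqrt(q - 1) / q)

for q in (2.0, 1.5, 1.1):
    t = np.linspace(1.0, 1.0 / np.sqrt(q - 1), 1_000_001)
    obj = t ** (1 - q) / q + (q - 1) / q * t
    print(q, obj.max(), D(q))                      # grid maximum agrees with D(q)

for q in (1.1, 1.01, 1.001, 1.0001):
    print(q, D(q), (q - 1) ** 0.25 * (D(q) - 1))   # both tend to their claimed limits
```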

Proof of Theorem 4

We prove (i) \(\bar{g}^N(x) \le \bar{g}^K(x)\), (ii) \(\bar{g}^K(x) \le \bar{g}^{N*}(x) + (L/2) D(K)\), and (iii) when the support constraint does not affect the worst-case value, \({\bar{g}^K(x) \le \bar{g}^{N}(x) + (L/2) D(K)}\).

Proof of (i). We begin with a feasible solution \(v_1, \dots , v_{N}\) of (MRO-K) with \(K=N\). Then for \(K < N\), we set \(u_k = \sum _{i \in C_k}{v_i}/|C_k|\) for each of the K clusters. We see \(u_k\) with \(k = 1,\dots ,K\) satisfies the constraints of (MRO-K) for \(K < N\), as

$$\begin{aligned} \sum _{k = 1}^K \frac{|C_k|}{N}\left\| u_k - \bar{d}_k \right\| ^p&= \sum _{k = 1}^K \frac{|C_k|}{N} \left\| \frac{\sum _{i \in C_k}{v_i}}{|C_k|} - \frac{\sum _{i \in C_k}{d_i}}{|C_k|} \right\| ^p\\&\le \sum _{k = 1}^K \frac{|C_k|}{N} \sum _{i \in C_k} \frac{1}{|C_k|} \Vert v_i - d_i\Vert ^p\\&= \sum _{k = 1}^K \frac{1}{N} \sum _{i \in C_k} \Vert v_i - d_i\Vert ^p \le \epsilon ^p, \end{aligned}$$

where we have applied the triangle inequality, Jensen's inequality for the convex function \(f(x) = \Vert x\Vert ^p\), and the constraint of (MRO-K) with \(K=N\). In addition, since the support \(S\) is convex, for every k the constructed \(u_k\), being the average of points \(v_i \in S\), must also lie in \(S\). The same argument applies to the domain of g.

Since we have shown that the \(u_k\)'s satisfy the constraints of (MRO-K) with \(K<N\), they form a feasible solution. We now show that, for this pair of feasible solutions, the weighted objective of the clustered problem dominates that of the original one, i.e., \(\sum _{k=1}^K ({|C_k|}/{N})\, g(u_k,x) \ge (1/N)\sum _{i=1}^N g(v_i,x)\). By assumption, g is concave in the uncertain parameter, so by Jensen's inequality,

$$\begin{aligned} \sum _{k = 1}^K \frac{|C_k|}{N} g \left( \frac{1}{|C_k|}\sum _{i \in C_k}v_i, x\right)&\ge \sum _{k = 1}^K \frac{|C_k|}{N} \frac{1}{|C_k|}\sum _{i \in C_k} g(v_i)\\ \sum _{k = 1}^K \frac{|C_k|}{N}g(u_k,x)&\ge \frac{1}{N}\sum _{i \in N} g(v_i). \end{aligned}$$

Since this holds for \(u_k\)'s constructed from any feasible solution \(v_1, \dots , v_N\), we must have \(\bar{g}^K(x) \ge \bar{g}^N(x)\).
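
The two Jensen-type inequalities used in this argument can be sanity-checked numerically. Below is a minimal sketch (not part of the paper; the synthetic data, the arbitrary cluster assignment, and the concave test function \(g(v) = -\Vert v\Vert _2^2\) are illustrative assumptions).

```python
# For feasible v_1,...,v_N, the cluster averages u_k = mean_{i in C_k} v_i satisfy
# sum_k (|C_k|/N)*||u_k - dbar_k||^p <= (1/N)*sum_i ||v_i - d_i||^p   (feasibility), and
# sum_k (|C_k|/N)*g(u_k) >= (1/N)*sum_i g(v_i) for concave g          (objective).
import numpy as np

rng = np.random.default_rng(2)
N, m, K, p = 200, 3, 8, 2.0
d = rng.normal(size=(N, m))                        # data d_i
labels = rng.integers(0, K, size=N)                # an arbitrary assignment into K clusters
v = d + 0.1 * rng.normal(size=(N, m))              # perturbed (feasible) points v_i

dbar = np.stack([d[labels == k].mean(axis=0) for k in range(K)])
u = np.stack([v[labels == k].mean(axis=0) for k in range(K)])
w = np.bincount(labels, minlength=K) / N

lhs = np.sum(w * np.linalg.norm(u - dbar, axis=1) ** p)
rhs = np.mean(np.linalg.norm(v - d, axis=1) ** p)
print(lhs <= rhs + 1e-12)                          # feasibility is preserved

g = lambda x: -np.sum(x ** 2, axis=-1)             # a concave test function
print(np.sum(w * g(u)) >= np.mean(g(v)) - 1e-12)   # clustered objective dominates
```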

Proof of (ii). Next, we prove \(\bar{g}^K(x) \le \bar{g}^{N*}(x) + (L/2) D(K)\) by making use of the L-smoothness condition on \(-g\). We first solve (MRO-K) with \(K < N\) to obtain a feasible solution \(u_1, \dots , u_K\). We then set \(\Delta _k =u_k - \bar{d}_k\) for each \(k \le K\), and set \(v_i = d_i + \Delta _k \quad \forall i \in C_k, k = 1,\dots ,K\). These satisfy the constraint of (MRO-K) with \(K=N\) and no support constraints (which we will refer to as \(K=N^*\) from here on), as

$$\begin{aligned} \frac{1}{N} \sum _{i=1}^N \Vert v_i - d_i \Vert ^p = \frac{1}{N} \sum _{k=1}^K \sum _{i \in C_k} \Vert \Delta _k\Vert ^p = \sum _{k=1}^K \frac{|C_k|}{N} \Vert u_k - \bar{d}_k\Vert ^p \le \epsilon ^p. \end{aligned}$$

Since the constraints are satisfied, the constructed \(v_1, \dots , v_N\) form a feasible solution for (MRO-K) with \(K=N^*\). We note that these \(v_i\)'s are also in the domain of g, given that the uncertain data \(\mathcal {D}_N\) is in the domain of g. For monotonically increasing functions g (e.g., \(\log (u)\), \(1/(1+u)\)), we must have \(\Delta _k = u_k - \bar{d}_k \ge 0\) in the solution of (MRO-K), as the maximization of g over \(u_k\) will lead to \(u_k \ge \bar{d}_k\). Therefore, \(v_i = d_i + \Delta _k\) is also in the domain, since an L-smooth concave function g whose domain is only potentially bounded from below has no holes in its domain above that bound. For monotonically decreasing functions g, the same logic applies with a nonpositive \(\Delta _k\). We now make use of the convexity and L-smoothness conditions [2, Theorem 5.8] on \(-g\): for all \(v_1, v_2 \in S\) and \(\lambda \in [0,1]\),

$$\begin{aligned} g(\lambda v_1 + (1-\lambda ) v_2) \le \lambda g(v_1) + (1 - \lambda )g(v_2) + \frac{L}{2}\lambda (1 - \lambda )\Vert v_1 - v_2 \Vert ^2_2, \end{aligned}$$

which we can apply iteratively, with the first iteration being

$$\begin{aligned} g\left( \frac{1}{|C_k|} v_1 + \frac{|C_k|-1}{|C_k|} \bar{v}_2\right)\le & \frac{1}{|C_k|}g(v_1) + \frac{|C_k|-1}{|C_k|}g(\bar{v}_2) \\ & + \frac{L}{2}\frac{1}{|C_k|}\frac{|C_k|-1}{|C_k|}\Vert v_1 - \bar{v}_2 \Vert ^2_2, \end{aligned}$$

where \(\bar{v}_2 = \frac{1}{|C_k|-1}\sum _{i \in C_k, i\ne 1} v_i\). Note that \(v_1 - \bar{v}_2 = d_1 - \frac{1}{|C_k|-1}\sum _{i \in C_k, i\ne 1} d_i\), as they share the same \({\Delta _k}\). The next iteration will be applied to \(g(\bar{v}_2)\), and so on. For each cluster k, this results in:

$$\begin{aligned} g \left( \frac{1}{|C_k|}\sum _{i \in C_k} v_i, x\right)&\le \frac{1}{|C_k|}\sum _{i \in C_k} g(v_i,x) + \frac{L}{2|C_k|} \sum _{i=2} ^{|C_k|} \frac{i-1}{i} \left\| d_i - \frac{\sum _{j=1}^{i-1}d_j}{i-1} \right\| ^2_2\\ g(\bar{d}_k + \Delta _k,x)&\le \frac{1}{|C_k|}\sum _{i \in C_k} g(v_i,x) + \frac{L}{2|C_k|} \sum _{i \in C_k} \Vert d_i - \bar{d}_k\Vert ^2_2\\ g(u_k,x)&\le \frac{1}{|C_k|}\sum _{i \in C_k} g(v_i,x) + \frac{L}{2|C_k|} \sum _{i \in C_k} \Vert d_i - \bar{d}_k\Vert ^2_2, \end{aligned}$$

where we used the equivalence \( \sum _{i=2} ^{|C_k|}((i-1)/{i}) \left\| d_i - {\sum _{j=1}^{i-1}d_j}/({i-1}) \right\| ^2_2 = \sum _{i \in C_k} \Vert d_i - \bar{d}_k\Vert ^2_2. \) Now, summing over all clusters, we have

$$\begin{aligned} \sum _{k = 1}^K ({|C_k|}/{N} ) g(u_k,x)&\le {1}/{N} \sum _{i = 1}^N g(v_i,x) + (L/2) D(K). \end{aligned}$$

Since this holds for any feasible solution of (MRO-K) with \(K<N\), we must have \( \bar{g}^K(x) \le \bar{g}^{N*}(x) + (L/2) D(K)\).
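
The equivalence \( \sum _{i=2} ^{|C_k|}((i-1)/{i}) \left\| d_i - {\sum _{j=1}^{i-1}d_j}/({i-1}) \right\| ^2_2 = \sum _{i \in C_k} \Vert d_i - \bar{d}_k\Vert ^2_2 \) used in the telescoping step above is the standard incremental form of the within-cluster variance; a minimal numerical sketch (not part of the paper) checking it:

```python
# Check that sum_{i=2}^{n} ((i-1)/i)*||d_i - mean(d_1,...,d_{i-1})||^2 equals
# sum_{i=1}^{n} ||d_i - dbar||^2, where dbar is the mean of all n points.
import numpy as np

rng = np.random.default_rng(3)
d = rng.normal(size=(10, 4))                       # n = 10 points in R^4

lhs = sum((i - 1) / i * np.sum((d[i - 1] - d[: i - 1].mean(axis=0)) ** 2)
          for i in range(2, len(d) + 1))
rhs = np.sum((d - d.mean(axis=0)) ** 2)
print(lhs, rhs)                                    # the two values coincide
```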

Proof of Theorem 5

Proof of the lower bound. We use the dual formulations of the MRO constraints. For the case \(p \ge 1\), we first solve (MRO-K-Dual) with \(K<N\) to obtain dual variables \(z_{jk}\), \(\gamma _{jk}\). For each data label i in cluster \(C_k\), for all clusters \(k = 1,\dots ,K\), and for all pieces \(j = 1,\dots , J\), if we set

$$\begin{aligned} {\lambda = \lambda },\quad z_{ji} = z_{jk}, \quad \gamma _{ji} = \gamma _{jk}, \quad s_i = s_k + \max _j\{(-z_{jk} - H^T\gamma _{jk})^T (d_i - \bar{d}_k)\}, \end{aligned}$$

we have obtained a valid solution for (MRO-K-Dual) with \(K=N\). The increase in objective value from \(\bar{g}^K(x)\), the optimal value of (MRO-K-Dual) with \(K<N\), to the objective value of this constructed solution, which upper bounds \(\bar{g}^N(x)- \bar{g}^K(x)\), is

$$\begin{aligned} \begin{aligned} \delta (K,z,\gamma )&=(1/N)\sum _{i=1}^N s_i - \sum _{k=1}^K(|C_k|/N)s_k \\&= (1/N)\sum _{k = 1}^K\sum _{i\in C_k} \max _j\{(-z_{jk} - H^T\gamma _{jk})^T(d_i - \bar{d}_k)\}. \end{aligned} \end{aligned}$$

We note that \(\delta (K,z,\gamma ) \ge \sum _{k = 1}^K (|C_k|/N) (-z_{1k} - H^T\gamma _{1k})^T (1/|C_k|)\sum _{i\in C_k} (d_i - \bar{d}_k) = \sum _{k = 1}^K (|C_k|/N) (-z_{1k} - H^T\gamma _{1k})^T0 = 0\). The objective value of the constructed feasible solution for (MRO-K-Dual) with \(K=N\) is an upper bound on its optimal value, since it is a minimization problem. We then have \(\bar{g}^N(x) - \bar{g}^K(x) \le \delta (K,z,\gamma )\), which translates to \(\bar{g}^N(x) - \delta (K,z,\gamma ) \le \bar{g}^K(x)\).

Now, for \(p=\infty \), the same procedure can be applied; we obtain a solution to MRO-K-Dual-\(\infty \) with \(K < N\), and construct a feasible solution for MRO-K-Dual-\(\infty \) with \(K=N\). We modify the variables in the same manner, with the exception of \(\lambda \), which is now set by \(\lambda _i = \lambda _k\). The definition of \(\delta (K,z,\gamma )\) remains unchanged, so the same result follows.
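
The nonnegativity of \(\delta (K,z,\gamma )\) noted above holds for any choice of the dual variables; the following minimal sketch (not part of the paper; the dimensions, the cluster assignment, and the random dual variables are illustrative assumptions) checks it numerically.

```python
# delta(K,z,gamma) = (1/N) sum_k sum_{i in C_k} max_j {(-z_jk - H^T gamma_jk)^T (d_i - dbar_k)}
# is nonnegative: the inner max dominates the j = 1 term, whose within-cluster average is 0.
import numpy as np

rng = np.random.default_rng(4)
N, m, K, J, r = 120, 3, 6, 4, 2
d = rng.normal(size=(N, m))
labels = rng.integers(0, K, size=N)
H = rng.normal(size=(r, m))
z = rng.normal(size=(J, K, m))
gamma = rng.normal(size=(J, K, r))

delta = 0.0
for k in range(K):
    dk = d[labels == k]
    coeff = -z[:, k, :] - gamma[:, k, :] @ H           # rows: -z_jk - H^T gamma_jk, j = 1..J
    delta += np.sum(np.max((dk - dk.mean(axis=0)) @ coeff.T, axis=1))
print(delta / N >= -1e-12)                              # delta(K, z, gamma) >= 0
```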

Proof of the upper bound. We use the primal formulations of the MRO constraints. Similar to the single-concave proof, for \(p \ge 1\), we first solve the MRO problem with K clusters to obtain a feasible solution \(u_{11}, \dots , u_{JK}\), \(\alpha _{11}, \dots , \alpha _{JK}\). Note that the existence of \(\alpha \) removes the need to analyze which function \(g_j\) attains the maximum for each cluster. We then set \(\Delta _{jk} =u_{jk} - \bar{d}_k\) for each \(k \le K\), and set \(v_{ji} = d_i + \Delta _{jk}\), \(\alpha _{ji} = \alpha _{jk}/|C_k| \quad \forall i \in C_k, k = 1,\dots ,K\). These satisfy the constraint of the problem with N clusters and without support constraints, as

$$\begin{aligned} \sum _{i=1}^N \sum _{j=1}^J \alpha _{ji} \Vert v_{ji} - d_i \Vert ^p&= \sum _{k=1}^K \sum _{j=1}^J \sum _{i \in C_k} \alpha _{ji} \Vert \Delta _{jk}\Vert ^p = \sum _{k=1}^K \sum _{j=1}^J \alpha _{jk} \Vert u_{jk} - \bar{d}_k\Vert ^p \le \epsilon ^p. \end{aligned}$$

For \(p = \infty \), we repeat the process of obtaining and modifying a feasible solution, and observe, for \(i=1,\dots ,N\),

$$\begin{aligned} \sum _{j=1}^J (\alpha _{ji}/(1/N)) \Vert v_{ji} - d_i \Vert&= \sum _{j=1}^J (\alpha _{jk}/(|C_k|/N)) \Vert \Delta _{jk}\Vert \\&= \sum _{j=1}^J (\alpha _{jk}/w_k) \Vert u_{jk} - \bar{d}_k\Vert \le \epsilon . \end{aligned}$$

Therefore, we also have constraint satisfaction for \(p = \infty \). The constructed solutions for both cases remain in \(\mathop \textbf{dom}_u{g}\), following the arguments in the proof of (ii) in Appendix D.

Next, the objective functions for \(p\ge 1\) and \(p =\infty \) are identical, so the same analysis applies. For each cluster k and function \(g_j\), using the L-smooth condition on \(-g_j\), we observe

$$\begin{aligned} \alpha _{jk} g_j\left( \frac{1}{|C_k|}\sum _{i \in C_k} v_{ji}, x\right)&\le \alpha _{jk} \left( \sum _{i \in C_k} \frac{1}{|C_k|} g_j(v_{ji},x)\right. \\&\quad \left. + \frac{L_j}{2|C_k|} \sum _{i=2} ^{|C_k|} \frac{i-1}{i} \left\| d_i - \frac{\sum _{j=1}^{i-1}d_j}{i-1} \right\| ^2_2\right) \\ \alpha _{jk} g_j(\bar{d}_k + \Delta _k,x)&\le \sum _{i \in C_k} \alpha _{ji} g_j(v_{ji},x) + \frac{\alpha _{ji} L_j}{2} \sum _{i \in C_k} \Vert d_i - \bar{d}_k\Vert ^2_2. \end{aligned}$$

Then, summing over all the clusters and functions, we have

$$\begin{aligned} \sum _{k=1}^K \sum _{j=1}^J \alpha _{jk} g_j(u_{jk},x)&\le \sum _{k=1}^K \sum _{j=1}^J \sum _{i \in C_k} \alpha _{ji} g_j(v_{ji},x)\\&\quad + \sum _{k=1}^K \sum _{j=1}^J \frac{\alpha _{ji} \max _{j\le J} L_j}{2} \sum _{i \in C_k} \Vert d_i - \bar{d}_k\Vert ^2_2\\&\le \sum _{i=1}^N \sum _{j=1}^J \alpha _{ji} g(v_{ji},x) + \max _{j\le J} (L_j/2N)\sum _{i=1}^N \Vert d_i - \bar{d}_k\Vert ^2_2 . \end{aligned}$$

Since this holds for all feasible solutions of the problem with K clusters, we conclude that

$$\begin{aligned} \bar{g}^K(x) \le \bar{g}^{N*}(x) + \max _{j\le J} (L_j/2) D(K).\end{aligned}$$

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, I., Becker, C., Van Parys, B. et al. Mean robust optimization. Math. Program. (2024). https://doi.org/10.1007/s10107-024-02170-4

  • DOI: https://doi.org/10.1007/s10107-024-02170-4
