
Mean robust optimization

  • Full Length Paper
  • Series A
Mathematical Programming

Abstract

Robust optimization is a tractable and expressive technique for decision-making under uncertainty, but it can lead to overly conservative decisions when pessimistic assumptions are made on the uncertain parameters. Wasserstein distributionally robust optimization can reduce conservatism by being data-driven, but it often leads to very large problems with prohibitive solution times. We introduce mean robust optimization, a general framework that combines the best of both worlds by providing a trade-off between computational effort and conservatism. We propose uncertainty sets constructed from clustered data rather than directly from the observed data points, thereby significantly reducing problem size. By varying the number of clusters, our method bridges between robust and Wasserstein distributionally robust optimization. We show finite-sample performance guarantees and explicitly control the potential additional pessimism introduced by any clustering procedure. In addition, we prove conditions under which, when the uncertainty enters linearly in the constraints, clustering does not affect the optimal solution. We illustrate the efficiency and performance preservation of our method on several numerical examples, obtaining multiple orders of magnitude speedups in solution time with little-to-no effect on the solution quality.
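
To make the construction described above concrete, the following is a minimal sketch (not the authors' code; the k-means choice, the synthetic data, and the variable names are illustrative assumptions) of forming the cluster centroids \(\bar{d}_k\), the weights \(w_k = |C_k|/N\), and the clustering term \(D(K)\) that appears in the bounds of the appendices below.

```python
# A minimal sketch (not the authors' code) of the clustered-uncertainty ingredients:
# cluster the data into K centroids dbar_k with weights w_k = |C_k|/N, and compute
# D(K) = (1/N) * sum_i ||d_i - dbar_{k(i)}||^2, the mean within-cluster squared
# deviation that controls the extra conservatism introduced by clustering.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 4))          # N = 500 samples of an m = 4 dimensional uncertainty

for K in (1, 5, 20, 100, 500):
    km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(data)
    dbar = km.cluster_centers_            # centroids \bar{d}_k, k = 1, ..., K
    w = np.bincount(km.labels_, minlength=K) / len(data)     # weights w_k = |C_k| / N
    D_K = np.mean(np.sum((data - dbar[km.labels_]) ** 2, axis=1))
    print(K, w.sum(), D_K)                # w sums to 1; D(K) shrinks to 0 as K -> N
```

Varying K trades problem size against the clustering term D(K), which vanishes when K = N.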

References

  1. Bandi, C., Bertsimas, D.: Tractable stochastic analysis in high dimensions via robust optimization. Math. Program. 134(1), 23–70 (2012)

  2. Beck, A.: First-Order Methods in Optimization. SIAM-Society for Industrial and Applied Mathematics, Philadelphia (2017)

  3. Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton University Press, Princeton (2009)

  4. Ben-Tal, A., den Hertog, D., Vial, J.P.: Deriving robust counterparts of nonlinear uncertain inequalities. Math. Program. 149(1–2), 265–299 (2015)

  5. Ben-Tal, A., Nemirovski, A.: Robust solutions of Linear Programming problems contaminated with uncertain data. Math. Program. 88(3), 411–424 (2000)

  6. Ben-Tal, A., Nemirovski, A.: Selected topics in robust convex optimization. Math. Program. 112, 125–158 (2008)

  7. Beraldi, P., Bruni, M.E.: A clustering approach for scenario tree reduction: an application to a stochastic programming portfolio optimization problem. TOP 22(3), 934–949 (2014)

  8. Bertsimas, D., Brown, D.B., Caramanis, C.: Theory and applications of robust optimization. SIAM Rev. 53(3), 464–501 (2011)

  9. Bertsimas, D., den Hertog, D., Pauphilet, J.: Probabilistic guarantees in robust optimization. SIAM J. Optim. 31(4), 2893–2920 (2021)

  10. Bertsimas, D., Gupta, V., Kallus, N.: Data-driven robust optimization. Math. Program. 167(2), 235–292 (2018)

  11. Bertsimas, D., den Hertog, D.: Robust and Adaptive Optimization. Dynamic Ideas, Belmont (2022)

  12. Bertsimas, D., Mundru, N.: Optimization-based scenario reduction for data-driven two-stage stochastic optimization. Oper. Res. (2022)

  13. Bertsimas, D., Sim, M.: The price of robustness. Oper. Res. 52(1), 35–53 (2004)

  14. Bertsimas, D., Sturt, B., Shtern, S.: A data-driven approach to multistage stochastic linear optimization. Manag. Sci. (2022)

  15. Bradley, B.O., Taqqu, M.S.: Financial Risk and Heavy Tails, Handbooks in Finance, vol. 1. North-Holland, Amsterdam (2003)

  16. Carmona, R.A.: Rsafd: Statistical Analysis of Financial Data in R (2020). R package version 1.2

  17. Chen, L.: Clustering of sample average approximation for stochastic program (2015)

  18. Chen, R., Paschalidis, I.: Distributionally robust learning. Found. Trends Optim. 4(1–2), 1–243 (2020)

  19. Chen, Z., Sim, M., Xiong, P.: Robust stochastic optimization made easy with RSOME. Manag. Sci. 66(8), 3329–3339 (2020)

  20. Clement, P., Desch, W.: An elementary proof of the triangle inequality for the Wasserstein metric. Proc. Am. Math. Soc. 136(1), 333–339 (2008)

  21. Delage, E., Ye, Y.: Distributionally robust optimization under moment uncertainty with application to data-driven problems. Oper. Res. 58(3), 595–612 (2010)

  22. Dupačová, J., Gröwe-Kuska, N., Römisch, W.: Scenario reduction in stochastic programming. Math. Program. 95(3), 493–511 (2003)

  23. Emelogu, A., Chowdhury, S., Marufuzzaman, M., Bian, L., Eksioglu, B.: An enhanced sample average approximation method for stochastic optimization. Int. J. Prod. Econ. 182, 230–252 (2016)

  24. Esfahani, P.M., Kuhn, D.: Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations. Math. Program. 171, 115–166 (2018)

  25. Esteban, P.A., Morales, J.M.: Partition-based distributionally robust optimization via optimal transport with order cone constraints. 4OR 20(3), 465–497 (2022)

  26. Fabiani, F., Goulart, P.: The optimal transport paradigm enables data compression in data-driven robust control. In: 2021 American Control Conference (ACC), pp. 2412–2417 (2021)

  27. Fournier, N., Guillin, A.: On the rate of convergence in Wasserstein distance of the empirical measure. Probab. Theory Relat. Fields 162(3), 707–738 (2015)

  28. Gao, R.: Finite-sample guarantees for Wasserstein distributionally robust optimization: Breaking the curse of dimensionality. CoRR (2020)

  29. Gao, R., Kleywegt, A.: Distributionally robust stochastic optimization with Wasserstein distance. Math. Oper. Res. 48, 603–655 (2023)

  30. Givens, C.R., Shortt, R.M.: A class of Wasserstein metrics for probability distributions. Mich. Math. J. 31(2), 231–240 (1984)

  31. Goh, J., Sim, M.: Distributionally robust optimization and its tractable approximations. Oper. Res. 58(4–part–1), 902–917 (2010)

  32. Hartigan, J.A., Wong, M.A.: Algorithm AS 136: A k-means clustering algorithm. J. R. Stat. Soc. Ser. C (Appl. Stat.) 28(1), 100–108 (1979)

  33. Holmberg, K., Rönnqvist, M., Yuan, D.: An exact algorithm for the capacitated facility location problems with single sourcing. Eur. J. Oper. Res. 113(3), 544–559 (1999)

  34. Jacobson, D., Hassan, M., Dong, Z.S.: Exploring the effect of clustering algorithms on sample average approximation. In: 2021 Institute of Industrial and Systems Engineers (IISE) Annual Conference & Expo (2021)

  35. Kuhn, D., Esfahani, P.M., Nguyen, V., Shafieezadeh-Abadeh, S.: Wasserstein Distributionally Robust Optimization: Theory and Applications in Machine Learning, pp. 130–166 (2019)

  36. Liu, Y., Yuan, X., Zhang, J.: Discrete approximation scheme in distributionally robust optimization. Numer. Math. Theory Methods Appl. 14(2), 285–320 (2021)

  37. MOSEK ApS: The MOSEK optimization toolbox. Version 9.3. (2022)

  38. Neumann, J.V.: Zur Theorie der Gesellschaftsspiele. Math. Ann. 100, 295–320 (1928)

  39. Perakis, G., Sim, M., Tang, Q., Xiong, P.: Robust pricing and production with information partitioning and adaptation. Manag. Sci. (2023)

  40. Rockafellar, R.T., Uryasev, S.: Conditional value-at-risk for general loss distributions. J. Bank. Finance 26(7), 1443–1471 (2002)

  41. Rockafellar, R.T., Wets, R.J.: Variational Analysis. Grundlehren der mathematischen Wissenschaften. Springer, Berlin (1998)

  42. Roos, E., den Hertog, D.: Reducing conservatism in robust optimization. INFORMS J. Comput. 32(4), 1109–1127 (2020)

  43. Rujeerapaiboon, N., Schindler, K., Kuhn, D., Wiesemann, W.: Scenario reduction revisited: fundamental limits and guarantees. Math. Program. 191(1), 207–242 (2022)

  44. Thorndike, R.: Who belongs in the family? Psychometrika 18(4), 267–276 (1953)

  45. Trillos, N., Slepčev, D.: On the rate of convergence of empirical measures in \(\infty \)-transportation distance. Can. J. Math. 67, 1358–1383 (2014)

  46. Uryasev, S., Rockafellar, R.T.: Conditional Value-at-Risk: Optimization Approach. Springer, New York (2001)

  47. Wang, I., Becker, C., Van Parys, B., Stellato, B.: Mean robust optimization. arXiv (2023)

  48. Wang, Z., Wang, P., Ye, Y.: Likelihood robust optimization for data-driven problems. CMS 13(2), 241–261 (2016)

  49. Wiesemann, W., Kuhn, D., Sim, M.: Distributionally robust convex optimization. Oper. Res. 62(6), 1358–1376 (2014)

  50. Xu, H., Caramanis, C., Mannor, S.: A distributional interpretation of robust optimization. Math. Oper. Res. 37(1), 95–110 (2012)

  51. Zhen, J., Kuhn, D., Wiesemann, W.: A unified theory of robust and distributionally robust optimization via the primal-worst-equals-dual-best principle (2021). https://doi.org/10.1287/opre.2021.0268

  52. Zymler, S., Kuhn, D., Rustem, B.: Distributionally robust joint chance constraints with second-order moment information. Math. Program. 137(1), 167–198 (2013)

Acknowledgements

The simulations presented in this article were performed on computational resources managed and supported by Princeton Research Computing, a consortium of groups including the Princeton Institute for Computational Science and Engineering (PICSciE) and the Office of Information Technology’s High Performance Computing Center and Visualization Laboratory at Princeton University. We would like to thank Daniel Kuhn for the useful feedback and for pointing us to the literature on scenario reduction techniques.

Author information

Corresponding author

Correspondence to Bartolomeo Stellato.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Irina Wang and Bartolomeo Stellato are supported by the NSF CAREER Award ECCS 2239771.

Appendices

Proof of the constraint reformulation in (15)

We give a proof for the general case of maximum-of-concave functions. When \(J=1\), we take \(\alpha _{1k} = w_k\) for all k. For a simpler proof covering the case of a single concave function, refer to [47, Appendix A].

To simplify notation, we define \(c_k(v_{jk}) =\Vert v_{jk} - \bar{d}_k \Vert ^p - \epsilon ^p\). Then, starting from the inner optimization problem of (14):

$$\begin{aligned} \begin{aligned}&{\left\{ \begin{array}{ll} \underset{v_{11}, \dots , v_{JK} \in S, \alpha \in \Gamma }{\sup } & \sum _{k=1}^K \sum _{j=1}^J \alpha _{jk} g_j(v_{jk}, x) \\ \text{ subject } \text{ to } & \sum _{k=1} ^K \sum _{j=1}^J \alpha _{jk} c_k(v_{jk}) \le 0 \end{array}\right. }\\&={\left\{ \begin{array}{ll} \underset{v_{11}, \dots , v_{JK} \in S, \alpha \in \Gamma }{\sup } & \underset{\lambda \ge 0}{\inf }~\sum _{k=1}^K \sum _{j=1}^J \alpha _{jk} g_j(v_{jk}, x) - \lambda \sum _{k=1} ^K \sum _{j=1}^J \alpha _{jk} c_k(v_{jk})\\ \end{array}\right. }\\&={\left\{ \begin{array}{ll} \underset{\alpha \in \Gamma }{\sup }~\underset{\lambda \ge 0}{\inf }~\underset{v_{11}, \dots , v_{JK} \in S}{\sup } & \sum _{k=1}^K \sum _{j=1}^J \alpha _{jk} g_j(v_{jk}, x) - \lambda \sum _{k=1} ^K \sum _{j=1}^J \alpha _{jk} c_k(v_{jk}),\\ \end{array}\right. }\\ \end{aligned} \end{aligned}$$

In the first equality we formed the Lagrangian. Then, since the summands \(g_j(v_{jk},x)\) are upper-semicontinuous and concave in \(v_{jk}\), we applied the Von Neumann–Fan minimax theorem [38] to interchange the inf and the sup. Next, we rewrite the formulation using an epigraph trick and make a change of variables.

$$\begin{aligned} \begin{aligned}&= {\left\{ \begin{array}{ll} \underset{\alpha \in \Gamma }{\sup }~\underset{\lambda \ge 0, s}{\inf }\ & \sum _{k=1}^K s_k \\ \text{ subject } \text{ to }~\underset{v_{11}, \dots , v_{JK} \in S}{\sup }& \sum _{j=1}^J \alpha _{jk}(g_j(v_{jk}, x) - \lambda c_k(v_{jk})) \le s_k \quad k = 1,\dots ,K, \end{array}\right. }\\&= {\left\{ \begin{array}{ll} \underset{\alpha \in \Gamma }{\sup }~\underset{\lambda \ge 0,s}{\inf }\ & \sum _{k=1}^K s_k \\ \text{ subject } \text{ to }~{\underset{\alpha _{11}v_{11}\in \alpha _{11}S, \dots , \alpha _{JK}v_{JK} \in \alpha _{JK}S}{\sup }}& \sum _{j=1}^J \alpha _{jk}(g_j((\alpha _{jk}v_{jk})/\alpha _{jk}, x) \\ & - \lambda c_k((\alpha _{jk}v_{jk})/\alpha _{jk})) \le s_k \quad k = 1,\dots ,K. \end{array}\right. }\\ \end{aligned} \end{aligned}$$

In the last step, we rewrote \(v_{jk} = (\alpha _{jk}v_{jk})/\alpha _{jk}\) and maximized over \({\alpha _{jk}v_{jk} \in \alpha _{jk}S}\). In the case \(\alpha _{jk} > 0\), the terms in the summation are unchanged, and maximizing over \(v_{jk}\) is equivalent to maximizing over \(\alpha _{jk}v_{jk}\). In the case \(\alpha _{jk}= 0\), we adopt in the transformed formulation the convention \((\alpha _{jk}v_{jk})/\alpha _{jk} = 0/0 = 0\), so that \(\alpha _{jk}(g_j(0, x) - \lambda c_k(0)) = 0\,(g_j(0, x) - \lambda c_k(0)) = 0\). In the original formulation, the term \(\alpha _{jk}(g_j(v_{jk}, x) - \lambda c_k(v_{jk}))\) is also equal to 0 regardless of the value of \(v_{jk}\), so the choice of \(v_{jk}\) is immaterial. Therefore, the optimal value remains unchanged. Next, we substitute \({h_{jk}} = \alpha _{jk}v_{jk}\) and define the functions \(g_j'(h_{jk},x) = \alpha _{jk}g_j(h_{jk}/\alpha _{jk},x)\) and \(c_k'(h_{jk}) = \alpha _{jk}{c_k}(h_{jk}/\alpha _{jk})\).

$$\begin{aligned} \begin{aligned}&= {\left\{ \begin{array}{ll} \underset{\alpha \in \Gamma }{\sup }~\underset{\lambda \ge 0,s}{\inf }\ & \sum _{k=1}^K s_k \\ \text{ subject } \text{ to }~{\underset{h_{11}\in \alpha _{11}S, \dots , h_{JK} \in \alpha _{JK}S}{\sup }}& \sum _{j=1}^J g_j'(h_{jk}, x)- \lambda c_k'(h_{jk}) \le s_k \quad k = 1,\dots ,K, \end{array}\right. }\\&= {\left\{ \begin{array}{ll} \underset{\alpha \in \Gamma }{\sup }~\underset{\lambda \ge 0,s}{\inf }\ & \sum _{k=1}^K s_k \\ \text{ subject } \text{ to }& \sum _{j=1}^J [-g_j' +{\chi _{\alpha _{jk}S}} + \lambda c_k']^*(0) \le s_k \quad k = 1,\dots ,K. \end{array}\right. }\\ \end{aligned} \end{aligned}$$

In the last equality, we applied the definition of the conjugate function to the newly defined functions, where \(\chi _{S}\) denotes the characteristic function of S, with \(\chi _{S}(v) = 0\) if \(v \in S\) and \(\chi _{S}(v) = \infty \) otherwise. Now, using the conjugate identity \(f^*(y) = \alpha g^*(y)\) for the right-scalar-multiplied function \(f(x) = \alpha g(x/\alpha )\), and noting that \(\chi _{\alpha _{jk}S}(h)\) takes the same value as \(\alpha _{jk} \chi _{S}(h/{\alpha _{jk}})\), we can rewrite the constraint as

$$\begin{aligned} \begin{aligned} \sum _{j=1}^J \alpha _{jk}[-g_j +\chi _{S} + \lambda c_k]^*(0) \le s_k \quad k = 1,\dots ,K. \end{aligned} \end{aligned}$$
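
For completeness, the conjugate identity invoked here can be verified directly for \(\alpha > 0\):

$$\begin{aligned} f^*(y) = \underset{h}{\sup }\, \{ y^Th - \alpha g(h/\alpha )\} = \underset{u}{\sup }\, \{ y^T(\alpha u) - \alpha g(u)\} = \alpha \, \underset{u}{\sup }\, \{ y^Tu - g(u)\} = \alpha g^*(y), \end{aligned}$$

where we substituted \(h = \alpha u\); the case \(\alpha _{jk} = 0\) is covered by the zero convention adopted above.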

Next, borrowing results from Esfahani et al. [24, Theorem 4.2], Rockafellar and Wets [41, Theorem 11.23(a), p. 493], and Zhen et al. [51, Lemma B.8] on the conjugates of infimal convolutions and p-norm balls, we note that:

$$\begin{aligned} \begin{aligned} \alpha _{jk}[(-g_j +\chi _{S} + \lambda c_k)]^* (0)&= \alpha _{jk}\underset{y_{jk},z_{jk}}{\inf } ([-g_j]^*(z_{jk} - y_{jk},x) \\&\quad + \sigma _{S}(y_{jk}) + [\lambda c_k]^* (-z_{jk})), \end{aligned} \end{aligned}$$
$$\begin{aligned} {[}\lambda c_k]^* (-z_{jk})= & \underset{v_{jk}}{\sup }(-z_{jk}^T v_{jk} - \lambda \Vert v_{jk} - \bar{d}_k\Vert ^p + \lambda \epsilon ^p)\\= & -z_{jk}^T\bar{d}_k + \phi (q)\lambda {\Vert }z_{jk}/\lambda {\Vert }^q_* + \lambda \epsilon ^p. \end{aligned}$$

Substituting this in, we have

$$\begin{aligned} \begin{aligned} {\left\{ \begin{array}{ll} \underset{\alpha \in \Gamma }{\sup }~\underset{\lambda \ge 0, {s,z,y}}{\inf } & \sum _k^K s_k\\ \text{ subject } \text{ to } & \sum _{j=1}^J \alpha _{jk}([-g_j]^*(z_{jk} - y_{jk},x)\\ & + \sigma _{S}(y_{jk}) - z_{jk}^T\bar{d}_k + \phi (q)\lambda \left\| z_{jk}/\lambda \right\| ^q_* + \lambda \epsilon ^p ) \le s_k \\ & \hspace{2cm} \quad k = 1,\dots ,K. \end{array}\right. } \end{aligned} \end{aligned}$$

Taking the supremum over \(\alpha \), noting that \(\sum _{j=1}^J \alpha _{jk} = w_k\) for all k, we arrive at

$$\begin{aligned} \begin{aligned} {\left\{ \begin{array}{ll} ~\underset{\lambda \ge 0, {s,z,y}}{\inf } & \sum _k^K s_k\\ \text{ subject } \text{ to } & w_k ([-g_j]^*(z_{jk} - y_{jk},x) + \sigma _{S}(y_{jk}) - z_{jk}^T\bar{d}_k \\ & ~~ + \phi (q)\lambda \left\| z_{jk}/\lambda \right\| ^q_* + \lambda \epsilon ^p ) \le s_k \quad k = 1,\dots ,K, \quad j = 1,\dots ,J, \end{array}\right. } \end{aligned} \end{aligned}$$

which is equivalent to (15).
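
As an illustrative numerical check of the conjugate \([\lambda c_k]^*(-z_{jk})\) computed above, the following minimal sketch (not part of the paper; the values of p, \(\epsilon \), \(\lambda \) and the choice of the Euclidean norm are assumptions) compares a direct maximization with the closed form.

```python
# Check that sup_v { -z^T v - lam*||v - dbar||^p + lam*eps^p } equals
# -z^T dbar + phi(q)*lam*||z/lam||^q + lam*eps^p for the (self-dual) Euclidean norm.
import numpy as np

rng = np.random.default_rng(1)
m, p, lam, eps = 4, 3.0, 0.7, 0.4
q = p / (p - 1)
phi = (q - 1) ** (q - 1) / q ** q
dbar, z = rng.normal(size=m), rng.normal(size=m)

# The supremum is attained along v = dbar - t*z/||z||, t >= 0, so a 1-D search suffices.
t = np.linspace(0.0, 50.0, 500_001)
lhs = np.max(t * np.linalg.norm(z) - lam * t ** p) - z @ dbar + lam * eps ** p
rhs = -z @ dbar + phi * lam * np.linalg.norm(z / lam) ** q + lam * eps ** p
print(lhs, rhs)  # the two values should agree to several decimal places
```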

Reformulation of the maximum-of-concave case for \({p = \infty }\)

We again give the general proof for \(J \ge 1\). For the case \(J=1\), refer to the simpler proof in [47, Appendix B]. When \({p = \infty }\), we have

$$\begin{aligned} \begin{array}{ll} \text {minimize} & f(x)\\ \text {subject to} \quad & \underset{v_{11}, \dots , v_{JK} \in S,\, \alpha \in \Gamma }{\sup } \left\{ \sum _{k=1}^K \sum _{j=1}^J \alpha _{jk} g_j(v_{jk}, x) ~\Big |~ \sum _{j=1}^J (\alpha _{jk}/w_k) \Vert v_{jk} - \bar{d}_k\Vert \le \epsilon ,\ k = 1,\dots ,K \right\} \le 0, \end{array} \end{aligned}$$
(17)

which has a reformulation in which the worst-case constraint above is dualized,

$$\begin{aligned} \begin{array}{ll} \text {minimize} & f(x)\\ \text {subject to} \quad & \sum _{k=1}^{K} w_k s_k \le 0\\ & [-g_j]^*(z_{jk} - y_{jk},x) + \sigma _{{S}}(y_{jk}) - z_{jk}^T\bar{d}_k + \lambda _k \epsilon \le s_k\\ & \hspace{3cm} \quad k = 1,\dots , K, \quad j = 1,\dots , J\\ & \left\| z_{jk} \right\| _* \le \lambda _k \quad k = 1,\dots , K, \quad j = 1,\dots , J, \end{array} \end{aligned}$$
(18)

with new variables \(s_k \in {\textbf{R}}\), \(\lambda _k \in {\textbf{R}}\), \(z_{jk} \in {\textbf{R}}^m\), and \(y_{jk} \in {\textbf{R}}^m\). We prove this by starting from the inner optimization problem of (17):

$$\begin{aligned} \begin{aligned}&{\left\{ \begin{array}{ll} \underset{v_{11}, \dots , v_{JK} \in S, \alpha \in \Gamma }{\sup } & \sum _{k=1}^K \sum _{j=1}^J \alpha _{jk} g_j(v_{jk}, x) \\ \text{ subject } \text{ to } & \sum _{j=1}^J (\alpha _{jk}/w_k) \Vert v_{jk} - \bar{d}_k\Vert \le \epsilon , \quad k = 1,\dots ,K \end{array}\right. }\\&={\left\{ \begin{array}{ll} \underset{v_{11}, \dots , v_{JK} \in S, \alpha \in \Gamma }{\sup } & \underset{\lambda \ge 0}{\inf }~\sum _{k=1}^K (\sum _{j=1}^J \alpha _{jk} g_j(v_{jk}, x) \\ & \qquad + \lambda _k (\epsilon - \sum _{j=1}^J (\alpha _{jk}/w_k) \Vert v_{jk} - \bar{d}_k\Vert ))\\ \end{array}\right. }\\&={\left\{ \begin{array}{ll} \underset{\alpha \in \Gamma }{\sup }~\underset{\lambda \ge 0}{\inf }~\underset{v_{11}, \dots , v_{JK} \in S}{\sup }& \sum _{k=1}^K (\sum _{j=1}^J \alpha _{jk} g_j(v_{jk}, x) \\ & \qquad + \lambda _k (\epsilon - \sum _{j=1}^J (\alpha _{jk}/w_k) \Vert v_{jk} - \bar{d}_k\Vert )) \end{array}\right. } \end{aligned} \end{aligned}$$

We have again formed the Lagrangian and applied the minimax theorem to the sum of concave functions in \(v_{jk}\). Now, we can rewrite this in epigraph form,

$$\begin{aligned} \begin{aligned}&= {\left\{ \begin{array}{ll} \underset{\alpha \in \Gamma }{\sup }~\underset{\lambda \ge 0,s}{\inf }\ & \sum _{k=1}^K s_k \\ \text{ subject } \text{ to }& \underset{v_{11}, \dots , v_{JK} \in S}{\sup }~ \lambda _k \epsilon + \sum _{j=1}^J \alpha _{jk}g_j(v_{jk}, x)\\ & \qquad - \lambda _k(\alpha _{jk}/w_k) \Vert v_{jk} - \bar{d}_k\Vert \le s_k \qquad \qquad \quad k = 1,\dots ,K, \end{array}\right. }\\ \end{aligned} \end{aligned}$$

then use the definition of the dual norm to rewrite the constraints,

$$\begin{aligned} \begin{aligned}&{\left\{ \begin{array}{ll} \underset{v_{11}, \dots , v_{JK} \in S}{\sup }~ \lambda _k \epsilon + \sum _{j=1}^J \underset{\Vert z_{jk}\Vert _* \le \lambda _k/w_k}{\min } \alpha _{jk}g_j(v_{jk}, x) \\ \qquad \quad \qquad - \alpha _{jk}z_{jk}^T(v_{jk} - \bar{d}_k)\le s_k \quad k = 1,\dots ,K, \end{array}\right. }\\&= {\left\{ \begin{array}{ll} & ~{\underset{\alpha _{11}v_{11}\in \alpha _{11}S, \dots , \alpha _{JK}v_{JK} \in \alpha _{JK}S}{\sup }}~ \lambda _k \epsilon + \sum _{j=1}^J \underset{\Vert z_{jk}\Vert _* \le \lambda _k/w_k}{\min } \alpha _{jk}g_j\\ & \qquad \qquad \qquad ((\alpha _{jk}v_{jk})/\alpha _{jk}, x) - \alpha _{jk}z_{jk}^T(v_{jk} - \bar{d}_k)\le s_k \quad \qquad k = 1,\dots ,K. \end{array}\right. }\\ \end{aligned} \end{aligned}$$

In the last step, we again rewrote \(v_{jk} = (\alpha _{jk}v_{jk})/\alpha _{jk}\) inside functions \(g_j\). Using similar logic as in Appendix A, we note that this does not change the optimal value for both \(\alpha _{jk} > 0\) and \(\alpha _{jk} = 0\), as taking the supremum over the transformed variables is equivalent to taking the supremum over the original variables. For conciseness, we omit the steps of creating the right-scalar-multiplied function and transformations. For details, please refer to Appendix A. We apply the definition of conjugate functions and arrive at the optimization problem

$$\begin{aligned} \begin{aligned}&{\left\{ \begin{array}{ll} \underset{\alpha \in \Gamma }{\sup }~\underset{\lambda \ge 0,{s,z}}{\inf } & \sum _{k=1} ^K s_k \\ \text{ subject } \text{ to }& \lambda _k \epsilon + \sum _{j=1}^J \alpha _{jk}[-g_j +\chi _{S}]^*(-z_{jk},x) +\alpha _{jk}z_{jk}^T\bar{d}_k \le s_k \\ & \quad k = 1,\dots ,K\\ & \Vert z_{jk}\Vert _* \le \lambda _k/w_k \quad k = 1,\dots ,K, \quad j = 1,\dots ,J. \end{array}\right. }\\ \end{aligned} \end{aligned}$$

Now, replacing \(\lambda _k\) with \(\lambda _k w_k\) and \(z_{jk}\) with \(-z_{jk}\), and substituting in the conjugate functions derived in Appendix A, we have

$$\begin{aligned} \begin{aligned}&= {\left\{ \begin{array}{ll} \underset{\alpha \in \Gamma }{\sup }~\underset{\lambda \ge 0,{s,z}}{\inf } & \sum _{k=1} ^K s_k \\ \text{ subject } \text{ to } & \lambda _k w_k\epsilon + \sum _{j=1}^J \alpha _{jk} ([-g_j]^*(z_{jk} - y_{jk},x) + \sigma _{S}(y_{jk}) \\ & \hspace{2cm} - z_{jk}^T\bar{d}_k) \le s_k \quad k = 1,\dots ,K\\ & \Vert z_{jk}\Vert _*\le \lambda _k,\quad k = 1,\dots ,K, \quad j = 1,\dots ,J. \end{array}\right. }\\ \end{aligned} \end{aligned}$$

Note that rescaling \(\lambda _k\) does not affect the value of the problem, as minimizing over \(\lambda _k\) is equivalent to minimizing over \(\lambda _k w_k\) when \(w_k >0\). Lastly, taking the supremum over \(\alpha \), we arrive at

$$\begin{aligned} \begin{aligned}&= {\left\{ \begin{array}{ll} \underset{\lambda \ge 0,{s,z}}{\inf } & \sum _{k=1} ^K s_k \\ \text{ subject } \text{ to } & w_k(\lambda _k\epsilon + [-g_j]^*(z_{jk} - y_{jk},x) + \sigma _{S}(y_{jk}) - z_{jk}^T\bar{d}_k) \le s_k\\ & \quad k = 1,\dots ,K, \quad j = 1,\dots ,J\\ & \Vert z_{jk}\Vert _*\le \lambda _k,\quad k = 1,\dots ,K, \quad j = 1,\dots ,J. \end{array}\right. }\\ \end{aligned} \end{aligned}$$

Proof of the dual problem reformulation as \(\mathbf {p \rightarrow \infty }\)

We prove the dual equivalence. For a proof of primal equivalence, see the appendix of [47].

Theorem 6

Let S be a bounded set, and define

$$\begin{aligned} \bar{g}^K(x;\infty )=\left\{ \begin{array}{l@{~~~~}l} \text{ minimize } & \sum _{k=1}^K w_k s_k\\ \text{ subject } \text{ to } & \lambda \ge 0,~ z_{k}\in {\textbf{R}}^m, ~y_{k}\in {\textbf{R}}^m, ~s_k\in {\textbf{R}} \quad k=1,\dots , K,\\ & [-g]^*(z_{k} - y_{k}) + \sigma _{S}(y_{k}) - z_{k}^T{d}_k + \epsilon \left\| z_{k} \right\| _* \le s_k\\ & \hspace{5.5cm} \quad k=1,\dots , K. \end{array} \right. \end{aligned}$$
(19)

Then, \( \lim _{p\rightarrow \infty } \bar{g}^K(x;p)= \bar{g}^K(x;\infty ) \) for any \(x\in X\).

Proof

First, from Equation (5) we have for any \(p> 1\) that

$$\begin{aligned}&\bar{g}^K(x;p)\\ =&\left\{ \begin{array}{l@{~~~~}l} \text{ minimize } & \sum _{k=1}^K w_k s_k\\ \text{ subject } \text{ to } & \lambda \ge 0,~ z_{k}\in {\textbf{R}}^m, ~y_{k}\in {\textbf{R}}^m, ~s_k\in {\textbf{R}}^m \quad k = 1,\dots ,K,\\ & [-g]^*(z_{k} - y_{k}) + \sigma _{S}(y_{k}) - z_{k}^T{d}_k\\ & \qquad \qquad \quad + \phi (q)\lambda \left\| z_{k}/\lambda \right\| ^q_* + \lambda \epsilon ^p\le s_k \qquad k=1,\dots , K \end{array} \right. \\ \ge&\left\{ \begin{array}{l@{~~~~}l} \text{ minimize } & \sum _{k=1}^K w_k s_k\\ \text{ subject } \text{ to } & \lambda _k \ge 0,~ z_{k}\in {\textbf{R}}^m, ~y_{k}\in {\textbf{R}}^m, ~s_k\in {\textbf{R}}^m \quad k = 1,\dots ,K,\\ & [-g]^*(z_{k} - y_{k}) + \sigma _{S}(y_{k}) - z_{k}^T{d}_k \\ & \qquad \qquad \quad + \phi (q)\lambda _k \left\| z_{k}/\lambda _k \right\| ^q_* + \lambda _k \epsilon ^p\le s_k \qquad k=1,\dots , K \end{array} \right. \\ =&\left\{ \begin{array}{l@{~~~~}l} \text{ minimize } & \sum _{k=1}^K w_k s_k\\ \text{ subject } \text{ to } & z_{k}\in {\textbf{R}}^m, ~y_{k}\in {\textbf{R}}^m, ~s_k\in {\textbf{R}}^m \quad k = 1,\dots , K,\\ & [-g]^*(z_{k} - y_{k}) + \sigma _{S}(y_{k}) - z_{k}^T{d}_k \\ & \qquad \qquad \quad + \epsilon \left\| z_{k} \right\| _*\le s_k \qquad k =1,\dots , K \end{array} \right. \end{aligned}$$

where the first equality is established in Appendix A and the second equality follows from Lemma 1. Remark that the inequality in the second step simply follows because we introduce \(\lambda _k\) and do not impose that \(\lambda _k=\lambda \) for all \(k=1,\dots , K\). Hence, letting p tend to infinity gives \( {\liminf _{p\rightarrow \infty }} \bar{g}^K(x;p)\ge \bar{g}^K(x;\infty ). \) It remains to prove the reverse inequality \( {\limsup _{p\rightarrow \infty }} \bar{g}^K(x;p) \le \bar{g}^K(x;\infty ). \)

Second, we have for any \(p>1\) with \(1/p+1/q=1\) that

$$\begin{aligned}&\bar{g}^K(x;p)\\ \le&\left\{ \begin{array}{l@{~~~~}l} \text{ minimize } & \sum _{k=1}^K w_k s_k\\ \text{ subject } \text{ to } & z_{k}\in {\textbf{R}}^m, ~y_{k}\in {\textbf{R}}^m, ~s_k\in {\textbf{R}}^m \quad k=1,\dots ,K,\\ & [-g]^*(z_{k} - y_{k}) + \sigma _{S}(y_{k}) - z_{k}^T{d}_k + \phi (q)\\ & \qquad \qquad \left[ \frac{q-1}{q} \epsilon ^{\frac{1}{1-q}} \max _{k'=1}^K\Vert z_{k'} \Vert _*\right] ^{1-q} \left\| z_{k} \right\| ^q_* \\ & \qquad \qquad + \left[ \frac{q-1}{q} \epsilon ^{\frac{1}{1-q}} \max _{k'=1}^K\Vert z_{k'} \Vert _*\right] \epsilon ^p\le s_k \quad k=1,\dots , K\\ & (q-1)^{1/4} \le \Vert z_k \Vert _* \le (q-1)^{-1/4}\quad k =1,\dots , K, \end{array} \right. \end{aligned}$$

which follows from the choice \(\lambda _k = \left[ \frac{q-1}{q} \epsilon ^{\frac{1}{1-q}} \max _{k'=1}^K\Vert z_{k'} \Vert _*\right] \) and by imposing the restrictions \((q-1)^{1/4} \le \Vert z_k \Vert _* \le (q-1)^{-1/4}\) for all \(k =1,\dots , K\). Next, using the identities \(\phi (q)\left[ \frac{q-1}{q}\right] ^{1-q} = (q-1)^{(q-1)}/q^q \left[ \frac{q-1}{q}\right] ^{1-q} = 1/q\) and \(p+\frac{1}{1-q} = \frac{1}{\frac{1}{p}} + \frac{1}{1-q} = \frac{1}{1-\frac{1}{q}}+\frac{1}{1-q} = \frac{-q}{-q+1}+\frac{1}{1-q}=\frac{1-q}{1-q}=1\), as well as pulling \(\epsilon \left\| z_{k} \right\| _*>0\) out of the last two terms, we can rewrite the first constraints as

$$\begin{aligned}&\left\{ \begin{array}{l@{~~~~}l} & [-g]^*(z_{k} - y_{k}) + \sigma _{S}(y_{k}) - z_{k}^T{d}_k \\ & \qquad +\epsilon \left\| z_{k} \right\| _* \left[ \frac{1}{q} \left[ \max _{k'=1}^K\frac{\Vert z_{k'} \Vert _*}{\left\| z_{k} \right\| _*}\right] ^{1-q} +\frac{q-1}{q} \max _{k'=1}^K\frac{\Vert z_{k'} \Vert _*}{\left\| z_{k} \right\| _*} \right] \le s_k \quad k =1,\dots , K.\\ \end{array} \right. \end{aligned}$$

Then, we note that \(\max _{k'=1}^K \Vert z_{k'} \Vert _*/\left\| z_{k} \right\| _* \ge \Vert z_{k} \Vert _*/\left\| z_{k} \right\| _* = 1\) and \(\max _{k'=1}^K \Vert z_{k'} \Vert _*/{\left\| z_{k} \right\| _*} \le (q-1)^{-1/2}\), and hence we can apply Lemma 2 to obtain

$$\begin{aligned}&\bar{g}^K(x;p) \le \left\{ \begin{array}{l@{~~~~}l} \text{ minimize } & \sum _{k=1}^K w_k s_k\\ \text{ subject } \text{ to } & z_{k}\in {\textbf{R}}^m, ~y_{k}\in {\textbf{R}}^m, ~s_k\in {\textbf{R}}^m \quad k=1,\dots , K,\\ & [-g]^*(z_{k} - y_{k}) + \sigma _{S}(y_{k}) - z_{k}^T{d}_k \\ & \qquad \qquad +\epsilon \left\| z_{k} \right\| _* D(q) \le s_k \quad k =1,\dots , K,\\ & (q-1)^{1/4} \le \Vert z_k \Vert _* \le (q-1)^{-1/4}\quad k=1,\dots , K. \end{array} \right. \end{aligned}$$

Let

$$\begin{aligned} \bar{g}_u^K(x;p)=\left\{ \begin{array}{l@{~}l} \text{ minimize } & \sum _{k=1}^K w_k s_k\\ \text{ subject } \text{ to } & z_{k}\in {\textbf{R}}^m, ~y_{k}\in {\textbf{R}}^m, ~s_k\in {\textbf{R}}^m \quad k =1,\dots , K,\\ & [-g]^*(z_{k} - y_{k}) + \sigma _{S}(y_{k}) - z_{k}^T{d}_k \\ & \qquad \qquad +\epsilon \left\| z_{k} \right\| _* D\left( \frac{p}{p-1}\right) \le s_k \quad k =1,\dots , K\\ & (p-1)^{-1/4} \le \Vert z_k \Vert _* \le (p-1)^{1/4}\quad k =1,\dots , K. \end{array} \right. \end{aligned}$$
(20)

Hence, as \(q=p/(p-1)\) and \(q-1=1/(p-1)\) we have \( \bar{g}^K(x;p) \le \bar{g}_u^K(x;p) \) for all \(p>1\). Hence, taking the limit \(p\rightarrow \infty \) we have \(\bar{g}^K(x;\infty )\le \limsup _{p\rightarrow \infty } \bar{g}^K(x;p)\). In fact, as the function \(D\left( \frac{p}{p-1}\right) \) defined in Lemma 2 is nonincreasing for all p sufficiently large this implies that \(\bar{g}^K_u(x;p)\) is nonincreasing for p sufficiently large and hence we have \({\bar{g}^K(x;\infty )\le \limsup _{p\rightarrow \infty } \bar{g}^K(x;p) }\le \liminf _{p\rightarrow \infty } \bar{g}_u^K(x;p) = \lim _{p\rightarrow \infty } \bar{g}_u^K(x;p)\). We now prove here that \(\lim _{p\rightarrow \infty } \bar{g}_u^K(x;p) = \liminf _{p\rightarrow \infty } \bar{g}_u^K(x;p)\le \bar{g}^K(x;\infty )\). Consider any feasible sequence \(\{(z^t_{k}, \,y^t_{k}, \,s^t_k=[-g]^*(z^t_{k} - y^t_{k}) + \sigma _{S}(y^t_{k}) - (z^t_{k})^T{d}_k +\epsilon \left\| z^t_{k} \right\| _* )\}_{t\ge 1}\) in the optimization problem characterizing \(\bar{g}^K(x;\infty )\) in Equation (19) so that \(\lim _{t\rightarrow \infty }\sum _{k=1}^K w_k s^t_k=\bar{g}^K(x;\infty )\). Let \(\tilde{z}^t_{k}\in \arg \max \{ \Vert z \Vert _* \mid z\in {\textbf{R}}^m, \, \Vert z-z_k^t \Vert _*\le 1/t \}\) for all \(t\ge 1\) and \(k=1,\dots , K\) and observe that \(\Vert \tilde{z}^t_k \Vert _*= 1/t+\Vert z_k^t \Vert _*\ge 1/t\). Consider now an increasing sequence \(\{p_t\}_{t\ge 1}\) so that \((p_t-1)^{1/4}\ge \max ^K_{k=1} \Vert \tilde{z}^t_k \Vert _*\) and \((p_t-1)^{-1/4}\le 1/t\). Finally observe that the auxiliary sequence \(\{(\tilde{z}^t_{k}, \,\tilde{y}^t_{k}=y^t_k + (\tilde{z}^t_{k}-z^t_{k}), \,\tilde{s}^t_k= [-g]^*(\tilde{z}^t_{k} - \tilde{y}^t_{k}) + \sigma _{S}(\tilde{y}^t_{k}) - (\tilde{z}^t_{k})^T{d}_k +\epsilon \left\| \tilde{z}^t_{k} \right\| _* D\left( p_t/(p_t-1)\right) )\}_{t\ge 1}\) is by construction feasible in the minimization problem characterizing the function \(\bar{g}_u^K(x;p_t)\) in Equation (20). Hence, finally, we have

$$\begin{aligned} \lim _{p\rightarrow \infty }&g_u^K(x;p) = \lim _{t\rightarrow \infty } g_u^K(x;p_t) = \lim _{t\rightarrow \infty } \sum _{k=1}^K w_k \tilde{s}_k^t\\ {=}&\lim _{t\rightarrow \infty } \sum _{k=1}^Kw_k\left( [-g]^*(\tilde{z}^t_{k} - \tilde{y}^t_{k}) + \sigma _{S}(\tilde{y}^t_{k}) - (\tilde{z}^t_{k})^T{d}_k +\epsilon \left\| \tilde{z}^t_{k} \right\| _* D\left( p_t/(p_t-1)\right) \right) \\ \le&\lim _{t\rightarrow \infty } \sum _{k=1}^Kw_k\left( [-g]^*(z^t_{k} - y^t_{k}) + \sigma _{S}(y^t_{k}) - (z^t_{k})^T{d}_k +\epsilon \left\| z^t_{k} \right\| _* D\left( p_t/(p_t-1)\right) \right) \\&\qquad + \sum _{k=1}^Kw_k\left( \max _{s\in S} \Vert s \Vert +\Vert d_k \Vert +\epsilon D\left( p_t/(p_t-1)\right) \right) /t\\ \le&\lim _{t\rightarrow \infty } \sum _{k=1}^Kw_k\left( [-g]^*(z^t_{k} - y^t_{k}) + \sigma _{S}(y^t_{k}) - (z^t_{k})^T{d}_k +\epsilon \left\| z^t_{k} \right\| _* D\left( p_t/(p_t-1)\right) \right) \\ {=}&\lim _{t\rightarrow \infty } \sum _{k=1}^Kw_k([-g]^*(z^t_{k} - y^t_{k}) + \sigma _{S}(y^t_{k}) - (z^t_{k})^T{d}_k \\&\quad +\epsilon \left\| z^t_{k} \right\| _* + \epsilon \left\| z^t_{k} \right\| _* \left( D\left( p_t/(p_t-1)\right) -1\right) )\\ \le&\lim _{t\rightarrow \infty } \sum _{k=1}^Kw_k([-g]^*(z^t_{k} - y^t_{k}) + \sigma _{S}(y^t_{k}) - (z^t_{k})^T{d}_k \\&\quad +\epsilon \left\| z^t_{k} \right\| _* + \epsilon (p_t-1)^{1/4} \left( D\left( p_t/(p_t-1)\right) -1\right) )\\ \le&\lim _{t\rightarrow \infty } \sum _{k=1}^Kw_ks_k = \bar{g}^K(x;\infty ). \end{aligned}$$

To establish the third inequality observe first that \(-(\tilde{z}^t_{k})^T{d}_k = -(z^t_{k})^T{d}_k - (\tilde{z}^t_k-z^t_k)^Td_k\le -(z^t_{k})^T{d}_k + \Vert \tilde{z}^t_k-z^t_k \Vert _* \Vert d_k \Vert \le -(z^t_{k})^T{d}_k + \Vert d_k \Vert /t\). Second, remark that we have

$$\begin{aligned} \sigma _{S}(\tilde{y}^t_{k}) = \sigma _{S}(y^t_{k}+(\tilde{z}^t_k-z_k^t)) \le&\max _{s\in S} s^T(y^t_{k}+(\tilde{z}^t_k-z_k^t)) \\ \le&\max _{s\in S} s^Ty^t_{k}+\max _{s\in S} s^T(\tilde{z}^t_k-z_k^t)\\ \le&{\max _{s\in S} s^Ty^t_{k}+ \Vert s \Vert \Vert \tilde{z}^t_k-z_k^t \Vert _* \le \max _{s\in S}~s^Ty^t_{k}+ 1/t \max _{s\in S} \Vert s \Vert }, \end{aligned}$$

as \(\Vert \tilde{z}_t-z_t \Vert \le 1/t\). Lemma 2 guarantees that \(\lim _{t\rightarrow \infty } D\left( p_t/(p_t-1)\right) =1\). Finally, \(\left\| z^t_{k} \right\| _*\le \left\| \tilde{z}^t_{k} \right\| _*\le (p_t-1)^{1/4}\) and

$$\begin{aligned} \lim _{t\rightarrow \infty } (p_t-1)^{1/4} \left( D\left( p_t/(p_t-1)\right) -1\right) =&\lim _{p\rightarrow \infty } (p-1)^{1/4} \left( D\left( p/(p-1)\right) -1\right) \\ =&\lim _{q\rightarrow 1} (q-1)^{-1/4} \left( D(q)-1 \right) =0 \end{aligned}$$

with \(1/p+1/q=1\) using again Lemma 2. \(\square \)

Lemma 1

We have

$$ \min _{\lambda \ge 0} ~\phi (q)\lambda \left\| z/\lambda \right\| ^q_* + \lambda \epsilon ^p = \Vert z \Vert _*\epsilon $$

for any \(p>1\) and \(q>1\) for which \(1/p+1/q=1\), \(\phi (q)=(q-1)^{q-1}/q^q\) and \(\epsilon >0\).

Proof

Remark that the objective function \(\lambda \mapsto \phi (q)\lambda \left\| z/\lambda \right\| ^q_* + \lambda \epsilon ^p\) is continuous and, since \(\epsilon >0\), satisfies \(\lim _{\lambda \rightarrow 0} \phi (q)\lambda \left\| z/\lambda \right\| ^q_* + \lambda \epsilon ^p = \lim _{\lambda \rightarrow \infty } \phi (q)\lambda \left\| z/\lambda \right\| ^q_* + \lambda \epsilon ^p=\infty \) whenever \(z \ne 0\) (the case \(z = 0\) is immediate); hence there must exist a minimizer \(\lambda _\star \in \arg \min _{\lambda \ge 0}\, \phi (q)\lambda \left\| z/\lambda \right\| ^q_* + \lambda \epsilon ^p\) with \(\lambda _\star >0\). The necessary and sufficient first-order convex optimality conditions of the minimization problem guarantee

$$\begin{aligned}&\lambda _\star \in \arg \min _{\lambda \ge 0} ~\phi (q)\lambda \left\| z/\lambda \right\| ^q_* + \lambda \epsilon ^p \iff (1-q)\phi (q) \lambda _\star ^{-q} \Vert z \Vert _\star ^q + \epsilon ^p = 0\\ \iff&\epsilon ^p = (q-1) \phi (q) \lambda _\star ^{-q} \Vert z \Vert _\star ^q \iff \lambda _\star = \left[ (q-1) \phi (q)\right] ^{1/q} \Vert z \Vert _\star \epsilon ^{-p/q} \\&\iff \lambda _\star = \frac{q-1}{q} \epsilon ^{\frac{1}{1-q}} \Vert z \Vert _\star \end{aligned}$$

where we exploit that \(1/p+1/q=1\) and \(\phi (q)=(q-1)^{q-1}/q^q\). Indeed, we have

$$\begin{aligned} \left[ (q-1) \phi (q)\right] ^{1/q}= & \left[ (q-1)^{q}/q^q\right] ^{1/q}= (q-1)/q, \quad \\ -\frac{p}{q}= & -\frac{1}{\frac{1}{p} q} = -\frac{1}{(1-1/q) q}=-\frac{1}{q-1}=\frac{1}{1-q}. \end{aligned}$$

Hence, we have

$$\begin{aligned}&\min _{\lambda \ge 0} ~\phi (q) \lambda ^{1-q} \Vert z \Vert _\star ^q+\lambda \epsilon ^p= ~\phi (q) \lambda _\star ^{1-q} \Vert z \Vert _\star ^q+\lambda _\star \epsilon ^p\\&\qquad = ~\phi (q) \left[ \frac{(q-1)^{1-q}}{q^{1-q}} \epsilon \Vert z \Vert ^{1-q}_\star \right] \Vert z \Vert _\star ^q +\left[ \frac{q-1}{q} \epsilon ^{\frac{1}{1-q}} \Vert z \Vert _\star \right] \epsilon ^p\\&\qquad = ~\phi (q) \frac{(q-1)^{1-q}}{q^{1-q}} \epsilon \Vert z \Vert _\star +\frac{q-1}{q} \epsilon ^{p+\frac{1}{1-q}} \Vert z \Vert _\star \\&\qquad = \phi (q) \frac{(q-1)^{1-q}}{q^{1-q}} \epsilon \Vert z \Vert _\star +\frac{q-1}{q} \epsilon \Vert z \Vert _\star \\&\qquad = ~\frac{(q-1)^{q-1}}{q^q} \frac{(q-1)^{1-q}}{q^{1-q}} \epsilon \Vert z \Vert _\star +\frac{q-1}{q} \epsilon \Vert z \Vert _\star = \frac{1}{q} \epsilon \Vert z \Vert _\star +\frac{q-1}{q} \epsilon \Vert z \Vert _\star \\&\qquad = \left[ \frac{1}{q} +\frac{q-1}{q} \right] \epsilon \Vert z \Vert _\star = \epsilon \Vert z \Vert _\star \end{aligned}$$

where we exploit that \(1/p+1/q=1\) and \(\phi (q)=(q-1)^{q-1}/q^q\). Indeed, we have

$$ p+\frac{1}{1-q} = \frac{1}{\frac{1}{p}} + \frac{1}{1-q} = \frac{1}{1-\frac{1}{q}}+\frac{1}{1-q} = \frac{-q}{-q+1}+\frac{1}{1-q}=\frac{1-q}{1-q}=1 $$

establishing the claim. \(\square \)
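
The identity in Lemma 1 is also easy to check numerically; the following minimal sketch (not part of the paper; the parameter values and the Euclidean norm are assumptions) compares a grid minimization over \(\lambda \) with the closed form.

```python
# Numerical check of Lemma 1: min_{lambda>0} phi(q)*lambda*||z/lambda||^q + lambda*eps^p
# equals ||z||*eps, with minimizer lambda_star = (q-1)/q * eps^(1/(1-q)) * ||z||.
import numpy as np

p, eps = 3.0, 0.4
q = p / (p - 1)
phi = (q - 1) ** (q - 1) / q ** q
znorm = np.linalg.norm(np.array([1.2, -0.7, 0.3]))

lam = np.linspace(1e-6, 20.0, 2_000_001)
vals = phi * lam * (znorm / lam) ** q + lam * eps ** p
print(vals.min(), znorm * eps)       # the minimum value matches ||z|| * eps

lam_star = (q - 1) / q * eps ** (1 / (1 - q)) * znorm
print(lam[vals.argmin()], lam_star)  # the grid minimizer matches the closed form
```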

Lemma 2

Let \(q>1\). Then

$$ \max _{t\in [1, 1/\sqrt{q-1}]} ~\frac{1}{q} t^{1-q} +\frac{q-1}{q} t = D(q)=\max \left( 1, \frac{1}{q} \frac{1}{(q-1)^{(1-q)/2}} +\frac{\sqrt{q-1}}{q}\right) $$

with \(\lim _{q\rightarrow 1} D(q)=1\) and \(\lim _{q\rightarrow 1} (q-1)^{1/4} \left( D(q)-1\right) =0\).

Proof

The objective function is convex in t, and a convex function attains its maximum at an extreme point of its domain, here the endpoints \(t=1\) and \(t=1/\sqrt{q-1}\) of the interval, with objective values 1 and \(\frac{1}{q}(q-1)^{(q-1)/2} +\frac{\sqrt{q-1}}{q}\), respectively; taking the larger of the two gives D(q). The limits can be verified using standard manipulations. \(\square \)
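
For concreteness, a minimal numerical sketch (not part of the paper) of the endpoint maximization and of the two limits claimed in Lemma 2:

```python
# Check Lemma 2: the convex objective (1/q)*t^(1-q) + ((q-1)/q)*t on [1, 1/sqrt(q-1)]
# attains its maximum at an endpoint, equal to D(q); moreover D(q) -> 1 and
# (q-1)^(1/4) * (D(q) - 1) -> 0 as q -> 1.
import numpy as np

def D(q):
    return max(1.0, (q - 1) ** ((q - 1) / 2) / q + np.sqrt(q - 1) / q)

for q in (2.0, 1.5, 1.1):
    t = np.linspace(1.0, 1.0 / np.sqrt(q - 1), 1_000_001)
    obj = t ** (1 - q) / q + (q - 1) / q * t
    print(q, obj.max(), D(q))                      # grid maximum agrees with D(q)

for q in (1.1, 1.01, 1.001, 1.0001):
    print(q, D(q), (q - 1) ** 0.25 * (D(q) - 1))   # both tend to their claimed limits
```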

Proof of Theorem 4

We prove (i) \(\bar{g}^N(x) \le \bar{g}^K(x)\), (ii) \(\bar{g}^K(x) \le \bar{g}^{N*}(x) + (L/2) D(K)\), and (iii) when the support constraint does not affect the worst-case value, \({\bar{g}^K(x) \le \bar{g}^{N}(x) + (L/2) D(K)}\).

Proof of (i). We begin with a feasible solution \(v_1, \dots , v_{N}\) of (MRO-K) with \(K=N\). Then for \(K < N\), we set \(u_k = \sum _{i \in C_k}{v_i}/|C_k|\) for each of the K clusters. We see \(u_k\) with \(k = 1,\dots ,K\) satisfies the constraints of (MRO-K) for \(K < N\), as

$$\begin{aligned} \sum _{k = 1}^K \frac{|C_k|}{N}\left\| u_k - \bar{d}_k \right\| ^p&= \sum _{k = 1}^K \frac{|C_k|}{N} \left\| \frac{\sum _{i \in C_k}{v_i}}{|C_k|} - \frac{\sum _{i \in C_k}{d_i}}{|C_k|} \right\| ^p\\&\le \sum _{k = 1}^K \frac{|C_k|}{N} \sum _{i \in C_k} \frac{1}{|C_k|} \Vert v_i - d_i\Vert ^p\\&= \sum _{k = 1}^K \frac{1}{N} \sum _{i \in C_k} \Vert v_i - d_i\Vert ^p \le \epsilon ^p, \end{aligned}$$

where we have applied the triangle inequality, Jensen's inequality for the convex function \(f(x) = \Vert x\Vert ^p\), and the constraint of (MRO-K) with \(K=N\). In addition, since the support \(S\) is convex, for every k the constructed \(u_k\), being the average of points \(v_i \in S\), must also lie in \(S\). The same argument applies to the domain of g.

Since we have shown that the \(u_k\)'s satisfy the constraints of (MRO-K) with \(K<N\), they form a feasible solution. We now show that, for this pair of feasible solutions, the weighted objective of the clustered problem dominates that of the original one, i.e., \(\sum _{k=1}^K ({|C_k|}/{N})\, g(u_k,x) \ge (1/N)\sum _{i=1}^N g(v_i,x)\). By assumption, g is concave in the uncertain parameter, so by Jensen's inequality,

$$\begin{aligned} \sum _{k = 1}^K \frac{|C_k|}{N} g \left( \frac{1}{|C_k|}\sum _{i \in C_k}v_i, x\right)&\ge \sum _{k = 1}^K \frac{|C_k|}{N} \frac{1}{|C_k|}\sum _{i \in C_k} g(v_i)\\ \sum _{k = 1}^K \frac{|C_k|}{N}g(u_k,x)&\ge \frac{1}{N}\sum _{i \in N} g(v_i). \end{aligned}$$

Since this holds for \(u_k\)'s constructed from any feasible solution \(v_1, \dots , v_N\), we must have \(\bar{g}^K(x) \ge \bar{g}^N(x)\).
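
The two Jensen-type inequalities used in this argument can be sanity-checked numerically. Below is a minimal sketch (not part of the paper; the synthetic data, the arbitrary cluster assignment, and the concave test function \(g(v) = -\Vert v\Vert _2^2\) are illustrative assumptions).

```python
# For feasible v_1,...,v_N, the cluster averages u_k = mean_{i in C_k} v_i satisfy
# sum_k (|C_k|/N)*||u_k - dbar_k||^p <= (1/N)*sum_i ||v_i - d_i||^p   (feasibility), and
# sum_k (|C_k|/N)*g(u_k) >= (1/N)*sum_i g(v_i) for concave g          (objective).
import numpy as np

rng = np.random.default_rng(2)
N, m, K, p = 200, 3, 8, 2.0
d = rng.normal(size=(N, m))                        # data d_i
labels = rng.integers(0, K, size=N)                # an arbitrary assignment into K clusters
v = d + 0.1 * rng.normal(size=(N, m))              # perturbed (feasible) points v_i

dbar = np.stack([d[labels == k].mean(axis=0) for k in range(K)])
u = np.stack([v[labels == k].mean(axis=0) for k in range(K)])
w = np.bincount(labels, minlength=K) / N

lhs = np.sum(w * np.linalg.norm(u - dbar, axis=1) ** p)
rhs = np.mean(np.linalg.norm(v - d, axis=1) ** p)
print(lhs <= rhs + 1e-12)                          # feasibility is preserved

g = lambda x: -np.sum(x ** 2, axis=-1)             # a concave test function
print(np.sum(w * g(u)) >= np.mean(g(v)) - 1e-12)   # clustered objective dominates
```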

Proof of (ii). Next, we prove \(\bar{g}^K(x) \le \bar{g}^{N*}(x) + (L/2) D(K)\) by making use of the L-smoothness condition on \(-g\). We first solve (MRO-K) with \(K < N\) to obtain a feasible solution \(u_1, \dots , u_K\). We then set \(\Delta _k =u_k - \bar{d}_k\) for each \(k \le K\), and set \(v_i = d_i + \Delta _k \quad \forall i \in C_k, k = 1,\dots ,K\). These satisfy the constraint of (MRO-K) with \(K=N\) and no support constraints (which we will refer to as \(K=N^*\) from here on), as

$$\begin{aligned} \frac{1}{N} \sum _{i=1}^N \Vert v_i - d_i \Vert ^p = \frac{1}{N} \sum _{k=1}^K \sum _{i \in C_k} \Vert \Delta _k\Vert ^p = \sum _{k=1}^K \frac{|C_k|}{N} \Vert u_k - \bar{d}_k\Vert ^p \le \epsilon ^p. \end{aligned}$$

Since the constraints are satisfied, the constructed \(v_1, \dots , v_N\) form a feasible solution for (MRO-K) with \(K=N^*\). We note that these \(v_i\)'s are also in the domain of g, given that the uncertain data \(\mathcal {D}_N\) is in the domain of g. For monotonically increasing functions g (e.g., \(\log (u)\), \(1/(1+u)\)), we must have \(\Delta _k = u_k - \bar{d}_k \ge 0\) in the solution of (MRO-K), as the maximization of g over \(u_k\) will lead to \(u_k \ge \bar{d}_k\). Therefore, \(v_i = d_i + \Delta _k\) is also in the domain, since an L-smooth concave function g whose domain is only potentially bounded from below has no holes in its domain above that bound. For monotonically decreasing functions g, the same logic applies with a nonpositive \(\Delta _k\). We now make use of the convexity and L-smoothness conditions [2, Theorem 5.8] on \(-g\): for all \(v_1, v_2 \in S\) and \(\lambda \in [0,1]\),

$$\begin{aligned} g(\lambda v_1 + (1-\lambda ) v_2) \le \lambda g(v_1) + (1 - \lambda )g(v_2) + \frac{L}{2}\lambda (1 - \lambda )\Vert v_1 - v_2 \Vert ^2_2, \end{aligned}$$

which we can apply iteratively, with the first iteration being

$$\begin{aligned} g\left( \frac{1}{|C_k|} v_1 + \frac{|C_k|-1}{|C_k|} \bar{v}_2\right)\le & \frac{1}{|C_k|}g(v_1) + \frac{|C_k|-1}{|C_k|}g(\bar{v}_2) \\ & + \frac{L}{2}\frac{1}{|C_k|}\frac{|C_k|-1}{|C_k|}\Vert v_1 - \bar{v}_2 \Vert ^2_2, \end{aligned}$$

where \(\bar{v}_2 = \frac{1}{|C_k|-1}\sum _{i \in C_k, i\ne 1} v_i\). Note that \(v_1 - \bar{v}_2 = d_1 - \frac{1}{|C_k|-1}\sum _{i \in C_k, i\ne 1} d_i\), as they share the same \({\Delta _k}\). The next iteration will be applied to \(g(\bar{v}_2)\), and so on. For each cluster k, this results in:

$$\begin{aligned} g \left( \frac{1}{|C_k|}\sum _{i \in C_k} v_i, x\right)&\le \frac{1}{|C_k|}\sum _{i \in C_k} g(v_i,x) + \frac{L}{2|C_k|} \sum _{i=2} ^{|C_k|} \frac{i-1}{i} \left\| d_i - \frac{\sum _{j=1}^{i-1}d_j}{i-1} \right\| ^2_2\\ g(\bar{d}_k + \Delta _k,x)&\le \frac{1}{|C_k|}\sum _{i \in C_k} g(v_i,x) + \frac{L}{2|C_k|} \sum _{i \in C_k} \Vert d_i - \bar{d}_k\Vert ^2_2\\ g(u_k,x)&\le \frac{1}{|C_k|}\sum _{i \in C_k} g(v_i,x) + \frac{L}{2|C_k|} \sum _{i \in C_k} \Vert d_i - \bar{d}_k\Vert ^2_2, \end{aligned}$$

where we used the equivalence \( \sum _{i=2} ^{|C_k|}((i-1)/{i}) \left\| d_i - {\sum _{j=1}^{i-1}d_j}/({i-1}) \right\| ^2_2 = \sum _{i \in C_k} \Vert d_i - \bar{d}_k\Vert ^2_2. \) Now, summing over all clusters, we have

$$\begin{aligned} \sum _{k = 1}^K ({|C_k|}/{N} ) g(u_k,x)&\le {1}/{N} \sum _{i = 1}^N g(v_i,x) + (L/2) D(K). \end{aligned}$$

Since this holds for any feasible solution of (MRO-K) with \(K<N\), we must have \( \bar{g}^K(x) \le \bar{g}^{N*}(x) + (L/2) D(K)\).
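
The equivalence \( \sum _{i=2} ^{|C_k|}((i-1)/{i}) \left\| d_i - {\sum _{j=1}^{i-1}d_j}/({i-1}) \right\| ^2_2 = \sum _{i \in C_k} \Vert d_i - \bar{d}_k\Vert ^2_2 \) used in the telescoping step above is the standard incremental form of the within-cluster variance; a minimal numerical sketch (not part of the paper) checking it:

```python
# Check that sum_{i=2}^{n} ((i-1)/i)*||d_i - mean(d_1,...,d_{i-1})||^2 equals
# sum_{i=1}^{n} ||d_i - dbar||^2, where dbar is the mean of all n points.
import numpy as np

rng = np.random.default_rng(3)
d = rng.normal(size=(10, 4))                       # n = 10 points in R^4

lhs = sum((i - 1) / i * np.sum((d[i - 1] - d[: i - 1].mean(axis=0)) ** 2)
          for i in range(2, len(d) + 1))
rhs = np.sum((d - d.mean(axis=0)) ** 2)
print(lhs, rhs)                                    # the two values coincide
```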

Proof of Theorem 5

Proof of the lower bound. We use the dual formulations of the MRO constraints. For the case \(p \ge 1\), we first solve (MRO-K-Dual) with \(K<N\) to obtain dual variables \(z_{jk}\), \(\gamma _{jk}\). For each data label i in cluster \(C_k\), for all clusters \(k = 1,\dots ,K\), and for all pieces \(j = 1,\dots , J\), if we set

$$\begin{aligned} {\lambda = \lambda },\quad z_{ji} = z_{jk}, \quad \gamma _{ji} = \gamma _{jk}, \quad s_i = s_k + \max _j\{(-z_{jk} - H^T\gamma _{jk})^T (d_i - \bar{d}_k)\}, \end{aligned}$$

we have obtained a valid solution for (MRO-K-Dual) with \(K=N\). The increase in objective value from \(\bar{g}^K(x)\), the optimal value of (MRO-K-Dual) with \(K<N\), to the objective value of this constructed solution, which upper bounds \(\bar{g}^N(x)- \bar{g}^K(x)\), is

$$\begin{aligned} \begin{aligned} \delta (K,z,\gamma )&=(1/N)\sum _{i=1}^N s_i - \sum _{k=1}^K(|C_k|/N)s_k \\&= (1/N)\sum _{k = 1}^K\sum _{i\in C_k} \max _j\{(-z_{jk} - H^T\gamma _{jk})^T(d_i - \bar{d}_k)\}. \end{aligned} \end{aligned}$$

We note that \(\delta (K,z,\gamma ) \ge \sum _{k = 1}^K (|C_k|/N) (-z_{1k} - H^T\gamma _{1k})^T (1/|C_k|)\sum _{i\in C_k} (d_i - \bar{d}_k) = \sum _{k = 1}^K (|C_k|/N) (-z_{1k} - H^T\gamma _{1k})^T0 = 0\). The objective value of the constructed feasible solution for (MRO-K-Dual) with \(K=N\) is an upper bound on its optimal value, since it is a minimization problem. We then have \(\bar{g}^N(x) - \bar{g}^K(x) \le \delta (K,z,\gamma )\), which translates to \(\bar{g}^N(x) - \delta (K,z,\gamma ) \le \bar{g}^K(x)\).

Now, for \(p=\infty \), the same procedure can be applied; we obtain a solution to MRO-K-Dual-\(\infty \) with \(K < N\), and construct a feasible solution for MRO-K-Dual-\(\infty \) with \(K=N\). We modify the variables in the same manner, with the exception of \(\lambda \), which is now set by \(\lambda _i = \lambda _k\). The definition of \(\delta (K,z,\gamma )\) remains unchanged, so the same result follows.
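
The nonnegativity of \(\delta (K,z,\gamma )\) noted above holds for any choice of the dual variables; the following minimal sketch (not part of the paper; the dimensions, the cluster assignment, and the random dual variables are illustrative assumptions) checks it numerically.

```python
# delta(K,z,gamma) = (1/N) sum_k sum_{i in C_k} max_j {(-z_jk - H^T gamma_jk)^T (d_i - dbar_k)}
# is nonnegative: the inner max dominates the j = 1 term, whose within-cluster average is 0.
import numpy as np

rng = np.random.default_rng(4)
N, m, K, J, r = 120, 3, 6, 4, 2
d = rng.normal(size=(N, m))
labels = rng.integers(0, K, size=N)
H = rng.normal(size=(r, m))
z = rng.normal(size=(J, K, m))
gamma = rng.normal(size=(J, K, r))

delta = 0.0
for k in range(K):
    dk = d[labels == k]
    coeff = -z[:, k, :] - gamma[:, k, :] @ H           # rows: -z_jk - H^T gamma_jk, j = 1..J
    delta += np.sum(np.max((dk - dk.mean(axis=0)) @ coeff.T, axis=1))
print(delta / N >= -1e-12)                              # delta(K, z, gamma) >= 0
```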

Proof of the upper bound. We use the primal formulations of the MRO constraints. Similar to the single-concave proof, for \(p \ge 1\), we first solve the MRO problem with K clusters to obtain a feasible solution \(u_{11}, \dots , u_{JK}\), \(\alpha _{11}, \dots , \alpha _{JK}\). Note that the existence of \(\alpha \) removes the need to analyze which function \(g_j\) attains the maximum for each cluster. We then set \(\Delta _{jk} =u_{jk} - \bar{d}_k\) for each \(k \le K\), and set \(v_{ji} = d_i + \Delta _{jk}\), \(\alpha _{ji} = \alpha _{jk}/|C_k| \quad \forall i \in C_k, k = 1,\dots ,K\). These satisfy the constraint of the problem with N clusters and without support constraints, as

$$\begin{aligned} \sum _{i=1}^N \sum _{j=1}^J \alpha _{ji} \Vert v_{ji} - d_i \Vert ^p&= \sum _{k=1}^K \sum _{j=1}^J \sum _{i \in C_k} \alpha _{ji} \Vert \Delta _{jk}\Vert ^p = \sum _{k=1}^K \sum _{j=1}^J \alpha _{jk} \Vert u_{jk} - \bar{d}_k\Vert ^p \le \epsilon ^p. \end{aligned}$$

For \(p = \infty \), we repeat the process of obtaining and modifying a feasible solution, and observe, for \(i=1,\dots ,N\),

$$\begin{aligned} \sum _{j=1}^J (\alpha _{ji}/(1/N)) \Vert v_{ji} - d_i \Vert&= \sum _{j=1}^J (\alpha _{jk}/(|C_k|/N)) \Vert \Delta _{jk}\Vert \\&= \sum _{j=1}^J (\alpha _{jk}/w_k) \Vert u_{jk} - \bar{d}_k\Vert \le \epsilon . \end{aligned}$$

Therefore, we also have constraint satisfaction for \(p = \infty \). The constructed solutions for both cases remain in \(\mathop \textbf{dom}_u{g}\), following the arguments in the proof of (ii) in Appendix D.

Next, the objective functions for \(p\ge 1\) and \(p =\infty \) are identical, so the same analysis applies. For each cluster k and function \(g_j\), using the L-smooth condition on \(-g_j\), we observe

$$\begin{aligned} \alpha _{jk} g_j\left( \frac{1}{|C_k|}\sum _{i \in C_k} v_{ji}, x\right)&\le \alpha _{jk} \left( \sum _{i \in C_k} \frac{1}{|C_k|} g_j(v_{ji},x)\right. \\&\quad \left. + \frac{L_j}{2|C_k|} \sum _{i=2} ^{|C_k|} \frac{i-1}{i} \left\| d_i - \frac{\sum _{j=1}^{i-1}d_j}{i-1} \right\| ^2_2\right) \\ \alpha _{jk} g_j(\bar{d}_k + \Delta _k,x)&\le \sum _{i \in C_k} \alpha _{ji} g_j(v_{ji},x) + \frac{\alpha _{ji} L_j}{2} \sum _{i \in C_k} \Vert d_i - \bar{d}_k\Vert ^2_2. \end{aligned}$$

Then, summing over all the clusters and functions, we have

$$\begin{aligned} \sum _{k=1}^K \sum _{j=1}^J \alpha _{jk} g_j(u_{jk},x)&\le \sum _{k=1}^K \sum _{j=1}^J \sum _{i \in C_k} \alpha _{ji} g_j(v_{ji},x)\\&\quad + \sum _{k=1}^K \sum _{j=1}^J \frac{\alpha _{ji} \max _{j\le J} L_j}{2} \sum _{i \in C_k} \Vert d_i - \bar{d}_k\Vert ^2_2\\&\le \sum _{i=1}^N \sum _{j=1}^J \alpha _{ji} g(v_{ji},x) + \max _{j\le J} (L_j/2N)\sum _{i=1}^N \Vert d_i - \bar{d}_k\Vert ^2_2 . \end{aligned}$$

Since this holds for all feasible solutions of the problem with K clusters, we conclude that

$$\begin{aligned} \bar{g}^K(x) \le \bar{g}^{N*}(x) + \max _{j\le J} (L_j/2) D(K).\end{aligned}$$

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, I., Becker, C., Van Parys, B. et al. Mean robust optimization. Math. Program. (2024). https://doi.org/10.1007/s10107-024-02170-4

  • DOI: https://doi.org/10.1007/s10107-024-02170-4
