5.2. Kernel-QA Optimizer

KernelQAOptimizer performs black-box optimization using a second-order polynomial kernel method combined with quantum annealing/Ising machines (Kernel-QA).

Kernel-QA

Kernel-QA is a black-box optimization method very similar to FMQA (see 5.1. FMQA Optimizer).

Kernel-QA flow

The only (and crucial) difference from FMQA is that kernel-QA uses a kernel method with a polynomial kernel, instead of a factorization machine, as its surrogate model/acquisition function.

In the kernel-QA optimizer, the surrogate model/acquisition function \(g(\mathbf{x})\) is constructed analytically from the training data set \((\mathbf{x^*}, \mathbf{y^*})\) (\(N_0\) input-output pairs of the black-box function). In the kernel method, \(g(\mathbf{x})\) is formulated as follows:

\[ g(\mathbf{x}) = \mu(\mathbf{x}) - \beta \sigma(\mathbf{x}). \]

Here, \(\mu(\mathbf{x})\) is a surrogate model that predicts the expected value (mean) of the black-box function given the training data set (\(\mu(\mathbf{x})\) plays the role that the factorization machine plays in the FMQA optimizer), and \(\sigma(\mathbf{x})\) is the estimated uncertainty of the surrogate model at a given \(\mathbf{x}\). This formulation follows the Lower Confidence Bound (LCB) metric, and \(\beta\) is a hyperparameter of the kernel-QA method. When \(\beta = 0\), \(g(\mathbf{x})\) is simply a surrogate model for the black-box function; when \(\beta > 0\), \(g(\mathbf{x})\) becomes an acquisition function, as in Bayesian optimization.
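
As a toy illustration of this trade-off (plain Python, not part of amplify-bbopt), the snippet below shows how \(\beta\) changes the acquisition value for a candidate with nonzero predicted uncertainty:

def lcb(mu_x: float, sigma_x: float, beta: float) -> float:
    """g(x) = mu(x) - beta * sigma(x); smaller is better for minimization."""
    return mu_x - beta * sigma_x

# beta = 0: the acquisition equals the surrogate mean (pure exploitation).
print(lcb(mu_x=1.0, sigma_x=0.5, beta=0.0))  # 1.0
# beta > 0: larger predicted uncertainty lowers g, steering the next sample
# toward less-explored regions (exploration).
print(lcb(mu_x=1.0, sigma_x=0.5, beta=2.0))  # 0.0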

Note

In the kernel-QA method, the formulation of \(\mu(\mathbf{x})\) and \(\sigma(\mathbf{x})\) is the key to reducing computational difficulty (i.e., the curse of dimensionality) and to taking advantage of quantum annealing/Ising machines for optimization. The detailed formulations of \(\mu\) and \(\sigma\) are given below.

Detailed formulation
\[ \mu(\mathbf{x}) = \mathbf{x}^{T} \mathbf{Q}_{\mu} \mathbf{x} + 2\gamma \mathbf{q}_{\mu}^T \mathbf{x}. \]
\[ \sigma(\mathbf{x}) = \mathbf{x}^{T} \mathbf{Q}_{\sigma} \mathbf{x} + 2\gamma \mathbf{q}_{\sigma}^T \mathbf{x}. \]

Here, \(\mathbf{Q}_{\mu}\) and \(\mathbf{q}_{\mu}\) are the coefficient matrix and vector for \(\mu(\mathbf{x})\), respectively, and \(\mathbf{Q}_{\sigma}\) and \(\mathbf{q}_{\sigma}\) are those for \(\sigma(\mathbf{x})\). Both \(\mu\) and \(\sigma\) are quadratic functions of \(\mathbf{x}\) and can therefore be optimized with QUBO solvers.
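
For illustration, the sketch below shows how a quadratic function of this form maps directly onto a QUBO objective over binary decision variables. It assumes the Amplify SDK's VariableGenerator API, and the small \(\mathbf{Q}\), \(\mathbf{q}\), and \(\gamma\) values are dummies for demonstration, not coefficients from a trained model:

from amplify import VariableGenerator

# Dummy coefficients for three binary variables (illustration only).
Q = [[0.0, 1.0, -2.0], [1.0, 0.0, 0.5], [-2.0, 0.5, 0.0]]
q = [0.3, -0.7, 0.1]
gamma = 1.0

gen = VariableGenerator()
x = gen.array("Binary", 3)

# g(x) = x^T Q x + 2 * gamma * q^T x, written out as an Amplify polynomial
# that a QUBO solver (e.g., Amplify AE) can minimize.
g = sum(Q[i][j] * x[i] * x[j] for i in range(3) for j in range(3)) + 2 * gamma * sum(
    q[i] * x[i] for i in range(3)
)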

These coefficient matrices and vectors are constructed from the following polynomial kernels, rather than the radial basis function (RBF) kernels typically used in Gaussian processes. Here, \(k_{\mu}\) is used for \(\mathbf{Q}_{\mu}\) and \(\mathbf{q}_{\mu}\), and \(k_{\sigma}\) for \(\mathbf{Q}_{\sigma}\) and \(\mathbf{q}_{\sigma}\).

\[ k_{\mu}(\mathbf{x}_a, \mathbf{x}_b) = (\mathbf{x}_a^T \mathbf{x}_b)^2 + \gamma. \]
\[ k_{\sigma}(\mathbf{x}_a, \mathbf{x}_b) = \mathbf{x}_a^T \mathbf{x}_b + \gamma. \]

Using these kernels, the coefficients are computed as follows.

\[\begin{split} \begin{align} \mu(\mathbf{x}) &= \sum_{j = 1}^n \hat{c}_j k_{\mu}(\mathbf{x^*}_j, \mathbf{x}) \\ &= \mathbf{x}^{T} \left(\sum_{j = 1}^{n} \hat{c}_j \mathbf{x^*}_j \mathbf{x^*}_j^T\right) \mathbf{x} + 2 \gamma \left(\sum_{j = 1}^{n} \hat{c}_j \mathbf{x^*}_j\right)^T \mathbf{x}. \end{align} \end{split}\]
\[\begin{split} \begin{align} \sigma(\mathbf{x}) &= k_{\sigma}(\mathbf{x}, \mathbf{x}) - \mathbf{k^*}_{\sigma}^T(\mathbf{K} + \lambda \mathbf{I})^{-1} \mathbf{k^*}_{\sigma} \\ &= \mathbf{x}^T \left\{\mathbf{I} - \left(\sum_{i = 1}^{n} \sum_{j = 1}^{n} L_{ij} \mathbf{x^*}_i  \mathbf{x^*}_j^T \right) \right\} \mathbf{x} - 2 \gamma \sum_{i = 1}^{n}\sum_{j = 1}^{n} L_{ij} (\mathbf{x^*}_{i}^T \mathbf{x}). \end{align} \end{split}\]

Here, \(\mathbf{k^*}_{\sigma} = [k_{\sigma}(\mathbf{x}, \mathbf{x^*}_1), \cdots, k_{\sigma}(\mathbf{x}, \mathbf{x^*}_n)]^T\), where \(n\) is the number of samples in the training data \((\mathbf{x^*}, \mathbf{y^*})\). \(\mathbf{K}\) is the Gram matrix whose \((i, j)\) element is \(k_{\mu/\sigma}(\mathbf{x^*}_i, \mathbf{x^*}_j)\), and \(L_{ij}\) is the \((i, j)\) element of \((\mathbf{K} + \lambda \mathbf{I})^{-1}\). \(\hat{c}_i\) is the \(i\)-th element of the coefficient vector \(\hat{\mathbf{c}} = (\mathbf{K} + \lambda \mathbf{I})^{-1} \mathbf{y^*}\), where \(\lambda\) is a regularization parameter.
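
For reference, the following NumPy sketch constructs these coefficient matrices and vectors directly from the formulas above. It is an illustration of the math rather than the library's internal implementation, and the function name build_qubo_coefficients is made up:

import numpy as np

def build_qubo_coefficients(X, y, beta=0.0, gamma=1.0, lam=1.0):
    """Build Q and q such that g(x) = x^T Q x + 2 * gamma * q^T x,
    from training inputs X (shape (n, d)) and outputs y (shape (n,)),
    following the kernel-QA formulas above. Naive matrix inversion is
    used here for clarity (see the note below on incremental updates)."""
    n, d = X.shape
    gram = X @ X.T                                       # entries x*_i^T x*_j
    K_mu = gram**2 + gamma                               # k_mu(x_a, x_b) = (x_a^T x_b)^2 + gamma
    K_sigma = gram + gamma                               # k_sigma(x_a, x_b) = x_a^T x_b + gamma

    c_hat = np.linalg.solve(K_mu + lam * np.eye(n), y)   # (K + lambda I)^{-1} y*
    L = np.linalg.inv(K_sigma + lam * np.eye(n))         # L_ij in the sigma formula

    Q_mu = X.T @ np.diag(c_hat) @ X                      # sum_j c_j x*_j x*_j^T
    q_mu = X.T @ c_hat                                   # sum_j c_j x*_j
    Q_sigma = np.eye(d) - X.T @ L @ X                    # I - sum_ij L_ij x*_i x*_j^T
    q_sigma = -X.T @ L @ np.ones(n)                      # sigma(x) = x^T Q_sigma x + 2 gamma q_sigma^T x

    # g(x) = mu(x) - beta * sigma(x) is again quadratic in x.
    return Q_mu - beta * Q_sigma, q_mu - beta * q_sigma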

Note that naively computing \((\mathbf{K} + \lambda \mathbf{I})^{-1}\) is computationally expensive. To avoid this, \((\mathbf{K} + \lambda \mathbf{I})^{-1}\) is updated incrementally each time a new training input-output pair is added at the end of an optimization cycle (④ in the above figure).
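
As an illustration of this incremental update, the inverse can be extended by one row and one column with the standard block-matrix (Schur-complement) identity, at O(n²) cost per new sample. This is a sketch with a made-up function name, not the library's actual code:

import numpy as np

def extend_inverse(A_inv, b, c):
    """Given A_inv = (K + lam * I)^{-1} for the current n samples, return the
    inverse of the (n+1) x (n+1) matrix obtained by appending the new sample,
    where b[i] = k(x_new, x*_i) and c = k(x_new, x_new) + lam."""
    u = A_inv @ b                       # A^{-1} b
    s = c - b @ u                       # Schur complement (a scalar)
    top_left = A_inv + np.outer(u, u) / s
    top_right = -u[:, None] / s
    return np.block([[top_left, top_right], [top_right.T, np.array([[1.0 / s]])]])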

In the above formulation, \(\beta\), \(\gamma\), and \(\lambda\) are hyperparameters.

Using the kernel-QA optimizer

Using the kernel-QA optimizer is almost identical to using the FMQA optimizer; you only need to replace FMQAOptimizer with KernelQAOptimizer. The following walks through the use of KernelQAOptimizer.

The code below prepares the black-box function and the initial training data, as already described in “2. Black-Box Function”.

import numpy as np
from amplify_bbopt import (
    DatasetGenerator,
    KernelQAOptimizer,
    RealVariable,
    blackbox,
)

from utils.pseudo_simulators import (
    pseudo_wing_simulator as wing_simulator,
)

np.set_printoptions(legacy="1.25")


@blackbox
def objective_lift_drag(
    wing_width: float = RealVariable(bounds=(1, 20), nbins=100),
    wing_height: float = RealVariable(bounds=(1, 5), nbins=20),
    wing_angle: float = RealVariable(bounds=(0, 45), nbins=20),
) -> float:
    """This black-box function executes wing_simulator(), and returns
    the negative lift-drag ratio for a given wing's width, height, and angle.
    """
    lift, drag = wing_simulator(wing_width, wing_height, wing_angle)
    lift_drag = lift / drag
    print(f"{lift=:.2e}, {drag=:.2e}, {lift_drag=:.2e}")
    return -lift_drag  # value to minimize


# Generate initial training data set
data = DatasetGenerator(objective=objective_lift_drag).generate(num_samples=10)
amplify-bbopt | 2024/10/04 05:47:14 | INFO | ----------------------------------------
amplify-bbopt | 2024/10/04 05:47:14 | INFO | #0/10 initial data for objective_lift_drag
amplify-bbopt | 2024/10/04 05:47:14 | INFO | - [obj]: lift=6.22e+02, drag=1.60e+02, lift_drag=3.89e+00
amplify-bbopt | 2024/10/04 05:47:14 | INFO | ----------------------------------------
amplify-bbopt | 2024/10/04 05:47:14 | INFO | #1/10 initial data for objective_lift_drag
amplify-bbopt | 2024/10/04 05:47:14 | INFO | - [obj]: lift=1.36e+02, drag=5.80e+01, lift_drag=2.34e+00
amplify-bbopt | 2024/10/04 05:47:14 | INFO | ----------------------------------------
amplify-bbopt | 2024/10/04 05:47:14 | INFO | #2/10 initial data for objective_lift_drag
amplify-bbopt | 2024/10/04 05:47:14 | INFO | - [obj]: lift=2.41e+01, drag=1.12e+01, lift_drag=2.16e+00
amplify-bbopt | 2024/10/04 05:47:14 | INFO | ----------------------------------------
amplify-bbopt | 2024/10/04 05:47:14 | INFO | #3/10 initial data for objective_lift_drag
amplify-bbopt | 2024/10/04 05:47:14 | INFO | - [obj]: lift=5.86e+02, drag=1.70e+02, lift_drag=3.46e+00
amplify-bbopt | 2024/10/04 05:47:14 | INFO | ----------------------------------------
amplify-bbopt | 2024/10/04 05:47:14 | INFO | #4/10 initial data for objective_lift_drag
amplify-bbopt | 2024/10/04 05:47:14 | INFO | - [obj]: lift=3.75e+02, drag=1.55e+02, lift_drag=2.43e+00
amplify-bbopt | 2024/10/04 05:47:14 | INFO | ----------------------------------------
amplify-bbopt | 2024/10/04 05:47:14 | INFO | #5/10 initial data for objective_lift_drag
amplify-bbopt | 2024/10/04 05:47:14 | INFO | - [obj]: lift=5.32e+02, drag=1.55e+02, lift_drag=3.44e+00
amplify-bbopt | 2024/10/04 05:47:14 | INFO | ----------------------------------------
amplify-bbopt | 2024/10/04 05:47:14 | INFO | #6/10 initial data for objective_lift_drag
amplify-bbopt | 2024/10/04 05:47:14 | INFO | - [obj]: lift=5.71e+02, drag=2.58e+02, lift_drag=2.21e+00
amplify-bbopt | 2024/10/04 05:47:14 | INFO | ----------------------------------------
amplify-bbopt | 2024/10/04 05:47:14 | INFO | #7/10 initial data for objective_lift_drag
amplify-bbopt | 2024/10/04 05:47:14 | INFO | - [obj]: lift=6.18e+02, drag=1.71e+02, lift_drag=3.63e+00
amplify-bbopt | 2024/10/04 05:47:14 | INFO | ----------------------------------------
amplify-bbopt | 2024/10/04 05:47:14 | INFO | #8/10 initial data for objective_lift_drag
amplify-bbopt | 2024/10/04 05:47:14 | INFO | - [obj]: lift=3.95e+02, drag=2.33e+02, lift_drag=1.69e+00
amplify-bbopt | 2024/10/04 05:47:14 | INFO | ----------------------------------------
amplify-bbopt | 2024/10/04 05:47:14 | INFO | #9/10 initial data for objective_lift_drag
amplify-bbopt | 2024/10/04 05:47:14 | INFO | - [obj]: lift=6.62e+01, drag=1.77e+02, lift_drag=3.73e-01

Now, we will set up a solver client that optimizes the surrogate model \(g(\mathbf{x})\) (step ② in the above figure) at each kernel-QA cycle.

You can choose a solver client from the Amplify SDK. In this example, we use Amplify Annealing Engine (Amplify AE) as the solver. All available solvers are listed here. The timeout set in client.parameters.timeout corresponds to the search timeout for the optimization step (② in the above figure) in each kernel-QA cycle.

You also need to provide your API token for the solver (for Amplify AE, you can get one for free after user registration).

from datetime import timedelta

from amplify import FixstarsClient

# Set up solver client
client = FixstarsClient()
client.parameters.timeout = timedelta(milliseconds=2000)  # 2 seconds
# client.token = "xxxxxxxxxxx"  # Enter your Amplify AE API token & remove comment.

Finally, using the data, client, and objective_lift_drag defined above, you can instantiate the KernelQAOptimizer class as follows. Printing the optimizer shows an overview of the black-box optimization settings.

# Instantiate the Kernel-QA optimizer
optimizer = KernelQAOptimizer(
    data=data, objective=objective_lift_drag, client=client
)

# Display the overall black-box optimization setting
print(optimizer)
num variables: 3
num elemental variables: 3
num amplify variables: 137
optimizer client: FixstarsClient
objective weight: 1.0
--------------------
trainer class: ModelKernelTrainer
model class: ModelKernel
model params: {beta: 0.0, gamma: 0.0}
reg_param: 1

To run the optimization cycles, call the optimizer's optimize() method with the number of optimization cycles. The output shows the evolution of the best objective at each cycle; with appropriate settings, the best objective is expected to decrease on average over the cycles.

# Perform kernel-QA optimization for [num_cycles] cycles
optimizer.optimize(num_cycles=5)
amplify-bbopt | 2024/10/04 05:47:14 | INFO | ----------------------------------------
amplify-bbopt | 2024/10/04 05:47:14 | INFO | #1/5 optimization cycle, constraint wt: 7.77e+00
amplify-bbopt | 2024/10/04 05:47:14 | INFO | model corrcoef: 1.000, beta: 0.0
amplify-bbopt | 2024/10/04 05:47:16 | INFO | num_iterations: 21
amplify-bbopt | 2024/10/04 05:47:16 | INFO | - [obj]: lift=4.22e+02, drag=7.20e+01, lift_drag=5.86e+00
amplify-bbopt | 2024/10/04 05:47:16 | INFO | y_hat=-5.857e+00, best objective=-5.857e+00
amplify-bbopt | 2024/10/04 05:47:16 | INFO | ----------------------------------------
amplify-bbopt | 2024/10/04 05:47:16 | INFO | #2/5 optimization cycle, constraint wt: 1.17e+01
amplify-bbopt | 2024/10/04 05:47:16 | INFO | model corrcoef: 1.000, beta: 0.0
amplify-bbopt | 2024/10/04 05:47:19 | INFO | num_iterations: 20
amplify-bbopt | 2024/10/04 05:47:19 | INFO | - [obj]: lift=4.31e+02, drag=7.24e+01, lift_drag=5.94e+00
amplify-bbopt | 2024/10/04 05:47:19 | INFO | y_hat=-5.944e+00, best objective=-5.944e+00
amplify-bbopt | 2024/10/04 05:47:19 | INFO | ----------------------------------------
amplify-bbopt | 2024/10/04 05:47:19 | INFO | #3/5 optimization cycle, constraint wt: 1.19e+01
amplify-bbopt | 2024/10/04 05:47:19 | INFO | model corrcoef: 1.000, beta: 0.0
amplify-bbopt | 2024/10/04 05:47:22 | INFO | num_iterations: 20
amplify-bbopt | 2024/10/04 05:47:22 | INFO | - [obj]: lift=4.44e+02, drag=7.31e+01, lift_drag=6.07e+00
amplify-bbopt | 2024/10/04 05:47:22 | INFO | y_hat=-6.074e+00, best objective=-6.074e+00
amplify-bbopt | 2024/10/04 05:47:22 | INFO | ----------------------------------------
amplify-bbopt | 2024/10/04 05:47:22 | INFO | #4/5 optimization cycle, constraint wt: 1.21e+01
amplify-bbopt | 2024/10/04 05:47:22 | INFO | model corrcoef: 1.000, beta: 0.0
amplify-bbopt | 2024/10/04 05:47:25 | INFO | num_iterations: 20
amplify-bbopt | 2024/10/04 05:47:25 | INFO | - [obj]: lift=4.48e+02, drag=7.33e+01, lift_drag=6.12e+00
amplify-bbopt | 2024/10/04 05:47:25 | INFO | y_hat=-6.117e+00, best objective=-6.117e+00
amplify-bbopt | 2024/10/04 05:47:25 | INFO | ----------------------------------------
amplify-bbopt | 2024/10/04 05:47:25 | INFO | #5/5 optimization cycle, constraint wt: 1.22e+01
amplify-bbopt | 2024/10/04 05:47:25 | INFO | model corrcoef: 1.000, beta: 0.0
amplify-bbopt | 2024/10/04 05:47:28 | INFO | num_iterations: 20
amplify-bbopt | 2024/10/04 05:47:28 | INFO | - [obj]: lift=4.57e+02, drag=7.38e+01, lift_drag=6.20e+00
amplify-bbopt | 2024/10/04 05:47:28 | INFO | y_hat=-6.202e+00, best objective=-6.202e+00

Once all optimization cycles are completed, you can check the final solution as follows. You can also plot the optimization history; see 6. Visualization for more details.

# Print results
print(f"{optimizer.best_solution=}")  # Solution (optimal input)
print(f"{optimizer.best_objective=:.3e}")  # Objective function value
optimizer.best_solution={'wing_width': 19.616161616161616, 'wing_height': 2.263157894736842, 'wing_angle': 7.105263157894736}
optimizer.best_objective=-6.202e+00

Advanced setting

By default, a KernelQAOptimizer instance uses the ModelKernel class as the surrogate model class and the ModelKernelTrainer class as the model trainer class. The ModelKernel class has the following model parameters.

  • \(\beta\): A weight for \(\sigma\) in the lower confidence bound (LCB). If multiple values of beta are given as a list, the values are used cyclically, one per optimization cycle (i.e., each time the model is constructed); an example sketch follows this list. When beta = 0 or min(beta) = 0, ModelKernel.is_sigma_required is False, meaning the model uncertainty is not considered.

  • \(\gamma\): A constant in the kernel function that defines a linear term in the model. gamma = 0 means no linear term is considered in the model. Note that when one of the a priori encoding methods (see conversion.ipynb) is chosen for all decision variables, linear terms are expected to be captured naturally within the quadratic terms.
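
For example, the following sketch alternates between a purely exploitative model (beta = 0) and an uncertainty-aware model (beta = 2) across optimization cycles. The values are illustrative, and it assumes set_model_params accepts a list for beta in the same way it accepts a scalar (as shown for gamma under "Model parameters" below):

trainer = optimizer.trainer
# Illustrative values: beta cycles 0.0 -> 2.0 -> 0.0 -> ... over the optimization cycles.
trainer.set_model_params(beta=[0.0, 2.0])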

The default training parameters are described in ModelKernelTrainer.set_train_params. You can adjust or replace these model and trainer classes and their parameters.

Model parameters

In the above optimization problem, let's adjust the parameter \(\gamma\) of the kernel model (ModelKernel) from its default value to 0.

trainer = optimizer.trainer
trainer.set_model_params(gamma=0)

Training parameters

You can also adjust the training parameters of the model. For the default trainer class, ModelKernelTrainer, use the ModelKernelTrainer.set_train_params method. The following line changes the regularization parameter reg_param. With print(trainer), you can confirm that the changes to the model and training parameters above have been applied.

trainer.set_train_params(reg_param=0.1)
print(trainer)
--------------------
trainer class: ModelKernelTrainer
model class: ModelKernel
model params: {beta: 0.0, gamma: 0}
reg_param: 0.1

You can also use your own custom model and trainer classes. For details, see Custom Trainer and Model.