# Speedup Formulation Formulating a large combinatorial optimization problem using Python's `for` statements can be very time-consuming. The Amplify SDK provides a fast way to formulate large optimization problems in practical time. For example, significant speedups can be achieved using Numpy-like methods in the {py:class}`~amplify.PolyArray` class. This page describes several methods for fast formulations, using the binary variable formulation of the traveling salesperson problem as an example. ## Formulating the traveling salesperson problem The traveling salesperson problem with $n$ cities can be formulated using $(n+1)n$ binary variables as follows: Let $d$ be a distance matrix, where $d_{i,j}$ denotes the distance between city $i$ and city $j$. ```{seealso} We will not explain what the variables, objective function, and constraints refer to since the meaning of the formulation is unimportant here. See [](tsp.ipynb) for more details on formulating the traveling salesperson problem. ``` $$ \begin{align} \text{minimize} \quad & \sum_{0 \leq i, j, k < n} d_{i, j} q_{k, i} q_{k+1, j} & \\ \text{subject to} \quad & \sum_{0 \leq i < n} q_{k, i} = 1 \quad & \text{for} \quad k \in \{0, 1, \ldots, n - 1\}, \\ & \sum_{0 \leq k < n} q_{k, i} = 1 \quad & \text{for} \quad i \in \{0, 1, \ldots, n - 1\}, \\ & q_{0, i} = q_{n, i} \quad & \text{for} \quad i \in \{0, 1, \ldots, n - 1\}, \\ & q_{k, i} \in \{0, 1\} & \end{align} $$ Before starting the formulation, let's create the distance matrix `distance` (= $d$). This time `distance` will be generated as a random symmetric matrix. ```{testcode} import numpy as np NUM_CITIES = 100 # Generate a random symmetric matrix distance = np.zeros((NUM_CITIES, NUM_CITIES)) for i in range(NUM_CITIES): for j in range(i+1, NUM_CITIES): distance[i, j] = distance[j, i] = np.random.rand() ``` ````{note} For practical purposes, it is better to generate the coordinates of each city and find the distance matrix. For example, using [scipy](https://scipy.org/) the following method can be used for fast computation. ```python from scipy.spatial import distance locations = np.random.random((NUM_CITIES, 2)) # randomly generate coordinates distance = distance.cdist(locations, locations, metric='euclidean') ``` ```` ## Naive formulations The `for` statement and the list comprehension, you can formulate the problem as follows. ```{testcode} from amplify import Poly, VariableGenerator, one_hot # Create decision variables gen = VariableGenerator() q = gen.array("Binary", NUM_CITIES + 1, NUM_CITIES) # Set q[NUM_CITIES, i] = q[0, i] for i in range(NUM_CITIES): q[NUM_CITIES, i] = q[0, i] # Construct the objective function objective = 0 for i in range(NUM_CITIES): for j in range(NUM_CITIES): for k in range(NUM_CITIES): objective += distance[i, j] * q[k, i] * q[k + 1, j] # Construct constraints row_one_hot_constraints = sum( one_hot(sum(q[i, k] for k in range(NUM_CITIES))) for i in range(NUM_CITIES) ) col_one_hot_constraints = sum( one_hot(sum(q[i, k] for i in range(NUM_CITIES))) for k in range(NUM_CITIES) ) # Create a model model = objective + (row_one_hot_constraints + col_one_hot_constraints) ``` The resulting formulation time is as follows (depending on the execution environment). ```{list-table} :header-rows: 1 :width: 90% :widths: 1 1 1 * - Objective function - Constraints - Total * - 1.55 s - 13.3 ms - 1.56 s ``` ## Improvement #1: Not using the built-in sum function In the first line of the above code, add ```{testcode} from amplify import sum ``` and use the {py:func}`amplify.sum` function instead of the built-in {py:func}`sum` function. The {py:func}`amplify.sum` function is a class sum-specific function provided by the Amplify SDK. Then the formulation time is improved as follows (depending on the execution environment). ```{list-table} :header-rows: 1 :width: 90% :widths: 1 1 1 * - Objective function - Constraints - Total * - 1.55 s - **3.95 ms** - 1.55 s ``` Simply importing the Amplify SDK's {py:func}`amplify.sum` function speeds up the constraint formulation by about a factor of 3. The difference from the overall time is small, but in the case of the traveling salesman problem, this is because the number of constraints ($2n$) is small compared to the number of terms in the objective function ($O\left(n^3\right)$). This optimization can be very effective if the {py:func}`sum` function is used in areas where the formulation is time-consuming. ```{note} The Amplify SDK's {py:func}`amplify.sum` function overrides the {py:func}`sum` function. The Amplify SDK's {py:func}`sum` function automatically falls back to the built-in {py:func}`sum` function for classes not provided by the Amplify SDK, allowing for replacement. ``` ## Improvement #2: Not using `for` statements and list comprehension The objective function used to be formulated using the `for` statement, but Python's `for` statement is slow. As much as possible, we will try to formulate them using the methods of {py:class}`~amplify.PolyArray`, a numpy-like polynomial array of the Amplify SDK, or numpy. First, let $A$ be the $n \times n$ matrix consisting of the upper $n rows of $q$ and $B$ be the $n\times n$ matrix consisting of the lower $n$ rows. Then the objective function is written as follows. $$ \sum_{0 \leq i, j, k < n} d_{i, j} q_{k, i} q_{k+1, j} = \sum_{0 \leq i, j, k < n} d_{i, j} A_{k, i} B_{k, j} $$ Looking at $A$ and $d$, $$ (Ad)_{k, j} = \sum_{0 \leq k, j < n} A_{k, i} d_{i, j}. $$ Therefore, the objective function $\displaystyle \sum_{0 \leq i, j, k < n} d_{i, j} A_{k, i} B_{k, j}$ can be the sum of "the matrix product of ($A$ and $d$) and the element product of $B$". Thus, the objective function can be written with the Amplify SDK as follows. ```{testcode} q1 = q[:-1] q2 = q[1:] objective = ((q1 @ distance) * q2).sum() ``` Constraints can also be passed to the {py:func}`~amplify.one_hot` function with the `axis` keyword argument and written as follows. ``` row_one_hot_constraints = one_hot(q1, axis=1) col_one_hot_constraints = one_hot(q1, axis=0) ``` Overall the code will looks like this. ```{testcode} from amplify import VariableGenerator, one_hot # Generate decision variables gen = VariableGenerator() q = gen.array("Binary", NUM_CITIES + 1, NUM_CITIES) # Set q[NUM_CITIES, i] = q[0, i] q[-1, :] = q[0, :] # Create slices of q q1 = q[:-1] q2 = q[1:] # Constructe the objective function objective = ((q1 @ distance) * q2).sum() # Construct the constraints row_one_hot_constraints = one_hot(q1, axis=1) col_one_hot_constraints = one_hot(q1, axis=0) model = objective + (row_one_hot_constraints + col_one_hot_constraints) ``` With this modification, the time taken is improved as follows (depending on the execution environment). ```{list-table} :header-rows: 1 :width: 90% :widths: 1 1 1 * - Objective function - Constraints - Total * - **152.1 ms** - **0.870 ms** - **154.4 ms** ``` ## Improvement #3: taking it straightforwardly In the "Improvement #2", we tried to transform the objective function into a form that would work with numpy's methods, but it takes a lot of work to think about the transformation. The Amplify SDK supports the {py:func}`~amplify.einsum` function, which can mechanically formulate such a function using Einstein's contraction notation. First, as in "Improvement #2", by denoting the $n \times n$ matrix consisting of the upper $n$ rows of $q$ as $A$ and the $n\times n$ matrix consisting of the lower n rows as $B$, the objective function is written as follows. $$ \sum_{0 \leq i, j, k < n} d_{i, j} q_{k, i} q_{k+1, j} = \sum_{0 \leq i, j, k < n} d_{i, j} A_{k, i} B_{k, j} $$ Note that the range over which the subscripts $i$, $j$, and $k$ of the matrices $d$, $A$, and $B$ appearing in this formula move coincides with the length of the corresponding row or column, respectively. In such a case, using the {py:func}`~amplify.einsum` function, the computation of the objective function can be written as follows. ``` # Construct the objective function objective = einsum("ij,ki,kj->", distance, q1, q2) ``` Here, the first argument of the {py:func}`~amplify.einsum` function represents the sequence of subscripts of the product in Einstein's contraction notation, where `"ij,ki,kj->"`{l=python}" is the product $d_{i, j} A_{k, i} B_{k, j}$. The second and subsequent arguments then give the arrays corresponding to the respective subscripts. The complete code is as follows. ```{testcode} from amplify import VariableGenerator, one_hot, einsum # Create decision variables gen = VariableGenerator() q = gen.array("Binary", NUM_CITIES + 1, NUM_CITIES) # Set q[NUM_CITIES, i] = q[0, i] q[-1, :] = q[0, :] # Create slices of q q1 = q[:-1] q2 = q[1:] # Construct the objective function objective = einsum("ij,ki,kj->", distance, q1, q2) # Constract the constraints row_one_hot_constraints = one_hot(q1, axis=1) col_one_hot_constraints = one_hot(q1, axis=0) # Create a model model = objective + (row_one_hot_constraints + col_one_hot_constraints) ``` With this formulation, we observed the following times (depending on the execution environment). ```{list-table} :header-rows: 1 :width: 90% :widths: 1 1 1 * - Objective function - Constraints - Total * - **58.11 ms** - 0.870 ms - **60.4 ms** ``` As a result of the above, the objective function and overall computation time became **25 times** faster, and the constraint construction became **15 times** faster.