# Speedup Formulation
Formulating a large combinatorial optimization problem using Python's `for` statements can be very time-consuming. The Amplify SDK provides a fast way to formulate large optimization problems in practical time.
For example, significant speedups can be achieved using Numpy-like methods in the {py:class}`~amplify.PolyArray` class.
This page describes several methods for fast formulations, using the binary variable formulation of the traveling salesperson problem as an example.
## Formulating the traveling salesperson problem
The traveling salesperson problem with $n$ cities can be formulated using $(n+1)n$ binary variables as follows: Let $d$ be a distance matrix, where $d_{i,j}$ denotes the distance between city $i$ and city $j$.
```{seealso}
We will not explain what the variables, objective function, and constraints refer to since the meaning of the formulation is unimportant here. See [](tsp.ipynb) for more details on formulating the traveling salesperson problem.
```
$$
\begin{align}
\text{minimize} \quad & \sum_{0 \leq i, j, k < n} d_{i, j} q_{k, i} q_{k+1, j} & \\
\text{subject to} \quad & \sum_{0 \leq i < n} q_{k, i} = 1 \quad & \text{for} \quad k \in \{0, 1, \ldots, n - 1\}, \\
& \sum_{0 \leq k < n} q_{k, i} = 1 \quad & \text{for} \quad i \in \{0, 1, \ldots, n - 1\}, \\
& q_{0, i} = q_{n, i} \quad & \text{for} \quad i \in \{0, 1, \ldots, n - 1\}, \\
& q_{k, i} \in \{0, 1\} &
\end{align}
$$
Before starting the formulation, let's create the distance matrix `distance` (= $d$). This time `distance` will be generated as a random symmetric matrix.
```{testcode}
import numpy as np
NUM_CITIES = 100
# Generate a random symmetric matrix
distance = np.zeros((NUM_CITIES, NUM_CITIES))
for i in range(NUM_CITIES):
for j in range(i+1, NUM_CITIES):
distance[i, j] = distance[j, i] = np.random.rand()
```
````{note}
For practical purposes, it is better to generate the coordinates of each city and find the distance matrix. For example, using [scipy](https://scipy.org/) the following method can be used for fast computation.
```python
from scipy.spatial import distance
locations = np.random.random((NUM_CITIES, 2)) # randomly generate coordinates
distance = distance.cdist(locations, locations, metric='euclidean')
```
````
## Naive formulations
The `for` statement and the list comprehension, you can formulate the problem as follows.
```{testcode}
from amplify import Poly, VariableGenerator, one_hot
# Create decision variables
gen = VariableGenerator()
q = gen.array("Binary", NUM_CITIES + 1, NUM_CITIES)
# Set q[NUM_CITIES, i] = q[0, i]
for i in range(NUM_CITIES):
q[NUM_CITIES, i] = q[0, i]
# Construct the objective function
objective = 0
for i in range(NUM_CITIES):
for j in range(NUM_CITIES):
for k in range(NUM_CITIES):
objective += distance[i, j] * q[k, i] * q[k + 1, j]
# Construct constraints
row_one_hot_constraints = sum(
one_hot(sum(q[i, k] for k in range(NUM_CITIES))) for i in range(NUM_CITIES)
)
col_one_hot_constraints = sum(
one_hot(sum(q[i, k] for i in range(NUM_CITIES))) for k in range(NUM_CITIES)
)
# Create a model
model = objective + (row_one_hot_constraints + col_one_hot_constraints)
```
The resulting formulation time is as follows (depending on the execution environment).
```{list-table}
:header-rows: 1
:width: 90%
:widths: 1 1 1
* - Objective function
- Constraints
- Total
* - 1.55 s
- 13.3 ms
- 1.56 s
```
## Improvement #1: Not using the built-in sum function
In the first line of the above code, add
```{testcode}
from amplify import sum
```
and use the {py:func}`amplify.sum` function instead of the built-in {py:func}`sum` function. The {py:func}`amplify.sum` function is a class sum-specific function provided by the Amplify SDK. Then the formulation time is improved as follows (depending on the execution environment).
```{list-table}
:header-rows: 1
:width: 90%
:widths: 1 1 1
* - Objective function
- Constraints
- Total
* - 1.55 s
- **3.95 ms**
- 1.55 s
```
Simply importing the Amplify SDK's {py:func}`amplify.sum` function speeds up the constraint formulation by about a factor of 3. The difference from the overall time is small, but in the case of the traveling salesman problem, this is because the number of constraints ($2n$) is small compared to the number of terms in the objective function ($O\left(n^3\right)$). This optimization can be very effective if the {py:func}`sum` function is used in areas where the formulation is time-consuming.
```{note}
The Amplify SDK's {py:func}`amplify.sum` function overrides the {py:func}`sum` function. The Amplify SDK's {py:func}`sum` function automatically falls back to the built-in {py:func}`sum` function for classes not provided by the Amplify SDK, allowing for replacement.
```
## Improvement #2: Not using `for` statements and list comprehension
The objective function used to be formulated using the `for` statement, but Python's `for` statement is slow. As much as possible, we will try to formulate them using the methods of {py:class}`~amplify.PolyArray`, a numpy-like polynomial array of the Amplify SDK, or numpy.
First, let $A$ be the $n \times n$ matrix consisting of the upper $n rows of $q$ and $B$ be the $n\times n$ matrix consisting of the lower $n$ rows. Then the objective function is written as follows.
$$
\sum_{0 \leq i, j, k < n} d_{i, j} q_{k, i} q_{k+1, j} = \sum_{0 \leq i, j, k < n} d_{i, j} A_{k, i} B_{k, j}
$$
Looking at $A$ and $d$,
$$
(Ad)_{k, j} = \sum_{0 \leq k, j < n} A_{k, i} d_{i, j}.
$$
Therefore, the objective function $\displaystyle \sum_{0 \leq i, j, k < n} d_{i, j} A_{k, i} B_{k, j}$ can be the sum of "the matrix product of ($A$ and $d$) and the element product of $B$". Thus, the objective function can be written with the Amplify SDK as follows.
```{testcode}
q1 = q[:-1]
q2 = q[1:]
objective = ((q1 @ distance) * q2).sum()
```
Constraints can also be passed to the {py:func}`~amplify.one_hot` function with the `axis` keyword argument and written as follows.
```
row_one_hot_constraints = one_hot(q1, axis=1)
col_one_hot_constraints = one_hot(q1, axis=0)
```
Overall the code will looks like this.
```{testcode}
from amplify import VariableGenerator, one_hot
# Generate decision variables
gen = VariableGenerator()
q = gen.array("Binary", NUM_CITIES + 1, NUM_CITIES)
# Set q[NUM_CITIES, i] = q[0, i]
q[-1, :] = q[0, :]
# Create slices of q
q1 = q[:-1]
q2 = q[1:]
# Constructe the objective function
objective = ((q1 @ distance) * q2).sum()
# Construct the constraints
row_one_hot_constraints = one_hot(q1, axis=1)
col_one_hot_constraints = one_hot(q1, axis=0)
model = objective + (row_one_hot_constraints + col_one_hot_constraints)
```
With this modification, the time taken is improved as follows (depending on the execution environment).
```{list-table}
:header-rows: 1
:width: 90%
:widths: 1 1 1
* - Objective function
- Constraints
- Total
* - **152.1 ms**
- **0.870 ms**
- **154.4 ms**
```
## Improvement #3: taking it straightforwardly
In the "Improvement #2", we tried to transform the objective function into a form that would work with numpy's methods, but it takes a lot of work to think about the transformation. The Amplify SDK supports the {py:func}`~amplify.einsum` function, which can mechanically formulate such a function using Einstein's contraction notation.
First, as in "Improvement #2", by denoting the $n \times n$ matrix consisting of the upper $n$ rows of $q$ as $A$ and the $n\times n$ matrix consisting of the lower n rows as $B$, the objective function is written as follows.
$$
\sum_{0 \leq i, j, k < n} d_{i, j} q_{k, i} q_{k+1, j} = \sum_{0 \leq i, j, k < n} d_{i, j} A_{k, i} B_{k, j}
$$
Note that the range over which the subscripts $i$, $j$, and $k$ of the matrices $d$, $A$, and $B$ appearing in this formula move coincides with the length of the corresponding row or column, respectively. In such a case, using the {py:func}`~amplify.einsum` function, the computation of the objective function can be written as follows.
```
# Construct the objective function
objective = einsum("ij,ki,kj->", distance, q1, q2)
```
Here, the first argument of the {py:func}`~amplify.einsum` function represents the sequence of subscripts of the product in Einstein's contraction notation, where `"ij,ki,kj->"`{l=python}" is the product $d_{i, j} A_{k, i} B_{k, j}$. The second and subsequent arguments then give the arrays corresponding to the respective subscripts.
The complete code is as follows.
```{testcode}
from amplify import VariableGenerator, one_hot, einsum
# Create decision variables
gen = VariableGenerator()
q = gen.array("Binary", NUM_CITIES + 1, NUM_CITIES)
# Set q[NUM_CITIES, i] = q[0, i]
q[-1, :] = q[0, :]
# Create slices of q
q1 = q[:-1]
q2 = q[1:]
# Construct the objective function
objective = einsum("ij,ki,kj->", distance, q1, q2)
# Constract the constraints
row_one_hot_constraints = one_hot(q1, axis=1)
col_one_hot_constraints = one_hot(q1, axis=0)
# Create a model
model = objective + (row_one_hot_constraints + col_one_hot_constraints)
```
With this formulation, we observed the following times (depending on the execution environment).
```{list-table}
:header-rows: 1
:width: 90%
:widths: 1 1 1
* - Objective function
- Constraints
- Total
* - **58.11 ms**
- 0.870 ms
- **60.4 ms**
```
As a result of the above, the objective function and overall computation time became **25 times** faster, and the constraint construction became **15 times** faster.