# Speedup Formulation

Formulating a large combinatorial optimization problem using Python's `for` statements can be very time-consuming. The Amplify SDK provides a fast way to formulate large optimization problems in practical time.
For example, significant speedups can be achieved using Numpy-like methods in the {py:class}`~amplify.PolyArray` class.

This page describes several methods for fast formulations, using the binary variable formulation of the traveling salesperson problem as an example.

## Formulating the traveling salesperson problem

The traveling salesperson problem with $n$ cities can be formulated using $(n+1)n$ binary variables as follows: Let $d$ be a distance matrix, where $d_{i,j}$ denotes the distance between city $i$ and city $j$.

```{seealso}
We will not explain what the variables, objective function, and constraints refer to since the meaning of the formulation is unimportant here. See [](tsp.ipynb) for more details on formulating the traveling salesperson problem.
```

$$
\begin{align}
\text{minimize}   \quad & \sum_{0 \leq i, j, k < n} d_{i, j} q_{k, i} q_{k+1, j} & \\
\text{subject to} \quad & \sum_{0 \leq i < n} q_{k, i} = 1 \quad & \text{for} \quad k \in \{0, 1, \ldots, n - 1\}, \\
                        & \sum_{0 \leq k < n} q_{k, i} = 1 \quad & \text{for} \quad i \in \{0, 1, \ldots, n - 1\}, \\
                        & q_{0, i} = q_{n, i} \quad & \text{for} \quad i \in \{0, 1, \ldots, n - 1\}, \\
                        & q_{k, i} \in \{0, 1\}        &
\end{align}
$$

Before starting the formulation, let's create the distance matrix `distance` (= $d$). This time `distance` will be generated as a random symmetric matrix.

```{testcode}
import numpy as np

NUM_CITIES = 100

# Generate a random symmetric matrix
distance = np.zeros((NUM_CITIES, NUM_CITIES))
for i in range(NUM_CITIES):
    for j in range(i+1, NUM_CITIES):
        distance[i, j] = distance[j, i] = np.random.rand()
```

````{note}
For practical purposes, it is better to generate the coordinates of each city and find the distance matrix. For example, using [scipy](https://scipy.org/) the following method can be used for fast computation.

```python
from scipy.spatial import distance

locations = np.random.random((NUM_CITIES, 2)) # randomly generate coordinates
distance = distance.cdist(locations, locations, metric='euclidean')
```

````

## Naive formulations

The `for` statement and the list comprehension, you can formulate the problem as follows.

```{testcode}
from amplify import Poly, VariableGenerator, one_hot

# Create decision variables
gen = VariableGenerator()
q = gen.array("Binary", NUM_CITIES + 1, NUM_CITIES)

# Set q[NUM_CITIES, i] = q[0, i]
for i in range(NUM_CITIES):
    q[NUM_CITIES, i] = q[0, i]

# Construct the objective function
objective = 0
for i in range(NUM_CITIES):
    for j in range(NUM_CITIES):
        for k in range(NUM_CITIES):
            objective += distance[i, j] * q[k, i] * q[k + 1, j]

# Construct constraints
row_one_hot_constraints = sum(
    one_hot(sum(q[i, k] for k in range(NUM_CITIES))) for i in range(NUM_CITIES)
)
col_one_hot_constraints = sum(
    one_hot(sum(q[i, k] for i in range(NUM_CITIES))) for k in range(NUM_CITIES)
)

# Create a model
model = objective + (row_one_hot_constraints + col_one_hot_constraints)
```

The resulting formulation time is as follows (depending on the execution environment).

```{list-table}
:header-rows: 1
:width: 90%
:widths: 1 1 1
* - Objective function
  - Constraints
  - Total
* - 1.55 s
  - 13.3 ms
  - 1.56 s
```

## Improvement #1: Not using the built-in sum function

In the first line of the above code, add

```{testcode}
from amplify import sum
```

and use the {py:func}`amplify.sum` function instead of the built-in {py:func}`sum` function. The {py:func}`amplify.sum` function is a class sum-specific function provided by the Amplify SDK. Then the formulation time is improved as follows (depending on the execution environment).

```{list-table}
:header-rows: 1
:width: 90%
:widths: 1 1 1
* - Objective function
  - Constraints
  - Total
* - 1.55 s
  - **3.95 ms**
  - 1.55 s
```

Simply importing the Amplify SDK's {py:func}`amplify.sum` function speeds up the constraint formulation by about a factor of 3. The difference from the overall time is small, but in the case of the traveling salesman problem, this is because the number of constraints ($2n$) is small compared to the number of terms in the objective function ($O\left(n^3\right)$). This optimization can be very effective if the {py:func}`sum` function is used in areas where the formulation is time-consuming.

```{note}
The Amplify SDK's {py:func}`amplify.sum` function overrides the {py:func}`sum` function. The Amplify SDK's {py:func}`sum` function automatically falls back to the built-in {py:func}`sum` function for classes not provided by the Amplify SDK, allowing for replacement.
```

## Improvement #2: Not using `for` statements and list comprehension

The objective function used to be formulated using the `for` statement, but Python's `for` statement is slow. As much as possible, we will try to formulate them using the methods of {py:class}`~amplify.PolyArray`, a numpy-like polynomial array of the Amplify SDK, or numpy.

First, let $A$ be the $n \times n$ matrix consisting of the upper $n rows of $q$ and $B$ be the $n\times n$ matrix consisting of the lower $n$ rows. Then the objective function is written as follows.

$$
\sum_{0 \leq i, j, k < n} d_{i, j} q_{k, i} q_{k+1, j} = \sum_{0 \leq i, j, k < n} d_{i, j} A_{k, i} B_{k, j} 
$$

Looking at $A$ and $d$,

$$
(Ad)_{k, j} = \sum_{0 \leq k, j < n} A_{k, i} d_{i, j}.
$$

Therefore, the objective function $\displaystyle \sum_{0 \leq i, j, k < n} d_{i, j} A_{k, i} B_{k, j}$ can be the sum of "the matrix product of ($A$ and $d$) and the element product of $B$". Thus, the objective function can be written with the Amplify SDK as follows.

```{testcode}
q1 = q[:-1]
q2 = q[1:]
objective = ((q1 @ distance) * q2).sum()
```

Constraints can also be passed to the {py:func}`~amplify.one_hot` function with the `axis` keyword argument and written as follows.

```
row_one_hot_constraints = one_hot(q1, axis=1)
col_one_hot_constraints = one_hot(q1, axis=0)
```

Overall the code will looks like this.

```{testcode}
from amplify import VariableGenerator, one_hot

# Generate decision variables
gen = VariableGenerator()
q = gen.array("Binary", NUM_CITIES + 1, NUM_CITIES)

# Set q[NUM_CITIES, i] = q[0, i]
q[-1, :] = q[0, :]

# Create slices of q
q1 = q[:-1]
q2 = q[1:]

# Constructe the objective function
objective = ((q1 @ distance) * q2).sum()

# Construct the constraints
row_one_hot_constraints = one_hot(q1, axis=1)
col_one_hot_constraints = one_hot(q1, axis=0)

model = objective + (row_one_hot_constraints + col_one_hot_constraints)
```

With this modification, the time taken is improved as follows (depending on the execution environment).

```{list-table}
:header-rows: 1
:width: 90%
:widths: 1 1 1
* - Objective function
  - Constraints
  - Total
* - **152.1 ms**
  - **0.870 ms**
  - **154.4 ms**
```

## Improvement #3: taking it straightforwardly

In the "Improvement #2", we tried to transform the objective function into a form that would work with numpy's methods, but it takes a lot of work to think about the transformation. The Amplify SDK supports the {py:func}`~amplify.einsum` function, which can mechanically formulate such a function using Einstein's contraction notation.

First, as in "Improvement #2", by denoting the $n \times n$ matrix consisting of the upper $n$ rows of $q$ as $A$ and the $n\times n$ matrix consisting of the lower n rows as $B$, the objective function is written as follows.

$$
\sum_{0 \leq i, j, k < n} d_{i, j} q_{k, i} q_{k+1, j} = \sum_{0 \leq i, j, k < n} d_{i, j} A_{k, i} B_{k, j}
$$

Note that the range over which the subscripts $i$, $j$, and $k$ of the matrices $d$, $A$, and $B$ appearing in this formula move coincides with the length of the corresponding row or column, respectively. In such a case, using the {py:func}`~amplify.einsum` function, the computation of the objective function can be written as follows.

```
# Construct the objective function
objective = einsum("ij,ki,kj->", distance, q1, q2)
```

Here, the first argument of the {py:func}`~amplify.einsum` function represents the sequence of subscripts of the product in Einstein's contraction notation, where `"ij,ki,kj->"`{l=python}" is the product $d_{i, j} A_{k, i} B_{k, j}$. The second and subsequent arguments then give the arrays corresponding to the respective subscripts.

The complete code is as follows.

```{testcode}
from amplify import VariableGenerator, one_hot, einsum

# Create decision variables
gen = VariableGenerator()
q = gen.array("Binary", NUM_CITIES + 1, NUM_CITIES)

# Set q[NUM_CITIES, i] = q[0, i]
q[-1, :] = q[0, :]

# Create slices of q
q1 = q[:-1]
q2 = q[1:]

# Construct the objective function
objective = einsum("ij,ki,kj->", distance, q1, q2)

# Constract the constraints
row_one_hot_constraints = one_hot(q1, axis=1)
col_one_hot_constraints = one_hot(q1, axis=0)

# Create a model
model = objective + (row_one_hot_constraints + col_one_hot_constraints)
```

With this formulation, we observed the following times (depending on the execution environment).

```{list-table}
:header-rows: 1
:width: 90%
:widths: 1 1 1
* - Objective function
  - Constraints
  - Total
* - **58.11 ms**
  - 0.870 ms
  - **60.4 ms**
```

As a result of the above, the objective function and overall computation time became **25 times** faster, and the constraint construction became **15 times** faster.