Speedup Formulation#
Formulating a large combinatorial optimization problem using Python’s for
statements can be very time-consuming. The Amplify SDK provides a fast way to formulate large optimization problems in practical time.
For example, significant speedups can be achieved using Numpy-like methods in the PolyArray
class.
This page describes several methods for fast formulations, using the binary variable formulation of the traveling salesperson problem as an example.
Formulating the traveling salesperson problem#
The traveling salesperson problem with \(n\) cities can be formulated using \((n+1)n\) binary variables as follows: Let \(d\) be a distance matrix, where \(d_{i,j}\) denotes the distance between city \(i\) and city \(j\).
See also
We will not explain what the variables, objective function, and constraints refer to since the meaning of the formulation is unimportant here. See Traveling Salesperson Problem for more details on formulating the traveling salesperson problem.
Before starting the formulation, let’s create the distance matrix distance
(= \(d\)). This time distance
will be generated as a random symmetric matrix.
import numpy as np
NUM_CITIES = 100
# Generate a random symmetric matrix
distance = np.zeros((NUM_CITIES, NUM_CITIES))
for i in range(NUM_CITIES):
for j in range(i+1, NUM_CITIES):
distance[i, j] = distance[j, i] = np.random.rand()
Note
For practical purposes, it is better to generate the coordinates of each city and find the distance matrix. For example, using scipy the following method can be used for fast computation.
from scipy.spatial import distance
locations = np.random.random((NUM_CITIES, 2)) # randomly generate coordinates
distance = distance.cdist(locations, locations, metric='euclidean')
Naive formulations#
The for
statement and the list comprehension, you can formulate the problem as follows.
from amplify import Poly, VariableGenerator, one_hot
# Create decision variables
gen = VariableGenerator()
q = gen.array("Binary", NUM_CITIES + 1, NUM_CITIES)
# Set q[NUM_CITIES, i] = q[0, i]
for i in range(NUM_CITIES):
q[NUM_CITIES, i] = q[0, i]
# Construct the objective function
objective = 0
for i in range(NUM_CITIES):
for j in range(NUM_CITIES):
for k in range(NUM_CITIES):
objective += distance[i, j] * q[k, i] * q[k + 1, j]
# Construct constraints
row_one_hot_constraints = sum(
one_hot(sum(q[i, k] for k in range(NUM_CITIES))) for i in range(NUM_CITIES)
)
col_one_hot_constraints = sum(
one_hot(sum(q[i, k] for i in range(NUM_CITIES))) for k in range(NUM_CITIES)
)
# Create a model
model = objective + (row_one_hot_constraints + col_one_hot_constraints)
The resulting formulation time is as follows (depending on the execution environment).
Objective function |
Constraints |
Total |
---|---|---|
1.55 s |
13.3 ms |
1.56 s |
Improvement #1: Not using the built-in sum function#
In the first line of the above code, add
from amplify import sum
and use the amplify.sum()
function instead of the built-in sum()
function. The amplify.sum()
function is a class sum-specific function provided by the Amplify SDK. Then the formulation time is improved as follows (depending on the execution environment).
Objective function |
Constraints |
Total |
---|---|---|
1.55 s |
3.95 ms |
1.55 s |
Simply importing the Amplify SDK’s amplify.sum()
function speeds up the constraint formulation by about a factor of 3. The difference from the overall time is small, but in the case of the traveling salesman problem, this is because the number of constraints (\(2n\)) is small compared to the number of terms in the objective function (\(O\left(n^3\right)\)). This optimization can be very effective if the sum()
function is used in areas where the formulation is time-consuming.
Note
The Amplify SDK’s amplify.sum()
function overrides the sum()
function. The Amplify SDK’s sum()
function automatically falls back to the built-in sum()
function for classes not provided by the Amplify SDK, allowing for replacement.
Improvement #2: Not using for
statements and list comprehension#
The objective function used to be formulated using the for
statement, but Python’s for
statement is slow. As much as possible, we will try to formulate them using the methods of PolyArray
, a numpy-like polynomial array of the Amplify SDK, or numpy.
First, let \(A\) be the \(n \times n\) matrix consisting of the upper \(n rows of \)q\( and \)B\( be the \)n\times n\( matrix consisting of the lower \)n$ rows. Then the objective function is written as follows.
Looking at \(A\) and \(d\),
Therefore, the objective function \(\displaystyle \sum_{0 \leq i, j, k < n} d_{i, j} A_{k, i} B_{k, j}\) can be the sum of “the matrix product of (\(A\) and \(d\)) and the element product of \(B\)”. Thus, the objective function can be written with the Amplify SDK as follows.
q1 = q[:-1]
q2 = q[1:]
objective = ((q1 @ distance) * q2).sum()
Constraints can also be passed to the one_hot()
function with the axis
keyword argument and written as follows.
row_one_hot_constraints = one_hot(q1, axis=1)
col_one_hot_constraints = one_hot(q1, axis=0)
Overall the code will looks like this.
from amplify import VariableGenerator, one_hot
# Generate decision variables
gen = VariableGenerator()
q = gen.array("Binary", NUM_CITIES + 1, NUM_CITIES)
# Set q[NUM_CITIES, i] = q[0, i]
q[-1, :] = q[0, :]
# Create slices of q
q1 = q[:-1]
q2 = q[1:]
# Constructe the objective function
objective = ((q1 @ distance) * q2).sum()
# Construct the constraints
row_one_hot_constraints = one_hot(q1, axis=1)
col_one_hot_constraints = one_hot(q1, axis=0)
model = objective + (row_one_hot_constraints + col_one_hot_constraints)
With this modification, the time taken is improved as follows (depending on the execution environment).
Objective function |
Constraints |
Total |
---|---|---|
152.1 ms |
0.870 ms |
154.4 ms |
Improvement #3: taking it straightforwardly#
In the “Improvement #2”, we tried to transform the objective function into a form that would work with numpy’s methods, but it takes a lot of work to think about the transformation. The Amplify SDK supports the einsum()
function, which can mechanically formulate such a function using Einstein’s contraction notation.
First, as in “Improvement #2”, by denoting the \(n \times n\) matrix consisting of the upper \(n\) rows of \(q\) as \(A\) and the \(n\times n\) matrix consisting of the lower n rows as \(B\), the objective function is written as follows.
Note that the range over which the subscripts \(i\), \(j\), and \(k\) of the matrices \(d\), \(A\), and \(B\) appearing in this formula move coincides with the length of the corresponding row or column, respectively. In such a case, using the einsum()
function, the computation of the objective function can be written as follows.
# Construct the objective function
objective = einsum("ij,ki,kj->", distance, q1, q2)
Here, the first argument of the einsum()
function represents the sequence of subscripts of the product in Einstein’s contraction notation, where "ij,ki,kj->"
” is the product \(d_{i, j} A_{k, i} B_{k, j}\). The second and subsequent arguments then give the arrays corresponding to the respective subscripts.
The complete code is as follows.
from amplify import VariableGenerator, one_hot, einsum
# Create decision variables
gen = VariableGenerator()
q = gen.array("Binary", NUM_CITIES + 1, NUM_CITIES)
# Set q[NUM_CITIES, i] = q[0, i]
q[-1, :] = q[0, :]
# Create slices of q
q1 = q[:-1]
q2 = q[1:]
# Construct the objective function
objective = einsum("ij,ki,kj->", distance, q1, q2)
# Constract the constraints
row_one_hot_constraints = one_hot(q1, axis=1)
col_one_hot_constraints = one_hot(q1, axis=0)
# Create a model
model = objective + (row_one_hot_constraints + col_one_hot_constraints)
With this formulation, we observed the following times (depending on the execution environment).
Objective function |
Constraints |
Total |
---|---|---|
58.11 ms |
0.870 ms |
60.4 ms |
As a result of the above, the objective function and overall computation time became 25 times faster, and the constraint construction became 15 times faster.