Discretizing Nonlinear Functions for Optimization

Connor Owen
cowen33@gatech.edu

One important assumption for any linear program (LP), or mixed-integer linear program (MIP), is that all functions in the model are linear. While there are techniques for solving nonlinear programs, they are generally more computationally expensive and often intractable (i.e., not practically solvable). Instead of resorting to these nonlinear techniques, we can approximate nonlinear functions with linear ones so that we can use typical linear optimization engines.

In this notebook, we'll cover a simple formulation for approximating a nonlinear function of one variable in a linear program. Afterward, we'll cover an improvement to the formulation that requires only a few simple modifications.

We'll be using the libraries NumPy and Matplotlib for math and graphing, and PuLP for formulating and solving the optimization problems.

Discretizing a Nonlinear Function in One Dimension

To illustrate the formulation, we will use a toy problem of optimizing the function $ x \sin{(\frac{8 \pi x} {10})} $ over the domain $[0, 10]$. Below we graph the function using Matplotlib.

In [7]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

def sinFunction():
    Fs = 10      #-scaling constant in the argument of sine
    f = 4        #-frequency constant: f(x) = x*sin(2*pi*f*x/Fs) = x*sin(8*pi*x/10)
    sample = 10  #-end of domain
    x = np.arange(0, sample, 0.01)  #-step size of 0.01 to make a high-quality graph
    y = x * np.sin(2 * np.pi * f * x / Fs)
    plt.plot(x, y)
    plt.xlabel('x')
    plt.ylabel('f(x)')
    plt.show()

sinFunction()

The basic idea behind discretizing this function is to create breakpoints that split the domain into several segments, take the function value at each breakpoint, and then connect these points with straight lines. The resulting piecewise-linear function is a linear approximation of our nonlinear sine function.

Below, we define a function that builds the list of breakpoint x-values given a number of segments and the length of the domain. We also define a function that returns the value of our function above at a given x-value.

In [8]:
def createBreakPoints(numSegments, xRange):
    """
    numSegments is the number of segments we will have after splitting up the domain
    xRange is the length of the domain we are considering
    """
    breakPointValues = [i * xRange / numSegments for i in range(numSegments + 1)]
    return breakPointValues


def optFunction(x):
    Fs = 10
    f = 4
    #-f(x) = x*sin(2*pi*f*x/Fs) = x*sin(8*pi*x/10)
    functionValue = x * np.sin(2 * np.pi * f * x / Fs)
    return functionValue
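
For example, four segments over a domain of length 10 give five evenly spaced breakpoints:

In [ ]:
createBreakPoints(4, 10)  #-returns [0.0, 2.5, 5.0, 7.5, 10.0]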

Let's take a look at what our approximate function looks like with varying numbers of segments.

12 Segments

In [14]:
numSegments = 12
xRange = 10
breakPointValues = createBreakPoints(numSegments, xRange)
for i in range(numSegments):
    plt.plot(breakPointValues[i:i+2], [optFunction(breakPointValues[i]),optFunction(breakPointValues[i+1])], 'ro-')

sinFunction()

25 Segments

In [15]:
numSegments = 25
xRange = 10
breakPointValues = createBreakPoints(numSegments, xRange)
for i in range(numSegments):
    plt.plot(breakPointValues[i:i+2], [optFunction(breakPointValues[i]),optFunction(breakPointValues[i+1])], 'ro-')

sinFunction()

100 Segments

In [16]:
numSegments = 100
xRange = 10
breakPointValues = createBreakPoints(numSegments, xRange)
for i in range(numSegments):
    plt.plot(breakPointValues[i:i+2], [optFunction(breakPointValues[i]),optFunction(breakPointValues[i+1])], 'ro-')

sinFunction()

As the illustrations above show, the more segments we define, the more accurately we approximate the sine function. In general, more segments also bring the optimum of the approximation closer to the optimum of the nonlinear function.
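
To put a number on that, here is a quick sketch that measures the worst-case gap between the approximation and the true function, using the helper functions above and NumPy's np.interp as a stand-in for the piecewise-linear interpolant:

In [ ]:
#-maximum absolute gap between the true function and its piecewise-linear
#-interpolant for a few segment counts
for n in (12, 25, 100):
    bps = createBreakPoints(n, 10)
    xs = np.arange(0, 10, 0.01)
    approx = np.interp(xs, bps, [optFunction(b) for b in bps])
    print(n, "segments: max abs error =", np.abs(approx - optFunction(xs)).max())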

Now we will cover the optimization problem that we wish to solve. First, the nonlinear version:

$$\max_x \left\{ x \sin{(\frac{8 \pi x} {10})}: 0 \leq x \leq 10 \right\} $$



Now, we will begin the discretization of our function. Source [1] presents this implementation as Model II. We divide the $x$ domain with $k+1$ breakpoints $a_0, \ldots, a_k$ (defining $k$ segments), since any $x$ value can be written as a convex combination of two consecutive breakpoints.

$$ x = \sum^{k}_{i \ = \ 0} a_i \xi_i \tag{1}$$

$$ \sum^{k}_{i \ = \ 0} \xi_i = 1 \tag{2}$$

$$ \xi_i \geq 0 \; \mathrm{for} \; 0 \leq i \leq k \tag{3}$$

The constraints above let us write any point in the domain as a convex combination of the breakpoints. If we further require that at most two of the $\xi_i$ are positive and that they belong to consecutive breakpoints, then we can define the approximate function value as the same convex combination of the function values at the breakpoints:

$$\widehat{f(x)} = \sum^{k}_{i \ = \ 0} f(a_i) \ \xi_i \tag{4}$$
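
As a concrete check, take the 12-segment breakpoints from above (segment width $5/6$). The point $x = 5.5$ lies between $a_6 = 5$ and $a_7 = 35/6 \approx 5.833$, so $\xi_6 = 0.4$ and $\xi_7 = 0.6$, giving

$$ \widehat{f(5.5)} = 0.4 \, f(5) + 0.6 \, f(\tfrac{35}{6}) \approx 0.4 \cdot 0 + 0.6 \cdot 5.052 \approx 3.031, $$

whereas the true value is $f(5.5) \approx 5.231$; the gap is the approximation error at that point.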

To ensure that the positive weights sit on consecutive breakpoints, we create binary decision variables $z_i$, one for each segment between consecutive breakpoints: $z_i = 1$ if segment $i$ is selected, and $z_i = 0$ otherwise.

$$ 0 \leq \xi_0 \leq z_0 \tag{5}$$

$$ 0 \leq \xi_i \leq z_{i-1} + z_i \;\;\;\; \forall \ 1 \leq i \leq k - 1 \tag{6}$$

$$ 0 \leq \xi_k \leq z_{k-1} \tag{7}$$

$$ \sum^{k-1}_{i \ = \ 0} z_i = 1 \tag{8}$$

$$ z_i \in \{0, 1\} \;\;\;\; \forall \ 0 \leq i \leq k - 1 \tag{9}$$

Constraint (5) allows the first breakpoint weight to be positive only if the first segment is selected; constraint (7) does the same for the last breakpoint and the last segment. The constraints in (6) allow an interior breakpoint weight to be positive only if one of the two segments meeting at that breakpoint is selected.

Line (8) ensures that exactly one segment is selected, and (9) makes the $z_i$ binary. Together with (2), these constraints force all of the weight onto the two endpoints of the selected segment.
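
To see the adjacency logic on a small instance, take $k = 3$ (three segments, four breakpoints). Constraints (5)-(7) read

$$ \xi_0 \leq z_0, \quad \xi_1 \leq z_0 + z_1, \quad \xi_2 \leq z_1 + z_2, \quad \xi_3 \leq z_2 $$

so if, say, $z_1 = 1$ (and hence $z_0 = z_2 = 0$ by (8)), only $\xi_1$ and $\xi_2$ can be positive, and by (1) and (2) the point $x$ must lie on the middle segment.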

Modeling a Nonlinear Function Approximation with PuLP

Now that we have our theoretical model formulation, we can begin the Python formulation with PuLP. We instantiate a problem object and declare our primary decision variables.

In [4]:
from pulp import * 

##create the problem variable
prob = LpProblem('Linearization_Model', LpMaximize)  #-no spaces in the name, to avoid a PuLP warning

##primary decision variable##
#---------------------------#
#-domain variable
x = LpVariable("x", 0, None, LpContinuous)

#-variable for approximating nonlinear function
approxFunctionValue = LpVariable("w", 0, None, LpContinuous)

Next we can begin defining the segment variables and the breakpoint variables. When we write our final optimization function, we will create these variables as a function of the number of segments that we want. We also define a list of the breakpoint domain values (as we did above).

In [ ]:
##-create breakpoint variables
breakpointVars = LpVariable.dicts("partition var", 
                                list(range(numSegments + 1)),
                                lowBound = 0,
                                cat = 'Continuous')

##-create segment variables (binary indicators, one per segment)
segmentVars = LpVariable.dicts("segment var",
                                list(range(numSegments)),
                                cat = 'Binary')

breakpointValues = createBreakPoints(numSegments, 10)

Now, we define the objective function as the approximate function value, and then add the constraints as they were defined earlier in our mathematical formulation.

In [ ]:
##-add the objective function to the prob variable
prob += approxFunctionValue

##CONSTRAINTS##
#---------------#
#-define the approx function value as the sum of the inner product of 
#-the breakpoint variables and the function value of where the breakpoints are
prob += approxFunctionValue == sum(breakpointVars[i] * optFunction(breakpointValues[i]) for i in range(numSegments + 1))

#-define x as the sum of the inner product of the breakpoint variables and the corresponding values
prob += x == sum(breakpointVars[i]*breakpointValues[i] for i in range(numSegments + 1))

##Adjacency constraints (5)-(7): a breakpoint weight can be positive
##only if one of the segments touching that breakpoint is selected
prob += breakpointVars[0] <= segmentVars[0]

for i in range(1, numSegments):
    prob += breakpointVars[i] <= segmentVars[i-1] + segmentVars[i]

prob += breakpointVars[numSegments] <= segmentVars[numSegments - 1]

##convex combination constraint: the weights over all breakpoints sum to 1
prob += sum(breakpointVars[i] for i in range(numSegments + 1)) == 1

##only 1 segment selected
prob += sum(segmentVars[i] for i in range(numSegments)) == 1

We can also throw in another linear constraint, just to show how we can use the discretization alongside other constraints.

In [ ]:
#-side constraint: w <= -7x/10 + 7
prob += approxFunctionValue <= -7/10*x + 7

Note that this constraint is defined on the approximate function value, so it's entirely possible that the optimal solution violates the nonlinear counterpart of this constraint.

Below, we pull together the entire optimization program, and write it as a function.

In [28]:
def functionLinearApproximation(numSegments, loggingFlag): 
    #-create the problem variable
    prob = LpProblem('Linearization_Model', LpMaximize)

    ##primary decision variable##
    #---------------------------#
    #-domain variable
    x = LpVariable("x", 0, None, LpContinuous)

    #-variable for approximating nonlinear function
    approxFunctionValue = LpVariable("w", 0, None, LpContinuous)
    
    #-create breakpoint variables
    breakpointVars = LpVariable.dicts("partition var", 
                                    list(range(numSegments + 1)),
                                    lowBound = 0,
                                    cat = 'Continuous')

    #-create segment variables (binary indicators, one per segment)
    segmentVars = LpVariable.dicts("segment var",
                                    list(range(numSegments)),
                                    cat = 'Binary')
    
    #-create a list of the breakpoint domain values
    breakpointValues = createBreakPoints(numSegments, 10)
    
    ##-add the objective function to the prob variable
    prob += approxFunctionValue

    ##CONSTRAINTS##
    #-------------#
    #-define the approx function value as the sum of the inner product of 
    #-the breakpoint variables and the function value of where the breakpoints are
    prob += approxFunctionValue == sum(breakpointVars[i] * optFunction(breakpointValues[i]) for i in range(numSegments + 1))

    #-define x as the sum of the inner product of the breakpoint variables and the corresponding values
    prob += x == sum(breakpointVars[i]*breakpointValues[i] for i in range(numSegments + 1))

    #-Adjacency constraints (5)-(7)
    prob += breakpointVars[0] <= segmentVars[0]

    for i in range(1, numSegments):
        prob += breakpointVars[i] <= segmentVars[i-1] + segmentVars[i]

    prob += breakpointVars[numSegments] <= segmentVars[numSegments - 1]

    #-convex combination constraint over all breakpoints
    prob += sum(breakpointVars[i] for i in range(numSegments + 1)) == 1

    #-only 1 segment selected
    prob += sum(segmentVars[i] for i in range(numSegments)) == 1

    #-side constraint: w <= -7x/10 + 7
    prob += approxFunctionValue <= -7/10*x + 7
    
    #-solve the problem with the default solver
    prob.solve()
    
    if loggingFlag == 1:
        #-if our logging flag is on, we write out all variable optimal values
        print("Status:", LpStatus[prob.status])
        for v in prob.variables():
            print(v.name, " = ", v.varValue)
            
    return (x.varValue, approxFunctionValue.varValue, breakpointValues)

Just below, we'll look at the output of the optimization function for 12 segments. Note that the two breakpoints with positive weights straddle the selected segment.

In [21]:
functionLinearApproximation(12, 1)
Status: Optimal
partition_var_0  =  0.0
partition_var_1  =  0.0
partition_var_10  =  0.0
partition_var_11  =  0.0
partition_var_12  =  0.0
partition_var_2  =  0.0
partition_var_3  =  0.0
partition_var_4  =  0.0
partition_var_5  =  0.0
partition_var_6  =  0.37889832
partition_var_7  =  0.62110168
partition_var_8  =  0.0
partition_var_9  =  0.0
segment_var_0  =  0.0
segment_var_1  =  0.0
segment_var_10  =  0.0
segment_var_11  =  0.0
segment_var_2  =  0.0
segment_var_3  =  0.0
segment_var_4  =  0.0
segment_var_5  =  0.0
segment_var_6  =  1.0
segment_var_7  =  0.0
segment_var_8  =  0.0
segment_var_9  =  0.0
w  =  3.1376907
x  =  5.5175847
Out[21]:
(5.5175847,
 3.1376907,
 [0.0,
  0.8333333333333334,
  1.6666666666666667,
  2.5,
  3.3333333333333335,
  4.166666666666667,
  5.0,
  5.833333333333333,
  6.666666666666667,
  7.5,
  8.333333333333334,
  9.166666666666666,
  10.0])

Results and Visualizations

Below, we define a quick function to help us graph the results of the optimization for a given number of segments.

In [39]:
def graphOptimization(numSegments): 
    x, y, breakpointValues = functionLinearApproximation(numSegments, 0)
    plt.plot([0, 10], [7, 0])  #-the linear side constraint w <= -7x/10 + 7
    for i in range(numSegments):
        plt.plot(breakpointValues[i:i+2], [optFunction(breakpointValues[i]), optFunction(breakpointValues[i+1])], 'ro-')
    plt.plot([x], [y], 'go-')  #-the optimal point of the approximation
    sinFunction()
    

Let's take a look at the results from the optimization with 12 segments.

12 Segments

In [36]:
graphOptimization(12)

It's important to note that, given the small number of segments, we see a large amount of error in the approximation. In this case, the actual value of the sine function at the optimal point would violate the constraint that the point lie at or below the blue line. To remedy this, we can raise the number of segments and get closer to the actual function value.
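
We can quantify this directly with a quick sanity check that compares the true function value at the optimal $x$ against the linear side constraint (this uses the functions defined above):

In [ ]:
#-compare the true function value at the optimal x against the linear
#-side constraint w <= -7x/10 + 7 that the approximation was held to
xOpt, wOpt, breakpoints = functionLinearApproximation(12, 0)
print("approximate value w:", wOpt)
print("true value f(x):    ", optFunction(xOpt))
print("constraint bound:   ", -7/10*xOpt + 7)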

25 Segments

In [37]:
graphOptimization(25)

100 Segments

In [38]:
graphOptimization(100)

A Better Formulation

While the model described above is certainly functional, there is another formulation with more desirable properties for optimization. It is described as Model III in source [1], and we will work on the implementation here.

First, we keep constraints (1)-(5), (7), and (8) from the formulation above, along with the binary requirement (9). In place of (6), we add constraints that we will refer to as cumulative adjacency constraints.

$$ \sum^{k-1}_{j \ = \ i+1} z_j \; \leq \; \sum^{k}_{j \ = \ i+1} \xi_j \; \leq \; \sum^{k-1}_{j \ = \ i} z_j \;\;\;\; \forall \ 1 \leq i \leq k-2 \tag{10}$$

These constraints work similarly to the adjacency constraints from the previous model, except that they act cumulatively. Let's take an example in which $i = 5$ and $k = 10$:

$$ z_6 + z_7 + z_8 + z_9 \leq \xi_6 + \xi_7 + \xi_8 + \xi_9 + \xi_{10} \leq z_5 + z_6 + z_7 + z_8 + z_9 $$

These groupwise constraints ensure that any positive breakpoint weights straddle the selected segment.

What makes this formulation better than the previous one is that it is locally ideal: every extreme point of the LP relaxation of the MIP has $z_i \in \{0, 1\}$. For a small problem like ours, this improvement would probably not make a noticeable difference; however, it can matter for larger problems where speed is important, since it can dramatically reduce the size of the branch-and-bound tree.

Regardless of the speed improvement, we implement the alternative formulation below. Note that nearly everything stays the same, except for a few lines of adjacency constraints.

In [44]:
def cumulativeAdjacencyLinearApproximation(numSegments, loggingFlag): 
    ##create the problem variable
    prob = LpProblem('Linearization_Model', LpMaximize)

    ##primary decision variable##
    #---------------------------#
    #-domain variable
    x = LpVariable("x", 0, None, LpContinuous)

    #-variable for approximating nonlinear function
    approxFunctionValue = LpVariable("w", 0, None, LpContinuous)
    
    #-create breakpoint variables
    breakpointVars = LpVariable.dicts("partition var", 
                                    list(range(numSegments + 1)),
                                    lowBound = 0,
                                    cat = 'Continuous')

    #-create segment variables (binary indicators, one per segment)
    segmentVars = LpVariable.dicts("segment var",
                                    list(range(numSegments)),
                                    cat = 'Binary')
    
    #-create a list of the breakpoint domain values
    breakpointValues = createBreakPoints(numSegments, 10)
    
    #-add the objective function to the prob variable
    prob += approxFunctionValue

    ##CONSTRAINTS##
    #---------------#
    #-define the approx function value as the sum of the inner product of 
    #-the breakpoint variables and the function value of where the breakpoints are
    prob += approxFunctionValue == sum(breakpointVars[i] * optFunction(breakpointValues[i]) for i in range(numSegments + 1))

    #-define x as the sum of the inner product of the breakpoint variables and the corresponding values
    prob += x == sum(breakpointVars[i]*breakpointValues[i] for i in range(numSegments + 1))

    #-Adjacency constraints (5) and (7) for the first and last breakpoints
    prob += breakpointVars[0] <= segmentVars[0]
    prob += breakpointVars[numSegments] <= segmentVars[numSegments - 1]

    #-Cumulative adjacency constraints (10)
    for i in range(1, numSegments - 1):
        prob += sum(breakpointVars[j] for j in range(i + 1, numSegments + 1)) \
                <= sum(segmentVars[j] for j in range(i, numSegments))
        prob += sum(breakpointVars[j] for j in range(i + 1, numSegments + 1)) \
                >= sum(segmentVars[j] for j in range(i + 1, numSegments))

    #-convex combination constraint over all breakpoints
    prob += sum(breakpointVars[i] for i in range(numSegments + 1)) == 1

    #-only 1 segment selected
    prob += sum(segmentVars[i] for i in range(numSegments)) == 1

    prob += approxFunctionValue <= -7/10*x + 7
    
    #-solve the problem with the default solver
    prob.solve()
    
    if loggingFlag == 1:
        #-if logging flag is on then write optimal values for variables
        print("Status:", LpStatus[prob.status])
        for v in prob.variables():
            print(v.name, " = ", v.varValue)
            
    return (x.varValue, approxFunctionValue.varValue, breakpointValues)

In [45]:
cumulativeAdjacencyLinearApproximation(12, 1)
Status: Optimal
partition_var_0  =  0.0
partition_var_1  =  0.0
partition_var_10  =  0.0
partition_var_11  =  0.0
partition_var_12  =  0.0
partition_var_2  =  0.0
partition_var_3  =  0.0
partition_var_4  =  0.0
partition_var_5  =  0.0
partition_var_6  =  0.37889832
partition_var_7  =  0.62110168
partition_var_8  =  0.0
partition_var_9  =  0.0
segment_var_0  =  0.0
segment_var_1  =  0.0
segment_var_10  =  0.0
segment_var_11  =  0.0
segment_var_2  =  0.0
segment_var_3  =  0.0
segment_var_4  =  0.0
segment_var_5  =  0.0
segment_var_6  =  1.0
segment_var_7  =  0.0
segment_var_8  =  0.0
segment_var_9  =  0.0
w  =  3.1376907
x  =  5.5175847
Out[45]:
(5.5175847,
 3.1376907,
 [0.0,
  0.8333333333333334,
  1.6666666666666667,
  2.5,
  3.3333333333333335,
  4.166666666666667,
  5.0,
  5.833333333333333,
  6.666666666666667,
  7.5,
  8.333333333333334,
  9.166666666666666,
  10.0])
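
As a quick consistency check, both formulations model the same feasible set, so their optimal objective values should match (a sketch, assuming both functions above are in scope):

In [ ]:
#-the two formulations should return the same optimum for any segment count
for n in (12, 25, 100):
    x1, w1, _ = functionLinearApproximation(n, 0)
    x2, w2, _ = cumulativeAdjacencyLinearApproximation(n, 0)
    print(n, "segments:", round(x1, 4), round(w1, 4), "vs", round(x2, 4), round(w2, 4))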

For the next part in this series on nonlinear approximations, we'll cover how to implement a two-dimensional grid, and then how to create a binary-indexed grid, which allows a fine grid resolution at a low variable cost.

Sources