Goal Programming for Optimization

Connor Owen
cowen33@gatech.edu

Goal programming is a branch of optimization that deals with balancing multiple goals for a given problem. In many applications of optimization, it is common that a decision maker has to consider multiple, often conflicting, objectives. For example, a city planner might want to balance coverage and equity when deciding where to place a new amenity, or a plant might have a production goal they want to meet exactly, but can go over or under based on conflicting constraints. Goal programming provides a framework with which to formulate this type of problems.

The general formulation of goal programming is define a goal, then define deviations from this goal, and finally minimize the deviation in some way. To explore this problem, we'll go over an example problem inspired by a real-life problem I was asked to solve.

Mentor Assignment Problem

The problem goes like this: there are m mentors, and n mentees, such that m < n. The mentoring sessions happen in groups, so the mentors will each have multiple mentees. Each mentee beforehand lists the preference for their top 3 mentors. We want to come up with an assignment of mentees to mentors that balances individual choice with group size, since we want neither extremely large groups nor extremely small groups.

These two objectives (as you'll see in the data) are conflicting, since if everyone were to get their first choice of mentor, the resulting groups would not be equally sized. So, let's start taking a look at the data.

Mentors: Jack, Cindy, Simrun, Christopher, Edward

Mentees: Chiara, Rachel, Irving, Felipe, Tommy, Spencer, Terrell, Alex, Kristen, Elsa, Melody, Eduardo, Riley, Robin, Marcus, Clarke, Frankie, Sarah, David, Darin, Javier

Below, we'll look at the preferences in a pandas table using Python. 1, 2, and 3 represent their top three choices (1 is their first choice), while a null represents no preference.

In [46]:
import random
#-Preference data: we'll read it into pandas using a dict object

mentors = ['Jack', 'Cindy', 'Simrun', 'Christopher', 'Edward']
mentees = ['Chiara', 'Rachel', 'Irving', 'Felipe', 'Tommy', 'Spencer', 
           'Terrell', 'Alex', 'Kristen', 'Elsa', 'Melody', 
           'Eduardo', 'Riley', 'Robin', 'Marcus', 'Clarke', 
           'Frankie', 'Sarah', 'David', 'Darin', 'Javier']

def selectMentor(randNum):
    if 0 <= randNum <= 0.5:
        mentor = 'Jack'
    elif 0.5 < randNum <= 0.6:
        mentor = 'Cindy'
    elif 0.6 < randNum <= 0.8:
        mentor = 'Simrun'
    elif 0.8 < randNum <= 0.9:
        mentor = 'Christopher'
    else:
        mentor = 'Edward'   
    return mentor

def selectMentors():
    (mentor1, mentor2, mentor3) = (selectMentor(random.random()), selectMentor(random.random()), selectMentor(random.random()))
    while mentor2 == mentor1:
        mentor2 = selectMentor(random.random())
    while mentor3 == mentor1 or mentor3 == mentor2:
        mentor3 = selectMentor(random.random())
    return [mentor1, mentor2, mentor3]
        
#-we create the preferences using random choices from the lists above
#preferences= {mentee: random.sample(mentors, 3) for mentee in mentees }
preferences= {mentee: selectMentors() for mentee in mentees }
In [47]:
def countPreferences(preferences):
    """
    This function goes through the mentee preferences, 
    and adds up how many times each mentor got ranked 
    as 1, 2 or 3.
    """
    mentorCounts = {}
    for i in range(1,4):
        for mentee in preferences.keys():
            mentorList = preferences[mentee]
            mentorCounts[i] = mentorCounts.get(i, {mentor:0 for mentor in mentors})
            mentorCounts[i][mentorList[i-1]] = 1 + mentorCounts[i][mentorList[i-1]]
    return mentorCounts

countPreferences(preferences)
Out[47]:
{1: {'Christopher': 2, 'Cindy': 0, 'Edward': 3, 'Jack': 11, 'Simrun': 5},
 2: {'Christopher': 2, 'Cindy': 1, 'Edward': 4, 'Jack': 4, 'Simrun': 10},
 3: {'Christopher': 5, 'Cindy': 7, 'Edward': 5, 'Jack': 2, 'Simrun': 2}}
In [48]:
import pandas as pd

preferencesDataFrame = pd.DataFrame(preferences)

#use to create HTML table of data frame
#print(preferencesDataFrame.to_html()) 
Alex Chiara Clarke Darin David Eduardo Elsa Felipe Frankie Irving Javier Kristen Marcus Melody Rachel Riley Robin Sarah Spencer Terrell Tommy
0 Edward Simrun Simrun Cindy Christopher Simrun Jack Jack Simrun Edward Jack Christopher Cindy Christopher Simrun Jack Jack Edward Jack Christopher Christopher
1 Simrun Cindy Jack Simrun Edward Jack Christopher Christopher Cindy Jack Christopher Jack Jack Jack Christopher Simrun Simrun Jack Cindy Simrun Jack
2 Cindy Jack Christopher Jack Simrun Christopher Cindy Simrun Jack Simrun Simrun Simrun Simrun Simrun Jack Christopher Christopher Simrun Christopher Jack Cindy
In [49]:
preferences
Out[49]:
{'Alex': ['Edward', 'Simrun', 'Christopher'],
 'Chiara': ['Jack', 'Christopher', 'Simrun'],
 'Clarke': ['Simrun', 'Cindy', 'Edward'],
 'Darin': ['Jack', 'Simrun', 'Christopher'],
 'David': ['Jack', 'Simrun', 'Christopher'],
 'Eduardo': ['Jack', 'Edward', 'Cindy'],
 'Elsa': ['Jack', 'Edward', 'Cindy'],
 'Felipe': ['Jack', 'Simrun', 'Christopher'],
 'Frankie': ['Simrun', 'Jack', 'Edward'],
 'Irving': ['Jack', 'Simrun', 'Cindy'],
 'Javier': ['Christopher', 'Simrun', 'Cindy'],
 'Kristen': ['Christopher', 'Jack', 'Simrun'],
 'Marcus': ['Edward', 'Christopher', 'Jack'],
 'Melody': ['Edward', 'Simrun', 'Jack'],
 'Rachel': ['Jack', 'Edward', 'Cindy'],
 'Riley': ['Jack', 'Simrun', 'Edward'],
 'Robin': ['Jack', 'Simrun', 'Cindy'],
 'Sarah': ['Jack', 'Simrun', 'Christopher'],
 'Spencer': ['Simrun', 'Jack', 'Edward'],
 'Terrell': ['Simrun', 'Jack', 'Edward'],
 'Tommy': ['Simrun', 'Edward', 'Cindy']}

One possible formulation we can do to match mentees to mentors is to assign costs 1, 2, and 3 to their first, second, and third choices, respectively, and minimize the sum of the assignment. Without any restriction, everyone will naturally get their first choice (since there are no group size limits).

Let's take a look at what the assignment is like with no group balancing.

Our formulation will be fairly simple: we create arcs between mentees and mentors to make a complete bipartite graph. Each arc will be given a binary variable, where 1 represents an assignment, and 0 represents no assignment.

Sets

$\mathcal{N}$: set of mentees, indexed by n

$\mathcal{M}$: set of mentors, indexed by m

Parameters

$\mathrm{p_{n,m}}$ preference mentee n has for mentor m, 1 represents most preferred

Decision Variables

$a_{n,m}$: arc variable between mentee n and mentor m

Formulation

$\min{\; \sum_{m \in \mathcal{M}}\sum_{n \in \mathcal{N}}a_{n,m}\mathrm{p_{n,m}}} $

$\sum_{m \in \mathcal{M}}{a_{n,m}} = 1 \; \forall n \in \mathcal{N} \;\;\; (1)$

$\sum_{n \in \mathcal{N}}{a_{n,m}} \geq 1 \; \forall m \in \mathcal{M} \;\;\; (2)$

$ a_{n,m} \in \{0,1\} \; \forall n \in \mathcal{N}, \; \forall m \in \mathcal{M}$

The objective function above minimizes the cost of assignment, where the cost represents the preference mentee n has for mentor m. A low preference is better, and we can define a "no preference" to be a large number so as to avoid assignment. Constraint 1 ensures that each mentee is assigned to exactly 1 mentor, and constraint 2 ensures that each mentor gets at least 1 mentee.

In [82]:
import pulp


def groupOptimization(preferences):
    
    prob = pulp.LpProblem("No Restrictions Mentoring", pulp.LpMinimize)
    
    arcs = [(mentee, mentor) for mentee in mentees for mentor in mentors]
    arc_variables = pulp.LpVariable.dicts("Arcs", 
                                   [(mentee, mentor) for mentee in mentees \
                                                    for mentor in mentors], \
                                  0, 1, pulp.LpBinary)
    
    
    def mentorPreferences(mentee, mentor):
        #if mentor in preferences then return position in list
        if mentor in preferences[mentee]:
            return preferences[mentee].index(mentor)
        #if mentor not in preferences return large number
        else:
            return 10
    
    prob += sum(mentorPreferences(mentee, mentor)*arc_variables[(mentee, mentor)] for (mentee, mentor) in arcs )
    
    #Every mentee is assigned to 1 mentor
    for mentee in mentees:
        prob += sum(arc_variables[(mentee, mentor)] for mentor in mentors) == 1

    #Every mentor is assigned at least one individual
    for mentor in mentors: 
        prob += sum(arc_variables[(mentee, mentor)] for mentee in mentees) >= 1

    prob.solve()
    
    sol = {}    
    for mentee in mentees:
        for mentor in mentors: 
            if arc_variables[(mentee, mentor)].value() == 1:
                sol[mentee] = mentor
                    
    return(sol)

Before we get to creating some visualizations of the optimal solution, we can simply print out the assignment, as well as the preference the mentee had for that assignment. In this case, we see that everyone but Clarke received their preferred mentor. This means that the only reason she received Cindy as a mentor was because of the constraint ensuring that each mentor received at least 1 mentee.

In [108]:
sol1 = groupOptimization(preferences)
#prepare dictionary for counting preferences
countDict1 = {0:0, 1:0, 2:0}
groupSizeDict1 = {mentor:0 for mentor in mentors}
for mentee in sol1.keys():
    #print out assignments with preferences
    print(mentee, "\t", sol1[mentee], "\t", preferences[mentee].index(sol1[mentee]))
    countDict1[preferences[mentee].index(sol1[mentee])] += 1
    groupSizeDict1[sol1[mentee]] += 1 
Chiara 	 Jack 	 0
Rachel 	 Jack 	 0
Irving 	 Jack 	 0
Felipe 	 Jack 	 0
Tommy 	 Simrun 	 0
Spencer 	 Simrun 	 0
Terrell 	 Simrun 	 0
Alex 	 Edward 	 0
Kristen 	 Christopher 	 0
Elsa 	 Jack 	 0
Melody 	 Edward 	 0
Eduardo 	 Jack 	 0
Riley 	 Jack 	 0
Robin 	 Jack 	 0
Marcus 	 Edward 	 0
Clarke 	 Cindy 	 1
Frankie 	 Simrun 	 0
Sarah 	 Jack 	 0
David 	 Jack 	 0
Darin 	 Jack 	 0
Javier 	 Christopher 	 0

Assignment Visualization

No we can create some quick charts showing the distribution of assignments and group sizes. Since the data is straightforward to visualize, I choose a simple matplotlib bar chart with a basic color scheme. We wrap up the visualization in a function so that we can use it later.

In [91]:
def barChartCount(dictionary):
    x = []
    height = []
    for i in dictionary.keys():
        x.append(i)
        height.append(dictionary[i])
    return (x,height)
In [130]:
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib import cm

#Function taken from Gareth on CodeGolf
#https://codegolf.stackexchange.com/questions/4707/outputting-ordinal-numbers-1st-2nd-3rd#answer-4712
ordinal = lambda n: "%d%s" % (n,"tsnrhtdd"[(n/10%10!=1)*(n%10<4)*n%10::4])


x, height = barChartCount(countDict1)
plt.bar(x, height, align='center', color= 'maroon')
plt.title("Count of Mentees by Preference Assigned")
plt.ylabel("Mentee count")
#Add 1 to list of preferences to make 0 the 1'st choice
plt.xticks(x, list(map(lambda x: ordinal(x+1)+" choice", x)))
plt.show()

x2, height2 = barChartCount(groupSizeDict1)
plt.bar(list(range(len(x2))), height2, color='navy')
plt.title("Group Size For Each mentor")
plt.ylabel("Mentee count")
plt.xticks(list(range(len(x2))), x2)
plt.show()
##

Goal Programming Assignment

While in the last model, nearly every mentee got their preferred mentor, we also see that the group sizes are not equitably distributed. For instance, Jack might not be able to give much attention to his many mentees, while Cindy has the capacity to take on more mentors. This is where goal programming comes in: we can add this trade-off between preferences and group size into the model to be able to evaluate solutions that balance both.

Below, we go over the additional parameters, variables and constraints that we need to add to the model. The main idea is that we define an "ideal" or equitable group size, and then define decision variables that are defined to be deviations from this ideal group size. Then, we add a weighted sum of these deviations to the objective function and minimize it.

Additional Parameters

$G$: ideal average group size, defined as $\frac{|\mathcal{N}|}{|\mathcal{M}|}$

$\pi$: weight for the deviations to be used in the objective function

Additional Decision Variables

$\delta^+_{m}$: positive deviation from average group size for mentor m

$\delta^-_{m}$: negative deviation from average group size for mentor m

Additional Constraints

$ \delta^+_{m} - \delta^-_{m} = \sum_{n \in \mathcal{M}}{a_{n,m}} - G \;\; \forall m \in \mathcal{M}$

New Objective Function

$\min{\; \sum_{m \in \mathcal{M}}\sum_{n \in \mathcal{N}}a_{n,m}\mathrm{p_{n,m}} + \pi \sum_{m \in \mathcal{M}}({\delta^+_{m} + \delta^-_{m}}) } $

The additional decision variables are used to quantify the deviations that each mentor has from the average "ideal" group size. This ideal is the number of mentees divided by the number of mentors. While not necessarily an integer, it will still serve in the optimization model. The additional constraint defines the positive and negative deviations. The new objective function minimizes a weighted sum of the assigned preferences and the sum of deviations of the ideal group size.

In [132]:
import pulp


def groupGoalOptimization(preferences):
    
    prob = pulp.LpProblem("No Restrictions Mentoring", pulp.LpMinimize)
    
    arcs = [(mentee, mentor) for mentee in mentees for mentor in mentors]
    arc_variables = pulp.LpVariable.dicts("Arcs", 
                                   [(mentee, mentor) for mentee in mentees \
                                                    for mentor in mentors], \
                                  0, 1, pulp.LpBinary)
    
    #define the deviation variables
    devPos = pulp.LpVariable.dicts("Dev_Pos", mentors, 0, len(mentees))
    devNeg = pulp.LpVariable.dicts("Dev_Neg", mentors, 0, len(mentees))
    
    #find the average size of a group if groups were equal size
    avg_group_size = len(mentees)/len(mentors)
    print(avg_group_size)
    
    #function to return preference for a given assignment
    def mentorPreferences(mentee, mentor):
        #if mentor in preferences then return position in list
        if mentor in preferences[mentee]:
            return preferences[mentee].index(mentor)
        #if mentor not in preferences return large number
        else:
            #return 10 to make difficult for a no preference
            return 10
    
    #the weight used in the objective function for the deviations
    devWeight = 5
    
    prob += sum(mentorPreferences(mentee, mentor)*arc_variables[(mentee, mentor)] for (mentee, mentor) in arcs) \
               + devWeight * sum(devPos[mentor] + devNeg[mentor] for mentor in mentors)
    
    #Every mentee is assigned to 1 mentor
    for mentee in mentees:
        prob += sum(arc_variables[(mentee, mentor)] for mentor in mentors) == 1
        
    #Every mentor is assigned at least one individual
    for mentor in mentors: 
        prob += sum(arc_variables[(mentee, mentor)] for mentee in mentees) >= 1
        
        prob += devPos[mentor] - devNeg[mentor] == sum(arc_variables[(mentee, mentor)] for mentee in mentees) \
                                                       - avg_group_size
                                                    

    prob.solve()
    
    #collect the assignment in a dictionary
    #key is mentee, value is assigned mentor
    sol = {}    
    for mentee in mentees:
        for mentor in mentors: 
            if arc_variables[(mentee, mentor)].value() == 1:
                sol[mentee] = mentor
                    
    return(sol)

Now we will solve the problem, and write out the assignment that was stored in the "sol" dictionary. The optimization program will also write out the ideal group size, which in this case is 4.2. Note that while this is not a possible group size, it will serve the intended purpose for goal programming.

In [133]:
sol = groupGoalOptimization(preferences)
countDict = {0:0, 1:0, 2:0}
groupSize = {mentor:0 for mentor in mentors}
for mentee in sol.keys():
    print(mentee, "\t", sol[mentee], "\t", preferences[mentee].index(sol[mentee]))
    countDict[preferences[mentee].index(sol[mentee])] += 1
    groupSize[sol[mentee]] += 1
4.2
Chiara 	 Christopher 	 1
Rachel 	 Cindy 	 2
Irving 	 Cindy 	 2
Felipe 	 Jack 	 0
Tommy 	 Simrun 	 0
Spencer 	 Simrun 	 0
Terrell 	 Simrun 	 0
Alex 	 Edward 	 0
Kristen 	 Christopher 	 0
Elsa 	 Cindy 	 2
Melody 	 Edward 	 0
Eduardo 	 Edward 	 1
Riley 	 Jack 	 0
Robin 	 Jack 	 0
Marcus 	 Edward 	 0
Clarke 	 Cindy 	 1
Frankie 	 Simrun 	 0
Sarah 	 Jack 	 0
David 	 Christopher 	 2
Darin 	 Jack 	 0
Javier 	 Christopher 	 0

Assignment Visualization

Now we can visualize the assignment in the simple bar charts that we defined earlier with a function. You can see the trade-off between preferences and group size in this chart. We note that the size of Jack's group has gone down, while Cindy's has gone up. While the group sizes are more equally distributed, there are now more individuals that did not get their first choice.

It's worthwile noting that as a decision maker, you would have the weight $\pi$ allows to be able to fine-tune that trade-off and create different scenarios, and ultimately select the best option that suits business needs.

In [134]:
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib import cm

x1, height1 = barChartCount(countDict)
plt.bar(x1, height1, align='center', color= 'maroon')
plt.title("Count of Mentees by Preference Assigned")
plt.ylabel("Mentee count")
#Add 1 to list of preferences to make 0 the 1'st choice
plt.xticks(x1, list(map(lambda x: ordinal(x+1) + " choice", x1)))
plt.show()

x2, height2 = barChartCount(groupSize)
plt.bar(list(range(len(x2))), height2, color='navy')
plt.title("Group Size For Each Mentor")
plt.ylabel("Mentee count")
plt.xticks(list(range(len(x2))), x2)
plt.show()