Stack#

Abstract data types#

The data types you have seen so far are all concrete, in the sense that we have completely specified how they are implemented. For example, the Card class represents a card using two integers. As we discussed at the time, that is not the only way to represent a card; there are many alternative implementations.

An abstract data type, or ADT, specifies a set of operations (or methods) and the semantics of the operations (what they do), but it does not not specify the implementation of the operations. That’s what makes it abstract.

Why is that useful?

  1. It simplifies the task of specifying an algorithm if you can denote the operations you need without having to think at the same time about how the operations are performed.

  2. Since there are usually many ways to implement an ADT, it might be useful to write an algorithm that can be used with any of the possible implementations.

  3. Well-known ADTs, such as the Stack ADT in this chapter, are often implemented in standard libraries so they can be written once and used by many programmers.

  4. The operations on ADTs provide a common high-level language for specifying and talking about algorithms.

When we talk about ADTs, we often distinguish the code that uses the ADT, called the client code, from the code that implements the ADT, called the provider code.

The Stack ADT#

In this chapter, we will look at one common ADT, the stack. A stack is a collection, meaning that it is a data structure that contains multiple elements. Other collections we have seen include dictionaries and lists.

An ADT is defined by the operations that can be performed on it, which is called an interface. The interface for a stack consists of these operations:

  • __init__: Initialize a new empty stack.

  • push: Add a new item to the stack.

  • pop: Remove and return an item from the stack. The item that is returned is always the last one that was added.

  • is_empty: Check whether the stack is empty.

A stack is sometimes called a last in, first out or LIFO data structure, because the last item added is the first to be removed.

Implementing stacks with Python lists#

The list operations that Python provides are similar to the operations that define a stack. The interface isn’t exactly what it is supposed to be, but we can write code to translate from the Stack ADT to the built-in operations.

This code is called an implementation of the Stack ADT. In general, an implementation is a set of methods that satisfy the syntactic and semantic requirements of an interface.

Here is an implementation of the Stack ADT that uses a Python list:

class Stack(object):
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        return self.items.pop()

    def is_empty(self):
        return (self.items == [])

A Stack object contains an attribute named items that is a list of items in the stack. The initialization method sets items to the empty list.

To push a new item onto the stack, push appends it onto items. To pop an item off the stack, pop uses the homonymous (same-named) list method to remove and return the last item on the list.

Finally, to check if the stack is empty, is_empty compares items to the empty list.

An implementation like this, in which the methods consist of simple invocations of existing methods, is called a veneer. In real life, veneer is a thin coating of good quality wood used in furniture-making to hide lower quality wood underneath. Computer scientists use this metaphor to describe a small piece of code that hides the details of an implementation and provides a simpler, or more standard, interface.

Pushing and popping#

A stack is a generic data structure, which means that we can add any type of item to it. The following example pushes two integers and a string onto the stack:

s = Stack()
s.push(54)
s.push(45)
s.push("+")

We can use is_empty and pop to remove and print all of the items on the stack:

output = ""
while not s.is_empty():
    output += "%s " % (s.pop())
print(output)
+ 45 54 

The output is + 45 54. In other words, we just used a stack to print the items backward! Granted, it’s not the standard format for printing a list, but by using a stack, it was remarkably easy to do.

You should compare this bit of code to the implementation of print_backward in the Linked lists chapter. There is a natural parallel between the recursive version of print_backward and the stack algorithm here. The difference is that print_backward uses the runtime stack to keep track of the nodes while it traverses the list, and then prints them on the way back from the recursion. The stack algorithm does the same thing, except that is use a Stack object instead of the runtime stack.

Using a stack to evaluate postfix#

In most programming languages, mathematical expressions are written with the operator between the two operands, as in 1 + 2. This format is called infix. An alternative used by some calculators is called postfix. In postfix, the operator follows the operands, as in 1 2 +.

The reason postfix is sometimes useful is that there is a natural way to evaluate a postfix expression using a stack:

  1. Starting at the beginning of the expression, get one term (operator or operand) at a time.

    • If the term is an operand, push it on the stack.

    • If the term is an operator, pop two operands off the stack, perform the operation on them, and push the result back on the stack.

  2. When you get to the end of the expression, there should be exactly one operand left on the stack. That operand is the result.

Parsing#

To implement the previous algorithm, we need to be able to traverse a string and break it into operands and operators. This process is an example of parsing, and the results—the individual chunks of the string – are called tokens. You might remember these words from the The way of the program chapter.

Python provides a split method in both the string and re (regular expression) modules. The function string.split splits a string into a list using a single character as a delimiter. For example:

import string
string.split("Now is the time"," ")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [4], in <cell line: 2>()
      1 import string
----> 2 string.split("Now is the time"," ")

AttributeError: module 'string' has no attribute 'split'

In this case, the delimiter is the space character, so the string is split at each space.

The function re.split is more powerful, allowing us to provide a regular expression instead of a delimiter. A regular expression is a way of specifying a set of strings. For example, [A-z] is the set of all letters and [0-9] is the set of all numbers. The ^ operator negates a set, so [^0-9] is the set of everything that is not a number, which is exactly the set we want to use to split up postfix expressions:

import re
re.split("([^0-9])", "123+456*/")

Notice that the order of the arguments is different from string.split; the delimiter comes before the string.

The resulting list includes the operands 123 and 456 and the operators * and /. It also includes two empty strings that are inserted after the operands.

Evaluating postfix#

To evaluate a postfix expression, we will use the parser from the previous section and the algorithm from the section before that. To keep things simple, we’ll start with an evaluator that only implements the operators + and *:

def eval_postfix(expr):
    import re
    token_list = re.split("([^0-9])", expr)
    stack = Stack()
    for token in token_list:
        if  token == '' or token == ' ':
            continue
        if  token == '+':
            sum = stack.pop() + stack.pop()
            stack.push(sum)
        elif token == '*':
            product = stack.pop() * stack.pop()
            stack.push(product)
        else:
            stack.push(int(token))
    return stack.pop()

The first condition takes care of spaces and empty strings. The next two conditions handle operators. We assume, for now, that anything else must be an operand. Of course, it would be better to check for erroneous input and report an error message, but we’ll get to that later.

Let’s test it by evaluating the postfix form of (56+47)*2:

print(eval_postfix("56 47 + 2 \*"))

That’s close enough.

Clients and providers#

One of the fundamental goals of an ADT is to separate the interests of the provider, who writes the code that implements the ADT, and the client, who uses the ADT. The provider only has to worry about whether the implementation is correct – in accord with the specification of the ADT – and not how it will be used.

Conversely, the client assumes that the implementation of the ADT is correct and doesn’t worry about the details. When you are using one of Python’s built-in types, you have the luxury of thinking exclusively as a client.

Of course, when you implement an ADT, you also have to write client code to test it. In that case, you play both roles, which can be confusing. You should make some effort to keep track of which role you are playing at any moment.



Glossary#

abstract data type (ADT)#

A data type (usually a collection of objects) that is defined by a set of operations but that can be implemented in a variety of ways.

interface#

The set of operations that define an ADT.

implementation#

Code that satisfies the syntactic and semantic requirements of an interface.

client#

A program (or the person who wrote it) that uses an ADT.

provider#

The code (or the person who wrote it) that implements an ADT.

veneer#

A class definition that implements an ADT with method definitions that are invocations of other methods, sometimes with simple transformations. The veneer does no significant work, but it improves or standardizes the interface seen by the client.

generic data structure#

A kind of data structure that can contain data of any type.

infix#

A way of writing mathematical expressions with the operators between the operands.

postfix#

A way of writing mathematical expressions with the operators after the operands.

parse#

To read a string of characters or tokens and analyze its grammatical structure.

token#

A set of characters that are treated as a unit for purposes of parsing, such as the words in a natural language.

delimiter#

A character that is used to separate tokens, such as punctuation in a natural language.

Exercises#

  1. Apply the postfix algorithm to the expression 1 2 + 3 *. This example demonstrates one of the advantages of postfix—there is no need to use parentheses to control the order of operations. To get the same result in infix, we would have to write (1 + 2) * 3.

  2. Write a postfix expression that is equivalent to 1 + 2 * 3.

Improved Linked Queue#

We would like an implementation of the Queue ADT that can perform all operations in constant time. One way to do that is to modify the Queue class so that it maintains a reference to both the first and the last node, as shown in the figure:

The ImprovedQueue implementation looks like this:

class ImprovedQueue(object):
    def __init__(self):
        self.length = 0
        self.head   = None
        self.last   = None

    def is_empty(self):
        return (self.length == 0)

So far, the only change is the attribute last. It is used in insert and remove methods:

class ImprovedQueue(object):
    ...
    def insert(self, cargo):
        node = Node(cargo)
        node.next = None
        if self.length == 0:
            # if list is empty, the new node is head and last
            self.head = self.last = node
        else:
            # find the last node
            last = self.last
            # append the new node
            last.next = node
            self.last = node
        self.length = self.length + 1

Since last keeps track of the last node, we don’t have to search for it. As a result, this method is constant time.

There is a price to pay for that speed. We have to add a special case to remove to set last to None when the last node is removed:

class ImprovedQueue(object):
    ...
    def remove(self):
        cargo = self.head.cargo
        self.head = self.head.next
        self.length = self.length - 1
        if self.length == 0:
            self.last = None
        return cargo

This implementation is more complicated than the Linked Queue implementation, and it is more difficult to demonstrate that it is correct. The advantage is that we have achieved the goal – both insert and remove are constant-time operations.

Priority queue#

The Priority Queue ADT has the same interface as the Queue ADT, but different semantics. Again, the interface is:

  • __init__ Initialize a new empty queue.

  • insert Add a new item to the queue.

  • remove Remove and return an item from the queue. The item that is returned is the one with the highest priority.

  • is_empty Check whether the queue is empty.

The semantic difference is that the item that is removed from the queue is not necessarily the first one that was added. Rather, it is the item in the queue that has the highest priority. What the priorities are and how they compare to each other are not specified by the Priority Queue implementation. It depends on which items are in the queue.

For example, if the items in the queue have names, we might choose them in alphabetical order. If they are bowling scores, we might go from highest to lowest, but if they are golf scores, we would go from lowest to highest. As long as we can compare the items in the queue, we can find and remove the one with the highest priority.

This implementation of Priority Queue has as an attribute a Python list that contains the items in the queue.

class PriorityQueue(object):
    def __init__(self):
        self.items = []

    def is_empty(self):
        return self.items == []

    def insert(self, item):
        self.items.append(item)

The initialization method, is_empty, and insert are all veneers on list operations. The only interesting method is remove:

class PriorityQueue(object):
    ...
    def remove(self):
        maxi = 0
        for i in range(1, len(self.items)):
            if self.items[i] > self.items[maxi]: maxi = i
        item = self.items[maxi]
        self.items[maxi:maxi+1] = []
        return item

At the beginning of each iteration, maxi holds the index of the biggest item (highest priority) we have seen so far. Each time through the loop, the program compares the i-eth item to the champion. If the new item is bigger, the value of maxi if set to i.

When the for statement completes, maxi is the index of the biggest item. This item is removed from the list and returned.

Let’s test the implementation:

q = PriorityQueue()
q.insert(11)
q.insert(12)
q.insert(14)
q.insert(13)
while not q.is_empty(): print(q.remove())

If the queue contains simple numbers or strings, they are removed in numerical or alphabetical order, from highest to lowest. Python can find the biggest integer or string because it can compare them using the built-in comparison operators.

If the queue contains an object type, it has to provide a __cmp__ method. When remove uses the > operator to compare items, it invokes the __cmp__ for one of the items and passes the other as a parameter. As long as the __cmp__ method works correctly, the Priority Queue will work.

The Golfer class#

As an example of an object with an unusual definition of priority, let’s implement a class called Golfer that keeps track of the names and scores of golfers. As usual, we start by defining __init__ and __str__:

class Golfer(object):
    def __init__(self, name, score):
        self.name = name
        self.score= score

    def __str__(self):
        return "%-16s: %d" % (self.name, self.score)

__str__ uses the format operator to put the names and scores in neat columns.

Next we define a version of __cmp__ where the lowest score gets highest priority. As always, __cmp__ returns 1 if self is greater than other, -1 if self is less than other, and 0 if they are equal.

class Golfer(object):
    ...
    def __cmp__(self, other):
        if self.score < other.score: return  1   # less is more
        if self.score > other.score: return -1
        return 0

Now we are ready to test the priority queue with the Golfer class:

tiger = Golfer("Tiger Woods",    61)
phil  = Golfer("Phil Mickelson", 72)
hal   = Golfer("Hal Sutton",     69)

pq = PriorityQueue()
pq.insert(tiger)
pq.insert(phil)
pq.insert(hal)
while not pq.is_empty(): print(pq.remove())


Glossary#

queue#

An ordered set of objects waiting for a service of some kind.

Queue#

An ADT that performs the operations one might perform on a queue.

queueing policy#

The rules that determine which member of a queue is removed next.

FIFO#

First In, First Out, a queueing policy in which the first member to arrive is the first to be removed.

priority queue#

A queueing policy in which each member has a priority determined by external factors. The member with the highest priority is the first to be removed.

Priority Queue#

An ADT that defines the operations one might perform on a priority queue.

linked queue#

An implementation of a queue using a linked list.

constant time#

An operation whose runtime does not depend on the size of the data structure.

linear time#

An operation whose runtime is a linear function of the size of the data structure.

Exercises#

  1. Write an implementation of the Queue ADT using a Python list. Compare the performance of this implementation to the ImprovedQueue for a range of queue lengths. #. Write an implementation of the Priority Queue ADT using a linked list. You should keep the list sorted so that removal is a constant time operation. Compare the performance of this implementation with the Python list implementation.