Searching 1#

Searching for Two Largest Values in a List#

Based on Chapter 11 of the book Practical Programming by P. Gries, J. Campbell, and J. Montojo.

Suppose we have the following list of data, for instance representing the number of seals treated in the Sealcentre in Pieterburen (NL).

334 468 549 836 660 389 308 392 520 271

The first number, 334, represents the number of seals treated in 2009 and the last number, 271, the number of seals treated in 2018.

We start with a simpler problem, what is the highest number, and we can use the list.index to find the corresponding index.

counts = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
high = max(counts)
counts.index(high)
3

Or, more concise:

counts = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
counts.index(max(counts))
3

Now, suppose we want to find the indexes of the 2 highest amounts. There is no direct method to do so.

We will present three alternative algorithms:

  1. Find, remove, find: first find the index for highest amount, remove highest amount and then look for the index of the next highest amount. Insert the highest amount back, and adapt the second index if needed.

  2. Sort: copy the list, and sort, and use the two highest amounts to find the corresponding indices in the original list.

  3. Walk through the list and keep track of the two highest amounts found so far, update if a higher amount is found.

The ultimate question is of course which one is the fastest solution?

Find, Remove, Find#

from typing import List, Tuple

def find_two_highest_remove(amounts : List[int]) -> Tuple[int, int]:
    """Return a tuple of the indices of the two highest values in list amounts.
    >>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    >>> find_two_highest_remove(seals)
    (3, 4)
    >>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    True
    """
    # Find the index of the maximum in seals
    # Remove that item form the list
    # Find the index of the new maximum in the list
    # Put the highest item back in the list
    # If necessary, adjust the second index
    # Return the two indices

As we have seen max and list.index do the for finding the index of the highest amount.

from typing import List, Tuple

def find_two_highest_remove(amounts : List[int]) -> Tuple[int, int]:
    """Return a tuple of the indices of the two highest values in list amounts.
    >>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    >>> find_two_highest_remove(seals)
    (3, 4)
    >>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    True
    """
    # Find the index of the maximum in seals
    highest : int = max(amounts)
    high_index1 : int = amounts.index(highest)
    # Remove that item form the list
    amounts.remove(highest)
#    print(high_index1)
    
    # Find the index of the new maximum in the list
    next_highest : int = max(amounts)
    high_index2 : int = amounts.index(next_highest)
#    print(high_index2)
    
    # Put the highest item back in the list
    # If necessary, adjust the second index
    # Return the two indices
    
find_two_highest_remove([334, 468, 549, 836, 660, 389, 308, 392, 520, 271])

Since we removed the highest amount, we have to put it back and if necessary adapt the index of the second-highest amount, if the highest amount was before the second-highest amount in the list.

from typing import List, Tuple

def find_two_highest_remove(amounts : List[int]) -> Tuple[int, int]:
    """Return a tuple of the indices of the two highest values in list amounts.
    >>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    >>> find_two_highest_remove(seals)
    (3, 4)
    >>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    True
    """
    # Find the index of the maximum in seals
    highest : int = max(amounts)
    high_index1 : int = amounts.index(highest)
    # Remove that item form the list
    amounts.remove(highest)
    
    # Find the index of the new maximum in the list
    next_highest : int = max(amounts)
    high_index2 : int = amounts.index(next_highest)
    
    # Put the highest item back in the list
    amounts.insert(high_index1, highest)
    
    # If necessary, adjust the second index
    if high_index1 <= high_index2:
        high_index2 += 1
        
    # Return the two indices
    return (high_index1, high_index2)

find_two_highest_remove([334, 468, 549, 836, 660, 389, 308, 392, 520, 271])
(3, 4)
import doctest
doctest.testmod(verbose=True)  # with details
Trying:
    seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
Expecting nothing
ok
Trying:
    find_two_highest_remove(seals)
Expecting:
    (3, 4)
ok
Trying:
    seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
Expecting:
    True
ok
1 items had no tests:
    __main__
1 items passed all tests:
   3 tests in __main__.find_two_highest_remove
3 tests in 2 items.
3 passed and 0 failed.
Test passed.
TestResults(failed=0, attempted=3)
# Remove this line and add your code here

Sort#

In the next cell you will find the refined algorithm based on sorting.

from typing import List, Tuple

def find_two_highest_sort(amounts : List[int]) -> Tuple[int, int]:
    """Return a tuple of the indices of the two highest values in list amounts.
    >>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    >>> find_two_highest_sort(seals)
    (3, 4)
    >>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    True
    """
    # Sort a copy of amounts
    # Get the two highest values
    # Find their indices in the original list
    # Return the two indices

We have to refine the sorting and obtaining the index. Note that we sort the list in reverse order!

from typing import List, Tuple

def find_two_highest_sort(amounts : List[int]) -> Tuple[int, int]:
    """Return a tuple of the indices of the two highest values in list amounts.
    >>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    >>> find_two_highest_sort(seals)
    (3, 4)
    >>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    True
    """
    # Sort a copy of amounts
    temp_amounts : List[int] = sorted(amounts, reverse = True)
    print(temp_amounts)
    # Get the two highest values
    highest : int = temp_amounts[0]
    next_highest : int = temp_amounts[1]
    print(highest)
    print(next_highest)
    
    # Find their indices in the original list
    # Return the two indices
    
find_two_highest_sort([334, 468, 549, 836, 660, 389, 308, 392, 520, 271])
[836, 660, 549, 520, 468, 392, 389, 334, 308, 271]
836
660

We need to find the indices for both values in the original list.

from typing import List, Tuple

def find_two_highest_sort(amounts : List[int]) -> Tuple[int, int]:
    """Return a tuple of the indices of the two highest values in list amounts.
    >>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    >>> find_two_highest_sort(seals)
    (3, 4)
    >>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    True
    """
    # Sort a copy of amounts
    temp_amounts : List[int] = sorted(amounts, reverse = True)

    # Get the two highest values
    highest : int = temp_amounts[0]
    next_highest : int = temp_amounts[1]
    
    # Find their indices in the original list
    high_index1 : int = amounts.index(highest)
    high_index2 : int = amounts.index(next_highest)
    
    # Return the two indices
    return (high_index1, high_index2)

find_two_highest_sort([334, 468, 549, 836, 660, 389, 308, 392, 520, 271])
(3, 4)
doctest.testmod(verbose=True)  # with details
Trying:
    seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
Expecting nothing
ok
Trying:
    find_two_highest_remove(seals)
Expecting:
    (3, 4)
ok
Trying:
    seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
Expecting:
    True
ok
Trying:
    seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
Expecting nothing
ok
Trying:
    find_two_highest_sort(seals)
Expecting:
    (3, 4)
ok
Trying:
    seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
Expecting:
    True
ok
1 items had no tests:
    __main__
2 items passed all tests:
   3 tests in __main__.find_two_highest_remove
   3 tests in __main__.find_two_highest_sort
6 tests in 3 items.
6 passed and 0 failed.
Test passed.
TestResults(failed=0, attempted=6)
# Remove this line and add your code here

Walk through the list#

In the next cell you will find the refined algorithm where walk over the list to find the highest two values.

from typing import List, Tuple

def find_two_highest_walk(amounts : List[int]) -> Tuple[int, int]:
    """Return a tuple of the indices of the two highest values in list amounts.
    >>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    >>> find_two_highest_walk(seals)
    (3, 4)
    >>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    True
    """

    # Examine each value in the list in order
    # Keep track of the indices of the two highest values found so far
    # Update the indices when a new higher value is found
    # Return the two indices

We start with swapping the first and second line of the refinement steps. Furthermore, the updating of the indices is a step in the loop.

from typing import List, Tuple

def find_two_highest_walk(amounts : List[int]) -> Tuple[int, int]:
    """Return a tuple of the indices of the tow highest values in list amounts.
    >>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    >>> find_two_highest_walk(seals)
    (3, 4)
    >>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    True
    """

    # Keep track of the indices of the two highest values found so far
    # Examine each value in the list in order
    #     Update the indices when a new higher value is found
    # Return the two indices 
from typing import List, Tuple

def find_two_highest_walk(amounts : List[int]) -> Tuple[int, int]:
    """Return a tuple of the indices of the tow highest values in list amounts.
    >>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    >>> find_two_highest_walk(seals)
    (3, 4)
    >>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    True
    """

    # Keep track of the indices of the two highest values found so far
    if amounts[0] > amounts[1]:
        high_index1, high_index2 = 0, 1
    else:
        high_index1, high_index2 = 1, 0
        
    # Examine each value in the list in order
    #     Update the indices when a new higher value is found
    # Return the two indices 

We will now use a for loop over the indices, because we are interested in the indices and not in the values themselves.

There are alternative solutions, like using a while loop or a for loop to iterate over the values.

from typing import List, Tuple

def find_two_highest_walk(amounts : List[int]) -> Tuple[int, int]:
    """Return a tuple of the indices of the tow highest values in list amounts.
    >>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    >>> find_two_highest_walk(seals)
    (3, 4)
    >>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    True
    """

    # Keep track of the indices of the two highest values found so far
    if amounts[0] > amounts[1]:
        high_index1, high_index2 = 0, 1
    else:
        high_index1, high_index2 = 1, 0
        
    # Examine each value in the list in order
    for i in range(2, len(amounts)):
    #     Update the indices when a new higher value is found
    # Return the two indices   
  Cell In[16], line 21
    # Return the two indices
                               ^
SyntaxError: unexpected EOF while parsing

We need to update the indices if the value at the current index is higher than one of the current values.

  1. We have to update both indices if the value at the current index is higher than the highest values seen so far. In this case both indices have to be updated

  2. The value at the current index is between the both highest values seen so far. Only the index of the lowest highest values has to be updated.

  3. The value at the current index is lower than the highest values seen so far. Nothing needs to be done.

from typing import List, Tuple

def find_two_highest_walk(amounts : List[int]) -> Tuple[int, int]:
    """Return a tuple of the indices of the tow highest values in list amounts.
    >>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    >>> find_two_highest_walk(seals)
    (3, 4)
    >>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
    True
    """

    # Keep track of the indices of the two highest values found so far
    if amounts[0] > amounts[1]:
        high_index1, high_index2 = 0, 1
    else:
        high_index1, high_index2 = 1, 0
        
    # Examine each value in the list in order
    for i in range(2, len(amounts)):
    #     Update the indices when a new higher value is found
        if amounts[i] > amounts[high_index1]:
            high_index2 = high_index1
            high_index1 = i
        elif amounts[i] > amounts[high_index2]:
            high_index2 = i
        
    # Return the two indices 
    return (high_index1, high_index2)

find_two_highest_walk([334, 468, 549, 836, 660, 389, 308, 392, 520, 271])
doctest.testmod(verbose=True)  # with details
# Remove this line and add your code here

Timing the Functions#

First we are going to generate a huge list of arbitrary unique numbers.

import random
from typing import List

i : int = 0
rdlst : List[int] = []
    
while i < 80000:
    rdnr : int = random.randint(0,1000000)
    if rdnr not in rdlst:
        rdlst = rdlst + [rdnr]
    i += 1
    
#print(rdlst)

Via profiling you get insight in the efficiency of your code.

import time

t1 = time.perf_counter()
#find_two_highest_remove([334, 468, 549, 836, 660, 389, 308, 392, 520, 271])
find_two_highest_remove(rdlst)
t2 = time.perf_counter()
print("The remove code took {:.2f}ms".format((t2 - t1) * 1000))

t1 = time.perf_counter()
#find_two_highest_sort([334, 468, 549, 836, 660, 389, 308, 392, 520, 271])
find_two_highest_sort(rdlst)
t2 = time.perf_counter()
print("The sort code took {:.2f}ms".format((t2 - t1) * 1000))

t1 = time.perf_counter()
#find_two_highest_walk([334, 468, 549, 836, 660, 389, 308, 392, 520, 271])
find_two_highest_walk(rdlst)
t2 = time.perf_counter()
print("The walk code took {:.2f}ms".format((t2 - t1) * 1000))
# Remove this line and add your code here

Searching#

Searching elements in a list is a frequently used operation, as we saw in the cells above. In these cells we use the built-in method index on lists.

We are going to present a number of different search algorithms for unsorted lists, eventually we will produce a more efficient algorithm if the list elements are sorted.