Searching
Contents
Searching 1#
Searching for Two Largest Values in a List#
Based on Chapter 11 of the book Practical Programming by P. Gries, J. Campbell, and J. Montojo.
Suppose we have the following list of data, for instance representing the number of seals treated in the Sealcentre in Pieterburen (NL).
334 468 549 836 660 389 308 392 520 271
The first number, 334, represents the number of seals treated in 2009 and the last number, 271, the number of seals treated in 2018.
We start with a simpler problem, what is the highest number, and we can use the list.index
to find the corresponding index.
counts = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
high = max(counts)
counts.index(high)
3
Or, more concise:
counts = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
counts.index(max(counts))
3
Now, suppose we want to find the indexes of the 2 highest amounts. There is no direct method to do so.
We will present three alternative algorithms:
Find, remove, find: first find the index for highest amount, remove highest amount and then look for the index of the next highest amount. Insert the highest amount back, and adapt the second index if needed.
Sort: copy the list, and sort, and use the two highest amounts to find the corresponding indices in the original list.
Walk through the list and keep track of the two highest amounts found so far, update if a higher amount is found.
The ultimate question is of course which one is the fastest solution?
Find, Remove, Find#
from typing import List, Tuple
def find_two_highest_remove(amounts : List[int]) -> Tuple[int, int]:
"""Return a tuple of the indices of the two highest values in list amounts.
>>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
>>> find_two_highest_remove(seals)
(3, 4)
>>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
True
"""
# Find the index of the maximum in seals
# Remove that item form the list
# Find the index of the new maximum in the list
# Put the highest item back in the list
# If necessary, adjust the second index
# Return the two indices
As we have seen max
and list.index
do the for finding the index of the highest amount.
from typing import List, Tuple
def find_two_highest_remove(amounts : List[int]) -> Tuple[int, int]:
"""Return a tuple of the indices of the two highest values in list amounts.
>>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
>>> find_two_highest_remove(seals)
(3, 4)
>>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
True
"""
# Find the index of the maximum in seals
highest : int = max(amounts)
high_index1 : int = amounts.index(highest)
# Remove that item form the list
amounts.remove(highest)
# print(high_index1)
# Find the index of the new maximum in the list
next_highest : int = max(amounts)
high_index2 : int = amounts.index(next_highest)
# print(high_index2)
# Put the highest item back in the list
# If necessary, adjust the second index
# Return the two indices
find_two_highest_remove([334, 468, 549, 836, 660, 389, 308, 392, 520, 271])
Since we removed the highest amount, we have to put it back and if necessary adapt the index of the second-highest amount, if the highest amount was before the second-highest amount in the list.
from typing import List, Tuple
def find_two_highest_remove(amounts : List[int]) -> Tuple[int, int]:
"""Return a tuple of the indices of the two highest values in list amounts.
>>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
>>> find_two_highest_remove(seals)
(3, 4)
>>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
True
"""
# Find the index of the maximum in seals
highest : int = max(amounts)
high_index1 : int = amounts.index(highest)
# Remove that item form the list
amounts.remove(highest)
# Find the index of the new maximum in the list
next_highest : int = max(amounts)
high_index2 : int = amounts.index(next_highest)
# Put the highest item back in the list
amounts.insert(high_index1, highest)
# If necessary, adjust the second index
if high_index1 <= high_index2:
high_index2 += 1
# Return the two indices
return (high_index1, high_index2)
find_two_highest_remove([334, 468, 549, 836, 660, 389, 308, 392, 520, 271])
(3, 4)
import doctest
doctest.testmod(verbose=True) # with details
Trying:
seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
Expecting nothing
ok
Trying:
find_two_highest_remove(seals)
Expecting:
(3, 4)
ok
Trying:
seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
Expecting:
True
ok
1 items had no tests:
__main__
1 items passed all tests:
3 tests in __main__.find_two_highest_remove
3 tests in 2 items.
3 passed and 0 failed.
Test passed.
TestResults(failed=0, attempted=3)
Do It Yourself!
Create the function find_two_lowest_remove, which finds the indexes of the 2 lowest numbers of a list of integers received as parameter. Use the find, remove, find algorithm to compute the expected output.
# Remove this line and add your code here
Sort#
In the next cell you will find the refined algorithm based on sorting.
from typing import List, Tuple
def find_two_highest_sort(amounts : List[int]) -> Tuple[int, int]:
"""Return a tuple of the indices of the two highest values in list amounts.
>>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
>>> find_two_highest_sort(seals)
(3, 4)
>>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
True
"""
# Sort a copy of amounts
# Get the two highest values
# Find their indices in the original list
# Return the two indices
We have to refine the sorting and obtaining the index. Note that we sort the list in reverse order!
from typing import List, Tuple
def find_two_highest_sort(amounts : List[int]) -> Tuple[int, int]:
"""Return a tuple of the indices of the two highest values in list amounts.
>>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
>>> find_two_highest_sort(seals)
(3, 4)
>>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
True
"""
# Sort a copy of amounts
temp_amounts : List[int] = sorted(amounts, reverse = True)
print(temp_amounts)
# Get the two highest values
highest : int = temp_amounts[0]
next_highest : int = temp_amounts[1]
print(highest)
print(next_highest)
# Find their indices in the original list
# Return the two indices
find_two_highest_sort([334, 468, 549, 836, 660, 389, 308, 392, 520, 271])
[836, 660, 549, 520, 468, 392, 389, 334, 308, 271]
836
660
We need to find the indices for both values in the original list.
from typing import List, Tuple
def find_two_highest_sort(amounts : List[int]) -> Tuple[int, int]:
"""Return a tuple of the indices of the two highest values in list amounts.
>>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
>>> find_two_highest_sort(seals)
(3, 4)
>>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
True
"""
# Sort a copy of amounts
temp_amounts : List[int] = sorted(amounts, reverse = True)
# Get the two highest values
highest : int = temp_amounts[0]
next_highest : int = temp_amounts[1]
# Find their indices in the original list
high_index1 : int = amounts.index(highest)
high_index2 : int = amounts.index(next_highest)
# Return the two indices
return (high_index1, high_index2)
find_two_highest_sort([334, 468, 549, 836, 660, 389, 308, 392, 520, 271])
(3, 4)
doctest.testmod(verbose=True) # with details
Trying:
seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
Expecting nothing
ok
Trying:
find_two_highest_remove(seals)
Expecting:
(3, 4)
ok
Trying:
seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
Expecting:
True
ok
Trying:
seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
Expecting nothing
ok
Trying:
find_two_highest_sort(seals)
Expecting:
(3, 4)
ok
Trying:
seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
Expecting:
True
ok
1 items had no tests:
__main__
2 items passed all tests:
3 tests in __main__.find_two_highest_remove
3 tests in __main__.find_two_highest_sort
6 tests in 3 items.
6 passed and 0 failed.
Test passed.
TestResults(failed=0, attempted=6)
Do It Yourself!
Create the function find_two_lowest_sort, which finds the indexes of the 2 lowest numbers of a list of integers received as parameter. Use the sort algorithm to compute the expected output.
# Remove this line and add your code here
Walk through the list#
In the next cell you will find the refined algorithm where walk over the list to find the highest two values.
from typing import List, Tuple
def find_two_highest_walk(amounts : List[int]) -> Tuple[int, int]:
"""Return a tuple of the indices of the two highest values in list amounts.
>>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
>>> find_two_highest_walk(seals)
(3, 4)
>>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
True
"""
# Examine each value in the list in order
# Keep track of the indices of the two highest values found so far
# Update the indices when a new higher value is found
# Return the two indices
We start with swapping the first and second line of the refinement steps. Furthermore, the updating of the indices is a step in the loop.
from typing import List, Tuple
def find_two_highest_walk(amounts : List[int]) -> Tuple[int, int]:
"""Return a tuple of the indices of the tow highest values in list amounts.
>>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
>>> find_two_highest_walk(seals)
(3, 4)
>>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
True
"""
# Keep track of the indices of the two highest values found so far
# Examine each value in the list in order
# Update the indices when a new higher value is found
# Return the two indices
from typing import List, Tuple
def find_two_highest_walk(amounts : List[int]) -> Tuple[int, int]:
"""Return a tuple of the indices of the tow highest values in list amounts.
>>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
>>> find_two_highest_walk(seals)
(3, 4)
>>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
True
"""
# Keep track of the indices of the two highest values found so far
if amounts[0] > amounts[1]:
high_index1, high_index2 = 0, 1
else:
high_index1, high_index2 = 1, 0
# Examine each value in the list in order
# Update the indices when a new higher value is found
# Return the two indices
We will now use a for loop over the indices, because we are interested in the indices and not in the values themselves.
There are alternative solutions, like using a while loop or a for loop to iterate over the values.
from typing import List, Tuple
def find_two_highest_walk(amounts : List[int]) -> Tuple[int, int]:
"""Return a tuple of the indices of the tow highest values in list amounts.
>>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
>>> find_two_highest_walk(seals)
(3, 4)
>>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
True
"""
# Keep track of the indices of the two highest values found so far
if amounts[0] > amounts[1]:
high_index1, high_index2 = 0, 1
else:
high_index1, high_index2 = 1, 0
# Examine each value in the list in order
for i in range(2, len(amounts)):
# Update the indices when a new higher value is found
# Return the two indices
Cell In[16], line 21
# Return the two indices
^
SyntaxError: unexpected EOF while parsing
We need to update the indices if the value at the current index is higher than one of the current values.
We have to update both indices if the value at the current index is higher than the highest values seen so far. In this case both indices have to be updated
The value at the current index is between the both highest values seen so far. Only the index of the lowest highest values has to be updated.
The value at the current index is lower than the highest values seen so far. Nothing needs to be done.
from typing import List, Tuple
def find_two_highest_walk(amounts : List[int]) -> Tuple[int, int]:
"""Return a tuple of the indices of the tow highest values in list amounts.
>>> seals = [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
>>> find_two_highest_walk(seals)
(3, 4)
>>> seals == [334, 468, 549, 836, 660, 389, 308, 392, 520, 271]
True
"""
# Keep track of the indices of the two highest values found so far
if amounts[0] > amounts[1]:
high_index1, high_index2 = 0, 1
else:
high_index1, high_index2 = 1, 0
# Examine each value in the list in order
for i in range(2, len(amounts)):
# Update the indices when a new higher value is found
if amounts[i] > amounts[high_index1]:
high_index2 = high_index1
high_index1 = i
elif amounts[i] > amounts[high_index2]:
high_index2 = i
# Return the two indices
return (high_index1, high_index2)
find_two_highest_walk([334, 468, 549, 836, 660, 389, 308, 392, 520, 271])
doctest.testmod(verbose=True) # with details
Do It Yourself!
Create the function find_two_lowest_walk, which finds the indexes of the 2 lowest numbers of a list of integers received as parameter. Use the walk through the list algorithm to compute the expected output.
# Remove this line and add your code here
Timing the Functions#
First we are going to generate a huge list of arbitrary unique numbers.
import random
from typing import List
i : int = 0
rdlst : List[int] = []
while i < 80000:
rdnr : int = random.randint(0,1000000)
if rdnr not in rdlst:
rdlst = rdlst + [rdnr]
i += 1
#print(rdlst)
Via profiling you get insight in the efficiency of your code.
import time
t1 = time.perf_counter()
#find_two_highest_remove([334, 468, 549, 836, 660, 389, 308, 392, 520, 271])
find_two_highest_remove(rdlst)
t2 = time.perf_counter()
print("The remove code took {:.2f}ms".format((t2 - t1) * 1000))
t1 = time.perf_counter()
#find_two_highest_sort([334, 468, 549, 836, 660, 389, 308, 392, 520, 271])
find_two_highest_sort(rdlst)
t2 = time.perf_counter()
print("The sort code took {:.2f}ms".format((t2 - t1) * 1000))
t1 = time.perf_counter()
#find_two_highest_walk([334, 468, 549, 836, 660, 389, 308, 392, 520, 271])
find_two_highest_walk(rdlst)
t2 = time.perf_counter()
print("The walk code took {:.2f}ms".format((t2 - t1) * 1000))
Do It Yourself!
Report the time taken by each of the three functions to return the indices of the lowest numbers of the list of integers.
# Remove this line and add your code here
Searching#
Searching elements in a list is a frequently used operation, as we saw in the cells above. In these cells we use the built-in method index
on lists.
We are going to present a number of different search algorithms for unsorted lists, eventually we will produce a more efficient algorithm if the list elements are sorted.
Linear Search#
Linear search starts with the index 0
and looks at each list item one by one, in order to see whether the element at the given index is the element we are looking for?
We will give two variants, each of the variants uses a loop.
from typing import Any
def linear_search(lst : list, value : Any) -> int:
"""Return the index of the first occurrence of value in lst,
return -1 if value is not in lst.
>>> linear_search([2, 5, 1, -3], 5)
1
>>> linear_search([2, 4, 2], 2)
0
>>> linear_search([2, 5, 1, -3], 4)
-1
>>> linear_search([], 5)
-1
"""
# Examine the items at each index i in lst, starting at index 0
# Is lst[i] the value we are looking for? If so, stop searching.
We will first present a solution using a while-loop and an auxilary variable representing the index.
from typing import Any
def linear_search(lst : list, value : Any) -> int:
"""Return the index of the first occurrence of value in lst,
return -1 if value is not in lst.
>>> linear_search([2, 5, 1, -3], 5)
1
>>> linear_search([2, 4, 2], 2)
0
>>> linear_search([2, 5, 1, -3], 4)
-1
>>> linear_search([], 5)
-1
"""
# Start at index 0
i : int = 0
# Examine the items at each index i in lst
# Is lst[i] the value we are looking for? If so, stop searching.
while i != len(lst) and lst[i] != value:
i += 1
# If we have inspected all elements
if i == len(lst):
return -1
else:
return i
import doctest
doctest.testmod(verbose=True) # with details
The next solution present uses a for-loop and an auxilary variable representing the index.
from typing import Any
def linear_search(lst : list, value : Any) -> int:
"""Return the index of the first occurrence of value in lst,
return -1 if value is not in lst.
>>> linear_search([2, 5, 1, -3], 5)
1
>>> linear_search([2, 4, 2], 2)
0
>>> linear_search([2, 5, 1, -3], 4)
-1
>>> linear_search([], 5)
-1
"""
for i in range(len(lst)):
if lst[i] == value:
return i
return -1
doctest.testmod(verbose=True) # with details
Suppose the list is sorted, would you use the same search function?
Binary Search#
Binary search is only applicable if the elements in the list are sorted. The basic idea of binary search is the find the middle of the list, compare the value with the middle element, if the two are the equal, then the index of the middle element is returned. Otherwise two options are possible, the value is smaller the middle element then the search is continued in the first half, else the search is continued in the second half.
The advantage is that the number of steps to find the index in a list with 1.000.000 is about 20!
from typing import Any
def binary_search(lst : list, value : Any) -> int:
""" Return the index of the first occurrence of the value in lst,
or return -1 if the value is not found in lst.
>>> binary_search([1, 2, 4, 4, 5, 7, 9, 10], 1)
0
>>> binary_search([1, 2, 4, 4, 5, 7, 9, 10], 4)
2
>>> binary_search([1, 2, 4, 4, 5, 7, 9, 10], 5)
4
>>> binary_search([1, 2, 4, 4, 5, 7, 9, 10], 10)
7
>>> binary_search([1, 2, 4, 4, 5, 7, 9, 10], -3)
-1
>>> binary_search([1, 2, 4, 4, 5, 7, 9, 10], 11)
-1
>>> binary_search([1, 2, 4, 4, 5, 7, 9, 10], 3)
-1
>>> binary_search([], -3)
-1
>>> binary_search([1], 1)
0
"""
# Mark the left and right indices of the part of the list to be searched
i : int = 0
j : int = len(lst) - 1
while i != j + 1:
m = (i + j) // 2
if lst[m] < value:
i = m + 1
else:
j = m - 1
if 0 <= i < len(lst) and lst[i] == value:
return i
else:
return -1
binary_search([1, 2, 4, 4, 5, 7, 9, 10, 12, 15,
20, 20, 25, 33, 33, 44, 55, 60,
61, 62, 64, 67, 70, 73, 76, 78], 55)
import doctest
doctest.testmod(verbose=True) # with details
There are a lot of test cases because the test cases cover:
The value is the first item.
The value occurs twice, we want the index of the first occurrence.
The value is exactly in the middle of the list.
The value is the last item of the list.
The value is smaller than all elements in the list.
The value is larger than all elements in the list.
The value is not in the list, but is larger than some but smaller than others.
The list has no items.
The list has exactly one item.
Do It Yourself!
Use the linear and the binary search to find the word done in the list. Profile each function and compare the time taken by each one of them.
lst = ['we', 'are', 'almost', 'done', 'with', 'the', 'course']
# Remove this line and add your code here
Again some timing measurements.
import time
t1 = time.perf_counter()
idx1 : int = linear_search(rdlst, 202123)
t2 = time.perf_counter()
print("The linear search code took {:.2f}ms".format((t2 - t1) * 1000))
print(idx1)
rdlst.sort()
t1 = time.perf_counter()
idx2 : int = binary_search(rdlst, 202123)
t2 = time.perf_counter()
print("The binary search code took {:.2f}ms".format((t2 - t1) * 1000))
print(idx2)
- 1
This Jupyter Notebook is based on external content related to algorithms (e.g. searching and sorting).