Searching And Sorting

Searching

Searching is the algorithmic process of finding a particular item in a collection of items. A search typically answers either TRUE or FALSE as to whether the item is present. On occasion it can be modified to return whether the item is FOUND.

In Python, there is a simple way to ask whether an item is in a list of items: using the IN operator.

>>> 15 in [3, 4, 2, 4, 1]

False

However, there are other, more explicitly stated functions that can be implemented into a Python program specifically. For instance, the Sequential Search.

The Sequential Search

When data items are stored in a collection such as a list, we say that they have a linear or sequential relationship. Each data item is stored in a position relative to the others. In Python lists, these relative positions are the index values of the individual items. Since these index values are ordered, it is possible for us to visit them in sequence. This process gives rise to our first searching technique, the sequential search.

If the item is present within the list, the complexity of the sequential search is O(1) at best and O(n) at worst. If the item is not present, the complexity will only be O(n).

Because the function orderedSequentialSearch searches an ordered list, we can use the other values in the list to decide whether or not we have passed where the value should be. For instance, if we are searching for the value 5 but our sequential search function has found the value 7, our function can postulate that the list does not hold a value of 5.

The Binary Search

It is possible to take greater advantage of the ordered list if we are clever with our comparisons. In the sequential search, when we compare against the first item, there are at most n - 1 more items to look through if the first item is not what we are looking for. Instead of searching a list in sequence, a BINARY SEARCH will start by examining the middle term. I fthat item is the one we are searching for, we are done. If it is not the correct item, we can use the ordered nature of the list to eliminate half of the remaining items. If the item we are searching for is greater than the middle item, we know that the entire lower half of the list as well as the middle item can be eliminated from further consideration.

One of the drawbacks of a binary search is that because it searches directly based on whether or not the selected term is greater than or less than the sought out item, it cannot be done on an unordered list. If we start with a sorted list of n items, about (n/2) items will be left after the first comparison. After the second comparison, there will be about (n/4), then (n/8), and so on. The maximum number of comparisons is logarithmic with respect to the number of items in the list. Therefore teh binary search is O(log n). For this reason, it is important to note that for small values of n, the additional cost of sorting is probably not worth it, even though a binary search is generally better than a sequential search. Even for large lists, however, sorting even once can be so expensive that simply performing a sequential search from the start may be the best choice.

Here is an example of two ways to implement teh binary search in Python, both iteratively and recursively:

Hashing

Hashing is a data structure that can be searched in O(1) time. In order to do this, we will need to know even more about where the items might be when we go to look for them in the collection. If every item is where it should be, then the search can use a single comparison to discover the presence of an item. However, this is not typically the case.

With hashing, you do not need to actually search to get a result. Hashing is done by creating a hash table, which contains slots that each hold an item that is names by an integer value. For example, we will have a slot named 0, a slot named 1, a slot named 2, and so on.

We can implement a hash table in Python by using a list with each element initialized to the special Python value NONE. The mappign between an item and the slot where that item belongs in the hash table is called the hash function. The hash function will take any item in the collection and return an integer in the range of slot name, between 0 and (m - 1). Our first hash function, sometimes referred to as the "Remainder Moethod," simply takes an item and divides it by the table size, returning the remainder as its hash value. (h(item) = item % 11). Once the hash values have been computed, we can insert each item into the hash table at the designated position. This is referred to as the load factor, and is commonly denotes by [lamda] = (numberofitems)/(tablesize).

When we want to search fro an item, we simply use the hash function to compute the slot name for the item and then check the hash table to see if it is present. Hashing is a data structure that can search in O(1) time, since a constant amount of time is required to compute the hash value and then index the hash table at that location.

Here is a great video that elaborates on the usefulness of Hash Tables:

Implementing Hash Tables in Python

Here is some sample hashing code created in the CS260: Data Structures class at COCC. The first sectioon of code is the Node class, which allows us to navigate and search the hash table using variables "key" and "value."

This file is then implemented into our HashMap class:

The lines of code underneath the HASHMAP class are a brief test to check if our functions run.

Sorting

Sorting is the process of placing elements from a collection in some kind of order. For example, a list of words could be sorted alphabetically or by length. A list of cities could be sorted by population, by area, or by zip code. We have already seen a number of algroithms that were able to benefit from having a sorted list.

There are many, many sorting algorithms that have been developed and analyzed. This suggests that sorting is an important area of study in computer science. Sorting a large number of items can take a substantial amount of computing resources. Like searching, the efficiency of a sorting algorithm is related to the number of items being processed. For small collection, a complex sorting method may be more trouble than it is worth. The overhead may be too high. On the other hand, for larger collections, we want to take advantage of as many improvements as possible.

The Bubble Sort

The bubble sort makes multiple passes through a list. It copares adjacent items and exchanges those that are out of order. Each pass through the list places the next largest value in its proper place. In essence, each item "bubbles" up to the location where it belongs, hence the name.