We can iterate over a list using a for loop

employees = ['Nick','Lore','Hugo']

for employee in employees:
    print(employee)

->
Nick
Lore
Hugo

We can iterate over a string using a for loop

for letter in 'DataCamp':
    print(letter)

->    
D
a
t
(..)

We can iterate over a range object using a for loop

for i in range(4):
    print(i)

->   
0
1
2
3

Iterators vs iterables :

Iterable :

  • Examples : lists, strings, dictionaries, file connections
  • An object with an associated __iter__() method, which is what the built-in iter() calls
  • Applying iter() to an iterable creates an iterator

Iterator :

  • Produces the next value with next(), and raises StopIteration when no values remain

word = 'Da'
it = iter(word)

next(it)

->D

next(it)

->a

next(it)

->StopIteration Traceback (most recent call last)
<ipython-input-11-2cdb14c0d4d6> in <module>()

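The StopIteration raised above is how an iterator signals that it has run out of values. As a rough sketch of the idea (manual_for is a made-up helper name, not a built-in), a for loop can be thought of as calling iter() once and then next() repeatedly until StopIteration appears :

```python
def manual_for(iterable):
    """Collect the values of `iterable` in the order a for loop would visit them.

    `manual_for` is a hypothetical helper used only for illustration.
    """
    visited = []
    it = iter(iterable)          # ask the iterable for an iterator
    while True:
        try:
            value = next(it)     # fetch the next value
        except StopIteration:    # the iterator is exhausted: leave the loop
            break
        visited.append(value)
    return visited


print(manual_for('Da'))
```

Calling manual_for('Da') returns ['D', 'a'], the same values a for loop over the string would visit.
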
Iterating at once with *

word ='Data'
it = iter(word)
print(*it)

->D a t a

print(*it)

-> (an empty line: the iterator is exhausted, so there is nothing left to print)

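The difference between an iterable and an iterator shows up when we try to read the values twice. A minimal sketch, assuming nothing beyond the built-ins iter(), next() and list() :

```python
word = 'Data'

# A string is an iterable: every call to iter() hands back a fresh iterator.
first_pass = list(iter(word))
second_pass = list(iter(word))

# An iterator is consumed as it is read.
it = iter(word)
consumed = list(it)           # reads all four letters
leftover = list(it)           # nothing remains

# next() accepts a default that is returned instead of raising StopIteration.
fallback = next(it, 'done')

print(first_pass == second_pass)
print(consumed, leftover, fallback)
```

The string can be traversed as often as we like, while the iterator it is empty after the first full pass.
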
Iterating over dictionaries

pythonistas = {'hugo': 'bowne-anderson','francis': 'castro'}
for key, value in pythonistas.items():
    print(key, value)

->
hugo bowne-anderson
francis castro
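
Besides .items(), a dictionary can be iterated directly or through .values(). The sketch below reuses the pythonistas dictionary and only collects the results into lists so they are easy to inspect :

```python
pythonistas = {'hugo': 'bowne-anderson', 'francis': 'castro'}

# Iterating over a dictionary directly yields its keys.
keys = [key for key in pythonistas]

# .values() yields the values only.
values = [value for value in pythonistas.values()]

# .items() yields (key, value) tuples, which the for loop above unpacks.
pairs = list(pythonistas.items())

print(keys)
print(values)
print(pairs)
```

Since Python 3.7, dictionaries keep insertion order, so all three results follow the order in which the keys were added.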

Iterating over file connections

file = open('file.txt')
it = iter(file)
print(next(it))

->This is the first line.

print(next(it))

->This is the second line.

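In practice a file is usually looped over directly, inside a with block so that it is closed automatically. The sketch below is self-contained: it first writes a throwaway temporary file (its contents are made up to match the output above) and then reads it back line by line :

```python
import os
import tempfile

# Create a small throwaway file so the example can run anywhere.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('This is the first line.\nThis is the second line.\n')
    path = tmp.name

lines = []
with open(path) as file:                 # closed automatically on exit
    for line in file:                    # a file object yields one line at a time
        lines.append(line.rstrip('\n'))  # drop the trailing newline

os.remove(path)                          # clean up the temporary file
print(lines)
```

Each line read from a file keeps its trailing newline, which is why print(next(it)) shows an extra blank line unless the newline is stripped.
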
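One advantage of iterators is that values are produced one at a time, so the whole sequence never has to be held in memory. Suppose we want to loop over a range of 10 ** 100 values (note that 10^100 is not a power in Python: ^ is the bitwise XOR operator). A list of that many values could never fit in memory, but range() does not build one: it creates a lazy range object that produces each value only when it is asked for.

We can also create an iterator over the range and request values one by one :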

# Create an iterator for range(10 ** 100): googol
googol = iter(range(10 ** 100))

# Print the first 5 values from googol
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))

This way we can step through the values one by one without ever allocating memory for the whole range.
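
The repeated next() calls can also be written with itertools.islice(), which takes a fixed number of values from any iterable without reading the rest. A minimal sketch :

```python
from itertools import islice

# range() is lazy: no list of 10 ** 100 numbers is ever built.
googol = range(10 ** 100)

# islice() pulls only the first five values and then stops.
first_five = list(islice(googol, 5))

print(first_five)
```

This prints [0, 1, 2, 3, 4]; the remaining values of the range are never computed.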

Playing with iterators

Using enumerate()

avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
e = enumerate(avengers)
print(type(e))

-><class 'enumerate'>

e_list = list(e)
print(e_list)

->[(0, 'hawkeye'), (1, 'iron man'), (2, 'thor'), (3, 'quicksilver')]

enumerate() yields (index, value) tuples; list() collects them into a list.

enumerate() and unpack

avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
for index, value in enumerate(avengers):
    print(index, value)

->
0 hawkeye
1 iron man
2 thor
3 quicksilver

for index, value in enumerate(avengers, start=10):
    print(index, value)

->
10 hawkeye
11 iron man
12 thor
13 quicksilver
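
To make the behaviour of enumerate() concrete, here is a rough equivalent written as a generator function. The name my_enumerate is hypothetical and the built-in is implemented differently; this is only a sketch of what it produces :

```python
def my_enumerate(iterable, start=0):
    """Yield (index, value) pairs, counting from `start`.

    A hypothetical stand-in for the built-in enumerate(), for illustration only.
    """
    index = start
    for value in iterable:
        yield index, value
        index += 1


avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']

for index, value in my_enumerate(avengers, start=10):
    print(index, value)
```

The loop prints the same four lines as the enumerate(avengers, start=10) example above.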

Using zip()

avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']
z = zip(avengers, names)

print(type(z))
-><class 'zip'>

z_list = list(z)
print(z_list)
->
[('hawkeye', 'barton'), ('iron man', 'stark'),
('thor', 'odinson'), ('quicksilver', 'maximoff')]

zip() and unpack

avengers = ['hawkeye','iron man','thor','quicksilver']
names = ['barton','stark','odinson','maximoff']
for z1, z2 in zip(avengers, names):
    print(z1, z2)

->
hawkeye barton
iron man stark
thor odinson
quicksilver maximoff

Print zip with *

avengers = ['hawkeye','iron man','thor','quicksilver']
names = ['barton','stark','odinson','maximoff']
z = zip(avengers, names)
print(*z)

->
('hawkeye', 'barton') ('iron man', 'stark') ('thor', 'odinson') ('quicksilver', 'maximoff')
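
Two further properties of zip() are worth a quick sketch: combined with *, it can split a list of pairs back into separate sequences, and it stops as soon as the shortest input runs out. The variable names heroes, surnames and short are chosen here for illustration :

```python
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']

pairs = list(zip(avengers, names))

# Passing the pairs back through zip() with * "unzips" them into two tuples.
heroes, surnames = zip(*pairs)

# zip() stops at the shortest input, so the unmatched names are dropped.
short = list(zip(avengers[:2], names))

print(heroes)
print(surnames)
print(short)
```

Note that the unzipped results are tuples, not lists, and that short contains only two pairs.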

Using iterators to load large files into memory

Loading data in chunks :

  • There can be too much data to hold in memory
  • Solution: load data in chunks!
  • pandas function: read_csv(), with the chunk size given by the chunksize parameter

import pandas as pd
result = []

for chunk in pd.read_csv('data.csv', chunksize=1000):
    result.append(sum(chunk['x']))

total = sum(result)

print(total)
->4252532

The loop reads the file 1000 rows at a time. For each chunk, it sums column x and appends that partial sum to result. Once every chunk has been processed, summing the partial sums in result gives the total for the whole file.

We can get the same total without the intermediate list by keeping a running sum :

import pandas as pd
total = 0

for chunk in pd.read_csv('data.csv', chunksize=1000):
    total += sum(chunk['x'])
    
print(total)
->4252532

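The chunking idea itself does not depend on pandas. As a sketch using only the standard library (this is not how read_csv() is implemented; read_in_chunks is a hypothetical helper and the CSV text is made-up data standing in for data.csv), rows can be pulled from any iterator a few at a time with itertools.islice() :

```python
import csv
import io
from itertools import islice

# Made-up CSV data: a single column x holding the numbers 1 to 10.
csv_text = 'x\n' + '\n'.join(str(n) for n in range(1, 11)) + '\n'


def read_in_chunks(rows, chunk_size):
    """Yield lists of up to `chunk_size` rows taken from an iterator of rows."""
    while True:
        chunk = list(islice(rows, chunk_size))  # take the next few rows
        if not chunk:                           # the iterator is exhausted
            return
        yield chunk


reader = csv.DictReader(io.StringIO(csv_text))  # iterator of row dictionaries
total = 0
chunk_lengths = []

for chunk in read_in_chunks(reader, chunk_size=4):
    chunk_lengths.append(len(chunk))
    total += sum(int(row['x']) for row in chunk)

print(chunk_lengths, total)
```

With ten rows and a chunk size of 4, the chunks hold 4, 4 and 2 rows, and the running total ends at 55. Only one chunk is held in memory at any moment.
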
Now we can rework the tweet-counting algorithm once more, this time reading the Twitter data in chunks :

import pandas as pd

# Define count_entries()
def count_entries(csv_file, c_size, colname):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    
    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Iterate over the file chunk by chunk
    for chunk in pd.read_csv(csv_file,chunksize=c_size):

        # Iterate over the column in DataFrame
        for entry in chunk[colname]:
            if entry in counts_dict:
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    # Return counts_dict
    return counts_dict

# Call count_entries(): result_counts
result_counts = count_entries('tweets.csv',10,'lang')

# Print result_counts
print(result_counts)
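
The if/else counting pattern inside count_entries() can also be expressed with collections.Counter from the standard library. A minimal sketch, where the nested list chunks is made-up data standing in for the values of chunk['lang'] in successive chunks :

```python
from collections import Counter

# Hypothetical data: each inner list plays the role of one chunk's column values.
chunks = [['en', 'en', 'et'], ['en', 'und'], ['et']]

counts = Counter()
for chunk in chunks:
    counts.update(chunk)      # add one to the count of every entry in the chunk

counts_dict = dict(counts)    # convert to a plain dictionary, as count_entries() returns
print(counts_dict)
```

Counter handles the "first time seen" case itself, so no explicit membership test is needed.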