We can iterate over a list using a for loop
employees = ['Nick','Lore','Hugo']
for employee in employees:
    print(employee)
->
Nick
Lore
Hugo
We can iterate over a string using a for loop
for letter in 'DataCamp':
    print(letter)
->
D
a
t
(..)
We can iterate over a range object using a for loop
for i in range(4):
    print(i)
->
0
1
2
3
Iterators vs iterables :
Iterable :
- Examples : lists, strings, dictionaries, file connections
- An object that iter() can be applied to (it defines an __iter__() method)
- Applying iter() to an iterable creates an iterator
Iterator :
- Produces next value with next()
word = 'Da'
it = iter(word)
next(it)
->D
next(it)
->a
next(it)
->StopIteration Traceback (most recent call last)
<ipython-input-11-2cdb14c0d4d6> in <module>()
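The StopIteration raised by the last next() call is also how a for loop knows when to stop. As a rough sketch (the helper name manual_for is made up for illustration), a for loop behaves approximately like calling iter() once and then next() repeatedly until StopIteration is raised:

```python
def manual_for(iterable):
    """Collect items the way a for loop would visit them (illustrative helper)."""
    collected = []
    it = iter(iterable)          # ask the iterable for an iterator
    while True:
        try:
            item = next(it)      # fetch the next value
        except StopIteration:    # raised once the iterator is exhausted
            break
        collected.append(item)
    return collected


print(manual_for('Da'))
```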
Iterating at once with *
word = 'Data'
it = iter(word)
print(*it)
->D a t a
print(*it)
->.. nothing else to print
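The second print(*it) has nothing left to show because an iterator can only be consumed once, whereas the underlying iterable can hand out a fresh iterator whenever iter() is called on it. A minimal sketch of the difference (the variable names are illustrative):

```python
word = 'Data'

it = iter(word)
first_pass = list(it)    # consumes every value from the iterator
second_pass = list(it)   # the same iterator is now exhausted

fresh_pass = list(iter(word))  # a new iterator starts from the beginning

print(first_pass)
print(second_pass)
print(fresh_pass)
```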
Iterating over dictionaries
pythonistas = {'hugo': 'bowne-anderson','francis': 'castro'}
for key, value in pythonistas.items():
    print(key, value)
->
hugo bowne-anderson
francis castro
Iterating over file connections
file = open('file.txt')
it = iter(file)
print(next(it))
->This is the first line.
print(next(it))
->This is the second line.
One advantage of iterators is that values are produced one at a time, so the full sequence never has to be held in memory. Suppose we wanted to loop over a range of 10 ** 100 values (note that in Python 10^100 is a bitwise XOR, not a power). Building all of those numbers up front, for example with list(range(10 ** 100)), would exhaust memory long before finishing.
But if we do :
# Create an iterator for range(10 ** 100): googol
googol = iter(range(10 ** 100))
# Print the first 5 values from googol
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))
We can go over the values one by one without having to allocate the memory needed for the whole range.
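As a hedged illustration of that point: in Python 3, range() is lazy, so the object stays small however large the upper bound is. The sketch below uses itertools.islice (not part of the original example) to take the first five values without calling next() by hand:

```python
import sys
from itertools import islice

googol_range = range(10 ** 100)   # no list of 10**100 numbers is built

# The range object only stores start, stop and step, so it stays small.
size_in_bytes = sys.getsizeof(googol_range)
print(size_in_bytes)

# Take the first five values lazily.
first_five = list(islice(googol_range, 5))
print(first_five)
```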
Playing with iterators
Using enumerate()
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
e = enumerate(avengers)
print(type(e))
-><class 'enumerate'>
e_list = list(e)
print(e_list)
->[(0, 'hawkeye'), (1, 'iron man'), (2, 'thor'), (3, 'quicksilver')]
enumerate() returns an iterator of (index, value) tuples; passing it to list() collects them into a list.
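To make that behaviour concrete, here is a rough sketch of how something like enumerate() could be written as a generator. The name my_enumerate is hypothetical, and this is not how the built-in is actually implemented; it only mimics its observable behaviour, including the optional start argument:

```python
def my_enumerate(iterable, start=0):
    """Yield (index, value) pairs, mimicking the built-in enumerate()."""
    index = start
    for value in iterable:
        yield index, value
        index += 1


avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
pairs = list(my_enumerate(avengers))
print(pairs == list(enumerate(avengers)))
```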
enumerate() and unpacking
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
for index, value in enumerate(avengers):
    print(index, value)
->
0 hawkeye
1 iron man
2 thor
3 quicksilver
for index, value in enumerate(avengers, start=10):
    print(index, value)
->
10 hawkeye
11 iron man
12 thor
13 quicksilver
Using zip()
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']
z = zip(avengers, names)
print(type(z))
-><class 'zip'>
z_list = list(z)
print(z_list)
->
[('hawkeye', 'barton'), ('iron man', 'stark'),
('thor', 'odinson'), ('quicksilver', 'maximoff')]
zip() and unpack
avengers = ['hawkeye','iron man','thor','quicksilver']
names = ['barton','stark','odinson','maximoff']
for z1, z2 in zip(avengers, names):
    print(z1, z2)
->
hawkeye barton
iron man stark
thor odinson
quicksilver maximoff
Print zip with *
avengers = ['hawkeye','iron man','thor','quicksilver']
names = ['barton','stark','odinson','maximoff']
z = zip(avengers, names)
print(*z)
->
('hawkeye', 'barton') ('iron man', 'stark')
('thor', 'odinson') ('quicksilver', 'maximoff')
Using iterators to load large files into memory
Loading data in chunks :
- There can be too much data to hold in memory
- Solution: load data in chunks!
- Pandas function: read_csv(); specify the chunk size with the chunksize argument
import pandas as pd
result = []
for chunk in pd.read_csv('data.csv', chunksize=1000):
    result.append(sum(chunk['x']))
total = sum(result)
print(total)
->4252532
We loop over the file 1000 records at a time. For each chunk we sum column 'x' and append that partial sum to result. Once every chunk has been read, summing the elements of result gives the total for the whole file.
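This works because a sum of per-chunk sums equals the sum over the whole column. The sketch below shows the same idea with the standard library only, on made-up numbers; the helper iter_chunks is hypothetical and merely stands in for what read_csv() with chunksize does for DataFrames:

```python
from itertools import islice


def iter_chunks(values, chunk_size):
    """Yield successive lists of at most chunk_size items (illustrative helper)."""
    it = iter(values)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


data = range(1, 11)  # made-up data: the integers 1 to 10

# One partial sum per chunk, then a sum of the partial sums.
partial_sums = [sum(chunk) for chunk in iter_chunks(data, chunk_size=4)]
print(partial_sums)
print(sum(partial_sums) == sum(data))
```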
We can also accumulate the total directly, without the intermediate list:
import pandas as pd
total = 0
for chunk in pd.read_csv('data.csv', chunksize=1000):
    total += sum(chunk['x'])
print(total)
->4252532
Now we can rework the tweet-counting algorithm once more, this time reading the Twitter data in chunks!
# Define count_entries()
def count_entries(csv_file, c_size, colname):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    # Initialize an empty dictionary: counts_dict
    counts_dict = {}
    # Iterate over the file chunk by chunk
    for chunk in pd.read_csv(csv_file, chunksize=c_size):
        # Iterate over the column in DataFrame
        for entry in chunk[colname]:
            if entry in counts_dict:
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1
    # Return counts_dict
    return counts_dict
# Call count_entries(): result_counts
result_counts = count_entries('tweets.csv',10,'lang')
# Print result_counts
print(result_counts)
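For comparison, the same counting loop can be expressed with collections.Counter. The sketch below is a variant built on assumptions rather than the original exercise: it uses the standard library's csv module on an in-memory sample instead of pandas and tweets.csv, and the name count_entries_stdlib and the sample rows are invented for illustration:

```python
import csv
import io
from collections import Counter
from itertools import islice


def count_entries_stdlib(csv_file, c_size, colname):
    """Count values of colname, reading c_size rows at a time."""
    counts = Counter()
    reader = csv.DictReader(csv_file)
    while True:
        chunk = list(islice(reader, c_size))  # next c_size rows
        if not chunk:
            break
        counts.update(row[colname] for row in chunk)
    return dict(counts)


# Made-up sample standing in for a real CSV file.
sample = io.StringIO("id,lang\n1,en\n2,en\n3,et\n4,en\n5,und\n")
sample_counts = count_entries_stdlib(sample, 2, 'lang')
print(sample_counts)
```

Counter.update() takes care of the "seen before or not" branch, so the if/else from count_entries() disappears while the chunk-by-chunk reading stays the same.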