We can iterate over a list using a for loop
employees = ['Nick', 'Lore', 'Hugo']
for employee in employees:
    print(employee)
-> Nick Lore Hugo
We can iterate over a string using a for loop
for letter in 'DataCamp':
    print(letter)
-> D a t (..)
We can iterate over a range object using a for loop
for i in range(4):
    print(i)
-> 0 1 2 3
Iterators vs iterables :
Iterable :
- Examples : lists, strings, dictionaries, file connections
- An object with an associated iter() method
- Applying iter() to an iterable creates an iterator
Iterator :
- Produces next value with next()
word = 'Da'
it = iter(word)
next(it) -> D
next(it) -> a
next(it) -> StopIteration (the iterator is exhausted)
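This iter()/next() dance is essentially what a for loop does behind the scenes, stopping silently when StopIteration is raised. A minimal sketch of the equivalence (the variable names are only illustrative):

word = 'Da'
it = iter(word)
while True:
    try:
        letter = next(it)      # ask the iterator for its next value
    except StopIteration:      # the for loop catches this and stops silently
        break
    print(letter)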
Iterating at once with *
word = 'Data'
it = iter(word)
print(*it) -> D a t a
print(*it) -> (nothing else to print: the iterator is exhausted)
Iterating over dictionaries
pythonistas = {'hugo': 'bowne-anderson', 'francis': 'castro'}
for key, value in pythonistas.items():
    print(key, value)
-> francis castro
   hugo bowne-anderson
(Note: in Python 3.7+ dictionaries preserve insertion order, so 'hugo' would print first; the order shown above comes from an older Python where dictionary order was arbitrary.)
Iterating over file connections
file = open('file.txt')
it = iter(file)
print(next(it)) -> This is the first line.
print(next(it)) -> This is the second line.
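Because a file object is its own iterator, the idiomatic way to read it line by line is simply a for loop. A minimal sketch, assuming the same file.txt as above:

with open('file.txt') as file:
    for line in file:          # the for loop calls iter()/next() for us
        print(line.rstrip())   # strip the trailing newline before printing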
One of the advantages of iterators is that values are produced lazily, so you never allocate memory for the whole sequence at once. If we wanted to loop over a range of 10**100, the number of values is enormous, and materializing them all at once (for example with list(range(10 ** 100))) would exhaust memory.
But if we do :
# Create an iterator for range(10 ** 100): googol
googol = iter(range(10 ** 100))

# Print the first 5 values from googol
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))
We can step through the values one by one without ever allocating the memory the full range would require.
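As a rough illustration (a minimal sketch; the exact byte counts depend on the Python version and platform), comparing a materialized list with a lazy iterator shows why this matters:

import sys
from itertools import islice

numbers_list = list(range(10 ** 6))      # a million ints materialized in memory
numbers_iter = iter(range(10 ** 100))    # a tiny object producing values on demand

print(sys.getsizeof(numbers_list))       # several megabytes
print(sys.getsizeof(numbers_iter))       # a few dozen bytes

# Take only the first 5 values without generating the rest
print(list(islice(numbers_iter, 5)))     # [0, 1, 2, 3, 4]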
Playing with iterators
Using enumerate()
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
e = enumerate(avengers)
print(type(e)) -> <class 'enumerate'>
e_list = list(e)
print(e_list) -> [(0, 'hawkeye'), (1, 'iron man'), (2, 'thor'), (3, 'quicksilver')]
enumerate() pairs each element with its index; converting the result to a list gives a list of (index, value) tuples.
enumerate() and unpack
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
for index, value in enumerate(avengers):
    print(index, value)
-> 0 hawkeye
   1 iron man
   2 thor
   3 quicksilver

for index, value in enumerate(avengers, start=10):
    print(index, value)
-> 10 hawkeye
   11 iron man
   12 thor
   13 quicksilver
Using zip()
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']
z = zip(avengers, names)
print(type(z)) -> <class 'zip'>
z_list = list(z)
print(z_list) -> [('hawkeye', 'barton'), ('iron man', 'stark'), ('thor', 'odinson'), ('quicksilver', 'maximoff')]
zip() and unpack
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']
for z1, z2 in zip(avengers, names):
    print(z1, z2)
-> hawkeye barton
   iron man stark
   thor odinson
   quicksilver maximoff
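One detail worth knowing (not shown in the course snippet): zip() stops as soon as the shortest input is exhausted, so inputs of unequal length are silently truncated. A tiny sketch:

for pair in zip([1, 2, 3], ['a', 'b']):
    print(pair)    # (1, 'a') then (2, 'b'); the 3 is dropped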
Print zip with *
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']
z = zip(avengers, names)
print(*z) -> ('hawkeye', 'barton') ('iron man', 'stark') ('thor', 'odinson') ('quicksilver', 'maximoff')
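The * operator can also be used to "unzip": splatting the pairs back into zip() recovers the original sequences as tuples. A minimal sketch using the same lists:

avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']

z = zip(avengers, names)
result1, result2 = zip(*z)   # splat the pairs back into zip() to "unzip"
print(result1)               # ('hawkeye', 'iron man', 'thor', 'quicksilver')
print(result2)               # ('barton', 'stark', 'odinson', 'maximoff')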
Using iterators to load large files into memory
Loading data in chunks :
- There can be too much data to hold in memory
- Solution: load data in chunks!
- Pandas function: read_csv(), specifying the chunk size with the chunksize argument
import pandas as pd

result = []
for chunk in pd.read_csv('data.csv', chunksize=1000):
    result.append(sum(chunk['x']))
total = sum(result)
print(total) -> 4252532
We loop over the file 1000 records at a time: for each chunk we sum the 'x' column and append that partial sum to result. Once every chunk has been processed, summing the elements of result gives the total over all the records in the file.
We can do so as well with the code below :
import pandas as pd

total = 0
for chunk in pd.read_csv('data.csv', chunksize=1000):
    total += sum(chunk['x'])
print(total) -> 4252532
Now we can rework the algorithm on the Twitter tweets once more, this time processing the file in chunks:
# Define count_entries()
def count_entries(csv_file, c_size, colname):
    """Return a dictionary with counts of
    occurrences as value for each key."""

    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Iterate over the file chunk by chunk
    for chunk in pd.read_csv(csv_file, chunksize=c_size):

        # Iterate over the column in DataFrame
        for entry in chunk[colname]:
            if entry in counts_dict.keys():
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    # Return counts_dict
    return counts_dict

# Call count_entries(): result_counts
result_counts = count_entries('tweets.csv', 10, 'lang')

# Print result_counts
print(result_counts)
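A slightly more compact variant (a sketch, not the course's solution; count_entries_counter is a hypothetical name) counts each chunk with pandas' value_counts() and accumulates the tallies in a collections.Counter:

from collections import Counter
import pandas as pd

def count_entries_counter(csv_file, c_size, colname):
    """Return a dict of occurrence counts for colname,
    reading csv_file in chunks of c_size rows."""
    counts = Counter()
    for chunk in pd.read_csv(csv_file, chunksize=c_size):
        # value_counts() tallies the chunk; Counter.update() adds the tallies
        counts.update(chunk[colname].value_counts().to_dict())
    return dict(counts)

result_counts = count_entries_counter('tweets.csv', 10, 'lang')
print(result_counts)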