So far, we have seen different Python data types. We usually store our data in different file formats. In this section, in addition to handling files, we will also look at different file formats (.txt, .json, .xml, .csv, .tsv, .xlsx). First, let us get familiar with handling files in the most common format (.txt).
File handling is an important part of programming which allows us to create, read, update and delete files. In Python, to handle files we use the open() built-in function.
# Syntax
open('filename', mode) # mode (r, a, w, x, t, b) could be read, append, write, create, text or binary
The default mode of open is reading, so we do not have to specify ‘r’ or ‘rt’. I have created and saved a file named reading_file_example.txt in the files directory. Let us see how it is done:
f = open('./files/reading_file_example.txt')
print(f) # <_io.TextIOWrapper name='./files/reading_file_example.txt' mode='r' encoding='UTF-8'>
As you can see in the example above, I printed the opened file and it gave some information about it. An opened file has different reading methods: read(), readline() and readlines(). An opened file has to be closed with the close() method.
f = open('./files/reading_file_example.txt')
txt = f.read()
print(type(txt))
print(txt)
f.close()
# output
<class 'str'>
This is an example to show how to open a file and read.
This is the second line of the text.
Instead of printing all the text, let us print the first 10 characters of the text file.
f = open('./files/reading_file_example.txt')
txt = f.read(10)
print(type(txt))
print(txt)
f.close()
# output
<class 'str'>
This is an
The readline() method reads only the first line of the file:
f = open('./files/reading_file_example.txt')
line = f.readline()
print(type(line))
print(line)
f.close()
# output
<class 'str'>
This is an example to show how to open a file and read.
The readlines() method reads all the lines of the text and returns them as a list:
f = open('./files/reading_file_example.txt')
lines = f.readlines()
print(type(lines))
print(lines)
f.close()
# output
<class 'list'>
['This is an example to show how to open a file and read.\n', 'This is the second line of the text.']
Another way to get all the lines as a list is using splitlines(), which, unlike readlines(), strips the newline characters:
f = open('./files/reading_file_example.txt')
lines = f.read().splitlines()
print(type(lines))
print(lines)
f.close()
# output
<class 'list'>
['This is an example to show how to open a file and read.', 'This is the second line of the text.']
After we open a file, we should close it, and there is a high tendency of forgetting to do so. There is a better way of opening files using with – it closes the file by itself. Let us rewrite the previous example with the with statement:
with open('./files/reading_file_example.txt') as f:
    lines = f.read().splitlines()
    print(type(lines))
    print(lines)
# output
<class 'list'>
['This is an example to show how to open a file and read.', 'This is the second line of the text.']
To write to an existing file, we must pass a mode as a second parameter to the open() function: 'a' to append or 'w' to overwrite.
Let us append some text to the file we have been reading:
with open('./files/reading_file_example.txt','a') as f:
    f.write('This text has to be appended at the end')
The 'w' mode writes to a file and first creates it if it does not exist:
with open('./files/writing_file_example.txt','w') as f:
    f.write('This text will be written in a newly created file')
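There is also an 'x' mode in the list above: it creates a new file and fails if the file already exists. A minimal sketch (the file name exclusive_example.txt is just an illustration):
try:
    # 'x' mode creates the file; it raises FileExistsError if the file already exists
    with open('./files/exclusive_example.txt', 'x') as f:
        f.write('This text is written only if the file did not exist before')
except FileExistsError:
    print('The file already exists, so nothing was written')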
We have seen in the previous section how to make and remove a directory using the os module. Now, if we want to remove a file, we again use the os module.
import os
os.remove('./files/example.txt')
If the file does not exist, the remove method will raise a FileNotFoundError, so it is good to use a condition like this:
import os
if os.path.exists('./files/example.txt'):
    os.remove('./files/example.txt')
else:
    print('The file does not exist')
A file with the txt extension is a very common form of data, and we have covered it above. Let us now move on to the JSON file format.
JSON stands for JavaScript Object Notation. In essence, it is a stringified JavaScript object or Python dictionary.
Example:
# dictionary
person_dct = {
    "name": "Tech",
    "country": "Finland",
    "city": "Helsinki",
    "skills": ["JavaScript", "React", "Python"]
}
# JSON: a string form of a dictionary
person_json = '{"name": "Tech", "country": "Finland", "city": "Helsinki", "skills": ["JavaScript", "React", "Python"]}'
# we use three quotes and make it multiple lines to make it more readable
person_json = '''{
    "name": "Tech",
    "country": "Finland",
    "city": "Helsinki",
    "skills": ["JavaScript", "React", "Python"]
}'''
To change JSON to a dictionary, we first import the json module and then use the loads() method.
import json
# JSON
person_json = '''{
    "name": "Tech",
    "country": "Finland",
    "city": "Helsinki",
    "skills": ["JavaScript", "React", "Python"]
}'''
# let's change JSON to dictionary
person_dct = json.loads(person_json)
print(type(person_dct))
print(person_dct)
print(person_dct['name'])
# output
<class 'dict'>
{'name': 'Tech', 'country': 'Finland', 'city': 'Helsinki', 'skills': ['JavaScript', 'React', 'Python']}
Tech
To change a dictionary to JSON, we use the dumps() method from the json module.
import json
# python dictionary
person = {
    "name": "Tech",
    "country": "Finland",
    "city": "Helsinki",
    "skills": ["JavaScript", "React", "Python"]
}
# let's convert it to json
person_json = json.dumps(person, indent=4) # indent could be 2, 4, 8. It beautifies the json
print(type(person_json))
print(person_json)
# output
# when it is printed it does not show the quotes, but it is actually a string
# the JSON here is just a string; its Python type is str
<class 'str'>
{
    "name": "Tech",
    "country": "Finland",
    "city": "Helsinki",
    "skills": [
        "JavaScript",
        "React",
        "Python"
    ]
}
We can also save our data as a JSON file. Let us do that using the following steps. To write to a JSON file, we use the json.dump() method; it can take a dictionary, an output file, ensure_ascii and indent.
import json
# python dictionary
person = {
    "name": "Tech",
    "country": "Finland",
    "city": "Helsinki",
    "skills": ["JavaScript", "React", "Python"]
}
with open('./files/json_example.json', 'w', encoding='utf-8') as f:
    json.dump(person, f, ensure_ascii=False, indent=4)
In the code above, we use encoding and indentation. Indentation makes the json file easy to read.
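To read a JSON file back into a dictionary, we can use the json.load() method. A minimal sketch, reading the file we just wrote:
import json
# read the json file back into a Python dictionary
with open('./files/json_example.json', encoding='utf-8') as f:
    person = json.load(f)
print(type(person))   # <class 'dict'>
print(person['name']) # Tech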
CSV stands for comma-separated values. CSV is a simple file format used to store tabular data, such as a spreadsheet or database. CSV is a very common data format in data science.
Example:
"name","country","city","skills"
"Tech","Finland","Helsinki","JavaScript"
Example:
import csv
with open('./files/csv_example.csv') as f:
    csv_reader = csv.reader(f, delimiter=',') # we use the reader method to read the csv
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are: {", ".join(row)}')
            line_count += 1
        else:
            print(f'\t{row[0]} is a teacher. He lives in {row[1]}, {row[2]}.')
            line_count += 1
    print(f'Number of lines: {line_count}')
# output:
Column names are: name, country, city, skills
Tech is a teacher. He lives in Finland, Helsinki.
Number of lines: 2
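A TSV file (tab-separated values), mentioned at the start of this section, can be read with the same csv module by changing the delimiter to a tab. A minimal sketch, assuming a file named csv_example.tsv with the same columns:
import csv
# a .tsv file uses tabs instead of commas to separate the values
with open('./files/csv_example.tsv') as f:
    tsv_reader = csv.reader(f, delimiter='\t')
    for row in tsv_reader:
        print(row)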
To read Excel files we need to install the xlrd package. We will cover this after we cover installing packages using pip.
import xlrd
excel_book = xlrd.open_workbook('sample.xls')
print(excel_book.nsheets)
print(excel_book.sheet_names())
XML is another structured data format which looks like HTML. In XML, the tags are not predefined. The first line is an XML declaration. The person tag is the root of the XML. The person has a gender attribute. Example of XML:
<?xml version="1.0"?>
<person gender="female">
  <name>Asabeneh</name>
  <country>Finland</country>
  <city>Helsinki</city>
  <skills>
    <skill>JavaScript</skill>
    <skill>React</skill>
    <skill>Python</skill>
  </skills>
</person>
For more information on how to read an XML file, check the documentation.
import xml.etree.ElementTree as ET
tree = ET.parse('./files/xml_example.xml')
root = tree.getroot()
print('Root tag:', root.tag)
print('Attribute:', root.attrib)
for child in root:
    print('field: ', child.tag)
# output
Root tag: person
Attribute: {'gender': 'female'}
field: name
field: country
field: city
field: skills
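Besides looping over the children, we can read the text of a specific element with the find() method. A minimal sketch using the same file:
import xml.etree.ElementTree as ET
tree = ET.parse('./files/xml_example.xml')
root = tree.getroot()
# find() returns the first child with the given tag; .text gives its content
print(root.find('name').text)    # Asabeneh
print(root.find('country').text) # Finland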
Now do some exercises for your brain and muscles.
Write a function which counts the number of lines and the number of words in a text. All the files are in the data folder: a) Read the obama_speech.txt file and count the number of lines and words b) Read the michelle_obama_speech.txt file and count the number of lines and words c) Read the donald_speech.txt file and count the number of lines and words d) Read the melina_trump_speech.txt file and count the number of lines and words
Read the countries_data.json data file in the data directory and create a function that finds the ten most spoken languages:
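One possible approach for this exercise, a minimal sketch (the function name counts_lines_and_words is just an illustration):
def counts_lines_and_words(filename):
    # count the number of lines and the number of words in a text file
    with open(filename) as f:
        lines = f.read().splitlines()
    number_of_lines = len(lines)
    number_of_words = sum(len(line.split()) for line in lines)
    return number_of_lines, number_of_words

# example usage, assuming the speech file lives in the data folder
print(counts_lines_and_words('./data/obama_speech.txt'))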
# Your output should look like this
print(most_spoken_languages('./data/countries_data.json', 10))
[(91, 'English'),
(45, 'French'),
(25, 'Arabic'),
(24, 'Spanish'),
(9, 'Russian'),
(9, 'Portuguese'),
(8, 'Dutch'),
(7, 'German'),
(5, 'Chinese'),
(4, 'Swahili'),
(4, 'Serbian')]
# Your output should look like this
print(most_spoken_languages('./data/countries_data.json', 3))
[(91, 'English'),
(45, 'French'),
(25, 'Arabic')]
Read the countries_data.json data file in the data directory and create a function that creates a list of the ten most populated countries:
# Your output should look like this
print(most_populated_countries('./data/countries_data.json', 10))
[
{'country': 'China', 'population': 1377422166},
{'country': 'India', 'population': 1295210000},
{'country': 'United States of America', 'population': 323947000},
{'country': 'Indonesia', 'population': 258705000},
{'country': 'Brazil', 'population': 206135893},
{'country': 'Pakistan', 'population': 194125062},
{'country': 'Nigeria', 'population': 186988000},
{'country': 'Bangladesh', 'population': 161006790},
{'country': 'Russian Federation', 'population': 146599183},
{'country': 'Japan', 'population': 126960000}
]
# Your output should look like this
print(most_populated_countries('./data/countries_data.json', 3))
[
{'country': 'China', 'population': 1377422166},
{'country': 'India', 'population': 1295210000},
{'country': 'United States of America', 'population': 323947000}
]
Write a function called find_most_common_words which takes a text file (or a string) and a positive integer as parameters and returns the specified number of most common words and their counts.
# Your output should look like this
print(find_most_common_words('sample.txt', 10))
[(10, 'the'),
(8, 'be'),
(6, 'to'),
(6, 'of'),
(5, 'and'),
(4, 'a'),
(4, 'in'),
(3, 'that'),
(2, 'have'),
(2, 'I')]
# Your output should look like this
print(find_most_common_words('sample.txt', 5))
[(10, 'the'),
(8, 'be'),
(6, 'to'),
(6, 'of'),
(5, 'and')]