Working with I/O Files in Python

Reading data from files and writing data to them is one of the most common tasks in programming. The creation, access, and modification of data in long-term storage are handled by the file management system of the operating system. These tasks matter most in applications that need to store data and use it later. In this tutorial, you will learn how to do this using Python, but first, let's dive into some important definitions related to files.

Definition of File

In simple words, a file is a collection of bytes used to store data, organized in a specific format. It can be as simple as a text file or as complex as image or video data; it can even be an executable program. The format is usually indicated by the extension of the file: for example, a text file has the .txt extension, an image may have .jpg or .png, and so on. Whatever the type is, the data ends up in binary form (0s and 1s), because that is the only form a computer can store and process.

Usually, a file has attributes such as the date it was created, the date of the last update, its size, and so on. Files also have an ownership attribute, which indicates the owner of the file; on Unix-based systems like Linux and macOS it is typically changed through the CLI, using the command chown, which means "change owner". You can do the same on Windows using its GUI. A file also has attributes that indicate access rights, that is, who can read, write, or execute the file; they are typically represented by the symbols r, w, and x respectively. Those rights can be specified or changed by the file owner.
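
As a rough illustration, some of these attributes can be inspected from Python with the standard os and stat modules. This is only a sketch: the file name info.txt and the temporary folder are assumptions made for the example, and the exact permission string depends on your operating system.

```python
import os
import stat
import tempfile
from datetime import datetime

# Create a throwaway file so the example does not depend on existing data.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "info.txt")  # hypothetical file name
    with open(sample_path, "w", encoding="utf-8") as sample:
        sample.write("hello")

    info = os.stat(sample_path)

    size_in_bytes = info.st_size                           # size of the file
    last_modified = datetime.fromtimestamp(info.st_mtime)  # last update
    permissions = stat.filemode(info.st_mode)              # r, w and x flags

    print(size_in_bytes)
    print(last_modified)
    print(permissions)
```

The permission string printed at the end uses the same r, w, and x symbols described above.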

Paths

When you're searching for a file, you need a path to follow to get there. A path is a sequence of directories that leads you to a specific subdirectory or file. It is a string representing the location of the file in the file system, and it is composed of three elements: the folder path, the file name, and the extension of the file, which is located at the end of the file name, separated by a period (.). As we mentioned above, the extension is used to indicate the file type.

File extensions are optional, though; you can find files with no extension at all.
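
To make the three elements concrete, here is a small sketch using the standard pathlib module. The path reports/2024/sales.csv and the file name README are made up for the illustration; PurePosixPath only manipulates the string, so no file needs to exist.

```python
from pathlib import PurePosixPath

# Hypothetical path, used only to show the three parts of a path.
path = PurePosixPath("reports/2024/sales.csv")

folder = path.parent      # the folder path
file_name = path.name     # the file name, extension included
extension = path.suffix   # the extension, with its leading period

# A file without an extension simply has an empty suffix.
no_extension = PurePosixPath("reports/README").suffix

print(folder)
print(file_name)
print(extension)
print(repr(no_extension))
```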

Let's take a look at the following file tree:

/
│
├── data/
│   │
│   ├── costumers/
│   │   └── costumers.csv
│   │
│   └── info.txt
│
└── wrangling.py

Suppose you want to clean the costumers.csv file using the Python script named wrangling.py, and your current working directory is the folder that contains both the script and the data folder. To reach the CSV file, you would write a string like this in the script:

path = 'data/costumers/costumers.csv'

If you wanted to check some information about the CSV file in info.txt, you would write something like this:

path = './data/info.txt'

A path may be absolute or relative. Both paths in the example above are relative, because they provide a sequence starting from the current working directory. An absolute path provides a sequence from the root of the file system to a specific subdirectory or file, for example:

path = '/Users/The_python_tutor/data/info.txt'

That's the case on a Unix-based OS. On Windows, it would look as follows (the r prefix makes it a raw string, so the backslashes are not treated as escape sequences):

path = r'C:\Users\The_python_tutor\data\info.txt'
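
If you are unsure whether a given path is absolute or relative, pathlib can tell you. The sketch below reuses the example paths from this section; the Pure path classes never touch the disk, so it runs on any operating system. The Windows path is written as a raw string (r prefix) so that Python keeps its backslashes literal.

```python
from pathlib import PurePosixPath, PureWindowsPath

relative = PurePosixPath("data/info.txt")
absolute_unix = PurePosixPath("/Users/The_python_tutor/data/info.txt")
absolute_windows = PureWindowsPath(r"C:\Users\The_python_tutor\data\info.txt")

print(relative.is_absolute())          # False
print(absolute_unix.is_absolute())     # True
print(absolute_windows.is_absolute())  # True
```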

ASCII and UTF-8

Something that you need to take into account is the encoding in which the data you're reading was written. This is important to avoid encoding problems. An encoding is a mapping between bytes and human-readable characters; it works by assigning a numerical value to each character. The most common encodings are ASCII and UTF-8. UTF-8 is a superset of ASCII: the 128 ASCII characters have the same byte values in both, so an ASCII file is already valid UTF-8 and needs no conversion. Even so, parsing a file with the wrong character encoding can lead to failures or misrepresented characters. When you open a file in text mode without specifying an encoding, the default depends on your platform and Python version, so it is safer to state the encoding explicitly.
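
The following sketch shows both kinds of problem on a plain string, without any file involved. The sample words are arbitrary; Latin-1 is used here only as an example of a wrong encoding.

```python
text = "café"

# Encoding turns characters into bytes.
utf8_bytes = text.encode("utf-8")

# Plain ASCII text produces the same bytes in ASCII and in UTF-8.
same_bytes = "cafe".encode("ascii") == "cafe".encode("utf-8")

# Misrepresentation: decoding UTF-8 bytes with the wrong encoding
# does not fail, but the accented letter comes out garbled.
misread = utf8_bytes.decode("latin-1")

# Failure: ASCII has no value for the accented letter at all.
try:
    utf8_bytes.decode("ascii")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(utf8_bytes)
print(same_bytes)
print(misread)
print(decode_failed)
```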

Opening and Closing Files

Opening files in Python is pretty straightforward, since there's a built-in function for it. The open function creates a file object (also called a file handle) that you can use to manipulate the file. At a minimum, the function requires the name of the file you want to work with:

>>> file = open("info.txt")  # Relative path
>>> file = open(r"C:\Users\The_python_tutor\data\info.txt")  # Absolute path (raw string keeps the backslashes literal)

You can specify the access mode, for example appending, reading, or writing only. You can also choose between opening the file in text or binary mode. The default is read-only in text mode, in which reading from the file returns strings (str). In binary mode, reading returns bytes; it is used when dealing with non-text files like images or executable files. You can find a full list of the modes in the official Python documentation; the most common are listed below:

Mode  Description
r     Opens a file for reading only. This is the default mode
rb    Opens a file for reading only in binary format
w     Opens a file for writing only. Overwrites the file if it exists, otherwise creates a new file
a     Opens a file for appending only. Creates the file if it does not exist
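
To see the difference between text and binary mode, here is a minimal sketch that reads the same file both ways. The file name and its content are assumptions for the example, and the with statement used to close the files automatically is explained later in this tutorial.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "info.txt")  # hypothetical sample file

    with open(path, "w", encoding="utf-8") as file:
        file.write("naïve")

    # Text mode ("r"): the bytes are decoded and we get a str.
    with open(path, "r", encoding="utf-8") as file:
        as_text = file.read()

    # Binary mode ("rb"): we get the raw bytes, with no decoding.
    with open(path, "rb") as file:
        as_bytes = file.read()

    print(type(as_text).__name__, len(as_text))    # str 5
    print(type(as_bytes).__name__, len(as_bytes))  # bytes 6
```

The lengths differ because the accented letter is one character but takes two bytes in UTF-8.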

In addition, to prevent reading errors, we recommend setting UTF-8 as the encoding when opening a text file. Once our program has finished working with the file, we have to close it to free up the resources that the file is using. We can do this using the close method. Leaving file objects open can cause problems in large applications:

file = open("info.txt", "r", encoding="utf-8")
#Do something 
file.close()

You can check whether your file is closed by printing the closed attribute. It returns False if the file is still open, otherwise True:

>>> file = open("info.txt", "r", encoding="utf-8")
>>> print(file.closed)
False
>>> file.close()
>>> print(file.closed)
True

The above code is not safe, because if an exception occurs while we're performing some operation on the file, the program will stop without closing the file. A good practice to avoid that is using a try-finally block:

file = open("info.txt", "r", encoding="utf-8")
try:
    pass  # Do something with the file
finally:
    file.close()

A better way to do the same is using the with statement. It closes the file for you once execution leaves the with block, even in case of an error:

with open("info.txt", "r", encoding="utf-8") as file:
    pass  # Do something with the file

The file object acts here as what is called a context manager: the with statement uses it to handle the open-close operations for you.
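
You can convince yourself of this with a small experiment. The sketch below raises an error on purpose inside the with block and then checks the closed attribute; the file name, its content, and the simulated ValueError are all assumptions made for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "info.txt")  # hypothetical sample file
    with open(path, "w", encoding="utf-8") as file:
        file.write("some text\n")

    try:
        with open(path, "r", encoding="utf-8") as file:
            first_line = file.readline()
            # Simulate something going wrong while the file is open.
            raise ValueError("simulated failure while processing the file")
    except ValueError:
        pass

    # The with block closed the file before the error reached us.
    closed_after_error = file.closed
    print(closed_after_error)  # True
```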

Reading and Writing Files

Once we have opened the file, the next thing we can do is access its content, write data to it, or both. Reading data from a text file is supported by the read, readline, and readlines methods:

Method       What it does
read()       Returns the entire contents of the file as a single string
readline()   Reads the next line of text from a file. It returns all the text on one line up to and including the newline character
readlines()  Returns a list of all the lines in a file, where each item of the list represents a single line

Let's see some examples:

>>> file = open("info.txt", "r", encoding="utf-8")
>>> file.read()
'An info file using to describe the content of a CSV file.\n'
>>> file.read()
''

The read method returns each line break as \n. Also, once the end of the file has been reached, any further read returns an empty string.
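
If you do need to read the content again without reopening the file, one option is the seek method of the file object, which moves the reading position. A minimal sketch, with a made-up file name and content:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "info.txt")  # hypothetical sample file
    with open(path, "w", encoding="utf-8") as file:
        file.write("An info file.\n")

    with open(path, "r", encoding="utf-8") as file:
        first_read = file.read()    # the whole content
        second_read = file.read()   # empty: the position is at the end
        file.seek(0)                # move the position back to the start
        third_read = file.read()    # the whole content again

    print(repr(first_read))
    print(repr(second_read))
    print(repr(third_read))
```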

We can read each line in our file using the readline method:

>>> file = open("message.txt", "r", encoding="utf-8")
>>> file.readline()
'Hi! We are the Python Tutor Team!\n'
>>> file.readline()
'This file was made for using it in this tutorial.\n'
>>> file.close()

Using readlines we can read all the lines at once:

>>> file = open("message.txt", "r", encoding="utf-8")
>>> file.readlines()
['Hi! We are the Python Tutor Team!\n', 'This file was made for using it in this tutorial.\n']

Also, we can use a for-loop to iterate over every line of our file:

>>> with open("message.txt", "r", encoding="utf-8") as file:
...     for line in file:
...         line
...
'Hi! We are the Python Tutor Team!\n'
'This file was made for using it in this tutorial.\n'
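
In practice you often don't want the trailing \n on each line. A common approach, sketched below, is to strip it while iterating. The example recreates message.txt in a temporary folder so it can run on its own; the line numbering is just an illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "message.txt")
    with open(path, "w", encoding="utf-8") as file:
        file.write("Hi! We are the Python Tutor Team!\n")
        file.write("This file was made for using it in this tutorial.\n")

    clean_lines = []
    with open(path, "r", encoding="utf-8") as file:
        # enumerate numbers the lines, rstrip removes the trailing newline.
        for number, line in enumerate(file, start=1):
            clean_lines.append(f"{number}: {line.rstrip()}")

    for entry in clean_lines:
        print(entry)
```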

To write to the file, we use the write method. We have to be careful when opening a file in w mode, because if the file already exists it is overwritten and its previous data is erased. Here is how we created message.txt:

with open("message.txt", "w", encoding="utf-8") as file:
    file.write("Hi! We are the Python Tutor Team!\n")
    file.write("This file was made for using it in this tutorial.\n")

If you want to append to a file, that is, start writing at the end of an existing one, you have to use the a option for the mode argument:

with open("message.txt", "a", encoding="utf-8") as file:
    file.write("This is a file created for this post.\n")

Let's check it out:

>>> with open("message.txt", "r", encoding="utf-8") as file:
...     for line in file:
...         line
...
'Hi! We are the Python Tutor Team!\n'
'This file was made for using it in this tutorial.\n'
'This is a file created for this post.\n'
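
The difference between the w and a modes is easy to get wrong, so here is a small side-by-side sketch. The file name notes.txt and its contents are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt")  # hypothetical file name

    with open(path, "w", encoding="utf-8") as file:
        file.write("first version\n")

    # Opening with "w" again erases what was there before.
    with open(path, "w", encoding="utf-8") as file:
        file.write("second version\n")

    # Opening with "a" keeps the content and writes at the end.
    with open(path, "a", encoding="utf-8") as file:
        file.write("appended line\n")

    with open(path, "r", encoding="utf-8") as file:
        content = file.read()

    print(content)
```

Only the second version and the appended line survive; the first version was erased by the second w.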

Beyond text files

As you can imagine, there are plenty of file types, like CSV, PDF, JSON, and so on. Here we just covered the basics, but we will cover more on these topics in future entries.

For now, we want to let you know that there are a lot of libraries out there, both built into Python and third-party, that can help you deal with the files you're working with. Here are just a few of them:

Library name  What it does
zipfile       Work with zip files
pandas        Work with many types of files, like CSV, TXT, even HTML, and more
msilib        Work with Microsoft Installer files (Windows only; removed from the standard library in Python 3.13)
plistlib      Generates and parses Apple .plist files
Pillow        Allows image reading and manipulation
PyPDF2        Work with PDF files
xlwings       Allows reading and writing Excel files
We hope you find this post useful! Now that you know how to work with files in Python, the sky is the limit for what you can do with your programs! Don't forget to share it with your Python peers. Thanks for reading.
