from os import getcwd,chdir,listdir
currentdir = getcwd()
myfiles = listdir()
#chdir("/path/to/new/directory")
10 File I/O (Input/Output)
Up to now, we have been working with computer-generated or manually typed data sets. Often in a scientific setting your data will be stored in a file, and you will need to read the contents of the file into Python so you can perform an analysis. Most data files are text files, but there is a large variety of these that differ mostly in the way the information is formatted. The file extension (i.e., the 3 or 4 letters after the period at the end of a file name) usually indicates the formatting of the file. For example, a .csv file (short for comma-separated values) uses commas to separate the values on each line.
10.1 The os module
Before we learn how to read and write data files, we should first learn how to navigate our computer’s file system from within Python. For this, we will use a module called os (short for operating system). The os module allows you to perform standard operations on the files and folders on your computer. When opening files from a Python script, relative file names are looked up in the current working directory, which is often (but not always) the directory where the script is located. If you want to open files located in other directories, you may need to use the os module to navigate there first. The following functions are the most commonly used ones:
Function | Description |
---|---|
getcwd() | Short for get current working directory. Returns a string of the current directory. |
chdir(path) | Change the current working directory to be at path. |
listdir() | List all of the files and folders in the current working directory. |
mkdir(path) | Make a new directory at location path. |
In the cell below, we show the usage of these functions.
To Do:
- Add print statements to the cell above and inspect the results to understand what each function does.
- Modify the third statement above (the one that uses chdir) to change the current working directory to one that actually exists on your machine.
- Everyone has a “Downloads” folder on their computer. Use the chdir function to change the current working directory to the Downloads folder.
The os module has many, many more functions that do useful things, but you will mostly use the functions mentioned above. As you get more experience using your computer at the command line, you will be better equipped to understand the usefulness of the rest of this library.
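The four functions in the table can be tried out safely in a throwaway folder. The following is a minimal sketch, not part of the original cell: it assumes the standard-library tempfile module is used to create a scratch directory, and the folder name "data" is purely illustrative.

```python
import os
import tempfile

start_dir = os.getcwd()                  # remember where we started

# tempfile provides a scratch folder that is removed when the block ends
with tempfile.TemporaryDirectory() as scratch:
    os.chdir(scratch)                    # move into the scratch folder
    os.mkdir("data")                     # create a subfolder named "data" (illustrative name)
    contents = os.listdir()              # list what is in the current working directory
    print(contents)                      # → ['data']
    os.chdir(start_dir)                  # move back before the scratch folder is deleted

print(os.getcwd() == start_dir)          # → True
```

Changing back to the starting directory before the block ends keeps the rest of your session unaffected by the experiment.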
10.2 Reading Files
10.2.1 Reading Line by Line
The first way to read a file is to use a for loop to iterate over the file line by line. Admittedly, this is not the most elegant or efficient way to read a file, but we present it first because it always works. First, the file is opened using the open function. The file should be assigned to a variable for later use. Next, the readlines() function is used to read the file in as a list of strings, one string for each line in the file. We then use a for loop to iterate over this list, effectively parsing the file line by line. Finally, the file should be closed when you’re finished reading it. Let’s see an example for reading in the following file, which will be named squares.csv. (You can download squares.csv here if you want to execute the cells below without an error.)
1, 1
2, 4
3, 9
4, 16
5, 25
6, 36
7, 49
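If you do not have the file at hand, a stand-in can be generated. This is a minimal sketch under an assumption: the file is taken to hold the integers 1 through 10 and their squares, separated by a comma, which matches the printed output later in this section; the downloadable file may differ in detail. Writing files is covered in Section 10.3.

```python
# Assumption: squares.csv holds n and n**2 for n = 1..10, one pair per line
file = open("squares.csv", "w")      # "w" creates the file (or overwrites an existing one)
for n in range(1, 11):
    file.write(f"{n},{n**2}\n")      # e.g. "3,9" followed by a line break
file.close()
```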
file = open("squares.csv")
lines = file.readlines() # Read the file into a list of strings, one string for each line in the file.
for line in lines: # Iterate over the list of strings.
    print(line) # Do something
file.close()
1,1
2,4
3,9
4,16
5,25
6,36
7,49
8,64
9,81
10,100
You can see how each line gets read separately and printed off. But this isn’t super useful yet because we’d probably like to have the numbers stored in lists for our forthcoming analysis. We can fix this by creating some empty lists and appending the appropriate values as they are read in.
numbers = []
squares = []
file = open("squares.csv")
for line in file.readlines():
    numbers.append( int(line.split(',')[0]) )
    squares.append( int(line.split(',')[1]) )
file.close()
print(numbers)
print(squares)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
Now the data from the file is saved in Python lists and our analysis can proceed.
One final note: if all you need to do is read the first few lines of a file, the readline() function can be used (not readlines()). Each call to this function reads the next line in the file.
file = open("squares.csv")
lineOne = file.readline()
lineTwo = file.readline()
lineThree = file.readline()
file.close()
10.2.2 Using NumPy’s genfromtxt function
If the data file is highly structured (every line looks the same, separator character is consistent across the file, etc.) then NumPy’s genfromtxt function can read the data very efficiently into something called an array. The genfromtxt function requires only one argument (the file name) with another optional argument (delimiter) that is typically included to specify the character used to separate the data. Below is an example for using this function to read in the .csv data we have been working with.
import numpy as np
data = np.genfromtxt('squares.csv',delimiter = ",")
print(data)
[[ 1. 1.]
[ 2. 4.]
[ 3. 9.]
[ 4. 16.]
[ 5. 25.]
[ 6. 36.]
[ 7. 49.]
[ 8. 64.]
[ 9. 81.]
[ 10. 100.]]
The data from the file is stored in something called a 2D numpy array which is similar to a nested list. It’s best to think of a 2D numpy array as having rows and columns, and to index them the desired row and column must be provided. We can index these arrays similar to how we index nested lists but instead of using multiple sets of brackets ([]), simply place the desired row and column number in a single set of brackets, as shown in the cell below. Use a colon (:) to select an entire row or column.
import numpy as np
data = np.genfromtxt('squares.csv',delimiter = ",")
numbers = data[:,0] # Extract the entire first column of the array
squares = data[:,1] # Extract the entire second column of the array.
mySlice = data[1:3,1] # Extract the second and third rows (indices 1 and 2) of the 2nd column.
print(numbers)
print(squares)
print(mySlice)
[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
[ 1. 4. 9. 16. 25. 36. 49. 64. 81. 100.]
[4. 9.]
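To make the comparison with nested lists concrete, here is a small sketch that indexes the same values both ways. The three rows of data are typed in by hand for illustration rather than read from squares.csv.

```python
import numpy as np

nested = [[1, 1], [2, 4], [3, 9]]    # nested list: one inner list per row
array = np.array(nested)             # the same values as a 2D NumPy array

print(nested[2][1])                  # list: one set of brackets per level → 9
print(array[2, 1])                   # array: row and column in a single set of brackets → 9
print(array[:, 1])                   # colon selects every row of column 1 → [1 4 9]
print(array[0, :])                   # colon selects every column of row 0 → [1 1]
```

Note that a nested list has no equivalent of array[:, 1]; extracting a column from a nested list requires a loop.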
Another useful feature of arrays is that you can do math with them and the mathematical operations are performed on every number in the array automatically. This is in contrast to lists, which do not support element-wise mathematical operations. See the example in the cell below:
import numpy as np
data = np.genfromtxt('squares.csv',delimiter = ",")
numbers = data[:,0] # Extract the entire first column of the array
squares = data[:,1] # Extract the entire second column of the array.
calc = np.sqrt(squares) # Take the square root of all of the numbers in the "squares" array
print(calc)
[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
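The difference between lists and arrays shows up clearly with a single operator. The sketch below uses a short hand-typed data set (not the file) and applies the same multiplication to both types.

```python
import numpy as np

values_list = [1, 2, 3]
values_array = np.array(values_list)

doubled_list = values_list * 2           # list: the list is repeated, not doubled
doubled_array = values_array * 2         # array: every element is multiplied by 2

print(doubled_list)                      # → [1, 2, 3, 1, 2, 3]
print(doubled_array)                     # → [2 4 6]

# Several operations can be chained; each one acts on every element
shifted_squares = values_array ** 2 + 1
print(shifted_squares)
```

To get the array behavior from a list you would need a loop or a list comprehension.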
Other optional arguments that can be used with the genfromtxt function are given in the table below:
Argument | Description |
---|---|
delimiter | The string used to separate values. By default, whitespace acts as the delimiter. |
skip_header | The number of lines to skip at the beginning of a file. |
skip_footer | The number of lines to skip at the end of a file. |
usecols | Specify which columns to read, with 0 being the first. For example, usecols = (0,2,5) will read the 1st, 3rd, and 6th columns. |
comments | The character used to indicate the start of a comment. Everything on a line after this character will be discarded. |
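Several of these arguments are often combined. The following sketch is illustrative only: the file contents are made up and held in memory with io.StringIO (which genfromtxt accepts in place of a file name) so that the example runs without any file on disk.

```python
import io
import numpy as np

# Hypothetical file contents: a header row, a label column, and some comments
text = """label,n,n_squared
a,1,1
# this whole line is a comment
b,2,4  # a trailing note
c,3,9
"""

data = np.genfromtxt(
    io.StringIO(text),
    delimiter=",",       # values are separated by commas
    skip_header=1,       # skip the row of column names
    usecols=(1, 2),      # read only the 2nd and 3rd columns (skip the text labels)
    comments="#",        # drop everything from "#" to the end of a line
)

print(data.shape)        # → (3, 2)
print(data)
```

Skipping the label column with usecols matters here because genfromtxt would otherwise try to convert the text labels to numbers.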
10.3 Writing Files
Writing Python data to a file is as simple as reading a file. Just as when you are reading a file, the method to use is determined by the type of data that you are writing. If your data is strictly numerical information stored in an array, NumPy has a function that will quickly save the data to a file. If your data is rife with inconsistencies, non-numerical data, etc., you’ll have to use Python’s native write function.
10.3.1 Writing Line by Line
Sometimes the data file that you want to write includes some text or other non-numerical data. For example, what if you wanted to write the following data to file:
Planet | Acceleration due to gravity (m/s\(^2\)) |
---|---|
Earth | 9.8 |
Moon | 1.6 |
Mars | 3.7 |
Venus | 8.83 |
Saturn | 11.2 |
Uranus | 10.5 |
Neptune | 13.3 |
Pluto | 0.61 |
Jupiter | 24.5 |
Sun | 275 |
In this case you must open the file you want to write to and write each line of the file one by one. This requires that you use a loop to iterate over the data. An example is given below.
planets = ["Earth","Moon","Mars","Venus","Saturn","Uranus","Neptune","Pluto","Jupiter","Sun"]
g =[9.8,1.6,3.7,8.83,11.2,10.5,13.3,0.61,24.5,275]
f = open("planets.txt","w")
f.write("Planet g (m/s^2)\n")
f.write("------------------\n")
for idx,planet in enumerate(planets):
    f.write(f"{planet:10s} {g[idx]:5.2f} \n" )
f.close()
10.3.2 The writelines function
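If the data you want to write to file is already in a list of strings, the writelines() function can be used once to write all of the lines in the file. Note that writelines() does not add line breaks, so each string must end with a newline character (\n) if it is to appear on its own line.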
planets = ["Planet g (m/s^2)","------------------", "Earth 9.8","Moon 1.6","Mars 3.7","Venus 8.83","Saturn 11.2","Uranus 10.5","Neptune 13.3","Pluto 0.61","Jupiter 24.5","Sun 275"]
f = open("planets.txt","w")
f.writelines(line + "\n" for line in planets) # writelines() adds no line breaks, so append "\n" to each string
f.close()
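One detail of writelines() is easy to miss: it writes the strings exactly as given and never inserts line breaks. The sketch below writes the same short list twice, with and without newline characters, and reads each result back. The file name writelines_demo.txt is a placeholder chosen for this example.

```python
lines = ["Earth 9.8", "Moon 1.6", "Mars 3.7"]

# Case 1: strings written as they are, with no "\n" at the end
f = open("writelines_demo.txt", "w")     # placeholder file name
f.writelines(lines)
f.close()

f = open("writelines_demo.txt")
without_newlines = f.readlines()
f.close()
print(without_newlines)    # → ['Earth 9.8Moon 1.6Mars 3.7']

# Case 2: "\n" appended to every string before writing
f = open("writelines_demo.txt", "w")
f.writelines([line + "\n" for line in lines])
f.close()

f = open("writelines_demo.txt")
with_newlines = f.readlines()
f.close()
print(with_newlines)       # → ['Earth 9.8\n', 'Moon 1.6\n', 'Mars 3.7\n']
```

In the first case the whole list ends up on a single line of the file, which is rarely what you want for tabular data.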
10.4 The with statement
Forgetting to close a file that you are writing to can be problematic, so you should always include the close() function when you are done. If you are worried about forgetting it, you can use a with block that will automatically close the file once the block terminates. An example is given below:
planets = ["Earth","Moon","Mars","Venus","Saturn","Uranus","Neptune","Pluto","Jupiter","Sun"]
g =[9.8,1.6,3.7,8.83,11.2,10.5,13.3,0.61,24.5,275]
with open("planets.txt","w") as f:
    f.write("Planet g (m/s^2)\n")
    f.write("------------------\n")
    for idx,planet in enumerate(planets):
        f.write(f"{planet:10s} {g[idx]:5.2f} \n" )
Even though no close() function is called, the file planets.txt will be automatically closed once the with block is terminated. A with block can also be used when reading data files.
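As a sketch of the reading case, the example below first writes a tiny file so that it is self-contained (the name with_demo.csv and its three rows are made up for illustration), then reads it back inside a with block.

```python
# Write a small file first so the example runs on its own (placeholder name)
with open("with_demo.csv", "w") as f:
    f.write("1,1\n2,4\n3,9\n")

numbers = []
squares = []
with open("with_demo.csv") as f:
    for line in f.readlines():               # same line-by-line pattern as before
        first, second = line.strip().split(",")
        numbers.append(int(first))
        squares.append(int(second))

print(f.closed)    # → True (the file was closed when the block ended)
print(numbers)     # → [1, 2, 3]
print(squares)     # → [1, 4, 9]
```

The closed attribute of the file object confirms that no explicit call to close() was needed.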
10.4.1 Using NumPy’s savetxt function
If the data is strictly numerical and contains no text, the savetxt function is a fast and efficient way to write the data to file. For example, maybe you have a two-dimensional array containing various columns of planetary data.
Radius ( \(\times 10^{6}\) meters ) | Mass ( \(\times 10^{23}\) kg ) | Acceleration due to gravity (m/s\(^2\)) |
---|---|---|
6.37 | 59.8 | 9.8 |
1.74 | 0.736 | 1.6 |
3.38 | 6.42 | 3.7 |
6.07 | 48.8 | 8.83 |
58.2 | 5680 | 11.2 |
23.5 | 868 | 10.5 |
22.7 | 1030 | 13.3 |
1.15 | 0.131 | 0.61 |
69.8 | 19000 | 24.5 |
696 | 19890000 | 275 |
from numpy import savetxt
data =[[6.37,59.8,9.8],[1.74,0.736,1.6],[3.38,6.42,3.7],[6.07,48.8,8.83],[58.2,5680,11.2],[23.5,868,10.5],[22.7,1030,13.3],[1.15,.131,0.61],[69.8,19000,24.5],[696,19890000,275]]
savetxt("planetsData.txt",data,fmt = "%5.2e")
The fmt keyword argument can be added to specify how to format the data. If the data is stored in separate one-dimensional arrays, you can pack them into a single list or tuple when using the savetxt function and it will write each data set to its own line in the file.
import numpy as np
radius = [6.37,1.74,3.38,6.07,58.2,23.5,22.7,1.15,69.8,696]
mass = [59.8,0.736,6.42,48.8,5680,868,1030,0.131,19000,19890000]
g = [9.8,1.6,3.7,8.83,11.2,10.5,13.3,0.61,24.5,275]
np.savetxt("planetsDataTwo.txt",(radius,mass,g),fmt = "%5.2e")
In this example, radius, mass, and g will be written to file in separate rows. If you’d like them to be written in separate columns, you must use np.transpose to flip the array before writing it to file, as shown below:
import numpy as np
radius = [6.37,1.74,3.38,6.07,58.2,23.5,22.7,1.15,69.8,696]
mass = [59.8,0.736,6.42,48.8,5680,868,1030,0.131,19000,19890000]
g = [9.8,1.6,3.7,8.83,11.2,10.5,13.3,0.61,24.5,275]
np.savetxt("planetsDataTwo.txt",np.transpose((radius,mass,g)),fmt = "%5.2e")
Other optional arguments that can be used with the savetxt function are given in the table below:
Argument | Description |
---|---|
delimiter | String or character to separate columns. |
newline | String or character to separate lines. |
header | String to be written at the beginning of the file. |
footer | String to be written at the end of the file. |
comments | String that will be prepended to the header and footer strings to mark them as comments. |
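The sketch below combines a few of these arguments and then reads the file back to check the round trip. It uses only the first two rows of the planetary table above; the file name savetxt_demo.csv and the column labels in the header are placeholders chosen for this example.

```python
import numpy as np

# First two rows of the planetary data: radius, mass, g
data = np.array([[6.37, 59.8, 9.8],
                 [1.74, 0.736, 1.6]])

np.savetxt(
    "savetxt_demo.csv",          # placeholder file name
    data,
    fmt="%.3e",                  # scientific notation with 3 decimal places
    delimiter=",",               # separate columns with commas
    header="radius,mass,g",      # written as the first line of the file
    comments="# ",               # prepended to the header to mark it as a comment
)

with open("savetxt_demo.csv") as f:
    first_line = f.readline()
print(first_line)                # the header line, starting with "# "

# genfromtxt treats the "#" line as a comment, so only the numbers come back
loaded = np.genfromtxt("savetxt_demo.csv", delimiter=",")
print(np.allclose(loaded, data)) # → True
```

Marking the header as a comment is what lets genfromtxt read the file back without a skip_header argument.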
10.5 Flashcards
- What does os.getcwd() do?
- What does os.chdir() do?
- What does os.mkdir() do?
- What does os.listdir() do?
- How do you open a file and read all of the lines in that file using readlines()?
- How do you open a file and read all of the lines in that file using numpy.genfromtxt()?
- What is the delimiter keyword argument used for when reading a file with numpy.genfromtxt()?
- How do you open and write to a file using the write function?
- How do you write to a file using numpy.savetxt?
- Where are the sacrament prayers located? (They are found in more than one location)
10.6 Exercises
This file contains the coefficients of \(100\) second-order polynomials. Each line in the file represents a single second order polynomial: \(y = a x^2 + b x+ c\) with the first number equal to \(a\), the second equal to \(b\) and the third equal to \(c\).
- Use any method of your choosing to read this file into Python.
- Use a loop (I’ll let you decide what kind of loop to use) to evaluate each polynomial at \(x = 2.5\) and add them all up. Hint: You should find that the sum of all of the polynomials is \(5285\).
# Python code here
- In the cell below, you will find arrays containing \(1000\) values for mass, length, and radius just as you did in the other homework problem.
- Using a single line of code, calculate the moments of inertia for the entire data set using the familiar formula: \[ I = {1\over 4} M R^2 + {1\over 12} M L^2\]
- Write this data to file first using the open and write functions. Mass values should be in column 1, lengths in column 2, radii in column 3, and moments of inertia in column 4. Add a header to the file to label the columns. Open the file and inspect it to verify that you did it right.
- Repeat the previous step but using the numpy.savetxt function. Open the file and inspect it to verify that you did it right.
from numpy.random import normal
mass = normal(5,.5,1000)
length = normal(2,.2,1000)
radius = normal(1,.1,1000)
- This file contains planetary data for all of the planets in our solar system formatted as a “.csv” file.
- Download the file and move it to a location on your computer that you are familiar with.
- Read the file using the open and readlines functions.
- Repeat the previous step using numpy.genfromtxt().
- Using a single line of code, calculate the average eccentricity of the planets.
- Using a single line of code each, determine the maximum and minimum acceleration due to gravity of the planets.
- Use the skip_header and usecols keyword arguments to tell genfromtxt to skip the first row and column.
# Python code here
- This file contains the output from a quantum mechanical calculation on a cobalt-based alloy. The file is pretty large and the data is not well-structured at all even though there is valuable information inside. Reading this file is definitely a job for open and readlines rather than numpy.genfromtxt(). Periodically in this file there are lines that look like this:
free energy TOTEN = -72.3092 eV
and you need to extract the numbers from all of these lines. Use the following steps to build a list containing these numbers:
- Use the open and readlines functions to read this file into a list of strings.
- Iterate over this list and look for lines that contain the string “TOTEN”.
- When you find a line that has this string in it, split the string and extract the number. Make sure you convert the number into a float.
- Append this number to a list that you defined initially to be empty.
- The number of energies in your list.
- The average of the last 5 entries.
- The minimum and maximum energies in your list.
- Use the
# Python code here