Lecture 36a: Files

posted Apr 5, 2013, 5:51 AM by Samuel Konstantinovich   [ updated Apr 10, 2013, 10:07 AM ]

#Before you can use a file you must open it with the open(filename, mode) function. The filename is a string such as 'start.txt' or "data.bat".
#To read a file, use 'r' as the mode; to write to a new file, use 'w'.
#Make a text file called first.txt, that contains the following information: (without triple quotes)
'''this is
a three
line file'''
#Be sure to save this file in the same folder as the Python program that will open it! To keep things simple, save all your code in the same place so you don't lose your labs.
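
#If you would rather create the file from Python than in a text editor, the sketch below is one way to do it with 'w' mode.
#Assumptions: the names demo_path, out, check and contents are made up for this example, and the file is written to a temporary folder so nothing of yours gets overwritten. In your own lab you would simply use 'first.txt'.
#print is written with parentheses so the sketch runs in Python 2 or Python 3.

```python
import os
import tempfile

# Hypothetical location: a temporary folder instead of your lab folder.
demo_path = os.path.join(tempfile.mkdtemp(), 'first.txt')

out = open(demo_path, 'w')   # 'w' creates the file (or erases an existing one)
out.write('this is\n')       # write() does NOT add the newline for you
out.write('a three\n')
out.write('line file')       # no newline after the last line
out.close()                  # closing makes sure the text is saved

# Open it again in read mode to check what was written.
check = open(demo_path, 'r')
contents = check.read()
check.close()
print(contents)
```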

#Your python program could open it to read with the command:
x = open('first.txt', 'r')

#There are three ways to read from a file: read(), readlines(), and readline(). Choose ONE of these methods for any given open file and stick with it.

#read() will get the entire contents of a file as one string 
s = x.read()
#You have now used up all of the x file and should NOT try to read from it again; just close it.
#All of the text is stored in the variable s; you can print s to see the whole file.

#readlines() gets the entire contents of a file as a list of strings, where each string in the list is a separate line of the file.
#(This needs a freshly opened x; if you already called read() on x, nothing is left and you get an empty list.)
b = x.readlines()
#You have now used up all of the x file and should NOT try to read from it again; just close it.
#Now b is the list: ['this is\n', 'a three\n', 'line file']
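
#To see what "used up" means, here is a small sketch that reads the same three-line file both ways.
#Assumptions: demo_path, demo, whole, again and pieces are names invented for this example, and the file is rebuilt in a temporary folder so the sketch runs on its own.

```python
import os
import tempfile

# Rebuild the three-line file in a temporary folder (hypothetical location).
demo_path = os.path.join(tempfile.mkdtemp(), 'first.txt')
setup = open(demo_path, 'w')
setup.write('this is\na three\nline file')
setup.close()

# read(): the whole file as ONE string.
demo = open(demo_path, 'r')
whole = demo.read()
again = demo.read()    # the file is used up, so this is the empty string
demo.close()

# readlines(): the whole file as a LIST of lines (needs a fresh open).
demo = open(demo_path, 'r')
pieces = demo.readlines()
demo.close()

print(repr(whole))
print(repr(again))
print(pieces)
```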

#readline() gets one line of the file, starting with the first line. Each call to readline() gets the next line of text until there is no file left, in which case readline() returns the empty string "".
#Normally we use readline() in a loop, which I give an example of later. For now I will not use a loop so you can see what happens (starting from a freshly opened x):
a = x.readline()
#a now contains 'this is\n'
#You have NOT used up all of the x file, and you need to read from it again if you want to see the rest of the file.

a = x.readline()
#a now contains 'a three\n'

a = x.readline()
#a now contains 'line file'

a = x.readline()
#a now contains '', so you NOW know you used up the file. You should NOT try to read from it again; just close it.
x.close()
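
#Forgetting to close a file is easy. Python's with statement (not covered in these notes, shown here only as an optional alternative) closes the file for you when the indented block ends.
#Assumptions: demo_path, handle, whole and was_closed are names invented for this sketch, and the file lives in a temporary folder.

```python
import os
import tempfile

# Rebuild the three-line file in a temporary folder (hypothetical location).
demo_path = os.path.join(tempfile.mkdtemp(), 'first.txt')
setup = open(demo_path, 'w')
setup.write('this is\na three\nline file')
setup.close()

# The file is open only inside the indented block.
with open(demo_path, 'r') as handle:
    whole = handle.read()

# No handle.close() needed: leaving the block closed the file.
was_closed = handle.closed
print(was_closed)
```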

fileName1 = 'general.txt'
fileName2 = 'romeoandjuliet.txt'
f = open(fileName1, 'r')
#The 2nd parameter can be 'r', 'w', or 'a' (read, write, append).
#Be aware that write mode erases the file.
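
#A quick way to convince yourself of the difference between 'w' and 'a' is the sketch below.
#Assumptions: modes_path, m, after_append and after_write are names invented for this example, and the file is a throwaway one in a temporary folder.

```python
import os
import tempfile

# Throwaway file in a temporary folder (hypothetical name).
modes_path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

m = open(modes_path, 'w')    # write mode: start with an empty file
m.write('first\n')
m.close()

m = open(modes_path, 'a')    # append mode: keep what is there, add to the end
m.write('second\n')
m.close()

m = open(modes_path, 'r')
after_append = m.read()
m.close()

m = open(modes_path, 'w')    # write mode again: the old contents are erased
m.write('third\n')
m.close()

m = open(modes_path, 'r')
after_write = m.read()
m.close()

print(repr(after_append))
print(repr(after_write))
```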

#1st example: read()
#this will get all of the contents of the file and put it into the variable
text = f.read()
#you can then print it or do whatever you want with it
print "The regular text version:"
print text

print "The split version:"
print text.split()

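#2nd example: readlines()
#makes a list of all of the lines (like split, but keeps the \n at the end)
f.close()  #read() above used up f, so close it...
f = open(fileName1, 'r')  #...and open the file again
lines = f.readlines()
print lines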

#the 1st and 2nd examples are fine for small files, but you don't want
#to do this with very large files! If you do, Python will load the
#entire file into memory. 
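
#One memory-friendly alternative (not used elsewhere in these notes) is to loop over the file object itself with a for loop, which hands you one line at a time.
#Assumptions: big_path, big, each_line, line_count and longest are names invented for this sketch, and the tiny three-line file stands in for a very large one.

```python
import os
import tempfile

# A small stand-in for a large file, in a temporary folder (hypothetical).
big_path = os.path.join(tempfile.mkdtemp(), 'first.txt')
setup = open(big_path, 'w')
setup.write('this is\na three\nline file')
setup.close()

line_count = 0
longest = ''
big = open(big_path, 'r')
for each_line in big:        # only one line is held in memory at a time
    line_count += 1
    if len(each_line) > len(longest):
        longest = each_line
big.close()

print(line_count)
print(repr(longest))
```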

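#3rd example: readline()
#readline will allow you to get one line of the file at a time.
f.close()  #f was used up by the earlier example, so close it...
f = open(fileName1, 'r')  #...and open the file again
line1 = f.readline()
line2 = f.readline()
print line1
print line2
#notice that print adds its own newline after the newline already at the end of each line
#you can keep calling readline to get the next line of the file

#or use a loop to do this with every remaining line
line = f.readline()
while line:  #line is False when it is the empty string
    print line
    line = f.readline()

#IMPORTANT: YOU SHOULD CLOSE YOUR FILE after you finish using it:
f.close()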
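
#If the doubled blank lines bother you, one option is to strip the trailing '\n' from each line before printing it. The sketch below does that inside a readline() loop.
#Assumptions: demo_path, demo, cur, clean and shown are names invented for this example, and the file is rebuilt in a temporary folder.
#A blank line in the middle of a file is '\n', not '', so it does not end the loop early.

```python
import os
import tempfile

# Rebuild the three-line file in a temporary folder (hypothetical location).
demo_path = os.path.join(tempfile.mkdtemp(), 'first.txt')
setup = open(demo_path, 'w')
setup.write('this is\na three\nline file')
setup.close()

shown = []
demo = open(demo_path, 'r')
cur = demo.readline()
while cur:                       # '' (end of file) counts as False
    clean = cur.rstrip('\n')     # drop the newline that came from the file
    print(clean)                 # print supplies its own newline
    shown.append(clean)
    cur = demo.readline()        # move on, or the loop never ends
demo.close()
```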

fileName = 'romeoandjuliet.txt'
f = open(fileName, 'r')

text = f.read()
text = text.replace("SCENE","")
words = text.split()

while i<len(words):

#below this is some fun stuff that you aren't expected to be able to replicate (yet)
#use some lists to count the frequencies of each word...
listwords = []  #each different word, lowercased
counts = []     #counts[i] is how many times listwords[i] appears
for a in words:
    if not a.lower() in listwords:
        listwords.append(a.lower())
        counts.append(0)
    counts[ listwords.index(a.lower()) ] +=1

freq = []
for i in range(len(listwords)):
    freq.append( [counts[i],listwords[i]])
    if counts[i] >0  and len(listwords[i])>13:
        print counts[i],"\t:","\t",listwords[i]

print freq

#IMPORTANT: YOU SHOULD CLOSE YOUR FILE after you finish using it:
f.close()
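
#The same kind of word count can also be sketched with a dictionary instead of two parallel lists. This is a different technique from the list version in these notes, shown only for comparison.
#Assumptions: sample, tally, key and pairs are names invented for this example, and a short made-up string stands in for the text of the play.

```python
# A short made-up sample instead of the whole play.
sample = "Romeo Romeo wherefore art thou Romeo"

tally = {}                               # maps each lowercased word to its count
for w in sample.split():
    key = w.lower()
    tally[key] = tally.get(key, 0) + 1   # get() gives 0 for a word not seen yet

# Build [count, word] pairs like freq, then sort so the biggest count is first.
pairs = sorted([[tally[k], k] for k in tally], reverse=True)
print(pairs)
```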