Lecture 36b: (LAB08)

posted Apr 8, 2013, 6:08 AM by Samuel Konstantinovich   [ updated Apr 10, 2013, 6:04 AM ]
READ THIS DESCRIPTION COMPLETELY.

Attached to the lecture are two files.

1. The lab is described in the lecture notes. This is the most important part.
2. Attached is a py file with some runnable examples (filesLab.py).
3. Attached is the data file (yankees.csv).
4. You must make a new py file, LAB08.py or LAB08Fileprocessing.py, where you will write the solutions to problems 1-4.
 
The lecture notes have important information about the file format, and hints on what you should do.

Data file format: 
The file is a representation of this table:
Rk   Year    Tm                  Lg        G     W    L...
2    2012    New York Yankees    AL East   162   95   67...
3    2012    New York Yankees    AL East   160   94   66...
4    2012....
...
...

You can open the file in a spreadsheet program or a text editor. THIS IS NOT REQUIRED, but it can help you see the data better.
-You can open it in gedit to see it unformatted.
-If you open it using LibreOffice, BE CAREFUL: make sure to have ONLY 'commas' CHECKED when it asks you how to read the file. NOT spaces, NOT semicolons.


The first line of this file has the 'names' for each column in the format:
'Rk,Year,Tm,Lg,G,W,L,Ties,W-L%,Finish,Playoffs,R,RA,BatAge,PitchAge,#Bat,#Pitch,Top Player,Managers\n'

The next line and all subsequent lines are actual data, in the format:
'2,2012,New York Yankees,AL East,162,95,67,0,0.586,1st of 5,Lost ALCS (4-0),804,668,32.7,30.3,45,23,R.Cano (8.5),Joe Girardi (95-67)\n'

HINTS:

You can split the header to help you find things:
['Rk', 'Year', 'Tm', 'Lg', 'G', 'W', 'L', 'Ties', 'W-L%', 'Finish', 'Playoffs', 'R', 'RA', 'BatAge', 'PitchAge', '#Bat', '#Pitch', 'Top Player', 'Managers\n']

Then split each line after the header. The index of a name in the header matches the index of its value in the data line:
['2', '2012', 'New York Yankees', 'AL East', '162', '95', '67', '0', '0.586', '1st of 5', 'Lost ALCS (4-0)', '804', '668', '32.7', '30.3', '45', '23', 'R.Cano (8.5)', 'Joe Girardi (95-67)\n']
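
The splitting hint can be tried on the two sample lines quoted above. This is only a minimal sketch, not the lab solution: the variable names (header, row, names, values, position) are made up for illustration, and strip() is used here to drop the trailing '\n' that the lists above still show on the last field. Single-argument print(...) is used so the code runs in both Python 2 and Python 3.

```python
# The two sample lines quoted above, as readlines() would return them.
header = 'Rk,Year,Tm,Lg,G,W,L,Ties,W-L%,Finish,Playoffs,R,RA,BatAge,PitchAge,#Bat,#Pitch,Top Player,Managers\n'
row = '2,2012,New York Yankees,AL East,162,95,67,0,0.586,1st of 5,Lost ALCS (4-0),804,668,32.7,30.3,45,23,R.Cano (8.5),Joe Girardi (95-67)\n'

# strip() removes the trailing newline so the last field is clean,
# then split(',') breaks the line into one string per column.
names = header.strip().split(',')
values = row.strip().split(',')

# list.index() gives the position of a column name in the header.
position = names.index('W')

# The same position in the data line holds that column's value (still a string).
wins_text = values[position]

print(position)
print(wins_text)
```

The same lookup works for any header name, which is why the functions below can take columnName as a parameter.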

You can see that
'Year' is index 1, and '2012' is index 1.
You can also see that the numbers are strings, and need to be converted to integers (or to floats, for values like '0.586') before you can calculate with them.
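
A short illustration of why the conversion matters, using values taken from the sample line above. The variable names are hypothetical; int() and float() are the standard Python conversions.

```python
games_text = '162'
wins_text = '95'

# With strings, '+' joins the text end to end instead of adding.
joined = games_text + wins_text

# int() converts whole-number text so that '+' really adds.
total = int(games_text) + int(wins_text)

# Text with a decimal point needs float(); int('0.586') would raise a ValueError.
pct = float('0.586')

print(joined)
print(total)
print(pct)
```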

Complete these problems:
Your functions may assume that fileText is the list returned by readlines() on any file with the same format as yankees.csv.
Your functions may assume that columnName is a valid header in the file.
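
For reference, here is one way a fileText list can be produced. This is a sketch under assumptions: so that it runs on its own, it first writes a shortened stand-in file (only the first seven columns, one data line) to a temporary folder. In the lab you would open yankees.csv directly instead. The names sample and path are made up for illustration.

```python
import os
import tempfile

# Shortened stand-in for yankees.csv (an assumption for this sketch only).
sample = ('Rk,Year,Tm,Lg,G,W,L\n'
          '2,2012,New York Yankees,AL East,162,95,67\n')

path = os.path.join(tempfile.mkdtemp(), 'sample.csv')
with open(path, 'w') as f:
    f.write(sample)

# readlines() returns a list with one string per line, each ending in '\n'.
with open(path, 'r') as f:
    fileData = f.readlines()

print(len(fileData))
print(fileData[0])
```

Here fileData[0] is the header line and every later element is one data line, which is the shape the functions below expect for fileText.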

1. Write a function getSum(fileText,columnName) that calculates the sum of any numerical column in a file that has the format of yankees.csv.


This takes a list of strings, fileText, and a string, columnName.
e.g.
print getSum(fileData,'G')   
#this calculates the total number of games played
#since fileData is a list of strings from a file, and 'G' is a column in the file

#2 Write a similar function getAverage(fileText,columnName) that calculates the average of a numerical column in a file that has the format of yankees.csv.
e.g.
print getAverage(fileData,'W')
#would print the average number of wins

#3
#Write a function yearOfMax(fileText,columnName) that tells you in which year the team had the highest value of a specific column:
e.g.
print yearOfMax(fileData,'L') 
#this would tell you which year the team lost the most games.


#4 Calculate the average win % of the team. You cannot just average the W-L% column, because each year had a different number of games!
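
To see why averaging the percentages goes wrong, here is a small worked example with two made-up seasons (these numbers are NOT from yankees.csv). It compares the plain average of the two percentages with total wins divided by total games, which is one reading of what the problem asks for. float() is used so the division behaves the same in Python 2 and Python 3.

```python
# Made-up seasons for illustration only, as (games, wins) pairs.
seasons = [(10, 8), (100, 50)]

# Naive approach: average the per-season percentages (0.8 and 0.5).
naive = sum(float(w) / g for g, w in seasons) / len(seasons)

# Weighted approach: total wins divided by total games (58 out of 110).
total_games = sum(g for g, w in seasons)
total_wins = sum(w for g, w in seasons)
overall = float(total_wins) / total_games

print(naive)
print(overall)
```

The short season counts just as much as the long one in the naive average, so it pulls the result up to 0.65, while the team actually won 58 of 110 games, or about 0.527.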














Attachments:
filesLab.py (3k), Samuel Konstantinovich, Apr 8, 2013, 6:08 AM
yankees.csv (14k), Samuel Konstantinovich, Apr 8, 2013, 6:08 AM