2014-04-30 Final Baseball Lab

Posted Apr 30, 2014, 5:38 AM by Samuel Konstantinovich [updated May 1, 2014, 6:53 AM]
Your final job is to make a Yankees stats website. You have until Sunday night to get it working!

It must be a .py file that calculates the stats.

(The problems are numbered by difficulty. If you do the big table from part 2 first, still print it at the end of your file.)

0. When you are done, you will choose a second team to compare stats with. The stats come from:
http://www.baseball-reference.com/

I grabbed the table from this page and exported it as CSV:
http://www.baseball-reference.com/teams/NYY/

Note: The format of the new data files from the Baseball-Reference website is SLIGHTLY different:
Rk,Year,Tm,Lg,G,W,L,Ties,W-L%,pythW-L%,Finish,GB,Playoffs,R,RA,BatAge,PAge,#Bat,#P,Top Player,Managers
To fix this, open the file in a spreadsheet program (remember to separate on commas only!!!) and:
a. Delete the 2 extra columns (pythW-L% and GB).
b. Rename PAge -> PitchAge and #P -> #Pitch.
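If you would rather not use a spreadsheet, steps a and b can also be done in Python. This is a minimal sketch that assumes Python 3 and the standard csv module. Unlike a plain split(','), the csv module keeps a quoted field that contains a comma together as one cell. The function names cleanRows and cleanFile and the file names in the usage line are made up for illustration.

```python
import csv

# The columns to delete and rename, taken from steps a and b above.
DROP = {"pythW-L%", "GB"}
RENAME = {"PAge": "PitchAge", "#P": "#Pitch"}


def cleanRows(rows):
    """Given CSV rows (header row first), return new rows with the DROP
    columns removed and the RENAME columns renamed."""
    header = rows[0]
    # indices of the columns we want to keep, in their original order
    keep = [i for i, name in enumerate(header) if name not in DROP]
    newRows = [[RENAME.get(header[i], header[i]) for i in keep]]
    for row in rows[1:]:
        newRows.append([row[i] for i in keep])
    return newRows


def cleanFile(inName, outName):
    # newline="" is what the csv module docs ask for when opening CSV files.
    with open(inName, newline="") as f:
        rows = [row for row in csv.reader(f) if row]  # skip blank lines
    with open(outName, "w", newline="") as f:
        csv.writer(f).writerows(cleanRows(rows))
```

Usage (hypothetical file names): `cleanFile("NYY_raw.csv", "NYY.csv")`. This assumes every data row has as many cells as the header.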

In the end you will have two web pages: a Yankees page and an OTHER_TEAM page. Each one should have a link to the other page at the top, so you can go back and forth between them.

1. Make a complete HTML website. A little style would be nice, but you can do that last. (3 points)
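Here is one way a .py file could build a complete page, sketched under the assumption of Python 3. The link to the other team's page comes first, and a few lines of CSS go in the head. The function name makePage and the file names yankees.html and other.html are hypothetical. You can print() the result or write it to a file, whichever way your class runs its .py pages.

```python
def makePage(title, otherTitle, otherFile, body):
    """Return a complete HTML page as one string, with a link to the
    other team's page at the top and a little CSS in the <head>."""
    parts = [
        "<!DOCTYPE html>",
        "<html>",
        "<head>",
        '<meta charset="utf-8">',
        "<title>" + title + "</title>",
        "<style>",
        "body { font-family: sans-serif; margin: 2em; }",
        "table { border-collapse: collapse; }",
        "th, td { border: 1px solid #999; padding: 2px 8px; }",
        "</style>",
        "</head>",
        "<body>",
        # the link to the other team's page goes first, at the top
        '<a href="' + otherFile + '">' + otherTitle + "</a>",
        "<h1>" + title + "</h1>",
        body,
        "</body>",
        "</html>",
    ]
    return "\n".join(parts) + "\n"
```

For example, `makePage("New York Yankees", "Other Team", "other.html", stuff)` would produce the Yankees page, where `stuff` is the HTML for your stats.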

2. Print all of the Yankees stats in a nice HTML table. Since this is a big table, put it at the bottom of your page. (3 points)
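This sketch turns a header and list-of-lists data (the shape suggested in the hints below) into an HTML table. It assumes Python 3. html.escape converts characters like & and < so that an odd value cannot break the page. The name htmlTable is hypothetical.

```python
import html


def htmlTable(header, data):
    """Return an HTML table: one row of <th> cells for the header, then
    one row of <td> cells for each inner list in data."""
    lines = ["<table>"]
    cells = "".join("<th>" + html.escape(str(h)) + "</th>" for h in header)
    lines.append("<tr>" + cells + "</tr>")
    for row in data:
        cells = "".join("<td>" + html.escape(str(v)) + "</td>" for v in row)
        lines.append("<tr>" + cells + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)
```

Because this function only builds a string, you decide where it goes. For this part, put it last in the page body.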

3. Print the totals for Wins, Losses, and Ties. (3 points)
Like this:

Total for all years:
W      2000
L       500
Ties      0
W-L%    80%   ***This has to be calculated from the totals, not added up or averaged***
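A sketch of part 3, assuming Python 3 and that W-L% means W / (W + L). That is the usual baseball definition, and it ignores ties. It matches the example above, since 2000 / (2000 + 500) is 80%. Averaging the yearly percentages instead would give a short season the same weight as a long one, which is why the percentage is computed from the totals. The function name totals is hypothetical.

```python
def totals(header, data):
    """Add up W, L and Ties over every year, then compute W-L% from
    those totals (never by adding or averaging yearly percentages)."""
    w = header.index("W")
    l = header.index("L")
    t = header.index("Ties")
    totalW = sum(int(row[w]) for row in data)
    totalL = sum(int(row[l]) for row in data)
    totalT = sum(int(row[t]) for row in data)
    # Assumes W-L% = W / (W + L); ties are left out.
    # float() avoids integer division if this is run under Python 2.
    pct = float(totalW) / (totalW + totalL)
    return totalW, totalL, totalT, pct
```

Usage: `W, L, T, pct = totals(header, data)`, then `format(pct, ".1%")` gives a string such as "80.0%" for the page.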

4. Print an HTML table listing each column name, its min and max values, and the year each one occurred. (3 points)
Do this for all integer columns. (You do not have to worry about the non-integer columns W-L%, BatAge, and PitchAge.)

Such as:
Column   MinValue   MinYear   MaxValue   MaxYear
G        160        1910      192        2004
W        50         2010      120        1995
L        30         1995      100        2010
Ties     0          1987      10         1942
...      ...        ...       ...        ...   etc.


Hints:
1. Make your data into a list of lists instead of a list of strings:
header = ["year","x","y"]
data = [
["1912","11","11"],
["1925","9","2"],
["1980","1","7"]
]

This can make your functions easier to work with.
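A minimal sketch of reading the CSV file straight into that shape, assuming Python 3 and the standard csv module. Unlike split(','), which is what the SAT notes at the bottom use, csv.reader keeps a quoted field that contains a comma as one cell. The name readTable and the file name yankees.csv are hypothetical.

```python
import csv


def readTable(lines):
    """Split CSV lines (an open file, or any list of strings) into
    (header, data), where data is a list of lists, one list per year."""
    rows = [row for row in csv.reader(lines) if row]  # drop blank lines
    return rows[0], rows[1:]
```

Usage:
`with open("yankees.csv", newline="") as f: header, data = readTable(f)`
Every cell is still a string, so convert a cell with int() before doing math on it.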

2. There are many ways to approach part 4, but you will need several helper functions to solve it.

You SHOULD already have functions from your other lab that find the max and the sum of a specific column. If you make the data a list of lists, you will have to modify these functions.
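This sketch shows what the modified functions might look like when each row is a list. minValueInCol(data,col) uses the name and argument order from the hint below. maxValueInCol and sumOfCol are hypothetical names chosen to match it, and your versions from the other lab will probably look different.

```python
def maxValueInCol(data, col):
    """Largest number in column col. Each row is now a list, so row[col]
    is one cell; it is still a string, so convert it with int()."""
    best = int(data[0][col])
    for row in data:
        if int(row[col]) > best:
            best = int(row[col])
    return best


def minValueInCol(data, col):
    """Smallest number in column col."""
    best = int(data[0][col])
    for row in data:
        if int(row[col]) < best:
            best = int(row[col])
    return best


def sumOfCol(data, col):
    """Total of all the numbers in column col."""
    total = 0
    for row in data:
        total += int(row[col])
    return total
```

The main change from a list-of-strings version is that `row[col]` replaces splitting each line on commas.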

You can write a function to use together with your max/min functions, such as:
yearOfValue(val,col,data), which gives you the first year whose value in the given column matches val.

Using the fake data above:
yearOfValue(9,1,data) -> 1925
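Here is a minimal sketch of yearOfValue in Python. The extra yearCol parameter is an added assumption and is not part of the hint. The fake data keeps the year in column 0, but in the real file Year is column 1 because Rk comes first, so you would call yearOfValue(val, col, data, 1). The function returns the year as the string stored in the data.

```python
def yearOfValue(val, col, data, yearCol=0):
    """Return the year of the first row whose number in column col equals
    val, or None if no row matches. yearCol says which column holds the year."""
    for row in data:
        if int(row[col]) == val:
            return row[yearCol]
    return None


data = [
    ["1912", "11", "11"],
    ["1925", "9", "2"],
    ["1980", "1", "7"],
]
print(yearOfValue(9, 1, data))  # prints 1925
```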


If you find the minimum value of column 1, you can then use the yearOfValue function:
minValue = minValueInCol(data,1)
minYear = yearOfValue(minValue,1,data)

Now you have the minimum value and its year. Do the same for the max value.
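This sketch puts the hints together for every column. It assumes Python 3 and the list-of-lists data. To stay short, it uses Python's built-in min, max and list.index in place of your own helpers; list.index finds the first matching row, just as yearOfValue does. It decides which columns are integer columns by trying int() on every value, so a column like W-L%, whose values are not whole numbers, gets skipped. A column with even one blank cell is skipped too, so check the output. The names isIntColumn and minMaxRows, and the skip parameter, are hypothetical.

```python
def isIntColumn(data, col):
    """True if every value in column col is a whole number such as '162'."""
    for row in data:
        try:
            int(row[col])
        except ValueError:
            return False
    return True


def minMaxRows(header, data, yearCol, skip=()):
    """Return [Column, MinValue, MinYear, MaxValue, MaxYear] for every
    integer column except yearCol and any column index listed in skip."""
    result = []
    for col in range(len(header)):
        if col == yearCol or col in skip or not isIntColumn(data, col):
            continue
        values = [int(row[col]) for row in data]
        low, high = min(values), max(values)
        # list.index gives the FIRST matching row, just like yearOfValue
        lowYear = data[values.index(low)][yearCol]
        highYear = data[values.index(high)][yearCol]
        result.append([header[col], str(low), lowYear, str(high), highYear])
    return result
```

For the real file, `minMaxRows(header, data, 1, skip=(0,))` uses Year (column 1) and leaves out Rk (column 0). You can hand the rows, along with the header ["Column", "MinValue", "MinYear", "MaxValue", "MaxYear"], to whatever HTML table function you wrote for part 2.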

BONUS CONTENT: notes from the SAT data that I was supposed to post:

infile=open('SAT2010.csv','r')


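#Period 3's version is on top, period 2's is on the bottom; there are some differences.
#Our goal is to write a new file with just the following data columns:
outfile=open('new.csv','w')
outfile.write("School Name,Math,Reading,Total Score\n")

lines= infile.read().split('\n')
infile.close()
lines = lines[1:]        # skip the header row
for line in lines:
    if len(line)>0:      # skip blank lines, like the empty one after the last \n
        line=line.split(',')
        # some schools have 's' instead of a score; skip any row with a field that is exactly 's'
        if not 's' in line:
            name = line[1]
            # a quoted school name contains a comma, which breaks split(','), so skip it
            if not '"' in name:
                # NOTE: period 2's version below reads these two columns the other
                # way around; check the header row to see which column is which.
                math = line[4]
                reading = line[3]
                total = str(int(math)+int(reading))
                result = name+','+math+','+reading+','+total+"\n"
                outfile.write(result)
outfile.close()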


#from period 2
'''

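# To run this version instead, comment out the period 3 code above and reopen
# infile and outfile here first, since that code closes both files.
text = infile.read()
lines = text.split("\n")

#header = lines[0].split(',')
#print(header)

#column 1 is the school name
#column 3,4 are the math,reading
lines = lines[1:]
#print(lines[:5])


for line in lines:
    # skip blank lines and rows with a quoted (comma-containing) field
    if not '"' in line and len(line)>0:
        line    = line.split(',')
        name    = line[1]
        math    = line[3]
        reading = line[4]
        # skip schools that have 's' instead of a score
        if math!="s" and reading!="s":
            total = int(math)+int(reading)
            newLine = name+","+math+","+reading+","+str(total)
            outfile.write(newLine+'\n')
outfile.close()
infile.close()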
'''

Attachment: SAT2010.csv (25k), uploaded by Samuel Konstantinovich, May 1, 2014, 6:02 AM