Lecture 44: (Lab D)

posted Apr 24, 2013, 6:26 AM by Samuel Konstantinovich   [ updated Apr 24, 2013, 6:26 AM ]
1. Take any text file with some words in it. 
2. Open the file and read it into a string.
3. Make a dictionary and build a frequency chart. That is, the keys are the words, and the values are the number of times each word occurs.

e.g.
If your text file contains:  "The wind man. The wind is awesome man!"
You would make a dictionary {'The':2, 'wind':2, 'man.':1,'man!':1,'is':1,'awesome':1}
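
One way steps 1-3 could look in Python is sketched below. This is only a starting point, not the required solution: the function names `read_text` and `count_words` and the filename `words.txt` are made up for illustration. Words are split on whitespace only, so 'man.' and 'man!' stay separate keys, as in the example above.

```python
def read_text(filename):
    """Read a whole text file into one string."""
    with open(filename) as f:
        return f.read()


def count_words(text):
    """Map each whitespace-separated word to the number of times it occurs."""
    freq = {}
    for word in text.split():
        # get() gives 0 the first time a word is seen
        freq[word] = freq.get(word, 0) + 1
    return freq


# With a real file (words.txt is a hypothetical name):
#   freq = count_words(read_text("words.txt"))
print(count_words("The wind man. The wind is awesome man!"))
```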

4. Make the dictionary use the upper-case version of each word, and strip all punctuation.
e.g.
If your text file contains:  "The wind man. The wind is awesome man!"
You would make a dictionary {'THE':2, 'WIND':2, 'MAN':2,'IS':1,'AWESOME':1}
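
A possible sketch of step 4 follows. It assumes punctuation only needs to be stripped from the two ends of each word, since the sample output in step 5 keeps the hyphen in A-BED and the apostrophe in ABASH'D. The names `normalize` and `count_normalized` are hypothetical.

```python
import string


def normalize(word):
    """Upper-case a word and strip punctuation from both ends (assumption: ends only)."""
    return word.strip(string.punctuation).upper()


def count_normalized(text):
    """Frequency chart keyed by the normalized form of each word."""
    freq = {}
    for raw in text.split():
        word = normalize(raw)
        freq[word] = freq.get(word, 0) + 1
    return freq


print(count_normalized("The wind man. The wind is awesome man!"))
```

Note that a token made up entirely of punctuation becomes the empty string here, which is what step 6 asks you to deal with.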

5. Write a text file that lists all of the words (alphabetized) and their frequencies in the format:
A: 976
A-BED: 1
A-FIELD: 1
A-WARPING: 1
ABANDON: 1
ABANDON'ST: 1
ABANDONED: 1
ABASH'D: 1
ABATE: 1
...
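
Step 5 might be sketched like this, assuming the dictionary from step 4 already exists. The function name `write_frequencies` and the output filename are hypothetical. Python's default string ordering puts the hyphen and the apostrophe before the letters, which agrees with the ordering in the sample above (A-WARPING before ABANDON, ABANDON'ST before ABANDONED).

```python
def write_frequencies(freq, filename):
    """Write one 'WORD: count' line per word, in alphabetical order."""
    with open(filename, "w") as out:
        # sorted() on a dictionary returns its keys in sorted order
        for word in sorted(freq):
            out.write("{}: {}\n".format(word, freq[word]))


# Usage (frequencies.txt is a hypothetical filename):
#   write_frequencies(freq, "frequencies.txt")
```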

6. Tweak your code so that it does not output the empty string or numbers. For example, the text:
"This is the symphony, that Schubert never ever finished. The divine comedy is next.
This is not a test. A test would be challenging. 1 1 1 1 2 3s"
should produce:
A: 2
BE: 1
CHALLENGING: 1
COMEDY: 1
DIVINE: 1
EVER: 1
FINISHED: 1
IS: 3
NEVER: 1
NEXT: 1
NOT: 1
SCHUBERT: 1
SYMPHONY: 1
TEST: 2
THAT: 1
THE: 2
THIS: 2
WOULD: 1
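
Here is one way the filter in step 6 could be written. Since "3s" does not appear in the expected output, this sketch assumes that any word containing a digit should be skipped, not just words that are entirely digits. The names `keep` and `count_clean` are hypothetical.

```python
import string


def keep(word):
    """True for a non-empty word with no digits in it (assumption based on '3s')."""
    return word != "" and not any(ch.isdigit() for ch in word)


def count_clean(text):
    """Frequency chart of upper-cased, punctuation-stripped words that pass keep()."""
    freq = {}
    for raw in text.split():
        word = raw.strip(string.punctuation).upper()
        if keep(word):
            freq[word] = freq.get(word, 0) + 1
    return freq


sample = ("This is the symphony, that Schubert never ever finished. "
          "The divine comedy is next.\n"
          "This is not a test. A test would be challenging. 1 1 1 1 2 3s")
freq = count_clean(sample)
for word in sorted(freq):
    print("{}: {}".format(word, freq[word]))
```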

7. Get a book from Project Gutenberg and run your program on it to see the results! You should strip away the header/footer that Project Gutenberg adds to the book. Compare your results with your neighbors' results.
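
If you would rather strip the header and footer in code than by hand, a rough sketch is below. It assumes the book's text is bracketed by one line starting with "*** START OF" and one starting with "*** END OF". The exact marker wording differs from file to file, so open your own file and adjust the markers to match. The function name `strip_gutenberg` and its parameters are hypothetical.

```python
def strip_gutenberg(text, start_marker="*** START OF", end_marker="*** END OF"):
    """Return the lines between the marker lines (markers are an assumption)."""
    lines = text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        if line.startswith(start_marker):
            start = i + 1   # the book begins on the line after the start marker
        elif line.startswith(end_marker):
            end = i         # the book stops on the line before the end marker
            break
    # If a marker is missing, that side of the text is left untouched
    return "\n".join(lines[start:end])
```

The result can then be passed to the same counting code you wrote for the earlier steps.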
