2014-05-07 Lab17

posted May 7, 2014, 5:40 AM by Samuel Konstantinovich   [ updated May 7, 2014, 6:23 AM ]
Your goal: Lab17
-Make a web page. (From now on, there is no HTML file in any assignment. You are making Python programs that generate web pages.)
-The web page opens two or more files that contain plain text books. (Don't spend time finding books in class; do that at home, when you don't need my help.)
-The output of the web page is a comparison of word frequencies, as outlined in Part I and Part II below.
-When you download two books in plain text:
  a. After you save your book, remove the Gutenberg header / footer. Just include the book itself.
  b. Replace all hyphens '-' with spaces ' '. This will fix issues like "end.--Next" showing up as a word.
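
Step b can also be done inside the program instead of by hand. This is a minimal sketch, assuming Python 3; the function name replace_hyphens is made up for illustration and is not part of the assignment.

```python
def replace_hyphens(text):
    """Return a copy of text with every hyphen turned into a space."""
    return text.replace("-", " ")


# "end.--Next" becomes "end.  Next", which splits into two separate words.
print(replace_hyphens("end.--Next").split())  # ['end.', 'Next']
```

Note that "end." still carries its period; stripping punctuation is a separate decision you make when you tally the words.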

Part I. You need to make 3 tables.
(make a table around the 3 tables, so you get side by side tables)
Wrapper table contains:
Hamlet                     Othello                    Highest %:
#Table1:                   #Table2:                   #Table3:
Word    Count    %         Word    Count    %         Book     Dif
a       700      3.0%      a       650      2.0%      Hamlet   1.0%
an      200      1.4%      an      198      0.7%      Hamlet   0.7%
at      133      0.3%      at      232      0.7%      Othello  0.4%
...                        ...
fish    2        0.001%    fish    0        0.0%      Hamlet   0.001%
...                        ...
The words should be in alphabetical order, and the rows of Table1 and Table2 have to correspond to each other. If one book does not have a word that the other book has, it still needs a row for that word, with a count of zero.

The 3rd table shows which book has the higher %, and the difference between the two percentages.
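
One way to get the side by side layout is a wrapper table with one row of headings and one row whose cells each hold a finished table. This is only a sketch, assuming Python 3; the name wrap_side_by_side and its parameters are hypothetical, and it expects the inner tables to already be HTML strings.

```python
def wrap_side_by_side(headings, tables):
    """Build a wrapper table: one heading row, then one row of inner tables.

    headings and tables are lists of strings of the same length;
    each entry in tables is already a complete <table>...</table> string.
    """
    out = "<table>\n<tr>"
    for heading in headings:
        out += "<th>" + heading + "</th>"
    out += "</tr>\n<tr>"
    for table in tables:
        # valign='top' keeps the inner tables lined up at the top edge.
        out += "<td valign='top'>" + table + "</td>"
    out += "</tr>\n</table>\n"
    return out


page = wrap_side_by_side(
    ["Hamlet", "Othello", "Highest %:"],
    ["<table></table>", "<table></table>", "<table></table>"],
)
print(page)
```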

Specifics that can help you:

1a. You should have a function that reads a book, and makes a dictionary of tallies.
1b. You should have a function that takes a dictionary and makes an inverted dictionary.
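
A possible shape for 1a and 1b, written as a sketch for Python 3. The names count_words, read_book and invert are made up here, and lowercasing words and stripping punctuation from their ends are assumptions: the assignment does not say how to treat case or punctuation. The inverted dictionary is assumed to map each tally to the list of words that have that tally.

```python
import string


def count_words(text):
    """Return a dictionary mapping each word in text to its tally."""
    tallies = {}
    for raw in text.split():
        # Assumption: ignore case and punctuation at the ends of a word.
        word = raw.strip(string.punctuation).lower()
        if word:
            tallies[word] = tallies.get(word, 0) + 1
    return tallies


def read_book(filename):
    """Read a plain text book and return its dictionary of tallies."""
    with open(filename) as book:
        return count_words(book.read().replace("-", " "))


def invert(tallies):
    """Return a dictionary mapping each tally to the list of words with it."""
    inverted = {}
    for word, count in tallies.items():
        inverted.setdefault(count, []).append(word)
    return inverted


tallies = count_words("To be, or not to be.")
print(tallies)          # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
print(invert(tallies))  # {2: ['to', 'be'], 1: ['or', 'not']}
```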

2. Make a function fillMissing(A,B) that takes two dictionaries of words+tallies, and checks all the keys of A that are not in B, and adds them to B with a tally of 0. (You can use this twice, once in each direction, to fill in all the missing words.)
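
A minimal sketch of fillMissing in Python 3, following the description above. It changes B in place and returns nothing, which is one reasonable reading of "adds them to B"; the sample dictionaries are invented.

```python
def fillMissing(A, B):
    """Add every key of A that is missing from B to B, with a tally of 0."""
    for word in A:
        if word not in B:
            B[word] = 0


hamlet = {"a": 700, "fish": 2}
othello = {"a": 650, "at": 232}

# Call it once in each direction so both books end up with the same words.
fillMissing(hamlet, othello)
fillMissing(othello, hamlet)

print(hamlet)   # {'a': 700, 'fish': 2, 'at': 0}
print(othello)  # {'a': 650, 'at': 232, 'fish': 0}
```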

3a. Make a function that takes a dictionary and makes a list of lists in the format of tables 1 and 2.
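
For 3a, something along these lines could work. It is a sketch for Python 3; the name make_count_rows, the header row, and the three decimal places in the percentage are all choices made for this example rather than requirements.

```python
def make_count_rows(tallies):
    """Return a list of rows [word, count, percent], sorted alphabetically.

    The percent is the word's tally divided by the total number of words
    in the book.
    """
    total = sum(tallies.values())
    rows = [["Word", "Count", "%"]]
    for word in sorted(tallies):
        count = tallies[word]
        percent = 100.0 * count / total if total else 0.0
        rows.append([word, count, "%.3f%%" % percent])
    return rows


for row in make_count_rows({"b": 1, "a": 3}):
    print(row)
```

Because the rows are sorted by word, two books that have been through fillMissing in both directions will produce tables whose rows line up.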

3b. Make a function that takes two dictionaries, and makes a list of lists in the format of table 3.
4. Make a function that takes a list of lists, and makes an HTML table out of it. (This makes the functions in 3a/3b much cleaner, by moving the tags to a separate place.)
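
Items 3b and 4 could be sketched like this in Python 3. The names make_diff_rows and make_table are hypothetical. The sketch assumes both dictionaries already hold the same words (fillMissing was run both ways), and the "Tie" label for equal percentages is an assumption, since the assignment does not say what to show in that case.

```python
from html import escape


def make_diff_rows(tallies_a, tallies_b, name_a, name_b):
    """Return rows [book with the higher %, difference], sorted by word."""
    total_a = sum(tallies_a.values()) or 1  # avoid dividing by zero
    total_b = sum(tallies_b.values()) or 1
    rows = [["Book", "Dif"]]
    for word in sorted(tallies_a):
        pct_a = 100.0 * tallies_a[word] / total_a
        pct_b = 100.0 * tallies_b.get(word, 0) / total_b
        if pct_a > pct_b:
            winner = name_a
        elif pct_b > pct_a:
            winner = name_b
        else:
            winner = "Tie"  # assumption: not specified in the assignment
        rows.append([winner, "%.3f%%" % abs(pct_a - pct_b)])
    return rows


def make_table(rows):
    """Turn a list of lists into an HTML table, one <tr> per inner list."""
    out = "<table border='1'>\n"
    for row in rows:
        out += "<tr>"
        for cell in row:
            # escape() keeps characters like < and & from breaking the page.
            out += "<td>" + escape(str(cell)) + "</td>"
        out += "</tr>\n"
    out += "</table>\n"
    return out


hamlet = {"x": 1, "y": 3}
othello = {"x": 2, "y": 2}
print(make_table(make_diff_rows(hamlet, othello, "Hamlet", "Othello")))
```

Keeping all the tags in make_table means the row-building functions only deal with plain Python lists.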

Part II.  (You need an inverted dictionary to get these)
After you complete the big table, you will put a summary of stats on top. 
You should list (but are not limited to):
1. A table of the 20 most common words in each book, and their tallies.
2. The number of unique words in each book.
3. A list of those words.
4. At least one more statistic you calculated.
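
The first three statistics could be computed roughly as follows. This is a sketch for Python 3 with made-up names (most_common, words_only_in). It reads "unique words" as words that appear in one book but not in the other, which is an interpretation and not something the assignment states; if "unique" just means distinct words, counting the keys with a nonzero tally is enough.

```python
def most_common(tallies, n=20):
    """Return up to n rows [word, tally], highest tally first.

    Uses an inverted dictionary (tally -> list of words); words that share
    a tally are listed alphabetically, and words with a tally of 0 are skipped.
    """
    inverted = {}
    for word, count in tallies.items():
        inverted.setdefault(count, []).append(word)
    result = []
    for count in sorted(inverted, reverse=True):
        if count == 0:
            continue
        for word in sorted(inverted[count]):
            result.append([word, count])
            if len(result) == n:
                return result
    return result


def words_only_in(A, B):
    """Return the sorted words that occur in book A but never in book B."""
    return sorted(w for w, count in A.items() if count > 0 and B.get(w, 0) == 0)


hamlet = {"a": 700, "an": 200, "fish": 2, "at": 133}
othello = {"a": 650, "an": 198, "fish": 0, "at": 232}

print(most_common(hamlet, 2))               # [['a', 700], ['an', 200]]
print(words_only_in(hamlet, othello))       # ['fish']
print(len(words_only_in(hamlet, othello)))  # 1
```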