/* IS2610 Assignment 7 template - digital word tree due 6PM Nov 10, 2005 As discussed in class your assignment for next week is to redo the word statistics from Voyage of the Beagle using a 27 wide digital (ie. radix like) search tree. The 27 way branching will be driven by your recoding of the alphabetic letters into the numbers 0 to 25 and the end of word termination as the number 26 which is defined as END below. Note that some of the stats, such as the number of unique and total words, will be different from what we got in previous assignments due to a change in the get_word() function. Previously get_word() also included numeric digits as part of a word but for this assignment we are only using alphabetic characters with the upper case letters folded into lower case. You will have to do the character to numeric code conversion as you work your way through each individual character of every word inside of your insert() function. Because the digital tree completely represents the search key (ie. the words) by their individual paths through the tree we do not explicitly store the words anywhere. This means that you have to be able to decode the tree paths into the actual character strings that make up the words so that you can properly report the mostfrequent_word and deepest_word. That is the purpose of the current[] array and the treewalk_level that are global variables that you will use from inside of your treewalk() function. As you recursively walk the tree you should increase and decrease the treewalk_level level so it corresponds to the depth of the recursive calls and returns. Then you can use the treewalk_level in conjunction with your looping across the paths from any node position to insert the proper corresponding character into the current[] array so that current will always contain the character string representation for your current tree position. This is also where you would do the reverse coding from the 0-26 code back into normal ASCII characters. That way you can simply strcpy the current[] string out into the most_frequent_word[] and deepest_word[] arrays as needed. My results are ... $ a.out < vbgle10.txt the root count is 209122 the tree contains 47761 nodes there are 12446 unique words out of 209122 total words most frequent word appears 16930 times the deepest word is at level 19 */ #include #include #include #include #define MAXLEN 100 // A very safe maximum length #define TWID 27 // The tree branching width #define END (TWID-1) // The code for a non letter is 26 struct node { // The tree node structure int count; struct node *np[TWID]; } root; // these will be the stats to collect int nodes_used, total_words, unique_words, most_frequent_count, deepest_level; char most_frequent_word[MAXLEN]; char deepest_word[MAXLEN]; // function prototypes void insert(struct node *, char *); void treewalk(struct node *); int get_word(char *); main(int argc, char *argv[]) { char word_buff[MAXLEN]; // the reusable word buffer while(get_word(word_buff)) insert(&root, word_buff); treewalk(&root); printf("the root count is %d\n", root.count); printf("the tree contains %d nodes\n", nodes_used); printf("there are %d unique words out of %d total words\n", unique_words, total_words); printf("most frequent word <%s> appears %d times\n", most_frequent_word, most_frequent_count); printf("the deepest word <%s> is at level %d\n", deepest_word, deepest_level); } void insert(struct node *np, char *word) { char c; /* ***** YOUR CODE TO INSERT A WORD IN THE TREE ***** */ } char current[MAXLEN]; // keep track of the current word during recursion int treewalk_level = -1; // keep track of the recursion depth void treewalk(struct node *np) { int i, c; /* ***** YOUR CODE TO WALK THE TREE TO COLLECT THE STATS ***** */ } #include // Leave this EXACTLY as it is // its slightly different from previous versions because it ignores digits int get_word(char *s) { int c; do { c = getchar(); if(c == EOF) return(0); } while(!isalpha(c)); do { if(isupper(c)) c = tolower(c); *s++ = c; c = getchar(); } while(isalpha(c)); *s = 0; return(1); }