Computer Literacy -- Project Description

To allow completion within the available time, I have limited the scope of the program. It does not currently attempt to infer facts from the material that it reads. It also does not yet support phrases or clauses. Finally, there is no support for nouns spanning more than one word (such as Alamogordo High School or Supercomputing Challenge).

The program was developed under the Linux operating system (SuSE 7.0, kernel version 2.2.16) with gcc version 2.95.2. However, it was written with portability in mind, and it also compiles with LCC-Win32 under Windows 95.

The program consists of four parts: the structure and constant definitions (Appendix A), the grammar functions (Appendix B), the database functions (Appendix C), and the main() routine (Appendix D).

The program first breaks the input sentence down into word structures. Each structure holds the word itself, its end punctuation mark (if any), the parts of speech that the dictionary allows for the word, and the final part of speech that the program determines is in use in this particular sentence. A dictionary routine then flags the possible parts of speech for each word. It reads a separate dictionary file; a sample is shown in Appendix E.

The array of word structures is then passed to a function that tests a list of valid sentence patterns against the possible parts of speech of each word, using rules about which parts of speech are allowed in each part of the sentence, to determine which sentence pattern is actually in use. Finally, the program identifies the part of speech of each word from the information provided by the dictionary and the detected sentence pattern. It also examines the sentence for chains of adjectives in which one or more adjectives may legally be used as a noun, and for verbs that may legally be used as nouns.

The resulting words and parts of speech are then stored in a database. The database is a simple list of sentences, where each sentence is a list of words and their parts of speech. Slight user intervention is currently required before the database can be queried: the query routine requires that the database file be terminated by the capital letter 'E' on its own line. When a query is made, the program first parses the query just as it parses normal statements, and then performs what is, for now, a rudimentary search of the database.
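The first stages described above (word structures, dictionary flagging, and sentence-pattern matching) can be illustrated with a minimal sketch. This is not the code from the appendices. The names struct word, POS_*, split_sentence, flag_words, and match_pattern, the in-memory dictionary table, and the two sample patterns are all assumptions made for illustration. In particular, the sketch keeps the dictionary in memory instead of reading a file, and reduces the pattern rules to one required part of speech per word.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical part-of-speech flags; the real constants are in Appendix A. */
enum { POS_NOUN = 1, POS_VERB = 2, POS_ADJ = 4, POS_ARTICLE = 8 };

#define MAX_WORD_LEN 32
#define MAX_WORDS 16

struct word {
    char text[MAX_WORD_LEN]; /* the word itself */
    char punct;              /* end punctuation mark, or '\0' if none */
    unsigned allowed;        /* parts of speech allowed by the dictionary */
    unsigned final_pos;      /* part of speech chosen for this sentence */
};

/* Tiny in-memory stand-in for the dictionary file (assumed entries). */
static const struct { const char *text; unsigned pos; } dict[] = {
    { "the", POS_ARTICLE },
    { "dog", POS_NOUN },
    { "red", POS_ADJ | POS_NOUN },
    { "barks", POS_VERB | POS_NOUN },
};

/* Assumed sentence patterns: one required part of speech per word slot. */
static const unsigned pat_anv[] = { POS_ARTICLE, POS_NOUN, POS_VERB };
static const unsigned pat_aanv[] = { POS_ARTICLE, POS_ADJ, POS_NOUN, POS_VERB };
static const struct { const unsigned *slots; int len; } patterns[] = {
    { pat_anv, 3 },
    { pat_aanv, 4 },
};

/* Case-insensitive comparison, written by hand for portability. */
static int same_word(const char *a, const char *b)
{
    while (*a && *b &&
           tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Split s on whitespace into at most max word structures; returns the count. */
int split_sentence(const char *s, struct word *out, int max)
{
    int n = 0;
    assert(max > 0);
    while (n < max) {
        size_t len = 0;
        while (isspace((unsigned char)*s)) s++;
        if (*s == '\0') break;
        memset(&out[n], 0, sizeof out[n]);
        while (*s != '\0' && !isspace((unsigned char)*s)) {
            if (len < MAX_WORD_LEN - 1) out[n].text[len++] = *s;
            s++;
        }
        /* Move a trailing punctuation mark into its own field. */
        if (len > 1 && ispunct((unsigned char)out[n].text[len - 1])) {
            out[n].punct = out[n].text[len - 1];
            out[n].text[--len] = '\0';
        }
        n++;
    }
    return n;
}

/* Flag every part of speech the dictionary allows for each word. */
void flag_words(struct word *w, int n)
{
    size_t d;
    int i;
    for (i = 0; i < n; i++)
        for (d = 0; d < sizeof dict / sizeof dict[0]; d++)
            if (same_word(w[i].text, dict[d].text))
                w[i].allowed |= dict[d].pos;
}

/* Try each pattern in order; on the first match, record the final part of
   speech of every word and return the pattern index. Returns -1 if none fit. */
int match_pattern(struct word *w, int n)
{
    size_t p;
    int i;
    for (p = 0; p < sizeof patterns / sizeof patterns[0]; p++) {
        if (patterns[p].len != n) continue;
        for (i = 0; i < n; i++)
            if ((w[i].allowed & patterns[p].slots[i]) == 0) break;
        if (i < n) continue;
        for (i = 0; i < n; i++) w[i].final_pos = patterns[p].slots[i];
        return (int)p;
    }
    return -1;
}
```

Under these assumptions, calling split_sentence, flag_words, and match_pattern in that order on "The red dog barks." selects the second pattern, so that "red", which the sample dictionary allows as either an adjective or a noun, is resolved to an adjective. Storing the allowed parts of speech as a bitmask keeps the test of a word against a pattern slot to a single AND operation.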