Write a program to parse English language story problems of a particular type ("Age" problems) to produce the algebraic equations which can be used to solve the problem. For example:
"A year ago, Gary was twice as old as Ron is now. In four more years, Ron will be as old as Gary is now. Neither one is yet a teenager. How old are Gary and Ron now?"
should produce the equations Gary-1=2*Ron and Ron+4=Gary.
Each edition of the Mensa Brain Puzzlers Page-A-Day Calendar that my wife and I work
through has a number of age-related problems similar to the above sample.
It occurred to me that the text of these problems was uniform enough that a program to examine the
text and convert it into the appropriate equations would be more interesting
than solving the puzzles by hand. The current program took about a
week of spare time to create and was more successful than I would have
expected.
The syntax analysis is far from being generalized, but it is adequate to handle
the 8 age problems in this
year's calendar, which were entered verbatim as "AgeTest1.txt" through
"Agetest8.txt" and included in the zip file downloads below. The problems
are converted in several stages using a number of word conversion tables.
Briefly, the analysis is table driven (to the extent that this was easy to
do), with the idea that I could enhance the parser and make it work for new problems
by changing the tables. There are 5 text files used as follows:
1. Un-needed words and delimiters are removed based on "UnNeededWords.tbl" file.
2. Names of the people are identified. Common initial capitalized words to
ignore are in "Initialwords.tbl". The other capitalized words must be
proper names. The names are used as variable names representing that
person's age.
3. Numbers are converted to a standard text form using the "Numbers.tbl" file;
"one" to "1", "twice" to "2*", etc.
4. Sentences are converted to a "canonical" form, replacing names with "&V", whole
numbers and fraction numerators with "&N", and denominators with "&D". Patterns in file
"OpWords.tbl" are tested against the canonical form, and matches are replaced with a
corresponding text phrase in equation form.
5. Numeric and name identifiers are then replaced with the original values and
the results displayed.
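The five steps above can be sketched in a few lines. This is only an illustration in Python, not the program's actual Delphi code: the tiny tables, the regular-expression pattern syntax, and the function name `parse_sentence` are all assumptions made up for the example, and they cover just the two sentences of the sample problem.

```python
import re

# Hypothetical miniature tables; the real program reads its tables from files.
UNNEEDED_WORDS = {"was", "is", "will", "be", "now", "more"}
INITIAL_WORDS = {"A", "In"}           # capitalized sentence starters, not names
NUMBERS = {"a": "1", "one": "1", "four": "4", "twice": "2*"}
OP_WORDS = [                           # (canonical pattern, equation template)
    (r"&N years? ago &V &N\* as old as &V", "&V-&N=&N*&V"),
    (r"in &N years? &V as old as &V", "&V+&N=&V"),
]

def parse_sentence(sentence):
    # Step 1: drop delimiters and un-needed words.
    words = [w for w in re.findall(r"[A-Za-z0-9']+", sentence)
             if w.lower() not in UNNEEDED_WORDS]
    names, numbers, canonical = [], [], []
    for w in words:
        if w[0].isupper() and w not in INITIAL_WORDS:
            # Step 2: remaining capitalized words are taken to be names.
            names.append(w)
            canonical.append("&V")
            continue
        # Step 3: convert number words to a standard text form.
        w = NUMBERS.get(w.lower(), w.lower())
        m = re.fullmatch(r"(\d+)(\*?)", w)
        if m:
            # Step 4a: remember the digits and leave a placeholder.
            numbers.append(m.group(1))
            canonical.append("&N" + m.group(2))
        else:
            canonical.append(w)
    text = " ".join(canonical)
    # Step 4b: look for a pattern that matches the canonical form.
    for pattern, template in OP_WORDS:
        if re.fullmatch(pattern, text):
            # Step 5: put the original names and numbers back, in order.
            fill = {"&V": iter(names), "&N": iter(numbers)}
            return re.sub(r"&[VN]", lambda mo: next(fill[mo.group()]), template)
    return None  # no pattern matched (e.g. "Neither one is yet a teenager.")

print(parse_sentence("A year ago, Gary was twice as old as Ron is now."))
print(parse_sentence("In four more years, Ron will be as old as Gary is now."))
```

The first sentence reduces to the canonical form "&N year ago &V &N* as old as &V" before matching, which is why one pattern can serve any pair of names and any numbers.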
Two problems made interesting coding tasks:
For the text forms represented by the 8 included sample files, the program works
quite well. The
resulting 2 equations in 2 unknowns are easily solved algebraically. A future
version may add the less
interesting solver code to produce numeric answers for the problems.
Addendum November 28, 2007: Here is Version 2, which does solve the age equations to report the numerical solution. While finding the solution using a linear programming technique would be simple, putting the expression in standard form to extract the coefficients might not have been. My alternative approach uses a recently posted Expression Evaluator object to find values for the left and right sides of each equation by trial and error. Using ages 1 through 20 for each person, we look for ages which make the left side expression equal to the right side expression for each equation and report the successful pair.
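The trial-and-error search is easy to picture with a short sketch. The version below is Python rather than Delphi, and Python's built-in `eval` (with builtins disabled) stands in for the Expression Evaluator object; the function name `solve_by_trial` and its parameters are assumptions for illustration only.

```python
from itertools import product

def solve_by_trial(equations, names, max_age=20):
    """Return every age assignment (1..max_age each) that satisfies all equations."""
    sides = [eq.split("=") for eq in equations]   # [left side, right side] per equation
    solutions = []
    for ages in product(range(1, max_age + 1), repeat=len(names)):
        env = dict(zip(names, ages))
        # eval is a stand-in for an expression evaluator; names resolve from env.
        if all(eval(lhs, {"__builtins__": {}}, env) == eval(rhs, {"__builtins__": {}}, env)
               for lhs, rhs in sides):
            solutions.append(env)
    return solutions

print(solve_by_trial(["Gary-1=2*Ron", "Ron+4=Gary"], ["Gary", "Ron"]))
# prints [{'Gary': 7, 'Ron': 3}]
```

With two people and ages 1 through 20 there are only 400 combinations to try, so there is no need to rearrange the equations into standard form.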
Addendum December 11, 2007: Version 2.1 posted today handles a couple of additional problems, mainly by adding a few entries to the Unneeded words table and the OpWords table. The last problem from today's Mensa calendar entry, Agetest10.txt, required a change to test all equations in pairs rather than simply using the first two equations, since the first two sentences yield equations which are essentially identical ("Rachael is now twice as old as Ryan will be in one year." and "In two more years, she'll be twice as old as Ryan will be then." yielding "Rachael=2x(Ryan+1)" and "Rachael+2=2x(Ryan+2)"). Either equation together with the 3rd sentence will produce a solution.
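A small sketch shows why the first two equations are not enough on their own. It is written in Python with `eval` standing in for the expression evaluator, and "accept the first pair with exactly one solution" is just one plausible reading of "test all equations in pairs". The third equation below is HYPOTHETICAL, invented only so that the example runs; it is not the one from the calendar problem.

```python
from itertools import combinations, product

def solutions(equations, names, max_age=20):
    """All age assignments in 1..max_age that satisfy every equation given."""
    found = []
    for ages in product(range(1, max_age + 1), repeat=len(names)):
        env = dict(zip(names, ages))
        if all(eval(l, {"__builtins__": {}}, env) == eval(r, {"__builtins__": {}}, env)
               for l, r in (eq.split("=") for eq in equations)):
            found.append(env)
    return found

def solve_in_pairs(equations, names):
    """Try each pair of equations; return the first pair with a unique solution."""
    for pair in combinations(equations, 2):
        found = solutions(pair, names)
        if len(found) == 1:
            return pair, found[0]
    return None

equations = [
    "Rachael=2*(Ryan+1)",
    "Rachael+2=2*(Ryan+2)",
    "Rachael=3*Ryan-1",     # HYPOTHETICAL third equation, for illustration only
]
names = ["Rachael", "Ryan"]

# The first two equations are equivalent, so many age pairs satisfy both.
print(len(solutions(equations[:2], names)))   # prints 9
print(solve_in_pairs(equations, names))
```

Because both of the first two equations simplify to Rachael = 2*Ryan + 2, pairing them narrows nothing down; the search has to move on to a pair that includes an independent equation.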
Addendum January 16, 2008: The new Mensa calendar started off 2008 with a problem about Nick's and his grandfather's ages which our program couldn't solve. The un-capitalized word "grandfather" was not recognized as a name and led to a new category of words, those which should be treated as names even though they do not begin with a capital letter. To reduce the growing number of word conversion table files, I replaced all of the individual text files with a single initialization file. AgeProblemTables.ini contains a section for each of the previous table files plus the new "Capitalized" section. Version 3 zip files contain AgeTest11.txt, the now solvable problem about Nick and his grandpa.
For the programmers, I should mention what I learned in using TInifile: the ReadSection method, which reads the names for a specific section, only recognizes entries that contain an "=" sign, even though the value (the data to the right of the equal sign) is not required. The Firstwords, UnNeeded, and Capitalize sections all use this "name only" format. The Numbers, Denominators, and OpWords sections use the ReadSectionValues method to read names and values.
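To make the two entry formats concrete, here is a sketch of what such a file might look like and how it could be read. The file contents are hypothetical (the real AgeProblemTables.ini is in the zip download), and the reader is Python's `configparser`, a different library from Delphi's TInifile whose behavior differs in detail; it is used here only to show "name only" entries next to name and value entries.

```python
import configparser

# HYPOTHETICAL excerpt; the section names follow the text, the entries are examples.
INI_TEXT = """
[Capitalize]
grandfather=

[Numbers]
one=1
twice=2*
"""

parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str            # keep entry names exactly as written
parser.read_string(INI_TEXT)

# "Name only" section: just the names are wanted (compare ReadSection).
capitalize = list(parser["Capitalize"])
# Name and value section: both sides of "=" are wanted (compare ReadSectionValues).
numbers = dict(parser["Numbers"])

print(capitalize)   # prints ['grandfather']
print(numbers)      # prints {'one': '1', 'twice': '2*'}
```

Note the trailing "=" after "grandfather": the entry carries no value, but the equal sign is still there, matching the format described above.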
Addendum April 8, 2008: Version 4 was posted today with a couple
of additional problems; there are 15 now, including a couple of variations of the same
problem with alternate text. There was some more tweaking of the parsing
tables and, to help debugging, I added buttons to reload the parsing tables
without restarting the program, and a "Backtest" button which runs all
available problems and displays a summary of results.
Done January, 2008
Handling a wider range of age problems would make the program smarter (not automatically!)
Done Nov. 28, 2007.
Original: November 14, 2007
Modified: November 07, 2008