Wednesday, September 22, 2010

An Overview of CAS

In this post, I intend to outline the basic problem of computer algebra, exposit the design of CAS, name its principle components, and briefly discuss possible directions of future development.

Note that most of this is written in very basic terms. A reader who has looked at the CAS source, or one who desires a technical description of each of CAS's components should skim the current posting (paying particular attention to the last two sections) and await the forthcoming series of explanations of CAS's constituent parts.


Computer Algebra
Everyone in a mathematical field has, at some point, performed symbolic computations. The purpose of a computer algebra system is to automate such computations, usually interactively. For the aspiring author of any such software, this suggests some questions that a sufficient design must answer:
  1. How should symbolic expressions be represented internally?
  2. How should operations be performed upon them?
  3. In what manner should the end-user interact with the program's calculational machinery?
CAS is my exploratory attempt to answer these questions. Below, I will, broadly, describe the design choices I made when writing CAS (keeping in mind the three questions above), leaving more complete, technical descriptions to future posts.


CAS
CAS is written in Python, for two principle reasons. Firstly, that Python possesses an interactive, dynamic nature, complete with REPL (a Read, Evaluate, Print Loop, for the uninitiated). Secondly, that before writing CAS, I was already familiar with Python. The nature of Python lent itself nicely to writing CAS as a module, then importing it into a REPL for interactive use -- effectively, extending Python to do symbolic mathematics. This enables a hypothetical end-user to use a well-known language to work in CAS, and provides them with an interface.

That decision mostly answered the third question above, and the answer to the first was easily found. Any mathematical expression can be represented by a tree. For example, consider a polynomial: x^2 + 2*x + 1. One tree representation looks like this (please pardon the ugly diagram):
Fig. 1 -- x^2+2*x+1
In an expression, nearly every operator has two arguments (as demonstrated above by the nodes with children), but expressions can also include functions (which often take just one argument) or negations (which necessarily take only one argument). Consider, as an example, the function sin(x). The corresponding tree is shown below.
Fig. 2 -- sin(x)
Every expression can thus be represented in terms of trees of operators, functions, and their arguments -- trees are the logical way for CAS to represent expressions internally. The chief problem with this decision is that it seems rather unfair to an end-user to expect them to input expressions in tree form. How, then, to convert an expression in conventional notation into a tree? That is the job of the parser: to take input and transform it into a data structure usuable by the program -- in this case, that data structure is a tree. I shall write more of the parser shortly.

The second question yet remains, and here is the heart of the program. It is necessary to choose one's data structures carefully, and a good user interface is important, but the most interesting part of writing CAS was in finding a  good way to perform the computations.

From an algorithmic point of view, symbolic mathematics can be regarded as consisting of rules for applying operations to expressions. I decided that the best way to translate this to code lay in building a rule-set and machinery to determine which rules in the set apply to a given expression. Then, it remains only to apply the correct rules in the correct order and return the result. This is the basic idea behind both the differentiator and the simplifier. In the first, we have the familiar rules of differentiation, and in the second, a small subset of the procedures for algebraic simplification.

The implementation of this idea was simple; a basic form of the algorithm is stated below.
  1. Take the root node of the tree-form of a given expression.
  2. What is it? E.g., is it an operator? A function? Determine the root object's type, and look up an appropriate rule in the rules-table.
  3. Apply the rule to the top-level of the expression (i.e., the root node and its children).
  4. Recurse down the tree: take each of the root node's children and apply this algorithm to them.
In this way, the common aspects of two distinct problems (i.e., those of differentiation and simplification) were abstracted away, and all that remained to be done was to specify domain-specific rules for each (and those rules are simply the algorithms of symbolic differentiation and a subset of those of algebraic simplification).

Now that the design questions are answered, I would like to enumerate the components of CAS, each of which will receive a more complete treatment in its own post.


The Components
The principal parts of CAS are found each in its own file, the names and brief descriptions of which follow:
  • cas -- the base module itself, constituting the interface code
  • parser -- the machinery for transforming symbolic expressions in traditional notation to ones in tree form
  • differentiation -- the machinery for differentiation of symbolic expressions
  • integration -- a place-holder for a future integrator of expressions
  • simplification -- a very basic simplifier to clean up the output of the differentiator
I will begin my series of posts on these starting with the parser, and omitting until some future point the integrator.


Future Development
The most obvious potential improvement to CAS would be the addition of an integrator. Integration, in the most general case, is a difficult problem -- there are no nice, simple rules like there are for differentiation. I do not expect to be able to build, quickly, a sophisticated integrator, but a primitive rule-based one should not be too difficult to write -- watch this space!

Of more immediacy is the addition of the remainder of the full complement of trigonometric functons (i.e., secant, cosecant, cotangent) and the addition of the hyperbolic functions. As well, it would be interesting to extend the simplifier with some knowledge of the basic trigonometric identities.

No comments:

Post a Comment