Thursday 5 May 2011

Interpreters

Part of the software I was recently contracted to maintain included a small language and interpreter that was used to validate the data. While looking at extending the language to handle significantly more complex data, I began to realise that there wasn’t really any point in using an interpreter at all!

To understand when it’s appropriate to use an interpreter, it is worth looking at the different types of interpreter that are available.

The first group are those that directly interpret the source code. Many early BASIC implementations were of this type, and so are Unix shells and the MS-DOS batch file interpreter. These interpreters read the source code (often called a script) and immediately perform whatever operations are specified. Usually, there are various optimisations, but for the purpose of this discussion, these interpreters can be viewed as just described.
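As a rough illustration of the "read a line, act on it immediately" behaviour described above, here is a minimal sketch in Python. The three-command language (`set`, `add`, `print`) and the function name `run_direct` are invented for this example; they are not taken from any real shell or BASIC.

```python
def run_direct(source):
    """Interpret a tiny, made-up command language one line at a time."""
    variables = {}
    output = []
    for line in source.splitlines():
        words = line.split()
        if not words:
            continue  # skip blank lines
        command, args = words[0], words[1:]
        # Each line is decoded and acted on immediately. Nothing is kept
        # between lines except the variables themselves, so a loop in such
        # a language would re-decode the same text on every pass.
        if command == "set":
            variables[args[0]] = int(args[1])
        elif command == "add":
            variables[args[0]] += int(args[1])
        elif command == "print":
            output.append(str(variables[args[0]]))
        else:
            raise ValueError(f"unknown command: {command}")
    return output


script = """
set total 10
add total 5
print total
"""
print(run_direct(script))
```

Because the text is decoded afresh every time it is executed, changing the script and re-running it is trivial, but repeated work is never amortised.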

These interpreters have two main benefits: the code can easily be changed and re-run, and the interpreter is often built in to the system, so it can be relied on to be available. The downside is that they are usually quite slow.

A good use for these direct interpreters is as “glue” in a complex system of ordinary executable programs. Often, the full set of parameters and environment variables a system requires cannot be known until it is being installed. By putting shell script wrappers around programs, the customer can easily configure their system without the need to recompile the main programs.

The second type of interpreter uses the source code to build a representation of the program in memory, and then the interpreter uses this internal representation. Languages such as Lisp, PostScript, and Visual Basic use this technique, as did the interpreter that we were maintaining.
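The idea of parsing once and then working from an in-memory representation can be sketched as follows. This is only an illustration: the expression grammar (integers, names, `+`, `*`, parentheses), the nested-tuple tree format, and the names `tokenize`, `parse` and `evaluate` are all assumptions made for the example, not a description of the interpreter mentioned above.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[+*()])")


def tokenize(text):
    """Split the source text into numbers, names and operators."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Build a nested-tuple tree from the tokens. This is done once."""
    def expr(i):
        node, i = term(i)
        while i < len(tokens) and tokens[i] == "+":
            right, i = term(i + 1)
            node = ("add", node, right)
        return node, i

    def term(i):
        node, i = factor(i)
        while i < len(tokens) and tokens[i] == "*":
            right, i = factor(i + 1)
            node = ("mul", node, right)
        return node, i

    def factor(i):
        if i >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[i]
        if tok == "(":
            node, i = expr(i + 1)
            if i >= len(tokens) or tokens[i] != ")":
                raise SyntaxError("expected ')'")
            return node, i + 1
        if tok.isdigit():
            return ("num", int(tok)), i + 1
        if tok in ("+", "*", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return ("var", tok), i + 1

    tree, i = expr(0)
    if i != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return tree


def evaluate(node, env):
    """Walk the tree; the source text is never looked at again."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    left, right = evaluate(node[1], env), evaluate(node[2], env)
    return left + right if kind == "add" else left * right


tree = parse(tokenize("price * quantity + 5"))
print(evaluate(tree, {"price": 3, "quantity": 4}))
print(evaluate(tree, {"price": 10, "quantity": 2}))
```

The cost of tokenising and parsing is paid once, and the same tree is then evaluated against as many records as needed, which is where the speed advantage over a direct interpreter comes from.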

These languages retain much of the flexibility of the direct interpreter varieties, but usually work significantly faster. Some, such as Lisp, even allow the source to be changed by the program while the program is running. This is good for doing research into artificial intelligence, but is not recommended for commercial programs!

The third type of interpreter is represented by the Java language. In this type, the source code is compiled into an intermediate form, usually known as “byte code”. This byte code is then interpreted when the program is run. The advantage of this approach is that the byte code is portable across platforms, and can usually be interpreted very quickly.
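To make the compile-then-interpret split concrete, here is a small sketch of a stack machine in Python. The instruction set (`PUSH`, `LOAD`, `ADD`, `MUL`), the nested-tuple input format and the function names are invented for this example and bear no relation to real Java byte code; the point is only that the compile step produces a flat, portable list of instructions that a simple loop can then execute.

```python
def compile_to_bytecode(node, code=None):
    """Flatten a nested-tuple expression tree into stack-machine instructions."""
    if code is None:
        code = []
    kind = node[0]
    if kind == "num":
        code.append(("PUSH", node[1]))
    elif kind == "var":
        code.append(("LOAD", node[1]))
    else:
        # Operands first, operator last: the operator finds both of its
        # operands on top of the stack when it runs.
        compile_to_bytecode(node[1], code)
        compile_to_bytecode(node[2], code)
        code.append(("ADD",) if kind == "add" else ("MUL",))
    return code


def run_bytecode(code, env):
    """Interpret the instruction list with a simple operand stack."""
    stack = []
    for instruction in code:
        op = instruction[0]
        if op == "PUSH":
            stack.append(instruction[1])
        elif op == "LOAD":
            stack.append(env[instruction[1]])
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack.pop()


# A hand-written tree standing in for the output of a parser:
# price * quantity + 5
tree = ("add", ("mul", ("var", "price"), ("var", "quantity")), ("num", 5))
bytecode = compile_to_bytecode(tree)
print(bytecode)
print(run_bytecode(bytecode, {"price": 3, "quantity": 4}))
```

The instruction list contains nothing specific to the machine it was produced on, so it could be saved and run by any implementation of `run_bytecode`, which is the portability argument in miniature.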

So, the reasons for using an interpreter include portability of compiled code, the flexibility required for rapid application development, and use as glue in a large, complex system.

There is one other reason that is put forward for using an interpreter:

“The details are not known (or cannot be known) during development, so we will provide a language and an interpreter and let the users define the details later.”

If the language that is provided is highly specific to the application domain, then this might be a valid approach, but in the two cases that I have seen in the last few years, the language provided had very little to do with the specific application. In one case I was able to convert hundreds of lines of this code into C++ with a shell script of just two lines.

An alternative to providing a language is to provide a code generator for an existing language, and compile the result.

Modern operating systems, both Unix and Windows, can dynamically load libraries at run time. This allows the application’s programs to load code generated by the user and execute it almost as easily as code that was originally written for the application.

To make this work, you will need to provide some application-specific framework code and application-specific objects. Then you create a code generator that takes input from the user and creates the validation and other business-specific code. This should be much simpler to write than a language, the resulting code will run much faster, and the generator can offer a better user interface.
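The generate-then-load approach can be sketched in a few lines. The example below is a Python analogue rather than the C++ arrangement discussed above: it generates Python source from user-supplied rules and loads it with the built-in `compile` and `exec`, where a C++ system would emit C++ source, compile it to a shared library, and load that at run time (for example with `dlopen` on Unix or `LoadLibrary` on Windows). The rule format (`field`, `min`, `max`) and the function names are assumptions made purely for illustration.

```python
def generate_validator(rules):
    """Turn user-supplied range rules into Python source for validate()."""
    lines = ["def validate(record):", "    errors = []"]
    for rule in rules:
        field, low, high = rule["field"], rule["min"], rule["max"]
        message = f"{field} out of range"
        # repr() is used so that field names and limits are emitted as
        # literals; this sketch still assumes the rules come from a
        # trusted source.
        lines.append(f"    if not ({low!r} <= record[{field!r}] <= {high!r}):")
        lines.append(f"        errors.append({message!r})")
    lines.append("    return errors")
    return "\n".join(lines) + "\n"


def load_validator(source):
    """Compile the generated source and return its validate() function."""
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["validate"]


# Hypothetical rules, as a user interface might collect them.
rules = [
    {"field": "age", "min": 0, "max": 120},
    {"field": "quantity", "min": 1, "max": 999},
]

source = generate_validator(rules)
print(source)

validate = load_validator(source)
print(validate({"age": 30, "quantity": 5}))
print(validate({"age": 130, "quantity": 5}))
```

Once loaded, `validate` is an ordinary function in the host language, so it runs at the same speed as hand-written code, and the only thing the user ever edits is the rule data.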

So, while there are many good reasons for including an interpreter in your system, you should also consider that there might be better ways of solving the “unknown requirements” problem.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
