Friday 27 May 2011

Development Tasks, Part 3

This is the third part of a series describing the development tasks for a software
project. See Software Development Process and Development Methodology for the
context of this topic.

In the previous part we looked at the initial design phase. In this part we will
cover the detailed design of the proposed system.


Test Case Design and Construction

In this step we develop the test harness. This is done prior to code development
so that there is something to test the code against as it is written.

Write the tests with input values based on the use case parameters, establishing
the preconditions during test set-up. Evaluate the return values for completeness
and correctness, and log any discrepancies.
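
As a minimal sketch of what such a test might look like (every name here is
hypothetical, invented for illustration), in C++:

    // test_open_account.cpp -- a test for one use case outcome.
    #include <iostream>
    #include <string>

    // Stub of the application interface under test; the real
    // implementation replaces this as coding proceeds.
    enum Result { OK, REJECTED };

    Result openAccount(const std::string& owner, double initialDeposit) {
        (void)owner; (void)initialDeposit;
        return OK;  // stub: default value until the real code exists
    }

    int main() {
        // Precondition: input values taken from the use case parameters.
        const std::string owner = "A. Customer";
        const double deposit = 100.0;

        // Exercise the interface and evaluate the return value.
        if (openAccount(owner, deposit) != OK) {
            std::cerr << "FAIL: openAccount rejected a valid request\n";
            return 1;  // the discrepancy is logged via the exit status
        }
        std::cout << "PASS: openAccount\n";
        return 0;
    }

The harness starts life calling stubs and gradually becomes the real test suite
as the implementation fills in behind it.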


Use Case Design

Here we document the flow of information in the system. With an OO design it
can be difficult to understand the interactions between objects, so this step
records them explicitly.

Most texts will suggest that sequence diagrams are the correct tool for this task.
However, it has been my experience that these diagrams are tedious and messy to
construct, and very quickly become difficult to follow in all but the simplest
scenarios. I prefer to use collaboration diagrams as they can more easily show
the flow of control in complex systems. It is also possible to convert trace
data from a running system into collaboration diagrams, thus documenting the
real situation.

System Test Support Design

Just as modern hardware has a “Power On Self Test”, so should a software system.
If we build support for system testing into the applications we can speed the
testing phase, and thus get the project completed sooner.
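
As a rough illustration (the interface below is invented, not taken from any
particular project), such support can be as simple as a self-test entry point
that each major component implements:

    // selftest.h -- hypothetical built-in test interface, analogous
    // to a hardware "Power On Self Test".
    #include <iostream>
    #include <string>
    #include <vector>

    struct SelfTestResult {
        std::string component;
        bool passed;
        std::string detail;
    };

    // Each major component implements this interface.
    class SelfTestable {
    public:
        virtual ~SelfTestable() = default;
        virtual SelfTestResult selfTest() = 0;
    };

    // The system test driver runs every component's self test and
    // reports the failures.
    inline bool runAllSelfTests(const std::vector<SelfTestable*>& components) {
        bool allPassed = true;
        for (SelfTestable* c : components) {
            SelfTestResult r = c->selfTest();
            if (!r.passed) {
                std::cerr << "self test FAILED: " << r.component
                          << " (" << r.detail << ")\n";
                allPassed = false;
            }
        }
        return allPassed;
    }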


Persistent Storage Design

We need to define the format and rules that apply to the storage of the
application’s data. This usually involves experts in the storage component,
and can to some extent be started prior to the application-specific work. It
will usually involve creating some additional classes to interface the
application-specific classes to the storage mechanisms.
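
A minimal sketch of such an interface class, assuming a hypothetical Customer
application class and leaving the storage product unspecified:

    // customer_store.h -- interface between the application classes
    // and the persistence mechanism.
    #include <memory>
    #include <string>

    struct Customer {
        std::string id;
        std::string name;
    };

    // The application depends only on this abstract class; a concrete
    // subclass, written with the storage experts, maps it onto the
    // chosen storage mechanism.
    class CustomerStore {
    public:
        virtual ~CustomerStore() = default;
        virtual void save(const Customer& c) = 0;
        virtual std::unique_ptr<Customer> load(const std::string& id) = 0;
    };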


Class Design

Based on the method specifications and use case design, document each
attribute’s visibility, type, and initial value; for each method in the class,
document its signature, visibility, and return type.

Review the design against established design patterns to ensure that the
design is complete.

You may be required to do this step in a design tool, but my preference when
developing C++ code is to create the header files for the classes, including
stub versions of the implementation with comments describing the intended
design.
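
For example, a fragment of design captured directly in a header in the style
just described (the class itself is hypothetical):

    // account.h -- the design recorded as a header with stubs.
    #ifndef ACCOUNT_H
    #define ACCOUNT_H

    #include <string>

    class Account {
    public:
        Account() = default;

        // Intended design: deposits must be positive and return the
        // new balance; error handling to be decided during coding.
        double deposit(double amount) { (void)amount; return balance_; } // stub

        double balance() const { return balance_; }

    private:
        std::string owner_;     // private, initially empty
        double balance_ = 0.0;  // private, initially 0.0
    };

    #endif // ACCOUNT_H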


Code Generators

Many large systems will have a fair amount of code that can be generated from
significantly simpler descriptions of data structures. You should review your
design at this point to determine if there are any candidates for code
generation (before some poor programmer tries to write them by hand).
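
As a toy example of the idea (not a production tool), a generator can be only a
few lines. This one reads "name type" pairs and emits the corresponding C++
declarations:

    // gen_struct.cpp -- toy generator: reads "name type" pairs from
    // standard input and emits a C++ struct.
    #include <iostream>
    #include <sstream>
    #include <string>

    int main() {
        std::cout << "struct Record {\n";
        std::string line;
        while (std::getline(std::cin, line)) {
            std::istringstream fields(line);
            std::string name, type;
            if (fields >> name >> type)
                std::cout << "    " << type << " " << name << ";\n";
        }
        std::cout << "};\n";
        return 0;
    }

Feeding it lines such as "id int" and "name std::string" produces the struct
that would otherwise have to be written, and maintained, by hand.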

In the next part we will look at the implementation tasks.


Wednesday 18 May 2011

Development Tasks, Part 2

This is the second part of a series describing the development tasks for a software project. See Software Development Process and Development Methodology for the context of this topic.

In the previous part we looked at the analysis phase. In this part we will
cover the initial design of the proposed system.

Project Tools

This task is one that is on-going for the life of the project. It is
responsible for setting up and maintaining the various tools that the project
will use.

Initially this might be just a word processor, but it will expand to include
version control products, project planning programs, development environments,
and so on.

At the very least it should include a list of what tools are necessary to build
the system.

Architecture

This task is concerned with the design problems that go beyond the selection of
algorithms and data structures, concentrating on the overall structure of the system.

Structural issues include gross organization and global control structure;
protocols for communication, synchronization, and data access; assignment of
functionality to design elements; physical distribution; composition of design
elements; scaling and performance; and selection among design alternatives.

The output of this task includes multiple views (e.g. concept model, network
model, database distribution, etc), a list of the products that will form part
of the delivered system, and the specification of the interfaces between major
internal components.

Interface Stubs

Create a facade class for each module's interfaces. Add stub code that allows
each interface to be called and to return default values. Define the abstract
interface classes, and stub implementations of them, for any callback-style
interfaces.

Create test harnesses that call these interfaces. These will evolve into test
harnesses for the modules. The aim is to be able to develop each module in
isolation from the other modules. (See Chaotic Programming.)

This task has the useful side effect of getting the programmers involved in
some actual development very early in the process.
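
A minimal sketch of the idea, with invented names:

    // billing_facade.h -- hypothetical facade for a Billing module.
    #include <string>

    // Abstract interface for callbacks from Billing to its clients.
    class BillingListener {
    public:
        virtual ~BillingListener() = default;
        virtual void invoiceReady(const std::string& invoiceId) = 0;
    };

    // Facade presented by the Billing module to the rest of the system.
    class BillingFacade {
    public:
        // Stub: returns a default value so that callers can be
        // developed and tested before Billing itself exists.
        double quote(const std::string& item) { (void)item; return 0.0; }

        void setListener(BillingListener* l) { listener_ = l; }

    private:
        BillingListener* listener_ = nullptr;  // not owned
    };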


Component Exploration

Third party products rarely behave exactly as expected, or they may be new to
the developers. They need to be well understood if they are to be used successfully.

Build test programs to demonstrate how the components work together. Document
the criteria required to get them to inter-operate as required for the project.

Note that there is some danger that the client will modify their requirements
when they see what the products are capable of.

Migration Planning

Most systems will start life with a body of data that was captured by their
predecessors. That data needs to be converted and loaded into the new system.
This task describes how that is to be achieved. We do this early so that the
ability to load data is designed in from the beginning.

Determine which data will be useful, and organise how it will be extracted
from the existing system and loaded into the new one. Determine how the data
will be cleaned up to get rid of incorrect values.

Test Planning

For every outcome of each use case (for the current increment), describe the test
cases; this includes the number of tests, how the pre-condition state will be
achieved, and how the post-condition state will be validated. Break this into
sections for acceptance testing, system testing, integration testing, and unit
testing.


Application Specification

The aim is to ensure that everything the programmer needs to know is properly
defined.

For each use case describe how each system attribute is changed.

The deliverables include use case descriptions, class diagrams, details of any
complex algorithms, and an updated data dictionary.

User Interface

Design the user interfaces for the system based on the use cases. This can be
done with a user interface design program, which can then generate the classes
necessary to run the interface.

The resulting UI can then be used to confirm the use cases by demonstrating the
system to the users. Obviously you will inform them that this is just the UI and
that there is nothing actually working yet.

In the next part we look at the remainder of the design tasks.


Friday 13 May 2011

Development Tasks, Part 1

See Software Development Process and Development Methodology for the context
of this topic.

Vision Statement

A vision statement sets the goals for the project. It puts forward an idea
for others to consider. It will usually outline the problems that need to be
addressed and suggest some sort of solution.

The vision statement remains an important document for the life of the project.
It should be the first document that is read when wanting to understand what a
project is all about.

A vision statement can also help to keep a project constrained to its original
goals and help prevent scope creep.


Preliminary Analysis

This task elaborates and clarifies the client’s statement of need. It allows the
client to tell if the analyst has really understood the problem.

Discuss the problem with the domain experts and research similar problems.
Investigate the implications of the requirements.

From the article “Making a success of preliminary analysis using UML”:
“It is necessary to define the application domains on which a system is to be put in place, and the processes that the system must support. Terminology, definitions and domain boundaries are clarified, in order to explain the problem in a clear context. In this domain, functioning modes must be explained, in the form of business procedures, but also in the form of rules and business constraints.”

One of the outputs of the preliminary analysis task is the first draft of the
project data dictionary or glossary.


Feasibility Study

The primary purpose of a feasibility study is to demonstrate that there is at
least one viable solution to the problem. For a solution to be viable it will
need to be realizable with the resources available and technically possible to
implement.

A feasibility study should concentrate on the technically novel components
of the system. There is no point in re-examining things which are readily
available such as web servers or databases.

It is not necessary for the study to produce the best solution, just one
which satisfies the minimum requirements, or which demonstrates that with
more development effort a satisfactory solution should be possible.


Modelling

This task applies when there is some existing system that is being replaced.
The existing system might be automated, or an entirely manual paper-based
process, but either way it is important to document what it does before we try
to reproduce it.

From the article “Making a success of preliminary analysis using UML”:
“An analysis of what already exists must be carried out, by representing it as a system whose structure, roles, responsibilities and internal and external information exchanges are shown. All preliminary information must be collected, in the form of documents, models, forms or any other representation. The nature of the products developed by the processes is explained.”

Requirements Gathering

This task involves creating a formal list of things that the proposed system
must accomplish. The list should have a well-defined numbering system so that
requirements can be unambiguously referred to for the remainder of the project,
and beyond into the maintenance phase.

The functional requirements list what the system is supposed to do. These are
usually the things that most interest the users of the system.

The environmental requirements describe the environment in which the system
must operate. For example, the system might have to operate on Windows-based
computers, or it might have to operate in an aeroplane.

The quality (or non-functional) requirements will drive the architecture of the
system. Obtain the limits on performance (speed and size), security, safety,
and resource usage (disk, memory, bandwidth). Determine the availability
requirements, how modifiable the system must be, and the ease with which it can
be built and tested. Consider how soon the product must be delivered. Usability
is often an important requirement, and needs to be considered for the various
users of the system, from the occasional user to the “power” user. Describe
what is required to administer the system.

Deliverables

At this stage of the project you should have use case descriptions of how users
will interact with the system, a list of the main actors and objects, detailed
interface formats for external systems (and memoranda of understanding with the
owners of those systems), and an initial idea of the increment plan.


Tuesday 10 May 2011

Development Methodology

The development phases outlined in "The Software Development Process" are
just the broad categories of tasks that need to be undertaken. The methodology
outlines all the potential tasks that a software project might need. It is
important, however, for the project manager to tailor these tasks according
to the needs of the particular project.

Analysis Phase

This phase attempts to define what it is that is required. It should avoid any
attempt to constrain the design; we want all options open to the designers so
they can come up with the best solution.

The analysis phase tasks include preparing a vision statement, preliminary
analysis, feasibility studies, modelling of existing systems, and requirements
gathering. For most real world projects there will also be some sort of approval
process to get the funding for the development.

Design Phase

The design phase starts with the software architecture and steadily refines the
design until it is ready to be handed to the programmers. The level of detail
required is highly dependent on the nature of the project. For a small project
in a well-defined environment you could do the whole thing around a whiteboard
in a few hours. At the other extreme it could take a large team many months and
produce a huge amount of documentation.

The design tasks include developing the software architecture, planning the
migration from existing systems, developing the interfaces between the components
of the system (and to external systems), exploring the capabilities and limitations
of the proposed COTS products, defining the test strategy, creating a data dictionary,
creating use case and class diagrams, designing the user interfaces, designing support
for testing, designing the persistent storage, and designing any code generation
utilities that are required.

In parallel with the design tasks, it will also be necessary to set up the
development and testing environments, procure all the support tools required,
and recruit the team members into the project.

Implementation Phase

The programmers can often start with the interfaces to the components. These
should be well defined quite early in the design process. Stub and test versions of
each interface can be developed so that each component can be developed in
isolation by relying on the interface tests.

Similarly, stub versions of the user interfaces can be developed and presented
to the client as a prototype of the application. This will confirm that the
requirements are correct and will set up the client's expectations for the
project.

The implementation phase includes the coding of the application programs and
libraries, the unit testing, the development of the documentation and the
development of the deployment guide. Testing of each use case should be done
as part of the implementation, preferably using some automated capability.
Quality control processes such as code reviews must be part of the schedule.

A generalisation task should be scheduled. This allows the designers and programmers
to revisit the design and refactor some of the code. This will make the next
increment and later maintenance much easier.

Test Phase

The test phase is responsible for testing the assembled components (integration
testing) and testing the system in its enterprise context (system testing).

The test phase will also include the client's acceptance testing, so you will
need to support that.

Finally each round of development should finish off with a review so that
the development process itself can adapt to the team's strengths and weaknesses.

The next few blogs will expand the description of these tasks.


Saturday 7 May 2011

Software Development Process

This is the first of a set of blogs on the topic of developing software.
It relates to software architecture by showing the context: how architecture
fits into the overall development process.

I have always had an interest in the development process, but this was enhanced when
I got a copy of the book “The Object Oriented Design Process” by Tom Rowlett.
The problem was that the methodology described left out many important steps, and
there was nothing about the actual development process itself: nothing on progress
control, change control, or quality control. It occurred to me that I could
write a much better book, but unfortunately I am not much of a writer and there
has never been enough time. So these blogs will have to do for now.

There are three components to the development process: the model, the
methodology, and the management.

The first is the development model. This describes the overall structure of the
project and how it evolves over time. The main models are "waterfall", "iterative",
"test driven", "incremental" and "spiral". Each one attempts to solve problems
in its predecessor, but each can still be used for certain categories of projects.

Waterfall

The waterfall model was the first formalisation of the development process.
The development of a system was divided into four phases: analysis, design,
implementation and testing. The project would be scheduled so that each phase
was completed before the next was begun. This simple model was well suited to
the usual project scheduling tools and has been the basis for managing projects
for many, many years.

Of course there are some serious problems. If testing shows up a bug, you need
to send it back to the implementation phase to get it fixed. Similar problems
can occur in the other phases. Generally you only use the waterfall model for
very small projects in a well-known environment.

Iterative

The iterative model formalises the feedback from one phase to the previous.
It allows the designers to begin work as soon as the analysts have most of
the requirements defined and it allows the designers to seek clarification
from the analysts. Similarly the programmers can start on some of the coding
earlier, and can seek clarification of the design. Testing and bug fixing can
be managed sensibly.

The main problem is that all the testing still happens at the end where the
pressure to complete is at its maximum.

Test Driven

The test driven model starts the testing process as soon as the requirements
have been defined. While the designers are developing the design for the system,
the testers can be designing their tests. The testing can begin as soon as some
components of the system have been developed.

This model works well for small developments, but for large projects it can
take too long to deliver anything, during which time the requirements can start
to change.

Incremental

The incremental model breaks the functionality of the system into separately
deliverable components, or increments. This enables us to get something
delivered quickly. It also allows the design to be adjusted if the requirements
change - each increment includes a small analysis and design step to keep the
project aligned with the business.

Note that not all increments get delivered to the customer as a release. An
increment should be a small, well defined body of work concentrating on a
single aspect and may only take a few weeks to develop. If the customer has
a lengthy acceptance testing procedure it might be overwhelmed by releases
that happen monthly, so it might be best to only do a release occasionally.

The problem with large projects is that the first increment can take too long
to deliver since it has to create most of the infrastructure to support even the
smallest amount of functionality.

Spiral

The spiral model attempts to get early increments delivered more quickly by
using less or lower quality infrastructure. The project thus advances in two
dimensions at once - quality and functionality.

The main danger is that the initial low quality increments might give the
project a bad reputation, so it might be better to keep the first few as
demonstration only versions rather than as actual releases.


Friday 6 May 2011

Super Sized Software

The killer application that can take Free Software to places that proprietary software cannot follow is beginning to emerge. If it can be done it will cement FLOSS's position in the IT world, permanently.

For a long time the Linux community has been hoping to find the killer application that will propel our favourite OS into the forefront. Unfortunately, anything that can be built on Linux can also be built under Windows - it might not work the same, but from the user's perspective it's likely to appear equivalent. So far the search has just seemed to be wishful thinking.

Now we might have an answer.

It began with a conversation with a friend one evening. We were discussing the possibility of building our own MMORPG. My friend was of the opinion that building an entire MMORPG was too big a task, but it would be interesting to build some component that could utilise the multi-core CPUs that are beginning to come to market, and more so for the next generation that would have dozens of cores. At the time I was not very familiar with MMORPGs so I decided that it might be interesting to learn more on the subject.

Over the following weeks I looked into what might go into an MMORPG. I looked at interpreted languages, game engines, physics engines, scene graphs and the other core components of an MMORPG. In each case I found open source components that could be used to implement them. However, the more I looked, the larger the project became. It needed a communications component, so P2P software came in. It would need a development environment, so Eclipse looked promising. It needed sound and graphics editing, it needed documentation tools and drawing tools. It needed configuration management. The list just seemed to grow and grow. But there was always some FLOSS software that might fit.

The problem became “How do I build this?” Even though 90% of the software already existed as FLOSS components I still had to build the other 10%, and that was a lot of software. I thought of starting a project on SourceForge, but for that to spark an open source project it needed to have some code already done, so that was ruled out for now. The problem really was that I did not have an architecture for the project, a blueprint to guide the development.

At this point I realised that what applied to MMORPGs also applied to other very large projects. My career over the last 25 years has included more than my fair share of enormous projects. I now realise that many of these projects could have been constructed from FLOSS components, plus a relatively small amount of project specific development. Of course, in those days there was not the variety of FLOSS that we have today, so it was not necessarily wrong to overlook that option 20 years ago.

My recent work involved the maintenance and development of a software suite. It had about 500K lines of code. When we first took over the project there were multiple proprietary third-party products incorporated. We dropped some almost immediately as there were various licensing issues. Others dropped out over time as the vendors either withdrew the product, went out of business, or decided that they would only support a different operating system. Fortunately we were able to either develop our own replacements or find some open source product to fill the gaps.

This is where FLOSS can leave proprietary behind.

When you spend any appreciable time in a Microsoft-focussed organisation you quickly realise that one of the limiting factors, if not the limiting factor, in a lot of system development is the cost of licensing the components. A solution will often be judged not by how well it solves the problem, but by how much it will cost in licenses. There is also the overhead of managing those licenses, a non-trivial additional business cost in many large organisations.

These license costs put an effective upper limit on the size of a system. Beyond a certain size it is both too expensive to implement in-house and too expensive to build from third-party components. There is no point in building a product for the market if your profit will be totally consumed by license charges, so marketable products have an even lower size limit.

You will also find interfacing the products together quite challenging. The vendors will often use proprietary data formats within their products, and often also for persistent storage. You will therefore be limited to interfacing with components for which the vendors have built suitable export/import functionality. Often the building of a large system consists of writing the glue to hold all the components together. On the other hand, FLOSS products are usually built to open standards, and it has been my experience that their developers will go out of their way to be as compatible with other products as possible.

You will want the lifetime of your large system to be as long as possible. However, many vendors seem to have quite short product cycles. From my experience it is difficult to avoid doing complete redevelopments every four or five years simply because too many components are no longer applicable to the project, even with a relatively small number of third-party components. With a large project it would seem that you would quickly arrive at an impossible-to-maintain state. Again, FLOSS products do not seem to have this problem, with many components of the system I was maintaining actually pre-dating Linux by up to a decade.

Now, of course, we cannot start building such monster systems immediately. They are simply too large for any FLOSS team to undertake. Since few people have tried to do this before, there are very few tools and very little knowledge on how to build monster systems. The commercial world has done some things like this for government and big business. Their tools are for bespoke systems where most of the code is written specifically for the project. Their techniques are really just scaled-up versions of the techniques we all use in developing software. Since these projects are often in the $100M range we need a much more efficient system for the FLOSS community.

A possible approach might be to use knowledge engineering to guide and capture the architecture. The Web Ontology Language (OWL) seems to be rich enough to capture a software architecture, and its logic capability can be used to detect inconsistencies. It might also be possible to use artificial intelligence techniques, such as genetic algorithms, to optimise the design.

In summary, the killer application for Linux is not any particular program, but the ability to build software on a grand scale. It should be possible to build enterprise size systems, and perhaps even industry scale systems.


Thursday 5 May 2011

Interpreters

Part of the software I was recently contracted to maintain included a small language and interpreter that was used to do validations of the data. While looking to extend the language to handle significantly more complex data, it started to become apparent that there wasn’t really any point in using an interpreter at all!

To understand when it’s appropriate to use an interpreter, it is perhaps worthwhile looking at the different types of interpreters that are available.

The first group are those that directly interpret the source code. The original BASIC was of this type, and so are Unix shell programs and MS-DOS's batch file interpreter. These interpreters read the source code (often called a shell script) and immediately perform whatever operations are specified. Usually there are various optimisations, but for the purpose of this discussion these interpreters can be viewed as just described.

The main benefit of these interpreters is the ease with which the code can be changed and re-run. Another benefit is that these interpreters are often built into the system, and hence can be relied on to be available. The downside is that they are usually quite slow.

A good use for these direct interpreters is as “glue” in a complex system of ordinary executable programs. Often it is not possible to know, until a system is being installed, what all the required parameters and environment variables will be. By putting shell script wrappers around programs, the customer can easily configure their system without the need to recompile the main programs.

The second type of interpreter uses the source code to build a representation of the program in memory, and then the interpreter uses this internal representation. Languages such as Lisp, Postscript, and Visual Basic use this technique, as did the interpreter that we were maintaining.

These languages retain much of the flexibility of the direct interpreter varieties, but usually work significantly faster. Some, such as Lisp, even allow the source to be changed by the program while the program is running. This is good for doing research into artificial intelligence, but is not recommended for commercial programs!

The third type of interpreter is represented by the Java language. In these, the source code is compiled into an intermediate form, usually known as “byte code”. This byte code is then interpreted when the program is run. The advantage of this approach is that the byte code is portable across platforms, and can usually be interpreted very quickly.
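
To make the byte code idea concrete, here is a toy interpreter in C++ (a made-up instruction set, not the byte code of any real system). A compiler front end would emit the opcode vector; the switch loop is the entire run time:

    // toy_vm.cpp -- minimal byte code interpreter of the third kind.
    #include <cstddef>
    #include <iostream>
    #include <vector>

    enum Op : unsigned char { PUSH, ADD, MUL, PRINT, HALT };

    void run(const std::vector<unsigned char>& code) {
        std::vector<int> stack;
        for (std::size_t pc = 0; pc < code.size(); ++pc) {
            switch (code[pc]) {
            case PUSH:                 // next byte is the literal to push
                stack.push_back(code[++pc]);
                break;
            case ADD: {                // pop two values, push their sum
                int b = stack.back(); stack.pop_back();
                stack.back() += b;
                break;
            }
            case MUL: {                // pop two values, push their product
                int b = stack.back(); stack.pop_back();
                stack.back() *= b;
                break;
            }
            case PRINT:
                std::cout << stack.back() << "\n";
                break;
            case HALT:
                return;
            }
        }
    }

    int main() {
        // (2 + 3) * 4 = 20
        run({PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, PRINT, HALT});
    }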

So, the reasons for using an interpreter include portability of compiled code, flexibility as required for rapid application development, and as glue in a large complex system.

There is one other reason that is put forward for using an interpreter:

“The details are not known (or cannot be known) during development, so we will provide a language and an interpreter and let the users define the details later.”

If the language that is provided is highly specific to the application domain, then this might be a valid approach, but in the two cases that I have seen in the last few years, the language provided had very little to do with the specific application. In one case I was able to convert hundreds of lines of this code into C++ with a shell script of just two lines.

An alternative to providing a language is to provide a code generator for an existing language, and compile the result.

Modern computing systems, both Unix and Windows, have the ability to dynamically load libraries at run time. This allows the application system's programs to load code generated by the user, and execute it almost as easily as code that was originally written for the application.
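
A sketch of that mechanism on Unix, using the standard dlopen API (the library name and the validate entry point are my own invention for the example):

    // load_rule.cpp -- load user-generated, compiled validation code
    // at run time. The generated code would be built with, e.g.:
    //   g++ -shared -fPIC -o librules.so rules.cpp
    // and this loader with:
    //   g++ load_rule.cpp -ldl
    #include <dlfcn.h>
    #include <iostream>

    // The signature the generated code is expected to export.
    typedef bool (*ValidateFn)(const char* record);

    int main() {
        void* lib = dlopen("./librules.so", RTLD_NOW);
        if (!lib) {
            std::cerr << "cannot load rules: " << dlerror() << "\n";
            return 1;
        }
        // Look up the generated entry point and call it like any
        // other native code.
        ValidateFn validate =
            reinterpret_cast<ValidateFn>(dlsym(lib, "validate"));
        if (validate)
            std::cout << (validate("sample record") ? "valid" : "invalid")
                      << "\n";
        dlclose(lib);
        return 0;
    }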

To make this work you will need to provide some application-specific framework code, and application-specific objects. Then you create a code generator that takes input from the user and creates the validation and other business-specific code. This should be much simpler to write than a language, the results will be much faster, and it should have a better user interface.

So, while there are many good reasons for including an interpreter in your system, you should also consider that there might be better ways of solving the “unknown requirements” problem.


Tuesday 3 May 2011

The Code is the Design

There are at least two problems with using the code as the design. The first is that
most modern languages have insufficient richness to actually describe the design.

For a simple example, take an object with a member that is a pointer or
reference to another object. You cannot tell just by looking at the declaration
of the pointer whether it refers to an object that is an intimate component or
is just some casual reference. (Our project coding standard requires the
comment "//owned" to be placed next to such declarations to make this
relationship clearer.)
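
For instance, with that convention (a made-up pair of classes):

    // The two declarations look identical to the compiler; only the
    // comments record the design relationship.
    class Engine;
    class Garage;

    class Car {
        Engine* engine_;   //owned -- an intimate component; Car deletes it
        Garage* garage_;   // a casual reference; lifetime managed elsewhere
    };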

The second problem is that you have no idea of the correctness of the code. We used
to jokingly say on this project that it was entirely bug-free, since the code is the
design and it is, therefore, by definition fully in compliance with the design. The
Eiffel language tries to overcome this by explicitly setting up a contract for each
function, but it is not exactly a common development language.
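
In a language without built-in contracts, a rough approximation is to state the
pre- and postconditions as assertions; a sketch in C++:

    #include <cassert>

    // withdraw() with its contract made explicit, Eiffel-style, using
    // assertions (an approximation, not real design-by-contract).
    double withdraw(double balance, double amount) {
        assert(amount > 0 && amount <= balance);  // precondition
        double result = balance - amount;
        assert(result >= 0);                      // postcondition
        return result;
    }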

To summarize, the code is an exact description of what the program does, but a very
poor description of what it was intended to do.

Some languages (Java, for example) compound the situation by providing very little
clue about the overall structure. A directory containing 70 Java classes, most of
which have a main() method, can take quite a while to analyse.

The market for CASE tools to assist with design is a good clue that the languages
themselves are not up to the higher level design tasks.

Technology

One of the issues with design is that it is expensive to create and maintain. While I
am of the opinion that it is even more expensive to not have this documentation, we
should consider some ways of improving the efficiency of the process.

An important consideration is that the design must be available to the maintainers of
the system. If the design is captured in some expensive CASE tool that the client is
not prepared to keep licensed, then the design effectively becomes a non-maintainable
paper document. Such a CASE tool should be part of the deliverable system, just like
the DBMS, for example.

Conclusion

While it may be possible to skip a lot of design documents, or to just whiteboard
them while the system is being developed, they are likely to be sorely missed
during the maintenance phase of the project. Capturing the intent of the programs
allows us to determine their correctness.

The cost of making changes during maintenance is usually far higher than for making
them in the development process. A large part of this cost arises from the need to
understand how the change will impact the existing system. Hence the more
information that is available the better.

In anything but the most trivial of projects, the design process remains the core of
software development. Documenting that design allows it to be communicated to those
who need to understand the intent of the programs.
