andreas

Mar 20, 2010
 

The desire for code reuse has been a driving force behind most efforts in software engineering, and in this article we will look into three increasingly sophisticated ways to achieve reuse.

Libraries

Libraries can be anything from Cobol copy libs to modern shared libraries. They can be the result of meticulous modularization in your own, earlier applications (a rare case), they can be part of a “system library” (like the vast set of libraries in UNIX or the Java Runtime Environment), or they can be third-party libraries, open source or commercial. It does not matter which: all libraries have one thing in common. They never exactly fit your application.

In order to be reusable, libraries have to cater to different users. Whether they are object-oriented or not, you almost always have to initialize something and then call a procedure or a method, passing data down as parameters and checking for and reacting to errors. Initialization can be cheap or expensive, and you may have to do it upon every use or only once, at startup or first use. The problem with libraries is that in order to be useful, they need extensive interfaces.

Typical libraries in early GUI systems (e.g. Xlib or OSF Motif) or in systems for distributed computing (ONC aka Sun RPC, OSF DCE, Microsoft DCOM, CORBA) have hundreds or thousands of procedures or methods, often with long parameter lists. The parameters are frequently of types defined in the libraries as well, types that can be created by other library functions, and so on and so forth.

Using those libraries quickly riddles your application with infrastructure code, often forces you to structure your code in certain ways, and these ways may be incompatible with other, alternative libraries, making any attempt to switch between alternatives practically impossible.

Frameworks

Frameworks are a much more sophisticated solution for code reuse. They accept the problem of entanglement, embrace it, and reverse the direction of control. Frameworks are the dominant solution today. No more do you call the library, the library calls you. A framework is called a framework, because it provides a frame, a kind of main program that does all initialization and most of the infrastructure plumbing.

Of course you still have to write application code. A framework is like a generic main program, but in order to do anything useful, it must rely on application code. Let me give a very primitive example.

The earliest GUI libraries relied on applications providing a main loop that took care of events. Something like this:

// some variables
MyTypeOfObjects currentlySelected;
...
// the loop
boolean done = false;
do {
    WindowSystem.event e = WindowSystem.getNextEvent();
    switch (e.getCode()) {
        case WindowSystem.EXPOSE:
            doRedraw();
            break;
        case WindowSystem.MENU_CLICK:
            WindowSystem.MenuDetails menuButton = e.getMenuDetails();
            switch (menuButton.getCode()) {
                case WindowSystem.StandardMenus.OPEN:
                    doOpen(currentlySelected);
                    break;
                case WindowSystem.StandardMenus.EXIT:
                    done = true;
                    break;
                case WindowSystem.StandardMenus.DELETE:
                    // delete currently selected object ...
                ...
                default:
                    log("Received invalid menu entry!");
                    break;
            }
            break; // without this, a menu click would fall through into KEY_PRESS
        case WindowSystem.KEY_PRESS:
            // process keys ...
        ...
        default:
            log("Received invalid event!");
            break;
    }
} while (!done);
// maybe some cleanup
...
exit(0);

GUI frameworks relieved you of the burden of writing these main loops yourself. You had a main program provided by the framework, and this main program was able to process all possible events, menu buttons or key presses that the window system could ever deliver. By default it would ignore events, but you could register functions like doRedraw() and doOpen(FrameworkTypeOfObjects currentlySelected) to be called in case of certain events.

Obviously this is big progress, but the framework, still being a library, can’t know about your application types. You see how I have changed the method signature of doOpen() from taking a parameter of MyTypeOfObjects to a parameter of FrameworkTypeOfObjects. The framework is in control now, and because it can’t know your data types, it forces you to accept framework data types as parameters.
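To make the reversal of control concrete, here is a minimal sketch in Java. It is not taken from any real GUI framework: the class MiniFramework, its register() and run() methods and the string event codes are all made up for illustration. The point is only that the loop now lives in the framework and that the callbacks must accept the framework's type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical miniature framework: it owns the event loop and calls back
// into application code that was registered beforehand.
final class MiniFramework {
    // The framework's own notion of "the selected thing"; the application
    // has to accept this type in its callbacks.
    static final class FrameworkTypeOfObjects {
        final String id;
        FrameworkTypeOfObjects(String id) { this.id = id; }
    }

    private final Map<String, Consumer<FrameworkTypeOfObjects>> handlers = new HashMap<>();

    // Application code registers a callback for an event code.
    void register(String eventCode, Consumer<FrameworkTypeOfObjects> handler) {
        handlers.put(eventCode, handler);
    }

    // The framework's main loop: events without a registered handler are ignored.
    void run(List<String> events, FrameworkTypeOfObjects selected) {
        for (String event : events) {
            Consumer<FrameworkTypeOfObjects> handler = handlers.get(event);
            if (handler != null) {
                handler.accept(selected);
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        MiniFramework fw = new MiniFramework();
        fw.register("EXPOSE", sel -> log.add("redraw"));
        fw.register("OPEN", sel -> log.add("open " + sel.id));
        // KEY_PRESS has no handler and is silently dropped.
        fw.run(List.of("EXPOSE", "KEY_PRESS", "OPEN"), new FrameworkTypeOfObjects("doc-1"));
        System.out.println(log); // prints [redraw, open doc-1]
    }
}
```

Note that the application no longer contains a switch statement at all; what remains is the registration and the callbacks themselves.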

Again you have reuse, but now the framework forces its abstractions upon you. You have to write less code, but not as little as you had hoped, because you now need to write a layer that adapts your abstractions to those of the framework. Of course you can ignore the problem and simply use framework abstractions in your own code, but if you do that, you’re doomed anyway, at least in the long run. New versions of the framework will force you into a deadly maintenance routine, and if the framework ever becomes unavailable, you can happily begin writing your program anew.
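Such an adaptation layer might look like the following sketch. Both types are stand-ins that I made up for this example, and adapt() is a hypothetical helper, not part of any real framework. The idea is that the conversion from the framework's type to your own type lives in exactly one place.

```java
import java.util.function.Consumer;
import java.util.function.Function;

final class AdapterLayerSketch {
    // Stand-in for a type dictated by the framework (hypothetical).
    static final class FrameworkTypeOfObjects {
        final String handle;
        FrameworkTypeOfObjects(String handle) { this.handle = handle; }
    }

    // The application's own abstraction (hypothetical).
    static final class MyTypeOfObjects {
        final String name;
        MyTypeOfObjects(String name) { this.name = name; }
    }

    // Wraps application logic so the framework can call it. The application
    // logic itself never sees a framework type.
    static Consumer<FrameworkTypeOfObjects> adapt(
            Function<FrameworkTypeOfObjects, MyTypeOfObjects> toApp,
            Consumer<MyTypeOfObjects> appLogic) {
        return fwObject -> appLogic.accept(toApp.apply(fwObject));
    }

    public static void main(String[] args) {
        // Application code, written purely against MyTypeOfObjects.
        Consumer<MyTypeOfObjects> doOpen = obj -> System.out.println("opening " + obj.name);

        // The only place where both worlds meet.
        Consumer<FrameworkTypeOfObjects> callback =
                adapt(fw -> new MyTypeOfObjects(fw.handle), doOpen);

        callback.accept(new FrameworkTypeOfObjects("report")); // prints opening report
    }
}
```

If the framework changes or disappears, only the conversion function passed to adapt() has to be rewritten; doOpen stays as it is.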

WADL

Sometime around 2001 I was confronted with the request to write a big web application. The application would have to work with a relational database, it would have to be written in Perl, and I would have to use a kind of framework that had been developed in-house. A quick analysis identified three user roles, and the number of pages would be greater than 40. I had roughly six months before the system would go into production.

At that time I had almost no experience in Perl, had never used relational databases, and it was going to be my second web application. The first had been in Perl as well, but it had been an application with two or three pages and the same number of forms. It had been trivial, but as I had hand-crafted it, it had been tedious nevertheless. I was badly in need of a tool.

In a spell of recklessness I used five of the six months to analyze the problem and construct a tool, and then I spent a month building the application. It was a gamble, but it worked, the application was a success and I was in business.

I called this tool WADL (Web Application Definition Language). Just like RPCmagiX (I wrote about it in the last post), I failed to ever publish WADL, and in the meantime the name has been taken. WADL is now a W3C proposal for something like the REST equivalent to WSDL.

My “WADL” was more, much more. It was a way to specify the structure and visual details of a web application. The specification was done in XML, and with a code generator you could generate a complete application. All the pages and forms were there, they only had no content. For prototyping purposes you could associate dummy data with the pages, and this way it was possible to create a complete prototype without writing one single line of code. The pages displayed meaningful data, it’s only that the data sent from one page had no influence on the next page. In cases where the result page was determined from input data, the prototype would pop up a choice box where you could select the desired outcome.

You had one XML file for the structure of the application (the “Application Definition”) and one XML file for each page (“Page Definition”). Furthermore you could have XML files describing database structures with tables, views, foreign key relations, etc.

The application definition consisted of some application attributes (most importantly the name), the definition of roles (like “user” or “administrator”), the reference to the database definitions if any, and finally the definition of the graph of pages. There were start pages (those that you could directly address from a GET request, at least one per role) and other pages. Each page had an attribute “roles”, specifying the roles that could get to that page. Events took you from page to page, each event corresponding to a button on the page that could be pressed and that would submit a form.

Roles could overlap. Think of a system where a role “user” can search for and display data. A second role “admin” can enter new data, but of course “admin” can search and display as well. The roles overlap, “admin” shares part of his graph of pages with “user”.
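The following Java sketch models such a graph of pages with overlapping roles. It is not WADL's XML format and not WADL's code, only an in-memory illustration; the class names, the method names and the page and event names are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class PageGraphSketch {
    static final class Page {
        final String name;
        final Set<String> roles;                                  // roles allowed on this page
        final Map<String, String> events = new LinkedHashMap<>(); // event name -> target page

        Page(String name, Set<String> roles) {
            this.name = name;
            this.roles = roles;
        }

        // Adds an outgoing edge: pressing the button for `event` leads to `targetPage`.
        Page on(String event, String targetPage) {
            events.put(event, targetPage);
            return this;
        }
    }

    private final Map<String, Page> pages = new LinkedHashMap<>();

    Page page(String name, String... roles) {
        Page p = new Page(name, Set.of(roles));
        pages.put(name, p);
        return p;
    }

    // Names of all pages a role may visit, in alphabetical order.
    Set<String> pagesFor(String role) {
        Set<String> result = new TreeSet<>();
        for (Page p : pages.values()) {
            if (p.roles.contains(role)) {
                result.add(p.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PageGraphSketch app = new PageGraphSketch();
        app.page("Search", "user", "admin").on("FIND", "Results");
        app.page("Results", "user", "admin").on("BACK", "Search");
        app.page("Enter", "admin").on("SAVE", "Results");

        System.out.println(app.pagesFor("user"));  // prints [Results, Search]
        System.out.println(app.pagesFor("admin")); // prints [Enter, Results, Search]
    }
}
```

The graph of “user” is a subgraph of the graph of “admin”, which is exactly the kind of overlap described above.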

The page definitions basically described what was on the pages. There were text blocks and form blocks, and within form blocks you had form elements like input fields, text areas, select boxes, labels, grouping elements, etc. A layout generator would automatically generate a layout, conforming to our internal style guide, but the system was modular, layout generators could be plugged in, and it was even possible to use HTML templates (I called them “HTML Makeup”) on a per-page basis.

From the database definitions WADL created Perl classes, one for each table definition. A support library handled encapsulated database access.

That’s about what you got from XML alone. For everything else you had to write Perl code, but WADL generated templates to give you a quick start. You had to implement so-called “Processors” for all possible page transitions, and there were two kinds of them. In cases where the target page was determined at runtime, the processor was a single method process() in a class Processors::OriginatingPage_EVENT (an edge processor, processing the edges corresponding to a single event), and the return value of this method determined the target page. In all other cases it was a method OriginatingPage_EVENT() in a Perl class Processors::Page::TargetPage (a page processor, processing all incoming edges to a page). Here “OriginatingPage” and “TargetPage” are the respective page names and “EVENT” is the name of the event. To implement processors, you simply copied files from the generated template directory and began inserting code. This worked pretty well because, thanks to the prototyping system, basic questions about application structure could normally be answered very early.

The processors communicated with the application via generated input and output objects. Thus they did not have to care about the actual page structure. They took values from methods named like page elements, but it did not matter whether a value came from an input field or from a text box.
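To give an impression of what an edge processor looked like, here is a rough sketch. The original was Perl and used generated input and output objects; this version is Java and uses plain maps instead, so the interface, the class Login_LOGIN and the element names are assumptions made purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class ProcessorSketch {
    // Edge processor: handles one event on one originating page and decides
    // the target page at runtime by returning its name.
    interface EdgeProcessor {
        String process(Map<String, String> input, Map<String, String> output);
    }

    // Hypothetical processor for the event LOGIN on the page "Login".
    static final class Login_LOGIN implements EdgeProcessor {
        @Override
        public String process(Map<String, String> input, Map<String, String> output) {
            // The processor only knows element names, not how they are rendered.
            String user = input.getOrDefault("username", "");
            if (user.isEmpty()) {
                output.put("message", "Please enter a user name");
                return "Login";   // stay on the originating page
            }
            output.put("greeting", "Hello " + user);
            return "Welcome";     // move on to the target page
        }
    }

    public static void main(String[] args) {
        EdgeProcessor processor = new Login_LOGIN();
        Map<String, String> input = new HashMap<>();
        input.put("username", "andreas");
        Map<String, String> output = new HashMap<>();

        String target = processor.process(input, output);
        System.out.println(target + ": " + output.get("greeting")); // prints Welcome: Hello andreas
    }
}
```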

WADL: Additional Benefits

Knowing so much about an application opens up many opportunities that you probably have not even thought of. One of WADL’s most successful features was a byproduct of my curiosity. I wanted to know how much of the application I had already implemented, and so I began collecting data, but then I thought, why not visualize it?

I already knew the Graphviz project, and it was fairly simple to write a program that created an application graph for each role and a graph for the database structure. The nodes in the application graphs represented pages, the edges were events. Blue edges represented events for which processors were already in place, gray edges represented events without processors yet.
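Generating such a graph takes very little code. The sketch below is not the original program (which I no longer have at hand in this form) but a small Java illustration that emits Graphviz DOT text; the class AppGraphDot, the Edge type and the sample page names are assumptions. Only the DOT syntax itself and the color names blue and gray are actual Graphviz.

```java
import java.util.List;

final class AppGraphDot {
    // One event leading from one page to another; `implemented` says whether
    // a processor already exists for it.
    static final class Edge {
        final String from;
        final String event;
        final String to;
        final boolean implemented;

        Edge(String from, String event, String to, boolean implemented) {
            this.from = from;
            this.event = event;
            this.to = to;
            this.implemented = implemented;
        }
    }

    // Builds a DOT digraph for one role: blue edges have processors, gray ones do not.
    static String toDot(String role, List<Edge> edges) {
        StringBuilder sb = new StringBuilder();
        sb.append("digraph \"").append(role).append("\" {\n");
        for (Edge e : edges) {
            sb.append("  \"").append(e.from).append("\" -> \"").append(e.to)
              .append("\" [label=\"").append(e.event)
              .append("\", color=").append(e.implemented ? "blue" : "gray")
              .append("];\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(
                new Edge("Search", "FIND", "Results", true),
                new Edge("Results", "EDIT", "Edit", false));
        // Pipe the output into `dot -Tsvg` to render it.
        System.out.print(toDot("admin", edges));
    }
}
```

Counting the blue and gray edges in the same loop would give the kind of progress numbers mentioned below.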

Nodes were clickable, and they brought you to another graph, showing the incoming and outgoing events for that page. Here the events were clickable, and they brought you to the actual code of the processors. From there you could go on to the next page graph and so on. Essentially you could click through the source code in exactly the same way as you would navigate through the application.

This visualization proved to be one of WADL’s most successful features, because it was trivial to assess a project’s progress. You only had to look at the blue and gray edges and at some numbers in the statistics section.

Another byproduct was a code generator for database transformations. It took two database definitions and transformation rules, and from that it generated a script for transferring data from one database to another, doing all necessary transformations. Machine-readable knowledge about structure – you can do all sorts of things with it.

But there’s more. WADL was extremely easy to learn. You had guidance in XML via the DTDs, the whole project structure was generated, there were commands to generate templates for HTML makeup, and processor templates could be generated as well. The templates were commented, so it was all a matter of copying some files and filling in code where the comments hinted at it. One of our programmers had never before written a Perl application, and his first project was the biggest WADL application that was ever built: more than a hundred pages, many hundreds of processors, a three-step workflow, and at the end of it the input of roughly a hundred users was compiled into an automatically created PDF full of tables. He created the application and finished it in time, and I did the PDF creation code.

What I did was implement an HTML-to-LaTeX translator, and then we typeset the document on the fly at download time. Using HTML as input had the advantage that we could display the same code on a preview page. I took the code that I had written and made it part of WADL’s tool set.

But there’s even more. WADL automatically structured the projects. You never had to write any plumbing code, and everything was always the same, regardless of project, regardless of project phase. Adding functionality might mean adding a page and some processors, but that never complicated the project. Each new page had the same complexity, and each processor needed the same effort as any other processor. WADL scales linearly.

I can’t imagine any pure library system or any pure framework that could ever scale that way. They can’t, because you always have to write plumbing code, code that’s repetitive and tedious to write, and whenever you do it, you do it in a slightly different way. That is precisely what complicates things, because what should be similar becomes different, and over time it turns the project into a maintenance nightmare.

It’s a hard fact: such code should not be written by hand. It can’t be simplified, because it is complicated by nature. It can’t be packed into libraries, because although it follows patterns, it is never exactly the same. It’s similar structure we’re talking about, not sameness of code. There’s nothing to be factored out.

While RPCmagiX (see the last post) was a tool to create the ideal library for your interface, WADL was a tool to create the ideal framework for your application. Both would have been impossible without code generators.

WADL is still in use, but I did not get the funding to keep it current. It is pretty outdated now. The base mechanism is still CGI, which means one process per request. We have no AJAX support, and I have never found a good way to keep up with .NET’s excellent support for SOAP.

And Now?

If I were to implement WADL now, I would use the Java Enterprise Edition as its basis. It implements all that a big, scalable application could possibly need, it does so in quite an elegant way, and this basis would also make it more acceptable to management, because it would be less of a risky, exotic solution tied to one person.

It’s only that I am not interested in WADL any more. I have solved it once, I could do it again, but there would be no challenges, no surprises, only tedious work. I intend to aim higher. How high, that’s what we will find out as this blog develops, as a plan begins to form, as I get input, as we discuss these matters. A first sketch will follow soon.

Mar 20, 2010
 

Promises

Every two or three years we see the Next Big Thing. We had programming languages, Structured Programming, Object Orientation and all that stuff, and more recently things like Agile Software Development or SOA. Each time we could watch an industry develop around it, each time our management bought into it, and each time things got … easier???

Yes and no. There is certainly an evolutionary process, and it is completely obvious that today’s software development methods are way ahead of what was done 50 years ago. But then, the industry favors what can be sold, and this brings its own share of problems.

A Look Back

20 years ago I wrote my programs in C. Now I use Java and sometimes C#; in between I used Tcl/Tk and Perl a lot.

Scripting languages were a big step from C. Have you ever used regular expressions or hash tables in C? They are implemented as libraries, and using them never feels natural. That’s very different in modern scripting languages, where both are part of the language.

Java is a similar step forward, only in a slightly different direction. It not only releases me from the burden of keeping track of the memory allocated by my programs, it’s also strongly typed, and that makes a difference in big projects and in tool support. I use Eclipse as my Java IDE, and once you get used to that, there is no way back.

And still: although the tools have become so much better, the computers so much faster, we still struggle with the same class of problems we had 10 or 20 years ago.

RPCmagiX

In 1990 I began playing around with Remote Procedure Calls. Distributed Computing and Client/Server were big buzzwords then. It took me about seven years to come up with a solution that also became my master’s thesis. The tool, RPCmagiX, was meant to be published as free software, but in the end I failed at that. It was used internally though, and many of the programs made with it are still in production.

RPCmagiX was all about maximum simplification. Using RPC systems like DCE or ONC, you had to write a lot of boilerplate code, and you had to do it over and over again. Different RPC systems were incompatible not only at the network protocol level, but also in the way programs had to be written to use them.

RPCmagiX did away with all of that. It used a nice unified model, and all you had to do was write a subroutine that would run in the server, along with a client that called the subroutine. You used a graphical tool to define the interface, the data types, and the procedures with their parameter lists, and you selected programming languages for client and server. The tool would then generate a library to be linked to the client, and a library including a main program that could be linked to the procedure implementations and started as the server program. The only thing that client code used to refer to the service was the first parameter of every procedure, a so-called service name, basically a structured name modeled after UNIX file names. The generated client stub and the client library used this name to look up matching server instances, connected to servers, handled automatic failover, etc. As a programmer, you did not have to write a single line of network code, and you had the additional benefit of being able to make cross-language calls. For instance, we had Windows clients in Visual Basic calling UNIX servers written in Cobol.

RPCmagiX did a great job hiding the whole complexity of distributed computing behind the interface. By combining an interface specification with code generation, it gave us lots of additional benefits as well: you could generate data for statistical use, you could send notifications in case of failures, you could have load balancing, and all that in a standardized way. This was possible because the system knew about the interface and could use all the information in it, while the actual application code was not at all entangled with networking concerns.

Using such a system gives you a clear boundary between what you use and what you write. If your application can do networking by knowing no more than the simple name of a service, if you can simply call a remote procedure without having to initialize network libraries, create stubs and connect them to sockets, if calling a remote procedure is a one-liner, only then do you have a chance to later replace your networking system if you need to.
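The following sketch shows the idea from the caller's point of view. It is not RPCmagiX code: the real system generated stubs for languages like C, Cobol or Visual Basic and talked over a network, whereas this Java example keeps everything in one process. The class name, the add() procedure, the service name "/math/adder" and the map used as a lookup service are all made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.function.IntBinaryOperator;

final class ServiceNameStubSketch {
    // Stand-in for the lookup mechanism: service name -> server instances.
    // In a real system these would be remote endpoints, not local functions.
    private final Map<String, List<IntBinaryOperator>> registry;

    ServiceNameStubSketch(Map<String, List<IntBinaryOperator>> registry) {
        this.registry = registry;
    }

    // What a generated client stub might look like: the service name is the
    // first parameter, everything else is the procedure's own parameter list.
    int add(String serviceName, int a, int b) {
        List<IntBinaryOperator> instances = registry.getOrDefault(serviceName, List.of());
        RuntimeException lastFailure = null;
        for (IntBinaryOperator server : instances) {
            try {
                return server.applyAsInt(a, b);
            } catch (RuntimeException e) {
                lastFailure = e; // failover: try the next instance
            }
        }
        throw new IllegalStateException("no server available for " + serviceName, lastFailure);
    }

    public static void main(String[] args) {
        IntBinaryOperator broken = (x, y) -> { throw new RuntimeException("connection refused"); };
        IntBinaryOperator healthy = (x, y) -> x + y;
        ServiceNameStubSketch stub =
                new ServiceNameStubSketch(Map.of("/math/adder", List.of(broken, healthy)));

        // The application's entire knowledge of networking is this one line.
        System.out.println(stub.add("/math/adder", 2, 3)); // prints 5
    }
}
```

Because the application sees nothing but the service name and the procedure, the lookup and transport behind add() can be exchanged without touching the calling code.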

At that time we were using ONC (aka Sun RPC), because it was a free alternative to the then expensive DCE (OSF Distributed Computing Environment). We prepared to eventually make the switch to DCE, which we actually never did, but DCE largely failed anyway. Such is life :)

Whatever. Responsible application design must try to avoid creating new systems that immediately become the legacy systems of the future. We can do this by separating application logic from infrastructure code. If we do that, we can change infrastructure providers. Not entirely without effort, because new technologies must still be tested and encapsulated, but at least it is possible. Doing this, we protect our investment in analysis and avoid writing the same systems every ten years, over and over again.

The State Of The Industry

Frankly, I would have expected the industry to come to similar conclusions, but interestingly enough it did not. We still get library system after library system, and we still have to use factories to create objects that we need to initialize in order to make them do their tasks. We see progress with technologies like the Java Enterprise Edition (JEE) or Spring, where we have Dependency Injection, but although I am very impressed with JEE and the elegant solutions they have come up with in releases 5 and 6, there are plenty of other fields where it is as bad as it ever was.

The reason is simple. Consensus in the industry is always the result of political struggle, very often even of outright war. It’s Betamax/VHS/Video 2000 all over again. Remember HD DVD vs. Blu-ray? Sometimes the better system wins (which may have been the case with Blu-ray), sometimes it’s the most inferior one (as was certainly the case with VHS), but in no case is it due to technical merits.

And it gets even worse when you look at how buying decisions are made and who makes them. Wave after wave of management fashions washes over our heads, and although we are lucky and many of the latest buzzwords never leave management circles, some do, and then we are confronted with requests such as building a Service Oriented Architecture (SOA) based upon an Enterprise Service Bus. And all that because some clever consultants sold management the idea that such a mythical beast would allow non-programmers to click together meaningful applications from a host of useful generic services.

Of course this won’t happen and to a programmer, the whole idea looks ridiculously naive, but just having to constantly evaluate the currently fashionable paradigm and to cope with the ever immature and unreliable tools, just the pure effort to implement applications on poorly understood foundations, just this does all the damage.

The result is a succession of applications that have nothing in common, have no consistent architecture, are by necessity entangled with the underlying technology, and that is because they always begin as prototypes. Invariably these applications are typical first applications in a new technology. They are poorly structured, riddled with workarounds for features that the underlying technology does not correctly implement at that time, and even that is only under ideal conditions. Add some organizational incompetence and you have a poisonous brew.

Any Way Out?

I don’t think there is much to expect from the industry. Too many players, too much politics, and obviously a solution to these problems is not seen as a necessity. On the contrary: the industry makes much of its profit exactly by perpetuating the status quo. Thus we have to look elsewhere.

I have no readily available solution either. I had one, RPCmagiX, for a very narrow field, and I had another, broader but still limited solution for creating web applications (I’ll come to that in another post), but both are currently outdated. I didn’t get the resources to maintain and develop them further, but then, I have learned a lot in the process of creating them, and I plan to bring those experiences into a new process and toolset for application design. I will work on that on my own, in my free time, and I will document what I think about and what I do on this blog.

The results of my work, if any, will be published as Free Software, and from a certain point on I will try to find people who will join me in creating this work. I have no schedule though. If I were you, I would not bet on me finishing anything for you to use in a certain project. This is very much research, and since it is done in my free time, I can’t promise anything. On the other hand, we all may learn something along the way, and even if it all comes to nothing, we may at least have some fun trying :D