Version 9

    Max's Demon

     

     

    Problem Statement

     

    The artifacts desired by humans developing software are often very different from those best suited to computers executing the software. A simple example is the difference between Java source and byte code. A human creating software using Java technology could choose to write in byte code but, for a variety of reasons, finds Java source code easier. On the other hand, the Java Virtual Machine could, in theory, execute Java source code directly but, again for a variety of reasons, it is better to compile Java source code into byte code for execution. This probably seems completely obvious to developers, and hardly worth any effort to explain or consider. Lurking within this system, however, are a number of constraints that are often hard to satisfy, and which cause problems when the same pattern is applied to developer tooling in general.

     

    Background

     

    For the purposes of this article, we will use the general (intentionally vague) term 'translation' to refer to cases where one artifact type is changed into another artifact type. 'Translation' is also called 'generation,' 'mapping,' or 'compilation.' For example, the Java compiler translates Java source code into Java byte code. The vagueness of 'translation' here will abstract away details such as whether the translation is done by the computer or by a human (or perhaps a mixture of both), when exactly the translation occurs (e.g. as a separate step during development or on an as-necessary basis during execution), and precisely what happens during the translation (e.g. what additional information is injected or what sorts of optimization are performed). Rather, the key notion for this article is that 'translation' means 'changing one artifact type into another.' Finally, and regardless of the reasons, we will assume that translation is required or desired in a number of cases.

     

    Likewise, the term 'developer' is intended to be vague. We mean to include a range of skill levels/roles/experience within this general catch-all term. So, a 'developer' might be a Java programmer, a web programmer, or a business analyst (creating, for example, business process models or defining data transformations).

     

    Impact of Translation

     

    The act of translation is not the final step in development, except in those exceedingly rare cases where software doesn't have any bugs, errors during execution, or other deficiencies (e.g. performance). Rather, it is often necessary to understand the execution state of one artifact in terms of the artifact that it was translated from. A simple example is a Java run-time exception: at a minimum the error message is expected to be human readable (not expressed in terms of byte code) and, in common development scenarios, further information (such as line numbers) is mapped back into the Java source code. This allows developers to understand Java run-time errors in terms of the Java source code. It is difficult to imagine Java being as popular as it is if developers had to debug Java in byte code. Yet, situations similar to 'debugging in byte code' are not uncommon.

     

    Java can enable developers to debug byte code in terms of Java source code because Java technology provides linkage from Java byte code to Java source code. That is, the translation between Java source and byte code is bidirectional. We can think of this as Java technology providing one translation from Java source to byte code and another translation from byte code back to the Java source code that it was derived from. (It is also possible, with varying degrees of success, to 'disassemble' Java byte code back into Java source code, but that is a slightly different topic.)
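
    As a minimal sketch of this linkage, the following Java program provokes an exception and inspects the top frame of its stack trace. The class name LineLinkDemo and the helper divide are hypothetical names chosen for illustration; StackTraceElement and its accessors belong to the standard java.lang API. The file name and line number are available only when the class file carries that debug information, which javac records by default.

```java
public class LineLinkDemo {
    // Hypothetical helper: fails on purpose so that a stack trace exists.
    static int divide(int a, int b) {
        return a / b; // throws ArithmeticException when b == 0
    }

    // Returns the top stack frame of the exception raised by divide(1, 0).
    static StackTraceElement topFrameOfFailure() {
        try {
            divide(1, 0);
            throw new IllegalStateException("expected an ArithmeticException");
        } catch (ArithmeticException e) {
            // Frame 0 is the method in which the exception was thrown.
            return e.getStackTrace()[0];
        }
    }

    public static void main(String[] args) {
        StackTraceElement frame = topFrameOfFailure();
        // The byte code failed, yet the report is phrased in source terms:
        // class, method, source file, and source line.
        System.out.println(frame.getClassName() + "." + frame.getMethodName()
                + " (" + frame.getFileName() + ":" + frame.getLineNumber() + ")");
    }
}
```

    If the debug information is stripped (for example by compiling with javac -g:none), getFileName() returns null and getLineNumber() returns a negative value: a small-scale instance of the linkage back to the source being lost.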

     

    What happens when the translation is not bidirectional? In short, information is lost in translation (strictly speaking: information present during the process of translation is not preserved after the translation is complete). Because of this missing information, the translated artifact cannot be understood, except at a general level, in terms of the source artifact. The developer is then thrust into the domain of the translated artifact. This is not appropriate: if the developer were willing to work in the domain of the translated artifact, then why not start there from the beginning?
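
    The difference can be sketched with a toy translator. Everything here is hypothetical and invented for illustration: the two-operation source language ("add n", "div n", with "#" comments), the class ToyTranslator, and the keepLinkage flag. The translator drops comments and blank lines, so instruction positions no longer match source line numbers; the only way to report a failure in source terms is to carry the source line along with each instruction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ToyTranslator {
    /** One translated instruction; sourceLine is -1 when linkage was discarded. */
    static final class Instr {
        final String op;
        final int operand;
        final int sourceLine;

        Instr(String op, int operand, int sourceLine) {
            this.op = op;
            this.operand = operand;
            this.sourceLine = sourceLine;
        }
    }

    /** Translates source lines to instructions, optionally keeping the line linkage. */
    static List<Instr> translate(List<String> source, boolean keepLinkage) {
        List<Instr> out = new ArrayList<>();
        for (int i = 0; i < source.size(); i++) {
            String text = source.get(i).trim();
            if (text.isEmpty() || text.startsWith("#")) {
                continue; // comments and blank lines produce no instruction
            }
            String[] parts = text.split("\\s+");
            int line = keepLinkage ? i + 1 : -1; // source lines are 1-based
            out.add(new Instr(parts[0], Integer.parseInt(parts[1]), line));
        }
        return out;
    }

    /** Runs the instructions on an accumulator starting at 100; returns a report. */
    static String run(List<Instr> program) {
        int acc = 100;
        for (int pc = 0; pc < program.size(); pc++) {
            Instr in = program.get(pc);
            if (in.op.equals("add")) {
                acc += in.operand;
            } else if (in.op.equals("div")) {
                if (in.operand == 0) {
                    // With linkage the report is in source terms; without it,
                    // only the position in the translated artifact is known.
                    return in.sourceLine > 0
                            ? "division by zero at source line " + in.sourceLine
                            : "division by zero at instruction " + pc;
                }
                acc /= in.operand;
            } else {
                return "unknown operation '" + in.op + "'";
            }
        }
        return "ok, result " + acc;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("# start from 100", "add 5", "", "div 0");
        System.out.println(run(translate(source, true)));  // division by zero at source line 4
        System.out.println(run(translate(source, false))); // division by zero at instruction 1
    }
}
```

    In the second run the developer is told about "instruction 1," a position that exists only in the translated artifact. Finding the offending source line then requires knowing how the translator works, which is exactly the domain the developer was trying to avoid.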

     

    The Challenge

     

    Imagine if, when Java code is executed, any errors or run-time monitoring details were expressed as an array showing the state of each bit in memory and on the CPU. Java developers would find this extremely difficult to accept, and even the very few who did manage to learn debugging at this level would constantly need to switch from the problem domain (expressed in the Java source code) to the low-level machine domain. Clearly this would not be an acceptable situation.

     

    The requirement from the above is simple to state, but hard to realize in practice:

     

    • Translations from one artifact to another must be as closely bidirectional as possible.

     

    Why 'as closely bidirectional' and not more strictly 'bidirectional?' Detailed examples are beyond the scope of this article, but there are often cases where only an approximate mapping for specific parts of the artifact is possible. (That is, 'bidirectional' does not mean 'one-to-one.') In these cases, every effort should be made to keep the relationship as close as possible.

     

    The insidious aspect of the translation problem is that it pushes developer tooling into extremes:

     

    • Translation itself can be a hard problem, let alone the requirement for bidirectional support. The temptation then becomes to avoid these problems completely. Doing so, however, often means that developers are forced to work at a level that is not productive. (A common example: editing XML files is often seen as 'easy' except by those developers who do not know, and do not wish to know, XML.)

     

    • Translation is done, but is not bidirectional. On the surface this looks better than the other extreme, since at least it allows developers to work at a more appropriate level. The illusion of productivity disappears the moment the developer has to interact with the translated artifact during a debug session.

     

    Thus, the fundamental challenge that Max's Demon gives us:

     

    • Write tools in the developer's domain, translate to run-time artifacts, and let the developer understand run-time execution state in terms of the developer's domain.

     

    In the most general case, this is a very hard problem. Excellent tooling does not arise from avoiding challenges; it comes from managing complexities.