Freitag, 19. Oktober 2007

Textual Editing Framework - First Release and Tutorial

It took me quite a bit longer than expected, but finally there is something that everyone of you can try. There is now an eclipse update-site to install a pretty early but functional version of TEF, and a tutorial that helps you with the first steps. If you're interested, everything is laid out in great detail on our TEF web-site.

Donnerstag, 16. August 2007

About the Who is Who in Domain Specific Languages

In this post we try to get a different perspective on what domain specific development/languages can be by looking at the roles involved.

Roles and stakeholders in traditional software engineering are pretty simple. We create and use software. You have the software developer and you have the software user. One is the technical genius that creates the product; the other simply uses it.

In software development with domain specific languages, things become more complicated. Since we create languages, use languages to create software, and use software, we need more than two roles. Now we have the language developer, the language user/software developer, and the user. The language developer still needs to be a technical genius; he creates the DSLs that make all the domain concepts usable for anybody. Then we have the language user; he is either a technician or a domain expert. The user stays the user, simply using the end product.

Who is the language user/software developer/domain expert? First we should separate all the roles into two different sets. The competence perspective: computer scientist, domain expert, user. The computer scientist knows about platforms, programming, and software. The domain expert knows his domain; he knows about its concepts and applications. The user knows nothing, except that he needs this software. The other role set comes from the language perspective: language engineer, language user, user.

Now we consider different scenarios with different individuals and distribute the roles from both sets among the people involved:
  • Software development with specialised languages: DSLs are made for more efficient software development. The domain is software development itself. We have the traditional role allocation. The software developer is computer scientist and domain expert at the same time. The software developer plays both roles, language engineer and language user. The software user is the user. Example DSLs are Makefiles, JET, JSP, etc.
  • Domain specific development: DSLs are made to integrate the user into the software engineering process. The computer scientist constrains himself to his core expertise. He simply takes the domain concepts, of which he has no idea how they are used, and "computerizes" them: binds them to a specific system platform, builds some GUI around them, etc. The computer scientist is not a software developer anymore. The user is the one who knows the domain, who knows how to use the concepts. Software user and domain expert are the same stakeholder. The user is also the language user and the software user. Example DSLs are made for simulations in sciences like physics, biology, or chemistry, or for creating specific forms of data, as in geo-information systems or gene databases.
  • DSL based software engineering: Here we have three stakeholders. The computer scientist is the language engineer. The domain expert is the software developer and language user. The user is the software user. An example is the famous traffic light control language example.

Dienstag, 3. Juli 2007

From What We Code Complete

Code completion is an important part of modern development environments for textual languages. Over the past decades, the proposals offered to the user during code completion have evolved from simple hippie completion (each word in the open document is used as a proposal) to sophisticated proposals rooted in the semantics of the language used. But when we say code completion is based on language semantics, what semantics is suitable to provide code completion? For which languages can we or can we not offer code completion?

The "Normal" Case, e.g. A Single Java File

In any case, and therefore in the normal case, we need a more sophisticated model of the written text than the text itself. The edited text has to be parsed and transformed into a model that allows semantic analysis of the text. One possibility is to use a parse-tree and a library that allows resolving variables, types, and basically everything that can be referenced and is therefore a possible subject for code completion. Based on that, a single edited file can be parsed, and the information gathered can be used for code-completion proposals within that file.
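
To make the idea concrete, here is a deliberately simplified sketch (not any real IDE's implementation): a toy "parser" extracts declared names from a single source file into a minimal model, and proposals are the declared names matching the typed prefix. The regex-based extraction stands in for a real parse-tree and resolver.

```java
import java.util.*;
import java.util.regex.*;

// Simplified sketch: build a tiny model of declared names from one source
// file and use it for prefix-based code-completion proposals.
public class SingleFileCompletion {

    // Toy "parser": extracts variable names from lines like "int count = 0;".
    static Set<String> declaredNames(String source) {
        Set<String> names = new TreeSet<>();
        Matcher m = Pattern.compile("\\b(?:int|String|double)\\s+(\\w+)").matcher(source);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Proposals are the declared names that start with the typed prefix.
    static List<String> propose(String source, String prefix) {
        List<String> proposals = new ArrayList<>();
        for (String name : declaredNames(source)) {
            if (name.startsWith(prefix)) proposals.add(name);
        }
        return proposals;
    }

    public static void main(String[] args) {
        String src = "int counter = 0;\nString countryCode = \"DE\";\nint limit = 10;\n";
        System.out.println(propose(src, "cou")); // [counter, countryCode]
    }
}
```

A real environment would of course replace the regex with a full parse and resolve types as well as names, but the pipeline (text, then model, then proposals) is the same.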

Using a Model That Spans Multiple Sources, e.g. A Java Project

We use the same technique as in the normal case, but we don't use just the model of one file. All the models of all the files organized in a project are combined. This means not only all the source files, but also names and types from library files, compiled object files, etc.

Using Context Information from External Models, e.g. OCL

Some languages, like OCL, use not only the names and types that are defined using the language itself. OCL, for example, is used to define expressions over models. It therefore references into these models using names from these models. Code completion in OCL often means proposing the names of properties and operations from external models.

Using Context Information from a Runtime System, e.g. Python

Some languages have only weak or no static types. This means that when you edit a program in such a language before the program runs, you don't exactly know what the types of your variables, parameters, etc. are. And with no information about the types, you don't know the properties, operations, or whatever features the values in your variables might provide. In such cases the development environment has to allow editing the program at runtime. So you run your program, stop the execution at some point, and then start to edit your program. Now you have a completely different situation. Since you are at runtime, the environment can read the types from the current values of variables. From this runtime context an editor can provide code completion using runtime types.
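
The principle can be illustrated even in a statically typed host language: once we hold a live value, reflection reveals its actual runtime type and members, exactly the information an editor paused at runtime would use for proposals. (Dynamic languages like Python do the same with their introspection facilities.) This is only an illustrative sketch, not a real editor integration.

```java
import java.lang.reflect.Method;
import java.util.*;

// Sketch: derive completion proposals from the *runtime* type of a value,
// which may be more specific than its declared static type.
public class RuntimeCompletion {

    static List<String> proposals(Object value, String prefix) {
        Set<String> names = new TreeSet<>();
        for (Method m : value.getClass().getMethods()) {
            if (m.getName().startsWith(prefix)) names.add(m.getName());
        }
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        Object v = "some runtime value";          // static type Object, runtime type String
        System.out.println(proposals(v, "subS")); // [subSequence]
    }
}
```

Statically, `v` is just an `Object`; only at runtime do we learn it is a `String` and can propose `String` operations.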

Conclusion

Depending on the language semantics, the sources for code-completion proposals vary. This, of course, can make developing code completion more or less challenging. But you probably have to search really hard to find a language that would not allow any intelligent code completion at all.

Donnerstag, 21. Juni 2007

Different Ways to Define the Operational Semantics of a Language Based on its Meta-Model

Since meta-modelling became a successful method to define abstract language syntax, many research groups have started to define the operational semantics of a language based on a meta-model as well. A variety of methods have been developed. This article briefly explains three different categories of methods and asks which of the three approaches is the best one.

1. Structural Operational Semantics and Meta-Modelling

What is the abstract structure of operational semantics definitions? Plotkin's [1] classical approach to operational semantics is based on state transition systems. All the possible configurations of a runnable system are defined through states. There are transitions between states; they define which configuration may be followed by which other configuration. A system behaviour is one path leading from a starting configuration, following transitions. A language semantics can then be defined with (1) a set of configurations/states (this includes all possible programs plus runtime information like variable assignments, memory, etc.); (2) possible transitions between configurations; and (3) something that selects one (or more [non-determinism]) behaviours, where the initial configuration/state is given by a program or model written in the language.
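
A minimal sketch of this structure, reduced to the bare essentials: a configuration, a transition function, and a run that follows one path until no further transition applies. The "language" here is just a hypothetical counter language with a single `inc` instruction; the configuration is the remaining program position plus the variable store.

```java
import java.util.*;

// Plotkin-style operational semantics as a transition system, in miniature.
public class TransitionSystem {

    // Configuration = program counter position + variable store.
    static class Config {
        final int pc;
        final Map<String, Integer> store;
        Config(int pc, Map<String, Integer> store) { this.pc = pc; this.store = store; }
    }

    // One transition: execute the instruction at pc; null means a final state.
    static Config step(List<String> program, Config c) {
        if (c.pc >= program.size()) return null;
        Map<String, Integer> store = new HashMap<>(c.store);
        String[] instr = program.get(c.pc).split(" ");   // e.g. "inc x"
        if (instr[0].equals("inc")) store.merge(instr[1], 1, Integer::sum);
        return new Config(c.pc + 1, store);
    }

    // One behaviour: the path from the initial configuration along transitions.
    static Map<String, Integer> run(List<String> program) {
        Config c = new Config(0, new HashMap<>());
        for (Config next = step(program, c); next != null; next = step(program, c)) {
            c = next;
        }
        return c.store;
    }

    public static void main(String[] args) {
        Map<String, Integer> end = run(Arrays.asList("inc x", "inc x", "inc y"));
        System.out.println(end.get("x") + " " + end.get("y")); // 2 1
    }
}
```

The three ingredients from above map directly onto the code: `Config` is (1), `step` is (2), and `run` is (3), here selecting the single deterministic behaviour.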

How can operational semantics be defined based on a meta-model? Meta-modelling allows defining configurations/states. A meta-model is an object-oriented data model that describes a set of models consisting of a finite number of objects, links between objects, and attributes of objects. Meta-models can be used to define the abstract syntax of a language (remember that the model or program is one part of a configuration/state), and they can also be used to define data-structures that hold runtime information (like variable assignments, memory, etc.).

2. Three Ways of Defining Transitions and Behaviours Based on a Meta-Model

Even though we know that configurations can be described by meta-models, there are still several ways to define possible transitions and system behaviours. We want to briefly introduce three different approaches.

2.1 Code-Generation

Based on the language concepts given in a meta-model, a code generator can be used to generate executable code from a model. This code can use a mixture of model and programming-language variables/states to define a system configuration. The executed code describes state changes by changing the model and/or changing variables within the generated program.

2.2 Action Languages

Meta-modelling defines a set of predefined actions. These actions are things like: create an object, set an attribute, create a link, delete an object, etc. These actions describe very small, atomic changes in a model. Actions can be scheduled from various forms of action languages. Examples are UML activities or most imperative programming languages. Such languages use expressions over models to make decisions and to create parameters for the actions. Blocks written in those action languages can be linked to the meta-model on two modularisation levels. One possibility is that operations are implemented in an action language. In this case one operation implementation can call other operations. The whole semantics description feels and behaves like a normal object-oriented program. The other possibility is to provide a behaviour implementation for classes, where each object (class instance) runs its behaviour once it is created/instantiated.
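
A hypothetical sketch of these atomic actions, stripped of any real meta-modelling framework: the "model" is a bare object store, and each action performs one small change. A real framework would additionally type-check every action against the meta-model.

```java
import java.util.*;

// The atomic actions an action language schedules: create an object,
// set an attribute, create a link.
public class ModelActions {

    final Map<String, Map<String, Object>> objects = new LinkedHashMap<>();

    void createObject(String id) {                        // action: create an object
        objects.put(id, new LinkedHashMap<>());
    }

    void setAttribute(String id, String attr, Object v) { // action: set an attribute
        objects.get(id).put(attr, v);
    }

    void createLink(String from, String ref, String to) { // action: create a link
        objects.get(from).put(ref, objects.get(to));
    }

    public static void main(String[] args) {
        ModelActions m = new ModelActions();
        m.createObject("state1");
        m.createObject("state2");
        m.setAttribute("state1", "name", "initial");
        m.createLink("state1", "next", "state2");
        System.out.println(m.objects.get("state1").get("name")); // initial
    }
}
```

An action language (UML activities, or plain imperative code as in `main` above) then schedules these primitives and computes their parameters from expressions over the model.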

2.3 Model Transformations

Another approach is model transformation. Using a transformation language, one can define more complicated model transitions. We focus here on rule-based transformations, where each rule describes a transition between two models. Such a rule usually selects a part of the model that shall be replaced (lhs) and a sub-model (rhs) that is used to replace the lhs. Model transformations allow more complex, user-defined transitions between configurations/states.

Several ways of scheduling exist for rule-based transformations as well. For example, transformations can be choreographed using action languages, where the execution of a transformation rule becomes an action. Transformation rules can also be grouped with logical operations: one rule can only be applied when another rule has already been applied, or several fireable rules are alternatives to each other, where one of the rules is selected either non-deterministically or via rule prioritisation.
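
The prioritisation variant can be sketched as follows. For brevity the "model" is reduced to a string rather than an object graph; each rule has a left-hand side to match and a right-hand side to substitute, and the scheduler fires the first applicable rule in priority order until none applies, i.e. a normal form is reached. This is an illustrative toy, not a real transformation engine like QVT.

```java
import java.util.*;
import java.util.regex.Pattern;

// Rule-based rewriting with priority-ordered rule selection.
public class RuleScheduler {

    static class Rule {
        final String lhs, rhs;
        Rule(String lhs, String rhs) { this.lhs = lhs; this.rhs = rhs; }
    }

    // One transition: fire the highest-priority applicable rule, or null.
    static String step(List<Rule> rulesByPriority, String model) {
        for (Rule r : rulesByPriority) {
            if (model.contains(r.lhs)) {
                return model.replaceFirst(Pattern.quote(r.lhs), r.rhs);
            }
        }
        return null; // no rule is applicable: a normal form is reached
    }

    static String normalize(List<Rule> rules, String model) {
        for (String next = step(rules, model); next != null; next = step(rules, model)) {
            model = next;
        }
        return model;
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(
                new Rule("aa", "a"),  // higher priority
                new Rule("ab", "b")); // fires only when "aa" does not match
        System.out.println(normalize(rules, "aaab")); // b
    }
}
```

Replacing the priority loop in `step` with a random choice among all applicable rules would give the non-deterministic scheduling mentioned above.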

3. What is the Best Way of Describing Operational Semantics?

Maybe we should first ask: is there a best way of describing operational semantics? All the methods that we just described, with all their variations that exist out there, have their own advocates, convinced of the advantages of their method. All research groups also have their own examples that seem to prove how efficient and elegant their methods are. But how can we really determine what the characteristics of a method are and for what kind of languages each method is the most suitable?

We plan to take a genuine DSL that was developed at our institute and create operational semantics for that language with the three different approaches. The researched DSL describes stream-based data processing. It already has an operational semantics based on Scheme code generation. During our studies on an agile language engineering process, we will create an action-language-based operational semantics with the method presented here. Furthermore, we will create a transformation-based operational semantics using a QVT implementation.

We expect to see that different parts of the language will cause trouble when implemented with the three different methods. We hope that we can identify certain characteristics of language concepts that may favor one method or the other. Furthermore, we expect the three solutions to have different qualities when it comes to the size of the semantics description, execution performance, scalability, reusability, the ease of combining the language with others or with a concrete system platform, and so on.


Dienstag, 19. Juni 2007

Problems for Textual Model Notations

Several frameworks and meta-tools to combine context-free text notations with meta-models exist. But current research seems to be focused mainly on those syntactical structures that can be defined solely by context-free grammars. It is ignored that most languages, even though defined with context-free grammars, contain non-context-free features. This is especially troublesome when you accept that meta-models describe certain non-context-free features and therefore provide a different expressive power than grammars.

1. Textual notations are more than context-free syntax


One could say that meta-models, in the sense of object-oriented data-models, e.g. class diagrams, were introduced to get a hold on the definition of graphical modelling languages. No matter whether this reflects the historical facts or not, meta-modelling allows abstract syntax definitions that are fundamentally different from definitions based on context-free grammars. Where such grammars basically define ordered tree structures (better classified as term-like structures), meta-models describe graph-like structures.

Nowadays, with all the fancy meta-model based transformations, code-generation, model analysis, testing, integration, ..., and MDA tools available, we like to use meta-modelling for the definition of all kinds of languages. This also includes traditional text-based languages. Therefore, meta-modelling has eventually entered the technological space of textual syntax, which makes dealing with the differences between grammars and meta-models an important research subject.

There are two different approaches to meta-model based text languages. In the first, we still think of the language in terms of grammars and create a meta-model from a grammar. This way, we subsequently attach all meta-model benefits to a regular context-free-grammar-based language [xText, grammarware]. The second approach starts with the understanding that a language is best designed by concept, in a meta-modelling fashion. Here we create a textual notation for a previously developed meta-model [TCS, TEF].

Both approaches have achieved promising results when it comes to reflecting the expressive power of grammars. In other words, as long as we are concentrating on terms, creating models from text and text from models is implementable. This is usually based on some kind of mapping between grammars and meta-models.

As in traditional compiler design, syntax analysis (dealing with text based on a grammar) is followed by static semantic analysis (dealing with language features that exceed the power of context-free grammars). From the meta-modelling perspective this means two things. First, meta-models define references that allow them to describe graphs instead of trees. Second, meta-models define model conditions, for example OCL constraints, that further narrow what valid meta-model instances are. Both aspects have corresponding concepts in the grammar world. The first one can be called name resolution: a resulting parse tree is augmented with attributes which connect nodes in the parse tree and basically turn this tree into a graph. The second are well-formedness rules that are based on the same principles as constraints in meta-modelling.

Consequently, the next research question for both approaches (meta-models from grammars as well as textual notations for a meta-model) is how to integrate name resolution and other static semantic checks.

2. Going beyond context-free syntax

Our framework [TEF] allows creating eclipse-based text editors from a meta-model. Each of these editors continuously parses the user input and creates a model conforming to the meta-model from it. This part of TEF is based on work in [TCS, grammarware] and basically covers everything that is context-free. The question now is how we incorporate the non-context-free parts.

We divided the text reconciliation process into three phases. The first one is parsing, creating a model that reflects the tree structure of the text. The second phase is name resolution. It transforms the model into a graph structure. In the third phase we check all constraints on the model. In all phases we report errors in the model back to the user, eclipse-like: as a red underline with a hover message and a ruler marking. Of interest for us now are the name-resolution and constraint-checking phases.

2.1 Name resolution

At the moment TEF allows providing a Java implementation of a name resolution algorithm. This implementation has to work on the following parameters: a model that does not yet contain references, a model element that describes the name that is to be resolved (a single name or a more complex identifier), and the surrounding context element. Name resolution has to return the element that is referenced by the name. In the future we will try to replace these Java implementations with OCL expressions. This allows describing name resolution on a higher level of abstraction. The challenge is to provide a technique that is hopefully independent of the actual method of name resolution. Such methods could be symbol tables, local search, inside-out, recursive descent, etc.
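
To give an impression of what such a Java implementation looks like, here is a simplified sketch (not TEF's actual API): given a context element and an unresolved name, search the local scope first, then walk outward through enclosing scopes. This corresponds to the "inside-out" / local-search family of methods mentioned above; the `Scope` class and its fields are illustrative inventions.

```java
import java.util.*;

// Inside-out name resolution over a chain of nested scopes.
public class NameResolution {

    static class Scope {
        final Scope parent;
        final Map<String, Object> declarations = new HashMap<>();
        Scope(Scope parent) { this.parent = parent; }
    }

    // Returns the declared element the name refers to, or null if unresolved.
    static Object resolve(Scope context, String name) {
        for (Scope s = context; s != null; s = s.parent) {
            Object target = s.declarations.get(name);
            if (target != null) return target;
        }
        return null; // unresolved: the editor would report an error here
    }

    public static void main(String[] args) {
        Scope global = new Scope(null);
        Scope method = new Scope(global);
        global.declarations.put("max", "field max");
        method.declarations.put("i", "local i");
        System.out.println(resolve(method, "max")); // field max
    }
}
```

A symbol-table-based method would precompute the same information in one pass; the contract (context plus name in, referenced element out) stays identical, which is exactly why a higher-level OCL description of resolution seems feasible.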

2.2 Code-Completion

Closely related to name resolution is code completion. Each name (reference, identifier, etc.) that the user may write in an editor is potentially subject to code completion. Thus, code completion refers to completing a name. Code completion therefore means providing a set of valid names/references in a given context. There are two problems to be solved.

First, it is not clear from the context what kind of reference is to be completed. For example, think of Java: when you start to type an identifier in an expression, it is not clear whether it becomes a field name or an operation name. This can be solved by parsing to the point of code completion and then analysing the current parse stack to retrieve the possible reductions. These reduction possibilities reflect the possible kinds of elements that are requested.

The second problem is that you usually cannot successfully parse the document when the user requests code-assist, because the user is just in the middle of typing. Since you cannot parse the document, you cannot provide the necessary context information. You have to implement some sort of error recovery to at least generate a partial model from the text. This partial model has to serve as the source for possible name declarations. In other words, error recovery becomes an important requirement.

2.3 Syntactic sugar in the context of name resolution

Very often a language syntax provides different notations for the same model elements. Take Java again: a member variable "foo" can be referenced by "foo" from within the body of a member method, or it can be referenced as a field of the "this" variable using "this.foo". Where the more implicit notation "foo" has a concrete definition in the context-free grammar, no such thing exists in the meta-model. Therefore, at some point, the implicit notation "foo" has to be resolved or replaced by its actual meaning "this.foo". This is especially troublesome because the string "foo" could also refer to a local variable. Whether "foo" means "this.foo" or just "foo" is a matter of static semantics. Currently TEF allows defining several meta-model bindings for the same syntactical construct. Which binding is finally chosen depends on name resolution. If "foo" can be successfully resolved to a local variable name, it becomes a reference to this local variable. If not, it is assumed that "foo" actually means "this.foo", and we try to find the name "foo" among the members of "this".
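
This binding choice can be sketched in a few lines. The sketch is hypothetical (the sets of local and member names are given directly rather than computed from a model): the name is first resolved as a local variable, and only if that fails is it re-read as the implicit notation for "this.foo" and looked up among the members.

```java
import java.util.*;

// Trying alternative meta-model bindings for the same syntactic construct.
public class ImplicitThisResolution {

    static String resolve(Set<String> locals, Set<String> members, String name) {
        if (locals.contains(name)) return "local:" + name; // binding 1: local variable
        if (members.contains(name)) return "this." + name; // binding 2: member of "this"
        return null;                                       // unresolved: report an error
    }

    public static void main(String[] args) {
        Set<String> locals = Set.of("i");
        Set<String> members = Set.of("foo", "bar");
        System.out.println(resolve(locals, members, "foo")); // this.foo
        System.out.println(resolve(locals, members, "i"));   // local:i
    }
}
```

The ordering of the two lookups encodes Java's shadowing rule: a local variable named "foo" hides the member "foo".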

2.4 Constraint checking

This is rather straightforward. The model resulting from parsing and name resolution is checked against all meta-model invariants, and an error is reported when an invariant is violated.

3. Conclusion

Context-free grammars describe trees; meta-models describe graphs. Therefore, name resolution is a troublesome but very important part of creating a model from text. It is a necessity for both approaches to combining context-free syntax with meta-models. In combination with semantic text editors, it requires error recovery to provide reasonable code completion.

Montag, 11. Juni 2007

Easy Language Interpreters

In a recently published tutorial, we demonstrate a new way of defining an operational semantics and thereby automatically creating an interpreter for meta-model based languages. The tutorial models a composite state automaton language, including a simulator for such automata. This article provides a motivation and overview of the presented method.

How complicated does it have to be to create an interpreter for a language? An interpreter simply realizes the operational semantics of a language and allows executing models or programs in that language. There are a lot of methods tailored for mathematically precise semantics definitions, and some of them, for example ASMs, even allow automatically deriving an interpreter from a semantics description. But in the end these methods are suited for language understanding. They can, for example, be used to prove properties like static safety. But methods like this are usually too cumbersome to be efficient. Hence, they debilitate the design process of a language. When you want to try new language features, test the user acceptance of a new behavior, or simply need a language tool fast, formal methods are a burden and not a help.

We propose a technique that allows you to derive a language interpreter quickly: without learning a new formalism or methodology. We simply take existing modeling techniques and combine them towards a new application: defining the operational semantics of a language. We start with regular meta-modeling, for example with MOF. We use A MOF 2.0 for Java to do this. When you have a meta-model for a language, you can create operations within this meta-model. These operation declarations act as the interface to the language behavior. The implementations of the operations can modify a model. Operation implementations can apply simple actions like creating an element, setting the value of an attribute, etc. When executed, each operation therefore leads to a sequence of model changes. Of course, operations can call other operations, and one operation has to act as the main operation.
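
As a hedged illustration of the idea (plain Java, not the AMOF-based machinery itself), consider a minimal automaton language in the spirit of the tutorial: one class holds both the abstract syntax (states and transitions) and the runtime information (the current state), a `step` operation performs one model change per input symbol, and a main operation drives the interpretation.

```java
import java.util.*;

// Meta-model classes carrying operations whose implementations change the model.
public class AutomatonInterpreter {

    // Abstract syntax (transitions) plus runtime information (current state)
    // live together in one model.
    final Map<String, Map<Character, String>> transitions = new HashMap<>();
    String current;

    void addTransition(String from, char input, String to) {
        transitions.computeIfAbsent(from, k -> new HashMap<>()).put(input, to);
    }

    // Operation implementation: one atomic model change per input symbol.
    void step(char input) {
        current = transitions.getOrDefault(current, Map.of()).get(input);
    }

    // The main operation driving the interpretation.
    String run(String start, String input) {
        current = start;
        for (char c : input.toCharArray()) step(c);
        return current;
    }

    public static void main(String[] args) {
        AutomatonInterpreter a = new AutomatonInterpreter();
        a.addTransition("off", 'p', "on");
        a.addTransition("on", 'p', "off");
        System.out.println(a.run("off", "ppp")); // on
    }
}
```

Each call to `step` is one transition in the operational-semantics sense: the model before the call and the model after it are two consecutive configurations.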

Viewed from this operations-and-implementations perspective, such a model is simply an object-oriented program. From the semantics perspective, we have defined a meta-model for the possible system states (the MOF meta-model), and the operations and their implementations define a sequence of transitions (model changes lead to a sequence of models). This is analogous to the classical structural operational semantics definition as introduced by Plotkin in the early 80s.

How can you implement the operations in a meta-model? You can, of course, simply provide Java implementations. If you want to use a more abstract behavior description, we provide another method based on UML activities and OCL. We propose a UML-activity-like language that uses predefined actions (create, set, delete, call another operation, etc.) as atomic activities. Control flow in this language is created with OCL expressions, and the actions can be parameterized with OCL expressions. The activity language can deal with operation parameters and return values.

With both implementation methods, Java and activities, you have a choice. Activities provide the higher abstraction level and are more readable. Java implementations perform better when executed and allow using other Java APIs. With Java implementations you are not restricted to model actions. You can therefore create semantics that are connected to the underlying platform. This makes it easier to integrate your language and models with your target platform. You can also mix the use of activities and Java and implement one operation with Java and another with activities.

What do you gain compared to other methods? Traditional mathematical methods are too cumbersome to be effective in the language design process. Other meta-model based approaches require you to learn a new action language. We only use existing languages and methodologies. If you know meta-modeling, object-orientation, activities, OCL, and Java, you can immediately start to create a language and its interpreter. The use of meta-modeling, especially with the CMOF model, allows very flexible and reusable language definitions. It even allows a pattern approach for languages. The system state is a model (an instance of your meta-model), and you can apply other modeling techniques to the system state: invariants and conditions, model transformations, model based tests, or XMI based persistence. We provide a pragmatic way to model operational semantics and create language interpreters. It is an ideal complement to technologies like GMF, openArchitectureWare, and other language workbenches.

If you want to learn more, try our tutorial and this paper on our method.

Mittwoch, 30. Mai 2007

Agile Language Engineering

This post is about the idea of applying agile methods to language engineering. We discuss the agile principle, why it is interesting for language engineering, what is needed to actually realise an agile language engineering process, and what can already be done with the technology at hand.

The agile method is popular in general software engineering. It promises better products, more efficient development, and, above all, more fun for all participants. Agility basically means using a lightweight, very adaptive, and dynamic development process that: (1) concentrates on developers and customers, putting their needs above fixed processes; (2) uses short development cycles that concentrate on the most important changes and maintains a software product that is testable and executable at all times; (3) uses a single software description, which is usually the program code, as the only artifact; (4) includes the customer in the process, has him use the software extensively during development, and embraces whatever feedback he/she can provide. More about the agile principle and agile methods can be learned from the agile manifesto.

How can we apply this general philosophy for (software) engineering to the engineering of languages, and why would we want to? The why first: domain specific languages. These are languages that provide functionality to domain experts in an outfit that is suitable to them: this basically means providing concepts and notations as they are already used within the domain. The problem with these languages is that they are only used by a small group of people, have a short life-cycle, are hard to understand for the computer scientists that design them, and are subject to regular changes. All these problems are taken care of by agile methods: rigorous user/developer exchange, embracing change, short development cycles with fast results.


When you accept that language engineering could benefit from agile methods, you will ask: what do I need to conduct agile language engineering? Let's look at a development cycle typical for an agile process. The whole process usually starts with a user story: he/she needs a whole new language. You sit down with the user and scribble a few example models, which we call reference models. Now you're going to implement the language, meaning that you create language tools (editors, simulators, code-generators, etc., depending on the language). You keep implementing until you reach the point where all the reference models can be successfully handled by your tools. You can always test your tools using the reference models. Now you have a working version of the language and can give it to the user. He/she will try the language while you start refactoring what you have built. After a while the user comes back to you with a new story, requesting a new feature or a change to an existing language feature. The whole cycle starts again.

All this sounds like a normal agile process, so what is so language specific about it? We identified three technologies that are obviously necessary to realise this process. You need the reference models as a specific form of testing. These reference models are the basis for your automated language tests. We would like to know whether the reference models cover the whole language. We summarize these research aspects under the name language testing. The next thing is that we need ways to develop language tools efficiently. We propose the term language modelling, which uses (meta-)models at a high level of abstraction to describe certain language aspects rather than implementing language tools manually. The meta bubble is full of examples: OCL can be used for static language semantics, our own MOF Action Semantics can be used to define operational semantics, GMF or TEF can be used to define graphical or textual editors, and so on. These technologies allow you to create language tools fast. In agile language engineering the language often changes. Users come up with new ideas, or we have to refactor our language. We need a third technology to change meta-models in such a way that we can co-evolve all the models and other descriptions that are based on the meta-model. This research aspect is known as meta-model adaptation and model co-adaptation. With all these technologies available and aligned to each other, one could actually exercise agile language development.

We are currently trying to align all our tools to realise an example language project based on the agile method. We have meta-modelling, OCL, and MOF Action Semantics to efficiently develop a language with operational semantics. A colleague of mine is currently realising a meta-model refactoring and co-adaptation tool suite based on the eclipse refactoring framework. Another colleague is looking into the reference model test issue. Theoretically, we have everything needed for a first experiment on a toy language. I hope to present some results of this experiment soon.

Freitag, 25. Mai 2007

Welcome to the Meta Bubble

I will use the meta bubble as a platform to inform you about news, changes, inside thoughts, and discussions that revolve around my current projects. Of course, I try to promote my work on other media as well: on web-sites, open source project sites, eclipse update sites, and in the future there might be even more sites. But none of these allow you to instantly get informed about new features, documentation, tutorials, or simply about new ideas that get into the projects. And of course, the meta bubble does not only inform, it also allows you to participate, to put your thoughts and ideas into the projects, or simply to discuss meta-issues. So, from now on I like to call them our projects.

We only want to give a very short introduction to our projects here; there is much more information at our main web-site. We have several projects that will help you to model languages. As you probably know, computer languages can be described with computer models. The best known technique might be meta-modelling with languages like MOF or EMF's ECore. And, as you already know, these meta-languages are very good at describing language structures; they are used to model the abstract syntax of a language. This already allows you to handle your models, program with them, or store them as files. But still, there are several other language aspects to cover. Depending on your language, you want to have editors, static analysers, compilers/code-generators, or interpreters/simulators. We think that each of these aspects can be described using its own specific meta-language.

We developed A MOF 2 for Java (AMOF2 or AMOF), a CMOF (MOF 2) based model programming environment. AMOF also has comprehensive OCL facilities, using the OSLO OCL interpreter for invariants and for implementations of derived attributes and query operations. AMOF supports user-defined implementations of operations and derived attributes in several languages. We already mentioned OCL, but you can also use Java, or a specific kind of UML activities that we developed for the modelling of operational semantics.

This brings us to our next project, called MOF Action Semantics (MAS). Here we use meta-models to describe the abstract syntax and runtime structures of an executable language. These structures can be augmented with behaviour, which leads to immediate executability of your models. We are about to extend this project with "all-you-need" eclipse support that allows you to define your language, execute your models based on your language, and debug your operational language semantics.

The Textual Editor Framework (TEF) is an eclipse based programming framework to create semantically rich text editors for your EMF models. It allows you to define an arbitrary concrete syntax for your existing ECore meta-models. Your editor can be supplemented with syntax highlighting, code completion, error annotations, etc. We are currently working on a comprehensive OCL editor, which can be used with any EMF based model and also comes in handy for most of our other projects that depend heavily on OCL.

I hope you got a good first view of what we are doing here. More details can be found at our sites: meta-tools, the berlios project site for AMOF, the berlios project site for TEF, and my home page. And of course, I hope to give further insight with each additional post in the meta bubble. So, you better stay tuned and subscribe to this blog.