OpenMRS Architecture Proposal: Modules & Bundles

This is an architecture proposal for OpenMRS. I advocate defining a single, universal configuration mechanism. This will encourage better software practices, make OpenMRS easier to use, and strengthen the community's ability to support it.

The Configuration Problem

At present, OpenMRS has a modular structure. A module is just a bunch of code that somehow ties into the OpenMRS core. Modules often will

  1. Make some user-facing UI changes in a package called an .omod
  2. Include some application logic in the java package, whose public classes constitute a programmatic API
  3. Introduce some UI in the admin panel through which it is configurable
  4. Pull in config files from the Application Data Directory
  5. Set and read Settings (formerly Global Properties) for some configuration

This suggests that there are four ways to configure an OpenMRS module:

  1. By creating another module that uses that module's API
  2. By putting configuration files in the Application Data Directory
  3. By using the admin UI extension specific to that module
  4. By changing the values of Settings

There is broad consensus that approach (3), using the admin UI, is a bad idea for most modules. Configurations produced this way exist only in data, in MySQL tables, and are not easily reproducible.

Approach (1), which we use for most things at Partners In Health, works pretty well, modulo difficulties keeping metadata in MySQL in sync with what the code specifies. However, it requires experienced software engineers to implement. Becoming reliant on this approach would be incompatible with OpenMRS being the EMR technology of choice for low-resource settings.

Approach (4) is on the right track, but isn't sufficiently powerful to address most configuration needs. It doesn't provide a natural way to specify even an array, let alone more structured data.
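To make the limitation concrete, here is a small illustrative sketch (Java 11+) that models Settings as a flat map of string names to string values. That's a simplification rather than the actual Settings API, and the setting name is made up; the point is the ad hoc delimiter and parsing every module has to invent just to store a list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustration only: Settings modeled as a flat map from names to string values.
// The setting name "examplemodule.allowedPersonTypes" is made up.
class FlatSettingsExample {

    static final Map<String, String> SETTINGS =
            Map.of("examplemodule.allowedPersonTypes", "patient, provider");

    // With only strings available, each module must invent its own delimiter and
    // parsing rules to store something as simple as a list.
    static List<String> readList(String name) {
        String raw = SETTINGS.getOrDefault(name, "");
        if (raw.isBlank()) {
            return List.of();
        }
        return Arrays.stream(raw.split(","))
                .map(String::trim)
                .collect(Collectors.toList());
    }
}
```

Nested or structured values (a list of objects, say) would need an even more elaborate private encoding.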

This leaves approach (2). Versions of this are used by Mekom and Bahmni, with what seems like much success and satisfaction. This is what I'd like to elaborate on.

Everything is a Module

Many modules provide software features. HTML Form Entry provides form functionality and ways of specifying forms. AddressHierarchy provides a structure for addresses, and ways of providing address metadata.

There are other modules, such as AddressHierarchyRwanda, KenyaCore (I presume), and mirebalais, which provide some amount of in-code configuration. This is generally mixed in with software features as well.

Parsing Things Out

I'd like to propose, then, that software features and configuration each have their own representation in software. Modules, more or less as we presently know them, should be responsible for the former. The latter should take place entirely in config files, gathered by user concern into bundles. The point at which these concepts intersect is the configuration. The centerpiece of this proposal, then, is a canonical config library.

Config Library

Modules should specify the configurations they expect, their config schema, using a declarative API provided by the config library. See the appendix for an example.

Config specification should take place in a special Java class which, like the Activator class, would be specified in config.xml. We'll call it the config specification class. If possible, the config specification classes of all modules should be executed before any module's activator runs. This would allow the application to fail fast on an invalid configuration, and would ensure that configuration is completely specified by the time modules start loading.
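A rough sketch of the fail-fast ordering, assuming a startup routine that collects problems from every module's config specification before starting any activator. Here each config check is just a function from the config tree to a list of problems, and each activator is a Runnable; both are hypothetical stand-ins for the real classes, not the existing OpenMRS module API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical startup sequence. Each config check stands in for a module's config
// specification class; each Runnable stands in for a module activator.
final class ModuleStartup {

    // Returns the number of activators started.
    static int startAll(Map<String, Object> configTree,
                        List<Function<Map<String, Object>, List<String>>> configChecks,
                        List<Runnable> activators) {
        // Phase 1: validate all configuration before any module code runs.
        List<String> problems = new ArrayList<>();
        for (Function<Map<String, Object>, List<String>> check : configChecks) {
            problems.addAll(check.apply(configTree));
        }
        if (!problems.isEmpty()) {
            // Fail fast: no activator has been started yet.
            throw new IllegalStateException("Invalid configuration: " + String.join("; ", problems));
        }
        // Phase 2: start modules, knowing their configuration is complete and valid.
        for (Runnable activator : activators) {
            activator.run();
        }
        return activators.size();
    }
}
```

If any check reports a problem, startup stops with every problem listed at once, rather than failing on whichever module happens to read a bad value first.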

The config schemata specified by modules can then be met by suitable config files. The config library should at minimum be able to interpret YAML files according to the specified schemata. It should be designed for first-class support of YAML, and take advantage of YAML language features. It would be good for it to also be able to interpret CSV files that are formatted according to its expectations. See the appendix for an example.
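Validation against a schema might look something like the sketch below. It assumes the YAML has already been parsed into nested Maps and Lists (SnakeYAML, for instance, loads mappings as Maps and sequences as Lists), and it uses a hypothetical FieldSpec class in place of whatever schema API the config library ends up with. It checks required fields, enum-constrained values, and unknown keys.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical field description; the real schema API would come from the config library.
final class FieldSpec {
    final String name;
    final boolean required;
    final Set<String> allowedValues; // null means any value is accepted

    FieldSpec(String name, boolean required, Set<String> allowedValues) {
        this.name = name;
        this.required = required;
        this.allowedValues = allowedValues;
    }
}

final class SchemaValidator {

    // Checks one config object (a YAML mapping parsed into a Map) and returns
    // human-readable problems, each prefixed with where it was found.
    static List<String> validate(String where, Map<String, Object> object, List<FieldSpec> fields) {
        List<String> problems = new ArrayList<>();
        Set<String> known = new HashSet<>();
        for (FieldSpec field : fields) {
            known.add(field.name);
            Object value = object.get(field.name);
            if (value == null) {
                if (field.required) {
                    problems.add(where + ": missing required field '" + field.name + "'");
                }
                continue;
            }
            if (field.allowedValues == null) {
                continue;
            }
            // Accept a single value or a list of values (a YAML sequence).
            List<?> values = value instanceof List ? (List<?>) value : List.of(value);
            for (Object v : values) {
                if (v == null || !field.allowedValues.contains(v)) {
                    problems.add(where + ": '" + v + "' is not an allowed value for '" + field.name + "'");
                }
            }
        }
        // Keys the schema does not mention are most likely typos.
        for (String key : object.keySet()) {
            if (!known.contains(key)) {
                problems.add(where + ": unknown field '" + key + "'");
            }
        }
        return problems;
    }
}
```

Reporting unknown keys is a design choice: in a hand-edited file, an unrecognized key is much more likely to be a typo than an intentional extension.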

I advocate YAML for its brevity, power, and compatibility with JSON (YAML 1.2 is essentially a superset of JSON). Its manageability and readability scale much better than JSON's. It is more powerful than JSON, and much more powerful than CSV. It supports comments.

All configuration files should be kept in a fixed location relative to the distribution POM, called the Config Directory. There should be no user-editable content in the Application Data Directory.

Configuration should be kept in an in-memory config store rather than in the database. The config library should provide an API for modules to access configuration elements.
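One possible shape for the access API, assuming the merged configuration is held as a tree of Maps and Lists. The ConfigStore name, its methods, and the dotted-path lookup are illustrative assumptions. It exposes only getters, so modules can read configuration but not change it (Java 9+).

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical read-only store for the merged config tree. The class name, method
// names, and dotted-path lookup syntax are illustrative assumptions.
final class ConfigStore {
    private final Map<String, Object> root;

    ConfigStore(Map<String, Object> root) {
        // Copy the top level so later changes to the caller's map are not visible.
        this.root = Collections.unmodifiableMap(new LinkedHashMap<>(root));
    }

    // Walks nested mappings, e.g. "registration.country"; empty if any step is missing.
    Optional<Object> get(String dottedPath) {
        Object current = root;
        for (String part : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return Optional.empty();
            }
            current = ((Map<?, ?>) current).get(part);
            if (current == null) {
                return Optional.empty();
            }
        }
        return Optional.of(current);
    }

    String getString(String dottedPath, String fallback) {
        return get(dottedPath).map(Object::toString).orElse(fallback);
    }

    List<?> getList(String dottedPath) {
        Optional<Object> value = get(dottedPath);
        return value.isPresent() && value.get() instanceof List ? (List<?>) value.get() : List.of();
    }
}
```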

This library would also provide read-only access to arbitrary data in the Config Directory, e.g. so that modules can parse HTML Forms and metadata.
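A sketch of read-only file access, assuming modules pass paths relative to the Config Directory such as htmlformentry/intake.xml (a made-up example path). ConfigDirectory is a hypothetical name. The class offers no write methods and rejects paths that normalize to somewhere outside the directory; it does not check symbolic links. Requires Java 11+ for Files.readString.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for reading non-YAML files (e.g. HTML Form XML) from the
// Config Directory. It exposes no write methods and refuses paths that normalize
// to a location outside the directory (symbolic links are not checked).
final class ConfigDirectory {
    private final Path root;

    ConfigDirectory(Path root) {
        this.root = root.toAbsolutePath().normalize();
    }

    // Resolves a path such as "htmlformentry/intake.xml" against the Config Directory.
    Path resolve(String relativePath) {
        Path resolved = root.resolve(relativePath).normalize();
        if (!resolved.startsWith(root)) {
            throw new IllegalArgumentException("Path escapes the Config Directory: " + relativePath);
        }
        return resolved;
    }

    String readText(String relativePath) {
        try {
            return Files.readString(resolve(relativePath), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```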

Defining Configuration

I want to distinguish configuration from metadata here. Configuration should be kept in memory; metadata should not.

Concepts are metadata. I think many of us hope that OCL for OpenMRS will be the way to manage concepts in the future.

Drugs are metadata. AddressHierarchy entries are probably metadata too. There are probably other things that are metadata. The important point is that some discretion is required in identifying what we mean by configuration. Basically, if it's reasonable to expect there to be hundreds of entries or more, it's probably metadata.

Non-concept metadata should be provided in files of whatever format is most suitable, placed in per-module subdirectories of the Config Directory (for namespacing).

Modules

In technical terms, modules should

  1. Add software features and/or user-facing UI
  2. Expose a Java API of public classes
  3. Use the config library to define a config schema, to be satisfied by files in the Config Directory
  4. If absolutely necessary, use other types of files for things that would be unwieldy to represent with a config API, such as metadata. AddressHierarchy entries are a good example. XML for HTML Forms might be a good non-metadata example.

Modules should not be specific to any particular geography or disease.

Global Properties / Settings should be done away with. Instead, when defining a primitive-valued YAML configuration element, the programmer should be able to pass a parameter runtimeMutable. If set to true, the configuration element appears on a screen in the Admin UI where its value can be edited.
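A hypothetical sketch of how runtimeMutable could work for a primitive-valued element: only elements declared runtime-mutable are listed on the Admin UI screen, and only they accept edits from it. The class and method names are invented for illustration, and persisting an edited value is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the runtimeMutable flag: only elements declared with it
// appear on the Admin UI screen, and only they accept edits from that screen.
final class PrimitiveConfigElement {
    final String name;
    final boolean runtimeMutable;
    private String value;

    PrimitiveConfigElement(String name, String initialValue, boolean runtimeMutable) {
        this.name = name;
        this.value = initialValue;
        this.runtimeMutable = runtimeMutable;
    }

    String value() {
        return value;
    }

    // Called when an administrator saves the (hypothetical) settings screen.
    void setFromAdminUi(String newValue) {
        if (!runtimeMutable) {
            throw new IllegalStateException(name + " can only be changed in the Config Directory");
        }
        value = newValue;
    }

    // Builds the name/value pairs the Admin UI screen would display.
    static Map<String, String> editableOnScreen(Iterable<PrimitiveConfigElement> elements) {
        Map<String, String> shown = new LinkedHashMap<>();
        for (PrimitiveConfigElement element : elements) {
            if (element.runtimeMutable) {
                shown.put(element.name, element.value());
            }
        }
        return shown;
    }
}
```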

Bundles

A bundle is a configuration file, or a set of configuration files, which configures some set of modules for the needs of a particular geography or disease.

For example, a bundle for diabetes might introduce intake and follow-up forms for the HTML Form Entry Module, some lab tests on the Simple Lab Entry Module, and a program with a dashboard. A bundle for Mexico might include a handful of additional fields for the registration form via the Registration Module, some government-required forms specified for the HTML Form Entry Module, and some government-required reports specified for the Reporting Module.

Since these are each expressed in plain YAML or CSV, a non-programmer user can easily remove or customize any of these components. If configuration exists for a module that is not installed, OpenMRS should ignore that piece of configuration, and the administrator should see a note informing them of such in the Admin UI.
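A minimal sketch of the ignore-and-notify behavior. It assumes the resolved config tree is grouped at the top level by module id, which this proposal doesn't actually specify; InstalledModuleFilter and the module ids used with it are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch. It assumes config is grouped at the top level by module id
// (e.g. "htmlformentry", "registration"), which the proposal does not specify.
final class InstalledModuleFilter {
    final Map<String, Object> active = new LinkedHashMap<>();
    final List<String> adminNotes = new ArrayList<>();

    InstalledModuleFilter(Map<String, Object> configByModule, Set<String> installedModules) {
        for (Map.Entry<String, Object> entry : configByModule.entrySet()) {
            if (installedModules.contains(entry.getKey())) {
                active.put(entry.getKey(), entry.getValue());
            } else {
                // Not an error: skip it, but record a note for the Admin UI.
                adminNotes.add("Ignored configuration for module '" + entry.getKey()
                        + "' because it is not installed");
            }
        }
    }
}
```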

It would be possible to split configurations into separate files per bundle, or per module, or both. OpenMRS would resolve all the configurations it finds into one config tree. Conflicts should cause the application to fail fast with a helpful error message.
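A sketch of resolving several files into one tree. The merge rules here are assumptions, not part of the proposal: mappings merge key by key, lists (say, relationships from two different bundles) are concatenated, and two different scalar values for the same key are a conflict that stops startup with a message naming the key and the file. ConfigMerger is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of resolving several config files into one tree, under assumed rules:
// mappings merge key by key, lists are concatenated, and two different scalar
// values for the same key are a conflict.
final class ConfigMerger {

    static Map<String, Object> merge(Map<String, Object> tree, Map<String, Object> file, String fileName) {
        return mergeAt(tree, file, fileName, "");
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> mergeAt(Map<String, Object> into, Map<String, Object> from,
                                               String fileName, String path) {
        Map<String, Object> result = new LinkedHashMap<>(into);
        for (Map.Entry<String, Object> entry : from.entrySet()) {
            String key = entry.getKey();
            String keyPath = path.isEmpty() ? key : path + "." + key;
            Object existing = result.get(key);
            Object incoming = entry.getValue();
            if (existing == null) {
                result.put(key, incoming);
            } else if (existing instanceof Map && incoming instanceof Map) {
                result.put(key, mergeAt((Map<String, Object>) existing,
                        (Map<String, Object>) incoming, fileName, keyPath));
            } else if (existing instanceof List && incoming instanceof List) {
                List<Object> combined = new ArrayList<>((List<Object>) existing);
                combined.addAll((List<Object>) incoming);
                result.put(key, combined);
            } else if (!existing.equals(incoming)) {
                // Fail fast, naming the key and the file that introduced the conflict.
                throw new IllegalStateException("Conflicting values for '" + keyPath
                        + "' (second definition in " + fileName + ")");
            }
        }
        return result;
    }
}
```

Whether lists should concatenate or conflict is a real design question; concatenation lets several bundles contribute entries to the same array, at the cost of needing a separate check for duplicate keys, which this sketch omits.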

CSV configurations would need one file per top-level configuration object. See the appendix for an example.

There should be a community ecosystem of bundles like the existing one for modules. The clear division of modules from bundles would make sharing configuration much more feasible and fruitful.

Benefits

There are a handful of obvious benefits from all this:

  • Predictability and consistency in module configuration
  • No module-specific Admin UIs to maintain
  • Shareable, reconfigurable configurations
  • Powerful configurability available to non-programmers
  • Version-controllable configuration

The most important one, however, is really...

Fewer, Better Modules

At the OpenMRS Conference people were saying "we need to encourage orgs to make what they've built reusable for others," when what we should be saying is "we need to encourage orgs to build things that are reusable for others."

We can do that by drawing a clear line between software features and configuration. Doing so discourages people from building software features in site-specific ways. This encourages community support for making existing modules better and more configurable.

Not only does this make the OpenMRS ecosystem much tighter and more robust, it encourages better software practices across the board. Reusability and isolation of concerns are generally valuable design principles that this would promote.

Concerns

Compiler Error Checking

Configurations specified in code get checked for type correctness by the Java compiler. If a constant (like a location tag name) is misspelled or hasn't been defined, the Java compiler will complain about it.

I contend that we shouldn't have to recompile to configure. The application should check the validity of configuration as the first order of business.

Location Tags and User Roles should be specified in config files. They should be defined explicitly. Permission rules would be written referencing them, using a minimal logic domain-specific language (DSL).
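Replacing the compiler's checks could be as simple as verifying at startup that every Location Tag and User Role referenced by a permission rule is actually defined. The proposal doesn't define the DSL, so the rule syntax below (role:Name and tag:Name joined by "and"/"or") is an invented placeholder, and PermissionRuleChecker is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical permission-rule syntax, e.g. "role:Clinician and tag:Login Location".
// The DSL is not designed yet; this only illustrates startup reference checking.
final class PermissionRuleChecker {

    // A reference runs from "role:" or "tag:" up to " and ", " or ", or the end of the rule,
    // so names may contain spaces.
    private static final Pattern REFERENCE =
            Pattern.compile("\\b(role|tag):(.+?)(?=\\s+(?:and|or)\\s+|$)");

    static List<String> undefinedReferences(String rule, Set<String> roles, Set<String> locationTags) {
        List<String> problems = new ArrayList<>();
        Matcher matcher = REFERENCE.matcher(rule);
        while (matcher.find()) {
            String kind = matcher.group(1);
            String name = matcher.group(2).trim();
            Set<String> defined = kind.equals("role") ? roles : locationTags;
            if (!defined.contains(name)) {
                problems.add("Rule '" + rule + "' references undefined " + kind + " '" + name + "'");
            }
        }
        return problems;
    }
}
```

A misspelled tag name then surfaces as a startup error instead of a silently failing permission check.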

I'd be interested to hear if there are things other than permissions that people use Location Tags or User Roles for!

UUIDs

One of the more unpleasant aspects of managing metadata in a reproducible way, whether in code or in config files, is the need to specify UUIDs for basically everything. While code-based configuration also suffers from this problem, I would like to address it in brief:

We shouldn't be specifying and copy-pasting UUIDs around, we should be specifying namespace-unique human-readable keys that OpenMRS namespaces and hashes into UUIDs behind the scenes.
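One concrete way to do this is RFC 4122 name-based UUIDs (version 5, SHA-1): hash a fixed namespace UUID together with the key, and the same key yields the same UUID in every installation. Choosing version 5, the namespace value, and the type:key naming format are all assumptions for illustration. Java's built-in UUID.nameUUIDFromBytes produces version 3 (MD5) UUIDs and takes no namespace, so the sketch computes version 5 directly.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

// Sketch using RFC 4122 name-based (version 5, SHA-1) UUIDs. UUIDv5, the namespace
// value, and the "type:key" naming format are assumptions for illustration.
final class KeyToUuid {

    // Hypothetical fixed namespace; a real scheme would need one agreed-upon value.
    static final UUID CONFIG_NAMESPACE = UUID.fromString("4a6f0c1e-8b1d-4c55-9d0a-2f3e6b7c8d90");

    // e.g. uuidFor("relationshipType", "DOC_PT")
    static UUID uuidFor(String metadataType, String key) {
        return nameUuidV5(CONFIG_NAMESPACE, metadataType + ":" + key);
    }

    static UUID nameUuidV5(UUID namespace, String name) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            // Hash the namespace's 16 bytes (big-endian) followed by the UTF-8 name.
            ByteBuffer ns = ByteBuffer.allocate(16);
            ns.putLong(namespace.getMostSignificantBits());
            ns.putLong(namespace.getLeastSignificantBits());
            sha1.update(ns.array());
            sha1.update(name.getBytes(StandardCharsets.UTF_8));
            byte[] hash = sha1.digest();
            hash[6] = (byte) ((hash[6] & 0x0f) | 0x50); // set version 5
            hash[8] = (byte) ((hash[8] & 0x3f) | 0x80); // set RFC 4122 variant
            ByteBuffer first16 = ByteBuffer.wrap(hash, 0, 16);
            return new UUID(first16.getLong(), first16.getLong());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }
}
```

Because the UUID is derived from the key, renaming a key changes its UUID, so keys would need to be treated as stable identifiers once deployed.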

There are many types of human-readable keys already in use; we're just not using them in lieu of UUIDs, as we should be.

Autocomplete

Another benefit of configuring in code is that constants, such as the aforementioned human-readable keys, can be autocompleted by an IDE. This is a clear win for in-code configuration, if you're using an IDE with syntax-smart autocomplete.

One solution is to use syntax-dumb autocomplete, i.e., autocomplete which learns every token you put into a file or project, regardless of context, and suggests those tokens when you start typing something similar.

Most IDEs do this out of the box.
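Syntax-dumb autocomplete is simple enough to sketch in a few lines: collect every identifier-like token from the file, keep them sorted, and return the ones sharing the typed prefix. WordCompletion is a hypothetical name; real editors add ranking and project-wide indexing on top of this idea.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Sketch of context-free ("syntax-dumb") completion: remember every identifier-like
// token seen so far and suggest the ones that start with what is being typed.
final class WordCompletion {
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
    private final TreeSet<String> seen = new TreeSet<>();

    void learn(String text) {
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            seen.add(matcher.group());
        }
    }

    List<String> suggest(String prefix) {
        // The set is sorted, so all tokens with this prefix sit together at the start of tailSet(prefix).
        return seen.tailSet(prefix).stream()
                .takeWhile(token -> token.startsWith(prefix))
                .filter(token -> !token.equals(prefix))
                .collect(Collectors.toList());
    }
}
```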

I hope that the improvement in iteration time achieved by not having to recompile to reconfigure would compensate for the increase in errors.

The Path Forward

I would love to see full core support for this in OpenMRS 3.0. This means

  • The introduction of a Config Directory
  • The creation of the config library, which
    • Provides an API to define a configuration
    • Parses and validates config files
    • Provides an API for accessing configuration
    • Provides an API for accessing other data in the Config Directory
  • The introduction of a config specification class parallel to the module activator, similarly specified in config.xml
  • The deprecation of Settings and module-specific Admin UI (users of the relevant APIs should see warnings when not in production)

And then, maybe, an OpenMRS 4.0 with a fully YAML- and CSV-configurable Reference App.

What we can start doing immediately is building our modules according to the principles defined here. Create modules that are general and configurable via config files in Application Data. Migrate away from in-code configuration. Migrate away from Admin UI configuration. Migrate away from Settings.

Feedback Welcome!

Please participate in the OpenMRS Talk thread! You're also welcome to email me your feedback at bistenes@pih.org, or message me on OpenMRS Talk at @bistenes.

This is a draft, a work in progress. I hope that it looks like it’s on the right track to a lot of the community, and that it can evolve with your feedback to be something with enough force and buy-in behind it to make it happen.




Appendix: Example Code

Here's a hypothetical configuration for patient relationships, with some creative license taken for illustrative purposes.

// Config.java

import org.openmrs.config.ConfigSchema;
import org.openmrs.config.ConfigSpecification;
import org.openmrs.config.types.ConfigArray;
import org.openmrs.config.types.ConfigEnum;
import org.openmrs.config.types.ConfigObject;

public class Config implements ConfigSpecification {

    @Override
    public ConfigSchema define() {

        // define a relationship config object
        // (Java has no keyword arguments, so each option is set with a chained call)
        ConfigObject relationshipConf = new ConfigObject();
        relationshipConf.addKey().required(true).auto(false);
        relationshipConf.addStringField("aToB").runtimeMutable(true).required(true);
        relationshipConf.addStringField("bToA").runtimeMutable(true).required(true);
        relationshipConf.addStringField("description").runtimeMutable(true);
        relationshipConf.addBooleanField("allowSelf").runtimeMutable(true).defaultValue(true);

        // both person-type arrays may only contain values from this enum
        ConfigEnum personTypes = new ConfigEnum("patient", "provider", "nonProviderUser");
        relationshipConf.addArray("allowedPersonATypes", personTypes);
        relationshipConf.addArray("allowedPersonBTypes", personTypes);

        // tell the config library to expect a top-level array of relationshipConfs
        ConfigArray relationships = new ConfigArray(relationshipConf);
        return new ConfigSchema("relationships", relationships);
    }
}

This might be satisfied by some YAML that looks like

relationships:
  - key: DOC_PT
    aToB: doctor
    bToA: patient 
    allowedPersonATypes:
      - provider
    allowedPersonBTypes:
      - patient
      - nonProviderUser
  - key: SIBLINGS
    aToB: sibling
    bToA: sibling

It could also be satisfied with a CSV called relationships.csv that looks like

key,        aToB,    bToA,      allowedPersonATypes, allowedPersonBTypes
DOC_PT,     doctor,  patient,   provider,            "patient; nonProviderUser"
SIBLINGS,   sibling, sibling,,
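To show that the CSV and YAML forms can describe the same tree, here's a sketch of reading relationships.csv into the List/Map structure a YAML parser would produce. The rules are assumptions: padding around cells is trimmed, empty cells are omitted, the schema tells the reader which columns are arrays, and array items are separated by semicolons. CsvConfigReader is a hypothetical name, and escaped quotes inside quoted cells are not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of reading a CSV such as relationships.csv into the same List/Map tree a
// YAML parser would produce. Assumptions: padding around cells is trimmed, empty
// cells are omitted, the schema says which columns are arrays, array items are
// separated by ";", and quoted cells may contain commas (escaped quotes unsupported).
final class CsvConfigReader {

    static List<Map<String, Object>> read(List<String> lines, Set<String> arrayColumns) {
        List<String> header = splitRow(lines.get(0));
        List<Map<String, Object>> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.isBlank()) {
                continue;
            }
            List<String> cells = splitRow(line);
            Map<String, Object> row = new LinkedHashMap<>();
            for (int i = 0; i < header.size() && i < cells.size(); i++) {
                String column = header.get(i);
                String cell = cells.get(i);
                if (cell.isEmpty()) {
                    continue;
                }
                if (arrayColumns.contains(column)) {
                    List<String> items = new ArrayList<>();
                    for (String item : cell.split(";")) {
                        items.add(item.trim());
                    }
                    row.put(column, items);
                } else {
                    row.put(column, cell);
                }
            }
            rows.add(row);
        }
        return rows;
    }

    // Splits a row on commas that are outside double quotes, trimming each cell.
    static List<String> splitRow(String line) {
        List<String> cells = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                cells.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        cells.add(current.toString().trim());
        return cells;
    }
}
```

In this scheme the file name (relationships.csv) would supply the top-level key, which is why CSV needs one file per top-level configuration object.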