Reef Reference Documentation

Information contained in this document is subject to change without notice. Complying with all applicable copyright laws is the responsibility of the user. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without express written permission from Green Energy Corp.

Document Control #: 02.004.000.000.000

Release Date: 20111220


1. Architecture
1.1. Introduction
1.2. SOA
1.3. AMQP
1.4. Protobuf
1.5. REST
2. System Concepts
2.1. Applications and Users
2.2. Operational Data
2.2.1. Measurements
2.2.2. Commands
2.2.3. Events
2.2.4. Alarms
2.3. Model
2.3.1. Point Model
2.3.2. Command Model
2.3.3. System Model
2.4. Services
2.5. Service Events and Subscriptions
3. Subsystems
3.1. Modeling in Reef
3.1.1. Equipment Model
3.1.2. Communication Model
3.1.3. Aggregate Model
3.2. Measurement Subsystem
3.2.1. Measurement Stream
3.2.2. Processing
3.2.3. Measurement Publishing
3.3. Commands Subsystem
3.3.1. Selecting a Command
3.3.2. Command Feedback
3.3.3. Command Flow
3.4. Event and Alarms Subsystem
3.4.1. Event Generation
3.4.2. Event and Alarm Configuration
3.4.3. Alarm Lifecycle

Chapter 1. Architecture

1.1. Introduction

Reef is a horizontally scalable automation platform factored into a Service-Oriented Architecture (SOA). Reef is designed to handle the demanding challenges of the Smart Grid and offers the following advantages over traditional utility operations platforms:

  • Clustering - Any node can assume the responsibilities of a failed node to increase fault tolerance.

  • Federation - Reef can be scaled out on commodity hardware to increase the size of your system.

  • Platform/Language Independent - Write networked applications for Reef in your favorite language.

  • Hot upgrades - Reef nodes can be upgraded with minimal impact to operational systems by leveraging OSGi.

1.2. SOA

1. A design methodology that, when properly applied, creates a loosely-coupled system by decomposing functionality in terms of abstract, reusable services.

2. A ubiquitous, but often misused, buzzword.

SOA means different things to different people. Before discussing the approach to SOA used by Reef, it's useful to delineate what SOA is not in the context of Reef:

  • SOA is NOT Web Services, WSDL, SOAP, or even XML

  • SOA does NOT require a registry

  • SOA is NOT synonymous with Enterprise Service Bus (ESB)

Reef takes the most flexible aspects of SOA and leaves the heavier technologies to a dedicated ESB. Three technologies are employed to ensure Reef stays scalable and language/platform independent: AMQP, Protobuf, and REST.

1.3. AMQP

Advanced Message Queuing Protocol (AMQP) is a wire-level standard for enterprise messaging. It was born out of frustration with proprietary standards and the failure of JMS to provide true interoperability. Reef uses the Qpid broker, and clients are available in a number of different languages.

1.4. Protobuf

Protocol Buffers (PBs) is a technology for encoding data in a compact, yet extensible way. It was open sourced by Google, where it is used for internal RPC protocols and file formats. Reef uses Protocol Buffers to define service resources and the REST framework. PBs use a lightweight schema format that loosely resembles C structs:

Example 1.1. Reef measurement quality as a Protobuf schema

// mirror the iec61850 quality (CIM uses these too)
message Quality {
        enum Validity {
            // No abnormal condition of the acquisition function or
            // the information source is detected
                GOOD = 0;

                // Abnormal condition ""
                INVALID = 1;

                // supervision function detects abnormal behavior,
                // however value could still be valid.
                // Up to client how to interpret.
                QUESTIONABLE = 2;
        }

        enum Source {
            // value is provided by an input function from
            // the process I/O or calculated by application
                PROCESS = 0;

            // value is provided by input on an operator
            // or by an automatic source
                SUBSTITUTED = 1;
        }

        optional Validity validity = 1 [default = GOOD];
        optional Source source = 2 [default = PROCESS];
        optional DetailQual detail_qual = 3;

        // classifies a value as a test value, not to be used for operational purposes
        optional bool test = 4 [default = false];

        // further update of the value has been blocked by an
        // operator. if set, DetailQual::oldData should be set
        optional bool operator_blocked = 5 [default = false];
}
                


The example schema above highlights another important feature of Reef. Reef leverages applicable external standards to define operational entities like measurements, quality, and controls.

1.5. REST

Representational State Transfer (REST) is an architectural style for distributed systems. REST is commonly thought of as a "nouns and verbs" alternative to Remote Procedure Call (RPC), but it is actually a proper subset of RPC that has some interesting properties. Factoring a distributed system in a RESTful style has the following advantages (which can also be thought of as constraints) as applicable to Reef:

  • Uniform interface - Service consumers/providers communicate through a uniform interface (i.e. GET, PUT, POST, DELETE) which decouples client and server development.

  • Opaque transport - Clients are unaware of whether they are directly connected to the end service or a proxy. Proxies can be used for things like load balancing or to enforce security.

  • Stateless - Every request contains all of the information necessary to satisfy the request. Any server-side state must be addressable in some way.

Chapter 2. System Concepts

2.1. Applications and Users

There are many sorts of applications that can be integrated with Reef.

  • Human Machine Interfaces (HMI) - Applications that provide an interface for a human to interact with some portion of the system, usually operational data.

  • Bridges - To communicate with other systems there may be a number of custom or standardized bridges using domain specific interfaces.

  • Field Protocol Adapters - Adapters to communicate with field equipment and get real-world data into Reef.

  • Calculation and Analysis Tools - Once data is in Reef, domain specific tools can be created to perform calculations or do advanced analysis.

  • Automated Control - Using the data in Reef, advanced applications can be created to perform automated control or optimization tasks.

In Reef, an Agent is the name we give to any actor in the system who uses the exposed Services. Human operators and administrators are considered Agents, and so are all applications. In Reef no distinction is made between an application and a human operator. When an application acts as a proxy for another Agent (such as an HMI), it is treated the same as if that Agent had performed the operation directly.

Agents will identify themselves to the system using a name and a password before interacting with the Services. They will be assigned an authorization token (AuthToken) that is sent with all future requests to the Reef system.
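
The exact calls differ between Reef's language bindings; the following minimal sketch (hypothetical class and method names, not the actual Reef client API) only illustrates the pattern of logging in once and attaching the AuthToken to every later request.

import java.util.UUID;

// Hypothetical illustration of the login pattern; the class and method names are not the
// actual Reef client API.
public class AgentSession {

    /** Opaque token returned after a successful login. */
    record AuthToken(String value) {}

    private AuthToken token;

    /** Agents identify themselves with a name and password and receive an AuthToken. */
    void login(String agentName, String password) {
        // a real client would post the credentials to the authentication service
        this.token = new AuthToken(UUID.randomUUID().toString());
    }

    /** Every later request carries the AuthToken so the services can authorize the agent. */
    String authHeader() {
        if (token == null) throw new IllegalStateException("login() must be called first");
        return "AUTH_TOKEN: " + token.value();
    }
}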

2.2. Operational Data

Operational data forms the "live" content of the system. Its purpose is to:

  • Maintain an accurate view of the state of field equipment.

  • Facilitate remote interaction with field equipment.

  • Provide for the ability to monitor changes in the state of the system.

The four central operational object types are measurements, commands, events, and alarms.

Figure 2.1. Operational Data


2.2.1. Measurements

Measurements are the basic units of data published by data sources and processed and monitored by the system. Often, measurements are acquired using communications protocols, and are used to represent the state of remote field devices. Measurements may also be generated or manually entered by agents.

Measurements carry the following data fields:

  • Value can be of type boolean (status, binary input), integer or floating point (analog, analog input), or text string. This is the central data payload of the measurement.

  • Quality gives agents extra information about the state of the measurement. Quality may indicate problems with end devices, communications issues, or operator influence ("not in service").

  • Time identifies when the measurement took place, or when it was acquired by the communications system.

  • Unit gives the measurement value real-world meaning. Examples are "volt", "psi", etc. Additionally, unit carries an implicit definition of scale.
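
As a rough illustration of how these fields travel together, the sketch below groups them into a simplified client-side structure. It is illustrative only; the authoritative definition is the Reef Measurement protobuf message, which contains additional fields.

// Illustrative only: a simplified client-side grouping of a measurement's core fields.
// The Reef Measurement protobuf message is the authoritative definition and contains more fields.
public final class SimpleMeasurement {

    public enum Kind { BOOL, INT, DOUBLE, STRING }

    public final String pointName;   // which Point this measurement belongs to
    public final Kind kind;          // boolean (status), integer/floating point (analog), or string
    public final Object value;       // the central data payload
    public final String quality;     // e.g. GOOD, INVALID, QUESTIONABLE (see Example 1.1)
    public final long timeMillis;    // when the value was measured or acquired
    public final String unit;        // real-world meaning and implicit scale, e.g. "volt", "psi"

    public SimpleMeasurement(String pointName, Kind kind, Object value,
                             String quality, long timeMillis, String unit) {
        this.pointName = pointName;
        this.kind = kind;
        this.value = value;
        this.quality = quality;
        this.timeMillis = timeMillis;
        this.unit = unit;
    }
}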

Measurements enter the system either from communication front-ends or directly using the Reef APIs. The measurement stream is then processed -- configured transformations are applied (scaling, value mapping) and side-effects are triggered (event/alarm generation). Finally, measurements are stored in the database and published to the bus.

Many system functions are concerned with maintaining the most recent value for any given measurement in order to provide an up-to-date representation of system devices and components. The system also maintains measurement history, the chronological stream of previous measurement values.

2.2.2. Commands

Commands are the means agents use to interact with and modify the state of the system. Commonly, commands are tunneled by the communications processors to remote field devices in order to exercise control over their functions. Whereas measurements constitute the flow of data in from the field, commands form the information moving outward to the field.

Commands may or may not contain an associated value. In the field communications world, commands with values are usually referred to as setpoints, and may refer to a target value the system or end device is intended to reach. Values are therefore used when a simple imperative cannot convey the proper message, such as "set temperature to 65 degrees Celsius".

When multiple agents use the system concurrently, simultaneous modifications to the same subsystems can lead to undesired results and indeterminate behavior. Furthermore, it is frequently the case that some field devices need to be declared "off limits" for safety or maintenance reasons. The following objects are used to regulate access to commands:

  • Selects are acquired by agents to grant exclusive access to a command or set of commands.

  • Blocks are used to prevent any agent from accessing a command or set of commands.

2.2.3. Events

Events are objects that record a meaningful occurrence or change of state in the system. They are used both to monitor the system in real-time and to provide an audit log of system history.

Events are configured with the parameters type and severity, and contain context information such as the originating subsystem and associated agent. The "message" of the event may contain a further description and any relevant data attributes.

Ultimately, the definition of events, as well as the conditions under which they are triggered, is highly configurable. Events, as a whole, are a tool system designers use to characterize system behavior and to provide agents and administrators with necessary information.

While most measurement updates will not cause events (due to their high volume and redundancy), updates that fall outside reasonable or nominal values may constitute a system-notable change and trigger an event.

Many or most commands issued by agents will qualify as events. Other events may not be tied to operational data, but instead will record system activity such as application errors and agent authentication actions.

2.2.4. Alarms

Alarms are a refinement of events which identify system occurrences that require operator intervention. Alarm objects are tied closely to event objects. All alarms are associated with events, but not all events cause alarms.

In contrast to events, alarms have persistent state. The three principal alarm states are unacknowledged, acknowledged, and removed. The transitions between these states constitute the alarm lifecycle, and manipulation of the states involves workflow. Transitions in alarm state may themselves be events, as they are part of the record of agent operations.

Figure 2.2. Relationship between Alarm and Event


During the configuration process, the system designer decides what events trigger alarms. The primary consumers of alarms are operators tasked with monitoring the system in real-time and responding to abnormal conditions.

2.3. Model

Figure 2.3. Relationship between model and remote environment


2.3.1. Point Model

Points are model objects configured to match data inputs in the remote environment. They are the static, long-lived complement to measurements. Points can be thought of as the "mailbox" where all past and current measurements for the real-world value are sent and stored. If we want to know the current value, we open the mailbox and pull out the most recent measurement. If we want to look at the history of a Point, we open the mailbox and pull out all the measurements received in the appropriate time frame. The sequence of measurements in time is referred to as the "measurement stream".

For agents, points are a higher-level interface to the system data that hides the details of communications protocols and remote device configurations. Measurement producers and measurement consumers therefore use points as a shared context.

Figure 2.4. Points and Measurements


The primary attribute of points is their name, which partly forms their unique identification in the system. Other attributes may reflect the point's conceptual state. For instance, a point may be "abnormal", although this is really a reflection of the last or most current measurement value. A point might be said to be "disconnected", although this is actually a property of the communication endpoint associated with it.

2.3.2. Command Model

Like points, command model objects are configured to match data outputs in the remote environment. For agents, command requests are the live objects that refer to the command model. The communications front-end is responsible for interpreting command requests and forwarding them to devices. The command model also serves as the common reference point for the command access control system; select and block requests are aimed at the same "command".

Figure 2.5. Command Model


2.3.3. System Model

Points and commands provide a useful abstraction for the details of field device configurations and communications protocols. Alone, however, a simple listing of inputs and outputs does not provide sufficient context for agents to manage and interact with the system.

The system model adds structure and context to the simple point and command model. The underlying data can then be referred to in a way that both mirrors the actual remote environment and more closely matches a human-understandable conceptual view of the system.

Figure 2.6. Model Overlaid on Points and Commands


Model objects are connected by relationships, which allow system designers to capture the "shape" of the system. A set of model relationships might capture any or all of the following:

  • Hierarchical organization, by "ownership"

  • Connectivity (electrical or otherwise) between equipment and devices

  • Locational or spatial information

  • Mappings between logical equipment and communication infrastructure

  • Associations between data inputs and data outputs

Figure 2.7. Model Relationships


An important property of some relationships is that they are transitive. This means that if a device owns a point and a device group owns the device, the device group also owns the point. Further, it means that an event or an alarm that is "for" a point is also "for" the device and the device group. It therefore becomes possible to talk about "the events for a device" when many or most of the events are in fact directly related to the points themselves.

Figure 2.8. Inheriting Associations


Models are self-describing and inspectable. This has major benefits:

  • The system model reduces and simplifies integration. Without a powerful model, configurations (lists of points and commands) for all subsystems and applications must be matched and synchronized, often manually by administrators. A common model reduces inconsistencies between different parts of the system, and ultimately responsibility for system integration is shifted from data management to ensuring all system participants have a shared understanding of the model.

  • The inspectability and consistency of the system model facilitates the development of advanced application logic. Partly, this is because, as above, application integration is simplified. Beyond that, however, the model renders the system traversable, so that algorithms can be geared towards logical concepts and the application can then use the model to "find" the necessary low-level data.

2.4. Services

Services are the fundamental organizing concept of the interface to the system. Agents consume information from, and operate on, the system by sending service requests and receiving service responses.

Figure 2.9. Services


2.5. Service Events and Subscriptions

Standard service request-reply provides a simple, client-initiated way of interacting with the system. Service events, on the other hand, provide an event-driven method of observing changes in system state. By subscribing to service events, agents can receive notifications of changes as they occur.

The measurement stream is an example of when an application might want to subscribe to service events. Measurements are constantly "flowing", and repeatedly polling for changes can be unwieldy and wasteful. Instead, clients can create subscriptions to receive measurements as they arrive:

Figure 2.10. Measurement Subscriptions


An important feature of event subscriptions is that they can be constrained to only refer to a specific set of objects. For instance, an application might want updates for only one of several points:

Figure 2.11. Subscribe by Point


A central tenet of RESTful systems is that they involve manipulation of resources. The objects of the system (operational, model, and otherwise) can be thought of as those resources, and service events track their creation, modification, and removal. When service events aren't externally generated, as with measurements, they are often themselves the result of service requests.
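
The sketch below illustrates the subscription pattern under assumed names (the interface and method are hypothetical, not the actual Reef client API): the client registers a handler constrained to a single Point and the broker pushes matching service events as they occur, instead of the client polling.

import java.util.function.Consumer;

// Hypothetical sketch of the subscription pattern; the interface and method names are
// illustrative, not the actual Reef client API.
public class SubscriptionSketch {

    enum ServiceEventType { ADDED, MODIFIED, REMOVED }

    record Measurement(String pointName, double value) {}

    /** Service events track the creation, modification and removal of resources. */
    record ServiceEvent<T>(ServiceEventType type, T resource) {}

    interface MeasurementService {
        /** Subscribe to measurements for a single Point; the constraint limits what is delivered. */
        void subscribeToMeasurements(String pointName, Consumer<ServiceEvent<Measurement>> handler);
    }

    static void watchVoltage(MeasurementService service) {
        // instead of polling, the handler is invoked as each new measurement arrives
        service.subscribeToMeasurements("Line1.Voltage", event ->
                System.out.println(event.resource().pointName() + " = " + event.resource().value()));
    }
}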

Figure 2.12. Service Events


Chapter 3. Subsystems

3.1. Modeling in Reef

The data acquisition and control portion of Reef is primarily concerned with providing operators and applications a view of the current state of the real-world system and the ability to issue commands affecting that system. To understand the how and why of the parts of the Reef system, it is first important to understand how a very simple real-world system would be modeled. We use a simplified power distribution model for these examples, but the modeling is flexible enough to handle common data acquisition and control applications.

Below is a simplified representation of an electrical circuit breaker with an attached power line. In this simplified system we are modeling the circuit breaker as having two status points indicating whether it is currently Closed or Tripped (open) and two commands, Trip and Close, that will cause it to change state. We are modeling the power line as having two analog points, Voltage and Current, that indicate how much power is flowing through the line.

Figure 3.1. Simple "Real World" System


3.1.1. Equipment Model

It makes sense to arrange these points based on which piece of equipment they are most closely associated with. In the Reef system we call this the "equipment model." In power systems it is common to consider points and commands to be "owned" by pieces of equipment which may themselves be owned by "equipment groups".

In Reef we model this logical tree by assigning every object an Entity node that describes the name and type (e.g. Point, Command, Equipment) of the object. For the preceding example we create Entities for four points, two commands, two pieces of equipment and one equipment group. We then create an edge with type "owns" from the equipment group to each piece of equipment. From each piece of equipment we create an edge to the points and commands it owns. Notice that these edges have a direction: we would say "Line owns Voltage" or "Voltage is owned by Line"; "Voltage owns Line" would imply the opposite relationship.
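
The following sketch builds the Entities and "owns" edges for this example in plain Java. It is purely illustrative, not the Reef modeling or loading API, and the group name "Substation" and the dotted entity names are assumptions made for the example.

import java.util.ArrayList;
import java.util.List;

// Illustrative construction of the Entities and "owns" edges described above; this is not the
// Reef modeling/loading API, and the group name "Substation" and dotted entity names are assumed.
public class EquipmentModelSketch {

    record Entity(String name, String type) {}
    record Edge(Entity owner, Entity owned, String relationship) {}   // directed: owner "owns" owned

    public static void main(String[] args) {
        Entity group = new Entity("Substation", "EquipmentGroup");
        Entity breaker = new Entity("Breaker", "Equipment");
        Entity line = new Entity("Line", "Equipment");
        Entity closed = new Entity("Breaker.Closed", "Point");
        Entity tripped = new Entity("Breaker.Tripped", "Point");
        Entity trip = new Entity("Breaker.Trip", "Command");
        Entity close = new Entity("Breaker.Close", "Command");
        Entity voltage = new Entity("Line.Voltage", "Point");
        Entity current = new Entity("Line.Current", "Point");

        List<Edge> edges = new ArrayList<>();
        // the equipment group owns each piece of equipment...
        edges.add(new Edge(group, breaker, "owns"));
        edges.add(new Edge(group, line, "owns"));
        // ...and each piece of equipment owns its points and commands
        edges.add(new Edge(breaker, closed, "owns"));
        edges.add(new Edge(breaker, tripped, "owns"));
        edges.add(new Edge(breaker, trip, "owns"));
        edges.add(new Edge(breaker, close, "owns"));
        edges.add(new Edge(line, voltage, "owns"));
        edges.add(new Edge(line, current, "owns"));

        // the direction of the edge encodes the relationship: "Line owns Voltage"
        edges.forEach(e -> System.out.println(e.owner().name() + " owns " + e.owned().name()));
    }
}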

Figure 3.2. Logical Model as Entities


3.1.2. Communication Model

The "logical model" describes the system in domain-specific terms. In most domains the logical units we model are not themselves "intelligent electronic devices" (IEDs) from which we can collect data. For example, the logical Entity "power line" is not actually a box we can plug an Ethernet cable into and have it report the voltage and current passing through it; it is simply an electrical conduit. In the real world we generally have only a limited set of telemetry on a system, often collecting measurements for many logical pieces of equipment using a single device (IED). In the case of a power line we may have attached transducers for current and voltage that are being sampled. Below is one possible layout for how we would measure the points in our simple substation example.

Figure 3.3. Instrumentation of Real World


We model an IED as a CommunicationEndpoint that is responsible for a set of Points and Commands. An edge with type "source" is added from the endpoint to each of those Points and Commands. We use this relationship to update the online flag on Points and Commands and to provide visualizations that help engineers diagnose communication failures.

Figure 3.4. Communication Tree


CommunicationEndpoints represent the equipment in the real world that Reef communicates with using field protocols like DNP3, Modbus or ICCP. Many of these field protocols require complex configuration parameters specifying how the protocol is configured, the exact format of the returned data, and how to map that data to the measurements we are creating. We put this sort of data in ConfigFiles that the ProtocolAdapter can pull out of the services. We also need to know the address of the endpoint, whether that is a TCP/IP address or a serial port name and location. Each piece of information is also assigned an Entity, and an edge with type "uses" is added from the communication endpoint to each of those Entities.

Figure 3.5. Communication Tree


While it is true that some pieces of field equipment are "smart" and would be modeled as both a piece of equipment and as a CommunicationEndpoint, modeling them separately leads to a system that is much easier for all users to understand. An operator or application needs to concern itself only with the "logical model", without having to know how the data is actually being collected. If an operator is troubleshooting a communication issue they can look solely at the communication model and ignore the logical model.

3.1.3. Aggregate Model

All of the Points and Commands are used in both the communication and logical models, which is why each model was given its own relationship type ("owns" and "source"). Each Point or Command will have an edge pointing to its logical owner and an edge pointing to its communication endpoint. When we merge the two trees we end up with a single pool of Entities, each of which may be related to one or many other Entities. In computer science this is called a "directed graph with colored edges", and it forms the basis of the Entity system.

Figure 3.6. Merged Tree


Applications will use these relationships to determine the "shape" of the system. The Entities themselves do not carry any "type specific" information. Examples of "type specific" information are "is this Command blocked", "is this communication endpoint considered online" or "is this Point currently marked abnormal". This data is represented in Reef with type-specific messages (sometimes referred to as concrete types) like Point or Command. Applications will generally want to use the "concrete types" because they contain the interesting data, and they will use the Entity system to determine the UUIDs for the concrete objects they are using.
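
A minimal sketch of this pattern, assuming a simple in-memory graph rather than the actual Reef services: walk the "owns" edges transitively from an equipment Entity to find the Point Entities beneath it, then use their UUIDs to look up the concrete, type-specific objects.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Minimal sketch assuming a simple in-memory graph, not the actual Reef services: Entities and
// colored edges give the shape of the system, while concrete types hold the interesting data.
public class AggregateModelSketch {

    record Entity(UUID uuid, String name, String type) {}

    /** Concrete (type specific) view of a Point, e.g. whether it is currently marked abnormal. */
    record PointInfo(UUID uuid, boolean abnormal) {}

    // edges keyed by relationship "color" ("owns", "source"), then by parent entity UUID
    static final Map<String, Map<UUID, List<UUID>>> edges = new HashMap<>();
    static final Map<UUID, Entity> entities = new HashMap<>();
    static final Map<UUID, PointInfo> concretePoints = new HashMap<>();

    /** Ownership is transitive: walk "owns" edges downward and collect every Point Entity. */
    static List<Entity> pointsOwnedBy(UUID start) {
        List<Entity> result = new ArrayList<>();
        Deque<UUID> toVisit = new ArrayDeque<>(List.of(start));
        while (!toVisit.isEmpty()) {
            UUID current = toVisit.pop();
            for (UUID child : edges.getOrDefault("owns", Map.of()).getOrDefault(current, List.of())) {
                Entity e = entities.get(child);
                if (e == null) continue;
                if ("Point".equals(e.type())) result.add(e);
                toVisit.push(child);
            }
        }
        return result;
    }

    /** Applications use the Entity's UUID to fetch the concrete type that carries the data. */
    static boolean isAbnormal(Entity pointEntity) {
        return concretePoints.get(pointEntity.uuid()).abnormal();
    }
}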

Figure 3.7. Entities vs. Concrete Types


3.2. Measurement Subsystem

The movement of measurement data to consumer applications is the single most important responsibility of the Reef system. Without a reliable and trustworthy flow of Measurements from the field, a data acquisition and control system is practically worthless, so much of the advanced engineering in Reef exists to support the Measurement stream.

3.2.1. Measurement Stream

Measurements begin in field devices; usually they are physical sensors measuring real values. Reef uses field protocols to collect the Measurements from the IEDs (CommunicationEndpoints). These field protocols are generally modeled using the same sorts of Points and Measurements, making translation straightforward. A Reef-enabled ProtocolAdapter for the field protocol takes the raw Measurements, converts them to Reef Measurement protobuf objects and sends them to the MeasurementProcessor. The MeasurementProcessor inspects each Measurement, applies whatever transformations or overrides are necessary, stores it in the MeasurementStore, and publishes it to the "engineering" channel for consumption by applications.

Figure 3.8. Measurement Stream


3.2.2. Processing

When processing a raw Measurement into an "engineering measurement", each Measurement goes through the following steps:

  • Overrides: Field data can be blocked from making it to the engineering channel. Each Measurement is checked against the list of blocked points and either sent on for further processing or routed to a simple "last-value" cache.

  • Processing: Raw Measurements from the field often need manipulations to transform them to engineering values or annotate the quality of the Measurements. The MeasurementProcessor is also responsible for generating Events when Measurements meet certain conditions.

  • Publishing: Once the Measurements are processed they are published first to the MeasurementStore then to the engineering channel.
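
The sketch below shows the same three steps as straight-line code. The types and method bodies are simplified stand-ins, not the actual MeasurementProcessor implementation.

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative straight-line version of the three steps above; the types and method bodies are
// simplified stand-ins, not the actual MeasurementProcessor implementation.
public class MeasurementPipelineSketch {

    record Measurement(String pointName, double value, String quality) {}

    private final Set<String> blockedPoints;                        // Points with an override/NIS
    private final Map<String, Measurement> lastValueCache = new HashMap<>();

    MeasurementPipelineSketch(Set<String> blockedPoints) {
        this.blockedPoints = blockedPoints;
    }

    void onFieldMeasurement(Measurement raw) {
        // 1. Overrides: blocked Points are routed to a last-value cache, not the engineering channel.
        if (blockedPoints.contains(raw.pointName())) {
            lastValueCache.put(raw.pointName(), raw);
            return;
        }
        // 2. Processing: transform to engineering values, annotate quality, possibly generate Events.
        Measurement engineering = transform(raw);
        // 3. Publishing: store first, then publish to the engineering channel for subscribers.
        storeMeasurement(engineering);
        publishToEngineeringChannel(engineering);
    }

    Measurement transform(Measurement raw) { return raw; }                     // placeholder
    void storeMeasurement(Measurement m) { /* write to the MeasurementStore */ }
    void publishToEngineeringChannel(Measurement m) { /* publish on the message bus */ }
}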

Figure 3.9. Measurement Processor Steps


3.2.2.1. Measurement Overrides

Sometimes it is necessary to override the "real world" field Measurements for a Point. This is usually done for one of two reasons:

  • Bad Values: A field device is reporting a bad value (wildly oscillating or pinned to 0) and is generating spurious alarms or confusing "advanced applications" that are performing calculations on that value. In this case an operator will override that data to a nominal value.

  • Training or Testing: When an operator or integrator is testing alarms, UIs or applications, it's often valuable to be able to quickly override a value and verify that the correct behaviors occur.

There are two ways to do this: marking a Point "Not in Service" (NIS) and "Overriding" the Point. Both methods cause one more Measurement to be published for the Point and stop all future field Measurements from making it to the engineering channel. The quality of the published Measurement is altered to indicate that the Measurement is SUBSTITUTED and that there will be no future updates coming.

When the MeasurementOverride or NIS is removed from a Point, the most recently received field value will be published to indicate that the Measurement stream is "live" again.

The difference between NIS and Overrides is the final value that is published:

  • Not in Service (NIS): Re-publishes the last field Measurement with adjusted quality.

  • Override: Takes an agent-supplied Measurement to set a specific value.
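
A small sketch of that difference, reusing the Quality vocabulary from Example 1.1 (Source.SUBSTITUTED and operator_blocked); the Measurement record here is a simplified stand-in for the real protobuf type.

// Simplified stand-ins for the real protobuf types, reusing the Quality vocabulary from
// Example 1.1 (Source.SUBSTITUTED, operator_blocked) to show the difference between NIS and Override.
public class OverrideSketch {

    record Measurement(String pointName, double value, String source, boolean operatorBlocked) {}

    /** NIS: re-publish the last field value, marked SUBSTITUTED and operator blocked. */
    static Measurement markNotInService(Measurement lastFieldValue) {
        return new Measurement(lastFieldValue.pointName(), lastFieldValue.value(), "SUBSTITUTED", true);
    }

    /** Override: publish an agent-supplied value instead, with the same quality markings. */
    static Measurement override(String pointName, double agentSuppliedValue) {
        return new Measurement(pointName, agentSuppliedValue, "SUBSTITUTED", true);
    }
}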

3.2.2.2. Measurement Transformations

Converting a raw Measurement to its engineering representation and generating Events can be very use-case specific, and a one-size-fits-all solution is not flexible enough for production data streams. For example, some installations may want to pre-check all incoming data to verify that the value is "reasonable" and generate a different sort of event for that case. To stay flexible, each Point has its processing steps explicitly configured when the system is loaded. A Measurement may pass through many steps, and each step is defined in terms of a "Trigger Condition" and an "Action" to perform if the conditions match.

3.2.2.2.1. Trigger Conditions

To determine whether performing an action is necessary, each processing step specifies the conditions the measurement must meet. A step will not be executed unless all conditions match (logical AND). Common triggers are:

  • Unit: Checks that the measurement has the correct unit; nearly every step should indicate the expected unit. When a Reef ProtocolAdapter reads a Measurement from the field it will often be able to attach a meaningful "unit" immediately if no transformations are necessary. If the value requires some calculation or translation to have a meaningful unit, the ProtocolAdapter will attach the unit "raw" and a transformation step will update the value and set the new unit.

  • Exact Value: Some steps (usually Event generation) only activate if a Measurement has a specific value. We can check for exact boolean, string and integer values.

  • Ranges: A common pattern is to check that a Measurement is inside of a certain range of values. Multiple overlapping ranges are often specified for a single Point to provide different events when a measurement exceeds the nominal range into the warning range and when it strays into a dangerous range. Ranges can also have a deadband to stop a value oscillating near a range boundary from generating too many events. This is commonly referred to as Limits, Limit Pairs and Nested Limits.
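
The following sketch shows one way such a range trigger with a deadband could be evaluated; it illustrates the behavior described above, not the Reef trigger configuration format.

// Illustration of a range trigger with a deadband; this sketches the behavior described above,
// not the Reef trigger configuration format.
public class RangeTriggerSketch {

    private final double low, high, deadband;
    private boolean active = false;      // true while the measurement is outside the nominal range

    RangeTriggerSketch(double low, double high, double deadband) {
        this.low = low;
        this.high = high;
        this.deadband = deadband;
    }

    /** Returns whether the trigger is active after seeing the new value. */
    boolean update(double value) {
        if (!active) {
            // activate as soon as the value strays outside the configured range
            if (value < low || value > high) active = true;
        } else {
            // clear only once the value is back inside the range by more than the deadband,
            // so a value oscillating near a boundary does not generate a flood of events
            if (value > low + deadband && value < high - deadband) active = false;
        }
        return active;
    }
}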

Figure 3.10. Nested Limits


When we determine that a Measurement matches all of the trigger conditions we say the "trigger is active" and start processing actions. Many actions run whenever the trigger is active, but in some cases it is valuable to run an action only when a trigger is first activated or when it is deactivated. Running an action whenever the trigger is active is sometimes called "level-triggering", while running it only on a change is called "edge-triggering"; both are supported. Actions therefore have an ActivationType that indicates when the action can be performed.

  • HIGH: Do action whenever trigger is active. Most transformations and annotations use HIGH.

  • LOW: Do action whenever trigger is inactive; not commonly used.

  • RISING: Do action when trigger is first activated, often used to generate "Measurement went out of Nominal" events.

  • FALLING: Do action when trigger was active and is now inactive. This can be used to generate "Measurement returned to Nominal" events.

  • TRANSITION: Do action when trigger active state changes, often used to generate "Measurement nominal status changed" events.
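
As a sketch of how an ActivationType gates an action, assuming the trigger's previous and current active states are known (illustrative only, not the Reef configuration objects):

// Sketch of how an ActivationType gates an action, given the trigger's previous and current state.
public class ActivationSketch {

    enum ActivationType { HIGH, LOW, RISING, FALLING, TRANSITION }

    static boolean shouldRun(ActivationType type, boolean wasActive, boolean isActive) {
        return switch (type) {
            case HIGH -> isActive;                       // level-triggered: run while active
            case LOW -> !isActive;                       // level-triggered: run while inactive
            case RISING -> !wasActive && isActive;       // edge-triggered: trigger just became active
            case FALLING -> wasActive && !isActive;      // edge-triggered: trigger just cleared
            case TRANSITION -> wasActive != isActive;    // edge-triggered: any change of state
        };
    }
}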

Figure 3.11. Action Triggering Conditions


3.2.2.2.2. Processing Actions

The measurement processor is capable of performing the following actions on each Measurement:

  • Scale: It is common that data from the field is sent with the wrong scale or unit. For example, many legacy protocols do not support "floating point" values, so devices are configured to send 100 or 1000 times the value. Reef provides a linear transformation with slope and offset.

  • Enum Transformation: Many Measurements (in particular statuses) need to be converted to string values for use in applications. A boolean value of true may indicate "CLOSED" for a breaker, but indicate "NOMINAL" for a door alarm point. The MeasurementProcessor will do this transformation so applications and operators don't need to understand the mappings between boolean values and real world state. The Measurement type is changed to string but the underlying boolean value is still available if needed. Less common but also supported is turning integer values into strings representing states.

  • Quality Annotation: We often want to update the quality of a Measurement but not affect the value. The most common case is to set the quality bit "abnormal" when the Measurement is outside expected or nominal values.

  • Generate Event: When a Measurement matches some condition (usually an undesired value) we want to alert the operator quickly by generating an Event. The action configuration determines what event code we are going to publish. The Point name, current value and quality are all sent as attributes to EventService to create helpful Event messages.
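
Two of these actions are simple enough to sketch directly; the snippet below shows a linear scale and a boolean-to-string enum transformation. The example values are assumptions for illustration, not actual configured Reef actions.

import java.util.Map;

// Illustrative versions of two actions: a linear scale and a boolean-to-string enum transformation.
// The example values are assumptions, not actual configured Reef actions.
public class ActionSketch {

    /** Scale: many legacy protocols send 100x or 1000x the value; apply slope and offset. */
    static double scale(long rawValue, double slope, double offset) {
        return rawValue * slope + offset;
    }

    /** Enum transformation: map a boolean status to a meaningful string for this Point. */
    static String toState(boolean value, Map<Boolean, String> mapping) {
        return mapping.get(value);
    }

    public static void main(String[] args) {
        System.out.println(scale(10900, 0.01, 0.0));                               // -> 109.0
        System.out.println(toState(true, Map.of(true, "CLOSED", false, "OPEN")));  // -> CLOSED
    }
}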

3.2.3. Measurement Publishing

Once a Measurement is fully processed it is published to two places. First it is published to the MeasurementStore, which is responsible for storing the current and historical values for each Point. Once written to the store, the Measurements are published to the "engineering channel" and the message bus sends a copy to every application that has subscribed to those Measurements.

3.3. Commands Subsystem

When agents need to affect the real world, they will execute a Command. To execute a Command they create a CommandRequest that represents that single execution.

3.3.1. Selecting a Command

Executing a Command is a potentially dangerous operation. If an electrical circuit breaker were closed at the wrong time or a factory line started unexpectedly, people could be injured or equipment destroyed. It is therefore vitally important that when Commands are executed, an agent can stop other agents from executing Commands that may conflict or cause a dangerous condition. This is often referred to as "Selecting" a group of Commands. We could also say that there is an "inter-lock" between the Commands.

Determining which Commands must not be executed simultaneously is done by the domain experts when initially modeling the system. In some cases we may "Select" only other Commands on that piece of Equipment but in other cases we may "Select" all of the Commands in the region.

To provide support for the full range of inter-lock requirements, the Command subsystem requires that all agents "Select" all of the Commands they want to have exclusive access to before making a CommandRequest. Another way to think about Selects is considering them as a key that opens a padlock on the Command "mailbox" so we can put the CommandRequest inside. Only one agent can possess this key at a time, so by taking all of the keys we can make sure no one else can execute the Commands.

Figure 3.12. Command Selects


When we have successfully executed our Command we need to release all of the Commands we had acquired access to. To release the Commands we DELETE the Select we were holding. By default, Selects will time out after a few moments even if the agent doesn't DELETE them.

In some cases we want to make sure no agent, including ourselves, will be able to execute a Command. This is referred to as "Blocking" a Command. Continuing with the mailbox analogy, "Blocking" a Command is like adding an extra padlock onto the mailbox that no agent has the key to, so it must be cut off. "Blocking" a Command is functionally nearly the same as "Selecting" it; the major differences are that no agent can execute the Command and that the block does not time out by default.

Caution

A single Select should be used when selecting or blocking multiple Commands; acquiring them one by one could lead to a deadlock if another agent is trying to access the same Commands.

3.3.2. Command Feedback

When a Command is executed, a CommandStatus code is returned to the agent from the field device indicating whether the Command was carried out. The CommandStatus code SUCCESS is the only one that indicates the Command was successful in the field; any other result indicates a failure to operate the Command. What is often confusing is that a SUCCESS response may not mean the Command has had the desired effect. For example, a Close Command to a circuit breaker could simply be configured to send current through an electromagnet that attracts the breaker blade magnetically to the closed position. The IED executing this Command will report SUCCESS if it was able to energize the electromagnet, but it may not detect whether the blade actually moved. We would need to look at the Closed Point to determine whether the circuit breaker had actually closed the circuit.

Figure 3.13. Field Device Command Implementation


In most systems there are Points that are closely related to the Command and should show a change if the Command was successful. For the circuit breaker example, most breakers will have a Point for Closed and a Point for Tripped. A Close Command should have moved the blade to the closed side of the circuit. It is the client's responsibility to verify that a Command had the desired effect by looking at the related Points. In Reef we encode this relationship as a "feedback" relationship between the Command and Point entities.

Figure 3.14. Command Feedback


3.3.3. Command Flow

  • When an agent wants to execute a Command they first obtain a Select for at least that Command.

  • Then they create a CommandRequest and post it to the CommandRequestService.

  • If the CommandRequestService determines that the CommandRequest can be issued, it will forward the request to the ProtocolAdapter responsible for that Command.

  • The ProtocolAdapter will translate the CommandRequest to the protocol-specific format and issue the Command to the field device, waiting for the response.

  • The field protocol will report if the Command was successfully issued or if a failure occurred.

  • The ProtocolAdapter will return that status code to the CommandRequestService, which will record the status of the CommandRequest and send that response to the agent.

  • The agent should then wait to see an update to the "feedback" Point.

  • Finally, the agent deletes the Select.
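
Put together, a client-side version of this flow might look like the sketch below. The service interface, method names and command names are hypothetical, used only to show the order of operations (select, execute, verify feedback, delete the Select); they are not the actual Reef client API.

import java.util.List;

// Hypothetical client-side sketch of the flow above; the service interface, method names and
// command names are illustrative, not the actual Reef client API.
public class CommandFlowSketch {

    interface CommandService {
        String select(List<String> commandNames);              // acquire a Select, returns its id
        String execute(String selectId, String commandName);   // post a CommandRequest, returns a CommandStatus
        void deleteSelect(String selectId);
        double readPoint(String pointName);                     // read a feedback Point
    }

    static void closeBreaker(CommandService service) {
        // 1. Acquire a single Select covering every Command we need exclusive access to.
        String selectId = service.select(List.of("Breaker.Close", "Breaker.Trip"));
        try {
            // 2-6. Post the CommandRequest and wait for the status code reported from the field.
            String status = service.execute(selectId, "Breaker.Close");
            if (!"SUCCESS".equals(status)) {
                throw new IllegalStateException("Command was not carried out: " + status);
            }
            // 7. SUCCESS only means the request was carried out; verify the related feedback Point.
            double closed = service.readPoint("Breaker.Closed");
            System.out.println("Breaker.Closed now reads: " + closed);
        } finally {
            // 8. Always release the Select when finished.
            service.deleteSelect(selectId);
        }
    }
}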

3.4. Event and Alarms Subsystem

When an Event is produced in Reef it is posted to the EventService. Most subsystems and applications will generate Events during the normal course of operations; they are not just for error messages.

3.4.1. Event Generation

When we decide to generate an Event we need to include all the relevant information we have so that appropriate messages can be generated and Events can be tied to the related objects. Information needed when generating an Event:

  • Event Type: Also referred to as the "event code", this string defines the "meaningful event" and determines how the Event will be handled and routed.

  • Entity: The Entity most closely related to the Event. For example if an Event is generated from a measurement, it will generally be attached to the Point. If an Event was generated with no clear Entity relationship, it can be left blank or attached to the generating application.

  • Agent: The agent most responsible for causing an Event. Most subsystems will just use the agent they authenticated to Reef with; if this field is left blank the agent carried by the AuthToken will be used. Any subsystem that is acting as a proxy for another agent should send that Agent's UUID. For example, when the CommandRequestService is executing a Command selected by "Agent A" it will attach "Agent A" to the generated Event.

  • Attributes: An Event can carry more data about an occurrence than just the "Event Type" and related Entity. For example, an Event indicating that a measurement has an unexpected value is far more useful when the unexpected value is in the Event's message. Each subsystem should attach, as named attributes, any information that might be valuable in an Event message. A naive implementation would have each subsystem create an appropriate string message, but this would make it impossible to translate or customize. Reef uses "named arguments" rather than "positional arguments" for the rendered messages, because this makes the Events themselves more self contained.

  • Device Time: When an Event is recorded by the system, the system time at which it is received is set on the Event object. In some cases we are generating the Event based on some other data which may not be from "now". In those cases we set the "Device Time" to indicate when we think the Event really occurred. An example is the MeasurementProcessor, which generates an Event based on a Measurement; the Measurement may have been delayed in the field but carries a timestamp, so we attach the Measurement's timestamp to the generated Event.
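
A sketch of an Event request carrying these fields follows; the record shape and the "OutOfRange" event code are assumptions made for illustration, not the actual EventService message.

import java.util.Map;
import java.util.UUID;

// Sketch of an Event request carrying the fields above; the record shape and the "OutOfRange"
// event code are assumptions for illustration, not the actual EventService message.
public class EventSketch {

    record EventRequest(String eventType, UUID entityUuid, String agentName,
                        Map<String, String> attributes, Long deviceTimeMillis) {}

    static EventRequest outOfRangeEvent(UUID pointUuid, String value, String unit, long measTimeMillis) {
        return new EventRequest(
                "OutOfRange",                          // event code (assumed name)
                pointUuid,                             // Entity most closely related to the Event
                null,                                  // left blank: the agent on the AuthToken is used
                Map.of("value", value, "unit", unit),  // named attributes for the rendered message
                measTimeMillis);                       // device time taken from the Measurement
    }
}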

3.4.2. Event and Alarm Configuration

In many installations Events and Alarms drive the day-to-day workflow of the human operators. It is therefore important to be able to configure them appropriately, giving the operators the right level of information without overwhelming them. It is also important that the accompanying messages reflect the terminology and language in use at that location.

Each EventType is configurable in a number of ways:

  • Severity: This is a number between 1 and 10, 1 being the most severe.

  • Designation: Handle this Event as a LOG, EVENT or ALARM. See the Event Designation section for details.

  • Resource String: The text of the rendered Event message is configured here. The string format is based on Java's MessageFormat syntax. One major difference is that we have replaced the numeric (positional) arguments with named attributes.

    // an example resource for an out of range value
    "Point {name} is out of range. Value: {value} {unit}"
    // -> "Point Line1.Voltage is out of range. Value: 109 volts"

3.4.2.1. Event Designation

When an Event is received by the EventService it consults the EventConfigService to determine how the Event should be handled:

  • Log: Some Events may be considered to be too numerous or unimportant to store as an Event. In these cases we render the Event message to the offline system log only.

  • Event: Most Events fall into this category. We store the posted Event to the event storage and publish a service ADDED event for the Event.

  • Alarm: The most important Events can be promoted to Alarms. Creating an Alarm will push an alert to an operator immediately, prompting them to examine the situation and take some action. The Event is still stored in the event storage as well.

This flexibility allows subsystems to be written just to generate Events at interesting times and lets the system integrator decide which Events should generate Alarms.

3.4.3. Alarm Lifecycle

Whereas Events are simply created and viewed, Alarms are more complex and have an associated workflow. This workflow is used to communicate an urgent piece of information to a human operator and verify that they have responded to the situation.

  • An Alarm starts out in the UNACKNOWLEDGED state. Most important Alarms are also configured to generate an audible alert for every operator.

  • If the Alarm is audible the operator may silence the alarm. This will silence the alarm for all operators.

  • Once the operator is looking into the problem they will ACKNOWLEDGE the Alarm.

  • When the problem is considered handled, the operator will REMOVE the Alarm, which removes it from active displays although it remains in the history.
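
The lifecycle can be summarized as a small state machine. The sketch below is illustrative only (splitting the unacknowledged state into audible and silent variants is one way to model the silence step); it is not the actual Alarm service types.

// Illustrative state machine for the lifecycle above; splitting the unacknowledged state into
// audible and silent variants is one way to model the silence step, not the actual Alarm types.
public class AlarmLifecycleSketch {

    enum AlarmState { UNACK_AUDIBLE, UNACK_SILENT, ACKNOWLEDGED, REMOVED }

    /** Silencing stops the audible alert for all operators but the Alarm stays unacknowledged. */
    static AlarmState silence(AlarmState s) {
        return s == AlarmState.UNACK_AUDIBLE ? AlarmState.UNACK_SILENT : s;
    }

    static AlarmState acknowledge(AlarmState s) {
        return (s == AlarmState.UNACK_AUDIBLE || s == AlarmState.UNACK_SILENT)
                ? AlarmState.ACKNOWLEDGED : s;
    }

    /** Removed Alarms disappear from active displays but remain in the history. */
    static AlarmState remove(AlarmState s) {
        return s == AlarmState.ACKNOWLEDGED ? AlarmState.REMOVED : s;
    }
}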

Figure 3.15. Alarm Lifecycle