Delay-Tolerant Networking E. Birrane Internet-Draft Johns Hopkins Applied Physics Laboratory Intended status: Informational October 30, 2017 Expires: May 3, 2018 Asynchronous Management Architecture draft-birrane-dtn-ama-06 Abstract This document describes an asynchronous management architecture (AMA) suitable for providing application-level network management services in a challenged networking environment. Challenged networks are those that require fault protection, configuration, and performance reporting while unable to provide humans-in-the-loop with synchronous feedback or otherwise preserve transport-layer sessions. In such a context, networks must exhibit behavior that is both determinable and autonomous while maintaining compatibility with existing network management protocols and operational concepts. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on May 3, 2018. Copyright Notice Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect Birrane Expires May 3, 2018 [Page 1] Internet-Draft AMA October 2017 to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Purpose . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3. Requirements Language . . . . . . . . . . . . . . . . . . 4 1.4. Organization . . . . . . . . . . . . . . . . . . . . . . 4 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 3. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.1. Challenged Networks . . . . . . . . . . . . . . . . . . . 6 3.2. Current Approaches and Their Limitations . . . . . . . . 7 4. Service Definitions . . . . . . . . . . . . . . . . . . . . . 8 4.1. Configuration . . . . . . . . . . . . . . . . . . . . . . 9 4.2. Reporting . . . . . . . . . . . . . . . . . . . . . . . . 9 4.3. Autonomous Parameterized Procedure Calls . . . . . . . . 10 4.4. Administration . . . . . . . . . . . . . . . . . . . . . 10 5. Desirable Properties . . . . . . . . . . . . . . . . . . . . 11 5.1. Intelligent Push of Information . . . . . . . . . . . . . 11 5.2. Minimize Message Size Not Node Processing . . . . . . . . 12 5.3. Absolute Data Identification . . . . . . . . . . . . . . 12 5.4. Custom Data Definition . . . . . . . . . . . . . . . . . 12 5.5. Autonomous Operation . . . . . . . . . . . . . . . . . . 13 6. Roles and Responsibilities . . . . . . . . . . . . . . . . . 13 6.1. Agent Responsibilities . . . . . . . . . . . . . . . . . 14 6.2. Manager Responsibilities . . . . . . . . . . . . . . . . 15 7. Logical Data Model . . . . . . . . . . . . . . . . . . . . . 15 7.1. EDDs, VARs, and Reporting . . . . . . . . . . . . . . . . 16 7.2. Controls and Macros . . . . . . . . . . . . . . . . . . . 17 7.3. Rules . . . . . . . . . . . . . . . . . . . . . . . . . . 17 7.4. Operators and Literals . . . . . . . . . . . . . . . . . 18 8. System Model . . . . . . . . . . . . . . . . . . . . . . . . 18 8.1. Control and Data Flows . . . . . . . . . . . . . . . . . 18 8.2. Control Flow by Role . . . . . . . . . . . . . . . . . . 19 8.2.1. Notation . . . . . . . . . . . . . . . . . . . . . . 19 8.2.2. Serialized Management . . . . . . . . . . . . . . . . 20 8.2.3. Multiplexed Management . . . . . . . . . . . . . . . 21 8.2.4. Data Fusion . . . . . . . . . . . . . . . . . . . . . 23 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24 10. Security Considerations . . . . . . . . . . . . . . . . . . . 24 11. Informative References . . . . . . . . . . . . . . . . . . . 24 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 25 Birrane Expires May 3, 2018 [Page 2] Internet-Draft AMA October 2017 1. Introduction The Asynchronous Management Architecture (AMA) provides application- layer network management services over links where delivery delays prevent timely communications between a network operator and a managed device. These delays may be caused by long signal propagations or frequent link disruptions (such as described in [RFC4838]) or by non-environmental factors such as unavailability of network operators, administrative delays, or delays caused by quality-of-service prioritizations and service-level agreements. 1.1. Purpose This document describes the motivation, service definitions, desirable properties, roles/responsibilities, system model, and logical data model that form the AMA. These descriptions should be of sufficient specificity that implementations conformant to this architecture will operate successfully in a challenged networking environment. This document is not a prescriptive standardization of a physical data model or protocol. Instead, it serves as informative guidance to authors of such models and protocols. An AMA is necessary as the assumptions inherent to the architecture and design of synchronous management tools and techniques are not valid in challenged network scenarios. In these scenarios, synchronous approaches either patiently wait for periods of bi- directional connectivity or require the investment of significant time and resources to evolve a challenged network into a well- connected, low-latency network. In some cases such evolution is merely a costly way to over-resource a network. In other cases, such evolution is impossible given physical limitations imposed by signal propagation delays, power, transmission technologies, and other phenomena. Asynchronous management of asynchronous networks enables large-scale deployments, distributed technical capabilities, and reduced deployment and operations costs. The rationale and motivation for asynchronous management is captured in [BIRRANE1], [BIRRANE2],[BIRRANE3]. The properties and feasibility of such a system are taken from prototyping work done in accordance with [I-D.irtf-dtnrg-dtnmp]. 1.2. Scope It is assumed that any challenged network where network management would be usefully applied supports basic services (where necessary) such as naming, addressing, integrity, confidentiality, Birrane Expires May 3, 2018 [Page 3] Internet-Draft AMA October 2017 authentication, fragmentation, and traditional network/session layer functions. Therefore, these items are outside of the scope of the AMA and not covered in this document. While possible that a challenged network may interface with an unchallenged network, this document does not address the concept of network management compatibility with synchronous approaches. 1.3. Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. 1.4. Organization The remainder of this document is organized into seven sections that, together, describe an AMA suitable for enterprise management of asynchronous networks: terminology, motivation, service definitions, desirable properties, roles/responsibilities, logical data model, and system model. The description of each section is as follows. o Terminology - This section identifies those terms critical to understanding the proper operation of the AMA. Whenever possible, these terms align in both word selection and meaning with their analogs from other management protocols. o Motivation - This section provides an overall motivation for this work as providing a novel and useful alternative to current network management approaches. Specifically, this section describes common network functions and how synchronous mechanisms fail to provide these functions in an asynchronous environment. o Service Definitions - This section defines asynchronous network management services in terms of terminology, scope, and impact. o Desirable Properties - This section identifies the properties to which an asynchronous management system should adhere to effectively implement service definitions in an asynchronous environment. These properties guide the subsequent definition of the system and logical models that comprise the AMA. o Roles and Responsibilities - This section identifies the roles in the AMA and their associated responsibilities. It provides the terminology and context for discussing how network management services interact. Birrane Expires May 3, 2018 [Page 4] Internet-Draft AMA October 2017 o Logical Data Model - This section describes the kinds of data that should be represented in deployment asynchronous management system. o System Model - This section describes data flows amongst various defined Actor roles. These flows capture how the AMA system works to provide asynchronous network management services in accordance with defined desirable properties. 2. Terminology o Actor - A software service running on either managed or managing devices for the purpose of implementing management protocols between such devices. Actors may implement the "Manager" role, "Agent" role, or both. o Agent Role (or Agent) - The role associated with a managed device, responsible for reporting performance data, enforcing administrative policies, and accepting/performing actions. Agents exchange information with Managers operating either on the same device or on a remote managing device. o Externally Defined Data (EDD) - Information made available to an Agent by a managed device, but not computed directly by the Agent. o Variable (VAR) - Information that is computed by an Agent, typically as a function of EDD values and/or other Variables. o Controls (CTRLs) - Operations that may be undertaken by an Actor to change the behavior, configuration, or state of an application or protocol managed by an AMP. o Literals (LIT) - Constants, enumerations, and other immutable definitions. o Macros - A named, ordered collection of Controls. o Manager - A role associated with a managing device responsible for configuring the behavior of, and receiving information from, Agents. Managers interact with one or more Agents located on the same device and/or on remote devices in the network. o Operator (OP) - The enumeration and specification of a mathematical function used to calculate computed data definitions and construct expressions to calculate state. Birrane Expires May 3, 2018 [Page 5] Internet-Draft AMA October 2017 o Report (RPT) - A named, typed, ordered collection of data values gathered by one or more Agents and provided to one or more Managers. o Rule - A unit of autonomous specification that provides a stimulus-response relationship between time or state on an Agent and the Controls to be run as a result of that time or state. 3. Motivation Challenged networks, to include networks challenged by administrative or policy delays, cannot guarantee capabilities required to enable synchronous management techniques. These capabilities include high- rate, highly-available data, round-trip data exchange, and operators "in-the-loop". The inability of current approaches to provide network management services in a challenged network motivates the need for a new network management architecture focused on asynchronous, open-loop, autonomous control of network components. 3.1. Challenged Networks A growing variety of link-challenged networks support packetization to increase data communications reliability without otherwise guaranteeing a simultaneous end-to-end path. Examples of such networks include Mobile Ad-Hoc Networks (MANets), Vehicular Ad-Hoc Networks (VANets), Space-Terrestrial Internetworks (STINTs), and heterogeneous networking overlays. Links in such networks are often unavailable due to attenuations, propagation delays, occultation, and other limitations imposed by energy and mass considerations. Data communications in such networks rely on store-and-forward and other queuing strategies to wait for the connectivity necessary to usefully advance a packet along its route. Similarly, there also exist well-resourced networks that incur high message delivery delays due to non-environmental limitations. For example, networks whose operations centers are understaffed or where data volume and management requirements exceed the real-time cognitive load of operators or the associated operations console software support. Also, networks where policy restricts user access to existing bandwidth creates situations functionally similar to link disruption and delay. Independent of the reason, when a node experiences an inability to communicate it must rely on autonomous mechanisms to ensure its safe operation and ability to usefully re-join the network at a later time. In cases of sparsely-populated networks, there may never be a practical concept of "the connected network" as most nodes may be disconnected most of the time. In such environments, defining a Birrane Expires May 3, 2018 [Page 6] Internet-Draft AMA October 2017 network in terms of instantaneous connectivity becomes impractical or impossible. Specifically, challenged networks exhibit the following properties that may violate assumptions built into current approaches to synchronous network management. o Links may be uni-directional. o Bi-directional links may have asymmetric data rates. o No end-to-end path is guaranteed to exist at any given time between any two nodes. o Round-trip communications between any two nodes within any given time window may be impossible. 3.2. Current Approaches and Their Limitations Network management tools in unchallenged networks provide mechanisms for communicating locally-collected data from Agents to Managers, typically using a "pull" mechanism where data must be explicitly requested by a Manager in order to be transmitted by an Agent. Management approaches that rely on timely data exchange, such as those that rely on negotiated sessions or other synchronized acknowledgment, do not function in challenged network environments. Familiar examples of TCP/IP based management via closed-loop, synchronous messaging do not work when network disruptions increase in frequency and severity. While no protocol delivers data in the absence of a networking link, protocols that eliminate or drastically reduce overhead and end-point coordination require smaller transmission windows and continue to function when confronted with scaling delays and disruptions in the network. A legacy method for management in unchallenged networks today is the Simple Network Management Protocol (SNMP) [RFC3416]. SNMP utilizes a request/response model to set and retrieve data values such as host identifiers, link utilizations, error rates, and counters between application software on Agents and Managers. Data may be directly sampled or consolidated into representative statistics. Additionally, SNMP supports a model for asynchronous notification messages, called traps, based on predefined triggering events. Thus, Managers can query Agents for status information, send new configurations, and be informed when specific events have occurred. Traps and queryable data are defined in one or more Managed Information Bases (MIBs) which define the information for a particular data standard, protocol, device, or application. Birrane Expires May 3, 2018 [Page 7] Internet-Draft AMA October 2017 While there is a large installation base for SNMP there are several aspects of the protocol that make in inappropriate for use in a challenged networking environment. SNMP relies on sessions with low round-trip latency to support its "pull" model. The SNMP trap model provides some Agent-side processing, but with very low fidelity and traps are typically "fire and forget" requiring the underlying transport to support reliable, in-order message delivery. Adaptive modifications to SNMP to support challenged networks would alter the basic function of the protocol (data models, control flows, and syntax) so as to be functionally incompatible with existing SNMP installations. Therefore, this approach is not suitable for an asynchronous network management system. The Network Configuration Protocol (NETCONF) provides device-level configuration capabilities [RFC6241] to replace vendor-specific command line interface (CLI) configuration software. The XML-based protocol provides a remote procedure call (RPC) syntax such that any exposed functionality on an Agent can be exercised via a software application interface. NETCONF places no specific functional requirements or constraints on the capabilities of the Agent, which makes it a very flexible tool for configuring a homogeneous network of devices. NETCONF places specific constraints on any underlying transport protocol: a long-lived, reliable, low-latency sequenced data delivery session. This is a fundamental requirement given the RPC-nature of the operating concept, and it is unsustainable in a challenged network. Aspects of the data modeling associated with NETCONF may apply to an asynchronous network management system, such that some modeling tools may be used, even if the network control plane cannot. Just as the concept of a loosely-confederated set of nodes changes the definition of a network, it also changes the operational concept of what it means to manage a network. When a network stops being a single entity exhibiting a single behavior, "network management" becomes large-scale "node management". Individual nodes must share the burden of implementing desirable behavior without reliance on a single oracle of configuration or other coordinating function such as an operator-in-the-loop. 4. Service Definitions This section identifies the services that must exist between Managers and Agents within an AMA. These services include configuration, reporting, parameterized control, and administration. Birrane Expires May 3, 2018 [Page 8] Internet-Draft AMA October 2017 4.1. Configuration Configuration services update Agent data associated with managed applications and protocols. Some configuration data might be defined in the context of an application or protocol, such that any network using that application or protocol would understand that data. Other configuration data may be defined tactically for use in a specific network deployment and not available to other networks even if they use the same applications or protocols. New configurations received by an Agent must be validated to ensure that they do not conflict with other configurations or would otherwise prevent the Agent from effectively working with other Actors in its region. With no guarantee of round-trip data exchange, Agents cannot rely on remote Managers to correct erroneous or stale configurations from harming the flow of data through a challenged network. Examples of configuration service behavior include the following. o Creating a new datum as a function of other well-known data: C = A + B. o Creating a new report as a unique, ordered collection of known data: RPT = {A, B, C}. o Storing predefined, parameterized responses to potential future conditions: IF (X > 3) THEN RUN CMD(PARM). 4.2. Reporting Reporting services populate report templates with values collected or computed by an Agent. The resultant reports are sent to one or more Managers by the Agent. The term "reporting" is used in place of the term "monitoring", as monitoring implies a timeliness and regularity that cannot be guaranteed by a challenged network. Reports sent by an Agent provide best-effort information to receiving Managers. Since a Manager is not actively "monitoring" an Agent, the Agent must make its own determination on when to send what Reports based on its own local time and state information. Agents should produce Reports of varying fidelity and with varying frequency based on thresholds and other information set as part of configuration services. Examples of reporting service behavior include the following. Birrane Expires May 3, 2018 [Page 9] Internet-Draft AMA October 2017 o Generate Report R1 every hour (time-based production). o Generate Report R2 when X > 3 (state-based production). 4.3. Autonomous Parameterized Procedure Calls Similar to an RPC call, some mechanism MUST exist to allow a procedure to be run on an Agent to effect behavior or otherwise change the Agent's internal state. Since there is no guarantee that a Manager will be in contact with an Agent at any given time, the decisions of whether and when a procedure should be run MUST be made locally and autonomously by the Agent. Two types of automation triggers are identified in the AMA: triggers based on the general state of the Agent and triggers based on an Agent's notion of time. As such, the autonomous execution of procedures can be viewed as a stimulus-response system, where the stimulus is the positive evaluation of a state or time based predicate and the response is the function to be executed. The autonomous nature of procedure execution by an Agent implies that the full suite of information necessary to run a procedure may not be known by a Manager in advance. To address this situation, a parameterization mechanism MUST be available so that required data can be provided at the time of execution on the Agent rather than at the time of definition/configuration by the Manager. Autonomous, parameterized procedure calls provide a powerful mechanism for Managers to "manage" an Agent asynchronously during periods of no communication by pre-configuring responses to events that may be encountered by the Agent at a future time. Examples of potential behavior include the following. o Updating local routing information based on instantaneous link analysis. o Managing storage on the device to enforce quotas. o Applying or modifying local security policy. 4.4. Administration Administration services enforce the potentially complex mapping of configuration, reporting, and control services amongst Agents and Managers in the network. Fine-grained access control specifying which Managers may apply which services to which Agents may be necessary in networks dealing with multiple administrative entities or overlay networks crossing multiple administrative boundaries. Birrane Expires May 3, 2018 [Page 10] Internet-Draft AMA October 2017 Whitelists, blacklists, key-based infrastructures, or other schemes may be used for this purpose. Examples of administration service behavior include the following. o Agent A1 only Sends reports for Protocol P1 to Manager M1. o Agent A2 only accepts a configurations for Application Y from Managers M2 and M3. o Agent A3 accepts services from any Manager providing the proper authentication token. Note that the administrative enforcement of access control is different from security services provided by the networking stack carrying AMP messages. 5. Desirable Properties This section describes those design properties that are desirable when defining an architecture that must operate across challenged links in a network. These properties ensure that network management capabilities are retained even as delays and disruptions in the network scale. Ultimately, these properties are the driving design principles for the AMA. 5.1. Intelligent Push of Information Pull management mechanisms require that a Manager send a query to an Agent and then wait for the response to that query. This practice implies a control-session between entities and increases the overall message traffic in the network. Challenged networks cannot guarantee timely roundtrip data-exchange and, in extreme cases, are comprised solely of uni-directional links. Therefore, pull mechanisms must be avoided in favor of push mechanisms. Push mechanisms, in this context, refer to Agents making their own determinations relating to the information that should be sent to Managers. Such mechanisms do not require round-trip communications as Managers do not request each reporting instance; Managers need only request once, in advance, that information be produced in accordance with a predetermined schedule or in response to a predefined state on the Agent. In this way information is "pushed" from Agents to Managers and the push is "intelligent" because it is based on some internal evaluation performed by the Agent. Birrane Expires May 3, 2018 [Page 11] Internet-Draft AMA October 2017 5.2. Minimize Message Size Not Node Processing Protocol designers must balance message size versus message processing time at sending and receiving nodes. Verbose representations of data simplify node processing whereas compact representations require additional activities to generate/parse the compacted message. There is no asynchronous management advantage to minimizing node processing time in a challenged network. However, there is a significant advantage to smaller message sizes in such networks. Compact messages require smaller periods of viable transmission for communication, incur less re-transmission cost, and consume less resources when persistently stored en-route in the network. AMPs should minimize PDUs whenever practical, to include packing and unpacking binary data, variable-length fields, and pre- configured data definitions. 5.3. Absolute Data Identification Elements within the management system must be uniquely identifiable so that they can be individually manipulated. Identification schemes that are relative to system configuration make data exchange between Agents and Managers difficult as system configurations may change faster than nodes can communicate. Consider the following common technique for approximating an associative array lookup. A manager wishing to do an associative lookup for some key K1 will (1) query a list of array keys from the agent, (2) find the key that matches K1 and infer the index of K1 from the returned key list, and (3) query the discovered index on the agent to retrieve the desired data. Ignoring the inefficiency of two pull requests, this mechanism fails when the Agent changes its key-index mapping between the first and second query. Rather than constructing an artificial mapping from K1 to an index, an AMP must provide an absolute mechanism to lookup the value K1 without an abstraction between the Agent and Manager. 5.4. Custom Data Definition Custom definition of new data from existing data (such as through data fusion, averaging, sampling, or other mechanisms) provides the ability to communicate desired information in as compact a form as possible. Specifically, an Agent should not be required to transmit a large data set for a Manager that only wishes to calculate a smaller, inferred data set. The Agent should calculate the smaller data set on its own and transmit that instead. Since the identification of custom data sets is likely to occur in the context Birrane Expires May 3, 2018 [Page 12] Internet-Draft AMA October 2017 of a specific network deployment, AMPs must provide a mechanism for their definition. 5.5. Autonomous Operation AMA network functions must be achievable using only knowledge local to the Agent. Rather than directly controlling an Agent, a Manager configures an engine of the Agent to take its own action under the appropriate conditions in accordance with the Agent's notion of local state and time. Such an engine may be used for simple automation of predefined tasks or to support semi-autonomous behavior in determining when to run tasks and how to configure or parameterize tasks when they are run. Wholly autonomous operations MAY be supported where required. Generally, autonomous operations should provide the following benefits. o Distributed Operation - The concept of pre-configuration allows the Agent to operate without regular contact with Managers in the system. The initial configuration (and periodic update) of the system remains difficult in a challenged network, but an initial synchronization on stimuli and responses drastically reduces needs for centralized operations. o Deterministic Behavior - Such behavior is necessary in critical operational systems where the actions of a platform must be well understood even in the absence of an operator in the loop. Depending on the types of stimuli and responses, these systems may be considered simple automation or semi-autonomous behavior, both of which imply the ability of a frequently-out-of-contact Manager to better predict the state of an Agent than if controls were to be run by an independent, fully autonomous system. o Engine-Based Behavior - Several operational systems are unable to deploy "mobile code" based solutions due to network bandwidth, memory or processor loading, or security concerns. Engine-based approaches are preferred as they can be flexible without incurring a set of problematic requirements or concerns. 6. Roles and Responsibilities By definition, Agents reside on managed devices and Managers reside on managing devices. This section describes how these roles participate in the network management functions outlined in the prior section. Birrane Expires May 3, 2018 [Page 13] Internet-Draft AMA October 2017 6.1. Agent Responsibilities Application Support Agents MUST collect all data, execute all procedures, populate all reports and run operations required by each application which the Agent claims to manage. Agents MUST report supported applications so that Managers in a network understands what information is understood by what Agent. Local Data Collection Agents MUST collect from local firmware (or other on-board mechanisms) and report all data defined for the management of applications for which they have been configured. Autonomous Control Agents MUST determine, without Manager intervention, whether a procedure should be invoked. Agents MAY also invoke procedures on other devices for which they act as proxy. User Data Definition Agents MUST provide mechanisms for operators in the network to use configuration services to create customized data definitions in the context of a specific network or network use-case. Agents MUST allow for the creation, listing, and removal of such definitions in accordance with whatever security models are deployed within the particular network. Where applicable, Agents MUST verify the validity of these definitions when they are configured and respond in a way consistent with the logging/error-handling policies of the Agent and the network. Autonomous Reporting Agents MUST determine, without real-time Manager intervention, whether and when to populate and transmit a given report targeted to one or more Managers in the network. Consolidate Messages Agents SHOULD produce as few messages as possible when sending information. For example, rather than sending multiple messages, each with one report to a Manager, an Agent SHOULD prefer to send a single message containing multiple reports. Regional Proxy Agents MAY perform any of their responsibilities on behalf of other network nodes that, themselves, do not have an Agent. Birrane Expires May 3, 2018 [Page 14] Internet-Draft AMA October 2017 In such a configuration, the Agent acts as a proxy for these other network nodes. 6.2. Manager Responsibilities Agent Capabilities Mapping Managers MUST understand what applications are managed by the various Agents with which they communicate. Managers should not attempt to request, invoke, or refer to application information for applications not managed by an Agent. Data Collection Managers MUST receive information from Agents by asynchronously configuring the production of reports and then waiting for, and collecting, responses from Agents over time. Managers MAY try to detect conditions where Agent information has not been received within operationally relevant time spans and react in accordance with network policy. Custom Definitions Managers should provide the ability to define custom data definitions. Any custom definitions MUST be transmitted to appropriate Agents and these definitions MUST be remembered to interpret the reporting of these custom values from Agents in the future. Data Translation Managers should provide some interface to other network management protocols. Managers MAY accomplish this by accumulating a repository of push-data from high-latency parts of the network from which data may be pulled by low- latency parts of the network. Data Fusion Managers MAY support the fusion of data from multiple Agents with the purpose of transmitting fused data results to other Managers within the network. Managers MAY receive fused reports from other Managers pursuant to appropriate security and administrative configurations. 7. Logical Data Model The AMA logical data model captures the types of information that should be collected and exchanged to implement necessary roles and responsibilities. The data model presented in this section does not presuppose a specific mapping to a physical data model or encoding technique; it is included to provide a way to logically reason about Birrane Expires May 3, 2018 [Page 15] Internet-Draft AMA October 2017 the types of data that should be exchanged in an asynchronously managed network. The elements of the AMA logical data model are described as follows. 7.1. EDDs, VARs, and Reporting There are three fundamental representations of data in the AMA: (1) data which are sampled/calculated external to the network management system, (2) data which are calculated internal to the network management system, and (3) ordered collections of data items used for reporting. Data that is sampled/calculated external to the network management system is defined as "externally defined data" (EDD). EDD values represent the most useful information in the management system as they are provided by the applications or protocol being managed on the Agent. It is RECOMMENDED that EDD values be strongly typed to avoid issues with interpreting the data value. It is also RECOMMENDED that the timeliness/staleness of the data value be considered when using the data in the context of autonomous action on the Agent. Data that is calculated internal to the network management system is defined as a "variable" (VAR). VARs allow the creation of new data values for use in the network management system. New value definitions are useful for storing user-defined information, storing the results of complex calculations for easier re-use, and providing a mechanism for combining information from multiple external sources. It is RECOMMENDED that VARs be strongly typed to avoid issues with interpreting the data value. In cases where a VAR definition relies on other VAR definitions, mechanisms to prevent circular references MUST be included in any actual data model or implementation. Ordered collections of EDD values and VARs should be produced by Agents and sent to Managers as a way of communicating Agent state. Such an ordered collection is called a "report" (RPT). It is RECOMMENDED that the structure of a RPT be given in a template that can be synchronized between an Agent and a Manager so that RPTs themselves do not need to be self-describing. A RPT may include EDD values, VARs, and also other RPTs. In cases where a RPT includes another RPT, mechanisms to prevent circular references MUST be included in any actual data model or implementation. In cases where sets of data can comply to a tabular format, a Table structure should be used instead of a report structure. In these cases, table columns provide name and type information that does not need to be repeated in every data value. However, unlike Birrane Expires May 3, 2018 [Page 16] Internet-Draft AMA October 2017 reports, the regularity of the table structure allows for the addition of rows without altering the format of the table structure. Attempting to add a new data set at the end of a report would require alterations to the report template. Additionally, the prescriptive nature of the Table structure allows for the possibility of advanced filtering which may reduce traffic between Agents and Managers. 7.2. Controls and Macros Low-latency, high-availability approaches to network management use mechanisms such as (or similar to) remote procedure calls (RPCs) to cause some action to be performed on an Agent. The AMA requires similar capabilities, though without requiring that the Manager be in the processing loop of the Agent. A "control" (CTRL) represents a parameterized, predefined procedure that can be run on an Agent. CTRLs do not have a return code as there is not the same concept of sequential execution in an asynchronous model. Parameters can be provided when running a command from a Manager, pre-configured as part of an autonomy response on the Agent, or auto-generated as needed on the Agent. The success or failure of a control MAY be inferred by reports generated for that purpose. Often, a series of controls must be executed in concert to achieve a particular outcome. A "macro" (MACRO) represents an ordered collection of controls (or other macros). In cases where a MACRO includes another MACRO, mechanisms to prevent circular references and maximum nesting levels MUST be included in any actual data model or implementation. 7.3. Rules The AMA data model contains EDD values and VARs that capture the state of applications on an Agent. The model also contains controls and macros to perform actions on an Agent. A mechanism is needed to relate these two capabilities; to perform an action on the Agent in response to the state of the Agent. One way of mapping Agent state to Agent actions is via a stimulus- response system. A "rule" represents a stimulus-response pairing in the following form. IF predicate THEN response The predicate is a logical expression that evaluates to true if the rule stimulus is present and evaluates to false otherwise. The response may be any control or macro known to the Agent. An example of a time-based predicate is to perform some activity every 24 hours (e.g., (((CUR_TIME - START_TIME) % 24Hrs) == 0)). An example of a Birrane Expires May 3, 2018 [Page 17] Internet-Draft AMA October 2017 state-based predicate is to perform some activity if a given EDD value exceeds a predefined threshold such as a measured temperature exceeds 80 degrees centigrade (e.g., (TEMP > 80.0)) Rules should be allowed to construct their stimuli from the full set of EDD values and VARs available to the network management system. Similarly, macro responses should be allowed to include controls from all applications known by the Agent. This enables an expressive capability to have multiple applications monitored and managed by the Agent. 7.4. Operators and Literals Actions such as computing a VAR value or describing a rule predicate require calculating mathematical expressions. An element of an expression will be one of four types of data: an EDD value, a VAR value, a mathematical operations, and literal values. An "operator" (OP) represents a mathematical operation in an expression. OPs should support multiple operands based on the operation supported. A common set of OPs SHOULD be defined for any Agent and systems MAY choose to allow individual applications to define new OPs to assist in the generation of new VAR values and predicates for managing that application. OPs may be simple binary operations such as "A + B" or more complex functions such as sin(A) or avg(A,B,C,D). A "literal" (LIT) represents a constant value, such as simple numbers (e.g., 4), well-known mathematical numbers (e.g., PI, E), or other useful data such as Epoch times. LITs should be strongly typed to avoid any misinterpretation of their data value. 8. System Model This section describes the notional data flows and control flows that illustrate how Managers and Agents within an AMA cooperate to perform network management services. 8.1. Control and Data Flows The AMA identifies three significant data flows: control flows from Managers to Agents, reports flows from Agents to Managers, and fusion reports from Managers to other Managers. These data flows are illustrated in Figure 1. Birrane Expires May 3, 2018 [Page 18] Internet-Draft AMA October 2017 AMA Control and Data Flows +---------+ +------------------------+ +---------+ | Node A | | Node B | | Node C | | | | | | | |+-------+| |+-------+ +-------+| |+-------+| || ||=====>>||Manager|====>>| ||====>>|| || || ||<<=====|| B |<<====|Agent B||<<====|| || || || |+--++---+ +-------+| ||Manager|| || Agent || +---||-------------------+ || C || || A || || || || || ||<<=========||==========================|| || || ||===========++========================>>|| || |+-------+| |+-------+| +---------+ +---------+ Figure 1 In this data flow, the Agent on node A receives Controls from Managers on nodes B and C, and replies with Report Entries back to these Managers. Similarly, the Agent on node B interacts with the local Manager on node B and the remote Manager on node C. Finally, the Manager on node B may fuse Report Entries received from Agents at nodes A and B and send these fused Report Entries back to the Manager on node C. From this figure it is clear that there exist many-to-many relationships amongst Managers, amongst Agents, and between Agents and Managers. Note that Agents and Managers are roles, not necessarily differing software applications. Node A may represent a single software application fulfilling only the Agent role, whereas node B may have a single software application fulfilling both the Agent and Manager roles. The specifics of how these roles are realized is an implementation matter. 8.2. Control Flow by Role This section describes three common configurations of Agents and Managers and the flow of messages between them. These configurations involve local and remote management and data fusion. 8.2.1. Notation The notation outlined in Table 1 describes the types of control messages exchanged between Agents and Managers. Birrane Expires May 3, 2018 [Page 19] Internet-Draft AMA October 2017 +-------------+--------------------------------------+--------------+ | Term | Definition | Example | +-------------+--------------------------------------+--------------+ | EDD# | EDD definition. | EDD1 | | | | | | V# | Custom data definition. | V1 = EDD1 + | | | | V0. | | | | | | DEF([ACL], | Define id from expression. Allow | DEF([*], V1, | | ID,EXPR) | managers in access control list | EDD1 + EDD2) | | | (ACL) to request this id. | | | | | | | PROD(P,ID) | Produce ID according to predicate P. | PROD(1s, | | | P may be a time period (1s) or an | EDD1) | | | expression (EDD1 > 10). | | | | | | | RPT(ID) | A report identified by ID. | RPT(EDD1) | +-------------+--------------------------------------+--------------+ Table 1: Terminology 8.2.2. Serialized Management This is a nominal configuration of network management where a Manager interacts with a set of Agents. The control flows for this are outlined in Figure 2. Birrane Expires May 3, 2018 [Page 20] Internet-Draft AMA October 2017 Serialized Management Control Flow +----------+ +---------+ +---------+ | Manager | | Agent A | | Agent B | +----+-----+ +----+----+ +----+----+ | | | |-----PROD(1s, EDD1)--->| | (1) |----------------------------PROD(1s, EDD1)-->| | | | | | | |<-------RPT(EDD1)------| | (2) |<----------------------------RPT(EDD1)-------| | | | | | | |<-------RPT(EDD1)------| | |<----------------------------RPT(EDD1)-------| | | | | | | |<-------RPT(EDD1)------| | |<----------------------------RPT(EDD1)-------| | | | In a simple network, a Manager interacts with multiple Agents. Figure 2 In this figure, the Manager configures Agents A and B to produce EDD1 every second in (1). At some point in the future, upon receiving and configuring this message, Agents A and B then build a Report Entry containing EDD1 and send those reports back to the Manager in (2). 8.2.3. Multiplexed Management Networks spanning multiple administrative domains may require multiple Managers (for example, one per domain). When a Manager defines custom Reports/Variables to an Agent, that definition may be tagged with an access control list (ACL) to limit what other Managers will be privy to this information. Managers in such networks should synchronize with those other Managers granted access to their custom data definitions. When Agents generate messages, they MUST only send messages to Managers according to these ACLs, if present. The control flows in this scenario are outlined in Figure 3. Birrane Expires May 3, 2018 [Page 21] Internet-Draft AMA October 2017 Multiplexed Management Control Flow +-----------+ +-------+ +-----------+ | Manager A | | Agent | | Manager B | +-----+-----+ +---+---+ +-----+-----+ | | | |---DEF(A,V1,EDD1*2)-->|<-DEF(B, V2, EDD2*2)--| (1) | | | |---PROD(1s, V1)------>|<---PROD(1s, V2)------| (2) | | | |<--------RPT(V1)------| | (3) | |--------RPT(V2)------>| |<--------RPT(V1)------| | | |--------RPT(V2)------>| | | | | |<---PROD(1s, V1)------| (4) | | | | |---ERR(V1 no perm.)-->| | | | |--DEF(*,V3,EDD3*3)--->| | (5) | | | |---PROD(1s, V3)------>| | (6) | | | | |<----PROD(1s, V3)-----| | | | |<--------RPT(V3)------|--------RPT(V3)------>| (7) |<--------RPT(V1)------| | | |--------RPT(V2)------>| |<-------RPT(V3)-------|--------RPT(V3)------>| |<-------RPT(V1)-------| | | |--------RPT(V2)------>| Complex networks require multiple Managers interfacing with Agents. Figure 3 In more complex networks, any Manager may choose to define custom Reports and Variables, and Agents may need to accept such definitions from multiple Managers. Variable definitions may include an ACL that describes who may query and otherwise understand these definitions. In (1), Manager A defines V1 only for A while Manager B defines V2 only for B. Managers may, then, request the production of Report Entries containing these definitions, as shown in (2). Agents produce different data for different Managers in accordance with configured production rules, as shown in (3). If a Manager requests the production of a custom definition for which the Manager has no permissions, a response consistent with the configured logging policy on the Agent should be implemented, as shown in (4). Alternatively, Birrane Expires May 3, 2018 [Page 22] Internet-Draft AMA October 2017 as shown in (5), a Manager may define custom data with no restrictions allowing all other Managers to request and use this definition. This allows all Managers to request the production of Report Entries containing this definition, shown in (6) and have all Managers receive this and other data going forward, as shown in (7). 8.2.4. Data Fusion In some networks, Agents do not individually transmit their data to a Manager, preferring instead to fuse reporting data with local nodes prior to transmission. This approach reduces the number and size of messages in the network and reduces overall transmission energy expenditure. The AMA supports fusion of NM reports by co-locating Agents and Managers on nodes and offloading fusion activities to the Manager. This process is illustrated in Figure 4. Data Fusion Control Flow +-----------+ +-----------+ +---------+ +---------+ | Manager A | | Manager B | | Agent B | | Agent C | +---+-------+ +-----+-----+ +----+----+ +----+----+ | | | | |--DEF(A,V0,EDD1+AD2)->| | | (1) |--PROD(EDD1&AD2,V0)-->| | | | | | | | |--PROD(1s,EDD1)->| | (2) | |------------------PROD(1s, EDD2)->| | | | | | |<---RPT(EDD1)----| | (3) | |<------------------RPT(EDD2)------| | | | | |<-----RPT(A,V0)-------| | | (4) | | | | Data fusion occurs amongst Managers in the network. Figure 4 In this example, Manager A requires the production of a Variable V0, from node B, as shown in (1). The Manager role understands what data is available from what agents in the subnetwork local to B, understanding that EDD1 is available locally and EDD2 is available remotely. Production messages are produced in (2) and data collected in (3). This allows the Manager at node B to fuse the collected Report Entries into V0 and return it in (4). While a trivial example, the mechanism of associating fusion with the Manager function rather than the Agent function scales with fusion complexity, though it is important to reiterate that Agent and Birrane Expires May 3, 2018 [Page 23] Internet-Draft AMA October 2017 Manager designations are roles, not individual software components. There may be a single software application running on node B implementing both Manager B and Agent B roles. 9. IANA Considerations At this time, this protocol has no fields registered by IANA. 10. Security Considerations Security within an AMA MUST exist in two layers: transport layer security and access control. Transport-layer security addresses the questions of authentication, integrity, and confidentiality associated with the transport of messages between and amongst Managers and Agents in the AMA. This security is applied before any particular Actor in the system receives data and, therefore, is outside of the scope of this document. Finer grain application security is done via ACLs which are defined via configuration messages and implementation specific. 11. Informative References [BIRRANE1] Birrane, E. and R. Cole, "Management of Disruption- Tolerant Networks: A Systems Engineering Approach", 2010. [BIRRANE2] Birrane, E., Burleigh, S., and V. Cerf, "Defining Tolerance: Impacts of Delay and Disruption when Managing Challenged Networks", 2011. [BIRRANE3] Birrane, E. and H. Kruse, "Delay-Tolerant Network Management: The Definition and Exchange of Infrastructure Information in High Delay Environments", 2011. [I-D.irtf-dtnrg-dtnmp] Birrane, E. and V. Ramachandran, "Delay Tolerant Network Management Protocol", draft-irtf-dtnrg-dtnmp-01 (work in progress), December 2014. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, . Birrane Expires May 3, 2018 [Page 24] Internet-Draft AMA October 2017 [RFC3416] Presuhn, R., Ed., "Version 2 of the Protocol Operations for the Simple Network Management Protocol (SNMP)", STD 62, RFC 3416, DOI 10.17487/RFC3416, December 2002, . [RFC4838] Cerf, V., Burleigh, S., Hooke, A., Torgerson, L., Durst, R., Scott, K., Fall, K., and H. Weiss, "Delay-Tolerant Networking Architecture", RFC 4838, April 2007. [RFC6241] Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed., and A. Bierman, Ed., "Network Configuration Protocol (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011, . Author's Address Edward J. Birrane Johns Hopkins Applied Physics Laboratory Email: Edward.Birrane@jhuapl.edu Birrane Expires May 3, 2018 [Page 25]