NovaStar / System Design / Data Node

Introduction

A "data node" is a NovaStar server (node) that is dedicated to providing access to NovaStar data and information products. Data web services, Operator, and other software can provide access to NovaStar data within a private network that is running NovaStar data collection processes. A data node, by contrast, does not run data collection processes and instead focuses on providing public data access, which allows it to be optimized for data publishing.

Data Node Software

The data node software includes a subset of the NovaStar core software. Because the data node is not involved in data collection, data collection programs are disabled on the data node. TriLynx Systems will continue to streamline data node software by removing unnecessary software. The following software components are installed on a data node by TriLynx Systems staff:

PostgreSQL database and NovaStar database
Slony software to replicate the primary node database to the data node (the same version of Slony must be installed on all nodes, but the PostgreSQL versions and operating system versions can be different)
Administrative web services (used by the Administrator)
Administrator for database administration tasks (the new Administrator is recommended and the legacy Administrator is being phased out)
Data web services (used by the Operator, Administrator, and Data Explorer, and for public data access)
Operator for public access
Data Explorer for public access
Apache web server to provide access to NovaStar and other resources
NGINX (optional), which may be implemented to increase performance

Additional software that is needed, for example for custom databases and applications, should be coordinated with TriLynx Systems to ensure that there are no adverse impacts to NovaStar components.

NovaStar Database

The NovaStar database on the data node is replicated from the primary node using Slony software. Most of the NovaStar tables are replicated, with the exceptions being configuration tables that are specific to the server node. The data node is not a backup server and will not assume data collection tasks of the primary node during failover.

Slony replicates by applying, on the data node, the SQL transactions that occur on the primary node. Replication therefore occurs in near real time and results in an exact duplicate, including primary key values and row deletions.

The NovaStar database on the data node can therefore be used to publish data in near real time, with low latency between the data collection system and the data node. Additional latency is introduced by querying the database and formatting output products. The data node can be optimized to reduce this latency, for example by caching web service responses and by generating materialized information products as files that are served by Apache.
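The response caching idea can be illustrated with a minimal Python sketch. This is not the NovaStar caching implementation; the `TTLCache` class, the cache key, and the `fetch` callback are hypothetical names used only to show how a time-to-live cache avoids repeating an expensive query for a short period.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Keep each response for a fixed number of seconds (hypothetical helper)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the cache can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str, fetch: Callable[[], str]) -> str:
        """Return the cached value for key, calling fetch() only when stale."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]
        value = fetch()  # the expensive step, e.g. a database query plus formatting
        self._entries[key] = (now, value)
        return value
```

The time-to-live bounds how stale a published response can be, so it would typically be chosen relative to how often the underlying data changes.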

Because the replicated NovaStar database is a full duplicate, it contains the full historical record, and consequently data tables can be large and slow to process. For this reason, custom databases can be used to create subsets of data that are used for optimized applications. See the next section for more information.

Custom Databases, Users, and Applications

The default data node configuration is similar to a NovaStar core node, with database and operating system users necessary for NovaStar:

The NovaStar database is named novastar and requires users novastar and guest for normal operations.
The service account trilynx is used for command-line operations and system administration.
If built-in NovaStar features are used, for example exporting data, the /usr/ns/cus folder can be used for configuration and output files.

Custom databases can be defined for local data using the following guidelines, making sure to coordinate with TriLynx Systems support staff:

Create an operating system user or service account for the person(s) that will run command line programs and scheduled tasks:
- The user's files will reside in the /home/user folder (where user is the specific user name).
- The user's .pgpass file can be created to provide PostgreSQL credentials for command-line programs.
Create NovaStar database user(s) necessary to run database applications:
- The novastar user should be avoided where possible, in order to limit inadvertent impacts on the NovaStar database.
- Create an administrative account such as mhfd-admin to perform administrative tasks in the custom database.
- The guest user can be used for read-only operations. Additional accounts (e.g., mhfd-guest) can be created.
Create custom database:
- Custom database names should be clearly distinct from NovaStar and built-in PostgreSQL names.
- Custom database schemas, functions, etc. should be isolated from the NovaStar database, with functions, views, etc. defined only in the custom database. Use database schema comments to document tables, views, etc.
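As an illustration of the .pgpass guideline above, the following Python sketch formats entries using the standard PostgreSQL password file layout (`hostname:port:database:username:password`, with `:` and `\` escaped by a backslash) and writes the file with owner-only permissions, which PostgreSQL requires. The function names, the `mhfd` database name, and the password shown are assumptions for illustration only.

```python
import os
from pathlib import Path


def pgpass_line(host, port, database, user, password):
    """Format one .pgpass entry, escaping ':' and '\\' inside each field."""
    def esc(value):
        return str(value).replace("\\", "\\\\").replace(":", "\\:")
    return ":".join(esc(v) for v in (host, port, database, user, password))


def write_pgpass(path, lines):
    """Write the entries and restrict the file to its owner (mode 0600)."""
    path = Path(path)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    os.chmod(path, 0o600)  # PostgreSQL ignores a .pgpass with looser permissions


if __name__ == "__main__":
    # Hypothetical values; print the entry rather than writing a real file.
    print(pgpass_line("localhost", 5432, "mhfd", "mhfd-admin", "change-me"))
```

With such a file in the custom user's home folder, command-line programs and scheduled tasks can connect without embedding passwords in scripts.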

Custom application software can be installed and configured as follows:

Use apt-get to install required Linux software.
Install custom software in the user account or coordinate with TriLynx and install in /opt or other suitable location.
Configure scheduled processes to run via cron using the custom user account.
Use system resources such as temporary files, as needed, with appropriate clean-up.
Coordinate with TriLynx Systems to evaluate how to monitor system performance of custom applications.
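The guidance on temporary files and clean-up can be sketched in Python as follows. This is a minimal example and not a TriLynx-provided utility; `run_with_scratch` and the `custom-job-` prefix are hypothetical names. It gives a scheduled job a scratch folder that is removed when the job finishes, including when the job raises an error.

```python
import tempfile
from pathlib import Path


def run_with_scratch(job):
    """Run job(scratch_dir) in a temporary folder that is always removed."""
    with tempfile.TemporaryDirectory(prefix="custom-job-") as scratch:
        # The context manager deletes the folder and its contents on exit,
        # whether the job returns normally or raises an exception.
        return job(Path(scratch))


if __name__ == "__main__":
    def example_job(scratch_dir):
        work_file = scratch_dir / "intermediate.txt"
        work_file.write_text("temporary data\n", encoding="utf-8")
        return work_file.stat().st_size

    print(run_with_scratch(example_job))
```

A script structured this way can be called from the custom user's cron table without leaving intermediate files behind between runs.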

Documentation for custom databases, users, and applications should be provided to TriLynx Systems for incorporation into the System Dashboard, which will help TriLynx Systems staff during support and updates.

Information Products

Information products can be published on the data node. The following are technical considerations:

Data web services can be accessed as usual:
- Use the API to build URLs that are used by software applications.
- Additional caching, authentication, and throttling features are under development.
- Additional web service endpoints can be added to provide standard products.
The Apache web server can serve content, including materialized information products:
- Folders and files can be created in the data node's /www folder using appropriate sub-folders.
- Landing pages for datasets can be created to facilitate access.
- The Apache configuration can map website folders to other folders on the system, for example if files will be created somewhere other than /www.
- Scheduled processes can be configured to automatically generate files from NovaStar database queries and application output.
New web services can be developed to provide information from custom databases:
- TriLynx Systems can develop services.
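A scheduled process that generates a materialized file for Apache might look like the following Python sketch. It is an illustration under stated assumptions rather than a NovaStar tool: `publish_csv`, the column names, and the output path are hypothetical, and the rows would normally come from a database query. The file is written under a temporary name in the destination folder and then renamed, so that web clients do not see a partially written product.

```python
import csv
import os
import tempfile
from pathlib import Path


def publish_csv(rows, header, target):
    """Write rows to target atomically: temporary file first, then rename."""
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent,
                                    prefix=target.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)
        os.chmod(tmp_name, 0o644)      # mkstemp creates owner-only files
        os.replace(tmp_name, target)   # atomic within the same filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)        # do not leave partial files behind
        raise


if __name__ == "__main__":
    # Hypothetical data and output location, for illustration only.
    with tempfile.TemporaryDirectory() as folder:
        publish_csv([("S1", "2024-01-01T00:00:00", 1.5)],
                    ("station", "time", "value"),
                    Path(folder) / "products" / "latest.csv")
```

On a data node the target would be a sub-folder of /www or another folder that the Apache configuration maps to the website.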

Data Node Operations

The following are typical operations that occur on the data node and require administrative attention.

Resource Monitoring:
- The data node performance will be monitored by TriLynx using Zabbix.
- Situations that are out of standard operating range will be addressed by TriLynx support staff.
- Chronic issues may require additional resources to address.
Database and File Backups:
- Backing up the data node database is not envisioned because it is a replica of the primary node database, which is backed up daily.
- Custom database and application backups should be implemented by coordinating between local system administrators and TriLynx Systems support staff.
System Updates:
- TriLynx Systems staff will coordinate periodic updates, including NovaStar core system updates and updates to specific components.
- Custom databases and applications will be retained during updates. Documentation for custom components will help ensure that resources are retained.
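For the custom database backups mentioned above, one possible approach is a scheduled `pg_dump` of the custom database. The Python sketch below only builds the command; the function name, the `mhfd` database name, and the backup folder are assumptions, and the actual backup strategy should be agreed with TriLynx Systems support staff.

```python
from datetime import date
from pathlib import Path


def pg_dump_command(database, user, backup_dir, day):
    """Build a pg_dump command that writes a dated custom-format archive."""
    outfile = Path(backup_dir) / f"{database}-{day.isoformat()}.dump"
    return [
        "pg_dump",
        "--format=custom",        # compressed archive readable by pg_restore
        f"--username={user}",
        f"--file={outfile}",
        database,
    ]


if __name__ == "__main__":
    # Hypothetical names; print the command instead of running it.
    print(pg_dump_command("mhfd", "mhfd-admin", "/home/mhfd/backups",
                          date.today()))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from a cron-driven script, with credentials supplied by the user's .pgpass file rather than on the command line.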