NovaStar Data Model / Data

NovaStar data records contain raw, scaled, and optionally rated values (e.g., discharge computed from water level).

Overview
Data Flags
Data Best Practices

Overview

Data records (or data reports) form the basis for time series data that are used in data visualizations and analyses.

Each data report for a point includes the following data, which are inserted into the database by the "data filing" process:

point internal identifier - relation to point
report time - time corresponding to the data value, either an instantaneous time for irregular-interval data or, typically, an interval-ending timestamp for regular-interval data; stored internally in UTC but displayed in the database/system local time
raw value, from one of the following sources:
- sensor measurement
- imported value
- back-calculated from the scaled value using the calibration coefficients when a scaled value is filed in the database
scaled value, from one of the following sources:
- calculated from the raw value using the calibration coefficients
- imported value
- result of an equation
NovaScore - integer score that indicates the severity of the condition, based on scaled value:
- typically 1 to 5 for default NovaScore values
- may use other values if more complex scores have been configured
flags - text and digit characters that provide additional information about data values; they are set automatically during data processing and can be edited in the Administrator:
- One or more single-character flags are automatically set during data filing and may be reset manually. See the Data Flags section below.
- Source line number(s) for ALERT ports that provided data for the data report. See the Data Flags section below.
calibration identifier - relates the data report to the calibration used for the raw/scaled value calculation
rating data for up to 5 rated values:
- rating assign identifiers - relationship to rating assignment
- rated value - calculated from scaled value using the rating assignment properties
- rated value flags - similar to data flags but specific to rating table output, one of:
  - O - over the last value in the rating table
  - V - valid (within the range of the rating table)
  - U - under the first value in the rating table
  - What happens when an input value for a rating (e.g., water level) is marked as Q or M - is the rated value null and what is the flag?
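
The fields listed above can be pictured as a simple record type. The following Python sketch is illustrative only: the class and field names, the fixed UTC offset used for display, and the linear form of the calibration (`slope` and `offset`) are assumptions, not the NovaStar schema or its calibration method.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class RatedValue:
    rating_assign_id: int      # relationship to the rating assignment
    value: Optional[float]     # calculated from the scaled value
    flag: str                  # "O", "V", or "U"


@dataclass
class DataReport:
    point_id: int              # relation to the point
    report_time: datetime      # stored in UTC
    raw_value: float
    scaled_value: float
    nova_score: Optional[int] = None
    flags: str = ""            # character flags plus ALERT line number digits
    calibration_id: Optional[int] = None
    rated_values: List[RatedValue] = field(default_factory=list)  # up to 5

    def local_time(self, utc_offset_hours: float) -> datetime:
        """Return the report time shifted to a fixed local UTC offset for display."""
        return self.report_time.astimezone(timezone(timedelta(hours=utc_offset_hours)))


def scale(raw: float, slope: float, offset: float) -> float:
    """Scaled value from a raw value (assumed linear calibration)."""
    return slope * raw + offset


def unscale(scaled: float, slope: float, offset: float) -> float:
    """Raw value back-calculated from a scaled value (inverse of scale)."""
    return (scaled - offset) / slope


# Hypothetical report: 178 raw counts at 0.04 inches per count, filed in UTC.
report = DataReport(
    point_id=101,
    report_time=datetime(2023, 4, 12, 21, 23, 12, tzinfo=timezone.utc),
    raw_value=178.0,
    scaled_value=scale(178.0, slope=0.04, offset=0.0),
    flags="V",
)
print(report.local_time(-6))  # display time for a system six hours behind UTC
```

Keeping the stored time in UTC and converting only for display means daylight saving changes do not alter the stored record.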

Rated values are mainly used to compute derived precipitation data (incremental precipitation, and storm and seasonal precipitation), discharge from water level, and reservoir data computed from water level. Many points will only use raw and scaled values without rated values.
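
As an illustration of how the O, V, and U rated value flags relate to the range of a rating table, the following sketch looks up a scaled value in a small table. The function name `rate`, the table layout, and the use of linear interpolation between table rows are assumptions made for the example. The source does not state what rated value is stored outside the table range, so the sketch returns `None` in those cases.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def rate(scaled: float, table: List[Tuple[float, float]]) -> Tuple[Optional[float], str]:
    """Look up a scaled value in a rating table of (input, output) rows.

    The table must have at least two rows sorted by strictly increasing input.
    Returns (rated value, flag), where the flag is "U", "V", or "O".
    """
    inputs = [x for x, _ in table]
    if scaled < inputs[0]:
        return None, "U"   # under the first value in the rating table
    if scaled > inputs[-1]:
        return None, "O"   # over the last value in the rating table
    i = bisect_right(inputs, scaled)
    if i == len(inputs):   # scaled equals the last table input
        return table[-1][1], "V"
    x0, y0 = table[i - 1]
    x1, y1 = table[i]
    # Linear interpolation between the bracketing rows (an assumption).
    return y0 + (y1 - y0) * (scaled - x0) / (x1 - x0), "V"


# Hypothetical stage (ft) to discharge (cfs) table.
stage_discharge = [(0.0, 0.0), (1.0, 10.0), (2.0, 40.0)]
print(rate(1.5, stage_discharge))
print(rate(2.5, stage_discharge))
```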

Data Flags

Standard data flags for data records are set automatically and in some cases can be set manually based on data review. One or more data flag characters can be concatenated together to comprise the full data flag, in addition to the line number(s).

The V, Q, and B flags may be changed automatically when data are revalidated (e.g., because a calibration changes). Some flags, such as M and E, are "sticky" and remain regardless of whether software is run to update or revalidate data. A data record can be permanently deleted or marked as M to ensure that bad data are not shown in displays. However, it is generally a best practice to keep data in the database and allow V and Q to be set based on data check rules.

The flag includes the source line number(s) for ALERT ports that provided data for the data report:

Line numbers are appended to the character flag(s).
On small systems where line numbers are single digits (each <= 9), the concatenated list of line numbers uniquely defines the source.
On large systems where line numbers can have multiple digits (>= 10), each digit is treated as if it were a single-digit line number, so the source cannot be determined uniquely. This is a known issue in the current NovaStar design.
Multiple line numbers may be included indicating that data were received from multiple travel paths.
Line numbers are not used for data imports or other data loading programs. Line numbers are only used with programs that use the ALERT port.
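
The layout described above (character flags followed by line number digits) can be illustrated with a small parser. This is a sketch for explanation only; the function name `split_flag` is hypothetical and is not part of NovaStar.

```python
from typing import List, Tuple


def split_flag(flag: str) -> Tuple[str, List[int]]:
    """Split a full data flag into character flags and line numbers.

    Each digit is read as a separate single-digit line number, which
    mirrors the known limitation for multi-digit line numbers.
    """
    letters = "".join(ch for ch in flag if ch.isalpha())
    lines = [int(ch) for ch in flag if ch in "0123456789"]
    return letters, lines


print(split_flag("VB12"))
print(split_flag("Q3"))
```

With this reading, a flag such as `V12` is interpreted as lines 1 and 2, even on a large system where the data may have arrived on line 12.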

Data Quality Flags (`V` and `Q`)

A data quality flag is set automatically when data are filed, whether during data collection or import. Only one of the following is set:

V - valid
Q - questionable:
- Data records that are marked with Q are ignored when rated values are computed.
- If data are revalidated, for example because a calibration is changed, data values flagged with Q may be marked as valid (V) due to the new calculated values.
- Questionable data are not shown when viewing only valid data.
- See also the maintenance M flag, which may impact output.

Software may automatically change the flag from V to Q or Q to V as more data are filed. NovaStar generally prefers to treat more data as valid, possibly with alarms, rather than mark large events as questionable and risk failing to alert people about those events.

Alarm Indicator (`A`)

The A flag is automatically set if an alarm is in effect for a point's data. Each report that is filed while the alarm is active will have the A flag. This allows reviewing historical data to see when an alarm was active.

Break Indicator (`B`)

The B flag indicates a break in a data sequence due to a "large" jump in data value or time. This may be caused by an event or a data issue. The B flag is set at the first data record where the anomaly is detected.

The B flag is automatically set during data processing. It can also be set manually; however, any B flags that are manually set may be changed automatically when data are revalidated.

Break Due to an Event with Large Value Change

The following example shows cumulative precipitation data reports where the second value differs from the first by more than the data checking "change interval" (the check interval is 0.16 inches, or four bucket tips, which is a typical check interval). The break check is triggered when three consecutive values confirm that a jump has occurred.

Date/Time	Value	Flag
`4/12/2023 15:23:12`	`7.12`	`V`
`4/12/2023 15:27:28`	`7.45`	`Q`
`4/12/2023 15:28:27`	`7.49`	`Q`

The following are the data after the third report after the jump is processed. Does this impact the computation of rated values? or is it just a visual indicator that there was a jump?

How is the B flag used in reviewing data? Is it useful for data quality review or troubleshooting? Is anything to be done or does this just indicate a high precipitation event? If there are many of these and nearby stations don't have the same, could it indicate a tipping bucket that is hanging up?

Date/Time	Value	Flag
`4/12/2023 15:23:12`	`7.12`	`V`
`4/12/2023 15:27:28`	`7.45`	`VB`
`4/12/2023 15:28:27`	`7.49`	`V`
`4/12/2023 15:35:40`	`7.53`	`V`

Break Due to Large Time Change

The B flag may also be set when the time between reports is larger than the data checking time interval. What is the time interval normally set to, relative to the regular reporting interval? The goal is to make sure that the first increment of rain in a new storm is not ignored. When a time gap occurs, it is assumed that the first value after the gap is a new baseline and the incremental rainfall is calculated as zero. Subsequent increments are calculated normally. To ensure that storms are properly handled, it is recommended to verify that the data check time interval is set to what?

If the data collection system experienced problems, it may be possible to load missing data reports from a log file or other source, and revalidate the data, right?

What program is used to revalidate?

Date/Time	Value	Flag	Increment
`4/12/2023 15:27:28`	`7.45`	`V`	`0.00`
`4/12/2023 15:28:27`	`7.49`	`V`	`0.04`
`4/12/2023 21:35:40`	`7.53`	`VB`	`0.00`
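
The increment behavior shown in the table above can be sketched as follows. The function name `increments` and the one-hour `max_gap` are hypothetical values chosen for illustration; the actual data check time interval is configured in NovaStar. Treating the first report in the series as a zero increment is also an assumption.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def increments(
    reports: List[Tuple[datetime, float]], max_gap: timedelta
) -> List[Tuple[float, bool]]:
    """Return (increment, is_break) for each cumulative precipitation report.

    A report that follows a time gap longer than max_gap is treated as a
    new baseline: its increment is zero and it is marked as a break.
    """
    results = []
    previous = None
    for time, value in reports:
        if previous is None:
            results.append((0.0, False))   # first report: nothing to difference
        elif time - previous[0] > max_gap:
            results.append((0.0, True))    # time gap: new baseline, B flag
        else:
            results.append((round(value - previous[1], 2), False))
        previous = (time, value)
    return results


reports = [
    (datetime(2023, 4, 12, 15, 27, 28), 7.45),
    (datetime(2023, 4, 12, 15, 28, 27), 7.49),
    (datetime(2023, 4, 12, 21, 35, 40), 7.53),
]
for increment, is_break in increments(reports, max_gap=timedelta(hours=1)):
    print(f"{increment:.2f}", "B" if is_break else "")
```

In this sketch the 0.04 inches between the second and third reports is not counted, which is why the choice of time interval matters for the first rainfall of a new storm.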

The B flag is typically set as a single record. However, if multiple sequential records have the flag, rated values would not be computed for any of the sequential records. Do any of the comments that I made above make sense to fix this, such as reloading data or changing the data check time interval?

It is possible to remove the B flag by editing data. However, if the data are revalidated, the B will be inserted again.

Verified Data (`E`)

The E flag indicates that data have been reviewed, for example to confirm a large precipitation value. It is set in addition to the Q and V flags, which are set based on data checks.

Is this used by software in any way that is specific or is it just a visual indicator?

Maintenance Data (`M`)

The M flag indicates that a point is in maintenance mode, for example when a station is being serviced, and is set in the following cases:

Manually set using the Administrator data editor.
If the station (Or point? In other words if the station is out of service then all points are out of service, but individual points can also be marked out of service, right?) is marked as "out of service", all received records will be marked with M.
It is important that maintainers of data collection networks properly set the station "out of service" during maintenance and if necessary edit data to set the maintenance flag for data records so that such data are not treated as valid data.

The following behaviors are related to the M flag:

The M flag can be set in addition to the Q flag.
The M flag should not be set with the V flag because maintenance data are not considered valid observations. Software enforces this behavior, and anyone manipulating data manually using SQL should take care not to violate this standard.
Data records that are marked with M are ignored when rated values are computed (e.g., precipitation increment, storm, and season total will not include the record, what about discharge calculation? is it a null value and how is the rated value flagged?).
If data are revalidated, for example because a calibration has changed, the M flag will remain, even if the Q flag is changed due to new values. Therefore, the rated values will ignore input flagged with M. Is this true?
Maintenance data are not shown by default by reporting programs and data web services.
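
The flag rules described in this section can be summarized as simple checks on the character portion of a data flag. This is a sketch of the stated rules only; the function names are hypothetical and the checks do not reproduce NovaStar's actual validation logic.

```python
def is_allowed_combination(flags: str) -> bool:
    """M must not be combined with V; M together with Q is allowed."""
    return not ("M" in flags and "V" in flags)


def used_for_rating(flags: str) -> bool:
    """Records flagged Q or M are ignored when rated values are computed."""
    return "Q" not in flags and "M" not in flags


def shown_by_default(flags: str) -> bool:
    """Maintenance data are hidden by default in reports and web services."""
    return "M" not in flags


for flags in ("V", "Q", "QM", "VM"):
    print(flags, is_allowed_combination(flags), used_for_rating(flags), shown_by_default(flags))
```

A check like `is_allowed_combination` could be run before any manual SQL update to avoid storing a flag that combines M with V.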

Data Best Practices

A NovaStar system will perform optimally if data records are of high quality. In the past, data collected using the legacy ALERT protocol could include a significant amount of questionable data, especially during events when many transmissions occurred at the same time. Implementing the ALERT2 protocol typically results in very little questionable data.

The following are recommended best practices:

Review and correct questionable data as soon as possible:
- Out of range values are typically flagged as questionable and therefore remain in the database.
- High values due to an event may need to be changed from questionable to valid, based on a review of neighboring data and other data sources.
- Use the verified (E) data flag to indicate that questionable values that may otherwise be valid have been reviewed.
- Rerun calibration and rating computations to ensure that derived data are consistent.
- Out of range data values due to maintenance should be flagged with M.
Identify and address the root cause of questionable data as soon as possible, for example:
- Replace faulty hardware.
- Fix calibrations.
- Fix ratings.
Review break (B) flags to fine-tune data check criteria:
- How are breaks reviewed?
- What check values should be set?
- Should I add checks in the proposed checkDataReports web service to help with this?
Properly handle maintenance data:
- Set stations to "out of service" status while performing maintenance.
- Review data records to ensure that data recorded during maintenance are marked with the M flag.
Implement data quality review information products. Contact TriLynx Systems to implement workflows for system dashboard products.
