NovaStar Program Reference / Data Collection / nspollusgswaterdata


Overview

The nspollusgswaterdata program reads data from the United States Geological Survey (USGS) National Water Information System (NWIS) web service and loads the data into the NovaStar database. The program is typically run in one of the following modes to ingest data into NovaStar:

  • Polling run mode: Configure stations for the nsautointer (station automatic interrogation) service, which polls station data at defined intervals and loads data from the previous station poll time to the current time. In this case, the nsautointer program automatically runs the nspollusgswaterdata program with the --nsautointer command line parameter and other parameters that control the polling. Polling provides more granular control of data ingestion and ensures that data are still loaded after a data source has been unavailable for a period of time.
  • Data import run mode (available as of NovaStar 5.3.2.0): Run the program without --nsautointer (the default) to load data for the specified period without using the station last polled information. This is a simpler data ingestion approach but can lead to data gaps if a data source is unavailable for longer than the import period.
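
The difference between the two modes can be illustrated with a little date arithmetic. The following is a minimal sketch, not the program's actual logic: the function names, the 5-hour outage, and the 2-hour import period are assumptions chosen only for illustration.

```python
from datetime import datetime, timedelta

def polling_start(last_polled):
    # Polling run mode: resume from the station's last successful poll.
    return last_polled

def import_start(now, import_period):
    # Data import run mode: always look back a fixed period from the current time.
    return now - import_period

now = datetime(2022, 11, 10, 12, 0)
last_success = now - timedelta(hours=5)  # hypothetical 5-hour outage
period = timedelta(hours=2)              # hypothetical import period

# Data between the last successful load and the import start are never requested.
gap = max(import_start(now, period) - last_success, timedelta(0))
print("polling resumes at", polling_start(last_success))
print("import starts at", import_start(now, period), "leaving a gap of", gap)
```

In this sketch polling requests the full outage period, whereas the import run skips the part of the outage that is older than the import period.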

See the Data Collection overview for background on data collection.

USGS NWIS Web Services

The station data are queried from USGS web services by calling the wget program for each station. Data are currently read in the RDB (tab-delimited) format using a URL similar to the following.

https://nwis.waterservices.usgs.gov/nwis/iv/?parameterCd=00060,00065&format=rdb&startDT=2020-01-01T00:00-0700&endDT=2020-01-03T12:00-0700&sites=06752260
  • Multiple parameters for a station can be queried in one request by separating the parameter codes with commas, as in the URL above. Therefore, one web service request is made per station rather than one per parameter.
  • Times in the downloaded file are assumed to be local to the station, so the station must be in the same local time zone as the NovaStar base station.
  • The station remote tag must match the USGS site identifier, including any leading zeros used for the site.
  • The USGS code is currently not filed in the NovaStar database flags.
  • If the requested period does not include data for a parameter, the corresponding data columns are not returned and no data will be loaded. Parameters for a station may be added or discontinued over time.
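
The request URL shown above can be assembled from its parts. The following is a minimal sketch using the Python standard library; the function name build_iv_url and its arguments are hypothetical and are not part of nspollusgswaterdata, which calls wget itself.

```python
from urllib.parse import urlencode

def build_iv_url(root, site, parameter_codes, start_dt, end_dt):
    """Build an NWIS instantaneous values request for one site in RDB format."""
    query = {
        "parameterCd": ",".join(parameter_codes),  # all parameters in one request
        "format": "rdb",
        "startDT": start_dt,
        "endDT": end_dt,
        "sites": site,  # keep as text so leading zeros are preserved
    }
    # Leave commas and colons unescaped so the result matches the URL form shown above.
    return root.rstrip("/") + "/?" + urlencode(query, safe=",:")

print(build_iv_url(
    "https://nwis.waterservices.usgs.gov/nwis/iv",
    "06752260",
    ["00060", "00065"],
    "2020-01-01T00:00-0700",
    "2020-01-03T12:00-0700",
))
```

Passing the site identifier as a string rather than a number avoids losing the leading zero.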

The output is similar to the sample shown below, where tabs separate the columns. Each download is saved in a file with a name similar to /tmp/nspollusgswaterdata/nspollusgswaterdata-20221110094645-842.txt, where the filename includes the time to the second and the process identifier. The file named /tmp/nspollusgswaterdata/nspollusgswaterdata-20221110094645-842-wget.txt contains the wget program log messages.

The nspollusgswaterdata program parses the file into data reports that are filed in the NovaStar database. This process performs data quality checks and will check for alarm conditions on the scaled value for the loaded point, if alarm triggers are defined for the associated point.

# ---------------------------------- WARNING ----------------------------------------
# Some of the data that you have obtained from this U.S. Geological Survey database may not 
# have received Director's approval.  Any such data values are qualified as provisional and 
# are subject to revision.  Provisional data are released on the condition that neither the 
# USGS nor the United States Government may be held liable for any damages resulting from its use.
#  Go to http://help.waterdata.usgs.gov/policies/provisional-data-statement for more information.
#
# File-format description:  http://help.waterdata.usgs.gov/faq/about-tab-delimited-output
# Automated-retrieval info: http://help.waterdata.usgs.gov/faq/automated-retrievals
#
# Contact:   gs-w_support_nwisweb@usgs.gov
# retrieved: 2022-11-14 01:08:42 -05:00 (nadww01)
#
# Data for the following 1 site(s) are contained in this file
#    USGS 06752260 CACHE LA POUDRE RIVER AT FORT COLLINS, CO
# -----------------------------------------------------------------------------------
#
# TS_ID - An internal number representing a time series.
#
# Data provided for site 06752260
#    TS_ID       Parameter Description
#    211058      00060     Discharge, cubic feet per second
#    281030      00065     Gage height, feet, [active]
#
# Data-value qualification codes included in this output:
#     A  Approved for publication -- Processing and review completed.
#     R  Records for these data have been revised.
#     e  Value has been estimated.
#
agency_cd   site_no datetime    tz_cd   211058_00060    211058_00060_cd 281030_00065    281030_00065_cd
5s  15s 20d 6s  14n 10s 14n 10s
USGS    06752260    2022-01-01 00:00    MST 37.7    A:R 0.95    A:R
USGS    06752260    2022-01-01 00:05    MST 37.7    A:R 0.95    A:R
USGS    06752260    2022-01-01 00:10    MST 37.7    A:R 0.95    A:R
USGS    06752260    2022-01-01 00:15    MST 37.7    A:R 0.95    A:R
USGS    06752260    2022-01-01 00:20    MST 37.7    A:R 0.95    A:R
USGS    06752260    2022-01-01 00:25    MST 36.8    A:R 0.94    A:R
USGS    06752260    2022-01-01 00:30    MST 36.8    A:R 0.94    A:R
USGS    06752260    2022-01-01 00:35    MST 36.8    A:R 0.94    A:R
USGS    06752260    2022-01-01 00:40    MST 36.8    A:R 0.94    A:R
USGS    06752260    2022-01-01 00:45    MST 36.8    A:R 0.94    A:R
USGS    06752260    2022-01-01 00:50    MST 37.7    A:R 0.95    A:R
USGS    06752260    2022-01-01 00:55    MST 37.7    A:R 0.95    A:R
USGS    06752260    2022-01-01 01:00    MST 37.7    A:R 0.95    A:R
USGS    06752260    2022-01-01 01:05    MST 37.7    A:R 0.95    A:R
...omitted...
USGS    06752260    2022-01-03 10:55    MST 61.3    A:R 1.18    A:R
USGS    06752260    2022-01-03 11:00    MST 61.3    A:R 1.18    A:R
USGS    06752260    2022-01-03 11:05    MST 60.2    A:R 1.17    A:R
USGS    06752260    2022-01-03 11:10    MST 60.2    A:R 1.17    A:R
USGS    06752260    2022-01-03 11:15    MST 60.2    A:R 1.17    A:R
USGS    06752260    2022-01-03 11:20    MST 60.2    A:R 1.17    A:R
USGS    06752260    2022-01-03 11:25    MST 60.1    A:R 1.17    A:R
USGS    06752260    2022-01-03 11:30    MST 60.1    A:R 1.17    A:R
USGS    06752260    2022-01-03 11:35    MST 60.1    A:R 1.17    A:R
USGS    06752260    2022-01-03 11:40    MST 60.1    A:R 1.17    A:R
USGS    06752260    2022-01-03 11:45    MST 60.1    A:R 1.17    A:R
USGS    06752260    2022-01-03 11:50    MST 60.1    A:R 1.17    A:R
USGS    06752260    2022-01-03 11:55    MST 60.1    A:R 1.17    A:R
USGS    06752260    2022-01-03 12:00    MST 60.1    A:R 1.17    A:R
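
The layout above (comment lines starting with #, a column name line, a column format line, then tab-delimited rows) can be read with a short parser. The following is a minimal sketch and not the parser used by nspollusgswaterdata; the function name parse_rdb and the returned tuple layout are assumptions made for illustration.

```python
import csv
import io

def parse_rdb(text):
    """Return (site_no, datetime, tz_cd, parameter_code, value, qualifiers) tuples."""
    # Drop blank lines and '#' comment lines.
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    if len(lines) < 2:
        return []
    reader = csv.reader(io.StringIO("\n".join(lines)), delimiter="\t")
    header = next(reader)  # column names, e.g. 211058_00060
    next(reader)           # column format codes, e.g. 5s 15s 20d (ignored here)
    records = []
    for row in reader:
        fields = dict(zip(header, row))
        for name in header:
            # Value columns are named <TS_ID>_<parameter>; qualifier columns add _cd.
            parts = name.split("_")
            if len(parts) != 2 or not parts[0].isdigit():
                continue
            try:
                value = float(fields.get(name, ""))
            except ValueError:
                continue  # empty or non-numeric value for this time step
            records.append((
                fields.get("site_no"),
                fields.get("datetime"),
                fields.get("tz_cd"),
                parts[1],
                value,
                fields.get(name + "_cd", ""),
            ))
    return records
```

Each value column yields one record per row, so a row with discharge and gage height produces two records that share the same site and time.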

Configure for Polling Run Mode

Configure the NovaStar system as follows to use the nspollusgswaterdata program for automatic polling:

  1. Enable automatic interrogation using the Administrator:
    1. Confirm that automatic interrogation is enabled. See the nsautointer documentation.
  2. If not defined, configure a station type using the Administrator. The configuration data values are used by default if not defined for each station.
    1. Verify that a station type of USGS waterdata is available.
    2. Specify station type Identification parameters:
      • Protocol - set to Web Page Data Parameter. This tells the Administrator which edit page to display for the station.
    3. Specify station type Connect parameters:
      • Poll line - should be unique among all automatic interrogation programs so that the source of data can be uniquely identified. The line number is shown in data report data flags.
      • Path - should be the USGS web services root address (e.g., https://nwis.waterservices.usgs.gov/nwis/iv). See the NWIS web page for more information.
      • Timeout - set to a reasonably short time in seconds to query data. If too long, the overall automatic interrogation cycle will take too long and unfinished processes will accumulate. If too short, queries will not complete and multiple retry attempts will result. For example, use 60 seconds. If a request fails, the station last poll time is not updated, so each subsequent poll requests a progressively longer period.
      • Attempts - set to the maximum number of attempts to request data from web services.
    4. Specify Sampling parameters:
      • Interval - not used with USGS data polling
      • Time limit - limit polling to the query end time minus this time limit, used to ensure that long periods are not re-queried repeatedly for stations with configuration issues
      • Time offset - interval that is applied to data reports to offset the time, for example to shift the time zone
    5. Specify Polling parameters:
      • Poll command - specify as nspollusgswaterdata to run this program. Additional command parameters can be specified after the program name if necessary, for example if not available in any other Administrator editor inputs.
  3. Add/configure stations using the Administrator for each station that will receive its data from the USGS web service:
    1. If a station has not been added, use the Station List / Add button to add the station.
    2. Specify Identification parameters:
      1. Specify the Remote Tag as the USGS site identifier. This is stored in the database as text so include leading zeros.
      2. Select a Type of USGS Waterdata, corresponding to the station type.
    3. Specify Connect parameters:
      • Poll line - should be unique among all automatic interrogation programs so that the source of data can be uniquely identified.
      • Receive Lines - not used
      • Path - should be the USGS web services root address (e.g., https://nwis.waterservices.usgs.gov/nwis/iv). See the NWIS web page for more information. Leave blank to use the station type value.
      • Timeout - set to a reasonably short time in seconds to query data. If too long, the overall automatic interrogation cycle will take too long and unfinished processes will accumulate. If too short, queries will not complete and multiple retries will result. For example, use 60 seconds. If a request fails, the station last poll time is not updated, so each subsequent poll requests a progressively longer period.
      • Attempts - set to the maximum number of attempts to request data from web services.
      • Remote login - not used for USGS data polling
      • Password - not used for USGS data polling
    4. Specify Sampling parameters:
      • Interval - not used with USGS data polling
      • Time limit - limit polling to the query end time minus this time limit, used to ensure that long periods are not re-queried repeatedly for stations with configuration issues
      • Time offset - interval that is applied to data reports to offset the time, for example to shift the time zone
    5. Specify Polling parameters:
      • Last polled - time of the last successful poll. This is set automatically while polling is running and can be edited to manually reset the time, for example after restoring data from a manual load.
      • Interval - interval to wait between station polling
      • Base time - base time relative to midnight to offset the cumulative interval, used to stagger station polling time schedules
      • Order - relative order for all stations that are polled at the same time
  4. Add/configure points using the Administrator:
    1. If a point has not been added, use the Point List / Add button to add the point for the station.
    2. Configure point properties as appropriate. For example, the data units should be set consistent with the polled data parameters.
    3. Verify that Identification values that are used by this polling program are specified:
      • Data position - not used (prior to NovaStar 5.3.2.0, specify as 1 for first parameter in file, 2 for second parameter in file, etc.)
      • Parameter - specify the USGS parameter code corresponding to the point's data type (e.g., 60 corresponds to USGS 00060 for discharge, and 65 corresponds to USGS 00065 for water level). NovaStar stores the parameter as an integer so do not include leading zeros.
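
Because the point Parameter is stored as an integer while USGS parameter codes are five-digit text, the value has to be zero-padded before it can be compared with USGS output. The following is a minimal sketch of that conversion and of locating the matching value column in an RDB header; both function names are hypothetical, and how the program performs the match internally is not documented here.

```python
def usgs_parameter_code(parameter):
    """Convert the integer stored in NovaStar (e.g. 60) to the USGS code (e.g. '00060')."""
    return f"{int(parameter):05d}"

def find_value_column(header, parameter):
    """Return the RDB value column name (<TS_ID>_<code>) for a point parameter, or None."""
    code = usgs_parameter_code(parameter)
    for name in header:
        parts = name.split("_")
        # Exactly two parts excludes the qualifier columns, which end in _cd.
        if len(parts) == 2 and parts[1] == code:
            return name
    return None

header = ["agency_cd", "site_no", "datetime", "tz_cd",
          "211058_00060", "211058_00060_cd", "281030_00065", "281030_00065_cd"]
print(usgs_parameter_code(60))
print(find_value_column(header, 65))
```

Matching on the parameter code rather than on column position is consistent with the Data position setting no longer being used as of NovaStar 5.3.2.0.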

Configure for Scheduled Data Import Run Mode

To configure the program for data import, configure the station and point as if polling is used. This provides the USGS station remote identifier (for the station), and the USGS parameter (for the points).

Then use the Administrator to configure a scheduled process that calls the nspollusgswaterdata program with the desired command line.

  • Do not use the --nsautointer command parameter (default is to not use).
  • Use the -s parameter to indicate station numerical identifier(s) to query from the USGS web services.
  • Because station last poll time is ignored for a data import run, use the -tp command line parameter to specify the time offset from current time to read data.
  • Do not use the -ts or -te parameters since the data period is relative to the current time.

Run One-time Data Import

If a one-time data import is required, for example to populate a historical period or period when the system was down:

  1. Configure the station and point as described for polling.
  2. Run the program from the command line.
  3. Use the -s parameter to indicate station numerical identifier(s) to query from the USGS web services.
  4. Use the -ts and -te parameters to specify the period to load data.
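
The steps above can be sketched as a small helper that assembles the command line. This is an illustration only: the station identifiers 101 and 102 and the function name are hypothetical, and the time format follows the MM/DD/YY-hh:mm:ss form listed for -ts and -te under Command Line Usage.

```python
from datetime import datetime
import shlex

TIME_FORMAT = "%m/%d/%y-%H:%M:%S"  # MM/DD/YY-hh:mm:ss

def one_time_import_command(station_ids, start, end):
    """Return the argument list for a one-time import of the given period."""
    return [
        "nspollusgswaterdata",
        "-s", ",".join(str(s) for s in station_ids),
        "-ts", start.strftime(TIME_FORMAT),
        "-te", end.strftime(TIME_FORMAT),
    ]

cmd = one_time_import_command(
    [101, 102],  # hypothetical station numerical identifiers
    datetime(2022, 1, 1, 0, 0, 0),
    datetime(2022, 1, 3, 12, 0, 0),
)
print(shlex.join(cmd))
```

Adding -f to the list would allow the retrieval to be tested without filing the results in the database.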

Command Line Usage

The command line syntax is:

nspollusgswaterdata [-s or -i or -k] [parameters]

Optional values are shown in square brackets. Command line parameters are as follows.

nspollusgswaterdata Command Line Parameters

Each entry lists the parameter, its description, and its default (if any).

  • -s StationNumId1[,StationNumId2,...]
    Description: Station numerical identifiers to poll, separated by commas.
    Default: Specify -s, -i, or -k.
  • -i PointNumId1[,PointNumId2,...]
    Description: Point numerical identifiers to poll, separated by commas.
    Default: Specify -s, -i, or -k.
  • -k RemoteTag1[,RemoteTag2,...]
    Description: Station remote tags to poll, separated by commas.
    Default: Specify -s, -i, or -k.
  • -d #
    Description: Debug display level.
    Default: 0
  • -f
    Description: Do NOT file interrogation results in database, useful to test data retrieval without filing.
    Default: File data.
  • -h, -help, --help
    Description: Display help.
  • --nsautointer
    Description: This parameter was added for NovaStar 5.3.2.0 to allow running the program in polling or data import mode. Run the program consistent with nsautointer polling. Database station and point data provide polling configuration. The station last poll time will be updated if successful. This parameter is provided by the nsautointer program.
    Default: Run in data import mode.
  • -q, -quiet, --quiet
    Description: Do not display activity messages.
    Default: Display messages.
  • -S #
    Description: Source number for data report flags.
    Default: 1
  • -t #
    Description: Poll data type from nsautointer program, automatically set and cannot be changed.
    Default: 4 - logged data between start and stop times
  • -tp PollInterval
    Description: The interval offset to determine the data query start time (from the nsautointer -tp parameter):
      • With --nsautointer: offset from the start time
      • Without --nsautointer: offset from the default ending time. The -te and -ts parameters take precedence.
    Default: 2 hours before the current time.
  • -ts MM/DD/YY-hh:mm:ss
    Description: Request data starting at this time (overrides the -tp offset from current time).
    Default: 2 hours before the current time.
  • -te MM/DD/YY-hh:mm:ss
    Description: Request data ending at this time.
    Default: Current time.
  • -v, -version, --version
    Description: Display the program version.

Possible future enhancements:

  • --points PointNumId1:param1,PointNumId2:param2
    Description: Possible future enhancement. Specify the map of point numerical identifier to USGS parameter, used with scheduled and one-time data loads so that the parameter does not need to be defined in the database.
    Default: Parameter must be defined in the database for each point with USGS data.
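
The defaults for -tp, -ts, and -te when running without --nsautointer can be read as a simple precedence rule. The following is a minimal sketch of one reading of the parameter descriptions, not the program's source: the function name is hypothetical, and treating the -tp offset as relative to the current time (the default ending time) is an assumption.

```python
from datetime import datetime, timedelta

DEFAULT_LOOKBACK = timedelta(hours=2)  # documented default: 2 hours before the current time

def resolve_import_period(now, ts=None, te=None, tp=None):
    """Return (start, end) for a data import run (no --nsautointer)."""
    # -tp is an offset back from the default ending time, taken here as the current time.
    lookback = tp if tp is not None else DEFAULT_LOOKBACK
    # Explicit -ts and -te values take precedence over the computed defaults.
    start = ts if ts is not None else now - lookback
    end = te if te is not None else now
    return start, end

now = datetime(2022, 11, 10, 12, 0)
print(resolve_import_period(now))                          # defaults only
print(resolve_import_period(now, tp=timedelta(hours=6)))   # scheduled import with -tp
print(resolve_import_period(now, ts=datetime(2022, 1, 1),
                            te=datetime(2022, 1, 3, 12)))  # one-time import
```

A scheduled import would normally pass only the -tp equivalent, while a one-time import would pass both explicit times.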

Examples

Scheduling

Scheduling is handled by the nsautointer service program when running in polling run mode, or by the scheduler when running in data import run mode.

NovaStar Administrator Interface

See the Configure for Polling Run Mode section for information about using the Administrator to enable USGS station polling.

Troubleshooting