NovaStar Program Reference / Data Collection / nspollusgswaterdata
Overview
The nspollusgswaterdata
program reads data from the United States Geological Survey (USGS)
National Water Information System (NWIS) web service and loads the data into the NovaStar database.
The program is typically run in one of the following modes to ingest data into NovaStar:
- Polling run mode:
Configure stations for the nsautointer (station automatic interrogation) service, which polls station data at defined intervals and loads data from the previous station poll time to the current time. In this case, the nsautointer program automatically runs the nspollusgswaterdata program with the --nsautointer command line parameter and other parameters to control the polling. Polling provides more granular control of data ingestion and ensures that data are loaded if a data source is unavailable for a period of time.
- Data import run mode (available as of NovaStar 5.3.2.0):
Run the program without --nsautointer (the default) to load data for the specified period without using the station last polled information. This is a simpler data ingestion approach but can lead to data gaps if a data source is unavailable for longer than the import period.
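The difference between the two run modes can be illustrated with a short Python sketch of how the query period might be chosen. This is only an illustration under stated assumptions, not the program's actual logic: the function name `query_window` and its arguments are hypothetical, the 2 hour default is taken from the -tp default in the command line table, and the handling of the Time limit sampling parameter is one plausible reading of its description in the configuration section.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def query_window(now: datetime,
                 last_polled: Optional[datetime] = None,
                 import_offset: timedelta = timedelta(hours=2),
                 time_limit: Optional[timedelta] = None) -> Tuple[datetime, datetime]:
    """Return (start, end) of the period to request (hypothetical helper)."""
    end = now
    if last_polled is not None:
        # Polling run mode: resume from the previous successful poll.
        start = last_polled
        if time_limit is not None:
            # Assumed reading of "Time limit": never reach back further
            # than the query end time minus the time limit.
            start = max(start, end - time_limit)
    else:
        # Data import run mode: fixed offset before the current time,
        # station last poll time is ignored.
        start = end - import_offset
    return start, end
```

With these assumptions, a source that was unavailable for six hours would be requeried from the last successful poll in polling run mode, whereas a data import run with a 2 hour offset would leave a four hour gap.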
See the Data Collection overview for background on data collection.
USGS NWIS Web Services
The station data are queried from USGS web services by calling the wget
program for each station.
Currently the RDB (tab-delimited) format is used to read data by
using a URL similar to the following.
https://nwis.waterservices.usgs.gov/nwis/iv/?parameterCd=00060,00065&format=rdb&startDT=2020-01-01T00:00-0700&endDT=2020-01-03T12:00-0700&sites=06752260
- Multiple parameters for a station can be queried in a single request by separating parameter codes with commas. Because a request is made for each station, polling multiple stations results in multiple web service requests.
- Times in the downloaded file are assumed to be in the station's local time zone, so the station must be in the same local time zone as the NovaStar base station.
- The station remote tag must match the USGS site identifier, including leading zero if used for the site.
- The USGS code is currently not filed in the NovaStar database flags.
- If the requested period does not include data for a parameter, the corresponding data columns are not returned and no data will be loaded. Parameters for a station may be added or discontinued over time.
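As an illustration of the request format, the following Python sketch assembles a URL with the same query parameters as the example above. The function name `build_nwis_url` is hypothetical and the sketch is not the program's implementation, which calls wget; it only shows how the parameter codes, period, and site identifier fit into the URL.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def build_nwis_url(root, site, parameter_codes, start, end):
    """Build an instantaneous-values RDB request URL (hypothetical helper).

    root            -- web services root address (the station "Path" setting)
    site            -- USGS site identifier as text, leading zeros included
    parameter_codes -- USGS parameter codes as 5-character strings
    start, end      -- timezone-aware datetimes for the query period
    """
    query = {
        "parameterCd": ",".join(parameter_codes),
        "format": "rdb",
        "startDT": start.strftime("%Y-%m-%dT%H:%M%z"),
        "endDT": end.strftime("%Y-%m-%dT%H:%M%z"),
        "sites": site,
    }
    # Keep commas and colons readable, as in the documented example URL.
    return root.rstrip("/") + "/?" + urlencode(query, safe=",:")


MST = timezone(timedelta(hours=-7))
EXAMPLE_URL = build_nwis_url(
    "https://nwis.waterservices.usgs.gov/nwis/iv",
    "06752260",
    ["00060", "00065"],
    datetime(2020, 1, 1, 0, 0, tzinfo=MST),
    datetime(2020, 1, 3, 12, 0, tzinfo=MST),
)
```

The site identifier is passed as text so that the leading zero is preserved, consistent with the remote tag requirement noted in the list above.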
Each download is saved in a file with a name similar to /tmp/nspollusgswaterdata/nspollusgswaterdata-20221110094645-842.txt, where the filename includes the time to seconds and the process identifier.
The file named /tmp/nspollusgswaterdata/nspollusgswaterdata-20221110094645-842-wget.txt contains the wget program log messages.
The nspollusgswaterdata program parses the file into data reports that are filed in the NovaStar database.
This process performs data quality checks and checks for alarm conditions on the scaled value for the loaded point, if alarm triggers are defined for the associated point.
The downloaded file is similar to the following, where tabs are used to separate columns.
# ---------------------------------- WARNING ----------------------------------------
# Some of the data that you have obtained from this U.S. Geological Survey database may not
# have received Director's approval. Any such data values are qualified as provisional and
# are subject to revision. Provisional data are released on the condition that neither the
# USGS nor the United States Government may be held liable for any damages resulting from its use.
# Go to http://help.waterdata.usgs.gov/policies/provisional-data-statement for more information.
#
# File-format description: http://help.waterdata.usgs.gov/faq/about-tab-delimited-output
# Automated-retrieval info: http://help.waterdata.usgs.gov/faq/automated-retrievals
#
# Contact: gs-w_support_nwisweb@usgs.gov
# retrieved: 2022-11-14 01:08:42 -05:00 (nadww01)
#
# Data for the following 1 site(s) are contained in this file
# USGS 06752260 CACHE LA POUDRE RIVER AT FORT COLLINS, CO
# -----------------------------------------------------------------------------------
#
# TS_ID - An internal number representing a time series.
#
# Data provided for site 06752260
# TS_ID Parameter Description
# 211058 00060 Discharge, cubic feet per second
# 281030 00065 Gage height, feet, [active]
#
# Data-value qualification codes included in this output:
# A Approved for publication -- Processing and review completed.
# R Records for these data have been revised.
# e Value has been estimated.
#
agency_cd site_no datetime tz_cd 211058_00060 211058_00060_cd 281030_00065 281030_00065_cd
5s 15s 20d 6s 14n 10s 14n 10s
USGS 06752260 2022-01-01 00:00 MST 37.7 A:R 0.95 A:R
USGS 06752260 2022-01-01 00:05 MST 37.7 A:R 0.95 A:R
USGS 06752260 2022-01-01 00:10 MST 37.7 A:R 0.95 A:R
USGS 06752260 2022-01-01 00:15 MST 37.7 A:R 0.95 A:R
USGS 06752260 2022-01-01 00:20 MST 37.7 A:R 0.95 A:R
USGS 06752260 2022-01-01 00:25 MST 36.8 A:R 0.94 A:R
USGS 06752260 2022-01-01 00:30 MST 36.8 A:R 0.94 A:R
USGS 06752260 2022-01-01 00:35 MST 36.8 A:R 0.94 A:R
USGS 06752260 2022-01-01 00:40 MST 36.8 A:R 0.94 A:R
USGS 06752260 2022-01-01 00:45 MST 36.8 A:R 0.94 A:R
USGS 06752260 2022-01-01 00:50 MST 37.7 A:R 0.95 A:R
USGS 06752260 2022-01-01 00:55 MST 37.7 A:R 0.95 A:R
USGS 06752260 2022-01-01 01:00 MST 37.7 A:R 0.95 A:R
USGS 06752260 2022-01-01 01:05 MST 37.7 A:R 0.95 A:R
...omitted...
USGS 06752260 2022-01-03 10:55 MST 61.3 A:R 1.18 A:R
USGS 06752260 2022-01-03 11:00 MST 61.3 A:R 1.18 A:R
USGS 06752260 2022-01-03 11:05 MST 60.2 A:R 1.17 A:R
USGS 06752260 2022-01-03 11:10 MST 60.2 A:R 1.17 A:R
USGS 06752260 2022-01-03 11:15 MST 60.2 A:R 1.17 A:R
USGS 06752260 2022-01-03 11:20 MST 60.2 A:R 1.17 A:R
USGS 06752260 2022-01-03 11:25 MST 60.1 A:R 1.17 A:R
USGS 06752260 2022-01-03 11:30 MST 60.1 A:R 1.17 A:R
USGS 06752260 2022-01-03 11:35 MST 60.1 A:R 1.17 A:R
USGS 06752260 2022-01-03 11:40 MST 60.1 A:R 1.17 A:R
USGS 06752260 2022-01-03 11:45 MST 60.1 A:R 1.17 A:R
USGS 06752260 2022-01-03 11:50 MST 60.1 A:R 1.17 A:R
USGS 06752260 2022-01-03 11:55 MST 60.1 A:R 1.17 A:R
USGS 06752260 2022-01-03 12:00 MST 60.1 A:R 1.17 A:R
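The structure of the RDB file shown above can be read with a few lines of Python. The following is a minimal sketch, assuming the layout in the example (comment lines starting with #, a tab-delimited header line, a column format line, then data rows); the function name `parse_rdb` is hypothetical and this is not the parser used by the program.

```python
def parse_rdb(text):
    """Return (column_names, rows) from RDB text; rows are dictionaries."""
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    if len(lines) < 2:
        # No header and format lines, so there are no data to load.
        return [], []
    columns = lines[0].split("\t")
    # lines[1] holds the column format codes (5s, 15s, ...) and is skipped.
    rows = [dict(zip(columns, ln.split("\t"))) for ln in lines[2:]]
    return columns, rows


# Shortened copy of the example above, rebuilt with explicit tab characters.
SAMPLE = "\n".join([
    "# Data for the following 1 site(s) are contained in this file",
    "\t".join(["agency_cd", "site_no", "datetime", "tz_cd",
               "211058_00060", "211058_00060_cd",
               "281030_00065", "281030_00065_cd"]),
    "\t".join(["5s", "15s", "20d", "6s", "14n", "10s", "14n", "10s"]),
    "\t".join(["USGS", "06752260", "2022-01-01 00:00", "MST",
               "37.7", "A:R", "0.95", "A:R"]),
    "\t".join(["USGS", "06752260", "2022-01-01 00:25", "MST",
               "36.8", "A:R", "0.94", "A:R"]),
])

columns, rows = parse_rdb(SAMPLE)
```

Values are kept as text in this sketch; converting them to numbers and applying the station time zone would be separate steps.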
Configure for Polling Run Mode
Configure the NovaStar system as follows to use the nspollusgswaterdata
program for automatic polling:
- Enable automatic interrogation using the Administrator:
- Confirm that automatic interrogation is enabled. See the nsautointer documentation.
- If not defined, configure a station type using the Administrator.
The configuration data values are used by default if not defined for each station.
- Verify that a station type of USGS waterdata is available.
- Specify station type Identification parameters:
- Protocol - is Web Page Data Parameter. This indicates to the Administrator how to display the appropriate edit page for the station.
- Specify station type Connect parameters:
- Poll line - should be unique among all automatic interrogation programs so that the source of data can be uniquely identified. The line number is shown in data report data flags.
- Path - should be the USGS web services root address (e.g., https://nwis.waterservices.usgs.gov/nwis/iv). See the NWIS web page for more information.
- Timeout - set to a reasonably short time in seconds to query data. If too long, the overall automatic interrogation cycle will be too long and non-finishing processes will accumulate. If too short, each query will not complete and multiple retry attempts will result. For example, use 60 seconds. If a request fails, the station last poll time will not be set and a longer and longer period will be requested.
- Attempts - set to the maximum number of attempts to request data from web services.
- Specify Sampling parameters:
- Interval - not used with USGS data polling
- Time limit - limit polling to the query end time minus this time limit, used to ensure that long periods are not requeried repetitively for stations with configuration issues
- Time offset - interval that is applied to data reports to offset the time, for example to shift the time zone
- Specify Polling parameters:
- Poll command - specify as nspollusgswaterdata to run this program. Additional command parameters can be specified after the program name if necessary, for example if not available in any other Administrator editor inputs.
- Add/configure stations using the Administrator for each station that will receive its data from the USGS web service:
- If a station has not been added use the Station List / Add button to add the station.
- Specify Identification parameters:
- Specify the Remote Tag as the USGS site identifier. This is stored in the database as text so include leading zeros.
- Select a Type of USGS Waterdata, corresponding to the station type.
- Specify Connect parameters:
- Poll line - should be unique among all automatic interrogation programs so that the source of data can be uniquely identified.
- Receive Lines - not used
- Path - should be the USGS web services root address (e.g., https://nwis.waterservices.usgs.gov/nwis/iv). See the NWIS web page for more information. Leave blank to use the station type value.
- Timeout - set to a reasonably short time to query data. If too long, the overall automatic interrogation cycle will be too long and non-finishing processes will accumulate. If too short, each query will not complete and multiple retries will result. For example, use 60 seconds. If a request fails, the station last poll time will not be set and a longer and longer period will be requested.
- Attempts - set to the maximum number of attempts to request data from web services.
- Remote login - not used for USGS data polling
- Password - not used for USGS data polling
- Specify Sampling parameters:
- Interval - not used with USGS data polling
- Time limit - limit polling to the query end time minus this time limit, used to ensure that long periods are not requeried repetitively for stations with configuration issues
- Time offset - interval that is applied to data reports to offset the time, for example to shift the time zone
- Specify Polling parameters:
- Last polled - time of the last successful poll. This is set automatically when polling is running and can be reset manually, for example after restoring data from a manual load.
- Interval - interval to wait between station polling
- Base time - base time relative to midnight to offset the cumulative interval, used to stagger station polling time schedules
- Order - relative order for all stations that are polled at the same time
- Add/configure points using the Administrator:
- If a point has not been added, use the Point List / Add button to add the point for the station.
- Configure point properties as appropriate. For example, the data units should be set consistent with the polled data parameters.
- Verify that Identification values that are used by this polling program are specified:
- Data position - not used (prior to NovaStar 5.3.2.0, specify as 1 for first parameter in file, 2 for second parameter in file, etc.)
- Parameter - specify the USGS parameter code corresponding to the point's data type (e.g., 60 corresponds to USGS 00060 for discharge, and 65 corresponds to USGS 00065 for water level). NovaStar stores the parameter as an integer so do not include leading zeros.
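The relationship between the integer Parameter stored for a point and the five-digit USGS parameter code can be sketched in Python as follows. The helper names are hypothetical, and matching a point to a data column by the column name suffix is an assumption based on the example file (columns such as 211058_00060), not a description of the program's internal logic.

```python
def usgs_parameter_code(novastar_parameter: int) -> str:
    """Convert the integer stored by NovaStar (e.g., 60) to '00060'."""
    return f"{novastar_parameter:05d}"


def value_column(columns, novastar_parameter):
    """Return the data column for the parameter, or None if not returned.

    Assumes value columns are named <TS_ID>_<parameter code>; the matching
    qualification code columns end in '_cd' and are therefore not selected.
    """
    suffix = "_" + usgs_parameter_code(novastar_parameter)
    matches = [c for c in columns if c.endswith(suffix)]
    return matches[0] if matches else None


HEADER = ["agency_cd", "site_no", "datetime", "tz_cd",
          "211058_00060", "211058_00060_cd",
          "281030_00065", "281030_00065_cd"]
```

A result of None corresponds to the case where the requested period contains no data for a parameter, so the column is absent and nothing is loaded for that point.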
Configure for Scheduled Data Import Run Mode
To configure the program for data import, configure the station and point as if polling is used. This provides the USGS station remote identifier (for the station), and the USGS parameter (for the points).
Then use the Administrator to configure a scheduled process that
calls the nspollusgswaterdata
program with the desired command line.
- Do not use the --nsautointer command parameter (default is to not use).
- Use the -s parameter to indicate station numerical identifier(s) to query from the USGS web services.
- Because station last poll time is ignored for a data import run, use the -tp command line parameter to specify the time offset from current time to read data.
- Do not use the -ts or -te parameters since the data period is relative to the current time.
Run One-time Data Import
If a one-time data import is required, for example to populate a historical period or period when the system was down:
- Configure the station and point as described for polling.
- Run the program from the command line.
- Use the -s parameter to indicate station numerical identifier(s) to query from the USGS web services.
- Use the -ts and -te parameters to specify the period to load data.
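The command line for a one-time import can be assembled as in the following Python sketch, which formats the period using the MM/DD/YY-hh:mm:ss form given for -ts and -te in the command line table. The function name `one_time_import_args` and the station numerical identifiers 101 and 102 are hypothetical; the sketch only builds the argument list and does not run the program.

```python
from datetime import datetime


def one_time_import_args(station_ids, start, end, file_data=True):
    """Return the argument list for a one-time data import (hypothetical helper)."""
    fmt = "%m/%d/%y-%H:%M:%S"  # MM/DD/YY-hh:mm:ss
    args = [
        "nspollusgswaterdata",
        "-s", ",".join(str(s) for s in station_ids),
        "-ts", start.strftime(fmt),
        "-te", end.strftime(fmt),
    ]
    if not file_data:
        # -f retrieves data without filing it in the database (test run).
        args.append("-f")
    return args


EXAMPLE_ARGS = one_time_import_args(
    [101, 102],                      # hypothetical station numerical identifiers
    datetime(2022, 1, 1, 0, 0, 0),
    datetime(2022, 1, 3, 12, 0, 0),
)
```

Running first with `file_data=False` may be a useful way to check the retrieval before filing data, since the -f parameter is documented as a way to test without filing.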
Command Line Usage
The command line syntax is:
nspollusgswaterdata [-s or -i or -k] [parameters]
Optional values are shown in square brackets; one of -s, -i, or -k is needed to select the stations or points to poll. Command line parameters are as follows.
nspollusgswaterdata Command Line Parameters
Parameter | Description | Default |
---|---|---|
-s StationNumId1[,StationNumId2,...] | Station numerical identifiers to poll, separated by commas. | Specify -s, -i, or -k. |
-i PointNumId1[,PointNumId2,...] | Point numerical identifiers to poll, separated by commas. | Specify -s, -i, or -k. |
-k RemoteTag1[,RemoteTag2,...] | Station remote tags to poll, separated by commas. | Specify -s, -i, or -k. |
-d # | Debug display level. | 0 |
-f | Do NOT file interrogation results in database, useful to test data retrieval without filing. | File data. |
-h, -help, --help | Display help. | |
--nsautointer | This parameter was added for NovaStar 5.3.2.0 to allow running the program in polling or data import mode. Run the program consistent with nsautointer polling. Database station and point data provide polling configuration. The station last poll time will be updated if successful. This parameter is provided by the nsautointer program. | Run in data import mode. |
-q, -quiet, --quiet | Do not display activity messages. | Display messages. |
-S # | Source number for data report flags. | 1 |
-t # | Poll data type from nsautointer program, automatically set and cannot be changed. | 4 - logged data between start and stop times |
-tp PollInterval | The interval offset to determine the data query start time (from the nsautointer -tp parameter). | 2 hours before the current time. |
-ts MM/DD/YY-hh:mm:ss | Request data starting at this time (overrides the -tp offset from current time). | 2 hours before the current time. |
-te MM/DD/YY-hh:mm:ss | Request data ending at this time. | Current time. |
-v, -version, --version | Display the program version. | |
Possible future enhancements: | | |
--points PointNumId1:param1,PointNumId2:param2 | Possible future enhancement. Specify the map of point numerical identifier to USGS parameter, used with scheduled and one-time data loads so that the parameter does not need to be defined in the database. | Parameter must be defined in the database for each point with USGS data. |
Examples
Scheduling
Scheduling is handled by the nsautointer service program if running in polling run mode, or by the scheduler if running in data import run mode.
NovaStar Administrator Interface
See the Configure for Polling Run Mode section for information about using the Administrator to enable USGS station polling.
Troubleshooting
- See the /tmp/nspollusgswaterdata/* files for the downloaded data file and wget log file.
- See the nsautointer Troubleshooting documentation.
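When reviewing the files in /tmp/nspollusgswaterdata, it can help to decode the download time and process identifier from each filename. The following Python sketch assumes the naming pattern shown in the example filenames earlier on this page (time to seconds, then the process identifier, with a -wget suffix for the log); the function name `describe_download` is hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from the example names, e.g.
# nspollusgswaterdata-20221110094645-842.txt and ...-842-wget.txt
NAME_RE = re.compile(r"^nspollusgswaterdata-(\d{14})-(\d+)(-wget)?\.txt$")


def describe_download(filename):
    """Return the time, process identifier, and file kind, or None if no match."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    return {
        "time": datetime.strptime(m.group(1), "%Y%m%d%H%M%S"),
        "pid": int(m.group(2)),
        "is_wget_log": m.group(3) is not None,
    }
```

Applying the function to each name returned by `os.listdir("/tmp/nspollusgswaterdata")` and grouping by time and process identifier pairs each downloaded data file with its wget log.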