
Add wildcard CQL #866

Merged
ldecicco-USGS merged 17 commits into DOI-USGS:develop from ldecicco-USGS:develop
Mar 9, 2026

Conversation

@ldecicco-USGS
Collaborator

This PR includes some minor fixes for the field and combined meta.

It switches the WQP table reader to data.table::fread. There are a few minor differences in the WQX3 table, but the fread conversion looks closer to the actual returned text.

The big functional change is adding a wildcard CQL2 template which will allow this:

hucs <- read_waterdata_combined_meta(
  hydrologic_unit_code = c("11010008", "11010009"),
  site_type = c("Stream", "Spring")
)

Requesting:
https://api.waterdata.usgs.gov/ogcapi/v0/collections/combined-metadata/items?f=json&lang=en-US&limit=50000
Remaining requests this hour: 2275
> unique(hucs$site_type)
[1] "Stream" "Spring"
> unique(hucs$hydrologic_unit_code)
 [1] "110100080704" "110100090104" "110100090202" "110100080302" "110100080203"
 [6] "11010008"     "110100080212" "110100080102" "110100080802" "110100090107"
[11] "110100080904" "110100080603" "110100090306" "110100080502" "110100080701"
[16] "110100080209" "110100080902" "110100080206" "110100080309" "110100080210"
[21] "110100090201" "110100090204" "110100080111" "110100080801" "11010009"    
[26] "110100080211" "110100080401" "110100080104" "110100080308" "110100080903"
[31] "110100080806" "110100080101" "110100090302" "110100090305" "110100080112"
[36] "110100090206" "110100090103"

So the logic is: if a user specifies more than one HUC, we'll use the wildcard "like" and "or" template (HUC LIKE '0123%' OR HUC LIKE '01222%'). Any other parameters will be tacked on with ANDs.
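As a loose sketch of that expansion (base R only, not the package's actual template code; the function name is made up for illustration):

```r
# Expand a vector of HUC codes into a wildcard "LIKE ... OR LIKE ..." CQL2
# filter, so prefix codes match sites stored at finer HUC levels.
build_huc_filter <- function(hucs) {
  clauses <- sprintf("hydrologic_unit_code LIKE '%s%%'", hucs)
  paste(clauses, collapse = " OR ")
}

build_huc_filter(c("11010008", "11010009"))
# "hydrologic_unit_code LIKE '11010008%' OR hydrologic_unit_code LIKE '11010009%'"
```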

Collaborator

@jzemmels left a comment


Looks good, I like the new CQL templates. No breaking changes, so I'll approve. Just some scattered thoughts.

In the future, exposing the templates more explicitly might be a good, flexible way for users to make more sophisticated queries. Perhaps the format could be building a CQL query string piece-by-piece using a combination of the template helper functions. From a set algebra perspective, I think the primitive operations are and, or, and not.
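One way that idea could look, as a purely hypothetical sketch (these helpers are not part of dataRetrieval; names and behavior are invented for illustration):

```r
# Hypothetical composable CQL2 string builders based on the three set-algebra
# primitives: conjunction, disjunction, and negation.
cql_and <- function(...) paste0("(", paste(c(...), collapse = " AND "), ")")
cql_or  <- function(...) paste0("(", paste(c(...), collapse = " OR "), ")")
cql_not <- function(x)   paste0("NOT ", x)

cql_and(
  cql_or("site_type = 'Stream'", "site_type = 'Spring'"),
  cql_not("state_code = '55'")
)
# "((site_type = 'Stream' OR site_type = 'Spring') AND NOT state_code = '55')"
```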


# Wildcards:
if(names(parameter) %in% c("hydrologic_unit_code")){
template_path <- system.file("templates/param.CQL2.like", package = "dataRetrieval")
Collaborator


Really cool! Is the assumption that if a user wants data for a certain HUC level that they'll also want data for the sub-HUCs (no idea if that's the right terminology)?

Collaborator Author


Correct. It's a bit weird, I think, that sites have different HUC levels. So one site might say 01234 and another 012345678. If you asked for 01234, you would expect both. When you ask for an individual HUC, like hydrologic_unit_code=01234, the services recently changed to doing the "like". BUT, if you ask for 2 or more HUCs, you have to explicitly spell it all out in the CQL.
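A toy illustration of why the prefix match matters, using base R (not package code; the example codes are made up):

```r
# Sites can store HUC codes at different levels, so a prefix ("like") match
# on "01234" catches both the 5-digit and the 9-digit site below.
site_hucs <- c("01234", "012345678", "098765432")
grepl("^01234", site_hucs)
# TRUE TRUE FALSE
```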

return(whisker::whisker.render(template, parameter_list))

# Wildcards:
if(names(parameter) %in% c("hydrologic_unit_code")){
Collaborator


I'm trying to think if there are other parameters that a user might want to match "like" with. Maybe something like site_type_cd, to capture ST and related site types? Perhaps that's not "assumed" default behavior?

Collaborator Author


I don't think so - HUC is the only one Mike's doing on the individual level. If something comes up, we can add it here though.

quote = ifelse(csv, '\"', ""),
delim = ifelse(csv, ",", "\t")))

retval <- data.table::fread(text = doc, data.table = FALSE,
Collaborator


😎

Collaborator Author


Surprisingly, not faster in my tests (read_csv is pretty great too!). Also, we still can't shift off readr until we get rid of importRDB.
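For context, the fread pattern being discussed looks roughly like this (a minimal sketch; the column names and sample text are invented, not the WQP response):

```r
# Read delimited text directly from a string with data.table::fread,
# returning a plain data.frame rather than a data.table.
library(data.table)

doc <- "site_no\tvalue\n01234\t5.1\n05678\t3.2\n"
retval <- fread(text = doc, sep = "\t", data.table = FALSE,
                colClasses = list(character = "site_no"))
str(retval)
```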

```{r}
range <- as.Date(c("2025-01-01", "2026-03-02"))

complete_df <- data.frame(time = seq.Date(from = range[1],
Collaborator


Does the range vector need to be defined if it's only used here?

Suggested change
complete_df <- data.frame(time = seq.Date(from = range[1],
complete_df <- data.frame(time = seq.Date(from = as.Date("2025-01-01"),

```{r, message=FALSE, warning=FALSE}
range <- as.Date(c("2025-01-01", "2026-02-01"))
```{r}
range <- as.Date(c("2025-01-01", "2026-03-02"))
Collaborator


Delete range if start date is hardcoded on next line

Suggested change
range <- as.Date(c("2025-01-01", "2026-03-02"))

This section will focus on important differences between the statistics and OGC-compliant APIs and other tips for working with the endpoint.

* **No request limit or API token**: at time of writing, the statistics API does not limit the number of requests that can be made per hour. It also does not require you to sign up for an API token. Requesting data from the statistics API does not count against your total request limit to the OGC-compliant APIs.
* **Higher API limits**: at time of writing, the statistics API is limited to the default limit for any api.gov service, which is 4,000 requests per hour per IP. The API token used for the other USGS OGC-compliant APIs is included in the request and changes the limit to 4,000 requests per hour per token. There is a chance this could make a difference if you are running code on a shared IP: for example, a large office, GitHub, GitLab, etc.
Collaborator


Thanks for fixing.

Instead of `time_of_year` and `time_of_year_type` columns, this output contains `start_date`, `end_date`, and `interval_type` columns representing the date range over which the average was calculated.
The first row shows the average January 2024 discharge was about 112 cubic feet per second.
We again have extra rows: the second row contains the **calendar** year 2024 average and the third contains the **water** year 2024 average.
Instead of `time_of_year` and `time_of_year_type` columns, this output contains `start_date`, `end_date`, and `interval_type` columns representing the date range over which the average was calculated. The first row shows the average January 2024 discharge was about 112 cubic feet per second. We again have extra rows: the second row contains the **calendar** year 2024 average and the third contains the **water** year 2024 average.
Collaborator


Personal preference, but I like starting new sentences in the same paragraph on a new line in markdown. I find it more readable. These changes are fine, though.

Collaborator Author


Ha, I'll keep that in mind, but we'll have to solidly agree to disagree on that preference 😂

Co-authored-by: Joe Zemmels (he/him) <jzemmels@gmail.com>
@ldecicco-USGS
Collaborator Author

> In the future, exposing the templates more explicitly might be a good, flexible way for users to make more sophisticated queries. Perhaps the format could be building a CQL query string piece-by-piece using a combination of the template helper functions. From a set algebra perspective, I think the primitive operations are and, or, and not.

We can keep this on the back burner. Every time I start going down a thought process like this, I don't come up with a way that is significantly easier for users than giving some CQL2 examples (especially since I think we've got the vast majority of users' needs covered with current functionality).

@ldecicco-USGS ldecicco-USGS merged commit 12fe6c8 into DOI-USGS:develop Mar 9, 2026
1 check passed