
Add wildcard CQL #866

Merged
ldecicco-USGS merged 17 commits into DOI-USGS:develop from ldecicco-USGS:develop
Mar 9, 2026

Conversation

@ldecicco-USGS
Collaborator

This PR includes some minor fixes for the field and combined meta.

It switches the WQP table reader to data.table::fread. There are a few minor differences in the WQX3 table, but the fread conversion looks closer to the actual returned text.

The big functional change is adding a wildcard CQL2 template which will allow this:

hucs <- read_waterdata_combined_meta(
  hydrologic_unit_code = c("11010008", "11010009"),
  site_type = c("Stream", "Spring")
)

Requesting:
https://api.waterdata.usgs.gov/ogcapi/v0/collections/combined-metadata/items?f=json&lang=en-US&limit=50000
Remaining requests this hour: 2275
> unique(hucs$site_type)
[1] "Stream" "Spring"
> unique(hucs$hydrologic_unit_code)
 [1] "110100080704" "110100090104" "110100090202" "110100080302" "110100080203"
 [6] "11010008"     "110100080212" "110100080102" "110100080802" "110100090107"
[11] "110100080904" "110100080603" "110100090306" "110100080502" "110100080701"
[16] "110100080209" "110100080902" "110100080206" "110100080309" "110100080210"
[21] "110100090201" "110100090204" "110100080111" "110100080801" "11010009"    
[26] "110100080211" "110100080401" "110100080104" "110100080308" "110100080903"
[31] "110100080806" "110100080101" "110100090302" "110100090305" "110100080112"
[36] "110100090206" "110100090103"

So the logic is: if a user specifies more than one HUC, we'll use the wildcard "like" and "or" template (HUC LIKE '0123%' OR HUC LIKE '01222%'). Any other parameters will be tacked on with ANDs.
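As a loose sketch of that expansion (base R only, not the package's actual template code; the function name is made up for illustration):

```r
# Expand a vector of HUC codes into a wildcard "LIKE ... OR LIKE ..." CQL2
# filter, so prefix codes match sites stored at finer HUC levels.
build_huc_filter <- function(hucs) {
  clauses <- sprintf("hydrologic_unit_code LIKE '%s%%'", hucs)
  paste(clauses, collapse = " OR ")
}

build_huc_filter(c("11010008", "11010009"))
# "hydrologic_unit_code LIKE '11010008%' OR hydrologic_unit_code LIKE '11010009%'"
```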

Collaborator

@jzemmels left a comment


Looks good, I like the new CQL templates. No breaking changes, so I'll approve. Just some scattered thoughts.

In the future, exposing the templates more explicitly might be a good, flexible way for users to make more sophisticated queries. Perhaps the format could be building a CQL query string piece-by-piece using a combination of the template helper functions. From a set algebra perspective, I think the primitive operations are and, or, and not.
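One way that idea could look, as a purely hypothetical sketch (these helpers are not part of dataRetrieval; names and behavior are invented for illustration):

```r
# Hypothetical composable CQL2 string builders based on the three set-algebra
# primitives: conjunction, disjunction, and negation.
cql_and <- function(...) paste0("(", paste(c(...), collapse = " AND "), ")")
cql_or  <- function(...) paste0("(", paste(c(...), collapse = " OR "), ")")
cql_not <- function(x)   paste0("NOT ", x)

cql_and(
  cql_or("site_type = 'Stream'", "site_type = 'Spring'"),
  cql_not("state_code = '55'")
)
# "((site_type = 'Stream' OR site_type = 'Spring') AND NOT state_code = '55')"
```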


# Wildcards:
if(names(parameter) %in% c("hydrologic_unit_code")){
template_path <- system.file("templates/param.CQL2.like", package = "dataRetrieval")
Collaborator


Really cool! Is the assumption that if a user wants data for a certain HUC level that they'll also want data for the sub-HUCs (no idea if that's the right terminology)?

Collaborator Author


Correct. It's a bit weird, I think, that sites have different HUC levels. So one site might say 01234 and another 012345678. If you asked for 01234, you would expect both. When you ask for an individual HUC, like hydrologic_unit_code=01234, the services recently changed to doing the "like". BUT, if you ask for 2 or more HUCs, you have to explicitly spell it all out in the CQL.
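A toy illustration of why the prefix match matters, using base R (not package code; the example codes are made up):

```r
# Sites can store HUC codes at different levels, so a prefix ("like") match
# on "01234" catches both the 5-digit and the 9-digit site below.
site_hucs <- c("01234", "012345678", "098765432")
grepl("^01234", site_hucs)
# TRUE TRUE FALSE
```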

return(whisker::whisker.render(template, parameter_list))

# Wildcards:
if(names(parameter) %in% c("hydrologic_unit_code")){
Collaborator


I'm trying to think if there are other parameters that a user might want to match "like" with. Maybe something like site_type_cd, to capture ST and related site types? Perhaps that's not "assumed" default behavior?

Collaborator Author


I don't think so - HUC is the only one Mike's doing on the individual level. If something comes up, we can add it here though.

quote = ifelse(csv, '\"', ""),
delim = ifelse(csv, ",", "\t")))

retval <- data.table::fread(text = doc, data.table = FALSE,
Collaborator


😎

Collaborator Author


Surprisingly, not faster in my tests (read_csv is pretty great too!). Also, we still can't shift off readr until we get rid of importRDB.
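For context, the fread pattern being discussed looks roughly like this (a minimal sketch; the column names and sample text are invented, not the WQP response):

```r
# Read delimited text directly from a string with data.table::fread,
# returning a plain data.frame rather than a data.table.
library(data.table)

doc <- "site_no\tvalue\n01234\t5.1\n05678\t3.2\n"
retval <- fread(text = doc, sep = "\t", data.table = FALSE,
                colClasses = list(character = "site_no"))
str(retval)
```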

```{r}
range <- as.Date(c("2025-01-01", "2026-03-02"))

complete_df <- data.frame(time = seq.Date(from = range[1],
Collaborator


Does the range vector need to be defined if it's only used here?

Suggested change
complete_df <- data.frame(time = seq.Date(from = range[1],
complete_df <- data.frame(time = seq.Date(from = as.Date("2025-01-01"),

```{r, message=FALSE, warning=FALSE}
range <- as.Date(c("2025-01-01", "2026-02-01"))
```{r}
range <- as.Date(c("2025-01-01", "2026-03-02"))
Collaborator


Delete range if start date is hardcoded on next line

Suggested change
range <- as.Date(c("2025-01-01", "2026-03-02"))

This section will focus on important differences between the statistics and OGC-compliant APIs and other tips for working with the endpoint.

* **No request limit or API token**: at time of writing, the statistics API does not limit the number of requests that can be made per hour. It also does not require you to sign up for an API token. Requesting data from the statistics API does not count against your total request limit to the OGC-compliant APIs.
* **Higher API limits**: at time of writing, the statistics API is limited to the default limit for any api.gov service, which is 4,000 requests per hour per IP. The API token used for the other USGS OGC-compliant APIs is included in the request and changes the limit to 4,000 requests per hour per token. There is a chance this could make a difference if you are running code on a shared IP: for example, a large office, GitHub, GitLab, etc.
Collaborator


Thanks for fixing.

Instead of `time_of_year` and `time_of_year_type` columns, this output contains `start_date`, `end_date`, and `interval_type` columns representing the date range over which the average was calculated.
The first row shows the average January 2024 discharge was about 112 cubic feet per second.
We again have extra rows: the second row contains the **calendar** year 2024 average and the third contains the **water** year 2024 average.
Instead of `time_of_year` and `time_of_year_type` columns, this output contains `start_date`, `end_date`, and `interval_type` columns representing the date range over which the average was calculated. The first row shows the average January 2024 discharge was about 112 cubic feet per second. We again have extra rows: the second row contains the **calendar** year 2024 average and the third contains the **water** year 2024 average.
Collaborator


Personal preference, but I like starting new sentences in the same paragraph on a new line in markdown. I find it more readable. These changes are fine, though.

Collaborator Author


Ha, I'll keep that in mind, but we'll have to solidly agree to disagree on that preference 😂

Co-authored-by: Joe Zemmels (he/him) <jzemmels@gmail.com>
@ldecicco-USGS
Collaborator Author

> In the future, exposing the templates more explicitly might be a good, flexible way for users to make more sophisticated queries. Perhaps the format could be building a CQL query string piece-by-piece using a combination of the template helper functions. From a set algebra perspective, I think the primitive operations are and, or, and not.

We can keep this on the back burner. Every time I start going down a thought process like this, I don't come up with a way that is significantly easier for users than giving some CQL2 examples (especially since I think we've got the vast majority of users' needs covered with current functionality).

@ldecicco-USGS ldecicco-USGS merged commit 12fe6c8 into DOI-USGS:develop Mar 9, 2026
1 check passed