Conversation
jzemmels
left a comment
Looks good, I like the new CQL templates. No breaking changes, so I'll approve. Just some scattered thoughts.
In the future, exposing the templates more explicitly might be a good, flexible way for users to make more sophisticated queries. Perhaps the format could be building a CQL query string piece-by-piece using a combination of the template helper functions. From a set algebra perspective, I think the primitive operations are and, or, and not.
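Purely as a sketch of that idea (none of these helper functions exist in dataRetrieval; the names here are invented for illustration), the set-algebra primitives could compose a CQL2 text filter like this:

```r
# Hypothetical helpers for building a CQL2 text query piece-by-piece.
# These function names are invented; dataRetrieval does not export them.
cql_like <- function(property, pattern) sprintf("%s LIKE '%s'", property, pattern)
cql_eq   <- function(property, value)   sprintf("%s = '%s'", property, value)
cql_and  <- function(...) paste0("(", paste(..., sep = " AND "), ")")
cql_or   <- function(...) paste0("(", paste(..., sep = " OR "), ")")
cql_not  <- function(x)   paste0("NOT ", x)

query <- cql_and(
  cql_or(cql_like("hydrologic_unit_code", "0123%"),
         cql_like("hydrologic_unit_code", "01222%")),
  cql_eq("state_name", "Wisconsin")
)
query
#> "((hydrologic_unit_code LIKE '0123%' OR hydrologic_unit_code LIKE '01222%') AND state_name = 'Wisconsin')"
```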
```r
# Wildcards:
if(names(parameter) %in% c("hydrologic_unit_code")){
  template_path <- system.file("templates/param.CQL2.like", package = "dataRetrieval")
```
Really cool! Is the assumption that if a user wants data for a certain HUC level that they'll also want data for the sub-HUCs (no idea if that's the right terminology)?
Correct. It's a bit weird, I think, that sites have different HUC levels. So one site might say 01234 and another 012345678. If you asked for 01234, you would expect both. When you ask for an individual HUC, like `hydrologic_unit_code=01234`, the services recently changed to doing the "like". BUT, if you ask for 2 or more HUCs, you have to explicitly spell it all out in the CQL.
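For reference, a minimal sketch of how a single-HUC "like" template might render with whisker (the template text here is invented; the actual `templates/param.CQL2.like` file shipped in dataRetrieval may differ):

```r
library(whisker)

# Invented stand-in for the param.CQL2.like template -- not the real file.
template <- '{"op": "like", "args": [{"property": "{{parameter_name}}"}, "{{parameter_value}}%"]}'

rendered <- whisker.render(template,
                           list(parameter_name = "hydrologic_unit_code",
                                parameter_value = "01234"))
rendered
```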
```r
return(whisker::whisker.render(template, parameter_list))

# Wildcards:
if(names(parameter) %in% c("hydrologic_unit_code")){
```
I'm trying to think if there are other parameters that a user might want to match "like" with. Maybe something like `site_type_cd` to capture ST and related site types? Perhaps that's not "assumed" default behavior?
I don't think so - HUC is the only one Mike's doing on the individual level. If something comes up, we can add it here though.
```r
quote = ifelse(csv, '\"', ""),
delim = ifelse(csv, ",", "\t")))

retval <- data.table::fread(text = doc, data.table = FALSE,
```
Surprisingly not faster in my tests (read_csv is pretty great too!). Also, we still can't shift off readr until we get rid of the importRDB.
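For anyone comparing, a minimal sketch of the `data.table::fread` call on inline text (the data here are toy stand-ins; a real WQP response has many more columns):

```r
library(data.table)

# Toy stand-in for the delimited text a WQP request returns.
doc <- "site_no,result_value\n1234567,5.2\n7654321,7.1\n"

# data.table = FALSE returns a plain data.frame, matching the shape
# callers of the previous readr-based reader expect.
retval <- fread(text = doc, sep = ",", quote = "\"", data.table = FALSE)
str(retval)
```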
```{r}
range <- as.Date(c("2025-01-01", "2026-03-02"))

complete_df <- data.frame(time = seq.Date(from = range[1],
```
Does the range vector need to be defined if it's only used here?
```diff
- complete_df <- data.frame(time = seq.Date(from = range[1],
+ complete_df <- data.frame(time = seq.Date(from = as.Date("2025-01-01"),
```
```{r, message=FALSE, warning=FALSE}
range <- as.Date(c("2025-01-01", "2026-02-01"))
```

```{r}
range <- as.Date(c("2025-01-01", "2026-03-02"))
```
Delete `range` if the start date is hardcoded on the next line.
```r
range <- as.Date(c("2025-01-01", "2026-03-02"))
```
```diff
  This section will focus on important differences between the statistics and OGC-compliant APIs and other tips for working with the endpoint.

- * **No request limit or API token**: at time of writing, the statistics API does not limit the number of requests that can be made per hour. It also does not require you sign up for an API token. Requesting data from the statistics API does not count against your total request limit to the OGC-compliant APIs.
+ * **Higher API limits**: at time of writing, the statistics API is limited to the default limits for any api.gov service which is 4000 requests per hour per IP. The API token used for the other USGS OGC-compliant APIs is included in the request, and changes the limit to 4000 requests per hour per token. There is a chance this could make a difference if you are running code on a shared IP: for example a large office, GitHub, Gitlab, etc.
```
```diff
- Instead of `time_of_year` and `time_of_year_type` columns, this output contains `start_date`, `end_date`, and `interval_type` columns representing the daterange over which the average was calculated.
- The first row shows the average January, 2024 discharge was about 112 cubic feet per second.
- We again have extra rows: the second row contains the **calendar** year 2024 average and the third contains the **water** year 2024 average.
+ Instead of `time_of_year` and `time_of_year_type` columns, this output contains `start_date`, `end_date`, and `interval_type` columns representing the daterange over which the average was calculated. The first row shows the average January, 2024 discharge was about 112 cubic feet per second. We again have extra rows: the second row contains the **calendar** year 2024 average and the third contains the **water** year 2024 average.
```
Personal preference, but I like starting new sentences in the same paragraph on a new line in markdown. I find it more readable. These changes are fine, though.
Ha, I'll keep that in mind, but we'll have to solidly agree to disagree on that preference 😂
Co-authored-by: Joe Zemmels (he/him) <jzemmels@gmail.com>
We can keep this on the backburner. Every time I start going down a thought process like this, I don't come up with a way that is significantly easier for users compared to giving some CQL2 examples (especially since I think we've got the vast majority of users' needs covered with current functionality).
This PR includes some minor fixes for the field and combined meta.
It switches the WQP table reader to `data.table::fread`. There are a few minor differences in the WQX3 table, but the `fread` conversion looks closer to the actual returned text.
The big functional change is adding a wildcard CQL2 template which will allow this:
So the logic is: if a user specifies more than one HUC, we'll use the wildcard "like" and "or" template (HUC 0123% OR HUC 01222%). Any other parameters will be tacked on with ANDs.
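A hedged sketch of the CQL2 JSON that kind of request could produce (hand-built here with jsonlite purely to illustrate the shape; the actual rendered templates may differ):

```r
library(jsonlite)

# Hand-built illustration of a two-HUC "like"/"or" filter combined with
# one extra parameter via "and" -- not the actual template output.
query <- list(
  op = "and",
  args = list(
    list(op = "or", args = list(
      list(op = "like",
           args = list(list(property = "hydrologic_unit_code"), "0123%")),
      list(op = "like",
           args = list(list(property = "hydrologic_unit_code"), "01222%"))
    )),
    list(op = "=",
         args = list(list(property = "state_name"), "Wisconsin"))
  )
)

cat(toJSON(query, auto_unbox = TRUE, pretty = TRUE))
```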