
ft: request parser #100

Open

wants to merge 9 commits into base: master
Conversation

kennedymwavu
Contributor

this initial draft addresses #99 and #95

  • a general request parser that just works for "application/json", "multipart/form-data", and "application/x-www-form-urlencoded".
  • we now parse query parameters using webutils::parse_query(), following this comment.
  • the DESCRIPTION file has been updated to add {webutils} & {yyjsonr} as dependencies.
  • i've included an example of how to use the parser (it's a bit long, though).

@kennedymwavu kennedymwavu linked an issue Feb 3, 2025 that may be closed by this pull request
Collaborator

@JohnCoene JohnCoene left a comment


I think it's a good start but there are quite a few things I would change. Let me know what you think.

In essence, I think we should make this more flexible: there are too many content types to hardcode like this (I know that the current version does not support all content types either, but it'd be a nice feature).

E.g., put it this way: with the suggested API, how do I ingest XML?

I think what we need is an internal map of content-type headers to parsers; by default we'd provide the 2-3 we have here. Then, internally, the parse method (if we have such an API; do we need parse_json, parse_multipart, etc.? can't we just have parse?) fetches the appropriate parser from the map based on the request's content type.

I'm talking about a map as in mapper (if you want to use that, don't depend on it, just copy the code).

We'd have an internal map and expose a function that allows users to add to or overwrite entries in the map.

# pseudo code
m <- map()

# internal defaults
m$set("application/json", parse_json)
m$set("multipart/form-data", parse_multipart)
# ...

# can't think of a better name for the function right now :(
set_parser <- function(content_type, callback) {
  m$set(content_type, callback)
}

# this is the parse method...
parse = \() {
  fn <- m$get(req$CONTENT_TYPE)

  fn(req)
}
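A runnable version of the pseudo code above, using a plain environment as the map instead of depending on {mapper} (all names here are placeholders, not a final API):

```r
# runnable sketch of the content-type -> parser map, using a plain
# environment instead of {mapper}; names are placeholders, not final API
parsers <- new.env(parent = emptyenv())

set_parser <- function(content_type, fn) {
  assign(content_type, fn, envir = parsers)
}

get_parser <- function(content_type) {
  get0(content_type, envir = parsers, ifnotfound = NULL)
}

# internal defaults would be registered the same way users override them
set_parser("application/json", function(req) list(parsed = "json"))

# the parse method just dispatches on the incoming content type
parse <- function(req) {
  fn <- get_parser(req$CONTENT_TYPE)
  if (is.null(fn)) stop("no parser registered for ", req$CONTENT_TYPE)
  fn(req)
}
```

Because the map is an ordinary environment, users can overwrite any default by calling set_parser() again with the same content type.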

"multipart/form-data",
"application/x-www-form-urlencoded"
)
content_type <- if (is.null(content_type)) {
Collaborator


I don't understand this line: the content type to parse is always req$CONTENT_TYPE, and we should use the parser that matches it, so why the if and match.arg?

Contributor Author


a common use case is when req$CONTENT_TYPE is "text/plain" but you want the body parsed as "application/json".
the if statement ensures that if you've specified the content type it's one of the 3 supported ones; otherwise it defaults to req$CONTENT_TYPE.
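a minimal sketch of that fallback logic, simplified from the PR (the helper name is made up for illustration):

```r
# resolve the content type to parse as: honour an explicit content_type
# argument (validated against the supported set), else fall back to the
# request's own header; resolve_content_type() is an illustrative name
resolve_content_type <- function(req, content_type = NULL) {
  choices <- c(
    "application/json",
    "multipart/form-data",
    "application/x-www-form-urlencoded"
  )
  if (is.null(content_type)) {
    return(req$CONTENT_TYPE)
  }
  # errors if the caller supplies an unsupported type
  match.arg(content_type, choices)
}
```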

return(list())
}

content_type_choices <- c(
Collaborator


I wouldn't hardcode that. I know that in the current version it is technically hardcoded but how do I now support XML?

Note there are too many content types to hardcode them.

Contributor Author


that's a valid concern. my thinking was for ambiorix::parse_req() to only support the 3 most common mime types.

@jrosell

jrosell commented Feb 4, 2025

From a dev user's point of view, it would be awesome if one could overwrite the default mapper and define which parsers to use with which content types.

@kennedymwavu
Contributor Author

In essence, I think we should make this more flexible, there are too many content types to hardcode like this (I know that the current version does not support all content types but it'd be a nice feature).

doesn't this take us back to this comment from yesterday?

users will typically know the expected content type for a given endpoint, so if parse_req() doesn’t support it, they can just define and call their own parsers directly. that’s why i initially felt a global mapping system might not be necessary.

  • A (with the proposed approach):
    home_get <- \(req, res) {
      body <- parse_req(req)
      # ...
    }
    if the default parser for req$CONTENT_TYPE has been overridden via the map, parse_req() ultimately ends up calling my_custom_parser().
  • B (with a direct approach):
    home_get <- \(req, res) {
      body <- my_custom_parser(req)
      # ...
    }
    this skips the indirection and lets users handle their own parsing directly.

allowing users to globally specify custom parsers is effectively the same as defining a parser themselves and calling it in the request handler.

that said, i'm probably overlooking something, so i'd love to hear your thoughts.

@jrosell

jrosell commented Feb 4, 2025

I was thinking of something like option(...)

@kennedymwavu
Contributor Author

I was thinking of something like option(...)

i've implemented that in the PR, but only for the 3 mime types. have a look at the overriding default parsers section.
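an options()-based override along those lines might look like this (the option name and function names are hypothetical, not the PR's actual API):

```r
# hypothetical options()-driven parser override; "ambiorix.parsers" is
# a made-up option name used only for this sketch
my_json_parser <- function(req) list(source = "custom")

options(ambiorix.parsers = list("application/json" = my_json_parser))

parse_with_options <- function(req) {
  registered <- getOption("ambiorix.parsers", default = list())
  fn <- registered[[req$CONTENT_TYPE]]
  if (is.null(fn)) stop("no parser registered for ", req$CONTENT_TYPE)
  fn(req)
}
```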

@JohnCoene
Collaborator

I don't fully follow the proposed solution.

req$CONTENT_TYPE is never actually used: the content type of the incoming request is always ignored, because we force a parser whenever these are called via parse_multipart or parse_json. Also, parse_req supports application/x-www-form-urlencoded but we have no wrapper for it.

What is the intended usage? I'm asking because parse_req is exported, so do we intend to call parse_req(req, content_type = "application/json") instead of parse_json? But then why have parse_json and parse_multipart? They are the same function with different names: they don't force the parser, and actually depend on req$CONTENT_TYPE. So parse_json and parse_multipart parse depending on req$CONTENT_TYPE regardless, and calling parse_multipart on a JSON body will work just fine... So why not what I suggested: have one single function that applies the parser corresponding to the incoming request type?

It's really rather confusing.


Passing an instance of a class to a function is really strange: Request is a class, and what users want to do here is extend it rather than build around it. I know that's how I did it originally, but I don't think it's good.

Note if we use {yyjsonr} for the parser can we switch everything to use this package? Everywhere else we use {jsonlite}.

@kennedymwavu
Contributor Author

ahaa, i see where the misunderstanding is coming from.

  1. req$CONTENT_TYPE is never actually used: the content type of the incoming request is always ignored...

    it is actually the other way round: req$CONTENT_TYPE is always used unless the user has provided the content_type argument. i've stated in the docs:

    #' @param content_type String. 'Content-Type' of the request. See details for
    #' valid values.
    #' By default, this parameter is set to `NULL` and is inferred from the `req`
    #' object during run time.
    #' The only time you need to provide this argument is if `req$CONTENT_TYPE`
    #' is different from how you want the request body to be parsed.
    #' For example, `req$CONTENT_TYPE` gives "text/plain;charset=UTF-8" but you want
    #' to parse the request body as "application/json".
  2. What is the intended usage? I'm asking because parse_req is exported so do we intend to call parse_req(req, content_type = "application/json") instead of parse_json?

    the intended use is to call parse_req(req) for all the 3 mime types. no need to specify the content_type.

  3. But why parse_json and parse_multipart? They are the same function but with different names...

    parse_json() & parse_multipart() already exist in ambiorix and are exported:
    https://github.com/ambiorix-web/ambiorix/blob/master/R/parser.R

    instead of deprecating them, i linked them to use parse_req(). so in case anybody is already using those two, they'll continue functioning as expected.

  4. parse_req supports application/x-www-form-urlencoded but we have no wrapper for it.

    since the end goal is to just use parse_req(), i saw no need to add this wrapper. the wrappers parse_json() & parse_multipart() are available only coz they were already being exported.

  5. So why not what I suggested, have one single function that applies the parser corresponding to the incoming request type?

    we're literally talking about the same thing. the single function in this case is parse_req(). but...

  6. Passing an instance of a class to a function is really strange, Request is a class, what users want to do here is extend it rather than build around it. I know that's how I did it originally but it's not good I think.

    i had totally misunderstood you until you mentioned this. so you're going for body <- req$parse() as opposed to body <- parse_req(req). yes, this would be a much better interface.

  7. Note if we use {yyjsonr} for the parser can we switch everything to use this package? Everywhere else we use {jsonlite}.

    yes, we'd have to.

@JohnCoene
Collaborator

parse_json and parse_multipart don't actually do what they say: they just call parse_req. parse_json doesn't parse JSON; it parses whatever is in Content-Type. As I said, parse_json and parse_multipart are the same function with different names: they both do the same thing.

If we want to keep them, we should hardcode content_type for each so they actually do what is expected. Otherwise they will behave strangely: one can call parse_json on multipart data and never notice. This also means that, in the example you give, where you want to parse JSON but the content type is text/plain, parse_json will actually fail.

Also note that this may break quite a bit of existing code, because currently parse_json and parse_multipart are not driven by the Content-Type; in applications where there is a mismatch they will now fail.

parse_json <- function(req, ...){
  parse_req(req, content_type = "application/json", ...)
}

Honestly, I think that's because it should be the other way around (but don't bother yet, read on). parse_req should just be a switch on CONTENT_TYPE, and the parsing should be done by the individual functions where we'd expect it: parse_json should actually parse JSON, and parse_req should call parse_json, not the other way around.

Anyway, if the end goal is to use parse_req, we should deprecate parse_json and parse_multipart (but I wouldn't, see later).

It just feels too weird to me, I apologise. parse_req is just not a nice function: it brings almost nothing to the table and instead makes things very confusing.

  1. I find it treacherous: on its face it's magical, automagically parsing correctly until it doesn't (unsupported type, incorrect incoming Content-Type, etc.), and then one has to dig into the documentation to figure out why...
  2. It forces one to go through the documentation because the values that content_type can take are not self-documented; since it accepts NULL we can't list the defaults in the signature, function(req, content_type = c("application/json", "...")).
  3. The payoff for making the API more opaque, as we do with parse_req, is generally that it makes things extensible: we provide parse_req and only that, but it can be extended.

In essence, why add parse_req as this magical parse-any-and-all function if it doesn't do that?

since the end goal is to just use parse_req()

The point of such a function is to let the user ignore this messy Content-Type business, which is generally of little interest. But we don't really achieve that: in the end you still have to concern yourself with it, except now with parse_req it's just murkier.

users will typically know the expected content type for a given endpoint

Then why parse_req? Why wrap individual parsers into a single larger function that simply switches on the Content-Type, if that's not necessary to begin with?

Sorry, I'm just really confused by parse_req. It doesn't seem like it'll work to me. It looks like there are too many cases where one needs to look into the Content-Type, and to me parse_json(req) is far clearer and more readable than parse_req(req, content_type = "application/json"). The value in such a function is, maybe, that it's extensible, but it looks like that's not the plan.

I would remove parse_req. We could have the 4 most common parsers that users may call explicitly if they want to force parse, parse_json, parse_raw, parse_multipart, and parse_text (express.js body parser provides these too).
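The four explicit parsers could be sketched like this, written over a raw body vector for illustration. The JSON and multipart bodies delegate to the packages this PR already pulls in; the real ambiorix functions would take a Request instead:

```r
# illustrative sketches only, not the actual ambiorix implementations
parse_raw <- function(body) body                 # leave the bytes alone
parse_text <- function(body) rawToChar(body)     # decode to a string
parse_json <- function(body) {
  yyjsonr::read_json_str(rawToChar(body))        # PR already depends on {yyjsonr}
}
parse_multipart <- function(body, content_type) {
  webutils::parse_multipart(body, content_type)  # and on {webutils}
}
```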

@kennedymwavu
Contributor Author

you raise good arguments there, and tbh i had not foreseen the impact of making the API murkier.

the user can create parse_req() for their own use, but i no longer think it's a good idea to include the function in ambiorix, coz in the end it might frustrate users in that one-off chance when it doesn't work, yet the function name evangelizes "it can parse anything".

at this point i'm not even sure/convinced about the suggested new parsers: parse_raw() & parse_text(). if you think about it, users will ultimately request more parsers, coz "how come we have a parser for content type A but not for content type B?"

i have thought about your initial suggestion of a req$parse() method + a map of content types & corresponding parsers which the user can override/define, and it's so far the best idea.

what would you expect req$parse() to do, between these 2 options?
a. return the parsed result so the user can assign it to a variable: body <- req$parse()
b. mutate the request object, so the user can then get the parsed body by doing something like req$body
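the two options could look like this on a minimal stand-in for the Request class (a base R environment instead of the real class, purely for illustration):

```r
# stand-in Request built from an environment; the real class is richer
new_request <- function(content_type) {
  self <- new.env()
  self$CONTENT_TYPE <- content_type
  self$body <- NULL
  # option (a): return the parsed result to the caller
  self$parse <- function() {
    list(type = self$CONTENT_TYPE)  # placeholder for real parsing
  }
  # option (b): mutate the request, then read req$body afterwards
  self$parse_in_place <- function() {
    self$body <- self$parse()
    invisible(self)
  }
  self
}
```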

ultimately, from where i stand now, i don't think this whole business of adding parsers to ambiorix is a good idea. so i'll just let this PR sit for a while as i mull over it.

@jrosell

jrosell commented Feb 6, 2025

We may need some examples of the final code we expect for raw, json (supported, configurable), and xml (user defined). I think we all have different final user code in mind.
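For instance, combining the ideas discussed in this thread, the final user code might read like this pseudo code (every API name here is hypothetical):

```r
# pseudo code: built-in parsers work out of the box, users register
# their own for anything else (e.g. XML via {xml2})
set_parser("application/xml", \(req) {
  xml2::read_xml(rawToChar(req$rook.input$read()))
})

app$get("/", \(req, res) {
  body <- req$parse()  # dispatches on req$CONTENT_TYPE via the map
  res$json(body)
})
```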

Successfully merging this pull request may close these issues: request parser

3 participants