
Commit 879f5b6

hide implementation detail and update readme (#855)
* hide implementation detail
* update readme

# Conflicts:
#	README.md

* add required template for friend classes
1 parent f8507a1 commit 879f5b6


6 files changed, +102 -117 lines changed

Diff for: README.md

+68-83
@@ -10,6 +10,9 @@ Ada is a fast and spec-compliant URL parser written in C++.
 Specification for URL parser can be found from the
 [WHATWG](https://url.spec.whatwg.org/#url-parsing) website.
 
+The Ada library also includes a [URLPattern](https://urlpattern.spec.whatwg.org/) implementation
+that is compatible with the [web-platform tests](https://github.com/web-platform-tests/wpt/tree/master/urlpattern).
+
 The Ada library passes the full range of tests from the specification,
 across a wide range of platforms (e.g., Windows, Linux, macOS). It fully
 supports the relevant [Unicode Technical Standard](https://www.unicode.org/reports/tr46/#ToUnicode).
@@ -19,16 +22,11 @@ The WHATWG URL specification has been adopted by most browsers. Other tools, su
 standard libraries, follow the RFC 3986. The following table illustrates possible differences in practice
 (encoding of the host, encoding of the path):
 
-| string source | string value |
-|:--------------|:--------------|
-| input string | https://www.7‑Eleven.com/Home/Privacy/Montréal |
+| string source | string value |
+|:------------------------|:------------------------------------------------------------|
+| input string | https://www.7‑Eleven.com/Home/Privacy/Montréal |
 | ada's normalized string | https://www.xn--7eleven-506c.com/Home/Privacy/Montr%C3%A9al |
-| curl 7.87 | (returns the original unchanged) |
-
-### Requirements
-
-The project is otherwise self-contained and it has no dependency.
-A recent C++ compiler supporting C++20. We test GCC 12 or better, LLVM 12 or better and Microsoft Visual Studio 2022.
+| curl 7.87 | (returns the original unchanged) |
 
 ## Ada is fast.
 
@@ -50,9 +48,12 @@ Ada has improved the performance of the popular JavaScript environment Node.js:
 
 The Ada library is used by important systems besides Node.js such as Redpanda, Kong, Telegram and Cloudflare Workers.
 
+[![the ada library](http://img.youtube.com/vi/tQ-6OWRDsZg/0.jpg)](https://www.youtube.com/watch?v=tQ-6OWRDsZg)<br />
 
+### Requirements
 
-[![the ada library](http://img.youtube.com/vi/tQ-6OWRDsZg/0.jpg)](https://www.youtube.com/watch?v=tQ-6OWRDsZg)<br />
+The project is otherwise self-contained and has no dependencies.
+It requires a recent C++ compiler supporting C++20. We test GCC 12 or better, LLVM 14 or better and Microsoft Visual Studio 2022.
 
 ## Installation
 
@@ -67,8 +68,8 @@ Linux or macOS users might follow the following instructions if they have a rece
 
 1. Pull the library in a directory
 ```
-wget https://github.com/ada-url/ada/releases/download/v2.6.10/ada.cpp
-wget https://github.com/ada-url/ada/releases/download/v2.6.10/ada.h
+wget https://github.com/ada-url/ada/releases/download/v3.0.0/ada.cpp
+wget https://github.com/ada-url/ada/releases/download/v3.0.0/ada.h
 ```
 2. Create a new file named `demo.cpp` with this content:
 ```C++
@@ -131,7 +132,7 @@ components (path, host, and so forth).
 - Parse and validate a URL from an ASCII or a valid UTF-8 string.
 
 ```cpp
-auto url = ada::parse("https://www.google.com");
+auto url = ada::parse<ada::url_aggregator>("https://www.google.com");
 if (url) { /* URL is valid */ }
 ```
 
@@ -140,89 +141,45 @@ accessing it when you are not sure that it will succeed. The following
 code is unsafe:
 
 ```cpp
-auto url = ada::parse("some bad url");
+auto url = ada::parse<ada::url_aggregator>("some bad url");
 url->get_href();
 ```
 
-You should do...
-
-```cpp
-auto url = ada::parse("some bad url");
-if(url) {
-// next line is now safe:
-url->get_href();
-} else {
-// report a parsing failure
-}
-```
-
 For simplicity, in the examples below, we skip the check because
 we know that parsing succeeds. All strings are assumed to be valid
 UTF-8 strings.
 
-### Examples
+## Examples
 
-- Get/Update credentials
+## URL Parser
 
-```cpp
-auto url = ada::parse("https://www.google.com");
-url->set_username("username");
+```c++
+auto url = ada::parse<ada::url_aggregator>("https://www.google.com");
+
+url->set_username("username"); // Update credentials
 url->set_password("password");
 // url->get_href() will return "https://username:password@www.google.com/"
-```
 
-- Get/Update Protocol
-
-```cpp
-auto url = ada::parse("https://www.google.com");
-url->set_protocol("wss");
+url->set_protocol("wss"); // Update protocol
 // url->get_protocol() will return "wss:"
-// url->get_href() will return "wss://www.google.com/"
-```
 
-- Get/Update host
-
-```cpp
-auto url = ada::parse("https://www.google.com");
-url->set_host("github.com");
+url->set_host("github.com"); // Update host
 // url->get_host() will return "github.com"
-// you can use `url.set_hostname` depending on your usage.
-```
 
-- Get/Update port
-
-```cpp
-auto url = ada::parse("https://www.google.com");
-url->set_port("8080");
+url->set_port("8080"); // Update port
 // url->get_port() will return "8080"
-```
 
-- Get/Update pathname
-
-```cpp
-auto url = ada::parse("https://www.google.com");
-url->set_pathname("/my-super-long-path")
+url->set_pathname("/my-super-long-path"); // Update pathname
 // url->get_pathname() will return "/my-super-long-path"
-```
-
-- Get/Update search/query
 
-```cpp
-auto url = ada::parse("https://www.google.com");
-url->set_search("target=self");
+url->set_search("target=self"); // Update search
 // url->get_search() will return "?target=self"
-```
-
-- Get/Update hash/fragment
 
-```cpp
-auto url = ada::parse("https://www.google.com");
-url->set_hash("is-this-the-real-life");
+url->set_hash("is-this-the-real-life"); // Update hash/fragment
 // url->get_hash() will return "#is-this-the-real-life"
 ```
-For more information about command-line options, please refer to the [CLI documentation](docs/cli.md).
 
-- URL search params
+### URL Search Params
 
 ```cpp
 ada::url_search_params search_params("a=b&c=d&e=f");
@@ -236,6 +193,40 @@ while (keys.has_next()) {
 }
 ```
 
+### URLPattern
+
+Our implementation doesn't provide a regex engine and leaves the decision of choosing the right engine to the user.
+This is a security measure: the default std::regex engine is not safe and is vulnerable to denial-of-service (ReDoS) attacks.
+Runtimes like Node.js and Cloudflare Workers use the V8 regex engine, which is safe and performant.
+
+```cpp
+// Define a regex engine that conforms to the following interface
+// For example, we will use the V8 regex engine
+
+class v8_regex_provider {
+public:
+v8_regex_provider() = default;
+using regex_type = v8::Global<v8::RegExp>;
+static std::optional<regex_type> create_instance(std::string_view pattern,
+bool ignore_case);
+static std::optional<std::vector<std::optional<std::string>>> regex_search(
+std::string_view input, const regex_type& pattern);
+static bool regex_match(std::string_view input, const regex_type& pattern);
+};
+
+// Define a URLPattern
+auto pattern = ada::parse_url_pattern<v8_regex_provider>("/books/:id(\\d+)", "https://example.com");
+
+// Check validity
+if (!pattern) { return EXIT_FAILURE; }
+
+// Match a URL
+auto match = pattern->match("https://example.com/books/123");
+
+// Test a URL
+auto matched = pattern->test("https://example.com/books/123");
+```
+
 ### C wrapper
 
 See the file `include/ada_c.h` for our C interface. We expect ASCII or UTF-8 strings.
@@ -298,23 +289,21 @@ c++ demo.o ada.o -o cdemo
 ./cdemo
 ```
 
+### Command-line interface
+
+For more information about command-line options, please refer to the [CLI documentation](docs/cli.md).
+
 ### CMake dependency
 
 See the file `tests/installation/CMakeLists.txt` for an example of how you might use ada from your own
 CMake project, after having installed ada on your system.
 
-## Installation
-
-### Homebrew
-
-Ada is available through [Homebrew](https://formulae.brew.sh/formula/ada-url#default).
-You can install Ada using `brew install ada-url`.
-
 ## Contributing
 
 ### Building
 
-Ada uses cmake as a build system. It's recommended you to run the following commands to build it locally.
+Ada uses cmake as a build system, but also supports Bazel. We recommend running the following
+commands to build it locally.
 
 Without tests:
 
@@ -325,16 +314,13 @@ With tests (requires git):
 - **Build**: `cmake -B build -DADA_TESTING=ON && cmake --build build`
 - **Test**: `ctest --output-on-failure --test-dir build`
 
-
 With tests (requires available local packages):
 
 - **Build**: `cmake -B build -DADA_TESTING=ON -D CPM_USE_LOCAL_PACKAGES=ON && cmake --build build`
 - **Test**: `ctest --output-on-failure --test-dir build`
 
 Windows users need additional flags to specify the build configuration, e.g. `--config Release`.
 
-
-
 The project can also be built via docker using default docker file of repository with following commands.
 
 `docker build -t ada-builder . && docker run --rm -it -v ${PWD}:/repo ada-builder`
@@ -352,5 +338,4 @@ Our tests include third-party code and data. The benchmarking code includes thir
 
 ### Further reading
 
-
 * Yagiz Nizipli, Daniel Lemire, [Parsing Millions of URLs per Second](https://doi.org/10.1002/spe.3296), Software: Practice and Experience 54(5) May 2024.

Diff for: include/ada/url_pattern-inl.h

+4-4
@@ -193,15 +193,15 @@ url_pattern_component<regex_provider>::compile(
 
 template <url_pattern_regex::regex_concept regex_provider>
 result<std::optional<url_pattern_result>> url_pattern<regex_provider>::exec(
-    const url_pattern_input& input, std::string_view* base_url) {
+    const url_pattern_input& input, const std::string_view* base_url) {
   // Return the result of match given this's associated URL pattern, input, and
   // baseURL if given.
   return match(input, base_url);
 }
 
 template <url_pattern_regex::regex_concept regex_provider>
-result<bool> url_pattern<regex_provider>::test(const url_pattern_input& input,
-                                               std::string_view* base_url) {
+result<bool> url_pattern<regex_provider>::test(
+    const url_pattern_input& input, const std::string_view* base_url) {
   // TODO: Optimization opportunity. Rather than returning `url_pattern_result`
   // Implement a fast path just like `can_parse()` in ada_url.
   // Let result be the result of match given this's associated URL pattern,
@@ -215,7 +215,7 @@ result<bool> url_pattern<regex_provider>::test(const url_pattern_input& input,
 
 template <url_pattern_regex::regex_concept regex_provider>
 result<std::optional<url_pattern_result>> url_pattern<regex_provider>::match(
-    const url_pattern_input& input, std::string_view* base_url_string) {
+    const url_pattern_input& input, const std::string_view* base_url_string) {
   std::string protocol{};
   std::string username{};
   std::string password{};
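
The `url_pattern-inl.h` diff above changes the optional base-URL parameter of `exec`, `test`, and `match` from `std::string_view*` to `const std::string_view*`, and the header gives it a `= nullptr` default. A minimal standalone sketch of what that buys callers, using a hypothetical `describe_base` function in place of the real member functions (it is not part of ada): the address of a `const std::string_view` is now accepted, and the base URL can be omitted entirely.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical stand-in for url_pattern::exec/test/match (not ada's API):
// the optional base URL is a pointer-to-const that defaults to nullptr.
std::string describe_base(std::string_view input,
                          const std::string_view* base_url = nullptr) {
  if (base_url == nullptr) {
    return std::string(input) + " (no base URL)";
  }
  return std::string(input) + " relative to " + std::string(*base_url);
}

// A constant base URL: taking its address yields `const std::string_view*`.
constexpr std::string_view kBase{"https://example.com"};

// Compile-time check of the const-correctness point: a
// `const std::string_view*` argument converts to the new parameter type
// but not to the old one, while a non-const pointer works with both.
using old_signature = std::string (*)(std::string_view, std::string_view*);
using new_signature = std::string (*)(std::string_view, const std::string_view*);
static_assert(!std::is_invocable_v<old_signature, std::string_view,
                                   const std::string_view*>);
static_assert(std::is_invocable_v<new_signature, std::string_view,
                                  const std::string_view*>);
static_assert(std::is_invocable_v<new_signature, std::string_view,
                                  std::string_view*>);
```

Taking the pointer to const also documents that the callee only reads the base URL, which is why the change is source-compatible for existing callers passing non-const pointers.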

Diff for: include/ada/url_pattern.h

+15-17
@@ -7,6 +7,8 @@
 
 #include "ada/implementation.h"
 #include "ada/expected.h"
+#include "ada/parser.h"
+#include "ada/url_pattern_init.h"
 
 #include <string>
 #include <unordered_map>
@@ -18,13 +20,6 @@
 #endif // ADA_TESTING
 
 namespace ada {
-namespace parser {
-template <typename result_type, typename url_pattern_init,
-          typename url_pattern_options, typename regex_provider>
-tl::expected<result_type, errors> parse_url_pattern_impl(
-    std::variant<std::string_view, url_pattern_init> input,
-    const std::string_view* base_url, const url_pattern_options* options);
-} // namespace parser
 
 enum class url_pattern_part_type : uint8_t {
   // The part represents a simple fixed text string.
@@ -234,20 +229,23 @@ class url_pattern {
   /**
    * @see https://urlpattern.spec.whatwg.org/#dom-urlpattern-exec
    */
-  result<std::optional<url_pattern_result>> exec(const url_pattern_input& input,
-                                                 std::string_view* base_url);
+  result<std::optional<url_pattern_result>> exec(
+      const url_pattern_input& input,
+      const std::string_view* base_url = nullptr);
 
   /**
    * @see https://urlpattern.spec.whatwg.org/#dom-urlpattern-test
    */
-  result<bool> test(const url_pattern_input& input, std::string_view* base_url);
+  result<bool> test(const url_pattern_input& input,
+                    const std::string_view* base_url = nullptr);
 
   /**
    * @see https://urlpattern.spec.whatwg.org/#url-pattern-match
    * This function expects a valid UTF-8 string if input is a string.
    */
   result<std::optional<url_pattern_result>> match(
-      const url_pattern_input& input, std::string_view* base_url_string);
+      const url_pattern_input& input,
+      const std::string_view* base_url_string = nullptr);
 
   // @see https://urlpattern.spec.whatwg.org/#dom-urlpattern-protocol
   [[nodiscard]] std::string_view get_protocol() const ada_lifetime_bound;
@@ -286,6 +284,12 @@ class url_pattern {
   }
 #endif // ADA_TESTING
 
+  template <url_pattern_regex::regex_concept P>
+  friend tl::expected<url_pattern<P>, errors> parser::parse_url_pattern_impl(
+      std::variant<std::string_view, url_pattern_init> input,
+      const std::string_view* base_url, const url_pattern_options* options);
+
+ private:
   url_pattern_component<regex_provider> protocol_component{};
   url_pattern_component<regex_provider> username_component{};
   url_pattern_component<regex_provider> password_component{};
@@ -295,12 +299,6 @@ class url_pattern {
   url_pattern_component<regex_provider> search_component{};
   url_pattern_component<regex_provider> hash_component{};
   bool ignore_case_ = false;
-
-  template <typename result_type, typename url_pattern_init,
-            typename url_pattern_options, typename regex_provider_for_parse_url>
-  friend tl::expected<result_type, errors> parser::parse_url_pattern_impl(
-      std::variant<std::string_view, url_pattern_init> input,
-      const std::string_view* base_url, const url_pattern_options* options);
 };
 
 } // namespace ada
