Commit dd89e88

Improve documentation and fix some version URI error (#21)
1 parent 3c533d9 commit dd89e88

5 files changed, +67 -16 lines changed

README.md

+54 -2
@@ -25,8 +25,6 @@ Currently, R3 offers a simple gRPC service that could be deployed easily at loca
 
 The simple server is the best way to get started; it can steadily serve 500+ SkyWalking services * 3000 URIs per minute.
 
-TODO: Fault tolerance and persistence are not implemented yet.
-
 To run the R3 service on localhost:
 
 ```bash
@@ -39,6 +37,60 @@ To deploy as a container:
 docker run -d --name r3 -p 17128:17128 r3:latest
 ```
 
+### Demo
+
+#### RESTful Pattern Recognition
+
+The following URLs would be recognized as the pattern `/api/users/{var}`, since the last segment of the URL differs for each instance.
+
+* /api/users/cbf11b02ea464447b507e8852c32190a
+* /api/users/5e363a4a18b7464b8cbff1a7ee4c91ca
+* /api/users/44cf77fc351f4c6c9c4f1448f2f12800
+* /api/users/38d3be5f9bd44f7f98906ea049694511
+* /api/users/5ad14302e7924f4aa1d60e58d65b3dd2
+
+#### Word Detection
+
+The following URLs would be kept as they are and not parametrized, since every segment of each URL is a word.
+
+* /api/sale
+* /api/product_sale
+* /api/ProductSale
+
+#### Lower Sample Count
+
+The following URLs would be kept as they are and not parametrized while the sample count is lower than the threshold (`combine_min_url_count`).
+If the sample count is equal to or greater than the threshold, the URLs would be parametrized.
+
+For example, if the threshold is `3`, the following URLs would be kept as they are and not parametrized.
+
+* /api/fetch1
+* /api/fetch2
+
+But the following URLs would be parametrized to `/api/{var}`, since the sample count reaches the threshold.
+
+* /api/fetch1
+* /api/fetch2
+* /api/fetch3
+
+#### Versioned API
+
+If a segment of the URI is a version number, such as `v1`, `v2`, or `v3`, that segment would not be parametrized.
+
+For example, the following URLs would not be parametrized:
+
+* /test/v1
+* /test/v2
+* /test/v3
+
+The other segments of the URI can still be parametrized. For example, the following URIs would be parametrized to `/test/v1/{var}` and `/test/v999/{var}`.
+
+* /test/v1/cbf11b02ea464447b507e8852c32190a
+* /test/v1/5e363a4a18b7464b8cbff1a7ee4c91ca
+* /test/v1/38d3be5f9bd44f7f98906ea049694511
+* /test/v999/1
+* /test/v999/2
+* /test/v999/3
 
 ### Algorithm: URIDrain
 If you are curious how the algorithm actually works or decide to improve upon it, please first read the [URIDrain Overview](models/README.md) and check out the algorithm live demo by running the commands below:
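
The Demo cases added to README.md in this hunk can be approximated with a toy sketch. This is not the URIDrain algorithm and not the live demo the README refers to: `toy_parametrize` and `looks_like_word` are hypothetical helpers, the word check is a crude letters-only stand-in for corpus-based detection, and only the last URL segment is considered.

```python
import re
from collections import defaultdict

# Mirrors the `v\d+` version pattern named in models/README.md.
VERSION_RE = re.compile(r"v\d+")


def looks_like_word(token: str) -> bool:
    """Crude stand-in for corpus-based word detection: letters and underscores only."""
    return re.fullmatch(r"[A-Za-z_]+", token) is not None


def toy_parametrize(urls, combine_min_url_count=3):
    """Map each URL to itself, or to its prefix plus {var} for the last segment."""
    # Collect the distinct last segments seen under each prefix.
    groups = defaultdict(set)
    for url in urls:
        prefix, _, last = url.rpartition("/")
        groups[prefix].add(last)

    result = {}
    for url in urls:
        prefix, _, last = url.rpartition("/")
        # Words and version tokens are never candidates for masking.
        candidates = {
            t for t in groups[prefix]
            if not looks_like_word(t) and not VERSION_RE.fullmatch(t)
        }
        if last in candidates and len(candidates) >= combine_min_url_count:
            result[url] = f"{prefix}/{{var}}"
        else:
            result[url] = url
    return result


if __name__ == "__main__":
    for original, pattern in toy_parametrize(
        ["/api/fetch1", "/api/fetch2", "/api/fetch3", "/test/v1", "/test/v2"]
    ).items():
        print(original, "->", pattern)
```

With the default threshold of `3`, two `/api/fetchN` samples stay unchanged while three collapse to `/api/{var}`, and `/test/v1` style URLs are left alone regardless of count.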

demo/uri_drain.ini

+1 -1
@@ -33,7 +33,7 @@ depth = 4
 max_children = 100
 max_clusters = 1024
 extra_delimiters = ["/"]
-combine_min_url_count = ${DRAIN_COMBINE_MIN_URL_COUNT:8}
+combine_min_url_count = ${DRAIN_COMBINE_MIN_URL_COUNT:3}
 
 [PROFILING]
 enabled = True
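
The `${DRAIN_COMBINE_MIN_URL_COUNT:3}` value reads like an "environment variable with a default" placeholder, which matches the environment-variable and default columns in models/Configuration.md. How the project actually resolves it is not shown in this diff, so the following is only a minimal sketch under that assumption; `resolve_placeholders` is a hypothetical helper.

```python
import os
import re

# Matches ${NAME:default}; NAME is an environment variable, default is the fallback text.
_PLACEHOLDER = re.compile(r"\$\{(\w+):([^}]*)\}")


def resolve_placeholders(value: str, env=None) -> str:
    """Replace each ${NAME:default} with env[NAME] if set, else with the default."""
    env = os.environ if env is None else env
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(2)), value)


if __name__ == "__main__":
    raw = "${DRAIN_COMBINE_MIN_URL_COUNT:3}"
    # Falls back to the default when the variable is not set.
    print(int(resolve_placeholders(raw, env={})))
    # An explicit value overrides the default.
    print(int(resolve_placeholders(raw, env={"DRAIN_COMBINE_MIN_URL_COUNT": "8"})))
```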

models/Configuration.md

+1 -1
@@ -36,7 +36,7 @@ Drain is the core algorithm of URI Drain.
 | max_clusters | int | DRAIN_MAX_CLUSTERS | 1024 | Max number of tracked clusters (unlimited by default). When this number is reached, the model starts replacing old clusters with new ones according to the LRU policy. |
 | extra_delimiters | string | DRAIN_EXTRA_DELIMITERS | \["/"\] | The extra delimiters to split the sequence. |
 | analysis_min_url_count | int | DRAIN_ANALYSIS_MIN_URL_COUNT | 20 | The minimum number of unique URLs (per service) to trigger the analysis. |
-| combine_min_url_count | int | DRAIN_COMBINE_MIN_URL_COUNT | 8 | The minimum number of unique URLs (candidates of each service) to mask as a variable URL (in case some similar URLs are not RESTful, such as `/test/one` and `/test/two`). |
+| combine_min_url_count | int | DRAIN_COMBINE_MIN_URL_COUNT | 3 | The minimum number of unique URLs (candidates of each service) to mask as a variable URL (in case some similar URLs are not RESTful, such as `/test/one` and `/test/two`). |
 
 ### Profiling
 

models/README.md

+2 -5
@@ -28,7 +28,8 @@ the URI domain. Which includes:
 3. The URIDrain algorithm doesn't involve pre-masking of the URI sequences to prevent false assumptions.
 4. The URIDrain algorithm takes preceding and subsequent URI tokens into account when deciding if a matched cluster
    should be updated.
-5. **TODO**: The URIDrain algorithm optionally uses an English Corpus to help identify likely non-parameter tokens.
+5. The URIDrain algorithm uses an [English Corpus](https://github.com/sloria/TextBlob) to help identify likely non-parameter tokens.
+6. The URIDrain algorithm supports versioned API (`v\d+`) detection to prevent version segments from being parametrized.
 
 **Known Caveats**:
 The algorithm may provide false clustering in some edge cases (although it doesn't hurt at all in APM scenarios).
@@ -64,7 +65,3 @@ This project relies on gRPC to communicate with the Apache SkyWalking AI pipeline.
 in the `server/proto/` folder.
 
 Compile the proto by running `make gen`, or simply `make env` if you are getting started from a bare environment.
-
-### TODO
-Try catch statements to handle uncovered algorithm errors
-

models/uri_drain/uri_drain.py

+9 -7
@@ -620,17 +620,19 @@ def create_template(self, seq1, seq2):
     # self.logger.debug(f'tokens of sequence2 = {seq2}')
     return "rejected"
 # ASSUMPTION: A subsequent token to version number cannot be a param
-if pre_token is not None and pre_token.startswith(
-        'v') and pre_token[1:].isdigit():
-    # self.logger.debug('pre_token is a version number, so current token cannot be a param (assumption)')
-    # self.logger.debug(f'tokens of sequence2 = {seq2}')
-    return "rejected"
+# The check below is disabled because a param path must be permitted after a version number path,
+# e.g. /test/v1/abcdef and /test/v1/bcdefg should be merged into /test/v1/{var}
+# if pre_token is not None and pre_token.startswith(
+# 'v') and pre_token[1:].isdigit():
+# # self.logger.debug('pre_token is a version number, so current token cannot be a param (assumption)')
+# # self.logger.debug(f'tokens of sequence2 = {seq2}')
+# return "rejected"
 if token1.startswith('v') and token1[1:].isdigit():
     # self.logger.debug('token1 is a version number, so current token cannot be a param (assumption)')
     # self.logger.debug(f'tokens of sequence2 = {seq2}')
     return "rejected"
-if pre_token and self.has_numbers(pre_token):
-    # Based on assumption that no two consecutive tokens can be params
+if pre_token and (not pre_token.startswith('v')) and self.has_numbers(pre_token):
+    # Based on assumption that no two consecutive tokens can be params (unless the pre token is versioned)
     # So attempt to change this position must ensure that the previous token is not a param
     # self.logger.debug('pre_token has numbers, so current token cannot be a param (assumption)')
     # self.logger.debug(f'tokens of sequence2 = {seq2}')
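
To make the effect of this hunk easier to see, the `pre_token` guards before and after the commit can be isolated as standalone predicates. This is a sketch, not the project's code: `blocked_before`, `blocked_after`, and `is_version_token` are hypothetical names, and the body of `has_numbers` is an assumption, since the real `self.has_numbers` is not shown in the diff.

```python
def has_numbers(token: str) -> bool:
    """Assumed behaviour of self.has_numbers: True if any character is a digit."""
    return any(ch.isdigit() for ch in token)


def is_version_token(token: str) -> bool:
    """The version check used in the hunk: 'v' followed only by digits."""
    return token.startswith("v") and token[1:].isdigit()


def blocked_before(pre_token) -> bool:
    """Pre-commit guards: would the current token be rejected because of pre_token?"""
    if pre_token is not None and is_version_token(pre_token):
        return True
    return bool(pre_token) and has_numbers(pre_token)


def blocked_after(pre_token) -> bool:
    """Post-commit guard: tokens starting with 'v' no longer block the next token."""
    return bool(pre_token) and not pre_token.startswith("v") and has_numbers(pre_token)


if __name__ == "__main__":
    for tok in ["v1", "v999", "user42", "users", "video123", None]:
        print(repr(tok), blocked_before(tok), blocked_after(tok))
```

Under these assumptions, `v1` and `v999` block the following token before the commit and allow it afterwards, which is what lets `/test/v1/{var}` form. Note that the new guard tests only `startswith('v')`, so a token such as the hypothetical `video123` is also exempted even though it is not a `v\d+` version.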
