
Sanity checks: number of inserted rows #2

Open
Elenetta17 opened this issue Nov 18, 2024 · 2 comments

Comments

@Elenetta17
Member

Currently, there is no mechanism to verify that the content of the results__ tables matches the content of the tables converted to the scamper1 format. A basic sanity check would be to compare the number of distinct destination addresses on both sides.

When the `upload-tables.sh` script runs, it should be possible to extract the number of affected rows (i.e., the count of distinct destination addresses) from its logs. For example, the logs contain an entry like `Number of affected rows: 3218865`.
This number should match the result of the following query:

```sql
WITH
    -- All distinct (round, TTL, reply address) observations of one flow
    groupUniqArray((round, probe_ttl, reply_src_addr)) AS traceroute,
    arrayMap(x -> x.2, traceroute) AS ttls,
    arrayMap(x -> (x.1, x.3), traceroute) AS val,
    -- TTL -> (round, reply address); a TTL with no reply looks up as the default (0, '::')
    CAST((ttls, val), 'Map(UInt8, Tuple(UInt8, IPv6))') AS map,
    arrayMin(ttls) AS first_ttl,
    arrayMax(ttls) AS last_ttl,
    -- One (near_ttl, far_ttl, near, far) tuple per consecutive TTL pair
    arrayMap(i -> (toUInt8(i), toUInt8(i + 1), map[toUInt8(i)], map[toUInt8(i + 1)]), range(first_ttl, last_ttl)) AS links,
    arrayJoin(links) AS link
SELECT COUNT(DISTINCT probe_dst_addr)
FROM (
    SELECT
        probe_protocol,
        probe_src_addr,
        probe_dst_prefix,
        probe_dst_addr,
        probe_src_port,
        probe_dst_port,
        link.1 AS near_ttl,
        link.2 AS far_ttl,
        link.3.2 AS near_addr,
        link.4.2 AS far_addr
    FROM <results__ table>  -- placeholder: the results__ table being checked
    GROUP BY
        probe_protocol,
        probe_src_addr,
        probe_dst_prefix,
        probe_dst_addr,
        probe_src_port,
        probe_dst_port
)
-- Keep only links whose near and far hops both replied
WHERE toString(near_addr) != '::' AND toString(far_addr) != '::'
```

Example query result: `{"uniqExact(probe_dst_addr)":3218865}`
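Once both numbers are available, the comparison itself is mechanical. Here is a minimal Python sketch, assuming the log entry has exactly the `Number of affected rows: N` form quoted above and the query result is a single-row JSON object like the example; the function names are hypothetical and not part of the repository. It only handles one count per log: if a log covers several uploaded tables, each count would need to be paired with its own table.

```python
import json
import re

# Assumed log format, based on the example entry quoted in this issue.
AFFECTED_ROWS_RE = re.compile(r"Number of affected rows:\s*(\d+)")


def affected_rows_from_log(log_text: str) -> int:
    """Return the first 'Number of affected rows' count found in the log."""
    match = AFFECTED_ROWS_RE.search(log_text)
    if match is None:
        raise ValueError("no 'Number of affected rows' entry in log")
    return int(match.group(1))


def distinct_dsts_from_query(result_json: str) -> int:
    """Return the single value of a one-column JSON row such as
    '{"uniqExact(probe_dst_addr)":3218865}'."""
    row = json.loads(result_json)
    if len(row) != 1:
        raise ValueError(f"expected exactly one column, got {list(row)}")
    (value,) = row.values()
    # int() accepts both 3218865 and "3218865" (64-bit integers may be quoted).
    return int(value)


def sanity_check(log_text: str, result_json: str) -> bool:
    """True when the inserted-row count equals the distinct-destination count."""
    return affected_rows_from_log(log_text) == distinct_dsts_from_query(result_json)


if __name__ == "__main__":
    log = "...\nNumber of affected rows: 3218865\n..."
    result = '{"uniqExact(probe_dst_addr)":3218865}'
    print(sanity_check(log, result))  # True
```

Accepting a quoted number is a deliberate choice: depending on output settings, ClickHouse may quote 64-bit integers in JSON, and the count is a 64-bit value.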

@SaiedKazemi
Member

@Elenetta17, thank you for reporting this issue. I'll look into it and keep you updated on my progress.

@Elenetta17
Member Author

After applying the cap on the flow ID, the query needs a slight change: it must also keep only rows with `probe_src_port < 28096`:

```sql
WITH
    groupUniqArray((round, probe_ttl, reply_src_addr)) AS traceroute,
    arrayMap(x -> x.2, traceroute) AS ttls,
    arrayMap(x -> (x.1, x.3), traceroute) AS val,
    CAST((ttls, val), 'Map(UInt8, Tuple(UInt8, IPv6))') AS map,
    arrayMin(ttls) AS first_ttl,
    arrayMax(ttls) AS last_ttl,
    arrayMap(i -> (toUInt8(i), toUInt8(i + 1), map[toUInt8(i)], map[toUInt8(i + 1)]), range(first_ttl, last_ttl)) AS links,
    arrayJoin(links) AS link
SELECT COUNT(DISTINCT probe_dst_addr)
FROM (
    SELECT
        probe_protocol,
        probe_src_addr,
        probe_dst_prefix,
        probe_dst_addr,
        probe_src_port,
        probe_dst_port,
        link.1 AS near_ttl,
        link.2 AS far_ttl,
        link.3.2 AS near_addr,
        link.4.2 AS far_addr
    FROM <results__ table>  -- placeholder: the results__ table being checked
    GROUP BY
        probe_protocol,
        probe_src_addr,
        probe_dst_prefix,
        probe_dst_addr,
        probe_src_port,
        probe_dst_port
)
WHERE
    (toString(near_addr) != '::' AND toString(far_addr) != '::') AND
    probe_src_port < 28096  -- only flows within the flow-ID cap
```
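To make the filter easier to reason about, the query's logic can be modelled in plain Python. This is a minimal sketch, assuming the rows of a results__ table have been loaded into memory as the hypothetical `Reply` tuples below; `count_destinations_with_links` and `src_port_cap` are illustrative names, not part of the project. A destination is counted when at least one of its flows has two consecutive TTLs that both have a reply. In ClickHouse, looking up a missing key in a `Map` returns the default value of the value type, so a TTL without a reply reads back as `(0, '::')`, which is exactly what the `!= '::'` condition rejects. Passing `src_port_cap=28096` mirrors the extra `probe_src_port < 28096` condition of the revised query.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional, Set, Tuple

# Address that a TTL with no reply reads back as in the query's Map lookup.
MISSING = "::"


class Reply(NamedTuple):
    """One row of a results__ table (only the columns the query uses)."""
    probe_protocol: int
    probe_src_addr: str
    probe_dst_prefix: str
    probe_dst_addr: str
    probe_src_port: int
    probe_dst_port: int
    round: int
    probe_ttl: int
    reply_src_addr: str


def count_destinations_with_links(
    rows: Iterable[Reply], src_port_cap: Optional[int] = None
) -> int:
    """Count distinct probe_dst_addr values with at least one complete link."""
    # GROUP BY the six flow columns; collect groupUniqArray((round, probe_ttl, reply_src_addr))
    flows: Dict[tuple, Set[Tuple[int, int, str]]] = defaultdict(set)
    for r in rows:
        flows[tuple(r[:6])].add((r.round, r.probe_ttl, r.reply_src_addr))

    destinations: Set[str] = set()
    for flow, traceroute in flows.items():
        probe_dst_addr, probe_src_port = flow[3], flow[4]
        # Optional equivalent of "probe_src_port < 28096" in the revised query
        if src_port_cap is not None and probe_src_port >= src_port_cap:
            continue
        # TTL -> reply address. If a TTL was observed several times (e.g. in
        # different rounds) this keeps an arbitrary one; the sketch does not
        # try to reproduce which entry ClickHouse would return.
        addr_at = {ttl: addr for _round, ttl, addr in traceroute}
        first_ttl, last_ttl = min(addr_at), max(addr_at)
        # Links (i, i + 1) for i in [first_ttl, last_ttl), like range() in the query
        for i in range(first_ttl, last_ttl):
            near = addr_at.get(i, MISSING)
            far = addr_at.get(i + 1, MISSING)
            if near != MISSING and far != MISSING:
                destinations.add(probe_dst_addr)
                break  # one complete link is enough to count this destination
    return len(destinations)
```

For instance, a flow with replies at TTLs 1 and 2 is counted, while a flow with replies only at TTLs 1 and 3 is not, because both of its candidate links touch the missing TTL 2. Adding the cap then drops any counted flow whose source port is at or above it.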
