Commit 750becf

chore: initial commit
1 parent 4d39460 commit 750becf

4 files changed: +169 -1 lines changed

README.md

+38 -1
Original file line number | Diff line number | Diff line change
@@ -1 +1,38 @@
1-
# machine-learning-interview
1+
# ML Engineer System Design Interview
2+
3+
## **System Design Task:**
4+
5+
You are tasked with designing and building a machine learning system that can detect malicious or vulnerable code inside a codebase. The system will be integrated into a GitHub repository and should automatically detect potentially harmful code changes and flag them before they are pushed into production.
6+
7+
You will have access to an internal dataset consisting of bugs and vulnerabilities (called _findings_) discovered in other repositories. These findings include the following information:
8+
9+
[code_files.json](code_files.json)
10+
11+
[findings_file_links.json](findings_file_links.json)
12+
13+
[findings_example.json](findings_example.json)
14+
15+
- The repository where the finding was found
16+
- The file path
17+
- The actual code of the file
18+
- An explanation of the bug or vulnerability
19+
- The severity of the finding (informational, low, medium, high, critical)
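
The three files appear to be related by ID: each entry in `findings_file_links.json` carries a `finding_id` and a `file_id`. A minimal Python sketch of joining them into per-finding records, assuming those IDs resolve to the `id` fields of the other two files; `join_findings` and the output record layout are hypothetical, not part of the dataset:

```python
from typing import Dict, Iterator, List


def join_findings(
    code_files: List[dict], findings: List[dict], links: List[dict]
) -> Iterator[Dict[str, object]]:
    """Yield one record per link whose finding and file are both present."""
    files_by_id = {f["id"]: f for f in code_files}
    findings_by_id = {f["id"]: f for f in findings}
    for link in links:
        code_file = files_by_id.get(link["file_id"])
        finding = findings_by_id.get(link["finding_id"])
        if code_file is None or finding is None:
            continue  # unresolved reference: skip rather than guess
        yield {
            "repository_id": code_file["repository_id"],
            "path": code_file["path"],
            "language": code_file["language"],
            "content": code_file["content"],
            "lines": link["lines"],  # kept as the raw string, e.g. "[687,688)"
            "title": finding["title"],
            "description": finding["description"],
            "severity": finding["severity"],
        }
```

Links that do not resolve are skipped, which matters for the sample files: the example links may point at findings and files that are not included in the excerpts.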
20+
21+
### Key Requirements:
22+
23+
- **GitHub Integration**: The system must integrate with a GitHub repository and trigger on code changes (e.g., pull requests or commits). How you integrate with GitHub is up to you. Feel free to ask clarifying questions.
24+
- **Training Data**: Use the internal _findings_ dataset to train a model capable of detecting potentially malicious or harmful code changes. You can use other data sources if you deem them necessary.
25+
- **Model Performance**: The model should strike a balance between minimizing false positives and ensuring high recall, while giving feedback quickly enough to be useful during code review cycles.
26+
- **Explainability**: The system must be able to provide a clear explanation of why certain lines of code were flagged as potentially harmful, so human reviewers can understand and verify the reasoning.
27+
- **Evolving Threats**: The system should be able to adapt over time to new types of vulnerabilities and attack patterns. Consider how you would re-train or update the model to handle evolving threats.
28+
- **Scalability**: The solution must scale to handle hundreds of repositories and potentially large codebases.
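
One possible way to trigger on pull requests or commits is a GitHub webhook. When a webhook secret is configured, GitHub signs each delivery with an HMAC-SHA256 of the request body and sends it in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. A minimal sketch of checking that signature before any payload is processed, assuming a webhook-based integration; the function name and the secret value are placeholders:

```python
import hashlib
import hmac


def verify_github_signature(secret: bytes, payload: bytes, signature_header: str) -> bool:
    """Return True if the header matches the HMAC-SHA256 of the raw body."""
    expected = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison; bytes are used so non-ASCII input cannot raise.
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))


# Usage with a made-up secret and body (both hypothetical):
secret = b"example-secret"
body = b'{"action": "opened"}'
header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
is_valid = verify_github_signature(secret, body, header)
```

Rejecting unsigned or mis-signed deliveries early also bears on the security considerations of the task, since the scanner would otherwise accept arbitrary input.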
29+
30+
### Your Task:
31+
32+
- **Architecture**: Design a high-level architecture for this system, including key components such as data ingestion, model training, prediction, and feedback loops. Outline how you would structure the interaction with the GitHub repository, keeping performance and scalability in mind.
33+
- **Modeling Approach**: Discuss the type of machine learning model(s) you would use. Would you leverage classical models, deep learning, or transformers? Explain your choice of model architecture, including how you would extract features from code, handle imbalanced datasets, and any other relevant preprocessing steps.
34+
- **Trade-offs**: Consider the trade-offs between accuracy, performance, explainability, and scalability in your design. How would you minimize false positives while ensuring the model catches as much malicious code as possible?
35+
- **Adaptation and Feedback**: Describe how the system could learn from new code submissions and adapt to new attack patterns over time. Would you implement online learning, periodic re-training, or another method to ensure the model evolves with new threats?
36+
- **Security Considerations**: Since the system will operate in a security-critical context, discuss any additional security measures or best practices you would apply to ensure the model itself is not compromised or used maliciously.
37+
38+
During the discussion, we will dive into specific technical details, trade-offs, and any additional assumptions or design decisions you make.
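
The false-positive versus recall trade-off can be made concrete by choosing a decision threshold on held-out data. A minimal sketch, assuming the model emits a score in [0, 1] per code change and that labels are 1 for a true finding and 0 otherwise; the function names and the sample numbers are illustrative only:

```python
from typing import List, Tuple


def precision_recall(scores: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold is flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pick_threshold(scores: List[float], labels: List[int], min_recall: float) -> Tuple[float, float, float]:
    """Return (threshold, precision, recall) for the highest threshold
    whose recall is at least min_recall, i.e. the one flagging the fewest changes."""
    for t in sorted(set(scores), reverse=True):
        p, r = precision_recall(scores, labels, t)
        if r >= min_recall:
            return t, p, r
    raise ValueError("no threshold reaches the requested recall")


# Made-up validation scores and labels:
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]
threshold, precision, recall = pick_threshold(scores, labels, min_recall=1.0)
```

Fixing a recall floor and then taking the strictest threshold that meets it is only one policy; a cost-weighted objective would be an equally reasonable choice.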

code_files.json

+46
@@ -0,0 +1,46 @@
1+
[
2+
{
3+
"id": "00002e0d-170a-4be6-971f-8129bca8b435",
4+
"repository_id": "c90131b4-5c7c-4ebc-a1f3-8002d219bfe0",
5+
"path": "blast-optimism/cannon/mipsevm/open_mips_tests/test/andi.asm",
6+
"content": "###############################################################################\n# File : andi.asm\n# Project : MIPS32 MUX\n# Author: : Grant Ayers ([email protected])\n#\n# Standards/Formatting:\n# MIPS gas, soft tab, 80 column\n#\n# Description:\n# Test the functionality of the 'andi' instruction.\n#\n###############################################################################\n\n .section .test, 'x'\n .balign 4\n .set noreorder\n .global test\n .ent test\ntest:\n lui $s0, 0xbfff # Load the base address 0xbffffff0\n ori $s0, 0xfff0\n ori $s1, $0, 1 # Prepare the 'done' status\n\n #### Test code start ####\n\n ori $t0, $0, 0xcafe # A = 0xcafe\n andi $t1, $t0, 0xaaaa # B = A & 0xaaaa = 0x8aaa\n andi $t2, $t1, 0x5555 # C = B & 0x5555 = 0\n sltiu $v0, $t2, 1\n\n #### Test code end ####\n\n sw $v0, 8($s0) # Set the test result\n sw $s1, 4($s0) # Set 'done'\n\n$done:\n jr $ra\n nop\n\n .end test",
7+
"language": "Assembly",
8+
"symbolic_link": false,
9+
"in_scope": true,
10+
"mode": 436,
11+
"last_modified": "2024-01-30 14:37:11+00"
12+
},
13+
{
14+
"id": "000b3742-f5b6-427c-998c-42e41ba4c16b",
15+
"repository_id": "c90131b4-5c7c-4ebc-a1f3-8002d219bfe0",
16+
"path": "blast-geth/build/ci-notes.md",
17+
"content": "# Debian Packaging\n\nTagged releases and develop branch commits are available as installable Debian packages\nfor Ubuntu. Packages are built for the all Ubuntu versions which are supported by\nCanonical.\n\nPackages of develop branch commits have suffix -unstable and cannot be installed alongside\nthe stable version. Switching between release streams requires user intervention.\n\n## Launchpad\n\nThe packages are built and served by launchpad.net. We generate a Debian source package\nfor each distribution and upload it. Their builder picks up the source package, builds it\nand installs the new version into the PPA repository. Launchpad requires a valid signature\nby a team member for source package uploads.\n\nThe signing key is stored in an environment variable which Travis CI makes available to\ncertain builds. Since Travis CI doesn't support FTP, SFTP is used to transfer the\npackages. To set this up yourself, you need to create a Launchpad user and add a GPG key\nand SSH key to it. Then encode both keys as base64 and configure 'secret' environment\nvariables `PPA_SIGNING_KEY` and `PPA_SSH_KEY` on Travis.\n\nWe want to build go-ethereum with the most recent version of Go, irrespective of the Go\nversion that is available in the main Ubuntu repository. In order to make this possible,\nwe bundle the entire Go sources into our own source archive and start the built job by\ncompiling Go and then using that to build go-ethereum. On Trusty we have a special case\nrequiring the `~gophers/ubuntu/archive` PPA since Trusty can't even build Go itself. 
PPA\ndeps are set at https://launchpad.net/%7Eethereum/+archive/ubuntu/ethereum/+edit-dependencies\n\n## Building Packages Locally (for testing)\n\nYou need to run Ubuntu to do test packaging.\n\nInstall any version of Go and Debian packaging tools:\n\n $ sudo apt-get install build-essential golang-go devscripts debhelper python-bzrlib python-paramiko\n\nCreate the source packages:\n\n $ go run build/ci.go debsrc -workdir dist\n\nThen go into the source package directory for your running distribution and build the package:\n\n $ cd dist/ethereum-unstable-1.9.6+bionic\n $ dpkg-buildpackage\n\nBuilt packages are placed in the dist/ directory.\n\n $ cd ..\n $ dpkg-deb -c geth-unstable_1.9.6+bionic_amd64.deb",
18+
"language": "Markdown",
19+
"symbolic_link": false,
20+
"in_scope": true,
21+
"mode": 436,
22+
"last_modified": "2024-01-30 14:37:11+00"
23+
},
24+
{
25+
"id": "000b48bf-30fb-4939-94ab-5227ee93a817",
26+
"repository_id": "c90131b4-5c7c-4ebc-a1f3-8002d219bfe0",
27+
"path": "blast-optimism/packages/contracts-bedrock/src/libraries/trie/MerkleTrie.sol",
28+
"content": "# Debian Packaging\n\nTagged releases and develop branch commits are available as installable Debian packages\nfor Ubuntu. Packages are built for the all Ubuntu versions which are supported by\nCanonical.\n\nPackages of develop branch commits have suffix -unstable and cannot be installed alongside\nthe stable version. Switching between release streams requires user intervention.\n\n## Launchpad\n\nThe packages are built and served by launchpad.net. We generate a Debian source package\nfor each distribution and upload it. Their builder picks up the source package, builds it\nand installs the new version into the PPA repository. Launchpad requires a valid signature\nby a team member for source package uploads.\n\nThe signing key is stored in an environment variable which Travis CI makes available to\ncertain builds. Since Travis CI doesn't support FTP, SFTP is used to transfer the\npackages. To set this up yourself, you need to create a Launchpad user and add a GPG key\nand SSH key to it. Then encode both keys as base64 and configure 'secret' environment\nvariables `PPA_SIGNING_KEY` and `PPA_SSH_KEY` on Travis.\n\nWe want to build go-ethereum with the most recent version of Go, irrespective of the Go\nversion that is available in the main Ubuntu repository. In order to make this possible,\nwe bundle the entire Go sources into our own source archive and start the built job by\ncompiling Go and then using that to build go-ethereum. On Trusty we have a special case\nrequiring the `~gophers/ubuntu/archive` PPA since Trusty can't even build Go itself. 
PPA\ndeps are set at https://launchpad.net/%7Eethereum/+archive/ubuntu/ethereum/+edit-dependencies\n\n## Building Packages Locally (for testing)\n\nYou need to run Ubuntu to do test packaging.\n\nInstall any version of Go and Debian packaging tools:\n\n $ sudo apt-get install build-essential golang-go devscripts debhelper python-bzrlib python-paramiko\n\nCreate the source packages:\n\n $ go run build/ci.go debsrc -workdir dist\n\nThen go into the source package directory for your running distribution and build the package:\n\n $ cd dist/ethereum-unstable-1.9.6+bionic\n $ dpkg-buildpackage\n\nBuilt packages are placed in the dist/ directory.\n\n $ cd ..\n $ dpkg-deb -c geth-unstable_1.9.6+bionic_amd64.deb",
29+
"language": "Solidity",
30+
"symbolic_link": false,
31+
"in_scope": true,
32+
"mode": 436,
33+
"last_modified": "2024-01-30 14:37:11+00"
34+
},
35+
{
36+
"id": "000c66a2-97da-4f4a-8084-efe57c0324b0",
37+
"repository_id": "c90131b4-5c7c-4ebc-a1f3-8002d219bfe0",
38+
"path": "blast-optimism/packages/contracts-bedrock/periphery-deploy-config/optimism-goerli.json",
39+
"content": "# Debian Packaging\n\nTagged releases and develop branch commits are available as installable Debian packages\nfor Ubuntu. Packages are built for the all Ubuntu versions which are supported by\nCanonical.\n\nPackages of develop branch commits have suffix -unstable and cannot be installed alongside\nthe stable version. Switching between release streams requires user intervention.\n\n## Launchpad\n\nThe packages are built and served by launchpad.net. We generate a Debian source package\nfor each distribution and upload it. Their builder picks up the source package, builds it\nand installs the new version into the PPA repository. Launchpad requires a valid signature\nby a team member for source package uploads.\n\nThe signing key is stored in an environment variable which Travis CI makes available to\ncertain builds. Since Travis CI doesn't support FTP, SFTP is used to transfer the\npackages. To set this up yourself, you need to create a Launchpad user and add a GPG key\nand SSH key to it. Then encode both keys as base64 and configure 'secret' environment\nvariables `PPA_SIGNING_KEY` and `PPA_SSH_KEY` on Travis.\n\nWe want to build go-ethereum with the most recent version of Go, irrespective of the Go\nversion that is available in the main Ubuntu repository. In order to make this possible,\nwe bundle the entire Go sources into our own source archive and start the built job by\ncompiling Go and then using that to build go-ethereum. On Trusty we have a special case\nrequiring the `~gophers/ubuntu/archive` PPA since Trusty can't even build Go itself. 
PPA\ndeps are set at https://launchpad.net/%7Eethereum/+archive/ubuntu/ethereum/+edit-dependencies\n\n## Building Packages Locally (for testing)\n\nYou need to run Ubuntu to do test packaging.\n\nInstall any version of Go and Debian packaging tools:\n\n $ sudo apt-get install build-essential golang-go devscripts debhelper python-bzrlib python-paramiko\n\nCreate the source packages:\n\n $ go run build/ci.go debsrc -workdir dist\n\nThen go into the source package directory for your running distribution and build the package:\n\n $ cd dist/ethereum-unstable-1.9.6+bionic\n $ dpkg-buildpackage\n\nBuilt packages are placed in the dist/ directory.\n\n $ cd ..\n $ dpkg-deb -c geth-unstable_1.9.6+bionic_amd64.deb",
40+
"language": "JSON",
41+
"symbolic_link": false,
42+
"in_scope": true,
43+
"mode": 436,
44+
"last_modified": "2024-01-30 14:37:11+00"
45+
}
46+
]

findings_example.json

+59
@@ -0,0 +1,59 @@
1+
[
2+
{
3+
"id": "000384c6-912a-49b5-b629-06abb81ad583",
4+
"repository_id": "ac757733-81a4-43c7-8f49-17c5b135cdff",
5+
"attributed_to": "deb0aa88-3803-45f5-9d2e-6ace4ce4421e",
6+
"title": " Cannot add correlated asset in Curve2PoolLPAdaptor",
7+
"description": "## Description\n\nWhen adding assets to Curve2PoolLPAdaptor , to use curve pools as oracles, it performs a validation to make sure correlated assets are indeed correlated like so: \n\n```\nif (data.isCorrelated) {\n // If assets are correlated, there will not be a lp_price()\n // function, so this should hit the catch statement.\n -------------------------------\n try pool.lp_price() {\n--------------------------------------\n revert Curve2PoolLPAdaptor__UnsupportedPool();\n } catch {}\n } else {\n // If assets are not correlated, there will be a lp_price()\n // function, so this should hit the try statement.\n try pool.lp_price() {} catch {\n revert Curve2PoolLPAdaptor__UnsupportedPool();\n }\n }\n```\n\nThe validation depends on the called method `lp_price()` to be unavailable thus causing a revert and the try catching it, but the solidity try catch function is not capable of catching an invalid method. If a function does not exist in a called contract, it reverts and the catch will be unable to catch it. As reported here on solidity repo on github: \n[\"it crashes if the target contract doesn't have that method\"](https://github.com/ethereum/solidity/issues/13869). Because of this, it will be impossible to added a correlated asset. \n\n## Recommendation\nFind a different mechanism to validate that a correlated asset is indeed correlated. A possible option would be using the low level call and manually parsing the result.\n",
8+
"status": "confirmed",
9+
"severity": "informational",
10+
"likelihood": "high",
11+
"impact": "high",
12+
"created_by": "8ebb8382-03a5-4b28-b2f4-851cf331296a",
13+
"created_at": "2024-04-13 19:26:31.52598+00",
14+
"number": 475,
15+
"last_updated_by": "0dab331d-76ad-4024-99f3-97eebf33a9b6",
16+
"fixed_by": "{}",
17+
"duplicate_of": null,
18+
"points": 0.0,
19+
"quality": null
20+
},
21+
{
22+
"id": "00113f68-4b85-42d9-bdf3-f730af80e123",
23+
"repository_id": "c90131b4-5c7c-4ebc-a1f3-8002d219bfe0",
24+
"attributed_to": "e1e76369-3121-44c4-abbd-aeadae1a76f2",
25+
"title": "Void contracts are able to claim gas while they should not be able to",
26+
"description": "**Description**:\n\nContracts have two gas modes: Void and Claimable. All gas from void contracts will be attributed to sequencer, as admins can claim 100% of it. Gas of Claimable contracts, on the other hand, could be claimed by themselves/governors with rate determined by blast's mechanism. \nThe problem is that void contracts could still claim gas as there is no check to prevent them from doing so.\nThis could lead to small loss of funds for the protocol.\n\n**Recommendation**:\nAdding a check which will revert when void contracts are trying to claim gas in claim() function",
27+
"status": "withdrawn",
28+
"severity": "medium",
29+
"likelihood": "high",
30+
"impact": "medium",
31+
"created_by": "cbfc21bc-63d4-45e6-b346-510692c03664",
32+
"created_at": "2024-02-17 05:58:02.242556+00",
33+
"number": 385,
34+
"last_updated_by": "cbfc21bc-63d4-45e6-b346-510692c03664",
35+
"fixed_by": "{}",
36+
"duplicate_of": null,
37+
"points": null,
38+
"quality": null
39+
},
40+
{
41+
"id": "00117a11-0759-4abd-9c72-bdca9f50bc6c",
42+
"repository_id": "8409a0ce-6c21-4cc9-8ef2-bd77ce7425af",
43+
"attributed_to": "5051fda1-ab4d-438c-ba6f-2e9a97bd57a8",
44+
"title": "Incorrect documentation of `onlyCuratorOrGuardianRole` modifier",
45+
"description": "**Description**:\n\nThe comment describing the `onlyCuratorOrGuardianRole` modifier currently states that it reverts if the caller does not have the curator or guardian role. However, this is not entirely accurate as it also does not revert if the caller is the owner.\n\n**Recommendation**:\n\nTo provide an accurate description, update the comment to the following: \"/// @dev Reverts if the caller doesn't have the curator nor the guardian role and is not the owner.\"",
46+
"status": "new",
47+
"severity": "informational",
48+
"likelihood": null,
49+
"impact": null,
50+
"created_by": "65d6a7e1-9967-4bf3-9e17-5f9567491bb2",
51+
"created_at": "2023-11-22 13:31:22.012544+00",
52+
"number": 47,
53+
"last_updated_by": "65d6a7e1-9967-4bf3-9e17-5f9567491bb2",
54+
"fixed_by": "{}",
55+
"duplicate_of": null,
56+
"points": null,
57+
"quality": null
58+
}
59+
]
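
The sample findings carry `status`, `duplicate_of`, and `severity` fields, which suggests that not every record should become a training label. A minimal sketch of one labeling rule, assuming only `confirmed`, non-duplicate findings are trusted; the status filter and the ordinal severity ranks are assumptions, not something the dataset specifies:

```python
from typing import FrozenSet, Optional

# Ordinal ranks are hypothetical; "informational" is included because it
# appears in the sample findings.
SEVERITY_RANK = {"informational": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def severity_label(
    finding: dict, keep_statuses: FrozenSet[str] = frozenset({"confirmed"})
) -> Optional[int]:
    """Return an ordinal severity label, or None if the finding is excluded."""
    if finding.get("status") not in keep_statuses:
        return None  # e.g. "withdrawn" or "new" in the sample data
    if finding.get("duplicate_of") is not None:
        return None  # avoid counting the same issue twice
    return SEVERITY_RANK.get(finding.get("severity"))
```

Under this rule, of the three sample records only the first (`confirmed`, `informational`) would yield a label.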

findings_file_links.json

+26
@@ -0,0 +1,26 @@
1+
[
2+
{
3+
"finding_id": "dbcccfc8-df36-4284-8016-120522e4d02f",
4+
"file_id": "3c4a4013-a895-416f-8333-bf2f0ddc28e6",
5+
"lines": "[687,688)",
6+
"id": "000068e0-d4de-4a80-a14a-65fc908a22a6"
7+
},
8+
{
9+
"finding_id": "fb6e6d95-aee9-4b53-901c-776f522f1280",
10+
"file_id": "81a9cdf5-281a-40b1-a816-833a7b4cba3d",
11+
"lines": "[624,625)",
12+
"id": "0008e2ee-9ddf-4219-aa00-48c9ab2f8f41"
13+
},
14+
{
15+
"finding_id": "532d35c3-9161-4579-b05a-857f5586b2b5",
16+
"file_id": "d3765c13-a227-4a8a-b8d1-df1f00254f6c",
17+
"lines": "[797,798)",
18+
"id": "003f09ed-e905-4b09-ab93-832bb0b1d797"
19+
},
20+
{
21+
"finding_id": "557b0949-1a40-4754-a9e8-f5ae71548848",
22+
"file_id": "20236568-bb47-4380-b406-dfd41158b619",
23+
"lines": "[51,52)",
24+
"id": "004d1d46-30de-4421-9e06-ab59f06947c3"
25+
}
26+
]
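
The `lines` values such as `"[687,688)"` look like interval notation, where `[` or `]` includes the bound and `(` or `)` excludes it. A minimal sketch of parsing that string and slicing the matching lines out of a file's `content`, assuming integer bounds and 1-based line numbers (neither is confirmed by the data); both helper names are hypothetical:

```python
import re
from typing import List, Tuple

_RANGE_RE = re.compile(r"^([\[(])(\d+),(\d+)([\])])$")


def parse_line_range(text: str) -> Tuple[int, int]:
    """Normalize an interval string to a half-open (start, end) pair."""
    match = _RANGE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized range: {text!r}")
    open_b, lo, hi, close_b = match.group(1), int(match.group(2)), int(match.group(3)), match.group(4)
    start = lo if open_b == "[" else lo + 1   # "(" excludes the lower bound
    end = hi + 1 if close_b == "]" else hi    # "]" includes the upper bound
    return start, end


def extract_lines(content: str, start: int, end: int) -> List[str]:
    """Return lines start..end-1 of content, assuming 1-based numbering."""
    lines = content.split("\n")
    return lines[start - 1 : end - 1]
```

With this reading, `"[687,688)"` refers to the single line 687, which would give the model a line-level label rather than a file-level one.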
