Merge pull request #11 from konsumer/newideas
New Ideas
konsumer authored May 3, 2024
2 parents 0aa741b + ee94857 commit 4ab6eea
Showing 43 changed files with 6,429 additions and 3,591 deletions.
59 changes: 0 additions & 59 deletions .github/workflows/publish.yml

This file was deleted.

27 changes: 27 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,27 @@
```yaml
name: Test

on:
  push:
  pull_request:

jobs:
  build:
    runs-on: ubuntu-latest

    strategy:
      matrix:
        node-version: [18.x, 20.x, 21.x]
        # See supported Node.js release schedule at https://nodejs.org/en/about/releases/
    steps:
      - uses: actions/checkout@v4
      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.node-version }}
          cache: 'npm'
      - run: npm ci
      - run: npm test
      - name: Upload coverage reports to Codecov
        uses: codecov/[email protected]
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
```
30 changes: 30 additions & 0 deletions .github/workflows/ui.yml
@@ -0,0 +1,30 @@
```yaml
name: Publish UI to Page

on:
  push:
    branches:
      - master

jobs:
  site:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Use Node.js
        uses: actions/setup-node@v3
        with:
          cache: 'npm'
      - name: Setup
        run: cd ui && npm ci
      - name: Build Site
        run: npm run build
      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: 'ui/dist'
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
```
4 changes: 3 additions & 1 deletion .gitignore
@@ -1,5 +1,7 @@
```diff
 node_modules
 *.log
 .DS_Store
-test/demo.pb
 /dist/
+*.blob
+/releases/
+/coverage/
```
9 changes: 8 additions & 1 deletion .npmignore
@@ -1,4 +1,11 @@
```diff
 node_modules
 *.log
 .DS_Store
-test/demo.pb
+*.bin
+*.blob
+/releases/
+/ui/
+/test/
+sea-config.json
+.prettierrc
+/coverage/
```
21 changes: 21 additions & 0 deletions .prettierrc
@@ -0,0 +1,21 @@
```json
{
  "arrowParens": "always",
  "bracketSameLine": false,
  "bracketSpacing": true,
  "embeddedLanguageFormatting": "auto",
  "endOfLine": "lf",
  "htmlWhitespaceSensitivity": "css",
  "insertPragma": false,
  "jsxSingleQuote": true,
  "printWidth": 999,
  "proseWrap": "preserve",
  "quoteProps": "as-needed",
  "requirePragma": false,
  "semi": false,
  "singleAttributePerLine": false,
  "singleQuote": true,
  "tabWidth": 2,
  "trailingComma": "none",
  "useTabs": false,
  "vueIndentScriptAndStyle": false
}
```
164 changes: 27 additions & 137 deletions README.md
@@ -1,14 +1,8 @@
 # rawproto
 
-Guess structure of protobuf binary from raw data
+[![codecov](https://codecov.io/gh/konsumer/rawproto/graph/badge.svg?token=PBL1G8S4WY)](https://codecov.io/gh/konsumer/rawproto)
 
-Very similar to `protoc --decode_raw`, but for javascript.
-
-You can use this to reverse-engineer a protobuf protocol, based on a binary protobuf string.
-
-See some example output (from the demo message in this repo) [here](https://gist.github.com/konsumer/3647d466b497e6950b12291e47f11eeb).
-
-If you want an even lighter library, with no dependencies, and only want to view the output (no proto def needed) check out [rawprotoparse](https://github.com/konsumer/rawprotoparse).
+Guess structure of protobuf binary from raw data, query binary protobuf without the schema, and output guessed JSON or schema, some CLI utils, and a web tool for exploring raw protobuf.

## installation

@@ -18,151 +12,47 @@ You can also use `npx rawproto` to run the CLI.
 
 If you just want the CLI, and don't use node, you can also find standalone builds [here](https://github.com/konsumer/rawproto/releases).
 
-
 ## usage
 
-In ES6;
-
 ```js
 import { readFile } from 'fs/promises'
-import { getData, getProto } from 'rawproto'
-
-const buffer = await readFile('data.pb')
+import RawProto from 'rawproto'
 
-// get info about binary protobuf message
-console.log( getData(buffer) )
+// load proto & override "best guess" of types for a single field
+const proto = new RawProto(await readFile('data.pb'), { '1.2.4.10.5': 'string' })
 
-// print proto guessed for this data
-console.log( getProto(buffer) )
-```
+// get a single field, without parsing the whole tree
+console.log(proto.query('1.2.4.10.5:bytes'))
 
-In plain CommonJS:
+// same thing, but using type-mapping
+console.log(proto.query('1.2.4.10.5'))
 
-```js
-var fs = require('fs')
-var rawproto = require('rawproto')
+// guess to decode as JS object
+console.log(proto.toJS())
 
-var buffer = fs.readFileSync('data.pb')
+// guess to generate .proto file string
+console.log(proto.toProto())
 
-// get info about binary protobuf message
-console.log( rawproto.getData(buffer) )
+// walk over messages recursively, calling your callback.
+const mydata = proto.walk((path, wireType, data) => {
+  console.log({ path, wireType, data })
 
-// print proto guessed for this data
-console.log( rawproto.getProto(buffer) )
+  // just do whatever it normally does
+  return proto.walkerDefault(path, wireType, data)
+})
 ```
 
-You can do partial-parsing, if you know some of the fields:
-
-```js
-import { readFile } from 'fs/promises'
-import protobuf from 'protobufjs'
-import { getData, getProto } from 'rawproto'
+### types
 
-const proto = await protobuf.load(new URL('demo.proto', import.meta.url).pathname)
-const Test = proto.lookupType('Test')
-const buffer = await readFile('data.pb')
+Protobuf encodes several different possible types for every wire-type. In this lib, we guess the type based on some context-clues, but it will never be perfect, without hand-tuning. Here are the possible types we support:
 
-// get info about binary protobuf message, with partial info
-console.log(getData(buffer, Test))
-```
-
-You can use `fetch`, like this (in ES6 with top-level `await`):
-
-```js
-import { getData } from 'rawproto'
-import { fetch } from 'node-fetch'
-
-const r = await fetch('YOUR_URL_HERE')
-const b = await r.arrayBuffer()
-console.log(getData(Buffer.from(b)))
-```
+```
+VARINT - uint, bool
+FIXED64 - uint, int, bytes, float
+LEN - string, bytes, sub, packedvarint, packedint32, packedint64
+FIXED32 - int, uint, bytes, float
+```
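As a standalone illustration of the `VARINT` wire-type in the table above (this is a minimal sketch, not rawproto's actual implementation), here is how a varint field's bytes become a JS value:

```js
// Minimal LEB128 varint decoder: each byte carries 7 payload bits
// (lowest group first); the high bit signals that more bytes follow.
function decodeVarint(bytes, offset = 0) {
  let result = 0n
  let shift = 0n
  for (let i = offset; i < bytes.length; i++) {
    const b = bytes[i]
    result |= BigInt(b & 0x7f) << shift
    if ((b & 0x80) === 0) return { value: result, next: i + 1 }
    shift += 7n
  }
  throw new Error('truncated varint')
}

// 300 is encoded as the two bytes [0xac, 0x02]
console.log(decodeVarint([0xac, 0x02]).value) // 300n
```

Whether those decoded bits should then be read as `uint` or `bool` is exactly the guess rawproto has to make.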

-### getData(buffer, stringMode, root) ⇒ <code>Array.&lt;object&gt;</code>
-
-Turn a protobuf into a data-object
-
-**Returns**: <code>Array.&lt;object&gt;</code> - Info about the protobuf
-
-| Param | Type | Description |
-| --- | --- | --- |
-| buffer | <code>Buffer</code> | The proto in a binary buffer |
-| root | <code>Object</code> | protobufjs message-type (for partial parsing) |
-| stringMode | <code>string</code> | How to handle strings that aren't sub-messages: "auto" - guess based on chars, "string" - always a string, "binary" - always a buffer |
-
-### getProto(buffer, stringMode, root) ⇒ <code>string</code>
-
-Gets the proto-definition string from a binary protobuf message
-
-**Returns**: <code>string</code> - The proto SDL
-
-| Param | Type | Description |
-| --- | --- | --- |
-| buffer | <code>Buffer</code> | The buffer |
-| root | <code>Object</code> | protobufjs message-type (for partial parsing) |
-| stringMode | <code>string</code> | How to handle strings that aren't sub-messages: "auto" - guess based on chars, "string" - always a string, "binary" - always a buffer |
-
-## cli
-
-You can also use rawproto to parse binary on the command-line!
-
-Install with `npm i -g rawproto` or use it without installation with `npx rawproto`.
-
-If you just want the CLI, and don't use node, you can also find standalone builds [here](https://github.com/konsumer/rawproto/releases).
-
-Use it like this:
-
-```
-cat myfile.pb | rawproto
-```
-
-or
-
-```
-rawproto < myfile.pb
-```
-
-or
-
-```
-npx rawproto < myfile.pb
-```
-
-```
-Usage: rawproto [options]
-
-Options:
-  --version         Show version number                               [boolean]
-  -j, --json        Output JSON instead of proto definition    [default: false]
-  -m, --message     Message name to decode as (for partial raw)
-  -i, --include     Include proto SDL file (for partial raw)
-  -s, --stringMode  How should strings be handled? "auto" detects if it's binary
-                    based on characters, "string" is always a JS string, and
-                    "binary" is always a buffer.
-                          [choices: "auto", "string", "binary"] [default: "auto"]
-  -h, --help        Show help                                         [boolean]
-
-Examples:
-  rawproto < myfile.pb               Get guessed proto3 definition from
-                                     binary protobuf
-  rawproto -i def.proto -m Test <    Guess any fields that aren't defined
-  myfile.pb                          in Test
-  rawproto -j < myfile.pb            Get JSON represenation of binary
-                                     protobuf
-  rawproto -j -s binary < myfile.pb  Get JSON represenation of binary
-                                     protobuf, assume all strings are
-                                     binary buffers
-```
-
-## limitations
-
-There are several types that just can't be guessed from the data. signedness and precision of numbers can't really be guessed, ints could be enums, and my `auto` system of guessing if it's a `string` or `bytes` is naive (but I don't think could be improved without any knowledge of the protocol.)
-
-You should definitely tune the outputted proto file to how you think your data is structured. I add comments to fields, to help you figure out what [scalar-types](https://developers.google.com/protocol-buffers/docs/proto3#scalar) to use, but without the original proto file, you'll have to do some guessing of your own. The bottom-line is that the generated proto won't cause an error, but it's probably not exactly correct, either.
-
-## todo
+You can also use `raw` for any type to get the raw field with bytes + meta.
 
-* Streaming data-parser for large input
-* Collection analysis: better type-guessing with more messages
-* `getTypes` that doesn't mess with JS data, and just gives possible types of every field
-* partial-parsing like `protoc --decode`. It basically tries to decode, but leaves unknown fields raw.
+Groups are treated as repeated `LEN` message-fields.
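Underlying all of the wire-types above is the field key: every field in a raw protobuf starts with a varint packing `(fieldNumber << 3) | wireType`. A standalone sketch (not part of rawproto's API) of splitting a single-byte key:

```js
// Wire-type numbers from the protobuf encoding spec
const WIRE_TYPES = { 0: 'VARINT', 1: 'FIXED64', 2: 'LEN', 5: 'FIXED32' }

// Split a decoded key varint into its field number and wire type
function decodeKey(key) {
  return { fieldNumber: key >> 3, wireType: WIRE_TYPES[key & 0x07] }
}

// 0x0a opens many LEN fields: field 1, wire-type 2
console.log(decodeKey(0x0a)) // { fieldNumber: 1, wireType: 'LEN' }
```

This is why paths like `1.2.4.10.5` are enough to address a field: each segment is a field number recovered from these keys, while the type after the `:` resolves the ambiguity the wire type leaves open.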