WIP: TypeScript Refactor #3
Conversation
Hi @lukemovement, great idea! I use type hints in Python a lot; they really help with large codebases. Unfortunately, I haven't tried TypeScript yet, so it will take some time for me to grasp the main concepts. Perhaps someone else can help with this refactor. I've just posted on Show HN mentioning your PR:
TypeScript is pretty similar to Java with regard to the type-hinting implementation. I'm slowly looking through the samples and have found a few bits that I need. I'm actually unsure whether or not the bits I was having issues with are needed. They weren't used within
This has me very excited.
Hi @lukemovement! I've been playing with TypeScript this weekend and have tried to implement more models in TensorFlow.js, such as ViT and CLIP. They share common layers with GPT, so I decided to merge them into a single repo: https://github.com/zemlyansky/modelzoo with a corresponding NPM package: https://www.npmjs.com/package/modelzoo. I started it from scratch, piecing some components together. Unfortunately, it's missing the original
The mono repo is down to preference. It can make merge requests more difficult to deal with on larger code bases. I'm following the T5 architecture at the moment. Give this a try; it's much more effective at sharing data between embeddings, as well as being less dependent on memory. I put it into an RNN, return the last thought, add it to the embeddings, then normalize. A cross between T5 and BERT I guess, maybe closer to PaLM, I'm not sure.

```ts
import * as tf from "@tensorflow/tfjs-node";
import { Attention } from "./multi-head-self-attention.mjs";

class _AttentionRNNCell extends tf.layers.RNNCell {
  units: number;
  attentionLayer: tf.layers.Layer;
  normalizeLayer: tf.layers.Layer;
  keyLayer: tf.layers.Layer;
  valueLayer: tf.layers.Layer;
  numHeads: number;
  stateSize: number;
  axis: number;

  constructor({
    units,
    keyLayer,
    valueLayer,
    numHeads,
    trainable = true,
  }: {
    units: number;
    keyLayer: tf.layers.Layer;
    valueLayer: tf.layers.Layer;
    numHeads: number;
    trainable?: boolean;
  }) {
    super({ trainable });
    this.units = units;
    this.axis = 1;
    this.attentionLayer = Attention({
      name: `attention`,
      axis: this.axis + 1,
    });
    this.keyLayer = keyLayer;
    this.valueLayer = valueLayer;
    this.numHeads = numHeads;
    this.stateSize = units;
    this.normalizeLayer = tf.layers.layerNormalization({});
    this.trainable = trainable;
  }

  build(inputShape: tf.Shape | tf.Shape[]): void {
    // Unwrap nested shape arrays until we reach a flat shape
    if ("number" !== typeof inputShape[0] && null !== inputShape[0]) {
      return this.build(inputShape[0]);
    }
    const dims = inputShape[inputShape.length - 1] as number;
    try {
      this.keyLayer.build([dims]);
      this.valueLayer.build([dims]);
      this.normalizeLayer.build([1, this.units]);
    } catch (error) {
      throw new Error(`Error building RNNCell: ${(error as Error).message}`);
    }
    // Register the child layers' weights on this layer so they are
    // picked up by gradient calculation and serialization
    this.trainableWeights = [
      ...this.keyLayer.trainableWeights,
      ...this.valueLayer.trainableWeights,
      ...this.normalizeLayer.trainableWeights,
    ];
    this.nonTrainableWeights = [
      ...this.keyLayer.nonTrainableWeights,
      ...this.valueLayer.nonTrainableWeights,
      ...this.normalizeLayer.nonTrainableWeights,
    ];
    this.built = true;
  }

  computeHeadedShape(inputShape: number[]): number[] {
    const shape = [...inputShape] as number[];
    shape.splice(this.axis, 0, this.numHeads);
    shape[shape.length - 1] = this.units / this.numHeads;
    return shape;
  }

  /**
   * Perform the forward pass of the RNN cell.
   *
   * @param inputs - An array of input tensors.
   * @returns An array containing the output and recurrentKernel.
   */
  call(inputs: tf.Tensor<tf.Rank>[]): [tf.Tensor<tf.Rank>, tf.Tensor<tf.Rank>] {
    const input = inputs[0];
    const hPrev = inputs[1];
    const key = this.keyLayer.apply(input) as tf.Tensor<tf.Rank>;
    const value = this.valueLayer.apply(input) as tf.Tensor<tf.Rank>;
    // Compute the headed shape before disposing of the input tensor
    const shape = this.computeHeadedShape(input.shape);
    input.dispose();
    const attention = this.attentionLayer.apply([
      hPrev.reshape(shape),
      key.reshape(shape),
      value.reshape(shape),
    ]) as tf.Tensor;
    key.dispose();
    value.dispose();
    const shapedAttention = attention.reshape(hPrev.shape);
    attention.dispose();
    const mul = tf.mul(hPrev, shapedAttention);
    const output = this.normalizeLayer.apply(mul) as tf.Tensor;
    mul.dispose();
    return [output, shapedAttention];
  }

  static get className() {
    return "AttentionRnnCell";
  }
}

tf.serialization.registerClass(_AttentionRNNCell);

export const AttentionRnnCell = (config: {
  units: number;
  keyLayer: tf.layers.Layer;
  valueLayer: tf.layers.Layer;
  numHeads: number;
  trainable?: boolean;
}) => new _AttentionRNNCell(config);
```

```ts
import type {
  Initializer,
  InitializerIdentifier,
} from "@tensorflow/tfjs-layers/dist/initializers";
import type { ActivationIdentifier } from "@tensorflow/tfjs-layers/dist/keras_format/activation_config";
import type { Regularizer } from "@tensorflow/tfjs-layers/dist/regularizers";
import * as tf from "@tensorflow/tfjs-node";

class _Attention extends tf.layers.Layer {
  axis: number;

  constructor({ name, axis }: { name?: string; axis: number }) {
    // Forward the name to the base class so the layer is addressable in the model
    super({ name });
    this.axis = axis;
  }

  computeOutputShape(inputShape: tf.Shape[]): tf.Shape | tf.Shape[] {
    return inputShape[0];
  }

  call(inputs: tf.Tensor[]) {
    const [query, key, value] = inputs as never as [
      tf.Tensor,
      tf.Tensor,
      tf.Tensor,
    ];
    const depth = query.shape[this.axis] as number;
    // Scaled dot-product attention: softmax(QK^T / sqrt(d)) V
    const logits = tf.matMul(query, key, false, true);
    query.dispose();
    key.dispose();
    const attention = tf.matMul(
      tf.softmax(logits.div(tf.scalar(Math.sqrt(depth)))),
      value,
    );
    logits.dispose();
    value.dispose();
    return attention;
  }

  static get className() {
    return "Attention";
  }
}

tf.serialization.registerClass(_Attention);

export const Attention = (config: { name: string; axis: number }) =>
  new _Attention(config);
```
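As a plain-TypeScript sanity check of the math in `_Attention.call` (softmax(QK^T / sqrt(d)) V), here is a minimal sketch on number arrays. The helper names are hypothetical and not part of the repo; the real code uses `tf.matMul` and `tf.softmax` on tensors.

```typescript
// Hypothetical number[][] helpers mirroring what tf.matMul / tf.softmax do.
function matmul(a: number[][], b: number[][]): number[][] {
  return a.map((row) =>
    b[0].map((_, j) => row.reduce((sum, v, k) => sum + v * b[k][j], 0)),
  );
}

function transpose(m: number[][]): number[][] {
  return m[0].map((_, j) => m.map((row) => row[j]));
}

// Row-wise softmax with the usual max-subtraction for numerical stability.
function softmaxRows(m: number[][]): number[][] {
  return m.map((row) => {
    const max = Math.max(...row);
    const exps = row.map((v) => Math.exp(v - max));
    const total = exps.reduce((a, b) => a + b, 0);
    return exps.map((e) => e / total);
  });
}

// softmax(Q K^T / sqrt(depth)) V, where depth is the feature dimension of Q.
function scaledDotProductAttention(
  q: number[][],
  k: number[][],
  v: number[][],
): number[][] {
  const depth = q[0].length;
  const logits = matmul(q, transpose(k)).map((row) =>
    row.map((x) => x / Math.sqrt(depth)),
  );
  return matmul(softmaxRows(logits), v);
}
```

Because each softmax row sums to 1, every output row is a convex combination of the rows of V, weighted by query-key similarity.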
@lukemovement I am personally also in favor of the "do one thing" approach, but for now, it's more like a TypeScript learning exercise. If it goes further, it would still be possible to split the repo back into dozens of npm modules, collecting them into one through imports. On layers: TensorFlow.js is quite tricky, as it requires tf.Layers in functional models. So you can't just mix regular ops (e.g.,
@lukemovement In your example, you pass layers as parameters to the custom layer? That's interesting! Did that work up to grads calculation and training? I mean
The layers passed through the constructor of the RNN cell are stock dense layers. The weights then have to be registered on the layer:

```ts
this.trainableWeights = [
  ...this.keyLayer.trainableWeights,
  ...this.valueLayer.trainableWeights,
  ...this.normalizeLayer.trainableWeights,
];
this.nonTrainableWeights = [
  ...this.keyLayer.nonTrainableWeights,
  ...this.valueLayer.nonTrainableWeights,
  ...this.normalizeLayer.nonTrainableWeights,
];
```

For the imports, you can use
Any news on this? Would be helpful to work with TS out of the box!
@peacefulotter, hi! Have you checked https://github.com/zemlyansky/modelzoo? GPT, ViT, and CLIP are in TypeScript there and pass tests. Documentation is still missing, but the GPT interface should be the same as
Hey, I was just taking a punt at rewriting this into TypeScript to make it easier to use, but I ran into a few issues. So far I have gotten through most of the `src` directory, but I can't find usage examples to match up a few of the type hints. If you are alright with giving me some pointers, I can finish off the build process so it can be used with the examples. The places where I am struggling to match up the types are:

- `src/utils.mts:3`
- `src/utils.mts:26`
- `src/model.mts:771`
- `src/model.mts:778`