augmentedManifestFile + PipeModeDataset example #63

@vlordier

Description

It would really help to have a full end-to-end example of, say, image classification with AugmentedManifestFile + PipeModeDataset,

as I keep getting parsing errors like:
tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Could not parse example input, value: '����

I build a JSON Lines augmented manifest with

{"image-ref": "s3://path/to/image", "label": 3}
{"image-ref": "s3://path/to/image", "label": 1}
{"image-ref": "s3://path/to/image", "label": 2}
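For reference, I write the manifest with something like this (a minimal sketch; the bucket paths and labels here are placeholders):

```python
import json

def write_augmented_manifest(entries, path):
    """Write one JSON object per line, as AugmentedManifestFile expects.

    Each line must be standalone, double-quoted JSON,
    e.g. {"image-ref": "s3://...", "label": 3}.
    """
    with open(path, "w") as f:
        for uri, label in entries:
            f.write(json.dumps({"image-ref": uri, "label": label}) + "\n")

entries = [("s3://bucket/img1.jpg", 3), ("s3://bucket/img2.jpg", 1)]
write_augmented_manifest(entries, "train.manifest")
```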

then prepare the training channel as

train_data = sagemaker.session.s3_input(augmented_manifest_file_on_s3,
                                        distribution='FullyReplicated',
                                        content_type='image/jpeg',
                                        s3_data_type='AugmentedManifestFile',
                                        attribute_names=['image-ref', 'label'],
                                        input_mode='Pipe',
                                        record_wrapping='RecordIO')
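Since record_wrapping='RecordIO', each attribute's payload should arrive framed in MXNet-style RecordIO records. For debugging the raw pipe locally, I use a rough pure-Python reader like the one below (a sketch; the magic number and 29-bit length encoding are my reading of the MXNet RecordIO format):

```python
import struct

_KMAGIC = 0xCED7230A  # MXNet/SageMaker RecordIO magic number

def read_recordio(buf):
    """Yield the payload of each RecordIO record in `buf` (a bytes object)."""
    pos = 0
    while pos < len(buf):
        magic, header = struct.unpack_from("<II", buf, pos)
        assert magic == _KMAGIC, "not a RecordIO record"
        length = header & ((1 << 29) - 1)  # lower 29 bits hold the payload length
        pos += 8                           # skip the two uint32 header words
        yield buf[pos:pos + length]
        pos += length + (-length % 4)      # payloads are padded to 4-byte boundaries
```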

and launch training with .fit as

data_channels = {'train': train_data}

# Train a model.
tf_estimator.fit(inputs=data_channels, logs=True)

in my entry script, I have

    dataset = PipeModeDataset(channel=channel)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    dataset = dataset.batch(2)  # pair up the two records (one per attribute) for each sample
    dataset = dataset.map(combine)
    dataset = dataset.map(example_parser, num_parallel_calls=batch_size)
    dataset = dataset.repeat(epochs)
    dataset = dataset.batch(batch_size, drop_remainder=True)
    image_batch, label_batch = next(iter(dataset))
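My combine is essentially just splitting the pair back out (a sketch, assuming the two attributes arrive as alternating records in attribute_names order, which is why I batch(2) first):

```python
def combine(records):
    # After dataset.batch(2), `records` holds two consecutive pipe records:
    # records[0] is the 'image-ref' payload, records[1] is the 'label' payload.
    return records[0], records[1]
```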

and as a modified example parser, I have

def example_parser(example1, example2):

    feat1 = tf.io.parse_single_example(
        example1,
        features={
            'image-ref': tf.io.FixedLenFeature([], tf.string),
        })

    feat2 = tf.io.parse_single_example(
        example2,
        features={
            'label': tf.io.FixedLenFeature([], tf.int64),
        })

    image = feat1['image-ref']
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    label = tf.cast(feat2['label'], tf.int32)
    return image, label
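One thing I'm not sure of is whether the pipe even carries serialized tf.Example protos here. If each record is just the raw JPEG bytes (and the label record just the label's text from the manifest), a parser without parse_single_example would look roughly like this (an untested sketch, not a confirmed fix):

```python
import tensorflow as tf

def raw_parser(image_bytes, label_bytes):
    # Treat the first record as raw JPEG bytes rather than a tf.Example.
    image = tf.image.decode_jpeg(image_bytes, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    # Treat the second record as the label's textual value, e.g. b"3".
    label = tf.strings.to_number(label_bytes, out_type=tf.int32)
    return image, label
```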

What am I doing wrong?
The documentation here is not clear about using augmented manifest files.
