Earlier I used Python threads and queues as my data pipeline and got really high utilization on both GPUs (the data was created on the fly). I wanted to switch to tf.data.Dataset, but I am struggling to reproduce those results.
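For context, the old pipeline looked roughly like this (a minimal sketch; the numpy data in make_sample is just a placeholder for the real on-the-fly creation): several worker threads fill a bounded queue, and the training loop pulls samples from it.

```python
import queue
import threading

import numpy as np

sample_queue = queue.Queue(maxsize=256)  # bounded, so producers block instead of racing ahead

def make_sample():
    # Placeholder for the real on-the-fly data creation.
    image = np.random.rand(64, 64, 1).astype(np.float32)
    label = np.random.rand(64, 64, 1).astype(np.float32)
    return image, label

def producer():
    # Each worker thread creates samples and pushes them into the shared queue.
    while True:
        sample_queue.put(make_sample())

for _ in range(8):  # several producer threads kept both GPUs busy
    threading.Thread(target=producer, daemon=True).start()
```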

I have tried a lot of approaches. Since I create the data on the fly, the from_generator() method seemed perfect. The code below is my latest attempt. There still seems to be a bottleneck in creating the data, even though I use map() for the processing of the generated images. The idea in the code below was to "multithread" the generators somehow, so that more data comes in at the same time, but it has produced no better results so far.

```python
import tensorflow as tf
from functools import partial

def generator(n):
    # Create (image, label) pairs on the fly on the CPU.
    with tf.device('/cpu:0'):
        while True:
            ...
            yield image, label

def get_generator(n):
    # Bind n so from_generator() gets a zero-argument callable.
    return partial(generator, n)

def dataset(n):
    return tf.data.Dataset.from_generator(
        get_generator(n),
        output_types=(tf.float32, tf.float32),
        output_shapes=(tf.TensorShape([None, None, 1]),
                       tf.TensorShape([None, None, 1])))

def input_fn():
    # Single-generator version:
    # ds = tf.data.Dataset.from_generator(
    #     generator,
    #     output_types=(tf.float32, tf.float32),
    #     output_shapes=(tf.TensorShape([None, None, 1]),
    #                    tf.TensorShape([None, None, 1])))

    # Interleave BATCH_SIZE copies of the generator dataset in parallel,
    # hoping to get several generators producing data at once.
    ds = tf.data.Dataset.range(BATCH_SIZE).apply(
        tf.data.experimental.parallel_interleave(dataset, cycle_length=BATCH_SIZE))
    ds = ds.map(map_func=lambda img, lbl: processImage(img, lbl))
    ds = ds.shuffle(SHUFFLE_SIZE)
    ds = ds.batch(BATCH_SIZE)
    ds = ds.prefetch(1)
    return ds
```
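For comparison, here is a minimal, self-contained variant of the same idea that keeps a single generator but parallelizes the map step instead (num_parallel_calls with tf.data.experimental.AUTOTUNE assumes TF 1.13+; the numpy data and the trivial processImage are placeholders for my real code):

```python
import numpy as np
import tensorflow as tf

BATCH_SIZE = 16
SHUFFLE_SIZE = 128

def generator():
    while True:
        # Placeholder for the real on-the-fly data creation.
        image = np.random.rand(64, 64, 1).astype(np.float32)
        label = np.random.rand(64, 64, 1).astype(np.float32)
        yield image, label

def processImage(img, lbl):
    # Placeholder for the real preprocessing.
    return img, lbl

def input_fn():
    ds = tf.data.Dataset.from_generator(
        generator,
        output_types=(tf.float32, tf.float32),
        output_shapes=(tf.TensorShape([None, None, 1]),
                       tf.TensorShape([None, None, 1])))
    # The map runs in parallel, but the generator itself is still a single
    # Python thread, which is where I suspect the bottleneck is.
    ds = ds.map(processImage, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds = ds.shuffle(SHUFFLE_SIZE)
    ds = ds.batch(BATCH_SIZE)
    return ds.prefetch(1)
```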

I would expect high GPU utilization (>80%), but it is currently only around 10-20%.
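One way to confirm that the input pipeline (and not the model) is the limit is to time the dataset on its own; a minimal sketch using the TF 1.x session API:

```python
import time
import tensorflow as tf

ds = input_fn()
next_batch = ds.make_one_shot_iterator().get_next()

with tf.Session() as sess:
    sess.run(next_batch)  # warm-up
    start = time.time()
    for _ in range(100):
        sess.run(next_batch)
    print('batches/sec: %.1f' % (100 / (time.time() - start)))
```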