Here is a short example I could come up with:
import tensorflow as tf
import numpy as np
components = np.arange(100).astype(np.int64)
dataset = tf.contrib.data.Dataset.from_tensor_slices(components)
dataset = dataset.group_by_window(key_func=lambda x: x%2, reduce_func=lambda _, els: els.batch(10), window_size=100)
iterator = dataset.make_one_shot_iterator()
features = iterator.get_next()
sess = tf.Session()
sess.run(features) # array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18], dtype=int64)
The first argument, key_func, maps every element in the dataset to a key.
The window_size parameter defines the size of the bucket that is passed to reduce_func.
In reduce_func you receive a block of window_size elements. You can shuffle, batch, or pad it however you want.
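To make the mechanics concrete, here is a minimal pure-Python sketch of what group_by_window does conceptually. It is not TensorFlow's implementation: group_by_window_sketch and batch are hypothetical helpers made up for illustration, and flushing partially filled windows at the end of the input, in the order their keys first appeared, is an assumption based on the output of the example above.

```python
from collections import OrderedDict


def batch(elements, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    return [elements[i:i + batch_size]
            for i in range(0, len(elements), batch_size)]


def group_by_window_sketch(elements, key_func, reduce_func, window_size):
    """Conceptual model only (hypothetical helper, not the TensorFlow code)."""
    windows = OrderedDict()  # key -> elements collected so far for that key
    for element in elements:
        key = key_func(element)
        window = windows.setdefault(key, [])
        window.append(element)
        if len(window) == window_size:
            # The window is full: hand it to reduce_func and start a new one.
            for out in reduce_func(key, window):
                yield out
            del windows[key]
    # Assumption: partially filled windows are flushed when the input ends.
    for key, window in windows.items():
        for out in reduce_func(key, window):
            yield out


results = list(group_by_window_sketch(
    range(100),
    key_func=lambda x: x % 2,
    reduce_func=lambda _, els: batch(els, 10),
    window_size=100))
print(results[0])  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

With window_size=100 neither key ever fills a window (each key only sees 50 elements), so in this sketch all output comes from the final flush: five batches of even numbers followed by five batches of odd numbers.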
EDIT: dynamic padding and bucketing using the group_by_window function, more here:
If you have a tf.contrib.data Dataset whose elements are (sequence_length, label, sequence) tuples, in the order the code below expects, and sequence is a tf.int64 tensor:
def bucketing_fn(sequence_length, buckets):
    """Given a sequence_length returns a bucket id"""
    t = tf.clip_by_value(buckets, 0, sequence_length)
    return tf.argmax(t)

def reduc_fn(key, elements, window_size):
    """Receives `window_size` elements"""
    return elements.shuffle(window_size, seed=0)
# Create bucket boundaries from 0 up to (but not including) 500 in steps of 15 -> [0, 15, 30, ..., 495]
buckets = [tf.constant(num, dtype=tf.int64) for num in range(0, 500, 15)]
window_size = 1000
# Bucketing
dataset = dataset.group_by_window(
    lambda x, y, z: bucketing_fn(x, buckets),
    lambda key, x: reduc_fn(key, x, window_size), window_size)
# You could pad it in reduc_fn, but I'll do it here for clarity.
# The last element of the dataset is the dynamic sentences. By giving it tf.Dimension(None)
# it will pad the sentences (with 0) according to the longest sentence.
# batch_size (and num_epochs below) must be defined beforehand.
dataset = dataset.padded_batch(batch_size, padded_shapes=(
    tf.TensorShape([]), tf.TensorShape([]), tf.Dimension(None)))
dataset = dataset.repeat(num_epochs)
iterator = dataset.make_one_shot_iterator()
features = iterator.get_next()
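The bucketing and padding logic above can be checked without TensorFlow. The following is a rough pure-Python sketch, assuming that tf.argmax returns the first index of the maximum when there are ties. bucket_id and pad_batch are hypothetical names introduced only for illustration.

```python
def bucket_id(sequence_length, buckets):
    """Mirror of bucketing_fn: clip each boundary to [0, sequence_length], then argmax."""
    clipped = [min(max(b, 0), sequence_length) for b in buckets]
    # list.index returns the first position of the maximum (assumed tie-breaking).
    return clipped.index(max(clipped))


def pad_batch(sequences, pad_value=0):
    """Pad every sequence with pad_value up to the longest one in the batch."""
    longest = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_value] * (longest - len(seq)) for seq in sequences]


buckets = list(range(0, 500, 15))  # [0, 15, 30, ..., 495]

print(bucket_id(15, buckets))   # 1  (exactly on the boundary 15)
print(bucket_id(20, buckets))   # 2  (first boundary >= 20 is 30)
print(bucket_id(600, buckets))  # 33 (longer than every boundary: last bucket)

print(pad_batch([[5, 6, 7], [8], [9, 10]]))
# [[5, 6, 7], [8, 0, 0], [9, 10, 0]]
```

Sequences whose lengths fall between the same pair of boundaries get the same id, so each window mostly holds sequences of similar length and a padded batch drawn from it should need relatively little padding.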
Thank you very much. This should be added to the tf docs. – barbolo