@youraveragesciencepal, unfortunately I don't know of any software that could be used here.

As for the .npy files, I hope the short script below helps. If you have any more questions, feel free to ask. :)

import numpy as np
import matplotlib.pyplot as plt
import pickle

# load the preprocessed data from the file
data = np.load('data/dataset.npy', allow_pickle=True)
# let's look at its shape
print(data.shape)
# >>> (10867,)
# so it behaves like a list of examples (a 1-D array with one entry per example)
# now let's look at a single example
print(data[0].shape)
# >>> (855, 3)
# this array stores the consecutive pen points of one handwriting sample
# if you print the first few points
print(data[0][:5])
# >>> [[ 0.82798475 -4.2939095   0.        ]
#      [ 0.7848605  -4.3370337   0.        ]
#      [ 0.8193599  -4.2852845   0.        ]
#      [ 0.81073505 -4.2939095   0.        ]
#      [ 0.8021102  -4.2939095   0.        ]]
# you will see that each row stores (x, y, e)
# x and y are the coordinates, and e is a flag that tells whether the pen
# is "lifted" after that point (if it is, no line should be drawn between
# that point and the next one)

# let's plot the first example, ignoring the `e` column for now
example = data[0]
# the y coordinate is inverted in the data, hence the minus sign
plt.plot(example[:, 0], -example[:, 1])
plt.show()
# this displays a single example from the dataset; as you can see, it
# looks as if the writer never lifted the pen
# now let's include the information stored in `e`
lifts = np.where(example[:, 2] == 1.)[0] + 1  # +1 because we want to split
                                              # after the lifted point
strokes = np.split(example, lifts)
for s in strokes:
    plt.plot(s[:, 0], -s[:, 1])
plt.show()
# this displays the same example, but without the lines that connected
# points across a pen lift

# now let's move on to the labels and translation files
labels = np.load('data/labels.npy', allow_pickle=True)
with open('data/translation.pkl', 'rb') as f:  # closes the file after loading
    translation = pickle.load(f)
# look at the shape of the labels
print(labels.shape)
# >>> (10867,)
# it should be the same as `data` because we need a label for each
# example that's present in the dataset
print(labels[0])
# >>> [29, 78, 1, 47, 71, 58, 75, 68, 71, 1, 50, 62, 65, 65,
#      62, 54, 66, 72, 13, 1, 28, 1, 66, 68, 75, 58]
# each number in this array stands for a character, which we can decode with
# the reversed translation dictionary (the original maps characters to numbers,
# because during generation text has to be converted into numerical labels;
# here we want to go the other way)
reversed_translation = {v: k for k, v in translation.items()}
print(''.join(reversed_translation[x] for x in labels[0]))
# >>> "By Trevor Williams. A move"
# which is the same text we could read on the plots above
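
If it helps, the splitting and decoding steps from the script can be wrapped into small helpers. This is only a sketch under some assumptions: `split_strokes`, `encode_text`, and `decode_label` are names I made up for illustration, and the toy array and toy dictionary below merely stand in for the real `dataset.npy` and `translation.pkl` contents, so it runs without the data files.

```python
import numpy as np


def split_strokes(example):
    """Split an (N, 3) array of (x, y, e) rows into one array per pen stroke."""
    # e == 1 marks the last point before the pen is lifted, so cut right after it
    lifts = np.where(example[:, 2] == 1.0)[0] + 1
    # np.split yields an empty trailing piece when the final point has e == 1
    return [s for s in np.split(example, lifts) if len(s) > 0]


def encode_text(text, translation):
    """Convert text into numerical labels (character -> number)."""
    return [translation[ch] for ch in text]


def decode_label(label, translation):
    """Convert numerical labels back into text (number -> character)."""
    reversed_translation = {v: k for k, v in translation.items()}
    return ''.join(reversed_translation[i] for i in label)


# hypothetical stand-in for one entry of `data`: two strokes of 2 and 3 points
toy_example = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0],  # pen is lifted after this point
    [2.0, 1.0, 0.0],
    [3.0, 1.0, 0.0],
    [4.0, 2.0, 1.0],  # pen is lifted after this point
])
# hypothetical stand-in for `translation`; the real dictionary is larger
toy_translation = {'a': 1, 'b': 2, ' ': 3}

strokes = split_strokes(toy_example)
print([len(s) for s in strokes])
# >>> [2, 3]
label = encode_text('ab ba', toy_translation)
print(label)
# >>> [1, 2, 3, 2, 1]
print(decode_label(label, toy_translation))
# >>> ab ba
```

With the real files you would pass `data[0]` to `split_strokes`, and `labels[0]` together with the loaded `translation` to `decode_label`.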

Extension to other languages #17
