Help with Converting Spatio-Temporal Dataset for Consumption #19

trecius · 2016-04-26T22:28:37Z

Hello,

I have a spatio-temporal dataset that I have compiled. It's in a TSV format, and I'd like your RNNSharp to consume the input for classification as well as recognition. My features are continuous values in the range [0, 1]. My TSV file looks like the following:

ID1 0.923 0.223 0.573 0.235 0.111
ID1 0.920 0.228 0.353 0.213 0.098
ID1 0.901 0.677 0.235 0.551 0.121
...
ID1 0.853 0.383 0.301 0.618 0.132

ID1 0.918 0.733 0.622 0.222 0.238
ID1 0.985 0.682 0.793 0.221 0.465
...
ID1 0.953 0.788 0.912 0.228 0.539

ID2 0.918 0.733 0.622 0.222 0.238
ID2 0.985 0.682 0.793 0.221 0.465
...
ID2 0.953 0.788 0.912 0.228 0.539

Each line in my TSV is a snapshot at a specific moment in time. When all snapshots are combined, they describe the spatio-temporal entity. These entities are separated by an EMPTY LINE. Therefore, the first instance of ID1 is all the lines until you reach the empty line. The second instance of ID1 is the next set of contiguous lines, and so on. Note that the first TSV value is just a class label, not a feature. I have 6 class labels for this spatio-temporal dataset.

1.) First, how can I transform my data into an "embedded feature" that is in the correct model format? I assume this is what Txt2Vec is for?

2.) Additionally, I will have to create a corpus. Will the following work for the corpus?

ID1 ClassLabel1
ID2 ClassLabel2
ID3 ClassLabel3
ID4 ClassLabel4
ID5 ClassLabel5
ID6 ClassLabel6

3.) Additional steps or a walkthrough would be greatly appreciated. I hope this information helps all others who are trying to consume RNNSharp. When I finish, I hope to compile a walkthrough for others, so they can easily consume this great technology.

Thank you.

zhongkaifu · 2016-04-27T05:33:32Z

For each time frame (one line in your training corpus), if it contains only 5 features, you could build an embedding model like the one below, in which each time frame has its own unique ID.
ID1 0.923 0.223 0.573 0.235 0.111
ID2 0.920 0.228 0.353 0.213 0.098
ID3 0.901 0.677 0.235 0.551 0.121
...
ID2 0.920 0.228 0.353 0.213 0.098

I just updated RNNSharp to support embedding models in raw text format, so you can use the above format for training directly. Please replace WORDEMBEDDING_FILENAME with WORDEMBEDDING_RAW_FILENAME in the configuration file.

For #2, yes, it looks good. For example, it may look like:
ID1 Wave
ID2 Label2
ID2 Wave
...
IDn LabelX

Each time frame has a corresponding label as its result.
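
The conversion described in this reply can be sketched in Python. This is a minimal illustration, not part of RNNSharp: the function name `convert`, the `F1`, `F2`, ... key naming, and the tab delimiter in the output are all assumptions made for the example. It takes the label-prefixed frames from the original TSV, gives every frame its own key, and produces the embedding lines (key plus features) and the corpus lines (key plus label), keeping blank lines only in the corpus.

```python
def convert(tsv_lines):
    """Split label-prefixed frames into (embedding_lines, corpus_lines).

    Assumptions for this sketch: fields are whitespace-separated, the first
    field of each input line is the class label, and keys F1, F2, ... are a
    hypothetical naming scheme for the per-frame unique IDs.
    """
    embedding, corpus = [], []
    frame_no = 0
    for raw in tsv_lines:
        fields = raw.split()
        if not fields:
            # Blank line marks an entity boundary: keep a single blank line
            # in the corpus, never a leading or doubled one.
            if corpus and corpus[-1] != "":
                corpus.append("")
            continue
        label, features = fields[0], fields[1:]
        frame_no += 1
        key = f"F{frame_no}"
        embedding.append("\t".join([key] + features))  # key -> dense features
        corpus.append(f"{key}\t{label}")               # key -> label
    # Drop a trailing separator so the corpus ends on a data line.
    while corpus and corpus[-1] == "":
        corpus.pop()
    return embedding, corpus


if __name__ == "__main__":
    sample = [
        "ID1 0.923 0.223 0.573 0.235 0.111",
        "ID1 0.920 0.228 0.353 0.213 0.098",
        "",
        "ID2 0.918 0.733 0.622 0.222 0.238",
    ]
    emb, corp = convert(sample)
    print("\n".join(emb))
    print("---")
    print("\n".join(corp))
```

Writing `emb` to rawModel.txt and `corp` to train.txt would give the two files discussed in this thread; whether RNNSharp expects tabs or spaces between fields is not stated here, so check that against the project's own sample data.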

trecius · 2016-04-27T19:02:10Z

Hello:

I'm getting closer. I've since extracted all the time frames I want to train on into a single file: rawModel.txt. It has the format:

\t\t\t\t\t
\t\t\t\t\t
...
\t\t\t\t\t

I've also created a train.txt file, and it is in the format:

\t
\t
\t
...
\t

Finally, I've also created a template.txt file. It looks like this:

U01:%x[0,0]
U02:%x[0,1]
U03:%x[0,2]
U04:%x[0,3]
U05:%x[0,4]
U06:%x[-1,0]
U07:%x[-1,1]
U08:%x[-1,2]
U09:%x[-1,3]
U10:%x[-1,4]
U11:%x[1,0]
U12:%x[1,1]
U13:%x[1,2]
U14:%x[1,3]
U15:%x[1,4]

I've modified the BAT file to use the new files, but it's not working the way I had planned.

1.) How does RNNSharp (RNNSharpConsole) know when one spatio-temporal entity has completed and a new one begins? I'm mostly asking about the edge cases. I've tried splitting them up with a blank line, but an exception is thrown stating that the lengths are not the same.

zhongkaifu · 2016-04-27T20:28:55Z

Since you are going to use continuous values as features, template.txt should keep only one line: U01:%x[0,0]. All the other lines are used for discrete features only.

In the training corpus, RNNSharp uses a blank line to separate two entities, but the embedding model (rawModel.txt in your example) doesn't need blank lines. The embedding model is just a set of key-value pairs: RNNSharp looks up each keyword in it and gets the dense features for encoding or decoding.
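
A quick consistency check between the two files can help narrow down format problems before training. The sketch below is only a diagnostic under stated assumptions: `check_corpus` is a hypothetical helper, the corpus is assumed to have exactly two columns (key and label), and nothing here reproduces RNNSharp's own validation, so it may not catch the exception reported above.

```python
def check_corpus(corpus_lines, embedding_lines):
    """Return a list of human-readable problems; an empty list means none found.

    Assumed layout (not confirmed by RNNSharp itself):
      embedding line: key followed by the feature values, no blank lines
      corpus line:    key and label, with blank lines between entities
    """
    problems = []
    keys = set()
    widths = set()
    for n, line in enumerate(embedding_lines, 1):
        fields = line.split()
        if not fields:
            problems.append(f"embedding line {n}: blank line")
            continue
        keys.add(fields[0])
        widths.add(len(fields) - 1)  # number of feature values on this line
    if len(widths) > 1:
        problems.append(f"embedding vectors differ in length: {sorted(widths)}")
    for n, line in enumerate(corpus_lines, 1):
        fields = line.split()
        if not fields:
            continue  # blank line is an entity boundary, allowed in the corpus
        if len(fields) != 2:
            problems.append(f"corpus line {n}: expected 2 columns, got {len(fields)}")
        elif fields[0] not in keys:
            problems.append(f"corpus line {n}: key {fields[0]!r} missing from embedding")
    return problems


if __name__ == "__main__":
    corpus = ["F1\tWave", "", "F2\tLabel2"]
    embedding = ["F1\t0.923\t0.223", "F2\t0.920\t0.228"]
    for problem in check_corpus(corpus, embedding) or ["no problems found"]:
        print(problem)
```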

RNNSharp now supports embedding models in raw text format; you can sync the latest code from the depot and use it. In your case, the configuration file looks like:

#The file name for template feature set
TFEATURE_FILENAME: tfeature
#The context range for template feature set. Here, the context is the current token only
TFEATURE_CONTEXT: 0

WORDEMBEDDING_RAW_FILENAME: rawModel.txt
#The context range for word embedding.
WORDEMBEDDING_CONTEXT: -1, 0, 1
#The column index applied word embedding feature
WORDEMBEDDING_COLUMN: 0
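
To illustrate what a context of -1, 0, 1 means for the dense features, here is a small Python sketch. It is not RNNSharp's code: `dense_features` is a hypothetical function, and padding with zeros at the first and last frame of an entity is an assumption, since this thread does not say how RNNSharp treats the edges.

```python
def dense_features(tokens, embedding, context=(-1, 0, 1)):
    """Concatenate the embedding vectors of each token's context window.

    tokens:    frame keys of ONE entity, in time order
    embedding: dict mapping key -> list of floats (all the same length)
    context:   offsets relative to the current frame
    """
    dim = len(next(iter(embedding.values())))
    rows = []
    for i in range(len(tokens)):
        row = []
        for offset in context:
            j = i + offset
            if 0 <= j < len(tokens):
                row.extend(embedding[tokens[j]])
            else:
                # Assumption: frames outside the entity contribute zeros.
                row.extend([0.0] * dim)
        rows.append(row)
    return rows


if __name__ == "__main__":
    emb = {"F1": [0.1, 0.2], "F2": [0.3, 0.4]}
    for row in dense_features(["F1", "F2"], emb):
        print(row)
```

With 5 features per frame and three context offsets, each time step would see 15 dense values under this scheme.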

I hope this information helps. For the exception you mentioned, could you please share more details about it?
