Ordering test set Dogs vs. Cats

The test set of the Dogs vs. Cats dataset in Fuel was ordered as strings instead of numerically (i.e. 1, 10, 11, …, 2, … instead of 1, 2, …). Kaggle expects the numerical ordering in submissions, so the test scores did not make sense. This was fixed in Fuel pull request #336. If you are using the cluster, the dataset has already been updated (if you need the old HDF5 file, it can be found under dogs_vs_cats.old.hdf5). If you run your own installation, please update Fuel and rerun fuel-convert dogs_vs_cats.
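To see why the two orderings differ, here is a minimal sketch in plain Python. The file names and the `numeric_key` helper are made up for illustration; this is not the code used in Fuel or in the pull request.

```python
# Hypothetical file names, used only to illustrate the two orderings.
names = ["1.jpg", "2.jpg", "10.jpg", "11.jpg", "3.jpg"]


def numeric_key(name):
    """Return the integer before the extension, e.g. "10.jpg" -> 10."""
    return int(name.split(".")[0])


# Default sorting compares the names character by character, as strings.
string_order = sorted(names)

# Sorting on the parsed integer gives the order 1, 2, 3, ...
numeric_order = sorted(names, key=numeric_key)

print(string_order)
print(numeric_order)
```

If predictions are written out in the string order while the submission rows follow the numeric order, most predictions end up attached to the wrong image.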



If you have questions about the coursework during revision, feel free to post them on the Questions and answers page. In the same spirit as the rest of the course, please help out your fellow students if you know the answer to their questions.


Project leaderboard and deadline

Some students have asked about the deadline for the class project. The deadline will be 4 weeks from now (the Monday after classes end) on 18 April.

As was mentioned in class, we put up a leaderboard where we ask you to submit the results you achieved on either project (classification scores for Dogs vs. Cats, and samples or perplexity scores for the vocal synthesis task). Don’t wait until the end to do so; please put up intermediate results as well!


Sequence windows

As was briefly discussed in class and mentioned on the getting started page, you will need to split up your data into subsequences in order to train RNNs. To do so, I just added a new transformer to Fuel that does this for you (it should be merged soon, but you can check out the branch or copy-paste the code if you want to use it right now).

You can use it as follows:

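from fuel.datasets.youtube_audio import YouTubeAudio
from fuel.transformers.sequences import Window

# Load the audio track by its YouTube ID and iterate over it as examples
data = YouTubeAudio('XqaJ2Ol5cC4')
stream = data.get_example_stream()

sequence_size = 10
# Arguments: offset, source window size, target window size, overlapping, stream.
# With an offset of 1 and overlapping windows, each target is the source
# window shifted by one step.
windows = Window(1, sequence_size, sequence_size, True, stream)
for source, target in windows.get_epoch_iterator():
    train(source, target)  # train is a placeholder for your own training step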
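
To get a feel for the kind of pairs this produces, here is a rough sketch in plain Python that does not depend on Fuel. The `sliding_windows` function is a hypothetical helper written for this illustration, not the transformer's actual implementation, and it assumes that the windows advance one step at a time.

```python
def sliding_windows(sequence, offset, window_size):
    """Yield (source, target) pairs of equal length.

    The target window starts `offset` steps after the source window.
    Assumption for this sketch: windows slide forward one step at a time.
    """
    last_start = len(sequence) - offset - window_size
    for start in range(last_start + 1):
        source = sequence[start:start + window_size]
        target = sequence[start + offset:start + offset + window_size]
        yield source, target


# A toy "audio" sequence of six samples.
pairs = list(sliding_windows(list(range(6)), offset=1, window_size=3))
for source, target in pairs:
    print(source, target)
```

Each target is the source shifted by one sample, which is the usual setup for training an RNN to predict the next step at every position.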