Blog 3,
GSoC 2018 @JuliaLang

A brief summary of my milestones.

Posted by Ayush Shridhar on June 17, 2018

The previous few weeks involved working on ONNX.jl and Keras.jl. I came across various challenges and errors, but was able to overcome most of them. I learned a lot in the process about Julia's infrastructure, Keras, Flux and machine learning in general. In this blog, I'm going to summarize my work, mostly on Keras.jl, since I spent a major chunk of my time on it. In case you haven't, please do read the first two blogs here.

Keras.jl

A major portion of my time was spent on Keras.jl. Keras.jl is a package built on top of the Flux machine learning framework that lets you run pretrained Keras models directly in Flux. This is done by converting the entire Keras model into a Julian format. When a Keras model (running either a TensorFlow or Theano backend) is compiled, it actually forms a computation graph. In layman's terms, a computation graph can be thought of as a regular graph, with each node representing a layer and the edges representing the flow of tensors (hence the name TensorFlow). Keras.jl interprets this graph and converts it into pure Julia code (using DataFlow.jl), reading the weights and other parameters of the model at each step. The result is a Julian representation of the model, which can then be run directly on top of Flux with appropriate inputs.
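To give a rough idea of what the end result looks like, here is a minimal, hypothetical sketch of a converted model. The actual code Keras.jl emits via DataFlow.jl is more involved, and the layer sizes below are made up purely for illustration; the point is that the Keras graph ends up as ordinary Flux layers:

 using Flux

 # Hypothetical, heavily simplified picture of a converted model.
 # In the real thing, the weights come from the exported HDF5 file.
 model = Chain(
     Dense(784, 128, relu),   # made-up layer sizes
     Dense(128, 10),
     softmax)

 x = rand(Float32, 784)       # a dummy input vector
 y = model(x)                 # the forward pass runs entirely in Flux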

Keras.jl also provides a simple and easy-to-use interface. Loading and running a Keras model is as simple as:

 >>> using Keras

 >>> model = Keras.load("model_structure.json", "model_weights.h5")

 >>> model(...)
                        

(model_structure.json holds the structure of the model, in JSON format.)
(model_weights.h5 stores the weights of the model, in HDF5 format.)
These files can be obtained from any Keras model, using model.to_json() and model.save_weights() respectively.

Computer Vision based models:

I was able to load most of the popular vision-based Keras models into Flux. These models include:

I. Inception Net v3
II. DenseNet 121
III. DenseNet 169
IV. DenseNet 201
V. ResNet50


Most of the loaded models have pretty good accuracies (on par with the originals).
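As a concrete (hypothetical) example, running one of these vision models from Julia looks roughly like this. The file names are placeholders for whatever model.to_json()/model.save_weights() produced, and the exact input layout (image size, channel ordering, batching) depends on the model:

 using Keras

 # Placeholder file names for an exported ResNet50 structure/weights pair.
 resnet = Keras.load("resnet50_structure.json", "resnet50_weights.h5")

 # ResNet50 expects 224x224 RGB inputs; a random tensor stands in for a
 # real, preprocessed image here.
 img = rand(Float32, 224, 224, 3)
 scores = resnet(img)      # class scores over the 1000 ImageNet classes
 top = argmax(scores)      # index of the most likely class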

Natural Language Processing related models:

I also spent a considerable amount of time loading NLP-related models. I implemented and loaded three such models:

I. IMDb sentiment analysis:
The model was trained on the IMDb sentiment analysis dataset, which contains 25,000 movie reviews and the sentiment (0/1) associated with each. The Keras model showed an 87.25% accuracy, and the corresponding loaded Flux model showed an 87.2% accuracy (a pretty good result). The model also gave good results when tested on user input:

 >>> model("It was such a waste of time")
 0

 >>> model("such amazing stuff")
 1
                        

The example, along with instructions, can be found here.
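For the curious, the heavy lifting behind a call like model("such amazing stuff") is the text preprocessing. This is a rough sketch of what it typically involves for an IMDb-style model (not the actual Keras.jl internals); word_index, MAXLEN and OOV here are assumptions standing in for the dataset's vocabulary and the model's fixed input length:

 # Sketch: turn raw text into the fixed-length vector of word indices the
 # underlying network actually consumes.
 const MAXLEN = 100        # assumed input sequence length
 const OOV    = 2          # assumed index reserved for out-of-vocabulary words

 function encode(text::AbstractString, word_index::Dict{String,Int})
     tokens = split(lowercase(text))
     idxs = [get(word_index, String(t), OOV) for t in tokens]
     # pad with zeros (or truncate) so every review has length MAXLEN
     length(idxs) >= MAXLEN && return idxs[end-MAXLEN+1:end]
     return vcat(zeros(Int, MAXLEN - length(idxs)), idxs)
 end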

II. Reuters topic classification:
The dataset contains 11,228 newswires from Reuters, labeled over 46 topics. Each wire is encoded as a sequence of word indexes.
To know more, have a look at this.
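Usage follows the same pattern as the other models. A hypothetical sketch, with placeholder file names and a made-up encoded newswire:

 using Keras

 reuters = Keras.load("reuters_structure.json", "reuters_weights.h5")

 wire   = [1, 53, 12, 8, 43, 10, 447, 5, 25, 207]   # made-up word indices
 scores = reuters(wire)     # one score per topic
 topic  = argmax(scores)    # predicted topic id, out of the 46 topics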

III. Text Generation using LSTMs:
Text generation, in general, consists of training a model on a text dataset so that it learns useful patterns from the data. After training for a while, the model starts producing words that make sense as standalone entities. LSTM (Long Short Term Memory) networks have proven very effective here: they are capable of remembering relations across long sequences, which makes them ideal for text generation. The Keras model I created was trained on the Alice in Wonderland text, which is in the public domain. The model was trained to take a sequence of 100 characters as input and predict the next one. Tracing a Keras LSTM to a Flux LSTM was quite challenging, as they handle input in different ways. I was able to load the model successfully in Flux, and the output, after training for just 15 epochs, was pretty decent:

 Original text: went straight on like a tunnel for some way, and then
        dipped suddenly down, so suddenly that alice had not a mome

 Produced text: went straight on like a tunnel for some way, and then
        dipped suddenly down, so suddenly that alice had not a mome tf the fand,
        she wast on toineng an thre of the garter. whe mone dater and soen an an
        ar ien so the toede an the coold an thee, th shen io the want toon if the
        woil, and yhu io the whre tie dorse ’hu aoh yhu  than i soned th thi would
        be no tee thet
                        

Notice that the output might not make sense, but the model was able to produce familiar words from the training text, such as she, the, on, of, and, an, so. It is advisable to train the model for at least 25 epochs to get better results.
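To make the generation procedure concrete, here is a minimal sketch of the sampling loop, assuming a character-level model that maps a window of 100 character ids to scores over the next character, plus hypothetical char_to_ix / ix_to_char lookup tables built from the training text:

 # Greedy character-by-character sampling: feed a 100-character window,
 # pick the most likely next character, slide the window, repeat.
 function generate(model, seed::String, nchars::Int, char_to_ix, ix_to_char)
     @assert length(seed) == 100      # the model was trained on 100-char windows
     window = [char_to_ix[c] for c in seed]
     out = collect(seed)
     for _ in 1:nchars
         probs = model(window)        # scores over the character vocabulary
         nxt = argmax(probs)          # greedy pick; sampling from probs also works
         push!(out, ix_to_char[nxt])
         window = vcat(window[2:end], nxt)   # slide the window
     end
     return join(out)
 end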

Other miscellaneous contributions:

Apart from the above, I also opened an issue in Flux and created a PR in JuliaText/TextAnalysis.jl, where I added sentiment analysis functionality. The PR can be found here (W.I.P.).