The use of neural networks to build AIs that express creativity is a fascinating area of research. There is something remarkable about the idea of creating something that is itself capable of creating new things.

This project involved creating a deep learning model in PyTorch that generates text on its own, one character at a time.

The model uses a neural network component known as an LSTM (long short-term memory). At each timestep, the model takes a vector representing the character at the previous position and outputs a vector assigning a score to each possible character. These scores are then converted into a probability distribution, from which the character for that timestep is chosen.
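As a rough illustration of that last step, the sketch below turns a vector of raw scores into a probability distribution and draws one character from it. It assumes a softmax with a temperature parameter, which is a common choice for character-level models but is not confirmed as what this project used; the names `softmax`, `sample_char`, and the tiny three-character vocabulary are hypothetical.

```python
import math
import random

def softmax(scores, temperature=1.0):
    """Turn raw per-character scores into probabilities that sum to 1.

    Assumption: softmax with temperature; the post does not say exactly
    how the scores were converted into probabilities.
    """
    scaled = [s / temperature for s in scores]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_char(scores, vocab, temperature=1.0, rng=random):
    """Draw one character, weighted by its softmax probability."""
    probs = softmax(scores, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical three-character vocabulary and scores, for illustration only.
vocab = ["a", "b", "c"]
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
next_char = sample_char(scores, vocab)
```

Lower temperatures make the highest-scoring character more likely to be picked, while higher temperatures flatten the distribution and produce more varied output.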

Unlike N-gram language models, which make purely local assessments of what the next item should be based only on the previous N items, an LSTM maintains a form of memory of the earlier context and learns which parts of that context are important.
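To make the contrast concrete, here is a minimal character-level N-gram predictor. It is only a sketch for comparison and was not part of the project; the function names and the toy training string are hypothetical. Note how the prediction depends on nothing but the last N characters.

```python
from collections import Counter, defaultdict

def train_ngram(text, n):
    """Count which character follows each n-character context."""
    counts = defaultdict(Counter)
    for i in range(len(text) - n):
        context = text[i:i + n]
        counts[context][text[i + n]] += 1
    return counts

def predict_next(counts, history, n):
    """Most frequent follower of the last n characters, or None if unseen."""
    context = history[-n:]
    if context not in counts:
        return None
    return counts[context].most_common(1)[0][0]

# Toy training text, for illustration only.
counts = train_ngram("the cat sat on the mat", n=2)

# Everything before the final two characters is ignored by the model.
short_guess = predict_next(counts, "th", n=2)
long_guess = predict_next(counts, "a much longer history ending in th", n=2)
```

Both calls return the same character because the N-gram model discards everything outside its fixed window, which is exactly the limitation an LSTM's memory is meant to address.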


The model was trained on approximately 20MB of text files of classic books from Project Gutenberg. The books included a mixture of hand-chosen ones such as:

  • Mary Shelley's Frankenstein
  • Lewis Carroll's Alice's Adventures in Wonderland
  • Jules Verne's From the Earth to the Moon, Journey to the Center of the Earth, and Twenty Thousand Leagues Under the Sea
  • Oscar Wilde's The Picture of Dorian Gray
  • Jane Austen's Sense and Sensibility, and Pride and Prejudice
  • Arthur Conan Doyle's The Valley of Fear

These were accompanied by several randomly selected books to try to cover a good variety of writing styles and vocabulary.

The raw text files contained legal disclaimers before or after the main text of the books. These disclaimers were removed so as not to contaminate the model with legalese.
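The post does not say how the disclaimers were removed, so the following is just one possible sketch. It assumes the files carry the "*** START OF THIS/THE PROJECT GUTENBERG EBOOK ..." and "*** END OF ..." marker lines that Project Gutenberg plain-text files commonly contain; the exact marker wording varies between files, and the function name is hypothetical.

```python
import re

# Assumption: marker lines of this general shape delimit the book text.
# Real files vary, so the patterns may need adjusting per file.
START_RE = re.compile(
    r"^\*\*\* ?START OF (?:THIS|THE) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)
END_RE = re.compile(
    r"^\*\*\* ?END OF (?:THIS|THE) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)

def strip_gutenberg_boilerplate(raw):
    """Keep only the text between the start and end marker lines.

    If a marker is missing, fall back to the start or end of the file.
    """
    start = START_RE.search(raw)
    end = END_RE.search(raw)
    begin = start.end() if start else 0
    finish = end.start() if end and end.start() >= begin else len(raw)
    return raw[begin:finish].strip()

# Made-up miniature file, for illustration only.
raw = (
    "License header\n"
    "*** START OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "Chapter 1\n"
    "Body text.\n"
    "*** END OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "More legal text\n"
)
clean = strip_gutenberg_boilerplate(raw)
```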


The model did not start off with any concept of what a word is, so prior to any training it generated random gibberish such as:

A4;DH?ft7@$D]z'8QjznaQq*"WFqR&(jql]}9H<~ZjQ #[~UeUQx{X5wI,v.pj]AJX*DJl_`M6J_Z`+=|b(83e\F$aB!JK=yc9Ba

After training on just 10 batches, each consisting of 128 samples of 300 characters of text, it started to group letters into chunks separated by spaces.

Aucatetan atecod oe th Yion runr wen tovandn rtat   l canso whe at tonse loi
e smre picnm ttp ciolyto
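For reference, a batch of the shape described above (128 samples of 300 characters) could be drawn roughly as follows. This is a sketch under assumptions: it picks random windows from the corpus and uses the same window shifted by one character as the prediction target. The function name and the stand-in corpus are hypothetical, and the real code would also convert characters into tensors.

```python
import random

def random_batch(text, batch_size=128, seq_len=300, rng=random):
    """Return (inputs, targets) as lists of strings.

    Each target is its input shifted one character to the right, so the
    model is asked to predict the next character at every position.
    Requires len(text) > seq_len.
    """
    inputs, targets = [], []
    for _ in range(batch_size):
        start = rng.randrange(0, len(text) - seq_len)
        chunk = text[start:start + seq_len + 1]  # one extra char for the target
        inputs.append(chunk[:-1])
        targets.append(chunk[1:])
    return inputs, targets

# Stand-in corpus, for illustration only.
corpus = "the quick brown fox jumps over the lazy dog. " * 100
inputs, targets = random_batch(corpus)
```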

After an additional 100 training steps, the model was forming word-like sequences of letters that are pronounceable and could conceivably be real words.

Af Maved she shully breason.  Their recontory ester his ederity of etiad, expreat from portal though 

After several hours of training, the model generated not only sequences of mostly real words (and conceivably real words), but sequences that resemble sentences with some flow of meaning behind them. They might read like the ramblings of a crazy person, but the words are not entirely random; there is some semantic flow to the sequence of words. Notice that it has also learnt that a new sentence must start with a capital letter.

And the manner fairly perceived foo with mechanics, of care.  It listen nothing, and happiness to make shot and cushia, were successful to
the famous enemy which followed us, and he fully consented to 


And these passions at mirror and peace, like the city like than a right movement in my words, during
Antwich to revenge, and that all this care could, in Philip's memory to your uncle and the disbandy d

Compare the above outputs with the text below, which is the output of a script that chooses words completely at random from the training data.

pieces, the a I both probable the it minute warr the ce. two at discomfort admiration I complete Overeen wordy
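The baseline script itself is not shown in the post. A minimal version might look like the sketch below, which assumes that "words" means whitespace-separated tokens and that each token occurrence is equally likely to be picked, so frequent words appear more often. The function name and sample text are hypothetical.

```python
import random

def random_words(text, n_words=20, rng=random):
    """Join n_words tokens drawn independently at random from the text."""
    words = text.split()
    return " ".join(rng.choice(words) for _ in range(n_words))

# Stand-in training text, for illustration only.
training_text = "the quick brown fox jumps over the lazy dog"
baseline = random_words(training_text, n_words=20)
```

Because every word is drawn independently, any apparent grammar in the baseline output is coincidental, which is what makes it a useful point of comparison for the LSTM samples.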

Future Improvements

Given more time and computational resources, the model could potentially be improved further by experimenting with the following:

  • Training on longer sequences of text at a time.
  • Trying different initializations for the parameters, e.g., He et al. 2015.
  • Making use of batch normalization.
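As an illustration of the initialization idea in the list above, the He et al. scheme draws weights from a zero-mean normal distribution with standard deviation sqrt(2 / fan_in). The sketch below shows that rule in plain Python; it is not taken from the project, the function name and layer sizes are hypothetical, and the scheme was derived for ReLU layers, so whether it helps an LSTM is something that would have to be tested.

```python
import math
import random

def he_normal(fan_in, fan_out, rng=random):
    """Return a fan_out x fan_in weight matrix with std sqrt(2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]

# Hypothetical layer sizes; a seeded generator keeps the example repeatable.
weights = he_normal(fan_in=256, fan_out=64, rng=random.Random(0))
```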


The source code for this project can be viewed at the following link:


The project was inspired by the char-rnn model outlined in the now famous blog post by Andrej Karpathy titled "The Unreasonable Effectiveness of Recurrent Neural Networks". The code is closely based on this great tutorial by Sean Robertson, which is a PyTorch reimplementation of Andrej Karpathy's char-rnn.
