Teaching a Machine to Speak Estate Agent

Monday, September 8th, 2014
By Daniel Cooper

At Homeflow we’ve got an awful lot of property data. The nice thing about having a bunch of data is that machine learning techniques quickly become viable and we can generate some nice output.

A Markov chain is a statistical tool that mathematicians use to predict the next state of something from its current state. You’ll often see it applied to text to generate real-looking prose – a bit of Shakespeare, for instance. You’ll also see it in your spam folder: those nonsense emails that read almost-like-but-not-quite English use a form of Markov text chain.

It works by running through a big body of text and counting the number of times each word comes directly after another word. Then, to generate text, we pick a random starting word and use those counts to choose the next word, so that words which followed it more often are more likely to be picked – and we keep going until we hit a full stop, which ends our sentence.
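For the curious, here is a minimal sketch of that process in Python. It is an illustration rather than the code we actually run: the function names (build_chain, generate_sentence), the max_words cut-off and the tiny sample text are all made up for the example, and it assumes words are simply split on whitespace.

```python
import random
from collections import Counter, defaultdict


def build_chain(text):
    """Count how often each word is directly followed by each other word."""
    counts = defaultdict(Counter)
    words = text.split()  # assumption: naive whitespace tokenisation
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def generate_sentence(counts, rng, max_words=30):
    """Walk the chain from a random word until a word ends in a full stop."""
    if not counts:
        return ""
    word = rng.choice(sorted(counts))
    sentence = [word]
    # max_words is a hypothetical safety limit so the walk always terminates.
    while not word.endswith(".") and len(sentence) < max_words:
        followers = counts.get(word)
        if not followers:
            break  # this word was never seen with a successor
        # Pick the next word in proportion to how often it followed this one.
        word = rng.choices(list(followers), weights=list(followers.values()))[0]
        sentence.append(word)
    return " ".join(sentence)


if __name__ == "__main__":
    # Invented sample text, standing in for real property descriptions.
    sample = "the garden is large. the garden has a shed. the kitchen is new."
    chain = build_chain(sample)
    print(generate_sentence(chain, random.Random()))
```

Because the next word depends only on the current one, the output tends to be locally plausible but globally nonsensical, which is exactly where the fun comes from.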

So what would happen if we fed 10,000 of our latest property descriptions into a Markov generator? These are some of my favourite snippets.

  • Rent excludes the front garden predominantly laid to the advertised properties
  • Shops: Waitrose has been tested.
  • Further benefits from a doorway to the Jubilee Line zone three
  • These details are as accurate as a regular railway station.
  • One bedroom property can be used for washing machine
  • Dining hall in a position of oil and fire.