Boston House Prices

Estimating the market value of real estate is useful to both real estate agents and people selling their own houses. An agent who can give clients the most accurate estimates will build a positive reputation and attract more clientele. If the estimation process can be automated using machine learning techniques, the agent gains a potential competitive edge over rivals.

This project makes use of a subset of the classic Boston Housing Dataset provided by the UCI Machine Learning Repository. The data contains 489 samples (after filtering out outliers and missing values). From this data, the following four variables were used, the first three as input features and the last as the prediction target:

  • The average number of rooms among homes in the neighborhood.
  • Socio-economic status of the neighborhood.
  • The ratio of students to teachers in primary and secondary schools in the neighborhood.
  • Median value of owner-occupied homes (the prediction target).

A decision tree regression model was trained using scikit-learn. Grid search combined with cross-validation was used to find the best model parameters. The final model attained an R^2 value of 0.84 on a held-out portion of the dataset. Although the trained model performed well on the test set, the small size of the dataset and the age of the data mean it would probably not be robust enough to be reliable in the modern housing market. A more up-to-date dataset with more data points would be needed.
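
The training procedure described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the helper name fit_tree, the 80/20 split, the 5-fold setup, the max_depth grid, and the synthetic stand-in data are all assumptions made for the example. It shows how scikit-learn's GridSearchCV can pick a tree depth by cross-validation on the training split and then score the refit model on held-out data.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.tree import DecisionTreeRegressor


def fit_tree(X, y, seed=0):
    """Grid-search max_depth for a decision tree with k-fold cross-validation.

    Hypothetical helper for illustration; not taken from the project.
    """
    # Hold out 20% of the data for the final test (the ratio is an assumption).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # Candidate depths are illustrative; the project's actual grid is not stated.
    param_grid = {"max_depth": list(range(1, 11))}
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(
        DecisionTreeRegressor(random_state=seed),
        param_grid,
        scoring="r2",
        cv=cv,
    )
    search.fit(X_train, y_train)
    # With the default refit=True, best_estimator_ is retrained on the whole
    # training split; score() returns R^2 for regressors.
    test_r2 = search.best_estimator_.score(X_test, y_test)
    return search.best_params_, test_r2


# Synthetic stand-in data with 489 rows and 3 features. This is NOT the
# Boston dataset; it only makes the sketch runnable on its own.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(489, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0.0, 0.1, size=489)

best_params, test_r2 = fit_tree(X, y)
print(best_params, round(test_r2, 3))
```

Tuning only max_depth keeps the search small and directly controls how closely the tree can fit the training data, which matters with only a few hundred samples. Scoring on the held-out split gives an estimate that was not used during parameter selection.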

The full report and source code can be viewed at the following links: