Presentation Title

Burglary Prediction using Synthetic Training Data for Supervised Machine Learning

Format of Presentation

Poster to be presented the Friday of the conference

Abstract

Vast amounts of crime data are freely available to the public through open data initiatives. These data sets give the time and general location of hundreds of thousands of crime reports. This wealth of well-labeled information is well suited to Supervised Machine Learning, allowing analysis and prediction that can inform future civic planning and law enforcement. Unfortunately, these data sets have flaws that make them challenging to use with Machine Learning methods. Predicting future burglaries requires examples of homes that have been burgled as well as examples of homes that have not. Because no city records the locations where a crime has not occurred, only half the picture needed to make predictions exists.

To solve this problem, we constructed a set of synthetically generated times and locations that do not correspond to any known burglary report but accurately represent the geography and distribution of buildings within the city. Combining these synthetic points with the real crime data forms a complete picture on which a predictive model can be trained. Additional features were then attached to these locations. For Vancouver, these included weather and lighting data from Environment Canada, and the local density of street trees, traffic signals, and light poles from the Vancouver Open Data Portal. The resulting Machine Learning model was able to correctly classify more than 80% of burglaries in this data.
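
The idea of synthetic negative examples can be illustrated with a minimal sketch. This is not the implementation used for the poster: the function names, the input layout (buildings as coordinate pairs, reports as coordinate pairs with a timestamp), and the rule that a sample "corresponds" to a report when it shares the same building and calendar day are all assumptions made for illustration.

```python
import random
from datetime import datetime, timedelta


def synthetic_negatives(buildings, reports, n, start, end, seed=0):
    """Sample n (x, y, time) points that match no known burglary report.

    buildings: list of (x, y) building coordinates (hypothetical input).
    reports:   iterable of (x, y, datetime) burglary reports.
    Assumption: a sample collides with a report when it shares the
    same building and the same calendar day.
    """
    rng = random.Random(seed)
    known = {(x, y, t.date()) for x, y, t in reports}
    hours = int((end - start).total_seconds() // 3600)
    negatives = []
    while len(negatives) < n:
        # Drawing from the building list keeps the synthetic points on
        # the same spatial distribution as real buildings.
        x, y = rng.choice(buildings)
        t = start + timedelta(hours=rng.randrange(hours))
        # Rejection step: discard anything that matches a real report.
        if (x, y, t.date()) not in known:
            negatives.append((x, y, t))
    return negatives


def labeled_dataset(reports, negatives):
    """Label real reports 1 and synthetic points 0."""
    positives = [(x, y, t, 1) for x, y, t in reports]
    return positives + [(x, y, t, 0) for x, y, t in negatives]
```

Features such as weather or local street-tree density would then be joined to each row before training a classifier. The loop assumes that far more building and day combinations exist than reports; otherwise it would not terminate.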

The ability to generate Synthetic Training Data cheaply and quickly can be a major boon to fields such as Computational Criminology, where one-sided data sets like this one are common and lack the whole picture that common Machine Learning tools need.

Department

Computing Science

Faculty Advisor

Andrew Park
