Over the last few years, Invenio’s methods of analysing data have changed radically. We have moved from the early days of a separate Excel spreadsheet for each individual property to more recent systems that handle ever larger data sets in R and Python, and the result is now hardly recognisable.
The most recent version of the Invenio Data Analysis Tool is designed to automate our analysis procedures quickly and efficiently. By combining code that the analysts have written over the past few years, the tool can complete in a few hours an analytical procedure that once took over a week.
The tool comprises eight functions:
- Grouping properties
- Converting Tinytags to CSVs
- Finding broken loggers
- Cutting CSVs for R input
- Continuous flow identification
- Intermittent continuous flow and plumbing loss identification
- R intermittent script
- R continuous script
The “Grouping properties” function looks at the distribution of latitude and longitude in the gps_data file and downloads the appropriate map from the Google Maps API. I’ve noticed that the map can appear warped at the edges in some of the larger DMAs. Combined with some slightly inaccurate GPS data, this can shift some points off the street, which makes it difficult to tell which side of the street a property is on. This happens because the zoom setting within Google Maps is currently held constant, so in future I may be able to introduce an adaptive zoom.
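As a rough idea of what an adaptive zoom could look like, here is a minimal sketch that picks the largest zoom level at which all of the GPS points still fit on the map image. It uses the standard Web Mercator relationship (the world is 256 pixels wide at zoom 0 and doubles with each zoom level). The function names, the 640 x 640 pixel image size and the maximum zoom of 21 are assumptions for illustration, not values taken from the tool.

```python
import math

TILE_SIZE = 256        # world width in pixels at zoom 0 (Web Mercator)
MAX_LATITUDE = 85.05   # Web Mercator is undefined at the poles, so clamp


def mercator_y_fraction(lat_deg):
    """Return the latitude as a fraction (0 = top, 1 = bottom) of the world map."""
    lat_deg = max(-MAX_LATITUDE, min(MAX_LATITUDE, lat_deg))
    sin_lat = math.sin(math.radians(lat_deg))
    return 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)


def fit_zoom(lats, lons, width_px=640, height_px=640, max_zoom=21):
    """Pick the largest zoom at which every point fits inside the image.

    width_px, height_px and max_zoom are assumed values for this sketch.
    Bounding boxes that cross the 180 degree meridian are not handled.
    """
    lat_fraction = mercator_y_fraction(min(lats)) - mercator_y_fraction(max(lats))
    lon_fraction = (max(lons) - min(lons)) / 360.0

    def zoom_for(map_px, fraction):
        if fraction <= 0:          # all points share this coordinate
            return max_zoom
        return math.floor(math.log2(map_px / TILE_SIZE / fraction))

    zoom = min(zoom_for(height_px, lat_fraction), zoom_for(width_px, lon_fraction))
    return max(0, min(zoom, max_zoom))
```

For example, `fit_zoom(latitudes, longitudes)` could be called on the columns of the gps_data file before the map is requested. In practice it may be worth subtracting one from the result so that points near the boundary are not drawn right at the edge of the image.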
The “Finding broken loggers” script does exactly what it says on the tin. It looks for any values below -10 or above +30 between the user-defined start and end times. The results are written to a “Broken loggers.csv” file so that the engineer doesn’t have to trawl through Tinytags to find loggers that need to be quarantined.
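The check described above can be sketched in a few lines. This is only an illustration, not the script itself: it assumes the readings are already available as (logger ID, timestamp, value) tuples, and the function names and output column headings are made up for the example. The -10 and +30 limits and the output file name come from the description above.

```python
import csv
from datetime import datetime

LOW_LIMIT = -10.0   # readings below this are treated as faulty
HIGH_LIMIT = 30.0   # readings above this are treated as faulty


def find_broken_loggers(readings, start, end, low=LOW_LIMIT, high=HIGH_LIMIT):
    """Count out-of-range readings per logger between start and end (inclusive).

    readings is an iterable of (logger_id, timestamp, value) tuples,
    which is an assumed input format for this sketch.
    """
    broken = {}
    for logger_id, timestamp, value in readings:
        in_window = start <= timestamp <= end
        if in_window and (value < low or value > high):
            broken[logger_id] = broken.get(logger_id, 0) + 1
    return broken


def write_broken_loggers(broken, path="Broken loggers.csv"):
    """Write one row per suspect logger so the engineer can quarantine it."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["logger_id", "out_of_range_readings"])
        for logger_id in sorted(broken):
            writer.writerow([logger_id, broken[logger_id]])
```

Calling `write_broken_loggers(find_broken_loggers(readings, start, end))` would then produce the CSV. Counting the out-of-range readings, rather than just flagging the logger, makes it easier to tell a single spike from a logger that has failed completely.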
The “Continuous flow identification” script has been known to be inaccurate for some time now. The newest member of the Derby analyst team, Tom Alps, has come up with some very interesting ideas on how we can improve these results using machine learning techniques (watch this space!). We should have an updated script by the end of April.
Dan has created the “Intermittent continuous flow and plumbing loss identification” script, which has proved quite accurate. Dan has also come up with a way to visualise the results of his scripts in Excel, which will save a lot of time. Perhaps we will arrange a demonstration in the near future.
There are some bugs in the transitions between these scripts (most notably when running the R scripts through Python), but these will be fixed soon.
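One common way to run an R script from Python is to call the `Rscript` command-line program through the standard `subprocess` module, as in the sketch below. This is not necessarily how the tool does it: the function names are invented for the example and the script and file names in the usage note are placeholders. It also assumes `Rscript` is on the system PATH.

```python
import subprocess


def build_r_command(script_path, *args, rscript="Rscript"):
    """Build the command list for running an R script with arguments.

    "--vanilla" starts R without loading saved workspaces or profiles,
    which keeps runs repeatable between machines.
    """
    return [rscript, "--vanilla", str(script_path), *[str(a) for a in args]]


def run_r_script(script_path, *args):
    """Run the R script and return its standard output as text.

    check=True raises subprocess.CalledProcessError if R exits with an
    error, so a failed R step stops the pipeline instead of passing
    silently to the next script.
    """
    result = subprocess.run(
        build_r_command(script_path, *args),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

A call might look like `run_r_script("intermittent.R", "cut_data.csv")`, where both names are placeholders. Passing the command as a list, rather than a single string, avoids problems with spaces in file names such as “Broken loggers.csv”.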