Bike Rental Predictions
Bike Rental Predictions is a demo project in which we trialled a machine learning model that predicts the number of bike rentals on a given date from past results.
The model is given a set of input data that includes the date, whether the day is a holiday, the wind speed and other factors, together with the number of bike rentals recorded. From this data, the model learns patterns and connections between features using the algorithm chosen by the user, either XGBoost or LightGBM.
Once training is complete, the user can supply new data with the same parameters but without the number of bike rentals. The model then returns the predicted number of rentals for each date supplied.
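The train-then-predict flow described above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: the function names, the choice of features derived from the date and the hyperparameter values are all assumptions. The two libraries are imported lazily, so only the chosen one needs to be installed.

```python
from datetime import date


def build_features(day: date, is_holiday: bool, windspeed: float) -> list:
    """Turn one day's raw inputs into a numeric feature row (illustrative features)."""
    return [
        float(day.month),
        float(day.weekday()),  # 0 = Monday ... 6 = Sunday
        1.0 if is_holiday else 0.0,
        windspeed,
    ]


def make_regressor(algorithm: str):
    """Return an untrained regressor for the user-chosen algorithm."""
    if algorithm == "xgboost":
        from xgboost import XGBRegressor  # requires the xgboost package
        # Hyperparameter values are placeholders, not the demo's settings.
        return XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
    if algorithm == "lightgbm":
        from lightgbm import LGBMRegressor  # requires the lightgbm package
        return LGBMRegressor(n_estimators=200, learning_rate=0.1)
    raise ValueError(f"unknown algorithm: {algorithm!r}")


def train_and_predict(algorithm, train_rows, train_rentals, new_rows):
    """Fit on rows with known rental counts, then predict counts for new rows."""
    import numpy as np  # installed alongside either library

    model = make_regressor(algorithm)
    model.fit(np.asarray(train_rows), np.asarray(train_rentals))
    return model.predict(np.asarray(new_rows))
```

Calling train_and_predict("xgboost", ...) or train_and_predict("lightgbm", ...) with rows produced by build_features mirrors the two steps above: training on data that includes the rental counts, then predicting for dates where they are missing.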
This preview will show you how to train a model and use it to make predictions via the command line. In this example, we’ll use a pre-existing dataset to train a model using XGBoost and LightGBM.
Once the model is trained, we’ll use it to make predictions on a dataset where the number of rentals is not given. The command-line tool has also been extended so that it can read files from Azure Storage and store results back there.
Training via API
Here is a preview of training a model via an API. In this example, we use a pre-existing dataset to train a model with XGBoost. The results covering the model’s training performance can be saved either locally or in Azure (at the user’s request). The model’s results on the test data can also be saved to a SQL database (at the user’s request). Power BI, with the help of Python scripting, is then used to generate graphs from the table in the SQL database. These graphs can be used for further analysis of the results.
Prediction via API
Here is a preview of making predictions via the API. In this example, we use a pre-existing dataset and a trained model (XGBoost in this case) to make predictions. The results (each date and the predicted bike rentals for that date) can be saved either locally or in Azure (at the user’s request). The resulting predictions can also be saved to a SQL database (at the user’s request).
Training and prediction via forms
This preview will show you how to train a model and use it to make predictions via forms. In this example, we’ll use a pre-existing dataset to train a model using XGBoost and LightGBM.
Once the model is trained, we’ll use it to make predictions on a dataset where the number of rentals is not given.
Power BI provides the opportunity to gain a deeper understanding of the results through the generation of graphs.
One example from the bike rentals demonstration is the accuracy of the model. When predicting how many bike rentals will occur on a given day, errors are bound to occur, as predicting the future exactly is practically impossible. The graph on the left shows that, on this run, there were clear anomalies in the model’s performance, namely in March, April and November. On the same run, however, the model produced reliable predictions in months such as December (the white circle represents the mean of the errors for that month).
The takeaway is that without tools such as Power BI, it would be harder to form a complete summary of the model’s performance, which is the key reason we use such a tool.
This diagram shows a typical process of how such a project can be conducted.
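The per-month mean error marked by the white circles could be computed along the following lines. This is a sketch under assumptions: the record layout and function name are hypothetical, the error is taken as predicted minus actual, and the sample values are made up rather than taken from the demo.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def mean_error_by_month(records):
    """records: iterable of (day, actual_rentals, predicted_rentals) tuples."""
    errors = defaultdict(list)
    for day, actual, predicted in records:
        errors[day.month].append(predicted - actual)
    # One mean error per calendar month, in month order.
    return {month: mean(values) for month, values in sorted(errors.items())}


# Made-up sample values for illustration only.
sample = [
    (date(2012, 3, 1), 400, 520),
    (date(2012, 3, 2), 450, 530),
    (date(2012, 12, 1), 300, 310),
    (date(2012, 12, 2), 320, 310),
]
print(mean_error_by_month(sample))
```

A month whose mean error sits far from zero, like March in this made-up sample, is the kind of anomaly the graph makes visible at a glance.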
Our client wanted custom software to analyse and make predictions relating
to the financial market.
The software we made allows the client to:
- simulate market data
- create custom deep learning networks
- train these networks on the simulated data or on real data stored in the client database
- analyse the performance of the resulting models
- deploy selected models to an Oracle machine for continuous predictions
- Regression problem on time series
- Development of custom neural networks
- From financial market modelling, create production-ready neural network models
- Regular client meetings and direction
- Full time ML Engineering work, daily rates
- The client will use the model(s) for their trading bots
Retail Unit Closures
The client needed a retail management solution to oversee store
collections. They wanted to predict whether a retail unit would close or stay
open. We treated it as a classification problem. The data included demographic
features, geographic features, store categories and more.
Thanks to a clear roadmap that was agreed upon with the client, we were able to deliver within the budget.
We encountered several challenges while delivering the solution to the client, including:
- High volume of data in a remote Microsoft SQL Server
- Improving the model despite the lack of positives in the dataset
- Creating a solution in our Azure tenant that can run on the client’s side, using Azure Data Factory, CI/CD and Azure ML services…
- Delivery within budget
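The lack of positives mentioned in the list above is a class imbalance problem. One common mitigation, shown here only as an illustration and not necessarily the approach used in this project, is to weight the rare class by the ratio of negatives to positives. The function name and the label encoding (1 for a closed unit, 0 for an open one) are assumptions.

```python
def positive_class_weight(labels):
    """Ratio of negatives to positives; labels are 1 (closed) or 0 (open)."""
    positives = sum(1 for label in labels if label == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive examples to weight")
    return negatives / positives


# Made-up example: 5 closures among 100 retail units.
labels = [0] * 95 + [1] * 5
print(positive_class_weight(labels))
```

A ratio of this kind is what gradient boosting libraries typically expect for their positive-class weight setting, for example the scale_pos_weight parameter in XGBoost.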
Using the productionised model and its predictions, our client was able to:
- Display likelihood of closure for each store on their website
- Enrich the data they already had available
- This data could then be sold on to third parties
Research Paper Helper
How do you efficiently extract data from research articles?
This project uses computer vision, OCR and NLP to detect tables in academic
PDF files. The aim is to help make research papers more accessible and easier to use.
The process is integrated into an API with Flask, which makes it easy to use for our client.
From scraping to NLP
The following diagram outlines the different steps involved in completing this project, from designing an article scraper to performing NLP tasks to extract relevant data.
This service can be sold to educational bodies, which would help researchers to quickly find informative results.
Financial Data Extraction
In this project, we aimed to extract financial information from webpages. The information we wanted to extract was the monetary value (in Yuan) of loans mentioned in these pages, including their principal, interest, fees, etc. and the total value. These webpages were only available in Standard Chinese.
For that, we used NLP models to automatically extract this information using the scraped content of the webpages. After training these models, we deployed them into a mixed API/forms container app, which is hosted on Azure.
The training phase was an opportunity to compare different types of approaches in terms of their accuracy, speed and cost. One of the challenges was the small amount of data available for training.
The training data came from a manually curated selection of scraped URLs and was formatted as (sentence, extracted values) pairs. Because extraction was manual, only a relatively small amount of data was available for training.
For hyperparameter tuning we used the Weights & Biases platform. This allowed us to set up sweep parameters easily and inspect the results using parallel coordinates plots. Our main tools for training models were SpaCy, GPT-3 and PyTorch (for BERT).
Scoring And Evaluation
During hyperparameter optimisation, we minimised each model’s loss: the built-in NER engine loss for SpaCy, the built-in text completion loss for GPT-3 and, for BERT, the cross-entropy loss over next-token prediction.
To measure each model’s performance on its intended task, we defined several metrics relating directly to the task objective:
- The counts of correctly extracted values (“true positives”), missed values (“false negatives”) and falsely extracted values (“false positives”).
- The precision & sensitivity derived from these counts.
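The metrics listed above can be sketched as follows. The function names are hypothetical, and the sketch assumes that a value counts as correctly extracted only when it matches an expected value exactly as a string, which may differ from the matching rule used in the project.

```python
from collections import Counter


def extraction_counts(expected, extracted):
    """Return (true positives, false positives, false negatives) for one sentence."""
    expected_counts = Counter(expected)
    extracted_counts = Counter(extracted)
    tp = sum((expected_counts & extracted_counts).values())  # found and expected
    fn = sum((expected_counts - extracted_counts).values())  # expected but missed
    fp = sum((extracted_counts - expected_counts).values())  # found but not expected
    return tp, fp, fn


def precision_and_sensitivity(tp, fp, fn):
    """Precision = TP / (TP + FP); sensitivity (recall) = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return precision, sensitivity


# Made-up example: one value lost its decimals, so it counts as both a miss
# and a false extraction.
tp, fp, fn = extraction_counts(["10000.50", "500", "120"], ["10000", "500", "120"])
print(precision_and_sensitivity(tp, fp, fn))
```

Under exact matching, a value truncated at the decimal point lowers both precision and sensitivity at once.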
The SpaCy and GPT-3 models were much faster to train and use than BERT. In an attempt to improve BERT’s speed, we also compared the original and dynamically quantised versions of the fine-tuned BERT model.
SpaCy is a widely used library dedicated to natural language processing. It provides pretrained models in many different languages together with convenient pipelines (such as NER engines) and functionalities, including the ability to fine-tune a preloaded model.
We fine-tuned the NER engine constructed from the zh_core_web_lg NLP model dedicated to the Standard Chinese Language. In the hyperparameter optimisation phase, we searched through different values for the number of epochs (from 10 to 400) and dropout rate (0 or 0.1).
The underlying architecture of this model consists of deep neural networks and uses the attention mechanism.
The best model had scores of 45% precision and 52% sensitivity. This class of models seemed to be able to locate and mostly extract the right information, although the extraction was imperfect. For example, the model sometimes extracted values but ignored anything beyond the decimal point. It seems quite possible that with more training data this model would perform much better.
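To fine-tune an NER engine, (sentence, extracted values) pairs generally have to be turned into character-offset annotations. The sketch below shows one way this could be done, producing the (text, {"entities": [(start, end, label)]}) layout that SpaCy accepts for NER training examples. The function name, the LOAN_VALUE label and the sample sentence are made up for illustration, and the sketch assumes the values are listed in the order in which they appear in the sentence.

```python
def to_ner_example(sentence, values, label="LOAN_VALUE"):
    """Convert a (sentence, extracted values) pair into character-offset entities."""
    entities = []
    search_from = 0
    for value in values:
        start = sentence.find(value, search_from)
        if start == -1:
            continue  # value not present verbatim; skip rather than guess
        end = start + len(value)
        entities.append((start, end, label))
        search_from = end  # keep entities ordered and non-overlapping
    return sentence, {"entities": entities}


# Made-up sentence: "The loan principal is 10000.50 yuan, the interest is 500 yuan."
text, annotations = to_ner_example("贷款本金为10000.50元，利息为500元。", ["10000.50", "500"])
for start, end, label in annotations["entities"]:
    print(label, text[start:end])
```

Because each span covers the whole value, including the decimals, annotations built this way give the model full examples of the kind of amount it tended to truncate.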
GPT-3 is OpenAI’s most recent class of language models. Training and inference must be done via the OpenAI API. These models are known for their success in few-shot learning tasks.
The available models come in four sizes, from least to best performing: Ada, Babbage, Curie and Davinci. The cost of using a model is determined by the number of tokens passed through it. The cheapest model to use is Ada and the most expensive, by a significant margin, is Davinci.
We fine-tuned GPT-3 models using the OpenAI library. The hyperparameter search space consists of different base models (Ada, Babbage, Curie), the number of training epochs (1, 2, 3; 1-2 being the amount recommended by OpenAI) and the learning rate multiplier (0.1, 0.02).
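The search space above could be written as a Weights & Biases sweep configuration along these lines. This is a sketch: the grid method, the metric name and the parameter names are assumptions, and only the value lists come from the text. The helper expands the grid locally to show how many runs it implies.

```python
from itertools import product

# Dictionary in the layout accepted by wandb.sweep (method / metric / parameters).
sweep_config = {
    "method": "grid",  # assumed; the search strategy is not stated in the text
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        "base_model": {"values": ["ada", "babbage", "curie"]},
        "n_epochs": {"values": [1, 2, 3]},
        "learning_rate_multiplier": {"values": [0.1, 0.02]},
    },
}


def expand_grid(config):
    """List every parameter combination a grid sweep would try."""
    names = list(config["parameters"])
    value_lists = [config["parameters"][name]["values"] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


runs = expand_grid(sweep_config)
print(len(runs))  # 3 base models x 3 epoch counts x 2 multipliers = 18
```

With 18 combinations, an exhaustive grid stays affordable, and every configuration then appears as one line in a parallel coordinates plot.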
According to OpenAI documentation, the recommended amount of data for fine-tuning a GPT-3 model is a few hundred samples. Our data barely reached that threshold but this was still enough to obtain promising results.
A GPT-3 model is designed to continue the text it is given as a prompt. To use such a model as an NER engine, one needs to create prompt-completion training examples that lead the model to generate text containing the relevant information in an easily extractable form. This practice of devising prompts for large language models to perform an NLP task is called “prompt engineering”.
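A prompt-completion training example of this kind might look like the sketch below. The separator, the stop marker, the field names and the "name=value" completion layout are illustrative assumptions, not the format used in the project. The one fixed convention is that each training record is a JSON object with "prompt" and "completion" keys, written one per line.

```python
import json

PROMPT_END = "\n\n###\n\n"  # assumed separator marking where the prompt stops
COMPLETION_END = " END"     # assumed stop marker closing the completion


def to_training_record(sentence, values):
    """Build one prompt/completion record from a sentence and its extracted values."""
    body = "; ".join(f"{name}={amount}" for name, amount in values.items())
    return {"prompt": sentence + PROMPT_END, "completion": " " + body + COMPLETION_END}


def parse_completion(text):
    """Recover the extracted values from text generated by the model."""
    body = text.split(COMPLETION_END)[0].strip()
    if not body:
        return {}
    return dict(item.split("=", 1) for item in body.split("; "))


# Made-up sentence and values for illustration.
record = to_training_record(
    "贷款本金为10000.50元，利息为500元。",
    {"principal": "10000.50", "interest": "500"},
)
print(json.dumps(record, ensure_ascii=False))  # one line of the JSONL training file
print(parse_completion(record["completion"]))
```

Keeping the completion in a rigid layout is what makes the generated text easy to parse back into values at inference time.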
Despite the small amount of training data, the fine-tuned models often reliably extracted the relevant information from the input. Once we fine-tuned the Ada base model, we found that it was good enough for the task at hand.
The best models reached a precision and sensitivity of at least 90% (for both train and test slices). Even the smallest models (based on Ada) reached these levels, which made them suitable for use, especially considering their reduced cost and computation time compared to the larger models.
The cost of inference with an Ada model is $0.0004 per 1,000 tokens. A sentence generally consists of fewer than 100 tokens.
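As a worked example of these figures, the upper bound on inference cost can be computed as follows. The function name is hypothetical; the price and the 100-token bound are the values quoted above.

```python
from decimal import Decimal

PRICE_PER_1000_TOKENS = Decimal("0.0004")  # USD, Ada inference price quoted above
MAX_TOKENS_PER_SENTENCE = 100              # upper bound quoted above


def inference_cost(n_sentences, tokens_per_sentence=MAX_TOKENS_PER_SENTENCE):
    """Upper bound on the inference cost in USD for a number of sentences."""
    total_tokens = n_sentences * tokens_per_sentence
    return PRICE_PER_1000_TOKENS * total_tokens / 1000


print(inference_cost(1))          # cost bound for a single sentence
print(inference_cost(1_000_000))  # cost bound for one million sentences
```

At these rates a single sentence costs at most $0.00004, so processing a million sentences would cost no more than about $40.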
We wanted to use a text completion model with training samples formatted in a similar way as for GPT-3, in order to replicate the methods used with GPT-3 on a personal computer.
A natural choice of model would have been GPT-2, since it has a decoder architecture like GPT-3. Unfortunately, fine-tuning GPT-2 required more memory than was available on the computer. Despite being an autoencoder, BERT can still be used for text completion. Much smaller than GPT-2, BERT can be fine-tuned on a personal computer.
The base model was the bert-base-chinese model available on Hugging Face. Despite the relatively smaller size, training and inference were rather slow. In an attempt to address this issue we applied dynamic quantisation on the fine-tuned model and evaluated its effect on the performance of the model.
The search space for hyperparameter optimisation consisted of the number of epochs (5-8) and the learning rate (5×10⁻⁶, 10⁻⁵, 5×10⁻⁵, 10⁻⁴).
The scores of these models were generally very low. Even if a model sometimes appeared to catch the expected information, this wasn’t sufficient to compete with the other types of model. Producing a functional NER engine with this type of model would require finding more appropriate training configurations, and much more training data.
This training procedure was, of course, not the optimal way of using a model such as BERT. A more standard procedure would use the BERT internal representation as an embedding to input to a classifier.
Quantisation consists of replacing costly floating-point computations with integer operations. This procedure reduced the memory size of the BERT model by ~60% (from ~410 MB to ~170 MB) and the computation time of the full inference pipeline (not just the quantised part) by ~35%.
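The core idea of quantisation can be illustrated with a small pure-Python sketch that maps floating-point weights to 8-bit integers and back. This is a simplified symmetric scheme with made-up weights, not the procedure applied to BERT. In PyTorch, dynamic quantisation of a model's linear layers is available through torch.quantization.quantize_dynamic.

```python
def quantise_int8(weights):
    """Symmetric quantisation: map floats to integers in [-127, 127] plus a scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale


def dequantise(quantised, scale):
    """Approximate the original floats from the stored integers."""
    return [q * scale for q in quantised]


weights = [0.82, -1.27, 0.05, 0.4]  # made-up values
q, scale = quantise_int8(weights)
restored = dequantise(q, scale)
print(q)
print(max(abs(r - w) for r, w in zip(restored, weights)))  # worst-case rounding error
```

Each weight then needs one byte instead of the four used by a 32-bit float, at the price of a rounding error of at most half the scale. An overall reduction smaller than 75% would be consistent with only part of the model being quantised.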
Those results were encouraging but still insufficient to make these models practical. Another limitation of dynamic quantisation is that the original model is required to load the quantised model. The quantisation approach could potentially be improved by using static quantisation or distillation (for example: DistilBERT).