ADAS (Advanced Driver Assistance Systems) are electronic systems designed to enhance vehicle safety and driving comfort by assisting drivers with various tasks. These systems use sensors, cameras, and other technologies to monitor the vehicle’s surroundings and provide warnings or automated actions to prevent accidents.

In the rapidly evolving world of autonomous and semi-autonomous driving, ADAS play a crucial role in enhancing vehicle safety, efficiency, and user experience. For Electric Vehicles (EVs), ADAS features like automatic braking, lane correction, and speed maintenance are even more vital due to unique factors such as battery management and regenerative braking. In this blog post, we’ll dive into building a predictive model using Random Forest to forecast ADAS outputs based on real-time vehicle data. We’ll use a dataset from Kaggle, preprocess it, train the model, evaluate its performance, and even show how to deploy it for predictions.
Introduction to ADAS and the Dataset
ADAS refers to a suite of technologies that assist drivers in navigating roads safely. Common actions include braking to avoid obstacles, correcting lane deviations, maintaining a steady speed, or accelerating when it is safe. For EVs, these systems must also account for energy consumption, battery levels, and regenerative braking to optimise range and performance.
Credits
This blog post is based on the “ADAS_EV_Dataset.csv” dataset. Special thanks to the dataset creator on Kaggle: ziya07/adas-ev-dataset.
The dataset we’re using, ADAS_EV_Dataset.csv, contains simulated or collected data from EV driving scenarios. It includes features such as vehicle speed, acceleration, and weather conditions, with the target variable being the ADAS output (e.g., “Brake”, “Lane Correct”, “Maintain Speed”, or “Accelerate”). The dataset has 10,000 entries, making it suitable for machine learning tasks.
Key highlights:
- Numerical Features: speed_kmh, acceleration_mps2, brake_intensity, battery_level, energy_consumption, regen_braking_usage, lane_deviation, obstacle_distance, traffic_density, steering_angle, reaction_time.
- Categorical Features: weather_condition (e.g., Sunny, Rainy, Foggy, Snowy), road_type (e.g., Highway, Urban, Rural).
- Target: ADAS_output (multi-class: Brake, Lane Correct, Maintain Speed, Accelerate).
- Other: Timestamp (dropped as it’s not predictive).
Model Objective
The goal is to predict ADAS_output using the other features, treating this as a multi-class classification problem. We evaluated several candidate models:
- Random Forest
- Gradient Boosting (XGBoost)
- Logistic Regression
- Neural Network
Random Forest is an excellent fit for this task thanks to its robustness against overfitting and its ability to handle mixed data types. After testing, Random Forest emerged as the best model, achieving an outstanding accuracy of 99.95% on the test set.
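The model comparison above can be sketched as a simple cross-validation loop. The snippet below is a minimal illustration on a small synthetic stand-in dataset (the real notebook compared the models on the full CSV); XGBoost and the neural network are omitted to keep it dependency-free:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real dataset (illustration only)
rng = np.random.default_rng(42)
X = pd.DataFrame({'speed_kmh': rng.uniform(0, 120, 400),
                  'obstacle_distance': rng.uniform(0, 100, 400)})
# A simple rule-based label so the comparison has signal to find
y = np.where(X['obstacle_distance'] < 20, 'Brake', 'Maintain Speed')

results = {}
for name, model in [('Random Forest', RandomForestClassifier(random_state=42)),
                    ('Logistic Regression', LogisticRegression(max_iter=1000))]:
    results[name] = cross_val_score(model, X, y, cv=5).mean()
print(results)
```

With the real dataset, each candidate would be scored the same way and the highest-scoring model selected.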
Data Exploration
Before modelling, we need to understand the data. Loading the CSV into a Pandas DataFrame reveals no missing values in the sample; any potential NAs would simply be dropped. A quick df.info() shows:
- 10,000 rows, 15 columns.
- Mix of floats, ints, and objects (for categoricals).
Visualising distributions (e.g., histograms of speed or battery level) or correlations could reveal insights, like how low obstacle_distance often correlates with “Brake” actions. The notebook notes that the timestamp is irrelevant for prediction, so we drop it early.
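The visualisations mentioned above might look like the sketch below. This uses a small synthetic stand-in frame (with the real notebook, df would be the loaded dataset), and the figures are saved to hypothetical file names for a headless environment:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # render without a display (e.g. in scripts)
import matplotlib.pyplot as plt
import seaborn as sns

# Small synthetic frame standing in for the loaded dataset
rng = np.random.default_rng(0)
df = pd.DataFrame({'speed_kmh': rng.uniform(0, 120, 200),
                   'battery_level': rng.uniform(10, 100, 200),
                   'obstacle_distance': rng.uniform(0, 100, 200)})

# Distribution of a numerical feature
df['speed_kmh'].hist(bins=30)
plt.title('speed_kmh distribution')
plt.savefig('speed_hist.png'); plt.close()

# Correlation heatmap across numerical features
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title('Feature correlations')
plt.savefig('correlations.png'); plt.close()
```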

Building ML Model
Follow steps 1 to 6 below to create the ML model in Python.
Refer to the attached Random Forest model notebook built in Python, “Road segmentation for ADAS- Random Forest.ipynb”.
The notebook includes code for downloading the dataset via the Kaggle API. Note: create your own Kaggle API token (kaggle.json) to download the dataset.
Step 1: Setting Up the Environment and Downloading the Dataset
We start by setting up the Python environment and downloading the dataset from Kaggle. Here’s the code to get started in a Google Colab environment:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import classification_report, accuracy_score
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
import os
import shutil
import matplotlib.pyplot as plt
import seaborn as sns
# Mount Google Drive for Kaggle API (in Colab)
print("Step 1: Mounting Google Drive for Kaggle API")
from google.colab import drive
drive.mount('/content/drive')
# Configure Kaggle API
drive_json_path = "/content/drive/MyDrive/kaggle.json"
kaggle_dir = "/root/.kaggle"
os.makedirs(kaggle_dir, exist_ok=True)
shutil.copy(drive_json_path, f"{kaggle_dir}/kaggle.json")
os.chmod(f"{kaggle_dir}/kaggle.json", 0o600)  # octal permissions: owner read/write only
print("Step 2: Kaggle API key configured successfully!")
# Download the dataset
print("Step 3: Downloading ADAS_EV_Dataset from Kaggle")
!kaggle datasets download -d ziya07/adas-ev-dataset -p /content
!unzip /content/adas-ev-dataset.zip -d /content/data
This downloads ADAS_EV_Dataset.csv, which includes rows like:
timestamp,speed_kmh,acceleration_mps2,brake_intensity,...,ADAS_output
2023-01-01 00:00:00,44.9448142616835,-0.7581550891998088,...,Brake
Step 2: Data Exploration
We load the dataset, drop the timestamp, and explore its structure:
# Load and explore the dataset
df = pd.read_csv('/content/data/ADAS_EV_Dataset.csv')
df = df.drop('timestamp', axis=1)
print("Dataset Head:\n", df.head())
print("\nDataset Info:")
df.info()  # df.info() prints directly and returns None, so don't wrap it in print()
print("\nTarget Distribution:\n", df['ADAS_output'].value_counts())
Key findings:
- No missing values.
- Numerical features vary widely (e.g., speed_kmh from low to 117+).
- Categorical features (weather_condition, road_type) need encoding.
- The target variable is balanced across four classes.
Step 3: Data Preprocessing
We preprocess the data by scaling numerical features and encoding categorical ones:
# Separate features and target
X = df.drop('ADAS_output', axis=1)
y = df['ADAS_output']
# Define features
categorical_features = ['weather_condition', 'road_type']
numerical_features = X.select_dtypes(include=['float64', 'int64']).columns.tolist()
# Preprocessing pipeline
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numerical_features),
        ('cat', OneHotEncoder(), categorical_features)
    ])
# Random Forest pipeline
rf_classifier = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(random_state=42))
])
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
This ensures numerical features are standardised and categorical features are one-hot encoded.
Step 4: Model Training and Evaluation
We train the Random Forest model and evaluate its performance:
# Train the model
rf_classifier.fit(X_train, y_train)
# Make predictions
y_pred = rf_classifier.predict(X_test)
# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
# Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=rf_classifier.classes_, yticklabels=rf_classifier.classes_)
plt.title('Confusion Matrix - Random Forest Classifier')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()
The Random Forest model achieves an accuracy of 99.95%, outperforming the other candidates (XGBoost, Logistic Regression, and the Neural Network). The confusion matrix shows near-perfect classification across all classes.
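An accuracy this high is worth sanity-checking with cross-validation, so it isn't an artifact of one lucky train/test split. Below is a sketch of cross-validating the full pipeline; it uses synthetic stand-in data, whereas in the notebook X and y would come from the real CSV:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in; in the notebook, X and y come from ADAS_EV_Dataset.csv
rng = np.random.default_rng(3)
X = pd.DataFrame({'speed_kmh': rng.uniform(0, 120, 300),
                  'road_type': rng.choice(['Highway', 'Urban', 'Rural'], 300)})
y = np.where(X['speed_kmh'] > 80, 'Maintain Speed', 'Accelerate')

pipe = Pipeline([
    ('preprocessor', ColumnTransformer([
        ('num', StandardScaler(), ['speed_kmh']),
        # handle_unknown='ignore' guards against categories unseen in a fold
        ('cat', OneHotEncoder(handle_unknown='ignore'), ['road_type'])])),
    ('classifier', RandomForestClassifier(random_state=42)),
])
scores = cross_val_score(pipe, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Cross-validating the pipeline (rather than pre-transformed data) also ensures the scaler and encoder are refit on each training fold, avoiding data leakage.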

Step 5: Feature Importance
Random Forest provides feature importance scores, highlighting which variables drive predictions:
# Feature Importance
importances = rf_classifier.named_steps['classifier'].feature_importances_
feature_names = (numerical_features + rf_classifier.named_steps['preprocessor'].named_transformers_['cat'].get_feature_names_out().tolist())
feature_importance = pd.DataFrame({'feature': feature_names, 'importance': importances}).sort_values('importance', ascending=False)
print("Feature Importances:\n", feature_importance)
The printed table shows that speed_kmh, lane_deviation, and obstacle_distance are the most influential features, aligning with ADAS priorities like collision avoidance and lane-keeping.
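A horizontal bar chart often communicates these scores better than the raw table. The sketch below uses made-up importance values purely as stand-ins for the real ones, and saves the figure to a hypothetical file name:

```python
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # render without a display
import matplotlib.pyplot as plt

# Illustrative importance values (stand-ins for the model's real scores)
feature_importance = pd.DataFrame({
    'feature': ['speed_kmh', 'lane_deviation', 'obstacle_distance', 'battery_level'],
    'importance': [0.30, 0.25, 0.20, 0.05],
})

# Sort ascending so the largest bar ends up at the top of the chart
top = feature_importance.sort_values('importance', ascending=True)
top.plot.barh(x='feature', y='importance', legend=False)
plt.title('Random Forest feature importances')
plt.tight_layout()
plt.savefig('feature_importances.png')
plt.close()
```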
Step 6: Saving and Using the Model
We save the model and create a prediction function:
import joblib
# Save the trained pipeline
joblib.dump(rf_classifier, 'adas_rf_model.pkl')
# Optional: the pipeline already predicts class names directly,
# so a separate label encoder is not strictly needed here
joblib.dump(LabelEncoder().fit(y), 'adas_label_encoder.pkl')
# Prediction function
def predict_adas(input_data):
    loaded_model = joblib.load('adas_rf_model.pkl')
    columns = ['speed_kmh', 'acceleration_mps2', 'brake_intensity', 'battery_level', 'energy_consumption', 'regen_braking_usage', 'lane_deviation', 'obstacle_distance', 'traffic_density', 'weather_condition', 'road_type', 'steering_angle', 'reaction_time']
    new_df = pd.DataFrame([input_data], columns=columns)
    return loaded_model.predict(new_df)[0]
# Example prediction
example_input = [114.08571676918994, -1.002527423, 0.18451199559871156, 48.63278308574834, 0.2801537728079102, 49.45170437193704, 1.5244075003852675, 82.85046196765872, 23, 'Foggy', 'Urban', -11.51297273, 2.3188701356511405]
print("Predicted Action:", predict_adas(example_input))
Output: Predicted Action: Lane Correct
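Beyond the predicted label, a deployed ADAS helper would usually want a confidence score, which Random Forest exposes via predict_proba. The snippet below trains a tiny illustrative model on synthetic data just to demonstrate the API; the saved pipeline above exposes the same method:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Tiny illustrative model; the real saved pipeline has the same predict_proba API
rng = np.random.default_rng(1)
X = pd.DataFrame({'obstacle_distance': rng.uniform(0, 100, 300)})
y = np.where(X['obstacle_distance'] < 20, 'Brake', 'Maintain Speed')
clf = RandomForestClassifier(random_state=42).fit(X, y)

# Per-class probabilities for a single new reading
probs = clf.predict_proba(pd.DataFrame({'obstacle_distance': [5.0]}))[0]
for cls, p in zip(clf.classes_, probs):
    print(f"{cls}: {p:.2f}")
```

A downstream controller could, for example, only act on predictions whose top probability exceeds a chosen threshold.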
Conclusion
We’ve built a highly accurate Random Forest model for predicting ADAS actions in EVs, leveraging preprocessing, training, and evaluation. This approach demonstrates how machine learning can enhance vehicle safety and efficiency. Future improvements could include hyperparameter tuning (e.g., GridSearchCV) or testing on real-world data to handle edge cases like sensor noise.
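The hyperparameter tuning mentioned above could be sketched as follows. This runs GridSearchCV on synthetic stand-in data with a deliberately small grid; with the real pipeline, parameter names would be prefixed with the step name (e.g. 'classifier__n_estimators'):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (illustration only)
rng = np.random.default_rng(7)
X = pd.DataFrame({'speed_kmh': rng.uniform(0, 120, 200),
                  'lane_deviation': rng.uniform(-2, 2, 200)})
y = np.where(abs(X['lane_deviation']) > 1, 'Lane Correct', 'Maintain Speed')

# Small illustrative grid; a real search would cover more values
param_grid = {'n_estimators': [100, 200], 'max_depth': [None, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, f"{search.best_score_:.3f}")
```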