My Experience in the CSIRO Biomass Estimation Kaggle Competition

I recently took part in the CSIRO Biomass Image-to-Biomass Kaggle competition along with my teammate Dhvanil Sheth. Here's a deep dive into the challenges we faced and how we tackled them.

The Competition

We were given a dataset of images captured from grasslands across 4 states in Australia, covering a wide range of plant species and tropical conditions. The task was to predict 5 biomass content labels, each ranging from 0 to 1.

Sounds like a straightforward regression task, right? But as we dug deeper into the dataset, we uncovered several significant challenges.

Key Challenges

1. Image Preprocessing Complexity

We had to figure out which preprocessing techniques would handle the variable lighting and inconsistent camera angles across images. We also had to filter out bad examples that would hurt model training.

2. Physical Constraints on Labels

The target labels had physical constraints that needed to be baked into the model training. For example, Label Z = Label X + Label Y. Ignoring these constraints would produce physically impossible predictions.

3. Metadata Dependency

Metadata such as plant location, state, and date provided a huge performance boost during training, but none of it was available during testing. This was arguably both the biggest hint and the biggest challenge of the competition.

4. Small Dataset

The dataset had only 1,162 images total. CNN-based models typically require more images to perform well, so there was a real risk of overfitting on the majority species class.

5. Target Imbalance & Variance

Certain targets, like Clover, were sparse with low variance, while others had very high variance. This made it difficult to train a single model that performed well across all targets.

Our Approach

01. Classical Image Preprocessing

We found that classical preprocessing techniques worked best: improving contrast using CLAHE, cropping, and HSV scale conversions helped address the variable lighting and camera angle challenges effectively.
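To illustrate the contrast-improvement idea, here is a minimal global histogram equalization in plain NumPy. This is a simplified stand-in for CLAHE, which refines the same mapping by equalizing small tiles under a clip limit; the function name and toy image are illustrative, not from our actual pipeline.

```python
import numpy as np

def equalize_histogram(gray):
    """Global histogram equalization on a uint8 grayscale image.

    CLAHE improves on this by equalizing small tiles with a clip limit,
    which handles locally uneven lighting much better.
    """
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    # Map each intensity so the output histogram is roughly uniform
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[gray]

# Low-contrast toy image: intensities squeezed into [100, 140]
img = np.random.default_rng(0).integers(100, 141, size=(32, 32)).astype(np.uint8)
out = equalize_histogram(img)
print(out.min(), out.max())  # stretched toward the full 0-255 range
```

In practice, libraries such as OpenCV provide CLAHE directly; the point of the sketch is only the lookup-table mechanism underneath.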

02. Constrained Output Layer

Instead of predicting all 5 labels independently, we predicted only 3 labels in the final output layer and derived the constrained labels mathematically. The full set of predicted-plus-derived labels was fed to the loss function, which improved our final score by 3%.
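A framework-agnostic sketch of the idea (NumPy here; the real head lives inside the deep-learning model): output three free values and reconstruct the constrained labels before computing the loss. The label names and the second derived "aggregate" label are placeholders for illustration; only the z = x + y constraint appears in the post.

```python
import numpy as np

def assemble_predictions(free_outputs):
    """free_outputs: (batch, 3) raw predictions for the independent labels.

    Returns (batch, 5): the 3 free labels plus 2 derived ones, so every
    prediction satisfies the physical constraints by construction
    (illustrated with z = x + y, as described in the post).
    """
    x, y, w = free_outputs[:, 0], free_outputs[:, 1], free_outputs[:, 2]
    z = x + y           # constrained label: derived, never predicted directly
    total = x + y + w   # placeholder second derived label (an assumption)
    return np.stack([x, y, w, z, total], axis=1)

preds = assemble_predictions(np.array([[0.2, 0.1, 0.3]]))
# Derived columns obey the constraint exactly, with no penalty term needed
assert np.allclose(preds[0, 3], preds[0, 0] + preds[0, 1])
```

Because the constraint holds by construction, the loss never has to trade accuracy against physical plausibility.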

03. Teacher-Student Distillation

This was our key insight. We used a teacher-student distillation setup where:

  • The teacher was trained with metadata + a large DINOv2 backbone
  • The student learned to be confident in predictions using signals from both ground truth and the teacher's predictions

The teacher achieved very good validation metrics, while the student typically scored somewhat lower. This was expected: the student was a lightweight model trained with a weighted loss that balances the teacher's pseudo-labels against the ground truth.
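The student's objective can be sketched as a weighted blend of the two supervision signals. The alpha value and the use of plain MSE for both terms are assumptions for illustration, not the exact loss we used.

```python
import numpy as np

def student_loss(student_pred, teacher_pred, ground_truth, alpha=0.7):
    """Blend supervision from ground truth with the teacher's pseudo-labels.

    alpha weights the hard (ground-truth) term; (1 - alpha) weights the
    distillation term. Both are plain MSE here, purely for illustration.
    """
    hard = np.mean((student_pred - ground_truth) ** 2)
    soft = np.mean((student_pred - teacher_pred) ** 2)
    return alpha * hard + (1 - alpha) * soft

loss = student_loss(np.array([0.3]), np.array([0.35]), np.array([0.4]))
```

Tuning alpha trades off how much the student trusts the teacher's smoother pseudo-labels versus the noisier ground truth.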

04. K-Fold Validation with TTA

We experimented with our own custom folds for K-Fold cross-validation combined with test time augmentation (TTA). Our final predictions were the average across all folds and augmentations.
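Conceptually, the final prediction is a mean over the fold-by-augmentation grid. A sketch with dummy predictions (the array shapes and fold/TTA counts are placeholders):

```python
import numpy as np

def ensemble_predictions(preds):
    """preds: (n_folds, n_tta, n_samples, n_targets) predictions from each
    fold model on each test-time augmentation of the test set.

    Returns the (n_samples, n_targets) average used as the submission.
    """
    return preds.mean(axis=(0, 1))

rng = np.random.default_rng(42)
preds = rng.uniform(0, 1, size=(5, 4, 10, 5))  # e.g. 5 folds, 4 TTA views
final = ensemble_predictions(preds)
assert final.shape == (10, 5)
```

Averaging across folds smooths out split-specific noise, while averaging across augmentations makes each prediction less sensitive to a particular crop or lighting variant.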

05. Per-Target Normalization & Weighted Metrics

We applied per-target normalization for every fold and used a weighted R2 metric to evaluate model training, ensuring balanced performance across all biomass targets.
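A weighted R2 can be computed per target and then combined; the equal weights below are placeholders, not the competition's actual weighting.

```python
import numpy as np

def weighted_r2(y_true, y_pred, weights):
    """Per-target R^2 combined with target weights (assumed to sum to 1).

    y_true, y_pred: (n_samples, n_targets); weights: (n_targets,).
    """
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    r2 = 1 - ss_res / ss_tot
    return float(np.dot(weights, r2))

y_true = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
score = weighted_r2(y_true, y_true, np.array([0.5, 0.5]))
# Perfect predictions give a weighted R^2 of exactly 1.0
```

Weighting per target keeps a single high-variance target from dominating the metric, which matches the imbalance challenge described earlier.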