AI4Forest
diff --git a/README.md b/README.md  (+51 -1)
diff --git a/figures/global_and_regional_comparison.png b/figures/global_and_regional_comparison.png  (1.14 MB)
diff --git a/figures/global_canopy_height.png b/figures/global_canopy_height.png  (915 KB)
diff --git a/figures/pipeline.png b/figures/pipeline.png  (413 KB)
diff --git a/scripts/compute_dataset_percentiles.py b/scripts/compute_dataset_percentiles.py  (+88)
diff --git a/scripts/compute_dataset_statistics.py b/scripts/compute_dataset_statistics.py  (+64)
diff --git a/training/.DS_Store b/training/.DS_Store  (6 KB)
diff --git a/training/config.py b/training/config.py  (+170)
@@ -1,2 +1,52 @@
 # Estimating Canopy Height at Scale [ICML2024]
-# [UNDER CONSTRUCTION, CODE COMING SOON]
+
+[Jan Pauls](https://www.wi.uni-muenster.de/de/institut/dasc/personen/jan-pauls), [Max Zimmer](https://maxzimmer.org), [Una M. Kelly](https://www.wi.uni-muenster.de/de/institut/dasc/personen/una-kelly), [Martin Schwartz](https://www.researchgate.net/profile/Martin-Schwartz-6), [Sassan Saatchi](https://science.jpl.nasa.gov/people/Saatchi/), [Philippe Ciais](https://www.lsce.ipsl.fr/Phocea/Pisp/index.php?nom=philippe.ciais), [Sebastian Pokutta](https://pokutta.com), [Martin Brandt](https://www.researchgate.net/profile/Martin-Brandt-2), [Fabian Gieseke](https://www.wi.uni-muenster.de/department/dasc/people/fabian-gieseke)
+
+
+[[`Paper`](http://arxiv.org/abs/2406.01076)] [[`Google Earth Engine viewer`](https://worldwidemap.projects.earthengine.app/view/canopy-height-2020)] [[`BibTeX`](#citing-the-paper)]
+
+![Global canopy height map](figures/global_canopy_height.png)
+
+We propose a framework for **global-scale canopy height estimation** based on satellite data. Our model leverages advanced data preprocessing techniques, uses a novel loss function designed to counter geolocation inaccuracies inherent in the ground-truth height measurements, and employs data from the Shuttle Radar Topography Mission to filter out erroneous labels in mountainous regions, which makes our predictions in those areas more reliable. Comparing predictions with ground-truth labels yields an MAE / RMSE of 2.43 / 4.73 meters overall and 4.45 / 6.72 meters for trees taller than five meters, a substantial improvement over existing global-scale maps. The resulting height map and the underlying framework will facilitate and enhance ecological analyses at a global scale, including, but not limited to, large-scale forest and biomass monitoring.
+
+![Overview of the canopy height estimation pipeline](figures/pipeline.png)
+
+A comparison of our map with two other existing global height maps (Lang et al., Potapov et al.) and with a regional map for France shows a clear improvement in visual quality. Our map closely matches the regional map, although quality differences remain in some regions (e.g., column 8).
+
+![Global and regional comparison](figures/global_and_regional_comparison.png)
+
+## Interactive Google Earth Engine viewer
+We uploaded our canopy height map to Google Earth Engine and created a [GEE app](https://worldwidemap.projects.earthengine.app/view/canopy-height-2020) that lets users visualize the map globally and compare it with other existing products. If you want to build your own app or download or use our map in another way, you can access it under the following asset ID:
+
+```javascript
+// Load the canopy height map as an ImageCollection
+var canopy_height_2020 = ee.ImageCollection('projects/worldwidemap/assets/canopyheight2020');
+// To display it on the map, mosaic the collection into a single image
+var canopy_height_2020_mosaic = canopy_height_2020.mosaic();
+```
+
+## Acknowledgements
+
+This paper is part of the project *AI4Forest*, which is funded by the
+German Aerospace Center
+([DLR](https://www.dlr.de)), the
+German Federal Ministry of Education and Research
+([BMBF](https://www.bmbf.de/bmbf/en/home/home_node.html)) and the French
+National Research Agency ([ANR](https://anr.fr/en/)). Further,
+calculations (or parts of them) for this publication were performed on
+the HPC cluster PALMA II of the University of Münster, subsidised by the
+DFG (INST 211/667-1).
+
+## Citing the paper
+
+If you use our map in your research, please cite it using the following BibTeX:
+
+```bibtex
+@article{pauls2024estimating,
+      title={Estimating Canopy Height at Scale}, 
+      author={Jan Pauls and Max Zimmer and Una M. Kelly and Martin Schwartz and Sassan Saatchi and Philippe Ciais and Sebastian Pokutta and Martin Brandt and Fabian Gieseke},
+      year={2024},
+      eprint={2406.01076},
+      archivePrefix={arXiv}
+}
+```
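The MAE / RMSE values quoted in the README are pixel-wise errors between predicted and reference canopy heights, reported overall and for trees taller than five meters. A minimal NumPy sketch of such a computation follows; it is not this repository's evaluation code, and the function name `height_errors`, the use of NaN for unlabeled pixels, and selecting the taller-than-five-meters subset on the reference height are assumptions.

```python
import numpy as np


def height_errors(pred, target, min_height=None):
    """Return (MAE, RMSE) in meters between predicted and reference heights.

    Assumption: unlabeled pixels carry NaN in `target` and are ignored.
    Assumption: if min_height is given, only pixels whose *reference* height
    exceeds it are scored (e.g. min_height=5.0 for trees taller than 5 m).
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mask = ~np.isnan(target)
    if min_height is not None:
        mask &= target > min_height
    err = pred[mask] - target[mask]
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    return mae, rmse


if __name__ == "__main__":
    pred = np.array([1.0, 4.0, 12.0, 20.0, 7.0])       # predicted heights in m
    target = np.array([0.0, 6.0, 10.0, 24.0, np.nan])  # reference heights, NaN = no label
    print(height_errors(pred, target))                  # all labeled pixels
    print(height_errors(pred, target, min_height=5.0))  # reference taller than 5 m
```

On the toy arrays, the last pixel is skipped because its reference is NaN, and the `min_height=5.0` call scores only the three pixels whose reference height exceeds 5 m.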
@@ -0,0 +1,88 @@
+import os
+import torch
+from torch.utils.data import DataLoader
+from torchvision import transforms
+from tqdm.auto import tqdm
+# Project modules: PreprocessedSatelliteDataset is defined in training/config.py
+from config import PreprocessedSatelliteDataset
+from runner import Runner
+
+def update_extremes(values, extremes, num_extremes, largest=True):
+    """
+    Update the list of extreme values (either largest or smallest) based on the new batch.
+    """
+    combined = torch.cat((extremes, values))
+    sorted_values, _ = torch.sort(combined, descending=largest)
+    return sorted_values[:num_extremes]
+
+def compute_percentiles(dataset_name, split, percentiles, num_workers_default=4):
+    # Set up dataset and DataLoader
+    rootPath = Runner.get_dataset_root(dataset_name=dataset_name)
+    dataframe = os.path.join(rootPath, f'{split}.csv')
+
+    train_transforms = transforms.Compose([
+        transforms.ToTensor(),
+    ])
+
+    dataset = PreprocessedSatelliteDataset(data_path=rootPath, dataframe=dataframe,
+                                           image_transforms=train_transforms,
+                                           use_weighted_sampler=None, use_memmap=True)
+    # Percentiles are taken over all pixels: count values per channel (patches x H x W), not patches
+    sample_image, _ = dataset[0]
+    total_data_points = len(dataset) * sample_image.shape[-2] * sample_image.shape[-1]
+    num_channels = sample_image.shape[0]  # 14 for ai4forest_camera
+
+    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+    num_workers = num_workers_default * torch.cuda.device_count()
+    dataloader = DataLoader(dataset, batch_size=32, shuffle=False, num_workers=num_workers, pin_memory=torch.cuda.is_available())
+
+    # Initialize percentile tracking
+    extremes = {channel: {p: torch.tensor([]).to(device) for p in percentiles} for channel in range(num_channels)}
+
+    # Process each batch
+    with torch.no_grad():
+        for data, _ in tqdm(dataloader):
+            data = data.to(device=device, non_blocking=True)
+            # Move the channel dimension (currently dim 1) to dim 0: (B, C, H, W) -> (C, B, H, W)
+            data = data.permute(1, 0, 2, 3)
+            # Flatten the data
+            data = data.flatten(start_dim=1)
+
+            for channel in range(num_channels):
+                channel_data = data[channel, :]
+
+                for percentile in percentiles:
+                    if percentile < 50:
+                        num_extremes = max(1, int(total_data_points * percentile / 100))
+                        largest = False
+                    else:
+                        num_extremes = max(1, int(total_data_points * (100 - percentile) / 100))    # E.g. if percentile == 95, track the top 5% of values
+                        largest = True
+                    current_extremes = extremes[channel][percentile]
+                    new_extremes = update_extremes(values=channel_data, extremes=current_extremes, num_extremes=num_extremes, largest=largest)
+                    extremes[channel][percentile] = new_extremes
+
+    # Compute final percentile values
+    percentile_values = {channel: {} for channel in range(num_channels)}
+    for channel in range(num_channels):
+        for percentile in percentiles:
+            if percentile >= 50:  # Must match the tracking branch above (largest values for >= 50)
+                percentile_values[channel][percentile] = extremes[channel][percentile].min().item()
+            else:
+                percentile_values[channel][percentile] = extremes[channel][percentile].max().item()
+
+    # Save results
+    dump_path = os.path.join(os.getcwd(), f'{dataset_name}_{split}_percentiles.txt')
+    with open(dump_path, 'w') as f:
+        for percentile in percentiles:
+            percentile_values_for_all_channels = tuple(percentile_values[channel][percentile] for channel in percentile_values)
+            f.write(f'{percentile}: {percentile_values_for_all_channels},\n')
+
+
+    return percentile_values
+
+if __name__ == '__main__':
+    percentiles = [1, 2, 5, 95, 98, 99]
+    dataset_name = 'ai4forest_camera'
+    split = 'train'
+    percentile_values = compute_percentiles(dataset_name, split, percentiles)
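`compute_percentiles` finds each percentile by keeping the `num_extremes` most extreme values per channel and percentile, so its memory use grows with the size of the dataset. A fixed-bin histogram is a common bounded-memory alternative that trades exactness for a resolution of roughly one bin width. The sketch below is not part of this repository: the class name `HistogramPercentiles`, the value range `[low, high]` and the bin count are assumptions that would have to be chosen for the actual sensor values.

```python
import numpy as np


class HistogramPercentiles:
    """Approximate per-channel percentiles from a stream of (B, C, H, W) batches.

    Memory is fixed by num_bins. Assumption: the value range [low, high] is
    chosen up front; values outside it are clipped into the first / last bin.
    """

    def __init__(self, num_channels, low, high, num_bins=65536):
        self.edges = np.linspace(low, high, num_bins + 1)
        self.counts = np.zeros((num_channels, num_bins), dtype=np.int64)

    def update(self, batch):
        for c in range(self.counts.shape[0]):
            values = np.clip(batch[:, c].ravel(), self.edges[0], self.edges[-1])
            hist, _ = np.histogram(values, bins=self.edges)
            self.counts[c] += hist

    def percentile(self, q):
        """Upper edge of the bin holding the q-th percentile, one value per channel."""
        cdf = np.cumsum(self.counts, axis=1)
        targets = cdf[:, -1] * (q / 100.0)
        # First bin whose cumulative count reaches the target rank
        idx = np.array([np.searchsorted(cdf[c], targets[c], side="left")
                        for c in range(cdf.shape[0])])
        idx = np.minimum(idx, cdf.shape[1] - 1)
        return self.edges[idx + 1]


if __name__ == "__main__":
    # Toy data: channel 0 holds the values 0..99, channel 1 the values 100..199
    ch0 = np.arange(100, dtype=np.float64).reshape(4, 5, 5)
    data = np.stack([ch0, ch0 + 100.0], axis=1)  # shape (4, 2, 5, 5)
    tracker = HistogramPercentiles(num_channels=2, low=0.0, high=200.0, num_bins=200)
    for start in range(0, len(data), 2):  # two batches of two patches
        tracker.update(data[start:start + 2])
    print(tracker.percentile(5), tracker.percentile(95))
```

Inside the DataLoader loop, a batch could be passed as `tracker.update(data.cpu().numpy())` while it still has shape (B, C, H, W). On the toy data, the 5th-percentile estimate for channel 0 is the bin edge 5.0, whereas `np.percentile` interpolates to 4.95.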
@@ -0,0 +1,64 @@
+import torch
+from torchvision import transforms
+from torch.utils.data import DataLoader
+from PIL import Image
+import os
+import numpy as np
+
+from config import PreprocessedSatelliteDataset
+from runner import Runner
+
+from tqdm.auto import tqdm
+
+def compute_mean_std(dataset, split):
+    rootPath = Runner.get_dataset_root(dataset_name=dataset)
+    if split == 'train':
+        dataframe = os.path.join(rootPath, 'train.csv')
+    elif split == 'val':
+        dataframe = os.path.join(rootPath, 'val.csv')
+    else:
+        raise ValueError("Invalid split value. Expected 'train' or 'val'.")
+    # Convert to tensor (moves the channel axis to the front: HWC -> CHW)
+    train_transforms = transforms.Compose([
+        transforms.ToTensor(),
+    ])
+    dataset = PreprocessedSatelliteDataset(data_path=rootPath, dataframe=dataframe, image_transforms=train_transforms,
+                                           use_weighted_sampler=None, use_memmap=True)
+    
+
+    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+    num_workers_default = 4
+    num_workers = num_workers_default * torch.cuda.device_count()
+    dataloader = DataLoader(dataset, batch_size=32, shuffle=False, num_workers=num_workers, pin_memory=torch.cuda.is_available())
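+    # Accumulate per-channel sums and sums of squares in float64 so that std is the
+    # pooled std over all pixels (averaging per-image stds ignores between-image variance)
+    channel_sum, channel_sq_sum, n_pixels = 0., 0., 0
+    with torch.no_grad():
+        for data, _ in tqdm(dataloader):
+            data = data.to(device=device, non_blocking=True)
+            # (B, C, H, W) -> (C, B*H*W)
+            data = data.transpose(0, 1).flatten(start_dim=1)
+            channel_sum += data.sum(dim=1, dtype=torch.float64)
+            channel_sq_sum += data.square().sum(dim=1, dtype=torch.float64)
+            n_pixels += data.size(1)
+
+    mean = channel_sum / n_pixels
+    # Var[x] = E[x^2] - E[x]^2; clamp_min guards against small negative rounding errors
+    std = (channel_sq_sum / n_pixels - mean ** 2).clamp_min(0.).sqrt()
+    return mean, std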
+
+if __name__ == '__main__':
+    # Dataset name as passed to Runner.get_dataset_root
+    dataset = 'ai4forest_camera'
+    split = 'train'
+
+    # Compute and print the per-channel mean and std
+    mean, std = compute_mean_std(dataset=dataset, split=split)
+    print(f'Mean: {mean}')
+    print(f'Std: {std}')
+
+    # Dump the mean and std to a file in the current working directory
+    dump_path = os.path.join(os.getcwd(), f'{dataset}_{split}_mean_std.txt')
+    with open(dump_path, 'w') as f:
+        f.write(f'Mean: {mean}\n')
+        f.write(f'Std: {std}\n')
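`compute_mean_std` makes a single pass over the data, batch by batch. For normalization, the usual target is the mean and standard deviation over all pixels; averaging per-image standard deviations is not the same thing, because it ignores the variance between images. A numerically stable way to pool batch statistics is the pairwise update of Chan et al., sketched below in NumPy. The class `RunningChannelStats` is hypothetical and returns the population standard deviation (ddof=0).

```python
import numpy as np


class RunningChannelStats:
    """Per-channel mean and population std over all pixels of all batches.

    Each batch is reduced to (count, mean, M2), where M2 is the sum of squared
    deviations from the mean, and merged with the pairwise update of Chan et al.
    """

    def __init__(self, num_channels):
        self.count = 0
        self.mean = np.zeros(num_channels)
        self.m2 = np.zeros(num_channels)

    def update(self, batch):
        """Merge a batch of shape (B, C, H, W) into the running statistics."""
        num_channels = batch.shape[1]
        # (B, C, H, W) -> (C, B*H*W), computed in float64
        values = np.moveaxis(batch, 1, 0).reshape(num_channels, -1).astype(np.float64)
        n_b = values.shape[1]
        mean_b = values.mean(axis=1)
        m2_b = ((values - mean_b[:, None]) ** 2).sum(axis=1)

        n = self.count + n_b
        delta = mean_b - self.mean
        self.mean = self.mean + delta * (n_b / n)
        self.m2 = self.m2 + m2_b + delta ** 2 * (self.count * n_b / n)
        self.count = n

    @property
    def std(self):
        return np.sqrt(self.m2 / self.count)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(1000.0, 100.0, size=(10, 3, 4, 4))
    stats = RunningChannelStats(num_channels=3)
    for start in range(0, len(data), 3):  # uneven batches of 3, 3, 3 and 1 patches
        stats.update(data[start:start + 3])
    print(stats.mean, stats.std)
```

Because the merge is exact up to rounding, the result matches `np.mean` / `np.std` over the concatenated data regardless of how the batches are split.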
@@ -0,0 +1,170 @@
+from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
+import glob
+import os
+import torch
+import numpy as np
+import pandas as pd
+import pdb
+from torch.utils.data.dataloader import default_collate
+import sys
+
+means = {
+    'ai4forest_camera': (10782.3223,  3304.7444,  1999.6086,  7276.4209,  1186.4460,  1884.6165,
+         2645.6113,  3128.2588,  3806.2808,  4134.6855,  4113.4883,  4259.1885,
+         4683.5879,  3838.2222),    # Not the true values, change for your dataset
+}
+
+stds = {
+    'ai4forest_camera': (907.7484,  472.1412,  423.8558, 1086.0916,  175.0936,  226.6303,
+         299.4834,  313.0911,  388.1186,  434.4579,  455.7314,  455.0303,
+         388.5127,  374.1260),  # Not the true values, change for your dataset
+}
+
+percentiles = {
+    'ai4forest_camera': {
+        1: (-7542.0, -8126.0, -16659.0, -14187.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0),
+        2: (-6834.0, -7255.0, -14468.0, -13537.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0),
+        5: (-5694.0, -5963.0, -12383.0, -12601.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0),
+        95: (24995.0, 24556.0, 22124.0, 20120.0, 15016.0, 15116.0, 15212.0, 15181.0, 14946.0, 14406.0, 14660.0, 13810.0, 12082.0, 13041.0),
+        98: (25969.0, 26078.0, 23632.0, 21934.0, 15648.0, 15608.0, 15487.0, 15449.0, 15296.0, 15155.0, 15264.0, 14943.0, 13171.0, 14064.0),
+        99: (27044.0, 27349.0, 24868.0, 23266.0, 15970.0, 15680.0, 15548.0, 15494.0, 15432.0, 15368.0, 15385.0, 15219.0, 13590.0, 14657.0),
+    }  # Not the true values, change for your dataset
+}
+
+class FixValDataset(Dataset):
+    """
+    Dataset class to load the fixval dataset.
+    """
+    def __init__(self, data_path, dataframe, image_transforms=None):
+        self.data_path = data_path
+        self.df = pd.read_csv(dataframe, index_col=False)
+        self.files = list(self.df["paths"].apply(lambda x: os.path.join(data_path, x)))
+        self.image_transforms = image_transforms
+
+    def __len__(self):
+        return len(self.files)
+
+    def __getitem__(self, index):
+        file = self.files[index].replace(r"'", "")
+        fileName = file[file.rfind('data_')+5: file.rfind('.npz')]
+        # Use a context manager so that the .npz archive is closed after reading
+        with np.load(file) as data:
+            image = data["data"].astype(np.float32)
+        # Move the channel axis to the last position (required for torchvision transforms)
+        image = np.moveaxis(image, 0, -1)
+        if self.image_transforms:
+            image = self.image_transforms(image)
+
+        return image, fileName
+
+class PreprocessedSatelliteDataset(Dataset):
+    """
+    Dataset class for preprocessed satellite imagery.
+    """
+
+    def __init__(self, data_path, dataframe=None, image_transforms=None, label_transforms=None, joint_transforms=None, use_weighted_sampler=False,
+                  use_weighting_quantile=None, use_memmap=False, remove_corrupt=True, load_labels=True, patch_size=512):
+        self.use_memmap = use_memmap
+        self.patch_size = patch_size
+        self.load_labels = load_labels  # If False, we only load the images and not the labels
+        df = pd.read_csv(dataframe)
+
+        if remove_corrupt:
+            old_len = len(df)
+            #df = df[df["missing_s2_flag"] == False] # Use only the rows that are not corrupt, i.e. those where df["missing_s2_flag"] == False
+
+            # Use only the rows that are not corrupt, i.e. those where df["has_corrupt_s2_channel_flag"] == False
+            df = df[df["has_corrupt_s2_channel_flag"] == False]
+            sys.stdout.write(f"Removed {old_len - len(df)} corrupt rows.\n")
+
+        self.files = list(df["paths"].apply(lambda x: os.path.join(data_path, x)))
+
+        if use_weighted_sampler not in [False, None]:
+            assert use_weighted_sampler in ['g5', 'g10', 'g15', 'g20', 'g25', 'g30']
+            weighting_quantile = use_weighting_quantile
+            assert weighting_quantile in [None, 'None'] or int(weighting_quantile) == weighting_quantile, "weighting_quantile must be an integer."
+            if weighting_quantile in [None, 'None']:
+                self.weights = (df[use_weighted_sampler] / df["totals"]).values.clip(0., 1.)
+            else:
+                # We do not clip between 0 and 1, but rather between the weighting_quantile and 1.
+                weighting_quantile = float(weighting_quantile)
+                self.weights = (df[use_weighted_sampler] / df["totals"]).values
+
+                # Compute the quantiles, ignoring nan values and zero values
+                tmp_weights = self.weights.copy()
+                tmp_weights[np.isnan(tmp_weights)] = 0.
+                tmp_weights = tmp_weights[tmp_weights > 0.]
+
+                quantile_min = np.nanquantile(tmp_weights, weighting_quantile / 100)
+                sys.stdout.write(f"Computed weighting {weighting_quantile}-quantile-lower bound: {quantile_min}.\n")
+
+                # Clip the weights
+                self.weights = self.weights.clip(quantile_min, 1.0)
+
+            # Set the nan values to 0.
+            self.weights[np.isnan(self.weights)] = 0.
+
+        else:
+            self.weights = None
+        self.image_transforms, self.label_transforms, self.joint_transforms = image_transforms, label_transforms, joint_transforms
+
+    def __len__(self):
+        return len(self.files)
+
+    def __getitem__(self, index):
+        if self.use_memmap:
+            item = self.getitem_memmap(index)
+        else:
+            item = self.getitem_classic(index)
+
+        return item
+
+    def getitem_memmap(self, index):
+        file = self.files[index]
+        with np.load(file, mmap_mode='r') as npz_file:  # Note: NumPy ignores mmap_mode for .npz archives
+            image = npz_file['data'].astype(np.float32)
+            # Move the channel axis to the last position (required for torchvision transforms)
+            image = np.moveaxis(image, 0, -1)
+            if self.image_transforms:
+                image = self.image_transforms(image)
+            if self.load_labels:
+                label = npz_file['labels'].astype(np.float32)
+
+                # Process label
+                label = label[:3]  # Keep label channels 0-2; everything from index 3 on is irrelevant
+                label = label / 100  # Convert from cm to m
+                label = np.moveaxis(label, 0, -1)
+
+                if self.label_transforms:
+                    label = self.label_transforms(label)
+                if self.joint_transforms:
+                    image, label = self.joint_transforms(image, label)
+                return image, label
+
+        return image
+
+    def getitem_classic(self, index):
+        file = self.files[index]
+        # Use a context manager so that the .npz archive is closed after reading
+        with np.load(file) as data:
+            image = data["data"].astype(np.float32)
+            # Move the channel axis to the last position (required for torchvision transforms)
+            image = np.moveaxis(image, 0, -1)[:self.patch_size,:self.patch_size]
+            if self.image_transforms:
+                image = self.image_transforms(image)
+            if self.load_labels:
+                label = data["labels"].astype(np.float32)
+
+                # Process label
+                label = label[:3]  # Keep label channels 0-2; everything from index 3 on is irrelevant
+                label = label[:,:self.patch_size, :self.patch_size]
+                label = label / 100  # Convert from cm to m
+                label = np.moveaxis(label, 0, -1)
+
+                if self.label_transforms:
+                    label = self.label_transforms(label)
+                if self.joint_transforms:
+                    image, label = self.joint_transforms(image, label)
+                return image, label
+
+        return image
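When `use_weighted_sampler` is set, `PreprocessedSatelliteDataset.__init__` turns the per-patch ratio `df[use_weighted_sampler] / df["totals"]` into sampling weights: it clips the ratio to [0, 1], or, if `use_weighting_quantile` is given, to [lower bound, 1], where the lower bound is that quantile of the positive ratios; NaN weights are then set to 0. The standalone NumPy function below reproduces that logic on a toy array to make the effect of the quantile visible. The name `sampling_weights` is hypothetical, and the original's handling of the string `'None'` is omitted.

```python
import numpy as np


def sampling_weights(ratio, weighting_quantile=None):
    """Per-patch sampling weights from ratio = df[column] / df["totals"].

    Sketch mirroring PreprocessedSatelliteDataset.__init__ (not the repo's code):
    - without a quantile, the ratio is clipped to [0, 1];
    - with quantile q, the lower clip bound is the q-th percentile of the
      positive, non-NaN ratios, so low-ratio patches keep a minimum weight;
    - NaN ratios get weight 0.
    """
    ratio = np.asarray(ratio, dtype=np.float64)
    if weighting_quantile is None:
        weights = ratio.clip(0.0, 1.0)
    else:
        valid = ratio[~np.isnan(ratio)]
        valid = valid[valid > 0.0]
        lower = np.quantile(valid, weighting_quantile / 100)
        weights = ratio.clip(lower, 1.0)  # clip keeps NaN as NaN
    weights[np.isnan(weights)] = 0.0
    return weights


if __name__ == "__main__":
    ratio = [0.0, 0.1, 0.2, 0.5, np.nan, 1.5]
    print(sampling_weights(ratio))
    print(sampling_weights(ratio, weighting_quantile=50))
```

With `weighting_quantile=50`, ratios below the median positive ratio (0.35 here) are raised to it instead of dropping to zero weight, while NaN ratios still get weight 0. How the training runner consumes `dataset.weights` is not part of this diff; with PyTorch, such weights are typically passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)`.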