
SkySense++

This repository is the official implementation of the paper "SkySense++: A Semantic-Enhanced Multi-Modal Remote Sensing Foundation Model Beyond SkySense for Earth Observation".

📢 Latest Updates

🔥🔥🔥 Last Updated on 2025.09.15 🔥🔥🔥

  • [2025.09.15] Added a 🌍 project page.
  • [2025.08.04] Our work has been published in Nature Machine Intelligence.
  • [2025.03.23] Uploaded the preprocessing/pretraining/application code and the model weights.
  • [2025.03.14] Updated the optical images of the JL-16 dataset on Hugging Face.
  • [2025.03.12] Updated the Sentinel-1 images and labels of the JL-16 dataset on Zenodo.
  • [2025.03.09] Created the repo on Zenodo; datasets are uploading.
  • [2024.11.13] Updated the details of the pretraining and evaluation data.

Pretrain Data

RS-Semantic Dataset

We conduct semantic-enhanced pretraining on the RS-Semantic dataset, which consists of 13 datasets with pixel-level annotations. The specifics of these datasets are listed below (also available on Zenodo).

| Dataset | Modalities | GSD (m) | Size | Categories | Download Link |
| --- | --- | --- | --- | --- | --- |
| Five Billion Pixels | Gaofen-2 | 4 | 6800x7200 | 24 | Download |
| Potsdam | Airborne | 0.05 | 6000x6000 | 5 | Download |
| Vaihingen | Airborne | 0.05 | 2494x2064 | 5 | Download |
| DeepGlobe | WorldView | 0.5 | 2448x2448 | 6 | Download |
| iSAID | Multiple sensors | - | 800x800 to 4000x13000 | 15 | Download |
| LoveDA | Spaceborne | 0.3 | 1024x1024 | 7 | Download |
| DynamicEarthNet | WorldView | 0.3 | 1024x1024 | 7 | Download |
| | Sentinel-2* | 10 | 32x32 | | |
| | Sentinel-1* | 10 | 32x33 | | |
| Pastis-MM | WorldView | 0.3 | 1024x1024 | 18 | Download |
| | Sentinel-2* | 10 | 32x32 | | |
| | Sentinel-1* | 10 | 32x33 | | |
| C2Seg-AB | Sentinel-2* | 10 | 128x128 | 13 | Download |
| | Sentinel-1* | 10 | 128x128 | | |
| FLAIR | SPOT-5 | 0.2 | 512x512 | 12 | Download |
| | Sentinel-2* | 10 | 40x40 | | |
| DFC20 | Sentinel-2 | 10 | 256x256 | 9 | Download |
| | Sentinel-1 | 10 | 256x256 | | |
| S2-NAIP | NAIP | 1 | 512x512 | 32 | Download |
| | Sentinel-2* | 10 | 64x64 | | |
| | Sentinel-1* | 10 | 64x64 | | |
| JL-16 | Jilin-1 | 0.72 | 512x512 | 16 | Download |
| | Sentinel-1* | 10 | 40x40 | | |

* denotes time-series data.

RS-Representation Dataset

The pretraining file list is available on Zenodo (rep_data_list.tar). The download and processing scripts are in tools/pretraining_data_builder.

EO Benchmark

We evaluate SkySense++ on 12 typical Earth observation (EO) tasks across 7 domains: agriculture, forestry, oceanography, atmosphere, biology, land surveying, and disaster management. Detailed information about the evaluation datasets is given below.

| Domain | Task type | Dataset | Modalities | GSD (m) | Image size | Download Link |
| --- | --- | --- | --- | --- | --- | --- |
| Agriculture | Crop classification | Germany | Sentinel-2* | 10 | 24x24 | Download |
| Forestry | Tree species classification | TreeSatAI-Time-Series | Airborne | 0.2 | 304x304 | Download |
| | | | Sentinel-2* | 10 | 6x6 | |
| | | | Sentinel-1* | 10 | 6x6 | |
| | Deforestation segmentation | Atlantic | Sentinel-2 | 10 | 512x512 | Download |
| Oceanography | Oil spill segmentation | SOS | Sentinel-1 | 10 | 256x256 | Download |
| Atmosphere | Air pollution regression | 3pollution | Sentinel-2 | 10 | 200x200 | Download |
| | | | Sentinel-5P | 2600 | 120x120 | |
| Biology | Wildlife detection | Kenya | Airborne | - | 3068x4603 | Download |
| Land surveying | LULC mapping | C2Seg-BW | Gaofen-6 | 10 | 256x256 | Download |
| | | | Gaofen-3 | 10 | 256x256 | |
| | Change detection | DSIFN-CD | Google Earth | 0.3 | 512x512 | Download |
| Disaster management | Flood monitoring | Flood-3i | Airborne | 0.05 | 256x256 | Download |
| | | C2SMSFloods | Sentinel-2, Sentinel-1 | 10 | 512x512 | Download |
| | Wildfire monitoring | CABUAR | Sentinel-2 | 10 | 5490x5490 | Download |
| | Landslide mapping | GVLM | Google Earth | 0.3 | 1748x1748 to 10808x7424 | Download |
| | Building damage assessment | xBD | WorldView | 0.3 | 1024x1024 | Download |

* denotes time-series data.

Implementation Code

Structure

This project mainly contains the following parts.

./
├── antmmf/                             # antmmf framework code
├── configs/                   
│   ├── eval_skysense_pp_flood3i.yml    # eval cfg on flood3i                
│   └── pretrain_skysensepp.yml         # pretrain cfg
├── finetune/                           # finetuning code
│   ├── configs/                        # finetuning configs
│   ├── mmseg/                          # mmseg library
│   ├── requirements/                   # mmseg install requirements folder
│   ├── requirements.txt                # mmseg install requirements
│   ├── setup.py                        # mmseg setup file
│   └── tools/                          # mmseg utils
├── lib/                                # model implementation
│   ├── datasets/                       # datasets for evaluation
│   ├── evaluation/                     # evaluation code
│   ├── models/                         # model architecture
│   ├── predictors/                     # inference code
│   ├── task/                           # task code
│   ├── trainer/                        # trainer code
│   ├── utils/                          # library code
│   └── __init__.py                     # packages init file
├── pretrain/                           # pretrain ckpts
├── tools/                              # scripts and tools
│   ├── pretraining_data_builder        # pretraining dataset builder
│   ├── run_1shot_flood3i.sh            # run 1-shot evaluation script
│   ├── run_ft_atlantic.sh              # run finetuning script
│   ├── run_pretrain.sh                 # run pretrain script
│   └── run.py                          # Program entry point
└── readme.md                           # project readme

Environment

Each machine used for pretraining or finetuning runs Alibaba Group Enterprise Linux 7.2 and Python 3.8.10. The pretraining and finetuning code runs on servers with Intel(R) Xeon(R) Platinum 8369B CPUs @ 2.90GHz and NVIDIA A100 GPUs.

Pretraining

To run our pretraining code, please install the dependency packages below. (Installation takes about 14 minutes on a node with an Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz and 8 A100 GPUs.)

torch==1.13.1
atorch==0.1.3
torchvision==0.14.1
mmcv-full==1.7.1
mmsegmentation==0.30.0
mmcls==0.25.0
timm==0.6.13
gdal==3.4.0
scikit-image==0.19.3

Step1. Install the above packages and clone the antmmf framework:

git clone https://github.com/alipay/Ant-Multi-Modal-Framework.git antmmf/
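
Once the packages are installed, the pinned versions can be sanity-checked with a minimal Python snippet. This is an illustrative check of ours, not part of the repository:

# Quick environment sanity check for the pins above (illustrative only).
import mmcv
import mmseg
import timm
import torch
import torchvision

expected = {
    "torch": ("1.13.1", torch.__version__),
    "torchvision": ("0.14.1", torchvision.__version__),
    "mmcv-full": ("1.7.1", mmcv.__version__),
    "mmsegmentation": ("0.30.0", mmseg.__version__),
    "timm": ("0.6.13", timm.__version__),
}
for name, (want, got) in expected.items():
    status = "OK" if got.startswith(want) else f"MISMATCH (expected {want})"
    print(f"{name}: {got} {status}")
print("CUDA available:", torch.cuda.is_available())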

Step2. Download the pretraining datasets from Zenodo and organize them as follows:

pretrain_datasets
├── dynamic-mm                          # multi-modal dynamic-mm datasets
│   ├── images_hr                       # hr images
│   ├── images_s2                       # sentinel-2 images
│   ├── images_s1                       # sentinel-1 images
│   ├── labels                          # segmentation annotations
│   ├── dynamic-mm_train.json           # train list file
│   └── dynamic-mm_val.json             # val list file
├── fbp                                 # single-modal fbp datasets
│   ├── images                          # input gaofen-2 images
│   ├── labels                          # segmentation annotations
│   ├── fbp_train.json                  # train list file
│   └── fbp_val.json                    # val list file
└── ......                       

The <dataset>_<train/val>.json files provide the sample information for training and validation in a unified format:

[
  {
    "hr_path": "dataset_name/images_hr/<img_name>.png",  // HR image (C,H,W)
    "s2_path": ["dataset_name/images_s2/<img_name>_20240101.npz", "dataset_name/images_s2/<img_name>_20240103.npz"],  // Sentinel-2 time series (C,H,W)
    "s1_path": ["dataset_name/images_s1/<img_name>_20240104.npz", "dataset_name/images_s1/<img_name>_20240108.npz"],  // Sentinel-1 time series (C,H,W)
    "target_path": "dataset_name/labels/<img_name>.png",  // segmentation annotation
    "type": "dataset_name",  // dataset name
    "classes": [             // categories present in this sample
            0,
            2,
            4,
            5
        ]
  },
  {
    ...
  }
]
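
For illustration, here is a hedged sketch of how such a split file can be consumed. load_split is a hypothetical helper of ours, not the repository's actual loader, and it assumes the paths in the JSON are relative to the pretrain_datasets root:

# Illustrative reader for the <dataset>_<train/val>.json format above.
# Sketch only; not the repository's actual data-loading code.
import json
from pathlib import Path

def load_split(root, split_file):
    """Yield one record per sample, with paths resolved under `root`."""
    records = json.loads(Path(root, split_file).read_text())
    for rec in records:
        yield {
            "hr": Path(root, rec["hr_path"]),                       # single HR image
            "s2": [Path(root, p) for p in rec.get("s2_path", [])],  # S2 time series (.npz)
            "s1": [Path(root, p) for p in rec.get("s1_path", [])],  # S1 time series (.npz)
            "label": Path(root, rec["target_path"]),                # segmentation mask
            "dataset": rec["type"],
            "classes": rec["classes"],
        }

# Example: verify that the referenced files exist for the fbp train split.
for sample in load_split("pretrain_datasets", "fbp/fbp_train.json"):
    for path in (sample["hr"], sample["label"]):
        if not path.exists():
            print("missing:", path)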

Step3. Download the pretraining weights of SkySense here and move them to pretrain/.

Step4. Run the pretraining code on 4 nodes (each node with 8 A100 GPUs):

bash tools/run_pretrain.sh <node_rank:0-3> <master_ip_address>

For example, if the IP address of the master node is 192.168.112.10, the command for node 1 is:

bash tools/run_pretrain.sh 1 192.168.112.10
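
For reference, the following is a minimal sketch of the multi-node process-group setup that such a launch script typically performs. It assumes torch.distributed with the NCCL backend and the default env:// rendezvous; the actual run_pretrain.sh may configure this differently:

# Sketch of multi-node distributed initialization (illustrative only;
# the real run_pretrain.sh may differ).
import os
import torch
import torch.distributed as dist

def init_distributed(node_rank, master_addr, nnodes=4, gpus_per_node=8,
                     local_rank=0, master_port="29500"):
    os.environ["MASTER_ADDR"] = master_addr
    os.environ["MASTER_PORT"] = master_port
    rank = node_rank * gpus_per_node + local_rank  # global rank of this process
    world_size = nnodes * gpus_per_node            # 4 nodes x 8 GPUs = 32 processes
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(local_rank)

# e.g., on node 1, local process 0:
# init_distributed(node_rank=1, master_addr="192.168.112.10")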

Downstream 1-shot application

Requirements

To run our code, please install the dependency packages below. (Installation takes about 10 minutes on a server with an Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz and 2 A100 GPUs.)

torch==1.13.1
atorch==0.1.3
torchvision==0.14.1
mmcv-full==1.7.1
mmcls==0.25.0
mmsegmentation==0.30.0
timm==0.6.13
gdal==3.4.0
scikit-image==0.19.3

Run steps

Step1. Clone the antmmf framework and install the above packages:

git clone https://github.com/alipay/Ant-Multi-Modal-Framework.git antmmf/

Step2. Download the flood-3i dataset (Images.zip and Semantic_mask.zip here, val.txt here). The testing dataset should be organized as follows:

eval_datasets/
└── flood3i/
    ├── Images/
    │   ├── 10165_0_2.jpg
    │   ├── 10165_1_0.jpg
    │   └── ...
    ├── Semantic_mask/
    │   ├── 10165_lab_0_2.png
    │   ├── 10165_lab_1_0.png
    │   └── ...
    └── val.txt
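
The layout above suggests that each image <id>_<i>_<j>.jpg pairs with the mask <id>_lab_<i>_<j>.png. A hedged sketch of ours for collecting these pairs, assuming that naming convention holds for the whole dataset:

# Pair flood-3i images with their semantic masks (sketch only; assumes
# the <id>_<i>_<j>.jpg <-> <id>_lab_<i>_<j>.png naming shown above).
from pathlib import Path

root = Path("eval_datasets/flood3i")
pairs = []
for img in sorted((root / "Images").glob("*.jpg")):
    ident, i, j = img.stem.split("_")  # e.g. "10165", "0", "2"
    mask = root / "Semantic_mask" / f"{ident}_lab_{i}_{j}.png"
    if mask.exists():
        pairs.append((img, mask))
print(f"{len(pairs)} image/mask pairs found")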

Step3. Use the pretrained weights from the pretraining stage above, or download the pretrained model weights here.

Step4. Run the script to evaluate 1-shot performance on flood-3i:

bash tools/run_1shot.sh <gpu_idx> <dataset_name>   # e.g., flood-3i

Downstream finetuning application

Requirements

We build our fine-tuning application code on the OpenMMLab framework.

To run our code, please install the dependency packages below. (Installation takes about 10 minutes on a server with an Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz and 2 A100 GPUs.)

torch==1.13.1
torchvision==0.14.1
mmcv-full==2.1.0
mmpretrain==1.2.0
mmsegmentation==1.2.2
mmdetection==3.3.0
timm==0.6.13
gdal==3.4.0
scikit-image==0.19.3

Run steps

Step1. Install the mmsegmentation framework following the instructions here.

Step2. Download the evaluation datasets. We take the Atlantic dataset for deforestation segmentation as an example: download the Atlantic dataset here; the split JSON files for the evaluation framework are here.

../rs_datasets/deforestation_atlantic/
├── deforestation_atlantic_test.json
├── deforestation_atlantic_train.json
├── deforestation_atlantic_val.json
├── Test/
│   ├── image/
│   └── label/
├── Training/
│   ├── image/
│   └── label/
└── Validation/
    ├── image/
    └── label/
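
The schema of the split JSON files is not documented here; as a hedged fallback sketch of ours, image/label pairs can also be enumerated directly from the directory layout, assuming matching filenames under image/ and label/:

# Enumerate Atlantic training pairs from the folder layout (sketch only;
# assumes image/ and label/ contain files with matching names).
from pathlib import Path

split_dir = Path("../rs_datasets/deforestation_atlantic/Training")
pairs = [(img, split_dir / "label" / img.name)
         for img in sorted((split_dir / "image").glob("*"))
         if (split_dir / "label" / img.name).exists()]
print(f"Training pairs: {len(pairs)}")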

Step3. Use your pretrained model weights or download the model weights here.

Step4. Run the finetuning script. We take the Atlantic dataset as an example:

bash tools/run_finetune.sh configs/atlantic.py

Acknowledgments

This project is mainly built on the following projects:

License

The pretrained model weights and pretraining code are available for non-commercial research only. For any commercial use or cooperation, please contact Yansheng Li at Wuhan University (e-mail: yansheng.li@whu.edu.cn).

Citation

If you find our repo useful, please consider giving a star and citation:

@article{wu2025semantic,
  author       = {Wu, Kang and Zhang, Yingying and Ru, Lixiang and Dang, Bo and Lao, Jiangwei and Yu, Lei and Luo, Junwei and Zhu, Zifan and Sun, Yue and Zhang, Jiahao and Zhu, Qi and Wang, Jian and Yang, Ming and Chen, Jingdong and Zhang, Yongjun and Li, Yansheng},
  title        = {A semantic-enhanced multimodal remote sensing foundation model for Earth observation},
  journal      = {Nature Machine Intelligence},
  year         = {2025},
  doi          = {10.1038/s42256-025-01078-8},
  url          = {https://doi.org/10.1038/s42256-025-01078-8}
}

@inproceedings{guo2024skysense,
    author    = {Guo, Xin and Lao, Jiangwei and Dang, Bo and Zhang, Yingying and Yu, Lei and Ru, Lixiang and Zhong, Liheng and Huang, Ziyuan and Wu, Kang and Hu, Dingxiang and He, Huimei and Wang, Jian and Chen, Jingdong and Yang, Ming and Zhang, Yongjun and Li, Yansheng},
    title     = {SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {27672-27683}
}

@inproceedings{zhu2025skysenseo,
  title={SkySense-O: Towards open-world remote sensing interpretation with vision-centric visual-language modeling},
  author={Zhu, Qi and Lao, Jiangwei and Ji, Deyi and Luo, Junwei and Wu, Kang and Zhang, Yingying and Ru, Lixiang and Wang, Jian and Chen, Jingdong and Yang, Ming and others},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={14733--14744},
  year={2025}
}

@article{luo2024skysensegpt,
  title={SkySenseGPT: A fine-grained instruction tuning dataset and model for remote sensing vision-language understanding},
  author={Luo, Junwei and Pang, Zhen and Zhang, Yongjun and Wang, Tingzhu and Wang, Linlin and Dang, Bo and Lao, Jiangwei and Wang, Jian and Chen, Jingdong and Tan, Yihua and others},
  journal={arXiv preprint arXiv:2406.10100},
  year={2024}
}

Star History

[Star History Chart]
