
SkySense++

This repository is the official implementation of the paper "SkySense++: A Semantic-Enhanced Multi-Modal Remote Sensing Foundation Model Beyond SkySense for Earth Observation".

📢 Latest Updates

🔥🔥🔥 Last Updated on 2025.09.15 🔥🔥🔥

  • [2025.09.15] Added a 🌍 project page.
  • [2025.08.04] Our work has been published in Nature Machine Intelligence.
  • [2025.03.23] Uploaded the preprocessing/pretraining/application code and the model weights.
  • [2025.03.14] Updated the optical images of the JL-16 dataset on Hugging Face.
  • [2025.03.12] Updated the Sentinel-1 images and labels of the JL-16 dataset on Zenodo.
  • [2025.03.09] Created the repo on Zenodo; datasets are uploading.
  • [2024.11.13] Updated the details of the pretraining and evaluation data.

Pretrain Data

RS-Semantic Dataset

We conduct semantic-enhanced pretraining on the RS-Semantic dataset, which consists of 13 datasets with pixel-level annotations. The specifics of these datasets are listed below (also available on Zenodo).

| Dataset | Modalities | GSD (m) | Size | Categories | Download Link |
| --- | --- | --- | --- | --- | --- |
| Five Billion Pixels | Gaofen-2 | 4 | 6800x7200 | 24 | Download |
| Potsdam | Airborne | 0.05 | 6000x6000 | 5 | Download |
| Vaihingen | Airborne | 0.05 | 2494x2064 | 5 | Download |
| DeepGlobe | WorldView | 0.5 | 2448x2448 | 6 | Download |
| iSAID | Multiple sensors | - | 800x800 to 4000x13000 | 15 | Download |
| LoveDA | Spaceborne | 0.3 | 1024x1024 | 7 | Download |
| DynamicEarthNet | WorldView | 0.3 | 1024x1024 | 7 | Download |
| | Sentinel-2* | 10 | 32x32 | | |
| | Sentinel-1* | 10 | 32x33 | | |
| Pastis-MM | WorldView | 0.3 | 1024x1024 | 18 | Download |
| | Sentinel-2* | 10 | 32x32 | | |
| | Sentinel-1* | 10 | 32x33 | | |
| C2Seg-AB | Sentinel-2* | 10 | 128x128 | 13 | Download |
| | Sentinel-1* | 10 | 128x128 | | |
| FLAIR | SPOT-5 | 0.2 | 512x512 | 12 | Download |
| | Sentinel-2* | 10 | 40x40 | | |
| DFC20 | Sentinel-2 | 10 | 256x256 | 9 | Download |
| | Sentinel-1 | 10 | 256x256 | | |
| S2-NAIP | NAIP | 1 | 512x512 | 32 | Download |
| | Sentinel-2* | 10 | 64x64 | | |
| | Sentinel-1* | 10 | 64x64 | | |
| JL-16 | Jilin-1 | 0.72 | 512x512 | 16 | Download |
| | Sentinel-1* | 10 | 40x40 | | |

* denotes time-series data.

RS-Representation Dataset

The pretraining file list is available on Zenodo (rep_data_list.tar). The download and processing scripts are in tools/pretraining_data_builder.

EO Benchmark

We evaluate SkySense++ on 12 typical Earth observation (EO) tasks across 7 domains: agriculture, forestry, oceanography, atmosphere, biology, land surveying, and disaster management. Detailed information about the evaluation datasets is given below.

| Domain | Task type | Dataset | Modalities | GSD (m) | Image size | Download Link |
| --- | --- | --- | --- | --- | --- | --- |
| Agriculture | Crop classification | Germany | Sentinel-2* | 10 | 24x24 | Download |
| Forestry | Tree species classification | TreeSatAI-Time-Series | Airborne | 0.2 | 304x304 | Download |
| | | | Sentinel-2* | 10 | 6x6 | |
| | | | Sentinel-1* | 10 | 6x6 | |
| | Deforestation segmentation | Atlantic | Sentinel-2 | 10 | 512x512 | Download |
| Oceanography | Oil spill segmentation | SOS | Sentinel-1 | 10 | 256x256 | Download |
| Atmosphere | Air pollution regression | 3pollution | Sentinel-2 | 10 | 200x200 | Download |
| | | | Sentinel-5P | 2600 | 120x120 | |
| Biology | Wildlife detection | Kenya | Airborne | - | 3068x4603 | Download |
| Land surveying | LULC mapping | C2Seg-BW | Gaofen-6 | 10 | 256x256 | Download |
| | | | Gaofen-3 | 10 | 256x256 | |
| | Change detection | DSIFN-CD | Google Earth | 0.3 | 512x512 | Download |
| Disaster management | Flood monitoring | Flood-3i | Airborne | 0.05 | 256x256 | Download |
| | | C2SMSFloods | Sentinel-2, Sentinel-1 | 10 | 512x512 | Download |
| | Wildfire monitoring | CABUAR | Sentinel-2 | 10 | 5490x5490 | Download |
| | Landslide mapping | GVLM | Google Earth | 0.3 | 1748x1748 to 10808x7424 | Download |
| | Building damage assessment | xBD | WorldView | 0.3 | 1024x1024 | Download |

* denotes time-series data.

Implementation Code

Structure

This project mainly contains the following parts.

./
├── antmmf/                             # antmmf framework code
├── configs/                   
│   ├── eval_skysense_pp_flood3i.yml    # eval cfg on flood3i                
│   └── pretrain_skysensepp.yml         # pretrain cfg
├── finetune/                           # finetuning code
│   ├── configs/                        # finetuning configs
│   ├── mmseg/                          # mmseg library
│   ├── requirements/                   # mmseg install requirements folder
│   ├── requirements.txt                # mmseg install requirements
│   ├── setup.py                        # mmseg setup file
│   └── tools/                          # mmseg utils
├── lib/                                # model implementation
│   ├── datasets/                       # datasets for evaluation
│   ├── evaluation/                     # evaluation code
│   ├── models/                         # model architecture
│   ├── predictors/                     # inference code
│   ├── task/                           # task code
│   ├── trainer/                        # trainer code
│   ├── utils/                          # library code
│   └── __init__.py                     # packages init file
├── pretrain/                           # pretrain ckpts
├── tools/                              # scripts and tools
│   ├── pretraining_data_builder        # pretraining dataset builder
│   ├── run_1shot_flood3i.sh            # run 1-shot evaluation script
│   ├── run_ft_atlantic.sh              # run finetuning script
│   ├── run_pretrain.sh                 # run pretrain script
│   └── run.py                          # Program entry point
└── readme.md                           # project readme

Environment

Each machine used for pretraining or finetuning runs Alibaba Group Enterprise Linux 7.2 and Python 3.8.10. The pretraining and finetuning code runs on servers with Intel(R) Xeon(R) Platinum 8369B CPUs @ 2.90GHz and NVIDIA A100 GPUs.

Pretraining

To run our pretraining code, please install the dependency packages below. (Installation takes about 14 minutes on a node with an Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz and 8 A100 GPUs.)

torch==1.13.1
atorch==0.1.3
torchvision==0.14.1
mmcv-full==1.7.1
mmsegmentation==0.30.0
mmcls==0.25.0
timm==0.6.13
gdal==3.4.0
scikit-image==0.19.3

Step1. Install the above packages and clone the antmmf framework:

git clone https://github.com/alipay/Ant-Multi-Modal-Framework.git antmmf/
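
Once the packages are installed, the pinned versions can be sanity-checked with a minimal Python snippet. This is an illustrative check of ours, not part of the repository:

# Quick environment sanity check for the pins above (illustrative only).
import mmcv
import mmseg
import timm
import torch
import torchvision

expected = {
    "torch": ("1.13.1", torch.__version__),
    "torchvision": ("0.14.1", torchvision.__version__),
    "mmcv-full": ("1.7.1", mmcv.__version__),
    "mmsegmentation": ("0.30.0", mmseg.__version__),
    "timm": ("0.6.13", timm.__version__),
}
for name, (want, got) in expected.items():
    status = "OK" if got.startswith(want) else f"MISMATCH (expected {want})"
    print(f"{name}: {got} {status}")
print("CUDA available:", torch.cuda.is_available())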

Step2. Download the pretraining datasets from Zenodo and organize them as follows:

pretrain_datasets
├── dynamic-mm                          # multi-modal dynamic-mm datasets
│   ├── images_hr                       # hr images
│   ├── images_s2                       # sentinel-2 images
│   ├── images_s1                       # sentinel-1 images
│   ├── labels                          # segmentation annotations
│   ├── dynamic-mm_train.json           # train list file
│   └── dynamic-mm_val.json             # val list file
├── fbp                                 # single-modal fbp datasets
│   ├── images                          # input gaofen-2 images
│   ├── labels                          # segmentation annotations
│   ├── fbp_train.json                  # train list file
│   └── fbp_val.json                    # val list file
└── ......                       

The <dataset>_<train/val>.json files provide the sample information for training and validation in a unified format:

[
  {
    "hr_path": "dataset_name/images_hr/<img_name>.png",  // HR image (C,H,W)
    "s2_path": ["dataset_name/images_s2/<img_name>_20240101.npz", "dataset_name/images_s2/<img_name>_20240103.npz"],  // Sentinel-2 time series (C,H,W)
    "s1_path": ["dataset_name/images_s1/<img_name>_20240104.npz", "dataset_name/images_s1/<img_name>_20240108.npz"],  // Sentinel-1 time series (C,H,W)
    "target_path": "dataset_name/labels/<img_name>.png",  // segmentation annotation
    "type": "dataset_name",  // dataset name
    "classes": [             // categories present in this sample
            0,
            2,
            4,
            5
        ]
  },
  {
    ...
  }
]
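
For illustration, here is a hedged sketch of how such a split file can be consumed. load_split is a hypothetical helper of ours, not the repository's actual loader, and it assumes the paths in the JSON are relative to the pretrain_datasets root:

# Illustrative reader for the <dataset>_<train/val>.json format above.
# Sketch only; not the repository's actual data-loading code.
import json
from pathlib import Path

def load_split(root, split_file):
    """Yield one record per sample, with paths resolved under `root`."""
    records = json.loads(Path(root, split_file).read_text())
    for rec in records:
        yield {
            "hr": Path(root, rec["hr_path"]),                       # single HR image
            "s2": [Path(root, p) for p in rec.get("s2_path", [])],  # S2 time series (.npz)
            "s1": [Path(root, p) for p in rec.get("s1_path", [])],  # S1 time series (.npz)
            "label": Path(root, rec["target_path"]),                # segmentation mask
            "dataset": rec["type"],
            "classes": rec["classes"],
        }

# Example: verify that the referenced files exist for the fbp train split.
for sample in load_split("pretrain_datasets", "fbp/fbp_train.json"):
    for path in (sample["hr"], sample["label"]):
        if not path.exists():
            print("missing:", path)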

Step3. Download the pretraining weights of SkySense here and move them to pretrain/.

Step4. Run the pretraining code on 4 nodes (each node with 8 A100 GPUs):

bash tools/run_pretrain.sh <node_rank:0-3> <master_ip_address>

For example, if the IP address of the master node is 192.168.112.10, the command for node 1 is:

bash tools/run_pretrain.sh 1 192.168.112.10
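
For reference, the following is a minimal sketch of the multi-node process-group setup that such a launch script typically performs. It assumes torch.distributed with the NCCL backend and the default env:// rendezvous; the actual run_pretrain.sh may configure this differently:

# Sketch of multi-node distributed initialization (illustrative only;
# the real run_pretrain.sh may differ).
import os
import torch
import torch.distributed as dist

def init_distributed(node_rank, master_addr, nnodes=4, gpus_per_node=8,
                     local_rank=0, master_port="29500"):
    os.environ["MASTER_ADDR"] = master_addr
    os.environ["MASTER_PORT"] = master_port
    rank = node_rank * gpus_per_node + local_rank  # global rank of this process
    world_size = nnodes * gpus_per_node            # 4 nodes x 8 GPUs = 32 processes
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(local_rank)

# e.g., on node 1, local process 0:
# init_distributed(node_rank=1, master_addr="192.168.112.10")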

Downstream 1-shot application

Requirements

To run our code, please install the dependency packages below. (Installation takes about 10 minutes on a server with an Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz and 2 A100 GPUs.)

torch==1.13.1
atorch==0.1.3
torchvision==0.14.1
mmcv-full==1.7.1
mmcls==0.25.0
mmsegmentation==0.30.0
timm==0.6.13
gdal==3.4.0
scikit-image==0.19.3

Run steps

Step1. Clone the antmmf framework and install the above packages:

git clone https://github.com/alipay/Ant-Multi-Modal-Framework.git antmmf/

Step2. Download the flood-3i dataset (Images.zip and Semantic_mask.zip here, val.txt here). The testing dataset should be organized as follows:

eval_datasets/
└── flood3i/
    ├── Images/
    │   ├── 10165_0_2.jpg
    │   ├── 10165_1_0.jpg
    │   └── ...
    ├── Semantic_mask/
    │   ├── 10165_lab_0_2.png
    │   ├── 10165_lab_1_0.png
    │   └── ...
    └── val.txt
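
The layout above suggests that each image <id>_<i>_<j>.jpg pairs with the mask <id>_lab_<i>_<j>.png. A hedged sketch of ours for collecting these pairs, assuming that naming convention holds for the whole dataset:

# Pair flood-3i images with their semantic masks (sketch only; assumes
# the <id>_<i>_<j>.jpg <-> <id>_lab_<i>_<j>.png naming shown above).
from pathlib import Path

root = Path("eval_datasets/flood3i")
pairs = []
for img in sorted((root / "Images").glob("*.jpg")):
    ident, i, j = img.stem.split("_")  # e.g. "10165", "0", "2"
    mask = root / "Semantic_mask" / f"{ident}_lab_{i}_{j}.png"
    if mask.exists():
        pairs.append((img, mask))
print(f"{len(pairs)} image/mask pairs found")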

Step3. Use the pretrained weights from the pretraining stage above, or download the pretrained model weights here.

Step4. Run the script to evaluate 1-shot performance on flood-3i:

bash tools/run_1shot.sh <gpu_idx> <dataset_name>   # e.g., flood-3i

Downstream finetuning application

Requirements

We build our fine-tuning application code on the OpenMMLab framework.

To run our code, please install the dependency packages below. (Installation takes about 10 minutes on a server with an Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz and 2 A100 GPUs.)

torch==1.13.1
torchvision==0.14.1
mmcv-full==2.1.0
mmpretrain==1.2.0
mmsegmentation==1.2.2
mmdetection==3.3.0
timm==0.6.13
gdal==3.4.0
scikit-image==0.19.3

Run steps

Step1. Install the mmsegmentation framework following the instructions here.

Step2. Download the evaluation datasets. We take the Atlantic dataset for deforestation segmentation as an example: download the Atlantic dataset here; the split JSON files for the evaluation framework are here.

../rs_datasets/deforestation_atlantic/
├── deforestation_atlantic_test.json
├── deforestation_atlantic_train.json
├── deforestation_atlantic_val.json
├── Test/
│   ├── image/
│   └── label/
├── Training/
│   ├── image/
│   └── label/
└── Validation/
    ├── image/
    └── label/
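
The schema of the split JSON files is not documented here; as a hedged fallback sketch of ours, image/label pairs can also be enumerated directly from the directory layout, assuming matching filenames under image/ and label/:

# Enumerate Atlantic training pairs from the folder layout (sketch only;
# assumes image/ and label/ contain files with matching names).
from pathlib import Path

split_dir = Path("../rs_datasets/deforestation_atlantic/Training")
pairs = [(img, split_dir / "label" / img.name)
         for img in sorted((split_dir / "image").glob("*"))
         if (split_dir / "label" / img.name).exists()]
print(f"Training pairs: {len(pairs)}")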

Step3. Use your pretrained model weights or download the model weights here.

Step4. Run the finetuning script. We take the Atlantic dataset as an example:

bash tools/run_finetune.sh configs/atlantic.py

Acknowledgments

This project is mainly built on the following projects:

License

The pretrained model weights and pretraining code are available for non-commercial research only. For any commercial use or cooperation, please contact Yansheng Li at Wuhan University (e-mail: yansheng.li@whu.edu.cn).

Citation

If you find our repo useful, please consider giving a star and citation:

@article{wu2025semantic,
  author       = {Wu, Kang and Zhang, Yingying and Ru, Lixiang and Dang, Bo and Lao, Jiangwei and Yu, Lei and Luo, Junwei and Zhu, Zifan and Sun, Yue and Zhang, Jiahao and Zhu, Qi and Wang, Jian and Yang, Ming and Chen, Jingdong and Zhang, Yongjun and Li, Yansheng},
  title        = {A semantic-enhanced multimodal remote sensing foundation model for Earth observation},
  journal      = {Nature Machine Intelligence},
  year         = {2025},
  doi          = {10.1038/s42256-025-01078-8},
  url          = {https://doi.org/10.1038/s42256-025-01078-8}
}

@inproceedings{guo2024skysense,
    author    = {Guo, Xin and Lao, Jiangwei and Dang, Bo and Zhang, Yingying and Yu, Lei and Ru, Lixiang and Zhong, Liheng and Huang, Ziyuan and Wu, Kang and Hu, Dingxiang and He, Huimei and Wang, Jian and Chen, Jingdong and Yang, Ming and Zhang, Yongjun and Li, Yansheng},
    title     = {SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {27672-27683}
}

@inproceedings{zhu2025skysenseo,
  title={SkySense-O: Towards open-world remote sensing interpretation with vision-centric visual-language modeling},
  author={Zhu, Qi and Lao, Jiangwei and Ji, Deyi and Luo, Junwei and Wu, Kang and Zhang, Yingying and Ru, Lixiang and Wang, Jian and Chen, Jingdong and Yang, Ming and others},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={14733--14744},
  year={2025}
}

@article{luo2024skysensegpt,
  title={SkySenseGPT: A fine-grained instruction tuning dataset and model for remote sensing vision-language understanding},
  author={Luo, Junwei and Pang, Zhen and Zhang, Yongjun and Wang, Tingzhu and Wang, Linlin and Dang, Bo and Lao, Jiangwei and Wang, Jian and Chen, Jingdong and Tan, Yihua and others},
  journal={arXiv preprint arXiv:2406.10100},
  year={2024}
}

Star History

[Star History Chart]
