A curated list of awesome Foundation Models in Agriculture papers 🔥🔥🔥.
Currently maintained by Jiajia Li @ MSU.
This list is still a work in progress 🚀, and we appreciate any suggestions and contributions ❤️.
If you have suggestions or find any missing papers, feel free to reach out or submit a pull request:
- Use the following markdown format:
*Author 1, Author 2, and Author 3.* **Paper Title.** <ins>Conference/Journal/Preprint</ins> Year. [[pdf](link)]; [[other resources](link)].
- If a preprint has multiple versions, please use the year of the earliest submission.
- List the papers in descending order by year (latest first).
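For contributors who prefer to generate entries rather than type them by hand, the guidelines above can be sketched as a small Python helper. This is only an illustration, not a script that ships with the repository: the `Paper` class and the `join_authors`, `format_entry`, and `render` functions are hypothetical names, and the sample values are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    """Hypothetical record for one list entry (not part of the repository)."""
    authors: list[str]
    title: str
    venue: str
    year: int  # for a preprint with several versions, the earliest submitted year
    pdf: str
    resources: str = ""  # optional link to code, project page, etc.


def join_authors(authors: list[str]) -> str:
    """Join names as 'A', 'A and B', or 'A, B, and C'."""
    if len(authors) <= 2:
        return " and ".join(authors)
    return ", ".join(authors[:-1]) + ", and " + authors[-1]


def format_entry(p: Paper) -> str:
    """Render one paper in the markdown entry format requested above."""
    line = (
        f"*{join_authors(p.authors)}.* **{p.title}.** "
        f"<ins>{p.venue}</ins> {p.year}. [[pdf]({p.pdf})]"
    )
    if p.resources:
        line += f"; [[other resources]({p.resources})]"
    return line + "."


def render(papers: list[Paper]) -> str:
    """Sort newest first and emit one markdown bullet per paper."""
    ordered = sorted(papers, key=lambda p: p.year, reverse=True)
    return "\n".join("- " + format_entry(p) for p in ordered)


if __name__ == "__main__":
    # Placeholder entries only; replace with real metadata before submitting.
    demo = [
        Paper(["Author 1", "Author 2"], "Older Paper Title", "Preprint", 2022, "link"),
        Paper(["Author 1", "Author 2", "Author 3"], "Paper Title",
              "Conference/Journal/Preprint", 2024, "link", "link"),
    ]
    print(render(demo))
```

Running the file prints the two placeholder entries with the 2024 one first. Papers from the same year keep the order in which they were supplied.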
Find this repository helpful? 😊
Please consider citing our paper. 👇👇👇
(Note that the current version of our survey is only a draft, and we are still working on it.) 🚀
@article{li2024foundation,
title={Foundation models in smart agriculture: Basics, opportunities, and challenges},
author={Li, Jiajia and Xu, Mingle and Xiang, Lirong and Chen, Dong and Zhuang, Weichao and Yin, Xunyuan and Li, Zhaojian},
journal={Computers and Electronics in Agriculture},
volume={222},
pages={109032},
year={2024},
publisher={Elsevier}
}
- Wu, Jing, Zhixin Lai, Suiyao Chen, Ran Tao, Pan Zhao, and Naira Hovakimyan. "The New Agronomists: Language Models are Experts in Crop Management." arXiv preprint arXiv:2403.19839 (2024).
Why foundation models instead of traditional deep learning models?
- 👉 Pre-trained Knowledge. By training on vast and diverse datasets, FMs possess a form of "general intelligence" that encompasses knowledge of the world, language, vision, and their specific training domains.
- 👉 Fine-tuning Flexibility. FMs can be fine-tuned effectively for particular tasks or datasets, saving the compute and time required to train large models from scratch.
- 👉 Data Efficiency. FMs draw on their foundational knowledge and can perform well even with limited task-specific data, which makes them well suited to scenarios where data is scarce.
- Moor, Michael, et al. "Foundation models for generalist medical artificial intelligence." Nature 616.7956 (2023): 259-265. [Google Scholar] [Paper]
- Mai, Gengchen, et al. "On the opportunities and challenges of foundation models for geospatial artificial intelligence." arXiv preprint arXiv:2304.06798 (2023). [Google Scholar] [Paper]
- Stella, Francesco, Cosimo Della Santina, and Josie Hughes. "How can LLMs transform the robotic design process?." Nature Machine Intelligence 5.6 (2023): 561-564. [Google Scholar] [Paper]
- Zhang, Chaoning, et al. "A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering." arXiv preprint arXiv:2306.06211 (2023). [Google Scholar] [Paper]
- Yang, Sherry, et al. "Foundation models for decision making: Problems, methods, and opportunities." arXiv preprint arXiv:2303.04129 (2023). [Google Scholar] [Paper]
- Zhang, Xinsong, et al. "Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks." arXiv preprint arXiv:2301.05065 (2023). [Google Scholar] [Paper]
- Bommasani, Rishi, et al. "On the opportunities and risks of foundation models." arXiv preprint arXiv:2108.07258 (2021). [Google Scholar] [Paper]
In our paper, we divide foundation models into four categories.
- Devlin, Jacob, et al. "Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018). [Google Scholar] [Paper]
- Du, Nan, et al. "Glam: Efficient scaling of language models with mixture-of-experts." International Conference on Machine Learning. PMLR, 2022. [Google Scholar] [Paper]
- Claude 3 [Website]
- Yuan, Lu, et al. "Florence: A new foundation model for computer vision." arXiv preprint arXiv:2111.11432 (2021). [Google Scholar] [Paper]
- Ramesh, Aditya, et al. "Hierarchical text-conditional image generation with clip latents." arXiv preprint arXiv:2204.06125 (2022). [Google Scholar] [Paper]
- Saharia, Chitwan, et al. "Photorealistic text-to-image diffusion models with deep language understanding." Advances in Neural Information Processing Systems 35 (2022): 36479-36494. [Google Scholar] [Paper]
- Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022. [Google Scholar] [Paper]
- Kang, Minguk, et al. "Scaling up gans for text-to-image synthesis." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023. [Google Scholar] [Paper]
- Cao, Yunkang, et al. "Segment Any Anomaly without Training via Hybrid Prompt Regularization." arXiv preprint arXiv:2305.10724 (2023). [Google Scholar] [Paper]
- Zou, Xueyan, et al. "Segment everything everywhere all at once." arXiv preprint arXiv:2304.06718 (2023). [Google Scholar] [Paper]
- Cherti, Mehdi, et al. "Reproducible scaling laws for contrastive language-image learning." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023. [Google Scholar] [Paper]
- Li, Junnan, et al. "Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation." International Conference on Machine Learning. PMLR, 2022. [Google Scholar] [Paper]
- Alayrac, Jean-Baptiste, et al. "Flamingo: a visual language model for few-shot learning." Advances in Neural Information Processing Systems 35 (2022): 23716-23736. [Google Scholar] [Paper]
- Huang, Shaohan, et al. "Language is not all you need: Aligning perception with language models." arXiv preprint arXiv:2302.14045 (2023). [Google Scholar] [Paper]
- Girdhar, Rohit, et al. "Imagebind: One embedding space to bind them all." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023. [Paper]
- Wei, Longhui, et al. "Mvp: Multimodality-guided visual pre-training." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022. [Arxiv]
- Gemini 1.5. [Website]
- Reed, Scott, et al. "A generalist agent." arXiv preprint arXiv:2205.06175 (2022). [Google Scholar] [Paper]
- Team, Adaptive Agent, et al. "Human-timescale adaptation in an open-ended task space." arXiv preprint arXiv:2301.07608 (2023). [Google Scholar] [Paper]
- Xu, Yunbi, et al. "Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction." Molecular Plant 15.11 (2022): 1664-1695. [Google Scholar] [Paper]
- Williams, Dominic, Fraser MacFarlane, and Avril Britten. "Leaf Only SAM: A Segment Anything Pipeline for Zero-Shot Automated Leaf Segmentation." arXiv preprint arXiv:2305.09418 (2023). [Google Scholar] [Paper]
- Yang, Xiao, et al. "SAM for Poultry Science." arXiv preprint arXiv:2305.10254 (2023). [Google Scholar] [Paper]
- Stella, Francesco, Cosimo Della Santina, and Josie Hughes. "How can LLMs transform the robotic design process?." Nature Machine Intelligence 5.6 (2023): 561-564. [Google Scholar] [Paper]
- Tzachor, Asaf, et al. "Large language models and agricultural extension services." Nature Food 4.11 (2023): 941-948. [Google Scholar] [Paper]
- Lu, Guoyu, et al. "Agi for agriculture." arXiv preprint arXiv:2304.06136 (2023). [Google Scholar] [Paper]
- Yang, Xianjun, et al. "Pllama: An open-source large language model for plant science." arXiv preprint arXiv:2401.01600 (2024). [Google Scholar] [Paper]
- Shutske, John M. "Harnessing the Power of Large Language Models in Agricultural Safety & Health." Journal of Agricultural Safety and Health (2023): 0. [Google Scholar] [Paper]
- Kuska, Matheus Thomas, Mirwaes Wahabzada, and Stefan Paulus. "Ai-Chatbots for Agriculture-Where Can Large Language Models Provide Substantial Value?." Available at SSRN 4685971. [Google Scholar] [Paper]
- Cao, Yiyi, et al. "Cucumber disease recognition with small samples using image-text-label-based multi-modal language model." Computers and Electronics in Agriculture 211 (2023): 107993. [Google Scholar] [Paper]
- Stella, Francesco, Cosimo Della Santina, and Josie Hughes. "How can LLMs transform the robotic design process?." Nature Machine Intelligence 5.6 (2023): 561-564. [Google Scholar] [Paper]
- Tan, Chenjiao, et al. "On the promises and challenges of multimodal foundation models for geographical, environmental, agricultural, and urban planning applications." arXiv preprint arXiv:2312.17016 (2023). [Google Scholar] [Paper]
- Zhao, Xinyan, Baiyan Chen, Mengxue Ji, Xinyue Wang, Yuhan Yan, Jinming Zhang, Shiyingjie Liu, Muyang Ye, and Chunli Lv. "Implementation of Large Language Models and Agricultural Knowledge Graphs for Efficient Plant Disease Detection." Agriculture 14, no. 8 (2024): 1359. [Paper]
- Fattepur, Bhumika, A. Sakshi, A. Abhishek, and Sneha Varur. "Cultivating Prosperity: A Fusion of IoT Data with Machine Learning and Deep Learning for Precision Crop Recommendations." In 2024 5th International Conference for Emerging Technology (INCET), pp. 1-6. IEEE, 2024. [Paper]
- Xie, Yiqun, Zhihao Wang, Weiye Chen, Zhili Li, Xiaowei Jia, Yanhua Li, Ruichen Wang, Kangyang Chai, Ruohan Li, and Sergii Skakun. "When are Foundation Models Effective? Understanding the Suitability for Pixel-Level Classification Using Multispectral Imagery." arXiv preprint arXiv:2404.11797 (2024).