I am now an assistant professor at Beihang University, working with Prof. Chunming Hu and Prof. Li Li. I obtained my Ph.D. from Monash University in Dec 2024. From 2021 to 2024, I had the privilege of being mentored by Prof. Li Li, Prof. John Grundy, Prof. Chunyang Chen, and Dr. Xiao Chen (ordered by length of supervision) at the Monash HumaniSE Lab (led by Prof. John Grundy) and the SMAT Lab (led by Prof. Li Li). Prior to joining Monash University, I worked with Prof. Yipeng Liu and Prof. Ce Zhu at UESTC as a master’s student. I obtained my Bachelor’s degree from Wuhan University of Technology.
My current research interests include SE4AI, AI4SE, mobile software engineering, AI security, and computer vision. If you are interested in my research, please feel free to contact me.
Our paper on ArkAnalyzer, the first static analysis framework for OpenHarmony, has been accepted by ICSE’25-SEIP, and the framework is open sourced. You are welcome to use and cite it!
@misc{chen2025arkanalyzerstaticanalysisframework,title={ArkAnalyzer: The Static Analysis Framework for OpenHarmony},author={Chen, Haonan and Chen, Daihang and Yang, Yizhuo and Xu, Lingyun and Gao, Liang and Zhou, Mingyi and Hu, Chunming and Li, Li},year={2025},eprint={2501.05798},archiveprefix={arXiv},primaryclass={cs.SE},url={https://arxiv.org/abs/2501.05798},}
DynaMO: Protecting Mobile DL Models through Coupling Obfuscated DL Operators
Mingyi Zhou, Xiang Gao, Xiao Chen, Chunyang Chen, John Grundy, and Li Li
In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, Sacramento, CA, USA, 2024
Deploying deep learning (DL) models on mobile applications (Apps) has become ever-more popular. However, existing studies show attackers can easily reverse-engineer mobile DL models in Apps to steal intellectual property or generate effective attacks. A recent approach, Model Obfuscation, has been proposed to defend against such reverse engineering by obfuscating DL model representations, such as weights and computational graphs, without affecting model performance. These existing model obfuscation methods use static methods to obfuscate the model representation, or they use half-dynamic methods but require users to restore the model information through additional input arguments. However, these static methods or half-dynamic methods cannot provide enough protection for on-device DL models. Attackers can use dynamic analysis to mine the sensitive information in the inference codes as the correct model information and intermediate results must be recovered at runtime for static and half-dynamic obfuscation methods. We assess the vulnerability of the existing obfuscation strategies using an instrumentation method and tool, DLModelExplorer, that dynamically extracts correct sensitive model information (i.e., weights, computational graph) at runtime. Experiments show it achieves very high attack performance (e.g., 98.76% of weights extraction rate and 99.89% of obfuscating operator classification rate). To defend against such attacks based on dynamic instrumentation, we propose DynaMO, a Dynamic Model Obfuscation strategy similar to Homomorphic Encryption. The obfuscation and recovery process can be done through simple linear transformation for the weights of randomly coupled eligible operators, which is a fully dynamic obfuscation strategy. Experiments show that our proposed strategy can dramatically improve model security compared with the existing obfuscation strategies, with only negligible overheads for on-device models. Our prototype tool is publicly available at https://github.com/zhoumingyi/DynaMO.
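To make the coupling idea above concrete, here is a minimal NumPy sketch (my illustration, not the DynaMO implementation: the dense, ReLU, dense stack and the single positive scaling factor are assumptions). Scaling one operator's weights by a secret factor and folding the inverse into a coupled downstream operator leaves the end-to-end output unchanged, while neither stored weight tensor matches the original model.

```python
# Illustrative sketch only: obfuscate two coupled operators with a linear
# transformation that cancels end-to-end (relies on relu(a*z) == a*relu(z) for a > 0).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8))

# Original operators: dense -> ReLU -> dense
W1, b1 = rng.normal(size=(8, 16)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 4)), rng.normal(size=4)

def forward(w1, c1, w2, c2):
    h = np.maximum(x @ w1 + c1, 0.0)   # ReLU
    return h @ w2 + c2

# Couple the operators: scale the first by a secret alpha, compensate in the second.
alpha = float(rng.uniform(0.5, 2.0))
W1_obf, b1_obf = W1 * alpha, b1 * alpha   # stored weights no longer equal the originals
W2_obf = W2 / alpha                       # recovery is folded into the coupled operator

assert np.allclose(forward(W1, b1, W2, b2), forward(W1_obf, b1_obf, W2_obf, b2))
```

The actual DynaMO scheme, which couples randomly selected eligible operators, is detailed in the paper and the linked repository.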
@inproceedings{10.1145/3691620.3694998,author={Zhou, Mingyi and Gao, Xiang and Chen, Xiao and Chen, Chunyang and Grundy, John and Li, Li},title={DynaMO: Protecting Mobile DL Models through Coupling Obfuscated DL Operators},year={2024},isbn={9798400712487},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3691620.3694998},doi={10.1145/3691620.3694998},booktitle={Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering},pages={204–215},numpages={12},keywords={SE for AI, AI safety, on-device AI},location={Sacramento, CA, USA},series={ASE '24},}
LLM for Mobile: An Initial Roadmap
Daihang Chen, Yonghui Liu, Mingyi Zhou, Yanjie Zhao, Haoyu Wang, Shuai Wang, Xiao Chen, Tegawende F. Bissyande, Jacques Klein, and Li Li
ACM Transactions on Software Engineering and Methodology, Dec 2024
When mobile meets LLMs, mobile app users deserve to have more intelligent usage experiences. For this to happen, we argue that there is a strong need to apply LLMs for the mobile ecosystem. We therefore provide a research roadmap for guiding our fellow researchers to achieve that as a whole. In this roadmap, we sum up six directions that we believe are urgently required for research to enable native intelligence in mobile devices. In each direction, we further summarize the current research progress and the gaps that still need to be filled by our fellow researchers.
@article{10.1145/3708528,author={Chen, Daihang and Liu, Yonghui and Zhou, Mingyi and Zhao, Yanjie and Wang, Haoyu and Wang, Shuai and Chen, Xiao and Bissyande, Tegawende F. and Klein, Jacques and Li, Li},title={LLM for Mobile: An Initial Roadmap},year={2024},publisher={Association for Computing Machinery},address={New York, NY, USA},issn={1049-331X},url={https://doi.org/10.1145/3708528},doi={10.1145/3708528},journal={ACM Trans. Softw. Eng. Methodol.},month=dec,}
Model-less Is the Best Model: Generating Pure Code Implementations to Replace On-Device DL Models
Mingyi Zhou, Xiang Gao, Pei Liu, John Grundy, Chunyang Chen, Xiao Chen, and Li Li
In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, Vienna, Austria, Sep 2024
Recent studies show that on-device deployed deep learning (DL) models, such as those of TensorFlow Lite (TFLite), can be easily extracted from real-world applications and devices by attackers to generate many kinds of adversarial and other attacks. Although securing deployed on-device DL models has gained increasing attention, no existing methods can fully prevent these attacks. Traditional software protection techniques have been widely explored. If on-device models can be implemented using pure code, such as C++, it will open the possibility of reusing existing robust software protection techniques. However, due to the complexity of DL models, there is no automatic method that can translate DL models to pure code. To fill this gap, we propose a novel method, CustomDLCoder, to automatically extract on-device DL model information and synthesize a customized executable program for a wide range of DL models. CustomDLCoder first parses the DL model, extracts its backend computing codes, configures the extracted codes, and then generates a customized program to implement and deploy the DL model without explicit model representation. The synthesized program hides model information for DL deployment environments since it does not need to retain explicit model representation, preventing many attacks on the DL model. In addition, it improves ML performance because the customized code removes model parsing and preprocessing steps and only retains the data computing process. Our experimental results show that CustomDLCoder improves model security by disabling on-device model sniffing. Compared with the original on-device platform (i.e., TFLite), our method can accelerate model inference by 21.0% and 24.3% on x86-64 and ARM64 platforms, respectively. Most importantly, it can significantly reduce memory consumption by 68.8% and 36.0% on x86-64 and ARM64 platforms, respectively.
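As a rough illustration of what deployment "without explicit model representation" means, here is a toy Python sketch (my own example; CustomDLCoder itself extracts and configures TFLite's C++ backend computing code, and the generator, function name, and single dense+ReLU operator below are assumptions): the trained weights are baked into generated source code, so no model file or parser ships with the app.

```python
# Toy sketch of the "model-less" idea: emit a standalone function with the trained
# weights baked in as literals, so no model file, graph definition, or parser ships
# with the app. (CustomDLCoder works on TFLite's C++ backend, not on Python.)
import numpy as np

def emit_dense_relu(w, b, name="layer0"):
    """Generate source code for one dense+ReLU operator with hard-coded weights."""
    return (
        f"def {name}(x):\n"
        f"    w = {w.tolist()}\n"
        f"    b = {b.tolist()}\n"
        f"    y = [sum(xi * wij for xi, wij in zip(x, col)) + bj\n"
        f"         for col, bj in zip(zip(*w), b)]\n"
        f"    return [max(v, 0.0) for v in y]\n"
    )

rng = np.random.default_rng(0)
w, b = rng.normal(size=(4, 3)).round(3), rng.normal(size=3).round(3)
generated_source = emit_dense_relu(w, b)

# The generated code is self-contained: at runtime there is only plain arithmetic,
# with no model parsing or preprocessing step.
scope = {}
exec(generated_source, scope)
print(scope["layer0"]([1.0, 2.0, 3.0, 4.0]))
```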
@inproceedings{10.1145/3650212.3652119,author={Zhou, Mingyi and Gao, Xiang and Liu, Pei and Grundy, John and Chen, Chunyang and Chen, Xiao and Li, Li},title={Model-less Is the Best Model: Generating Pure Code Implementations to Replace On-Device DL Models},year={2024},isbn={9798400706127},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3650212.3652119},doi={10.1145/3650212.3652119},booktitle={Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis},pages={174–185},numpages={12},keywords={AI safety, SE for AI, software optimization for AI deployment},location={Vienna, Austria},series={ISSTA 2024},}
Concealing Sensitive Samples against Gradient Leakage in Federated Learning
Jing Wu, Munawar Hayat, Mingyi Zhou, and Mehrtash Harandi
Proceedings of the AAAI Conference on Artificial Intelligence, Mar 2024
@article{Wu_Hayat_Zhou_Harandi_2024,title={Concealing Sensitive Samples against Gradient Leakage in Federated Learning},volume={38},url={https://ojs.aaai.org/index.php/AAAI/article/view/30171},doi={10.1609/aaai.v38i19.30171},abstractnote={Federated Learning (FL) is a distributed learning paradigm that enhances users’ privacy by eliminating the need for clients to share raw, private data with the server. Despite the success, recent studies expose the vulnerability of FL to model inversion attacks, where adversaries reconstruct users’ private data via eavesdropping on the shared gradient information. We hypothesize that a key factor in the success of such attacks is the low entanglement among gradients per data within the batch during stochastic optimization. This creates a vulnerability that an adversary can exploit to reconstruct the sensitive data. Building upon this insight, we present a simple, yet effective defense strategy that obfuscates the gradients of the sensitive data with concealed samples. To achieve this, we propose synthesizing concealed samples to mimic the sensitive data at the gradient level while ensuring their visual dissimilarity from the actual sensitive data. Compared to the previous art, our empirical evaluations suggest that the proposed technique provides the strongest protection while simultaneously maintaining the FL performance. Code is located at https://github.com/JingWu321/DCS-2.},number={19},journal={Proceedings of the AAAI Conference on Artificial Intelligence},author={Wu, Jing and Hayat, Munawar and Zhou, Mingyi and Harandi, Mehrtash},year={2024},month=mar,pages={21717-21725},}
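A minimal sketch of the gradient-level mimicry idea from the concealing-samples paper above (a toy setup of my own: a linear regression model, finite-difference optimization, and an arbitrary dissimilarity weight; the paper's actual synthesis objective and procedure are in the linked repository):

```python
# Toy sketch: synthesize a concealed sample whose per-sample gradient mimics that of
# a sensitive sample, while a penalty keeps it far from the sensitive input.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=8)                      # shared (linear) model weights
x_sensitive, y = rng.normal(size=8), 1.0

def per_sample_grad(x):
    # gradient of 0.5 * (w.x - y)^2 with respect to w
    return (w @ x - y) * x

def objective(x):
    grad_gap = np.sum((per_sample_grad(x) - per_sample_grad(x_sensitive)) ** 2)
    dissimilarity = np.sum((x - x_sensitive) ** 2)
    return grad_gap - 0.01 * dissimilarity  # match gradients, stay visually different

# Plain finite-difference descent with normalized steps -- enough for illustration.
x_concealed, eps = rng.normal(size=8), 1e-4
for _ in range(3000):
    g = np.array([(objective(x_concealed + eps * e) - objective(x_concealed - eps * e)) / (2 * eps)
                  for e in np.eye(8)])
    x_concealed -= 0.02 * g / (np.linalg.norm(g) + 1e-12)

print("gradient gap:  ", np.linalg.norm(per_sample_grad(x_concealed) - per_sample_grad(x_sensitive)))
print("input distance:", np.linalg.norm(x_concealed - x_sensitive))
```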
Investigating White-Box Attacks for On-Device Models
Mingyi Zhou, Xiang Gao, Jing Wu, Kui Liu, Hailong Sun, and Li Li
In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, Lisbon, Portugal, Apr 2024
Numerous mobile apps have leveraged deep learning capabilities. However, on-device models are vulnerable to attacks as they can be easily extracted from their corresponding mobile apps. Although the structure and parameters information of these models can be accessed, existing on-device attacking approaches only generate black-box attacks (i.e., indirect white-box attacks), which are less effective and efficient than white-box strategies. This is because mobile deep learning (DL) frameworks like TensorFlow Lite (TFLite) do not support gradient computing (referred to as non-debuggable models), which is necessary for white-box attacking algorithms. Thus, we argue that existing findings may underestimate the harmfulness of on-device attacks. To validate this, we systematically analyze the difficulties of transforming the on-device model to its debuggable version and propose a Reverse Engineering framework for On-device Models (REOM), which automatically reverses the compiled on-device TFLite model to its debuggable version, enabling attackers to launch white-box attacks. Our empirical results show that our approach is effective in achieving automated transformation (i.e., 92.6%) among 244 TFLite models. Compared with previous attacks using surrogate models, REOM enables attackers to achieve higher attack success rates (10.23%→89.03%) with a hundred times smaller attack perturbations (1.0→0.01). Our findings emphasize the need for developers to carefully consider their model deployment strategies, and use white-box methods to evaluate the vulnerability of on-device models. Our artifacts are available.
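To illustrate why a debuggable (gradient-capable) model matters for attackers, here is a toy white-box FGSM step on a hand-written logistic model (my own example; REOM recovers debuggable versions of real TFLite models rather than anything like this toy, and all numbers below are illustrative):

```python
# Toy illustration of why gradient access matters: with a debuggable model, a
# single white-box FGSM step crafts an effective perturbation; a compiled,
# non-debuggable on-device model only exposes predictions, so attackers must
# fall back to weaker indirect (black-box) strategies.
import numpy as np

rng = np.random.default_rng(0)
w, b = rng.normal(size=16), 0.1                 # toy logistic model
x = (2.0 - b) / (w @ w) * w                     # input placed at logit 2.0, i.e. p ~ 0.88
label = 1.0

def prob(x):
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))   # sigmoid(w.x + b)

# White-box: the input gradient of the cross-entropy loss has a closed form here:
# d(loss)/dx = (prob(x) - label) * w
grad_x = (prob(x) - label) * w
x_adv = x + 0.25 * np.sign(grad_x)              # one FGSM step, epsilon = 0.25

print("confidence in true class (clean):      ", prob(x))      # ~0.88
print("confidence in true class (adversarial):", prob(x_adv))  # far lower
```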
@inproceedings{10.1145/3597503.3639144,author={Zhou, Mingyi and Gao, Xiang and Wu, Jing and Liu, Kui and Sun, Hailong and Li, Li},title={Investigating White-Box Attacks for On-Device Models},year={2024},isbn={9798400702174},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3597503.3639144},doi={10.1145/3597503.3639144},booktitle={Proceedings of the IEEE/ACM 46th International Conference on Software Engineering},articleno={152},numpages={12},location={Lisbon, Portugal},series={ICSE '24},}
ModelObfuscator: Obfuscating Model Information to Protect Deployed ML-Based Systems
Mingyi Zhou, Xiang Gao, Jing Wu, John Grundy, Xiao Chen, Chunyang Chen, and Li Li
In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, Seattle, WA, USA, Jul 2023
More and more edge devices and mobile apps are leveraging deep learning (DL) capabilities. Deploying such models on devices – referred to as on-device models – rather than as remote cloud-hosted services, has gained popularity because it avoids transmitting user’s data off of the device and achieves high response time. However, on-device models can be easily attacked, as they can be accessed by unpacking corresponding apps and the model is fully exposed to attackers. Recent studies show that attackers can easily generate white-box-like attacks for an on-device model or even inverse its training data. To protect on-device models from white-box attacks, we propose a novel technique called model obfuscation. Specifically, model obfuscation hides and obfuscates the key information – structure, parameters and attributes – of models by renaming, parameter encapsulation, neural structure obfuscation, shortcut injection, and extra layer injection. We have developed a prototype tool ModelObfuscator to automatically obfuscate on-device TFLite models. Our experiments show that this proposed approach can dramatically improve model security by significantly increasing the difficulty of parsing models’ inner information, without increasing the latency of DL models. Our proposed on-device model obfuscation has the potential to be a fundamental technique for on-device model deployment. Our prototype tool is publicly available at https://github.com/zhoumingyi/ModelObfuscator.
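As a toy illustration of one of the listed operations, extra layer injection, here is a NumPy sketch (my own example; ModelObfuscator rewrites TFLite models, and the concrete structures it injects are described in the paper): a pair of injected layers that compose to the identity disguises and bloats the computational graph without changing model outputs.

```python
# Illustrative sketch of extra-layer injection: two injected dense layers that
# cancel each other, so the obfuscated graph looks different but computes the
# same function as the original operator.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8))
W, b = rng.normal(size=(8, 4)), rng.normal(size=4)

def original(x):
    return x @ W + b

# Injected pair: multiply by a random invertible matrix, then by its inverse.
M = rng.normal(size=(8, 8)) + 8 * np.eye(8)   # well-conditioned, invertible
M_inv = np.linalg.inv(M)

def obfuscated(x):
    h = x @ M          # injected "extra layer" 1
    h = h @ M_inv      # injected "extra layer" 2
    return h @ W + b   # original operator, now buried deeper in the graph

assert np.allclose(original(x), obfuscated(x))
```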
@inproceedings{10.1145/3597926.3598113,author={Zhou, Mingyi and Gao, Xiang and Wu, Jing and Grundy, John and Chen, Xiao and Chen, Chunyang and Li, Li},title={ModelObfuscator: Obfuscating Model Information to Protect Deployed ML-Based Systems},year={2023},isbn={9798400702211},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3597926.3598113},doi={10.1145/3597926.3598113},booktitle={Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis},pages={1005–1017},numpages={13},keywords={AI safety, SE for AI, model deployment, model obfuscation},location={Seattle, WA, USA},series={ISSTA 2023},}
DaST: Data-Free Substitute Training for Adversarial Attacks
Mingyi Zhou, Jing Wu, Yipeng Liu, Shuaicheng Liu, and Ce Zhu
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun 2020
@inproceedings{zhou2020dast,author={Zhou, Mingyi and Wu, Jing and Liu, Yipeng and Liu, Shuaicheng and Zhu, Ce},booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},title={DaST: Data-Free Substitute Training for Adversarial Attacks},year={2020},pages={231-240},doi={10.1109/CVPR42600.2020.00031},}