A Study on Performance Improvement of Prompt Engineering for Generative AI with a Large Language Model
DOI: https://doi.org/10.13052/jwe1540-9589.2285

Keywords: AI, large language model, generative AI, few-shot learning, prompt engineering, AI chatbot

Abstract
Among the many models introduced in the realm of Generative AI, prompt engineering has emerged as a significant technique in natural language processing-based Generative AI: it improves the quality of sentences generated by large language models (LLMs) simply by restructuring the input prompt, without modifying the model itself. In this study, we apply prompt engineering to Korean-based LLMs and present an efficient approach for generating specific conversational responses with less data, using a query transformation module (QTM). The proposed QTM rewrites an input prompt into three distinct query methods, breaking it down into an objective and key points so that it is easier for the LLM to interpret. For performance validation, we employ Korean versions of LLMs, specifically SKT GPT-2 and Kakaobrain KoGPT-3, and compare four query methods, including the original unmodified query, using Google SSA (sensibleness and specificity average) to assess the naturalness and specificity of the generated sentences. The results show an average improvement of 11.46% over the unmodified query, demonstrating the effectiveness of the proposed QTM.
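To make the idea of a query transformation step more concrete, the sketch below shows, in minimal Python, how an input prompt might be decomposed into an objective and key points and then reassembled before being handed to an LLM. It is an illustrative assumption only: the class and function names, the sentence-splitting heuristic, and the three placeholder method labels are not taken from the paper, whose actual QTM implementation and query methods are not reproduced on this page.

```python
# Illustrative sketch only: hypothetical query transformation, not the paper's QTM.
# It splits a raw prompt into an objective and key points, then rebuilds a
# structured prompt for one of three placeholder query methods.

from dataclasses import dataclass


@dataclass
class TransformedQuery:
    method: str        # name of the (placeholder) query method applied
    objective: str     # what the model is asked to do
    key_points: list   # salient fragments of the original prompt
    prompt: str        # final prompt string handed to the LLM


def split_objective_and_key_points(raw_prompt: str) -> tuple:
    """Naive split: first sentence becomes the objective, the rest become key points."""
    sentences = [s.strip() for s in raw_prompt.replace("?", ".").split(".") if s.strip()]
    objective = sentences[0] if sentences else raw_prompt
    return objective, sentences[1:]


def transform(raw_prompt: str, method: str) -> TransformedQuery:
    """Rebuild the prompt according to one (hypothetical) query method."""
    objective, key_points = split_objective_and_key_points(raw_prompt)
    bullet_block = "\n".join(f"- {p}" for p in key_points)
    prompt = f"[{method}]\nObjective: {objective}\nKey points:\n{bullet_block}\nResponse:"
    return TransformedQuery(method, objective, key_points, prompt)


if __name__ == "__main__":
    raw = "Recommend a weekend trip near Seoul. I like quiet places. I travel by train."
    for method in ("method_a", "method_b", "method_c"):  # placeholders for three QTM variants
        print(transform(raw, method).prompt, end="\n\n")
```

In such a setup, each transformed prompt would be sent to the LLM and the generated responses compared against those from the unmodified query, for example with an SSA-style human evaluation of sensibleness and specificity.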