
The rapid advancement of LLMs has led to widespread adoption across various domains, but it has also raised concerns about data security and privacy, particularly with publicly available and commercially operated platforms. Given their high computational demands, cloud environments are the obvious choice for deployment. As a result, organizations are increasingly deploying LLMs in confined cloud environments to protect sensitive data while leveraging scalable cloud resources. However, deploying LLMs in cloud environments remains a complex and time-consuming process that requires specialized skills and expertise in areas such as infrastructure management, resource allocation, and model setup. Testing and comparing LLMs to select the appropriate one is particularly challenging because different models are trained for different purposes, making direct comparison nontrivial. Furthermore, differences in model architectures, training data, and fine-tuning strategies make objective evaluation difficult, limiting the effectiveness of traditional benchmarking approaches. To address these challenges, we present a cloud-native system that automates both the deployment and evaluation of LLMs. Our contributions are twofold: (i) we automate the provisioning and deployment of LLMs on various cloud platforms to streamline infrastructure setup, and (ii) we develop a lightweight evaluation framework that leverages the LLM-as-a-Judge approach, where an independent LLM systematically assesses and compares different models based on predefined evaluation criteria. Our ongoing work aims to optimize LLM deployment by selecting cost-efficient cloud resources. We are also enhancing the evaluation framework with diverse prompts, broader metrics, and cross-model validation for fair, reproducible benchmarking.
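The LLM-as-a-Judge evaluation described in contribution (ii) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the criteria names, the prompt template, and the `stub_judge` function (standing in for a call to an independent judge LLM) are all assumptions made for this sketch.

```python
import re

# Hypothetical evaluation criteria; the paper mentions predefined criteria
# but does not list them in the abstract.
CRITERIA = ["accuracy", "relevance", "clarity"]

def build_judge_prompt(question: str, answer: str) -> str:
    """Assemble a prompt asking an independent judge LLM to score an answer."""
    rubric = ", ".join(CRITERIA)
    return (
        f"Rate the following answer on {rubric}, each from 1 to 5.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply with one line per criterion, e.g. 'accuracy: 4'."
    )

def parse_scores(reply: str) -> dict:
    """Extract 'criterion: score' lines from the judge's reply."""
    scores = {}
    for crit in CRITERIA:
        m = re.search(rf"{crit}\s*:\s*([1-5])", reply, re.IGNORECASE)
        if m:
            scores[crit] = int(m.group(1))
    return scores

def compare_models(question: str, answers: dict, judge) -> str:
    """Score each candidate model's answer with the judge; return the winner."""
    totals = {
        model: sum(parse_scores(judge(build_judge_prompt(question, ans))).values())
        for model, ans in answers.items()
    }
    return max(totals, key=totals.get)

# Stub standing in for a real judge-LLM API call (assumption for this sketch):
# it favors replies that mention "Paris" for the example question below.
def stub_judge(prompt: str) -> str:
    if "Paris" in prompt:
        return "accuracy: 5\nrelevance: 4\nclarity: 4"
    return "accuracy: 2\nrelevance: 3\nclarity: 3"

best = compare_models(
    "What is the capital of France?",
    {"model-a": "Paris.", "model-b": "I think it is Lyon."},
    stub_judge,
)
print(best)  # model-a
```

In a real deployment the `judge` callable would wrap an API call to an independently hosted LLM, and the per-criterion scores could be aggregated across many prompts rather than a single question.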

More information

Original publication

DOI: 10.1109/CLOUD67622.2025.00053
Type: Conference paper
Publication date: 2025
Pages: 448–450
Total pages: 2