AI infrastructure
UVT operates an NVIDIA DGX system in its data center, intended primarily for research and development in artificial intelligence, machine learning, and advanced data analysis. The system is equipped with an AMD EPYC 7742 processor with 64 cores and 128 threads, running at a base frequency of 2.25 GHz (boosting up to 3.4 GHz), and 1 TB of DDR4-3200 RAM with ECC support. GPU compute is provided by eight NVIDIA A100 GPUs, each with 40 GB of HBM2 memory, 6,912 CUDA cores, 432 Tensor Cores, and a memory bandwidth of about 1.6 TB/s. Storage consists of 15 TB of NVMe capacity.
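As a quick check of what the per-GPU figures add up to across the whole system, the following Python sketch multiplies them by the GPU count. The numbers are taken from the description above; the constant and function names are illustrative only.

```python
# Per-GPU figures quoted in the system description.
GPU_COUNT = 8
MEMORY_GB_PER_GPU = 40
CUDA_CORES_PER_GPU = 6912
TENSOR_CORES_PER_GPU = 432


def aggregate_gpu_resources() -> dict:
    """Return system-wide totals across all GPUs (illustrative helper)."""
    return {
        "gpu_memory_gb": GPU_COUNT * MEMORY_GB_PER_GPU,
        "cuda_cores": GPU_COUNT * CUDA_CORES_PER_GPU,
        "tensor_cores": GPU_COUNT * TENSOR_CORES_PER_GPU,
    }


if __name__ == "__main__":
    for name, total in aggregate_gpu_resources().items():
        print(f"{name}: {total}")
```

In total this gives 320 GB of GPU memory, 55,296 CUDA cores, and 3,456 Tensor Cores.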
Users work in Python notebooks that run inside Docker containers, which simplifies deploying and managing applications. Computational tasks are distributed and managed by the SLURM scheduler (originally the Simple Linux Utility for Resource Management), which allocates computing resources according to user requests. SLURM handles job scheduling and queue management in a multi-user environment, so resources are allocated dynamically as demand changes and the infrastructure stays well utilized.
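To illustrate what a resource request to SLURM typically looks like, here is a minimal Python sketch that builds a batch script asking for one GPU. The `#SBATCH` options used (`--job-name`, `--gres`, `--cpus-per-task`, `--mem`, `--time`, `--output`) are standard SLURM directives, but the specific values, the script name `train.py`, and the helper `build_batch_script` are assumptions made for illustration. The limits and GPU resource names actually configured on this system may differ.

```python
def build_batch_script(job_name: str, gpus: int, cpus: int,
                       mem_gb: int, time_limit: str, command: str) -> str:
    """Build the text of a SLURM batch script (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --gres=gpu:{gpus}",        # number of GPUs requested
        f"#SBATCH --cpus-per-task={cpus}",   # CPU cores for the task
        f"#SBATCH --mem={mem_gb}G",          # host memory for the job
        f"#SBATCH --time={time_limit}",      # wall-clock limit, HH:MM:SS
        "#SBATCH --output=%x-%j.out",        # %x = job name, %j = job ID
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All values below are assumed examples, not site defaults.
    script = build_batch_script(
        job_name="demo",
        gpus=1,
        cpus=8,
        mem_gb=64,
        time_limit="01:00:00",
        command="python train.py",
    )
    print(script)
```

The printed text can be saved to a file and submitted with `sbatch`; `squeue` then shows the job's position in the queue.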
Docker containers make applications portable by packaging them with all their dependencies into isolated environments. This avoids problems caused by differences between runtime environments and gives consistent behavior across platforms. As a result, an application can be deployed and run on a different system without additional modification or configuration.
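As a rough illustration of running a notebook environment in a container, the Python sketch below assembles a `docker run` command. The flags `--rm`, `--gpus`, `-p`, and `-v` are standard Docker CLI options; the image name `example/notebook:latest`, the container port 8888, the mount point `/workspace`, and the helper `build_docker_command` are placeholders assumed for this example, not the configuration used on this system.

```python
import shlex


def build_docker_command(image: str, host_port: int, workdir: str) -> list:
    """Assemble a `docker run` invocation as an argument list (hypothetical helper)."""
    return [
        "docker", "run", "--rm",           # remove the container on exit
        "--gpus", "all",                   # expose all GPUs to the container
        "-p", f"{host_port}:8888",         # map host port to assumed notebook port
        "-v", f"{workdir}:/workspace",     # mount the project directory
        image,
    ]


if __name__ == "__main__":
    # Placeholder image and path; substitute real values before use.
    cmd = build_docker_command(
        image="example/notebook:latest",
        host_port=8888,
        workdir="/home/user/project",
    )
    print(shlex.join(cmd))
```

Because the dependencies live in the image, the same command should behave the same way on any host with Docker and the NVIDIA container runtime installed.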
For more information, visit https://slurm.website.tuke.sk/ (the page will be continuously updated).
Ing. Maroš Harahus, PhD., maros.harahus@tuke.sk, ext. 7630