Building RAG Agents with LLMs

Hands-On Seminar – Participants Must Bring Their Laptops

Main Speaker

Guest Speaker

Guest Speaker

Course ID

42634

Date

24/11/2024

Time

Daily seminar
9:00-16:30

Location

Daniel Hotel, 60 Ramat Yam st. Herzliya

Overview

The evolution and adoption of large language models (LLMs) have been nothing short of revolutionary, with retrieval-based systems at the forefront of this technological leap. These models are not just tools for automation; they are partners in enhancing productivity, capable of holding informed conversations by interacting with a vast array of tools and documents. This course is designed for those eager to explore the potential of these systems, focusing on practical deployment and the efficient implementation required to manage the considerable demands of both users and deep learning models. As we delve into the intricacies of LLMs, participants will gain insights into advanced orchestration techniques that include internal reasoning, dialog management, and effective tooling strategies.

Learning Objectives

  • Compose an LLM system that can interact predictably with a user by leveraging internal and external reasoning components.
  • Design a dialog management and document reasoning system that maintains state and coerces information into structured formats.
  • Leverage embedding models for efficient similarity queries for content retrieval and dialog guardrailing.
  • Implement, modularize, and evaluate a RAG agent that can answer questions about the research papers in its dataset without any fine-tuning.
By the end of this workshop, participants will have a solid understanding of RAG agents and the tools necessary to develop their own LLM applications.
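The similarity queries mentioned above boil down to comparing embedding vectors, most often by cosine similarity. The sketch below is illustrative only: the three-dimensional vectors and document names are toy stand-ins for a real embedding model's output.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "embeddings" standing in for a real model's output.
query = np.array([0.9, 0.1, 0.0])
docs = {
    "attention_paper": np.array([0.8, 0.2, 0.1]),
    "cooking_blog":    np.array([0.0, 0.1, 0.9]),
}

# Retrieval (or guardrailing) picks the document most similar to the query.
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
```

The same comparison drives guardrailing: if no stored "allowed topic" vector scores above a threshold against the user's query, the request can be deflected.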

Prerequisites

  • Introductory deep-learning knowledge; comfort with PyTorch and transfer learning is preferred.
  • Intermediate Python experience, including object-oriented programming and familiarity with common libraries.

Course Contents

    • Introduction to the workshop and setting up the environment.
    • Exploration of LLM inference interfaces and microservices.
    • Designing LLM pipelines using LangChain, Gradio, and LangServe.
    • Managing dialog states and integrating knowledge extraction.
    • Strategies for working with long-form documents.
    • Utilizing embeddings for semantic similarity and guardrailing.
    • Implementing vector stores for efficient document retrieval.
    • Evaluation, assessment, and certification.
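To give a rough sense of how the vector-store and retrieval topics above fit together, here is a minimal, self-contained sketch of a RAG loop. The bag-of-words `embed` function and the stubbed `llm` callable are placeholders for a real embedding model and inference service; in the workshop, LangChain components and a hosted LLM would fill these roles.

```python
import numpy as np

DIM = 128
VOCAB: dict = {}  # word -> index; part of the toy embedding stand-in

def embed(text: str) -> np.ndarray:
    """Toy bag-of-words embedding (placeholder for a real embedding model)."""
    v = np.zeros(DIM)
    for word in text.lower().split():
        v[VOCAB.setdefault(word, len(VOCAB))] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

class VectorStore:
    """Minimal in-memory vector store with cosine-similarity search."""
    def __init__(self):
        self.texts, self.vectors = [], []

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(embed(text))

    def search(self, query: str, k: int = 2) -> list:
        q = embed(query)
        scores = [float(np.dot(q, v)) for v in self.vectors]
        ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        return [self.texts[i] for i in ranked[:k]]

def rag_answer(question: str, store: VectorStore, llm) -> str:
    """Retrieve context, then ask the LLM for an answer grounded in it."""
    context = "\n".join(store.search(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)
```

With a stub such as `llm = lambda prompt: "..."` this runs end to end; swapping in a production retriever and model endpoint preserves the same retrieve-then-generate shape.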
Guest Lecture: “How to Run Production LLMs Efficiently on GPUs”

In this session, we will explore how to achieve state-of-the-art inference using TensorRT-LLM and NIM. Attendees will dive into advanced TensorRT-LLM features, such as in-flight batching and KV caching, designed to accelerate large-scale LLM production systems. We will also review the unique challenges of inference, including high computational demands, latency, and throughput. Discover how TensorRT-LLM optimizes LLM performance on NVIDIA GPUs and integrates seamlessly with NIM, an easy-to-use inference microservice that accelerates the deployment of foundation models on any cloud platform.

Guest Lecture: “AI and RAG, the Oracle Perspective”
