talk-data.com

Activities & events

Ready to move beyond traditional AI apps? Join us for this Ignite Reactor Learn Live episode where we’ll walk through building agentic AI solutions using Azure AI Foundry.

Learn how to design intelligent agents that can reason, plan, and act autonomously — unlocking new possibilities for enterprise workflows and customer experiences.
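As context for the hands-on portion, here is a rough sketch of the agent-creation flow using the azure-ai-projects preview SDK. The client methods, model deployment name, and connection string are assumptions that can differ across SDK versions, so treat this as an outline rather than the session's actual code.

```python
# Rough sketch of creating and running an agent, assuming the azure-ai-projects
# preview SDK; method names and parameters may differ between SDK versions.
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

project = AIProjectClient.from_connection_string(
    conn_str="<your-foundry-project-connection-string>",  # placeholder
    credential=DefaultAzureCredential(),
)

# Define an agent on top of a model deployed in your project.
agent = project.agents.create_agent(
    model="gpt-4o",  # any model deployment available in your project
    name="workflow-assistant",
    instructions="You help triage enterprise support requests.",
)

# Conversations happen on threads: add a user message, then run the agent.
thread = project.agents.create_thread()
project.agents.create_message(
    thread_id=thread.id,
    role="user",
    content="Summarize the open support requests and suggest next steps.",
)
run = project.agents.create_and_process_run(
    thread_id=thread.id,
    agent_id=agent.id,  # some SDK versions name this parameter assistant_id
)

# Inspect the agent's reply from the thread's messages.
messages = project.agents.list_messages(thread_id=thread.id)
print(messages)
```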

📖 Follow along with the Learn plan 📌 Learn more about the series here

Get started with AI agent development

Ready to move beyond traditional AI apps? Join us for this Ignite Reactor Learn Live episode where we’ll walk through building agentic AI solutions using Azure AI Foundry Agent Service.

Learn how to design intelligent agents that can reason, plan, and act autonomously — unlocking new possibilities for enterprise workflows and customer experiences.

Get started with AI agent development

Ready to move beyond traditional AI apps? Join us for this Ignite Reactor Learn Live episode where we’ll walk through building agentic AI solutions using Microsoft Foundry.

Learn how to design intelligent agents that can reason, plan, and act autonomously — unlocking new possibilities for enterprise workflows and customer experiences.

📖 Follow along with the Learn plan 📌 Learn more about the series here

Get started with AI agent development

Important: Register on the event website to receive the joining link (an RSVP on Meetup will NOT receive the joining link).

This is a virtual event for our global community, so please double-check your local time. Can't make it live? Register anyway! We'll send you a recording of the webinar after the event.

Description: The AI Deep Dive Series is a hands-on virtual initiative designed to empower developers to architect the next generation of Agentic AI. Moving beyond basic prompting, this series guides you through the complete engineering lifecycle using Google’s advanced stack.

You will master the transition from local Gemini CLI environments to building intelligent agents with the Agent Development Kit (ADK) and Model Context Protocol (MCP), culminating in the deployment of secure, collaborative Agent-to-Agent (A2A) ecosystems on Google Cloud Run. Join us to build AI systems that can truly reason, act, and scale.
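To make that stack concrete, here is a minimal sketch of an ADK agent with one tool, based on the google-adk quickstart pattern. The tool, agent name, and deployment commands are illustrative assumptions, and parameter names may vary between ADK releases.

```python
# Minimal ADK agent sketch (assumes the google-adk package; parameter names
# follow the public quickstart but may differ by release).
from google.adk.agents import Agent

def get_order_status(order_id: str) -> dict:
    """Illustrative tool: replace with a real lookup against your own system."""
    return {"order_id": order_id, "status": "shipped"}

root_agent = Agent(
    name="support_agent",
    model="gemini-2.0-flash",  # any Gemini model available to your project
    description="Answers order-status questions for customers.",
    instruction="Use the get_order_status tool whenever the user asks about an order.",
    tools=[get_order_status],
)
```

From there, `adk run` or `adk web` serves the agent locally, MCP servers can be attached as additional tools, and the containerized service can be pushed to Cloud Run (for example with `gcloud run deploy --source .`), which is roughly the flow the series describes.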

All Sessions: Dec 4th, Dec 11th, Dec 13th, Dec 18th and Dec 20th.

Session 1 (Dec 4th) - Get started with Gemini 3.0 using AI Studio
Speaker: Arun KG (Staff Customer Engineer, GenAI, Google)
Abstract: This session is your fast track to deploying next-generation AI applications using Gemini 3.0 and Google AI Studio.

You will learn the new 'vibe coding' workflow to rapidly prototype full-stack web apps from natural language and instantly deploy them to a production-ready Cloud Run endpoint. Master the seamless integration that takes you from a simple prompt to a secure, scalable, and fully managed serverless service in minutes, fundamentally changing your development cycle.

Gemini 3, AI Studio, Cloud Run
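As a taste of the prompt-to-prototype workflow, here is a minimal call to Gemini from Python using the google-genai SDK; the API key and model identifier are placeholders (use whichever Gemini 3 model AI Studio lists for your account).

```python
# Minimal Gemini call via the google-genai SDK; key and model name are placeholders.
from google import genai

client = genai.Client(api_key="YOUR_AI_STUDIO_API_KEY")  # placeholder key
response = client.models.generate_content(
    model="gemini-2.5-flash",  # swap in the Gemini 3 model name shown in AI Studio
    contents="Draft a landing page headline for a serverless notes app.",
)
print(response.text)
```

Once AI Studio has generated the app source, deploying it to Cloud Run is typically a single command from the project folder, for example `gcloud run deploy my-app --source . --region us-central1`.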

All attendees will receive $5 in cloud credits.

Google AI Deep Dive Series (Virtual) - Session 1

Martin Woodward @ Microsoft

Our very own (not so secret) agent, Martin Woodward, takes us through the latest developments in GitHub Copilot with a deep dive into all the announcements from the keynote. You will not only learn how to get started with all the latest and greatest AI-enhanced development features across VS Code and GitHub, but you will also learn how to take the best advantage of them in your day-to-day development work.

AI/ML DevOps GitHub
Microsoft Ignite 2025

Discover the power of Atlassian Rovo, the innovative AI-powered virtual assistant that is transforming the way business teams work and collaborate. In this practical webinar, we'll show you how to use several of Atlassian's pre-built agents to automate tasks, obtain critical information, and optimize your workflows. You'll also learn, step by step, how to create your own agent tailored to your organization's needs.

Is your company ready to boost its productivity with AI in your Atlassian environment? Join us on Aug 21st at 10 am, as Herzum experts demonstrate how and when to utilize out-of-the-box agents in Jira, Confluence, and Jira Service Management.

Additionally, for those who want to venture into developing their own agents, we'll show you how to get started.

This webinar is beneficial for Atlassian platform administrators and users, IT, development, and operations teams, as well as digital transformation leaders.

If you have any questions, please let us know at [email protected]

Atlassian Rovo in Action: Build and Use AI Agents
Akshay Agrawal – guest @ Marimo, Tobias Macey – host

Summary: In this episode of the Data Engineering Podcast, Akshay Agrawal from Marimo discusses the innovative new Python notebook environment, which offers a reactive execution model, full Python integration, and built-in UI elements to enhance the interactive computing experience. He discusses the challenges of traditional Jupyter notebooks, such as hidden state and lack of interactivity, and how Marimo addresses these issues with features like reactive execution and Python-native file formats. Akshay also explores the broader landscape of programmatic notebooks, comparing Marimo to other tools like Jupyter, Streamlit, and Hex, highlighting its unique approach to creating data apps directly from notebooks and eliminating the need for separate app development. The conversation delves into the technical architecture of Marimo, its community-driven development, and future plans, including a commercial offering and enhanced AI integration, emphasizing Marimo's role in bridging the gap between data exploration and production-ready applications.
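Since the episode centers on what makes Marimo different from Jupyter, a small sketch of a marimo notebook file helps: it is stored as plain Python, cells are functions whose dependencies drive reactive re-execution, and UI elements are ordinary Python objects. The structure below follows marimo's documented file format, but decorator and parameter details may vary across versions.

```python
# A tiny marimo notebook, saved as plain Python (e.g. demo.py) and opened with
# `marimo edit demo.py`; structure follows marimo's documented file format.
import marimo

app = marimo.App()

@app.cell
def _():
    import marimo as mo
    return (mo,)

@app.cell
def _(mo):
    # A built-in UI element; moving the slider re-runs dependent cells.
    slider = mo.ui.slider(1, 10, value=3)
    slider
    return (slider,)

@app.cell
def _(slider):
    # Reactive execution: this cell depends on `slider` and updates automatically.
    squared = slider.value ** 2
    squared
    return (squared,)

if __name__ == "__main__":
    app.run()
```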

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Tired of data migrations that drag on for months or even years? What if I told you there's a way to cut that timeline by up to 6x while guaranteeing accuracy? Datafold's Migration Agent is the only AI-powered solution that doesn't just translate your code; it validates every single data point to ensure perfect parity between your old and new systems. Whether you're moving from Oracle to Snowflake, migrating stored procedures to dbt, or handling complex multi-system migrations, they deliver production-ready code with a guaranteed timeline and fixed price. Stop burning budget on endless consulting hours. Visit dataengineeringpodcast.com/datafold to book a demo and see how they're turning months-long migration nightmares into week-long success stories.
Your host is Tobias Macey and today I'm interviewing Akshay Agrawal about Marimo, a reusable and reproducible Python notebook environment.

Interview
Introduction
How did you get involved in the area of data management?
Can you describe what Marimo is and the story behind it?
What are the core problems and use cases that you are focused on addressing with Marimo?
What are you explicitly not trying to solve for with Marimo?
Programmatic notebooks have been around for decades now. Jupyter was largely responsible for making them popular outside of academia. How have the applications of notebooks changed in recent years?
What are the limitations that have been most challenging to address in production contexts?
Jupyter has long had support for multi-language notebooks/notebook kernels. What is your opinion on the utility of that feature as a core concern of the notebook system?
Beyond notebooks, Streamlit and Hex have become quite popular for publishing the results of notebook-style analysis. How would you characterize the feature set of Marimo for those use cases?
For a typical data team that is working across data pipelines, business analytics, ML/AI engineering, etc., how do you see Marimo applied within and across those contexts?
One of the common difficulties with notebooks is that they are largely a single-player experience. They may connect into a shared compute cluster for scaling up execution (e.g. Ray, Dask, etc.). How does Marimo address the situation where a data platform team wants to offer notebooks as a service to reduce the friction to getting started with analyzing data in a warehouse/lakehouse context?
How are you seeing teams integrate Marimo with orchestrators (e.g. Dagster, Airflow, Prefect)?
What are some of the most interesting or complex engineering challenges that you have had to address while building and evolving Marimo?
What are the most interesting, innovative, or unexpected ways that you have seen Marimo used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Marimo?
When is Marimo the wrong choice?
What do you have planned for the future of Marimo?

Contact Info
LinkedIn

Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
Marimo, Jupyter, IPython, Streamlit, Podcast.init Episode, Vector Embeddings, Dimensionality Reduction, Kaggle, Pytest, PEP 723 script dependency metadata, MatLab, Visicalc, Mathematica, RMarkdown, RShiny, Elixir Livebook, Databricks Notebooks, Papermill, Pluto - Julia Notebook, Hex, Directed Acyclic Graph (DAG), Sumble (Kaggle founder Anthony Goldblum's startup), Ray, Dask, Jupytext, nbdev, DuckDB (Podcast Episode), Iceberg, Superset, jupyter-marimo-proxy, JupyterHub, Binder, Nix, AnyWidget, Jupyter Widgets, Matplotlib, Altair, Plotly, DataFusion, Polars, MotherDuck

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

AI/ML Airflow Analytics Dagster Data Engineering Data Lakehouse Data Management Datafold dbt Oracle Prefect Python Snowflake
Data Engineering Podcast

Description: Evaluating the performance of language models and AI agents can be challenging, especially across diverse tasks and domains. In this session, we'll introduce Unitxt, an open-source framework for unified text evaluation, and explore how it simplifies the process of benchmarking LLMs and agents using a standardized format.

We'll walk through the core ideas behind LLM evaluation—what to measure, how to measure it, and why it matters—and then dive into hands-on examples of evaluating LLMs for quality, reliability, safety and more, as well as evaluating multi-modalities and agentic tool invocation. Whether you're just getting started with evaluation or looking for a powerful and flexible tool to streamline your workflows, this session will offer practical insights and code-based demos to help you get up and running.
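For a sense of what the standardized format looks like in practice, here is a rough sketch of Unitxt's load-then-evaluate loop; the card and template identifiers are placeholders, `my_model` stands in for whatever system you are benchmarking, and the exact argument and result shapes vary between unitxt releases.

```python
# Rough sketch of the Unitxt evaluation loop, assuming the unitxt package's
# documented load_dataset/evaluate entry points; identifiers are placeholders.
from unitxt import load_dataset, evaluate

# A "card" declares the task, data, and metrics; a "template" declares the prompt format.
dataset = load_dataset(
    card="cards.wnli",
    template="templates.classification.multi_class.relation.default",
    split="test",
)

# Produce predictions with whatever model or agent you are benchmarking.
predictions = [my_model(example["source"]) for example in dataset]  # my_model is yours

# Score predictions against references using the metrics declared by the card.
results = evaluate(predictions=predictions, data=dataset)
print(results)  # aggregate and per-instance scores; exact shape depends on the unitxt version
```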

Bring your questions, ideas, or examples—we’ll have time for discussion and Q&A at the end!

Speaker Bio: Elron Bandel (LinkedIn) works to redefine how language models are tested and used at scale. At IBM Research, he leads projects that enhance researchers' abilities to test and utilize language models at transformative scales. Elron co-authored IBM's standard evaluation platform for large language models and spearheads the development of Unitxt, an open-source Python library for AI performance assessment. His academic work, supervised by Prof. Yoav Goldberg, included developing AlephBERT and its innovative evaluation suite, as well as research into robust language model testing.

About the AI Alliance

The AI Alliance is an international community of researchers, developers, and organizational leaders committed to supporting and enhancing open innovation across the AI technology landscape to accelerate progress, improve safety, security, and trust in AI, and maximize benefits to people and society everywhere. Members of the AI Alliance believe that open innovation is essential to developing and achieving safe and responsible AI that benefits society rather than a select few big players.

[AI Alliance] Model and Agent Evaluation with Unitxt
