talk-data.com

Event

SciPy 2025

2025-07-07 – 2025-07-13 PyData

Activities tracked

7

Filtering by: GenAI

Sessions & talks

Real-world Impacts of Generative AI in the Research Software Engineer and Data Scientist Workplace

2025-07-11
talk

Recent breakthroughs in large language model-based artificial intelligence (AI) have captured the public’s interest in AI more broadly. With the growing adoption of these technologies in professional and educational settings, public dialog about their potential impacts on the workforce has been ubiquitous. It is, however, difficult to separate the public dialog about the potential impact of the technology from the experienced impact of the technology in the research software engineer and data science workplace. Likewise, it is challenging to separate the generalized anxiety about AI from its specific impacts on individuals working in specialized work settings.

As research software engineers (RSEs) and those in adjacent computational fields engage with AI in the workplace, the realities of the impacts of this technology are becoming clearer. However, much of the dialog has been limited to high-level discussion around general intra-institutional impacts, and lacks the nuance required to provide helpful guidance to RSE practitioners in research settings, specifically. Surprisingly, many RSEs are not involved in career discussions on what the rise of AI means for their professions.

During this BoF, we will hold a structured, interactive discussion session with the goal of identifying critical areas of engagement with AI in the workplace including: current use of AI, AI assistance and automation, AI skills and workforce development, AI and open science, and AI futures. This BoF will represent the first of a series of discussions held jointly by the Academic Data Science Alliance and the US Research Software Engineer Association over the coming year, with support from Schmidt Sciences. The insights gathered from these sessions will inform the development of guidance resources on these topic areas for the broader RSE and computational data practitioner communities.

Accelerating scientific data releases: Automated metadata generation with LLM agents

2025-07-11
talk

The rapid growth of scientific data repositories demands innovative solutions for efficient metadata creation. In this talk, we present our open-source project that leverages large language models to automate the generation of standard-compliant metadata files from raw scientific datasets. Our approach harnesses the capabilities of pre-trained open-source models, fine-tuned with domain-specific data, and integrated with LangGraph to orchestrate a modular, end-to-end pipeline capable of ingesting heterogeneous raw data files and outputting metadata conforming to specific standards.

The methodology involves a multi-stage process where raw data is first parsed and analyzed by the LLM to extract relevant scientific and contextual information. This information is then structured into metadata templates that adhere strictly to recognized standards, thereby reducing human error and accelerating the data release cycle. We demonstrate the effectiveness of our approach using the USGS ScienceBase repository, where we have successfully generated metadata for a variety of scientific datasets, including images, time series, and text data.
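As an illustration only, the multi-stage flow described above (parse with an LLM, fill a metadata template, check the result) might be sketched as follows. This is not the presenters' code, which the abstract says is not yet released, and it does not use LangGraph; the stages are plain Python functions. The field names in `REQUIRED_FIELDS`, the `PipelineState` class, and the `fake_llm` stand-in are all hypothetical and chosen for the example.

```python
import json
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical required fields; a real metadata standard defines many more.
REQUIRED_FIELDS = ("title", "abstract", "keywords")


@dataclass
class PipelineState:
    """State passed from one stage to the next."""
    raw_text: str
    extracted: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)
    errors: list = field(default_factory=list)


def parse_stage(state: PipelineState, llm: Callable[[str], str]) -> PipelineState:
    """Ask the model for structured information and parse its JSON reply."""
    prompt = (
        "Return a JSON object with keys title, abstract, keywords for:\n"
        + state.raw_text
    )
    try:
        reply = json.loads(llm(prompt))
    except json.JSONDecodeError:
        reply = None
    if isinstance(reply, dict):
        state.extracted = reply
    else:
        state.errors.append("model output was not a valid JSON object")
    return state


def template_stage(state: PipelineState) -> PipelineState:
    """Copy extracted values into a fixed template, ignoring unknown keys."""
    state.metadata = {key: state.extracted.get(key) for key in REQUIRED_FIELDS}
    return state


def validate_stage(state: PipelineState) -> PipelineState:
    """Record an error for every required field that is missing or empty."""
    for key in REQUIRED_FIELDS:
        if not state.metadata.get(key):
            state.errors.append(f"missing required field: {key}")
    return state


def run_pipeline(raw_text: str, llm: Callable[[str], str]) -> PipelineState:
    """Run the three stages in order and return the final state."""
    state = PipelineState(raw_text=raw_text)
    state = parse_stage(state, llm)
    state = template_stage(state)
    return validate_stage(state)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; always returns the same JSON."""
    return json.dumps({
        "title": "Stream gauge readings",
        "abstract": "Daily discharge values.",
        "keywords": ["hydrology"],
    })


result = run_pipeline("date,discharge\n2024-01-01,12.3\n", fake_llm)
print(result.metadata)
print(result.errors)
```

Keeping validation as a separate stage means a malformed or incomplete model reply is reported as a list of errors rather than silently written into a metadata file, which is one way to address the metadata-quality concern the abstract raises.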

Beyond its immediate application to the USGS ScienceBase repository, our open-source framework is designed to be extensible, allowing adaptation to other data release processes across various scientific domains. We will discuss the technical challenges encountered, such as managing diverse data formats and ensuring metadata quality, and outline strategies for community-driven enhancements. This work not only streamlines the metadata creation workflow but also sets the stage for broader adoption of generative AI in scientific data management.

Additional Material:
- Project supported by USGS and ORNL
- Codebase will be available on GitHub after paper publication
- Fine-tuned LLMs will be available on Hugging Face after paper publication

AI for Scientific Discovery

2025-07-11
talk

AI, particularly generative AI, is rapidly transforming the scientific landscape, offering unprecedented opportunities and novel challenges across all stages of research. This Birds of a Feather session aims to bring together researchers, developers, and practitioners to share experiences, discuss best practices, and explore the evolving role of AI in science.

Generative AI in Education

2025-07-10
talk

Generative AI has rapidly changed the landscape of computing and data education. Many learners are using generative AI to assist in learning, so what should educators do to address the opportunities, risks, and potential of its use? The goal of this open discussion session is to bring together community members to unravel these pressing questions in order to improve learning outcomes in a variety of contexts: not only students learning in a classroom setting, but also ed-tech or generative AI designers developing new user experiences that aim to improve human capacities, and even scientists interested in learning best practices for communicating results to stakeholders or creating learning materials for colleagues. The open discussion will include ample opportunity for community members to network with each other and build connections after the conference.

Embracing GenAI in Engineering Education: Lessons from the Trenches

2025-07-09
talk

This talk presents a candid reflection on integrating generative AI into an Engineering Computations course, revealing unexpected challenges despite best intentions. Students quickly developed patterns of using AI as a shortcut rather than a learning companion, leading to decreased attendance and an "illusion of competence." I'll discuss the disconnect between instructor expectations and student behavior, analyze how traditional assessment structures reinforced counterproductive AI usage, and share strategies for guiding students toward using AI as a co-pilot rather than a substitute for critical thinking while maintaining academic integrity.

Generative AI in Engineering Education: A Tool for Learning, Not a Replacement for Skills

2025-07-09
talk

Generative Artificial Intelligence (AI) is reshaping engineering education by offering students new ways to engage with complex concepts and content. Ethical concerns including bias, intellectual property, and plagiarism make generative AI a controversial educational tool. Overreliance on AI may also lead to academic integrity issues, necessitating clear student codes of conduct that define acceptable use. As educators, we should carefully design learning objectives to align with transferable career skills in our fields. By practicing backward design with a focus on career-readiness skills, we can incorporate useful prompt engineering, rapid prototyping, and critical reasoning skills that draw on generative AI. Engineering students want to develop essential career skills such as critical thinking, communication, and technology. This talk will focus on case studies for using generative AI and rapid prototyping for scientific computing in engineering courses for physics, programming, and technical writing. These courses include assignments and reading examples using NumPy, SciPy, pandas, etc. in Jupyter notebooks. Embracing generative AI tools has helped students compare, evaluate, and discuss work that was inaccessible before generative AI. This talk explores strategies for using AI in engineering education while accomplishing learning objectives and giving students opportunities to practice career-readiness skills.

Building LLM-Powered Applications for Data Scientists and Software Engineers

2025-07-08
talk

This workshop is designed to equip software engineers with the skills to build and iterate on generative AI-powered applications. Participants will explore key components of the AI software development lifecycle through first-principles thinking, including prompt engineering, monitoring, evaluations, and handling non-determinism. The session focuses on using multimodal AI models to build applications, such as querying PDFs, while providing insights into the engineering challenges unique to AI systems. By the end of the workshop, participants will know how to build a PDF-querying app, and the techniques learned will generalize to a variety of generative AI applications.

If you're a data scientist, machine learning practitioner, or AI enthusiast, this workshop can also be valuable for learning about the software engineering aspects of AI applications, such as lifecycle management, iterative development, and monitoring, which are critical for production-level AI systems.