Structured Query Language (SQL for short) is a programming language for managing data in a database system and an essential part of any data engineer’s tool kit. In this tutorial, you will learn how to use SQL to create databases and tables, insert data into them, and extract, filter, join data, or make calculations using queries. We will use DuckDB, a new open-source, embedded, in-process database system that combines cutting-edge database research with dataframe-inspired ease of use. DuckDB is only a pip install away (with zero dependencies) and runs right on your laptop. You will learn how to use DuckDB with your existing Python tools like Pandas, Polars, and Ibis to simplify and speed up your pipelines. Lastly, you will learn how to use SQL to create fast, interactive data visualizations, and how to teach your data how to fly and share it via the cloud.
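A minimal sketch of that workflow in Python, assuming the `duckdb` and `pandas` packages are installed (`pip install duckdb pandas`); the table, column, and DataFrame names below are illustrative, not taken from the tutorial:

```python
import duckdb
import pandas as pd

# Create a table and insert rows with plain SQL (uses DuckDB's default in-memory database).
duckdb.sql("CREATE TABLE trips (city VARCHAR, distance_km DOUBLE)")
duckdb.sql("INSERT INTO trips VALUES ('Berlin', 12.5), ('Paris', 8.0), ('Berlin', 3.2)")

# Filter, aggregate, and calculate with a query.
duckdb.sql("""
    SELECT city, SUM(distance_km) AS total_km
    FROM trips
    GROUP BY city
    ORDER BY total_km DESC
""").show()

# DuckDB can also query a Pandas DataFrame that is in scope, simply by its variable name.
df = pd.DataFrame({"city": ["Berlin", "Paris"], "population_m": [3.8, 2.1]})
duckdb.sql("""
    SELECT t.city, SUM(t.distance_km) AS total_km, df.population_m
    FROM trips AS t
    JOIN df ON t.city = df.city
    GROUP BY t.city, df.population_m
""").show()
```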
Today on the podcast, I interview AI researcher Tony Zhang about some of his recent findings on the effects that fully automated AI has on user decision-making. Tony shares lessons from his recent research study comparing typical recommendation AIs with a “forward-reasoning” approach that nudges users to contribute their own reasoning, providing process-oriented support that may lead to better outcomes. We’ll look at the study’s two examples: an AI-enabled interface for pilots tasked with deciding mid-flight on the next-best alternate airport to land at, and another scenario asking investors to rebalance an ETF portfolio. The takeaway, taken right from Tony’s research, is that “going forward, we suggest that process-oriented support can be an effective framework to inform the design of both 'traditional' AI-assisted decision-making tools but also GenAI-based tools for thought.”
Highlights / Skip to:
Tony Zhang’s background (0:46) Context for the study (4:12) Zhang’s metrics for measuring over-reliance on AI (5:06) Understanding the differences between the two design options that study participants were given (15:39) How AI-enabled hints appeared for pilots in each version of the UI (17:49) Using AI to help pilots make good decisions faster (20:15) We look at the ETF portfolio rebalancing use case in the study (27:46) Strategic and tactical findings that Tony took away from his study (30:47) The possibility of commercially viable recommendations based on Tony’s findings (35:40) Closing thoughts (39:04)
Quotes from Today’s Episode
“I wanted to keep the difference between the [recommendation & forward reasoning versions] very minimal to isolate the effect of the recommendation coming in. So, if I showed you screenshots of those two versions, they would look very, very similar. The only difference that you would immediately see is that the recommendation version is showing numbers 1, 2, and 3 for the recommended airports. These [rankings] are not present in the forward-reasoning one [airports are default sorted nearest to furthest]. This actually is a pretty profound difference in terms of the interaction or the decision-making impact that the AI has. There is this normal flight mode and forward reasoning, so that pilots are already immersed in the system and thinking with the system during normal flight. It changes the process that they are going through while they are working with the AI.” Tony (18:50 - 19:42)
“You would imagine that giving the recommendation makes your decision faster, but actually, the recommendations were not faster than the forward-reasoning one. In the forward-reasoning one, during normal flight, pilots could already prepare and have a good overview of their surroundings, giving them time to adjust to the new situation. Now, in normal flight, they don’t know what might be happening, and then suddenly, a passenger emergency happens. While for the recommendation version, the AI just comes into the situation once you have the emergency, and then you need to do this backward reasoning that we talked about initially.” Tony (21:12 - 21:58)
“Imagine reviewing code written by other people. It’s always hard because you had no idea what was going on when it was written. That was the idea behind the forward reasoning. You need to look at how people are working and how you can insert AI in a way that it seamlessly fits and provides some benefit to you while keeping you in your usual thought process. So, the way that I see it is you need to identify where the key pain points actually are in your current decision-making process and try to address those instead of just trying to solve the task entirely for users.” Tony (25:40 - 26:19)
Links
LinkedIn: https://www.linkedin.com/in/zelun-tony-zhang/ Augmenting Human Cognition With Generative AI: Lessons From AI-Assisted Decision-Making: https://arxiv.org/html/2504.03207v1
Hannes Mühleisen shows off DuckLake and answers live questions. DuckLake: https://duckdb.org/2025/05/27/ducklake.html
In this podcast episode, we talked with Will Russell about From Hackathons to Developer Advocacy.
About the Speaker: Will Russell is a Developer Advocate at Kestra, known for his videos on workflow orchestration. Previously, Will built open source education programs to help up-and-coming developers make their first contributions in open source. With a passion for developer education, Will creates technical video content and documentation that makes technologies more approachable for developers. In this episode, we sit down with Will—developer advocate, content creator, and passionate community builder. We’ll hear about his unique path through tech, the lessons he’s learned, and his approach to making complex topics accessible and engaging. Whether you’re curious about open source, hackathons, or what it’s like to bridge the gap between developers and the broader tech community, this conversation is full of insights and inspiration.
🕒 TIMECODES 0:00 Introduction, career journeys, and video setup and workflow 10:41 From hackathons to open source: Early experiences and learning 16:04 Becoming a hackathon organizer and the value of soft skills 23:18 How to organize a hackathon, memorable projects, and creativity 33:39 Major League Hacking: Building community and scaling student programs 41:16 Mentorship, development environments, and onboarding in open source 49:14 Developer advocacy, content strategy, and video tips 57:16 Will’s current projects and future plans for content creation
🔗 CONNECT WITH DataTalksClub Join the community - https://datatalks.club/slack.html Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ Check other upcoming events - https://lu.ma/dtc-events LinkedIn - https://www.linkedin.com/company/datatalks-club/ Twitter - https://twitter.com/DataTalksClub Website - https://datatalks.club/
🔗 CONNECT WITH WILL LinkedIn - https://www.linkedin.com/in/wrussell1999/ Twitter - https://x.com/wrussell1999 GitHub - https://github.com/wrussell1999 Website - https://wrussell.co.uk/
In this podcast episode, we talked with Lavanya Gupta about Building a Strong Career in Data. About the Speaker: Lavanya is a Carnegie Mellon University (CMU) alumna of the Language Technologies Institute (LTI). She works as a Sr. AI/ML Applied Associate at JPMorgan Chase in their specialized Machine Learning Center of Excellence (MLCOE) vertical. Her latest research on long-context evaluation of LLMs was published at EMNLP 2024.
In addition to a strong industry research background of 5+ years, she is also an enthusiastic technical speaker. She has delivered talks at events such as Women in Data Science (WiDS) 2021, PyData, Illuminate AI 2021, TensorFlow User Group (TFUG), and MindHack! Summit. She also serves as a reviewer at top-tier NLP conferences (NeurIPS 2024, ICLR 2025, NAACL 2025). Additionally, through her collaborations with various prestigious organizations, like Anita Borg and Women in Coding and Data Science (WiCDS), she is committed to mentoring aspiring machine learning enthusiasts.
In this episode, we talk about Lavanya Gupta’s journey from software engineer to AI researcher. She shares how hackathons sparked her passion for machine learning, her transition into NLP, and her current work benchmarking large language models in finance. Tune in for practical insights on building a strong data career and navigating the evolving AI landscape.
🕒 TIMECODES 00:00 Lavanya’s journey from software engineer to AI researcher 10:15 Benchmarking long context language models 12:36 Limitations of large context models in real domains 14:54 Handling large documents and publishing research in industry 19:45 Building a data science career: publications, motivation, and mentorship 25:01 Self-learning, hackathons, and networking 33:24 Community work and Kaggle projects 37:32 Mentorship and open-ended guidance 51:28 Building a strong data science portfolio 🔗 CONNECT WITH LAVANYA LinkedIn - / lgupta18 🔗 CONNECT WITH DataTalksClub Join the community - https://datatalks.club/slack.html Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/... Check other upcoming events - https://lu.ma/dtc-events LinkedIn - / datatalks-club Twitter - / datatalksclub Website - https://datatalks.club/
The rise of AI-powered code generation tools presents a compelling alternative to traditional UI prototyping frameworks. This talk explores the question: Is it time to ditch the framework overhead and embrace core web technologies (such as HTML, CSS, JavaScript) for faster, more flexible prototyping? We’ll examine the trade-offs between structured frameworks and the granular control offered by a “bare metal” approach, augmented by AI assistance. Learn when leveraging AI with core tech becomes the smarter choice, enabling rapid iteration and bespoke UI designs, and when frameworks still reign supreme.
In this podcast episode, we talked with Eddy Zulkifly about From Supply Chain Management to Digital Warehousing and FinOps
About the Speaker: Eddy Zulkifly is a Staff Data Engineer at Kinaxis, building robust data platforms across Google Cloud, Azure, and AWS. With a decade of experience in data, he actively shares his expertise as a Mentor on ADPList and Teaching Assistant at Uplimit. Previously, he was a Senior Data Engineer at Home Depot, specializing in e-commerce and supply chain analytics. Currently pursuing a Master’s in Analytics at the Georgia Institute of Technology, Eddy is also passionate about open-source data projects and enjoys watching/exploring the analytics behind the Fantasy Premier League.
In this episode, we dive into the world of data engineering and FinOps with Eddy Zulkifly, Staff Data Engineer at Kinaxis. Eddy shares his unconventional career journey—from optimizing physical warehouses with Excel to building digital data platforms in the cloud.
🕒 TIMECODES 0:00 Eddy’s career journey: From supply chain to data engineering 8:18 Tools & learning: Excel, Docker, and transitioning to data engineering 21:57 Physical vs. digital warehousing: Analogies and key differences 31:40 Introduction to FinOps: Cloud cost optimization and vendor negotiations 40:18 Resources for FinOps: Certifications and the FinOps Foundation 45:12 Standardizing cloud cost reporting across AWS/GCP/Azure 50:04 Eddy’s master’s degree and closing thoughts
🔗 CONNECT WITH EDDY Twitter - https://x.com/eddarief Linkedin - https://www.linkedin.com/in/eddyzulkifly/ Github: https://github.com/eyzyly/eyzyly ADPList: https://adplist.org/mentors/eddy-zulkifly
🔗 CONNECT WITH DataTalksClub Join the community - https://datatalks.club/slack.html Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
Check other upcoming events - https://lu.ma/dtc-events LinkedIn - https://www.linkedin.com/company/datatalks-club/ Twitter - https://twitter.com/DataTalksClub Website - https://datatalks.club/
Hands-on workshop on cleaning and preparing high-quality datasets using Data Prep Kit. Topics include extracting content from PDFs and HTML, cleaning up markup, detecting and removing SPAM content, scoring and removing low-quality documents, identifying and removing PII data, and detecting and removing HAP (Hate Abuse Profanity) speech. More about Data Prep Kit: https://github.com/IBM/data-prep-kit
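Data Prep Kit packages steps like these as reusable transforms. As a rough, library-agnostic illustration of just the PII step (this is deliberately not Data Prep Kit's API; see the repository above for the real transforms), a regex-based sketch in Python:

```python
import re

# Very rough patterns -- enough to illustrate the idea, not for production use.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")  # loose phone-number-like sequences

def redact_pii(text: str) -> str:
    """Replace obvious email addresses and phone-like numbers with placeholder tags."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text

if __name__ == "__main__":
    sample = "Contact Jane at jane.doe@example.com or +1 (555) 123-4567."
    print(redact_pii(sample))  # -> Contact Jane at [EMAIL] or [PHONE].
```

Real pipelines typically go further, for example using NER-based detectors and document-level quality scoring rather than regexes alone.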
In this podcast episode, we talked with Bartosz Mikulski about Data Intensive AI.
About the Speaker: Bartosz is an AI and data engineer. He specializes in moving AI projects from the good-enough-for-a-demo phase to production by building a testing infrastructure and fixing the issues detected by tests. On top of that, he teaches programmers and non-programmers how to use AI. He contributed one chapter to the book 97 Things Every Data Engineer Should Know, and he was a speaker at several conferences, including Data Natives, Berlin Buzzwords, and Global AI Developer Days.
In this episode, we discuss Bartosz’s career journey, the importance of testing in data pipelines, and how AI tools like ChatGPT and Cursor are transforming development workflows. From prompt engineering to building Chrome extensions with AI, we dive into practical use cases, tools, and insights for anyone working in data-intensive AI projects. Whether you’re a data engineer, AI enthusiast, or just curious about the future of AI in tech, this episode offers valuable takeaways and real-world experiences.
0:00 Introduction to Bartosz and his background 4:00 Bartosz’s career journey from Java development to AI engineering 9:05 The importance of testing in data engineering 11:19 How to create tests for data pipelines 13:14 Tools and approaches for testing data pipelines 17:10 Choosing Spark for data engineering projects 19:05 The connection between data engineering and AI tools 21:39 Use cases of AI in data engineering and MLOps 25:13 Prompt engineering techniques and best practices 31:45 Prompt compression and caching in AI models 33:35 Thoughts on DeepSeek and open-source AI models 35:54 Using AI for lead classification and LinkedIn automation 41:04 Building Chrome extensions with AI integration 43:51 Comparing Cursor and GitHub Copilot for coding 47:11 Using ChatGPT and Perplexity for AI-assisted tasks 52:09 Hosting static websites and using AI for development 54:27 How blogging helps attract clients and share knowledge 58:15 Using AI to assist with writing and content creation
🔗 CONNECT WITH Bartosz LinkedIn: https://www.linkedin.com/in/mikulskibartosz/ Github: https://github.com/mikulskibartosz Website: https://mikulskibartosz.name/blog/
🔗 CONNECT WITH DataTalksClub Join the community - https://datatalks.club/slack.html Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ Check other upcoming events - https://lu.ma/dtc-events LinkedIn - https://www.linkedin.com/company/datatalks-club/ Twitter - https://twitter.com/DataTalksClub Website - https://datatalks.club/
In this podcast episode, we talked with Nemanja Radojkovic about MLOps in Corporations and Startups.
About the Speaker: Nemanja Radojkovic is a Senior Machine Learning Engineer at Euroclear.
In this event, we’re diving into the world of MLOps, comparing life in startups versus big corporations. Joining us again is Nemanja, a seasoned machine learning engineer with experience spanning Fortune 500 companies and agile startups. We explore the challenges of scaling MLOps on a shoestring budget, the trade-offs between corporate stability and startup agility, and practical advice for engineers deciding between these two career paths. Whether you’re navigating legacy frameworks or experimenting with cutting-edge tools, this conversation has something for you.
1:00 MLOps in corporations versus startups 6:03 The agility and pace of startups 7:54 MLOps on a shoestring budget 12:54 Cloud solutions for startups 15:06 Challenges of cloud complexity versus on-premise 19:19 Selecting tools and avoiding vendor lock-in 22:22 Choosing between a startup and a corporation 27:30 Flexibility and risks in startups 29:37 Bureaucracy and processes in corporations 33:17 The role of frameworks in corporations 34:32 Advantages of large teams in corporations 40:01 Challenges of technical debt in startups 43:12 Career advice for junior data scientists 44:10 Tools and frameworks for MLOps projects 49:00 Balancing new and old technologies in skill development 55:43 Data engineering challenges and reliability in LLMs 57:09 On-premise vs. cloud solutions in data-sensitive industries 59:29 Alternatives like Dask for distributed systems
🔗 CONNECT WITH NEMANJA LinkedIn - / radojkovic Github - https://github.com/baskervilski
🔗 CONNECT WITH DataTalksClub Join the community - https://datatalks.club/slack.html Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/... Check other upcoming events - https://lu.ma/dtc-events LinkedIn - / datatalks-club Twitter - / datatalksclub Website - https://datatalks.club/
Hands-on session exploring how to use Docling for data extraction and cleanup across PDFs, HTML, and DOCX. Includes getting started with Docling, extracting content from documents, handling table and image data, and extracting content from scanned PDF documents using OCR.
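If you want to try the basics before the session, here is a minimal sketch following Docling's published quickstart; the input file name is a placeholder, and API details may shift between releases:

```python
from docling.document_converter import DocumentConverter

converter = DocumentConverter()

# Docling handles PDF, HTML, DOCX, and (via OCR) scanned PDFs.
result = converter.convert("example_report.pdf")  # hypothetical local file

# Export the parsed document (text, tables, figure references) to Markdown for cleanup.
markdown = result.document.export_to_markdown()
print(markdown[:500])
```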
In this podcast episode, we talked with Adrian Brudaru about the past, present and future of data engineering.
About the speaker: Adrian Brudaru studied economics in Romania but soon got bored with how creative the industry was, and chose to go instead for the more factual side. He ended up in Berlin at the age of 25 and started a role as a business analyst. At the age of 30, he had enough of startups and decided to join a corporation, but quickly found out that it did not provide the challenge he wanted. As going back to startups was not a desirable option either, he decided to postpone his decision by taking freelance work and has never looked back since. Five years later, he co-founded a company in the data space to try new things. This company is also looking to release open source tools to help democratize data engineering.
0:00 Introduction to DataTalks.Club 1:05 Discussing trends in data engineering with Adrian 2:03 Adrian's background and journey into data engineering 5:04 Growth and updates on Adrian's company, DLT Hub 9:05 Challenges and specialization in data engineering today 13:00 Opportunities for data engineers entering the field 15:00 The "Modern Data Stack" and its evolution 17:25 Emerging trends: AI integration and Iceberg technology 27:40 DuckDB and the emergence of portable, cost-effective data stacks 32:14 The rise and impact of dbt in data engineering 34:08 Alternatives to dbt: SQLMesh and others 35:25 Workflow orchestration tools: Airflow, Dagster, Prefect, and GitHub Actions 37:20 Audience questions: Career focus in data roles and AI engineering overlaps 39:00 The role of semantics in data and AI workflows 41:11 Focusing on learning concepts over tools when entering the field 45:15 Transitioning from backend to data engineering: challenges and opportunities 47:48 Current state of the data engineering job market in Europe and beyond 49:05 Introduction to Apache Iceberg, Delta, and Hudi file formats 50:40 Suitability of these formats for batch and streaming workloads 52:29 Tools for streaming: Kafka, SQS, and related trends 58:07 Building AI agents and enabling intelligent data applications 59:09 Closing discussion on the place of tools like dbt in the ecosystem
🔗 CONNECT WITH ADRIAN BRUDARU Linkedin - / data-team Website - https://adrian.brudaru.com/ 🔗 CONNECT WITH DataTalksClub Join the community - https://datatalks.club/slack.html Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/... Check other upcoming events - https://lu.ma/dtc-events LinkedIn - /datatalks-club Twitter - /datatalksclub Website - https://datatalks.club/
In this podcast episode, we talked with Alexander Guschin about launching a career off Kaggle.
About the Speaker:
Alexander Guschin is a Machine Learning Engineer with 10+ years of experience, a Kaggle Grandmaster ranked 5th globally, and a teacher to 100K+ students. He leads DS and SE teams and contributes to open-source ML tools.
0:00 Starting with Machine Learning: Challenges and Early Steps
13:05 Community and Learning Through Kaggle Sessions
17:10 Broadening Skills Through Kaggle Participation
18:54 Early Competitions and Lessons Learned
21:10 Transitioning to Simpler Solutions Over Time
23:51 Benefits of Kaggle for Starting a Career in Machine Learning
29:08 Teamwork vs. Solo Participation in Competitions
31:14 Schoolchildren in AI Competitions
42:33 Transition to Industry and MLOps
50:13 Encouraging teamwork in student projects
50:48 Designing competitive machine learning tasks
52:22 Leaderboard types for tracking performance
53:44 Managing small-scale university classes
54:17 Experience with Coursera and online teaching
59:40 Convincing managers about Kaggle's value
61:38 Secrets of Kaggle competition success
63:11 Generative AI's impact on competitive ML
65:13 Evolution of automated ML solutions
66:22 Reflecting on competitive data science experience
🔗 CONNECT WITH ALEXANDER GUSCHIN Linkedin - https://www.linkedin.com/in/1aguschin/ Website - https://www.aguschin.com/
🔗 CONNECT WITH DataTalksClub Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html Datalike Substack - https://datalike.substack.com/ LinkedIn: / datatalks-club
In this Part 2 of our conversation with Marco Rota, VP of Strategic Technology Alliances at Lumen Technologies, we dive headfirst into the technical side of Lumen’s mission. From fiber optics and edge computing to quantum breakthroughs—all propelled by powerful industry partnerships—Marco sheds light on how Lumen is enabling cutting-edge solutions and driving technology transformations. If you’re eager to see how culture, leadership, and advanced tech come together to reshape industries, this episode is for you!
00:31 Lumen's Technology
02:37 Transformational Use Cases
04:46 Edge Computing
06:20 Quantum
10:35 Wrapping Up Technology
13:25 Supercharged Partnerships
15:55 THE Leadership Principle
18:16 For Fun
23:01 A World-class Chef
Linkedin: linkedin.com/in/marcorotapix Website: https://www.lumen.com/en-us/home.html
Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.
#MakingDataSimple #LumenTechnologies #FiberOptics #EdgeComputing #Quantum #TechInnovation #Partnerships #BusinessTransformation #Leadership
Welcome back to Making Data Simple, where we explore how data-driven strategies ignite innovation and transform businesses. In this exciting episode, we sit down with Marco Rota, VP of Strategic Technology Alliances at Lumen Technologies, whose incredible journey spans from the glitz of Hollywood to leading-edge telecommunications. Tune in as Marco reveals how embracing a vibrant culture, drawing on lessons from the entertainment industry, and championing new technologies can propel teams and organizations to new heights of success. Get ready for an inspiring, behind-the-scenes look at how “culture eats strategy for breakfast”—and why that’s a game-changer for your organization, too!
01:47 – Meet Marco Rota: Marco shares his background and how his career path took him from the dynamic world of Hollywood to a leadership role at Lumen Technologies. He underscores his passion for storytelling, collaboration, and innovation—elements that continue to shape his work in tech.
03:35 – Learnings from Hollywood: Drawing on Hollywood’s fast-paced environment, Marco highlights the importance of creative thinking and adaptability. He explains how these traits help push organizations to stay ahead of disruption and continually evolve, just like the film industry does to meet audience demands.
10:56 – Transitioning to Lumen Technologies: Marco describes his shift from entertainment into the telecommunications and technology space. He emphasizes the parallels between Hollywood and tech—both thrive on communication, audience engagement, and cutting-edge production processes.
15:55 – What IS Lumen Technologies: Marco explains Lumen’s core mission: powering next-generation connectivity, cloud, edge computing, and security solutions. By marrying technology services with an innovative culture, Lumen seeks to help organizations accelerate data-driven transformation.
18:29 – Culture versus Technology: An organization’s culture can be its greatest asset—or its biggest hurdle. Culture “eats strategy for breakfast” because fostering collaboration, trust, and continuous learning is what truly drives successful technology initiatives forward.
24:20 – The Management System: Marco talks about the framework for leadership and team alignment at Lumen, which integrates vision, purpose, and measurable goals. This system ensures that cultural values and strategic objectives reinforce each other—resulting in cohesive, energized teams ready to tackle the biggest challenges in tech.
Linkedin: linkedin.com/in/marcorotapix Website: https://www.lumen.com/en-us/home.html
Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.
#MakingDataSimple #CultureEatsStrategy #DataInnovation #DigitalTransformation #TechLeadership #PodcastEpisode #HollywoodToTech #LumenTechnologies #BusinessInsights #Inspiration
In this podcast episode, we talked with Andrey Cheptsov about The future of AI infrastructure.
About the Speaker: Andrey Cheptsov is the founder and CEO of dstack, an open-source alternative to Kubernetes and Slurm, built to simplify the orchestration of AI infrastructure. Before dstack, Andrey worked at JetBrains for over a decade, helping different teams build the best developer tools. During the event, Andrey discussed the complexities of AI infrastructure. We explore topics like the challenges of using Kubernetes for AI workloads, the need to rethink container orchestration, and the future of hybrid and cloud-only infrastructures. Andrey also shares insights into the role of on-premise and bare-metal solutions, edge computing, and federated learning. 00:00 Andrey's Career Journey: From JetBrains to dstack 5:00 The Motivation Behind dstack 7:00 Challenges in Machine Learning Infrastructure 10:00 Transitioning from Cloud to On-Prem Solutions 14:30 Reflections on OpenAI's Evolution 17:30 Open Source vs Proprietary Models: A Balanced Perspective 21:01 Monolithic vs. Decentralized AI businesses 22:05 The role of privacy and control in AI for industries like banking and healthcare 30:00 Challenges in training large AI models: GPUs and distributed systems 37:03 DeepSpeed's efficient training approach vs. brute force methods 39:00 Challenges for small and medium businesses: hosting and fine-tuning models 47:01 Managing Kubernetes challenges for AI teams 52:00 Hybrid vs. cloud-only infrastructure 56:03 On-premise vs. bare-metal solutions 58:05 Exploring edge computing and its challenges
🔗 CONNECT WITH ANDREY CHEPTSOV Twitter - / andrey_cheptsov Linkedin - / andrey-cheptsov GitHub - https://github.com/dstackai/dstack/ Website - https://dstack.ai/
🔗 CONNECT WITH DataTalksClub Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html Datalike Substack - https://datalike.substack.com/ LinkedIn: / datatalks-club
In this podcast episode, we talked with Tamara Atanasoska about building fair AI systems.
About the Speaker: Tamara works on ML explainability, interpretability, and fairness as an Open Source Software Engineer at Probabl. She is a maintainer of Fairlearn and a contributor to scikit-learn and skops. Tamara has both a computer science/software engineering and a computational linguistics (NLP) background. During the event, the guest discussed her career journey from software engineering to open-source contributions, focusing on explainability in AI through scikit-learn and Fairlearn. They explored fairness in AI, including challenges in credit loans, hiring, and decision-making, and emphasized the importance of tools, human judgment, and collaboration. The guest also shared her involvement with PyLadies and encouraged contributions to Fairlearn. 00:00 Introduction to the event and the community 01:51 Topic introduction: Linguistic fairness and socio-technical perspectives in AI 02:37 Guest introduction: Tamara’s background and career 03:18 Tamara’s career journey: Software engineering, music tech, and computational linguistics 09:53 Tamara’s background in language and computer science 14:52 Exploring fairness in AI and its impact on society 21:20 Fairness in AI models 26:21 Automating fairness analysis in models 32:32 Balancing technical and domain expertise in decision-making 37:13 The role of humans in the loop for fairness 40:02 Joining Probabl and working on open-source projects 46:20 The skops library and its integration with Hugging Face 50:48 PyLadies and community involvement 55:41 The ethos of scikit-learn and Fairlearn
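As a rough illustration of the kind of disaggregated fairness analysis discussed in the episode, here is a minimal Fairlearn sketch (assuming `pip install fairlearn scikit-learn`); the labels, predictions, and group column below are made up for the example:

```python
from fairlearn.metrics import MetricFrame, demographic_parity_difference
from sklearn.metrics import accuracy_score

# Toy labels, model predictions, and a hypothetical sensitive feature.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
group  = ["A", "A", "A", "B", "B", "B", "B", "A"]

# Accuracy overall and broken down by group.
mf = MetricFrame(
    metrics=accuracy_score,
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=group,
)
print(mf.overall)   # overall accuracy
print(mf.by_group)  # accuracy per group A / B

# Gap in selection rate between groups (0 would mean demographic parity).
print(demographic_parity_difference(y_true, y_pred, sensitive_features=group))
```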
🔗 CONNECT WITH TAMARA ATANASOSKA Linkedin - https://www.linkedin.com/in/tamaraatanasoska GitHub- https://github.com/TamaraAtanasoska
🔗 CONNECT WITH DataTalksClub Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html Datalike Substack - https://datalike.substack.com/ LinkedIn: / datatalks-club
In this podcast episode, we talked with Agita Jaunzeme about Career choices, transitions and promotions in and out of tech.
About the Speaker:
Agita has designed a career spanning DevOps/DataOps engineering, management, community building, education, and facilitation. She has worked on projects across corporate, startup, open source, and non-governmental sectors. Following her passion, she founded an NGO focusing on the inclusion of expats and locals in Porto. Embodying the values of innovation, automation, and continuous learning, Agita provides practical insights on promotions, career pivots, and aligning work with passion and purpose.
During this event, Agita discussed her career journey, starting with her transition from art school to programming and later into DevOps, eventually taking on leadership roles. She explored the challenges of burnout and the importance of volunteering, and her decision to found an NGO supporting inclusion, gender equality, and sustainability. The conversation also covered key topics like mentorship, the differences between data engineering and data science, and the dynamics of managing volunteers versus employees. Additionally, she shared insights on community management, developer relations, and the importance of product vision and team collaboration.
0:00 Introduction and Welcome 1:28 Guest Introduction: Agita’s Background and Career Highlights 3:05 Transition to Tech: From Art School to Programming 5:40 Exploring DevOps and Growing into Leadership Roles 7:24 Burnout, Volunteering, and Founding an NGO 11:00 Volunteering and Mentorship Initiatives 14:00 Discovering Programming Skills and Early Career Challenges 15:50 Automating Work Processes and Earning a Promotion 19:00 Transitioning from DevOps to Volunteering and Project Management 24:00 Managing Volunteers vs. Employees and Building Organizational Skills 31:07 Personality traits in engineering vs. data roles 33:14 Differences in focus between data engineers and data scientists 36:24 Transitioning from volunteering to corporate work 37:38 The role and responsibilities of a community manager 39:06 Community management vs. developer relations activities 41:01 Product vision and team collaboration 43:35 Starting an NGO and legal processes 46:13 NGO goals: inclusion, gender equality, and sustainability 49:02 Community meetups and activities 51:57 Living off-grid in a forest and sustainability 55:02 Unemployment party and brainstorming session 59:03 Unemployment party: the process and structure
🔗 CONNECT WITH AGITA JAUNZEME Linkedin - /agita
🔗 CONNECT WITH DataTalksClub Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html Datalike Substack - https://datalike.substack.com/ LinkedIn: / datatalks-club
In this podcast episode, we talked with Isabella Bicalho about Career advice, learning, and featuring women in ML and AI.
About the Speaker:
Isabella is a Machine Learning Engineer and Data Scientist with three years of hands-on AI development experience. She draws upon her early computational research expertise to develop ML solutions. While contributing to open-source projects, she runs a newsletter dedicated to showcasing women's accomplishments in data science.
During this event, the guest discussed her transition into machine learning, her freelance work in AI, and the growing AI scene in France. She shared insights on freelancing versus full-time work, the value of open-source contributions, and developing both technical and soft skills. The conversation also covered career advice, mentorship, and her Substack series on women in data science, emphasizing leadership, motivation, and career opportunities in tech.
0:00 Introduction 1:23 Background of Isabella Bicalho 2:02 Transition to machine learning 4:03 Study and work experience 5:00 Living in France and language learning 6:03 Internship experience 8:45 Focus areas of Inria 9:37 AI development in France 10:37 Current freelance work 11:03 Freelancing in machine learning 13:31 Moving from research to freelancing 14:03 Freelance vs. full-time data science 17:00 Finding first freelance client 18:00 Involvement in open-source projects 20:17 Passion for open-source and teamwork 23:52 Starting new projects 25:03 Community project experience 26:02 Teaching and learning 29:04 Contributing to open-source projects 32:05 Open-source tools vs. projects 33:32 Importance of community-driven projects 34:03 Learning resources 36:07 Green space segmentation project 39:02 Developing technical and soft skills 40:31 Gaining insights from industry experts 41:15 Understanding data science roles 41:31 Project challenges and team dynamics 42:05 Turnover in open-source projects 43:05 Managing expectations in open-source work 44:50 Mentorship in projects 46:17 Role of AI tools in learning 47:59 Overcoming learning challenges 48:52 Discussion on substack 49:01 Interview series on women in data 50:15 Insights from women in data science 51:20 Impactful stories from substack 53:01 Leadership challenges in projects 54:19 Career advice and opportunities 56:07 Motivating others to step out of comfort zone 57:06 Contacting for substack story sharing 58:00 Closing remarks and connections
🔗 CONNECT WITH ISABELLA BICALHO GitHub: https://github.com/bellabf LinkedIn: / isabella-frazeto
🔗 CONNECT WITH DataTalksClub Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html Datalike Substack - https://datalike.substack.com/ LinkedIn: / datatalks-club