talk-data.com

Event

DataTalks.Club

2020-11-21 – 2025-11-28 Podcasts Visit website ↗

Activities tracked

59

DataTalks.Club - the place to talk about data!

Filtering by: Data Engineering ×

Sessions & talks

Showing 1–25 of 59 · Newest first

From Full-Time Mom to Head of Data and Cloud - Xia He-Bleinagel

2025-11-28 Listen
podcast_episode
Xia He-Bleinagel (NOW GmbH)

In this talk, Xia He-Bleinagel, Head of Data & Cloud at NOW GmbH, shares her remarkable journey from studying automotive engineering across Europe to leading modern data, cloud, and engineering teams in Germany. We dive into her transition from hands-on engineering to leadership, how she balanced family with career growth, and what it really takes to succeed in today’s cloud, data, and AI job market.

TIMECODES: 00:00 Studying Automotive Engineering Across Europe 08:15 How Andrew Ng Sparked a Machine Learning Journey 11:45 Import–Export Work as an Unexpected Career Boost 17:05 Balancing Family Life with Data Engineering Studies 20:50 From Data Engineer to Head of Data & Cloud 27:46 Building Data Teams & Tackling Tech Debt 30:56 Learning Leadership Through Coaching & Observation 34:17 Management vs. IC: Finding Your Best Fit 38:52 Boosting Developer Productivity with AI Tools 42:47 Succeeding in Germany’s Competitive Data Job Market 46:03 Fast-Track Your Cloud & Data Career 50:03 Mentorship & Supporting Working Moms in Tech 53:03 Cultural & Economic Factors Shaping Women’s Careers 57:13 Top Networking Groups for Women in Data 1:00:13 Turning Domain Expertise into a Data Career Advantage

Connect with Xia- Linkedin - https://www.linkedin.com/in/xia-he-bleinagel-51773585/ - Github - https://github.com/Data-Think-2021 - Website - https://datathinker.de/

Connect with DataTalks.Club: - Join the community - https://datatalks.club/slack.html - Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ - Check other upcoming events - https://lu.ma/dtc-events - GitHub: https://github.com/DataTalksClub - LinkedIn - https://www.linkedin.com/company/datatalks-club/ - Twitter - https://twitter.com/DataTalksClub - Website - https://datatalks.club/

From Theme Parks to Tesla: Building Data Products That Work

2025-10-10 Listen
podcast_episode

In this episode, we talked with Abouzar Abbaspour, a data engineer whose career spans software engineering in Iran, building crowd and recommendation systems at a Dutch theme park, deploying large-scale ML models at Bol.com, and now working at Tesla. Abouzar shares how he bridged diverse industries, tackled real-world data challenges, and adapted to new roles while keeping a hands-on approach to machine learning and engineering.

TIMECODES 00:00 Career journey and early motivations 06:17 Moving to Europe for data science 12:18 Working with theme parks and crowd modeling 18:29 Lessons from ride and visitor data 23:06 Building recommendation systems at Efteling 27:26 Joining Bol.com and the Dutch e-commerce industry 32:49 Product and brand recommendation logic 36:09 Experimenting with "Tinder for brands" 40:26 Engagement metrics and product validation 43:02 From ML engineering to data engineering roles 52:04 Hands-on skills at Tesla and industry expectations 57:43 Career growth, learning, and advice

Connect with Abouzar: LinkedIn - / abouzar-abbaspour Website - https://www.abouzar-abbaspour.com/

Connect with DataTalks.Club: Join the community - https://datatalks.club/slack.html Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/... Check other upcoming events - https://lu.ma/dtc-events GitHub: https://github.com/DataTalksClub LinkedIn - / datatalks-club Twitter - / datatalksclub Website - https://datatalks.club/

Berlin PyData 2025 Conference Interviews

2025-09-26 Listen
podcast_episode
Yashasvi Misra (Pure Storage), Igor Kvachenok (Leuphana University of Lüneburg), Selim Nowicki (Distill Labs), Mehdi Ouazza, Gülsah Durmaz

At PyData Berlin, community members and industry voices highlighted how AI and data tooling are evolving across knowledge graphs, MLOps, small-model fine-tuning, explainability, and developer advocacy.

  • Igor Kvachenok (Leuphana University / ProKube) combined knowledge graphs with LLMs for structured data extraction in the polymer industry, and noted how MLOps is shifting toward LLM-focused workflows.
  • Selim Nowicki (Distill Labs) introduced a platform that uses knowledge distillation to fine-tune smaller models efficiently, making model specialization faster and more accessible.
  • Gülsah Durmaz (Architect & Developer) shared her transition from architecture to coding, creating Python tools for design automation and volunteering with PyData through PyLadies.
  • Yashasvi Misra (Pure Storage) spoke on explainable AI, stressing accountability and compliance, and shared her perspective as both a data engineer and active Python community organizer.
  • Mehdi Ouazza (MotherDuck) reflected on developer advocacy through video, workshops, and branding, showing how creative communication boosts adoption of open-source tools like DuckDB.

Igor Kvachenok Master’s student in Data Science at Leuphana University of Lüneburg, writing a thesis on LLM-enhanced data extraction for the polymer industry. Builds RDF knowledge graphs from semi-structured documents and works at ProKube on MLOps platforms powered by Kubeflow and Kubernetes.

Connect: https://www.linkedin.com/in/igor-kvachenok/

Selim Nowicki Founder of Distill Labs, a startup making small-model fine-tuning simple and fast with knowledge distillation. Previously led data teams at Berlin startups like Delivery Hero, Trade Republic, and Tier Mobility. Sees parallels between today’s ML tooling and dbt’s impact on analytics.

Connect: https://www.linkedin.com/in/selim-nowicki/

Gülsah Durmaz Architect turned developer, creating Python-based tools for architectural design automation with Rhino and Grasshopper. Active in PyLadies and a volunteer at PyData Berlin, she values the community for networking and learning, and aims to bring ML into architecture workflows.

Connect: https://www.linkedin.com/in/gulsah-durmaz/

Yashasvi (Yashi) Misra Data Engineer at Pure Storage, community organizer with PyLadies India, PyCon India, and Women Techmakers. Advocates for inclusive spaces in tech and speaks on explainable AI, bridging her day-to-day in data engineering with her passion for ethical ML.

Connect: https://www.linkedin.com/in/misrayashasvi/

Mehdi Ouazza Developer Advocate at MotherDuck, formerly a data engineer, now focused on building community and education around DuckDB. Runs popular YouTube channels ("mehdio DataTV" and "MotherDuck") and delivered a hands-on workshop at PyData Berlin. Blends technical clarity with creative storytelling.

Connect: https://www.linkedin.com/in/mehd-io/

From Simulations to Freelance Data Engineering: Orell's Journey Out of Academia and Into Consulting - Orell Garten

2025-08-01 Listen
podcast_episode

In this episode, we talk with Orell about his journey from electrical engineering to freelancing in data engineering. We explore lessons from startup life, working with messy industrial data, the realities of freelancing, and how to stay up to date with new tools.

Topics covered:

  • Why Orell left a PhD and a simulation-focused start-up after Covid hit
  • What he learned trying (and failing) to commercialise medical-imaging simulations
  • The first freelance project and the long, quiet months that followed
  • How he now finds clients, keeps projects small and delivers value quickly
  • Typical work he does for industrial companies: parsing messy machine logs, building simple pipelines, adding structure later
  • Favorite everyday tools (Python, DuckDB, a bit of C++) and the habit of blocking time for learning
  • Advice for anyone thinking about freelancing: cash runway, networking, and focusing on problems rather than “perfect” tech choices

A practical conversation for listeners who are curious about moving from research or permanent roles into freelance data engineering.

🕒 TIMECODES 0:00 Orell’s career and move to freelancing 9:04 Startup experience and data engineering lessons 16:05 Academia vs. startups and starting freelancing 25:33 Early freelancing challenges and networking 34:22 Freelance data engineering and messy industrial data 43:27 Staying practical, learning tools, and growth 50:33 Freelancing challenges and client acquisition 58:37 Tools, problem-solving, and manual work

🔗 CONNECT WITH ORELL Bluesky - https://bsky.app/profile/orgarten.bsk... LinkedIn - / ogarten
Github - https://github.com/orgarten Website - https://orellgarten.com

🔗 CONNECT WITH DataTalksClub Join the community - https://datatalks.club/slack.html Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/... Check other upcoming events - https://lu.ma/dtc-events GitHub: https://github.com/DataTalksClub LinkedIn - / datatalks-club
Twitter - / datatalksclub
Website - https://datatalks.club/

🔗 CONNECT WITH ALEXEY Twitter - / al_grigor
Linkedin - / agrigorev

From Supply Chain Management to Digital Warehousing and FinOps - Eddy Zulkifly

2025-04-04 Listen
podcast_episode
Eddy Zulkifly (Kinaxis)

In this podcast episode, we talked with Eddy Zulkifly about his path from supply chain management to digital warehousing and FinOps.

About the Speaker: Eddy Zulkifly is a Staff Data Engineer at Kinaxis, building robust data platforms across Google Cloud, Azure, and AWS. With a decade of experience in data, he actively shares his expertise as a Mentor on ADPList and Teaching Assistant at Uplimit. Previously, he was a Senior Data Engineer at Home Depot, specializing in e-commerce and supply chain analytics. Currently pursuing a Master’s in Analytics at the Georgia Institute of Technology, Eddy is also passionate about open-source data projects and enjoys watching/exploring the analytics behind the Fantasy Premier League.

In this episode, we dive into the world of data engineering and FinOps with Eddy Zulkifly, Staff Data Engineer at Kinaxis. Eddy shares his unconventional career journey—from optimizing physical warehouses with Excel to building digital data platforms in the cloud.

🕒 TIMECODES 0:00 Eddy’s career journey: From supply chain to data engineering 8:18 Tools & learning: Excel, Docker, and transitioning to data engineering 21:57 Physical vs. digital warehousing: Analogies and key differences 31:40 Introduction to FinOps: Cloud cost optimization and vendor negotiations 40:18 Resources for FinOps: Certifications and the FinOps Foundation 45:12 Standardizing cloud cost reporting across AWS/GCP/Azure 50:04 Eddy’s master’s degree and closing thoughts

🔗 CONNECT WITH EDDY Twitter - https://x.com/eddarief Linkedin - https://www.linkedin.com/in/eddyzulkifly/ Github: https://github.com/eyzyly/eyzyly ADPList: https://adplist.org/mentors/eddy-zulkifly

🔗 CONNECT WITH DataTalksClub Join the community - https://datatalks.club/slack.html Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ

Check other upcoming events - https://lu.ma/dtc-events LinkedIn - https://www.linkedin.com/company/datatalks-club/ Twitter - https://twitter.com/DataTalksClub Website - https://datatalks.club/

Data Intensive AI - Bartosz Mikulski

2025-03-21 Listen
podcast_episode

In this podcast episode, we talked with Bartosz Mikulski about Data Intensive AI.

About the Speaker: Bartosz is an AI and data engineer. He specializes in moving AI projects from the good-enough-for-a-demo phase to production by building a testing infrastructure and fixing the issues detected by tests. On top of that, he teaches programmers and non-programmers how to use AI. He contributed one chapter to the book 97 Things Every Data Engineer Should Know, and he was a speaker at several conferences, including Data Natives, Berlin Buzzwords, and Global AI Developer Days. 

In this episode, we discuss Bartosz’s career journey, the importance of testing in data pipelines, and how AI tools like ChatGPT and Cursor are transforming development workflows. From prompt engineering to building Chrome extensions with AI, we dive into practical use cases, tools, and insights for anyone working in data-intensive AI projects. Whether you’re a data engineer, AI enthusiast, or just curious about the future of AI in tech, this episode offers valuable takeaways and real-world experiences.

0:00 Introduction to Bartosz and his background 4:00 Bartosz’s career journey from Java development to AI engineering 9:05 The importance of testing in data engineering 11:19 How to create tests for data pipelines 13:14 Tools and approaches for testing data pipelines 17:10 Choosing Spark for data engineering projects 19:05 The connection between data engineering and AI tools 21:39 Use cases of AI in data engineering and MLOps 25:13 Prompt engineering techniques and best practices 31:45 Prompt compression and caching in AI models 33:35 Thoughts on DeepSeek and open-source AI models 35:54 Using AI for lead classification and LinkedIn automation 41:04 Building Chrome extensions with AI integration 43:51 Comparing Cursor and GitHub Copilot for coding 47:11 Using ChatGPT and Perplexity for AI-assisted tasks 52:09 Hosting static websites and using AI for development 54:27 How blogging helps attract clients and share knowledge 58:15 Using AI to assist with writing and content creation

🔗 CONNECT WITH Bartosz LinkedIn: https://www.linkedin.com/in/mikulskibartosz/ Github: https://github.com/mikulskibartosz Website: https://mikulskibartosz.name/blog/

🔗 CONNECT WITH DataTalksClub Join the community - https://datatalks.club/slack.html Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ Check other upcoming events - https://lu.ma/dtc-events LinkedIn - https://www.linkedin.com/company/datatalks-club/ Twitter - https://twitter.com/DataTalksClub Website - https://datatalks.club/

MLOps in Corporations and Startups - Nemanja Radojkovic

2025-03-14 Listen
podcast_episode

In this podcast episode, we talked with Nemanja Radojkovic about MLOps in Corporations and Startups.

About the Speaker: Nemanja Radojkovic is Senior Machine Learning Engineer at Euroclear.

In this event, we’re diving into the world of MLOps, comparing life in startups versus big corporations. Joining us again is Nemanja, a seasoned machine learning engineer with experience spanning Fortune 500 companies and agile startups. We explore the challenges of scaling MLOps on a shoestring budget, the trade-offs between corporate stability and startup agility, and practical advice for engineers deciding between these two career paths, whether they’re navigating legacy frameworks or experimenting with cutting-edge tools.

1:00 MLOps in corporations versus startups 6:03 The agility and pace of startups 7:54 MLOps on a shoestring budget 12:54 Cloud solutions for startups 15:06 Challenges of cloud complexity versus on-premise 19:19 Selecting tools and avoiding vendor lock-in 22:22 Choosing between a startup and a corporation 27:30 Flexibility and risks in startups 29:37 Bureaucracy and processes in corporations 33:17 The role of frameworks in corporations 34:32 Advantages of large teams in corporations 40:01 Challenges of technical debt in startups 43:12 Career advice for junior data scientists 44:10 Tools and frameworks for MLOps projects 49:00 Balancing new and old technologies in skill development 55:43 Data engineering challenges and reliability in LLMs 57:09 On-premise vs. cloud solutions in data-sensitive industries 59:29 Alternatives like Dask for distributed systems

🔗 CONNECT WITH NEMANJA LinkedIn -   / radojkovic   Github - https://github.com/baskervilski

🔗 CONNECT WITH DataTalksClub Join the community - https://datatalks.club/slack.html Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/... Check other upcoming events - https://lu.ma/dtc-events  LinkedIn -   / datatalks-club    Twitter -   / datatalksclub    Website - https://datatalks.club/ 

Trends in Data Engineering – Adrian Brudaru

2025-03-07 Listen
podcast_episode

In this podcast episode, we talked with Adrian Brudaru about the past, present and future of data engineering.

About the speaker: Adrian Brudaru studied economics in Romania but soon got bored with how creative the industry was, and chose to go instead for the more factual side. He ended up in Berlin at the age of 25 and started a role as a business analyst. At the age of 30, he had enough of startups and decided to join a corporation, but quickly found out that it did not provide the challenge he wanted. As going back to startups was not a desirable option either, he decided to postpone his decision by taking freelance work and has never looked back since. Five years later, he co-founded a company in the data space to try new things. This company is also looking to release open source tools to help democratize data engineering.

0:00 Introduction to DataTalks.Club 1:05 Discussing trends in data engineering with Adrian 2:03 Adrian's background and journey into data engineering 5:04 Growth and updates on Adrian's company, DLT Hub 9:05 Challenges and specialization in data engineering today 13:00 Opportunities for data engineers entering the field 15:00 The "Modern Data Stack" and its evolution 17:25 Emerging trends: AI integration and Iceberg technology 27:40 DuckDB and the emergence of portable, cost-effective data stacks 32:14 The rise and impact of dbt in data engineering 34:08 Alternatives to dbt: SQLMesh and others 35:25 Workflow orchestration tools: Airflow, Dagster, Prefect, and GitHub Actions 37:20 Audience questions: Career focus in data roles and AI engineering overlaps 39:00 The role of semantics in data and AI workflows 41:11 Focusing on learning concepts over tools when entering the field 45:15 Transitioning from backend to data engineering: challenges and opportunities 47:48 Current state of the data engineering job market in Europe and beyond 49:05 Introduction to Apache Iceberg, Delta, and Hudi file formats 50:40 Suitability of these formats for batch and streaming workloads 52:29 Tools for streaming: Kafka, SQS, and related trends 58:07 Building AI agents and enabling intelligent data applications 59:09 Closing discussion on the place of tools like DBT in the ecosystem

🔗 CONNECT WITH ADRIAN BRUDARU Linkedin - / data-team Website - https://adrian.brudaru.com/

🔗 CONNECT WITH DataTalksClub Join the community - https://datatalks.club/slack.html Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/... Check other upcoming events - https://lu.ma/dtc-events LinkedIn - / datatalks-club Twitter - / datatalksclub Website - https://datatalks.club/

Career choices, transitions and promotions in and out of tech - Agita Jaunzeme

2025-01-10 Listen
podcast_episode

In this podcast episode, we talked with Agita Jaunzeme about Career choices, transitions and promotions in and out of tech.

About the Speaker:

Agita has designed a career spanning DevOps/DataOps engineering, management, community building, education, and facilitation. She has worked on projects across corporate, startup, open source, and non-governmental sectors. Following her passion, she founded an NGO focusing on the inclusion of expats and locals in Porto. Embodying the values of innovation, automation, and continuous learning, Agita provides practical insights on promotions, career pivots, and aligning work with passion and purpose.

During this event, Agita discussed her career journey, starting with her transition from art school to programming and later into DevOps, eventually taking on leadership roles. She talked about the challenges of burnout, the importance of volunteering, and founding an NGO to support inclusion, gender equality, and sustainability. The conversation also covered key topics like mentorship, the differences between data engineering and data science, and the dynamics of managing volunteers versus employees. Additionally, she shared insights on community management, developer relations, and the importance of product vision and team collaboration.

0:00 Introduction and Welcome 1:28 Guest Introduction: Agita’s Background and Career Highlights 3:05 Transition to Tech: From Art School to Programming 5:40 Exploring DevOps and Growing into Leadership Roles 7:24 Burnout, Volunteering, and Founding an NGO 11:00 Volunteering and Mentorship Initiatives 14:00 Discovering Programming Skills and Early Career Challenges 15:50 Automating Work Processes and Earning a Promotion 19:00 Transitioning from DevOps to Volunteering and Project Management 24:00 Managing Volunteers vs. Employees and Building Organizational Skills 31:07 Personality traits in engineering vs. data roles 33:14 Differences in focus between data engineers and data scientists 36:24 Transitioning from volunteering to corporate work 37:38 The role and responsibilities of a community manager 39:06 Community management vs. developer relations activities 41:01 Product vision and team collaboration 43:35 Starting an NGO and legal processes 46:13 NGO goals: inclusion, gender equality, and sustainability 49:02 Community meetups and activities 51:57 Living off-grid in a forest and sustainability 55:02 Unemployment party and brainstorming session 59:03 Unemployment party: the process and structure

🔗 CONNECT WITH AGITA JAUNZEME Linkedin - /agita

🔗 CONNECT WITH DataTalksClub Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html Datalike Substack - https://datalike.substack.com/ LinkedIn: / datatalks-club

Using Data to Create Liveable Cities - Rachel Lim

2024-11-01 Listen
podcast_episode

We talked about:

00:00 DataTalks.Club intro 01:56 Using data to create livable cities 02:52 Rachel's career journey: from geography to urban data science 04:20 What does a transport scientist do? 05:34 Short-term and long-term transportation planning 06:14 Data sources for transportation planning in Singapore 08:38 Rachel's motivation for combining geography and data science 10:19 Urban design and its connection to geography 13:12 Defining a livable city 15:30 Livability of Singapore and urban planning 18:24 Role of data science in urban and transportation planning 20:31 Predicting travel patterns for future transportation needs 22:02 Data collection and processing in transportation systems 24:02 Use of real-time data for traffic management 27:06 Incorporating generative AI into data engineering 30:09 Data analysis for transportation policies 33:19 Technologies used in text-to-SQL projects 36:12 Handling large datasets and transportation data in Singapore 42:17 Generative AI applications beyond text-to-SQL 45:26 Publishing public data and maintaining privacy 45:52 Recommended datasets and projects for data engineering beginners 49:16 Recommended resources for learning urban data science

About the speaker:

Rachel is an urban data scientist dedicated to creating liveable cities through the innovative use of data. With a background in geography and a master's in urban data science, she blends qualitative and quantitative analysis to tackle urban challenges. Her aim is to integrate data-driven techniques with urban design to foster sustainable and equitable urban environments.

Links: - https://datamall.lta.gov.sg/content/datamall/en/dynamic-data.html

Join our Slack: https://datatalks.club/slack.html

DataOps, Observability, and The Cure for Data Team Blues - Christopher Bergh

2024-08-15 Listen
podcast_episode
Johanna Berer (DataTalks.Club), Christopher Bergh (DataKitchen)

0:00

hi everyone Welcome to our event this event is brought to you by DataTalks.Club which is a community of people who love

0:06

data and we have weekly events and today one is one of such events and I guess we

0:12

are also a community of people who like to wake up early if you're from the states right Christopher or maybe not so

0:19

much because this is the time we usually have uh uh our events uh for our guests

0:27

and presenters from the states we usually do it in the evening of Berlin time but yes unfortunately it kind of

0:34

slipped my mind but anyways we have a lot of events you can check them in the

0:41

description like there's a link um I don't think there are a lot of them right now on that link but we will be

0:48

adding more and more I think we have like five or six uh interviews scheduled so um keep an eye on that do not forget

0:56

to subscribe to our YouTube channel this way you will get notified about all our future streams that will be as awesome

1:02

as the one today and of course very important do not forget to join our community where you can hang out with

1:09

other data enthusiasts during today's interview you can ask any question there's a pin Link in live chat so click

1:18

on that link ask your question and we will be covering these questions during the interview now I will stop sharing my

1:27

screen and uh there is there's a a message in uh and Christopher is from

1:34

you so we actually have this on YouTube but so they have not seen what you wrote

1:39

but there is a message from to anyone who's watching this right now from Christopher saying hello everyone can I

1:46

call you Chris or you okay I should go I should uh I should look on YouTube then okay yeah but anyways I'll you don't

1:53

need like you we'll need to focus on answering questions and I'll keep an eye

1:58

I'll be keeping an eye on all the question questions so um

2:04

yeah if you're ready we can start I'm ready yeah and you prefer Christopher

2:10

not Chris right Chris is fine Chris is fine it's a bit shorter um

2:18

okay so this week we'll talk about data Ops again maybe it's a tradition that we talk about data Ops every like once per

2:25

year but we actually skipped one year so because we did not have we haven't had

2:31

Chris for some time so today we have a very special guest Christopher Christopher is the co-founder CEO and

2:37

head chef or head cook at DataKitchen with 25 years of experience maybe this

2:43

is outdated uh cuz probably now you have more and maybe you stopped counting I

2:48

don't know but like with tons of years of experience in analytics and software engineering Christopher is known as the

2:55

co-author of the data Ops cookbook and data Ops Manifesto and it's not the

3:00

first time we have Christopher here on the podcast we interviewed him two years ago also about data Ops and this one

3:07

will be about data Ops so we'll catch up and see what actually changed in in

3:13

these two years and yeah so welcome to the interview well thank you for having

3:19

me I'm I'm happy to be here and talking all things related to data Ops and why

3:24

why why bother with data Ops and happy to talk about the company or or what's changed

3:30

excited yeah so let's dive in so the questions for today's interview are prepared by Johanna Berer as always

3:37

thanks Johanna for your help so before we start with our main topic for today

3:42

data Ops uh let's start with your background can you tell us about your career Journey so far and also for those who

3:50

have not heard have not listened to the previous podcast maybe you can um talk

3:55

about yourself and also for those who did listen to the previous you can also maybe give a summary of what has changed

4:03

in the last two years so we'll do yeah so um my name is Chris so I guess I'm

4:09

a sort of an engineer so I spent about the first 15 years of my career in

4:15

software sort of working and building some AI systems some non- AI systems uh

4:21

at uh the US's NASA and MIT Lincoln Lab and then some startups and then um

4:30

Microsoft and then about 2005 I got I got the data bug uh I think you know my

4:35

kids were small and I thought oh this data thing was easy and I'd be able to go home uh for dinner at 5 and life

4:41

would be fine um because I was a big you started your own company right and uh it didn't work out that way

4:50

and um and what was interesting is is for me it the problem wasn't doing the

4:57

data like I we had smart people who did data science and data engineering the act of creating things it was like the

5:04

systems around the data that were hard um things it was really hard to not have

5:11

errors in production and I would sort of driving to work and I had a Blackberry at the time and I would not look at my

5:18

Blackberry all all morning I had this long drive to work and I'd sit in the parking lot and take a deep breath and

5:24

look at my Blackberry and go uh oh is there going to be any problems today and I'd be and if there wasn't I'd walk and

5:30

very happy um and if there was I'd have to like brace myself um and you know and

5:36

then the second problem is the team I worked for we just couldn't go fast enough the customers were super

5:42

demanding they didn't care they all they always thought things should be faster and we are always behind and so um how

5:50

do you you know how do you live in that world where things are breaking left and right you're terrified of making errors

5:57

um and then second you just can't go fast enough um and it's pre-Hadoop era

6:02

right it's like before all this big data Tech yeah before this was we were using

6:08

uh SQL Server um and we actually you know we had smart people so we we we

6:14

built an engine in SQL Server that made SQL Server a columnar

6:20

database so we built a columnar database inside of SQL Server um so uh

6:26

in order to make certain things fast and and uh yeah it was it was really uh it's not

6:33

bad I mean the principles are the same right before Hadoop it's it's still a database there's still indexes there's

6:38

still queries um things like that we we uh at the time uh you would use OLAP

6:43

engines we didn't use those but you those reports you know are for models it's it's not that different um you know

6:50

we had a rack of servers instead of the cloud um so yeah and I think so what what I

6:57

took from that was uh it's just hard to run a team of people to do do data and analytics and it's not

7:05

really I I took it from a manager perspective I started to read Deming and

7:11

think about the work that we do as a factory you know and in a factory that produces insight and not automobiles um

7:18

and so how do you run that factory so it produces things that are good of good

7:24

quality and then second since I had come from software I've been very influenced

7:29

by the DevOps movement how you automate deployment how you run in an agile way how you

7:35

produce um how you how you change things quickly and how you innovate and so

7:41

those two things of like running you know running a really good solid production line that has very low errors

7:47

um and then second changing that production line very very often they're kind of opposite right um and so

7:55

how do you how do you as a manager how do you technically approach that and

8:00

then um 10 years ago when we started DataKitchen um we've always been a profitable company and so we started off

8:07

uh with some customers we started building some software and realized that we couldn't work any other way and that

8:13

the way we work wasn't understood by a lot of people so we had to write a book and a Manifesto to kind of share our our

8:21

methods and then so yeah we've been in so we've been in business now about a little over 10

8:28

years oh that's cool and uh like what

8:33

uh so let's talk about DataOps and you mentioned DevOps and how you were inspired by that and by the way like do

8:41

you remember roughly when DevOps as a thing started to appear like when did people start calling these principles

8:49

and like tools around them as DevOps yeah so Agile Manifesto well first of all the I

8:57

mean I had a boss in 1990 at NASA who had this idea build a

9:03

little test a little learn a lot right that was his Mantra and then which made

9:09

a lot of sense um and so and then the sort of Agile Software Manifesto

9:14

came out which is very similar in 2001 and then um the sort of first real

9:22

DevOps was a guy at Twitter who started to do automated deployment you know

9:27

push a button and that was like 2009-ish and so the first I think DevOps

9:33

Meetup was around then so it's it's it's been 15 years I guess 6 like I was

9:39

trying to so I started my career in 2010 so I my first job was a Java

9:44

developer and like I remember for some things like we would just uh SFTP to the

9:52

machine and then put the jar archive there and then like keep our fingers crossed that it doesn't break uh uh like

10:00

it was not really the I wouldn't call it this way right you were deploying you

10:06

had a deploy process I'd put it yeah

10:11

right was that so that was documented too it was like put the jar on production cross your

10:17

fingers I think there was uh like a page on uh some internal wiki uh yeah that

10:25

describes like with passwords and don't like what you should do yeah that was and and I think what's interesting is

10:33

why that changed right and and we laugh at it now but that was why didn't you

10:38

invest in automating deployment or a whole bunch of automated regression

10:44

tests right that would run because I think in software now that would be rare

10:49

that people wouldn't use CI/CD they wouldn't have some automated tests you know functional

10:56

regression tests that would be the
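
A minimal sketch of the kind of automated regression test the speakers contrast with "put the jar on production and cross your fingers". Nothing here comes from the talk itself: the `daily_totals` pipeline step, its input shape, and the test name are hypothetical, chosen only to show a known input being checked against a known output on every change.

```python
def daily_totals(rows):
    """Hypothetical pipeline step: sum amounts per day.

    `rows` is an iterable of (day, amount) tuples.
    """
    totals = {}
    for day, amount in rows:
        totals[day] = totals.get(day, 0) + amount
    return totals


def test_daily_totals_regression():
    # A fixed, known input with a hand-checked expected output.
    rows = [("2024-01-01", 10), ("2024-01-01", 5), ("2024-01-02", 7)]
    expected = {"2024-01-01": 15, "2024-01-02": 7}
    assert daily_totals(rows) == expected

    # Edge case: no input rows should produce no totals.
    assert daily_totals([]) == {}


# In practice a test runner in a CI/CD pipeline would call this on each change;
# calling it directly keeps the sketch self-contained.
test_daily_totals_regression()
```

If a later change to `daily_totals` alters its output for the fixed input, the assertion fails before the change reaches production, which is the safety net a manual deploy page on a wiki does not provide.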

Working as a Core Developer in the Scikit-Learn Universe - Guillaume Lemaître

2024-07-26 Listen
podcast_episode

In this podcast episode, we talked with Guillaume Lemaître about navigating scikit-learn and imbalanced-learn.

🔗 CONNECT WITH Guillaume Lemaître LinkedIn - https://www.linkedin.com/in/guillaume-lemaitre-b9404939/ Twitter - https://x.com/glemaitre58 Github - https://github.com/glemaitre Website - https://glemaitre.github.io/

🔗 CONNECT WITH DataTalksClub Join the community - https://datatalks-club.slack.com/join/shared_invite/zt-2hu0sjeic-ESN7uHt~aVWc8tD3PefSlA#/shared-invite/email Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/u/0/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ Check other upcoming events - https://lu.ma/dtc-events LinkedIn - https://www.linkedin.com/company/datatalks-club/ Twitter - https://twitter.com/DataTalksClub Website - https://datatalks.club/

🔗 CONNECT WITH ALEXEY Twitter - https://twitter.com/Al_Grigor Linkedin - https://www.linkedin.com/in/agrigorev/

🎙 ABOUT THE PODCAST At DataTalksClub, we organize live podcasts that feature a diverse range of guests from the data field. Each podcast is a free-form conversation guided by a prepared set of questions, designed to learn about the guests’ career trajectories, life experiences, and practical advice. These insightful discussions draw on the expertise of data practitioners from various backgrounds.

We stream the podcasts on YouTube, where each session is also recorded and published on our channel, complete with timestamps, a transcript, and important links.

You can access all the podcast episodes here - https://datatalks.club/podcast.html

📚Check our free online courses ML Engineering course - http://mlzoomcamp.com Data Engineering course - https://github.com/DataTalksClub/data-engineering-zoomcamp MLOps course - https://github.com/DataTalksClub/mlops-zoomcamp Analytics in Stock Markets - https://github.com/DataTalksClub/stock-markets-analytics-zoomcamp LLM course - https://github.com/DataTalksClub/llm-zoomcamp Read about all our courses in one place - https://datatalks.club/blog/guide-to-free-online-courses-at-datatalks-club.html

👋🏼 GET IN TOUCH If you want to support our community, use this link - https://github.com/sponsors/alexeygrigorev

If you're a company and want to support us, contact at [email protected]

Berlin Buzzwords 2024

2024-07-06 Listen
podcast_episode


Community Building and Teaching in AI & Tech - Erum Afzal

2024-05-10 Listen
podcast_episode
Erum Afzal (Omdena / Omdena Academy)

We talked about:

Erum's Background Omdena Academy and Erum’s Role There Omdena’s Community and Projects Course Development and Structure at Omdena Academy Student and Instructor Engagement Engagement and Motivation The Role of Teaching in Community Building The Importance of Communities for Career Building Advice for Aspiring Instructors and Freelancers DS and ML Talent Market Saturation Resources for Learning AI and Community Building Erum’s Resource Recommendations

Links:

LinkedIn: https://www.linkedin.com/in/erum-afzal-64827b24/

Twitter: https://twitter.com/Erum55449739

Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Working in Open Source - Probabl.ai and sklearn - Vincent Warmerdam

2024-05-03 Listen
podcast_episode

We talked about:

Vincent’s Background SciKit Learn’s History and Company Formation Maintaining and Transitioning Open Source Projects Teaching and Learning Through Open Source Role of Developer Relations and Content Creation Teaching Through Calm Code and The Importance of Content Creation Current Projects and Future Plans for Calm Code Data Processing Tricks and The Importance of Innovation Learning the Fundamentals and Changing the Way You See a Problem Dev Rel and Core Dev in One Why :probabl. Needs a Dev Rel Exploration of Skrub and Advanced Data Processing Personal Insights on SciKit Learn and Industry Trends Vincent’s Upcoming Projects

Links:

probabl. YouTube channel: https://www.youtube.com/@UCIat2Cdg661wF5DQDWTQAmg Calmcode website: https://calmcode.io/ probabl. website: https://probabl.ai/

Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

AI for Ecology, Biodiversity, and Conservation - Tanya Berger-Wolf

2024-04-26 Listen
podcast_episode

Links:

Biodiversity and Artificial Intelligence pdf: https://www.gpai.ai/projects/responsible-ai/environment/biodiversity-and-AI-opportunities-recommendations-for-action.pdf

Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Knowledge Graphs and LLMs Across Academia and Industry - Anahita Pakiman

2024-04-05 Listen
podcast_episode

We talked about:

Anahita's Background Mechanical Engineering and Applied Mechanics Finite Element Analysis vs. Machine Learning Optimization and Semantic Reporting Application of Knowledge Graphs in Research Graphs vs Tabular Data Computational graphs Graph Data Science and Graph Machine Learning Combining Knowledge Graphs and Large Language Models (LLMs) Practical Applications and Projects Challenges and Learnings Anahita’s Recommendations

Links:

GitHub repo: https://github.com/antahiap/ADPT-LRN-PHYS/tree/main

Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Inclusive Data Leadership Coaching - Tereza Iofciu

2024-03-29 Listen
podcast_episode

We talked about:

Tereza’s background Switching from an Individual Contributor to Lead Python Pizza and the pizza management metaphor Learning to figure things out on your own and how to receive feedback Tereza as a leadership coach Podcasts Tereza’s coaching framework (selling yourself vs bragging) The importance of retrospectives The importance of communication and active listening Convincing people you don’t have power over Building relationships and empathy Inclusive leadership

Links:

LinkedIn: https://www.linkedin.com/in/tereza-iofciu/ Twitter: https://twitter.com/terezaif Github: https://github.com/terezaif Website: https://terezaiofciu.com

Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Building Production Search Systems - Daniel Svonava

2024-03-22 Listen
podcast_episode

Links:

VectorHub: https://superlinked.com/vectorhub/?utm_source=community&utm_medium=podcast&utm_campaign=datatalks Daniel's LinkedIn: https://www.linkedin.com/in/svonava/

Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

This podcast is sponsored by VectorHub, a free open-source learning community for all things vector embeddings and information retrieval systems.

Building Machine Learning Products - Reem Mahmoud

2024-03-16 Listen
podcast_episode

We talked about:

Reem’s background Context-aware sensing and transfer learning Shifting focus from PhD to industry Reem’s experience with startups and dealing with prejudices towards PhDs AI interviewing solution How candidates react to getting interviewed by an AI avatar End-to-end overview of a machine learning project The pitfalls of using LLMs in your process Mitigating biases Addressing specific requirements for specific roles Reem’s resource recommendations

Links:

LinkedIn: https://www.linkedin.com/in/reemmahmoud/recent-activity/all/ Website: https://topmate.io/reem_mahmoud

Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Make an Impact Through Volunteering Open Source Work - Sara EL-ATEIF

2024-02-23 Listen
podcast_episode
Sara EL-ATEIF (Google)

We talked about:

Sara’s background On being a Google PhD fellow Sara’s volunteer work Finding AI volunteer work Sara’s Fruit Punch challenge How to take part in AI challenges AI Wonder Girls Hackathons Things people often miss in AI projects and hackathons Getting creative Fostering your social media Tips on applying for volunteer projects Why it’s worth doing volunteer projects Opportunities for data engineers and students Sara’s newsletter suggestions

Links:

Dev and AI hackathons: https://devpost.com/ Healthcare-focused challenges: https://grand-challenge.org/challenges/ Volunteering in projects (AI4Good): https://www.fruitpunch.ai/ Volunteering in projects (AI4Good) 2: https://www.omdena.com/ Twitter: https://twitter.com/el_ateifSara Instagram: https://www.instagram.com/saraelateif/ LinkedIn: https://www.linkedin.com/in/sara-el-ateif/ Youtube: www.youtube.com/@elateifsara

Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Accelerating The Job Hunt for The Perfect Job in Tech - Sarah Mestiri

2024-02-02 Listen
podcast_episode
Sarah Mestiri (Thriving Career Moms)

We talked about:

Sarah’s background How Sarah became a coach and found her niche Sarah’s clients How Sarah helps her clients find the perfect job Finding a specialization Informational interviews Building a connection for mutual benefit The networking strategy Listing your projects in the CV The importance of doing research yourself and establishing your interests How to land a part-time job when the company wants full-time Age is not a factor Applying for jobs after finishing a course and the importance of sharing your learnings Sarah resource recommendations

Links:

LinkedIn: https://www.linkedin.com/in/sarahmestiri/ Website: https://thrivingcareermoms.com/ Personal Website: https://www.sarahmestiri.com/ Youtube channel: https://www.youtube.com/@thrivingcareermoms444

Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Machine Learning Engineering in Finance - Nemanja Radojkovic

2024-01-31 Listen
podcast_episode

We talked about:

Nemanja’s background

When Nemanja first worked as a data person Typical problems that ML Ops folks solve in the financial sector What Nemanja currently does as an ML Engineer The obstacle of implementing new things in financial sector companies Going through the hurdles of DevOps Working with an on-premises cluster “ML Ops on a Shoestring” (You don’t need fancy stuff to start w/ ML Ops) Tactical solutions Platform work and code work Programming and soft skills needed to be an ML Engineer The challenges of transitioning from electrical engineering and sales to ML Ops The ML Ops tech stack for beginners Working on projects to determine which skills you need

Links:

LinkedIn: https://www.linkedin.com/in/radojkovic/

Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Data Engineering for Fraud Prevention - Angela Ramirez

2023-10-06 Listen
podcast_episode
Angela Ramirez (Sam's Club)

We talked about:

Angela's background Angela's role at Sam's Club The usefulness of knowing ML as a data engineer Angela's career path Transitioning from data analyst to data engineer/system designer Best practices for system design and data engineering Working with document databases Working with network-based databases Detecting fraud with a network-based database Selecting the database type to work with Neo4j vs Postgres The importance of having software engineering knowledge in data engineering Data quality check tooling The greatest challenges in data engineering Debugging and finding the root cause of a failed job What kinds of tools Angela uses on a daily basis Working with external data sources Angela's resource recommendations

Links:

LinkedIn: https://www.linkedin.com/in/aramirez1305/ Twitter: https://twitter.com/angelamaria__r Github: https://github.com/aramir62 Previous podcast talk: https://twitter.com/i/spaces/1OwGWwZAZDnGQ?s=20

Free ML Engineering course: http://mlzoomcamp.com

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

From Data Manager to Data Architect - Loïc Magnien

2023-09-29 Listen
podcast_episode

We talked about:

Loïc's background Data management Loïc's transition to data engineer Challenges in the transition to data engineering What is a data architect? The output of a data architect's work Establishing metrics and dimensions The importance of communication Setting up best practices for the team Staying relevant and tech-watching Setting up specifications for a pipeline Be agile, create a POC, iterate ASAP, and build reusable templates Reaching out to Loïc for questions

Links:

Loïc's LinkedIn: https://www.linkedin.com/in/loicmagnien/

Free ML Engineering course: http://mlzoomcamp.com

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html