Marketing

How AI is changing software engineering at Shopify with Farhan Thawar

2025-07-02 · The Pragmatic Engineer Listen

podcast_episode

by Gergely Orosz , Farhan Thawar (Shopify)

AI/ML Analytics GitHub LLM SaaS Cyber Security

Supported by Our Partners •⁠ WorkOS — The modern identity platform for B2B SaaS. •⁠ Statsig ⁠ — ⁠ The unified platform for flags, analytics, experiments, and more. • Sonar — Code quality and code security for ALL code. — What happens when a company goes all in on AI? At Shopify, engineers are expected to utilize AI tools, and they’ve been doing so for longer than most. Thanks to early access to models from GitHub Copilot, OpenAI, and Anthropic, the company has had a head start in figuring out what works. In this live episode from LDX3 in London, I spoke with Farhan Thawar, VP of Engineering, about how Shopify is building with AI across the entire stack. We cover the company’s internal LLM proxy, its policy of unlimited token usage, and how interns help push the boundaries of what’s possible. In this episode, we cover: • How Shopify works closely with AI labs • The story behind Shopify’s recent Code Red • How non-engineering teams are using Cursor for vibecoding • Tobi Lütke’s viral memo and Shopify’s expectations around AI • A look inside Shopify’s LLM proxy—used for privacy, token tracking, and more • Why Shopify places no limit on AI token spending • Why AI-first isn’t about reducing headcount—and why Shopify is hiring 1,000 interns • How Shopify’s engineering department operates and what’s changed since adopting AI tooling • Farhan’s advice for integrating AI into your workflow • And much more! — Timestamps (00:00) Intro (02:07) Shopify’s philosophy: “hire smart people and pair with them on problems” (06:22) How Shopify works with top AI labs (08:50) The recent Code Red at Shopify (10:47) How Shopify became early users of GitHub Copilot and their pivot to trying multiple tools (12:49) The surprising ways non-engineering teams at Shopify are using Cursor (14:53) Why you have to understand code to submit a PR at Shopify (16:42) AI tools' impact on SaaS (19:50) Tobi Lütke’s AI memo (21:46) Shopify’s LLM proxy and how they protect their privacy (23:00) How Shopify utilizes MCPs (26:59) Why AI tools aren’t the place to pinch pennies (30:02) Farhan’s projects and favorite AI tools (32:50) Why AI-first isn’t about freezing headcount and the value of hiring interns (36:20) How Shopify’s engineering department operates, including internal tools (40:31) Why Shopify added coding interviews for director-level and above hires (43:40) What has changed since Spotify added AI tooling (44:40) Farhan’s advice for implementing AI tools — The Pragmatic Engineer deepdives relevant for this episode: • How Shopify built its Live Globe for Black Friday • Inside Shopify's leveling split • Real-world engineering challenges: building Cursor • How Anthropic built Artifacts — See the transcript and other references from the episode at ⁠⁠https://newsletter.pragmaticengineer.com/podcast⁠⁠ — Production and marketing by ⁠⁠⁠⁠⁠⁠⁠⁠https://penname.co/⁠⁠⁠⁠⁠⁠⁠⁠. For inquiries about sponsoring the podcast, email [email protected].

Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

#274: Real Talk About Synthetic Data with Winston Li

2025-06-24 · The Analytics Power Hour Listen

podcast_episode

by Val Kroll , Julie Hoyer , Tim Wilson (Analytics Power Hour - Columbus (OH) , Winston Li (Arima) , Moe Kiss (Canva) , Michael Helbling (Search Discovery)

AI/ML MMM

Synthetic data: it's a fascinating topic that sounds like science fiction but is rapidly becoming a practical tool in the data landscape. From machine learning applications to safeguarding privacy, synthetic data offers a compelling alternative to real-world datasets that might be incomplete or unwieldy. With the help of Winston Li, founder of Arima, a startup specializing in synthetic data and marketing mix modelling, we explore how this artificial data is generated, where its strengths truly lie, and the potential pitfalls to watch out for! For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

The present, past and future of GitHub

2025-06-18 · The Pragmatic Engineer Listen

podcast_episode

by Thomas Dohmke (GitHub) , Gergely Orosz

AI/ML Analytics Cloud Computing GitHub Linux Microsoft Cyber Security

Supported by Our Partners •⁠ Statsig ⁠ — ⁠ The unified platform for flags, analytics, experiments, and more. • Graphite — The AI developer productivity platform. • Augment Code — AI coding assistant that pro engineering teams love — GitHub recently turned 17 years old—but how did it start, how has it evolved, and what does the future look like as AI reshapes developer workflows? In this episode of The Pragmatic Engineer, I’m joined by Thomas Dohmke, CEO of GitHub. Thomas has been a GitHub user for 16 years and an employee for 7. We talk about GitHub’s early architecture, its remote-first operating model, and how the company is navigating AI—from Copilot to agents. We also discuss why GitHub hires junior engineers, how the company handled product-market fit early on, and why being a beloved tool can make shipping harder at times. Other topics we discuss include: • How GitHub’s architecture evolved beyond its original Rails monolith • How GitHub runs as a remote-first company—and why they rarely use email • GitHub’s rigorous approach to security • Why GitHub hires junior engineers • GitHub’s acquisition by Microsoft • The launch of Copilot and how it’s reshaping software development • Why GitHub sees AI agents as tools, not a replacement for engineers • And much more! — Timestamps (00:00) Intro (02:25) GitHub’s modern tech stack (08:11) From cloud-first to hybrid: How GitHub handles infrastructure (13:08) How GitHub’s remote-first culture shapes its operations (18:00) Former and current internal tools including Haystack (21:12) GitHub’s approach to security (24:30) The current size of GitHub, including security and engineering teams (25:03) GitHub’s intern program, and why they are hiring junior engineers (28:27) Why AI isn’t a replacement for junior engineers (34:40) A mini-history of GitHub (39:10) Why GitHub hit product market fit so quickly (43:44) The invention of pull requests (44:50) How GitHub enables offline work (46:21) How monetization has changed at GitHub since the acquisition (48:00) 2014 desktop application releases (52:10) The Microsoft acquisition (1:01:57) Behind the scenes of GitHub’s quiet period (1:06:42) The release of Copilot and its impact (1:14:14) Why GitHub decided to open-source Copilot extensions (1:20:01) AI agents and the myth of disappearing engineering jobs (1:26:36) Closing — The Pragmatic Engineer deepdives relevant for this episode: • AI Engineering in the real world • The AI Engineering stack • How Linux is built with Greg Kroah-Hartman • Stacked Diffs (and why you should know about them) • 50 Years of Microsoft and developer tools — See the transcript and other references from the episode at ⁠⁠https://newsletter.pragmaticengineer.com/podcast⁠⁠ — Production and marketing by ⁠⁠⁠⁠⁠⁠⁠⁠https://penname.co/⁠⁠⁠⁠⁠⁠⁠⁠. For inquiries about sponsoring the podcast, email [email protected].

Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

Better Together: Change Data Feed in a Streaming Data Flow

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Mattias Moser (84.51 LLC) , Scott Gordon (84.51˚)

Delta Data Streaming

Traditional streaming works great when your data source is append-only, but what if your data source includes updates and deletes? At 84.51 we used Lakeflow Declarative Pipelines and Delta Lake to build a streaming data flow that consumes inserts, updates and deletes while still taking advantage of streaming checkpoints. We combined this flow with a materialized view and Enzyme incremental refresh for a low-code, efficient and robust end-to-end data flow.We process around 8 million sales transactions each day with 80 million items purchased. This flow not only handles new transactions but also handles updates to previous transactions.Join us to learn how 84.51 combined change data feed, data streaming and materialized views to deliver a “better together” solution.84.51 is a retail insights, media & marketing company. We use first-party retail data from 60 million households sourced through a loyalty card program to drive Kroger’s customer-centric journey.

Marketing Runs on Your Data: Why IT Holds the Keys to Customer Growth

2025-06-12 · Data + AI Summit 2025 Watch

lightning_talk

by Tim Haden (Epsilon)

Databricks

Marketing owns the outcomes, but IT owns the infrastructure that makes those outcomes possible. In today’s data-driven landscape, the success of customer engagement and personalization strategies depends on a tight partnership between marketing and IT. This session explores how leading brands are using Databricks and Epsilon to unlock the full value of first-party data — transforming raw data into rich customer profiles, real-time engagement and measurable marketing ROI. Join Epsilon to see how a unified data foundation powers marketing to drive outcomes — with IT as the enabler of scale, governance and innovation. Key takeaways: How to unify first-party data and resolve identities to build rich customer profiles with Databricks and Epsilon Why a collaborative approach between Marketing and IT accelerates data-driven decisions and drives greater return How to activate personalized campaigns with precision and speed across channels — from insights to execution

Pacers Sports and Entertainment and Databricks

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Ari Kaplan (Databricks) , Jared Chavez (Pacers Sports & Entertainment) , Rick Schultz (Databricks)

AI/ML Databricks

The Pacers Sports Group has had an amazing year. The Indianapolis Pacers in the NBA finals for the first time in 25 years. The Fever are setting attendance and viewership records with WNBA celebrity Caitlin Clark. Hear how they have transformed their data and AI capabilities for marketing, fan behavior insights, season ticket propensity models, and democratization to their non-technical personas. And receiving a 12,000x cost reduction down to just $8 a year switching to Databricks.

How Ad Tech Runs on Databricks

2025-06-11 · Data + AI Summit 2025 Watch

talk

by anj yamsani (Influential (Publicis)) , Kayvon Raphael (Magnite) , Abhishek Yadav (LG Ads) , Scott Simony (Databricks) , Cyrus Mohammadian (Influential) , Atul Saurabh (MiQ)

Databricks

Discover how leading ad techs and agencies — including Magnite, LG Ads, MiQ and Publicis Influential — leverage Databricks to power the advertising and marketing ecosystem.

Data Intelligence for Marketing Breakout: Agentic Systems for Bayesian MMM and Consumer Testing

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Dan Morris (Databricks) , Dr. Luca Fiaschi (PyMC Labs)

AI/ML Databricks GenAI MMM

This talk dives into leveraging GenAI to scale sophisticated decision intelligence. Learn how an AI copilot interface simplifies running complex Bayesian probabilistic models, accelerating insight generation, and accurate decision making at the enterprise level. We talk through techniques for deploying AI agents at scale to simulate market dynamics or product feature impacts, providing robust, data-driven foresight for high-stakes innovation and strategy directly within your Databricks environment. For marketing teams, this approach will help you leverage autonomous AI agents to dynamically manage media channel allocation while simulating real-world consumer behavior through synthetic testing environments.

Transforming Customer Processes and Gaining Productivity With Lakeflow Declarative Pipelines

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Marcos Abrantes Gomes (Bradesco Bank) , Ademir Francisquini Junior (Banco Bradesco S.A.)

CDP Databricks React

Bradesco Bank is one of the largest private banks in Latin America, with over 75 million customers and over 80 years of presence in FSI. In the digital business, velocity to react to customer interactions is crucial to succeed. In the legacy landscape, acquiring data points on interactions over digital and marketing channels was complex, costly and lacking integrity due to typical fragmentation of tools. With the new in-house Customer Data Platform powered by Databricks Intelligent Platform, it was possible to completely transform the data strategy around customer data. Using some key components such Uniform and Lakeflow Declarative Pipelines, it was possible to increase data integrity, reduce latency and processing time and, most importantly, boost personal productivity and business agility. Months of reprocessing, weeks of human labor and cumbersome and complex data integrations were dramatically simplified achieving significant operational efficiency.

Unlock the Potential of Your Enterprise Data With Zero-Copy Data Sharing, featuring SAP and Salesforce

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Akram Chetibi (Databricks) , Senthil Krishnapillai (SAP Labs) , Rajkumar Irudayaraj (Salesforce)

AI/ML Analytics Cloud Computing Databricks Delta SAP

Tired of data silos and the constant need to move copies of your data across different systems? Imagine a world where all your enterprise data is readily available in Databricks without the cost and complexity of duplication and ingestion. Our vision is to break down these silos by enabling seamless, zero-copy data sharing across platforms, clouds, and regions. This unlocks the true potential of your data for analytics and AI, empowering you to make faster, more informed decisions leveraging your most important enterprise data sets. This session you will hear from Databricks, SAP, and Salesforce product leaders on how zero-copy data sharing can unlock the value of enterprise data. Explore how Delta Sharing makes this vision a reality, providing secure, zero-copy data access for enterprises.SAP Business Data Cloud: See Delta Sharing in action to unlock operational reporting, supply chain optimization, and financial planning. Salesforce Data Cloud: Enable customer analytics, churn prediction, and personalized marketing.

Somebody Set Up Us the Bomb: Identifying List Bombing of End Users in an Email Anti-Spam Context

2025-06-11 · Data + AI Summit 2025 Watch

lightning_talk

by Doug Sibley (Cisco Talos)

Delta ETL/ELT Kafka Data Streaming

Traditionally, spam emails are messages a user does not want, containing some kind of threat like phishing. Because of this, detection systems can focus on malicious content or sender behavior. List bombing upends this paradigm. By abusing public forms such as marketing signups, attackers can fill a user's inbox with high volumes of legitimate mail. These emails don't contain threats, and each sender is following best practices to confirm the recipient wants to be subscribed, but the net effect for an end user is their inbox being flooded with dozens of emails per minute. This talk covers the the exploration and implementation for identifying this attack in our company's anti-spam telemetry: from reading and writing to Kafka, Delta table streaming for ETL workflows, multi-table liquid clustering design for efficient table joins, curating gold tables to speed up critical queries and using Delta tables as an auditable integration point for interacting with external services.

Metadata-Driven Streaming Ingestion Using Lakeflow Declarative Pipelines, Azure Event Hubs and a Schema Registry

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Vicky Avison (Plexure)

Azure Data Engineering React Data Streaming

At Plexure, we ingest hundreds of millions of customer activities and transactions into our data platform every day, fuelling our personalisation engine and providing insights into the effectiveness of marketing campaigns.We're on a journey to transition from infrequent batch ingestion to near real-time streaming using Azure Event Hubs and Lakeflow Declarative Pipelines. This transformation will allow us to react to customer behaviour as it happens, rather than hours or even days later.It also enables us to move faster in other ways. By leveraging a Schema Registry, we've created a metadata-driven framework that allows data producers to: Evolve schemas with confidence, ensuring downstream processes continue running smoothly. Seamlessly publish new datasets into the data platform without requiring Data Engineering assistance. Join us to learn more about our journey and see how we're implementing this with Lakeflow Declarative Pipelines meta-programming - including a live demo of the end-to-end process!

Summit Live: Data Intelligence for Marketing

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Anoop Muraleedharan (Databricks)

AI/ML CDP Databricks

Maximize the value of your company’s marketing efforts with Data Intelligence for Marketing. Databricks provides seamless, out-of-the-box integration with your ecosystem, empowering every marketer with self-serve insights. And with AI-driven CDP, you get a complete view of customers and campaigns.

How Feastables Partners With Engine to Leverage Advanced Data Models and AI for Smarter BI

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Daniel Palmer (Engine) , Mary Beth Pittman (Feastables, Inc.)

AI/ML Analytics BI Databricks

Feastables, founded by YouTube sensation MrBeast, partnered with Engine to build a modern, AI-enabled BI ecosystem that transforms complex, disparate data into actionable insights, driving smarter decision-making across the organization. In this session, learn how Engine, a Built-On Databricks Partner, brought expertise combined with strategic partnerships that enabled Feastables to rapidly stand up a secure, modern data estate to unify complex internal and external data sources into a single, permissioned analytics platform. Feastables unlocked the power of cross-functional collaboration by democratizing data access throughout their enterprise and seamlessly integrating financial, retailer, supply chain, syndicated, merchandising and e-commerce data. Discover how a scalable analytics framework combined with advanced AI models and tools empower teams with Smarter BI across sales, marketing, supply chain, finance and executive leadership to enable real-time decision-making at scale.

How We Turned 200+ Business Users Into Analysts With AI/BI Genie

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Thomas Russell (Databricks)

AI/ML Analytics BI Databricks SQL

AI/BI Genie has transformed self-service analytics for the Databricks Marketing team. This user-friendly conversational AI tool empowers marketers to perform advanced data analysis using natural language — no SQL required. By reducing reliance on data teams, Genie increases productivity and enables faster, data-driven decisions across the organization. But realizing Genie’s full potential takes more than just turning it on. In this session, we’ll share the end-to-end journey of implementing Genie for over 200 marketing users, including lessons learned, best practices and the real business impact of this Databricks-on-Databricks solution. Learn how Genie democratizes data access, enhances insight generation and streamlines decision-making at scale.

Lakehouse to Powerhouse: Reckitt's Enterprise AI Transformation Story

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Tom Martin (Boston Consulting Group) , Tewfik Bedreddine (Reckitt)

AI/ML Data Lakehouse GenAI

In this presentation, we showcase Reckitt’s journey to develop and implement a state-of-the-art Gen AI platform, designed to transform enterprise operations starting with the marketing function. We will explore the unique technical challenges encountered and the innovative architectural solutions employed to overcome them. Attendees will gain insights into how cutting-edge Gen AI technologies were integrated to meet Reckitt’s specific needs. This session will not only highlight the transformative impacts on Reckitt’s marketing operations but also serve as a blueprint for AI-driven innovation in the Consumer Goods sector, demonstrating a successful model of partnership in technology and business transformation.

TDD, AI agents and coding with Kent Beck

2025-06-11 · The Pragmatic Engineer Listen

podcast_episode

by Gergely Orosz , Kent Beck

Agile/Scrum AI/ML Analytics Cyber Security

Supported by Our Partners • Sonar — Code quality and code security for ALL code. •⁠ Statsig ⁠ — ⁠ The unified platform for flags, analytics, experiments, and more. • Augment Code — AI coding assistant that pro engineering teams love. — Kent Beck is one of the most influential figures in modern software development. Creator of Extreme Programming (XP), co-author of The Agile Manifesto, and a pioneer of Test-Driven Development (TDD), he’s shaped how teams write, test, and think about code. Now, with over five decades of programming experience, Kent is still pushing boundaries—this time with AI coding tools. In this episode of Pragmatic Engineer, I sit down with him to talk about what’s changed, what hasn’t, and why he’s more excited than ever to code. In our conversation, we cover: • Why Kent calls AI tools an “unpredictable genie”—and how he’s using them • Why Kent no longer has an emotional attachment to any specific programming language • The backstory of The Agile Manifesto—and why Kent resisted the word “agile” • An overview of XP (Extreme Programming) and how Grady Booch played a role in the name • Tape-to-tape experiments in Kent’s childhood that laid the groundwork for TDD • Kent’s time at Facebook and how he adapted to its culture and use of feature flags • And much more! — Timestamps (00:00) Intro (02:27) What Kent has been up to since writing Tidy First (06:05) Why AI tools are making coding more fun for Kent and why he compares it to a genie (13:41) Why Kent says languages don’t matter anymore (16:56) Kent’s current project building a small talk server (17:51) How Kent got involved with The Agile Manifesto (23:46) Gergely’s time at JP Morgan, and why Kent didn’t like the word ‘agile’ (26:25) An overview of “extreme programming” (XP) (35:41) Kent’s childhood tape-to-tape experiments that inspired TDD (42:11) Kent’s response to Ousterhout’s criticism of TDD (50:05) Why Kent still uses TDD with his AI stack (54:26) How Facebook operated in 2011 (1:04:10) Facebook in 2011 vs. 2017 (1:12:24) Rapid fire round — The Pragmatic Engineer deepdives relevant for this episode: • — See the transcript and other references from the episode at ⁠⁠https://newsletter.pragmaticengineer.com/podcast⁠⁠ — Production and marketing by ⁠⁠⁠⁠⁠⁠⁠⁠https://penname.co/⁠⁠⁠⁠⁠⁠⁠⁠. For inquiries about sponsoring the podcast, email [email protected].

Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

Sponsored by: Hightouch | Unleashing AI at PetSmart: Using AI Decisioning Agents to Drive Revenue

Doordash Customer 360 Data Store and its Evolution to Become an Entity Management Framework

2025-06-10 · Data + AI Summit 2025 Watch

lightning_talk

by Gowri Shankar (Doordash) , Chao Wang (DoorDash)

Analytics Data Governance Delta

The "Doordash Customer 360 Data Store" represents a foundational step in centralizing and managing customer profile to enable targeting and personalized customer experiences built on Delta Lake. This presentation will explore the initial goals and architecture of the Customer 360 Data Store, its journey to becoming a robust entity management framework, and the challenges and opportunities encountered along the way. We will discuss how the evolution addressed scalability, data governance and integration needs, enabling the system to support dynamic and diverse use cases, including customer lifecycle analytics, marketing campaign targeting using segmentation. Attendees will gain insight into key design principles, technical innovations and strategic decisions that transformed the system into a flexible platform for entity management, positioning it as a critical enabler of data-driven growth at Doordash. Audio for this session is delivered in the conference mobile app, you must bring your own headphones to listen.

talk-data.com

Activity Trend

Top Events

Top Speakers

How AI is changing software engineering at Shopify with Farhan Thawar

#274: Real Talk About Synthetic Data with Winston Li

The present, past and future of GitHub

Better Together: Change Data Feed in a Streaming Data Flow

Marketing Runs on Your Data: Why IT Holds the Keys to Customer Growth

Pacers Sports and Entertainment and Databricks

How Ad Tech Runs on Databricks

Data Intelligence for Marketing Breakout: Agentic Systems for Bayesian MMM and Consumer Testing

Transforming Customer Processes and Gaining Productivity With Lakeflow Declarative Pipelines

Unlock the Potential of Your Enterprise Data With Zero-Copy Data Sharing, featuring SAP and Salesforce

Somebody Set Up Us the Bomb: Identifying List Bombing of End Users in an Email Anti-Spam Context

Metadata-Driven Streaming Ingestion Using Lakeflow Declarative Pipelines, Azure Event Hubs and a Schema Registry

Summit Live: Data Intelligence for Marketing

How Feastables Partners With Engine to Leverage Advanced Data Models and AI for Smarter BI

How We Turned 200+ Business Users Into Analysts With AI/BI Genie

Lakehouse to Powerhouse: Reckitt's Enterprise AI Transformation Story

TDD, AI agents and coding with Kent Beck

Sponsored by: Hightouch | Unleashing AI at PetSmart: Using AI Decisioning Agents to Drive Revenue

Sponsored by: RowZero | Spreadsheets in the modern data stack: security, governance, AI, and self-serve analytics

Doordash Customer 360 Data Store and its Evolution to Become an Entity Management Framework