talk-data.com

Topic

Cloud Computing

infrastructure saas iaas

4055 tagged

Activity Trend

471 peak/qtr, 2020-Q1 to 2026-Q1

Activities

4055 activities · Newest first

Ingest, govern, and secure your data with OneLake | BRK201

With the massive growth in the volume of data and data sources, managing an entire organization-wide data estate is becoming increasingly complex. This session will explore the latest capabilities coming to OneLake, Fabric’s multi-cloud data lake, that can help you bring in data from any source and then govern and secure that data. Discover how new mirroring and governance tools in Fabric can help you manage your data estate and unlock deeper insights.

Speakers: Joshua Caplan, Wilson Lee, Adi Regev

Session Information: This is one of many sessions from the Microsoft Ignite 2024 event. View even more sessions on-demand and learn about Microsoft Ignite at https://ignite.microsoft.com

BRK201 | English (US) | Data

MSIgnite

Integrating generative AI with robust databases is becoming essential. As organizations face a plethora of database options and AI tools, making informed decisions is crucial for enhancing customer experiences and operational efficiency. How do you ensure your AI systems are powered by high-quality data? And how can these choices impact your organization's success?

Gerrit Kazmaier is the VP and GM of Data Analytics at Google Cloud. Gerrit leads the development and design of Google Cloud's data technology, which includes data warehousing and analytics. His mission is to build a unified data platform for all types of data processing as the foundation for the digital enterprise. Before joining Google, Gerrit served as President of the HANA & Analytics team at SAP in Germany and led the global Product, Solution & Engineering teams for Databases, Data Warehousing and Analytics. In 2015, he served as Vice President of SAP Analytics Cloud in Vancouver, Canada.

In this episode, Richie and Gerrit explore the transformative role of AI in data tools, the evolution of dashboards, the integration of AI with existing workflows, the challenges and opportunities in SQL code generation, the importance of a unified data platform, leveraging unstructured data, and much more.

Links mentioned in the show: Google Cloud · Connect with Gerrit · Thinking Fast and Slow by Daniel Kahneman · Course: Introduction to GCP · Related episode: Not Only Vector Databases: Putting Databases at the Heart of AI, with Andi Gutmans, VP and GM of Databases at Google · Rewatch sessions from RADAR: Forward Edition

New to DataCamp? Learn on the go using the DataCamp mobile app. Empower your business with world-class data and AI skills with DataCamp for business.

Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. In this episode, we're joined by a special guest: Alex Gallego, founder and CEO of Redpanda. Together, we dive deep into building data-intensive applications, the evolution of streaming technologies, and balancing high-throughput and low-latency demands.

Key topics covered:
What is Redpanda and why it matters: Redpanda's mission to redefine data streaming while being the fastest Kafka-compatible option on the market.
Batch vs. streaming data: an accessible guide to the classic debate and how the tech landscape is shifting towards unified data frameworks.
Scaling at speed: the challenges and innovations driving Redpanda's performance optimizations, from zero-copy architecture to storage engines.
AI, ML, and streaming data integration: how Redpanda empowers real-time machine learning and AI-powered workloads with ease.
Open source vs. enterprise models: navigating licensing challenges and balancing business goals in the hybrid cloud era.
Leadership and career shifts: Alex's reflections on moving from technical lead to CEO, blending engineering know-how with company vision.
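Because Redpanda is Kafka API-compatible, a standard Kafka client talks to it unchanged. A minimal sketch (not from the episode) using the kafka-python library; the broker address and topic name are illustrative assumptions:

```python
# Minimal sketch: producing and consuming events against a
# Kafka-compatible broker such as Redpanda. The broker address and
# topic name ("localhost:9092", "events") are illustrative assumptions.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
for i in range(3):
    # send() is asynchronous; it returns a future for the broker ack
    producer.send("events", key=str(i).encode(), value=b'{"n": %d}' % i)
producer.flush()  # block until all buffered records are acknowledged

consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",  # start from the beginning of the topic
    consumer_timeout_ms=5000,      # stop iterating once the topic is drained
)
for record in consumer:
    print(record.key, record.value)
```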

A Fireside Chat with Hugo Bowne-Anderson and Alex Filipchik (Head of Infrastructure, CloudKitchens) on how machine learning (ML) and AI are evolving from niche specializations into essential engineering disciplines. Topics include engineering ML and AI at scale, the shift from specialist roles to core engineering, practical infrastructure decisions, generative AI use cases, simplifying ML adoption for engineers, and the future of data and ML engineering.

Big Data is Dead: Long Live Hot Data 🔥

Over the last decade, Big Data was everywhere. Let's set the record straight on what is and isn't Big Data. We have been consumed by a conversation about data volumes when we should focus on the immediate task at hand: simplifying our work.

Some of us may have Big Data, but our quest to derive insights from it is measured in small slices of work that fit on your laptop or in your hand. Easy data is here, so let's make the most of it.

📓 Resources
Big Data is Dead: https://motherduck.com/blog/big-data-is-dead/
Small Data Manifesto: https://motherduck.com/blog/small-data-manifesto/
Small Data SF: https://www.smalldatasf.com/

➡️ Follow Us
LinkedIn: https://linkedin.com/company/motherduck
X/Twitter: https://twitter.com/motherduck
Blog: https://motherduck.com/blog/


Explore the "Small Data" movement, a counter-narrative to the prevailing big data conference hype. This talk challenges the assumption that data scale is the most important feature of every workload, defining big data as any dataset too large for a single machine. We'll unpack why this distinction is crucial for modern data engineering and analytics, setting the stage for a new perspective on data architecture.

Delve into the history of big data systems, starting with the non-linear hardware costs that plagued early data practitioners. Discover how Google's foundational papers on GFS, MapReduce, and Bigtable led to the creation of Hadoop, fundamentally changing how we scale data processing. We'll break down the "big data tax"—the inherent latency and system complexity overhead required for distributed systems to function, a critical concept for anyone evaluating data platforms.

Learn about the architectural cornerstone of the modern cloud data warehouse: the separation of storage and compute. This design, popularized by systems like Snowflake and Google BigQuery, allows storage to scale almost infinitely while compute resources are provisioned on-demand. Understand how this model paved the way for massive data lakes but also introduced new complexities and cost considerations that are often overlooked.
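To make the storage/compute split concrete, here is a minimal sketch (not from the talk) of an ephemeral local engine querying Parquet files that live in object storage; the DuckDB usage is real, but the bucket path and column names are illustrative assumptions:

```python
# Sketch of the storage/compute split: data lives in object storage,
# while a transient compute engine (here DuckDB) spins up, scans only
# what the query needs, and goes away. Bucket and column names are
# illustrative assumptions.
import duckdb

con = duckdb.connect()            # in-memory "compute"; nothing persisted
con.execute("INSTALL httpfs")     # extension for S3-style remote reads
con.execute("LOAD httpfs")

# The engine reads Parquet directly from storage it does not own or manage.
rows = con.execute("""
    SELECT customer_id, SUM(amount) AS total
    FROM read_parquet('s3://example-bucket/orders/*.parquet')
    GROUP BY customer_id
    ORDER BY total DESC
    LIMIT 10
""").fetchall()
print(rows)
```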

We examine the cracks appearing in the big data paradigm, especially for OLAP workloads. While systems like Snowflake are still dominant, the rise of powerful alternatives like DuckDB signals a shift. We reveal the hidden costs of big data analytics, exemplified by a petabyte-scale query costing nearly $6,000, and argue that for most use cases, it's too expensive to run computations over massive datasets.
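That ~$6,000 figure is easy to sanity-check with back-of-envelope arithmetic, assuming on-demand scan pricing in the $5–6 per TiB range (rates vary by vendor and change over time):

```python
# Back-of-envelope for the "petabyte-scale query" cost mentioned above.
# The per-TiB rate is an assumption; on-demand scan pricing differs by
# vendor and over time, but list prices have sat in this range.
price_per_tib = 5.86          # USD per TiB scanned (assumed)
tib_scanned = 1024            # 1 PiB = 1024 TiB
print(f"Full scan of 1 PiB: ${price_per_tib * tib_scanned:,.0f}")   # ~ $6,000

# The same query confined to a small "hot" working set is orders of
# magnitude cheaper:
print(f"Scan of 10 TiB hot data: ${price_per_tib * 10:,.0f}")
```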

The key to efficient data processing isn't your total data size, but the size of your "hot data" or working set. This talk argues that the revenge of the single node is here, as modern hardware can often handle the actual data queried without the overhead of the big data tax. This is a crucial optimization technique for reducing cost and improving performance in any data warehouse.

Discover the core principles for designing systems in a post-big data world. We'll show that since only 1 in 500 users run true big data queries, prioritizing simplicity over premature scaling is key. For low latency, process data close to the user with tools like DuckDB and SQLite. This local-first approach offers a compelling alternative to cloud-centric models, enabling faster, more cost-effective, and innovative data architectures.
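Not from the talk itself, but a minimal local-first sketch using Python's built-in sqlite3 module shows how small the footprint can be; the table and column names are illustrative assumptions:

```python
# Local-first sketch: the entire "data platform" is one file on the
# user's machine, queried with Python's built-in sqlite3 module.
# Table and column names are illustrative assumptions.
import sqlite3

con = sqlite3.connect("app_data.db")   # a single local file, no cluster
con.execute("CREATE TABLE IF NOT EXISTS events (user_id TEXT, kind TEXT, ts REAL)")
con.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("u1", "click", 1.0), ("u1", "view", 2.0), ("u2", "click", 3.0)],
)
con.commit()

# Millisecond-latency analytics with zero network hops and no big data tax.
for user, clicks in con.execute(
    "SELECT user_id, COUNT(*) FROM events WHERE kind='click' GROUP BY user_id"
):
    print(user, clicks)
```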

On this episode of the Data Unchained podcast, Desiree Campbell, Managing Director for HPC Americas from the Azure team at Microsoft, joins us to discuss Women in HPC, architecting and orchestrating data workflows across hybrid environments, and the importance of being a mentor in the tech industry.

podcast #ai #data #innovation #datascience #datastorage #datacloudtechnology #global #international #hybridcloud #cloud #dataorchestration #hightech #tech #technology #technologynews

@Microsoft @MicrosoftAzure https://azure.microsoft.com/
Music: Cyberpunk by jiglr | https://soundcloud.com/jiglrmusic, promoted by https://www.free-stock-music.com (Creative Commons Attribution 3.0 Unported License: https://creativecommons.org/licenses/by/3.0/deed.en_US)

As organizations grow, they seek agility, often aiming to simplify processes for both developers and the business. We frequently see companies accelerating development with DevOps, then shifting to SRE, and adopting models like a Cloud Center of Excellence to merge development and operations. As "You build it, you run it" challenges arise, a platform naturally forms. But how do these transformations lead to convergence? In this talk, we'll explore how platforms evolve and the critical factors for their success.

Erik Bernhardsson, the CEO and co-founder of Modal Labs, joins Tristan to talk about Gen AI, the lack of GPUs, the future of cloud computing, and egress fees. They also discuss whether the job title of data engineer is something we should want more or less of in the future. Erik's not afraid of a spicy take, so this is a fun one.  For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

Apache Airflow Best Practices

"Apache Airflow Best Practices" is your go-to guide for mastering data workflow orchestration using Apache Airflow. This book introduces you to core concepts and features of Airflow and helps you efficiently design, deploy, and manage workflows. With detailed examples and hands-on tutorials, you'll learn how to tackle real-world challenges in data engineering. What this Book will help me do Understand and utilize the features and updates introduced in Apache Airflow 2.x. Design and implement robust, scalable, and efficient data pipelines and workflows. Learn best practices for deploying Apache Airflow in cloud environments such as AWS and GCP. Extend Airflow's functionality with custom plugins and advanced configuration. Monitor, maintain, and scale your Airflow deployment effectively for high availability. Author(s) Dylan Intorf, Dylan Storey, and Kendrick van Doorn are seasoned professionals in data engineering, data strategy, and software development. Between them, they bring decades of experience working in diverse industries like finance, tech, and life sciences. They bring their expertise into this practical guide to help practitioners understand and master Apache Airflow. Who is it for? This book is tailored for data professionals such as data engineers, scientists, and system administrators, offering valuable insights for new learners and experienced users. If you're starting with workflow orchestration, seeking to optimize your current Airflow implementation, or scaling efforts, this book aligns with your goals. Readers should have a basic knowledge of Python programming and data engineering principles.

Building Modern Data Applications Using Databricks Lakehouse

This book, "Building Modern Data Applications Using Databricks Lakehouse," provides a comprehensive guide for data professionals to master the Databricks platform. You'll learn to effectively build, deploy, and monitor robust data pipelines with Databricks' Delta Live Tables, empowering you to manage and optimize cloud-based data operations effortlessly.

What this book will help me do:
Understand the foundations and concepts of Delta Live Tables and its role in data pipeline development.
Learn workflows to process and transform real-time and batch data efficiently using the Databricks lakehouse architecture.
Master the implementation of Unity Catalog for governance and secure data access in modern data applications.
Deploy and automate data pipeline changes using CI/CD, leveraging tools like Terraform and Databricks Asset Bundles.
Gain advanced insights into monitoring data quality and performance, optimizing cloud costs, and managing DataOps tasks effectively.

Author(s): Will Girten is a seasoned Solutions Architect at Databricks with over a decade of experience in data and AI systems. With deep expertise in modern data architectures, Will is adept at simplifying complex topics and translating them into actionable knowledge. His books emphasize real-world application and offer clear, hands-on examples, making learning engaging and impactful.

Who is it for? This book is geared towards data engineers, analysts, and DataOps professionals seeking efficient strategies to implement and maintain robust data pipelines. If you have a basic understanding of Python and Apache Spark and wish to delve deeper into the Databricks platform for streamlining workflows, this book is tailored for you.
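For a sense of the core abstraction, here is a hedged sketch of a Delta Live Tables pipeline in Python; the source path, table names, and expectation rule are illustrative assumptions, not excerpts from the book:

```python
# Minimal sketch of a Delta Live Tables pipeline. Source path, table
# names, and the quality rule are illustrative assumptions. This runs
# inside a Databricks DLT pipeline, where `dlt` and `spark` are provided.
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw orders ingested from cloud storage")
def orders_raw():
    # Auto Loader incrementally picks up new files as they land.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/orders/")
    )

@dlt.table(comment="Orders that pass basic quality checks")
@dlt.expect_or_drop("valid_amount", "amount > 0")  # drop rows failing the rule
def orders_clean():
    return (
        dlt.read_stream("orders_raw")
        .select("order_id", "amount", col("ts").cast("timestamp"))
    )
```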

Generative AI and data are more interconnected than ever. If you want quality in your AI product, you need to be connected to a database with high-quality data. But with so many database options and new AI tools emerging, how do you ensure you're making the right choices for your organization? Whether it's enhancing customer experiences or improving operational efficiency, understanding the role of your databases in powering AI is crucial.

Andi Gutmans is the General Manager and Vice President for Databases at Google. Andi's focus is on building, managing, and scaling the most innovative database services to deliver the industry's leading data platform for businesses. Prior to joining Google, Andi was VP Analytics at AWS, running services such as Amazon Redshift. Before his tenure at AWS, Andi served as CEO and co-founder of Zend Technologies, the commercial backer of open-source PHP. Andi has over 20 years of experience as an open-source contributor and leader: he co-authored open-source PHP, is an emeritus member of the Apache Software Foundation, and served on the Eclipse Foundation's board of directors. He holds a bachelor's degree in computer science from the Technion, Israel Institute of Technology.

In the episode, Richie and Andi explore databases and their relationship with AI and GenAI, key features needed in databases for AI, GCP database services, AlloyDB, federated queries in Google Cloud, vector databases, graph databases, practical use cases of AI in databases, and much more.

Links mentioned in the show: GCP · Connect with Andi · AlloyDB for PostgreSQL · Course: Responsible AI Data Management · Related episode: The Power of Vector Databases and Semantic Search with Elan Dekel, VP of Product at Pinecone · Sign up to RADAR: Forward Edition

New to DataCamp? Learn on the go using the DataCamp mobile app. Empower your business with world-class data and AI skills with DataCamp for business.

Coalesce 2024: Scaling dbt: Balancing self-serve analytics and central governance

Adopting dbt marks a significant leap towards governed data transformations. But with every game-changer, big questions arise: Where do data transformations end? Should they touch the BI layer? What roles do data engineers, analytics engineers, and business analysts play in data modeling? And, is centralizing metrics truly beneficial? Spoiler: It's about finding the balance between freedom and governance.

Our expert panelists will share best practices for scaling dbt to handle transformations and metrics without stifling analyst freedom or causing team burnout. You'll learn how to build a robust metrics layer in dbt and manage business logic as your data operation grows, all by establishing a solid foundation with dbt.

Speakers: Mark Nelson, Silja Mardla, Patrick Vinton, Sarah Levy

Learn about the latest dbt Cloud features announced at Coalesce, designed to help organizations embrace analytics best practices at scale https://www.getdbt.com/blog/coalesce-2024-product-announcements

Part 2 with Mr. Ian Smith, the most interesting man alive… serial entrepreneur and co-founder of Lighthouse Technology, accelerating AI in the data center to optimize IT, cloud, and data environments.

00:50 Lighthouse Technology
02:23 A Closed, Free AI Model? How is it Monetized?
08:17 Back to the Data Center
10:44 The Edge
11:51 Lighthouse GTM
19:29 The Future
21:12 Reaching Lighthouse
21:43 Rapid Fire: Hardest Part of Entrepreneurship, Easiest Part of Entrepreneurship

LinkedIn: linkedin.com/in/ian-smith-a803701
Website: https://lighthousetechnology.ai/

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.