Golden Image Management and Lifecycle: creating, maintaining, and retiring golden images to ensure consistent and efficient infrastructure deployments, including the toolset used, the ownership model, and the deployment pipeline.

Streamlining Cloud Workload Provisioning: enabling engineers to provision cloud workloads via self-service using internally developed applications, incentivizing adoption, and addressing AWS capacity limits.
Topic: Amazon Web Services (AWS)
Learn the fundamentals of infrastructure as code through guided exercises in TypeScript. You will be introduced to Pulumi and learn how to provision modern cloud infrastructure on AWS. This workshop covers how to use TypeScript with Pulumi, the basics of the Pulumi Programming Model, and how to provision, update, and destroy AWS resources.
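The workshop's core loop is small enough to sketch. Here is a minimal Pulumi program in TypeScript (illustrative, not the workshop's actual exercises; the bucket name and tag are placeholders) that declares one AWS resource and exports a stack output:

```typescript
// index.ts - a minimal Pulumi program. `pulumi up` creates the bucket,
// `pulumi destroy` removes it, and edits to this file become updates.
import * as aws from "@pulumi/aws";

// Declaring a resource registers it with the Pulumi engine; the engine
// diffs the desired state against the stack's last-known state.
const bucket = new aws.s3.Bucket("workshop-bucket", {
    tags: { environment: "workshop" }, // placeholder tag
});

// Stack outputs surface resource attributes after deployment.
export const bucketName = bucket.id;
```

Because resources are ordinary TypeScript objects, loops, functions, and type checking all apply to infrastructure code, which is the heart of the Pulumi programming model the workshop covers.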
Explore data acquisition challenges and solutions with Artsiom Yudovin in his session. 📊 Learn how his team overcame obstacles to create a robust system that provides timely insights and data reliability. 🛠️ #DataAcquisition #AWS #DataChallenges
✨ H I G H L I G H T S ✨
🙌 A huge shoutout to all the incredible participants who made Big Data Conference Europe 2023 in Vilnius, Lithuania, from November 21-24, an absolute triumph! 🎉 Your attendance and active participation were instrumental in making this event so special. 🌍
Don't forget to check out the session recordings from the conference to relive the valuable insights and knowledge shared! 📽️
Once again, THANK YOU for playing a pivotal role in the success of Big Data Conference Europe 2023. 🚀 See you next year for another unforgettable conference! 📅 #BigDataConference #SeeYouNextYear
Effective data management has become a cornerstone of success in the digital era. It involves not just collecting and storing information but also organizing, securing, and leveraging data to drive progress and innovation. Many organizations turn to tools like Snowflake for advanced data warehousing capabilities. However, while Snowflake enhances data storage and access, it is not a complete solution for every data management challenge. To address this, tools like Capital One's Slingshot can be used alongside Snowflake, helping to optimize costs and refine data management strategies.

Salim Syed is VP and Head of Engineering for Capital One's Slingshot product. He led Capital One's data warehouse migration to AWS and specializes in deploying Snowflake at large enterprises. His expertise lies in developing Big Data (lake) and data warehouse strategy on the public cloud, and he leads an organization of more than 100 data engineers, support engineers, DBAs, and full-stack developers delivering enterprise data lake, data warehouse, data management, and visualization platform services. Salim has more than 25 years of experience in the data ecosystem. His career started in data engineering, where he built data pipelines, before moving into the maintenance and administration of large database servers using multi-tier replication architectures across remote locations. He later worked at CodeRye as a database architect and at 3M Health Information Systems as an enterprise data architect, and he has been at Capital One for the past six years.

In this episode, Adel and Salim explore cloud data management and the evolution of Slingshot into a major multi-tenant SaaS platform; the shift from on-premise to cloud-based data governance; the role of centralized tooling; strategies for effective cloud data management, including data governance, cost optimization, and waste reduction; and insights into navigating the complexities of data infrastructure, security, and scalability in the modern digital era.

Links mentioned in the show:
- Capital One Slingshot
- Snowflake
- Course: Introduction to Data Warehousing
- Course: Introduction to Snowflake
Summary
Building a data platform that is enjoyable and accessible for all of its end users is a substantial challenge. One of the core complexities that needs to be addressed is the fractal set of integrations that need to be managed across the individual components. In this episode Tobias Macey shares his thoughts on the challenges that he is facing as he prepares to build the next set of architectural layers for his data platform to enable a larger audience to start accessing the data being managed by his team.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It's the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it's real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results, all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

Developing event-driven pipelines is going to be a lot easier: meet Memphis Functions! Memphis Functions enable developers and data engineers to build an organizational toolbox of functions to process, transform, and enrich ingested events "on the fly" in a serverless manner using AWS Lambda syntax, without boilerplate, orchestration, error handling, or infrastructure, in almost any language, including Go, Python, JS, .NET, Java, SQL, and more. Go to dataengineeringpodcast.com/memphis today to get started!

Data lakes are notoriously complex. For data engineers who battle to build and scale high-quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs, ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and DoorDash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake, and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.

Your host is Tobias Macey, and today I'll be sharing an update on my own journey of building a data platform, with a particular focus on the challenges of tool integration and maintaining a single source of truth.
Interview
Introduction
- How did you get involved in the area of data management?
- Data sharing
- Weight of history
  - Existing integrations with dbt
  - Switching cost for e.g. SQLMesh
  - De facto standard of Airflow
- Single source of truth
  - Permissions management across application layers: database engine, storage layer in a lakehouse, presentation/access layer (BI)
  - Data flows: dbt -> table-level lineage; orchestration engine -> pipeline flows
  - Task-based vs. asset-based orchestration
  - Metadata platform as the logical place for a horizontal view

Contact Info
- LinkedIn
- Website

Parting Question
In this live workshop, you will learn the fundamentals of setting up EKS clusters on AWS through guided exercises. This workshop covers the basics of writing Pulumi programs to manage infrastructure using real languages, how to create and manage EKS clusters in AWS with Pulumi, and how to create and manage Kubernetes resources with Pulumi.
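As a rough illustration of what such an exercise looks like (a sketch under assumed defaults, not the workshop's actual code; all names are placeholders), the @pulumi/eks package wraps the cluster, node group, and networking into one component, and the resulting cluster can then host Kubernetes resources managed by the same program:

```typescript
// A sketch: create an EKS cluster, then deploy a workload into it.
import * as eks from "@pulumi/eks";
import * as k8s from "@pulumi/kubernetes";

// One component resource stands up the control plane, node group, and
// the supporting IAM and VPC wiring.
const cluster = new eks.Cluster("workshop-cluster", {
    desiredCapacity: 2,
    minSize: 1,
    maxSize: 3,
});

// Target the new cluster's API server via the provider the component exposes.
const appLabels = { app: "nginx" };
new k8s.apps.v1.Deployment("nginx", {
    spec: {
        replicas: 1,
        selector: { matchLabels: appLabels },
        template: {
            metadata: { labels: appLabels },
            spec: { containers: [{ name: "nginx", image: "nginx" }] },
        },
    },
}, { provider: cluster.provider });

// Export the kubeconfig so kubectl can reach the cluster directly.
export const kubeconfig = cluster.kubeconfig;
```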
Free 1-day virtual instructor-led training on AWS Cloud Practitioner Essentials.
All the new features of aws-classic v6 and AWSX (see the sketch after this list)
How to provision, update, and destroy AWS resources
The basics of the Pulumi Programming Model
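For the AWSX item above, a sketch of what Crosswalk's higher-level components look like (the resource name and the two-AZ choice are illustrative, not material from the session):

```typescript
// One AWSX component replaces the dozens of aws-classic resources a VPC
// normally requires (subnets, route tables, NAT and internet gateways).
import * as awsx from "@pulumi/awsx";

const vpc = new awsx.ec2.Vpc("demo-vpc", {
    numberOfAvailabilityZones: 2,
});

// Downstream resources (EKS clusters, load balancers) consume these outputs.
export const vpcId = vpc.vpcId;
export const publicSubnetIds = vpc.publicSubnetIds;
```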
Companies today are moving rapidly to integrate generative AI into their products and services. But there's a great deal of hype (and misunderstanding) about the impact and promise of this technology. With this book, Chris Fregly, Antje Barth, and Shelbee Eigenbrode from AWS help CTOs, ML practitioners, application developers, business analysts, data engineers, and data scientists find practical ways to use this exciting new technology. You'll learn the generative AI project life cycle, including use case definition, model selection, model fine-tuning, retrieval-augmented generation, reinforcement learning from human feedback, and model quantization, optimization, and deployment. And you'll explore different types of models, including large language models (LLMs) and multimodal models such as Stable Diffusion for generating images and Flamingo/IDEFICS for answering questions about images.

- Apply generative AI to your business use cases
- Determine which generative AI models are best suited to your task
- Perform prompt engineering and in-context learning
- Fine-tune generative AI models on your datasets with low-rank adaptation (LoRA)
- Align generative AI models to human values with reinforcement learning from human feedback (RLHF)
- Augment your model with retrieval-augmented generation (RAG)
- Explore libraries such as LangChain and ReAct to develop agents and actions
- Build generative AI applications with Amazon Bedrock
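The Bedrock item is concrete enough to sketch. The following is illustrative only, not code from the book: the model ID is an assumed choice, and the request body follows Bedrock's legacy Claude text-completion format, while other model families define their own body shapes.

```typescript
// Invoke a text model on Amazon Bedrock with the AWS SDK for JavaScript v3.
import {
    BedrockRuntimeClient,
    InvokeModelCommand,
} from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });

async function generate(prompt: string): Promise<string> {
    const response = await client.send(new InvokeModelCommand({
        modelId: "anthropic.claude-v2", // assumed model choice
        contentType: "application/json",
        accept: "application/json",
        body: JSON.stringify({
            prompt: `\n\nHuman: ${prompt}\n\nAssistant:`,
            max_tokens_to_sample: 300,
        }),
    }));
    // The response body is a byte array containing the model's JSON reply.
    const payload = JSON.parse(new TextDecoder().decode(response.body));
    return payload.completion;
}

generate("Summarize the generative AI project life cycle.").then(console.log);
```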
A session on data access governance and security using AWS services.
A session on applying AI/ML to data using built-in machine learning capabilities in AWS.
A session on how AWS services can modernize data infrastructure, unify data silos, and drive innovation across data platforms.
In today's cloud ecosystem, many laud the visible pillars of AWS's Well-Architected Framework, yet an essential component often remains in the shadows: Infrastructure as Code (IAC). Elizabeth Adeotun Adegbaju, a DevOps Engineer with a rich history in AWS cloud infrastructure, unravels the indispensable role of IAC in fortifying each of the renowned AWS pillars. Through this illuminating talk, attendees will gain insights into the intricate interplay between IAC and AWS's principles of operational excellence, cost optimization, reliability, performance efficiency, security, and sustainability. Dive deep into real-world examples, understand the potential pitfalls of overlooking IAC, and emerge with a renewed appreciation for its foundational significance in cloud architecture. This session is a clarion call for organizations to recognize and harness the power of IAC, positioning it not just as an option but as an imperative in achieving success in the cloud.
Learn data engineering and modern data pipeline design with AWS in this comprehensive guide! You will explore key AWS services like S3, Glue, Redshift, and QuickSight to ingest, transform, and analyze data, and you'll gain hands-on experience creating robust, scalable solutions.

What this book will help me do:
- Understand and implement data ingestion and transformation processes using AWS tools.
- Optimize data for analytics with advanced AWS-powered workflows.
- Build end-to-end modern data pipelines leveraging cutting-edge AWS technologies.
- Design data governance strategies using AWS services for security and compliance.
- Visualize data and extract insights using Amazon QuickSight and other tools.

Author(s): Gareth Eagar is a Senior Data Architect with over 25 years of experience in designing and implementing data solutions across various industries. He combines deep technical expertise with a passion for teaching, aiming to make complex concepts approachable for learners at all levels.

Who is it for? This book is intended for current or aspiring data engineers, data architects, and analysts seeking to leverage AWS for data engineering. It suits beginners with a basic understanding of data concepts who want to gain practical experience, as well as intermediate professionals aiming to expand into AWS-based systems.
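As a taste of the kind of pipeline step the book describes (a hedged sketch, not the book's code; the job name and S3 path are hypothetical, and the Glue job itself would be defined elsewhere), triggering a transform job from the AWS SDK for JavaScript v3 looks like this:

```typescript
// Start a run of an existing AWS Glue job and report its run ID.
import { GlueClient, StartJobRunCommand } from "@aws-sdk/client-glue";

const glue = new GlueClient({ region: "us-east-1" });

async function runTransform(): Promise<void> {
    const { JobRunId } = await glue.send(new StartJobRunCommand({
        JobName: "raw-to-parquet",                                 // hypothetical job
        Arguments: { "--input_path": "s3://my-raw-bucket/2023/" }, // hypothetical path
    }));
    console.log(`Started Glue job run: ${JobRunId}`);
}

runTransform();
```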
Explore AWS CloudFormation templates and their structure, parameters, stacks, updates, resource imports, and drift detection
Understand the implementation of DevOps culture and techniques in the AWS Cloud
Date: 2023-10-26. Webinar: Master Class: Getting Started with AWS DevOps. Topics include AWS DevOps concepts and culture, infrastructure automation, and AWS CloudFormation templates (structure, parameters, stacks, updates, importing resources, and drift detection).
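The webinar covers raw CloudFormation templates; as a deliberate substitution to keep the examples in TypeScript, here is an AWS CDK sketch (names are placeholders) that synthesizes to a template containing the Parameters and Resources sections the session walks through. After deployment, drift detection (aws cloudformation detect-stack-drift --stack-name DemoStack) compares the stack's live resources against that synthesized template.

```typescript
// A CDK stack synthesizes to a CloudFormation template; `cdk synth`
// prints the resulting document with its Parameters and Resources sections.
import * as cdk from "aws-cdk-lib";

class DemoStack extends cdk.Stack {
    constructor(scope: cdk.App, id: string) {
        super(scope, id);

        // Becomes an entry in the template's Parameters section.
        const envName = new cdk.CfnParameter(this, "EnvName", {
            type: "String",
            default: "dev",
            description: "Environment name used to tag the bucket.",
        });

        // Becomes an AWS::S3::Bucket entry in the Resources section.
        new cdk.aws_s3.CfnBucket(this, "DemoBucket", {
            tags: [{ key: "environment", value: envName.valueAsString }],
        });
    }
}

const app = new cdk.App();
new DemoStack(app, "DemoStack");
```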
When companies work together through open source development, good things happen. Open source contributions lead to strong relationships between engineers across company lines, and positive outcomes for customers whether through improved functionality, performance, or supply chain security. In this keynote, learn about the power of open source in driving innovation, how AWS approaches open source collaboration, and some of the key improvements for Amazon Redshift, AWS Glue, and Amazon Athena customers and dbt users resulting from our partnership.
Speaker: David Nalley, Director, Open Source Strategy and Marketing, Amazon Web Services
Register for Coalesce at https://coalesce.getdbt.com/