talk-data.com

Topic

Cloud Computing

Tags: infrastructure, saas, iaas · 4055 activities tagged

Activity Trend: peak of 471 activities per quarter, 2020-Q1 to 2026-Q1

Activities

4055 activities · Newest first

For three years we at LOVOO, a market-leading dating app, have been using the Google Cloud managed version of Airflow, a product we’ve been familiar with since its Alpha release. We took a calculated risk and integrated the Alpha into our product, and, luckily, it was a match. Since then, we have been leveraging this software not only to build out our data pipeline, but also to boost the way we do analytics and BI. The speaker will present an overview of pipeline error alerting through BashOperators that communicate with Slack, touch upon how LOVOO built and grew its analytics pipeline, and explain how the team now effectively batches large volumes of data from different sources using Airflow. We will also showcase our PythonOperator-driven Redshift-to-BigQuery data migration process, as well as offer a guide to creating fully dynamic tasks inside a DAG.
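The two patterns mentioned above lend themselves to a short illustration. Below is a minimal sketch, assuming a recent Airflow 2.x release and a Slack incoming-webhook URL exposed to workers as the SLACK_WEBHOOK_URL environment variable; the DAG id, source names, and commands are hypothetical placeholders rather than LOVOO's actual pipeline code.

```python
# A minimal sketch of dynamic task creation plus Slack error alerting.
# Assumes a recent Airflow 2.x release; all names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

SOURCES = ["events", "payments", "profiles"]  # hypothetical source list

with DAG(
    dag_id="dynamic_batch_example",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Fully dynamic tasks: one load task per source, generated in a loop at
    # parse time, so adding a source to SOURCES materializes a new task.
    load_tasks = [
        BashOperator(
            task_id=f"load_{source}",
            bash_command=f"echo 'loading {source}'",  # replace with the real batch job
        )
        for source in SOURCES
    ]

    # Pipeline error alerting: a BashOperator that posts to Slack through the
    # webhook and only runs when at least one upstream task has failed.
    alert = BashOperator(
        task_id="slack_alert",
        bash_command=(
            "curl -X POST -H 'Content-type: application/json' "
            "--data '{\"text\": \"dynamic_batch_example failed\"}' "
            "$SLACK_WEBHOOK_URL"
        ),
        trigger_rule="one_failed",
    )

    load_tasks >> alert
```

Because the loop runs at DAG parse time, adding a new entry to SOURCES is enough to materialize a new task, which is the essence of fully dynamic task creation inside a DAG.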

Data infrastructures look different at small, mid-sized, and large companies, yet most content out there is about large, sophisticated systems, and almost none of it covers migrating legacy, on-premises databases to the cloud. To better explain the evolving needs of data engineering organizations, we will review the hierarchy of needs for data engineering.

BigQuery is GCP’s serverless, highly scalable, and cost-effective cloud data warehouse that can analyze petabytes of data at very high speed. Amazon S3 is one of the oldest and most popular cloud storage offerings. Folks with data in S3 often want to use BigQuery to gain insights into that data, and with Apache Airflow they can build pipelines to seamlessly orchestrate that connection. In this talk, Leah walks through how her team created an easily configurable pipeline to extract data. When a team at work mentioned wanting a repeatable process for migrating data stored in S3 to BigQuery, Leah knew Cloud Composer (GCP-hosted Airflow) was the right tool for the job, but she didn’t have much experience with the proprietary file types the data used. Luckily, one of her colleagues did have experience with that proprietary file type, though they hadn’t worked with Airflow. Leah and her colleague teamed up to build a reusable, easily configurable solution for the team. She will walk you through the problem, the solution, and the process they followed to arrive at it, highlighting resources that were especially useful to a first-time Airflow user.
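As an illustration of the kind of pipeline described here, the following is a minimal, configurable sketch, assuming a recent Airflow 2.x release with the Amazon and Google provider packages installed and data already converted to a BigQuery-loadable format (CSV here); the bucket, prefix, and table names are hypothetical placeholders, not the speakers' actual configuration.

```python
# A minimal S3-to-BigQuery pipeline sketch for Cloud Composer.
# Assumes the Amazon and Google Airflow provider packages are installed.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.s3_to_gcs import S3ToGCSOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

CONFIG = {  # per-dataset settings kept in one place so new sources are easy to add
    "s3_bucket": "example-source-bucket",
    "s3_prefix": "exports/daily/",
    "gcs_bucket": "example-staging-bucket",
    "bq_table": "example-project.analytics.daily_export",
}

with DAG(
    dag_id="s3_to_bigquery_example",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Stage the S3 objects into Cloud Storage so BigQuery can load them.
    stage_to_gcs = S3ToGCSOperator(
        task_id="s3_to_gcs",
        bucket=CONFIG["s3_bucket"],
        prefix=CONFIG["s3_prefix"],
        dest_gcs=f"gs://{CONFIG['gcs_bucket']}/{CONFIG['s3_prefix']}",
        replace=True,
    )

    # Load the staged files into the destination BigQuery table.
    load_to_bq = GCSToBigQueryOperator(
        task_id="gcs_to_bigquery",
        bucket=CONFIG["gcs_bucket"],
        source_objects=[f"{CONFIG['s3_prefix']}*.csv"],
        destination_project_dataset_table=CONFIG["bq_table"],
        source_format="CSV",
        autodetect=True,
        write_disposition="WRITE_TRUNCATE",
    )

    stage_to_gcs >> load_to_bq
```

Keeping the per-source settings in one dictionary (or an Airflow Variable) and looping over several such entries is one straightforward way to make a pipeline like this easily configurable for new datasets.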

As the field of data science grows in popularity, companies find themselves in need of a single common language that can connect their data science teams and data infrastructure teams. Data scientists want rapid iteration, infrastructure engineers want monitoring and security controls, and product owners want their solutions deployed in time for quarterly reports. This talk will discuss how to build an Airflow-based data platform that can take advantage of popular ML tools (Jupyter, TensorFlow, Spark) while creating an easy-to-manage, easy-to-monitor ecosystem for data infrastructure and support teams. We will take an idea from a single-machine Jupyter notebook to a cross-service Spark + TensorFlow pipeline, and on to a canary-tested, production-ready model served on Google Cloud Functions. We will show how Apache Airflow can connect all layers of a data team to deliver rapid results.
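The following schematic sketch, assuming a recent Airflow 2.x release, shows how such a progression could be wired together as a DAG; the PythonOperator callables are stand-ins for the notebook run, the Spark + TensorFlow training job, the canary test, and the Cloud Functions deployment, and none of it is the speakers' actual implementation.

```python
# A schematic DAG wiring an ML workflow end to end; all bodies are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_notebook(**_):   # e.g. execute a parameterized Jupyter notebook
    print("feature exploration")


def train_model(**_):    # e.g. submit the Spark + TensorFlow training job
    print("training")


def canary_test(**_):    # e.g. score a holdout sample and check thresholds
    print("canary check")


def deploy_model(**_):   # e.g. roll the model out behind a Cloud Function
    print("deploy")


with DAG(
    dag_id="ml_platform_example",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    notebook = PythonOperator(task_id="run_notebook", python_callable=run_notebook)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    canary = PythonOperator(task_id="canary_test", python_callable=canary_test)
    deploy = PythonOperator(task_id="deploy_model", python_callable=deploy_model)

    notebook >> train >> canary >> deploy
```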

In the contemporary world, security is more important than ever, and Airflow installations are no exception. Google Cloud Platform and Cloud Composer offer useful security options for running your DAGs and tasks in a way that effectively manages the risk of data exfiltration and limits access to the system. This is a sponsored talk, presented by Google Cloud.

Scribd is migrating its data pipeline from an in-house system to Airflow. It is one giant data pipeline consisting of more than 1,500 tasks. In this talk, I would like to share a couple of best practices for setting up a cloud-native Airflow deployment in AWS. For those who are interested in migrating a non-trivial data pipeline to Airflow, I will also share how Scribd plans and executes the migration. Here are some of the topics that will be covered: How to set up a highly available Airflow cluster in AWS using both ECS and EKS with Terraform. How to manage Airflow DAGs across multiple git repositories. How we manage Airflow variables using a custom Airflow Terraform provider. Best practices for monitoring multiple Airflow clusters with Datadog and PagerDuty. How we extended Airflow to reach feature parity with Scribd’s in-house orchestration system. How to plan and execute non-trivial data pipeline migrations: we transcompiled our internal DSL to Airflow DAGs to simulate what a real run would look like and surface performance issues early in the process. How we fixed an Airflow performance bottleneck so our giant DAG can be properly rendered in the web UI. For detailed deep dives on some of the topics mentioned above, please check out our blog post series at https://tech.scribd.com/tag/airflow-series/. Slides: https://docs.google.com/presentation/d/e/2PACX-1vRb-iH5NX2d7m-rQ7WGc6XlRvRCADwXq2hdjRjRuJ5h7e9ybfoUA13ytxpHgx7JG815fIKEE-QKuRUV/pub?start=false&loop=false&delayms=3000
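One generic way to surface issues in a very large, generated DAG before it reaches production, related to the simulation approach mentioned above, is to load the DAG folder with Airflow's DagBag in CI, fail on import errors, and flag slow parsing. The sketch below assumes an Airflow 2.x installation and uses a hypothetical dags/ path and an arbitrary time budget; it is not Scribd's actual tooling.

```python
# A generic CI check for large DAGs: fail on import errors and slow parsing.
import sys
import time

from airflow.models import DagBag

DAG_FOLDER = "dags/"        # path to the generated DAG files (placeholder)
MAX_PARSE_SECONDS = 30.0    # arbitrary budget for parsing the whole folder

start = time.monotonic()
dag_bag = DagBag(dag_folder=DAG_FOLDER, include_examples=False)
elapsed = time.monotonic() - start

if dag_bag.import_errors:
    for path, error in dag_bag.import_errors.items():
        print(f"import error in {path}: {error}")
    sys.exit(1)

print(f"parsed {len(dag_bag.dags)} DAGs in {elapsed:.1f}s")
if elapsed > MAX_PARSE_SECONDS:
    print("DAG parsing exceeded the time budget; investigate before deploying")
    sys.exit(1)
```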

This is an audio blog on BI on the Cloud Data Lake and how to improve the productivity of data engineers. We'll dive deeper into the question: what’s the best measure of success for data pipeline efficiency? This is Part 2 of a two-part blog series.

Originally published at: https://www.eckerson.com/articles/business-intelligence-on-the-cloud-data-lake-part-2-improving-the-productivity-of-data-engineers

Learn Grafana 7.0

"Learn Grafana 7.0" is the ultimate beginner's guide to leveraging Grafana's capabilities for analytics and interactive dashboards. You'll master real-time data monitoring, visualization, and learn how to query and explore metrics with a hands-on approach to Grafana 7.0's new features. What this Book will help me do Learn to install and configure Grafana from scratch, preparing you for real-world data analysis tasks. Navigate and utilize the Graph panel in Grafana effectively, ensuring clear and actionable visual insights. Incorporate advanced dashboard features such as annotations, templates, and links to enhance data monitoring. Integrate Grafana with major cloud providers like AWS and Azure for robust monitoring solutions. Implement secure user authentication and fine-tuned permissions for managing teams and sharing insights safely. Author(s) None Salituro, the author of "Learn Grafana 7.0," is an experienced data visualization expert with years of experience in software development and analytics. Salituro focuses on creating understandable and accessible resources for developers and analysts of all skill levels, bringing a hands-on practical approach to technical learning. Who is it for? This book is perfect for data analysts, business intelligence developers, and administrators looking to build skills in data visualization and monitoring with Grafana 7.0. If you're eager to create interactive dashboards and learn practical applications of Grafana's features, this book is for you. Beginners to Grafana are fully accommodated, though familiarity with data visualization principles is beneficial. For those seeking to monitor cloud services like AWS with Grafana, this book is indispensable.


Abstract Hosted by Al Martin, VP, Data and AI Expert Services and Learning at IBM, Making Data Simple provides the latest thinking on big data, A.I., and the implications for the enterprise from a range of experts. This week on Making Data Simple, we have Peter Wang, co-founder and CEO of Anaconda, and Shadi Copty, VP of Offering Management. Al, Peter, and Shadi discuss data science and IBM's partnership with Anaconda. Show Notes 6:11 – Corporate Mission 8:00 – Use Case 9:20 – IBM and Anaconda partnership 14:04 – Cloud Pak for Data: what is it? 15:43 – Python vs R 17:15 – Anaconda’s Future 23:25 – Shadi takes over from Al 25:05 – Data Science Community 33:40 – Center for Humane Technology Anaconda - https://www.linkedin.com/company/anacondainc/

Connect with the Team Producer Kate Brown - LinkedIn. Producer Meighann Helene - LinkedIn. Producer Michael Sestak - LinkedIn. Producer Steve Templeton - LinkedIn. Host Al Martin - LinkedIn and Twitter.  Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

This audio blog is about business intelligence on the cloud data lake: why it arose and how to architect for it. This is Part 1 of a two-part blog series.

Originally published at: https://www.eckerson.com/articles/business-intelligence-on-the-cloud-data-lake-part-1-why-it-arose-and-how-to-architect-for-it

This audio blog is about the data lakehouse and how it is the latest incarnation from a handful of data lake providers aiming to usurp the rapidly changing cloud data warehousing market. It is one of three blogs featured in the data lakehouse series.

Originally published at: https://www.eckerson.com/articles/all-hail-the-data-lakehouse-if-built-on-a-modern-data-warehouse


Abstract Hosted by Al Martin, VP, Data and AI Expert Services and Learning at IBM, Making Data Simple provides the latest thinking on big data, A.I., and the implications for the enterprise from a range of experts. This week on Making Data Simple, we have Andy Youniss, President and CEO of Rocket Software, which he co-founded in 1990, with products across the Ladder to AI and Hybrid Cloud and businesses built on technology partnerships. This week we discuss Covid and its personal and professional effects on business. Show Notes 3:00 – Rocket’s core values 7:17 – Life and times 10:25 – Portfolio 11:26 – Vision 15:07 – Legacy tagline 20:26 – Rocket responds to Covid 27:17 – Secure file transfer business 28:53 – Where did the name Rocket come from? 30:05 – What does leadership mean to you? IBM Assistance - https://www.ibm.com/watson/covid-response Rocket Software - https://www.rocketsoftware.com/ Linkedin - https://www.linkedin.com/company/rocket-software Twitter - https://twitter.com/rocket Facebook - https://www.facebook.com/RocketSoftwareInc Brene Brown - https://brenebrown.com/ Arvind Krishna - https://www.linkedin.com/in/arvindkrishna/ Spillover – https://www.amazon.com/Spillover-Animal-Infections-Human-Pandemic/dp/0393066800#reader_0393066800 Bruce Springsteen - https://www.goodreads.com/book/show/29072594-born-to-run Connect with the Team Producer Kate Brown - LinkedIn. Producer Meighann Helene - LinkedIn. Producer Michael Sestak - LinkedIn. Producer Steve Templeton - LinkedIn. Host Al Martin - LinkedIn and Twitter. Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

This audio blog discusses the Data Lakehouse, a marketing concept that evokes clean PowerPoint imagery, and why and how the New Cloud Data Lake will play a very real role in modern enterprise environments.

Originally published at: https://www.eckerson.com/articles/data-lakehouses-hold-water-thanks-to-the-cloud-data-lake

Beginning Apache Spark Using Azure Databricks: Unleashing Large Cluster Analytics in the Cloud

Analyze vast amounts of data in record time using Apache Spark with Databricks in the cloud. Learn the fundamentals, and more, of running analytics on large clusters in Azure and AWS, using Apache Spark with Databricks on top. Discover how to squeeze the most value out of your data at a mere fraction of what classical analytics solutions cost, while at the same time getting the results you need, incrementally faster. This book explains how the confluence of these pivotal technologies gives you enormous power, and cheaply, when it comes to huge datasets. You will begin by learning how cloud infrastructure makes it possible to scale your code to large numbers of processing units, without having to pay for the machinery in advance. From there you will learn how Apache Spark, an open source framework, can enable all those CPUs for data analytics use. Finally, you will see how services such as Databricks provide the power of Apache Spark, without you having to know anything about configuring hardware or software. By removing the need for expensive experts and hardware, your resources can instead be allocated to actually finding business value in the data. This book guides you through some advanced topics such as analytics in the cloud, data lakes, data ingestion, architecture, machine learning, and tools, including Apache Spark, Apache Hadoop, Apache Hive, Python, and SQL. Valuable exercises help reinforce what you have learned. What You Will Learn Discover the value of big data analytics that leverage the power of the cloud Get started with Databricks using SQL and Python in either Microsoft Azure or AWS Understand the underlying technology, and how the cloud and Apache Spark fit into the bigger picture See how these tools are used in the real world Run basic analytics, including machine learning, on billions of rows at a fraction of the cost, or for free Who This Book Is For Data engineers, data scientists, and cloud architects who want or need to run advanced analytics in the cloud. It is assumed that the reader has data experience, but perhaps minimal exposure to Apache Spark and Azure Databricks. The book is also recommended for people who want to get started in the analytics field, as it provides a strong foundation.
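To give a flavor of the basic analytics the book walks through, here is a minimal PySpark sketch that could run in a Databricks notebook (where a SparkSession named spark already exists) or locally; the dataset path and column names are hypothetical placeholders.

```python
# A minimal PySpark analytics sketch; path and columns are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("basic-analytics").getOrCreate()

# Read a potentially very large dataset; on Databricks this could be a table
# in the metastore or Parquet files in cloud storage.
orders = spark.read.parquet("/data/orders")  # placeholder path

# Aggregate billions of rows with a simple distributed group-by.
daily_revenue = (
    orders
    .groupBy(F.to_date("order_ts").alias("order_date"))
    .agg(
        F.sum("amount").alias("revenue"),
        F.countDistinct("customer_id").alias("customers"),
    )
    .orderBy("order_date")
)

daily_revenue.show(10)
```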

Pro Power BI Desktop: Self-Service Analytics and Data Visualization for the Power User

Deliver eye-catching and insightful business intelligence with Microsoft Power BI Desktop. This new edition has been updated to cover all the latest features of Microsoft’s continually evolving visualization product. New in this edition is help with storytelling—adapted to PCs, tablets, and smartphones—and the building of a data narrative. You will find coverage of templates and JSON style sheets, data model annotations, and the use of composite data sources. Also provided is an introduction to incorporating Python visuals and the much-awaited Decomposition Tree visual. Pro Power BI Desktop shows you how to use source data to produce stunning dashboards and compelling reports that you mold into a data narrative to seize your audience’s attention. Slice and dice the data with remarkable ease and then add metrics and KPIs to project the insights that create your competitive advantage. Convert raw data into clear, accurate, and interactive information with Microsoft’s free self-service BI tool. This book shows you how to choose from a wide range of built-in and third-party visualization types so that your message is always enhanced. You will be able to deliver those results on PCs, tablets, and smartphones, as well as share results via the cloud. The book helps you save time by preparing the underlying data correctly without needing an IT department to prepare it for you. What You Will Learn Deliver attention-grabbing information, turning data into insight Find new insights as you chop and tweak your data as never before Build a data narrative through interactive reports with drill-through and cross-page slicing Mash up data from multiple sources into a cleansed and coherent data model Build interdependent charts, maps, and tables to deliver visually stunning information Create dashboards that help in monitoring key performance indicators of your business Adapt delivery to mobile devices such as phones and tablets Who This Book Is For Power users who are ready to step up to the big leagues by going beyond what Microsoft Excel by itself can offer. The book is also for line-of-business managers who are starved for actionable data needed to make decisions about their business. And the book is for BI analysts looking for an easy-to-use tool to analyze data and share results with C-suite colleagues they support.

This audio blog discusses cloud adoption and how data teams will migrate an increasing portion of their on-premises operational and analytics workloads to the cloud. They can best meet budget and project requirements by using data streaming technologies such as change data capture (CDC), which replicates real-time updates between data source and target.

Originally published at: https://www.eckerson.com/articles/the-next-wave-of-cloud-migrations-needs-data-streaming

SQL Server on Azure Virtual Machines

Would you like to master deploying SQL Server in the cloud using Microsoft's Azure platform? With the hands-on guidance in this book, you'll explore how to set up and configure SQL Server on Azure Virtual Machines effectively. By the end, you'll have the knowledge to optimize, manage, and deploy your solutions. What this Book will help me do Understand platform availability for SQL Server in Azure Explore SQL Server IaaS and optimize its configuration Master deploying SQL Server on Linux and Windows in Azure Configure high-performance storage options tailored to SQL Server Learn disaster recovery strategies for SQL Server in Azure Author(s) Joey D'Antoni, Louis Davidson, Allan Hirt, and their co-authors bring years of experience in database management, cloud architecture, and technical writing. They aim to provide clear and actionable advice for working efficiently with SQL Server on Azure. Their insights come from real-world projects. Who is it for? This book is for developers, database administrators, and cloud architects who are looking to learn how to deploy SQL Server solutions on Azure Virtual Machines. If you are transitioning workloads to the cloud or need to manage or optimize such environments, this book will equip you with the skills you need. Basic SQL Server knowledge is helpful.

Smarter Data Science

Organizations can make data science a repeatable, predictable tool that business professionals use to get more value from their data. Enterprise data and AI projects are often scattershot, underbaked, siloed, and not adaptable to predictable business changes. As a result, the vast majority fail. These expensive quagmires can be avoided, and this book explains precisely how. Data science is emerging as a hands-on tool for not just data scientists, but business professionals as well. Managers, directors, IT leaders, and analysts must expand their use of data science capabilities for the organization to stay competitive. Smarter Data Science helps them achieve their enterprise-grade data projects and AI goals. It serves as a guide to building a robust and comprehensive information architecture program that enables sustainable and scalable AI deployments. When an organization manages its data effectively, its data science program becomes a fully scalable function that’s both prescriptive and repeatable. With an understanding of data science principles, practitioners are also empowered to lead their organizations in establishing and deploying viable AI. They employ the tools of machine learning, deep learning, and AI to extract greater value from data for the benefit of the enterprise. By following a ladder framework that promotes prescriptive capabilities, organizations can make data science accessible to a range of team members, democratizing data science throughout the organization. Companies that collect, organize, and analyze data can move forward to additional data science achievements: Improving time-to-value with infused AI models for common use cases Optimizing knowledge work and business processes Utilizing AI-based business intelligence and data visualization Establishing a data topology to support general or highly specialized needs Successfully completing AI projects in a predictable manner Coordinating the use of AI from any compute node, from inner edges to outer edges: cloud, fog, and mist computing. When they climb the ladder presented in this book, businesspeople and data scientists alike will be able to improve and foster repeatable capabilities. They will have the knowledge to maximize their AI and data assets for the benefit of their organizations.


Abstract Hosted by Al Martin, VP, Data and AI Expert Services and Learning at IBM, Making Data Simple provides the latest thinking on big data, A.I., and the implications for the enterprise from a range of experts.

This week on Making Data Simple, we have Priya Srinivasan, Director, IBM Data and AI Expert Labs SWAT. In this week’s podcast we talk about Data (the world’s new oil), AI (the world’s refinery), Cloud (the pipeline), Unified Governance and DataOps, Security, Analytics, and Services.

Show Notes 4:50 Expert Labs and what it means 5:52 Al talks about how SWAT was for him 6:10 Priya discusses SWAT now 7:18 Priya gives examples of SWAT 9:04 Deliverables from Expert Labs 13:37 Al talks about Services 17:30 Priya talks about solving long-term problems 20:04 Priya discusses GROW (Guidance, Resources, and Outreach for Women) 21:25 Al asks Priya what excites her

Linkedin - https://www.linkedin.com/in/sripriya-srinivasan-385a0812/ Twitter - https://twitter.com/Priyavikram2

GROW - https://w3-connections.ibm.com/wikis/home?lang=en-us#!/wiki/W7e7074647e13_420c_9abf_875dd706e4b4/page/Welcome%20to%20GROW%20in%20Hybrid%20Cloud%20-%20Guidance,%20Resources,%20Outreach%20for%20Women%20in%20Hybrid%20Cloud

Connect with the Team Producer Kate Brown - LinkedIn. Producer Michael Sestak - LinkedIn. Producer Meighann Helene - LinkedIn.

Producer Steve Templeton - LinkedIn. Host Al Martin - LinkedIn and Twitter. Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

SQL Server Big Data Clusters: Data Virtualization, Data Lake, and AI Platform

Use this guide to one of SQL Server 2019’s most impactful features—Big Data Clusters. You will learn about data virtualization and data lakes for this complete artificial intelligence (AI) and machine learning (ML) platform within the SQL Server database engine. You will know how to use Big Data Clusters to combine large volumes of streaming data for analysis along with data stored in a traditional database. For example, you can stream large volumes of data from Apache Spark in real time while executing Transact-SQL queries to bring in relevant additional data from your corporate SQL Server database. Filled with clear examples and use cases, this book provides everything necessary to get started working with Big Data Clusters in SQL Server 2019. You will learn about the architectural foundations that are made up from Kubernetes, Spark, HDFS, and SQL Server on Linux. You are then shown how to configure and deploy Big Data Clusters in on-premises environments or in the cloud. Next, you are taught about querying. You will learn to write queries in Transact-SQL—taking advantage of skills you have honed for years—and with those queries you will be able to examine and analyze data from a wide variety of sources such as Apache Spark. Through the theoretical foundation provided in this book and easy-to-follow example scripts and notebooks, you will be ready to use and unveil the full potential of SQL Server 2019: combining different types of data spread across widely disparate sources into a single view that is useful for business intelligence and machine learning analysis. What You Will Learn Install, manage, and troubleshoot Big Data Clusters in cloud or on-premises environments Analyze large volumes of data directly from SQL Server and/or Apache Spark Manage data stored in HDFS from SQL Server as if it were relational data Implement advanced analytics solutions through machine learning and AI Expose different data sources as a single logical source using data virtualization Who This Book Is For Data engineers, data scientists, data architects, and database administrators who want to employ data virtualization and big data analytics in their environments
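For the Spark side of such a cluster, a sketch along the following lines shows the typical pattern: read raw files from the cluster's HDFS storage pool, aggregate them, and write the result back so SQL Server can expose it through data virtualization as an external table. The paths and column names are hypothetical, and the snippet is illustrative rather than taken from the book.

```python
# An illustrative PySpark job for a Big Data Cluster storage pool;
# paths and column names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bdc-clickstream").getOrCreate()

# Raw click events landed in the HDFS storage pool.
clicks = spark.read.csv("hdfs:///clickstream/raw", header=True, inferSchema=True)

# Daily click counts per page.
daily_clicks = (
    clicks.groupBy("page", F.to_date("event_time").alias("event_date"))
          .agg(F.count("*").alias("clicks"))
)

# Land the aggregate back in HDFS, where it can be queried from Transact-SQL
# as an external table alongside relational data.
daily_clicks.write.mode("overwrite").parquet("hdfs:///clickstream/daily")
```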