talk-data.com

Topic

AWS (Amazon Web Services)

Tags: cloud, cloud provider, infrastructure, services

837 tagged activities

Activity Trend

[Chart: quarterly activity volume, 2020-Q1 to 2026-Q1, peaking at 190 activities per quarter]

Activities

837 activities · Newest first

Time Series Analysis on AWS

Time Series Analysis on AWS is your guide to building and deploying powerful forecasting models and identifying anomalies in your time series data. With this book, you will explore effective strategies for modern time series analysis using Amazon Web Services' AI/ML tools.

What this Book will help me do:
Master the fundamental concepts of time series and its applications using industry-relevant examples.
Understand time series forecasting with Amazon Forecast and how to deliver actionable business insights.
Build and deploy anomaly detection systems using Amazon Lookout for Equipment for predictive maintenance.
Learn to utilize Amazon Lookout for Metrics to identify business operational anomalies effectively.
Gain practical experience applying AWS ML tools to real-world time series data challenges.

Author(s): Hoarau is a data scientist with extensive experience in using machine learning to solve real-world problems. Combining strong programming skills with domain expertise, they focus on developing applications that leverage AWS AI services. This book reflects their passion for making technical topics accessible and actionable for professionals.

Who is it for? This book is ideal for data analysts, business analysts, and data scientists eager to enhance their skills in time series analysis. It suits readers familiar with statistical concepts but new to machine learning. If you're aiming to solve business problems using data and AWS tools, this resource is tailored for you.
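
A minimal sketch of the final step in the Amazon Forecast workflow the book covers: querying a trained predictor with boto3. The region, forecast ARN, and item id below are placeholder assumptions, not values from the book:

```python
import boto3

# The "forecastquery" client retrieves predictions from an existing forecast.
# The ARN and item id are placeholders for your own Amazon Forecast resources.
forecast_query = boto3.client("forecastquery", region_name="us-east-1")

response = forecast_query.query(
    ForecastArn="arn:aws:forecast:us-east-1:123456789012:forecast/demo_forecast",
    Filters={"item_id": "product_42"},
)

# Predictions come back per quantile (for example p10/p50/p90).
for quantile, points in response["Forecast"]["Predictions"].items():
    for point in points:
        print(quantile, point["Timestamp"], point["Value"])
```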

Welcome to the Qrvey Podcast. In today’s episode we’re talking to Nick Durkin, Field CTO and VP of Field Engineering at Harness. He has experience in all kinds of areas, from investing to understanding architecture, so he’s the perfect guest to kick off our podcast!

Nick shares some of his knowledge about how SaaS companies can grow and scale faster, the importance of time and speed in the SaaS niche, and the biggest challenges facing companies in this space.

We discuss the importance of SaaS companies understanding their core competencies and focusing on what they do best. We also dive into the pros and cons of using third-party tools instead of developing everything in-house. Nick talks about the trends of the cloud age, like serverless architecture, and how important they are. Is this the only way to go?

Finally, Nick explains the importance of people when building companies, and of creating a positive environment and culture for collaboration between everyone involved.

This episode is brought to you by Qrvey: the tools you need to take action with your data, on a platform built for maximum scalability, security, and cost efficiency. If you’re ready to reduce complexity and dramatically lower costs, contact us today at qrvey.com. Qrvey, the modern no-code analytics solution for SaaS companies on AWS.

Actionable Insights with Amazon QuickSight

Discover the power of Amazon QuickSight with this comprehensive guide. Learn to create stunning data visualizations, integrate machine learning insights, and automate operations to optimize your data analytics workflows. This book offers practical guidance on utilizing QuickSight to develop insightful and interactive business intelligence solutions.

What this Book will help me do:
Understand the role of Amazon QuickSight within the AWS analytics ecosystem.
Learn to configure data sources and develop visualizations effectively.
Gain skills in adding interactivity to dashboards using custom controls and parameters.
Incorporate machine learning capabilities into your dashboards, including forecasting and anomaly detection.
Explore advanced features like the QuickSight APIs and embedded multi-tenant analytics design.

Author(s): Samatas is an AWS-certified big data solutions architect with years of experience in designing and implementing scalable analytics solutions. With a clear and practical approach, they teach how to effectively leverage Amazon QuickSight for efficient and insightful business intelligence applications. Their expertise ensures readers will gain actionable skills.

Who is it for? This book is ideal for business intelligence (BI) developers and data analysts looking to deepen their expertise in creating interactive dashboards using Amazon QuickSight. It is a perfect guide for professionals aiming to explore machine learning integration in BI solutions. Familiarity with basic data visualization concepts is recommended, but no prior experience with Amazon QuickSight is needed.
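
As a taste of the QuickSight API coverage, here is a small hypothetical boto3 sketch that lists the dashboards in an account; the account id and region are placeholders:

```python
import boto3

quicksight = boto3.client("quicksight", region_name="us-east-1")

# List the dashboards visible in this account (paginated in real use).
dashboards = quicksight.list_dashboards(AwsAccountId="123456789012")
for summary in dashboards["DashboardSummaryList"]:
    print(summary["DashboardId"], summary["Name"])
```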

Data Engineering with AWS

Discover how to effectively build and manage data engineering pipelines using AWS with "Data Engineering with AWS". In this hands-on book, you'll explore the foundational principles of data engineering, learn to architect data pipelines, and work with essential AWS services to process, transform, and analyze data.

What this Book will help me do:
Understand and implement modern data engineering pipelines with AWS services.
Gain proficiency in automating data ingestion and transformation using Amazon tools.
Perform efficient data queries and analysis leveraging Amazon Athena and Redshift.
Create insightful data visualizations using Amazon QuickSight.
Apply machine learning techniques to enhance data engineering processes.

Author(s): Eagar, a Senior Data Architect with over twenty-five years of experience, specializes in modern data architectures and cloud solutions. With a rich background in applying data engineering to real-world problems, Eagar shares expertise in a clear and approachable way.

Who is it for? This book is perfect for data engineers and data architects aiming to grow their expertise in AWS-based solutions. It is also geared towards beginners in data engineering wanting to adopt best practices. Those with a basic understanding of big data and cloud platforms will find it particularly valuable, but prior AWS experience is not required.
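
To give a flavor of the automated-ingestion pattern the book teaches, here is a hypothetical boto3 sketch that lands a raw file in S3 and then runs an AWS Glue crawler to register its schema in the Data Catalog; the bucket, key, and crawler name are invented for illustration:

```python
import boto3

s3 = boto3.client("s3")
glue = boto3.client("glue")

# Land the raw file in the ingestion zone of the data lake.
s3.upload_file("orders.csv", "my-raw-data-bucket", "landing/orders/orders.csv")

# Crawl the landing prefix so Athena and Redshift Spectrum can query it.
glue.start_crawler(Name="orders_landing_crawler")

state = glue.get_crawler(Name="orders_landing_crawler")["Crawler"]["State"]
print("Crawler state:", state)
```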

Cloud-Native Microservices with Apache Pulsar: Build Distributed Messaging Microservices

Apply different enterprise integration and processing strategies available with Pulsar, Apache's multi-tenant, high-performance, cloud-native messaging and streaming platform. This book is a comprehensive guide that examines using the Pulsar Java libraries to build distributed applications with a message-driven architecture. You'll begin with an introduction to Apache Pulsar architecture. The first few chapters build a foundation of message-driven architecture. Next, you'll set up all the required Pulsar components. The book also covers working with the Apache Pulsar client library to build producers and consumers for the discussed patterns. You'll then explore the transformation, filter, resiliency, and tracing capabilities available with Pulsar. Moving forward, the book discusses best practices for building message schemas and demonstrates integration patterns using microservices. Security is an important aspect of any application; the book covers authentication and authorization in Apache Pulsar, including Transport Layer Security (TLS), OAuth 2.0, and JSON Web Token (JWT). The final chapters cover Apache Pulsar deployment in Kubernetes. You'll build microservices and serverless components, such as AWS Lambda, integrated with Apache Pulsar on Kubernetes. After completing the book, you'll be able to comfortably work with the large set of out-of-the-box integration options offered by Apache Pulsar.

What You'll Learn:
Examine the important Apache Pulsar components
Build applications using the Apache Pulsar client libraries
Use Apache Pulsar effectively with microservices
Deploy Apache Pulsar to the cloud

Who This Book Is For: Cloud architects and software developers who build systems with cloud-native technologies.
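
The book's examples use the Pulsar Java libraries; purely for flavor, the same produce/consume round trip looks like this with the Python pulsar-client package (the broker URL, topic, and subscription name are placeholders):

```python
import pulsar

# Connect to a standalone broker running locally.
client = pulsar.Client("pulsar://localhost:6650")

producer = client.create_producer("persistent://public/default/orders")
producer.send(b"order-created:42")

consumer = client.subscribe(
    "persistent://public/default/orders", subscription_name="order-service"
)
msg = consumer.receive()
print("Received:", msg.data())
consumer.acknowledge(msg)  # mark the message as processed

client.close()
```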

Serverless Analytics with Amazon Athena

Delve into the serverless world of Amazon Athena with the comprehensive book 'Serverless Analytics with Amazon Athena'. This guide introduces you to the power of Athena, showing you how to efficiently query data in Amazon S3 using SQL without the hassle of managing infrastructure. With clear instructions and practical examples, you'll master querying structured, unstructured, and semi-structured data seamlessly.

What this Book will help me do:
Effectively query and analyze both structured and unstructured data stored in S3 using Amazon Athena.
Integrate Athena with other AWS services to create powerful, secure, and cost-efficient data workflows.
Develop ETL pipelines and machine learning workflows leveraging Athena's compatibility with AWS Glue.
Monitor and troubleshoot Athena queries for consistent performance and build scalable serverless data solutions.
Implement security best practices and optimize costs when managing your Athena-driven data solutions.

Author(s): Virtuoso, along with co-authors Mert Turkay Hocanin and Wishnick, brings a wealth of experience in cloud solutions, serverless technologies, and data engineering. They excel in demystifying complex technical topics and have a passion for empowering readers with practical skills and knowledge.

Who is it for? This book is tailored for business intelligence analysts, application developers, and system administrators who want to harness Amazon Athena for seamless, cost-efficient data analytics. It suits individuals with basic SQL knowledge looking to expand their capabilities in querying and processing data. Whether you're managing growing datasets or building data-driven applications, this book provides the know-how to get it right.
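
A minimal boto3 sketch of the core Athena loop (start a query, poll for completion, read the results); the database, table name, and results bucket are placeholder assumptions:

```python
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

qid = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status",
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)["QueryExecutionId"]

# Athena is asynchronous: poll until the query reaches a terminal state.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    for row in rows:  # the first row holds the column headers
        print([col.get("VarCharValue") for col in row["Data"]])
```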

Storage Systems

Storage Systems: Organization, Performance, Coding, Reliability and Their Data Processing was motivated by the 1988 Redundant Array of Inexpensive/Independent Disks (RAID) proposal to replace large form factor mainframe disks with an array of commodity disks. Disk loads are balanced by striping data into strips, with one strip per disk, and storage reliability is enhanced via replication or erasure coding, which at best dedicates k strips per stripe to tolerate k disk failures. Flash memories have resulted in a paradigm shift, with Solid State Drives (SSDs) replacing Hard Disk Drives (HDDs) for high-performance applications. RAID and flash have resulted in the emergence of new storage companies, namely EMC, NetApp, SanDisk, and Pure Storage, and a multibillion-dollar storage market. Key new conferences and publications are reviewed in this book. The goal of the book is to expose students, researchers, and IT professionals to the more important developments in storage systems, while covering the evolution of storage technologies, traditional and novel databases, and novel sources of data. It describes several prototypes: FAWN at CMU, RAMCloud at Stanford, and LightStore at MIT; Oracle's Exadata, AWS's Aurora, Alibaba's PolarDB, and the Fungible Data Center; and the author's paper designs for cloud storage, namely heterogeneous disk arrays and hierarchical RAID.

Surveys storage technologies and lists sources of data: measurements, text, audio, images, and video
Familiarizes readers with paradigms to improve performance: caching, prefetching, log-structured file systems, and log-structured merge-trees (LSMs)
Describes RAID organizations and analyzes their performance and reliability
Conserves storage via data compression, deduplication, and compaction, and secures data via encryption
Specifies implications of storage technologies on performance and power consumption
Exemplifies database parallelism for big data, analytics, and deep learning via multicore CPUs, GPUs, FPGAs, and ASICs, e.g., Google's Tensor Processing Units
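
To make the erasure-coding claim concrete for the simplest case (k = 1, i.e. RAID-5-style single parity), here is a toy Python illustration, not from the book, showing that one XOR parity strip per stripe suffices to rebuild any single lost strip:

```python
# Three data strips (one per disk) plus one XOR parity strip.
strips = [b"data-aa", b"data-bb", b"data-cc"]

parity = bytes(a ^ b ^ c for a, b, c in zip(*strips))

# Simulate losing disk 1: XOR of the parity with the surviving strips
# reconstructs the missing strip.
rebuilt = bytes(p ^ a ^ c for p, a, c in zip(parity, strips[0], strips[2]))
assert rebuilt == strips[1]
print("Recovered:", rebuilt)
```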

Digitally connected humans like you and me are surrounded by a plethora of AI solutions that make our lives easier and more efficient. Just think about the algorithms driving Netflix and YouTube’s video recommendations, or the facial recognition feature on your phone that saves you a few seconds every time you unlock it. But for every useful AI solution, there are probably hundreds that don’t meet the functional, economic, or ethical standards of their end users. So, what’s the trick to building useful and impactful AI solutions that are also financially viable for those who create them?

Someone who can answer this question is Corey Quinn, Chief Cloud Economist at The Duckbill Group and the founder of two podcasts, “Screaming in the Cloud” and “AWS Morning Brief”. Corey combines an excellent sense of humour with a deep understanding of the cloud and everything that surrounds it, so he is definitely the right person to go to for an unfiltered view of the hype that surrounds a lot of AI solutions.

In this episode of Leaders of Analytics, we talk about:
Whether AI is all it’s made up to be or just a complex solution to our problems
Who’s benefiting from the AI hype
The role of cloud computing in AI and machine learning delivery
How to use cloud computing effectively when deploying AI solutions
How to create an impactful career by solving real business problems
Corey’s top 3 recommendations for AI success in the cloud

We talked about:

Natalie’s background
Airbyte
What is ETL?
Why ELT instead of ETL?
Transformations
How does ELT help analysts be more independent?
Data marts and Data warehouses
Ingestion DB
ETL vs ELT
Data lakes
Data swamps
Data governance
Ingestion layer vs Data lake
Do you need both a Data warehouse and a Data lake?
Airbyte and ELT
Modern data stack
Reverse ETL
Is drag-and-drop killing data engineering jobs?
Who is responsible for managing unused data?
CDC – Change Data Capture
Slowly changing dimension
Are there cases where ETL is preferable over ELT?
Why is Airbyte open source?
The case of Elasticsearch and AWS

Links:

Natalie's LinkedIn: https://www.linkedin.com/in/nataliekwong/
Airbyte blog post: https://airbyte.io/blog/why-the-future-of-etl-is-not-elt-but-el

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Getting Started with Streamlit for Data Science

Getting Started with Streamlit for Data Science is your essential guide to quickly and efficiently building dynamic data science web applications in Python using Streamlit. Whether you're embedding machine learning models, visualizing data, or deploying projects, this book helps you excel in creating and sharing interactive apps with ease.

What this Book will help me do:
Set up a development environment to create your first Streamlit application.
Implement and visualize dynamic data workflows by integrating various Python libraries into Streamlit.
Develop and showcase machine learning models within Streamlit for clear and interactive presentations.
Deploy your projects effortlessly using platforms like Streamlit Sharing, Heroku, and AWS.
Utilize tools like Streamlit Components and themes to enhance the aesthetics and usability of your apps.

Author(s): Tyler Richards is a data science expert with extensive experience in leveraging technology to present complex data models in an understandable way. He brings practical solutions to readers, aiming to empower them with the tools they need to succeed in the field of data science. Tyler adopts a hands-on teaching method with illustrative examples to ensure clarity and easy learning.

Who is it for? This book is designed for anyone involved in data science, from beginners just starting in the field to experienced professionals who want to learn to create interactive web applications using Streamlit. Ideal for those with a working knowledge of Python, this resource will help you streamline your workflows and enhance your project presentations.
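
A minimal sketch of the kind of app the book starts with, assuming nothing beyond Streamlit, pandas, and NumPy; save it as app.py and launch it with `streamlit run app.py`:

```python
import numpy as np
import pandas as pd
import streamlit as st

st.title("Demo dashboard")

# A slider widget: the script re-runs and the chart updates on every change.
n_points = st.slider("Number of points", 10, 500, 100)

df = pd.DataFrame(
    np.random.randn(n_points, 2), columns=["series_a", "series_b"]
)

st.line_chart(df)        # interactive line chart
st.dataframe(df.head())  # tabular preview of the underlying data
```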


Abstract: Hosted by Al Martin, VP, IBM Expert Services Delivery, Making Data Simple provides the latest thinking on big data, A.I., and the implications for the enterprise from a range of experts.

This week on Making Data Simple, we have Alex Watson. Alex was previously a GM at AWS and is currently a co-founder at Gretel.ai. Gretel is a privacy startup that enables developers, researchers, and scientists to quickly create safe versions of data for use in pre-production environments and machine learning workloads, which are shareable across teams and organizations. These tools address head-on the massive data privacy bottleneck, which has stifled innovation across multiple industries for years, by equipping builders everywhere with the ability to create quality datasets that scale. In short, synthetic data levels the playing field for everyone. This democratization of data will foster competition, scientific discoveries, and the inventions that will drive the next revolution of our data economy. The company recently closed their series-A funding, led by Greylock, for another $12 million and brought Jason Warner, the current CTO of GitHub, on as an investor. Gretel also launched its latest public beta, Beta2, which offers privacy engineering as a service for everyone, not just developers.

Show Notes:
2:03 – Alex’s background
4:36 – What time frame was Harvest AI?
7:14 – How does NLP play into Harvest AI?
10:50 – How can we not have enough knowledge?
14:08 – Does the tech exist today for security?
18:14 – Privacy issues
20:42 – What does Gretel stand for?
27:42 – Do you increase the opportunity for bias?
31:18 – Where is the sweet spot for Gretel?
33:30 – When does synthetic data not work?
37:42 – What is practical privacy?

Gretel

Connect with the Team: Producer Kate Brown – LinkedIn. Producer Steve Templeton – LinkedIn. Host Al Martin – LinkedIn and Twitter.

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Data Science on AWS

With this practical book, AI and machine learning practitioners will learn how to successfully build and deploy data science projects on Amazon Web Services. The Amazon AI and machine learning stack unifies data science, data engineering, and application development to help level up your skills. This guide shows you how to build and run pipelines in the cloud, then integrate the results into applications in minutes instead of days. Throughout the book, authors Chris Fregly and Antje Barth demonstrate how to reduce cost and improve performance.

Apply the Amazon AI and ML stack to real-world use cases for natural language processing, computer vision, fraud detection, conversational devices, and more
Use automated machine learning to implement a specific subset of use cases with SageMaker Autopilot
Dive deep into the complete model development lifecycle for a BERT-based NLP use case, including data ingestion, analysis, model training, and deployment
Tie everything together into a repeatable machine learning operations pipeline
Explore real-time ML, anomaly detection, and streaming analytics on data streams with Amazon Kinesis and Managed Streaming for Apache Kafka
Learn security best practices for data science projects and workflows, including identity and access management, authentication, authorization, and more
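
As an illustration of the Autopilot material, here is a hedged boto3 sketch that launches an AutoML job; the job name, S3 paths, target column, and role ARN are placeholders for your own resources:

```python
import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")

# Kick off a SageMaker Autopilot (AutoML) job over a tabular CSV dataset.
sm.create_auto_ml_job(
    AutoMLJobName="churn-autopilot-demo",
    InputDataConfig=[{
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/churn/train/",
        }},
        "TargetAttributeName": "churned",  # column Autopilot should predict
    }],
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/churn/output/"},
    ProblemType="BinaryClassification",
    AutoMLJobObjective={"MetricName": "F1"},
    RoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
)

status = sm.describe_auto_ml_job(AutoMLJobName="churn-autopilot-demo")
print(status["AutoMLJobStatus"])
```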

High Performant File System Workloads for AI and HPC on AWS using IBM Spectrum Scale

This IBM® Redpaper® publication is intended to facilitate the deployment and configuration of IBM Spectrum® Scale based high-performance storage solutions for scalable data and AI solutions on Amazon Web Services (AWS). Configuration, testing results, and tuning guidelines for running IBM Spectrum Scale based high-performance storage solutions for data and AI workloads on AWS are the focus areas of the paper. Lab validation was conducted with Red Hat Linux nodes connected to IBM Spectrum Scale by using various Amazon Elastic Compute Cloud (EC2) instances. Simultaneous workloads were simulated across multiple Amazon EC2 nodes running Red Hat Linux to determine scalability against the IBM Spectrum Scale clustered file system. Solution architecture, configuration details, and performance tuning demonstrate how to maximize data and AI application performance with IBM Spectrum Scale on AWS.

Custom Fiori Applications in SAP HANA: Design, Develop, and Deploy Fiori Applications for the Enterprise

Get started building custom Fiori applications for your enterprise. This book teaches you how to design, build, and deploy enterprise-ready, custom Fiori applications in SAP HANA. Tips and tricks collected from projects using Fiori applications (built consuming OData models and REST APIs) and integrating third-party JS libraries are presented. Also included are examples using Fiori templates from different tools, such as the SAP Web IDE and the new Visual Studio Code extensions. This book explains the five design principles that all Fiori applications are built upon: Role-based, Responsive, Coherent, Simple, and Delightful. The book expands on consuming OData services and REST APIs internal and external to SAP HANA. The Fiori application exercise demonstrates the use of the MVC pattern, JavaScript modularization, reuse of SAP UI5 controls, debugging, and the tools required for a complete scenario. The book closes with an exercise showcasing a finished single-page application with multiple views and layouts, navigation between the views, and deployment of the application to AWS. This book is simple enough for entry-level developers getting started in web frameworks, but it also highlights integration points for the data models being consumed by the application and shows how the application communicates with back-end services, resulting in a complete front-end custom Fiori application.

What You Will Learn:
Know the five Fiori design principles
Understand how to consume OData and REST API models
Apply the MVC pattern using XML views and the SAP UI5 controls along with controller behavior in JavaScript
Debug and deploy the application

Who This Book Is For: Web developers and application leads who have some experience in JavaScript frameworks and web development and understand web protocol communication.

What Is a Data Lake?

A revolution is occurring in data management regarding how data is collected, stored, processed, governed, managed, and provided to decision makers. The data lake is a popular approach that harnesses the power of big data and marries it with the agility of self-service. With this report, IT executives and data architects will focus on the technical aspects of building a data lake for your organization. Alex Gorelik from Facebook explains the requirements for building a successful data lake that business users can easily access whenever they have a need. You'll learn the phases of data lake maturity, common mistakes that lead to data swamps, and the importance of aligning data with your company's business strategy and gaining executive sponsorship.

You'll explore:
The ingredients of modern data lakes, such as the use of different ingestion methods for different data formats, and the importance of the three Vs: volume, variety, and velocity
Building blocks of successful data lakes, including data ingestion, integration, persistence, data governance, and business intelligence and self-service analytics
State-of-the-art data lake architectures offered by Amazon Web Services, Microsoft Azure, and Google Cloud

Hybrid Multicloud Business Continuity for OpenShift Workloads with IBM Spectrum Virtualize in AWS

This publication is intended to facilitate the deployment of the hybrid cloud business continuity solution with Red Hat OpenShift Container Platform and the IBM® block CSI (Container Storage Interface) driver plug-in for IBM Spectrum® Virtualize for Public Cloud on AWS (Amazon Web Services). This solution is designed to protect the data by using IBM Storage-based Global Mirror replication. For demonstration purposes, a containerized MySQL database is installed on the on-premises IBM FlashSystem® that is connected to the Red Hat OpenShift Container Platform (OCP) cluster in the vSphere environment through the IBM block CSI driver. The volume (LUN) on the IBM FlashSystem storage system is replicated by using Global Mirror on IBM Spectrum Virtualize for Public Cloud on AWS. The Red Hat OpenShift cluster (OCP cluster) and the IBM block CSI driver plug-in are installed on AWS by using the Installer-Provisioned Infrastructure (IPI) methodology. The information in this document is distributed on an as-is basis without any warranty that is either expressed or implied. Support assistance for the use of this material is limited to situations where IBM Spectrum Virtualize for Public Cloud is supported and entitled, and where the issues are specific to this Blueprint implementation.

Metabase Up and Running

Metabase Up and Running is your go-to guide for mastering Metabase, the open-source business intelligence tool. You'll progress from the basics of installation and setup to connecting data sources and creating insightful visualizations and dashboards. By the end, you'll be confident in implementing Metabase in your organization for impactful decision-making.

What this Book will help me do:
Understand how to securely deploy and configure Metabase on Amazon Web Services.
Master the creation of dashboards, reports, and visualizations using Metabase's tools.
Gain expertise in user and permissions management within Metabase.
Learn to use Metabase's SQL console for advanced database interactions.
Acquire skills to embed Metabase within applications and automate reports via email or Slack.

Author(s): Abraham, an experienced data tool specialist, is passionate about teaching others how to leverage data tools effectively. With a background in business analytics, Abraham has guided companies of all sizes. Their approachable writing style ensures a learning journey that is both informative and engaging.

Who is it for? This book is ideal for business analysts and data professionals looking to amplify their business intelligence capabilities using Metabase. Readers should have some understanding of data analytics principles. Whether you're starting in analytics or seeking advanced automation, this book offers valuable guidance to meet your goals.
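
The embedding chapter pairs naturally with a short example. Below is a sketch of Metabase's signed-embedding pattern, in which your server signs a JWT naming the dashboard to embed; the site URL, secret key, and dashboard id are placeholders from your own Metabase admin settings (uses the PyJWT package):

```python
import time

import jwt  # PyJWT

METABASE_SITE_URL = "https://metabase.example.com"     # placeholder
METABASE_SECRET_KEY = "replace-with-embedding-secret"  # placeholder

payload = {
    "resource": {"dashboard": 7},     # id of the dashboard to embed
    "params": {},
    "exp": round(time.time()) + 600,  # token valid for 10 minutes
}
token = jwt.encode(payload, METABASE_SECRET_KEY, algorithm="HS256")

# Drop this URL into an iframe in your application.
iframe_url = f"{METABASE_SITE_URL}/embed/dashboard/{token}#bordered=true&titled=true"
print(iframe_url)
```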

Red Hat OpenShift on Public Cloud with IBM Block Storage

The purpose of this document is to show how to install Red Hat OpenShift Container Platform (OCP) on the Amazon Web Services (AWS) public cloud with the OpenShift installer, a method that is known as Installer-Provisioned Infrastructure (IPI). We also describe how to validate the installation of the IBM Container Storage Interface (CSI) driver on OCP 4.2 that is installed on AWS. This document also describes the installation of OCP 4.x on AWS with customization, and OCP 4.x installation on IBM Cloud. It discusses how to provision Internet Small Computer System Interface (iSCSI) storage that is made available by IBM Spectrum® Virtualize for Public Cloud (SVPC) deployed on AWS. Finally, the document discusses the use of the Red Hat OpenShift command line interface (CLI), the OCP web console graphical user interface (GUI), and the AWS console.

Advanced R 4 Data Programming and the Cloud: Using PostgreSQL, AWS, and Shiny

Program for data analysis using R and learn practical skills to make your work more efficient. This revised book explores how to automate running code and the creation of reports to share your results, as well as writing functions and packages. It includes key R 4 features such as a new color palette for charts, an enhanced reference-counting system, and normalization of matrix and array types, where matrix objects now formally inherit from the array class, eliminating inconsistencies. Advanced R 4 Data Programming and the Cloud is not designed to teach advanced R programming, nor the theory behind statistical procedures. Rather, it is designed to be a practical guide moving beyond merely using R; it shows you how to program in R to automate tasks. This book will teach you how to manipulate data in modern R structures and includes connecting R to databases such as PostgreSQL, cloud services such as Amazon Web Services (AWS), and digital dashboards such as Shiny. Each chapter also includes a detailed bibliography with references to research articles and other resources that cover relevant conceptual and theoretical topics.

What You Will Learn:
Write and document R functions using R 4
Make an R package and share it via GitHub or privately
Add tests to R code to ensure it works as intended
Use R to talk directly to databases and do complex data management
Run R in the Amazon cloud
Deploy a Shiny digital dashboard
Generate presentation-ready tables and reports using R

Who This Book Is For: Working professionals, researchers, and students who are familiar with R and basic statistical techniques such as linear regression, and who want to take their R coding and programming to the next level.