talk-data.com

Topic: Analytics
Tags: data_analysis, insights, metrics
4552 tagged activities
[Activity trend, 2020-Q1 to 2026-Q1: peak of 398 activities per quarter]

Activities

4552 activities · Newest first

IBM z14 Model ZR1 Technical Guide

Abstract This IBM® Redbooks® publication describes the new member of the IBM Z® family, IBM z14™ Model ZR1 (Machine Type 3907). It includes information about the Z environment and how it helps integrate data and transactions more securely, and can infuse insight for faster and more accurate business decisions. The z14 ZR1 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to any digital transformation. The z14 ZR1 is designed for enhanced modularity, in an industry-standard footprint.

A data-centric infrastructure must deliver 99.999% or better availability, have flawless data integrity, and be secured from misuse. It also must be an integrated infrastructure that can support new applications. Finally, it must have integrated capabilities that can provide new mobile capabilities with real-time analytics, delivered by a secure cloud infrastructure.

IBM z14 ZR1 servers are designed with improved scalability, performance, security, resiliency, availability, and virtualization. The superscalar design allows z14 ZR1 servers to deliver a record level of capacity over the previous IBM Z platforms. In its maximum configuration, the z14 ZR1 is powered by up to 30 client-characterizable microprocessors (cores) running at 4.5 GHz. This configuration can run more than 29,000 million instructions per second (MIPS) and supports up to 8 TB of client memory. The IBM z14 Model ZR1 is estimated to provide up to 54% more total system capacity than the IBM z13s® Model N20.

This Redbooks publication provides information about IBM z14 ZR1 and its functions, features, and associated software support. More information is offered in areas that are relevant to technical planning. It is intended for systems engineers, consultants, planners, and anyone who wants to understand IBM Z server functions and plan for their usage. It is not intended as an introduction to mainframes; readers are expected to be generally familiar with IBM Z technology and terminology.

Data Analytics with Spark Using Python, First edition

Data Analytics with Spark Using Python introduces and solidifies the concepts behind Spark 2.x, teaching working developers, architects, and data professionals exactly how to build practical Spark solutions. Jeffrey Aven covers all aspects of Spark development, from basic programming through Spark SQL, SparkR, Spark Streaming, messaging, NoSQL, and Hadoop integration. Each chapter presents practical exercises for deploying Spark to your local or cloud environment, plus programming exercises for building real applications. Unlike other Spark guides, this book explains crucial concepts step by step, assuming no extensive background as an open source developer, and provides a complete foundation for quickly progressing to more advanced data science and machine learning topics.

This guide will help you:
- Understand Spark basics that will make you a better programmer and cluster “citizen”
- Master Spark programming techniques that maximize your productivity
- Choose the right approach for each problem
- Make the most of built-in platform constructs, including broadcast variables, accumulators, effective partitioning, caching, and checkpointing
- Leverage powerful tools for managing streaming, structured, semi-structured, and unstructured data
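As a taste of the constructs listed above, here is a minimal PySpark sketch of broadcast variables, accumulators, and caching. This is an illustrative example, not from the book; the data and names are made up:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("broadcast-demo").getOrCreate()
sc = spark.sparkContext

# A small lookup table shipped once to every executor as a broadcast
# variable, instead of being serialized with every task.
country_names = sc.broadcast({"US": "United States", "AU": "Australia"})

# An accumulator for counting malformed rows across the cluster.
bad_rows = sc.accumulator(0)

def parse(line):
    fields = line.split(",")
    if len(fields) != 2:
        bad_rows.add(1)
        return None
    code, revenue = fields
    return (country_names.value.get(code, "Unknown"), float(revenue))

rdd = sc.parallelize(["US,100.0", "AU,42.5", "garbage"])
parsed = rdd.map(parse).filter(lambda x: x is not None)

# cache() keeps the parsed data in memory because it is reused below.
parsed.cache()
print(parsed.reduceByKey(lambda a, b: a + b).collect())
print("malformed rows:", bad_rows.value)
```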

Big Data Analytics with Hadoop 3

Big Data Analytics with Hadoop 3 is your comprehensive guide to understanding and leveraging the power of Apache Hadoop for large-scale data processing and analytics. Through practical examples, it introduces the tools and techniques necessary to integrate Hadoop with other popular frameworks, enabling efficient data handling, processing, and visualization.

What this book will help me do:
- Understand the foundational components and features of Apache Hadoop 3, such as HDFS, YARN, and MapReduce.
- Integrate Hadoop with programming languages like Python and R for data analysis.
- Utilize tools such as Apache Spark and Apache Flink for real-time data analytics within the Hadoop ecosystem.
- Set up a Hadoop cluster and perform analytics in cloud environments such as AWS.
- Build practical big data analytics pipelines for end-to-end data processing.

Author(s): Sridhar Alla is a seasoned big data professional with extensive industry experience in building and deploying scalable big data analytics solutions. Known for his expertise in Hadoop and related ecosystems, Sridhar combines technical depth with clear communication in his writing, providing practical insights and hands-on knowledge.

Who is it for? This book is tailored for data professionals, software engineers, and data scientists looking to expand their expertise in big data analytics using Hadoop 3. Whether you're an experienced developer or new to the big data ecosystem, this book provides the step-by-step guidance and practical examples needed to advance your skills and achieve your analytical goals.

Hands-On Data Warehousing with Azure Data Factory

Dive into the world of ETL (Extract, Transform, Load) with 'Hands-On Data Warehousing with Azure Data Factory'. This book guides readers through the essential techniques for working with Azure Data Factory and SQL Server Integration Services to design, implement, and optimize ETL solutions for both on-premises and cloud data environments.

What this book will help me do:
- Understand and utilize Azure Data Factory and SQL Server Integration Services to build ETL solutions.
- Design scalable and high-performance ETL architectures tailored to modern data problems.
- Integrate various Azure services, such as Azure Data Lake Analytics, Machine Learning, and Databricks Spark, into your workflows.
- Troubleshoot and optimize ETL pipelines and address common challenges in data processing.
- Create insightful Power BI dashboards to visualize and interact with data from your ETL workflows.

Author(s): Christian Cote, Michelle Gutzait, and Giuseppe Ciaburro bring a wealth of experience in data engineering and cloud technologies to this practical guide. Combining expertise in the Azure ecosystem with hands-on data warehousing experience, they deliver actionable insights for working professionals.

Who is it for? This book is crafted for software professionals working in data engineering, especially those specializing in ETL processes. Readers with a foundational knowledge of SQL Server and cloud infrastructures will benefit most. If you aspire to implement state-of-the-art ETL pipelines or enhance existing workflows with ADF and SSIS, this book is an ideal resource.
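To make the ADF side concrete, here is a minimal sketch of defining a copy-activity pipeline with the Azure Data Factory management SDK for Python (azure-mgmt-datafactory). All subscription, resource, and dataset names are hypothetical, and model signatures vary somewhat across SDK versions:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, DatasetReference, BlobSource, BlobSink,
)

# Hypothetical subscription and resource names, for illustration only.
adf_client = DataFactoryManagementClient(
    DefaultAzureCredential(), "<subscription-id>"
)

# A single copy activity that moves data between two pre-defined datasets.
copy_step = CopyActivity(
    name="CopyStagingToWarehouse",
    inputs=[DatasetReference(type="DatasetReference",
                             reference_name="StagingBlobDataset")],
    outputs=[DatasetReference(type="DatasetReference",
                              reference_name="WarehouseBlobDataset")],
    source=BlobSource(),
    sink=BlobSink(),
)

pipeline = PipelineResource(activities=[copy_step])
adf_client.pipelines.create_or_update(
    "my-resource-group", "my-data-factory", "NightlyLoad", pipeline
)
```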

SAS for Finance

SAS for Finance introduces readers to using SAS software for robust financial data analysis and model construction. Through hands-on examples and industry-focused techniques, this book demonstrates how to harness the power of SAS to develop effective analytical models, allowing you to uncover deeper insights and facilitate data-informed decision-making.

What this book will help me do:
- Master the fundamentals of financial time series analysis using SAS.
- Develop advanced forecasting models utilizing econometric techniques with SAS.
- Use clustering and similarity analysis in SAS to understand customer behavior.
- Create and interpret survival models for customer loyalty analysis.
- Gain proficiency in financial risk assessment using SAS for diversified applications.

Author(s): Harish Gulati brings years of expertise in financial analytics and technical instruction to this publication. With a rich background in leveraging statistical software, the author has guided financial analysts and data scientists in building data models that solve real-world challenges. Known for practical insights, his approach makes advanced concepts accessible and actionable.

Who is it for? This book is tailored for financial analysts and data scientists aspiring to enhance their analytical capabilities with SAS. While prior familiarity with SAS software provides an advantage, beginners can also find value, provided they have a foundational understanding of finance. Ideal for professionals aiming to model data, forecast trends, and derive actionable insights in the financial domain.

podcast_episode
by Val Kroll, Julie Hoyer, Tim Wilson (Analytics Power Hour - Columbus, OH), Matt Policastro (Clearhead), Moe Kiss (Canva), Michael Helbling (Search Discovery)

Regression. Correlation. Normality. t-tests. Falsities of both the positive and negative varieties. How do these terms and techniques play nicely with digital analytics data? Are they the schoolyard bullies wielded by data scientists, destined to simply run by and kick sand in the faces of our sessions, conversion rates, and revenues per visit? Or, are they actually kind-hearted upperclassmen who are ready and willing to let us into their world? That's the topic of this show (albeit without the awkward and forced metaphors). Matt Policastro from Clearhead joined the gang to talk -- in as practical terms as possible -- about bridging the gap between traditional digital analytics data and the wonderful world of statistics. For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.
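As a concrete example of one of those techniques meeting digital analytics data, here is a minimal Python sketch (not from the episode) running Welch's t-test on simulated revenue-per-visit from an A/B test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated revenue-per-visit for two variants of a landing page.
control   = rng.exponential(scale=5.0, size=2000)   # mean around $5.00
treatment = rng.exponential(scale=5.4, size=2000)   # mean around $5.40

# Welch's t-test (does not assume equal variances) guards against one
# common "falsity": declaring a winner from noise alone.
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# A p-value below a pre-chosen threshold (say 0.05) suggests the lift is
# unlikely to be noise; heavily skewed revenue data often calls for a
# nonparametric check such as Mann-Whitney U as well.
```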

Summary

Most businesses end up with data in a myriad of places with varying levels of structure. This makes it difficult to gain insights from across departments, projects, or people. Presto is a distributed SQL engine that allows you to tie all of your information together without having to first aggregate it all into a data warehouse. Kamil Bajda-Pawlikowski co-founded Starburst Data to provide support and tooling for Presto, as well as contributing advanced features back to the project. In this episode he describes how Presto is architected, how you can use it for your analytics, and the work that he is doing at Starburst Data.
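For a flavor of what querying across systems without a warehouse looks like, here is a minimal sketch (not from the episode) using the open source presto-python-client; the host, catalogs, and tables are hypothetical:

```python
import prestodb  # pip install presto-python-client

# One Presto coordinator can expose many catalogs (Hive, PostgreSQL,
# Kafka, Cassandra, ...); a single query may join across them.
conn = prestodb.dbapi.connect(
    host="presto.example.com",
    port=8080,
    user="analyst",
    catalog="hive",
    schema="default",
)
cur = conn.cursor()
cur.execute("""
    SELECT o.customer_id, count(*) AS orders, max(c.last_login) AS last_login
    FROM hive.sales.orders o
    JOIN postgresql.crm.customers c ON o.customer_id = c.id
    GROUP BY o.customer_id
    ORDER BY orders DESC
    LIMIT 10
""")
for row in cur.fetchall():
    print(row)
```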

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. Your host is Tobias Macey and today I’m interviewing Kamil Bajda-Pawlikowski about Presto and his experiences with supporting it at Starburst Data.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by explaining what Presto is?
What are some of the common use cases and deployment patterns for Presto?
How does Presto compare to Drill or Impala?
What is it about Presto that led you to build a business around it?
What are some of the most challenging aspects of running and scaling Presto?
For someone who is using the Presto SQL interface, what are some of the considerations that they should keep in mind to avoid writing poorly performing queries?
How does Presto represent data for translating between its SQL dialect and the API of the data stores that it interfaces with?
What are some cases in which Presto is not the right solution?
What types of support have you found to be the most commonly requested?
What are some of the types of tooling or improvements that you have made to Presto in your distribution?
What are some of the notable changes that your team has contributed upstream to Presto?

Contact Info

Website E-mail Twitter – @starburstdata Twitter – @prestodb

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Starburst Data Presto Hadapt Hadoop Hive Teradata PrestoCare Cost Based Optimizer ANSI SQL Spill To Disk Tempto Benchto Geospatial Functions Cassandra Accumulo Kafka Redis PostgreSQL

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/). Support Data Engineering Podcast

In this podcast, Justin Borgman talks about his journey of starting a data science startup, exiting it, and jumping into another one. The session is filled with insights for leaders looking for entrepreneurial wisdom as they embark on a data-driven journey.

Timeline: 0:28 Justin's journey. 3:22 Taking the plunge to start a new company. 5:49 Perception vs. reality of starting a data warehouse company. 8:15 Bringing something new into the IT legacy. 13:20 Getting your first few customers. 16:16 The right moment for a data warehouse company to look for a new venture. 18:20 The right person to have as a co-founder. 20:29 Advantages of going seed vs. series A. 22:13 When is a company ready for seed or series A funding? 24:40 Who's a good adviser? 26:35 Exiting Teradata. 28:54 From Teradata to starting a new company. 31:24 The excitement of starting something from scratch. 32:24 What is Starburst? 37:15 Presto, a great engine for cloud platforms. 40:30 How a company can get started with Presto. 41:50 The health of enterprise data. 44:15 Where does Presto not fit in? 45:19 The future of enterprise data. 46:36 Drawing parallels between the proprietary and open source spaces. 49:02 Does aligning with open source give a company a better chance at seed funding? 51:44 Justin's ingredients for success. 54:05 Justin's favorite reads. 55:01 Key takeaways.

Paul's Recommended Read: The Outsiders by S. E. Hinton (amzn.to/2Ai84Gl)

Podcast Link: https://futureofdata.org/running-a-data-science-startup-one-decision-at-a-time-futureofdata-podcast/

Justin's BIO: Justin has spent the better part of a decade in senior executive roles building new businesses in the data warehousing and analytics space. Before co-founding Starburst, Justin was Vice President and General Manager at Teradata (NYSE: TDC), where he was responsible for the company’s portfolio of Hadoop products. Prior to joining Teradata, Justin was co-founder and CEO of Hadapt, the pioneering "SQL-on-Hadoop" company that transformed Hadoop from a file system into an analytic database accessible to anyone with a BI tool. Teradata acquired Hadapt in 2014.

Justin earned a BS in Computer Science from the University of Massachusetts at Amherst and an MBA from the Yale School of Management.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

In this podcast, Marc Rind from ADP talks about big data in HR. He shares some of the best practices and opportunities that reside in HR data, along with tactical steps to help build better data-driven teams that execute data-driven strategies. This podcast is great for folks looking to explore the depth of HR data and the opportunities it holds.

Timeline: 0:28 Marc's journey. 4:50 Marc's typical day. 7:23 Data use cases in ADP. 11:20 Driving innovation and thought leadership. 15:15 Creating awareness for the necessity for innovation. 18:54 Listening skills key for innovation. 20:25 HR's role in the time of automation. 27:45 Product development and data science. 30:36 Working on a client analytics platform. 34:41 Team building. 37:52 Tips for established businesses to get started with data. 41:20 Data opportunities for entrepreneurs in the HR space. 43:23 Marc's ingredients for success. 46:35 Marc's reading list. 48:35 Key takeaways.

Podcast Link: https://futureofdata.org/understanding-bigdata-bigopportunity-in-hr-marcrind-futureofdata/

Marc's BIO: Marc is responsible for leading the research and development of Automatic Data Processing’s (ADP’s) Analytics and Big Data initiative. In this capacity, Marc drives innovation and thought leadership in building ADP’s Client Analytics platform. ADP Analytics gives clients not only the ability to read the pulse of their own human capital, but also information on how they stack up within their industry, along with the best courses of action to achieve their goals through quantifiable insights.

Marc was also an instrumental leader behind the small business payroll platform RUN Powered by ADP®. He leads a number of the technology teams responsible for delivering this critically acclaimed product, focused on an innovative user experience for small business owners.

Prior to joining ADP, Marc’s innovative spirit and fascination with data were forged at Bolt Media, a dot-com start-up based in NY’s “Silicon Alley”. The company was an early predecessor of today’s social media outlets. As an early ‘data scientist,’ Marc focused on patterns and predictions of site usage by harnessing the data in its more than 10 million user profiles.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Wanna Join? If you or anyone you know wants to join in, register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

podcast_episode
by Val Kroll, Julie Hoyer, Tim Wilson (Analytics Power Hour - Columbus, OH), Moe Kiss (Canva), Michael Helbling (Search Discovery), Els Aerts (AGConsult)

Thanks for stopping by. Please get comfortable. We're going to be taking a few notes while you listen, but pay that no mind. Now, what we'd like you to do is listen to the podcast. Oh. And don't worry about that big mirror over there. There may be 2 or 3 or 10 people watching. Wow. We're terrible moderators when it comes to this sort of thing. That's why Els Aerts from AGConsult joined us to discuss user research: what it is, where it should fit in an organization's toolkit, and some tips for doing it well. For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

Analytics and Big Data for Accountants

Analytics is the new force driving business. Tools have been created to measure program impacts and ROI, visualize data and business processes, and uncover the relationship between key performance indicators, many using the unprecedented amount of data now flowing into organizations. Featuring updated examples and surveys, this dynamic book covers leading-edge topics in analytics and finance. It is packed with useful tips and practical guidance you can apply immediately.

This book prepares accountants to:
- Deal with major trends in predictive analytics, optimization, correlation of metrics, and big data.
- Interpret and manage new trends in analytics techniques affecting your organization.
- Use new tools for data analytics.
- Critically interpret analytics reports and advise decision makers.

In this podcast, @JohnNives discusses ways to demystify AI for the enterprise. He shares his perspective on how businesses should engage with AI, and some of the best practices and considerations for adopting AI in their strategic roadmap. This podcast is great for anyone seeking to learn how to adopt AI in the enterprise landscape.

Timeline: 0:28 John's journey. 6:50 John's current role. 9:40 The role of a chief digital officer. 11:16 The current trend of AI. 13:52 AI hype or real? 16:42 Why AI now? 19:03 Demystifying deep learning. 23:35 Enterprise use cases of AI. 28:25 Attributes of a successful AI project. 32:20 Best AI investments in an enterprise. 36:56 Convincing leadership to adopt AI. 39:20 Organizational implications of adopting AI. 43:45 What do executives get wrong about AI? 48:36 Tips for executives to understand the AI landscape. 53:11 John's favorite reads. 57:35 Closing remarks.

John's Recommended Listen: FutureOfData Podcast (math.im/itunes). Recommended Read: War and Peace by Leo Tolstoy (Author), Frederick Davidson (Narrator), Blackstone Audio, Inc. (Publisher) (amzn.to/2w7ObkI)

Podcast Link: https://futureofdata.org/johnnives-on-ways-to-demystify-ai-for-enterprise/

Jean's BIO: Jean-Louis (John) Nives serves as Chief Digital Officer and the Global Chair of the Digital Transformation practice at N2Growth. Prior to joining N2Growth, Mr. Nives was at IBM Global Business Services, within the Watson and Analytics Center of Competence. There he worked on Cognitive Digital Transformation projects related to Watson, Big Data, Analytics, Social Business, and Marketing/Advertising Technology. Examples include CognitiveTV and the application of external unstructured data (social, weather, etc.) for business transformation. Prior relevant experience includes executive leadership positions at Nielsen, IRI, Kraft, and two successful advertising technology acquisitions (Appnexus and SintecMedia). In this capacity, Jean-Louis combined information, analytics, and technology to create significant business value in transformative ways. Jean-Louis earned a Bachelor’s Degree in Industrial Engineering from the University at Buffalo and an MBA in Finance and Computer Science from Pace University. He is married with four children and lives in the New York City area.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers and lead practitioners to discuss their journey in creating the data-driven future.

Wanna Join? If you or anyone you know wants to join in, register your interest @ play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

IBM Spectrum Scale Best Practices for Genomics Medicine Workloads

Advancing the science of medicine by targeting a disease more precisely with treatment specific to each patient relies on access to that patient's genomics information and the ability to process massive amounts of genomics data quickly. Although genomics data is becoming a critical source for precision medicine, it is expected to create an expanding data ecosystem. Therefore, hospitals, genome centers, medical research centers, and other clinical institutes need to explore new methods of storing, accessing, securing, managing, sharing, and analyzing significant amounts of data.

Healthcare and life sciences organizations that are running data-intensive genomics workloads on an IT infrastructure that lacks scalability, flexibility, performance, management, and cognitive capabilities also need to modernize and transform their infrastructure to support current and future requirements.

IBM® offers an integrated solution for genomics that is based on composable infrastructure. This solution enables administrators to build an IT environment in a way that disaggregates the underlying compute, storage, and network resources. Such a composable, building-block-based solution for genomics addresses the most complex data management aspects and allows organizations to store, access, manage, and share huge volumes of genome sequencing data.

IBM Spectrum™ Scale is software-defined storage that is used to manage storage and provide massive scale, a global namespace, and high-performance data access with many enterprise features. IBM Spectrum Scale™ is used in clustered environments, provides unified access to data via file protocols (POSIX, NFS, and SMB) and object protocols (Swift and S3), and supports analytic workloads via HDFS connectors. Deploying IBM Spectrum Scale and IBM Elastic Storage™ Server (IBM ESS) as a composable storage building block in a genomics next generation sequencing deployment offers key benefits of performance, scalability, analytics, and collaboration via multiple protocols.

This IBM Redpaper™ publication describes a composable solution with detailed architecture definitions for storage, compute, and networking services for genomics next generation sequencing, enabling solution architects to benefit from tried-and-tested deployments and to quickly plan and design an end-to-end infrastructure deployment. The preferred practices and fully tested recommendations described in this paper are derived from running the GATK Best Practices workflow from the Broad Institute. The scenarios provide all that is required, including ready-to-use configuration and tuning templates for the different building blocks (compute, network, and storage), which can enable simpler deployment and increase the level of assurance of performance for genomics workloads. The solution is designed to be elastic in nature, and the disaggregation of the building blocks allows IT administrators to easily and optimally configure the solution with maximum flexibility.

The intended audience for this paper is technical decision makers, IT architects, deployment engineers, and administrators who are working in the healthcare domain and who are working on genomics-based workloads.
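To make the multi-protocol access concrete, here is a minimal sketch of reaching a Spectrum Scale cluster through its S3-compatible object interface using boto3. The endpoint, credentials, and bucket are hypothetical; this is not from the Redpaper itself:

```python
import boto3

# IBM Spectrum Scale's object protocol support is S3-compatible, so a
# standard S3 client pointed at the cluster's protocol nodes can read
# and write genomics data alongside POSIX/NFS/SMB clients.
s3 = boto3.client(
    "s3",
    endpoint_url="https://scale-protocol-nodes.example.org",
    aws_access_key_id="<access-key>",
    aws_secret_access_key="<secret-key>",
)

# Upload a sample FASTQ file, then list what is in the bucket.
s3.upload_file("sample_R1.fastq.gz", "sequencer-runs",
               "run42/sample_R1.fastq.gz")
for obj in s3.list_objects_v2(Bucket="sequencer-runs").get("Contents", []):
    print(obj["Key"], obj["Size"])
```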

Data engineering is one of the hottest and most difficult jobs to fill in the field of analytics. The breadth and depth of required skills limit the number of people qualified to work as data engineers. If you’re seeking to hire data engineers, consider the 24 skill areas identified here as guidance to shape job descriptions and to screen candidates. If you’re seeking to become a data engineer, take the skills assessment to highlight your strengths and identify your gaps.

Originally published at https://www.eckerson.com/articles/data-engineering-coming-of-age

Infographics Powered by SAS

Create compelling business infographics with SAS and familiar office productivity tools. A picture is worth a thousand words, but what if there are a billion words? When analyzing big data, you need a picture that cuts through the noise. This is where infographics come in. Infographics are a representation of information in a graphic format designed to make the data easily understandable, without requiring deep knowledge of the data. The infographic combines storytelling with data and provides the user with an approachable entry point into business data.

Infographics Powered by SAS: Data Visualization Techniques for Business Reporting shows you how to create graphics that communicate information and insight from big data in the boardroom and on social media. Learn how to create business infographics for all occasions with SAS, and how to build a workflow that lets you get the most from your SAS system without having to code anything, unless you want to! This book combines the perfect blend of creative freedom and data governance that comes from leveraging the power of SAS and the familiarity of Microsoft Office.

Topics covered in this book include:
- SAS Visual Analytics
- SAS Office Analytics
- SAS/GRAPH software (SAS code examples)
- Data visualization with SAS
- Creating reports with SAS
- Using reports and graphs from SAS to create business presentations
- Using SAS within Microsoft Office

A Deep Dive into NoSQL Databases: The Use Cases and Applications

A Deep Dive into NoSQL Databases: The Use Cases and Applications, Volume 109, the latest release in the Advances in Computers series (first published in 1960), presents detailed coverage of innovations in computer hardware, software, theory, design, and applications. In addition, it provides contributors with a medium in which they can explore their subjects in greater depth and breadth. This update includes sections on NoSQL and NewSQL databases for big data analytics and distributed computing, NewSQL databases and scalable in-memory analytics, a NoSQL web crawler application, NoSQL security, a comparative study of different in-memory (No/New)SQL databases, a hands-on treatment of four NoSQL databases, the Hadoop ecosystem, and more.

- Provides a comprehensive yet compact book on the popular domain of NoSQL databases for IT professionals, practitioners, and professors
- Articulates and accentuates big data analytics and how it is simplified and streamlined by NoSQL database systems
- Sets a stimulating foundation, with all the relevant details, for NoSQL database researchers, developers, and administrators
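As a quick illustration of the schema flexibility these chapters build on, here is a minimal document-store sketch using pymongo; the connection string, database, and collection names are hypothetical:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
events = client.analytics.click_events

# Documents in one collection need not share a schema: new fields can
# appear without a migration, which is central to NoSQL's appeal for
# fast-evolving big data pipelines.
events.insert_many([
    {"user": "u1", "page": "/home", "ms_on_page": 1200},
    {"user": "u2", "page": "/pricing", "ms_on_page": 8400,
     "referrer": "ad-campaign-7"},
])

# Queries still work on whatever fields exist in each document.
for doc in events.find({"ms_on_page": {"$gt": 5000}}):
    print(doc["user"], doc["page"])
```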

Summary

The rate of change in the data engineering industry is alternately exciting and exhausting. Joe Crobak found his way into the work of data management by accident, as so many of us do. After becoming engrossed with researching the details of distributed systems and big data management for his work, he began sharing his findings with friends. This led to his creation of the Hadoop Weekly newsletter, which he recently rebranded as the Data Engineering Weekly newsletter. In this episode he discusses his experiences working as a data engineer in industry and at the USDS, his motivations and methods for creating a newsletter, and the insights that he has gleaned from it.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. Your host is Tobias Macey and today I’m interviewing Joe Crobak about his work maintaining the Data Engineering Weekly newsletter, and the challenges of keeping up with the data engineering industry.

Interview

Introduction
How did you get involved in the area of data management?
What are some of the projects that you have been involved in that were most personally fulfilling?
As an engineer at the USDS working on the healthcare.gov and Medicare systems, what were some of the approaches that you used to manage sensitive data?
Healthcare.gov has a storied history; how did the systems for processing and managing the data get architected to handle the amount of load that they were subjected to?
What was your motivation for starting a newsletter about the Hadoop space?
Can you speak to your reasoning for the recent rebranding of the newsletter?
How much of the content that you surface in your newsletter is found during your day-to-day work, versus explicitly searching for it?
After over 5 years of following the trends in data analytics and data infrastructure, what are some of the most interesting or surprising developments?
What have you found to be the fundamental skills or areas of experience that have maintained relevance as new technologies in data engineering have emerged?
What is your workflow for finding and curating the content that goes into your newsletter?
What is your personal algorithm for filtering which articles, tools, or commentary gets added to the final newsletter?
How has your experience managing the newsletter influenced your areas of focus in your work, and vice versa?
What are your plans going forward?

Contact Info

Data Eng Weekly Email Twitter – @joecrobak Twitter – @dataengweekly

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

USDS National Labs Cray Amazon EMR (Elastic Map-Reduce) Recommendation Engine Netflix Prize Hadoop Cloudera Puppet healthcare.gov Medicare Quality Payment Program HIPAA NIST National Institute of Standards and Technology PII (Personally Identifiable Information) Threat Modeling Apache JBoss Apache Web Server MarkLogic JMS (Java Message Service) Load Balancer COBOL Hadoop Weekly Data Engineering Weekly Foursquare NiFi Kubernetes Spark Flink Stream Processing DataStax RSS The Flavors of Data Science and Engineering CQRS Change Data Capture Jay Kreps

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA. Support Data Engineering Podcast

Creating a Data-Driven Enterprise in Media

The data-driven revolution is finally hitting the media and entertainment industry. For decades, broadcast television and print media relied on traditional delivery channels for solvency and growth, but those channels fragmented as cable, streaming, and digital devices stole the show. In this ebook, you’ll learn about the trends, challenges, and opportunities facing players in this industry as they tackle big data, advanced analytics, and DataOps. You’ll explore best practices and lessons learned from three real-world media companies—Sling TV, Turner Broadcasting, and Comcast—as they proceed on their data-driven journeys. Along the way, authors Ashish Thusoo and Joydeep Sen Sarma explain how DataOps breaks down silos and connects everyone who handles data, including engineers, data scientists, analysts, and business users. Big-data-as-a-service provider Qubole provides a five-step maturity model that outlines the phases that a company typically goes through when it first encounters big data. Case studies include: Sling TV: this live streaming content platform delivers live TV and on-demand entertainment instantly to a variety of smart televisions, tablets, game consoles, computers, smartphones, and streaming devices Turner Broadcasting System: this Time Warner division recently created the Turner Data Cloud to support direct-to-consumer services, including FilmStruck, Boom (for kids), and NBA League Pass Comcast: the largest broadcasting and cable TV company is building a single integrated big data platform to deliver internet, TV, and voice to more than 28 million customers

Implementing IBM FlashSystem V9000 AE3

Abstract The success or failure of businesses often depends on how well organizations use their data assets for competitive advantage. Deeper insights from data require better information technology. As organizations modernize their IT infrastructure to boost innovation rather than limit it, they need a data storage system that can keep pace with several areas that affect your business: highly virtualized environments, cloud computing, mobile and social systems of engagement, and in-depth, real-time analytics.

Making the correct decision on storage investment is critical. Organizations must have enough storage performance and agility to innovate when they need to implement cloud-based IT services, deploy virtual desktop infrastructure, enhance fraud detection, and use new analytics capabilities. At the same time, future storage investments must lower IT infrastructure costs while helping organizations to derive the greatest possible value from their data assets.

The IBM® FlashSystem V9000 is the premier, fully integrated, Tier 1, all-flash offering from IBM. It has changed the economics of today's data center by eliminating storage bottlenecks. Its software-defined storage features simplify data management, improve data security, and preserve your investments in storage. The IBM FlashSystem® V9000 SAS expansion enclosures provide new tiering options with read-intensive SSDs or nearline SAS HDDs. IBM FlashSystem V9000 includes IBM FlashCore® technology and advanced software-defined storage available in one solution in a compact 6U form factor. IBM FlashSystem V9000 improves business application availability. It delivers greater resource utilization so you can get the most from your storage resources, and achieve a simpler, more scalable, and cost-efficient IT infrastructure.

This IBM Redbooks® publication provides information about IBM FlashSystem V9000 Software V8.1. It describes the core product architecture, software, hardware, and implementation, and provides hints and tips. The underlying basic hardware and software architecture and features of the IBM FlashSystem V9000 AC3 control enclosure and IBM Spectrum Virtualize 8.1 software are described in these publications: Implementing IBM FlashSystem 900 Model AE3, SG24-8414, and Implementing the IBM System Storage SAN Volume Controller V7.4, SG24-7933.

Using IBM FlashSystem V9000 software functions, management tools, and interoperability combines the performance of the IBM FlashSystem architecture with the advanced functions of software-defined storage to deliver performance, efficiency, and functions that meet the needs of enterprise workloads that demand IBM MicroLatency® response time. This book offers IBM FlashSystem V9000 scalability concepts and guidelines for planning, installing, and configuring, which can help environments scale up and out to add more flash capacity and expand virtualized systems. Port utilization methodologies are provided to help you maximize the full potential of IBM FlashSystem V9000 performance and low latency in your scalable environment.

This book is intended for pre-sales and post-sales technical support professionals, storage administrators, and anyone who wants to understand how to implement this exciting technology.

podcast_episode
by Val Kroll, Julie Hoyer, Tim Wilson (Analytics Power Hour - Columbus, OH), Moe Kiss (Canva), Michael Helbling (Search Discovery)

Have you ever walked out of a meeting with a clear idea of the analysis that you're going to conduct, only to find yourself three days later staring at an endless ocean of crunched data and wondering in which direction you're supposed to be paddling your analysis boat? That might not be an ocean. It might be an analytics rabbit hole. In this episode, the gang explores the Analysis of Competing Hypotheses approach developed by Richards Heuer as part of his work with the CIA, inductive versus deductive reasoning, and engaging stakeholders as a mechanism for focusing an analysis. Ironically, our intrepid hosts had a really hard time avoiding topical rabbit holes during the episode. But, acknowledging the problem is the first part of the solution! For complete show notes, including links to items mentioned in this show and a transcript of the discussion, visit the show page.