talk-data.com

Topic

Data Analytics

data_analysis · statistics · insights

760 tagged

Activity Trend

38 peak/qtr · 2020-Q1 to 2026-Q1

Activities

760 activities · Newest first

Jumpstart Snowflake: A Step-by-Step Guide to Modern Cloud Analytics

Explore the modern market of data analytics platforms and the benefits of using Snowflake, the data warehouse built for the cloud. With the rise of cloud technologies, organizations increasingly deploy their analytics with cloud providers such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform. Cloud vendors offer modern data platforms for building analytics solutions that collect data, consolidate it into a single store, and deliver insights to business users. The core of any analytics framework is the data warehouse, and until recently customers had few platform choices. Snowflake was built specifically for the cloud, and it is a true game changer for the analytics market. This book helps onboard you to Snowflake, presents best practices for deploying and using the Snowflake data warehouse, and covers modern analytics architecture and use cases, including integration with leading analytics software such as Matillion ETL, Tableau, and Databricks. Finally, it covers migration scenarios for on-premises legacy data warehouses. What You Will Learn: Know the key functionalities of Snowflake; set up security and access controls for your cluster; bulk load data into Snowflake using the COPY command; migrate from a legacy data warehouse to Snowflake; integrate the Snowflake data platform with modern business intelligence (BI) and data integration tools. Who This Book Is For: Those working with data warehouse and business intelligence (BI) technologies, and existing and potential Snowflake users.
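As a quick illustration of the bulk-loading workflow mentioned above, here is a minimal sketch that issues a Snowflake COPY INTO statement through the Snowflake Python connector. The account, credentials, warehouse, table, and stage names are all placeholders, not values from the book.

import snowflake.connector

# Connect to a hypothetical Snowflake account; every parameter below is a placeholder.
conn = snowflake.connector.connect(
    account="your_account",
    user="your_user",
    password="your_password",
    warehouse="ANALYTICS_WH",
    database="ANALYTICS_DB",
    schema="PUBLIC",
)

try:
    cur = conn.cursor()
    # Bulk load every CSV file staged under @sales_stage into the SALES table,
    # skipping each file's header row.
    cur.execute("""
        COPY INTO SALES
        FROM @sales_stage
        FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
        ON_ERROR = 'ABORT_STATEMENT'
    """)
    print(cur.fetchall())  # per-file load status returned by Snowflake
finally:
    conn.close()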

Send us a text Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. Abstract This week on Making Data Simple, we have a joint finale for the series Stories from the Field. Hosts Al Martin and Wennie Allen have a discussion with Gordon Johnson, Global Head of Optimization for DHL. We get an insider's perspective on data within the shipping and logistics world, helping optimize shipping methods to get medical supplies where they are needed most. Connect with Gordon LinkedIn Connect with Wennie LinkedIn Big Data Hub Show Notes 02:20 - Learn more here about how big data analytics is making an impact at DHL. 09:43 - Check out this article on how AI changes the Logistics Industry. 17:43 - Find out more about how machine learning is changing supply chain management here. 20:33 - Discover what incubators are all about here. Connect with the Team Producer Liam Seston - LinkedIn. Producer Lana Cosic - LinkedIn. Producer Meighann Helene - LinkedIn. Producer Mark Simmonds - LinkedIn. Host Al Martin - LinkedIn and Twitter. Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Big Data Analytics Methods

Big Data Analytics Methods unveils the secrets of advanced analytics techniques ranging from machine learning, random forest classifiers, predictive modeling, cluster analysis, and natural language processing (NLP) to Kalman filtering and ensembles of models for optimal accuracy of analysis and prediction. More than 100 analytics techniques and methods give big data professionals, business intelligence professionals, and citizen data scientists insight into how to overcome challenges and avoid common pitfalls and traps in data analytics. The book offers solutions and tips on handling missing data, noisy and dirty data, error reduction, and boosting signal to reduce noise. It discusses data visualization, prediction, optimization, artificial intelligence, regression analysis, the Cox hazard model, and many other analytics techniques, using case examples with applications in the healthcare, transportation, retail, telecommunication, consulting, manufacturing, energy, and financial services industries. This book's state-of-the-art treatment of advanced data analytics methods and important best practices will help readers succeed in data analytics.
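To make one of the listed techniques concrete, here is a small, self-contained sketch (not taken from the book) that trains a random forest classifier on a synthetic dataset with scikit-learn and reports held-out accuracy.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic classification data standing in for a real business dataset.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# An ensemble of 200 decision trees; averaging their votes reduces variance.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("held-out accuracy:", round(accuracy_score(y_test, pred), 3))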

SQL Server Big Data Clusters: Early First Edition Based on Release Candidate 1

Get a head start on learning one of SQL Server 2019’s latest and most impactful features—Big Data Clusters—which combines large volumes of non-relational data for analysis with data stored relationally inside a SQL Server database. This book provides a first look at Big Data Clusters based upon SQL Server 2019 Release Candidate 1. Start now and get a jump on your competition in learning this important new feature. Big Data Clusters is a feature set covering data virtualization, distributed computing, and relational databases, and it provides a complete AI platform across the entire cluster environment. This book shows you how to deploy, manage, and use Big Data Clusters. For example, you will learn how to combine data stored on the HDFS file system with data stored inside the SQL Server instances that make up the Big Data Cluster. Filled with clear examples and use cases, this book provides everything necessary to get started working with Big Data Clusters in SQL Server 2019 using Release Candidate 1. You will learn about the architectural foundations, which are made up of Kubernetes, Spark, HDFS, and SQL Server on Linux. You are then shown how to configure and deploy Big Data Clusters on premises or in the cloud. Next, you are taught about querying. You will learn to write queries in Transact-SQL—taking advantage of skills you have honed for years—and with those queries you will be able to examine and analyze data from a wide variety of sources such as Apache Spark. Through the theoretical foundation provided in this book and easy-to-follow example scripts and notebooks, you will be ready to use and unveil the full potential of SQL Server 2019: combining different types of data spread across widely disparate sources into a single view that is useful for business intelligence and machine learning analysis. What You Will Learn: Install, manage, and troubleshoot Big Data Clusters in cloud or on-premises environments; analyze large volumes of data directly from SQL Server and/or Apache Spark; manage data stored in HDFS from SQL Server as if it were relational data; implement advanced analytics solutions through machine learning and AI; expose different data sources as a single logical source using data virtualization. Who This Book Is For: Data engineers, data scientists, data architects, and database administrators who want to employ data virtualization and big data analytics in their environments.
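Because Big Data Clusters are queried with ordinary Transact-SQL, any standard SQL Server driver can reach them. The sketch below, which assumes a hypothetical cluster endpoint and a pre-created external table over HDFS files, uses pyodbc to join relational and HDFS-backed data in one query; none of the names come from the book.

import pyodbc

# Hypothetical SQL Server master instance endpoint and credentials.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=bdc-master.example.com,31433;"
    "DATABASE=Sales;UID=sa;PWD=YourStrongPassword"
)

# dbo.WebClicks_HDFS is assumed to be an external table defined over files in the
# cluster's HDFS storage pool; dbo.Customers is a regular relational table.
query = """
SELECT TOP (10) c.CustomerName, COUNT(*) AS clicks
FROM dbo.Customers AS c
JOIN dbo.WebClicks_HDFS AS w
  ON c.CustomerID = w.CustomerID
GROUP BY c.CustomerName
ORDER BY clicks DESC;
"""

for row in conn.cursor().execute(query):
    print(row.CustomerName, row.clicks)

conn.close()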

Reporting, Predictive Analytics, and Everything in Between

Business decisions today are tactical and strategic at the same time. How do you respond to a competitor’s price change? Or to specific technology changes? What new products, markets, or businesses should you pursue? Decisions like these are based on information from only one source: data. With this practical report, technical and non-technical leaders alike will explore the fundamental elements necessary to embark on a data analytics initiative. Is your company planning or contemplating a data analytics initiative? Authors Brett Stupakevich, David Sweenor, and Shane Swiderek from TIBCO guide you through several analytics options. IT leaders, product developers, analytics leaders, data analysts, data scientists, and business professionals will learn how to deploy analytic components in streaming and embedded systems using one of five platforms. You’ll examine: Analytics platforms including embedded BI, reporting, data exploration & discovery, streaming BI, and data science & machine learning The business problems each option solves and the capabilities and requirements of each How to identify the right analytics type for your particular use case Key considerations and the level of investment for each analytics platform

One of the hardest parts of running a data analytics program inside a large organization is governing data and reports. It’s simply too easy for the definitions of core data elements and metrics to get out of sync and for reports to contain conflicting information.

Angie Davis has straddled both the business and IT worlds for more than 20 years. She served as a business analyst in several organizations before switching to the information technology side of the business where she ran analytics teams, first at JD Irving for six years and more recently at Brookfield Renewable where she is an IT director. Angie has a degree in mathematics and electrical engineering from Dalhousie University in Halifax, Nova Scotia.

SQL Server 2019 Revealed: Including Big Data Clusters and Machine Learning

Get up to speed on the game-changing developments in SQL Server 2019. No longer just a database engine, SQL Server 2019 is cutting edge, with support for machine learning (ML), big data analytics, Linux, containers, Kubernetes, Java, and data virtualization to Azure. This is not a book on traditional database administration for SQL Server. It focuses on all that is new for one of the most successful modernized data platforms in the industry. It is a book for data professionals who already know the fundamentals of SQL Server and want to up their game by building their skills in some of the hottest new areas in technology. SQL Server 2019 Revealed begins with a look at the project team's goal of integrating the world of big data with SQL Server in a major product release. The book then dives into the details of key new capabilities in SQL Server 2019 using a “learn by example” approach for Intelligent Performance, security, mission-critical availability, and features for the modern developer. Also covered are enhancements to SQL Server 2019 for Linux, along with a comprehensive look at SQL Server using containers and Kubernetes clusters. The book concludes by showing you how to virtualize your data access with PolyBase to Oracle, MongoDB, Hadoop, and Azure, allowing you to reduce the need for expensive extract, transform, and load (ETL) applications. You will then learn how to take your knowledge of containers, Kubernetes, and PolyBase to build a comprehensive solution called Big Data Clusters, a marquee feature of SQL Server 2019. You will also learn how to gain access to Spark, SQL Server, and HDFS to build intelligence over your own data lake and deploy end-to-end machine learning applications. What You Will Learn: Implement Big Data Clusters with SQL Server, Spark, and HDFS; create a Data Hub with connections to Oracle, Azure, Hadoop, and other sources; combine SQL and Spark to build a machine learning platform for AI applications; boost your performance with no application changes using Intelligent Performance; increase the security of your SQL Server through Secure Enclaves and Data Classification; maximize database uptime through online indexing and Accelerated Database Recovery; build new modern applications with Graph, ML Services, and T-SQL extensibility with Java; improve your ability to deploy SQL Server on Linux; gain in-depth knowledge to run SQL Server with containers and Kubernetes; know all the new database engine features for performance, usability, and diagnostics; use the latest tools and methods to migrate your database to SQL Server 2019; apply your knowledge of SQL Server 2019 to Azure. Who This Book Is For: IT professionals and developers who understand the fundamentals of SQL Server and wish to focus on learning about the new, modern capabilities of SQL Server 2019. The book is for those who want to learn about SQL Server 2019 and the new Big Data Clusters and AI feature set, support for machine learning and Java, how to run SQL Server with containers and Kubernetes, and increased capabilities around Intelligent Performance, advanced security, and high availability.

Model Management and Analytics for Large Scale Systems

Model Management and Analytics for Large Scale Systems covers the use of models and related artefacts (such as metamodels and model transformations) as central elements for tackling the complexity of building systems and managing data. With their increased use across diverse settings, the complexity, size, multiplicity, and variety of those artefacts have increased. Originally developed for software engineering, these approaches can now be used to simplify the analytics of large-scale models and automate complex data analysis processes. Those in the field of data science will gain novel insights on the topic of model analytics that go beyond both model-based development and data analytics. This book is aimed at both researchers and practitioners who are interested in model-based development and the analytics of large-scale models, ranging from big data management and analytics to enterprise domains. The book could also be used in graduate courses on model development, data analytics, and data management. Identifies key problems and offers solution approaches and tools that have been developed or are necessary for model management and analytics. Explores basic theory and background, current research topics, related challenges, and research directions for model management and analytics. Provides a complete overview of model management and analytics frameworks; the different types of analytics (descriptive, diagnostic, predictive, and prescriptive); the required modelling and method steps; and important future directions.

How do you organize a data analytics program to maximize value for the organization? Although there is no right or wrong way to do this, several patterns emerge when you examine successful organizations.

Originally published at https://www.eckerson.com/articles/organizing-for-success-part-ii-how-to-organize-a-data-analytics-program

Real-Time Data Analytics for Large Scale Sensor Data

Real-Time Data Analytics for Large-Scale Sensor Data covers the theory and applications of hardware platforms and architectures; the development of software methods, techniques, and tools; applications; governance; and adoption strategies for the use of massive sensor data in real-time data analytics. It presents the leading-edge research in the field and identifies future challenges in this fledgling research area. The book captures the essence of real-time IoT-based solutions that require a multidisciplinary approach to on-the-fly processing, including methods for high-performance stream processing, adaptive stream adjustment, uncertainty handling, latency handling, and more. Examines IoT applications, the design of real-time intelligent systems, and how to manage the rapid growth of large volumes of sensor data. Discusses intelligent management systems for applications such as healthcare, robotics, and environment modeling. Provides a focused approach to the design and implementation of real-time intelligent systems for the management of sensor data in large-scale environments.
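As a toy illustration of the on-the-fly processing this book addresses, the sketch below (not from the book) computes a sliding-window average over a simulated sensor stream and flags readings that deviate sharply from it.

import random
from collections import deque

WINDOW = 10                      # keep only the last 10 readings in memory
readings = deque(maxlen=WINDOW)

def simulated_sensor():
    """Yield noisy temperature readings forever, standing in for a real sensor feed."""
    while True:
        yield 20.0 + random.gauss(0, 0.5)

stream = simulated_sensor()
for i in range(50):
    value = next(stream)
    readings.append(value)
    rolling_avg = sum(readings) / len(readings)
    # A real system would add latency and uncertainty handling here; this sketch
    # simply flags values far from the current window average.
    if abs(value - rolling_avg) > 1.0:
        print(f"reading {i}: {value:.2f} deviates from window average {rolling_avg:.2f}")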

Summary Data engineers are responsible for building tools and platforms to power the workflows of other members of the business. Each group of users has their own set of requirements for how they access and interact with those platforms, depending on the insights they are trying to gather. Benn Stancil is the chief analyst at Mode Analytics, and in this episode he explains the set of considerations and requirements that data analysts need in their tools. He also explains useful patterns for collaboration between data engineers and data analysts, and what they can learn from each other.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too, with worldwide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show! You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, and Data Council. Upcoming events include the O’Reilly AI conference, the Strata Data conference, the combined events of the Data Architecture Summit and Graphorum, and Data Council in Barcelona. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today. Your host is Tobias Macey, and today I’m interviewing Benn Stancil, chief analyst at Mode Analytics, about what data engineers need to know when building tools for analysts.

Interview

Introduction How did you get involved in the area of data management? Can you start by describing some of the main features that you are looking for in the tools that you use? What are some of the common shortcomings that you have found in out-of-the-box tools that organizations use to build their data stack? What should data engineers be considering as they design and implement the foundational data platforms that higher order systems are built on, which are ultimately used by analysts and data scientists?

In terms of mindset, what are the ways that data engineers and analysts can align and where are the points of conflict?

In terms of team and organizational structure, what have you found to be useful patterns for reducing friction in the product lifecycle for data tools (internal or external)? What are some anti-patterns that data engineers can guard against as they are designing their pipelines? In your experience as an analyst, what have been the characteristics of the most seamless projects that you have been involved with? How much understanding of analytics is necessary for data engineers to be successful in their projects and careers?

Conversely, how much understanding of data management should analysts have?

What are the industry trends that you are most excited by as an analyst?

Contact Info

LinkedIn @bennstancil on Twitter Website

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for

Advanced Elasticsearch 7.0

Dive deep into the advanced capabilities of Elasticsearch 7.0 with this expert-level guide. In this book, you will explore the most effective techniques and tools for building, indexing, and querying advanced distributed search engines. Whether optimizing performance, scaling applications, or integrating with big data analytics, this guide empowers you with practical skills and insights. What this Book will help me do Master ingestion pipelines and preprocess documents for faster and more efficient indexing. Model search data optimally for complex and varied real-world applications. Perform exploratory data analyses using Elasticsearch's robust features. Integrate Elasticsearch with modern analytics platforms like Kibana and Logstash. Leverage Elasticsearch with Apache Spark and machine learning libraries for real-time advanced analytics. Author(s) Wong is a seasoned Elasticsearch expert with years of real-world experience developing enterprise-grade search and analytics systems. With a passion for innovation and teaching, Wong enjoys breaking down complex technical concepts into digestible learning experiences. His work reflects a pragmatic and results-driven approach to teaching Elasticsearch. Who is it for? This book is ideal for Elasticsearch developers and data engineers with some prior experience who are looking to elevate their skills to an advanced level. It suits professionals seeking to enhance their expertise in building scalable search and analytics solutions. If you aim to master sophisticated Elasticsearch operations and real-time integrations, this book is tailored for you.
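For readers who want a feel for the client-side basics before the advanced material, here is a hedged sketch using the official Elasticsearch 7.x Python client against a node assumed to be running at localhost:9200; the index name and documents are illustrative only.

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# Index two documents; an ingest pipeline defined on the cluster could be applied
# by passing its name via the `pipeline` parameter.
es.index(index="articles", id=1, body={"title": "Scaling search", "views": 120})
es.index(index="articles", id=2, body={"title": "Search relevance tuning", "views": 45})
es.indices.refresh(index="articles")

# Full-text match query combined with a simple sum aggregation.
resp = es.search(index="articles", body={
    "query": {"match": {"title": "search"}},
    "aggs": {"total_views": {"sum": {"field": "views"}}},
})
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["title"], hit["_score"])
print("total views:", resp["aggregations"]["total_views"]["value"])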

SQL for Data Analytics

SQL for Data Analytics provides readers with the tools and knowledge to use SQL effectively for extracting, analyzing, and interpreting complex datasets. Whether you're working with time-series data, geospatial data, or textual data, this book combines insightful explanations with practical guidance to enhance your data analysis capabilities. What this Book will help me do Perform advanced statistical calculations using SQL window functions. Develop and optimize queries for better performance and faster results. Analyze and work with geospatial, time-series, and text datasets effectively. Debug problematic SQL queries and ensure their correctness. Create robust SQL pipelines and integrate them with other analytics tools. Author(s) The authors of SQL for Data Analytics, Upom Malik, Matt Goldwasser, and Benjamin Johnston, are seasoned professionals experienced in both the practical and theoretical aspects of SQL and data analysis. They bring their collective expertise to guide readers through the essentials and advanced usage of SQL in analytics. Who is it for? This book is aimed at database engineers aspiring to delve into analytics, backend developers wanting to improve their data handling skills, and data professionals aiming to enhance their SQL proficiency. A basic understanding of SQL and databases will help readers follow along and maximize their learning.
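As a taste of the window-function material, the sketch below runs a 3-day moving average with Python's built-in sqlite3 module (window functions require SQLite 3.25 or newer, which ships with recent Python releases); the table and figures are made up.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_sales (day TEXT, amount REAL);
    INSERT INTO daily_sales VALUES
        ('2024-01-01', 100), ('2024-01-02', 120), ('2024-01-03', 90),
        ('2024-01-04', 150), ('2024-01-05', 130);
""")

# A classic window-function calculation: the 3-day moving average of sales.
rows = conn.execute("""
    SELECT day,
           amount,
           AVG(amount) OVER (
               ORDER BY day
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS moving_avg_3d
    FROM daily_sales
    ORDER BY day;
""").fetchall()

for day, amount, moving_avg in rows:
    print(day, amount, round(moving_avg, 1))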

Summary Managing big data projects at scale is a perennial problem, with a wide variety of solutions that have evolved over the past 20 years. One of the early entrants, which predates Hadoop and has since been open sourced, is the HPCC (High Performance Computing Cluster) system. Designed as a fully integrated platform to meet the needs of enterprise-grade analytics, it provides a solution for the full lifecycle of data at massive scale. In this episode Flavio Villanustre, VP of infrastructure and products at HPCC Systems, shares the history of the platform, how it is architected for scale and speed, and the unique solutions that it provides for enterprise-grade data analytics. He also discusses the motivations for open sourcing the platform, the detailed workflow that it enables, and how you can try it for your own projects. This was an interesting view of how a well-engineered product can survive massive evolutionary shifts in the industry while remaining relevant and useful.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too, with worldwide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show! To connect with the startups that are shaping the future and take advantage of the opportunities that they provide, check out AngelList, where you can invest in innovative businesses, find a job, or post a position of your own. Sign up today at dataengineeringpodcast.com/angel and help support this show. You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, and Data Council. Upcoming events include the O’Reilly AI conference, the Strata Data conference, the combined events of the Data Architecture Summit and Graphorum, and Data Council in Barcelona. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. To help other people find the show, please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat. Your host is Tobias Macey, and today I’m interviewing Flavio Villanustre about the HPCC Systems project and his work at LexisNexis Risk Solutions.

Interview

Introduction How did you get involved in the area of data management? Can you start by describing what the HPCC system is and the problems that you were facing at LexisNexis Risk Solutions which led to its creation?

What was the overall state of the data landscape at the time and what was the motivation for releasing it as open source?

Can you describe the high level architecture of the HPCC Systems platform and some of the ways that the design has changed over the years that it has been maintained? Given how long the project has been in use, c

podcast_episode
by Daniel Kirsch (Hurwitz & Associates), Judith Hurwitz (Hurwitz & Associates), Al Martin (IBM)

Send us a text The authors of Machine Learning for Dummies, Judith Hurwitz and Daniel Kirsch, are here to help you. In this episode, Judith, Daniel, and Al discuss the state of machine learning today, how to use it to advance your business, as well as discoveries they made while writing their book. Learn how small and large businesses alike can find insights from data to enhance relationships with customers. We’ll also share where you can get a copy of Machine Learning for Dummies at no cost. Show notes 01.00 Connect with Al Martin on Twitter and LinkedIn. 01.10 Connect with Kate Nichols on Twitter and LinkedIn. 01.15 Connect with Fatima Sirhindi on Twitter and LinkedIn. 02.00 Learn more about Hurwitz & Associates. 02.10 Connect with Judith Hurwitz on Twitter, LinkedIn and find her blog here. 03.20 Connect with Daniel Kirsch on Twitter and Hurwitz & Associates. 04.00 Read Machine Learning for Dummies by Judith Hurwitz and Daniel Kirsch. 04.40 Learn what neural nets are here. 04.50 Learn more about Arthur Samuel here. 05.00 Learn more about how Deep Blue beat the world chess champion. 15.39 Learn more about Apache Hadoop. 17.30 Learn more about IBM Watson. 26.50 Find Cognitive Computing and Big Data Analytics by Judith Hurwitz, Marcia Kaufman and Adrian Bowles. 27.45 Find Everybody Lies: Big Data, New Data and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz. Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Hands-On Data Analysis with Pandas

Hands-On Data Analysis with Pandas provides an intensive dive into mastering the pandas library for data science and analysis using Python. Through a combination of conceptual explanations and practical demonstrations, readers will learn how to manipulate, visualize, and analyze data efficiently. What this Book will help me do Understand and apply the pandas library for efficient data manipulation. Learn to perform data wrangling tasks such as cleaning and reshaping datasets. Create effective visualizations using pandas and libraries like matplotlib and seaborn. Grasp the basics of machine learning and implement solutions with scikit-learn. Develop reusable data analysis scripts and modules in Python. Author(s) Stefanie Molin is a seasoned data scientist and software engineer with extensive experience in Python and data analytics. She specializes in leveraging the latest data science techniques to solve real-world problems. Her engaging and detailed writing draws from her practical expertise, aiming to make complex concepts accessible to all. Who is it for? This book is ideal for data analysts and aspiring data scientists who are at the beginning stages of their careers or looking to enhance their toolset with pandas and Python. It caters to Python developers eager to delve into data analysis workflows. Readers should have some programming knowledge to fully benefit from the examples and exercises.
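The short sketch below shows the clean-aggregate-visualize loop the book centers on, using a made-up dataset rather than any example from the text.

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "city": ["NYC", "NYC", "Boston", "Boston", "NYC", None],
    "temp_c": [21.5, 23.0, 18.5, None, 22.1, 20.0],
})

# Wrangle: drop rows missing the grouping key, fill missing temperatures with the mean.
clean = df.dropna(subset=["city"]).copy()
clean["temp_c"] = clean["temp_c"].fillna(clean["temp_c"].mean())

# Aggregate and visualize.
summary = clean.groupby("city")["temp_c"].agg(["mean", "max", "count"])
print(summary)
summary["mean"].plot(kind="bar", title="Average temperature by city")
plt.tight_layout()
plt.show()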

Data Warehousing with Greenplum, 2nd Edition

Data professionals are confronting the most disruptive change since relational databases appeared in the 1980s. SQL is still a major tool for data analytics, but conventional relational database management systems can’t handle the increasing size and complexity of today’s datasets. This updated edition teaches you best practices for Greenplum Database, the open source massively parallel processing (MPP) database that accommodates large sets of nonrelational and relational data. Marshall Presser, field CTO at Pivotal, introduces Greenplum’s approach to data analytics and data-driven decisions, beginning with its shared-nothing architecture. IT managers, developers, data analysts, system architects, and data scientists will all gain from exploring data organization and storage, data loading, running queries, and learning to perform analytics in the database. Discover how MPP and Greenplum will help you go beyond the traditional data warehouse. This ebook covers: Greenplum features, use case examples, and techniques for optimizing use Four Greenplum deployment options to help you balance security, cost, and time to usability Why each networked node in Greenplum’s architecture includes an independent operating system, memory, and storage Additional tools for monitoring, managing, securing, and optimizing query responses in the Pivotal Greenplum commercial database
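Because Greenplum speaks the PostgreSQL protocol, standard drivers such as psycopg2 work against it. The hedged sketch below creates a table hash-distributed across the shared-nothing segments described above and runs a simple aggregate; the host, credentials, and table are placeholders, not examples from the ebook.

import psycopg2

# Hypothetical Greenplum master host and credentials.
conn = psycopg2.connect(
    host="gp-master.example.com", port=5432,
    dbname="analytics", user="gpadmin", password="secret",
)
conn.autocommit = True
cur = conn.cursor()

# DISTRIBUTED BY tells Greenplum how to spread rows across segment nodes.
cur.execute("""
    CREATE TABLE IF NOT EXISTS page_views (
        view_ts  TIMESTAMP,
        user_id  BIGINT,
        url      TEXT
    )
    DISTRIBUTED BY (user_id);
""")

# Each segment scans and aggregates its own slice of the data in parallel.
cur.execute("""
    SELECT date_trunc('day', view_ts) AS day, COUNT(*) AS views
    FROM page_views
    GROUP BY 1
    ORDER BY 1;
""")
for day, views in cur.fetchall():
    print(day, views)

conn.close()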

Are you still struggling to create and build a good social media presence in the data space? This week, I talk with real-life data analytics and BI rockstar Kate Strachnyi. Kate is a program manager and consultant in data analytics, as well as an author, blogger, and LinkedIn "Top Voice" in data analytics with more than 50,000 followers. In this knowledge-filled BI masterclass, Kate teaches you the importance of social media and the exact steps you can take to up your social media game and become a rockstar!

Sponsor

This exciting season of AOF is sponsored by our BI Data Storytelling Mastery Accelerator 3-Day Live workshop. Our first workshop, coming up September 17-19, is 75% full! Join us and consider upgrading to a VIP. Many BI teams are still struggling to deliver consistent, highly engaging analytics their users love. At the end of three days, you'll leave with a clear BI delivery action plan. Register today!

Enjoyed the Show? Please leave us a review on iTunes. For all links and resources mentioned, visit: https://bibrainz.com/podcast/28