talk-data.com

Topic: Data Analytics
Tags: data_analysis, statistics, insights
760 tagged activities

Activity Trend: 38 peak/qtr (2020-Q1 to 2026-Q1)

Activities
760 activities · Newest first

Data Literacy is increasingly becoming a skill that every role needs, regardless of whether the role is data-oriented or not. No one knows this better than Jordan Morrow, who is known as the Godfather of Data Literacy.

Jordan is the VP and Head of Data Analytics at Brainstorm, Inc., and is the author of Be Data Literate: The Skills Everyone Needs to Succeed. Jordan has been a fierce advocate for data literacy throughout his career, including helping the United Nations understand and utilize data literacy effectively.

Throughout the episode, we cover what data literacy is, why organizations need data literacy in order to use data properly and drive business impact, how to increase organizational data literacy, and more.

This episode of DataFramed is part of DataCamp’s Data Literacy Month, where we raise awareness of data literacy throughout the month of September through webinars, workshops, and resources featuring thought leaders and subject matter experts who can help you build your data literacy, as well as your organization’s. For more information, visit: https://www.datacamp.com/data-literacy-month/for-teams

Serverless ETL and Analytics with AWS Glue

Discover how to harness AWS Glue for your ETL and data analysis workflows with "Serverless ETL and Analytics with AWS Glue." This comprehensive guide introduces readers to the capabilities of AWS Glue, from building data lakes to performing advanced ETL tasks, allowing you to create efficient, secure, and scalable data pipelines with serverless technology.

What this Book will help me do:
  • Understand and utilize various AWS Glue features for data lake and ETL pipeline creation.
  • Leverage AWS Glue Studio and DataBrew for intuitive data preparation workflows.
  • Implement effective storage optimization techniques for enhanced data analytics.
  • Apply robust data security measures, including encryption and access control, to protect data.
  • Integrate AWS Glue with machine learning tools like SageMaker to build intelligent models.

Author(s): The authors of this book include experts across the fields of data engineering and AWS technologies. With backgrounds in data analytics, software development, and cloud architecture, they bring a depth of practical experience. Their approach combines hands-on tutorials with conceptual clarity, ensuring a blend of foundational knowledge and actionable insights.

Who is it for? This book is designed for ETL developers, data engineers, and data analysts who are familiar with data management concepts and want to extend their skills into serverless cloud solutions. If you're looking to master AWS Glue for building scalable and efficient ETL pipelines or are transitioning existing systems to the cloud, this book is ideal for you.
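
As a rough companion to the blurb, here is a minimal, hedged PySpark sketch of an AWS Glue job that reads a cataloged table and writes Parquet to S3. The database, table, and S3 path names are invented placeholders, not taken from the book.

```python
# Minimal AWS Glue job sketch (assumes it runs inside the Glue job runtime,
# where the awsglue library is available). Names below are placeholders.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a table registered in the Glue Data Catalog (placeholder names).
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="example_db", table_name="raw_events"
)

# A trivial transform: drop records with null fields before writing.
cleaned = dyf.toDF().dropna()

# Write the result back to S3 as Parquet (placeholder path).
cleaned.write.mode("overwrite").parquet("s3://example-bucket/curated/events/")

job.commit()
```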

SQL for Data Analytics - Third Edition

SQL for Data Analytics is an accessible guide to helping readers efficiently use SQL for data analytics tasks. You will learn the ins and outs of writing SQL queries, preparing datasets, and utilizing advanced features like geospatial data handling and window functions. Demystify the process of harnessing SQL to tackle analytical data challenges in a structured and hands-on way.

What this Book will help me do:
  • Become proficient in preparing and managing datasets using SQL.
  • Learn to write efficient SQL queries for summarizing and analyzing data.
  • Master advanced SQL features, including window functions and JSON handling.
  • Optimize SQL queries and automate analytical tasks for efficiency.
  • Gain practical experience analyzing data with real-world scenarios.

Author(s): The authors, Jun Shan, Matt Goldwasser, Upom Malik, and Benjamin Johnston, are experienced professionals in data analytics and database management. They bring a blend of technical expertise and practical insights to teaching SQL for analytics. Their collective knowledge ensures that the book caters to all levels, from foundational concepts to advanced techniques.

Who is it for? This book is ideal for database engineers transitioning into analytics, backend engineers looking to deepen their understanding of production data, and data scientists or business analysts seeking to boost their SQL analytics skills. Readers should have a basic grasp of SQL and familiarity with statistics and linear algebra to fully benefit from the contents.
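
As a small illustration of the window functions the book covers, here is a self-contained Python sketch using the standard-library sqlite3 module (window functions need SQLite 3.25 or later, bundled with recent Python versions). The table and column names are made up for the example, not taken from the book.

```python
# A tiny window-function example: a running total of daily sales.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2024-01-01", 100.0), ("2024-01-02", 250.0), ("2024-01-03", 75.0)],
)

# SUM(...) OVER (ORDER BY day) accumulates amounts in date order.
rows = conn.execute(
    """
    SELECT day,
           amount,
           SUM(amount) OVER (ORDER BY day) AS running_total
    FROM sales
    ORDER BY day
    """
).fetchall()

for day, amount, running_total in rows:
    print(day, amount, running_total)
```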

Money Ball is back! Nancy Hensley, Chief Marketing Officer for Stats Perform, gives us the latest on data analytics in sports. If you like sports, don't listen unless you have time to be entertained.

Show Notes
04:09 What does Money Ball look like now?
07:30 Mrs Chicago's personal update
08:40 Fan website: The Analyst
11:16 Stats Perform for the rest of us
17:25 Sports tech competitors
18:34 Monetizing data. $115M for NFL data! What?
27:44 Broadcaster and Pressbox

LinkedIn: https://www.linkedin.com/in/nancyhensley/
Website: https://statsperform.com/

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Codeless Time Series Analysis with KNIME

This book, "Codeless Time Series Analysis with KNIME," serves as your practical guide to mastering time series analysis using the KNIME Analytics Platform. By diving into this book, you'll explore a variety of statistical and machine learning techniques applied explicitly to real-world time series scenarios, helping you build predictive and analysis models effectively. What this Book will help me do Leverage KNIME's powerful tools to preprocess and prepare time series data for analysis. Visualize and dissect time series data into its components like trends and seasonality. Apply statistical models like ARIMA to analyze and forecast continuous data. Train and utilize neural networks including LSTM models for predictive analytics. Integrate external tools like Spark and H2O to enhance your forecasting workflows. Author(s) The authors, including experts from KNIME AG, Corey Weisinger, Maarit Widmann, and Daniele Tonini, collectively bring extensive experience in data analytics and time series modeling. Their expertise with KNIME's tools and real-world time series analysis applications ensures readers gain insights into practical, hands-on techniques. Who is it for? This book is ideally suited for data analysts and scientists eager to explore time series analysis through codeless methodologies. Beginners will benefit from the introductory explanations, while seasoned professionals will find value in the advanced topics and real-world examples. A basic understanding of the KNIME platform is recommended to get the most from this book.

Snowflake: The Definitive Guide

Snowflake's ability to eliminate data silos and run workloads from a single platform creates opportunities to democratize data analytics, allowing users at all levels within an organization to make data-driven decisions. Whether you're an IT professional working in data warehousing or data science, a business analyst or technical manager, or an aspiring data professional wanting to get more hands-on experience with the Snowflake platform, this book is for you. You'll learn how Snowflake users can build modern integrated data applications and develop new revenue streams based on data. Using hands-on SQL examples, you'll also discover how the Snowflake Data Cloud helps you accelerate data science by avoiding replatforming or migrating data unnecessarily. You'll be able to:

  • Efficiently capture, store, and process large amounts of data at an amazing speed
  • Ingest and transform real-time data feeds in both structured and semistructured formats and deliver meaningful data insights within minutes
  • Use Snowflake Time Travel and zero-copy cloning to produce a sensible data recovery strategy that balances system resilience with ongoing storage costs
  • Securely share data and reduce or eliminate data integration costs by accessing ready-to-query datasets available in the Snowflake Marketplace
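
To make the Time Travel and zero-copy cloning bullets concrete, here is a brief, hedged sketch using the snowflake-connector-python package; the account, credentials, and table names are placeholders, and the exact statements the book uses may differ.

```python
# Sketch: zero-copy clone and Time Travel query via the Snowflake Python connector.
# Requires: pip install snowflake-connector-python
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",       # placeholder
    user="my_user",             # placeholder
    password="my_password",     # placeholder
    warehouse="ANALYTICS_WH",   # placeholder
    database="SALES_DB",        # placeholder
    schema="PUBLIC",
)
cur = conn.cursor()

# Zero-copy clone: creates a new table that shares the original's storage.
cur.execute("CREATE TABLE orders_backup CLONE orders")

# Time Travel: query the table as it looked one hour (3600 seconds) ago.
cur.execute("SELECT COUNT(*) FROM orders AT (OFFSET => -3600)")
print(cur.fetchone())

cur.close()
conn.close()
```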

Cloud and Data Science Modernization of Veterans Affairs Financial Service Center with Azure Databricks

The Department of Veterans Affairs (VA) is home to over 420,000 employees, provides health care for 9.16 million enrollees and manages the benefits of 5.75 million recipients. The VA also hosts an array of financial management, professional, and administrative services at their Financial Service Center (FSC), located in Austin, Texas. The FSC is divided into various service groups organized around revenue centers and product lines, including the Data Analytics Service (DAS). To support the VA mission, in 2021 FSC DAS continued to press forward with their cloud modernization efforts, successfully achieving four key accomplishments:

  • Office of Community Care (OCC) Financial Time Series Forecast - Financial forecasting enhancements to predict claims
  • CFO Dashboard - Productivity and capability enhancements for financial and audit analytics
  • Datasets Migrated to the Cloud - Migration of on-prem datasets to the cloud for downstream analytics (includes a supply chain proof-of-concept)
  • Data Science Hackathon - A hackathon to predict bad claims codes and demonstrate DAS abilities to accelerate an ML use case using Databricks AutoML

This talk discusses FSC DAS’ cloud and data science modernization accomplishments in 2021, lessons learned, and what’s ahead.
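
The hackathon item above mentions Databricks AutoML; as a rough illustration of what kicking off such an experiment looks like, here is a hedged sketch of the Databricks AutoML Python API for a classification problem. The table name and target column are invented placeholders, and this is not the VA team's actual code.

```python
# Sketch: start an AutoML classification experiment on Databricks.
# Assumes this runs in a Databricks notebook where `spark` and databricks.automl exist.
from databricks import automl

# Placeholder table of historical claims with a label column marking bad claim codes.
claims_df = spark.table("claims_training_data")

summary = automl.classify(
    dataset=claims_df,
    target_col="is_bad_claim",   # placeholder label column
    timeout_minutes=60,
)

# The summary points at the best trial's MLflow run and generated notebook.
print(summary.best_trial.mlflow_run_id)
```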

Productionizing Ethical Credit Scoring Systems with Delta Lake, Feature Store and MLFlow

Fairness, Ethics, Accountability and Transparency (FEAT) are must-haves for high-stakes machine learning models. In particular, models within the Financial Services industry such as those that assign credit scores can impact people's access to housing and utilities and even influence their social standing. Hence, model developers have a moral responsibility to ensure that models do not systematically disadvantage any one group. Nevertheless, implementing such models in industrial settings remains challenging. A lack of concrete guidelines, common standards, and technical templates makes evaluating models from a FEAT perspective unfeasible. To address these implementation challenges, the Monetary Authority of Singapore (MAS) set up the Veritas Initiative to create a framework for operationalising the FEAT principles, so as to guide the responsible development of AIDA (Artificial Intelligence and Data Analytics) systems.

In January 2021, MAS announced the successful conclusion of Phase 1 of the Veritas Initiative. Deliverables included an assessment methodology for the Fairness principle and open source code for applying Fairness metrics to two use cases - customer marketing and credit scoring. In this talk, we demonstrate how these open-source examples, and their fairness metrics, might be put into production using open source tools such as Delta Lake and MLFlow. Although the Veritas Framework was developed in Singapore, the ethical framework is applicable across geographies.

By doing this, we illustrate how ethical principles can be operationalised, monitored and maintained in production, thus moving beyond only accuracy-based metrics of model performance and towards a more holistic and principled way of developing and productionizing machine learning systems.
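
As a rough sketch of what putting a fairness metric into production can look like with the open-source tools named above, here is a hedged Python example that computes a demographic parity difference on a Spark DataFrame and logs it with MLflow. The column names, Delta path, and the specific metric are illustrative assumptions, not the Veritas code itself.

```python
# Sketch: compute a simple fairness metric on scored data and track it with MLflow.
# Assumes an existing SparkSession named `spark` and that mlflow is installed/configured.
import mlflow
from pyspark.sql import functions as F

# Placeholder Delta table of model outputs: one row per applicant, with a
# protected-group flag (0/1) and the model's approve/decline decision (0/1).
scored = spark.read.format("delta").load("/mnt/credit/scored_applications")

rates = (
    scored.groupBy("protected_group")
    .agg(F.avg(F.col("approved").cast("double")).alias("approval_rate"))
    .collect()
)
rate_by_group = {row["protected_group"]: row["approval_rate"] for row in rates}

# Demographic parity difference: gap in approval rates between the two groups.
dp_diff = abs(rate_by_group[1] - rate_by_group[0])

with mlflow.start_run(run_name="fairness_monitoring"):
    mlflow.log_metric("demographic_parity_difference", dp_diff)
```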

Securing Databricks on AWS Using Private Link

Minimizing data transfers over the public internet is among the top priorities for organizations of any size, both for security and cost reasons. Modern cloud-native data analytics platforms need to support deployment architectures that meet this objective. For Databricks on AWS such an architecture is realized thanks to AWS PrivateLink, which allows computing resources deployed on different virtual private networks and different AWS accounts to communicate securely without ever crossing the public internet.

In this session, we want to provide a brief introduction to AWS PrivateLink and its main use cases in the context of a Databricks deployment: securing communications between the control plane and the data plane, and securely connecting to the Databricks Web UI. We will then provide a step-by-step walkthrough of setting up PrivateLink connections with a Databricks deployment and demonstrate how to automate that process using AWS CloudFormation or Terraform templates.

In this presentation we will cover the following topics:
  • Brief introduction to AWS PrivateLink
  • How you can use PrivateLink to secure your AWS Databricks deployment
  • Step-by-step walkthrough of how to set up PrivateLink
  • How to automate and scale the setup using AWS CloudFormation or Terraform
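
The talk automates this with CloudFormation or Terraform; as a rough Python equivalent, here is a hedged boto3 sketch that creates the interface VPC endpoint PrivateLink relies on. The region, VPC, subnet, and security group IDs, and the Databricks endpoint service name, are placeholders you would replace with the values published for your region and workspace.

```python
# Sketch: create an interface VPC endpoint toward a PrivateLink service with boto3.
# Requires: pip install boto3, plus AWS credentials with EC2 permissions.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",                       # placeholder VPC
    # Placeholder: use the Databricks-published endpoint service name for your region.
    ServiceName="com.amazonaws.vpce.us-east-1.vpce-svc-EXAMPLE",
    SubnetIds=["subnet-0123456789abcdef0"],              # placeholder subnet(s)
    SecurityGroupIds=["sg-0123456789abcdef0"],           # placeholder security group
)

print(response["VpcEndpoint"]["VpcEndpointId"])
```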

Simon Whiteley + Denny Lee Live Ask Me Anything

Simon and Denny Build A Thing is a live webshow, where Simon Whiteley (Advancing Analytics) and Denny Lee (Databricks) are building out a TV Ratings Analytics tool, working through the various challenges of building out a Data Lakehouse using Databricks. In this session, they'll be talking through their Lakehouse Platform, revisiting various pieces of functionality, and answering your questions, Live!

This is your chance to ask questions around structuring a lake for enterprise data analytics, the various ways we can use Delta Live Tables to simplify ETL or how to get started serving out data using Databricks SQL. We have a whole load of things to talk through, but we want to hear YOUR questions, which we can field from industry experience, community engagement and internal Databricks direction. There's also a chance we'll get distracted and talk about the Expanse for far too long.

Customer-centric Innovation to Scale Data & AI Everywhere

Imagine a world where you have the flexibility to infuse intelligence into every application, from edge to cloud. In this session, you will learn how Intel is enabling customer-centric innovation and delivering the simplicity, productivity, and performance that developers need to scale their data and AI solutions everywhere. An overview of Intel's end-to-end data analytics and AI technologies and developer tools, as well as examples of customer use cases, will be presented.

Correlation Over Causation: Cracking the Relationship Between User Engagement and User Happiness

As a head of product on the Confluence team at Atlassian, I own the metrics associated with user happiness. This is a common area of ownership for heads of product, GMs, and CEOs. But how do you actually use data to move the needle on user happiness, and how do you convert user activity and engagement insights into clear actions that end up positively impacting user happiness? In this talk, I would like to share the approach we developed jointly with our data analytics team to understand, operationalize, and report on our journey to make Confluence users happier. This talk will be useful for data analytics and data science practitioners, product executives, and anyone faced with the task of operationalizing improvement of a "fuzzy" metric like NPS or CSAT.
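
As a minimal illustration of the correlation-not-causation framing in the title, here is a hedged pandas sketch that correlates engagement signals with a happiness score; the column names and data are invented for the example and are not Atlassian's metrics.

```python
# Sketch: correlate engagement metrics with a survey-based happiness score.
# Requires: pip install pandas
import pandas as pd

# Invented per-user data: engagement signals plus a CSAT-style score (1-5).
df = pd.DataFrame(
    {
        "pages_created": [2, 10, 4, 0, 7, 3],
        "comments_made": [5, 20, 8, 1, 15, 6],
        "csat_score": [3, 5, 4, 2, 5, 3],
    }
)

# Pearson correlation of each engagement signal with the happiness score.
correlations = df.corr()["csat_score"].drop("csat_score")
print(correlations)

# Correlation only says the signals move together; it does not prove that
# driving engagement up will cause happiness to rise.
```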

The Future is Open - a Look at Google Cloud’s Open Data Ecosystem

Join Anagha Khanolkar and Mansi Maharana, both Cloud Customer Engineers specializing in Advanced Analytics, to learn about Open Data Analytics on Google Cloud. This session will cover Google Data Cloud's Open Data Analytics portfolio, value proposition, customer stories, trends, and more, including Databricks on GCP.

The Future of Data - What’s Next with Google Cloud

Join Bruno Aziza, Head of Data and Analytics, Google Cloud, for an in-depth look at what he is seeing in the future of data and emerging trends. He will also cover Google Cloud’s data analytics practice, including insights into the Data Cloud Alliance, Big Lake, and our strategic partnership with Databricks.

Delta Live Tables: Modern Software Engineering and Management for ETL

Data engineers have the difficult task of cleansing complex, diverse data and transforming it into a usable source to drive data analytics, data science, and machine learning. They need to know the data infrastructure platform in depth, build complex queries in various languages, and stitch them together for production. Join this talk to learn how Delta Live Tables (DLT) simplifies the complexity of data transformation and ETL. DLT is the first ETL framework to use modern software engineering practices to deliver reliable and trusted data pipelines at any scale. Discover how analysts and data engineers can innovate rapidly with simple pipeline development and maintenance, how to remove operational complexity by automating administrative tasks and gaining visibility into pipeline operations, how built-in quality controls and monitoring ensure accurate BI, data science, and ML, and how simplified batch and streaming can be implemented with self-optimizing and auto-scaling data pipelines.
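
To show what the DLT programming model described above looks like in practice, here is a brief, hedged Python sketch of a Delta Live Tables pipeline with a built-in data-quality expectation; the source path and column names are placeholders, not from the talk.

```python
# Sketch: a two-step Delta Live Tables pipeline (runs inside a DLT pipeline on Databricks,
# where `spark` and the dlt module are available). Paths and columns are placeholders.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw events ingested from cloud storage (placeholder path).")
def raw_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/events/")          # placeholder source path
    )

@dlt.table(comment="Cleaned events ready for BI and ML.")
@dlt.expect_or_drop("valid_user", "user_id IS NOT NULL")   # built-in quality control
def clean_events():
    return (
        dlt.read_stream("raw_events")
        .withColumn("event_date", F.to_date("event_timestamp"))
    )
```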

Enabling Business Users to Perform Interactive Ad-Hoc Analysis over Delta Lake with No Code

In this talk, we'll first introduce Sigma Workbooks along with its technical design motivations and architectural details. Sigma Workbooks is an interactive visual data analytics system that enables business users to easily perform complex ad-hoc analysis over data in cloud data warehouses (CDWs). We'll then demonstrate the expressivity, scalability, and ease-of-use of Sigma Workbooks through real-life use cases over datasets stored in Delta Lake. We’ll conclude the talk by sharing the lessons that we have learned throughout the design and implementation iterations of Sigma Workbooks.

Evolution of Data Architectures and How to Build a Lakehouse

Data architectures are key to building robust analytical and AI applications and are part of a larger picture. One must take a holistic view of the entire data analytics realm when planning data science initiatives.

Through this talk, learn about the evolution of the data landscape and why Lakehouses are becoming the de facto standard for organizations building scalable data architectures. A lakehouse architecture combines the data management capabilities of the data warehouse, including reliability, integrity, and quality, with the low cost and open approach of data lakes, and supports all data workloads, including BI and AI.

Data Practitioners will also learn some core concepts of building an efficient Lakehouse with Delta Lake.
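
As a small, hedged illustration of the Delta Lake concepts a lakehouse builds on, the Python sketch below writes a Delta table, upserts into it with MERGE, and reads an older version back via time travel. The path and columns are placeholders, and it assumes a Spark environment with the delta-spark package configured and a SparkSession named `spark`.

```python
# Sketch: basic Delta Lake operations underpinning a lakehouse (placeholder path/columns).
from delta.tables import DeltaTable

path = "/tmp/lakehouse/customers"   # placeholder location

# 1. Write an initial Delta table.
initial = spark.createDataFrame(
    [(1, "alice", "gold"), (2, "bob", "silver")], ["id", "name", "tier"]
)
initial.write.format("delta").mode("overwrite").save(path)

# 2. Upsert new records with MERGE (reliability and integrity on the data lake).
updates = spark.createDataFrame(
    [(2, "bob", "gold"), (3, "carol", "bronze")], ["id", "name", "tier"]
)
target = DeltaTable.forPath(spark, path)
(target.alias("t")
       .merge(updates.alias("u"), "t.id = u.id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())

# 3. Time travel: read the table as of its first version.
v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)
v0.show()
```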

How the Largest County in the US is Transforming Hiring with a Modern Data Lakehouse

Los Angeles County’s Department of Human Resources (DHR) is responsible for attracting a diverse workforce for the 37 departments it supports. Each year, DHR processes upwards of 400,000 applications for job opportunities, making it one of the largest employers in the nation. Managing a hiring process of this scale is complex, with many complicated factors such as background checks and skills examinations. These processes, if not managed properly, can create bottlenecks and a poor experience for both candidates and hiring managers.

In order to identify areas for improvement, DHR set out to build detailed operational metrics across each stage of the hiring process. DHR used to conduct high-level analysis manually using Excel and other disparate tools. The data itself was limited and difficult to obtain and analyze. In addition, it was taking analysts weeks to manually pull data from half a dozen siloed systems into Excel for cleansing and analysis. This process was labor-intensive, inefficient, and prone to human error.

To overcome these challenges, DHR, in partnership with the Internal Services Department (ISD), adopted a modern data architecture in the cloud. Powered by the Azure Databricks Lakehouse, DHR was able to bring together their diverse volumes of data into a single platform for data analytics. Manual ETL processes that took weeks could now be automated in 10 minutes or less. With this new architecture, DHR has built Business Intelligence dashboards that unpack the hiring process, giving a clear picture of where the bottlenecks are and tracking the speed with which candidates move through the process. The dashboards allow County departments to innovate and make changes that enhance and improve the experience of potential job seekers and improve the timeliness of securing highly qualified and diverse County personnel at all employment levels.

In this talk, we’ll discuss DHR’s journey towards building a data-driven hiring process, the architecture decisions that enabled this transformation and the types of analytics that we’ve deployed to improve hiring efforts.

Privacy Preserving Machine Learning and Big Data Analytics Using Apache Spark

In recent years, new privacy laws and regulations have brought a fundamental shift in the protection of data and privacy, posing new challenges for data applications. To resolve these privacy and security challenges in the big data ecosystem without impacting existing applications, several hardware TEE (Trusted Execution Environment) solutions have been proposed for Apache Spark, e.g., PySpark with Scone and Opaque. However, to the best of our knowledge, none of them provide full protection for data pipelines in Spark applications; an adversary may still obtain sensitive information from unprotected components and stages. Furthermore, some of them greatly narrow the range of supported applications, e.g., supporting only SparkSQL. In this presentation, we will present a new PPMLA (privacy preserving machine learning and analytics) solution built on top of Apache Spark, BigDL, Occlum, and Intel SGX. It ensures that all Spark components and pipelines are fully protected by Intel SGX, and that existing Spark applications written in Scala, Java, or Python can be migrated to our platform without any code change. We will demonstrate how to build distributed end-to-end SparkML/SparkSQL workloads with our solution in an untrusted cloud environment and share real-world use cases for PPMLA.

US Air Force: Safeguarding Personnel Data at Enterprise Scale

The US Air Force VAULT platform is a cloud-native enterprise data platform designed to provide the Department of the Air Force (DAF) with a robust, interoperable, and secure data environment. The strategic goals of VAULT include:

  • Leading Data Culture - Increase data use and literacy to improve efficiency and effectiveness of decisions, readiness, mission operations, and cybersecurity.
  • A Catalyst for Sharing Data - Make data Visible, Accessible, Understandable, Linked, and Trusted (VAULT).
  • Driving Data Capabilities - Increase access to the right combination of state-of-the-art technologies needed to best utilize data.

To achieve these goals, the VAULT team created a self-service platform for onboarding and extracting, transforming, and loading data; performing data analytics, machine learning, and visualization; and applying data governance. Supporting over 50 tenants across NIPR and SIPR adds complexity to maintaining data security while ensuring data can be shared and utilized for analytics. To meet these goals, VAULT requires dynamic and granular data access controls that mitigate data exposure (due to compromised accounts, attackers monitoring a network, and other threats) while empowering users via self-service analytics. Protection of sensitive data is key to enabling VAULT to support key use cases such as personal readiness, where Airmen trainees are optimally placed to meet production goals, increase readiness, and match trainees to their preferences.
