talk-data.com

Topic

dimensional modeling

(Kimball) Dimensional Modeling

data_warehouse dimensional_modeling bi analytics_engineering

47 tagged

Activity Trend

Peak of 3 activities per quarter, 2020-Q1 to 2026-Q1

Activities

47 activities · Newest first

Enabling BI in a Lakehouse Environment: How Spark and Delta Can Help With Automating a DWH Develop

Traditional data warehouses typically struggle to handle large volumes of data and traffic, particularly unstructured data. In contrast, data lakes overcome these issues and have become the central hub for storing data. We outline how we enable Kimball-style BI data modelling in a Lakehouse environment.

We present how we built a Spark-based framework to modernize DWH development with a data lake as the central storage layer, assuring high data quality and scalability. The framework was implemented at over 15 enterprise data warehouses across Europe.

We present how one can tackle core data warehouse principles in Spark and Delta Lake, such as surrogate, foreign, and business keys and SCD types 1 and 2. Additionally, we share our experiences on how such a unified data modelling framework can bridge BI with modern use cases such as machine learning and real-time analytics. The session outlines the original challenges, the steps taken, and the technical hurdles we faced.
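For readers new to the idea, the sketch below shows one common way to express SCD Type 2 handling with Delta Lake's MERGE from PySpark. It is a minimal illustration, not the framework described in the talk; the table paths, the customer_bk/address column names, and the uuid() surrogate-key choice are all assumptions.

```python
# Minimal SCD Type 2 sketch with Delta Lake + PySpark (illustrative only).
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

updates = spark.read.parquet("/lake/staging/customers")      # incoming business rows (assumed path)
dim = DeltaTable.forPath(spark, "/lake/gold/dim_customer")   # existing dimension (assumed path)

# Step 1: close out current rows whose tracked attribute changed.
(dim.alias("d")
    .merge(updates.alias("u"),
           "d.customer_bk = u.customer_bk AND d.is_current = true")
    .whenMatchedUpdate(
        condition="d.address <> u.address",                   # SCD2-tracked attribute (assumed)
        set={"is_current": "false", "valid_to": "current_timestamp()"})
    .execute())

# Step 2: any incoming key without a current row is either new or was just
# closed out above; append those as the new current versions.
current = (spark.read.format("delta")
           .load("/lake/gold/dim_customer")
           .where("is_current = true"))
new_versions = (updates.join(current.select("customer_bk"), "customer_bk", "left_anti")
    .withColumn("customer_sk", F.expr("uuid()"))              # surrogate key strategy (assumed)
    .withColumn("is_current", F.lit(True))
    .withColumn("valid_from", F.current_timestamp())
    .withColumn("valid_to", F.lit(None).cast("timestamp")))

new_versions.write.format("delta").mode("append").save("/lake/gold/dim_customer")
```

The two-step close-then-append pattern keeps each merge condition simple; production frameworks typically add schema checks and handle more than one tracked attribute.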


In a recent conversation with data warehousing legend Bill Inmon, I learned about a new way to structure your data warehouse and self-service BI environment called the Unified Star Schema. The Unified Star Schema is potentially a small revolution for data analysts and business users, as it allows them to easily join tables in a data warehouse or BI platform through a bridge. This gives users the ability to spend time and effort on discovering insights rather than dealing with data connectivity challenges and joining pitfalls. Behind this deceptively simple and ingenious invention is author and data modelling innovator Francesco Puppini. Francesco and Bill have co-written the book ‘The Unified Star Schema: An Agile and Resilient Approach to Data Warehouse and Analytics Design’ to allow data modellers around the world to take advantage of the Unified Star Schema and its possibilities.

Listen to this episode of Leaders of Analytics, where we explore:

What the Unified Star Schema is and why we need it
How Francesco came up with the concept of the USS
Real-life examples of how to use the USS
The benefits of a USS over a traditional star schema galaxy
How Francesco sees the USS and data warehousing evolving in the next 5-10 years to keep up with new demands in data science and AI, and much more.

Connect with Francesco
Francesco on LinkedIn: https://www.linkedin.com/in/francescopuppini/
Francesco's book on the USS: https://www.goodreads.com/author/show/20792240.Francesco_Puppini
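As a rough illustration of that bridge idea (my own sketch, not Puppini's published design; the table and column names are invented), the PySpark snippet below unions the keys of two fact tables into a single bridge so that analysts join dimensions to the bridge rather than joining fact tables to each other.

```python
# Toy Unified-Star-Schema-style bridge: every fact table contributes its keys.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Two fact tables with different grains but overlapping dimension keys (toy data).
orders = spark.createDataFrame(
    [("O1", "C1", "P1"), ("O2", "C2", "P1")],
    ["order_id", "customer_id", "product_id"])
shipments = spark.createDataFrame(
    [("S1", "O1", "C1")],
    ["shipment_id", "order_id", "customer_id"])

key_cols = ["order_id", "customer_id", "product_id", "shipment_id"]

def as_stage(df, stage_name):
    # Project every table onto the full set of bridge keys,
    # padding the keys it does not carry with nulls.
    cols = [(F.col(c) if c in df.columns else F.lit(None).cast("string")).alias(c)
            for c in key_cols]
    return df.select(F.lit(stage_name).alias("stage"), *cols)

# The bridge is simply the union of all stages; dimension tables join to it by key,
# so BI users never have to join fact tables to each other directly.
bridge = as_stage(orders, "orders").unionByName(as_stage(shipments, "shipments"))
bridge.show()
```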

Data Modeling for Azure Data Services

Data Modeling for Azure Data Services is an essential guide that delves into the intricacies of designing, provisioning, and implementing robust data solutions within the Azure ecosystem. Through practical examples and hands-on exercises, this book equips you with the knowledge to create scalable, performant, and adaptable database designs tailored to your business needs.

What this Book will help me do: Understand and apply normalization, dimensional modeling, and data vault modeling for relational databases. Learn to provision and implement scalable solutions like Azure SQL DB and Azure Synapse SQL Pool. Master how to design and model a Data Lake using Azure Storage efficiently. Gain expertise in NoSQL database modeling and implementing solutions using Azure Cosmos DB. Develop ETL/ELT processes effectively using Azure Data Factory to support data integration workflows.

Author(s): Braake brings a wealth of expertise as a data architect and cloud solutions builder specializing in Azure's data services. With hands-on experience in projects requiring sophisticated data modeling and optimization, he crafts detailed learning material to help professionals level up their database design and Azure deployment skills. Dedicated to explaining complex topics with clarity and approachable language, he ensures that learners gain not just knowledge but applied competence.

Who is it for? This book is a valuable resource for business intelligence developers, data architects, and consultants aiming to refine their skills in data modeling within modern cloud ecosystems, particularly Microsoft Azure. Whether you're a beginner with some foundational cloud data management knowledge or an experienced professional seeking to deepen your Azure data services proficiency, this book caters to your learning needs.

Expert Data Modeling with Power BI

Expert Data Modeling with Power BI provides a comprehensive guide to creating effective and optimized data models using Microsoft Power BI. This book will teach you everything you need to know, from connecting to data sources to setting up complex models that enable insightful reporting and business analytics.

What this Book will help me do: Gain expertise in implementing virtual tables and time intelligence functionalities in Power BI's DAX language. Identify and correctly set up Dimension and Fact tables using the Power Query Editor interface. Master advanced data preparation techniques to build efficient Star Schemas for modeling. Apply best practices for preparing and modeling data for real-world business cases. Become proficient in advanced features like aggregations, incremental refresh, and row-level security.

Author(s): Soheil Bakhshi is a seasoned Power BI expert and author with years of experience in business intelligence and analytics. His practical knowledge of data modeling and approachable writing style make complex concepts understandable. Soheil's passion for empowering users to harness the full potential of Power BI is evident through his clear guidance and real-world examples.

Who is it for? This book is perfect for business intelligence developers, data analysts, and advanced users of Power BI who aim to deepen their understanding of data modeling. It assumes a familiarity with Power BI's basic functions and core concepts like Star Schema. If you're looking to refine your modeling practices and create versatile, dynamic solutions, this resource is for you.

Kimball in the context of the modern data warehouse: what's worth keeping, and what's not

Dimensional modeling, as described in the Kimball Toolkit book, was in its 3rd edition 15 years ago, yet it is still the latest in data modeling advice. So much is different in cloud warehouses that many of those best practices are now bad practices. In this video Dave Fowler, the founder of Chartio and author of Cloud Data Management, goes over what no longer applies, and what does.

Summary

Data warehouses have gone through many transformations, from standard relational databases on powerful hardware, to column oriented storage engines, to the current generation of cloud-native analytical engines. SnowflakeDB has been leading the charge to take advantage of cloud services that simplify the separation of compute and storage. In this episode Kent Graziano, chief technical evangelist for SnowflakeDB, explains how it is differentiated from other managed platforms and traditional data warehouse engines, the features that allow you to scale your usage dynamically, and how it allows for a shift in your workflow from ETL to ELT. If you are evaluating your options for building or migrating a data platform, then this is definitely worth a listen.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show! You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media and the Python Software Foundation. Upcoming events include the Software Architecture Conference in NYC and PyCon US in Pittsburgh. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today. Your host is Tobias Macey and today I’m interviewing Kent Graziano about SnowflakeDB, the cloud-native data warehouse

Interview

Introduction How did you get involved in the area of data management? Can you start by explaining what SnowflakeDB is for anyone who isn’t familiar with it?

How does it compare to the other available platforms for data warehousing? How does it differ from traditional data warehouses?

How does the performance and flexibility affect the data modeling requirements?

Snowflake is one of the data stores that is enabling the shift from an ETL to an ELT workflow. What are the features that allow for that approach and what are some of the challenges that it introduces? Can you describe how the platform is architected and some of the ways that it has evolved as it has grown in popularity?

What are some of the current limitations that you are struggling with?

For someone getting started with Snowflake what is involved with loading data into the platform?

What is their workflow for allocating and scaling compute capacity and running analyses?

One of the interesting features enabled by your architecture is data sharing. What are some of the most interesting or unexpected uses of that capability that you have seen? What are some other features or use cases for Snowflake that are not as well known or publicized which you think users should know about? When is SnowflakeDB the wrong choice? What are some of the plans for the future of SnowflakeDB?

Contact Info

LinkedIn Website @KentGraziano on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

SnowflakeDB

Free Trial Stack Overflow

Data Warehouse
Oracle DB
MPP == Massively Parallel Processing
Shared Nothing Architecture
Multi-Cluster Shared Data Architecture
Google BigQuery
AWS Redshift
AWS Redshift Spectrum
Presto

Podcast Episode

SnowflakeDB Semi-Structured Data Types
Hive
ACID == Atomicity, Consistency, Isolation, Durability
3rd Normal Form
Data Vault Modeling
Dimensional Modeling
JSON
AVRO
Parquet
SnowflakeDB Virtual Warehouses
CRM == Customer Relationship Management
Master Data Management

Podcast Episode

FoundationDB

Podcast Episode

Apache Spark

Podcast Episode

SSIS == SQL Server Integration Services
Talend
Informatica
Fivetran

Podcast Episode

Matillion
Apache Kafka
Snowpipe
Snowflake Data Exchange
OLTP == Online Transaction Processing
GeoJSON
Snowflake Documentation
SnowAlert
Splunk
Data Catalog

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

Summary

The scale and complexity of the systems that we build to satisfy business requirements is increasing as the available tools become more sophisticated. In order to bridge the gap between legacy infrastructure and evolving use cases it is necessary to create a unifying set of components. In this episode Dipti Borkar explains how the emerging category of data orchestration tools fills this need, some of the existing projects that fit in this space, and some of the ways that they can work together to simplify projects such as cloud migration and hybrid cloud environments. It is always useful to get a broad view of new trends in the industry and this was a helpful perspective on the need to provide mechanisms to decouple physical storage from computing capacity.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show! This week’s episode is also sponsored by Datacoral, an AWS-native, serverless, data infrastructure that installs in your VPC. Datacoral helps data engineers build and manage the flow of data pipelines without having to manage any infrastructure, meaning you can spend your time invested in data transformations and business needs, rather than pipeline maintenance. Raghu Murthy, founder and CEO of Datacoral, built data infrastructures at Yahoo! and Facebook, scaling from terabytes to petabytes of analytic data. He started Datacoral with the goal to make SQL the universal data programming language. Visit dataengineeringpodcast.com/datacoral today to find out more. You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, Alluxio, and Data Council. Upcoming events include the combined events of the Data Architecture Summit and Graphorum, the Data Orchestration Summit, and Data Council in NYC. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today. Your host is Tobias Macey and today I’m interviewing Dipti Borkar about data orchestration and how it helps in migrating data workloads to the cloud

Interview

Introduction How did you get involved in the area of data management? Can you start by describing what you mean by the term "Data Orchestration"?

How does it compare to the concept of "Data Virtualization"? What are some of the tools and platforms that fit under that umbrella?

What are some of the motivations for organizations to use the cloud for their data oriented workloads?

What are they giving up by using cloud resources in place of on-premises compute?

For businesses that have invested heavily in their own datacenters, what are some ways that they can begin to replicate some of the benefits of cloud environments? What are some of the common patterns for cloud migration projects and what challenges do they present?

Do you have advice on useful metrics to track for determining project completion or success criteria?

How do businesses approach employee education for designing and implementing effective systems for achieving their migration goals? Can you talk through some of the ways that different data orchestration tools can be composed together for a cloud migration effort?

What are some of the common pain points that organizations encounter when working on hybrid implementations?

What are some of the missing pieces in the data orchestration landscape?

Are there any efforts that you are aware of that are aiming to fill those gaps?

Where is the data orchestration market heading, and what are some industry trends that are driving it?

What projects are you most interested in or excited by?

For someone who wants to learn more about data orchestration and the benefits the technologies can provide, what are some resources that you would recommend?

Contact Info

LinkedIn @dborkar on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.init, to learn about the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat

Links

Alluxio

Podcast Episode

UC San Diego
Couchbase
Presto

Podcast Episode

Spark SQL
Data Orchestration
Data Virtualization
PyTorch

Podcast.init Episode

Rook storage orchestration
PySpark
MinIO

Podcast Episode

Kubernetes
Openstack
Hadoop HDFS
Parquet Files

Podcast Episode

ORC Files
Hive Metastore
Iceberg Table Format

Podcast Episode

Data Orchestration Summit
Star Schema
Snowflake Schema
Data Warehouse
Data Lake
Teradata

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

In this podcast, Wayne Eckerson and Joe Caserta discuss data migration, compare cloud offerings from Amazon, Google, and Microsoft, and define and explain artificial intelligence.

You can contact Caserta by visiting caserta.com or by sending him an email to [email protected]. Follow him on Twitter @joe_caserta.

Caserta is President of a New York City-based consulting firm he founded in 2001 and a longtime data guy. In 2004, Joe teamed up with data warehousing legend Ralph Kimball to write the book The Data Warehouse ETL Toolkit. Today he is one of the leading authorities on big data implementations. This makes Joe one of the few individuals with in-the-trenches experience on both sides of the data divide: traditional data warehousing on relational databases and big data implementations on Hadoop and the cloud.

In this podcast, Bill Schmarzo talks about the ingredients of a successful data science practice, team, and executive sponsorship. Bill shares his insights on what some leaders in the industry are doing and some of the challenges seen in successful deployments. He also shares his key ingredients for successful data science hires. This podcast is great for growth-mindset executives willing to learn about creating a successful data science practice.

Timeline: 0:29 Bill's journey. 5:05 Bill's current role. 7:04 Data science adoption challenges for businesses. 9:33 The good side of data science adoption. 11:22 How is data science changing business. 14:34 Strategies behind distributed IT. 18:35 Analysing the current amount of data. 21:50 Who should own the idea of data science? 24:34 The right background for a CDO. 25:52 Bias in IT. 29:35 Hacks to keep yourself bias-free. 31:58 Team vs. tool for putting together a good data-driven practice. 34:54 Value cycle in data science. 37:10 Maturity model. 39:17 Convincing culture heavy businesses to adopt data. 42:47 Keeping oneself sane during the technological disruption. 46:20 Hiring the right talent. 51:46 Ingredients of a good data science hire. 56:00 Bill's success mantra. 59:07 Bill's favorite reads. 1:00:36 Closing remarks.

Bill's Recommended Read: Moneyball: The Art of Winning an Unfair Game by Michael Lewis http://amzn.to/2FqBFg8 Big Data MBA: Driving Business Strategies with Data Science by Bill Schmarzo http://amzn.to/2tlZAvP

Podcast Link: https://futureofdata.org/schmarzo-dellemc-on-ingredients-of-healthy-datascience-practice-futureofdata-podcast/

Bill's BIO: Bill Schmarzo is the CTO for the Big Data Practice, where he is responsible for working with organizations to help them identify where and how to start their big data journeys. He's written several white papers, is an avid blogger, and is a frequent speaker on the use of Big Data and data science to power the organization's key business initiatives. He is a University of San Francisco School of Management Fellow, where he teaches the "Big Data MBA" course.

Bill has over three decades of experience in data warehousing, BI, and analytics. Bill authored EMC's Vision Workshop methodology that links an organization's strategic business initiatives with their supporting data and analytic requirements and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehouse Institute's faculty as the head of the analytic applications curriculum.

Bill holds a master's degree in Business Administration from the University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science, and Business Administration from Coe College.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Wanna Join? If you or anyone you know wants to join in, register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

In this podcast, Wayne Eckerson and Joe Caserta discuss what constitutes a modern data platform. Caserta is President of a New York City-based consulting firm he founded in 2001 and a longtime data guy. In 2004, Joe teamed up with data warehousing legend Ralph Kimball to write the book The Data Warehouse ETL Toolkit. Today he is one of the leading authorities on big data implementations. This makes Joe one of the few individuals with in-the-trenches experience on both sides of the data divide: traditional data warehousing on relational databases and big data implementations on Hadoop and the cloud. His perspectives are always insightful.

QlikView Your Business

Unlock the meaning of your data with QlikView. The Qlik platform was designed to provide a fast and easy data analytics tool, and QlikView Your Business is your detailed, full-color, step-by-step guide to understanding QlikView's powerful features and techniques so you can quickly start unlocking your data’s potential. This expert author team brings real-world insight together with practical business analytics, so you can approach, explore, and solve business intelligence problems using the robust Qlik toolset and clearly communicate your results to stakeholders using powerful visualization features in QlikView and Qlik Sense. This book starts at the basic level and dives deep into the most advanced QlikView techniques, delivering tangible value and knowledge to new users and experienced developers alike. As an added benefit, every topic presented in the book is enhanced with tips, tricks, and insightful recommendations that the authors accumulated through years of developing QlikView analytics.

This is the book for you:
• If you are a developer whose job is to load transactional data into the Qlik BI environment, and who needs to understand both the basics and the most advanced techniques of Qlik data modelling and scripting
• If you are a data analyst whose job is to develop actionable and insightful QlikView visualizations to share within your organization
• If you are a project manager or business person who wants to get a better understanding of the Qlik Business Intelligence platform and its capabilities

What You Will Learn: The book covers three common business scenarios - Sales, Profitability, and Inventory Analysis. Each scenario contains four chapters, covering the four main disciplines of business analytics: Business Case, Data Modeling, Scripting, and Visualizations. The material is organized by increasing levels of complexity. Following our comprehensive tutorial, you will learn simple and advanced QlikView and Qlik Sense concepts, including the following:

Data Modeling:
• Transforming Transactional data into Dimensional models
• Building a Star Schema
• Linking multiple fact tables using Link Tables
• Combining multiple tables into a single fact table using Concatenated Fact models
• Managing slowly changing dimensions
• Advanced date handling, using the As of Date table
• Calculating running balances

Basic and Advanced Scripting:
• How to use the Data Load Script language for implementing data modeling techniques
• How to build and use the QVD data layer
• Building multi-tier data architectures
• Using variables, loops, subroutines, and other script control statements
• Advanced scripting techniques for a variety of ETL solutions

Building Insightful Visualizations in QlikView:
• Introduction to QlikView sheet objects — List Boxes, Text Objects, Charts, and more
• Designing insightful Dashboards in QlikView
• Using advanced calculation techniques, such as Set Analysis and Advanced Aggregation
• Using variables for What-If Analysis, as well as using variables for storing calculations, colors, and selection filters
• Advanced visualization techniques - normalized and non-normalized Mekko charts, Waterfall charts, Whale Tail charts, and more

Building Insightful Visualizations in Qlik Sense:
• Introducing Qlik Sense - how it is different from QlikView and what is similar
• Creating Sense sheet objects
• Building and using the Library of Master Items
• Exploring Qlik Sense unique features — Storytelling, Geo Mapping, and using Extensions

Whether you are jus

Bitemporal Data

Bitemporal data has always been important. But it was not until 2011 that the ISO released a SQL standard that supported it. Currently, among major DBMS vendors, Oracle, IBM and Teradata now provide at least some bitemporal functionality in their flagship products. But to use these products effectively, someone in your IT organization needs to know more than how to code bitemporal SQL statements. Perhaps, in your organization, that person is you. To correctly interpret business requests for temporal data, to correctly specify requirements to your IT development staff, and to correctly design bitemporal databases and applications, someone in your enterprise needs a deep understanding of both the theory and the practice of managing bitemporal data. Someone also needs to understand what the future may bring in the way of additional temporal functionality, so their enterprise can plan for it. Perhaps, in your organization, that person is you. This is the book that will show the do-it-yourself IT professional how to design and build bitemporal databases and how to write bitemporal transactions and queries, and will show those who will direct the use of vendor-provided bitemporal DBMSs exactly what is going on "under the covers" of that software.

• Explains the business value of bitemporal data in terms of the information that can be provided by bitemporal tables and not by any other form of temporal data, including history tables, version tables, snapshot tables, or slowly-changing dimensions
• Provides an integrated account of the mathematics, logic, ontology and semantics of relational theory and relational databases, in terms of which current relational theory and practice can be seen as unnecessarily constrained to the management of nontemporal and incompletely temporal data
• Explains how bitemporal tables can provide the time-variance and nonvolatility hitherto lacking in Inmon historical data warehouses
• Explains how bitemporal dimensions can replace slowly-changing dimensions in Kimball star schemas, and why they should do so
• Describes several extensions to the current theory and practice of bitemporal data, including the use of episodes, "whenever" temporal transactions and queries, and future transaction time
• Points out a basic error in the ISO’s bitemporal SQL standard, and warns practitioners against the use of that faulty functionality
• Recommends six extensions to the ISO standard which will increase the business value of bitemporal data
• Points towards a tritemporal future for bitemporal data, in which an Aristotelian ontology and a speech-act semantics support the direct management of the statements inscribed in the rows of relational tables, and add the ability to track the provenance of database content to existing bitemporal databases

This book also provides the background needed to become a business ontologist, and explains why an IT data management person, deeply familiar with corporate databases, is best suited to play that role. Perhaps, in your organization, that person is you.
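For readers unfamiliar with the concept, here is a toy Python illustration (my own sketch, not taken from the book): each row carries a valid-time interval (when the fact was true in the business) and a transaction-time interval (when the database asserted it), and an "as of" query filters on both.

```python
# Toy bitemporal "as of" query in plain Python (illustrative data and columns).
from datetime import date

END = date(9999, 12, 31)  # conventional "open" end of an interval

rows = [
    # (customer, address,     valid_from,       valid_to,         tx_from,          tx_to)
    ("C1", "12 Oak St", date(2020, 1, 1), date(2020, 6, 1), date(2020, 1, 2), date(2020, 7, 1)),
    ("C1", "9 Elm Ave", date(2020, 6, 1), END,              date(2020, 7, 1), END),
    # correction recorded on 2020-07-01: Oak St actually ended on 2020-05-15
    ("C1", "12 Oak St", date(2020, 1, 1), date(2020, 5, 15), date(2020, 7, 1), END),
]

def as_of(rows, valid_at, known_at):
    """Rows that were true at `valid_at` according to what the DB knew at `known_at`."""
    return [r for r in rows
            if r[2] <= valid_at < r[3]      # valid-time interval contains valid_at
            and r[4] <= known_at < r[5]]    # transaction-time interval contains known_at

# What did we believe in March 2020 about C1's address on 2020-05-20?
print(as_of(rows, date(2020, 5, 20), date(2020, 3, 1)))
# -> the original Oak St row; the correction recorded in July is invisible as of March.
```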

Business Intelligence with MicroStrategy Cookbook

This comprehensive guide introduces you to the functionalities of MicroStrategy for business intelligence, empowering you to build dashboards, reports, and visualizations using hands-on, practical recipes with clear examples. You'll learn how to use MicroStrategy for the entire BI lifecycle, making data actionable and insights accessible.

What this Book will help me do: Install and configure the MicroStrategy platform, including setting up a fully operational BI environment. Create interactive dashboards and web reports to visualize and analyze data effectively. Learn to use MicroStrategy on mobile devices, enabling access to data-driven insights anywhere. Discover advanced analytics techniques using Visual Insight and MicroStrategy Cloud Express. Master practical skills with real-life examples to implement robust BI solutions.

Author(s): Davide Moraschi, an experienced professional in business intelligence and data analytics, brings his expertise to guiding readers through the MicroStrategy platform. He has years of experience implementing and developing BI solutions in diverse industries, offering practical perspectives. Davide's approachable teaching style and clear examples make technical concepts accessible and engaging.

Who is it for? This book is tailored for BI developers and data analysts who want to deepen their expertise in MicroStrategy. It's also suitable for IT professionals and business users aiming to leverage MicroStrategy for data insights. Some existing knowledge of BI concepts, such as dimensional modeling, will enrich your learning experience. You need no prior experience with MicroStrategy to benefit from this book.

The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition

Updated new edition of Ralph Kimball's groundbreaking book on dimensional modeling for data warehousing and business intelligence! The first edition of Ralph Kimball's The Data Warehouse Toolkit introduced the industry to dimensional modeling, and now his books are considered the most authoritative guides in this space. This new third edition is a complete library of updated dimensional modeling techniques, the most comprehensive collection ever. It covers new and enhanced star schema dimensional modeling patterns, adds two new chapters on ETL techniques, includes new and expanded business matrices for 12 case studies, and more.

• Authored by Ralph Kimball and Margy Ross, known worldwide as educators, consultants, and influential thought leaders in data warehousing and business intelligence
• Begins with fundamental design recommendations and progresses through increasingly complex scenarios
• Presents unique modeling techniques for business applications such as inventory management, procurement, invoicing, accounting, customer relationship management, big data analytics, and more
• Draws real-world case studies from a variety of industries, including retail sales, financial services, telecommunications, education, health care, insurance, e-commerce, and more

Design dimensional databases that are easy to understand and provide fast query response with The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition.

IBM Cognos Business Intelligence v10: The Complete Guide

Maximize the Value of Business Intelligence with IBM Cognos v10 -- Hands-on, from Start to Finish

This easy-to-use, hands-on guide brings together all the information and insight you need to drive maximum business value from IBM Cognos v10. Long-time IBM Cognos expert and product designer Sangeeta Gautam thoroughly illuminates Cognos BI v10’s key capabilities: analysis, query, reporting, and dashboards. Gautam shows how to take full advantage of each key IBM Cognos feature, including brand-new innovations such as Active Reports and the new IBM Cognos Workspace report consumption environment. She concludes by walking you through successfully planning and implementing an integrated business intelligence solution using IBM’s best-practice methodologies. The first and only guide of its kind, it offers expert insights for BI designers, architects, developers, administrators, project managers, nontechnical end-users, and partners throughout all areas of the business—from sales and marketing to operations and lines of business. If you’re pursuing official IBM Cognos certification, you’ll also find Cognos certification sample questions and information to help you with the certification process.

IBM Cognos Business Intelligence v10 Coverage Includes
• Understanding IBM Cognos BI’s components and open, extensible architecture
• Working with IBM Cognos key “studio” tools: Analysis Studio, Query Studio, Report Studio, and Event Studio
• Developing and managing powerful reports that draw on the rich capabilities of IBM Cognos Workspace and Workspace Advanced
• Designing Star Schema databases and metadata models to answer the questions your organization cares about most
• Efficiently maintaining and systematically securing IBM Cognos BI environments and their objects
• Using IBM Cognos Connection as your single point of entry to all corporate data
• Building interactive, easy-to-manage Active Reports for casual business users
• Using new IBM Cognos BI v10.1 Dynamic Query Mode (DQM) to improve performance with complex heterogeneous data
• Identifying, exploring, and exploiting hidden data relationships
• Creating quick ad hoc queries that deliver fast answers
• Establishing user and administrator roles

The Microsoft® Data Warehouse Toolkit: With SQL Server 2008 R2 and the Microsoft® Business Intelligence Toolset, Second Edition

Best practices and invaluable advice from world-renowned data warehouse experts. In this book, leading data warehouse experts from the Kimball Group share best practices for using the upcoming "Business Intelligence release" of SQL Server, referred to as SQL Server 2008 R2. In this new edition, the authors explain how SQL Server 2008 R2 provides a collection of powerful new tools that extend the power of its BI toolset to Excel and SharePoint users, and they show how to use SQL Server to build a successful data warehouse that supports the business intelligence requirements that are common to most organizations. Covering the complete suite of data warehousing and BI tools that are part of SQL Server 2008 R2, as well as Microsoft Office, the authors walk you through a full project lifecycle, including design, development, deployment and maintenance.

• Features more than 50 percent new and revised material that covers the rich new feature set of the SQL Server 2008 R2 release, as well as the Office 2010 release
• Includes brand new content that focuses on PowerPivot for Excel and SharePoint, Master Data Services, and discusses updated capabilities of SQL Server Analysis, Integration, and Reporting Services
• Shares detailed case examples that clearly illustrate how to best apply the techniques described in the book
• The accompanying Web site contains all code samples as well as the sample database used throughout the case studies

The Microsoft Data Warehouse Toolkit, Second Edition provides you with the knowledge of how and when to use BI tools such as Analysis Services and Integration Services to accomplish your most essential data warehousing tasks.

Pentaho® Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration

A complete guide to Pentaho Kettle, the Pentaho Data Integration toolset for ETL. This practical book is a complete guide to installing, configuring, and managing Pentaho Kettle. If you're a database administrator or developer, you'll first get up to speed on Kettle basics and how to apply Kettle to create ETL solutions—before progressing to specialized concepts such as clustering, extensibility, and data vault models. Learn how to design and build every phase of an ETL solution.

• Shows developers and database administrators how to use the open-source Pentaho Kettle for enterprise-level ETL processes (Extracting, Transforming, and Loading data)
• Assumes no prior knowledge of Kettle or ETL, and brings beginners thoroughly up to speed at their own pace
• Explains how to get Kettle solutions up and running, then follows the 34 ETL subsystems model, as created by the Kimball Group, to explore the entire ETL lifecycle, including all aspects of data warehousing with Kettle
• Goes beyond routine tasks to explore how to extend Kettle and scale Kettle solutions using a distributed "cloud"

Get the most out of Pentaho Kettle and your data warehousing with this detailed guide—from simple single table data migration to complex multisystem clustered data integration tasks.

Star Schema The Complete Reference

The definitive guide to dimensional design for your data warehouse. Learn the best practices of dimensional design. Star Schema: The Complete Reference offers in-depth coverage of design principles and their underlying rationales. Organized around design concepts and illustrated with detailed examples, this is a step-by-step guidebook for beginners and a comprehensive resource for experts. This all-inclusive volume begins with dimensional design fundamentals and shows how they fit into diverse data warehouse architectures, including those of W.H. Inmon and Ralph Kimball. The book progresses through a series of advanced techniques that help you address real-world complexity, maximize performance, and adapt to the requirements of BI and ETL software products. You are furnished with design tasks and deliverables that can be incorporated into any project, regardless of architecture or methodology.

• Master the fundamentals of star schema design and slow change processing
• Identify situations that call for multiple stars or cubes
• Ensure compatibility across subject areas as your data warehouse grows
• Accommodate repeating attributes, recursive hierarchies, and poor data quality
• Support conflicting requirements for historic data
• Handle variation within a business process and correlation of disparate activities
• Boost performance using derived schemas and aggregates
• Learn when it's appropriate to adjust designs for BI and ETL tools
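As a concrete, minimal illustration of the dimensional pattern the book covers (my own sketch with invented table and column names, not an example from the book), the pandas snippet below joins a sales fact table to a product dimension on a surrogate key and aggregates additive facts by a dimension attribute.

```python
# Toy star-schema query: fact joined to dimension on a surrogate key.
import pandas as pd

dim_product = pd.DataFrame({
    "product_sk":   [1, 2, 3],
    "product_name": ["Widget", "Gadget", "Gizmo"],
    "category":     ["Hardware", "Hardware", "Toys"],
})

fact_sales = pd.DataFrame({
    "date_sk":    [20240101, 20240101, 20240102],
    "product_sk": [1, 2, 3],
    "quantity":   [10, 4, 7],
    "revenue":    [100.0, 80.0, 35.0],
})

# Typical dimensional query: group by a dimension attribute, sum additive facts.
report = (fact_sales.merge(dim_product, on="product_sk")
                    .groupby("category", as_index=False)[["quantity", "revenue"]]
                    .sum())
print(report)
```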

The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing and Business Intelligence

An unparalleled collection of recommended guidelines for data warehousing and business intelligence pioneered by Ralph Kimball and his team of colleagues from the Kimball Group. Recognized and respected throughout the world as the most influential leaders in the data warehousing industry, Ralph Kimball and the Kimball Group have written articles covering more than 250 topics that define the field of data warehousing. For the first time, the Kimball Group's incomparable advice, design tips, and best practices have been gathered in this remarkable collection of articles, which spans a decade of data warehousing innovation. Each group of articles is introduced with original commentaries that explain their role in the overall lifecycle methodology developed by the Kimball Group. These practical, hands-on articles are fully updated to reflect current practices and terminology and cover the complete lifecycle—including project planning, requirements gathering, dimensional modeling, ETL, and business intelligence and analytics. This easily referenced collection is nothing less than vital if you are involved with data warehousing or business intelligence in any capacity.

The Data Warehouse Lifecycle Toolkit

The world of data warehousing has changed remarkably since the first edition of The Data Warehouse Lifecycle Toolkit was published in 1998. With this new edition, Ralph Kimball and his colleagues have refined the original set of Lifecycle methods and techniques based on their consulting and training experience. They walk you through the detailed steps of designing, developing, and deploying a data warehousing/business intelligence system. With substantial new and updated content, this second edition again sets the standard in data warehousing for the next decade.