talk-data.com

Topic

Data Governance

data_management compliance data_quality

417 tagged

Activity Trend

90 peak/qtr, 2020-Q1 to 2026-Q1

Activities

417 activities · Newest first

Architecting Data Lakes, 2nd Edition

Many organizations today are succeeding with data lakes, not just as storage repositories but as places to organize, prepare, analyze, and secure a wide variety of data. Management and governance are critical for making your data lake work, yet hard to do without a roadmap. With this ebook, you’ll learn an approach that merges the flexibility of a data lake with the management and governance of a traditional data warehouse. Author Ben Sharma explains the steps necessary to deploy data lakes with robust, metadata-driven data management platforms. You’ll learn best practices for building, maintaining, and deriving value from a data lake in your production environment. Included is a detailed checklist to help you construct a data lake in a controlled yet flexible way. Managing and governing data in your lake cannot be an afterthought. This ebook explores how integrated data lake management solutions, such as the Zaloni Data Platform (ZDP), deliver the necessary controls without making data lakes slow and inflexible.

You’ll examine:

- A reference architecture for a production-ready data lake
- An overview of the data lake technology stack and deployment options
- Key data lake attributes, including ingestion, storage, processing, and access
- Why implementing management and governance is crucial for the success of your data lake
- How to curate data lakes through data governance, acquisition, organization, preparation, and provisioning
- Methods for providing secure self-service access for users across the enterprise
- How to build a future-proof data lake tech stack that includes storage, processing, data management, and reference architecture
- Emerging trends that will shape the future of data lakes

In this podcast, Stephen Gatchell (@stephengatchell) from @Dell talks about the ingredients of a successful data scientist. He sheds light on the importance of data governance and compliance in defining a robust data science strategy, suggests tactical steps that executives can take to start their journey toward a robust governance framework, and explains how to take the scare out of governance. He also shares things leaders can do today to build robust data science teams and frameworks. This podcast is great for leaders seeking tactical insights into building a robust data science framework.

Timeline:

0:29 Stephen's journey.
4:45 Dell's customer experience journey.
7:39 Suggestions for a startup in regard to customer experience.
12:02 Building a center of excellence around data.
15:29 Data ownership.
19:18 Fixing data governance.
24:02 Fixing the data culture.
29:40 Distributed data ownership and data lakes.
32:50 Understanding data lakes.
35:50 Common pitfalls and opportunities in data governance.
38:50 Pleasant surprises in data governance.
41:30 Ideal data team.
44:04 Hiring the right candidates for data excellence.
46:13 How do I know the "why"?
49:05 Stephen's success mantra.
50:56 Stephen's best read.

Steve's Recommended Read: Big Data MBA: Driving Business Strategies with Data Science by Bill Schmarzo http://amzn.to/2HWjOyT

Podcast Link: https://futureofdata.org/want-to-fix-datascience-fix-governance-by-stephengatchell-futureofdata/

Steve's BIO: Stephen is currently Chief Data Officer, Engineering & Data Lake, at Dell and serves on the Dell Information Quality Governance Office and the Dell IT Technology Advisory Board, developing Dell’s corporate strategies for the Business Data Lake, Advanced Analytics, and Information Asset Management. Stephen also serves as a Customer Insight Analyst for the Chief Technology Office, analyzing customer technology challenges and requirements. He has been awarded the People’s Choice Award by the Dell Total Customer Experience Team for the Data Governance and Business Data Lake project, and was a Chief Technology Officer Innovation finalist for using advanced analytics on customer configurations to improve product development and product test coverage. Prior to his current role, Stephen managed Dell’s Global Product Development Lab Operations team, developing internal cloud orchestration and automation environments; served as an Information Systems Executive at IBM, leading acquisition conversion efforts; and was VP of Enterprise Systems and Operations, managing mission-critical information systems for Telelogic (a Swedish public software firm). Stephen has an MBA from Southern New Hampshire University, and a BSBA and an AS in Finance from Northeastern University.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Wanna Join? If you or anyone you know wants to join in, register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy


Understanding Metadata

One viable option for organizations looking to harness massive amounts of data is the data lake, a single repository for storing all the raw data, both structured and unstructured, that floods into the company. But that isn’t the end of the story. The key to making a data lake work is data governance, using metadata to provide valuable context through tagging and cataloging. This practical report examines why metadata is essential for managing, migrating, accessing, and deploying any big data solution. Authors Federico Castanedo and Scott Gidley dive into the specifics of analyzing metadata for keeping track of your data—where it comes from, where it’s located, and how it’s being used—so you can provide safeguards and reduce risk. In the process, you’ll learn about methods for automating metadata capture. This report also explains the main features of a data lake architecture, and discusses the pros and cons of several data lake management solutions that support metadata. These solutions include:

- Traditional data integration/management vendors such as the IBM Research Accelerated Discovery Lab
- Tooling from open source projects, including Teradata Kylo and Informatica
- Startups such as Trifacta and Zaloni that provide best-of-breed technology
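The tagging-and-cataloging practice the report describes, recording where each data set comes from, where it is located, and how it may be used, can be sketched in a few lines of Python. This is a toy illustration only: the classes, asset names, tags, and s3:// paths below are invented for the example and do not come from the report or any specific catalog product.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """One metadata record for a data lake asset (hypothetical schema)."""
    name: str                 # logical name of the data set
    source: str               # where the data comes from
    location: str             # where it is stored in the lake
    tags: set = field(default_factory=set)  # governance tags, e.g. "pii"

class Catalog:
    """A minimal in-memory metadata catalog."""
    def __init__(self):
        self.entries = []

    def register(self, entry: CatalogEntry) -> None:
        self.entries.append(entry)

    def find_by_tag(self, tag: str) -> list:
        """Return the names of all assets carrying a given tag."""
        return [e.name for e in self.entries if tag in e.tags]

catalog = Catalog()
catalog.register(CatalogEntry(
    "orders_raw", "ERP export", "s3://lake/raw/orders", {"pii", "finance"}))
catalog.register(CatalogEntry(
    "clickstream", "web logs", "s3://lake/raw/clicks", {"behavioral"}))

# A governance team could now locate every asset that needs PII safeguards:
print(catalog.find_by_tag("pii"))  # → ['orders_raw']
```

Real platforms automate the capture step and track far richer lineage, but the core idea is the same: every asset gets a record, and access controls and audits key off the tags.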

In this podcast, Kevin Sonsky reveals the secrets to his success as a business intelligence leader at Citrix Systems. During the past 11 years, he has implemented an enterprise-wide self-service reporting environment that has delivered deeper insights into customer purchasing behavior. At the same time, he has established a grassroots governance program that has successfully standardized on dozens of key enterprise metrics and reports. Kevin is interviewed by Wayne W. Eckerson, long-time thought leader in the business analytics field.

podcast_episode
by Val Kroll, Julie Hoyer, Tim Wilson (Analytics Power Hour - Columbus (OH)), Josh West (Analytics Demystified), Moe Kiss (Canva), Michael Helbling (Search Discovery)

If you're in the U.S., happy election day! In the spirit of the mayhem and controversy that the political process brings, we're tackling a topic that is every bit as controversial: tag management. Does Adobe DTM gratuitously delete emails? Has GTM been perpetually unaware of when it is around a hot mic? What does Tealium have against coffee?! Is Signal broadcasting dog whistles to marketers about the glorious data they can collect and manage? What about Ensighten's sordid past where the CEO was spotted in public (at eMetrics) sporting a periwig? To discuss all of this (or...actual content), Josh West from Analytics Demystified joins us for a discussion that is depressingly civil and uncontentious. Many linkable things were referenced in this episode: Josh's Industry War starting blog post (from 2013), Adobe Dynamic Tag Management (DTM), Google Tag Manager (GTM), Signal, Tealium, Ensighten, Ghostery, Observepoint, Hub'scan, the Data Governance Episode of the Digital Analytics Power Hour (Episode #012),  PhoneGap, Floodlight / Doubleclick / DFA, In the Year 2000 (Conan O'Brien), Bird Law, Adobe Experience Manager (AEM), Webtrends Streams, data management platforms (DMP), the Personalization Episode of the Digital Analytics Power Hour with Matt Gershoff (Episode #031), josh.analyticsdemystified.com, and Tagtician.

The Data and Analytics Playbook

The Data and Analytics Playbook: Proven Methods for Governed Data and Analytic Quality explores the way in which data continues to dominate budgets, along with the varying efforts made across a variety of business enablement projects, including applications, web and mobile computing, big data analytics, and traditional data integration. The book teaches readers how to use proven methods and accelerators to break through data obstacles and provide faster, higher-quality delivery of mission-critical programs. Drawing upon years of practical experience, and using numerous examples and an easy-to-understand playbook, Lowell Fryman, Gregory Lampshire, and Dan Meers discuss a simple, proven approach to the execution of multiple data-oriented activities. They present a clear set of methods to provide reliable governance, controls, risk, and exposure management for enterprise data and the programs that rely upon it, and discuss a cost-effective approach to providing sustainable governance and quality outcomes that enhance project delivery while also ensuring ongoing controls. Example activities, templates, outputs, resources, and roles are explored, along with different organizational models in common use today and the ways they can be mapped to leverage playbook data governance throughout the organization.

- Provides a mature and proven playbook approach (methodology) to enabling data governance that supports agile implementation
- Features specific examples of current industry challenges in enterprise risk management, including anti-money laundering and fraud prevention
- Describes business benefit measures and funding approaches using exposure-based cost models that augment risk models for cost-avoidance analysis, and accelerated delivery approaches using data integration sprints for application, integration, and information delivery success

Ecommerce Analytics: Analyze and Improve the Impact of Your Digital Strategy

Today's Complete, Focused, Up-to-Date Guide to Analytics for Ecommerce. Profit from analytics throughout the entire customer experience and lifecycle. Make the most of all the fast-changing data sources now available to you. For all ecommerce executives, strategists, entrepreneurs, marketers, analysts, and data scientists. Ecommerce Analytics is the only complete single-source guide to analytics for your ecommerce business. It brings together all the knowledge and skills you need to solve your unique problems and transform your data into better decisions and customer experiences. Judah Phillips shows how to use analysis to improve ecommerce marketing and advertising, understand customer behavior, increase conversion rates, strengthen loyalty, optimize merchandising and product mix, streamline transactions, and accurately attribute sales. Drawing on extensive experience leading large-scale analytics programs, he also offers expert guidance on building successful analytical teams; surfacing high-value insights via dashboards and visualization; and managing data governance, security, and privacy. Here are the answers you need to make the most of analytics in ecommerce: throughout your organization, across your entire customer lifecycle.

Self-Service Analytics

Organizations today are swimming in data, but most of them manage to analyze only a fraction of what they collect. To help build a stronger data-driven culture, many organizations are adopting a new approach called self-service analytics. This O’Reilly report examines how this approach provides data access to more people across a company, allowing business users to work with data themselves and create their own customized analyses. The result? More eyes looking at more data in more ways. Along with the perceived benefits, author Sandra Swanson also delves into the potential pitfalls of self-service analytics: balancing greater data access with concerns about security, data governance, and siloed data stores. Read this report and gain insights from enterprise tech (Yahoo), government (the City of Chicago), and disruptive retail (Warby Parker and Talend). Learn how these organizations are handling self-service analytics in practice. Sandra Swanson is a Chicago-based writer who’s covered technology, science, and business for dozens of publications, including ScientificAmerican.com. Connect with her on Twitter (@saswanson) or at www.saswanson.com.

Data Lake Development with Big Data

In "Data Lake Development with Big Data," you will explore the fundamental principles and techniques for constructing and managing a Data Lake tailored to your organization's big data challenges. This book provides practical advice and architectural strategies for ingesting, managing, and analyzing large-scale data efficiently and effectively.

What this book will help me do:

- Learn how to architect a Data Lake from scratch tailored to your organizational needs.
- Master techniques for ingesting data efficiently using real-time and batch processing frameworks.
- Understand the data governance, quality, and security considerations essential for scalable Data Lakes.
- Discover strategies for enabling users to explore data within the Data Lake effectively.
- Gain insights into integrating Data Lakes with Big Data analytic applications for high performance.

Author(s): Pasupuleti and Beulah Salome Purra bring their extensive expertise in big data and enterprise data management to this book. With years of hands-on experience designing and managing large-scale data architectures, their insights are rooted in practical knowledge and proven techniques.

Who is it for? This book is ideal for data architects and senior managers tasked with adapting or creating scalable data solutions in enterprise contexts. Readers should have foundational knowledge of master data management and be familiar with Big Data technologies to derive maximum value from the content presented.

Big Data Analytics Beyond Hadoop: Real-Time Applications with Storm, Spark, and More Hadoop Alternatives

Master alternative Big Data technologies that can do what Hadoop can't: real-time analytics and iterative machine learning. When most technical professionals think of Big Data analytics today, they think of Hadoop. But there are many cutting-edge applications that Hadoop isn't well suited for, especially real-time analytics and contexts requiring the use of iterative machine learning algorithms. Fortunately, several powerful new technologies have been developed specifically for use cases such as these. Big Data Analytics Beyond Hadoop is the first guide specifically designed to help you take the next steps beyond Hadoop. Dr. Vijay Srinivas Agneeswaran introduces the breakthrough Berkeley Data Analysis Stack (BDAS) in detail, including its motivation, design, architecture, Mesos cluster management, performance, and more. He presents realistic use cases and up-to-date example code for:

- Spark, the next-generation in-memory computing technology from UC Berkeley
- Storm, the parallel real-time Big Data analytics technology from Twitter
- GraphLab, the next-generation graph processing paradigm from CMU and the University of Washington (with comparisons to alternatives such as Pregel and Piccolo)

He also offers architectural and design guidance and code sketches for scaling machine learning algorithms to Big Data and then realizing them in real time. He concludes by previewing emerging trends, including real-time video analytics, SDNs, and even Big Data governance, security, and privacy issues. He identifies intriguing startups and new research possibilities, including BDAS extensions and cutting-edge model-driven analytics. Big Data Analytics Beyond Hadoop is an indispensable resource for everyone who wants to reach the cutting edge of Big Data analytics, and stay there: practitioners, architects, programmers, data scientists, researchers, startup entrepreneurs, and advanced students.

Oracle Big Data Handbook

Transform Big Data into Insight "In this book, some of Oracle's best engineers and architects explain how you can make use of big data. They'll tell you how you can integrate your existing Oracle solutions with big data systems, using each where appropriate and moving data between them as needed." -- Doug Cutting, co-creator of Apache Hadoop Cowritten by members of Oracle's big data team, Oracle Big Data Handbook provides complete coverage of Oracle's comprehensive, integrated set of products for acquiring, organizing, analyzing, and leveraging unstructured data. The book discusses the strategies and technologies essential for a successful big data implementation, including Apache Hadoop, Oracle Big Data Appliance, Oracle Big Data Connectors, Oracle NoSQL Database, Oracle Endeca, Oracle Advanced Analytics, and Oracle's open source R offerings. Best practices for migrating from legacy systems and integrating existing data warehousing and analytics solutions into an enterprise big data infrastructure are also included in this Oracle Press guide. 
- Understand the value of a comprehensive big data strategy
- Maximize the distributed processing power of the Apache Hadoop platform
- Discover the advantages of using Oracle Big Data Appliance as an engineered system for Hadoop and Oracle NoSQL Database
- Configure, deploy, and monitor Hadoop and Oracle NoSQL Database using Oracle Big Data Appliance
- Integrate your existing data warehousing and analytics infrastructure into a big data architecture
- Share data among Hadoop and relational databases using Oracle Big Data Connectors
- Understand how Oracle NoSQL Database integrates into the Oracle Big Data architecture
- Deliver faster time to value using in-database analytics
- Analyze data with Oracle Advanced Analytics (Oracle R Enterprise and Oracle Data Mining), Oracle R Distribution, ROracle, and Oracle R Connector for Hadoop
- Analyze disparate data with Oracle Endeca Information Discovery
- Plan and implement a big data governance strategy and develop an architecture and roadmap

IBM Information Server: Integration and Governance for Emerging Data Warehouse Demands

This IBM® Redbooks® publication is intended for business leaders and IT architects who are responsible for building and extending their data warehouse and Business Intelligence infrastructure. It provides an overview of powerful new capabilities of Information Server in the areas of big data, statistical models, data governance and data quality. The book also provides key technical details that IT professionals can use in solution planning, design, and implementation.

Data Warehousing in the Age of Big Data

Data Warehousing in the Age of Big Data will help you and your organization make the most of unstructured data with your existing data warehouse. As Big Data continues to revolutionize how we use data, it doesn't have to create more confusion. Expert author Krish Krishnan helps you make sense of how Big Data fits into the world of data warehousing in clear and concise detail. The book is presented in three distinct parts. Part 1 discusses Big Data, its technologies, and use cases from early adopters. Part 2 addresses data warehousing, its shortcomings, and new architecture options, workloads, and integration techniques for Big Data and the data warehouse. Part 3 deals with data governance, data visualization, information life-cycle management, data scientists, and implementing a Big Data–ready data warehouse. Extensive appendixes include case studies from vendor implementations and a special segment on how we can build a healthcare information factory. Ultimately, this book will help you navigate through the complex layers of Big Data and data warehousing while providing you information on how to effectively think about using all these technologies and architectures to design the next-generation data warehouse.

- Learn how to leverage Big Data by effectively integrating it into your data warehouse
- Includes real-world examples and use cases that clearly demonstrate Hadoop, NoSQL, HBase, Hive, and other Big Data technologies
- Understand how to optimize and tune your current data warehouse infrastructure and integrate newer infrastructure matching data processing workloads and requirements

Data Virtualization for Business Intelligence Systems

Data virtualization can help you accomplish your goals with more flexibility and agility. Learn what it is, and how and why it should be used, with Data Virtualization for Business Intelligence Systems. In this book, expert author Rick van der Lans explains how data virtualization servers work, what techniques to use to optimize access to various data sources, and how these products can be applied in different projects. You’ll learn the difference between this new form of data integration and older forms, such as ETL and replication, and gain a clear understanding of how data virtualization really works. Data Virtualization for Business Intelligence Systems outlines the advantages and disadvantages of data virtualization and illustrates how data virtualization should be applied in data warehouse environments. You’ll come away with a comprehensive understanding of how data virtualization will make data warehouse environments more flexible and how it makes developing operational BI applications easier. Van der Lans also describes the relationship between data virtualization and related topics, such as master data management, governance, and information management, so you come away with a big-picture understanding as well as all the practical know-how you need to virtualize your data.

- First independent book on data virtualization that explains, in a product-independent way, how data virtualization technology works
- Illustrates concepts using examples developed with commercially available products
- Shows you how to solve common data integration challenges such as data quality, system interference, and overall performance by following practical guidelines on using data virtualization
- Apply data virtualization right away with three chapters full of practical implementation guidance
- Understand the big picture of data virtualization and its relationship with data governance and information management

Data Architecture

Data Architecture: From Zen to Reality explains the principles underlying data architecture, how data evolves with organizations, and the challenges organizations face in structuring and managing their data. Using a holistic approach to the field of data architecture, the book describes proven methods and technologies to solve the complex issues dealing with data. It covers the various applied areas of data, including data modelling and data model management, data quality, data governance, enterprise information management, database design, data warehousing, and warehouse design. This text is a core resource for anyone customizing or aligning data management systems, taking the Zen-like idea of data architecture to an attainable reality. The book presents fundamental concepts of enterprise architecture with definitions and real-world applications and scenarios. It teaches data managers and planners about the challenges of building a data architecture roadmap, structuring the right team, and building a long term set of solutions. It includes the detail needed to illustrate how the fundamental principles are used in current business practice. The book is divided into five sections, one of which addresses the software-application development process, defining tools, techniques, and methods that ensure repeatable results. Data Architecture is intended for people in business management involved with corporate data issues and information technology decisions, ranging from data architects to IT consultants, IT auditors, and data administrators. It is also an ideal reference tool for those in a higher-level education process involved in data or information technology management. 

Data Integration Blueprint and Modeling: Techniques for a Scalable and Sustainable Architecture

Making Data Integration Work: How to Systematically Reduce Cost, Improve Quality, and Enhance Effectiveness Today’s enterprises are investing massive resources in data integration. Many possess thousands of point-to-point data integration applications that are costly, undocumented, and difficult to maintain. Data integration now accounts for a major part of the expense and risk of typical data warehousing and business intelligence projects--and, as businesses increasingly rely on analytics, the need for a blueprint for data integration is increasing now more than ever. This book presents the solution: a clear, consistent approach to defining, designing, and building data integration components to reduce cost, simplify management, enhance quality, and improve effectiveness. Leading IBM data management expert Tony Giordano brings together best practices for architecture, design, and methodology, and shows how to do the disciplined work of getting data integration right. Mr. Giordano begins with an overview of the “patterns” of data integration, showing how to build blueprints that smoothly handle both operational and analytic data integration. Next, he walks through the entire project lifecycle, explaining each phase, activity, task, and deliverable through a complete case study. Finally, he shows how to integrate data integration with other information management disciplines, from data governance to metadata. The book’s appendices bring together key principles, detailed models, and a complete data integration glossary. 
Coverage includes:

- Implementing repeatable, efficient, and well-documented processes for integrating data
- Lowering costs and improving quality by eliminating unnecessary or duplicative data integrations
- Managing the high levels of complexity associated with integrating business and technical data
- Using intuitive graphical design techniques for more effective process and data integration modeling
- Building end-to-end data integration applications that bring together many complex data sources

Data governance defines how data is handled. It defines that handling so that the data's value is optimized and risks are avoided. In this way, data governance ensures that data is treated like any ordinary business asset; "Data as Assets" is the accompanying slogan, and the business perspective joins the purely technical context. Microsoft Purview accordingly claims to offer "Unified Data Governance." Until the introduction of the metamodel, however, the managed (meta)data was purely technical in nature. Now a connection opens up to enterprise architecture, data strategy, and the associated applications and process models. The opportunities this expanded perspective offers are the subject of this session.