Despite $180 billion spent on big data tools and technologies, poor data quality remains a significant barrier for businesses, especially in achieving Generative AI goals. Published at: https://www.eckerson.com/articles/poor-data-quality-is-a-full-blown-crisis-a-2024-customer-insight-report
talk-data.com
Many organizations abandoned data modeling as they embraced big data and NoSQL. Now they find that data modeling continues to be important, perhaps more important today than ever before. With a fresh look you’ll see that today’s data modeling is different from past practices – much more than physical design for relational data. Published at: https://www.eckerson.com/articles/a-fresh-look-at-data-modeling-part-1-the-what-and-why-of-data-modeling
Today’s data architecture discussions are heavily biased toward managing data for analytics, with attention to big data, scalability, cloud, and cross-platform data management. We need to acknowledge analytics bias and address management of operational data. Ignoring operational data architecture is a sure path to technical debt and future data management pain. Published at: https://www.eckerson.com/articles/the-yin-and-yang-of-data-architecture
The advent of big data, self-service analytics, and cloud applications has created a need for new ways to manage data access. New data access governance tools promise to simplify and standardize data access and authorization across an enterprise. Data management expert, Sanjeev Mohan, provides an industry perspective on this emerging technology and what it means for data analytics teams.
In the physical world, you can see a bridge rusting or a building facade crumbling and know you have to intervene to prevent the infrastructure from collapsing. But when all you have is bits and bytes (digital stuff, like software and data), how can you tell if your customer-facing digital interactions or data-driven analytics and models are about to go up in smoke?
Observability is a new term that describes what we used to call IT monitoring. The new moniker is fitting given all the technology changes that have happened in the past decade. The cloud, big data, microservices, containers, cloud applications, machine learning, and artificial intelligence have created a dramatically complex IT and data environment that is harder than ever to manage. And the stakes are higher as organizations move their operations online to compete with digital natives. Today, you can't run digital or data operations without observability tools.
Kevin Petrie is one of the industry's foremost experts on observability. He is vice president of research at Eckerson Group where he leads a team of distinguished analysts. He recently wrote an article titled "The Five Shades of Observability" that describes five types of observability tools. In this podcast, we discuss what observability is, why you need it, and the types of available tools. We also speculate on the future of this technology and recommend how to select an appropriate observability product.
Chief data officers (CDOs) first appeared in enterprise organizations after the Sarbanes-Oxley Act became law in the United States in 2002 to improve corporate governance controls. CDOs started with a trickle, but have since become a flood, now populating more than two-thirds of large enterprises, according to a recent survey by NewVantage Partners.
To explore this dynamic role in detail, we invited Joe Dossantos, newly minted CDO for the data and analytics software vendor Qlik. Joe is responsible for data governance, internal data delivery, and self-service enablement. He also evangelizes data and analytics best practices to Qlik customers.
Prior to joining Qlik, Joe led TD Bank’s data strategy, and built and ran the Big Data Consulting Practice for EMC Corporation's Professional Services Organization.
The rise of machine learning has placed a premium on finding new sources of data to fuel predictive models. But acquiring external data is often expensive and many data sets are rife with errors and difficult to combine with internal data. But that’s going to change in 2020.
To help us understand the scale, scope, and dimensions of emerging data marketplaces is Justin Langseth, one of the visionaries in our space. Justin is a VP at Snowflake responsible for the Snowflake Data Exchange. Prior to Snowflake, Justin was the technical founder and CEO/CTO of 5 data technology startups: Claraview (sold to Teradata), Zoomdata (sold to Logi Analytics), Clarabridge, Strategy.com, and Augaroo. He has 25 years of experience in business intelligence, natural language processing, big data, and AI.
In this episode, Daniel Graham dissects the capabilities of data lakes and compares them to data warehouses. He talks about the primary use cases of data lakes and how they are vital for big data ecosystems. He then goes on to explain the role of data warehouses, which are still responsible for timely and accurate data but no longer play a central role. In the end, both Wayne Eckerson and Dan Graham settle on a common definition for modern data architectures.
Daniel Graham has more than 30 years in IT, consulting, research, and product marketing, with almost 30 years at leading database management companies. Dan was a Strategy Director in IBM’s Global BI Solutions division and General Manager of Teradata’s high-end server divisions. During his tenure as a product marketer, Dan has been responsible for MPP data management systems, data warehouses, and data lakes, and most recently, the Internet of Things and streaming systems.
Recent technology developments are driving urgency to modernize data management. What do you do about architecture, modeling, quality, and governance to keep up with big data, cloud, self-service, and other trends in data and technology? Examining some best practices can spark ideas of where to begin.
Originally published at https://www.eckerson.com/articles/stepping-up-to-modern-data-management
In this episode, Wayne Eckerson asks Charles Reeves about his organization's Internet of Things and big data strategy. Reeves is senior manager of BI and analytics at Graphic Packaging International, a leader in the packaging industry with hundreds of worldwide customers. He has 25 years of professional experience in IT management, including nine years in reporting, analytics, and data governance.
With all the hype and attention around big data and huge data platforms, there can sometimes be a bit of data envy. There are still organizations and companies that don't have big data: are they not poised for analytics too? Can't they get insights as well? The BI Pharaoh gives tips on how to work with your little data just like the big boys.
Originally published at https://www.eckerson.com/articles/little-data-needs-love-too
In this episode, Wayne Eckerson and Jeff Magnusson discuss the data architecture Stitch Fix created to support its data science workloads, as well as the need to balance man and machine and art and science.
Magnusson is the vice president of data platform at Stitch Fix. He leads a team responsible for building the data platform that supports the company's team of 80+ data scientists, as well as other business users. That platform is designed to facilitate self-service among data scientists and promote velocity and innovation that differentiate Stitch Fix in the marketplace. Before Stitch Fix, Magnusson managed the data platform architecture team at Netflix where he helped design and open source many of the components of the Hadoop-based infrastructure and big data platform.
In this podcast, Wayne Eckerson and Joe Caserta discuss data migration, compare cloud offerings from Amazon, Google, and Microsoft, and define and explain artificial intelligence.
You can contact Caserta by visiting caserta.com or by sending him an email to [email protected]. Follow him on Twitter @joe_caserta.
Caserta is President of a New York City-based consulting firm he founded in 2001 and a longtime data guy. In 2004, Joe teamed up with data warehousing legend Ralph Kimball to write the book The Data Warehouse ETL Toolkit. Today he's one of the leading authorities on big data implementations. This makes Joe one of the few individuals with in-the-trenches experience on both sides of the data divide: traditional data warehousing on relational databases and big data implementations on Hadoop and the cloud.
In this podcast, Wayne Eckerson and James Serra discuss myths of modern data management. Some of the myths discussed include 'all you need is a data lake', 'the data warehouse is dead', 'we don’t need OLAP cubes anymore', 'cloud is too expensive and latency is too slow', 'you should always use a NoSQL product over a RDBMS.'
Serra is big data and data warehousing solutions architect at Microsoft with over thirty years of IT experience. He is a popular blogger and speaker and has presented at dozens of Microsoft PASS and other events. Prior to Microsoft, Serra was an independent data warehousing and business intelligence architect and developer.
In this episode, Wayne Eckerson and Jeff Magnusson discuss a self-service model for data science work and the role of a data platform in that environment. Magnusson also talks about Flotilla, a new open source API that makes it easy for data scientists to execute tasks on the data platform.
Magnusson is the vice president of data platform at Stitch Fix. He leads a team responsible for building the data platform that supports the company's team of 80+ data scientists, as well as other business users. That platform is designed to facilitate self-service among data scientists and promote velocity and innovation that differentiate Stitch Fix in the marketplace. Before Stitch Fix, Magnusson managed the data platform architecture team at Netflix where he helped design and open source many of the components of the Hadoop-based infrastructure and big data platform.
In this episode, Wayne Eckerson and Lenin Gali discuss the past and future of the cloud and big data.
Gali is a data analytics practitioner who has always been on the leading edge of where business and technology intersect. He was one of the first to move data analytics to the cloud when he was BI director at ShareThis, a social media-based services provider. While at Ubisoft, he was instrumental in defining an enterprise analytics strategy and developing a data platform, built on Hadoop and Teradata, that brought game and business data together, enabling thousands of data users to build better games and services. He is now spearheading the creation of a Hadoop-based data analytics platform at Quotient, a digital marketing technology firm in the retail industry.
In this podcast, Henry Eckerson and Stephen Smith discuss the movement to operationalize data science.
Smith is a well-respected expert in the fields of data science and predictive analytics and their application in the education, pharmaceutical, healthcare, telecom, and finance industries. He co-founded and served as CEO of G7 Research LLC and the Optas Corporation, which provided the leading CRM/marketing automation solution in the pharmaceutical and healthcare industries.
Smith has published journal articles in the fields of data mining, machine learning, parallel supercomputing, text understanding, and simulated evolution. He has published two books through McGraw-Hill on big data and analytics and holds several patents in the fields of educational technology, big data analytics, and machine learning. He holds a BS in Electrical Engineering from MIT and an MS in Applied Sciences from Harvard University. He is currently the research director of data science at Eckerson Group.
In this podcast, Wayne Eckerson and Joe Caserta discuss what constitutes a modern data platform. Caserta is President of a New York City-based consulting firm he founded in 2001 and a longtime data guy. In 2004, Joe teamed up with data warehousing legend Ralph Kimball to write the book The Data Warehouse ETL Toolkit. Today he's one of the leading authorities on big data implementations. This makes Joe one of the few individuals with in-the-trenches experience on both sides of the data divide: traditional data warehousing on relational databases and big data implementations on Hadoop and the cloud. His perspectives are always insightful.