talk-data.com

Topic: Analytics
Tags: data_analysis, insights, metrics
4552 tagged activities

Activity Trend: 398 peak/qtr (2020-Q1 to 2026-Q1)

Activities: 4552 activities · Newest first

Blockchain technology, cryptocurrencies and decentralised finance are described by some as massively disruptive technologies that will turn our existing financial system on its head. For the traditional financial services industry, these technologies have the potential to create huge efficiency gains and democratise more complex financial services for individual users. On the other hand, DeFi also reduces – and potentially removes – the need for trusted intermediaries, which makes the model unsettling to some operators in the current financial system. DeFi also opens the opportunity for global financial inclusion of enterprises and private individuals in developing markets – a very large group whose needs are typically unmet by traditional finance. With all this huge potential about to be released, we had better learn why these technologies are so revolutionary and what they will do for us now and in the future. To answer these questions and many more relating to DeFi, I recently spoke to Daniel Liebau. Dan is the Chief Investment Officer, Blockchain Strategy at Modular Asset Management and the Founding Chairman of Lightbulb Capital, a DeFi investment and consulting firm. In this episode of Leaders of Analytics, Dan and I discuss:

Why DeFi is so revolutionary, and the opportunities and risks that lie within this space for individual users, corporations and nation states
The difference between Payment, Utility and Security tokens and how these are likely to be used in our future financial system
The utility of NFTs and their future as an asset category
How blockchains, cryptocurrencies and DeFi will be part of our lives in 5, 10 and 20 years respectively
What Dan is teaching his FinTech, crypto and DeFi students, and much more.

Daniel Liebau on LinkedIn: https://www.linkedin.com/in/liebauda/
Lightbulb Capital: https://www.lightbulbcap.com/

Data Science on the Google Cloud Platform, 2nd Edition

Learn how easy it is to apply sophisticated statistical and machine learning methods to real-world problems when you build using Google Cloud Platform (GCP). This hands-on guide shows data engineers and data scientists how to implement an end-to-end data pipeline with cloud native tools on GCP. Throughout this updated second edition, you'll work through a sample business decision by employing a variety of data science approaches. Follow along by building a data pipeline in your own project on GCP, and discover how to solve data science problems in a transformative and more collaborative way. You'll learn how to:

Employ best practices in building highly scalable data and ML pipelines on Google Cloud
Automate and schedule data ingest using Cloud Run
Create and populate a dashboard in Data Studio
Build a real-time analytics pipeline using Pub/Sub, Dataflow, and BigQuery
Conduct interactive data exploration with BigQuery
Create a Bayesian model with Spark on Cloud Dataproc
Forecast time series and do anomaly detection with BigQuery ML (see the sketch below)
Aggregate within time windows with Dataflow
Train explainable machine learning models with Vertex AI
Operationalize ML with Vertex AI Pipelines
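As a concrete taste of the BigQuery ML material, here is a minimal sketch (not from the book) of training and querying a time-series forecasting model from Python with the google-cloud-bigquery client. The dataset, table, and column names are hypothetical, and it assumes a GCP project with application-default credentials already configured.

```python
# Minimal sketch: time-series forecasting with BigQuery ML from Python.
# Assumes a table `mydataset.sales` with columns `sale_date` (DATE) and
# `units_sold` (INT64): both hypothetical names.
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

# ARIMA_PLUS is BigQuery ML's built-in time-series model type.
create_model_sql = """
CREATE OR REPLACE MODEL `mydataset.sales_forecast`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'sale_date',
  time_series_data_col = 'units_sold'
) AS
SELECT sale_date, units_sold FROM `mydataset.sales`
"""
client.query(create_model_sql).result()  # blocks until training finishes

# Forecast 30 periods ahead with 90% prediction intervals.
forecast_sql = """
SELECT forecast_timestamp, forecast_value,
       prediction_interval_lower_bound, prediction_interval_upper_bound
FROM ML.FORECAST(MODEL `mydataset.sales_forecast`,
                 STRUCT(30 AS horizon, 0.9 AS confidence_level))
"""
for row in client.query(forecast_sql).result():
    print(row["forecast_timestamp"], row["forecast_value"])
```

The same trained model also powers anomaly detection via ML.DETECT_ANOMALIES, which is why the book treats forecasting and anomaly detection together.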

Nothing has galvanized the data community more in recent months than two new architectural paradigms for managing enterprise data. On one side there is the data fabric: a centralized architecture that runs a variety of analytic services and applications on top of a layer of universal connectivity. On the other side is the data mesh: a decentralized architecture that empowers domain owners to manage their own data according to enterprise standards and make it available to peers as they desire.

Most data leaders are still trying to ferret out the implications of both approaches for their own data environments. One of those is Srinivasan Sankar, the enterprise data & analytics leader at Hanover Insurance Group. In this wide-ranging, back-and-forth discussion, Sankar and Eckerson explore the suitability of the data mesh for Hanover, how the data fabric might support a data mesh, whether a data mesh obviates the need for a data warehouse, and practical steps Hanover might take to implement a data mesh built on top of a data fabric.

Key Takeaways:
- What is the essence of a data mesh?
- How does it relate to the data fabric?
- Does the data mesh require a cultural transformation?
- Does the data mesh obviate the need for a data warehouse?
- How does data architecture as a service fit with the data mesh?
- What is the best way to roll out a data mesh?
- What's the role of a data catalog?
- What is a suitable roadmap for full implementation?

Summary
At the foundational layer, many databases and data processing engines rely on key/value storage for managing the layout of information on disk. RocksDB is one of the most popular choices for this component and has been incorporated into popular systems such as ksqlDB. As these systems are scaled to larger volumes of data and higher throughputs, the RocksDB engine can become a bottleneck for performance. In this episode Adi Gelvan shares the work that he and his team at SpeeDB have put into building a drop-in replacement for RocksDB that eliminates that bottleneck. He explains how they redesigned the core algorithms and storage management features to deliver ten times faster throughput, how the lower latencies work to reduce the burden on platform engineers, and how they are working toward an open source offering so that you can try it yourself with no friction.
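To make the "drop-in replacement" idea concrete, the sketch below shows the embedded key/value interface that RocksDB exposes and that any replacement engine has to preserve. It uses the third-party python-rocksdb binding (an assumption for illustration; the native API is C++), and the database path and keys are hypothetical.

```python
# Minimal sketch of the embedded key/value API that RocksDB exposes,
# via the third-party python-rocksdb binding. Path and keys are
# hypothetical; an engine that preserves this interface can swap the
# implementation underneath without changes to calling code.
import rocksdb

opts = rocksdb.Options(create_if_missing=True)
db = rocksdb.DB("/tmp/example.db", opts)

# Point writes and reads: keys and values are opaque byte strings.
db.put(b"user:1001", b'{"name": "ada"}')
print(db.get(b"user:1001"))  # -> b'{"name": "ada"}'

# Atomic batched writes, the unit that systems like ksqlDB rely on
# for consistent state-store updates.
batch = rocksdb.WriteBatch()
batch.put(b"user:1002", b'{"name": "grace"}')
batch.delete(b"user:1001")
db.write(batch)

# Ordered iteration over keys: one of the access patterns whose
# performance can degrade as data volumes and throughput grow.
it = db.iterkeys()
it.seek_to_first()
print(list(it))  # -> [b'user:1002']
```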

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription.

Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it’s often too late and the damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values, before it gets merged to production. No more shipping and praying: you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.

TimescaleDB, from your friends at Timescale, is the leading open-source relational database with support for time-series data. Time-series data is time-stamped so you can measure how a system is changing. Time-series data is relentless and requires a database like TimescaleDB with speed and petabyte-scale. Understand the past, monitor the present, and predict the future. That’s Timescale. Visit them today at dataengineeringpodcast.com/timescale

Your host is Tobias Macey and today I’m interviewing Adi Gelvan about his work on SpeeDB, the "next generation data engine".

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what SpeeDB is and the story behind it?
What is your target market and customer?

What are some of the shortcomings of RocksDB?

Summary
Data governance is a practice that requires a high degree of flexibility and collaboration at the organizational and technical levels. The growing prominence of cloud and hybrid environments in data management adds additional stress to an already complex endeavor. Privacera is an enterprise-grade solution for cloud and hybrid data governance built on top of the robust and battle-tested Apache Ranger project. In this episode Balaji Ganesan shares how his experiences building and maintaining Ranger in previous roles helped him understand the needs of organizations and engineers as they define and evolve their data governance policies and practices.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

This episode is brought to you by Acryl Data, the company behind DataHub, the leading developer-friendly data catalog for the modern data stack. Open Source DataHub is running in production at several companies like Peloton, Optum, Udemy, Zynga and others. Acryl Data provides DataHub as an easy-to-consume SaaS product which has been adopted by several companies. Sign up for the SaaS product at dataengineeringpodcast.com/acryl

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.

The most important piece of any data project is the data itself, which is why it is critical that your data source is high quality. PostHog is your all-in-one product analytics suite, including product analysis, user funnels, feature flags, experimentation, and it’s open source so you can host it yourself or let them do it for you! You have full control over your data, and their plugin system lets you integrate with all of your other data tools, including data warehouses and SaaS platforms. Give it a try today with their generous free tier at dataengineeringpodcast.com/posthog

Your host is Tobias Macey and today I’m interviewing Balaji Ganesan about his work at Privacera and his view on the state of data governance, access control, and security in the cloud.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what Privacera is and the story behind it?
What is your working definition of "data governance" and how does that influence your product focus and priorities?
What are some of the lessons that you learned from your work on Apache Ranger that helped with your efforts at Privacera?
How would you characterize your position in the market for data governance/data security tools?
What are the unique constraints and challenges that come into play when managing data in cloud platforms?
Can you explain how the Privacera platform is architected?

How have the design and goals of the system changed or evolved since you started working on it?

What is the workflow for an operator integrating Privacera into a data platform?

How do you provide feedback to users about the level of coverage for discovered data assets?

How does Privacera fit into the workflow of the different personas working with data?

What are some of the security and privacy controls that Privacera introduces?

How do you mitigate the potential for anyone to bypass Privacera’s controls by interacting directly with the underlying systems?
What are the most interesting, innovative, or unexpected ways that you have seen Privacera used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Privacera?
When is Privacera the wrong choice?
What do you have planned for the future of Privacera?

Contact Info

LinkedIn
@Balaji_Blog on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.__init__, to learn about the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links

Privacera
Hadoop
Hortonworks
Apache Ranger
Oracle
Teradata
Presto/Trino
Starburst (Podcast Episode)
Ahana (Podcast Episode)

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By: Acryl

The modern data stack needs a reimagined metadata management platform. Acryl Data’s vision is to bring clarity to your data through its next generation multi-cloud metadata management platform. Founded by the leaders that created projects like LinkedIn DataHub and Airbnb Dataportal, Acryl Data enables delightful search and discovery, data observability, and federated governance across data ecosystems. Sign up for the SaaS product today at dataengineeringpodcast.com/acryl

Support Data Engineering Podcast

podcast_episode
by Cris deRitis, Mark Zandi (Moody's Analytics), Eric Gaus (Moody's Analytics), Ryan Sweet

Eric Gaus, Senior Economist at Moody's Analytics, joins the podcast to discuss an array of issues that are bothering the team, including U.S. stock prices, the yield curve, and the labor market. The podcast also provides an update on the economic impact of the military conflict between Russia and Ukraine. The big topic is geopolitical risk. Full Episode Transcript. Follow Mark Zandi @MarkZandi, Ryan Sweet @RealTime_Econ and Cris deRitis on LinkedIn for additional insight.

Questions or comments? Please email us at [email protected]. We would love to hear from you. To stay informed and follow the insights of Moody's Analytics economists, visit Economic View.

A debate has erupted on data Twitter and data Substack: should the modern data stack remain unbundled, or should it consolidate? In this conversation, Benn Stancil (Mode), David Jayatillake (Avora) and our host Tristan Handy try to make some sense of this debate, and play with various future scenarios for the modern data stack. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

Simplify Big Data Analytics with Amazon EMR

Simplify Big Data Analytics with Amazon EMR is a thorough guide to harnessing Amazon's EMR service for big data processing and analytics. From distributed computation pipelines to real-time streaming analytics, this book provides hands-on knowledge and actionable steps for implementing data solutions efficiently.

What this book will help me do:
Understand the architecture and key components of Amazon EMR and how to deploy it effectively.
Learn to configure and manage distributed data processing pipelines using Amazon EMR (see the sketch below).
Implement security and data governance best practices within the Amazon EMR ecosystem.
Master batch ETL and real-time analytics techniques using technologies like Apache Spark.
Apply optimization and cost-saving strategies to scalable data solutions.

Author(s): Sakti Mishra is a seasoned data professional with extensive expertise in deploying scalable analytics solutions on cloud platforms like AWS. With a background in big data technologies and a passion for teaching, Sakti ensures practical insights accompany every concept. Readers will find his approach thorough, hands-on, and highly informative.

Who is it for? This book is perfect for data engineers, data scientists, and other professionals looking to leverage Amazon EMR for scalable analytics. If you are familiar with Python, Scala, or Java and have some exposure to Hadoop or AWS ecosystems, this book will empower you to design and implement robust data pipelines efficiently.
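To give a flavor of what configuring an EMR pipeline looks like in practice, here is a minimal sketch (not from the book) that launches a transient EMR cluster with Spark and submits a single job step using the AWS boto3 SDK. The cluster name, bucket, script path, and instance types are hypothetical; the IAM roles shown are the EMR defaults.

```python
# Minimal sketch: launch a transient EMR cluster that runs one Spark job
# and terminates when the step finishes. Names, paths, and instance
# types are hypothetical; the roles are the EMR defaults.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="analytics-demo",            # hypothetical cluster name
    ReleaseLabel="emr-6.9.0",         # an EMR release that bundles Spark
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"Name": "primary", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        # Terminate the cluster once all steps complete.
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    Steps=[{
        "Name": "nightly-etl",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            # command-runner.jar is EMR's generic step launcher.
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://my-bucket/jobs/etl.py"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",  # EC2 instance profile
    ServiceRole="EMR_DefaultRole",      # EMR service role
)
print("Started cluster:", response["JobFlowId"])
```

Keeping the cluster transient (terminating when the steps finish) is one of the cost-saving strategies the book's optimization material covers.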

Data is everywhere, but do we know what it means? A common problem for many enterprises wanting to adopt cutting edge, data-driven solutions is that they have a ton of legacy applications interlinking with more modern tech stacks. If the organisation is large or complex enough, it typically becomes unrealistic for any one individual to understand how it all hangs together. All of these applications generate data points with their own definitions, meaning and naming conventions. How do organisations like these set themselves up for success in a data-driven world, technically and culturally? How can we create a consistent and holistic view of our data that can be used equally by technologists, analysts and business users? To answer these questions, I recently spoke to David P. Mariani, who is the founder and Chief Technology Officer of AtScale. Dave is an incredibly talented technology executive and entrepreneur with more than $800 million worth of company exits on his resume. In this episode of Leaders of Analytics, we discuss:

How to create successful technology companies from scratch
What David learned during his time at Yahoo! that made him start AtScale
What a semantic layer is and what it does for your organisation (see the toy sketch below)
What David’s utopian technology stack would look like and why
David’s vision for how data-driven organisations will function in the future
How a universal semantic layer fits into this future, and much more.

David's LinkedIn: https://www.linkedin.com/in/davidpmariani/
AtScale's company website (lots of great content on here): https://www.atscale.com/
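For readers new to the concept: a semantic layer maps business-friendly metric names onto the physical SQL needed to compute them, so every tool queries one shared definition. The toy sketch below illustrates the idea only; it is not AtScale's API, and all table, column, and metric names are hypothetical.

```python
# Toy illustration of what a semantic layer does: business users ask for
# metrics by name, and the layer owns the mapping to physical tables and
# SQL. This is not AtScale's API; all names here are hypothetical.

METRICS = {
    # metric name -> (SQL aggregation expression, source table)
    "revenue": ("SUM(order_total)", "warehouse.orders"),
    "active_users": ("COUNT(DISTINCT user_id)", "warehouse.events"),
}

def compile_query(metric: str, group_by: str) -> str:
    """Translate a named business metric into warehouse SQL."""
    expr, table = METRICS[metric]
    return (
        f"SELECT {group_by}, {expr} AS {metric} "
        f"FROM {table} GROUP BY {group_by}"
    )

# Dashboards, notebooks, and analysts all get the same definition:
print(compile_query("revenue", "region"))
# SELECT region, SUM(order_total) AS revenue FROM warehouse.orders GROUP BY region
```

The payoff is that when a definition changes, it changes in one place rather than in every BI tool and notebook that consumes it.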


Data Analytics, Computational Statistics, and Operations Research for Engineers

This book investigates the role of data mining in computational statistics for machine learning. It offers applications that can be used in various domains and examines the role of transformation functions in optimizing problem statements.

Mark, Ryan, and Cris do a deep dive into GDP. What is it? How is it measured, and what are its shortcomings? Full episode transcript. Kennedy notably outlined why he thought the gross national product was an insufficient measure of success. He emphasized the negative values it accounted for and the positive ones it ignored: Even if we act to erase material poverty, there is another greater task, it is to confront the poverty of satisfaction - purpose and dignity - that afflicts us all. Too much and for too long, we seemed to have surrendered personal excellence and community values in the mere accumulation of material things. Our Gross National Product, now, is over $800 billion dollars a year, but that Gross National Product - if we judge the United States of America by that - that Gross National Product counts air pollution and cigarette advertising, and ambulances to clear our highways of carnage. It counts special locks for our doors and the jails for the people who break them. It counts the destruction of the redwood and the loss of our natural wonder in chaotic sprawl. It counts napalm and counts nuclear warheads and armored cars for the police to fight the riots in our cities. It counts Whitman's rifle and Speck's knife, and the television programs which glorify violence in order to sell toys to our children. Yet the gross national product does not allow for the health of our children, the quality of their education or the joy of their play. It does not include the beauty of our poetry or the strength of our marriages, the intelligence of our public debate or the integrity of our public officials. It measures neither our wit nor our courage, neither our wisdom nor our learning, neither our compassion nor our devotion to our country, it measures everything in short, except that which makes life worthwhile. And it can tell us everything about America except why we are proud that we are Americans. If this is true here at home, so it is true elsewhere in the world. Follow Mark Zandi @MarkZandi, Ryan Sweet @RealTime_Econ and Cris deRitis on LinkedIn for additional insight.
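For reference (not from the episode itself): the expenditure approach, the most common way GDP is measured, computes

GDP = C + I + G + (X − M)

where C is household consumption, I is private investment, G is government spending, and X − M is net exports. The gross national product Kennedy referenced differs from GDP by also counting net income that residents earn abroad.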

Questions or comments? Please email us at [email protected]. We would love to hear from you. To stay informed and follow the insights of Moody's Analytics economists, visit Economic View.

podcast_episode
by Val Kroll, Julie Hoyer, Tim Wilson (Analytics Power Hour - Columbus (OH)), Mai AlOwaish (Gulf Bank), Moe Kiss (Canva), Michael Helbling (Search Discovery)

Can a digital analyst make it to the C-suite? And, if she does, will she wonder, "Oh, dear. WHAT have I gotten myself into?!" The answer to the first question is "Yes!" And our guest for this episode is a proof point: Mai AlOwaish is the Chief Data and Innovation Officer at Gulf Bank, and she spent a good portion of her career in digital analytics before taking on that role! The answer to the second question is, "Not if you go in with a clear vision and strategy!" But, of course, there's a lot more to it than that! For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.