Analytics

Graph Algorithms

2019-05-16 · O'Reilly Data Science Books O'Reilly Amazon

book

by Mark Needham , Amy E. Hodler

AI/ML Neo4j Spark data data-science

Learn how graph algorithms can help you leverage relationships within your data to develop intelligent solutions and enhance your machine learning models. With this practical guide, developers and data scientists will discover how graph analytics deliver value, whether they’re used for building dynamic network models or forecasting real-world behavior. Mark Needham and Amy Hodler from Neo4j explain how graph algorithms describe complex structures and reveal difficult-to-find patterns, from finding vulnerabilities and bottlenecks to detecting communities and improving machine learning predictions. You’ll walk through hands-on examples that show you how to use graph algorithms in Apache Spark and Neo4j, two of the most common choices for graph analytics.
• Learn how graph analytics reveal more predictive elements in today’s data
• Understand how popular graph algorithms work and how they’re applied
• Use sample code and tips from more than 20 graph algorithm examples
• Learn which algorithms to use for different types of questions
• Explore examples with working code and sample datasets for Spark and Neo4j
• Create an ML workflow for link prediction by combining Neo4j and Spark

Enterprise Insight with Dinesh Nirmal - Making Data Simple [Season 3 - Episode 19]

2019-05-15 · Making Data Simple Listen

podcast_episode

by Dinesh Nirmal (IBM Software) , Al Martin (IBM)

AI/ML Big Data Blockchain Data Analytics IBM React

This week on Making Data Simple, Dinesh Nirmal comes on the show to discuss current industry trends. Host Al Martin poses questions that are both technical and leadership oriented. Together, they discuss the new, emerging technologies that drive them while providing their own definitions of team building and success. Listen, engage, react. Give us your feedback and get in on the conversation.

Show Notes

Check us out on:
- YouTube
- Apple Podcasts
- Google Play Music
- Spotify
- TuneIn
- Stitcher

00:10 - Connect with Producer Steve Moore on LinkedIn and Twitter.
00:15 - Connect with Producer Liam Seston on LinkedIn and Twitter.
00:20 - Connect with Producer Rachit Sharma on LinkedIn.
00:25 - Connect with Host Al Martin on LinkedIn and Twitter.
01:37 - Connect with Dinesh Nirmal on LinkedIn and Twitter.
06:06 - An interesting read on the state of illegal dumping in rural California.
11:14 - Some examples of successful AI use cases.
14:31 - Learn about blockchain here.
29:06 - Find out how open source is helping remove data silos in the enterprise.
32:40 - Check out IBM's content on big data analytics.

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next.

The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

How to Deliver 14 Dashboards Users Love in 2 Months with Jeremy Kuhlenbeck

2019-05-15 · Analytics on Fire Listen

podcast_episode

by Mico Yuk (Data Storytelling Academy) , Jeremy Kuhlenbeck (cBeyondData)

BI

Jeremy Kuhlenbeck is a BI rockstar! As a Senior BI Consultant at cBeyondData, Jeremy used our BI DataStorytelling Mastery class tools, techniques, and methods to turn an almost failing BI project around at the last minute. Jeremy describes his bumpy road to becoming a chief data storyteller, and the actions you can also take to do so!

Enjoyed the Show? Please leave us a review on iTunes.

Sponsor

BI Data Storytelling Mastery Accelerator – This season of AOF is sponsored by our BIDS live 2-Day workshops. Many BI teams are still struggling to deliver consistent, highly engaging analytics their users love. At the end of two days, you'll leave with a clear BI delivery action plan for your BI team. Join us!

For all links and resources mentioned visit: https://bibrainz.com/podcast/21

[Rebroadcast] Ep.2 - The End of Tech Companies with Rob Thomas

2019-05-10 · Making Data Simple Listen

podcast_episode

by Rob Thomas , Dr. Patrick E McSharry , Al Martin (IBM)

Big Data C#/.NET HTML IBM

In an ever-changing and growing age of data and technology, how can you turn data into better decisions for your company? How do you keep up? Is there a recipe for greatness? In this podcast, Rob Thomas, General Manager of Analytics at IBM, discusses data, tech companies and his two books, Big Data Revolution and The End of Tech Companies. "A crisis has arrived whether you know it or not." What are you doing to prepare?

Show Notes

00:30 Connect with Al Martin on Twitter (@amartin_v) and LinkedIn (linkedin.com/in/al-martin-ku)
01:00 Connect with Rob Thomas on Twitter (@robdthomas) and LinkedIn (linkedin.com/in/robertdthomas) and read more of his work on his blog https://www.robdthomas.com/
02:30 Big Data Revolution by Rob Thomas & Patrick McSharry, The End of Tech Companies by Rob Thomas
04:35 Find Rob Thomas' first blog post here: https://www.robdthomas.com/robdthomas//2013/02/patterns-in-big-data.html
05:30 Connect with Dr. Patrick E McSharry on LinkedIn linkedin.com/in/mcsharry, his personal website mcsharry.net or Twitter @patrickmcsharry
06:20 http://www.costar.com/
14:10 Connect with Warren Buffett on Twitter (@WarrenBuffett)
14:40 Connect with Clayton Christensen on Twitter (@claychristensen) and LinkedIn (linkedin.com/in/claytonchristensen)
24:50 Learn more about DomusKids on their website http://domuskids.org/ and connect with them on Twitter @DomusKids
26:15 Above the Line: Lessons in Leadership and Life from a Championship Season by Urban Meyer & Wayne Coffey
26:30 Chasing Excellence: A Story About Building the World's Fittest Athletes by Ben Bergeron

The difference between a BI Manager and a BI Leader with Heather Sinkwitz of Mobile Mini

2019-05-08 · Analytics on Fire Listen

podcast_episode

by Mico Yuk (Data Storytelling Academy) , Heather Sinkwitz (Mobile Mini Solutions)

BI

In our newest episode of Analytics on Fire, I talk with Heather Sinkwitz. She is the BI Manager at Mobile Mini Solutions, a mobile storage facility management company based in Phoenix, AZ. We talk extensively about the difference between a BI Manager and a BI Leader. She shares her journey and how she inspires her team and organization through team dynamics and data.

Sponsor

This exciting season of AOF is sponsored by our BI Data Storytelling Mastery Accelerator 2-Day Live workshops. Many BI teams are still struggling to deliver consistent, highly engaging analytics their users love. At the end of two days, you'll leave with a clear BI delivery action plan for your BI team. Join us!


For all links and resources mentioned visit: https://bibrainz.com/podcast/20

Analyzing Social Media Networks with NodeXL, 2nd Edition

2019-05-08 · O'Reilly Data Science Books O'Reilly Amazon

book

by Ben Shneiderman (University of Maryland) , Marc A. Smith , Derek Hansen , Itai Himelboim

data data-science data-science-tasks graph-analytics

Analyzing Social Media Networks with NodeXL: Insights from a Connected World, Second Edition, provides readers with a thorough, practical and updated guide to NodeXL, the open-source social network analysis (SNA) plug-in for use with Excel. The book analyzes social media, provides a NodeXL tutorial, and presents network analysis case studies, all of which are revised to reflect the latest developments. Sections cover history and concepts, mapping and modeling, the detailed operation of NodeXL, and case studies, including e-mail, Twitter, Facebook, Flickr and YouTube. In addition, there are descriptions of each system and types of analysis for identifying people, documents, groups and events. This book is perfect for use as a course text in social network analysis or as a guide for practicing NodeXL users.
• Walks users through NodeXL while also explaining the theory and development behind each step
• Demonstrates how visual analytics research can be applied to SNA tools for the mass market
• Includes updated case studies from researchers who use NodeXL on popular networks like email, Facebook, Twitter, and Instagram
• Includes downloadable companion materials and online resources at https://www.smrfoundation.org/nodexl/teaching-with-nodexl/teaching-resources/

Using FoundationDB As The Bedrock For Your Distributed Systems

2019-05-07 · Data Engineering Podcast Listen

podcast_episode

by Ryan Worl , Tobias Macey

AI/ML AWS Big Data Cloud Computing Data Analytics Data Engineering Data Management Data Science ELK Marketing Data Streaming

Summary

The database market continues to expand, offering systems that are suited to virtually every use case. But what happens if you need something customized to your application? FoundationDB is a distributed key-value store that provides the primitives that you need to build a custom database platform. In this episode Ryan Worl explains how it is architected, how to use it for your applications, and provides examples of system design patterns that can be built on top of it. If you need a foundation for your distributed systems, then FoundationDB is definitely worth a closer look.

Announcements

• Hello and welcome to the Data Engineering Podcast, the show about modern data management.
• When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
• Alluxio is an open source, distributed data orchestration layer that makes it easier to scale your compute and your storage independently. By transparently pulling data from underlying silos, Alluxio unlocks the value of your data and allows for modern computation-intensive workloads to become truly elastic and flexible for the cloud. With Alluxio, companies like Barclays, JD.com, Tencent, and Two Sigma can manage data efficiently, accelerate business analytics, and ease the adoption of any cloud. Go to dataengineeringpodcast.com/alluxio today to learn more and thank them for their support.
• Understanding how your customers are using your product is critical for businesses of any size. To make it easier for startups to focus on delivering useful features Segment offers a flexible and reliable data infrastructure for your customer analytics and custom events. You only need to maintain one integration to instrument your code and get a future-proof way to send data to over 250 services with the flip of a switch. Not only does it free up your engineers’ time, it lets your business users decide what data they want where. Go to dataengineeringpodcast.com/segmentio today to sign up for their startup plan and get $25,000 in Segment credits and $1 million in free software from marketing and analytics companies like AWS, Google, and Intercom. On top of that you’ll get access to Analytics Academy for the educational resources you need to become an expert in data analytics for measuring product-market fit.
• You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
• Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
• To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
• Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
• Your host is Tobias Macey and today I’m interviewing Ryan Worl about FoundationDB, a distributed key/value store that gives you t

Visual Analytics with Tableau

2019-05-07 · O'Reilly Data Science Books O'Reilly Amazon

book

by Alexander Loth

DataViz Tableau data data-science data-science-tasks data-visualization

A four-color journey through a complete Tableau visualization

Tableau is a popular data visualization tool that’s easy to use on an individual desktop as well as across an enterprise. Used by financial analysts, marketers, statisticians, business and sales leadership, and many other job roles to present data visually for easy understanding, it’s no surprise that Tableau is an essential tool in our data-driven economy. Visual Analytics with Tableau is a complete journey in Tableau visualization for a non-technical business user. You can start from zero, connect your first data, and get right into creating and publishing awesome visualizations and insightful dashboards.
• Learn the different types of charts you can create
• Use aggregation, calculated fields, and parameters
• Create insightful maps
• Share interactive dashboards
Geared toward beginners looking to get their feet wet with Tableau, this book makes it easy and approachable to get started right away.

[Rebroadcast] Ep.1 - The Big Data Problem with Daniel Hernandez

2019-05-03 · Making Data Simple Listen

podcast_episode

by Daniel Hernandez (IBM) , Al Martin (IBM)

Big Data IBM

In this first episode of Making Data Simple, host Al Martin welcomes Daniel Hernandez, Vice President of IBM Analytics Offering Management, who helps us navigate "the big data problem" and shares why he doesn't like the term "big data."

Show Notes

01:30 Connect with Al Martin on Twitter (@amartin_v) and LinkedIn (linkedin.com/in/al-martin-ku)
04:30 Connect with Daniel Hernandez on Twitter (@danhernandezATX) and LinkedIn (linkedin.com/in/danielghernandez)
06:15 NPS = Net Promoter Score (http://www.medallia.com/net-promoter-score/)
08:40 The four Vs of Big Data (http://www.ibmbigdatahub.com/infographic/four-vs-big-data)
17:30 Accidental Empires, written by Robert X. Cringely (1996); Dealers of Lightning: Xerox PARC and the Dawn of the Computer Age, written by Michael A Hiltzik (2000)

How to Stop users from questioning your KPIs with Jonathan Sharr of Middlesex Health

2019-05-01 · Analytics on Fire Listen

podcast_episode

by Mico Yuk (Data Storytelling Academy) , Jonathan Sharr (Middlesex Health)

BI KPI

We kick off season 2 with a true business intelligence success story! Jonathan Sharr's is the kind of story that keeps us going! Using our free BIDF KPI Blueprint template, he put all questions around KPI definitions to an end. Since then he has gone from analyst to Manager of Business Intelligence & Analytics at Middlesex Health. He describes the exact moment as an analyst when he knew something had to change, how he currently moves his team forward, and how you too can use some of his techniques to stop the KPI madness!


For the show notes and all the free resources mentioned visit: https://bibrainz.com/podcast/19

Welcome back AoF Community!

2019-05-01 · Analytics on Fire Listen

podcast_episode

by Mico Yuk (Data Storytelling Academy)

BI

Welcome back to Analytics on Fire, folks! We have quite a season ahead of us with some big changes and announcements. This episode is pretty short, only 15 minutes, but jam-packed with the good stuff. We're glad to be back. Enjoy!


For all links and resources mentioned visit: https://bibrainz.com/podcast/18

Catching the Domo Spirit - Audio Blog

2019-04-30 · Secrets of Data Analytics Leaders Listen

podcast_episode

Last month, I attended Domo’s annual user conference for the first time. I came a skeptic, but left a believer. Domo has invested large sums of money to create a comprehensive data and analytics platform that scales to run small and medium-size businesses, and possibly large ones. Most importantly, it has a cadre of highly satisfied brand-name customers who want to extend the platform to support all business users and their analytic applications.

Originally published at: https://www.eckerson.com/articles/catching-the-domo-spirit

Streams Everywhere - Towards Streaming-First Architectures - Audio Blog

2019-04-30 · Secrets of Data Analytics Leaders Listen

podcast_episode

Data Streaming

Processing continuous data streams is becoming increasingly important. However, traditional analytics architectures were often not built for real-time scenarios. This article will illustrate challenges and discuss how streaming-first approaches can change the way we think about analytics architectures.

Originally published at: https://www.eckerson.com/articles/streams-everywhere-towards-streaming-first-architectures

Data Architecture: A Primer for the Data Scientist, 2nd Edition

2019-04-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Mary Levins , Daniel Linstedt , W. H. Inmon

Big Data Data Science DWH data data-engineering

Over the past 5 years, the concept of big data has matured, data science has grown exponentially, and data architecture has become a standard part of organizational decision-making. Throughout all this change, the basic principles that shape the architecture of data have remained the same. There remains a need for people to take a look at the "bigger picture" and to understand where their data fit into the grand scheme of things. Data Architecture: A Primer for the Data Scientist, Second Edition addresses the larger architectural picture of how big data fits within the existing information infrastructure or data warehousing systems. This is an essential topic not only for data scientists, analysts, and managers but also for researchers and engineers who increasingly need to deal with large and complex sets of data. Until data are gathered and can be placed into an existing framework or architecture, they cannot be used to their full potential. Drawing upon years of practical experience and using numerous examples and case studies from across various industries, the authors seek to explain this larger picture into which big data fits, giving data scientists the necessary context for how pieces of the puzzle should fit together.
• New case studies include expanded coverage of textual management and analytics
• New chapters on visualization and big data
• Discussion of new visualizations of the end-state architecture

Elasticsearch 7.0 Cookbook - Fourth Edition

2019-04-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alberto Paro

Big Data Data Analytics ELK data data-engineering elasticsearch search

"Elasticsearch 7.0 Cookbook" is a practical guide to effectively using Elasticsearch, packed with over 100 recipes that cover everything from simple setup tasks to advanced query creation. Whether you're deploying Elasticsearch nodes or integrating with various technologies, this book will empower you to make the most out of Elasticsearch's robust search capabilities.

What this Book will help me do
• Understand how to efficiently deploy and manage Elasticsearch architectures within your enterprise.
• Learn to create and optimize queries for effective analytics and data retrieval.
• Explore advanced indexing and mapping techniques to enhance data searchability.
• Monitor and scale your Elasticsearch clusters to ensure optimal performance.
• Integrate Elasticsearch with programming languages and big data applications.

Author(s)
Alberto Paro, a seasoned Elasticsearch expert, brings years of experience in designing and implementing large-scale search and analytics solutions. His practical experience in guiding teams through complex Elasticsearch deployments is evident in his clear and solution-focused writing approach. Alberto's passion for technology drives his mission to make advanced technical topics accessible.

Who is it for?
This book is ideal for software engineers, data professionals, and Elasticsearch developers who are looking to expand their technical capabilities in search and data analytics. It is also suited for individuals in industries like e-commerce utilizing Elastic for insights. A basic understanding of Elasticsearch will allow readers to gain deeper value from this book.

TIBCO Spotfire: A Comprehensive Primer - Second Edition

2019-04-30 · O'Reilly Data Science Books O'Reilly Amazon

book

by Michael Phillips , Andrew Berridge

BI Data Analytics DataViz TIBCO Spotfire analytics-platforms data data-science spotfire

Explore the possibilities of TIBCO Spotfire with this comprehensive guide. You'll start with fundamental data visualization principles and progress to creating powerful, professional-grade analytics dashboards and applications. By following this book, you'll master both basic usage and advanced features such as predictive and spatial analytics.

What this Book will help me do
• Understand the fundamentals of TIBCO Spotfire and its various interfaces including web and desktop clients.
• Utilize Spotfire's range of visualization tools to effectively analyze and present data.
• Develop robust analytics dashboards and applications tailored for enterprise needs.
• Implement advanced features like predictive analytics and location-based data representations.
• Learn strategies for deploying and administrating Spotfire in a scalable, enterprise-oriented environment.

Author(s)
The authors, Andrew Berridge and Michael Phillips, bring years of experience in business intelligence and data analytics. Their practical knowledge and real-world perspective shape the book into a practical resource for learning Spotfire. Their approach ensures that concepts are clearly explained with relatable examples, improving accessibility for all readers.

Who is it for?
This book is intended for business intelligence professionals, data analysts, and developers who aim to enhance their analytics skills using TIBCO Spotfire. It is suitable for beginners as no prior experience with Spotfire or advanced analytics is required. Readers looking to develop enterprise-grade visualization and analytical solutions will find it valuable.

Running Your Database On Kubernetes With KubeDB

2019-04-29 · Data Engineering Podcast Listen

podcast_episode

by Tamal Saha , Tobias Macey

AI/ML AWS Big Data Cloud Computing Data Analytics Data Engineering Data Management Data Science ELK Kubernetes Marketing Data Streaming

Summary

Kubernetes is a driving force in the renaissance around deploying and running applications. However, managing the database layer is still a separate concern. The KubeDB project was created as a way of providing a simple mechanism for running your storage system in the same platform as your application. In this episode Tamal Saha explains how the KubeDB project got started, why you might want to run your database with Kubernetes, and how to get started. He also covers some of the challenges of managing stateful services in Kubernetes and how the fast pace of the community has contributed to the evolution of KubeDB. If you are at any stage of a Kubernetes implementation, or just thinking about it, this is definitely worth a listen to get some perspective on how to leverage it for your entire application stack.


Data Science and Engineering at Enterprise Scale

2019-04-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jerome Nilmeier

AI/ML Data Science Python Spark SQL Data Streaming TensorFlow data data-science

As enterprise-scale data science sharpens its focus on data-driven decision making and machine learning, new tools have emerged to help facilitate these processes. This practical ebook shows data scientists and enterprise developers how the notebook interface, Apache Spark, and other collaboration tools are particularly well suited to bridge the communication gap between their teams. Through a series of real-world examples, author Jerome Nilmeier demonstrates how to generate a model that enables data scientists and developers to share ideas and project code. You’ll learn how data scientists can approach real-world business problems with Spark and how developers can then implement the solution in a production environment.
• Dive deep into data science technologies, including Spark, TensorFlow, and the Jupyter Notebook
• Learn how Spark and Python notebooks enable data scientists and developers to work together
• Explore how the notebook environment works with Spark SQL for structured data
• Use notebooks and Spark as a launchpad to pursue supervised, unsupervised, and deep learning data models
• Learn additional Spark functionality, including graph analysis and streaming
• Explore the use of analytics in the production environment, particularly when creating data pipelines and deploying code

Unpacking Fauna: A Global Scale Cloud Native Database

2019-04-22 · Data Engineering Podcast Listen

podcast_episode

by Evan Weaver (Fauna) , Tobias Macey

AI/ML AWS Big Data Cassandra Cloud Computing Data Analytics Data Engineering Data Management Data Modelling Data Science DynamoDB ELK +7 more

Summary

One of the biggest challenges for any business trying to grow and reach customers globally is how to scale their data storage. FaunaDB is a cloud native database built by the engineers behind Twitter’s infrastructure and designed to serve the needs of modern systems. Evan Weaver is the co-founder and CEO of Fauna and in this episode he explains the unique capabilities of Fauna, compares the consensus and transaction algorithm to that used in other NewSQL systems, and describes the ways that it allows for new application design patterns. One of the unique aspects of Fauna that is worth drawing attention to is the first-class support for temporality that simplifies querying of historical states of the data. It is definitely worth a good look for anyone building a platform that needs a simple-to-manage data layer that will scale with their business.

Announcements

Your host is Tobias Macey and today I’m interviewing Evan Weaver about FaunaDB, a modern operational data platform built for your cloud

Interview

Introduction

How did you get involved in the area of data management?

Can you start by explaining what FaunaDB is and how it got started?

What are some of the main use cases that FaunaDB is targeting?

How does it compare to some of the other global scale databases that have been built in recent years such as CockroachDB?

Can you describe the architecture of FaunaDB and how it has evolved?

The consensus and replication protocol in Fauna is intriguing. Can you talk through how it works?

What are some of the edge cases that users should be aware of?

How are conflicts managed in Fauna?

What is the underlying storage layer?

How is the query layer designed to allow for different query patterns and model representations?

How does data modeling in Fauna compare to that of relational or document databases?

Can you describe the query format?

What are some of the common difficulties or points of confusion around interacting with data in Fauna?

What are some application design patterns that are enabled by using Fauna as the storage layer?

Given the ability to replicate globally, how do you mitigate latency when interacting with the database?

What are some of the most interesting or unexpected ways that you have seen Fauna used?

When is it the wrong choice?

What have been some of the most interesting/unexpected/challenging aspects of building the Fauna database and company?

What do you have in store for the future of Fauna?

Contact Info

@evan on Twitter LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Fauna, Ruby on Rails, CNET, GitHub, Twitter, NoSQL, Cassandra, InnoDB, Redis, Memcached, Timeseries, Spanner Paper, DynamoDB Paper, Percolator, ACID, Calvin Protocol, Daniel Abadi, LINQ, LSM Tree (Log-structured Merge-tree), Scala, Change Data Capture, GraphQL

Podcast.init Interview About Graphene

Fauna Query Language (FQL), CQL == Cassandra Query Language, Object-Relational Databases, LDAP == Lightweight Directory Access Protocol, Auth0, OLAP == Online Analytical Processing, Jepsen distributed systems safety research

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

Index Your Big Data With Pilosa For Faster Analytics

2019-04-15 · Data Engineering Podcast Listen

podcast_episode

by Seebs , Tobias Macey

AI/ML AWS Big Data Cloud Computing Data Analytics Data Engineering Data Management Data Science ELK Marketing Data Streaming

Summary

Database indexes are critical to ensure fast lookups of your data, but they are inherently tied to the database engine. Pilosa is rewriting that equation by providing a flexible, scalable, performant engine for building an index of your data to enable high-speed aggregate analysis. In this episode Seebs explains how Pilosa fits in the broader data landscape, how it is architected, and how you can start using it for your own analysis. This was an interesting exploration of a different way to look at what a database can be.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Your host is Tobias Macey and today I’m interviewing Seebs about Pilosa, an open source, distributed bitmap index.

Interview

Introduction How did you get involved in the area of data
