talk-data.com

Topic

Big Data

data_processing analytics large_datasets

1217 tagged

Activity Trend: peak of 28 per quarter, 2020-Q1 to 2026-Q1

Activities

1217 activities · Newest first

In this podcast, Rahul Kashyap (@RCKashyap) talks about the state of security at the crossroads of technology and business, and about the mindset of a security-led technologist. He sheds light on past, present, and future security risks, discusses some common leadership concerns, and explains how a technologist could work around them. This podcast is a must for all technologists and aspiring technologists who want to grow their organizations.

Timeline: 0:29 Rahul's journey. 4:40 Rahul's current role. 7:58 How the types of cyberattacks have changed. 12:53 How has IT interaction evolved? 16:50 Problems in the security industry. 20:12 Market mindset vs. security mindset. 23:10 Ownership of data. 27:02 Cloud, SaaS, and security. 31:40 Priorities for securing an enterprise. 34:50 How security is secure enough. 37:40 Providing a stable core to the business. 41:11 The state of data science vis a vis security. 44:05 Future of security, data science, and AI. 46:14 Distributed computing and security. 50:30 Tenets of Rahul's success. 53:15 Rahul's favorite read. 54:35 Closing remarks.

Rahul's Recommended Read: Mindset: The New Psychology of Success – Carol S. Dweck http://amzn.to/2GvEX2F

Podcast Link: https://futureofdata.org/rckashyap-cylance-on-state-of-security-technologist-mindset-futureofdata-podcast/

Rahul's BIO: Rahul Kashyap is the Global Chief Technology Officer at Cylance, where he is responsible for strategy, products, and architecture.

Rahul has been instrumental in building several key security technologies viz: Network Intrusion Prevention Systems (NIPS), Host Intrusion Prevention Systems (HIPS), Web Application Firewalls (WAF), Whitelisting, Endpoint/Server Host Monitoring (EDR), and Micro-virtualization. He has been awarded several patents for his innovations. Rahul is an accomplished pen-tester and has in-depth knowledge of OS, networking, and security products.

Rahul has written several security research papers, blogs, and articles that are widely quoted and referenced by media around the world. He has built, led, and scaled award-winning teams that innovate and solve complex security challenges in both large and start-up companies.

He is frequently featured in several podcasts, webinars, and media briefings. Rahul has been a speaker at several top security conferences like BlackHat, BlueHat, Hack-In-The-Box, RSA, DerbyCon, BSides, ISSA International, OWASP, InfoSec UK, and others. He was named 'Silicon Valley's 40 under 40' by Silicon Valley Business Journal.

Rahul mentors entrepreneurs who work with select VC firms and is on the advisory board of tech start-ups.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Wanna join? If you or anyone you know wants to join in, register your interest at http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Spark: The Definitive Guide

Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of the open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals. You’ll explore the basic operations and common functions of Spark’s structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark’s scalable machine-learning library. Get a gentle overview of big data and Spark. Learn about DataFrames, SQL, and Datasets, Spark’s core APIs, through worked examples. Dive into Spark’s low-level APIs, RDDs, and execution of SQL and DataFrames. Understand how Spark runs on a cluster. Debug, monitor, and tune Spark clusters and applications. Learn the power of Structured Streaming, Spark’s stream-processing engine. Learn how you can apply MLlib to a variety of problems, including classification or recommendation.

In this last part of the two-part podcast, @TimothyChou discusses the future of the Internet of Things landscape. He lays out how the internet has always been about the internet of things and not the internet of people. He sheds light on the Internet of Things as it spreads across the themes of things, connect, collect, learn, and do workflows. He builds an interesting case for achieving precision as a way to introduce optimality.

Timeline: 0:29 Timothy's journey. 8:56 Selling cloud to Oracle. 15:57 Communicating economics and technology disruption. 23:54 Internet of people to the internet of things.

Timothy's Recommended Read:
Life 3.0: Being Human in the Age of Artificial Intelligence by Max Tegmark http://amzn.to/2Cidyhy
Zone to Win: Organizing to Compete in an Age of Disruption by Geoffrey A. Moore http://amzn.to/2Hd5zpv

Podcast Link: https://futureofdata.org/timothychou-on-world-of-iot-its-future-part-2/

Timothy's BIO: Timothy Chou's career has spanned academia, successful (and not so successful) startups, and large corporations. He was one of only a few people to hold the President's title at Oracle. As President of Oracle On Demand, he grew the cloud business from its very beginning. Today that business is over $2B. He wrote about the move of applications to the cloud in 2004 in his first book, “The End of Software”. Today he serves on the board of Blackbaud, a nearly $700M vertical application cloud service company.

After earning his Ph.D. in EE at the University of Illinois, he went to work for Tandem Computers, one of the original Silicon Valley startups. Had he understood stock options, he would have joined earlier. He’s invested in and been a contributor to a number of other startups, some you’ve heard of like Webex, and others you’ve never heard of but were sold to companies like Cisco and Oracle. Today he is focused on several new ventures in cloud computing, machine learning, and the Internet of Things.

Advances in Financial Machine Learning

Machine learning (ML) is changing virtually every aspect of our lives. Today ML algorithms accomplish tasks that until recently only expert humans could perform. As it relates to finance, this is the most exciting time to adopt a disruptive technology that will transform how everyone invests for generations. Readers will learn how to structure Big data in a way that is amenable to ML algorithms; how to conduct research with ML algorithms on that data; how to use supercomputing methods; how to backtest your discoveries while avoiding false positives. The book addresses real-life problems faced by practitioners on a daily basis, and explains scientifically sound solutions using math, supported by code and examples. Readers become active users who can test the proposed solutions in their particular setting. Written by a recognized expert and portfolio manager, this book will equip investment professionals with the groundbreaking tools needed to succeed in modern finance.

Big Data Demystified

'Big Data' refers to a new class of data, to which 'big' doesn't quite do it justice. Much like an ocean is more than simply a deeper swimming pool, big data is fundamentally different to traditional data and needs a whole new approach. Packed with examples and case studies, this clear, comprehensive book will show you how to accumulate and utilise 'big data' in order to develop your business strategy. Big Data Demystified is your practical guide to help you draw deeper insights from the vast information at your fingertips; you will be able to understand customer motivations, speed up production lines, and even offer personalised experiences to each and every customer. With 20 years of industry experience, David Stephenson shows how big data can give you the best competitive edge, and why it is integral to the future of your business.

In this first part of a two-part podcast, @TimothyChou discusses the Internet of Things landscape. He lays out how the internet has always been about the internet of things and not the internet of people. He sheds light on the Internet of Things as it spreads across the themes of things, connect, collect, learn, and do workflows. He builds an interesting case for achieving precision as a way to introduce optimality.

Timeline: 0:29 Reason behind the failure of IoT projects. 19:10 Which businesses will be impacted by IoT expansion? 30:22 How is IoT getting impacted in the world of AI. 40:35 Innovative startups in the IoT industry. 49:17 What's slowing down IoT? 52:20 How much IoT and cloud are married together? 54:32 Timothy's success mantra. 56:16 Parting thoughts.

Timothy's Recommended Read:
Life 3.0: Being Human in the Age of Artificial Intelligence by Max Tegmark http://amzn.to/2Cidyhy
Zone to Win: Organizing to Compete in an Age of Disruption by Geoffrey A. Moore http://amzn.to/2Hd5zpv

Podcast Link: https://futureofdata.org/timothychou-on-world-of-iot-its-future-part-1-futureofdata-podcast/

Timothy's BIO: Timothy Chou's career has spanned academia, successful (and not so successful) startups, and large corporations. He was one of only a few people to hold the President's title at Oracle. As President of Oracle On Demand, he grew the cloud business from its very beginning. Today that business is over $2B. He wrote about the move of applications to the cloud in 2004 in his first book, “The End of Software”. Today he serves on the board of Blackbaud, a nearly $700M vertical application cloud service company.

After earning his Ph.D. in EE at the University of Illinois, he went to work for Tandem Computers, one of the original Silicon Valley startups. Had he understood stock options, he would have joined earlier. He’s invested in and been a contributor to a number of other startups, some you’ve heard of like Webex, and others you’ve never heard of but were sold to companies like Cisco and Oracle. Today he is focused on several new ventures in cloud computing, machine learning, and the Internet of Things.

In this podcast, Chuck Rehberg from Trigent Software sits down with Vishal to discuss how technology leaders should think about connecting technology to real business pains. Chuck also shares some of the best practices technologists can adopt to build successful, integrity-filled, bias-free teams and solutions.

Timeline: 0:29 Chuck's journey. 8:45 Chuck's role in Trigent. 14:18 Trigent's niche clients. 16:26 Semantics and Trigent model. 18:42 What is semantics? 22:00 The state of semantics today. 28:00 Best practices for businesses to use technology optimally. 33:13 Tips for businesses to remain stable in the time of disruptive technology. 36:18 App technology vis a vis enterprise stack. 39:43 Perspectives on the bias. 43:40 Measuring KPIs for success. 48:16 Ingredients of a good technology team. 50:56 Creating a technology team from scratch. 54:42 Things to be done in semantics. 58:52 Chuck's success mantra. 1:02:24 Chuck's favorite reads. 1:07:05 Closing remarks.

Chuck's Recommended Read:
World Hypotheses: A Study in Evidence - by Stephen C. Pepper http://amzn.to/2GXGYVV
Women, Fire and Dangerous Things: What Categories Reveal About the Mind - by George Lakoff http://amzn.to/2GWIQOA
How to Solve It: A New Aspect of Mathematical Method (Princeton Science Library) - by G. Polya, foreword by John H. Conway http://amzn.to/2BLECtw
The Better Angels of Our Nature: Why Violence Has Declined - by Steven Pinker http://amzn.to/2EaLQZI
Finite and Infinite Games - by James Carse http://amzn.to/2BLfIdx
Being Mortal: Medicine and What Matters in the End - by Atul Gawande http://amzn.to/2BhgBtp

Podcast Link: https://futureofdata.org/chuckrehberg-trigentsoftware-translating-technology-solve-business-problems-futureofdata/

Here is Chuck's Bio: As CTO at Trigent Software and Chief Scientist at Semantic Insights, Chuck Rehberg has developed patented high-performance rules engine technology and advanced natural language understanding technologies that empower a new generation of semantic research solutions.

Chuck has more than thirty years in the high-tech industry, developing leading-edge solutions in the areas of Artificial Intelligence, Semantic Technologies, analytics, and product configuration software.

In this episode, Wayne Eckerson and Lenin Gali discuss the past and future of the cloud and big data.

Gali is a data analytics practitioner who has always been on the leading edge of where business and technology intersect. He was one of the first to move data analytics to the cloud when he was BI director at ShareThis, a social-media-based services provider. While at Ubisoft, he was instrumental in defining an enterprise analytics strategy and developing a data platform, built on Hadoop and Teradata, that brought game and business data together so that thousands of data users could build better games and services. He is now spearheading the creation of a Hadoop-based data analytics platform at Quotient, a digital marketing technology firm in the retail industry.

In this podcast, Venu Vasudevan (@ProcterGamble) talks about best practices for creating a research-led, data-driven data science team. He walks through his journey of building a robust and sustained data science team, speaks about bias in data science, and shares practices that leaders and data science practitioners could adopt to create an impactful team. This podcast is great for future data science leaders and for practitioners helping their organizations put together a data science practice.

Timeline: 0:29 Venu's journey. 11:18 Venu's current role in P&G. 13:11 Standardization of technology and IoT. 17:18 The state of AI. 19:46 Running an AI and data practice for a company. 22:30 Building a data science practice in a startup in comparison to a transnational company. 24:05 Dealing with bias. 27:32 Culture: a block or an opportunity. 30:05 Dealing with data we've never dealt with before. 32:32 Sustainable vs. disruption. 36:17 Starting a data science team. 38:34 Data science as an art of doing and science of doing business. 41:37 Tips to improve storytelling for a data practitioner. 43:30 Challenges in Venu's journey. 44:55 Tenets of a good data scientist. 47:27 Diversity in hiring. 50:50 KPIs to look out for if you are running an AI practice. 51:37 Venu's favorite read.

Venu's Recommended Read: Isaac Newton: The Last Sorcerer - Michael White http://amzn.to/2FzGV0N Against the Gods: The Remarkable Story of Risk - Peter L. Bernstein http://amzn.to/2DRPveU

Podcast Link: https://futureofdata.org/venu-vasudevan-venuv62-proctergamble-on-creating-a-rockstar-data-science-team-futureofdata/

Venu's BIO: Venu Vasudevan is Research Director, Data Science & AI at Procter & Gamble, where he directs the Data Science & AI organization within Procter & Gamble research. He is a technology leader with a track record of successful consumer and enterprise innovation at the intersection of AI, Machine Learning, Big Data, and IoT. Previously he was VP of Data Science at an IoT startup and a founding member of the Motorola team that created the Zigbee IoT standard. He worked with Dag Kittlaus (co-creator of Apple Siri) to create an industry-first zero-click interface for mobile, and created an industry-first Google Glass experience for TV, an ARRIS video analytics and big data platform recently acquired by Comcast, and a social analytics platform leveraging Twitter that was featured in Wired Magazine and on the BBC. Venu holds a Ph.D. (Databases & AI) from Ohio State University and was a member of Motorola’s Science Advisory Board (top 2% of Motorola technologists). He is an Adjunct Professor at Rice University’s Electrical and Computer Engineering department and was a mentor at Chicago’s 1871 startup incubator.

If you work with a media agency (or are one), the first question to ask is: how many data scientists do you have? Do you prefer Amazon Web Services, Microsoft Azure, or the Google Cloud Platform? Come see examples, from one of Canada's largest retailers, of advertising spend wasted through poor targeting, access issues, and a lack of big data understanding. We will also dive into examples of broken Analytics implementations that cause even more issues. If you are not in-sourcing the core components of your Media and Analytics, you are almost certainly at risk of, or already suffering from, many of these problems. In this session, Martin and Charles Farina will show you what you need to find the right partner and, more importantly, what you also have to provide.

In this podcast, Henry Eckerson and Stephen Smith discuss the movement to operationalize data science.

Smith is a well-respected expert in the fields of data science, predictive analytics and their application in the education, pharmaceutical, healthcare, telecom and finance industries. He co-founded and served as CEO of G7 Research LLC and the Optas Corporation which provided the leading CRM / Marketing Automation solution in the pharmaceutical and healthcare industries.

Smith has published journal articles in the fields of data mining, machine learning, parallel supercomputing, text understanding, and simulated evolution. He has published two books through McGraw-Hill on big data and analytics and holds several patents in the fields of educational technology, big data analytics, and machine learning. He holds a BS in Electrical Engineering from MIT and an MS in Applied Sciences from Harvard University. He is currently the research director of data science at Eckerson Group.

Summary

The majority of the conversation around machine learning and big data pertains to well-structured and cleaned data sets. Unfortunately, that is just a small percentage of the information that is available, so the rest of the sources of knowledge in a company are housed in so-called “Dark Data” sets. In this episode Alex Ratner explains how the work that he and his fellow researchers are doing on Snorkel can be used to extract value by leveraging labeling functions written by domain experts to generate training sets for machine learning models. He also explains how this approach can be used to democratize machine learning by making it feasible for organizations with smaller data sets than those required by most tooling.
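The idea of labeling functions described above can be illustrated with a small, library-free sketch. It is not Snorkel's API: the function names, the spam/ham task, and the majority-vote combiner are all assumptions made for illustration, and Snorkel itself combines the votes with a learned generative model rather than a simple majority vote.

```python
from collections import Counter

# Hypothetical label values for an illustrative spam/ham task.
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text):
    """Vote SPAM if the message contains a URL, otherwise abstain."""
    return SPAM if "http://" in text or "https://" in text else ABSTAIN


def lf_mentions_prize(text):
    """Vote SPAM if the message mentions a prize, otherwise abstain."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_greeting(text):
    """Vote HAM for short messages that open with a greeting."""
    words = text.lower().split()
    if words and len(words) <= 5 and words[0] in {"hi", "hello", "thanks"}:
        return HAM
    return ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_greeting]


def weak_label(text):
    """Combine labeling-function votes by majority; abstain on ties or no votes."""
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    (top, top_count), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == top_count:
        return ABSTAIN  # conflicting evidence, so no label is emitted
    return top


def build_training_set(texts):
    """Return (text, label) pairs for every text that received a label."""
    labeled = ((text, weak_label(text)) for text in texts)
    return [(text, label) for text, label in labeled if label != ABSTAIN]
```

Calling `build_training_set` on a list of unlabeled messages yields a noisy training set that a downstream classifier could be fit on; messages on which every function abstains, or on which the votes tie, are simply left out.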

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at dataengineeringpodcast.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your data pipelines or trying out the tools you hear about on the show. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. You can help support the show by checking out the Patreon page which is linked from the site. To help other people find the show you can leave a review on iTunes, or Google Play Music, and tell your friends and co-workers. Your host is Tobias Macey and today I’m interviewing Alex Ratner about Snorkel and Dark Data.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by sharing your definition of dark data and how Snorkel helps to extract value from it?
What are some of the most challenging aspects of building labelling functions and what tools or techniques are available to verify their validity and effectiveness in producing accurate outcomes?
Can you provide some examples of how Snorkel can be used to build useful models in production contexts for companies or problem domains where data collection is difficult to do at large scale?
For someone who wants to use Snorkel, what are the steps involved in processing the source data and what tooling or systems are necessary to analyse the outputs for generating usable insights?
How is Snorkel architected and how has the design evolved over its lifetime?
What are some situations where Snorkel would be poorly suited for use?
What are some of the most interesting applications of Snorkel that you are aware of?
What are some of the other projects that you and your group are working on that interact with Snorkel?
What are some of the features or improvements that you have planned for future releases of Snorkel?

Contact Info

Website
ajratner on GitHub
@ajratner on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Stanford DAWN, HazyResearch, Snorkel, Christopher Ré, Dark Data, DARPA Memex, Training Data, FDA, ImageNet, National Library of Medicine, Empirical Studies of Conflict, Data Augmentation, PyTorch, TensorFlow, Generative Model, Discriminative Model, Weak Supervision

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Complete Guide to Open Source Big Data Stack

See a Mesos-based big data stack created and the components used. You will use currently available Apache full and incubating systems. The components are introduced by example and you learn how they work together. In the Complete Guide to Open Source Big Data Stack, the author begins by creating a private cloud and then installs and examines Apache Brooklyn. After that, he uses each chapter to introduce one piece of the big data stack, sharing how to source the software and how to install it. You learn by simple example, step by step and chapter by chapter, as a real big data stack is created. The book concentrates on Apache-based systems and shares detailed examples of cloud storage, release management, resource management, processing, queuing, frameworks, data visualization, and more.

What You’ll Learn: Install a private cloud onto the local cluster using Apache CloudStack. Source, install, and configure Apache Brooklyn, Mesos, Kafka, and Zeppelin. See how Brooklyn can be used to install Mule ESB on a cluster and Cassandra in the cloud. Install and use DCOS for big data processing. Use Apache Spark for big data stack data processing.

Who This Book Is For: Developers, architects, IT project managers, database administrators, and others charged with developing or supporting a big data system. It is also for anyone interested in Hadoop or big data, and those experiencing problems with data size.

In this podcast, Wayne Eckerson and Joe Caserta discuss what constitutes a modern data platform. Caserta is President of a New York City-based consulting firm he founded in 2001 and a longtime data guy. In 2004, Joe teamed up with data warehousing legend Ralph Kimball to write the book The Data Warehouse ETL Toolkit. Today he’s one of the leading authorities on big data implementations. This makes Joe one of the few individuals with in-the-trenches experience on both sides of the data divide: traditional data warehousing on relational databases, and big data implementations on Hadoop and the cloud. His perspectives are always insightful.

Practical Big Data Analytics

Practical Big Data Analytics is your ultimate guide to harnessing Big Data technologies for enterprise analytics and machine learning. By leveraging tools like Hadoop, Spark, NoSQL databases, and frameworks such as R, this book equips you with the skills to implement robust data solutions that drive impactful business insights. Gain practical expertise in handling data at scale and uncover the value behind the numbers.

What this Book will help me do: Master the fundamental concepts of Big Data storage, processing, and analytics. Gain practical skills in using tools like Hadoop, Spark, and NoSQL databases for large-scale data handling. Develop and deploy machine learning models and dashboards with R and R Shiny. Learn strategies for creating cost-efficient and scalable enterprise data analytics solutions. Understand and implement effective approaches to combining Big Data technologies for actionable insights.

Author(s): Dasgupta is an expert in Big Data analytics, statistical methodologies, and enterprise data solutions. With years of experience consulting on enterprise data platforms and working with leading industry technologies, Dasgupta brings a wealth of practical knowledge to help readers navigate and succeed in the field of Big Data. Through this book, Dasgupta shares an accessible and systematic way to learn and apply key Big Data concepts.

Who is it for? This book is ideal for professionals eager to delve into Big Data analytics, regardless of their current level of expertise. It accommodates both aspiring analysts and seasoned IT professionals looking to enhance their knowledge in data-driven decision making. Individuals with a technical inclination and a drive to build Big Data architectures will find this book particularly beneficial. No prior knowledge of Big Data is required, although familiarity with programming concepts will enhance the learning experience.

podcast_episode
by Ryan Cabeen (Laboratory of Neuroimaging (LONI), USC), Farshid Sepherband (Laboratory of Neuroimaging (LONI), USC), Kyle Polich, Dr. Meng Law (Laboratory of Neuroimaging (LONI), USC), Dr. Arthur Toga (Laboratory of Neuroimaging (LONI), USC)

Last year, Kyle had a chance to visit the Laboratory of Neuroimaging, or LONI, at USC, and learn about how some researchers are using data science to study the function of the brain. We're going to be covering some of their work in two episodes on Data Skeptic. In this first part of our two-part episode, we'll talk about data collection, brain imaging, and the LONI pipeline. We'll then continue our coverage in the second episode, where we'll talk more about how researchers can gain insights about the human brain and their current challenges. Next week, we'll also talk more about what all that has to do with data science, machine learning, and artificial intelligence. Joining us in this week's episode are members of the LONI lab, which include principal investigators Dr. Arthur Toga and Dr. Meng Law, and researchers Farshid Sepherband, PhD and Ryan Cabeen, PhD.

Summary

PostgreSQL has become one of the most popular and widely used databases, and for good reason. The level of extensibility that it supports has allowed it to be used in virtually every environment. At Citus Data they have built an extension to support running it in a distributed fashion across large volumes of data with parallelized queries for improved performance. In this episode Ozgun Erdogan, the CTO of Citus, and Craig Kerstiens, Citus Product Manager, discuss how the company got started, the work that they are doing to scale out PostgreSQL, and how you can start using it in your environment.
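To make "running in a distributed fashion with parallelized queries" concrete, here is a minimal sketch of the general scatter/gather idea: rows are assigned to shards by hashing a distribution column, each shard computes a partial aggregate in parallel, and the partial results are combined. This is an illustration under stated assumptions, not Citus's implementation; the function names, the shard count, and the use of CRC32 and threads are all hypothetical choices.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # illustrative value, not a Citus default


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a distribution-column value to a shard using a stable hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_shards


def distribute(rows, key_column, num_shards=NUM_SHARDS):
    """Split rows into shards so that equal keys always land on the same shard."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(row[key_column], num_shards)].append(row)
    return shards


def partial_sum(shard, value_column):
    """The per-shard part of a SUM() aggregate."""
    return sum(row[value_column] for row in shard)


def parallel_sum(shards, value_column):
    """Scatter the aggregate to every shard in parallel, then combine the results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: partial_sum(s, value_column), shards))
    return sum(partials)
```

Because a given key always hashes to the same shard, per-key operations can be answered by a single shard, while whole-table aggregates such as the sum above fan out to all shards and are merged at the end.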

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at dataengineeringpodcast.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your data pipelines or trying out the tools you hear about on the show. Continuous delivery lets you get new features in front of your users as fast as possible without introducing bugs or breaking production and GoCD is the open source platform made by the people at Thoughtworks who wrote the book about it. Go to dataengineeringpodcast.com/gocd to download and launch it today. Enterprise add-ons and professional support are available for added peace of mind. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. You can help support the show by checking out the Patreon page which is linked from the site. To help other people find the show you can leave a review on iTunes, or Google Play Music, and tell your friends and co-workers. Your host is Tobias Macey and today I’m interviewing Ozgun Erdogan and Craig Kerstiens about Citus, worry-free PostgreSQL.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what Citus is and how the project got started?
Why did you start with Postgres vs. building something from the ground up?
What was the reasoning behind converting Citus from a fork of Postgres to being an extension and releasing an open source version?
How well does Citus work with other Postgres extensions, such as PostGIS, PipelineDB, or Timescale?
How does Citus compare to options such as Postgres-XL or the Postgres compatible Aurora service from Amazon?
How does Citus operate under the covers to enable clustering and replication across multiple hosts?
What are the failure modes of Citus and how does it handle loss of nodes in the cluster?
For someone who is interested in migrating to Citus, what is involved in getting it deployed and moving the data out of an existing system?
How do the different options for leveraging Citus compare to each other and how do you determine which features to release or withhold in the open source version?
Are there any use cases that Citus enables which would be impractical to attempt in native Postgres?
What have been some of the most challenging aspects of building the Citus extension?
What are the situations where you would advise against using Citus?
What are some of the most interesting or impressive uses of Citus that you have seen?
What are some of the features that you have planned for future releases of Citus?

Contact Info

Citus Data

citusdata.com @citusdata on Twitter citusdata on GitHub

Craig

Email Website @craigkerstiens on Twitter

Ozgun

Email ozgune on GitHub

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Citus Data
PostgreSQL
NoSQL
Timescale SQL blog post
PostGIS
PostgreSQL Graph Database
JSONB Data Type
PipelineDB
Timescale
Postgres-XL
Aurora Postgres
Amazon RDS
Streaming Replication
CitusMX
CTE (Common Table Expression)
HipMunk Citus Sharding Blog Post
Wal-e
Wal-g
Heap Analytics
HyperLogLog
C-Store


Learning Google BigQuery

If you're ready to tap the potential of data analytics in the cloud, 'Learning Google BigQuery' will take you from understanding foundational concepts to mastering advanced techniques of this powerful platform. Through hands-on examples, you'll learn how to query and analyze massive datasets efficiently, develop custom applications, and integrate your results seamlessly with other tools.

What this Book will help me do
Understand the fundamentals of Google Cloud Platform and how BigQuery operates within it.
Migrate enterprise-scale data seamlessly into BigQuery for further analytics.
Master SQL techniques for querying large-scale datasets in BigQuery.
Enable real-time data analytics and visualization with tools like Tableau and Python.
Learn to create dynamic datasets, manage partitioned tables, and use BigQuery APIs effectively.

Author(s)
Berlyant, Haridass, and Brown are specialists with years of experience in data science, big data platforms, and cloud technologies. They bring their expertise in data analytics and teaching to make advanced concepts accessible. Their hands-on approach and real-world examples ensure readers can directly apply the skills they acquire to practical scenarios.

Who is it for?
This book is tailored for developers, analysts, and data scientists eager to leverage cloud-based tools for handling and analyzing large-scale datasets. If you seek to gain hands-on proficiency in working with BigQuery or want to enhance your organization's data capabilities, this book is a fit. No prior BigQuery knowledge is needed, just a willingness to learn.

In this podcast, Paul Ballew (@Ford) talks about best practices for running a data science organization that spans multiple continents. He shares the importance of being Smart, Nice, and Inquisitive in creating tomorrow's workforce today. He sheds some light on the importance of appreciating culture when defining forward-looking policies. He also builds a case for a non-native group and discusses ways to implement data science as a central organization (with no hub-spoke model). This podcast is great for future data science leaders leading organizations with a broad consumer base and multiple geo-political silos.

Timeline: 0:29 Paul's journey. 5:10 Paul's current role. 8:10 Insurance and data analytics. 13:00 Who will own the insurance in the time of automation? 18:22 Recruiting models in technologies. 21:54 Embracing technological change. 25:03 Will we have more analytics in Ford cars? 28:25 How does Ford stay competitive from a technology perspective? 30:30 Challenges for the analytics officer at Ford. 32:36 Ingredients of a good hire. 34:12 How is the data science team structured at Ford? 36:15 Dealing with shadow groups. 39:00 Successful KPIs. 40:33 Who owns data? 42:27 Who should own the security of data assets? 44:05 Examples of successful data science groups. 46:30 Practices for remaining bias-free. 48:55 Getting started running a global data science team. 52:45 How does Paul keep himself updated? 54:18 Paul's favorite read. 55:45 Closing remarks.

Paul's Recommended Read: The Outsiders Paperback – S. E. Hinton http://amzn.to/2Ai84Gl

Podcast Link: https://futureofdata.org/paul-ballewford-running-global-data-science-group-futureofdata-podcast/

Paul's BIO: Paul Ballew is vice president and Global Chief Data and Analytics officer, Ford Motor Company, effective June 1, 2017. At the same time, he also was elected a Ford Motor Company officer. In this role, he leads Ford’s global data and analytics teams for the enterprise. Previously, Ballew was Global Chief Data and Analytics Officer, a position to which he was named in December 2014. In this role, he has been responsible for establishing and growing the company’s industry-leading data and analytics operations that are driving significant business value throughout the enterprise. Prior to joining Ford, he was Chief Data, Insight & Analytics Officer at Dun & Bradstreet. In this capacity, he was responsible for the company’s global data and analytic activities along with the company’s strategic consulting practice. Previously, Ballew served as Nationwide’s senior vice president for Customer Insight and Analytics. He directed customer analytics, market research, and information and data management functions, and supported the company’s marketing strategy. His responsibilities included the development of Nationwide’s customer analytics, data operations, and strategy. Ballew joined Nationwide in November 2007 and established the company’s Customer Insights and Analytics capabilities.

Ballew sits on the boards of Neustar, Inc. and Hyatt Hotels Corporation. He was born in 1964 and has a bachelor’s and master’s degree in Economics from the University of Detroit.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey in creating the data-driven future.

Wanna Join? If you or anyone you know wants to join in, register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy