talk-data.com

Topic: Data Science
Tags: machine_learning, statistics, analytics (1516 tagged activities)
Activity Trend: 68 peak/qtr, 2020-Q1 to 2026-Q1
Activities: 1516 activities · Newest first

In this podcast @DanDeGrazia from @IBM spoke with @Vishaltx from @AnalyticsWeek to discuss where the chief data scientist role meets open source. He sheds light on some of the big opportunities in open source and how businesses can work together to advance data science. Dan also shares the importance of smooth communication for success as a data scientist.

TIMELINE: 0:29 Dan's journey. 9:40 Dan's role in IBM. 11:26 Tips on staying consistent while creating a database. 16:23 Chief data scientist and open-source put together. 20:28 The state of open source when it comes to data. 23:50 Evaluating the market to understand business requirements. 29:19 Future of data and open-source market. 33:23 Exciting opportunities in data. 37:06 Data scientist's role in integrating business and data. 49:41 Ingredients of a successful data scientist. 53:04 Data science and trust issues. 59:35 Human element behind data. 1:01:20 Dan's success mantra. 1:06:52 Key takeaways.

Dan's Recommended Read: The Five Temptations of a CEO, Anniversary Edition: A Leadership Fable by Patrick Lencioni https://amzn.to/2Jcm5do What Every BODY is Saying: An Ex-FBI Agent's Guide to Speed-Reading People by Joe Navarro, Marvin Karlins https://amzn.to/2J1RXxO

Podcast Link: https://futureofdata.org/where-chief-data-scientist-open-source-meets-dandegrazia-futureofdata-podcast/

Dan's BIO: Dan has almost 30 years of experience working with large data sets. Starting with the unusual work of analyzing potential jury pools in the 1980s, Dan also did some of the first PC-based voter registration analytics in the Chicago area, including putting the first complete list of registered voters on a PC (as hard as that is to imagine today, a 50-megabyte hard drive on a DOS system was staggering). Interested in almost anything new and technical, he worked at the Chicago Board of Trade, teaching himself BASIC to write algorithms while working as an arbitrageur in financial futures. After the military, Dan moved to San Francisco, where he worked with several small companies and startups designing and implementing some of the first PC-based fax systems (who cares now!), enterprise accounting software, and early middleware connections using the early 3GL/4GL languages. Always pursuing the technical edge cases, Dan worked for InfoBright, a column-store database startup, in the US and EMEA; at Lingotek, an In-Q-Tel-funded company working on large-data-set translation; and at big data analytics companies like Datameer, before taking his current position as Chief Data Scientist for Open Source in the IBM Channels organization. Dan's current just-for-fun project is an app that will record and analyze bird songs and provide the user with information on the bird and the specifics of the current song.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Want to sponsor? Email us @ [email protected]

Keywords:

FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Ethics and Data Science

As the impact of data science on society continues to grow, there is an increasing need to discuss how data is used appropriately and how to address misuse. Yet ethical principles for working with data have been available for decades. The real issue today is how to put those principles into action. With this report, authors Mike Loukides, Hilary Mason, and DJ Patil examine practical ways for making ethical data standards part of your work every day. To help you consider all of the possible ramifications of your work on data projects, this report includes: a sample checklist that you can adapt for your own procedures; five framing guidelines (the Five C's) for building data products: consent, clarity, consistency, control, and consequences; and suggestions for building ethics into your data-driven culture. Now is the time to invest in a deliberate practice of data ethics, for better products, better teams, and better outcomes. Get a copy of this report and learn what it takes to do good data science today.

In this podcast @AndyPalmer from @Tamr sat with @Vishaltx from @AnalyticsWeek to talk about the emergence of, need for, and market around DataOps, a specialized capability that merges data engineering and the DevOps ecosystem in response to increasingly convoluted data silos and complicated processes. Andy shares his perspective on what some businesses and their leaders are doing wrong and how they need to rethink their data silos to future-proof themselves. This is a good podcast for any data leader thinking about cracking the code on getting high-quality insights from data.

Timelines: 0:28 Andy's journey. 4:56 What's Tamr? 6:38 What's Andy's role in Tamr. 8:16 What's data ops? 13:07 Right time for business to incorporate data ops. 15:56 Data exhaust vs. data ops. 21:05 Tips for executives in dealing with data. 23:15 Suggestions for businesses working with data. 25:48 Creating buy-in for experimenting with new technologies. 28:47 Using data ops for the acquisition of new companies. 31:58 Data ops vs. dev ops. 36:40 Big opportunities in data science. 39:35 AI and data ops. 44:28 Parameters for a successful start-up. 47:49 What still surprises Andy? 50:19 Andy's success mantra. 52:48 Andy's favorite reads. 54:25 Final remarks.

Andy's Recommended Read: Enlightenment Now: The Case for Reason, Science, Humanism, and Progress by Steven Pinker https://amzn.to/2Lc6WqK The Three-Body Problem by Cixin Liu and Ken Liu https://amzn.to/2rQyPvp

Andy's BIO: Andy Palmer is a serial entrepreneur who specializes in accelerating the growth of mission-driven startups. Andy has helped found and/or fund more than 50 innovative companies in technology, health care, and the life sciences. Andy’s unique blend of strategic perspective and disciplined tactical execution is suited to environments where uncertainty is the rule rather than the exception. Andy has a specific passion for projects at the intersection of computer science and the life sciences.

Most recently, Andy co-founded Tamr, a next-generation data curation company, and Koa Labs, a start-up club in the heart of Harvard Square, Cambridge, MA.

Specialties: Software, Sales & Marketing, Web Services, Service-Oriented Architecture, Drug Discovery, Databases, Data Warehousing, Analytics, Startups, Entrepreneurship, Informatics, Enterprise Software, OLTP, Science, Internet, eCommerce, Venture Capital, Bootstrapping, Founding Teams, Early-Stage Venture, Corporate Development

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Podcast link: https://futureofdata.org/emergence-of-dataops-age-andypalmer-futureofdata-podcast/

Wanna join? If you or anyone you know wants to join in, register your interest by emailing [email protected]

Want to sponsor? Email us @ [email protected]

Keywords:

FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

We revisit the 2018 Microsoft Build in this episode, focusing on the latest ideas in DevOps. Kyle interviews Cloud Developer Advocates Damian Brady, Paige Bailey, and Donovan Brown to talk about DevOps, data science, and databases. For a data scientist, what does it even mean to "build"? Packaging and deployment are things that a data scientist doesn't normally have to consider in their day-to-day work. The process of making an AI app is usually divided into two streams of work: data scientists building machine learning models and app developers building the application for end users to consume. DevOps brings together all the parties involved in getting the application deployed and maintained, thinking about all the phases that precede and follow their part of the end solution. So what does DevOps mean for data science? Why should you adopt DevOps best practices? In the first half, Paige and Damian share their views on what DevOps for data science would look like and how it can be introduced to provide continuous integration, delivery, and deployment of data science models. In the second half, Donovan and Damian talk about the DevOps life cycle of putting a database under version control and carrying out deployments through a release pipeline.
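To make the packaging-and-deployment idea concrete, here is a minimal, hypothetical "build" step for a model artifact; the dataset, accuracy threshold, and file names are invented for illustration and are not from the episode.

```python
# Hypothetical "build" step for a model artifact; dataset, threshold, and paths
# are invented for illustration. Uses standard scikit-learn and joblib calls.
from pathlib import Path

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# CI gate: fail the build if accuracy regresses below an agreed threshold.
accuracy = model.score(X_test, y_test)
assert accuracy >= 0.90, f"accuracy {accuracy:.2f} is below the release threshold"

# Package a versioned artifact that a release pipeline could pick up and deploy.
dist = Path("dist")
dist.mkdir(exist_ok=True)
joblib.dump(model, dist / "model-1.0.0.joblib")
print(f"packaged model-1.0.0.joblib (accuracy={accuracy:.2f})")
```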

In this episode, Wayne Eckerson and Jen Underwood explore a new era of analytics. Data volumes and complexity have exceeded the limits of current manual drag-and-drop analytics solutions. Data moves at the speed of light while speed-to-insight lags farther and farther behind. It is time to explore intelligent, next generation, machine-powered analytics to retain your competitive edge. It is time to combine the best of the human mind and machine.

Underwood is an analytics expert and founder of Impact Analytic. She is a former product manager at Microsoft who spearheaded the design and development of the reinvigorated version of Power BI, which has since become a market leading BI tool. Underwood is an IBM Analytics Insider, SAS contributor, former Tableau Zen Master, Top 10 Women Influencer and active analytics community member. She is keenly interested in the intersection of data visualization and data science and writes and speaks persuasively about these topics.

In this podcast, Aaron Black from the Inova Translational Medicine Institute talks about his journey creating and leading a data science practice in healthcare. He shares some of the best practices, opportunities, and challenges concerning team dynamics, process orientation, and leadership relationship building, along with tactical steps to help build a better data-driven team to execute data-driven strategies. This podcast is great for folks looking to explore data opportunities in the health and medicine domain.

Timeline: 0:28 Aaron's journey. 8:16 Defining translational medicine. 11:47 Defining precision medicine. 12:47 Data sharing between pharma companies. 15:03 Defining biobanking. 18:50 Data and healthcare industry. 22:20 Best practices in creating a healthcare database. 25:46 Tackling data regulations. 30:17 Best practices in creating data literacy in employees. 33:27 The culture of data scientists in the healthcare space. 36:09 Challenges that a data science leader faces in the healthcare space. 39:25 Opportunities in health data space. 42:19 Ingredients of a good data science leader in the healthcare space. 44:38 Tips for data science leaders in the healthcare space. 47:00 Putting together a data team in the healthcare space. 50:22 Aaron's success tips. 52:49 Aaron's reading list. 55:25 Closing remarks.

Podcast link: https://futureofdata.org/understanding-futureofdata-in-health-medicine-thedataguru-inovahealth-futureofdata/

Aaron's Book Recommendations: Smartcuts: The Breakthrough Power of Lateral Thinking by Shane Snow amzn.to/2rH9xzJ When: The Scientific Secrets of Perfect Timing by Daniel H. Pink amzn.to/2rElebc

Aaron's BIO: Aaron Black, Chief Data Officer at the Inova Translational Medicine Institute, is a healthcare information technology executive and data evangelist. A results-driven technical leader with a 20+ year record of successful project and program implementations; visionary, collaborative, and able to devise creative solutions and shape culture to meet complex business challenges.

Key thought leader, international speaker, team builder, and data architect in building advanced and one-of-a-kind technical and data infrastructure to support precision medicine initiatives in large and cutting edge health care institutions. A featured speaker and panelist at National Conferences and Councils including TEDx Tysons, NIH, Amazon ReInvent, Precision Medicine World Conference, Labroots, HIMSS, and an invited speaker at the National Research Council’s Standing Committee on Biological and Physical Sciences in Space (CBPSS).

Experience in start-up and new team development. Proven change-agent in diverse organizations and politically charged environments. A catalyst to create vision, motivation, and results across an entire enterprise. Creative thinker; organized, resolute, and able to direct multiple competing priorities with great precision while meeting strict deadlines and budget requirements. Strong healthcare and research industry knowledge, particularly in Life Sciences, with expertise in developing, implementing, and supporting large data enterprise architectures. Excellent interpersonal skills, work effectively with individuals of diverse backgrounds, and inspire teams to work to their fullest potential.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Wanna join? If you or anyone you know wants to join in, register your interest @ play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Tags: futureofdata, leadership, data in hr, hr data, hris, big data

Mastering Numerical Computing with NumPy

"Mastering Numerical Computing with NumPy" is a comprehensive guide to becoming proficient in numerical computing using Python's NumPy library. This book will teach you how to perform advanced numerical operations, explore data statistically, and build predictive models effectively. By mastering the provided concepts and exercises, you'll be empowered in your scientific computing projects. What this Book will help me do Perform and optimize vector and matrix operations effectively using NumPy. Analyze data using exploratory data analysis techniques and predictive modeling. Implement unsupervised learning algorithms such as clustering with relevant datasets. Understand advanced benchmarks and select optimal configurations for performance. Write efficient and scalable programs utilizing advanced NumPy features. Author(s) The authors of "Mastering Numerical Computing with NumPy" include domain experts and educators with years of experience in Python programming, numerical computing, and data science. They bring a practical and detailed approach to teaching advanced topics and guide you through every step of mastering NumPy. Who is it for? This book is ideal for Python programmers, data analysts, and data science enthusiasts who aim to deepen their understanding of numerical computing. If you have basic mathematics skills and want to utilize NumPy to solve complex data problems, this book is an excellent resource. Whether you're a beginner or an intermediate user, you will find this content approachable and enriching. Advanced users will benefit from the highly specialized content and real-world examples.

Domain-Specific Languages in R: Advanced Statistical Programming

Gain an accelerated introduction to domain-specific languages in R, including coverage of regular expressions. This compact, in-depth book shows how DSLs are programming languages specialized for a particular purpose, as opposed to general-purpose programming languages. Along the way, you'll learn to specify tasks you want to do in a precise way and achieve programming goals within a domain-specific context. Domain-Specific Languages in R includes examples of DSLs for working with large data sets and matrix multiplication, pattern-matching DSLs for applications in computer vision, and DSLs for continuous-time Markov chains and their applications in data science. After reading and using this book, you'll understand how to write DSLs in R and have skills you can extrapolate to other programming languages.

What You'll Learn: program with domain-specific languages using R; discover the components of DSLs; carry out large matrix expressions and multiplications; implement metaprogramming with DSLs; parse and manipulate expressions.

Who This Book Is For: Those with prior programming experience. R knowledge is helpful but not required.
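The book's examples are written in R; purely to illustrate the general idea of an embedded DSL, here is a small Python sketch that builds an expression tree through operator overloading and then interprets it:

```python
# Illustration of an embedded DSL (not from the book, and not R): user code writes
# expressions in the host language; the classes capture them as a tree to interpret.
class Expr:
    def __add__(self, other): return Add(self, wrap(other))
    def __mul__(self, other): return Mul(self, wrap(other))

class Const(Expr):
    def __init__(self, value): self.value = value
    def eval(self, env): return self.value

class Var(Expr):
    def __init__(self, name): self.name = name
    def eval(self, env): return env[self.name]

class Add(Expr):
    def __init__(self, a, b): self.a, self.b = a, b
    def eval(self, env): return self.a.eval(env) + self.b.eval(env)

class Mul(Expr):
    def __init__(self, a, b): self.a, self.b = a, b
    def eval(self, env): return self.a.eval(env) * self.b.eval(env)

def wrap(v):
    return v if isinstance(v, Expr) else Const(v)

expr = Var("x") * 3 + 1          # the "program", written in the embedded DSL
print(expr.eval({"x": 2}))       # -> 7
```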

Python vs. R for Data Science

Python and R are two of the mainstream languages in data science. Fundamentally, Python is a language for programmers, whereas R is a language for statisticians. In a data science context, there is a significant degree of overlap when it comes to the capabilities of each language in the fields of regression analysis and machine learning. Your choice of language will depend highly on the environment in which you are operating. In a production environment, Python integrates with other languages much more seamlessly and is therefore the modus operandi in this context. However, R is much more common in research environments due to its more extensive selection of libraries for statistical analysis.
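As a toy illustration of that overlap, the same ordinary least squares fit that R expresses as lm(y ~ x) can be written in Python; the data below is synthetic and scikit-learn is assumed to be available.

```python
# A toy illustration of the Python/R overlap: ordinary least squares in Python,
# roughly equivalent to R's lm(y ~ x). The data is synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100).reshape(-1, 1)
y = 2.5 * x.ravel() + 1.0 + rng.normal(scale=0.5, size=100)

fit = LinearRegression().fit(x, y)
print(fit.coef_[0], fit.intercept_)   # close to the true slope 2.5 and intercept 1.0
```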

SQL Primer: An Accelerated Introduction to SQL Basics

Build a core level of competency in SQL so you can recognize the parts of queries and write simple SQL statements. SQL knowledge is essential for anyone involved in programming, data science, and data management. This book covers features of SQL that are standardized and common across most database vendors. You will gain a base of knowledge that will prepare you to go deeper into the specifics of any database product you might encounter. Examples in the book are worked in PostgreSQL and SQLite, but the bulk of the examples are platform agnostic and will work on any database platform supporting SQL.

Early in the book you learn about table design, the importance of keys as row identifiers, and essential query operations. You then move into more advanced topics such as grouping and summarizing, creating calculated fields, joining data from multiple tables when it makes business sense to do so, and more. Throughout the book, you are exposed to a set-based approach to the language and are given a good grounding in subtle but important topics such as the effect of null values on query results. With the explosion of data science, SQL has regained its prominence as a top skill for technologists and decision makers worldwide. SQL Primer will guide you from the very basics of SQL through to the mainstream features you need to have a solid, working knowledge of this important, data-oriented language.

What You'll Learn: create and populate your own database tables; read SQL queries and understand what they are doing; execute queries that get correct results; bring together related rows from multiple tables; group and sort data in support of reporting applications; get a grip on nulls, normalization, and other key concepts; employ subqueries, unions, and other advanced features.

Who This Book Is For: Anyone new to SQL who is looking for step-by-step guidance toward understanding and writing SQL queries. The book is aimed at those who encounter SQL statements often in their work, and provides a sound baseline useful across all SQL database systems. Programmers, database managers, data scientists, and business analysts all can benefit from the baseline of SQL knowledge provided in this book.
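As a flavor of the query operations described above (these are not the book's own examples), here is a self-contained snippet that creates two related tables, joins them, and groups the result using SQLite from Python:

```python
# Not the book's examples: a tiny illustration of keys, a join, and a group-by,
# run against an in-memory SQLite database via Python's standard library.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.5), (3, 2, 12.0);
""")

# Join related rows, then group and summarize for a simple report.
for row in conn.execute("""
    SELECT c.name, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total DESC
"""):
    print(row)   # ('Ada', 2, 55.5), ('Grace', 1, 12.0)
```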

In this podcast, Harsh Tiwari, former CDO of CUNA Mutual Group, sheds light on data science leadership in the financial/risk sector. He shares key takeaways for aspiring leaders on managing a large enterprise data science practice, emphasizes the importance of collaboration and a growth mindset built through partnership, and discusses his "So what" approach to problem-solving. This podcast is great for any listener who wants to understand some best practices for being a data-driven leader.

Timeline: 0:28 Harsh's journey. 5:44 Harsh's current role. 10:17 Ideal location for a chief data officer. 14:42 Ideal CDO role and placement. 20:15 Capital One's best practices in managing data. 25:28 How are the credit unions and regional banks placed in terms of data management. 31:20 Introducing data to well-performing banks. 38:05 Getting started as a CDO in a bank. 43:21 Checklist for a business to hire a CDO. 48:35 Keeping oneself sane during the technological disruption. 54:13 Harsh's success mantra. 58:51 Harsh's favorite read. 1:02:14 Parting thoughts.

Harsh's Recommended Read: Good to Great: Why Some Companies Make the Leap and Others Don't by Jim Collins https://amzn.to/2I7DHGM

Podcast Link: https://futureofdata.org/harsh-tiwari-talks-about-fabric-of-data-driven-leader-in-financial-sector-futureofdata-podcast/

Harsh's BIO: Harsh Tiwari is the Senior Vice President and Chief Data Officer for CUNA Mutual Group in Madison, Wisconsin. His primary responsibilities include leading enterprise-wide data initiatives providing strategy and policy guidance for data acquisition, usage, and management. He joined the company in July 2015. Before joining CUNA Mutual Group, Harsh spent many years working in information technology, analytics, and data intelligence. He worked at Capital One Financial Group in Plano, Texas, for 17 years, where he most recently focused on creating an effective data and business intelligence environment to manage risks across the company as the Head of Risk Management Data and Business Intelligence. He has also served as the Divisional CIO for Small Business Credit Card and Consumer Lending, Head of Portfolio and Delivery Management, Head of Auto Finance Data and Business Intelligence, Business Information Officer of Capital One Canada, and Analyst –Senior Manager of Small Business Data & System Analysis.

A native of India, Harsh earned a B.S. in Mechanical Engineering from Mysore University in Mysore, Karnataka, India, and an M.B.A. in Finance/MIS from Drexel University in Philadelphia, Pennsylvania. In his spare time, Harsh enjoys golfing and spending time with his wife, Rashmi, their son, who is 12, and their daughter, who is 8.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Wanna join? If you or anyone you know wants to join in, register your interest @ http://play.analyticsweek.com/guest/

Keywords:

FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

In this podcast, Drew Conway (@DrewConway) from Alluvium talks about his journey to start an IoT startup. He sheds light on the opportunities in the industrial IoT space, shares some insights into the mechanics of running a data science startup in the IoT space, and offers some tactical suggestions for any future leader. This podcast is great for data science startup entrepreneurs and/or senior executives in IoT.

Timeline: 0:28 Drew's journey from counter-terrorism to IoT startup. 9:29 Data science in the industrial space. 12:01 Entrepreneurship in the IoT start-up. 18:36 Selling data analysis to executives in the industrial space. 24:14 Automation in the industrial setting. 29:27 What is an IoT ready company? 32:40 Challenges in integrating data tools in the industrial sector. 37:27 Data science talent pool in industrial and manufacturing companies. 41:52 Challenges in IoT adoption for industrial companies. 46:31 Alluvium's interaction with industries. 50:57 Picking the right use case as an IoT start-up. 52:49 Right customers for an IoT start-up. 59:26 Words of wisdom for anyone building an IoT start-up.

Drew's Recommended Listen: Gödel, Escher, Bach: An Eternal Golden Braid by Douglas R. Hofstadter https://amzn.to/2x0uo7d

Podcast Link: https://futureofdata.org/drewconway-on-fabric-of-an-iot-startup-futureofdata-podcast/

Drew's BIO: Drew Conway, CEO and founder of Alluvium, is a leading expert in the application of computational methods to social and behavioral problems at large scale. Drew has been writing and speaking about the role of data — and the discipline of data science — in industry, government, and academia for several years.

Drew has advised and consulted companies across many industries, ranging from fledgling start-ups to Fortune 100 companies, as well as academic institutions and government agencies at all levels. Drew started his career in counter-terrorism as a computational social scientist in the U.S. intelligence community.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Want to sponsor? Email us @ [email protected]

Keywords:

FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Data Analytics with Spark Using Python, First edition

Spark for Data Professionals introduces and solidifies the concepts behind Spark 2.x, teaching working developers, architects, and data professionals exactly how to build practical Spark solutions. Jeffrey Aven covers all aspects of Spark development, including basic programming, SparkSQL, SparkR, Spark Streaming, messaging, NoSQL, and Hadoop integration. Each chapter presents practical exercises deploying Spark to your local or cloud environment, plus programming exercises for building real applications. Unlike other Spark guides, Spark for Data Professionals explains crucial concepts step by step, assuming no extensive background as an open source developer. It provides a complete foundation for quickly progressing to more advanced data science and machine learning topics.

This guide will help you: understand Spark basics that will make you a better programmer and cluster "citizen"; master Spark programming techniques that maximize your productivity; choose the right approach for each problem; make the most of built-in platform constructs, including broadcast variables, accumulators, effective partitioning, caching, and checkpointing; and leverage powerful tools for managing streaming, structured, semi-structured, and unstructured data.
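A minimal PySpark sketch, assuming pyspark is installed and a local Spark session can start (Java required), touches two of the constructs named above: a broadcast variable and caching of a reused DataFrame. The data and names are invented for illustration.

```python
# Minimal PySpark sketch (assumes pyspark is installed and Java is available).
# Shows a broadcast variable and caching of a DataFrame that is reused.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[*]").appName("sketch").getOrCreate()

# Small lookup table shipped to every executor once, instead of joined repeatedly.
lookup = spark.sparkContext.broadcast({"US": "United States", "DE": "Germany"})

df = spark.createDataFrame(
    [("US", 10), ("DE", 7), ("US", 3)], ["country_code", "orders"]
).cache()  # cached because it is reused below

expand = F.udf(lambda code: lookup.value.get(code, "unknown"))
totals = df.groupBy("country_code").agg(F.sum("orders").alias("total"))
totals.withColumn("country", expand("country_code")).show()

spark.stop()
```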

Hands-On Data Science with Anaconda

Hands-On Data Science with Anaconda is your guide to harnessing the full potential of Anaconda, a powerful platform for data science and machine learning. With this book, you will learn how to set up Anaconda, manage packages, explore advanced data processing techniques, and create robust machine learning models using Python, R, and Julia.

What this Book will help me do: master data preprocessing techniques including cleaning, sorting, and classification using Anaconda; understand and utilize the conda package manager for efficient package management; learn to explore and visualize data using packages and frameworks supported by Anaconda; perform advanced operations like clustering, regression, and building predictive models; and implement distributed computing and manage environments effectively with Anaconda Cloud.

Author(s): Yuxing Yan and his co-author are seasoned data science professionals with extensive experience in utilizing cutting-edge tools like Anaconda to simplify and enhance data science workflows. With a focus on making complex concepts accessible, they offer a practical and systematic approach to mastering tools that power real-world data science projects.

Who is it for? This book is for data science practitioners, analysts, or developers with a basic understanding of Python, R, and linear algebra who want to scale their skills and learn to utilize the Anaconda platform for their projects. If you're seeking to work more effectively within the Anaconda ecosystem or equip yourself with efficient tools for data analysis and machine learning, this book is for you.
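For illustration only (not taken from the book), the kind of cleaning-and-sorting step the blurb describes might look like this in pandas, which ships with the Anaconda distribution; the data is made up.

```python
# Illustration only: a small cleaning-and-sorting step with pandas
# (bundled with Anaconda). The data frame is invented.
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "name": ["alice", "Bob", None, "carol"],
    "score": [88.0, np.nan, 75.0, 92.0],
})

clean = (
    raw.dropna(subset=["name"])                 # drop rows missing a name
       .assign(name=lambda d: d["name"].str.title(),
               score=lambda d: d["score"].fillna(d["score"].median()))
       .sort_values("score", ascending=False)   # sort for inspection
       .reset_index(drop=True)
)
print(clean)
```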

In this podcast, Drew Conway (@DrewConway) from Alluvium talks about his journey in creating a socially connected and responsible data science practice. He shares tactical steps and suggestions to help recruit the right talent, build the right culture, and nurture the relationships that create a sustained and impactful data science practice. The session is great for folks looking to create a self-sustaining, growth-compliant data science practice.

Timeline: 0:28 Drew's journey from counter-terrorism to IoT startup. 9:29 Data science in the industrial space. 12:01 Entrepreneurship in the IoT start-up. 18:36 Selling data analysis to executives in the industrial space. 24:14 Automation in the industrial setting. 29:27 What is an IoT ready company? 32:40 Challenges in integrating data tools in the industrial sector. 37:27 Data science talent pool in industrial and manufacturing companies. 41:52 Challenges in IoT adoption for industrial companies. 46:31 Alluvium's interaction with industries. 50:57 Picking the right use case as an IoT start-up. 52:49 Right customers for an IoT start-up. 59:26 Words of wisdom for anyone building an IoT start-up.

Drew's Recommended Listen: Gödel, Escher, Bach: An Eternal Golden Braid by Douglas R. Hofstadter https://amzn.to/2x0uo7d

Podcast Link: https://futureofdata.org/drewconway-on-creating-socially-responsible-data-science-practice-futureofdata-podcast/

Drew's BIO: Drew Conway, CEO and founder of Alluvium, is a leading expert in applying computational methods to social and behavioral problems at large scale. Drew has been writing and speaking about the role of data — and the discipline of data science — in industry, government, and academia for several years.

Drew has advised and consulted companies across many industries, ranging from fledgling start-ups to Fortune 100 companies, as well as academic institutions and government agencies at all levels. Drew started his career in counter-terrorism as a computational social scientist in the U.S. intelligence community.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Wanna join? If you or anyone you know wants to join in, register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

In this podcast, Justin Borgman talks about his journey of starting a data science startup, making an exit, and jumping into another one. The session is filled with insights for leaders looking for entrepreneurial wisdom to get on a data-driven journey.

Timeline: 0:28 Justin's journey. 3:22 Taking the plunge to start a new company. 5:49 Perception vs. reality of starting a data warehouse company. 8:15 Bringing in something new to the IT legacy. 13:20 Getting your first few customers. 16:16 Right moment for a data warehouse company to look for a new venture. 18:20 Right person to have as a co-founder. 20:29 Advantages of going seed vs. series A. 22:13 When is a company ready for seeding or series A? 24:40 Who's a good adviser? 26:35 Exiting Teradata. 28:54 Teradata to starting a new company. 31:24 Excitement of starting something from scratch. 32:24 What is Starburst? 37:15 Presto, a great engine for cloud platforms. 40:30 How can a company get started with Presto. 41:50 Health of enterprise data. 44:15 Where does Presto not fit in? 45:19 Future of enterprise data. 46:36 Drawing parallels between proprietary space and open source space. 49:02 Does aligning with open source give a company a better chance at seed funding? 51:44 Justin's ingredients for success. 54:05 Justin's favorite reads. 55:01 Key takeaways.

Justin's Recommended Read: The Outsiders by S. E. Hinton amzn.to/2Ai84Gl

Podcast Link: https://futureofdata.org/running-a-data-science-startup-one-decision-at-a-time-futureofdata-podcast/

Justin's BIO: Justin has spent the better part of a decade in senior executive roles building new businesses in the data warehousing and analytics space. Before co-founding Starburst, Justin was Vice President and General Manager at Teradata (NYSE: TDC), where he was responsible for the company's portfolio of Hadoop products. Prior to joining Teradata, Justin was co-founder and CEO of Hadapt, the pioneering "SQL-on-Hadoop" company that transformed Hadoop from a file system into an analytic database accessible to anyone with a BI tool. Teradata acquired Hadapt in 2014.

Justin earned a BS in Computer Science from the University of Massachusetts at Amherst and an MBA from the Yale School of Management.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Want to sponsor? Email us @ [email protected]

Keywords:

FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

In this episode, Wayne Eckerson and Jeff Magnusson discuss the data architecture Stitch Fix created to support its data science workloads, as well as the need to balance man and machine and art and science.

Magnusson is the vice president of data platform at Stitch Fix. He leads a team responsible for building the data platform that supports the company's team of 80+ data scientists, as well as other business users. That platform is designed to facilitate self-service among data scientists and promote velocity and innovation that differentiate Stitch Fix in the marketplace. Before Stitch Fix, Magnusson managed the data platform architecture team at Netflix where he helped design and open source many of the components of the Hadoop-based infrastructure and big data platform.

Summary

The Open Data Science Conference brings together a variety of data professionals each year in Boston. This week's episode consists of a pair of brief interviews conducted on-site at the conference. First up you'll hear from Andy Eschbacher of Carto. He describes some of the complexities inherent to working with geospatial data, how they are handling it, and some of the interesting use cases that they enable for their customers. Next is Todd Blaschka, COO of TigerGraph. He explains how graph databases differ from relational engines, where graph algorithms are useful, and how TigerGraph is built to allow for fast and scalable operation.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline you'll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you've got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. Your host is Tobias Macey, and last week I attended the Open Data Science Conference in Boston and recorded a few brief interviews on-site. In this second part you will hear from Andy Eschbacher of Carto about the challenges of managing geospatial data, as well as Todd Blaschka of TigerGraph about graph databases and how his company has managed to build a fast and scalable platform for graph storage and traversal.

Interview

Andy Eschbacher From Carto

What are the challenges associated with storing geospatial data? What are some of the common misconceptions that people have about working with geospatial data?
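As a small, hypothetical illustration of working with geospatial data (this is not Carto code): GeoJSON, linked below, orders coordinates as [longitude, latitude], a frequent source of confusion, and a great-circle distance can be computed directly from those coordinates.

```python
# Illustration only (not Carto code): GeoJSON stores coordinates as [longitude, latitude].
from math import asin, cos, radians, sin, sqrt

nyc = {"type": "Point", "coordinates": [-74.0060, 40.7128]}   # [lon, lat]
bos = {"type": "Point", "coordinates": [-71.0589, 42.3601]}

def haversine_km(a, b):
    """Great-circle distance between two GeoJSON points, in kilometres."""
    lon1, lat1 = map(radians, a["coordinates"])
    lon2, lat2 = map(radians, b["coordinates"])
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

print(round(haversine_km(nyc, bos)))   # roughly 306 km
```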

Contact Info

andy-esch on GitHub @MrEPhysics on Twitter Website

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Carto Geospatial Analysis GeoJSON

Todd Blaschka From TigerGraph

What are graph databases and how do they differ from relational engines? What are some of the common difficulties that people have when dealing with graph algorithms? How does data modeling for graph databases differ from relational stores?
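As a toy illustration of the modeling difference (this is not TigerGraph code), a graph can be held as adjacency lists and queried with a traversal, the kind of multi-hop question that would require repeated self-joins in a relational store.

```python
# Not TigerGraph code: a toy adjacency-list graph and a breadth-first traversal.
from collections import deque

follows = {                      # person -> people they follow
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan", "eve"],
    "dan": [],
    "eve": ["ann"],
}

def within_hops(start, max_hops):
    """Everyone reachable from `start` in at most `max_hops` edges."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nxt in follows.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen - {start}

print(within_hops("ann", 2))   # {'bob', 'cat', 'dan', 'eve'}
```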

Contact Info

LinkedIn @toddblaschka on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

TigerGraph Graph Databases

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA. Support Data Engineering Podcast

Visual Data Storytelling with Tableau, First edition

Tell Insightful, Actionable Business Stories with Tableau, the World's Leading Data Visualization Tool! Visual Data Storytelling with Tableau brings together knowledge, context, and hands-on skills for telling powerful, actionable data stories with Tableau. This full-color guide shows how to organize data and structure analysis with storytelling in mind, embrace exploration and visual discovery, and articulate findings with rich data, carefully curated visualizations, and skillfully crafted narrative. You don't need any visualization experience. Each chapter illuminates key aspects of design practice and data visualization, and guides you step by step through applying them in Tableau. Through realistic examples and classroom-tested exercises, Professor Lindy Ryan helps you use Tableau to analyze data, visualize it, and help people connect more intuitively and emotionally with it. Whether you're an analyst, executive, student, instructor, or journalist, you won't just master the tools: you'll learn to craft data stories that make an immediate impact and inspire action.

Learn how to: craft more powerful stories by blending data science, genre, and visual design; ask the right questions upfront to plan data collection and analysis; build storyboards and choose charts based on your message and audience; direct audience attention to the points that matter most; showcase your data stories in high-impact presentations; integrate Tableau storytelling throughout your business communication; explore case studies that show what to do and what not to do; and discover visualization best practices, tricks, and hacks you can use with any tool. Includes coverage up through Tableau 10.

Data Science Fundamentals for Python and MongoDB

Build the foundational data science skills necessary to work with and better understand complex data science algorithms. This example-driven book provides complete Python coding examples to complement and clarify data science concepts, and enrich the learning experience. Coding examples include visualizations whenever appropriate. The book is a necessary precursor to applying and implementing machine learning algorithms. The book is self-contained. All of the math, statistics, stochastic, and programming skills required to master the content are covered. In-depth knowledge of object-oriented programming isn’t required because complete examples are provided and explained. Data Science Fundamentals with Python and MongoDB is an excellent starting point for those interested in pursuing a career in data science. Like any science, the fundamentals of data science are a prerequisite to competency. Without proficiency in mathematics, statistics, data manipulation, and coding, the path to success is “rocky” at best. The coding examples in this book are concise, accurate, and complete, and perfectly complement the data science concepts introduced. What You'll Learn Prepare for a career in data science Work with complex data structures in Python Simulate with Monte Carlo and Stochastic algorithms Apply linear algebra using vectors and matrices Utilize complex algorithms such as gradient descent and principal component analysis Wrangle, cleanse, visualize, and problem solve with data Use MongoDB and JSON to work with data Who This Book Is For The novice yearning to break into the data science world, and the enthusiast looking to enrich, deepen, and develop data science skills through mastering the underlying fundamentalsthat are sometimes skipped over in the rush to be productive. Some knowledge of object-oriented programming will make learning easier.