Dr. @MikeStonebraker on his journey to evolution of data ops and winning #Turing Award

2019-05-16 · The Future of Data Podcast | conversation with leaders, influencers, and change makers in the World of Data & Analytics Listen

podcast_episode

by Michael Stonebraker (Paradigm4; Advisor to VoltDB; former Adjunct Professor at MIT)

Aurora Big Data Data Management postgresql

Dr. @MikeStonebraker on his journey to the evolution of data ops and winning #Turing Award #FutureOfData #Leadership #Podcast

Timeline: 0:29 Mike's journey. 30:23 Reason behind Mike's preference of academia over the corporate. 38:50 Tips to leaders on data management.

In this podcast, Dr. Michael Stonebraker discussed his journey into creating data ops and winning the Turing award. He shared his life's several aha moments and progressions that mirrored the evolution of the data ops industry. It's a delightful conversation for anyone seeking to understand how data ops have evolved over the last couple of decades and what it takes to win the Turing Award.

Podcast Link: iTunes: https://apple.co/2VtcX6d Youtube: https://youtu.be/bY1qjy0qpq4

Dr. Stonebraker's BIO: Dr. Stonebraker has been a pioneer of database research and technology for more than forty years. He was the main architect of the INGRES relational DBMS, and the object-relational DBMS, POSTGRES. These prototypes were developed at the University of California at Berkeley where Stonebraker was a Professor of Computer Science for twenty-five years. More recently at M.I.T., he was a co-architect of the Aurora/Borealis stream processing engine, the C-Store column-oriented DBMS, the H-Store transaction processing engine, which became VoltDB, the SciDB array DBMS, and the Data Tamer data curation system. Presently he serves as an advisor to VoltDB and Chief Technology Officer of Paradigm4 and Tamr, Inc.

Professor Stonebraker was awarded the ACM System Software Award in 1992 for his work on INGRES. Additionally, he was awarded the first annual SIGMOD Innovation award in 1994 and was elected to the National Academy of Engineering in 1997. He was awarded the IEEE John Von Neumann award in 2005 and the 2014 Turing Award and is presently an Adjunct Professor of Computer Science at M.I.T, where he is co-director of the Intel Science and Technology Center focused on big data.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to come on the show and discuss their journey in creating the data-driven future.

Wanna Join? If you or any you know wants to join in, Register your interest by emailing us @ [email protected]

Want to sponsor? Email us @ [email protected]

Keywords: FutureOfData,

DataAnalytics,

Leadership,

Futurist,

Podcast,

BigData,

Strategy

Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and The Cloud

2019-05-10 · O'Reilly Data Science Books O'Reilly Amazon

book

by Paul J. Deitel , Harvey M. Deitel

AI/ML Big Data Cloud Computing Data Science Python programming-languages software-development

This is the eBook of the printed book and may not include any media, website access codes, or print supplements that may come packaged with the bound book. For introductory-level Python programming and/or data-science courses. A groundbreaking, flexible approach to computer science and data science The Deitels’ Introduction to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud offers a unique approach to teaching introductory Python programming, appropriate for both computer-science and data-science audiences. Providing the most current coverage of topics and applications, the book is paired with extensive traditional supplements as well as Jupyter Notebooks supplements. Real-world datasets and artificial-intelligence technologies allow students to work on projects making a difference in business, industry, government and academia. Hundreds of examples, exercises, projects (EEPs), and implementation case studies give students an engaging, challenging and entertaining introduction to Python programming and hands-on data science. Related Content Video: Python Fundamentals Live courses: Python Full Throttle with Paul Deitel: A One-Day, Fast-Paced, Code-Intensive Python Presentation Python® Data Science Full Throttle with Paul Deitel: Introductory Artificial Intelligence (AI), Big Data and Cloud Case Studies The book’s modular architecture enables instructors to conveniently adapt the text to a wide range of computer-science and data-science courses offered to audiences drawn from many majors. Computer-science instructors can integrate as much or as little data-science and artificial-intelligence topics as they’d like, and data-science instructors can integrate as much or as little Python as they’d like. The book aligns with the latest ACM/IEEE CS-and-related computing curriculum initiatives and with the Data Science Undergraduate Curriculum Proposal sponsored by the National Science Foundation.

Introduction to Probability.

2019-05-07 · O'Reilly Data Science Books O'Reilly Amazon

book

by Konstadinos G. Politis , Markos V. Koutras , N. Balakrishnan

data data-science data-science-tasks statistics

An essential guide to the concepts of probability theory that puts the focus on models and applications Introduction to Probability offers an authoritative text that presents the main ideas and concepts, as well as the theoretical background, models, and applications of probability. The authors—noted experts in the field—include a review of problems where probabilistic models naturally arise, and discuss the methodology to tackle these problems. A wide-range of topics are covered that include the concepts of probability and conditional probability, univariate discrete distributions, univariate continuous distributions, along with a detailed presentation of the most important probability distributions used in practice, with their main properties and applications. Designed as a useful guide, the text contains theory of probability, de finitions, charts, examples with solutions, illustrations, self-assessment exercises, computational exercises, problems and a glossary. This important text: • Includes classroom-tested problems and solutions to probability exercises • Highlights real-world exercises designed to make clear the concepts presented • Uses Mathematica software to illustrate the text’s computer exercises • Features applications representing worldwide situations and processes • Offers two types of self-assessment exercises at the end of each chapter, so that students may review the material in that chapter and monitor their progress. Written for students majoring in statistics, engineering, operations research, computer science, physics, and mathematics, Introduction to Probability: Models and Applications is an accessible text that explores the basic concepts of probability and includes detailed information on models and applications.

Social-Behavioral Modeling for Complex Systems

2019-04-09 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jonathan Pfautz , Angela O'Mahony , Paul K. Davis

Cyber Security data data-engineering data-models

This volume describes frontiers in social-behavioral modeling for contexts as diverse as national security, health, and on-line social gaming. Recent scientific and technological advances have created exciting opportunities for such improvements. However, the book also identifies crucial scientific, ethical, and cultural challenges to be met if social-behavioral modeling is to achieve its potential. Doing so will require new methods, data sources, and technology. The volume discusses these, including those needed to achieve and maintain high standards of ethics and privacy. The result should be a new generation of modeling that will advance science and, separately, aid decision-making on major social and security-related subjects despite the myriad uncertainties and complexities of social phenomena. Intended to be relatively comprehensive in scope, the volume balances theory-driven, data-driven, and hybrid approaches. The latter may be rapidly iterative, as when artificial-intelligence methods are coupled with theory-driven insights to build models that are sound, comprehensible and usable in new situations. With the intent of being a milestone document that sketches a research agenda for the next decade, the volume draws on the wisdom, ideas and suggestions of many noted researchers who draw in turn from anthropology, communications, complexity science, computer science, defense planning, economics, engineering, health systems, medicine, neuroscience, physics, political science, psychology, public policy and sociology. In brief, the volume discusses: Cutting-edge challenges and opportunities in modeling for social and behavioral science Special requirements for achieving high standards of privacy and ethics New approaches for developing theory while exploiting both empirical and computational data Issues of reproducibility, communication, explanation, and validation Special requirements for models intended to inform decision making about complex social systems

Probability, Random Variables, Statistics, and Random Processes

2019-04-02 · O'Reilly Data Science Books O'Reilly Amazon

book

by Ali Grami

Cyber Security data data-science data-science-tasks statistics

Probability, Random Variables, Statistics, and Random Processes: Fundamentals & Applications is a comprehensive undergraduate-level textbook. With its excellent topical coverage, the focus of this book is on the basic principles and practical applications of the fundamental concepts that are extensively used in various Engineering disciplines as well as in a variety of programs in Life and Social Sciences. The text provides students with the requisite building blocks of knowledge they require to understand and progress in their areas of interest. With a simple, clear-cut style of writing, the intuitive explanations, insightful examples, and practical applications are the hallmarks of this book. The text consists of twelve chapters divided into four parts. Part-I, Probability (Chapters 1 – 3), lays a solid groundwork for probability theory, and introduces applications in counting, gambling, reliability, and security. Part-II, Random Variables (Chapters 4 – 7), discusses in detail multiple random variables, along with a multitude of frequently-encountered probability distributions. Part-III, Statistics (Chapters 8 – 10), highlights estimation and hypothesis testing. Part-IV, Random Processes (Chapters 11 – 12), delves into the characterization and processing of random processes. Other notable features include: Most of the text assumes no knowledge of subject matter past first year calculus and linear algebra With its independent chapter structure and rich choice of topics, a variety of syllabi for different courses at the junior, senior, and graduate levels can be supported A supplemental website includes solutions to about 250 practice problems, lecture slides, and figures and tables from the text Given its engaging tone, grounded approach, methodically-paced flow, thorough coverage, and flexible structure, Probability, Random Variables, Statistics, and Random Processes: Fundamentals & Applications clearly serves as a must textbook for courses not only in Electrical Engineering, but also in Computer Engineering, Software Engineering, and Computer Science.

Robust Statistics, 2nd Edition

2019-01-04 · O'Reilly Data Science Books O'Reilly Amazon

book

by R. Douglas Martin , Ricardo A. Maronna , Victor J. Yohai , Matías Salibián-Barrera

data data-science data-science-tasks statistics

A new edition of this popular text on robust statistics, thoroughly updated to include new and improved methods and focus on implementation of methodology using the increasingly popular open-source software R. Classical statistics fail to cope well with outliers associated with deviations from standard distributions. Robust statistical methods take into account these deviations when estimating the parameters of parametric models, thus increasing the reliability of fitted models and associated inference. This new, second edition of Robust Statistics: Theory and Methods (with R) presents a broad coverage of the theory of robust statistics that is integrated with computing methods and applications. Updated to include important new research results of the last decade and focus on the use of the popular software package R, it features in-depth coverage of the key methodology, including regression, multivariate analysis, and time series modeling. The book is illustrated throughout by a range of examples and applications that are supported by a companion website featuring data sets and R code that allow the reader to reproduce the examples given in the book. Unlike other books on the market, Robust Statistics: Theory and Methods (with R) offers the most comprehensive, definitive, and up-to-date treatment of the subject. It features chapters on estimating location and scale; measuring robustness; linear regression with fixed and with random predictors; multivariate analysis; generalized linear models; time series; numerical algorithms; and asymptotic theory of M-estimates. Explains both the use and theoretical justification of robust methods Guides readers in selecting and using the most appropriate robust methods for their problems Features computational algorithms for the core methods Robust statistics research results of the last decade included in this 2nd edition include: fast deterministic robust regression, finite-sample robustness, robust regularized regression, robust location and scatter estimation with missing data, robust estimation with independent outliers in variables, and robust mixed linear models. Robust Statistics aims to stimulate the use of robust methods as a powerful tool to increase the reliability and accuracy of statistical modelling and data analysis. It is an ideal resource for researchers, practitioners, and graduate students in statistics, engineering, computer science, and physical and social sciences.

Vertically Integrated Architectures: Versioned Data Models, Implicit Services, and Persistence-Aware Programming

2018-12-18 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jos Jong

data data-engineering data-models

Understand how and why the separation between layers and tiers in service-oriented architectures holds software developers back from being truly productive, and how you can remedy that problem. Strong processes and development tools can help developers write more complex software, but large amounts of code can still be directly deduced from the underlying database model, hampering developer productivity. In a world with a shortage of developers, this is bad news. More code also increases maintenance costs and the risk of bugs, meaning less time is spent improving the quality of systems. You will learn that by making relationships first-class citizens within an item/relationship model, you can develop an extremely compact query language, inspired by natural language. You will also learn how this model can serve as both a database schema and an object model upon which to build business logic. Implicit services free you from writing code for standard read/write operations, while still supporting fine-grained authorization. Vertically Integrated Architectures explains how functional schema mappings can solve database migrations and service versioning at the same time, and how all this can support any client, from free-format to fully vertically integrated types. Unleash the potential and use VIA to drastically increase developer productivity and quality. What You'll Learn See how the separation between application server and database in a SOA-based architecture might be justifiable from a historical perspective, but can also hold us back Examine how the vertical integration of application logic and database functionality can drastically increase developer productivity and quality Review why application developers only need to write pure business logic if an architecture takes care of basic read/write client-server communication and data persistence Understand why a set-oriented and persistence-aware programming language would not only make it easier to build applications, but would also enable the fully optimized execution of incoming service requests Who This Book Is For Software architects, senior software developers, computer science professionals and students, and the open source community.

GIS Fundamentals, 2nd Edition

2018-09-03 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Stephen Wise

GIS data data-engineering geographic-information-system-gis geographic information system (gis) location-data

Aimed at readers with a knowledge of geographic information systems (GIS) but no formal training in computer science, this book provides a clear and accessible introduction to how GIS store and process spatial data. This updated edition includes two new chapters on databases and heuristics, substantial additional material on indexing and raster imagery, and revisions throughout that incorporate up-to-date applications such as GPS on mobile devices and Internet-based services.

The Spread of Fake News

2018-07-20 · Data Skeptic Listen

podcast_episode

by Kyle Polich , Filippo Menczer (Indiana University)

How does fake news get spread online? Its not just a matter of manipulating search algorithms. The social platforms for sharing play a major role in the distribution of fake news. But how significant of an impact can there be? How significantly can bots influence the spread of fake news? In this episode, Kyle interviews Filippo Menczer, Professor of Computer Science and Informatics. Fil is part of the Observatory on Social Media ([OSoMe][https://osome.iuni.iu.edu/tools/]). OSoMe are the creators of Hoaxy, Botometer, Fakey, and other tools for studying the spread of information on social media. The interview explores these tools and the contributions Bots make to the spread of fake news.

Emergence of #DataOps Age - @AndyHPalmer #FutureOfData

2018-07-12 · The Future of Data Podcast | conversation with leaders, influencers, and change makers in the World of Data & Analytics Listen

podcast_episode

by Andy Palmer (Tamr) , Vishal

AI/ML Analytics Big Data Data Engineering Data Science DataOps DWH Marketing

In this podcast @AndyPalmer from @Tamr sat with @Vishaltx from @AnalyticsWeek to talk about the emergence/need/market for Data Ops, a specialized capability emerging from merging data engineering and dev ops ecosystem due to increased convoluted data silos and complicated processes. Andy shared his journey on what some of the businesses and their leaders are doing wrong and how businesses need to rethink their data silos to future proof themselves. This is a good podcast for any data leader thinking about cracking the code on getting high-quality insights from data.

Timelines: 0:28 Andy's journey. 4:56 What's Tamr? 6:38 What's Andy's role in Tamr. 8:16 What's data ops? 13:07 Right time for business to incorporate data ops. 15:56 Data exhaust vs. data ops. 21:05 Tips for executives in dealing with data. 23:15 Suggestions for businesses working with data. 25:48 Creating buy-in for experimenting with new technologies. 28:47 Using data ops for the acquisition of new companies. 31:58 Data ops vs. dev ops. 36:40 Big opportunities in data science. 39:35 AI and data ops. 44:28 Parameters for a successful start-up. 47:49 What still surprises Andy? 50:19 Andy's success mantra. 52:48 Andy's favorite reads. 54:25 Final remarks.

Andy's Recommended Read: Enlightenment Now: The Case for Reason, Science, Humanism, and Progress by Steven Pinker https://amzn.to/2Lc6WqK The Three-Body Problem by Cixin Liu and Ken Liu https://amzn.to/2rQyPvp

Andy's BIO: Andy Palmer is a serial entrepreneur who specializes in accelerating the growth of mission-driven startups. Andy has helped found and/or fund more than 50 innovative companies in technology, health care, and the life sciences. Andy’s unique blend of strategic perspective and disciplined tactical execution is suited to environments where uncertainty is the rule rather than the exception. Andy has a specific passion for projects at the intersection of computer science and the life sciences.

Most recently, Andy co-founded Tamr, a next-generation data curation company, and Koa Labs, a start-up club in the heart of Harvard Square, Cambridge, MA.

Specialties: Software, Sales & Marketing, Web Services, Service Oriented Architecture, Drug Discovery, Database, Data Warehouse, Analytics, Startup, Entrepreneurship, Informatics, Enterprise Software, OLTP, Science, Internet, eCommerce, Venture Capital, Bootstrapping, Founding Team, Venture Capital firm, Software companies, early-stage venture, corporate development, venture-backed, venture capital fund, world-class, stage venture capital

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Podcast link: https://futureofdata.org/emergence-of-dataops-age-andypalmer-futureofdata-podcast/

Wanna Join? If you or any you know wants to join in, Register your interest and email at [email protected]

Want to sponsor? Email us @ [email protected]

Keywords:

FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Running a data science startup, one decision at a time #Futureofdata podcast

2018-05-16 · The Future of Data Podcast | conversation with leaders, influencers, and change makers in the World of Data & Analytics Listen

podcast_episode

by Justin Borgman (Starburst Data)

Analytics BI Big Data Cloud Computing Data Science DWH Hadoop Presto SQL Teradata

In this podcast, Justin Borgman talks about his journey of starting a data science start, doing an exit, and jumping on another one. The session is filled with insights for leadership, looking for entrepreneurial wisdom to get on a data-driven journey.

Timeline: 0:28 Justin's journey. 3:22 Taking the plunge to start a new company. 5:49 Perception vs. reality of starting a data warehouse company. 8:15 Bringing in something new to the IT legacy. 13:20 Getting your first few customers. 16:16 Right moment for a data warehouse company to look for a new venture. 18:20 Right person to have as a co-founder. 20:29 Advantages of going seed vs. series A. 22:13 When is a company ready for seeding or series A? 24:40 Who's a good adviser? 26:35 Exiting Teradata. 28:54 Teradata to starting a new company. 31:24 Excitement of starting something from scratch. 32:24 What is Starburst? 37:15 Presto, a great engine for cloud platforms. 40:30 How can a company get started with Presto. 41:50 Health of enterprise data. 44:15 Where does Presto not fit in? 45:19 Future of enterprise data. 46:36 Drawing parallels between proprietary space and open source space. 49:02 Does align with open-source gives a company a better chance in seeding. 51:44 John's ingredients for success. 54:05 John's favorite reads. 55:01 Key takeaways.

Paul's Recommended Read: The Outsiders Paperback – S. E. Hinton amzn.to/2Ai84Gl

Podcast Link: https://futureofdata.org/running-a-data-science-startup-one-decision-at-a-time-futureofdata-podcast/

Justin's BIO: Justin has spent the better part of a decade in senior executive roles building new businesses in the data warehousing and analytics space. Before co-founding Starburst, Justin was Vice President and General Manager at Teradata (NYSE: TDC), where he was responsible for the company’s portfolio of Hadoop products. Prior to joining Teradata, Justin was co-founder and CEO of Hadapt, the pioneering "SQL-on-Hadoop" company that transformed Hadoop from file system to analytic database accessible to anyone with a BI tool. Teradata acquired Hadapt in 2014.

Justin earned a BS in Computer Science from the University of Massachusetts at Amherst and an MBA from the Yale School of Management.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Want to sponsor? Email us @ [email protected]

Keywords:

FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

@JohnNives on ways to demystify AI for enterprise

2018-05-02 · The Future of Data Podcast | conversation with leaders, influencers, and change makers in the World of Data & Analytics Listen

podcast_episode

by Jean-Louis (John) Nives (N2Growth)

AI/ML Analytics Big Data IBM Marketing

In this podcast, @JohnNives discusses ways to demystify AI for the enterprise. He shares his perspective on how businesses should engage with AI and what are some of the best practices and considerations for businesses to adopt AI in their strategic roadmap. This podcast is great for anyone seeking to learn about the way to adopt AI in the enterprise landscape.

Timelines: 0:28 John's journey. 6:50 John's current role. 9:40 The role of a chief digital officer. 11:16 The current trend of AI. 13:52 AI hype or real? 16:42 Why AI now? 19:03 Demystifying deep learning. 23:35 Enterprise use cases of AI. 28:25 Attributes of a successful AI project. 32:20 Best AI investments in an enterprise. 36:56 Convincing leadership to adopt AI. 39:20 Organizational implications of adopting AI. 43:45 What do executives get wrong about AI? 48:36 Tips for executives to understand the AI landscape. 53:11 John's favorite reads. 57:35 Closing remarks.

John's Recommended Listen: FutureOfData Podcast math.im/itunes War and Peace Leo Tolstoy (Author),‎ Frederick Davidson (Narrator),‎ Inc. Blackstone Audio (Publisher) amzn.to/2w7ObkI

Podcast Link: https://futureofdata.org/johnnives-on-ways-to-demystify-ai-for-enterprise/

Jean's BIO: Jean-Louis (John) Nives serves as Chief Digital Officer and the Global Chair of the Digital Transformation practice at N2Growth. Prior to joining N2Growth, Mr. Nives was at IBM Global Business Services, within the Watson and Analytics Center of Competence. There he worked on Cognitive Digital Transformation projects related to Watson, Big Data, Analytics, Social Business and Marketing/Advertising Technology. Examples include CognitiveTV and the application of external unstructured data (social, weather, etc.) for business transformation. Prior relevant experience includes executive leadership positions at Nielsen, IRI, Kraft and two successful advertising technology acquisitions (Appnexus and SintecMedia). In this capacity, Jean-Louis combined information, analytics and technology to created significant business value in transformative ways. Jean-Louis earned a Bachelor’s Degree in Industrial Engineering from University at Buffalo and an MBA in Finance and Computer Science from Pace University. He is married with four children and lives in the New York City area.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers and lead practitioners to discuss their journey in creating the data-driven future.

Wanna Join? If you or any you know wants to join in, Register your interest @ play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

@Schmarzo @DellEMC on Ingredients of healthy #DataScience practice

2018-03-07 · The Future of Data Podcast | conversation with leaders, influencers, and change makers in the World of Data & Analytics Listen

podcast_episode

by Bill Schmarzo (Dell EMC)

Analytics BI Big Data Data Science DWH dimensional modeling

In this podcast, Bill Schmarzo talks about the ingredients of successful data science practice, team, and executives. Bill shared his insights on what some leaders in the industries are doing and some challenges seen in the successful deployment. Bill shared his key take on ingredients for some of the successful hires. This podcast is great for growth mindset executives willing to learn about creating a successful data science practice.

Timeline: 0:29 Bill's journey. 5:05:00 Bill's current role. 7:04 Data science adoption challenges for businesses. 9:33 The good side of data science adoption. 11:22 How is data science changing business. 14:34 Strategies behind distributed IT. 18:35 Analysing the current amount of data. 21:50 Who should own the idea of data science? 24:34 The right background for a CDO. 25:52 Bias in IT. 29:35 Hacks to keep yourself bias-free. 31:58 Team vs. tool for putting together a good data-driven practice. 34:54 Value cycle in data science. 37:10 Maturity model. 39:17 Convincing culture heavy businesses to adopt data. 42:47 Keeping oneself sane during the technological disruption. 46:20 Hiring the right talent. 51:46 Ingredients of a good data science hire. 56:00 Bill's success mantra. 59:07 Bill's favorite reads. 1:00:36 Closing remarks.

Bill's Recommended Read: Moneyball: The Art of Winning an Unfair Game by Michael Lewis http://amzn.to/2FqBFg8 Big Data MBA: Driving Business Strategies with Data Science by Bill Schmarzo http://amzn.to/2tlZAvP

Podcast Link: https://futureofdata.org/schmarzo-dellemc-on-ingredients-of-healthy-datascience-practice-futureofdata-podcast/

Bill's BIO: Bill Schmarzo is the CTO for the Big Data Practice, where he is responsible for working with organizations to help them identify where and how to start their big data journeys. He's written several white papers, is an avid blogger, and is a frequent speaker on the use of Big Data and data science to power the organization's key business initiatives. He is a University of San Francisco School of Management Fellow, where he teaches the "Big Data MBA" course.

Bill has over three decades of experience in data warehousing, BI, and analytics. Bill authored EMC's Vision Workshop methodology that links an organization's strategic business initiatives with their supporting data and analytic requirements and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehouse Institute's faculty as the head of the analytic applications curriculum.

Bill holds a master's degree in Business Administration from the University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science, and Business Administration from Coe College.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Wanna Join? If you or any you know wants to join in, Register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

An Introduction to Discrete-Valued Time Series

2018-02-05 · O'Reilly Data Science Books O'Reilly Amazon

book

by Christian H. Weiss

AI/ML Data Science Jenkins data data-science data-science-tasks statistics time-series

A much-needed introduction to the field of discrete-valued time series, with a focus on count-data time series Time series analysis is an essential tool in a wide array of fields, including business, economics, computer science, epidemiology, finance, manufacturing and meteorology, to name just a few. Despite growing interest in discrete-valued time series—especially those arising from counting specific objects or events at specified times—most books on time series give short shrift to that increasingly important subject area. This book seeks to rectify that state of affairs by providing a much needed introduction to discrete-valued time series, with particular focus on count-data time series. The main focus of this book is on modeling. Throughout numerous examples are provided illustrating models currently used in discrete-valued time series applications. Statistical process control, including various control charts (such as cumulative sum control charts), and performance evaluation are treated at length. Classic approaches like ARMA models and the Box-Jenkins program are also featured with the basics of these approaches summarized in an Appendix. In addition, data examples, with all relevant R code, are available on a companion website. Provides a balanced presentation of theory and practice, exploring both categorical and integer-valued series Covers common models for time series of counts as well as for categorical time series, and works out their most important stochastic properties Addresses statistical approaches for analyzing discrete-valued time series and illustrates their implementation with numerous data examples Covers classical approaches such as ARMA models, Box-Jenkins program and how to generate functions Includes dataset examples with all necessary R code provided on a companion website An Introduction to Discrete-Valued Time Series is a valuable working resource for researchers and practitioners in a broad range of fields, including statistics, data science, machine learning, and engineering. It will also be of interest to postgraduate students in statistics, mathematics and economics.

Evolutionary Computation

2018-02-02 · Data Skeptic Listen

podcast_episode

by Kyle Polich , Risto Miikkulainen (Cognizant AI Lab)

In this week's episode, Kyle is joined by Risto Miikkulainen, a professor of computer science and neuroscience at the University of Texas at Austin. They talk about evolutionary computation, its applications in deep learning, and how it's inspired by biology. They also discuss some of the things Sentient Technologies is working on in stock and finances, retail, e-commerce and web design, as well as the technology behind it-- evolutionary algorithms.

MACHINE LEARNING AND HUMAN INTERPRETABILITY: THE COST OF COMPLEXITY

2018-02-01 · Superweek 2018

talk

by Matt Gershoff (Conductrics, New York - USA)

AI/ML

Machine learning can be thought of as sitting at the intersection of computer science and statistics. "Computer Science has focused primarily on how to manually program computers, Machine Learning focuses on the question of how to get computers to program themselves ..." – Tom Mitchel

Python for R Users

2017-11-13 · O'Reilly Data Science Books O'Reilly Amazon

book

by Ajay Ohri

AI/ML Analytics Cloud Computing Data Quality Data Science DataViz NLP Python data data-science data-science-tools r

The definitive guide for statisticians and data scientists who understand the advantages of becoming proficient in both R and Python The first book of its kind, Python for R Users: A Data Science Approach makes it easy for R programmers to code in Python and Python users to program in R. Short on theory and long on actionable analytics, it provides readers with a detailed comparative introduction and overview of both languages and features concise tutorials with command-by-command translations—complete with sample code—of R to Python and Python to R. Following an introduction to both languages, the author cuts to the chase with step-by-step coverage of the full range of pertinent programming features and functions, including data input, data inspection/data quality, data analysis, and data visualization. Statistical modeling, machine learning, and data mining—including supervised and unsupervised data mining methods—are treated in detail, as are time series forecasting, text mining, and natural language processing. • Features a quick-learning format with concise tutorials and actionable analytics • Provides command-by-command translations of R to Python and vice versa • Incorporates Python and R code throughout to make it easier for readers to compare and contrast features in both languages • Offers numerous comparative examples and applications in both programming languages • Designed for use for practitioners and students that know one language and want to learn the other • Supplies slides useful for teaching and learning either software on a companion website Python for R Users: A Data Science Approach is a valuable working resource for computer scientists and data scientists that know R and would like to learn Python or are familiar with Python and want to learn R. It also functions as textbook for students of computer science and statistics. A. Ohri is the founder of Decisionstats.com and currently works as a senior data scientist. He has advised multiple startups in analytics off-shoring, analytics services, and analytics education, as well as using social media to enhance buzz for analytics products. Mr. Ohri's research interests include spreading open source analytics, analyzing social media manipulation with mechanism design, simpler interfaces for cloud computing, investigating climate change and knowledge flows. His other books include R for Business Analytics and R for Cloud Computing.

The Computational Complexity of Machine Learning

2017-11-03 · Data Skeptic Listen

podcast_episode

by Kyle Polich , Michael Kearns (University of Pennsylvania)

AI/ML Analytics

In this episode, Professor Michael Kearns from the University of Pennsylvania joins host Kyle Polich to talk about the computational complexity of machine learning, complexity in game theory, and algorithmic fairness. Michael's doctoral thesis gave an early broad overview of computational learning theory, in which he emphasizes the mathematical study of efficient learning algorithms by machines or computational systems. When we look at machine learning algorithms they are almost like meta-algorithms in some sense. For example, given a machine learning algorithm, it will look at some data and build some model, and it's going to behave presumably very differently under different inputs. But does that mean we need new analytical tools? Or is a machine learning algorithm just the same thing as any deterministic algorithm, but just a little bit more tricky to figure out anything complexity-wise? In other words, is there some overlap between the good old-fashioned analysis of algorithms with the analysis of machine learning algorithms from a complexity viewpoint? And what is the difference between strategies for determining the complexity bounds on samples versus algorithms? A big area of machine learning (and in the analysis of learning algorithms in general) Michael and Kyle discuss is the topic known as complexity regularization. Complexity regularization asks: How should one measure the goodness of fit and the complexity of a given model? And how should one balance those two, and how can one execute that in a scalable, efficient way algorithmically? From this, Michael and Kyle discuss the broader picture of why one should care whether a learning algorithm is efficiently learnable if it's learnable in polynomial time. Another interesting topic of discussion is the difference between sample complexity and computational complexity. An active area of research is how one should regularize their models so that they're balancing the complexity with the goodness of fit to fit their large training sample size. As mentioned, a good resource for getting started with correlated equilibria is: https://www.cs.cornell.edu/courses/cs684/2004sp/feb20.pdf Thanks to our sponsors: Mendoza College of Business - Get your Masters of Science in Business Analytics from Notre Dame. brilliant.org - A fun, affordable, online learning tool. Check out their Computer Science Algorithms course.

Andrea Gallego / @BCG on Managing Analytics Practice

2017-11-01 · The Future of Data Podcast | conversation with leaders, influencers, and change makers in the World of Data & Analytics Listen

podcast_episode

by Andrea Gallego (Boston Consulting Group)

AI/ML Analytics Big Data Cloud Computing Data Engineering Data Science KPI

In this podcast, Andrea Gallego, Principal & Global Technology Lead @ Boston Consulting Group, talks about her journey as a data science practitioner in the consulting space. She talks about some of the industry practices that up and rising data science professionals must deploy and talks about some operational hacks to help create a robust data science team. It is a must-listen conversation for practitioner folks in the industry trying to deploy a data science team and build solutions for a service industry.

Timeline: 0:29 Andrea's journey. 5:41 Andrea's current role. 8:02 Seasoned data professional to COO role. 11:27 The essentials for having analytics at scale. 14:56 First steps to creating an analytics practice. 18:33 Defining an engineering first company. 22:33 A different understanding of data engineering. 26:40 Mistakes businesses make in their data science practice. 30:21 Some good business problems that data science can solve. 36:42 Democratization of data vs. privacy in companies. 38:04 Tech to business challenges. 40:11 Important KPIs for building a data science practice. 43:47 Hacks to hiring good data science candidates. 49:07 Art of doing business and science of doing business. 52:16 Andrea's secret to success. 55:12 Andrea's favorite read. 58:35 Closing remarks.

Andrea's Recommended Read: Arrival by Ted Chiang http://amzn.to/2h6lJpv Build to Last by Jim Collins http://amzn.to/2yMCsam Designing Agentive Technology: AI That Works for People Paperback http://amzn.to/2ySDHGp

Podcast Link: https://futureofdata.org/andrea-gallego-bcg-managing-analytics-practice/

Andrea's BIO: Andrea is Principal & Global Technology Lead @ Boston Consulting Group. Prior to BCG, Andrea was COO of QuantumBlack’s Cloud platform. She also manages the cloud platform team and helps drive the vision and future of McKinsey Analytics’ digital capabilities. Andrea has broad expertise in computer science, cloud computing, digital transformation strategy, and analytics solutions architecture. Prior to joining the Firm, Andrea was a technologist at Booz Allen Hamilton. She holds a BS in Economics and MS in Analytics (with a concentration in computing methods for analytics).

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Wanna Join? If you or any you know wants to join in, Register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords: FutureOfData Data Analytics Leadership Podcast Big Data Strategy

[MINI] Turing Machines

2017-10-27 · Data Skeptic Listen

podcast_episode

by Kyle Polich

TMs are a model of computation at the heart of algorithmic analysis. A Turing Machine has two components. An infinitely long piece of tape (memory) with re-writable squares and a read/write head which is programmed to change it's state as it processes the input. This exceptionally simple mechanical computer can compute anything that is intuitively computable, thus says the Church-Turing Thesis. Attempts to make a "better" Turing Machine by adding things like additional tapes can make the programs easier to describe, but it can't make the "better" machine more capable. It won't be able to solve any problems the basic Turing Machine can, even if it perhaps solves them faster. An important concept we didn't get to in this episode is that of a Universal Turing Machine. Without the prefix, a TM is a particular algorithm. A Universal TM is a machine that takes, as input, a description of a TM and an input to that machine, and subsequently, simulates the inputted machine running on the given input. Turing Machines are a central idea in computer science. They are central to algorithmic analysis and the theory of computation.

talk-data.com

Computer Science

Activity Trend

Top Events

Top Speakers

Dr. @MikeStonebraker on his journey to evolution of data ops and winning #Turing Award

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to come on the show and discuss their journey in creating the data-driven future.

Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and The Cloud

Introduction to Probability.

Social-Behavioral Modeling for Complex Systems

Probability, Random Variables, Statistics, and Random Processes

Robust Statistics, 2nd Edition

Vertically Integrated Architectures: Versioned Data Models, Implicit Services, and Persistence-Aware Programming

GIS Fundamentals, 2nd Edition

The Spread of Fake News

Emergence of #DataOps Age - @AndyHPalmer #FutureOfData

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Running a data science startup, one decision at a time #Futureofdata podcast

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

@JohnNives on ways to demystify AI for enterprise

FutureOfData podcast is a conversation starter to bring leaders, influencers and lead practitioners to discuss their journey in creating the data-driven future.

FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

@Schmarzo @DellEMC on Ingredients of healthy #DataScience practice

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

An Introduction to Discrete-Valued Time Series

Evolutionary Computation

MACHINE LEARNING AND HUMAN INTERPRETABILITY: THE COST OF COMPLEXITY

Python for R Users

The Computational Complexity of Machine Learning

Andrea Gallego / @BCG on Managing Analytics Practice

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

[MINI] Turing Machines