talk-data.com

Topic: Data Science (machine_learning, statistics, analytics) · 1516 tagged

Activity Trend: 68 peak/qtr, 2020-Q1 – 2026-Q2

Activities

1516 activities · Newest first

In this podcast, Marc Rind from ADP talks about big data in HR. He shares some of the best practices and opportunities that reside in HR data, along with tactical steps to help build better data-driven teams that can execute data-driven strategies. This podcast is great for folks looking to explore the depth of HR data and the opportunities it holds.

Timeline: 0:28 Marc's journey. 4:50 Marc's typical day. 7:23 Data use cases in ADP. 11:20 Driving innovation and thought leadership. 15:15 Creating awareness for the necessity for innovation. 18:54 Listening skills key for innovation. 20:25 HR's role in the time of automation. 27:45 Product development and data science. 30:36 Working on a client analytics platform. 34:41 Team building. 37:52 Tips for established businesses to get started with data. 41:20 Data opportunities for entrepreneurs in the HR space. 43:23 Marc's ingredients for success. 46:35 Marc's reading list. 48:35 Key takeaways.

Podcast Link: https://futureofdata.org/understanding-bigdata-bigopportunity-in-hr-marcrind-futureofdata/

Marc's BIO: Marc is responsible for leading the research and development of Automatic Data Processing’s (ADP’s) Analytics and Big Data initiative. In this capacity, Marc drives the innovation and thought leadership behind ADP’s Client Analytics platform. ADP Analytics gives clients not only the ability to read the pulse of their own human capital, but also information on how they stack up within their industry, along with the best courses of action to achieve their goals through quantifiable insights.

Marc was also an instrumental leader behind the small-business payroll platform RUN Powered by ADP®. He leads a number of the technology teams responsible for delivering this critically acclaimed product, known for its innovative user experience for small business owners.

Prior to joining ADP, Marc’s innovative spirit and fascination with data were forged at Bolt Media, a dot-com start-up based in NY’s “Silicon Alley”. The company was an early predecessor to today’s social media outlets. As an early ‘Data Scientist,’ Marc focused on the patterns and predictions of site usage by harnessing the data in its 10+ million user profiles.

About #Podcast:

The FutureOfData podcast is a conversation starter that brings leaders, influencers, and leading practitioners together to discuss their journeys toward creating the data-driven future.

Wanna Join? If you or anyone you know wants to join in, register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Summary

The Open Data Science Conference brings together a variety of data professionals each year in Boston. This week’s episode consists of a pair of brief interviews conducted on-site at the conference. First up you’ll hear from Alan Anders, the CTO of Applecart, about their challenges with getting Spark to scale for constructing an entity graph from multiple data sources. Next, I spoke with Stepan Pushkarev, the CEO, CTO, and Co-Founder of Hydrosphere.io, about the challenges of running machine learning models in production and how his team tracks key metrics and samples production data to re-train and re-deploy those models for better accuracy and more robust operation.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. Your host is Tobias Macey, and this week I attended the Open Data Science Conference in Boston and recorded a few brief interviews on-site. First up you’ll hear from Alan Anders, the CTO of Applecart, about their challenges with getting Spark to scale for constructing an entity graph from multiple data sources. Next, I spoke with Stepan Pushkarev, the CEO, CTO, and Co-Founder of Hydrosphere.io, about the challenges of running machine learning models in production and how his team tracks key metrics and samples production data to re-train and re-deploy those models for better accuracy and more robust operation.

Interview

Alan Anders from Applecart

What are the challenges of gathering and processing data from multiple data sources and representing them in a unified manner for merging into single entities? What are the biggest technical hurdles at Applecart?
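The entity-merging challenge described here — linking records from multiple data sources into single entities — can be sketched at a small scale with a union-find structure. This is a hypothetical illustration (the identifiers, records, and matching rules below are invented, not Applecart's actual pipeline, which runs on Spark at far larger scale):

```python
# Hypothetical sketch: records from different sources are merged into one
# entity whenever they share an identifier (e-mail, phone, etc.), using a
# union-find structure. All record data below is invented for illustration.

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def merge_records(records):
    """Group records that share any identifier value into single entities."""
    uf = UnionFind()
    seen = {}  # identifier value -> first record id that carried it
    for rec in records:
        for key in ("email", "phone"):
            value = rec.get(key)
            if value is None:
                continue
            if value in seen:
                uf.union(seen[value], rec["id"])
            else:
                seen[value] = rec["id"]
    entities = {}
    for rec in records:
        entities.setdefault(uf.find(rec["id"]), []).append(rec["id"])
    return sorted(sorted(v) for v in entities.values())

records = [
    {"id": 1, "email": "a@x.com", "phone": None},
    {"id": 2, "email": "a@x.com", "phone": "555-0101"},  # same person as 1
    {"id": 3, "email": None, "phone": "555-0101"},       # links to 2 via phone
    {"id": 4, "email": "b@y.com", "phone": None},        # distinct entity
]
print(merge_records(records))  # [[1, 2, 3], [4]]
```

The scaling difficulty discussed in the interview comes precisely from doing this kind of transitive linking over billions of pairs, which is where a distributed engine like Spark enters the picture.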

Contact Info

@alanjanders on Twitter LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Spark DataBricks DataBricks Delta Applecart

Stepan Pushkarev from Hydrosphere.io

What is Hydrosphere.io? What metrics do you track to determine when a machine learning model is not producing an appropriate output? How do you determine which data points to sample for retraining the model? How does the role of a machine learning engineer differ from that of data engineers and data scientists?
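The monitoring-and-retraining loop discussed here can be sketched roughly as follows. This is a hypothetical illustration of the general idea, not Hydrosphere.io's actual API; the window size, threshold, and sampling rule are invented:

```python
# Hypothetical sketch of a production-monitoring loop: sample predictions
# with ground-truth labels, track a rolling accuracy, and flag the model
# for retraining when accuracy drops below a threshold.

from collections import deque

class ModelMonitor:
    def __init__(self, window=100, threshold=0.9):
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = wrong
        self.threshold = threshold
        self.retrain_samples = []

    def record(self, features, prediction, label):
        correct = prediction == label
        self.outcomes.append(1 if correct else 0)
        if not correct:
            # keep misclassified points as candidates for the retraining set
            self.retrain_samples.append((features, label))

    @property
    def accuracy(self):
        if not self.outcomes:
            return 1.0
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        # require a minimally filled window before trusting the estimate
        return len(self.outcomes) >= 20 and self.accuracy < self.threshold

monitor = ModelMonitor(window=50, threshold=0.9)
for i in range(40):
    # simulate a model that is wrong on every 5th labeled sample (80% accuracy)
    monitor.record(features=[i], prediction=i % 5 != 0, label=True)
print(round(monitor.accuracy, 2), monitor.needs_retraining())  # 0.8 True
```

In practice the flagged samples would be merged back into the training set and the model re-deployed, closing the loop the interview describes.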

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Hydrosphere Machine Learning Engineer

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA Support Data Engineering Podcast

podcast_episode
by Kyle Polich, Jerry Schwartz (Independent Investigations Group)

In this episode of Data Skeptic, Kyle chats with Jerry Schwarz from the Independent Investigations Group (IIG)'s SF Bay Area chapter about testing claims of the paranormal. The IIG is a volunteer-based organization dedicated to investigating paranormal or extraordinary claims from a scientific viewpoint. The group, headquartered at the Center for Inquiry-Los Angeles in Hollywood, offers a $100,000 prize to anyone who can show, under proper observing conditions, evidence of any paranormal, supernatural, or occult power or event. CHICAGO: Tues, May 15, 6pm — come to our Data Skeptic meetup. CHICAGO: Saturday, May 19, 10am — Kyle will be giving a talk at the Chicago AI, Data Science, and Blockchain Conference 2018.

Practical Web Scraping for Data Science: Best Practices and Examples with Python

This book provides a complete and modern guide to web scraping, using Python as the programming language, without glossing over important details or best practices. Written with a data science audience in mind, the book explores both scraping and the larger context of web technologies in which it operates, to ensure full understanding. The authors recommend web scraping as a powerful tool for any data scientist’s arsenal, as many data science projects start by obtaining an appropriate data set. Starting with a brief overview on scraping and real-life use cases, the authors explore the core concepts of HTTP, HTML, and CSS to provide a solid foundation. Along with a quick Python primer, they cover Selenium for JavaScript-heavy sites, and web crawling in detail. The book finishes with a recap of best practices and a collection of examples that bring together everything you've learned and illustrate various data science use cases.

What You'll Learn: Leverage well-established best practices and commonly used Python packages; handle today's web, including JavaScript, cookies, and common web scraping mitigation techniques; understand the managerial and legal concerns regarding web scraping.

Who This Book Is For: A data-science-oriented audience that is probably already familiar with Python or another programming language or analytical toolkit (R, SAS, SPSS, etc.). Students or instructors in university courses may also benefit. Readers unfamiliar with Python will appreciate a quick Python primer in chapter 1 to catch up with the basics and provide pointers to other guides as well.
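To give a flavor of the parsing step the book covers, here is a minimal sketch using only Python's standard library. The book itself covers richer tooling (Selenium, crawling); the HTML snippet and link structure below are invented, and parsing a static string stands in for fetching a live page so the example runs offline:

```python
# A minimal, self-contained sketch of HTML parsing for scraping, using the
# standard library's html.parser rather than a third-party package.

from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        # only record text that appears inside an open <a> tag
        if self._href is not None and data.strip():
            self.links.append((self._href, data.strip()))

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None

html = """
<ul>
  <li><a href="/datasets/iris">Iris measurements</a></li>
  <li><a href="/datasets/titanic">Titanic passengers</a></li>
</ul>
"""
parser = LinkExtractor()
parser.feed(html)
print(parser.links)
# [('/datasets/iris', 'Iris measurements'), ('/datasets/titanic', 'Titanic passengers')]
```

A real project would fetch the page first (and respect robots.txt and rate limits, per the book's best-practices discussion) before handing the body to a parser like this.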

Summary

The rate of change in the data engineering industry is alternately exciting and exhausting. Joe Crobak found his way into the work of data management by accident, as so many of us do. After being engrossed with researching the details of distributed systems and big data management for his work, he began sharing his findings with friends. This led to his creation of the Hadoop Weekly newsletter, which he recently rebranded as the Data Engineering Weekly newsletter. In this episode he discusses his experiences working as a data engineer in industry and at the USDS, his motivations and methods for creating a newsletter, and the insights that he has gleaned from it.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. Your host is Tobias Macey, and today I’m interviewing Joe Crobak about his work maintaining the Data Engineering Weekly newsletter, and the challenges of keeping up with the data engineering industry.

Interview

Introduction How did you get involved in the area of data management? What are some of the projects that you have been involved in that were most personally fulfilling?

As an engineer at the USDS working on the healthcare.gov and Medicare systems, what were some of the approaches that you used to manage sensitive data? Healthcare.gov has a storied history; how did the systems for processing and managing the data get architected to handle the load they were subjected to?

What was your motivation for starting a newsletter about the Hadoop space?

Can you speak to your reasoning for the recent rebranding of the newsletter?

How much of the content that you surface in your newsletter is found during your day-to-day work, versus explicitly searching for it? After over 5 years of following the trends in data analytics and data infrastructure what are some of the most interesting or surprising developments?

What have you found to be the fundamental skills or areas of experience that have maintained relevance as new technologies in data engineering have emerged?

What is your workflow for finding and curating the content that goes into your newsletter? What is your personal algorithm for filtering which articles, tools, or commentary gets added to the final newsletter? How has your experience managing the newsletter influenced your areas of focus in your work and vice-versa? What are your plans going forward?

Contact Info

Data Eng Weekly Email Twitter – @joecrobak Twitter – @dataengweekly

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

USDS National Labs Cray Amazon EMR (Elastic Map-Reduce) Recommendation Engine Netflix Prize Hadoop Cloudera Puppet healthcare.gov Medicare Quality Payment Program HIPAA NIST National Institute of Standards and Technology PII (Personally Identifiable Information) Threat Modeling Apache JBoss Apache Web Server MarkLogic JMS (Java Message Service) Load Balancer COBOL Hadoop Weekly Data Engineering Weekly Foursquare NiFi Kubernetes Spark Flink Stream Processing DataStax RSS The Flavors of Data Science and Engineering CQRS Change Data Capture Jay Kreps

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA Support Data Engineering Podcast

In this podcast, Amy Gershkoff (@amygershkoff) talks about the ingredients of a successful data science team. Amy sheds light on the challenges of building a successful team and how businesses can align themselves to get the most out of their data science practice. Amy discusses some tricks, tips, and easy-to-execute strategies to keep the data science team and practice at the top of its efficiency. This is a great session for anyone who wants to be part of a winning and thriving data science practice within their organization.

Timeline:

0:29 Amy's journey. 8:39 Working on Obama's campaign. 15:35 Getting started with a data project. 20:39 First steps for creating a data science team. 27:53 Hiring a data scientist recruiter. 33:00 Building an internal data science workforce. 40:00 Hiring the right data scientist. 42:36 Tips for a data scientist to become a good hire. 44:42 Leadership getting educated in data science. 48:05 How to build diversity in the data science field. 52:52 Being bias free. 54:20 Amy's reading list. 56:06 Key takeaways. 

YouTube: https://youtu.be/0PBK5dfQaUk iTunes: http://apple.co/2zMLByT

Podcast Link: https://futureofdata.org/amygershkoff-on-building-winning-datascience-team/

Amy's BIO: Dr. Amy Gershkoff consults and advises technology companies across the globe. She is the former Chief Data Officer for Ancestry, the world's leading genealogy and consumer genomics company. Prior to joining Ancestry, she was Chief Data Officer at Zynga. Previously, Amy built and led the Customer Analytics & Insights team and led the Global Data Science team at eBay. She has also served as the Chief Data Scientist for WPP, Data Alliance, where she worked across WPP’s more than 350 operating companies worldwide to create integrated data and technology solutions. She was also the Head of Media Planning at Obama for America 2012, where she was the architect of Obama’s advertising strategy and designed the campaign's analytics systems.


In this podcast, Stephen Gatchell (@stephengatchell) from @Dell talks about the ingredients of a successful data scientist. He sheds light on the importance of data governance and compliance in defining a robust data science strategy, suggests tactical steps that executives can take to start their journey toward a robust governance framework, and talks about how to take the fear out of governance. He also offers insights on things leaders can do today to build robust data science teams and frameworks. This podcast is great for leaders seeking tactical insights into building a robust data science framework.

Timeline:

0:29 Stephen's journey. 4:45 Dell's customer experience journey. 7:39 Suggestions for a startup in regard to customer experience. 12:02 Building a center of excellence around data. 15:29 Data ownership. 19:18 Fixing data governance. 24:02 Fixing the data culture. 29:40 Distributed data ownership and data lakes. 32:50 Understanding data lakes. 35:50 Common pitfalls and opportunities in data governance. 38:50 Pleasant surprises in data governance. 41:30 Ideal data team. 44:04 Hiring the right candidates for data excellence. 46:13 How do I know the "why"? 49:05 Stephen's success mantra. 50:56 Stephen's best read.

Steve's Recommended Read: Big Data MBA: Driving Business Strategies with Data Science by Bill Schmarzo http://amzn.to/2HWjOyT

Podcast Link: https://futureofdata.org/want-to-fix-datascience-fix-governance-by-stephengatchell-futureofdata/

Steve's BIO: Stephen is currently Chief Data Officer, Engineering & Data Lake, at Dell and serves on the Dell Information Quality Governance Office and the Dell IT Technology Advisory Board, developing Dell’s corporate strategies for the Business Data Lake, Advanced Analytics, and Information Asset Management. Stephen also serves as a Customer Insight Analyst for the Chief Technology Office, analyzing customer technology challenges and requirements. Stephen has been awarded the People’s Choice Award by the Dell Total Customer Experience Team for the Data Governance and Business Data Lake project, and was a Chief Technology Officer Innovation finalist for utilizing advanced analytics on customer configurations to improve product development and product test coverage. Prior to his current role, Stephen managed Dell’s Global Product Development Lab Operations team, developing internal cloud orchestration and automation environments; served as an Information Systems Executive at IBM, leading acquisition conversion efforts; and was VP of Enterprise Systems and Operations, managing mission-critical information systems for Telelogic (a Swedish public software firm). Stephen has an MBA from Southern New Hampshire University, a BSBA, and an AS in Finance from Northeastern University.



In this podcast, Ashok Srivastava (@aerotrekker) talks about how the key to creating a great data science practice runs through #PeopleDataTech, and how to handle unreasonable expectations of reasonable technologies. He shares his journey through culturally diverse organizations and how he successfully built data science practices. He also discusses his role at Intuit and some of the AI/machine learning focus it entails. This podcast is a must for all data-driven leaders, strategists, and wannabe technologists tasked with growing their organizations and building a robust data science practice.

Timeline:

0:29 Ashok's journey. 9:58 The role of a CDO at Intuit. 12:45 Ashok's secret to success working with diverse workforces. 15:42 Building a culture of data science. 19:03 Tactical strategies to convince the leadership about data. 22:03 Comparing a data officer and analytics officer. 24:09 Ownership of data. 27:33 Best practices for putting together a data team. 30:16 Best practices for a company to build a good data science practice. 32:40 Who's the ideal data science candidate? 35:17 Data citizens as data leaders. 37:47 Use cases of AI at Intuit. 39:55 Deciding which product deserves AI. 42:35 Disruptive nature of AI. 45:05 Ashok's success mantra. 46:56 Ashok's favorite reads. 49:15 Key takeaways.

Ashok's Recommended Read: Guns, Germs, and Steel: The Fates of Human Societies - Jared Diamond Ph.D. http://amzn.to/2C4bLMT Collapse: How Societies Choose to Fail or Succeed: Revised Edition - by Jared Diamond http://amzn.to/2C3Bu8f

Podcast Link: https://futureofdata.org/ashok-srivastavaaerotrekker-on-winning-the-art-of-datascience/

Ashok's BIO: Ashok N. Srivastava, Ph.D., is the Senior Vice President and Chief Data Officer at Intuit. He is responsible for setting the vision and direction for large-scale machine learning and AI across the enterprise to help power prosperity across the world. He is hiring hundreds of people in machine learning, AI, and related areas at all levels.

Previously, he was Vice President of Big Data and Artificial Intelligence Systems and the Chief Data Scientist at Verizon. He is an Adjunct Professor at Stanford in the Electrical Engineering Department and is the Editor-in-Chief of the AIAA Journal of Aerospace Information Systems. Ashok is a Fellow of the IEEE, the American Association for the Advancement of Science (AAAS), and the American Institute of Aeronautics and Astronautics (AIAA).

Ashok has a range of business experience, including serving as Senior Director at Blue Martini Software and Senior Consultant at IBM.

He has won numerous awards, including the Distinguished Engineering Alumni Award, the NASA Exceptional Achievement Medal, the IBM Golden Circle Award, the Department of Education Merit Fellowship, and several fellowships from the University of Colorado. Ashok holds a Ph.D. in Electrical Engineering from the University of Colorado at Boulder.


In this podcast, Bill Schmarzo talks about the ingredients of a successful data science practice, team, and executive. Bill shares his insights on what some industry leaders are doing and some challenges seen in successful deployments, as well as his take on the key ingredients of successful hires. This podcast is great for growth-minded executives willing to learn about creating a successful data science practice.

Timeline: 0:29 Bill's journey. 5:05 Bill's current role. 7:04 Data science adoption challenges for businesses. 9:33 The good side of data science adoption. 11:22 How data science is changing business. 14:34 Strategies behind distributed IT. 18:35 Analyzing the current amount of data. 21:50 Who should own the idea of data science? 24:34 The right background for a CDO. 25:52 Bias in IT. 29:35 Hacks to keep yourself bias-free. 31:58 Team vs. tool for putting together a good data-driven practice. 34:54 Value cycle in data science. 37:10 Maturity model. 39:17 Convincing culture-heavy businesses to adopt data. 42:47 Keeping oneself sane during the technological disruption. 46:20 Hiring the right talent. 51:46 Ingredients of a good data science hire. 56:00 Bill's success mantra. 59:07 Bill's favorite reads. 1:00:36 Closing remarks.

Bill's Recommended Read: Moneyball: The Art of Winning an Unfair Game by Michael Lewis http://amzn.to/2FqBFg8 Big Data MBA: Driving Business Strategies with Data Science by Bill Schmarzo http://amzn.to/2tlZAvP

Podcast Link: https://futureofdata.org/schmarzo-dellemc-on-ingredients-of-healthy-datascience-practice-futureofdata-podcast/

Bill's BIO: Bill Schmarzo is the CTO for the Big Data Practice, where he is responsible for working with organizations to help them identify where and how to start their big data journeys. He's written several white papers, is an avid blogger, and is a frequent speaker on the use of Big Data and data science to power the organization's key business initiatives. He is a University of San Francisco School of Management Fellow, where he teaches the "Big Data MBA" course.

Bill has over three decades of experience in data warehousing, BI, and analytics. Bill authored EMC's Vision Workshop methodology that links an organization's strategic business initiatives with their supporting data and analytic requirements and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehouse Institute's faculty as the head of the analytic applications curriculum.

Bill holds a master's degree in Business Administration from the University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science, and Business Administration from Coe College.


In this episode, Wayne Eckerson and Jeff Magnusson discuss a self-service model for data science work and the role of a data platform in that environment. Magnusson also talks about Flotilla, a new open source API that makes it easy for data scientists to execute tasks on the data platform.

Magnusson is the vice president of data platform at Stitch Fix. He leads a team responsible for building the data platform that supports the company's team of 80+ data scientists, as well as other business users. That platform is designed to facilitate self-service among data scientists and promote the velocity and innovation that differentiate Stitch Fix in the marketplace. Before Stitch Fix, Magnusson managed the data platform architecture team at Netflix, where he helped design and open-source many of the components of its Hadoop-based infrastructure and big data platform.

Summary

Data is an increasingly sought-after raw material for business in the modern economy. One of the factors driving this trend is the increase in applications for machine learning and AI, which require large quantities of information to work from. As the demand for data becomes more widespread, the market for providing it will begin to transform the ways that information is collected and shared among and between organizations. With his experience as a chair for the O’Reilly AI conference and an investor in data-driven businesses, Roger Chen is well versed in the challenges and solutions facing us. In this episode he shares his perspective on the ways that businesses can work together to create shared data resources that will allow them to reduce the redundancy of their foundational data and improve their overall effectiveness in collecting useful training sets for their particular products.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at dataengineeringpodcast.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your data pipelines or trying out the tools you hear about on the show. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. You can help support the show by checking out the Patreon page, which is linked from the site. To help other people find the show you can leave a review on iTunes or Google Play Music, and tell your friends and co-workers. A few announcements:

The O’Reilly AI Conference is also coming up. Happening April 29th to the 30th in New York, it will give you a solid understanding of the latest breakthroughs and best practices in AI for business. Go to dataengineeringpodcast.com/aicon-new-york to register and save 20%. If you work with data or want to learn more about how the projects you have heard about on the show get used in the real world, then join me at the Open Data Science Conference in Boston from May 1st through the 4th. It has become one of the largest events for data scientists, data engineers, and data-driven businesses to get together and learn how to be more effective. To save 60% off your tickets, go to dataengineeringpodcast.com/odsc-east-2018 and register.

Your host is Tobias Macey and today I’m interviewing Roger Chen about data liquidity and its impact on our future economies

Interview

Introduction How did you get involved in the area of data management? You wrote an essay discussing how the increasing usage of machine learning and artificial intelligence applications will result in a demand for data that necessitates what you refer to as ‘Data Liquidity’. Can you explain what you mean by that term? What are some examples of the types of data that you envision as being foundational to multiple organizations and problem domains? Can you provide some examples of the structures that could be created to facilitate data sharing across organizational boundaries? Many companies view their data as a strategic asset and are therefore loathe to provide access to other individuals or organizations. What encouragement can you provide that would convince them to externalize any of that information? What kinds of storage and transmission infrastructure and tooling are necessary to allow for wider distribution of, and collaboration on, data assets? What do you view as being the privacy implications from creating and sharing these larger pools of data inventory? What do you view as some of the technical challenges associated with identifying and separating shared data from those that are specific to the business model of the organization? With broader access to large data sets, how do you anticipate that impacting the types of businesses or products that are possible for smaller organizations?

Cont

podcast_episode
by Kyle Polich, Iker Huerga (Memorial Sloan Kettering Cancer Center), Alex Grigorenko (Memorial Sloan Kettering Cancer Center)

For a long time, physicians have recognized that the tools they have aren't powerful enough to treat complex diseases, like cancer. In addition to data science and models, clinicians also needed actual products — tools that physicians and researchers can draw upon to answer questions they regularly confront, such as "what clinical trials are available for this patient that I'm seeing right now?" In this episode, our host Kyle interviews guests Alex Grigorenko and Iker Huerga from Memorial Sloan Kettering Cancer Center to talk about how data and technology can be used to prevent, control and ultimately cure cancer.

Business Case Analysis with R: Simulation Tutorials to Support Complex Business Decisions

This tutorial teaches you how to use the statistical programming language R to develop a business case simulation and analysis. It presents a methodology for conducting business case analysis that minimizes decision delay by focusing stakeholders on what matters most and suggests pathways for minimizing the risk in strategic and capital allocation decisions. Business case analysis, often conducted in spreadsheets, exposes decision makers to additional risks that arise just from the use of the spreadsheet environment. R has become one of the most widely used tools for reproducible quantitative analysis, and analysts fluent in this language are in high demand. The R language, traditionally used for statistical analysis, provides a more explicit, flexible, and extensible environment than spreadsheets for conducting business case analysis. The main tutorial follows the case in which a chemical manufacturing company considers constructing a chemical reactor and production facility to bring a new compound to market. There are numerous uncertainties and risks involved, including the possibility that a competitor brings a similar product online. The company must determine the value of making the decision to move forward and where they might prioritize their attention to make a more informed and robust decision. While the example used is a chemical company, the analysis structure it presents can be applied to just about any business decision, from IT projects to new product development to commercial real estate. The supporting tutorials include the perspective of the founder of a professional service firm who wants to grow his business and a member of a strategic planning group in a biomedical device company who wants to know how much to budget in order to refine the quality of information about critical uncertainties that might affect the value of a chosen product development pathway. 
What You’ll Learn
- Set up a business case abstraction in an influence diagram to communicate the essence of the problem to other stakeholders
- Model the inherent uncertainties in the problem with Monte Carlo simulation using the R language
- Communicate the results graphically
- Draw appropriate insights from the results
- Develop creative decision strategies for thorough opportunity cost analysis
- Calculate the value of information on critical uncertainties between competing decision strategies to set the budget for deeper data analysis
- Construct appropriate information to satisfy the parameters for the Monte Carlo simulation when little or no empirical data are available

Who This Book Is For
Financial analysts, data practitioners, and risk/business professionals; also appropriate for graduate-level finance, business, or data science students
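To give a flavor of the Monte Carlo approach the book describes (the book itself works in R), here is a minimal, hypothetical Python sketch that simulates the NPV of a capital project under uncertain build cost and demand, with a chance that a competitor enters and compresses margins. Every parameter value below is invented for illustration and is not taken from the book.

```python
import random

def simulate_npv(n_trials=10_000, rate=0.10, years=5, seed=42):
    """Monte Carlo sketch of a project NPV under uncertainty.
    All parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        capex = rng.uniform(8e6, 12e6)       # uncertain construction cost
        competitor = rng.random() < 0.3      # assumed 30% chance a rival enters
        npv = -capex
        for t in range(1, years + 1):
            demand = rng.gauss(1e6, 2e5)     # uncertain units sold per year
            margin = 4.0 if not competitor else 2.5  # price pressure if rival
            npv += demand * margin / (1 + rate) ** t
        results.append(npv)
    results.sort()
    mean = sum(results) / n_trials
    p10 = results[int(0.10 * n_trials)]      # downside (10th percentile)
    p90 = results[int(0.90 * n_trials)]      # upside (90th percentile)
    return mean, p10, p90

mean, p10, p90 = simulate_npv()
print(f"mean NPV: {mean:,.0f}  P10: {p10:,.0f}  P90: {p90:,.0f}")
```

Reporting the P10/P90 spread rather than a single point estimate is what lets stakeholders see where the downside risk comes from — here, the competitor-entry branch — which is the kind of insight the book uses to prioritize further analysis.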

In this podcast, Rahul Kashyap (@RCKashyap) talks about the state of security at the crossroads of technology and business, and the mindset of a security-led technologist. He sheds some light on past, present, and future security risks, discusses some common leadership concerns, and explains how a technologist can navigate them. This podcast is a must-listen for technologists and aspiring technologists looking to grow their organizations.

Timeline: 0:29 Rahul's journey. 4:40 Rahul's current role. 7:58 How the types of cyberattacks have changed. 12:53 How has IT interaction evolved? 16:50 Problems in the security industry. 20:12 Market mindset vs. security mindset. 23:10 Ownership of data. 27:02 Cloud, SaaS, and security. 31:40 Priorities for securing an enterprise. 34:50 How secure is secure enough. 37:40 Providing a stable core to the business. 41:11 The state of data science vis-à-vis security. 44:05 Future of security, data science, and AI. 46:14 Distributed computing and security. 50:30 Tenets of Rahul's success. 53:15 Rahul's favorite read. 54:35 Closing remarks.

Rahul's Recommended Read: Mindset: The New Psychology of Success – Carol S. Dweck http://amzn.to/2GvEX2F

Podcast Link: https://futureofdata.org/rckashyap-cylance-on-state-of-security-technologist-mindset-futureofdata-podcast/

Rahul's BIO: Rahul Kashyap is the Global Chief Technology Officer at Cylance, where he is responsible for strategy, products, and architecture.

Rahul has been instrumental in building several key security technologies, including Network Intrusion Prevention Systems (NIPS), Host Intrusion Prevention Systems (HIPS), Web Application Firewalls (WAF), whitelisting, endpoint/server host monitoring (EDR), and micro-virtualization. He has been awarded several patents for his innovations. Rahul is an accomplished pen-tester and has in-depth knowledge of OS, networking, and security products.

Rahul has written several security research papers, blogs, and articles that are widely quoted and referenced by media around the world. He has built, led, and scaled award-winning teams that innovate and solve complex security challenges in both large and start-up companies.

He is frequently featured in podcasts, webinars, and media briefings. Rahul has spoken at several top security conferences, including BlackHat, BlueHat, Hack-In-The-Box, RSA, DerbyCon, BSides, ISSA International, OWASP, InfoSec UK, and others. He was named to Silicon Valley Business Journal's '40 under 40' list.

Rahul mentors entrepreneurs who work with select VC firms and is on the advisory board of tech start-ups.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Wanna Join? If you or anyone you know wants to join in, register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

SQL Server 2017 Machine Learning Services with R

Learn how to leverage SQL Server 2017 Machine Learning Services and the R programming language to create robust, efficient data analysis and machine learning solutions. This book provides actionable insights and practical examples to help you implement and manage database-oriented analytics and predictive modeling.

What this Book will help me do
- Understand and use SQL Server 2017 Machine Learning Services integrated with R.
- Gain experience in installing, configuring, and maintaining R services in SQL Server.
- Create and operationalize predictive models using RevoScaleR and other R packages.
- Improve database solutions by incorporating advanced analytics techniques.
- Monitor and manage R-based services effectively for reliable production solutions.

Author(s)
Tomaž Kaštrun and Julie Koesmarno bring a wealth of expertise as practitioners and educators in data science and SQL Server technologies. They share their experience innovatively, making intricate subjects approachable. Their unified teaching method ensures readers can directly benefit from practical examples and real-world applications.

Who is it for?
This book is tailored for database administrators, data analysts, and data scientists eager to integrate R with SQL Server. It caters to professionals with varying levels of R experience who are looking to enhance their proficiency in database-oriented analytics. Readers will benefit most if they are motivated to design effective, data-driven solutions in SQL Server environments.

Summary

One of the sources of data that often gets overlooked is the systems that we use to run our businesses. This data is not used to directly provide value to customers or understand the functioning of the business, but it is still a critical component of a successful system. Sam Stokes is an engineer at Honeycomb where he helps to build a platform that is able to capture all of the events and context that occur in our production environments and use them to answer all of your questions about what is happening in your system right now. In this episode he discusses the challenges inherent in capturing and analyzing event data, the tools that his team is using to make it possible, and how this type of knowledge can be used to improve your critical infrastructure.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at dataengineeringpodcast.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your data pipelines or trying out the tools you hear about on the show. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. You can help support the show by checking out the Patreon page, which is linked from the site. To help other people find the show you can leave a review on iTunes or Google Play Music, and tell your friends and co-workers. A few announcements:

There is still time to register for the O’Reilly Strata Conference in San Jose, CA, March 5th-8th. Use the link dataengineeringpodcast.com/strata-san-jose to register and save 20%.
The O’Reilly AI Conference is also coming up. Happening April 29th to the 30th in New York, it will give you a solid understanding of the latest breakthroughs and best practices in AI for business. Go to dataengineeringpodcast.com/aicon-new-york to register and save 20%.
If you work with data or want to learn more about how the projects you have heard about on the show get used in the real world, then join me at the Open Data Science Conference in Boston from May 1st through the 4th. It has become one of the largest events for data scientists, data engineers, and data-driven businesses to get together and learn how to be more effective. To save 60% off your tickets go to dataengineeringpodcast.com/odsc-east-2018 and register.

Your host is Tobias Macey and today I’m interviewing Sam Stokes about his work at Honeycomb, a modern platform for observability of software systems

Interview

Introduction
How did you get involved in the area of data management?
What is Honeycomb and how did you get started at the company?
Can you start by giving an overview of your data infrastructure and the path that an event takes from ingest to graph?
What are the characteristics of the event data that you are dealing with and what challenges does it pose in terms of processing it at scale?
In addition to the complexities of ingesting and storing data with a high degree of cardinality, being able to quickly analyze it for customer reporting poses a number of difficulties. Can you explain how you have built your systems to facilitate highly interactive usage patterns?
A high degree of visibility into a running system is desirable for developers and systems administrators, but they are not always willing or able to invest the effort to fully instrument the code or servers that they want to track. What have you found to be the most difficult aspects of data collection, and do you have any tooling to simplify the implementation for users?
How does Honeycomb compare to other systems that are available off the shelf or as a service, and when is it not the right tool?
What have been some of the most challenging aspects of building, scaling, and marketing Honeycomb?

Contact Info

@samstokes on Twitter
Blog
samstokes on GitHub

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Practical Data Science: A Guide to Building the Technology Stack for Turning Data Lakes into Business Assets

Learn how to build a data science technology stack and perform good data science with repeatable methods. You will learn how to turn data lakes into business assets. The data science technology stack demonstrated in Practical Data Science is built from components in general use in the industry. Data scientist Andreas Vermeulen demonstrates in detail how to build and provision a technology stack to yield repeatable results. He shows you how to apply practical methods to extract actionable business knowledge from data lakes consisting of data from a polyglot of data types and dimensions.

What You'll Learn
- Become fluent in the essential concepts and terminology of data science and data engineering
- Build and use a technology stack that meets industry criteria
- Master the methods for retrieving actionable business knowledge
- Coordinate the handling of polyglot data types in a data lake for repeatable results

Who This Book Is For
Data scientists and data engineers who are required to convert data from a data lake into actionable knowledge for their business, and students who aspire to be data scientists and data engineers

Summary

The responsibilities of a data scientist and a data engineer often overlap and occasionally come to cross purposes. Despite these challenges it is possible for the two roles to work together effectively and produce valuable business outcomes. In this episode Will McGinnis discusses the opinions that he has gained from experience on how data teams can play to their strengths to the benefit of all.

Preamble



Your host is Tobias Macey and today I’m interviewing Will McGinnis about the relationship and boundaries between data engineers and data scientists

Interview

Introduction
How did you get involved in the area of data management?
The terms “Data Scientist” and “Data Engineer” are fluid and seem to have a different meaning for everyone who uses them. Can you share how you define those terms?
What parallels do you see between the relationships of data engineers and data scientists and those of developers and systems administrators?
Is there a particular size of organization or problem that serves as a tipping point for when you start to separate the two roles into the responsibilities of more than one person or team?
What are the benefits of splitting the responsibilities of data engineering and data science?
What are the disadvantages?
What are some strategies to ensure successful interaction between data engineers and data scientists?
How do you view these roles evolving as they become more prevalent across companies and industries?

Contact Info

Website
wdm0006 on GitHub
@willmcginniser on Twitter
LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Blog Post: Tendencies of Data Engineers and Data Scientists
Predikto
Categorical Encoders
DevOps
SciKit-Learn

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA Support Data Engineering Podcast

R Projects For Dummies

Make the most of R’s extensive toolset R Projects For Dummies offers a unique learn-by-doing approach. You will increase the depth and breadth of your R skillset by completing a wide variety of projects. By using R’s graphics, interactive, and machine learning tools, you’ll learn to apply R’s extensive capabilities in an array of scenarios. The depth of the project experience is unmatched by any other content online or in print. And you just might increase your statistics knowledge along the way, too! R is a free tool, and it’s the basis of a huge amount of work in data science. It's taking the place of costly statistical software that sometimes takes a long time to learn. One reason is that you can use just a few R commands to create sophisticated analyses. Another is that easy-to-learn R graphics enable you to make the results of those analyses available to a wide audience. This book will help you sharpen your skills by applying them in the context of projects with R, including dashboards, image processing, data reduction, mapping, and more. Appropriate for R users at all levels Helps R programmers plan and complete their own projects Focuses on R functions and packages Shows how to carry out complex analyses by just entering a few commands If you’re brand new to R or just want to brush up on your skills, R Projects For Dummies will help you complete your projects with ease.

An Introduction to Discrete-Valued Time Series

A much-needed introduction to the field of discrete-valued time series, with a focus on count-data time series Time series analysis is an essential tool in a wide array of fields, including business, economics, computer science, epidemiology, finance, manufacturing and meteorology, to name just a few. Despite growing interest in discrete-valued time series—especially those arising from counting specific objects or events at specified times—most books on time series give short shrift to that increasingly important subject area. This book seeks to rectify that state of affairs by providing a much-needed introduction to discrete-valued time series, with particular focus on count-data time series. The main focus of this book is on modeling. Throughout, numerous examples are provided illustrating models currently used in discrete-valued time series applications. Statistical process control, including various control charts (such as cumulative sum control charts), and performance evaluation are treated at length. Classic approaches like ARMA models and the Box-Jenkins program are also featured, with the basics of these approaches summarized in an Appendix. In addition, data examples, with all relevant R code, are available on a companion website.
- Provides a balanced presentation of theory and practice, exploring both categorical and integer-valued series
- Covers common models for time series of counts as well as for categorical time series, and works out their most important stochastic properties
- Addresses statistical approaches for analyzing discrete-valued time series and illustrates their implementation with numerous data examples
- Covers classical approaches such as ARMA models, the Box-Jenkins program, and generating functions
- Includes dataset examples with all necessary R code provided on a companion website

An Introduction to Discrete-Valued Time Series is a valuable working resource for researchers and practitioners in a broad range of fields, including statistics, data science, machine learning, and engineering. It will also be of interest to postgraduate students in statistics, mathematics and economics.
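As a concrete taste of the count-data models this field studies, the sketch below simulates an INAR(1) process — a standard workhorse for time series of counts — in Python rather than the book's R, with illustrative parameters chosen here rather than taken from the text. In an INAR(1) model, each count at time t-1 independently "survives" to time t with probability alpha (binomial thinning), and a Poisson number of new events arrives each period.

```python
import math
import random

def simulate_inar1(n=500, alpha=0.5, lam=2.0, seed=7):
    """Sketch of an INAR(1) count time series:
    X_t = alpha o X_{t-1} + eps_t,
    where 'o' denotes binomial thinning (each of the X_{t-1} counts
    survives independently with probability alpha) and eps_t ~ Poisson(lam).
    Parameter values are illustrative assumptions."""
    rng = random.Random(seed)

    def poisson(mu):
        # Knuth's method; fine for small mu
        threshold, k, p = math.exp(-mu), 0, 1.0
        while True:
            p *= rng.random()
            if p <= threshold:
                return k
            k += 1

    x = poisson(lam / (1 - alpha))  # start near the stationary mean
    series = []
    for _ in range(n):
        survivors = sum(1 for _ in range(x) if rng.random() < alpha)
        x = survivors + poisson(lam)
        series.append(x)
    return series

s = simulate_inar1()
# Theoretical stationary mean is lam / (1 - alpha) = 4 for these parameters.
print("sample mean:", sum(s) / len(s))
```

Because thinning and Poisson innovations both produce non-negative integers, the simulated path stays integer-valued throughout — exactly the property that ordinary ARMA models lack and that motivates the dedicated treatment this book provides.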