talk-data.com

Topic

Big Data

data_processing analytics large_datasets

1217 tagged

Activity Trend: 28 peak/qtr, 2020-Q1 to 2026-Q1

Activities

1217 activities · Newest first

Apache Hadoop 3 Quick Start Guide

Dive into the world of distributed data processing with the 'Apache Hadoop 3 Quick Start Guide.' This comprehensive resource equips you with the knowledge needed to handle large datasets effectively using Apache Hadoop. Learn how to set up and configure Hadoop, work with its core components, and explore its powerful ecosystem tools.

What this Book will help me do

Understand the fundamental concepts of Apache Hadoop, including HDFS, MapReduce, and YARN, and use them to store and process large datasets.

Set up and configure Hadoop 3 in both developer and production environments to suit various deployment needs.

Gain hands-on experience with Hadoop ecosystem tools like Hive, Kafka, and Spark to enhance your big data processing capabilities.

Learn to manage, monitor, and troubleshoot Hadoop clusters efficiently to ensure smooth operations.

Analyze real-time streaming data with tools like Apache Storm and perform advanced data analytics using Apache Spark.

Author(s)

The author of this guide, Vijay Karambelkar, brings years of experience working with big data technologies and Apache Hadoop in real-world applications. With a passion for teaching and simplifying complex topics, Vijay has compiled his expertise to help learners confidently approach Hadoop 3. His detailed, example-driven approach makes this book a practical resource for aspiring data professionals.

Who is it for?

This book is ideal for software developers, data engineers, and IT professionals who aspire to dive into the field of big data. If you're new to Apache Hadoop or looking to upgrade your skills to include version 3, this guide is for you. A basic understanding of Java programming is recommended to make the most of the topics covered. Embark on this journey to enhance your career in data-intensive industries.

Mastering Apache Cassandra 3.x - Third Edition

This expert guide, "Mastering Apache Cassandra 3.x," is designed for individuals looking to achieve scalable and fault-tolerant database deployment using Apache Cassandra. From mastering the foundational components of Cassandra architecture to advanced topics like clustering and analytics integration with Apache Spark, this book equips readers with practical, actionable skills.

What this Book will help me do

Understand and deploy Apache Cassandra clusters for fault-tolerant and scalable databases.

Use advanced features of CQL3 to streamline database queries and operations.

Optimize and configure Cassandra nodes to improve performance for demanding applications.

Monitor and manage Cassandra clusters effectively using best practices.

Combine Cassandra with Apache Spark to build robust data analytics pipelines.

Author(s)

Ploetz and Malepati are experienced technologists and software professionals with extensive expertise in distributed database systems and big data algorithms. They've combined their industry knowledge and teaching backgrounds to create accessible and practical guides for learners worldwide. Their collaborative work is focused on demystifying complex systems for maximum learning impact.

Who is it for?

This book is ideal for database administrators, software developers, and big data specialists seeking to expand their skill set into scalable data storage using Cassandra. Readers should have a basic understanding of database concepts and some programming experience. If you're looking to design robust databases optimized for modern big data use-cases, this book will serve as a valuable resource.

In this podcast, Erika from Proteus International talks about the state of leadership today. She sheds light on data-driven leaders and on leaders who are guiding their organizations into the future. She offers key insights, tactical steps, and stories that help explain the makeup of today's leaders who will lead tomorrow's organizations.

Timeline: 0:28 Erika's journey. 7:15 Advice to emerging leaders. 12:13 Adapting to change. 15:15 Tackling unconscious bias. 17:05 Evolution of leadership. 22:26 Taking the road not taken. 27:30 Expectations of leaders towards their business. 32:24 Investing in people or technology? 37:46 Getting the right feedback. 43:40 Example of unbiased leadership. 51:20 Women in a leadership role. 57:48 Erika's secret for success.

Erika's Book: Be Bad First: Get Good at Things Fast to Stay Ready for the Future by Erika Andersen https://amzn.to/2L2DI01

Erika's Recommended Read: Good to Great: Why Some Companies Make the Leap and Others Don't by Jim Collins https://amzn.to/2JhswYG Dan Pink Books https://amzn.to/2Nc0X5s

Podcast Link: https://futureofdata.org/preparing-the-leaders-for-datadriven-future-erikaandersen/

Erika's BIO: Erika Andersen is the founding partner of Proteus, a coaching, consulting, and training firm that focuses on leader readiness. Over the past 30 years, Erika has developed a reputation for creating approaches to learning and business-building tailored to her clients’ challenges, goals, and culture. She and her colleagues at Proteus focus uniquely on helping leaders at all levels get ready and stay ready to meet whatever the future might bring.

Much of her recent work has focused on organizational visioning and strategy, executive coaching, management, and leadership development. In these capacities, she serves as consultant and advisor to the CEOs and/or top executives of several corporations, including NBCUniversal, Facebook, Hyatt Hotels Corporation, GE, Hulu, and Madison Square Garden.

She also shares her insights about managing people and creating successful businesses by speaking to corporations, non-profit groups, and national associations. Her books and learning guides have been translated into Spanish, Turkish, German, French, Russian, and Chinese. She has contributed to and been quoted in various national publications, including the Harvard Business Review, Wall Street Journal, Fortune, and The New York Times. Erika is also one of the most popular leadership bloggers at Forbes.com.

She is the author of Be Bad First: Get Good at Things FAST to Stay Ready for the Future (Bibliomotion, 2016), Leading So People Will Follow (Jossey-Bass, 2012), Being Strategic: Plan for Success; Outthink Your Competitors; Stay Ahead of Change (St. Martin’s Press, May 2009), and Growing Great Employees: Turning Ordinary People into Extraordinary Performers (Portfolio, 2006). She is also the author and host of the Proteus Leader Show, a regular podcast that offers quick, practical support for leaders and managers.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey in creating the data-driven future.

Wanna Join? If you or anyone you know wants to join in, register your interest @ http://analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Futurist #Podcast #BigData #Strategy

In this podcast, John Busby (@johnmbusby), Chief Analytics Officer @CenterfieldUSA, talks about his journey leading the data analytics practice of a digital marketing agency. He sheds light on some methodologies for building a sound data science practice, discusses the future of digital marketing, and shares some big opportunities ripe for disruption in the digital space.

Timeline: 0:28 John's journey. 4:26 Introduction to Centerfield. 6:00 John's role. 6:50 Designing a common platform for customers. 9:15 Analytics in Amazon. 11:02 Data science and marketing. 18:02 Importance of understanding the product for marketing. 21:44 AI in the marketing business. 25:26 Making sense of customer behavior. 27:50 End-to-end consumer behavior. 31:05 Editing and calibrating KPIs. 32:53 Creating an insight-driven organization. 35:35 Recipe for a successful chief analytics officer. 37:46 On data bias. 39:12 Hiring the right people. 41:33 Big opportunities in digital marketing. 44:15 Future of digital marketing. 45:27 John's recipe for success. 48:52 John's favorite reads. 50:35 Key takeaways.

John's Recommended Read: Secrets of Professional Tournament Poker (D&B Poker) by Jonathan Little amzn.to/2MNKjN3

Podcast Link: https://futureofdata.org/data-today-shaping-digital-marketing-of-tomorrow-johnmbusby-centerfieldusa/

John's BIO: John Busby serves as Centerfield’s Chief Analytics Officer. A seasoned digital marketing executive, John leads the company’s data science, analytics and insights teams. Before joining Centerfield, John was Head of Analytics for Amazon’s grocery delivery service and responsible for business intelligence, data science and automated reporting. Prior to Amazon, John was Senior Vice President of Analytics and Marketing at Marchex. John began his career in product management for InfoSpace, Go2net and IQ Chart. He holds a Bachelor of Science from Northwestern University. Outside of work, John coaches youth hockey, and enjoys sports, poker and hanging out with his wife and two children.

With all the hype and attention around big data and huge data platforms, there can sometimes be some data envy. There are still organizations and companies that don’t have big data: are they not poised for analytics too? Can they not get insights as well? The BI Pharaoh gives tips on how to work with your little data just like the big boys.

Originally published at https://www.eckerson.com/articles/little-data-needs-love-too

Summary

With the growth of the Hadoop ecosystem came a proliferation of implementations for the Hive table format. Unfortunately, with no formal specification, each project works slightly differently, which increases the difficulty of integration across systems. The Hive format is also built on the assumptions of a local filesystem, which results in painful edge cases when leveraging cloud object storage for a data lake. In this episode Ryan Blue explains how his work on the Iceberg table format specification and reference implementation has allowed Netflix to improve the performance and simplify the operation of their S3 data lake. This is a highly detailed and technical exploration of how a well-engineered metadata layer can improve the speed, accuracy, and utility of large scale, multi-tenant, cloud-native data platforms.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.

Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.

Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat.

Your host is Tobias Macey and today I’m interviewing Ryan Blue about Iceberg, a Netflix project to implement a high performance table format for batch workloads.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by explaining what Iceberg is and the motivation for creating it?

Was the project built with open-source in mind or was it necessary to refactor it from an internal project for public use?

How has the use of Iceberg simplified your work at Netflix?

How is the reference implementation architected and how has it evolved since you first began work on it?

What is involved in deploying it to a user’s environment?

For someone who is interested in using Iceberg within their own environments, what is involved in integrating it with their existing query engine?

Is there a migration path for pre-existing tables into the Iceberg format?

How is schema evolution managed at the file level?

How do you handle files on disk that don’t contain all of the fields specified in a table definition?

One of the complicated problems in data modeling is managing table partitions. How does Iceberg help in that regard?

What are the unique challenges posed by using S3 as the basis for a data lake?

What are the benefits that outweigh the difficulties?

What have been some of the most challenging or contentious details of the specification to define?

What are some things that you have explicitly left out of the specification?

What are your long-term goals for the Iceberg specification?

Do you anticipate the reference implementation continuing to be used and maintained?

Contact Info

rdblue on GitHub

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Iceberg Reference Implementation, Iceberg Table Specification, Netflix, Hadoop, Cloudera, Avro, Parquet, Spark, S3, HDFS, Hive, ORC, S3mper, Git, Metacat, Presto, Pig, DDL (Data Definition Language), Cost-Based Optimization

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

In this podcast, Maksim, CDO @ City of San Diego, discusses the nuances of running big data for big cities. He shares his perspective on effectively building a central data office in a complex and extremely collaborative environment like a big city, his thoughts on how to prioritize which projects to pursue, and how leadership and execution can blend to solve civic issues in big and small cities. A great practitioner podcast for folks seeking to build a robust data science practice across a large and collaborative ecosystem.

Timeline: 0:28 Maksim's journey. 6:45 Maksim's current role. 11:46 Collaboration process in creating a data inventory. 14:52 Working with the bureaucracy. 18:35 Dealing with unforeseen circumstances at work. 20:22 Prioritization at work. 22:58 Qualities of a good data leader. 26:15 Collaboration with other cities. 27:40 Cool data projects in other cities. 30:55 Shortcomings of other city representatives. 36:54 Use cases in AI. 39:00 What would Maksim change about himself? 40:50 Future cities and data. 43:55 Opportunities for private investors in the public sector. 45:53 Maksim's success mantra. 50:19 Closing remark.

Maksim's Book Recommendation: The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win by Gene Kim, Kevin Behr, George Spafford amzn.to/2MAu5Xv

Podcast Link: https://futureofdata.org/understanding-bigdata-for-bigcities-with-maksim-mrmaksimize-cityofsandiego-futureofdata-podcast/

Maksim's BIO: Maksim Pecherskiy: As the CDO for the City of San Diego, working in the Performance & Analytics Department, Maksim strives to bring the necessary components together to allow the City's residents to benefit from a more efficient, agile government that is as innovative as the community around it. He has been solving complex problems with technology for nearly a decade. He spent 2014 working as a Code For America fellow in Puerto Rico, focusing on economic development. His team delivered a product called PrimerPeso that provides business owners and residents a tool to search, and apply for, government programs for which they may be eligible.

Before moving to California, Maksim was a Solutions Architect at Promet Source in Chicago, where he built large web applications and designed complex integrations. He shaped workflow, configuration management, and continuous integration processes while leading and training international development teams. Before his work at Promet, he was a software engineer at AllPlayers, where he was instrumental in the design and architecture of its APIs and the development and documentation of supporting client libraries in various languages.

Maksim graduated from DePaul University with a bachelor of science degree in information systems and from Linköping University, Sweden, with a bachelor of science degree in international business. He is also certified as a Lean Six Sigma Green Belt.

In this podcast, Jim Sterne shares how marketing has evolved through disruptive times. He shares some of the best practices in the marketing and digital analytics space. He sheds light on some opportunities in the marketing and analytics space and on how machine learning is changing the face of digital and marketing. This is a great podcast for anyone looking to understand how AI is impacting marketing and where the big opportunities in marketing and digital lie.

Timeline: 0:30 Jim's journey. 5:25 The evolution of marketing. 8:45 Breaking down the digital. 11:40 Marketing and analytics. 13:27 Misuse of analytics in marketing. 17:35 Resolving bad data and bias. 22:20 Good digital analyst vs. bad digital analyst. 28:06 Defining a well-oiled marketing machine. 30:33 Marketing industry's adoption of technology. 34:19 Technology adoption strategy. 38:23 Impact of machine learning and digital marketing. 42:19 Decision making, accountability, and AI. 47:08 Advice for start-ups. 48:52 Disruption opportunities in digital marketing. 55:57 Ethics and marketing. 58:52 What's next in digital marketing. 1:02:27 Jim's success mantra. 1:05:36 Jim's reading list. 1:07:30 Key takeaways.

Jim's Books: amzn.to/2KB1QCR

Jim's Current Read List: Shift: 19 Practical, Business-Driven Ideas for an Executive in Charge of Marketing but Not Trained for the Task by Sean Doyle amzn.to/2KG4K9d Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking by Foster Provost and Tom Fawcett amzn.to/2AWR3Dz

Podcast Link: https://futureofdata.org/future-of-data-in-marketing-digital-jimsterne/

Jim's BIO: Jim Sterne focused his thirty-five years in sales and marketing to create and strengthen customer relationships through digital communications. He sold business computers to companies that had never owned one in the 1980s, consulted and keynoted online marketing in the 1990s, and founded a conference and a professional association around digital analytics in the 2000s, following his humorous Devil's Data Dictionary. Sterne has just published his twelfth book Artificial Intelligence for Marketing: Practical Applications. Sterne produced the eMetrics Summit from 2002 - 2017 and now produces the Marketing Evolution Experience. He was co-founder and served for 17 years as the Board Chair of the Digital Analytics Association.

Jim was named one of the 50 most influential people in digital marketing by a top marketing magazine in the United Kingdom and identified as one of the top 25 Hot Speakers by the National Speakers Association.

Summary

As your data needs scale across an organization, the need for a carefully considered approach to collection, storage, organization, and access becomes increasingly critical. In this episode Todd Walter shares his considerable experience in data curation to clarify the many aspects that are necessary for a successful platform for your business. Using the metaphor of a museum curator carefully managing the precious resources on display and in the vaults, he discusses the various layers of an enterprise data strategy. This includes modeling the lifecycle of your information as a pipeline from the raw, messy, loosely structured records in your data lake, through a series of transformations, and ultimately to your data warehouse. He also explains which layers are useful for the different members of the business, and which pitfalls to look out for along the path to a mature and flexible data platform.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.

You work hard to make sure that your data is reliable and accurate, but can you say the same about the deployment of your machine learning models? The Skafos platform from Metis Machine was built to give your data scientists the end-to-end support that they need throughout the machine learning lifecycle. Skafos maximizes interoperability with your existing tools and platforms, and offers real-time insights and the ability to be up and running with cloud-based production scale infrastructure instantaneously. Request a demo at dataengineeringpodcast.com/metis-machine to learn more about how Metis Machine is operationalizing data science.

Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.

Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat.

Your host is Tobias Macey and today I’m interviewing Todd Walter about data curation and how to architect your data systems to support high quality, maintainable intelligence.

Interview

Introduction

How did you get involved in the area of data management?

How do you define data curation?

What are some of the high level concerns that are encapsulated in that effort?

How does the size and maturity of a company affect the ways that they architect and interact with their data systems?

Can you walk through the stages of an ideal lifecycle for data within the context of an organization’s uses for it?

What are some of the common mistakes that are made when designing a data architecture and how do they lead to failure?

What has changed in terms of complexity and scope for data architecture and curation since you first started working in this space?

As “big data” became more widely discussed, the common mantra was to store everything because you never know when you’ll need the data that might get thrown away. As the industry reaches a greater degree of maturity and more regulations are implemented, there has been a shift to being more considerate as to what information gets stored and for how long. What are your views on that evolution and what is your litmus test for determining which data to keep?

In terms of infrastructure, what are the components of a modern data architecture and how has that changed over the years?

What is your opinion on the relative merits of a data warehouse vs a data lake and are they mutually exclusive?

Once an architecture has been established, how do you allow for continued evolution to prevent stagnation and eventual failure? ETL has long been the default approac

In this podcast, Dennis Mortensen (@DennisMortensen @XdotAI) sat with Vishal Kumar from @AnalyticsWeek to discuss his entrepreneurial journey of building successful analytics startups. He describes how he came to found the AI startup x.ai and how he is using AI to tackle a major productivity killer. He shares the challenges and opportunities of being an early entrant into the AI startup space, along with his thoughts on Google Wave and Google Duplex and what to expect from these technologies in the future.

Timelines: 0:28 Dennis's journey. 4:46 Dennis's "why." 9:50 Dennis's success mantra. 14:45 Making of X.ai. 19:03 Educating the market. 22:34 Surprises on the way. 30:05 Killing the inbox. 35:50 Why the calendar? 39:07 About Google Duplex. 50:05 Future of work. 55:00 Recommended books.

Dennis's Recommended Read: The Narrow Road: A Brief Guide to the Getting of Money by Felix Dennis amzn.to/2vaJ1S4 Undisputed Truth by Mike Tyson, Larry Sloman amzn.to/2ACOypK Shoe Dog: A Memoir by the Creator of Nike by Phil Knight amzn.to/2MaFMAu

Podcast Link: https://futureofdata.org/road-to-building-a-successful-ai-startup-dennismortensen-xdotai-futureofdata-podcast/

Dennis's BIO: Dennis Mortensen is the CEO and co-founder of x.ai.

Dennis is an expert in leveraging data to solve enterprise use cases and a serial entrepreneur who’s successfully exited several companies on that theme.

His long-term vision of killing the inbox led to the formation of x.ai and the creation of Amy + Andrew, artificially intelligent assistants who schedule meetings. He frequently speaks to anyone who’ll listen, from the crowds of Web Summit to his building’s doorman, about an optimistic future for AI, productivity, and the future of work.

Dennis was also an accredited Associate Analytics Instructor at the University of British Columbia and the author of Data-Driven Insights, on collecting and analyzing digital data.

In this podcast, Seth Stephens-Davidowitz (@SethD_S), author of New York Times Bestseller Everybody Lies, discussed what our social data knows about us. He shares some critical insights into the human psyche on how humans behave differently to machines than fellow humans. This sheds some interesting light on today's data-driven disruptive times when curiosity to stay relevant and data-driven is at an all-time high. This is a great podcast to understand the capability that data has and how it could benefit humanity, businesses, and clients if used properly—a great session for anyone willing to understand the depth of data and how to use it effectively.

Timelines: 0:29 Seth's journey. 4:23 Story behind "Everybody Lies". 7:27 Finding the right searches to analyze. 8:42 Surprising findings on analyzing the internet searches of people. 10:50 Confusion and human search data. 12:55 Google search recommendation's effect on human search data. 15:47 To google or not to google. 17:48 Are surveys reliable? 19:29 Safeguarding against fake data. 22:30 Compromised privacy may be a good thing. 24:30 Seth's favorite tool or language. 25:40 Challenges in working with data. 26:22 Finding a hypothesis in human search data. 28:02 Political predictions through data. 32:10 On Cambridge Analytica. 35:05 The ethics of data. 39:05 On AI. 41:24 Defining a data scientist. 43:11 Key points of "Everybody Lies". 44:17 Secret behind Seth's success. 45:50 Journey from basketball to data scientist. 48:03 Seth's favorite reads. 48:34 Key takeaways.

Seth's Book: Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz amzn.to/2OA0YBs

Seth's Recommended Read: Enlightenment Now: The Case for Reason, Science, Humanism, and Progress by Steven Pinker amzn.to/2Kl2nsr

Podcast Link: https://futureofdata.org/understand-social-data-to-know-human-psyche-seths_d-author-nytbestseller-everybody-lies-futureofdata-podcast/

Seth's BIO: Seth Stephens-Davidowitz has used data from the internet -- particularly Google searches -- to get new insights into the human psyche.

Seth has used Google searches to measure racism, self-induced abortion, depression, child abuse, hateful mobs, the science of humor, sexual preference, anxiety, son preference, and sexual insecurity, among many other topics.

His 2017 book, Everybody Lies, published by HarperCollins, was a New York Times bestseller, a PBS NewsHour Book of the Year, and an Economist Book of the Year.

Seth worked for one-and-a-half years as a data scientist at Google and is currently a contributing op-ed writer for the New York Times. He is a former visiting lecturer at the Wharton School at the University of Pennsylvania. He received his BA in philosophy, Phi Beta Kappa, from Stanford, and his Ph.D. in economics from Harvard.

In high school, Seth wrote obituaries for the local newspaper, the Bergen Record, and was a juggler in theatrical shows. He now lives in Brooklyn and is a passionate fan of the Mets, Knicks, Jets, Stanford football, and Leonard Cohen.

In this podcast, Jason Carmel (@defenestrate99), Chief Data Officer @ POSSIBLE, talks about his journey leading the data analytics practice of a digital marketing agency. He sheds light on some methodologies for building a sound data science practice and on using data science chops to do some good while creating traditional value. He shares his perspective on keeping the team high on creativity so that it keeps producing innovative solutions. This is a great podcast for anyone looking to understand the digital marketing landscape and how to create a sound data science practice.

Timelines: 0:29 Jason's journey. 6:40 Advantage of having a legal background for a data scientist. 9:15 Understanding emotions based on data. 13:54 The empathy model. 14:53 From idea to inception to execution. 23:40 The role of digital agencies. 30:20 Measuring the right amount of data. 32:40 Management in a creative agency. 34:40 Leadership qualities that promote creativity. 38:14 Leader's playbook in a digital agency. 40:50 Qualities of a great data science team in the digital agency. 44:30 Leadership's role in data creativity. 47:00 Opportunities as a data scientist in the digital agency. 49:18 Future of data in digital media. 51:38 Jason's success mantra. 53:30 Jason's favorite reads. 57:11 Key takeaways.

Jason's Recommended Read: Trendology: Building an Advantage through Data-Driven Real-Time Marketing by Chris Kerns amzn.to/2zMhYkV Venomous: How Earth's Deadliest Creatures Mastered Biochemistry by Christie Wilcox amzn.to/2LhqI76

Podcast Link: https://futureofdata.org/jason-carmel-defenestrate99-possible-leading-analytics-data-digital-marketing/

Jason's BIO: Jason Carmel is Chief Data Officer at Possible. With nearly 20 years of digital data and marketing experience, Jason has worked with clients such as Coca Cola, Ford, and Microsoft to evolve digital experiences based on real-time feedback and behavioral data. Jason manages a global team of 100 digital analysts across POSSIBLE, a digital advertising agency that uses traditional and unconventional data sets and models to help brands connect more effectively with their customers.

Of particular interest is Jason’s work using data and machine learning to define and understand the emotional components of human conversation. Jason spearheaded the creation of POSSIBLE’s Empathy Model, which translates the raw, unstructured content of social media into a quantitative understanding of what customers are actually feeling about a given topic, event, or brand.

About #Podcast:

FutureOfData podcast is a conversation starter that brings leaders, influencers, and lead practitioners on the show to discuss their journey in creating the data-driven future.

Wanna Join? If you or anyone you know wants to join in, register your interest by mailing us @ [email protected]

Want to sponsor? Email us @ [email protected]

Keywords: FutureOfData, DataAnalytics, Leadership, Futurist, Podcast, BigData, Strategy

Malware Data Science

Security has become a "big data" problem. The growth rate of malware has accelerated to tens of millions of new files per year while our networks generate an ever-larger flood of security-relevant data each day. In order to defend against these advanced attacks, you'll need to know how to think like a data scientist. In Malware Data Science, security data scientist Joshua Saxe introduces machine learning, statistics, social network analysis, and data visualization, and shows you how to apply these methods to malware detection and analysis. You'll learn how to:
• Analyze malware using static analysis
• Observe malware behavior using dynamic analysis
• Identify adversary groups through shared code analysis
• Catch 0-day vulnerabilities by building your own machine learning detector
• Measure malware detector accuracy
• Identify malware campaigns, trends, and relationships through data visualization
Whether you're a malware analyst looking to add skills to your existing arsenal, or a data scientist interested in attack detection and threat intelligence, Malware Data Science will help you stay ahead of the curve.

Healthcare Informatics

This book provides an understanding of the different types of healthcare service providers, corresponding information technologies, analytic methods, and data issues that play a vital role in transforming the healthcare industry. A follow-up to Healthcare Informatics: Improving Efficiency and Productivity, this latest book includes new content that examines the evolution of Big Data and how it is revolutionizing the healthcare industry. Presenting strategies for achieving national goals for the meaningful use of health information technology, the book describes how to enhance process efficiency by linking technologies, data, and analytics with strategic initiatives.

This podcast discusses Tim O'Reilly's futuristic perspective on data, analytics, AI, jobs, and organization. He sheds light on some things businesses could do to stay relevant and future-proof. He discusses his book and shares some of the key insights relevant to anyone thinking of staying relevant in a world led by technology and impacting the future. A must-watch for anyone working!

Timeline: 00:28 Tim's journey. 06:03 Tim's current occupation. 10:50 Interesting work for interesting people. 15:08 Thinking behind the title "What's the future". 23:41 Culture and technology evolution. 26:29 Creating value for the shareholder. 35:06 Learning a new skill. 38:12 Labor and technology. 47:07 Investing in humans or technology? 56:02 The role of AI in Media. 59:45 How can an employee stay relevant? 1:04:28 Tim's favorite books. 1:09:38 Key takeaways.

Tim's Book: WTF?: What's the Future and Why It's Up to Us by Tim O'Reilly https://amzn.to/2N5WhOn

Tim's Recommended Read: AI Superpowers: China, Silicon Valley, and the New World Order by Kai-Fu Lee https://amzn.to/2N8VGLL Prediction Machines: The Simple Economics of Artificial Intelligence by Ajay Agrawal and Joshua Gans https://amzn.to/2ugQBKr The Long Twentieth Century: Money, Power and the Origins of Our Times by Giovanni Arrighi https://amzn.to/2ufhb6R Doughnut Economics: Seven Ways to Think Like a 21st-Century Economist by Kate Raworth https://amzn.to/2LcbLQc Winners Take All: The Elite Charade of Changing the World by Anand Giridharadas https://amzn.to/2utgeXF New Power: How Power Works in Our Hyperconnected World--and How to Make It Work for You by Jeremy Heimans and Henry Timms https://amzn.to/2NbBJ77 Seeing like a State: How Certain Schemes to Improve the Human Condition Have Failed by James C. Scott https://amzn.to/2ztnoRz The Struggle for Survival: An Historical, political, and Socioeconomic Perspective of St. Lucia by Anderson Reynolds https://amzn.to/2uqF22w

Podcast Link: https://futureofdata.org/discussing-jobs-data-and-whatsthefuture-with-timoreilly-futureofdata-podcast/

Tim's BIO: Tim O’Reilly is the founder and CEO of O’Reilly Media, Inc. His original business plan was “interesting work for interesting people,” which worked out pretty well. O’Reilly Media delivers online learning, publishes books, runs conferences, urges companies to create more value than they capture, and tries to change the world by spreading and amplifying the knowledge of innovators.

Tim has a history of convening conversations that reshape the computer industry. In 1993, he launched the first commercial, ad-supported site on the internet. In 1998, he organized the meeting where the term “open source software” was agreed on and helped the business world understand its importance. In 2004, with the Web 2.0 Summit, he defined how “Web 2.0” represented not only the resurgence of the web after the dot com bust, but a new model for the computer industry, based on big data, collective intelligence, and the internet as a platform. In 2009, with his “Gov 2.0 Summit,” he framed a conversation about the modernization of government technology that has shaped policy and spawned initiatives at the Federal, State, and local level and around the world. He has now turned his attention to the implications of AI, the on-demand economy, and other technologies that are transforming the nature of work and the future shape of the business world. This is the subject of his forthcoming book from Harper Business, WTF: What’s the Future and Why It’s Up to Us.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey in creating the data-driven future.

Wanna Join? If you or anyone you know wants to join in or sponsor, email us @ [email protected]

Keywords:

FutureOfData #DataAnalytics #Leadership #Futurist #Podcast #BigData #Strategy

In this podcast, Don Kettl, Professor, LBJ School, the University of Texas at Austin, talks about the future of the public sector amid disruptions in data and analytics capabilities. Don talks about some of the biggest opportunities in the public policy space. He sheds light on how future public policy officers would design organizations that grow with time, on the future of jobs in the public sector, and on how data could disrupt the space to increase its impact. This session is great for people interested in learning how the evolution of big data affects public sector data and jobs.

TIMELINE: 0:28 Don's journey. 5:16 Premise of "Little bites of big data policy". 7:16 Data in the government sector. 11:18 Example of good data framework in state governments. 13:49 The need for good cooperation between the private and public sectors. 17:56 Opportunities for data in the public sector. 21:37 The failure of data in the public sector. 27:54 Perspective on open data. 33:58 Future of data in the public sector. 41:42 The role of government in data businesses. 48:58 Can government data policies go global? 55:56 Don's success mantra. 59:43 Don's reading list. 1:01:30 How does Don avoid bias? 1:07:00 Key takeaways.

Don's Book: Little Bites of Big Data for Public Policy by Donald F Kettl amzn.to/2zfpKDn Politics of the Administrative Process by Donald F Kettl amzn.to/2KS34KY and more at: amzn.to/2u12gg8

Podcast Link: https://futureofdata.org/future-of-public-sector-and-jobs-in-bigdata-world-futureofdata-podcast/

Don's BIO: Donald F. Kettl is a professor at the Lyndon B. Johnson School of Public Affairs at the University of Texas at Austin. He is also a nonresident senior fellow at the Volcker Alliance and the Brookings Institution.

Kettl is the author or editor of numerous books, including Can Governments Earn Our Trust? (2017); Little Bites of Big Data for Public Policy (2017); and The Politics of the Administrative Process (7th edition, 2017). Three of his books have received national best-book awards: The Transformation of Governance (2002); System under Stress: Homeland Security and American Politics (2005); and Escaping Jurassic Government: How to Recover America’s Lost Commitment to Competence.

He has received three lifetime achievement awards: the American Political Science Association’s John Gaus Award, the Warner W. Stockberger Achievement Award of the International Public Management Association, and the Donald C. Stone Award of the American Society for Public Administration, for significant contributions to the field of intergovernmental relations.

Kettl holds a Ph.D. in political science from Yale University. Before his appointment at the University of Maryland, he taught at the University of Pennsylvania, Columbia University, the University of Virginia, Vanderbilt University, and the University of Wisconsin-Madison. He is a fellow of Phi Beta Kappa and the National Academy of Public Administration.

He has appeared frequently in national and international media, including National Public Radio, the Fox News Channel, Good Morning America, ABC World News Tonight, NBC Nightly News, CBS Evening News, CNN’s “Anderson Cooper 360” and “The Situation Room,” the Huffington Post, as well as public television’s News Hour and the BBC.

Kettl is a shareholder of the Green Bay Packers, along with his wife, Sue.

About #Podcast:

FutureOfData podcast is a conversation starter that brings leaders, influencers, and lead practitioners on the show to discuss their journey in creating the data-driven future.

Wanna Join? If you or anyone you know wants to join in, register your interest @ analyticsweek.com/

Want to sponsor? Email us @ [email protected]

Keywords:

FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Business Analytics, Volume I

Business Analytics: A Data-Driven Decision Making Approach for Business-Part I provides an overview of business analytics (BA), business intelligence (BI), and the role and importance of these in modern business decision-making. The book discusses all these areas along with three main analytics categories: (1) descriptive, (2) predictive, and (3) prescriptive analytics with their tools and applications in business. This volume focuses on descriptive analytics, which involves the use of descriptive and visual or graphical methods, numerical methods, as well as data analysis tools, big data applications, and the use of data dashboards to understand business performance. The highlights of this volume are: Business analytics at a glance; Business intelligence (BI), data analytics; Data, data types, descriptive analytics; Data visualization tools; Data visualization with big data; Descriptive analytics-numerical methods; Case analysis with computer applications.

In this podcast Mike Tamir (@MikeTamir, Head of #DataScience) talked about building a data science AI team. He shared his AI project (FakerFact.org). He shared the lifecycle of an AI project and some things that leaders could keep in mind to help create a successful data science AI team. This podcast is great for leaders learning to build a strong AI workforce.

TIMELINE: 0:28 Michael's journey. 2:36 Michael's current role. 3:18 AI and businesses. 5:28 Parameters to consider for AI adoption. 9:30 When do businesses invest in ML resources. 13:20 Tips for candidates in vetting data companies. 16:05 What's FakerFact? 20:45 Getting started on an AI product design. 24:58 Achieving accuracy in data. 27:40 AI the newsmaker and AI the fact-checker. 33:56 Tips for hiring the right data leader for a business. 35:32 Creating a great data science team. 37:19 Challenges in forming a data science team. 39:00 In job training to achieve technological competence. 44:00 Ingredients of a good hire. 47:35 Michael's secret to success. 50:55 Michael's favorite reads. 54:20 Key takeaways.

Mike's Recommended Read: What Technology Wants by Kevin Kelly https://amzn.to/2MaNiuN Deep Learning by Ian Goodfellow and Yoshua Bengio and Aaron Courville http://www.deeplearningbook.org/

Podcast Link: https://futureofdata.org/building-data-science-ai-teams-by-miketamir-uberatg-futureofdata-podcast/

Mike's BIO: Mike serves as Head of Data Science at Uber ATG, UC Berkeley Data Science faculty, and head of Phronesis ML Labs. He has led teams of Data Scientists in the Bay Area as Chief Data Scientist for InterTrust and Takt, Director of Data Sciences for MetaScale/Sears, and CSO for Galvanize, where he founded the galvanizeU-UNH accredited Masters of Science in Data Science degree and oversaw the company's transformation from co-working space to Data Science organization. Mike's most recent passion in research has involved applying Machine Learning techniques to help combat fake news through the FakerFact.org project.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers and lead practitioners to discuss their journey to create the data-driven future.

Wanna Join? If you or anyone you know wants to join in, register your interest @ https://analyticsweek.com/

Want to sponsor? Email us @ [email protected]

Keywords:

FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Beginning Apache Spark 2: With Resilient Distributed Datasets, Spark SQL, Structured Streaming and Spark Machine Learning library

Develop applications for the big data landscape with Spark and Hadoop. This book also explains the role of Spark in developing scalable machine learning and analytics applications with Cloud technologies. Beginning Apache Spark 2 gives you an introduction to Apache Spark and shows you how to work with it. Along the way, you’ll discover resilient distributed datasets (RDDs); use Spark SQL for structured data; and learn stream processing and build real-time applications with Spark Structured Streaming. Furthermore, you’ll learn the fundamentals of Spark ML for machine learning and much more. After you read this book, you will have the fundamentals to become proficient in using Apache Spark and know when and how to apply it to your big data applications.
What You Will Learn
• Understand the Spark unified data processing platform
• How to run Spark in Spark Shell or Databricks
• Use and manipulate RDDs
• Deal with structured data using Spark SQL through its operations and advanced functions
• Build real-time applications using Spark Structured Streaming
• Develop intelligent applications with the Spark Machine Learning library
Who This Book Is For
Programmers and developers active in big data, Hadoop, and Java but who are new to the Apache Spark platform.

podcast_episode
by Val Kroll, Julie Hoyer, Tim Wilson (Analytics Power Hour - Columbus (OH)), Taylor Udell (Heap), Moe Kiss (Canva), Michael Helbling (Search Discovery)

Business Intelligence. It's a term that's been around for a few decades, but that is every bit as difficult to nail down as "data science," "big data," or a jellyfish. Think too hard about it, and you might actually find yourself struggling to define "analytics!" With the latest generation of BI tools, though, it's a topic that is making the rounds at cocktail parties the world over! (Cocktail parties just aren't what they used to be.) On this episode, the crew snags Taylor Udell from Heap to join in a discussion on the subject, and Moe (unsuccessfully) attempts to end the episode after six minutes. Possibly because neither Tableau nor Superset can definitively prove where avocado toast originated (but Wikipedia backs her up). But we all know Tim can't be shut up that quickly, right?! For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.