talk-data.com

Topic

Data Science

machine_learning statistics analytics

1516

tagged

Activity Trend

68 peak/qtr
2020-Q1 2026-Q1

Activities

1516 activities · Newest first

Have you ever stopped to think about what the day-to-day of a Data Scientist at Nubank looks like? Or a Data Engineer at EBANX? That's what we talk about in today's episode! We dig into the challenges they face every day, the tools they use day to day, how their teams are structured, and much more!

In today's episode, we invited the Data Hackers Pedro Tabacof, Data Scientist at Nubank, and Pietro Oliveira, Data Engineer at EBANX, to chat about what it is like to work with Data Science at two of the largest fintechs in Brazil.

Check out our Medium post for everything we mention in the episode: https://goo.gl/xHcA9A

Meta-Analytics

Meta-Analytics: Consensus Approaches and System Patterns for Data Analysis presents an exhaustive set of patterns for data scientists to use on any machine-learning-based data analysis task. The book virtually ensures that at least one pattern will lead to better overall system behavior than the use of traditional analytics approaches. The book is 'meta' to analytics, covering general analytics in sufficient detail for readers to engage with, and understand, hybrid or meta-approaches. The book has relevance to machine translation, robotics, the biological and social sciences, medical and healthcare informatics, economics, business, and finance. In addition, the analytics within can be applied to predictive algorithms for everyone from police departments to sports analysts.
Provides comprehensive and systematic coverage of machine-learning-based data analysis tasks
Enables rapid progress towards competency in data analysis techniques
Gives exhaustive and widely applicable patterns for use by data scientists
Covers hybrid or 'meta' approaches, along with general analytics
Lays out information and practical guidance on data analysis for practitioners working across all sectors
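The "consensus approaches" of the title boil down to combining several models and letting them vote. A minimal sketch of that pattern (the toy threshold models below are invented for illustration, not taken from the book):

```python
from collections import Counter

def consensus_predict(models, x):
    """Combine the predictions of several models by majority vote."""
    votes = [model(x) for model in models]
    winner, _count = Counter(votes).most_common(1)[0]
    return winner

# Three toy "models": threshold classifiers on a single number.
models = [
    lambda x: "high" if x > 10 else "low",
    lambda x: "high" if x > 12 else "low",
    lambda x: "high" if x > 20 else "low",
]

print(consensus_predict(models, 15))  # two of three models vote "high"
```

Real meta-analytic systems weight and select among many such patterns; the sketch only shows the core voting step.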

Summary Customer analytics is a problem domain that has given rise to its own industry. In order to gain a full understanding of what your users are doing and how best to serve them you may need to send data to multiple services, each with their own tracking code or APIs. To simplify this process and allow your non-engineering employees to gain access to the information they need to do their jobs Segment provides a single interface for capturing data and routing it to all of the places that you need it. In this interview Segment CTO and co-founder Calvin French-Owen explains how the company got started, how it manages to multiplex data streams from multiple sources to multiple destinations, and how it can simplify your work of gaining visibility into how your customers are engaging with your business.
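The single-interface, many-destinations model described above can be sketched as a simple fan-out router (the destination names and event shape here are hypothetical, not Segment's actual API):

```python
def route(event, destinations):
    """Fan one captured event out to every registered destination."""
    for deliver in destinations.values():
        deliver(event)

# Hypothetical destinations standing in for downstream analytics services.
received = {"warehouse": [], "email_tool": []}
destinations = {
    "warehouse": received["warehouse"].append,
    "email_tool": received["email_tool"].append,
}

route({"user": "u42", "action": "signup"}, destinations)
print(received["warehouse"])  # the same event lands in every destination
```

A production system would also need queuing, retries, and per-destination transforms; the sketch only shows the multiplexing idea.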

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you've got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they've got that covered too with world-wide datacenters, including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show!
Managing and auditing access to your servers and databases is a problem that grows in difficulty alongside the growth of your teams. If you are tired of wasting your time cobbling together scripts and workarounds to give your developers, data scientists, and managers the permissions that they need, then it's time to talk to our friends at strongDM. They have built an easy-to-use platform that lets you leverage your company's single sign-on for your data platform. Go to dataengineeringpodcast.com/strongdm today to find out how you can simplify your systems.
Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
You listen to this show to learn and stay up to date with what's happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don't want to miss out on this year's conference season. We have partnered with O'Reilly Media for the Strata conference in San Francisco on March 25th and the Artificial Intelligence conference in NYC on April 15th. Here in Boston, starting on May 17th, you still have time to grab a ticket to the Enterprise Data World, and from April 30th to May 3rd is the Open Data Science Conference. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
Your host is Tobias Macey and today I'm interviewing Calvin French-Owen about the data platform that Segment has built to handle multiplexing continuous streams of data from multiple sources to multiple destinations.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by explaining what Segment is and how the business got started?

What are some of the primary ways that your customers are using the Segment platform?
How have the capabilities and use cases of the Segment platform changed since it was first launched?

Layered on top of the data integration platform you have added the concepts of Protocols and Personas. Can you explain how each of those products fits into the overall platform?

Python for Data Science For Dummies, 2nd Edition

The fast and easy way to learn Python programming and statistics. Python is a general-purpose programming language created in the late 1980s (and named after Monty Python) that's used by thousands of people to do things from testing microchips at Intel, to powering Instagram, to building video games with the PyGame library. Python For Data Science For Dummies is written for people who are new to data analysis, and discusses the basics of Python data analysis programming and statistics. The book also discusses Google Colab, which makes it possible to write Python code in the cloud.
Get started with data science and Python
Visualize information
Wrangle data
Learn from data
The book provides the statistical background needed to get started in data science programming, including probability, random distributions, hypothesis testing, confidence intervals, and building regression models for prediction.
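One of the statistics topics the book lists, confidence intervals, can be sketched with nothing but the standard library (the sample values are made up, and the 1.96 factor assumes a normal approximation for a roughly 95% interval):

```python
import statistics
from math import sqrt

def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / sqrt(len(sample))  # standard error of the mean
    return (m - z * se, m + z * se)

sample = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3]
low, high = mean_confidence_interval(sample)
print(f"{low:.2f} .. {high:.2f}")  # interval centered on the sample mean of 5.0
```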

Kyle interviews Julia Silge about her path into data science, her book Text Mining with R, and some of the ways in which she's used natural language processing in projects both personal and professional. Related Links https://stack-survey-2018.glitch.me/ https://stackoverflow.blog/2017/03/28/realistic-developer-fiction/

What will be new in the Data Science market in 2019? Will the hype around Artificial Intelligence continue? Will autonomous cars become established this year? These and other questions are what we discuss in today's episode of your favorite Data Science and Data Engineering podcast, which is incredible, by the way!

In today's episode, we invited the Data Hackers Danilo Costa, Data Scientist at MaxMilhas, and Anderson Amaral, CDO at Dataholics and Data Science consultant, to chat about what we believe will be new in 2019 and which technologies will grow in the Brazilian and global markets.

Summary Machine learning is a class of technologies that promise to revolutionize business. Unfortunately, it can be difficult to identify and execute on ways that it can be used in large companies. Kevin Dewalt founded Prolego to help Fortune 500 companies build, launch, and maintain their first machine learning projects so that they can remain competitive in our landscape of constant change. In this episode he discusses why machine learning projects require a new set of capabilities, how to build a team from internal and external candidates, and how an example project progressed through each phase of maturity. This was a great conversation for anyone who wants to understand the benefits and tradeoffs of machine learning for their own projects and how to put it into practice.

Introduction

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you've got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they've got that covered too with world-wide datacenters, including new ones in Toronto and Mumbai. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute.
Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
To help other people find the show, please leave a review on iTunes or Google Play Music, tell your friends and co-workers, and share it on social media.
Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
Your host is Tobias Macey and today I'm interviewing Kevin Dewalt about his experiences at Prolego, building machine learning projects for Fortune 500 companies.

Interview

Introduction
How did you get involved in the area of data management?
For the benefit of software engineers and team leaders who are new to machine learning, can you briefly describe what machine learning is and why it is relevant to them?
What is your primary mission at Prolego and how did you identify, execute on, and establish a presence in your particular market?

How much of your sales process is spent on educating your clients about what AI or ML are and the benefits that these technologies can provide?

What have you found to be the technical skills and capacity necessary for being successful in building and deploying a machine learning project?

When engaging with a client, what have you found to be the most common areas of technical capacity or knowledge that are needed?

Everyone talks about a talent shortage in machine learning. Can you suggest a recruiting or skills development process for companies which need to build out their data engineering practice?
What challenges will teams typically encounter when creating an efficient working relationship between data scientists and data engineers?
Can you briefly describe a successful project of developing a first ML model and putting it into production?

What is the breakdown of how much time was spent on different activities such as data wrangling, model development, and data engineering pipeline development?
When releasing to production, can you share the types of metrics that you track to ensure the health and proper functioning of the models?
What does a deployable artifact for a machine learning/deep learning application look like?

What basic technology stack is necessary for putting the first ML models into production?

How does the build vs. buy debate break down in this space and what products do you typically recommend to your clients?

What are the major risks associated with deploying ML models and how can a team mitigate them?
Suppose a software engineer wants to break into ML. What data engineering skills would you suggest they learn? How should they position themselves for the right opportunity?

Contact Info

Email: Kevin Dewalt [email protected] and Russ Rands [email protected]
Connect on LinkedIn: Kevin Dewalt and Russ Rands
Twitter: @kevindewalt

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Prolego
Download our book: Become an AI Company in 90 Days
Google Rules of ML
AI Winter
Machine Learning
Supervised Learning
O'Reilly Strata Conference
GE Rebranding Commercials
Jez Humble: Stop Hiring Devops Experts (And Start Growing Them)
SQL
ORM
Django
RoR
Tensorflow
PyTorch
Keras
Data Engineering Podcast Episode About Data Teams
DevOps For Data Teams – DevOps Days Boston Presentation by Tobias
Jupyter Notebook
Data Engineering Podcast: Notebooks at Netflix
Pandas

Podcast Interview

Joel Grus

JupyterCon Presentation
Data Science From Scratch

Expensify
Airflow

James Meickle Interview

Git
Jenkins
Continuous Integration
Practical Deep Learning For Coders Course by Jeremy Howard
Data Carpentry

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

DIGITAL ANALYTICS MEETS DATA SCIENCE: USE CASES FOR GOOGLE ANALYTICS

Past attendees of Superweek have ridden along with Tim as he explored R, and then as he dove deeper into some of the fundamental concepts of statistics. In this session, he will provide the latest update on that journey: how he is putting his exploration into the various dimensions of data science to use with real data and real clients. The statistical methods will be real, the code will be R (and available on GitHub), and the data will only be lightly obfuscated. So, you will be able to head back to your room at the next break and try one or more of the examples out on your own data! (But, don't do that -- the food and conversation at the breaks are too good to miss!)

talk
by Doug Hall (ConversionWorks, UK)

The client seems happy enough just to collect the data. Our job is to make them understand what the data is for. Once they lose their data virginity, they won't want to stop. Doug describes a brand new set of personal use cases where the awesome power of data was liberated once the data was actually used. Learn even more tricks and techniques of the stealth change agent leading to that threshold moment in a data empowered career. Machine learning, data science, optimisation, professional jiu-jitsu and good old political persuasion are on the menu. Learn how to convince clients (internal or otherwise) to actually take your advice on board and do something other than just stare at a dashboard.

Hands-On Data Science with the Command Line

"Hands-On Data Science with the Command Line" introduces the incredible power of command-line tools to simplify and automate data science tasks. Leveraging tools like AWK, Bash, and more, you'll learn not only to handle datasets effectively but also to create efficient data pipelines and visualize data directly from the command line.
What this book will help me do:
Learn to set up and optimize the command line interface for data science tasks.
Master using AWK and similar tools for data processing.
Discover strategies for scripting, automation, and managing files efficiently.
Understand how to visualize data directly from the command line.
Gain fluency in combining tools to create seamless data pipelines.
Author(s): The authors, Morris, McCubbin, and Page, are experienced data scientists and technical authors with a passion for teaching complex topics in approachable ways. Their extensive experience using command-line tools for data-related workflows equips them to guide readers step-by-step in mastering these powerful techniques.
Who is it for? This book is ideal for data scientists and data analysts seeking to streamline and automate their workflows using command-line tools. If you have basic experience with data science and are curious about incorporating the efficiency of the command line into your work, this guide is perfect for you.
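As a taste of what the book automates, the classic `sort | uniq -c | sort -rn` counting pipeline built from shell tools has a compact analogue in plain Python (the sample log lines are invented for illustration):

```python
from collections import Counter

lines = [
    "GET /index",
    "GET /about",
    "GET /index",
    "POST /login",
    "GET /index",
]

# Equivalent of the shell pipeline: sort | uniq -c | sort -rn
for count, line in sorted(((c, l) for l, c in Counter(lines).items()), reverse=True):
    print(f"{count:4d} {line}")  # most frequent line first
```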

podcast_episode
by Val Kroll, Julie Hoyer, Tim Wilson (Analytics Power Hour - Columbus (OH)), Dr. Katie Sasso (Columbus Collaboratory), Moe Kiss (Canva), Michael Helbling (Search Discovery)

What does it really take to bring data science into the enterprise? Or... what does it take to bring it into your part of the enterprise? In this episode, the gang sits down with Dr. Katie Sasso from the Columbus Collaboratory...because that's similar to what she does! From the criticality of defining the business problem clearly, to ensuring the experts with the deep knowledge of the data itself are included in the process, to the realities of information security and devops support needs, it was a pretty wide-ranging discussion. And there were convolutional neural networks (briefly). For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

Hello, Data Hackers, welcome to the 6th episode of our podcast! In this edition we decided to do things differently and handed control of the podcast over to three women who really stand out when the subject is Data Science!

This episode is hosted by the Data Hackers Luciana Lima, Data Scientist at A3Data, Pollyanna Gonçalves, Data Scientist at Hotmart, and Giovanna Damasceno, Data Scientist at MaxMilhas, who share with us the female perspective on the Data Science field.

In the episode they discuss the job market, inclusion measures, diversity, careers, the main barriers faced by women, technological challenges, and much more! A true source of reflection on how much the market still needs to evolve, and a must-listen episode for everyone in the Data Science field!

Beyond Spreadsheets with R

Beyond Spreadsheets with R shows you how to take raw data and transform it for use in computations, tables, graphs, and more. You'll build on simple programming techniques like loops and conditionals to create your own custom functions. You'll come away with a toolkit of strategies for analyzing and visualizing data of all sorts using R and RStudio.
About the Technology: Spreadsheets are powerful tools for many tasks, but if you need to interpret, interrogate, and present data, they can feel like the wrong tools for the task. That's when R programming is the way to go. The R programming language provides a comfortable environment to properly handle all types of data. And within the open source RStudio development suite, you have at your fingertips easy-to-use ways to simplify complex manipulations and create reproducible processes for analysis and reporting.
About the Book: With Beyond Spreadsheets with R you'll learn how to go from raw data to meaningful insights using R and RStudio. Each carefully crafted chapter covers a unique way to wrangle data, from understanding individual values to interacting with complex collections of data, including data you scrape from the web. You'll build on simple programming techniques like loops and conditionals to create your own custom functions. You'll come away with a toolkit of strategies for analyzing and visualizing data of all sorts.
What's Inside:
How to start programming with R and RStudio
Understanding and implementing important R structures and operators
Installing and working with R packages
Tidying, refining, and plotting your data
About the Reader: If you're comfortable writing formulas in Excel, you're ready for this book.
About the Author: Dr Jonathan Carroll is a data science consultant providing R programming services. He holds a PhD in theoretical physics. We interviewed Jonathan as a part of our Six Questions series. Check it out here.
Quotes:
"A useful guide to facilitate graduating from spreadsheets to more serious data wrangling with R." - John D. Lewis, DDN
"An excellent book to help you understand how stored data can be used." - Hilde Van Gysel, Trebol Engineering
"A great introduction to a data science programming language. Makes you want to learn more!" - Jenice Tom, CVS Health
"Handy to have when your data spreads beyond a spreadsheet." - Danil Mironov, Luxoft Poland
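The book's progression, from loops and conditionals to reusable custom functions for wrangling messy spreadsheet data, can be sketched as follows (in Python rather than the book's R, with invented sample values):

```python
def clean_column(values, default=0.0):
    """Coerce messy spreadsheet-style cells into floats, using a loop and conditionals."""
    cleaned = []
    for v in values:
        if isinstance(v, str):
            # Strip whitespace and thousands separators, e.g. "1,200" -> "1200".
            v = v.strip().replace(",", "")
        try:
            cleaned.append(float(v))
        except (TypeError, ValueError):
            # Empty cells and text like "n/a" fall back to a default.
            cleaned.append(default)
    return cleaned

print(clean_column(["1,200", " 3.5 ", None, "n/a"]))  # [1200.0, 3.5, 0.0, 0.0]
```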

The road to AI adoption is far more complex than one can imagine. Building data science models and testing them is only one piece of the puzzle. To understand the roadblocks and best practices, Wayne Eckerson invited Nir Kaldero onto our latest episode to learn why organizations need to start paying more attention to people, culture, and processes to make data science projects a success, and how democratizing skills pays off in the long run.

Nir Kaldero is the Head of Data Science, Vice President at Galvanize Inc. and the creator of the GalvanizeU Master’s of Science in Data Science program. A tireless advocate for transforming education and reshaping the field of data science, his vision and mission is to make an impact on a wide variety of communities through education, science, and technology. In addition to his work at some of the world’s largest international corporations, Kaldero serves as a Google expert/mentor and has been named an IBM Analytics Champion 2017 & 2018, a prestigious honor given to leaders in the field of science, technology, engineering, and math (STEM).

Seth Dobrin is back to kick off season 3 and reflect on data and tech in 2018. Seth Dobrin, vice president and Chief Data Officer of IBM Analytics, gives insight into leading the Data Science Elite team, and he details the steps and strategies required to be successful in the field. Host Al Martin and Seth also make some data science predictions for 2019, letting you know what you should be looking out for in the year ahead.

Shownotes:
00:00 - Check us out on YouTube and SoundCloud.
00:10 - Connect with Producer Steve Moore on LinkedIn and Twitter.
00:15 - Connect with Producer Liam Seston on LinkedIn and Twitter.
00:20 - Connect with Producer Rachit Sharma on LinkedIn.
00:25 - Connect with Host Al Martin on LinkedIn and Twitter.
00:55 - Connect with Seth Dobrin on LinkedIn and Twitter.
02:00 - Seth Dobrin's first podcast from January 2018.
03:30 - What is data science?
04:25 - Seth Dobrin's blog: Don't let data science become a scam.
10:55 - IBM Data Science Elite Team: Kickstart, build and accelerate.
31:55 - What is AI?
37:58 - What are data pipelines?
41:55 - What is Blockchain?
Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Principles of Data Science - Second Edition

Dive into the intricacies of data science with 'Principles of Data Science'. This book takes you on a journey to explore, analyze, and transform data into actionable insights using mathematical models, Python programming, and machine learning concepts. With a clear and engaging style, you will progress from understanding theoretical foundations to implementing advanced techniques in real-world scenarios.
What this book will help me do:
Master the five critical steps in a practical data science workflow.
Clean and prepare raw datasets for accurate machine learning models.
Understand and apply statistical models and mathematical principles for data analysis.
Build and evaluate predictive models using Python and effective metrics.
Create impactful visualizations that clearly convey data insights.
Author(s): Sinan Ozdemir is an expert in data science, with a background in developing and teaching advanced courses in machine learning and predictive analytics. With his co-authors, Kakade and Tibaldeschi, he brings years of hands-on experience in data science to this comprehensive guide. Their approach simplifies complex concepts, making them accessible without sacrificing depth, to empower readers to make data-driven decisions confidently.
Who is it for? This book is ideal for aspiring data scientists seeking a practical introduction to the field. It's perfect for those with basic math skills looking to apply them to data science or experienced programmers who want to explore the mathematical foundation of data science. A basic understanding of Python programming will be invaluable, but the book builds up core concepts step-by-step, making it accessible to both beginners and experienced professionals.
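The "effective metrics" for evaluating predictive models can be as simple as accuracy, the fraction of predictions that match the true labels. A minimal sketch with toy labels (not taken from the book):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print(accuracy(y_true, y_pred))  # 4 of 5 correct -> 0.8
```

In practice a metric like this is computed on held-out data rather than the data the model was trained on.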

Numerical Python: Scientific Computing and Data Science Applications with Numpy, SciPy and Matplotlib

Leverage the numerical and mathematical modules in Python and its standard library as well as popular open source numerical Python packages like NumPy, SciPy, FiPy, matplotlib and more. This fully revised edition, updated with the latest details of each package and changes to Jupyter projects, demonstrates how to numerically compute solutions and mathematically model applications in big data, cloud computing, financial engineering, business management and more. Numerical Python, Second Edition, presents many brand-new case study examples of applications in data science and statistics using Python, along with extensions to many previous examples. Each of these demonstrates the power of Python for rapid development and exploratory computing due to its simple and high-level syntax and multiple options for data analysis. After reading this book, readers will be familiar with many computing techniques including array-based and symbolic computing, visualization and numerical file I/O, equation solving, optimization, interpolation and integration, and domain-specific computational problems, such as differential equation solving, data analysis, statistical modeling and machine learning.
What You'll Learn:
Work with vectors and matrices using NumPy
Plot and visualize data with Matplotlib
Perform data analysis tasks with Pandas and SciPy
Review statistical modeling and machine learning with statsmodels and scikit-learn
Optimize Python code using Numba and Cython
Who This Book Is For: Developers who want to understand how to use Python and its related ecosystem for numerical computing.
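The array-based computing the book centers on can be illustrated by spelling out what NumPy's `A @ x` matrix-vector product computes. A pure-Python sketch (NumPy replaces these explicit loops with one vectorized call):

```python
def mat_vec(A, x):
    """Matrix-vector product: each output entry is the dot product of a row with x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

A = [[1, 2],
     [3, 4]]
x = [1, 1]
print(mat_vec(A, x))  # [3, 7]
```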

Jason Tatge, CEO, president, and cofounder of Farmobile, joins the show to discuss data in the agriculture industry. The conversation touches on Jason's experience launching a startup, tips for finding success, and the value of big data from a farmer's perspective. This episode gives insight into data science for one of the oldest and most important sectors in our society.

Show Notes

00:00 - Check us out on YouTube and SoundCloud.
00:10 - Connect with producer Liam Seston on LinkedIn and Twitter.
00:15 - Connect with producer Steve Moore on LinkedIn and Twitter.
00:24 - Connect with host Al Martin on LinkedIn and Twitter.
01:20 - Connect with guest Jason Tatge on LinkedIn and Twitter.
04:24 - Get some insights into commodity trading.
10:09 - Check out Farmobile.com.
14:21 - Here are some more reasons why data collection in farming is so important.
22:21 - How data collection in farming is driving greater efficiency.
27:33 - Learn about pipeline entrepreneurs here.
Follow @IBMAnalytics
Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.