talk-data.com

Topic

Data Science

machine_learning statistics analytics

1516

tagged

Activity Trend: peak of 68 activities per quarter, 2020-Q1 to 2026-Q1

Activities

1516 activities · Newest first

ML in Production – Serverless and Painless by Oliver Gindele

Big Data Europe, on-site and online, 22-25 November 2022. Learn more about the conference: https://bit.ly/3BlUk9q

Join our next Big Data Europe conference on 22-25 November 2022, where you will be able to learn from global experts giving technical talks and hands-on workshops in the fields of Big Data, High Load, Data Science, Machine Learning and AI. This time, the conference will be held in a hybrid setting, allowing you to attend workshops and listen to expert talks on-site or online.

The Unbreakable Data Pipeline by Herminio Vazquez

Designing Robust Processing System With Redis by Paško Pajdek

How to Fail in AI Business by Mohammad Hossein Noranian

Keynote | Embracing #AiFirst Enterprise-Wide by Alex Sanginov

Join a groundbreaking panel of women leaders in data science and engineering as they unveil actionable strategies for building trustworthy and responsible AI. Dive into the power of diverse data, unmask hidden biases, and discover how these changemakers are shaping a future where technology serves all.

We talked about:

Anahita's Background
Mechanical Engineering and Applied Mechanics
Finite Element Analysis vs. Machine Learning
Optimization and Semantic Reporting
Application of Knowledge Graphs in Research
Graphs vs Tabular Data
Computational graphs
Graph Data Science and Graph Machine Learning
Combining Knowledge Graphs and Large Language Models (LLMs)
Practical Applications and Projects
Challenges and Learnings
Anahita’s Recommendations

Links:

GitHub repo: https://github.com/antahiap/ADPT-LRN-PHYS/tree/main

Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

In this episode, I’m joined by the remarkably versatile Akshay Swaminathan, a polyglot who speaks 11 languages and has carved a unique path from medicine to data science. Currently an MD-PhD candidate at Stanford, Akshay's work has taken him from building clinics in Bolivia to pushing the boundaries of healthcare through data science. Akshay's journey is not just about his professional achievements but also his personal commitment to continuous learning and making a global impact. His transition from medicine to data science was driven by his desire to leverage technology for social good, particularly in healthcare. We also explore Akshay's book "Winning with Data Science" aimed at business professionals seeking to integrate data science into their operations. In short, Akshay might just be the most interesting person you’ll come across this year.

Previous episode: Ultralearning: How to Master Hard Skills and Accelerate Your Career with Scott Young
Akshay's website: https://www.akshayswaminathan.com/
Akshay on LinkedIn: https://www.linkedin.com/in/akshay-swaminathan-68286b51/

In this episode of the Data Career Podcast, Avery interviews Ken Jee.

They delve into Ken's unique path into sports analytics, starting from his personal experience as a golfer and his curious inquiry that led to an internship and gradually crafted a niche in sports data science.

✉️ Discover what we wish we knew about landing the dream job

🤖 Data Analytics Answers At Your Fingertips

Connect with Ken Jee

🤝 Follow on LinkedIn

▶️ Ken Jee Official Youtube Channel

▶️ Ken's Nearest Neighbors Podcast

🏀 The Exponential Athlete Podcast

🤝 Ace your data analyst interview with the interview simulator

📩 Get my weekly email with helpful data career tips

📊 Come to my next free “How to Land Your First Data Job” training

🏫 Check out my 10-week data analytics bootcamp

Timestamps:

(09:54) Deep Dive into Golf Analytics
(18:16) Ken's Personal Journey into Sports Analytics
(24:49) Breaking into Sports Analytics
(29:16) The Power of Networking and Creating Opportunities

Connect with Avery:

📺 Subscribe on YouTube

🎙Listen to My Podcast

👔 Connect with me on LinkedIn

📸 Instagram

🎵 TikTok

Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://www.datacareerjumpstart.com/daa

Applying Artificial Intelligence in Cybersecurity Analytics and Cyber Threat Detection

Comprehensive resource providing strategic defense mechanisms for malware, handling cybercrime, and identifying loopholes using artificial intelligence (AI) and machine learning (ML)

Applying Artificial Intelligence in Cybersecurity Analytics and Cyber Threat Detection is a comprehensive look at state-of-the-art theory and practical guidelines pertaining to the subject, showcasing recent innovations, emerging trends, and concerns as well as applied challenges encountered, and solutions adopted in the fields of cybersecurity using analytics and machine learning. The text clearly explains theoretical aspects, framework, system architecture, analysis and design, implementation, validation, and tools and techniques of data science and machine learning to detect and prevent cyber threats. Using AI and ML approaches, the book offers strategic defense mechanisms for addressing malware, cybercrime, and system vulnerabilities. It also provides tools and techniques that can be applied by professional analysts to safely analyze, debug, and disassemble any malicious software they encounter.

With contributions from qualified authors with significant experience in the field, Applying Artificial Intelligence in Cybersecurity Analytics and Cyber Threat Detection explores topics such as:

Cybersecurity tools originating from computational statistics literature and pure mathematics, such as nonparametric probability density estimation, graph-based manifold learning, and topological data analysis
Applications of AI to penetration testing, malware, data privacy, intrusion detection system (IDS), and social engineering
How AI automation addresses various security challenges in daily workflows and how to perform automated analyses to proactively mitigate threats
Offensive technologies grouped together and analyzed at a higher level from both an offensive and defensive standpoint

Providing detailed coverage of a rapidly expanding field, Applying Artificial Intelligence in Cybersecurity Analytics and Cyber Threat Detection is an essential resource for a wide variety of researchers, scientists, and professionals involved in fields that intersect with cybersecurity, artificial intelligence, and machine learning.

To unpack the insights of the State of Data Brazil 2023, there is no one better to guide us than those who played crucial roles in running and following this journey in the latest editions of the survey.

These are the people who also played important roles in the development and evolution of what is the most comprehensive survey of the data landscape in our country.

In this episode of Data Hackers, the largest AI and Data Science community in Brazil, get ready to join these experts: Felipe Fiamozzini, Expert Associate Partner at Bain & Company; and two of the winners of recent editions of the State of Data Brazil Challenge, Hayala Cavenague and Luiz Simoes, who discussed the latest findings shaping the data landscape in Brazil.

Remember that you can find all the Data Hackers community podcasts on Spotify, iTunes, Google Podcast, Castbox, and many other platforms. If you prefer, you can also listen to the episode right here in the post!

Meet our guests:

Felipe Fiamozzini, Expert Associate Partner at Bain & Company
Hayala Cavenague, Specialist Data Scientist at Will Bank and Statistics PhD
Luiz Simoes, Data Scientist at Receita Federal do Brasil

Our Data Hackers panel:

Monique Femme, Head of Community Management at Data Hackers
Gabriel Lages, Co-founder of Data Hackers and Data & Analytics Sr. Director at Hotmart

References:

Download the full State of Data Brazil 2023 report: https://stateofdata.datahackers.com.br/
Subscribe to the Data Hackers Newsletter: https://www.datahackers.news/
Bain & Company: https://www.bain.com/pt-br/insights/state-of-data-2023_profissionais_dados/?utm_source=linkedin&utm_medium=post+&utm_campaign=state_of_data_2023

Are LLMs useful for enterprises? After all, what is the use of a large language model that is trained on trillions of tokens but knows little to nothing about your business?

To make LLMs actually useful for enterprises, they need to retrieve a company's data effectively. LlamaIndex has been at the forefront of providing solutions and frameworks that augment LLMs in this way.

In this episode, Jerry Liu, Co-founder and CEO of LlamaIndex, joins Raja Iqbal, CEO and Chief Data Scientist at Data Science Dojo, for a deep dive into the intersection of generative AI, data, and entrepreneurship.

Jerry walks us through the cutting-edge technologies reshaping the generative AI landscape such as LlamaIndex. He also explores Retrieval Augmented Generation (RAG) and fine-tuning in detail, discussing their benefits, trade-offs, use cases, and enterprise adoption, making these complex tools and topics not just easily understandable but also fascinating.
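To make the retrieval step of RAG concrete, here is a minimal sketch in Python. It is an illustration only and does not use the LlamaIndex API: the bag-of-words cosine scoring, the function names (`retrieve`, `build_prompt`), and the sample documents are all assumptions made for this example, and no model is actually called.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query (toy scoring, not embeddings)."""
    q = Counter(tokenize(query))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Augment the question with retrieved company text before it would be sent to a model."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical company documents, invented for the example.
docs = [
    "Refunds are processed within 14 days of the return request.",
    "The office is closed on public holidays.",
    "Support tickets are answered within one business day.",
]
prompt = build_prompt("How long do refunds take?", docs)
print(prompt)
```

A production system would typically replace the word-count scoring with learned embeddings and a vector index; the structure (retrieve, then build the prompt) stays the same.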

Jerry further ventures into the heart of entrepreneurship, sharing valuable lessons and insights learned along his journey, from navigating his corporate career at tech giants like Apple, Quora, Two Sigma, and Uber, to starting as a founder in the data and AI landscape.

Amidst the excitement of innovation, Raja and Jerry also address the potential risks and considerations with generative AI. They raise thought-provoking questions about its impact on society, for instance, whether we're trading critical thinking for convenience.

Whether you're a generative AI enthusiast, seasoned entrepreneur, or simply curious about the future, this podcast promises plenty of knowledge and insights for you.

Summary

A significant portion of data workflows involve storing and processing information in database engines. Validating that the information is stored and processed correctly can be complex and time-consuming, especially when the source and destination speak different dialects of SQL. In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data.
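One common way to approach this kind of validation is to compare per-row hashes keyed by primary key. The sketch below is a minimal illustration under stated assumptions, not Datafold's method: it uses two in-memory SQLite databases to stand in for a source and a destination, and the table, column names, and helper functions (`row_hashes`, `reconcile`, `make_db`) are hypothetical.

```python
import hashlib
import sqlite3


def row_hashes(conn, table, key):
    """Map each primary-key value to a hash of the full row.

    The table and key names are interpolated directly, so they must be trusted identifiers.
    """
    cur = conn.execute(f"SELECT * FROM {table} ORDER BY {key}")
    columns = [d[0] for d in cur.description]
    key_idx = columns.index(key)
    hashes = {}
    for row in cur:
        # Normalise values to text so differing engine types can still be compared.
        # Simplification: NULL and the empty string hash identically here.
        canonical = "|".join("" if v is None else str(v) for v in row)
        hashes[row[key_idx]] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return hashes


def reconcile(source, target, table, key):
    """Report keys that are missing, extra, or different in the target."""
    src = row_hashes(source, table, key)
    dst = row_hashes(target, table, key)
    return {
        "missing_in_target": sorted(src.keys() - dst.keys()),
        "extra_in_target": sorted(dst.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }


def make_db(rows):
    """Build a throwaway in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, status TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


source = make_db([(1, 10.0, "paid"), (2, 25.5, "open"), (3, 7.25, "paid")])
target = make_db([(1, 10.0, "paid"), (2, 25.5, "closed"), (4, 1.0, "open")])
report = reconcile(source, target, "orders", "id")
print(report)
```

Pulling every row out of both systems does not scale to large tables; a real comparison would more likely compute checksums inside each database over key ranges and only fetch the ranges that disagree.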

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free!

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.

Join us at the top event for the global data community, Data Council Austin. From March 26-28th 2024, we'll play host to hundreds of attendees, 100 top speakers and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data and sharing their insights and learnings through deeply technical talks. As a listener to the Data Engineering Podcast you can get a special discount off regular priced and late bird tickets by using the promo code dataengpod20. Don't miss out on our only event this year! Visit dataengineeringpodcast.com/data-council and use code dataengpod20 to register today!

Your host is Tobias Macey and today I'm welcoming back Gleb Mezhanskiy to talk about how to reconcile data in database environments.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by outlining some of the situations where reconciling data between databases is needed?
What are examples of the error conditions that you are likely to run into when duplicating information between database engines?

When these errors do occur, what are some of the problems that they can cause?

When teams are replicating data between database engines, what are some of the common patterns for managing those flows?

How does that change between continual and one-time replication?

What are some of the steps involved in verifying the integrity of data replication between database engines?
If the source or destination isn't a traditional database engine (e.g. data lakehouse) how does that change the work involved in verifying the success of the replication?
What are the challenges of validating and reconciling data?

Sheer scale and cost of pulling data out, have to do in-place
Performance. Pushing databases to the limit,

We are happy to share our newest partnership with Itaú in 2024! In collaboration, we present our episode "Special: Women in Leadership & Data Careers".

This episode lets us explore in depth topics such as Career Transition, Visibility of Women in Leadership, Professional Challenges, and Motherhood and Career, alongside three women who today lead technical areas and topics such as Quantum Computing, Data Science, and Governance. It is an exciting milestone for us, and we are eager for you to join this journey of discovery and learning with these incredible women!

In this episode of Data Hackers, the largest AI and Data Science community in Brazil, meet the leaders who are driving the representation of more women on data teams (today more than 50% of the people on the data team at Itaú are women). They are: Samuraí Brito, Head of Quantum Technologies, who has published several articles on Quantum Computing; Priscila Ferreira, Superintendent of Data Governance and Privacy; and Veronica Neves, Data Science Lead. All of them lead these areas at Itaú.

Meet our guests:

Priscila Ferreira, Superintendent of Data Governance
Samuraí Brito, Head of Quantum Technologies (Arq. IT Specialist II) at Itaú
Veronica Neves, Data Science Lead

Our Data Hackers panel:

Monique Femme, Head of Community Management at Data Hackers
Paulo Vasconcellos, Co-founder of Data Hackers and Principal Data Scientist at Hotmart

Mentioned in the episode:

Download the full State of Data Brazil 2023 report: https://stateofdata.datahackers.com.br/
Subscribe to the Data Hackers Newsletter: https://www.datahackers.news/
Samuraí's article: https://arxiv.org/abs/1911.05445
Itaú careers portal: https://carreiras.itau.com.br/tecnologia
Openings, data talent pool: https://carreiras.itau.com.br/vaga/sao-paulo/faca-sua-carreira-de-dados-no-itau/35299/52511644368

Summary

Data lakehouse architectures are gaining popularity due to the flexibility and cost effectiveness that they offer. The link that bridges the gap between data lake and warehouse capabilities is the catalog. The primary purpose of the catalog is to inform the query engine of what data exists and where, but the Nessie project aims to go beyond that simple utility. In this episode Alex Merced explains how the branching and merging functionality in Nessie allows you to use the same versioning semantics for your data lakehouse that you are used to from Git.
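The Git-like semantics described here can be pictured with a toy model. The sketch below is purely illustrative and is not the Nessie API: `ToyCatalog`, its methods, and the metadata paths are hypothetical names invented for this example. It assumes a catalog is just a mapping from table names to metadata locations, and that a branch is a named pointer to one such mapping.

```python
class ToyCatalog:
    """Hypothetical in-memory catalog with branches; not the Nessie API."""

    def __init__(self):
        # Each branch maps table names to the location of that table's current metadata.
        self.branches = {"main": {}}
        # Snapshot each branch was created from, used to detect conflicting changes.
        self.base = {}

    def create_branch(self, name, source="main"):
        """Start a new branch from the current state of the source branch."""
        self.branches[name] = dict(self.branches[source])
        self.base[name] = dict(self.branches[source])

    def commit(self, branch, table, metadata_location):
        """Point a table at new metadata on one branch only."""
        snapshot = dict(self.branches[branch])
        snapshot[table] = metadata_location
        self.branches[branch] = snapshot

    def merge(self, source, target="main"):
        """Apply the source branch's table changes to the target, failing on conflicts."""
        base = self.base[source]
        src, dst = self.branches[source], self.branches[target]
        merged = dict(dst)
        for table, location in src.items():
            if base.get(table) == location:
                continue  # unchanged on the source branch
            if dst.get(table) != base.get(table) and dst.get(table) != location:
                raise ValueError(f"merge conflict on table {table!r}")
            merged[table] = location
        self.branches[target] = merged
        self.base[source] = dict(merged)


catalog = ToyCatalog()
catalog.commit("main", "sales", "meta/sales/v1.json")
catalog.create_branch("etl")
catalog.commit("etl", "sales", "meta/sales/v2.json")
catalog.commit("etl", "customers", "meta/customers/v1.json")

# Readers of main are isolated from the work in progress on the etl branch.
before_merge = dict(catalog.branches["main"])

catalog.merge("etl")
after_merge = dict(catalog.branches["main"])
print(before_merge, after_merge)
```

The point of the sketch is that changes to several tables become visible on `main` together at merge time, which is the behavior the Git analogy suggests; table deletions and commit history are left out to keep it short.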

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Your host is Tobias Macey and today I'm interviewing Alex Merced, developer advocate at Dremio and co-author of the upcoming book from O'Reilly, "Apache Iceberg: The Definitive Guide", about Nessie, a git-like versioned catalog for data lakes using Apache Iceberg.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what Nessie is and the story behind it?
What are the core problems/complexities that Nessie is designed to solve?
The closest analogue to Nessie that I've seen in the ecosystem is LakeFS. What are the features that would lead someone to choose one or the other for a given use case?
Why would someone choose Nessie over native table-level branching in the Apache Iceberg spec?
How do the versioning capabilities compare to/augment the data versioning in Iceberg?
What are some of the sources of, and challenges in resolving, merge conflicts between table branches?
Can you describe the architecture of Nessie?
How have the design and goals of the project changed since it was first created?
What is involved