talk-data.com

Topic: MLOps

Tags: machine_learning devops ai

233 tagged

Activity Trend: peak of 26 per quarter, 2020-Q1 to 2026-Q1

Activities

233 activities · Newest first

We talked about: 

Will’s background
Will’s open source projects
S3Fs and PyFilesystem
Inspiration for open source projects
Will as a freelancer
Starting a company from a tweet (Rich and Textual)
Building in public (Will’s approach to social media)
The workforce and roadmap of Textualize.io
The importance of working on open source for Textualize employees
The workflow of and contributions to Textualize
Getting your first thousand GitHub Stars (going viral)
Suggestions for those who wish to start in the open-source space
Finding Will online

Links: 

Twitter: https://twitter.com/willmcgugan
Textualize website: https://www.textualize.io/
Textualize GitHub: https://github.com/textualize

MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

We talked about:

Lisa’s background
Centralized org vs decentralized org
Hybrid org (centralized/decentralized)
Reporting your results in a data organization
Planning in a data organization
Having all the moving parts work towards the same goals
Which approach Twitter follows (centralized vs decentralized)
Pros and cons of a decentralized approach
Pros and cons of a centralized approach
Finding a common language with all the functions of an org
Finding the right approach for companies that want to implement data science
How many data scientists does a company need?
Who do data scientists report huge findings to?
The importance of partnering closely with other functions of the org
The role of Product Managers in the org and across functions
Who does analytics at Twitter (analysts vs data scientists)
The importance of goals, objectives and key results
Conflicting objectives
The importance of research
Finding Lisa online

Links:

LinkedIn: https://www.linkedin.com/in/cohenlisa/
Twitter: https://twitter.com/lisafeig
Medium: https://medium.com/@lisa_cohen
Lisa Cohen's YouTube videos: https://www.youtube.com/playlist?list=PLRhmnnfr2bX7-GAPHzvfUeIEt2iYCbI3w

MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

We talked about:

Merve’s background
Merve’s first contributions to open source
What Merve currently does at Hugging Face (Hub, Spaces)
What it means to be a developer advocacy engineer at Hugging Face
The best way to get open source experience (Google Summer of Code, Hacktoberfest, and sprints)
The peculiarities of hiring as it relates to code contributions
Best resources to learn about NLP besides Hugging Face
Good first projects for NLP
The most important topics in NLP right now
NLP ML Engineer vs NLP Data Scientist
Project recommendations and other advice to catch the eye of recruiters
Merve on Twitch and her podcast
Finding Merve online
Merve and Mario Kart

Links:

Hugging Face Course: https://hf.co/course
Natural Language Processing in TensorFlow: https://www.coursera.org/learn/natural-language-processing-tensorflow
GitHub ML Poetry: https://github.com/merveenoyan/ML-poetry
Tackling multiple tasks with a single visual language model: https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model
Hugging Face BigScience T0pp: https://huggingface.co/bigscience/T0pp
Pathways Language Model (PaLM) blog: https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html

MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

For most ML-based SaaS companies, the need to fulfill each customer’s KPI will usually be addressed by matching a dedicated model. Along with the benefits of optimizing the model’s performance, a model per customer solution carries a heavy production complexity with it. In this manner, incorporating up-to-date data as well as new features and capabilities as part of a model’s retraining process can become a major production bottleneck. In this talk, we will see how Riskified scaled up modeling operations based on MLOps ideas, and focus on how we used Airflow as our ML pipeline orchestrator. We will dive into how we wrap Airflow as an internal service, the goals we started with, the obstacles along the way and finally - how we solved them. You will receive tools for how to set up your own Airflow-based continuous training ML pipeline, and how we adjusted it such that ML engineers and data scientists would be able to collaborate and work in parallel using the same pipeline.

We talked about:

Misra’s background
What data scientists do
Consultant data scientists vs in-house data scientists (and freelancers)
Expectations for data scientists
The importance of keeping up to date with AI developments (FOMA)
How does DALL·E 2 work and should you care?
Going to conferences to stay up to date
The most pressing issue for data scientists
Fighting FOMA and imposter syndrome
Knowing when you have enough knowledge of a framework
The “best” type of data scientist
Being a generalist vs a specialist
Advice for entry-level data scientists entering an oversaturated market
Catching the eye of big AI companies
Choosing a project for your portfolio
The importance of having a Ph.D. or Master’s degree in data science
Finding Misra online

Links:

Mısra's YouTube channel: https://www.youtube.com/channel/UCpNUYWW0kiqyh0j5Qy3aU7w
Twitter: https://twitter.com/misraturp
Hands-on Data Science: Complete Your First Portfolio Project: https://www.soyouwanttobeadatascientist.com/hods

MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

We talked about:

Adrian’s background
Freelancing vs Employment
Risk and occupancy rate in freelancing
The scariest part of freelancing
Adrian’s first projects
Freelancing 5 years later
Pay rates in freelancing
Acquiring skills while freelancing
Working with recruitment agencies and networking
Looking for projects and getting clients
Freelancing vs consulting
Clarity in clients’ expectations (scope of work)
Building your network
Freelancing platforms
Adrian’s data loading prototype
Going from freelancing to making your own product (and other investments)
The usefulness of a portfolio
Introverts in freelancing
Is it possible to work for 3 months a year in freelancing?
Choosing projects and skill-building strategy (focusing on interests)
Freelancing in Berlin
Clients’ expectations for freelancers vs employees
Working with more than one client at the same time
Adrian’s freelance cooperative on Slack
Other advice for novice freelancers (networking)
Finding Adrian online

Links:

Github: https://github.com/scale-vector
Slack Community: https://join.slack.com/t/berlindatacol-szn7050/shared_invite/zt-19dp8msp0-pP4Av3_fVFBbsdrzPROEAg

MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

We talked about:

Summary of “Getting a Data Engineering Job” webinar
Python and engineering skills
Interview process
Behavioral interviews
Technical interviews
Learning Python and SQL from scratch
Is having non-coding experience a disadvantage?
Analyst or engineer?
Do you need certificates?
Do I need a master’s degree?
Fully remote data engineering jobs
Should I include teaching on my resume?
Object-oriented programming for data engineering
Python vs Java/Scala
SQL and Python technical interview questions
GCP certificates
Is commercial experience really necessary?
From sales to engineering
Solution engineers
Wrapping up

Links:

Getting a Data Engineering Job (webinar): https://www.youtube.com/watch?v=yvEWG-S1F_M
The Flask Mega-Tutorial Part I - Hello, World! blog: https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world
Mode SQL Tutorial: https://mode.com/sql-tutorial/

MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

We talked about:

Daynan’s background
Astronomy vs cosmology
Applications of data science and machine learning in astronomy
Determining signal vs noise
What the data looks like in astronomy
Determining the features of an object in space
Ground truth for space objects
Why water is an important resource in the space economy
Other useful resources that can be found in asteroids
Sources of asteroids
The data team at an asteroid mining company
Open datasets for hobbyists
Mission and hardware design for asteroid mining
Partnerships and hires

Links: 

LinkedIn: https://www.linkedin.com/in/daynan/
We're looking for a Sr Data Engineer: https://boards.eu.greenhouse.io/karmanplus/jobs/4027128101?gh_jid=4027128101
Minor Planet Center: https://minorplanetcenter.net/
JPL Horizons has a nice set of APIs for accessing data related to small bodies (including asteroids): https://ssd.jpl.nasa.gov/api.html
ESA has NEODyS: https://newton.spacedys.com/neodys
IRSA catalog that contains image and catalog data related to the WISE/NEOWISE data (and other infrared platforms): https://irsa.ipac.caltech.edu/frontpage/
NASA also has an archive of data collected from their various missions, including a node related to small bodies: https://pds-smallbodies.astro.umd.edu/
Sub-node directly related to asteroids: https://sbn.psi.edu/pds/
Size, Mass, and Density of Asteroids (SiMDA) is a nice catalog of observed asteroid attributes (and an indication of how small our sample size is!): https://astro.kretlow.de/?SiMDA
Source survey data, several of which are useful for asteroids: Pan-STARRS (https://outerspace.stsci.edu/display/PANSTARRS)

MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

We talked about:

Juan’s background
Typical problems in marketing that are solved with ML
Attribution model
Media Mix Model – detecting uplift and channel saturation
Changes to privacy regulations and their effect on user tracking
User retention and churn prevention
A/B testing to detect uplift
Statistical approach vs machine learning (setting a benchmark)
Does retraining MMM models often improve efficiency?
Attribution model baselines
Choosing a decay rate for channels (Bayesian linear regression)
Learning resource suggestions
Bayesian approach vs Frequentist approach
Suggestions for creating a marketing department
Most challenging problems in marketing
The importance of marketing domain knowledge for data scientists
Juan’s blog and other learning resources
Finding Juan online

Links: 

Juan's PyData talk on uplift modeling: https://youtube.com/watch?v=VWjsi-5yc3w
Juan's website: https://juanitorduz.github.io
Introduction to Algorithmic Marketing book: https://algorithmic-marketing.online
Preventing churn like a bandit: https://www.youtube.com/watch?v=n1uqeBNUlRM

MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

We talked about: 

Gloria’s background
Working with MATLAB, R, C, Python, and SQL
Working at ICE
Job hunting after the bootcamp
Data engineering vs Data science
Using Docker
Keeping track of job applications, employers and questions
Challenges during the job search and transition
Concerns over data privacy
Challenges with salary negotiation
The importance of career coaching and support
Skills learned at Spiced
Retrospective on Gloria’s transition to data and advice
Top skills that helped Gloria get the job
Thoughts on cloud platforms
Thoughts on bootcamps and courses
Spiced graduation project
Standing out in a sea of applicants
The cohorts at Spiced
Conclusion

Links:

LinkedIn: https://www.linkedin.com/in/gloria-quiceno/
Github: https://github.com/gdq12

MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

We talked about:

Jeff’s background
Getting feedback to become a better teacher
Going from engineering to teaching
Jeff on becoming a curriculum writer
Creating a curriculum that reinforces learning
Jeff on starting his own data engineering bootcamp
Shifting from teaching ML and data science to teaching data engineering
Making sure that students get hired
Screening bootcamp applicants
Knowing when it’s time to apply for jobs
The curriculum of JigsawLabs.io
The market demand of Spark, Kafka, and Kubernetes (or lack thereof)
Advice for data analysts that want to move into data engineering
The market demand of ETL/ELT and DBT (or lack thereof)
The importance of Python, SQL, and data modeling for data engineering roles
Interview expectations
How to get started in teaching
The challenges of being a one-person company
Teaching fundamentals vs the “shiny new stuff”
JigsawLabs.io
Finding Jeff online

Links: 

Jigsaw Labs: https://www.jigsawlabs.io/free
Teaching my mom to code: https://www.youtube.com/watch?v=OfWwfTXGjBM
Getting a Data Engineering Job Webinar with Jeff Katz: https://www.eventbrite.de/e/getting-a-data-engineering-job-tickets-310270877547

MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

We talked about:

Christopher’s background
The essence of DataOps
Also known as Agile Analytics Operations or DevOps for Data Science
Defining processes and automating them (defining “done” and “good”)
The balance between heroism and fear (avoiding deferred value)
The Lean approach
Avoiding silos
The 7 steps to DataOps
Wanting to become replaceable
DataOps is doable
Testing tools
DataOps vs MLOps
The Head Chef at Data Kitchen
What’s grilling at Data Kitchen?
The DataOps Cookbook

Links:

DataOps Manifesto website: https://dataopsmanifesto.org/en/
DataOps Cookbook: https://dataops.datakitchen.io/pf-cookbook
Recipes for DataOps Success: https://dataops.datakitchen.io/pf-recipes-for-dataops-success
DataOps Certification Course: https://info.datakitchen.io/training-certification-dataops-fundamentals
DataOps Blog: https://datakitchen.io/blog/
DataOps Maturity Model: https://datakitchen.io/dataops-maturity-model/
DataOps Webinars: https://datakitchen.io/webinars/

Join DataTalks.Club: https://datatalks.club/slack.html  

Our events: https://datatalks.club/events.html

Summary

Putting machine learning models into production and keeping them there requires investing in well-managed systems to manage the full lifecycle of data cleaning, training, deployment and monitoring. This requires a repeatable and evolvable set of processes to keep it functional. The term MLOps has been coined to encapsulate all of these principles and the broader data community is working to establish a set of best practices and useful guidelines for streamlining adoption. In this episode Demetrios Brinkmann and David Aponte share their perspectives on this rapidly changing space and what they have learned from their work building the MLOps community through blog posts, podcasts, and discussion forums.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

This episode is brought to you by Acryl Data, the company behind DataHub, the leading developer-friendly data catalog for the modern data stack. Open Source DataHub is running in production at several companies like Peloton, Optum, Udemy, Zynga and others. Acryl Data provides DataHub as an easy to consume SaaS product which has been adopted by several companies. Sign up for the SaaS product at dataengineeringpodcast.com/acryl

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.

Your host is Tobias Macey and today I’m interviewing Demetrios Brinkmann and David Aponte about what you need to know about MLOps as a data engineer.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what MLOps is?
  How does it relate to DataOps? DevOps? (is it just another buzzword?)
What is your interest and involvement in the space of MLOps?
What are the open and active questions in the MLOps community?
Who is responsible for MLOps in an organization?
  What is the role of the data engineer in that process?
What are the core capabilities that are necessary to support an "MLOps" workflow?
How do the current platform technologies support the adoption of MLOps workflows?
  What are the areas that are currently underdeveloped/underserved?
Can you describe the technical and organizational design/architecture decisions that need to be made when endeavoring to adopt MLOps practices?
What are some of the common requirements for supporting ML workflows?
  What are some of the ways that requirements become bespoke to a given organization or project?
What are the opportunities for standardization or consolidation in the tooling for MLOps?
  What are the pieces that are always going to require custom engineering?
What are the most interesting, innovative, or unexpected approaches to MLOps workflows/platforms that you have seen?
What are the most interesting, unexpected, or challenging lessons that you

It's that time of year when we play fortune-teller, Mãe Dináh style, and make our predictions about what we think will trend across the data field as a whole. We talk about MLOps, Analytics Engineers, high salaries, and even elections! And of course we once again invited the Data Hackers Community Managers to join the conversation: Marlesson Santana, Pietro Oliveira, and Mario Filho!

See our Medium post for links to the references: https://medium.com/data-hackers/tend%C3%AAncias-para-dados-e-ai-em-2022-data-hackers-podcast-51-384c0554a4a2

We don't have a new episode this week, but we have an amazing conversation with Sejal Vaidya from August.

We talked about:

Sejal's background
Why transitioning to ML engineering
Three phases of development of a project
Why data engineers should get involved in ML
Technologies
Tips for people who want to transition
Soft skills and understanding requirements
Helpful resources

Resources:

ML checklist (https://twolodzko.github.io/ml-checklist.html)
Machine Learning Bookcamp (https://mlbookcamp.com/)
Made with ML course (https://madewithml.com)
Full-stack deep learning (https://fullstackdeeplearning.com)
Newsletters: mlinproduction, huyenchip.com, jeremyjordan.me, mihaileric.com
Sejal's "Production ML" twitter list (https://twitter.com/i/lists/1212819218959351809)

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Data science and machine learning are continuing to evolve as core capabilities across many industries. But high-quality data science output is only half the story. As the data science profession matures from “back office support” to leading from the front, there is an increasing need for more integrated systems that plug into business operations. To get the most out of these capabilities, organisations must move beyond just building robust models, and establish operational processes that can produce, implement and maintain machine learning systems at scale. Enter MLOps.

To understand the fundamentals and best practices of MLOps, I recently spoke to Shalini Kurapati who is CEO of Clearbox.ai. Clearbox AI is the data-centric MLOps company that enables trustworthy and human-centred AI. Their AI Control Room automatically produces synthetic data and insights to solve the issues related to data quality, data access and sharing, and privacy aspects that block AI adoption in companies.

In this episode of Leaders of Analytics, we cover:

What MLOps is and why we need it to succeed with advanced data science solutions
How to get beyond the proof-of-concept-to-production gap and get models into operation
The importance of data-centric AI in building MLOps best practices
The most common AI pitfalls to avoid
How Human Centred Design principles can be used to build AI for good, and much more.

Check out Clearbox here: https://clearbox.ai/
Connect with Shalini here: https://www.linkedin.com/in/shalini-kurapati-phd-she-her-06516324/

In this episode of DataFramed, we speak with Noah Gift, founder of Pragmatic AI Labs and prolific author, about operationalizing machine learning in organizations and his new book Practical MLOps.

Throughout the episode, Noah discusses his background, his philosophy around pragmatic AI, the differences between data science in academia and the real world, how data scientists can become more action-oriented by creating solutions that solve real-world problems, the importance of DevOps, his most recent book, a practical guide to MLOps, how data science can be compared to Brazilian jiu-jitsu, what data scientists should learn to scale the amount of value they deliver, his thoughts on AutoML and automation, and more.

Relevant links from the interview:

We’d love your feedback! Let us know which topics you’d like us to cover and what you think of DataFramed by answering this 30-second survey
Unsettled: What Climate Science Tells Us, What It Doesn't, and Why It Matters
Check out Noah's books
Check out Noah's course on DataCamp
Connect with Noah on LinkedIn
Gain access to DataCamp's full course library at a discount!

Summary

Gartner analysts are tasked with identifying promising companies each year that are making an impact in their respective categories. For businesses that are working in the data management and analytics space they recognized the efforts of Timbr.ai, Soda Data, Nexla, and Tada. In this episode the founders and leaders of each of these organizations share their perspective on the current state of the market, and the challenges facing businesses and data professionals today.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Atlan is a collaborative workspace for data-driven teams, like Github for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription.

Have you ever had to develop ad-hoc solutions for security, privacy, and compliance requirements? Are you spending too much of your engineering resources on creating database views, configuring database permissions, and manually granting and revoking access to sensitive data? Satori has built the first DataSecOps Platform that streamlines data access and security. Satori’s DataSecOps automates data access controls, permissions, and masking for all major data platforms such as Snowflake, Redshift and SQL Server and even delegates data access management to business users, helping you move your organization from default data access to need-to-know access. Go to dataengineeringpodcast.com/satori today and get a $5K credit for your next Satori subscription.

Your host is Tobias Macey and today I’m interviewing Saket Saurabh, Maarten Masschelein, Akshay Deshpande, and Dan Weitzner about the challenges facing data practitioners today and the solutions that are being brought to market for addressing them, as well as the work they are doing that got them recognized as "cool vendors" by Gartner.

Interview

Introduction
How did you get involved in the area of data management?
Can you each describe what you view as the biggest challenge facing data professionals?
Who are you building your solutions for and what are the most common data management problems you are all solving?
What are different components of Data Management and why is it so complex?
What will simplify this process, if any?
The report covers a lot of new data management terminology – data governance, data observability, data fabric, data mesh, DataOps, MLOps, AIOps – what does this all mean and why is it important for data engineers?
How has the data management space changed in recent times?
Describe the current data management landscape and any key developments.
From your perspective, what are the biggest challenges in the data management space today?
What modern data management features are lacking in existing databases?
Gartner imagines a future where data and analytics leaders need to be prepared to rely on data manage

In this episode of DataFramed, Adel speaks with Alessya Visnjic, CEO and co-founder of WhyLabs, an AI Observability company on a mission to build the interface between AI and human operators. Throughout the episode, Alessya talks about the unique challenges data teams face when operationalizing machine learning that spurred the need for MLOps, how MLOps intersects with and diverges from related terms such as DataOps, ModelOps, and AIOps, how and when organizations should get started on their MLOps journey, the most important components of a successful MLOps practice, and more.

Relevant links from the interview:

Connect with Alessya on LinkedIn
Andrew Ng on the importance of being data-centric
Joe Reis on the data culture and all things data
whylogs: the standard for data logging. Please send your feedback, contribute, help us build integrations into your favorite data tools and extend the concept of logging to new data types. Join the effort of building a new open standard for data logging!
Try the WhyLabs platform