talk-data.com


Activities & events

Stefan Papp – author

Maximize your portfolio, analyze markets, and make data-driven investment decisions using Python and generative AI. Investing for Programmers shows you how to turn your existing skills as a programmer into a knack for making sharper investment choices. You'll learn how to use the Python ecosystem, modern analytic methods, and cutting-edge AI tools to make better decisions and improve the odds of long-term financial success. In Investing for Programmers you'll learn how to:

  • Build stock analysis tools and predictive models
  • Identify market-beating investment opportunities
  • Design and evaluate algorithmic trading strategies
  • Use AI to automate investment research
  • Analyze market sentiment with media data mining

In Investing for Programmers you'll learn the basics of financial investment as you conduct real market analysis, connect to trading APIs to automate buying and selling, and develop a systematic approach to risk management. Don't worry: there's no dodgy financial advice or flimsy get-rich-quick schemes. Real-life examples help you build your own intuition about financial markets and make better decisions for retirement, financial independence, and getting more from your hard-earned money.

About the Technology
A programmer has a unique edge when it comes to investing. Using open-source Python libraries and AI tools, you can perform sophisticated analysis normally reserved for expensive financial professionals. This book guides you step by step through building your own stock analysis tools, forecasting models, and more, so you can make smart, data-driven investment decisions.

About the Book
Investing for Programmers shows you how to analyze investment opportunities using Python and machine learning. In this easy-to-read handbook, experienced algorithmic investor Stefan Papp shows you how to use Pandas, NumPy, and Matplotlib to dissect stock market data, uncover patterns, and build your own trading models. You'll also discover how to use AI agents and LLMs to enhance your financial research and decision-making process.

What's Inside
  • Build stock analysis tools and predictive models
  • Design algorithmic trading strategies
  • Use AI to automate investment research
  • Analyze market sentiment with media data mining

About the Reader
For professional and hobbyist Python programmers with basic personal finance experience.

About the Author
Stefan Papp combines 20 years of investment experience in stocks, cryptocurrency, and bonds with decades of work as a data engineer, architect, and software consultant.

Quotes
"Especially valuable for anyone looking to improve their investing." - Armen Kherlopian, Covenant Venture Capital
"A great breadth of topics, from basic finance concepts to cutting-edge technology." - Ilya Kipnis, Quantstrat Trader
"A top tip for people who want to leverage development skills to improve their investment possibilities." - Michael Zambiasi, Raiffeisen Digital Bank
"Brilliantly bridges the worlds of coding and finance." - Thomas Wiecki, PyMC Labs
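For a flavor of the kind of tooling the book describes, here is a minimal, self-contained sketch of a moving-average crossover signal computed with Pandas. It uses synthetic prices rather than a broker feed, and all names are illustrative, not the book's own code:

```python
import numpy as np
import pandas as pd

# Synthetic daily closing prices; a real workflow would load these
# from a market-data API or broker export instead.
rng = np.random.default_rng(42)
days = pd.date_range("2023-01-02", periods=500, freq="B")
close = pd.Series(
    100 * np.exp(np.cumsum(rng.normal(0.0003, 0.01, len(days)))),
    index=days, name="close",
)

df = close.to_frame()
df["sma_20"] = df["close"].rolling(20).mean()   # fast moving average
df["sma_50"] = df["close"].rolling(50).mean()   # slow moving average

# Naive crossover signal: long (1) when the fast average is above the slow one.
df["signal"] = (df["sma_20"] > df["sma_50"]).astype(int)
print(df.dropna().tail())
```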

data data-science data-science-tools Pandas AI/ML API GenAI LLM Matplotlib NumPy Python
O'Reilly Data Science Books

Digital mapping has long been dominated by commercial providers, creating barriers of cost, complexity, and privacy concerns. This talk introduces Protomaps, an open-source project that reimagines how web maps are delivered and consumed. Using the innovative PMTiles format – a single-file approach to vector tiles – Protomaps eliminates complex server infrastructure while reducing bandwidth usage and improving performance. We'll explore how this technology democratizes cartography by making self-hosted maps accessible without API keys, usage quotas, or recurring costs. The presentation will demonstrate implementations with Leaflet and MapLibre, showcase customization options, and highlight cases where Protomaps enables privacy-conscious, offline-capable mapping solutions. Discover how this technology puts mapping control back in the hands of developers while maintaining the rich experiences modern applications demand.
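To make the single-file idea concrete, here is a minimal sketch of how a client can inspect a remote archive with a single HTTP range request. The URL is hypothetical, and the header layout follows my reading of the published PMTiles v3 spec:

```python
import struct
import urllib.request

# Hypothetical URL; any PMTiles archive on range-capable static hosting works.
URL = "https://example.com/tiles/world.pmtiles"

# Fetch only the fixed-size 127-byte header, not the whole archive.
# This range-request pattern is what lets clients read a multi-gigabyte
# tileset straight from static hosting with no tile server.
req = urllib.request.Request(URL, headers={"Range": "bytes=0-126"})
with urllib.request.urlopen(req) as resp:
    header = resp.read()

magic, version = header[:7], header[7]
assert magic == b"PMTiles", "not a PMTiles archive"
print(f"PMTiles spec version {version}")

# Per the v3 spec, the next fields are little-endian u64 offsets/lengths
# for the root directory and JSON metadata; a client issues further range
# requests against these to locate individual tiles.
root_off, root_len, meta_off, meta_len = struct.unpack_from("<4Q", header, 8)
print(f"root directory: {root_len} bytes at offset {root_off}")
```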

API
PyData Berlin 2025

The SciPy Proceedings (https://proceedings.scipy.org) have long served as a cornerstone for publishing research in the scientific Python community, with more than 330 peer-reviewed articles published over the last 17 years. In 2024, the SciPy Proceedings underwent a significant transformation, adopting MyST Markdown (https://mystmd.org) and Curvenote (https://curvenote.com) to enhance accessibility, interactivity, and reproducibility, including the publishing of Jupyter Notebooks. The new proceedings articles are web-first, providing features such as deep-dive links for cross-references, previews of GitHub content, interactive 3D visualizations, and rich rendering of Jupyter Notebooks. In this talk, we will (1) present the new authoring & reading capabilities introduced in 2024; (2) highlight connections to prominent open-science initiatives and their impact on advancing computational research publishing; and (3) demonstrate the underlying technologies, how they enhance integrations with SciPy packages, and how to use these tools in your own communication workflows.

Our presentation will give an overview of the revised authoring process for the SciPy Proceedings; how we improve metadata standards in a way analogous to code linting and continuous integration; and the integration of live previews of the articles, including auto-generated PDFs and JATS XML (a standard used in scientific publishing). The peer-review process for the proceedings currently happens using GitHub's peer-review commenting, in a similar fashion to the Journal of Open Source Software; we will demonstrate this process as well as showcase opportunities for working with distributed review services such as PREreview (https://prereview.org). The open publishing pipeline has streamlined the submission, review, and revision processes while maintaining high scientific quality and improving the completeness of scholarly metadata. Finally, we will present how this work connects to other high-profile scientific publishing initiatives that have incorporated Jupyter Notebooks, live computational figures, and interactive displays of large-scale data. These initiatives include Notebooks Now! by the American Geophysical Union, which focuses on ensuring that Jupyter Notebooks can be properly integrated into the scholarly record, and the Microscopy Society of America's work on interactive publishing of large-scale microscopy data with interactive visualizations. These initiatives and the SciPy Proceedings are enabled by recent improvements in open-source tools, including MyST Markdown, JupyterLab, BinderHub, and Curvenote, which enable new ways to share executable research content. Collectively, they aim to improve the reproducibility, interactivity, and accessibility of research by providing better connections between data, software, and narrative research articles.

By embracing open science principles and modern technologies, the SciPy Proceedings exemplify how computational research can be more transparent, reproducible, and accessible. The shift to computational publishing, especially in the context of the scientific python community, opens new opportunities for researchers to publish not only their final results but also the computational workflows, datasets, and interactive visualizations that underpin them. This transformation aligns with broader efforts in open science infrastructure, such as integrating persistent identifiers (DOIs, ORCID, ROR), and adopting FAIR (Findable, Accessible, Interoperable, Reusable) principles for computational content. Building on these foundations, as well as open tools like MyST Markdown and Curvenote, provides a scalable model for open scientific publishing that bridges the gap between computational research and scholarly communication, fostering a more collaborative, iterative, and continuous approach to scientific knowledge dissemination.

CI/CD GitHub Python SciPy XML
SciPy 2025
DB Tsai – Senior Engineering Manager @ Databricks, Xiao Li – Engineering Director @ Databricks

Apache Spark has long been recognized as the leading open-source unified analytics engine, combining a simple yet powerful API with a rich ecosystem and top-notch performance. In the upcoming Spark 4.1 release, the community reimagines Spark to excel at both massive cluster deployments and local laptop development. We’ll start with new single-node optimizations that make PySpark even more efficient for smaller datasets. Next, we’ll delve into a major “Pythonizing” overhaul — simpler installation, clearer error messages and Pythonic APIs. On the ETL side, we’ll explore greater data source flexibility (including the simplified Python Data Source API) and a thriving UDF ecosystem. We’ll also highlight enhanced support for real-time use cases, built-in data quality checks and the expanding Spark Connect ecosystem — bridging local workflows with fully distributed execution. Don’t miss this chance to see Spark’s next chapter!
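The abstract's laptop-development theme is easy to picture: the same PySpark API runs unchanged in local mode. A minimal sketch, with illustrative data and app name:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Local-mode session: the same API that drives a cluster also runs on a
# laptop, which is the development workflow Spark 4.x is optimizing for.
spark = (
    SparkSession.builder
    .master("local[*]")          # use all local cores, no cluster needed
    .appName("laptop-etl-sketch")
    .getOrCreate()
)

df = spark.createDataFrame(
    [("2025-06-01", "click", 3), ("2025-06-01", "view", 9), ("2025-06-02", "click", 5)],
    ["day", "event", "n"],
)

# A small aggregation; on a cluster the identical code runs distributed.
(df.groupBy("day", "event")
   .agg(F.sum("n").alias("total"))
   .orderBy("day")
   .show())

spark.stop()
```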

Analytics API Data Quality ETL/ELT PySpark Python Spark
Data + AI Summit 2025

Today, I'm chatting with Stuart Winter-Tear about AI product management. We're getting into the nitty-gritty of what it takes to build and launch LLM-powered products for the commercial market that actually produce value. Among other things in this rich conversation, Stuart surprised me with the level of importance he believes UX has in making LLM-powered products successful, even for technical audiences.

After spending significant time at the forefront of AI's breakthroughs, Stuart believes many of the products we're seeing today are the result of FOMO above all else. He shares a belief that I've emphasized time and time again on the podcast: product is about the problem, not the solution. This design philosophy has informed Stuart's 20-plus-year career, and it is pivotal to understanding how to best use AI to build products that meet users' needs.

Highlights / Skip to:

  • Why Stuart was asked to speak to the House of Lords about AI (2:04)
  • The LLM-powered products Stuart has been building recently (4:20)
  • Finding product-market fit with AI products (7:44)
  • Lessons Stuart has learned over the past two years working with LLM-powered products (10:54)
  • Figuring out how to build user trust in your AI products (14:40)
  • The differences between being a digital product manager vs. an AI product manager (18:13)
  • Who is best suited for an AI product management role (25:42)
  • Why Stuart thinks user experience matters greatly with AI products (32:18)
  • The formula needed to create a business-viable AI product (38:22)
  • The skills and roles Stuart thinks are essential in an AI product team, and who he brings on first (50:53)
  • Conversations that need to be had with academics and data scientists when building AI-powered products (54:04)
  • Final thoughts from Stuart and where you can find more from him (58:07)

Quotes from Today’s Episode

“I think that the core dream with GenAI is getting data out of IT hands and back to the business. Finding a way to overlay all this disparate, unstructured data and [translate it] to the human language is revolutionary. We’re finding industries that you would think were more conservative (i.e. medical, legal, etc.) are probably the most interested because of the large volumes of unstructured data they have to deal with. People wouldn’t expect large language models to be used for fact-checking… they’re actually very powerful, especially if you can have your own proprietary data or pipelines. Same with security–although large language models introduce a terrifying amount of security problems, they can also be used in reverse to augment security. There’s a lovely contradiction with this technology that I do enjoy.” - Stuart Winter-Tear (5:58)

“[LLM-powered products] gave me the wow factor, and I think that’s part of what’s caused the problem. If we focus on technology, we build more technology, but if we focus on business and customers, we’re probably going to end up with more business and customers. This is why we end up with so many products that are effectively solutions in search of problems. We’re in this rush and [these products] are [based on] FOMO. We’re leaving behind what we understood about [building] products—as if [an LLM-powered product] is a special piece of technology. It’s not. It’s another piece of technology. [Designers] should look at this technology from the prism of the business and from the prism of the problem. We love to solutionize, but is the problem the problem? What’s the context of the problem? What’s the problem under the problem? Is this problem worth solving, and is GenAI a desirable way to solve it? We’re putting the cart before the horse.” - Stuart Winter-Tear (11:11)

“[LLM-powered products] feel most amazing when you’re not a domain expert in whatever you’re using it for. I’ll give you an example: I’m terrible at coding. When I got my hands on Cursor, I felt like a superhero. It was unbelievable what I could build. Although [LLM products] look most amazing in the hands of non-experts, it’s actually most powerful in the hands of experts who do understand the domain they’re using this technology in. Perhaps I want to do a product strategy, so I ask [the product] for some assistance, and it can get me 70% of the way there. [LLM products] are great as a jumping off point… but ultimately [they are] only powerful because I have certain domain expertise.” - Stuart Winter-Tear (13:01)

“We’re so used to the digital paradigm. The deterministic nature of you put in X, you get out Y; it’s the same every time. Probabilistic changes every time. There is a huge difference between what results you might be getting in the lab compared to what happens in the real world. You effectively find yourself building [AI products] live, and in order to do that, you need good communities and good feedback available to you. You need these fast feedback loops. From a pure product management perspective, we used to just have the [engineering] timeline… Now, we have [the data research timeline]. If you’re dealing with cutting-edge products, you’ve got these two timelines that you’re trying to put together, and the data research one is very unpredictable. It’s the nature of research. We don’t necessarily know when we’re going to get to where we want to be.” - Stuart Winter-Tear (22:25)

“I believe that UX will become the #1 priority for large language model products. I firmly believe whoever wins in UX will win in this large language model product world. I’m against fully autonomous agents without human intervention for knowledge work. We need that human in the loop. What was the intent of the user? How do we get that right push back from the large language model to understand even the level of the person that they’re dealing with? These are fundamental UX problems that are going to push UX to the forefront… This is going to be on UX to educate the user, to be able to inject the user in at the right time to be able to make this stuff work. The UX folk who do figure this out are going to create the breakthrough and create the mass adoption.” - Stuart Winter-Tear (33:42)

AI/ML GenAI LLM Cyber Security
Experiencing Data w/ Brian T. O’Neill (AI & data product management leadership—powered by UX design)
Viktor Kessler – Co-founder @ Vakmo, Tobias Macey – host

Summary In this episode of the Data Engineering Podcast Viktor Kessler, co-founder of Vakmo, talks about the architectural patterns in the lakehouse enabled by a fast and feature-rich Iceberg catalog. Viktor shares his journey from data warehouses to developing the open-source project Lakekeeper, an Apache Iceberg REST catalog written in Rust that facilitates building lakehouses with essential components like storage, compute, and catalog management. He discusses the importance of metadata in making data actionable, the evolution of data catalogs, and the challenges and innovations in the space, including integration with OpenFGA for fine-grained access control and managing data across formats and compute engines.
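For a sense of what "an Iceberg REST catalog" means in practice, here is a minimal PyIceberg sketch. The endpoint, warehouse, and table names are assumptions for a locally running Lakekeeper instance rather than documented defaults; any spec-compliant REST catalog accepts the same shape:

```python
from pyiceberg.catalog import load_catalog

# Connect to an Iceberg REST catalog from Python. URI and warehouse
# name are assumptions for a local deployment.
catalog = load_catalog(
    "lakekeeper",
    **{
        "type": "rest",
        "uri": "http://localhost:8181/catalog",  # assumed endpoint
        "warehouse": "demo",                     # assumed warehouse name
    },
)

# Once connected, table discovery and reads are engine-agnostic:
# Spark, Trino, or DuckDB would resolve the same tables via the catalog.
print(catalog.list_namespaces())
table = catalog.load_table("analytics.events")   # hypothetical table
print(table.schema())
```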

Announcements

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management.
  • Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
  • Your host is Tobias Macey and today I'm interviewing Viktor Kessler about architectural patterns in the lakehouse that are unlocked by a fast and feature-rich Iceberg catalog.

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what Lakekeeper is and the story behind it? What is the core of the problem that you are addressing?
  • There has been a lot of activity in the catalog space recently. What are the driving forces that have highlighted the need for a better metadata catalog in the data lake/distributed data ecosystem?
  • How would you characterize the feature sets/problem spaces that different entrants are focused on addressing?
  • Iceberg as a table format has gained a lot of attention and adoption across the data ecosystem. The REST catalog format has opened the door for numerous implementations. What are the opportunities for innovation and improving user experience in that space?
  • What is the role of the catalog in managing security and governance? (AuthZ, auditing, etc.)
  • What are the channels for propagating identity and permissions to compute engines? (How do you avoid head-scratching about permission-denied situations?)
  • Can you describe how Lakekeeper is implemented?
  • How have the design and goals of the project changed since you first started working on it?
  • For someone who has an existing set of Iceberg tables and catalog, what does the migration process look like?
  • What new workflows or capabilities does Lakekeeper enable for data teams using Iceberg tables across one or more compute frameworks?
  • What are the most interesting, innovative, or unexpected ways that you have seen Lakekeeper used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Lakekeeper?
  • When is Lakekeeper the wrong choice?
  • What do you have planned for the future of Lakekeeper?

Contact Info

  • LinkedIn

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

  • Lakekeeper
  • SAP
  • Microsoft Access
  • Microsoft Excel
  • Apache Iceberg (Podcast Episode)
  • Iceberg REST Catalog
  • PyIceberg
  • Spark
  • Trino
  • Dremio
  • Hive Metastore
  • Hadoop
  • NATS
  • Polars
  • DuckDB (Podcast Episode)
  • DataFusion
  • Atlan (Podcast Episode)
  • Open Metadata (Podcast Episode)
  • Apache Atlas
  • OpenFGA
  • Hudi (Podcast Episode)
  • Delta Lake (Podcast Episode)
  • Lance Table Format (Podcast Episode)
  • Unity Catalog
  • Polaris Catalog
  • Apache Gravitino (Podcast Episode)
  • Keycloak
  • Open Policy Agent (OPA)
  • Apache Ranger
  • Apache NiFi

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

AI/ML Data Engineering Data Lake Data Lakehouse Data Management Datafold Iceberg Python Rust Cyber Security
Data Engineering Podcast

Join us for an Apache Kafka® Meetup on Monday, April 14th from 6:00pm hosted by Elaia!

Elaia is a full stack tech and deep tech investor. We partner with ambitious entrepreneurs from inception to leadership, helping them navigate the future and the unknown. For over twenty years, we have combined deep scientific and technological expertise with decades of operational experience to back those building tomorrow. From our offices in Paris, Barcelona and Tel Aviv, we have been active partners with over 100 startups including Criteo, Mirakl, Shift Technology, Aqemia and Alice & Bob.

📍Venue: Elaia 21 Rue d'Uzès, 75002 Paris, France

IF YOU RSVP HERE, YOU DO NOT NEED TO RSVP @ Paris Apache Kafka® Meetup group.

🗓 Agenda:

  • 6:00pm: Doors Open/Welcome, Drinks
  • 6:15pm - 7:00pm: Roman Kolesnev, Principal Software Engineer, Streambased
  • 7:00pm - 7:45pm: Viktor Gamov, Principal Developer Advocate, Confluent
  • 7:45pm - 8:30pm: Food, Additional Q&A, Networking

💡 Speaker One: Roman Kolesnev, Principal Software Engineer, Streambased

Talk: Melting Icebergs: Enabling Analytical Access to Kafka Data through Iceberg Projections

Abstract: An organisation's data has traditionally been split between the operational estate, for daily business operations, and the analytical estate, for after-the-fact analysis and reporting. The journey from one side to the other is today a long and torturous one. But does it have to be? In the modern data stack, Apache Kafka is the de facto standard operational platform and Apache Iceberg has emerged as the champion of table formats to power analytical applications. Can we leverage the best of Iceberg and Kafka to create a powerful solution greater than the sum of its parts?

Yes you can and we did!

This isn't a typical story of connectors, ELT, and separate data stores. We've developed an advanced projection of Kafka data in an Iceberg-compatible format, allowing direct access from warehouses and analytical tools.

In this talk, we'll cover:

  • How we presented Kafka data for Iceberg processors without moving or transforming data upfront—no hidden ETL!
  • Integrating Kafka's ecosystem into Iceberg, leveraging Schema Registry, consumer groups, and more.
  • Meeting Iceberg's performance and cost reduction expectations while sourcing data directly from Kafka.

Expect a technical deep dive into the protocols, formats, and services we used, all while staying true to our core principles:

  • Kafka as the single source of truth—no separate stores.
  • Analytical processors shouldn't need Kafka-specific adjustments.
  • Operational performance must remain uncompromised.
  • Kafka's mature ecosystem features, like ACLs and quotas, should be reused, not reinvented.

Join us for a thrilling account of the highs and lows of merging two data giants and stay tuned for the surprise twist at the end!

Bio: Roman is a Principal Software Engineer at Streambased. His experience includes building business critical event streaming applications and distributed systems in the financial and technology sectors.

💡 Speaker Two: Viktor Gamov, Principal Developer Advocate, Confluent

Talk: One Does Not Simply Query a Stream

Abstract: Streaming data with Apache Kafka® has become the backbone of modern applications. While streams are ideal for continuous data flow, they lack built-in querying capability. Unlike databases with indexed lookups, Kafka's append-only logs are designed for high-throughput processing, not for on-demand querying. This forces teams to build additional infrastructure to enable query capabilities for streaming data. Traditional methods replicate this data into external stores such as relational databases like PostgreSQL for operational workloads and object storage like S3 with Flink, Spark, or Trino for analytical use cases. While sometimes useful, these methods deepen the divide between operational and analytical estates, creating silos, complex ETL pipelines, and issues with schema mismatches, freshness, and failures. In this session, we’ll explore and see live demos of some solutions to unify the operational and analytical estates, eliminating data silos. We’ll start with stream processing using Kafka Streams, Apache Flink®, and SQL implementations, then cover integration of relational databases with real-time analytics databases such as Apache Pinot® and ClickHouse. Finally, we’ll dive into modern approaches like Apache Iceberg® with Tableflow, which simplifies data preparation by seamlessly representing Kafka topics and associated schemas as Iceberg or Delta tables in a few clicks. While there's no single right answer to this problem, as responsible system builders, we must understand our options and trade-offs to build robust architectures.
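As a concrete illustration of the gap the abstract describes, here is a minimal sketch, assuming a local broker at localhost:9092 and a hypothetical orders topic, of what an ad-hoc "query" against a raw Kafka topic looks like with the confluent-kafka client: a full sequential scan with a filter, because there are no indexes to help.

```python
from confluent_kafka import Consumer

# A Kafka topic is an append-only log: without extra infrastructure,
# "querying" it means scanning records sequentially, which is what makes
# the external-store patterns described above so common.
conf = {
    "bootstrap.servers": "localhost:9092",  # assumed local broker
    "group.id": "adhoc-scan",
    "auto.offset.reset": "earliest",        # start from the beginning of the log
}
consumer = Consumer(conf)
consumer.subscribe(["orders"])              # hypothetical topic

matches = []
try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            break                           # caught up; stop the scan
        if msg.error():
            continue
        value = msg.value().decode("utf-8")
        if "EUR" in value:                  # the "WHERE clause" is just a filter
            matches.append(value)
finally:
    consumer.close()

print(f"{len(matches)} matching records")
```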

Bio: Viktor Gamov is a Principal Developer Advocate at Confluent, founded by the original creators of Apache Kafka®. With a rich background in implementing and advocating for distributed systems and cloud-native architectures, Viktor excels in open-source technologies. He is passionate about assisting architects, developers, and operators in crafting systems that are not only low in latency and scalable but also highly available. As a Java Champion and an esteemed speaker, Viktor is known for his insightful presentations at top industry events like JavaOne, Devoxx, Kafka Summit, and QCon. His expertise spans distributed systems, real-time data streaming, JVM, and DevOps.

Viktor has co-authored "Enterprise Web Development" from O'Reilly and "Apache Kafka® in Action" from Manning.

Follow Viktor on X - @gamussa to stay updated with Viktor's latest thoughts on technology, his gym and food adventures, and insights into open-source and developer advocacy.

*** DISCLAIMER: We cannot cater to those under the age of 18. If you would like to speak at or host a future meetup, please reach out to [email protected]

IN PERSON: Apache Kafka® x Apache Iceberg™ Meetup

Qlik Community, it has been too long since we had a chance to get together and have fun while expanding our rich community of Qlik users and developers. Our next Meetup will be on Tuesday, December 3rd from 6:00pm-9:00pm at Top Golf in King of Prussia, PA.

Top Golf - Qlik Meet Up (Join Us)

Jupyter-based environments are getting a lot of traction for teaching computing, programming, and data science. The narrative structure of notebooks has proven its value for guiding each student at their own pace to the discovery and understanding of new concepts or new idioms (e.g. how do I extract a column in pandas?). But these new pieces of knowledge tend to quickly fade and be forgotten: long-term acquisition of knowledge and skills takes reinforcement by repetition. This is the foundation of many online learning platforms, like WeBWorK or WIMS, that offer exercises with randomization and automatic feedback. It is also the foundation of popular "AI-powered" apps, e.g. for learning foreign languages, that use spaced-repetition algorithms informed by educational and neuroscience research to deliver just the right amount of repetition.

What if you could author such exercises as notebooks, to benefit from everything that Jupyter can offer (think rich narratives, computations, visualizations, interactions)? What if you could integrate such exercises right into your Jupyter-based course? What if a learner could get personalized exercise recommendations based on their past learning records, without having to give these sensitive pieces of information away?

That's Jupylates (work in progress). And thanks to the open source scientific stack, it's just a small Jupyter extension.

AI/ML GitLab Pandas
PyData Paris 2024

In collaboration with Thoughtworks and with three very sophisticated talks on our stage, Qase is presenting their first Meetup on Quality Engineering in Berlin. For more information on Qase, check out their homepage.

ABOUT THE TALKS

E2E tests in JS-rich web applications by Olga Trofimova Have you struggled with writing end-to-end Cypress tests for JavaScript-heavy web applications? Have you seen a pipeline of tests being executed for hours? Would you even be writing e2e tests if they took that long to execute? In this talk, I will share how I explored the complexity of writing e2e tests for JS-heavy web apps, the challenges I faced, and how I came to the conclusion that quality assurance is a responsibility we share with the developers.

Addressing cognitive biases and mental sets in quality assurance by Vitaly Sharovatov Have you ever heard statements like “all big companies hire SDETs and so should we”? Or perhaps you've heard people say “the more automated tests we have, the better the quality is”. I have, and I believe that our cognitive biases and mental sets provoke such beliefs. This talk delves into how cognitive biases, such as magical thinking and the 'Post hoc ergo propter hoc' fallacy, influence our decision-making processes, especially in the field of Software Quality Assurance. Additionally, I introduce strategies to overcome these biases, beginning with the fundamentals of rational decision making.

Breaking Boundaries with Advanced Kotlin Testing Techniques by Pasha Finkelshteyn In this session, you'll learn the ins and outs of testing in Kotlin. I'll start with the basics, covering the current widespread problems of tests. But don't worry; I won't bore you with endless slides full of code snippets. Instead, it will mostly be a live coding session with real-life examples. I'll also explore advanced testing techniques, such as property testing and organizing your tests into a hierarchical structure, and show you how to put them into practice. We will use Kotest, MockK, Atrium and more! Whether you're a seasoned tester, developer, or newcomer to the testing field, you'll come away from this talk with a better understanding of testing with Kotlin and, hopefully, a smile.

------ ABOUT THE SPEAKERS

Olga Trofimova Olga Trofimova is a QA Engineer and Enthusiast with 5+ years of professional experience. She works with both manual and automation testing and builds processes from scratch. Olga founded the Berlin QA Community to collect people in love with quality assurance in one place. Her passion is building and setting up processes, testing not only from the perspective of users but also business. As a coach and speaker, Olga is always happy to share her insights with others and be helpful. LinkedIn: https://www.linkedin.com/in/olga-tro-fimova/

Vitaly Sharovatov Vitaly launched his career as a system administrator in 2001, overseeing networks and Active Directory. By 2004, he had transitioned to web development, acquiring skills in JavaScript, C#, and PHP, and he began blogging about web development topics. In 2007, Vitaly started leading teams. His passion lies in andragogy and mentorship, having mentored over 40 developers and 25 engineering managers. As a quality enthusiast, Vitaly believes that individuals should take pride in their work and companies should strive to produce quality products. LinkedIn: https://www.linkedin.com/in/vsharovatov/

Pasha Finkelshteyn Years of experience have given Pasha a thorough knowledge of IT, and data is the field he fell in love with. For several years, Pasha worked at JetBrains as a developer advocate for Kotlin and Big Data. He likes speaking on these topics and helping people to better understand them. Father of three boys, he likes board games and TV shows :) LinkedIn: https://www.linkedin.com/in/asm0dey/

------

ON THE AGENDA

  • 18:30 Welcome & Snacks
  • 19:00 Stage Time & Q&A
  • 21:00 Networking
  • 22:00 See you next time!

------

Code of Conduct We adhere to the Berlin Code of Conduct to ensure a welcoming and respectful environment for all participants. The event space operates under the largely compatible Thoughtworks Meetups & Events CoC.

Accessibility The Location is accessible for wheelchair users. This includes the entrance (no steps to get into the location), toilets and the stage.

Thoughtworks & Qase on Software Quality #1

This time we will touch upon the potential of LLMs and explore the art of knowledge retrieval in the AI era. Our first presentation delves into the capabilities of LLMs, showcasing their inherent knowledge repository and customized knowledge retrieval mechanisms. In our second presentation, we will dive into how Netlight leverages the power of AI and Python to tap into the collective knowledge of their organization, bringing immense value to clients.

Don't miss this opportunity to gain insights into cutting-edge AI applications and real-world success stories.

--

Agenda:

  • 17:30 - 18:00: Doors open
  • 18:00 - 18:10: Welcome
  • 18:10 - 18:40: Unraveling the Potential: An Insight into Large Language Models and Knowledge Retrieval
  • 18:40 - 19:10: Pizza & Refreshments
  • 19:10 - 19:40: Unlocking Collective Knowledge: Netlight's AI-Powered Journey
  • 19:40 - 20:30: Networking

--

Presentations:

Unraveling the Potential: An Insight into Large Language Models and Knowledge Retrieval – Keven (Qi) Wang, Meno Data

Since the advent of ChatGPT, Large Language Models (LLM) have witnessed burgeoning interest, cementing their position as pivotal tools in addressing numerous contemporary challenges. In this presentation, we will delve into three primary capabilities of LLMs:

  • Inherent Knowledge Repository: LLMs are fortified with vast amounts of pre-processed data. Users can swiftly tap into this reservoir of information via chatbot interfaces.
  • Customized Knowledge Retrieval: This forms the crux of our discourse today. With the aid of tools like LangChain, vector databases, and embedding models, we will demonstrate how to construct a chatbot tailored to your unique knowledge base, capitalizing on LLMs' summarization prowess (see the retrieval sketch below).
  • Reasoning Mechanism: We will briefly explore this aspect, showcasing how LLMs dissect intricate issues to present actionable strategies. Further, we'll illustrate how platforms like LangChain can cohesively interlink these insights.

Join us as we navigate the expansive universe of LLMs and their transformative potential.
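As a minimal sketch of the retrieval step discussed above, stripped down to embeddings plus cosine similarity rather than LangChain's higher-level API, with illustrative documents and query:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Toy retrieval step of a retrieval-augmented pipeline: embed documents,
# embed the query, and pick the most similar chunk to hand to the LLM.
docs = [
    "Our VPN requires multi-factor authentication.",
    "Expense reports are due by the 5th of each month.",
    "The staging cluster is redeployed nightly at 02:00.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")   # small open embedding model
doc_vecs = model.encode(docs, normalize_embeddings=True)

query = "When do I need to submit expenses?"
q_vec = model.encode([query], normalize_embeddings=True)[0]

# With normalized vectors, the dot product equals cosine similarity.
scores = doc_vecs @ q_vec
best = docs[int(np.argmax(scores))]
print(best)   # the chunk you would paste into the LLM prompt as context
```

A vector database replaces the in-memory array once the corpus grows, but the mechanics are the same.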

Speaker Bio: Keven (Qi) Wang, a luminary in artificial intelligence and data science, boasts a rich tapestry of leadership roles across diverse industries. As the former CTO of Vionlabs, Keven spearheaded the vision for groundbreaking deep learning AI solutions in the media and entertainment space. At Curb Food, he established and scaled an engineering ecosystem, emphasizing data-driven decision-making. As H&M Group's Head of Machine Learning, Keven championed the growth of its global AI division, earning accolades for presenting pioneering AI strategies at esteemed tech summits.

Unlocking Collective Knowledge: Netlight's AI-Powered Journey Jacob Hagstedt - Netlight

Jacob Hagstedt, a senior Netlight consultant from Stockholm, will talk about how Netlight uses the power of AI and Python to harness the collective knowledge of the firm to bring value to clients in the best possible way. Jacob will take you on a journey that started many years ago with simple word2vec vectorisation of Slack questions as a naive baseline, up until today, when the tool has grown into an internal ecosystem of AI tooling working from within Slack to empower all of Netlight's consultants out in the world, bringing value to clients both via consultants and via the sales team.

Speaker Bio: Jacob is an ML/Data/Infra Engineer with a long history of working with language models. He has applied his expertise in areas like e-commerce, private equity / fintech, industrial tech, and more. His experience with Python started back in 2009 and he hasn't stopped since. If you hear the phrase "If it quacks like a duck, it is probably a duck" in the Netlight office, you know Jacob is close. Before Jacob joined Netlight he started two startups, one within NLP & NLG, and the second within industrial tech.

About the event

Date: October 26th, 17:30 - 20:30
Location: Gallerian (Regeringsgatan 25, 111 53 Stockholm)
Directions: 3-minute walk from T-Centralen (Sergels Torg exit).
Tickets: Sign up required. Anyone who is not on the list will not get in. The event is free of charge.
Capacity: Space is limited to 100 participants. If you are signed up but unable to attend, please let us know by October 25th.
Food and drinks: Pizza & drinks will be provided. If you have any allergies, please contact us.
Questions: Please contact the meetup organizers.

Code of Conduct

The NumFOCUS Code of Conduct applies to this event; please familiarize yourself with it before attending. If you have any questions or concerns regarding the Code of Conduct, please contact the organizers.

Maximizing AI for Collective Knowledge


THIS IS AN ONLINE EVENT [Connection details will be shared 1h before the start time]

The London Clojurians are happy to present: Title: Electric Clojure — compiler managed datasync for rich web apps Speaker: Dustin Getz Time: 2023-07-25 @ 18:30 (London time) Local time: https://time.is/1830_25_July_2023_in_London/ (click here for local time)

Dustin Getz (https://twitter.com/dustingetz) will be presenting: "Electric Clojure — compiler managed datasync for rich web apps"

Electric Clojure is a reactive DSL for full-stack web development, with compiler-managed frontend/backend network sync. Electric's mission is to bring the next generation of rich application interfaces within reach by abstracting over client/server network plumbing and building it directly into the language/runtime, much as the JVM does with managed memory.

Dustin will be presenting an overview of Electric, including everything you need to know to get started. He will also walk through the tutorial and explain all the tricky bits. Please watch the 10-minute lightning talk before this presentation, and feel free to DM him questions ahead of time on Slack @ Dustin Getz.

This will be an interactive talk, please turn your cameras on and interrupt with questions!

https://github.com/hyperfiddle/electric https://electric.hyperfiddle.net/ https://hyperfiddle.notion.site/UIs-are-streaming-DAGs-e181461681a8452bb9c7a9f10f507991

Dustin is the founder @ Hyperfiddle, where he has been working on this problem for a long time.

If you missed this event, you can watch the recording on our YouTube channel: https://www.youtube.com/@LondonClojurians (The recording will be uploaded a couple of days after the event.)

Please consider supporting the London Clojurians with a small donation: https://opencollective.com/london-clojurians/ Your contributions will enable the sustainability of the London Clojurians community and support our varied set of online and in-person events:

- ClojureBridge London: helping under-represented groups discover Clojure
- re:Clojure: our free-to-attend annual community conference
- monthly meetup events with speakers from all over the world
- subscription and admin costs such as domain name & Zoom plan for larger online meetups

Thank you to our sponsors:

- https://juxt.pro/
- https://flexiana.com/
- https://gaiwan.co/
- https://freshcodeit.com/
- https://nette.io/
- https://nilenso.com/
- And many individual sponsors

Electric Clojure — compiler managed datasync for rich web apps

🎙️ Speaker: Niall Oulton, Thomas Wiecki | ⏰ Time: 16:00 UTC / 9am PT / 12pm ET / 6pm Berlin

Join us for a dynamic discussion on building your own in-house marketing analytics solution and mastering marketing effectiveness:

In today's marketing landscape, understanding how your marketing efforts impact your business has become a critical component of success. This meetup will provide an exploration of marketing effectiveness measurement and why it's integral to long-term growth.

Join our expert panel as they navigate the pros and cons of outsourcing marketing analytics to an agency versus developing an in-house solution. The discussion will encompass agency benefits like broad industry expertise, advanced tools, and unbiased perspectives, weighed against the financial commitments, reduced autonomy, and transparency issues. On the other hand, in-house solutions offer control over data and processes, domain knowledge, responsiveness, confidentiality, and potential cost savings, but come with their own challenges.

We are excited to introduce a potential solution: PyMC-Marketing (https://www.pymc-marketing.io/en/stable/index.html). This Bayesian, open-source software offers a transparent, modifiable solution. It's an exciting advancement in the marketing analytics space, a perfect fit for building an in-house marketing measurement solution, as shown by Bolt in their latest case study: https://bolt.eu/en/blog/budgeting-with-bayesian-models-pymc-marketing/
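For a flavor of the Bayesian approach PyMC-Marketing builds on, here is a deliberately tiny sketch in plain PyMC, not the PyMC-Marketing API itself, that recovers channel effects from synthetic spend data:

```python
import arviz as az
import numpy as np
import pymc as pm

# Toy media-mix setup: weekly sales as a noisy linear function of two
# channels, with positivity constraints on the channel effects.
rng = np.random.default_rng(0)
tv = rng.gamma(2.0, 1.0, 104)        # two years of weekly TV spend
search = rng.gamma(2.0, 1.0, 104)    # weekly paid-search spend
sales = 3.0 + 1.2 * tv + 0.7 * search + rng.normal(0, 0.5, 104)

with pm.Model():
    intercept = pm.Normal("intercept", mu=0, sigma=5)
    beta_tv = pm.HalfNormal("beta_tv", sigma=2)       # spend shouldn't hurt sales
    beta_search = pm.HalfNormal("beta_search", sigma=2)
    sigma = pm.HalfNormal("sigma", sigma=1)
    mu = intercept + beta_tv * tv + beta_search * search
    pm.Normal("sales", mu=mu, sigma=sigma, observed=sales)
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=0)

# Posterior means should recover the true effects (1.2 and 0.7),
# with full uncertainty intervals rather than point estimates.
print(az.summary(idata, var_names=["beta_tv", "beta_search"]))
```

The transparency argument made above is visible here: every prior and likelihood choice is ordinary, inspectable Python.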

Don't miss this opportunity to gain insights from industry leaders and meet like-minded professionals who are passionate about leveraging data for marketing effectiveness. Whether you're a seasoned CMO or a marketing analyst, there's something for everyone in this meetup.

📜 Outline of Talk / Agenda:

  • 5 min: Intro to PyMC Labs and speakers
  • 45 min: Presentation, panel discussion
  • 10 min: Q&A

💼 About the speaker:

  1. Niall Oulton
Niall Oulton has built a reputation as a leading expert in the field of marketing analytics, with a specialization in Bayesian Marketing Mix Modelling. His career, spanning over a decade, has seen him on both sides of the business landscape - agency and client. His rich background provides him with a unique perspective, making him an expert in understanding and navigating the complexities of both worlds. Previously, Niall played an integral role in the development and deployment of an entire Bayesian MMM workflow at a global agency. This experience enabled him to gain valuable insight into the potential risks, benefits and pitfalls of in-housing a marketing effectiveness measurement programme.

🔗 Connect with Niall Oulton: 👉 LinkedIn: https://www.linkedin.com/in/nialloulton20/ 👉 Twitter: https://twitter.com/niall20 👉 GitHub: https://github.com/nialloulton 👉 Website: https://1749.io/

  2. Dr. Thomas Wiecki (PyMC Labs)
Dr. Thomas Wiecki is an author of PyMC, the leading platform for statistical data science. To help businesses solve some of their trickiest data science problems, he assembled a world-class team of Bayesian modelers and founded PyMC Labs -- the Bayesian consultancy. He did his PhD at Brown University studying cognitive neuroscience.

🔗 Connect with Thomas Wiecki: 👉 GitHub: https://github.com/twiecki 👉 Twitter: https://twitter.com/twiecki 👉 Website: https://twiecki.io/

📖 Code of Conduct: Please note that participants are expected to abide by PyMC's Code of Conduct.

🔗 Connecting with PyMC Labs: 👥 LinkedIn: https://www.linkedin.com/company/pymc-labs/ 🐦 Twitter: https://twitter.com/pymc_labs 🎥 YouTube: https://www.youtube.com/c/PyMCLabs 🤝 Meetup: https://www.meetup.com/pymc-labs-online-meetup/

🔗 Connecting with PyMC Open Source: 💬 Q&A/Discussion: https://discourse.pymc.io 🐙 GitHub: https://github.com/pymc-devs/pymc 💼 LinkedIn: https://www.linkedin.com/company/pymc/mycompany 🐥 Twitter: https://twitter.com/pymc_devs 📺 YouTube: https://www.youtube.com/c/PyMCDevelopers 🎉 Meetup: https://www.meetup.com/pymc-online-meetup/

[Online] Building an in-house marketing analytics solution
David Bader – guest, Tobias Macey – host

Summary Exploratory data analysis works best when the feedback loop is fast and iterative. This is easy to achieve when you are working on small datasets, but as they scale up beyond what can fit on a single machine those short iterations quickly become long and tedious. The Arkouda project is a Python interface built on top of the Chapel compiler to bring back those interactive speeds for exploratory analysis on horizontally scalable compute that parallelizes operations on large volumes of data. In this episode David Bader explains how the framework operates, the algorithms that are built into it to support complex analyses, and how you can start using it today.
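As a sketch of the workflow described above, assuming an arkouda_server running locally on its default port, the client-side code looks like NumPy while the heavy lifting happens on the server (connection details and array sizes here are assumptions for illustration):

```python
import arkouda as ak

# Connect the Python client to a running Chapel-backed arkouda_server;
# host and port are assumptions for a local single-node setup.
ak.connect(server="localhost", port=5555)

# These arrays live server-side; only handles exist in the notebook.
a = ak.randint(0, 10, 10**6)
b = ak.randint(0, 10, 10**6)

# Element-wise math and reductions look like NumPy but execute in
# parallel on the server; only small results come back to the client.
total = (a * b).sum()
print(total)

# A distributed group-by, the kind of aggregation that stays interactive
# even when the data no longer fits on one machine.
g = ak.GroupBy(a)
keys, counts = g.count()
print(keys.to_ndarray(), counts.to_ndarray())

ak.disconnect()
```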

Announcements

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management.
  • When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show!
  • Data stacks are becoming more and more complex. This brings infinite possibilities for data pipelines to break and a host of other issues, severely deteriorating the quality of the data and causing teams to lose trust. Sifflet solves this problem by acting as an overseeing layer to the data stack, observing data and ensuring it's reliable from ingestion all the way to consumption. Whether the data is in transit or at rest, Sifflet can detect data quality anomalies, assess business impact, identify the root cause, and alert data teams on their preferred channels. All thanks to 50+ quality checks, extensive column-level lineage, and 20+ connectors across the data stack. In addition, data discovery is made easy through Sifflet's information-rich data catalog with a powerful search engine and real-time health statuses. Listeners of the podcast will get $2000 to use as platform credits when signing up to use Sifflet. Sifflet also offers a 2-week free trial. Find out more at dataengineeringpodcast.com/sifflet today!
  • RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.
  • Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it's no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That's where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodc

AWS Azure BigQuery CDP Cloud Computing Data Engineering Data Lake Data Management Data Quality Databricks ETL/ELT GCP Java Kubernetes MongoDB MySQL postgresql Python Scala Snowflake Spark SQL Data Streaming
Data Engineering Podcast
Event How Music Charts 2019-04-09
Jason Joven – host @ Chartmetric, Rauf & Faik – musical artists

Highlights
  • Spotify's Viral 50 chart highlights music from around the world
  • Russian trap duo Rauf & Faik take off in the Caucasus region

Mission
Good morning, it's Jason here at Chartmetric with your 3-minute Data Dump, where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.

Date
This is your Data Dump for Tuesday, April 9th, 2019.

Chart Highlight
On Friday, April 5th, the Spotify Viral 50 highlighted the global nature of streaming by featuring a Philadelphia-born Puerto Rican artist remix, an Indonesian R&B collaboration, and a Russian hip-hop duo all in the top 10. The #1 spot is occupied by Philadelphia-born, Puerto Rico-raised artist Dalex along with six collaborators in the vibey rap track "Pa Mí - Remix". The original track was 3:30 long, but the new version is a full 6:00, making room for other Latin artists such as Argentina's Cazzu and Panama's Sech. It's released by the indie label Rich Music, which signed a distribution deal with Warner Music Latina back in 2017, and the remix itself was released on Feb 6th, spending 33 days so far on the Viral 50. The #2 track is slightly newer, released on Feb 21st, and that's the Indonesian R&B collaboration "Adu Rayu" by Yovie Widianto, Tulus and Glenn Fredly. Google-translated as "Flirtation", the easy-listening ballad is also featured on Top Hits Indonesia at 682K followers and on Apple Music's The A-List: Indonesian Music. The #10 track in last Friday's Viral 50 is Russian hip-hop duo Rauf & Faik with the track "Детство" (dee-YEHTS-vha), or "Childhood". It's currently at its peak position on the chart, still fresh-faced at only 3 days there, even though it was released half a year ago in Sept 2018. The dark, melodic trap song is part of Friday's genre tag trend, as the tag "trap music" showed up the most at 15 times, with "pop rap" showing up 10 times. Interestingly, the Viral 50 chart brings in the old and makes it new: 96% of the chart's tracks are older than one month, yet none of them have been on there for more than two, showing a tendency to take what's already been released and give it new life.

Artist Highlight in the News
The occupants of the #10 spot, Russian trap duo Rauf & Faik, are performing well stream-wise with 804K Spotify monthly listeners and 30K followers, for a listener-to-follower ratio of 26, putting them in the realm of Dutch DJ/producer Sam Feldt at 26.2 and Grammy-nominated American artist Tierra Whack at 26.1. Rauf & Faik seem to be making lots of waves in Turkey, as they are currently featured in the #1 position on the 50-track Hot Hits Türkiye playlist at 354K followers and in the #7 position on the Turkey Top 50 with 523K follows. On Apple Music, they are on four different Daily Top 100 charts, in Turkey, Uzbekistan, Bulgaria and Azerbaijan, the last of which is their country of origin, though according to our Instagram data, they are currently based in Moscow. Their 463K IG followers skew very young and female, with nearly 40% of their followers falling in the 13-17 female demographic. With 3/4 of their IG fanbase from Russia, Kazakhstan and Ukraine alone, and 80% of them speaking primarily Russian, the hope of an overseas crossover seems unlikely. But since their dark, spacious trap vibes are undoubtedly taking a page out of Post Malone's sonic playbook, you just never know.

Outro
That's it for your Daily Data Dump for Tuesday, April 9th, 2019. This is Jason from Chartmetric. Feel free to sign up for a free account at chartmetric.io/signup. Article links and show notes are at chartmetric.transistor.fm/episodes. Happy Tuesday, see you tomorrow.

Data Streaming
Becky G – Singer/artist @ Sony Music, Jason Joven – host @ Chartmetric, Kane Brown – Country artist @ Sony Music

Highlights
Amazon Music's "Country Heat" playlist features a who's who of the Nashville scene today, while country artist Kane Brown dips his toes into the Latin world with Becky G.

Mission
Good morning, it's Jason here at Chartmetric with your 3-minute Data Dump, where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world.

Date
This is your Data Dump for Friday, April 5th, 2019.

Playlist Highlight
Currently at the top of Amazon Music's Top Playlists is Country Heat, which promises "Fire tracks from the hottest acts in country music." At 55 songs and just over 3 hours long, it's, of course, a mostly American collection of artists, apart from Aussie Morgan Evans and Canadian group James Barker Band, who each land one track. In terms of label representation it's pretty well distributed, with Sony Music in the lead at 16 of the current tracks, Universal at 9, indie Big Machine and major Warner Music at 8, and BMG's BBR Music Group and indie label Big Loud Records at 4 each. Blake Shelton kicks things off in the #1 spot with his anthemic track "God's Country", and Chris Stapleton's "Millionaire", about how being in love is making him a rich man, brings up the rear at #55. Interestingly, Country Heat plays no artist favorites: out of the entire 55 tracks, no artist appears more than once, with the slight exception of Luke Combs, who placed his track "Beautiful Crazy" in the #20 spot but also appears as a guest on Brooks & Dunn's ode to devotion, "Brand New Man", at #51. So if you're looking for a hot country playlist with a diverse set of artists, check out Amazon's Country Heat.

Artist Highlight in the News
Now pulling into that playlist's #13 spot is Tennessee-born, Georgia-raised Kane Brown, who recently made an interesting move into the Latin world. On March 28th he released "Lost in the Middle of Nowhere" with Mexican-American artist Becky G, singing in both English and Spanish as the duo put out two versions of the track on the same day. Both signed to Sony Music, they have similar data profiles on Spotify, though Becky G pulls ahead of Kane Brown in some respects. Becky G's Spotify popularity index is 86 out of 100 while Brown's is 79, but he slightly edges her out with a listener-to-follower ratio of 5 to her 4. That gap is most likely because Becky G is on many more Spotify editorial playlists, 245 to Brown's 50, giving her far more exposure to accumulate followers. When it comes to daily statistics, Becky G is definitely exposing Brown to a wider crowd: she gains 5x more daily Spotify followers, 5x more daily YouTube views, and 15x more daily Instagram followers than Brown. Becky G's top five most-listened-to Spotify cities? Santiago, Mexico City, Buenos Aires, Madrid, and Lima. Kane Brown's are all stateside: Chicago, Dallas, Atlanta, LA, and Houston. Going by Instagram followers, Becky G's fanbase is young but fairly spread out across 13-34, while over half of Kane Brown's fanbase falls between 18 and 24. They both cater to a majority-female audience, though Becky G's fanbase is 50% Hispanic and Kane Brown's is 83% Caucasian, showing where the collaboration can provide the most exposure to new potential fans. The song itself caters to each audience: the country/English version features guitar power chords, acoustic strumming, and a straight backbeat drum pattern, while the Latin/Spanish version features reggae guitar rhythms and the trademark reggaetón beat. It looks like the cross-collaboration is going well for Brown, as the Spanish version has 3.2M more views than the English version. So to Kane Brown: felicitaciones!

Outro
That's it for your Daily Data Dump for Friday, April 5th, 2019. This is Jason from Chartmetric. If you want to check some of the data yourself, sign up for a free account at chartmetric.io/signup. Article links and show notes are at chartmetric.transistor.fm/episodes. Happy Friday, and have a great weekend! See you Monday.
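The Becky G vs. Kane Brown comparison boils down to a handful of numbers, so here is a small Python sketch that lays them out side by side (the figures are the episode's; the dictionary layout is our own, not a Chartmetric schema):

```python
# Spotify stats quoted in the episode; the structure below is illustrative only
artists = {
    "Becky G":    {"popularity": 86, "listener_follower_ratio": 4, "editorial_playlists": 245},
    "Kane Brown": {"popularity": 79, "listener_follower_ratio": 5, "editorial_playlists": 50},
}

for name, s in artists.items():
    print(f"{name}: popularity {s['popularity']}/100, "
          f"listener-to-follower ratio {s['listener_follower_ratio']}, "
          f"{s['editorial_playlists']} editorial playlists")
```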

Eric Siegel – author

"Mesmerizing & fascinating..." — The Seattle Post-Intelligencer
"The Freakonomics of big data." — Stein Kretsinger, founding executive of Advertising.com
Award-winning | Used by over 30 universities | Translated into 9 languages

An introduction for everyone. In this rich, fascinating — surprisingly accessible — introduction, leading expert Eric Siegel reveals how predictive analytics works, and how it affects everyone every day. Rather than a “how to” for hands-on techies, the book serves lay readers and experts alike by covering new case studies and the latest state-of-the-art techniques.

Prediction is booming. It reinvents industries and runs the world. Companies, governments, law enforcement, hospitals, and universities are seizing upon the power. These institutions predict whether you're going to click, buy, lie, or die. Why? For good reason: predicting human behavior combats risk, boosts sales, fortifies healthcare, streamlines manufacturing, conquers spam, optimizes social networks, toughens crime fighting, and wins elections. How? Prediction is powered by the world's most potent, flourishing unnatural resource: data. Accumulated in large part as the by-product of routine tasks, data is the unsalted, flavorless residue deposited en masse as organizations churn away. Surprise! This heap of refuse is a gold mine. Big data embodies an extraordinary wealth of experience from which to learn.

Predictive analytics unleashes the power of data. With this technology, the computer literally learns from data how to predict the future behavior of individuals. Perfect prediction is not possible, but putting odds on the future drives millions of decisions more effectively, determining whom to call, mail, investigate, incarcerate, set up on a date, or medicate.

In this lucid, captivating introduction — now in its Revised and Updated edition — former Columbia University professor and Predictive Analytics World founder Eric Siegel reveals the power and perils of prediction:
• What type of mortgage risk Chase Bank predicted before the recession.
• Predicting which people will drop out of school, cancel a subscription, or get divorced before they even know it themselves.
• Why early retirement predicts a shorter life expectancy and vegetarians miss fewer flights.
• Five reasons why organizations predict death — including one health insurance company.
• How U.S. Bank and Obama for America calculated — and Hillary for America 2016 plans to calculate — the way to most strongly persuade each individual.
• Why the NSA wants all your data: machine learning supercomputers to fight terrorism.
• How IBM's Watson computer used predictive modeling to answer questions and beat the human champs on TV's Jeopardy!
• How companies ascertain untold, private truths — how Target figures out you're pregnant and Hewlett-Packard deduces you're about to quit your job.
• How judges and parole boards rely on crime-predicting computers to decide how long convicts remain in prison.
• 182 examples from Airbnb, the BBC, Citibank, ConEd, Facebook, Ford, Google, the IRS, LinkedIn, Match.com, MTV, Netflix, PayPal, Pfizer, Spotify, Uber, UPS, Wikipedia, and more.

How does predictive analytics work? This jam-packed book satisfies by demystifying the intriguing science under the hood. For future hands-on practitioners pursuing a career in the field, it sets a strong foundation, delivers the prerequisite knowledge, and whets your appetite for more. A truly omnipresent science, predictive analytics constantly affects our daily lives. Whether

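As a toy illustration of what "learning from data to predict the future behavior of individuals" looks like in practice, here is a minimal scikit-learn sketch that scores churn risk; this is not an example from the book, and the features and data are invented:

```python
# Toy churn predictor: invented data, purely illustrative of the book's subject
from sklearn.linear_model import LogisticRegression

# Each row: [months_subscribed, support_tickets_last_90_days]; label 1 = canceled
X = [[2, 5], [3, 4], [1, 6], [24, 0], [18, 1], [30, 0], [6, 3], [12, 1]]
y = [1, 1, 1, 0, 0, 0, 1, 0]

model = LogisticRegression().fit(X, y)

# Put odds on the future for a new subscriber: 4 months in, 2 recent tickets
print(model.predict_proba([[4, 2]])[0][1])  # estimated probability of churn
```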
data data-science web-analytics google-analytics AI/ML Analytics Big Data IBM
O'Reilly Data Science Books

Maximize the Value of Business Intelligence with IBM Cognos v10 -- Hands-on, from Start to Finish

This easy-to-use, hands-on guide brings together all the information and insight you need to drive maximum business value from IBM Cognos v10. Long-time IBM Cognos expert and product designer Sangeeta Gautam thoroughly illuminates Cognos BI v10's key capabilities: analysis, query, reporting, and dashboards. Gautam shows how to take full advantage of each key IBM Cognos feature, including brand-new innovations such as Active Reports and the new IBM Cognos Workspace report consumption environment. She concludes by walking you through successfully planning and implementing an integrated business intelligence solution using IBM's best-practice methodologies. The first and only guide of its kind, it offers expert insights for BI designers, architects, developers, administrators, project managers, nontechnical end-users, and partners throughout all areas of the business — from sales and marketing to operations and lines of business. If you're pursuing official IBM Cognos certification, you'll also find Cognos certification sample questions and information to help you with the certification process.

IBM Cognos Business Intelligence v10 Coverage Includes
• Understanding IBM Cognos BI's components and open, extensible architecture
• Working with IBM Cognos's key "studio" tools: Analysis Studio, Query Studio, Report Studio, and Event Studio
• Developing and managing powerful reports that draw on the rich capabilities of IBM Cognos Workspace and Workspace Advanced
• Designing star schema databases and metadata models to answer the questions your organization cares about most
• Efficiently maintaining and systematically securing IBM Cognos BI environments and their objects
• Using IBM Cognos Connection as your single point of entry to all corporate data
• Building interactive, easy-to-manage Active Reports for casual business users
• Using the new IBM Cognos BI v10.1 Dynamic Query Mode (DQM) to improve performance with complex heterogeneous data
• Identifying, exploring, and exploiting hidden data relationships
• Creating quick ad hoc queries that deliver fast answers
• Establishing user and administrator roles

data data-science analytics-platforms Cognos BI IBM dimensional modeling Marketing
O'Reilly Business Intelligence Books
Gottfried Vossen – author , Gerhard Weikum – author

Transactional Information Systems is the long-awaited, comprehensive work from leading scientists in the transaction processing field. Weikum and Vossen begin with a broad look at the role of transactional technology in today's economic and scientific endeavors, then delve into critical issues faced by all practitioners, presenting today's most effective techniques for controlling concurrent access by multiple clients, recovering from system failures, and coordinating distributed transactions. The authors emphasize formal models that are easily applied across fields, that promise to remain valid as current technologies evolve, and that lend themselves to generalization and extension in the development of new classes of network-centric, functionally rich applications. This book's purpose and achievement is the presentation of the foundations of transactional systems, as well as the practical aspects of the field that will help you meet today's challenges.
• Provides the most advanced coverage of the topic available anywhere, along with the database background required for you to make full use of this material.
• Explores transaction processing both generically, as a broadly applicable set of information technology practices, and specifically, as a group of techniques for meeting the goals of your enterprise.
• Contains information essential to developers of Web-based e-Commerce functionality, as well as a wide range of more "traditional" applications.
• Details the algorithms underlying core transaction processing functionality.

data data-engineering
O'Reilly Data Engineering Books