3 changes: 3 additions & 0 deletions pydata-eindhoven-2022/category.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
{
"title": "PyData Eindhoven 2022"
}
@@ -0,0 +1,24 @@
{
"description": "PyData Eindhoven 2022\n\nThis talk is about machine learning package development. I will speak about the pains and benefits it causes for developers and share why open sourcing makes the package even better. The talk is not focused on the package itself but rather on common problems so it will be interesting for a wide range of data scientists and python developers.\n\nHave you ever wondered how new open source packages emerge? This talk is exactly about it. I will tell how the idea of a package is born and how it transforms from the proof of concept to the first release version. How business could benefit from it and why the development itself is not the hardest part of open source development. And last but not least, why we changed the architecture of our package three times and why you would too!",
"duration": 1687,
"language": "eng",
"recorded": "2022-12-02",
"related_urls": [
{
"label": "Conference Website",
"url": "https://pydata.org/eindhoven2022/"
}
],
"speakers": [
"Andrei Alekseev"
],
"tags": [],
"thumbnail_url": "https://i.ytimg.com/vi/8bc0WJENcSY/maxresdefault.jpg",
"title": "Why does everyone need to develop a machine learning package?",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=8bc0WJENcSY"
}
]
}
@@ -0,0 +1,25 @@
{
"description": "PyData Eindhoven 2022\n\nIn the chip industry, time is money. Customers of ASML\u2019s lithography systems expect high uptimes. But expected and unexpected maintenance is part of that equation, sometimes requiring to halt the production temporarily.\n\nIn this presentation, we show you how we are building and deploying Machine Learning models to predict upcoming maintenance actions within the upcoming three months. Our work helps to boost productivity, maximize system utilization and reduce unexpected workload for ASML\u2019s customer support.\n\nIn the chip industry, time is money. Customers of ASML\u2019s lithography systems expect high uptimes. But expected and unexpected maintenance is part of that equation, sometimes requiring to halt the production temporarily.\n\nIn this presentation, we show you how we are building and deploying Machine Learning models to predict upcoming maintenance actions within the upcoming three months. Our work helps to boost productivity, maximize system utilization and reduce unexpected workload for ASML\u2019s customer support.",
"duration": 2375,
"language": "eng",
"recorded": "2022-12-02",
"related_urls": [
{
"label": "Conference Website",
"url": "https://pydata.org/eindhoven2022/"
}
],
"speakers": [
"Anjan Prasad Gantapara",
"Hamideh Rostami"
],
"tags": [],
"thumbnail_url": "https://i.ytimg.com/vi/mhXui-2bXz8/maxresdefault.jpg",
"title": "Predictive Maintenance at ASML",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=mhXui-2bXz8"
}
]
}
@@ -0,0 +1,24 @@
{
"description": "PyData Eindhoven 2022\n\nAn ever increasing number of people are discovering mobile grocery shopping as an alternative to brick-and-mortar supermarkets. This talk will cover how we can use machine learning to make these customers' grocery shopping as smooth and frictionless as possible. We do this by applying ML models that rank products in agreement with the customer\u2019s intent: e.g., by detecting personal shopping habits, and by striking a balance between query relevance and margin.\n\nIn online grocery, the wide range of available choices can easily overwhelm a customer. Moreover, failure to find the desired products may lead to customers not converting at all. It\u2019s therefore crucial to optimise ranking, in accordance with the customer\u2019s intent; and to construct sensible algorithms that capture this intended behaviour.\n\nIn this talk, I will provide a holistic view of how we approach ranking in the online grocery context. Depending on an app page\u2019s intended functionality, we might aim to make rebuying as frictionless as possible, while elsewhere we personalise search query relevance while not losing sight of margin. More concretely, I will discuss how we have set up ranking in an explainable and interpretable way that allows for a balance between relevance, profit and any other business-based concerns there might be. In addition, I will briefly discuss three algorithms that we have developed and implemented, and how these are combined to optimise the customer experience:\n- prediction of rebuying probabilities through detecting personal shopping habits\n- construction of unbiased search term-article relevances through structural position bias corrections\n- personalisation of search results while taking profitability into account\n\nThis talk will provide the application-minded Data Scientist with an inside view into the deliberations that inform our ranking algorithms and setup.",
"duration": 1832,
"language": "eng",
"recorded": "2022-12-02",
"related_urls": [
{
"label": "Conference Website",
"url": "https://pydata.org/eindhoven2022/"
}
],
"speakers": [
"Bas Vlaming"
],
"tags": [],
"thumbnail_url": "https://i.ytimg.com/vi/J7qfN8hl4rs/maxresdefault.jpg",
"title": "Everything in its Right Place: Optimising Ranking in Online Grocery",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=J7qfN8hl4rs"
}
]
}
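The "explainable and interpretable" blend of relevance, rebuy habits and margin described in the abstract above can be illustrated with a toy sketch. Everything below is hypothetical (the product data, weights and field names are invented for illustration, not taken from the talk): it simply sorts products by an explicit weighted combination of the three signals.

```python
# Toy sketch of intent-aware ranking: blend relevance, rebuy habit and margin.
# All data and weights are made up for illustration only.
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    relevance: float   # query-product relevance estimate, 0..1
    p_rebuy: float     # predicted probability the customer rebuys this item
    margin: float      # normalised profit margin, 0..1


def score(p: Product, w_rel=0.6, w_rebuy=0.3, w_margin=0.1) -> float:
    """Interpretable linear blend: each business concern gets an explicit weight."""
    return w_rel * p.relevance + w_rebuy * p.p_rebuy + w_margin * p.margin


def rank(products: list[Product]) -> list[Product]:
    # Higher blended score means a higher position on the page.
    return sorted(products, key=score, reverse=True)


catalog = [
    Product("oat milk", relevance=0.9, p_rebuy=0.8, margin=0.2),
    Product("soy milk", relevance=0.9, p_rebuy=0.1, margin=0.6),
    Product("almond milk", relevance=0.5, p_rebuy=0.2, margin=0.9),
]
print([p.name for p in rank(catalog)])  # the habitual rebuy item ranks first
```

Because the blend is a plain weighted sum, each ranking decision can be decomposed into per-signal contributions, which is one simple way to keep the trade-off between relevance and profit inspectable.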
@@ -0,0 +1,24 @@
{
"description": "PyData Eindhoven 2022\n\nWe present FuzzyTM, a Python library for training fuzzy topic models and creating topic embeddings for downstream tasks. Its modular design allows researchers to modify each software element and for future methods to be added. Meanwhile, the user-friendly pipelines with default values allow practitioners to train a topic model with minimal effort.\n\nThe volume of data/information created is growing exponentially and forecasted to reach 181 zettabyte by 2025. Approximately 80% of today\u2019s data is composed of unstructured or semi-structured data. Analyzing all this data is time intensive and costly in many cases. One technique to systematically analyze large corpora of texts is topic modeling, which returns the latent topics present in a corpus. Recently, several fuzzy topic modeling algorithms have been proposed and have shown superior results over the existing algorithms. Although various Python libraries offer topic modeling algorithms, none includes fuzzy topic models. Therefore, we present FuzzyTM, a Python library for training fuzzy topic models and creating topic embeddings for downstream tasks. Its modular design allows researchers to modify each software element and for future methods to be added. Meanwhile, the user-friendly pipelines with default values allow practitioners to train a topic model with minimal effort.",
"duration": 1691,
"language": "eng",
"recorded": "2022-12-02",
"related_urls": [
{
"label": "Conference Website",
"url": "https://pydata.org/eindhoven2022/"
}
],
"speakers": [
"Emil Rijcken"
],
"tags": [],
"thumbnail_url": "https://i.ytimg.com/vi/sD-KmuqqYPY/maxresdefault.jpg",
"title": "FuzzyTM: a Python package for fuzzy topic models",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=sD-KmuqqYPY"
}
]
}
@@ -0,0 +1,24 @@
{
"description": "PyData Eindhoven 2022\n\nAll organizations will need to become data-driven organizations, or they will go the way of the dinosaur. However, AI scales risk to organizational brand and profit. Trustworthy and Ethical AI are no longer luxuries, but business necessities. Let's explore together, why bias is not exclusive to AI, why technology has never been neutral and why Data Science has little to do with Science!\n\nMarc is AI & Ethics lead at KPMG, Managing Consultant, recovering Data Scientist and Public Speaker.\n\n00:00 - Fabian vd Berg - Opening Notes\n08:50 - Marc van Meel - AI Ethics in the Wild - Welcome to the Jungle",
"duration": 2822,
"language": "eng",
"recorded": "2022-12-02",
"related_urls": [
{
"label": "Conference Website",
"url": "https://pydata.org/eindhoven2022/"
}
],
"speakers": [
"Marc van Meel"
],
"tags": [],
"thumbnail_url": "https://i.ytimg.com/vi/uny310s1olk/maxresdefault.jpg",
"title": "AI Ethics in the Wild",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=uny310s1olk"
}
]
}
@@ -0,0 +1,24 @@
{
"description": "PyData Eindhoven 2022\n\nThe most popular data science development tools have largely been developed by academics as scratch pads for interactive data exploration. Jupyter notebooks, for instance, were developed 20 years ago at Berkeley (they were called iPython notebooks at the time). Because of their flexibility and interactivity, these tools have become widespread amongst coding data scientists. More recently, GUI-based tools have begun to be popular. They reduce the technical load on the user, but typically lack much needed flexibility and interoperability. Both avenues of innovation are wildly inadequate for modern data science development. GUI-based tools are typically too expensive, too restrictive, and too closed. The development of automated machine learning tools only made this problem worse, with dozens of software startups urging business analysts to start building machine learning solutions, often with questionable results and even more questionable customer retention metrics. On the other hand, notebook-based solutions are typically too error-prone, too loose, and too isolated to be sufficient. The result is intractable challenges around collaboration, communication, and deployment. The most recent entrants into the notebook space have only marginally improved the experience without fixing the underlying flaws. This talk discusses the fundamental flaws with the way these tools have been developed and how they currently function. Advancement in this space will require reworking the architecture and functionality of these tools at some of the most basic levels. These fixes include things like multiprocessing capabilities; real-time collaboration tools; safe, consistent code execution; easy API deployment; and portable communication tools. 
Future innovation in the data science development experience will have to tackle these problems and more in order to be successful.\n\nThe most popular data science development tools have largely been developed by academics as scratch pads for interactive data exploration. Jupyter notebooks, for instance, were developed 20 years ago at Berkeley (they were called iPython notebooks at the time). Because of their flexibility and interactivity, these tools have become widespread amongst coding data scientists. More recently, GUI-based tools have begun to be popular. They reduce the technical load on the user, but typically lack much needed flexibility and interoperability. Both avenues of innovation are wildly inadequate for modern data science development. GUI-based tools are typically too expensive, too restrictive, and too closed. The development of automated machine learning tools only made this problem worse, with dozens of software startups urging business analysts to start building machine learning solutions, often with questionable results and even more questionable customer retention metrics. On the other hand, notebook-based solutions are typically too error-prone, too loose, and too isolated to be sufficient. The result is intractable challenges around collaboration, communication, and deployment. The most recent entrants into the notebook space have only marginally improved the experience without fixing the underlying flaws. This talk discusses the fundamental flaws with the way these tools have been developed and how they currently function. Advancement in this space will require reworking the architecture and functionality of these tools at some of the most basic levels. These fixes include things like multiprocessing capabilities; real-time collaboration tools; safe, consistent code execution; easy API deployment; and portable communication tools. 
Future innovation in the data science development experience will have to tackle these problems and more in order to be successful.",
"duration": 2003,
"language": "eng",
"recorded": "2022-12-02",
"related_urls": [
{
"label": "Conference Website",
"url": "https://pydata.org/eindhoven2022/"
}
],
"speakers": [
"Greg Michaelson"
],
"tags": [],
"thumbnail_url": "https://i.ytimg.com/vi/aeD5ydmrTdY/maxresdefault.jpg",
"title": "Significant Roadblocks to Usefulness for Jupyter Notebooks and a Recipe to Fix them",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=aeD5ydmrTdY"
}
]
}
@@ -0,0 +1,24 @@
{
"description": "PyData Eindhoven 2022\n\nProcessing tabular data has been of the most common operations for data scientists and engineers for a while now. A few years ago, pandas was the single tool of reference for it, but is it still true today?\nIn this talk, we will review and compare the existing dataframe frameworks to see how they solve the challenges of performance, scalability and user experience.\n\nProcessing tabular data has been of the most common operations for data scientists and engineers for a while now. A few years ago, pandas was the single tool of reference for it, but is it still true today?\n\nThe increase in the size of the datasets and in the diversity of the use-cases has highlighted many challenges regarding performance, scalability and user experience. The ecosystem has evolved to now include many new alternatives, each of them tackling one or more of those dimensions differently. Some of them even put SQL back under the spotlight!\n\nIn this talk we will deep dive into the internals of tabular data processing and look at how the main players of the ecosystem work under the hood. After defining the fundamentals, we will zoom on their APIs and memory models through various examples, so that the audience can get an illustrated comparison between frameworks.",
"duration": 2099,
"language": "eng",
"recorded": "2022-12-02",
"related_urls": [
{
"label": "Conference Website",
"url": "https://pydata.org/eindhoven2022/"
}
],
"speakers": [
"Harizo Rajaona"
],
"tags": [],
"thumbnail_url": "https://i.ytimg.com/vi/AMq8qZZwlYc/maxresdefault.jpg",
"title": "A Tour of the Many DataFrame Frameworks",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=AMq8qZZwlYc"
}
]
}
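The "memory models" this abstract alludes to can be made concrete without any framework: the same tiny table stored row-wise (a list of records) versus column-wise (one array per column, the layout Arrow-backed engines use). The data is invented purely for illustration.

```python
# Minimal, framework-free illustration of two tabular memory layouts:
# row-oriented (list of records) vs column-oriented (one array per column).
rows = [
    {"city": "Eindhoven", "visitors": 320},
    {"city": "Utrecht", "visitors": 210},
    {"city": "Delft", "visitors": 150},
]

# Column-oriented: each column is a contiguous sequence, which is what
# columnar engines scan when aggregating a single field.
columns = {
    "city": [r["city"] for r in rows],
    "visitors": [r["visitors"] for r in rows],
}

# Aggregating one field touches every record in the row layout...
total_rowwise = sum(r["visitors"] for r in rows)
# ...but only a single array in the columnar layout.
total_columnar = sum(columns["visitors"])
assert total_rowwise == total_columnar == 680
```

Both layouts hold identical data; they differ in which access patterns are cheap, which is one of the axes on which dataframe frameworks diverge.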
@@ -0,0 +1,24 @@
{
"description": "Devcontainers are an open-source specification, which allow you to connect your IDE to a running Docker container and develop right inside it. This has numerous advantages. Because the dev environment is now formally defined, it is reproducible. This means others can easily reproduce your dev environment, too! This makes it much easier for others to join in on your project, and stay updated with changes to the environment.\n\nIn this talk, you will learn: why you might want to use a Devcontainer for your project (or when not \ud83d\ude09), what exactly a Devcontainer is, and how you can build one for your Python project \ud83d\udc0d.\n\nDevcontainers have been gaining traction lately. Whereas previously the technology existed only in the umbrella of Visual Studio Code, it is now released as an open specification. Such, multiple IDE's could all use the same standard specification, promoting reusability and standardisation. That said, Developers are currently hard at work at pushing the technology to become standardised. Especially for these reasons, this is an exciting time to take a closer look at this new specification, and at what the technology can do for us in general.\n\nSo how will I go about this talk? Let's take a look \ud83d\ude4c\ud83c\udffb.\n\n\ud83d\udcdd Talk setup\nLet's learn about Devcontainers together. This will be the setup of my talk:\n\nWhy Devcontainers? What problem do they aim to solve? 
Pro's & Con's.\nBuilding a basic Devcontainer from scratch\nOpening up the Devcontainer\nExtending the Devcontainer with more useful features\nCustom VSCode settings\nRunning your CI task in the Devcontainer\nConnecting as a non-root user\nOpening up a port to the Devcontainer\nGoing further \ud83d\udd2e\nMore useful links & resources\nConcluding \u2713\n\ud83c\udfe1 What you will take home\nAt the end of the talk, you will be taking home the following:\n- When it makes sense to create one\n- How you can create one\n- Knowledge on how Devcontainers work\n- A template repo for a Python project Devcontainer",
"duration": 2161,
"language": "eng",
"recorded": "2022-12-02",
"related_urls": [
{
"label": "Conference Website",
"url": "https://pydata.org/eindhoven2022/"
}
],
"speakers": [
"Jeroen Overschie"
],
"tags": [],
"thumbnail_url": "https://i.ytimg.com/vi/SLsaCdRAV0U/maxresdefault.jpg",
"title": "How to create a Devcontainer for your Python project",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=SLsaCdRAV0U"
}
]
}
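A minimal devcontainer.json for a Python project, following the open Dev Containers specification the talk describes, might look like the sketch below. The image tag, forwarded port and extension list are illustrative assumptions, not taken from the talk.

```json
{
  "name": "python-project",
  "image": "mcr.microsoft.com/devcontainers/python:3.11",
  "customizations": {
    "vscode": {
      "extensions": ["ms-python.python"]
    }
  },
  "forwardPorts": [8000],
  "remoteUser": "vscode"
}
```

Placed at `.devcontainer/devcontainer.json` in the repository root, a spec-compliant editor can offer to reopen the project inside the container, covering the "opening up the Devcontainer", port-forwarding and non-root-user steps listed in the talk setup.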
@@ -0,0 +1,24 @@
{
"description": "PyData Eindhoven 2022\n\nCode archaeology is figuring out what a thing is for, who built it, and how you can get it to run again.\nDealing with legacy code artefacts (while under time pressure) is something we data people encounter a lot in daily life. I will tell about my experiences from both a research and software engineering standpoint. After quickly going over some common sense approaches, I will dive deeper into real-world archaeology and digital forensics, and find out what we can learn from these fields to make dealing with old artefacts a bit easier. Expect a mix of code and non-code hacks, with ample pop culture archaeology memes.\n\nContents:\n- Code archaeology: why do we do it, and do we need to bring a hat?\n- The basics: common sense approaches to code archaeology\n- What can we learn from real-world archaeologists?\n- What can we learn from digital forensics?",
"duration": 1895,
"language": "eng",
"recorded": "2022-12-02",
"related_urls": [
{
"label": "Conference Website",
"url": "https://pydata.org/eindhoven2022/"
}
],
"speakers": [
"Judith van Stegeren"
],
"tags": [],
"thumbnail_url": "https://i.ytimg.com/vi/5KItp5WJY3o/maxresdefault.jpg",
"title": "Practical code archaeology",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=5KItp5WJY3o"
}
]
}