Data Science GO (“DSGO”) was a huge success. It was great to reconnect with the data science practitioners and leaders I met at DSGO 2018, and to learn about the latest trends and how much the space has changed in just one year. I felt a lot of pride seeing UCI’s MS in Business Analytics (“MSBA”) program as the main educational sponsor of the event, and I also enjoyed seeing my old professor, Sanjeev Dewan, speak on the main stage about causal inference methods in data science (that’s him in the picture above)!
The takeaways and resources below were some of my favorites from Day 1, and I plan to follow up with a second post covering DSGO Day 2 highlights. Shoot me a note if you have any questions, and I hope these help you!
Download the DSGO 2019 presentations (PDF format):
Day 1 Takeaways and Resources (by presenter):
- Sarah Aerni, Director of Data Science at Salesforce
- Embrace data science experimentation, especially on deployed models. For example, if a model performs poorly one month, do a deep dive to understand what is driving the poor performance.
- TransmogrifAI - This is Salesforce’s open-source AutoML library for structured data, written in Scala and running on top of Apache Spark.
- Use a test-driven development approach to data science, e.g., using synthetic data for scenario testing and model performance evaluation.
- Gabriela de Queiroz, Founder of R-Ladies and Data Science Manager at IBM
- At IBM, Gabriela works for the Center for Open-Source Data and AI Technologies (“CODAIT”). Their mission is to make “AI technologies accessible to the practitioners that understand real-world problems, so that they themselves can develop AI solutions that solve these problems.”
- CODAIT’s catalog of open-source projects is impressive… At the conference, we saw a demo of CODAIT’s Model Asset eXchange, a place for free, deployable, and trainable deep learning models.
- Some of the most accomplished data scientists that I know, I met through R-Ladies of Irvine. I was super impressed to find out that Gabriela is the founder of the R-Ladies Global organization.
- Sanjeev Dewan, Professor of Information Systems and Director of UCI’s MSBA Program
- Professor Dewan discussed a topic that I don’t see practitioners discussing nearly enough: causal inferences can be drawn not only from randomized experiments but also from non-experimental methods. The two examples he covered were Difference-in-Differences (“Diff-in-Diff”) and Propensity Score Matching.
- Diff-in-Diff estimation example: “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania,” by David Card and Alan Krueger
- Link to Diff-in-Diff study: Berkeley’s research library
- Propensity Score Matching (“PSM”) using R: R Tutorial on GitHub, with code
- Pawel Skrzypek, CTO at AI Investments Ltd.
- Some of my favorite machine learning problems are time-series financial forecasting problems, and Pawel really impressed the crowd here.
- He covered stacked LSTM and CNN models (ensemble models or “super learners”) for forecasting and portfolio optimization with AI.
- He educated me on the M4 Competition, one of the most important forecasting competitions on the planet and part of the long-running Makridakis (“M”) Competition series. This is a super interesting series that I plan to follow in the years to come.
- Not explicitly covered in Pawel’s presentation, but something that I found interesting: one of Uber’s leading data scientists, Slawek Smyl, was named the winner of the 2018 M4 Competition. He built a hybrid model combining exponential smoothing (“ES”) with a black-box recurrent neural network (“RNN”) forecasting engine, which he coded in C++ using DyNet, with R and SQL to manipulate and merge the output data. Link: M4 Forecasting Competition: Introducing a New Hybrid ES-RNN Model
- Hadelin de Ponteves, Co-Founder & Director at BlueLife AI
- Hadelin covered traditional natural language processing (“NLP”) approaches and provided the audience with a demo of the newest NLP method using Transformers (faster, more accurate, less training involved).
- Examples: Google’s BERT, Facebook’s RoBERTa, XLM, XLNet
- Learning resource: Transformers on Hugging Face
- Note: Around the time of this article, Google released a leaner version of BERT called “ALBERT” that uses parameter-reduction techniques and is topping the NLP leaderboards. Source: VentureBeat.com
- Ben Taylor, Co-Founder and Chief AI Officer of Zeff.ai
- I recently interviewed Ben for episode 21 of Scatter Podcast, so it was a treat to see, live, the first public demo of his AI-powered Xbox Live Call of Duty player enhancement tool (something that Xbox and Microsoft are unable to detect).
- For this project, he trained his AI models using video, audio, and controller signals recorded from one of the top Call of Duty players in the world. His ensemble had roughly 30 models running concurrently with low latency, and it was trained on a beast of an Intel server with 196 cores and 12TB of RAM.
- While online games are fun, Ben paints a sinister picture of what world governments could be doing, e.g., recording kill sequences from online games and using them to train AI agents that keep improving and could outperform humans if deployed in war… Scary!
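
For anyone new to the Diff-in-Diff idea from Professor Dewan's talk, here is a minimal sketch of the basic two-group, two-period estimate: the change in the treated group minus the change in the control group. The function name `diff_in_diff` and all of the numbers are my own toy assumptions for illustration. They are not data from the Card and Krueger study, and this is not code from the presentation.

```python
from statistics import mean

# Hypothetical toy outcomes (e.g., employees per store).
# These are made-up numbers, NOT data from the Card and Krueger study.
treated_before = [20.0, 22.0, 18.0]
treated_after = [21.0, 23.0, 19.0]
control_before = [24.0, 22.0, 23.0]
control_after = [22.0, 20.0, 21.0]


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Return the simple 2x2 difference-in-differences estimate."""
    # Change over time in the group that received the treatment.
    treated_change = mean(treated_after) - mean(treated_before)
    # Change over time in the comparison group, used as the counterfactual trend.
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


estimate = diff_in_diff(treated_before, treated_after, control_before, control_after)
print(f"Diff-in-Diff estimate: {estimate:.2f}")
```

The estimate can only be read as a causal effect if the two groups would have followed parallel trends without the treatment. In practice, the same quantity is usually estimated with a regression that includes a group-by-period interaction term, which also provides standard errors.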
Stay tuned for Day 2 notes, coming soon.