In this blog series, we’re proud to shine a light on some of the top Capstone projects from the fifth graduating class of Data Science for All / Colombia. Capstone projects are a critical component of the program’s curriculum. Teams work on projects together to apply what they learn during the program to a real-world data problem. These projects were sourced directly from public and private entities in Colombia and address real problems those entities face. Through these projects, our graduates learn practical, job-oriented data skills and give back to their community using the power of data science & AI.
David graduated from Pontificia Universidad Javeriana Cali with a bachelor’s degree in electronics engineering. His research interests include digital signal processing, computer vision, software engineering, and telecommunications. He joined DS4A Colombia 2021 to gain top-level training in data science and to develop the skills and tools needed to process data and extract insights from it more professionally. David currently works as a back-end engineer at a global software delivery firm, where he is applying what he learned in DS4A to data processing in a project aimed at career growth within the company.
Natalia is currently studying Computer Science at the Universidad Nacional de Colombia. She is interested in artificial intelligence from a theoretical and philosophical perspective, as well as its applications in computer vision and self-driving cars.
She decided to join DS4A Colombia 2021 because of the program's quality and its practical approach to data science, which she saw as a way to improve her abilities in the field. After the program, she hopes to keep learning about machine learning and to apply the knowledge she acquired to new projects.
Nicolás graduated cum laude from Universidad de los Andes with a master’s degree in industrial engineering. His research interests include variants of shortest path problems and vehicle routing problems. He joined DS4A Colombia 2021 to increase his knowledge of machine learning methods and to learn key concepts that could improve the quality of his future research. He is currently a PhD student at HEC Montréal, where he works with professors Jean-François Cordeau and Jorge Mendoza on a challenging technician routing and scheduling problem proposed by one of the biggest electricity companies in Europe.
Julián Monsalve Acevedo is an industrial engineer graduated from the Universidad Nacional de Colombia, currently pursuing a specialization in analytics at the same university.
He decided to apply to DS4A Colombia 2021 to strengthen his skills in machine learning techniques, data processing, and statistical inference, among the other skills a data scientist needs.
Julián currently works at Rappi, a technology-based startup that has grown very rapidly in recent years. He plans to apply everything he learned in the DS4A program to help the company sustain that growth.
Daniel graduated from Universidad de los Andes with a BS in Industrial Engineering and is currently one semester away from a degree in Electronic Engineering. His research interests include supervised learning, computer vision, and investment banking. He joined DS4A Colombia 2021 cohort 5 to deepen his knowledge of machine learning and continue his training in data science. He currently works as a data analyst at Frubana, a startup.
About the Project: Identifying Children at Malnutrition and Relapsing Risk through ML Models
This project aims to identify the socio-demographic variables most related to relapse into malnutrition among children under five years of age, and to estimate the probability of a first occurrence of, and a relapse into, this nutritional disorder. We were mainly motivated by the project's social purpose, since its main beneficiaries are the children of Colombia.
Click to read the datafolio
One of the most enlightening moments of the project was defining the methodology for calculating our model's target. A single child could have samples from up to three different years, and we were not sure whether to merge them into a single record or handle them some other way. With the help of the ICBF (Instituto Colombiano de Bienestar Familiar) and our TAs, we found a solution: pooling each child's annual samples to create overall target variables.
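The post doesn't spell out the exact labeling rules, so what follows is only a minimal Python sketch of the general idea of pooling a child's yearly samples into one set of targets. The record layout `(child_id, year, malnourished)`, the function name `pool_targets`, and the rule that a relapse means "malnourished, then a recovered year, then malnourished again" are all hypothetical assumptions for illustration, not the team's actual method.

```python
from collections import defaultdict


def pool_targets(records):
    """Pool yearly samples into one target row per child.

    `records` is an iterable of (child_id, year, malnourished) tuples.
    HYPOTHETICAL labeling rules, for illustration only:
      - ever_malnourished: malnutrition recorded in any year
      - relapse: malnourished, then a non-malnourished year, then malnourished again
    """
    # Group every yearly sample under its child.
    by_child = defaultdict(list)
    for child_id, year, malnourished in records:
        by_child[child_id].append((year, malnourished))

    targets = {}
    for child_id, samples in by_child.items():
        samples.sort()  # chronological order by year
        statuses = [m for _, m in samples]

        relapse = False
        had_episode = False  # malnutrition seen in an earlier year
        recovered = False    # a non-malnourished year after that episode
        for malnourished in statuses:
            if malnourished and had_episode and recovered:
                relapse = True
                break
            if malnourished:
                had_episode = True
            elif had_episode:
                recovered = True

        targets[child_id] = {
            "ever_malnourished": any(statuses),
            "relapse": relapse,
        }
    return targets
```

For example, a child recorded as malnourished in 2018, not malnourished in 2019, and malnourished again in 2020 gets `relapse=True`, while a child malnourished only in 2020 gets `ever_malnourished=True` and `relapse=False`. Pooling this way turns up to three yearly rows into a single labeled row per child, which is what a per-child classifier needs.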
The most significant challenges throughout our project were data processing and feature engineering. First, we had to handle two large databases totaling 15 GB, so we used Google Colab and processed the data in chunks. We also had to decide how to label the targets, tune the length of the time window for predicting relapse and first-ever malnutrition, and select the most meaningful features among the more than 70 available.
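Chunked processing keeps only part of a large file in memory at a time: each chunk produces a partial result that is merged into a running total. Here is a minimal standard-library Python sketch of that pattern. The helper names `iter_chunks` and `count_by_column`, and the `region` column mentioned below, are hypothetical; the team's actual Colab pipeline isn't described in this post. In pandas, passing `chunksize` to `read_csv` gives the same chunk-by-chunk iteration.

```python
import csv
from collections import Counter
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most `chunk_size` rows, so only one chunk is held in memory."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def count_by_column(lines, column, chunk_size=100_000):
    """Stream CSV lines (e.g. an open file) and count the values of `column`.

    Each chunk is aggregated on its own and merged into a running Counter,
    so memory use depends on chunk_size, not on the total file size.
    """
    reader = csv.DictReader(lines)
    totals = Counter()
    for chunk in iter_chunks(reader, chunk_size):
        totals.update(row[column] for row in chunk)
    return totals
```

With a real file you would call something like `count_by_column(open(path, newline=""), "region")`; the same loop structure works for any aggregate that can be computed per chunk and then combined, such as sums, counts, or per-child groupings keyed by an ID.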
Our team mentors were Miguel Fernando Pire and Nicolás Escobar. They played a vital role in defining the project's scope, suggested machine learning techniques, and helped us build on our strengths. We are very grateful to them.
Our project can have a major impact on children in Colombia. With the proposed methodology, the ICBF can identify the children at greatest risk of malnutrition, whether for the first time or as a relapse. Once these children are identified, the ICBF can focus its sometimes limited resources on them. This way, even children at high risk may be spared from malnutrition, or from suffering it again.
Congratulations to this team, their mentors, and TAs on this accomplishment!
If you're interested in joining our Data Science for All mission, whether by recruiting our Data Science for All fellows or by becoming a mentor, please get in touch.