
From medicine to data science


Massive amounts of healthcare data are generated every day in public hospitals, most of it sitting untouched in storage. This is especially true here in France, where data access policies are pretty strict, electronic health records (EHR) pretty new, and hospital-employed data scientists pretty much non-existent (well, let’s see what changes in the future with the recent announcements of our president and the Rapport Villani). Among other reasons, this realization led me to where I am today: I recently (yesterday, in fact) finished an MSc in Data Science at Ecole Polytechnique and Université Paris-Saclay, and I am planning to put what I’ve learned to use unraveling the mysteries hidden in all this dormant data.

As this is my first post on this blog, it will also serve as a brief introduction to my background. After medical school, tired of numbingly learning books by heart, I chose to pursue a residency in Public Health and Social Medicine to do a little more maths than other specialties allowed. My first year was divided into two research internships, one in biostatistics and one in epidemiology. In both, on top of being inspired by great mentors, I discovered how properly handled data could give powerful insights and have a huge impact on people’s health and well-being. I also realized how rarely “new” statistical approaches (i.e. machine learning, deep learning) appeared in actual studies and real-life applications. By that time, I’d already heard of a solid MSc in Data Science given by one of our top schools, so I sent my application, and voilà, here I am today, eager to start my new internship next week doing research on adverse drug reaction detection in large databases. More on that later.

I learned a tremendous amount this year, but it was hard. Very hard. The fact that I was actually accepted into an applied maths curriculum still surprises me to this day; coming from medicine (which you start right after high school in France), I met neither the mathematical nor the programming prerequisites. Sure, I did spend a year working on statistical analyses, and at that time I thought that I was ready to go deeper, but I soon realized how much I lacked in terms of mathematical foundations to truly get the most out of my Data Science MSc.

Having spent the summer digging out good resources to get me to a decent level in a limited amount of time, I figured I would start this blog by sharing what helped me prepare for this year and get through it alive (spoiler: I even enjoyed it). I will list, in no specific order, some of the courses I took, with the relevant resources linked underneath each one.

Convex Optimization

Convex optimization was one of the hardest subjects for me this year, drawing heavily on both analysis and linear algebra. It was also my very first lecture, which set the pace for what was to come.

Probabilistic Graphical Models / Bayesian Statistics

Those two subjects were heavily based on a very famous book in the data science field.

Machine Learning

Speech, Text and Natural Language Processing

Deep Learning

Learning with Aggregation

Compressed Sensing

One of the most interesting subjects this year, grounded in solid maths and with a lot of cool applications such as medical magnetic resonance imaging (MRI) or matrix completion (the so-called Netflix problem).

Kernel Methods for Machine Learning

And that’s it. Those were the resources that helped me make sense of what I was taught during lectures, and I’m sure I’ll come back to them on a regular basis. Do not hesitate to add other links in the comment section; I’m always on the lookout for great teaching content to learn from.