State of DS Survey by Anaconda – Skill Gaps

According to the State of Data Science survey conducted by Anaconda, the top 5 most important skill gaps in data science are:
⭐️ Engineering skills
⭐️ Probability and statistics
⭐️ Business knowledge
⭐️ Big data management
⭐️ Communication skills
These skill gaps cut across multiple knowledge domains, from technical skills to soft skills, reflecting the…

Illustrated Stable Diffusion from Jay Alammar

If you want to know how Stable Diffusion generates amazing AI art but are put off by the steep technical details, this illustrated guide from Jay Alammar should help: https://jalammar.github.io/illustrated-stable-diffusion/. The guide breaks the model down into components and replaces complex equations with simple flowcharts. P.S. I also highly recommend his illustrated…

Short review of Designing Machine Learning Systems

I have finally finished “Designing Machine Learning Systems” after a few weekends of focused reading. It is one of the rare technical books that I have read in its entirety, and I thoroughly enjoyed it. Here is a quick review. The book is amazing in the following aspects:
✅ Provides a high-level overview of…

Automatic speech recognition – Whisper OpenAI

Whisper is a recently released transformer-based automatic speech recognition (ASR) model from OpenAI. It can be used for:
🗣 Language identification
🗣 Voice activity detection
🗣 Multilingual speech recognition
🗣 Multilingual speech translation
When evaluated on the ESB datasets (including LibriSpeech and Common Voice), Whisper outperformed Conformer RNN-T from NVIDIA and Wav2Vec2 from Meta.
Link to blog: https://openai.com/blog/whisper/
Link to repo:…

Data versioning

“Data versioning is like flossing. Everyone agrees it’s a good thing to do, but few do it.” ~ Chip Huyen, Designing Machine Learning Systems
Data versioning is much harder to implement than code versioning in data science / machine learning projects, for the following reasons:
➡️ Data is often…