📄️ CSV
Understanding the Comma-Separated Values format: its role in ML, performance trade-offs, and best practices for ingestion.
📄️ Excel
Handling .xlsx and .xls files in ML pipelines: managing multi-sheet workbooks, data types, and conversion pitfalls.
📄️ JSON
Mastering JSON for Machine Learning: handling nested data, converting dictionaries, and efficient parsing for NLP pipelines.
📄️ Parquet
Understanding columnar storage, compression benefits, and why Parquet is often the preferred format for high-performance ML pipelines.
📄️ XML
Handling hierarchical data in XML: parsing techniques, its role in computer vision annotations, and converting XML to ML-ready formats.