11 docs tagged with "data-engineering"

CSV: The Universal Data Language

Understanding the Comma-Separated Values format: its role in ML, performance trade-offs, and best practices for ingestion.

Mastering the challenges of high-velocity sensor data: the MQTT protocol, edge processing, and time-series ingestion.

Understanding telemetry, event tracking, and edge-to-cloud data ingestion for mobile-first machine learning.

Mastering the techniques for harvesting data from the internet: REST APIs, GraphQL, and automated web scraping.

Identifying and integrating various data sources: from relational databases and APIs to unstructured web data and IoT streams.

Handling .xlsx and .xls files in ML pipelines: managing multi-sheet workbooks, data types, and conversion pitfalls.

Mastering JSON for machine learning: handling nested data, converting dictionaries, and efficient parsing for NLP pipelines.

A deep dive into REST and GraphQL APIs: how to fetch, authenticate, and process external data for machine learning.

Understanding columnar storage, compression benefits, and why Parquet is the preferred format for high-performance ML pipelines.

Comparing relational and non-relational databases: choosing the right storage for your machine learning features and labels.

Handling hierarchical data in XML: parsing techniques, its role in computer vision annotations, and converting XML to ML-ready formats.