Best Practices For Staging Data Into Snowflake

Much of the work of a Snowflake data engineer involves loading data into Snowflake tables and transforming it. Though this is easy to achieve, a number of best practices will help keep the process organised, scalable and performing well:

Use The Correct Staging Area Type - Snowflake offers different types of staging areas, including internal and external stages, and user, table and named stages. It is important to choose the option that organises your inbound data in a natural and controlled way. Usually this will be an access-controlled named stage, but not always.
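
As a minimal sketch, an access-controlled named internal stage might be set up along these lines (the loader_role, csv_format and sales_stage names, and the file format settings, are illustrative assumptions rather than anything prescribed):

```sql
-- Illustrative example: the role, file format and stage names are assumptions
CREATE FILE FORMAT IF NOT EXISTS csv_format
  TYPE = CSV
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  SKIP_HEADER = 1;

-- A named internal stage dedicated to inbound sales files
CREATE STAGE IF NOT EXISTS sales_stage
  FILE_FORMAT = (FORMAT_NAME = 'csv_format')
  COMMENT = 'Inbound sales files for the loading process';

-- Control who can read from and write to the stage
GRANT READ, WRITE ON STAGE sales_stage TO ROLE loader_role;
```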

Organise Staging Into Folders - Many people use staging areas as flat buckets of files, without realising that you can PUT into a directory path in order to better organise your data.
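
For instance, continuing with the illustrative sales_stage above, files can be uploaded into date-based sub-folders within the stage (the local path and folder layout are assumptions):

```sql
-- PUT runs from SnowSQL or a client driver rather than the web worksheet
-- Upload into a year/month sub-folder within the named stage
PUT file:///data/exports/orders_2023_11_30.csv @sales_stage/orders/2023/11/
  AUTO_COMPRESS = TRUE;

-- Confirm where the files landed
LIST @sales_stage/orders/2023/;
```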

Purging - The COPY INTO command supports a PURGE = TRUE option, which removes staged files once they have been successfully loaded, so the stage does not accumulate files that have already been processed.
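
A sketch of a load that tidies up after itself, assuming an orders table and the illustrative stage and file format from earlier:

```sql
-- Load the staged files and delete them from the stage once the load succeeds
COPY INTO orders
  FROM @sales_stage/orders/2023/11/
  FILE_FORMAT = (FORMAT_NAME = 'csv_format')
  ON_ERROR = 'ABORT_STATEMENT'
  PURGE = TRUE;
```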

Optimise Size Of Staged Files - There is a sweet spot for staged file size that maximises throughput and performance. The consensus is that data should be split into files of roughly 50-200 MB, though the optimal size for you will vary with factors such as the compute available and the structure of the data. Breaking your data into thousands of very small files, or uploading a single very large file, is likely to be sub-optimal.
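
One simple way to sanity-check file sizes is to LIST the stage, which reports each staged file's size in bytes; the stage path below continues the earlier illustrative example:

```sql
-- size is reported in bytes; look for files roughly in the 50-200 MB range
LIST @sales_stage/orders/2023/11/;

-- Optionally inspect the LIST output, largest files first
SELECT "name", "size"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
ORDER BY "size" DESC;
```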

Use A Separate Warehouse For Data Staging - It is worthwhile creating a separate virtual warehouse for your loading process so that loads do not hurt performance for people querying your database. This additional warehouse can be suspended immediately after the data load in order to reduce costs.
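
A minimal sketch of a dedicated loading warehouse, assuming the name load_wh and a small size as a starting point:

```sql
-- A warehouse used only for loading, separate from the warehouses serving queries
CREATE WAREHOUSE IF NOT EXISTS load_wh
  WITH WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND = 60          -- suspend 60 seconds after the load finishes
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;

-- Point the loading session at it before running COPY INTO
USE WAREHOUSE load_wh;
```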

Timeflow Systems
We Help Businesses Implement Real Time Data & Analytics Platforms