Are you searching for a way to connect Snowflake with R (Studio)? Then you’ve arrived at the right blog! In this blog, a ‘how to’ is given on an easy connectivity possibility between R (Studio) and Snowflake; being able to read data from Snowflake and writing back a result set after a script has been executed in R Studio.
In case you want to try out Snowflake, you can quite easily set up a trial account yourself. For a brief explanation about what Snowflake is and what its benefits might be for your organization to start using it, you can watch our latest episodes of ‘McCoy TV’ by our colleague Joran de Vries.
To illustrate the connection you have made after following the steps of this blog, a dataset from the OpenSAP course on SAP Predictive Analytics is used. This dataset is about clustering stores based on certain KPI’s using kmeans algorithm. More details on these tables below:
In this blog, the following steps will be explained:
To be able to load data between the two systems, for example an ‘ODBC’ connection can be used. For this, the Snowflake ODBC driver needs to be setup. A guide on how to do this for Windows for example can be found here.
Parameters that are required as input for the driver are:
Once the driver has been downloaded and installed, the connectively can be tested, e.g. for:
(NOTE: it might be the case that the connection fails due to proxy settings. How to solve this is described here. And, even if the connection works when testing, still the same proxy-related error might occur when attempting to use the connection within R Studio).
The relevant ‘R Packages’ need to be made available in R Studio via the install.packages(“…”) command within R Studio:
(NOTE: if you get errors on the ‘R Tools’ version (or lack thereof), please have a look at the solution here).
After having installed the relevant Packages, they have to be ‘initialized’ via the library(…) command:
Then, a connection can be defined for the Snowflake ODBC driver:
If the connection is made successfully, you will see it in the ‘Connections’-tab within R Studio:
Once the connection has been established, the data can be loaded from Snowflake towards R Studio:
Afterwards, you can check the content in R Studio to verify that the data has been loaded successfully:
Now, basic R computations can be executed on the loaded R_TABLE_KMEANS_INPUT (e.g. changing the format from a ‘lazy’ tibble to a data frame) so that the kmeans function can be used:
Lastly, the result of R_TABLE_KMEANS_OUTPUT can be checked and loaded back to Snowflake. Where the result after the ‘kmeans’ algorithm in R Studio is:
And to append the data to the existing table in Snowflake, the dbWriteTable function within the DBI package is used, to prevent a temporary table not being properly loaded in Snowflake:
Now that the data should have been loaded successfully towards Snowflake, a check can be done if this is actually the case! As mentioned in the ‘Use Case’ section, a view combining both the INPUT and OUTPUT table will be used for this: