Transformation, Part 2

2 hours of SQL video using NASA's Cassini data. See why developers become data pros.
Timestamping when data has been created or updated is a regular occurrence in the developer world and if you're working with Postgres, the definition is simple:
<span class="hljs-keyword">create table</span> products(
<span class="hljs-comment">--...</span>
created_at timestamptz <span class="hljs-keyword">not null</span> <span class="hljs-keyword">default</span> now(),
updated_at timestamptz <span class="hljs-keyword">not null</span> <span class="hljs-keyword">default</span> now()
);
When preparing data for analysis, it's common to denormalize it in order to speed up the analysis process. It's common to see data pre-calculated and stored, especially if its historical data like ours is. I'm doing exactly this by creating a date and year field.
Postgres makes this kind of thing simple by offering generated fields:
<span class="hljs-keyword">create table</span> inms(
id serial <span class="hljs-keyword">primary key</span>,
created_at <span class="hljs-type">timestamp</span> <span class="hljs-keyword">not null</span>,
<span class="hljs-type">date</span> <span class="hljs-type">date</span> <span class="hljs-keyword">not null</span> generated always <span class="hljs-keyword">as</span> (created_at::<span class="hljs-type">date</span>) stored,
<span class="hljs-keyword">year</span> <span class="hljs-type">int</span> <span class="hljs-keyword">not null</span> generated always <span class="hljs-keyword">as</span> (date_part(<span class="hljs-string">'year'</span>, created_at)) stored,
<span class="hljs-comment">--...</span>
);
Whenever the created_at data changes, so will date and year because these are virtual columns that can't be changed directly.
Here's the entire table definition:
<span class="hljs-keyword">drop</span> schema if <span class="hljs-keyword">exists</span> enceladus cascade;
<span class="hljs-keyword">create</span> schema enceladus;
<span class="hljs-keyword">set</span> search_path<span class="hljs-operator">=</span><span class="hljs-string">'enceladus'</span>;
<span class="hljs-keyword">create table</span> inms(
id serial <span class="hljs-keyword">primary key</span>,
created_at <span class="hljs-type">timestamp</span> <span class="hljs-keyword">not null</span>,
<span class="hljs-type">date</span> <span class="hljs-type">date</span> <span class="hljs-keyword">not null</span> generated always <span class="hljs-keyword">as</span> (created_at::<span class="hljs-type">date</span>) stored,
<span class="hljs-keyword">year</span> <span class="hljs-type">int</span> <span class="hljs-keyword">not null</span> generated always <span class="hljs-keyword">as</span> (date_part(<span class="hljs-string">'year'</span>, created_at)) stored,
flyby_id <span class="hljs-type">int</span> <span class="hljs-keyword">references</span> flybys(id),
altitude <span class="hljs-type">numeric</span>(<span class="hljs-number">9</span>,<span class="hljs-number">2</span>) <span class="hljs-keyword">not null</span> <span class="hljs-keyword">check</span>(altitude <span class="hljs-operator">></span> <span class="hljs-number">0</span>),
source text <span class="hljs-keyword">not null</span> <span class="hljs-keyword">check</span>(source <span class="hljs-keyword">in</span>(<span class="hljs-string">'osi'</span>,<span class="hljs-string">'csn'</span>,<span class="hljs-string">'osnb'</span>,<span class="hljs-string">'osnt'</span>)),
mass <span class="hljs-type">numeric</span>(<span class="hljs-number">6</span>,<span class="hljs-number">3</span>) <span class="hljs-keyword">not null</span> <span class="hljs-keyword">check</span>(mass <span class="hljs-operator">>=</span><span class="hljs-number">0.125</span> <span class="hljs-keyword">and</span> mass <span class="hljs-operator"><</span> <span class="hljs-number">100</span>),
high_sensitivity_count <span class="hljs-type">int</span> <span class="hljs-keyword">not null</span> <span class="hljs-keyword">check</span>(high_sensitivity_count <span class="hljs-operator">></span> <span class="hljs-number">0</span>),
low_sensitivity_count <span class="hljs-type">int</span> <span class="hljs-keyword">not null</span> <span class="hljs-keyword">check</span>(low_sensitivity_count <span class="hljs-operator">></span> <span class="hljs-number">0</span>),
imported_at timestamptz <span class="hljs-keyword">not null</span> <span class="hljs-keyword">default</span> now()
);
To insert data into this table we need to execute an insert statement (which is going to fail, but that's OK it's kind of in the plan). Notice that I don't need to insert all of the fields - the generated bits take care of themselves, as does the imported_at timestamp due to its default:
<span class="hljs-keyword">insert into</span> inms(
created_at,
altitude,
source,
mass,
high_sensitivity_count,
low_sensitivity_count
)
<span class="hljs-keyword">select</span>
sclk::<span class="hljs-type">timestamp</span>,
alt_t::<span class="hljs-type">numeric</span>(<span class="hljs-number">9</span>,<span class="hljs-number">2</span>),
source,
mass_per_charge::<span class="hljs-type">numeric</span>(<span class="hljs-number">6</span>,<span class="hljs-number">3</span>),
c1counts::<span class="hljs-type">int</span>,
c1counts::<span class="hljs-type">int</span>
<span class="hljs-keyword">from</span> csvs.inms
<span class="hljs-keyword">where</span> target<span class="hljs-operator">=</span><span class="hljs-string">'ENCELADUS'</span>;
2 hours of SQL video using NASA's Cassini data. See why developers become data pros.