Timestamping when data has been created or updated is a regular occurrence in the developer world and if you're working with Postgres, the definition is simple:
create table products( --... created_at timestamptz not null default now(), updated_at timestamptz not null default now());
When preparing data for analysis, it's common to denormalize it in order to speed up the analysis process. It's common to see data pre-calculated and stored, especially if its historical data like ours is. I'm doing exactly this by creating a date
and year
field.
Postgres makes this kind of thing simple by offering generated
fields:
create table inms( id serial primary key, created_at timestamp not null, date date not null generated always as (created_at::date) stored, year int not null generated always as (date_part('year', created_at)) stored, --...);
Whenever the created_at
data changes, so will date
and year
because these are virtual columns that can't be changed directly.
Here's the entire table definition:
drop schema if exists enceladus cascade;create schema enceladus;set search_path='enceladus';create table inms( id serial primary key, created_at timestamp not null, date date not null generated always as (created_at::date) stored, year int not null generated always as (date_part('year', created_at)) stored, flyby_id int references flybys(id), altitude numeric(9,2) not null check(altitude > 0), source text not null check(source in('osi','csn','osnb','osnt')), mass numeric(6,3) not null check(mass >=0.125 and mass < 100), high_sensitivity_count int not null check(high_sensitivity_count > 0), low_sensitivity_count int not null check(low_sensitivity_count > 0), imported_at timestamptz not null default now());
To insert data into this table we need to execute an insert
statement (which is going to fail, but that's OK it's kind of in the plan). Notice that I don't need to insert all of the fields - the generated bits take care of themselves, as does the imported_at
timestamp due to its default:
insert into inms( created_at, altitude, source, mass, high_sensitivity_count, low_sensitivity_count)select sclk::timestamp, alt_t::numeric(9,2), source, mass_per_charge::numeric(6,3), c1counts::int, c1counts::intfrom csvs.inmswhere target='ENCELADUS';