Process data with Data Hooks®
Data Hooks®, are short functions to re-format, correct, validate, and enrich data automatically during a data import. They can run on specific fields and records (rows), or they can run across an entire batch of records.
Highlights:
Field Hooks
Field Hooks are functions that help to clean up or validate the input data of a field.
The three types of field hooks are cast
, compute
and validate
.
cast
cast
is present on every field. cast
converts the intial user-inputted data to the correct type needed for that field so it can be processed. Each of the basic field types comes prebuilt with a default cast
hook, except for OptionField
and LinkedField
. This hook ensures that compute
and validate
each receive a value of the type specified by the field:
Field type | Operating type |
---|---|
TextField | string |
NumberField | number |
DateField | Date |
OptionField | string (one of options ) |
BooleanField | boolean |
LinkedField | n/a |
When cast
cannot parse an incoming value, it will display an error message in the UI and retain the original value so users can edit that value. When cast
receives null
or undefined
, it returns null
. In both cases, compute
and validate
are not called.
We strongly recommend using the default cast
hook for most fields. Only override this hook if you are comfortable making fundamental changes to the field. See the examples section for an example of a custom cast function.
compute
compute
performs transformations to the value returned from cast
. Use compute
to modify data in place before proceeding to validate
.
compute
always receives data of the type determined by the field's cast
hook, and returns data of the same type.
In this example, we want to convert all employee names to lowercase:
const Employees = new Sheet("Employees", {
name: TextField({
label: "Name",
description: "Employee's name",
required: true,
// convert name to lowercase
compute: (name: string) => {
return name.toLowerCase();
},
}),
});
validate
Use validate
to define rules about the data you want to accept. validate
receives the value submitted to the field after it has been processed by cast
, compute
and any Record Hooks that are present on the field.
validate
takes a typed value and returns annotations. You can display an error, warning, or comments to the cell informing your user about what needs to be fixed. Validate does not return a value, only annotations.
In this example, we want to ensure that all employee salaries are above minimum wage. If a salary is below minimum wage, we display a message to the user:
const Employees = new Sheet("Employees", {
salary: NumberField({
label: "Salary",
description: "Annual Salary in USD",
required: true,
validate: (salary: number) => {
const minSalary = 30_000;
if (salary < minSalary) {
return [
new Message(`${salary} is less than minimum wage ${minSalary}`, "warn", "validate"),
];
}
},
}),
});
Record Hooks
Record Hooks are functions that operate on an entire record, or row, of data. Record hooks have access to all fields on a row, and should be used for operations where you need access to multiple fields, or want to create a new field.
There are two types of Record Hooks: recordCompute
and batchRecordsCompute
. They are defined in the Sheet options object.
const Employees = new Sheet(
"Employees",
{
// Fields
firstName: TextField({
required: true,
}),
lastName: TextField({
required: true,
}),
// Set by recordCompute
fullName: TextField(),
// Set by batchRecordsCompute
fromHttp: TextField(),
},
{
// Sheet options with Record Hooks
recordCompute: (record) => {
const fullName = `${record.get("firstName")} ${record.get("lastName")}`;
record.set("fullName", fullName);
return record;
},
batchRecordsCompute: async (payload: FlatfileRecords<any>) => {
const response = await fetch("https://api.us.flatfile.io/health", {
method: "GET",
headers: {
Accept: "application/json",
},
});
const result = await response.json();
payload.records.map(async (record: FlatfileRecord) => {
record.set("fromHttp", result.info.postgres.status);
});
},
},
);
recordCompute
recordCompute
runs across all fields in a single record. Use recordCompute
when you want to update values based on multiple fields in the record. In this example, we use the values of the firstName
and lastName
fields to set the value of the fullName
field:
recordCompute: (record) => {
const fullName = `${record.get("firstName")} ${record.get("lastName")}`;
record.set("fullName", fullName);
return record;
};
recordCompute
runs once for every single row of data, so avoid including any computationally intensive operations. External network requests cannot be made in recordCompute
.
batchRecordsCompute
batchRecordsCompute
runs after all recordCompute
calls have finished and receives batches of processed rows. By default, the batch size is 1000 rows.
You can use this hook to batch records together and perform bulk operations that are too time-intensive to run in recordCompute
. In most cases, this should only be used for external network requests.
In this example, we pass the input rows into a single API request that returns a list of updated records.
batchRecordsCompute: async (payload: FlatfileRecords<any>) => {
const response = await fetch("https://api.us.flatfile.io/health", {
method: "GET",
headers: {
Accept: "application/json",
},
});
const result = await response.json();
payload.records.map(async (record: FlatfileRecord) => {
record.set("fromHttp", result.info.postgres.status);
});
};
Since batchRecordsCompute
receives batches of rows, not the full set of uploaded rows, you cannot use it to reliably check for uniqueness or other sheet-wide constraints.
Example Data Hooks
Percentage
Suppose we want to accept data about loans, including the percent interest rate. We want to accept either percentages, e.g. "6.5%", or pure numbers representing a percentage, e.g. "0.065", and interpret them the same way. Suppose we want to use this data later to calculate monthly payments on the loan; in that case, we'd want both "6.5%" and "0.065" to be represented as the numerical version, 0.065.
This is a good use case for cast
. We want to accept different kinds of incoming data, in this case strings and numbers, and cast them to numbers depending on how they are formatted. This cast function has three steps:
- First, we check for
null
,undefined
, and empty strings, and cast them tonull
. - Next, we check if we have a number written as a percentage. If so, we extract the numerical portion and divide it by 100.
- For other strings, we attempt to parse them directly to numbers using
parseFloat
. If we can, we return them as-is; if we cannot, we surface an error to the user.
const Loans = new Sheet("Loans", {
rate: NumberField({
label: "Rate",
cast: (val) => {
// Cast the value to a string and get rid of whitespace
const stringVal = String(val).trim();
let stringNumericalVal;
let multiplier;
// Check for empty values
if (stringVal === "null" || stringVal === "undefined" || stringVal.length === 0) {
return null;
}
// Check if the number is written as a percentage
if (stringVal.charAt(stringVal.length - 1) === "%") {
stringNumericalVal = stringVal.substring(stringVal.length - 1);
multiplier = 0.01;
} else {
stringNumericalVal = stringVal;
multiplier = 1;
}
// Try to parse the remaining string as a number
let numericalVal = parseFloat(stringNumericalVal);
if (isNaN(numericalVal)) {
throw new Error(`'${val}' could not be parsed as a number or percentage.`);
} else {
return numericalVal * multiplier;
}
},
}),
});
Email or phone required
Suppose we want to collect information about people, and we need a contact method for them - either phone, email, or both. We can't mark them both required
, since we want to accept the record when one is filled out and the other is not.
This is a good use case for recordCompute
. Our hook needs information about two different fields on a record. We check for the presence of email
or phone
, and if both are empty, we surface a warning to the user.
const People = new Sheet(
"People",
{
email: TextField({
label: "email",
}),
phone: NumberField({
label: "phone number",
}),
},
{
recordCompute: (record) => {
const { email, phone } = record.value;
const isEmpty = (val) => val === null || val === undefined || val === "";
if (isEmpty(email) && isEmpty(phone)) {
record.addWarning(["email", "phone"], "Must have either phone or email");
}
},
},
);
Full name splitter
Suppose we want to collection information about people, and we need first and last name as separate fields. However, sometimes our users might have a "full name" field that we need to split and assign to first name and last name for them.
This is another good use of recordCompute
. In this case, instead of reading from two fields, we are writing to two fields (first name and last name), based on the value of a third (full name).
const People = new Sheet(
"People",
{
firstName: TextField({
label: "first name",
}),
lastName: TextField({
label: "last name",
}),
fullName: TextField({
label: "full name",
}),
},
{
recordCompute: (record) => {
const { fullName, firstName, lastName } = record.value;
const isEmpty = (val) => val === null || val === undefined || val === "";
if (isEmpty(firstName) && isEmpty(lastName) && fullName.includes(" ")) {
const parts = fullName.split(" ");
record
.set("firstName", parts[0])
.set("lastName", parts.slice(1, parts.length).join(" ").trim())
.addComment(["firstName", "lastName"], "Full name was split");
}
},
},
);
Verify country exists using an external API
Suppose we want to accept data about sales accounts, with a country
field to specify which country the account is based in. We want to verify that the countries submitted are real, but we don't want to maintain our own full list of all valid countries, with all their various spellings, since those names can change over time. Instead, we want to query an external API for a list of all valid country names at the time of import, and compare the user's input to that list.
This is good use case for batchRecordsCompute
. External HTTP requests can be slow. In order to avoid slowing down our import process by issuing a separate request for each record, we'd rather batch them together and query the external API with many records at once. batchRecordsCompute
executes once for each batch of records (1000 by default) and allows you to perform bulk actions on them. In this case, we query an imaginary API endpoint that gives us an array of all valid country names, and check our batch of records to make sure that their country
field appears on the list. If it does not, we remove the value from the field to prevent bad data from entering our systems.
const Accounts = new Sheet(
"Accounts",
{
accountName: TextField({
label: "account name",
}),
country: TextField({
label: "country",
}),
},
{
batchRecordsCompute: async (payload: FlatfileRecords<any>) => {
const response = await fetch("https://some_url.com/api/countries", {
method: "GET",
headers: {
Accept: "application/json",
},
});
const result = await response.json();
// Suppose we get data back as { countries: ['Afghanistan', 'Albania', ... ]}
const validCountries = result.countries;
payload.records.map(async (record: FlatfileRecord) => {
if (!validCountries.includes(record.value.country)) {
record.set("country", null);
}
});
},
},
);
Logging in Data Hooks
You can also choose to insert your own logging statements within the Data Hook code. This is useful for testing and debugging, as well as monitoring when something codes wrong. Logging statements can only be placed in Record Hooks at this time.
logger.info("your message here"): adds an information-level log message when executed
logger.warn("your message here"): adds a warning-level log message when executed
logger.error("your message here"): adds an error-level log message when executed
To review logs, navigate to the Logs section of the Flatfile dashboard, where you can see each event (import) that triggered a Data Hook. Logs contain your own logging messages as well as system-level information that indicate the result of any attempted Data Hook execution.
In this example, we're adding a logging message to the hook example above to indicate that a Full Name was successfully generated.
recordCompute: (record, session, logger) => {
const fullName = `${record.get('firstName')} ${record.get('lastName')}`
record.set('fullName', fullName)
logger.info(`Generated {record.get('fullName')}`)
return record
},