Mapping
Mapping
In Flatfile, mapping describes the process of getting data from an uploaded file into a destination worksheet in your application.
The Ghost of Mapping Past
Historically mapping in Flatfile has consisted of two pieces:
- A field mapping which describes which columns in the source data corresponds to which fields in the destination worksheet. For example, maybe the “fname” column in the source data should be mapped to the “First Name” field in the destination worksheet.
- For every destination field of type enum, an enum mapping describing which values in the source column correspond to which allowable enum values in the destination column. For example, you could have an enum column of country codes, and you might need the source value “United States” to map to the enum value “US”.
Flatfile has developed machine-learning models trained on millions of historical mappings that can suggest both field and enum mappings based on the column names and values in the source data. You can of course override these suggestions. The Flatfile platform remembers your previously chosen mappings and will suggest them when it sees similar schemas / data in the future.
The Ghost of Mapping Present
Many users of Flatfile need more control over the mapping process than just assigning source fields to destination fields. For example, you might need to concatenate multiple source fields into a single destination field, or you might need to choose the first non-empty value from a set of source fields.
To this end, we’ve recently introduced a wider variety of mapping rules that specify these more complex mapping-related transformations on your data, and the notion of a mapping program that selectively applies these rules to your dataset.
As of right now the only rule type supported by the Flatfile application is the
assign
rule, as described above.
However, we have introduced a Python SDK that allows you to create your own mapping rules and mapping programs and apply them to your own datasets outside of Flatfile (e.g. in an ETL pipeline). The SDK is currently in extreme beta, but we would love to hear your feedback on it. It supports the full complement of mapping rules, as described on its website.
If you have a Flatfile secret key, the SDK also exposes an “automapping” feature
that uses the previously-described machine learning models to suggest mapping
programs for a given source and destination schema. As with the Flatfile app,
this feature currently only provides assign
rules (and also ignore
rules
that don’t do anything but just communicate that a source field was not mapped).
The Ghost of Mapping Future
In the near future we will be expanding mapping in several ways:
Richer Mapping Rules in the Flatfile application
We are working on adding support for the full set of mapping rules in the Flatfile application. This will allow a user mapping an incoming file to create complex mapping programs, save them, and reuse them.
Richer AI Assist for Mapping Rules
Currently our machine learning algorithms only suggest “assign” rules. In the future they will also be able to suggest more complex mapping rules; for instance, if your incoming data had a “first name” and “last name” field, and your destination sheet only a “full name” field, the AI might suggest a “concatenate” rule that combines the two source fields into the destination field.
In addition, we are working on a natural-language interface for creating mapping rules; for instance, you could type “concatenate first name and last name into full name” and the AI would produce the appropriate rule.
JavaScript SDK
We are working on a JavaScript SDK that has similar functionality to the Python SDK, the most notable difference being that it can’t interact with Pandas dataframes. It will be available soon.