util-extractor

A library containing common utilities and helpers for extractors.


Installation


Install: npm i @flatfile/util-extractor
Package: @flatfile/util-extractor (3k installs)

The @flatfile/util-extractor utility is the shared base for Flatfile extractor plugins. It takes an uploaded file, hands its contents to a format-specific parser, and uses the Flatfile API to turn the parsed result into a workbook of structured data.

When embedding Flatfile, this plugin should be deployed in a server-side listener.

Code Breakdown

This code defines a shared utility used by the Flatfile extractor plugins. Its purpose is to process files and extract structured data from them using the Flatfile API. It relies on Flatfile SDK components and utilities, such as @flatfile/util-file-buffer, to perform the extraction. Let's break down the main components of the code:

Import Statements

  • The code starts by importing various dependencies and modules required for data extraction. These include FlatfileListener from @flatfile/listener, fileBuffer from @flatfile/util-file-buffer, api and Flatfile from @flatfile/api, and mapValues from remeda.

Extractor Function

  • The Extractor function is the main entry point of the utility. It takes three parameters:
    • fileExt: A string or regular expression representing the file extension(s) to be processed.
    • parseBuffer: A callback function that processes the file buffer and returns a WorkbookCapture.
    • options: An optional object containing additional configuration options for data extraction.
      • options.chunkSize: Specifies the quantity of Records per chunk. Default value is 10_000.
      • options.parallel: Determines how many chunks are processed simultaneously. Default value is 1.
  • Inside the Extractor function, a FlatfileListener is used to intercept files with the specified extensions and perform the extraction process.
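
The parameter handling described above can be sketched roughly as follows. This is a minimal illustration, not the utility's actual source: ExtractorOptions, resolveOptions, and matchesExtension are hypothetical names introduced here, and the text does not say how the real utility compares file names against fileExt. Only the chunkSize and parallel defaults come from the description above.

```typescript
// Hypothetical option shape; only chunkSize and parallel are named in the text.
interface ExtractorOptions {
  chunkSize?: number
  parallel?: number
}

// Fill in the documented defaults for any option the caller omits.
function resolveOptions(
  options: ExtractorOptions = {}
): Required<ExtractorOptions> {
  return {
    chunkSize: options.chunkSize ?? 10_000,
    parallel: options.parallel ?? 1,
  }
}

// Assumed matching rule: compare the text after the last dot against
// a string (case-insensitively) or test it against a regular expression.
function matchesExtension(fileName: string, fileExt: string | RegExp): boolean {
  const dot = fileName.lastIndexOf('.')
  if (dot === -1) return false
  const ext = fileName.slice(dot + 1)
  return typeof fileExt === 'string'
    ? ext.toLowerCase() === fileExt.toLowerCase()
    : fileExt.test(ext)
}
```

Accepting a regular expression lets a single extractor cover several related extensions, for example /^(csv|tsv)$/i, while a plain string keeps the common single-extension case simple.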

Extraction Process

  • When a file with the specified extension is received, the extraction process begins. The utility creates a job using the Flatfile API to track the extraction progress.
  • The parseBuffer function is then called to extract data from the file buffer using the provided options.
  • A workbook is created based on the extracted data using the createWorkbook function. The workbook is then updated with sheets containing the extracted data.
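
The order of these steps can be sketched as below. This is a simplified outline under stated assumptions, not the utility's real code: ExtractionClient and its methods are hypothetical stand-ins for the Flatfile API calls, runExtraction is an illustrative name, and the capture types are reduced to the minimum needed to show the flow.

```typescript
// Hypothetical capture shapes, simplified for this sketch.
type SheetCapture = { headers: string[]; data: Record<string, unknown>[] }
type WorkbookCapture = Record<string, SheetCapture>

// Hypothetical stand-in for the API calls; not the real Flatfile client.
interface ExtractionClient {
  startJob(fileName: string): Promise<string>
  createWorkbook(fileName: string, sheetNames: string[]): Promise<string>
  addRecords(
    workbookId: string,
    sheetName: string,
    rows: Record<string, unknown>[]
  ): Promise<void>
  completeJob(jobId: string): Promise<void>
}

async function runExtraction(
  client: ExtractionClient,
  fileName: string,
  buffer: Uint8Array,
  parseBuffer: (buffer: Uint8Array) => WorkbookCapture | Promise<WorkbookCapture>
): Promise<string> {
  // 1. Create a job so extraction progress can be tracked.
  const jobId = await client.startJob(fileName)

  // 2. Parse the raw file contents into captured sheets.
  const capture = await parseBuffer(buffer)

  // 3. Create the workbook, then fill each sheet with its extracted rows.
  const workbookId = await client.createWorkbook(fileName, Object.keys(capture))
  for (const [sheetName, sheet] of Object.entries(capture)) {
    await client.addRecords(workbookId, sheetName, sheet.data)
  }

  // 4. Mark the job as finished.
  await client.completeJob(jobId)
  return workbookId
}
```

Passing the client and parseBuffer in as arguments keeps the format-specific parsing separate from the shared bookkeeping, which is the division of labor the text describes between extractor plugins and this utility.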

Helper Functions

  • The code includes several helper functions to assist in the extraction process:
    • createWorkbook: Creates a Flatfile workbook based on the extracted data.
    • getWorkbookConfig: Generates the configuration for the workbook, including sheet names, fields, and constraints.
    • getSheetConfig: Generates the configuration for each sheet, including field names, labels, and constraints.
    • asyncBatch: A utility function that processes data in chunks asynchronously to improve performance.
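
To illustrate how chunkSize and parallel interact, here is one possible way to write a chunked, concurrency-limited batch helper. It is a sketch of the general technique rather than the utility's own asyncBatch: the signature and the worker-pool approach are assumptions, and only the two option names and their defaults are taken from the text.

```typescript
// Split `items` into chunks and run `callback` on up to `parallel` chunks at once.
async function asyncBatch<T, R>(
  items: T[],
  callback: (chunk: T[], index: number) => Promise<R>,
  options: { chunkSize?: number; parallel?: number } = {}
): Promise<R[]> {
  // Guard against zero or negative values so the loops below always terminate.
  const chunkSize = Math.max(1, options.chunkSize ?? 10_000)
  const parallel = Math.max(1, options.parallel ?? 1)

  const chunks: T[][] = []
  for (let i = 0; i < items.length; i += chunkSize) {
    chunks.push(items.slice(i, i + chunkSize))
  }

  const results: R[] = new Array(chunks.length)
  let next = 0

  // Each worker claims the next unprocessed chunk until none remain.
  async function worker(): Promise<void> {
    while (next < chunks.length) {
      const index = next++
      results[index] = await callback(chunks[index], index)
    }
  }

  const workerCount = Math.min(parallel, chunks.length)
  await Promise.all(Array.from({ length: workerCount }, () => worker()))

  // Results are returned in chunk order, regardless of completion order.
  return results
}
```

With parallel set to 1 the chunks are processed strictly one after another; raising it trades memory and API load for throughput.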

Type Definitions

  • The code defines two generic type structures: WorkbookCapture and SheetCapture. These types represent the structure of captured workbooks and sheets.
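
The text names these two types without listing their fields, so the shapes below are assumptions chosen for illustration: a workbook capture as a map from sheet name to sheet capture, and a sheet capture holding headers plus rows. FieldConfig, SheetConfig, toSheetConfig, and toWorkbookConfig are likewise hypothetical, simplified stand-ins showing how captures of this kind could feed the getSheetConfig and getWorkbookConfig helpers described earlier.

```typescript
// Assumed shapes; the real definitions may carry additional properties.
type SheetCapture = {
  headers: string[]
  data: Record<string, { value: string | number | boolean | null }>[]
}
type WorkbookCapture = Record<string, SheetCapture>

// Hypothetical, simplified configuration shapes for this sketch.
type FieldConfig = { key: string; label: string; type: 'string' }
type SheetConfig = { name: string; fields: FieldConfig[] }
type WorkbookConfig = { name: string; sheets: SheetConfig[] }

// Turn each captured header into a string field with a matching key and label.
function toSheetConfig(name: string, capture: SheetCapture): SheetConfig {
  return {
    name,
    fields: capture.headers.map(
      (header): FieldConfig => ({ key: header, label: header, type: 'string' })
    ),
  }
}

// Build one sheet configuration per captured sheet.
function toWorkbookConfig(
  name: string,
  capture: WorkbookCapture
): WorkbookConfig {
  return {
    name,
    sheets: Object.entries(capture).map(([sheetName, sheet]) =>
      toSheetConfig(sheetName, sheet)
    ),
  }
}
```

Treating every extracted column as a string field is a deliberately conservative choice for the sketch, since a raw file carries no type information of its own.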