plugin-xlsx-extractor

A plugin for parsing xlsx files in Flatfile.


plugin-autocast

A plugin for automatically casting values in Flatfile.

plugin-automap

A plugin to provide automapping imported files for headless workflows.

plugin-constraints

A plugin for extending blueprint with external constraints

plugin-convert-currency

A Flatfile plugin for currency conversion using Open Exchange Rates API

plugin-convert-json-schema

A plugin for converting JSON Schema to Flatfile Blueprint and configuring a...

plugin-convert-openapi-schema

A plugin for converting OpenAPI schema to Flatfile Blueprint.

plugin-convert-sql-ddl

A plugin for converting SQL DDL into Flatfile Blueprint.

plugin-convert-translate

A Flatfile Listener plugin for field translation using the Google Translate...

plugin-convert-what3words

A Flatfile plugin for converting What3Words addresses to standard addresses...

plugin-convert-yaml-schema

A plugin for converting YAML Schema definitions to Flatfile Blueprint.

plugin-dedupe

Dedupe records in a sheet via a sheet level custom action.

plugin-delimiter-extractor

A plugin for parsing .delimiter files in Flatfile.

plugin-dxp-configure

A plugin for using DXP class-based configurations.

plugin-enrich-geocode

A Flatfile plugin for geocoding addresses using the Google Maps Geocoding A...

plugin-enrich-gpx

A Flatfile plugin for parsing GPX files and extracting relevant data

plugin-enrich-sentiment

A Flatfile plugin for sentiment analysis of text fields in records

plugin-enrich-summarize

A Flatfile plugin for text summarization and key phrase extraction

plugin-export-delimited-zip

A Flatfile plugin for exporting Workbooks to delimited files and zipping th...

plugin-export-pivot-table

A Flatfile plugin for generating pivot tables from sheet data and saving as...

plugin-export-workbook

A plugin for exporting data in Flatfile to Workbooks.

plugin-extract-html-table

A Flatfile plugin for extracting table data from HTML files

plugin-extract-markdown

A plugin for parsing markdown files in Flatfile.

plugin-import-faker

A Flatfile plugin that generates example records using Faker

plugin-import-llm-records

A Flatfile plugin that generates example records using AI

plugin-import-rss

A Flatfile plugin for importing RSS feed data

plugin-job-handler

A plugin for handling Flatfile Jobs.

plugin-json-extractor

A plugin for parsing json files in Flatfile.

plugin-markdown-extractor

A plugin for parsing markdown files in Flatfile.

plugin-pdf-extractor

A plugin for parsing PDF files in Flatfile.

plugin-record-hook

A plugin for running custom logic on individual data records in Flatfile.

plugin-rollout

A plugin for automatically rolling out new changes to workbooks in flatfile...

plugin-space-configure

A plugin for configuring a Flatfile Space.

plugin-space-configure-from-template

A plugin for configuring a Flatfile Space from a Space Template.

plugin-stored-constraints

A plugin for running stored constraints

plugin-validate-boolean

A Flatfile plugin for boolean validation with multi-language support

plugin-validate-date

A Flatfile plugin for normalizing date formats

plugin-validate-email

A Flatfile Listener plugin for email validation

plugin-validate-isbn

A Flatfile Listener plugin for ISBN validation with configurable options. V...

plugin-validate-number

A Flatfile Listener plugin for number validation

plugin-validate-phone

A validator plugin for phone number formatting on individual data records i...

plugin-validate-string

A Flatfile plugin for string configuration and validation

plugin-view-mapped

A plugin for making the view post mapping show only mapped columns.

plugin-webhook-egress

A plugin for egressing data from a Flatfile Workbook to a webhook.

plugin-xlsx-extractor

A plugin for parsing xlsx files in Flatfile.

plugin-xml-extractor

A plugin for parsing .xml files in Flatfile.

plugin-zip-extractor

A plugin for unzipping zip files and uploading content back in Flatfile.

util-extractor

A library containing common utilities and helpers for extractors.

util-file-buffer

A utility for extracting data from any file and making it available as a bu...

util-response-rejection

This plugin handles response rejections returned from an external source.

Installation


Installnpm i @flatfile/plugin-xlsx-extractor
Source: View source
Package:@flatfile/plugin-xlsx-extractor 3k installs

The @flatfile/plugin-xlsx-extractor plugin is designed to extract structured data from Excel files. It utilizes various libraries to parse Excel files and retrieve the structured data efficiently.

Event Type: listener.on('file:created')

Supported file types: .xls, .xlsx, .xlsm, .xlsb, .xltx, .xltm

When embedding Flatfile, this plugin should be deployed in a server-side listener. Learn more

Parameters

raw - boolean

In Excel, you could have formatting on a text cell (i.e. date formatting). By default, Flatfile will just take the formatted text versus the raw values. Set this value to true to take the raw values and disregard how it's displayed in Excel.

rawNumbers - boolean

In Excel, you could have rounding or formatting on a number cell to only display say 2 decimal places. By default, Flatfile will just take the displayed decimal places versus the raw numbers. Set this value to true to take the raw numbers and disregard how it's displayed in Excel.

dateNF - string - (optional)

The dateNF parameter allows you to specify the date format for parsing dates. (i.e. yyyy-mm-dd)

chunkSize - default: "10_000" - number - (optional)

The chunkSize parameter allows you to specify the quantity of records to in each chunk.

parallel - default: "1" - number - (optional)

The parallel parameter allows you to specify the number of chunks to process in parallel.

headerDetectionOptions - Object - (optional)

The headerDetectionOptions parameter allows you to specify the options for detecting headers in the file. By default, the first 10 rows are scanned for the row with the most non-empty cells.

skipEmptyLines - default: "false" - boolean - (optional)

The skipEmptyLines parameter allows you to specify if empty lines should be skipped. By default, empty lines are included.

debug - default: "false" - boolean - (optional)

The debug parameter lets you toggle on/off helpful debugging messages for development purposes.

API Calls

  • api.files.download
  • api.files.get
  • api.files.update
  • api.jobs.ack
  • api.jobs.complete
  • api.jobs.create
  • api.jobs.fail
  • api.jobs.update
  • api.records.insert
  • api.workbooks.create

Usage

Listen for an Excel file (all extensions supported) to be uploaded to Flatfile. The platform will then extract the file automatically. Once complete, the file will be ready for import in the Files area.

npm i @flatfile/plugin-xlsx-extractor
import { ExcelExtractor } from "@flatfile/plugin-xlsx-extractor";

listener.js

listener.use(ExcelExtractor());

Additional options

listener.use(ExcelExtractor({ raw: true, rawNumbers: true }));

Header Detection

Three detection options are provided for detecting headers in the file: default, explicitHeaders, and specificRows. By default, the first 10 rows are scanned for the row with the most non-empty cells. This row is then used as the header row.

Default

It looks at the first rowsToSearch rows and takes the row with the most non-empty cells as the header, preferring the earliest such row in the case of a tie.

listener.use(ExcelExtractor());
// or...
listener.use(
  ExcelExtractor({
    headerDetectionOptions: {
      algorithm: "default",
      rowsToSearch: 30, // Default is 10
    },
  })
);

Explicit Headers

This implementation simply returns an explicit list of headers it was provided with.

listener.use(
  ExcelExtractor({
    headerDetectionOptions: {
      algorithm: "explicitHeaders",
      headers: ["fiRsT NamE", "LaSt nAme", "emAil"],
    },
  })
);

Specific Rows

This implementation looks at specific rows and combines them into a single header. For example, if you knew that the header was in the third row, you could pass it { rowNumbers: [2] }.

listener.use(
  ExcelExtractor({
    headerDetectionOptions: {
      algorithm: "specificRows",
      rowNumbers: [2], // 0 based
    },
  })
);

Data Row & Sub Header Detection

This implementation attempts to detect the first data row and select the previous row as the header. If the data row cannot be detected due to all the sample rows being full and not castable to a number or boolean type, it also will attempt to detect a sub header row by checking following rows after a header is detected for significant fuzzy matching. If over half of the fields in a possible sub header row fuzzy match with the originally detected header row, the sub header row becomes the new header.

listener.use(
  ExcelExtractor({
    headerDetectionOptions: {
      algorithm: "dataRowAndSubHeaderDetection",
      rowsToSearch: 30, // Default is 10
    },
  })
);

Full Example

In this example, the ExcelExtractor is initialized with optional options, and then registered as middleware with the Flatfile listener. When an Excel file is uploaded, the plugin will extract the structured data and process it using the extractor's parser.

listener.js

import { ExcelExtractor } from "@flatfile/plugin-xlsx-extractor";

export default async function (listener) {
  // Define optional options for the extractor
  const options = {
    raw: true,
    rawNumbers: true,
  };

  // Initialize the Excel extractor
  const excelExtractor = ExcelExtractor(options);

  // Register the extractor as a middleware for the Flatfile listener
  listener.use(excelExtractor);

  // When an Excel file is uploaded, the data will be extracted and processed using the extractor's parser.
}