Markdown File Extractor
Automatically parse Markdown files and extract tables into Flatfile Sheets
The Markdown Extractor plugin is designed to automatically parse Markdown files (.md
) uploaded to Flatfile. Its primary purpose is to find and extract any tables present in the markdown content. For each table it discovers, the plugin creates a new Sheet within the Flatfile Space. The table’s header row is used to define the fields (columns) of the Sheet, and the subsequent rows are converted into records. This is useful for importing data that is documented or stored in Markdown files, such as in project readmes, wikis, or technical documentation.
Installation
Install the Markdown Extractor plugin using npm:
Configuration & Parameters
The plugin accepts the following configuration options:
maxTables
- Type:
number
- Default:
Infinity
- Description: Sets the maximum number of tables to extract from a single Markdown file. By default, it will extract all tables it finds.
errorHandling
- Type:
'strict' | 'lenient'
- Default:
'lenient'
- Description: Controls how parsing errors are handled:
'strict'
: The plugin will throw an error and stop processing if it encounters a malformed table (e.g., a data row with a different number of columns than the header)'lenient'
: The plugin will log a warning to the console, skip the problematic table, and continue processing the rest of the file
debug
- Type:
boolean
- Default:
false
- Description: When set to true, enables verbose logging to the console. This is useful for troubleshooting issues with file parsing, as it shows the content being parsed, tables found, and the final normalized data.
Usage Examples
Basic Usage
Configuration Example
Direct Parser Usage
API Reference
MarkdownExtractor(options?)
The main factory function used to create and configure the markdown extractor plugin for use with a Flatfile listener.
Parameters:
options
(optional): Configuration settings object
Returns: An Extractor
instance that can be passed to listener.use()
Example:
markdownParser(buffer, options)
A low-level function that directly parses a Buffer containing markdown content into a Flatfile WorkbookCapture
object.
Parameters:
buffer
: A Node.js Buffer containing the UTF-8 encoded content of a markdown fileoptions
: Configuration settings object
Returns: A WorkbookCapture
object, which is a map of sheet names to sheet data
Example:
Troubleshooting
Error Handling Example
This example demonstrates how the errorHandling
option affects behavior when parsing a malformed table:
Debug Mode
To diagnose parsing issues, enable debug mode:
Notes
Default Behavior
- The plugin extracts all tables found in a markdown file by default (
maxTables: Infinity
) - Uses lenient error handling by default, skipping malformed tables and continuing processing
- Debug logging is disabled by default
Special Considerations
- This plugin is intended for use in a server-side Flatfile listener
- It operates on files with the
.md
extension - For each table found in a markdown file, a new Sheet is created with auto-generated names (
Table_1
,Table_2
, etc.) - The parser expects standard GitHub-flavored markdown table syntax
- Highly complex or non-standard tables may not be parsed correctly
Error Handling Patterns
- Lenient mode (default): Logs warnings for malformed tables and continues processing
- Strict mode: Throws errors immediately when encountering malformed tables, failing the entire file processing
Troubleshooting Tips
- Set
debug: true
to enable detailed logging for diagnosing parsing issues - Check console output for warnings about skipped tables in lenient mode
- Ensure markdown tables follow standard GitHub-flavored markdown syntax
- Verify that header and data rows have matching column counts