CSV, Excel, and Google Sheets Ingestion

What gets ingested

  • One row = one record.

  • The first non-empty row is treated as the header (column names).

  • Only visible (non-hidden) Google Sheets tabs are processed.

  • For Google Sheets, the displayed (formatted) values are ingested, not the raw formulas.

  • All content is processed as UTF-8 text.

Tip: Each row is converted to a simple, readable text block for the AI. Make headers self-explanatory, because they become the labels the AI sees. For example:

{
Product name: Widget A
Price amount: 19.99
Price currency: USD
URL: https://example.com/widget-a
Tags: outdoor; waterproof
}
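The conversion described above can be pictured with a short Python sketch. This is only an illustration of the idea, not the actual ingestion code: the function name render_row and the sample data are hypothetical, and the real output format may differ in detail.

```python
import csv
import io


def render_row(row: dict[str, str]) -> str:
    """Render one record as 'Header: value' lines wrapped in braces."""
    lines = [f"{header}: {value}" for header, value in row.items()]
    return "{\n" + "\n".join(lines) + "\n}"


# Hypothetical sample data; the first row is the header.
sample_csv = (
    "Product name,Price amount,Price currency\n"
    "Widget A,19.99,USD\n"
)

# csv.DictReader treats the first row as the column names.
blocks = [render_row(row) for row in csv.DictReader(io.StringIO(sample_csv))]
print(blocks[0])
```

Because the header text is copied verbatim into every block, a header such as “PN” would give the AI a label with no meaning, while “Product name” explains itself.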

Best-practices checklist (quick)

  • Use a single header row with clear, human-readable column names.

  • Keep exactly one record per row; avoid merged cells.

  • Make spreadsheet and sheet/tab names meaningful.

  • Ensure consistent types in each column (all dates in one format, all prices as numbers, etc.).

  • Prefer one column per concept (split amounts, currencies, statuses, units).

  • Use visible sheets for data; hide or remove non-data sheets.

  • Share the file so that the crawler account has permission to view it.
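As a rough pre-upload check for the “consistent types” item, the following Python sketch flags columns that mix kinds of values. The classification rules (plain numbers, ISO dates, everything else as text) and the names classify and mixed_type_columns are assumptions made for this example; adapt them to your own data.

```python
import re
from collections import defaultdict


def classify(value: str) -> str:
    """Roughly classify a cell as 'empty', 'number', 'date', or 'text'."""
    text = value.strip()
    if text == "":
        return "empty"
    if re.fullmatch(r"-?\d+(\.\d+)?", text):
        return "number"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return "date"
    return "text"


def mixed_type_columns(rows: list[dict[str, str]]) -> dict[str, set[str]]:
    """Return the columns whose non-empty cells fall into more than one kind."""
    seen = defaultdict(set)
    for row in rows:
        for header, value in row.items():
            kind = classify(value)
            if kind != "empty":
                seen[header].add(kind)
    return {header: kinds for header, kinds in seen.items() if len(kinds) > 1}


# Hypothetical rows: "N/A" makes the price column a mix of numbers and text.
rows = [
    {"Price amount": "19.99", "Order date": "2025-08-26"},
    {"Price amount": "N/A", "Order date": "2025-08-27"},
]
report = mixed_type_columns(rows)
print(report)
```

Empty cells are ignored on purpose, so leaving a cell blank is a cleaner way to express a missing value than typing “N/A” into a numeric column.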

Designing great column names

  • Prefer self-explanatory, natural language headers:

    • Good: “Product name”, “Return policy”, “Support email”

    • Avoid: “PN”, “ret_pol”, “email1”

  • Use a single header row (Row 1) with no blank cells.

  • Avoid duplicate headers across the same sheet.

  • Keep names short but specific. If needed, use “Title Case” or “snake_case”; just be consistent.

  • Avoid punctuation that looks like data (e.g., “Price ($)”); instead, use two columns: “Price amount” and “Price currency”.
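Some of these naming rules can be checked automatically before upload. The sketch below is one possible approach in Python; the function lint_headers and the min_length threshold are hypothetical, and a length check is only a crude stand-in for “self-explanatory”.

```python
from collections import Counter


def lint_headers(headers: list[str], min_length: int = 3) -> list[str]:
    """Report blank, very short, and duplicate headers (case-insensitive)."""
    problems = []
    cleaned = [header.strip() for header in headers]
    for position, name in enumerate(cleaned, start=1):
        if not name:
            problems.append(f"column {position}: blank header")
        elif len(name) < min_length:
            problems.append(f"column {position}: '{name}' may be too short")
    counts = Counter(name.lower() for name in cleaned if name)
    for name, count in counts.items():
        if count > 1:
            problems.append(f"'{name}' appears {count} times")
    return problems


# Hypothetical header row with one abbreviation, one blank, one duplicate.
issues = lint_headers(["Product name", "PN", "", "product name"])
for issue in issues:
    print(issue)
```

The default threshold of 3 lets a header like “URL” pass while still flagging two-letter abbreviations such as “PN”.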

Layout rules

  • No merged cells, multi-row headers, or pivot tables in the data area.

  • Avoid blank top rows or columns; put headers on the first non-empty row, with data starting immediately below.

  • Do not place totals/subtotals within the data table; keep those on a separate sheet.

  • Keep one dataset per sheet/tab. If you have different entities, use separate tabs.

  • Name spreadsheet and sheet tabs descriptively; these names are surfaced in document titles.
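To see why blank top rows matter, here is a small Python sketch of the “first non-empty row is the header” rule. It illustrates the rule as stated in this article and is not the ingestion service's implementation; split_header_and_data is a hypothetical name, and skipping blank data rows is an assumption of this example.

```python
import csv
import io


def is_blank(row: list[str]) -> bool:
    """A row is blank when every cell is empty or whitespace."""
    return all(cell.strip() == "" for cell in row)


def split_header_and_data(rows: list[list[str]]):
    """Use the first non-blank row as the header; later non-blank rows are data."""
    for index, row in enumerate(rows):
        if not is_blank(row):
            data = [r for r in rows[index + 1:] if not is_blank(r)]
            return row, data
    return [], []


# Hypothetical sheet export with two blank rows above the header.
raw = ",,\n,,\nProduct name,Price amount,Price currency\nWidget A,19.99,USD\n"
table = list(csv.reader(io.StringIO(raw)))
header, data = split_header_and_data(table)
print(header)
print(data)
```

If a title or note sits above the real header, that line becomes the first non-empty row and is mistaken for the column names, which is why the data area should start with the header.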

Google Sheets specifics

  • To ingest a specific sheet, copy the tab URL (it contains “#gid=…”). If you share the document’s root URL without “gid”, the first visible sheet is used by default.

  • Hidden sheets are skipped. Hide any sheets you do not want ingested.

  • Ensure the crawler/service account (data@gleen.ai) has at least Viewer access to the file.

  • Avoid protected ranges that block read access to header or data cells.
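To check which tab a link points to before sharing it, you can read the “gid” value out of the URL. The Python sketch below uses only the standard library; the function name extract_gid is hypothetical and EXAMPLE_ID is a placeholder, not a real document ID.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_gid(url: str) -> Optional[str]:
    """Return the gid from the URL fragment or query string, or None if absent."""
    parts = urlparse(url)
    for source in (parts.fragment, parts.query):
        values = parse_qs(source).get("gid")
        if values:
            return values[0]
    return None


# Placeholder URLs for illustration only.
tab_url = "https://docs.google.com/spreadsheets/d/EXAMPLE_ID/edit#gid=123456"
root_url = "https://docs.google.com/spreadsheets/d/EXAMPLE_ID/edit"

print(extract_gid(tab_url))
print(extract_gid(root_url))
```

A result of None means the link carries no “gid”, so the first visible sheet would be used by default.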

Large files and performance

  • Very large files can increase processing time and cost. Consider:

    • Splitting large datasets into multiple files or tabs (e.g., by time period or category).

    • Removing unused columns before upload.

    • Keeping only the latest, relevant rows if historical data is not needed.
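One way to split a large CSV is by row count, repeating the header in every piece so each file stands on its own. This is a minimal Python sketch; split_csv and the chunk size are hypothetical, and splitting by time period or category would need a grouping step instead.

```python
import csv
import io
from itertools import islice


def split_csv(text: str, rows_per_chunk: int) -> list[str]:
    """Split CSV text into chunks that each start with the original header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return []
    chunks = []
    while True:
        batch = list(islice(reader, rows_per_chunk))
        if not batch:
            break
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(batch)
        chunks.append(out.getvalue())
    return chunks


# Hypothetical three-row dataset split into chunks of two rows.
text = (
    "Order date,Price amount\n"
    "2025-08-24,5.00\n"
    "2025-08-25,7.50\n"
    "2025-08-26,19.99\n"
)
parts = split_csv(text, rows_per_chunk=2)
print(len(parts))
```

Each chunk can then be saved as its own file or pasted into its own tab, with a descriptive name that says which slice it holds.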

Examples: good vs needs fixing

  • Column names

    • Good: “Customer name”, “Order date”, “Price amount”, “Price currency”

    • Needs fixing: “CN”, “date?”, “$price”, “cur”

  • Values

    • Good: “19.99” and “USD”

    • Needs fixing: “$19.99 USD” (mixes amount and currency in one cell)

  • Dates

    • Good: “2025-08-26”

    • Needs fixing: “Aug 26th, ’25” (ambiguous)

  • Lists

    • Good: “outdoor; waterproof”

    • Needs fixing: “outdoor, waterproof / best” (multiple inconsistent separators)
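The “needs fixing” values above can often be cleaned with a small script before upload. The Python sketch below shows one possible approach; the helper names, the accepted price pattern, and the choice of “;” as the single separator are assumptions of this example, and the tag splitter assumes no tag itself contains “;”, “,” or “/”.

```python
import re
from datetime import datetime


def split_price(cell: str) -> tuple[str, str]:
    """Split a cell like '$19.99 USD' into (amount, currency)."""
    match = re.fullmatch(r"\s*\$?\s*(\d+(?:\.\d+)?)\s*([A-Za-z]{3})\s*", cell)
    if not match:
        raise ValueError(f"unrecognized price format: {cell!r}")
    return match.group(1), match.group(2).upper()


def normalize_tags(cell: str) -> str:
    """Replace mixed separators (; , /) with a single '; ' separator."""
    parts = [part.strip() for part in re.split(r"[;,/]", cell)]
    return "; ".join(part for part in parts if part)


def to_iso_date(cell: str, source_format: str) -> str:
    """Convert a date in a known source format to YYYY-MM-DD."""
    return datetime.strptime(cell, source_format).date().isoformat()


amount, currency = split_price("$19.99 USD")
tags = normalize_tags("outdoor, waterproof / best")
iso_date = to_iso_date("08/26/2025", "%m/%d/%Y")
print(amount, currency, tags, iso_date)
```

The date helper needs the source format spelled out. An ambiguous value cannot be converted reliably without knowing how it was written, which is the reason to store dates in one explicit format in the first place.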

Common mistakes to avoid

  • Missing or unclear headers; duplicate column names.

  • Multi-row or merged headers; blank header cells.

  • Merged cells anywhere in the data area.

  • Embedding totals/subtotals within the data table.

  • Hyperlinks hidden behind display text instead of a plain “URL” column; because displayed values are ingested, the link target behind the text may be lost.

  • Mixed data types in the same column (e.g., numbers and “N/A” and dates).

  • Hidden sheets that you intended to ingest, or visible sheets you didn’t intend to ingest.

  • Very wide tables with many unused columns; remove noise.

Key takeaways

  • Clear, descriptive headers and consistent values are the single biggest factor in good results.

  • Use one dataset per visible sheet, keep the layout simple, and avoid merged cells or complex formatting.
