PDF · April 24, 2026 · by Dogufy Team

How to Convert a PDF to Excel (XLSX) — and Clean Up the Data

Turn a text-based PDF into an Excel file, then use simple cleanup steps like Text to Columns, TRIM, and Power Query to make it usable.

If you’ve ever received a PDF report, invoice, or statement and needed the numbers in a spreadsheet, the goal is usually the same: get the data into Excel without re-typing.

Dogufy’s PDF to Excel tool can extract text from a PDF into an .xlsx file. The key is knowing what to expect from a PDF (especially scans) and how to quickly clean up the resulting spreadsheet.

Quick answer

To convert a PDF to Excel using Dogufy:

  1. Open PDF to Excel.
  2. Upload your PDF.
  3. Click Convert to Excel.
  4. Download the .xlsx file and open it in Excel.

If the output looks “messy” (often: everything in one column), use the cleanup steps later in this guide—especially Text to Columns and Power Query.

First: make sure your PDF actually contains text

PDFs come in two common flavors:

  • Text-based PDFs: You can highlight/copy text in a PDF viewer. These usually convert well.
  • Scanned/image PDFs: Each page is essentially a picture. Converting these to Excel typically requires OCR (optical character recognition).

Quick test:

  1. Open the PDF on your computer.
  2. Try selecting a few words with your cursor.
  3. If you can copy/paste that text, you likely have a text-based PDF.

If you can’t select text, the file is probably a scan. Dogufy can still convert its pages to images (try PDF to JPG or PDF to PNG), but turning a scan into an editable spreadsheet needs a separate OCR step, covered under “The PDF is a scan, so nothing converts” below.

Convert PDF to Excel with Dogufy (step-by-step)

  1. Go to PDF to Excel.
  2. Upload your PDF.
  3. Click Convert to Excel and wait for the conversion to finish.
  4. Download the Excel file (.xlsx).
  5. Open it in Excel and review the output.

What the output usually looks like

Most PDF-to-Excel conversions start as raw extracted text. Depending on how your PDF was created (single-column vs. multi-column, tables vs. paragraphs), your spreadsheet may contain:

  • One line per row
  • Sections that need splitting into columns
  • Extra spaces or repeated headers/footers

That’s normal—and fixable.

Make the spreadsheet usable: practical cleanup steps

1) Use “Text to Columns” for simple splits

If each row contains multiple pieces of data separated by spaces, commas, or tabs:

  1. Select the column that contains the extracted text (often Column A).
  2. Go to Data → Text to Columns.
  3. Choose Delimited.
  4. Pick the delimiter that matches your data (commas, tabs, or spaces).
  5. Finish and check the results.

Tip: If you aren’t sure which delimiter to pick, copy one row into a plain text editor and look for consistent separators.
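
If you'd rather check the delimiter with a script than by eye, the tip above can be sketched in Python. This is a minimal sketch, not part of Dogufy or Excel: the function names `guess_delimiter` and `split_rows` and the sample rows are hypothetical, and it assumes a delimiter is "consistent" when it appears the same number of times on every line. Spaces are left out of the candidates on purpose, because they also show up inside values.

```python
import csv
import io

def guess_delimiter(lines, candidates=(",", "\t", ";", "|")):
    """Return the first candidate that appears the same non-zero
    number of times on every line, or None if none qualifies."""
    for delim in candidates:
        counts = {line.count(delim) for line in lines}
        if len(counts) == 1 and counts != {0}:
            return delim
    return None

def split_rows(lines, delimiter):
    """Split each line on the delimiter, honoring quoted fields."""
    return list(csv.reader(io.StringIO("\n".join(lines)), delimiter=delimiter))

# Hypothetical sample rows, standing in for text copied from Column A.
rows = [
    "2026-01-03,Office supplies,42.50",
    "2026-01-07,Software license,199.00",
]
delim = guess_delimiter(rows)
print(repr(delim), split_rows(rows, delim))
```

If `guess_delimiter` returns None, the rows probably aren't cleanly delimited, and the fixed-width option or Power Query is likely a better fit.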

2) Clean whitespace with TRIM() (and CLEAN() when needed)

Extracted text often includes extra spaces. A fast pattern:

  • In a new column, use =TRIM(A2) (adjust the cell reference)
  • Fill down
  • Copy the cleaned column and Paste Special → Values

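If you see odd invisible characters (common when copying from PDFs), try:

  • =CLEAN(TRIM(A2))

CLEAN removes non-printing control characters, but neither function removes non-breaking spaces (character 160). If stray spaces survive, try =TRIM(SUBSTITUTE(A2, CHAR(160), " ")).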
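
For readers who clean the extracted text outside Excel, here is a rough Python approximation of what TRIM and CLEAN do. It is a sketch under assumptions, not a reimplementation of Excel: the function name `excel_like_clean` is hypothetical, and it also converts non-breaking spaces, which the two Excel functions leave alone.

```python
import re

def excel_like_clean(text):
    """Approximate TRIM + CLEAN, plus non-breaking-space handling."""
    text = text.replace("\u00a0", " ")       # non-breaking space -> regular space
    text = re.sub(r"[\x00-\x1f]", "", text)  # drop control characters (codes 0-31)
    text = re.sub(r" {2,}", " ", text)       # collapse runs of spaces into one
    return text.strip(" ")                   # remove leading/trailing spaces

# Hypothetical messy cell value.
print(repr(excel_like_clean("  Total\u00a0\u00a0 due\x0b  ")))
```

Note that control characters are deleted rather than replaced, so a tab between two words joins them. Split on tabs first if your data uses them as separators.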

3) Remove repeated headers/footers

Reports often repeat titles, dates, or page numbers on every page. In Excel:

  1. Filter the column that contains text.
  2. Search for the repeated header phrase.
  3. Delete those rows (or filter them out before analysis).
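
The same filtering can be sketched in Python for text you process before it reaches Excel. This sketch makes two loud assumptions: a line that repeats three or more times is a header or footer (the `min_repeats` threshold is arbitrary), and page footers look like "Page 1" or "Page 1 of 3". The function name and sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumed footer format; adjust the pattern to match your report.
PAGE_NUMBER = re.compile(r"^Page \d+( of \d+)?$", re.IGNORECASE)

def drop_repeated_lines(lines, min_repeats=3):
    """Drop blank lines, page-number footers, and any line that
    occurs at least min_repeats times across the document."""
    counts = Counter(line.strip() for line in lines)
    kept = []
    for line in lines:
        text = line.strip()
        if not text or PAGE_NUMBER.match(text) or counts[text] >= min_repeats:
            continue
        kept.append(text)
    return kept

# Hypothetical three-page extract.
page_lines = [
    "Acme Monthly Report", "Widget 10.00", "Page 1 of 3",
    "Acme Monthly Report", "Gadget 12.50", "Page 2 of 3",
    "Acme Monthly Report", "Gizmo 7.25", "Page 3 of 3",
]
print(drop_repeated_lines(page_lines))
```

A genuine data row that happens to repeat three times would also be dropped, so review the result or raise `min_repeats` for long reports.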

4) Use Power Query when the structure repeats

For PDFs that follow a consistent pattern across pages (like “Date / Description / Amount”):

  1. Convert with PDF to Excel.
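  2. In Excel, select the extracted data and go to Data → From Table/Range (in the Get & Transform Data group; some versions list it under Get Data → From Other Sources).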
  3. Use Split Column, Replace Values, and Remove Rows to create a repeatable cleanup.
  4. Load the cleaned result back into a new sheet.

Power Query is worth the setup if you repeat the same conversion every week or month, because the recorded steps can be refreshed against new data.
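
To illustrate what a repeatable cleanup means, here is the same idea as a small Python sketch rather than Power Query itself. It assumes each data row looks like "Date Description Amount" with an ISO date (YYYY-MM-DD) and a dot-decimal amount with two decimals. That layout, the `ROW` pattern, and the `parse_rows` name are all hypothetical, so adapt them to your own report.

```python
import re
from decimal import Decimal

# Assumed row layout: ISO date, free-text description, amount like -15.00
ROW = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.+?)\s+(-?\d+\.\d{2})$")

def parse_rows(lines):
    """Keep only lines that match the row layout and split them
    into (date, description, amount) records."""
    records = []
    for line in lines:
        match = ROW.match(line.strip())
        if match:
            date, description, amount = match.groups()
            records.append((date, description, Decimal(amount)))
    return records

# Hypothetical extract: a title, two data rows, and a footer.
extracted = [
    "Statement for January",
    "2026-01-03 Office supplies 42.50",
    "2026-01-07 Refund -15.00",
    "Page 1",
]
for record in parse_rows(extracted):
    print(record)
```

Lines that don't fit the pattern (titles, footers, blank lines) are skipped silently, which mirrors a Remove Rows step. Log them instead if you need to audit what was discarded.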

Common problems (and what to do)

“Everything is in one column”

This usually means the PDF text was extracted as plain lines. Try:

  • Text to Columns (best first step)
  • Power Query → Split Column by Delimiter
  • If your data is fixed-width (aligned with spaces), use Text to Columns → Fixed width
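
The fixed-width option works by cutting every row at the same character positions. A minimal Python sketch of that idea follows. The function name `split_fixed_width`, the column offsets, and the sample line are hypothetical, and in practice you would read the offsets off your own data.

```python
def split_fixed_width(line, boundaries):
    """Slice a line at the given character offsets and strip the padding."""
    edges = [0, *boundaries, len(line)]
    return [line[start:end].strip() for start, end in zip(edges, edges[1:])]

# Hypothetical aligned row: date padded to 12 characters, description
# padded to 18, amount right-aligned in 8.
line = f"{'2026-01-03':<12}{'Office supplies':<18}{'42.50':>8}"
print(split_fixed_width(line, boundaries=[12, 30]))
```

This only works when every row uses the same offsets. If the columns drift from page to page, splitting on a delimiter is usually more reliable.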

“My numbers lost their decimal separators”

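This is often a locale issue: the PDF uses one convention (for example 1.234,56) and Excel expects the other (1,234.56). In Excel, check:

  • File → Options → Advanced → Use system separators

If the setting doesn’t match the PDF’s convention, clear the checkbox and set the decimal and thousands separators to match, then re-parse the column. Alternatively, replace the separators consistently with Find & Replace.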
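
If you normalize the numbers in a script before importing them, the separator swap can be sketched like this. It assumes you already know which character the PDF uses as the decimal separator; the function name `parse_number` and its `decimal_sep` parameter are hypothetical, and it doesn't try to guess the locale.

```python
from decimal import Decimal

def parse_number(text, decimal_sep=","):
    """Parse a number string, given which character marks the decimals.
    The other of ',' and '.' is treated as a thousands separator."""
    thousands_sep = "." if decimal_sep == "," else ","
    cleaned = text.strip().replace("\u00a0", "").replace(" ", "")
    cleaned = cleaned.replace(thousands_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)

print(parse_number("1.234,56"))                   # comma-decimal convention
print(parse_number("1,234.56", decimal_sep="."))  # dot-decimal convention
```

Using Decimal rather than float keeps amounts exact, which matters for invoices and statements.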

“The PDF is a scan, so nothing converts”

If the PDF is image-only, you’ll need OCR to recognize text. A practical approach is:

  1. Convert the PDF pages to images with PDF to JPG.
  2. Run OCR with a tool that supports your language and layout.
  3. Import the OCR output into Excel and clean it up (steps above).

Helpful related tools and next steps

  • If your PDF is too large to share or upload, shrink it first with Compress PDF.
  • If you need to edit the PDF before extracting data (for example, remove a cover page), use Split PDF and then convert the smaller file.
  • For a broader “what to do next with PDFs” workflow, see Mastering PDF Workflows.

FAQ

Does converting a PDF to Excel preserve table formatting?

Not always. Many PDFs don’t store tables as “real” spreadsheet tables—they store positioned text. Converters often extract text, then you reorganize it with tools like Text to Columns or Power Query.

Will this work with password-protected PDFs?

Some encrypted PDFs can’t be processed by web tools. If the converter can’t open the file, export an unlocked copy from your PDF app (if you have permission) and try again.

What’s the best way to get clean Excel data from a PDF?

When possible, avoid converting at all: ask for the original spreadsheet (XLSX/CSV). If you must convert, start with a text-based PDF and use a repeatable cleanup in Power Query.
