How to Extract Tables from PDF to Excel (Free)
A step-by-step guide to extracting tables from PDF to Excel. Compares manual copy-paste, online conversion tools, and AI.
You have a PDF with a table — an invoice with line items, a financial report with quarterly figures, a shipping manifest with package details. You need that data in Excel so you can sort it, filter it, run formulas, or import it into your accounting software. This should be simple, but anyone who has tried knows it is not.
PDFs were designed for viewing and printing, not for data extraction. When you copy a table from a PDF and paste it into Excel, the columns collapse into a single cell, the rows merge together, and you spend the next thirty minutes manually reformatting everything. Sometimes the data does not even paste in the right order.
This guide covers three approaches to extracting tables from PDFs into Excel, starting with the most effective.
Why PDF Tables Are Hard to Extract
To understand why this is difficult, it helps to know how PDFs store data. Unlike a spreadsheet, where data lives in defined cells with rows and columns, a PDF stores text as positioned characters on a page. A typical PDF carries no usable concept of a "table": what looks like a table to your eyes is just text placed at specific coordinates with lines drawn between them. (Tagged PDFs can describe table structure, but many real-world files are untagged.)
When you select and copy text from a PDF, your PDF viewer tries to reconstruct the reading order from those coordinates. For simple paragraphs, this works well. For tables, it often fails because the viewer cannot reliably determine which text belongs in which column, especially when columns have varying widths or when cells contain multi-line text.
This is why copy-paste from PDF to Excel produces garbled results. The structure that your eyes can see instantly is invisible to the software.
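To make the coordinate problem concrete, here is a minimal sketch of the kind of reconstruction a viewer or converter has to attempt. The word list, its coordinates, and the `group_rows` helper are all made up for illustration; real tools use more elaborate heuristics.

```python
# Hypothetical words as (x, y, text); x grows rightward, y grows downward.
# The storage order is deliberately scrambled: a PDF need not store text
# in reading order.
words = [
    (300.0, 100.2, "Qty"),
    (50.0, 100.0, "Item"),
    (400.0, 100.1, "Price"),
    (402.0, 120.0, "4.50"),
    (50.0, 120.3, "Widget"),
    (305.0, 120.1, "2"),
    (50.0, 140.0, "Gadget"),
    (398.0, 140.2, "12.00"),
    (303.0, 139.9, "10"),
]


def group_rows(words, y_tolerance=2.0):
    """Group words into rows by similar y, then order each row by x."""
    rows = []
    for x, y, text in sorted(words, key=lambda w: w[1]):
        # A word joins the current row if it sits close to that row's first word.
        if rows and abs(rows[-1]["y"] - y) <= y_tolerance:
            rows[-1]["cells"].append((x, text))
        else:
            rows.append({"y": y, "cells": [(x, text)]})
    return [[text for _, text in sorted(row["cells"])] for row in rows]


for row in group_rows(words):
    print(row)
```

This only works because every row in the sample has a value in every column. As soon as one cell is empty or wraps onto a second line, ordering by position alone shifts values into the wrong column, which is the failure described above.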
Method 1: AI-Powered Table Extraction (Recommended)
AI extraction tools solve the fundamental problem by using vision models that can see the table the same way you do. Instead of trying to reconstruct table structure from character positions, the AI looks at the entire page as an image and identifies rows, columns, headers, and cell values.
This approach works with both digital PDFs (where text is selectable) and scanned PDFs (where the table is just an image). The AI handles the additional challenges of scanned documents — skewed pages, uneven lighting, and low resolution — that make traditional extraction methods fail completely.
How to extract tables from PDF using AI extraction:
1. Upload your PDF to an AI extraction tool like DocPrivy
2. The AI analyzes the document and identifies all tables
3. Each table is extracted with correct column headers, row structure, and cell values
4. Review the extracted data in a preview table
5. Export directly to XLSX (Excel) or CSV
The key advantage is accuracy. AI extraction preserves the table structure that copy-paste destroys. Numbers stay in the right columns, dates remain formatted correctly, and multi-line cell content is handled properly.
You can also customize the output before exporting: reorder columns by dragging, hide columns you do not need, or merge multiple columns into one. This means the Excel file matches exactly what your downstream workflow expects.
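If you prefer to reshape the data after export rather than in a preview, the same three operations (reorder, hide, merge) are easy to sketch in code. The example below is an illustration only: the sample table and the `reshape` and `to_csv` helpers are hypothetical and are not part of any tool mentioned here.

```python
import csv
import io

# Hypothetical extracted table: a header row followed by data rows.
table = [
    ["First", "Last", "SKU", "Qty", "Notes"],
    ["Ada", "Lovelace", "A-100", "2", "fragile"],
    ["Alan", "Turing", "B-200", "10", ""],
]


def reshape(table, keep, merge=None):
    """Return rows with only the columns named in `keep`, in that order.

    `merge` maps a new column name to (source column names, separator).
    Columns not listed in `keep` are dropped, which is how hiding works here.
    """
    merge = merge or {}
    header, *rows = table
    index = {name: i for i, name in enumerate(header)}
    out = [list(keep)]
    for row in rows:
        new_row = []
        for name in keep:
            if name in merge:
                sources, sep = merge[name]
                new_row.append(sep.join(row[index[s]] for s in sources))
            else:
                new_row.append(row[index[name]])
        out.append(new_row)
    return out


def to_csv(rows):
    """Serialize rows to CSV text that Excel can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


shaped = reshape(
    table,
    keep=["SKU", "Name", "Qty"],
    merge={"Name": (["First", "Last"], " ")},
)
csv_text = to_csv(shaped)
print(csv_text)
```

Here "Notes" is hidden, "SKU" moves to the front, and "First" and "Last" are merged into a single "Name" column.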
Method 2: Online PDF to Excel Converters
Online converters like Smallpdf, ILovePDF, or Adobe Acrobat Online offer PDF-to-Excel conversion. You upload a PDF, and they return an XLSX file with the content arranged in cells.
These tools work by analyzing the text positions in the PDF and attempting to reconstruct the table grid. For simple tables with clear borders and consistent formatting, the results can be decent. For complex tables — merged cells, nested headers, tables that span multiple pages, or tables without visible borders — the results are often unusable.
Limitations of online converters:
They convert the entire page, not just the table. Your Excel file will contain headers, footers, page numbers, and other non-table content mixed in with your data.
They struggle with scanned PDFs. If your PDF is a scan or photo, most converters either fail entirely or produce very poor results because they rely on the PDF text layer rather than visual analysis.
Many have strict file size or page limits on their free tiers, pushing you toward paid subscriptions for regular use.
Privacy is a concern. Your PDF is uploaded to their servers and processed there. For financial documents or contracts containing sensitive data, this may not be acceptable.
Method 3: Manual Copy-Paste with Cleanup
The manual approach involves copying the table from your PDF viewer and pasting it into Excel, then fixing the formatting by hand. This is the most time-consuming method but requires no tools beyond what you already have.
Steps for manual extraction:
1. Open the PDF in Adobe Acrobat Reader, your browser, or another PDF viewer
2. Select the table text. Try to select just the table area, not the surrounding text
3. Copy the selection (Ctrl+C or Cmd+C)
4. Open Excel and paste into cell A1
5. Use Text to Columns (Data tab → Text to Columns) to split the pasted text into separate columns
6. Manually fix any rows that did not split correctly
7. Delete empty rows and clean up formatting
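The Text to Columns step can also be approximated in a few lines of code, which helps when you have many pasted tables to process. This is a rough sketch under one assumption: the pasted text still separates columns with at least two spaces. The sample text and the `split_columns` helper are hypothetical.

```python
import re

# Hypothetical clipboard text as it might arrive from a PDF viewer.
pasted = """Item        Qty   Price
Blue Widget   2     4.50
Gadget        10    12.00
"""


def split_columns(text, min_gap=2):
    """Split each non-empty line on runs of at least `min_gap` whitespace characters."""
    pattern = re.compile(r"\s{%d,}" % min_gap)
    return [
        pattern.split(line.strip())
        for line in text.splitlines()
        if line.strip()
    ]


for row in split_columns(pasted):
    print(row)
```

Requiring a gap of two or more spaces keeps a value like "Blue Widget" in one cell. If your viewer collapses every gap to a single space, this heuristic cannot tell columns apart, and manual fixing (step 6) is still needed.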
This method works for simple tables with a few rows. For tables with more than about 20 rows, the cleanup time makes this approach impractical. For tables from scanned PDFs, it does not work at all since there is no selectable text.
One improvement: try pasting into Google Sheets first instead of Excel. Google Sheets sometimes handles the pasted formatting slightly better, and you can then download the result as XLSX.
Comparison: Which Method Should You Use?
For a quick comparison:
AI extraction: Best accuracy, works with scanned PDFs, preserves table structure, customizable output. Free with DocPrivy. Best for invoices, financial reports, and any document where accuracy matters.
Online converters: Moderate accuracy for simple tables, fails with scanned PDFs, limited customization. Free tiers have restrictions. Best for quick one-off conversions of simple digital PDFs.
Manual copy-paste: Low accuracy, does not work with scanned PDFs, very time-consuming. Always free. Best only when you have a single simple table and no other tools available.
If you extract tables from PDFs regularly — even a few times a month — AI extraction saves enough time to be worth using from the first document. The accuracy difference is not marginal; it is the difference between usable data and data that needs extensive manual correction.
Common Table Extraction Challenges
Regardless of which method you use, certain table formats are more difficult than others:
Merged cells: When a header spans multiple columns, extraction tools may duplicate the header or assign it to the wrong column. AI extraction handles this better than other methods because it can see the visual layout.
Multi-page tables: Tables that continue across page breaks are challenging because the header row may not repeat on subsequent pages, and page footers may interrupt the data.
Tables without borders: Some documents use spacing instead of lines to delineate columns. This makes it harder for any tool to determine column boundaries.
Mixed content: Tables that contain both text and numbers, or tables where some cells contain multi-line text while others contain single values, can cause alignment issues.
For all of these challenges, AI extraction produces the best results because the vision model understands the visual layout rather than relying on text positions or border detection.
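Whichever tool produces the rows, a multi-page table usually still needs a stitching step. The sketch below assumes each page has already been extracted into a list of rows and that footers look like "Page 1 of 2". Both the footer pattern and the `stitch_pages` helper are assumptions made for illustration.

```python
import re

# Assumed footer format; adjust the pattern to match your documents.
FOOTER = re.compile(r"^Page \d+ of \d+$")


def stitch_pages(pages):
    """Join per-page row lists, keeping the header once and dropping footers."""
    header = None
    rows = []
    for page in pages:
        for row in page:
            if len(row) == 1 and FOOTER.match(row[0]):
                continue  # page footer that interrupted the data
            if header is None:
                header = row  # first real row is treated as the header
                rows.append(row)
            elif row != header:
                rows.append(row)  # repeated headers are skipped
    return rows


pages = [
    [["Item", "Qty"], ["Widget", "2"], ["Page 1 of 2"]],
    [["Item", "Qty"], ["Gadget", "10"], ["Page 2 of 2"]],
]
print(stitch_pages(pages))
```

Because repeats are detected by comparing against the first header, the same function works when later pages omit the header row. A data row that happens to be identical to the header would be dropped, so check the row count on important documents.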
After Extraction: Cleaning Up Your Excel Data
Even with the best extraction method, you may want to clean up the data in Excel:
Remove extra whitespace: Use TRIM() to remove leading and trailing spaces from cells. TRIM() also reduces repeated spaces between words to a single space.
Fix number formatting: Extracted numbers may be formatted as text. Select the column, go to Data → Text to Columns → Finish, and Excel will convert text to numbers.
Standardize dates: Different documents use different date formats. Use Find & Replace or a formula to standardize to your preferred format.
Split or merge columns: If a "Name" column contains both first and last names, use Text to Columns with a space delimiter to split them. If you need to combine columns, use CONCATENATE() or the & operator.
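If you clean extracted data often, the same steps can be scripted instead of repeated by hand in Excel. The following is a minimal sketch, not a complete cleaner. It assumes numbers use a comma as the thousands separator and a dot as the decimal point, and that dates arrive in one of the listed day-first or ISO formats. All helper names are hypothetical.

```python
from datetime import datetime

# Assumed input formats; a US-style month-first date would be misread here.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def clean_text(value):
    """Strip the ends and collapse internal whitespace, similar to TRIM()."""
    return " ".join(value.split())


def to_number(value):
    """Convert text such as '1,234.50' or '$12' to a float, or None if not numeric."""
    cleaned = clean_text(value).replace(",", "").lstrip("$")
    try:
        return float(cleaned)
    except ValueError:
        return None


def to_iso_date(value):
    """Return the date as YYYY-MM-DD, or None if no assumed format matches."""
    text = clean_text(value)
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def split_name(value):
    """Split on the first space, like Text to Columns with a space delimiter."""
    first, _, last = clean_text(value).partition(" ")
    return first, last


print(clean_text("  Ada   Lovelace "))
print(to_number(" 1,234.50 "))
print(to_iso_date("05/03/2024"))
print(split_name("Ada Lovelace"))
```

Returning None for values that do not parse, rather than guessing, makes it easy to spot the cells that still need a manual look.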
With AI extraction tools like DocPrivy, you can do much of this customization before exporting — reordering, hiding, and merging columns in the preview — so the Excel file arrives closer to its final form.
Extract Tables from PDF for Free
DocPrivy extracts tables from any PDF — digital or scanned — and exports directly to Excel or CSV. The AI identifies table structure automatically: headers, rows, columns, and cell values are preserved exactly as they appear in the document.
Upload your PDF at docprivy.com, review the extracted table, customize columns if needed, and download your Excel file. No account required, no file stored, no subscription.