How to Extract Data from W-2 and 1099 Tax Forms with AI (2026 Guide)
AI extraction reads W-2s, 1099s, K-1s, and pay stubs in seconds - no templates, no typing. Plus how to fill out blank IRS tax forms with PDF Form Filler. Updated for 2026 thresholds and new W-2 codes.

April is over, but the tax forms aren't. A bookkeeper still has 1099-NECs to issue for prior-year work the client forgot about. A loan officer is staring at four W-2s and three pay stubs from a mortgage applicant. An accountant on extension is reconciling 47 K-1s for a client's October 15 filing. A small business owner just realized they need to mail amended 1099s to the contractors they missed.
Tax forms aren't a January problem - they're a year-round problem. And nearly every workflow that touches them starts the same way: someone has to read each box on the PDF and type the values into a spreadsheet, accounting software, or another form.
This guide walks through how AI extraction reads tax form data automatically, how to fill out blank tax forms (W-9s, W-4s, 1099 templates) without printing them, and what's changed for the 2026 filing year that you need to know.
Why Tax Form Data Entry Is So Painful
A single W-2 has 25+ data points across 20 boxes. A 1099-NEC has 8. A K-1 can have 30+ depending on the partnership. Multiply that by a few dozen forms and an afternoon disappears into the kind of repetitive work humans are uniquely bad at: copying long strings of digits without a single slip.
The errors aren't random. They're the same handful of mistakes everyone makes:
- SSN transpositions - flipping two adjacent digits in a 9-digit number
- Box 12 codes - missing the difference between Code D (401(k)) and Code DD (employer health coverage)
- State wages - forgetting that an employee who worked in two states needs both lines copied
- Decimal misalignment - $1,234.56 becoming $12,345.60
- Box 14 free-text - manually re-keying inconsistent labels like STDIS, 401K LOAN, or PA SUI
These mistakes don't always show up immediately. They show up six weeks later when an auto-import pipeline rejects 4 of 47 records, or when the IRS sends a CP2000 notice flagging mismatched income.
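Two of the error types above can be flagged mechanically at entry time. The following is a minimal sketch, assuming amounts and ID numbers arrive as strings; `parse_amount` and `is_adjacent_transposition` are illustrative names, not part of any tool mentioned in this guide. Note that a transposed SSN is still a well-formed nine-digit number, so it can only be caught by comparing against a second source (for example, the number on file from a W-9).

```python
from decimal import Decimal, InvalidOperation


def parse_amount(text: str) -> Decimal:
    """Parse a dollar string such as '$1,234.56' into an exact Decimal."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        raise ValueError(f"not a dollar amount: {text!r}") from None
    if not value.is_finite():
        raise ValueError(f"not a dollar amount: {text!r}")
    # Form amounts are dollars and cents; a third decimal place usually
    # means a digit was dropped or the decimal point was mistyped.
    if value.as_tuple().exponent < -2:
        raise ValueError(f"more than two decimal places: {text!r}")
    return value


def is_adjacent_transposition(a: str, b: str) -> bool:
    """True if b equals a with exactly one pair of adjacent characters swapped."""
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )
```

Using `Decimal` rather than `float` keeps cents exact, which matters when the values are later compared against an import file or a return.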
The 2026 Changes You Need to Know
Two regulatory shifts matter for any 2026 form work, both stemming from the One Big Beautiful Bill Act (OBBBA).
1099-NEC and 1099-MISC Threshold Raised to $2,000
For decades, the reporting threshold for nonemployee compensation was $600. For payments made in 2026 and later, it's $2,000. The 1099-MISC threshold rose to the same number. This means fewer forms to issue - but also fewer forms received by contractors, which makes income reconciliation harder. Contractors still owe tax on every dollar earned, but they may not get a 1099 for jobs under $2,000. The threshold will be inflation-adjusted starting 2027.
1099-K Reverted to $20,000 / 200 Transactions
The 1099-K threshold - for payment apps like PayPal, Venmo, Stripe, and credit card processors - was supposed to drop to $600. The OBBBA reverted it to the pre-2022 level: $20,000 in payments AND more than 200 transactions. Most freelancers and side-hustle sellers won't get a 1099-K in 2026 unless they hit both thresholds.
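The two thresholds behave differently: the 1099-NEC test is a single dollar amount, while the 1099-K test requires both conditions at once. A small sketch of that logic follows. It assumes "at least $2,000" for the 1099-NEC and "more than" for both 1099-K tests; the function names are hypothetical, and the boundary handling should be confirmed against the current IRS instructions.

```python
from decimal import Decimal

NEC_THRESHOLD_2026 = Decimal("2000")   # federal threshold cited in this guide
K_AMOUNT_THRESHOLD = Decimal("20000")  # 1099-K gross payments
K_TRANSACTION_THRESHOLD = 200          # 1099-K transaction count


def needs_1099_nec(total_paid: Decimal) -> bool:
    """One test: total paid to the contractor for the year (assumed inclusive)."""
    return total_paid >= NEC_THRESHOLD_2026


def needs_1099_k(gross_payments: Decimal, transactions: int) -> bool:
    """Two tests joined by AND: both must be exceeded (assumed exclusive)."""
    return (
        gross_payments > K_AMOUNT_THRESHOLD
        and transactions > K_TRANSACTION_THRESHOLD
    )
```

A seller with $25,000 in payments across 150 transactions fails the second test, so `needs_1099_k` returns `False` even though the dollar amount is well over the line.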
New W-2 Box 12 Codes for 2026
The 2026 W-2 added three Box 12 codes that didn't exist before:
- TA - Employer contributions to a Trump account (the new tax-advantaged savings vehicle)
- TP - Total cash tips reported to the employer
- TT - Total qualified overtime compensation
Box 14 was also split into 14a (existing "Other" field for state disability, union dues, etc.) and 14b (new field for the Treasury Tipped Occupation Code, used to determine eligibility for the new tips deduction).
Any extraction tool that hasn't been updated for these changes will silently drop the new fields. Verify your tool handles them before using it on 2026 forms.
E-File Requirement Stays at 10 Forms
The IRS e-filing threshold remains 10 information returns - aggregated across all form types. If you issue four 1098s and six 1099-NECs, that's 10 total, and you must e-file. Penalties for late or missing 1099s now run $60 per form (corrected within 30 days), $130 per form (after 30 days but by August 1), $340 per form (after August 1 or unfiled), and $680 per form for intentional disregard with no maximum cap.
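The aggregation rule and the penalty tiers above can be expressed as two small functions. This is a sketch only: the dollar amounts are the ones quoted in this section, the function names are hypothetical, and it assumes the due date and the August 1 cutoff fall in the same calendar year.

```python
from datetime import date
from typing import Mapping, Optional

EFILE_THRESHOLD = 10


def must_efile(counts_by_form: Mapping[str, int]) -> bool:
    """The threshold applies to the total across all form types."""
    return sum(counts_by_form.values()) >= EFILE_THRESHOLD


def penalty_per_form(due: date, filed: Optional[date], intentional: bool = False) -> int:
    """Return the per-form penalty tier, using the amounts quoted above."""
    if intentional:
        return 680
    if filed is None:              # never filed
        return 340
    if filed <= due:               # on time
        return 0
    if (filed - due).days <= 30:   # corrected within 30 days
        return 60
    if filed <= date(due.year, 8, 1):
        return 130
    return 340
```

For example, `must_efile({"1098": 4, "1099-NEC": 6})` is `True` because the counts are summed before the comparison.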
What's Actually on a W-2
A W-2 has six lettered boxes (a–f) for identification and 20 numbered boxes for amounts. Here's what each contains.
Identification
- Box a: Employee's Social Security Number
- Box b: Employer Identification Number (EIN)
- Box c: Employer's name, address, ZIP
- Box d: Control number (optional payroll system reference)
- Boxes e–f: Employee's name and address
Wages and Tax
| Box | Field | Notes |
|---|---|---|
| 1 | Wages, tips, other compensation | The federal taxable amount - usually the most important number |
| 2 | Federal income tax withheld | Goes on the federal return as a credit |
| 3 | Social Security wages | Capped at the annual SS wage base |
| 4 | Social Security tax withheld | 6.2% of Box 3 |
| 5 | Medicare wages and tips | No cap - usually higher than Box 1 |
| 6 | Medicare tax withheld | 1.45% of Box 5 plus 0.9% additional over $200K |
| 7 | Social Security tips | Reported tips subject to SS tax |
| 8 | Allocated tips | Tips assigned by the employer |
| 9 | (Reserved) | Currently unused |
| 10 | Dependent care benefits | DCAP / FSA contributions |
| 11 | Nonqualified plans | Distributions from 457(b) or other NQ plans |
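The rates in the table give a cheap arithmetic check on extracted values: Box 4 should be close to 6.2% of Box 3, and Box 6 close to 1.45% of Box 5 plus 0.9% of anything over $200,000. The sketch below assumes values are already parsed to `Decimal`; the `fica_warnings` name and the $1.00 tolerance (to absorb per-paycheck rounding) are illustrative choices, not a standard.

```python
from decimal import Decimal

SS_RATE = Decimal("0.062")
MEDICARE_RATE = Decimal("0.0145")
ADDITIONAL_MEDICARE_RATE = Decimal("0.009")
ADDITIONAL_MEDICARE_FLOOR = Decimal("200000")


def fica_warnings(box3, box4, box5, box6, tolerance=Decimal("1.00")):
    """Return human-readable warnings when withholding does not match wages."""
    warnings = []

    expected_box4 = box3 * SS_RATE
    if abs(box4 - expected_box4) > tolerance:
        warnings.append(f"Box 4 is {box4}, expected about {expected_box4:.2f}")

    # Additional Medicare tax is withheld only on wages above the floor.
    excess = max(box5 - ADDITIONAL_MEDICARE_FLOOR, Decimal("0"))
    expected_box6 = box5 * MEDICARE_RATE + excess * ADDITIONAL_MEDICARE_RATE
    if abs(box6 - expected_box6) > tolerance:
        warnings.append(f"Box 6 is {box6}, expected about {expected_box6:.2f}")

    return warnings
```

A mismatch does not prove the extraction is wrong (an employer can under- or over-withhold), but it is a strong hint that a digit in one of the four boxes deserves a second look.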
Box 12 - Codes Galore
Box 12 has four sub-fields (12a, 12b, 12c, 12d) and over 30 possible codes. The most common:
- D - 401(k) elective deferrals
- E - 403(b) elective deferrals
- DD - Cost of employer-sponsored health coverage (informational only)
- W - HSA contributions (employer + employee)
- C - Group-term life insurance over $50K
- AA - Roth 401(k) contributions
- BB - Roth 403(b) contributions
- EE - Roth 457(b) contributions
- TA (new 2026) - Trump account contributions
- TP (new 2026) - Cash tips reported
- TT (new 2026) - Qualified overtime compensation
Each code has a number next to it. An extraction tool needs to read both - D 8400.00 is very different from DD 8400.00.
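One way to keep codes and amounts from drifting apart is to store each Box 12 entry as a single record rather than as two parallel columns. A minimal sketch, assuming each sub-field arrives as a raw string like `D 8400.00`; the `Box12Entry` type and `parse_box12` function are hypothetical names for illustration.

```python
import re
from decimal import Decimal
from typing import NamedTuple


class Box12Entry(NamedTuple):
    slot: str       # "12a" through "12d"
    code: str       # one or two capital letters, e.g. "D" or "DD"
    amount: Decimal


# Code, whitespace, then an amount with optional "$", commas, and cents.
ENTRY = re.compile(r"^\s*([A-Z]{1,2})\s+\$?([\d,]+(?:\.\d{1,2})?)\s*$")


def parse_box12(raw: dict) -> list:
    """Turn {'12a': 'D 8400.00', ...} into a list of code+amount records."""
    entries = []
    for slot in ("12a", "12b", "12c", "12d"):
        text = raw.get(slot, "")
        if not text.strip():
            continue  # unused sub-field
        match = ENTRY.match(text)
        if match is None:
            raise ValueError(f"cannot parse Box {slot}: {text!r}")
        amount = Decimal(match.group(2).replace(",", ""))
        entries.append(Box12Entry(slot, match.group(1), amount))
    return entries
```

Because the code is captured as a whole token, `D` and `DD` can never be confused by a prefix match.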
Boxes 13–14 - Checkboxes and Free Text
- Box 13: Three checkboxes - statutory employee, retirement plan, third-party sick pay
- Box 14a: "Other" - state disability tax, union dues, charity contributions, parking, etc.
- Box 14b (new 2026): Treasury Tipped Occupation Code (TTOC) for tips deduction eligibility
Box 14a is the Wild West. Employers put anything they want here, with no standard format. STDIS 234.50 and STATE DISABILITY 234.50 mean the same thing - your extraction tool needs to handle both.
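In practice, handling both usually means mapping free-text labels to a canonical name and surfacing anything unrecognized for review. The sketch below is an assumption-laden illustration: the alias table is a made-up starting point (including the reading of `PA SUI` as Pennsylvania state unemployment insurance), and real employers will use labels it does not cover.

```python
import re

# Hypothetical alias table; extend it as new employer labels show up.
BOX14_ALIASES = {
    "STDIS": "state_disability",
    "STATE DISABILITY": "state_disability",
    "PA SUI": "pa_state_unemployment",
    "401K LOAN": "401k_loan",
}


def normalize_box14_label(label: str) -> str:
    """Map an employer's free-text label to a canonical field name."""
    key = re.sub(r"\s+", " ", label.strip().upper())
    # Unknown labels are kept, clearly marked, rather than silently dropped.
    return BOX14_ALIASES.get(key, "unmapped:" + key)
```

Returning an `unmapped:` value instead of raising keeps a batch moving while still making the gaps easy to find afterwards.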
Boxes 15–20 - State and Local
These six boxes have two rows each, supporting employees who worked in multiple states or localities:
- Box 15: State and employer's state ID number
- Box 16: State wages
- Box 17: State income tax
- Box 18: Local wages
- Box 19: Local income tax
- Box 20: Locality name
Multi-state employees create the most extraction errors - both rows need to be captured separately, and the state codes (PA, NJ, NY) need to attach to the right amount.
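Keeping each state's values together in one record is what prevents a state code from attaching to the wrong amount. A sketch of that pairing for Boxes 15 to 17 follows, assuming each box has been read as a top-to-bottom list of rows; `StateRow` and `pair_state_rows` are hypothetical names, and Boxes 18 to 20 would follow the same pattern.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class StateRow:
    state: str               # Box 15, e.g. "PA"
    employer_state_id: str   # Box 15, employer's state ID number
    wages: Decimal           # Box 16
    income_tax: Decimal      # Box 17


def pair_state_rows(box15, box16, box17):
    """Zip per-box row lists into one record per state, top row first."""
    if not (len(box15) == len(box16) == len(box17)):
        # A missing row in one box would shift every pairing below it.
        raise ValueError("Boxes 15-17 have different numbers of rows")
    return [
        StateRow(state, state_id, Decimal(wages), Decimal(tax))
        for (state, state_id), wages, tax in zip(box15, box16, box17)
    ]
```

Refusing to pair when the row counts differ is deliberate: a silent misalignment is worse than a form that has to be reviewed by hand.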
What's Actually on a 1099
The 1099 family has 21 variants in 2026. Most workflows touch a small subset:
| Form | What It Reports | Common Sender |
|---|---|---|
| 1099-NEC | Non-employee compensation | Clients paying contractors |
| 1099-MISC | Rents, prizes, royalties, settlements | Landlords, lawyers, gig platforms |
| 1099-K | Payment app and credit card processor income | PayPal, Stripe, Venmo, Square |
| 1099-INT | Interest income | Banks, credit unions |
| 1099-DIV | Dividends and capital gains distributions | Brokerages, mutual funds |
| 1099-B | Broker transactions (sales of securities) | Brokerages |
| 1099-R | Retirement and pension distributions | 401(k) administrators, IRA custodians |
| 1099-G | Government payments (unemployment, refunds) | State agencies |
| 1099-S | Real estate sales | Title companies |
The 1099-NEC is the simplest - payer info, recipient info, Box 1 (nonemployee compensation), Box 4 (federal tax withheld), state tax fields. The 1099-DIV and 1099-B are the most complex, with many qualified/ordinary categories and cost basis details that span multiple pages.
Two Workflows: Extracting and Filling
Tax form work splits into two distinct workflows. Most articles only cover one. The reality is that anyone dealing with tax forms regularly needs both.
Workflow 1: Extract Data from Received Forms
You're on the receiving end. A contractor sent a W-9. An employee dropped off three W-2s for a tax-prep client. A mortgage applicant uploaded their 2024 and 2025 W-2s plus four pay stubs. You need the data in a spreadsheet, in your accounting software, or in an underwriting system.
Manual approach: open the PDF, read each box, type into the destination. 5–15 minutes per form. ~95% accurate after a first pass. ~99% after a second.
AI approach: upload the PDF. The AI reads the document, identifies fields by context (not by template position), and returns structured data. 2–5 seconds per form. 96–99% accurate on digital PDFs, 88–95% on photographed or scanned forms.
PDFSub's Extract Data tool handles this - point it at any tax form PDF and it returns clean JSON or CSV with every field labeled. The AI knows that the number after "Wages, tips, other compensation" goes into the box_1_wages field, regardless of where it sits on the page.
Workflow 2: Fill Out Blank Tax Forms
You're on the issuing end. You're a small business owner who needs to send 1099-NECs to four contractors. You're an HR coordinator giving a new hire a blank W-4 to fill out digitally. You're an accountant prepping K-1s for a partnership's investors.
Manual approach: print, write, scan, mail. Or fight with Adobe Acrobat's form fields. Or buy specialized 1099 software for $80–$300/year for a handful of forms.
PDF approach: open the IRS fillable PDF in a tool that recognizes form fields, type your data, save, and either e-deliver or print. PDFSub's PDF Form Filler detects existing AcroForm fields automatically - it works for IRS W-9, W-4, W-2, 1099 templates, and most tax software exports.
For non-fillable PDFs (or for tax forms where you need to add information outside the standard fields), the Edit PDF tool lets you place text, signatures, and shapes anywhere on the page without breaking the underlying document.
The combined workflow:
- Pull the official IRS PDF from irs.gov
- Open in PDF Form Filler - fields detect automatically
- Fill in payer/recipient info, amounts, codes
- Sign with E-Sign
- Redact the SSN/EIN before sending the recipient copy with Redact PDF
- Save copies for your records
(Note on issuing 1099s: the IRS requires e-filing if you're issuing 10 or more information returns of any combined type. The PDF approach works for paper filings and recipient copies, but the IRS submission itself goes through the IRS's IRIS portal or a third-party e-file service. SSA's BSO portal is for W-2s, not 1099s. PDFSub handles the document side; the IRS handles the transmission side.)

Accuracy: What to Actually Expect
Tax forms are easier than invoices for AI extraction in some ways and harder in others.
Easier:
- Standard layouts (IRS forms have fixed structure)
- Pre-printed labels (the AI can lock onto known field names)
- Fixed value types (numeric amounts, dates, EIN/SSN patterns)
Harder:
- Box 12 has four sub-fields with codes - easy to mis-pair codes and amounts
- Box 14 is free-text with no standard
- State boxes 15–20 have two rows that confuse template-based tools
- Photographed forms (cell phone snapshots) introduce glare, perspective distortion, and reflections
Realistic accuracy ranges:
| Source | Header Fields | Numeric Boxes | Box 12 Codes | Box 14 |
|---|---|---|---|---|
| Digital PDF (IRS official) | 99%+ | 98–99% | 96–98% | 92–95% |
| Digital PDF (payroll system export) | 98–99% | 97–99% | 95–97% | 90–94% |
| Scanned (300+ DPI) | 96–98% | 94–97% | 90–94% | 85–90% |
| Phone photo | 90–95% | 88–93% | 82–88% | 75–82% |
For high-stakes use cases (mortgage underwriting, tax filing, audit response), always cross-check Box 1, Box 2, and SSN against the original PDF. The remaining 1–2% error rate matters when a wrong digit means a denied loan or a CP2000 notice.
Privacy: SSNs Are PII
Every W-2 and 1099 contains a Social Security Number. SSNs are the highest-risk PII in any extraction workflow - leaks lead directly to identity theft, and several states require breach notification within as little as 30 days of an SSN exposure.
This makes "where does the data go?" the most important question to ask of any extraction tool.
The risk patterns:
- Cloud-only tools upload your PDF to their servers, run extraction, and may retain the file for "model improvement" - read the privacy policy carefully
- Browser-based tools that say "client-side" should still be verified - open DevTools and check whether the file actually leaves your browser
- Third-party APIs (Google Document AI, AWS Textract, Azure) process documents server-side but don't typically retain them; check each provider's data-retention terms
PDFSub's approach for tax forms specifically:
- For digital PDFs with embedded text, the text is extracted client-side in your browser and only the structured text (not the file) is sent to the AI for labeling
- For scanned forms or phone photos, the file is sent server-side, processed in isolation, and auto-deleted
- For sharing extracted forms (e.g., sending to a tax preparer), the Redact tool draws an opaque black rectangle labeled REDACTED over SSN digits. For highest-security workflows where the underlying content stream must be cleared (not just visually covered), use a dedicated redaction tool that performs full content-stream removal
If you're handling tax forms for clients (accountants, bookkeepers, lenders), this matters more - your liability for a SSN leak isn't theoretical.
Step-by-Step: Extracting Tax Form Data with PDFSub
The workflow:
- Go to the Extract Data tool or open it in the Studio dashboard
- Upload your tax form - drag and drop, or click to browse. Supports up to 20MB; handles W-2, 1099 family, K-1, 1098, W-9, and pay stubs
- Click "Extract Data" - the AI analyzes the form, identifies the form type automatically, and pulls every labeled field
- Review the output - every field is labeled (e.g., box_1_wages, box_12a_code, box_12a_amount)
- Export - download as JSON for system integration, CSV for spreadsheets, or copy fields directly into your tax software
For batch processing (e.g., 47 1099s for client tax prep), upload multiple files in a single session - each form is processed independently.
Pro tip: If your tax form is a phone photo, run it through Clean Scanned PDF first. Deskewing and contrast normalization typically push accuracy from 88% to 95%+.
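If you export JSON but a downstream step wants a spreadsheet, the conversion is a few lines of standard-library code. The sketch below assumes the export is a JSON array with one flat object per form; that shape is an assumption for illustration, not a documented PDFSub format. The field names are the ones shown in the review step above.

```python
import csv
import io
import json


def forms_to_csv(json_text: str) -> str:
    """Flatten a JSON array of per-form objects into CSV text."""
    forms = json.loads(json_text)

    # Collect every field name in first-seen order so that forms with
    # different sets of boxes still share one header row.
    fieldnames = []
    for form in forms:
        for key in form:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(forms)  # missing fields become empty cells
    return buffer.getvalue()
```

Amounts are left as strings on purpose, so nothing is rounded or reformatted on the way into the spreadsheet.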
Step-by-Step: Filling Out a Blank Tax Form with PDFSub
For the issuing-side workflow:
- Download the official IRS PDF from irs.gov/forms. Most IRS forms are AcroForm-fillable
- Open the PDF Form Filler and upload the IRS PDF
- Fields detect automatically - every text box, checkbox, and signature field appears with a label
- Type your data - payer name, EIN, recipient info, amounts, codes
- For non-fillable spots (rare for IRS forms but common for older PDFs or tax-software exports), use Edit PDF to place text anywhere
- Sign with E-Sign - drag your signature into the signature box
- Save the PDF - your filled version is ready to print, e-deliver, or attach to email
For 1099s that need recipient copies, run Redact PDF on Copy B to truncate the recipient's SSN - most issuers mask all but the last four digits before sending the recipient their copy.
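The same last-four-digits convention is worth applying to extracted data before it lands in a spreadsheet or an email. The helper below is a minimal sketch for text values only (it does nothing to a PDF); `mask_ssn` is a hypothetical name.

```python
import re


def mask_ssn(ssn: str) -> str:
    """Return the SSN in truncated form, keeping only the last four digits."""
    digits = re.sub(r"\D", "", ssn)  # drop dashes, spaces, and other separators
    if len(digits) != 9:
        raise ValueError("expected nine digits")
    return "XXX-XX-" + digits[-4:]
```

For example, `mask_ssn("000-00-1234")` returns `"XXX-XX-1234"`, and input that does not contain exactly nine digits raises an error rather than being passed through unmasked.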
Common Tax Form Use Cases
The same extraction + filling workflow shows up in different contexts:
Mortgage and Loan Applications
Lenders need 2 years of W-2s, recent pay stubs, and 1–2 years of tax returns. AI extraction lets a loan processor verify income in 30 seconds instead of 30 minutes. Income calculations cross-check Box 1 (W-2) against the wages reported on Form 1040 and the year-to-date gross on the latest pay stub; for self-employed applicants, Schedule C net profit takes the place of Box 1.
Tax Preparation
For accountants on extension (the October 15 deadline approaches), every minute matters. A typical individual return touches 4–8 W-2s/1099s plus K-1s. Extracting them in 30 seconds vs. 30 minutes is the difference between billable hours and overtime.
IRS Audit Response
When the IRS sends a CP2000 notice for unreported income, the response requires re-checking every 1099 received. Extraction speeds the reconciliation against bank deposits - pair this with the Bank Statement Converter to match 1099 income to actual deposits.
Bookkeeping Reconciliation
For 1099-NEC issuers (most small businesses paying contractors), end-of-year requires reconciling 1099s issued against 1099s the contractors actually received. Extraction makes the cross-check automated.
Onboarding Packages
HR teams use PDF Form Filler to send pre-populated W-4s, I-9s, and direct deposit authorization forms to new hires. The new hire fills the remaining fields and returns the signed PDF - no printing required.
Insurance Underwriting
Life insurance and disability underwriters review tax forms to verify income. AI extraction shaves 60–80% off processing time per applicant.
Best Practices
A few habits significantly improve results:
Use Original PDFs, Not Photos, When Possible
Most employers and brokerages offer PDF download from their portal. The official PDF has embedded text, so it extracts at the highest accuracy. A phone photo of a printed W-2 has no embedded text, requires OCR first, and introduces a 5–10% accuracy hit. Always ask for the PDF.
Verify the SSN, EIN, and Box 1 on First Use
On the first form you process from a new payroll system or brokerage, eyeball-check three fields: SSN, EIN, and the largest dollar amount (Box 1 on a W-2). If those three are right, the rest usually follow. If any of them are wrong, the form layout has a quirk worth investigating.
Standardize the Output Format
Pick CSV for spreadsheets, JSON for APIs. Don't switch mid-batch - downstream parsers break on format changes. The Extract Data tool lets you set the output format once and apply it to every form in a session.
Redact Before Sharing
Before emailing extracted data or PDFs to anyone outside your organization, run Redact PDF on the SSN/EIN. PDFSub's redaction draws an opaque black rectangle labeled REDACTED over the content. Users handling PII at high volume or for regulated workflows should be aware that visual redaction does not strip text from the underlying PDF content stream - for that level of security (where text-extraction tools cannot recover the redacted content), use a dedicated redaction tool that performs full content-stream removal until PDFSub's permanent-removal feature ships.
Keep an Audit Log of Extracted Forms
For accounting and lending, keep a trail: filename, date extracted, who extracted, fields used downstream. If the IRS or an auditor questions a number, you can show the source PDF and the extraction output.
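A trail like this can be as simple as one JSON line per extracted form. The sketch below records the four items listed above and adds a SHA-256 hash of the source file, which is an extra assumption on my part: it lets you show later that the PDF on file is the one that was extracted. The function and field names are illustrative.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(filename, file_bytes, extracted_by, fields_used):
    """Build one audit-log entry for an extracted form."""
    return {
        "filename": filename,
        # Hash of the source PDF, so the record can be tied to an exact file.
        "sha256": hashlib.sha256(file_bytes).hexdigest(),
        "extracted_at": datetime.now(timezone.utc).isoformat(),
        "extracted_by": extracted_by,
        "fields_used": sorted(fields_used),
    }


def to_log_line(record) -> str:
    """Serialize a record as a single line for an append-only log file."""
    return json.dumps(record, sort_keys=True)
```

Appending each `to_log_line(...)` result to a text file gives a log that is easy to search and hard to edit by accident. Keep SSNs out of it: log field names, not field values.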
Don't Skip the New 2026 Boxes
If you're using older extraction tools, verify they handle Box 12 codes TA, TP, TT and Box 14b. A tool that silently drops these fields will produce technically clean exports that are missing legally required data.
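One way to run that verification is to extract a test form whose contents you already know and compare. The sketch below assumes a flat export using the `box_12a_code` naming shown elsewhere in this guide; the `box_14b` key and the `missing_2026_fields` function are hypothetical.

```python
NEW_2026_BOX12_CODES = {"TA", "TP", "TT"}


def missing_2026_fields(expected_codes, expect_14b, export):
    """List new-for-2026 items known to be on the form but absent from the export."""
    exported_codes = {export.get(f"box_12{slot}_code") for slot in "abcd"}
    missing = sorted((set(expected_codes) & NEW_2026_BOX12_CODES) - exported_codes)
    if expect_14b and not export.get("box_14b"):
        missing.append("box_14b")
    return missing
```

An empty list means every new field you expected made it through; anything else names exactly what the tool dropped.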
Beyond W-2 and 1099
The same AI extraction handles related tax forms:
- Schedule K-1 - partnership, S-corp, and trust income (most complex tax form by far - 30+ fields)
- Pay Stubs - current pay period amounts, deductions, and year-to-date totals
- Form 1098 - mortgage interest, student loan interest, tuition payments
- Form W-9 - payee information collection (extract and import to AP system)
- Form W-4 - withholding elections (extract for payroll system entry)
- Form 1040 / Schedule C - full tax returns (extract income lines for loan apps)
For broader financial document workflows, the Bank Statement Converter, Receipt Scanner, and Invoice Extractor cover the rest of the financial document spectrum - all in the same subscription.
FAQ
What's the difference between Box 1 and Box 5 on a W-2?
Box 1 is federal taxable wages - it excludes pre-tax deductions like traditional 401(k) deferrals. Box 5 is Medicare wages - it adds those 401(k) deferrals back in and has no cap. (Cafeteria-plan deductions such as health FSA contributions are generally excluded from both boxes.) Most W-2 readers check Box 1 first because it's what goes on Form 1040, but Box 5 is the right number for Medicare calculations, and Box 3 (capped at the annual wage base) is the right number for Social Security.
Can AI extraction read a phone photo of a W-2?
Yes, but accuracy drops to 88–95% depending on lighting and focus. For best results, use the official PDF from the employer or payroll provider. If you only have a photo, run it through Clean Scanned PDF first to deskew and enhance contrast.
Does PDFSub handle multi-state W-2s?
Yes. The tool reads both rows of boxes 15–20, attaches the state codes to the correct wage and tax amounts, and returns each state's data as a separate object in the output.
Can I fill out an IRS 1099-NEC with PDFSub?
Yes - open the official IRS 1099-NEC PDF in the PDF Form Filler. The fields detect automatically. Type in the payer info, recipient info, and amounts. Save and either print for paper filing or use the saved PDF for recipient copies. (For IRS submission, you'll need to e-file through the IRS's IRIS portal or a third-party transmitter if you're issuing 10+ forms total.)
What happens to my tax form data after extraction?
For digital PDFs with embedded text, extraction happens client-side - the file never leaves your browser. The AI only receives the extracted text (no file). For scanned forms or photos, the file is sent server-side, processed in isolation, and auto-deleted. PDFSub doesn't retain tax form files after processing.
How does AI extraction handle Box 12 codes correctly?
The AI reads each of the four sub-fields (12a, 12b, 12c, 12d) as a code+amount pair. So a W-2 with D 8400.00 in Box 12a and DD 14200.00 in Box 12b returns two distinct rows, each with the correct code-amount mapping. Template-based tools commonly mis-pair these because they read positions, not relationships.
What about K-1s - they're different per partnership?
K-1s vary by entity type (partnership, S-corp, trust) and by partnership-specific allocations, but the box layouts are standardized within each variant. AI extraction handles all three K-1 types (Form 1065, 1120-S, 1041). For partnerships with non-standard supplemental schedules, expect 90–95% accuracy on the main K-1 with manual review of supplemental items.
Can I extract data from prior-year tax forms?
Yes. The IRS revises forms each year, but the AI was trained on multiple years of layouts. W-2s and 1099s from 2018 onward extract reliably. For older forms with discontinued boxes (e.g., the 1099-MISC Box 7 for nonemployee compensation, which moved to the 1099-NEC starting with tax year 2020), the AI handles the legacy layout correctly.
Is the new 1099 reporting threshold of $2,000 the same for every state?
The federal threshold is $2,000 starting in 2026, but several states have lower state-level 1099 thresholds. California, Massachusetts, and others may still require 1099 reporting at $600 for state purposes even if no federal 1099 is required. Check your state revenue agency's guidance before relying solely on the federal threshold.
What's the cheapest way to issue 1099s for a handful of contractors?
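If you're issuing fewer than 10 information returns total, the IRS allows paper filing. Download the IRS 1099-NEC PDF, fill it with PDF Form Filler, and print the recipient and record copies on plain paper. Copy A, the one mailed to the IRS, must be on the official red-ink scannable form (ordered from the IRS or bought from office supply stores), so print your data onto that stock rather than filing a Copy A printed from the downloaded PDF. This avoids the cost of a 1099 e-file service for low-volume issuers.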
Getting Started
If you're processing tax forms - extracting from received forms or filling out blank ones - the math is straightforward. At 5 minutes per form, processing 50 forms takes ~4 hours. At 2–5 seconds per form, AI extraction does it in roughly 2–4 minutes total, with higher accuracy on digital PDFs.
Try PDFSub's Extract Data tool - start a 7-day free trial with full access to all PDF tools. Upload a W-2 or 1099, see the structured output, and decide if the accuracy matches your workflow before committing to a paid plan.
Issuing 1099s this year? PDF Form Filler handles the IRS PDFs without specialized tax-prep software.
Tax forms aren't going away. The good news: 2026 is the first year you don't have to type them by hand.