chris ()
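Graduated from the School of Information, Renmin University of China
Many people who work with documents in Java often run into the same problem
In fact, jacob is a bridge
Download the jacob jar and dll files
After downloading jacob and placing it in the required location (the dll goes on the path)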
import java
import
import com
/**
* Title: pdf extraction
* Description: email:
* Copyright: Matrix Copyright (c)
* Company:
* @author chris
* @version
*/
public class FileExtracter{
public static void main(String[] args) {
ActiveXComponent component = new ActiveXComponent(
String inFile =
String tpFile =
String otFile =
boolean flag = false;
try {
component
Object wordacc = component
Object wordfile = Dispatch
new Object[]{inFile
new int[
Dispatch
Variant f = new Variant(false);
Dispatch
flag = true;
} catch (Exception e) {
e.printStackTrace();
} finally {
component
}
}
}
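Before running a jacob program like the one above, it can help to confirm that the dll really is in a directory the JVM searches for native libraries. The following is a minimal sketch using only the standard library. The class `NativeLibLocator`, its method names, and the default file name `jacob.dll` are assumptions made for illustration (the actual dll name depends on the jacob version you downloaded), and checking `java.library.path` is one reasonable way to do this, not something the jacob distribution prescribes.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: looks for a native library file in a list of directories. */
final class NativeLibLocator {

    /** Builds one candidate file per non-empty directory entry in pathList. */
    static List<File> candidatePaths(String fileName, String pathList) {
        List<File> candidates = new ArrayList<>();
        if (pathList == null || pathList.isEmpty()) {
            return candidates;
        }
        // File.pathSeparator is ";" on Windows and ":" on Unix-like systems.
        for (String dir : pathList.split(Pattern.quote(File.pathSeparator))) {
            if (!dir.isEmpty()) {
                candidates.add(new File(dir, fileName));
            }
        }
        return candidates;
    }

    /** Keeps only the candidates that exist as regular files. */
    static List<File> findExisting(String fileName, String pathList) {
        List<File> hits = new ArrayList<>();
        for (File candidate : candidatePaths(fileName, pathList)) {
            if (candidate.isFile()) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // "jacob.dll" is an assumed default; pass the real file name as the first argument.
        String name = args.length > 0 ? args[0] : "jacob.dll";
        List<File> hits = findExisting(name, System.getProperty("java.library.path"));
        if (hits.isEmpty()) {
            System.out.println(name + " was not found on java.library.path");
        } else {
            for (File hit : hits) {
                System.out.println("found " + hit.getAbsolutePath());
            }
        }
    }
}
```

If the file is not found, the usual remedies are to copy the dll into one of the listed directories or to start the JVM with `-Djava.library.path=` pointing at the directory that holds it.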
poi is an Apache project
Download the wrapped poi package
After downloading
import java
import org
/**
*
Title: word extraction
*
Description: email:
*
Copyright: Matrix Copyright (c)
*
Company:
* @author chris
* @version
*/
public class PdfExtractor {
public PdfExtractor() {
}
public static void main(String args[]) throws Exception
{
FileInputStream in = new FileInputStream (
WordExtractor extractor = new WordExtractor();
String str = extractor
System
System
}
}
However, pdfbox does not yet support Chinese well
Below is an example of how to extract text from a PDF file with pdfbox
import org
import org
import java
import org
import java
/**
*
Title: pdf extraction
*
Description: email:
*
Copyright: Matrix Copyright (c)
*
Company:
* @author chris
* @version
*/
public class PdfExtracter{
public PdfExtracter(){
}
public String GetTextFromPdf(String filename) throws Exception
{
String temp=null;
PDDocument pdfdocument = null;
FileInputStream is=new FileInputStream(filename);
PDFParser parser = new PDFParser( is );
parser
pdfdocument = parser.getPDDocument();
ByteArrayOutputStream out = new ByteArrayOutputStream();
OutputStreamWriter writer = new OutputStreamWriter( out );
PDFTextStripper stripper = new PDFTextStripper();
stripper
writer
byte[] contents = out.toByteArray();
String ts=new String(contents);
System
return ts;
}
public static void main(String args[])
{
PdfExtracter pf=new PdfExtracter();
PDDocument pdfdocument = null;
try{
String ts=pf
System
}
catch(Exception e)
{
e.printStackTrace();
}
}
}
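The example above writes the extracted text through `new OutputStreamWriter( out )` and turns the bytes back into text with `new String(contents)`, both of which use the platform default charset. Independently of pdfbox's own limitations with Chinese, that can garble non-ASCII text on systems whose default charset cannot represent it. The sketch below uses only the standard library to show the same write-then-decode round trip with an explicit charset; the class `CharsetRoundTrip` and its method are hypothetical names introduced for illustration, and this is not a fix for pdfbox itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: encode text to bytes and decode it again with one explicit charset. */
final class CharsetRoundTrip {

    /** Writes text through an OutputStreamWriter and decodes the resulting bytes. */
    static String roundTrip(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Closing the writer flushes any buffered characters into the byte stream.
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Decode with the same charset that was used for encoding.
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587"; // the two characters of the word "Chinese"
        System.out.println(chinese.equals(roundTrip(chinese, StandardCharsets.UTF_8)));
        System.out.println(chinese.equals(roundTrip(chinese, StandardCharsets.US_ASCII)));
    }
}
```

Applied to the pdfbox example, the same idea would mean passing one charset to both the `OutputStreamWriter` and the `String` constructor instead of relying on the default.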
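xpdf is an open-source project
Download the xpdf package
You also need to download the patch package that adds Chinese support
Install the Chinese patch as described in the readme
Below is an example of how to call it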
import java
/**
*
Title: pdf extraction
*
Description: email:
*
Copyright: Matrix Copyright (c)
*
Company:
* @author chris
* @version
*/
public class PdfWin {
public PdfWin() {
}
public static void main(String args[]) throws Exception
{
String PATH_TO_XPDF=
String filename=
String[] cmd = new String[] { PATH_TO_XPDF
Process p = Runtime
BufferedInputStream bis = new BufferedInputStream(p
InputStreamReader reader = new InputStreamReader(bis
StringWriter out = new StringWriter();
char [] buf = new char[
int len;
while((len = reader
//out
System
}
reader
String ts=new String(buf);
System
}
}
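The example above reads the process output into a fixed `char` buffer and then builds the final string from `buf` alone, so it holds at most whatever the last read left in the buffer. A minimal sketch of one alternative follows: it accumulates everything the process prints and decodes it with an explicit charset. The class `PdfToTextRunner` and its method names are hypothetical. The sketch assumes the xpdf command-line tool `pdftotext` is the program being called, using its `-enc` and `-q` options and `-` as the output file to write to standard output; the executable path and PDF file name are supplied by the caller.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical wrapper around the pdftotext command-line tool. */
final class PdfToTextRunner {

    /** Builds the argument list; "-" asks pdftotext to write the text to standard output. */
    static List<String> buildCommand(String pdftotextPath, String pdfFile) {
        return Arrays.asList(pdftotextPath, "-enc", "UTF-8", "-q", pdfFile, "-");
    }

    /** Reads the whole stream, appending only the characters each read returned. */
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int len;
            while ((len = reader.read(buf)) != -1) {
                sb.append(buf, 0, len);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    /** Runs pdftotext and returns everything it printed, decoded as UTF-8. */
    static String extract(String pdftotextPath, String pdfFile)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(pdftotextPath, pdfFile));
        // Send the tool's error output to this JVM's stderr so it cannot block the child.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();
        String text = readAll(p.getInputStream(), StandardCharsets.UTF_8);
        int exitCode = p.waitFor();
        if (exitCode != 0) {
            throw new IOException("pdftotext exited with code " + exitCode);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: PdfToTextRunner <path-to-pdftotext> <file.pdf>");
            return;
        }
        System.out.print(extract(args[0], args[1]));
    }
}
```

Chinese text still depends on the language patch being installed as the readme describes; the sketch only changes how the output is collected on the Java side.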
About the author
Author profile
If anyone has a better approach
From:http://tw.wingwit.com/Article/program/Java/JSP/201311/19681.html