POI: Reading Word XML; POI: Converting XML to a Word Document

Date: 2021-03-31 08:41:14




How to read and write Word documents with Java and POI

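How do I read and write Word documents with Java and POI? Can I read the entire content of one Word file and put it into a newly generated Word file, keeping the tables, images, and so on, with the formatting unchanged?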

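An example would be ideal. Most of what I find online is that very old approach, which can only read the text content, and the newly generated Word file always prompts me to choose an encoding when it is opened, so it is not very usable. Is there a newer solution?

The solution is as follows:

Working with Word in POI
1.1 Add POI support by downloading the POI package.
1.2 POI makes reading Excel files convenient, and it also supports reading Word files in the DOC format.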

However, its distribution does not include the module for Word support, so you need to download a separate POI extension JAR.

Download: get the file extractors-0.4_zip.

2. Extracting the content of a .doc file

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;

    // Assumed package of the WordExtractor shipped in extractors-0.4 (not POI's HWPF WordExtractor)
    import org.textmining.text.extraction.WordExtractor;

    public class WordReader {
        public static String readDoc(String doc) throws Exception {
            // Open an input stream on the .doc file; try-with-resources closes it
            try (InputStream in = new FileInputStream(new File(doc))) {
                WordExtractor extractor = new WordExtractor();
                // Extract the text of the .doc file
                return extractor.extractText(in);
            }
        }

        public static void main(String[] args) {
            try {
                String text = WordReader.readDoc("c:/test.doc");
                System.out.println(text);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

3. Writing a .doc file

    import java.io.ByteArrayInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;

    import org.apache.poi.poifs.filesystem.DirectoryEntry;
    import org.apache.poi.poifs.filesystem.POIFSFileSystem;

    public class WordWriter {
        // Stores the raw text bytes as a stream named "WordDocument" inside an OLE2 container.
        // This is not a real Word binary, which is likely why Word asks for an encoding on open.
        public static boolean writeDoc(String path, String content) {
            boolean w = false;
            // byte[] b = content.getBytes("ISO-8859-1");
            byte[] b = content.getBytes();
            try (ByteArrayInputStream bais = new ByteArrayInputStream(b);
                 FileOutputStream ostream = new FileOutputStream(path)) {
                POIFSFileSystem fs = new POIFSFileSystem();
                DirectoryEntry directory = fs.getRoot();
                directory.createDocument("WordDocument", bais);
                fs.writeFilesystem(ostream);
                w = true; // the original never set the success flag
            } catch (IOException e) {
                e.printStackTrace();
            }
            return w;
        }

        public static void main(String[] args) throws Exception {
            String wr = WordReader.readDoc("D:\\test.doc");
            boolean b = writeDoc("D:\\result.doc", wr);
            System.out.println(b);
        }
    }
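The `writeDoc` method above only wraps raw text bytes in an OLE2 stream named "WordDocument". A genuine .doc `WordDocument` stream begins with a binary File Information Block, so Word cannot parse this file and falls back to treating it as text, which is most likely the encoding prompt the question describes. With POI 3.5 and later, the usual route is to build a real .docx through `XWPFDocument`. To show what such a file minimally contains, here is a JDK-only sketch (no POI) that writes the three parts a minimal .docx needs: `[Content_Types].xml`, `_rels/.rels`, and `word/document.xml`. The class and method names are hypothetical, and it writes no styles part, so Word should use its default formatting.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: builds a minimal .docx (OOXML) package using only the JDK. */
final class MinimalDocx {
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    private static final String DECL =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>";

    /** Returns the package parts (part name -> XML), one w:p per paragraph string. */
    static Map<String, String> parts(String... paragraphs) {
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("[Content_Types].xml", DECL
            + "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">"
            + "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>"
            + "<Default Extension=\"xml\" ContentType=\"application/xml\"/>"
            + "<Override PartName=\"/word/document.xml\" ContentType=\"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>"
            + "</Types>");
        // Package-level relationship pointing at the main document part
        parts.put("_rels/.rels", DECL
            + "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">"
            + "<Relationship Id=\"rId1\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\" Target=\"word/document.xml\"/>"
            + "</Relationships>");
        StringBuilder body = new StringBuilder();
        for (String p : paragraphs) {
            body.append("<w:p><w:r><w:t xml:space=\"preserve\">")
                .append(escape(p))
                .append("</w:t></w:r></w:p>");
        }
        parts.put("word/document.xml", DECL
            + "<w:document xmlns:w=\"" + W_NS + "\"><w:body>" + body + "</w:body></w:document>");
        return parts;
    }

    /** Zips the parts in insertion order, so [Content_Types].xml comes first. */
    static byte[] toZip(Map<String, String> parts) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> part : parts.entrySet()) {
                zip.putNextEntry(new ZipEntry(part.getKey()));
                zip.write(part.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Minimal escaping for text placed inside w:t. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Usage: `Files.write(Paths.get("result.docx"), MinimalDocx.toZip(MinimalDocx.parts("Hello", "你好")));`. Because the text is stored as UTF-8 XML, there is no encoding question for Word to ask.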

Can formatting be set when exporting Word with Java POI?

1. JARs needed to read Word 2003 and Word 2007 files

Reading Word 2003 (.doc) files is relatively simple and needs only two JARs, poi-3.5-beta6-.jar and poi-scratchpad-3.5-beta6-.jar. Word 2007 (.docx) is more trouble; not because the code is harder to write, but because there are more JARs to import, seven in all:

   1. openxml4j-bin-beta.jar
   2. poi-3.5-beta6-.jar
   3. poi-ooxml-3.5-beta6-.jar
   4. dom4j-1.6.1.jar
   5. geronimo-stax-api_1.0_spec-1.0.jar
   6. ooxml-schemas-1.0.jar
   7. xmlbeans-2.3.0.jar

Items 4-7 are dependencies of poi-ooxml-3.5-beta6-.jar (you can find them in the ooxml-lib directory of poi-bin-3.5-beta6-.tar.gz).
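The two JAR sets correspond to two different file formats: a .doc is an OLE2 compound file read by HWPF (in poi-scratchpad), while a .docx is a ZIP-based OOXML package read by XWPF (in poi-ooxml). When the file extension cannot be trusted, the first bytes tell them apart. A JDK-only sketch; the class name is hypothetical, and a ZIP signature alone does not prove the file is a Word document (an .xlsx is a ZIP too). Newer POI releases ship a comparable check in `FileMagic`.

```java
import java.util.Arrays;

/** Hypothetical helper: distinguishes a binary .doc (OLE2) from a .docx (ZIP) by its header. */
final class WordFormatSniffer {
    // Compound File Binary (OLE2) signature used by Word 97-2003 .doc files
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature "PK\3\4", the container format of .docx
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    enum Kind { OLE2_DOC, OOXML_DOCX, UNKNOWN }

    static Kind detect(byte[] header) {
        if (startsWith(header, OLE2)) return Kind.OLE2_DOC;   // HWPF territory (poi-scratchpad)
        if (startsWith(header, ZIP)) return Kind.OOXML_DOCX;  // XWPF territory (poi-ooxml)
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

In practice you would read the first eight bytes of the file and pass them to `detect` before choosing `WordExtractor` or `XWPFWordExtractor`.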

2. Line breaks

Hard line break: a line break stored in the file, e.g. one made by pressing Enter on the keyboard.

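Soft line break: a line can hold only a limited number of characters, so when the text exceeds that width it automatically wraps onto the next line for display.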

For a program, only hard line breaks are recognizable, definite line breaks; soft line breaks depend on the font size and indentation.
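Because only hard breaks exist as characters in the file, a program can split extracted text on them and nothing else; soft wraps never appear in the extracted string. A sketch, assuming the text came from an extractor such as `WordExtractor.getText()` and may end paragraphs with `\r`, `\n`, or `\r\n` (which one depends on the extractor and version). Treating U+000B as a hard break is an additional assumption about how manual line breaks appear in .doc text.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: splits extracted text on hard line breaks only. */
final class HardBreaks {
    // "\r\n" is listed first so a Windows line ending counts as one break, not two.
    // \x0B (vertical tab) is assumed to be Word's manual line break (Shift+Enter) in .doc text.
    private static final String HARD_BREAK = "\r\n|\r|\n|\\x0B";

    static List<String> lines(String text) {
        // limit -1 keeps trailing empty strings, e.g. an empty last paragraph
        return Arrays.asList(text.split(HARD_BREAK, -1));
    }
}
```

A long paragraph that merely wraps on screen comes back as a single element, which is exactly the hard/soft distinction described above.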

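3. Notes on reading

Note that POI does not read the image information in a Word file when extracting text. Also, for Word 2007 (.docx) files that contain tables, all of the table data appears at the end of the extracted string (this describes the 3.5-beta release discussed here).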
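In the .docx XML, tables (`w:tbl`) sit among the paragraphs (`w:p`) inside `w:body` in document order, so the "tables at the end" effect comes from how the extractor walks the document, not from the file itself. A JDK-only sketch that walks the body children in order, so each table row is emitted where the table appears. The class name, the "P:"/"T:" prefixes, and the " | " cell separator are arbitrary choices, and nested tables are not handled separately.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: lists the body blocks of word/document.xml in document order. */
final class BodyWalker {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static List<String> blocks(String documentXml) {
        Element body = (Element) parse(documentXml).getElementsByTagNameNS(W, "body").item(0);
        List<String> out = new ArrayList<>();
        // Direct children of w:body, in the order they appear in the document
        for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) continue;
            if ("p".equals(n.getLocalName())) {
                out.add("P: " + text((Element) n));
            } else if ("tbl".equals(n.getLocalName())) {
                // One entry per table row, emitted right where the table sits
                NodeList rows = ((Element) n).getElementsByTagNameNS(W, "tr");
                for (int i = 0; i < rows.getLength(); i++) {
                    NodeList cells = ((Element) rows.item(i)).getElementsByTagNameNS(W, "tc");
                    List<String> values = new ArrayList<>();
                    for (int j = 0; j < cells.getLength(); j++) {
                        values.add(text((Element) cells.item(j)));
                    }
                    out.add("T: " + String.join(" | ", values));
                }
            }
        }
        return out;
    }

    /** Concatenates the text of every w:t element under e. */
    static String text(Element e) {
        StringBuilder sb = new StringBuilder();
        NodeList ts = e.getElementsByTagNameNS(W, "t");
        for (int i = 0; i < ts.getLength(); i++) sb.append(ts.item(i).getTextContent());
        return sb.toString();
    }

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // required for the *NS lookups above
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }
}
```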

4. Code for reading Word text

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;

    import org.apache.poi.POIXMLDocument;
    import org.apache.poi.POIXMLTextExtractor;
    import org.apache.poi.hwpf.extractor.WordExtractor;
    import org.apache.poi.openxml4j.opc.OPCPackage;
    import org.apache.poi.xwpf.extractor.XWPFWordExtractor;

    public class Test {
        public static void main(String[] args) {
            // try-with-resources closes the .doc input stream
            try (InputStream is = new FileInputStream(new File("2003.doc"))) {
                // Word 2003 (.doc): HWPF WordExtractor from poi-scratchpad
                WordExtractor ex = new WordExtractor(is);
                String text2003 = ex.getText();
                System.out.println(text2003);

                // Word 2007 (.docx): XWPFWordExtractor from poi-ooxml
                OPCPackage opcPackage = POIXMLDocument.openPackage("2007.docx");
                POIXMLTextExtractor extractor = new XWPFWordExtractor(opcPackage);
                String text2007 = extractor.getText();
                System.out.println(text2007);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
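The .docx extractor above works on an OOXML package, which is simply a ZIP archive whose main text lives in the `word/document.xml` part; this is the "Word XML" the title refers to. A JDK-only sketch for pulling any part out as a string so it can be inspected or handed to an XML parser. The class and method names are hypothetical, and it assumes the part is UTF-8 encoded, which is the usual case but not guaranteed by XML. Also note that in POI 4.0 and later, `POIXMLDocument` and `POIXMLTextExtractor` moved under the `org.apache.poi.ooxml` packages, so the imports above apply to the 3.x line.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: reads one part (e.g. word/document.xml) out of .docx bytes, no POI. */
final class DocxParts {
    /** Returns the named part decoded as UTF-8, or null if the package has no such entry. */
    static String read(byte[] docx, String partName) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docx))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                if (!entry.getName().equals(partName)) continue;
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                // read() returns -1 at the end of the current entry
                while ((n = zip.read(buf)) != -1) out.write(buf, 0, n);
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Usage: `String xml = DocxParts.read(Files.readAllBytes(Paths.get("2007.docx")), "word/document.xml");`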

Garbled text when reading and writing Word with Java and POI

Writing (XWPF, .docx):

    import java.io.FileOutputStream;

    import org.apache.poi.xwpf.usermodel.*;

    public class SimpleDocument {
        public static void main(String[] args) throws Exception {
            XWPFDocument doc = new XWPFDocument();

            // Paragraph 1: centered, double borders, bold dot-dot-dash underlined Courier text
            XWPFParagraph p1 = doc.createParagraph();
            p1.setAlignment(ParagraphAlignment.CENTER);
            p1.setBorderBottom(Borders.DOUBLE);
            p1.setBorderTop(Borders.DOUBLE);
            p1.setBorderRight(Borders.DOUBLE);
            p1.setBorderLeft(Borders.DOUBLE);
            p1.setBorderBetween(Borders.SINGLE);
            p1.setVerticalAlignment(TextAlignment.TOP);

            XWPFRun r1 = p1.createRun();
            r1.setBold(true);
            r1.setText("The quick brown fox");
            r1.setFontFamily("Courier");
            r1.setUnderline(UnderlinePatterns.DOT_DOT_DASH);
            r1.setTextPosition(100);

            // Paragraph 2: right-aligned, struck-through 20 pt runs, the second one superscript
            XWPFParagraph p2 = doc.createParagraph();
            p2.setAlignment(ParagraphAlignment.RIGHT);
            p2.setBorderBottom(Borders.DOUBLE);
            p2.setBorderTop(Borders.DOUBLE);
            p2.setBorderRight(Borders.DOUBLE);
            p2.setBorderLeft(Borders.DOUBLE);
            p2.setBorderBetween(Borders.SINGLE);

            XWPFRun r2 = p2.createRun();
            r2.setText("jumped over the lazy dog");
            r2.setStrike(true);
            r2.setFontSize(20);

            XWPFRun r3 = p2.createRun();
            r3.setText("and went away");
            r3.setStrike(true);
            r3.setFontSize(20);
            r3.setSubscript(VerticalAlign.SUPERSCRIPT);

            // Paragraph 3: justified, exact line spacing, first-line indent, starts a new page
            XWPFParagraph p3 = doc.createParagraph();
            p3.setWordWrap(true);
            p3.setPageBreak(true);
            p3.setAlignment(ParagraphAlignment.BOTH);
            p3.setSpacingLineRule(LineSpacingRule.EXACT);
            p3.setIndentationFirstLine(600);

            XWPFRun r4 = p3.createRun();
            r4.setTextPosition(20);
            r4.setText("To be, or not to be: that is the question: "
                    + "Whether 'tis nobler in the mind to suffer "
                    + "The slings and arrows of outrageous fortune, "
                    + "Or to take arms against a sea of troubles, "
                    + "And by opposing end them? To die: to sleep; ");
            r4.addBreak(BreakType.PAGE);
            r4.setText("No more; and by a sleep to say we end "
                    + "The heart-ache and the thousand natural shocks "
                    + "That flesh is heir to, 'tis a consummation "
                    + "Devoutly to be wish'd. To die, to sleep; "
                    + "To sleep: perchance to dream: ay, there's the rub; .......");
            r4.setItalic(true);

            XWPFRun r5 = p3.createRun();
            r5.setTextPosition(-10);
            r5.setText("For in that sleep of death what dreams may come");
            r5.addCarriageReturn();
            r5.setText("When we have shuffled off this mortal coil, "
                    + "Must give us pause: there's the respect "
                    + "That makes calamity of so long life;");
            r5.addBreak();
            r5.setText("For who would bear the whips and scorns of time, "
                    + "The oppressor's wrong, the proud man's contumely,");
            r5.addBreak(BreakClear.ALL);
            r5.setText("The pangs of despised love, the law's delay, "
                    + "The insolence of office and the spurns .......");

            // try-with-resources closes the output stream even if write() fails
            try (FileOutputStream out = new FileOutputStream("simple.docx")) {
                doc.write(out);
            }
        }
    }
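Each `XWPFRun` setter above ends up as a property inside the run's `w:rPr` element in `word/document.xml`. One detail worth knowing: WordprocessingML stores font size in half-points, so `setFontSize(20)` corresponds to `w:sz w:val="40"`. A simplified JDK-only sketch of the run XML for a few properties; it is not POI's exact output (POI may write additional elements), and the class name is hypothetical.

```java
/** Hypothetical helper: roughly the WordprocessingML a formatted run serializes to. */
final class RunXml {
    /**
     * @param fontSizePt font size in points; w:sz is stored in half-points,
     *                   so 20 pt is written as w:val="40" (0 means "no size set")
     */
    static String run(String text, boolean bold, boolean strike, int fontSizePt) {
        StringBuilder rPr = new StringBuilder();
        // Schema order inside w:rPr: b comes before strike, which comes before sz
        if (bold) rPr.append("<w:b/>");
        if (strike) rPr.append("<w:strike/>");
        if (fontSizePt > 0) rPr.append("<w:sz w:val=\"").append(fontSizePt * 2).append("\"/>");
        String props = rPr.length() == 0 ? "" : "<w:rPr>" + rPr + "</w:rPr>";
        String escaped = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return "<w:r>" + props + "<w:t xml:space=\"preserve\">" + escaped + "</w:t></w:r>";
    }
}
```

For example, `RunXml.run("jumped over the lazy dog", false, true, 20)` mirrors run `r2` above.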

How do I merge multiple Word documents with Java?

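In Java you can use this open-source framework to read, merge, and otherwise process Word documents. Apache POI is an open-source project for reading and writing Microsoft OLE2 documents such as Excel and Word files in Java.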
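The answer does not show an actual merge. One low-level approach, sketched here with only the JDK, is to append the body content of one `word/document.xml` to another while keeping the target's final `w:sectPr` (page settings) as the last body element. This is an illustrative assumption, not POI's API, and the class name is hypothetical: it carries over text and tables only. Images, styles, numbering, and anything else referenced through relationship IDs would need their parts and relationships copied as well, which this sketch does not do.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: appends one document.xml body to another (text and tables only). */
final class BodyMerger {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static String merge(String targetXml, String sourceXml) {
        Document target = parse(targetXml);
        Element targetBody = body(target);
        // Find the trailing w:sectPr (if any) so inserted content lands before it
        Node sectPr = null;
        for (Node n = targetBody.getLastChild(); n != null; n = n.getPreviousSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                if ("sectPr".equals(n.getLocalName())) sectPr = n;
                break;
            }
        }
        for (Node n = body(parse(sourceXml)).getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && !"sectPr".equals(n.getLocalName())) {
                // importNode deep-copies into the target; insertBefore(x, null) appends at the end
                targetBody.insertBefore(target.importNode(n, true), sectPr);
            }
        }
        return serialize(target);
    }

    private static Element body(Document doc) {
        return (Element) doc.getElementsByTagNameNS(W, "body").item(0);
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    private static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize XML", e);
        }
    }
}
```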

The then-latest 3.5 release brought many improvements, adding support for the OOXML formats of Office 2007, such as xlsx, docx, and pptx documents.

Example:

Word 97-2003 (.doc):

    import java.io.FileInputStream;
    import org.apache.poi.hwpf.extractor.WordExtractor;

    // filePath is the path of the document to read
    // Get a .doc extractor
    WordExtractor doc = new WordExtractor(new FileInputStream(filePath));
    // Extract the .doc body text
    String text = doc.getText();
    // Extract the .doc comments
    String[] comments = doc.getCommentsText();

Word 2007 (.docx):

    import org.apache.poi.POIXMLDocument;
    import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
    import org.apache.poi.xwpf.usermodel.XWPFComment;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;

    // Get a .docx extractor
    XWPFWordExtractor docx = new XWPFWordExtractor(POIXMLDocument.openPackage(filePath));
    // Extract the .docx body text
    String text = docx.getText();
    // Extract the .docx comments (getDocument() must be cast to XWPFDocument)
    XWPFComment[] comments = ((XWPFDocument) docx.getDocument()).getComments();
    for (XWPFComment comment : comments) {
        // Comment ID, author, and text
        System.out.println(comment.getId() + " " + comment.getAuthor() + ": " + comment.getText());
    }
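`XWPFDocument.getComments()` reads the comments from the `word/comments.xml` part, where each `w:comment` element carries `w:id` and `w:author` attributes and contains ordinary paragraphs. A JDK-only sketch of reading that part directly (for example after pulling it out of the ZIP); the class name and the `Comment` holder are hypothetical, and the text of multi-paragraph comments is simply concatenated.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: lists id, author and text of each w:comment in word/comments.xml. */
final class CommentsXml {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Simple holder for one comment. */
    static final class Comment {
        final String id;
        final String author;
        final String text;

        Comment(String id, String author, String text) {
            this.id = id;
            this.author = author;
            this.text = text;
        }
    }

    static List<Comment> comments(String commentsXml) {
        Document doc = parse(commentsXml);
        List<Comment> out = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagNameNS(W, "comment");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element c = (Element) nodes.item(i);
            StringBuilder text = new StringBuilder();
            NodeList ts = c.getElementsByTagNameNS(W, "t");
            for (int j = 0; j < ts.getLength(); j++) text.append(ts.item(j).getTextContent());
            // w:id and w:author are namespaced attributes, so look them up with the W namespace
            out.add(new Comment(c.getAttributeNS(W, "id"), c.getAttributeNS(W, "author"), text.toString()));
        }
        return out;
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }
}
```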

Does anyone have a JAR for converting .docx to HTML with POI?

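Do you want to read the Word file as a stream? Reading Word files needs some additional JARs; you can process Word with POI, and there are plenty of examples online. If you just want the file to open in the browser, you can configure the file type in web.xml:

    <mime-mapping>
        <extension>doc</extension>
        <mime-type>application/msword</mime-type>
    </mime-mapping>

Then clicking the link to the Word file opens it in the browser. However, this won't work if a download manager on your computer (such as Xunlei) has installed a browser plugin.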

How can Java read a Word .doc document and identify the title, document number, issue date, and similar information?

    package com.wds.excelxml;

    import java.io.FileInputStream;
    import java.io.IOException;

    import org.apache.poi.hssf.usermodel.HSSFCell;
    import org.apache.poi.hssf.usermodel.HSSFDataFormatter;
    import org.apache.poi.hssf.usermodel.HSSFRow;
    import org.apache.poi.hssf.usermodel.HSSFSheet;
    import org.apache.poi.hssf.usermodel.HSSFWorkbook;

    import nu.xom.Attribute;
    import nu.xom.Document;
    import nu.xom.Element;

    public class Excelxml {
        public static void main(String[] args) {
            excelxml();
        }

        /**
         * From Excel to XML
         * From XML to Excel
         */
        private static void excelxml() {
            // To create an XML document, first create a root element
            Element reportRoot = new Element("sheet");
            Document xmlReport = new Document(reportRoot);
            // Read the Excel file; try-with-resources closes the stream
            try (FileInputStream excelFIS = new FileInputStream("D:\\JavaTest\\Employee_List.xls")) {
                // Create the Excel workbook
                HSSFWorkbook excelWB = new HSSFWorkbook(excelFIS);
                // Get the first sheet
                HSSFSheet excelSheet = excelWB.getSheetAt(0);
                // Number of rows in the sheet
                int rows = excelSheet.getPhysicalNumberOfRows();
                // Iterate over the rows
                for (int rowIndex = 0; rowIndex < rows; rowIndex++) {
                    HSSFRow oneRow = excelSheet.getRow(rowIndex);
                    if (oneRow == null) {
                        continue;
                    }
                    // Create an XML row element for each row
                    Element rowElement = new Element("row");
                    // Number of cells in the current row
                    int cells = oneRow.getPhysicalNumberOfCells();
                    // Iterate over the cells in the row
                    for (int cellIndex = 0; cellIndex < cells; cellIndex++) {
                        HSSFCell oneCell = oneRow.getCell(cellIndex);
                        if (oneCell == null) {
                            continue;
                        }
                        // Default element name (used for the header row)
                        String elementName = "header";
                        // Column position of the cell
                        int cellColumnIndex = oneCell.getColumnIndex();
                        if (rowIndex > 0) {
                            // Later rows take their element name from the header row
                            elementName = reportRoot.getFirstChildElement("row")
                                    .getChild(cellColumnIndex).getValue();
                        }
                        // Strip illegal characters: \P{ASCII} (upper case) matches any non-ASCII character
                        // (a second replaceAll in the original listing is garbled and omitted)
                        elementName = elementName.replaceAll("\\P{ASCII}", "");
                        Element cellElement = new Element(elementName);
                        // Add attributes and elements
                        //String attributeValue = oneCell.getCellStyle().getDataFormatString();
                        //Attribute dataFormatAttribute = new Attribute("dataFormat", attributeValue);
                        //cellElement.addAttribute(dataFormatAttribute);
                        // Add a dataType attribute according to the cell type (POI 3.x int constants)
                        Attribute strTypeAttribute = null;
                        switch (oneCell.getCellType()) {
                            case HSSFCell.CELL_TYPE_STRING:
                                strTypeAttribute = new Attribute("dataType", "String");
                                cellElement.addAttribute(strTypeAttribute);
                                cellElement.appendChild(oneCell.getStringCellValue());
                                rowElement.appendChild(cellElement);
                                break;
                            case HSSFCell.CELL_TYPE_NUMERIC:
                                strTypeAttribute = new Attribute("dataType", "Numeric");
                                cellElement.addAttribute(strTypeAttribute);
                                HSSFDataFormatter dataFormatter = new HSSFDataFormatter();
                                String cellFormatted = dataFormatter.formatCellValue(oneCell);
                                cellElement.appendChild(cellFormatted);
                                rowElement.appendChild(cellElement);
                                break;
                        }
                    }
                    if (rowElement.getChildCount() > 0) {
                        reportRoot.appendChild(rowElement);
                    }
                }
                System.out.println(xmlReport.toXML());
            } catch (IOException e) {
                e.printStackTrace();
            }
            // The original listing is truncated here; the XML-to-Excel direction is not shown.
        }
    }
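The listing above converts an Excel sheet to XML rather than answering the .doc question. Once the text of a .doc has been extracted (for example with HWPF's `WordExtractor.getText()`), fields such as the document number and issue date are usually found by pattern matching. A sketch assuming the common layout of Chinese official documents: a document number such as 某政发〔2021〕12号 on its own line, the title as the first non-empty line after it, and the signing date written as 2021年3月31日 (the last such date in the text is taken). All three rules, and the class name, are assumptions to adjust to the real documents.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls doc number, title and date out of extracted plain text. */
final class DocMetadata {
    // Assumed document-number shape: issuer abbreviation + 〔year〕 + serial + 号
    private static final Pattern DOC_NO = Pattern.compile("\\S+〔\\d{4}〕\\d+号");
    // Assumed date shape: 2021年3月31日
    private static final Pattern DATE = Pattern.compile("\\d{4}年\\d{1,2}月\\d{1,2}日");

    static Map<String, String> extract(String text) {
        Map<String, String> out = new LinkedHashMap<>();
        String[] lines = text.split("\r\n|\r|\n");
        int titleFrom = 0;
        for (int i = 0; i < lines.length; i++) {
            Matcher m = DOC_NO.matcher(lines[i]);
            if (m.find()) {
                out.put("docNumber", m.group());
                titleFrom = i + 1;
                break;
            }
        }
        // Assumption: the title is the first non-empty line after the document number
        for (int i = titleFrom; i < lines.length; i++) {
            if (!lines[i].trim().isEmpty()) {
                out.put("title", lines[i].trim());
                break;
            }
        }
        // Assumption: the signing date is the last date in the text, so later matches overwrite
        Matcher d = DATE.matcher(text);
        while (d.find()) out.put("date", d.group());
        return out;
    }
}
```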

What is Word?

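Microsoft Office Word is a word-processor application from Microsoft. It was originally written by Richard Brodie in 1983 for IBM computers running DOS. Later versions ran on the Apple Macintosh (1984), SCO UNIX, and Microsoft Windows (1989), and it became part of Microsoft Office.

Word gives users tools for creating professional, elegant documents, helping them save time and get polished results. Microsoft Office Word has long been the most popular word-processing program. As the core program of the Office suite, Word provides many easy-to-use document-creation tools, along with a rich feature set for building complex documents. Even a little text formatting or image handling in Word can make a simple document more attractive than plain text alone.

Microsoft Word is the dominant word processor in current use, which has made Word's proprietary file format (.doc) the de facto most common standard. There is more than one Word file format: as Word itself is updated, the file format is revised to a greater or lesser degree, and files in a newer format cannot necessarily be read by an older version of the program (largely because the older version has no built-in support for the newer format). Microsoft has published the details of the Word 97 DOC format, but the documentation for newer versions has not been made public and is available only within the company and to governments and research institutions. There are industry rumors that some features of the Word file format are not understood even by Microsoft itself.

Other office software that competes with Word must support Word's proprietary format, since it is the de facto most common one. Because the details of the Word file format are not public, this compatibility is usually achieved through reverse engineering. Many word processors, such as AbiWord and OpenOffice, have converters for exporting and importing Word files (see the discussion of other competing software under text editors). Apache Jakarta POI is an open-source Java library whose main goal is to access Word's binary file format. Not long ago, Microsoft itself also began offering viewers that can display Word files without the Word program, for example Word Viewer 2003.

The Word file formats from Microsoft Office Word 97 up to Microsoft Office Word 2003 are binary formats. Not long ago, Microsoft announced that it would adopt XML-based file formats for its office suite going forward. Word 2003 offers a WordprocessingML option, a public XML file format backed by the Danish government and other organizations. The Professional edition of Word 2003 can work directly with non-Microsoft file formats.

There are many reasons why Windows 7 keeps showing "Microsoft Word has stopped working", mainly the following:

1. The Word version is incompatible with Windows 7. Solution: download and install a Word version compatible with Windows 7.
2. Word is misconfigured. Solution: search the computer for the file "Normal.dot" and delete it.
3. The computer is short of memory, causing Word to close. Solution: close other running programs you are not using, and clear the computer's cache.
4. The computer is infected with a virus. Solution: run a full scan with antivirus software.
5. A system fault. Solution: repair the system, for example through System Restore.
6. The Word installation is damaged. Solution: uninstall Word, then reinstall it.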
