| 1 | +########### |
| 2 | +Python docx |
| 3 | +########### |
| 4 | + |
| 5 | +Introduction |
| 6 | +============ |
| 7 | + |
| 8 | +The docx module creates, reads and writes Microsoft Office Word 2007 docx |
| 9 | +files. |
| 10 | + |
| 11 | +These are referred to as 'WordML', 'Office Open XML' and 'Open XML' by |
| 12 | +Microsoft. |
| 13 | + |
| 14 | +These documents can be opened in Microsoft Office 2007 / 2010, Microsoft Mac |
| 15 | +Office 2008, Google Docs, OpenOffice.org 3, and Apple iWork 08. |
| 16 | + |
| 17 | +They also `validate as well formed XML <http://validator.w3.org/check>`_. |
| 18 | + |
| 19 | +The module was created when I was looking for Python support for MS Word |
| 20 | +.docx files, but could only find various hacks involving COM automation, |
| 21 | +calling .NET or Java, or automating OpenOffice or MS Office. |
| 22 | + |
| 23 | +The docx module has the following features: |
| 24 | + |
| 25 | +Making documents |
| 26 | +---------------- |
| 27 | + |
| 28 | +Features for making documents include: |
| 29 | + |
| 30 | +- Paragraphs |
| 31 | +- Bullets |
| 32 | +- Numbered lists |
| 33 | +- Document properties (author, company, etc.) |
| 34 | +- Multiple levels of headings |
| 35 | +- Tables |
| 36 | +- Section and page breaks |
| 37 | +- Images |
| 38 | + |
| 39 | +.. image:: http://github.com/mikemaccana/python-docx/raw/master/screenshot.png |
| 40 | + |
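To give a feel for what "making a document" involves at the file level, here is a minimal sketch that writes a one-part .docx package using only the Python standard library. It is not the docx module's API: the function name ``build_docx`` is hypothetical, and the package holds only the three parts a word processor typically needs to open a plain document (no styles, properties or images).

```python
import io
import zipfile
from xml.sax.saxutils import escape

# WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Declares the content type of each part in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" '
    'ContentType="application/vnd.openxmlformats-officedocument.'
    'wordprocessingml.document.main+xml"/>'
    "</Types>"
)

# Points the package at its main document part.
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/'
    'relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)


def build_docx(paragraphs):
    """Return the bytes of a minimal .docx with one paragraph per string.

    Hypothetical helper for illustration; not part of the docx module.
    """
    body = "".join(
        '<w:p><w:r><w:t xml:space="preserve">%s</w:t></w:r></w:p>' % escape(text)
        for text in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>' % (W_NS, body)
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", CONTENT_TYPES)
        archive.writestr("_rels/.rels", RELS)
        archive.writestr("word/document.xml", document)
    return buffer.getvalue()
```

Writing the returned bytes to a file ending in ``.docx`` should give a document with one plain paragraph per input string. Headings, tables and images need further parts and markup, which is the work the docx module does for you.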
| 41 | + |
| 42 | +Editing documents |
| 43 | +----------------- |
| 44 | + |
| 45 | +Thanks to the awesomeness of the lxml module, we can: |
| 46 | + |
| 47 | +- Search and replace |
| 48 | +- Extract plain text of document |
| 49 | +- Add and delete items anywhere within the document |
| 50 | +- Change document properties |
| 51 | +- Run XPath queries against particular locations in the document - useful for |
| 52 | + retrieving data from user-completed templates. |
| 53 | + |
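The search, replace and XPath features above all come down to walking the document's XML tree. As a rough illustration only, the sketch below uses the standard library's ``xml.etree.ElementTree`` (which supports a limited XPath subset) instead of lxml, and runs against a hand-written fragment of ``word/document.xml``. The names ``text_runs`` and ``replace_text`` and the sample data are hypothetical, not the docx module's own functions.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, mapped to the conventional "w" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hand-written stand-in for the body of a user-completed template.
SAMPLE = (
    '<w:document xmlns:w="%s"><w:body>'
    "<w:p><w:r><w:t>Name: Alice</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>City: Perth</w:t></w:r></w:p>"
    "</w:body></w:document>" % W_NS
)


def text_runs(root):
    """Return every <w:t> element that sits in a run inside a paragraph."""
    return root.findall(".//w:p/w:r/w:t", NS)


def replace_text(root, old, new):
    """Replace `old` with `new` in each text run; return how many runs changed."""
    changed = 0
    for run in text_runs(root):
        if run.text and old in run.text:
            run.text = run.text.replace(old, new)
            changed += 1
    return changed


root = ET.fromstring(SAMPLE)
replace_text(root, "Alice", "Bob")
```

One caveat with this simple approach: Word often splits a phrase across several runs, so a search string that spans a run boundary will not match run-by-run.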
| 54 | + |
| 55 | +Getting started |
| 56 | +=============== |
| 57 | + |
| 58 | +Making and Modifying Documents |
| 59 | +------------------------------ |
| 60 | + |
| 61 | +- Just `download python docx <http://github.com/mikemaccana/python-docx/tarball/master>`_. |
| 62 | +- Use **pip** or **easy_install** to fetch the **lxml** and **PIL** modules. |
| 63 | +- Then run:: |
| 64 | + |
| 65 | + example-makedocument.py |
| 66 | + |
| 67 | + |
| 68 | +Congratulations, you just made and then modified a Word document! |
| 69 | + |
| 70 | + |
| 71 | +Extracting Text from a Document |
| 72 | +------------------------------- |
| 73 | + |
| 74 | +If you just want to extract the text from a Word file, run:: |
| 75 | + |
| 76 | + example-extracttext.py 'Some word file.docx' 'new file.txt' |
| 77 | + |
| 78 | + |
| 79 | +Ideas & To Do List |
| 80 | +~~~~~~~~~~~~~~~~~~ |
| 81 | + |
| 82 | +- Further improvements to image handling |
| 83 | +- Document health checks |
| 84 | +- Egg |
| 85 | +- Markdown conversion support |
| 86 | + |
| 87 | + |
| 88 | +We love forks, changes and pull requests! |
| 89 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 90 | + |
| 91 | +- Check out `HACKING <HACKING.markdown>`_ to add your own changes! |
| 92 | +- Fork this project on GitHub |
| 93 | +- Send a pull request via GitHub and we'll add your changes! |
| 94 | + |
| 95 | +Want to talk? Need help? |
| 96 | +~~~~~~~~~~~~~~~~~~~~~~~~ |
| 97 | + |
| 98 | + |
| 99 | + |
| 100 | + |
| 101 | +License |
| 102 | +~~~~~~~ |
| 103 | + |
| 104 | +Licensed under the `MIT license <http://www.opensource.org/licenses/mit-license.php>`_ |
| 105 | + |
| 106 | +Short version: this code is copyrighted to me (Mike MacCana), I give you |
| 107 | +permission to do what you want with it except remove my name from the credits. |
| 108 | +See the LICENSE file for specific terms. |