This repository was archived by the owner on Jan 12, 2018. It is now read-only.

Commit 08fc979

- Keep HACKING as HACKING, add SERVING_SUGGESTIONS
1 parent 8e3b353 commit 08fc979

3 files changed, with 111 additions and 111 deletions

HACKING.markdown

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
1+
Adding Features
2+
===============
3+
4+
# Recommended reading
5+
6+
- The [LXML tutorial](http://codespeak.net/lxml/tutorial.html) covers the basics of XML etrees, which we create, append and insert to make XML documents. LXML also provides XPath, which we use to specify locations in the document.
7+
- The [OpenXML WordML specs and videos](http://openxmldeveloper.org) (if you're stuck). [The OpenXML ECMA spec in particular](http://www.ecma-international.org/publications/files/ECMA-ST/Office%20Open%20XML%201st%20edition%20Part%204%20(DOCX).zip) is the main file you care about.
8+
9+
- Learning about [XML namespaces](http://www.w3schools.com/XML/xml_namespaces.asp)
10+
- The [Namespaces section of Dive into Python](http://diveintopython3.org/xml.html)
11+
- Microsoft's [introduction to the Office (2007) Open XML File Formats](http://msdn.microsoft.com/en-us/library/aa338205.aspx)
12+
13+
# How can I contribute?
14+
15+
Fork the project on GitHub, then send the main project a [pull request](http://github.com/guides/pull-requests). The project will then accept your pull (in most cases), which will show your changes as part of the changelog for the main project, along with your name and picture.
16+
17+
# A note about namespaces and LXML
18+
19+
LXML doesn't use namespace prefixes. It just uses the actual namespaces, and wants you to set a namespace on each tag. For example, rather than making an element with the 'w' namespace prefix, you'd make an element with the '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}' prefix.
20+
21+
To make this easier:
22+
23+
- The most common namespace, '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}' (prefix 'w') is automatically added by makeelement()
24+
- You can specify other namespaces with 'nsprefix', which maps the prefixes Word files use to the actual namespaces, eg:
25+
26+
<pre>makeelement('coreProperties',nsprefix='cp')</pre>
27+
28+
will generate:
29+
30+
<ns0:coreProperties xmlns:ns0="http://schemas.openxmlformats.org/package/2006/metadata/core-properties">
31+
32+
which is the same as what Word generates:
33+
34+
<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties">
35+
36+
The namespace prefixes are different, but that's irrelevant as the namespaces themselves are the same.
37+
38+
There's also a cool side effect - you can ignore setting 'xmlns' attributes that aren't used directly in the current element, since there's no need. Eg, you can make the equivalent of this from a Word file:
39+
40+
<cp:coreProperties
41+
xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
42+
xmlns:dc="http://purl.org/dc/elements/1.1/"
43+
xmlns:dcterms="http://purl.org/dc/terms/"
44+
xmlns:dcmitype="http://purl.org/dc/dcmitype/"
45+
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
46+
</cp:coreProperties>
47+
48+
With the following code:
49+
50+
docprops = makeelement('coreProperties',nsprefix='cp')
51+
52+
We only need to specify the 'cp' prefix because that's what this element uses. The other 'xmlns' attributes are used to specify the prefixes for child elements. We don't need to specify them here because each child element will have its namespace specified when we make that child.
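
As a rough illustration of the point above, here is a minimal sketch using the standard library's `xml.etree.ElementTree`, whose element API LXML mirrors. The `make_element` helper and the `NSPREFIXES` map are hypothetical stand-ins written for this example, not the project's actual `makeelement()`; only the two namespace URIs come from the text above.

```python
import xml.etree.ElementTree as ET

# Prefix -> namespace URI map (both URIs appear in the text above).
NSPREFIXES = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
}


def make_element(tagname, nsprefix="w"):
    """Hypothetical stand-in for makeelement(): build a namespaced element."""
    # Clark notation: the namespace URI in braces, followed by the local name.
    return ET.Element("{%s}%s" % (NSPREFIXES[nsprefix], tagname))


docprops = make_element("coreProperties", nsprefix="cp")

# What Word writes, with the 'cp' prefix declared explicitly.
word_xml = (
    '<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/'
    'package/2006/metadata/core-properties"/>'
)
from_word = ET.fromstring(word_xml)

# Both elements carry the same qualified tag, whatever prefix is serialized.
print(docprops.tag == from_word.tag)
```

The comparison is on the qualified tag rather than on the serialized text, because the prefix chosen at serialization time ('ns0' versus 'cp') is exactly the part that does not matter.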
53+
54+
# Coding Style
55+
56+
Basically just look at what's there. But if you need something more specific:
57+
58+
- Functional - every function should take some inputs, return something, and not use any globals.
59+
- [Google Python Style Guide style](http://code.google.com/p/soc/wiki/PythonStyleGuide)
60+
61+
# Unit Testing
62+
63+
After adding code, open **tests/test_docx.py** and add a test that calls your function and checks its output.
64+
65+
- Use **easy_install** to fetch the **nose** and **coverage** modules
66+
- Run
67+
68+
<pre>nosetests --with-coverage</pre>
69+
70+
to run all the tests. They should all pass.
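
For a sense of what such a test might look like, here is a minimal sketch. The function `join_runs` is hypothetical and defined inline purely for illustration; a real test would import the function you added instead. It also follows the functional style described above: inputs in, a return value out, no globals.

```python
def join_runs(runs):
    """Hypothetical function under test: concatenate the text of each run."""
    return "".join(runs)


def test_join_runs():
    """nose collects functions whose names start with 'test'."""
    assert join_runs(["Hello, ", "world"]) == "Hello, world"
    assert join_runs([]) == ""
```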
71+
72+
# Tips
73+
74+
## If Word complains about files:
75+
76+
First, determine whether Word can recover the files:
77+
- If Word cannot recover the file, you most likely have a problem with your zip file
78+
- If Word can recover the file, you most likely have a problem with your XML
79+
80+
### Common Zipfile issues
81+
82+
- Ensure the same file isn't included twice in your zip archive. Zip supports this, Word doesn't.
83+
- Ensure that all media files have an entry for their file type in [Content_Types].xml
84+
- Ensure that files in the zip file have leading '/'s removed.
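
Two of the checks above (duplicate entries and leading '/'s) can be automated with the standard library's `zipfile` module. This is only a sketch: the function names `find_name_problems` and `check_docx` are made up for this example, and the [Content_Types].xml check is not covered.

```python
import zipfile
from collections import Counter


def find_name_problems(names):
    """Return (duplicates, leading_slash) lists for a list of archive names."""
    counts = Counter(names)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    leading_slash = sorted(set(name for name in names if name.startswith("/")))
    return duplicates, leading_slash


def check_docx(path):
    """Open the file at path as a zip archive and check its member names."""
    with zipfile.ZipFile(path) as archive:
        return find_name_problems(archive.namelist())
```

Calling `check_docx()` on a generated file (any path of your own) returns two lists; both should be empty for a file Word is expected to open.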
85+
86+
### Common XML issues
87+
88+
- Ensure the _rels, docProps, word, etc directories are in the top level of your zip file.
89+
- Check your namespaces - on both the tags, and the attributes
90+
- Check capitalization of tag names
91+
- Ensure you're not missing any attributes
92+
- If images or other embedded content is shown with a large red X, your relationships file is missing data.
93+
94+
#### One common debugging technique we've used before
95+
96+
- Re-saving the document in Word will produce a fixed version of the file
97+
- Unzip the fixed file and grab the serialized XML out of it
98+
- Use etree.fromstring() to turn it into an element, and include that in your code.
99+
- Check that a correct file is generated
100+
- Remove an element from your string-created etree (including both opening and closing tags)
101+
- Use element.append(makeelement()) to add that element to your tree
102+
- Open the doc in Word and see if it still works
103+
- Repeat the last three steps until you discover which element is causing the problem
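
One round of the steps above might look roughly like this. The sketch uses the standard library's `xml.etree.ElementTree` as a stand-in for LXML's `etree`; the `KNOWN_GOOD` string, the `build_run` helper and the `signature` comparison are all assumptions made for illustration. In practice the real check is opening the generated file in Word, not comparing trees.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Stand-in for XML copied out of a document that Word has re-saved.
KNOWN_GOOD = '<w:p xmlns:w="%s"><w:r><w:t>Hello</w:t></w:r></w:p>' % W


def build_run(text):
    """Hypothetical code-built replacement for the <w:r> element."""
    run = ET.Element("{%s}r" % W)
    ET.SubElement(run, "{%s}t" % W).text = text
    return run


def signature(element):
    """List (tag, text) pairs for an element and all of its descendants."""
    return [(el.tag, el.text) for el in element.iter()]


paragraph = ET.fromstring(KNOWN_GOOD)

# Swap one element from the known-good tree for its code-built equivalent.
original_run = paragraph.find("{%s}r" % W)
paragraph.remove(original_run)
paragraph.append(build_run("Hello"))

# If the rebuilt tree matches the known-good one, this element is not the culprit.
same = signature(paragraph) == signature(ET.fromstring(KNOWN_GOOD))
print(same)
```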

MASHUP.markdown

Lines changed: 0 additions & 12 deletions
This file was deleted.

SERVING_SUGGESTIONS.markdown

Lines changed: 8 additions & 99 deletions
@@ -1,103 +1,12 @@
1-
Adding Features
2-
===============
1+
Serving Suggestions
2+
===================
33

4-
# Recommended reading
4+
# Mashing docx with other modules
55

6-
- The [LXML tutorial](http://codespeak.net/lxml/tutorial.html) covers the basics of XML etrees, which we create, append and insert to make XML documents. LXML also provides XPath, which we use to specify locations in the document.
7-
- The [OpenXML WordML specs and videos](http://openxmldeveloper.org) (if you're stuck). [The OpenXML ECMA spec in particular](http://www.ecma-international.org/publications/files/ECMA-ST/Office%20Open%20XML%201st%20edition%20Part%204%20(DOCX).zip) is the main file you care about.
6+
This is a list of interesting things you could do with Python docx when mashed up with other modules.
87

9-
- Learning about [XML namespaces](http://www.w3schools.com/XML/xml_namespaces.asp)
10-
- The [Namespaces section of Dive into Python](http://diveintopython3.org/xml.html)
11-
- Microsoft's [introduction to the Office (2007) Open XML File Formats](http://msdn.microsoft.com/en-us/library/aa338205.aspx)
8+
- [LinkedIn Python API](http://code.google.com/p/python-linkedin/) - Auto-build a Word doc whenever some old recruiting dude asks for one.
9+
- [Python Natural Language Toolkit](http://www.nltk.org/) - can analyse text and extract meaning.
10+
- [Lamson](http://lamsonproject.org/) - transparently parse or modify email attachments.
1211

13-
# How can I contribute?
14-
15-
Fork the project on GitHub, then send the main project a [pull request](http://github.com/guides/pull-requests). The project will then accept your pull (in most cases), which will show your changes as part of the changelog for the main project, along with your name and picture.
16-
17-
# A note about namespaces and LXML
18-
19-
LXML doesn't use namespace prefixes. It just uses the actual namespaces, and wants you to set a namespace on each tag. For example, rather than making an element with the 'w' namespace prefix, you'd make an element with the '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}' prefix.
20-
21-
To make this easier:
22-
23-
- The most common namespace, '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}' (prefix 'w') is automatically added by makeelement()
24-
- You can specify other namespaces with 'nsprefix', which maps the prefixes Word files use to the actual namespaces, eg:
25-
26-
<pre>makeelement('coreProperties',nsprefix='cp')</pre>
27-
28-
will generate:
29-
30-
<ns0:coreProperties xmlns:ns0="http://schemas.openxmlformats.org/package/2006/metadata/core-properties">
31-
32-
which is the same as what Word generates:
33-
34-
<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties">
35-
36-
The namespace prefixes are different, but that's irrelevant as the namespaces themselves are the same.
37-
38-
There's also a cool side effect - you can ignore setting 'xmlns' attributes that aren't used directly in the current element, since there's no need. Eg, you can make the equivalent of this from a Word file:
39-
40-
<cp:coreProperties
41-
xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
42-
xmlns:dc="http://purl.org/dc/elements/1.1/"
43-
xmlns:dcterms="http://purl.org/dc/terms/"
44-
xmlns:dcmitype="http://purl.org/dc/dcmitype/"
45-
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
46-
</cp:coreProperties>
47-
48-
With the following code:
49-
50-
docprops = makeelement('coreProperties',nsprefix='cp')
51-
52-
We only need to specify the 'cp' prefix because that's what this element uses. The other 'xmlns' attributes are used to specify the prefixes for child elements. We don't need to specify them here because each child element will have its namespace specified when we make that child.
53-
54-
# Coding Style
55-
56-
Basically just look at what's there. But if you need something more specific:
57-
58-
- Functional - every function should take some inputs, return something, and not use any globals.
59-
- [Google Python Style Guide style](http://code.google.com/p/soc/wiki/PythonStyleGuide)
60-
61-
# Unit Testing
62-
63-
After adding code, open **tests/test_docx.py** and add a test that calls your function and checks its output.
64-
65-
- Use **easy_install** to fetch the **nose** and **coverage** modules
66-
- Run
67-
68-
<pre>nosetests --with-coverage</pre>
69-
70-
to run all the tests. They should all pass.
71-
72-
# Tips
73-
74-
## If Word complains about files:
75-
76-
First, determine whether Word can recover the files:
77-
- If Word cannot recover the file, you most likely have a problem with your zip file
78-
- If Word can recover the file, you most likely have a problem with your XML
79-
80-
### Common Zipfile issues
81-
82-
- Ensure the same file isn't included twice in your zip archive. Zip supports this, Word doesn't.
83-
- Ensure that all media files have an entry for their file type in [Content_Types].xml
84-
- Ensure that files in the zip file have leading '/'s removed.
85-
86-
### Common XML issues
87-
88-
- Ensure the _rels, docProps, word, etc directories are in the top level of your zip file.
89-
- Check your namespaces - on both the tags, and the attributes
90-
- Check capitalization of tag names
91-
- Ensure you're not missing any attributes
92-
- If images or other embedded content is shown with a large red X, your relationships file is missing data.
93-
94-
#### One common debugging technique we've used before
95-
96-
- Re-saving the document in Word will produce a fixed version of the file
97-
- Unzip the fixed file and grab the serialized XML out of it
98-
- Use etree.fromstring() to turn it into an element, and include that in your code.
99-
- Check that a correct file is generated
100-
- Remove an element from your string-created etree (including both opening and closing tags)
101-
- Use element.append(makeelement()) to add that element to your tree
102-
- Open the doc in Word and see if it still works
103-
- Repeat the last three steps until you discover which element is causing the problem
12+
Any other ideas? Doing something cool you want to tell the world about? [email protected]
