how-to-open-a-file.rst

Opening Files

Supported File Types

|PyMuPDF| can open files other than just |PDF|.

The following file types are supported:

How to Open a File

To open a file, do the following:

doc = pymupdf.open("a.pdf")

Note

The above creates a :ref:`Document`. The instruction doc = pymupdf.Document("a.pdf") does exactly the same: open is just a convenient alias, and its full API is documented in the :ref:`Document` chapter.

If you have a document with a wrong file extension for its type, you can still correctly open it.

Assume that "some.file" is actually an XPS. Open it like so:

doc = pymupdf.open("some.file", filetype="xps")

Note

|PyMuPDF| itself does not try to determine the file type from the file contents. You are responsible for supplying the file type information in some way -- either implicitly, via the file extension, or explicitly as shown with the filetype parameter. There are pure Python packages like filetype that help you do this. Also consult the :ref:`Document` chapter for a full description.

If |PyMuPDF| encounters a file with an unknown or missing extension, it will try to open it as a |PDF|, so no additional precautions are needed in these cases. Similarly, for memory documents, you can just specify doc = pymupdf.open(stream=mem_area) to open the data as a |PDF| document.

If you attempt to open an unsupported file, |PyMuPDF| raises a file data error.


Opening Remote Files

For remote files on a server (i.e. non-local files), you will need to stream the file data to |PyMuPDF|.

For example, use the requests library as follows:

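import pymupdf
import requests

# Download the whole file, then hand the bytes to PyMuPDF.
r = requests.get('https://mupdf.com/docs/mupdf_explored.pdf')
r.raise_for_status()  # fail early on an HTTP error instead of parsing an error page
data = r.content
doc = pymupdf.Document(stream=data)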

Opening Files from Cloud Services

For further examples that deal with files held on typical cloud services, please see these Cloud Interactions code snippets.


Opening Django Files

Django implements a File Storage API to store files. The default is the FileSystemStorage, but the django-storages library provides a number of other storage backends.

You can open the file, read its contents into memory, then pass the contents to |PyMuPDF| as a stream.

import pymupdf
from django.core.files.storage import default_storage

from .models import MyModel

obj = MyModel.objects.get(id=1)
with default_storage.open(obj.file.name) as f:
    data = f.read()

doc = pymupdf.Document(stream=data)

Please note that if the file you open is large, you may run out of memory.

The File Storage API works well if you're using different storage backends in different environments. If you only use FileSystemStorage, you can skip the stream and pass the file's location on disk (for example obj.file.path) directly to |PyMuPDF|, as shown in an earlier example.


Opening Files as Text

|PyMuPDF| can open any plain text file as a document. To do this, pass "txt" as the filetype parameter of the pymupdf.open function.

doc = pymupdf.open("my_program.py", filetype="txt")

In this way you can open a variety of file types and use the typical non-PDF-specific features such as text searching, text extraction and page rendering. Once your txt content has been rendered, saving it as |PDF| or merging it with other |PDF| files is no problem.

Examples

Opening a C# file

doc = pymupdf.open("MyClass.cs", filetype="txt")

Opening an XML file

doc = pymupdf.open("my_data.xml", filetype="txt")

Opening a JSON file

doc = pymupdf.open("more_of_my_data.json", filetype="txt")

And so on!

As you can imagine, many text-based file formats can be opened and interpreted by |PyMuPDF| very simply. This makes data analysis and extraction possible for a wide range of files that were previously out of reach.