Welcome to Pacifica Python Uploader’s documentation!¶
Pacifica Python Uploader, pacifica-uploader, is a Python programming language library for managing, serializing and transporting (over a network) archives of files (referred to as “bundles”), managing both the data and the metadata of the bundle, and interacting with Pacifica Ingest and Pacifica Policy servers.
Installation¶
The Pacifica software is available through PyPI, so the instructions below create a virtual environment and install the package into it. Please keep in mind compatibility with the Pacifica Core services.
Installation in Virtual Environment¶
These installation instructions are intended to work on Windows, Linux, and Mac platforms. Please keep that in mind when following the instructions.
Please install the appropriate tested version of Python for maximum chance of success.
Linux and Mac Installation¶
mkdir ~/.virtualenvs
python -m virtualenv ~/.virtualenvs/pacifica
. ~/.virtualenvs/pacifica/bin/activate
pip install pacifica-uploader
Windows Installation¶
This is done using PowerShell. Please do not use the Command Prompt (batch commands).
mkdir "$Env:LOCALAPPDATA\virtualenvs"
python.exe -m virtualenv "$Env:LOCALAPPDATA\virtualenvs\pacifica"
& "$Env:LOCALAPPDATA\virtualenvs\pacifica\Scripts\activate.ps1"
pip install pacifica-uploader
Uploader Metadata Configuration¶
The uploader configuration begins with an array of metadata objects. The attributes of each object, and how an uploader should manipulate them, are documented below. Many of the attributes define how a query should be sent to the policy server and how the results should be presented to the user so that they can make a choice.
Example Metadata Configuration Snippet:
{
  "destinationTable": "Transactions.submitter",
  "displayFormat": "{_id} - {first_name} {last_name}",
  "displayTitle": "Currently Logged On",
  "displayType": "logged_on",
  "metaID": "logon",
  "queryDependency": {},
  "queryFields": [
    "first_name",
    "last_name",
    "_id"
  ],
  "sourceTable": "users",
  "value": "",
  "valueField": "_id"
}
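Because the configuration is plain JSON, it can be sanity-checked before it is handed to an uploader. The following is a minimal sketch using only the standard library. The helper name `check_config` and the two checks it performs (unique `metaID`, `TABLE.COLUMN` shape of `destinationTable`) are assumptions made for this illustration and are not part of pacifica-uploader.

```python
import json
from collections import Counter

# Hypothetical one-entry configuration, modelled on the snippet above.
CONFIG_JSON = """
[
  {"metaID": "logon", "sourceTable": "users", "valueField": "_id",
   "destinationTable": "Transactions.submitter", "value": "",
   "displayFormat": "{_id} - {first_name} {last_name}",
   "displayTitle": "Currently Logged On", "displayType": "logged_on",
   "queryDependency": {}, "queryFields": ["first_name", "last_name", "_id"]}
]
"""


def check_config(entries):
    """Return a list of human-readable problems found in the entries."""
    problems = []
    # metaID is documented as unique across the whole configuration.
    counts = Counter(entry["metaID"] for entry in entries)
    for meta_id, count in counts.items():
        if count > 1:
            problems.append("duplicate metaID: {}".format(meta_id))
    # destinationTable is documented as a TABLE.COLUMN string.
    for entry in entries:
        table, sep, column = entry["destinationTable"].partition(".")
        if not (table and sep and column):
            problems.append("bad destinationTable in {}".format(entry["metaID"]))
    return problems


config = json.loads(CONFIG_JSON)
print(check_config(config))
```

An empty list means neither check found a problem; an uploader could refuse to start otherwise.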
- Destination Table - destinationTable
The destination table and column for the value to be put into. The value is a string of the format TABLE.COLUMN.
- Display Format - displayFormat
The formatted string to show the user an entry for data from the sourceTable. This is uploader independent and uses string formatting specific to Python (in this implementation) for rendering the string.
- Display Title - displayTitle
The title for the resulting data returned from the query.
- Display Type - displayType
This is for the uploader to choose the values for. It may represent a drop-down select list, a set of radio buttons, or whatever else the uploader would like to present to the user.
- Metadata ID - metaID
This is the unique ID for the metadata in the system. It should be a string that is unique among all metadata objects in the entire configuration.
- Query Dependencies - queryDependency
This is a hash containing the dependencies for the query and where to find the values in the current metadata configuration. The hash is a column to metaID mapping. These dependencies are passed as where arguments to the policy query.
- Query Fields - queryFields
This is a list of columns from the source table to pull in as part of the query. These will be given to the displayFormat string to render the entry for users to pick.
- Source Table - sourceTable
The source table from which the query will request data.
- Result Value - value
The value of the valueField column from the sourceTable to be put into the destinationTable for the upload.
- Value Field - valueField
The value field to be put into the table and column defined by destinationTable.
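To make the interplay of these attributes concrete, here is a small sketch that renders choices with displayFormat, records a selection through valueField, and resolves a queryDependency mapping into where arguments. It works on plain dictionaries rather than on the library's classes. The helper names, the sample rows, and the second `project` entry (including its `user_id` column) are all hypothetical and exist only for this illustration.

```python
# Hypothetical configuration entry and query results; in a real uploader the
# rows would come from the Policy server rather than being hard-coded.
logon = {
    "metaID": "logon",
    "displayFormat": "{_id} - {first_name} {last_name}",
    "queryFields": ["first_name", "last_name", "_id"],
    "valueField": "_id",
    "queryDependency": {},
    "value": "",
}
# Hypothetical second entry whose query depends on the logon entry.
project = {
    "metaID": "project",
    "queryDependency": {"user_id": "logon"},
    "value": "",
}
rows = [
    {"_id": 10, "first_name": "Ada", "last_name": "Lovelace"},
    {"_id": 11, "first_name": "Alan", "last_name": "Turing"},
]


def render_choices(entry, rows):
    """Render one display string per row using the displayFormat template."""
    return [entry["displayFormat"].format(**row) for row in rows]


def select(entry, row):
    """Record the user's choice: copy the valueField column into value."""
    entry["value"] = row[entry["valueField"]]
    return entry["value"]


def where_args(entry, config_by_id):
    """Resolve queryDependency (column -> metaID) into column -> value."""
    return {
        column: config_by_id[meta_id]["value"]
        for column, meta_id in entry["queryDependency"].items()
    }


choices = render_choices(logon, rows)
select(logon, rows[1])
args = where_args(project, {"logon": logon})
```

Here `choices` holds the strings a user would pick from, and `args` is the set of where arguments that a dependent query would carry once the logon entry has a value.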
Uploader Expectations and Application Flows¶
This section describes how an end-user of Pacifica Python Uploader is expected to interact with the library's modules, classes and methods and, by extension, with Pacifica Ingest and Pacifica Policy servers.
Keywords for the API
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
Uploader Program Flow¶
1. The uploader program MUST construct a new instance of the pacifica.uploader.metadata.MetaUpdate class. The new instance of the pacifica.uploader.metadata.MetaUpdate class MAY be associated with zero or more instances of the pacifica.uploader.metadata.MetaObj class. The pacifica.uploader.metadata.MetaObj.value field MAY be None. The new instance of the pacifica.uploader.metadata.MetaUpdate class MUST NOT be associated with any instances of the pacifica.uploader.metadata.FileObj class.
2. To determine completeness, the new instance of the pacifica.uploader.metadata.MetaUpdate class SHOULD be validated using the pacifica.uploader.metadata.MetaData.is_valid() method (inherited by the pacifica.uploader.metadata.MetaUpdate sub-class). Then, the uploader program MUST call the pacifica.uploader.metadata.PolicyQuery.PolicyQuery.valid_metadata() method. The new instance of the pacifica.uploader.metadata.MetaUpdate class MUST be valid prior to bundling.
3. The uploader program MUST dereference the pacifica.uploader.metadata.MetaObj.displayType field to determine the mode of selection for the pacifica.uploader.metadata.MetaObj.value field. The value of the pacifica.uploader.metadata.MetaObj.displayType field is uploader-program-specific, i.e., the value MUST be defined by the uploader program.
4. The uploader program MUST assign a non-None value to each pacifica.uploader.metadata.MetaData.query_results field by calling the pacifica.uploader.metadata.MetaUpdate.query_results() method. The pacifica.uploader.metadata.MetaData.query_results field is a list.
5. The value of the pacifica.uploader.metadata.MetaData.query_results field MUST be rendered according to the uploader-program-specific definition that is interpreted from the value of the pacifica.uploader.metadata.MetaObj.displayFormat field, e.g., in the Python programming language, by calling the str.format method or by leveraging a template engine, such as Cheetah or Jinja2.
6. The uploader program MAY call the pacifica.uploader.metadata.MetaUpdate.query_results() method for instances of the pacifica.uploader.metadata.MetaObj class whose value field is non-None.
7. The uploader program MUST handle all instances of the pacifica.uploader.metadata.MetaUpdate class, regardless of validity, i.e., the uploader program MUST NOT reject an instance of the pacifica.uploader.metadata.MetaUpdate class under any circumstances, e.g., if there are unsatisfied dependencies between instances of the pacifica.uploader.metadata.MetaData class.
8. When the uploader program is ready for a given pacifica.uploader.metadata.MetaObj.value field to be selected, the uploader program MUST assign to the pacifica.uploader.metadata.MetaObj.value field the value of the pacifica.uploader.metadata.MetaObj.valueField field, and then call the pacifica.uploader.metadata.MetaObj.update_parents() method. The effect of this operation is to update the pacifica.uploader.metadata.MetaObj.value fields of associated and dependent instances of the pacifica.uploader.metadata.MetaObj class. After modification, the new state of the instance of the pacifica.uploader.metadata.MetaUpdate.MetaUpdate class SHOULD be displayed to the end-user, as previously discussed.
9. The uploader program MUST verify that pacifica.uploader.metadata.MetaUpdate.MetaUpdate.is_valid() == True. If the instance of the pacifica.uploader.metadata.MetaUpdate.MetaUpdate class is not valid, then the uploader program MUST repeat the instructions in paragraph 8.
10. The uploader program MUST call the pacifica.uploader.metadata.PolicyQuery.PolicyQueryData.valid_metadata() method to validate the instance of the pacifica.uploader.metadata.MetaUpdate.MetaUpdate class prior to upload. This prevents the uploader program from uploading metadata that is invalid with respect to the policy of the Pacifica Ingest server.
11. When the uploader program is ready to bundle the data, the uploader program MUST construct a list of objects, representing the fields of the corresponding instance of the tarfile.TarInfo class. Each object MUST export a fileobj field whose value implements the file protocol, i.e., exports a read() method.
12. The uploader program MUST construct a new instance of the pacifica.uploader.bundler.Bundler class using the instances of the pacifica.uploader.metadata.MetaUpdate.MetaUpdate and tarfile.TarInfo classes, as previously stated in paragraph 11. Then, the uploader program MUST construct a file-like object that can be written to in binary mode, and then call the pacifica.uploader.bundler.Bundler.stream() method.
13. The uploader program MUST construct a new instance of the pacifica.uploader.Uploader.Uploader class. Then, the uploader program MUST construct a file-like object that can be read in binary mode, and then call the pacifica.uploader.bundler.Bundler.upload() method.
14. Finally, the uploader program MUST verify the result of the ingest by calling the pacifica.uploader.Uploader.Uploader.getstate() method. If an ingest-related error occurs, then the uploader program MAY repeat the ingest operation.
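The bundling step above asks for a list of objects that carry TarInfo-style fields plus a readable fileobj. A minimal sketch of building such a list with only the standard library might look like the following. The helper name `build_file_data` is hypothetical, and the dictionary keys (name, fileobj, size, mtime) follow the example given in the Bundler constructor documentation; the sketch does not call pacifica-uploader itself.

```python
import os
import tempfile


def build_file_data(paths, root):
    """Build a list of dicts with name, fileobj, size and mtime keys.

    The caller is responsible for closing each 'fileobj' once the bundle
    has been streamed.
    """
    file_data = []
    for path in paths:
        info = os.stat(path)
        file_data.append({
            # Archive path relative to root, using forward slashes.
            "name": os.path.relpath(path, root).replace(os.sep, "/"),
            # Opened in binary mode so that read() returns bytes.
            "fileobj": open(path, "rb"),
            "size": info.st_size,
            "mtime": info.st_mtime,
        })
    return file_data


# Demonstration with a throwaway file.
workdir = tempfile.mkdtemp()
sample = os.path.join(workdir, "data.txt")
with open(sample, "wb") as handle:
    handle.write(b"hello")

file_data = build_file_data([sample], workdir)
first = file_data[0]
payload = first["fileobj"].read()
first["fileobj"].close()
```

In a real uploader the file objects would stay open until the bundle has been streamed, and the list would be passed as the file_data argument of the Bundler constructor.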
Uploader Python Module¶
Bundler Python Module¶
Main Bundler module containing classes and methods to handle bundling.
- class pacifica.uploader.bundler.bundler.Bundler(md_obj, file_data, **kwargs)
  Class to handle bundling of files to stream a tarfile.
  - __init__(md_obj, file_data, **kwargs)
    Constructor of the bundler class.
    Store the MetaData object md_obj and the file list file_data used to create the bundle. The file_data object should be a list of hashes whose entries are fed to TarInfo objects, except for fileobj, which is passed to the addfile method.
    Note: The arcname keyword argument MUST be provided when calling the tarfile.TarFile.gettarinfo() method.
    Example file_data:
    [ { 'name': 'archive file path', 'fileobj': 'open file object for read', 'size': 'size of the file', 'mtime': 'modify time of the file' } ]
  - _setup_notify_thread(callback, sleeptime=5)
    Set up a notification thread calling callback with percent complete.
  - file_data = None
  - md_obj = None
  - stream(fileobj, callback=None, sleeptime=5)
    Stream the bundle to the fileobj.
    This method is a blocking I/O operation. The fileobj should be an open file-like object opened in 'wb' mode. An asynchronous callback method MAY be provided via the optional callback keyword argument. Periodically, the callback method is provided with the current percentage of completion.
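The constructor notes say that each file_data hash is fed to a TarInfo object while its fileobj goes to the addfile method. The sketch below shows that pattern with the standard tarfile module only. It is not the Bundler implementation: the function name `stream_tar` is hypothetical, and the metadata object and progress callback that the real class handles are left out.

```python
import io
import tarfile


def stream_tar(file_data, out):
    """Write every file_data entry to `out` as a tar stream in one pass."""
    # Mode "w|" writes a non-seekable stream, so `out` only needs write().
    with tarfile.open(fileobj=out, mode="w|") as tar:
        for item in file_data:
            info = tarfile.TarInfo(name=item["name"])
            info.size = item["size"]
            info.mtime = item["mtime"]
            # The open file object is passed to addfile, not to TarInfo.
            tar.addfile(info, item["fileobj"])


file_data = [{
    "name": "data/hello.txt",
    "fileobj": io.BytesIO(b"hello"),
    "size": 5,
    "mtime": 0,
}]
buffer = io.BytesIO()
stream_tar(file_data, buffer)

# Read the archive back to confirm what was written.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    names = tar.getnames()
    content = tar.extractfile("data/hello.txt").read()
```

Writing in stream mode is what allows the output to be a socket or pipe that is opened once and never rewound.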
This is the bundler library.
This module exports classes and methods for constructing and streaming bundles of files to a designated file descriptor. The file descriptor is opened once, and the stream is generated by a single pass over the specified files.
- class pacifica.uploader.bundler.Bundler(md_obj, file_data, **kwargs)
  Class to handle bundling of files to stream a tarfile.
  - __init__(md_obj, file_data, **kwargs)
    Constructor of the bundler class.
    Store the MetaData object md_obj and the file list file_data used to create the bundle. The file_data object should be a list of hashes whose entries are fed to TarInfo objects, except for fileobj, which is passed to the addfile method.
    Note: The arcname keyword argument MUST be provided when calling the tarfile.TarFile.gettarinfo() method.
    Example file_data:
    [ { 'name': 'archive file path', 'fileobj': 'open file object for read', 'size': 'size of the file', 'mtime': 'modify time of the file' } ]
  - _setup_notify_thread(callback, sleeptime=5)
    Set up a notification thread calling callback with percent complete.
  - file_data = None
  - md_obj = None
  - stream(fileobj, callback=None, sleeptime=5)
    Stream the bundle to the fileobj.
    This method is a blocking I/O operation. The fileobj should be an open file-like object opened in 'wb' mode. An asynchronous callback method MAY be provided via the optional callback keyword argument. Periodically, the callback method is provided with the current percentage of completion.
Common Python Module¶
Common uploader functionality.
Metadata Python Module¶
MetaData class to handle input and output of metadata format.
- class pacifica.uploader.metadata.metadata.FileObj
  FileObj class for holding file metadata.
  Instances of this class represent individual files, including both the data and metadata for the file. During a file upload, instances of this class are automatically associated with new instances of the pacifica.uploader.metadata.MetaData class.
  The named fields of this class are identical to those of the pacifica.metadata.orm.Files class, provided by the Pacifica Metadata library.
- class pacifica.uploader.metadata.metadata.MetaData(*args, **kwargs)
  Class to hold a list of MetaObj and FileObj objects.
  This class is a sub-class of list that implements the index protocol (__getitem__, __setitem__ and __delitem__) as a proxy to the indices of the value of the metaID field of the associated instance of the pacifica.uploader.metadata.MetaObj class.
  Instances of this class are upper-level objects that provide the metadata for interacting with the designated Pacifica Ingest server.
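The idea of a list whose index protocol also accepts a metaID can be illustrated with a short, self-contained sketch. This is not the MetaData class itself: `KeyedList` and the two-field `Item` tuple are hypothetical stand-ins, and the lookup strategy (a linear scan) is an assumption chosen for brevity.

```python
from collections import namedtuple

# Hypothetical stand-in for a metadata object, with only two fields.
Item = namedtuple("Item", ["metaID", "value"])


class KeyedList(list):
    """A list whose items can also be addressed by their metaID string."""

    def _index(self, key):
        # Integers and slices keep ordinary list behaviour.
        if not isinstance(key, str):
            return key
        for position, item in enumerate(self):
            if getattr(item, "metaID", None) == key:
                return position
        raise IndexError("no item with metaID {!r}".format(key))

    def __getitem__(self, key):
        return super().__getitem__(self._index(key))

    def __setitem__(self, key, value):
        super().__setitem__(self._index(key), value)

    def __delitem__(self, key):
        super().__delitem__(self._index(key))


items = KeyedList([Item("logon", 10), Item("project", "1234a")])
# Replace the logon entry by metaID, then drop the project entry by metaID.
items["logon"] = items["logon"]._replace(value=11)
del items["project"]
```

Positional access such as `items[0]` continues to work, so code that treats the object as an ordinary list is unaffected.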
- class pacifica.uploader.metadata.metadata.MetaDataDecoder(*, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, strict=True, object_pairs_hook=None)
  Class to decode a json string into a MetaData object.
- class pacifica.uploader.metadata.metadata.MetaDataEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
  Class to encode a MetaData object into json.
- class pacifica.uploader.metadata.metadata.MetaObj
  MetaObj class holding a specific metadata element.
  Instances of this class represent units of metadata whose representation is disjoint to a file, i.e., units of metadata that describe, but are not stored as part of, a file.
- pacifica.uploader.metadata.metadata._FileObj
- pacifica.uploader.metadata.metadata._MetaObj
- pacifica.uploader.metadata.metadata.file_or_meta_obj(**json_data)
  Determine if this is a File or Meta object and return result.
- pacifica.uploader.metadata.metadata.metadata_decode(json_str)
  Decode the json string into MetaData object.
  This method deserializes the given JSON source, json_str, and then returns a new instance of the pacifica.uploader.metadata.MetaData class.
  The new instance is automatically associated with new instances of the pacifica.uploader.metadata.MetaObj and pacifica.uploader.metadata.FileObj classes.
- pacifica.uploader.metadata.metadata.metadata_encode(md_obj)
  Encode the MetaData object into a json string.
  This method encodes the given instance of the pacifica.uploader.metadata.MetaData class, md_obj, as a JSON object, and then returns its JSON serialization.
  Associated instances of the pacifica.uploader.metadata.MetaObj and pacifica.uploader.metadata.FileObj classes are automatically included in the JSON object and the resulting JSON serialization.
Meta Update Python Module¶
Module used to update MetaData objects.
This module exports classes and methods for constructing and executing the strategy for modifying the values, including the parents and children, of instances of the pacifica.uploader.metadata.MetaData class.
- class pacifica.uploader.metadata.metaupdate.MetaUpdate(user, *args, **kwargs)
  Class to update the MetaData object.
  This class is a sub-class of the pacifica.uploader.metadata.MetaData class that is specialized to issue and handle queries to Pacifica Policy servers.
  - __init__(user, *args, **kwargs)
    Pull the user from the arguments so we can use that for policy queries.
MJSON Python Module¶
Encode and decode objects into json.
This module exports generators for encoding and decoding instances of named tuple classes (created with collections.namedtuple) using the JSON data format.
- pacifica.uploader.metadata.mjson.generate_namedtuple_decoder(cls)
  Return a namedtuple decoder for the class cls.
  Generate a sub-class of json.JSONDecoder, which decodes a JSON object into an instance of cls.
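The general technique of generating a json.JSONDecoder sub-class for a given named tuple class can be sketched as follows. This is an illustration under assumptions, not the library's code: the function name `make_namedtuple_decoder` and the `Point` class are hypothetical, and the sketch only handles a flat JSON object whose keys match the tuple's field names.

```python
import json
from collections import namedtuple


def make_namedtuple_decoder(cls):
    """Return a json.JSONDecoder sub-class that decodes objects into cls."""

    class NamedTupleDecoder(json.JSONDecoder):
        def decode(self, s):
            # Decode to a plain dict first, then build the named tuple.
            data = super().decode(s)
            return cls(**data)

    return NamedTupleDecoder


# Hypothetical named tuple used only for this demonstration.
Point = namedtuple("Point", ["x", "y"])
PointDecoder = make_namedtuple_decoder(Point)

point = json.loads('{"x": 1, "y": 2}', cls=PointDecoder)
# Encoding goes through _asdict(), so a round trip restores the same tuple.
round_trip = json.loads(json.dumps(point._asdict()), cls=PointDecoder)
```

Passing the generated class through the `cls` argument of json.loads keeps call sites identical to ordinary JSON decoding.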
Policy Query Python Module¶
This is the module for querying the Policy service.
This module exports classes and methods for interacting with the designated Pacifica Policy server.
- class pacifica.uploader.metadata.policyquery.PolicyQuery(user, *args, **kwargs)
  Handle querying the policy server.
  Instances of this class represent queries to the designated Pacifica Policy server.
  - __init__(user, *args, **kwargs)
    Set the policy server url and define any data for the query.
    The HTTP end-point for the policy server is automatically pulled either from the system environment or from the keyword arguments, **kwargs.
  - _addr = None
  - _auth = None
  - _ingest_path = None
  - _ingest_url = None
  - _port = None
  - _proto = None
  - _uploader_path = None
  - _uploader_url = None
  - get_results()
    Get results from the Policy server for the query.
    This method returns a JSON object that is the result set for a query to the Pacifica Policy server, i.e., the entities that match the criteria represented by the associated instance of the pacifica.uploader.metadata.PolicyQuery.PolicyQueryData class.
  - pq_data = None
  - user_id = None
  - valid_metadata(md_obj)
    Check the metadata object against the ingest API.
    This method validates the given instance of pacifica.uploader.metadata.MetaData, md_obj, against the Pacifica Policy server endpoint.
- class pacifica.uploader.metadata.policyquery.PolicyQueryData
  Policy query data elements for policy query requests.
  This class is a named tuple class created with collections.namedtuple. This class is used directly against the Pacifica Uploader Policy endpoint.
- pacifica.uploader.metadata.policyquery._PolicyQueryData
  alias of pacifica.uploader.metadata.policyquery.PolicyQueryData
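The PolicyQuery constructor is described as taking its HTTP end-point from the system environment or from keyword arguments. One way such a lookup could be arranged is sketched below. Everything specific here is an assumption for illustration: the function name `resolve_url`, the environment variable names (built as PREFIX_NAME), the default values, and the precedence order (keyword arguments, then environment, then defaults) are not taken from the library.

```python
import os

# Hypothetical defaults; the real library's defaults may differ.
DEFAULTS = {
    "proto": "http",
    "addr": "127.0.0.1",
    "port": "8181",
    "path": "/uploader",
}


def resolve_url(prefix, defaults, environ=None, **kwargs):
    """Build proto://addr:port/path from kwargs, environment and defaults."""
    environ = os.environ if environ is None else environ
    parts = {}
    for name in ("proto", "addr", "port", "path"):
        # e.g. prefix "policy" and name "port" give "POLICY_PORT".
        env_name = "{}_{}".format(prefix, name).upper()
        # Assumed precedence: keyword argument, then environment, then default.
        parts[name] = kwargs.get(name, environ.get(env_name, defaults[name]))
    return "{proto}://{addr}:{port}{path}".format(**parts)


# An explicit environ mapping keeps the demonstration independent of the shell.
url = resolve_url("policy", DEFAULTS, environ={"POLICY_PORT": "9000"})
```

Accepting the environment as a parameter keeps the function easy to test without modifying os.environ.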
This is the metadata library.
The pacifica.uploader.metadata module exports classes and methods for manipulating and serializing the metadata for bundles of files.
Encoding and decoding to the JSON data format is supported for compatible objects (see pacifica.uploader.metadata.Json module for more information).
- class pacifica.uploader.metadata.MetaData(*args, **kwargs)
  Class to hold a list of MetaObj and FileObj objects.
  This class is a sub-class of list that implements the index protocol (__getitem__, __setitem__ and __delitem__) as a proxy to the indices of the value of the metaID field of the associated instance of the pacifica.uploader.metadata.MetaObj class.
  Instances of this class are upper-level objects that provide the metadata for interacting with the designated Pacifica Ingest server.
- class pacifica.uploader.metadata.MetaObj
  MetaObj class holding a specific metadata element.
  Instances of this class represent units of metadata whose representation is disjoint to a file, i.e., units of metadata that describe, but are not stored as part of, a file.
- class pacifica.uploader.metadata.FileObj
  FileObj class for holding file metadata.
  Instances of this class represent individual files, including both the data and metadata for the file. During a file upload, instances of this class are automatically associated with new instances of the pacifica.uploader.metadata.MetaData class.
  The named fields of this class are identical to those of the pacifica.metadata.orm.Files class, provided by the Pacifica Metadata library.
- class pacifica.uploader.metadata.MetaUpdate(user, *args, **kwargs)
  Class to update the MetaData object.
  This class is a sub-class of the pacifica.uploader.metadata.MetaData class that is specialized to issue and handle queries to Pacifica Policy servers.
  - __init__(user, *args, **kwargs)
    Pull the user from the arguments so we can use that for policy queries.
- pacifica.uploader.metadata.metadata_encode(md_obj)
  Encode the MetaData object into a json string.
  This method encodes the given instance of the pacifica.uploader.metadata.MetaData class, md_obj, as a JSON object, and then returns its JSON serialization.
  Associated instances of the pacifica.uploader.metadata.MetaObj and pacifica.uploader.metadata.FileObj classes are automatically included in the JSON object and the resulting JSON serialization.
- pacifica.uploader.metadata.metadata_decode(json_str)
  Decode the json string into MetaData object.
  This method deserializes the given JSON source, json_str, and then returns a new instance of the pacifica.uploader.metadata.MetaData class.
  The new instance is automatically associated with new instances of the pacifica.uploader.metadata.MetaObj and pacifica.uploader.metadata.FileObj classes.
Uploader Python Module¶
Uploader module to send the data to the ingest service.
This module exports classes and methods for interacting with Pacifica Ingest servers.
- class pacifica.uploader.uploader.Uploader(**kwargs)
  Uploader class to upload the bundle to an ingest server.
  This class exports methods that provide an API for connecting to and handling connections to Pacifica Ingest servers.
  - _addr = None
  - _auth = None
  - _port = None
  - _proto = None
  - _status_path = None
  - _status_url = None
  - _upload_path = None
  - _upload_url = None
  - getstate(job_id)
    Get the ingest state for a job.
    This method takes a job_id as input, and returns a JSON object, as defined by the Pacifica Ingest API for obtaining the status of the current job.
This is the uploader library.
This section gives an overview of the modules, classes and methods that are exported by the Pacifica Python Uploader library: PacificaUploader.
- class pacifica.uploader.Uploader(**kwargs)
  Uploader class to upload the bundle to an ingest server.
  This class exports methods that provide an API for connecting to and handling connections to Pacifica Ingest servers.
  - _addr = None
  - _auth = None
  - _port = None
  - _proto = None
  - _status_path = None
  - _status_url = None
  - _upload_path = None
  - _upload_url = None
  - getstate(job_id)
    Get the ingest state for a job.
    This method takes a job_id as input, and returns a JSON object, as defined by the Pacifica Ingest API for obtaining the status of the current job.