Using Apache HTTPClient 4.x for MultiPart uploads with Jersey 1.x Server

Plenty of articles on the web describe how to use the Jersey client with a Jersey 1.x server for multipart uploads. Using the Apache HTTP client instead, however, uncovers a bug in Jersey that causes a NullPointerException (https://java.net/jira/browse/JERSEY-1658). The server log shows:

SEVERE: The RuntimeException could not be mapped to a response, re-throwing to the HTTP container
java.lang.NullPointerException
    at com.sun.jersey.multipart.impl.MultiPartReaderClientSide.unquoteMediaTypeParameters(MultiPartReaderClientSide.java:227)
    at com.sun.jersey.multipart.impl.MultiPartReaderClientSide.readMultiPart(MultiPartReaderClientSide.java:154)
    at com.sun.jersey.multipart.impl.MultiPartReaderServerSide.readMultiPart(MultiPartReaderServerSide.java:80)
    at com.sun.jersey.multipart.impl.MultiPartReaderClientSide.readFrom(MultiPartReaderClientSide.java:144)
    at com.sun.jersey.multipart.impl.MultiPartReaderClientSide.readFrom(MultiPartReaderClientSide.java:82)
    at com.sun.jersey.spi.container.ContainerRequest.getEntity(ContainerRequest.java:488)
    at com.sun.jersey.server.impl.model.method.dispatch.EntityParamDispatchProvider$EntityInjectable.getValue(EntityParamDispatchProvider.java:123)
    at com.sun.jersey.server.impl.inject.InjectableValuesProvider.getInjectableValues(InjectableValuesProvider.java:46)
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$EntityParamInInvoker.getParams(AbstractResourceMethodDispatchProvider.java:153)
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:203)
    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)

Here’s the relevant piece of code from Jersey

protected static MediaType unquoteMediaTypeParameters(final MediaType mediaType, final String... parameters) {
    if (parameters == null || parameters.length == 0) {
        return mediaType;
    }

    final HashMap<String, String> unquotedParams = new HashMap<String, String>(mediaType.getParameters());

    for (final String parameterName : parameters) {
        // returns null when the parameter (here: boundary) is absent from the header
        String parameterValue = mediaType.getParameters().get(parameterName);

        // the NullPointerException is thrown here for a missing parameter
        if (parameterValue.startsWith("\"")) {
            parameterValue = parameterValue.substring(1, parameterValue.length() - 1);
            unquotedParams.put(parameterName, parameterValue);
        }
    }

    return new MediaType(mediaType.getType(), mediaType.getSubtype(), unquotedParams);
}

The error occurs because the Jersey server expects the boundary parameter to be set as part of the Content-Type header, and the Apache HTTP client was not setting it. This can be verified by comparing the request made by the Jersey client with the one made by the Apache client.

Jersey Client

Content-Type=multipart/form-data;boundary=Boundary-1234567890

Apache HTTP Client

Content-Type=multipart/form-data

And since the boundary parameter is missing, the parameter lookup returns null and the server ends up throwing an NPE.
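
The failure mode is easy to reproduce without Jersey. The sketch below is not Jersey code: it uses a plain Map as a stand-in for MediaType.getParameters(), and the class and method names are made up for illustration. It shows the unguarded lookup next to a null-tolerant variant.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the unquoting step; a plain Map replaces MediaType.getParameters().
final class BoundaryUnquoteSketch {

    // Mirrors the unguarded logic: fails when a requested parameter is absent.
    static Map<String, String> unquoteUnsafe(Map<String, String> params, String... names) {
        Map<String, String> unquoted = new HashMap<>(params);
        for (String name : names) {
            String value = params.get(name);      // null when the parameter was never sent
            if (value.startsWith("\"")) {          // NullPointerException for a missing parameter
                unquoted.put(name, value.substring(1, value.length() - 1));
            }
        }
        return unquoted;
    }

    // Null-tolerant variant: absent parameters are simply skipped.
    static Map<String, String> unquoteSafe(Map<String, String> params, String... names) {
        Map<String, String> unquoted = new HashMap<>(params);
        for (String name : names) {
            String value = params.get(name);
            if (value != null && value.length() >= 2
                    && value.startsWith("\"") && value.endsWith("\"")) {
                unquoted.put(name, value.substring(1, value.length() - 1));
            }
        }
        return unquoted;
    }

    public static void main(String[] args) {
        Map<String, String> noBoundary = new HashMap<>();   // Content-Type without a boundary
        System.out.println(unquoteSafe(noBoundary, "boundary"));
        try {
            unquoteUnsafe(noBoundary, "boundary");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException for the missing boundary");
        }
    }
}
```

A guard like this only avoids the crash; the server would still need the boundary to actually parse the body.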

SOLUTION

I was able to manually hack the boundary parameter into the Content-Type header of the request, making it available to the Jersey parser and thus avoiding the NPE. The catch with this fix is that the class MultipartFormEntity is package private, so the utility class described below needs to be created in the package org.apache.http.entity.mime

package org.apache.http.entity.mime;

import org.apache.commons.lang3.Validate;
import org.apache.http.HttpEntity;

public class MultiPartEntityUtil {
	
	public static String getBoundaryValue(HttpEntity entity) {
		Validate.notNull(entity);
		
		if( entity instanceof MultipartFormEntity ) {
			MultipartFormEntity formEntity = (MultipartFormEntity)entity;

			AbstractMultipartForm form =  formEntity.getMultipart();
			Validate.notNull(form);
			
			return form.getBoundary();
		}
		
		throw new IllegalArgumentException("Provided entity is of type: " + entity.getClass() + " instead of expected: MultipartFormEntity");
	}

}

With this utility class, we can simply set the Content-Type header as follows

MultipartEntityBuilder builder = MultipartEntityBuilder.create();
builder.setMode(HttpMultipartMode.BROWSER_COMPATIBLE);

for (File file : files) {
    builder.addBinaryBody(file.getName(), file, ContentType.DEFAULT_BINARY, file.getName());
}

HttpEntity entity = builder.build();
String boundary = MultiPartEntityUtil.getBoundaryValue(entity);

...

request.addHeader(HttpHeaders.CONTENT_TYPE, "multipart/form-data;boundary=" + boundary);

This hack makes sure that the Jersey server finds the boundary parameter it expects. You can now do multipart uploads with the Apache client against Jersey 1.x.
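
Before sending a request, it can be worth checking that the header value really carries a boundary. The helper below is a hypothetical, JDK-only sketch (the class and method names are mine, not part of HttpClient or Jersey), and it deliberately uses a simplified parse that assumes the boundary value contains no semicolon.

```java
import java.util.Locale;

// Hypothetical helper: extracts the boundary parameter from a Content-Type header value.
final class ContentTypeBoundaryCheck {

    // Returns the boundary value, or null when the header has no boundary parameter.
    static String boundaryOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] pieces = contentType.split(";");
        // pieces[0] is the media type itself; parameters start at index 1
        for (int i = 1; i < pieces.length; i++) {
            String piece = pieces[i].trim();
            int eq = piece.indexOf('=');
            if (eq <= 0) {
                continue;
            }
            String name = piece.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (name.equals("boundary")) {
                String value = piece.substring(eq + 1).trim();
                // strip optional surrounding quotes
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(boundaryOf("multipart/form-data;boundary=Boundary-1234567890"));
        System.out.println(boundaryOf("multipart/form-data"));
    }
}
```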

S3 Multipart uploads with InputStream

The AWS documentation provides an example of uploading a file using the S3 Multipart Upload feature.

In one of my projects, I had a system using InputStream to talk to S3. While upgrading that to use S3 Multipart Feature, I was happy to see that the UploadPartRequest takes an InputStream, which meant that I could easily create the request as follows

UploadPartRequest uploadRequest = new UploadPartRequest()
        .withUploadId(uploadId)
        .withBucketName(s3Bucket)
        .withKey(s3Key)
        .withInputStream(in)
        .withPartNumber(partNumber)
        .withPartSize(partSize)
        .withLastPart(lastPart);

The code compiled fine but, interestingly, it would not upload any object with more than one part. The AmazonS3Client contains the following in its uploadPart() method

 finally {
            if (inputStream != null) {
                try {inputStream.close();}
                catch (Exception e) {}
            }
        }

That is, the client closes the stream after every part. This is surprising behavior from the AWS SDK. A deeper look at how file-based uploads work in the SDK reveals the secret sauce

        InputStream inputStream = null;
        if (uploadPartRequest.getInputStream() != null) {
            inputStream = uploadPartRequest.getInputStream();
        } else if (uploadPartRequest.getFile() != null) {
            try {
                inputStream = new InputSubstream(new RepeatableFileInputStream(uploadPartRequest.getFile()),
                        uploadPartRequest.getFileOffset(), partSize, true);
            } catch (FileNotFoundException e) {
                throw new IllegalArgumentException("The specified file doesn't exist", e);
            }
        } else {
            throw new IllegalArgumentException("A File or InputStream must be specified when uploading part");
        }

That is, for file-based uploads, it creates an InputSubstream for each part to be uploaded and closes it once the part has been uploaded. To make this work with a provided InputStream, it is your responsibility to provide an InputStream that can be closed for each part.

My first hack was to make it so that the client could not close the stream. A very simple way of achieving this is

import java.io.BufferedInputStream;
import java.io.InputStream;

/**
 * The caller must explicitly close() the original stream
 */
public class NonCloseableInputStream extends BufferedInputStream {

    public NonCloseableInputStream(InputStream inputStream) {
        super(inputStream);
    }

    @Override
    public void close() {
        // do nothing, so that uploadPart() cannot close the shared stream
    }

}

By providing an InputStream wrapped in a NonCloseableInputStream, the uploadPart() call wouldn’t be able to close the stream, and the same stream could be passed to all the UploadPartRequests.
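
The effect of such a wrapper can be checked in isolation. In the sketch below, all names are hypothetical and nothing comes from the AWS SDK: consumeAndClose() merely imitates a consumer that reads a part and then closes whatever stream it was handed.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class CloseShieldDemo {

    // Records whether close() ever reached the underlying stream.
    static final class TrackingStream extends FilterInputStream {
        private boolean closed;

        TrackingStream(InputStream in) {
            super(in);
        }

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Swallows close(); the owner of the underlying stream closes it later.
    static final class CloseShield extends FilterInputStream {
        CloseShield(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // intentionally empty
        }
    }

    // Imitates a consumer that reads up to n bytes and always closes its input.
    static int consumeAndClose(InputStream in, int n) throws IOException {
        try {
            return in.read(new byte[n], 0, n);
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingStream source = new TrackingStream(new ByteArrayInputStream(new byte[8]));
        consumeAndClose(new CloseShield(source), 4);
        consumeAndClose(new CloseShield(source), 4);
        System.out.println("source closed after two parts: " + source.isClosed());
    }
}
```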

The code ran fine for a while; however, we saw a larger number of failed uploads than with the previous upload scheme. This was confusing, since the client was configured with a RetryPolicy to retry individual parts the same number of times. Scanning through the logs, I found the problem with the hack

private void resetRequestAfterError(Request request, Exception cause) throws AmazonClientException {
        if ( request.getContent() == null ) {
            return; // no reset needed
        }
        if ( ! request.getContent().markSupported() ) {
            throw new AmazonClientException("Encountered an exception and stream is not resettable", cause);
        }
        try {
            request.getContent().reset();
        } catch ( IOException e ) {
            // This exception comes from being unable to reset the input stream,
            // so throw the original, more meaningful exception
            throw new AmazonClientException(
                    "Encountered an exception and couldn't reset the stream to retry", cause);
        }
    }

The expectation that every upload part is provided with its own InputStream is built into the retry logic of the client. When an error occurred while uploading a part, the resetRequestAfterError() method would try to reset the stream to the beginning. Normally this would lead to silently corrupted uploads; however, since my stream couldn’t reset to the beginning, it failed with the error message “Encountered an exception and couldn’t reset the stream to retry”
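
The difference between the two kinds of streams shows up with plain JDK classes. The sketch below uses hypothetical names and no AWS code. A ByteArrayInputStream created over one part rewinds to the start of that part, whereas a BufferedInputStream that was never marked reports markSupported() as true and then throws on reset(). That would be consistent with the message above, assuming the shared stream was a BufferedInputStream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ResetBehaviourDemo {

    // Sentinel returned when the stream could not be rewound.
    static final int RESET_FAILED = -2;

    // Consumes some bytes, rewinds as a retry would, and returns the next byte read.
    static int firstByteAfterReset(InputStream in, int bytesConsumed) {
        try {
            for (int i = 0; i < bytesConsumed; i++) {
                in.read();
            }
            in.reset();
            return in.read();
        } catch (IOException e) {
            return RESET_FAILED;
        }
    }

    public static void main(String[] args) {
        byte[] data = {10, 20, 30, 40};

        // Per-part stream over the second half of the data: reset() returns to the part start.
        System.out.println(firstByteAfterReset(new ByteArrayInputStream(data, 2, 2), 1));

        // Shared buffered stream without a mark: reset() throws an IOException.
        InputStream shared = new BufferedInputStream(new ByteArrayInputStream(data));
        System.out.println(shared.markSupported());
        System.out.println(firstByteAfterReset(shared, 2) == RESET_FAILED);
    }
}
```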

What’s the workaround?

I ended up reading each part into a byte[] and then wrapping it in a ByteArrayInputStream for the UploadPartRequest. This increases the memory requirements of the app but works like a charm.

List<PartETag> partETags = new ArrayList<PartETag>();

long uploaded = 0;
int partNumber = 1; // declared outside the loop so the final part can use it

for ( ; partNumber < numParts; partNumber++ ) {
   // make sure you read all the data for the part, as InputStream.read() may return less data than asked for
   // (IOUtils.read is assumed here to be a helper that returns exactly the requested number of bytes)
   byte[] part = IOUtils.read(in, partSize);
   ByteArrayInputStream bais = new ByteArrayInputStream(part);

   // lastPart is false for every part uploaded inside the loop
   UploadPartRequest uploadRequest = createUploadPartRequest(uploadId, s3Bucket, s3Key, bais, partNumber, partSize, false);
   UploadPartResult result = getS3Client().uploadPart(uploadRequest);
   partETags.add(result.getPartETag());
   uploaded += partSize;
}

long remaining = size - uploaded;

// read the remaining data into the buffer
byte[] part = IOUtils.read(in, remaining);
ByteArrayInputStream bais = new ByteArrayInputStream(part);

// the final part carries only the remaining bytes and is flagged as the last part
UploadPartRequest uploadRequest = createUploadPartRequest(uploadId, s3Bucket, s3Key, bais, partNumber, remaining, true);
UploadPartResult result = getS3Client().uploadPart(uploadRequest);
partETags.add(result.getPartETag());
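
The read helper matters because a single InputStream.read() call may return fewer bytes than requested. A minimal JDK-only version could look like the sketch below. PartReader and readPart are hypothetical names, not the helper used above, and the method returns a shorter array if the stream ends early.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class PartReader {

    // Keeps reading until len bytes have arrived or the stream is exhausted.
    static byte[] readPart(InputStream in, int len) throws IOException {
        byte[] buffer = new byte[len];
        int offset = 0;
        while (offset < len) {
            int n = in.read(buffer, offset, len - offset);
            if (n < 0) {
                break; // end of stream before the part was full
            }
            offset += n;
        }
        // trim the buffer when fewer than len bytes were available
        return offset == len ? buffer : Arrays.copyOf(buffer, offset);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
        System.out.println(readPart(in, 3).length);
        System.out.println(readPart(in, 3).length);
    }
}
```

On Java 9 and later, InputStream.readNBytes(byte[], int, int) provides the same looping behaviour.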

If memory is a big concern, you should instead create a SlicedInputStream covering only the range of the part. Note that in this case a retry needs to reset to the start of the slice, which, depending on the underlying stream in your application, could mean re-reading and skipping over the input from its beginning up to the start of the slice.
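
One possible shape for such a slice is sketched below. This is an illustration under assumptions, not SDK code: it assumes the underlying stream supports mark/reset cheaply (which may not hold for every source), that each slice is fully consumed before the next one is created, and that a part fits in an int.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: exposes the next `length` bytes of a shared stream as their own stream.
final class SlicedInputStream extends InputStream {
    private final InputStream source;
    private final int length;
    private int remaining;

    SlicedInputStream(InputStream source, int length) {
        if (!source.markSupported()) {
            throw new IllegalArgumentException("source must support mark/reset");
        }
        this.source = source;
        this.length = length;
        this.remaining = length;
        source.mark(length); // remember where this slice starts
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // end of slice, even if the source has more data
        }
        int b = source.read();
        if (b >= 0) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1;
        }
        int n = source.read(b, off, Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public boolean markSupported() {
        return true;
    }

    @Override
    public synchronized void mark(int readLimit) {
        // the slice start is the only mark this sketch keeps
    }

    @Override
    public synchronized void reset() throws IOException {
        source.reset();      // back to the start of the slice
        remaining = length;
    }

    @Override
    public void close() {
        // the owner of the source closes it after the last part
    }

    public static void main(String[] args) throws IOException {
        InputStream source = new ByteArrayInputStream(new byte[] {0, 1, 2, 3, 4, 5, 6, 7});
        SlicedInputStream slice = new SlicedInputStream(source, 4);
        slice.read();
        slice.reset(); // a retry starts again at the first byte of the slice
        System.out.println(slice.read());
    }
}
```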

Jersey Filters – ContainerRequestFilter and ContainerResponseFilter

Jersey filters allow certain functionality to be performed on every request/response. They are typically used to modify request or response parameters such as headers. The Jersey user guide provides a good description of what filters can do. This blog, however, focusses on how to set up filters in different Jersey versions

Jersey 1.x

The core of setting up is configuring the appropriate init parameter. In Jersey 1.x, these were

setInitParameter(ResourceConfig.PROPERTY_CONTAINER_REQUEST_FILTERS, RequestResponseLoggingFilter.class.getName());
setInitParameter(ResourceConfig.PROPERTY_CONTAINER_RESPONSE_FILTERS, RequestResponseLoggingFilter.class.getName());

where the parameter keys were

public static final String PROPERTY_CONTAINER_REQUEST_FILTERS =  "com.sun.jersey.spi.container.ContainerRequestFilters";
public static final String PROPERTY_CONTAINER_RESPONSE_FILTERS =  "com.sun.jersey.spi.container.ContainerResponseFilters";

These parameters can also be set using web.xml

<servlet>
    <servlet-name>Jersey REST Service</servlet-name>
    <servlet-class>com.sun.jersey.spi.container.servlet.ServletContainer</servlet-class>
    <init-param>
        <param-name>com.sun.jersey.spi.container.ContainerRequestFilters</param-name>
        <param-value>com.company.org.jersey.filters.RequestResponseLoggingFilter</param-value>
    </init-param>
</servlet>

Jersey 2.x

In Jersey 2.x the parameters changed to the following

setInitParameter("javax.ws.rs.container.ContainerRequestFilter", RequestResponseLoggingFilter.class.getName());
setInitParameter("javax.ws.rs.container.ContainerResponseFilter", RequestResponseLoggingFilter.class.getName());