[go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support encrypted and signed user data #5599

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Prev Previous commit
Next Next commit
remove mime processing, update tests and doc comment
  • Loading branch information
TheRealFalcon committed Oct 15, 2024
commit 7d4095474385f20bbb2f2ffaf7466e47cb38e37e
1 change: 0 additions & 1 deletion cloudinit/handlers/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,6 @@
"text/x-shellscript-per-boot": "text/x-shellscript-per-boot",
"text/x-shellscript-per-instance": "text/x-shellscript-per-instance",
"text/x-shellscript-per-once": "text/x-shellscript-per-once",
"-----begin pgp message-----": "text/x-pgp-armored",
}

# Sorted longest first
Expand Down
162 changes: 88 additions & 74 deletions cloudinit/user_data.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@
from email.mime.multipart import MIMEMultipart
from email.mime.nonmultipart import MIMENonMultipart
from email.mime.text import MIMEText
from typing import Union
from typing import Union, cast

from cloudinit import features, gpg, handlers, subp, util
from cloudinit.settings import KEY_DIR
Expand All @@ -42,7 +42,6 @@
ARCHIVE_UNDEF_BINARY_TYPE = "application/octet-stream"

# This seems to hit most of the gzip possible content types.
ENCRYPT_TYPE = "text/x-pgp-armored"
DECOMP_TYPES = [
"application/gzip",
"application/gzip-compressed",
Expand All @@ -53,7 +52,6 @@
"application/x-gzip",
"application/x-gzip-compressed",
]
TRANSFORM_TYPES = [ENCRYPT_TYPE] + DECOMP_TYPES

# Msg header used to track attachments
ATTACHMENT_FIELD = "Number-Attachments"
Expand Down Expand Up @@ -90,95 +88,67 @@ def process(self, blob, require_signature=False):
if isinstance(blob, list):
for b in blob:
self._process_msg(
convert_string(b), accumulating_msg, require_signature
convert_string(b, require_signature=require_signature),
accumulating_msg,
)
else:
self._process_msg(
convert_string(blob), accumulating_msg, require_signature
convert_string(blob, require_signature=require_signature),
accumulating_msg,
)
return accumulating_msg

def _process_msg(
self, base_msg: Message, append_msg, require_signature=False
):
def _process_msg(self, base_msg, append_msg):
def find_ctype(payload):
return handlers.type_from_starts_with(payload)

for part in base_msg.walk():
if is_skippable(part):
continue

ctype = None
ctype_orig = part.get_content_type()
payload = util.fully_decoded_payload(part)
was_compressed = False

ctype = part.get_content_type()
# When the message states it is of a gzipped content type ensure
# that we attempt to decode said payload so that the decompressed
# data can be examined (instead of the compressed data).
if ctype_orig in DECOMP_TYPES:
try:
payload = util.decomp_gzip(payload, quiet=False)
# At this point we don't know what the content-type is
# since we just decompressed it.
ctype_orig = None
was_compressed = True
except util.DecompressionError as e:
error_message = (
"Failed decompressing payload from {} of"
" length {} due to: {}".format(
ctype_orig, len(payload), e
)
)
_handle_error(error_message, e)
continue

# Attempt to figure out the payloads content-type
if not ctype_orig:
ctype_orig = UNDEF_TYPE
# There are known cases where mime-type text/x-shellscript included
# non shell-script content that was user-data instead. It is safe
# to check the true MIME type for x-shellscript type since all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is safe
to check the true MIME type for x-shellscript type since all
shellscript payloads must have a #! header. The other MIME types
that cloud-init supports do not have the same guarantee.

I know that you didn't write this, but we should be careful with assumptions like this. I think that this comment is saying that a shebang is required for executable scripts on Linux which makes it safe to check the content type when it is type text/x-shellscript. If that's what this is saying, then this could be a bug, because the "shebang is required for shell scripts" assumption is untrue:

$ echo "echo no-shebang" > test.txt
$ chmod +x test.txt                         
$ ./test.txt                            
no-shebang

This statement does seem dubious:

            # There are known cases where mime-type text/x-shellscript included
            # non shell-script content that was user-data instead.

It might be worth digging a little more into this to see if we know how this could happen, and if this is, indeed, a bug (and if so, then we should document it).

I don't think that this is specifically a bug in the code that you've added here, but when I see questionable comments and assumption, I get a little bit more concerned about the code quality - therefore the modifications that we are making.

Copy link
Member Author
@TheRealFalcon TheRealFalcon Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's what this is saying, then this could be a bug, because the "shebang is required for shell scripts" assumption is untrue

Is it? Trying to pass a mime message made from cloud-init devel make-mime -a test.sh:x-shellscript with script having no shebang leads to a cc_scripts_user failure due to a subp failure:

ProcessExecutionError("Exec format error. Missing #! in script?\nCommand: ['/var/lib/cloud/instance/scripts/test.sh']\nExit code: -\nReason: [Errno 8] Exec format error: b'/var/lib/cloud/instance/scripts/test.sh'\nStdout: -\nStderr: -")

I think http://www.faqs.org/faqs/unix-faq/faq/part3/section-16.html explains why.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's what this is saying, then this could be a bug, because the "shebang is required for shell scripts" assumption is untrue

Is it? Trying to pass a mime message made from cloud-init devel make-mime -a test.sh:x-shellscript with script having no shebang leads to a cc_scripts_user failure due to a subp failure:

ProcessExecutionError("Exec format error. Missing #! in script?\nCommand: ['/var/lib/cloud/instance/scripts/test.sh']\nExit code: -\nReason: [Errno 8] Exec format error: b'/var/lib/cloud/instance/scripts/test.sh'\nStdout: -\nStderr: -")

I think http://www.faqs.org/faqs/unix-faq/faq/part3/section-16.html explains why.

No you're right. That link describes csh, but dash's man page describes this behavior as well. This is a thing that we could support if we wanted to with a subshell, but I don't think we need to.

I still have questions about what "known cases where mime-type text/x-shellscript included shell-script content that was user-data instead". This sounds like a potential bug.

# shellscript payloads must have a #! header. The other MIME types
# that cloud-init supports do not have the same guarantee.
if ctype in TYPE_NEEDED + ["text/x-shellscript"]:
ctype = find_ctype(payload) or ctype

if require_signature and ctype != ENCRYPT_TYPE:
error_message = (
"'require_signature' was set true in cloud-init's base "
f"configuration, but content type is {ctype}."
)
raise RuntimeError(error_message)

was_transformed = False

# When the message states it is transformed ensure
# that we attempt to decode said payload so that the transformed
# data can be examined.
parent_ctype = None
if ctype in TRANSFORM_TYPES:
if ctype in DECOMP_TYPES:
try:
payload = util.decomp_gzip(payload, quiet=False)
except util.DecompressionError as e:
error_message = (
"Failed decompressing payload from {} of"
" length {} due to: {}".format(
ctype, len(payload), e
)
)
_handle_error(error_message, e)
continue
elif ctype == ENCRYPT_TYPE and isinstance(payload, str):
with gpg.GPG() as gpg_context:
# Import all keys from the /etc/cloud/keys directory
keys_dir = pathlib.Path(KEY_DIR)
if keys_dir.is_dir():
for key_path in keys_dir.iterdir():
gpg_context.import_key(key_path)
try:
payload = gpg_context.decrypt(
payload, require_signature=require_signature
)
except subp.ProcessExecutionError as e:
raise RuntimeError(
"Failed decrypting user data payload of type "
f"{ctype}. Ensure any necessary keys are "
f"present in {KEY_DIR}."
) from e
else:
error_message = (
f"Unknown content type {ctype} that"
" is marked as transformed"
)
_handle_error(error_message)
continue
was_transformed = True
parent_ctype = ctype
ctype = find_ctype(payload) or parent_ctype
if ctype_orig in TYPE_NEEDED + ["text/x-shellscript"]:
ctype = find_ctype(payload)
if ctype is None:
ctype = ctype_orig

# In the case where the data was compressed, we want to make sure
# that we create a new message that contains the found content
# type with the uncompressed content since later traversals of the
# messages will expect a part not compressed.
if was_transformed:
if was_compressed:
maintype, subtype = ctype.split("/", 1)
n_part = MIMENonMultipart(maintype, subtype)
n_part.set_payload(payload)
Expand All @@ -187,13 +157,12 @@ def find_ctype(payload):
# after decoding and decompression.
if part.get_filename():
_set_filename(n_part, part.get_filename())
if "Launch-Index" in part:
_replace_header(
n_part, "Launch-Index", str(part["Launch-Index"])
)
for h in ("Launch-Index",):
if h in part:
_replace_header(n_part, h, str(part[h]))
part = n_part

if ctype != parent_ctype:
if ctype != ctype_orig:
_replace_header(part, CONTENT_TYPE, ctype)

if ctype in INCLUDE_TYPES:
Expand Down Expand Up @@ -403,8 +372,35 @@ def is_skippable(part):
return False


def decrypt_payload(payload: str, require_signature: bool) -> str:
"""Decrypt/Verify a PGP message.

:param payload: ASCII-armored GPG message to process
:param require_signature: Whether to require a signature
:return: decrypted data
"""
with gpg.GPG() as gpg_context:
# Import all keys from the /etc/cloud/keys directory
keys_dir = pathlib.Path(KEY_DIR)
if keys_dir.is_dir():
for key_path in keys_dir.iterdir():
gpg_context.import_key(key_path)
try:
return gpg_context.decrypt(
payload, require_signature=require_signature
)
except subp.ProcessExecutionError as e:
raise RuntimeError(
"Failed decrypting user data payload. "
f"Ensure any necessary keys are present in {KEY_DIR}."
) from e


def convert_string(
raw_data: Union[str, bytes], content_type=NOT_MULTIPART_TYPE
raw_data: Union[str, bytes],
*,
require_signature: bool = False,
content_type=NOT_MULTIPART_TYPE,
) -> Message:
"""Convert the raw data into a mime message.

Expand All @@ -424,10 +420,28 @@ def create_binmsg(data, content_type):
bdata = raw_data.encode("utf-8")
else:
bdata = raw_data
bdata = util.decomp_gzip(bdata, decode=False)
if b"mime-version:" in bdata[0:4096].lower():
# cast here because decode=False means return type is bytes
bdata = cast(bytes, util.decomp_gzip(bdata, decode=False))

# Decrypt/verify a PGP message. We do this here because a signed
# MIME part could be thwarted by other user data parts
if bdata[:27] == b"-----BEGIN PGP MESSAGE-----":
bdata = decrypt_payload(
bdata.decode("utf-8"), require_signature
).encode("utf-8")
elif require_signature:
error_message = (
"'require_signature' was set true in cloud-init's base "
"configuration, but content is not signed."
)
raise RuntimeError(error_message)

# Now ensure we have a MIME message
if b"mime-version:" in bdata[:4096].lower():
# If we have a pre-existing MIME, use it
msg = email.message_from_string(bdata.decode("utf-8"))
else:
# Otherwise, convert to MIME
msg = create_binmsg(bdata, content_type)

return msg
2 changes: 1 addition & 1 deletion doc/rtd/howto/pgp.rst
Original file line number Diff line number Diff line change
Expand Up @@ -136,7 +136,7 @@ using the existing key ring in the snapshot, we do this for a few reasons:
* Users may not want these keys in any key ring by default on a new instance
* Exporting keys is easier than copying key rings

Note that on launch, cloud-init will import there keys into a temporary
Note that on launch, cloud-init will import these keys into a temporary
key ring that is removed after the user data is processed. The default
key ring will not be read or modified.

Expand Down
2 changes: 1 addition & 1 deletion tests/integration_tests/userdata/test_pgp.py
Original file line number Diff line number Diff line change
Expand Up @@ -329,7 +329,7 @@ def test_signature_required(client: IntegrationInstance):
assert result.failed
assert (
"'require_signature' was set true in cloud-init's base configuration, "
"but content type is text/cloud-config"
"but content is not signed"
) in result.stdout


Expand Down
6 changes: 2 additions & 4 deletions tests/unittests/cloudinit/test_user_data.py
Original file line number Diff line number Diff line change
Expand Up @@ -108,7 +108,7 @@ def test_pgp_decryption_failure(self, mocker):
mocker.patch("cloudinit.subp.subp", side_effect=my_subp)
ud_proc = user_data.UserDataProcessor({})
with pytest.raises(
RuntimeError, match="payload of type text/x-pgp-armored"
RuntimeError, match="Failed decrypting user data payload"
):
ud_proc.process(BAD_MESSAGE)

Expand All @@ -124,7 +124,5 @@ def test_pgp_required(self, mocker):
def test_pgp_required_with_no_pgp_message(self, mocker):
mocker.patch("cloudinit.subp.subp", side_effect=my_subp)
ud_proc = user_data.UserDataProcessor({})
with pytest.raises(
RuntimeError, match="content type is text/cloud-config"
):
with pytest.raises(RuntimeError, match="content is not signed"):
ud_proc.process(CLOUD_CONFIG, require_signature=True)