-
Python script that updates S3 file paths in Iceberg metadata files (metadata.json and Avro)
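Here is a minimal sketch of the JSON half of that job. It assumes the goal is to swap one S3 prefix for another, for example after copying a table to a new bucket. The bucket names and the trimmed sample metadata are hypothetical. The Avro manifest lists and manifests also embed file paths and would need an Avro library such as fastavro, which this sketch does not cover.

```python
import json
from typing import Any


def rewrite_prefix(node: Any, old_prefix: str, new_prefix: str) -> Any:
    """Recursively replace old_prefix at the start of every string value (keys are untouched)."""
    if isinstance(node, dict):
        return {key: rewrite_prefix(value, old_prefix, new_prefix) for key, value in node.items()}
    if isinstance(node, list):
        return [rewrite_prefix(item, old_prefix, new_prefix) for item in node]
    if isinstance(node, str) and node.startswith(old_prefix):
        return new_prefix + node[len(old_prefix):]
    return node


def rewrite_metadata_json(text: str, old_prefix: str, new_prefix: str) -> str:
    """Rewrite S3 paths in the text of an Iceberg metadata.json file."""
    metadata = json.loads(text)
    return json.dumps(rewrite_prefix(metadata, old_prefix, new_prefix), indent=2)


if __name__ == "__main__":
    # Hypothetical, heavily trimmed metadata for illustration only.
    sample = json.dumps({
        "location": "s3://old-bucket/warehouse/db/tbl",
        "snapshots": [
            {"snapshot-id": 1, "manifest-list": "s3://old-bucket/warehouse/db/tbl/metadata/snap-1.avro"}
        ],
        "metadata-log": [
            {"metadata-file": "s3://old-bucket/warehouse/db/tbl/metadata/00000.metadata.json"}
        ],
    })
    print(rewrite_metadata_json(sample, "s3://old-bucket/", "s3://new-bucket/"))
```

Matching on a prefix rather than a bucket name alone avoids rewriting unrelated strings that merely contain the old bucket name.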
-
Iceberg_Glue_register_table
Example using the Iceberg register_table procedure with AWS Glue and the AWS Glue Data Catalog
-
Iceberg_Glue_from_JARs
Configure any version of Apache Iceberg with AWS Glue by installing Iceberg from JAR files
-
OpenSearch_CloudWatch_Alarms
CloudFormation stack that automates the deployment of recommended CloudWatch alarms for OpenSearch
-
Access OpenSearch Dashboards for a domain deployed in a private subnet via an NGINX proxy
-
Connect a locally hosted OpenSearch Dashboards server to a domain hosted on Amazon OpenSearch Service
Shell · Updated Nov 16, 2024
-
OpenSearch_Refresh_Interval
Example covering how to adjust the refresh interval on an OpenSearch index
3 · Updated Nov 16, 2024
-
OpenSearch_Read_Only_Index
Example covering how to set an OpenSearch index to read-only, a common prerequisite for performance tuning tasks
Updated Nov 16, 2024
-
Glue_Aggregate_Small_Files
PySpark script that aggregates small Parquet files in a prefix into larger files. Designed to run on AWS Glue
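One common way to pick the number of output files is to divide the total input size by a target file size and then repartition to that count. The sketch below assumes a 128 MiB target and assumes the response pages come from boto3's `list_objects_v2` paginator. The helper names are hypothetical, and the PySpark step appears only as a comment.

```python
import math
from typing import Iterable, Mapping

# Assumed target size per output file (128 MiB); tune for the workload.
TARGET_FILE_BYTES = 128 * 1024 * 1024


def total_size(pages: Iterable[Mapping]) -> int:
    """Sum the "Size" of every object in list_objects_v2-style response pages."""
    return sum(obj["Size"] for page in pages for obj in page.get("Contents", []))


def target_partition_count(total_bytes: int, target_file_bytes: int = TARGET_FILE_BYTES) -> int:
    """Number of output files so that each file is at most roughly target_file_bytes."""
    return max(1, math.ceil(total_bytes / target_file_bytes))


# Typical use (not executed here):
#   s3 = boto3.client("s3")
#   pages = s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix)
#   n = target_partition_count(total_size(pages))
#   spark.read.parquet(input_path).repartition(n).write.mode("overwrite").parquet(output_path)
```

Compressed on-disk size only approximates the rewritten size, so the resulting files land near the target rather than exactly on it. `coalesce(n)` avoids a full shuffle but can produce unevenly sized files.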
Python · Updated Nov 16, 2024
-
OpenSearch_Index_Shard_Size
Example covering ideal shard size and how to adjust the number of primary and replica shards for an index
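As a rough worked example, AWS's sizing guidance for OpenSearch Service estimates the primary shard count as (source data + room to grow) × (1 + indexing overhead) / desired shard size. The 10% overhead and 30 GiB target below are assumed defaults, not values taken from the repository.

```python
import math


def primary_shard_count(
    source_gib: float,
    growth_gib: float = 0.0,
    indexing_overhead: float = 0.10,  # assumed ~10% index overhead
    target_shard_gib: float = 30.0,   # assumed target shard size
) -> int:
    """Approximate primaries = (source + growth) * (1 + overhead) / target shard size."""
    total_gib = (source_gib + growth_gib) * (1 + indexing_overhead)
    return max(1, math.ceil(total_gib / target_shard_gib))


def total_shards(primaries: int, replicas: int) -> int:
    """Each replica is a full copy of every primary shard."""
    return primaries * (1 + replicas)


if __name__ == "__main__":
    primaries = primary_shard_count(source_gib=200, growth_gib=50)
    print(primaries, total_shards(primaries, replicas=1))
```

The primary shard count (`number_of_shards`) is fixed when an index is created and changes only through a reindex, split, or shrink. By contrast, `number_of_replicas` is a dynamic setting that can be updated on a live index through the `_settings` API.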
5 · Updated Nov 16, 2024
-
OpenSearch_kNN_Vector_Search
Tokenize sample text data and convert it into vectors using BERT, load the vector representations into OpenSearch, and use k-NN search for semantic search
-
Example to help understand how to use the Hugging Face sentence_transformers library to encode searchable text into embeddings, and how to use cosine similarity to determine the similarity between a search p…
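The similarity step itself is small. Here is a minimal plain-Python sketch that uses tiny hand-written vectors as stand-ins for real embeddings. In practice the vectors would come from something like `SentenceTransformer("all-MiniLM-L6-v2").encode(texts)`, where the model name is only an example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], documents: Dict[str, Sequence[float]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, most similar first (exact, brute-force search)."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in documents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical 3-dimensional "embeddings" for illustration only.
    docs = {"cats": [0.9, 0.1, 0.0], "dogs": [0.8, 0.3, 0.1], "stocks": [0.0, 0.2, 0.9]}
    print(rank_by_similarity([1.0, 0.2, 0.0], docs, top_k=2))
```

This brute-force ranking compares the query against every document. Engines such as OpenSearch typically use approximate k-NN indexes instead, so they can scale beyond small collections.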
-
OpenSearch_Neural_Search
OpenSearch Neural Search example. Load a BERT model into OpenSearch and create embeddings as data is indexed, then use the embeddings to perform vector search
3 · Updated Nov 16, 2024
-
BM25_Search_Example
Example to help understand how the BM25 term-based ranking model works in search applications
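Here is a compact sketch of the classic BM25 formula. It uses whitespace tokenization and the default parameters k1 = 1.2 and b = 0.75, which Lucene-based engines such as OpenSearch also use. The IDF term follows Lucene's variant, log(1 + (N - n + 0.5) / (n + 0.5)), which always stays positive. The tiny corpus is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_scores(
    query: List[str], docs: Dict[str, List[str]], k1: float = 1.2, b: float = 0.75
) -> Dict[str, float]:
    """Score every document against the query terms with BM25."""
    n_docs = len(docs)
    avgdl = sum(len(tokens) for tokens in docs.values()) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq: Counter = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # absent terms contribute nothing
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            freq = tf[term]
            # Length normalization: longer-than-average documents are penalized via b.
            denom = freq + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * freq * (k1 + 1) / denom
        scores[doc_id] = score
    return scores


if __name__ == "__main__":
    corpus = {
        "d1": "the quick brown fox".split(),
        "d2": "the lazy dog".split(),
        "d3": "quick quick fox jumps".split(),
    }
    print(bm25_scores(["quick", "fox"], corpus))
```

Term frequency saturates through k1: a second occurrence of "quick" raises d3's score, but by less than the first occurrence did.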
Python · Updated Nov 16, 2024
-
OpenSearch_Sigv4_IAM_Auth
Authenticate with OpenSearch using IAM and SigV4 request signing
-
Example Python code that calls the OpenSearch REST API for user, role, and permission management
Python · Updated Nov 16, 2024
-
Iceberg_EMR_Athena
Resources from a virtual tech talk/workshop: Set Up and Use Apache Iceberg Tables on Your Data Lake
-
CloudFormation template that automates the deployment of the IAM roles required by Amazon Security Lake
1 · Updated Nov 16, 2024
-
OpenSearch_API_Examples
Example API calls (via Python) to set up OpenSearch for anomaly detection and cross-cluster replication, load sample data ...
-
OpenSearch_DeletedDocuments
Examples explaining how deletes work in OpenSearch
Python · Updated Nov 16, 2024
-
Outlook_MSG_Parser_Python
Python script to process emails saved with the .msg file extension
-
EMR_Studio_Iceberg
Apache Iceberg examples designed to run on Amazon EMR (Elastic MapReduce) via EMR Studio or EMR Notebooks
-
EMR_Studio_Deployment
Example Jupyter notebook for EMR Studio
-
EMR_Studio_Hudi
Apache Hudi examples designed to run on Amazon EMR (Elastic MapReduce) via EMR Studio or EMR Notebooks
-
Glue_Examples
PySpark code samples designed for AWS Glue
-
Log data -> Kafka (MSK) -> Lambda -> OpenSearch -> Anomaly Detection
-
Helps explain how Apache Flink handles late-arriving data and its effects on message order
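The core idea can be modeled without Flink at all. This plain-Python sketch mimics a bounded-out-of-orderness watermark (in Flink itself, `WatermarkStrategy.forBoundedOutOfOrderness`). It computes the watermark as the largest timestamp seen, minus the allowed delay, minus 1 ms, and it flags any event at or below the current watermark as late. For simplicity it assumes a watermark after every event, whereas Flink emits watermarks periodically. The timestamps are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def classify_events(
    timestamps: Iterable[int], max_out_of_orderness_ms: int
) -> List[Tuple[int, Optional[int], bool]]:
    """Return (event_ts, watermark_at_arrival, is_late) for each event in arrival order."""
    max_seen: Optional[int] = None
    results = []
    for ts in timestamps:
        # Watermark currently in effect; None before the first event arrives.
        watermark = None if max_seen is None else max_seen - max_out_of_orderness_ms - 1
        # A watermark W promises no more events with timestamp <= W, so such events are late.
        is_late = watermark is not None and ts <= watermark
        results.append((ts, watermark, is_late))
        max_seen = ts if max_seen is None else max(max_seen, ts)
    return results


if __name__ == "__main__":
    # Event timestamps (ms) in the order they arrive, with 2 s of allowed disorder.
    arrivals = [1000, 3000, 2000, 7000, 4000, 6500]
    for ts, wm, late in classify_events(arrivals, max_out_of_orderness_ms=2000):
        print(f"event={ts} watermark={wm} late={late}")
```

In this run, the event stamped 4000 arrives after the one stamped 7000 has already pushed the watermark to 4999, so it is flagged late. The event stamped 2000 arrives out of order too, but it stays within the 2 s bound. What Flink then does with a late event depends on the window configuration: it may drop the event, update a window within its allowed lateness, or route the event to a side output.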
-
GitHub_Insigths_History
Script to collect GitHub repository views and unique visitors
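GitHub's REST endpoint `GET /repos/{owner}/{repo}/traffic/views` only returns the last 14 days, so keeping a longer history means merging each new response into stored data. Here is a minimal sketch of that merge step, assuming the response's `views` list of `{timestamp, count, uniques}` objects. How the repository actually stores its history is not stated, so the dictionary layout here is hypothetical.

```python
from typing import Dict

# Hypothetical storage layout: day timestamp -> {"count": ..., "uniques": ...}
History = Dict[str, Dict[str, int]]


def merge_views(history: History, response: dict) -> History:
    """Merge one traffic/views response into a history keyed by day timestamp.

    Newer responses overwrite existing days, since the most recent day in an
    earlier snapshot may have been incomplete.
    """
    merged = dict(history)
    for day in response.get("views", []):
        merged[day["timestamp"]] = {"count": day["count"], "uniques": day["uniques"]}
    return merged


if __name__ == "__main__":
    stored = {"2024-11-01T00:00:00Z": {"count": 4, "uniques": 2}}
    # Hypothetical response body trimmed to the fields used here.
    latest = {
        "count": 9,
        "uniques": 5,
        "views": [
            {"timestamp": "2024-11-01T00:00:00Z", "count": 5, "uniques": 3},
            {"timestamp": "2024-11-02T00:00:00Z", "count": 4, "uniques": 2},
        ],
    }
    for day, stats in sorted(merge_views(stored, latest).items()):
        print(day, stats)
```

Daily `uniques` values cannot simply be summed into a total unique-visitor count, because the same visitor may appear on several days.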
-
DataZone_Demo_FSI
Prebuilt demo of Amazon DataZone using fake data for topics in the financial services industry