Hong Kong Machine Learning Season 4 Episode 4

15.12.2021 - Hong Kong Machine Learning

When?

Wednesday, December 15, 2021 from 7:00 PM to 9:00 PM (Hong Kong Time)

Where?

This meetup was hosted on Zoom.

We are looking to organize hybrid (online and in-person) meetups on Hong Kong Island going forward.

Thanks to our sponsor Darwinex for helping us cover the various costs.

The page of the event on Meetup: HKML S4E4

Programme:

Talk 1: Asset Pricing with Panel Trees under Global Split Criteria

Abstract: We introduce a class of interpretable tree-based models (P-Trees) for analyzing panel data, with iterative and global (instead of recursive and local) splitting criteria to avoid overfitting and improve model performance. We apply P-Tree to generate a stochastic discount factor model and test assets for cross-sectional asset pricing. Unlike other tree algorithms, P-Trees accommodate imbalanced panels of asset returns and grow under the no-arbitrage condition. P-Trees also graphically capture nonlinearity and interaction effects and accommodate regime-switching and interactions between macroeconomic states and firm characteristics. For example, P-Tree identifies inflation as the most important macro predictor with regime-switching in U.S. equity data. Based on multiple pricing, prediction, and investment metrics, we find that (boosted or time-series) P-Trees outperform standard factor models and PCA latent factor models. An equal-weighted portfolio for five factors generated by P-Trees delivers an excess alpha of 1.09% against the Fama-French 3-factor benchmark, producing an annualized Sharpe ratio of 1.98 out of sample. Data-driven cutpoints in P-Trees reveal that long-run reversal, volume volatility, and industry-adjusted market equity drive cross-sectional return variations, consistent with variable importance analysis using random forests.

Speaker: Sean Xin He, City University of Hong Kong (CityU)

paper
slides
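The split criterion is the key idea in this abstract. As a rough illustration only, the sketch below performs one split of a tree on a panel of returns. It assumes (our assumptions, not taken from the talk) that each leaf is an equal-weighted portfolio, that characteristics are cross-sectionally uniform in [0, 1], that cutpoints come from a fixed grid, and that the criterion is the squared Sharpe ratio of the mean-variance efficient (tangency) portfolio of all leaf portfolios. Scoring all leaves jointly, rather than only the two new children, is what makes the criterion global instead of local. The function names are hypothetical; see the paper for the actual P-Tree algorithm.

```python
import numpy as np

def squared_sharpe(port_returns):
    """Squared Sharpe ratio of the tangency portfolio of the columns: mu' Sigma^{-1} mu."""
    mu = port_returns.mean(axis=0)
    sigma = np.atleast_2d(np.cov(port_returns, rowvar=False))
    return float(mu @ np.linalg.solve(sigma, mu))

def leaf_portfolios(R, leaves, n_leaves):
    """Equal-weighted return of each leaf at each date; R and leaves are (T, N) arrays.
    A leaf with no assets on some date gets NaN for that date."""
    out = np.full((R.shape[0], n_leaves), np.nan)
    for k in range(n_leaves):
        mask = leaves == k
        counts = mask.sum(axis=1)
        sums = np.where(mask, R, 0.0).sum(axis=1)
        out[:, k] = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
    return out

def best_split(R, chars, leaves, cutpoints=(0.2, 0.4, 0.6, 0.8)):
    """Hypothetical helper: try splitting every current leaf on every characteristic
    and cutpoint, scoring each candidate by the squared Sharpe ratio of ALL leaf
    portfolios together (a global criterion), not just the two new children."""
    n_leaves = int(leaves.max()) + 1
    best_score, best_rule = -np.inf, None
    for leaf in range(n_leaves):
        for j, C in enumerate(chars):
            for c in cutpoints:
                new = leaves.copy()
                # Assets in this leaf whose characteristic exceeds the cutpoint form a new leaf.
                new[(leaves == leaf) & (C > c)] = n_leaves
                P = leaf_portfolios(R, new, n_leaves + 1)
                if np.isnan(P).any():  # skip splits that leave a leaf empty on some date
                    continue
                score = squared_sharpe(P)
                if score > best_score:
                    best_score, best_rule = score, (leaf, j, c)
    return best_score, best_rule
```

Growing a full tree would repeat `best_split` on the updated leaf assignment; the sketch omits stopping rules, value weighting, and imbalanced panels.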

Talk 2: Top2Vec: Distributed Representations of Topics, with application on 2020 10-K business descriptions

Gautier Marti, founder of the Hong Kong Machine Learning Meetups

A quick walk-through of Top2Vec, a novel approach to topic modeling.

Abstract: Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis have been the most widely used methods for topic modeling over the past 20 years. However, they rely on heavy pre-processing of the text (custom stop-word lists, stemming, and lemmatization) and require the number of topics to be known in advance. As a result, their output is often unstable. Moreover, they rely on bag-of-words representations of documents, which ignore the ordering and semantics of words. Top2Vec is a fairly recent approach to topic modeling, introduced in Top2Vec: Distributed Representations of Topics (August 2020). It leverages recent advances in NLP and deep learning: document and word embeddings from large language models. Besides these NLP advances (2019), the method incorporates other recent techniques (UMAP for dimensionality reduction, 2018; HDBSCAN for finding density-based clusters, 2013) to process the embeddings and obtain the final topics.

blog
slides
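The last step of the pipeline described in the Top2Vec abstract can be sketched in a few lines. In Top2Vec, documents and words share one embedding space; once UMAP and HDBSCAN have found dense clusters of documents, each topic vector is the centroid of a cluster's document embeddings, and the topic words are the word embeddings closest to that centroid. This sketch assumes the embeddings and the HDBSCAN cluster labels (with -1 marking noise, as HDBSCAN does) are already computed, and uses toy 2D vectors; the function names are hypothetical and this is not the `top2vec` library's API.

```python
import numpy as np

def topic_vectors(doc_vecs, labels):
    """Topic vector = centroid of each cluster's document embeddings (label -1 = noise, skipped)."""
    topics = sorted(set(labels.tolist()) - {-1})
    return topics, np.vstack([doc_vecs[labels == t].mean(axis=0) for t in topics])

def topic_words(topic_vecs, word_vecs, vocab, n=2):
    """The n words whose embeddings have the highest cosine similarity to each topic vector."""
    tv = topic_vecs / np.linalg.norm(topic_vecs, axis=1, keepdims=True)
    wv = word_vecs / np.linalg.norm(word_vecs, axis=1, keepdims=True)
    sims = tv @ wv.T  # shape: (n_topics, n_words)
    top = np.argsort(-sims, axis=1)[:, :n]
    return [[vocab[j] for j in row] for row in top]

# Toy 2D embeddings standing in for jointly learned document and word vectors.
doc_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5]])
labels = np.array([0, 0, 1, 1, -1])  # e.g. HDBSCAN output on the UMAP-reduced vectors
vocab = ["bank", "loan", "drug", "trial"]
word_vecs = np.array([[1.0, 0.05], [0.95, 0.2], [0.05, 1.0], [0.2, 0.95]])

topics, tvecs = topic_vectors(doc_vecs, labels)
print(topics, topic_words(tvecs, word_vecs, vocab))  # [0, 1] [['bank', 'loan'], ['drug', 'trial']]
```

Note that the centroids are taken in the original embedding space, while UMAP's reduced space is used only to find the clusters; the number of topics falls out of the clustering rather than being fixed in advance.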

Talk 3: 3D Infomax improves GNNs for Molecular Property Prediction

Hannes Stark, MIT Research Intern, https://hannes-stark.com

Abstract: Molecular property prediction is one of the fastest-growing applications of deep learning with critical real-world impacts. Including 3D molecular structure as input to learned models improves their performance for many molecular tasks. However, this information is infeasible to compute at the scale required by several real-world applications. We propose pre-training a model to reason about the geometry of molecules given only their 2D molecular graphs. Using methods from self-supervised learning, we maximize the mutual information between 3D summary vectors and the representations of a Graph Neural Network (GNN) such that they contain latent 3D information. During fine-tuning on molecules with unknown geometry, the GNN still generates implicit 3D information and can use it to improve downstream tasks. We show that 3D pre-training provides significant improvements for a wide range of properties, such as a 22% average MAE reduction on eight quantum mechanical properties. Moreover, the learned representations can be effectively transferred between datasets in different molecular spaces.

paper
slides
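The 3D Infomax abstract says the GNN's 2D representations are trained to share mutual information with 3D summary vectors. A standard way to maximize a lower bound on mutual information between two views is a contrastive InfoNCE-style loss (for a batch of N pairs, I(X;Y) ≥ log N − loss). The sketch below assumes that loss with cosine similarity and a temperature `tau=0.1`; the name `info_nce`, the random stand-in embeddings, and the hyperparameters are illustrative. The encoders themselves (a 2D GNN and a 3D network in the talk) are not shown, and the paper's exact objective may differ.

```python
import numpy as np

def info_nce(z2d, z3d, tau=0.1):
    """Contrastive (InfoNCE-style) loss: row i of z2d and row i of z3d come from the
    same molecule (positive pair); all other rows in the batch act as negatives."""
    a = z2d / np.linalg.norm(z2d, axis=1, keepdims=True)  # unit vectors -> cosine similarity
    b = z3d / np.linalg.norm(z3d, axis=1, keepdims=True)
    logits = a @ b.T / tau                                # (batch, batch) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # cross-entropy on the positive pairs

rng = np.random.default_rng(0)
z2d = rng.normal(size=(8, 16))                  # stand-in for 2D GNN embeddings
matched = z2d + 0.1 * rng.normal(size=(8, 16))  # stand-in for the matching 3D embeddings
print(info_nce(z2d, matched) < info_nce(z2d, matched[::-1]))  # True
```

Minimizing this loss pushes each molecule's 2D embedding toward its own 3D embedding and away from the others in the batch; after pre-training, only the 2D GNN is kept for fine-tuning.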

Video Recording of the HKML Meetup on YouTube

YouTube videos:

HKML S4E4 - Asset Pricing with Panel Trees under Global Split Criteria by Sean Xin He

HKML S4E4 - Top2Vec: Distributed Representations of Topics, with application on 2020 10-K

HKML S4E4 - 3D Infomax improves GNNs for Molecular Property Prediction by Hannes Stark

This Meetup is generously sponsored by Darwinex: a multi-asset broker and asset manager, regulated by the Financial Conduct Authority (FCA) in the United Kingdom (FRN 586466). It gives traders a brokerage venue to trade the markets and the regulatory cover to attract and monetize third-party investor capital. Darwinex is probably the fastest way for talented traders to attract and manage investor capital at minimum cost and without regulatory or administrative hurdles.

For more information: info@darwinex.com +44 20 3769 1554 Darwinex website
