Peter Flach

Pieter "Peter" Adriaan Flach (born 8 April 1961, Sneek) is a Dutch computer scientist and a Professor of Artificial Intelligence in the Department of Computer Science at the University of Bristol. He is author of the acclaimed Simply Logical: Intelligent Reasoning by Example (John Wiley, 1994) and Machine Learning: the Art and Science of Algorithms that Make Sense of Data (Cambridge University Press, 2012). == Education == Flach received an MSc Electrical Engineering from Universiteit Twente in 1987 and a PhD in Computer Science from Tilburg University in 1995. == Research == Flach's research interests are in data mining and machine learning.

Database virtualization

Database virtualization is the decoupling of the database layer, which lies between the storage and application layers within the application stack. Virtualization of the database layer enables a shift away from the physical, toward the logical or virtual. Virtualization enables compute and storage resources to be pooled and allocated on demand. This enables both the sharing of single server resources for multi-tenancy, as well as the pooling of server resources into a single logical database or cluster. In both cases, database virtualization provides increased flexibility, more granular and efficient allocation of pooled resources, and more scalable computing. == Virtual data partitioning == The act of partitioning data stores as a database grows has been in use for several decades. There are two primary ways that data has been partitioned inside legacy data management systems: Shared-data databases: an architecture that assumes all database cluster nodes share a single partition. Inter-node communications are used to synchronize update activities performed by different nodes on the cluster. Shared-data data management systems are limited to single-digit node clusters. Shared-nothing databases: an architecture in which all data is segregated to internally managed partitions with clear, well-defined data location boundaries. Shared-nothing databases require manual partition management. In virtual partitioning, logical data is abstracted from physical data by autonomously creating and managing large numbers of data partitions (100s to 1000s). Because they are autonomously maintained, the resources required to manage the partitions are minimal. This kind of massive partitioning results in: Partitions that are small, efficiently managed, and load-balanced. Systems that do not require re-partitioning events to define additional partitions, even when the hardware is changed. “Shared-data” and “shared-nothing” architectures allow scalability through multiple data partitions and cross-partition querying and transaction processing without full partition scanning. == Horizontal data partitioning == Partitioning database sources from consumers is a fundamental concept. With greater numbers of database sources, inserting a horizontal data virtualization layer between the sources and consumers helps address this complexity. Rick van der Lans, the author of multiple books on SQL and relational databases, has defined data virtualization as "the process of offering data consumers a data access interface that hides the technical aspects of stored data, such as location, storage structure, API, access language, and storage technology." == Advantages == Added flexibility and agility for existing computing infrastructure. Enhanced database performance. Pooling and sharing computing resources, either splitting them (multi-tenancy) or combining them (clustering). Simplification of administration and management. Increased fault tolerance.

Best AI Background Removers in 2026

Comparing the best AI background remover? An AI background remover is software that uses machine learning to help you get more done — it lowers the barrier so anyone can produce professional output. Privacy matters too: check whether your data trains the model and whether a no-log or enterprise tier is available. Whether you are a beginner or a pro, the right AI background remover slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.

Jian Ma (computational biologist)

Jian Ma (Chinese: 马坚) is an American computer scientist and computational biologist. He is the Ray and Stephanie Lane Professor of Computational Biology in the School of Computer Science at Carnegie Mellon University. He is a faculty member in the Ray and Stephanie Lane Computational Biology Department. His lab develops AI/ML methods to study the structure and function of the human genome and cellular organization and their implications for health and disease. During his Ph.D. and postdoc training, he developed algorithms to reconstruct the ancestral mammalian genome and evolutionary history. His research group has recently pioneered a series of new machine learning solutions for 3D genome organization, single-cell epigenomics, spatial omics, and complex molecular interactions. His lab also explores large language models to uncover gene regulatory mechanisms and the intricate connections among cellular components, with the aim of driving discovery and guiding experimentation. He received an NSF CAREER award in 2011. In 2020, he was awarded a Guggenheim Fellowship in Computer Science. He received the Allen Newell Award for Research Excellence (2025). He is an elected Fellow of the American Association for the Advancement of Science, the American Institute for Medical and Biological Engineering, the International Society for Computational Biology, and the Association for Computing Machinery. He leads an NIH 4D Nucleome Center to develop machine learning algorithms to better understand the cell nucleus. He served as the Program Chair for RECOMB 2024. He is also a member of the Scientific Advisory Board of the Chan Zuckerberg Biohub Chicago (CZ Biohub Chicago) and the RECOMB Steering Committee. In 2024, he launched the Center for AI-Driven Biomedical Research (AI4BIO) at CMU, which will be a catalyst for innovations at the intersection of AI and biomedicine across the School of Computer Science and campus. == Selected Recent Publications == Chen V#, Yang M#, Cui W, Kim JS, Talwalkar A, and Ma J. Applying interpretable machine learning in computational biology - pitfalls, recommendations and opportunities for new developments. Nature Methods, 21(8):1454-1461, 2024. Xiong K#, Zhang R#, and Ma J. scGHOST: Identifying single-cell 3D genome subcompartments. Nature Methods, 21(5):814-822, 2024. Zhou T, Zhang R, Jia D, Doty RT, Munday AD, Gao D, Xin L, Abkowitz JL, Duan Z, and Ma J. GAGE-seq concurrently profiles multiscale 3D genome organization and gene expression in single cells. Nature Genetics, 56(8):1701-1711, 2024. Zhang Y, Boninsegna L, Yang M, Misteli T, Alber F, and Ma J. Computational methods for analysing multiscale 3D genome organization. Nature Reviews Genetics, 5(2):123-141, 2024. Chidester B#, Zhou T#, Alam S, and Ma J. SPICEMIX enables integrative single-cell spatial modeling of cell identity. Nature Genetics, 55(1):78-88, 2023. [Cover Article] Zhang R#, Zhou T#, and Ma J. Ultrafast and interpretable single-cell 3D genome analysis with Fast-Higashi. Cell Systems, 13(10):P798-807.E6, 2022. [Cover Article] Zhu X#, Zhang Y#, Wang Y, Tian D, Belmont AS, Swedlow JR, and Ma J. Nucleome Browser: An integrative and multimodal data navigation platform for 4D Nucleome. Nature Methods, 19(8):911-913, 2022. Zhang R, Zhou T, and Ma J. Multiscale and integrative single-cell Hi-C analysis with Higashi. Nature Biotechnology, 40:254–261, 2022.

MRF optimization via dual decomposition

In dual decomposition a problem is broken into smaller subproblems and a solution to the relaxed problem is found. This method can be employed for MRF optimization. Dual decomposition is applied to markov logic programs as an inference technique. == Background == Discrete MRF Optimization (inference) is very important in Machine Learning and Computer vision, which is realized on CUDA graphical processing units. Consider a graph G = ( V , E ) {\displaystyle G=(V,E)} with nodes V {\displaystyle V} and Edges E {\displaystyle E} . The goal is to assign a label l p {\displaystyle l_{p}} to each p ∈ V {\displaystyle p\in V} so that the MRF Energy is minimized: (1) min Σ p ∈ V θ p ( l p ) + Σ p q ∈ ε θ p q ( l p ) ( l q ) {\displaystyle \min \Sigma _{p\in V}\theta _{p}(l_{p})+\Sigma _{pq\in \varepsilon }\theta _{pq}(l_{p})(l_{q})} Major MRF Optimization methods are based on Graph cuts or Message passing. They rely on the following integer linear programming formulation (2) min x E ( θ , x ) = θ . x = ∑ p ∈ V θ p . x p + ∑ p q ∈ ε θ p q . x p q {\displaystyle \min _{x}E(\theta ,x)=\theta .x=\sum _{p\in V}\theta _{p}.x_{p}+\sum _{pq\in \varepsilon }\theta _{pq}.x_{pq}} In many applications, the MRF-variables are {0,1}-variables that satisfy: x p ( l ) = 1 {\displaystyle x_{p}(l)=1} ⇔ {\displaystyle \Leftrightarrow } label l {\displaystyle l} is assigned to p {\displaystyle p} , while x p q ( l , l ′ ) = 1 {\displaystyle x_{pq}(l,l^{\prime })=1} , labels l , l ′ {\displaystyle l,l^{\prime }} are assigned to p , q {\displaystyle p,q} . == Dual Decomposition == The main idea behind decomposition is surprisingly simple: decompose your original complex problem into smaller solvable subproblems, extract a solution by cleverly combining the solutions from these subproblems. A sample problem to decompose: min x Σ i f i ( x ) {\displaystyle \min _{x}\Sigma _{i}f^{i}(x)} where x ∈ C {\displaystyle x\in C} In this problem, separately minimizing every single f i ( x ) {\displaystyle f^{i}(x)} over x {\displaystyle x} is easy; but minimizing their sum is a complex problem. So the problem needs to get decomposed using auxiliary variables { x i } {\displaystyle \{x^{i}\}} and the problem will be as follows: min { x i } , x Σ i f i ( x i ) {\displaystyle \min _{\{x^{i}\},x}\Sigma _{i}f^{i}(x^{i})} where x i ∈ C , x i = x {\displaystyle x^{i}\in C,x^{i}=x} Now we can relax the constraints by multipliers { λ i } {\displaystyle \{\lambda ^{i}\}} which gives us the following Lagrangian dual function: g ( { λ i } ) = min { x i ∈ C } , x Σ i f i ( x i ) + Σ i λ i . ( x i − x ) = min { x i ∈ C } , x Σ i [ f i ( x i ) + λ i . x i ] − ( Σ i λ i ) x {\displaystyle g(\{\lambda ^{i}\})=\min _{\{x^{i}\in C\},x}\Sigma _{i}f^{i}(x^{i})+\Sigma _{i}\lambda ^{i}.(x^{i}-x)=\min _{\{x^{i}\in C\},x}\Sigma _{i}[f^{i}(x^{i})+\lambda ^{i}.x^{i}]-(\Sigma _{i}\lambda ^{i})x} Now we eliminate x {\displaystyle x} from the dual function by minimizing over x {\displaystyle x} and dual function becomes: g ( { λ i } ) = min { x i ∈ C } Σ i [ f i ( x i ) + λ i . x i ] {\displaystyle g(\{\lambda ^{i}\})=\min _{\{x^{i}\in C\}}\Sigma _{i}[f^{i}(x^{i})+\lambda ^{i}.x^{i}]} We can set up a Lagrangian dual problem: (3) max { λ i } ∈ Λ g ( λ i ) = Σ i g i ( x i ) , {\displaystyle \max _{\{\lambda ^{i}\}\in \Lambda }g({\lambda ^{i}})=\Sigma _{i}g^{i}(x^{i}),} The Master problem (4) g i ( x i ) = m i n x i f i ( x i ) + λ i . x i {\displaystyle g^{i}(x^{i})=min_{x^{i}}f^{i}(x^{i})+\lambda ^{i}.x^{i}} where x i ∈ C {\displaystyle x^{i}\in C} The Slave problems == MRF optimization via Dual Decomposition == The original MRF optimization problem is NP-hard and we need to transform it into something easier. τ {\displaystyle \tau } is a set of sub-trees of graph G {\displaystyle G} where its trees cover all nodes and edges of the main graph. And MRFs defined for every tree T {\displaystyle T} in τ {\displaystyle \tau } will be smaller. The vector of MRF parameters is θ T {\displaystyle \theta ^{T}} and the vector of MRF variables is x T {\displaystyle x^{T}} , these two are just smaller in comparison with original MRF vectors θ , x {\displaystyle \theta ,x} . For all vectors θ T {\displaystyle \theta ^{T}} we'll have the following: (5) ∑ T ∈ τ ( p ) θ p T = θ p , ∑ T ∈ τ ( p q ) θ p q T = θ p q . {\displaystyle \sum _{T\in \tau (p)}\theta _{p}^{T}=\theta _{p},\sum _{T\in \tau (pq)}\theta _{pq}^{T}=\theta _{pq}.} Where τ ( p ) {\displaystyle \tau (p)} and τ ( p q ) {\displaystyle \tau (pq)} denote all trees of τ {\displaystyle \tau } than contain node p {\displaystyle p} and edge p q {\displaystyle pq} respectively. We simply can write: (6) E ( θ , x ) = ∑ T ∈ τ E ( θ T , x T ) {\displaystyle E(\theta ,x)=\sum _{T\in \tau }E(\theta ^{T},x^{T})} And our constraints will be: (7) x T ∈ χ T , x T = x | T , ∀ T ∈ τ {\displaystyle x^{T}\in \chi ^{T},x^{T}=x_{|T},\forall T\in \tau } Our original MRF problem will become: (8) min { x T } , x Σ T ∈ τ E ( θ T , x T ) {\displaystyle \min _{\{x^{T}\},x}\Sigma _{T\in \tau }E(\theta ^{T},x^{T})} where x T ∈ χ T , ∀ T ∈ τ {\displaystyle x^{T}\in \chi ^{T},\forall T\in \tau } and x T ∈ x | T , ∀ T ∈ τ {\displaystyle x^{T}\in x_{|T},\forall T\in \tau } And we'll have the dual problem we were seeking: (9) max { λ T } ∈ Λ g ( { λ T } ) = ∑ T ∈ τ g T ( λ T ) , {\displaystyle \max _{\{\lambda ^{T}\}\in \Lambda }g(\{\lambda ^{T}\})=\sum _{T\in \tau }g^{T}(\lambda ^{T}),} The Master problem where each function g T ( . ) {\displaystyle g^{T}(.)} is defined as: (10) g T ( λ T ) = min x T E ( θ T + λ T , x T ) {\displaystyle g^{T}(\lambda ^{T})=\min _{x^{T}}E(\theta ^{T}+\lambda ^{T},x^{T})} where x T ∈ χ T {\displaystyle x^{T}\in \chi ^{T}} The Slave problems == Theoretical Properties == Theorem 1. Lagrangian relaxation (9) is equivalent to the LP relaxation of (2). min { x T } , x { E ( x , θ ) | x p T = s p , x T ∈ CONVEXHULL ( χ T ) } {\displaystyle \min _{\{x^{T}\},x}\{E(x,\theta )|x_{p}^{T}=s_{p},x^{T}\in {\text{CONVEXHULL}}(\chi ^{T})\}} Theorem 2. If the sequence of multipliers { α t } {\displaystyle \{\alpha _{t}\}} satisfies α t ≥ 0 , lim t → ∞ α t = 0 , ∑ t = 0 ∞ α t = ∞ {\displaystyle \alpha _{t}\geq 0,\lim _{t\to \infty }\alpha _{t}=0,\sum _{t=0}^{\infty }\alpha _{t}=\infty } then the algorithm converges to the optimal solution of (9). Theorem 3. The distance of the current solution { θ T } {\displaystyle \{\theta ^{T}\}} to the optimal solution { θ ¯ T } {\displaystyle \{{\bar {\theta }}^{T}\}} , which decreases at every iteration. Theorem 4. Any solution obtained by the method satisfies the WTA (weak tree agreement) condition. Theorem 5. For binary MRFs with sub-modular energies, the method computes a globally optimal solution.

Attack path management

Attack path management is a cybersecurity technique that involves the continuous discovery, mapping, and risk assessment of identity-based attack paths. Attack path management is distinct from other computer security mitigation strategies in that it does not rely on finding individual attack paths through vulnerabilities, exploits, or offensive testing. Rather, attack path management techniques analyze all attack paths present in an environment based on active identity management policies, authentication configurations, and active authenticated "sessions" between objects. == Overview == Attack path management relies on concepts such as mapping and removing attack paths, identifying attack path choke points, and remediation of attack paths. Identity-based attacks are present in most publicly disclosed breaches, whether through social engineering to gain initial access to Active Directories or lateral movement for privilege escalation. Attackers require privileges to attack an environment’s most sensitive segments. Attack path management often involves removing out-of-date privileges and privilege assignments given to overly large groups. In attack path management, attack graphs are used to represent how a network of machines’ security is vulnerable to attack. The nodes in an attack graph represent principals and other objects such as machines, accounts, and security groups. The edges in an attack graph represent the links and relationships between nodes. Some nodes are easy to penetrate due to short paths from regular users to domain admins, resulting in focal points of concentrated network traffic, which are known as attack path choke points. Attack graphs are often analyzed using algorithms and visualization. Attack path management also identifies tier 0 assets, which are considered the most vulnerable because they have direct or indirect control of an Active Directory or Microsoft Entra ID environment.

General Internet Corpus of Russian

General Internet Corpus of Russian (GICR) is a corpus of Russian internet texts that has been accessible on request through an online query interface since 2013. The corpus includes rich text materials from the blogosphere, social networks, major news sources and literary magazines. == Goals of the project == The project has the status of an educational and scientific one, and many tasks of computational linguistics are solved by independent researchers and research groups with the materials obtained by GICR. While other corpus projects of Russian are focused on fiction and edited texts, General Internet Corpus provides linguists timely opportunity to learn the language as it is, with all the slang and regional peculiarities. Corpus gives the opportunity to carry out research in Linguistic research of a wide range: dialectological research, study of word distribution, study of the language of the social networks, study of the influence of gender, age and other factors on the language, frequency of words, fixed expressions and different constructions, stylistic features of texts of different segments of the Internet, etc. Social media analysis Corpus-based machine learning for evaluating automatic tagging At various times, student papers and independent researches were carried out on the project material by students, graduates and employees of MSU, MIPT, Russian State Humanitarian University, Novosibirsk State University, Higher School of Economics, Russian Academy of Sciences, SFU, CSU, SGMP, IAAS of MSU. Scientific project leaders: Belikov V. - RSUH, Moscow, Russia Selegey V. - RSUH, ABBYY, Moscow, Russia Sharoff S. - RSUH, Moscow, Russia; University of Leeds, UK The organizations involved in support of GICR: Russian State University of Humanities ABBYY Company Moscow Institute of Physics and Technology Skolkovo Institute of Science and Technology == Size and content of the corpus == Corpus size for the summer 2016 is 19.8 billion tokens, of which 49% are from VKontakte, 40% are from LiveJournal, another 4% - from Mail.ru Blogs and News, and 2% - from Russian Magazine Hall. The sources collected in news segment are: RIA Novosti, Regnum, Lenta.ru, Rosbalt. Texts are provided with metamarkup (by date of creation of the text, sex, place and year of birth of the author, Internet genre, etc.); all texts are provided with automatic morphological tagging and lemmatization. Most of the texts collected are of 2013–2014 years of creation, although in some segments, such as in Russian Magazine Hall, there are some texts collected since 1994. GICR is one of the few mega-corpora projects nowadays, which means its available size is reaching several billion of words. == Access == Currently the interface of GICR is in beta stage, so access to the search in the corpora is provided and is free, but is available for researchers on request.