oar-p2p

Commit message	Author	Age	Files	Lines
*	improved container wait reliability with timeouts/retries	diogo464	2025-08-08	1	-3/+9
*	doubled tcp max orphan limit	diogo464	2025-08-08	1	-0/+3
	The default value on the machines seems to be 262144, but in some larger experiments dmesg sometimes shows the following logs:
	[Fri Aug 8 05:01:42 2025] TCP: too many orphaned sockets
	[Fri Aug 8 05:01:42 2025] TCP: too many orphaned sockets
	[Fri Aug 8 05:01:42 2025] TCP: too many orphaned sockets
	[Fri Aug 8 05:01:42 2025] TCP: too many orphaned sockets
	[Fri Aug 8 05:01:42 2025] TCP: too many orphaned sockets
	[Fri Aug 8 05:01:42 2025] TCP: too many orphaned sockets
	[Fri Aug 8 05:01:42 2025] TCP: too many orphaned sockets
	[Fri Aug 8 05:01:42 2025] TCP: too many orphaned sockets
	[Fri Aug 8 05:01:42 2025] TCP: too many orphaned sockets
	[Fri Aug 8 05:01:42 2025] TCP: too many orphaned sockets
	Hopefully increasing this limit will fix that.
	https://serverfault.com/questions/624911/what-does-tcp-too-many-orphaned-sockets-mean
	The second answer on Server Fault also says it could be due to TCP memory limits:
	```
	The possible cause of this error is system run out of socket memory. Either you need to increase the socket memory (net.ipv4.tcp_mem) or find out the cause of memory consumption
	[root@test ~]# cat /proc/sys/net/ipv4/tcp_mem
	362688 483584 725376
	So here in my system you can see 725376(pages)*4096=2971140096bytes/1024*1024=708 megabyte
	So this 708 megabyte of memory is used by application for sending and receiving data as well as utilized by my loopback interface. If at any stage this value reached no further socket can be made until this memory is released from the application which are holding socket open which you can determine using netstat -antulp.
	```
	But for now I will just increase the max orphans and see if that is enough.
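The quoted answer treats tcp_mem as a count of pages. A minimal Rust sketch of that conversion, assuming 4096-byte pages as the quote does (the helper names are hypothetical), shows that a maximum of 725376 pages is 2971140096 bytes, about 2833.5 MiB, so the 708 megabyte figure in the quote does not follow from its own numbers. For comparison, doubling the 262144 default orphan limit gives 524288.

```rust
/// Parses /proc/sys/net/ipv4/tcp_mem, whose three values (min, pressure, max) are in pages.
fn parse_tcp_mem(contents: &str) -> Option<[u64; 3]> {
    let values = contents
        .split_whitespace()
        .map(|v| v.parse::<u64>().ok())
        .collect::<Option<Vec<u64>>>()?;
    match values.as_slice() {
        [min, pressure, max] => Some([*min, *pressure, *max]),
        _ => None,
    }
}

/// Converts a page count to bytes for the given page size (4096 is an assumption here).
fn pages_to_bytes(pages: u64, page_size: u64) -> u64 {
    pages * page_size
}

/// Converts bytes to mebibytes (1 MiB = 1024 * 1024 bytes).
fn bytes_to_mib(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0)
}
```

On a real machine the page size should be queried rather than assumed.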
*	only print last 500 lines of logs on container failure	diogo464	2025-08-08	1	-1/+1
*	fix: increase arp cache table size	diogo464	2025-08-07	1	-0/+5
	dmesg was showing these messages:
	[Thu Aug 7 14:05:26 2025] net_ratelimit: 4328 callbacks suppressed
	[Thu Aug 7 14:05:26 2025] neighbour: arp_cache: neighbor table overflow!
	[Thu Aug 7 14:05:26 2025] neighbour: arp_cache: neighbor table overflow!
	[Thu Aug 7 14:05:26 2025] neighbour: arp_cache: neighbor table overflow!
	[Thu Aug 7 14:05:26 2025] neighbour: arp_cache: neighbor table overflow!
	and the machines were becoming inaccessible. Increasing the ARP cache size fixes this.
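On Linux the IPv4 neighbour (ARP) table is governed by the net.ipv4.neigh.default.gc_thresh1/2/3 sysctls (defaults 128, 512 and 1024); gc_thresh3 is the hard limit, and exceeding it is what produces "neighbor table overflow!". The commit does not say which values it sets. This Rust sketch only builds `sysctl -w` assignments for caller-chosen thresholds; the helper name and the numbers used with it are hypothetical.

```rust
/// Builds `sysctl -w` assignments for the IPv4 neighbour table thresholds.
/// gc_thresh1: the garbage collector does not purge entries below this many.
/// gc_thresh2: above this, GC purges more aggressively (entries older than 5 s).
/// gc_thresh3: hard maximum of non-permanent entries; beyond it the table overflows.
fn arp_gc_thresh_args(thresh1: u32, thresh2: u32, thresh3: u32) -> Result<Vec<String>, String> {
    if !(thresh1 <= thresh2 && thresh2 <= thresh3) {
        return Err(format!(
            "thresholds must be non-decreasing: {thresh1} <= {thresh2} <= {thresh3}"
        ));
    }
    Ok([(1, thresh1), (2, thresh2), (3, thresh3)]
        .iter()
        .map(|(i, v)| format!("net.ipv4.neigh.default.gc_thresh{i}={v}"))
        .collect())
}
```

Each returned string would be passed as the argument of `sysctl -w` on every machine.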
*	disabled conntrack on 10.0.0.0/8 packets	diogo464	2025-08-07	1	-0/+15
	We were hitting conntrack limits when opening lots of connections and sending UDP packets to many different hosts. This caused TCP packets to be dropped, which manifested as errors or timeouts when connecting, and `sendto` calls for UDP packets failed with a permission denied error. Disabling conntrack fixes all of these problems.
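The commit does not say how conntrack was disabled. One standard way is to add rules to the iptables `raw` table, which is evaluated before connection tracking, using the CT target's `--notrack` option. This Rust sketch only builds the argument lists (nothing is executed), and the exact rule set is an assumption rather than what the commit necessarily does.

```rust
/// Builds iptables argument lists that exempt traffic within `cidr` from connection
/// tracking. PREROUTING sees packets arriving from the network (matched by source);
/// OUTPUT sees locally generated packets (matched by destination).
fn notrack_rules(cidr: &str) -> Vec<Vec<String>> {
    [("PREROUTING", "-s"), ("OUTPUT", "-d")]
        .iter()
        .map(|(chain, dir)| {
            ["-t", "raw", "-A", *chain, *dir, cidr, "-j", "CT", "--notrack"]
                .iter()
                .map(|s| s.to_string())
                .collect::<Vec<String>>()
        })
        .collect()
}
```

Each inner list would be the arguments to one `iptables` invocation.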
*	fixed dmesg logs from tc	diogo464	2025-08-07	1	-1/+1
	There were messages similar to:
	HTB: quantum of class 10020 is small. Consider r2q change.
	that showed up when bringing up the network. This commit fixes that.
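HTB derives each leaf class's quantum from its rate in bytes per second divided by the qdisc's r2q (default 10). When no explicit quantum is set on the class, the kernel warns and clamps if the result is under 1000 or over 200000 bytes. The 10020 in the message is the class id printed in hex, i.e. class 1:20 in tc notation. The commit does not show which fix it applied; setting an explicit `quantum` on the class or a different `r2q` on the qdisc both avoid the warning. A Rust sketch of the computation, with hypothetical helper names:

```rust
/// HTB's default rate-to-quantum divisor when the qdisc is created without `r2q`.
const DEFAULT_R2Q: u64 = 10;

#[derive(Debug, PartialEq)]
enum Quantum {
    /// Below 1000 bytes: the kernel logs "quantum of class ... is small" and uses 1000.
    Small(u64),
    /// Within 1000..=200000 bytes: used as computed.
    InRange(u64),
    /// Above 200000 bytes: the kernel logs "... is big" and uses 200000.
    Big(u64),
}

/// Computes a leaf class's quantum as (rate in bytes/s) / r2q, as HTB does when the
/// class has no explicit `quantum`, and classifies it against HTB's bounds.
fn htb_quantum(rate_bits_per_sec: u64, r2q: u64) -> Quantum {
    let quantum = rate_bits_per_sec / 8 / r2q;
    match quantum {
        q if q < 1000 => Quantum::Small(q),
        q if q > 200_000 => Quantum::Big(q),
        q => Quantum::InRange(q),
    }
}

/// Splits the hex class id printed by the kernel (e.g. "10020") into tc's major:minor.
fn decode_classid(hex: &str) -> Option<(u32, u32)> {
    let id = u32::from_str_radix(hex, 16).ok()?;
    Some((id >> 16, id & 0xffff))
}
```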
*	added --interleave flag to oar-p2p net show	diogo464	2025-08-02	1	-2/+23
*	added basic retry logic to the machine_containers_wait function	diogo464	2025-07-24	1	-1/+17
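Neither this commit nor the later timeout/retry change describes the retry policy. As an assumption, a common choice is a bounded number of attempts with exponential backoff; `retry` is a hypothetical helper that wraps any fallible operation, such as a container status check.

```rust
use std::thread;
use std::time::Duration;

/// Calls `op` (always at least once) until it succeeds or `attempts` calls have failed,
/// sleeping between failures with a delay that doubles each time. `op` receives the
/// 1-based attempt number. Returns the last error if every attempt fails.
fn retry<T, E, F>(attempts: u32, initial_delay: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= attempts => return Err(err),
            Err(_) => {
                thread::sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```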
*	added ConnectionAttempts ssh option	diogo464	2025-07-24	1	-1/+5
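OpenSSH's ConnectionAttempts option sets how many times ssh tries to connect (one try per second) before exiting; the default is 1. A minimal Rust sketch that builds, but does not run, such a command; the host name, attempt count and `ssh_command` helper are hypothetical.

```rust
use std::process::Command;

/// Builds an ssh command that makes up to `attempts` connection tries before giving up.
fn ssh_command(host: &str, attempts: u32, remote_cmd: &str) -> Command {
    let mut cmd = Command::new("ssh");
    cmd.arg("-o")
        .arg(format!("ConnectionAttempts={attempts}"))
        .arg(host)
        .arg(remote_cmd);
    cmd
}
```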
*	replaced scp with rsync	diogo464	2025-07-24	1	-21/+13
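A sketch of the kind of rsync invocation that can replace scp for pulling a directory from a machine, built with `std::process::Command` but not executed. The flags (`-a` for archive mode, `-z` for compression, `-e` to choose the remote shell) are standard rsync options; which ones the commit actually uses is not stated, and `rsync_pull` is hypothetical.

```rust
use std::process::Command;

/// Builds an rsync command copying `remote_path` from `host` into `local_dir` over ssh.
fn rsync_pull(host: &str, remote_path: &str, local_dir: &str) -> Command {
    let mut cmd = Command::new("rsync");
    cmd.arg("-az") // archive mode (recursive, preserves metadata) plus compression
        .arg("-e")
        .arg("ssh") // remote shell used for the transfer
        .arg(format!("{host}:{remote_path}"))
        .arg(local_dir);
    cmd
}
```

With a trailing slash on the source path, rsync copies the directory's contents rather than the directory itself.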
*	added concurrency limit via OAR_P2P_CONCURRENCY_LIMIT	diogo464	2025-07-23	2	-35/+69
	The env var OAR_P2P_CONCURRENCY_LIMIT limits the number of parallel "operations" being done on the cluster machines. So, if it is set to 3, we only work on 3 machines at a time. Setting it to 0 means unlimited.
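The behavior described above (at most N machines at a time, 0 for unlimited) can be sketched in Rust with a fixed pool of worker threads pulling machines from a shared queue. `parse_limit`, `limit_from_env` and `for_each_limited` are hypothetical, and treating an unset or invalid variable as unlimited is an assumption; the real tool may use async tasks and a semaphore instead.

```rust
use std::env;
use std::sync::Mutex;
use std::thread;

/// Interprets a concurrency limit: "0" means unlimited (None), N > 0 means at most N.
/// Assumption: an unset or unparseable value is also treated as unlimited.
fn parse_limit(raw: Option<&str>) -> Option<usize> {
    match raw.and_then(|s| s.trim().parse::<usize>().ok()) {
        Some(0) | None => None,
        Some(n) => Some(n),
    }
}

/// Reads the limit from OAR_P2P_CONCURRENCY_LIMIT.
fn limit_from_env() -> Option<usize> {
    parse_limit(env::var("OAR_P2P_CONCURRENCY_LIMIT").ok().as_deref())
}

/// Runs `op` once per machine with at most `limit` calls in flight, using a fixed pool
/// of worker threads that pull machines from a shared queue. Results keep input order.
fn for_each_limited<F>(machines: &[String], limit: Option<usize>, op: F) -> Vec<String>
where
    F: Fn(&str) -> String + Sync,
{
    let workers = limit
        .unwrap_or(machines.len())
        .clamp(1, machines.len().max(1));
    let queue = Mutex::new(machines.iter().enumerate());
    let results = Mutex::new(vec![String::new(); machines.len()]);
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // The queue lock is released at the end of this statement, before `op` runs.
                let next = queue.lock().unwrap().next();
                let Some((i, machine)) = next else { break };
                let output = op(machine.as_str());
                results.lock().unwrap()[i] = output;
            });
        }
    });
    results.into_inner().unwrap()
}
```

A fixed pool keeps exactly `limit` threads alive instead of spawning one per machine, so the bound holds without any extra counting.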
*	feat: added oar job id inference	diogo464	2025-07-22	3	-7/+307
*	fixed address listing on machines with no addresses	diogo464	2025-07-17	1	-1/+1
	Currently the shell script used to list the addresses in the 10.0.0.0/8 range on a machine would fail with exit code 1 if no addresses were present in that range (i.e. grep did not match anything). This fix makes sure that the command always returns exit code 0.
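grep exits with 0 when a line is selected, 1 when none is, and 2 on an error. Appending `|| true` in the shell is a common way to force exit code 0, though the commit's exact change is not shown, and that idiom also hides status 2. A hypothetical Rust alternative is to interpret the status on the caller's side, as sketched here; the input is what `ExitStatus::code()` returns (None when the process was killed by a signal).

```rust
/// Interprets grep's exit status: 0 means at least one line matched, 1 means no line
/// matched (not an error for an address listing), anything else is a real failure.
fn grep_matched(exit_code: Option<i32>) -> Result<bool, String> {
    match exit_code {
        Some(0) => Ok(true),
        Some(1) => Ok(false),
        Some(code) => Err(format!("grep failed with exit code {code}")),
        None => Err("grep was terminated by a signal".to_string()),
    }
}
```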
*	added the interface for machines oddish,psyduck,squirtle,bulbasaur	diogo464	2025-07-17	1	-11/+11
*	improved cli help text	diogo464	2025-07-17	1	-0/+62
*	set the interface for moltres machines	diogo464	2025-07-17	1	-10/+10
*	added custom signals to run subcommand	diogo464	2025-07-13	2	-14/+288
*	feat: add benchmark startup analysis tools and improve demo.sh	diogo464	2025-07-11	1	-0/+28
	- Add generate-schedule.sh script to create container schedules from addresses.txt
	- Add benchmark-startup Python script for analyzing container startup times
	- Update demo.sh to print timestamps and wait for start signal at /oar-p2p/start
	- Add comprehensive statistics including startup, start signal, and waiting times
	- Support for synchronized container coordination via start signal file
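demo.sh itself is a shell script; as an illustration of the same wait, this Rust sketch polls for the start signal file (/oar-p2p/start in the demo) until it appears or a timeout passes, and returns how long the container waited. The poll interval, timeout and function name are hypothetical.

```rust
use std::path::Path;
use std::thread;
use std::time::{Duration, Instant};

/// Polls every `poll` until `path` exists or `timeout` elapses.
/// On success returns the waiting time, which is what a startup benchmark would record.
fn wait_for_start_signal(path: &Path, poll: Duration, timeout: Duration) -> Result<Duration, String> {
    let started = Instant::now();
    loop {
        if path.exists() {
            return Ok(started.elapsed());
        }
        if started.elapsed() >= timeout {
            return Err(format!("no start signal at {} after {:?}", path.display(), timeout));
        }
        thread::sleep(poll);
    }
}
```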
*	fix: correct shell redirection syntax from 2>1 to 2>&1	diogo464	2025-07-11	1	-1/+1
*	feat: add logging for scp command output	diogo464	2025-07-11	1	-0/+9
*	feat: create output directory if it doesn't exist in run command	diogo464	2025-07-11	1	-0/+8
*	fix: replace todo!() with bond0 interface for alakazam and kadabra machines	diogo464	2025-07-11	1	-16/+16
*	fix: add error handling for latency matrix dimension check	diogo464	2025-07-11	1	-4/+12
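A dimension check of this kind can be sketched as follows, assuming (hypothetically) a text format with one whitespace-separated row of latencies per line; the matrix must be square and have one row per address. The function name and format are assumptions, not the tool's actual parser.

```rust
/// Parses a latency matrix (one whitespace-separated row per line, blank lines ignored)
/// and checks that it is square and has exactly `n_addresses` rows.
fn parse_latency_matrix(text: &str, n_addresses: usize) -> Result<Vec<Vec<f64>>, String> {
    let rows = text
        .lines()
        .filter(|line| !line.trim().is_empty())
        .enumerate()
        .map(|(i, line)| {
            line.split_whitespace()
                .map(|v| v.parse::<f64>().map_err(|e| format!("row {i}: bad value {v:?}: {e}")))
                .collect::<Result<Vec<f64>, String>>()
        })
        .collect::<Result<Vec<Vec<f64>>, String>>()?;

    let n = rows.len();
    if let Some((i, row)) = rows.iter().enumerate().find(|(_, r)| r.len() != n) {
        return Err(format!("matrix is not square: row {i} has {} columns, expected {n}", row.len()));
    }
    if n != n_addresses {
        return Err(format!("matrix is {n}x{n} but there are {n_addresses} addresses"));
    }
    Ok(rows)
}
```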
*	fixed log copying	diogo464	2025-07-11	1	-1/+1
*	fixed env var quoting when setting container variables	diogo464	2025-07-11	1	-0/+2
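The commit does not show its quoting. A standard way to make an arbitrary value safe for a POSIX shell is to wrap it in single quotes and replace each embedded single quote with '\'' (close the quote, add an escaped quote, reopen). A minimal Rust sketch with hypothetical helper names; the variable name itself is not validated.

```rust
/// Quotes `value` for a POSIX shell so spaces, `$`, quotes, etc. are taken literally.
fn shell_quote(value: &str) -> String {
    format!("'{}'", value.replace('\'', r"'\''"))
}

/// Builds a `KEY=value` assignment that can be spliced into a shell command line.
fn env_assignment(key: &str, value: &str) -> String {
    format!("{key}={}", shell_quote(value))
}
```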
*	fixed reading schedule from stdin	diogo464	2025-07-11	1	-4/+9
*	added addr allocation policy	diogo464	2025-07-11	2	-16/+204
*	fixed net container build	diogo464	2025-07-11	1	-1/+3
*	cargo clippy --fix	diogo464	2025-07-10	3	-20/+17
*	clean enough for now	diogo464	2025-07-10	4	-320/+433
*	it works, now needs cleanup	diogo464	2025-07-10	2	-98/+890
*	rust init snapshot	diogo464	2025-07-09	3	-0/+452
*	Convert from Rust to Python project with uv support	diogo464	2025-06-29	1	-133/+0
	Remove Rust-related files (Cargo.toml, Cargo.lock, src/, target/) and restructure as Python project using uv for dependency management.
	Update project structure to match nova-oar-mcp style with pyproject.toml, .python-version, and proper Python packaging conventions.
	🤖 Generated with [Claude Code](https://claude.ai/code)
	Co-Authored-By: Claude <[email protected]>
*	Add P2P network setup script with interface and latency configuration	diogo464	2025-06-27	1	-2/+4
	- Complete Python script for OAR P2P network setup
	- LatencyMatrix class for loading and validating square matrices
	- Interface preparation and configuration with parallel execution
	- TC latency emulation using netem (WIP - fixing class issues)
	- Batch IP and TC operations for efficiency
	- Docker containerized execution for consistent tooling
	🤖 Generated with [Claude Code](https://claude.ai/code)
	Co-Authored-By: Claude <[email protected]>
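A common tc layout for per-destination latency, sketched in Rust under stated assumptions (interface name, class numbering, rate and quantum are hypothetical, and this is not necessarily the script's layout): an HTB root qdisc, one HTB class per destination, a netem qdisc under each class, and a u32 filter that steers packets by destination IP. The lines omit the leading `tc`, which is the format `tc -batch` reads.

```rust
/// Builds `tc -batch` lines that add a per-destination delay on `dev`.
fn netem_batch(dev: &str, dests: &[(&str, u32)]) -> Vec<String> {
    // Root HTB qdisc. With no `default`, unmatched packets bypass the classes (HTB direct queue).
    let mut lines = vec![format!("qdisc add dev {dev} root handle 1: htb")];
    for (i, (ip, delay_ms)) in dests.iter().enumerate() {
        // Hypothetical numbering: ids from 0xa upward, written in hex as tc expects,
        // reused as the netem handle so it never collides with the root handle 1:.
        let id = i + 10;
        // Explicit quantum: with the default r2q of 10, a 10gbit rate would give a quantum
        // above 200000 bytes and trigger the "quantum ... is big" warning.
        lines.push(format!("class add dev {dev} parent 1: classid 1:{id:x} htb rate 10gbit quantum 60000"));
        lines.push(format!("qdisc add dev {dev} parent 1:{id:x} handle {id:x}: netem delay {delay_ms}ms"));
        lines.push(format!(
            "filter add dev {dev} parent 1: protocol ip prio 1 u32 match ip dst {ip}/32 flowid 1:{id:x}"
        ));
    }
    lines
}
```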
*	Add OAR job management and IP address allocation	diogo464	2025-06-27	1	-2/+123
	- Add clap for CLI argument parsing with job_id, addresses, and latency_matrix
	- Add serde/serde_json for JSON parsing of OAR job data
	- Implement oar_network_addresses() to get machine list from OAR job
	- Add address_from_index() to map indices to 10.0.0.0/8 IP addresses
	- Add machine list with bond0 interfaces for charmander cluster
	- Configure musl target build in Justfile for cluster deployment
	🤖 Generated with [Claude Code](https://claude.ai/code)
	Co-Authored-By: Claude <[email protected]>
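address_from_index() is only named in this log. A hypothetical version that counts up from 10.0.0.1 and never returns the network (10.0.0.0) or broadcast (10.255.255.255) address of 10.0.0.0/8 could look like this; the starting offset is an assumption.

```rust
use std::net::Ipv4Addr;

/// Maps a 0-based index to a host address in 10.0.0.0/8, starting at 10.0.0.1.
/// Returns None once the index would reach the /8 broadcast address.
fn address_from_index(index: u32) -> Option<Ipv4Addr> {
    // Offset inside the /8: index 0 -> host 1, so 10.0.0.0 is never used.
    let host = index.checked_add(1)?;
    // A /8 has 2^24 addresses; the last one (2^24 - 1) is the broadcast address.
    if host >= (1 << 24) - 1 {
        return None;
    }
    Some(Ipv4Addr::from(u32::from(Ipv4Addr::new(10, 0, 0, 0)) + host))
}
```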
*	init	diogo464	2025-06-27	1	-0/+10